{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "95e389fe",
   "metadata": {},
   "source": [
    "# BCSD: Bias Correction and Spatial Disaggregation\n",
    "\n",
    "## Setup\n",
    "\n",
    "First, let's import the necessary libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a80241b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import scipy.stats\n",
    "import xarray as xr\n",
    "\n",
    "from skdownscale.pointwise_models import BcsdPrecipitation, BcsdTemperature"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "315465ce",
   "metadata": {},
   "source": [
    "## Helper Functions for Visualization\n",
    "\n",
    "We'll define helper functions to visualize cumulative distribution functions (CDFs), which are useful for comparing the statistical properties of our input and downscaled data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9c67616",
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_cdf(ax=None, **kwargs):\n",
    "    \"\"\"Plot cumulative distribution function for multiple datasets.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    ax : matplotlib axis, optional\n",
    "        Axis to plot on\n",
    "    **kwargs : dict\n",
    "        Datasets to plot, where key is the label and value is the data\n",
    "    \"\"\"\n",
    "    if ax:\n",
    "        plt.sca(ax)\n",
    "    else:\n",
    "        ax = plt.gca()\n",
    "\n",
    "    for label, X in kwargs.items():\n",
    "        vals = np.sort(X, axis=0)\n",
    "        pp = scipy.stats.mstats.plotting_positions(vals)\n",
    "        ax.plot(pp, vals, label=label)\n",
    "    ax.legend()\n",
    "    ax.set_xlabel('Cumulative Probability')\n",
    "    ax.set_ylabel('Value')\n",
    "    return ax\n",
    "\n",
    "\n",
    "def plot_cdf_by_month(**kwargs):\n",
    "    \"\"\"Plot CDFs separately for each month of the year.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    **kwargs : dict\n",
    "        Datasets to plot, where key is the label and value is the data\n",
    "    \"\"\"\n",
    "    fig, axes = plt.subplots(4, 3, sharex=True, sharey=False, figsize=(12, 8))\n",
    "    fig.suptitle('CDFs by Month', fontsize=14)\n",
    "\n",
    "    for label, X in kwargs.items():\n",
    "        for month, ax in zip(range(1, 13), axes.flat):\n",
    "            vals = np.sort(X[X.index.month == month], axis=0)\n",
    "            pp = scipy.stats.mstats.plotting_positions(vals)\n",
    "            ax.plot(pp, vals, label=label)\n",
    "            ax.set_title(f'Month {month}')\n",
    "\n",
    "    # Add legend to last subplot\n",
    "    axes.flat[-1].legend()\n",
    "    fig.tight_layout()\n",
    "    return fig, axes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfc57fdd",
   "metadata": {},
   "source": [
    "## Step 1: Load Climate Data\n",
    "\n",
    "We'll load sample climate data that includes:\n",
    "- **Training data**: Coarse-resolution climate model output (predictor)\n",
    "- **Target data**: High-resolution observations (predictand)\n",
    "\n",
    "The data is stored in a cloud-optimized Zarr format for efficient access."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "099a0f07",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the training period\n",
    "training_time_slice = slice('1980', '2001')\n",
    "\n",
    "# Load the data from cloud storage\n",
    "data = xr.open_datatree(\n",
    "    's3://carbonplan/share/scikit-downscale/test-data.zarr',\n",
    "    engine='zarr',\n",
    "    chunks={},\n",
    "    storage_options={'anon': True, 'endpoint_url': 'https://rice1.osn.mghpcc.org'},\n",
    ")\n",
    "\n",
    "# Extract training and target datasets\n",
    "training = data['training'].to_dataset().sel(time=training_time_slice)\n",
    "targets = data['targets'].to_dataset().sel(time=training_time_slice)\n",
    "\n",
    "print('Training data:')\n",
    "display(training)\n",
    "print('\\nTarget data:')\n",
    "display(targets)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bdf9c71e",
   "metadata": {},
   "source": [
    "## Step 2: Prepare Data for a Single Location\n",
    "\n",
    "For this tutorial, we'll focus on a single spatial point. We'll extract and prepare both temperature and precipitation data.\n",
    "\n",
    "### Temperature Data\n",
    "\n",
    "- Convert from Kelvin to Celsius\n",
    "- Resample to monthly means"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9267b06",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract temperature data for point 0\n",
    "# Training: daily maximum temperature from climate model\n",
    "X_temp = (\n",
    "    training.isel(point=0)\n",
    "    .to_dataframe()[['T2max']]\n",
    "    .resample('MS')  # Monthly start frequency\n",
    "    .mean()\n",
    "    - 273.15  # Convert Kelvin to Celsius\n",
    ")\n",
    "\n",
    "# Target: observed daily maximum temperature\n",
    "y_temp = targets.isel(point=0).to_dataframe()[['Tmax']].resample('MS').mean()\n",
    "\n",
    "print('Training temperature (first 5 months):')\n",
    "display(X_temp.head())\n",
    "print('\\nTarget temperature (first 5 months):')\n",
    "display(y_temp.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb8d49cc",
   "metadata": {},
   "source": [
    "### Precipitation Data\n",
    "\n",
    "- Convert units to mm/day by multiplying by 24\n",
    "- Resample to monthly totals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "425b99e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract precipitation data for point 0\n",
    "# Training: total precipitation from climate model\n",
    "X_pcp = (\n",
    "    training.isel(point=0).to_dataframe()[['PREC_TOT']].resample('MS').sum()\n",
    "    * 24  # Convert to mm/day equivalent\n",
    ")\n",
    "\n",
    "# Target: observed precipitation\n",
    "y_pcp = targets.isel(point=0).to_dataframe()[['Prec']].resample('MS').sum()\n",
    "\n",
    "print('Training precipitation (first 5 months):')\n",
    "display(X_pcp.head())\n",
    "print('\\nTarget precipitation (first 5 months):')\n",
    "display(y_pcp.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27b90bb0",
   "metadata": {},
   "source": [
    "## Step 3: Downscale Temperature with BCSD\n",
    "\n",
    "The BCSD temperature model uses quantile mapping to correct biases in the climate model output.\n",
    "\n",
    "### How it works:\n",
    "1. **Fit**: Learn the relationship between model and observed quantiles\n",
    "2. **Predict**: Apply the correction to new data\n",
    "3. **Adjust**: Add the bias-corrected anomaly back to the original data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30560ef4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the BCSD temperature model\n",
    "bcsd_temp = BcsdTemperature()\n",
    "\n",
    "# Fit the model using training data\n",
    "bcsd_temp.fit(X_temp, y_temp)\n",
    "\n",
    "# Generate downscaled predictions\n",
    "# Note: We add back X_temp because the model predicts anomalies\n",
    "temp_downscaled = bcsd_temp.predict(X_temp) + X_temp\n",
    "\n",
    "print('Downscaled temperature (first 5 months):')\n",
    "display(temp_downscaled.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f2938b5",
   "metadata": {},
   "source": [
    "### Evaluate Temperature Downscaling\n",
    "\n",
    "Let's visualize how well the downscaling corrects the model bias by comparing CDFs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fbf0f3b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot overall CDF\n",
    "fig, ax = plt.subplots(figsize=(10, 6))\n",
    "plot_cdf(\n",
    "    ax=ax,\n",
    "    Training=X_temp.values.flatten(),\n",
    "    Observed=y_temp.values.flatten(),\n",
    "    Downscaled=temp_downscaled.values.flatten(),\n",
    ")\n",
    "ax.set_title('Temperature CDF Comparison')\n",
    "ax.set_ylabel('Temperature (°C)')\n",
    "plt.show()\n",
    "\n",
    "# Plot time series for a sample period\n",
    "fig, ax = plt.subplots(figsize=(12, 5))\n",
    "temp_downscaled['2000':'2001'].plot(ax=ax, label='Downscaled')\n",
    "y_temp['2000':'2001'].plot(ax=ax, label='Observed')\n",
    "X_temp['2000':'2001'].plot(ax=ax, label='Training', alpha=0.7)\n",
    "ax.set_title('Temperature Time Series (2000-2001)')\n",
    "ax.set_ylabel('Temperature (°C)')\n",
    "ax.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0693752",
   "metadata": {},
   "source": [
    "### Monthly CDFs for Temperature\n",
    "\n",
    "Since climate statistics vary by season, let's examine the CDFs for each month separately."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3bccf530",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plot_cdf_by_month(Training=X_temp, Observed=y_temp, Downscaled=temp_downscaled)\n",
    "fig.suptitle('Temperature CDFs by Month', fontsize=14)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72b983ad",
   "metadata": {},
   "source": [
    "## Step 4: Downscale Precipitation with BCSD\n",
    "\n",
    "Precipitation requires special handling due to its non-Gaussian distribution and the presence of zero values.\n",
    "\n",
    "### Key differences from temperature:\n",
    "- Uses multiplicative (rather than additive) adjustments\n",
    "- Handles zero-precipitation days separately\n",
    "- Uses specialized quantile mapping for non-negative values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1715b94d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the BCSD precipitation model\n",
    "bcsd_pcp = BcsdPrecipitation()\n",
    "\n",
    "# Fit the model using training data\n",
    "bcsd_pcp.fit(X_pcp, y_pcp)\n",
    "\n",
    "# Generate downscaled predictions\n",
    "# Note: We multiply by X_pcp because the model predicts a scaling factor\n",
    "pcp_downscaled = bcsd_pcp.predict(X_pcp) * X_pcp\n",
    "\n",
    "print('Downscaled precipitation (first 5 months):')\n",
    "display(pcp_downscaled.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb8ae90f",
   "metadata": {},
   "source": [
    "### Evaluate Precipitation Downscaling\n",
    "\n",
    "Let's compare the statistical properties of our downscaled precipitation with observations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e86972ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot overall CDF\n",
    "fig, ax = plt.subplots(figsize=(10, 6))\n",
    "plot_cdf(\n",
    "    ax=ax,\n",
    "    Training=X_pcp.values.flatten(),\n",
    "    Observed=y_pcp.values.flatten(),\n",
    "    Downscaled=pcp_downscaled.values.flatten(),\n",
    ")\n",
    "ax.set_title('Precipitation CDF Comparison')\n",
    "ax.set_ylabel('Precipitation (mm)')\n",
    "plt.show()\n",
    "\n",
    "# Plot time series for a sample period\n",
    "fig, ax = plt.subplots(figsize=(12, 5))\n",
    "pcp_downscaled['2000':'2001'].plot(ax=ax, label='Downscaled')\n",
    "y_pcp['2000':'2001'].plot(ax=ax, label='Observed')\n",
    "X_pcp['2000':'2001'].plot(ax=ax, label='Training', alpha=0.7)\n",
    "ax.set_title('Precipitation Time Series (2000-2001)')\n",
    "ax.set_ylabel('Precipitation (mm)')\n",
    "ax.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f26b72c8",
   "metadata": {},
   "source": [
    "### Monthly CDFs for Precipitation\n",
    "\n",
    "Precipitation patterns often vary significantly by month, so let's examine monthly CDFs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c442339",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plot_cdf_by_month(Training=X_pcp, Observed=y_pcp, Downscaled=pcp_downscaled)\n",
    "fig.suptitle('Precipitation CDFs by Month', fontsize=14)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27241f2c",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "In this tutorial, we demonstrated how to:\n",
    "\n",
    "1. ✅ Load and prepare climate data from cloud storage\n",
    "2. ✅ Apply BCSD temperature downscaling using quantile mapping\n",
    "3. ✅ Apply BCSD precipitation downscaling with special handling for non-negative values\n",
    "4. ✅ Evaluate results using CDFs and time series plots\n",
    "\n",
    "### Key Takeaways\n",
    "\n",
    "- **Temperature**: BCSD corrects biases using additive quantile mapping\n",
    "- **Precipitation**: BCSD uses multiplicative adjustments due to its non-Gaussian nature\n",
    "- **Evaluation**: CDFs are effective for comparing statistical properties across datasets\n",
    "- **Monthly analysis**: Examining results by month reveals seasonal performance\n",
    "\n",
    "### Next Steps\n",
    "\n",
    "- Try applying BCSD to multiple spatial points using `PointWiseDownscaler`\n",
    "- Explore other downscaling methods like GARD or Analog methods\n",
    "- Compare different methods on the same dataset\n",
    "- Apply downscaling to your own climate data\n",
    "\n",
    "### References\n",
    "\n",
    "Wood, A. W., Leung, L. R., Sridhar, V., & Lettenmaier, D. P. (2004). Hydrological implications of dynamical and statistical approaches to downscaling climate model outputs. *Climatic Change*, 62(1-3), 189-216."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}