{ "cells": [ { "cell_type": "markdown", "id": "95e389fe", "metadata": {}, "source": [ "# BCSD: Bias Correction and Spatial Disaggregation\n", "\n", "## Setup\n", "\n", "First, let's import the necessary libraries." ] }, { "cell_type": "code", "execution_count": null, "id": "2a80241b", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import scipy.stats\n", "import xarray as xr\n", "\n", "from skdownscale.pointwise_models import BcsdPrecipitation, BcsdTemperature" ] }, { "cell_type": "markdown", "id": "315465ce", "metadata": {}, "source": [ "## Helper Functions for Visualization\n", "\n", "We'll define helper functions to visualize cumulative distribution functions (CDFs), which are useful for comparing the statistical properties of our input and downscaled data." ] }, { "cell_type": "code", "execution_count": null, "id": "a9c67616", "metadata": {}, "outputs": [], "source": [ "def plot_cdf(ax=None, **kwargs):\n", " \"\"\"Plot cumulative distribution function for multiple datasets.\n", "\n", " Parameters\n", " ----------\n", " ax : matplotlib axis, optional\n", " Axis to plot on\n", " **kwargs : dict\n", " Datasets to plot, where key is the label and value is the data\n", " \"\"\"\n", " if ax:\n", " plt.sca(ax)\n", " else:\n", " ax = plt.gca()\n", "\n", " for label, X in kwargs.items():\n", " vals = np.sort(X, axis=0)\n", " pp = scipy.stats.mstats.plotting_positions(vals)\n", " ax.plot(pp, vals, label=label)\n", " ax.legend()\n", " ax.set_xlabel('Cumulative Probability')\n", " ax.set_ylabel('Value')\n", " return ax\n", "\n", "\n", "def plot_cdf_by_month(**kwargs):\n", " \"\"\"Plot CDFs separately for each month of the year.\n", "\n", " Parameters\n", " ----------\n", " **kwargs : dict\n", " Datasets to plot, where key is the label and value is the data\n", " \"\"\"\n", " fig, axes = plt.subplots(4, 3, sharex=True, sharey=False, figsize=(12, 8))\n", " fig.suptitle('CDFs by Month', fontsize=14)\n", "\n", " for label, X in kwargs.items():\n", " for month, ax in zip(range(1, 13), axes.flat):\n", " vals = np.sort(X[X.index.month == month], axis=0)\n", " pp = scipy.stats.mstats.plotting_positions(vals)\n", " ax.plot(pp, vals, label=label)\n", " ax.set_title(f'Month {month}')\n", "\n", " # Add legend to last subplot\n", " axes.flat[-1].legend()\n", " fig.tight_layout()\n", " return fig, axes" ] }, { "cell_type": "markdown", "id": "cfc57fdd", "metadata": {}, "source": [ "## Step 1: Load Climate Data\n", "\n", "We'll load sample climate data that includes:\n", "- **Training data**: Coarse-resolution climate model output (predictor)\n", "- **Target data**: High-resolution observations (predictand)\n", "\n", "The data is stored in a cloud-optimized Zarr format for efficient access." ] }, { "cell_type": "code", "execution_count": null, "id": "099a0f07", "metadata": {}, "outputs": [], "source": [ "# Define the training period\n", "training_time_slice = slice('1980', '2001')\n", "\n", "# Load the data from cloud storage\n", "data = xr.open_datatree(\n", " 's3://carbonplan/share/scikit-downscale/test-data.zarr',\n", " engine='zarr',\n", " chunks={},\n", " storage_options={'anon': True, 'endpoint_url': 'https://rice1.osn.mghpcc.org'},\n", ")\n", "\n", "# Extract training and target datasets\n", "training = data['training'].to_dataset().sel(time=training_time_slice)\n", "targets = data['targets'].to_dataset().sel(time=training_time_slice)\n", "\n", "print('Training data:')\n", "display(training)\n", "print('\\nTarget data:')\n", "display(targets)" ] }, { "cell_type": "markdown", "id": "bdf9c71e", "metadata": {}, "source": [ "## Step 2: Prepare Data for a Single Location\n", "\n", "For this tutorial, we'll focus on a single spatial point. We'll extract and prepare both temperature and precipitation data.\n", "\n", "### Temperature Data\n", "\n", "- Convert from Kelvin to Celsius\n", "- Resample to monthly means" ] }, { "cell_type": "code", "execution_count": null, "id": "a9267b06", "metadata": {}, "outputs": [], "source": [ "# Extract temperature data for point 0\n", "# Training: daily maximum temperature from climate model\n", "X_temp = (\n", " training.isel(point=0)\n", " .to_dataframe()[['T2max']]\n", " .resample('MS') # Monthly start frequency\n", " .mean()\n", " - 273.15 # Convert Kelvin to Celsius\n", ")\n", "\n", "# Target: observed daily maximum temperature\n", "y_temp = targets.isel(point=0).to_dataframe()[['Tmax']].resample('MS').mean()\n", "\n", "print('Training temperature (first 5 months):')\n", "display(X_temp.head())\n", "print('\\nTarget temperature (first 5 months):')\n", "display(y_temp.head())" ] }, { "cell_type": "markdown", "id": "cb8d49cc", "metadata": {}, "source": [ "### Precipitation Data\n", "\n", "- Convert units to mm/day by multiplying by 24\n", "- Resample to monthly totals" ] }, { "cell_type": "code", "execution_count": null, "id": "425b99e4", "metadata": {}, "outputs": [], "source": [ "# Extract precipitation data for point 0\n", "# Training: total precipitation from climate model\n", "X_pcp = (\n", " training.isel(point=0).to_dataframe()[['PREC_TOT']].resample('MS').sum()\n", " * 24 # Convert to mm/day equivalent\n", ")\n", "\n", "# Target: observed precipitation\n", "y_pcp = targets.isel(point=0).to_dataframe()[['Prec']].resample('MS').sum()\n", "\n", "print('Training precipitation (first 5 months):')\n", "display(X_pcp.head())\n", "print('\\nTarget precipitation (first 5 months):')\n", "display(y_pcp.head())" ] }, { "cell_type": "markdown", "id": "27b90bb0", "metadata": {}, "source": [ "## Step 3: Downscale Temperature with BCSD\n", "\n", "The BCSD temperature model uses quantile mapping to correct biases in the climate model output.\n", "\n", "### How it works:\n", "1. **Fit**: Learn the relationship between model and observed quantiles\n", "2. **Predict**: Apply the correction to new data\n", "3. **Adjust**: Add the bias-corrected anomaly back to the original data" ] }, { "cell_type": "code", "execution_count": null, "id": "30560ef4", "metadata": {}, "outputs": [], "source": [ "# Initialize the BCSD temperature model\n", "bcsd_temp = BcsdTemperature()\n", "\n", "# Fit the model using training data\n", "bcsd_temp.fit(X_temp, y_temp)\n", "\n", "# Generate downscaled predictions\n", "# Note: We add back X_temp because the model predicts anomalies\n", "temp_downscaled = bcsd_temp.predict(X_temp) + X_temp\n", "\n", "print('Downscaled temperature (first 5 months):')\n", "display(temp_downscaled.head())" ] }, { "cell_type": "markdown", "id": "2f2938b5", "metadata": {}, "source": [ "### Evaluate Temperature Downscaling\n", "\n", "Let's visualize how well the downscaling corrects the model bias by comparing CDFs." ] }, { "cell_type": "code", "execution_count": null, "id": "fbf0f3b2", "metadata": {}, "outputs": [], "source": [ "# Plot overall CDF\n", "fig, ax = plt.subplots(figsize=(10, 6))\n", "plot_cdf(\n", " ax=ax,\n", " Training=X_temp.values.flatten(),\n", " Observed=y_temp.values.flatten(),\n", " Downscaled=temp_downscaled.values.flatten(),\n", ")\n", "ax.set_title('Temperature CDF Comparison')\n", "ax.set_ylabel('Temperature (°C)')\n", "plt.show()\n", "\n", "# Plot time series for a sample period\n", "fig, ax = plt.subplots(figsize=(12, 5))\n", "temp_downscaled['2000':'2001'].plot(ax=ax, label='Downscaled')\n", "y_temp['2000':'2001'].plot(ax=ax, label='Observed')\n", "X_temp['2000':'2001'].plot(ax=ax, label='Training', alpha=0.7)\n", "ax.set_title('Temperature Time Series (2000-2001)')\n", "ax.set_ylabel('Temperature (°C)')\n", "ax.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "d0693752", "metadata": {}, "source": [ "### Monthly CDFs for Temperature\n", "\n", "Since climate statistics vary by season, let's examine the CDFs for each month separately." ] }, { "cell_type": "code", "execution_count": null, "id": "3bccf530", "metadata": {}, "outputs": [], "source": [ "fig, axes = plot_cdf_by_month(Training=X_temp, Observed=y_temp, Downscaled=temp_downscaled)\n", "fig.suptitle('Temperature CDFs by Month', fontsize=14)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "72b983ad", "metadata": {}, "source": [ "## Step 4: Downscale Precipitation with BCSD\n", "\n", "Precipitation requires special handling due to its non-Gaussian distribution and the presence of zero values.\n", "\n", "### Key differences from temperature:\n", "- Uses multiplicative (rather than additive) adjustments\n", "- Handles zero-precipitation days separately\n", "- Uses specialized quantile mapping for non-negative values" ] }, { "cell_type": "code", "execution_count": null, "id": "1715b94d", "metadata": {}, "outputs": [], "source": [ "# Initialize the BCSD precipitation model\n", "bcsd_pcp = BcsdPrecipitation()\n", "\n", "# Fit the model using training data\n", "bcsd_pcp.fit(X_pcp, y_pcp)\n", "\n", "# Generate downscaled predictions\n", "# Note: We multiply by X_pcp because the model predicts a scaling factor\n", "pcp_downscaled = bcsd_pcp.predict(X_pcp) * X_pcp\n", "\n", "print('Downscaled precipitation (first 5 months):')\n", "display(pcp_downscaled.head())" ] }, { "cell_type": "markdown", "id": "fb8ae90f", "metadata": {}, "source": [ "### Evaluate Precipitation Downscaling\n", "\n", "Let's compare the statistical properties of our downscaled precipitation with observations." ] }, { "cell_type": "code", "execution_count": null, "id": "e86972ab", "metadata": {}, "outputs": [], "source": [ "# Plot overall CDF\n", "fig, ax = plt.subplots(figsize=(10, 6))\n", "plot_cdf(\n", " ax=ax,\n", " Training=X_pcp.values.flatten(),\n", " Observed=y_pcp.values.flatten(),\n", " Downscaled=pcp_downscaled.values.flatten(),\n", ")\n", "ax.set_title('Precipitation CDF Comparison')\n", "ax.set_ylabel('Precipitation (mm)')\n", "plt.show()\n", "\n", "# Plot time series for a sample period\n", "fig, ax = plt.subplots(figsize=(12, 5))\n", "pcp_downscaled['2000':'2001'].plot(ax=ax, label='Downscaled')\n", "y_pcp['2000':'2001'].plot(ax=ax, label='Observed')\n", "X_pcp['2000':'2001'].plot(ax=ax, label='Training', alpha=0.7)\n", "ax.set_title('Precipitation Time Series (2000-2001)')\n", "ax.set_ylabel('Precipitation (mm)')\n", "ax.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "f26b72c8", "metadata": {}, "source": [ "### Monthly CDFs for Precipitation\n", "\n", "Precipitation patterns often vary significantly by month, so let's examine monthly CDFs." ] }, { "cell_type": "code", "execution_count": null, "id": "9c442339", "metadata": {}, "outputs": [], "source": [ "fig, axes = plot_cdf_by_month(Training=X_pcp, Observed=y_pcp, Downscaled=pcp_downscaled)\n", "fig.suptitle('Precipitation CDFs by Month', fontsize=14)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "27241f2c", "metadata": {}, "source": [ "## Summary\n", "\n", "In this tutorial, we demonstrated how to:\n", "\n", "1. ✅ Load and prepare climate data from cloud storage\n", "2. ✅ Apply BCSD temperature downscaling using quantile mapping\n", "3. ✅ Apply BCSD precipitation downscaling with special handling for non-negative values\n", "4. ✅ Evaluate results using CDFs and time series plots\n", "\n", "### Key Takeaways\n", "\n", "- **Temperature**: BCSD corrects biases using additive quantile mapping\n", "- **Precipitation**: BCSD uses multiplicative adjustments due to its non-Gaussian nature\n", "- **Evaluation**: CDFs are effective for comparing statistical properties across datasets\n", "- **Monthly analysis**: Examining results by month reveals seasonal performance\n", "\n", "### Next Steps\n", "\n", "- Try applying BCSD to multiple spatial points using `PointWiseDownscaler`\n", "- Explore other downscaling methods like GARD or Analog methods\n", "- Compare different methods on the same dataset\n", "- Apply downscaling to your own climate data\n", "\n", "### References\n", "\n", "Wood, A. W., Leung, L. R., Sridhar, V., & Lettenmaier, D. P. (2004). Hydrological implications of dynamical and statistical approaches to downscaling climate model outputs. *Climatic Change*, 62(1-3), 189-216." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }