BCSD: Bias Correction and Spatial Disaggregation

Setup

First, let’s import the necessary libraries.

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
import xarray as xr

from skdownscale.pointwise_models import BcsdPrecipitation, BcsdTemperature

Helper Functions for Visualization

We’ll define helper functions to visualize cumulative distribution functions (CDFs), which are useful for comparing the statistical properties of our input and downscaled data.

def plot_cdf(ax=None, **kwargs):
    """Plot cumulative distribution function for multiple datasets.

    Parameters
    ----------
    ax : matplotlib axis, optional
        Axis to plot on
    **kwargs : dict
        Datasets to plot, where key is the label and value is the data
    """
    if ax:
        plt.sca(ax)
    else:
        ax = plt.gca()

    for label, X in kwargs.items():
        vals = np.sort(X, axis=0)
        pp = scipy.stats.mstats.plotting_positions(vals)
        ax.plot(pp, vals, label=label)
    ax.legend()
    ax.set_xlabel('Cumulative Probability')
    ax.set_ylabel('Value')
    return ax


def plot_cdf_by_month(**kwargs):
    """Plot CDFs separately for each month of the year.

    Parameters
    ----------
    **kwargs : dict
        Datasets to plot, where key is the label and value is the data
    """
    fig, axes = plt.subplots(4, 3, sharex=True, sharey=False, figsize=(12, 8))
    fig.suptitle('CDFs by Month', fontsize=14)

    for label, X in kwargs.items():
        for month, ax in zip(range(1, 13), axes.flat):
            vals = np.sort(X[X.index.month == month], axis=0)
            pp = scipy.stats.mstats.plotting_positions(vals)
            ax.plot(pp, vals, label=label)
            ax.set_title(f'Month {month}')

    # Add legend to last subplot
    axes.flat[-1].legend()
    fig.tight_layout()
    return fig, axes

Step 1: Load Climate Data

We’ll load sample climate data that includes:

  • Training data: Coarse-resolution climate model output (predictor)

  • Target data: High-resolution observations (predictand)

The data is stored in a cloud-optimized Zarr format for efficient access.

# Define the training period
training_time_slice = slice('1980', '2001')

# Load the data from cloud storage
data = xr.open_datatree(
    's3://carbonplan/share/scikit-downscale/test-data.zarr',
    engine='zarr',
    chunks={},
    storage_options={'anon': True, 'endpoint_url': 'https://rice1.osn.mghpcc.org'},
)

# Extract training and target datasets
training = data['training'].to_dataset().sel(time=training_time_slice)
targets = data['targets'].to_dataset().sel(time=training_time_slice)

print('Training data:')
display(training)
print('\nTarget data:')
display(targets)
Training data:
<xarray.Dataset> Size: 2MB
Dimensions:      (point: 5, time: 8036)
Coordinates:
  * time         (time) datetime64[ns] 64kB 1980-01-01T11:30:00 ... 2001-12-3...
    lat          (point) float32 20B dask.array<chunksize=(5,), meta=np.ndarray>
    lon          (point) float32 20B dask.array<chunksize=(5,), meta=np.ndarray>
Dimensions without coordinates: point
Data variables: (12/15)
    DIV          (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    PREC_ACC_C   (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    PREC_ACC_NC  (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    PREC_TOT     (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    PSFC         (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    QVAPOR       (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    ...           ...
    T2min        (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    T_MEAN       (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    T_RANGE      (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    U            (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    V            (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
    W            (point, time) float32 161kB dask.array<chunksize=(3, 8036), meta=np.ndarray>
Attributes:
    NCO:                        "4.5.5"
    history:                    Wed Mar  1 13:48:35 2017: ncatted -a calendar...
    history_of_appended_files:  Wed Feb  8 14:15:52 2017: Appended file wrf_d...
    nco_openmp_thread_number:   1
Target data:
<xarray.Dataset> Size: 707kB
Dimensions:  (time: 8036, point: 5)
Coordinates:
  * time     (time) datetime64[ns] 64kB 1980-01-01 1980-01-02 ... 2001-12-31
    lat      (point) float64 40B dask.array<chunksize=(5,), meta=np.ndarray>
    lon      (point) float64 40B dask.array<chunksize=(5,), meta=np.ndarray>
Dimensions without coordinates: point
Data variables:
    Prec     (time, point) float32 161kB dask.array<chunksize=(731, 5), meta=np.ndarray>
    Tmax     (time, point) float32 161kB dask.array<chunksize=(731, 5), meta=np.ndarray>
    Tmin     (time, point) float32 161kB dask.array<chunksize=(731, 5), meta=np.ndarray>
    wind     (time, point) float32 161kB dask.array<chunksize=(731, 5), meta=np.ndarray>
Attributes:
    CDI:                       Climate Data Interface version 1.6.4 (http://c...
    CDO:                       Climate Data Operators version 1.6.4 (http://c...
    Conventions:               CF-1.4
    NCO:                       4.4.5
    history:                   Fri Oct 10 17:54:37 2014: cdo ifthenelse /Volu...
    nco_openmp_thread_number:  1

Step 2: Prepare Data for a Single Location

For this tutorial, we’ll focus on a single spatial point. We’ll extract and prepare both temperature and precipitation data.

Temperature Data

  • Convert from Kelvin to Celsius

  • Resample to monthly means

# Extract temperature data for point 0
# Training: daily maximum temperature from climate model
X_temp = (
    training.isel(point=0)
    .to_dataframe()[['T2max']]
    .resample('MS')  # Monthly start frequency
    .mean()
    - 273.15  # Convert Kelvin to Celsius
)

# Target: observed daily maximum temperature
y_temp = targets.isel(point=0).to_dataframe()[['Tmax']].resample('MS').mean()

print('Training temperature (first 5 months):')
display(X_temp.head())
print('\nTarget temperature (first 5 months):')
display(y_temp.head())
Training temperature (first 5 months):
T2max
time
1980-01-01 3.291565
1980-02-01 9.805084
1980-03-01 8.460297
1980-04-01 17.106628
1980-05-01 19.035583
Target temperature (first 5 months):
Tmax
time
1980-01-01 3.545161
1980-02-01 8.900001
1980-03-01 9.139677
1980-04-01 16.180000
1980-05-01 16.735161

Precipitation Data

  • Convert units to mm/day by multiplying by 24

  • Resample to monthly totals

# Extract precipitation data for point 0
# Training: total precipitation from climate model
X_pcp = (
    training.isel(point=0).to_dataframe()[['PREC_TOT']].resample('MS').sum()
    * 24  # Convert to mm/day equivalent
)

# Target: observed precipitation
y_pcp = targets.isel(point=0).to_dataframe()[['Prec']].resample('MS').sum()

print('Training precipitation (first 5 months):')
display(X_pcp.head())
print('\nTarget precipitation (first 5 months):')
display(y_pcp.head())
Training precipitation (first 5 months):
PREC_TOT
time
1980-01-01 93.716293
1980-02-01 128.091492
1980-03-01 136.653229
1980-04-01 70.510910
1980-05-01 103.933807
Target precipitation (first 5 months):
Prec
time
1980-01-01 149.882751
1980-02-01 177.640610
1980-03-01 187.000565
1980-04-01 124.414742
1980-05-01 89.521523

Step 3: Downscale Temperature with BCSD

The BCSD temperature model uses quantile mapping to correct biases in the climate model output.

How it works:

  1. Fit: Learn the relationship between model and observed quantiles

  2. Predict: Apply the correction to new data

  3. Adjust: Add the bias-corrected anomaly back to the original data

# Initialize the BCSD temperature model
bcsd_temp = BcsdTemperature()

# Fit the model using training data
bcsd_temp.fit(X_temp, y_temp)

# Generate downscaled predictions
# Note: We add back X_temp because the model predicts anomalies
temp_downscaled = bcsd_temp.predict(X_temp) + X_temp

print('Downscaled temperature (first 5 months):')
display(temp_downscaled.head())
Downscaled temperature (first 5 months):
/home/docs/checkouts/readthedocs.org/user_builds/scikit-downscale/envs/latest/lib/python3.13/site-packages/sklearn/utils/validation.py:1406: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
T2max
time
1980-01-01 1.245200
1980-02-01 10.878950
1980-03-01 6.240979
1980-04-01 18.493428
1980-05-01 18.662377

Evaluate Temperature Downscaling

Let’s visualize how well the downscaling corrects the model bias by comparing CDFs.

# Plot overall CDF
fig, ax = plt.subplots(figsize=(10, 6))
plot_cdf(
    ax=ax,
    Training=X_temp.values.flatten(),
    Observed=y_temp.values.flatten(),
    Downscaled=temp_downscaled.values.flatten(),
)
ax.set_title('Temperature CDF Comparison')
ax.set_ylabel('Temperature (°C)')
plt.show()

# Plot time series for a sample period
fig, ax = plt.subplots(figsize=(12, 5))
temp_downscaled['2000':'2001'].plot(ax=ax, label='Downscaled')
y_temp['2000':'2001'].plot(ax=ax, label='Observed')
X_temp['2000':'2001'].plot(ax=ax, label='Training', alpha=0.7)
ax.set_title('Temperature Time Series (2000-2001)')
ax.set_ylabel('Temperature (°C)')
ax.legend()
plt.show()
../_images/67e92887351f72ec696b5653399f9db99cc2d57bcee34e67fa035e5f251af199.png ../_images/365a05a339c4c9517499551cdef2ffd3378d43e52da38c0daf456a108d407166.png

Monthly CDFs for Temperature

Since climate statistics vary by season, let’s examine the CDFs for each month separately.

fig, axes = plot_cdf_by_month(Training=X_temp, Observed=y_temp, Downscaled=temp_downscaled)
fig.suptitle('Temperature CDFs by Month', fontsize=14)
plt.show()
../_images/c58acc9aab4afe97bc24f894b2e6b484f9bd29db3342dad6136186a08cc73fee.png

Step 4: Downscale Precipitation with BCSD

Precipitation requires special handling due to its non-Gaussian distribution and the presence of zero values.

Key differences from temperature:

  • Uses multiplicative (rather than additive) adjustments

  • Handles zero-precipitation days separately

  • Uses specialized quantile mapping for non-negative values

# Initialize the BCSD precipitation model
bcsd_pcp = BcsdPrecipitation()

# Fit the model using training data
bcsd_pcp.fit(X_pcp, y_pcp)

# Generate downscaled predictions
# Note: We multiply by X_pcp because the model predicts a scaling factor
pcp_downscaled = bcsd_pcp.predict(X_pcp) * X_pcp

print('Downscaled precipitation (first 5 months):')
display(pcp_downscaled.head())
Downscaled precipitation (first 5 months):
/home/docs/checkouts/readthedocs.org/user_builds/scikit-downscale/envs/latest/lib/python3.13/site-packages/sklearn/utils/validation.py:1406: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
PREC_TOT
time
1980-01-01 46.673401
1980-02-01 114.921283
1980-03-01 161.957216
1980-04-01 55.210280
1980-05-01 154.597319

Evaluate Precipitation Downscaling

Let’s compare the statistical properties of our downscaled precipitation with observations.

# Plot overall CDF
fig, ax = plt.subplots(figsize=(10, 6))
plot_cdf(
    ax=ax,
    Training=X_pcp.values.flatten(),
    Observed=y_pcp.values.flatten(),
    Downscaled=pcp_downscaled.values.flatten(),
)
ax.set_title('Precipitation CDF Comparison')
ax.set_ylabel('Precipitation (mm)')
plt.show()

# Plot time series for a sample period
fig, ax = plt.subplots(figsize=(12, 5))
pcp_downscaled['2000':'2001'].plot(ax=ax, label='Downscaled')
y_pcp['2000':'2001'].plot(ax=ax, label='Observed')
X_pcp['2000':'2001'].plot(ax=ax, label='Training', alpha=0.7)
ax.set_title('Precipitation Time Series (2000-2001)')
ax.set_ylabel('Precipitation (mm)')
ax.legend()
plt.show()
../_images/d736145d4c6092dfaed4d591dcc3014d2ee063d58d6d2c410057abf2ebc36f3e.png ../_images/db598dfbdfe5123f726f7beaff66b7c003e4c1ced0ef938f1518419ebaabb884.png

Monthly CDFs for Precipitation

Precipitation patterns often vary significantly by month, so let’s examine monthly CDFs.

fig, axes = plot_cdf_by_month(Training=X_pcp, Observed=y_pcp, Downscaled=pcp_downscaled)
fig.suptitle('Precipitation CDFs by Month', fontsize=14)
plt.show()
../_images/1a837128dc4620615d4a5a1a5050035d4305b7d8868642aab340c1be160048e3.png

Summary

In this tutorial, we demonstrated how to:

  1. ✅ Load and prepare climate data from cloud storage

  2. ✅ Apply BCSD temperature downscaling using quantile mapping

  3. ✅ Apply BCSD precipitation downscaling with special handling for non-negative values

  4. ✅ Evaluate results using CDFs and time series plots

Key Takeaways

  • Temperature: BCSD corrects biases using additive quantile mapping

  • Precipitation: BCSD uses multiplicative adjustments due to its non-Gaussian nature

  • Evaluation: CDFs are effective for comparing statistical properties across datasets

  • Monthly analysis: Examining results by month reveals seasonal performance

Next Steps

  • Try applying BCSD to multiple spatial points using PointWiseDownscaler

  • Explore other downscaling methods like GARD or Analog methods

  • Compare different methods on the same dataset

  • Apply downscaling to your own climate data

References

Wood, A. W., Leung, L. R., Sridhar, V., & Lettenmaier, D. P. (2004). Hydrological implications of dynamical and statistical approaches to downscaling climate model outputs. Climatic Change, 62(1-3), 189-216.