ClusterSC: Synthetic Control with Donor Selection (Rho et al. 2025)

Estimator:

Cluster Synthetic Controls (CLUSTERSC), mlsynth.CLUSTERSC

Source:

Rho, S., Tang, A., Bergam, N., Cummings, R., & Misra, V. (2025), “ClusterSC: Advancing Synthetic Control with Donor Selection,” arXiv:2503.21629.

Replication type:

Path B — the paper’s synthetic Monte Carlo (Section 6.1), exercising both estimator modes, with a cross-validation of the authors’ reference code against their own headline.

Status:

Verified — mlsynth reproduces the paper’s central claim in the high-dimensional-subgroup regime; the authors’ code reproduces its ~50% headline on its own DGP.

Two estimators for the price of one

mlsynth’s single CLUSTERSC estimator covers both halves of the paper:

  • clustering=False is plain RSC / PCR-SC (Amjad-Shah-Shen 2018: HSVT-denoise the donor pool to rank r, then OLS) — the paper’s benchmark.

  • clustering=True is ClusterSC — k-means the donors in SVD-feature space (Algorithm 3), keep the target’s subgroup, then run RSC on it (Algorithm 4).

So one benchmark validates both the baseline and the contribution.
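The two modes above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not mlsynth's implementation: the function names, the hand-rolled k-means with farthest-point initialisation, the choice to stack the target with the donors before computing SVD features, and the toy two-group DGP with distinct levels are all hypothetical.

```python
import numpy as np

def hsvt(X, rank):
    """Hard singular-value thresholding: keep the top `rank` singular triplets."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def rsc_predict(y_pre, donors_pre, donors_post, rank):
    """Whole-pool path: denoise pre-period donors (T0 x n), OLS without intercept."""
    w = np.linalg.pinv(hsvt(donors_pre, rank)) @ y_pre
    return donors_post @ w

def svd_features(units_by_time, rank):
    """Rank-`rank` features U * Sigma, one row per unit."""
    U, s, _ = np.linalg.svd(units_by_time, full_matrices=False)
    return U[:, :rank] * s[:rank]

def kmeans_labels(F, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point start from row 0."""
    centers = [F[0]]
    for _ in range(1, k):
        d = np.min([np.sum((F - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = F[labels == j].mean(axis=0)
    return labels

def clustersc_predict(y_pre, donors_pre, donors_post, n_clusters, feature_rank, fit_rank):
    """Clustered path: keep only donors that share the target's cluster."""
    stacked = np.vstack([y_pre, donors_pre.T])           # target is row 0 (assumption)
    labels = kmeans_labels(svd_features(stacked, feature_rank), n_clusters)
    keep = np.flatnonzero(labels[1:] == labels[0])
    pred = rsc_predict(y_pre, donors_pre[:, keep], donors_post[:, keep], fit_rank)
    return pred, keep

# Toy placebo panel: two groups with different levels, each of rank 1.
rng = np.random.default_rng(0)
T0, T1, n_per = 20, 10, 15
t = np.arange(T0 + T1)
base = np.vstack([5 + np.sin(t / 3), -5 + np.cos(t / 2)])
group = np.repeat([0, 1], n_per)
scale = rng.uniform(0.8, 1.2, size=group.size)
donors = (scale[:, None] * base[group]).T
donors = donors + 0.05 * rng.standard_normal(donors.shape)
y_true = base[0]                                          # target belongs to group 0
y_pre = y_true[:T0] + 0.05 * rng.standard_normal(T0)

pred_c, keep = clustersc_predict(y_pre, donors[:T0], donors[T0:], 2, 2, 1)
pred_w = rsc_predict(y_pre, donors[:T0], donors[T0:], 2)
mse_c = float(np.mean((pred_c - y_true[T0:]) ** 2))
mse_w = float(np.mean((pred_w - y_true[T0:]) ** 2))
```

The toy only shows the mechanics of the two paths. With two rank-1 groups the whole-pool fit already captures the structure, so no gain from clustering should be read into `mse_c` versus `mse_w` here.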

The regime that matters

The paper motivates ClusterSC by the curse of dimensionality: with many donors, the whole-pool regression lives in a high-dimensional, noisy space. mlsynth’s RSC, however, denoises the pre-period donor matrix to a fixed rank before regressing, so it is already robust to the raw donor count n. On the paper’s own two-subgroup DGP (rank 3 + 3), clustering therefore buys mlsynth almost nothing, because its rank-6 whole-pool fit already captures the structure.

The lever that genuinely exercises the paper’s argument is the pooled signal rank. With K well-separated subgroups of rank r, the pooled donor matrix has rank K·r; once K·r exceeds the pre-period length T0 the whole-pool fit must under-denoise, while each subgroup stays low-rank and well-conditioned. We use K=6 subgroups of rank 3 with T0=8 (pooled rank 18 ≫ 8). In this regime mlsynth’s ClusterSC clearly beats its whole-pool RSC at every noise level (placebo DGP, true effect 0, so post-period MSE is pure prediction error):

Median post-period test MSE (mlsynth, 25 placebo targets, seed 0)

| Noise σ | RSC (whole pool) | ClusterSC | MSE reduction |
|---------|------------------|-----------|---------------|
| 0.10    | 0.0299           | 0.0117    | 60.8%         |
| 0.25    | 0.1161           | 0.0659    | 43.2%         |
| 0.40    | 0.2495           | 0.1888    | 24.3%         |
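The rank argument behind this regime can be checked numerically. The sketch below is an illustration only: it uses Gaussian factors rather than the authors' sine mixture, and the donor count per subgroup (`n_per`) is a hypothetical choice. It builds the noise-free pre-period donor matrix for K=6 subgroups of rank 3 with T0=8 and compares the rank of one subgroup with the rank of the pooled matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
K, r, T0, n_per = 6, 3, 8, 40   # n_per is a hypothetical donor count per subgroup

blocks = []
for _ in range(K):
    factors = rng.standard_normal((T0, r))       # subgroup-specific time factors
    loadings = rng.standard_normal((r, n_per))   # donor loadings on those factors
    blocks.append(factors @ loadings)            # noise-free (T0, n_per) block of rank r

pooled = np.hstack(blocks)                       # (T0, K * n_per) whole-pool matrix

sub_rank = int(np.linalg.matrix_rank(blocks[0]))     # 3
pooled_rank = int(np.linalg.matrix_rank(pooled))     # 8: capped at T0, not K * r = 18
```

Because the pooled signal already fills all T0 pre-period dimensions, no truncation rank below T0 separates signal from noise in the whole pool, while each subgroup still leaves T0 - r directions that are pure noise.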

The durable check is benchmarks/cases/clustersc_subgroups.py:

python benchmarks/run_benchmarks.py --case clustersc_subgroups

It asserts clustering_wins_all == 1 (ClusterSC beats RSC at every noise level) plus a positive floor on the smallest gain. The DGP is a faithful K-subgroup generalisation of the authors’ two-subgroup sine mixture, in mlsynth.utils.clustersc_helpers.simulation.simulate_subgroup_panel().
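As a rough sketch of what such a check amounts to, the snippet below recomputes the reductions from the rounded cells of the table above and evaluates the two conditions. The variable names and the floor value are assumptions for illustration; the actual benchmark case may compute and threshold these differently.

```python
# Median post-period test MSE, copied from the table above (sigma -> MSE).
rsc_mse = {0.10: 0.0299, 0.25: 0.1161, 0.40: 0.2495}
clustersc_mse = {0.10: 0.0117, 0.25: 0.0659, 0.40: 0.1888}

MIN_GAIN_FLOOR = 0.0   # hypothetical floor; the benchmark's real value is not stated here

# Fractional MSE reduction of ClusterSC relative to whole-pool RSC.
reduction = {s: 1.0 - clustersc_mse[s] / rsc_mse[s] for s in rsc_mse}

# 1 if ClusterSC has the lower median MSE at every noise level, else 0.
clustering_wins_all = int(all(clustersc_mse[s] < rsc_mse[s] for s in rsc_mse))
min_gain = min(reduction.values())

check_passes = clustering_wins_all == 1 and min_gain > MIN_GAIN_FLOOR
```

Reductions recomputed from rounded cells can differ from the published column in the last digit, which is one reason to assert a sign and a floor rather than exact percentages.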

Cross-validation against the authors’ code

mlsynth and the authors’ reference implementation (srho1/ClusterSC) make different, individually valid modelling choices:

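| Step                | mlsynth (paper Algorithm 3/4)                  | reference code                                      |
|---------------------|------------------------------------------------|-----------------------------------------------------|
| Clustering features | rank-r truncated UΣ_r of the pre-period donors | full, untruncated SVD features of the whole panel   |
| Denoising           | pre-period only (no post leakage)              | full panel                                          |
| Weight fit          | OLS via pseudo-inverse, no intercept           | OLS with intercept                                  |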

Because of this, the two are strongest in different regimes — clustering helps mlsynth where the pooled rank exceeds T0, and helps the reference on its own two-subgroup DGP (where its full-panel-plus-intercept RSC baseline is weaker). A per-target numeric cross-validation between the two implementations is therefore not meaningful, and we do not assert one.
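To make the weight-fit row concrete, here is a small sketch of the two fitting conventions on the same data. The function names and the toy data are hypothetical, and neither function is taken from mlsynth or the reference repository.

```python
import numpy as np

def weights_no_intercept(y_pre, donors_pre):
    """Minimum-norm least squares via the pseudo-inverse, no intercept."""
    return np.linalg.pinv(donors_pre) @ y_pre

def weights_with_intercept(y_pre, donors_pre):
    """Ordinary least squares with a constant column prepended."""
    design = np.column_stack([np.ones(len(y_pre)), donors_pre])
    coef, *_ = np.linalg.lstsq(design, y_pre, rcond=None)
    return coef[0], coef[1:]

# Toy pre-period: 30 periods, 3 donors, target offset by a constant level of 2.
rng = np.random.default_rng(0)
donors_pre = rng.standard_normal((30, 3))
w_true = np.array([0.5, 0.3, 0.2])
y_pre = 2.0 + donors_pre @ w_true

w_plain = weights_no_intercept(y_pre, donors_pre)
b0, w_icpt = weights_with_intercept(y_pre, donors_pre)

resid_plain = float(np.linalg.norm(y_pre - donors_pre @ w_plain))
resid_icpt = float(np.linalg.norm(y_pre - b0 - donors_pre @ w_icpt))
```

The toy target contains a level shift by construction, so it says nothing about which convention is preferable on a real panel. It only shows that the two fits return different weights and residuals on identical inputs, which is why per-target numbers from the two implementations are not directly comparable.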

A clean cross-check, by contrast, is the authors’ code against the authors’ paper: benchmarks/cases/clustersc_subgroups_ref.py clones the reference repo (pinned commit, MIT-licensed, imported not vendored) and runs its RSC / ClusterSC on its own generate_sine_dataset_A / _B DGP, confirming the paper’s headline that clustering substantially lowers test MSE (single seed, 30 targets):

Median test-MSE reduction, authors’ code on authors’ DGP

| Noise σ | MSE reduction |
|---------|---------------|
| 0.10    | 38.8%         |
| 0.25    | 71.3%         |
| 0.40    | 70.6%         |

(The paper reports ~50% at n=1000 over 500 reps; the single-seed median over 30 targets is noisier but unambiguously large and positive at every noise level.)

Run it with:

pip install kneed scikit-learn
python benchmarks/run_benchmarks.py --case clustersc_subgroups_ref

It skips gracefully when the repo cannot be cloned or syclib / kneed are unavailable.

RSC pre/post test error (Amjad-Shah-Shen 2018)

The PCR-SC path is also benchmarked against the Robust Synthetic Control paper (Amjad, Shah & Shen 2018, JMLR 19:1-51) that underpins it. Section 5.3, Table 1 reports that on a low-rank latent-variable panel the pre-intervention MSE (training error) approximates the post-intervention MSE (generalization error) – so the in-sample pre-fit honestly predicts out-of-sample forecast accuracy. Both errors are taken against the true (noise-free) mean, which the DGP exposes; mlsynth’s RSC (CLUSTERSC with clustering=False) reproduces this at every noise level:

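PCR-SC error vs the true mean (N=100, T=2000, T0=1600, rank 3)

| Noise σ | Training (pre) MSE | Generalization (post) MSE | gen / train |
|---------|--------------------|---------------------------|-------------|
| 3.1     | 0.176              | 0.202                     | 1.15        |
| 1.3     | 0.041              | 0.044                     | 1.08        |
| 0.4     | 0.0043             | 0.0044                    | 1.03        |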

The ratio sits just above 1 throughout, matching the paper’s “training error approximates generalization error” finding; the absolute magnitudes depend on the (paper-underspecified) truncation rank, so the durable check (benchmarks/cases/rsc_synth_error.py) pins the ratio and noise-monotonicity rather than Table 1’s exact cells. The DGP is mlsynth.utils.clustersc_helpers.simulation.simulate_rsc_panel().
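A ratio-and-monotonicity check of this kind might look like the sketch below, which works directly on the rounded cells of the table above. The tolerance band and variable names are hypothetical, not the values used in the benchmark case.

```python
# Cells copied from the table above: sigma -> (training MSE, generalization MSE).
table = {3.1: (0.176, 0.202), 1.3: (0.041, 0.044), 0.4: (0.0043, 0.0044)}

RATIO_BAND = (0.8, 1.5)   # hypothetical tolerance on gen / train

# Generalization-to-training ratio at each noise level.
ratios = {sigma: post / pre for sigma, (pre, post) in table.items()}
ratio_ok = all(RATIO_BAND[0] <= value <= RATIO_BAND[1] for value in ratios.values())

# Both errors should grow as the noise level grows.
sigmas = sorted(table)
pre_errors = [table[s][0] for s in sigmas]
post_errors = [table[s][1] for s in sigmas]
monotone = (
    all(a < b for a, b in zip(pre_errors, pre_errors[1:]))
    and all(a < b for a, b in zip(post_errors, post_errors[1:]))
)
```

Checking a band and an ordering, rather than exact cells, keeps the test stable when the truncation rank or the random seed shifts the absolute magnitudes.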

Confidence-interval coverage (Shen et al.)

mlsynth’s frequentist PCR-SC confidence intervals port the variance estimators of Shen et al.’s Same Root Different Leaves (the deshen24/panel-data-regressions reference). Two cross-checks in benchmarks/cases/rsc_shen_coverage.py:

  • Variance cross-validation. On identical resampled inputs, mlsynth’s _var_homo and _var_jack match the reference var.py to machine precision (max \(|\Delta| \approx 4\times 10^{-16}\) for _var_homo, exactly 0 for _var_jack).

  • Coverage validity. Reproducing the repo’s simulation.py Monte Carlo (calibrated to Prop 99), the doubly-robust variance is approximately valid for all three estimands, while a single-source variance under-covers the estimand it is not built for:

    95%-CI coverage (seed 0, 500 reps)

    | Variance           | μ_hz | μ_vt | μ_dr |
    |--------------------|------|------|------|
    | doubly robust (DR) | 0.95 | 1.00 | 0.92 |
    | vertical only (VT) | 0.63 | 0.93 | 0.58 |

    The DR variance keeps coverage near nominal everywhere; the vertical-only variance covers the horizontal estimand just 63% of the time – the paper’s motivation for the doubly-robust construction. Run with python benchmarks/run_benchmarks.py --case rsc_shen_coverage (skips if the repo cannot be cloned).
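Coverage in a table like this is simply the share of Monte Carlo replications whose interval contains the true estimand. The sketch below shows that computation on a deliberately simple toy (the mean of i.i.d. normal draws) and how an understated variance produces under-coverage. It does not use Shen et al.'s estimators or the Prop 99 calibration; every name and number in it is an assumption for illustration.

```python
import numpy as np

def coverage(estimates, variances, truth, z=1.96):
    """Share of replications whose normal-approximation CI contains `truth`."""
    half_width = z * np.sqrt(variances)
    return float(np.mean(np.abs(estimates - truth) <= half_width))

rng = np.random.default_rng(0)
reps, n, truth = 2000, 50, 1.0

# Each row is one replication: n noisy observations of the same mean.
samples = truth + rng.standard_normal((reps, n))
estimates = samples.mean(axis=1)

var_ok = samples.var(axis=1, ddof=1) / n   # standard variance of the sample mean
var_small = 0.25 * var_ok                  # deliberately understated variance

cov_ok = coverage(estimates, var_ok, truth)
cov_small = coverage(estimates, var_small, truth)
```

With an appropriate variance the coverage lands close to the nominal 95%, while the understated variance halves the interval width and covers far less often. This is the same qualitative pattern as a single-source variance applied to an estimand it was not built for.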

RPCA-SC: West German reunification (Bayani 2021)

The sections above all exercise the PCR-SC / RSC family. CLUSTERSC’s other family – RPCA-SC (robust \(L+S\) low-rank donor denoising, Bayani 2021) – is pinned on the canonical German-reunification application. With the PCP robust-PCA denoiser, CLUSTERSC(method="rpca", rpca_method="PCP") reproduces Bayani’s reference design on basedata/german_reunification.csv:

RPCA-SC synthetic West Germany (PCP)

| Quantity                  | mlsynth  | Bayani (2021)  |
|---------------------------|----------|----------------|
| Norway weight             | 0.485    | ~0.48          |
| France weight             | 0.354    | ~0.35          |
| New Zealand weight        | 0.296    | ~0.30          |
| Pre-fit RMSE              | 88.6     | ~90            |
| Effect on West German GDP | negative | negative (ADH) |
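For readers unfamiliar with the \(L+S\) decomposition, the sketch below implements generic principal component pursuit with a fixed-penalty augmented Lagrangian loop, using the common defaults \(\lambda = 1/\sqrt{\max(m, n)}\) and \(\mu = mn / (4\|M\|_1)\). It is a textbook-style illustration on synthetic data, not mlsynth's rpca_method="PCP" code and not Bayani's design; the function names and the toy matrix are assumptions.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular-value soft-thresholding."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def pcp(M, max_iter=1000, tol=1e-7):
    """Split M into low-rank L plus sparse S by alternating proximal steps."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = m * n / (4.0 * np.abs(M).sum())
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                          # dual variable
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)         # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)      # sparse update
        resid = M - L - S
        Y = Y + mu * resid                        # dual ascent step
        if np.linalg.norm(resid) <= tol * np.linalg.norm(M):
            break
    return L, S

# Toy check: rank-2 matrix corrupted by about 5% large sparse outliers.
rng = np.random.default_rng(0)
n = 60
L_true = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
S_true = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05
S_true[mask] = 10.0 * rng.choice([-1.0, 1.0], size=int(mask.sum()))

L_hat, S_hat = pcp(L_true + S_true)
rel_err = float(np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true))
```

In a synthetic-control setting the recovered low-rank part plays the role of the denoised donor matrix, with the sparse part absorbing idiosyncratic shocks before the weights are fit.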

The durable check is benchmarks/cases/clustersc_rpca_germany.py:

python benchmarks/run_benchmarks.py --case clustersc_rpca_germany

References

Shen, D., Ding, P., Sekhon, J., & Yu, B. (2023). “Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data.” Econometrica 91(6):2125-2154.

Rho, S., Tang, A., Bergam, N., Cummings, R., & Misra, V. (2025). “ClusterSC: Advancing Synthetic Control with Donor Selection.” arXiv:2503.21629.

Amjad, M., Shah, D., & Shen, D. (2018). “Robust Synthetic Control.” Journal of Machine Learning Research 19(22):1-51.

Bayani, M. (2021). “Robust PCA Synthetic Control.”