ClusterSC — Synthetic Control with Donor Selection (Rho et al. 2025)#

Estimator:: Cluster Synthetic Controls (CLUSTERSC) — mlsynth.CLUSTERSC
Source:: Rho, S., Tang, A., Bergam, N., Cummings, R., & Misra, V. (2025), “ClusterSC: Advancing Synthetic Control with Donor Selection,” arXiv:2503.21629.
Replication type:: Path B — the paper’s synthetic Monte Carlo (Section 6.1), exercising both estimator modes, with a cross-validation of the authors’ reference code against their own headline.
Status:: Verified — mlsynth reproduces the paper’s central claim in the high-dimensional-subgroup regime; the authors’ code reproduces its ~50% headline on its own DGP.

Two estimators for the price of one#

mlsynth’s single CLUSTERSC estimator covers both halves of the paper:

clustering=False is plain RSC / PCR-SC (Amjad-Shah-Shen 2018: HSVT-denoise the donor pool to rank r, then OLS) — the paper’s benchmark.
clustering=True is ClusterSC — k-means the donors in SVD-feature space (Algorithm 3), keep the target’s subgroup, then run RSC on it (Algorithm 4).

So one benchmark validates both the baseline and the contribution.
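As a rough illustration of how the two modes relate, the NumPy sketch below implements an RSC-style fit (rank truncation followed by minimum-norm least squares without an intercept) and wraps it in a k-means donor-selection step. It is not mlsynth's implementation. The function names, the farthest-point k-means start, the nearest-centroid rule for placing the target, the whole-pool fallback, and the separate `feature_rank` / `fit_rank` arguments are all assumptions made for the sketch.

```python
import numpy as np


def rsc_predict(donors_pre, donors_post, target_pre, rank):
    """RSC / PCR-SC sketch: rank-`rank` truncation of the pre-period donors,
    then minimum-norm least squares (no intercept) of the target on them."""
    U, s, Vt = np.linalg.svd(donors_pre, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    # Pseudo-inverse of the truncated (denoised) matrix applied to the target.
    weights = U @ ((Vt @ target_pre) / s)          # shape (n_donors,)
    return weights @ donors_post                   # shape (T1,)


def nearest(points, centers):
    """Index of the closest centre for each row of `points`."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans_centers(features, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-point start (sketch choice)."""
    centers = [features[0]]
    for _ in range(1, k):
        d2 = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = nearest(features, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers


def clustersc_predict(donors_pre, donors_post, target_pre, k, feature_rank, fit_rank):
    """ClusterSC sketch: cluster donors on rank-truncated U*Sigma features,
    keep the cluster nearest to the target, then run RSC on that subgroup."""
    U, s, Vt = np.linalg.svd(donors_pre, full_matrices=False)
    feats = U[:, :feature_rank] * s[:feature_rank]
    centers = kmeans_centers(feats, k)
    labels = nearest(feats, centers)
    # Assumption: the target is projected onto the same right singular vectors.
    target_feat = target_pre @ Vt[:feature_rank].T
    keep = labels == nearest(target_feat[None, :], centers)[0]
    if not keep.any():          # sketch-only fallback: use the whole pool
        keep[:] = True
    pred = rsc_predict(donors_pre[keep], donors_post[keep], target_pre, fit_rank)
    return pred, keep


# Toy placebo panel: two subgroups of rank 3, rows are units, columns are periods.
rng = np.random.default_rng(1)
T0, T1, m, r = 20, 10, 30, 3
F = [rng.normal(size=(r, T0 + T1)) for _ in range(2)]
L = [rng.uniform(1, 2, size=(m, r)) for _ in range(2)]
signal = np.vstack([L[g] @ F[g] for g in range(2)])
donors = signal + 0.1 * rng.normal(size=signal.shape)
target_mean = rng.uniform(1, 2, size=r) @ F[0]
target_pre = target_mean[:T0] + 0.1 * rng.normal(size=T0)

pred_pool = rsc_predict(donors[:, :T0], donors[:, T0:], target_pre, rank=2 * r)
pred_sub, keep = clustersc_predict(
    donors[:, :T0], donors[:, T0:], target_pre, k=2, feature_rank=2 * r, fit_rank=r
)
for name, pred in [("RSC", pred_pool), ("ClusterSC", pred_sub)]:
    print(name, "post-period MSE vs true mean:", np.mean((pred - target_mean[T0:]) ** 2))
print("donors kept:", int(keep.sum()), "of", len(keep))
```

The weights are computed directly from the truncated SVD rather than by calling a generic pseudo-inverse on the reconstructed matrix, which avoids inverting the near-zero singular values that the truncation leaves behind numerically.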

The regime that matters#

The paper motivates ClusterSC by the curse of dimensionality: with many donors, the whole-pool regression lives in a high-dimensional, noisy space. mlsynth’s RSC, however, denoises the pre-period donor matrix to a fixed rank before regressing, so it is already robust to the raw donor count n — and on the paper’s own two-subgroup DGP (rank 3 + 3) clustering buys mlsynth almost nothing, because its rank-6 whole-pool fit already captures the structure.

The lever that genuinely exercises the paper’s argument is the pooled signal rank. With K well-separated subgroups of rank r, the pooled donor matrix has rank K·r; once K·r exceeds the pre-period length T0 the whole-pool fit must under-denoise, while each subgroup stays low-rank and well-conditioned. We use K=6 subgroups of rank 3 with T0=8 (pooled rank 18 ≫ 8). In this regime mlsynth’s ClusterSC clearly beats its whole-pool RSC at every noise level (placebo DGP, true effect 0, so post-period MSE is pure prediction error):

Median post-period test MSE (mlsynth, 25 placebo targets, seed 0)#
Noise σ	RSC (whole pool)	ClusterSC	MSE reduction
0.10	0.0299	0.0117	60.8%
0.25	0.1161	0.0659	43.2%
0.40	0.2495	0.1888	24.3%
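The rank argument behind this table can be checked numerically. The sketch below builds a noise-free panel with K=6 subgroups of rank 3 and T0=8 pre-periods, then compares the numerical rank of each subgroup with that of the pooled donor matrix. The Gaussian loadings and factors and the 20 donors per subgroup are assumptions for illustration only; this is not the benchmark's sine-mixture DGP.

```python
import numpy as np

rng = np.random.default_rng(0)
K, r, T0, n_per_group = 6, 3, 8, 20

# Each subgroup gets its own r latent time factors, so subgroups share no structure.
groups = []
for _ in range(K):
    loadings = rng.normal(size=(n_per_group, r))   # unit-level loadings
    factors = rng.normal(size=(r, T0))             # subgroup-specific time factors
    groups.append(loadings @ factors)              # (n_per_group, T0), rank r
pooled = np.vstack(groups)                         # (K * n_per_group, T0)

subgroup_ranks = [int(np.linalg.matrix_rank(g)) for g in groups]
pooled_rank = int(np.linalg.matrix_rank(pooled))

print("signal rank K*r:", K * r)
print("subgroup ranks:", subgroup_ranks)
print("pooled pre-period rank (capped by T0):", pooled_rank)
```

The pooled pre-period matrix cannot have rank above T0, so an 18-dimensional signal is squeezed into 8 observed dimensions, while every subgroup on its own still fits comfortably within T0.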

The durable check is benchmarks/cases/clustersc_subgroups.py:

python benchmarks/run_benchmarks.py --case clustersc_subgroups

It asserts clustering_wins_all == 1 (ClusterSC beats RSC at every noise level) plus a positive floor on the smallest gain. The DGP is a faithful K-subgroup generalisation of the authors’ two-subgroup sine mixture, implemented in mlsynth.utils.clustersc_helpers.simulation.simulate_subgroup_panel().
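For readers who want to see the shape of that assertion, here is a minimal sketch that recomputes the reduction column and the two checks from the rounded table cells. The `GAIN_FLOOR` value is hypothetical and the variable names are not taken from the benchmark case. Because the inputs are rounded, a recomputed percentage can differ from the table in the last digit.

```python
# Median post-period MSE per noise level, copied from the table above.
rsc = {0.10: 0.0299, 0.25: 0.1161, 0.40: 0.2495}
clustersc = {0.10: 0.0117, 0.25: 0.0659, 0.40: 0.1888}

GAIN_FLOOR = 0.0  # hypothetical floor (percent); the real case defines its own

# Percentage reduction in MSE from clustering at each noise level.
reduction = {s: 100.0 * (1.0 - clustersc[s] / rsc[s]) for s in rsc}

# 1 if ClusterSC has the lower MSE at every noise level, else 0.
clustering_wins_all = int(all(clustersc[s] < rsc[s] for s in rsc))
min_gain = min(reduction.values())

for s in sorted(reduction):
    print(f"sigma={s:.2f}  reduction={reduction[s]:.1f}%")
print("clustering_wins_all:", clustering_wins_all)
print("smallest gain above floor:", min_gain > GAIN_FLOOR)
```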

Cross-validation against the authors’ code#

mlsynth and the authors’ reference implementation (srho1/ClusterSC) make different, individually valid modelling choices:

Step	mlsynth (paper Algorithm 3/4)	reference code
Clustering features	rank-`r` truncated `UΣ_r` of the pre-period donors	full, untruncated `UΣ` of the whole panel
Denoising	pre-period only (no post leakage)	full panel
Weight fit	OLS via pseudo-inverse, no intercept	OLS with intercept

Because of this, the two are strongest in different regimes — clustering helps mlsynth where the pooled rank exceeds T0, and helps the reference on its own two-subgroup DGP (where its full-panel-plus-intercept RSC baseline is weaker). A per-target numeric cross-validation between the two implementations is therefore not meaningful, and we do not assert one.
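The weight-fit row of the table is the easiest difference to see in isolation. The sketch below fits the same toy target two ways: least squares via the pseudo-inverse with no intercept, and least squares with an added column of ones. The data (three donors, a target with a level shift of 5) are made up for illustration, and the example shows only the mechanical difference between the two fits, not which choice performs better in either benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)
n_donors, T0 = 3, 30
donors_pre = rng.normal(size=(T0, n_donors))                  # columns are donors
target_pre = donors_pre @ np.array([0.5, 0.3, 0.2]) + 5.0     # level shift of 5

# No intercept: minimum-norm least squares via the pseudo-inverse.
w_plain = np.linalg.pinv(donors_pre) @ target_pre
resid_plain = target_pre - donors_pre @ w_plain

# With intercept: append a column of ones and solve ordinary least squares.
X = np.column_stack([np.ones(T0), donors_pre])
coef, *_ = np.linalg.lstsq(X, target_pre, rcond=None)
resid_icpt = target_pre - X @ coef

print("no-intercept pre-fit RMSE:  ", np.sqrt(np.mean(resid_plain ** 2)))
print("with-intercept pre-fit RMSE:", np.sqrt(np.mean(resid_icpt ** 2)))
print("intercept and weights:", coef)
```

Since the two fits absorb level differences differently, their baselines are not numerically comparable target by target.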

The clean cross-check is instead the authors’ code against the authors’ paper: benchmarks/cases/clustersc_subgroups_ref.py clones the reference repo (pinned commit, MIT-licensed, imported rather than vendored) and runs its RSC / ClusterSC on its own generate_sine_dataset_A / _B DGP. This confirms the paper’s headline that clustering substantially lowers test MSE (single seed, 30 targets):

Median test-MSE reduction, authors’ code on authors’ DGP#
Noise σ	MSE reduction
0.10	38.8%
0.25	71.3%
0.40	70.6%

(The paper reports ~50% at n=1000 over 500 reps; the single-seed median over 30 targets is noisier but unambiguously large and positive at every noise level.)

Run it with:

pip install kneed scikit-learn
python benchmarks/run_benchmarks.py --case clustersc_subgroups_ref

It skips gracefully when the repo cannot be cloned or syclib / kneed are unavailable.

RSC pre/post test error (Amjad-Shah-Shen 2018)#

The PCR-SC path is also benchmarked against the Robust Synthetic Control paper (Amjad, Shah & Shen 2018, JMLR 19:1-51) that underpins it. Section 5.3, Table 1 reports that on a low-rank latent-variable panel the pre-intervention MSE (training error) approximates the post-intervention MSE (generalization error) – so the in-sample pre-fit honestly predicts out-of-sample forecast accuracy. Both errors are taken against the true (noise-free) mean, which the DGP exposes; mlsynth’s RSC (CLUSTERSC with clustering=False) reproduces this at every noise level:

PCR-SC error vs the true mean (N=100, T=2000, T0=1600, rank 3)#
Noise σ	Training (pre) MSE	Generalization (post) MSE	gen / train
3.1	0.176	0.202	1.15
1.3	0.041	0.044	1.08
0.4	0.0043	0.0044	1.03

The ratio sits just above 1 throughout, matching the paper’s “training error approximates generalization error” finding; the absolute magnitudes depend on the (paper-underspecified) truncation rank, so the durable check (benchmarks/cases/rsc_synth_error.py) pins the ratio and noise-monotonicity rather than Table 1’s exact cells. The DGP is mlsynth.utils.clustersc_helpers.simulation.simulate_rsc_panel().
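A minimal sketch of what pinning the ratio and the noise-monotonicity could look like, using the rounded cells of the table above. The tolerance band `RATIO_BAND` is hypothetical and the variable names are not taken from the benchmark case. Ratios recomputed from rounded cells can differ from the table's last column by about 0.01.

```python
# (noise sigma, training MSE, generalization MSE), copied from the table above.
rows = [
    (3.1, 0.176, 0.202),
    (1.3, 0.041, 0.044),
    (0.4, 0.0043, 0.0044),
]

RATIO_BAND = (0.8, 1.5)  # hypothetical band around 1 for gen / train

# Generalization error relative to training error at each noise level.
ratios = [post / pre for _, pre, post in rows]
ratio_ok = all(RATIO_BAND[0] <= q <= RATIO_BAND[1] for q in ratios)

# Both errors should grow as the noise level grows.
by_noise = sorted(rows)
monotone = all(a[1] < b[1] and a[2] < b[2] for a, b in zip(by_noise, by_noise[1:]))

print("gen / train ratios:", [round(q, 2) for q in ratios])
print("ratio within band:", ratio_ok, "| errors increase with noise:", monotone)
```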

Confidence-interval coverage (Shen et al.)#

mlsynth’s frequentist PCR-SC confidence intervals port the variance estimators of Shen et al.’s Same Root Different Leaves (the deshen24/panel-data-regressions reference). Two cross-checks in benchmarks/cases/rsc_shen_coverage.py:

Variance cross-validation. On identical resampled inputs, mlsynth’s _var_homo and _var_jack match the reference var.py to machine precision (max \(|\Delta| \approx 4\times 10^{-16}\) and exactly 0, respectively).
Coverage validity. Reproducing the repo’s simulation.py Monte Carlo (calibrated to Prop 99), the doubly-robust variance is approximately valid for all three estimands, while a single-source variance under-covers the estimand it is not built for:

95%-CI coverage (seed 0, 500 reps)#
Variance	μ_hz	μ_vt	μ_dr
doubly robust (DR)	0.95	1.00	0.92
vertical only (VT)	0.63	0.93	0.58

The DR variance keeps coverage near nominal everywhere; the vertical-only variance covers the horizontal estimand just 63% of the time – the paper’s motivation for the doubly-robust construction. Run with python benchmarks/run_benchmarks.py --case rsc_shen_coverage (skips if the repo cannot be cloned).
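To make the coverage figures concrete, the sketch below shows how empirical coverage of a 95% interval is computed in a Monte Carlo, and how it drops when the standard error is too small for the estimand. It is a toy Gaussian example, not the Shen et al. simulation: the `coverage` helper, the 5000 replications, and the factor 0.5 applied to the standard error are all assumptions chosen for illustration.

```python
import numpy as np


def coverage(estimates, std_errors, truth, z=1.96):
    """Share of replications whose interval estimate +/- z * se contains the truth."""
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    return float(np.mean((lower <= truth) & (truth <= upper)))


rng = np.random.default_rng(0)
reps, truth, true_sd = 5000, 0.0, 1.0

# One point estimate per replication, scattered around the truth.
estimates = rng.normal(truth, true_sd, size=reps)

# Standard error that matches the sampling variability of the estimator.
cov_matched = coverage(estimates, np.full(reps, true_sd), truth)

# Standard error that captures only part of the variability (half, here).
cov_too_small = coverage(estimates, np.full(reps, 0.5 * true_sd), truth)

print("coverage with matched variance:  ", cov_matched)
print("coverage with too-small variance:", cov_too_small)
```

A variance estimator that ignores one source of randomness behaves like the second case: the intervals are too narrow, so coverage falls well below the nominal 95%.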

RPCA-SC: West German reunification (Bayani 2021)#

The sections above all exercise the PCR-SC / RSC family. CLUSTERSC’s other family – RPCA-SC (robust \(L+S\) low-rank donor denoising, Bayani 2021) – is pinned on the canonical German-reunification application. With the PCP robust-PCA denoiser, CLUSTERSC(method="rpca", rpca_method="PCP") reproduces Bayani’s reference design on basedata/german_reunification.csv:

RPCA-SC synthetic West Germany (PCP)#
Quantity	mlsynth	Bayani (2021)
Norway weight	0.485	~0.48
France weight	0.354	~0.35
New Zealand weight	0.296	~0.30
Pre-fit RMSE	88.6	~90
Effect on West German GDP	negative	negative (ADH)

The durable check is benchmarks/cases/clustersc_rpca_germany.py:

python benchmarks/run_benchmarks.py --case clustersc_rpca_germany

References#

Shen, D., Ding, P., Sekhon, J., & Yu, B. (2023). “Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data.” Econometrica 91(6):2125-2154.

Rho, S., Tang, A., Bergam, N., Cummings, R., & Misra, V. (2025). “ClusterSC: Advancing Synthetic Control with Donor Selection.” arXiv:2503.21629.

Amjad, M., Shah, D., & Shen, D. (2018). “Robust Synthetic Control.” Journal of Machine Learning Research 19(22):1-51.

Bayani, M. (2021). “Robust PCA Synthetic Control.”
