SPCD — Lu et al. (2022) Prop 99 design study#
Reproduction of the real-data result (Section 4.2, Table 1) in
Lu, Li, Ying & Blanchet (2022). Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls. arXiv:2211.15241.
SPCD is a fast spectral experimental-design method — it selects the treated units and synthetic-control weights from pre-treatment data via a normalized generalized power method (phase synchronization), then estimates the treatment effect. The paper’s headline is that this design slashes the RMSE of the effect estimate versus a random design.
Data#
basedata/smoking_data.csv — the Abadie-Diamond-Hainmueller Prop 99 per-capita
pack-sales panel; value-identical to the authors’ california_prop99.csv
(synthdid repo, the paper’s footnote source). California is excluded, leaving
38 states, 1970-2000.
Result#
The first T years fit the design, the remaining 31 − T are the
post-period. With no real treatment the true effect is zero, so the placebo
RMSE of the estimated effect measures design quality:
T=25 matches to the digit; T=15 is within ~2× (the no-public-code tolerance). The paper’s central claim reproduces at both horizons: SPCD ≪ Random ≪ SC.
Note
Table 1’s other block (US BLS unemployment, SPCD RMSE 0.9/0.6) is not
reproducible from the paper alone. mlsynth’s empirical_weights implements
Eq. 9 exactly (verified ‖w‖₁ = 2), yet a faithful run lands near 8 on those
noisy, rank-deficient (T₀=5, N=20) subsamples — SPCD ships no public code and
treats α/λ/β as unspecified “pre-defined” hyperparameters. The
discrepancy is under-specification, not an mlsynth defect; the Prop 99 cell is
the durable target.
Reproduce#
python benchmarks/run_benchmarks.py spcd_prop99
The durable case is benchmarks/cases/spcd_prop99.py; a self-contained
factor-model RMSE demonstration also lives in the estimator’s docs example.