SPCD — Lu et al. (2022) Prop 99 design study

Contents

SPCD — Lu et al. (2022) Prop 99 design study#

Reproduction of the real-data result (Section 4.2, Table 1) in

Lu, Li, Ying & Blanchet (2022). Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls. arXiv:2211.15241.

SPCD is a fast spectral experimental-design method — it selects the treated units and synthetic-control weights from pre-treatment data via a normalized generalized power method (phase synchronization), then estimates the treatment effect. The paper’s headline is that this design slashes the RMSE of the effect estimate versus a random design.

Data#

basedata/smoking_data.csv — the Abadie-Diamond-Hainmueller Prop 99 per-capita pack-sales panel; value-identical to the authors’ california_prop99.csv (synthdid repo, the paper’s footnote source). California is excluded, leaving 38 states, 1970-2000.

Result#

The first T years fit the design, the remaining 31 T are the post-period. With no real treatment the true effect is zero, so the placebo RMSE of the estimated effect measures design quality:

T=25 matches to the digit; T=15 is within ~2× (the no-public-code tolerance). The paper’s central claim reproduces at both horizons: SPCD ≪ Random ≪ SC.

Note

Table 1’s other block (US BLS unemployment, SPCD RMSE 0.9/0.6) is not reproducible from the paper alone. mlsynth’s empirical_weights implements Eq. 9 exactly (verified ‖w‖₁ = 2), yet a faithful run lands near 8 on those noisy, rank-deficient (T₀=5, N=20) subsamples — SPCD ships no public code and treats α/λ/β as unspecified “pre-defined” hyperparameters. The discrepancy is under-specification, not an mlsynth defect; the Prop 99 cell is the durable target.

Reproduce#

python benchmarks/run_benchmarks.py spcd_prop99

The durable case is benchmarks/cases/spcd_prop99.py; a self-contained factor-model RMSE demonstration also lives in the estimator’s docs example.