NSC — Nonlinear Synthetic Control (Tian 2023)#
- Estimator:
Nonlinear Synthetic Control (NSC) —
mlsynth.NSC- Source:
Tian, Wei (2023), “The Synthetic Control Method with Nonlinear Outcomes,” arXiv:2306.01967v1.
- Replication type:
Cross-validation against the author’s R implementation on the canonical Proposition 99 panel (Section 5.1, Table 2) and Path B — the paper’s nonlinear-outcome Monte Carlo (Section 4, Table 1).
- Status:
Verified — the empirical weights and effect path match the author’s reference, and the simulation reproduces Table 1’s robust geometry.
Cross-validation — Proposition 99#
Tian’s headline empirical revisits Abadie-Diamond-Hainmueller’s California
tobacco study with the nonlinear synthetic control. Both the author’s R code
(benchmarks/R/nsc_tian2023_reference.R) and the ADH smoking panel
(basedata/smoking_data.csv) are public, so mlsynth’s NSC is validated
against them directly.
The reference cross-validation of the elastic-net penalty (a, b) is
stochastic — each fold draws a random held-in donor (hence set.seed(123)
in the author’s application script) — so it does not port to Python. The
application is deterministic given the selected penalty a* = 0.3, b* = 0.7
reported in Table 2, so we fix those and match the per-donor weights:
Quantity |
mlsynth NSC |
Tian Table 2 / paper |
|---|---|---|
weight correlation (38 donors) |
0.989 |
— |
max per-donor \(|\Delta|\) |
0.024 |
(Table 2 rounded to 3 dp) |
mean per-donor \(|\Delta|\) |
0.006 |
— |
average post-period effect |
\(-19.1\) |
\(\approx -19\) |
effect in 1990 / 1995 / 2000 |
\(-9.1 / -22.6 / -27.0\) |
\(-9.5 / -24.5 / -28.7\) |
mlsynth’s NSC is a faithful port of the reference QP — eigenvalue-scaled penalty,
the rbind(Z, -Z) negativity trick, distance-weighted L1 — so it recovers the
same signed donor pool (positive weights concentrated on Idaho, Montana,
Colorado, Connecticut; small negative weights on Alabama, Arkansas, Tennessee)
and the paper’s growing effect path. The residual per-weight differences
(\(<0.025\)) come from the standardisation convention and Table 2’s
three-decimal rounding.
The durable check lives in benchmarks/cases/nsc_prop99.py:
python benchmarks/run_benchmarks.py --case nsc_prop99
Path B — the nonlinear-outcome Monte Carlo#
The DGP (Tian 2023, Section 4, eqs. 9-10) is packaged in
mlsynth.utils.nsc_helpers.simulate.simulate_nsc_panel(): each unit carries
two observed and four unobserved predictors with N(10, 1) time coefficients;
the latent outcome is rescaled to \([0, 1]\) and raised to the power r
(r = 1 linear, r = 2 nonlinear, where standard SC is biased); the treated
unit receives the ramped effect \(0.02, 0.04, \ldots, 0.20\) over ten
post-treatment periods.
Table 1 reports three quantities for NSC across settings that vary the donor count \(J\), the pre-period length \(T_0\) and the nonlinearity \(r\). Two are robust at a small simulation count and are reproduced here on the nonlinear (\(r = 2\)) panel:
Near-nominal coverage — NSC’s 95% confidence interval covers the true per-period effect about 94% of the time (the paper reports ~0.935-0.950).
Error shrinks as the donor pool grows — the mean absolute error falls as \(J\) doubles from 25 to 50, the paper’s “more donors are unambiguously better in the nonlinear case”.
(Table 1’s signed bias column has a magnitude of ~0.01 that needs the paper’s 5000 simulations to estimate; the coverage and error-shrinkage geometry are the robust, reproducible findings at a benchmark-sized draw count.)
The durable check lives in benchmarks/cases/nsc_mc.py:
python benchmarks/run_benchmarks.py --case nsc_mc
What it confirms#
NSC is validated on two fronts: it reproduces the author’s published Proposition 99 result weight-for-weight against the reference implementation, and its inference is reliable under nonlinearity with error that shrinks as the donor pool grows — the two pillars of Tian (2023).