NSC — Nonlinear Synthetic Control (Tian 2023)#

Estimator:

Nonlinear Synthetic Control (NSC)mlsynth.NSC

Source:

Tian, Wei (2023), “The Synthetic Control Method with Nonlinear Outcomes,” arXiv:2306.01967v1.

Replication type:

Cross-validation against the author’s R implementation on the canonical Proposition 99 panel (Section 5.1, Table 2) and Path B — the paper’s nonlinear-outcome Monte Carlo (Section 4, Table 1).

Status:

Verified — the empirical weights and effect path match the author’s reference, and the simulation reproduces Table 1’s robust geometry.

Cross-validation — Proposition 99#

Tian’s headline empirical revisits Abadie-Diamond-Hainmueller’s California tobacco study with the nonlinear synthetic control. Both the author’s R code (benchmarks/R/nsc_tian2023_reference.R) and the ADH smoking panel (basedata/smoking_data.csv) are public, so mlsynth’s NSC is validated against them directly.

The reference cross-validation of the elastic-net penalty (a, b) is stochastic — each fold draws a random held-in donor (hence set.seed(123) in the author’s application script) — so it does not port to Python. The application is deterministic given the selected penalty a* = 0.3, b* = 0.7 reported in Table 2, so we fix those and match the per-donor weights:

Quantity

mlsynth NSC

Tian Table 2 / paper

weight correlation (38 donors)

0.989

max per-donor \(|\Delta|\)

0.024

(Table 2 rounded to 3 dp)

mean per-donor \(|\Delta|\)

0.006

average post-period effect

\(-19.1\)

\(\approx -19\)

effect in 1990 / 1995 / 2000

\(-9.1 / -22.6 / -27.0\)

\(-9.5 / -24.5 / -28.7\)

mlsynth’s NSC is a faithful port of the reference QP — eigenvalue-scaled penalty, the rbind(Z, -Z) negativity trick, distance-weighted L1 — so it recovers the same signed donor pool (positive weights concentrated on Idaho, Montana, Colorado, Connecticut; small negative weights on Alabama, Arkansas, Tennessee) and the paper’s growing effect path. The residual per-weight differences (\(<0.025\)) come from the standardisation convention and Table 2’s three-decimal rounding.

The durable check lives in benchmarks/cases/nsc_prop99.py:

python benchmarks/run_benchmarks.py --case nsc_prop99

Path B — the nonlinear-outcome Monte Carlo#

The DGP (Tian 2023, Section 4, eqs. 9-10) is packaged in mlsynth.utils.nsc_helpers.simulate.simulate_nsc_panel(): each unit carries two observed and four unobserved predictors with N(10, 1) time coefficients; the latent outcome is rescaled to \([0, 1]\) and raised to the power r (r = 1 linear, r = 2 nonlinear, where standard SC is biased); the treated unit receives the ramped effect \(0.02, 0.04, \ldots, 0.20\) over ten post-treatment periods.

Table 1 reports three quantities for NSC across settings that vary the donor count \(J\), the pre-period length \(T_0\) and the nonlinearity \(r\). Two are robust at a small simulation count and are reproduced here on the nonlinear (\(r = 2\)) panel:

  • Near-nominal coverage — NSC’s 95% confidence interval covers the true per-period effect about 94% of the time (the paper reports ~0.935-0.950).

  • Error shrinks as the donor pool grows — the mean absolute error falls as \(J\) doubles from 25 to 50, the paper’s “more donors are unambiguously better in the nonlinear case”.

(Table 1’s signed bias column has a magnitude of ~0.01 that needs the paper’s 5000 simulations to estimate; the coverage and error-shrinkage geometry are the robust, reproducible findings at a benchmark-sized draw count.)

The durable check lives in benchmarks/cases/nsc_mc.py:

python benchmarks/run_benchmarks.py --case nsc_mc

What it confirms#

NSC is validated on two fronts: it reproduces the author’s published Proposition 99 result weight-for-weight against the reference implementation, and its inference is reliable under nonlinearity with error that shrinks as the donor pool grows — the two pillars of Tian (2023).