VanillaSC — Standard Synthetic Control (ADH 2010/2015; Abadie-Gardeazabal 2003)#

Estimator:: Vanilla Synthetic Control (VanillaSC) — mlsynth.VanillaSC
Source:: Abadie, Diamond & Hainmueller (2010) [ABADIE2010]; Abadie, Diamond & Hainmueller (2015) [Abadie2015]; Abadie & Gardeazabal (2003) [ABADIE2003]; leave-two-out placebo of Lei & Sudijono (2025).
Replication type:: Path A — the three canonical synthetic-control studies on their original datasets, plus the Lei-Sudijono (2025) Table-1 placebo relations.
Status:: Fully verified — donor pools, ATTs and (for LTO) the paper’s p-value relations reproduce. Locked as regression tests in mlsynth/tests/test_vanillasc_replications.py.

Three canonical SCM studies#

Each study is fit on its full pre-treatment period, using the datasets shipped under basedata/.

California / Proposition 99 (ADH 2010)#

Treatment in 1989; pre-period 1970-1988. Covariates averaged over 1980-1988 (beer 1984-1988) plus three lagged cigarette-sales predictors (1975, 1980, 1988). With mscmt this reproduces ADH Table 2 almost exactly — Utah 0.335, Nevada 0.236, Montana 0.202, Colorado 0.160, Connecticut 0.068 (ADH: 0.334 / 0.234 / 0.199 / 0.164 / 0.069) — and an ATT of about \(-19\) packs.

import pandas as pd
from mlsynth import VanillaSC

d = pd.read_csv("basedata/augmented_cali_long.csv")
for yr, col in [(1975, "cig_1975"), (1980, "cig_1980"), (1988, "cig_1988")]:
    d[col] = d.state.map(d[d.year == yr].set_index("state").cigsale)
d["treated"] = ((d.state == "California") & (d.year >= 1989)).astype(int)

res = VanillaSC({
    "df": d, "outcome": "cigsale", "treat": "treated",
    "unitid": "state", "time": "year",
    "backend": "mscmt", "canonical_v": "min.loss.w", "seed": 1,
    "covariates": ["p_cig", "pct15-24", "loginc", "pc_beer",
                   "cig_1975", "cig_1980", "cig_1988"],
    "covariate_windows": {"p_cig": (1980, 1988), "pct15-24": (1980, 1988),
                          "loginc": (1980, 1988), "pc_beer": (1984, 1988)},
    "display_graphs": False,
}).fit()
print(res.effects.att)                 # ~ -19
print(res.weights.donor_weights)       # Utah/Nevada/Montana/Colorado/Connecticut

(The column labels in augmented_cali_long.csv are counter-intuitive: p_cig holds log GDP per capita and loginc holds the retail price. Read that way, the predictor means reproduce ADH’s Table 1 “Real California” column.)
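To make the idea of per-covariate windows concrete, the sketch below averages each predictor over its own year range for a single unit, in plain Python on made-up numbers. It only illustrates what a window specification means; it is not mlsynth's internal code, and the helper name `window_mean` and the toy values are assumptions.

```python
from statistics import mean

def window_mean(rows, column, start, end):
    """Average `column` over years start..end (inclusive), skipping missing values."""
    values = [r[column] for r in rows
              if start <= r["year"] <= end and r[column] is not None]
    if not values:
        raise ValueError(f"no observations for {column!r} in {start}-{end}")
    return mean(values)

# Toy long-format rows for one unit (hypothetical values, not the ADH data).
rows = [
    {"year": 1983, "cigsale": 100.0, "pc_beer": None},
    {"year": 1984, "cigsale": 98.0, "pc_beer": 24.0},
    {"year": 1985, "cigsale": 96.0, "pc_beer": 26.0},
    {"year": 1986, "cigsale": 94.0, "pc_beer": 25.0},
]

# Each predictor gets its own window, mirroring the shape of covariate_windows.
windows = {"cigsale": (1983, 1986), "pc_beer": (1984, 1986)}
predictors = {col: window_mean(rows, col, lo, hi) for col, (lo, hi) in windows.items()}
print(predictors)   # {'cigsale': 97.0, 'pc_beer': 25.0}
```

A shorter window for one covariate, as with beer above, simply means fewer years enter that covariate's average.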

German reunification (ADH 2015)#

Treatment (reunification) in 1990; pre-period 1960-1989, since the indicator below switches on in 1990. GDP, trade, inflation and industry share are averaged over 1981-1990, schooling over 1980-1985, and the investment rate is taken in 1980 alone. With mscmt the synthetic West Germany is Austria-dominant, joined by the USA, Switzerland, Japan and the Netherlands (the ADH 2015 donor set), and the ATT is negative: reunification lowered per-capita GDP relative to the synthetic.

import pandas as pd
from mlsynth import VanillaSC

d = pd.read_stata("basedata/repgermany.dta")
d["treated"] = ((d.country == "West Germany") & (d.year >= 1990)).astype(int)

res = VanillaSC({
    "df": d, "outcome": "gdp", "treat": "treated",
    "unitid": "country", "time": "year",
    "backend": "mscmt", "seed": 1,
    "covariates": ["gdp", "trade", "infrate", "industry", "invest80", "schooling"],
    "covariate_windows": {"gdp": (1981, 1990), "trade": (1981, 1990),
                          "infrate": (1981, 1990), "industry": (1981, 1990),
                          "invest80": (1980, 1980), "schooling": (1980, 1985)},
    "display_graphs": False,
}).fit()
print(res.weights.donor_weights)       # Austria/USA/Switzerland/Japan/Netherlands

Basque terrorism (Abadie-Gardeazabal 2003)#

The treatment indicator (terrorism) first turns on in 1975, so the model trains on the full 1955-1974 pre-period. On this long pre-period the problem is well-conditioned and the synthetic Basque is Cataluna \(\approx 0.8\), Madrid \(\approx 0.2\), which is the published Abadie-Gardeazabal result, with an ATT of about \(-0.68\) (the roughly 10% per-capita GDP gap). Outcome-only already recovers this; mscmt with the special-predictor covariates confirms it.

Note

This is instructive: on the short 1960-1969 window used by some later papers the Basque donor weights are fragile (they drift to Baleares/Madrid), but on the full 1955-1974 pre-period the long outcome path pins \(W\) to the Cataluna/Madrid solution. The training window matters; VanillaSC uses the full pre-period defined by the treatment indicator.

import pandas as pd
from mlsynth import VanillaSC

b = pd.read_csv("basedata/basque_data.csv")
b = b[b.regionno != 1].copy()                           # drop Spain; copy so the new column lands on an independent frame
b["treated"] = ((b.regionno == 17) & (b.year >= 1975)).astype(int)

res = VanillaSC({
    "df": b, "outcome": "gdpcap", "treat": "treated",
    "unitid": "regionno", "time": "year",
    "backend": "outcome-only", "display_graphs": False,
}).fit()
print(res.effects.att)                 # ~ -0.68
print(res.weights.donor_weights)       # region 10 (Cataluna) ~0.8, 14 (Madrid) ~0.2

Leave-two-out placebo: Lei & Sudijono (2025) Table 1#

With inference="lto" VanillaSC runs the Lei-Sudijono (2025) refined placebo. Their Table 1 (covariate-matched Synth, \(\alpha = 0.05\)) lays out how the methods relate across the three canonical datasets, which mlsynth reproduces:

quantity	Prop 99	Basque	German
\(N\)	39	17	17
\(p_{\text{app-placebo}}\)	0.00	0.35	0.00
\(p_{\text{exact-placebo}}\)	0.026	0.41	0.059
\(p_{\mathrm{naive\text{-}LTO}}\)	0.024	0.67	0.042
\(p_{\mathrm{powered\text{-}LTO}}(\alpha)\)	0.022	0.66	0.03
\(\Gamma_{\mathrm{LTO}}\)	1.4	NA	1.1

Three relations are worth internalising:

LTO can change the conclusion (German). The exact placebo p-value of 0.059 does not reject at 0.05, but the naive LTO (0.042) and powered LTO (0.03) both do. With only 16 donors the placebo grid is too coarse to resolve a borderline effect; LTO’s finer grid does. The small \(\Gamma = 1.1\), though, warns that this significance is fragile to mild departures from uniform assignment.
LTO is not mechanically smaller (Basque). Here LTO (0.67) is larger than the placebo (0.41); nothing is significant by any method. The refinement changes granularity, not direction — it does not manufacture significance.
LTO ≈ placebo when both already reject (Prop 99). The two p-values (0.024 vs 0.026) nearly coincide; the powered version (0.022) buys a little extra margin, and \(\Gamma = 1.4\) says the conclusion survives moderate confounding.
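The first relation has a simple arithmetic core. Under the rank-over-N convention used on this page, an ordinary placebo over N units can only return multiples of 1/N, so its smallest attainable p-value is 1/N. The sketch below checks that floor against \(\alpha = 0.05\) using only the N and p-values quoted in the table; the dictionary layout and the function name are illustrative, not part of mlsynth.

```python
ALPHA = 0.05

# Values copied from the table above (Lei & Sudijono 2025, Table 1).
table = {
    "Prop 99": {"N": 39, "exact": 0.026, "naive_lto": 0.024, "powered_lto": 0.022},
    "Basque":  {"N": 17, "exact": 0.41,  "naive_lto": 0.67,  "powered_lto": 0.66},
    "German":  {"N": 17, "exact": 0.059, "naive_lto": 0.042, "powered_lto": 0.03},
}

def placebo_floor(n_units):
    """Smallest p-value an ordinary placebo over n_units can return (rank 1 of n_units)."""
    return 1.0 / n_units

results = {}
for name, row in table.items():
    # A method rejects when its p-value is at or below ALPHA.
    results[name] = {k: row[k] <= ALPHA for k in ("exact", "naive_lto", "powered_lto")}
    print(f"{name}: floor={placebo_floor(row['N']):.3f}, rejects={results[name]}")
```

With 17 units the floor is 1/17, about 0.059, which matches the German exact-placebo value in the table: on that grid the ordinary placebo cannot reject at 0.05 however large the effect is.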

The Lei-Sudijono helper constants reproduce the paper’s reported values exactly (c(39, 0.05) = 0.002, c(17, 0.05) = 0.0125; see test_lto_helpers_match_paper), and the covariate-matched ordinary placebo reproduces California’s exact-placebo p-value (rank 1 of 39, \(p = 0.0256\) vs the paper’s 0.026). The p-value tracks the chosen specification: the covariate-matched Synth concentrates the effect on California (\(p_{\mathrm{naive\text{-}LTO}} \approx 0.024\)), whereas the outcome-only fit — where California is only rank 3 of 39 — gives \(\approx 0.10\); both are internally consistent with their respective ordinary placebo p-values. Choose the specification before reading the test.

Cross-validation of the placebo inference#

The Path-A checks above pin the donor weights and the fit. The in-space placebo test that turns those into a p-value is cross-validated separately, against the two R packages a user would otherwise reach for, in benchmarks/cases/vanillasc_xval_references.py.

Run on the full Abadie-Diamond-Hainmueller predictor spec, mlsynth and tidysynth agree exactly: California ranks first among the 39 units and the p-value is \(1/39 = 0.0256\) in both. mlsynth’s donor weights also land closer to the published Table 2 than tidysynth’s (largest deviation \(0.004\) against \(0.017\)).
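The p-value quoted here is a rank statistic. As a minimal sketch, assuming the test statistic is one number per unit (for example a post/pre RMSPE ratio) and the p-value is the treated unit's rank divided by the number of units, which is the convention implied by \(1/39 = 0.0256\), the computation looks like this. The function name and the donor values are hypothetical.

```python
def placebo_p_value(stats, treated):
    """Rank-based p-value: share of units whose statistic is >= the treated unit's."""
    target = stats[treated]
    rank = sum(1 for value in stats.values() if value >= target)
    return rank / len(stats)

# Toy statistics for 38 donors; the values are made up for illustration.
stats = {f"donor_{i}": 1.0 + 0.1 * i for i in range(38)}
stats["California"] = 11.4    # largest statistic, so rank 1 of 39

print(round(placebo_p_value(stats, "California"), 4))   # 0.0256

# If two donors outrank the treated unit, it is rank 3 of 39.
stats["California"] = 4.55
print(round(placebo_p_value(stats, "California"), 4))   # 0.0769
```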

Run on a V-free objective – Synth driven with a uniform custom.v, which collapses its inner problem to the same quadratic program mlsynth solves with backend="outcome-only" – the two implementations again place California third, with a per-unit correlation of \(0.994\) across the placebo pool. Two differences are worth recording. mlsynth reaches a pre-treatment sum of squares of \(52.13\) against Synth’s \(55.70\), a 6.4 percent lower value of the objective Synth is itself minimising; and Synth’s ipop solver aborts on one donor (Nebraska, reporting a computationally singular system), so it fits 38 of the 39 placebos while mlsynth fits all 39. The benchmark reports both without asserting them, so a change surfaces without “mlsynth is better” becoming a pass condition.
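The quadratic program referred to here is least squares on the pre-treatment outcome path with the donor weights restricted to the simplex (non-negative, summing to one). The sketch below solves a toy instance by projected gradient descent in plain Python. It is meant to show the shape of the problem, not how mlsynth or Synth solve it; the function names, the solver choice and the toy series are all assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    cumulative, theta = 0.0, 0.0
    for k, value in enumerate(sorted(v, reverse=True), start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0:       # holds for k = 1..rho, fails afterwards
            theta = candidate
    return [max(x - theta, 0.0) for x in v]

def pre_period_ssr(donors, treated, w):
    """Sum of squared pre-treatment gaps between the treated unit and the synthetic."""
    fitted = [sum(wj * series[t] for wj, series in zip(w, donors))
              for t in range(len(treated))]
    return sum((y - f) ** 2 for y, f in zip(treated, fitted))

def fit_simplex_weights(donors, treated, steps=2000):
    """Minimise the pre-period SSR over the simplex by projected gradient descent."""
    n = len(donors)
    # Step size 1/L, where L = 2 * (sum of squared donor outcomes) bounds the
    # largest eigenvalue of the Hessian 2 * X'X.
    lipschitz = 2.0 * sum(x * x for series in donors for x in series)
    w = [1.0 / n] * n
    for _ in range(steps):
        fitted = [sum(w[j] * donors[j][t] for j in range(n))
                  for t in range(len(treated))]
        resid = [f - y for f, y in zip(fitted, treated)]
        grad = [2.0 * sum(r * x for r, x in zip(resid, donors[j])) for j in range(n)]
        w = project_to_simplex([w[j] - grad[j] / lipschitz for j in range(n)])
    return w

# Toy pre-period paths (hypothetical): the treated unit is exactly 0.8*A + 0.2*B.
donor_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
donor_b = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
donor_c = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
treated = [0.8 * a + 0.2 * b for a, b in zip(donor_a, donor_b)]

w = fit_simplex_weights([donor_a, donor_b, donor_c], treated)
print([round(x, 3) for x in w])          # approximately [0.8, 0.2, 0.0]
print(pre_period_ssr([donor_a, donor_b, donor_c], treated, w))
```

Because two solvers are minimising the same objective on the same feasible set, comparing their pre-treatment sums of squares, as the benchmark does, is a direct check of which one got closer to the optimum.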

One convention to watch when comparing by hand: tidysynth’s mspe_ratio is on the squared scale (\(120.5\) here) while mlsynth reports the RMSPE ratio, on the square-root scale (\(11.4\) here). Taking the square root of tidysynth’s figure gives about \(11.0\), so once reconciled the two agree to 3.5 percent.
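The two scales are related by a square root, which the following sketch makes explicit. It assumes both ratios are post-period error over pre-period error computed from the same treated-minus-synthetic gaps; the function names and the toy gaps are illustrative and are not taken from either package.

```python
import math

def mspe(gaps):
    """Mean squared prediction error of treated-minus-synthetic gaps."""
    return sum(g * g for g in gaps) / len(gaps)

def mspe_ratio(pre_gaps, post_gaps):
    """Post/pre ratio on the squared scale."""
    return mspe(post_gaps) / mspe(pre_gaps)

def rmspe_ratio(pre_gaps, post_gaps):
    """Post/pre ratio on the root scale; equals the square root of mspe_ratio."""
    return math.sqrt(mspe(post_gaps)) / math.sqrt(mspe(pre_gaps))

pre_gaps = [1.0, -1.0, 2.0, -2.0]       # toy values
post_gaps = [-10.0, -20.0, -30.0]       # toy values

squared = mspe_ratio(pre_gaps, post_gaps)
root = rmspe_ratio(pre_gaps, post_gaps)
print(math.isclose(root, math.sqrt(squared)))   # True

# Converting the squared-scale figure quoted above to the root scale.
print(round(math.sqrt(120.5), 2))               # 10.98
```

Comparing a squared-scale number directly with a root-scale one would overstate the disagreement by an order of magnitude, so convert before comparing.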