Vanilla Synthetic Control (VanillaSC)#
Overview#
VanillaSC is the standard synthetic control method (Abadie &
Gardeazabal 2003; Abadie, Diamond & Hainmueller 2010, 2015), built on
mlsynth’s self-contained bilevel engine. It estimates the effect on a
single treated unit by constructing a weighted average of donor units –
the synthetic control – that tracks the treated unit’s pre-treatment
path, and reads the effect as the post-treatment gap between the treated
unit and its synthetic counterpart.
What distinguishes this implementation is that it handles the two regimes of the SCM optimisation separately and is explicit about what each one does and does not identify:
No covariates -> the donor weights \(W\) solve the convex simplex least-squares fit on the pre-treatment outcomes. This is a single, well-posed convex program – deterministic and reproducible (unique up to donor collinearity).
Covariates -> the predictor weights \(V\) and donor weights \(W\) are chosen jointly through a bilevel program. This is non-convex, and the predictor weights are generically non-identified.
VanillaSC solves it with a reliable backend and reports a diagnostic (\(\text{v\_agreement}\)) so that fragility is visible rather than silent.
Mathematical formulation#
For a treated unit with pre-treatment outcomes \(y_1 \in \mathbb{R}^{T_0}\) and donors \(Y_0 \in \mathbb{R}^{T_0 \times J}\):
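Outcome-only (no covariates). The donor weights solve the simplex-constrained least-squares fit on the pre-treatment outcomes,
\[W^\star = \arg\min_{W \in \Delta^{J}} \lVert y_1 - Y_0 W \rVert_2^2, \qquad \Delta^{J} = \{\,W \in \mathbb{R}^{J} : W \ge 0,\ \mathbf{1}'W = 1\,\}.\]
Covariate matching (bilevel). With predictor matrices \(X_1 \in \mathbb{R}^{P}\) (treated) and \(X_0 \in \mathbb{R}^{P \times J}\) (donors), each predictor averaged over its window and scaled to unit variance, the lower level solves, for given diagonal predictor weights \(V\),
\[W^\star(V) = \arg\min_{W \in \Delta^{J}} (X_1 - X_0 W)'\,V\,(X_1 - X_0 W),\]
and the upper level chooses \(V\) to minimise the pre-treatment outcome fit,
\[V^\star = \arg\min_{V} \lVert y_1 - Y_0\,W^\star(V) \rVert_2^2.\]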
The donor weights \(W\) and the counterfactual are pinned by this program; the predictor weights \(V\) are generically not (a whole polytope of \(V\) reproduces the same \(W\)).
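The two regimes can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not mlsynth's engine: `project_to_simplex`, `simplex_fit` and `toy_bilevel` are hypothetical names, the inner solver is plain projected gradient descent, and the upper level simply scans a handful of candidate \(V\) vectors (it is neither the mscmt differential-evolution search nor the malo corner search).

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def simplex_fit(target, design, row_weights=None, n_iter=5000):
    """argmin_W sum_r row_weights[r] * (target[r] - design[r] @ W)^2 on the simplex."""
    s = np.ones(len(target)) if row_weights is None else np.sqrt(row_weights)
    a, B = s * target, s[:, None] * design    # fold the diagonal weights into the rows
    step = 1.0 / max(2.0 * np.linalg.norm(B, 2) ** 2, 1e-12)  # 1 / Lipschitz constant
    w = np.full(design.shape[1], 1.0 / design.shape[1])       # start from equal weights
    for _ in range(n_iter):
        w = project_to_simplex(w - step * 2.0 * B.T @ (B @ w - a))
    return w

def toy_bilevel(y1, Y0, X1, X0, candidate_vs):
    """Scan candidate V's; keep the one whose implied W fits the outcomes best."""
    best = None
    for v in candidate_vs:
        w = simplex_fit(X1, X0, row_weights=v)      # lower level: predictor match
        loss = float(np.sum((y1 - Y0 @ w) ** 2))    # upper level: outcome fit
        if best is None or loss < best[0]:
            best = (loss, v, w)
    return best

# Synthetic data (illustrative only): the treated unit is an exact donor mixture.
rng = np.random.default_rng(0)
w_true = np.array([0.6, 0.4, 0.0, 0.0])
Y0 = rng.normal(size=(30, 4))
y1 = Y0 @ w_true
w_outcome_only = simplex_fit(y1, Y0)                # outcome-only regime

X0 = rng.normal(size=(3, 4))
X1 = X0 @ w_true
candidates = [np.eye(3)[p] for p in range(3)] + [np.full(3, 1.0 / 3.0)]
loss, v_best, w_bilevel = toy_bilevel(y1, Y0, X1, X0, candidates)
```

With a single-predictor corner \(V\) the lower level has many minimisers, which is a small-scale view of why several different \(V\) can be compatible with the same outcome fit.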
Backends#
The covariate path exposes three reliable solvers via backend=, alongside the outcome-only fit:
"outcome-only"
No predictor weights; the convex simplex fit above. The well-posed default (also selected by backend="auto" when no covariates are given).
"mscmt"
Becker & Kloessner (2018): a global differential-evolution search over \(\log_{10} V\) with the simplex inner solve. The default when covariates are supplied. Set canonical_v="min.loss.w" (or "max.order") to report a canonical, reproducible \(V\) via the MSCMT determine_v step.
"malo"
Malo et al. (2024): a staged corner search. Fast and exact when the optimum is a predictor corner – but note that when a lagged outcome is among the predictors, the loss-minimising corner puts all weight on that lag, collapsing the inner match to pure outcome-fitting (it drifts toward the outcome floor).
"penalized"
Abadie & L’Hour (2021): a pairwise-penalized estimator with leave-one-out \(\lambda\) selection, giving a unique, sparse \(W\). Works with or without covariates.
The identification diagnostic#
When covariates are used, res.weights.summary_stats["v_agreement"]
reports the maximum absolute difference between the two MSCMT canonical
predictor-weight vectors (min.loss.w and max.order). It is small
when \(V\) is well identified and large (up to 1) when the predictor
weights – and the donor weights they imply – are fragile. A large value
is a warning that the covariate-matched solution should not be
over-interpreted.
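As a concrete reading of the definition above, the diagnostic is just a max-norm distance between two predictor-weight vectors. The sketch below assumes both vectors are already normalised to the simplex; the function name and the example numbers are illustrative, not taken from mlsynth.

```python
def v_agreement(v_min_loss_w, v_max_order):
    """Maximum absolute difference between two canonical predictor-weight vectors."""
    if len(v_min_loss_w) != len(v_max_order):
        raise ValueError("both V vectors must cover the same predictors")
    return max(abs(a - b) for a, b in zip(v_min_loss_w, v_max_order))

# Made-up vectors: the two canonical choices disagree on predictors 1 and 3.
gap = v_agreement([0.5, 0.3, 0.2], [0.1, 0.3, 0.6])
```

Because each vector lies on the simplex, the value cannot exceed 1, which is reached when the two canonical choices put all weight on different predictors.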
Inference#
Three inference modes are available via inference=:
"placebo" (default, inference=True)
Abadie’s in-space placebo test: the synthetic control is refit treating each donor as pseudo-treated, and the treated unit’s post/pre RMSPE ratio is ranked against the placebo distribution to give a p-value. Simple and assumption-light, but the smallest achievable p-value is about \(1/(J+1)\).
"scpi" – prediction intervals (Cattaneo, Feng & Titiunik 2021)
Treats \(\tau_T\) as a predictand (a random variable) and builds prediction intervals, decomposing the prediction error as
\[\widehat\tau_T - \tau_T = e_T - \mathbf{p}_T'(\widehat\beta - \beta_0),\]
an out-of-sample shock \(e_T\) plus an in-sample weight-estimation error. The counterfactual prediction band is assembled period-by-period as \([\,Y_{\text{fit}} + w_L + e_L,\; Y_{\text{fit}} + w_U + e_U\,]\), and the treatment-effect interval is \([\,Y_{\text{obs}} - \text{cf}_U,\; Y_{\text{obs}} - \text{cf}_L\,]\).
In-sample (\(w_L\)/\(w_U\)): a simulation-based bound. With \(Q = Z'Z/T_0\) (donor pre-outcomes), \(\widehat\Sigma = Z' \mathrm{diag}(\omega)\,Z / T_0^2\) where \(\omega_t = \tfrac{T_0}{T_0-\mathrm{df}}(u_t - E[u_t])^2\) (HC1), and pre-period residuals \(u = A - B\widehat w\), draw \(G^\star \sim N(0,\widehat\Sigma)\). For each draw and predictor \(\mathbf{p}_T\), solve over the localised simplex set
\[\min/\max\ \mathbf{p}_T'x \quad\text{s.t.}\quad (x-\widehat w)'Q(x-\widehat w) - 2G^{\star\prime}(x-\widehat w) \le 0,\; \textstyle\sum x = 1,\; x \ge \ell,\]with \(\ell_j = \widehat w_j\) if \(\widehat w_j < \rho\) else \(0\). The regularisation parameter \(\rho\) is data-driven and capped at \(\rho_{\max} = 0.2\); \(Q\) is reduced via a thresholded eigen-square-root so collinear (near-null) donor directions are left unconstrained. \(w_L\)/\(w_U\) are the \(\alpha_1/2\) / \(1-\alpha_1/2\) quantiles of \(\mathbf{p}_T'(\widehat w - x)\) across draws.
Out-of-sample (\(e_L\)/\(e_U\)): a location-scale model, \(e_T = E[e] + \sqrt{\mathrm{Var}[e]}\,\varepsilon\). The conditional mean and a log-variance scale (capped by the residual IQR, Gaussian \(\varepsilon\)) are estimated by regressing \(u\) on the active-donor design; "ls" and "empirical" use standardized / raw residual quantiles.
VanillaSC returns the average-effect (ATT) interval in res.inference.ci_lower/ci_upper and the full per-period sequence (point effects, prediction intervals, counterfactual bands, and the in-/out-of-sample components) in res.inference.details. This implements the canonical simplex / outcome-only case; for covariate backends it uses the same outcome design and is approximate.
Note
This is a self-contained, MIT-licensed re-derivation of the Cattaneo-Feng-Titiunik algorithm – it does not import the GPL reference package scpi. It is validated to reproduce scpi’s CI_all_gaussian on the Proposition 99 panel to within Monte-Carlo error (see test_scpi_matches_reference_package, which is skipped unless scpi_pkg happens to be installed).
"lto" – leave-two-out refined placebo (Lei & Sudijono 2025)
A design-based randomization test that fixes the two structural weaknesses of the ordinary placebo test – its coarse \(\{1/N, 2/N, \dots\}\) grid and its zero size when \(\alpha < 1/N\). It replaces the “one turn each” permutation with a tournament over triples and reports both a naive p-value (res.inference.p_value) and a powered one (details["p_powered"]), together with the Type-I bound and tournament tallies. It shares the placebo test’s assumptions but is far more powerful in small donor pools. See “The leave-two-out (LTO) refined placebo test” and the two theory subsections below for the full treatment.
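The ranking step of the ordinary placebo test can be illustrated as follows. This is a sketch under assumptions, not the library's code: the helper names are hypothetical, the per-unit gaps (observed minus synthetic) are taken as given, and ties are counted against the treated unit.

```python
import math

def rmspe_ratio(pre_gaps, post_gaps):
    """Post/pre RMSPE ratio from a unit's pre- and post-period gaps."""
    pre = math.sqrt(sum(g * g for g in pre_gaps) / len(pre_gaps))
    post = math.sqrt(sum(g * g for g in post_gaps) / len(post_gaps))
    return post / pre

def placebo_p_value(treated_ratio, donor_ratios):
    """Share of the J + 1 units whose ratio is at least the treated unit's."""
    n_as_extreme = 1 + sum(r >= treated_ratio for r in donor_ratios)
    return n_as_extreme / (len(donor_ratios) + 1)

# Three placebo donors, none as extreme as the treated unit: p = 1 / (J + 1).
p = placebo_p_value(10.0, [1.0, 2.0, 3.0])
```

The example makes the granularity limit visible: with \(J = 3\) donors the p-value cannot fall below \(1/4\), however large the treated unit's ratio is.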
How the SCPI machinery works (one fit)#
scpi_intervals(y, Y0, pre, W, ...) takes the fitted donor weights
\(\widehat w\) (from any backend), the donor outcome matrix, and the
number of pre-treatment periods, and runs the following steps. Let
\(A = y_{1:T_0}\) be the treated pre-outcomes, \(B = Y_{0,\,1:T_0}\)
the donor pre-outcomes, \(P\) the donor post-outcomes, and
\(u = A - B\widehat w\) the pre-period residuals.
1. Degrees of freedom. For the simplex, \(\mathrm{df} = (\#\{\widehat w_j \neq 0\}) - 1\), giving the HC1 correction \(\mathrm{vc} = T_0/(T_0-\mathrm{df})\).
2. Regularisation parameter \(\rho\). The data-driven type-1 value \(\rho = \tfrac{\sigma_u}{\min_j \mathrm{sd}(B_j)} \sqrt{\log(J)\, d_0 \log T_0}/\sqrt{T_0}\), capped at \(\rho_{\max}=0.2\) (with a fallback bump if it comes out below \(0.001\)). \(\rho\) defines the “active” donor set \(\{\,j : \widehat w_j > \rho\,\}\).
3. Conditional mean & variance. Regress \(u\) on the active-donor design \([\,B_{\cdot,\text{active}},\,\mathbf{1}\,]\) to get \(E[u]\) (the u_missp step), then \(\omega_t = \mathrm{vc}\,(u_t - E[u_t])^2\). Form \(Q = B'B/T_0\) and \(\widehat\Sigma = B'\mathrm{diag}(\omega)B/T_0^2\), and its matrix square root \(\Sigma^{1/2}\).
4. Localised feasible set. Lower bounds \(\ell_j = \widehat w_j\) if \(\widehat w_j < \rho\) else \(0\) (near-binding donors are pinned at their tiny weight; active donors may move down to zero). \(Q\) is reduced by a thresholded eigen-square-root so the near-null (collinear) directions are left unconstrained.
5. In-sample simulation. For each of scpi_sims draws \(G^\star = \Sigma^{1/2}\,z\), \(z\sim N(0,I)\), and each post predictor \(\mathbf{p}_T\), solve the small conic program in \(x\) (donor weights) twice – minimise and maximise \(\mathbf{p}_T'x\) subject to \((x-\widehat w)'Q(x-\widehat w) - 2G^{\star\prime}(x-\widehat w)\le 0\), \(\sum x = 1\), \(x\ge\ell\). Record \(\mathbf{p}_T'(\widehat w - x)\) for each branch; \(w_L\)/\(w_U\) are the \(\alpha_1/2\) / \(1-\alpha_1/2\) quantiles across draws.
6. Out-of-sample band. From the location-scale model on \(u\) get \(e_L\)/\(e_U\) per post period (see the Inference section above).
7. Assemble. Counterfactual band \([\,Y_{\text{fit}} + w_L + e_L,\; Y_{\text{fit}} + w_U + e_U\,]\), effect interval \([\,Y_{\text{obs}} - \text{cf}_U,\; Y_{\text{obs}} - \text{cf}_L\,]\), and an ATT interval from an appended post-period-average predictor row. An extra averaged row is carried through steps 5-6 so the ATT interval uses the same simulation, not a naive average of the per-period bounds.
The result is an InferenceResults with ci_lower/ci_upper (the ATT
interval), confidence_level \(= 1-2\alpha\), and a details dict
holding the per-period periods, tau, pi_lower/pi_upper,
counterfactual_lower/upper, the in_sample_* (\(w_L,w_U\)) and
out_of_sample_* (\(e_L,e_U\)) components, sims and e_method.
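Two of the simpler steps, the HC1 degrees-of-freedom scale and the final assembly of the bands, reduce to a few array operations. The sketch below is illustrative only: the function names are hypothetical, the in-sample and out-of-sample bounds are assumed to have been computed already, and the dictionary keys merely mirror the details names listed above.

```python
import numpy as np

def hc1_scale(w_hat, T0):
    """vc = T0 / (T0 - df), with df = (number of non-zero weights) - 1."""
    df = int(np.count_nonzero(w_hat)) - 1
    return T0 / (T0 - df)

def assemble_intervals(y_obs, y_fit, w_lo, w_hi, e_lo, e_hi):
    """Combine in-sample (w) and out-of-sample (e) bounds period by period."""
    cf_lower = y_fit + w_lo + e_lo       # lower edge of the counterfactual band
    cf_upper = y_fit + w_hi + e_hi       # upper edge of the counterfactual band
    return {
        "tau": y_obs - y_fit,            # point effect
        "counterfactual_lower": cf_lower,
        "counterfactual_upper": cf_upper,
        "pi_lower": y_obs - cf_upper,    # effect interval flips the band
        "pi_upper": y_obs - cf_lower,
    }

# One post-treatment period with made-up numbers.
out = assemble_intervals(
    y_obs=np.array([100.0]), y_fit=np.array([110.0]),
    w_lo=np.array([-2.0]), w_hi=np.array([3.0]),
    e_lo=np.array([-5.0]), e_hi=np.array([4.0]),
)
vc = hc1_scale(np.array([0.5, 0.5, 0.0]), T0=19)
```

Note that the upper edge of the counterfactual band produces the lower edge of the effect interval, because the effect is the observed outcome minus the counterfactual.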
Composing SCPI with the backends#
backend (how \(W\) is estimated) and inference (how uncertainty is
quantified) are orthogonal – any of the four backends pairs with any of
the three inference modes:
VanillaSC({..., "backend": "mscmt", "inference": "scpi"}).fit()
VanillaSC({..., "backend": "malo", "inference": "scpi"}).fit()
The pipeline fits the weights with the chosen backend and hands the resulting
res.W to scpi_intervals. Two things to keep in mind:
The in-sample simulation rebuilds \(Q\) and \(\widehat\Sigma\) from the donor pre-outcomes \(B\), treating \(\widehat w\) as simplex weights. With outcome-only this is the exact Cattaneo-Feng-Titiunik interval (the case validated against scpi). With mscmt/malo the weights were also shaped by the covariate predictors, so SCPI uses the outcome design as a stand-in – it is approximate for covariate backends. The point effects, the ATT, and the out-of-sample band are unaffected; only the in-sample \(w_L\)/\(w_U\) term carries the approximation.
Read the SCPI interval alongside \(\text{v\_agreement}\). When the predictor weights are non-identified (v_agreement near 1, e.g. Prop 99 with lagged outcomes) the point counterfactual is still pinned, but the covariate-matched solution is fragile; the placebo test, which is exact for any backend, is the conservative cross-check.
The leave-two-out (LTO) refined placebo test#
What it is. The ordinary placebo test (above) gives each of the \(N\) units exactly one turn as the pseudo-treated unit and ranks the real treated unit’s fit statistic against those \(N\) values. That is its weakness: the p-value can only land on the grid \(\{1/N, 2/N, \dots, 1\}\), and at a conventional level like \(\alpha = 0.05\) with a small donor pool the test is either coarse or – when \(\alpha < 1/N\) – literally unable to reject (its size is zero). The Lei-Sudijono (2025) leave-two-out test keeps the same design-based logic but replaces the “one turn each” permutation with a tournament over triples. Think of every triple \(\{i, j, I\}\) (two controls and the treated unit) as a match: leave all three out of the donor pool, build a synthetic control for each of them from the remaining \(N-3\) units, score each by its post/pre RMSPE ratio, and the unit with the largest ratio “wins” the match. The treated unit should win often if the treatment had a real effect (a large post-period gap relative to a tight pre-period fit). The p-value is the fraction of matches the treated unit does not win,
\[p_{\mathrm{naive\text{-}LTO}} = \binom{N-1}{2}^{-1} \sum_{i<j;\ i,j \neq I} \mathbf{1}\bigl\{ R_{i,j,I;I} \le \max(R_{i,j,I;i},\, R_{i,j,I;j}) \bigr\},\]
where \(R_{i,j,I;k} = \lvert S_{\text{ratio-RMSPE}}(Y_k, \widehat Y_k)\rvert\) is the score of unit \(k\) when the pool excludes \(\{i, j, I\}\). Because there are \(\binom{N-1}{2}\) matches rather than \(N\), the p-value lives on an \(O(N^2)\)-fine grid – the granularity problem disappears.
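The tournament tally itself is a short loop over pairs of controls. In the sketch below the scoring step is abstracted into a hypothetical `score(unit, excluded)` callable; in the real test that call would refit a synthetic control for `unit` from the \(N-3\) units outside `excluded` and return its post/pre RMSPE ratio. Counting ties as matches the treated unit does not win is an assumption of this illustration.

```python
from itertools import combinations

def naive_lto_p_value(treated, units, score):
    """Fraction of triples {i, j, treated} in which the treated unit does not win."""
    controls = [u for u in units if u != treated]
    matches = 0
    not_won = 0
    for i, j in combinations(controls, 2):
        excluded = frozenset((i, j, treated))      # all three leave the donor pool
        r_treated = abs(score(treated, excluded))
        r_rival = max(abs(score(i, excluded)), abs(score(j, excluded)))
        matches += 1
        if r_treated <= r_rival:                   # tie counts as "not won"
            not_won += 1
    return not_won / matches

# Toy scores that ignore the excluded set (a real scorer would refit each time).
toy_scores = {"T": 5.0, "a": 1.0, "b": 2.0, "c": 7.0, "d": 3.0}
p_lto = naive_lto_p_value("T", list(toy_scores), lambda k, excluded: toy_scores[k])
```

With four controls there are six matches; the treated unit loses the three that contain unit "c", so the toy p-value is one half.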
Two p-values. res.inference.p_value is the naive LTO p-value above.
res.inference.details["p_powered"] is the powered variant
\(p_{\mathrm{naive\text{-}LTO}} - c(N, \alpha) + \delta\), which shifts the
naive value down by the largest amount the discrete Type-I bound allows
(powered_offset_c), strictly increasing power. The powered value is a
decision rule tied to one \(\alpha\) – reject when it is
\(\le \alpha\) (reject_at_alpha) – not a general-purpose p-value, so do
not compare it across levels or report it as “the” p-value.
LTO: design-based assumptions and econometric theory#
The LTO test is design-based, not outcome-model-based: the potential outcomes are treated as fixed, and all randomness comes from which unit got treated. Its validity rests on two assumptions.
Uniform assignment. The treated index \(I\) is uniformly distributed over \(\{1, \dots, N\}\) – a priori, any unit was equally likely to be the treated one. Under the null this makes the \(N\) units exchangeable, which is exactly what licenses the tournament. This holds by construction in (cluster-)randomized experiments. In observational work it is a modelling choice: it is most defensible when the treated unit is comparable to the donors (often after covariate adjustment), and in quasi-experimental settings – e.g. natural disasters, where which locality is hit is plausibly close to random over a small comparable region.
Sharp null. The hypothesis tested is Fisher’s sharp null \(H_0 : Y_{i,t}(1) = Y_{i,t}(0)\) for all \(t > T_0\) (no effect for any unit in any post period), or a known-\(\tau\) additive version \(Y_{i,t}(1) = Y_{i,t}(0) + \tau_{i,t}\). Sharpness is what lets the test impute every unit’s counterfactual under the null and so run the tournament.
Under these, the test has a finite-sample Type-I error guarantee (no large-\(N\), no long-\(T\), no asymptotics):
\[\Pr_{H_0}\bigl(p_{\mathrm{naive\text{-}LTO}} \le \alpha\bigr) \le \frac{\lfloor N f(N,\alpha) \rfloor}{N},\]
reported as type_i_bound. This bound is never worse than the
approximate-placebo bound \((\lfloor N\alpha\rfloor + 1)/N\), and for the
levels and sizes typical of SCM applications (\(\alpha \in \{0.01, 0.02\}\)
for \(6 < N < 200\); \(\alpha = 0.05\) for most \(N\)) it is
identical to it – so switching to LTO costs nothing in worst-case Type-I
error. Crucially, the placebo bound is tight whereas the LTO bound generally
is not: in practice the LTO test’s actual Type-I error is often strictly
below \(\alpha\), i.e. it can be unconditionally valid even when
\(\alpha < 1/N\).
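The placebo-side quantities in this comparison are simple enough to compute by hand. The sketch below evaluates the approximate-placebo bound \((\lfloor N\alpha\rfloor + 1)/N\), checks whether the ordinary placebo test can reject at all, and counts the LTO matches; the function names are illustrative, and the LTO bound itself is not reproduced because its explicit form is not given here.

```python
import math

def placebo_type_i_bound(n_units, alpha):
    """Approximate-placebo bound (floor(N * alpha) + 1) / N."""
    return (math.floor(n_units * alpha) + 1) / n_units

def placebo_can_reject(n_units, alpha):
    """The smallest placebo p-value is 1/N, so rejection needs alpha >= 1/N."""
    return alpha >= 1.0 / n_units

def lto_match_count(n_units):
    """Number of triples containing the treated unit: C(N - 1, 2)."""
    return math.comb(n_units - 1, 2)

small_pool = (placebo_type_i_bound(10, 0.05), placebo_can_reject(10, 0.05), lto_match_count(10))
```

For \(N = 10\) at \(\alpha = 0.05\) the placebo test cannot reject, while the LTO p-value is spread over 36 matches.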
Two further theoretical properties matter in practice:
Consistency where the placebo test fails (Theorem 6.1). When \(\alpha < 1/N\), the LTO test is uniformly consistent – its power goes to 1 as the effect size grows. The approximate placebo test is not: in this regime it can have essentially zero power no matter how large the true effect (zero if \(N\) is even, \(\le 1/N\) if odd). This is the single strongest reason to prefer LTO in small donor pools.
Confidence regions. Inverting the additive-\(\tau\) test (\(\{\theta : p_{\mathrm{naive\text{-}LTO}}(\theta) > \alpha\}\)) yields a region for the post-period effect path with guaranteed coverage \(\ge 1 - \lfloor N f(N,\alpha)\rfloor / N\). (mlsynth currently reports the p-values; the inversion is a straightforward extension.)
Methodologically, the LTO test is a new kind of randomization inference: it
generalises the Jackknife+ of Barber et al. (2021) (which leaves one point
out and so still has \(1/N\) granularity) and is distinct from classical
permutation/rank inference. It also – unlike most asymptotic SCM inference –
does not simplify the synthetic-control construction: the full
weight/predictor machinery (any VanillaSC backend) is re-run inside every
match, so the test reflects the estimator you actually use.
When the LTO assumptions are violated#
The sharp null is testable and usually uncontroversial; the uniform assignment assumption is where care is needed.
Selection on outcomes / non-comparable treated unit. If the treated unit was chosen because of its (anticipated) trajectory, or is structurally unlike every donor, exchangeability fails and the Type-I guarantee no longer holds. The usual remedy is to restore comparability through the specification – match on covariates, restrict the donor pool to genuinely similar units – before trusting any placebo-type p-value.
Known non-uniform assignment. When the treatment probabilities \(\pi_k\) are known or estimable (e.g. seismic risk for an earthquake study), Lei-Sudijono give a weighted LTO p-value \(p_{\text{w-LTO}}(\pi)\) that reweights each match by \(\pi_j\pi_k / ((1-\pi_I)^2 - \sum_{l\neq I}\pi_l^2)\) and reduces to the naive value when \(\pi_i \equiv 1/N\).
Sensitivity analysis (\(\Gamma\)). Rather than commit to uniformity, one can ask how far from it the design could be before the conclusion flips. Following Rosenbaum, constrain \(\pi_i \in [\tfrac{1}{\Gamma N}, \tfrac{\Gamma}{N}]\) and find the smallest \(\Gamma \ge 1\) at which the worst-case weighted p-value crosses \(\alpha\). In the paper, Prop 99 tolerates \(\Gamma \approx 1.4\) (robust) while German reunification flips at only \(\Gamma \approx 1.1\) (fragile). The weighted p-value and \(\Gamma\) search require solving a non-convex (NP-hard) quadratic program and are not yet implemented in VanillaSC; the uniform-assignment naive/powered p-values are.
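Although the worst-case search is not implemented, the match weights for a known assignment distribution are easy to write down. The sketch below is an illustration, not mlsynth code: it assumes the reweighting runs over ordered pairs \((j, k)\) of distinct controls, under which the weights sum to one, and it also returns the \(\Gamma\) box for \(\pi_i\). All names are hypothetical.

```python
from itertools import permutations

def weighted_match_weights(pi, treated):
    """Weight pi_j * pi_k / ((1 - pi_I)^2 - sum_{l != I} pi_l^2) per ordered pair."""
    others = [k for k in pi if k != treated]
    denom = (1.0 - pi[treated]) ** 2 - sum(pi[l] ** 2 for l in others)
    return {(j, k): pi[j] * pi[k] / denom for j, k in permutations(others, 2)}

def gamma_bounds(n_units, gamma):
    """Rosenbaum-style box [1 / (gamma * N), gamma / N] for each pi_i."""
    return 1.0 / (gamma * n_units), gamma / n_units

# Uniform assignment over N = 5 units recovers equal weights 1 / ((N-1)(N-2)).
uniform = {u: 0.2 for u in ["T", "a", "b", "c", "d"]}
w_uniform = weighted_match_weights(uniform, "T")

# A non-uniform design (made-up probabilities) still yields weights summing to one.
skewed = {"T": 0.4, "a": 0.3, "b": 0.2, "c": 0.1}
w_skewed = weighted_match_weights(skewed, "T")
```

The uniform case shows the reduction to the naive test: every match gets the same weight, so the weighted p-value collapses to the plain fraction of matches not won.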
Choosing among placebo, LTO, and SCPI#
Prefer LTO over the ordinary placebo whenever the donor pool is small – especially in the \(\alpha < 1/N\) regime (e.g. \(N < 20\) at \(\alpha = 0.05\)), where the placebo test cannot reject and LTO can. The powered variant almost Pareto-improves the placebo: same worst-case Type-I error, more power. Both share the same assumptions, so LTO is close to a free upgrade.
Keep the ordinary placebo when you want the most familiar, widely-reported statistic, when \(N\) is large enough that granularity is a non-issue, or as a cheap (\(O(N)\) vs \(O(N^2)\)) cross-check.
Reach for SCPI when the question is how big the effect is (a prediction interval / confidence statement on the magnitude), not just whether there is one. SCPI rests on different (model-based, conditional) foundations than the design-based placebo/LTO tests, so the two are complementary: LTO answers “is the treated unit special?” by randomization, SCPI quantifies the effect’s uncertainty.
When to use it#
You want the standard synthetic control done reliably, with the solver choice and identification fragility surfaced.
Outcome-only matching when you have a long, informative pre-period – this is the well-posed, reproducible case.
Covariate matching with mscmt when the donor pool is rich enough that the problem is well-conditioned (see the replications below). When \(\text{v\_agreement}\) comes back near 1, prefer outcome-only or penalized.
Empirical replications#
The three canonical studies, each trained on its full pre-treatment
period. All run from the datasets shipped under basedata/. These are
locked as regression tests in
mlsynth/tests/test_vanillasc_replications.py.
California / Proposition 99 (ADH 2010)#
Treatment in 1989; pre-period 1970-1988. Covariates averaged over
1980-1988 (beer 1984-1988) plus three lagged cigarette-sales predictors
(1975, 1980, 1988). With mscmt this reproduces ADH Table 2 almost
exactly – Utah 0.335, Nevada 0.236, Montana 0.202, Colorado 0.160,
Connecticut 0.068 (ADH: 0.334 / 0.234 / 0.199 / 0.164 / 0.069) – and an
ATT of about \(-19\) packs.
import pandas as pd
from mlsynth import VanillaSC

d = pd.read_csv("basedata/augmented_cali_long.csv")
# Lagged-outcome predictors: copy each state's cigsale in the given year onto all of its rows.
for yr, col in [(1975, "cig_1975"), (1980, "cig_1980"), (1988, "cig_1988")]:
    d[col] = d.state.map(d[d.year == yr].set_index("state").cigsale)
d["treated"] = ((d.state == "California") & (d.year >= 1989)).astype(int)
res = VanillaSC({
    "df": d, "outcome": "cigsale", "treat": "treated",
    "unitid": "state", "time": "year",
    "backend": "mscmt", "canonical_v": "min.loss.w", "seed": 1,
    "covariates": ["p_cig", "pct15-24", "loginc", "pc_beer",
                   "cig_1975", "cig_1980", "cig_1988"],
    "covariate_windows": {"p_cig": (1980, 1988), "pct15-24": (1980, 1988),
                          "loginc": (1980, 1988), "pc_beer": (1984, 1988)},
    "display_graphs": False,
}).fit()
print(res.effects.att)            # ~ -19
print(res.weights.donor_weights)  # Utah/Nevada/Montana/Colorado/Connecticut
(In augmented_cali_long.csv the columns are labelled such that
p_cig is log GDP per capita and loginc is the retail price – the
predictor means reproduce ADH’s Table 1 “Real California” column.)
German reunification (ADH 2015)#
Treatment (reunification) in 1990; pre-period 1960-1989. GDP, trade,
inflation and industry share averaged over 1981-1990; investment rate (invest80) taken at 1980 and
schooling averaged over 1980-1985. With mscmt the synthetic West Germany is
Austria-dominant with the USA, Switzerland, Japan and the Netherlands –
the ADH 2015 set – and a negative ATT (reunification lowered per-capita
GDP relative to the synthetic).
import pandas as pd
from mlsynth import VanillaSC

d = pd.read_stata("basedata/repgermany.dta")
d["treated"] = ((d.country == "West Germany") & (d.year >= 1990)).astype(int)
res = VanillaSC({
    "df": d, "outcome": "gdp", "treat": "treated",
    "unitid": "country", "time": "year",
    "backend": "mscmt", "seed": 1,
    "covariates": ["gdp", "trade", "infrate", "industry", "invest80", "schooling"],
    "covariate_windows": {"gdp": (1981, 1990), "trade": (1981, 1990),
                          "infrate": (1981, 1990), "industry": (1981, 1990),
                          "invest80": (1980, 1980), "schooling": (1980, 1985)},
    "display_graphs": False,
}).fit()
print(res.weights.donor_weights)  # Austria/USA/Switzerland/Japan/Netherlands
Basque terrorism (Abadie-Gardeazabal 2003)#
The treatment indicator (terrorism) first turns on in 1975, so the
model trains on the full 1955-1974 pre-period. On this long pre-period
the problem is well-conditioned and the synthetic Basque is Cataluna
\(\approx 0.8\), Madrid \(\approx 0.2\) – the published
Abadie-Gardeazabal result – with an ATT of about \(-0.68\) (the
roughly 10% per-capita GDP gap). Outcome-only already recovers this;
mscmt with the special-predictor covariates confirms it.
Note
This is instructive: on the short 1960-1969 window used by some later
papers the Basque donor weights are fragile (they drift to
Baleares/Madrid), but on the full 1955-1974 pre-period the long outcome
path pins \(W\) to the Cataluna/Madrid solution. The training
window matters; VanillaSC uses the full pre-period defined by the
treatment indicator.
import pandas as pd
from mlsynth import VanillaSC

b = pd.read_csv("basedata/basque_data.csv")
b = b[b.regionno != 1]  # drop Spain
b["treated"] = ((b.regionno == 17) & (b.year >= 1975)).astype(int)
res = VanillaSC({
    "df": b, "outcome": "gdpcap", "treat": "treated",
    "unitid": "regionno", "time": "year",
    "backend": "outcome-only", "display_graphs": False,
}).fit()
print(res.effects.att)            # ~ -0.68
print(res.weights.donor_weights)  # region 10 (Cataluna) ~0.8, 14 (Madrid) ~0.2
Core API#
VanillaSC: the standard synthetic control, on the bilevel engine.
The ordinary single-treated synthetic control method (Abadie & Gardeazabal 2003; Abadie, Diamond & Hainmueller 2010), implemented on mlsynth’s self-contained bilevel machinery:
No covariates -> the well-posed convex problem: donor weights W minimise the pre-treatment outcome fit on the simplex. Unique up to donor collinearity, deterministic, reproducible.
Covariates -> the bilevel program (predictor weights V + donor weights W), solved by a reliable backend: "mscmt" (global differential evolution, Becker-Kloessner 2018), "malo" (corner search, Malo et al. 2024), or "penalized" (unique/sparse, Abadie-L’Hour 2021).
Because predictor weights are generically non-identified, VanillaSC reports a
v_agreement diagnostic (the gap between the two MSCMT canonical V
choices): small means V is well identified, large means the predictor
weights – and the donor weights they imply – are fragile.
- class mlsynth.estimators.vanillasc.VanillaSC(config: VanillaSCConfig | dict)#
Bases: object
Standard synthetic control estimator (bilevel engine).
- Parameters:
config (VanillaSCConfig or dict) – Configuration. See
mlsynth.config_models.VanillaSCConfig.
Examples
>>> from mlsynth.estimators.vanillasc import VanillaSC
>>> cfg = {"df": panel, "outcome": "gdp", "treat": "treated",
...        "unitid": "country", "time": "year",
...        "covariates": ["trade", "infrate"], "backend": "mscmt"}
>>> res = VanillaSC(cfg).fit()
>>> res.effects.att
- fit() BaseEstimatorResults#
Estimate the synthetic control and return standardized results.
Configuration#
- class mlsynth.config_models.VanillaSCConfig(*, df: pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', backend: Literal['auto', 'outcome-only', 'malo', 'mscmt', 'penalized'] = 'auto', covariates: List[str] | None = None, covariate_windows: Dict[Any, Any] | None = None, canonical_v: bool | str = False, seed: int = 0, mscmt_maxiter: Annotated[int, Ge(ge=1)] = 300, mscmt_popsize: Annotated[int, Ge(ge=1)] = 15, inference: bool | str = True, alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05, scpi_sims: Annotated[int, Ge(ge=1)] = 200, scpi_e_method: Literal['gaussian', 'empirical'] = 'gaussian', lto_max_pairs: Annotated[int | None, Ge(ge=1)] = None)#
Configuration for the VanillaSC estimator (standard SCM, bilevel engine).
The ordinary single-treated synthetic control, built on the self-contained bilevel machinery. With no covariates it reduces to the well-posed convex outcome-matching problem; with covariates it routes through the bilevel predictor-weight (V) optimisation, with a selectable, reliable backend.
- Parameters:
backend ({“auto”, “outcome-only”, “malo”, “mscmt”, “penalized”}) – Predictor-weight backend. "auto" (default) uses "outcome-only" (convex simplex fit on pre-treatment outcomes) when no covariates are given, and "mscmt" (global differential-evolution V search) when they are. "malo" is the Malo et al. (2024) corner search, "penalized" the Abadie-L’Hour (2021) unique/sparse estimator.
covariates (list of str, optional) – Predictor columns. Each is averaged over its window (see covariate_windows) and scaled to unit variance, then matched via the bilevel program. None -> outcome-only matching.
covariate_windows (dict, optional) – Per-covariate inclusive (start, end) averaging window of time labels (Abadie’s special-predictor spec). Covariates not listed are averaged over the full pre-treatment period.
canonical_v (bool or {“min.loss.w”, “max.order”}) – Canonicalise the (non-identified) predictor weights for mscmt (MSCMT determine_v). The reported v_agreement is small when V is well identified and large when it is fragile. Default False.
seed (int) – RNG seed for the mscmt differential-evolution search.
mscmt_maxiter, mscmt_popsize (int) – Differential-evolution budget for the mscmt backend.
inference (bool or {“placebo”, “scpi”, “lto”}) – Inference method. True/"placebo" (default) runs Abadie in-space placebo inference (refit treating each donor as pseudo-treated; the p-value ranks the treated unit’s post/pre RMSPE ratio). "scpi" runs Cattaneo-Feng-Titiunik (2021) prediction intervals (in-sample simulation + out-of-sample location-scale; exact for the simplex / outcome-only synthetic control). "lto" runs the Lei-Sudijono (2025) leave-two-out refined placebo test (O(J^2) reference comparisons; finer granularity and non-zero size when alpha < 1/N). False skips inference.
alpha (float) – Level. For placebo, the confidence statement; for SCPI, used as both the in-sample (alpha1) and out-of-sample (alpha2) levels, giving a prediction interval with coverage approximately 1 - 2*alpha.
scpi_sims (int) – Number of Gaussian draws for the SCPI in-sample simulation.
scpi_e_method ({“gaussian”, “empirical”}) – Out-of-sample location-scale tabulation for SCPI.
lto_max_pairs (int, optional) – Cap on the number of donor pairs evaluated by the "lto" test (deterministic subsample via seed). None (default) uses all J*(J-1)/2 pairs; set a cap to keep the O(J^2) cost tractable with slow backends.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Engine#
Bilevel SCM engine for the VanillaSC estimator.
A thin, dataprep-agnostic wrapper that turns the self-contained bilevel
machinery (mlsynth.utils.fscm_helpers.bilevel) into the standard
single-treated synthetic control, with a selectable predictor-weight backend:
"outcome-only" – no covariates: the donor weights solve the convex simplex least-squares fit on the pre-treatment outcomes (Abadie’s outcome matching). Well-posed and unique up to donor collinearity.
"malo" / "mscmt" – covariate matching via the bilevel program (predictor weights V + donor weights W): Malo et al. (2024) corner search or Becker-Kloessner (2018) global differential evolution.
"penalized" – Abadie-L’Hour (2021) pairwise-penalized estimator (a unique, sparse W); works with or without covariates.
The engine takes plain NumPy arrays so it can be unit-tested in isolation and
reused outside the estimator. The VanillaSC estimator feeds it the matrices
produced by mlsynth.utils.datautils.dataprep().
- class mlsynth.utils.vanillasc_helpers.engine.BilevelSCM(backend: str = 'auto', *, canonical_v=False, seed: int = 0, **solver_kwargs: Any)#
Standard single-treated synthetic control via the bilevel solver.
- Parameters:
backend ({“auto”, “outcome-only”, “malo”, “mscmt”, “penalized”}) – Predictor-weight backend. "auto" (default) picks "outcome-only" when no covariates are supplied and "mscmt" when they are.
canonical_v (bool or {“min.loss.w”, “max.order”}) – Canonicalise the (non-identified) predictor weights V for the mscmt backend (see mlsynth.utils.fscm_helpers.bilevel.determine_v.canonical_v()). Ignored by other backends. Default False.
seed (int) – RNG seed for the mscmt differential-evolution search.
solver_kwargs – Extra keyword arguments forwarded to the backend (e.g. maxiter, popsize for mscmt; lam for penalized).
- fit(y_pre: ndarray, Y0_pre: ndarray, *, X1: ndarray | None = None, X0: ndarray | None = None, donor_names: List[str] | None = None, predictor_names: List[str] | None = None) BilevelSCMResult#
Solve for the donor weights W.
- Parameters:
y_pre (np.ndarray) – Treated pre-treatment outcomes, shape (T0,).
Y0_pre (np.ndarray) – Donor pre-treatment outcomes, shape (T0, J).
X1 (np.ndarray, optional) – Treated predictor (covariate) values, shape (P,).
X0 (np.ndarray, optional) – Donor predictor matrix, shape (P, J).
donor_names (list of str, optional) – Donor labels (defaults to donor_0 ...).
predictor_names (list of str, optional) – Predictor labels.
- class mlsynth.utils.vanillasc_helpers.engine.BilevelSCMResult(W: numpy.ndarray, donor_weights: Dict[str, float], V: numpy.ndarray | None, predictor_names: List[str], backend: str, pre_rmspe: float, v_agreement: float | None = None, diagnostics: Dict[str, Any] = <factory>)#
Output of BilevelSCM.fit().
- W#
Donor weights, shape (J,), on the simplex.
- Type:
np.ndarray
- V#
Predictor (covariate) weights, shape (P,), normalised to the simplex; None for outcome-only matching.
- Type:
np.ndarray or None
- v_agreement#
Identification diagnostic: max abs difference between the min.loss.w and max.order canonical predictor weights. Small = V well identified; large = predictor weights fragile. None for outcome-only.
- Type:
float or None
- W: ndarray#
- counterfactual(donor_matrix: ndarray) ndarray#
Synthetic outcome path
donor_matrix @ Wover all periods.donor_matrixhas shape(T, J)(donors in columns, matching the order ofW).
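As a rough illustration of what the outcome-only path and counterfactual() compute, the sketch below fits simplex-constrained least squares by projected gradient descent and then forms donor_matrix @ W. It is not mlsynth's engine: the helper names (project_to_simplex, simplex_least_squares), the solver choice, and the toy data are all assumptions made for this example.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept strictly positive
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def simplex_least_squares(y_pre: np.ndarray, Y0_pre: np.ndarray, n_iter: int = 5000) -> np.ndarray:
    """Minimise 0.5 * ||y_pre - Y0_pre @ w||^2 over the simplex (hypothetical helper)."""
    J = Y0_pre.shape[1]
    w = np.full(J, 1.0 / J)                          # start from uniform weights
    step = 1.0 / np.linalg.norm(Y0_pre, 2) ** 2      # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = Y0_pre.T @ (Y0_pre @ w - y_pre)
        w = project_to_simplex(w - step * grad)
    return w


# Toy panel: the treated unit is an exact convex combination of the donors.
rng = np.random.default_rng(0)
T0, T, J = 30, 40, 5
donors = rng.normal(size=(T, J))                     # shape (T, J), donors in columns
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
y = donors @ w_true

W = simplex_least_squares(y[:T0], donors[:T0])
counterfactual = donors @ W                          # synthetic path over all T periods
pre_rmspe = float(np.sqrt(np.mean((y[:T0] - counterfactual[:T0]) ** 2)))
print(np.round(W, 3), round(pre_rmspe, 6))
```

Because the toy treated unit lies exactly in the donors' convex hull, the recovered weights should match w_true and the pre-treatment RMSPE should be essentially zero; real panels will not fit this cleanly.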
Orchestration for the VanillaSC estimator.
dataprep -> (optional) covariate matrices -> bilevel engine -> ATT, fit diagnostics, in-space placebo inference -> standardized results.
- mlsynth.utils.vanillasc_helpers.pipeline.run_vanillasc(config) BaseEstimatorResults#
Fit VanillaSC and assemble
BaseEstimatorResults.
SCPI prediction intervals for the (simplex) synthetic control.
Based on Cattaneo, Feng & Titiunik (2021, JASA) and Cattaneo, Feng, Palomba &
Titiunik (2025, JSS; the scpi package). The prediction error of the synthetic-control
counterfactual decomposes as
tau_hat_T - tau_T = e_T - p_T' (beta_hat - beta_0),
an out-of-sample shock e_T plus an in-sample weight-estimation error.
This module is a from-scratch (MIT-licensed) re-derivation of the algorithm
described in those papers – it does not import the GPL scpi package –
and has been validated to reproduce scpi’s CI_all_gaussian for the
canonical simplex control to within Monte-Carlo error.
The counterfactual prediction band is assembled period-by-period as
[ Y_fit + w_lb + e_lb , Y_fit + w_ub + e_ub ],
with the treatment-effect interval [Y_obs - cf_upper, Y_obs - cf_lower].
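The band arithmetic above can be sketched in a few lines. This is only an illustration of the two formulas, with a hypothetical helper name (assemble_band) and made-up numbers; note that the upper counterfactual bound produces the lower treatment-effect bound.

```python
from typing import Dict, List, Sequence


def assemble_band(
    y_obs: Sequence[float],
    y_fit: Sequence[float],
    w_lb: Sequence[float],
    w_ub: Sequence[float],
    e_lb: Sequence[float],
    e_ub: Sequence[float],
) -> Dict[str, List[float]]:
    """Combine in-sample (w_*) and out-of-sample (e_*) bounds period by period."""
    cf_lower = [f + wl + el for f, wl, el in zip(y_fit, w_lb, e_lb)]
    cf_upper = [f + wu + eu for f, wu, eu in zip(y_fit, w_ub, e_ub)]
    tau = [o - f for o, f in zip(y_obs, y_fit)]
    # The effect interval flips the counterfactual band around the observed outcome.
    lower = [o - cu for o, cu in zip(y_obs, cf_upper)]
    upper = [o - cl for o, cl in zip(y_obs, cf_lower)]
    return {"tau": tau, "lower": lower, "upper": upper,
            "cf_lower": cf_lower, "cf_upper": cf_upper}


# Illustrative numbers only (two post-treatment periods).
band = assemble_band(
    y_obs=[90.0, 80.0],
    y_fit=[100.0, 98.0],
    w_lb=[-2.0, -3.0], w_ub=[2.0, 3.0],
    e_lb=[-4.0, -5.0], e_ub=[4.0, 5.0],
)
print(band["lower"], band["tau"], band["upper"])
```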
In-sample component (w_lb/w_ub)#
With Z = B (donor pre-outcomes), Q = Z'Z / T0 and pre-period
residuals u = A - B w_hat, draw G* ~ N(0, Sigma) with
Sigma = Z' diag(omega) Z / T0**2 and omega_t = (T0/(T0-df)) (u_t -
E[u_t])**2 (HC1; E[u] from a regression of u on the active-donor
design when u_missp). For each draw and post-period predictor p_T
solve, over the localised simplex set,
min / max p_T' x s.t. (x - w_hat)' Q (x - w_hat) - 2 G*' (x - w_hat) <= 0,
sum(x) = 1, x >= lb,
where lb_j = w_hat_j if w_hat_j < rho else 0 (the local geometry of
Cattaneo et al.; rho is the data-driven regularisation parameter, capped at
rho_max = 0.2). Q is reduced via a thresholded eigen-square-root so that
collinear (near-null) donor directions are left unconstrained, exactly as in the
reference conic reformulation. w_lb and w_ub are the alpha1/2 and
1 - alpha1/2 quantiles, across draws, of p_T'(w_hat - x) for the
maximising and minimising branch respectively.
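To make the ingredients of the simulation concrete, here is a minimal NumPy sketch that builds Q, the HC1-weighted Sigma, one draw G*, and the local lower bounds lb from the formulas above. It deliberately stops before the constrained min/max program. The function name, the df value (a stand-in, since how mlsynth counts degrees of freedom is not stated here), and the toy data are assumptions.

```python
import numpy as np


def insample_ingredients(A, B, w_hat, df, rho, rng, mean_u=None):
    """Return (Q, Sigma, G_star, lb) for one simulation draw (hypothetical helper).

    A: treated pre-outcomes (T0,); B: donor pre-outcomes (T0, J).
    mean_u: optional estimate of E[u_t]; None means it is assumed to be zero.
    """
    T0 = B.shape[0]
    Q = B.T @ B / T0
    u = A - B @ w_hat                                  # pre-period residuals
    centred = u if mean_u is None else u - mean_u
    omega = (T0 / (T0 - df)) * centred ** 2            # HC1-style weights
    Sigma = B.T @ (omega[:, None] * B) / T0 ** 2       # Z' diag(omega) Z / T0^2
    G_star = rng.multivariate_normal(np.zeros(B.shape[1]), Sigma)
    lb = np.where(w_hat < rho, w_hat, 0.0)             # pin weights below rho at their value
    return Q, Sigma, G_star, lb


rng = np.random.default_rng(1)
T0, J = 20, 4
B = rng.normal(size=(T0, J))
w_hat = np.array([0.60, 0.35, 0.05, 0.00])
A = B @ w_hat + 0.1 * rng.normal(size=T0)

# df=3 is an assumed stand-in (the number of non-zero weights in w_hat).
Q, Sigma, G_star, lb = insample_ingredients(A, B, w_hat, df=3, rho=0.1, rng=rng)
print(lb)
```

With rho = 0.1, only the 0.05 weight falls below the threshold, so it is the only strictly positive entry of lb; the two large weights are free to move down to zero.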
Out-of-sample component (e_lb/e_ub)#
A location-scale model for e_T: regress u on the active-donor design
to get the conditional mean E[e] and a log-variance model for the scale
sqrt(Var[e]) (Gaussian), capped by the inter-quartile range of the
residuals (IQR / 1.34). The Gaussian band is E[e] +/- sqrt(-2 ln alpha2)
* scale; "ls" uses standardized-residual quantiles, "empirical" the
raw residual quantiles.
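The Gaussian multiplier is easy to evaluate by hand. The sketch below computes E[e] +/- sqrt(-2 ln alpha2) * scale for illustrative inputs and compares the multiplier with the usual two-sided normal quantile; the function name is hypothetical and the IQR cap on the scale is not modelled.

```python
import math
from statistics import NormalDist
from typing import Tuple


def gaussian_shock_band(mean_e: float, scale_e: float, alpha2: float) -> Tuple[float, float]:
    """Band E[e] +/- sqrt(-2 ln alpha2) * scale for the out-of-sample shock."""
    k = math.sqrt(-2.0 * math.log(alpha2))
    return mean_e - k * scale_e, mean_e + k * scale_e


alpha2 = 0.05
multiplier = math.sqrt(-2.0 * math.log(alpha2))
normal_quantile = NormalDist().inv_cdf(1.0 - alpha2 / 2.0)

# Illustrative conditional mean and scale, not estimates from any dataset.
e_lb, e_ub = gaussian_shock_band(mean_e=0.0, scale_e=2.0, alpha2=alpha2)
print(round(multiplier, 4), round(normal_quantile, 4))  # 2.4477 1.96
```

At alpha2 = 0.05 the multiplier is larger than the 1.96 normal quantile, so this band is more conservative than a plain normal interval with the same scale.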
This implements the canonical simplex case (w >= 0, sum w = 1), the
scpi default and the standard synthetic control.
- class mlsynth.utils.vanillasc_helpers.scpi.SCPIResult(tau: ndarray, lower: ndarray, upper: ndarray, cf_lower: ndarray, cf_upper: ndarray, M1_lower: ndarray, M1_upper: ndarray, M2_lower: ndarray, M2_upper: ndarray, metadata: Dict[str, Any])#
Per-post-period SCPI prediction intervals (arrays of length T_post).
- M1_lower: ndarray#
- M1_upper: ndarray#
- M2_lower: ndarray#
- M2_upper: ndarray#
- cf_lower: ndarray#
- cf_upper: ndarray#
- lower: ndarray#
- tau: ndarray#
- upper: ndarray#
- mlsynth.utils.vanillasc_helpers.scpi.scpi_intervals(y: ndarray, Y0: ndarray, pre: int, W: ndarray, *, sims: int = 200, u_alpha: float = 0.05, e_alpha: float = 0.05, u_missp: bool = True, e_method: str = 'gaussian', seed: int = 0) SCPIResult#
Compute SCPI prediction intervals for a simplex synthetic control.
- Parameters:
y (np.ndarray) – Treated outcome over all periods, shape (T,).
Y0 (np.ndarray) – Donor outcomes over all periods, shape (T, J) (columns match W).
pre (int) – Number of pre-treatment periods T0.
W (np.ndarray) – Fitted simplex donor weights, shape (J,).
sims (int) – Number of Gaussian draws for the in-sample simulation.
u_alpha, e_alpha (float) – In-sample (alpha1) and out-of-sample (alpha2) levels.
u_missp (bool) – If True, allow E[u | H] != 0 (estimated by regressing the pre-period residuals on the active-donor design); else assume 0.
e_method ({"gaussian", "ls", "empirical"}) – Tabulation for the out-of-sample shock.
seed (int) – RNG seed for the simulation.
Leave-Two-Out (LTO) refined placebo test for the synthetic control.
Lei & Sudijono (2025), “Inference for Synthetic Controls via Refined Placebo Tests” (arXiv:2401.07152). The ordinary placebo / permutation test builds its null distribution from only \(N\) reference estimates, so its p-value lives on the coarse grid \(\{1/N, 2/N, \dots, 1\}\) and has zero size when \(\alpha < 1/N\). The LTO test bypasses this by leaving two control units out at a time, producing \(O(N^2)\) reference comparisons while retaining the same finite-sample Type-I error guarantee under uniform assignment.
Procedure (naive LTO, eqs. 5-7)#
Let \(I\) be the treated unit and \([N]\setminus\{I\}\) the controls (\(N = J + 1\) with \(J\) donors). For every unordered pair of distinct controls \(\{i, j\}\):
Build the synthetic control for each \(k \in \{i, j, I\}\) using the donor pool \([N]\setminus\{i, j, I\}\) (all controls except \(i, j\)), and form the residual \(R_{i,j,I;k} = \lvert S(Y_k, \hat Y_k)\rvert\) with \(S\) the post/pre RMSPE-ratio statistic.
Let \(R^{\mathrm{LTO}}_{i,j} = \max(R_{i,j,I;i}, R_{i,j,I;j})\); the treated unit “wins” the triple when \(R_{i,j,I;I} > R^{\mathrm{LTO}}_{i,j}\).
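The naive LTO p-value is the fraction of pairs the treated unit does not win,
\[p_{\mathrm{naive\text{-}LTO}} = \binom{N-1}{2}^{-1} \sum_{\{i,j\}} \mathbf{1}\{R_{i,j,I;I} \le R^{\mathrm{LTO}}_{i,j}\},\]
which (Theorem 2.2) satisfies \(\mathbb{P}_{H_0}(p_{\mathrm{naive\text{-}LTO}} \le \alpha) \le \lfloor N f(N, \alpha)\rfloor / N\).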
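The counting step of the procedure can be sketched without any synthetic-control machinery. In the example below, residual(i, j, k) is a hypothetical callback standing in for \(R_{i,j,I;k}\) (in practice each call would refit the synthetic control on the reduced donor pool), and rmspe_ratio is one plausible reading of the post/pre RMSPE-ratio statistic \(S\). Neither is mlsynth's implementation.

```python
from itertools import combinations
from math import comb, sqrt
from typing import Callable, Hashable, Sequence


def rmspe_ratio(y: Sequence[float], y_hat: Sequence[float], pre: int) -> float:
    """Post/pre RMSPE ratio of the gap y - y_hat (first `pre` entries are pre-treatment)."""
    gaps_sq = [(a - b) ** 2 for a, b in zip(y, y_hat)]
    pre_rmspe = sqrt(sum(gaps_sq[:pre]) / pre)
    post_rmspe = sqrt(sum(gaps_sq[pre:]) / (len(gaps_sq) - pre))
    return post_rmspe / pre_rmspe


def naive_lto_p_value(
    controls: Sequence[Hashable],
    treated: Hashable,
    residual: Callable[[Hashable, Hashable, Hashable], float],
) -> float:
    """Fraction of unordered control pairs {i, j} that the treated unit does not win."""
    not_won = 0
    n_pairs = 0
    for i, j in combinations(controls, 2):
        r_lto = max(residual(i, j, i), residual(i, j, j))
        if residual(i, j, treated) <= r_lto:   # treated wins only with a strict ">"
            not_won += 1
        n_pairs += 1
    return not_won / n_pairs


# Toy residuals that ignore the pair (a real residual depends on the donor pool).
scores = {"I": 5.0, "a": 1.0, "b": 2.0, "c": 6.0, "d": 3.0}
p_naive = naive_lto_p_value(["a", "b", "c", "d"], "I", lambda i, j, k: scores[k])
ratio = rmspe_ratio([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, -1.0, -1.0], pre=2)
print(p_naive, ratio, comb(38, 2), comb(16, 2))
```

In the toy data only unit c beats the treated unit, so the treated unit loses exactly the three pairs that contain c. The last two printed numbers are the pair counts for 38 and 16 controls, which is where the finer p-value grid comes from.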
Powered LTO (Theorem 2.3)#
For testing at a fixed level \(\alpha\), the powered p-value \(p_{\mathrm{powered\text{-}LTO}}(\alpha) = p_{\mathrm{naive\text{-}LTO}} - c(N, \alpha) + \delta\) shifts the naive value down by the largest amount that leaves the discrete Type-I bound unchanged, strictly increasing power. It is only valid for the \(\alpha\) it was computed at (reject when it is \(\le \alpha\)).
- mlsynth.utils.vanillasc_helpers.lto.lto_f(N: int, alpha: float) float#
Type-I error rate function \(f(N, \alpha)\) (Lei-Sudijono eq. 9).
- mlsynth.utils.vanillasc_helpers.lto.lto_placebo_test(engine: Any, y: ndarray, Y0: ndarray, pre: int, *, X1: ndarray | None = None, X0: ndarray | None = None, alpha: float = 0.05, max_pairs: int | None = None, seed: int = 0) Dict[str, Any]#
Run the Lei-Sudijono (2025) LTO refined placebo test.
- Parameters:
engine (BilevelSCM) – Fitted-config synthetic-control engine; engine.fit(...) is re-run for each leave-two-out subproblem (any backend works, but the cost is \(O(J^2)\) fits, so fast backends are recommended).
y (np.ndarray) – Treated outcome over all periods, shape (T,).
Y0 (np.ndarray) – Donor outcomes, shape (T, J).
pre (int) – Number of pre-treatment periods.
X1, X0 (np.ndarray, optional) – Treated predictor vector (P,) and donor predictor matrix (P, J) (already windowed and scaled). None for outcome-only matching.
alpha (float) – Level at which the powered LTO p-value and Type-I bound are reported.
max_pairs (int, optional) – Cap on the number of donor pairs evaluated (deterministic subsample, for expensive backends). None -> all \(\binom{J}{2}\) pairs.
seed (int) – RNG seed for the pair subsample when max_pairs is set.
- Returns:
dict –
p_value (naive LTO), p_powered (valid only at alpha), c (powered offset), type_i_bound, n_pairs, treated_losses, N, alpha, reject (powered decision at alpha), and subsampled.
- mlsynth.utils.vanillasc_helpers.lto.lto_powered_offset(N: int, alpha: float) float#
c(N, alpha): largest shift leaving the discrete Type-I bound fixed.
Defined (Theorem 2.3) as the smallest c with f(N, alpha + c) = (floor(N f(N, alpha)) + 1) / N. Found by bisection on the monotone increasing f. Reproduces the paper's values (c(39, 0.05) = 0.002, c(17, 0.05) = 0.0125).
- mlsynth.utils.vanillasc_helpers.lto.lto_type_i_bound(N: int, alpha: float) float#
Discrete Type-I error upper bound \(\lfloor N f(N,\alpha)\rfloor/N\).
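The bound and the bisection for the offset can be sketched generically, given any monotone increasing rate function. The real \(f(N, \alpha)\) is the paper's eq. 9, which is not reproduced in this section, so the example passes in a placeholder (toy_f, the identity in alpha) purely to exercise the code. Its output therefore does not match the paper's c values, and the function names are hypothetical.

```python
import math
from typing import Callable

RateFn = Callable[[int, float], float]


def type_i_bound(f: RateFn, N: int, alpha: float) -> float:
    """Discrete bound floor(N * f(N, alpha)) / N."""
    return math.floor(N * f(N, alpha)) / N


def powered_offset(f: RateFn, N: int, alpha: float, tol: float = 1e-12) -> float:
    """Smallest c with f(N, alpha + c) reaching the next grid point, by bisection."""
    target = (math.floor(N * f(N, alpha)) + 1) / N
    lo, hi = 0.0, 1.0 - alpha
    if f(N, alpha + hi) < target:
        raise ValueError("target level is not reachable for alpha + c <= 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(N, alpha + mid) >= target:   # f is monotone, so shrink from the right
            hi = mid
        else:
            lo = mid
    return hi


def toy_f(N: int, alpha: float) -> float:
    """PLACEHOLDER rate function, not Lei-Sudijono eq. 9."""
    return alpha


bound = type_i_bound(toy_f, 39, 0.05)
c = powered_offset(toy_f, 39, 0.05)
print(bound, c)
```

With the placeholder, the bound is 1/39 and the offset is the distance from 0.05 to the next grid point 2/39. Substituting the true rate function is what would yield the reported c(39, 0.05) = 0.002.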
SCPI prediction intervals#
To request Cattaneo-Feng-Titiunik prediction intervals instead of the placebo
test, set inference="scpi". On Prop 99 (outcome-only) this yields an ATT
around \(-19\) with a 90% prediction interval that excludes zero, and
per-period intervals that widen as the post-period extends.
import pandas as pd
from mlsynth import VanillaSC
d = pd.read_csv("basedata/augmented_cali_long.csv")
d["treated"] = ((d.state == "California") & (d.year >= 1989)).astype(int)
res = VanillaSC({
    "df": d[["state", "year", "cigsale", "treated"]],
    "outcome": "cigsale", "treat": "treated", "unitid": "state", "time": "year",
    "backend": "outcome-only", "inference": "scpi", "alpha": 0.05,
    "scpi_sims": 200, "display_graphs": False,
}).fit()
print(res.inference.ci_lower, res.inference.ci_upper)  # ATT prediction interval
det = res.inference.details  # per-period sequence
for yr, lo, up in zip(det["periods"], det["pi_lower"], det["pi_upper"]):
    print(yr, round(lo, 1), round(up, 1))
SCPI with the covariate backends (MSCMT and Malo)#
The same inference="scpi" switch composes with the covariate-matching
backends. Running each of the three canonical studies under both mscmt and
malo (alpha=0.05 -> 90% intervals, scpi_sims=200, seed=1) gives
the table below. The ATT prediction interval excludes zero in every case,
and the two backends agree to within Monte-Carlo / weight-choice differences –
a useful robustness cross-check. Note the v_agreement column: for Prop 99
and Germany under mscmt the predictor weights are non-identified
(\(\approx 1\)), so those intervals should be read with the caveat above.
| Study (backend) | ATT | ATT 90% PI | v_agreement | top donors |
|---|---|---|---|---|
| California (mscmt) | \(-18.98\) | \([-27.31,\,-5.28]\) | \(\approx 1\) (fragile) | Utah .34, Nevada .24, Montana .20 |
| California (malo) | \(-19.60\) | \([-31.32,\,-3.27]\) | n/a | Utah .38, Montana .25, Nevada .21 |
| Germany (mscmt) | \(-1396\) | \([-2368,\,-949]\) | \(\approx 1\) (fragile) | Austria .40, Switz .16, USA .15 |
| Germany (malo) | \(-1306\) | \([-2025,\,-521]\) | n/a | USA .35, Austria .33, Switz .11 |
| Basque (mscmt) | \(-0.70\) | \([-1.13,\,-0.32]\) | \(0.63\) | Cataluna .84, Madrid .16 |
| Basque (malo) | \(-0.63\) | \([-1.14,\,-0.18]\) | \(\approx 0\) (clean) | Cataluna .47, Madrid .33 |
The Basque case is the cleanest: with the special-predictor covariates,
malo returns a well-identified \(V\) (v_agreement \(\approx 0\))
and mscmt recovers the published Cataluna/Madrid split, both with tight
intervals that exclude zero. The early German post-years (1990-1992) are not
significant under either backend – the interval includes zero – and only turn
decisively negative later, exactly as the reunification narrative implies.
import pandas as pd
from mlsynth import VanillaSC
# --- California / Prop 99 (ADH 2010) ---
d = pd.read_csv("basedata/augmented_cali_long.csv")
for yr, col in [(1975, "cig_1975"), (1980, "cig_1980"), (1988, "cig_1988")]:
    d[col] = d.state.map(d[d.year == yr].set_index("state").cigsale)
d["treated"] = ((d.state == "California") & (d.year >= 1989)).astype(int)
cov = ["p_cig", "pct15-24", "loginc", "pc_beer", "cig_1975", "cig_1980", "cig_1988"]
win = {"p_cig": (1980, 1988), "pct15-24": (1980, 1988),
       "loginc": (1980, 1988), "pc_beer": (1984, 1988)}
common = dict(df=d, outcome="cigsale", treat="treated", unitid="state", time="year",
              covariates=cov, covariate_windows=win, inference="scpi",
              alpha=0.05, scpi_sims=200, seed=1, display_graphs=False)
mscmt = VanillaSC({**common, "backend": "mscmt", "canonical_v": "min.loss.w"}).fit()
malo = VanillaSC({**common, "backend": "malo"}).fit()
for name, r in [("mscmt", mscmt), ("malo", malo)]:
    i = r.inference
    print(name, round(r.effects.att, 2), (round(i.ci_lower, 2), round(i.ci_upper, 2)),
          "v_agreement=", r.weights.summary_stats.get("v_agreement"))
# --- German reunification (ADH 2015): outcome "gdp", same pattern ---
# --- Basque (AG 2003): outcome "gdpcap", special-predictor covariates ---
# (swap df/outcome/covariates; everything else is identical.)
The per-period sequence is always in res.inference.details; switching
backend changes \(\widehat w\) (and hence the centre and width of the band)
but not the inference code path.
Leave-two-out refined placebo test#
Set inference="lto" for the Lei-Sudijono (2025) refined placebo test. It is
a drop-in replacement for the ordinary placebo with a much finer p-value grid
and valid rejections when \(\alpha < 1/N\).
import pandas as pd
from mlsynth import VanillaSC
d = pd.read_csv("basedata/augmented_cali_long.csv")
d["treated"] = ((d.state == "California") & (d.year >= 1989)).astype(int)
res = VanillaSC({
    "df": d[["state", "year", "cigsale", "treated"]],
    "outcome": "cigsale", "treat": "treated", "unitid": "state", "time": "year",
    "backend": "outcome-only", "inference": "lto", "alpha": 0.05,
    "display_graphs": False,
}).fit()
det = res.inference.details
print(res.inference.p_value)  # naive LTO p-value (703 pairs for N = 39)
print(det["p_powered"], det["powered_offset_c"])  # powered p-value at alpha
print(det["type_i_bound"], det["reject_at_alpha"])
Empirical relations across the three studies#
Lei-Sudijono’s Table 1 (their covariate-matched Synth specification, \(\alpha = 0.05\)) lays out how the methods relate on the canonical datasets:
| quantity | Prop 99 | Basque | German |
|---|---|---|---|
| \(N\) | 39 | 17 | 17 |
| \(p_{\text{app-placebo}}\) | 0.00 | 0.35 | 0.00 |
| \(p_{\text{exact-placebo}}\) | 0.026 | 0.41 | 0.059 |
| \(p_{\mathrm{naive\text{-}LTO}}\) | 0.024 | 0.67 | 0.042 |
| \(p_{\mathrm{powered\text{-}LTO}}(\alpha)\) | 0.022 | 0.66 | 0.03 |
| \(\Gamma_{\mathrm{LTO}}\) | 1.4 | NA | 1.1 |
Three relations are worth internalising:
LTO can change the conclusion (German). The exact placebo p-value of 0.059 does not reject at 0.05, but the naive LTO (0.042) and powered LTO (0.03) both do. With only 16 donors the placebo grid is too coarse to resolve a borderline effect; LTO’s finer grid does. The small \(\Gamma = 1.1\), though, warns that this significance is fragile to mild departures from uniform assignment.
LTO is not mechanically smaller (Basque). Here LTO (0.67) is larger than the placebo (0.41); nothing is significant by any method. The refinement changes granularity, not direction – it does not manufacture significance. (Abadie-Gardeazabal’s original Basque analysis dropped poorly-fitting regions; LTO makes no such adjustment, which partly explains the larger value.)
LTO ≈ placebo when both already reject (Prop 99). The two p-values (0.024 vs 0.026) nearly coincide; the powered version (0.022) buys a little extra margin, and \(\Gamma = 1.4\) says the conclusion survives moderate confounding.
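The three relations can be checked mechanically from the Table 1 p-values. The snippet below applies the reject-when-\(p \le \alpha\) rule at \(\alpha = 0.05\) to the numbers quoted above; the dictionary layout and key names are chosen for this example only.

```python
ALPHA = 0.05

# p-values copied from Table 1 (covariate-matched Synth specification).
table1 = {
    "Prop 99": {"exact_placebo": 0.026, "naive_lto": 0.024, "powered_lto": 0.022},
    "Basque": {"exact_placebo": 0.41, "naive_lto": 0.67, "powered_lto": 0.66},
    "German": {"exact_placebo": 0.059, "naive_lto": 0.042, "powered_lto": 0.03},
}

# Reject the null whenever the p-value is at or below the level.
decisions = {
    study: {method: p <= ALPHA for method, p in row.items()}
    for study, row in table1.items()
}

for study, row in decisions.items():
    print(study, row)
```

Only the German row changes its verdict between the exact placebo and the LTO variants; Prop 99 rejects under all three and Basque under none.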
A note on specification and validation. The LTO p-value, like the ordinary
placebo, is only as good as the synthetic-control fit it is built on. In
mlsynth the \(f(N, \alpha)\) and powered-offset \(c(N, \alpha)\) functions
reproduce the paper’s reported values exactly (c(39, 0.05) = 0.002,
c(17, 0.05) = 0.0125; see test_lto_helpers_match_paper), and the
covariate-matched ordinary placebo reproduces California’s exact-placebo
p-value (rank 1 of 39, \(p = 0.0256\) vs the paper’s 0.026). The p-value
itself tracks the chosen specification: with the paper’s covariate-matched
Synth, California is the dominant unit and \(p_{\mathrm{naive\text{-}LTO}}
\approx 0.024\), whereas the outcome-only fit – where California is only
rank 3 of 39 – gives \(\approx 0.10\). Both are internally consistent with
their respective ordinary placebo p-values; the covariate spec is what
concentrates the effect on the treated unit, so choose the specification before
reading the test. Because the cost is \(O(J^2)\) engine fits, run the
covariate-matched (mscmt) version on the smaller studies or cap pairs with
lto_max_pairs; for the 38-donor Prop 99 panel the outcome-only LTO runs
in well under two minutes.
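Two pieces of arithmetic in this note are easy to reproduce: the rank-based ordinary placebo p-value (rank 1 of 39 gives 0.0256) and the number of engine fits a full LTO run needs. The sketch below uses hypothetical helper names and toy statistics, and it assumes one fit per unit in every triple with no caching.

```python
from math import comb
from typing import Dict


def placebo_p_value(stats: Dict[str, float], treated: str) -> float:
    """Rank-based p-value: share of units whose statistic is at least the treated one's."""
    rank = sum(1 for value in stats.values() if value >= stats[treated])
    return rank / len(stats)


def lto_fit_count(J: int) -> int:
    """Fits for a full LTO run: three synthetic controls per unordered donor pair."""
    return 3 * comb(J, 2)


# Toy statistics for 38 controls plus one treated unit (N = 39).
stats = {f"unit_{k}": float(k) for k in range(38)}
stats["treated"] = 100.0                     # largest statistic: rank 1 of 39
p_rank1 = placebo_p_value(stats, "treated")

stats["treated"] = 35.5                      # two controls are larger: rank 3 of 39
p_rank3 = placebo_p_value(stats, "treated")

print(round(p_rank1, 4), round(p_rank3, 4), lto_fit_count(38), lto_fit_count(16))
```

The fit counts for 38 and 16 donors show why the text recommends the fast outcome-only backend, or a pair cap, on the larger panel.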
References#
Abadie, A., & Gardeazabal, J. (2003). “The Economic Costs of Conflict: A Case Study of the Basque Country.” American Economic Review 93(1):113-132.
Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies.” Journal of the American Statistical Association 105(490):493-505.
Abadie, A., Diamond, A., & Hainmueller, J. (2015). “Comparative Politics and the Synthetic Control Method.” American Journal of Political Science 59(2):495-510.
Abadie, A., & L’Hour, J. (2021). “A Penalized Synthetic Control Estimator for Disaggregated Data.” Journal of the American Statistical Association 116(536):1817-1834.
Becker, M., & Kloessner, S. (2018). “Fast and Reliable Computation of Generalized Synthetic Controls.” Econometrics and Statistics 5:1-19.
Lei, L., & Sudijono, T. (2025). “Inference for Synthetic Controls via Refined Placebo Tests.” arXiv:2401.07152.
Malo, P., Eskelinen, J., Zhou, X., & Kuosmanen, T. (2024). “Computing Synthetic Controls Using Bilevel Optimization.” Computational Economics 64:1113-1136.