Vanilla Synthetic Control (VanillaSC)

Overview#

VanillaSC is the standard synthetic control method (Abadie & Gardeazabal 2003; Abadie, Diamond & Hainmueller 2010, 2015), built on mlsynth’s self-contained bilevel engine. It estimates the effect on a single treated unit by constructing a weighted average of donor units – the synthetic control – that tracks the treated unit’s pre-treatment path, and reads the effect as the post-treatment gap between the treated unit and its synthetic counterpart.

What distinguishes this implementation is its honest treatment of the two regimes of the SCM optimisation:

  • No covariates -> the donor weights \(W\) solve the convex simplex least-squares fit on the pre-treatment outcomes. This is a single, well-posed convex program – deterministic and reproducible (unique up to donor collinearity).

  • Covariates -> the predictor weights \(V\) and donor weights \(W\) are chosen jointly through a bilevel program. This is non-convex, and the predictor weights are generically non-identified. VanillaSC solves it with a reliable backend and reports a diagnostic (\(\text{v\_agreement}\)) so that fragility is visible rather than silent.

Mathematical formulation#

For a treated unit with pre-treatment outcomes \(y_1 \in \mathbb{R}^{T_0}\) and donors \(Y_0 \in \mathbb{R}^{T_0 \times J}\):

Outcome-only (no covariates).

\[\widehat W = \arg\min_{W} \; \lVert y_1 - Y_0 W \rVert^2 \quad \text{s.t.} \quad W \ge 0,\ \mathbf{1}'W = 1.\]
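
The outcome-only program can be sketched with plain NumPy. This is a minimal illustration using projected gradient descent on toy data, not the solver mlsynth uses internally; the names project_simplex and simplex_least_squares are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries sorted descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def simplex_least_squares(y1, Y0, iters=5000):
    """Projected gradient descent for min ||y1 - Y0 w||^2 on the simplex."""
    J = Y0.shape[1]
    w = np.full(J, 1.0 / J)                   # start from equal weights
    step = 1.0 / (2.0 * np.linalg.norm(Y0, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(iters):
        grad = 2.0 * Y0.T @ (Y0 @ w - y1)
        w = project_simplex(w - step * grad)
    return w

# Toy panel (made up): the treated path is an exact mix of three donors.
rng = np.random.default_rng(0)
T0, J = 30, 5
Y0 = rng.normal(size=(T0, J))
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
y1 = Y0 @ w_true
w_hat = simplex_least_squares(y1, Y0)
```

Because the toy donors are linearly independent, the program has a single solution and the fit recovers the generating weights, which is the "unique up to donor collinearity" property in miniature.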

Covariate matching (bilevel). With a treated predictor vector \(X_1 \in \mathbb{R}^{P}\) and a donor predictor matrix \(X_0 \in \mathbb{R}^{P \times J}\), each predictor averaged over its window and scaled to unit variance, the lower level solves, for given diagonal predictor weights \(V\),

\[W^{*}(V) = \arg\min_{W \in \Delta} \; (X_1 - X_0 W)' V (X_1 - X_0 W),\]

where \(\Delta = \{W : W \ge 0,\ \mathbf{1}'W = 1\}\) is the simplex of the outcome-only problem, and the upper level chooses \(V\) to minimise the pre-treatment outcome fit,

\[\min_{V} \; \lVert y_1 - Y_0\, W^{*}(V) \rVert^2 .\]

The donor weights \(W\) and the counterfactual are pinned by this program; the predictor weights \(V\) are generically not (a whole polytope of \(V\) reproduces the same \(W\)).
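
The two levels can be sketched on toy data. The snippet below is an illustration under stated assumptions, not the mscmt or malo backend: the inner solve is a plain projected-gradient loop, all arrays are made up, and inner_weights and upper_loss are hypothetical names. It also shows the simplest case of non-identification: rescaling \(V\) leaves \(W\) unchanged.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def inner_weights(X1, X0, v, iters=5000):
    """Lower level: W*(V) for diagonal V = diag(v), by projected gradient."""
    s = np.sqrt(v)
    a, M = s * X1, s[:, None] * X0            # same problem as ||a - M w||^2
    w = np.full(X0.shape[1], 1.0 / X0.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2)
    for _ in range(iters):
        grad = 2.0 * M.T @ (M @ w - a)
        w = project_simplex(w - step * grad)
    return w

def upper_loss(v, X1, X0, y1, Y0):
    """Upper level: pre-treatment outcome fit implied by a candidate V."""
    w = inner_weights(X1, X0, v)
    return float(np.sum((y1 - Y0 @ w) ** 2)), w

# Toy data (made up): P = 4 predictors, J = 3 donors, T0 = 5 periods.
X0 = np.array([[1.0, 0.0, 0.5],
               [0.0, 1.0, 0.5],
               [1.0, 1.0, 0.0],
               [0.5, 0.2, 1.0]])
X1 = np.array([0.6, 0.5, 0.8, 0.4])
Y0 = np.array([[1.0, 2.0, 3.0],
               [1.5, 2.5, 2.0],
               [2.0, 1.0, 2.5],
               [2.5, 1.5, 1.0],
               [3.0, 2.0, 1.5]])
y1 = np.array([1.8, 2.0, 1.7, 1.9, 2.3])

v_a = np.array([0.50, 0.30, 0.15, 0.05])
v_b = np.array([0.05, 0.15, 0.30, 0.50])
loss_a, w_a = upper_loss(v_a, X1, X0, y1, Y0)
loss_b, w_b = upper_loss(v_b, X1, X0, y1, Y0)
_, w_scaled = upper_loss(10.0 * v_a, X1, X0, y1, Y0)  # same W as v_a
```

An upper-level search would compare loss_a, loss_b and further candidates. The equality of w_a and w_scaled is why reported predictor weights need a normalisation or canonical choice before they can be compared.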

Backends#

Four backends are selectable via backend=: the outcome-only fit plus three reliable solvers for the covariate path.

"outcome-only"

No predictor weights; the convex simplex fit above. The well-posed default (also selected by backend="auto" when no covariates are given).

"mscmt"

Becker & Kloessner (2018): a global differential-evolution search over \(\log_{10} V\) with the simplex inner solve. The default when covariates are supplied. Set canonical_v="min.loss.w" (or "max.order") to report a canonical, reproducible \(V\) via the MSCMT determine_v step.

"malo"

Malo et al. (2024): a staged corner search. Fast and exact when the optimum is a predictor corner – but note that when a lagged outcome is among the predictors, the loss-minimising corner puts all weight on that lag, collapsing the inner match to pure outcome-fitting (it drifts toward the outcome floor).

"penalized"

Abadie & L’Hour (2021): a pairwise-penalized estimator with leave-one-out \(\lambda\) selection, giving a unique, sparse \(W\). Works with or without covariates.

The identification diagnostic#

When covariates are used, res.weights.summary_stats["v_agreement"] reports the maximum absolute difference between the two MSCMT canonical predictor-weight vectors (min.loss.w and max.order). It is small when \(V\) is well identified and large (up to 1) when the predictor weights – and the donor weights they imply – are fragile. A large value is a warning that the covariate-matched solution should not be over-interpreted.
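
As a minimal sketch of the statistic itself (assuming both canonical vectors are normalised to sum to one, and using a hypothetical helper name rather than mlsynth's internal function), the diagnostic is a maximum absolute gap:

```python
def v_agreement(v_a, v_b):
    """Maximum absolute gap between two predictor-weight vectors."""
    if len(v_a) != len(v_b):
        raise ValueError("predictor-weight vectors must have equal length")
    return max(abs(a - b) for a, b in zip(v_a, v_b))

# Made-up weight vectors: two near-identical answers, then two opposite corners.
stable = v_agreement([0.50, 0.30, 0.20], [0.48, 0.31, 0.21])
fragile = v_agreement([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

The second pair puts all weight on different predictors, which gives the upper limit of 1 mentioned above.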

Inference#

Three inference modes are available via inference=:

"placebo" (default, inference=True)

Abadie’s in-space placebo test: the synthetic control is refit treating each donor as pseudo-treated, and the treated unit’s post/pre RMSPE ratio is ranked against the placebo distribution to give a p-value. Simple and assumption-light, but the smallest achievable p-value is about \(1/(J+1)\).
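
The ranking step can be sketched as follows. The gaps and donor ratios are made up, the helper names are hypothetical, and the convention of counting the treated unit in both numerator and denominator is an assumption chosen to match the \(1/(J+1)\) floor stated above.

```python
import math

def rmspe(gaps):
    """Root mean squared prediction error of a sequence of gaps."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))

def rmspe_ratio(gaps, n_pre):
    """Post/pre RMSPE ratio for one unit's observed-minus-synthetic gaps."""
    return rmspe(gaps[n_pre:]) / rmspe(gaps[:n_pre])

def placebo_p_value(treated_ratio, donor_ratios):
    """Share of units (treated included) with a ratio at least as large."""
    as_extreme = 1 + sum(r >= treated_ratio for r in donor_ratios)
    return as_extreme / (len(donor_ratios) + 1)

# Made-up example: tight pre-period fit, large post-period gap, J = 9 donors.
treated_ratio = rmspe_ratio([0.1, -0.1, 0.1, -0.1, 2.0, 2.0], n_pre=4)
donor_ratios = [1.2, 0.8, 3.5, 2.1, 0.9, 1.7, 5.0, 1.1, 2.6]
p_value = placebo_p_value(treated_ratio, donor_ratios)
```

Here the treated unit has the largest ratio, so the p-value sits on the floor of the grid and cannot go lower however large the effect.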

"scpi" – prediction intervals (Cattaneo, Feng & Titiunik 2021)

Treats \(\tau_T\) as a predictand (a random variable) and builds prediction intervals, decomposing the prediction error as

\[\widehat\tau_T - \tau_T = e_T - \mathbf{p}_T'(\widehat\beta - \beta_0),\]

an out-of-sample shock \(e_T\) plus an in-sample weight-estimation error. The counterfactual prediction band is assembled period-by-period as \([\,Y_{\text{fit}} + w_L + e_L,\; Y_{\text{fit}} + w_U + e_U\,]\), and the treatment-effect interval is \([\,Y_{\text{obs}} - \text{cf}_U,\; Y_{\text{obs}} - \text{cf}_L\,]\).

  • In-sample (\(w_L\)/\(w_U\)): a simulation-based bound. Let \(A\) be the treated pre-outcomes, \(B\) the donor pre-outcomes, and \(u = A - B\widehat w\) the pre-period residuals. With \(Q = B'B/T_0\) and \(\widehat\Sigma = B' \mathrm{diag}(\omega)\,B / T_0^2\), where \(\omega_t = \tfrac{T_0}{T_0-\mathrm{df}}(u_t - E[u_t])^2\) (HC1), draw \(G^\star \sim N(0,\widehat\Sigma)\). For each draw and each post-period predictor \(\mathbf{p}_T\), solve over the localised simplex set

    \[\min/\max\ \mathbf{p}_T'x \quad\text{s.t.}\quad (x-\widehat w)'Q(x-\widehat w) - 2G^{\star\prime}(x-\widehat w) \le 0,\; \textstyle\sum x = 1,\; x \ge \ell,\]

    with \(\ell_j = \widehat w_j\) if \(\widehat w_j < \rho\) else \(0\). The regularisation parameter \(\rho\) is data-driven and capped at \(\rho_{\max} = 0.2\); \(Q\) is reduced via a thresholded eigen-square-root so collinear (near-null) donor directions are left unconstrained. \(w_L\)/\(w_U\) are the \(\alpha_1/2\) / \(1-\alpha_1/2\) quantiles of \(\mathbf{p}_T'(\widehat w - x)\) across draws.

  • Out-of-sample (\(e_L\)/\(e_U\)): a location-scale model, \(e_T = E[e] + \sqrt{\mathrm{Var}[e]}\,\varepsilon\). The conditional mean and a log-variance scale (capped by the residual IQR, Gaussian \(\varepsilon\)) are estimated by regressing \(u\) on the active-donor design; "ls" and "empirical" use standardized / raw residual quantiles.

VanillaSC returns the average-effect (ATT) interval in res.inference.ci_lower/ci_upper and the full per-period sequence (point effects, prediction intervals, counterfactual bands, and the in-/out-of-sample components) in res.inference.details. This implements the canonical simplex / outcome-only case; for covariate backends it uses the same outcome design and is approximate.
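
The band arithmetic described above is easy to check by hand. The numbers below are made up for two post periods, and the variable names are illustrative rather than the keys stored in res.inference.details.

```python
import numpy as np

y_obs = np.array([90.0, 85.0])       # observed treated outcome
y_fit = np.array([100.0, 98.0])      # synthetic-control prediction
w_lo, w_hi = np.array([-2.0, -3.0]), np.array([2.0, 3.0])   # in-sample bounds
e_lo, e_hi = np.array([-4.0, -4.0]), np.array([4.0, 4.0])   # out-of-sample bounds

# Counterfactual band: both error components are added to the fit.
cf_lo = y_fit + w_lo + e_lo
cf_hi = y_fit + w_hi + e_hi

# Effect interval: the upper counterfactual bound gives the lower effect bound.
tau = y_obs - y_fit
tau_lo = y_obs - cf_hi
tau_hi = y_obs - cf_lo
```

Note the swap in the last two lines: a higher counterfactual means a more negative effect, so the bounds trade places when moving from the counterfactual to the effect.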

Note

This is a self-contained, MIT-licensed re-derivation of the Cattaneo-Feng-Titiunik algorithm – it does not import the GPL reference package scpi. It is validated to reproduce scpi’s CI_all_gaussian on the Proposition 99 panel to within Monte-Carlo error (see test_scpi_matches_reference_package, which is skipped unless scpi_pkg happens to be installed).

"lto" – leave-two-out refined placebo (Lei & Sudijono 2025)

A design-based randomization test that fixes the two structural weaknesses of the ordinary placebo test – its coarse \(\{1/N, 2/N, \dots\}\) grid and its zero size when \(\alpha < 1/N\). It replaces the “one turn each” permutation with a tournament over triples and reports both a naive p-value (res.inference.p_value) and a powered one (details["p_powered"]), together with the Type-I bound and tournament tallies. It shares the placebo test’s assumptions but is far more powerful in small donor pools. See The leave-two-out refined placebo test and the two theory subsections below for the full treatment.

How the SCPI machinery works (one fit)#

scpi_intervals(y, Y0, pre, W, ...) takes the fitted donor weights \(\widehat w\) (from any backend), the donor outcome matrix, and the number of pre-treatment periods, and runs the following steps. Let \(A = y_{1:T_0}\) be the treated pre-outcomes, \(B = Y_{0,\,1:T_0}\) the donor pre-outcomes, \(P\) the donor post-outcomes, and \(u = A - B\widehat w\) the pre-period residuals.

  1. Degrees of freedom. For the simplex, \(\mathrm{df} = (\#\{\widehat w_j \neq 0\}) - 1\), giving the HC1 correction \(\mathrm{vc} = T_0/(T_0-\mathrm{df})\).

  2. Regularisation parameter \(\rho\). The data-driven type-1 value \(\rho = \tfrac{\sigma_u}{\min_j \mathrm{sd}(B_j)} \sqrt{\log(J)\, d_0 \log T_0}/\sqrt{T_0}\), capped at \(\rho_{\max}=0.2\) (with a fallback bump if it comes out below \(0.001\)). \(\rho\) defines the “active” donor set \(\{\,j : \widehat w_j > \rho\,\}\).

  3. Conditional mean & variance. Regress \(u\) on the active-donor design \([\,B_{\cdot,\text{active}},\,\mathbf{1}\,]\) to get \(E[u]\) (the u_missp step), then \(\omega_t = \mathrm{vc}\,(u_t - E[u_t])^2\). Form \(Q = B'B/T_0\) and \(\widehat\Sigma = B'\mathrm{diag}(\omega)B/T_0^2\), and its matrix square root \(\Sigma^{1/2}\).

  4. Localised feasible set. Lower bounds \(\ell_j = \widehat w_j\) if \(\widehat w_j < \rho\) else \(0\) (near-binding donors are pinned at their tiny weight; active donors may move down to zero). \(Q\) is reduced by a thresholded eigen-square-root so the near-null (collinear) directions are left unconstrained.

  5. In-sample simulation. For each of scpi_sims draws \(G^\star = \Sigma^{1/2}\,z\), \(z\sim N(0,I)\), and each post predictor \(\mathbf{p}_T\), solve the small conic program in \(x\) (donor weights) twice – minimise and maximise \(\mathbf{p}_T'x\) subject to \((x-\widehat w)'Q(x-\widehat w) - 2G^{\star\prime}(x-\widehat w)\le 0\), \(\sum x = 1\), \(x\ge\ell\). Record \(\mathbf{p}_T'(\widehat w - x)\) for each branch; \(w_L\)/\(w_U\) are the \(\alpha_1/2\) / \(1-\alpha_1/2\) quantiles across draws.

  6. Out-of-sample band. From the location-scale model on \(u\) get \(e_L\)/\(e_U\) per post period (see the out-of-sample bullet under Inference above).

  7. Assemble. Counterfactual band \([\,Y_{\text{fit}} + w_L + e_L,\; Y_{\text{fit}} + w_U + e_U\,]\), effect interval \([\,Y_{\text{obs}} - \text{cf}_U,\; Y_{\text{obs}} - \text{cf}_L\,]\), and an ATT interval from an appended post-period-average predictor row. An extra averaged row is carried through steps 5-6 so the ATT interval uses the same simulation, not a naive average of the per-period bounds.
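
Steps 1, 3 and 4, plus a single draw from step 5, can be sketched in NumPy. This is an illustration under stated assumptions, not the scpi_intervals source: \(\rho\) is passed in rather than computed, the thresholded reduction of \(Q\) and the conic solve are omitted, the weights and data are made up, and scpi_moments and sqrt_psd are hypothetical names.

```python
import numpy as np

def scpi_moments(A, B, w_hat, rho):
    """Degrees of freedom, HC1 moments and lower bounds for a simplex fit."""
    T0 = B.shape[0]
    u = A - B @ w_hat                                  # pre-period residuals
    df = int(np.count_nonzero(w_hat)) - 1              # step 1
    vc = T0 / (T0 - df)                                # HC1 correction
    active = w_hat > rho                               # active donor set
    design = np.column_stack([B[:, active], np.ones(T0)])
    coef, *_ = np.linalg.lstsq(design, u, rcond=None)  # E[u] by least squares
    omega = vc * (u - design @ coef) ** 2              # step 3
    Q = B.T @ B / T0
    Sigma = B.T @ (omega[:, None] * B) / T0**2
    lower = np.where(w_hat < rho, w_hat, 0.0)          # step 4 lower bounds
    return df, vc, Q, Sigma, lower

def sqrt_psd(S):
    """Symmetric square root of a PSD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

# Toy inputs (made up): 40 pre-periods, 5 donors, three weight-bearing donors.
rng = np.random.default_rng(0)
T0, J = 40, 5
B = rng.normal(size=(T0, J))
w_hat = np.array([0.6, 0.35, 0.05, 0.0, 0.0])
A = B @ w_hat + 0.1 * rng.normal(size=T0)

df, vc, Q, Sigma, lower = scpi_moments(A, B, w_hat, rho=0.1)
G_star = sqrt_psd(Sigma) @ rng.normal(size=J)          # one draw for step 5
```

With \(\rho = 0.1\) the third donor (weight 0.05) falls below the threshold, so its lower bound is pinned at 0.05 while the two active donors may move down to zero.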

The result is an InferenceResults with ci_lower/ci_upper (the ATT interval), confidence_level \(= 1-2\alpha\), and a details dict holding the per-period periods, tau, pi_lower/pi_upper, counterfactual_lower/upper, the in_sample_* (\(w_L,w_U\)) and out_of_sample_* (\(e_L,e_U\)) components, sims and e_method.

Composing SCPI with the backends#

backend (how \(W\) is estimated) and inference (how uncertainty is quantified) are orthogonal – any of the four backends pairs with any of the three inference modes:

VanillaSC({..., "backend": "mscmt", "inference": "scpi"}).fit()
VanillaSC({..., "backend": "malo",  "inference": "scpi"}).fit()

The pipeline fits the weights with the chosen backend and hands the resulting res.W to scpi_intervals. Two things to keep in mind:

  • The in-sample simulation rebuilds \(Q\) and \(\widehat\Sigma\) from the donor pre-outcomes \(B\), treating \(\widehat w\) as simplex weights. With outcome-only this is the exact Cattaneo-Feng-Titiunik interval (the case validated against scpi). With mscmt/malo the weights were also shaped by the covariate predictors, so SCPI uses the outcome design as a stand-in – it is approximate for covariate backends. The point effects, the ATT, and the out-of-sample band are unaffected; only the in-sample \(w_L\)/\(w_U\) term carries the approximation.

  • Read the SCPI interval alongside \(\text{v\_agreement}\). When the predictor weights are non-identified (v_agreement near 1, e.g. Prop 99 with lagged outcomes) the point counterfactual is still pinned, but the covariate-matched solution is fragile; the placebo test, which is exact for any backend, is the conservative cross-check.

The leave-two-out (LTO) refined placebo test#

What it is. The ordinary placebo test (above) gives each of the \(N\) units exactly one turn as the pseudo-treated unit and ranks the real treated unit’s fit statistic against those \(N\) values. That is its weakness: the p-value can only land on the grid \(\{1/N, 2/N, \dots, 1\}\), and at a conventional level like \(\alpha = 0.05\) with a small donor pool the test is either coarse or – when \(\alpha < 1/N\) – literally unable to reject (its size is zero). The Lei-Sudijono (2025) leave-two-out test keeps the same design-based logic but replaces the “one turn each” permutation with a tournament over triples. Think of every triple \(\{i, j, I\}\) (two controls and the treated unit) as a match: leave all three out of the donor pool, build a synthetic control for each of them from the remaining \(N-3\) units, score each by its post/pre RMSPE ratio, and the unit with the largest ratio “wins” the match. The treated unit should win often if the treatment had a real effect (a large post-period gap relative to a tight pre-period fit). The p-value is the fraction of matches the treated unit does not win,

\[p_{\mathrm{naive\text{-}LTO}} = \frac{1}{(N-1)(N-2)} \sum_{i \neq j} \mathbf{1}\bigl\{R_{i,j,I;I} \le \max(R_{i,j,I;i}, R_{i,j,I;j})\bigr\},\]

where \(R_{i,j,I;k} = \lvert S_{\text{ratio-RMSPE}}(Y_k, \widehat Y_k)\rvert\) is the score of unit \(k\) when the pool excludes \(\{i, j, I\}\). Because there are \(\binom{N-1}{2}\) matches rather than \(N\), the p-value lives on an \(O(N^2)\)-fine grid – the granularity problem disappears.
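
The tournament can be sketched independently of the estimator. In the sketch below the score callable is a hypothetical stand-in for refitting the synthetic control on the remaining \(N-3\) units; the toy scores are made up and do not depend on the triple, which a real fit would.

```python
from itertools import combinations

def naive_lto_p_value(controls, score):
    """Share of matches {i, j, treated} that the treated unit does not win.

    score(i, j) must return (R_treated, R_i, R_j) for that match.
    """
    matches = list(combinations(controls, 2))   # unordered control pairs
    losses = 0
    for i, j in matches:
        r_treated, r_i, r_j = score(i, j)
        losses += r_treated <= max(r_i, r_j)    # treated fails to win
    return losses / len(matches)

def toy_score(i, j):
    """Made-up scores: treated scores 4.5, control k scores k."""
    return 4.5, float(i), float(j)

controls = [1, 2, 3, 4, 5]                      # N = 6 units in total
n_matches = len(list(combinations(controls, 2)))
p_lto = naive_lto_p_value(controls, toy_score)
```

Counting unordered pairs and dividing by \(\binom{N-1}{2}\) gives the same value as the ordered-pair sum in the formula, because each match's outcome does not depend on the order of \(i\) and \(j\). With six units the ordinary placebo moves in steps of 1/6, while this toy tournament moves in steps of 1/10.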

Two p-values. res.inference.p_value is the naive LTO p-value above. res.inference.details["p_powered"] is the powered variant \(p_{\mathrm{naive\text{-}LTO}} - c(N, \alpha) + \delta\), which shifts the naive value down by the largest amount the discrete Type-I bound allows (powered_offset_c), strictly increasing power. The powered value is a decision rule tied to one \(\alpha\) – reject when it is \(\le \alpha\) (reject_at_alpha) – not a general-purpose p-value, so do not compare it across levels or report it as “the” p-value.

LTO: design-based assumptions and econometric theory#

The LTO test is design-based, not outcome-model-based: the potential outcomes are treated as fixed, and all randomness comes from which unit got treated. Its validity rests on two assumptions.

  • Uniform assignment. The treated index \(I\) is uniformly distributed over \(\{1, \dots, N\}\) – a priori, any unit was equally likely to be the treated one. Under the null this makes the \(N\) units exchangeable, which is exactly what licenses the tournament. This holds by construction in (cluster-)randomized experiments. In observational work it is a modelling choice: it is most defensible when the treated unit is comparable to the donors (often after covariate adjustment), and in quasi-experimental settings – e.g. natural disasters, where which locality is hit is plausibly close to random over a small comparable region.

  • Sharp null. The hypothesis tested is Fisher’s sharp null \(H_0 : Y_{i,t}(1) = Y_{i,t}(0)\) for all units \(i\) and all \(t > T_0\) (no effect for any unit in any post period), or a known-\(\tau\) additive version \(Y_{i,t}(1) = Y_{i,t}(0) + \tau_{i,t}\). Sharpness is what lets the test impute every unit’s counterfactual under the null and so run the tournament.

Under these, the test has a finite-sample Type-I error guarantee (no large-\(N\), no long-\(T\), no asymptotics):

\[\mathbb{P}_{H_0}\!\left(p_{\mathrm{naive\text{-}LTO}} \le \alpha\right) \le \frac{\lfloor N f(N, \alpha)\rfloor}{N},\]

reported as type_i_bound. This bound is never worse than the approximate-placebo bound \((\lfloor N\alpha\rfloor + 1)/N\), and for the levels and sizes typical of SCM applications (\(\alpha \in \{0.01, 0.02\}\) for \(6 < N < 200\); \(\alpha = 0.05\) for most \(N\)) it is identical to it – so switching to LTO costs nothing in worst-case Type-I error. Crucially, the placebo bound is tight whereas the LTO bound generally is not: in practice the LTO test’s actual Type-I error is often strictly below \(\alpha\), i.e. it can be unconditionally valid even when \(\alpha < 1/N\).

Two further theoretical properties matter in practice:

  • Consistency where the placebo test fails (Theorem 6.1). When \(\alpha < 1/N\), the LTO test is uniformly consistent – its power goes to 1 as the effect size grows. The approximate placebo test is not: in this regime it can have essentially zero power no matter how large the true effect (zero if \(N\) is even, \(\le 1/N\) if odd). This is the single strongest reason to prefer LTO in small donor pools.

  • Confidence regions. Inverting the additive-\(\tau\) test (\(\{\theta : p_{\mathrm{naive\text{-}LTO}}(\theta) > \alpha\}\)) yields a region for the post-period effect path with guaranteed coverage \(\ge 1 - \lfloor N f(N,\alpha)\rfloor / N\). (mlsynth currently reports the p-values; the inversion is a straightforward extension.)

Methodologically, the LTO test is a new kind of randomization inference: it generalises the Jackknife+ of Barber et al. (2021) (which leaves one point out and so still has \(1/N\) granularity) and is distinct from classical permutation/rank inference. It also – unlike most asymptotic SCM inference – does not simplify the synthetic-control construction: the full weight/predictor machinery (any VanillaSC backend) is re-run inside every match, so the test reflects the estimator you actually use.

When the LTO assumptions are violated#

The sharp null is the hypothesis under test and is usually uncontroversial; the uniform-assignment assumption is where care is needed.

  • Selection on outcomes / non-comparable treated unit. If the treated unit was chosen because of its (anticipated) trajectory, or is structurally unlike every donor, exchangeability fails and the Type-I guarantee no longer holds. The usual remedy is to restore comparability through the specification – match on covariates, restrict the donor pool to genuinely similar units – before trusting any placebo-type p-value.

  • Known non-uniform assignment. When the treatment probabilities \(\pi_k\) are known or estimable (e.g. seismic risk for an earthquake study), Lei-Sudijono give a weighted LTO p-value \(p_{\text{w-LTO}}(\pi)\) that reweights each match \(\{i, j, I\}\) by \(\pi_i\pi_j / ((1-\pi_I)^2 - \sum_{l\neq I}\pi_l^2)\) and reduces to the naive value when every \(\pi_k = 1/N\).

  • Sensitivity analysis (the \(\Gamma\) parameter). Rather than commit to uniformity, one can ask how far from it the design could be before the conclusion flips. Following Rosenbaum, constrain \(\pi_i \in [\tfrac{1}{\Gamma N}, \tfrac{\Gamma}{N}]\) and find the smallest \(\Gamma \ge 1\) at which the worst-case weighted p-value crosses \(\alpha\). In the paper, Prop 99 tolerates \(\Gamma \approx 1.4\) (robust) while German reunification flips at only \(\Gamma \approx 1.1\) (fragile). The weighted p-value and \(\Gamma\) search require solving a non-convex (NP-hard) quadratic program and are not yet implemented in VanillaSC; the uniform-assignment naive/powered p-values are.
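
The match weights of the weighted p-value are simple enough to verify with exact arithmetic. The sketch below is not part of VanillaSC (which does not implement the weighted test); the function name and the probabilities are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

def match_weights(pi, treated):
    """Weight of each ordered control pair in the weighted LTO sum."""
    controls = [k for k in pi if k != treated]
    denom = (1 - pi[treated]) ** 2 - sum(pi[l] ** 2 for l in controls)
    return {(i, j): pi[i] * pi[j] / denom
            for i, j in permutations(controls, 2)}

# Uniform assignment over N = 5 units: every weight is 1 / ((N-1)(N-2)).
uniform = {k: Fraction(1, 5) for k in "ABCDE"}
w_uniform = match_weights(uniform, treated="A")

# Non-uniform assignment (made-up probabilities that sum to one).
skewed = {"A": Fraction(1, 2), "B": Fraction(1, 4),
          "C": Fraction(1, 8), "D": Fraction(1, 8)}
w_skewed = match_weights(skewed, treated="A")
total_skewed = sum(w_skewed.values())
```

The denominator equals the sum of the products over all ordered control pairs, so the weights always add up to one, and under uniform assignment each collapses to the \(1/((N-1)(N-2))\) factor of the naive formula.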

Choosing among placebo, LTO, and SCPI#

  • Prefer LTO over the ordinary placebo whenever the donor pool is small – especially in the \(\alpha < 1/N\) regime (e.g. \(N < 20\) at \(\alpha = 0.05\)), where the placebo test cannot reject and LTO can. The powered variant almost Pareto-improves the placebo: same worst-case Type-I error, more power. Both share the same assumptions, so LTO is close to a free upgrade.

  • Keep the ordinary placebo when you want the most familiar, widely-reported statistic, when \(N\) is large enough that granularity is a non-issue, or as a cheap (\(O(N)\) vs \(O(N^2)\)) cross-check.

  • Reach for SCPI when the question is how big the effect is (a prediction interval / confidence statement on the magnitude), not just whether there is one. SCPI rests on different (model-based, conditional) foundations than the design-based placebo/LTO tests, so the two are complementary: LTO answers “is the treated unit special?” by randomization, SCPI quantifies the effect’s uncertainty.
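
The granularity argument behind the first two bullets reduces to a little arithmetic, sketched here with exact fractions. The helper names are hypothetical; the only inputs are the number of units \(N\) and the level \(\alpha\).

```python
from fractions import Fraction
from math import comb

def placebo_can_reject(n_units, alpha):
    """The ordinary placebo p-value is at least 1/N, so it needs 1/N <= alpha."""
    return Fraction(1, n_units) <= alpha

def grid_steps(n_units):
    """Step of the placebo grid and of the leave-two-out grid."""
    return Fraction(1, n_units), Fraction(1, comb(n_units - 1, 2))

alpha = Fraction(5, 100)
reject_19 = placebo_can_reject(19, alpha)   # 1/19 > 0.05
reject_20 = placebo_can_reject(20, alpha)   # 1/20 == 0.05
placebo_step, lto_step = grid_steps(10)
```

With ten units the placebo p-value moves in steps of 1/10, while the tournament has 36 matches and so moves in steps of 1/36.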

When to use it#

  • You want the standard synthetic control done reliably, with the solver choice and identification fragility surfaced.

  • Outcome-only matching when you have a long, informative pre-period – this is the well-posed, reproducible case.

  • Covariate matching with mscmt when the donor pool is rich enough that the problem is well-conditioned (see the replications below). When \(\text{v\_agreement}\) comes back near 1, prefer outcome-only or penalized.

Empirical replications#

The three canonical studies are replicated below, each trained on its full pre-treatment period. All run from the datasets shipped under basedata/ and are locked as regression tests in mlsynth/tests/test_vanillasc_replications.py.

California / Proposition 99 (ADH 2010)#

Treatment in 1989; pre-period 1970-1988. Covariates averaged over 1980-1988 (beer 1984-1988) plus three lagged cigarette-sales predictors (1975, 1980, 1988). With mscmt this reproduces ADH Table 2 almost exactly – Utah 0.335, Nevada 0.236, Montana 0.202, Colorado 0.160, Connecticut 0.068 (ADH: 0.334 / 0.234 / 0.199 / 0.164 / 0.069) – and an ATT of about \(-19\) packs.

import pandas as pd
from mlsynth import VanillaSC

d = pd.read_csv("basedata/augmented_cali_long.csv")
for yr, col in [(1975, "cig_1975"), (1980, "cig_1980"), (1988, "cig_1988")]:
    d[col] = d.state.map(d[d.year == yr].set_index("state").cigsale)
d["treated"] = ((d.state == "California") & (d.year >= 1989)).astype(int)

res = VanillaSC({
    "df": d, "outcome": "cigsale", "treat": "treated",
    "unitid": "state", "time": "year",
    "backend": "mscmt", "canonical_v": "min.loss.w", "seed": 1,
    "covariates": ["p_cig", "pct15-24", "loginc", "pc_beer",
                   "cig_1975", "cig_1980", "cig_1988"],
    "covariate_windows": {"p_cig": (1980, 1988), "pct15-24": (1980, 1988),
                          "loginc": (1980, 1988), "pc_beer": (1984, 1988)},
    "display_graphs": False,
}).fit()
print(res.effects.att)                 # ~ -19
print(res.weights.donor_weights)       # Utah/Nevada/Montana/Colorado/Connecticut

(In augmented_cali_long.csv the columns are labelled such that p_cig is log GDP per capita and loginc is the retail price – the predictor means reproduce ADH’s Table 1 “Real California” column.)

German reunification (ADH 2015)#

Treatment (reunification) in 1990; pre-period 1960-1989. GDP, trade, inflation and industry share averaged over 1981-1990; investment rate taken in 1980 and schooling averaged over 1980-1985. With mscmt the synthetic West Germany is Austria-dominant with the USA, Switzerland, Japan and the Netherlands – the ADH 2015 set – and a negative ATT (reunification lowered per-capita GDP relative to the synthetic).

import pandas as pd
from mlsynth import VanillaSC

d = pd.read_stata("basedata/repgermany.dta")
d["treated"] = ((d.country == "West Germany") & (d.year >= 1990)).astype(int)

res = VanillaSC({
    "df": d, "outcome": "gdp", "treat": "treated",
    "unitid": "country", "time": "year",
    "backend": "mscmt", "seed": 1,
    "covariates": ["gdp", "trade", "infrate", "industry", "invest80", "schooling"],
    "covariate_windows": {"gdp": (1981, 1990), "trade": (1981, 1990),
                          "infrate": (1981, 1990), "industry": (1981, 1990),
                          "invest80": (1980, 1980), "schooling": (1980, 1985)},
    "display_graphs": False,
}).fit()
print(res.weights.donor_weights)       # Austria/USA/Switzerland/Japan/Netherlands

Basque terrorism (Abadie-Gardeazabal 2003)#

The treatment indicator (terrorism) first turns on in 1975, so the model trains on the full 1955-1974 pre-period. On this long pre-period the problem is well-conditioned and the synthetic Basque is Cataluna \(\approx 0.8\), Madrid \(\approx 0.2\) – the published Abadie-Gardeazabal result – with an ATT of about \(-0.68\) (the roughly 10% per-capita GDP gap). Outcome-only already recovers this; mscmt with the special-predictor covariates confirms it.

Note

This is instructive: on the short 1960-1969 window used by some later papers the Basque donor weights are fragile (they drift to Baleares/Madrid), but on the full 1955-1974 pre-period the long outcome path pins \(W\) to the Cataluna/Madrid solution. The training window matters; VanillaSC uses the full pre-period defined by the treatment indicator.

import pandas as pd
from mlsynth import VanillaSC

b = pd.read_csv("basedata/basque_data.csv")
b = b[b.regionno != 1].copy()                           # drop Spain; copy so the column assignment below is safe
b["treated"] = ((b.regionno == 17) & (b.year >= 1975)).astype(int)

res = VanillaSC({
    "df": b, "outcome": "gdpcap", "treat": "treated",
    "unitid": "regionno", "time": "year",
    "backend": "outcome-only", "display_graphs": False,
}).fit()
print(res.effects.att)                 # ~ -0.68
print(res.weights.donor_weights)       # region 10 (Cataluna) ~0.8, 14 (Madrid) ~0.2

Core API#

VanillaSC: the standard synthetic control, on the bilevel engine.

The ordinary single-treated synthetic control method (Abadie & Gardeazabal 2003; Abadie, Diamond & Hainmueller 2010), implemented on mlsynth’s self-contained bilevel machinery:

  • No covariates -> the well-posed convex problem: donor weights W minimise the pre-treatment outcome fit on the simplex. Unique up to donor collinearity, deterministic, reproducible.

  • Covariates -> the bilevel program (predictor weights V + donor weights W), solved by a reliable backend: "mscmt" (global differential evolution, Becker-Kloessner 2018), "malo" (corner search, Malo et al. 2024), or "penalized" (unique/sparse, Abadie-L’Hour 2021).

Because predictor weights are generically non-identified, VanillaSC reports a v_agreement diagnostic (the gap between the two MSCMT canonical V choices): small means V is well identified, large means the predictor weights – and the donor weights they imply – are fragile.

class mlsynth.estimators.vanillasc.VanillaSC(config: VanillaSCConfig | dict)#

Bases: object

Standard synthetic control estimator (bilevel engine).

Parameters:

config (VanillaSCConfig or dict) – Configuration. See mlsynth.config_models.VanillaSCConfig.

Examples

>>> from mlsynth.estimators.vanillasc import VanillaSC
>>> cfg = {"df": panel, "outcome": "gdp", "treat": "treated",
...        "unitid": "country", "time": "year",
...        "covariates": ["trade", "infrate"], "backend": "mscmt"}
>>> res = VanillaSC(cfg).fit()
>>> res.effects.att

fit() -> BaseEstimatorResults#

Estimate the synthetic control and return standardized results.

Configuration#

class mlsynth.config_models.VanillaSCConfig(*, df: DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', backend: Literal['auto', 'outcome-only', 'malo', 'mscmt', 'penalized'] = 'auto', covariates: List[str] | None = None, covariate_windows: Dict[Any, Any] | None = None, canonical_v: bool | str = False, seed: int = 0, mscmt_maxiter: Annotated[int, Ge(ge=1)] = 300, mscmt_popsize: Annotated[int, Ge(ge=1)] = 15, inference: bool | str = True, alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05, scpi_sims: Annotated[int, Ge(ge=1)] = 200, scpi_e_method: Literal['gaussian', 'empirical'] = 'gaussian', lto_max_pairs: Annotated[int | None, Ge(ge=1)] = None)#

Configuration for the VanillaSC estimator (standard SCM, bilevel engine).

The ordinary single-treated synthetic control, built on the self-contained bilevel machinery. With no covariates it reduces to the well-posed convex outcome-matching problem; with covariates it routes through the bilevel predictor-weight (V) optimisation, with a selectable, reliable backend.

Parameters:
  • backend ({“auto”, “outcome-only”, “malo”, “mscmt”, “penalized”}) – Predictor-weight backend. "auto" (default) uses "outcome-only" (convex simplex fit on pre-treatment outcomes) when no covariates are given, and "mscmt" (global differential-evolution V search) when they are. "malo" is the Malo et al. (2024) corner search, "penalized" the Abadie-L’Hour (2021) unique/sparse estimator.

  • covariates (list of str, optional) – Predictor columns. Each is averaged over its window (see covariate_windows) and scaled to unit variance, then matched via the bilevel program. None -> outcome-only matching.

  • covariate_windows (dict, optional) – Per-covariate inclusive (start, end) averaging window of time labels (Abadie’s special-predictor spec). Covariates not listed are averaged over the full pre-treatment period.

  • canonical_v (bool or {“min.loss.w”, “max.order”}) – Canonicalise the (non-identified) predictor weights for mscmt (MSCMT determine_v). The reported v_agreement is small when V is well identified and large when it is fragile. Default False.

  • seed (int) – RNG seed for the mscmt differential-evolution search.

  • mscmt_maxiter, mscmt_popsize (int) – Differential-evolution budget for the mscmt backend.

  • inference (bool or {“placebo”, “scpi”, “lto”}) – Inference method. True/"placebo" (default) runs Abadie in-space placebo inference (refit treating each donor as pseudo-treated; the p-value ranks the treated unit’s post/pre RMSPE ratio). "scpi" runs Cattaneo-Feng-Titiunik (2021) prediction intervals (in-sample simulation + out-of-sample location-scale; exact for the simplex / outcome-only synthetic control). "lto" runs the Lei-Sudijono (2025) leave-two-out refined placebo test (O(J^2) reference comparisons; finer granularity and non-zero size when alpha < 1/N). False skips inference.

  • alpha (float) – Level. For placebo, the confidence statement; for SCPI, used as both the in-sample (alpha1) and out-of-sample (alpha2) levels, giving a prediction interval with coverage approximately 1 - 2*alpha.

  • scpi_sims (int) – Number of Gaussian draws for the SCPI in-sample simulation.

  • scpi_e_method ({“gaussian”, “empirical”}) – Out-of-sample location-scale tabulation for SCPI.

  • lto_max_pairs (int, optional) – Cap on the number of donor pairs evaluated by the "lto" test (deterministic subsample via seed). None (default) uses all J*(J-1)/2 pairs; set a cap to keep the O(J^2) cost tractable with slow backends.

alpha: float#
backend: Literal['auto', 'outcome-only', 'malo', 'mscmt', 'penalized']#
canonical_v: bool | str#
covariate_windows: Dict[Any, Any] | None#
covariates: List[str] | None#
inference: bool | str#
lto_max_pairs: int | None#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Pydantic configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

mscmt_maxiter: int#
mscmt_popsize: int#
scpi_e_method: Literal['gaussian', 'empirical']#
scpi_sims: int#
seed: int#

Engine#

Bilevel SCM engine for the VanillaSC estimator.

A thin, dataprep-agnostic wrapper that turns the self-contained bilevel machinery (mlsynth.utils.fscm_helpers.bilevel) into the standard single-treated synthetic control, with a selectable predictor-weight backend:

  • "outcome-only" – no covariates: the donor weights solve the convex simplex least-squares fit on the pre-treatment outcomes (Abadie’s outcome matching). Well-posed and unique up to donor collinearity.

  • "malo" / "mscmt" – covariate matching via the bilevel program (predictor weights V + donor weights W): Malo et al. (2024) corner search or Becker-Kloessner (2018) global differential evolution.

  • "penalized" – Abadie-L’Hour (2021) pairwise-penalized estimator (a unique, sparse W); works with or without covariates.

The engine takes plain NumPy arrays so it can be unit-tested in isolation and reused outside the estimator. The VanillaSC estimator feeds it the matrices produced by mlsynth.utils.datautils.dataprep().

class mlsynth.utils.vanillasc_helpers.engine.BilevelSCM(backend: str = 'auto', *, canonical_v=False, seed: int = 0, **solver_kwargs: Any)#

Standard single-treated synthetic control via the bilevel solver.

Parameters:
  • backend ({“auto”, “outcome-only”, “malo”, “mscmt”, “penalized”}) – Predictor-weight backend. "auto" (default) picks "outcome-only" when no covariates are supplied and "mscmt" when they are.

  • canonical_v (bool or {“min.loss.w”, “max.order”}) – Canonicalise the (non-identified) predictor weights V for the mscmt backend (see mlsynth.utils.fscm_helpers.bilevel.determine_v.canonical_v()). Ignored by other backends. Default False.

  • seed (int) – RNG seed for the mscmt differential-evolution search.

  • solver_kwargs – Extra keyword arguments forwarded to the backend (e.g. maxiter, popsize for mscmt; lam for penalized).
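The documented "auto" rule can be written as a few lines of plain Python. This is a sketch only: resolve_backend is a hypothetical helper, not a function exported by mlsynth, and it encodes nothing beyond the rule stated above.

```python
VALID_BACKENDS = {"auto", "outcome-only", "malo", "mscmt", "penalized"}


def resolve_backend(backend="auto", has_covariates=False):
    """Hypothetical helper mirroring the documented "auto" selection rule."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if backend != "auto":
        return backend  # an explicit choice is passed through unchanged
    return "mscmt" if has_covariates else "outcome-only"


print(resolve_backend("auto", has_covariates=False))
print(resolve_backend("auto", has_covariates=True))
```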

fit(y_pre: ndarray, Y0_pre: ndarray, *, X1: ndarray | None = None, X0: ndarray | None = None, donor_names: List[str] | None = None, predictor_names: List[str] | None = None) BilevelSCMResult#

Solve for the donor weights W.

Parameters:
  • y_pre (np.ndarray) – Treated pre-treatment outcomes, shape (T0,).

  • Y0_pre (np.ndarray) – Donor pre-treatment outcomes, shape (T0, J).

  • X1 (np.ndarray, optional) – Treated predictor (covariate) values, shape (P,).

  • X0 (np.ndarray, optional) – Donor predictor matrix, shape (P, J).

  • donor_names (list of str, optional) – Donor labels (defaults to donor_0 ...).

  • predictor_names (list of str, optional) – Predictor labels.
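For orientation, one row of X1 and X0 can be built by averaging a covariate over its window and rescaling it to unit variance. The sketch below assumes the scaling is taken across all units (treated plus donors) with the population standard deviation; the estimator's actual convention may differ, and window_mean, scale_row and the toy panel are illustrative only.

```python
from statistics import mean, pstdev


def window_mean(series_by_year, window):
    """Average a {year: value} series over an inclusive (start, end) window."""
    start, end = window
    return mean(v for yr, v in series_by_year.items() if start <= yr <= end)


def scale_row(values):
    """Divide one predictor's values across units by their standard deviation."""
    sd = pstdev(values)
    return [v / sd for v in values]


# Toy panel (made up); the treated unit is listed first.
panel = {
    "treated": {1980: 10.0, 1981: 12.0, 1982: 14.0},
    "donor_0": {1980: 8.0, 1981: 8.0, 1982: 8.0},
    "donor_1": {1980: 15.0, 1981: 16.0, 1982: 17.0},
}
averaged = [window_mean(panel[u], (1981, 1982)) for u in panel]
scaled = scale_row(averaged)
X1_row, X0_row = scaled[0], scaled[1:]  # one entry of X1, one row of X0
print(averaged, round(pstdev(scaled), 6))
```

Stacking one such row per predictor gives X1 with shape (P,) and X0 with shape (P, J).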

class mlsynth.utils.vanillasc_helpers.engine.BilevelSCMResult(W: ~numpy.ndarray, donor_weights: ~typing.Dict[str, float], V: ~numpy.ndarray | None, predictor_names: ~typing.List[str], backend: str, pre_rmspe: float, v_agreement: float | None = None, diagnostics: ~typing.Dict[str, ~typing.Any] = <factory>)#

Output of BilevelSCM.fit().

W#

Donor weights, shape (J,), on the simplex.

Type:

np.ndarray

donor_weights#

{donor_name: weight} for weight-bearing donors.

Type:

dict

V#

Predictor (covariate) weights, shape (P,), normalised to the simplex; None for outcome-only matching.

Type:

np.ndarray or None

predictor_names#

Names of the P predictors (empty for outcome-only).

Type:

list of str

backend#

The backend actually used.

Type:

str

pre_rmspe#

Root mean squared prediction error on the pre-treatment outcomes.

Type:

float

v_agreement#

Identification diagnostic: max abs difference between the min.loss.w and max.order canonical predictor weights. Small = V well identified; large = predictor weights fragile. None for outcome-only.

Type:

float or None
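The diagnostic itself is a one-line computation once the two canonical weight vectors are available. The sketch below takes them as given (how they are obtained is the job of the canonicalisation routine) and uses made-up vectors.

```python
def v_agreement(v_min_loss_w, v_max_order):
    """Max abs difference between two canonical predictor-weight vectors."""
    return max(abs(a - b) for a, b in zip(v_min_loss_w, v_max_order))


# Both canonical forms agree: V is well identified.
print(v_agreement([0.5, 0.3, 0.2], [0.5, 0.3, 0.2]))
# The two forms put all weight on different predictors: V is fragile.
print(v_agreement([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
```

Since both vectors lie on the simplex, the value is bounded between 0 and 1.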

diagnostics#

Free-form solver diagnostics (stage, gap, lower bound, …).

Type:

dict

V: ndarray | None#
W: ndarray#
backend: str#
counterfactual(donor_matrix: ndarray) ndarray#

Synthetic outcome path donor_matrix @ W over all periods.

donor_matrix has shape (T, J) (donors in columns, matching the order of W).
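A minimal standard-library sketch of what the counterfactual and the fit diagnostics amount to, assuming the ATT is read as the mean post-treatment gap. The function names mirror the attributes above for readability but are not the library's code, and the numbers are made up.

```python
from math import sqrt


def counterfactual(donor_matrix, W):
    """Synthetic path: each period's donor outcomes weighted by W."""
    return [sum(x * w for x, w in zip(row, W)) for row in donor_matrix]


def pre_rmspe(y, y_synth, pre):
    """Root mean squared prediction error over the first `pre` periods."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y[:pre], y_synth[:pre])) / pre)


def att(y, y_synth, pre):
    """Mean post-treatment gap between treated and synthetic outcomes."""
    gaps = [a - b for a, b in zip(y[pre:], y_synth[pre:])]
    return sum(gaps) / len(gaps)


donor_matrix = [[10.0, 20.0], [12.0, 22.0], [14.0, 24.0], [16.0, 26.0]]  # (T, J)
W = [0.5, 0.5]
y = [15.0, 17.0, 16.0, 17.0]  # treated outcomes, treatment after 2 periods
y_synth = counterfactual(donor_matrix, W)
print(y_synth, pre_rmspe(y, y_synth, pre=2), att(y, y_synth, pre=2))
```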

diagnostics: Dict[str, Any]#
donor_weights: Dict[str, float]#
pre_rmspe: float#
predictor_names: List[str]#
v_agreement: float | None = None#

Orchestration for the VanillaSC estimator.

dataprep -> (optional) covariate matrices -> bilevel engine -> ATT, fit diagnostics, in-space placebo inference -> standardized results.

mlsynth.utils.vanillasc_helpers.pipeline.run_vanillasc(config) BaseEstimatorResults#

Fit VanillaSC and assemble BaseEstimatorResults.

SCPI prediction intervals for the (simplex) synthetic control.

Cattaneo, Feng & Titiunik (2021, JASA) and Cattaneo, Feng, Palomba & Titiunik (2025, JSS scpi). The prediction error of the synthetic-control counterfactual decomposes as

tau_hat_T - tau_T = e_T - p_T' (beta_hat - beta_0),

an out-of-sample shock e_T plus an in-sample weight-estimation error. This module is a from-scratch (MIT-licensed) re-derivation of the algorithm described in those papers – it does not import the GPL scpi package – and has been validated to reproduce scpi’s CI_all_gaussian for the canonical simplex control to within Monte-Carlo error.

The counterfactual prediction band is assembled period-by-period as

[ Y_fit + w_lb + e_lb , Y_fit + w_ub + e_ub ],

with the treatment-effect interval [Y_obs - cf_upper, Y_obs - cf_lower].
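The band assembly is simple arithmetic once the two components are known. The sketch below takes w_lb, w_ub, e_lb and e_ub as given per-period lists (made-up numbers) and returns the same quantities that SCPIResult stores; assemble_bands is an illustrative name.

```python
def assemble_bands(y_obs, y_fit, w_lb, w_ub, e_lb, e_ub):
    """Combine in-sample and out-of-sample bounds period by period."""
    cf_lower = [f + a + b for f, a, b in zip(y_fit, w_lb, e_lb)]
    cf_upper = [f + a + b for f, a, b in zip(y_fit, w_ub, e_ub)]
    tau = [o - f for o, f in zip(y_obs, y_fit)]
    # The effect interval flips the counterfactual band around Y_obs.
    lower = [o - u for o, u in zip(y_obs, cf_upper)]
    upper = [o - l for o, l in zip(y_obs, cf_lower)]
    return tau, lower, upper, cf_lower, cf_upper


tau, lower, upper, cf_lower, cf_upper = assemble_bands(
    y_obs=[80.0, 75.0], y_fit=[95.0, 96.0],
    w_lb=[-2.0, -3.0], w_ub=[2.0, 3.0],
    e_lb=[-4.0, -5.0], e_ub=[4.0, 5.0],
)
print(tau, lower, upper)
```

Note that the upper counterfactual bound produces the lower effect bound, and vice versa.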

In-sample component (w_lb/w_ub)#

With Z = B (donor pre-outcomes), Q = Z'Z / T0 and pre-period residuals u = A - B w_hat, draw G* ~ N(0, Sigma) with Sigma = Z' diag(omega) Z / T0**2 and omega_t = (T0/(T0-df)) (u_t - E[u_t])**2 (HC1; E[u] from a regression of u on the active-donor design when u_missp). For each draw and post-period predictor p_T solve, over the localised simplex set,

min / max p_T' x   s.t.   (x - w_hat)' Q (x - w_hat) - 2 G*' (x - w_hat) <= 0,   sum(x) = 1,   x >= lb,

where lb_j = w_hat_j if w_hat_j < rho else 0 (the local geometry of Cattaneo et al.; rho is the data-driven regularisation parameter, capped at rho_max = 0.2). Q is reduced via a thresholded eigen-square-root so that collinear (near-null) donor directions are left unconstrained, exactly as in the reference conic reformulation. Across draws, w_lb is the alpha1/2 quantile of p_T'(w_hat - x) from the maximising branch and w_ub is the 1 - alpha1/2 quantile from the minimising branch.
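Two of the ingredients above are easy to state in code: the localised lower bounds and the HC1 variance weights. The sketch treats rho, df and the fitted residual means as inputs, since their estimation is not reproduced here; the helper names and numbers are illustrative.

```python
def local_lower_bounds(w_hat, rho):
    """lb_j = w_hat_j if w_hat_j < rho else 0."""
    return [w if w < rho else 0.0 for w in w_hat]


def hc1_weights(u, u_mean, df):
    """omega_t = (T0 / (T0 - df)) * (u_t - E[u_t])**2."""
    T0 = len(u)
    scale = T0 / (T0 - df)
    return [scale * (u_t - m) ** 2 for u_t, m in zip(u, u_mean)]


# Weights below rho are pinned at their estimate; larger ones may fall to 0.
print(local_lower_bounds([0.0, 0.05, 0.6, 0.35], rho=0.1))
# Four residuals, zero conditional mean, two degrees of freedom used.
print(hc1_weights([1.0, -1.0, 2.0, -2.0], [0.0, 0.0, 0.0, 0.0], df=2))
```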

Out-of-sample component (e_lb/e_ub)#

A location-scale model for e_T: regress u on the active-donor design to get the conditional mean E[e] and a log-variance model for the scale sqrt(Var[e]) (Gaussian), capped by the inter-quartile range of the residuals (IQR / 1.34). The Gaussian band is E[e] +/- sqrt(-2 ln alpha2) * scale; "ls" uses standardized-residual quantiles, "empirical" the raw residual quantiles.

This implements the canonical simplex case (w >= 0, sum w = 1), the scpi default and the standard synthetic control.
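As a worked instance of the Gaussian out-of-sample band, the sketch below evaluates E[e] +/- sqrt(-2 ln alpha2) * scale for given inputs. The conditional mean and scale are taken as already estimated, and gaussian_e_band is an illustrative name.

```python
from math import log, sqrt


def gaussian_e_band(cond_mean, scale, e_alpha=0.05):
    """Return (e_lb, e_ub) = E[e] -/+ sqrt(-2 ln alpha2) * scale."""
    half_width = sqrt(-2.0 * log(e_alpha)) * scale
    return cond_mean - half_width, cond_mean + half_width


multiplier = sqrt(-2.0 * log(0.05))
e_lb, e_ub = gaussian_e_band(cond_mean=0.5, scale=2.0, e_alpha=0.05)
print(round(multiplier, 4), round(e_lb, 3), round(e_ub, 3))
```

At alpha2 = 0.05 the multiplier is about 2.45, larger than the usual normal quantiles, so this band is on the conservative side.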

class mlsynth.utils.vanillasc_helpers.scpi.SCPIResult(tau: ndarray, lower: ndarray, upper: ndarray, cf_lower: ndarray, cf_upper: ndarray, M1_lower: ndarray, M1_upper: ndarray, M2_lower: ndarray, M2_upper: ndarray, metadata: Dict[str, Any])#

Per-post-period SCPI prediction intervals (arrays of length T_post).

M1_lower: ndarray#
M1_upper: ndarray#
M2_lower: ndarray#
M2_upper: ndarray#
cf_lower: ndarray#
cf_upper: ndarray#
lower: ndarray#
metadata: Dict[str, Any]#
tau: ndarray#
upper: ndarray#
mlsynth.utils.vanillasc_helpers.scpi.scpi_intervals(y: ndarray, Y0: ndarray, pre: int, W: ndarray, *, sims: int = 200, u_alpha: float = 0.05, e_alpha: float = 0.05, u_missp: bool = True, e_method: str = 'gaussian', seed: int = 0) SCPIResult#

Compute SCPI prediction intervals for a simplex synthetic control.

Parameters:
  • y (np.ndarray) – Treated outcome over all periods, shape (T,).

  • Y0 (np.ndarray) – Donor outcomes over all periods, shape (T, J) (columns match W).

  • pre (int) – Number of pre-treatment periods T0.

  • W (np.ndarray) – Fitted simplex donor weights, shape (J,).

  • sims (int) – Number of Gaussian draws for the in-sample simulation.

  • u_alpha, e_alpha (float) – In-sample (alpha1) and out-of-sample (alpha2) levels.

  • u_missp (bool) – If True, allow E[u | H] != 0 (estimated by regressing the pre-period residuals on the active-donor design); else assume 0.

  • e_method ({“gaussian”, “ls”, “empirical”}) – Tabulation for the out-of-sample shock.

  • seed (int) – RNG seed for the simulation.

Leave-Two-Out (LTO) refined placebo test for the synthetic control.

Lei & Sudijono (2025), “Inference for Synthetic Controls via Refined Placebo Tests” (arXiv:2401.07152). The ordinary placebo / permutation test builds its null distribution from only \(N\) reference estimates, so its p-value lives on the coarse grid \(\{1/N, 2/N, \dots, 1\}\) and has zero size when \(\alpha < 1/N\). The LTO test bypasses this by leaving two control units out at a time, producing \(O(N^2)\) reference comparisons while retaining the same finite-sample Type-I error guarantee under uniform assignment.
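The granularity argument is plain arithmetic and can be checked for the two panel sizes used in this document (N = 39 and N = 17). The helper names are illustrative.

```python
from math import comb


def placebo_min_p(N):
    """Smallest p-value the ordinary placebo test can return."""
    return 1.0 / N


def lto_pair_count(N):
    """Unordered control pairs available to LTO, with N - 1 controls."""
    return comb(N - 1, 2)


for N in (39, 17):
    print(N, round(placebo_min_p(N), 4), lto_pair_count(N))
# prints:
# 39 0.0256 703
# 17 0.0588 120
```

With N = 17 the smallest attainable placebo p-value already exceeds 0.05, so the ordinary test cannot reject at that level, while LTO draws on 120 pairs.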

Procedure (naive LTO, eqs. 5-7)#

Let \(I\) be the treated unit and \([N]\setminus\{I\}\) the controls (\(N = J + 1\) with \(J\) donors). For every unordered pair of distinct controls \(\{i, j\}\):

  1. Build the synthetic control for each \(k \in \{i, j, I\}\) using the donor pool \([N]\setminus\{i, j, I\}\) (all controls except \(i, j\)), and form the residual \(R_{i,j,I;k} = \lvert S(Y_k, \hat Y_k)\rvert\) with \(S\) the post/pre RMSPE-ratio statistic.

  2. Let \(R^{\mathrm{LTO}}_{i,j} = \max(R_{i,j,I;i}, R_{i,j,I;j})\); the treated unit “wins” the triple when \(R_{i,j,I;I} > R^{\mathrm{LTO}}_{i,j}\).

The naive LTO p-value counts the fraction of pairs the treated unit does not win,

\[p_{\mathrm{naive\text{-}LTO}} = \frac{1}{(N-1)(N-2)} \sum_{i \neq j} \mathbf{1}\{R_{i,j,I;I} \le R^{\mathrm{LTO}}_{i,j}\},\]

which (Theorem 2.2) satisfies \(\mathbb{P}_{H_0}(p_{\mathrm{naive\text{-}LTO}} \le \alpha) \le \lfloor N f(N, \alpha)\rfloor / N\).
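The procedure can be sketched in a few lines. The version below is not the mlsynth implementation: the fit argument stands in for a real synthetic-control solver, and the demo uses an equal-weight donor average purely so the example stays self-contained. Counting unordered pairs gives the same fraction as the ordered sum in the formula above, because the comparison is symmetric in i and j.

```python
from itertools import combinations
from math import sqrt


def rmspe_ratio(y, y_hat, pre):
    """Post/pre RMSPE ratio, the statistic S in the text."""
    def rmspe(a, b):
        return sqrt(sum((x - z) ** 2 for x, z in zip(a, b)) / len(a))
    return rmspe(y[pre:], y_hat[pre:]) / rmspe(y[:pre], y_hat[:pre])


def naive_lto(treated, controls, pre, fit):
    """Return (p_value, n_pairs); fit(target, donors) gives a synthetic path."""
    not_won, n_pairs = 0, 0
    for i, j in combinations(range(len(controls)), 2):
        pool = [c for k, c in enumerate(controls) if k not in (i, j)]
        r_I = abs(rmspe_ratio(treated, fit(treated, pool), pre))
        r_i = abs(rmspe_ratio(controls[i], fit(controls[i], pool), pre))
        r_j = abs(rmspe_ratio(controls[j], fit(controls[j], pool), pre))
        if r_I <= max(r_i, r_j):  # the treated unit does not win this triple
            not_won += 1
        n_pairs += 1
    return not_won / n_pairs, n_pairs


def equal_weight_fit(target, donors):
    """Stand-in for an SCM fit: the plain average of the donor pool."""
    return [sum(vals) / len(vals) for vals in zip(*donors)]


# Toy panel (made up): 3 pre-periods, 2 post-periods, 4 controls.
controls = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.2, 1.9, 3.1, 4.1, 4.9],
    [0.9, 2.1, 2.9, 3.9, 5.2],
    [1.1, 2.2, 3.0, 4.2, 5.0],
]
treated = [1.0, 2.1, 3.1, 1.0, 1.5]  # large post-treatment drop
p_value, n_pairs = naive_lto(treated, controls, pre=3, fit=equal_weight_fit)
print(p_value, n_pairs)
```

In practice fit would wrap an engine call on the reduced donor pool, which is why the cost grows with the number of pairs.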

Powered LTO (Theorem 2.3)#

For testing at a fixed level \(\alpha\), the powered p-value \(p_{\mathrm{powered\text{-}LTO}}(\alpha) = p_{\mathrm{naive\text{-}LTO}} - c(N, \alpha) + \delta\) shifts the naive value down by the largest amount that leaves the discrete Type-I bound unchanged, strictly increasing power. It is only valid for the \(\alpha\) it was computed at (reject when it is \(\le \alpha\)).

mlsynth.utils.vanillasc_helpers.lto.lto_f(N: int, alpha: float) float#

Type-I error rate function \(f(N, \alpha)\) (Lei-Sudijono eq. 9).

mlsynth.utils.vanillasc_helpers.lto.lto_placebo_test(engine: Any, y: ndarray, Y0: ndarray, pre: int, *, X1: ndarray | None = None, X0: ndarray | None = None, alpha: float = 0.05, max_pairs: int | None = None, seed: int = 0) Dict[str, Any]#

Run the Lei-Sudijono (2025) LTO refined placebo test.

Parameters:
  • engine (BilevelSCM) – Fitted-config synthetic-control engine; engine.fit(...) is re-run for each leave-two-out subproblem (any backend works, but the cost is \(O(J^2)\) fits, so fast backends are recommended).

  • y (np.ndarray) – Treated outcome over all periods, shape (T,).

  • Y0 (np.ndarray) – Donor outcomes, shape (T, J).

  • pre (int) – Number of pre-treatment periods.

  • X1, X0 (np.ndarray, optional) – Treated predictor vector (P,) and donor predictor matrix (P, J) (already windowed and scaled). None for outcome-only matching.

  • alpha (float) – Level at which the powered LTO p-value and Type-I bound are reported.

  • max_pairs (int, optional) – Cap on the number of donor pairs evaluated (deterministic subsample, for expensive backends). None -> all \(\binom{J}{2}\) pairs.

  • seed (int) – RNG seed for the pair subsample when max_pairs is set.

Returns:

dict with keys p_value (naive LTO), p_powered (valid only at alpha), c (powered offset), type_i_bound, n_pairs, treated_losses, N, alpha, reject (powered decision at alpha), and subsampled.
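One way a seeded, deterministic pair subsample could work is sketched below. This is an assumption about the mechanism, not a description of the library's code: select_pairs is a hypothetical helper, and the actual sampling scheme used when max_pairs is set may differ.

```python
import random
from itertools import combinations


def select_pairs(J, max_pairs=None, seed=0):
    """Return (pairs, subsampled) for J donors; hypothetical helper."""
    pairs = list(combinations(range(J), 2))
    if max_pairs is None or max_pairs >= len(pairs):
        return pairs, False
    rng = random.Random(seed)  # local RNG, so the draw is reproducible
    return sorted(rng.sample(pairs, max_pairs)), True


all_pairs, flag_all = select_pairs(38)
capped, flag_capped = select_pairs(38, max_pairs=100, seed=1)
print(len(all_pairs), flag_all, len(capped), flag_capped)
```

Using a local random.Random(seed) keeps the subsample independent of any global RNG state.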

mlsynth.utils.vanillasc_helpers.lto.lto_powered_offset(N: int, alpha: float) float#

c(N, alpha): largest shift leaving the discrete Type-I bound fixed.

Defined (Theorem 2.3) as the smallest c with f(N, alpha + c) = (floor(N f(N, alpha)) + 1) / N. Found by bisection on the monotone increasing f. Reproduces the paper’s values (c(39, 0.05) = 0.002, c(17, 0.05) = 0.0125).

mlsynth.utils.vanillasc_helpers.lto.lto_type_i_bound(N: int, alpha: float) float#

Discrete Type-I error upper bound \(\lfloor N f(N,\alpha)\rfloor/N\).

SCPI prediction intervals#

To request Cattaneo-Feng-Titiunik prediction intervals instead of the placebo test, set inference="scpi". On Prop 99 (outcome-only) this yields an ATT around \(-19\) with a 90% prediction interval that excludes zero, and per-period intervals that widen as the post-period extends.

import pandas as pd
from mlsynth import VanillaSC

d = pd.read_csv("basedata/augmented_cali_long.csv")
d["treated"] = ((d.state == "California") & (d.year >= 1989)).astype(int)

res = VanillaSC({
    "df": d[["state", "year", "cigsale", "treated"]],
    "outcome": "cigsale", "treat": "treated", "unitid": "state", "time": "year",
    "backend": "outcome-only", "inference": "scpi", "alpha": 0.05,
    "scpi_sims": 200, "display_graphs": False,
}).fit()

print(res.inference.ci_lower, res.inference.ci_upper)   # ATT prediction interval
det = res.inference.details                              # per-period sequence
for yr, lo, up in zip(det["periods"], det["pi_lower"], det["pi_upper"]):
    print(yr, round(lo, 1), round(up, 1))

SCPI with the covariate backends (MSCMT and Malo)#

The same inference="scpi" switch composes with the covariate-matching backends. Running each of the three canonical studies under both mscmt and malo (alpha=0.05 -> 90% intervals, scpi_sims=200, seed=1) gives the table below. The ATT prediction interval excludes zero in every case, and the two backends agree to within Monte-Carlo / weight-choice differences – a useful robustness cross-check. Note the v_agreement column: for Prop 99 and Germany under mscmt the predictor weights are non-identified (\(\approx 1\)), so those intervals should be read with the caveat above.

| Study (backend) | ATT | ATT 90% PI | v_agreement | Top donors |
|---|---|---|---|---|
| California (mscmt) | \(-18.98\) | \([-27.31,\,-5.28]\) | \(\approx 1\) (fragile) | Utah .34, Nevada .24, Montana .20 |
| California (malo) | \(-19.60\) | \([-31.32,\,-3.27]\) | n/a | Utah .38, Montana .25, Nevada .21 |
| Germany (mscmt) | \(-1396\) | \([-2368,\,-949]\) | \(\approx 1\) (fragile) | Austria .40, Switz .16, USA .15 |
| Germany (malo) | \(-1306\) | \([-2025,\,-521]\) | n/a | USA .35, Austria .33, Switz .11 |
| Basque (mscmt) | \(-0.70\) | \([-1.13,\,-0.32]\) | \(0.63\) | Cataluna .84, Madrid .16 |
| Basque (malo) | \(-0.63\) | \([-1.14,\,-0.18]\) | \(\approx 0\) (clean) | Cataluna .47, Madrid .33 |

The Basque case is the cleanest: with the special-predictor covariates, malo returns a well-identified \(V\) (v_agreement \(\approx 0\)) and mscmt recovers the published Cataluna/Madrid split, both with tight intervals that exclude zero. The early German post-years (1990-1992) are not significant under either backend – the interval includes zero – and only turn decisively negative later, exactly as the reunification narrative implies.

import pandas as pd
from mlsynth import VanillaSC

# --- California / Prop 99 (ADH 2010) ---
d = pd.read_csv("basedata/augmented_cali_long.csv")
for yr, col in [(1975, "cig_1975"), (1980, "cig_1980"), (1988, "cig_1988")]:
    d[col] = d.state.map(d[d.year == yr].set_index("state").cigsale)
d["treated"] = ((d.state == "California") & (d.year >= 1989)).astype(int)
cov = ["p_cig", "pct15-24", "loginc", "pc_beer", "cig_1975", "cig_1980", "cig_1988"]
win = {"p_cig": (1980, 1988), "pct15-24": (1980, 1988),
       "loginc": (1980, 1988), "pc_beer": (1984, 1988)}
common = dict(df=d, outcome="cigsale", treat="treated", unitid="state", time="year",
              covariates=cov, covariate_windows=win, inference="scpi",
              alpha=0.05, scpi_sims=200, seed=1, display_graphs=False)

mscmt = VanillaSC({**common, "backend": "mscmt", "canonical_v": "min.loss.w"}).fit()
malo  = VanillaSC({**common, "backend": "malo"}).fit()
for name, r in [("mscmt", mscmt), ("malo", malo)]:
    i = r.inference
    print(name, round(r.effects.att, 2), (round(i.ci_lower, 2), round(i.ci_upper, 2)),
          "v_agreement=", r.weights.summary_stats.get("v_agreement"))

# --- German reunification (ADH 2015): outcome "gdp", same pattern ---
# --- Basque (AG 2003): outcome "gdpcap", special-predictor covariates ---
# (swap df/outcome/covariates; everything else is identical.)

The per-period sequence is always in res.inference.details; switching backend changes \(\widehat w\) (and hence the centre and width of the band) but not the inference code path.

Leave-two-out refined placebo test#

Set inference="lto" for the Lei-Sudijono (2025) refined placebo test. It is a drop-in replacement for the ordinary placebo with a much finer p-value grid and valid rejections when \(\alpha < 1/N\).

import pandas as pd
from mlsynth import VanillaSC

d = pd.read_csv("basedata/augmented_cali_long.csv")
d["treated"] = ((d.state == "California") & (d.year >= 1989)).astype(int)

res = VanillaSC({
    "df": d[["state", "year", "cigsale", "treated"]],
    "outcome": "cigsale", "treat": "treated", "unitid": "state", "time": "year",
    "backend": "outcome-only", "inference": "lto", "alpha": 0.05,
    "display_graphs": False,
}).fit()

det = res.inference.details
print(res.inference.p_value)        # naive LTO p-value (703 pairs for N = 39)
print(det["p_powered"], det["powered_offset_c"])   # powered p-value at alpha
print(det["type_i_bound"], det["reject_at_alpha"])

Empirical relations across the three studies#

Lei-Sudijono’s Table 1 (their covariate-matched Synth specification, \(\alpha = 0.05\)) lays out how the methods relate on the canonical datasets:

| Quantity | Prop 99 | Basque | German |
|---|---|---|---|
| \(N\) | 39 | 17 | 17 |
| \(p_{\text{app-placebo}}\) | 0.00 | 0.35 | 0.00 |
| \(p_{\text{exact-placebo}}\) | 0.026 | 0.41 | 0.059 |
| \(p_{\mathrm{naive\text{-}LTO}}\) | 0.024 | 0.67 | 0.042 |
| \(p_{\mathrm{powered\text{-}LTO}}(\alpha)\) | 0.022 | 0.66 | 0.03 |
| \(\Gamma_{\mathrm{LTO}}\) | 1.4 | NA | 1.1 |

Three relations are worth internalising:

  • LTO can change the conclusion (German). The exact placebo p-value of 0.059 does not reject at 0.05, but the naive LTO (0.042) and powered LTO (0.03) both do. With only 16 donors the placebo grid is too coarse to resolve a borderline effect; LTO’s finer grid does. The small \(\Gamma = 1.1\), though, warns that this significance is fragile to mild departures from uniform assignment.

  • LTO is not mechanically smaller (Basque). Here LTO (0.67) is larger than the placebo (0.41); nothing is significant by any method. The refinement changes granularity, not direction – it does not manufacture significance. (Abadie-Gardeazabal’s original Basque analysis dropped poorly-fitting regions; LTO makes no such adjustment, which partly explains the larger value.)

  • LTO ≈ placebo when both already reject (Prop 99). The two p-values (0.024 vs 0.026) nearly coincide; the powered version (0.022) buys a little extra margin, and \(\Gamma = 1.4\) says the conclusion survives moderate confounding.
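The three relations can be checked mechanically by comparing each reported p-value from the table with \(\alpha = 0.05\). The numbers below are copied from that table; nothing is re-estimated.

```python
ALPHA = 0.05

# p-values as reported in Lei-Sudijono's Table 1 (reproduced above).
table = {
    "Prop 99": {"exact_placebo": 0.026, "naive_lto": 0.024, "powered_lto": 0.022},
    "Basque": {"exact_placebo": 0.41, "naive_lto": 0.67, "powered_lto": 0.66},
    "German": {"exact_placebo": 0.059, "naive_lto": 0.042, "powered_lto": 0.03},
}

# Reject when the p-value is at or below the level.
decisions = {
    study: {test: p <= ALPHA for test, p in p_values.items()}
    for study, p_values in table.items()
}
for study, d in decisions.items():
    print(study, d)
```

Only the German study changes verdict between the exact placebo and the LTO tests; Prop 99 rejects under all three and Basque under none.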

A note on specification and validation. The LTO p-value, like the ordinary placebo, is only as good as the synthetic-control fit it is built on. In mlsynth the \(f(N, \alpha)\) and powered-offset \(c(N, \alpha)\) functions reproduce the paper’s reported values exactly (\(c(39, 0.05) = 0.002\), \(c(17, 0.05) = 0.0125\); see test_lto_helpers_match_paper), and the covariate-matched ordinary placebo reproduces California’s exact-placebo p-value (rank 1 of 39, \(p = 1/39 \approx 0.0256\) vs the paper’s 0.026).

The p-value itself tracks the chosen specification: with the paper’s covariate-matched Synth, California is the dominant unit and \(p_{\mathrm{naive\text{-}LTO}} \approx 0.024\), whereas the outcome-only fit, where California is only rank 3 of 39, gives \(\approx 0.10\). Both are internally consistent with their respective ordinary placebo p-values; the covariate spec is what concentrates the effect on the treated unit, so choose the specification before reading the test.

Because the cost is \(O(J^2)\) engine fits, run the covariate-matched (mscmt) version on the smaller studies or cap pairs with lto_max_pairs; for the 38-donor Prop 99 panel the outcome-only LTO runs in well under two minutes.

References#

Abadie, A., & Gardeazabal, J. (2003). “The Economic Costs of Conflict: A Case Study of the Basque Country.” American Economic Review 93(1):113-132.

Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies.” Journal of the American Statistical Association 105(490):493-505.

Abadie, A., Diamond, A., & Hainmueller, J. (2015). “Comparative Politics and the Synthetic Control Method.” American Journal of Political Science 59(2):495-510.

Abadie, A., & L’Hour, J. (2021). “A Penalized Synthetic Control Estimator for Disaggregated Data.” Journal of the American Statistical Association 116(536):1817-1834.

Becker, M., & Kloessner, S. (2018). “Fast and Reliable Computation of Generalized Synthetic Controls.” Econometrics and Statistics 5:1-19.

Lei, L., & Sudijono, T. (2025). “Inference for Synthetic Controls via Refined Placebo Tests.” arXiv:2401.07152.

Malo, P., Eskelinen, J., Zhou, X., & Kuosmanen, T. (2024). “Computing Synthetic Controls Using Bilevel Optimization.” Computational Economics 64:1113-1136.