Synthetic Interventions (SI)#
When to Use This Estimator#
Classical synthetic control answers a single counterfactual question: what would the treated unit have done under the status quo? Synthetic Interventions (SI), due to Agarwal, Shah and Shen (2026) [SI], generalises this to many interventions at once: what would a unit have done under each of several alternative treatments it did not receive?
The motivating example is the canonical Proposition 99 tobacco study. In 1989 California enacted a large anti-tobacco program. Over the following decade other states instead adopted anti-tobacco programs (Arizona, Massachusetts, Oregon, Florida) or raised cigarette taxes (Alaska, Hawaii, Maryland, Michigan, New Jersey, New York, Washington). SI lets you ask not only “what would California have done under the status quo?” but also “what would California’s cigarette sales have been had it instead raised taxes or run a program?” — by borrowing the post-treatment trajectories of the states that actually did those things.
Reach for SI when:
You have multiple, distinct interventions across units (policies, product launches, treatment arms) and want to compare a focal unit’s counterfactual across them, not just against control.
A low-rank factor structure is plausible. SI rests on a latent-factor (interactive fixed-effects) model in which each unit’s latent loadings are shared across time and across interventions — the structural bridge that lets weights learned on pre-period control data transfer to post-period outcomes under a different intervention.
You want valid inference. The bias-corrected SI-PCR estimator (the default here) is asymptotically normal, yielding closed-form confidence intervals — a feature absent from most SC point estimators.
The flip side: SI assumes no interference and no dynamic effects (Assumption 1), and its factor model assumes each donor pool is observed under a single intervention throughout the post-period. Staggered adoption is not modelled (the paper only approximates it with a common post-window).
Notation#
Index time by \(t \in [T]\), units by \(i \in [N]\), and interventions by \(d \in \{0, 1, \dots, D\}\) (with \(d = 0\) the status quo). Let \(Y_{ti}(d)\) be the potential outcome of unit \(i\) at time \(t\) under intervention \(d\). The first \(T_0\) periods are pre-treatment (all units under control); the remaining \(T_1 = T - T_0\) are post-treatment. \(I(d)\) is the set of \(N_d\) units assigned to intervention \(d\) after \(T_0\) (the donor pool for \(d\)).
For a focal target unit \(i\), write \(Y_{\text{pre}, i} = [Y_{ti0} : t \le T_0] \in \mathbb{R}^{T_0}\) for its pre-period (control) outcomes and \(Y_{\text{pre}, I(d)} \in \mathbb{R}^{T_0 \times N_d}\) for the donor pool’s pre-period (control) outcomes. The estimand is
\[\theta_i(d) = \frac{1}{T_1} \sum_{t > T_0} \mathbb{E}\big[Y_{ti}(d) \mid \mathcal{E}\big],\]
the focal unit’s average post-period outcome had it received intervention \(d\) (in expectation, conditional on the latent factors \(\mathcal{E}\)).
Assumptions#
SI inherits the SC identification conditions and adds one structural assumption that does the real work — invariance of unit factors across interventions. Each is stated with a plain-language remark.
Assumption 1 (SUTVA / observation pattern). Pre-treatment, every unit is observed under control (\(Y_{ti0} = Y_{ti}(0)\) for \(t \le T_0\)); post-treatment, each unit is observed under its assigned intervention (\(Y_{tid} = Y_{ti}(d)\) for \(t > T_0\), \(i \in I(d)\)).
Remark. This rules out spillovers between units and, with the static factor model below, dynamic (carry-over) treatment effects. Estimating a counterfactual is then exactly a tensor-completion problem: imputing the unobserved \((t, i, d)\) cells of the potential-outcome tensor.
Assumption 2 (tensor factor model — the SI assumption). Each potential outcome factorises as
\[Y_{ti}(d) = \langle u_t(d), v_i \rangle + \varepsilon_{ti}(d),\]
where the unit factors \(v_i\) are invariant across both time and interventions, only the time-intervention factors \(u_t(d)\) depend on \(d\), and \(\varepsilon_{ti}(d)\) is idiosyncratic noise.
Remark. This is the crux. In single-intervention SC the unit factors are invariant across time, which lets pre-period weights predict post-period control outcomes. SI strengthens this to invariance across interventions: weights learned from pre-period control data can be applied to post-period outcomes under a different intervention \(d\). Conceptually, each unit has stable intrinsic traits (\(v_i\)) that any intervention acts on through \(u_t(d)\).
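The transfer property in the remark above can be checked numerically. The following is a minimal noiseless sketch in plain NumPy, not part of mlsynth; the dimensions and variable names are illustrative assumptions. It builds a potential-outcome tensor from the factor model, learns weights from control data only, and applies them under every other intervention.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, D, r = 30, 8, 3, 2           # periods, donors, interventions, latent rank (illustrative)

V = rng.normal(size=(N, r))        # donor unit factors v_j, shared across t and d
U = rng.normal(size=(D, T, r))     # time-intervention factors u_t(d)

# Focal unit's factor placed inside the donor span (Assumption 4).
v_focal = V.T @ rng.normal(size=N)

# Noiseless potential outcomes: Y[d, t, j] = <u_t(d), v_j>.
Y_donors = np.einsum("dtr,jr->dtj", U, V)      # shape (D, T, N)
Y_focal = np.einsum("dtr,r->dt", U, v_focal)   # shape (D, T)

# Learn weights from intervention 0 (control) only ...
w_hat, *_ = np.linalg.lstsq(Y_donors[0], Y_focal[0], rcond=1e-10)

# ... and reuse them on donor outcomes under every intervention.
transfer_error = max(
    np.abs(Y_donors[d] @ w_hat - Y_focal[d]).max() for d in range(D)
)
print(f"max transfer error across interventions: {transfer_error:.2e}")
```

Without noise the error is at machine precision for every \(d\), because any weights that reproduce \(v_i\) from the donor factors do so regardless of which \(u_t(d)\) multiplies them.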
Assumption 3 (low rank). The signal \(\mathbb{E}[Y_{\text{pre}, I(d)} \mid \mathcal{E}]\) is low rank (rank \(r_{\text{pre}} \le r\)).
Remark. This is what makes the spectral (PCR) denoising step meaningful: the large singular values of the noisy donor pre-matrix capture the signal, the small ones capture noise.
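As a rough illustration of the signal/noise split in the singular values, the sketch below simulates a noisy rank-2 donor pre-matrix with plain NumPy. The sizes and noise level are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T0, N_d, r, sigma = 80, 12, 2, 0.5   # illustrative sizes

# Rank-r signal plus i.i.d. Gaussian noise.
signal = rng.normal(size=(T0, r)) @ rng.normal(size=(r, N_d))
noisy = signal + sigma * rng.standard_normal((T0, N_d))

s = np.linalg.svd(noisy, compute_uv=False)   # singular values, descending
gap_ratio = s[r - 1] / s[r]                  # last signal value vs first noise value

print("singular values:", np.round(s, 1))
print(f"s_r / s_(r+1) = {gap_ratio:.1f}")
```

A ratio well above 1 at position \(r\) is the "clear gap" that makes truncation meaningful; a ratio near 1 at every position suggests there is no low-rank signal to recover.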
Assumption 4 (span / linear span condition). The focal unit’s factor lies in the span of the donor pool’s factors, so a weight vector \(w^{(i,d)}\) exists with \(v_i = \sum_{j \in I(d)} w^{(i,d)}_j v_j\).
Remark. The multi-intervention analogue of “the treated unit lies in the convex hull of the donors.” A strong pre-period fit (small \(\|Y_{\text{pre},i} - Y_{\text{pre},I(d)} w\|\)) is the data-driven sanity check; a poor fit warns that the span condition or low-rank structure fails.
Assumption 5 (homoskedastic noise). The idiosyncratic noise is mean-zero with common variance \(\sigma^2\).
Remark. Used only for the variance estimate \(\hat\sigma^2\) (eq. 14) behind the confidence interval; the point estimator does not need it.
Assumptions 6-8 (regularity for normality). Bounded factors / sub-Gaussian noise / incoherence-type conditions, plus the rate constraints \(N_d < T_0\) and \(T_1 = \tilde o(\min\{r_{\text{pre}}^{-3} N_d,\, r_{\text{pre}}^{-1} \sqrt{T_0}\})\).
Remark. These are what Theorem 2 needs for asymptotic normality. The practical content: the post-window \(T_1\) must be small relative to the pre-window \(T_0\), and the target must have a non-vanishing pre-period signal. The Monte Carlo below shows the CI’s coverage degrade exactly when \(T_1\) is pushed too large.
When the assumptions bind: practical diagnostics#
Assumptions 1-8 are stated above in their structural form. Here is what each looks like in a real dataset, and what to check in the SI fit object before trusting an arm-level counterfactual.
SUTVA / no spillovers / no carry-over (A1). SI assumes each unit’s post-period outcome under \(d\) is a function of \(d\) only – no influence from other units’ interventions, no dynamic effect from pre-period exposure.
Plausibly violated when the interventions are geographically or socially adjacent (state-level tobacco programs whose advertising crosses borders; vertically linked markets), or when treatment has a persistent effect that bleeds into the post-window. Diagnostic: re-run SI dropping donors that are geographic / network neighbours of treated units; large changes in an arm’s counterfactual flag interference. For genuine spillovers switch to Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD); for dynamics switch to Time-Aware Synthetic Control (TASC)/Dynamic Synthetic Control for Auto-Regressive processes (DSCAR).
Factor invariance of unit loadings across interventions (A2 – the SI assumption). Each unit’s \(v_i\) is the same whether observed under control or under \(d\). SI’s transfer step is precisely the statement that weights learned on pre-period control data work to impute post-period outcomes under \(d\).
Plausibly violated when the intervention changes who the donors are: a tax raises the price elasticity itself for tax-state consumers, a marketing program builds new audience segments inside program states. Once \(v_i\) shifts after \(T_0\), the pre-period weights are stale. Diagnostic: this is the silent failure – a pre-fit can look excellent while the counterfactual is biased, because the pre-data is all under control. The empirical cross-check (Section 6.2 of the paper) is to hold out a slice of the donor pool’s post-period outcomes under \(d\) and verify that the pre-period weights also reproduce those; SI.fit exposes the per-arm validation coverage (e.g. 26/38, 6/7, 3/5 in the Prop 99 case study). Low validation coverage for an arm is the only honest flag for an A2 failure on that arm.
Low-rank donor structure (A3). The donor pre-matrix is approximately low-rank, so the spectral truncation kept by Gavish-Donoho separates signal from noise.
Plausibly violated when the spectrum decays slowly (no clear gap), or when individual donors carry idiosyncratic noise comparable to the signal. Diagnostic: print arm.singular_values (or recompute the SVD of the donor pre-matrix) and look for a sharp gap; a slow decay means the Gavish-Donoho cut is somewhere in the noise floor. If rank_method="donoho" keeps k close to min(T0, N_d), the low-rank story has failed and SI-PCR is essentially OLS on the donor columns.
Span condition (A4). The focal unit’s \(v_i\) lies in the span of the donor pool’s loadings under \(d\).
Plausibly violated when the focal unit is structurally different from every donor under intervention \(d\) – California (a coastal mega-state) against a donor pool that happens to be small interior states. Diagnostic: inspect the per-arm pre-period RMSE of \(Y_{\text{pre}, i}\) against \(Y_{\text{pre}, I(d)} \hat w\); a visibly poor pre-fit on a particular arm means that arm’s span condition is failing. This is the loud failure mode – it shows up directly in pre-fit residuals.
Homoskedastic noise (A5). Only the variance estimate \(\hat\sigma^2\) and hence the CI width depend on this; the point estimator is unaffected.
Plausibly violated when donor variance is heavily heterogeneous (a quiet donor next to a noisy one). Diagnostic: per-donor pre-period residual variance; if it spans an order of magnitude, treat the CI as approximate. Switching to variance="time_iv" or "double" (the default) is more robust than the main-text "units" estimate.
Rate condition on \(T_1\) vs. \(T_0\) (A6-8). Theorem 2 needs \(T_1\) small relative to \(T_0\) and factors that stay bounded. The note further below shows coverage collapsing from 93% to 52% when the post-window is pushed and factors are nonstationary.
Plausibly violated when you want to track an effect over many post-periods (multi-year follow-up after a one-shot policy change), or when factors trend like a random walk (financial / business-cycle panels). Diagnostic: refit with the post-window cropped to the first few periods; if the counterfactual changes materially, the long-horizon CI was not protected by Theorem 2. Pair this with Synthetic Business Cycle (SBC) (a stationary-cycle estimator) if the factor nonstationarity is what you suspect.
Graphical demonstration: span condition vs. factor invariance#
The decisive distinction in practice is between A4 (the span condition, which fails loudly – you see it in a poor pre-fit) and A2 (the cross-intervention factor invariance, which fails silently – pre-fit looks fine but the post-period counterfactual is wrong). The block below generates a rank-\(r = 2\) panel and overlays SI’s counterfactual on the true noiseless one in three regimes side by side (both assumptions hold, A2 violated, A4 violated), holding everything else fixed.
import numpy as np
import matplotlib.pyplot as plt
from mlsynth.utils.si_helpers.estimation import bias_corrected_fit
N, T0, T1, r, sigma = 12, 80, 20, 2, 0.5
T = T0 + T1
rng = np.random.default_rng(0)
# Shared time factors. The intervention's effect is a post-period shift in u_t.
U_ctrl = rng.normal(0.0, 1.0, (T, r))
U_d = U_ctrl.copy()
U_d[T0:] += np.array([0.0, 5.0])
V = rng.normal(0.0, 1.0, (N, r)) # donor loadings (unit factors)
def fit_panel(v_target_pre, v_target_post, V_donor, seed):
    """One SI fit. v_target_pre / v_target_post are the focal unit's
    loading under control (pre-period) and under intervention d (post)."""
    gen = np.random.default_rng(seed)
    pre_donor = U_ctrl[:T0] @ V_donor.T + sigma * gen.standard_normal((T0, N - 1))
    pre_target = U_ctrl[:T0] @ v_target_pre + sigma * gen.standard_normal(T0)
    post_donor = U_d[T0:] @ V_donor.T + sigma * gen.standard_normal((T1, N - 1))
    omega, w, _ = bias_corrected_fit(pre_donor, pre_target, rank=r)
    pre_fit = pre_donor[:, omega] @ w
    cf = post_donor[:, omega] @ w
    truth = U_d[T0:] @ v_target_post  # noiseless cf under d
    return pre_target, pre_fit, cf, truth
v_in_span = 0.5 * V[1] + 0.5 * V[2] # focal loading inside donor span
# Regime A: A2 and A4 both hold.
regime_A = fit_panel(v_in_span, v_in_span, V[1:], seed=1)
# Regime B: A2 violated -- focal unit's loading SHIFTS under d.
# (Donors still satisfy A2; only the focal unit changes between
# pre-control and post-d. This is the structural identifying failure.)
v_target_under_d = v_in_span + np.array([1.5, -1.5])
regime_B = fit_panel(v_in_span, v_target_under_d, V[1:], seed=2)
# Regime C: A4 violated -- focal loading far outside donor span.
v_out_span = np.array([4.0, -4.0])
regime_C = fit_panel(v_out_span, v_out_span, V[1:], seed=3)
fig, ax = plt.subplots(1, 3, figsize=(15, 4), sharex=True)
t_pre, t_post = np.arange(T0), np.arange(T0, T)
titles = [
    "A2 & A4 hold:\nSI cf hugs truth",
    "A2 violated (silent):\npre-fit OK, cf badly biased",
    "A4 violated (loud):\npre-fit poor -- visible flag",
]
for a, (pre_t, pre_f, cf, truth), title in zip(ax, [regime_A, regime_B, regime_C], titles):
    a.plot(t_pre, pre_t, "k", lw=1.0, alpha=0.6, label="focal (observed)")
    a.plot(t_pre, pre_f, "C0--", lw=1.2, label="SI pre-fit")
    a.plot(t_post, truth, "k", lw=1.8, label="true cf under d")
    a.plot(t_post, cf, "C3", lw=1.8, label="SI cf under d")
    a.axvline(T0, color="gray", ls=":")
    a.set_title(title); a.set_xlabel("time")
    pre_rmse = float(np.sqrt(((pre_t - pre_f) ** 2).mean()))
    post_err = float(np.abs(cf - truth).mean())
    a.text(0.02, 0.97,
           f"pre RMSE = {pre_rmse:.2f}\npost |cf-truth| = {post_err:.2f}",
           transform=a.transAxes, va="top", fontsize=9, family="monospace",
           bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="0.7"))
ax[0].set_ylabel("Y")
ax[0].legend(loc="lower left", fontsize=8)
plt.tight_layout(); plt.show()
Representative output (seeds fixed in the snippet):
Regime A (assumptions hold): pre RMSE = 0.53 post |cf - truth| = 0.24
Regime B (A2 violated): pre RMSE = 0.55 post |cf - truth| = 7.46
Regime C (A4 violated): pre RMSE = 1.07 post |cf - truth| = 1.03
The three panels read as follows.
Left (assumptions hold). Pre-fit is tight and the SI counterfactual sits on top of the truth in the post-period; the mean absolute gap is at the noise floor.
Middle (A2 violated – silent). The focal unit’s loading \(v_i\) changes between the pre-period (under control) and the post-period (under \(d\)). Pre-fit is just as tight as in Regime A (pre RMSE 0.55 vs 0.53) – the pre-data is all under control, so the shift in the focal unit’s loading is invisible to it. But the weights learned on pre-control data target the wrong loading for the post-d projection, and the SI counterfactual misses the truth by an order of magnitude (post error ~7.5 vs noise floor ~0.5). Nothing in the pre-period fit warns of this. The only practical defence is the arm-level validation-coverage check the case study uses: hold out part of the donor pool’s post-period under \(d\) and verify that the pre-period weights reproduce it. Low validation coverage for an arm = A2 failure on that arm.
Right (A4 violated – loud). The focal unit’s loading is structurally outside the donor span. Pre-fit residuals are visibly large already (pre RMSE 1.07, twice the noise floor), and the post-period counterfactual is correspondingly wrong. This failure mode is detectable from pre-data alone, so it is the safer one.
The take-away: a tight pre-period fit is necessary but not sufficient for trusting an SI counterfactual on a given arm. Always pair it with arm-level validation coverage before reading off post-period effects.
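The hold-out idea behind the validation check can be sketched as a leave-one-donor-out loop within a single arm. This is a hypothetical illustration in plain NumPy with OLS weights; how mlsynth computes its validation coverage is not shown in this section, and `donor_holdout_errors` is a name invented for the example. It reports raw prediction errors, not interval coverage.

```python
import numpy as np

def donor_holdout_errors(Y_pre, Y_post):
    """Leave-one-donor-out check inside one arm (illustrative helper).

    Y_pre  : (T0, N_d) donors' pre-period outcomes, all under control
    Y_post : (T1, N_d) the same donors' post-period outcomes under d
    Returns |predicted - observed| post-period mean for each held-out donor.
    """
    n_donors = Y_pre.shape[1]
    errors = np.empty(n_donors)
    for j in range(n_donors):
        others = np.delete(np.arange(n_donors), j)
        # Weights learned on pre-period control data only.
        w, *_ = np.linalg.lstsq(Y_pre[:, others], Y_pre[:, j], rcond=None)
        # Applied to the other donors' post-period outcomes under d.
        errors[j] = abs((Y_post[:, others] @ w).mean() - Y_post[:, j].mean())
    return errors

rng = np.random.default_rng(0)
T0, T1, N_d, r, sigma = 60, 5, 8, 2, 0.1
U_pre = rng.normal(size=(T0, r))
U_post = rng.normal(size=(T1, r)) + np.array([0.0, 3.0])
V = rng.normal(size=(N_d, r))

Y_pre = U_pre @ V.T + sigma * rng.standard_normal((T0, N_d))
# A2 holds: the same loadings V drive the post-period.
Y_post_ok = U_post @ V.T + sigma * rng.standard_normal((T1, N_d))
# A2 fails: every donor's loading shifts once the intervention starts.
V_shifted = V + rng.normal(size=(N_d, r))
Y_post_bad = U_post @ V_shifted.T + sigma * rng.standard_normal((T1, N_d))

err_ok = donor_holdout_errors(Y_pre, Y_post_ok)
err_bad = donor_holdout_errors(Y_pre, Y_post_bad)
print(f"mean hold-out error, A2 holds:    {err_ok.mean():.2f}")
print(f"mean hold-out error, A2 violated: {err_bad.mean():.2f}")
```

The pre-period matrix is identical in both runs, so only the post-period hold-out can tell the two regimes apart.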
When not to use SI#
You only need a single counterfactual against control. SI’s whole point is comparing a focal unit’s counterfactual across several alternative interventions. If you only need the status-quo counterfactual (the classical SC question), Two-Step Synthetic Control, Forward Difference-in-Differences (FDID), canonical SCM, or Factor Model Approach (FMA) are simpler and have stronger small-\(T_0\) properties.
Donor pool for an arm is too small for the rank. SI’s bias-corrected inference requires \(N_d > k\) (and ideally \(N_d\) somewhat bigger than the selected rank). With three or four donors per arm and a Gavish-Donoho rank near that, the rank-complete subset \(\Omega\) is the entire donor set and the variance is unstable. Either pool arms (broader policy categories), prune the rank by hand (rank_method="fixed"), or step down to a single-arm SC.
Spillovers across treatment arms. Interventions whose effect propagates across units (a tobacco program’s media campaign reaching tax-only states, a marketing intervention shared via social graph) break A1 at the inter-arm boundary, not just within an arm. SI cannot fix this; use Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD) and accept that identification is now at the aggregate (not per-arm) level.
Dynamic / carry-over treatment effects. SI’s tensor model is static: \(u_t(d)\) depends on \(d\) and \(t\) but the treatment has no within-unit dynamics. Persistent effects (a tax that trains consumers over time) or treatment-effect dynamics on the treated belong in Time-Aware Synthetic Control (TASC) (state-space) or Dynamic Synthetic Control for Auto-Regressive processes (DSCAR) (autoregressive treated process).
Staggered adoption with a wide reporting window. SI’s framing treats each donor as observed under a single intervention throughout the post-period; the paper and mlsynth approximate staggered designs with a common short post-window (1999-2002 in the Prop 99 case). If your post-window must span many years of cumulative adoption – some donors enter the intervention years apart – SI’s identification gradually erodes. Use the staggered SC variants in FECT or Synthetic Difference-in-Differences (SDID) instead.
No low-rank factor structure. When the donor spectrum has no clear gap (e.g. each donor genuinely idiosyncratic), the Gavish-Donoho cut keeps too many components and SI-PCR essentially fits OLS. In that regime a covariate-balancing estimator (MicroSynth (User-Level Balancing SC) if you have unit-level data, a balancing-aware aggregate alternative otherwise) is closer to the truth than forcing a low-rank fit.
Continuous or multi-valued treatment. SI partitions units into arms by discrete intervention label \(d\). Continuous dose (minimum wage, ad spend, drug dosage) needs the GSC framework in Continuous-Treatment Synthetic Control (CTSC).
Long post-window with nonstationary factors. As the note below shows, coverage collapses (~93% → ~52%) when \(T_1\) grows and factors drift like a random walk. If your application requires many-period post-windows on trending series, either crop the post-window for inference and report the rest as descriptive, or switch to a stationary-cycle approach (Synthetic Business Cycle (SBC)).
You need per-period or per-unit causal estimates inside an arm. SI delivers the focal unit’s counterfactual under each intervention, not unit-specific effects across the donor pool. For heterogeneous treatment effects across donors within an arm, use Continuous-Treatment Synthetic Control (CTSC) (which estimates unit-specific slopes).
Mathematical Formulation#
The SI Estimator (Proposition 1)#
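Under Assumptions 1-4 there is a weight vector \(w^{(i,d)}\) such that the estimand is recovered from donor-pool outcomes under \(d\), and the weights are identified from pre-period control data alone. The SI estimator is two steps:
\[\hat w^{(i,d)} = \arg\min_{w \in \mathcal{W}} \big\| Y_{\text{pre}, i} - Y_{\text{pre}, I(d)}\, w \big\|_2^2, \qquad \hat\theta_i(d) = \frac{1}{T_1} \sum_{t > T_0} \sum_{j \in I(d)} \hat w^{(i,d)}_j\, Y_{tjd}.\]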
The choice of constraint set \(\mathcal{W}\) recovers the usual SC variants
(simplex, ridge, lasso, OLS). mlsynth implements the PCR variant.
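A minimal sketch of the two steps with an unconstrained (OLS) weight set follows. It uses plain NumPy, not the mlsynth API; `si_two_step` and the simulated panel are assumptions made for illustration.

```python
import numpy as np

def si_two_step(Y_pre_target, Y_pre_donors, Y_post_donors):
    """Two-step SI with OLS weights (illustrative helper).

    Y_pre_target  : (T0,)      focal unit, pre-period, under control
    Y_pre_donors  : (T0, N_d)  donor pool I(d), pre-period, under control
    Y_post_donors : (T1, N_d)  donor pool I(d), post-period, under d
    """
    # Step 1: learn weights on pre-period control data only.
    w, *_ = np.linalg.lstsq(Y_pre_donors, Y_pre_target, rcond=None)
    # Step 2: apply them to the donors' post-period outcomes under d.
    cf_path = Y_post_donors @ w
    return w, cf_path, cf_path.mean()

rng = np.random.default_rng(0)
T0, T1, N_d, r, sigma = 60, 5, 6, 2, 0.1
U_ctrl = rng.normal(size=(T0, r))                         # u_t(0), pre-period
U_d = rng.normal(size=(T1, r)) + np.array([2.0, 0.0])     # u_t(d), post-period
V = rng.normal(size=(N_d, r))                             # donor unit factors
v_focal = V.T @ rng.dirichlet(np.ones(N_d))               # focal factor in donor span

Y_pre_donors = U_ctrl @ V.T + sigma * rng.standard_normal((T0, N_d))
Y_pre_target = U_ctrl @ v_focal + sigma * rng.standard_normal(T0)
Y_post_donors = U_d @ V.T + sigma * rng.standard_normal((T1, N_d))

w_hat, cf_path, theta_hat = si_two_step(Y_pre_target, Y_pre_donors, Y_post_donors)
theta_true = (U_d @ v_focal).mean()                       # noiseless estimand
print(f"theta_hat = {theta_hat:.3f}   theta_true = {theta_true:.3f}")
```

Swapping the `lstsq` call for a constrained or regularised solver changes only step 1; step 2 is the same for every choice of \(\mathcal{W}\).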
SI-PCR (eq. 10)#
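Let the SVD of the donor pre-matrix be \(Y_{\text{pre}, I(d)} = \sum_\ell \hat s_\ell \hat u_\ell \hat v_\ell^\top\). Keeping the top \(k\) components,
\[\hat w^{(i,d)} = \Big(\sum_{\ell=1}^{k} \hat s_\ell^{-1}\, \hat v_\ell \hat u_\ell^\top\Big) Y_{\text{pre}, i}.\]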
SI-PCR projects the donor pre-matrix onto its top-\(k\) principal subspace
(denoising it under Assumption 3) and regresses the target onto the result. The
rank \(k\) is chosen by the Gavish-Donoho optimal hard threshold. The
default rank_method="donoho" reproduces the authors’ exact rule (the
\(\omega(\beta)\) approximation evaluated at \(\beta = T_0/N_d\));
"usvt" is the same threshold at the canonical min/max aspect ratio,
"cumvar" keeps a spectral-energy fraction, and "fixed" takes an explicit
\(k\). SI-PCR reuses the HSVT primitives shared with Cluster Synthetic Controls (CLUSTERSC).
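The rank selection and the PCR fit can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not mlsynth’s implementation: the threshold uses the published Gavish-Donoho approximation \(\omega(\beta) \approx 0.56\beta^3 - 0.95\beta^2 + 1.82\beta + 1.43\) at the min/max aspect ratio (the convention the text calls "usvt"), and the function names are invented for the example.

```python
import numpy as np

def gavish_donoho_rank(X):
    """Rank from the Gavish-Donoho hard threshold with unknown noise level."""
    s = np.linalg.svd(X, compute_uv=False)
    beta = min(X.shape) / max(X.shape)            # aspect ratio in (0, 1]
    omega = 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43
    tau = omega * np.median(s)                    # threshold on singular values
    return max(1, int(np.sum(s > tau)))

def pcr_weights(Y_pre_donors, Y_pre_target, k):
    """PCR weights: pseudo-inverse of the rank-k truncated SVD applied to the target."""
    U, s, Vt = np.linalg.svd(Y_pre_donors, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ Y_pre_target) / s[:k])

rng = np.random.default_rng(0)
T0, N_d, r, sigma = 80, 12, 2, 0.5
factors = rng.normal(size=(T0, r))
loadings = rng.normal(size=(N_d, r))
donor_pre = factors @ loadings.T + sigma * rng.standard_normal((T0, N_d))
target_pre = factors @ loadings[:3].mean(axis=0) + sigma * rng.standard_normal(T0)

k = gavish_donoho_rank(donor_pre)
w = pcr_weights(donor_pre, target_pre, k)
pre_rmse = np.sqrt(np.mean((target_pre - donor_pre @ w) ** 2))
print(f"selected rank k = {k}, pre-period RMSE = {pre_rmse:.2f}")
```

With \(k\) equal to the full column rank, `pcr_weights` coincides with ordinary least squares, which is the degenerate case described under "No low-rank factor structure".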
Bias-Corrected SI-PCR and Inference (Section 4.3)#
Plain SI-PCR is consistent (Corollary 1) but converges too slowly for normality, because spreading weight across many near-collinear donors dilutes the weight norm and deflates the variance term. The bias-corrected estimator fixes this by restricting to a rank-complete donor subset \(\Omega \subset I(d)\) with \(|\Omega| = k\) columns of full rank, and fitting by pseudo-inverse:
\[\hat w^{(i,d,\Omega)} = \big(Y^{k}_{\text{pre}, \Omega}\big)^{\dagger}\, Y_{\text{pre}, i},\]
where \(Y^k\) is the rank-\(k\) approximation. This is a second layer
of (structured) sparsity: contributions outside \(\Omega\) are zeroed by
explicit model selection, concentrating weight along independent directions and
stabilising the weight norm. mlsynth selects \(\Omega\) by
column-pivoted QR on the denoised donor matrix.
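A sketch of the subset selection, assuming SciPy’s column-pivoted QR as the pivoting routine. This is one plausible reading of the description above, not mlsynth’s internal code; `rank_complete_fit` is a hypothetical helper.

```python
import numpy as np
from scipy.linalg import qr

def rank_complete_fit(Y_pre_donors, Y_pre_target, k):
    """Select k independent donor columns by pivoted QR, then fit by pseudo-inverse."""
    U, s, Vt = np.linalg.svd(Y_pre_donors, full_matrices=False)
    Y_k = (U[:, :k] * s[:k]) @ Vt[:k]             # rank-k denoised donor matrix
    # Column pivoting orders columns by how much new direction each contributes.
    _, _, piv = qr(Y_k, mode="economic", pivoting=True)
    omega = np.sort(piv[:k])                      # indices of the active donors
    w = np.linalg.pinv(Y_k[:, omega]) @ Y_pre_target
    return omega, w

rng = np.random.default_rng(0)
T0, N_d, r, sigma = 80, 12, 2, 0.5
factors = rng.normal(size=(T0, r))
loadings = rng.normal(size=(N_d, r))
donor_pre = factors @ loadings.T + sigma * rng.standard_normal((T0, N_d))
target_pre = factors @ loadings[:3].mean(axis=0) + sigma * rng.standard_normal(T0)

omega, w = rank_complete_fit(donor_pre, target_pre, k=r)
print("active donors:", omega, " weight norm:", round(float(np.linalg.norm(w)), 3))
```

All donors outside `omega` receive zero weight, so the weight vector has exactly \(k\) non-zero entries.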
The estimator is then asymptotically normal (Theorem 2): \(\sqrt{T_1}\,(\hat\theta_i^\Omega(d) - \theta_i(d)) / (\sigma \|w\|_2) \to \mathcal{N}(0, 1)\), giving the closed-form confidence interval
\[\hat\theta_i^\Omega(d) \pm z_{\alpha/2}\, \hat\sigma\, \|\hat w\|_2 / \sqrt{T_1},\]
where \(\hat\sigma\) is the noise estimate of eq. 14: the residual of regressing the target’s pre-period onto the donor subspace spanned by \(\hat U_k\), the left singular vectors of the rank-\(k\) donor approximation.
mlsynth exposes three noise-variance estimators via variance: the
main-text "units" estimator (eq. 14 above), a "time_iv" estimator from
the donor post-period residual, and the degrees-of-freedom-weighted "double"
combination (the default, matching the authors’ code). The interval can be the
eq.-13 confidence interval (interval="confidence") or the wider prediction
interval (interval="prediction", half-width
\(z_{\alpha/2}\hat\sigma\sqrt{1 + \|\hat w\|_2^2}/\sqrt{T_1}\)) the case
study uses for coverage validation.
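The two half-widths differ only by the \(\sqrt{1 + \|\hat w\|_2^2}\) factor. A small standard-library sketch makes the comparison concrete; the input values are made up for illustration and `si_half_widths` is not an mlsynth function.

```python
from math import sqrt
from statistics import NormalDist

def si_half_widths(sigma_hat, weight_norm, T1, alpha=0.05):
    """Half-widths of the confidence and prediction intervals described above."""
    z = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}, about 1.96 for alpha = 0.05
    ci = z * sigma_hat * weight_norm / sqrt(T1)
    pi = z * sigma_hat * sqrt(1 + weight_norm**2) / sqrt(T1)
    return ci, pi

# Illustrative inputs: sigma_hat = 1.0, ||w||_2 = 0.8, four post-periods.
ci, pi = si_half_widths(sigma_hat=1.0, weight_norm=0.8, T1=4)
print(f"CI half-width: {ci:.3f}   PI half-width: {pi:.3f}")
```

The prediction interval is always the wider of the two, and the gap grows as the weight norm shrinks.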
Monte Carlo: Coverage of the Confidence Interval#
The block below draws low-rank panels (a focal unit plus a donor pool sharing \(r\) latent factors), fits the bias-corrected estimator, and checks whether the CI covers the true (noiseless) counterfactual mean \(\theta_i(d)\). The focal unit receives no genuine effect, so the counterfactual is its own noiseless post-period mean.
import numpy as np
from mlsynth.utils.si_helpers.estimation import bias_corrected_fit
rng = np.random.default_rng(0)
N, T0, T1, r, sigma = 10, 80, 4, 3, 1.0 # paper regime: T1 small vs T0
cov = 0
reps = 600
for _ in range(reps):
    T = T0 + T1
    F = rng.normal(0, 1, (T, r))  # bounded (stationary) factors
    lam = rng.normal(0, 1, (N, r))
    L = lam @ F.T
    Y = L + sigma * rng.standard_normal((N, T))
    donor_pre, target_pre = Y[1:, :T0].T, Y[0, :T0]
    omega, w, sig = bias_corrected_fit(donor_pre, target_pre, rank=r)
    theta_hat = (Y[1:, T0:].T[:, omega] @ w).mean()
    theta_true = L[0, T0:].mean()
    half = 1.96 * sig * np.linalg.norm(w) / np.sqrt(T1)
    cov += theta_hat - half <= theta_true <= theta_hat + half
print(f"95% CI coverage: {cov / reps:.3f}") # ~0.933
Under the paper’s regime (\(T_0 = 80\), \(T_1 = 4\), bounded factors) the empirical coverage is 0.933, slightly below but close to the nominal 0.95, so the CI is approximately valid in this regime.
Note
Coverage degrades sharply when Theorem 2’s conditions are violated. Repeating the experiment with random-walk (nonstationary) factors and a large post-window (\(T_1 = 10\) vs \(T_0 = 30\)) drops coverage to \(\approx 0.52\): weight-estimation error multiplied by an unbounded, growing post-period signal swamps the variance the CI accounts for. The practical lesson mirrors the paper — keep \(T_1\) short relative to \(T_0\), which is also why the empirical study below fixes a short post-window.
Empirical Application: Proposition 99 (California)#
We reproduce the paper’s case study (Section 6) on the 50-state cigarette-sales panel (1970-2015): California (focal) under three interventions — control (38 status-quo states), a cigarette-tax increase (Alaska, Hawaii, Maryland, Michigan, New Jersey, New York, Washington), and an anti-tobacco program (Arizona, Massachusetts, Oregon, Florida). Following the paper, weights are fit on 1970-1988 (\(T_0 = 19\)) and the counterfactual is reported over the common 1999-2002 window (\(T_1 = 4\)).
import numpy as np
import pandas as pd
from mlsynth import SI
raw = pd.read_csv(
    "https://raw.githubusercontent.com/jehangiramjad/tslib/"
    "refs/heads/master/tests/testdata/prop99.csv"
)
raw = raw[(raw.Year >= 1970) & (raw.Year <= 2015)]
raw = raw[raw.SubMeasureDesc == "Cigarette Consumption (Pack Sales Per Capita)"]
d = raw[["LocationDesc", "Year", "Data_Value"]].rename(
    columns={"LocationDesc": "state", "Year": "year", "Data_Value": "cigsale"})
d = d[d.state != "District of Columbia"]
d = d[(d.year <= 1988) | ((d.year >= 1999) & (d.year <= 2002))] # fit + report
tax = ["Alaska", "Hawaii", "Maryland", "Michigan", "New Jersey", "New York", "Washington"]
program = ["Arizona", "Massachusetts", "Oregon", "Florida"] # California is a program state
treated = set(tax) | set(program) | {"California"}
d["control"] = (~d.state.isin(treated)).astype(int)
d["taxes"] = d.state.isin(tax).astype(int)
d["program"] = d.state.isin([s for s in program if s != "California"] + ["California"]).astype(int)
d["Prop99"] = ((d.state == "California") & (d.year >= 1999)).astype(int)
res = SI({
    "df": d, "outcome": "cigsale", "unitid": "state", "time": "year",
    "treat": "Prop99", "inters": ["control", "taxes", "program"],
    "interval": "prediction", "display_graphs": True,
}).fit()
for iv, arm in res.arms.items():
    lo, hi = arm.cf_mean_ci
    print(f"{iv:>8}: k={arm.selected_rank} cf={arm.cf_mean:.1f} 95% PI=({lo:.1f}, {hi:.1f})")
The Gavish-Donoho threshold selects k = 5 for the control donor pool and k = 1 for both the tax and program pools — exactly the ranks the paper reports (Section 6.2.1) — and the bias-corrected estimator’s prediction interval matches the published numbers:
| Intervention | k | Counterfactual | 95% PI |
|---|---|---|---|
| Status quo (control) | 5 | 75.8 | (70.9, 80.6) |
| Tax increase | 1 | 57.5 | (48.0, 67.1) |
| Anti-tobacco program | 1 | 59.1 | (49.3, 68.9) |
The reading mirrors the paper: California’s observed 1999-2002 sales (~50 packs) sit below all three counterfactuals, and the control counterfactual (75.8) is far higher than the tax (57.5) or program (59.1) ones — i.e. relative to having done nothing, Prop 99 cut sales sharply, while relative to a tax or a program the additional effect is modest. The tax and program counterfactuals overlap heavily, consistent with the paper’s conclusion that the two policy levers deliver similar trajectories.
Core API#
Synthetic Interventions (SI) estimator.
Implements:
Agarwal, A., Shah, D., & Shen, D. (2026). “Synthetic Interventions: Extending Synthetic Controls to Multiple Treatments.” Operations Research 74(2):840-859.
SI extends synthetic control to multiple interventions. For a focal target
unit, it estimates the counterfactual outcome the unit would have realised
under each alternative intervention d by:
regressing the target’s pre-treatment (control) outcomes onto the pre-treatment outcomes of the units that actually received d (the donor pool I(d)), then
applying those weights to the donor pool’s post-treatment outcomes under d to predict the target’s counterfactual under d.
The default variant is bias-corrected SI-PCR (Section 4.3): the donor
pre-matrix is denoised by rank-k HSVT, weights are fit on a rank-complete
donor subset, and an asymptotic-normality confidence interval is reported.
- class mlsynth.estimators.si.SI(config: SIConfig | dict)#
Bases: object
Synthetic Interventions (SI) estimator.
- Parameters:
config (SIConfig or dict) – Configuration object. See mlsynth.config_models.SIConfig.
- Returns:
SIResults – Per-intervention donor weights, counterfactuals, ATTs, and (with the default bias-corrected estimator) asymptotic-normality confidence intervals.
Configuration#
- class mlsynth.config_models.SIConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', inters: ~typing.Annotated[~typing.List[str], ~annotated_types.MinLen(min_length=1)], rank_method: ~typing.Literal['donoho', 'usvt', 'cumvar', 'fixed'] = 'donoho', rank: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=1)] = None, cumvar_threshold: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Le(le=1.0)] = 0.95, bias_correct: bool = True, variance: ~typing.Literal['double', 'units', 'time_iv'] = 'double', interval: ~typing.Literal['confidence', 'prediction'] = 'confidence', alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05)#
Configuration for the Synthetic Interventions (SI) estimator.
Implements:
Agarwal, A., Shah, D., & Shen, D. (2026). “Synthetic Interventions: Extending Synthetic Controls to Multiple Treatments.” Operations Research 74(2):840-859.
SI estimates the focal target unit’s counterfactual outcome under each alternative intervention in inters via SI-PCR: regress the target’s pre-treatment control outcomes onto the rank-k denoised donor pool, then apply the weights to the donor pool’s post-intervention outcomes.
- Parameters:
inters (list of str) – Binary indicator columns naming the alternative interventions; for each, the units flagged 1 form that intervention’s donor pool.
rank_method ({“donoho”, “usvt”, “cumvar”, “fixed”}) – Spectral-rank rule for the donor pre-matrix. "donoho" (default) reproduces the paper’s exact Gavish-Donoho optimal hard threshold (evaluated at ratio = T0 / Nd); "usvt" is the same threshold at the canonical min/max aspect ratio; "cumvar" keeps enough components for cumvar_threshold of the spectral energy; "fixed" uses rank.
rank (int or None) – Explicit spectral rank k for rank_method="fixed".
cumvar_threshold (float) – Cumulative-energy target in (0, 1] for rank_method="cumvar".
bias_correct (bool) – Use the bias-corrected SI-PCR estimator (default True), which fits weights on a rank-complete donor subset and enables asymptotic-normality intervals (Section 4.3). False gives plain SI-PCR (eq. 10), point estimate only.
variance ({“double”, “units”, “time_iv”}) – Noise-variance estimator behind the interval. "double" (default) matches the paper’s code (a d.o.f.-weighted combination); "units" is the main-text eq. 14; "time_iv" uses the donor post-period residual.
interval ({“confidence”, “prediction”}) – Interval type. "confidence" (default) is the eq.-13 CI for the counterfactual mean; "prediction" is the wider interval the paper’s case study uses for coverage validation.
alpha (float) – Two-sided significance level for the intervals.
display_graphs (bool) – Show the observed-vs-counterfactual plot after fitting.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Result Containers#
SI.fit() returns an
SIResults, whose arms maps each
intervention to an SIArm (donor
weights, the counterfactual, the ATT, the selected rank, and – under bias
correction – \(\hat\sigma\), the weight norm, and the confidence
intervals), alongside the prepared
SIInputs.
Frozen dataclass containers for the Synthetic Interventions (SI) pipeline.
Implements the containers used throughout SI, which itself implements:
Agarwal, A., Shah, D., & Shen, D. (2026). “Synthetic Interventions: Extending Synthetic Controls to Multiple Treatments.” Operations Research 74(2):840-859.
All containers are frozen (immutable) per the repository convention.
- class mlsynth.utils.si_helpers.structures.SIArm(name: str, donor_names: List[str], weights: Dict[str, float], selected_rank: int, omega_names: List[str], counterfactual: ndarray, gap: ndarray, att: float, cf_mean: float, pre_rmse: float, bias_corrected: bool, sigma_hat: float | None = None, weight_norm: float | None = None, cf_mean_ci: Tuple[float, float] | None = None, att_ci: Tuple[float, float] | None = None)#
Bases: object
SI estimate of the focal unit’s outcome under one intervention.
- Parameters:
name (str) – Intervention label.
donor_names (list of str) – Full donor pool for this intervention.
weights (dict of {str: float}) – Donor weights with non-trivial magnitude. For the bias-corrected estimator these are supported on the active subset omega_names.
selected_rank (int) – Spectral rank k used by SI-PCR.
omega_names (list of str) – Active (rank-complete) donor subset for the bias-corrected estimator (Agarwal-Shah-Shen Section 4.3); equals donor_names for the plain SI-PCR estimator.
counterfactual (np.ndarray) – Focal unit’s counterfactual outcome under this intervention over the full timeline, shape (T,).
gap (np.ndarray) – Observed minus counterfactual, shape (T,).
att (float) – Average post-treatment effect (mean of the post-period gap).
cf_mean (float) – Average post-treatment counterfactual outcome theta_i(d).
pre_rmse (float) – Pre-treatment root-mean-square fit error.
bias_corrected (bool) – Whether the bias-corrected estimator (and hence inference) was used.
sigma_hat (float, optional) – Estimated noise standard deviation (eq. 14); None without bias correction.
weight_norm (float, optional) – ||w(i, d, Omega)||_2, the bias-corrected weight norm entering the CI; None without bias correction.
cf_mean_ci (tuple of (float, float), optional) – Asymptotic-normality confidence interval for cf_mean (eq. 13).
att_ci (tuple of (float, float), optional) – Confidence interval for att (the CI half-width is shared with cf_mean_ci since the observed outcome is fixed).
- counterfactual: ndarray#
- gap: ndarray#
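To make the relationship between sigma_hat, weight_norm, cf_mean_ci and att_ci concrete, here is a minimal sketch. It assumes the eq.-13 interval has the usual asymptotic-normality form, cf_mean ± z · sigma_hat · weight_norm / sqrt(T1). That form, the helper name normal_ci, and the toy numbers are assumptions made for illustration; this is not the library's code.

```python
from math import sqrt
from statistics import NormalDist


def normal_ci(cf_mean, sigma_hat, weight_norm, T1, alpha=0.05):
    """Two-sided interval for the mean post-period counterfactual (assumed form)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard-normal quantile
    half_width = z * sigma_hat * weight_norm / sqrt(T1)
    return cf_mean - half_width, cf_mean + half_width


# Toy numbers (made up): observed post-period mean and fitted quantities.
observed_post_mean = 90.0
cf_mean = 100.0
cf_lo, cf_hi = normal_ci(cf_mean, sigma_hat=2.0, weight_norm=0.5, T1=4)

# ATT = observed mean - counterfactual mean. The observed mean is treated as
# fixed, so the ATT interval is the counterfactual interval mirrored around it.
att = observed_post_mean - cf_mean
att_ci = (observed_post_mean - cf_hi, observed_post_mean - cf_lo)
```

Both intervals have the same width because only the counterfactual mean is treated as random, which is the sense in which the half-width is shared.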
- class mlsynth.utils.si_helpers.structures.SIDonorPool(name: str, matrix: ndarray, names: List[str])#
Bases: object
Donor pool for one alternative intervention.
- Parameters:
name (str) – Intervention label (a column in inters).
matrix (np.ndarray) – Donor outcomes over the full timeline, shape (T, Nd); rows are periods, columns are the units that received this intervention.
names (list of str) – Donor unit labels (length Nd).
- matrix: ndarray#
- class mlsynth.utils.si_helpers.structures.SIInputs(treated_unit_name: Any, y_target: ndarray, T0: int, time_labels: ndarray, pools: Dict[str, SIDonorPool])#
Bases: object
Preprocessed panel data for SI.
The focal (target) unit is the one flagged by the treat column; SI estimates its counterfactual outcome under each alternative intervention.
- Parameters:
treated_unit_name (Any) – Label of the focal target unit.
y_target (np.ndarray) – Focal unit’s observed outcome over the full timeline, shape (T,).
T0 (int) – Number of (common) pre-treatment periods.
time_labels (np.ndarray) – Time-period labels (length T).
pools (dict of {str: SIDonorPool}) – Donor pool per alternative intervention.
- property Y_post: ndarray#
Focal unit’s post-treatment outcomes, shape (T1,), where T1 = T - T0.
- property Y_pre: ndarray#
Focal unit’s pre-treatment outcomes, shape (T0,).
- pools: Dict[str, SIDonorPool]#
- time_labels: ndarray#
- y_target: ndarray#
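The containers are described as frozen dataclasses with derived properties. The stand-in below sketches what that convention implies in practice. ToyInputs is a hypothetical name, not the library class, and the assumption that Y_pre and Y_post are plain slices of y_target at T0 is mine.

```python
from dataclasses import dataclass, FrozenInstanceError

import numpy as np


@dataclass(frozen=True)
class ToyInputs:
    """Hypothetical stand-in for an SIInputs-like container."""
    y_target: np.ndarray  # observed outcome, shape (T,)
    T0: int               # number of pre-treatment periods

    @property
    def Y_pre(self) -> np.ndarray:
        return self.y_target[: self.T0]

    @property
    def Y_post(self) -> np.ndarray:
        return self.y_target[self.T0:]


toy = ToyInputs(y_target=np.arange(10.0), T0=7)
pre_len, post_len = toy.Y_pre.shape[0], toy.Y_post.shape[0]

try:
    toy.T0 = 3  # reassigning a field is rejected on a frozen dataclass
    was_frozen = False
except FrozenInstanceError:
    was_frozen = True
```

Note that freezing only blocks attribute reassignment; the contents of a NumPy array held in a field can still be modified in place.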
- class mlsynth.utils.si_helpers.structures.SIResults(inputs: SIInputs, arms: Dict[str, SIArm], alpha: float, bias_corrected: bool)#
Bases: object
User-facing output of the SI estimator.
- Parameters:
inputs (SIInputs) – Preprocessed panel data for the focal unit and donor pools.
arms (dict of {str: SIArm}) – One SIArm per alternative intervention.
alpha (float) – Two-sided significance level used for the confidence intervals.
bias_corrected (bool) – Whether the bias-corrected SI-PCR estimator (with CIs) was used.
- property observed: ndarray#
Focal unit’s observed outcome over the full timeline, shape (T,).
Helper Modules#
Input preparation for the Synthetic Interventions (SI) estimator.
Builds an SIInputs from a long
panel: the focal target unit (flagged by treat) plus one donor pool per
alternative intervention column in inters.
- mlsynth.utils.si_helpers.setup.prepare_si_inputs(df: DataFrame, outcome: str, unitid: str, time: str, treat: str, inters: List[str]) -> SIInputs#
Prepare focal-unit data and per-intervention donor pools.
- Parameters:
df (pandas.DataFrame) – Balanced long panel.
outcome, unitid, time, treat (str) – Column names. treat flags the focal target unit’s treatment timing (defining the common pre-period T0).
inters (list of str) – Binary indicator columns; for each, the units flagged 1 form that intervention’s donor pool.
- Returns:
SIInputs
SI-PCR estimation math (Agarwal, Shah & Shen 2026).
Two estimators are implemented on top of the shared HSVT primitives in
mlsynth.utils.clustersc_helpers.pcr.hsvt:
si_pcr_weights() – the plain SI-PCR weights (paper eq. 10): regress the target’s pre-period control outcomes onto the rank-k denoised donor pool.
bias_corrected_fit() – the bias-corrected SI-PCR estimator (Section 4.3): restrict to a rank-complete donor subset Omega (|Omega| = k), fit weights by the pseudo-inverse (eq. 12), and return the noise-variance estimate (eq. 14) and weight norm needed for the asymptotic-normality CI (eq. 13).
All routines take the pre-treatment donor matrix (under control) and the target’s pre-treatment outcomes; the post-period prediction (applying the weights to donor outcomes under the intervention) lives in the orchestrator.
- mlsynth.utils.si_helpers.estimation.bias_corrected_fit(donor_pre: ndarray, target_pre: ndarray, rank: int) -> Tuple[List[int], ndarray, float]#
Bias-corrected SI-PCR fit on a rank-complete subset (Section 4.3).
Restricts to a rank-complete donor subset Omega and fits weights by the pseudo-inverse of the denoised pre-matrix (eq. 12), then estimates the noise variance from the target’s residual against the rank-k donor subspace (eq. 14).
- Parameters:
donor_pre (np.ndarray) – Donor pre-treatment (control) outcomes, shape (T0, Nd).
target_pre (np.ndarray) – Target pre-treatment (control) outcomes, shape (T0,).
rank (int) – Spectral rank k (also |Omega|).
- Returns:
omega (list of int) – Indices of the active donor subset.
w_omega (np.ndarray) – Bias-corrected weights on omega, shape (|Omega|,).
sigma_hat (float) – Estimated noise standard deviation (square root of eq. 14).
- mlsynth.utils.si_helpers.estimation.donoho_rank(s: ndarray, ratio: float) -> int#
Gavish-Donoho (2014) optimal hard threshold, as applied by Agarwal-Shah-Shen.
The authors evaluate the \(\omega(\beta)\) approximation at ratio = m / n, the donor pre-matrix’s rows-over-columns (\(T_0 / N_d\)), rather than the canonical min/max aspect ratio. Reproduced here verbatim so SI matches the paper’s reported ranks; see rank_method="donoho".
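The rule can be sketched with the cubic approximation from Gavish and Donoho (2014), \(\omega(\beta) \approx 0.56\beta^3 - 0.95\beta^2 + 1.82\beta + 1.43\), where singular values above \(\omega(\beta)\) times the median singular value are kept. The function names, the floor of one component, and the toy singular values below are assumptions for illustration; the library routine may differ in its details.

```python
import numpy as np


def omega_approx(beta: float) -> float:
    """Cubic approximation to the Gavish-Donoho coefficient omega(beta)."""
    return 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43


def gd_rank_sketch(s: np.ndarray, ratio: float) -> int:
    """Count singular values above omega(ratio) * median(s)."""
    tau = omega_approx(ratio) * np.median(s)
    # Keeping at least one component is an assumption of this sketch.
    return max(int(np.sum(s > tau)), 1)


s = np.array([10.0, 8.0, 1.0, 0.9, 0.8, 0.7])  # toy singular values
rank = gd_rank_sketch(s, ratio=0.5)
```

Gavish and Donoho state the approximation for aspect ratios in (0, 1]. With ratio = T0 / Nd the argument exceeds 1 whenever T0 > Nd, which is where this rule departs from the canonical min/max version.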
- mlsynth.utils.si_helpers.estimation.resolve_rank(donor_pre: ndarray, rank_method: str, rank: int = None, cumvar_threshold: float = 0.95) -> int#
Resolve the spectral rank k for a donor pre-matrix. "donoho" (the SI default) reproduces Agarwal-Shah-Shen’s exact rank rule (donoho_rank() with ratio = T0 / Nd). The remaining modes delegate to mlsynth.utils.clustersc_helpers.pcr.hsvt.select_rank() so SI shares ClusterSC’s HSVT machinery ("usvt" is the same threshold evaluated at the canonical min/max aspect ratio, "cumvar"/"fixed" as in HSVT).
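For comparison with the threshold rules, a cumulative-energy rule in the spirit of "cumvar" might look like the sketch below. Measuring energy with squared singular values, the function name, and the toy spectrum are assumptions; the shared HSVT helper may define the rule differently.

```python
import numpy as np


def cumvar_rank_sketch(s: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest k whose top-k squared singular values reach `threshold` of the total."""
    energy = np.cumsum(s**2) / np.sum(s**2)  # s is assumed sorted in descending order
    k = int(np.searchsorted(energy, threshold)) + 1
    return min(k, len(s))


s = np.array([3.0, 1.0, 1.0])  # toy spectrum: energy shares 9/11, 10/11, 11/11
k_90 = cumvar_rank_sketch(s, threshold=0.90)
k_95 = cumvar_rank_sketch(s, threshold=0.95)
```

A higher threshold can only keep more components, so k_95 is never smaller than k_90.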
- mlsynth.utils.si_helpers.estimation.select_omega(donor_pre: ndarray, rank: int) -> List[int]#
Pick a rank-complete donor subset Omega (|Omega| = k).
Selects rank columns of the rank-k denoised donor matrix that are linearly independent (full column rank k) via column-pivoted QR; the pivots are the most independent columns, which is the structured model selection the bias-corrected estimator relies on (Section 4.3).
- Parameters:
donor_pre (np.ndarray) – Donor pre-treatment outcomes, shape (T0, Nd).
rank (int) – Number of donors to retain (k).
- Returns:
list of int – Column indices of the selected donors (length min(rank, Nd)).
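The idea behind the pivoting step can be sketched in plain NumPy. Instead of a library column-pivoted QR (for example scipy.linalg.qr with pivoting=True), the code below uses an explicit greedy deflation loop, which selects pivots by the same largest-remaining-norm criterion. The function name, the tolerance, and the random test matrix are assumptions for illustration, not the library implementation.

```python
import numpy as np


def pick_independent_columns(donor_pre: np.ndarray, rank: int) -> list:
    """Greedy column pivoting on the rank-k truncated SVD of donor_pre."""
    U, s, Vt = np.linalg.svd(donor_pre, full_matrices=False)
    k = min(rank, donor_pre.shape[1])
    resid = (U[:, :k] * s[:k]) @ Vt[:k]  # rank-k denoised matrix
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(resid, axis=0)
        j = int(np.argmax(norms))  # column with the largest unexplained norm
        if norms[j] <= 1e-12 * s[0]:
            break  # no independent direction left
        chosen.append(j)
        q = resid[:, j] / norms[j]
        resid = resid - np.outer(q, q @ resid)  # remove that direction from all columns
    return chosen


rng = np.random.default_rng(0)
donors = rng.normal(size=(10, 3)) @ rng.normal(size=(3, 6))  # exact rank 3
omega = pick_independent_columns(donors, rank=3)
```

Each pass removes the chosen column's direction from every remaining column, so a column that is nearly a combination of earlier picks has a small residual norm and is not selected.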
- mlsynth.utils.si_helpers.estimation.si_pcr_weights(donor_pre: ndarray, target_pre: ndarray, rank: int) -> ndarray#
Plain SI-PCR donor weights over the full pool (paper eq. 10).
w_hat = (sum_{l<=k} (1/s_l) v_l u_l^T) y_pre,i, i.e. regress the target’s pre-period outcomes onto the top-rank principal subspace of the donor pre-matrix.
- Parameters:
donor_pre (np.ndarray) – Donor pre-treatment (control) outcomes, shape (T0, Nd).
target_pre (np.ndarray) – Target pre-treatment (control) outcomes, shape (T0,).
rank (int) – Spectral truncation rank k.
- Returns:
np.ndarray – Donor weight vector, shape (Nd,).
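The weight formula above translates almost directly into NumPy. The sketch below is an illustrative reimplementation under assumed names (pcr_weights_sketch and the toy arrays are hypothetical), not the library routine. It uses a noiseless, full-column-rank example so the recovered weights can be compared with known ones.

```python
import numpy as np


def pcr_weights_sketch(donor_pre, target_pre, rank):
    """w = V_k diag(1/s_k) U_k^T y, from the thin SVD of donor_pre."""
    U, s, Vt = np.linalg.svd(donor_pre, full_matrices=False)
    coords = (U[:, :rank].T @ target_pre) / s[:rank]  # (1/s_l) u_l^T y for l <= k
    return Vt[:rank].T @ coords                       # sum over l of v_l * coords_l


rng = np.random.default_rng(1)
donor_pre = rng.normal(size=(8, 3))   # T0 = 8, Nd = 3, full column rank
w_true = np.array([0.5, 0.3, 0.2])
target_pre = donor_pre @ w_true       # noiseless target inside the donor span
w_hat = pcr_weights_sketch(donor_pre, target_pre, rank=3)

donor_post = rng.normal(size=(4, 3))       # donors observed under the intervention
counterfactual_post = donor_post @ w_hat   # post-period prediction step
```

With rank below the number of donors the same formula returns the minimum-norm weights on the truncated subspace, which is what gives PCR its denoising effect.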
- mlsynth.utils.si_helpers.estimation.variance_estimation(U_k: ndarray, V_k: ndarray, target_pre: ndarray, donor_post: ndarray) -> Tuple[float, float, float]#
Noise-standard-deviation estimates (Agarwal-Shah-Shen inference.py).
Returns (double, units, time_iv):
units – the main-text estimator (eq. 14): residual of the target’s pre-period against the donor left-singular subspace, over T0 - k.
time_iv – the donor post-period residual against the right-singular subspace, over T1 (Nd - k).
double – the degrees-of-freedom-weighted combination of the two (the estimator the paper’s code uses for its intervals).
- Parameters:
U_k (np.ndarray) – Left singular vectors of the rank-k donor pre-matrix, shape (T0, k).
V_k (np.ndarray) – Right singular vectors, shape (Nd, k).
target_pre (np.ndarray) – Target pre-period outcomes, shape (T0,).
donor_post (np.ndarray) – Donor post-period outcomes, shape (T1, Nd).
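A minimal sketch of the three estimates follows, reading each description literally as a residual sum of squares divided by its degrees of freedom. Treating "double" as the pooled variance (both sums of squares over both degrees of freedom) is an assumption, as are the function name and the toy inputs; the authors' code may combine the terms differently.

```python
import numpy as np


def noise_sd_sketch(U_k, V_k, target_pre, donor_post):
    """Return (double, units, time_iv) noise-s.d. estimates (assumed forms)."""
    T0, k = U_k.shape
    T1, Nd = donor_post.shape
    # units: target pre-period residual off the left-singular subspace.
    r_units = target_pre - U_k @ (U_k.T @ target_pre)
    ss_units, df_units = float(r_units @ r_units), T0 - k
    # time_iv: donor post-period residual off the right-singular subspace.
    r_time = donor_post - (donor_post @ V_k) @ V_k.T
    ss_time, df_time = float(np.sum(r_time**2)), T1 * (Nd - k)
    # double: pooled variance, i.e. a d.o.f.-weighted average (an assumption).
    var_double = (ss_units + ss_time) / (df_units + df_time)
    return (np.sqrt(var_double), np.sqrt(ss_units / df_units),
            np.sqrt(ss_time / df_time))


U_k = np.eye(5)[:, :1]  # T0 = 5, k = 1
V_k = np.eye(3)[:, :1]  # Nd = 3
target_pre = np.array([3.0, 4.0, 0.0, 0.0, 0.0])
donor_post = np.array([[5.0, 2.0, 0.0], [7.0, 0.0, 0.0]])
double, units, time_iv = noise_sd_sketch(U_k, V_k, target_pre, donor_post)
```

The sketch needs T0 > k and Nd > k so that both denominators are positive; a pooled estimate always lies between the two separate ones.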
Top-level SI solve: fit every intervention arm and assemble results.
- mlsynth.utils.si_helpers.orchestration.solve_si(inputs: SIInputs, rank_method: str = 'donoho', rank: int | None = None, cumvar_threshold: float = 0.95, bias_correct: bool = True, alpha: float = 0.05, variance: str = 'double', interval: str = 'confidence') -> SIResults#
Fit the SI estimator for every alternative intervention.
- Parameters:
inputs (SIInputs) – Prepared focal-unit data and donor pools.
rank_method ({“donoho”, “usvt”, “cumvar”, “fixed”}) – Spectral-rank rule. "donoho" (default) reproduces the paper’s exact Gavish-Donoho rank (ratio = T0 / Nd).
rank (int, optional) – Explicit rank for rank_method="fixed".
cumvar_threshold (float) – Cumulative-energy target for rank_method="cumvar".
bias_correct (bool) – Use the bias-corrected SI-PCR estimator (enables intervals).
alpha (float) – Two-sided significance level for the intervals.
variance ({“double”, “units”, “time_iv”}) – Noise-variance estimator behind the interval. "double" (default) matches the paper’s code; "units" is the main-text eq. 14.
interval ({“confidence”, “prediction”}) – Interval type. "confidence" is the eq.-13 CI for the counterfactual mean; "prediction" is the wider prediction interval the case study uses for coverage validation.
- Returns:
SIResults
Data-generating processes for the SI simulation studies.
Self-contained reimplementations of the two low-rank DGPs in Agarwal, Shah & Shen (2026), so the consistency (Section 5.1) and inference (Section 5.2) studies can be replicated without the authors’ external code:
generate_low_rank_matrix() – the inference-study DGP: post-period time-intervention factors are projected onto the pre-period factor span, so the target’s signal is recoverable from the donor pool (used to measure CI coverage).
generate_low_rank_matrices() – the consistency-study DGP: returns an in-span (A8 holds) and an out-of-span (A8 fails) post-period, to show SI-PCR is consistent only when the rank condition holds.
In both, the target unit’s loading is forced into the convex/linear span of the donor loadings (Assumption 4).
- mlsynth.utils.si_helpers.simulation.generate_low_rank_matrices(N: int, T0: int, T1: int, r: int, r_pre: int, rng: Generator | None = None) -> Tuple[ndarray, ndarray]#
Consistency-study DGP (paper Section 5.1).
Returns two expected-outcome matrices that share the pre-period but differ post-period: A_in projects the post-period factors onto the pre-period span (rank condition holds, A8 holds) and A_out onto its orthogonal complement (rank condition fails, A8 fails).
- Returns:
(A_in, A_out) (tuple of np.ndarray) – Each shape (T0 + T1, N).
- mlsynth.utils.si_helpers.simulation.generate_low_rank_matrix(N: int, T0: int, T1: int, r: int, rng: Generator | None = None) -> ndarray#
Inference-study DGP (paper Section 5.2).
Builds an (T0 + T1, N) expected-outcome matrix whose post-period time-intervention factors lie in the pre-period factor span (so the target is recoverable). The last column is the target unit.
- Returns:
np.ndarray – Expected outcomes A, shape (T0 + T1, N).
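The in-span versus out-of-span construction can be illustrated with a small toy generator. This is a sketch under stated assumptions, not the authors' DGP or the library code: the Gaussian draws, the Dirichlet mixing weights for the target loading, and the name toy_low_rank_pair are all hypothetical choices.

```python
import numpy as np


def toy_low_rank_pair(N, T0, T1, r, r_pre, rng):
    """Return (A_in, A_out), each (T0 + T1, N); the last column is the target."""
    # Unit loadings: the target is a convex combination of the N - 1 donors.
    donors = rng.normal(size=(N - 1, r))
    mix = rng.dirichlet(np.ones(N - 1))
    loadings = np.vstack([donors, mix @ donors])          # shape (N, r)
    # Pre-period factors confined to an r_pre-dimensional subspace of R^r.
    basis, _ = np.linalg.qr(rng.normal(size=(r, r_pre)))  # orthonormal, (r, r_pre)
    proj = basis @ basis.T                                # projector onto that subspace
    F_pre = rng.normal(size=(T0, r_pre)) @ basis.T        # shape (T0, r)
    F_post = rng.normal(size=(T1, r))
    F_in = F_post @ proj                  # post factors inside the pre-period span
    F_out = F_post @ (np.eye(r) - proj)   # post factors in the orthogonal complement
    A_in = np.vstack([F_pre, F_in]) @ loadings.T
    A_out = np.vstack([F_pre, F_out]) @ loadings.T
    return A_in, A_out


rng = np.random.default_rng(0)
A_in, A_out = toy_low_rank_pair(N=8, T0=10, T1=5, r=4, r_pre=2, rng=rng)
rank_pre = np.linalg.matrix_rank(A_in[:10], tol=1e-8)
rank_in = np.linalg.matrix_rank(A_in, tol=1e-8)
rank_out = np.linalg.matrix_rank(A_out, tol=1e-8)
```

Comparing rank_pre with rank_in and rank_out shows the rank condition at work: appending in-span post-periods leaves the rank unchanged, while out-of-span post-periods add directions the pre-period never saw.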
Plotting for SI: the focal unit’s observed series vs its counterfactual under each alternative intervention.
References#
Agarwal, A., Shah, D., & Shen, D. (2026). “Synthetic Interventions: Extending Synthetic Controls to Multiple Treatments.” Operations Research 74(2):840-859. See [SI].
Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies.” Journal of the American Statistical Association 105(490):493-505.
Agarwal, A., Shah, D., Shen, D., & Song, D. (2021). “On Robustness of Principal Component Regression.” Journal of the American Statistical Association 116(536):1731-1745.
Gavish, M., & Donoho, D. L. (2014). “The Optimal Hard Threshold for Singular Values is \(4/\sqrt{3}\).” IEEE Transactions on Information Theory 60(8):5040-5053.