Imperfect Synthetic Controls (ISCM)

Overview#

ISCM (Powell, D. (2026). “Imperfect Synthetic Controls,” Journal of Applied Econometrics 41(3):253-264) confronts the synthetic control method’s least defensible assumption: that a perfect synthetic control exists. The classic SCM requires the treated unit to lie inside the convex hull of the donors and its pre-treatment path to be matched exactly. With transitory shocks – noise with non-vanishing variance – an exact fit is impossible even in expectation, and the convex-hull condition may simply fail (the treated unit can be more extreme than any weighted average of donors).

ISCM relaxes this with two ideas:

Synthetic controls for every unit. Rather than fitting one synthetic control for the treated unit, ISCM builds one for all units. The treatment effect is then identified even when the treated unit is outside the convex hull – because it can still appear as a donor for control units, and those units’ post-treatment residuals carry information about the effect (paper eq. 6).
Moment conditions robust to transitory shocks. ISCM relies on conditions of the form \(\sum_{j} w_i^j \mathbb{E}[y_{jt}] = \mathbb{E}[y_{it}]\) that need only hold in expectation, producing asymptotically unbiased estimates as the pre-period grows even when no unit fits perfectly in sample.

It adds a data-driven fit metric \(a_i\) that asymptotically excludes units lacking a valid synthetic control – removing the researcher’s eyeball “is the pre-fit good enough” decision – and an Ibragimov-Muller inference procedure that stays valid with a tiny donor pool.

The identifying intuition#

Suppose the treated unit (unit \(1\)) is too extreme to be matched by any convex combination of controls. A control unit \(i\) whose synthetic control does place weight \(w_i^1 > 0\) on unit \(1\) will, after treatment, have its synthetic counterfactual contaminated by the effect: its residual picks up \(-w_i^1 \tau\). Since unit \(i\) is itself untreated, regressing its residual on its “treatment exposure” \(-w_i^1\) recovers \(\tau\). ISCM pools this signal across all such units.
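A minimal, noise-free illustration of this argument, using made-up numbers. Unit 1 has latent path \(5 + t\), a second untreated donor has path \(t\), and control unit \(i\) equals exactly \(0.4\) times unit 1 plus \(0.6\) times that donor in the pre-period. The plain-Python sketch only checks the algebra. It is not the mlsynth implementation.

```python
# Noise-free toy check of the ISCM identifying intuition (all numbers are illustrative).
TAU = 2.0            # hypothetical effect on the treated unit (unit 1)
W_TREATED = 0.4      # weight control unit i places on unit 1
W_OTHER = 0.6        # weight on an untreated donor
T0, T = 5, 10        # pre-period length and total number of periods


def y_treated(t: int) -> float:
    """Unit 1: latent path 5 + t, plus the effect after T0."""
    return 5.0 + t + (TAU if t > T0 else 0.0)


def y_other(t: int) -> float:
    """Untreated donor: latent path t."""
    return float(t)


def y_control(t: int) -> float:
    """Control unit i: 0.4 * (5 + t) + 0.6 * t = 2 + t, never treated."""
    return 2.0 + t


def residual(t: int) -> float:
    """R_it = y_it - sum_j w_i^j y_jt."""
    return y_control(t) - (W_TREATED * y_treated(t) + W_OTHER * y_other(t))


def exposure(t: int) -> float:
    """E_it = d_it - sum_j w_i^j d_jt; only unit 1 is ever treated."""
    d_treated = 1.0 if t > T0 else 0.0
    return 0.0 - W_TREATED * d_treated


pre_residuals = [residual(t) for t in range(1, T0 + 1)]   # all (numerically) zero
post = range(T0 + 1, T + 1)
tau_hat = sum(exposure(t) * residual(t) for t in post) / sum(exposure(t) ** 2 for t in post)
print(pre_residuals)
print(tau_hat)   # equals TAU up to floating-point rounding
```

After \(T_0\), every post-period residual is \(-0.4\,\tau\) and every exposure is \(-0.4\), so the least-squares slope returns \(\tau\) exactly.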

When to use ISCM#

The treated unit’s pre-period path is not well inside the donor convex hull (it trends above/below all donors), so traditional SCM produces a visibly poor fit and a biased counterfactual.
Outcomes are noisy (transitory shocks), so an exact pre-period match is implausible and would overfit.
The donor pool is small, so permutation inference cannot reach conventional significance.
You have a long pre-period (the method’s guarantees are asymptotic in \(T_0\)).

Notation#

Let the units be \(\mathcal{N} \coloneqq \{1, \dots, N\}\), with the treated unit indexed \(1\); because ISCM builds a synthetic control for every unit, a running unit index \(i \in \mathcal{N}\) denotes the unit whose synthetic control is being formed, and \(j, k \neq i\) index its donors. Time runs over \(t \in \mathcal{T} \coloneqq \{1, \dots, T\}\), 1-indexed; the intervention takes effect after period \(T_0\), splitting \(\mathcal{T}\) into the pre-period \(\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}\) (of length \(T_0\)) and the post-period \(\mathcal{T}_2 \coloneqq \{t \in \mathcal{T} : t > T_0\}\).

The scalar outcome is \(y_{jt}\) (unit, then time), with treatment dummy \(d_{jt}\) and transitory shock \(\epsilon_{jt}\); in Abadie’s potential-outcome notation \(y_{jt}^N\) is the outcome without the intervention and \(y_{jt}^I\) under it. Unit \(i\)’s donor weights are \(\mathbf{w}_i \in \mathbb{R}^{N-1}\) (entries \(w_i^j\), one per donor \(j \neq i\)), constrained to the unit simplex \(\Delta^{N-1} \coloneqq \{\mathbf{w} \in \mathbb{R}_{\ge 0}^{N-1} : \|\mathbf{w}\|_1 = 1\}\); the optimiser is \(\widehat{\mathbf{w}}_i\). The per-period, per-unit treatment effect is \(\tau_{it}\); the pooled ATT is \(\widehat{\tau}\) and the per-unit estimate \(\widehat{\tau}_i\). The data-driven fit metric is \(a_i\) and the contributing set \(C\).

Mathematical Formulation#

Setup (paper Section 2)#

For \(N\) units over \(T\) periods with a latent-factor outcome

\[y_{it} = \tau_{it}\, d_{it} + L_{it} + \epsilon_{it}, \qquad L_{it} = \boldsymbol{\lambda}_t^\top \boldsymbol{\mu}_i,\]

ISCM builds, for every unit \(i\), a synthetic control from the others (paper eq. 5):

\[\widehat{\mathbf{w}}_i = \operatorname*{argmin}_{\mathbf{w}} \sum_{t \le T_0} \Bigl( y_{it} - \sum_{j \ne i} w^j y_{jt} \Bigr)^2, \quad w^j \ge 0,\ \sum_{j \ne i} w^j = 1.\]
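Each per-unit problem is a least-squares fit over the simplex. Below is a minimal sketch, assuming NumPy. It uses projected gradient descent with a sort-based Euclidean projection onto the simplex. That solver is a stand-in; mlsynth's own routine is not shown here and may differ. `all_units_weights_sketch` and `project_to_simplex` are hypothetical names. The returned matrix follows the row convention documented for `all_units_weights`: row \(i\) holds unit \(i\)'s weights, with a zero diagonal.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]                         # sort in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]         # largest k with a positive threshold gap
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def all_units_weights_sketch(Y: np.ndarray, T0: int, n_iter: int = 5000) -> np.ndarray:
    """(N, N) matrix whose row i holds unit i's simplex SC weights over the other units."""
    N = Y.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        donors = [j for j in range(N) if j != i]
        X = Y[donors, :T0].T                     # (T0, N - 1) donor pre-period outcomes
        y = Y[i, :T0]                            # (T0,) unit i's pre-period outcomes
        step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the gradient
        w = np.full(N - 1, 1.0 / (N - 1))        # start from equal weights
        for _ in range(n_iter):
            grad = 2.0 * X.T @ (X @ w - y)       # gradient of the pre-period squared error
            w = project_to_simplex(w - step * grad)
        W[i, donors] = w
    return W


rng = np.random.default_rng(0)
T0 = 12
base = rng.normal(size=(3, T0 + 3))
Y = np.vstack([base[0], 0.4 * base[0] + 0.6 * base[1], base[1], base[2]])  # unit 1 lies in the hull
W = all_units_weights_sketch(Y, T0)
print(np.round(W, 3))
```

Unit 1 is built as an exact convex combination of units 0 and 2, so its row should recover weights of 0.4 and 0.6. The other rows are simply the best simplex fits available to them.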

Fit metric (paper eq. 14)#

Each unit is weighted by how well its synthetic control satisfies the SCM moment conditions in the pre-period. With residual \(R_{it} = y_{it} - \sum_j \widehat{w}_i^{\,j} y_{jt}\) and moment vector \(M_i\) with entries \(M_i^k = \tfrac{1}{T_0}\sum_{t \le T_0} R_{it} y_{kt}\), one for each other unit \(k \neq i\),

\[a_i = \frac{\min_\ell M_\ell^\top M_\ell}{M_i^\top M_i} \in (0, 1],\]

so the best-fitting unit gets \(a_i = 1\). Units without a valid synthetic control get \(a_i \to 0\) as \(T_0\) grows, so they drop out of the estimate automatically.
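Below is a direct transcription of the fit metric. It assumes NumPy and takes the moment vector over the other units \(k \neq i\), as the `fit_metric` helper docstring describes. The function name is hypothetical. The tiny floor on \(M_i^\top M_i\) is also an illustrative assumption, added to avoid dividing by an exact zero.

```python
import numpy as np


def fit_metric_sketch(R: np.ndarray, Y: np.ndarray, T0: int) -> np.ndarray:
    """a_i = min_l M_l'M_l / M_i'M_i, with M_i^k = (1/T0) sum_{t<=T0} R_it y_kt for k != i."""
    N = Y.shape[0]
    sq_norms = np.empty(N)
    for i in range(N):
        others = [k for k in range(N) if k != i]
        M_i = Y[others, :T0] @ R[i, :T0] / T0      # moment vector, one entry per other unit
        sq_norms[i] = M_i @ M_i
    sq_norms = np.maximum(sq_norms, 1e-300)        # illustrative guard against an exact zero
    return sq_norms.min() / sq_norms               # best fit -> 1, poorer fits -> smaller


rng = np.random.default_rng(1)
Y = rng.normal(size=(5, 20))                       # placeholder outcomes
R = 0.1 * rng.normal(size=(5, 20))                 # placeholder SC residuals
R[2] *= 1000.0                                     # unit 2: a far worse synthetic control
a = fit_metric_sketch(R, Y, T0=15)
print(np.round(a, 4))
```

Because \(a_i\) is a ratio of squared moment norms, rescaling every residual by a common constant leaves it unchanged. Only relative fit matters.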

Treatment effect (paper eq. 8 / 15)#

With treatment exposure \(E_{it} = d_{it} - \sum_j \widehat{w}_i^{\,j} d_{jt}\), the ATT is the \(a_i\)-weighted least-squares slope, pooled over all units and the post-period:

\[\widehat{\tau} = \frac{\sum_i a_i \sum_{t > T_0} E_{it} R_{it}} {\sum_i a_i \sum_{t > T_0} E_{it}^2} = \sum_{i \in C} v_i\, \widehat{\tau}_i, \quad \widehat{\tau}_i = \frac{\sum_{t>T_0} E_{it} R_{it}}{\sum_{t>T_0} E_{it}^2},\]

where \(C\) is the contributing set (units with non-zero post-period exposure) and \(v_i = a_i \sum_{t > T_0} E_{it}^2 / \sum_{\ell \in C} a_\ell \sum_{t > T_0} E_{\ell t}^2\).
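The equality between the pooled slope and its per-unit decomposition can be checked numerically. The sketch below assumes NumPy, with residuals \(R\) and exposure \(E\) stored as (N, T) arrays whose post-period occupies columns `T0` onward. `weighted_att_sketch` is a hypothetical name modelled on the documented return values of `weighted_att`. All inputs are random placeholders.

```python
import numpy as np


def weighted_att_sketch(R: np.ndarray, E: np.ndarray, a: np.ndarray, T0: int):
    """Pooled a_i-weighted LS slope and its per-unit decomposition (eq. 8 / 15)."""
    num = (E[:, T0:] * R[:, T0:]).sum(axis=1)      # sum_{t > T0} E_it R_it, per unit
    den = (E[:, T0:] ** 2).sum(axis=1)             # sum_{t > T0} E_it^2, per unit
    att = (a * num).sum() / (a * den).sum()
    in_C = den > 0                                 # contributing set C: non-zero exposure
    unit_att = np.full(R.shape[0], np.nan)
    unit_att[in_C] = num[in_C] / den[in_C]
    v = np.zeros(R.shape[0])
    v[in_C] = a[in_C] * den[in_C] / (a[in_C] * den[in_C]).sum()
    return att, unit_att, v


rng = np.random.default_rng(2)
N, T, T0 = 6, 30, 20
R = rng.normal(size=(N, T))                        # placeholder residuals
E = np.zeros((N, T))
E[0, T0:] = 1.0                                    # treated unit
E[1, T0:] = -0.3                                   # controls that borrow the treated unit
E[2, T0:] = -0.7
a = rng.uniform(0.1, 1.0, size=N)                  # placeholder fit weights
att, unit_att, v = weighted_att_sketch(R, E, a, T0)
print(att, np.nansum(v * unit_att))                # identical up to rounding
```

Units 3 to 5 have zero post-period exposure. They fall outside \(C\), receive `NaN` estimates and zero weight, and do not affect the pooled slope.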

Inference (paper Section 5, eq. 16)#

ISCM produces one estimate \(\widehat{\tau}_i\) per contributing unit. The Ibragimov-Muller approach forms a t-statistic from their weighted spread and calibrates the p-value with a sign-flip (Rademacher) randomization test on the weighted deviations \(v_i(\widehat{\tau}_i - \tau_0)\). This is conservative but valid with very few units – though note the achievable p-value floor is about \(2/2^{|C|}\), so a handful of contributing units cannot reach conventional thresholds (exactly the small-donor-pool limitation Powell highlights).
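The sketch below illustrates the sign-flip step and assumes NumPy. For simplicity it uses \(|\sum_{i \in C} v_i(\widehat{\tau}_i - \tau_0)|\) as the test statistic, not the studentized Ibragimov-Muller t-statistic of paper eq. 16. It therefore shows the randomization logic and the floor, not mlsynth's exact output. The function name and the Monte Carlo p-value convention \((1 + \text{hits})/(1 + B)\) are also assumptions.

```python
import numpy as np


def sign_flip_pvalue(unit_att, v, null_value=0.0, n_draws=10000, random_state=0):
    """Two-sided Rademacher sign-flip p-value on x_i = v_i (tau_hat_i - tau_0), i in C."""
    unit_att = np.asarray(unit_att, dtype=float)
    v = np.asarray(v, dtype=float)
    in_C = ~np.isnan(unit_att)                        # contributing set
    x = v[in_C] * (unit_att[in_C] - null_value)
    observed = abs(x.sum())
    rng = np.random.default_rng(random_state)
    signs = rng.choice([-1.0, 1.0], size=(n_draws, x.size))
    flipped = np.abs(signs @ x)                       # statistic under each random sign pattern
    hits = np.sum(flipped >= observed - 1e-12)        # small tolerance for rounding
    return (1 + hits) / (1 + n_draws)


# Five hypothetical units whose deviations all share one sign: the most extreme case.
p = sign_flip_pvalue([0.5] * 5, [0.2] * 5)
print(p)   # close to the exact floor 2 / 2**5 = 0.0625
```

Even in this most extreme configuration, the p-value cannot fall meaningfully below \(2/2^{5}\). Only the observed sign pattern and its mirror image attain the observed statistic.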

Scope of this implementation#

This follows Powell’s applied procedure: synthetic controls for all units come from the traditional SCM (the documented starting point), the \(a_i\) weights are formed from the pre-period moment conditions, the ATT is the \(a_i\)-weighted least-squares effect, and inference is the sign-flip test. It does not run the optional continuously-updating GMM refinement that re-estimates the weights jointly to be orthogonal to transitory shocks (paper Section 3.2-3.4); the SCM-initialised weights are that procedure’s starting point and deliver the headline relaxed-convex-hull identification.

Assumptions (Powell 2026)#

ISCM trades the canonical SCM’s “perfect synthetic control” requirement for a substantially weaker set of moment conditions on the transitory shocks. The paper’s formal assumptions (Section 4.1):

A1 (Outcomes). \(y_{it} = \tau_{it}\, d_{it} + L_{it} + \epsilon_{it}\) with \(L_{it}\) a fixed (but possibly latent) systematic component, \(y_{it}\) continuous, and bounded products \(\lVert y_{it} \epsilon_{jt} \rVert < \infty\). The latent component nests interactive fixed effects (\(L_{it} = \boldsymbol{\lambda}_t^\top \boldsymbol{\mu}_i\)), additive two-way FE (\(L_{it} = \theta_i + \gamma_t\)), and other workhorse panel structures.

Remark. The outcome is the standard latent-factor panel: a treatment term plus a fixed systematic component plus transitory noise. Because \(L_{it}\) nests both interactive and additive fixed effects, ISCM does not commit to a particular panel structure – it only needs \(L_{it}\) to be reproducible by a synthetic control (A2), which is what the next assumption asserts.

A2 (Existence of Synthetic Controls). For every unit \(i\), either (a) there exist simplex weights \(\mathbf{w}_i\) such that \(L_{it} = \sum_{k \ne i} w_i^k L_{kt}\) for all \(t\), or (b) there exists some other unit \(j\) with \(w_j^i > 0\) such that \(L_{jt} = \sum_{k \ne j} w_j^k L_{kt}\) for all \(t\).

Remark. In words: every unit either has its own convex-hull synthetic control, or appears as a positive-weight donor in some other unit’s synthetic control. The whole point of ISCM is that (b) suffices for the treated unit – it can be too extreme to admit (a) and still be identifiable through (b).

A3 (Independence of transitory shocks). (a) \(\mathbb{E}[\epsilon_{it} \mid \mathbf{d}_i, L_i] = 0\) (mean-independence of the shocks from treatment and the latent component); (b) \(\mathbb{E}[\epsilon_{it} \epsilon_{jt} \mid \mathbf{d}_i, L_i, \mathbf{d}_j, L_j] = 0\) for all \(i \ne j\) (no contemporaneous cross-unit correlation in the shocks).

Remark. The moment conditions that drive the ISCM estimator rely on cross-sectional shock independence after conditioning on the latent component. A common contemporaneous shock across units violates (b) and reintroduces a bias term the estimator cannot remove.

A4 (Within-unit serial dependence allowed). \(\epsilon_{it}\) is a strongly mixing sequence in \(t\) of size \(-r/(r-1)\) for some \(r > 1\), with \(\mathbb{E}|\epsilon_{it}|^{r+\delta} < \infty\) for some \(\delta > 0\).

Remark. ISCM permits serially correlated shocks within a unit (a meaningful relaxation vs. canonical SC’s iid assumption) provided they mix at a uniform rate. Unit roots and other persistent (non-mixing) structures are ruled out.

A5 (Regularity of the fit weights). If A2(a) holds for unit \(i\), then \(a_i(\mathbf{w}) \xrightarrow{p} \bar a_i > 0\).

Remark. The data-driven fit metric does not collapse for units that actually have a valid synthetic control. Conveniently (paper footnote 10), this holds straightforwardly for any unit whose pre-period moment distance is bounded away from zero.

Theorem 4.1 (asymptotic unbiasedness). Under A1-A5, \(\widehat{\tau}_{1t} \xrightarrow{p} \tau_{1t} + V_t\) with \(\mathbb{E}[V_t] = 0\) as \(T_0 \to \infty\). The estimator is asymptotically unbiased but not consistent for a single post-period – aggregation across the post-period (eq. 15) or across multiple treated units (Section 4.4.3) is what drives the variance term toward zero.

When the assumptions bind: practical diagnostics#

Latent component is fixed and continuous (A1). The systematic component \(L_{it}\) is deterministic conditional on the unit; the outcome \(y_{it}\) is continuous.

Plausibly violated when outcomes are binary or low-count. Monthly handgun-suicide counts in a small state can fall to zero or stay in the single digits, introducing an LPM/Tobit-style nonlinearity that A1 does not cover. Diagnostic: histogram the outcome; heaps at zero or at integer values flag a non-continuous structure. Aggregate to coarser time bins (Powell aggregates monthly suicides to 12-month windows for exactly this reason) or move to Distributional Synthetic Control (DSC) if the distribution is the object.
A2(a) OR A2(b) for every unit, with at least one A2(a). Identification fails only if no unit has a proper synthetic control. The treated unit may fail A2(a) entirely – that’s the whole point – as long as it appears with positive weight in some other unit’s synthetic control.

Plausibly violated when the donor pool itself sits in a qualitatively different regime (e.g. the eight waiting-period states are all low-suicide-rate while Wisconsin and any donor that would put weight on it are high-rate). Diagnostic: read res.fit_metric. If every \(a_i \approx 0\), no unit in the panel has a proper synthetic control and the estimator has nothing to anchor on. If only the treated unit has \(a_1 \to 0\) but a handful of donors have \(a_i \gtrsim 0.5\), A2(b) is still doing its job. This is the Wisconsin pattern: Iowa, Indiana and Mississippi all have well-fitting synthetic controls that place positive weight on Wisconsin.
No contemporaneous cross-sectional shock correlation (A3b). The shocks \(\epsilon_{it}, \epsilon_{jt}\) are uncorrelated across units in the same period given the latent component.

Plausibly violated when a common national-level shock hits every unit in the same period: a federal policy change, a macro recession, a pandemic. The ISCM moment conditions (paper eq. 10) are then no longer zero in expectation, because the cross-unit residual covariances enter the bias term. Diagnostic: form the panel of pre-period SCM residuals \(R_{it} = y_{it} - \sum_j \widehat{w}_i^{\,j} y_{jt}\) and compute the cross-sectional correlation matrix; large off-diagonal entries flag A3b violations. Remove the common shock before fitting ISCM, e.g. by residualising the outcome on time fixed effects. Powell notes that ISCM does not require unit FE but does require A3b.
Within-unit mixing of the shock (A4). Serial correlation within a unit is allowed but must decay – strongly mixing of size \(-r/(r-1)\), \(r > 1\). This rules out unit roots and other persistent (non-mixing) structures in the shock.

Plausibly violated when the outcome contains a unit-root component – raw monthly stock-price levels, cumulative population, undifferenced trending series. The ISCM estimator converges to the true effect plus a non-vanishing variance term \(V_t\) (Theorem 4.1), and that variance fails to average down over the post-period when \(\epsilon\) does not mix. Diagnostic: ADF or KPSS test on the pre-period residuals; if non-stationary, first-difference the outcome or move to a stationary-cycle estimator (Synthetic Business Cycle (SBC)) before feeding into ISCM.
Long pre-period (Theorem 4.1 is asymptotic in \(T_0\)). Unbiasedness requires \(T_0 \to \infty\); in finite samples the bias term has a residual of order \(T_0^{-1/2}\) from the empirical moment conditions.

Plausibly violated when \(T_0\) is on the order of \(N\) or smaller, especially with a small donor pool. Diagnostic: the paper’s application uses 161 monthly pre-treatment observations. If your \(T_0\) is in the tens, re-estimate after lengthening the pre-period (e.g. by moving to a finer time grid) and compare. Large swings in the estimate flag finite-sample bias.
Inference floor under tiny contributing sets \(|C|\). The Ibragimov-Muller sign-flip test on per-unit estimates has p-value floor \(2 / 2^{|C|}\). With \(|C| = 3\) the floor is 0.25; with \(|C| = 5\) it is 0.0625; with \(|C| = 8\) it is 0.0078.

Plausibly violated when the donor pool is so small that only a few units survive the \(a_i\) filtering. The Wisconsin application has \(|C| = 6\) (Wisconsin plus California, Indiana, Iowa, Mississippi and Rhode Island). The smallest p-value an exact sign-flip test can return is therefore \(2/2^6 \approx 0.031\), and the headline 0.046 sits close to that floor. Diagnostic: read res.inference.n_contributing. If it is below 6 or so, interpret p-values cautiously and consider expanding the donor pool or aggregating multiple treated units (Powell Section 4.4.3) to add inference power.
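The A3b diagnostic described in this list reduces to a correlation matrix of pre-period residuals across units. Below is a minimal NumPy sketch, assuming an (N, T) outcome matrix and an (N, N) weight matrix in the row convention of `all_units_weights`. The function name is hypothetical, and the equal-weights \(W\) is only a placeholder. Residuals of different units share terms through the weights, so some off-diagonal correlation is mechanical. Look for large, widespread entries rather than any non-zero value.

```python
import numpy as np


def residual_cross_correlation(Y: np.ndarray, W: np.ndarray, T0: int):
    """Correlation across units of the pre-period SC residuals R = Y - W @ Y."""
    R_pre = (Y - W @ Y)[:, :T0]                  # R_it = y_it - sum_j w_ij y_jt, t <= T0
    C = np.corrcoef(R_pre)                       # rows are units, columns are periods
    off_diag = C[~np.eye(C.shape[0], dtype=bool)]
    return C, float(np.abs(off_diag).max())


rng = np.random.default_rng(3)
N, T, T0 = 5, 40, 30
Y = rng.normal(size=(N, T))                      # placeholder outcomes
W = (np.ones((N, N)) - np.eye(N)) / (N - 1)      # placeholder: equal weights on the other units
C, worst = residual_cross_correlation(Y, W, T0)
print(np.round(C, 2))
print(worst)
```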

When to use ISCM – and when not to#

Reach for ISCM when:

The treated unit is visibly outside the donor convex hull (its pre-period trends above or below every donor, no convex combination can match it) and you can see the canonical SCM pre-fit is bad. ISCM’s whole identification story is built for this case.
You have a long pre-period (Powell’s application uses 161 monthly observations; Theorem 4.1 is asymptotic in \(T_0\)).
Transitory shocks are large – outcomes are noisy enough that an exact pre-period match is implausible even when the hull holds. The moment-condition framework is robust to shock variance that breaks the canonical SCM’s exact-fit assumption.
The donor pool is small – conventional permutation inference cannot reach 5% with 8 donors (the floor is roughly \(1/9\)); ISCM’s per-unit decomposition + Ibragimov-Muller sign-flip gets you to the floor \(2/2^{|C|}\), which is meaningfully tighter (0.0625 at \(|C|=5\)) than 1/9.
You want to remove the eyeball “is the pre-fit good enough” decision from your workflow. The \(a_i\) weights systematise that judgement, asymptotically excluding units without proper synthetic controls without the researcher flagging them by hand.
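The floors compared above (\(2/2^{|C|}\) versus \(1/9\)) can be confirmed by brute force. The plain-Python sketch below enumerates all \(2^{|C|}\) sign patterns in the most extreme case, where all deviations are equal and share one sign. In that case only the observed pattern and its mirror image reach the observed statistic. The function name is illustrative.

```python
from itertools import product


def exact_sign_flip_floor(q: int) -> float:
    """Smallest two-sided p-value of an exact sign-flip test over q contributing units."""
    x = [1.0] * q                                # most extreme case: identical deviations
    observed = abs(sum(x))
    hits = sum(
        1
        for signs in product((-1.0, 1.0), repeat=q)
        if abs(sum(s * xi for s, xi in zip(signs, x))) >= observed
    )
    return hits / 2 ** q                         # only the all-plus and all-minus patterns hit


for q in (3, 5, 8):
    print(q, exact_sign_flip_floor(q), 2 / 2 ** q)
print("permutation floor with 8 donors:", 1 / 9)
```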

Do not use ISCM when:

The treated unit is well inside the hull and the canonical SCM pre-fit is tight. ISCM adds estimation noise from the per-unit decomposition and the \(a_i\) weighting machinery for no identification gain. Use canonical SCM, Two-Step Synthetic Control, or Forward Difference-in-Differences (FDID) – they have stronger small-sample guarantees in the happy case.
Contemporaneous national-level shocks are part of the DGP (national policy change, macro recession, pandemic spanning treated + donors). A3b fails; the ISCM moment conditions acquire a bias term you cannot remove. Either de-mean by time-FE before fitting or move to Spillover-Aware Synthetic Control (SPILLSYNTH) / Spatial Synthetic Difference-in-Differences (SpSyDiD) if the shock has spatial structure.
The outcome is non-stationary (e.g. it has a unit root) and has not been differenced. A4’s mixing condition fails, and the post-period variance term in Theorem 4.1 does not average down. First-difference the outcome, or move to Synthetic Business Cycle (SBC) (a stationary-cycle estimator) before feeding into ISCM.
Binary, ordinal, or low-count outcomes. A1 requires continuity. With heaps at zero or integer values, aggregate to coarser time bins (the Wisconsin application aggregates monthly suicides to 12-month windows), or move to Distributional Synthetic Control (DSC) for the distributional question.
No unit anywhere in the panel has a valid synthetic control. A2(a) must hold for some unit (paper Discussion, Section 4.2). If every \(a_i \to 0\), ISCM has nothing to anchor on. Diagnose by inspecting res.fit_metric; if every value is near zero, the panel is structurally unsuited and you need a different identification strategy.
Continuous or multi-valued treatment. ISCM encodes binary on/off treatment with the exposure \(E_{it} = d_{it} - \sum_j \widehat{w}_i^{\,j} d_{jt}\). Continuous dose (minimum wage, ad spend, drug dosage) belongs in Continuous-Treatment Synthetic Control (CTSC).
Staggered adoption with a long mixed-treatment pre-period. Section 4.4.3 sketches a multi-treated extension but assumes a common pre-period free of treatment. If donors adopt the policy at different times across a long window, drop late adopters or use a staggered SC variant (FECT, Synthetic Difference-in-Differences (SDID)).
You need a sparse, interpretable single weight vector. ISCM returns per-unit weights for all units (one synthetic control per unit) and aggregates them via \(a_i\). If the policy story you need is “this single state is a convex combination of these four donors”, report the canonical SC weight vector alongside ISCM, or use canonical SCM / Two-Step Synthetic Control whose output IS a single sparse weight vector.
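For the non-stationary case above, first-differencing the outcome within each unit is a one-line pandas operation. Below is a minimal sketch, assuming a long panel with hypothetical column names `state`, `month`, `y` and `treated`. The function name is illustrative. Each unit loses its first period, so the pre-period shortens by one.

```python
import pandas as pd


def first_difference_panel(df, unitid, time, outcome):
    """Replace the outcome by its within-unit first difference, dropping each unit's first period."""
    out = df.sort_values([unitid, time]).copy()
    out[outcome] = out.groupby(unitid)[outcome].diff()
    return out.dropna(subset=[outcome]).reset_index(drop=True)


panel = pd.DataFrame({
    "state": ["A"] * 4 + ["B"] * 4,
    "month": [1, 2, 3, 4] * 2,
    "y": [1.0, 3.0, 6.0, 10.0, 2.0, 2.0, 5.0, 5.0],   # hypothetical trending levels
    "treated": [0] * 8,
})
out = first_difference_panel(panel, "state", "month", "y")
print(out)
```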

Empirical: Wisconsin’s 48-h handgun waiting-period repeal#

Powell’s Section 6 application: in June 2015 Wisconsin repealed its 48-h handgun-purchase waiting period. The donor pool is the eight states that also had waiting periods during the analysis window and did not repeal them (California, Hawaii, Illinois, Iowa, Maryland, Minnesota, New Jersey, Rhode Island). The outcome is monthly handgun-suicide deaths per 100,000 from January 2002 to May 2019 (161 pre-treatment months, 47 post-period).

The setup is exactly the case ISCM was built for:

Wisconsin is structurally outside the donor hull. It has the highest handgun-suicide rate in 73 of the 161 pre-period months – no convex combination of the eight donors can match its level even in expectation. The canonical SCM (Figure 1B, blue line) drifts visibly upward in the pre-period, flagging the convex-hull violation; the demeaned SCM (Powell also runs this) corrects for level but still shows an upward pre-period trend, indicating the convex-hull assumption fails in expectation, not just in sample.
The donor pool is small (\(N_0 = 8\)), so the canonical permutation test’s smallest achievable p-value is roughly \(1/9 \approx 0.11\) – above any conventional threshold.

ISCM produces:

Main estimate: \(\widehat{\tau} = 0.105\) deaths per 100,000 (about a 30% increase relative to the pre-repeal Wisconsin rate), \(p = 0.046\) via the Ibragimov-Muller sign-flip on the contributing units. With the \(|C| = 6\) contributing states listed below, the inference floor is \(2/2^{|C|} \approx 0.031\), so the p-value sits close to it.
Per-state decomposition (paper Table 1, v_i weights):

State	\(v_i\)	Estimate
Iowa	35.73%	0.038
Indiana	34.46%	0.063
Mississippi	19.43%	0.232
Wisconsin	9.94%	0.232
California	0.43%	0.336
Rhode Island	0.02%	-1.931

Wisconsin receives only about 10% of the weight \(v_i\). Its own synthetic control fits poorly, so its fit metric \(a_1\) is small; its outside-hull position cuts both ways. Iowa and Indiana, whose good synthetic controls place positive weight on Wisconsin, together carry about 70% of the weight. This is the A2(b) identification mechanism at work. Even though Wisconsin fails A2(a), the treatment-effect signal is recovered from the control units whose synthetic controls use Wisconsin as a donor.
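As a consistency check on the decomposition \(\widehat{\tau} = \sum_{i \in C} v_i \widehat{\tau}_i\), the per-state figures in the table can be recombined. The dictionary below simply copies those published numbers; nothing is re-estimated.

```python
# Per-state v_i (as fractions) and estimates, copied from the per-state table.
TABLE = {
    "Iowa": (0.3573, 0.038),
    "Indiana": (0.3446, 0.063),
    "Mississippi": (0.1943, 0.232),
    "Wisconsin": (0.0994, 0.232),
    "California": (0.0043, 0.336),
    "Rhode Island": (0.0002, -1.931),
}

att = sum(v * est for v, est in TABLE.values())        # sum_i v_i * tau_hat_i
weight_total = sum(v for v, _ in TABLE.values())       # should be 1 up to rounding
share_iowa_indiana = TABLE["Iowa"][0] + TABLE["Indiana"][0]
print(round(att, 4), round(weight_total, 4), round(share_iowa_indiana, 4))
```

The weighted sum comes to about 0.104, matching the reported 0.105 up to the rounding of the table entries.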

Because the application uses restricted-access NVSS mortality data, it cannot be replicated end-to-end from public sources. The replication package (https://journaldata.zbw.eu/dataset/imperfect-synthetic-controls) documents access; the mlsynth ISCM API call is structurally identical to the example above.

Core API#

ISCM: Imperfect Synthetic Controls (Powell 2026).

Powell, D. (2026). “Imperfect Synthetic Controls.” Journal of Applied Econometrics 41(3):253-264.

The synthetic control method assumes a perfect synthetic control exists – the treated unit lies inside the convex hull of the donors and its pre-period path is matched exactly. With transitory shocks this is implausible: an exact fit cannot hold even in expectation. ISCM relaxes the assumption by constructing synthetic controls for every unit and identifying the treatment effect even when the treated unit is outside the convex hull. The intuition (paper eq. 6): a treated unit that fits no donor combination can still appear as a donor for control units, and those units’ post-treatment residuals then carry information about the effect.

ISCM also introduces a data-driven fit metric \(a_i\) that asymptotically excludes units lacking a valid synthetic control – removing the researcher’s eyeball judgment of pre-period fit – and an Ibragimov-Muller inference procedure that remains valid with a very small donor pool, where permutation tests cannot reach standard significance thresholds.

This implementation follows Powell’s applied procedure: synthetic controls for all units are obtained by the traditional SCM, the \(a_i\) fit weights are formed from the pre-period moment conditions, the ATT is the \(a_i\)-weighted least-squares effect (eq. 8 / 15), and inference is the sign-flip randomization test of eq. 16. It does not run the optional continuously-updating GMM refinement of the weights (paper Section 3.2-3.4); the SCM-initialised weights are the documented starting point of that procedure.

class mlsynth.estimators.iscm.ISCM(config: ISCMConfig | dict)#

Bases: object

Imperfect Synthetic Controls estimator.

Parameters: config (ISCMConfig or dict) – Configuration object. See mlsynth.config_models.ISCMConfig.

fit() → ISCMResults#

Run ISCM and return ISCMResults.

Configuration#

class mlsynth.config_models.ISCMConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', plot: ~mlsynth.config_models.PlotConfig = <factory>, inference: bool = True, null_value: float = 0.0, alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05, n_draws: ~typing.Annotated[int, ~annotated_types.Ge(ge=100)] = 10000, random_state: int = 0)#

Configuration for the Imperfect Synthetic Controls (ISCM) estimator.

Powell, D. (2026). “Imperfect Synthetic Controls,” Journal of Applied Econometrics. Builds synthetic controls for every unit, identifies the treatment effect even when the treated unit is outside the convex hull, weights units by a data-driven fit metric, and uses Ibragimov-Muller inference valid for small donor pools. Inherits the standard df / outcome / treat / unitid / time interface.

Parameters:

inference (bool) – Run Ibragimov-Muller inference over the per-unit estimates. Default True.
null_value (float) – Null effect \(\alpha_0\) for the randomization test. Default 0.
alpha (float) – Two-sided level for the confidence interval; must lie strictly between 0 and 1. Default 0.05.
n_draws (int) – Number of Rademacher sign-flip draws for the p-value; at least 100. Default 10000.
random_state (int) – Seed for the randomization-test RNG. Default 0.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Helper Modules#

Panel ingestion for the ISCM estimator.

Pivots a long-format panel into the dense (N, T) outcome and treatment matrices ISCM operates on. ISCM builds a synthetic control for every unit, so – unlike single-treated SCM – there is no treated / donor split at this stage; all units are retained.

mlsynth.utils.iscm_helpers.setup.prepare_iscm_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str) → ISCMInputs#

Pivot a long panel into ISCMInputs.

Parameters:

df (pd.DataFrame) – Balanced long panel; one row per (unit, time).
outcome, treat, unitid, time (str) – Column names. A treated unit has treat == 1 from a common adoption period onward; ISCM assumes a single adoption date (no staggered timing).
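Below is a minimal sketch of what this pivot step involves, assuming pandas and NumPy. `pivot_panel_sketch` is a hypothetical stand-in that returns plain arrays rather than the library's `ISCMInputs`. The column names in the toy panel are invented. It takes \(T_0\) as the number of periods before the first treated period, consistent with the single-adoption assumption stated above.

```python
import numpy as np
import pandas as pd


def pivot_panel_sketch(df, outcome, treat, unitid, time):
    """Long panel -> (Y, D, T0, unit_names, time_labels, treated_idx)."""
    Y = df.pivot(index=unitid, columns=time, values=outcome).sort_index().sort_index(axis=1)
    D = df.pivot(index=unitid, columns=time, values=treat).reindex_like(Y)
    if Y.isna().to_numpy().any():
        raise ValueError("panel must be balanced: one row per (unit, time)")
    D_arr = D.to_numpy(dtype=float)
    treated_periods = D_arr.max(axis=0) > 0          # periods in which any unit is treated
    if not treated_periods.any():
        raise ValueError("no treated observations found")
    T0 = int(np.argmax(treated_periods))              # first treated column = pre-period length
    treated_idx = np.flatnonzero(D_arr.max(axis=1) > 0)
    return Y.to_numpy(dtype=float), D_arr, T0, list(Y.index), Y.columns.to_numpy(), treated_idx


rows = [
    {"state": s, "month": t, "y": float(t), "treated": int(s == "WI" and t >= 4)}
    for s in ("WI", "IA", "CA")
    for t in range(1, 6)
]
Y, D, T0, units, times, treated_idx = pivot_panel_sketch(
    pd.DataFrame(rows), "y", "treated", "state", "month"
)
print(Y.shape, T0, units, treated_idx)
```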

All-units synthetic-control weights for ISCM.

For every unit \(i\), ISCM constructs a synthetic control from the other units by the usual constrained least-squares pre-period fit (paper eq. 5):

\[\widehat w_i = \arg\min_{w} \sum_{t \le T_0} \Bigl( Y_{it} - \sum_{j \ne i} w_j Y_{jt} \Bigr)^2, \quad w_j \ge 0,\ \sum_{j \ne i} w_j = 1.\]

These per-unit weights are the starting point of ISCM (Powell’s applied procedure initialises with the traditional SCM); the fit metric and the treatment-effect regression are built on top of them.

mlsynth.utils.iscm_helpers.weights.all_units_weights(Y: ndarray, T0: int) → ndarray#

Return the (N, N) all-units SC weight matrix.

Row i holds unit i’s synthetic-control weights over the other units (W[i, i] = 0, each row non-negative and summing to one).

Parameters:

Y (np.ndarray) – Outcomes, shape (N, T).
T0 (int) – Number of pre-treatment periods.

Fit metric and weighted-least-squares treatment effect for ISCM.

Given the all-units synthetic-control weights \(W\), ISCM forms

the SC residuals \(R_{it} = Y_{it} - \sum_j w_{ij} Y_{jt}\),
the treatment exposure \(E_{it} = D_{it} - \sum_j w_{ij} D_{jt}\) (the regressor: a control unit that borrows the treated unit as a donor has non-zero post-period exposure \(-w_{i,\text{tr}}\)),
a per-unit fit metric \(a_i\) (paper eq. 14) that weights units by how well their synthetic control satisfies the SCM moment conditions in the pre-period – asymptotically excluding units without a valid synthetic control,

and estimates the ATT by weighted least squares pooled over all units (paper eq. 8 / 15):

\[\widehat\alpha = \frac{\sum_i a_i \sum_{t > T_0} E_{it} R_{it}} {\sum_i a_i \sum_{t > T_0} E_{it}^2}.\]

mlsynth.utils.iscm_helpers.estimate.fit_metric(R: ndarray, Y: ndarray, T0: int) → ndarray#

Per-unit fit weights \(a_i \in (0, 1]\) (paper eq. 14).

The pre-period moment vector for unit i collects the empirical covariances of its SC residual with every other unit’s outcomes, \(M_i^k = \tfrac{1}{T_0}\sum_{t \le T_0} R_{it} Y_{kt}\). A good synthetic control makes these (near) zero. The metric normalises so the best-fitting unit gets 1 and poorer fits get progressively smaller weights:

\[a_i = \frac{\min_\ell M_\ell' M_\ell}{M_i' M_i}.\]

mlsynth.utils.iscm_helpers.estimate.residuals_and_exposure(Y: ndarray, D: ndarray, W: ndarray) → Tuple[ndarray, ndarray]#

Return (residuals R, exposure E), each shape (N, T).
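In matrix form, each quantity is a single product. With \(W\) in the row convention of `all_units_weights`, \(R = Y - WY\) and \(E = D - WD\). The NumPy sketch below assumes that convention; `residuals_and_exposure_sketch` is a hypothetical name. The three-unit example treats unit 0 after \(T_0\).

```python
import numpy as np


def residuals_and_exposure_sketch(Y: np.ndarray, D: np.ndarray, W: np.ndarray):
    """Return (R, E) with R = Y - W @ Y and E = D - W @ D, each of shape (N, T)."""
    return Y - W @ Y, D - W @ D


T, T0 = 6, 4
W = np.array([
    [0.0, 0.5, 0.5],   # unit 0 (treated): synthetic control from units 1 and 2
    [0.4, 0.0, 0.6],   # unit 1 borrows the treated unit with weight 0.4
    [0.3, 0.7, 0.0],   # unit 2 borrows it with weight 0.3
])
D = np.zeros((3, T))
D[0, T0:] = 1.0        # only unit 0 is treated, from column T0 onward
Y = np.random.default_rng(4).normal(size=(3, T))
R, E = residuals_and_exposure_sketch(Y, D, W)
print(E[:, T0:])       # post-period exposure: 1 for unit 0, -0.4 and -0.3 for the controls
```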

mlsynth.utils.iscm_helpers.estimate.weighted_att(R: ndarray, E: ndarray, a: ndarray, T0: int) → Tuple[float, ndarray, ndarray]#

Aggregate ATT and per-unit decomposition (paper eq. 15).

Returns:

att (float) – WLS aggregate effect.
unit_att (np.ndarray, shape (N,)) – Per-unit estimate \(\widehat\alpha_i = \sum_{t>T_0} E_{it} R_{it} / \sum_{t>T_0} E_{it}^2\); NaN outside the contributing set (units with zero post-period exposure).
contribution (np.ndarray, shape (N,)) – Per-unit share \(v_i\) of the aggregate effect; sums to one over the contributing set.

Ibragimov-Muller inference for ISCM (paper Section 5, eq. 16).

ISCM yields one treatment-effect estimate \(\widehat\alpha_i\) per contributing unit, with relative weights \(v_i\) (so \(\widehat\alpha = \sum_{i \in C} v_i \widehat\alpha_i\)). Under the null \(H_0: \alpha = \alpha_0\) the weighted deviations \(v_i(\widehat\alpha_i - \alpha_0)\) are treated as approximately symmetric, and a sign-flip (Rademacher) randomization test calibrates the p-value – conservative but valid even with a handful of contributing units, where a permutation test cannot reach standard thresholds.

mlsynth.utils.iscm_helpers.inference.ibragimov_muller_inference(att: float, unit_att: ndarray, contribution: ndarray, *, N: int, null_value: float = 0.0, alpha_level: float = 0.05, n_draws: int = 10000, random_state: int = 0) → ISCMInference#

Sign-flip randomization test over the per-unit estimates.

Parameters:

att (float) – Aggregate ATT.
unit_att (np.ndarray) – Per-unit estimates (NaN outside the contributing set).
contribution (np.ndarray) – Per-unit weights \(v_i\) (sum to one over C).
N (int) – Total number of units (for the finite-sample variance scaling).
null_value (float) – Tested null \(\alpha_0\).
alpha_level (float) – Two-sided level for the reported CI.
n_draws (int) – Number of Rademacher sign-flip draws.
random_state (int) – RNG seed.

Orchestration for the ISCM estimator (Powell 2026).

Pipeline:

Build synthetic controls for every unit (all_units_weights()).
Form SC residuals and treatment exposure; compute the per-unit fit metric \(a_i\) (estimate).
Estimate the ATT by the \(a_i\)-weighted least-squares regression pooled over all units (paper eq. 15).
Optionally run Ibragimov-Muller inference over the per-unit estimates.

mlsynth.utils.iscm_helpers.pipeline.run_iscm(inputs: ISCMInputs, *, inference: bool = True, null_value: float = 0.0, alpha_level: float = 0.05, n_draws: int = 10000, random_state: int = 0) → ISCMResults#

Run ISCM and assemble ISCMResults.

Parameters:

inputs (ISCMInputs) – Preprocessed panel.
inference (bool) – If True, run Ibragimov-Muller inference. Default True.
null_value (float) – Null effect \(\alpha_0\) for the test.
alpha_level (float) – Two-sided level for the confidence interval.
n_draws (int) – Number of Rademacher sign-flip draws.
random_state (int) – RNG seed for the randomization test.

Frozen dataclasses for the Imperfect Synthetic Controls (ISCM) estimator.

Powell, D. (2026). “Imperfect Synthetic Controls.” Journal of Applied Econometrics 41(3):253-264.

The standard SCM assumes a perfect synthetic control exists: the treated unit lies inside the convex hull of the donors and its pre-period outcomes are matched exactly. With transitory shocks this is implausible – an exact fit is impossible even in expectation. ISCM relaxes this by

constructing synthetic controls for every unit (not just the treated one), so the treatment effect is identified even when the treated unit is outside the convex hull – it can still appear as a donor for control units, and those units’ post-treatment residuals carry information about the effect (paper eq. 6);
weighting units by a data-driven fit metric \(a_i\) that asymptotically excludes units lacking a valid synthetic control (paper eq. 14), removing the researcher’s eyeball “is the pre-fit good enough” judgment;
estimating the effect by weighted least squares across all units (paper eq. 8 / 15);
conducting inference via the Ibragimov-Muller t-statistic over the per-unit estimates (paper eq. 16), valid even with a very small donor pool.

class mlsynth.utils.iscm_helpers.structures.ISCMInference(method: str, null_value: float, t_stat: float, p_value: float, se: float, ci: tuple, alpha_level: float, n_contributing: int, n_draws: int)#

Ibragimov-Muller inference for ISCM (paper Section 5, eq. 16).

ISCM produces one treatment-effect estimate per contributing unit. Their (weighted) spread calibrates uncertainty via a sign-flip (Rademacher) randomization test – conservative but valid even with a handful of donors, where a permutation test cannot reach standard significance thresholds.

method#

"ibragimov_muller".

Type:: str

null_value#

The tested null effect \(\alpha_0\).

Type:: float

t_stat#

The Ibragimov-Muller test statistic (paper eq. 16).

Type:: float

p_value#

Two-sided sign-flip randomization p-value.

Type:: float

se#

Standard error implied by the per-unit estimate spread.

Type:: float

ci#

Approximate two-sided confidence interval for the ATT.

Type:: tuple of float

alpha_level#

Two-sided level used for ci.

Type:: float

n_contributing#

Size of the contributing set C.

Type:: int

n_draws#

Number of Rademacher draws.

Type:: int

alpha_level: float#

ci: tuple#

method: str#

n_contributing: int#

n_draws: int#

null_value: float#

p_value: float#

se: float#

t_stat: float#
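The sign-flip logic described for ISCMInference can be illustrated with a plain, unweighted Ibragimov-Muller t-statistic. This is a sketch only. mlsynth's statistic (paper eq. 16) weights the per-unit estimates and applies a finite-sample scaling in N, and neither is reproduced here. The name im_sign_flip_sketch is hypothetical.

```python
import numpy as np


def im_sign_flip_sketch(estimates, null_value=0.0, n_draws=10000, random_state=0):
    """Unweighted Ibragimov-Muller t-test with a Rademacher sign-flip p-value.

    estimates: one treatment-effect estimate per contributing unit (q >= 2).
    Returns (t_stat, p_value).
    """
    dev = np.asarray(estimates, dtype=float) - null_value   # deviations from H0
    q = dev.size
    t_obs = np.sqrt(q) * dev.mean() / dev.std(ddof=1)

    # Under H0, with independent and symmetrically distributed estimates, the
    # sign of each deviation is exchangeable, so random flips trace out the null.
    rng = np.random.default_rng(random_state)
    signs = rng.choice([-1.0, 1.0], size=(n_draws, q))
    flipped = signs * dev
    t_draws = np.sqrt(q) * flipped.mean(axis=1) / flipped.std(axis=1, ddof=1)

    # Two-sided p-value; the tiny relative tolerance keeps exact ties
    # (e.g. the all-plus draw) from being lost to floating-point rounding.
    p_value = np.mean(np.abs(t_draws) >= np.abs(t_obs) * (1 - 1e-12))
    return t_obs, p_value


unit_estimates = [2.9, 3.1, 3.05, 2.95, 3.2, 2.8]
t, p = im_sign_flip_sketch(unit_estimates, null_value=0.0)
print(f"t = {t:.1f}, p = {p:.3f}")
```

The example also shows the resolution floor of any sign-flip test: with q contributing units there are only 2^q sign patterns, so the smallest attainable two-sided p-value is 2/2^q (here 2/64, about 0.031), however large the t-statistic.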

class mlsynth.utils.iscm_helpers.structures.ISCMInputs(Y: ndarray, D: ndarray, T0: int, unit_names: List[Any], time_labels: ndarray, treated_idx: ndarray)#

Preprocessed panel for ISCM (synthetic controls for all units).

Y#

Outcomes for every unit, shape (N, T).

Type:: np.ndarray

D#

Treatment indicators, shape (N, T); D[i, t] = 1 iff unit i is treated at period t.

Type:: np.ndarray

T0#

Number of pre-treatment periods. Treatment starts at 0-based column index T0, so Y[:, :T0] is the pre-period and Y[:, T0:] the post-period.

Type:: int

unit_names#

Length-N unit identifiers.

Type:: list

time_labels#

Length-T period labels.

Type:: np.ndarray

treated_idx#

Indices of the ever-treated units.

Type:: np.ndarray

D: ndarray#

property N: int#

property T: int#

T0: int#

Y: ndarray#

property n_post: int#

time_labels: ndarray#

treated_idx: ndarray#

unit_names: List[Any]#
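For readers assembling these fields by hand, the sketch below shows one way they map onto a long-format panel with unit, time, outcome, and treatment columns. It is a hypothetical helper, not mlsynth's preprocessing, and it assumes a balanced panel in which all treated units adopt at the same period (matching the single T0 above).

```python
import numpy as np
import pandas as pd


def panel_to_arrays(df, unit="unit", time="time", outcome="y", treat="D"):
    """Hypothetical helper: balanced long panel -> ISCMInputs-style fields.

    Assumes every unit is observed in every period and that all treated
    units adopt at the same period.
    """
    # pivot() returns rows sorted by unit and columns sorted by time.
    Y_wide = df.pivot(index=unit, columns=time, values=outcome)
    D_wide = df.pivot(index=unit, columns=time, values=treat).reindex_like(Y_wide)
    Y = Y_wide.to_numpy(dtype=float)      # shape (N, T)
    D = D_wide.to_numpy(dtype=float)      # shape (N, T), 1 where treated
    if not D.any():
        raise ValueError("no treated unit-periods found")
    treated_idx = np.flatnonzero(D.any(axis=1))   # ever-treated rows
    T0 = int(np.argmax(D.any(axis=0)))            # first column with any treatment
    unit_names = list(Y_wide.index)
    time_labels = Y_wide.columns.to_numpy()
    return Y, D, T0, unit_names, time_labels, treated_idx


df = pd.DataFrame({
    "unit": np.repeat(["a", "b", "c"], 5),
    "time": np.tile(np.arange(5), 3),
    "y": np.arange(15, dtype=float),
    "D": [0, 0, 0, 0, 0,  0, 0, 0, 1, 1,  0, 0, 0, 0, 0],
})
Y, D, T0, unit_names, time_labels, treated_idx = panel_to_arrays(df)
print(Y.shape, T0, unit_names, treated_idx)
```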

class mlsynth.utils.iscm_helpers.structures.ISCMResults(*, effects: EffectsResults | None = None, fit_diagnostics: FitDiagnosticsResults | None = None, time_series: TimeSeriesResults | None = None, weights: WeightsResults | None = None, inference: InferenceResults | None = None, method_details: MethodDetailsResults | None = None, sub_method_results: Dict[str, Any] | None = None, additional_outputs: Dict[str, Any] | None = None, raw_results: Dict[str, Any] | None = None, execution_summary: Dict[str, Any] | None = None, plot_config: PlotConfig | None = None)#

Top-level container returned by mlsynth.ISCM.fit().

inputs#

Preprocessed panel.

Type:: ISCMInputs

att#

Average treatment effect on the treated, aggregated over the post-treatment period (paper eq. 15).

Type:: float

unit_weight_matrix#

All-units synthetic-control weight matrix, shape (N, N). Row i is the synthetic control for unit i: its diagonal entry [i, i] is 0, and each row is non-negative and sums to one.

Type:: np.ndarray

fit_metric#

Per-unit fit weights \(a_i \in (0, 1]\), shape (N,); 1 for the best-fitting unit, smaller for poorer synthetic controls (paper eq. 14).

Type:: np.ndarray

unit_att#

Per-unit treatment-effect estimates, shape (N,). Only units in the contributing set C (non-zero treatment exposure) have finite entries; units that carry no identifying variation are NaN.

Type:: np.ndarray

contribution#

Per-unit share \(v_i\) of the aggregate ATT, shape (N,); sums to one over the contributing set (paper, before eq. 16).

Type:: np.ndarray
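The relationship between unit_att, contribution, and att can be made concrete under one explicit assumption: that the pooled estimator (paper eq. 15) is an a_i-weighted least-squares regression of post-period residuals on post-period exposure with no intercept. The paper's exact weighting may differ, so treat this as an illustration of why the v_i sum to one and reproduce the aggregate, not as mlsynth's code. The name wls_att_sketch is hypothetical; R and E play the role of this class's residuals and exposure fields.

```python
import numpy as np


def wls_att_sketch(R, E, a, T0):
    """Pooled a_i-weighted LS of residuals on exposure (through the origin).

    R, E: (N, T) residuals and treatment exposure; a: (N,) fit weights.
    Returns (att, unit_att, contribution).
    """
    r, e = R[:, T0:], E[:, T0:]          # post-treatment cells only (assumption)
    sxx = (e ** 2).sum(axis=1)           # per-unit exposure variation
    sxy = (e * r).sum(axis=1)
    in_C = sxx > 0                       # contributing set: non-zero exposure

    unit_att = np.full(R.shape[0], np.nan)
    unit_att[in_C] = sxy[in_C] / sxx[in_C]           # per-unit slope

    att = (a * sxy).sum() / (a * sxx).sum()          # pooled slope
    contribution = np.where(in_C, a * sxx, 0.0) / (a * sxx).sum()
    return att, unit_att, contribution


# Toy panel: unit 0 treated; unit 1 puts weight 0.5 on it; unit 2 never does.
T0 = 2
E = np.array([[0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, -0.5, -0.5],
              [0.0, 0.0, 0.0, 0.0]])
# Effect 3 plus noise chosen orthogonal to exposure, so every slope is exactly 3.
R = 3.0 * E + np.array([[0.0, 0.0, 0.1, -0.1],
                        [0.0, 0.0, 0.05, -0.05],
                        [0.0, 0.0, 0.2, -0.3]])
a = np.array([1.0, 0.6, 0.9])
att, unit_att, v = wls_att_sketch(R, E, a, T0)
print(att, unit_att, v)
```

In this sketch the identity att = sum_i v_i * unit_att_i holds by construction, because v_i is proportional to a_i * sum_t e_it^2, the weight unit i carries in the pooled slope. Unit 2 has zero exposure, so it sits outside C with a NaN estimate and zero share.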

residuals#

Synthetic-control residuals \(Y_{it} - \sum_j w_{ij} Y_{jt}\), shape (N, T).

Type:: np.ndarray

exposure#

Treatment exposure \(D_{it} - \sum_j w_{ij} D_{jt}\), shape (N, T) – the regressor in the WLS effect estimate.

Type:: np.ndarray
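Given the weight matrix, residuals and exposure are one matrix product each. The sketch below uses a made-up three-unit panel and its own variable names (not mlsynth internals). It also reproduces the identifying intuition from the overview: a control unit that puts weight w on the treated unit has post-period exposure -w, and its residual shifts by exactly that exposure times the effect.

```python
import numpy as np

# Toy 3-unit panel: unit 0 treated from period T0 onward with effect tau.
tau, T0 = 3.0, 2
Y0 = np.array([[1.0, 2.0, 3.0, 4.0],      # untreated potential outcomes
               [0.5, 1.0, 1.5, 2.0],
               [1.5, 3.0, 4.5, 6.0]])
D = np.zeros_like(Y0)
D[0, T0:] = 1.0
Y = Y0 + tau * D                           # observed outcomes

# Row i = synthetic control for unit i (zero diagonal, rows on the simplex).
W = np.array([[0.0, 0.5, 0.5],
              [0.4, 0.0, 0.6],
              [0.0, 1.0, 0.0]])

residuals = Y - W @ Y       # Y_it - sum_j w_ij Y_jt
exposure = D - W @ D        # D_it - sum_j w_ij D_jt

# Post-period exposure: unit 0 -> +1, unit 1 -> -0.4, unit 2 -> 0.
print(exposure[:, T0:])
```

Because both quantities are linear in the data, the treatment shifts each unit's residual by tau times its exposure. Unit 1, although untreated, therefore carries information about tau through its weight of 0.4 on unit 0.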

weights#

Standardized donor weights for the treated unit(s) – the treated unit’s row of unit_weight_matrix (a convex synthetic control over the other units). For multiple treated units, the cross-unit average; per-unit rows live in summary_stats.

Type:: WeightsResults, optional

inference#

ISCMInference when inference=True; None otherwise.

Type:: object, optional

metadata#

Free-form diagnostics.

Type:: dict

contribution: np.ndarray#

exposure: np.ndarray#

fit_metric: np.ndarray#

inference_detail: 'ISCMInference' | None#

inputs: ISCMInputs#

metadata: Dict[str, Any]#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {numpy.ndarray: BaseEstimatorResults.Config.<lambda>}}#

Configuration for the model; a dictionary conforming to pydantic's ConfigDict.

residuals: np.ndarray#

unit_att: np.ndarray#

unit_weight_matrix: np.ndarray#

Example#

A one-factor panel where the treated unit has the largest factor loading – placing it outside the convex hull of the controls, so a traditional SCM cannot match it. ISCM still recovers the planted effect via the control units that use the treated unit as a donor.

import numpy as np
import pandas as pd

from mlsynth import ISCM

# ------------------------------------------------------------------
# 1. One-factor panel; unit 0 (treated) has the MAX loading
# ------------------------------------------------------------------
rng = np.random.default_rng(0)
N, T, T0, true_alpha = 8, 60, 48, 3.0
loadings = np.linspace(2.0, -1.5, N)        # unit 0 outside the hull
f = np.cumsum(rng.standard_normal(T)) * 0.3 + np.linspace(0, 2, T)
Y = np.outer(loadings, f) + rng.standard_normal((N, T)) * 0.05
D = np.zeros((N, T))
Y[0, T0:] += true_alpha
D[0, T0:] = 1

rows = [{"unit": f"u{i}", "time": t, "y": Y[i, t], "D": int(D[i, t])}
        for i in range(N) for t in range(T)]
df = pd.DataFrame(rows)

# ------------------------------------------------------------------
# 2. Fit ISCM with Ibragimov-Muller inference
# ------------------------------------------------------------------
res = ISCM({
    "df": df, "outcome": "y", "treat": "D",
    "unitid": "unit", "time": "time",
    "inference": True,
}).fit()

# ------------------------------------------------------------------
# 3. Inspect the result
# ------------------------------------------------------------------
print(f"ATT = {res.att:+.3f}  (true = {true_alpha})")
print(f"treated fit metric a_0 = {res.fit_metric[0]:.3f}  "
      f"(small => outside the hull)")
print(f"treated contribution   = {res.contribution[0]*100:.1f}%")
print(f"p-value = {res.inference.p_value:.3f}  "
      f"(n contributing = {res.inference.n_contributing})")

References#

Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies.” Journal of the American Statistical Association 105(490):493-505.

Ferman, B., & Pinto, C. (2021). “Synthetic Controls with Imperfect Pretreatment Fit.” Quantitative Economics 12(4):1197-1221.

Fry, J. (2024). “A Method of Moments Approach to Asymptotically Unbiased Synthetic Controls.” Journal of Econometrics 244:105846.

Ibragimov, R., & Muller, U. K. (2010). “T-Statistic Based Correlation and Heterogeneity Robust Inference.” Journal of Business & Economic Statistics 28(4):453-468.

Powell, D. (2026). “Imperfect Synthetic Controls.” Journal of Applied Econometrics 41(3):253-264.
