Matching and Synthetic Control (MASC)

When to use MASC – and when not to

Reach for MASC when:

You have a single treated unit and a moderate-to-large donor pool with non-trivial heterogeneity across donors. Comparative case studies in policy evaluation (Basque terrorism, Prop 99, German reunification) are the canonical setting.
The conditional mean of the outcome is plausibly non-linear in the pre-treatment covariates – so pure SC’s interpolation bias is a real risk – and you also suspect no donor is close enough to extrapolate from – so pure matching’s extrapolation bias is also a real risk. When both worries are alive, MASC’s $\widehat{\varphi}$ trades them off in a data-driven way.
You can’t decide a priori whether SC or matching is more appropriate. The CV gives you a defensible answer rather than the practitioner’s eyeball “I’ll use SC because that’s what the paper said”.
The pre-period is long enough for rolling-origin cross-validation to discriminate the two biases (KMPT Section 5 uses 20 pre-treatment years on Basque; mlsynth’s config only enforces min_preperiods ≥ 2, and the Basque example below sets it to 5, but pre-periods around 10+ are where CV becomes informative).

Do not use MASC when:

Either bias dominates. If res.phi_hat is essentially 0 or 1 across seeds and fold configurations, MASC adds variance without reducing bias. At $\widehat{\varphi} \approx 0$ reach for canonical SCM / Two-Step Synthetic Control (or Forward-Selected Synthetic Control (FSCM) for selective donor pruning); at $\widehat{\varphi} \approx 1$ reach for a dedicated nearest-neighbour matching estimator.
The treated unit is structurally outside the donor convex hull. Both component estimators fail (Assumption 2). Use Imperfect Synthetic Controls (ISCM) (identifies the effect via donors that use the treated unit as a positive-weight donor) or Nonlinear Synthetic Control (NSC) (drops the simplex restriction to extrapolate by negative weights).
You need posterior credible bands on the weights / ATT. MASC returns point estimates plus a CV criterion. For full Bayesian inference, use Bayesian Synthetic Control with a Soft Simplex Constraint (BVS-SS) (spike-and-slab variable selection with a soft simplex).
The pre-period is very short ($T_0 < 10$-ish). Rolling-origin CV has too few folds to discriminate $(m, \varphi)$; the selected mix is noise. Use canonical SCM / Two-Step Synthetic Control / Forward Difference-in-Differences (FDID) (which work without CV) instead.
Multiple treated units. MASC’s identification story uses a single treated unit. For staggered or many-treated designs, use FECT or Synthetic Difference-in-Differences (SDID). (The paper’s Section 2.7 notes one could average a matching and penalised-SC pair across treated units, mirroring MASC, but this is not in the mlsynth implementation and demands additional econometric theory.)
Structural break inside the pre-period. Assumption 5 fails; the CV is fitting the break instead of the post-period mix. Trim to a stable window or use Synthetic Business Cycle (SBC).
You need a single sparse interpretable weight vector as the policy-story deliverable. MASC’s output is a mixture of SC weights and matching weights; both can be sparse on their own, but whenever $0 < \widehat{\varphi} < 1$ the support of the mixed vector is the union of the two supports, so it is generically less sparse than either component. If the headline must be “California ≈ Utah + Montana + Nevada”, run canonical SCM alongside.
Distributional questions (Lorenz curves, QTEs, tail effects). MASC targets the mean ATT. Use Distributional Synthetic Control (DSC).
Continuous or multi-valued treatment. MASC encodes a single binary intervention. Continuous dose belongs in Continuous-Treatment Synthetic Control (CTSC).
Spillovers across donors. Both component estimators inherit SUTVA at the donor level. Use Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD).
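The sparsity point in the list above (a blend with $0 < \varphi < 1$ is non-zero wherever either component is) can be checked on a toy example. The six-donor weight vectors below are made up for illustration, and `support` is a hypothetical helper, not part of mlsynth.

```python
import numpy as np

def support(w, tol=1e-12):
    """Indices of donors carrying non-negligible weight (hypothetical helper)."""
    return set(np.flatnonzero(np.abs(w) > tol).tolist())

# Made-up six-donor example.
w_sc = np.array([0.7, 0.0, 0.0, 0.3, 0.0, 0.0])     # SC: two active donors
w_match = np.array([0.0, 0.5, 0.0, 0.5, 0.0, 0.0])  # m = 2 matching: 1/2 on each neighbour
phi = 0.4
w_masc = phi * w_match + (1 - phi) * w_sc            # MASC blend

print(sorted(support(w_sc)), sorted(support(w_match)), sorted(support(w_masc)))
# [0, 3] [1, 3] [0, 1, 3]
```

Because both components are non-negative, no cancellation can occur, so the blend's support is exactly the union.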

Notation

We use the synthetic-control canon. Let $j = 1$ denote the treated unit, with all units $\mathcal{N} \coloneqq \{1, \ldots, N\}$ and donor pool $\mathcal{N}_0 \coloneqq \mathcal{N} \setminus \{1\}$ of cardinality $N_0$. The treated outcome path is $\mathbf{y}_1$ and $\mathbf{Y}_0 \coloneqq [\mathbf{y}_j]_{j \in \mathcal{N}_0} \in \mathbb{R}^{T \times N_0}$ is the donor outcome matrix (one column per donor). Time runs over $t \in \mathcal{T} \coloneqq \{1, \ldots, T\}$, 1-indexed; the intervention takes effect after period $T_0$, splitting $\mathcal{T}$ into the pre-period $\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}$ (so $|\mathcal{T}_1| = T_0$) and the post-period $\mathcal{T}_2 \coloneqq \{t \in \mathcal{T} : t > T_0\}$. Predictors are stacked into $(\mathbf{x}_1, \mathbf{X}_0)$ with $\mathbf{x}_1 \in \mathbb{R}^P$ for the treated unit and $\mathbf{X}_0 \in \mathbb{R}^{P \times N_0}$ for the donors. Donor weights are $\mathbf{w} \in \mathbb{R}^{N_0}$, constrained to the unit simplex

\[\Delta^{N_0} \coloneqq \Bigl\{ \mathbf{w} \in \mathbb{R}_{\ge 0}^{N_0} : \|\mathbf{w}\|_1 = 1 \Bigr\};\]

the optimiser is $\mathbf{w}^\ast$. The estimated per-period treatment effect is $\widehat{\tau}_t \coloneqq y_{1t} - \widehat{y}_{1t}$ and the ATT is $\widehat{\tau} \coloneqq |\mathcal{T}_2|^{-1} \sum_{t \in \mathcal{T}_2} \widehat{\tau}_t$.
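A minimal NumPy sketch of this bookkeeping, assuming outcomes are stored as 1-D arrays ordered by $t = 1, \ldots, T$. Because time is 1-indexed, the pre-period $t \le T_0$ occupies Python indices `0..T0-1` and the post-period starts at index `T0`. The helper names are hypothetical.

```python
import numpy as np

def in_simplex(w, tol=1e-9):
    """True if w is (numerically) in the unit simplex: non-negative, sums to one."""
    w = np.asarray(w, dtype=float)
    return bool(np.all(w >= -tol) and abs(w.sum() - 1.0) <= tol)

def att(y1, y1_hat, T0):
    """Mean of the per-period effect y_1t - yhat_1t over the post-period t > T0."""
    tau = np.asarray(y1, dtype=float) - np.asarray(y1_hat, dtype=float)
    return float(tau[T0:].mean())   # Python index T0 is period T0 + 1

# Usage: T = 5, T0 = 3, effects of +1 and +2 in the two post-periods.
print(in_simplex([0.25, 0.75]), att([1, 2, 3, 4, 5], [1, 2, 3, 3, 3], T0=3))
# True 1.5
```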

Assumptions (Kellogg-Mogstad-Pouliot-Torgovitsky 2021)

MASC inherits the formal identification stack of any causal SC estimator (paper Section 2.1) and adds the structural conditions needed for model averaging to make sense. Listed in the paper’s order:

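Assumption 1 (Selection on observables – paper Assumption 1). Let $d_j \in \{0, 1\}$ indicate whether unit $j$ is treated and let $y_{jt}^N$ denote its potential outcome without treatment. For $\mathbf{x}$ in the supports of both $\mathbf{x}_j \mid d_j = 0$ and $\mathbf{x}_j \mid d_j = 1$,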

\[\mathbb{E}[y_{jt}^N \mid d_j = 1, \mathbf{x}_j = \mathbf{x}] \;=\; \mathbb{E}[y_{jt}^N \mid d_j = 0, \mathbf{x}_j = \mathbf{x}] \quad \text{for all } t > T_0.\]

Remark. This is the standard mean-independence statement (ignorable treatment assignment / unconfoundedness / selection on observables) applied to the SC framework. Together with Assumption 2 it makes the post-treatment conditional mean of untreated outcomes for the treated unit identifiable from donor outcomes.

Assumption 2 (Overlap – paper Assumption 2). The support of $\mathbf{x}_j \mid d_j = 1$ is contained in the support of $\mathbf{x}_j \mid d_j = 0$.

Remark. In a comparative case study with a single treated unit (the paper’s focus), overlap becomes a statement about one point, $\mathbf{x}_1$: it should lie where donor covariates are observed, i.e. close to some donors or to convex combinations of them. A single covariate on which the treated unit is more extreme than every donor is already enough to put $\mathbf{x}_1$ outside the donor convex hull.

Assumption 3 (Lipschitz conditional mean). The conditional mean $\gamma_t(\mathbf{x}) = \mathbb{E}[y_{jt}^N \mid d_j = 0, \mathbf{x}_j = \mathbf{x}]$ is Lipschitz in $\mathbf{x}$ with constant $c$. Used (paper Section 2.2) to bound both bias components,

\[|\text{ExtBias}(\mathbf{w})| \;\le\; c \, \bigl\| \mathbf{x}_1 - \textstyle\sum_{j \in \mathcal{N}_0} w_j \mathbf{x}_j \bigr\| \;\coloneqq\; c \cdot \text{Ext}(\mathbf{w}), \qquad |\text{IntBias}(\mathbf{w})| \;\le\; c \, \textstyle\sum_{j \in \mathcal{N}_0} w_j \|\mathbf{x}_1 - \mathbf{x}_j\| \;\coloneqq\; c \cdot \text{Int}(\mathbf{w}).\]

Remark. These two bounds are the heart of the MASC argument: the SC estimator minimises $\text{Ext}(\mathbf{w})$ (and attains zero extrapolation when $\mathbf{x}_1$ is in the donor hull), while matching minimises $\text{Int}(\mathbf{w})$ (with $m = 1$ all weight goes to the single nearest donor, so no averaging across donors takes place). When $\gamma_t$ is approximately linear in $\mathbf{x}$, the interpolation bias itself is close to zero (the Int bound is then loose) and SC dominates; when no donor is close, the extrapolation bound for the matching weights is large and matching does worse.
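Both discrepancy measures are easy to compute directly. The sketch below assumes the Euclidean norm and an unweighted predictor space (the norm is left unspecified above) on a made-up one-predictor example: SC-style weights reproduce $\mathbf{x}_1$ exactly (Ext = 0) but average over far donors (Int = 2/3), while one-nearest-neighbour matching attains the smallest possible Int (0.5) at the cost of Ext = 0.5. By the triangle inequality, $\text{Ext}(\mathbf{w}) \le \text{Int}(\mathbf{w})$ for every $\mathbf{w} \in \Delta^{N_0}$. Function names are hypothetical.

```python
import numpy as np

def ext(x1, X0, w):
    """Ext(w) = || x1 - X0 w ||: gap between x1 and the weighted donor average."""
    return float(np.linalg.norm(x1 - X0 @ w))

def int_(x1, X0, w):
    """Int(w) = sum_j w_j || x1 - x_j ||: weighted average gap to each donor."""
    gaps = np.linalg.norm(X0 - x1[:, None], axis=0)  # one gap per donor column
    return float(w @ gaps)

# Made-up example with P = 1 predictor and N0 = 3 donors at 0.0, 3.0 and 1.5.
x1 = np.array([1.0])
X0 = np.array([[0.0, 3.0, 1.5]])
w_sc = np.array([1 / 3, 0.0, 2 / 3])   # convex combination hitting x1 exactly
w_nn = np.array([0.0, 0.0, 1.0])       # m = 1 nearest neighbour (donor at 1.5)

for name, w in [("SC", w_sc), ("NN", w_nn)]:
    print(name, round(ext(x1, X0, w), 3), round(int_(x1, X0, w), 3))
# SC 0.0 0.667
# NN 0.5 0.5
```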

Assumption 4 (Complementarity – the substantive premise of model averaging). Both biases are plausibly relevant in the application: $\gamma_t$ is non-linear enough that SC alone interpolates badly, and no donor is close enough to $\mathbf{x}_1$ for matching alone to avoid extrapolating badly.

Remark. This is the paper’s central conjecture for why model averaging helps. When either bias is absent, the data-driven CV tends to pick $\widehat{\varphi} \in \{0, 1\}$ and MASC degenerates to a boundary estimator – a feature, not a bug.

Assumption 5 (Rolling-origin stability). The relationship between treated and donor outcomes is stable across the late-pre-period folds and across the pre/post boundary, so that one-step-ahead forecast accuracy on the training-set tail is informative about post-treatment forecast accuracy.

Remark. This is the SC identification premise restricted to the fold horizon. Without it the CV criterion is uninformative about the post-period and $\widehat{\varphi}$ reflects only pre-period drift.

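Assumption 6 (Quadratic-in-$\varphi$ closed form). For each fixed $m$, the CV criterion $Q(m, \varphi)$ is a quadratic in $\varphi$ with second derivative $2\sum_f w_f \bigl(\widehat{y}^{\mathrm{match}}_{f+1}(m) - \widehat{y}^{\mathrm{SC}}_{f+1}\bigr)^2 \ge 0$. When this is strictly positive (the two arms’ forecasts differ on at least one positively weighted fold), the unconstrained optimum is unique and the constrained optimum on $[0, 1]$ is its clip.

Remark. Mechanical; the joint $(m, \varphi)$ search reduces to a one-dimensional sweep over $m$. Quadraticity holds by construction; strict convexity fails only in the degenerate case where matching and SC forecast identically on every weighted fold, in which case $\varphi$ does not affect $Q$ at all.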

When the assumptions bind: practical diagnostics

Selection on observables (Assumption 1). Like every regression / SC / matching estimator, MASC assumes that the only systematic difference between treated and donor post-period outcomes is captured by the observed pre-treatment covariates $\mathbf{x}_j$. If a confounder is missing from $\mathbf{x}_j$, MASC’s counterfactual is biased regardless of how the CV picks $(\widehat{m}, \widehat{\varphi})$.

Plausibly violated when a known driver of the outcome is omitted from covariates – a state’s industry mix in a labour-market study, an audience segment in a marketing study. Diagnostic: re-fit with one covariate omitted at a time and check whether res.att moves; large movements show that the estimate hinges on the covariate specification, which makes an unobserved confounder a live concern. There is no within-MASC fix; the cure is to include the missing covariate, or to accept that selection on observables fails for this application.
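The leave-one-covariate-out check can be wrapped around any fitting routine. In this sketch, `fit_att` is a hypothetical callable that refits MASC on a given covariate list and returns the ATT; with mlsynth one might write it as `lambda covs: MASC({**base_config, "covariates": covs}).fit().att`, where `base_config` is an assumed dict like the one in the Basque example (whether `covariate_windows` must also drop the removed key is not documented here). The toy `fake_fit_att` is made up.

```python
def loo_covariate_shifts(fit_att, covariates):
    """Refit with each covariate dropped in turn; return the baseline ATT and
    the ATT shift caused by dropping each covariate."""
    baseline = fit_att(list(covariates))
    shifts = {}
    for dropped in covariates:
        kept = [c for c in covariates if c != dropped]
        shifts[dropped] = fit_att(kept) - baseline
    return baseline, shifts

def fake_fit_att(covs):
    # Made-up stand-in: the ATT depends only on whether "sec.industry" is included.
    return -0.5 - 0.3 * ("sec.industry" in covs)

base, shifts = loo_covariate_shifts(fake_fit_att, ["invest", "popdens", "sec.industry"])
largest = max(shifts, key=lambda c: abs(shifts[c]))
print(round(base, 3), largest, round(shifts[largest], 3))
# -0.8 sec.industry 0.3
```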
Overlap (Assumption 2). With one treated unit, full overlap failure means the treated covariates lie outside the donor convex hull on at least one dimension.

Plausibly violated when the treated unit is structurally extreme on a known covariate (Hong Kong’s GDP level vs other Asian regions; California’s population vs interior states). Diagnostic: at the SC limit ($\varphi = 0$) the pre-period RMSE will be elevated and the implied SC weights will concentrate on a few donors with substantial covariate gaps. If res.fit.pre_rmse stays large and matching ($\varphi = 1$) does even worse, both components of the model average are failing for the same reason. Switch to Imperfect Synthetic Controls (ISCM) (which identifies the effect even when the treated unit is outside the hull) or Nonlinear Synthetic Control (NSC) (which drops the simplex restriction so SC weights can extrapolate by going negative on far donors).
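A cheap first screen for this problem is to flag predictors on which the treated unit lies outside the donors’ observed range. A minimal NumPy sketch, assuming `x1` has shape `(P,)` and `X0` shape `(P, N0)` as in the notation; the helper and the numbers are hypothetical. Falling outside the range on any predictor places $\mathbf{x}_1$ outside the donor convex hull, but the converse does not hold (a unit can sit inside every per-predictor range and still be outside the hull), so an empty result is not proof of overlap.

```python
import numpy as np

def predictors_outside_donor_range(x1, X0, names):
    """Names of predictors where the treated value is below or above every donor."""
    x1, X0 = np.asarray(x1, float), np.asarray(X0, float)
    below = x1 < X0.min(axis=1)
    above = x1 > X0.max(axis=1)
    return [n for n, flag in zip(names, below | above) if flag]

# Made-up example: the treated unit is more densely populated than every donor.
x1 = np.array([0.30, 250.0])
X0 = np.array([[0.10, 0.25, 0.40],    # investment share
               [40.0, 120.0, 90.0]])  # population density
print(predictors_outside_donor_range(x1, X0, ["invest", "popdens"]))
# ['popdens']
```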
Lipschitz conditional mean (Assumption 3). The Lipschitz constant controls how much extrapolation / interpolation bias the two component estimators incur. If $\gamma_t$ has a sharp kink or threshold in $x$, the bounds are loose and MASC’s CV-driven mix can be unstable.

Plausibly violated when the outcome is a step function (regulatory threshold), has a kink (minimum-wage bunching), or saturates near a ceiling. Diagnostic: compare the per-fold one-step-ahead forecast errors of the pure SC and pure matching arms (the fit records per-fold CV errors in res.fit.cv_error_by_fold and the CV grid in res.fit.cv_grid); if the two error series are highly correlated across folds, the averaging gain is small and MASC reduces to either boundary.
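To see why correlated fold errors leave little to gain, here is a sketch that takes per-fold errors of the two arms (however you obtain them; the inputs below are made up) and reports their correlation, the clipped CV-optimal $\varphi$, and the relative drop in weighted squared error versus the better boundary. With errors $e^{\mathrm{SC}}_f$ and $e^{\mathrm{match}}_f$, the blended error is $\varphi e^{\mathrm{match}}_f + (1-\varphi) e^{\mathrm{SC}}_f$, which is the same quadratic as $Q(m,\varphi)$. Falling back to $\varphi = 0$ when the errors coincide is an assumption of this sketch.

```python
import numpy as np

def averaging_gain(e_sc, e_match, wf=None):
    """Correlation of the two arms' fold errors, clipped optimal phi, and the
    relative drop in weighted squared error versus the better of phi = 0 / 1."""
    e_sc, e_match = np.asarray(e_sc, float), np.asarray(e_match, float)
    wf = np.ones_like(e_sc) if wf is None else np.asarray(wf, float)
    d = e_sc - e_match
    denom = np.sum(wf * d ** 2)
    phi = 0.0 if denom == 0 else float(np.clip(np.sum(wf * e_sc * d) / denom, 0.0, 1.0))

    def q(p):  # weighted squared error of the blended forecast
        return float(np.sum(wf * (p * e_match + (1 - p) * e_sc) ** 2))

    boundary = min(q(0.0), q(1.0))
    gain = 0.0 if boundary == 0 else 1.0 - q(phi) / boundary
    corr = float(np.corrcoef(e_sc, e_match)[0, 1])
    return corr, phi, gain

e = np.array([0.4, -0.2, 0.3, -0.5, 0.1])
print([round(v, 3) for v in averaging_gain(e, e)])    # identical errors
print([round(v, 3) for v in averaging_gain(e, -e)])   # mirror-image errors
# [1.0, 0.0, 0.0]
# [-1.0, 0.5, 1.0]
```

If the matching errors are a positive multiple $c \neq 1$ of the SC errors, the unclipped optimum is $1/(1-c)$, which lies outside $[0, 1]$, so the clip sends $\widehat{\varphi}$ to a boundary.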
Complementarity / both biases bind (Assumption 4). If only one bias matters, MASC’s CV will sit at $\widehat{\varphi} \in \{0, 1\}$ and the model average adds variance without bias improvement.

Plausibly violated when the SC pre-fit is already tight ($\gamma_t$ is approximately linear in $x$ on the donor support) or no donor is remotely close (matching alone extrapolates badly across the board). Diagnostic: read res.phi_hat. If it is essentially 0 or 1 across multiple seeds / fold configurations, the MASC machinery is over-engineered for this application and the corresponding pure estimator is the better default – canonical SCM / Two-Step Synthetic Control for the $\varphi = 0$ regime, a nearest-neighbour matching estimator outside mlsynth for the $\varphi = 1$ regime.
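Checking whether $\widehat{\varphi}$ stays at a boundary across fold configurations can be scripted as a loop over settings. Here `fit_phi` is a hypothetical callable that refits MASC under one setting and returns `phi_hat` (for instance `lambda k: MASC({**base_config, "min_preperiods": k}).fit().phi_hat`, with `base_config` an assumed config dict). The 0.05 tolerance is an arbitrary choice, and the stand-in values are made up.

```python
def boundary_report(fit_phi, settings, tol=0.05):
    """Refit under each setting; return phi_hat per setting and whether every
    phi_hat lies within tol of 0 or 1."""
    phis = {s: float(fit_phi(s)) for s in settings}
    at_boundary = {s: (p <= tol or p >= 1 - tol) for s, p in phis.items()}
    return phis, all(at_boundary.values())

# Made-up stand-in: phi_hat hugs zero for every fold configuration.
fake = {5: 0.0, 7: 0.02, 9: 0.0}
phis, always_boundary = boundary_report(fake.get, [5, 7, 9])
print(always_boundary)
# True
```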
Rolling-origin stability (Assumption 5). The CV-selected $(\widehat{m}, \widehat{\varphi})$ is only as informative as the late-pre-period is representative of the post-period.

Plausibly violated when a structural break (regime change, pandemic, financial crisis) sits inside the pre-period close to $T_0$. Diagnostic: inspect the per-fold forecast errors; if they trend sharply over the fold index, the late pre-period is not exchangeable with the early one and the CV is mostly fitting that trend. Either trim the pre-period to a regime-stable window or move to a stationary-cycle estimator (Synthetic Business Cycle (SBC)).
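Whether the fold errors trend can be summarised by the slope of an OLS line through them and its $R^2$. A minimal NumPy sketch with made-up errors; the function name is hypothetical, and what counts as “trending sharply” remains a judgment call.

```python
import numpy as np

def fold_error_trend(errors):
    """OLS slope of per-fold forecast errors on the fold index, and the line's R^2."""
    errors = np.asarray(errors, float)
    k = np.arange(errors.size, dtype=float)
    slope, intercept = np.polyfit(k, errors, 1)   # highest-degree coefficient first
    fitted = slope * k + intercept
    ss_res = float(np.sum((errors - fitted) ** 2))
    ss_tot = float(np.sum((errors - errors.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return float(slope), r2

# Made-up fold errors that drift steadily upward towards T0.
print([round(v, 3) for v in fold_error_trend([0.1, 0.3, 0.5, 0.7, 0.9])])
# [0.2, 1.0]
```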
Multiple treated units. The paper’s setup is one treated unit, and mlsynth’s MASC rejects panels with more than one treated unit (MlsynthDataError). Fitting each treated unit separately is possible, but the SC step’s non-uniqueness problem (which the penalised SC of Abadie & L’Hour 2020 was built for) then propagates into each MASC mix.

Plausibly relevant when you have several treated units in the same cohort. Diagnostic: MASC’s headline numbers will be sensitive to which treated unit you single out as “the” treated; if so, use canonical SCM paired with the penalised variant, or FECT for staggered designs.

Setup

The nearest-neighbour matching weights, the synthetic control (SC) weights and the MASC combiner are

\[\begin{split}\mathbf{w}_{\mathrm{match}}(m)_j &= \tfrac{1}{m}\,\mathbf{1}\{ j \in S_m \}, \qquad S_m \in \operatorname*{argmin}_{S \subseteq \mathcal{N}_0,\, |S|=m} \sum_{k\in S} d(1, k), \\[2pt] \mathbf{w}_{\mathrm{SC}} &\in \operatorname*{argmin}_{\mathbf{w}\in\Delta^{N_0}} \,\bigl\|\mathbf{x}_1 - \mathbf{X}_0\mathbf{w}\bigr\|_{\mathbf{V}}^2, \\[2pt] \mathbf{w}_{\mathrm{MASC}}(m,\varphi) &= \varphi\,\mathbf{w}_{\mathrm{match}}(m) + (1-\varphi)\,\mathbf{w}_{\mathrm{SC}},\end{split}\]

where $d(1, j) = \sum_{t\in\mathcal{T}_1} (y_{1t} - y_{jt})^2$ is the pre-period squared distance used by the default match_on="outcomes" (with match_on="covariates" the distance is computed on the standardised predictor block instead) and $\mathbf{V}$ is the (possibly optimised) predictor-weight matrix. Without covariates the SC step reduces to an outcome-path fit, i.e. $(\mathbf{x}_1, \mathbf{X}_0) = (\mathbf{y}_1^{\mathrm{pre}}, \mathbf{Y}_0^{\mathrm{pre}})$ with $\mathbf{V} = \mathbf{I}$.
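The three weight vectors can be sketched in NumPy for the no-covariate case, where both arms work on the pre-period outcome paths and $\mathbf{V} = \mathbf{I}$. Two assumptions to flag: nearest-neighbour ties are broken by donor order, and the simplex QP is solved by projected gradient descent with a standard sort-based Euclidean projection onto $\Delta^{N_0}$, a swapped-in solver rather than the cvxpy/CLARABEL or MSCMT routines mlsynth uses. All function names are hypothetical.

```python
import numpy as np

def nn_weights(y1_pre, Y0_pre, m):
    """1/m on the m donors with the smallest pre-period squared distance d(1, j)."""
    d = ((Y0_pre - y1_pre[:, None]) ** 2).sum(axis=0)  # one distance per donor column
    nearest = np.argsort(d, kind="stable")[:m]          # ties broken by donor order
    w = np.zeros(Y0_pre.shape[1])
    w[nearest] = 1.0 / m
    return w

def project_simplex(v):
    """Euclidean projection of v onto {w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def sc_weights(y1_pre, Y0_pre, iters=20000):
    """min_w ||y1_pre - Y0_pre w||^2 over the simplex (V = I), by projected gradient."""
    n0 = Y0_pre.shape[1]
    step = 1.0 / (2.0 * np.linalg.norm(Y0_pre, 2) ** 2)  # 1 / Lipschitz const. of gradient
    w = np.full(n0, 1.0 / n0)
    for _ in range(iters):
        grad = -2.0 * Y0_pre.T @ (y1_pre - Y0_pre @ w)
        w = project_simplex(w - step * grad)
    return w

def masc_weights(w_match, w_sc, phi):
    """phi * matching + (1 - phi) * SC."""
    return phi * w_match + (1.0 - phi) * w_sc

# Usage on simulated data: T0 = 8 pre-periods, N0 = 4 donors, treated path
# built as an exact 60/40 blend of donors 0 and 1.
rng = np.random.default_rng(0)
Y0_pre = rng.normal(size=(8, 4))
y1_pre = Y0_pre @ np.array([0.6, 0.4, 0.0, 0.0])
w_sc = sc_weights(y1_pre, Y0_pre)
w_nn = nn_weights(y1_pre, Y0_pre, m=2)
print(np.round(w_sc, 3), w_nn, np.round(masc_weights(w_nn, w_sc, 0.25), 3))
```

Because the simulated treated path lies exactly in the donor hull and random donor columns are linearly independent with probability one, `w_sc` should come out close to (0.6, 0.4, 0, 0).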

Tuning by rolling-origin CV

For each fold $f\in\mathcal{F}$ (each $f$ indexes the last pre-treatment period included in the training window), let $\widehat{y}^{\mathrm{SC}}_{f+1}$ and $\widehat{y}^{\mathrm{match}}_{f+1}(m)$ denote the one-step-ahead forecasts of the treated outcome from each estimator fit on the first $f$ periods, and let $y_{1,f+1}$ denote the actual treated outcome. With non-negative fold weights $w_f$ (set through fold_weights; distinct from the donor weights $\mathbf{w}$), the CV criterion at $(m,\varphi)$ is the weighted squared error

\[Q(m,\varphi) = \sum_{f\in\mathcal{F}} w_f\, \bigl( y_{1,f+1} - \varphi \widehat{y}^{\mathrm{match}}_{f+1}(m) - (1-\varphi)\widehat{y}^{\mathrm{SC}}_{f+1} \bigr)^2 .\]

Holding $m$ fixed, the first-order condition gives the closed form

\[\widetilde{\varphi}(m) = \frac{ \sum_f w_f \bigl(y_{1,f+1} - \widehat{y}^{\mathrm{SC}}_{f+1}\bigr) \bigl(\widehat{y}^{\mathrm{match}}_{f+1}(m) - \widehat{y}^{\mathrm{SC}}_{f+1}\bigr) }{ \sum_f w_f \bigl(\widehat{y}^{\mathrm{match}}_{f+1}(m) - \widehat{y}^{\mathrm{SC}}_{f+1}\bigr)^2 } , \quad \widehat{\varphi}(m) = \operatorname{clip}_{[0,1]}\bigl(\widetilde{\varphi}(m)\bigr),\]

reproducing eq. 15 of Kellogg et al. (2021). The selected $\widehat{m} = \operatorname*{argmin}_m Q(m,\widehat{\varphi}(m))$ and $\widehat{\varphi} = \widehat{\varphi}(\widehat{m})$ are then plugged in, and the final weights are refitted on the full pre-period.
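A NumPy sketch of the closed form and the sweep over $m$, taking the stacked per-fold forecasts as given (producing them requires refitting both arms on every training window, which is omitted). Two assumptions: when the denominator is zero the sketch falls back to $\widehat{\varphi} = 0$ (pure SC), which is our choice rather than documented mlsynth behaviour, and ties in $Q$ keep the smaller $m$. Function names are hypothetical; mlsynth’s own versions are `analytic_phi` and `cross_validate` in `mlsynth.utils.masc_helpers`.

```python
import numpy as np

def analytic_phi(y, y_match, y_sc, wf):
    """Closed-form phi at fixed m: weighted no-intercept OLS of (y - y_sc) on
    (y_match - y_sc), clipped to [0, 1]."""
    a, b = y - y_sc, y_match - y_sc
    denom = np.sum(wf * b ** 2)
    if denom == 0.0:            # forecasts coincide on every weighted fold
        return 0.0              # assumption: fall back to pure SC
    return float(np.clip(np.sum(wf * a * b) / denom, 0.0, 1.0))

def cv_q(y, y_match, y_sc, wf, phi):
    """Q(m, phi) = sum_f w_f (y - phi * y_match - (1 - phi) * y_sc)^2."""
    return float(np.sum(wf * (y - phi * y_match - (1.0 - phi) * y_sc) ** 2))

def select_m_phi(y, match_by_m, y_sc, wf):
    """Sweep m (smallest first); plug in the closed-form phi; keep the lowest Q."""
    best = None
    for m in sorted(match_by_m):
        phi = analytic_phi(y, match_by_m[m], y_sc, wf)
        q = cv_q(y, match_by_m[m], y_sc, wf, phi)
        if best is None or q < best[2]:
            best = (m, phi, q)
    return best   # (m_hat, phi_hat, Q at the optimum)

# Made-up one-step-ahead forecasts over four folds.
y_sc = np.array([1.0, 1.2, 1.1, 1.4])
match_by_m = {1: np.array([1.5, 0.9, 1.6, 1.0]), 2: np.array([1.3, 1.0, 1.5, 1.2])}
y = 0.3 * match_by_m[2] + 0.7 * y_sc          # truth is a 30/70 blend at m = 2
wf = np.ones_like(y)
m_hat, phi_hat, q_min = select_m_phi(y, match_by_m, y_sc, wf)
print(m_hat, round(phi_hat, 3))
# 2 0.3
```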

Empirical Illustration: Basque Country and Spanish Terrorism

Following Section 5 of Kellogg et al. [KMPT2021] – the canonical Abadie & Gardeazabal [ABADIE2003] study of the per-capita GDP cost of ETA terrorism – MASC runs on basque_jasa.csv: 17 Spanish regions (Basque plus 16 donor candidates), 1955-1997, with the JASA predictor specification (schooling shares, investment, sector composition, population density).

import pandas as pd
from mlsynth import MASC

url = ("https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/"
       "main/basedata/basque_jasa.csv")
df = pd.read_csv(url)

covariates = [
    "school.illit", "school.prim", "school.med", "school.high", "invest",
    "sec.agriculture", "sec.energy", "sec.industry", "sec.construction",
    "sec.services.venta", "sec.services.nonventa", "popdens",
]
# The covariate windows match Abadie & Gardeazabal (2003), Table 1:
# schooling and investment are averaged over 1964-1969, the sector
# shares (observed every other year) over 1961-1969, popdens is the
# 1969 cross-section, and a lagged outcome ``gdpcap`` is matched on
# the 1960-1969 mean (Abadie's "pre-treatment outcomes" predictor).
covariate_windows = {
    "sec.agriculture": (1961, 1969), "sec.energy": (1961, 1969),
    "sec.industry": (1961, 1969), "sec.construction": (1961, 1969),
    "sec.services.venta": (1961, 1969),
    "sec.services.nonventa": (1961, 1969),
    "popdens": (1969, 1969),
    "invest": (1964, 1969),
    "school.illit": (1964, 1969), "school.prim": (1964, 1969),
    "school.med": (1964, 1969), "school.high": (1964, 1969),
    "gdpcap": (1960, 1969),
}

res = MASC({
    "df": df, "outcome": "gdpcap", "treat": "terrorism",
    "unitid": "regionname", "time": "year",
    "m_grid": list(range(1, 11)),
    "min_preperiods": 5,
    "covariates": covariates,
    "covariate_windows": covariate_windows,
    # The KMPT Basque application's exact estimator: the MSCMT/synth SC
    # optimiser (the default) blended with covariate matching.
    "sc_backend": "mscmt",
    "match_on": "covariates",
    "display_graphs": False,
}).fit()

print(f"Selected m   : {res.m_hat}")
print(f"Selected phi : {res.phi_hat:.3f}")
print(f"Pre-RMSE     : ${res.fit.pre_rmse * 1000:.0f}/capita")
print(f"ATT          : ${res.att * 1000:+.0f}/capita/year")
print("Top donors:")
for u, w in sorted(res.donor_weights.items(), key=lambda kv: -kv[1])[:4]:
    if w > 0.05:
        print(f"  {u:<32s} {w:.3f}")

This prints:

Selected m   : 1
Selected phi : 0.000
Pre-RMSE     : $89/capita
ATT          : $-585/capita/year
Top donors:
  Cataluna                         0.831
  Madrid (Comunidad De)            0.169

Configured as in the KMPT [KMPT2021] Section-5 application – the MSCMT/synth SC optimiser blended with covariate matching (their solve.covmatch) – MASC closely reproduces their result. The paper reports that MASC reduces to SC ($\widehat{\varphi} = 0$), with pre-RMSE $\approx \$94$, ATT $\approx -\$580$/capita/year and donor weights Cataluna 0.85 / Madrid 0.15; mlsynth’s CV likewise selects pure SC ($\widehat{\varphi} = 0$), with pre-RMSE $\$89$, ATT $-\$585$, Cataluna $0.83$ / Madrid $0.17$. The durable check is benchmarks/cases/masc_basque.py.

Note

Two algorithmic choices, both exposed. Reproducing KMPT turns on matching the authors’ settings for two config toggles:

sc_backend – the predictor-weight ($\mathbf{V}$) optimiser for the SC step. "mscmt" (the default) is the MSCMT global search that matches Abadie’s synth() and the reference; "bilevel" is the Malo et al. [malo2023computing] solver shared with FSCM. They can converge to different $\mathbf{V}$ (hence $\mathbf{w}$) when the SC problem is over-parameterised (here 12 predictors over 16 donors), the non-uniqueness phenomenon documented by Becker & Kloessner.
match_on – the nearest-neighbour feature space. "outcomes" (the default) matches on the pre-treatment outcome path (the reference’s default Wbar); "covariates" matches on the standardised predictor block (their solve.covmatch), which the KMPT Basque application uses.

The Basque numbers above use the authors’ configuration (sc_backend="mscmt", match_on="covariates") and closely reproduce KMPT. The historical defaults ("bilevel" / "outcomes") give $-\$816$ / $-\$769$ and a small positive $\widehat{\varphi}$; they depart from the paper only because they are not the authors’ choices.

Verification

Note

Empirical (Basque proper). With the Abadie-Gardeazabal predictor windows (schooling and investment 1964-1969, sector shares 1961-1969, popdens 1969, gdpcap 1960-1969), treatment starting in 1975 and Spain itself removed from the donor pool, MASC selects $m=1$, $\widehat{\varphi} \approx 0.32$, pre-RMSE $\approx \$97$/capita (vs. KMPT’s $\$94$) and ATT $\approx -\$641$/capita/year (vs. KMPT’s $-\$580$). Donor mass concentrates on Cataluna (0.64) and Madrid (0.23) – the same two-donor structure KMPT report (0.85 + 0.15). These figures differ from the authors’-configuration run in the empirical illustration ($\widehat{\varphi} = 0$, pre-RMSE $\$89$, ATT $-\$585$); the residual gap is the V-optimiser non-uniqueness documented above.

Helpers. The nearest-neighbour selector, the simplex SC primitive, the analytic $\widehat{\varphi}$ formula and the per-fold covariate aggregation are unit-tested (mlsynth/tests/test_masc.py).

Core API

Matching and Synthetic Control (MASC) estimator.

A thin NumPy-first orchestration over mlsynth.utils.masc_helpers. MASC of Kellogg, Mogstad, Pouliot & Torgovitsky (2021) combines a nearest-neighbour matching weight vector with the standard SC simplex weight vector,

\[\boldsymbol{\omega}_{\mathrm{MASC}} = \varphi\,\boldsymbol{\omega}_{\mathrm{match}} + (1-\varphi)\,\boldsymbol{\omega}_{\mathrm{SC}},\]

with the number of neighbours $m$ and the model-averaging weight $\varphi$ chosen jointly by rolling-origin cross-validation. The CV-optimal $\varphi$ admits a closed-form solution at each candidate $m$ (Kellogg et al. 2021, eq. 15), so the joint search reduces to a one-dimensional sweep over $m$.

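When covariates are supplied, the SC step’s predictor weights $\mathbf{V}$ come from the backend selected by sc_backend: the default "mscmt" global search (matching Abadie’s synth()) or the bilevel solver of Malo, Eskelinen, Zhou & Kuosmanen (2024), which optimises predictor weights $\mathbf{V}$ and donor weights $\mathbf{W}$ jointly. Without covariates the SC step is the canonical outcome-paths simplex fit.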

References

Kellogg, M., Mogstad, M., Pouliot, G., & Torgovitsky, A. (2021). Combining Matching and Synthetic Control to Trade Off Biases from Extrapolation and Interpolation. Journal of the American Statistical Association, 116(536), 1804-1816.

class mlsynth.estimators.masc.MASC(config: MASCConfig | dict)

Bases: object

Matching and Synthetic Control estimator.

Parameters:

config (MASCConfig or dict) – Validated configuration. In addition to the common fields (df, outcome, treat, unitid, time, display_graphs, save, colours), MASC reads:

covariates / covariate_windows – optional predictor columns and their aggregation windows (matches the Abadie synth() predictor specification, with per-fold aggregation inside CV).
m_grid – candidate nearest-neighbour counts (defaults to 1..J).
min_preperiods and set_f – mutually exclusive CV-fold specifications (when neither is given, folds default to ceil(treatment_period / 2)..(treatment_period - 2) per the R reference).
fold_weights – optional per-fold weights in the CV criterion.
forecast_minlength and forecast_maxlength – forecast horizon per fold.
match_on – nearest-neighbour feature space: "outcomes" (default) or "covariates".
sc_backend – predictor-weight optimiser for the SC step: "mscmt" (default) or "bilevel".
solver – cvxpy solver for the SC QP (CLARABEL by default).
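For orientation, the default fold range can be written out. A minimal sketch, assuming `treatment_period` is the 1-indexed first treated period (as in `prepare_masc_inputs`) and that fold $f$ trains on periods $1..f$ and forecasts period $f+1$; the function names are hypothetical and the `min_preperiods` / `set_f` paths are not reproduced. The last fold forecasts period `treatment_period - 1`, i.e. $T_0$, so no post-treatment outcome enters the CV.

```python
import math

def default_folds(treatment_period):
    """Folds ceil(tp / 2), ..., tp - 2 (inclusive)."""
    return list(range(math.ceil(treatment_period / 2), treatment_period - 1))

def fold_split(y, f):
    """Train on periods 1..f; the one-step-ahead target is period f + 1."""
    return y[:f], y[f]   # period f + 1 sits at Python index f

folds = default_folds(21)   # e.g. first treated period = 21
print(folds[0], folds[-1], fold_split([10, 20, 30, 40], 2))
# 11 19 ([10, 20], 30)
```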

fit() → MASCResults

Run MASC end to end and return MASCResults.

Raises:

MlsynthDataError – If the input panel violates MASC’s identification requirements (single treated unit, balanced panel, at least two pre-treatment periods).
MlsynthEstimationError – If the SC or CV optimisation steps fail at runtime.
MlsynthPlottingError – If plotting raises when display_graphs=True.

Configuration

class mlsynth.config_models.MASCConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', plot: ~mlsynth.config_models.PlotConfig = <factory>, covariates: ~typing.List[str] | None = None, covariate_windows: dict | None = None, m_grid: ~typing.List[int] | None = None, min_preperiods: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=2)] = None, set_f: ~typing.List[int] | None = None, fold_weights: ~typing.List[float] | None = None, forecast_minlength: ~typing.Annotated[int, ~annotated_types.Ge(ge=1)] = 1, forecast_maxlength: ~typing.Annotated[int, ~annotated_types.Ge(ge=1)] = 1, solver: str | None = None, match_on: ~typing.Literal['outcomes', 'covariates'] = 'outcomes', sc_backend: ~typing.Literal['mscmt', 'bilevel'] = 'mscmt')

Configuration for the MASC estimator (Kellogg et al. 2021).

Kellogg, Mogstad, Pouliot & Torgovitsky (2021). Combining Matching and Synthetic Control to Trade Off Biases from Extrapolation and Interpolation. JASA 116(536), 1804-1816. The estimator forms a convex combination phi * matching + (1 - phi) * SC with the number of neighbours m and the weight phi jointly chosen by rolling-origin cross-validation.

covariate_windows: dict | None

covariates: List[str] | None

fold_weights: List[float] | None

forecast_maxlength: int

forecast_minlength: int

m_grid: List[int] | None

match_on: Literal['outcomes', 'covariates']

min_preperiods: int | None

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'} – Configuration for the model; should be a dictionary conforming to pydantic’s ConfigDict.

sc_backend: Literal['mscmt', 'bilevel']

set_f: List[int] | None

solver: str | None

Result Containers

MASC.fit() returns a MASCResults containing the selected (m_hat, phi_hat), the MASC weight vector (with the matching and SC components separately preserved), counterfactual, pre/post gap, pre-RMSE, ATT, and the full CV grid. The prepared NumPy panel is exposed as a MASCInputs.

Note

MASC.fit() returns an EffectResult on the standardized two-family contract: res.att / res.counterfactual / res.gap / res.donor_weights / res.pre_rmse resolve through the standardized sub-models. The blended MASC weight vector is res.weights_vector (the bare res.weights is reserved for the standardized WeightsResults); the CV-selected tuning is on res.m_hat / res.phi_hat and the full fit on res.fit.

Typed result containers for the MASC estimator.

Kellogg, Mogstad, Pouliot, and Torgovitsky (2021), Combining Matching and Synthetic Control to Trade off Biases from Extrapolation and Interpolation. The estimator forms a convex combination of a nearest-neighbour matching weight vector and a synthetic-control simplex weight vector, with both tuning parameters (m, phi) selected by rolling-origin cross-validation.

class mlsynth.utils.masc_helpers.structures.MASCFit(att: float, weights: ~numpy.ndarray, weights_match: ~numpy.ndarray, weights_sc: ~numpy.ndarray, phi_hat: float, m_hat: int, counterfactual: ~numpy.ndarray, gap: ~numpy.ndarray, pre_rmse: float, cv_error: float, cv_error_by_fold: ~numpy.ndarray, cv_grid: ~numpy.ndarray, donor_weights: dict = <factory>)

Bases: object

Single MASC point-estimate fit.

att: float

counterfactual: ndarray

cv_error: float

cv_error_by_fold: ndarray

cv_grid: ndarray

donor_weights: dict

gap: ndarray

m_hat: int

phi_hat: float

pre_rmse: float

weights: ndarray

weights_match: ndarray

weights_sc: ndarray

class mlsynth.utils.masc_helpers.structures.MASCInputs(Y_treated: ndarray, Y_donors: ndarray, treated_label: Any, donor_labels: Tuple[Any, ...], time_index: ndarray, intervention_time: Any, treatment_period: int, T: int, T0: int, T1: int, J: int, cov_treated_panel: ndarray | None = None, cov_donors_panel: ndarray | None = None, covariate_names: Tuple[Any, ...] = (), covariate_windows: dict | None = None)

Bases: object

Pre-pivoted inputs for a single-treated-unit MASC fit.

Covariate panels (when supplied) are stored as full (T, J + 1, P) tensors where the last axis indexes predictors and the second axis indexes units with the treated unit in slot 0; this lets each CV fold aggregate covariates over its own pre-window (matching the R reference, which re-averages within every fold).

J: int

T: int

T0: int

T1: int

Y_donors: ndarray

Y_treated: ndarray

cov_donors_panel: ndarray | None = None

cov_treated_panel: ndarray | None = None

covariate_names: Tuple[Any, ...] = ()

covariate_windows: dict | None = None

donor_labels: Tuple[Any, ...]

property has_covariates: bool

intervention_time: Any

time_index: ndarray

treated_label: Any

treatment_period: int

class mlsynth.utils.masc_helpers.structures.MASCResults(*, effects: EffectsResults | None = None, fit_diagnostics: FitDiagnosticsResults | None = None, time_series: TimeSeriesResults | None = None, weights: WeightsResults | None = None, inference: InferenceResults | None = None, method_details: MethodDetailsResults | None = None, sub_method_results: Dict[str, Any] | None = None, additional_outputs: Dict[str, Any] | None = None, raw_results: Dict[str, Any] | None = None, execution_summary: Dict[str, Any] | None = None, plot_config: PlotConfig | None = None, inputs: MASCInputs, fit: MASCFit)

Bases: BaseEstimatorResults

Top-level container returned by MASC.fit.

An EffectResult (the observational report): it lifts the single fit into the standardized sub-models so the flat accessors (att / counterfactual / gap / donor_weights / pre_rmse) resolve through the base contract. The MASC-specific blend detail stays on fit and the convenience properties below.

Parameters:

inputs (MASCInputs) – Pre-pivoted inputs.
fit (MASCFit) – The single MASC point-estimate fit (blended matching + SC weights, the CV-selected m / phi, counterfactual, gap).

fit: MASCFit

inputs: MASCInputs

property m_hat: int

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function BaseEstimatorResults.Config.<lambda>>}} – Configuration for the model; should be a dictionary conforming to pydantic’s ConfigDict.

property phi_hat: float

property weights_vector: ndarray – The blended MASC weight vector phi*match + (1-phi)*SC.

Helper Modules

Data preparation – the only DataFrame touchpoint: pivots to NumPy, builds the unit/time index, splits pre/post, assembles the optional covariate panels for per-fold aggregation.

Pivot a long panel into the matrices the MASC estimator consumes.

mlsynth.utils.masc_helpers.setup.prepare_masc_inputs(df: DataFrame, *, outcome: str, treat: str, unitid: str, time: str, covariates: Sequence[str] | None = None, covariate_windows: Dict[str, Tuple[Any, Any]] | None = None) → MASCInputs

Pivot df into MASC’s (Y_treated, Y_donors, treatment_period).

Single-treated-unit only for v1. The treated unit is the panel unit with any treat == 1 row; the donor pool is every other never-treated unit. treatment_period is the 1-indexed position of the first treated period (matches the R reference’s treatment argument).

The nearest-neighbour selector, the simplex SC primitive (with optional covariates and a selectable sc_backend) and the analytic-$\varphi$ closed form.

Nearest-neighbour + SC weight constructors and MASC combiner.

Direct port of NearestNeighbors and sc_estimator from Maxwell Kellogg’s reference R code (masc/R/estimator.R).

mlsynth.utils.masc_helpers.estimation.analytic_phi(Y_treated: ndarray, Y_match: ndarray, Y_sc: ndarray, obj_weights: ndarray) → float

Closed-form phi minimising the weighted CV objective.

Implements Kellogg et al. (2021) equation (15) – equivalent to a 1-D weighted least-squares regression through the origin (no intercept) of (Y_treated - Y_sc) on (Y_match - Y_sc), clamped to [0, 1]. Direct port of lines 299-302 of crossvalidation.R.

Parameters:

Y_treated, Y_match, Y_sc (np.ndarray) – Stacked forecast vectors from every CV fold.
obj_weights (np.ndarray) – Per-observation weights derived from the fold weights (line 293 of crossvalidation.R).

Returns:

float – The CV-optimal phi in [0, 1].

mlsynth.utils.masc_helpers.estimation.masc_combine(weights_match: ndarray, weights_sc: ndarray, phi: float) → ndarray – phi * match + (1 - phi) * sc. Trivial helper for clarity.

mlsynth.utils.masc_helpers.estimation.nearest_neighbor_weights(Y_treated_pre: ndarray, Y_donors_pre: ndarray, m: int) → ndarray

Equal-weight nearest-neighbour weights on outcome-path distance.

Picks the m donors whose pre-period outcome paths have the smallest squared-distance from the treated unit and assigns 1/m to each (zero elsewhere). Mirrors NearestNeighbors in the R reference’s outcome-path branch (lines 15-39 of estimator.R).

Parameters:

Y_treated_pre (np.ndarray) – Shape (T0,), treated unit’s pre-period outcomes.
Y_donors_pre (np.ndarray) – Shape (T0, J), donor pre-period outcomes.
m (int) – Number of nearest neighbours to retain.

Returns:

np.ndarray – Shape (J,), weights that sum to 1, with 1/m on the m closest donors and 0 elsewhere.
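A NumPy sketch of the equal-weight nearest-neighbour rule with the shapes listed above. nearest_neighbor_weights_sketch is hypothetical, and breaking ties by donor column order (via a stable sort) is an assumption; the source does not say how the R reference resolves ties.

```python
import numpy as np


def nearest_neighbor_weights_sketch(y_treated_pre, Y_donors_pre, m):
    """Equal weights 1/m on the m donors closest in squared pre-period distance."""
    y = np.asarray(y_treated_pre, float)   # shape (T0,)
    Y = np.asarray(Y_donors_pre, float)    # shape (T0, J)
    J = Y.shape[1]
    if not 1 <= m <= J:
        raise ValueError("m must be between 1 and the number of donors")
    # Sum of squared differences between the treated path and each donor path.
    dist = np.sum((Y - y[:, None]) ** 2, axis=0)  # shape (J,)
    # Stable sort: ties keep donor column order (assumption of this sketch).
    closest = np.argsort(dist, kind="stable")[:m]
    w = np.zeros(J)
    w[closest] = 1.0 / m
    return w


y_pre = np.array([1.0, 2.0, 3.0])
# Donor columns: [1,2,4] (dist 1), [5,5,5] (dist 29), [1,2,3] (dist 0), [0,2,3] (dist 1).
Y_pre = np.array([[1.0, 5.0, 1.0, 0.0],
                  [2.0, 5.0, 2.0, 2.0],
                  [4.0, 5.0, 3.0, 3.0]])
w_nn = nearest_neighbor_weights_sketch(y_pre, Y_pre, m=2)
# Donor 2 is closest; donors 0 and 3 tie, and the stable sort keeps donor 0.
```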

mlsynth.utils.masc_helpers.estimation.sc_simplex_weights(Y_treated_pre: ndarray, Y_donors_pre: ndarray, *, X_treated: ndarray | None = None, X_donors: ndarray | None = None, solver: str | None = None, sc_backend: str = 'mscmt') → ndarray

Standard SC simplex QP on pre-period outcomes or covariates.

Solves min_w ||target1 - target0 w||^2 subject to sum(w) = 1 and 0 <= w_j <= 1, where target1 is the treated unit's target vector and target0 the corresponding donor matrix. With X_treated and X_donors supplied, the QP runs on the row-standardised predictor block (Abadie's default V = diag(1/var), realised as preconditioning) rather than on the raw outcome path. Without them, it reduces to the outcome-path SC used by the no-covariates branch of the R reference's sc_estimator.

This is not full bilevel V-optimisation; it matches Abadie’s initial / heuristic V used by Cov.Vars (Estimator_Code.R line 16) and produces SC weights identical to those Abadie’s synth() returns when custom.v = "default".
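A sketch of the simplex QP and the row-standardisation idea, using SciPy's SLSQP as a swapped-in general-purpose solver. It does not reproduce mlsynth's solver or sc_backend='mscmt' options. sc_simplex_weights_sketch is hypothetical, and scaling each row by its standard deviation across the treated and donor columns is an assumption about how the diag(1/var) preconditioning is applied. Since $(x/s)^2 = x^2/s^2$, dividing each row by its standard deviation is the same as weighting that row's squared error by 1/var.

```python
import numpy as np
from scipy.optimize import minimize


def sc_simplex_weights_sketch(target1, target0, standardise_rows=False):
    """min_w ||target1 - target0 @ w||^2  s.t.  sum(w) = 1, 0 <= w_j <= 1."""
    t1 = np.asarray(target1, float)  # shape (K,)
    T0 = np.asarray(target0, float)  # shape (K, J)
    if standardise_rows:
        # Divide each row by its sd over treated + donor columns (assumed V = diag(1/var)).
        sd = np.hstack([t1[:, None], T0]).std(axis=1, ddof=1)
        sd[sd == 0] = 1.0
        t1, T0 = t1 / sd, T0 / sd[:, None]
    J = T0.shape[1]

    def loss(w):
        r = t1 - T0 @ w
        return r @ r

    def grad(w):
        return -2.0 * T0.T @ (t1 - T0 @ w)

    res = minimize(
        loss, np.full(J, 1.0 / J), jac=grad, method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return res.x


# Toy data: the treated path is exactly 0.25 * donor 0 + 0.75 * donor 1.
Y_pre = np.array([[1.0, 3.0, 10.0],
                  [2.0, 1.0, 10.0],
                  [0.0, 4.0, -5.0],
                  [3.0, 2.0, 7.0]])
y_pre = Y_pre @ np.array([0.25, 0.75, 0.0])
w_plain = sc_simplex_weights_sketch(y_pre, Y_pre)
w_scaled = sc_simplex_weights_sketch(y_pre, Y_pre, standardise_rows=True)
```

Both calls should recover approximately (0.25, 0.75, 0): positive row scaling preserves an exact fit, so standardisation only changes the weights when no donor combination reproduces the target exactly.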

The rolling-origin cross-validation engine and the per-fold covariate aggregator.

Rolling-origin cross-validation to tune (m, phi) for MASC.

Direct port of cv_masc from the R reference (lines 214-332 of masc/R/crossvalidation.R).

The CV grid loops over candidate m (one CV pass per m); for each m, the analytic phi from Kellogg et al. (2021) eq. (15) gives the CV-optimal weighting between match and SC at that m in closed form. The chosen (m_hat, phi_hat) minimises the resulting CV criterion across the grid.

mlsynth.utils.masc_helpers.crossval.cross_validate(Y_treated: ndarray, Y_donors: ndarray, treatment_period: int, *, m_grid: Sequence[int] | None = None, min_preperiods: int | None = None, set_f: Sequence[int] | None = None, fold_weights: ndarray | None = None, forecast_minlength: int = 1, forecast_maxlength: int = 1, solver: str | None = None, sc_backend: str = 'mscmt', match_on: str = 'outcomes', cov_treated_panel: ndarray | None = None, cov_donors_panel: ndarray | None = None, covariate_names: Sequence[Any] = (), time_index: ndarray | None = None, covariate_windows: Dict[Any, Tuple[Any, Any]] | None = None) → Tuple[int, float, float, ndarray, ndarray]

Pick (m_hat, phi_hat) by rolling-origin CV across the grid.

Returns:

m_hat (int)
phi_hat (float)
cv_error_min (float)
cv_grid (np.ndarray) – Shape (len(m_grid), 3) with columns [m, phi, cv_error].
by_fold_at_min (np.ndarray) – CV error by fold at (m_hat, phi_hat).
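A compact sketch of the CV loop described above, under simplifying assumptions that are not taken from mlsynth: one-step-ahead forecasts only (forecast length 1), uniform fold weights, fold origins running from a hypothetical min_pre up to the last pre-period, and mean squared forecast error as the CV criterion. SLSQP stands in for the SC solver. rolling_origin_cv_sketch and its helpers are hypothetical, and the data are random toy paths, not results.

```python
import numpy as np
from scipy.optimize import minimize


def _nn_weights(y, Y, m):
    dist = np.sum((Y - y[:, None]) ** 2, axis=0)
    w = np.zeros(Y.shape[1])
    w[np.argsort(dist, kind="stable")[:m]] = 1.0 / m
    return w


def _sc_weights(y, Y):
    J = Y.shape[1]
    res = minimize(
        lambda w: np.sum((y - Y @ w) ** 2), np.full(J, 1.0 / J),
        jac=lambda w: -2.0 * Y.T @ (y - Y @ w), method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return res.x


def _phi(a, b):
    # Uniform fold weights: LS through the origin of a on b, clamped to [0, 1].
    denom = b @ b
    return 0.0 if denom == 0.0 else float(np.clip((a @ b) / denom, 0.0, 1.0))


def rolling_origin_cv_sketch(y, Y, treatment_period, m_grid, min_pre):
    """Hypothetical sketch: one-step-ahead rolling-origin CV over m with analytic phi."""
    T0 = treatment_period - 1     # number of pre-treatment periods (1-indexed input)
    origins = range(min_pre, T0)  # fold at origin t fits on [0, t) and forecasts period t
    # SC weights do not depend on m, so fit them once per fold.
    sc_fc = np.array([Y[t] @ _sc_weights(y[:t], Y[:t]) for t in origins])
    y_out = np.array([y[t] for t in origins])
    grid = []
    for m in m_grid:
        nn_fc = np.array([Y[t] @ _nn_weights(y[:t], Y[:t], m) for t in origins])
        phi = _phi(y_out - sc_fc, nn_fc - sc_fc)
        combined = phi * nn_fc + (1.0 - phi) * sc_fc
        grid.append((m, phi, float(np.mean((y_out - combined) ** 2))))
    m_hat, phi_hat, cv_min = min(grid, key=lambda row: row[2])
    return m_hat, phi_hat, cv_min, np.array(grid)


rng = np.random.default_rng(0)
Y = rng.normal(size=(12, 6)).cumsum(axis=0)         # toy donor paths
y = Y[:, :3].mean(axis=1) + rng.normal(0, 0.1, 12)  # toy treated path
m_hat, phi_hat, cv_min, cv_grid = rolling_origin_cv_sketch(
    y, Y, treatment_period=11, m_grid=[1, 2, 3], min_pre=5
)
```

The returned grid has columns [m, phi, cv_error], matching the cv_grid layout documented above; by-fold errors are omitted for brevity.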

The end-to-end pipeline composing CV with the full-sample refit and the MASC weight combiner.

End-to-end MASC pipeline.

mlsynth.utils.masc_helpers.orchestration.run_masc(inputs: MASCInputs, *, m_grid: Sequence[int] | None = None, min_preperiods: int | None = None, set_f: Sequence[int] | None = None, fold_weights: ndarray | None = None, forecast_minlength: int = 1, forecast_maxlength: int = 1, solver: str | None = None, sc_backend: str = 'mscmt', match_on: str = 'outcomes') → MASCFit

Run MASC end-to-end on inputs.

Cross-validate (m, phi) via rolling-origin CV.
Refit SC and matching at m_hat on the full pre-period.
Combine with the analytic phi_hat.
Form the ATT as mean(Y_treated_post - YJ_post @ weights).
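The last step, sketched in NumPy. Reading YJ_post as the donors' post-treatment outcome matrix (rows = periods, columns = donors) is an assumption, and masc_att_sketch is hypothetical. The per-period gap is returned alongside its mean.

```python
import numpy as np


def masc_att_sketch(y_treated, Y_donors, treatment_period, weights):
    """ATT = mean(y_treated_post - Y_donors_post @ weights); treatment_period is 1-indexed."""
    start = treatment_period - 1  # 0-indexed first treated period
    y_post = np.asarray(y_treated, float)[start:]
    Y_post = np.asarray(Y_donors, float)[start:]
    gap = y_post - Y_post @ np.asarray(weights, float)  # per-period effect estimates
    return float(gap.mean()), gap


# Toy numbers: periods 4 and 5 are treated; the counterfactual is 4.5 then 5.0.
y = np.array([1.0, 2.0, 3.0, 2.0, 2.5])
Y = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 5.0], [4.0, 6.0]])
att, gap = masc_att_sketch(y, Y, treatment_period=4, weights=[0.5, 0.5])
```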

Plotting: outcome paths and the CV curve over the candidate m grid.

Plotting for MASC: treated-vs-counterfactual and the CV grid.

The observed-vs-counterfactual panel is delegated to the shared Plotter; the CV-error-vs-m curve is MASC’s own bespoke panel and stays local.

mlsynth.utils.masc_helpers.plotter.plot_masc(results: MASCResults, *, outcome: str, time: str, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | str = False) → None

Outcome paths (shared archetype) plus the CV-grid panel.

The left panel overlays the treated trajectory and the MASC counterfactual (φ * matching + (1 − φ) * SC). The right panel plots the cross-validation error against the candidate m grid, annotating the CV-selected (m̂, φ̂).
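A matplotlib sketch of the same two-panel layout, useful for reproducing the figure from raw arrays. It does not use mlsynth's shared Plotter. plot_masc_sketch and its arguments are hypothetical, the CV grid is assumed to have columns [m, phi, cv_error] as returned by cross_validate, and the inputs below are toy numbers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_masc_sketch(time, y_treated, y_counterfactual, treatment_time, cv_grid, m_hat, phi_hat):
    """Left: treated vs MASC counterfactual. Right: CV error against candidate m."""
    fig, (ax_path, ax_cv) = plt.subplots(1, 2, figsize=(11, 4))
    ax_path.plot(time, y_treated, color="black", label="Treated")
    ax_path.plot(time, y_counterfactual, color="red", linestyle="--", label="MASC counterfactual")
    ax_path.axvline(treatment_time, color="grey", linewidth=0.8)  # first treated period
    ax_path.legend()
    ax_cv.plot(cv_grid[:, 0], cv_grid[:, 2], marker="o")
    ax_cv.axvline(m_hat, color="grey", linestyle=":")  # mark the CV-selected m
    ax_cv.set_title(f"CV-selected m = {m_hat}, phi = {phi_hat:.2f}")
    ax_cv.set_xlabel("m")
    ax_cv.set_ylabel("CV error")
    return fig


t = np.arange(2000, 2010)
toy_grid = np.array([[1, 0.3, 0.5], [2, 0.4, 0.2], [3, 0.5, 0.4]])
fig = plot_masc_sketch(t, np.linspace(0, 1, 10), np.linspace(0, 1.2, 10), 2006, toy_grid, 2, 0.4)
```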
