Matching and Synthetic Control (MASC)#
Assumptions (Kellogg-Mogstad-Pouliot-Torgovitsky 2021)#
MASC inherits the formal identification stack of any causal SC estimator (paper Section 2.1) and adds the structural conditions needed for model averaging to make sense. Listed in the paper’s order:
A1 (Selection on observables – paper Assumption 1). For \(x\) in the supports of both \(X_i \mid D_i = 0\) and \(X_i \mid D_i = 1\),
\[ \mathbb{E}[Y_{it}(0) \mid D_i = 1, X_i = x] = \mathbb{E}[Y_{it}(0) \mid D_i = 0, X_i = x]. \]
This is the standard mean-independence statement (ignorable treatment assignment / unconfoundedness / selection on observables) applied to the SC framework. Together with A2 it makes the post-treatment conditional mean of untreated outcomes for the treated unit identifiable from donor outcomes.
A2 (Overlap – paper Assumption 2). The support of \(X_i \mid D_i = 1\) is contained in the support of \(X_i \mid D_i = 0\). In a comparative case study with a single treated unit (the paper’s focus), this reduces to “for almost every covariate value the treated unit takes, there exists some donor with similar covariates”. With one treated unit, overlap fails fully only if the treated unit is an outlier on every donor covariate.
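A3 (Lipschitz conditional mean). The conditional mean \(\gamma_t(x) = \mathbb{E}[Y_{it}(0) \mid D_i = 0, X_i = x]\) is Lipschitz in \(x\) with constant \(c\). Used (paper Section 2.2) to bound both bias components. For donor weights \(w\) on the simplex,
\[ \Bigl|\gamma_t(x_1) - \sum_j w_j \gamma_t(x_j)\Bigr| \le \Bigl|\gamma_t(x_1) - \gamma_t\Bigl(\sum_j w_j x_j\Bigr)\Bigr| + \Bigl|\gamma_t\Bigl(\sum_j w_j x_j\Bigr) - \sum_j w_j \gamma_t(x_j)\Bigr| \le c\,\text{Ext}(w) + c\,\text{Int}(w), \]
where the first term is the extrapolation bias, the second is the interpolation bias, \(\text{Ext}(w) = \lVert x_1 - \sum_j w_j x_j \rVert\) and \(\text{Int}(w) = \sum_j w_j \lVert x_j - \sum_k w_k x_k \rVert\).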
These two bounds are the heart of the MASC argument: the SC estimator minimises \(\text{Ext}(w)\) (reaching zero extrapolation when \(x_1\) lies in the donor convex hull), while matching minimises \(\text{Int}(w)\) (keeping interpolation small, and exactly zero at \(m = 1\), by using only the nearest neighbours). When \(\gamma_t\) is approximately linear in \(x\), the interpolation bias is negligible however large its bound, and SC dominates; when no donor is close to \(x_1\), the extrapolation bound for matching is large and matching does worse.
A4 (Complementarity – the substantive premise of model averaging). Both biases are plausibly relevant in the application: \(\gamma_t\) is non-linear enough that SC alone interpolates badly, and no single donor is close enough to the treated unit, so matching alone extrapolates badly. Remark. This is the paper’s central conjecture for why model averaging helps. When either bias is absent the data-driven CV will pick \(\hat\varphi \in \{0, 1\}\) and MASC degenerates to a boundary estimator – a feature, not a bug.
A5 (Rolling-origin stability). The relationship between treated and donor outcomes is stable across the late-pre-period folds and across the pre/post boundary, so that one-step-ahead forecast accuracy on the training-set tail is informative about post-treatment forecast accuracy. Remark. This is the SC identification premise restricted to the fold horizon. Without it the CV criterion is uninformative about the post-period and \(\hat\varphi\) reflects only pre-period drift.
A6 (Quadratic-in-\(\varphi\) closed form). The CV criterion \(Q(m, \varphi)\) is quadratic in \(\varphi\) with non-negative curvature. Whenever the matching and SC forecasts differ in at least one positively weighted fold, the curvature is strictly positive, so the unconstrained optimum is unique and the constrained optimum on \([0, 1]\) is its clip. Remark. Mechanical; the joint \((m, \varphi)\) search reduces to a one-dimensional sweep over \(m\). Held by construction.
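To make the trade-off behind A3 concrete, the sketch below evaluates an extrapolation term and an interpolation term on a toy donor pool. It assumes the extrapolation term is the distance from the treated covariates to the weighted donor average and the interpolation term is the weighted dispersion of donors around that average. The helper names `ext_term` and `int_term`, and all numbers, are illustrative and not part of mlsynth.

```python
import numpy as np

def ext_term(x_treated, X_donors, w):
    """Distance from the treated covariates to the weighted donor average."""
    return float(np.linalg.norm(x_treated - X_donors @ w))

def int_term(X_donors, w):
    """Weighted distance of each donor from the weighted donor average."""
    center = X_donors @ w
    dists = np.linalg.norm(X_donors - center[:, None], axis=0)
    return float(w @ dists)

# Toy pool: P = 2 covariates, N = 3 donors stored as columns.
X_donors = np.array([[0.0, 2.0, 0.0],
                     [0.0, 0.0, 2.0]])
x_treated = np.array([1.2, 0.4])

w_sc = np.array([0.2, 0.6, 0.2])     # convex combination that hits x_treated
w_match = np.array([0.0, 1.0, 0.0])  # all weight on the single nearest donor

print("SC      :", ext_term(x_treated, X_donors, w_sc), int_term(X_donors, w_sc))
print("matching:", ext_term(x_treated, X_donors, w_match), int_term(X_donors, w_match))
```

In this toy case the SC-style weights have no extrapolation term but a positive interpolation term, and the one-neighbour matching weights show the reverse pattern, which is the tension the model average is meant to balance.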
When the assumptions bind: practical diagnostics#
Selection on observables (A1). Like every regression / SC / matching estimator, MASC assumes that the only systematic difference between treated and donor post-period outcomes is captured by the observed pretreatment covariates \(X_i\). If a confounder is missing from \(X_i\), MASC’s counterfactual is biased regardless of how the CV picks \((\hat m, \hat \varphi)\).
Plausibly violated when a known driver of the outcome is omitted from covariates – a state’s industry mix in a labour-market study, an audience segment in a marketing study. Diagnostic: re-fit with one omitted covariate at a time and check whether res.att moves; large movements flag a missing confounder. There is no within-MASC fix; the cure is to include the missing covariate, or accept that selection on observables is failing for this application.
Overlap (A2). With one treated unit, full overlap failure means the treated covariates lie outside the donor convex hull on at least one dimension.
Plausibly violated when the treated unit is structurally extreme on a known covariate (Hong Kong’s GDP level vs other Asian regions; California’s population vs interior states). Diagnostic: at the SC limit (\(\varphi = 0\)) the pre-period RMSE will be elevated and the implied SC weights will concentrate on a few donors with substantial covariate gaps. If res.fit.pre_rmse stays large and matching (\(\varphi = 1\)) does even worse, both components of the model average are failing for the same reason. Switch to Imperfect Synthetic Controls (ISCM) (which identifies the effect even when the treated unit is outside the hull) or Nonlinear Synthetic Control (NSC) (which drops the simplex restriction so SC weights can extrapolate by going negative on far donors).
Lipschitz conditional mean (A3). The Lipschitz constant controls how much extrapolation / interpolation bias the two component estimators incur. If \(\gamma_t\) has a sharp kink or threshold in \(x\), the bounds are loose and MASC’s CV-driven mix can be unstable.
Plausibly violated when the outcome is a step function (regulatory threshold), has a kink (minimum-wage bunching), or saturates near a ceiling. Diagnostic: plot the per-fold one-step-ahead forecast errors of the pure SC and pure matching arms (res.cv_diagnostics exposes both); if the two error series are highly correlated across folds, the averaging gain is small and MASC reduces to either boundary.
Complementarity / both biases bind (A4). If only one bias matters, MASC’s CV will sit at \(\hat\varphi \in \{0, 1\}\) and the model average adds variance without bias improvement.
Plausibly violated when the SC pre-fit is already tight (\(\gamma_t\) is approximately linear in \(x\) on the donor support) or no donor is remotely close (matching alone extrapolates badly across the board). Diagnostic: read res.phi_hat. If it is essentially 0 or 1 across multiple seeds / fold configurations, the MASC machinery is over-engineered for this application and the corresponding pure estimator is the better default – canonical SCM / Two-Step Synthetic Control for the \(\varphi = 0\) regime, a nearest-neighbour matching estimator outside mlsynth for the \(\varphi = 1\) regime.
Rolling-origin stability (A5). The CV-selected \((\hat m, \hat \varphi)\) is only as informative as the late-pre-period is representative of the post-period.
Plausibly violated when a structural break (regime change, pandemic, financial crisis) sits inside the pre-period close to \(t^\star\). Diagnostic: inspect the per-fold forecast errors; if they trend sharply over the fold index, the late-pre-period is not exchangeable with the early one and the CV is mostly fitting that trend. Either trim the pre-period to a regime-stable window or move to a stationary-cycle estimator (Synthetic Business Cycle (SBC)).
Multiple treated units. The paper’s setup is one treated unit. With multiple treated, the SC step’s non-uniqueness problem (which the penalised-SC of Abadie & L’Hour 2020 was built for) propagates into MASC’s mix. Plausibly violated when you have several treated units on the same cohort. Diagnostic: MASC’s headline numbers will be sensitive to which treated unit you single out as “the” treated; if so, use canonical SCM paired with the penalised variant, or FECT for staggered designs.
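The A3 and A5 diagnostics both work on per-fold forecast errors. A minimal sketch, assuming you have already extracted two equal-length arrays of one-step-ahead errors for the pure SC and pure matching arms; the helper name `fold_error_diagnostics` and the toy inputs are hypothetical, and how the arrays are pulled out of a fitted result is left open.

```python
import numpy as np

def fold_error_diagnostics(err_sc, err_match):
    """Summarise per-fold forecast errors of the two pure arms.

    Returns the cross-arm correlation (A3 check) and the OLS slope of
    each arm's absolute error on the fold index (A5 check).
    """
    err_sc = np.asarray(err_sc, dtype=float)
    err_match = np.asarray(err_match, dtype=float)
    if err_sc.ndim != 1 or err_sc.shape != err_match.shape:
        raise ValueError("expected two 1-D arrays of equal length")
    folds = np.arange(err_sc.size)
    corr = float(np.corrcoef(err_sc, err_match)[0, 1])
    slope_sc = float(np.polyfit(folds, np.abs(err_sc), 1)[0])
    slope_match = float(np.polyfit(folds, np.abs(err_match), 1)[0])
    return {"corr": corr, "slope_sc": slope_sc, "slope_match": slope_match}

# Toy errors: the SC arm drifts upward across folds, the matching arm does not.
diag = fold_error_diagnostics([0.1, 0.2, 0.3, 0.4], [0.3, -0.2, 0.25, -0.3])
print(diag)
```

A correlation near 1 suggests the two arms fail together, so averaging buys little; a clearly non-zero slope suggests the errors trend over the fold index.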
When to use MASC – and when not to#
Reach for MASC when:
You have a single treated unit and a moderate-to-large donor pool with non-trivial heterogeneity across donors. Comparative case studies in policy evaluation (Basque terrorism, Prop 99, German reunification) are the canonical setting.
The conditional mean of the outcome is plausibly non-linear in the pre-treatment covariates – so pure SC’s interpolation bias is a real risk – and you also suspect no donor is close enough to extrapolate from – so pure matching’s extrapolation bias is also a real risk. When both worries are alive, MASC’s \(\hat\varphi\) trades them off in a data-driven way.
You can’t decide a priori whether SC or matching is more appropriate. The CV gives you a defensible answer rather than the practitioner’s eyeball “I’ll use SC because that’s what the paper said”.
The pre-period is long enough for rolling-origin cross-validation to discriminate the two biases (KMPT Section 5 uses 20 pre-treatment years on Basque; mlsynth’s min_preperiods defaults to 5 as a hard floor, but pre-periods around 10+ are where CV becomes informative).
Do not use MASC when:
Either bias dominates. If res.phi_hat is essentially 0 or 1 across seeds, MASC adds variance without bias improvement. At \(\hat\varphi \approx 0\) reach for canonical SCM / Two-Step Synthetic Control (or Forward-Selected Synthetic Control (FSCM) for selective donor pruning); at \(\hat\varphi \approx 1\) reach for a dedicated nearest-neighbour matching estimator.
The treated unit is structurally outside the donor convex hull. Both component estimators fail (A2). Use Imperfect Synthetic Controls (ISCM) (identifies the effect via donors that use the treated unit as a positive-weight donor) or Nonlinear Synthetic Control (NSC) (drops the simplex restriction to extrapolate by negative weights).
You need posterior credible bands on the weights / ATT. MASC returns point estimates plus a CV criterion. For full Bayesian inference, use Bayesian Synthetic Control with a Soft Simplex Constraint (BVS-SS) (spike-and-slab variable selection with a soft simplex).
The pre-period is very short (\(T_1 < 10\)-ish). Rolling-origin CV has too few folds to discriminate \((m, \varphi)\); the selected mix is noise. Use canonical SCM / Two-Step Synthetic Control / Forward Difference-in-Differences (FDID) (which work without CV) instead.
Multiple treated units. MASC’s identification story uses a single treated unit. For staggered or many-treated designs, use FECT or Synthetic Difference-in-Differences (SDID). (The paper’s Section 2.7 notes one could average a matching and penalised-SC pair across treated units, mirroring MASC, but this is not in the mlsynth implementation and demands additional econometric theory.)
Structural break inside the pre-period. A5 fails; the CV is fitting the break instead of the post-period mix. Trim to a stable window or use Synthetic Business Cycle (SBC).
You need a single sparse interpretable weight vector as the policy-story deliverable. MASC’s output is a mixture of SC weights and matching weights; both can be sparse on their own, but the mixed vector is generically less sparse than either component. If the headline must be “California ≈ Utah + Montana + Nevada”, run canonical SCM alongside.
Distributional questions (Lorenz curves, QTEs, tail effects). MASC targets the mean ATT. Use Distributional Synthetic Control (DSC).
Continuous or multi-valued treatment. MASC encodes a single binary intervention. Continuous dose belongs in Continuous-Treatment Synthetic Control (CTSC).
Spillovers across donors. Both component estimators inherit SUTVA at the donor level. Use Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD).
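The "either bias dominates" check can be automated once you have refit MASC under several seeds or fold configurations and collected the resulting values of phi_hat. A small sketch under that assumption; `phi_regime` is a hypothetical helper and the tolerance of 0.02 is an arbitrary illustrative choice.

```python
def phi_regime(phi_hats, tol=0.02):
    """Classify a collection of phi-hat values from repeated MASC fits.

    Returns "sc" if every value is within tol of 0, "matching" if every
    value is within tol of 1, and "mixed" otherwise.
    """
    phis = list(phi_hats)
    if not phis:
        raise ValueError("need at least one phi-hat value")
    if any(p < 0.0 or p > 1.0 for p in phis):
        raise ValueError("phi-hat values must lie in [0, 1]")
    if all(p <= tol for p in phis):
        return "sc"
    if all(p >= 1.0 - tol for p in phis):
        return "matching"
    return "mixed"

print(phi_regime([0.0, 0.01, 0.0]))    # sc
print(phi_regime([0.32, 0.28, 0.41]))  # mixed
```

A result of "sc" or "matching" points to the corresponding pure estimator; "mixed" is the regime in which the model average is doing real work.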
Notation#
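We use the synthetic-control canon. Unit \(j=0\) is treated and \(\mathcal{N} = \{1, \ldots, N\}\) indexes the donor pool; \(\mathbf{y}_0\) is the treated outcome path and \(\mathbf{Y}\) is the \((T,N)\) donor outcome matrix. The pre-treatment window is \(\mathcal{T}_1 = \{1, \ldots, T_1\}\) and the post-treatment window is \(\mathcal{T}_2 = \{T_1+1, \ldots, T\}\), with treatment beginning at \(t = T_1 + 1\). Predictors are stacked into \((\mathbf{x}_0, \mathbf{X})\) with \(\mathbf{x}_0\in\mathbb{R}^P\) for the treated unit and \(\mathbf{X}\in\mathbb{R}^{P\times N}\) for the donors. The simplex is
\[ \Delta^{N} = \Bigl\{\mathbf{w} \in \mathbb{R}^{N} : w_i \ge 0 \text{ for all } i \in \mathcal{N},\ \sum_{i \in \mathcal{N}} w_i = 1 \Bigr\}. \]
Setup#
The matching and SCE weights and the MASC combiner are
\[ \omega_{\mathrm{match},i}(m) = \frac{1}{m}\,\mathbb{1}\bigl\{ i \text{ is among the } m \text{ donors with the smallest } d(j_0, i) \bigr\}, \quad i \in \mathcal{N}, \]
\[ \boldsymbol{\omega}_{\mathrm{SC}} = \operatorname*{argmin}_{\mathbf{w} \in \Delta^{N}} (\mathbf{x}_0 - \mathbf{X}\mathbf{w})^\top \mathbf{V} (\mathbf{x}_0 - \mathbf{X}\mathbf{w}), \]
\[ \boldsymbol{\omega}_{\mathrm{MASC}}(m, \varphi) = \varphi\,\boldsymbol{\omega}_{\mathrm{match}}(m) + (1 - \varphi)\,\boldsymbol{\omega}_{\mathrm{SC}}, \quad \varphi \in [0, 1], \]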
where \(d(j_0, i) = \sum_{t\in\mathcal{T}_1} (y_{0t} - y_{it})^2\) is the
pre-period squared-distance and \(\mathbf{V}\) is the (possibly
optimised) predictor-weight matrix. Without covariates the SCE reduces
to outcome-paths matching, i.e. \((\mathbf{x}_0, \mathbf{X}) =
(\mathbf{y}_0^{\mathrm{pre}}, \mathbf{Y}^{\mathrm{pre}})\) with
\(\mathbf{V} = \mathbf{I}\).
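The matching weights and the combiner are simple enough to sketch in a few lines of NumPy. The functions below are illustrative re-implementations written for this page, not mlsynth's own code; the names `nn_weights` and `combine`, the toy panel, and the placeholder SC weights are all assumptions.

```python
import numpy as np

def nn_weights(y_treated_pre, Y_donors_pre, m):
    """Equal weights 1/m on the m donors closest in pre-period outcomes."""
    d = ((Y_donors_pre - y_treated_pre[:, None]) ** 2).sum(axis=0)
    nearest = np.argsort(d, kind="stable")[:m]
    w = np.zeros(Y_donors_pre.shape[1])
    w[nearest] = 1.0 / m
    return w

def combine(w_match, w_sc, phi):
    """MASC combiner: phi * matching + (1 - phi) * SC."""
    return phi * w_match + (1.0 - phi) * w_sc

# Toy pre-period: 3 periods, 3 donors stored as columns.
y0 = np.array([1.0, 2.0, 3.0])
Y = np.array([[1.1, 0.0, 2.0],
              [2.1, 0.0, 3.0],
              [2.9, 0.0, 4.0]])
w_match = nn_weights(y0, Y, m=1)
w_sc = np.array([0.6, 0.0, 0.4])   # placeholder SC weights, not fitted
w_masc = combine(w_match, w_sc, phi=0.25)
print(w_match, w_masc)
```

Because both inputs lie on the simplex and \(\varphi \in [0, 1]\), the combined vector is again non-negative and sums to one.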
Tuning by rolling-origin CV#
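For each fold \(f\in\mathcal{F}\) (each \(f\) indexes the last pre-treatment period included in the training window), let \(\hat y^{\mathrm{SC}}_{f+1}\) and \(\hat y^{\mathrm{match}}_{f+1}(m)\) denote the one-step-ahead forecasts of the treated outcome from each estimator fit on the first \(f\) periods, and let \(y_{0,f+1}\) denote the actual treated outcome. With fold weights \(\lambda_f \ge 0\), the CV criterion at \((m,\varphi)\) is the weighted squared error
\[ Q(m, \varphi) = \sum_{f\in\mathcal{F}} \lambda_f \Bigl( y_{0,f+1} - \varphi\,\hat y^{\mathrm{match}}_{f+1}(m) - (1-\varphi)\,\hat y^{\mathrm{SC}}_{f+1} \Bigr)^2. \]
Holding \(m\) fixed, the first-order condition gives the closed form
\[ \hat\varphi(m) = \min\Biggl\{1, \max\Biggl\{0, \frac{\sum_{f\in\mathcal{F}} \lambda_f \bigl(\hat y^{\mathrm{match}}_{f+1}(m) - \hat y^{\mathrm{SC}}_{f+1}\bigr)\bigl(y_{0,f+1} - \hat y^{\mathrm{SC}}_{f+1}\bigr)}{\sum_{f\in\mathcal{F}} \lambda_f \bigl(\hat y^{\mathrm{match}}_{f+1}(m) - \hat y^{\mathrm{SC}}_{f+1}\bigr)^2} \Biggr\}\Biggr\}, \]
reproducing eq. 15 of Kellogg et al. (2021). The selected \(\hat m = \operatorname*{argmin}_m Q(m,\hat\varphi(m))\) is then plugged in and the final weights are refit on the full pre-period.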
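The closed form is a clipped weighted least-squares slope, which can be checked numerically against a brute-force grid. The sketch below is an illustration, not the library routine: `cv_criterion` and `closed_form_phi` are hypothetical names, the forecasts are made-up numbers, and returning 0 when the two arms coincide is a convention chosen here.

```python
import numpy as np

def cv_criterion(y, y_match, y_sc, phi, fold_w):
    """Weighted squared one-step-ahead error of the phi-mixture."""
    resid = y - phi * y_match - (1.0 - phi) * y_sc
    return float(np.sum(fold_w * resid ** 2))

def closed_form_phi(y, y_match, y_sc, fold_w):
    """Weighted OLS slope of (y - y_sc) on (y_match - y_sc), clipped to [0, 1]."""
    diff = y_match - y_sc
    denom = float(np.sum(fold_w * diff ** 2))
    if denom == 0.0:
        return 0.0  # both arms forecast identically, so phi is not identified
    raw = float(np.sum(fold_w * diff * (y - y_sc))) / denom
    return min(1.0, max(0.0, raw))

# Toy stacked forecasts over four folds with equal fold weights.
y = np.array([1.0, 2.0, 3.0, 4.0])
y_sc = np.array([0.8, 1.8, 2.8, 3.8])
y_match = np.array([1.6, 2.6, 3.6, 4.6])
fold_w = np.full(4, 0.25)

phi = closed_form_phi(y, y_match, y_sc, fold_w)
grid = np.linspace(0.0, 1.0, 101)
losses = [cv_criterion(y, y_match, y_sc, p, fold_w) for p in grid]
best_on_grid = float(grid[int(np.argmin(losses))])
print(phi, best_on_grid)
```

Here the truth sits a quarter of the way from the SC forecast to the matching forecast in every fold, so the analytic value and the grid minimiser should both land at about 0.25.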
Empirical Illustration: Basque Country and Spanish Terrorism#
Following Section 5 of Kellogg et al. [KMPT2021] – the canonical Abadie &
Gardeazabal [ABADIE2003] study of the per-capita GDP cost of ETA terrorism –
MASC runs on basque_jasa.csv: 17 Spanish regions (Basque plus 16
donor candidates), 1955-1997, with the JASA predictor specification
(schooling shares, investment, sector composition, population density).
import pandas as pd
from mlsynth import MASC

url = ("https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/"
       "main/basedata/basque_jasa.csv")
df = pd.read_csv(url)

covariates = [
    "school.illit", "school.prim", "school.med", "school.high", "invest",
    "sec.agriculture", "sec.energy", "sec.industry", "sec.construction",
    "sec.services.venta", "sec.services.nonventa", "popdens",
]

# The covariate windows match Abadie & Gardeazabal (2003), Table 1:
# schooling and investment are averaged over 1964-1969, the sector
# shares (observed every other year) over 1961-1969, popdens is the
# 1969 cross-section, and a lagged outcome ``gdpcap`` is matched on
# the 1960-1969 mean (Abadie's "pre-treatment outcomes" predictor).
covariate_windows = {
    "sec.agriculture": (1961, 1969), "sec.energy": (1961, 1969),
    "sec.industry": (1961, 1969), "sec.construction": (1961, 1969),
    "sec.services.venta": (1961, 1969),
    "sec.services.nonventa": (1961, 1969),
    "popdens": (1969, 1969),
    "invest": (1964, 1969),
    "school.illit": (1964, 1969), "school.prim": (1964, 1969),
    "school.med": (1964, 1969), "school.high": (1964, 1969),
    "gdpcap": (1960, 1969),
}

res = MASC({
    "df": df, "outcome": "gdpcap", "treat": "terrorism",
    "unitid": "regionname", "time": "year",
    "m_grid": list(range(1, 11)),
    "min_preperiods": 5,
    "covariates": covariates,
    "covariate_windows": covariate_windows,
    "display_graphs": False,
}).fit()

print(f"Selected m : {res.m_hat}")
print(f"Selected phi : {res.phi_hat:.3f}")
print(f"Pre-RMSE : ${res.fit.pre_rmse * 1000:.0f}/capita")
print(f"ATT : ${res.att * 1000:+.0f}/capita/year")
print("Top donors:")
# Show the four largest donor weights, skipping any below 0.05.
for u, w in sorted(res.donor_weights.items(), key=lambda kv: -kv[1])[:4]:
    if w > 0.05:
        print(f" {u:<32s} {w:.3f}")
This prints (roughly):
Selected m : 1
Selected phi : 0.324
Pre-RMSE : $97/capita
ATT : $-641/capita/year
Top donors:
Cataluna 0.640
Madrid (Comunidad De) 0.225
Principado De Asturias 0.063
Baleares (Islas) 0.054
The paper [KMPT2021], Section 5, reports MASC ≡ SCE
(\(\hat\varphi = 0\)), pre-RMSE \(\approx \$94\), ATT
\(\approx -\$580\)/capita/year, with donor weights Cataluna 0.85,
Madrid 0.15. The same two donors dominate here, though with less
concentrated weights (Cataluna 0.64 + Madrid 0.23 here vs. 0.85 + 0.15 in
KMPT), and the pre-RMSE is close (\(\$97\) vs. \(\$94\)). The ATT of
\(-\$641\) is about \(\$61\) larger in magnitude than the published \(-\$580\);
the residual gap is the documented V-optimiser non-uniqueness below.
Note
Why our \(\hat\varphi\) is small but non-zero. The JASA paper
computes \(\boldsymbol{\omega}_{\mathrm{SC}}\) via the synth()
package’s quasi-Newton search over the predictor-weight matrix
\(\mathbf{V}\). mlsynth delegates the V-optimisation to the
Malo et al. [malo2023computing] bilevel solver (the same solver used by
FSCM). Both are mathematically valid V-optimisation strategies; on
this problem they converge to slightly different \(\mathbf{V}\) and
therefore slightly different \(\mathbf{W}\) (Cataluna 0.64 + Madrid
0.23 vs. 0.85 + 0.15). The rolling-origin CV then prefers a small
amount of nearest-neighbour matching (\(\hat\varphi \approx 0.32\),
\(m=1\)) rather than pure SC.
This is the non-uniqueness phenomenon documented by Becker & Kloessner
and discussed in Malo et al.: when the SC problem is over-parameterised
(here 12 predictors over 16 donors) the upper-level loss is flat over many
feasible \(\mathbf{V}\), and different V-optimisers converge to
different \(\mathbf{W}\). Bit-perfect replication of JASA’s Section 5
would require a true ADH synth() port; the present implementation is
a faithful port of the MASC algorithm (matching, rolling-origin CV,
closed-form \(\varphi\)) on top of mlsynth’s bilevel V solver, with
the documented caveat above.
Verification#
Note
Empirical (Basque proper). With the Abadie-Gardeazabal predictor windows (schooling and investment 1964-1969, sector shares 1961-1969, popdens 1969, gdpcap 1960-1969), treatment starting in 1975 and Spain itself removed from the donor pool, MASC selects \(m=1\), \(\hat\varphi \approx 0.32\), pre-RMSE \(\approx \$97\)/capita (vs. KMPT’s \(\$94\)) and ATT \(\approx -\$641\)/capita/year (vs. KMPT’s \(-\$580\)). Donor mass concentrates on Cataluna (0.64) and Madrid (0.23) – the same two-donor structure KMPT report (0.85 + 0.15). The residual gap is the V-optimiser non-uniqueness documented above.
Helpers. The nearest-neighbour selector, the simplex SC primitive,
the analytic \(\hat\varphi\) formula and the per-fold covariate
aggregation are unit-tested (mlsynth/tests/test_masc.py).
Core API#
Matching and Synthetic Control (MASC) estimator.
A thin NumPy-first orchestration over mlsynth.utils.masc_helpers.
MASC of Kellogg, Mogstad, Pouliot & Torgovitsky (2021) combines a
nearest-neighbour matching weight vector with the standard SC simplex
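weight vector,
\[ \boldsymbol{\omega}_{\mathrm{MASC}} = \varphi\,\boldsymbol{\omega}_{\mathrm{match}}(m) + (1 - \varphi)\,\boldsymbol{\omega}_{\mathrm{SC}}, \]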
with the number of neighbours \(m\) and the model-averaging weight \(\varphi\) chosen jointly by rolling-origin cross-validation. The CV-optimal \(\varphi\) admits a closed-form solution at each candidate \(m\) (Kellogg et al. 2021, eq. 15), so the joint search reduces to a one-dimensional sweep over \(m\).
When covariates are supplied the SC step runs the bilevel solver of
Malo, Eskelinen, Zhou & Kuosmanen (2024) jointly over predictor weights
\(\mathbf{V}\) and donor weights \(\mathbf{W}\); without
covariates the SC step is the canonical outcome-paths simplex fit.
References
Kellogg, M., Mogstad, M., Pouliot, G., & Torgovitsky, A. (2021). Combining Matching and Synthetic Control to Trade Off Biases from Extrapolation and Interpolation. Journal of the American Statistical Association, 116(536), 1804-1816.
- class mlsynth.estimators.masc.MASC(config: MASCConfig | dict)#
Bases: object
Matching and Synthetic Control estimator.
- Parameters:
config (MASCConfig or dict) – Validated configuration. In addition to the common fields (df, outcome, treat, unitid, time, display_graphs, save, colours), MASC reads:
covariates / covariate_windows – optional predictor columns and their aggregation windows (matches the Abadie synth() predictor specification, with per-fold aggregation inside CV).
m_grid – candidate nearest-neighbour counts (defaults to 1..J).
min_preperiods and set_f – mutually exclusive CV-fold specifications (defaults to ceil(treatment_period / 2)..(treatment_period - 2) per the R reference).
forecast_minlength and forecast_maxlength – forecast horizon per fold.
solver – cvxpy solver for the SC QP (CLARABEL by default).
- fit() MASCResults#
Run MASC end to end and return MASCResults.
- Raises:
MlsynthDataError – If the input panel violates MASC’s identification requirements (single treated unit, balanced panel, at least two pre-treatment periods).
MlsynthEstimationError – If the SC or CV optimisation steps fail at runtime.
MlsynthPlottingError – If plotting raises when display_graphs=True.
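The default fold set quoted in the parameter list, ceil(treatment_period / 2)..(treatment_period - 2), can be spelled out as below. This is a sketch that assumes both endpoints of the range are included; `default_fold_set` is a hypothetical helper rather than an mlsynth function, and the library's actual handling of edge cases may differ.

```python
import math

def default_fold_set(treatment_period):
    """Fold indices ceil(tp / 2) .. tp - 2, both endpoints included (assumed)."""
    if treatment_period < 4:
        raise ValueError("too few pre-treatment periods for any fold")
    start = math.ceil(treatment_period / 2)
    stop = treatment_period - 2
    return list(range(start, stop + 1))

# Basque illustration: 1955 is position 1, so treatment in 1975 is position 21.
folds = default_fold_set(21)
print(folds[0], folds[-1], len(folds))  # 11 19 9
```

Under that reading, the Basque panel would use nine folds whose training windows end at positions 11 through 19.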
Configuration#
- class mlsynth.config_models.MASCConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', covariates: ~typing.List[str] | None = None, covariate_windows: dict | None = None, m_grid: ~typing.List[int] | None = None, min_preperiods: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=2)] = None, set_f: ~typing.List[int] | None = None, fold_weights: ~typing.List[float] | None = None, forecast_minlength: ~typing.Annotated[int, ~annotated_types.Ge(ge=1)] = 1, forecast_maxlength: ~typing.Annotated[int, ~annotated_types.Ge(ge=1)] = 1, solver: str | None = None)#
Configuration for the MASC estimator (Kellogg et al. 2021).
Kellogg, Mogstad, Pouliot & Torgovitsky (2021). Combining Matching and Synthetic Control to Trade Off Biases from Extrapolation and Interpolation. JASA 116(536), 1804-1816. The estimator forms a convex combination phi * matching + (1 - phi) * SC with the number of neighbours m and the weight phi jointly chosen by rolling-origin cross-validation.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Result Containers#
MASC.fit() returns a
MASCResults containing the
selected (m_hat, phi_hat), the MASC weight vector (with the matching
and SC components separately preserved), counterfactual, pre/post gap,
pre-RMSE, ATT, and the full CV grid. The prepared NumPy panel is exposed
as a MASCInputs.
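The CV grid is the easiest place to see how close the runner-up values of m were. A minimal sketch, assuming the grid is a NumPy array with columns [m, phi, cv_error] as documented for the cross-validation helper; `best_row` is a hypothetical function and the array below is toy data, not output from a real fit.

```python
import numpy as np

def best_row(cv_grid):
    """Return (m, phi, cv_error) for the row with the smallest CV error."""
    cv_grid = np.asarray(cv_grid, dtype=float)
    if cv_grid.ndim != 2 or cv_grid.shape[1] != 3:
        raise ValueError("expected an array with columns [m, phi, cv_error]")
    m, phi, err = cv_grid[np.argmin(cv_grid[:, 2])]
    return int(m), float(phi), float(err)

# Toy grid, not output from a real fit.
toy_grid = np.array([[1, 0.30, 0.012],
                     [2, 0.10, 0.015],
                     [3, 0.00, 0.020]])
print(best_row(toy_grid))  # (1, 0.3, 0.012)
```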
Typed result containers for the MASC estimator.
Kellogg, Mogstad, Pouliot, and Torgovitsky (2021), Combining Matching
and Synthetic Control to Trade off Biases from Extrapolation and
Interpolation. The estimator forms a convex combination of a
nearest-neighbour matching weight vector and a synthetic-control
simplex weight vector, with both tuning parameters (m, phi)
selected by rolling-origin cross-validation.
- class mlsynth.utils.masc_helpers.structures.MASCFit(att: float, weights: ~numpy.ndarray, weights_match: ~numpy.ndarray, weights_sc: ~numpy.ndarray, phi_hat: float, m_hat: int, counterfactual: ~numpy.ndarray, gap: ~numpy.ndarray, pre_rmse: float, cv_error: float, cv_error_by_fold: ~numpy.ndarray, cv_grid: ~numpy.ndarray, donor_weights: dict = <factory>)#
Bases: object
Single MASC point-estimate fit.
- counterfactual: ndarray#
- cv_error_by_fold: ndarray#
- cv_grid: ndarray#
- gap: ndarray#
- weights: ndarray#
- weights_match: ndarray#
- weights_sc: ndarray#
- class mlsynth.utils.masc_helpers.structures.MASCInputs(Y_treated: ndarray, Y_donors: ndarray, treated_label: Any, donor_labels: Tuple[Any, ...], time_index: ndarray, intervention_time: Any, treatment_period: int, T: int, T0: int, T1: int, J: int, cov_treated_panel: ndarray | None = None, cov_donors_panel: ndarray | None = None, covariate_names: Tuple[Any, ...] = (), covariate_windows: dict | None = None)#
Bases: object
Pre-pivoted inputs for a single-treated-unit MASC fit.
Covariate panels (when supplied) are stored as full (T, J + 1, P) tensors where the last axis indexes predictors and the second axis indexes units with the treated unit in slot 0; this lets each CV fold aggregate covariates over its own pre-window (matching the R reference, which re-averages within every fold).
- Y_donors: ndarray#
- Y_treated: ndarray#
- time_index: ndarray#
- class mlsynth.utils.masc_helpers.structures.MASCResults(inputs: MASCInputs, fit: MASCFit)#
Bases: object
Top-level container returned by MASC.fit.
- property counterfactual: ndarray#
- property gap: ndarray#
- inputs: MASCInputs#
- property weights: ndarray#
Helper Modules#
Data preparation – the only DataFrame touchpoint: pivots to NumPy, builds the unit/time index, splits pre/post, assembles the optional covariate panels for per-fold aggregation.
Pivot a long panel into the matrices the MASC estimator consumes.
- mlsynth.utils.masc_helpers.setup.prepare_masc_inputs(df: DataFrame, *, outcome: str, treat: str, unitid: str, time: str, covariates: Sequence[str] | None = None, covariate_windows: Dict[str, Tuple[Any, Any]] | None = None) MASCInputs#
Pivot df into MASC’s (Y_treated, Y_donors, treatment_period).
Single-treated-unit only for v1. The treated unit is the panel unit with any treat == 1 row; the donor pool is every other never-treated unit. treatment_period is the 1-indexed position of the first treated period (matches the R reference’s treatment argument).
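The pivot described above can be sketched with plain pandas. This is an independent illustration of the idea, not the body of prepare_masc_inputs: the function name `pivot_panel`, its return tuple, and the toy frame are assumptions, and it skips the covariate panels and validation that the real helper performs.

```python
import pandas as pd

def pivot_panel(df, outcome, treat, unitid, time):
    """Return (y_treated, Y_donors, donor_labels, treatment_period)."""
    wide = df.pivot(index=time, columns=unitid, values=outcome).sort_index()
    ever_treated = df.groupby(unitid)[treat].max()
    treated_units = ever_treated[ever_treated == 1].index.tolist()
    if len(treated_units) != 1:
        raise ValueError("this sketch handles exactly one treated unit")
    treated = treated_units[0]
    first_time = df.loc[df[treat] == 1, time].min()
    # 1-indexed position of the first treated period in the sorted time index
    treatment_period = int(wide.index.get_loc(first_time)) + 1
    donors = [u for u in wide.columns if u != treated]
    return wide[treated].to_numpy(), wide[donors].to_numpy(), donors, treatment_period

toy = pd.DataFrame({
    "unit": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "year": [2000, 2001, 2002, 2003] * 3,
    "y": [1.0, 2.0, 3.0, 9.0, 1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0],
    "d": [0, 0, 0, 1] + [0] * 8,
})
y0, Y, donors, tp = pivot_panel(toy, "y", "d", "unit", "year")
print(Y.shape, donors, tp)  # (4, 2) ['B', 'C'] 4
```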
The nearest-neighbour selector, the simplex SC primitive (with optional covariates routed through the bilevel solver) and the analytic-\(\varphi\) closed form.
Nearest-neighbour + SC weight constructors and MASC combiner.
Direct port of NearestNeighbors and sc_estimator from Maxwell
Kellogg’s reference R code (masc/R/estimator.R).
- mlsynth.utils.masc_helpers.estimation.analytic_phi(Y_treated: ndarray, Y_match: ndarray, Y_sc: ndarray, obj_weights: ndarray) float#
Closed-form phi minimising the weighted CV objective.
Implements Kellogg et al. (2021) equation (15) – equivalent to a 1-D weighted OLS of (Y_treated - Y_sc) on (Y_match - Y_sc) clamped to [0, 1]. Direct port of lines 299-302 of crossvalidation.R.
- Parameters:
Y_treated, Y_match, Y_sc (np.ndarray) – Stacked forecast vectors from every CV fold.
obj_weights (np.ndarray) – Per-observation weights derived from the fold weights (line 293 of crossvalidation.R).
- Returns:
float – The CV-optimal phi in [0, 1].
- mlsynth.utils.masc_helpers.estimation.masc_combine(weights_match: ndarray, weights_sc: ndarray, phi: float) ndarray#
phi * match + (1 - phi) * sc. Trivial helper for clarity.
- mlsynth.utils.masc_helpers.estimation.nearest_neighbor_weights(Y_treated_pre: ndarray, Y_donors_pre: ndarray, m: int) ndarray#
Equal-weight nearest-neighbour weights on outcome-path distance.
Picks the m donors whose pre-period outcome paths have the smallest squared-distance from the treated unit and assigns 1/m to each (zero elsewhere). Mirrors NearestNeighbors in the R reference’s outcome-path branch (lines 15-39 of estimator.R).
- Parameters:
Y_treated_pre (np.ndarray) – Shape (T0,), treated unit’s pre-period outcomes.
Y_donors_pre (np.ndarray) – Shape (T0, J), donor pre-period outcomes.
m (int) – Number of nearest neighbours to retain.
- Returns:
np.ndarray – Shape (J,), weights that sum to 1, with 1/m on the m closest donors and 0 elsewhere.
- mlsynth.utils.masc_helpers.estimation.sc_simplex_weights(Y_treated_pre: ndarray, Y_donors_pre: ndarray, *, X_treated: ndarray | None = None, X_donors: ndarray | None = None, solver: str | None = None) ndarray#
Standard SC simplex QP on pre-period outcomes or covariates.
Solves min_w ||target1 - target0 w||^2 subject to sum(w) = 1 and 0 <= w_j <= 1. With X_treated and X_donors supplied the QP runs on the row-standardised predictor block (Abadie’s default V = diag(1/var) realised as preconditioning) rather than the raw outcome path. Without them, it reduces to the outcome-paths SC used by the R reference’s sc_estimator no-covariates branch.
This is not full bilevel V-optimisation; it matches Abadie’s initial / heuristic V used by Cov.Vars (Estimator_Code.R line 16) and produces SC weights identical to those Abadie’s synth() returns when custom.v = "default".
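For intuition about the simplex-constrained least-squares problem, here is a dependency-light sketch. It deliberately swaps in projected gradient descent with a Euclidean simplex projection in place of the cvxpy QP described above, and it omits the predictor standardisation; `project_to_simplex`, `simplex_ls` and the toy data are all assumptions made for illustration.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, v.size + 1)
    k = ks[u - css / ks > 0][-1]
    return np.maximum(v - css[k - 1] / k, 0.0)

def simplex_ls(target, donors, n_iter=500):
    """Minimise ||target - donors @ w||^2 over the simplex by projected gradient."""
    n = donors.shape[1]
    w = np.full(n, 1.0 / n)
    # Step 1/L, with L the Lipschitz constant of the gradient 2 A'(Aw - b).
    lipschitz = 2.0 * np.linalg.norm(donors, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * donors.T @ (donors @ w - target)
        w = project_to_simplex(w - grad / lipschitz)
    return w

# Toy problem: 2 matched quantities, 3 donors stored as columns.
donors = np.array([[0.0, 2.0, 0.0],
                   [0.0, 0.0, 2.0]])
target = np.array([1.2, 0.4])
w = simplex_ls(target, donors)
print(np.round(w, 3))
```

The toy donors are affinely independent, so the target has a single convex representation and the iterates should settle on it; with many collinear donors the minimiser need not be unique.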
The rolling-origin cross-validation engine and the per-fold covariate aggregator.
Rolling-origin cross-validation to tune (m, phi) for MASC.
Direct port of cv_masc from the R reference (lines 214-332 of
masc/R/crossvalidation.R).
The CV grid loops over candidate m (one CV pass per m); for
each m, the analytic phi from Kellogg et al. (2021) eq. (15)
gives the CV-optimal weighting between match and SC at that m in
closed form. The chosen (m_hat, phi_hat) minimises the resulting
CV criterion across the grid.
- mlsynth.utils.masc_helpers.crossval.cross_validate(Y_treated: ndarray, Y_donors: ndarray, treatment_period: int, *, m_grid: Sequence[int] | None = None, min_preperiods: int | None = None, set_f: Sequence[int] | None = None, fold_weights: ndarray | None = None, forecast_minlength: int = 1, forecast_maxlength: int = 1, solver: str | None = None, cov_treated_panel: ndarray | None = None, cov_donors_panel: ndarray | None = None, covariate_names: Sequence[Any] = (), time_index: ndarray | None = None, covariate_windows: Dict[Any, Tuple[Any, Any]] | None = None) Tuple[int, float, float, ndarray, ndarray]#
Pick (m_hat, phi_hat) by rolling-origin CV across the grid.
- Returns:
m_hat (int)
phi_hat (float)
cv_error_min (float)
cv_grid (np.ndarray) – Shape (len(m_grid), 3) with columns [m, phi, cv_error].
by_fold_at_min (np.ndarray) – CV error by fold at (m_hat, phi_hat).
The end-to-end pipeline composing CV with the full-sample refit and the MASC weight combiner.
End-to-end MASC pipeline.
- mlsynth.utils.masc_helpers.orchestration.run_masc(inputs: MASCInputs, *, m_grid: Sequence[int] | None = None, min_preperiods: int | None = None, set_f: Sequence[int] | None = None, fold_weights: ndarray | None = None, forecast_minlength: int = 1, forecast_maxlength: int = 1, solver: str | None = None) MASCFit#
Run MASC end-to-end on inputs.
Cross-validate (m, phi) via rolling-origin CV.
Refit SC and matching at m_hat on the full pre-period.
Combine with the analytic phi_hat.
Form the ATT as mean(Y_treated_post - YJ_post @ weights).
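The last step of the pipeline is plain linear algebra once the combined weights are in hand. A short sketch of that step, assuming T0 counts the pre-treatment rows; `att_from_weights` is a hypothetical name and the panel is toy data.

```python
import numpy as np

def att_from_weights(y_treated, Y_donors, weights, T0):
    """Counterfactual, gap, pre-period RMSE and ATT from a weight vector.

    T0 is the number of pre-treatment periods, so rows T0 onward are
    post-treatment.
    """
    counterfactual = Y_donors @ weights
    gap = y_treated - counterfactual
    pre_rmse = float(np.sqrt(np.mean(gap[:T0] ** 2)))
    att = float(np.mean(gap[T0:]))
    return counterfactual, gap, pre_rmse, att

# Toy panel: 3 pre-periods, 2 post-periods, 2 donors.
Y_donors = np.array([[1.0, 3.0], [2.0, 4.0], [3.0, 5.0], [4.0, 6.0], [5.0, 7.0]])
y_treated = np.array([2.0, 3.0, 4.0, 4.0, 5.0])
weights = np.array([0.5, 0.5])
_, gap, pre_rmse, att = att_from_weights(y_treated, Y_donors, weights, T0=3)
print(pre_rmse, att)  # 0.0 -1.0
```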
Plotting: outcome paths and the CV curve over the candidate m grid.
Plotting for MASC: treated-vs-counterfactual and the CV grid.
The observed-vs-counterfactual panel is delegated to the shared
Plotter; the CV-error-vs-m curve is
MASC’s own bespoke panel and stays local.
- mlsynth.utils.masc_helpers.plotter.plot_masc(results: MASCResults, *, outcome: str, time: str, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | str = False) None#
Outcome paths (shared archetype) plus the CV-grid panel.
The left panel overlays the treated trajectory and the MASC counterfactual (φ * matching + (1 − φ) * SC). The right panel plots the cross-validation error against the candidate m grid, annotating the CV-selected (m̂, φ̂).