Forward-Selected Synthetic Control (FSCM)

Contents

Forward-Selected Synthetic Control (FSCM)#

When to Use This Estimator#

The synthetic control method (SCM) of Abadie, Diamond and Hainmueller [ABADIE2010] builds a treated unit’s counterfactual as a convex combination of donor units. Conventional practice is to start from all available donors and let the simplex weights zero out the irrelevant ones. Cerulli [FSCM] argues this is often suboptimal: the number of initial donors is itself a complexity parameter governing a bias–variance trade-off. A richer donor pool fits the pre-treatment window better in sample, but each extra donor that is only weakly correlated with the treated unit injects variance into the counterfactual out of sample – the synthetic control overfits the pre-period and predicts the post-period worse.

FSCM resolves this by treating the donor count as a tuning parameter chosen by out-of-sample validation. It is the right tool when you have a single treated unit and a sizeable donor pool and suspect that not all donors deserve to be in the comparison set – when you want SCM to tell you how many and which donors to use, rather than assuming “more is better.” Because the selection is greedy (forward stepwise), it scales linearly in the pool size, unlike the \(2^N\) exhaustive subset search.

The donor (and predictor) weights are computed by the bilevel optimization of Malo, Eskelinen, Zhou and Kuosmanen [malo2023computing], implemented from scratch in mlsynth.utils.fscm_helpers.bilevel – no external QP solver is used. Two switches control the estimator:

  • forward_selection (default True) – when True, run the greedy forward selection with rolling-origin out-of-sample validation, fitting each candidate donor set with the bilevel solver. When False, skip selection entirely and take the full bilevel solve over all donors (the global SCM optimum), reporting the donors that carry weight.

  • covariates / match_periods – when given, the estimator runs in predictor mode (Abadie’s predictor matching with a bilevel-optimized predictor-weight matrix \(\mathbf{V}\)); when omitted, in trajectory mode (matching the pre-treatment outcome path). The four combinations all run.

Notation#

We use the synthetic-control canon. Unit \(j=0\) is treated and \(\mathcal{N} = \{1, \ldots, N\}\) indexes the donors; \(\mathbf{y}_0\) is the treated outcome and \(\mathbf{Y} = (y_{jt})\) the donor matrix. The intervention occurs after the pre-period \(\mathcal{T}_1 = \{1, \ldots, T_1\}\); the post-period is \(\mathcal{T}_2\) with \(T_2 = |\mathcal{T}_2|\). For a donor subset \(U \subseteq \mathcal{N}\), the simplex weights solve

\[\boldsymbol{\omega}^*(U) = \operatorname*{argmin}_{\boldsymbol{\omega}\in\Delta_U} \sum_{t\in\mathcal{S}} \bigl(y_{0t} - \mathbf{Y}_{t,U}\,\boldsymbol{\omega}\bigr)^2, \qquad \Delta_U = \Bigl\{\boldsymbol{\omega}\ge 0 : \textstyle\sum_{j\in U}\omega_j = 1\Bigr\},\]

fit over a window \(\mathcal{S}\), and the root-mean-square prediction error over an evaluation window \(\mathcal{E}\) is \(\mathrm{RMSPE}_{\mathcal{E}}(U) = \sqrt{|\mathcal{E}|^{-1}\sum_{t\in\mathcal{E}} (y_{0t} - \mathbf{Y}_{t,U}\boldsymbol{\omega}^*(U))^2}\).

Computing the weights: bilevel optimization#

With predictors, SCM jointly chooses predictor weights \(\mathbf{V}\) (a \(K\times K\) non-negative diagonal matrix on the simplex) and donor weights \(\mathbf{W}\). Malo et al. [malo2023computing] show this is an optimistic bilevel program: the upper level fits the outcome, the lower level fits the \(\mathbf{V}\)-weighted predictors,

\[\min_{\mathbf{V},\mathbf{W}} \; \tfrac{1}{T_1}\bigl\|\mathbf{y}_0^{\text{pre}} - \mathbf{Y}_0^{\text{pre}}\mathbf{W}\bigr\|^2 \quad\text{s.t.}\quad \mathbf{W} \in \operatorname*{argmin}_{\mathbf{W}\in\Delta} \bigl\|\mathbf{x}_0 - \mathbf{X}_0\mathbf{W}\bigr\|_{\mathbf{V}}^2 ,\]

which is NP-hard in general and is the reason off-the-shelf SCM packages can be numerically unstable. mlsynth implements the paper’s globally-convergent iterative algorithm in three stages, short-circuiting as soon as an optimum is certified (the paper notes the optimum is usually a corner found early):

  1. Unconstrained feasibility (Section 3.1) – solve the simplex regression of the treated outcome on the donors, giving the lower bound \(L(W^{**})\) on the upper-level loss; an LP over \(\mathbf{V}\) checks whether some predictor is already matched, which certifies optimality.

  2. Corner solutions (Section 3.2) – evaluate the \(K\) basic predictor weightings (all weight on one predictor) and keep the best by outcome loss.

  3. Tykhonov-regularized descent (Section 3.3) – only if a gap remains, descend over \(\mathbf{V}\) for a vanishing regularization sequence.

The lower-level (and the trajectory-mode) simplex problems are solved by a self-contained FISTA projected-gradient routine (simplex_lstsq()), which matches a reference QP solver to ~1e-8. In predictor mode the optimal \(\mathbf{V}\) is computed once on the full donor pool and reused through forward selection.

The forward stepwise algorithm#

When forward_selection=True (default), the donor count is chosen by Cerulli’s procedure ([FSCM], Table 1):

  1. Start from the empty model \(U_0 = \varnothing\).

  2. For \(k = 0, 1, \ldots, J-1\): among the \(J-k\) candidate donors not yet selected, add the one whose inclusion minimizes the in-sample pre-period RMSPE, giving the nested model \(U_{k+1} = U_k \cup \{j^*\}\).

  3. For each nested model \(U_k\), compute an out-of-sample validation RMSPE and select \(k^* = \operatorname*{argmin}_k \mathrm{CV}(U_k)\).

The selected donor set is \(U_{k^*}\); mlsynth then refits the weights on the full pre-period over \(U_{k^*}\) to form the counterfactual \(\hat{y}_{0t}^0 = \mathbf{Y}_{t,U_{k^*}}\boldsymbol{\omega}^*\), the gap \(\hat{\Delta}_t = y_{0t} - \hat{y}_{0t}^0\), and the ATT \(\bar{\Delta} = \mathcal{E}_{\mathcal{T}_2}(\hat{\Delta}_t)\).

When forward_selection=False the selection and cross-validation are skipped: the estimator returns the single full bilevel solve over all donors (the global SCM optimum), reporting the weight-bearing donors. This is faster and is the right choice when you want the canonical SCM weights rather than a parsimonious donor subset.

Rolling-origin cross-validation. Cerulli’s paper splits the pre-period once (early half train, late half test). With short pre-periods (Proposition 99 has only 19) a single split is noisy, so mlsynth uses an expanding-window, one-step-ahead scheme instead: for each origin \(t\) from \(\lceil T_1\cdot\texttt{cv\_split}\rceil\) to \(T_1-1\), weights are fit on \(\{1,\ldots,t-1\}\) and used to forecast period \(t\); the validation score is the RMSPE of those one-step forecasts. Every late pre-period period serves as a test point, using the data more efficiently than a single cut.

Assumptions (forward convex hull + selection consistency)#

FSCM relaxes canonical SCM’s “the full donor pool must contain a convex-hull match for the treated unit” to the much weaker “some subset does,” and the forward-stepwise selector is the device that finds it. The four assumptions that make the selector behave:

A1 (forward convex-hull condition – the identifying premise). There exists a non-empty subset \(U^* \subseteq \mathcal{N}\) and simplex weights \(\omega^* \in \Delta_{U^*}\) such that the treated unit’s pre-period trajectory is (approximately) reproduced by the corresponding donor combination,

\[y_{0t} \;\approx\; \sum_{j \in U^*} \omega_j^*\, y_{jt} \quad \text{for all } t \in \mathcal{T}_1.\]

The classical SCM hull condition is the special case \(U^* = \mathcal{N}\); FSCM operates whenever some subset (potentially a small one) supplies the hull. If no subset of controls can form a convex hull around the target, FSCM cannot be used – and no amount of forward stepwise will rescue it. The diagnostic is the lower envelope of in-sample RMSPE across the nested models \(\{U_k\}\): if it never falls below the noise floor at any size, A1 is failing.

A2 (stable pre/post relationship). The weights that reproduce the treated unit on the pre-period also reproduce its untreated trajectory on the post-period – the standard SCM identification premise carried through to the selected subset. Remark. This is what licenses using pre-period out-of-sample fit (the cross-validation test window) as a stand-in for post-period predictive accuracy of the counterfactual, which is never observed.

A3 (forward-stepwise selection consistency). Under regularity conditions on the donor pool and the pre-period length (Shi & Huang 2023, Theorem 1) [fsPDA], the forward stepwise selection rule recovers the oracle donor set wpa1:

\[\mathbb{P}\bigl( U_{k^*} = U^* \bigr) \;\longrightarrow\; 1 \qquad \text{as } T_1 \to \infty.\]

Remark. This is the wpa1 selection-consistency property that distinguishes FSCM from heuristic donor pre-screening: greedy forward steps are not just computationally tractable (\(1 + J(J+1)/2\) fits vs. the exhaustive \(2^J\)), they asymptotically pick the right subset. The regularity conditions follow Shi & Huang and assume bounded donor signals, mixing shocks within units, and a signal-strength condition on the oracle weights (no donor in \(U^*\) carries vanishing weight in \(\omega^*\)).

A4 (informative cross-validation split). The pre-period is long enough, and the donor / treated dynamics stable enough across the split, that the test-RMSPE on the held-out interval is a consistent estimate of out-of-sample prediction error. Remark. mlsynth’s expanding-window scheme expects the late pre-period to resemble the post-period in distribution; if the panel has a structural break inside the pre-window (a global financial crisis, a regime change), this assumption fails and the selected \(k^*\) will reflect the break, not the donor pool.

Empirical Illustration: California’s Proposition 99#

Following Cerulli’s application – the canonical Abadie [ABADIE2010] study of California’s 1988 tobacco-control program – FSCM runs on P99data.csv (per-capita cigarette sales, 38 control states, 1970–2000). The treated indicator is California from 1989 onward.

import pandas as pd
from mlsynth import FSCM

url = "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/P99data.csv"
df = pd.read_csv(url)
df["treated"] = ((df["state"] == "California") & (df["year"] >= 1989)).astype(int)

res = FSCM({"df": df, "outcome": "cigsale", "treat": "treated",
            "unitid": "state", "time": "year", "display_graphs": True}).fit()

print(f"optimal donors : {res.n_selected} of {res.fit_diagnostics['n_donors_available']}")
print(f"selected       : {res.selected_donors}")
print(f"ATT            : {res.att:.2f}")
print(f"pre-period R^2 : {res.fit_diagnostics['pre_r_squared']:.3f}")
print(f"CV RMSPE       : optimum={res.fit_diagnostics['cv_rmspe_at_optimum']:.3f}"
      f"  full pool={res.fit_diagnostics['cv_rmspe_full_pool']:.3f}")

This prints:

optimal donors : 3 of 38
selected       : ['Montana', 'Nevada', 'Utah']
ATT            : -20.15
pre-period R^2 : 0.970
CV RMSPE       : optimum=1.605  full pool=2.916

Forward selection keeps 3 of the 38 donors and estimates a drop of about 20 packs per capita, consistent with Abadie’s original synthetic-control estimate. The key diagnostic is the last line: the rolling-origin CV RMSPE at the optimum (1.61) is far below using all 38 donors (2.92) – the out-of-sample evidence for Cerulli’s bias–variance argument, that the smaller donor set forecasts the treated unit better than the full pool. With display_graphs=True the second panel plots the CV-RMSPE curve against the number of donors (the paper’s Fig. 2–3), with the selected count marked.

Matching on the author’s full predictor specification#

P99data.csv ships with Abadie’s predictor specification: the covariates lnincome, beer, age15to24, retprice and cigarette sales in 1975, 1980 and 1988. Abadie averages the covariates over 1980–1988 (beer over 1984–1988); these aggregation windows are given through covariate_windows, and the lagged smoking values through match_periods. With forward_selection=False the estimator returns the full bilevel SCM optimum over all donors:

res = FSCM({"df": df, "outcome": "cigsale", "treat": "treated",
            "unitid": "state", "time": "year", "forward_selection": False,
            "covariates": ["lnincome", "beer", "age15to24", "retprice"],
            "covariate_windows": {"lnincome": (1980, 1988), "age15to24": (1980, 1988),
                                  "retprice": (1980, 1988), "beer": (1984, 1988)},
            "match_periods": [1975, 1980, 1988]}).fit()
print(f"R^2 {res.fit_diagnostics['pre_r_squared']:.4f}  ATT {res.att:.2f}")
print({str(k): round(v, 3) for k, v in res.donor_weights.items() if v > 1e-3})
# R^2 0.9787  ATT -19.68
# {'Utah': 0.386, 'Montana': 0.257, 'Nevada': 0.206, 'Connecticut': 0.107, 'New Hampshire': 0.043}

This reproduces the global optimum reported in Malo et al. [malo2023computing] (their Table 1): \(R^2 = 0.979\) with donor weights concentrated on Utah, Montana, Nevada, Connecticut and New Hampshire – the solution the standard Synth package fails to reach. Consistent with the paper’s central finding, the optimal predictor weights \(\mathbf{V}\) form a corner solution that places all weight on a single predictor: many predictors are interchangeable at the optimum (the upper-level loss is non-unique in \(\mathbf{V}\)), so the bilevel solver certifies optimality at a corner rather than spreading weight across predictors.

Note

With forward_selection=True and predictors, the global \(\mathbf{V}\) is frozen and reused across candidate donor sets. Because that corner \(\mathbf{V}\) may rest on an easily-matched predictor, it can stop constraining the subset fits, so the rolling-origin CV may decline to prune the pool. For predictor matching, forward_selection=False (the full bilevel optimum) is therefore the more reliable choice; forward selection is most useful in trajectory mode.

Verification#

Note

Empirical (Proposition 99). In trajectory mode with forward selection, FSCM reproduces Cerulli’s regime: a small donor set (3 of 38) with a rolling-origin CV RMSPE minimized below the full pool – the out-of-sample signature of the bias–variance trade-off – and an ATT \(\approx -20\). With the full Abadie predictor spec and forward_selection=False, the bilevel solver reproduces the global SCM optimum of Malo et al. [malo2023computing] exactly (\(R^2 = 0.979\), Table 1 donor weights), which the Synth package does not reach.

Solver. The self-contained FISTA simplex solver agrees with a reference QP solver to ~1e-8 over random problems; the bilevel unconstrained feasibility certificate, corner-solution bounds, and a determinism check are unit-tested (mlsynth/tests/test_fscm_bilevel.py). All four forward_selection x covariates combinations are exercised in mlsynth/tests/test_fscm.py.

Core API#

Forward-Selected Synthetic Control (FSCM).

A thin, NumPy-first orchestration over mlsynth.utils.fscm_helpers. FSCM (Cerulli 2024) treats the number of donors as a complexity parameter governing a bias–variance trade-off. A forward stepwise selection grows a nested donor sequence on the training half of the pre-period (greedy on in-sample RMSPE), and a two-interval-time out-of-sample validation on the held-out test half picks the donor count that minimizes test RMSPE. The final simplex weights are refit on the full pre-period over the selected donors.

Optional covariate matching (covariates=[...]) augments the SCM objective with the author’s predictor specification; selection and validation scores are always measured on the outcome.

References

Cerulli, G. (2024). Optimal initial donor selection for the synthetic control method. Economics Letters, 244, 111976. https://doi.org/10.1016/j.econlet.2024.111976

class mlsynth.estimators.fscm.FSCM(config: FSCMConfig | dict)#

Bases: object

Forward-Selected Synthetic Control estimator.

Parameters:

config (FSCMConfig or dict) – Validated configuration. Beyond the common fields (df, outcome, treat, unitid, time, display_graphs, save, colors), FSCM reads covariates (optional predictor columns), match_periods (specific pre-treatment periods whose outcome value is matched directly, as in Abadie’s special predictors), cv_split (training fraction of the pre-period) and max_donors (cap on forward-selection steps).

fit() FSCMResults#

Select donors, fit the synthetic control, and return results.

Returns:

FSCMResults – Container with the optimal donor set, simplex weights, counterfactual, gap, ATT, fit diagnostics, and the forward-selection / cross-validation path.

Configuration#

class mlsynth.config_models.FSCMConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', forward_selection: bool = True, covariates: ~typing.List[str] | None = None, covariate_windows: ~typing.Dict[str, ~typing.Any] | None = None, match_periods: ~typing.List[~typing.Any] | None = None, cv_split: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.5, max_donors: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=1)] = None)#

Configuration for the Forward-Selected Synthetic Control Method (FSCM).

FSCM grows a nested donor sequence by forward stepwise selection on the training half of the pre-period (greedy on in-sample RMSPE), then chooses the donor count by minimizing out-of-sample RMSPE on the held-out test half (two-interval-time cross-validation). The final simplex weights are refit on the full pre-period over the selected donors.

References

Cerulli, Giovanni. 2024. “Optimal initial donor selection for the synthetic control method.” Economics Letters, 244: 111976. https://doi.org/10.1016/j.econlet.2024.111976

covariate_windows: Dict[str, Any] | None#
covariates: List[str] | None#
cv_split: float#
forward_selection: bool#
match_periods: List[Any] | None#
max_donors: int | None#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Result Containers#

FSCM.fit() returns a FSCMResults: the weight-bearing donor set, simplex weights, the full-pool donor_weights dict, counterfactual, gap, ATT, fit diagnostics, and – when forward selection ran – a FSCMSelectionPath (per-size in-sample and rolling-origin CV RMSPE, and the selection order; None when forward_selection=False). The prepared, NumPy-only panel is exposed as a FSCMInputs, with units and time addressed through an IndexSet.

Frozen, NumPy-first containers for Forward-Selected SCM (FSCM).

FSCM (Cerulli 2024) treats the number of donors as a complexity parameter governing a bias–variance trade-off: a richer donor pool fits the pre-treatment window better in sample but injects variance from poorly correlated donors out of sample. A forward stepwise selection grows a nested donor sequence (greedy on the training-period fit), and a two-interval-time out-of-sample validation on the held-out tail of the pre-period picks the donor count that minimizes test RMSPE.

Everything below is pure NumPy; units/time are addressed through IndexSet. The only DataFrame touchpoint is setup.

References

Cerulli, G. (2024). Optimal initial donor selection for the synthetic control method. Economics Letters, 244, 111976.

class mlsynth.utils.fscm_helpers.structures.FSCMInputs(unit_index: ~mlsynth.utils.fast_scm_helpers.structure.IndexSet, time_index: ~mlsynth.utils.fast_scm_helpers.structure.IndexSet, y: ~numpy.ndarray, Y: ~numpy.ndarray, T0: int, treated_label: ~typing.Any, cov_treated: ~numpy.ndarray = <factory>, cov_donors: ~numpy.ndarray = <factory>, covariate_names: ~typing.List[str] = <factory>, match_idx: ~numpy.ndarray = <factory>, match_periods: ~typing.List[~typing.Any] = <factory>, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

Preprocessed, NumPy-only panel for the FSCM engine.

Parameters:
  • unit_index (IndexSet) – All N donor units (column order of Y).

  • time_index (IndexSet) – All T periods (row order of y and Y).

  • y (np.ndarray) – Treated-unit outcome over all periods, shape (T,).

  • Y (np.ndarray) – Donor outcomes, shape (T, N).

  • T0 (int) – Number of pre-treatment periods; post is T2 = T - T0.

  • treated_label (Any) – Identifier of the treated unit.

  • cov_treated (np.ndarray, optional) – Treated covariate predictor values, shape (P,) – each covariate averaged over its (Abadie) aggregation window. Empty if no covariates.

  • cov_donors (np.ndarray, optional) – Donor covariate predictor values, shape (N, P). Empty if none.

  • covariate_names (list of str) – Names of the P covariate columns.

  • match_idx (np.ndarray, optional) – Time-row indices of “special predictor” periods – specific pre-treatment periods whose outcome value is matched directly (e.g. the 1975/1980/1988 cigarette-sales values in Abadie’s spec).

  • match_periods (list) – Labels of those periods, for provenance.

  • metadata (dict) – Free-form provenance.

property T: int#
T0: int#
property T2: int#
Y: ndarray#
cov_donors: ndarray#
cov_treated: ndarray#
covariate_names: List[str]#
property donor_labels: ndarray#
property has_covariates: bool#
property has_match_periods: bool#
property has_predictors: bool#
match_idx: ndarray#
match_periods: List[Any]#
metadata: Dict[str, Any]#
property n_donors: int#
time_index: IndexSet#
treated_label: Any#
unit_index: IndexSet#
y: ndarray#
class mlsynth.utils.fscm_helpers.structures.FSCMResults(inputs: ~mlsynth.utils.fscm_helpers.structures.FSCMInputs, selected_donors: ~typing.List[~typing.Any], weights: ~numpy.ndarray, donor_weights: ~typing.Dict[~typing.Any, float], counterfactual: ~numpy.ndarray, gap: ~numpy.ndarray, att: float, selection_path: ~mlsynth.utils.fscm_helpers.structures.FSCMSelectionPath | None = None, fit_diagnostics: ~typing.Dict[str, float] = <factory>, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

Top-level container returned by mlsynth.FSCM.fit().

att: float#
counterfactual: ndarray#
donor_weights: Dict[Any, float]#
fit_diagnostics: Dict[str, float]#
gap: ndarray#
inputs: FSCMInputs#
metadata: Dict[str, Any]#
property n_selected: int#
property pre_rmse: float#
selected_donors: List[Any]#
selection_path: FSCMSelectionPath | None = None#
weights: ndarray#
class mlsynth.utils.fscm_helpers.structures.FSCMSelectionPath(sizes: ndarray, order: List[Any], train_rmspe: ndarray, test_rmspe: ndarray, optimal_size: int)#

Bases: object

The forward-selection / cross-validation trace (Cerulli Fig. 2–3).

Each entry k describes the nested model with k donors: the donor added at that step, the in-sample pre-period RMSPE, and the out-of-sample rolling-origin CV RMSPE used to choose the optimal size.

optimal_size: int#
order: List[Any]#
sizes: ndarray#
test_rmspe: ndarray#
train_rmspe: ndarray#

Helper Modules#

Data preparation – the only DataFrame touchpoint: pivots to NumPy, builds the unit/time ``IndexSet``es, splits pre/post, and assembles the optional covariate arrays.

Long-DataFrame -> NumPy boundary for FSCM (the only pandas touchpoint).

mlsynth.utils.fscm_helpers.setup.derive_treatment(df: DataFrame, unitid: str, time: str, treat: str) Tuple[Any, Any]#

Read the single treated unit and its first treated period from treat.

mlsynth.utils.fscm_helpers.setup.prepare_fscm_inputs(df: DataFrame, *, unitid: str, time: str, outcome: str, treat: str, covariates: List[str] | None = None, covariate_windows: Dict[str, Tuple[Any, Any]] | None = None, match_periods: List[Any] | None = None) FSCMInputs#

Pivot the panel to NumPy, build IndexSets, split pre/post.

Covariate predictors are each averaged over an aggregation window (Abadie’s specification): covariate_windows maps a covariate to an inclusive (start, end) label range; covariates not listed are averaged over the full pre-treatment period. match_periods are specific pre-treatment periods whose outcome value is matched directly.

The forward-selection / rolling-origin cross-validation engine and the weight-fitting that dispatches to the bilevel solver (predictor mode) or the trajectory simplex solve.

NumPy-first FSCM engine: forward selection + rolling-origin CV.

Implements Cerulli (2024) with two matching modes:

  • Trajectory mode (no covariates and no special predictors): the SCM weights match the treated unit’s pre-treatment outcome trajectory.

  • Predictor mode (covariates and/or match_periods given): the weights match Abadie’s predictor specification, with the predictor-weight matrix V and donor weights jointly determined by the bilevel optimization of Malo et al. (2024) – V is solved once on the full donor pool and then reused through forward selection (see bilevel).

In both modes the donor pool is grown greedily and the donor count is chosen by rolling-origin cross-validation; the final weights are refit on the full pre-period over the selected donors. All optimization is self-contained – the simplex problems are solved by the FISTA primitive in bilevel.simplex, not by Opt.SCopt.

mlsynth.utils.fscm_helpers.estimation.run_fscm(inputs: FSCMInputs, *, forward_selection: bool = True, cv_split: float = 0.5, max_donors: int | None = None) FSCMResults#

Synthetic control via the Malo et al. (2024) bilevel solver.

Parameters:
  • inputs (FSCMInputs) – Prepared NumPy panel.

  • forward_selection (bool, default True) – If True, run Cerulli’s forward selection with rolling-origin OOS validation to choose a donor subset (each candidate fit by the bilevel solver). If False, take the full bilevel solve over all donors with no selection.

  • cv_split (float, default 0.5) – Sets the first rolling origin (forward selection only): forecasting begins at period round(T0 * cv_split) and sweeps to the pre-period end.

  • max_donors (int, optional) – Cap on the number of forward-selection steps (default: all donors).

Plotting: the outcome paths and the donor-count CV-RMSPE selection curve.

Plotting for FSCM: treated-vs-counterfactual and the donor-count CV curve.

The observed-vs-counterfactual panel is delegated to the shared Plotter; the rolling-origin CV curve is FSCM’s own bespoke panel and stays local.

mlsynth.utils.fscm_helpers.plotter.plot_fscm(results: FSCMResults, *, outcome: str, time: str, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | str = False) None#

Outcome paths (shared archetype) plus the donor-count CV curve.

Bilevel optimization (Malo et al. 2024)#

A self-contained implementation of the optimistic bilevel SCM program, used as the weight solver in predictor mode (and the simplex primitive everywhere). No external QP solver is involved.

The simplex-constrained least-squares core: Euclidean projection onto the probability simplex and the FISTA accelerated projected-gradient solver.

Self-contained simplex-constrained least squares (no external QP solver).

These primitives replace Opt.SCopt for the bilevel SCM experiment. The workhorse is an accelerated projected-gradient (FISTA) solver for

min_w ||A w - b||^2 s.t. w >= 0, sum(w) = 1,

built from two small, independently testable pieces: a Euclidean projection onto the probability simplex and a power-iteration estimate of the gradient’s Lipschitz constant.

mlsynth.utils.fscm_helpers.bilevel.simplex.mspe(y1: ndarray, Y0: ndarray, w: ndarray) float#

Mean squared prediction error mean((y1 - Y0 w)^2).

mlsynth.utils.fscm_helpers.bilevel.simplex.project_simplex(v: ndarray, z: float = 1.0) ndarray#

Euclidean projection of v onto {w >= 0, sum(w) = z}.

Uses the exact sort-based algorithm (Held, Wolfe & Crowder 1974; Duchi et al. 2008): \(O(n \log n)\) and exact.

mlsynth.utils.fscm_helpers.bilevel.simplex.simplex_lstsq(A: ndarray, b: ndarray, *, max_iter: int = 2000, tol: float = 1e-09, warn: bool = False) ndarray#

Minimize ||A w - b||^2 over the probability simplex via FISTA.

Parameters:
  • A (np.ndarray) – Design matrix, shape (m, n).

  • b (np.ndarray) – Target vector, shape (m,).

  • max_iter, tol (int, float) – Stopping controls.

  • warn (bool) – If True, emit a RuntimeWarning when max_iter is exhausted before the step norm falls below tol (i.e. FISTA did not converge). Off by default so the inner-loop callers stay silent.

Returns:

np.ndarray – Weights of shape (n,) on the simplex.

The three algorithm stages – unconstrained feasibility, corner solutions, and Tykhonov-regularized descent.

The three stages of the Malo et al. (2024) bilevel SCM algorithm.

  • unconstrained_feasibility() – Section 3.1: solve the constrained regression on outcomes for the lower bound, then check (via the predictor LP) whether that solution is already bilevel-optimal.

  • corner_solutions() – Section 3.2: evaluate the K basic predictor weightings (all weight on one predictor) and keep the best by upper-level loss.

  • tykhonov_refine() – Section 3.3: a pragmatic regularized descent over the predictor simplex, used only if a gap remains. (The paper notes the optimum is usually a corner found earlier, so this rarely changes anything.)

mlsynth.utils.fscm_helpers.bilevel.stages.corner_solutions(prob: BilevelProblem, *, eps: float = 1e-06) Tuple[ndarray, ndarray, float, List[float]]#

Section 3.2: best of the K corner (basic) predictor weightings.

For each predictor k set V = e_k, solve the lower-level problem for W_k, and score the upper-level loss L_V. Returns (V_best, W_best, upper_loss_best, all_upper_losses).

mlsynth.utils.fscm_helpers.bilevel.stages.tykhonov_refine(prob: BilevelProblem, V0: ndarray, *, eps0: float = 0.001, shrink: float = 0.5, outer_iters: int = 6, inner_iters: int = 8, fd_step: float = 0.001, lr: float = 0.2) Tuple[ndarray, ndarray, float, int]#

Section 3.3: pragmatic regularized descent over the predictor simplex.

For a decreasing sequence eps_k -> 0 (Proposition 4) the implicit objective L_eps(V) = L_V(V, W*_eps(V)) is minimized by projected gradient descent on V (finite-difference gradient, simplex projection). A lighter stand-in for the full Bouligand-stationary QP of Appendix B.1; it only moves V when doing so lowers the outcome loss.

mlsynth.utils.fscm_helpers.bilevel.stages.unconstrained_feasibility(prob: BilevelProblem, *, feas_tol: float = 1e-08) Tuple[ndarray, ndarray, float, bool]#

Section 3.1: unconstrained simplex regression + feasibility check.

Returns (W_unc, V_star, lower_bound, is_optimal). W_unc minimizes the outcome fit on the simplex (Eq. 3) and lower_bound = L(W**). The predictor LP (Eq. 10) is linear in V over the simplex, so its optimum is the basis vector on the best-matched predictor; if that predictor’s discrepancy is ~0, W_unc is the global bilevel solution.

mlsynth.utils.fscm_helpers.bilevel.stages.warn_on_gap(gap: float, lower_bound: float, factor: float, *, stacklevel: int = 2) None#

Warn when the bilevel optimality gap is a large multiple of the bound.

A large gap means the predictor-constrained fit is far worse than the best achievable outcome fit – i.e. the predictors are hard to match on the donor simplex and the predictor weights V are weakly identified (the pathology behind diverging malo/mscmt estimates). Only fires for a non-trivial bound.

The driver composing the stages, and the structured problem/solution containers.

Driver for the bilevel SCM solver.

Two interchangeable backends solve the same optimistic bilevel program (minimise pre-treatment outcome fit, subject to W minimising the V-weighted predictor discrepancy on the donor simplex):

  • "malo" (default) – Malo et al. (2024): the staged corner search of stages, short-circuiting as soon as an optimal solution is certified (the optimum is typically a corner found in the early stages).

  • "mscmt" – Becker & Kloessner (2018): a global differential-evolution search over log10(V) with the simplex inner solve (mscmt).

  • "penalized" – Abadie & L’Hour (2021): fixes Gamma = I and adds a pairwise matching penalty, with lambda chosen by leave-one-out CV. It yields a unique, sparse W (penalized), sidestepping the V non-identification that makes malo and mscmt disagree.

The first two share the Section 3.1 / MSCMT Eq. 13 global-optimum certificate and the same lower bound, so they agree exactly whenever the unconstrained outcome optimum is predictor-feasible; they differ when the optimal V is interior. The penalized backend resolves that ambiguity by construction.

mlsynth.utils.fscm_helpers.bilevel.solver.lower_level_weights(prob: BilevelProblem, V: ndarray, eps: float = 1e-06) ndarray#

Public wrapper: donor weights for fixed predictor weights V.

mlsynth.utils.fscm_helpers.bilevel.solver.solve_bilevel(prob: BilevelProblem, *, method: str = 'malo', **kwargs) BilevelSolution#

Solve the bilevel SCM problem (Eq. 6-7), dispatching on method.

Parameters:
  • prob (BilevelProblem) – Outcome and predictor matrices.

  • method ({“malo”, “mscmt”, “penalized”}) – Which backend to use. "malo" (default) runs the staged corner search; "mscmt" runs the global differential-evolution outer search; "penalized" runs the Abadie-L’Hour penalized estimator with leave-one-out lambda selection. The default preserves the historical behaviour of every existing caller.

  • **kwargs – Backend-specific options. For "malo": feas_tol, eps_corner, refine, refine_gap_tol. For "mscmt": lb, maxiter, popsize, tol, seed, polish, feas_tol. For "penalized": lam (default "loo"), lam_grid, max_iter, tol (see mlsynth.utils.fscm_helpers.bilevel.penalized.solve_penalized()).

Returns:

BilevelSolution

Frozen containers for the bilevel SCM optimization (Malo et al. 2024).

The synthetic-control problem is an optimistic bilevel program: the upper level chooses predictor weights V (and, implicitly, donor weights W) to minimize the pre-treatment outcome fit, while the lower level chooses W to minimize the V-weighted predictor discrepancy on the donor simplex.

References

Malo, P., Eskelinen, J., Zhou, X., & Kuosmanen, T. (2024). Computing Synthetic Controls Using Bilevel Optimization. Computational Economics, 64, 1113-1136. https://doi.org/10.1007/s10614-023-10471-7

class mlsynth.utils.fscm_helpers.bilevel.structure.BilevelProblem(y1_pre: ~numpy.ndarray, Y0_pre: ~numpy.ndarray, X1: ~numpy.ndarray, X0: ~numpy.ndarray, predictor_names: ~typing.List[~typing.Any] = <factory>)#

Bases: object

Inputs to the bilevel SCM solver (pure NumPy).

Parameters:
  • y1_pre (np.ndarray) – Treated pre-treatment outcomes, shape (Tpre,) – the upper-level target Y_1^{pre}.

  • Y0_pre (np.ndarray) – Donor pre-treatment outcomes, shape (Tpre, J)Y_0^{pre}.

  • X1 (np.ndarray) – Treated predictor vector, shape (K,)X_1.

  • X0 (np.ndarray) – Donor predictor matrix, shape (K, J)X_0.

  • predictor_names (list) – Labels of the K predictors, for provenance.

Raises:

MlsynthDataError – If the arrays are non-finite, wrongly dimensioned, mutually inconsistent in shape, or carry no donors. K == 0 (no predictors) is permitted – only the predictor-free penalized backend accepts it.

property Tpre: int#
X0: ndarray#
X1: ndarray#
Y0_pre: ndarray#
property n_donors: int#
property n_predictors: int#
predictor_names: List[Any]#
y1_pre: ndarray#
class mlsynth.utils.fscm_helpers.bilevel.structure.BilevelSolution(V: ~numpy.ndarray, W: ~numpy.ndarray, upper_loss: float, lower_loss: float, lower_bound: float, stage: str, iterations: int = 0, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

Output of the bilevel SCM solver.

V#

Optimal predictor weights, shape (K,) on the simplex.

Type:

np.ndarray

W#

Optimal donor weights, shape (J,) on the simplex.

Type:

np.ndarray

upper_loss#

Upper-level loss L_V = ||Y1_pre - Y0_pre W||^2 (outcome fit).

Type:

float

lower_loss#

Lower-level loss L_W = ||X1 - X0 W||^2_V (V-weighted predictor fit).

Type:

float

lower_bound#

L(W**) from the unconstrained simplex regression – the global lower bound on upper_loss.

Type:

float

stage#

Which stage produced the solution: "unconstrained", "corner", or "tykhonov".

Type:

str

iterations#

Number of refinement iterations performed (0 if a corner/unconstrained solution was accepted).

Type:

int

metadata#

Free-form diagnostics (corner losses, gap, etc.).

Type:

dict

V: ndarray#
W: ndarray#
property gap: float#

Upper-level optimality gap L_V - L(W**) (>= 0).

iterations: int = 0#
lower_bound: float#
lower_loss: float#
metadata: Dict[str, Any]#
stage: str#
upper_loss: float#