Forward-Selected Synthetic Control (FSCM)#
When to Use This Estimator#
The synthetic control method (SCM) of Abadie, Diamond and Hainmueller [ABADIE2010] builds a treated unit’s counterfactual as a convex combination of donor units. Conventional practice is to start from all available donors and let the simplex weights zero out the irrelevant ones. Cerulli [FSCM] argues this is often suboptimal: the number of initial donors is itself a complexity parameter governing a bias–variance trade-off. A richer donor pool fits the pre-treatment window better in sample, but each extra donor that is only weakly correlated with the treated unit injects variance into the counterfactual out of sample – the synthetic control overfits the pre-period and predicts the post-period worse.
FSCM resolves this by treating the donor count as a tuning parameter chosen
by out-of-sample validation. It is the right tool when you have a single
treated unit and a sizeable donor pool and suspect that not all donors
deserve to be in the comparison set – when you want SCM to tell you how many
and which donors to use, rather than assuming “more is better.” Because the
selection is greedy (forward stepwise), it scales linearly in the pool size,
unlike the \(2^N\) exhaustive subset search.
The donor (and predictor) weights are computed by the bilevel optimization
of Malo, Eskelinen, Zhou and Kuosmanen [malo2023computing], implemented from
scratch in mlsynth.utils.fscm_helpers.bilevel – no external QP solver
is used. Two switches control the estimator:
forward_selection(defaultTrue) – whenTrue, run the greedy forward selection with rolling-origin out-of-sample validation, fitting each candidate donor set with the bilevel solver. WhenFalse, skip selection entirely and take the full bilevel solve over all donors (the global SCM optimum), reporting the donors that carry weight.covariates/match_periods– when given, the estimator runs in predictor mode (Abadie’s predictor matching with a bilevel-optimized predictor-weight matrix \(\mathbf{V}\)); when omitted, in trajectory mode (matching the pre-treatment outcome path). The four combinations all run.
Notation#
We use the synthetic-control canon. Unit \(j=0\) is treated and \(\mathcal{N} = \{1, \ldots, N\}\) indexes the donors; \(\mathbf{y}_0\) is the treated outcome and \(\mathbf{Y} = (y_{jt})\) the donor matrix. The intervention occurs after the pre-period \(\mathcal{T}_1 = \{1, \ldots, T_1\}\); the post-period is \(\mathcal{T}_2\) with \(T_2 = |\mathcal{T}_2|\). For a donor subset \(U \subseteq \mathcal{N}\), the simplex weights solve
fit over a window \(\mathcal{S}\), and the root-mean-square prediction error over an evaluation window \(\mathcal{E}\) is \(\mathrm{RMSPE}_{\mathcal{E}}(U) = \sqrt{|\mathcal{E}|^{-1}\sum_{t\in\mathcal{E}} (y_{0t} - \mathbf{Y}_{t,U}\boldsymbol{\omega}^*(U))^2}\).
Computing the weights: bilevel optimization#
With predictors, SCM jointly chooses predictor weights \(\mathbf{V}\) (a \(K\times K\) non-negative diagonal matrix on the simplex) and donor weights \(\mathbf{W}\). Malo et al. [malo2023computing] show this is an optimistic bilevel program: the upper level fits the outcome, the lower level fits the \(\mathbf{V}\)-weighted predictors,
which is NP-hard in general and is the reason off-the-shelf SCM packages can be
numerically unstable. mlsynth implements the paper’s globally-convergent
iterative algorithm in three stages, short-circuiting as soon as an optimum is
certified (the paper notes the optimum is usually a corner found early):
Unconstrained feasibility (Section 3.1) – solve the simplex regression of the treated outcome on the donors, giving the lower bound \(L(W^{**})\) on the upper-level loss; an LP over \(\mathbf{V}\) checks whether some predictor is already matched, which certifies optimality.
Corner solutions (Section 3.2) – evaluate the \(K\) basic predictor weightings (all weight on one predictor) and keep the best by outcome loss.
Tykhonov-regularized descent (Section 3.3) – only if a gap remains, descend over \(\mathbf{V}\) for a vanishing regularization sequence.
The lower-level (and the trajectory-mode) simplex problems are solved by a
self-contained FISTA projected-gradient routine
(simplex_lstsq()), which matches a
reference QP solver to ~1e-8. In predictor mode the optimal \(\mathbf{V}\)
is computed once on the full donor pool and reused through forward
selection.
The forward stepwise algorithm#
When forward_selection=True (default), the donor count is chosen by
Cerulli’s procedure ([FSCM], Table 1):
Start from the empty model \(U_0 = \varnothing\).
For \(k = 0, 1, \ldots, J-1\): among the \(J-k\) candidate donors not yet selected, add the one whose inclusion minimizes the in-sample pre-period RMSPE, giving the nested model \(U_{k+1} = U_k \cup \{j^*\}\).
For each nested model \(U_k\), compute an out-of-sample validation RMSPE and select \(k^* = \operatorname*{argmin}_k \mathrm{CV}(U_k)\).
The selected donor set is \(U_{k^*}\); mlsynth then refits the weights
on the full pre-period over \(U_{k^*}\) to form the counterfactual
\(\hat{y}_{0t}^0 = \mathbf{Y}_{t,U_{k^*}}\boldsymbol{\omega}^*\), the gap
\(\hat{\Delta}_t = y_{0t} - \hat{y}_{0t}^0\), and the ATT
\(\bar{\Delta} = \mathcal{E}_{\mathcal{T}_2}(\hat{\Delta}_t)\).
When forward_selection=False the selection and cross-validation are skipped:
the estimator returns the single full bilevel solve over all donors (the
global SCM optimum), reporting the weight-bearing donors. This is faster and is
the right choice when you want the canonical SCM weights rather than a
parsimonious donor subset.
Rolling-origin cross-validation. Cerulli’s paper splits the pre-period once
(early half train, late half test). With short pre-periods (Proposition 99 has
only 19) a single split is noisy, so mlsynth uses an expanding-window,
one-step-ahead scheme instead: for each origin \(t\) from
\(\lceil T_1\cdot\texttt{cv\_split}\rceil\) to \(T_1-1\), weights are fit
on \(\{1,\ldots,t-1\}\) and used to forecast period \(t\); the
validation score is the RMSPE of those one-step forecasts. Every late
pre-period period serves as a test point, using the data more efficiently than
a single cut.
Assumptions (forward convex hull + selection consistency)#
FSCM relaxes canonical SCM’s “the full donor pool must contain a convex-hull match for the treated unit” to the much weaker “some subset does,” and the forward-stepwise selector is the device that finds it. The four assumptions that make the selector behave:
A1 (forward convex-hull condition – the identifying premise). There exists a non-empty subset \(U^* \subseteq \mathcal{N}\) and simplex weights \(\omega^* \in \Delta_{U^*}\) such that the treated unit’s pre-period trajectory is (approximately) reproduced by the corresponding donor combination,
The classical SCM hull condition is the special case \(U^* = \mathcal{N}\); FSCM operates whenever some subset (potentially a small one) supplies the hull. If no subset of controls can form a convex hull around the target, FSCM cannot be used – and no amount of forward stepwise will rescue it. The diagnostic is the lower envelope of in-sample RMSPE across the nested models \(\{U_k\}\): if it never falls below the noise floor at any size, A1 is failing.
A2 (stable pre/post relationship). The weights that reproduce the treated unit on the pre-period also reproduce its untreated trajectory on the post-period – the standard SCM identification premise carried through to the selected subset. Remark. This is what licenses using pre-period out-of-sample fit (the cross-validation test window) as a stand-in for post-period predictive accuracy of the counterfactual, which is never observed.
A3 (forward-stepwise selection consistency). Under regularity conditions on the donor pool and the pre-period length (Shi & Huang 2023, Theorem 1) [fsPDA], the forward stepwise selection rule recovers the oracle donor set wpa1:
Remark. This is the wpa1 selection-consistency property that distinguishes FSCM from heuristic donor pre-screening: greedy forward steps are not just computationally tractable (\(1 + J(J+1)/2\) fits vs. the exhaustive \(2^J\)), they asymptotically pick the right subset. The regularity conditions follow Shi & Huang and assume bounded donor signals, mixing shocks within units, and a signal-strength condition on the oracle weights (no donor in \(U^*\) carries vanishing weight in \(\omega^*\)).
A4 (informative cross-validation split). The pre-period is long enough, and the donor / treated dynamics stable enough across the split, that the test-RMSPE on the held-out interval is a consistent estimate of out-of-sample prediction error. Remark. mlsynth’s expanding-window scheme expects the late pre-period to resemble the post-period in distribution; if the panel has a structural break inside the pre-window (a global financial crisis, a regime change), this assumption fails and the selected \(k^*\) will reflect the break, not the donor pool.
Empirical Illustration: California’s Proposition 99#
Following Cerulli’s application – the canonical Abadie [ABADIE2010] study of
California’s 1988 tobacco-control program – FSCM runs on P99data.csv
(per-capita cigarette sales, 38 control states, 1970–2000). The treated
indicator is California from 1989 onward.
import pandas as pd
from mlsynth import FSCM
url = "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/P99data.csv"
df = pd.read_csv(url)
df["treated"] = ((df["state"] == "California") & (df["year"] >= 1989)).astype(int)
res = FSCM({"df": df, "outcome": "cigsale", "treat": "treated",
"unitid": "state", "time": "year", "display_graphs": True}).fit()
print(f"optimal donors : {res.n_selected} of {res.fit_diagnostics['n_donors_available']}")
print(f"selected : {res.selected_donors}")
print(f"ATT : {res.att:.2f}")
print(f"pre-period R^2 : {res.fit_diagnostics['pre_r_squared']:.3f}")
print(f"CV RMSPE : optimum={res.fit_diagnostics['cv_rmspe_at_optimum']:.3f}"
f" full pool={res.fit_diagnostics['cv_rmspe_full_pool']:.3f}")
This prints:
optimal donors : 3 of 38
selected : ['Montana', 'Nevada', 'Utah']
ATT : -20.15
pre-period R^2 : 0.970
CV RMSPE : optimum=1.605 full pool=2.916
Forward selection keeps 3 of the 38 donors and estimates a drop of about
20 packs per capita, consistent with Abadie’s original synthetic-control
estimate. The key diagnostic is the last line: the rolling-origin CV RMSPE at
the optimum (1.61) is far below using all 38 donors (2.92) – the
out-of-sample evidence for Cerulli’s bias–variance argument, that the smaller
donor set forecasts the treated unit better than the full pool. With
display_graphs=True the second panel plots the CV-RMSPE curve against the
number of donors (the paper’s Fig. 2–3), with the selected count marked.
Verification#
Note
Empirical (Proposition 99). In trajectory mode with forward selection,
FSCM reproduces Cerulli’s regime: a small donor set (3 of 38) with a
rolling-origin CV RMSPE minimized below the full pool – the
out-of-sample signature of the bias–variance trade-off – and an ATT
\(\approx -20\). With the full Abadie predictor spec and
forward_selection=False, the bilevel solver reproduces the global SCM
optimum of Malo et al. [malo2023computing] exactly (\(R^2 = 0.979\),
Table 1 donor weights), which the Synth package does not reach.
Solver. The self-contained FISTA simplex solver agrees with a reference
QP solver to ~1e-8 over random problems; the bilevel unconstrained
feasibility certificate, corner-solution bounds, and a determinism check are
unit-tested (mlsynth/tests/test_fscm_bilevel.py). All four
forward_selection x covariates combinations are exercised in
mlsynth/tests/test_fscm.py.
Core API#
Forward-Selected Synthetic Control (FSCM).
A thin, NumPy-first orchestration over mlsynth.utils.fscm_helpers.
FSCM (Cerulli 2024) treats the number of donors as a complexity parameter
governing a bias–variance trade-off. A forward stepwise selection grows a
nested donor sequence on the training half of the pre-period (greedy on
in-sample RMSPE), and a two-interval-time out-of-sample validation on the
held-out test half picks the donor count that minimizes test RMSPE. The final
simplex weights are refit on the full pre-period over the selected donors.
Optional covariate matching (covariates=[...]) augments the SCM objective
with the author’s predictor specification; selection and validation scores are
always measured on the outcome.
References
Cerulli, G. (2024). Optimal initial donor selection for the synthetic control method. Economics Letters, 244, 111976. https://doi.org/10.1016/j.econlet.2024.111976
- class mlsynth.estimators.fscm.FSCM(config: FSCMConfig | dict)#
Bases:
objectForward-Selected Synthetic Control estimator.
- Parameters:
config (FSCMConfig or dict) – Validated configuration. Beyond the common fields (
df,outcome,treat,unitid,time,display_graphs,save, colors), FSCM readscovariates(optional predictor columns),match_periods(specific pre-treatment periods whose outcome value is matched directly, as in Abadie’s special predictors),cv_split(training fraction of the pre-period) andmax_donors(cap on forward-selection steps).
- fit() FSCMResults#
Select donors, fit the synthetic control, and return results.
- Returns:
FSCMResults – Container with the optimal donor set, simplex weights, counterfactual, gap, ATT, fit diagnostics, and the forward-selection / cross-validation path.
Configuration#
- class mlsynth.config_models.FSCMConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', forward_selection: bool = True, covariates: ~typing.List[str] | None = None, covariate_windows: ~typing.Dict[str, ~typing.Any] | None = None, match_periods: ~typing.List[~typing.Any] | None = None, cv_split: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.5, max_donors: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=1)] = None)#
Configuration for the Forward-Selected Synthetic Control Method (FSCM).
FSCM grows a nested donor sequence by forward stepwise selection on the training half of the pre-period (greedy on in-sample RMSPE), then chooses the donor count by minimizing out-of-sample RMSPE on the held-out test half (two-interval-time cross-validation). The final simplex weights are refit on the full pre-period over the selected donors.
References
Cerulli, Giovanni. 2024. “Optimal initial donor selection for the synthetic control method.” Economics Letters, 244: 111976. https://doi.org/10.1016/j.econlet.2024.111976
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Result Containers#
FSCM.fit() returns a
FSCMResults: the weight-bearing
donor set, simplex weights, the full-pool donor_weights dict,
counterfactual, gap, ATT, fit diagnostics, and – when forward selection ran –
a FSCMSelectionPath (per-size
in-sample and rolling-origin CV RMSPE, and the selection order;
None when forward_selection=False). The prepared, NumPy-only panel is
exposed as a FSCMInputs, with
units and time addressed through an IndexSet.
Frozen, NumPy-first containers for Forward-Selected SCM (FSCM).
FSCM (Cerulli 2024) treats the number of donors as a complexity parameter governing a bias–variance trade-off: a richer donor pool fits the pre-treatment window better in sample but injects variance from poorly correlated donors out of sample. A forward stepwise selection grows a nested donor sequence (greedy on the training-period fit), and a two-interval-time out-of-sample validation on the held-out tail of the pre-period picks the donor count that minimizes test RMSPE.
Everything below is pure NumPy; units/time are addressed through
IndexSet. The only DataFrame touchpoint is setup.
References
Cerulli, G. (2024). Optimal initial donor selection for the synthetic control method. Economics Letters, 244, 111976.
- class mlsynth.utils.fscm_helpers.structures.FSCMInputs(unit_index: ~mlsynth.utils.fast_scm_helpers.structure.IndexSet, time_index: ~mlsynth.utils.fast_scm_helpers.structure.IndexSet, y: ~numpy.ndarray, Y: ~numpy.ndarray, T0: int, treated_label: ~typing.Any, cov_treated: ~numpy.ndarray = <factory>, cov_donors: ~numpy.ndarray = <factory>, covariate_names: ~typing.List[str] = <factory>, match_idx: ~numpy.ndarray = <factory>, match_periods: ~typing.List[~typing.Any] = <factory>, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#
Bases:
objectPreprocessed, NumPy-only panel for the FSCM engine.
- Parameters:
unit_index (IndexSet) – All
Ndonor units (column order ofY).time_index (IndexSet) – All
Tperiods (row order ofyandY).y (np.ndarray) – Treated-unit outcome over all periods, shape
(T,).Y (np.ndarray) – Donor outcomes, shape
(T, N).T0 (int) – Number of pre-treatment periods; post is
T2 = T - T0.treated_label (Any) – Identifier of the treated unit.
cov_treated (np.ndarray, optional) – Treated covariate predictor values, shape
(P,)– each covariate averaged over its (Abadie) aggregation window. Empty if no covariates.cov_donors (np.ndarray, optional) – Donor covariate predictor values, shape
(N, P). Empty if none.covariate_names (list of str) – Names of the
Pcovariate columns.match_idx (np.ndarray, optional) – Time-row indices of “special predictor” periods – specific pre-treatment periods whose outcome value is matched directly (e.g. the 1975/1980/1988 cigarette-sales values in Abadie’s spec).
match_periods (list) – Labels of those periods, for provenance.
metadata (dict) – Free-form provenance.
- Y: ndarray#
- cov_donors: ndarray#
- cov_treated: ndarray#
- property donor_labels: ndarray#
- match_idx: ndarray#
- time_index: IndexSet#
- unit_index: IndexSet#
- y: ndarray#
- class mlsynth.utils.fscm_helpers.structures.FSCMResults(inputs: ~mlsynth.utils.fscm_helpers.structures.FSCMInputs, selected_donors: ~typing.List[~typing.Any], weights: ~numpy.ndarray, donor_weights: ~typing.Dict[~typing.Any, float], counterfactual: ~numpy.ndarray, gap: ~numpy.ndarray, att: float, selection_path: ~mlsynth.utils.fscm_helpers.structures.FSCMSelectionPath | None = None, fit_diagnostics: ~typing.Dict[str, float] = <factory>, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#
Bases:
objectTop-level container returned by
mlsynth.FSCM.fit().- counterfactual: ndarray#
- gap: ndarray#
- inputs: FSCMInputs#
- selection_path: FSCMSelectionPath | None = None#
- weights: ndarray#
- class mlsynth.utils.fscm_helpers.structures.FSCMSelectionPath(sizes: ndarray, order: List[Any], train_rmspe: ndarray, test_rmspe: ndarray, optimal_size: int)#
Bases:
objectThe forward-selection / cross-validation trace (Cerulli Fig. 2–3).
Each entry
kdescribes the nested model withkdonors: the donor added at that step, the in-sample pre-period RMSPE, and the out-of-sample rolling-origin CV RMSPE used to choose the optimal size.- sizes: ndarray#
- test_rmspe: ndarray#
- train_rmspe: ndarray#
Helper Modules#
Data preparation – the only DataFrame touchpoint: pivots to NumPy, builds the unit/time ``IndexSet``es, splits pre/post, and assembles the optional covariate arrays.
Long-DataFrame -> NumPy boundary for FSCM (the only pandas touchpoint).
- mlsynth.utils.fscm_helpers.setup.derive_treatment(df: DataFrame, unitid: str, time: str, treat: str) Tuple[Any, Any]#
Read the single treated unit and its first treated period from
treat.
- mlsynth.utils.fscm_helpers.setup.prepare_fscm_inputs(df: DataFrame, *, unitid: str, time: str, outcome: str, treat: str, covariates: List[str] | None = None, covariate_windows: Dict[str, Tuple[Any, Any]] | None = None, match_periods: List[Any] | None = None) FSCMInputs#
Pivot the panel to NumPy, build
IndexSets, split pre/post.Covariate predictors are each averaged over an aggregation window (Abadie’s specification):
covariate_windowsmaps a covariate to an inclusive(start, end)label range; covariates not listed are averaged over the full pre-treatment period.match_periodsare specific pre-treatment periods whose outcome value is matched directly.
The forward-selection / rolling-origin cross-validation engine and the weight-fitting that dispatches to the bilevel solver (predictor mode) or the trajectory simplex solve.
NumPy-first FSCM engine: forward selection + rolling-origin CV.
Implements Cerulli (2024) with two matching modes:
Trajectory mode (no covariates and no special predictors): the SCM weights match the treated unit’s pre-treatment outcome trajectory.
Predictor mode (covariates and/or
match_periodsgiven): the weights match Abadie’s predictor specification, with the predictor-weight matrixVand donor weights jointly determined by the bilevel optimization of Malo et al. (2024) –Vis solved once on the full donor pool and then reused through forward selection (seebilevel).
In both modes the donor pool is grown greedily and the donor count is chosen
by rolling-origin cross-validation; the final weights are refit on the full
pre-period over the selected donors. All optimization is self-contained – the
simplex problems are solved by the FISTA primitive in bilevel.simplex,
not by Opt.SCopt.
- mlsynth.utils.fscm_helpers.estimation.run_fscm(inputs: FSCMInputs, *, forward_selection: bool = True, cv_split: float = 0.5, max_donors: int | None = None) FSCMResults#
Synthetic control via the Malo et al. (2024) bilevel solver.
- Parameters:
inputs (FSCMInputs) – Prepared NumPy panel.
forward_selection (bool, default True) – If True, run Cerulli’s forward selection with rolling-origin OOS validation to choose a donor subset (each candidate fit by the bilevel solver). If False, take the full bilevel solve over all donors with no selection.
cv_split (float, default 0.5) – Sets the first rolling origin (forward selection only): forecasting begins at period
round(T0 * cv_split)and sweeps to the pre-period end.max_donors (int, optional) – Cap on the number of forward-selection steps (default: all donors).
Plotting: the outcome paths and the donor-count CV-RMSPE selection curve.
Plotting for FSCM: treated-vs-counterfactual and the donor-count CV curve.
The observed-vs-counterfactual panel is delegated to the shared
Plotter; the rolling-origin CV curve is FSCM’s
own bespoke panel and stays local.
Bilevel optimization (Malo et al. 2024)#
A self-contained implementation of the optimistic bilevel SCM program, used as the weight solver in predictor mode (and the simplex primitive everywhere). No external QP solver is involved.
The simplex-constrained least-squares core: Euclidean projection onto the probability simplex and the FISTA accelerated projected-gradient solver.
Self-contained simplex-constrained least squares (no external QP solver).
These primitives replace Opt.SCopt for the bilevel SCM experiment. The
workhorse is an accelerated projected-gradient (FISTA) solver for
min_w ||A w - b||^2 s.t. w >= 0, sum(w) = 1,
built from two small, independently testable pieces: a Euclidean projection onto the probability simplex and a power-iteration estimate of the gradient’s Lipschitz constant.
- mlsynth.utils.fscm_helpers.bilevel.simplex.mspe(y1: ndarray, Y0: ndarray, w: ndarray) float#
Mean squared prediction error
mean((y1 - Y0 w)^2).
- mlsynth.utils.fscm_helpers.bilevel.simplex.project_simplex(v: ndarray, z: float = 1.0) ndarray#
Euclidean projection of
vonto{w >= 0, sum(w) = z}.Uses the exact sort-based algorithm (Held, Wolfe & Crowder 1974; Duchi et al. 2008): \(O(n \log n)\) and exact.
- mlsynth.utils.fscm_helpers.bilevel.simplex.simplex_lstsq(A: ndarray, b: ndarray, *, max_iter: int = 2000, tol: float = 1e-09, warn: bool = False) ndarray#
Minimize
||A w - b||^2over the probability simplex via FISTA.- Parameters:
A (np.ndarray) – Design matrix, shape
(m, n).b (np.ndarray) – Target vector, shape
(m,).max_iter, tol (int, float) – Stopping controls.
warn (bool) – If
True, emit aRuntimeWarningwhenmax_iteris exhausted before the step norm falls belowtol(i.e. FISTA did not converge). Off by default so the inner-loop callers stay silent.
- Returns:
np.ndarray – Weights of shape
(n,)on the simplex.
The three algorithm stages – unconstrained feasibility, corner solutions, and Tykhonov-regularized descent.
The three stages of the Malo et al. (2024) bilevel SCM algorithm.
unconstrained_feasibility()– Section 3.1: solve the constrained regression on outcomes for the lower bound, then check (via the predictor LP) whether that solution is already bilevel-optimal.corner_solutions()– Section 3.2: evaluate theKbasic predictor weightings (all weight on one predictor) and keep the best by upper-level loss.tykhonov_refine()– Section 3.3: a pragmatic regularized descent over the predictor simplex, used only if a gap remains. (The paper notes the optimum is usually a corner found earlier, so this rarely changes anything.)
- mlsynth.utils.fscm_helpers.bilevel.stages.corner_solutions(prob: BilevelProblem, *, eps: float = 1e-06) Tuple[ndarray, ndarray, float, List[float]]#
Section 3.2: best of the
Kcorner (basic) predictor weightings.For each predictor
ksetV = e_k, solve the lower-level problem forW_k, and score the upper-level lossL_V. Returns(V_best, W_best, upper_loss_best, all_upper_losses).
- mlsynth.utils.fscm_helpers.bilevel.stages.tykhonov_refine(prob: BilevelProblem, V0: ndarray, *, eps0: float = 0.001, shrink: float = 0.5, outer_iters: int = 6, inner_iters: int = 8, fd_step: float = 0.001, lr: float = 0.2) Tuple[ndarray, ndarray, float, int]#
Section 3.3: pragmatic regularized descent over the predictor simplex.
For a decreasing sequence
eps_k -> 0(Proposition 4) the implicit objectiveL_eps(V) = L_V(V, W*_eps(V))is minimized by projected gradient descent onV(finite-difference gradient, simplex projection). A lighter stand-in for the full Bouligand-stationary QP of Appendix B.1; it only movesVwhen doing so lowers the outcome loss.
- mlsynth.utils.fscm_helpers.bilevel.stages.unconstrained_feasibility(prob: BilevelProblem, *, feas_tol: float = 1e-08) Tuple[ndarray, ndarray, float, bool]#
Section 3.1: unconstrained simplex regression + feasibility check.
Returns
(W_unc, V_star, lower_bound, is_optimal).W_uncminimizes the outcome fit on the simplex (Eq. 3) andlower_bound = L(W**). The predictor LP (Eq. 10) is linear inVover the simplex, so its optimum is the basis vector on the best-matched predictor; if that predictor’s discrepancy is ~0,W_uncis the global bilevel solution.
- mlsynth.utils.fscm_helpers.bilevel.stages.warn_on_gap(gap: float, lower_bound: float, factor: float, *, stacklevel: int = 2) None#
Warn when the bilevel optimality gap is a large multiple of the bound.
A large gap means the predictor-constrained fit is far worse than the best achievable outcome fit – i.e. the predictors are hard to match on the donor simplex and the predictor weights
Vare weakly identified (the pathology behind diverging malo/mscmt estimates). Only fires for a non-trivial bound.
The driver composing the stages, and the structured problem/solution containers.
Driver for the bilevel SCM solver.
Two interchangeable backends solve the same optimistic bilevel program
(minimise pre-treatment outcome fit, subject to W minimising the
V-weighted predictor discrepancy on the donor simplex):
"malo"(default) – Malo et al. (2024): the staged corner search ofstages, short-circuiting as soon as an optimal solution is certified (the optimum is typically a corner found in the early stages)."mscmt"– Becker & Kloessner (2018): a global differential-evolution search overlog10(V)with the simplex inner solve (mscmt)."penalized"– Abadie & L’Hour (2021): fixesGamma = Iand adds a pairwise matching penalty, withlambdachosen by leave-one-out CV. It yields a unique, sparseW(penalized), sidestepping theVnon-identification that makes malo and mscmt disagree.
The first two share the Section 3.1 / MSCMT Eq. 13 global-optimum certificate
and the same lower bound, so they agree exactly whenever the unconstrained
outcome optimum is predictor-feasible; they differ when the optimal V is
interior. The penalized backend resolves that ambiguity by construction.
- mlsynth.utils.fscm_helpers.bilevel.solver.lower_level_weights(prob: BilevelProblem, V: ndarray, eps: float = 1e-06) ndarray#
Public wrapper: donor weights for fixed predictor weights
V.
- mlsynth.utils.fscm_helpers.bilevel.solver.solve_bilevel(prob: BilevelProblem, *, method: str = 'malo', **kwargs) BilevelSolution#
Solve the bilevel SCM problem (Eq. 6-7), dispatching on
method.- Parameters:
prob (BilevelProblem) – Outcome and predictor matrices.
method ({“malo”, “mscmt”, “penalized”}) – Which backend to use.
"malo"(default) runs the staged corner search;"mscmt"runs the global differential-evolution outer search;"penalized"runs the Abadie-L’Hour penalized estimator with leave-one-outlambdaselection. The default preserves the historical behaviour of every existing caller.**kwargs – Backend-specific options. For
"malo":feas_tol,eps_corner,refine,refine_gap_tol. For"mscmt":lb,maxiter,popsize,tol,seed,polish,feas_tol. For"penalized":lam(default"loo"),lam_grid,max_iter,tol(seemlsynth.utils.fscm_helpers.bilevel.penalized.solve_penalized()).
- Returns:
BilevelSolution
Frozen containers for the bilevel SCM optimization (Malo et al. 2024).
The synthetic-control problem is an optimistic bilevel program: the
upper level chooses predictor weights V (and, implicitly, donor weights
W) to minimize the pre-treatment outcome fit, while the lower level
chooses W to minimize the V-weighted predictor discrepancy on the
donor simplex.
References
Malo, P., Eskelinen, J., Zhou, X., & Kuosmanen, T. (2024). Computing Synthetic Controls Using Bilevel Optimization. Computational Economics, 64, 1113-1136. https://doi.org/10.1007/s10614-023-10471-7
- class mlsynth.utils.fscm_helpers.bilevel.structure.BilevelProblem(y1_pre: ~numpy.ndarray, Y0_pre: ~numpy.ndarray, X1: ~numpy.ndarray, X0: ~numpy.ndarray, predictor_names: ~typing.List[~typing.Any] = <factory>)#
Bases:
objectInputs to the bilevel SCM solver (pure NumPy).
- Parameters:
y1_pre (np.ndarray) – Treated pre-treatment outcomes, shape
(Tpre,)– the upper-level targetY_1^{pre}.Y0_pre (np.ndarray) – Donor pre-treatment outcomes, shape
(Tpre, J)–Y_0^{pre}.X1 (np.ndarray) – Treated predictor vector, shape
(K,)–X_1.X0 (np.ndarray) – Donor predictor matrix, shape
(K, J)–X_0.predictor_names (list) – Labels of the
Kpredictors, for provenance.
- Raises:
MlsynthDataError – If the arrays are non-finite, wrongly dimensioned, mutually inconsistent in shape, or carry no donors.
K == 0(no predictors) is permitted – only the predictor-free penalized backend accepts it.
- X0: ndarray#
- X1: ndarray#
- Y0_pre: ndarray#
- y1_pre: ndarray#
- class mlsynth.utils.fscm_helpers.bilevel.structure.BilevelSolution(V: ~numpy.ndarray, W: ~numpy.ndarray, upper_loss: float, lower_loss: float, lower_bound: float, stage: str, iterations: int = 0, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#
Bases:
objectOutput of the bilevel SCM solver.
- V#
Optimal predictor weights, shape
(K,)on the simplex.- Type:
np.ndarray
- W#
Optimal donor weights, shape
(J,)on the simplex.- Type:
np.ndarray
- lower_bound#
L(W**)from the unconstrained simplex regression – the global lower bound onupper_loss.- Type:
- iterations#
Number of refinement iterations performed (0 if a corner/unconstrained solution was accepted).
- Type:
- V: ndarray#
- W: ndarray#