Sparse Synthetic Control (SparseSC)#
Overview#
SparseSC implements the L1-penalized predictor-weighting variant of canonical synthetic control proposed by Vives-i-Bastida (2023, Predictor Selection for Synthetic Controls). It targets the same Abadie, Diamond, and Hainmueller (2010) framework as classical SCM, but adds a lasso penalty on the predictor-importance vector \(v\) to deliver interpretable predictor selection: as the L1 penalty grows, uninformative predictors get \(v\)-weights of exactly zero and are dropped from the fit.
Compared with the canonical SCM data-driven \(V\) choice (a cross-validated grid search over diagonal \(V\) minimizing pre-period MSE), SparseSC
selects predictors explicitly via L1 sparsity rather than implicitly via small but nonzero \(v\)-weights;
picks the L1 penalty \(\lambda\) on a held-out validation block of the pre-period (a 75/25 train/validation split by default, which matches the 14/5-year split Vives used in the empirical Prop 99 application); and
anchors the first predictor’s \(v\)-weight at 1, which fixes the overall scale and removes the trivial \(v = 0\) minimum that the L1 penalty would otherwise admit.
The donor weights \(w\) solve the usual SCM simplex QP given \(v\).
Inference defaults to a moving-block conformal CI for the ATT
in the spirit of Chernozhukov, Wuethrich and Zhu (2021), calibrated
on the validation-block residuals. Vives’s Abadie-style placebo
permutation is still available via inference_method="placebo".
Mathematical Formulation#
Setup#
Let \(Y_{1, t}\) denote the treated outcome and \(Y_{0, t} \in \mathbb{R}^N\) the donor outcomes at time \(t\). The pre-treatment window \(t = 1, \dots, T_0\) is partitioned into a training block \(t = 1, \dots, T_0^{\text{tr}}\) and a validation block \(t = T_0^{\text{tr}} + 1, \dots, T_0\). Predictors enter through a treated vector \(X_1 \in \mathbb{R}^P\) and a donor matrix \(X_0 \in \mathbb{R}^{P \times N}\). After standardization each row of \([X_0, X_1]\) has unit sample standard deviation across units.
Inner W-weight QP#
Given \(v \in \mathbb{R}^P_{\ge 0}\) the donor weights solve
\[w^*(v) = \arg\min_{w \in \Delta_N}\; (X_1 - X_0 w)^\top \mathrm{diag}(v)\, (X_1 - X_0 w),\]
where \(\Delta_N = \{w \in \mathbb{R}^N_{\ge 0} : \mathbf{1}^\top w = 1\}\) is the donor simplex. This is exactly the
QP MATLAB’s quadprog solves inside
sparse_synth/loss_function.m.
mlsynth calls Clarabel directly (bypassing CVXPY’s
canonicalization layer), which is the single biggest performance
fix versus the prior CVXPY-based implementation: CVXPY parsing
overhead was ~10-50 ms per call for a 39-donor problem, while the
underlying Clarabel solve itself takes microseconds. The constraint
skeleton (A, b, cones, settings) is cached per donor count
\(N\) so only the data terms \(H = X_0^\top \mathrm{diag}(v)
X_0\) and \(q = -2 X_0^\top \mathrm{diag}(v) X_1\) are rebuilt per
call.
For numerical robustness — the augmented k > N spec is rank-
deficient and Clarabel can return InsufficientProgress at tight
tolerance — the inner solve retries with a trace-scaled ridge and
looser tolerances before falling back to a uniform-w feasible point.
This prevents the outer L-BFGS-B sweep from aborting mid-run on a
single bad exploration step.
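As an illustration of the inner problem only, the following minimal sketch solves the same simplex QP with SciPy's SLSQP solver rather than Clarabel. The function name solve_w_sketch, the ridge size, and the toy data are assumptions made for this example; it does not reproduce the cached Clarabel path or the retry logic described above.

```python
import numpy as np
from scipy.optimize import minimize


def solve_w_sketch(v, X1, X0, ridge=1e-10):
    """Simplex weights minimizing (X1 - X0 w)' diag(v) (X1 - X0 w)."""
    N = X0.shape[1]
    H = X0.T @ (v[:, None] * X0)              # X0' diag(v) X0
    q = -2.0 * (X0.T @ (v * X1))              # -2 X0' diag(v) X1
    H = H + ridge * np.trace(H) * np.eye(N)   # trace-scaled ridge (size is an assumption)

    res = minimize(
        fun=lambda w: w @ H @ w + q @ w,
        x0=np.full(N, 1.0 / N),               # start from uniform weights
        jac=lambda w: 2.0 * (H @ w) + q,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * N,
        constraints=[{"type": "eq",
                      "fun": lambda w: np.sum(w) - 1.0,
                      "jac": lambda w: np.ones(N)}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    w = np.clip(res.x, 0.0, None)             # remove tiny negative round-off
    return w / w.sum()


# Toy check: the treated predictors equal donor 1's column, so the
# minimizer should put (almost) all weight on donor 1.
X0 = np.array([[1.0, 2.0, 3.0],
               [2.0, 1.0, 0.0],
               [0.0, 1.0, 4.0],
               [1.0, 0.0, 1.0]])
X1 = X0[:, 1].copy()
v = np.array([1.0, 0.5, 2.0, 1.0])
w = solve_w_sketch(v, X1, X0)
```

A general-purpose NLP solver is far slower per call than a dedicated QP solver, so this form is only suitable for checking small problems by hand.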
Outer V-weight problem#
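The \(v\)-weights minimize an outcome MSE over a pre-treatment window \(\mathcal{T}\) plus the L1 penalty on \(v\):
\[\min_{v \in \mathbb{R}^P_{\ge 0},\; v_1 = 1}\; L(v; \lambda) = \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} \bigl(Y_{1, t} - Y_{0, t}^\top w^*(v)\bigr)^2 + \lambda \|v\|_1.\]
The \(v_1 = 1\) anchor is what prevents the trivial all-zero solution at any \(\lambda > 0\): without it the MSE term is invariant to positive rescaling of \(v\) (the inner solution satisfies \(w^*(c v) = w^*(v)\) for any \(c > 0\)), so the L1 penalty would push every component toward zero (Vives 2023, Appendix 6.1).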
The window \(\mathcal{T}\) is set by outer_loss_window:
"training" (default): \(\mathcal{T} = \{1, \dots, T_0^{\text{tr}}\}\). Matches the unpublished MATLAB driver sparse_synth.m and reproduces the Prop 99 estimates Vives reports in the empirical section.
"validation": \(\mathcal{T} = \{T_0^{\text{tr}} + 1, \dots, T_0\}\). Matches the page-5 \(L_V\) definition in Vives’s Algorithm 1 literally; useful for ablations but produces notably worse in-sample fit than the training variant.
Each evaluation of the outer objective invokes the inner QP, so the
outer problem is a smooth bound-constrained NLP solved with
L-BFGS-B (scipy.optimize).
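The two-level structure can be sketched end to end on a toy problem. In this hedged example the inner solve uses SLSQP as a stand-in for Clarabel, SciPy's built-in finite-difference gradient drives L-BFGS-B, and the names inner_w and outer_loss_sketch as well as the simulated data are assumptions for illustration, not mlsynth internals.

```python
import numpy as np
from scipy.optimize import minimize


def inner_w(v, X1, X0):
    """Inner simplex QP for the donor weights (SLSQP stand-in)."""
    N = X0.shape[1]
    H = X0.T @ (v[:, None] * X0) + 1e-10 * np.eye(N)
    q = -2.0 * (X0.T @ (v * X1))
    res = minimize(lambda w: w @ H @ w + q @ w, np.full(N, 1.0 / N),
                   jac=lambda w: 2.0 * (H @ w) + q, method="SLSQP",
                   bounds=[(0.0, 1.0)] * N,
                   constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
                   options={"ftol": 1e-12, "maxiter": 500})
    w = np.clip(res.x, 0.0, None)
    return w / w.sum()


def outer_loss_sketch(v2, X1, X0, Z1, Z0, lam):
    """Outcome MSE over the chosen window plus lam * ||v||_1, with v_1 pinned to 1."""
    v = np.concatenate(([1.0], v2))
    w = inner_w(v, X1, X0)
    resid = Z1 - Z0 @ w
    return np.mean(resid ** 2) + lam * np.sum(np.abs(v))


# Simulated toy data: the treated unit is close to 0.6 * donor0 + 0.4 * donor1.
rng = np.random.default_rng(0)
P, N, T_win = 6, 4, 10
w_true = np.array([0.6, 0.4, 0.0, 0.0])
X0 = rng.normal(size=(P, N))
X1 = X0 @ w_true + 0.05 * rng.normal(size=P)
Z0 = rng.normal(size=(T_win, N))
Z1 = Z0 @ w_true + 0.05 * rng.normal(size=T_win)

lam = 0.01
v2_init = np.ones(P - 1)
loss_init = outer_loss_sketch(v2_init, X1, X0, Z1, Z0, lam)

# Bound-constrained outer solve: v_2 >= 0, anchor handled inside the loss.
fit = minimize(outer_loss_sketch, v2_init, args=(X1, X0, Z1, Z0, lam),
               method="L-BFGS-B", bounds=[(0.0, None)] * (P - 1),
               options={"maxiter": 30})
v_hat = np.concatenate(([1.0], fit.x))
```

Only the free components v_2 are passed to the optimizer; the anchor is re-attached inside the loss, which is one simple way to impose \(v_1 = 1\).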
Gradient computation#
L-BFGS-B needs gradients of the outer objective in \(v\). Two
modes are available, controlled by use_analytical_grad:
False (default): central-difference numerical gradient. Each outer step pays \(2(P-1)\) inner-QP solves.
True: closed-form gradient via the envelope theorem applied at the inner optimum \(w^*(v)\). With active set \(\mathcal{A} = \{i : w_i^* > 0\}\), one \((|\mathcal{A}| + 1) \times (|\mathcal{A}| + 1)\) Cholesky on the reduced KKT matrix yields all \(P - 1\) gradient components in \(O(P |\mathcal{A}|)\) work:
\[\frac{\partial L}{\partial v_k} = -\frac{4}{|\mathcal{T}|}\, r_k \cdot \bigl(X_0[k, \mathcal{A}]\, z\bigr) + \lambda,\]
where \(r_k = X_{1k} - X_0[k, \mathcal{A}] w_{\mathcal{A}}^*\) is the predictor-\(k\) residual at the inner optimum, \(r_{\text{outer}} = Z_1 - Z_0 w^*\) is the outcome residual over the window \(\mathcal{T}\) (with \(Z_1\) and \(Z_0\) the treated and donor outcomes restricted to \(\mathcal{T}\)), \(H = X_0^\top \mathrm{diag}(v) X_0\), and \(z\) solves
\[\begin{split}\begin{pmatrix} 2 H_{\mathcal{A}\mathcal{A}} & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{pmatrix} \begin{pmatrix} z \\ \mu_z \end{pmatrix} = \begin{pmatrix} Z_0[:, \mathcal{A}]^\top r_{\text{outer}} \\ 0 \end{pmatrix}.\end{split}\]
The analytical gradient is exact (verified against central FD to ~1e-7 at random interior points). It yields a ~5–10× speedup on the outer sweep, but the cleaner gradient lets L-BFGS-B settle at the first critical point near the cold init on the non-convex L1-penalized V-objective. The FD path’s implicit gradient noise tends to find better local optima at non-zero lambda, so the default is FD for correctness. Opt in to the analytical path when running large placebo sweeps where throughput matters more than exact local-optimum reproducibility.
When use_analytical_grad = True, the L-BFGS-B ftol auto-tightens to 1e-12 because the clean gradient converges in many fewer iterations and the default 1e-8 terminates the loop before convergence.
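The gradient formula above can be checked numerically on a small case. The sketch below makes a simplifying assumption: every donor is treated as active, so the inner problem reduces to an equality-constrained QP solved directly from its KKT system (weights may then be negative, which the real simplex QP forbids). The names inner_kkt and loss_and_grad_sketch and the random data are hypothetical.

```python
import numpy as np


def inner_kkt(v, X1, X0):
    """Solve min_w w'Hw - 2g'w s.t. 1'w = 1 through its KKT system."""
    N = X0.shape[1]
    H = X0.T @ (v[:, None] * X0)
    g = X0.T @ (v * X1)
    K = np.zeros((N + 1, N + 1))
    K[:N, :N] = 2.0 * H
    K[:N, N] = 1.0
    K[N, :N] = 1.0
    w = np.linalg.solve(K, np.concatenate((2.0 * g, [1.0])))[:N]
    return w, K


def loss_and_grad_sketch(v2, X1, X0, Z1, Z0, lam):
    """Outer loss and its closed-form gradient w.r.t. v_2 (all donors active)."""
    v = np.concatenate(([1.0], v2))
    w, K = inner_kkt(v, X1, X0)
    N, T_win = X0.shape[1], Z1.shape[0]
    r_outer = Z1 - Z0 @ w
    loss = r_outer @ r_outer / T_win + lam * np.sum(np.abs(v))
    # Adjoint solve: K [z; mu_z] = [Z0' r_outer; 0]
    z = np.linalg.solve(K, np.concatenate((Z0.T @ r_outer, [0.0])))[:N]
    r = X1 - X0 @ w                      # predictor residuals r_k
    grad = -(4.0 / T_win) * r[1:] * (X0[1:] @ z) + lam
    return loss, grad


rng = np.random.default_rng(1)
P, N, T_win = 6, 3, 8
X0, X1 = rng.normal(size=(P, N)), rng.normal(size=P)
Z0, Z1 = rng.normal(size=(T_win, N)), rng.normal(size=T_win)
v2 = rng.uniform(0.5, 1.5, size=P - 1)
lam = 0.1

loss, grad = loss_and_grad_sketch(v2, X1, X0, Z1, Z0, lam)

# Central finite differences, one coordinate of v_2 at a time.
h = 1e-6
fd = np.array([
    (loss_and_grad_sketch(v2 + h * e, X1, X0, Z1, Z0, lam)[0]
     - loss_and_grad_sketch(v2 - h * e, X1, X0, Z1, Z0, lam)[0]) / (2.0 * h)
    for e in np.eye(P - 1)
])
max_abs_err = np.max(np.abs(grad - fd))
```

The same adjoint vector z is reused for every coordinate, which is why the closed form costs one linear solve instead of \(2(P-1)\) inner solves.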
Lambda selection#
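The penalty \(\lambda\) is selected by the unpenalized validation-block outcome MSE:
\[\hat\lambda = \arg\min_{\lambda \in \Lambda}\; \frac{1}{T_0 - T_0^{\text{tr}}} \sum_{t = T_0^{\text{tr}} + 1}^{T_0} \bigl(Y_{1, t} - Y_{0, t}^\top w^*(v^*(\lambda))\bigr)^2,\]
where \(v^*(\lambda)\) solves the outer problem at penalty \(\lambda\).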
The default grid is \(\Lambda = \{0\} \cup \text{logspace}(10^{-4}, 1, 50)\). Setting \(\lambda = 0\) recovers the unpenalized data-driven SCM with a unit-anchored first predictor.
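A minimal sketch of the grid and the selection rule follows. The helper names lambda_grid_sketch and select_lambda_sketch are hypothetical, and the validation curve is a made-up placeholder standing in for the per-\(\lambda\) validation MSEs a real sweep would produce.

```python
import numpy as np


def lambda_grid_sketch(size=51):
    """{0} followed by size - 1 log-spaced values from 1e-4 to 1."""
    return np.concatenate(([0.0], np.logspace(-4, 0, size - 1)))


def select_lambda_sketch(grid, val_mse):
    """Pick the grid point with the smallest unpenalized validation MSE."""
    idx = int(np.argmin(val_mse))
    return grid[idx], idx


grid = lambda_grid_sketch()
# Placeholder curve (not real output): smallest around lambda ~ 3e-3.
toy_val_mse = (np.log10(grid + 1e-6) + 2.5) ** 2
best_lambda, best_idx = select_lambda_sketch(grid, toy_val_mse)
```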
Predictor selection#
As \(\lambda\) grows, the L1 penalty drives uninformative \(v_p\) to exactly zero; the corresponding predictor effectively drops out of the fit. The selected predictor set is
\[\hat{\mathcal{S}} = \{p : \hat v_p > 0\}, \qquad \hat v = v^*(\hat\lambda).\]
This is what makes the method Sparse SC: the explanation of the treated unit’s pre-trajectory is interpretable in terms of a small subset of predictors.
ATT and Counterfactual#
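With \(\hat v = v^*(\hat\lambda)\) and \(\hat w = w^*(\hat v)\) recovered on the full pre-period, the counterfactual and ATT are
\[\hat Y_{1, t}(0) = Y_{0, t}^\top \hat w, \qquad \widehat{\mathrm{ATT}} = \frac{1}{T - T_0} \sum_{t = T_0 + 1}^{T} \bigl(Y_{1, t} - Y_{0, t}^\top \hat w\bigr),\]
where \(T\) is the total number of periods.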
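In array form the counterfactual, gap, and ATT are a few lines of NumPy. The sketch below uses the shapes documented for the results container (Y0 is time by donors, the pre-period is the first T0_total rows); the function name att_sketch and the toy numbers are assumptions for illustration.

```python
import numpy as np


def att_sketch(Y1, Y0, w, T0_total):
    """Counterfactual Y0 @ w, gap, mean post-period gap, and pre-period RMSE."""
    counterfactual = Y0 @ w
    gap = Y1 - counterfactual
    att = gap[T0_total:].mean()                      # average post-treatment gap
    pre_rmse = np.sqrt(np.mean(gap[:T0_total] ** 2))
    return counterfactual, gap, att, pre_rmse


# Two donors, four periods, treatment starts at index 2.
Y0 = np.array([[10.0, 20.0],
               [12.0, 22.0],
               [14.0, 24.0],
               [16.0, 26.0]])
w = np.array([0.5, 0.5])
Y1 = np.array([15.0, 17.0, 16.0, 18.0])
counterfactual, gap, att, pre_rmse = att_sketch(Y1, Y0, w, T0_total=2)
```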
Conformal ATT inference (default)#
Inference defaults to a moving-block conformal CI for the ATT, following the philosophy of Chernozhukov, Wuethrich and Zhu (2021): treat the in-sample residuals as a calibration sample of what “noise” should look like under the no-treatment null, and invert a permutation test in \(\theta\) to bracket the ATT.
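Define the residual series \(e_t = Y_{1, t} - Y_{0, t}^\top \hat w\). The calibration set is
\[e^{\text{calib}} = \{e_t : t \in \mathcal{C}\}, \qquad \mathcal{C} = \begin{cases} \{T_0^{\text{tr}} + 1, \dots, T_0\} & \text{validation block (default)}, \\ \{1, \dots, T_0\} & \text{full pre-period}, \end{cases}\]
selected by conformal_window ("validation" or "pre").
The validation block is genuinely out-of-sample under the chosen \(v\); the full pre-block gives a larger calibration sample but its training-block residuals are in-sample under \(v\).
The conformity score for a block \(B\) of size \(b = \max(3, \lfloor\sqrt{T - T_0}\rfloor)\) is
\[S(B) = \frac{1}{b} \sum_{t \in B} |e_t|,\]
and the calibration distribution is built by sliding the block across \(e^{\text{calib}}\) (with wrap-around blocks for boundary coverage). The post-treatment test statistic at the candidate ATT \(\theta\) is
\[S_{\text{post}}(\theta) = \frac{1}{T - T_0} \sum_{t = T_0 + 1}^{T} |e_t - \theta|.\]
The \((1 - \alpha)\) conformal CI is
\[\mathrm{CI}_{1 - \alpha} = \bigl\{\theta : S_{\text{post}}(\theta) \le q_{1 - \alpha}\bigr\},\]
where \(q_{1 - \alpha}\) is the \((1 - \alpha)\)-quantile of the calibration scores, which we compute by grid search over a generous neighbourhood of \(\widehat{\mathrm{ATT}}\). The two-sided p-value for \(H_0 : \mathrm{ATT} = 0\) is
\[p = \frac{1}{|\mathcal{B}|} \sum_{B \in \mathcal{B}} \mathbf{1}\bigl\{S(B) \ge S_{\text{post}}(0)\bigr\},\]
with \(\mathcal{B}\) the set of calibration blocks.
Pointwise per-period bands use the \((1 - \alpha)\)-quantile of the calibration scores directly:
\[\bigl[e_t - q_{1 - \alpha},\; e_t + q_{1 - \alpha}\bigr], \qquad t = T_0 + 1, \dots, T.\]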
This inferential procedure trades the cross-donor exchangeability assumption of Vives’s placebo (every donor is equally likely to be the treated unit) for a within-unit exchangeability assumption on the residuals (validation-period residuals look like the no-treatment counterfactual’s noise). On Prop 99 the conformal 95% CI is typically \([-20, -18]\) versus the placebo’s much wider bounds, because conformal leverages the actual model’s residual structure rather than donor-level heterogeneity.
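The moving-block construction can be sketched as follows. This is a simplified illustration under stated assumptions: the score is the block mean of absolute residuals, the grid half-width is six calibration standard deviations, and the function name conformal_ci_sketch and the toy residuals are invented for the example. It is not a copy of conformal_inference.

```python
import numpy as np


def conformal_ci_sketch(gap, T0_train, T0_total, alpha=0.05, grid_size=401):
    """Moving-block conformal CI for the ATT from validation-block residuals."""
    calib = gap[T0_train:T0_total]
    post = gap[T0_total:]
    n = len(calib)
    b = min(max(3, int(np.floor(np.sqrt(len(post))))), n)   # block size

    # All n wrap-around blocks of length b, scored by mean absolute residual.
    idx = (np.arange(n)[:, None] + np.arange(b)[None, :]) % n
    scores = np.abs(calib)[idx].mean(axis=1)
    q = np.quantile(scores, 1.0 - alpha)

    # Invert the test on a grid of candidate ATT values around the estimate.
    att = post.mean()
    half_width = 6.0 * np.std(calib, ddof=1)                 # assumption
    thetas = np.linspace(att - half_width, att + half_width, grid_size)
    stat = np.abs(post[None, :] - thetas[:, None]).mean(axis=1)
    accepted = thetas[stat <= q]
    if accepted.size == 0:
        return att, np.nan, np.nan, q
    return att, accepted.min(), accepted.max(), q


# Toy residual series: 4 training, 6 validation, 12 post-treatment periods.
train_part = np.zeros(4)
calib_part = np.array([1.0, -1.0, 2.0, -2.0, 1.5, -1.5])
post_part = -19.0 + np.tile([0.5, -0.5], 6)
gap = np.concatenate((train_part, calib_part, post_part))
att, ci_lower, ci_upper, q = conformal_ci_sketch(gap, T0_train=4, T0_total=10)
```

In this toy series the interval half-width is close to the calibration quantile q, because the post-period residuals are tightly clustered around the ATT.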
Abadie-style placebo (opt-in)#
Set inference_method="placebo" to recover Vives’s procedure.
For each donor \(j\), swap that donor into the treated slot,
remove it from the donor pool, refit SparseSC at the already-
selected \(\hat\lambda\) (or, optionally, re-run the full
\(\lambda\) sweep) and record the placebo ATT. The two-sided
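permutation p-value is
\[p = \frac{1}{B} \sum_{j = 1}^{B} \mathbf{1}\bigl\{|\widehat{\mathrm{ATT}}_j| \ge |\widehat{\mathrm{ATT}}|\bigr\},\]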
where \(B\) is the number of completed placebos. Re-using
\(\hat\lambda\) makes the placebo loop tractable; set
placebo_resweep=True to re-select \(\lambda\) for every
placebo (much slower).
Predictor Convention#
Like every other mlsynth estimator, SparseSC is fed a single
long-format df with one row per (unit, time). Predictors are
constructed under the hood from the same frame, in two flavors:
covariates – column names in df whose per-unit pre-treatment mean is taken as the predictor value. Time-invariant unit characteristics collapse trivially; time-varying covariates are summarized by the pre-period mean.
outcome_lag_periods – specific pre-treatment time labels (as found in the time column) whose outcome values become additional predictor rows. These are the canonical Abadie, Diamond & Hainmueller (2010) lagged-outcome predictors (e.g., the smk_75, smk_80, smk_88 rows in the Prop 99 example).
The two lists are concatenated to form the predictor matrix; the
first predictor (first entry of covariates if any, otherwise
the first outcome lag) is the anchor whose \(v\)-weight is
fixed at 1. The anchor choice matters in finite samples — Vives
recommends picking a predictor known to be informative, or
treating the anchor as a hyperparameter and sweeping it (Vives
2023, Appendix 6.1).
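The predictor convention can be illustrated with a short pandas sketch. The function build_predictors_sketch, the column names, and the toy panel are hypothetical; the real preparation step (prepare_sparse_sc_inputs) also builds the outcome panel and the train/validation split, which this example omits.

```python
import numpy as np
import pandas as pd


def build_predictors_sketch(df, outcome, treat, unitid, time,
                            covariates, outcome_lag_periods):
    """Predictor-by-unit matrix: covariate pre-means, then outcome lags."""
    t_start = df.loc[df[treat] == 1, time].min()   # first treated period
    pre = df[df[time] < t_start]

    rows = {}
    for c in covariates:                           # per-unit pre-treatment means
        rows[c] = pre.groupby(unitid)[c].mean()
    wide = pre.pivot(index=time, columns=unitid, values=outcome)
    for p in outcome_lag_periods:                  # outcome at specific periods
        rows[f"{outcome}_{p}"] = wide.loc[p]

    X = pd.DataFrame(rows).T                       # rows = predictors, cols = units
    # Scale each row to unit sample sd across units (constant rows not handled).
    return X.div(X.std(axis=1, ddof=1), axis=0)


# Toy long-format panel: 3 units, 4 periods, unit "A" treated from period 4.
records = []
for i, unit in enumerate(["A", "B", "C"]):
    for t in [1, 2, 3, 4]:
        records.append({"unit": unit, "year": t,
                        "y": 10.0 * (i + 1) + t,
                        "x": 2.0 * (i + 1) + 0.1 * t,
                        "treated": int(unit == "A" and t >= 4)})
df = pd.DataFrame(records)

X = build_predictors_sketch(df, "y", "treated", "unit", "year",
                            covariates=["x"], outcome_lag_periods=[1, 3])
```

Because covariates are listed first, the first row of X (here "x") is the one that would serve as the anchor.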
Performance notes#
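The single biggest cost in a SparseSC fit is the inner W-weight QP, which is invoked roughly
\[|\Lambda| \times n_{\text{outer}} \times g\]
times, where \(|\Lambda|\) is the size of the lambda grid, \(n_{\text{outer}}\) is the number of outer L-BFGS-B steps per grid point, \(g = 2(P-1)\) under finite-difference gradients
and \(g = 1\) under the analytical gradient. For Vives’s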
augmented k=40 spec that’s ~50,000 inner-QP calls just for the
fit, plus another \(B \approx 38\) placebos under
inference_method="placebo". Two optimizations in this build
matter:
Direct Clarabel removes CVXPY canonicalization (~30-60× per call). Speedup applies universally; no correctness tradeoff.
Analytical gradient (opt-in via
use_analytical_grad=True) removes the \(2(P-1)\) finite-difference factor (~5-10× on the outer loop). Tradeoff: the cleaner gradient can settle in worse local optima of the non-convex L1-penalized outer objective; FD’s implicit gradient noise tends to escape them. Default off for correctness.
Empirically, the combination puts the canonical ADH-7 California Prop 99 fit at ~5 s with analytical gradient and ~23 s with FD (versus a CVXPY+SCS baseline that would hang for minutes on the augmented k=40 spec).
Core API#
Sparse Synthetic Control (SparseSC) estimator.
Implements the L1-penalized predictor-weighting SCM of Vives-i-Bastida (2023, Predictor Selection for Synthetic Controls), applied to the canonical Abadie, Diamond, and Hainmueller (2010) framework.
The estimator has a two-level structure. The inner problem is the
standard SCM simplex QP that picks donor weights w given a fixed
diagonal predictor-importance matrix diag(v). The outer problem
picks the V-weights themselves by minimizing a pre-treatment
outcome MSE plus an L1 penalty on v. The penalty parameter is
selected by the unpenalized validation MSE. The first V-weight is
pinned to 1 to anchor the scale; the others are bound-constrained
non-negative.
Compared with canonical SCM, the L1 penalty yields interpretable
predictor selection: as lambda increases, V-weights collapse to
zero on uninformative predictors, leaving a sparse explanation of the
fit.
The block over which the outer MSE is evaluated is set by
outer_loss_window. "training" (the SparseSCConfig default) matches
the unpublished MATLAB driver sparse_synth.m; "validation" matches
Algorithm 1 of the paper.
- class mlsynth.estimators.sparse_sc.SparseSC(config: SparseSCConfig | dict)#
Bases: object
L1-penalized Sparse Synthetic Control estimator.
- Parameters:
config (SparseSCConfig or dict) – Configuration object. See mlsynth.config_models.SparseSCConfig.
- Returns:
SparseSCResults – Typed container with the selected V- and W-weights, the validation-MSE curve over the lambda grid, the counterfactual, and the inference results (conformal by default, Abadie placebo on request).
Notes
Predictors are supplied through covariates (columns in df whose per-unit pre-treatment mean becomes one predictor row) and/or outcome_lag_periods (specific pre-treatment time labels whose outcome values become predictor rows, i.e. the canonical ADH lagged-outcome predictors). The first predictor is the “anchor” whose V-weight is fixed at 1.
Examples
>>> import pandas as pd
>>> from mlsynth import SparseSC
>>> df = pd.read_csv("smoking_long.csv")
>>> res = SparseSC({
...     "df": df, "outcome": "cigsale",
...     "treat": "Proposition 99", "unitid": "state", "time": "year",
...     "covariates": ["p_cig", "loginc", "pct15-24", "pc_beer"],
...     "outcome_lag_periods": [1975, 1980, 1988],
...     "display_graphs": False,
... }).fit()
>>> res.att
-19.5...
- fit() SparseSCResults#
Run the lambda sweep, recover W-weights, and return results.
Configuration#
- class mlsynth.config_models.SparseSCConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', covariates: ~typing.List[str] | None = None, outcome_lag_periods: ~typing.List[~typing.Any] | None = None, T0_train: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=2)] = None, lambda_grid: ~typing.List[float] | None = None, standardize: bool = True, outer_loss_window: str = 'training', solver: ~typing.Any = None, max_outer_iter: ~typing.Annotated[int, ~annotated_types.Ge(ge=10)] = 500, run_inference: bool = True, inference_method: ~typing.Literal['conformal', 'placebo', 'none'] = 'conformal', conformal_window: ~typing.Literal['validation', 'pre'] = 'validation', alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05, n_placebo: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=1)] = None, placebo_resweep: bool = False, seed: int = 1400, use_analytical_grad: bool = False, warm_start: bool = False)#
Configuration for the Sparse Synthetic Control (SparseSC) estimator.
Implements the L1-penalized predictor-weighting SCM variant of Vives-i-Bastida and collaborators (port of the MATLAB sparse_synth.m driver) for the canonical Abadie, Diamond, and Hainmueller (2010) framework.
Like every other mlsynth estimator this one is fed a single long-format df with one row per (unit, time). Predictors are constructed under the hood from the long frame: each column listed in covariates is collapsed to its pre-treatment mean per unit, and each entry of outcome_lag_periods adds the outcome at that specific pre-treatment period as a predictor.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Helper Modules#
Panel and predictor preparation for SparseSC.
The estimator takes a single long-format df (one row per
(unit, time)) and constructs both the outcome panel and the
unit-by-predictor matrix internally. Predictors come from two
sources:
covariates – columns in df whose per-unit pre-treatment mean becomes one predictor row.
outcome_lag_periods – specific pre-treatment time labels whose outcome values become additional predictor rows (the canonical Abadie, Diamond & Hainmueller (2010) lagged-outcome predictors).
The first predictor (first entry of covariates if any, otherwise
the first outcome lag) is the anchor whose V-weight is fixed at 1.
- mlsynth.utils.sparse_sc_helpers.setup.prepare_sparse_sc_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, covariates: Sequence[str] | None = None, outcome_lag_periods: Sequence[Any] | None = None, T0_train: int | None = None, standardize: bool = True) SparseSCInputs#
Build SparseSC inputs from a single long-format panel.
- Parameters:
df (pd.DataFrame) – Long-format balanced panel: one row per (unit, time) with the outcome, a binary treatment indicator, and any covariates.
outcome, treat, unitid, time (str) – Column names in df.
covariates (Sequence[str], optional) – Columns in df whose per-unit pre-treatment mean becomes a predictor row. The first covariate is the anchor (V-weight pinned to 1).
outcome_lag_periods (Sequence, optional) – Specific pre-treatment time labels whose outcome values become additional predictor rows. Appended after covariates.
T0_train (int, optional) – End of the training block within the pre-period (exclusive). Defaults to floor(T0_total * 0.75), a 75/25 split.
standardize (bool) – Standardize each predictor row by its sample standard deviation across all units. Default True.
Inner W-weight QP for SparseSC.
Given V-weights v (a vector of predictor importance), the donor
weights w solve the canonical SCM simplex QP
- min over w >= 0, sum(w) = 1:
w' X0' diag(v) X0 w - 2 X1' diag(v) X0 w.
The inner QP is called O(grid x outer_iters x P) times during a SparseSC fit, so its per-call cost dominates the total wall time. We solve it by calling Clarabel directly (skipping CVXPY’s canonicalization layer); CVXPY adds ~10-50 ms of parsing overhead per call on a 39-donor problem, while the underlying solve is microseconds.
A tiny ridge is added to the quadratic term for numerical stability: when the augmented Vives k~40 specification is used the donor design matrix can be rank-deficient (more predictors than donors), which makes H = X0’ diag(v) X0 numerically singular under any QP solver. The ridge is scaled by the trace of H so it is invariant to the units of v and X0.
The solver argument is retained for API compatibility but no longer
selects between back-ends; Clarabel is used unconditionally.
- mlsynth.utils.sparse_sc_helpers.inner.solve_w(v: ndarray, X1: ndarray, X0: ndarray, solver: Any = None) ndarray#
Return the donor-weight vector w on the simplex.
- Parameters:
v (np.ndarray) – Length-P V-weight vector.
X1 (np.ndarray) – Length-P treated predictor vector.
X0 (np.ndarray) – (P, N) donor predictor matrix.
solver (Any, optional) – Unused. Retained for backwards compatibility with the previous CVXPY-based interface.
Outer V-objective for SparseSC, with closed-form gradient.
The outer objective is
L(v_2; lam) = (1/T_outer) * ||Z1 - Z0 w*(v)||^2 + lam * ||v||_1
where v = [1; v_2] (the first predictor is pinned at 1 to break the
positive-scale invariance of the inner simplex QP, as argued in
Vives-i-Bastida (2023) Appendix 6.1) and w*(v) solves the inner
simplex QP
min_w w' H(v) w - 2 g(v)' w s.t. 1' w = 1, w >= 0,
with H(v) = X0' diag(v) X0 and g(v) = X0' diag(v) X1.
Two outer windows are supported, controlled at call sites via the
Z1, Z0 arguments: pass the validation block to match Algorithm 1
in the paper; pass the training block to match the MATLAB driver.
This module provides three callables:
outer_loss – the loss alone (kept for back-compat).
selection_mse – the unpenalised validation-block MSE used to choose lambda.
outer_loss_and_grad – (loss, grad) with the closed-form envelope-theorem gradient.
The closed-form gradient avoids the 2(P-1)-evaluation finite-
difference cost that L-BFGS-B otherwise incurs per outer step. The
derivation is in the module docstring of optimization.py.
- mlsynth.utils.sparse_sc_helpers.objective.outer_loss(v2: ndarray, X1: ndarray, X0: ndarray, Z1: ndarray, Z0: ndarray, lam: float, solver: Any = None) float#
Outer V-objective: mean((Z1 - Z0 w(v))^2) + lam * ||v||_1.
Pass the validation block to match the paper’s Algorithm 1; pass the training block to match the MATLAB driver.
- mlsynth.utils.sparse_sc_helpers.objective.outer_loss_and_grad(v2: ndarray, X1: ndarray, X0: ndarray, Z1: ndarray, Z0: ndarray, lam: float, solver: Any = None, active_tol: float = 1e-07) Tuple[float, ndarray]#
Return (loss, grad) of the outer V-objective w.r.t. v_2.
Implements the envelope-theorem gradient described in this module’s docstring. The active set is recovered from the inner solution by thresholding on active_tol. A single (|A|+1) x (|A|+1) KKT solve gives the adjoint, after which all P-1 gradient components are computed in O(P |A|) work.
The L1 part contributes +lam per coordinate (right-derivative under the v_2 >= 0 bound L-BFGS-B already enforces).
Falls back to lstsq if the reduced KKT matrix is numerically singular (which can happen when the same donor appears at multiple predictor rows).
- mlsynth.utils.sparse_sc_helpers.objective.selection_mse(v2: ndarray, X1: ndarray, X0: ndarray, Z1_val: ndarray, Z0_val: ndarray, solver: Any = None) float#
Unpenalised validation-block MSE used to select lambda.
- mlsynth.utils.sparse_sc_helpers.objective.training_loss(v2: ndarray, X1: ndarray, X0: ndarray, Z1: ndarray, Z0: ndarray, lam: float, solver: Any = None) float#
Outer V-objective: mean((Z1 - Z0 w(v))^2) + lam * ||v||_1.
Pass the validation block to match the paper’s Algorithm 1; pass the training block to match the MATLAB driver.
- mlsynth.utils.sparse_sc_helpers.objective.validation_mse(v2: ndarray, X1: ndarray, X0: ndarray, Z1_val: ndarray, Z0_val: ndarray, solver: Any = None) float#
Unpenalised validation-block MSE used to select lambda.
Lambda sweep + V-weight optimisation for SparseSC.
For each lambda on the grid the outer V-weight problem is a smooth
bound-constrained nonlinear program (v_2 >= 0) solved with
scipy.optimize.minimize (L-BFGS-B). The selected lambda is the
value minimising the unpenalised validation-block MSE.
Two performance refinements over a naive implementation are in place:
Closed-form gradient (Vives’s Algorithm 1 outer objective is smooth in v away from the L1 kink; the L1 part has a trivial right-derivative under the v_2 >= 0 bound L-BFGS-B already enforces). Without this, L-BFGS-B falls back to a 2(P-1)-evaluation central finite-difference per outer step, which is the dominant cost on large predictor sets. The closed-form gradient is implemented in objective.outer_loss_and_grad via the envelope theorem: at the inner optimum w*(v), one (|A|+1) x (|A|+1) Cholesky on the active-set KKT matrix produces all P-1 gradient components.
Warm starts across the lambda grid. The path is monotone in lambda, so the V-solution at lambda_i is a good initialiser for lambda_{i+1}. A failed warm start falls back to the cold MATLAB init default_v20.
The outer V-objective window is controlled by outer_loss_window:
"validation" (default of sweep_lambda, paper) – outer V minimises validation-block MSE + lambda * ||V||_1. Matches Vives-i-Bastida (2023) Algorithm 1.
"training" – outer V minimises training-block MSE + lambda * ||V||_1. Matches the unpublished MATLAB driver sparse_synth.m. This is the default at the SparseSCConfig level.
- mlsynth.utils.sparse_sc_helpers.optimization.default_lambda_grid(size: int = 51) ndarray#
Return [0, logspace(-4, 0, size - 1)] (matches MATLAB).
- mlsynth.utils.sparse_sc_helpers.optimization.default_v20(X0: ndarray) ndarray#
MATLAB starting v_2 = (sd_1 / sd_k)^2 for k > 1.
- mlsynth.utils.sparse_sc_helpers.optimization.recover_w(v: ndarray, X1: ndarray, X0: ndarray, solver: Any = None) ndarray#
Final donor-weight recovery at the selected V-weights.
- mlsynth.utils.sparse_sc_helpers.optimization.sweep_lambda(X1: ndarray, X0: ndarray, Y1: ndarray, Y0: ndarray, T0_total: int, T0_train: int, lambda_grid: ndarray | None = None, solver: Any = None, max_outer_iter: int = 500, ftol: float | None = None, outer_loss_window: str = 'validation', use_analytical_grad: bool = False, warm_start: bool = False, multi_start: int = 1) Tuple[ndarray, float, ndarray, ndarray, ndarray, ndarray]#
Sweep lambda and return the best V-weights.
- Parameters:
outer_loss_window ({“validation”, “training”}) – Which pre-treatment block the outer V-objective evaluates the outcome MSE over.
use_analytical_grad (bool, default False) – Use the envelope-theorem closed-form gradient inside L-BFGS-B. When False, the finite-difference gradient is used instead (~20-50x slower on the augmented Vives spec).
warm_start (bool, default False) – Reuse the previous lambda’s V-solution as the initialiser for the next lambda. Falls back to the cold MATLAB init if a warm-started fit appears to fail.
- Returns:
optv (np.ndarray) – Final V-weights, shape (P,) with optv[0] = 1.
opt_lambda (float) – Lambda value selected on the validation MSE.
grid (np.ndarray) – Lambda grid actually used.
outer_curve (np.ndarray) – Penalised outer objective at each grid point.
val_curve (np.ndarray) – Unpenalised validation MSE at each grid point (selection target).
v_path (np.ndarray) – Per-grid-point V-weights, shape (len(grid), P).
Inference procedures for SparseSC.
Two methods are implemented:
run_placebo – the Abadie-style placebo permutation. For each donor we treat that donor as the placebo treated unit, refit SparseSC at the already-selected lambda, and compare the observed ATT against the distribution of placebo ATTs.
conformal_inference – a moving-block conformal CI in the spirit of Chernozhukov, Wuethrich and Zhu (2021), adapted to the SparseSC pre / validation / post panel layout. Calibration residuals come from either the validation block (default: smallest sample but truly out-of-sample under V) or the entire pre-treatment block (larger sample, but training residuals are in-sample under V). The ATT CI is obtained by inverting a moving-block test of the form mean(|e_post - theta|) <= q_{1-alpha} of the calibration distribution; pointwise per-period bands use the same q_{1-alpha} quantile directly.
- mlsynth.utils.sparse_sc_helpers.inference.conformal_inference(gap: ndarray, T0_train: int, T0_total: int, T: int, conformal_window: str = 'validation', alpha: float = 0.05, block_size: int | None = None, grid_size: int = 401, grid_half_width_se: float = 6.0) dict#
Conformal ATT confidence interval from in-sample residuals.
- Parameters:
gap (np.ndarray) – Full-period residual Y1 - Y0 @ w, shape (T,). The pre-treatment portion (gap[:T0_total]) is interpreted as noise under the no-treatment null; gap[T0_total:] is the post-treatment effect-plus-noise sequence.
T0_train, T0_total, T (int) – Training-block end / pre-block end / full length. Pre = [0, T0_total), validation = [T0_train, T0_total), post = [T0_total, T).
conformal_window ({“validation”, “pre”}) – Which residual block to use for calibration. "validation" uses only gap[T0_train:T0_total] (truly out-of-sample under the chosen V); "pre" uses the entire gap[:T0_total].
alpha (float) – Two-sided significance level.
block_size (int, optional) – Moving-block size for the conformity score. Defaults to max(3, sqrt(n_post)), matching LEXSCM.
grid_size (int) – Number of theta candidates in the grid search for the ATT CI.
grid_half_width_se (float) – The grid spans [ATT_hat +/- grid_half_width_se * SE] where SE is a plug-in standard error from the calibration residuals.
- Returns:
dict – Keys: method, att_observed, ci_lower, ci_upper, p_value, calibration_residuals, pointwise_lower, pointwise_upper, alpha.
- mlsynth.utils.sparse_sc_helpers.inference.run_placebo(Y0: ndarray, Y1: ndarray, X0: ndarray, X1: ndarray, T0_total: int, T0_train: int, selected_lambda: float, observed_att: float, solver: Any = None, resweep: bool = False, lambda_grid: ndarray | None = None, n_placebo: int | None = None, seed: int = 1400, outer_loss_window: str = 'validation') Tuple[ndarray, float, int]#
Return (placebo_atts, p_value, n_completed).
- Parameters:
Y0, Y1, X0, X1 (np.ndarray) – Full pre-standardized panel + predictor matrices.
T0_total, T0_train (int) – Pre-treatment window bounds.
selected_lambda (float) – Lambda chosen on the actual treated unit. Reused for each placebo when resweep=False (default).
observed_att (float) – ATT of the actual treated unit, used to construct the p-value.
resweep (bool) – If True, re-run the full lambda grid for each placebo. Slow.
lambda_grid (np.ndarray, optional) – Grid for the resweep case.
n_placebo (int, optional) – Subsample of donors to use as placebos. None uses every donor.
seed (int) – Seed for the subsample when n_placebo < N.
Plot helper for SparseSC.
Wraps mlsynth.utils.resultutils.plot_estimates() so the
observed-vs-counterfactual chart works with the typed results object.
- mlsynth.utils.sparse_sc_helpers.plotter.plot_sparse_sc(results: SparseSCResults, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | str | dict = False, time_axis_label: str = 'Time', treatment_label: str = 'Treatment', unit_label: str = 'Unit') None#
Render observed vs SparseSC counterfactual on the treated unit.
Typed result containers for SparseSC.
- class mlsynth.utils.sparse_sc_helpers.structures.SparseSCDesign(v: ndarray, w: ndarray, opt_lambda: float, lambda_grid: ndarray, train_loss_curve: ndarray, val_mse_curve: ndarray, v_path: ndarray)#
Optimization outputs of the lambda sweep.
- Parameters:
v (np.ndarray) – Final V-weights, shape (P,). First entry is 1 (the anchor).
w (np.ndarray) – Final donor weights, shape (N,), on the simplex.
opt_lambda (float) – Selected L1 penalty.
lambda_grid (np.ndarray) – Full grid of lambdas swept.
train_loss_curve (np.ndarray) – Training loss at each grid point, length equal to lambda_grid.
val_mse_curve (np.ndarray) – Validation MSE at each grid point.
v_path (np.ndarray) – Per-grid-point V-weights, shape (len(grid), P).
- lambda_grid: ndarray#
- train_loss_curve: ndarray#
- v: ndarray#
- v_path: ndarray#
- val_mse_curve: ndarray#
- w: ndarray#
- class mlsynth.utils.sparse_sc_helpers.structures.SparseSCInference(method: str, p_value: float, att_observed: float = nan, ci_lower: float = nan, ci_upper: float = nan, alpha: float = nan, placebo_atts: ~numpy.ndarray = <factory>, n_placebo: int = 0, calibration_residuals: ~numpy.ndarray = <factory>, pointwise_lower: ~numpy.ndarray = <factory>, pointwise_upper: ~numpy.ndarray = <factory>)#
Inference results for SparseSC.
Either the Abadie-style placebo permutation or the validation-block conformal inference of Chernozhukov, Wuethrich and Zhu (2021) adapted to the SparseSC pre/post layout. The method tag identifies which fields are populated.
- Parameters:
method (str) – "abadie_placebo_permutation", "conformal_validation", "conformal_pre", or "none".
p_value (float) – Two-sided p-value for H_0: ATT = 0. NaN when no inference was run.
att_observed (float) – Point estimate of ATT, copied here for convenience.
ci_lower, ci_upper (float) – Lower/upper bounds of the (1 - alpha) confidence interval for the ATT. NaN for method="none".
alpha (float) – Two-sided significance level used to build ci_*.
placebo_atts (np.ndarray) – Placebo ATTs, populated only when method is the placebo permutation. Empty array otherwise.
n_placebo (int) – Number of placebo runs (placebo method only; 0 otherwise).
calibration_residuals (np.ndarray) – Residuals used to build the conformity scores (conformal method only). Empty for the placebo method.
pointwise_lower, pointwise_upper (np.ndarray) – Per-period pointwise band around each post-period gap from the (1 - alpha)-quantile of the conformity scores. Empty for non-conformal methods.
- calibration_residuals: ndarray#
- placebo_atts: ndarray#
- pointwise_lower: ndarray#
- pointwise_upper: ndarray#
- class mlsynth.utils.sparse_sc_helpers.structures.SparseSCInputs(Y0: ndarray, Y1: ndarray, X0: ndarray, X1: ndarray, T: int, T0_total: int, T0_train: int, treated_unit_name: Any, donor_names: Sequence, predictor_names: Sequence, time_labels: ndarray, Ywide: Any, outcome: str)#
Pre-processed panel + predictor matrices for SparseSC.
- Parameters:
Y0 (np.ndarray) – Donor outcome matrix, shape (T, N) (rows = time, columns = donors), aligned with donor_names.
Y1 (np.ndarray) – Treated outcome series, shape (T,).
X0 (np.ndarray) – Donor predictor matrix, shape (P, N) (rows = predictors, columns = donors), already standardized.
X1 (np.ndarray) – Treated predictor vector, shape (P,), already standardized.
T (int) – Total number of time periods.
T0_total (int) – End of the full pre-treatment window (exclusive).
T0_train (int) – End of the training block within the pre-period (exclusive). Validation block is [T0_train, T0_total).
treated_unit_name (Any) – Label of the treated unit.
donor_names (Sequence) – Donor labels in column order of Y0 / X0.
predictor_names (Sequence) – Predictor labels in row order of X0 / X1.
time_labels (np.ndarray) – Time labels in row order of Y0.
Ywide (Any) – Wide outcome frame preserved for plotting.
outcome (str) – Outcome variable name.
- X0: ndarray#
- X1: ndarray#
- Y0: ndarray#
- Y1: ndarray#
- time_labels: ndarray#
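The exclusive endpoints T0_train and T0_total map naturally onto half-open slices. The sketch below assumes zero-based positional indexing along the time axis, which is how the [T0_train, T0_total) notation reads; the sizes are illustrative and not taken from any dataset.

```python
import numpy as np

# Illustrative sizes only: 10 periods, 8 of them pre-treatment, 6 used for training.
T, T0_total, T0_train = 10, 8, 6
Y1 = np.arange(T, dtype=float)  # stand-in for a treated outcome series of shape (T,)

# Half-open blocks, assuming zero-based positions along the time axis.
train = slice(0, T0_train)               # [0, T0_train)
validation = slice(T0_train, T0_total)   # [T0_train, T0_total)
post = slice(T0_total, T)                # [T0_total, T)

Y1_train, Y1_val, Y1_post = Y1[train], Y1[validation], Y1[post]
print(Y1_train.size, Y1_val.size, Y1_post.size)
```

The same slices would apply to the rows of Y0, since Y0 and Y1 share the time axis.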
- class mlsynth.utils.sparse_sc_helpers.structures.SparseSCResults(inputs: SparseSCInputs, design: SparseSCDesign, inference: SparseSCInference, counterfactual: ndarray, gap: ndarray, att: float, pre_rmse: float, donor_weights: Dict[Any, float], predictor_weights: Dict[Any, float])#
Public SparseSC.fit() return container.
- Parameters:
inputs (SparseSCInputs) – Pre-processed panel + predictors.
design (SparseSCDesign) – Lambda-selection results, V and W weights.
inference (SparseSCInference) – Inference results (conformal or placebo), or method = "none" when inference was not run.
counterfactual (np.ndarray) – Y0 @ w over all T periods.
gap (np.ndarray) – Y1 - counterfactual, shape (T,).
att (float) – Mean post-treatment gap.
pre_rmse (float) – Root-mean-squared pre-treatment fit error.
donor_weights (Dict[Any, float]) – {donor_name: w_j}.
predictor_weights (Dict[Any, float]) – {predictor_name: v_p}.
- counterfactual: ndarray#
- design: SparseSCDesign#
- gap: ndarray#
- inference: SparseSCInference#
- inputs: SparseSCInputs#
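The derived fields follow directly from the definitions above. The toy arrays below are made up, and the sketch assumes pre_rmse is computed over the full pre-period [0, T0_total); the library may restrict it to a different block.

```python
import numpy as np

# Toy panel: T = 4 periods, N = 2 donors, treatment starts at position 2.
Y0 = np.array([[10.0, 20.0],
               [11.0, 21.0],
               [12.0, 22.0],
               [13.0, 23.0]])
Y1 = np.array([16.0, 15.0, 15.0, 16.0])
w = np.array([0.5, 0.5])  # donor weights on the simplex
T0_total = 2

counterfactual = Y0 @ w                  # synthetic control over all T periods
gap = Y1 - counterfactual                # treated minus synthetic, shape (T,)
att = gap[T0_total:].mean()              # mean post-treatment gap
# Assumption: pre-treatment fit error over the whole pre-period.
pre_rmse = np.sqrt(np.mean(gap[:T0_total] ** 2))

print(att, pre_rmse)
```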
Example#
The canonical empirical example is Vives’s augmented California Proposition 99 study. Load the reshaped long-form panel and run SparseSC with the original ADH-7 predictor set:
"""Run SparseSC on the long-form augmented California dataset."""
from __future__ import annotations
import pandas as pd
from mlsynth import SparseSC
# ---------------------------------------------------------------------
# Load long-form panel
# ---------------------------------------------------------------------
df = pd.read_csv(
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/augmented_cali_long.csv"
)
LAG_PERIODS = [1975, 1980, 1988]
COVARIATES = [
    "p_cig",
    "loginc",
    "pct15-24",
    "pc_beer",
]
# ---------------------------------------------------------------------
# SparseSC fit
# ---------------------------------------------------------------------
results = SparseSC(
    {
        "df": df,
        "outcome": "cigsale",
        "treat": "Proposition 99",
        "unitid": "state",
        "time": "year",
        "covariates": COVARIATES,
        "outcome_lag_periods": LAG_PERIODS,
        "display_graphs": True,
        "run_inference": False,
    }
).fit()
Enable inference (validation-block conformal is the default) and inspect the ATT CI:
results = SparseSC({
    "df": df,
    "outcome": "cigsale",
    "treat": "Proposition 99",
    "unitid": "state",
    "time": "year",
    "covariates": COVARIATES,
    "outcome_lag_periods": LAG_PERIODS,
    "alpha": 0.05,                       # CI level
    "run_inference": True,
    "display_graphs": False,
}).fit()
print(results.att)                       # post-period ATT
print(results.inference.ci_lower,
      results.inference.ci_upper)        # 95% conformal CI
print(results.inference.p_value) # H_0: ATT = 0
print(results.design.opt_lambda) # selected L1 penalty
print(results.predictor_weights) # {predictor: v_p}
print(results.donor_weights) # {donor: w_j} on the simplex
# Lambda sweep diagnostics: validation MSE across the L1 penalty grid.
import matplotlib.pyplot as plt
plt.plot(results.design.lambda_grid, results.design.val_mse_curve)
plt.xscale("log")
plt.xlabel("lambda")
plt.ylabel("validation MSE")
plt.show()
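Because the L1 penalty sets uninformative v-weights to exactly zero, the selected predictors can be read off a {predictor: v_p} mapping such as results.predictor_weights. The sketch below uses a hypothetical dictionary rather than real output; the predictor labels, the values, and the helper name selected_predictors are all assumptions for illustration.

```python
# Hypothetical weights; real labels and values come from results.predictor_weights.
predictor_weights = {
    "cigsale_1975": 1.0,   # anchored first predictor (made-up label)
    "p_cig": 0.0,
    "loginc": 0.42,
    "pc_beer": 0.0,
}


def selected_predictors(weights, tol=1e-8):
    """Return (name, weight) pairs with nonzero weight, largest first."""
    kept = {name: v for name, v in weights.items() if abs(v) > tol}
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)


selected = selected_predictors(predictor_weights)
for name, v in selected:
    print(f"{name}: {v:.3f}")
```

The tolerance guards against tiny nonzero values left by a numerical solver; with exact zeros it has no effect.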
References#
Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490):493-505.
Chernozhukov, V., Wuethrich, K., & Zhu, Y. (2021). “An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls.” Journal of the American Statistical Association 116(536):1849-1864.
Vives-i-Bastida, J. (2023). “Predictor Selection for Synthetic Controls.” arXiv:2203.11576v2.