Sparse Synthetic Control (SparseSC)


Overview#

SparseSC implements the L1-penalized predictor-weighting variant of canonical synthetic control proposed by Vives-i-Bastida (2023, Predictor Selection for Synthetic Controls). It targets the same Abadie, Diamond, and Hainmueller (2010) framework as classical SCM, but adds a lasso penalty on the predictor-importance vector \(v\) to deliver interpretable predictor selection: as the L1 penalty grows, uninformative predictors get \(v\)-weights of exactly zero and are dropped from the fit.

Compared with the canonical SCM data-driven \(V\) choice (a cross-validated grid search over diagonal \(V\) minimizing pre-period MSE), SparseSC

  • selects predictors explicitly via L1 sparsity rather than implicitly via small but nonzero \(v\)-weights;

  • picks the L1 penalty \(\lambda\) on a held-out validation block of the pre-period (a 75/25 train/validation split by default, which matches the 14/5-year split Vives used in the empirical Prop 99 application); and

  • anchors the first predictor’s \(v\)-weight at 1, which fixes the overall scale and removes the trivial \(v = 0\) minimum that the L1 penalty would otherwise admit.

The donor weights \(w\) solve the usual SCM simplex QP given \(v\).

Inference defaults to a moving-block conformal CI for the ATT in the spirit of Chernozhukov, Wuethrich and Zhu (2021), calibrated on the validation-block residuals. Vives’s Abadie-style placebo permutation is still available via inference_method="placebo".

Mathematical Formulation#

Setup#

Let \(Y_{1, t}\) denote the treated outcome and \(Y_{0, t} \in \mathbb{R}^N\) the donor outcomes at time \(t\). The pre-treatment window \(t = 1, \dots, T_0\) is partitioned into a training block \(t = 1, \dots, T_0^{\text{tr}}\) and a validation block \(t = T_0^{\text{tr}} + 1, \dots, T_0\). Predictors enter through a treated vector \(X_1 \in \mathbb{R}^P\) and a donor matrix \(X_0 \in \mathbb{R}^{P \times N}\). After standardization each row of \([X_0, X_1]\) has unit sample standard deviation across units.
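As a rough illustration of this setup, the sketch below splits a 19-period pre-window with the 75/25 rule and standardizes toy predictor rows. The array names, the toy data, and the use of ddof=1 for the sample standard deviation are assumptions for illustration, not mlsynth internals.

```python
import math
import numpy as np

T0 = 19                                  # pre-treatment periods (toy value)
T0_train = math.floor(T0 * 0.75)         # training block is t = 1..T0_train
n_val = T0 - T0_train                    # validation block is the remainder

rng = np.random.default_rng(0)
P, N = 4, 6                              # predictors, donors (toy sizes)
row_scale = np.array([[1.0], [10.0], [0.1], [5.0]])
X0 = rng.normal(size=(P, N)) * row_scale # donor predictors on very different scales
X1 = rng.normal(size=P)                  # treated predictors

# Standardize each predictor row by its standard deviation across all units
# (donors and treated together); ddof=1 is an assumption.
X_all = np.column_stack([X0, X1])        # shape (P, N + 1)
sd = X_all.std(axis=1, ddof=1)
X0_std = X0 / sd[:, None]
X1_std = X1 / sd
```

With T0 = 19 the rule gives a 14/5 split, the same split the overview mentions for the Prop 99 application.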

Inner W-weight QP#

Given \(v \in \mathbb{R}^P_{\ge 0}\) the donor weights solve

\[w^*(v) = \arg\min_{w \in \Delta_N} \; w^\top X_0^\top \mathrm{diag}(v)\, X_0\, w - 2\, X_1^\top \mathrm{diag}(v)\, X_0\, w,\]

where \(\Delta_N = \{w \in \mathbb{R}^N_{\ge 0} : \mathbf{1}^\top w = 1\}\) is the donor simplex. This is exactly the QP MATLAB’s quadprog solves inside sparse_synth/loss_function.m.
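For readers who want to experiment with the inner problem without a conic solver, here is a minimal projected-gradient sketch of the same simplex QP. It is a stand-in technique, not the Clarabel path mlsynth uses; the function names solve_w_pg, project_simplex and qp_objective are hypothetical.

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto {w >= 0, sum(w) = 1}."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(y) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(y - tau, 0.0)

def solve_w_pg(v, X1, X0, n_iter=5000):
    """Projected-gradient stand-in for the inner simplex QP."""
    H = X0.T @ (v[:, None] * X0)         # X0' diag(v) X0
    q = -2.0 * X0.T @ (v * X1)           # -2 X0' diag(v) X1
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H)[-1] + 1e-12)  # 1 / Lipschitz constant
    w = np.full(X0.shape[1], 1.0 / X0.shape[1])             # start at uniform weights
    for _ in range(n_iter):
        w = project_simplex(w - step * (2.0 * H @ w + q))
    return w

def qp_objective(w, v, X1, X0):
    """w' X0' diag(v) X0 w - 2 X1' diag(v) X0 w."""
    Xw = X0 @ w
    return Xw @ (v * Xw) - 2.0 * X1 @ (v * Xw)

rng = np.random.default_rng(1)
X0 = rng.normal(size=(5, 8))
X1 = X0[:, :3].mean(axis=1)              # treated unit inside the donor hull
v = np.ones(5)
w = solve_w_pg(v, X1, X0)
```

Projected gradient converges slowly on rank-deficient problems, so it is useful for checking intuition rather than as a replacement for a proper QP solver.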

mlsynth calls Clarabel directly (bypassing CVXPY’s canonicalization layer), which is the single biggest performance fix versus the prior CVXPY-based implementation: CVXPY parsing overhead was ~10-50 ms per call for a 39-donor problem, while the underlying Clarabel solve itself takes microseconds. The constraint skeleton (A, b, cones, settings) is cached per donor count \(N\) so only the data terms \(H = X_0^\top \mathrm{diag}(v) X_0\) and \(q = -2 X_0^\top \mathrm{diag}(v) X_1\) are rebuilt per call.

The augmented k > N spec is rank-deficient, and Clarabel can return InsufficientProgress at tight tolerance. For numerical robustness, the inner solve therefore retries with a trace-scaled ridge and looser tolerances before falling back to a uniform-w feasible point. This prevents the outer L-BFGS-B sweep from aborting mid-run on a single bad exploration step.
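The trace-scaled ridge can be sketched as follows. The helper name ridge_regularize and the relative size rel_eps = 1e-8 are assumptions; the source only states that the ridge is scaled by the trace of H.

```python
import numpy as np

def ridge_regularize(H, rel_eps=1e-8):
    """Add a ridge proportional to the mean diagonal of H (hypothetical helper)."""
    n = H.shape[0]
    return H + rel_eps * (np.trace(H) / n) * np.eye(n)

rng = np.random.default_rng(2)
P, N = 10, 6
X0 = rng.normal(size=(P, N))
v = np.zeros(P)
v[:2] = [1.0, 0.5]                       # sparse v: only two active predictors
H = X0.T @ (v[:, None] * X0)             # rank at most 2, so H is singular
H_reg = ridge_regularize(H)

min_eig_before = np.linalg.eigvalsh(H)[0]
min_eig_after = np.linalg.eigvalsh(H_reg)[0]
```

Because the ridge is proportional to trace(H), rescaling v or X0 rescales the ridge by the same factor, which is the invariance property described for the inner solve.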

Outer V-weight problem#

The \(v\)-weights minimize the outcome MSE over a window \(\mathcal{T}\) plus an L1 penalty on \(v\):

\[v^*(\lambda) = \arg\min_{v \in \mathbb{R}^P_{\ge 0},\; v_1 = 1} \; \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} \bigl(Y_{1, t} - Y_{0, t}^\top w^*(v)\bigr)^2 + \lambda \, \|v\|_1.\]

The \(v_1 = 1\) anchor is what prevents the trivial all-zero solution at any \(\lambda > 0\): the inner solution \(w^*(v)\) is invariant to positive rescaling of \(v\), so without the anchor the MSE term would stay unchanged as \(v\) shrinks and the L1 penalty would push every component toward zero (Vives 2023, Appendix 6.1).
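The scale argument can be checked numerically. The sketch below (toy data, hypothetical function name) evaluates the inner QP objective on random simplex points under v and under 0.5 v: the objective is scaled by 0.5, so the best candidate is the same, while the L1 norm is halved.

```python
import numpy as np

def qp_objective(w, v, X1, X0):
    """w' X0' diag(v) X0 w - 2 X1' diag(v) X0 w."""
    Xw = X0 @ w
    return Xw @ (v * Xw) - 2.0 * X1 @ (v * Xw)

rng = np.random.default_rng(4)
P, N = 5, 7
X0, X1 = rng.normal(size=(P, N)), rng.normal(size=P)
v = rng.uniform(0.1, 2.0, size=P)
candidates = rng.dirichlet(np.ones(N), size=200)   # random points on the simplex

q_full = np.array([qp_objective(w, v, X1, X0) for w in candidates])
q_half = np.array([qp_objective(w, 0.5 * v, X1, X0) for w in candidates])

same_minimizer = int(np.argmin(q_full)) == int(np.argmin(q_half))
penalty_ratio = np.sum(0.5 * v) / np.sum(v)        # L1 norm shrinks with v
```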

The window \(\mathcal{T}\) is set by outer_loss_window:

  • "training" (default) — \(\mathcal{T} = \{1, \dots, T_0^{\text{tr}}\}\). Matches the unpublished MATLAB driver sparse_synth.m and reproduces the Prop 99 estimates Vives reports in the empirical section.

  • "validation" – \(\mathcal{T} = \{T_0^{\text{tr}} + 1, \dots, T_0\}\). Matches the page-5 \(L_V\) definition in Vives’s Algorithm 1 literally; useful for ablations but produces notably worse in-sample fit than the training variant.
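In zero-based array terms, the two windows can be expressed as half-open index ranges. The helper below is hypothetical and only illustrates the bookkeeping; it is not an mlsynth function.

```python
def outer_window(T0, T0_train, which="training"):
    """Return the half-open index range [start, stop) of the outer-loss window."""
    if which == "training":
        return 0, T0_train
    if which == "validation":
        return T0_train, T0
    raise ValueError("which must be 'training' or 'validation'")

train_range = outer_window(19, 14)                 # periods 1..14
val_range = outer_window(19, 14, "validation")     # periods 15..19
```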

Each evaluation of the outer objective invokes the inner QP, so the outer problem is a smooth bound-constrained NLP solved with L-BFGS-B (scipy.optimize).

Gradient computation#

L-BFGS-B needs gradients of the outer objective in \(v\). Two modes are available, controlled by use_analytical_grad:

  • False (default) — central-difference numerical gradient. Each outer step pays \(2(P-1)\) inner-QP solves.

  • True — closed-form gradient via the envelope theorem applied at the inner optimum \(w^*(v)\). With active set \(\mathcal{A} = \{i : w_i^* > 0\}\), one \((|\mathcal{A}| + 1) \times (|\mathcal{A}| + 1)\) Cholesky on the reduced KKT matrix yields all \(P - 1\) gradient components in \(O(P |\mathcal{A}|)\) work:

    \[\frac{\partial L}{\partial v_k} = -\frac{4}{|\mathcal{T}|}\, r_k \cdot \bigl(X_0[k, \mathcal{A}]\, z\bigr) + \lambda,\]

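    where \(r_k = X_{1k} - X_0[k, \mathcal{A}] w_{\mathcal{A}}^*\) is the predictor-\(k\) pre-fit residual, \(H = X_0^\top \mathrm{diag}(v) X_0\), \(Z_1\) and \(Z_0\) stack the treated and donor outcomes over the window \(\mathcal{T}\), \(r_{\text{outer}} = Z_1 - Z_0 w^*(v)\) is the outcome residual on that window, and \(z\) solves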

    \[\begin{split}\begin{pmatrix} 2 H_{\mathcal{A}\mathcal{A}} & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{pmatrix} \begin{pmatrix} z \\ \mu_z \end{pmatrix} = \begin{pmatrix} Z_0[:, \mathcal{A}]^\top r_{\text{outer}} \\ 0 \end{pmatrix}.\end{split}\]

    The analytical gradient is exact (verified against central FD to ~1e-7 at random interior points). It yields a ~5–10× speedup on the outer sweep, but the cleaner gradient lets L-BFGS-B settle at the first critical point near the cold init on the non-convex L1-penalized V-objective. The FD path’s implicit gradient noise tends to find better local optima at non-zero lambda, so the default is FD for correctness. Opt in to the analytical path when running large placebo sweeps where throughput matters more than exact local-optimum reproducibility. When use_analytical_grad=True, the L-BFGS-B ftol auto-tightens to 1e-12 because the clean gradient converges in many fewer iterations and the default 1e-8 terminates the loop before convergence.
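The gradient formula can be sanity-checked against central differences in a simplified setting. The sketch below assumes every donor is in the active set, so the nonnegativity constraint is dropped and the inner problem is solved exactly through its KKT system; all function names and the toy data are assumptions, and this is not the mlsynth implementation.

```python
import numpy as np

def inner_solve(v, X1, X0):
    """Equality-constrained inner QP via its KKT system (all donors active)."""
    N = X0.shape[1]
    H = X0.T @ (v[:, None] * X0)
    g = X0.T @ (v * X1)
    K = np.block([[2.0 * H, np.ones((N, 1))],
                  [np.ones((1, N)), np.zeros((1, 1))]])
    w = np.linalg.solve(K, np.append(2.0 * g, 1.0))[:N]
    return w, K

def outer_loss(v2, X1, X0, Z1, Z0, lam):
    v = np.append(1.0, v2)               # anchor v_1 = 1
    w, _ = inner_solve(v, X1, X0)
    return np.mean((Z1 - Z0 @ w) ** 2) + lam * v.sum()

def outer_grad(v2, X1, X0, Z1, Z0, lam):
    v = np.append(1.0, v2)
    w, K = inner_solve(v, X1, X0)
    N = X0.shape[1]
    r_outer = Z1 - Z0 @ w                # outcome residual on the window
    z = np.linalg.solve(K, np.append(Z0.T @ r_outer, 0.0))[:N]   # adjoint
    r = X1 - X0 @ w                      # predictor residuals r_k
    grad = -(4.0 / Z1.size) * r * (X0 @ z) + lam
    return grad[1:]                      # v_1 is fixed, so drop its component

def central_diff(f, x, h=1e-6):
    g = np.zeros_like(x)
    for k in range(x.size):              # two evaluations per free coordinate
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

rng = np.random.default_rng(3)
P, N, T = 6, 4, 10
X0, X1 = rng.normal(size=(P, N)), rng.normal(size=P)
Z0, Z1 = rng.normal(size=(T, N)), rng.normal(size=T)
v2, lam = rng.uniform(0.5, 1.5, size=P - 1), 0.1

g_exact = outer_grad(v2, X1, X0, Z1, Z0, lam)
g_fd = central_diff(lambda x: outer_loss(x, X1, X0, Z1, Z0, lam), v2)
```

The finite-difference path needs 2(P-1) loss evaluations per gradient, while the adjoint path needs one inner solve plus one extra linear solve, which is the cost difference discussed above.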

Lambda selection#

The penalty \(\lambda\) is selected by the unpenalized validation-block outcome MSE:

\[\hat\lambda = \arg\min_{\lambda \in \Lambda} \; \frac{1}{T_0 - T_0^{\text{tr}}} \sum_{t = T_0^{\text{tr}} + 1}^{T_0} \bigl(Y_{1, t} - Y_{0, t}^\top w^*(v^*(\lambda))\bigr)^2.\]

The default grid is \(\Lambda = \{0\} \cup \text{logspace}(10^{-4}, 1, 50)\). Setting \(\lambda = 0\) recovers the unpenalized data-driven SCM with a unit-anchored first predictor.
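A minimal sketch of the grid and the selection rule follows. The validation-MSE curve is a made-up toy function standing in for the per-lambda fits; only the grid construction and the argmin step mirror the text.

```python
import numpy as np

# {0} followed by 50 log-spaced points from 1e-4 to 1.
lambda_grid = np.concatenate([[0.0], np.logspace(-4, 0, 50)])

# Hypothetical validation-MSE curve, one value per grid point (toy numbers).
val_mse = (np.log10(lambda_grid + 1e-6) + 1.5) ** 2 + 1.0

best = int(np.argmin(val_mse))           # index of the selected penalty
lambda_hat = lambda_grid[best]
```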

Predictor selection#

As \(\lambda\) grows, the L1 penalty drives uninformative \(v_p\) to exactly zero; the corresponding predictor effectively drops out of the fit. The selected predictor set is

\[\mathcal{S}(\hat\lambda) = \{p : v_p^*(\hat\lambda) > 0\}.\]

This is what makes the method Sparse SC: the explanation of the treated unit’s pre-trajectory is interpretable in terms of a small subset of predictors.
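Reading off the selected set from a fitted v-vector is a one-liner. The weights below are invented toy values, and the small tolerance is an assumption to guard against floating-point noise around zero.

```python
import numpy as np

names = ["p_cig", "loginc", "pct15-24", "pc_beer", "smk_75", "smk_80", "smk_88"]
v_hat = np.array([1.0, 0.0, 0.0, 0.31, 0.0, 2.4, 0.9])   # toy values, anchor first
tol = 1e-8                                               # assumed zero threshold

selected = [name for name, vp in zip(names, v_hat) if vp > tol]
```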

ATT and Counterfactual#

With \(\hat v = v^*(\hat\lambda)\) and \(\hat w = w^*(\hat v)\) recovered on the full pre-period, the counterfactual and ATT are

\[\widehat{Y}_{1, t} = Y_{0, t}^\top \hat w, \qquad \widehat{\mathrm{ATT}} = \frac{1}{T - T_0} \sum_{t = T_0 + 1}^T \bigl(Y_{1, t} - \widehat{Y}_{1, t}\bigr).\]
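The two formulas translate directly into array code. The helper name and the toy panel are assumptions; Y0 is laid out as (periods x donors), matching the convention that each Y_{0,t} is a length-N vector.

```python
import numpy as np

def att_from_weights(Y1, Y0, w, T0):
    """Counterfactual, gap series, and ATT over the post-period t > T0."""
    y_hat = Y0 @ w                       # synthetic control path
    gap = Y1 - y_hat                     # treated minus synthetic
    return y_hat, gap, gap[T0:].mean()

Y0 = np.array([[10.0, 20.0], [12.0, 22.0], [14.0, 24.0], [16.0, 26.0]])
w = np.array([0.25, 0.75])
Y1 = np.array([17.5, 19.5, 18.5, 20.5])
y_hat, gap, att = att_from_weights(Y1, Y0, w, T0=2)
```

Here the pre-period gap is zero by construction and both post-period gaps equal -3, so the ATT is -3.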

Conformal ATT inference (default)#

Inference defaults to a moving-block conformal CI for the ATT, following the philosophy of Chernozhukov, Wuethrich and Zhu (2021): treat the in-sample residuals as a calibration sample of what “noise” should look like under the no-treatment null, and invert a permutation test in \(\theta\) to bracket the ATT.

Define the residual series \(e_t = Y_{1, t} - Y_{0, t}^\top \hat w\). The calibration set is

\[\begin{split}e^{\text{calib}} = \begin{cases} \{e_t : t \in (T_0^{\text{tr}}, T_0]\} & \text{if } \texttt{conformal\_window = "validation"} \text{ (default)} \\ \{e_t : t \in [1, T_0]\} & \text{if } \texttt{conformal\_window = "pre"} \end{cases}.\end{split}\]

The validation block is genuinely out-of-sample under the chosen \(v\); the full pre-block gives a larger calibration sample but its training-block residuals are in-sample under \(v\).

The conformity score for a block \(B\) of size \(b = \max(3, \lfloor\sqrt{T - T_0}\rfloor)\) is

\[s(B) = \frac{1}{b}\sum_{t \in B} |e_t|,\]

and the calibration distribution is built by sliding the block across \(e^{\text{calib}}\) (with wrap-around blocks for boundary coverage). The post-treatment test statistic at the candidate ATT \(\theta\) is

\[s_{\text{post}}(\theta) = \frac{1}{T - T_0}\sum_{t = T_0 + 1}^T \bigl|e_t - \theta\bigr|.\]

The \((1 - \alpha)\) conformal CI is

\[\mathrm{CI}_{1 - \alpha} = \{\theta : \Pr_{B}\bigl(s(B) \ge s_{\text{post}}(\theta)\bigr) > \alpha\},\]

which we compute by grid search over a generous neighbourhood of \(\widehat{\mathrm{ATT}}\). The two-sided p-value for \(H_0 : \mathrm{ATT} = 0\) is

\[p_{\text{conf}} = \Pr_{B}\bigl(s(B) \ge s_{\text{post}}(0)\bigr) = \frac{1}{|\mathcal{B}|}\sum_{B \in \mathcal{B}} \mathbb{1}\{s(B) \ge s_{\text{post}}(0)\}.\]

Pointwise per-period bands use the \((1 - \alpha)\)-quantile of the calibration scores directly:

\[[e_t - q_{1 - \alpha},\; e_t + q_{1 - \alpha}], \quad q_{1 - \alpha} = \mathrm{Quantile}_{1 - \alpha}\{s(B)\}.\]
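The pieces above (wrap-around block scores, the post-period statistic, test inversion on a grid, and the pointwise bands) can be put together in a short sketch. This is an illustration under stated assumptions, not the mlsynth routine: the function names are hypothetical, calibration uses the validation block only, and the grid half-width of 6 times the largest calibration residual is an arbitrary stand-in for the plug-in standard error.

```python
import numpy as np

def block_scores(e_calib, b):
    """Mean |e| over every length-b block, wrapping around the end."""
    n = len(e_calib)
    idx = (np.arange(n)[:, None] + np.arange(b)[None, :]) % n
    return np.abs(e_calib)[idx].mean(axis=1)

def conformal_att(gap, T0_train, T0, alpha=0.05, grid_size=401):
    e_calib, e_post = gap[T0_train:T0], gap[T0:]
    b = max(3, int(np.floor(np.sqrt(len(e_post)))))
    scores = block_scores(e_calib, b)

    def p_at(theta):
        # Share of calibration blocks at least as extreme as the post statistic.
        return float(np.mean(scores >= np.mean(np.abs(e_post - theta))))

    att = e_post.mean()
    half_width = 6.0 * max(np.abs(e_calib).max(), 1e-12)   # assumed grid width
    thetas = np.linspace(att - half_width, att + half_width, grid_size)
    keep = np.array([p_at(t) > alpha for t in thetas])
    ci = (thetas[keep].min(), thetas[keep].max()) if keep.any() else (np.nan, np.nan)
    q = np.quantile(scores, 1.0 - alpha)
    return {"att": att, "p_value": p_at(0.0), "ci": ci,
            "pointwise": (e_post - q, e_post + q)}

# Toy gap series: 4 training, 4 validation, 4 post periods.
gap = np.array([0.5, -0.4, 0.3, -0.6, 0.2, -0.5, 0.4, -0.3,
                -19.2, -18.9, -19.3, -18.6])
out = conformal_att(gap, T0_train=4, T0=8)
```

With only a handful of calibration blocks the p-value moves in coarse steps (multiples of 1/4 in this toy), which is one reason the larger "pre" window can be attractive despite being in-sample.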

This inferential procedure trades the cross-donor exchangeability assumption of Vives’s placebo (every donor is equally likely to be the treated unit) for a within-unit exchangeability assumption on the residuals (validation-period residuals look like the no-treatment counterfactual’s noise). On Prop 99 the conformal 95% CI is typically \([-20, -18]\) versus the placebo’s much wider bounds, because conformal leverages the actual model’s residual structure rather than donor-level heterogeneity.

Abadie-style placebo (opt-in)#

Set inference_method="placebo" to recover Vives’s procedure. For each donor \(j\), swap that donor into the treated slot, remove it from the donor pool, refit SparseSC at the already-selected \(\hat\lambda\) (or, optionally, re-run the full \(\lambda\) sweep) and record the placebo ATT. The two-sided permutation p-value is

\[p = \frac{\#\{j : |\mathrm{ATT}_j^{\text{placebo}}| \ge |\widehat{\mathrm{ATT}}|\} + 1}{B + 1},\]

where \(B\) is the number of completed placebos. Re-using \(\hat\lambda\) makes the placebo loop tractable; set placebo_resweep=True to re-select \(\lambda\) for every placebo (much slower).
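The p-value formula is simple enough to compute by hand; the sketch below does so for invented placebo ATTs. The function name is hypothetical.

```python
import numpy as np

def placebo_p_value(placebo_atts, observed_att):
    """(#{|placebo| >= |observed|} + 1) / (B + 1), B = completed placebos."""
    placebo_atts = np.asarray(placebo_atts, dtype=float)
    n_extreme = int(np.sum(np.abs(placebo_atts) >= abs(observed_att)))
    return (n_extreme + 1) / (placebo_atts.size + 1)

# Nine toy placebo ATTs; only one is at least as large in magnitude as -19.5.
p = placebo_p_value([-3.0, 5.5, -21.0, 1.2, 8.0, -0.4, 2.2, -6.1, 4.4], -19.5)
```

The +1 in numerator and denominator counts the treated unit itself, so the smallest attainable p-value is 1 / (B + 1).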

Predictor Convention#

Like every other mlsynth estimator, SparseSC is fed a single long-format df with one row per (unit, time). Predictors are constructed under the hood from the same frame, in two flavors:

  • covariates — column names in df whose per-unit pre-treatment mean is taken as the predictor value. Time-invariant unit characteristics collapse trivially; time-varying covariates are summarized by the pre-period mean.

  • outcome_lag_periods — specific pre-treatment time labels (as found in the time column) whose outcome values become additional predictor rows. These are the canonical Abadie, Diamond & Hainmueller (2010) lagged-outcome predictors (e.g., the smk_75, smk_80, smk_88 rows in the Prop 99 example).

The two lists are concatenated to form the predictor matrix; the first predictor (first entry of covariates if any, otherwise the first outcome lag) is the anchor whose \(v\)-weight is fixed at 1. The anchor choice matters in finite samples — Vives recommends picking a predictor known to be informative, or treating the anchor as a hyperparameter and sweeping it (Vives 2023, Appendix 6.1).
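To make the convention concrete, here is a small pandas sketch that builds a predictors-by-units matrix from a long-format frame. The function build_predictors, the row label format (outcome name plus period), and the toy data are assumptions for illustration; mlsynth's own preparation code may differ in details.

```python
import pandas as pd

def build_predictors(df, outcome, unitid, time, pre_periods,
                     covariates, outcome_lag_periods):
    """Covariate pre-period means first, then outcome lags (hypothetical helper)."""
    pre = df[df[time].isin(pre_periods)]
    rows = {c: pre.groupby(unitid)[c].mean() for c in covariates}
    for t in outcome_lag_periods:
        rows[f"{outcome}_{t}"] = pre[pre[time] == t].set_index(unitid)[outcome]
    return pd.DataFrame(rows).T          # predictors x units; first row is the anchor

df = pd.DataFrame({
    "state": ["A"] * 3 + ["B"] * 3,
    "year": [1986, 1987, 1988] * 2,
    "cigsale": [100.0, 98.0, 95.0, 120.0, 118.0, 117.0],
    "p_cig": [1.0, 1.2, 1.4, 0.8, 0.9, 1.0],
})
X = build_predictors(df, "cigsale", "state", "year",
                     pre_periods=[1986, 1987, 1988],
                     covariates=["p_cig"], outcome_lag_periods=[1988])
```

Because covariates come first, p_cig is the anchor row in this toy; with an empty covariates list the first outcome lag would take that role.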

Performance notes#

The single biggest cost in a SparseSC fit is the inner W-weight QP, which is invoked

\[|\Lambda| \times (\text{outer iters}) \times g\]

times where \(g = 2(P-1)\) under finite-difference gradients and \(g = 1\) under the analytical gradient. For Vives’s augmented k=40 spec that’s ~50,000 inner-QP calls just for the fit, plus another \(B \approx 38\) placebos under inference_method="placebo". Two optimizations in this build matter:

  • Direct Clarabel removes CVXPY canonicalization (~30-60× per call). Speedup applies universally; no correctness tradeoff.

  • Analytical gradient (opt-in via use_analytical_grad=True) removes the \(2(P-1)\) finite-difference factor (~5-10× on the outer loop). Tradeoff: the cleaner gradient can settle in worse local optima of the non-convex L1-penalized outer objective; FD’s implicit gradient noise tends to escape them. Default off for correctness.

Empirically, the combination puts the canonical ADH-7 California Prop 99 fit at ~5 s with analytical gradient and ~23 s with FD (versus a CVXPY+SCS baseline that would hang for minutes on the augmented k=40 spec).
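The call-count formula from this section is easy to tabulate. In the sketch below, the figure of 13 outer iterations per grid point is a hypothetical value chosen only to show how a count near 50,000 can arise for P = 40 and the 51-point default grid; it is not a measured number.

```python
def inner_qp_calls(n_lambda, outer_iters, P, analytical_grad=False):
    """|Lambda| x (outer iters) x g, with g = 2(P-1) for FD and 1 for analytical."""
    g = 1 if analytical_grad else 2 * (P - 1)
    return n_lambda * outer_iters * g

fd_calls = inner_qp_calls(51, 13, 40)                        # finite differences
an_calls = inner_qp_calls(51, 13, 40, analytical_grad=True)  # analytical gradient
```

The ratio of the two counts is 2(P-1), but the observed wall-clock gain reported above is smaller because the tighter ftol on the analytical path changes the iteration count.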

Core API#

Sparse Synthetic Control (SparseSC) estimator.

Implements the L1-penalized predictor-weighting SCM of Vives-i-Bastida (2023, Predictor Selection for Synthetic Controls), applied to the canonical Abadie, Diamond, and Hainmueller (2010) framework.

The estimator has a two-level structure. The inner problem is the standard SCM simplex QP that picks donor weights w given a fixed diagonal predictor-importance matrix diag(v). The outer problem picks the V-weights themselves by minimizing a pre-treatment outcome MSE plus an L1 penalty on v. The penalty parameter is selected by the unpenalized validation MSE. The first V-weight is pinned to 1 to anchor the scale; the others are bound-constrained non-negative.

Compared with canonical SCM, the L1 penalty yields interpretable predictor selection: as lambda increases, V-weights collapse to zero on uninformative predictors, leaving a sparse explanation of the fit.

The outer MSE window is set by outer_loss_window. The SparseSCConfig default, "training", minimizes the outcome MSE on the training block, as the unpublished MATLAB driver sparse_synth.m does. Setting outer_loss_window="validation" uses the validation block instead, matching Algorithm 1 of the paper.

class mlsynth.estimators.sparse_sc.SparseSC(config: SparseSCConfig | dict)#

Bases: object

L1-penalized Sparse Synthetic Control estimator.

Parameters:

config (SparseSCConfig or dict) – Configuration object. See mlsynth.config_models.SparseSCConfig.

Returns:

SparseSCResults – Typed container with the selected V- and W-weights, the validation-MSE curve over the lambda grid, the counterfactual, and (optionally) inference results (conformal by default, Abadie placebo on request).

Notes

Predictors are supplied through covariates (columns in df whose per-unit pre-treatment mean becomes one predictor row) and/or outcome_lag_periods (specific pre-treatment time labels whose outcome values become predictor rows – the canonical ADH lagged-outcome predictors). The first predictor is the “anchor” whose V-weight is fixed at 1.

Examples

>>> import pandas as pd
>>> from mlsynth import SparseSC
>>> df = pd.read_csv("smoking_long.csv")
>>> res = SparseSC({
...     "df": df, "outcome": "cigsale",
...     "treat": "Proposition 99", "unitid": "state", "time": "year",
...     "covariates": ["p_cig", "loginc", "pct15-24", "pc_beer"],
...     "outcome_lag_periods": [1975, 1980, 1988],
...     "display_graphs": False,
... }).fit()
>>> res.att
-19.5...

fit() → SparseSCResults#

Run the lambda sweep, recover W-weights, and return results.

Configuration#

class mlsynth.config_models.SparseSCConfig(*, df: pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', covariates: List[str] | None = None, outcome_lag_periods: List[Any] | None = None, T0_train: Annotated[int | None, Ge(ge=2)] = None, lambda_grid: List[float] | None = None, standardize: bool = True, outer_loss_window: str = 'training', solver: Any = None, max_outer_iter: Annotated[int, Ge(ge=10)] = 500, run_inference: bool = True, inference_method: Literal['conformal', 'placebo', 'none'] = 'conformal', conformal_window: Literal['validation', 'pre'] = 'validation', alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05, n_placebo: Annotated[int | None, Ge(ge=1)] = None, placebo_resweep: bool = False, seed: int = 1400, use_analytical_grad: bool = False, warm_start: bool = False)#

Configuration for the Sparse Synthetic Control (SparseSC) estimator.

Implements the L1-penalized predictor-weighting SCM variant of Vives-i-Bastida and collaborators (port of the MATLAB sparse_synth.m driver) for the canonical Abadie, Diamond, and Hainmueller (2010) framework.

Like every other mlsynth estimator this one is fed a single long-format df with one row per (unit, time). Predictors are constructed under the hood from the long frame: each column listed in covariates is collapsed to its pre-treatment mean per unit, and each entry of outcome_lag_periods adds the outcome at that specific pre-treatment period as a predictor.

T0_train: int | None#
alpha: float#
conformal_window: Literal['validation', 'pre']#
covariates: List[str] | None#
inference_method: Literal['conformal', 'placebo', 'none']#
lambda_grid: List[float] | None#
max_outer_iter: int#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_placebo: int | None#
outcome_lag_periods: List[Any] | None#
outer_loss_window: str#
placebo_resweep: bool#
run_inference: bool#
seed: int#
solver: Any#
standardize: bool#
use_analytical_grad: bool#
warm_start: bool#
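As a configuration fragment, a dict that opts in to placebo inference and the analytical gradient might look like the sketch below. Every key appears in the SparseSCConfig signature above; the values and column names are taken from the Prop 99 example and are illustrative. The df entry is left out here and would need to be added before the dict is passed to the estimator.

```python
placebo_config = {
    "outcome": "cigsale",
    "treat": "Proposition 99",
    "unitid": "state",
    "time": "year",
    "covariates": ["p_cig", "loginc", "pct15-24", "pc_beer"],
    "outcome_lag_periods": [1975, 1980, 1988],
    "inference_method": "placebo",   # instead of the default "conformal"
    "placebo_resweep": False,        # reuse the selected lambda for each placebo
    "n_placebo": None,               # None uses every donor
    "use_analytical_grad": True,     # faster sweeps, possibly different local optimum
    "display_graphs": False,
}
```

Since model_config sets extra to "forbid", a misspelled key should be rejected at validation time rather than silently ignored.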

Helper Modules#

Panel and predictor preparation for SparseSC.

The estimator takes a single long-format df (one row per (unit, time)) and constructs both the outcome panel and the unit-by-predictor matrix internally. Predictors come from two sources:

  • covariates – columns in df whose per-unit pre-treatment mean becomes one predictor row.

  • outcome_lag_periods – specific pre-treatment time labels whose outcome values become additional predictor rows (the canonical Abadie, Diamond & Hainmueller (2010) lagged-outcome predictors).

The first predictor (first entry of covariates if any, otherwise the first outcome lag) is the anchor whose V-weight is fixed at 1.

mlsynth.utils.sparse_sc_helpers.setup.prepare_sparse_sc_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, covariates: Sequence[str] | None = None, outcome_lag_periods: Sequence[Any] | None = None, T0_train: int | None = None, standardize: bool = True) → SparseSCInputs#

Build SparseSC inputs from a single long-format panel.

Parameters:
  • df (pd.DataFrame) – Long-format balanced panel: one row per (unit, time) with the outcome, a binary treatment indicator, and any covariates.

  • outcome, treat, unitid, time (str) – Column names in df.

  • covariates (Sequence[str], optional) – Columns in df whose per-unit pre-treatment mean becomes a predictor row. The first covariate is the anchor (V-weight pinned to 1).

  • outcome_lag_periods (Sequence, optional) – Specific pre-treatment time labels whose outcome values become additional predictor rows. Appended after covariates.

  • T0_train (int, optional) – End of the training block within the pre-period (exclusive). Defaults to floor(T0_total * 0.75) – a 75/25 split.

  • standardize (bool) – Standardize each predictor row by its sample standard deviation across all units. Default True.

Inner W-weight QP for SparseSC.

Given V-weights v (a vector of predictor importance), the donor weights w solve the canonical SCM simplex QP

min over w >= 0, sum(w) = 1:

w’ X0’ diag(v) X0 w - 2 X1’ diag(v) X0 w.

The inner QP is called O(grid x outer_iters x P) times during a SparseSC fit, so its per-call cost dominates the total wall time. We solve it by calling Clarabel directly (skipping CVXPY’s canonicalization layer); CVXPY adds ~10-50 ms of parsing overhead per call on a 39-donor problem, while the underlying solve is microseconds.

A tiny ridge is added to the quadratic term for numerical stability: when the augmented Vives k~40 specification is used the donor design matrix can be rank-deficient (more predictors than donors), which makes H = X0’ diag(v) X0 numerically singular under any QP solver. The ridge is scaled by the trace of H so it is invariant to the units of v and X0.

The solver argument is retained for API compatibility but no longer selects between back-ends; Clarabel is used unconditionally.

mlsynth.utils.sparse_sc_helpers.inner.solve_w(v: ndarray, X1: ndarray, X0: ndarray, solver: Any = None) → ndarray#

Return the donor-weight vector w on the simplex.

Parameters:
  • v (np.ndarray) – Length-P V-weight vector.

  • X1 (np.ndarray) – Length-P treated predictor vector.

  • X0 (np.ndarray) – (P, N) donor predictor matrix.

  • solver (Any, optional) – Unused. Retained for backwards compatibility with the previous CVXPY-based interface.

Outer V-objective for SparseSC, with closed-form gradient.

The outer objective is

L(v_2; lam) = (1/T_outer) * ||Z1 - Z0 w*(v)||^2 + lam * ||v||_1

where v = [1; v_2] (the first predictor is pinned at 1 to break the positive-scale invariance of the inner simplex QP, as argued in Vives-i-Bastida (2023) Appendix 6.1) and w*(v) solves the inner simplex QP

min_w w’ H(v) w - 2 g(v)’ w s.t. 1’ w = 1, w >= 0,

with H(v) = X0' diag(v) X0 and g(v) = X0' diag(v) X1.

Two outer windows are supported, controlled at call sites via the Z1, Z0 arguments: pass the validation block to match Algorithm 1 in the paper; pass the training block to match the MATLAB driver.

This module provides three callables:

  • outer_loss – the loss alone (kept for back-compat).

  • selection_mse – the unpenalised validation-block MSE used to choose lambda.

  • outer_loss_and_grad – returns (loss, grad), with the closed-form envelope-theorem gradient.

The closed-form gradient avoids the 2(P-1)-evaluation finite-difference cost that L-BFGS-B otherwise incurs per outer step. The derivation is in the module docstring of optimization.py.

mlsynth.utils.sparse_sc_helpers.objective.outer_loss(v2: ndarray, X1: ndarray, X0: ndarray, Z1: ndarray, Z0: ndarray, lam: float, solver: Any = None) → float#

Outer V-objective: mean((Z1 - Z0 w(v))^2) + lam * ||v||_1.

Pass the validation block to match the paper’s Algorithm 1; pass the training block to match the MATLAB driver.

mlsynth.utils.sparse_sc_helpers.objective.outer_loss_and_grad(v2: ndarray, X1: ndarray, X0: ndarray, Z1: ndarray, Z0: ndarray, lam: float, solver: Any = None, active_tol: float = 1e-07) → Tuple[float, ndarray]#

Return (loss, grad) of the outer V-objective w.r.t. v_2.

Implements the envelope-theorem gradient described in this module’s docstring. The active set is recovered from the inner solution by thresholding on active_tol. A single (|A|+1) x (|A|+1) KKT solve gives the adjoint, after which all P-1 gradient components are computed in O(P |A|) work.

The L1 part contributes +lam per coordinate (right-derivative under the v_2 >= 0 bound L-BFGS-B already enforces).

Falls back to lstsq if the reduced KKT matrix is numerically singular (which can happen when the same donor appears at multiple predictor rows).

mlsynth.utils.sparse_sc_helpers.objective.selection_mse(v2: ndarray, X1: ndarray, X0: ndarray, Z1_val: ndarray, Z0_val: ndarray, solver: Any = None) → float#

Unpenalised validation-block MSE used to select lambda.

mlsynth.utils.sparse_sc_helpers.objective.training_loss(v2: ndarray, X1: ndarray, X0: ndarray, Z1: ndarray, Z0: ndarray, lam: float, solver: Any = None) → float#

Outer V-objective: mean((Z1 - Z0 w(v))^2) + lam * ||v||_1.

Pass the validation block to match the paper’s Algorithm 1; pass the training block to match the MATLAB driver.

mlsynth.utils.sparse_sc_helpers.objective.validation_mse(v2: ndarray, X1: ndarray, X0: ndarray, Z1_val: ndarray, Z0_val: ndarray, solver: Any = None) → float#

Unpenalised validation-block MSE used to select lambda.

Lambda sweep + V-weight optimisation for SparseSC.

For each lambda on the grid the outer V-weight problem is a smooth bound-constrained nonlinear program (v_2 >= 0) solved with scipy.optimize.minimize (L-BFGS-B). The selected lambda is the value minimising the unpenalised validation-block MSE.

Two performance refinements over a naive implementation are available; both are off by default in the sweep_lambda signature:

  • Closed-form gradient (Vives’s Algorithm 1 outer objective is smooth in v away from the L1 kink; the L1 part has a trivial right-derivative under the v_2 >= 0 bound L-BFGS-B already enforces). Without this, L-BFGS-B falls back to a 2(P-1)-evaluation central finite-difference per outer step, which is the dominant cost on large predictor sets. The closed-form gradient is implemented in objective.outer_loss_and_grad via the envelope theorem: at the inner optimum w*(v), one (|A|+1) x (|A|+1) Cholesky on the active-set KKT matrix produces all P-1 gradient components.

  • Warm starts across the lambda grid. The path is monotone in lambda, so the V-solution at lambda_i is a good initialiser for lambda_{i+1}. A failed warm start falls back to the cold MATLAB init default_v20.
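The warm-start pattern can be sketched with scipy.optimize.minimize on a toy problem. The objective below is not the SparseSC outer objective: it is a separable quadratic plus an L1 penalty on v >= 0, chosen because its exact solution is max(c - lambda, 0), which makes the result easy to verify. The names c, toy_objective and sweep are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

c = np.array([0.8, 0.05, 0.3])           # toy unpenalized optimum

def toy_objective(v, lam):
    """Return (loss, gradient); on v >= 0 the L1 penalty is lam * sum(v)."""
    loss = 0.5 * np.sum((v - c) ** 2) + lam * np.sum(v)
    grad = (v - c) + lam
    return loss, grad

def sweep(grid, warm_start=True):
    cold = np.ones_like(c)               # cold initial point
    x0, path = cold, []
    for lam in grid:
        start = x0 if warm_start else cold
        res = minimize(toy_objective, start, args=(lam,), jac=True,
                       method="L-BFGS-B", bounds=[(0.0, None)] * c.size,
                       options={"maxiter": 500, "ftol": 1e-12, "gtol": 1e-10})
        path.append(res.x)
        # Reuse this solution for the next lambda; fall back to cold on failure.
        x0 = res.x if res.success else cold
    return np.array(path)

grid = np.array([0.0, 0.1, 0.5])
v_path = sweep(grid)
```

The toy path also shows the sparsity mechanism: the second coordinate hits zero already at lambda = 0.1, and the third follows at lambda = 0.5.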

The outer V-objective window is controlled by outer_loss_window:

  • "validation" (default of sweep_lambda, paper) – outer V minimises validation-block MSE + lambda * ||V||_1. Matches Vives-i-Bastida (2023) Algorithm 1. SparseSCConfig itself defaults to "training".

  • "training" – outer V minimises training-block MSE + lambda * ||V||_1. Matches the unpublished MATLAB driver sparse_synth.m.

mlsynth.utils.sparse_sc_helpers.optimization.default_lambda_grid(size: int = 51) → ndarray#

Return [0, logspace(-4, 0, size - 1)] (matches MATLAB).

mlsynth.utils.sparse_sc_helpers.optimization.default_v20(X0: ndarray) → ndarray#

MATLAB starting v_2 = (sd_1 / sd_k)^2 for k > 1.
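A sketch of this starting point, assuming the standard deviations are taken across donors for each predictor row of X0 (the ratio does not depend on the ddof choice). The function name cold_start_v2 is hypothetical.

```python
import numpy as np

def cold_start_v2(X0):
    """(sd_1 / sd_k)^2 for k > 1, with sd_k the row-k standard deviation of X0."""
    sd = X0.std(axis=1, ddof=1)
    return (sd[0] / sd[1:]) ** 2

X0 = np.array([[1.0, 3.0, 5.0],     # sd = 2
               [0.0, 1.0, 2.0],     # sd = 1
               [0.0, 4.0, 8.0]])    # sd = 4
v2_init = cold_start_v2(X0)
```

When the rows are already standardized to unit standard deviation, this initializer reduces to a vector of ones.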

mlsynth.utils.sparse_sc_helpers.optimization.recover_w(v: ndarray, X1: ndarray, X0: ndarray, solver: Any = None) → ndarray#

Final donor-weight recovery at the selected V-weights.

mlsynth.utils.sparse_sc_helpers.optimization.sweep_lambda(X1: ndarray, X0: ndarray, Y1: ndarray, Y0: ndarray, T0_total: int, T0_train: int, lambda_grid: ndarray | None = None, solver: Any = None, max_outer_iter: int = 500, ftol: float | None = None, outer_loss_window: str = 'validation', use_analytical_grad: bool = False, warm_start: bool = False, multi_start: int = 1) → Tuple[ndarray, float, ndarray, ndarray, ndarray, ndarray]#

Sweep lambda and return the best V-weights.

Parameters:
  • outer_loss_window ({“validation”, “training”}) – Which pre-treatment block the outer V-objective evaluates the outcome MSE over.

  • use_analytical_grad (bool, default False) – Use the envelope-theorem closed-form gradient inside L-BFGS-B. When False, L-BFGS-B uses a finite-difference gradient, which costs 2(P-1) inner-QP solves per outer step.

  • warm_start (bool, default False) – Reuse the previous lambda’s V-solution as the initialiser for the next lambda. Falls back to the cold MATLAB init if a warm-started fit appears to fail.

Returns:

  • optv (np.ndarray) – Final V-weights, shape (P,) with optv[0] = 1.

  • opt_lambda (float) – Lambda value selected on the validation MSE.

  • grid (np.ndarray) – Lambda grid actually used.

  • outer_curve (np.ndarray) – Penalised outer objective at each grid point.

  • val_curve (np.ndarray) – Unpenalised validation MSE at each grid point (selection target).

  • v_path (np.ndarray) – Per-grid-point V-weights, shape (len(grid), P).

Inference procedures for SparseSC.

Two methods are implemented:

  • run_placebo – the Abadie-style placebo permutation. For each donor we treat that donor as the placebo treated unit, refit SparseSC at the already-selected lambda, and compare the observed ATT against the distribution of placebo ATTs.

  • conformal_inference – a moving-block conformal CI in the spirit of Chernozhukov, Wuethrich and Zhu (2021), adapted to the SparseSC pre / validation / post panel layout. Calibration residuals come from either the validation block (default – smallest sample but truly out-of-sample under V) or the entire pre-treatment block (larger sample, but training residuals are in-sample under V). The ATT CI is obtained by inverting a moving-block test of the form mean(|e_post - theta|) <= q_{1-alpha} of the calibration distribution; pointwise per-period bands use the same q_{1-alpha} quantile directly.

mlsynth.utils.sparse_sc_helpers.inference.conformal_inference(gap: ndarray, T0_train: int, T0_total: int, T: int, conformal_window: str = 'validation', alpha: float = 0.05, block_size: int | None = None, grid_size: int = 401, grid_half_width_se: float = 6.0) → dict#

Conformal ATT confidence interval from in-sample residuals.

Parameters:
  • gap (np.ndarray) – Full-period residual Y1 - Y0 @ w, shape (T,). The pre-treatment portion (gap[:T0_total]) is interpreted as noise under the no-treatment null; gap[T0_total:] is the post-treatment effect-plus-noise sequence.

  • T0_train, T0_total, T (int) – Training-block end / pre-block end / full length. Pre = [0, T0_total), validation = [T0_train, T0_total), post = [T0_total, T).

  • conformal_window ({“validation”, “pre”}) – Which residual block to use for calibration. "validation" uses only gap[T0_train:T0_total] (truly out-of-sample under the chosen V); "pre" uses the entire gap[:T0_total].

  • alpha (float) – Two-sided significance level.

  • block_size (int, optional) – Moving-block size for the conformity score. Defaults to max(3, floor(sqrt(n_post))), matching LEXSCM.

  • grid_size (int) – Number of theta candidates in the grid search for the ATT CI.

  • grid_half_width_se (float) – The grid spans [ATT_hat +/- grid_half_width_se * SE] where SE is a plug-in standard error from the calibration residuals.

Returns:

dict – Keys: method, att_observed, ci_lower, ci_upper, p_value, calibration_residuals, pointwise_lower, pointwise_upper, alpha.

mlsynth.utils.sparse_sc_helpers.inference.run_placebo(Y0: ndarray, Y1: ndarray, X0: ndarray, X1: ndarray, T0_total: int, T0_train: int, selected_lambda: float, observed_att: float, solver: Any = None, resweep: bool = False, lambda_grid: ndarray | None = None, n_placebo: int | None = None, seed: int = 1400, outer_loss_window: str = 'validation') → Tuple[ndarray, float, int]#

Return (placebo_atts, p_value, n_completed).

Parameters:
  • Y0, Y1, X0, X1 (np.ndarray) – Full pre-standardized panel + predictor matrices.

  • T0_total, T0_train (int) – Pre-treatment window bounds.

  • selected_lambda (float) – Lambda chosen on the actual treated unit. Reused for each placebo when resweep=False (default).

  • observed_att (float) – ATT of the actual treated unit, used to construct the p-value.

  • resweep (bool) – If True, re-run the full lambda grid for each placebo. Slow.

  • lambda_grid (np.ndarray, optional) – Grid for the resweep case.

  • n_placebo (int, optional) – Subsample of donors to use as placebos. None uses every donor.

  • seed (int) – Seed for the subsample when n_placebo < N.

Plot helper for SparseSC.

Wraps mlsynth.utils.resultutils.plot_estimates() so the observed-vs-counterfactual chart works with the typed results object.

mlsynth.utils.sparse_sc_helpers.plotter.plot_sparse_sc(results: SparseSCResults, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | str | dict = False, time_axis_label: str = 'Time', treatment_label: str = 'Treatment', unit_label: str = 'Unit') → None#

Render observed vs SparseSC counterfactual on the treated unit.

Typed result containers for SparseSC.

class mlsynth.utils.sparse_sc_helpers.structures.SparseSCDesign(v: ndarray, w: ndarray, opt_lambda: float, lambda_grid: ndarray, train_loss_curve: ndarray, val_mse_curve: ndarray, v_path: ndarray)#

Optimization outputs of the lambda sweep.

Parameters:
  • v (np.ndarray) – Final V-weights, shape (P,). First entry is 1 (the anchor).

  • w (np.ndarray) – Final donor weights, shape (N,), on the simplex.

  • opt_lambda (float) – Selected L1 penalty.

  • lambda_grid (np.ndarray) – Full grid of lambdas swept.

  • train_loss_curve (np.ndarray) – Training loss at each grid point, length equal to lambda_grid.

  • val_mse_curve (np.ndarray) – Validation MSE at each grid point.

  • v_path (np.ndarray) – Per-grid-point V-weights, shape (len(grid), P).

lambda_grid: ndarray#
opt_lambda: float#
train_loss_curve: ndarray#
v: ndarray#
v_path: ndarray#
val_mse_curve: ndarray#
w: ndarray#
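
The fields of SparseSCDesign fit together roughly as follows. This is a sketch on made-up arrays, assuming opt_lambda is the grid point that minimizes val_mse_curve and that a predictor counts as selected when its V-weight is strictly positive; the library's tie-breaking and tolerance rules are not documented here.

```python
import numpy as np

# Toy sweep over four lambdas and four predictors (values are illustrative only).
lambda_grid = np.array([1e-3, 1e-2, 1e-1, 1.0])
val_mse_curve = np.array([4.0, 2.5, 3.0, 6.0])
v_path = np.array([
    [1.0, 0.8, 0.3, 0.1],   # small penalty: every predictor stays active
    [1.0, 0.6, 0.0, 0.0],
    [1.0, 0.2, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],   # large penalty: only the anchored predictor survives
])

# Assumed selection rule: minimize validation MSE over the grid.
best = int(np.argmin(val_mse_curve))
opt_lambda = float(lambda_grid[best])
v = v_path[best]

# Indices of predictors with nonzero V-weight at the selected lambda.
selected = np.flatnonzero(v > 0)

# Number of active predictors along the path, a quick sparsity diagnostic.
n_active = (v_path > 0).sum(axis=1)
```

The first column of v_path is 1 in every row, reflecting the anchor on the first predictor.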
class mlsynth.utils.sparse_sc_helpers.structures.SparseSCInference(method: str, p_value: float, att_observed: float = nan, ci_lower: float = nan, ci_upper: float = nan, alpha: float = nan, placebo_atts: ndarray = <factory>, n_placebo: int = 0, calibration_residuals: ndarray = <factory>, pointwise_lower: ndarray = <factory>, pointwise_upper: ndarray = <factory>)#

Inference results for SparseSC.

Holds the output of either the Abadie-style placebo permutation or the validation-block conformal inference of Chernozhukov, Wuethrich and Zhu (2021), adapted to the SparseSC pre/post layout. The method tag identifies which fields are populated.

Parameters:
  • method (str) – "abadie_placebo_permutation", "conformal_validation", "conformal_pre", or "none".

  • p_value (float) – Two-sided p-value for H_0: ATT = 0. NaN when no inference was run.

  • att_observed (float) – Point estimate of ATT, copied here for convenience.

  • ci_lower, ci_upper (float) – Lower/upper bounds of the (1 - alpha) confidence interval for the ATT. NaN for method="none".

  • alpha (float) – Two-sided significance level used to build ci_*.

  • placebo_atts (np.ndarray) – Placebo ATTs, populated only when method is the placebo permutation. Empty array otherwise.

  • n_placebo (int) – Number of placebo runs (placebo method only; 0 otherwise).

  • calibration_residuals (np.ndarray) – Residuals used to build the conformity scores (conformal method only). Empty for the placebo method.

  • pointwise_lower, pointwise_upper (np.ndarray) – Per-period pointwise band around each post-period gap from the (1 - alpha)-quantile of the conformity scores. Empty for non-conformal methods.

alpha: float = nan#
att_observed: float = nan#
calibration_residuals: ndarray#
ci_lower: float = nan#
ci_upper: float = nan#
method: str#
n_placebo: int = 0#
p_value: float#
placebo_atts: ndarray#
pointwise_lower: ndarray#
pointwise_upper: ndarray#
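
The pointwise band described above can be sketched as below. The function pointwise_band is hypothetical, and using absolute calibration residuals as conformity scores is an assumption; the moving-block permutation step that produces the ATT interval and p-value is not reproduced here.

```python
import numpy as np


def pointwise_band(calibration_residuals, post_gap, alpha=0.05):
    """Symmetric per-period band around each post-period gap (illustrative sketch)."""
    # Assumed conformity score: absolute value of each calibration residual.
    scores = np.abs(np.asarray(calibration_residuals, dtype=float))
    # (1 - alpha)-quantile of the scores sets the half-width of the band.
    half_width = float(np.quantile(scores, 1.0 - alpha))
    post_gap = np.asarray(post_gap, dtype=float)
    return post_gap - half_width, post_gap + half_width


# Made-up validation-block residuals and post-period gaps.
residuals = np.array([-1.0, 0.5, 2.0, -0.5, 1.5])
gap = np.array([-10.0, -12.0])
lower, upper = pointwise_band(residuals, gap, alpha=0.2)
```

Because the half-width comes from a handful of calibration residuals, a short validation block yields a coarse quantile and therefore a coarse band.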
class mlsynth.utils.sparse_sc_helpers.structures.SparseSCInputs(Y0: ndarray, Y1: ndarray, X0: ndarray, X1: ndarray, T: int, T0_total: int, T0_train: int, treated_unit_name: Any, donor_names: Sequence, predictor_names: Sequence, time_labels: ndarray, Ywide: Any, outcome: str)#

Pre-processed panel + predictor matrices for SparseSC.

Parameters:
  • Y0 (np.ndarray) – Donor outcome matrix, shape (T, N) (rows = time, columns = donors), aligned with donor_names.

  • Y1 (np.ndarray) – Treated outcome series, shape (T,).

  • X0 (np.ndarray) – Donor predictor matrix, shape (P, N) (rows = predictors, columns = donors), already standardized.

  • X1 (np.ndarray) – Treated predictor vector, shape (P,), already standardized.

  • T (int) – Total number of time periods.

  • T0_total (int) – End of the full pre-treatment window (exclusive).

  • T0_train (int) – End of the training block within the pre-period (exclusive). Validation block is [T0_train, T0_total).

  • treated_unit_name (Any) – Label of the treated unit.

  • donor_names (Sequence) – Donor labels in column order of Y0 / X0.

  • predictor_names (Sequence) – Predictor labels in row order of X0 / X1.

  • time_labels (np.ndarray) – Time labels in row order of Y0.

  • Ywide (Any) – Wide outcome frame preserved for plotting.

  • outcome (str) – Outcome variable name.

property N: int#

Number of donor units.

property P: int#

Number of predictors.

T: int#
T0_total: int#
T0_train: int#
X0: ndarray#
X1: ndarray#
Y0: ndarray#
Y1: ndarray#
Ywide: Any#
donor_names: Sequence#
outcome: str#
predictor_names: Sequence#
time_labels: ndarray#
treated_unit_name: Any#
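
Because T0_train and T0_total are exclusive end indices, the three blocks of the panel map directly onto Python slices. The sketch below uses random toy arrays with the documented shapes; the variable names train, val, and post are illustrative and not part of the library.

```python
import numpy as np

# Toy panel: T = 10 periods, N = 3 donors, 8 pre-treatment periods.
T, N = 10, 3
T0_total = 8
T0_train = 6          # 75% of the pre-period, matching the default split
rng = np.random.default_rng(0)
Y0 = rng.normal(size=(T, N))   # rows = time, columns = donors
Y1 = rng.normal(size=T)        # treated series

# Half-open slices: each end index is excluded, as in the field descriptions.
train = slice(0, T0_train)          # periods 0..5
val = slice(T0_train, T0_total)     # periods 6..7
post = slice(T0_total, T)           # periods 8..9

Y0_train, Y0_val, Y0_post = Y0[train], Y0[val], Y0[post]
Y1_train, Y1_val, Y1_post = Y1[train], Y1[val], Y1[post]
```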
class mlsynth.utils.sparse_sc_helpers.structures.SparseSCResults(inputs: SparseSCInputs, design: SparseSCDesign, inference: SparseSCInference, counterfactual: ndarray, gap: ndarray, att: float, pre_rmse: float, donor_weights: Dict[Any, float], predictor_weights: Dict[Any, float])#

Public SparseSC.fit() return container.

Parameters:
  • inputs (SparseSCInputs) – Pre-processed panel + predictors.

  • design (SparseSCDesign) – Lambda-selection results, V and W weights.

  • inference (SparseSCInference) – Inference results (placebo permutation or conformal); method = "none" when inference was not run.

  • counterfactual (np.ndarray) – Y0 @ w over all T periods.

  • gap (np.ndarray) – Y1 - counterfactual, shape (T,).

  • att (float) – Mean post-treatment gap.

  • pre_rmse (float) – Root-mean-squared pre-treatment fit error.

  • donor_weights (Dict[Any, float]) – {donor_name: w_j}.

  • predictor_weights (Dict[Any, float]) – {predictor_name: v_p}.

att: float#
counterfactual: ndarray#
design: SparseSCDesign#
donor_weights: Dict[Any, float]#
gap: ndarray#
inference: SparseSCInference#
inputs: SparseSCInputs#
pre_rmse: float#
predictor_weights: Dict[Any, float]#
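
The derived fields follow from the donor weights by simple array arithmetic. The sketch below recomputes them on a made-up four-period panel; computing pre_rmse over the full pre-period (rather than only the training block) is an assumption, since the source does not specify the window.

```python
import numpy as np

# Toy panel: 4 periods, 2 donors, treatment starts at index 3.
Y0 = np.array([[10.0, 20.0],
               [12.0, 22.0],
               [14.0, 24.0],
               [16.0, 26.0]])
Y1 = np.array([15.0, 17.0, 19.0, 18.0])
w = np.array([0.5, 0.5])       # donor weights on the simplex
T0_total = 3                   # exclusive end of the pre-period

counterfactual = Y0 @ w                      # synthetic series over all T periods
gap = Y1 - counterfactual                    # treated minus synthetic
att = float(gap[T0_total:].mean())           # mean post-treatment gap
# Assumed window: every pre-treatment period.
pre_rmse = float(np.sqrt(np.mean(gap[:T0_total] ** 2)))
```

Here the synthetic unit tracks the treated series exactly before treatment, so pre_rmse is 0 and the whole ATT of -3 comes from the single post-period gap.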

Example#

The canonical empirical example is Vives’s augmented California Proposition 99 study. Load the reshaped long-form panel and run SparseSC with the original ADH-7 predictor set (four covariates plus three lagged outcomes):

"""Run SparseSC on the long-form augmented California dataset."""

from __future__ import annotations

import pandas as pd

from mlsynth import SparseSC


# ---------------------------------------------------------------------
# Load long-form panel
# ---------------------------------------------------------------------

df = pd.read_csv(
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/augmented_cali_long.csv"
)

LAG_PERIODS = [1975, 1980, 1988]

COVARIATES = [
    "p_cig",
    "loginc",
    "pct15-24",
    "pc_beer",
]


# ---------------------------------------------------------------------
# SparseSC fit
# ---------------------------------------------------------------------

results = SparseSC(
    {
        "df": df,
        "outcome": "cigsale",
        "treat": "Proposition 99",
        "unitid": "state",
        "time": "year",
        "covariates": COVARIATES,
        "outcome_lag_periods": LAG_PERIODS,
        "display_graphs": True,
        "run_inference": False,
    }
).fit()

Enable inference (validation-block conformal is the default) and inspect the ATT CI:

results = SparseSC({
    "df": df,
    "outcome": "cigsale",
    "treat": "Proposition 99",
    "unitid": "state",
    "time": "year",
    "covariates": COVARIATES,
    "outcome_lag_periods": LAG_PERIODS,
    "alpha": 0.05,                       # CI level
    "run_inference": True,
    "display_graphs": False,
}).fit()

print(results.att)                       # post-period ATT
print(results.inference.ci_lower,
      results.inference.ci_upper)        # 95% conformal CI
print(results.inference.p_value)         # H_0: ATT = 0
print(results.design.opt_lambda)         # selected L1 penalty
print(results.predictor_weights)         # {predictor: v_p}
print(results.donor_weights)             # {donor: w_j} on the simplex

# Lambda sweep diagnostics: validation MSE along the L1-penalty grid.
import matplotlib.pyplot as plt

plt.plot(results.design.lambda_grid, results.design.val_mse_curve)
plt.xscale("log")
plt.xlabel("lambda")
plt.ylabel("validation MSE")
plt.show()

References#

Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490):493-505.

Chernozhukov, V., Wuethrich, K., & Zhu, Y. (2021). “An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls.” Journal of the American Statistical Association 116(536):1849-1864.

Vives-i-Bastida, J. (2023). “Predictor Selection for Synthetic Controls.” arXiv:2203.11576v2.