Sparse Synthetic Control (SparseSC)#

Overview#

SparseSC implements the L1-penalized predictor-weighting variant of canonical synthetic control proposed by Vives-i-Bastida (2023, Predictor Selection for Synthetic Controls). It targets the same Abadie, Diamond, and Hainmueller (2010) framework as classical SCM, but adds a lasso penalty on the predictor-importance vector \(\mathbf{v}\) to deliver interpretable predictor selection: as the L1 penalty grows, uninformative predictors get \(v_p\)-weights of exactly zero and are dropped from the fit.

Compared with the canonical SCM data-driven \(\mathbf{V}\) choice (a cross-validated grid search over diagonal \(\mathbf{V}\) minimizing pre-period MSE), SparseSC

selects predictors explicitly via L1 sparsity rather than implicitly via small but nonzero \(v_p\)-weights;
picks the L1 penalty \(\lambda\) on a held-out validation block of the pre-period (a 75/25 train/validation split by default, which matches the 14/5-year split Vives used in the empirical Prop 99 application); and
anchors the first predictor’s \(v_p\)-weight at 1, which fixes the overall scale and removes the trivial \(\mathbf{v} = \mathbf{0}\) minimum that the L1 penalty would otherwise admit.

The donor weights \(\mathbf{w}\) solve the usual SCM simplex QP given \(\mathbf{v}\).

Inference defaults to a moving-block conformal CI for the ATT in the spirit of Chernozhukov, Wuethrich and Zhu (2021), calibrated on the validation-block residuals. Vives’s Abadie-style placebo permutation is still available via inference_method="placebo".

When to use this estimator#

Reach for SparseSC when you have one treated unit, a rich predictor set, and you want the fit to tell you which predictors matter rather than carry all of them with small but nonzero weights. The lasso penalty on the predictor-importance vector drives the uninformative predictors to exactly zero, so the synthetic control’s explanation of the treated pre-trajectory is interpretable in terms of a small, named subset.

A concrete example: a state passes a tobacco-control law and you have dozens of candidate predictors of cigarette sales – prices, income, beer consumption, the youth share, and several lagged outcomes. With the canonical data-driven \(\mathbf{V}\) every predictor keeps some weight and the story is muddy. SparseSC prunes the over-rich set down to the handful that actually drive the pre-law fit, selects the L1 penalty on a held-out validation block, and reads the policy effect as the post-law gap – with a conformal interval calibrated on the validation residuals rather than a coarse donor-permutation grid.

Notation#

Let \(j = 1\) denote the treated unit, with all units \(\mathcal{N} \coloneqq \{1, \dots, N\}\) and donor pool \(\mathcal{N}_0 \coloneqq \mathcal{N} \setminus \{1\}\) of cardinality \(N_0\). Time runs over \(t \in \mathcal{T} \coloneqq \{1, \dots, T\}\), 1-indexed; the intervention takes effect after period \(T_0\), splitting \(\mathcal{T}\) into the pre-period \(\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}\) (of length \(T_0\)) and the post-period \(\mathcal{T}_2 \coloneqq \{t \in \mathcal{T} : t > T_0\}\).

The treated outcome is \(y_{1t}\); each donor \(j \in \mathcal{N}_0\) contributes a series \(\mathbf{y}_j\), stacked into the donor matrix \(\mathbf{Y}_0 \coloneqq [\mathbf{y}_j]_{j \in \mathcal{N}_0} \in \mathbb{R}^{T \times N_0}\) (one column per donor); write \(\mathbf{y}_{0t} \in \mathbb{R}^{N_0}\) for the donor outcomes at time \(t\) (the \(t\)-th row). Donor weights are \(\mathbf{w} \in \mathbb{R}^{N_0}\), constrained to the unit simplex \(\Delta^{N_0} \coloneqq \{\mathbf{w} \in \mathbb{R}_{\ge 0}^{N_0} : \|\mathbf{w}\|_1 = 1\}\); the optimiser is \(\mathbf{w}^\ast\). The synthetic counterfactual is \(\widehat{y}_{1t} \coloneqq \mathbf{y}_{0t}^\top \mathbf{w}^\ast\), the per-period effect is \(\tau_t \coloneqq y_{1t} - \widehat{y}_{1t}\), and the ATT is \(\widehat{\tau} \coloneqq |\mathcal{T}_2|^{-1} \sum_{t \in \mathcal{T}_2} \tau_t\). The significance level is \(\alpha\).

Predictors enter through a treated vector \(\mathbf{x}_1 \in \mathbb{R}^P\) and a donor matrix \(\mathbf{X}_0 \in \mathbb{R}^{P \times N_0}\) over \(P\) predictors; after standardization each row of \([\mathbf{X}_0, \mathbf{x}_1]\) has unit sample standard deviation across units. The predictor-importance vector is \(\mathbf{v} \in \mathbb{R}^P_{\ge 0}\) with components \(v_p\); the L1 penalty strength is \(\lambda \ge 0\), and \(\mathbf{V} = \mathrm{diag}(\mathbf{v})\) is the diagonal predictor-weight matrix.
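The standardization step can be sketched as follows. This is a minimal illustration that assumes scale-only normalization (whether mlsynth also centers the rows is not stated here); the function name `standardize_rows` is hypothetical and not part of the mlsynth API.

```python
import statistics

def standardize_rows(X0, x1):
    """Scale each predictor row of [X0, x1] to unit sample standard deviation.

    X0: list of P rows, each a list of N0 donor values.
    x1: list of P treated values.
    Assumes no predictor is constant across units (sd > 0).
    """
    X0_scaled, x1_scaled = [], []
    for donor_row, treated_value in zip(X0, x1):
        # Sample SD (n - 1 denominator) over all units: donors plus treated.
        sd = statistics.stdev(donor_row + [treated_value])
        X0_scaled.append([value / sd for value in donor_row])
        x1_scaled.append(treated_value / sd)
    return X0_scaled, x1_scaled

X0 = [[2.0, 4.0], [10.0, 30.0]]   # P = 2 predictors, N0 = 2 donors
x1 = [6.0, 20.0]
X0_s, x1_s = standardize_rows(X0, x1)
```

After scaling, every row of the stacked matrix has sample standard deviation 1, so predictors measured in different units enter the \(\mathbf{v}\)-weighted fit on a comparable scale.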

The pre-period is further split for tuning. The training block is \(\mathcal{T}_1^{\mathrm{tr}} \coloneqq \{1, \dots, T_0^{\mathrm{tr}}\}\) and the validation block is \(\mathcal{T}_1^{\mathrm{val}} \coloneqq \{T_0^{\mathrm{tr}} + 1, \dots, T_0\}\), with \(\mathcal{T}_1 = \mathcal{T}_1^{\mathrm{tr}} \cup \mathcal{T}_1^{\mathrm{val}}\) (default 75/25). This pre-period split is internal to predictor and penalty selection and is distinct from the canonical pre/post split at \(T_0\).
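As a small sketch of the default split, assuming the floor(T0 * 0.75) rule documented for T0_train further down this page; the helper name `split_pre_period` is hypothetical.

```python
import math

def split_pre_period(T0, train_frac=0.75):
    """Return (training periods, validation periods), 1-indexed as in the text."""
    T0_train = math.floor(T0 * train_frac)
    training = list(range(1, T0_train + 1))
    validation = list(range(T0_train + 1, T0 + 1))
    return training, validation

# A 19-period pre-period gives the 14/5 split mentioned in the overview.
training, validation = split_pre_period(19)
print(len(training), len(validation))  # 14 5
```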

Identifying assumptions#

Pre-treatment fit / convex-hull support. There exist weights \(\mathbf{w} \in \Delta^{N_0}\) and predictor weights \(\mathbf{v}\) under which the treated pre-period predictors are matched by the donors, \(\mathbf{x}_1 \approx \mathbf{X}_0 \mathbf{w}\) – equivalently, the treated unit lies inside (or near) the convex hull of the donors over the selected predictors (Abadie, Diamond & Hainmueller 2010).

Remark. This is the workhorse identifying condition for any synthetic control: a good pre-period match is the empirical certificate one inspects. SparseSC adds that the match should be achievable with few predictors – if it is, the lasso recovers a sparse, interpretable explanation; if the treated unit can only be matched by leaning on many predictors at once, the selected set will not be sparse.

Informative anchor. The first predictor (the anchor, with \(v_1 = 1\)) is genuinely informative about the treated unit’s pre-trajectory (Vives 2023, Appendix 6.1).

Remark. The anchor fixes the overall scale of \(\mathbf{v}\) and removes the trivial \(\mathbf{v} = \mathbf{0}\) minimum the L1 penalty would otherwise admit. Because the anchor’s weight is pinned at 1, a poorly chosen anchor biases the fit; Vives recommends picking a predictor known to be informative, or treating the anchor as a hyperparameter and sweeping it.

No anticipation. Treatment has no effect in any pre-period \(t \le T_0\): \(y_{1t} = y_{1t}^N\) for all \(t \in \mathcal{T}_1\), so the pre-period outcomes and predictors reflect the no-intervention path.

Remark. If the treated unit reacts in advance of the formal intervention date, the pre-period fit – and the validation-block residuals the conformal interval is calibrated on – are contaminated by the effect itself. Date \(T_0\) at the first plausible response, not the nominal policy date.

Outcome-model stability. The no-intervention outcomes follow a stable data-generating process across \(\mathcal{T}\), so weights selected on \(\mathcal{T}_1\) continue to reproduce the treated unit’s no-intervention path on \(\mathcal{T}_2\) (Abadie, Diamond & Hainmueller 2010).

Remark. This is what licenses extrapolating the pre-period fit forward, and it is also what the validation block stress-tests: holding out \(\mathcal{T}_1^{\mathrm{val}}\) checks that the selected predictors and penalty generalise within the pre-period before they are asked to generalise past \(T_0\).

Mathematical Formulation#

Inner W-weight QP#

Given \(\mathbf{v} \in \mathbb{R}^P_{\ge 0}\) the donor weights solve

\[\mathbf{w}^\ast(\mathbf{v}) = \operatorname*{argmin}_{\mathbf{w} \in \Delta^{N_0}} \; \mathbf{w}^\top \mathbf{X}_0^\top \mathrm{diag}(\mathbf{v})\, \mathbf{X}_0\, \mathbf{w} - 2\, \mathbf{x}_1^\top \mathrm{diag}(\mathbf{v})\, \mathbf{X}_0\, \mathbf{w},\]

where \(\Delta^{N_0} = \{\mathbf{w} \in \mathbb{R}^{N_0}_{\ge 0} : \mathbf{1}^\top \mathbf{w} = 1\}\) is the donor simplex. This is exactly the QP MATLAB’s quadprog solves inside sparse_synth/loss_function.m.
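To make the inner problem concrete, here is a minimal sketch that solves the same simplex QP with projected gradient descent in NumPy. This is a swapped-in technique chosen for readability, not the solver mlsynth uses; the names `project_to_simplex` and `solve_w_pgd` are hypothetical.

```python
import numpy as np

def project_to_simplex(u):
    """Euclidean projection of u onto {w >= 0, sum(w) = 1}."""
    sorted_u = np.sort(u)[::-1]
    shifted_cumsum = np.cumsum(sorted_u) - 1.0
    ranks = np.arange(1, u.size + 1)
    # Largest index whose sorted entry stays positive after the shift.
    rho = np.nonzero(sorted_u - shifted_cumsum / ranks > 0)[0][-1]
    theta = shifted_cumsum[rho] / (rho + 1)
    return np.maximum(u - theta, 0.0)

def solve_w_pgd(v, x1, X0, n_iter=5000):
    """Projected-gradient solve of min_w w'Hw - 2 g'w over the simplex."""
    H = X0.T @ (v[:, None] * X0)      # X0' diag(v) X0
    g = X0.T @ (v * x1)               # X0' diag(v) x1
    # Gradient of the objective is 2(Hw - g); its Lipschitz constant is 2*lambda_max(H).
    lipschitz = 2.0 * max(np.linalg.eigvalsh(H)[-1], 1e-12)
    w = np.full(X0.shape[1], 1.0 / X0.shape[1])   # start at uniform weights
    for _ in range(n_iter):
        grad = 2.0 * (H @ w - g)
        w = project_to_simplex(w - grad / lipschitz)
    return w

# Toy case: the treated predictors are an exact convex combination of the donors.
X0 = np.eye(3)                        # P = 3 predictors, N0 = 3 donors
x1 = np.array([0.2, 0.3, 0.5])
v = np.ones(3)
w = solve_w_pgd(v, x1, X0)
```

In this toy case the recovered weights equal x1 itself, since each donor matches exactly one predictor. A dedicated QP solver is preferable in practice because projected gradient slows down badly when \(\mathbf{H}\) is ill-conditioned.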

mlsynth calls Clarabel directly (bypassing CVXPY’s canonicalization layer), which is the single biggest performance fix versus the prior CVXPY-based implementation: CVXPY parsing overhead was ~10-50 ms per call for a 39-donor problem, while the underlying Clarabel solve itself takes microseconds. The constraint skeleton (A, b, cones, settings) is cached per donor count \(N_0\) so only the data terms \(\mathbf{H} = \mathbf{X}_0^\top \mathrm{diag}(\mathbf{v}) \mathbf{X}_0\) and \(\mathbf{q} = -2 \mathbf{X}_0^\top \mathrm{diag}(\mathbf{v}) \mathbf{x}_1\) are rebuilt per call.

The augmented k > N spec is rank-deficient, and Clarabel can return InsufficientProgress at tight tolerance. For numerical robustness, the inner solve therefore retries with a trace-scaled ridge and looser tolerances before falling back to a uniform-w feasible point. This prevents the outer L-BFGS-B sweep from aborting mid-run on a single bad exploration step.

Outer V-weight problem#

The \(\mathbf{v}\)-weights minimize the outcome MSE over a window \(\mathcal{W}\) plus an L1 penalty on \(\mathbf{v}\):

\[\mathbf{v}^\ast(\lambda) = \operatorname*{argmin}_{\mathbf{v} \in \mathbb{R}^P_{\ge 0},\; v_1 = 1} \; \frac{1}{|\mathcal{W}|} \sum_{t \in \mathcal{W}} \bigl(y_{1t} - \mathbf{y}_{0t}^\top \mathbf{w}^\ast(\mathbf{v})\bigr)^2 + \lambda \, \|\mathbf{v}\|_1 .\]

The \(v_1 = 1\) anchor is what prevents the trivial all-zero solution at any \(\lambda > 0\). Without it, the inner solution \(\mathbf{w}^\ast(\mathbf{v})\) is invariant to positive rescaling of \(\mathbf{v}\), so the MSE term would not change while the L1 penalty pushed every component toward zero (Vives 2023, Appendix 6.1).
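A short worked step makes the scale argument concrete. Write \(Q(\mathbf{w}; \mathbf{v})\) for the inner QP objective defined above (the symbol \(Q\) is introduced here only for this illustration). It is linear in \(\mathbf{v}\), so for any \(c > 0\)

\[Q(\mathbf{w}; c\,\mathbf{v}) = c\, Q(\mathbf{w}; \mathbf{v}) \quad\Longrightarrow\quad \operatorname*{argmin}_{\mathbf{w} \in \Delta^{N_0}} Q(\mathbf{w}; c\,\mathbf{v}) = \operatorname*{argmin}_{\mathbf{w} \in \Delta^{N_0}} Q(\mathbf{w}; \mathbf{v}).\]

If \(\mathbf{v}\) were left unanchored, replacing it by \(c\,\mathbf{v}\) would therefore leave the MSE term untouched (taking the same minimizer when it is not unique) and change only the penalty:

\[\frac{1}{|\mathcal{W}|} \sum_{t \in \mathcal{W}} \bigl(y_{1t} - \mathbf{y}_{0t}^\top \mathbf{w}^\ast(c\,\mathbf{v})\bigr)^2 + \lambda \, \|c\,\mathbf{v}\|_1 = \frac{1}{|\mathcal{W}|} \sum_{t \in \mathcal{W}} \bigl(y_{1t} - \mathbf{y}_{0t}^\top \mathbf{w}^\ast(\mathbf{v})\bigr)^2 + c\,\lambda \, \|\mathbf{v}\|_1 ,\]

which for \(\lambda > 0\) and \(\mathbf{v} \neq \mathbf{0}\) strictly decreases as \(c \downarrow 0\). Fixing \(v_1 = 1\) rules out this shrinkage direction.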

The window \(\mathcal{W}\) is set by outer_loss_window:

"training" (default) — \(\mathcal{W} = \mathcal{T}_1^{\mathrm{tr}}\). Matches the unpublished MATLAB driver sparse_synth.m and reproduces the Prop 99 estimates Vives reports in the empirical section.
"validation" — \(\mathcal{W} = \mathcal{T}_1^{\mathrm{val}}\). Matches the page-5 \(L_V\) definition in Vives’s Algorithm 1 literally; useful for ablations but produces notably worse in-sample fit than the training variant.

Each evaluation of the outer objective invokes the inner QP. The outer problem is treated as a smooth bound-constrained NLP and solved with L-BFGS-B (scipy.optimize.minimize).

Gradient computation#

L-BFGS-B needs gradients of the outer objective in \(\mathbf{v}\). Two modes are available, controlled by use_analytical_grad:

False (default) – central-difference numerical gradient. Each outer step pays \(2(P-1)\) inner-QP solves.
True – closed-form gradient via the envelope theorem applied at the inner optimum \(\mathbf{w}^\ast(\mathbf{v})\). With active set \(\mathcal{A} = \{i : w_i^\ast > 0\}\), one \((|\mathcal{A}| + 1) \times (|\mathcal{A}| + 1)\) linear solve on the reduced KKT matrix (which is symmetric but indefinite) yields all \(P - 1\) gradient components in \(O(P |\mathcal{A}|)\) work:

\[\frac{\partial \mathcal{L}}{\partial v_p} = -\frac{4}{|\mathcal{W}|}\, r_p \cdot \bigl(\mathbf{X}_0[p, \mathcal{A}]\, \mathbf{z}\bigr) + \lambda,\]

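where \(r_p = x_{1p} - \mathbf{X}_0[p, \mathcal{A}]\, \mathbf{w}_{\mathcal{A}}^\ast\) is the predictor-\(p\) pre-fit residual, \(\mathbf{Z}_0\) stacks the donor outcomes \(\mathbf{y}_{0t}^\top\) over \(t \in \mathcal{W}\), \(\mathbf{r}_{\text{outer}}\) is the corresponding vector of outer residuals \(y_{1t} - \mathbf{y}_{0t}^\top \mathbf{w}^\ast(\mathbf{v})\), \(\mathbf{H} = \mathbf{X}_0^\top \mathrm{diag}(\mathbf{v})\, \mathbf{X}_0\), and \(\mathbf{z}\) solves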

\[\begin{split}\begin{pmatrix} 2 \mathbf{H}_{\mathcal{A}\mathcal{A}} & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{pmatrix} \begin{pmatrix} \mathbf{z} \\ \mu_z \end{pmatrix} = \begin{pmatrix} \mathbf{Z}_0[:, \mathcal{A}]^\top \mathbf{r}_{\text{outer}} \\ 0 \end{pmatrix}.\end{split}\]

The analytical gradient is exact (verified against central FD to ~1e-7 at random interior points) and yields a ~5–10× speedup on the outer sweep. The cost is that, on the non-convex L1-penalized V-objective, the cleaner gradient lets L-BFGS-B settle at the first critical point near the cold init. The FD path’s implicit gradient noise tends to find better local optima at non-zero lambda, so FD is the default. Opt in to the analytical path when running large placebo sweeps where throughput matters more than exact local-optimum reproducibility. When use_analytical_grad = True, the L-BFGS-B ftol auto-tightens to 1e-12, because the clean gradient converges in many fewer iterations and the default ftol of 1e-8 would terminate the loop before convergence.
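The finite-difference cost can be illustrated with a short sketch. It differentiates a toy objective (a stand-in, not the SparseSC outer loss) over every coordinate except the anchor and counts evaluations; the helper names are hypothetical.

```python
def central_difference_grad(objective, v, step=1e-6):
    """Central-difference gradient over v[1:], holding the anchor v[0] fixed.

    Returns (grad, n_evals). Each objective call would trigger one
    inner-QP solve in the real outer objective.
    """
    grad, n_evals = [], 0
    for p in range(1, len(v)):          # skip the anchor at index 0
        v_plus = list(v)
        v_minus = list(v)
        v_plus[p] += step
        v_minus[p] -= step
        grad.append((objective(v_plus) - objective(v_minus)) / (2.0 * step))
        n_evals += 2
    return grad, n_evals

# Toy smooth objective standing in for the outer loss.
def toy_objective(v):
    return sum((v_p - 0.5 * p) ** 2 for p, v_p in enumerate(v))

v = [1.0, 0.2, 0.7, 1.5]                # P = 4, anchor first
grad, n_evals = central_difference_grad(toy_objective, v)
print(n_evals)  # 6, i.e. 2 * (P - 1)
```

A production version would also have to respect the \(v_p \ge 0\) bound when stepping downward from a coordinate at zero; this sketch ignores that detail.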

Lambda selection#

The penalty \(\lambda\) is selected by minimizing the unpenalized validation-block outcome MSE over the grid \(\Lambda\):

\[\widehat{\lambda} = \operatorname*{argmin}_{\lambda \in \Lambda} \; \frac{1}{T_0 - T_0^{\mathrm{tr}}} \sum_{t \in \mathcal{T}_1^{\mathrm{val}}} \bigl(y_{1t} - \mathbf{y}_{0t}^\top \mathbf{w}^\ast(\mathbf{v}^\ast(\lambda))\bigr)^2 .\]

The default grid is \(\Lambda = \{0\} \cup \text{logspace}(10^{-4}, 1, 50)\). Setting \(\lambda = 0\) recovers the unpenalized data-driven SCM with a unit-anchored first predictor.
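The grid and the selection rule can be sketched in a few lines of plain Python. The validation curve below is made up for illustration, and breaking ties toward the smallest \(\lambda\) is an assumption of this sketch rather than documented behaviour.

```python
def default_grid(size=51, low_exp=-4.0, high_exp=0.0):
    """Return [0] followed by size - 1 log-spaced points from 10**low_exp to 10**high_exp."""
    n = size - 1
    log_part = [10.0 ** (low_exp + (high_exp - low_exp) * i / (n - 1)) for i in range(n)]
    return [0.0] + log_part

def select_lambda(grid, validation_mse):
    """Pick the grid value with the smallest validation MSE (first one on ties)."""
    best_index = min(range(len(grid)), key=lambda i: validation_mse[i])
    return grid[best_index]

grid = default_grid()
# Made-up validation curve, smallest at index 10.
fake_curve = [abs(i - 10) + 1.0 for i in range(len(grid))]
best = select_lambda(grid, fake_curve)
```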

Predictor selection#

As \(\lambda\) grows, the L1 penalty drives uninformative \(v_p\) to exactly zero; the corresponding predictor effectively drops out of the fit. The selected predictor set is

\[\mathcal{S}(\widehat{\lambda}) = \{p : v_p^\ast(\widehat{\lambda}) > 0\}.\]

This is what makes the method Sparse SC: the explanation of the treated unit’s pre-trajectory is interpretable in terms of a small subset of predictors.
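Reading off the selected set is a one-liner. In this sketch the weights are illustrative numbers rather than fitted values, and the optional tolerance `tol` is an assumption added to absorb solver round-off.

```python
def selected_predictors(names, v, tol=0.0):
    """Return the predictor names whose v-weight exceeds tol."""
    return [name for name, weight in zip(names, v) if weight > tol]

names = ["p_cig", "loginc", "pct15-24", "pc_beer", "smk_88"]
v_hat = [1.0, 0.0, 0.0, 0.37, 2.4]   # illustrative, anchor first
print(selected_predictors(names, v_hat))  # ['p_cig', 'pc_beer', 'smk_88']
```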

ATT and Counterfactual#

With \(\widehat{\mathbf{v}} = \mathbf{v}^\ast(\widehat{\lambda})\) and \(\widehat{\mathbf{w}} = \mathbf{w}^\ast(\widehat{\mathbf{v}})\) recovered on the full pre-period, the counterfactual and ATT are

\[\widehat{y}_{1t} = \mathbf{y}_{0t}^\top \widehat{\mathbf{w}}, \qquad \widehat{\tau} = \frac{1}{T - T_0} \sum_{t = T_0 + 1}^T \bigl(y_{1t} - \widehat{y}_{1t}\bigr).\]
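The two formulas translate directly into code. A minimal pure-Python sketch on toy numbers, with a hypothetical helper name:

```python
def counterfactual_and_att(y1, Y0, w, T0):
    """Return (synthetic path, per-period gaps, ATT over periods after T0).

    y1: treated outcomes, length T.
    Y0: T rows of donor outcomes, each of length N0.
    w:  donor weights on the simplex, length N0.
    T0: number of pre-treatment periods.
    """
    y_hat = [sum(w_j * y_j for w_j, y_j in zip(w, row)) for row in Y0]
    gaps = [obs - syn for obs, syn in zip(y1, y_hat)]
    post_gaps = gaps[T0:]
    att = sum(post_gaps) / len(post_gaps)
    return y_hat, gaps, att

y1 = [10.0, 11.0, 12.0, 9.0, 8.0]
Y0 = [[10.0, 10.0], [12.0, 10.0], [12.0, 12.0], [14.0, 12.0], [14.0, 14.0]]
w = [0.5, 0.5]
y_hat, gaps, att = counterfactual_and_att(y1, Y0, w, T0=3)
print(att)  # -5.0
```

The pre-period gaps are zero in this toy example, so the whole post-period gap is attributed to the intervention.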

Conformal ATT inference (default)#

Inference defaults to a moving-block conformal CI for the ATT, following the philosophy of Chernozhukov, Wuethrich and Zhu (2021): treat the in-sample residuals as a calibration sample of what “noise” should look like under the no-treatment null, and invert a permutation test in \(\theta\) to bracket the ATT.

Define the residual series \(e_t = y_{1t} - \mathbf{y}_{0t}^\top \widehat{\mathbf{w}}\). The calibration set is

\[\begin{split}e^{\text{calib}} = \begin{cases} \{e_t : t \in (T_0^{\text{tr}}, T_0]\} & \text{if ``conformal\_window = "validation"`` (default)} \\ \{e_t : t \in [1, T_0]\} & \text{if ``conformal\_window = "pre"``} \end{cases}.\end{split}\]

The validation block is genuinely out-of-sample under the chosen \(\mathbf{v}\); the full pre-block gives a larger calibration sample but its training-block residuals are in-sample under \(\mathbf{v}\).

The conformity score for a block \(B\) of size \(b = \max(3, \lfloor\sqrt{T - T_0}\rfloor)\) is

\[s(B) = \frac{1}{b}\sum_{t \in B} |e_t|,\]

and the calibration distribution is built by sliding the block across \(e^{\text{calib}}\) (with wrap-around blocks for boundary coverage). The post-treatment test statistic at the candidate ATT \(\theta\) is

\[s_{\text{post}}(\theta) = \frac{1}{T - T_0}\sum_{t = T_0 + 1}^T \bigl|e_t - \theta\bigr|.\]

The \((1 - \alpha)\) conformal CI is

\[\mathrm{CI}_{1 - \alpha} = \{\theta : \Pr_{B}\bigl(s(B) \ge s_{\text{post}}(\theta)\bigr) > \alpha\},\]

which we compute by grid search over a generous neighbourhood of \(\widehat{\mathrm{ATT}}\). The two-sided p-value for \(H_0 : \mathrm{ATT} = 0\) is

\[p_{\text{conf}} = \Pr_{B}\bigl(s(B) \ge s_{\text{post}}(0)\bigr) = \frac{1}{|\mathcal{B}|}\sum_{B \in \mathcal{B}} \mathbb{1}\{s(B) \ge s_{\text{post}}(0)\}.\]

Pointwise per-period bands use the \((1 - \alpha)\)-quantile of the calibration scores directly:

\[[e_t - q_{1 - \alpha},\; e_t + q_{1 - \alpha}], \quad q_{1 - \alpha} = \mathrm{Quantile}_{1 - \alpha}\{s(B)\}.\]

This inferential procedure trades the cross-donor exchangeability assumption of Vives’s placebo (every donor is equally likely to be the treated unit) for a within-unit exchangeability assumption on the residuals (validation-period residuals look like the no-treatment counterfactual’s noise). On Prop 99 the conformal 95% CI is typically \([-20, -18]\) versus the placebo’s much wider bounds, because conformal leverages the actual model’s residual structure rather than donor-level heterogeneity.
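The conformal steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the mlsynth implementation: wrap-around is handled by starting one circular block at every calibration index, the candidate grid is supplied by the caller, and all function names are hypothetical.

```python
import math

def block_scores(calib, b):
    """Mean absolute residual over each circular block of length b."""
    n = len(calib)
    return [sum(abs(calib[(start + k) % n]) for k in range(b)) / b
            for start in range(n)]

def post_stat(e_post, theta):
    """Post-treatment statistic: mean |e_t - theta| over post-period residuals."""
    return sum(abs(e - theta) for e in e_post) / len(e_post)

def conformal_pvalue(calib, e_post, theta=0.0):
    """Share of calibration blocks whose score is at least s_post(theta)."""
    b = max(3, math.isqrt(len(e_post)))     # block size from the text
    scores = block_scores(calib, b)
    s_post = post_stat(e_post, theta)
    return sum(s >= s_post for s in scores) / len(scores)

def conformal_ci(calib, e_post, alpha, theta_grid):
    """Invert the test: keep every theta whose p-value exceeds alpha."""
    kept = [th for th in theta_grid if conformal_pvalue(calib, e_post, th) > alpha]
    return (min(kept), max(kept)) if kept else None

calib = [1.0, -1.0, 1.0, -1.0, 1.0]          # toy validation-block residuals
e_post = [-20.0, -20.0, -20.0, -20.0]        # toy post-period gaps
theta_grid = [-25.0 + 0.5 * i for i in range(21)]
p_zero = conformal_pvalue(calib, e_post, 0.0)
ci = conformal_ci(calib, e_post, 0.05, theta_grid)
print(p_zero, ci)  # 0.0 (-21.0, -19.0)
```

Every calibration block here scores exactly 1, so the interval collects all candidates within one unit of the post-period gap. The interval width is driven entirely by the size of the calibration residuals.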

Abadie-style placebo (opt-in)#

Set inference_method="placebo" to recover Vives’s procedure. For each donor \(j\), swap that donor into the treated slot, remove it from the donor pool, refit SparseSC at the already-selected \(\widehat\lambda\) (or, optionally, re-run the full \(\lambda\) sweep) and record the placebo ATT. The two-sided permutation p-value is

\[p = \frac{\#\{j : |\mathrm{ATT}_j^{\text{placebo}}| \ge |\widehat{\mathrm{ATT}}|\} + 1}{B + 1},\]

where \(B\) is the number of completed placebos (a count, unrelated to the conformal block \(B\) of the previous section). Re-using \(\widehat\lambda\) makes the placebo loop tractable; set placebo_resweep=True to re-select \(\lambda\) for every placebo (much slower).
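The permutation p-value is simple to compute once the placebo ATTs are in hand. A minimal sketch with made-up placebo values and a hypothetical helper name:

```python
def placebo_p_value(att, placebo_atts):
    """Two-sided permutation p-value with the +1 correction from the formula above."""
    n_extreme = sum(abs(a) >= abs(att) for a in placebo_atts)
    return (n_extreme + 1) / (len(placebo_atts) + 1)

# One of four made-up placebo ATTs is at least as extreme as the observed one.
p = placebo_p_value(-19.5, [1.0, -2.0, 25.0, 3.0])
print(p)  # 0.4
```

The +1 in numerator and denominator counts the treated unit itself, so the smallest attainable p-value with \(B\) placebos is \(1 / (B + 1)\).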

Predictor Convention#

Like every other mlsynth estimator, SparseSC is fed a single long-format df with one row per (unit, time). Predictors are constructed under the hood from the same frame, in two flavors:

covariates – column names in df whose per-unit pre-treatment mean is taken as the predictor value. Time-invariant unit characteristics collapse trivially; time-varying covariates are summarized by the pre-period mean.
outcome_lag_periods – specific pre-treatment time labels (as found in the time column) whose outcome values become additional predictor rows. These are the canonical Abadie, Diamond & Hainmueller (2010) lagged-outcome predictors (e.g., the smk_75, smk_80, smk_88 rows in the Prop 99 example).

The two lists are concatenated to form the predictor matrix; the first predictor (first entry of covariates if any, otherwise the first outcome lag) is the anchor whose \(v\)-weight is fixed at 1. The anchor choice matters in finite samples — Vives recommends picking a predictor known to be informative, or treating the anchor as a hyperparameter and sweeping it (Vives 2023, Appendix 6.1).
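The convention can be sketched without pandas by treating the long-format frame as a list of row dictionaries. This is an illustration only: mlsynth itself consumes a DataFrame, and the helper name, the "unit"/"time" keys, and the predictor labels are hypothetical.

```python
def build_predictors(rows, unit, covariates, outcome, lag_periods, pre_periods):
    """Return (labels, values) for one unit's predictor vector.

    Covariates are averaged over pre_periods; each lag period contributes
    the outcome observed in that period. Covariates come first, so the
    first covariate (or the first lag when there are none) is the anchor.
    """
    unit_rows = [r for r in rows if r["unit"] == unit]
    labels, values = [], []
    for name in covariates:
        pre_values = [r[name] for r in unit_rows if r["time"] in pre_periods]
        labels.append(name)
        values.append(sum(pre_values) / len(pre_values))
    for period in lag_periods:
        match = next(r for r in unit_rows if r["time"] == period)
        labels.append(f"{outcome}_{period}")
        values.append(match[outcome])
    return labels, values

rows = [
    {"unit": "A", "time": 1, "y": 5.0, "income": 10.0},
    {"unit": "A", "time": 2, "y": 6.0, "income": 14.0},
    {"unit": "A", "time": 3, "y": 9.0, "income": 30.0},  # post-treatment row
]
labels, values = build_predictors(rows, "A", ["income"], "y", [2], pre_periods={1, 2})
print(labels, values)  # ['income', 'y_2'] [12.0, 6.0]
```

The post-treatment row never enters the predictors: the covariate mean uses periods 1 and 2 only, and the lag is read from period 2.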

Performance notes#

The single biggest cost in a SparseSC fit is the inner W-weight QP, which is invoked

\[|\Lambda| \times (\text{outer iters}) \times g\]

times where \(g = 2(P-1)\) under finite-difference gradients and \(g = 1\) under the analytical gradient. For Vives’s augmented k=40 spec that’s ~50,000 inner-QP calls just for the fit, plus another \(B \approx 38\) placebos under inference_method="placebo". Two optimizations in this build matter:

Direct Clarabel removes CVXPY canonicalization (~30-60× per call). Speedup applies universally; no correctness tradeoff.
Analytical gradient (opt-in via use_analytical_grad=True) removes the \(2(P-1)\) finite-difference factor (~5-10× on the outer loop). Tradeoff: the cleaner gradient can settle in worse local optima of the non-convex L1-penalized outer objective; FD’s implicit gradient noise tends to escape them. Default off for correctness.

Empirically, the combination puts the canonical ADH-7 California Prop 99 fit at ~5 s with analytical gradient and ~23 s with FD (versus a CVXPY+SCS baseline that would hang for minutes on the augmented k=40 spec).
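The call-count formula is easy to evaluate. In the sketch below the figure of 13 outer iterations per \(\lambda\) is an assumption picked purely for illustration, and the helper name is hypothetical.

```python
def inner_qp_calls(n_lambda, outer_iters, n_predictors, analytical_grad=False):
    """Count inner-QP solves: |Lambda| x outer iterations x g."""
    g = 1 if analytical_grad else 2 * (n_predictors - 1)
    return n_lambda * outer_iters * g

# 51 grid points, P = 40 predictors, 13 outer iterations (assumed).
fd_calls = inner_qp_calls(51, 13, 40)
analytical_calls = inner_qp_calls(51, 13, 40, analytical_grad=True)
print(fd_calls, analytical_calls)  # 51714 663
```

Under that assumption the finite-difference path lands near the ~50,000 calls quoted above.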

Core API#

Sparse Synthetic Control (SparseSC) estimator.

Implements the L1-penalized predictor-weighting SCM of Vives-i-Bastida (2023, Predictor Selection for Synthetic Controls), applied to the canonical Abadie, Diamond, and Hainmueller (2010) framework.

The estimator has a two-level structure. The inner problem is the standard SCM simplex QP that picks donor weights w given a fixed diagonal predictor-importance matrix diag(v). The outer problem picks the V-weights themselves by minimizing a pre-treatment outcome MSE plus an L1 penalty on v. The penalty parameter is selected by the unpenalized validation MSE. The first V-weight is pinned to 1 to anchor the scale; the others are bound-constrained non-negative.

Compared with canonical SCM, the L1 penalty yields interpretable predictor selection: as lambda increases, V-weights collapse to zero on uninformative predictors, leaving a sparse explanation of the fit.

The block used for the outer MSE is set by outer_loss_window. The SparseSCConfig default, "training", follows the unpublished MATLAB driver sparse_synth.m, which minimizes the outcome MSE on the training block in the outer V step. Setting outer_loss_window="validation" uses the validation block instead, matching Algorithm 1 of the paper.

class mlsynth.estimators.sparse_sc.SparseSC(config: SparseSCConfig | dict)#

Bases: object

L1-penalized Sparse Synthetic Control estimator.

Parameters: config (SparseSCConfig or dict) – Configuration object. See mlsynth.config_models.SparseSCConfig.
Returns: SparseSCResults – Typed container with the selected V- and W-weights, the validation-MSE curve over the lambda grid, the counterfactual, and (unless inference is disabled) inference results: conformal by default, or Abadie placebo when inference_method="placebo".

Notes

Predictors are supplied through covariates (columns in df whose per-unit pre-treatment mean becomes one predictor row) and/or outcome_lag_periods (specific pre-treatment time labels whose outcome values become predictor rows – the canonical ADH lagged-outcome predictors). The first predictor is the “anchor” whose V-weight is fixed at 1.

Examples

>>> import pandas as pd
>>> from mlsynth import SparseSC
>>> df = pd.read_csv("smoking_long.csv")
>>> res = SparseSC({
...     "df": df, "outcome": "cigsale",
...     "treat": "Proposition 99", "unitid": "state", "time": "year",
...     "covariates": ["p_cig", "loginc", "pct15-24", "pc_beer"],
...     "outcome_lag_periods": [1975, 1980, 1988],
...     "display_graphs": False,
... }).fit()
>>> res.att
-19.5...

fit() → SparseSCResults#: Run the lambda sweep, recover W-weights, and return results.

Configuration#

class mlsynth.config_models.SparseSCConfig(*, df: pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', plot: PlotConfig = <factory>, covariates: List[str] | None = None, outcome_lag_periods: List[Any] | None = None, T0_train: Annotated[int | None, Ge(ge=2)] = None, lambda_grid: List[float] | None = None, standardize: bool = True, outer_loss_window: str = 'training', solver: Any = None, max_outer_iter: Annotated[int, Ge(ge=10)] = 500, run_inference: bool = True, inference_method: Literal['conformal', 'placebo', 'none'] = 'conformal', conformal_window: Literal['validation', 'pre'] = 'validation', alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05, n_placebo: Annotated[int | None, Ge(ge=1)] = None, placebo_resweep: bool = False, seed: int = 1400, use_analytical_grad: bool = False, warm_start: bool = False)#

Configuration for the Sparse Synthetic Control (SparseSC) estimator.

Implements the L1-penalized predictor-weighting SCM variant of Vives-i-Bastida and collaborators (port of the MATLAB sparse_synth.m driver) for the canonical Abadie, Diamond, and Hainmueller (2010) framework.

Like every other mlsynth estimator this one is fed a single long-format df with one row per (unit, time). Predictors are constructed under the hood from the long frame: each column listed in covariates is collapsed to its pre-treatment mean per unit, and each entry of outcome_lag_periods adds the outcome at that specific pre-treatment period as a predictor.

T0_train: int | None#

alpha: float#

conformal_window: Literal['validation', 'pre']#

covariates: List[str] | None#

inference_method: Literal['conformal', 'placebo', 'none']#

lambda_grid: List[float] | None#

max_outer_iter: int#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_placebo: int | None#

outcome_lag_periods: List[Any] | None#

outer_loss_window: str#

placebo_resweep: bool#

run_inference: bool#

seed: int#

solver: Any#

standardize: bool#

use_analytical_grad: bool#

warm_start: bool#
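As an illustration of how these fields combine, here is a configuration dictionary that departs from several defaults. Every key and allowed value is taken from the signature above; the column names are those of the Prop 99 example, and df is left out because it must be supplied as a long-format pandas DataFrame.

```python
# Options layered on top of the required column names; "df" is omitted here.
config = {
    "outcome": "cigsale",
    "treat": "Proposition 99",
    "unitid": "state",
    "time": "year",
    "covariates": ["p_cig", "loginc", "pct15-24", "pc_beer"],  # first entry is the anchor
    "outcome_lag_periods": [1975, 1980, 1988],
    "outer_loss_window": "validation",   # paper's Algorithm 1 instead of the "training" default
    "inference_method": "placebo",       # Abadie-style permutation instead of conformal
    "placebo_resweep": False,            # reuse the selected lambda in every placebo
    "use_analytical_grad": True,         # faster outer loop, subject to the local-optimum caveat
    "alpha": 0.05,
    "display_graphs": False,
}
```

Because the model is declared with extra set to 'forbid', a misspelled key should be rejected at construction time rather than silently ignored.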

Helper Modules#

Panel and predictor preparation for SparseSC.

The estimator takes a single long-format df (one row per (unit, time)) and constructs both the outcome panel and the unit-by-predictor matrix internally. Predictors come from two sources:

covariates – columns in df whose per-unit pre-treatment mean becomes one predictor row.
outcome_lag_periods – specific pre-treatment time labels whose outcome values become additional predictor rows (the canonical Abadie, Diamond & Hainmueller (2010) lagged-outcome predictors).

The first predictor (first entry of covariates if any, otherwise the first outcome lag) is the anchor whose V-weight is fixed at 1.

mlsynth.utils.sparse_sc_helpers.setup.prepare_sparse_sc_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, covariates: Sequence[str] | None = None, outcome_lag_periods: Sequence[Any] | None = None, T0_train: int | None = None, standardize: bool = True) → SparseSCInputs#

Build SparseSC inputs from a single long-format panel.

Parameters:

df (pd.DataFrame) – Long-format balanced panel: one row per (unit, time) with the outcome, a binary treatment indicator, and any covariates.
outcome, treat, unitid, time (str) – Column names in df.
covariates (Sequence[str], optional) – Columns in df whose per-unit pre-treatment mean becomes a predictor row. The first covariate is the anchor (V-weight pinned to 1).
outcome_lag_periods (Sequence, optional) – Specific pre-treatment time labels whose outcome values become additional predictor rows. Appended after covariates.
T0_train (int, optional) – End of the training block within the pre-period (exclusive). Defaults to floor(T0_total * 0.75) – a 75/25 split.
standardize (bool) – Standardize each predictor row by its sample standard deviation across all units. Default True.

Inner W-weight QP for SparseSC.

Given V-weights v (a vector of predictor importance), the donor weights w solve the canonical SCM simplex QP

min over w >= 0, sum(w) = 1:
w' X0' diag(v) X0 w - 2 X1' diag(v) X0 w.

The inner QP is called O(grid x outer_iters x P) times during a SparseSC fit, so its per-call cost dominates the total wall time. We solve it by calling Clarabel directly (skipping CVXPY’s canonicalization layer); CVXPY adds ~10-50 ms of parsing overhead per call on a 39-donor problem, while the underlying solve is microseconds.

A tiny ridge is added to the quadratic term for numerical stability: when the augmented Vives k~40 specification is used the donor design matrix can be rank-deficient (more predictors than donors), which makes H = X0’ diag(v) X0 numerically singular under any QP solver. The ridge is scaled by the trace of H so it is invariant to the units of v and X0.

The solver argument is retained for API compatibility but no longer selects between back-ends; Clarabel is used unconditionally.

mlsynth.utils.sparse_sc_helpers.inner.solve_w(v: ndarray, X1: ndarray, X0: ndarray, solver: Any = None) → ndarray#

Return the donor-weight vector w on the simplex.

Parameters:

v (np.ndarray) – Length-P V-weight vector.
X1 (np.ndarray) – Length-P treated predictor vector.
X0 (np.ndarray) – (P, N) donor predictor matrix.
solver (Any, optional) – Unused. Retained for backwards compatibility with the previous CVXPY-based interface.

Outer V-objective for SparseSC, with closed-form gradient.

The outer objective is

L(v_2; lam) = (1/T_outer) * ||Z1 - Z0 w*(v)||^2 + lam * ||v||_1

where v = [1; v_2] (the first predictor is pinned at 1 to break the positive-scale invariance of the inner simplex QP, as argued in Vives-i-Bastida (2023) Appendix 6.1) and w*(v) solves the inner simplex QP

min_w w' H(v) w - 2 g(v)' w s.t. 1' w = 1, w >= 0,

with H(v) = X0' diag(v) X0 and g(v) = X0' diag(v) X1.

Two outer windows are supported, controlled at call sites via the Z1, Z0 arguments: pass the validation block to match Algorithm 1 in the paper; pass the training block to match the MATLAB driver.

This module provides three callables:

outer_loss – the loss alone (kept for back-compat).
selection_mse – the unpenalised validation-block MSE used to choose lambda.
outer_loss_and_grad – (loss, grad) with the closed-form envelope-theorem gradient.

The closed-form gradient avoids the 2(P-1)-evaluation finite-difference cost that L-BFGS-B otherwise incurs per outer step. The derivation is in the module docstring of optimization.py.

mlsynth.utils.sparse_sc_helpers.objective.outer_loss(v2: ndarray, X1: ndarray, X0: ndarray, Z1: ndarray, Z0: ndarray, lam: float, solver: Any = None) → float#

Outer V-objective: mean((Z1 - Z0 w(v))^2) + lam * ||v||_1.

Pass the validation block to match the paper’s Algorithm 1; pass the training block to match the MATLAB driver.

mlsynth.utils.sparse_sc_helpers.objective.outer_loss_and_grad(v2: ndarray, X1: ndarray, X0: ndarray, Z1: ndarray, Z0: ndarray, lam: float, solver: Any = None, active_tol: float = 1e-07) → Tuple[float, ndarray]#

Return (loss, grad) of the outer V-objective w.r.t. v_2.

Implements the envelope-theorem gradient described in this module’s docstring. The active set is recovered from the inner solution by thresholding on active_tol. A single (|A|+1) x (|A|+1) KKT solve gives the adjoint, after which all P-1 gradient components are computed in O(P |A|) work.

The L1 part contributes +lam per coordinate (right-derivative under the v_2 >= 0 bound L-BFGS-B already enforces).

Falls back to lstsq if the reduced KKT matrix is numerically singular (which can happen when the same donor appears at multiple predictor rows).

mlsynth.utils.sparse_sc_helpers.objective.selection_mse(v2: ndarray, X1: ndarray, X0: ndarray, Z1_val: ndarray, Z0_val: ndarray, solver: Any = None) → float#: Unpenalised validation-block MSE used to select lambda.

mlsynth.utils.sparse_sc_helpers.objective.training_loss(v2: ndarray, X1: ndarray, X0: ndarray, Z1: ndarray, Z0: ndarray, lam: float, solver: Any = None) → float#

Outer V-objective: mean((Z1 - Z0 w(v))^2) + lam * ||v||_1.

Pass the validation block to match the paper’s Algorithm 1; pass the training block to match the MATLAB driver.

mlsynth.utils.sparse_sc_helpers.objective.validation_mse(v2: ndarray, X1: ndarray, X0: ndarray, Z1_val: ndarray, Z0_val: ndarray, solver: Any = None) → float#: Unpenalised validation-block MSE used to select lambda.

Lambda sweep + V-weight optimisation for SparseSC.

For each lambda on the grid the outer V-weight problem is a smooth bound-constrained nonlinear program (v_2 >= 0) solved with scipy.optimize.minimize (L-BFGS-B). The selected lambda is the value minimising the unpenalised validation-block MSE.

Two performance refinements over a naive implementation are in place:

Closed-form gradient (Vives’s Algorithm 1 outer objective is smooth in v away from the L1 kink; the L1 part has a trivial right-derivative under the v_2 >= 0 bound L-BFGS-B already enforces). Without this, L-BFGS-B falls back to a 2(P-1)-evaluation central finite-difference per outer step, which is the dominant cost on large predictor sets. The closed-form gradient is implemented in objective.outer_loss_and_grad via the envelope theorem: at the inner optimum w*(v), one (|A|+1) x (|A|+1) linear solve on the active-set KKT matrix produces all P-1 gradient components. It is opt-in: use_analytical_grad defaults to False.
Warm starts across the lambda grid. The path is monotone in lambda, so the V-solution at lambda_i is a good initialiser for lambda_{i+1}. A failed warm start falls back to the cold MATLAB init default_v20.

The outer V-objective window is controlled by outer_loss_window:

"validation" (default of sweep_lambda, paper) – outer V minimises validation-block MSE + lambda * ||V||_1. Matches Vives-i-Bastida (2023) Algorithm 1. Note that the SparseSCConfig default is "training".
"training" – outer V minimises training-block MSE + lambda * ||V||_1. Matches the unpublished MATLAB driver sparse_synth.m.

mlsynth.utils.sparse_sc_helpers.optimization.default_lambda_grid(size: int = 51) → ndarray#: Return [0, logspace(-4, 0, size - 1)] (matches MATLAB).

mlsynth.utils.sparse_sc_helpers.optimization.default_v20(X0: ndarray) → ndarray#: MATLAB starting v_2 = (sd_1 / sd_k)^2 for k > 1.
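The cold-start rule can be sketched as follows. This is an illustration that assumes the standard deviations are taken across the donor columns of X0; the ratio itself does not depend on whether the sample or population formula is used. The function name is hypothetical.

```python
import statistics

def default_v20_sketch(X0):
    """Starting values v_k = (sd_1 / sd_k)^2 for k > 1.

    X0: list of P predictor rows, each a list of donor values.
    Returns a list of length P - 1 (the anchor v_1 = 1 is not included).
    """
    sds = [statistics.stdev(row) for row in X0]
    return [(sds[0] / sd_k) ** 2 for sd_k in sds[1:]]

X0 = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.5, 2.0]]
v20 = default_v20_sketch(X0)   # approximately [0.25, 4.0]
```

Predictors that vary less than the anchor start with a larger weight and vice versa, which roughly equalizes the initial contribution of each row.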

mlsynth.utils.sparse_sc_helpers.optimization.recover_w(v: ndarray, X1: ndarray, X0: ndarray, solver: Any = None) → ndarray#: Final donor-weight recovery at the selected V-weights.

mlsynth.utils.sparse_sc_helpers.optimization.sweep_lambda(X1: ndarray, X0: ndarray, Y1: ndarray, Y0: ndarray, T0_total: int, T0_train: int, lambda_grid: ndarray | None = None, solver: Any = None, max_outer_iter: int = 500, ftol: float | None = None, outer_loss_window: str = 'validation', use_analytical_grad: bool = False, warm_start: bool = False, multi_start: int = 1) → Tuple[ndarray, float, ndarray, ndarray, ndarray, ndarray]#

Sweep lambda and return the best V-weights.

Parameters:

outer_loss_window ({“validation”, “training”}) – Which pre-treatment block the outer V-objective evaluates the outcome MSE over.
use_analytical_grad (bool, default False) – Set to True to use the envelope-theorem closed-form gradient inside L-BFGS-B. When False, scipy’s finite-difference gradient is used (~20-50x slower on the augmented Vives spec).
warm_start (bool, default False) – Set to True to reuse the previous lambda’s V-solution as the initialiser for the next lambda. Falls back to the cold MATLAB init if a warm-started fit appears to fail.

Returns:

optv (np.ndarray) – Final V-weights, shape (P,) with optv[0] = 1.
opt_lambda (float) – Lambda value selected on the validation MSE.
grid (np.ndarray) – Lambda grid actually used.
outer_curve (np.ndarray) – Penalised outer objective at each grid point.
val_curve (np.ndarray) – Unpenalised validation MSE at each grid point (selection target).
v_path (np.ndarray) – Per-grid-point V-weights, shape (len(grid), P).
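The sweep described above can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the library's code: the inner QP is solved with SLSQP rather than the library's solver, the outer gradient is scipy's finite-difference default, the cold start is a vector of ones rather than default_v20, and all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def inner_w(v, X1, X0):
    """Simplex-constrained donor weights for a given predictor-weight vector v."""
    n = X0.shape[1]

    def loss(w):
        r = X1 - X0 @ w
        return float(r @ (v * r))

    def grad(w):
        return -2.0 * X0.T @ (v * (X1 - X0 @ w))

    res = minimize(
        loss, np.full(n, 1.0 / n), jac=grad, method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x


def val_mse(v2, X1, X0, Z1_val, Z0_val):
    """Unpenalised validation-block MSE at v = (1, v2)."""
    w = inner_w(np.concatenate(([1.0], v2)), X1, X0)
    return float(np.mean((Z1_val - Z0_val @ w) ** 2))


def sweep_sketch(X1, X0, Z1_val, Z0_val, grid):
    v2 = np.ones(X0.shape[0] - 1)  # cold start (assumed, for illustration)
    v_path, val_curve = [], []
    for lam in grid:
        # With v2 >= 0 the L1 norm is 1 + sum(v2); the constant 1 is dropped.
        def outer(z, lam=lam):
            return val_mse(z, X1, X0, Z1_val, Z0_val) + lam * z.sum()

        res = minimize(outer, v2, method="L-BFGS-B",
                       bounds=[(0.0, None)] * v2.size,
                       options={"maxiter": 50, "maxfun": 300})
        v2 = res.x  # warm start for the next lambda
        v_path.append(np.concatenate(([1.0], v2)))
        val_curve.append(val_mse(v2, X1, X0, Z1_val, Z0_val))
    best = int(np.argmin(val_curve))  # select on the unpenalised validation MSE
    return v_path[best], float(grid[best]), np.array(val_curve), np.array(v_path)


# Synthetic data: the treated unit is an exact mix of the first two donors.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(3, 5))
Z0_val = rng.normal(size=(4, 5))
w_true = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
X1, Z1_val = X0 @ w_true, Z0_val @ w_true
grid = np.concatenate(([0.0], np.logspace(-4, 0, 4)))
v_best, lam_best, val_curve, v_path = sweep_sketch(X1, X0, Z1_val, Z0_val, grid)
```

Note that the penalised objective drives each fit, while the selection across the grid uses only the unpenalised validation MSE, mirroring the split between outer_curve and val_curve in the return values.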

Inference procedures for SparseSC.

Two methods are implemented:

run_placebo – the Abadie-style placebo permutation. For each donor we treat that donor as the placebo treated unit, refit SparseSC at the already-selected lambda, and compare the observed ATT against the distribution of placebo ATTs.
conformal_inference – a moving-block conformal CI in the spirit of Chernozhukov, Wuethrich and Zhu (2021), adapted to the SparseSC pre / validation / post panel layout. Calibration residuals come from either the validation block (default – smallest sample but truly out-of-sample under V) or the entire pre-treatment block (larger sample, but training residuals are in-sample under V). The ATT CI is obtained by inverting a moving-block test of the form mean(|e_post - theta|) <= q_{1-alpha} of the calibration distribution; pointwise per-period bands use the same q_{1-alpha} quantile directly.

mlsynth.utils.sparse_sc_helpers.inference.conformal_inference(gap: ndarray, T0_train: int, T0_total: int, T: int, conformal_window: str = 'validation', alpha: float = 0.05, block_size: int | None = None, grid_size: int = 401, grid_half_width_se: float = 6.0) → dict#

Conformal ATT confidence interval from pre-treatment calibration residuals.

Parameters:

gap (np.ndarray) – Full-period residual Y1 - Y0 @ w, shape (T,). The pre-treatment portion (gap[:T0_total]) is interpreted as noise under the no-treatment null; gap[T0_total:] is the post-treatment effect-plus-noise sequence.
T0_train, T0_total, T (int) – Training-block end / pre-block end / full length. Pre = [0, T0_total), validation = [T0_train, T0_total), post = [T0_total, T).
conformal_window ({“validation”, “pre”}) – Which residual block to use for calibration. "validation" uses only gap[T0_train:T0_total] (truly out-of-sample under the chosen V); "pre" uses the entire gap[:T0_total].
alpha (float) – Two-sided significance level.
block_size (int, optional) – Moving-block size for the conformity score. Defaults to max(3, sqrt(n_post)), matching LEXSCM.
grid_size (int) – Number of theta candidates in the grid search for the ATT CI.
grid_half_width_se (float) – The grid spans [ATT_hat +/- grid_half_width_se * SE] where SE is a plug-in standard error from the calibration residuals.

Returns:

dict – Keys: method, att_observed, ci_lower, ci_upper, p_value, calibration_residuals, pointwise_lower, pointwise_upper, alpha.
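The test inversion can be sketched as below. This is a simplified reading of the description above, not the library's implementation: the conformity scores are taken to be mean absolute residuals over moving blocks of the validation residuals, sqrt(n_post) is truncated to an integer, and the plug-in SE is assumed to be the calibration standard deviation divided by sqrt(n_post). The function name is hypothetical.

```python
import numpy as np


def conformal_att_ci_sketch(gap, T0_train, T0_total, alpha=0.05,
                            block_size=None, grid_size=401, half_width_se=6.0):
    gap = np.asarray(gap, dtype=float)
    cal = gap[T0_train:T0_total]   # validation-block residuals
    post = gap[T0_total:]          # post-treatment gaps
    n_post = post.size
    if block_size is None:
        block_size = max(3, int(np.sqrt(n_post)))  # truncation is an assumption
    b = min(block_size, cal.size)

    # Conformity scores: mean absolute residual over each moving block.
    abs_cal = np.abs(cal)
    scores = np.array([abs_cal[i:i + b].mean() for i in range(cal.size - b + 1)])
    q = float(np.quantile(scores, 1.0 - alpha))

    # Grid of candidate ATT values centred on the point estimate.
    att_hat = float(post.mean())
    se = cal.std(ddof=1) / np.sqrt(n_post)  # assumed plug-in SE
    thetas = np.linspace(att_hat - half_width_se * se,
                         att_hat + half_width_se * se, grid_size)

    # Keep every theta whose post-period statistic does not exceed q.
    stats = np.array([np.mean(np.abs(post - th)) for th in thetas])
    keep = thetas[stats <= q]
    lo, hi = (float(keep.min()), float(keep.max())) if keep.size else (np.nan, np.nan)
    return {"att_observed": att_hat, "ci_lower": lo, "ci_upper": hi, "q": q}


# Deterministic toy series: unit-size pre-period noise, post-period gap near -5.
pre = np.tile([1.0, -1.0], 10)
post = -5.0 + np.tile([0.5, -0.5], 4)
out = conformal_att_ci_sketch(np.concatenate([pre, post]), T0_train=12, T0_total=20)
print(out)
```

In this toy series every calibration score equals 1, so the interval is the set of theta with mean(|e_post - theta|) <= 1, which is approximately [-6, -4] up to the grid spacing.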

mlsynth.utils.sparse_sc_helpers.inference.run_placebo(Y0: ndarray, Y1: ndarray, X0: ndarray, X1: ndarray, T0_total: int, T0_train: int, selected_lambda: float, observed_att: float, solver: Any = None, resweep: bool = False, lambda_grid: ndarray | None = None, n_placebo: int | None = None, seed: int = 1400, outer_loss_window: str = 'validation') → Tuple[ndarray, float, int]#

Return (placebo_atts, p_value, n_completed).

Parameters:

Y0, Y1, X0, X1 (np.ndarray) – Full pre-standardized panel + predictor matrices.
T0_total, T0_train (int) – Pre-treatment window bounds.
selected_lambda (float) – Lambda chosen on the actual treated unit. Reused for each placebo when resweep=False (default).
observed_att (float) – ATT of the actual treated unit, used to construct the p-value.
resweep (bool) – If True, re-run the full lambda grid for each placebo. Slow.
lambda_grid (np.ndarray, optional) – Grid for the resweep case.
n_placebo (int, optional) – Subsample of donors to use as placebos. None uses every donor.
seed (int) – Seed for the subsample when n_placebo < N.
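A minimal sketch of the final comparison step, assuming the two-sided p-value is the share of placebo ATTs at least as large in absolute value as the observed ATT. The exact formula used by run_placebo is not stated here, so this convention, the function names, and the numbers below are illustrative assumptions.

```python
import numpy as np


def placebo_p_value_sketch(placebo_atts, observed_att):
    """Share of placebo ATTs at least as extreme as the observed ATT."""
    placebo_atts = np.asarray(placebo_atts, dtype=float)
    return float(np.mean(np.abs(placebo_atts) >= abs(observed_att)))


def pick_placebo_donors(n_donors, n_placebo=None, seed=1400):
    """Donor indices to refit as placebos; every donor when n_placebo is None."""
    if n_placebo is None or n_placebo >= n_donors:
        return np.arange(n_donors)
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_donors, size=n_placebo, replace=False))


# Made-up placebo ATTs for illustration only.
placebos = [-2.0, 1.5, 0.3, -12.0, 4.0, -0.7, 2.2, -3.1]
p = placebo_p_value_sketch(placebos, observed_att=-10.0)
idx = pick_placebo_donors(n_donors=38, n_placebo=10)
print(p, idx)
```

With N donors the smallest attainable p-value under this convention is 1/N when one placebo is as extreme as the treated unit, which is why small donor pools limit the resolution of placebo inference.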

Plot helper for SparseSC.

Wraps mlsynth.utils.resultutils.plot_estimates() so the observed-vs-counterfactual chart works with the typed results object.

mlsynth.utils.sparse_sc_helpers.plotter.plot_sparse_sc(results: SparseSCResults, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | str | dict = False, time_axis_label: str = 'Time', treatment_label: str = 'Treatment', unit_label: str = 'Unit') → None#: Render observed vs SparseSC counterfactual on the treated unit.

Typed result containers for SparseSC.

class mlsynth.utils.sparse_sc_helpers.structures.SparseSCDesign(v: ndarray, w: ndarray, opt_lambda: float, lambda_grid: ndarray, train_loss_curve: ndarray, val_mse_curve: ndarray, v_path: ndarray)#

Optimization outputs of the lambda sweep.

Parameters:

v (np.ndarray) – Final V-weights, shape (P,). First entry is 1 (the anchor).
w (np.ndarray) – Final donor weights, shape (N,), on the simplex.
opt_lambda (float) – Selected L1 penalty.
lambda_grid (np.ndarray) – Full grid of lambdas swept.
train_loss_curve (np.ndarray) – Training loss at each grid point, length equal to lambda_grid.
val_mse_curve (np.ndarray) – Validation MSE at each grid point.
v_path (np.ndarray) – Per-grid-point V-weights, shape (len(grid), P).

lambda_grid: ndarray#

opt_lambda: float#

train_loss_curve: ndarray#

v: ndarray#

v_path: ndarray#

val_mse_curve: ndarray#

w: ndarray#

class mlsynth.utils.sparse_sc_helpers.structures.SparseSCInference(method: str, p_value: float, att_observed: float = nan, ci_lower: float = nan, ci_upper: float = nan, alpha: float = nan, placebo_atts: ~numpy.ndarray = <factory>, n_placebo: int = 0, calibration_residuals: ~numpy.ndarray = <factory>, pointwise_lower: ~numpy.ndarray = <factory>, pointwise_upper: ~numpy.ndarray = <factory>)#

Inference results for SparseSC.

Either the Abadie-style placebo permutation or the validation-block conformal inference of Chernozhukov, Wuethrich and Zhu (2021) adapted to the SparseSC pre/post layout. The method tag identifies which fields are populated.

Parameters:

method (str) – "abadie_placebo_permutation", "conformal_validation", "conformal_pre", or "none".
p_value (float) – Two-sided p-value for H_0: ATT = 0. NaN when no inference was run.
att_observed (float) – Point estimate of ATT, copied here for convenience.
ci_lower, ci_upper (float) – Lower/upper bounds of the (1 - alpha) confidence interval for the ATT. NaN for method="none".
alpha (float) – Two-sided significance level used to build ci_*.
placebo_atts (np.ndarray) – Placebo ATTs, populated only when method is the placebo permutation. Empty array otherwise.
n_placebo (int) – Number of placebo runs (placebo method only; 0 otherwise).
calibration_residuals (np.ndarray) – Residuals used to build the conformity scores (conformal method only). Empty for the placebo method.
pointwise_lower, pointwise_upper (np.ndarray) – Per-period pointwise band around each post-period gap from the (1 - alpha)-quantile of the conformity scores. Empty for non-conformal methods.

alpha: float = nan#

att_observed: float = nan#

calibration_residuals: ndarray#

ci_lower: float = nan#

ci_upper: float = nan#

method: str#

n_placebo: int = 0#

p_value: float#

placebo_atts: ndarray#

pointwise_lower: ndarray#

pointwise_upper: ndarray#

class mlsynth.utils.sparse_sc_helpers.structures.SparseSCInputs(Y0: ndarray, Y1: ndarray, X0: ndarray, X1: ndarray, T: int, T0_total: int, T0_train: int, treated_unit_name: Any, donor_names: Sequence, predictor_names: Sequence, time_labels: ndarray, Ywide: Any, outcome: str)#

Pre-processed panel + predictor matrices for SparseSC.

Parameters:

Y0 (np.ndarray) – Donor outcome matrix, shape (T, N) (rows = time, columns = donors), aligned with donor_names.
Y1 (np.ndarray) – Treated outcome series, shape (T,).
X0 (np.ndarray) – Donor predictor matrix, shape (P, N) (rows = predictors, columns = donors), already standardized.
X1 (np.ndarray) – Treated predictor vector, shape (P,), already standardized.
T (int) – Total number of time periods.
T0_total (int) – End of the full pre-treatment window (exclusive).
T0_train (int) – End of the training block within the pre-period (exclusive). Validation block is [T0_train, T0_total).
treated_unit_name (Any) – Label of the treated unit.
donor_names (Sequence) – Donor labels in column order of Y0 / X0.
predictor_names (Sequence) – Predictor labels in row order of X0 / X1.
time_labels (np.ndarray) – Time labels in row order of Y0.
Ywide (Any) – Wide outcome frame preserved for plotting.
outcome (str) – Outcome variable name.

property N: int#: Number of donor units.

property P: int#: Number of predictors.

T: int#

T0_total: int#

T0_train: int#

X0: ndarray#

X1: ndarray#

Y0: ndarray#

Y1: ndarray#

Ywide: Any#

donor_names: Sequence#

outcome: str#

predictor_names: Sequence#

time_labels: ndarray#

treated_unit_name: Any#
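The three half-open windows defined by T0_train, T0_total and T can be illustrated as follows. The 75/25 default comes from the overview; rounding the training length down is an assumption, and split_blocks_sketch is a hypothetical helper rather than part of the library.

```python
import numpy as np


def split_blocks_sketch(T, T0_total, train_frac=0.75):
    """Return (train, validation, post) slices over the time axis."""
    T0_train = int(np.floor(train_frac * T0_total))  # rounding rule assumed
    return slice(0, T0_train), slice(T0_train, T0_total), slice(T0_total, T)


# Proposition 99 style layout: 1970-2000 with 19 pre-treatment years.
years = np.arange(1970, 2001)
train, val, post = split_blocks_sketch(T=years.size, T0_total=19)
print(years[train][[0, -1]], years[val][[0, -1]], years[post][[0, -1]])
```

Under these assumptions the 19 pre-treatment years split into 14 training and 5 validation years, the split the overview attributes to the empirical Prop 99 application.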

class mlsynth.utils.sparse_sc_helpers.structures.SparseSCResults(*, effects: EffectsResults | None = None, fit_diagnostics: FitDiagnosticsResults | None = None, time_series: TimeSeriesResults | None = None, weights: WeightsResults | None = None, inference: InferenceResults | None = None, method_details: MethodDetailsResults | None = None, sub_method_results: Dict[str, Any] | None = None, additional_outputs: Dict[str, Any] | None = None, raw_results: Dict[str, Any] | None = None, execution_summary: Dict[str, Any] | None = None, plot_config: PlotConfig | None = None, inputs: SparseSCInputs, design: SparseSCDesign, inference_detail: SparseSCInference, predictor_weights: Dict[Any, float])#

Public SparseSC.fit() return container.

SparseSCResults is an EffectResult (the observational report). In addition to the SparseSC-specific fields below, it exposes the standardized sub-models (effects, time_series, weights, inference, fit_diagnostics, method_details) and the flat accessors att / counterfactual / gap / att_ci / pre_rmse / donor_weights.

Parameters:

inputs (SparseSCInputs) – Pre-processed panel + predictors.
design (SparseSCDesign) – Lambda-selection results, V and W weights.
inference_detail (SparseSCInference) – The raw placebo / conformal inference object (method / p_value / placebo_atts / pointwise_* / …) or method="none". (Renamed from inference; the standardized InferenceResults is mirrored into the inference slot so res.att_ci resolves.)
predictor_weights (Dict[Any, float]) – {predictor_name: v_p}.

Notes

The donor weights ({donor_name: w_j}) live in the standardized weights slot and are served by res.donor_weights; the predictor weights are also mirrored into weights.summary_stats.

design: SparseSCDesign#

inference_detail: SparseSCInference#

inputs: SparseSCInputs#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function BaseEstimatorResults.Config.<lambda>>}}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

predictor_weights: Dict[Any, float]#
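Because the L1 penalty sets dropped predictors to exactly zero, the selected subset can be read straight off the predictor_weights mapping. The sketch below uses a hand-written dictionary with hypothetical predictor names and values; selected_predictors is an illustrative helper, not a library function.

```python
def selected_predictors(predictor_weights, tol=0.0):
    """Keep predictors whose v_p weight is strictly above tol."""
    return {name: vp for name, vp in predictor_weights.items() if vp > tol}


# Hypothetical weights; the first predictor is the anchor with v_p = 1.
weights_demo = {"cigsale_1975": 1.0, "p_cig": 0.0, "loginc": 0.42, "pc_beer": 0.0}
kept = selected_predictors(weights_demo)
print(kept)
```

A small positive tol can be useful when the optimiser stops slightly above zero instead of exactly on the bound.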

Replication#

SparseSC is verified on the canonical Proposition 99 panel: handed an over-rich augmented predictor set, the L1 penalty prunes to 6 of 33 predictors and the effect lands at \(-17.9\) packs (95% conformal CI \([-21.3, -15.4]\)) on the Abadie-Diamond-Hainmueller donor pool. See the dedicated replication page for the full specification, code, and number-match.

Example#

The canonical empirical example is Vives’s augmented California Proposition 99 study. Load the reshaped long-form panel and run SparseSC with the original ADH-7 predictor set:

"""Run SparseSC on the long-form augmented California dataset."""

from __future__ import annotations

import pandas as pd

from mlsynth import SparseSC


# ---------------------------------------------------------------------
# Load long-form panel
# ---------------------------------------------------------------------

df = pd.read_csv(
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/augmented_cali_long.csv"
)

LAG_PERIODS = [1975, 1980, 1988]

COVARIATES = [
    "p_cig",
    "loginc",
    "pct15-24",
    "pc_beer",
]


# ---------------------------------------------------------------------
# SparseSC fit
# ---------------------------------------------------------------------

results = SparseSC(
    {
        "df": df,
        "outcome": "cigsale",
        "treat": "Proposition 99",
        "unitid": "state",
        "time": "year",
        "covariates": COVARIATES,
        "outcome_lag_periods": LAG_PERIODS,
        "display_graphs": True,
        "run_inference": False,
    }
).fit()

Enable inference (validation-block conformal is the default) and inspect the ATT CI:

results = SparseSC({
    "df": df,
    "outcome": "cigsale",
    "treat": "Proposition 99",
    "unitid": "state",
    "time": "year",
    "covariates": COVARIATES,
    "outcome_lag_periods": LAG_PERIODS,
    "alpha": 0.05,                       # CI level
    "run_inference": True,
    "display_graphs": False,
}).fit()

print(results.att)                       # post-period ATT
lo, hi = results.att_ci                  # 95% conformal CI (standardized)
print(lo, hi)
print(results.inference.p_value)         # H_0: ATT = 0 (standardized slot)
print(results.inference_detail.method)   # raw placebo/conformal object
print(results.design.opt_lambda)         # selected L1 penalty
print(results.predictor_weights)         # {predictor: v_p}
print(results.donor_weights)             # {donor: w_j} on the simplex

# Lambda sweep diagnostics: validation MSE along the lambda grid.
import matplotlib.pyplot as plt
plt.plot(results.design.lambda_grid, results.design.val_mse_curve)
plt.xscale("log")
plt.xlabel("lambda")
plt.ylabel("validation MSE")
plt.show()

References#

Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490):493-505.

Chernozhukov, V., Wuethrich, K., & Zhu, Y. (2021). “An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls.” Journal of the American Statistical Association 116(536):1849-1864.

Vives-i-Bastida, J. (2023). “Predictor Selection for Synthetic Controls.” arXiv:2203.11576v2.
