Relaxed / Penalized Synthetic Control (RESCM)


When to Use This Estimator#

The classic synthetic control method (SCM) of Abadie, Diamond and Hainmueller [ABADIE2010] builds a counterfactual as a convex combination of donor units – weights on the simplex, no intercept. That convex hull restriction is what makes SCM interpretable and robust, but it is also brittle when the donor pool is large relative to the pre-period: with \(N\) donors and only \(T_1\) pre-periods, the least-squares fit underlying SCM is under-determined once \(N \gtrsim T_1\), the weights become unstable, and many “equally good” solutions exist.

RESCM is a single convex program that nests a whole family of SCM estimators as corner cases, so you can dial the donor-pool regularization continuously from classic SCM all the way to difference-in-differences (equal weights), and pick the corner that suits your data. Two branches are exposed, each with the estimation/inference theory of its own paper:

  • Penalized branch – \(\min \tfrac{1}{2}\|y_0 - \mu - Y\omega\|_2^2 + P(\omega)\) with \(P\) an \(\ell_1\) / \(\ell_2\) / \(\ell_\infty\) (or mixed) penalty. The \(\ell_\infty\) member is the L-infinity-norm SCM of Wang, Xing and Ye [LinfSC], which spreads weight across donors (capping the largest weight) rather than concentrating it; classic Abadie SCM is the no-penalty (\(\lambda = 0\)) simplex corner, and equal-weights/DiD is the heavy-\(\ell_\infty\) limit [DoudchenkoImbens2017].

  • Relaxation branch – the SCM-relaxation of Liao, Shi and Zheng [RelaxSC]: keep the simplex \(\omega \in \Delta_N\), but relax the exact balance first-order condition to an \(\ell_\infty\) tolerance \(\eta\), then among all weights satisfying the relaxed condition pick the one minimizing an information-theoretic divergence \(D(\omega)\) (squared \(\ell_2\), entropy, or empirical likelihood). The divergence picks a unique, stable weight (e.g. closest-to-uniform), and under a latent group structure recovers equal-within-group weights.

Use RESCM when you have a single treated unit and a large donor pool and want either (i) a dense, stable counterfactual that does not hinge on a handful of donors (LINF / RELAX_*), or (ii) a one-stop interface to compare classic SCM against its penalized and relaxed cousins on the same panel. Pick estimators by name through methods.

Notation#

We use the synthetic-control canon. Unit \(j=0\) is treated and \(\mathcal{N} = \{1, \ldots, N\}\) indexes the donors; \(\mathbf{y}_0\) is the treated outcome and \(\mathbf{Y} = (y_{jt})\) the \(T \times N\) donor matrix. The intervention occurs after the pre-period \(\mathcal{T}_1 = \{1, \ldots, T_1\}\); the post-period is \(\mathcal{T}_2 = \{T_1+1, \ldots, T\}\) with \(T_2 = |\mathcal{T}_2|\). Donor weights \(\boldsymbol{\omega} = (\omega_1, \ldots, \omega_N)'\) live on the simplex \(\Delta_N = \{\boldsymbol{\omega} : \omega_j \ge 0,\, \sum_j \omega_j = 1\}\) (relaxation branch) or are penalized (penalized branch); \(\mu\) is an optional intercept. The counterfactual is \(\hat{y}_{0t}^0 = \mu + \mathbf{Y}_{t\cdot}\,\boldsymbol{\omega}\), the effect \(\hat{\Delta}_t = y_{0t} - \hat{y}_{0t}^0\), and the ATE \(\bar{\Delta} = T_2^{-1}\sum_{t\in\mathcal{T}_2}\hat{\Delta}_t\). \(\|\mathbf{A}\|_\infty = \max_{ij}|a_{ij}|\). With pre-period donor Gram matrix \(\hat{\boldsymbol{\Sigma}} = T_1^{-1}\sum_{t\in\mathcal{T}_1} \mathbf{Y}_{t\cdot}'\mathbf{Y}_{t\cdot}\) and cross-moment \(\hat{\boldsymbol{\Upsilon}} = T_1^{-1}\sum_{t\in\mathcal{T}_1} \mathbf{Y}_{t\cdot}' y_{0t}\), the SCM least-squares first-order condition is \(\hat{\boldsymbol{\Sigma}}\boldsymbol{\omega} = \hat{\boldsymbol{\Upsilon}}\).
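The notation above maps directly onto a few array operations. The following is a minimal NumPy sketch on simulated data, not mlsynth code: the panel sizes, the equal-weight \(\boldsymbol{\omega}\), the noise level, and the `effect` value are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T1, T2, N = 20, 8, 5                  # pre-periods, post-periods, donors (toy sizes)
T = T1 + T2

Y = rng.normal(size=(T, N))           # T x N donor matrix
omega = np.full(N, 1.0 / N)           # a point on the simplex (equal weights, for illustration)
mu = 0.0                              # optional intercept
effect = -2.0                         # hypothetical treatment effect used to simulate y0

# Simulated treated outcome: donor combination plus noise, shifted after T1.
y0 = mu + Y @ omega + 0.1 * rng.normal(size=T)
y0[T1:] += effect

pre, post = slice(0, T1), slice(T1, T)

Sigma_hat = Y[pre].T @ Y[pre] / T1    # N x N pre-period donor Gram matrix
Upsilon_hat = Y[pre].T @ y0[pre] / T1 # N-vector cross-moment with the treated unit

# Sup-norm violation of the first-order condition Sigma_hat @ omega = Upsilon_hat.
foc_gap = np.max(np.abs(Sigma_hat @ omega - Upsilon_hat))

y0_hat = mu + Y @ omega               # counterfactual path
gap = y0 - y0_hat                     # per-period effect
ate = gap[post].mean()                # ATE over the post-period

print(f"FOC gap {foc_gap:.4f}, ATE {ate:.3f}")
```

The quantity `foc_gap` is the same sup-norm that the relaxation branch bounds by \(\eta\).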

The unified convex program#

Every RESCM corner case is a special case of one program. The penalized form fits the in-sample loss with a regularizer,

\[\min_{\mu,\,\boldsymbol{\omega}\in\mathcal{C}} \; \tfrac{1}{2}\sum_{t\in\mathcal{T}_1} \bigl(y_{0t} - \mu - \mathbf{Y}_{t\cdot}\boldsymbol{\omega}\bigr)^2 + \lambda\Bigl[\alpha\|\boldsymbol{\omega}\|_1 + (1-\alpha)\,Q(\boldsymbol{\omega})\Bigr], \qquad Q \in \{\tfrac{1}{2}\|\cdot\|_2^2,\; \|\cdot\|_\infty\},\]

with constraint set \(\mathcal{C}\) (simplex by default), mixing \(\alpha\in[0,1]\), and strength \(\lambda\) chosen by K-fold cross-validation. The relaxation form instead minimizes a divergence subject to a relaxed balance condition,

\[\min_{\boldsymbol{\omega}\in\Delta_N} \; D(\boldsymbol{\omega}) \quad\text{s.t.}\quad \bigl\|\hat{\boldsymbol{\Sigma}}\boldsymbol{\omega} - \hat{\boldsymbol{\Upsilon}}\bigr\|_\infty \le \eta, \qquad D \in \Bigl\{\tfrac{1}{2}\|\boldsymbol{\omega}\|_2^2,\; \textstyle\sum_j \omega_j\log\omega_j,\; -\textstyle\sum_j \log\omega_j\Bigr\},\]

where the three divergences are squared \(\ell_2\), entropy, and empirical likelihood respectively, and \(\eta\) is selected by validation. Classic SCM is recovered at \(\lambda=0\) (penalized, simplex) or \(\eta = 0\) (relaxation); the equal-weights/DiD estimator is the \(\lambda\to\infty\) \(\ell_\infty\) limit or large-\(\eta\) limit.
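To see the two ends of the penalized dial, here is a small projected-gradient sketch of the simplex-constrained program. It is not the mlsynth engine (which calls a convex solver): it assumes no intercept (\(\mu = 0\)), uses only the smooth \(\ell_2\) member of \(Q\) with \(\alpha = 0\), and the function names are hypothetical. On the simplex a heavy \(\ell_2\) penalty also drives the weights to the uniform vector, which makes it a convenient smooth stand-in for the equal-weights limit.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sorted descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index with a positive gap
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def penalized_sc_ridge(y0_pre, Y_pre, lam, n_iter=5000):
    """Projected gradient on 0.5*||y0 - Y w||^2 + lam*0.5*||w||^2 over the simplex."""
    n_donors = Y_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    lipschitz = np.linalg.norm(Y_pre, 2) ** 2 + lam   # gradient Lipschitz constant
    for _ in range(n_iter):
        grad = Y_pre.T @ (Y_pre @ w - y0_pre) + lam * w
        w = project_simplex(w - grad / lipschitz)
    return w

rng = np.random.default_rng(0)
T1, N = 15, 40                                # more donors than pre-periods
Y_pre = rng.normal(size=(T1, N))
w_true = np.zeros(N)
w_true[:3] = [0.5, 0.3, 0.2]                  # illustrative sparse truth
y0_pre = Y_pre @ w_true + 0.1 * rng.normal(size=T1)

w_sc = penalized_sc_ridge(y0_pre, Y_pre, lam=0.0)     # classic SC corner
w_dense = penalized_sc_ridge(y0_pre, Y_pre, lam=1e6)  # heavy-penalty limit

print("largest weight, lam=0  :", w_sc.max())
print("largest weight, lam=1e6:", w_dense.max(), "(uniform is", 1.0 / N, ")")
```

The step size \(1/L\) uses the largest singular value of the pre-period donor matrix, so the iteration needs no tuning; a production implementation would add a convergence check instead of a fixed iteration count.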

Penalized branch: L-infinity SCM (Wang, Xing & Ye)#

Idea. Classic SCM tends to load heavily on one or two donors. When the true data-generating process spreads signal across many donors, that concentration is fragile – a single idiosyncratic donor shock contaminates the counterfactual. The \(\ell_\infty\) penalty \(\|\boldsymbol{\omega}\|_\infty = \max_j |\omega_j|\) directly caps the largest weight, so minimizing it (under the simplex) pushes the solution toward spreading weight across donors – in the limit, equal weights. The mixed \(\ell_1 + \ell_\infty\) penalty (L1LINF) trades sparsity against spreading. Both shrink toward a stable, dense counterfactual that tolerates \(N > T_1\).
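The fragility argument can be made concrete: a shock of size \(s\) to donor \(k\) in one period moves the counterfactual by \(\omega_k s\), so the worst case over donors is \(\|\boldsymbol{\omega}\|_\infty\,|s|\). The sketch below checks this by brute force on toy data; the two weight vectors and the shock size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10
donors_t = rng.normal(size=N)          # donor outcomes in a single period

w_concentrated = np.zeros(N)
w_concentrated[:2] = [0.7, 0.3]        # classic-SCM-style concentrated weights
w_uniform = np.full(N, 1.0 / N)        # equal weights, the heavy-penalty limit

def worst_case_shift(w, shock):
    """Largest change in the counterfactual when one donor is shocked."""
    shifts = []
    for k in range(w.size):
        shocked = donors_t.copy()
        shocked[k] += shock            # idiosyncratic shock to donor k only
        shifts.append(abs(shocked @ w - donors_t @ w))
    return max(shifts)

shock = 5.0
shift_concentrated = worst_case_shift(w_concentrated, shock)
shift_uniform = worst_case_shift(w_uniform, shock)

print("concentrated:", shift_concentrated, " bound:", np.max(w_concentrated) * shock)
print("uniform     :", shift_uniform, " bound:", np.max(w_uniform) * shock)
```

Because weights on the simplex sum to one, the largest weight can never fall below \(1/N\), so equal weights give the smallest possible worst-case exposure.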

Assumptions (Wang, Xing & Ye [LinfSC]).

Assumption 1 (factor model). The outcomes follow a linear factor model \(y_{jt} = \boldsymbol{\lambda}_j'\mathbf{f}_t + u_{jt}\) with the treated unit’s loading in (or near) the convex hull of the donor loadings, so an \(\ell_\infty\)-regularized convex combination approximates the treated factor structure.

Assumption 2 (weak dependence). The idiosyncratic errors \(u_{jt}\) are mean-zero and weakly dependent over time (strong-mixing with summable autocovariances), permitting a HAC long-run variance and a sequential CLT for the post-period mean.

Assumption 3 (regularization rate). The penalty strength \(\lambda\) (and mixing \(\alpha\)) are chosen so the weight estimate is consistent for its population target; in practice both are selected by K-fold cross-validation on the pre-period.

Remark. The \(\ell_\infty\) cap is what buys stability in high dimensions: by refusing to let any single weight dominate, the estimator’s prediction variance stays controlled even when \(N \gg T_1\), where the unconstrained least-squares (and concentrated classic SCM) solutions degrade. This is the synthetic-control analogue of the L2-relaxation argument for dense coefficients.

Inference. Wang, Xing and Ye extend the synthetic-control ATE inference of Li [LiSCM2020] to the dense, weakly dependent setting. Treating the estimated weights as fixed, the post-period gaps \(\hat{\Delta}_t\) average to \(\bar{\Delta}\), and under the null hypothesis of a zero ATE

\[\hat{Z} = \frac{\bar{\Delta}} {\sqrt{\hat{\rho}^2_{(1)}/T_1 + \hat{\rho}^2_{(2)}/T_2}} \xrightarrow{d} N(0,1),\]

with \(\hat{\rho}^2_{(1)}\) the HAC long-run variance of the pre-period prediction residuals (first-stage weight-estimation uncertainty) and \(\hat{\rho}^2_{(2)}\) that of the de-meaned post-period effects. This is the two-term variance RESCM uses for all corner cases (see Inference below).

When to use. Dense, factor-driven donor structure; high dimension (\(N>T_1\) permitted); when you want a counterfactual robust to any single donor’s idiosyncrasies rather than a sparse, concentrated fit.

Relaxation branch: SCM-relaxation (Liao, Shi & Zheng)#

Idea. Classic SCM solves a least-squares problem whose first-order condition is \(\hat{\boldsymbol{\Sigma}}\boldsymbol{\omega} = \hat{\boldsymbol{\Upsilon}}\) – exact balance of the donor moments. When \(N\) is large this condition is ill-posed: it is satisfied (or nearly so) by a continuum of weights, so the one the solver returns is arbitrary and unstable. SCM-relaxation keeps the simplex but relaxes the balance condition to an \(\ell_\infty\) tolerance \(\eta\), then selects, among all admissible weights, the one minimizing a divergence \(D(\boldsymbol{\omega})\):

\[\hat{\boldsymbol{\omega}} = \operatorname*{argmin}_{\boldsymbol{\omega}\in\Delta_N} D(\boldsymbol{\omega}) \quad\text{s.t.}\quad \bigl\|\hat{\boldsymbol{\Sigma}}\boldsymbol{\omega} - \hat{\boldsymbol{\Upsilon}}\bigr\|_\infty \le \eta.\]

The divergence breaks the tie deterministically: \(\ell_2\) (RELAX_L2) picks the minimum-norm weight, entropy (RELAX_ENTROPY) the maximum-entropy / closest-to-uniform weight, and empirical likelihood (RELAX_EL) the EL-optimal weight. \(\eta=0\) reduces to (a divergence-selected) classic SCM; large \(\eta\) admits the whole simplex and the divergence alone determines the weight (entropy/EL \(\to\) equal weights).
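The two ingredients of the relaxation program, the divergence \(D(\boldsymbol{\omega})\) and the sup-norm balance gap, are easy to evaluate for any candidate weight. The sketch below is illustrative only (the helper names are hypothetical and the data are simulated); it does not solve the program, it just shows the large-\(\eta\) limit numerically by comparing the uniform weight against random interior points of the simplex.

```python
import numpy as np

def divergences(w):
    """The three divergences D(w); w must be strictly positive."""
    return {
        "l2": 0.5 * np.sum(w ** 2),
        "entropy": np.sum(w * np.log(w)),   # negative Shannon entropy
        "el": -np.sum(np.log(w)),           # empirical likelihood
    }

def min_feasible_eta(w, Sigma_hat, Upsilon_hat):
    """Smallest eta for which w satisfies ||Sigma_hat w - Upsilon_hat||_inf <= eta."""
    return np.max(np.abs(Sigma_hat @ w - Upsilon_hat))

rng = np.random.default_rng(1)
T1, N = 30, 6
Y_pre = rng.normal(size=(T1, N))
y0_pre = Y_pre @ np.full(N, 1.0 / N) + 0.2 * rng.normal(size=T1)
Sigma_hat = Y_pre.T @ Y_pre / T1
Upsilon_hat = Y_pre.T @ y0_pre / T1

uniform = np.full(N, 1.0 / N)
draws = rng.dirichlet(np.ones(N), size=200)   # random points inside the simplex

d_uniform = divergences(uniform)
uniform_is_best = all(
    d_uniform[name] <= divergences(w)[name] + 1e-12
    for w in draws
    for name in d_uniform
)
eta_uniform = min_feasible_eta(uniform, Sigma_hat, Upsilon_hat)

print("uniform minimizes all three divergences over the draws:", uniform_is_best)
print("uniform weight is feasible for any eta >=", eta_uniform)
```

Once \(\eta\) exceeds `eta_uniform`, the uniform weight is feasible, and since it minimizes each divergence over the simplex it is also the solution; for smaller \(\eta\) the balance constraint binds and pulls the solution away from uniform.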

Assumptions (Liao, Shi & Zheng [RelaxSC]).

Assumption 1 (approximate factor model with group structure). Units load on common factors and fall into latent groups; the treated unit’s loading is spanned by the donor loadings up to an approximation error that the relaxation tolerance \(\eta\) absorbs.

Assumption 2 (identifiable divergence minimizer). The divergence \(D\) is strictly convex on the simplex, so the relaxed feasible set has a unique minimizer. This restores well-posedness in the regime where the exact balance condition alone does not pin down a unique weight.

Assumption 3 (weak dependence and moments). The errors are weakly dependent with bounded moments, so sample moments \(\hat{\boldsymbol{\Sigma}}, \hat{\boldsymbol{\Upsilon}}\) converge and a post-period CLT applies.

Remark. The key result (oracle prediction) is that the relaxed weight predicts the treated counterfactual as well as the infeasible oracle that knows the factor structure, and under the latent group structure the divergence-selected weight is equal within groups – a transparent, interpretable solution that the arbitrary classic-SCM tie-break does not deliver.

Inference. Liao, Shi and Zheng’s main theory concerns prediction consistency (the oracle property). For an ATE confidence interval mlsynth applies the same weak-dependence two-term HAC test as the penalized branch (Li [LiSCM2020]), treating the relaxed weights as the fixed first stage. See the caveat under Verification.

The named corner cases#

methods selects estimators by name; each resolves to one exact call of the convex engine.

| Name | Branch | Estimator |
| --- | --- | --- |
| SC | penalized | Classic Abadie simplex SCM (\(\lambda=0\)). |
| LASSO | penalized | \(\ell_1\) penalty; sparse donor weights. |
| RIDGE | penalized | \(\ell_2\) penalty; dense shrunken weights. |
| ENET | penalized | Elastic net (\(\ell_1 + \ell_2\)); \(\alpha\) by CV. |
| LINF | penalized | L-infinity-norm SCM [LinfSC]; spreads weight, nests DiD. |
| L1LINF | penalized | Mixed \(\ell_1 + \ell_\infty\) penalty. |
| RELAX_L2 | relaxation | SCM-relaxation, \(\ell_2\) divergence [RelaxSC]. |
| RELAX_ENTROPY | relaxation | SCM-relaxation, entropy divergence. |
| RELAX_EL | relaxation | SCM-relaxation, empirical-likelihood divergence. |

Shared assumptions across the RESCM class#

The penalized and relaxation branches differ in how they regularize the weights, but they share the same identifying stack. The shared structural conditions, consolidated from Wang-Xing-Ye (\(L_\infty\) SCM) and Liao-Shi-Zheng (SCM-relaxation):

A1 (Linear factor model for untreated outcomes). Each unit’s untreated outcome obeys

\[y_{jt}^0 \;=\; \boldsymbol\lambda_j' \mathbf f_t \;+\; u_{jt}, \qquad j \in \{0\} \cup \mathcal N, \;\; t \in \mathcal T_1 \cup \mathcal T_2,\]

with \(\mathbf f_t\) an \(r\)-vector of latent common factors, \(\boldsymbol\lambda_j\) a unit-specific loading, and \(u_{jt}\) a mean-zero idiosyncratic shock orthogonal to the factors. Both branches lean on this factor structure: it is what makes the treated unit’s untreated outcome a (dense) linear combination of the donors’ outcomes plus an orthogonal error.

A2 (Single treated unit, sharp absorbing treatment). Unit \(j = 0\) is the only treated unit; treatment turns on at \(T_1 + 1\) and stays on. Donors are untreated throughout (no interference). Both papers’ main theorems are stated for the single-treated case; multi-treated extensions exist, and for staggered designs the mlsynth alternatives are FECT and Synthetic Difference-in-Differences (SDID).

A3 (Weak temporal dependence). The errors \(u_{jt}\) are mean-zero, weakly dependent (\(\alpha\)-mixing or \(\rho\)-mixing with summable autocovariances), and have bounded moments. This is what licenses the HAC long-run variance estimator and the sequential CLT for the post-period mean. The Wang-Xing-Ye theory explicitly accommodates stationary, trend-stationary, and unit-root non-stationary cases under exponential decay in correlation rates; the Liao-Shi-Zheng theory uses an \(\alpha\)-mixing CLT.

A4 (High-dimensional sample-size regime). \(T_1 \to \infty\) and \(N\) may grow (potentially \(N \gg T_1\)), which is the entire point of regularizing toward a dense solution. The classical-SCM least-squares first-order condition \(\hat{\boldsymbol\Sigma} \boldsymbol\omega = \hat{\boldsymbol\Upsilon}\) is under-determined once \(N \gtrsim T_1\); both branches restore well-posedness, but do so under the same growth regime.

A5 (Treated unit (approximately) in the convex hull of donor loadings). The treated loading \(\boldsymbol\lambda_0\) is spanned by the donor loadings up to an approximation error that the regularization tolerance (\(\lambda\) for penalized, \(\eta\) for relaxation) absorbs. Remark. If \(\boldsymbol\lambda_0\) is structurally outside the donor hull – even after relaxation – both branches fail. Use Imperfect Synthetic Controls (ISCM).

A6 (Branch-specific regularization rate).

  • For the penalized branch (Wang-Xing-Ye): the penalty strength \(\lambda\) and (when applicable) the mixing \(\alpha\) are chosen so the weight estimator is consistent for its population target. In mlsynth both are selected by K-fold cross-validation on the pre-period.

  • For the relaxation branch (Liao-Shi-Zheng): the tolerance \(\eta\) (selected by validation) shrinks at a rate that makes the relaxed first-order condition asymptotically equivalent to the oracle moment condition; Theorem 1 of the paper gives the precise rate.

A7 (Latent group structure – relaxation branch only). The donor loadings \(\boldsymbol\lambda_j\) fall into \(K\) latent groups; both \(K\) and the factor count \(r\) may diverge, regardless of their relative order. Under this structure the relaxation branch’s divergence-minimizing weight asymptotically recovers equal weights within groups – Liao-Shi-Zheng’s interpretable “transparent solution” that classical SCM’s arbitrary tie-break does not deliver. Remark. Without latent groups the relaxation still produces a valid counterfactual; the within-group-equal-weight interpretation is the bonus that group structure buys.

A8 (Strictly convex divergence – relaxation branch only). The divergence \(D\) is strictly convex on the simplex (squared \(\ell_2\), entropy, empirical likelihood, or any GEL-class function with non-negative support and restricted strong convexity). Strict convexity is what makes the relaxed feasible set have a unique minimizer – this is the well-posedness fix that distinguishes RESCM-relaxation from classical SCM in the under-determined regime.
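The under-determination that A4 describes and A8 repairs can be checked numerically. The sketch below uses simulated data and, for simplicity, ignores the simplex constraint: it shows that with more donors than pre-periods the balance condition \(\hat{\boldsymbol\Sigma}\boldsymbol\omega = \hat{\boldsymbol\Upsilon}\) has many exact solutions, which is why a strictly convex tie-break is needed.

```python
import numpy as np

rng = np.random.default_rng(2)
T1, N = 12, 30                               # fewer pre-periods than donors
Y_pre = rng.normal(size=(T1, N))
y0_pre = rng.normal(size=T1)

Sigma_hat = Y_pre.T @ Y_pre / T1             # N x N but built from only T1 rows
Upsilon_hat = Y_pre.T @ y0_pre / T1

# Sigma_hat has the same rank as Y_pre, which is at most T1 < N.
rank = np.linalg.matrix_rank(Y_pre)

# Minimum-norm exact solution of Y_pre w = y0_pre (hence of the balance condition).
w_min_norm = np.linalg.lstsq(Y_pre, y0_pre, rcond=None)[0]

# Any direction in the null space of Y_pre can be added without changing the fit.
_, _, Vt = np.linalg.svd(Y_pre)              # full SVD: Vt is N x N
null_dir = Vt[-1]                            # orthogonal to every row of Y_pre
w_other = w_min_norm + 3.0 * null_dir

for name, w in [("min-norm", w_min_norm), ("shifted", w_other)]:
    gap = np.max(np.abs(Sigma_hat @ w - Upsilon_hat))
    print(f"{name:9s} balance gap {gap:.2e}  norm {np.linalg.norm(w):.3f}")
```

Both weight vectors balance the moments to numerical precision yet differ substantially, so the data alone cannot choose between them.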

  1. Latent factor model (A1). Both branches lean on the factor structure. If the panel is not well-described by a small number of common factors, the linear-combination representation has a non-vanishing residual the regularization cannot remove.

    Plausibly violated when donors are largely idiosyncratic (unrelated processes), or when a donor has a one-time structural break that no factor model absorbs. Diagnostic: SVD of the pre-period donor matrix should show a clear factor cutoff (top few singular values carrying most of the energy). A flat-tailed spectrum flags A1 failure; in that regime, switch to balancing-aware estimators (MicroSynth (User-Level Balancing SC) for unit-level data) or stay with canonical SCM (which does not lean as hard on A1).

  2. Weak temporal dependence (A3). The HAC long-run variance estimator needs mixing with at-least-summable autocovariances. Strong serial correlation, long-memory series, or non-stationarity break this.

    Plausibly violated when outcomes are price levels or cumulative quantities. Diagnostic: ADF/KPSS on the pre-period residuals; non-stationarity flags A3 failure. For unit-root outcomes the Wang-Xing-Ye theory still goes through under exponential correlation decay, but the finite-sample HAC variance is often optimistic. First-difference before fitting, or use Synthetic Business Cycle (SBC) (stationary-cycle estimator).

  3. Sample-size regime (A4). Both branches’ asymptotics require \(T_1\) large. The Verification section’s Monte Carlo documents what happens at \(T_1 < N\): the point estimate is unbiased and the test is powerful, but the test built on the analytic two-term variance over-rejects (the normal approximation has not kicked in).

    Plausibly violated when \(T_1 \le N\) or \(T_1\) is below roughly 50. Diagnostic: read res.pre_rmse against the donor noise floor (the mean leave-one-out pre-RMSE across donors); if treated pre-RMSE is much smaller, the estimator is over-fitting and \(\hat\rho^2_{(1)}\) understates uncertainty. Report a placebo / conformal CI (e.g. via canonical SCM with permutation inference) alongside the analytic CI.

  4. Treated unit in donor convex hull (A5). Both branches keep the simplex (or shrink toward the simplex). If the treated unit’s loading is structurally outside the donor hull, the regularization tolerance \(\eta\) / \(\lambda\) cannot bridge it.

    Plausibly violated when the treated unit is qualitatively different from every donor (a coastal mega-state against only interior states; a tech-led economy against commodity-led donors). Diagnostic: read res.pre_rmse on the un-regularized SC corner and on the most-regularized variant in your run. If both stay high, the hull condition is failing. Switch to Imperfect Synthetic Controls (ISCM) (which identifies the effect even when the treated unit is outside the hull) or Nonlinear Synthetic Control (NSC) (which drops the simplex to allow negative-weight extrapolation).

  5. Regularization rate (A6) – choosing \(\lambda\) / \(\eta\). Cross-validation can be noisy on a short pre-period; the selected hyperparameter is then driven by noise and the implied counterfactual flips with the seed.

    Plausibly violated when \(T_1 \le 20\). Diagnostic: re-run with a different number of CV folds (n_splits in RESCMConfig) or different train/validation splits; if the selected \(\lambda\) / \(\eta\) and the implied ATT move substantially, the CV is not informative. Either fix \(\lambda\) at a domain-informed value via the explicit knobs, or fall back to canonical SCM / Two-Step Synthetic Control which do not need CV at all.

  6. Latent group structure (A7 – relaxation only). The within-group-equal-weights interpretation requires that donors actually fall into groups on their loadings.

    Plausibly violated when donor loadings are continuously distributed without natural clusters. Diagnostic: inspect the empirical CDF of res.fits["RELAX_ENTROPY"].donor_weights; a multi-modal CDF supports the group structure, a smooth CDF means the group story is rhetorical rather than real. If groups are not present, prefer RELAX_L2 (which targets minimum-norm weights, no group assumption needed).

  7. Choice of divergence (A8 – relaxation only). The three divergences encode different priors: \(\ell_2\) gives minimum-norm weights (the defensive default), entropy gives maximum-entropy (closest-to-uniform) weights, and empirical likelihood gives GEL-optimal weights. Each is valid; the choice is a modeling decision.

    Practical rule of thumb: RELAX_L2 for a defensible default; RELAX_ENTROPY when the policy story is “the counterfactual should be close to a uniform average of donors”; RELAX_EL when the inferential framework is explicitly GEL-based and you want the maximum-likelihood interpretation. The three rarely disagree by more than Monte-Carlo noise on well-behaved panels.
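The SVD diagnostic from item 1 can be sketched in a few lines. This is an illustrative helper, not part of mlsynth: the function name is hypothetical, column-centring the donor matrix before the SVD is an assumption, and the two simulated panels (a two-factor panel and pure noise) are there only to show what a clear cutoff and a flat spectrum look like.

```python
import numpy as np

def singular_energy(Y_pre):
    """Cumulative share of squared singular values of the column-centred donor matrix."""
    centred = Y_pre - Y_pre.mean(axis=0)
    s = np.linalg.svd(centred, compute_uv=False)
    return np.cumsum(s ** 2) / np.sum(s ** 2)

rng = np.random.default_rng(3)
T1, N, r = 40, 25, 2

# Panel driven by r common factors plus small idiosyncratic noise.
factors = rng.normal(size=(T1, r))
loadings = rng.normal(size=(N, r))
Y_factor = factors @ loadings.T + 0.1 * rng.normal(size=(T1, N))

# Panel of unrelated donors: no factor structure at all.
Y_noise = rng.normal(size=(T1, N))

energy_factor = singular_energy(Y_factor)
energy_noise = singular_energy(Y_noise)

print("factor panel, energy in top 2 components:", round(energy_factor[r - 1], 3))
print("noise panel,  energy in top 2 components:", round(energy_noise[r - 1], 3))
```

In the factor panel nearly all the energy sits in the first two components; in the noise panel the spectrum is flat, which is the pattern that flags an A1 failure.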

Reach for RESCM when:

  • You have a single treated unit and a large donor pool (\(N\) comparable to or exceeding \(T_1\)), and a factor-driven panel where many donors plausibly carry signal. Classical SCM concentrates weight on a few donors; the dense-shrinkage variants in RESCM diversify prediction risk across the pool.

  • You want a stable, dense counterfactual that does not hinge on one or two donors – exactly the “denser weighting philosophy” both papers advocate. The \(\ell_\infty\) penalty caps the largest weight; the relaxation branch spreads weight by minimizing a strictly-convex divergence on the simplex.

  • You want a one-stop interface to compare classical SCM, Lasso-SC, Ridge-SC, elastic-net SCM, \(L_\infty\)-SCM, and three relaxation variants on the same panel. The methods argument selects estimators by name; each maps to one call of the same convex engine.

  • You want HAC-based, classical-statistics inference (Li 2020 two-term standard errors) on the ATE rather than a permutation or conformal procedure. Note the finite-sample-inference caveat below.

Do not use RESCM when:

  • You need sparse, hand-interpretable weights as the headline deliverable. Both branches actively shrink toward dense solutions; that is the design choice. If the policy story has to be “California ≈ Utah + Montana + Nevada”, run canonical SCM / Two-Step Synthetic Control, or Forward-Selected Synthetic Control (FSCM) (forward-selected SC with the simplex retained) for a sparse-by-construction donor set.

  • The treated unit is structurally outside the donor hull. Both branches keep (or shrink toward) the simplex. Pre-RMSE stays high at every regularization level. Use Imperfect Synthetic Controls (ISCM) (identifies the effect through donors that use the treated unit as a positive-weight donor in their own synthetic controls) or Nonlinear Synthetic Control (NSC) (drops the simplex restriction).

  • Very short pre-period \((T_1 < N)\). The Path-B Monte Carlo in the Verification section documents the over-rejection: the normal approximation behind the analytic two-term variance has not kicked in. Either lengthen the pre-period (for example by moving to a finer time grid), prune donors, or report a placebo / conformal CI alongside.

  • Non-stationary or unit-root outcomes. A3’s mixing assumption strains here; the Wang-Xing-Ye theory accommodates unit-root cases under exponential correlation decay, but the HAC variance is often optimistic in finite samples. First-difference, or use Synthetic Business Cycle (SBC).

  • You have multiple treated units / staggered adoption. RESCM’s theory is built for a single treated unit. Use FECT or Synthetic Difference-in-Differences (SDID) for staggered designs.

  • Spillovers across donors. A2’s no-interference clause fails; the factor model’s orthogonality breaks. Use Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD).

  • Continuous or multi-valued treatment. RESCM encodes a single binary intervention; continuous dose belongs in Continuous-Treatment Synthetic Control (CTSC).

  • Distributional questions (Lorenz curves, QTEs, tail effects). RESCM targets the mean ATE through a Gaussian-likelihood linear projection. Use Distributional Synthetic Control (DSC) for distributional effects.

  • You need Bayesian posterior credible bands. RESCM returns frequentist HAC-based CIs and a single point estimate per corner. For posterior inclusion probabilities and credible bands on donor weights, use Bayesian Synthetic Control with a Soft Simplex Constraint (BVS-SS) (spike-and-slab with a soft simplex – the natural Bayesian analogue of the dense-vs-sparse trade-off that the RESCM family addresses from the frequentist side).

  • You want predictor-level (covariate + lagged-outcome) matching rather than outcome-only matching. RESCM’s workhorse projection is on donor outcomes; for predictor-matching with L1 sparsity on the predictor-weight matrix, use Sparse Synthetic Control (SparseSC).

  • You want the factor model itself (the loadings and factors) estimated and reported. RESCM is agnostic to factor estimation – the factor model only motivates the linear projection and never enters the estimator. For factor-aware estimators that surface \(\hat F\) and \(\hat\Lambda\), use Factor Model Approach (FMA) or Cluster Synthetic Controls (CLUSTERSC) (which exposes the HSVT rank in its results).

  • Donor selection is the bottleneck, not weight shrinkage. If you have a small number of donors and want to select the best subset rather than spread weight across a wide pool, use Forward-Selected Synthetic Control (FSCM) (forward selection on donor units) or Panel Data Approach (PDA) with methods=["fs"] (forward-selected PDA with sample-splitting inference).

Inference#

Once the donor weights are fixed by any corner case, the post-period gap \(\hat{\Delta}_t = y_{0t} - \hat{y}_{0t}^0\) is a scalar series whose mean is the ATE. RESCM reports the Li [LiSCM2020] two-term long-run variance used by the L-infinity paper,

\[\widehat{\mathrm{se}}(\bar{\Delta}) = \sqrt{\hat{\rho}^2_{(1)}/T_1 + \hat{\rho}^2_{(2)}/T_2},\]

where \(\hat{\rho}^2_{(1)}\) is the HAC long-run variance of the pre-period prediction residuals – carrying the first-stage weight-estimation uncertainty – and \(\hat{\rho}^2_{(2)}\) is the HAC long-run variance of the de-meaned post-period effects (Bartlett kernel, \(\lfloor T_2^{1/4}\rfloor\) lag). The pre-period term is essential for dense penalized/relaxed weights: unlike forward-selection PDA – whose sample splitting makes pre/post asymptotically independent and lets a post-only variance suffice – the dense estimators reuse the entire pre-window to fit the weights, so dropping \(\hat{\rho}^2_{(1)}\) understates the standard error.
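The two-term standard error can be sketched directly from the formula. This is a minimal illustration, not the mlsynth implementation: the function names are hypothetical, both series are de-meaned before the autocovariances are computed, and the \(\lfloor T^{1/4}\rfloor\) lag rule that the text gives for the post-period is assumed for the pre-period as well.

```python
import math
import numpy as np

def bartlett_lrv(x, lag=None):
    """Bartlett-kernel (Newey-West) long-run variance of a de-meaned series."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if lag is None:
        lag = int(math.floor(n ** 0.25))      # assumed lag rule: floor(T^(1/4))
    x = x - x.mean()
    lrv = np.dot(x, x) / n                    # lag-0 autocovariance
    for k in range(1, lag + 1):
        gamma_k = np.dot(x[k:], x[:-k]) / n   # lag-k autocovariance
        lrv += 2.0 * (1.0 - k / (lag + 1.0)) * gamma_k
    return lrv

def two_term_test(pre_resid, post_gap):
    """ATE, two-term HAC standard error, z statistic and two-sided normal p-value."""
    T1, T2 = len(pre_resid), len(post_gap)
    ate = float(np.mean(post_gap))
    se = math.sqrt(bartlett_lrv(pre_resid) / T1 + bartlett_lrv(post_gap) / T2)
    z = ate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return ate, se, z, p_value

rng = np.random.default_rng(5)
pre_resid = rng.normal(size=36)               # simulated pre-period prediction residuals
post_gap = -2.0 + rng.normal(size=36)         # simulated post-period effects around -2

ate, se, z, p_value = two_term_test(pre_resid, post_gap)
print(f"ATE {ate:.3f}  SE {se:.3f}  z {z:.2f}  p {p_value:.4f}")
```

Dropping the first term inside the square root gives the post-only standard error, which is always weakly smaller; that is the understatement the paragraph above warns about.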

Empirical Illustration: California’s Proposition 99#

The canonical synthetic-control application [ABADIE2010] studies the effect of California’s 1988 Proposition 99 tobacco-control program on per-capita cigarette sales, with 38 control states over 1970-2000. Running RESCM with the classic SC corner alongside the two papers’ headline estimators reuses the same panel and returns each method’s counterfactual, ATE, and a HAC confidence interval.

import pandas as pd
from mlsynth import RESCM

# Load the Proposition 99 smoking panel.
url = "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/smoking_data.csv"
df = pd.read_csv(url)
# Cast the treatment indicator to integers.
df["Proposition 99"] = df["Proposition 99"].astype(int)

# Fit the classic SC corner plus one headline estimator from each branch.
res = RESCM({"df": df, "outcome": "cigsale", "treat": "Proposition 99",
             "unitid": "state", "time": "year",
             "methods": ["SC", "LINF", "RELAX_L2"], "alpha": 0.05,
             "display_graphs": True}).fit()

# One summary line per fitted corner case.
for name, fit in res.fits.items():
    print(f"{name:9s} ATE {fit.att:8.3f}  SE {fit.att_se:6.3f}  "
          f"95% CI ({fit.ci[0]:7.2f},{fit.ci[1]:7.2f})  "
          f"p={fit.p_value:.3f}  donors={len(fit.donor_weights)}")

This prints:

SC        ATE  -17.371  SE  2.304  95% CI ( -21.89, -12.86)  p=0.000  donors=6
LINF      ATE  -17.359  SE  2.303  95% CI ( -21.87, -12.85)  p=0.000  donors=6
RELAX_L2  ATE  -22.190  SE  4.115  95% CI ( -30.26, -14.12)  p=0.000  donors=7

The penalized corners (SC, LINF) agree on an outcome-only effect of about \(-17\) cigarette packs per capita: when the donor pool fits the pre-period well (pre-treatment \(R^2 \approx 0.98\), pre-RMSE \(1.45\)), cross-validation picks near-zero \(\ell_\infty\) shrinkage and LINF collapses onto classic SCM. The relaxation corner (RELAX_L2) trades a little pre-fit (\(R^2 \approx 0.97\)) for a more constrained weight and lands at a larger effect, \(-22.2\), whose wider confidence interval reflects the larger pre-period residual variance the two-term standard error correctly propagates.

Verification#

Note

Empirical (Path A, Proposition 99). SC/LINF/RELAX_L2 run on the smoking panel (above); the penalized corners reproduce the classic outcome-only SCM effect (\(\approx -17\)) and the relaxation corner the denser, larger estimate, both significant and consistent with the literature.

Simulation (Path B, high dimension). A size/power Monte Carlo in the regime these methods target – \(N=90\) donors, \(T_1=36\) pre-periods (\(N \gg T_1\)), \(T_2=36\), three-factor AR(1) DGP, treated loading a convex mix of donors, idiosyncratic \(N(0,1)\) – rejecting H0: ATE = 0 at 5% (50 replications; \(\delta=0\) is size, \(\delta=1\) is power):

| \(\delta\) | SC | LINF | RELAX_L2 |
| --- | --- | --- | --- |
| 0 (size) | 0.20 | 0.22 | 0.28 |
| 1 (power) | 0.98 | 0.98 | 0.94 |

Estimation is unbiased and powerful (mean ATT bias \(\approx 0\); power \(0.94\)-\(0.98\)), but the analytic ATE test over-rejects in this short, high-dimensional panel. A diagnostic isolates two mechanisms. First, SC genuinely over-fits the 36-period pre-window (pre-RMSE \(0.77\) vs the true noise sd \(1.0\)), so \(\hat{\rho}^2_{(1)}\) understates estimation uncertainty; this self-corrects as \(T_1\) grows (pre-RMSE \(\to 0.94\) at \(T_1=250\)). Second, RELAX_L2 does not over-fit (pre-RMSE \(\approx 1.0\)), so its over-rejection reflects the analytic influence function not capturing how strongly-relaxed dense weights feed into the ATE. Both papers’ asymptotics require \(T_1\) large; the normal approximation is unreliable at \(T_1 < N\), and the two-term variance (which already cuts the post-only over-rejection by \(\sim 35\%\)) does not fully close the gap here. For honest finite-sample inference in this regime, prefer a placebo / conformal procedure [CWZ2021]. (With only 50 replications the rejection rates are noisy; the relaxation \(\eta\) is selected by CV, not fixed.)
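How noisy are rejection rates from 50 replications? A quick back-of-the-envelope check, using the binomial standard error \(\sqrt{p(1-p)/R}\) and a normal approximation (both are assumptions of this sketch, not part of the reported simulation), puts error bars on the size row of the table.

```python
import math

def mc_standard_error(rate, n_reps):
    """Binomial standard error of a Monte Carlo rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_reps)

n_reps = 50
nominal = 0.05
size = {"SC": 0.20, "LINF": 0.22, "RELAX_L2": 0.28}   # size row of the table

lower_bounds = {}
for name, rate in size.items():
    se = mc_standard_error(rate, n_reps)
    lower_bounds[name] = rate - 1.96 * se             # lower end of an approximate 95% band
    print(f"{name:9s} size {rate:.2f}  MC se {se:.3f}  lower bound {lower_bounds[name]:.3f}")
```

Each rate carries a Monte Carlo standard error of roughly 0.06, so the differences between the three estimators are within noise, but every lower bound stays above the nominal 5% level: the over-rejection itself is not a simulation artefact.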

Core API#

Relaxed/penalized synthetic control (RESCM).

A thin, NumPy-first orchestration over mlsynth.utils.laxscm_helpers. RESCM is a single convex synthetic-control program that nests a family of estimators as corner cases, selected by name:

  • penalized branch – classic SC, LASSO, RIDGE, ENET and the L-infinity-norm SCM (LINF / L1LINF) of Wang, Xing & Ye (2025): min ||y0 - mu - Y omega||^2 + P(omega).

  • relaxation branch – RELAX_L2 / RELAX_ENTROPY / RELAX_EL, the SCM-relaxation of Liao, Shi & Zheng (2026): keep the simplex and relax the exact balance first-order condition to an L-infinity tolerance, then minimise an information-theoretic divergence.

Pick estimators with methods; the first one drives the convenience aliases on the returned RESCMResults.

class mlsynth.estimators.laxscm.RESCM(config: RESCMConfig | dict)#

Bases: object

Relaxed/penalized synthetic-control estimator.

Parameters:

config (RESCMConfig or dict) – Validated configuration. Beyond the common fields, RESCM reads methods (which named corner cases to fit), alpha (CI level), tau / n_splits / n_taus (relaxation-branch CV controls), and solver.

References

Liao, C., Shi, Z., & Zheng, Y. (2026). A Relaxation Approach to Synthetic Control. arXiv:2508.01793.

Wang, L., Xing, X., & Ye, Y. (2025). An L-infinity Norm Synthetic Control Approach. arXiv:2510.26053.

Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies. JASA, 105(490), 493-505.

Doudchenko, N., & Imbens, G. W. (2017). Balancing, Regression, Difference-in-Differences and Synthetic Control Methods: A Synthesis. arXiv:1610.07748.

fit() RESCMResults#

Fit the requested RESCM corner case(s) and return typed results.

Returns:

RESCMResults – Container of per-method RESCMMethodFit objects (donor weights, counterfactual, gap, ATE, HAC standard error, CI, p-value), with convenience aliases (att, att_se, counterfactual, donor_weights) forwarding to the first requested method.

Configuration#

class mlsynth.config_models.RESCMConfig(*, df: pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', methods: List[str] = <factory>, tau: float | Literal['heuristic'] | None = None, n_splits: Annotated[int | None, Ge(ge=2)] = None, n_taus: Annotated[int | None, Ge(ge=1)] = None, solver: Any = 'CLARABEL', alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05)#

Configuration for the Relaxed/Penalized SCM (RESCM) estimator.

Pick one or more named corner-case estimators of the RESCM convex program via methods (e.g. ["SC", "LINF", "RELAX_L2"]); the first listed drives the convenience aliases on the returned RESCMResults. Valid names and aliases come from the registry in mlsynth.utils.laxscm_helpers.specs (METHOD_SPECS).

class Config#
extra = 'forbid'#
alpha: float#
methods: List[str]#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_splits: int | None#
n_taus: int | None#
solver: Any#
tau: float | Literal['heuristic'] | None#
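A minimal usage sketch of the configuration and fit call follows. The toy panel, its column names, and the helper names `make_toy_panel` and `fit_rescm` are illustrative assumptions; only the keyword names, the import path, and the result attributes are taken from the signatures on this page. The mlsynth call is wrapped in a function so the data-building part can be run on its own.

```python
import random


def make_toy_panel(n_units=6, n_periods=20, first_treated_period=15, seed=0):
    """Build long-format records: unit "u0" is treated, the rest are donors."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        for t in range(n_periods):
            treated = int(i == 0 and t >= first_treated_period)
            # Common trend + unit level + noise, plus a toy effect of 2.0.
            y = 0.5 * t + i + rng.gauss(0.0, 0.1) + 2.0 * treated
            rows.append({"unit": f"u{i}", "period": t, "y": y, "treated": treated})
    return rows


rows = make_toy_panel()

# Keyword names follow the RESCMConfig signature; the values are illustrative.
config = {
    "outcome": "y",
    "treat": "treated",
    "unitid": "unit",
    "time": "period",
    "methods": ["SC", "LINF", "RELAX_L2"],  # first name drives the aliases
    "alpha": 0.05,
    "display_graphs": False,
}


def fit_rescm(rows, config):
    """Hypothetical wrapper: requires pandas and mlsynth to be installed."""
    import pandas as pd
    from mlsynth.estimators.laxscm import RESCM

    results = RESCM({**config, "df": pd.DataFrame(rows)}).fit()
    # att / att_se forward to the first requested method ("SC" here).
    return results.att, results.att_se, results.att_by_method()
```

Calling `fit_rescm(rows, config)` should return the headline estimate and its standard error for the first listed method, together with a per-method dictionary for comparing corner cases side by side.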

Result Containers#

RESCM.fit() returns a RESCMResults, whose fits maps each named corner case to a RESCMMethodFit (donor weights, intercept, counterfactual, gap, ATE, two-term HAC standard error, CI, p-value, nonzero donor weights, fit diagnostics, and the realized hyperparameters). Convenience aliases (att, att_se, counterfactual, donor_weights) forward to the first requested method. The prepared, NumPy-only panel is exposed as a RESCMInputs, with units and time addressed through an IndexSet.

Frozen, NumPy-first containers for the Relaxed/penalized SCM engine (RESCM).

RESCM is a single convex synthetic-control program that nests a family of estimators as corner cases. Two branches are exposed, each with its own source paper:

  • relaxed – SCM-relaxation (Liao, Shi & Zheng 2026): keep the simplex omega in Delta_J and relax the exact balance first-order condition to an L-infinity tolerance, then minimise an information-theoretic divergence D(omega) (l2 / entropy / el).

  • elastic – penalized SCM: min ||y0 - mu - Y omega||^2 + P(omega) with P an L1 / L2 / L-infinity (or mixed) penalty. The L-infinity branch is the L-infinity-norm SCM of Wang, Xing & Ye (2025); classic Abadie SC is the lambda = 0 simplex corner.

Everything below is pure NumPy; units/time are addressed through IndexSet. The only DataFrame touchpoint is setup.

class mlsynth.utils.laxscm_helpers.structures.RESCMInputs(unit_index: ~mlsynth.utils.fast_scm_helpers.structure.IndexSet, time_index: ~mlsynth.utils.fast_scm_helpers.structure.IndexSet, y: ~numpy.ndarray, X: ~numpy.ndarray, T0: int, treated_label: ~typing.Any, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

Preprocessed, NumPy-only panel for the RESCM engine.

Parameters:
  • unit_index (IndexSet) – All N donor units (column order of X).

  • time_index (IndexSet) – All T periods (row order of y and X).

  • y (np.ndarray) – Treated-unit outcome over all periods, shape (T,).

  • X (np.ndarray) – Donor outcomes, shape (T, N).

  • T0 (int) – Number of pre-treatment periods (written T1 in the theory); the number of post-treatment periods is T2 = T - T0.

  • treated_label (Any) – Identifier of the treated unit.

  • metadata (dict) – Free-form provenance.

property T: int#
T0: int#
property T2: int#
X: ndarray#
property donor_labels: ndarray#
metadata: Dict[str, Any]#
property n_donors: int#
time_index: IndexSet#
treated_label: Any#
unit_index: IndexSet#
y: ndarray#
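The shape conventions and derived properties above can be illustrated with a small stand-in container. This is a sketch only: `PanelSketch` and its `split` method are hypothetical, plain label tuples replace IndexSet, and the validation rules are assumptions rather than what RESCMInputs enforces.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class PanelSketch:
    """Hypothetical stand-in for RESCMInputs (labels replace IndexSet)."""

    donor_labels: tuple
    y: np.ndarray  # treated outcome, shape (T,)
    X: np.ndarray  # donor outcomes, shape (T, N)
    T0: int        # number of pre-treatment periods

    def __post_init__(self):
        expected = (self.y.size, len(self.donor_labels))
        if self.y.ndim != 1 or self.X.shape != expected:
            raise ValueError("expected y of shape (T,) and X of shape (T, N)")
        if not 0 < self.T0 < self.y.size:
            raise ValueError("T0 must leave at least one pre and one post period")

    @property
    def T(self) -> int:
        return int(self.y.size)

    @property
    def T2(self) -> int:
        return self.T - self.T0

    @property
    def n_donors(self) -> int:
        return int(self.X.shape[1])

    def split(self):
        """Return ((y_pre, X_pre), (y_post, X_post)) by slicing rows at T0."""
        return (self.y[: self.T0], self.X[: self.T0]), (self.y[self.T0 :], self.X[self.T0 :])


rng = np.random.default_rng(0)
panel = PanelSketch(("a", "b", "c"), rng.normal(size=10), rng.normal(size=(10, 3)), T0=7)
(pre_y, pre_X), (post_y, post_X) = panel.split()
```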
class mlsynth.utils.laxscm_helpers.structures.RESCMMethodFit(name: str, branch: str, display_name: str, weights: ~numpy.ndarray, intercept: float, counterfactual: ~numpy.ndarray, gap: ~numpy.ndarray, att: float, att_se: float, ci: ~typing.Tuple[float, float], p_value: float, donor_weights: ~typing.Dict[~typing.Any, float], fit_diagnostics: ~typing.Dict[str, ~typing.Any] = <factory>, hyperparameters: ~typing.Dict[str, ~typing.Any] = <factory>, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

A single RESCM corner-case fit (e.g. SC / LINF / RELAX_L2).

att: float#
att_se: float#
branch: str#
ci: Tuple[float, float]#
counterfactual: ndarray#
display_name: str#
donor_weights: Dict[Any, float]#
fit_diagnostics: Dict[str, Any]#
gap: ndarray#
hyperparameters: Dict[str, Any]#
intercept: float#
metadata: Dict[str, Any]#
name: str#
p_value: float#
weights: ndarray#
class mlsynth.utils.laxscm_helpers.structures.RESCMResults(inputs: ~mlsynth.utils.laxscm_helpers.structures.RESCMInputs, fits: ~typing.Dict[str, ~mlsynth.utils.laxscm_helpers.structures.RESCMMethodFit], selected_variant: str, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

Top-level container returned by mlsynth.RESCM.fit().

property att: float#
att_by_method() Dict[str, float]#
property att_se: float#
property counterfactual: ndarray#
property donor_weights: Dict[Any, float]#
fits: Dict[str, RESCMMethodFit]#
property gap: ndarray#
inputs: RESCMInputs#
metadata: Dict[str, Any]#
se_by_method() Dict[str, float]#
selected_variant: str#
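The alias forwarding can be pictured with the following sketch. `FitSketch`, `ResultsSketch`, and `assemble` are hypothetical names, and it is an assumption here that selected_variant stores the name of the first requested method; the real containers carry many more fields.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FitSketch:
    """Hypothetical, trimmed-down per-method fit."""

    name: str
    att: float
    att_se: float


@dataclass(frozen=True)
class ResultsSketch:
    """Hypothetical top-level container with forwarding aliases."""

    fits: Dict[str, FitSketch]
    selected_variant: str

    @property
    def att(self) -> float:
        # Alias: forward to the fit named by selected_variant.
        return self.fits[self.selected_variant].att

    @property
    def att_se(self) -> float:
        return self.fits[self.selected_variant].att_se

    def att_by_method(self) -> Dict[str, float]:
        return {name: fit.att for name, fit in self.fits.items()}

    def se_by_method(self) -> Dict[str, float]:
        return {name: fit.att_se for name, fit in self.fits.items()}


def assemble(fit_list: List[FitSketch]) -> ResultsSketch:
    """Key fits by name and select the first one (assumed convention)."""
    fits = {fit.name: fit for fit in fit_list}
    return ResultsSketch(fits=fits, selected_variant=fit_list[0].name)


results = assemble([FitSketch("LINF", 1.8, 0.4), FitSketch("SC", 2.1, 0.5)])
```

Here `results.att` reads from the "LINF" fit because it was listed first, while `results.att_by_method()` exposes every fitted corner case.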

Helper Modules#

The named-estimator registry: each entry maps a name to the exact convex-engine call (branch and keyword arguments).

Named corner-case estimators of the RESCM convex program.

The legacy API exposed a nested models_to_run dict where the caller had to know which second_norm / relaxation / constraint_type / alpha combination realised which estimator. This module replaces that with a flat registry of named estimators: the user picks methods by name and each name resolves to the exact engine call.

Every spec dispatches to one of two engine entry points (mlsynth.utils.laxscm_helpers.crossval.fit_relaxed_scm() or fit_en_scm); kwargs are forwarded verbatim.

class mlsynth.utils.laxscm_helpers.specs.MethodSpec(name: str, branch: str, description: str, kwargs: ~typing.Dict[str, ~typing.Any] = <factory>)#

How a named RESCM estimator maps onto the convex engine.

branch: str#
description: str#
kwargs: Dict[str, Any]#
name: str#
mlsynth.utils.laxscm_helpers.specs.normalize_method(name: str) str#

Map a user-supplied method name to a registry key (case-insensitive).

mlsynth.utils.laxscm_helpers.specs.resolve_specs(methods) list[MethodSpec]#

Return the ordered list of MethodSpec for the requested names.
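A flat registry with case-insensitive lookup might look like the sketch below. The estimator names and branch labels come from this page, but `SpecSketch`, the `ABADIE` alias, the descriptions, and every kwargs entry are placeholders, not the contents of METHOD_SPECS.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical mirror of MethodSpec."""

    name: str
    branch: str
    description: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


# Placeholder registry: the kwargs are invented for illustration only.
REGISTRY = {
    "SC": SpecSketch("SC", "elastic", "classic simplex SCM"),
    "LINF": SpecSketch("LINF", "elastic", "L-infinity penalized SCM"),
    "RELAX_L2": SpecSketch("RELAX_L2", "relaxed", "relaxation, squared l2", {"divergence": "l2"}),
}
ALIASES = {"ABADIE": "SC"}  # hypothetical alias


def normalize(name: str) -> str:
    """Map a user-supplied name to a registry key, ignoring case and padding."""
    key = name.strip().upper()
    key = ALIASES.get(key, key)
    if key not in REGISTRY:
        raise ValueError(f"unknown method {name!r}; valid: {sorted(REGISTRY)}")
    return key


def resolve(methods: List[str]) -> List[SpecSketch]:
    """Return specs in the order requested, so the first stays first."""
    return [REGISTRY[normalize(m)] for m in methods]


specs = resolve(["linf", "abadie", " relax_l2 "])
```

Preserving the request order matters because the first resolved spec is the one the convenience aliases point at.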

Data preparation – the only DataFrame touchpoint: pivots to NumPy, builds the unit/time IndexSet objects, and splits pre/post.

Long-DataFrame -> NumPy boundary for RESCM (the only pandas touchpoint).

mlsynth.utils.laxscm_helpers.setup.derive_treatment(df: DataFrame, unitid: str, time: str, treat: str) Tuple[Any, Any]#

Read the single treated unit and its first treated period from treat.

mlsynth.utils.laxscm_helpers.setup.prepare_rescm_inputs(df: DataFrame, *, unitid: str, time: str, outcome: str, treat: str) RESCMInputs#

Pivot the panel to NumPy, build IndexSets, split pre/post.

Returns:

RESCMInputs – Pure-NumPy container: treated vector y (T,), donor matrix X (T, N), T0, and unit/time IndexSets.
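The pivot-and-split step can be sketched with plain pandas as below. `pivot_panel` is a hypothetical helper that returns a tuple instead of a RESCMInputs, and it assumes a balanced panel with a 0/1 treatment indicator and exactly one treated unit.

```python
import numpy as np
import pandas as pd


def pivot_panel(df, *, unitid, time, outcome, treat):
    """Long DataFrame -> (y, X, T0, treated_label, donor_labels, periods)."""
    treated_rows = df[df[treat] == 1]
    treated_units = treated_rows[unitid].unique()
    if len(treated_units) != 1:
        raise ValueError("expected exactly one treated unit")
    treated_label = treated_units[0]
    first_treated = treated_rows[time].min()

    # Rows = periods (sorted), columns = units.
    wide = df.pivot(index=time, columns=unitid, values=outcome).sort_index()
    T0 = int((wide.index < first_treated).sum())  # pre-treatment row count

    y = wide[treated_label].to_numpy(dtype=float)
    donors = wide.drop(columns=treated_label)
    X = donors.to_numpy(dtype=float)
    return y, X, T0, treated_label, list(donors.columns), list(wide.index)


# Toy panel: unit "A" is treated from period 3 onward.
df = pd.DataFrame(
    [
        {"unit": u, "t": t, "y": float(10 * i + t), "d": int(u == "A" and t >= 3)}
        for i, u in enumerate(["A", "B", "C"])
        for t in range(5)
    ]
)
y, X, T0, treated_label, donor_labels, periods = pivot_panel(
    df, unitid="unit", time="t", outcome="y", treat="d"
)
```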

The weak-dependence two-term HAC ATE inference (Li 2020).

Weak-dependence ATE inference for the RESCM counterfactual.

Once donor weights are fixed, the post-treatment gap d_t = y_t - yhat_t has mean equal to the ATE, which is estimated by the post-period average of d_t (ATE_hat). Under weak dependence (Li 2020, extended to dense weights by Wang, Xing & Ye 2025), ATE_hat is asymptotically normal, so under the null hypothesis of a zero ATE,

Z = ATE_hat / sqrt( rho1^2 / T1 + rho2^2 / T2 ) -> N(0, 1),

with rho1^2 the HAC long-run variance of the pre-period prediction residuals (which carries the first-stage weight-estimation uncertainty) and rho2^2 the HAC long-run variance of the de-meaned post-period effects.

The pre-period term is essential for dense penalized/relaxed weights: unlike forward selection (whose sample splitting makes the pre/post periods asymptotically independent and lets a post-only variance suffice), the dense estimators reuse the whole pre-window to fit the weights, so ignoring rho1 severely understates the standard error when N is large relative to T1.

mlsynth.utils.laxscm_helpers.inference.ate_inference(gap: ndarray, T0: int, alpha: float = 0.05) Tuple[float, float, Tuple[float, float], float]#

Return (att, se, ci, p_value) for the post-period gap mean.

Uses the Li (2020) two-term long-run variance: pre-period residual LRV (estimation uncertainty) plus de-meaned post-period effect LRV.
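The two-term standard error can be sketched in NumPy as follows. The Bartlett (Newey-West) kernel, the rule-of-thumb bandwidth, and the de-meaning of the pre-period residuals are assumptions made for this illustration; the page does not state which kernel or lag length ate_inference uses, and the function names here are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def bartlett_lrv(x, bandwidth=None):
    """Bartlett-kernel long-run variance of a de-meaned 1-D series."""
    x = np.asarray(x, dtype=float)
    n = x.size
    u = x - x.mean()
    if bandwidth is None:
        # Common rule of thumb; an assumption, not necessarily mlsynth's choice.
        bandwidth = int(np.floor(4 * (n / 100.0) ** (2.0 / 9.0)))
    bandwidth = min(bandwidth, n - 1)
    lrv = u @ u / n  # lag-0 autocovariance
    for lag in range(1, bandwidth + 1):
        gamma = u[lag:] @ u[:-lag] / n
        lrv += 2.0 * (1.0 - lag / (bandwidth + 1.0)) * gamma
    return float(lrv)


def two_term_ate_inference(gap, T0, alpha=0.05, bandwidth=None):
    """Return (att, se, ci, p_value) using se^2 = rho1^2/T1 + rho2^2/T2."""
    gap = np.asarray(gap, dtype=float)
    pre, post = gap[:T0], gap[T0:]
    att = float(post.mean())
    rho1_sq = bartlett_lrv(pre, bandwidth)   # weight-estimation uncertainty
    rho2_sq = bartlett_lrv(post, bandwidth)  # post-period effect variability
    se = float(np.sqrt(rho1_sq / pre.size + rho2_sq / post.size))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    ci = (att - z_crit * se, att + z_crit * se)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(att) / se))
    return att, se, ci, p_value


rng = np.random.default_rng(0)
gap = np.concatenate([rng.normal(0.0, 1.0, 40), 2.0 + rng.normal(0.0, 1.0, 20)])
att, se, ci, p_value = two_term_ate_inference(gap, T0=40)

# Dropping the pre-period term can only shrink the standard error.
se_post_only = float(np.sqrt(bartlett_lrv(gap[40:]) / 20))
```

Comparing `se` with `se_post_only` on real data gives a quick sense of how much of the reported uncertainty comes from estimating the weights on the pre-window.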

Run loop dispatching each corner case to the convex engine and assembling the typed per-method fits.

Run the requested RESCM corner case(s) and assemble typed per-method fits.

mlsynth.utils.laxscm_helpers.estimation.run_rescm(inputs: RESCMInputs, methods: List[str], *, tau: float | None = None, n_splits: int | None = None, n_taus: int | None = None, solver: str = 'CLARABEL', alpha: float = 0.05) Dict[str, RESCMMethodFit]#

Fit each requested RESCM corner case and attach weak-dependence ATE inference.