Multivariate Square-root Lasso Synthetic Control (MSQRT)

Multivariate Square-root Lasso Synthetic Control (MSQRT)#

When to Use This Estimator#

MSQRT implements the high-dimensional synthetic-control estimator of Shen, Song and Abadie [MSQRT]. It is built for disaggregated panels: many fine-grained units (stores, ZIP codes, geos, products) where the number of candidate donors \(N_0\) is comparable to or larger than the number of pre-treatment periods \(T_0\), and where several units are treated at the same time (a block design). Two ideas distinguish it from running an ordinary synthetic control unit-by-unit.

First, it pools the treated units into one matrix regression \(\mathbf{Y}_1 = \mathbf{Y}_0\boldsymbol{\Theta} + \mathbf{E}\) and estimates the entire donor-weight matrix \(\boldsymbol{\Theta}\) jointly, borrowing strength across treated units rather than solving one independent, noisy fit per treated unit. Second, the loss is the nuclear norm of the residual matrix, a “square-root” (pivotal) objective whose optimal penalty level does not depend on the unknown noise variance – so a single cross-validated \(\lambda\) regularises the whole problem, while the element-wise \(\ell_1\) term performs donor selection in the regime where the classical quadratic program is non-unique and plain Lasso over-selects.

Reach for MSQRT when you have a block of treated units adopting together, a wide donor pool (\(N_0 \gtrsim T_0\)), and you want a single interpretable, sparse set of donor weights shared across the treated block, with an ATT read as the mean post-period gap. It is the natural tool for retrospective “matched-market” lift studies and other disaggregated multi-unit roll-outs.

Do not use MSQRT when#

Adoption is staggered – treated units switch on at different times. MSQRT assumes a block design and will refuse staggered panels. Use Synthetic Difference-in-Differences (SDID), Sequential Synthetic Difference-in-Differences (Sequential SDiD), Partially Pooled SCM (PPSCM), or Matrix Completion with Nuclear Norm Minimization (MCNNM) (which handles staggered missingness natively).
There is a single treated unit. The pooling that makes MSQRT efficient has nothing to borrow across. Start at Forward Difference-in-Differences (FDID) or Two-Step Synthetic Control, or the high-dimensional single-unit tools (Forward-Selected Synthetic Control (FSCM), Sparse Synthetic Control (SparseSC)).
The donor pool is small (\(N_0 \ll T_0\)). The classical quadratic program is well-posed; use Two-Step Synthetic Control, Synthetic Control with Multiple Outcomes (SCMO), or Forward Difference-in-Differences (FDID) and keep the simplex interpretation.
The outcome panel has missing cells / gaps. MSQRT needs a balanced block panel; for informative or structured missingness use Matrix Completion with Nuclear Norm Minimization (MCNNM) (MAR) or Synthetic Nearest Neighbors / Causal Matrix Completion (SNN) (MNAR).
Spillovers contaminate the donor block (SUTVA fails) – use Spatial Synthetic Difference-in-Differences (SpSyDiD) or Spillover-Aware Synthetic Control (SPILLSYNTH).
Treatment is endogenous and you have an instrument – use Synthetic IV.
You are designing an experiment (choosing whom to treat) rather than estimating a retrospective effect – use Synthetic Controls for Experimental Design (MAREX), Synthetic Design (SYNDES), or Lexicographic Synthetic Control (LEXSCM).

Notation#

There are \(N\) units \(\mathcal{N} \coloneqq \{1, \dots, N\}\) observed over periods \(t \in \mathcal{T} \coloneqq \{1, \dots, T\}\), 1-indexed, with all treated units adopting after period \(T_0\). The pre-period is \(\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}\) (of length \(T_0\)) and the post-period is \(\mathcal{T}_2 \coloneqq \{t \in \mathcal{T} : t > T_0\}\). The treated block is the units \(j = 1, \dots, J\) and the donor pool is the never-treated units \(\mathcal{N}_0 \coloneqq \mathcal{N} \setminus \{1, \dots, J\}\) of cardinality \(N_0\).

Stack the treated units’ pre-period outcomes into the time-major matrix \(\mathbf{Y}_1 \in \mathbb{R}^{T_0 \times J}\) (column \(j\) is treated series \(\mathbf{y}_j\)) and the never-treated donors into the donor matrix \(\mathbf{Y}_0 \coloneqq [\mathbf{y}_j]_{j \in \mathcal{N}_0} \in \mathbb{R}^{T_0 \times N_0}\) (one column per donor) over \(\mathcal{T}_1\). The donor-weight matrix \(\boldsymbol{\Theta} \in \mathbb{R}^{N_0 \times J}\) has one column per treated unit; column \(j\) is that unit’s synthetic-control weights. The nuclear norm \(\lVert \mathbf{A} \rVert_{*} \coloneqq \sum_i \sigma_i(\mathbf{A})\) sums the singular values of \(\mathbf{A}\). The treatment effect on treated unit \(j\) at post-period time \(t\) is \(\tau_{jt} \coloneqq y_{jt} - (\mathbf{Y}_0\widehat{\boldsymbol{\Theta}})_{tj}\), and the ATT \(\widehat{\tau}\) is its average over the treated block and \(\mathcal{T}_2\).

The estimator#

MSQRT estimates the weight matrix by Multivariate Square-root Lasso (paper Eq. 5):

\[\widehat{\boldsymbol{\Theta}} \;=\; \operatorname*{argmin}_{\boldsymbol{\Theta}}\; \frac{1}{\sqrt{T_0}}\,\bigl\lVert \mathbf{Y}_1 - \mathbf{Y}_0\boldsymbol{\Theta} \bigr\rVert_{*} \;+\; \lambda \sum_{i,j} \lvert \Theta_{ij} \rvert .\]

The first term is a pivotal (square-root-type) loss: the nuclear norm of the residual matrix, scaled by \(1/\sqrt{T_0}\). Its key property is that the penalty level \(\lambda\) that delivers the oracle rate does not depend on the unknown noise scale, so the same \(\lambda\) regularises every column of \(\boldsymbol{\Theta}\) simultaneously. The second term is the element-wise \(\ell_1\) penalty driving sparse donor selection. The synthetic counterfactual for the treated block is \(\mathbf{Y}_0\widehat{\boldsymbol{\Theta}}\), extended into the post-period with the post-period donor matrix \(\mathbf{Y}_{0,\mathcal{T}_2}\).

Internal solver (two-split ADMM)#

The objective is convex but couples a nuclear norm and an \(\ell_1\) penalty, so mlsynth solves it with a purpose-built two-split ADMM rather than a generic conic solver (the paper itself emphasises computational efficiency). Introduce two consensus blocks – \(\mathbf{R} = \mathbf{Y}_0\boldsymbol{\Theta}\) carrying the nuclear-norm term and \(\mathbf{Z} = \boldsymbol{\Theta}\) carrying the \(\ell_1\) term. With scaled dual variables \(\mathbf{U}, \mathbf{W}\) and penalty parameter \(\rho\), each iteration is three closed-form updates (Boyd et al. 2011, §3):

\[\begin{split}\boldsymbol{\Theta} &\;\leftarrow\; (\mathbf{Y}_0^\top\mathbf{Y}_0 + \mathbf{I})^{-1} \bigl(\mathbf{Y}_0^\top(\mathbf{R} - \mathbf{U}) + (\mathbf{Z} - \mathbf{W})\bigr), \\ \mathbf{R} &\;\leftarrow\; \mathbf{Y}_1 - \mathcal{S}^{*}_{(1/\sqrt{T_0})/\rho} \bigl(\mathbf{Y}_1 - \mathbf{Y}_0\boldsymbol{\Theta} - \mathbf{U}\bigr), \\ \mathbf{Z} &\;\leftarrow\; \mathcal{S}_{\lambda/\rho} \bigl(\boldsymbol{\Theta} + \mathbf{W}\bigr),\end{split}\]

followed by the dual ascents \(\mathbf{U} \leftarrow \mathbf{U} + (\mathbf{Y}_0\boldsymbol{\Theta} - \mathbf{R})\) and \(\mathbf{W} \leftarrow \mathbf{W} + (\boldsymbol{\Theta} - \mathbf{Z})\). Here \(\mathcal{S}^{*}_\tau\) is singular-value soft-thresholding – the nuclear-norm proximal map, which SVDs its argument and shrinks each singular value toward zero by \(\tau\) – and \(\mathcal{S}_\tau\) is elementwise soft-thresholding, the \(\ell_1\) proximal map. The returned weight matrix is the exactly-sparse \(\mathbf{Z}\) block.

Three design choices make this fast. The \(\boldsymbol{\Theta}\)-step is an exact least-squares solve whose system matrix \(\mathbf{Y}_0^\top\mathbf{Y}_0 + \mathbf{I}\) does not depend on \(\rho\), so it is Cholesky-factored once and reused at every iteration; the coupling is over-relaxed; and \(\rho\) is balanced adaptively – doubled or halved whenever the primal and dual residuals diverge by more than a factor of ten. Iteration stops on the standard ADMM primal/dual residual criterion (Boyd et al. 2011, §3.3): the primal residual \((\mathbf{Y}_0\boldsymbol{\Theta} - \mathbf{R},\, \boldsymbol{\Theta} - \mathbf{Z})\) and the dual residual \(\rho\,\mathbf{A}^\top(\mathbf{z}^{k} - \mathbf{z}^{k-1})\) both fall below an absolute/relative tolerance. A high-dimensional fit therefore converges in a few hundred lightweight iterations – orders of magnitude faster than the conic reformulation, which it matches to numerical precision. A cvxpy backend (_fit_msqrt_cvxpy()) is retained as a validation oracle, selectable with solver="cvxpy".

Tuning the penalty#

The penalty \(\lambda\) is chosen by rolling-origin (expanding-window) cross-validation on the pre-period. From an initial training window the model is fit; the next cv_val_window periods are forecast and scored by mean squared error; the window expands by cv_step and the fold repeats; the \(\lambda\) minimising the mean validation MSE across folds is selected. The schedule adapts to \(T_0\) by default (initial window \(\approx 0.6\,T_0\), validation block \(\approx T_0/5\)) and is overridable (cv_initial_train, cv_val_window, cv_step, cv_folds). Two properties keep it cheap and honest: the search is pathwise – candidate penalties are visited in descending order and each ADMM solve is warm-started from the previous (larger-\(\lambda\), sparser) solution, which sharply cuts the iteration count without changing the selected penalty – and the expanding window never lets a future period inform an earlier fit. A fixed lambda_ skips cross-validation entirely.

Assumptions and remarks#

Block design. All treated units adopt at the same period, and the panel is balanced.

Remark. This is what lets the treated units share one \(\boldsymbol{\Theta}\) problem; mlsynth enforces it at ingestion and points staggered panels to Synthetic Difference-in-Differences (SDID)/Matrix Completion with Nuclear Norm Minimization (MCNNM).
Approximate factor structure / sparsity. Each treated unit is well approximated by a sparse combination of donors driven by a common low-dimensional factor structure.

Remark. The nuclear-norm loss exploits the shared factor structure across treated units; the \(\ell_1\) penalty keeps the per-unit weight vectors sparse and interpretable when \(N_0 \gtrsim T_0\).
Pivotal tuning. The noise is homoskedastic enough that the square-root loss’s variance-free penalty calibration applies.

Remark. This is the payoff of the nuclear-norm formulation – one \(\lambda\) for the whole matrix, no per-unit variance estimation.

Causal use#

MSQRT forms \(\tau_{jt} = y_{jt} - (\mathbf{Y}_0\widehat{\boldsymbol{\Theta}})_{tj}\) over the post-treatment block and aggregates to the overall ATT \(\widehat{\tau}\) (mean post-period gap), reporting it both in levels (att) and as a percentage of the synthetic counterfactual (att_percent). It also exposes att_t (mean treated gap at each post period) and unit_att (per-treated-unit post-period mean gap), so heterogeneity across the treated block is visible.

Inference#

The Shen, Song & Abadie paper establishes finite-sample estimation-error bounds for \(\widehat{\boldsymbol{\Theta}}\) rather than a confidence-interval procedure for the treatment effect. For uncertainty quantification mlsynth instead uses the non-asymptotic prediction intervals of Cattaneo, Feng, Palomba and Titiunik [SCPI] (the scpi framework). Those intervals decompose the prediction error \(\widehat{\tau} - \tau\) into an in-sample error (from estimating the SC weights) and an out-of-sample error (the irreducible post-treatment sampling noise), and bound each separately:

\[I(\tau) = \bigl[\, \widehat{\tau} - \overline{M}_{\text{in}} - \overline{M}_{\text{out}},\;\; \widehat{\tau} - \underline{M}_{\text{in}} - \underline{M}_{\text{out}} \,\bigr].\]

The in-sample bound is derived from the optimality condition of a quadratic-loss SC program. MSQRT instead minimises a nuclear-norm square-root-Lasso objective with no constraints on \(\boldsymbol{\Theta}\), so that derivation does not strictly apply. mlsynth therefore models, for MSQRT, only the out-of-sample error – the rigorous component – via the sub-Gaussian concentration bound

\[\overline{M}_{\text{out}},\,\underline{M}_{\text{out}} = \widehat{\mathbb{E}}[u\mid\mathcal{H}] \pm \sqrt{2\,\widehat{\sigma}^2 \log(2/\alpha_{\text{out}})},\]

with the conditional mean and variance proxy \(\widehat{\sigma}^2\) estimated from each treated unit’s pre-treatment residuals. The full miscoverage budget \(\alpha\) is spent on the out-of-sample band (\(\alpha_{\text{in}} = 0\)). The resulting intervals reflect post-treatment sampling uncertainty but not weight-estimation uncertainty; they are correspondingly narrower, and this is recorded on the result (in_sample_included = False).

Setting inference=True returns a SCPIResults with the four scpi predictands, each with its own band:

taua – Time-averaged unit-averaged: the overall ATT.
tsua – Time-specific unit-averaged: {period: band}, the average effect across treated units at each post-period.
taus – Time-averaged unit-specific: {unit: band}, each treated unit’s effect averaged over the post-window.
tsus – Time-specific unit-specific: {(unit, period): band}, the per-cell effect.

plus simultaneous – TSUS bands widened (Bonferroni over the post-periods) for joint coverage across all post-periods of a unit, which supports statements such as “the largest per-period effect is significant”. The time-averaged predictands accept a time_dependence setting ("iid", default, shrinks the band by \(\sqrt{L}\); "general" makes no serial-dependence assumption). The reusable mlsynth.utils.scpi_helpers module also implements the in-sample simulation bound for use by quadratic-loss SC estimators.

Size-weighted aggregate#

The unit-averaged predictands (TSUA, TAUA) weight every treated unit equally: they answer “what happened in the average treated unit”. When the treated units differ in scale – markets of different population, stores of different revenue – a size-weighted aggregate is often the quantity of interest instead: “what happened across the footprint, weighting units by how much of it they are”. This is the population-weighted aggregate noted by Wing, Freedman and Hollingsworth (2024), and it sits squarely inside the CFPT predictand framework: the unit-averaged predictand is a convex combination of the per-unit effects, and nothing in the construction requires the weights to be equal.

Pass a per-unit-constant column of weights through weight_col (for example a market’s population). The estimator then additionally reports the size-weighted TSUA/TAUA: with weights \(\omega_i\) summing to one, the point estimate is \(\sum_i \omega_i\,\widehat\tau_i\), and – because the prediction error decomposes linearly across units into an in-sample and an out-of-sample piece – the band is built from the same two components combined with the same \(\omega_i\). Equal weights reproduce the ordinary TAUA band exactly. The weighted bands appear on the SCPIResults as taua_weighted and tsua_weighted (None when no weight_col is given), alongside the equal-weight taua / tsua, so both views are available from one fit.

res = MSQRT({
    "df": df, "outcome": "Y", "treat": "treated",
    "unitid": "unit", "time": "time",
    "inference": True, "weight_col": "population",   # size weights
}).fit()

eq = res.inference_intervals.taua            # equal-weight ATT + band
wt = res.inference_intervals.taua_weighted   # population-weighted ATT + band
print(f"equal-weight ATT    = {eq.point:+.3f}  [{eq.lower:+.3f}, {eq.upper:+.3f}]")
print(f"population-weighted = {wt.point:+.3f}  [{wt.lower:+.3f}, {wt.upper:+.3f}]")

Example#

A high-dimensional, multiple-treated panel in the Shen-Song-Abadie regime: five treated units adopt together at period 100, against forty never-treated donors following an AR(1) process, with each treated unit a sparse convex combination of the donors plus noise and a constant treatment effect of +2. MSQRT pools the treated units, selects \(\lambda\) by cross-validation, and recovers the ATT with a CFPT/scpi prediction band.

from mlsynth import MSQRT
from mlsynth.utils.msqrt_helpers.simulation import simulate_msqrt_panel

df = simulate_msqrt_panel(
    n_treated=5, n_control=40, T0=100, n_post=10, att=2.0, seed=0,
)

res = MSQRT({
    "df": df, "outcome": "Y", "treat": "treated",
    "unitid": "unit", "time": "time",
    "n_lambda": 15,            # log-spaced CV grid
    "inference": True,         # CFPT/scpi prediction intervals
    "display_graphs": True,    # treated mean vs synthetic mean
}).fit()

print(f"ATT (true 2.0)   = {res.att:+.3f}")
lo, hi = res.att_ci                          # overall ATT (TAUA) band
print(f"90% scpi band    = [{lo:+.3f}, {hi:+.3f}]")
# per-period (unit-averaged) effects with their bands (raw SCPI object)
for period, band in res.inference_intervals.tsua.items():
    print(f"  k={period}: {band.point:+.3f}  [{band.lower:+.3f}, {band.upper:+.3f}]")
print(f"selected lambda  = {res.best_lambda:.3f}")
print(f"avg active donors per treated = "
      f"{res.weights.summary_stats['avg_active_donors_per_treated']:.1f}")

This prints an ATT of roughly +2.05 with a 90% band that brackets the true +2 and a pre-treatment RMSE below the noise floor.

Verification#

Note

Monte-Carlo replication. mlsynth.utils.msqrt_helpers.replication reproduces the Shen, Song & Abadie [MSQRT] simulation study (their Section 6) through the public mlsynth.MSQRT.fit() API: \(T_0 = 100\) pre-periods, \(N_0 = 400\) donors, a treated-block grid \(J \in \{50, \ldots, 400\}\), and the paper’s two data-generating settings. The headline findings reproduce – the ATT estimator is essentially unbiased (bias centred at zero) and the RMSE for the imputed \(\mathbf{Y}(0)\) stays flat in \(J\) at roughly 0.71-0.73, matching the paper’s Table 1. The PAPER preset runs the full 500-replication study; the DEMO preset is a faster, reduced-count version.

Solver. The two-split ADMM matches the cvxpy conic solution of eq. (5) to numerical precision while being orders of magnitude faster – the property that makes the high-dimensional, many-treated regime tractable.

Block-design guard. Feeding a staggered panel raises MlsynthDataError, redirecting to the staggered-adoption estimators – MSQRT’s assumptions are enforced, not assumed.

Core API#

MSQRT: Multivariate Square-root Lasso Synthetic Control (Shen et al. 2025).

Shen, Z., Song, X. & Abadie, A. (2025). “Efficiently Learning Synthetic Control Models for High-dimensional Disaggregated Data.” arXiv:2510.22828.

MSQRT targets the regime where the panel is disaggregated – many fine-grained units (often with the number of donors \(n\) comparable to or exceeding the number of pre-treatment periods \(T_0\)) and several units treated at the same time (a block design). Rather than fitting each treated unit separately, it stacks all treated units into a single matrix regression \(Y = X\Theta + E\) and estimates the donor-weight matrix \(\Theta\) by Multivariate Square-root Lasso,

\[\widehat\Theta \;=\; \arg\min_{\Theta}\; \tfrac{1}{\sqrt{T_0}}\,\lVert Y - X\Theta \rVert_{*} \;+\; \lambda \sum_{i,j} \lvert \Theta_{ij}\rvert ,\]

where \(\lVert\cdot\rVert_{*}\) is the nuclear norm. The nuclear-norm (“square-root”) loss is pivotal: the optimal penalty level does not depend on the unknown noise variance, so a single \(\lambda\) regularises the whole weight matrix while the element-wise \(\ell_1\) term performs donor selection. Borrowing strength across treated units in one joint problem is what makes the high-dimensional, multiple-treated case tractable and efficient.

The treatment effect is the mean post-treatment gap (observed minus synthetic) over the treated cells – an ATT. mlsynth selects \(\lambda\) by rolling-origin cross-validation on the pre-period and, optionally, attaches CFPT/scpi prediction intervals (Cattaneo, Feng, Palomba & Titiunik 2025) for all four predictands – TSUS, TAUS, TSUA and the overall ATT (TAUA) – plus simultaneous bands. For MSQRT only the out-of-sample (post-treatment noise) error is modelled; see mlsynth.utils.scpi_helpers.

class mlsynth.estimators.msqrt.MSQRT(config: MSQRTConfig | dict)#

Bases: object

Multivariate Square-root Lasso Synthetic Control estimator.

Parameters:: config (MSQRTConfig or dict) – Configuration object. See mlsynth.config_models.MSQRTConfig.

fit() → MSQRTResults#: Run MSQRT and return MSQRTResults.

Configuration#

class mlsynth.config_models.MSQRTConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', plot: ~mlsynth.config_models.PlotConfig = <factory>, lambda_: ~typing.Annotated[float | None, ~annotated_types.Gt(gt=0.0)] = None, n_lambda: ~typing.Annotated[int, ~annotated_types.Ge(ge=2)] = 15, cv_initial_train: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=2)] = None, cv_val_window: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=1)] = None, cv_step: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=1)] = None, cv_folds: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=1)] = None, inference: bool = False, alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.1, time_dependence: ~typing.Literal['iid', 'general'] = 'iid', weight_col: str | None = None)#

Configuration for the MSQRT estimator.

Shen, Song & Abadie (2025), “Efficiently Learning Synthetic Control Models for High-dimensional Disaggregated Data” (arXiv:2510.22828). Stacks all treated units into one matrix regression Y = X Theta + E and fits the donor-weight matrix by Multivariate Square-root Lasso (nuclear-norm loss + element-wise L1), with the L1 penalty chosen by rolling-origin cross-validation on the pre-period. Assumes a block design (all treated units adopt at the same period) and inherits the standard df / outcome / treat / unitid / time interface.

Parameters:

lambda_ (float, optional) – Fixed L1 penalty. If None (default), selected by cross-validation.
n_lambda (int) – Number of log-spaced candidate penalties in the CV grid.
cv_initial_train, cv_val_window, cv_step, cv_folds (int, optional) – Rolling-origin CV schedule overrides. Adaptive defaults (scaled to the pre-period length) are used when left None.
inference (bool) – Attach CFPT/scpi prediction intervals (Cattaneo, Feng, Palomba & Titiunik 2025) for all four predictands plus simultaneous bands. For MSQRT only the out-of-sample error is modelled. Default False.
alpha (float) – Total miscoverage level for the intervals (default 0.1 -> 90%).
time_dependence ({“iid”, “general”}) – Time-averaging assumption for the time-averaged predictands (TAUS, TAUA). "iid" (default) shrinks the band by sqrt(L); "general" makes no serial-dependence assumption.

alpha: float#

cv_folds: int | None#

cv_initial_train: int | None#

cv_step: int | None#

cv_val_window: int | None#

inference: bool#

lambda_: float | None#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_lambda: int#

time_dependence: Literal['iid', 'general']#

weight_col: str | None#

Result Containers#

MSQRT.fit() returns a MSQRTResults: the ATT (levels and percent), the donor-weight matrix \(\boldsymbol{\Theta}\), per-treated-unit and aggregate donor weights (WeightsResults), the synthetic counterfactual and gap matrices, att_t and unit_att, the cross-unit observed/synthetic means the plotter draws, the CV-selected best_lambda, per-unit sparsity, the pre-period RMSE, and – when requested – the SCPIResults prediction intervals.

Frozen dataclasses for the MSQRT estimator.

Shen, Song & Abadie (2025), Efficiently Learning Synthetic Control Models for High-dimensional Disaggregated Data (arXiv:2510.22828). The estimator stacks all treated units into one matrix regression Y = X Theta + E and fits the donor-weight matrix Theta by Multivariate Square-root Lasso (nuclear-norm loss + element-wise L1), then forms the ATT as the mean post-treatment gap (observed minus synthetic) over the treated cells.

class mlsynth.utils.msqrt_helpers.structures.MSQRTInputs(Y_pre: ndarray, Y_post: ndarray, X_pre: ndarray, X_post: ndarray, treated_names: List[Any], control_names: List[Any], time_labels: ndarray, T0: int, unit_weights: Dict[Any, float] | None = None)#

Bases: object

Preprocessed block panel for MSQRT (multiple treated units, common timing).

Matrices are time-major (rows = periods) to match the paper’s Y = X Theta + E convention.

Y_pre, Y_post

Treated outcomes, shape (T0, m) and (T_post, m).

Type:: np.ndarray

X_pre, X_post

Control (donor) outcomes, shape (T0, n) and (T_post, n).

Type:: np.ndarray

treated_names, control_names

Type:: list

time_labels#

All period labels (length T0 + T_post).

Type:: np.ndarray

T0#

Number of pre-treatment periods (treatment begins at index T0).

Type:: int

T0: int#

X_post: ndarray#

X_pre: ndarray#

Y_post: ndarray#

Y_pre: ndarray#

control_names: List[Any]#

property m: int#

property n: int#

property n_post: int#

time_labels: ndarray#

treated_names: List[Any]#

unit_weights: Dict[Any, float] | None = None#

class mlsynth.utils.msqrt_helpers.structures.MSQRTResults(*, effects: ~mlsynth.config_models.EffectsResults | None = None, fit_diagnostics: ~mlsynth.config_models.FitDiagnosticsResults | None = None, time_series: ~mlsynth.config_models.TimeSeriesResults | None = None, weights: ~mlsynth.config_models.WeightsResults | None = None, inference: ~mlsynth.config_models.InferenceResults | None = None, method_details: ~mlsynth.config_models.MethodDetailsResults | None = None, sub_method_results: ~typing.Dict[str, ~typing.Any] | None = None, additional_outputs: ~typing.Dict[str, ~typing.Any] | None = None, raw_results: ~typing.Dict[str, ~typing.Any] | None = None, execution_summary: ~typing.Dict[str, ~typing.Any] | None = None, plot_config: ~mlsynth.config_models.PlotConfig | None = None, inputs: ~mlsynth.utils.msqrt_helpers.structures.MSQRTInputs, att_percent: float, theta: ~numpy.ndarray, counterfactual_matrix: ~numpy.ndarray, gap_matrix: ~numpy.ndarray, att_t: ~numpy.ndarray, unit_att: ~typing.Dict[~typing.Any, float], treated_mean: ~numpy.ndarray, synthetic_mean: ~numpy.ndarray, best_lambda: float, sparsity: ~numpy.ndarray, inference_intervals: ~typing.Any | None = None, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: BaseEstimatorResults

Top-level container returned by mlsynth.MSQRT.fit().

An EffectResult (the observational report): in addition to the MSQRT-specific fields below it exposes the standardized sub-models (effects, time_series, weights, inference, fit_diagnostics, method_details) and the flat accessors att / counterfactual / gap / att_ci / pre_rmse. The treated counterfactual path (res.counterfactual) is the cross-treated-unit synthetic mean; the full (T, m) synthetic / gap matrices live in counterfactual_matrix / gap_matrix. The per-treated-unit PCR donor weights live in the standardized weights slot.

Parameters:

inputs (MSQRTInputs)
att_percent (float) – att as a percentage of the mean synthetic counterfactual on the post window.
theta (np.ndarray) – Estimated donor-weight matrix Theta, shape (n, m).
counterfactual_matrix (np.ndarray) – Synthetic (untreated) outcome for the treated units, shape (T0 + T_post, m). (Renamed from counterfactual, which now returns the 1-D treated path per the result contract.)
gap_matrix (np.ndarray) – Observed minus synthetic for the treated units, same shape. (Renamed from gap, which now returns the 1-D treated gap path.)
att_t (np.ndarray) – Mean treated gap at each post-treatment period, shape (T_post,).
unit_att (Dict) – {treated_name: post-period mean gap}.
treated_mean, synthetic_mean (np.ndarray) – Cross-treated-unit means of observed / synthetic over the full timeline (the series the plotter draws), length T0 + T_post.
best_lambda (float) – Selected (or supplied) L1 penalty.
sparsity (np.ndarray) – Per-treated-unit count of active donors (|Theta_ij| > tol).
inference_intervals (SCPIResults, optional) – CFPT/scpi prediction intervals (Cattaneo, Feng, Palomba & Titiunik 2025); see mlsynth.utils.scpi_helpers. For MSQRT only the out-of-sample error is modelled. (Renamed from inference; the standardized InferenceResults is mirrored into the inference slot so res.att_ci resolves.)
metadata (dict)

att_percent: float#

att_t: np.ndarray#

best_lambda: float#

counterfactual_matrix: np.ndarray#

gap_matrix: np.ndarray#

inference_intervals: Any | None#

inputs: MSQRTInputs#

metadata: Dict[str, Any]#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function BaseEstimatorResults.Config.<lambda>>}}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

sparsity: np.ndarray#

synthetic_mean: np.ndarray#

theta: np.ndarray#

treated_mean: np.ndarray#

unit_att: Dict[Any, float]#

Helper Modules#

Data preparation – the DataFrame touchpoint: pivots the long panel into the stacked \(\mathbf{Y}_1 = \mathbf{Y}_0\boldsymbol{\Theta} + \mathbf{E}\) block matrices and enforces the block design.

Panel ingestion for the MSQRT estimator (block, multiple treated units).

mlsynth.utils.msqrt_helpers.setup.prepare_msqrt_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, weight_col: str | None = None) → MSQRTInputs#

Pivot a long panel into the stacked Y = X Theta + E block matrices.

MSQRT (Shen, Song & Abadie 2025) assumes a block design: every ever-treated unit adopts at the same period. Treated units form the columns of Y; never-treated units form the columns of X.

The two-split ADMM solve (singular-value + element-wise soft-thresholding with a cached exact \(\boldsymbol{\Theta}\) step, adaptive penalty and over-relaxation) and the warm-started pathwise rolling-origin cross-validation over \(\lambda\). A cvxpy reference solve is retained for validation.

Multivariate Square-root Lasso solve + rolling-origin lambda selection.

Estimator (Shen, Song & Abadie 2025, eq. 5):

Theta_hat = argmin_Theta  (1/sqrt(T0)) * ||Y - X Theta||_*  +  lambda * ||Theta||_1

where ||.||_* is the nuclear norm (the “square-root”/pivotal loss; tuning does not depend on the unknown noise variance) and ||.||_1 is the elementwise L1 penalty sum_{i,j} |Theta_ij| that drives donor selection in the high-dimensional regime.

The default solver is a two-split (consensus) ADMM in which every subproblem has a closed form:

the nuclear-norm term -> singular-value soft-thresholding of the residual,
the L1 term -> elementwise soft-thresholding,
the coupling -> an exact least-squares step, solved with a Cholesky factorisation of X'X + I that is computed once and reused across every iteration (it does not depend on the penalty parameter rho).

Replacing the linearised proximal-gradient step of a one-split scheme with this exact step collapses the iteration count from thousands to a few hundred, which is what makes the high-dimensional, many-treated regime tractable. A cvxpy path is retained for validation against a general conic solver.

mlsynth.utils.msqrt_helpers.optimization.fit_msqrt_admm(Y: ndarray, X: ndarray, lambd: float, *, rho: float = 1.0, over_relax: float = 1.5, max_iter: int = 5000, abstol: float = 1e-06, reltol: float = 0.0001, accelerate: bool = False, eta: float = 0.999, adaptive_rho: bool = True, warm_start: dict | None = None, return_state: bool = False)#

Solve the Multivariate Square-root Lasso (eq. 5) by two-split ADMM.

Minimises

\[\frac{1}{\sqrt{T}}\,\lVert Y - X\Theta\rVert_* + \lambda \lVert \Theta \rVert_1\]

over Theta in R^{n x m}. The problem is cast in the standard ADMM form min f(x) + g(z) s.t. A x + B z = c with x = Theta, two consensus blocks z = (R, Z) for R = X Theta (nuclear-norm term) and Z = Theta (L1 term), A = [X; I], B = -I and c = 0. The scaled-dual updates (Boyd et al. 2011, Sec. 3.1.1) are

Theta – exact least squares: (X'X + I) Theta = X'(Rhat - Uhat) + (Zhat - What);
R – R = Y - svt(Y - X Theta - Uhat, (1/sqrt(T)) / rho);
Z – Z = soft_threshold(Theta + What, lambda / rho);
duals – U = Uhat + (X Theta - R), W = What + (Theta - Z).

where (Rhat, Zhat, Uhat, What) are the (possibly extrapolated) inputs to each iteration – equal to the previous iterate for vanilla ADMM, or a Nesterov-momentum extrapolation when accelerate=True.

The system matrix M = X'X + I does not depend on rho, so it is Cholesky-factorised once and reused for every iteration.

Parameters:

Y (numpy.ndarray, shape (T, m)) – Treated-unit outcomes, time-major (rows are periods).
X (numpy.ndarray, shape (T, n)) – Donor (control) outcomes, time-major.
lambd (float) – Non-negative L1 penalty lambda in eq. (5).
rho (float, optional) – ADMM penalty parameter (default 1.0). Affects only the convergence path, not the solution. Held fixed when accelerate=True.
over_relax (float, optional) – Over-relaxation parameter alpha in (0, 2) (default 1.5; Boyd et al. 2011, Sec. 3.4.3). In the R/Z and dual updates the coupling term A Theta is replaced by alpha (A Theta) + (1 - alpha) (previous z). Values in [1.5, 1.8] typically cut the iteration count appreciably at no per-iteration cost; alpha = 1 recovers the un-relaxed scheme.
max_iter (int, optional) – Maximum number of iterations (default 5000); a safety cap rarely reached because the exact Theta step (and acceleration) converge quickly.
abstol (float, optional) – Absolute tolerance for the primal/dual stopping rule (default 1e-6).
reltol (float, optional) – Relative tolerance for the primal/dual stopping rule (default 1e-4).
accelerate (bool, optional) – If True, apply the Nesterov-momentum acceleration with adaptive restart of Goldstein, O’Donoghue, Setzer & Baraniuk (2014), “Fast Alternating Direction Optimization Methods”, SIAM J. Imaging Sci. 7(3) (UCLA CAM report 12-35), Algorithm 8 (the general-convex restart variant). rho is held fixed in this mode. Default False: acceleration targets the convergence order, but for this problem the limiting factor is the mismatch in scale between the two constraints (R = X Theta is on the scale of the data; Z = Theta on the scale of the weights). Boyd’s adaptive-rho residual balancing (Sec. 3.4.1) – the practical proxy for the per-constraint penalties of Sec. 3.4.2 – handles that directly and, empirically, converges to the cvxpy optimum (objective agreeing to five significant figures) far faster than fixed-rho acceleration.
eta (float, optional) – Restart threshold in (0, 1) (default 0.999): when accelerating, momentum is kept while the combined residual decreases by at least a factor eta, otherwise the iterate restarts (momentum reset). Only used when accelerate=True.
adaptive_rho (bool, optional) – If True (default), and only when accelerate=False, apply residual-balancing rho updates (Boyd et al. 2011, Sec. 3.4.1): double/halve rho when the primal and dual residuals differ by more than 10x (mu = 10, tau = 2), rescaling the scaled duals accordingly. This is the default solving mode. Ignored when accelerate=True.
warm_start (dict or None, optional) – Initial solver state to start from, e.g. the state returned by a previous call (keys R, Z, U, W, rho). Defaults to a cold start (zeros). Warm starting from a nearby solution – as in the pathwise cross-validation of select_lambda_cv() – sharply reduces the iteration count.
return_state (bool, optional) – If True, also return the final solver state dict (for warm-starting a subsequent solve). Default False.

Returns:

numpy.ndarray, shape (n, m) – The estimated weight matrix Theta (the L1 Z-block, hence exactly sparse). X @ Theta is the fitted treated-outcome matrix.
state (dict, optional) – Returned only when return_state=True: the final R, Z, U, W and rho, suitable as a warm_start for a related problem.

Notes

Termination uses the primal/dual residual criterion of Boyd et al. (2011, Sec. 3.3): stop when ||r||_2 <= eps_pri and ||s||_2 <= eps_dual with primal residual r = (X Theta - R, Theta - Z), dual residual s = rho * A'(z^k - z^{k-1}), and tolerances eps_pri = sqrt((T+n) m) abstol + reltol max(||A Theta||, ||B z||) and eps_dual = sqrt(n m) abstol + reltol ||A' (rho u)||. The accelerated restart uses the combined residual c_k = rho (||u^k - uhat^k||^2 + ||z^k - zhat^k||^2).

mlsynth.utils.msqrt_helpers.optimization.fit_msqrt_weights(Y: ndarray, X: ndarray, lambd: float, *, tol: float = 0.01, solver: str = 'admm', **solver_kwargs) → Tuple[ndarray, ndarray, ndarray]#

Solve eq. (5) for the donor-weight matrix and report fit + sparsity.

Parameters:

Y (numpy.ndarray, shape (T, m)) – Treated-unit outcomes, time-major.
X (numpy.ndarray, shape (T, n)) – Donor outcomes, time-major.
lambd (float) – Non-negative L1 penalty lambda.
tol (float, optional) – Magnitude below which a weight is treated as zero when counting active donors (default 1e-2). Does not affect the solve.
solver ({“admm”, “cvxpy”}, optional) – Which backend to use. "admm" (default) is the fast two-split solver fit_msqrt_admm(); "cvxpy" is the slow conic reference _fit_msqrt_cvxpy().
**solver_kwargs – Extra keyword arguments forwarded to fit_msqrt_admm() (ignored for the cvxpy backend), e.g. rho, max_iter, abstol, reltol.

Returns:

Theta_hat (numpy.ndarray, shape (n, m)) – Estimated weight matrix.
Y_hat (numpy.ndarray, shape (T, m)) – Fitted treated outcomes X @ Theta_hat.
nonzero_per_col (numpy.ndarray, shape (m,)) – Number of active donors (|Theta_ij| > tol) for each treated unit.

Raises:

ValueError – If solver is not "admm" or "cvxpy".

mlsynth.utils.msqrt_helpers.optimization.select_lambda_cv(Y_pre: ndarray, X_pre: ndarray, lambdas: Sequence[float], *, initial_train: int | None = None, val_window: int | None = None, step: int | None = None, n_folds: int | None = None, solver: str = 'admm') → float#

Select the L1 penalty by rolling-origin cross-validation on the pre-period.

Uses an expanding training window: for each fold the model is fit on Y_pre[:train_end] and scored by mean squared error on the next val_window periods. The penalty minimising the mean validation MSE over all folds is returned.

Parameters:

Y_pre (numpy.ndarray, shape (T0, m)) – Pre-treatment treated-unit outcomes, time-major.
X_pre (numpy.ndarray, shape (T0, n)) – Pre-treatment donor outcomes, time-major.
lambdas (sequence of float) – Candidate penalty values to search.
initial_train, val_window, step (int or None, optional) – Rolling-origin schedule overrides; see _cv_schedule() for the adaptive defaults used when these are None.
n_folds (int or None, optional) – If given, use at most this many folds (the earliest ones).
solver ({“admm”, “cvxpy”}, optional) – Backend forwarded to the solver (default "admm").

Returns:

float – The cross-validated penalty. Falls back to min(lambdas) when the pre-period is too short to form any fold.

Notes

With the ADMM backend the search is pathwise: within each fold the candidate penalties are visited in descending order and each solve is warm-started from the previous (larger-lambda, sparser) solution. Since neighbouring penalties have nearby solutions, this cuts the per-solve iteration count substantially relative to cold starts, without changing the selected penalty.

Run loop: CV / fit, ATT and per-period/per-unit effects, donor-weight assembly, and the optional scpi prediction intervals.

Orchestration for the MSQRT estimator (Shen, Song & Abadie 2025).

Coordinates the steps end to end: choose the L1 penalty (or take the supplied one), fit the donor-weight matrix by Multivariate Square-root Lasso, build the synthetic counterfactual, form the ATT and per-period/per-unit effects, optionally attach a block-conformal band, and assemble MSQRTResults.

mlsynth.utils.msqrt_helpers.pipeline.run_msqrt(inputs: MSQRTInputs, *, lambda_: float | None = None, n_lambda: int = 15, lambdas: Sequence[float] | None = None, cv_initial_train: int | None = None, cv_val_window: int | None = None, cv_step: int | None = None, cv_folds: int | None = None, inference: bool = False, alpha: float = 0.1, time_dependence: str = 'iid') → MSQRTResults#

Fit MSQRT and assemble MSQRTResults.

Parameters:

inputs (MSQRTInputs)
lambda_ (float, optional) – Fixed L1 penalty. If None, chosen by rolling-origin CV.
n_lambda (int) – Size of the default log-spaced CV grid (used when lambdas is None).
lambdas (sequence, optional) – Explicit CV grid (overrides n_lambda).
cv_* (int, optional) – Rolling-origin CV schedule overrides; adaptive defaults otherwise.
inference (bool) – Attach CFPT/scpi prediction intervals (Cattaneo, Feng, Palomba & Titiunik 2025) on all four predictands + simultaneous bands. For MSQRT only the out-of-sample (post-treatment noise) error is modelled – see mlsynth.utils.scpi_helpers. Default False.
alpha (float) – Total miscoverage level for the intervals.
time_dependence ({“iid”, “general”}) – Time-averaging assumption for the time-averaged predictands.

The data-generating process: AR(1) donors and a sparse convex donor-weight matrix (the Shen-Song-Abadie regime), wrapped into a long panel with an injected treatment effect for examples and tests.

Data-generating processes for MSQRT examples, tests and replication.

Implements the regime of Shen, Song & Abadie (2025): many disaggregated donor units following an AR(1) process, with several treated units adopting at the same period (a block design), each treated unit a sparse, convex combination of the donors. These are the building blocks of the paper’s Monte-Carlo study (see mlsynth.utils.msqrt_helpers.replication); simulate_msqrt_panel() wraps them into a tidy long panel – with an injected treatment effect – ready for mlsynth.MSQRT.

mlsynth.utils.msqrt_helpers.simulation.ar1_panel(n: int, T: int, rng: Generator, *, burn: int = 50) → ndarray#

Simulate the paper’s AR(1) outcome panel, time-major.

Each unit follows Y_{i,t} = 0.1 * c_i + 0.9 * Y_{i,t-1} + Z_{i,t} with c_i in {1, ..., 10} (cycled across units) and Z_{i,t} ~ N(0, 1), so the stationary mean of unit i is c_i and the series is persistent.

Parameters:

n (int) – Number of units.
T (int) – Number of retained periods (after burn-in).
rng (numpy.random.Generator) – Random generator (drives the innovations).
burn (int, optional) – Burn-in periods discarded so the panel starts near stationarity (default 50).

Returns:

numpy.ndarray, shape (T, n) – Time-major outcome matrix (rows are periods, columns are units).

mlsynth.utils.msqrt_helpers.simulation.random_theta(n: int, m: int, s: int, rng: Generator) → ndarray#

Random sparse weight matrix with s total non-zeros, columns summing to 1.

The s non-zeros are spread across the m columns as evenly as possible (at least one per column); within a column the non-zero weights are drawn uniformly and normalised to sum to one (a convex combination of donors).

Parameters:

n (int) – Number of donors (rows of Theta).
m (int) – Number of treated units (columns of Theta).
s (int) – Total number of non-zero entries across the whole matrix.
rng (numpy.random.Generator) – Random generator.

Returns:

numpy.ndarray, shape (n, m) – Sparse, column-stochastic weight matrix.

mlsynth.utils.msqrt_helpers.simulation.simulate_msqrt_panel(n_treated: int = 5, n_control: int = 40, T0: int = 100, n_post: int = 10, nonzeros_per_unit: int = 5, att: float = 2.0, noise: float = 0.5, seed: int = 0) → DataFrame#

Simulate a block, multiple-treated panel in the Shen-Song-Abadie regime.

Donors follow the AR(1) process of ar1_panel(); each treated unit is a sparse convex combination of the donors (random_theta()) plus Gaussian noise – i.e. Y_treated = X @ Theta + E – so the synthetic-control model is well specified. A constant treatment effect att is then added to the treated units’ post-treatment cells; all treated units adopt at period T0 (a block design).

Parameters:

n_treated, n_control (int) – Counts of treated and never-treated (donor) units.
T0, n_post (int) – Pre- and post-treatment period counts (common adoption at T0).
nonzeros_per_unit (int) – Approximate number of active donors per treated unit (controls the sparsity of Theta); capped at n_control.
att (float) – Constant additive treatment effect applied to treated post-period cells.
noise (float) – Standard deviation of the idiosyncratic error E.
seed (int) – RNG seed.

Returns:

pandas.DataFrame – Long panel with columns unit (c* donors, t* treated), time, Y, treated.

Replication of the paper’s Monte-Carlo study (Section 6) through the public MSQRT.fit API – the two data-generating settings, the ATT-bias and RMSE metrics, and the PAPER / DEMO presets.

Replication of the Shen, Song & Abadie (2025) Monte-Carlo study (Section 6).

Shen, Z., Song, X. & Abadie, A. (2025), Efficiently Learning Synthetic Control Models for High-dimensional Disaggregated Data (arXiv:2510.22828).

This reproduces the simulation design for the MSC estimator (the paper’s name for the Multivariate Square-root Lasso SC that mlsynth ships as MSQRT):

Design#

T0 pre-treatment periods, T1 post-treatment periods, n controls, m treated units (swept over a grid), s total non-zeros in Theta.
Controls X follow an AR(1): Y_{i,t} = 0.1*c_i + 0.9*Y_{i,t-1} + Z, c_i in {1..10}, Z ~ N(0, 1).
Setting (1) – treated units generated by the same AR(1) process (no true sparse SC representation; misspecified).
Setting (2) – treated units generated by Y = X Theta + E with a random s-sparse Theta whose columns sum to 1 and E ~ N(0, sigma^2) (well-specified sparse SC).

No treatment effect is injected, so the true ATT delta = 0 and the reported ATT bias delta_hat - delta is the mean counterfactual-prediction error at the first post-period (paper eq. 1). The RMSE is the root-mean-square error of the predicted Y(0) over all T1 post-periods (paper Table 1).

The paper’s headline parameters (PAPER) are n=400, T0=100, T1=10, s=1000, m in {50,100,150,200,250,300,350,400}, 500 reps. Those nuclear-norm programs are large; DEMO is a proportionally scaled preset (same s/n, m/n and s/m ratios) that runs quickly and directionally reproduces the findings.

class mlsynth.utils.msqrt_helpers.replication.SimConfig(n: 'int', T0: 'int', T1: 'int', s: 'int', sigma: 'float', m_grid: 'List[int]', n_reps: 'int', lambda_grid: 'List[float]' = <factory>)#

T0: int#

T1: int#

lambda_grid: List[float]#

m_grid: List[int]#

n: int#

n_reps: int#

s: int#

sigma: float#

mlsynth.utils.msqrt_helpers.replication.run_msqrt_simulation(cfg: SimConfig = SimConfig(n=150, T0=100, T1=10, s=375, sigma=0.5, m_grid=[20, 40, 60, 80, 100, 120, 150], n_reps=10, lambda_grid=[np.float64(0.01), np.float64(0.039810717055349734), np.float64(0.15848931924611134), np.float64(0.630957344480193), np.float64(2.5118864315095797), np.float64(10.0)]), *, settings=(1, 2), seed: int = 0, lambda_override: float | None = None, verbose: bool = True) → Dict#

Run the MSC Monte-Carlo and return summary tables per setting.

For each setting the L1 penalty is pre-selected once (as in the paper) by rolling-origin CV on a representative draw, then reused across all m and replications. Pass lambda_override to fix the penalty and skip CV (much faster – CV is many extra solves).

Returns:: dict – {setting: list of {m, rmse_mean, rmse_std, bias_mean, bias_std}}.

mlsynth.utils.msqrt_helpers.replication.simulate_shen2025(setting: int, m: int, *, n: int, T0: int, T1: int, s: int, sigma: float, rng: Generator)#: One Monte-Carlo draw. Returns (X, Y_treated) time-major over T0+T1.

mlsynth.utils.msqrt_helpers.replication.to_long_df(X: ndarray, Y: ndarray, T0: int) → DataFrame#

Stack control matrix X (T,n) and treated matrix Y (T,m) into the long panel MSQRT expects: columns unit, time, Y, treated.

Treated units (one block, common adoption) switch on at period T0; no treatment effect is injected, so the observed outcome is Y(0).

Uncertainty quantification is provided by the shared CFPT/scpi module (Cattaneo, Feng, Palomba & Titiunik 2025), which any synthetic-control estimator can reuse.

CFPT/scpi prediction-interval construction.

Out-of-sample bands (Section 4.2 of Cattaneo, Feng, Palomba & Titiunik 2025) for the four causal predictands and their simultaneous (uniform-over-periods) versions, plus the in-sample simulation bound (Section 4.1) as a reusable helper for quadratic-loss SC estimators.

All bands are returned as SCPIBand objects and assembled into a SCPIResults. The interval for a predictand tau is

[ tau_hat - Mbar_in - Mbar_out , tau_hat - M_in - M_out ].

When the in-sample error is not modelled (in_sample_included=False, e.g. MSQRT), the in-sample contribution is (0, 0) and the full miscoverage budget alpha is spent on the out-of-sample band.

mlsynth.utils.scpi_helpers.intervals.in_sample_band_gaussian(donor_post: ndarray, Q_hat: ndarray, Sigma_hat: ndarray, *, alpha_in: float = 0.05, n_sim: int = 1000, random_state: int = 0) → Tuple[float, float]#

In-sample error band for a TSUS counterfactual, unconstrained weights.

Implements Cattaneo et al.’s simulation (their eq. 4.4) for the special case of an unconstrained donor regression, where the feasible set Delta* is all of R^J0 and the inf/sup of the donor projection over the quadratic constraint delta' Q_hat delta - 2 G*' delta <= 0 has the closed form

a’ Q_hat^{-1} G* ± sqrt( (G*’ Q_hat^{-1} G*) (a’ Q_hat^{-1} a) ),

with a = donor_post. Draws G* ~ N(0, Sigma_hat) and returns the (alpha_in/2, 1 - alpha_in/2) quantiles (M_in, Mbar_in).

Parameters:

donor_post (np.ndarray) – (J0,) donor outcomes at the post-treatment period.
Q_hat (np.ndarray) – (J0, J0) donor Gram matrix Y_N' Y_N over the pre-period.
Sigma_hat (np.ndarray) – (J0, J0) estimate of V[gamma_hat | H].

mlsynth.utils.scpi_helpers.intervals.out_of_sample_intervals(effects: ndarray, pre_residuals: ndarray, unit_names: Sequence, period_labels: Sequence, *, alpha: float = 0.1, time_dependence: str = 'iid', assume_zero_mean: bool = False, unit_weights: Dict | None = None) → SCPIResults#

Build the full CFPT out-of-sample interval family for a block design.

Parameters:

effects (np.ndarray) – (L, m) post-period predicted effects tau_hat_ik (rows = post-periods, columns = treated units).
pre_residuals (np.ndarray) – (T0, m) pre-period SC residuals (one column per treated unit).
unit_names (sequence) – Length-m treated-unit identifiers.
period_labels (sequence) – Length-L post-period labels.
alpha (float) – Total (and, here, out-of-sample) miscoverage level.
time_dependence ({“iid”, “general”}) – Time-averaging bound for the time-averaged predictands (TAUS, TAUA). "iid" shrinks the band by sqrt(L) (residuals independent over time); "general" makes no dependence assumption (no shrinkage).
assume_zero_mean (bool) – Pass-through to the conditional-moment estimator.

Conditional-moment estimation for the out-of-sample CFPT band.

The out-of-sample error of an SC prediction is the post-treatment residual u_i(Ti+k). Cattaneo et al. bound it with a sub-Gaussian concentration inequality that needs two conditional moments per treated unit: the mean E[u_it | H] and a variance proxy sigma_it^2 (the optimal sub-Gaussian parameter, which equals V[u_it | H] under strict sub-Gaussianity). Both are estimated from the pre-treatment residuals of the synthetic-control fit.

For the block, short-post settings MSQRT targets, a per-unit constant proxy (the residual mean and variance over the pre-period) is the natural default; assume_zero_mean imposes the common E[u|H] = 0 simplification.

mlsynth.utils.scpi_helpers.moments.conditional_moments(pre_residuals: ndarray, *, assume_zero_mean: bool = False) → Tuple[float, float]#

Per-unit (mean, variance-proxy) from a vector of pre-period residuals.

Parameters:

pre_residuals (np.ndarray) – Synthetic-control residuals u_hat_it over the pre-treatment periods for one treated unit.
assume_zero_mean (bool) – If True, fix E[u|H] = 0 and estimate the variance proxy as the mean squared residual; otherwise demean first.

mlsynth.utils.scpi_helpers.moments.unit_moments(pre_residuals: ndarray, unit_names, *, assume_zero_mean: bool = False) → Tuple[Dict, Dict]#

Stack conditional_moments() over the columns of a residual matrix.

Parameters:

pre_residuals (np.ndarray) – (T0, m) pre-period residuals (one column per treated unit).
unit_names (sequence) – Length-m treated-unit identifiers (the dict keys).

Returns:

(mu, sigma) ((dict, dict)) – {unit: mean} and {unit: sigma} (standard deviation, not variance).

Result containers for the CFPT/scpi prediction intervals.

Cattaneo, Feng, Palomba & Titiunik (2025), “Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption” (arXiv:2210.05026).

The method quantifies the uncertainty of a synthetic-control prediction tau_hat by separately bounding two error sources and combining them by a union bound,

I(tau) = [ tau_hat - Mbar_in - Mbar_out , tau_hat - M_in - M_out ],

where [M_in, Mbar_in] covers the in-sample error (SC-weight estimation) and [M_out, Mbar_out] covers the out-of-sample error (post-treatment sampling noise). Four causal predictands are supported – TSUS (unit x period), TAUS (unit, time-averaged), TSUA (period, unit-averaged) and TAUA (overall) – plus simultaneous (uniform-over-periods) bands.

class mlsynth.utils.scpi_helpers.structures.SCPIBand(predictand: str, label: Any, point: float, lower: float, upper: float, out_sample: Tuple[float, float], in_sample: Tuple[float, float] = (0.0, 0.0))#

Bases: object

One prediction interval for a single predictand value.

predictand#

One of "TSUS", "TAUS", "TSUA", "TAUA".

Type:: str

label#

Identifier of the value: a unit (TAUS), a post-period (TSUA), a (unit, period) pair (TSUS), or None (TAUA).

Type:: Any

point#

The point prediction tau_hat.

Type:: float

lower, upper

Prediction-interval endpoints.

Type:: float

out_sample#

The out-of-sample contribution (M_out, Mbar_out).

Type:: (float, float)

in_sample#

The in-sample contribution (M_in, Mbar_in) ((0, 0) when the in-sample error is not modelled, e.g. for MSQRT).

Type:: (float, float)

property ci: Tuple[float, float]#

in_sample: Tuple[float, float] = (0.0, 0.0)#

label: Any#

lower: float#

out_sample: Tuple[float, float]#

point: float#

predictand: str#

property significant: bool#: True when the interval excludes zero.

upper: float#

class mlsynth.utils.scpi_helpers.structures.SCPIResults(method: str, alpha: float, alpha_in: float, alpha_out: float, in_sample_included: bool, taua: ~mlsynth.utils.scpi_helpers.structures.SCPIBand, tsua: ~typing.Dict[~typing.Any, ~mlsynth.utils.scpi_helpers.structures.SCPIBand], taus: ~typing.Dict[~typing.Any, ~mlsynth.utils.scpi_helpers.structures.SCPIBand], tsus: ~typing.Dict[~typing.Tuple[~typing.Any, ~typing.Any], ~mlsynth.utils.scpi_helpers.structures.SCPIBand], simultaneous: ~typing.Dict[~typing.Any, ~typing.List[~mlsynth.utils.scpi_helpers.structures.SCPIBand]], sigma: ~typing.Dict[~typing.Any, float], time_dependence: str = 'iid', metadata: ~typing.Dict[str, ~typing.Any] = <factory>, taua_weighted: ~mlsynth.utils.scpi_helpers.structures.SCPIBand | None = None, tsua_weighted: ~typing.Dict[~typing.Any, ~mlsynth.utils.scpi_helpers.structures.SCPIBand] | None = None)#

Bases: object

Full CFPT/scpi uncertainty quantification for a set of predictands.

method#

"cfpt_scpi".

Type:: str

alpha, alpha_in, alpha_out

Total target miscoverage and its in/out split. When the in-sample error is omitted, alpha_out == alpha and alpha_in == 0.

Type:: float

in_sample_included#

Whether the in-sample error was modelled.

Type:: bool

taua#

Overall average effect (the ATT) with its band.

Type:: SCPIBand

tsua#

{period_label: SCPIBand} – unit-averaged effect each post-period.

Type:: dict

taus#

{unit: SCPIBand} – time-averaged effect for each treated unit.

Type:: dict

tsus#

{(unit, period_label): SCPIBand} – per unit, per period.

Type:: dict

simultaneous#

{unit: [SCPIBand, ...]} – TSUS bands widened for joint coverage across all post-periods of that unit.

Type:: dict

sigma#

{unit: sigma} – per-unit sub-Gaussian variance-proxy estimate.

Type:: dict

time_dependence#

"iid" or "general" (controls the time-averaging bound).

Type:: str

alpha: float#

alpha_in: float#

alpha_out: float#

property att: float#

property ci: Tuple[float, float]#: The overall-ATT (TAUA) interval.

in_sample_included: bool#

metadata: Dict[str, Any]#

method: str#

sigma: Dict[Any, float]#

simultaneous: Dict[Any, List[SCPIBand]]#

taua: SCPIBand#

taua_weighted: SCPIBand | None = None#

taus: Dict[Any, SCPIBand]#

time_dependence: str = 'iid'#

tsua: Dict[Any, SCPIBand]#

tsua_weighted: Dict[Any, SCPIBand] | None = None#

tsus: Dict[Tuple[Any, Any], SCPIBand]#

Multivariate Square-root Lasso Synthetic Control (MSQRT)

Contents

Multivariate Square-root Lasso Synthetic Control (MSQRT)#

When to Use This Estimator#

Do not use MSQRT when#

Notation#

The estimator#

Internal solver (two-split ADMM)#

Tuning the penalty#

Assumptions and remarks#

Causal use#

Inference#

Size-weighted aggregate#

Example#

Verification#

Core API#

Configuration#

Result Containers#

Helper Modules#

Design#