Multi-Level Synthetic Control (mlSC)

Multi-Level Synthetic Control (mlSC)#

Overview#

Multi-Level Synthetic Control (mlSC) is the estimator proposed by Bottmer (2025) for panels with two levels of aggregation: an aggregate level at which treatment is assigned (e.g. states), and a finer disaggregate level at which outcomes are also observed (e.g. counties within those states). Such panels are common in policy evaluation — state-wide cigarette and minimum-wage policies, for instance, are routinely studied with county-level outcome data — and they raise a question that classical SC does not address: at what level of aggregation should the estimator operate?

Existing practice spans the full range: some studies aggregate every- thing to the state level and apply classical SC; others enlarge the donor pool with all disaggregated control units; still others disaggregate both treated and control sides. Each choice has a different bias / noise profile. The mlSC estimator turns the aggregation choice into a data-driven regularization problem: the disaggregate weights are allowed to vary freely, but a ridge-type penalty shrinks them toward the classical-SC solution, with the penalty strength selected from data.

Two structural properties distinguish mlSC from the rest of the mlsynth toolkit. First, it operates on two long-form DataFrames, one at the aggregate level (df_agg) and one at the disaggregate level (df_disagg), with an agg_id column on df_disagg mapping each disaggregate unit to its parent aggregate. Every other estimator in the package takes a single panel. Second, the entire spectrum from classical SC to fully disaggregated SC is recovered as limiting cases of a single penalty parameter \(\lambda\) — at \(\lambda \to \infty\) mlSC reduces to the classical SC of Abadie, Diamond, and Hainmueller (2010); at \(\lambda = 0\) it recovers the fully-disaggregated control SC (dGSC-AD in the paper’s notation); intermediate values pick a data-driven mixture.

When to use mlSC – and when not to#

Reach for mlSC when:

You have disaggregate data underneath an aggregate-level policy: county outcomes beneath a state-wide tax change, store-level outcomes beneath a chain-wide pricing change, household outcomes beneath a regional-government program. The estimand stays at the aggregate level; the disaggregate data is information to be exploited.
The canonical SC pre-fit at the aggregate level is mediocre – the small number of aggregates makes the hull thin and the pre-period RMSE elevated. Disaggregating the donor pool widens the hull by an order of magnitude (counties vs states), and the penalty keeps the wider hull from degenerating into a non-unique solution.
You want a single data-driven aggregation choice rather than the manual “do I use state or county donors?” call. The heuristic \(\widehat{\lambda}\) and the (forthcoming) cross-validation-over-time selector turn that judgment into a tunable parameter.
Pre-period \(T_0\) is short enough that classical SC overfits but long enough that the heuristic-variance decomposition is well-identified (a dozen pre-periods or more, per Bottmer’s Section 6 calibration).

Do not use mlSC when:

You only have aggregate-level data. mlSC requires df_disagg with at least a handful of disaggregate units per donor aggregate. Without it, fall back to canonical canonical SCM / Two-Step Synthetic Control / Forward Difference-in-Differences (FDID).
Treatment is assigned at the disaggregate level. The paper enforces aggregate-level assignment (A2); if the policy hits a single county rather than a whole state, you want a disaggregate-target estimator (canonical SCM, MicroSynth (User-Level Balancing SC) for individual-level treatment with covariate balancing).
The aggregate outcome isn’t a linear function of disaggregate outcomes. Ratios, order statistics, and log-of-aggregate outcomes break A1. Either re-derive the outcome so the aggregation is exact, or move to canonical SC.
The treated aggregate is structurally outside the disaggregate hull. Even at \(\lambda = 0\) the pre-period RMSE stays large – the wider hull doesn’t help because the treated unit’s level is unreachable by any convex combination of donor disaggregates. Use Imperfect Synthetic Controls (ISCM), whose A2(b) mechanism identifies the effect through donors that use the treated aggregate as a positive-weight donor.
You need within-aggregate treatment effect heterogeneity. mlSC targets the aggregate ATT. The paper notes (Introduction) that disaggregating the treated unit (dGSC-DA / dGSC) is the door to heterogeneity but typically worsens out-of-sample performance for the aggregate-level effect. For genuinely heterogeneous-effects questions, run county-by-county canonical SC instead.
Outcomes are extremely noisy at the disaggregate level. When the within-state idiosyncratic shock variance dominates the total variance, the disaggregate data is more noise than signal and the heuristic pushes \(\lambda \to \infty\) (collapsing mlSC to classical SC). In that regime use canonical SC directly to skip the extra bookkeeping.
Staggered adoption across multiple aggregates. Single-cohort is the paper’s main scope; for staggered designs use FECT or Synthetic Difference-in-Differences (SDID).
Spillovers within a donor aggregate. A3 fails if counties within a donor state influence each other in a way the aggregation cannot wash out. Use Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD).
Distributional questions (Lorenz curves, QTEs). mlSC targets the mean aggregate ATT; for distributional effects use Distributional Synthetic Control (DSC).
Continuous treatment. Use Continuous-Treatment Synthetic Control (CTSC).
You need a sparse, hand-interpretable single weight vector for policy storytelling. mlSC’s disaggregate weight vector \(\mathbf{w}\) is intrinsically a many-county object; you can report the implied aggregate weights results.aggregate_donor_weights (Bottmer’s \(w_s = \sum_{c} w_{sc}\)) but those are derived rather than directly optimised, and at small \(\lambda\) they will not be sparse. If the headline story has to be “California ≈ 0.4 Utah + 0.3 Montana + 0.2 Nevada”, canonical SC remains the cleaner deliverable.

Notation#

mlSC works on a two-level panel, so its symbols extend the canon with one extra index for the disaggregate (sub-unit) level. Let \(j = 1\) denote the treated aggregate unit (the single state where the policy lands), with all aggregates \(\mathcal{N} \coloneqq \{1, \dots, N\}\) and the donor pool of control aggregates \(\mathcal{N}_0 \coloneqq \mathcal{N} \setminus \{1\}\). We index a generic control aggregate by \(s \in \mathcal{N}_0\); each aggregate \(s\) contains \(C_s\) disaggregate sub-units (e.g. counties), indexed \(c = 1, \dots, C_s\). Time runs over \(t \in \mathcal{T} \coloneqq \{1, \dots, T\}\), 1-indexed; the intervention takes effect after period \(T_0\), splitting \(\mathcal{T}\) into the pre-period \(\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}\) (of length \(T_0\)) and the post-period \(\mathcal{T}_2 \coloneqq \{t \in \mathcal{T} : t > T_0\}\), and is absorbing thereafter (treatment never turns off).

Outcomes carry the sub-unit index: \(y_{sct}\) is the disaggregate outcome of sub-unit \(c\) in aggregate \(s\) at time \(t\), and \(y_{st}\) the aggregate outcome. The treated aggregate’s series is \(\mathbf{y}_1\) with entries \(y_{1t}\). The disaggregate donor pool stacks every sub-unit of every control aggregate into the donor matrix \(\mathbf{Y}_0 \in \mathbb{R}^{T \times N_0}\) with \(N_0 \coloneqq \sum_{s \in \mathcal{N}_0} C_s\) columns (one per disaggregate donor). Population aggregation weights \(v_{sc}\) sum to one within each aggregate. Disaggregate donor weights are \(\mathbf{w} \in \mathbb{R}^{N_0}\) with entries \(w_{sc}\), constrained to the unit simplex \(\Delta^{N_0} \coloneqq \{\mathbf{w} \in \mathbb{R}_{\ge 0}^{N_0} : \|\mathbf{w}\|_1 = 1\}\); the optimiser is \(\mathbf{w}^\ast\), and \(w_s \coloneqq \sum_{c} w_{sc}\) is the implied aggregate weight on control aggregate \(s\). The penalty strength is \(\lambda \ge 0\). The synthetic counterfactual is \(\widehat{\mathbf{y}}_1 \coloneqq \mathbf{Y}_0\,\mathbf{w}^\ast\) with entries \(\widehat{y}_{1t}\), the per-period effect is \(\tau_t \coloneqq y_{1t} - \widehat{y}_{1t}\), and the ATT is \(\widehat{\tau} \coloneqq |\mathcal{T}_2|^{-1} \sum_{t \in \mathcal{T}_2} \tau_t\).

Assumptions (Bottmer 2025)#

mlSC inherits the canonical SC identification stack and adds the structural conditions needed for the hierarchical penalty to make sense – and for the heuristic \(\lambda\) to be calibrated. The paper’s formal assumptions:

A1 (linear aggregation). The aggregate outcome is a known weighted average of its disaggregate components, \(y_{st} = \sum_{c = 1}^{C_s} v_{sc}\, y_{sct}\) with \(\sum_{c} v_{sc} = 1\) for every aggregate. The weights \(v_{sc}\) are observable to the analyst (uniform by default; population shares are the canonical alternative).

Remark. This is what makes the estimand at the aggregate level a clean function of the disaggregate potential outcomes – the disaggregate weights \(\mathbf{w}\) can target the aggregate path \(\mathbf{y}_1\) exactly because the two levels are tied by a known linear map. Outcomes that aggregate non-linearly (a rate, a log, an order statistic) break the link and the disaggregate data no longer reconstructs the aggregate target.

A2 (binary, sharp, absorbing treatment at the aggregate level). A single aggregate unit (\(j = 1\)) is treated; the treatment indicator \(d_{sct}\) is 1 iff \(s = 1\) and \(t > T_0\), and once on it stays on. Equivalently defined at the aggregate level. Multiple treated aggregates are out of scope of the paper’s main theory (the paper discusses staggered designs only as a future direction; mlsynth’s FECT / Synthetic Difference-in-Differences (SDID) are the staggered-aware alternatives).

Remark. The single-cohort, never-reversed design is what lets the whole pre-period \(\mathcal{T}_1\) serve as untreated training data for one target series \(\mathbf{y}_1\). Staggered or reversible treatment would contaminate the donor pool with treated outcomes; the config validator therefore enforces a single cohort by construction.

A3 (SUTVA at the disaggregate level). No interference between disaggregate units. Note this is stronger than canonical-SC SUTVA which is at the aggregate level: mlSC pulls within-state variation directly into the weight fit, so an exposed county influencing a non-exposed county within the same donor state breaks identification at the level mlSC operates on.

Remark. Because the weights are fit on sub-unit series rather than the aggregate alone, contamination at the sub-unit level enters the fit directly. A donor sub-unit that responds to the treated aggregate’s shock (a shared labour or media market) carries intervention signal into \(\mathbf{Y}_0\), biasing the counterfactual at the very level mlSC exploits.

A4 (hierarchical latent-factor outcome model – Section 6.1, Appendix G). The untreated disaggregate outcome decomposes as

\[y_{sct}^{N} \;=\; \underbrace{\alpha_s' \beta_t}_{L_{st}^{\text{agg}}} \;+\; \underbrace{\eta_{sc}' \beta_t}_{L_{sct}^{\text{disagg}}} \;+\; \varepsilon_{sct}, \qquad \varepsilon_{sct} \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \sigma_\varepsilon^2),\]

with a shared set of time factors \(\beta_t\), aggregate-level loadings \(\alpha_s\), and within-aggregate disaggregate deviations \(\eta_{sc}\).

Remark. This is what licenses the variance-decomposition recipe (Appendix G) feeding the heuristic \(\widehat{\lambda} = 2 \widehat{\sigma}_\varepsilon^2 / \widehat{\sigma}_y^2\). The model also implies the four-way MSE decomposition the paper uses to characterise the central trade-off (oracle bias + restriction bias + estimation error + post-period noise). When the disaggregate variation departs from this clean state-effect-plus-iid-shock structure, the heuristic mis-prices the flexibility.

A5 (feasibility / convex hull for dGSC-AD). The treated aggregate’s pre-period trajectory lies in the convex hull of the disaggregate donor pool \(\{y_{sct}\}_{s \in \mathcal{N}_0, c}\). With disaggregation, the donor pool is \(N_0 = \sum_{s \in \mathcal{N}_0} C_s\) units wide – typically an order of magnitude larger than the aggregate-level hull (only \(|\mathcal{N}_0|\) control aggregates wide) – so this is easier to satisfy than canonical SC’s hull condition, which is the main empirical motivation for the estimator.

Remark. The penalty term is what stops the expanded hull from degenerating into a non-unique solution at zero pre-period RMSE: a wide disaggregate pool can fit the pre-period in many ways, and the shrinkage toward population-share-times-aggregate-weight selects among them. When even the \(\lambda = 0\) fit leaves a large pre-period RMSE, the treated aggregate is structurally outside the hull and mlSC cannot help (see the diagnostics below).

Headline theoretical finding (paper Section 1, MSE decomposition). Under A1-A4, the post-treatment MSE of any dGSC estimator decomposes as

\[\mathrm{MSE}_{\text{post}} \;=\; \underbrace{\text{oracle bias}}_{\text{common to all SC}} \;+\; \underbrace{\text{restriction bias}}_{\text{from forced aggregation}} \;+\; \underbrace{\text{estimation error}}_{\text{weight variance}} \;+\; \underbrace{\text{post-period noise}}_{\text{irreducible}}.\]

The classical SC at \(\lambda \to \infty\) minimises estimation error but pays the restriction bias; the fully disaggregated SC at \(\lambda = 0\) removes the restriction bias but inflates estimation error from the wider donor pool. The mlSC penalty traces the bias-variance frontier between the two, with the data-driven \(\widehat{\lambda}\) picking the sweet spot. Bottmer emphasises this is not a standard bias-variance trade- off – it is a flexibility-vs-noise-sensitivity trade-off, which is why the heuristic scales as \(\sigma_\varepsilon^2 / \sigma_y^2\) (the disaggregate-noise share of total variance).

When the assumptions bind: practical diagnostics#

Linear aggregation with known weights (A1). The aggregate outcome must be a known linear function of the disaggregate outcomes. Some outcomes don’t aggregate cleanly – a state-level unemployment rate, for instance, is a ratio whose state-level value is not the population-weighted mean of county-level ratios.

Plausibly violated when the outcome is a ratio (rate, share, log-of-aggregate), an order statistic (median income), or a normalised index. Diagnostic: compute \(\sum_{c} v_{sc} y_{sct}\) directly from df_disagg and compare to the corresponding row of df_agg; if the two disagree by more than rounding noise for any (s, t), A1 is failing. Re-derive the disaggregate outcome so that the aggregation is exact (e.g. work with county-level employment counts, not unemployment rates), or move to canonical SC on the aggregate panel alone.
Binary, sharp, absorbing treatment (A2). Multiple treated aggregates, treatment that turns off, or continuous dose break the identification.

Plausibly violated when the policy is rescinded mid- panel, when several states adopted at different times, or when the treatment intensity varies. Diagnostic: the config validator enforces single-cohort by construction; if you have to silently coerce a staggered-adoption panel into a single cohort to get past validation, that is a structural warning. Use FECT / Synthetic Difference-in-Differences (SDID) for staggered adoption, Continuous-Treatment Synthetic Control (CTSC) for continuous dose.
SUTVA at the disaggregate level (A3). Disaggregate units within donor states must not interfere with each other in a way that the aggregation washes out.

Plausibly violated when counties within a donor state share a labour market, a school district, or a media market that an exposed county can influence. Diagnostic: split each donor state into core / periphery counties by a geographic or economic criterion, refit, and check whether the implied \(w_{sc}\) and pre-RMSE move substantially. If they do, the within-state interference is doing real work and you need a spillover-aware setup (Spillover-Aware Synthetic Control (SPILLSYNTH), Spatial Synthetic Difference-in-Differences (SpSyDiD)) which dispenses with the disaggregate SUTVA.
Hierarchical latent-factor outcome model (A4). The Appendix-G variance decomposition that feeds \(\widehat{\lambda}\) is derived under a specific state-effect + county-within-state + iid shock structure. If the disaggregate variation is not well-described by a clean state-level component plus a uniform-variance idiosyncratic shock, the heuristic mis-prices the flexibility-vs-noise trade-off.

Plausibly violated when some counties are systematically noisier than others (urban vs rural variance pattern), or when within-state cross-county correlation is high (a state-wide industry shock hitting every county similarly). Diagnostic: read results.design.sigma_eps2 and results.design.sigma_y2 and refit with lambda_est="fixed" swept over a grid spanning 2 * sigma_eps2 / sigma_y2 by an order of magnitude in each direction; if the implied ATT moves by more than its bootstrap SE across that range, the heuristic is unstable on your panel and you should switch to "cross-validation over time" once that selector ships.
Convex-hull feasibility with disaggregate donors (A5). Expanding the donor pool by disaggregation usually makes the hull easier to satisfy, but in pathological cases – a coastal mega-state matched only by tiny interior counties – the wider pool may still miss the treated aggregate.

Plausibly violated when results.pre_rmse is large and results.aggregate_donor_weights is highly concentrated on a single donor state (the penalty is pulling toward the classical SC but the classical SC is itself missing). Diagnostic: sweep lambda_est="fixed" over a wide grid (the existing example block in the docs shows the recipe); if pre-RMSE stays large at \(\lambda = 0\), the disaggregate hull is structurally missing the treated aggregate and you need Imperfect Synthetic Controls (ISCM) (which identifies the effect even when the treated unit is outside the hull).
Aggregation weights stable over time. A1 implicitly assumes the \(v_{sc}\) are fixed across \(t\). If population shares drift substantially within the analysis window, the disaggregate-to-aggregate map itself becomes a moving target.

Plausibly violated when the analysis window spans a decade or more of demographic change (urban-rural migration; county boundary changes). Diagnostic: re-run with weight_col set to a time-averaged share vs an end-of-period share; if the ATT moves, the drift is binding. Use the end-of-period share if the post-period weighting is what the policy question targets, or interpolate \(v_{sc}\) over time and feed in a time-varying weight column (not currently supported in v1; a roadmap item).

Mathematical Formulation#

Setup#

The treated aggregate is \(j = 1\); control aggregates are indexed by \(s \in \mathcal{N}_0\), each containing \(C_s\) disaggregate sub-units indexed \(c = 1, \dots, C_s\). Time periods are \(t = 1, \dots, T\), with treatment assigned in period \(T_0 + 1\) and absorbing thereafter (treatment never turns off). Let \(y_{sct}\) denote the disaggregate outcome and \(y_{st}\) the aggregate outcome. The two are linked by population aggregation weights \(v_{sc}\) summing to one within each aggregate:

\[y_{st} \;=\; \sum_{c = 1}^{C_s} v_{sc} \, y_{sct}, \qquad \sum_{c = 1}^{C_s} v_{sc} = 1.\]

The default in mlsynth.config_models.MLSCConfig is uniform weights \(v_{sc} = 1 / C_s\); non-uniform population weights can be supplied through the weight_col field.

The dGSC Family and the Aggregation Choice#

Section 4 of the paper introduces the disaggregated general SC (dGSC) class, parametrized by a weight array \(W_{c_0 s c}\) indexed by treated disaggregate units \(c_0\), control aggregates \(s\), and control disaggregates \(c\):

\[\tau_t^{\,\text{dGSC}} \;=\; \sum_{c_0 = 1}^{C_1} v_{1 c_0} \left( y_{1 c_0 t} - \sum_{s \in \mathcal{N}_0} \sum_{c = 1}^{C_s} W_{c_0 s c} \, y_{sct} \right),\]

with \(W\) chosen to minimize the pre-treatment squared error subject to convex-hull constraints \(\sum_{s, c} W_{c_0 s c} = 1\) and \(W \geq 0\). The four edge cases discussed in Sections 3-4 of the paper correspond to constraints on \(W\):

dGSC-AA — aggregate treated, aggregate controls (the classical SC of Abadie et al., 2010). All within-block weights are proportional to \(v_{sc}\) and equal across treated disaggregates.
dGSC-AD — aggregate treated, disaggregate controls. The treated unit is kept at the aggregate level (weights equal across \(c_0\)); disaggregate control weights vary freely within convexity.
dGSC-DA — disaggregate treated, aggregate controls.
dGSC — disaggregate on both sides.

Proposition 1 of the paper shows that all four cases share the same classical-SC objective, differing only in the donor pool implied by the weight constraints. Section 6 demonstrates empirically that disaggregating the controls is the primary source of estimation gains; disaggregating the treated unit alone usually worsens performance (Jensen’s inequality: matching an average is easier than matching each component). For this reason, the mlSC estimator implemented in mlsynth focuses on the aggregate-treated, flexible-disaggregate-controls regime — the dGSC-AD case — with a penalty that shrinks toward classical SC.

The mlSC Objective#

With the treated unit kept aggregated (the paper’s preferred variant, Equation 5.2), the weight array collapses to a single vector \(\mathbf{w}\) over all disaggregate control units, with entries \(w_{sc}\). Let \(w_s = \sum_{c} w_{sc}\) denote the implied aggregate weight on donor aggregate \(s\). The mlSC estimator solves

\[\mathbf{w}^\ast \in \operatorname*{argmin}_{\mathbf{w} \in \Delta^{N_0}} \quad \underbrace{ \sum_{t = 1}^{T_0} \left( y_{1t} - \sum_{s \in \mathcal{N}_0} \sum_{c = 1}^{C_s} w_{sc} \, y_{sct} \right)^{2} }_{\text{pre-treatment fit}} \;+\; \lambda \, \widehat{\sigma}_y^2 \underbrace{ \sum_{s \in \mathcal{N}_0} \sum_{c = 1}^{C_s} \left( w_{sc} - v_{sc} \, w_s \right)^{2} }_{\text{hierarchical penalty}}.\]

The penalty term measures how far the disaggregate weight vector deviates from “population-share times aggregate weight”. When \(\lambda \to \infty\) the penalty forces \(w_{sc} = v_{sc} \, w_s\), which collapses the estimator to classical SC at the aggregate level. When \(\lambda = 0\) the disaggregate weights are free to vary within the convex hull, recovering the fully-disaggregated dGSC-AD estimator. The scale factor \(\widehat{\sigma}_y^2\) keeps \(\lambda\) itself scale-invariant across panels.

The Block-Diagonal Penalty Matrix#

Stacking the within-block penalty terms gives a clean quadratic form. For each aggregate \(s\), let \(\mathbf{w}_s \in \mathbb{R}^{C_s}\) be the within-block weight vector and \(\mathbf{v}_s \in \mathbb{R}^{C_s}\) the corresponding population weights. Defining \(\mathbf{A}_s = \mathbf{I} - \mathbf{v}_s \mathbf{1}^\top\), the within-block contribution is

\[\sum_{c = 1}^{C_s} \left( w_{sc} - v_{sc} \, w_s \right)^{2} \;=\; \| \mathbf{A}_s \, \mathbf{w}_s \|_2^2 \;=\; \mathbf{w}_s^\top \mathbf{Q}_s \, \mathbf{w}_s,\]

with

\[\mathbf{Q}_s \;=\; \mathbf{A}_s^\top \mathbf{A}_s \;=\; \mathbf{I} \;-\; \mathbf{v}_s \mathbf{1}^\top \;-\; \mathbf{1} \mathbf{v}_s^\top \;+\; \| \mathbf{v}_s \|_2^{\,2} \, \mathbf{1} \mathbf{1}^\top.\]

The full penalty matrix \(\mathbf{Q} \in \mathbb{R}^{N_0 \times N_0}\), with \(N_0 = \sum_s C_s\) the total number of disaggregate control units, is block-diagonal across \(s\) with blocks \(\mathbf{Q}_s\). \(\mathbf{Q}_s\) is symmetric positive semidefinite with \(\mathbf{v}_s\) in its kernel (any weight vector proportional to \(\mathbf{v}_s\) incurs zero penalty), so the mlSC optimization remains a convex QP and admits standard cvxpy solvers. mlsynth uses SCS by default; this matches the reference implementation and avoids any commercial-solver dependency.

Selecting \(\lambda\): Heuristic and Fixed Modes#

Three penalty-selection rules are exposed via mlsynth.config_models.MLSCConfig.lambda_est:

'heuristic' — the closed-form rule from Section 5.2 of the paper,

\[\widehat{\lambda} \;=\; 2 \, \widehat{\sigma}_\varepsilon^2 \,/\, \widehat{\sigma}_y^2,\]

derived from the oracle optimal \(\lambda^\ast\) in a stylized hierarchical random-effects model under \(T = S = C_s = 2\) (Appendix B of the paper). Intuitively, the heuristic imposes a larger penalty when the panel is noisier, since added flexibility is more likely to overfit noise than to extract signal.
'fixed' — use a user-supplied \(\lambda\) directly. Useful for sensitivity analysis (sweep \(\lambda\) and inspect how the aggregate weight pattern shifts between classical-SC and disaggregated-SC regimes) and for reproducing the paper’s grid- search experiments.
'cross-validation' — the rolling cross-validation-over-time rule from Section 5.2. The last cv_holdout_periods pre-treatment periods are held out as a one-step-ahead forecast target; the penalized program is refit over lambda_grid on the earlier pre-period and the \(\lambda\) with the smallest held-out forecast MSE is selected (the 0 grid point recovers fully-disaggregated SC). This is a faithful port of the reference get_lambda_cv; on the README DGP it selects the same grid penalty as the reference (e.g. \(316.23\) on the seed-42 panel), and the cross-check is wired into the mlsc_bottmer benchmark.

Variance Decomposition (Appendix G)#

The heuristic requires plug-in estimates of \(\sigma_\varepsilon^2\) and \(\sigma_y^2\). Following the paper’s Appendix G, these are obtained from a simplified hierarchical random-effects model \(y_{sct} = \alpha_s + \eta_{sc} + \varepsilon_{sct}\) fit to the pre-treatment slice of df_disagg. For each control aggregate \(s\) (the treated aggregate is excluded so the post-period unobservability does not contaminate the estimates), define

\[\widehat{\mu}_{sc} \;=\; \frac{1}{T_0} \sum_{t = 1}^{T_0} y_{sct}, \qquad \widehat{\mathrm{Var}}(\varepsilon)_s \;=\; \frac{1}{C_s T_0} \sum_{c = 1}^{C_s} \sum_{t = 1}^{T_0} (y_{sct} - \widehat{\mu}_{sc})^{2},\]

and

\[\widehat{\mathrm{Var}}(y)_s \;=\; \frac{1}{C_s T_0} \sum_{c = 1}^{C_s} \sum_{t = 1}^{T_0} \left( y_{sct} - \overline{y}_{s\cdot\cdot} \right)^{2},\]

with \(\overline{y}_{s\cdot\cdot}\) the pre-treatment grand mean over aggregate \(s\). The plug-in estimates are then the non-treated averages \(\widehat{\sigma}_\varepsilon^2 = \mathrm{mean}_s \widehat{\mathrm{Var}}(\varepsilon)_s\) and \(\widehat{\sigma}_y^2 = \mathrm{mean}_s \widehat{\mathrm{Var}}(y)_s\). The implementation in mlsynth.utils.mlsc_helpers.variance.estimate_variance_components() matches the reference Python package and the paper’s description to the letter.

Counterfactual and Effect Summaries#

Given the optimal \(\mathbf{w}^\ast\), the aggregate counterfactual is

\[\widehat{y}_{1t} \;=\; \sum_{s \in \mathcal{N}_0} \sum_{c = 1}^{C_s} w_{sc}^\ast \, y_{sct}, \qquad t = 1, \dots, T,\]

and the average treatment effect on the treated aggregate is

\[\widehat{\tau} \;=\; |\mathcal{T}_2|^{-1} \sum_{t \in \mathcal{T}_2} \left( y_{1t} - \widehat{y}_{1t} \right),\]

reported as MLSCResults.att. The pre-period RMSE between the observed treated series and its mlSC reconstruction, \(\mathrm{RMSE}_{\text{pre}} = \big[\frac{1}{T_0} \sum_{t \in \mathcal{T}_1} (y_{1t} - \widehat{y}_{1t})^2\big]^{1/2}\), is reported as MLSCResults.pre_rmse.

Two-DataFrame API and Validation#

Because the aggregate-level and disaggregate-level data live at different units of observation, they are passed as two separate long-form DataFrames sharing the same outcome, time, and treat column names. df_disagg additionally carries an agg_id column mapping each disaggregate unit to its parent aggregate, and an optional weight_col for non-uniform \(v_{sc}\). The configuration validator enforces four cross-DataFrame invariants:

Treatment timing alignment. The pre-period count \(T_0\) implied by df_agg must match the one implied by df_disagg; any disagreement raises mlsynth.exceptions.MlsynthDataError.
Treated-aggregate consistency. Every disaggregate unit with treat = 1 must map (via agg_id) to the same aggregate unit, which must equal the treated aggregate in df_agg.
Single treatment cohort. All treated disaggregate units must share the same treatment start period (treatment-never-turns-off is part of the mlSC framework).
Aggregate-label coverage. Every value of agg_id in df_disagg must appear as a unitid_agg in df_agg.

These invariants are enforced once in mlsynth.utils.mlsc_helpers.setup.prepare_mlsc_inputs(), which reuses mlsynth.utils.datautils.dataprep() on both panels (including the disaggregate cohorts shape) and then assembles the matrices the rest of the pipeline consumes.

Core API#

Multi-Level Synthetic Control (mlSC) estimator.

Implements:

Bottmer, L. (2025). “Synthetic Control with Disaggregated Data.” Stanford University job-market paper.

Reference Python package: multi-levelSC (Apache-2.0), reimplemented here under mlsynth’s MIT license following the SCDI / SPCD / LEXSCM / TASC architectural conventions.

The mlSC estimator solves a hierarchical-aggregation regularized SC problem on a two-level panel (e.g. state-level treated unit + county-level controls). Treatment is assigned at the aggregate level; the disaggregate data enters as an enlarged donor pool, with a ridge-type penalty that shrinks each disaggregate weight omega_sc toward v_sc * w_s (population-share times implied aggregate weight). At lambda -> infinity the estimator recovers classical SC; at lambda = 0 it recovers the fully- disaggregated control SC (dGSC-AD in the paper’s notation). v1 implements the Section 5.2 heuristic lambda = 2 * sigma_eps^2 / sigma_y^2 and a user-supplied fixed lambda.

class mlsynth.estimators.mlsc.MLSC(config: MLSCConfig | dict)#

Bases: object

Multi-Level Synthetic Control (mlSC) estimator.

Operates on two long-form DataFrames: one aggregate-level (df_agg, e.g. state by time) and one disaggregate-level (df_disagg, e.g. county by time with an agg_id column mapping each county to its state). Treatment is assigned at the aggregate level. The optimization weights all disaggregate control units to minimize pre-treatment fit on the aggregate treated series, with a hierarchical penalty that shrinks toward classical SC.

Parameters:: config (MLSCConfig or dict) – Configuration object. See mlsynth.config_models.MLSCConfig.
Returns:: MLSCResults – Container with the optimal disaggregate weights, implied aggregate weights, counterfactual path, ATT, and pre-period RMSE.

Notes

Both DataFrames must share the same outcome, time, and treat column names and must agree on treatment timing. The treated row in df_disagg must map (via agg_id) to the treated aggregate unit in df_agg.

References

Bottmer, L. (2025). Synthetic Control with Disaggregated Data. Stanford University job-market paper.

fit() → MLSCResults#: Run the mlSC pipeline and return the learned design.

Configuration#

class mlsynth.config_models.MLSCConfig(*, df_agg: ~pandas.DataFrame, df_disagg: ~pandas.DataFrame, outcome: str, time: str, treat: str, unitid_agg: str, unitid_disagg: str, agg_id: str, weight_col: str | None = None, lambda_est: ~typing.Literal['heuristic', 'fixed', 'cross-validation'] = 'heuristic', lambda_val: ~typing.Annotated[float, ~annotated_types.Ge(ge=0)] = 0.0001, lambda_grid: ~typing.List[float] | None = None, cv_holdout_periods: ~typing.Annotated[int, ~annotated_types.Ge(ge=1)] = 1, solver: ~typing.Any = None, display_graphs: bool = True, save: bool | str | ~typing.Dict[str, str] = False, counterfactual_color: str | ~typing.List[str] = <factory>, treated_color: str = 'black')#

Configuration for the Multi-Level Synthetic Control (mlSC) estimator.

Implements the data-driven hierarchical-aggregation estimator of Bottmer (2025, “Synthetic Control with Disaggregated Data”). Unlike the rest of mlsynth’s estimators, mlSC operates on a two-level panel: an aggregate-level DataFrame (e.g. state-by-time) and a disaggregate-level DataFrame (e.g. county-by-time, with a column linking each county to its parent state). Treatment is assigned at the aggregate level and the estimand is the aggregate-level ATT, but the disaggregate data enters the donor pool with a ridge-type penalty that shrinks the disaggregated weights toward “population-share times aggregate weight” — recovering classical SC in the large-penalty limit and fully-disaggregated SC at penalty zero.

Implements all three Section-5.2 penalty-selection paths: the heuristic closed form, a fixed lambda, and rolling cross-validation over time.

class Config#

arbitrary_types_allowed = True#

extra = 'forbid'#

agg_id: str#

counterfactual_color: str | List[str]#

cv_holdout_periods: int#

df_agg: pd.DataFrame#

df_disagg: pd.DataFrame#

display_graphs: bool#

lambda_est: Literal['heuristic', 'fixed', 'cross-validation']#

lambda_grid: List[float] | None#

lambda_val: float#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

outcome: str#

save: bool | str | Dict[str, str]#

solver: Any#

time: str#

treat: str#

treated_color: str#

unitid_agg: str#

unitid_disagg: str#

weight_col: str | None#

Helper Modules#

Two-DataFrame data preparation for mlSC.

Wraps datautils.dataprep once on each panel and assembles the matrices the rest of the pipeline consumes. Validation enforces alignment between the aggregate and disaggregate panels: matching treatment timing, complete agg_id coverage, and population weights that sum to 1 within each aggregate (or default to uniform).

mlsynth.utils.mlsc_helpers.setup.prepare_mlsc_inputs(df_agg: DataFrame, df_disagg: DataFrame, outcome: str, time: str, treat: str, unitid_agg: str, unitid_disagg: str, agg_id: str, weight_col: str | None) → MLSCInputs#

Validate, pivot, and stack the two-level panel for mlSC.

Parameters:

df_agg, df_disagg (pd.DataFrame) – Long-form aggregate and disaggregate panels.
outcome, time, treat (str) – Column names shared by both panels.
unitid_agg, unitid_disagg, agg_id, weight_col (str) – Aggregate and disaggregate identifier columns, the disaggregate-> aggregate mapping column, and an optional population-weights column.

Returns:

MLSCInputs – Pre-processed matrices, weight vector, block-index map, and metadata.

Variance decomposition for the mlSC heuristic penalty.

Implements Appendix G of Bottmer (2025): under the simplified hierarchical random-effects model

Y_sct = alpha_s + eta_sc + eps_sct,

the unit-level pre-treatment means yield consistent estimates of sigma_eps^2 and sigma_y^2. The mlSC heuristic lambda = 2 * sigma_eps^2 / sigma_y^2 (Section 5.2) is built from these.

The decomposition is computed only over the control aggregates — the treated aggregate is excluded, matching the upstream package and the paper’s description (“taking the average estimated variance across all other aggregated units except for the treated unit”).

mlsynth.utils.mlsc_helpers.variance.estimate_variance_components(inputs: MLSCInputs) → Tuple[float, float]#

Return (sigma_eps2, sigma_y2) per Appendix G of Bottmer (2025).

Parameters:

inputs (MLSCInputs) – Pre-processed mlSC inputs. Uses Y_disagg_pre_full, disagg_to_agg_full, and treated_agg_idx_full to run the decomposition over the full disaggregate pre-treatment panel, excluding the treated aggregate.

Returns:

sigma_eps2 (float) – Estimated within-disaggregate-unit noise variance, averaged across non-treated aggregates.
sigma_y2 (float) – Estimated total disaggregate-outcome variance, averaged across non-treated aggregates.

mlsynth.utils.mlsc_helpers.variance.heuristic_lambda(sigma_eps2: float, sigma_y2: float) → float#: Closed-form heuristic lambda = 2 * sigma_eps^2 / sigma_y^2.

Block-diagonal hierarchical-aggregation penalty matrix for mlSC.

The mlSC objective contains the ridge-type penalty

sum_{s, c} (omega_sc - v_sc * w_s)^2, w_s = sum_c omega_sc.

For each aggregate block s with population weights v_s in R^{C_s} satisfying 1^T v_s = 1, the within-block penalty equals omega_s^T Q_s omega_s where

Q_s = (I - v_s 1^T)^T (I - v_s 1^T)
= I - v_s 1^T - 1 v_s^T + ||v_s||^2 * 1 1^T.

The full penalty matrix Q in R^{M x M} is block-diagonal with these Q_s blocks. Q_s is positive semidefinite (kernel: v_s), so the mlSC objective remains a convex QP.

mlsynth.utils.mlsc_helpers.penalty.build_block(v_s: ndarray) → ndarray#: Return the single-block penalty matrix Q_s for one aggregate.

mlsynth.utils.mlsc_helpers.penalty.build_penalty_matrix(v_population: ndarray, disagg_to_agg: ndarray) → ndarray#

Assemble the full block-diagonal Q matrix.

Parameters:

v_population (np.ndarray) – Length-M population weights for each disaggregate column.
disagg_to_agg (np.ndarray) – Length-M integer block-membership index for each column.

Returns:

np.ndarray – Block-diagonal Q of shape (M, M).

mlsynth.utils.mlsc_helpers.penalty.build_sqrt_block(v_s: ndarray) → ndarray#

Return the single-block square-root factor R_s = I - v_s 1^T.

By construction R_s^T R_s == Q_s (see build_block()), so stacking sqrt(penalty) * R_s as extra rows of the design reproduces the quadratic penalty omega_s^T Q_s omega_s as an ordinary least-squares term. This is what lets the active-set simplex QP solve the penalized program without a general quadratic-form solver.

mlsynth.utils.mlsc_helpers.penalty.build_sqrt_factor(v_population: ndarray, disagg_to_agg: ndarray) → ndarray#

Assemble the block-diagonal square-root factor R with R^T R == Q.

Parameters mirror build_penalty_matrix().

Returns:: np.ndarray – Block-diagonal R of shape (M, M) whose blocks are build_sqrt_block().

MlSC QP solver.

Solves Equation 5.2 of Bottmer (2025):

min over omega in R^M
    || Y_agg_treated_pre  -  X_disagg_pre @ omega ||^2
    + lambda * sigma_y^2 * omega^T Q omega
s.t.  1^T omega = 1,   omega >= 0.

The penalty Q = R^T R factors in closed form (R_s = I - v_s 1^T per block; see mlsynth.utils.mlsc_helpers.penalty.build_sqrt_factor()), so the penalized objective is an ordinary simplex least squares once the penalty is folded into the design as the augmentation [X; sqrt(lambda sigma_y^2) R] with a zero target. For any lambda > 0 the program is strictly convex whenever the aggregate control matrix has full column rank (the penalty fills exactly the part of X’s null space the aggregate directions do not pin down), so the optimum is unique. We solve it with the library’s active-set simplex QP (mlsynth.utils.bilevel.active_set.solve_simplex_qp()) – exact and free of cvxpy’s per-call canonicalisation overhead in the cross-validation loop.

The lambda = 0 case is regularised by a tiny ridge (_RIDGE_FLOOR), the same device the reference uses to keep the fully-disaggregated fit unique on a flat objective. An explicit solver argument routes through cvxpy instead, an escape hatch that preserves the original behaviour for callers who request a particular cvxpy solver.

mlsynth.utils.mlsc_helpers.optimization.solve_mlsc(inputs: MLSCInputs, Q: ndarray, lambda_val: float, sigma_y2: float, solver: Any = None) → Tuple[ndarray, ndarray, str]#

Solve the mlSC QP.

Parameters:

inputs (MLSCInputs) – Pre-processed two-level panel.
Q (np.ndarray) – Block-diagonal penalty matrix of shape (M, M) (used only by the cvxpy escape hatch; the native path rebuilds the square-root factor).
lambda_val (float) – Penalty value (already chosen via heuristic or fixed).
sigma_y2 (float) – Outcome variance, the penalty’s scale factor so that lambda is scale-invariant as in the paper.
solver (Any) – cvxpy solver. None (default) uses the native active-set simplex QP; any explicit value routes through cvxpy with that solver.

Returns:

omega (np.ndarray) – Optimal disaggregate weights, length M.
aggregate_weights (np.ndarray) – Implied w_s = sum_c omega_sc, length S.
status (str) – Solver status string.

Counterfactual path and effect summaries for mlSC.

The aggregate-level counterfactual is just X_disagg @ omega evaluated across all T periods. The pre-period RMSE and post-period mean gap come from comparing this trajectory against the observed aggregate treated outcome.

mlsynth.utils.mlsc_helpers.inference.counterfactual_path(inputs: MLSCInputs, omega: ndarray) → MLSCInference#: Build the counterfactual and gap series.

mlsynth.utils.mlsc_helpers.inference.summarize_effects(inputs: MLSCInputs, inference: MLSCInference) → Tuple[float, float]#: Return (att, pre_rmse) for the aggregate treated unit.

Plotting helper for mlSC.

Wraps mlsynth.utils.resultutils.plot_estimates so we get the standard observed-vs-counterfactual chart, where “observed” is the aggregate treated series and “counterfactual” is X_disagg @ omega.

mlsynth.utils.mlsc_helpers.plotter.plot_mlsc(results: MLSCResults, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | dict | str = False, time_axis_label: str = 'Time', outcome_label: str = 'Outcome', treatment_label: str = 'Treatment', unit_label: str = 'Unit') → None#: Render the mlSC aggregate counterfactual against the observed series.

Note

MLSC.fit() returns an EffectResult on the standardized two-family contract: res.att / res.counterfactual / res.gap / res.donor_weights (the disaggregate-level weights) / res.pre_rmse resolve through the standardized sub-models. MLSC produces no statistical inference, so res.inference is None; the MLSC-specific detail stays on res.design / res.paths / res.aggregate_donor_weights.

Structured containers for the mlSC pipeline.

All matrices follow mlsynth’s standard (T, N) orientation (rows = time, columns = unit), matching datautils.dataprep.

class mlsynth.utils.mlsc_helpers.structures.MLSCDesign(omega: ndarray, aggregate_weights: ndarray, lambda_used: float, sigma_eps2: float, sigma_y2: float, lambda_est: str, solver_status: str)#

Optimization output and selected hyperparameters.

Parameters:

omega (np.ndarray) – Optimal disaggregate weights, length M. Satisfies sum(omega) = 1 and omega >= 0.
aggregate_weights (np.ndarray) – Implied aggregate weights w_s = sum_c omega_sc for each control aggregate, length S.
lambda_used (float) – Penalty value actually applied.
sigma_eps2 (float) – Estimated noise variance from Appendix G.
sigma_y2 (float) – Estimated outcome variance from Appendix G.
lambda_est (str) – Selection rule that was used ('heuristic' or 'fixed').
solver_status (str) – cvxpy solver status string.

aggregate_weights: ndarray#

lambda_est: str#

lambda_used: float#

omega: ndarray#

sigma_eps2: float#

sigma_y2: float#

solver_status: str#

class mlsynth.utils.mlsc_helpers.structures.MLSCInference(counterfactual: ndarray, gap: ndarray)#

Counterfactual path and post-period gap.

Parameters:

counterfactual (np.ndarray) – Estimated aggregate counterfactual X_disagg @ omega, length T.
gap (np.ndarray) – Observed minus counterfactual, length T.

counterfactual: ndarray#

gap: ndarray#

class mlsynth.utils.mlsc_helpers.structures.MLSCInputs(Y_agg_treated: ndarray, X_disagg: ndarray, v_population: ndarray, disagg_to_agg: ndarray, agg_labels: Sequence, disagg_labels: Sequence, Y_disagg_pre_full: ndarray, disagg_to_agg_full: ndarray, treated_agg_idx_full: int, T: int, T0: int, treated_unit_name: str, time_labels: ndarray, Ywide_agg: object, outcome: str)#

Pre-processed data fed into the mlSC optimization and inference loops.

Parameters:

Y_agg_treated (np.ndarray) – Observed aggregate-level treated outcome series, length T.
X_disagg (np.ndarray) – Disaggregate-control outcome matrix of shape (T, M) where M is the total number of disaggregate control units (sum of C_s across all non-treated aggregates). Columns are ordered first by aggregate label and then by disaggregate label, with this ordering preserved in disagg_to_agg and v_population.
v_population (np.ndarray) – Population aggregation weights v_sc for each disaggregate control column, length M. Within each aggregate block the entries sum to 1.
disagg_to_agg (np.ndarray) – Length-M integer array giving the aggregate-block index for each disaggregate column. Aggregate blocks are indexed 0..S-1.
agg_labels (Sequence) – Labels of the S control aggregates in block order.
disagg_labels (Sequence) – Labels of the M disaggregate-control columns in column order.
Y_disagg_pre_full (np.ndarray) – Pre-treatment disaggregate outcome matrix for all aggregates (including the treated one), shape (T0, M_full). Used by the Appendix-G variance decomposition.
disagg_to_agg_full (np.ndarray) – Length-M_full aggregate-index assignment matching Y_disagg_pre_full columns.
treated_agg_idx_full (int) – Index of the treated aggregate within Y_disagg_pre_full blocks (used to exclude it from the variance estimate).
T (int) – Total number of time periods.
T0 (int) – Number of pre-treatment periods.
treated_unit_name (str) – Label of the treated aggregate.
time_labels (np.ndarray) – Time labels in original order, length T.
Ywide_agg (object) – Wide aggregate-level outcome frame (rows = time, columns = aggregate unit) preserved from dataprep for downstream plotting.
outcome (str) – Outcome variable name.

property M: int#: Number of disaggregate control columns.

property S: int#: Number of control aggregate units.

T: int#

T0: int#

X_disagg: ndarray#

Y_agg_treated: ndarray#

Y_disagg_pre_full: ndarray#

Ywide_agg: object#

agg_labels: Sequence#

disagg_labels: Sequence#

disagg_to_agg: ndarray#

disagg_to_agg_full: ndarray#

outcome: str#

time_labels: ndarray#

treated_agg_idx_full: int#

treated_unit_name: str#

v_population: ndarray#

class mlsynth.utils.mlsc_helpers.structures.MLSCResults(*, effects: EffectsResults | None = None, fit_diagnostics: FitDiagnosticsResults | None = None, time_series: TimeSeriesResults | None = None, weights: WeightsResults | None = None, inference: InferenceResults | None = None, method_details: MethodDetailsResults | None = None, sub_method_results: Dict[str, Any] | None = None, additional_outputs: Dict[str, Any] | None = None, raw_results: Dict[str, Any] | None = None, execution_summary: Dict[str, Any] | None = None, plot_config: PlotConfig | None = None, inputs: MLSCInputs, design: MLSCDesign, paths: MLSCInference, aggregate_donor_weights: Dict[Any, float])#

Public MLSC.fit() return container.

An EffectResult (the observational report): in addition to the mlSC-specific fields below it exposes the standardized sub-models (effects, time_series, weights, fit_diagnostics, method_details) and the flat accessors att / counterfactual / gap / pre_rmse / donor_weights. mlSC has no statistical inference (no SE/CI), so the inference slot is None.

Parameters:

inputs (MLSCInputs) – Pre-processed two-level panel.
design (MLSCDesign) – Optimization outputs and variance-decomposition diagnostics.
paths (MLSCInference) – Counterfactual path and gap. (Renamed from inference — it carries the fitted series, not statistical inference, and the contract reserves the inference slot for InferenceResults. The same series are exposed flat as res.counterfactual / res.gap.)
aggregate_donor_weights (dict) – Mapping from aggregate-unit label to its implied weight w_s = sum_c omega_sc. Convenience view of design.aggregate_weights. (The disaggregate donor_weights live in the standardized weights slot, served by res.donor_weights.)

aggregate_donor_weights: Dict[Any, float]#

design: MLSCDesign#

inputs: MLSCInputs#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function BaseEstimatorResults.Config.<lambda>>}}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

paths: MLSCInference#

The README hierarchical-factor DGP from the author’s reference package, packaged as simulate_mlsc_sample so the Verification replication runs as a one-liner.

README data-generating process for Bottmer (2025) mlSC.

Implements the simulation distributed with the author’s reference package (multi-level-sc-estimator/README.md): a hierarchical linear factor model with state-level loadings and within-state county deviations.

\[y_{s, c, t} = (\alpha_s + \eta_{sc}) \cdot f_t + \varepsilon_{sct},\]

with

\(f_t \sim \mathcal{N}(0, \sigma_{\text{time}}^2)\) — one shared time factor;
\(\alpha_s \sim \mathcal{N}(0, \sigma_{\text{state}}^2)\) — state-level loading;
\(\eta_{sc} \sim \mathcal{N}(0, \sigma_{\text{county}}^2)\) — within-state county deviation;
\(\varepsilon_{sct} \sim \mathcal{N}(0, \sigma_{\text{eps}}^2)\) — idiosyncratic shock.

Aggregation to the state level uses equal within-state weights (\(v_{sc} = 1 / C_s\)), so \(y_{st} = (1 / C_s) \sum_c y_{sct}\). The default README parameters are \(N_s = 10\), \(C_s = 10\), \(T = 20\), \((\sigma_{\text{time}}, \sigma_{\text{state}}, \sigma_{\text{county}}, \sigma_{\text{eps}}) = (1.0, 0.8, 0.5, 0.3)\).

The simulation never adds a treatment effect — the true ATT is exactly zero, so the Monte Carlo target is unbiasedness of the estimated ATT and value-for-value agreement with the reference implementation.

class mlsynth.utils.mlsc_helpers.simulation.MLSCSample(df_agg: DataFrame, df_disagg: DataFrame, data_s: ndarray, data_c: ndarray, n_c: ndarray, w_c: List[ndarray], idx: int, t: int)#

One draw from the README’s hierarchical-factor DGP.

df_agg, df_disagg

Long panels ready for mlsynth.MLSC. Columns: state / county (disagg only) / time / y / treated.

Type:: pd.DataFrame

data_s#

Aggregate-level outcome matrix, shape (N_states, T).

Type:: np.ndarray

data_c#

Disaggregate-level outcome matrix, shape (N_states * C_s, T).

Type:: np.ndarray

n_c#

Counties per state, shape (N_states,).

Type:: np.ndarray

w_c#

Within-state population weights (each array sums to one). One entry per state, in the order the reference impl expects.

Type:: list of np.ndarray

idx#

Treated state index.

Type:: int

t#

Treated period index (0-based; the README uses t = T - 1).

Type:: int

data_c: ndarray#

data_s: ndarray#

df_agg: DataFrame#

df_disagg: DataFrame#

idx: int#

n_c: ndarray#

t: int#

w_c: List[ndarray]#

mlsynth.utils.mlsc_helpers.simulation.simulate_mlsc_sample(N_states: int = 10, counties_per_state: int = 10, T: int = 20, sigma_time: float = 1.0, sigma_state: float = 0.8, sigma_county: float = 0.5, sigma_eps: float = 0.3, treated_idx: int = 0, treated_t: int | None = None, rng: Generator | None = None) → MLSCSample#

Draw one sample from the Bottmer (2025) README DGP.

Parameters:

N_states (int, default 10) – Number of aggregate units (e.g. states).
counties_per_state (int, default 10) – Number of disaggregate units inside each aggregate unit. Constant across states in the README; pass a per-state count via the reference implementation if you need imbalance.
T (int, default 20) – Total number of periods.
sigma_time, sigma_state, sigma_county, sigma_eps (float) – Standard deviations for the time factor, state-level loading, within-state county deviation, and idiosyncratic shock.
treated_idx (int, default 0) – Aggregate unit assigned to treatment.
treated_t (int or None) – First treated period. Defaults to T - 1 (the README’s choice).
rng (np.random.Generator, optional) – NumPy RNG. Defaults to np.random.default_rng().

Returns:

MLSCSample

Example#

import pandas as pd
from mlsynth import MLSC

# Two long-form panels sharing outcome / time / treat column names.
#
#   df_agg     — one row per (state, year): the aggregate panel.
#   df_disagg  — one row per (county, year): the disaggregate panel.
#                Must contain an 'agg_id' column mapping each county
#                to its parent state, and may optionally carry a
#                'population' column for non-uniform v_sc.

df_agg    = pd.read_csv("state_panel.csv")
df_disagg = pd.read_csv("county_panel.csv")

config = {
    "df_agg":         df_agg,
    "df_disagg":      df_disagg,
    "outcome":        "sales",
    "time":           "year",
    "treat":          "treated",        # binary 0/1, equal at both levels
    "unitid_agg":     "state",          # column in df_agg
    "unitid_disagg":  "county_fips",    # column in df_disagg
    "agg_id":         "state",          # column in df_disagg
    "weight_col":     "population",     # optional; defaults to 1 / C_s
    "lambda_est":     "heuristic",      # or "fixed" with lambda_val
    "display_graphs": True,
}

results = MLSC(config).fit()

# Point estimate, fit diagnostic, learned penalty.
print(results.att)                          # mean post-period aggregate gap
print(results.pre_rmse)                     # pre-period RMSE
print(results.design.lambda_used)           # selected lambda
print(results.design.sigma_eps2,
      results.design.sigma_y2)              # Appendix-G variance estimates

# Disaggregate and implied aggregate weights.
print(results.donor_weights)                # {county_fips: w_sc}
print(results.aggregate_donor_weights)      # {state:        w_s = sum_c w_sc}

# Counterfactual trajectory (length T) and the observed - counterfactual gap.
cf  = results.counterfactual          # standardized accessor (== results.paths.counterfactual)
gap = results.gap                     # == results.paths.gap

# Sensitivity: sweep lambda by hand.
for lam in [0.0, 1e-2, 1e-1, 1.0, 10.0, 1e6]:
    r = MLSC({**config, "lambda_est": "fixed", "lambda_val": lam}).fit()
    print(f"lambda={lam:>8.4f}  ATT={r.att:.4f}  preRMSE={r.pre_rmse:.4f}")

Verification#

Empirical replication against the author’s reference code (Path A) plus a Monte Carlo unbiasedness check (Path B). Bottmer’s published empirical application — the effect of a state minimum-wage policy change on county-level teen employment in Iowa — uses the Quarterly Workforce Indicators (QWI) panel that ships with her reference package (tests/ia_emp_app_teen_empl.csv). The simulation block in her README spells out a hierarchical linear-factor DGP that’s free of proprietary data and is the design mlSC_estimator is calibrated against. Both validations are wired in below.

The DGP is packaged in mlsynth.utils.mlsc_helpers.simulation.simulate_mlsc_sample():

\[y_{sct} = (\alpha_s + \eta_{sc}) \cdot f_t + \varepsilon_{sct}, \quad y_{st} = \frac{1}{C_s} \sum_{c} y_{sct},\]

with \(f_t \sim \mathcal{N}(0, 1)\), \(\alpha_s \sim \mathcal{N}(0, 0.8^2)\), \(\eta_{sc} \sim \mathcal{N}(0, 0.5^2)\), and \(\varepsilon_{sct} \sim \mathcal{N}(0, 0.3^2)\) at the README defaults \(N_s = 10\), \(C_s = 10\), \(T = 20\). The simulation never adds a treatment effect, so the true ATT is exactly zero and the Monte Carlo target is unbiasedness.

Path A: value-for-value vs the reference code (one draw)#

On the README’s exact panel (np.random.default_rng(42)), mlsynth’s MLSC reproduces both the selected penalty and the reported ATT to solver tolerance:

import numpy as np
from mlsynth import MLSC
from mlsynth.utils.mlsc_helpers.simulation import simulate_mlsc_sample
from multi_level_sc_estimator.mlSC import mlSC_estimator       # reference

s = simulate_mlsc_sample(rng=np.random.default_rng(42))
tau_ref, lam_ref, _ = mlSC_estimator(
    s.data_s, s.data_c, s.idx, s.n_c, s.t, s.w_c,
    lambda_est="heuristic")
res = MLSC({"df_agg": s.df_agg, "df_disagg": s.df_disagg,
             "outcome": "y", "time": "time", "treat": "treated",
             "unitid_agg": "state", "unitid_disagg": "county",
             "agg_id": "state", "lambda_est": "heuristic",
             "display_graphs": False}).fit()
print(f"REFERENCE  tau_hat = {tau_ref:+.6f}  lambda = {lam_ref:.6f}")
print(f"MLSYNTH    tau_hat = {res.att:+.6f}  "
      f"lambda = {res.design.lambda_used:.6f}")

prints:

REFERENCE  tau_hat = +0.011928  lambda = 1.970185
MLSYNTH    tau_hat = +0.011930  lambda = 1.970185

Differences: \(|\widehat{\tau}_{\text{mlsynth}} - \widehat{\tau}_{\text{ref}}| = 1.5 \times 10^{-6}\), \(|\widehat{\lambda}_{\text{mlsynth}} - \widehat{\lambda}_{\text{ref}}| = 4.4 \times 10^{-16}\) — i.e. solver noise on the ATT and machine precision on the penalty.

Path B: unbiasedness across 200 draws#

Looping the comparison across \(M = 200\) draws confirms (a) both estimators are unbiased for the true zero ATT, and (b) they remain in near-machine agreement throughout:

def one_rep(seed):
    s = simulate_mlsc_sample(rng=np.random.default_rng(seed))
    tau_ref, _, _ = mlSC_estimator(
        s.data_s, s.data_c, s.idx, s.n_c, s.t, s.w_c,
        lambda_est="heuristic")
    res = MLSC({"df_agg": s.df_agg, "df_disagg": s.df_disagg,
                 "outcome": "y", "time": "time", "treat": "treated",
                 "unitid_agg": "state", "unitid_disagg": "county",
                 "agg_id": "state", "lambda_est": "heuristic",
                 "display_graphs": False}).fit()
    return float(tau_ref), float(res.att)

import numpy as np
tau_pairs = np.array([one_rep(s) for s in range(200)])
tau_ref, tau_ml = tau_pairs[:, 0], tau_pairs[:, 1]
print(f"reference: mean={tau_ref.mean():+.4f}  std={tau_ref.std(ddof=1):.4f}  "
      f"RMSE={np.sqrt((tau_ref**2).mean()):.4f}")
print(f"mlsynth  : mean={tau_ml.mean():+.4f}   std={tau_ml.std(ddof=1):.4f}  "
      f"RMSE={np.sqrt((tau_ml**2).mean()):.4f}")
print(f"max |Δ| = {np.abs(tau_ml - tau_ref).max():.2e}, "
      f"median |Δ| = {np.median(np.abs(tau_ml - tau_ref)):.2e}")

prints:

reference: mean=-0.0103  std=0.1663  RMSE=0.1662
mlsynth  : mean=-0.0103  std=0.1663  RMSE=0.1662
max |Δ| = 5.76e-04, median |Δ| = 1.05e-05

The Monte Carlo standard error of the mean at \(M = 200\) is \(\sigma / \sqrt{200} \approx 0.012\), so the observed bias of \(-0.010\) is well inside one MC SE of zero — the heuristic \(\widehat{\lambda}\) does not introduce systematic bias, and mlsynth and the reference impl produce statistically identical samples.

Both checks are wired as the durable benchmark mlsc_bottmer, which clones the author’s repository (pinned commit 0fb2639) at run time, runs both estimators on the README DGP, and skips gracefully when the reference or its dependencies are unavailable.

Empirical application#

For reference, Bottmer’s empirical application — Iowa’s 2007 minimum- wage hike on county-level teen employment, run on the Quarterly Workforce Indicators panel that ships with her reference package (tests/ia_emp_app_teen_empl.csv) — uses the same heuristic-lambda configuration. Because the panel is public, mlsynth’s MLSC can be pointed at it directly through the two-DataFrame API; see the estimator’s docstring for the data-cleaning conventions the package expects (drop counties with any NaN over the analysis window, mirror the state-level treatment indicator on each contained county, normalize within-state weights to sum to one).

References#

Bottmer, L. (2025). Synthetic Control with Disaggregated Data. Stanford University job-market paper. PDF.

The reference Python implementation by the paper’s author, multi-levelSC (Apache-2.0), inspired several details of this rewrite — in particular the variance-decomposition implementation and the classical-SC warm-start used by the cvxpy solver.

Abadie, A., Diamond, A., and Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490):493-505.

Multi-Level Synthetic Control (mlSC)

Contents

Multi-Level Synthetic Control (mlSC)#

Overview#

When to use mlSC – and when not to#

Notation#

Assumptions (Bottmer 2025)#

When the assumptions bind: practical diagnostics#

Mathematical Formulation#

Setup#

The dGSC Family and the Aggregation Choice#

The mlSC Objective#

The Block-Diagonal Penalty Matrix#

Selecting \(\lambda\): Heuristic and Fixed Modes#

Variance Decomposition (Appendix G)#

Counterfactual and Effect Summaries#

Two-DataFrame API and Validation#

Core API#

Configuration#

Helper Modules#

Example#

Verification#

Path A: value-for-value vs the reference code (one draw)#

Path B: unbiasedness across 200 draws#

Empirical application#

References#