Multi-Level Synthetic Control (mlSC)

Contents

Multi-Level Synthetic Control (mlSC)#

Overview#

Multi-Level Synthetic Control (mlSC) is the estimator proposed by Bottmer (2025) for panels with two levels of aggregation: an aggregate level at which treatment is assigned (e.g. states), and a finer disaggregate level at which outcomes are also observed (e.g. counties within those states). Such panels are common in policy evaluation — state-wide cigarette and minimum-wage policies, for instance, are routinely studied with county-level outcome data — and they raise a question that classical SC does not address: at what level of aggregation should the estimator operate?

Existing practice spans the full range: some studies aggregate every- thing to the state level and apply classical SC; others enlarge the donor pool with all disaggregated control units; still others disaggregate both treated and control sides. Each choice has a different bias / noise profile. The mlSC estimator turns the aggregation choice into a data-driven regularization problem: the disaggregate weights are allowed to vary freely, but a ridge-type penalty shrinks them toward the classical-SC solution, with the penalty strength selected from data.

Two structural properties distinguish mlSC from the rest of the mlsynth toolkit. First, it operates on two long-form DataFrames, one at the aggregate level (df_agg) and one at the disaggregate level (df_disagg), with an agg_id column on df_disagg mapping each disaggregate unit to its parent aggregate. Every other estimator in the package takes a single panel. Second, the entire spectrum from classical SC to fully disaggregated SC is recovered as limiting cases of a single penalty parameter \(\lambda\) — at \(\lambda \to \infty\) mlSC reduces to the classical SC of Abadie, Diamond, and Hainmueller (2010); at \(\lambda = 0\) it recovers the fully-disaggregated control SC (dGSC-AD in the paper’s notation); intermediate values pick a data-driven mixture.

Mathematical Formulation#

Setup#

Let \(s = 0, 1, \dots, S\) index \(S + 1\) aggregate units (e.g. states), with \(s = 0\) the single treated aggregate. Each aggregate \(s\) contains \(C_s\) disaggregate units (e.g. counties), indexed \(c = 1, \dots, C_s\). Time periods are \(t = 1, \dots, T\), with treatment assigned in period \(T_0 + 1\) and absorbing thereafter (treatment never turns off). Let \(Y_{sct}\) denote the disaggregate outcome and \(Y_{st}\) the aggregate outcome. The two are linked by population aggregation weights \(v_{sc}\) summing to one within each aggregate:

\[Y_{st} \;=\; \sum_{c = 1}^{C_s} v_{sc} \, Y_{sct}, \qquad \sum_{c = 1}^{C_s} v_{sc} = 1.\]

The default in mlsynth.config_models.MLSCConfig is uniform weights \(v_{sc} = 1 / C_s\); non-uniform population weights can be supplied through the weight_col field.

The dGSC Family and the Aggregation Choice#

Section 4 of the paper introduces the disaggregated general SC (dGSC) class, parametrized by a weight matrix \(W_{c_0 s c}\) indexed by treated disaggregate units \(c_0\), control aggregates \(s\), and control disaggregates \(c\):

\[\hat\tau_{0t}^{\,\text{dGSC}} \;=\; \sum_{c_0 = 1}^{C_0} v_{0 c_0} \left( Y_{0 c_0 t} - \sum_{s = 1}^{S} \sum_{c = 1}^{C_s} W_{c_0 s c} \, Y_{sct} \right),\]

with \(W\) chosen to minimize the pre-treatment squared error subject to convex-hull constraints \(\sum_{s, c} W_{c_0 s c} = 1\) and \(W \geq 0\). The four edge cases discussed in Sections 3-4 of the paper correspond to constraints on \(W\):

  • dGSC-AA — aggregate treated, aggregate controls (the classical SC of Abadie et al., 2010). All within-block weights are proportional to \(v_{sc}\) and equal across treated disaggregates.

  • dGSC-AD — aggregate treated, disaggregate controls. The treated unit is kept at the aggregate level (weights equal across \(c_0\)); disaggregate control weights vary freely within convexity.

  • dGSC-DA — disaggregate treated, aggregate controls.

  • dGSC — disaggregate on both sides.

Proposition 1 of the paper shows that all four cases share the same classical-SC objective, differing only in the donor pool implied by the weight constraints. Section 6 demonstrates empirically that disaggregating the controls is the primary source of estimation gains; disaggregating the treated unit alone usually worsens performance (Jensen’s inequality: matching an average is easier than matching each component). For this reason, the mlSC estimator implemented in mlsynth focuses on the aggregate-treated, flexible-disaggregate-controls regime — the dGSC-AD case — with a penalty that shrinks toward classical SC.

The mlSC Objective#

With the treated unit kept aggregated (the paper’s preferred variant, Equation 5.2), the weight matrix collapses to a single vector \(\omega_{sc}\) over all disaggregate control units. Let \(w_s = \sum_c \omega_{sc}\) denote the implied aggregate weight on donor unit \(s\). The mlSC estimator solves

\[\min_{\omega \,\geq\, 0,\; \mathbf{1}^\top \omega \,=\, 1} \quad \underbrace{ \sum_{t = 1}^{T_0} \left( Y_{0t} - \sum_{s = 1}^{S} \sum_{c = 1}^{C_s} \omega_{sc} \, Y_{sct} \right)^{2} }_{\text{pre-treatment fit}} \;+\; \lambda \, \hat\sigma_y^2 \underbrace{ \sum_{s = 1}^{S} \sum_{c = 1}^{C_s} \left( \omega_{sc} - v_{sc} \, w_s \right)^{2} }_{\text{hierarchical penalty}}.\]

The penalty term measures how far the disaggregate weight vector deviates from “population-share times aggregate weight”. When \(\lambda \to \infty\) the penalty forces \(\omega_{sc} = v_{sc} \, w_s\), which collapses the estimator to classical SC at the aggregate level. When \(\lambda = 0\) the disaggregate weights are free to vary within the convex hull, recovering the fully-disaggregated dGSC-AD estimator. The scale factor \(\hat\sigma_y^2\) keeps \(\lambda\) itself scale-invariant across panels.

The Block-Diagonal Penalty Matrix#

Stacking the within-block penalty terms gives a clean quadratic form. For each aggregate \(s\), let \(\omega_s \in \mathbb{R}^{C_s}\) be the within-block weight vector and \(v_s \in \mathbb{R}^{C_s}\) the corresponding population weights. Defining \(A_s = I - v_s \mathbf{1}^\top\), the within-block contribution is

\[\sum_{c = 1}^{C_s} \left( \omega_{sc} - v_{sc} \, w_s \right)^{2} \;=\; \| A_s \, \omega_s \|_2^2 \;=\; \omega_s^\top Q_s \, \omega_s,\]

with

\[Q_s \;=\; A_s^\top A_s \;=\; I \;-\; v_s \mathbf{1}^\top \;-\; \mathbf{1} v_s^\top \;+\; \| v_s \|_2^{\,2} \, \mathbf{1} \mathbf{1}^\top.\]

The full penalty matrix \(Q \in \mathbb{R}^{M \times M}\), with \(M = \sum_s C_s\) the total number of disaggregate control units, is block-diagonal across \(s\) with blocks \(Q_s\). \(Q_s\) is symmetric positive semidefinite with \(v_s\) in its kernel (any weight vector proportional to \(v_s\) incurs zero penalty), so the mlSC optimization remains a convex QP and admits standard cvxpy solvers. mlsynth uses SCS by default; this matches the reference implementation and avoids any commercial-solver dependency.

Selecting \(\lambda\): Heuristic and Fixed Modes#

Two penalty-selection rules are exposed via mlsynth.config_models.MLSCConfig.lambda_est:

  • 'heuristic' — the closed-form rule from Section 5.2 of the paper,

    \[\hat\lambda \;=\; 2 \, \hat\sigma_\varepsilon^2 \,/\, \hat\sigma_y^2,\]

    derived from the oracle optimal \(\lambda^*\) in a stylized hierarchical random-effects model under \(T = S = C_s = 2\) (Appendix B of the paper). Intuitively, the heuristic imposes a larger penalty when the panel is noisier, since added flexibility is more likely to overfit noise than to extract signal.

  • 'fixed' — use a user-supplied \(\lambda\) directly. Useful for sensitivity analysis (sweep \(\lambda\) and inspect how the aggregate weight pattern shifts between classical-SC and disaggregated-SC regimes) and for reproducing the paper’s grid- search experiments.

The cross-validation-over-time rule from Section 5.2 is not yet exposed in v1 of the mlsynth implementation; it is on the short-term roadmap.

Variance Decomposition (Appendix G)#

The heuristic requires plug-in estimates of \(\sigma_\varepsilon^2\) and \(\sigma_y^2\). Following the paper’s Appendix G, these are obtained from a simplified hierarchical random-effects model \(Y_{sct} = \alpha_s + \eta_{sc} + \varepsilon_{sct}\) fit to the pre-treatment slice of df_disagg. For each control aggregate \(s\) (the treated aggregate is excluded so the post-period unobservability does not contaminate the estimates), define

\[\hat\mu_{sc} \;=\; \frac{1}{T_0} \sum_{t = 1}^{T_0} Y_{sct}, \qquad \widehat{\mathrm{Var}}(\varepsilon)_s \;=\; \frac{1}{C_s T_0} \sum_{c = 1}^{C_s} \sum_{t = 1}^{T_0} (Y_{sct} - \hat\mu_{sc})^{2},\]

and

\[\widehat{\mathrm{Var}}(y)_s \;=\; \frac{1}{C_s T_0} \sum_{c = 1}^{C_s} \sum_{t = 1}^{T_0} \left( Y_{sct} - \overline{Y}_{s\cdot\cdot} \right)^{2},\]

with \(\overline{Y}_{s\cdot\cdot}\) the pre-treatment grand mean over aggregate \(s\). The plug-in estimates are then the non-treated averages \(\hat\sigma_\varepsilon^2 = \mathrm{mean}_s \widehat{\mathrm{Var}}(\varepsilon)_s\) and \(\hat\sigma_y^2 = \mathrm{mean}_s \widehat{\mathrm{Var}}(y)_s\). The implementation in mlsynth.utils.mlsc_helpers.variance.estimate_variance_components() matches the reference Python package and the paper’s description to the letter.

Counterfactual and Effect Summaries#

Given the optimal \(\omega\), the aggregate counterfactual is

\[\hat Y_{0t}(0) \;=\; \sum_{s = 1}^{S} \sum_{c = 1}^{C_s} \omega_{sc} \, Y_{sct}, \qquad t = 1, \dots, T,\]

and the average treatment effect on the treated aggregate is

\[\widehat{\mathrm{ATT}} \;=\; \frac{1}{T - T_0} \sum_{t = T_0 + 1}^{T} \left( Y_{0t} - \hat Y_{0t}(0) \right),\]

reported as MLSCResults.att. The pre-period RMSE between the observed treated series and its mlSC reconstruction, \(\mathrm{RMSE}_{\text{pre}} = \big[\frac{1}{T_0} \sum_{t \le T_0} (Y_{0t} - \hat Y_{0t})^2\big]^{1/2}\), is reported as MLSCResults.pre_rmse.

Two-DataFrame API and Validation#

Because the aggregate-level and disaggregate-level data live at different units of observation, they are passed as two separate long-form DataFrames sharing the same outcome, time, and treat column names. df_disagg additionally carries an agg_id column mapping each disaggregate unit to its parent aggregate, and an optional weight_col for non-uniform \(v_{sc}\). The configuration validator enforces four cross-DataFrame invariants:

  1. Treatment timing alignment. The pre-period count \(T_0\) implied by df_agg must match the one implied by df_disagg; any disagreement raises mlsynth.exceptions.MlsynthDataError.

  2. Treated-aggregate consistency. Every disaggregate unit with treat = 1 must map (via agg_id) to the same aggregate unit, which must equal the treated aggregate in df_agg.

  3. Single treatment cohort. All treated disaggregate units must share the same treatment start period (treatment-never-turns-off is part of the mlSC framework).

  4. Aggregate-label coverage. Every value of agg_id in df_disagg must appear as a unitid_agg in df_agg.

These invariants are enforced once in mlsynth.utils.mlsc_helpers.setup.prepare_mlsc_inputs(), which reuses mlsynth.utils.datautils.dataprep() on both panels (including the disaggregate cohorts shape) and then assembles the matrices the rest of the pipeline consumes.

Assumptions (Bottmer 2026)#

mlSC inherits the canonical SC identification stack and adds the structural conditions needed for the hierarchical penalty to make sense – and for the heuristic \(\lambda\) to be calibrated. The paper’s formal assumptions:

A1 (linear aggregation). The aggregate outcome is a known weighted average of its disaggregate components, \(Y_{st} = \sum_{c = 1}^{C_s} v_{sc} Y_{sct}\) with \(\sum_c v_{sc} = 1\) for every aggregate. The weights \(v_{sc}\) are observable to the analyst (uniform by default; population shares are the canonical alternative). Required for the estimand at the aggregate level to be a clean function of the disaggregate potential outcomes.

A2 (binary, sharp, absorbing treatment at the aggregate level). A single aggregate unit (\(s = 0\)) is treated; the treatment indicator \(W_{sct}\) is 1 iff \(s = 0\) and \(t > T_0\), and once on it stays on. Equivalently defined at the aggregate level. Multiple treated aggregates are out of scope of the paper’s main theory (the paper discusses staggered designs only as a future direction; mlsynth’s FECT / Synthetic Difference-in-Differences (SDID) are the staggered-aware alternatives).

A3 (SUTVA at the disaggregate level). No interference between disaggregate units. Note this is stronger than canonical-SC SUTVA which is at the aggregate level: mlSC pulls within-state variation directly into the weight fit, so an exposed county influencing a non-exposed county within the same donor state breaks identification at the level mlSC operates on.

A4 (hierarchical latent-factor outcome model – Section 6.1, Appendix G). The untreated disaggregate outcome decomposes as

\[Y_{sct}^{(0)} \;=\; \underbrace{\alpha_s' \beta_t}_{L_{st}^{\text{agg}}} \;+\; \underbrace{\eta_{sc}' \beta_t}_{L_{sct}^{\text{disagg}}} \;+\; \varepsilon_{sct}, \qquad \varepsilon_{sct} \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \sigma_\varepsilon^2),\]

with a shared set of time factors \(\beta_t\), aggregate-level loadings \(\alpha_s\), and within-aggregate disaggregate deviations \(\eta_{sc}\). This is what licenses the variance- decomposition recipe (Appendix G) feeding the heuristic \(\hat\lambda = 2 \hat\sigma_\varepsilon^2 / \hat\sigma_y^2\). The model also implies the four-way MSE decomposition the paper uses to characterise the central trade-off (oracle bias + restriction bias + estimation error + post-period noise).

A5 (feasibility / convex hull for dGSC-AD). The treated aggregate’s pre-period trajectory lies in the convex hull of the disaggregate donor pool \(\{Y_{sct}\}_{s \ge 1, c}\). With disaggregation, the donor pool is \(\sum_{s \ge 1} C_s\) units wide – typically an order of magnitude larger than the \(S\)-state aggregate hull – so this is easier to satisfy than canonical SC’s hull condition, which is the main empirical motivation for the estimator. The penalty term is what stops the expanded hull from degenerating into a non-unique solution at zero pre-period RMSE.

Headline theoretical finding (paper Section 1, MSE decomposition). Under A1-A4, the post-treatment MSE of any dGSC estimator decomposes as

\[\mathrm{MSE}_{\text{post}} \;=\; \underbrace{\text{oracle bias}}_{\text{common to all SC}} \;+\; \underbrace{\text{restriction bias}}_{\text{from forced aggregation}} \;+\; \underbrace{\text{estimation error}}_{\text{weight variance}} \;+\; \underbrace{\text{post-period noise}}_{\text{irreducible}}.\]

The classical SC at \(\lambda \to \infty\) minimises estimation error but pays the restriction bias; the fully disaggregated SC at \(\lambda = 0\) removes the restriction bias but inflates estimation error from the wider donor pool. The mlSC penalty traces the bias-variance frontier between the two, with the data-driven \(\hat\lambda\) picking the sweet spot. Bottmer emphasises this is not a standard bias-variance trade- off – it is a flexibility-vs-noise-sensitivity trade-off, which is why the heuristic scales as \(\sigma_\varepsilon^2 / \sigma_y^2\) (the disaggregate-noise share of total variance).

When the assumptions bind: practical diagnostics#

  1. Linear aggregation with known weights (A1). The aggregate outcome must be a known linear function of the disaggregate outcomes. Some outcomes don’t aggregate cleanly – a state-level unemployment rate, for instance, is a ratio whose state-level value is not the population-weighted mean of county-level ratios.

    Plausibly violated when the outcome is a ratio (rate, share, log-of-aggregate), an order statistic (median income), or a normalised index. Diagnostic: compute \(\sum_c v_{sc} Y_{sct}\) directly from df_disagg and compare to the corresponding row of df_agg; if the two disagree by more than rounding noise for any (s, t), A1 is failing. Re-derive the disaggregate outcome so that the aggregation is exact (e.g. work with county-level employment counts, not unemployment rates), or move to canonical SC on the aggregate panel alone.

  2. Binary, sharp, absorbing treatment (A2). Multiple treated aggregates, treatment that turns off, or continuous dose break the identification.

    Plausibly violated when the policy is rescinded mid- panel, when several states adopted at different times, or when the treatment intensity varies. Diagnostic: the config validator enforces single-cohort by construction; if you have to silently coerce a staggered-adoption panel into a single cohort to get past validation, that is a structural warning. Use FECT / Synthetic Difference-in-Differences (SDID) for staggered adoption, Continuous-Treatment Synthetic Control (CTSC) for continuous dose.

  3. SUTVA at the disaggregate level (A3). Disaggregate units within donor states must not interfere with each other in a way that the aggregation washes out.

    Plausibly violated when counties within a donor state share a labour market, a school district, or a media market that an exposed county can influence. Diagnostic: split each donor state into core / periphery counties by a geographic or economic criterion, refit, and check whether the implied \(\omega_{sc}\) and pre-RMSE move substantially. If they do, the within-state interference is doing real work and you need a spillover-aware setup (Spillover-Aware Synthetic Control (SPILLSYNTH), Spatial Synthetic Difference-in-Differences (SpSyDiD)) which dispenses with the disaggregate SUTVA.

  4. Hierarchical latent-factor outcome model (A4). The Appendix-G variance decomposition that feeds \(\hat\lambda\) is derived under a specific state-effect + county-within-state + iid shock structure. If the disaggregate variation is not well-described by a clean state-level component plus a uniform-variance idiosyncratic shock, the heuristic mis-prices the flexibility-vs-noise trade-off.

    Plausibly violated when some counties are systematically noisier than others (urban vs rural variance pattern), or when within-state cross-county correlation is high (a state-wide industry shock hitting every county similarly). Diagnostic: read results.design.sigma_eps2 and results.design.sigma_y2 and refit with lambda_est="fixed" swept over a grid spanning 2 * sigma_eps2 / sigma_y2 by an order of magnitude in each direction; if the implied ATT moves by more than its bootstrap SE across that range, the heuristic is unstable on your panel and you should switch to "cross-validation over time" once that selector ships.

  5. Convex-hull feasibility with disaggregate donors (A5). Expanding the donor pool by disaggregation usually makes the hull easier to satisfy, but in pathological cases – a coastal mega-state matched only by tiny interior counties – the wider pool may still miss the treated aggregate.

    Plausibly violated when results.pre_rmse is large and results.aggregate_donor_weights is highly concentrated on a single donor state (the penalty is pulling toward the classical SC but the classical SC is itself missing). Diagnostic: sweep lambda_est="fixed" over a wide grid (the existing example block in the docs shows the recipe); if pre-RMSE stays large at \(\lambda = 0\), the disaggregate hull is structurally missing the treated aggregate and you need Imperfect Synthetic Controls (ISCM) (which identifies the effect even when the treated unit is outside the hull).

  6. Aggregation weights stable over time. A1 implicitly assumes the \(v_{sc}\) are fixed across \(t\). If population shares drift substantially within the analysis window, the disaggregate-to-aggregate map itself becomes a moving target.

    Plausibly violated when the analysis window spans a decade or more of demographic change (urban-rural migration; county boundary changes). Diagnostic: re-run with weight_col set to a time-averaged share vs an end-of-period share; if the ATT moves, the drift is binding. Use the end-of-period share if the post-period weighting is what the policy question targets, or interpolate \(v_{sc}\) over time and feed in a time-varying weight column (not currently supported in v1; a roadmap item).

When to use mlSC – and when not to#

Reach for mlSC when:

  • You have disaggregate data underneath an aggregate-level policy: county outcomes beneath a state-wide tax change, store-level outcomes beneath a chain-wide pricing change, household outcomes beneath a regional-government program. The estimand stays at the aggregate level; the disaggregate data is information to be exploited.

  • The canonical SC pre-fit at the aggregate level is mediocre – the small number of aggregates makes the hull thin and the pre-period RMSE elevated. Disaggregating the donor pool widens the hull by an order of magnitude (counties vs states), and the penalty keeps the wider hull from degenerating into a non-unique solution.

  • You want a single data-driven aggregation choice rather than the manual “do I use state or county donors?” call. The heuristic \(\hat\lambda\) and the (forthcoming) cross-validation-over-time selector turn that judgment into a tunable parameter.

  • Pre-period \(T_0\) is short enough that classical SC overfits but long enough that the heuristic-variance decomposition is well-identified (a dozen pre-periods or more, per Bottmer’s Section 6 calibration).

Do not use mlSC when:

  • You only have aggregate-level data. mlSC requires df_disagg with at least a handful of disaggregate units per donor aggregate. Without it, fall back to canonical canonical SCM / Two-Step Synthetic Control / Forward Difference-in-Differences (FDID).

  • Treatment is assigned at the disaggregate level. The paper enforces aggregate-level assignment (A2); if the policy hits a single county rather than a whole state, you want a disaggregate-target estimator (canonical SCM, MicroSynth (User-Level Balancing SC) for individual-level treatment with covariate balancing).

  • The aggregate outcome isn’t a linear function of disaggregate outcomes. Ratios, order statistics, and log-of-aggregate outcomes break A1. Either re-derive the outcome so the aggregation is exact, or move to canonical SC.

  • The treated aggregate is structurally outside the disaggregate hull. Even at \(\lambda = 0\) the pre-period RMSE stays large – the wider hull doesn’t help because the treated unit’s level is unreachable by any convex combination of donor disaggregates. Use Imperfect Synthetic Controls (ISCM), whose A2(b) mechanism identifies the effect through donors that use the treated aggregate as a positive-weight donor.

  • You need within-aggregate treatment effect heterogeneity. mlSC targets the aggregate ATT. The paper notes (Introduction) that disaggregating the treated unit (dGSC-DA / dGSC) is the door to heterogeneity but typically worsens out-of-sample performance for the aggregate-level effect. For genuinely heterogeneous-effects questions, run county-by-county canonical SC instead.

  • Outcomes are extremely noisy at the disaggregate level. When the within-state idiosyncratic shock variance dominates the total variance, the disaggregate data is more noise than signal and the heuristic pushes \(\lambda \to \infty\) (collapsing mlSC to classical SC). In that regime use canonical SC directly to skip the extra bookkeeping.

  • Staggered adoption across multiple aggregates. Single-cohort is the paper’s main scope; for staggered designs use FECT or Synthetic Difference-in-Differences (SDID).

  • Spillovers within a donor aggregate. A3 fails if counties within a donor state influence each other in a way the aggregation cannot wash out. Use Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD).

  • Distributional questions (Lorenz curves, QTEs). mlSC targets the mean aggregate ATT; for distributional effects use Distributional Synthetic Control (DSC).

  • Continuous treatment. Use Continuous-Treatment Synthetic Control (CTSC).

  • You need a sparse, hand-interpretable single weight vector for policy storytelling. mlSC’s disaggregate weight matrix \(\omega_{sc}\) is intrinsically a many-county object; you can report the implied aggregate weights results.aggregate_donor_weights (Bottmer’s \(w_s = \sum_c \omega_{sc}\)) but those are derived rather than directly optimised, and at small \(\lambda\) they will not be sparse. If the headline story has to be “California ≈ 0.4 Utah + 0.3 Montana + 0.2 Nevada”, canonical SC remains the cleaner deliverable.

Core API#

Multi-Level Synthetic Control (mlSC) estimator.

Implements:

Bottmer, L. (2025). “Synthetic Control with Disaggregated Data.” Stanford University job-market paper.

Reference Python package: multi-levelSC (Apache-2.0), reimplemented here under mlsynth’s MIT license following the SCDI / SPCD / LEXSCM / TASC architectural conventions.

The mlSC estimator solves a hierarchical-aggregation regularized SC problem on a two-level panel (e.g. state-level treated unit + county-level controls). Treatment is assigned at the aggregate level; the disaggregate data enters as an enlarged donor pool, with a ridge-type penalty that shrinks each disaggregate weight omega_sc toward v_sc * w_s (population-share times implied aggregate weight). At lambda -> infinity the estimator recovers classical SC; at lambda = 0 it recovers the fully- disaggregated control SC (dGSC-AD in the paper’s notation). v1 implements the Section 5.2 heuristic lambda = 2 * sigma_eps^2 / sigma_y^2 and a user-supplied fixed lambda.

class mlsynth.estimators.mlsc.MLSC(config: MLSCConfig | dict)#

Bases: object

Multi-Level Synthetic Control (mlSC) estimator.

Operates on two long-form DataFrames: one aggregate-level (df_agg, e.g. state by time) and one disaggregate-level (df_disagg, e.g. county by time with an agg_id column mapping each county to its state). Treatment is assigned at the aggregate level. The optimization weights all disaggregate control units to minimize pre-treatment fit on the aggregate treated series, with a hierarchical penalty that shrinks toward classical SC.

Parameters:

config (MLSCConfig or dict) – Configuration object. See mlsynth.config_models.MLSCConfig.

Returns:

MLSCResults – Container with the optimal disaggregate weights, implied aggregate weights, counterfactual path, ATT, and pre-period RMSE.

Notes

Both DataFrames must share the same outcome, time, and treat column names and must agree on treatment timing. The treated row in df_disagg must map (via agg_id) to the treated aggregate unit in df_agg.

References

Bottmer, L. (2025). Synthetic Control with Disaggregated Data. Stanford University job-market paper.

fit() MLSCResults#

Run the mlSC pipeline and return the learned design.

Configuration#

class mlsynth.config_models.MLSCConfig(*, df_agg: ~pandas.DataFrame, df_disagg: ~pandas.DataFrame, outcome: str, time: str, treat: str, unitid_agg: str, unitid_disagg: str, agg_id: str, weight_col: str | None = None, lambda_est: ~typing.Literal['heuristic', 'fixed'] = 'heuristic', lambda_val: ~typing.Annotated[float, ~annotated_types.Ge(ge=0)] = 0.0001, solver: ~typing.Any = None, display_graphs: bool = True, save: bool | str | ~typing.Dict[str, str] = False, counterfactual_color: str | ~typing.List[str] = <factory>, treated_color: str = 'black')#

Configuration for the Multi-Level Synthetic Control (mlSC) estimator.

Implements the data-driven hierarchical-aggregation estimator of Bottmer (2025, “Synthetic Control with Disaggregated Data”). Unlike the rest of mlsynth’s estimators, mlSC operates on a two-level panel: an aggregate-level DataFrame (e.g. state-by-time) and a disaggregate-level DataFrame (e.g. county-by-time, with a column linking each county to its parent state). Treatment is assigned at the aggregate level and the estimand is the aggregate-level ATT, but the disaggregate data enters the donor pool with a ridge-type penalty that shrinks the disaggregated weights toward “population-share times aggregate weight” — recovering classical SC in the large-penalty limit and fully-disaggregated SC at penalty zero.

v1 implements only the heuristic and fixed-lambda penalty-selection paths from Section 5.2 of the paper; cross-validation will follow.

class Config#
arbitrary_types_allowed = True#
extra = 'forbid'#
agg_id: str#
counterfactual_color: str | List[str]#
df_agg: DataFrame#
df_disagg: DataFrame#
display_graphs: bool#
lambda_est: Literal['heuristic', 'fixed']#
lambda_val: float#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

outcome: str#
save: bool | str | Dict[str, str]#
solver: Any#
time: str#
treat: str#
treated_color: str#
unitid_agg: str#
unitid_disagg: str#
weight_col: str | None#

Helper Modules#

Two-DataFrame data preparation for mlSC.

Wraps datautils.dataprep once on each panel and assembles the matrices the rest of the pipeline consumes. Validation enforces alignment between the aggregate and disaggregate panels: matching treatment timing, complete agg_id coverage, and population weights that sum to 1 within each aggregate (or default to uniform).

mlsynth.utils.mlsc_helpers.setup.prepare_mlsc_inputs(df_agg: DataFrame, df_disagg: DataFrame, outcome: str, time: str, treat: str, unitid_agg: str, unitid_disagg: str, agg_id: str, weight_col: str | None) MLSCInputs#

Validate, pivot, and stack the two-level panel for mlSC.

Parameters:
  • df_agg, df_disagg (pd.DataFrame) – Long-form aggregate and disaggregate panels.

  • outcome, time, treat (str) – Column names shared by both panels.

  • unitid_agg, unitid_disagg, agg_id, weight_col (str) – Aggregate and disaggregate identifier columns, the disaggregate-> aggregate mapping column, and an optional population-weights column.

Returns:

MLSCInputs – Pre-processed matrices, weight vector, block-index map, and metadata.

Variance decomposition for the mlSC heuristic penalty.

Implements Appendix G of Bottmer (2025): under the simplified hierarchical random-effects model

Y_sct = alpha_s + eta_sc + eps_sct,

the unit-level pre-treatment means yield consistent estimates of sigma_eps^2 and sigma_y^2. The mlSC heuristic lambda = 2 * sigma_eps^2 / sigma_y^2 (Section 5.2) is built from these.

The decomposition is computed only over the control aggregates — the treated aggregate is excluded, matching the upstream package and the paper’s description (“taking the average estimated variance across all other aggregated units except for the treated unit”).

mlsynth.utils.mlsc_helpers.variance.estimate_variance_components(inputs: MLSCInputs) Tuple[float, float]#

Return (sigma_eps2, sigma_y2) per Appendix G of Bottmer (2025).

Parameters:

inputs (MLSCInputs) – Pre-processed mlSC inputs. Uses Y_disagg_pre_full, disagg_to_agg_full, and treated_agg_idx_full to run the decomposition over the full disaggregate pre-treatment panel, excluding the treated aggregate.

Returns:

  • sigma_eps2 (float) – Estimated within-disaggregate-unit noise variance, averaged across non-treated aggregates.

  • sigma_y2 (float) – Estimated total disaggregate-outcome variance, averaged across non-treated aggregates.

mlsynth.utils.mlsc_helpers.variance.heuristic_lambda(sigma_eps2: float, sigma_y2: float) float#

Closed-form heuristic lambda = 2 * sigma_eps^2 / sigma_y^2.

Block-diagonal hierarchical-aggregation penalty matrix for mlSC.

The mlSC objective contains the ridge-type penalty

sum_{s, c} (omega_sc - v_sc * w_s)^2, w_s = sum_c omega_sc.

For each aggregate block s with population weights v_s in R^{C_s} satisfying 1^T v_s = 1, the within-block penalty equals omega_s^T Q_s omega_s where

Q_s = (I - v_s 1^T)^T (I - v_s 1^T)

= I - v_s 1^T - 1 v_s^T + ||v_s||^2 * 1 1^T.

The full penalty matrix Q in R^{M x M} is block-diagonal with these Q_s blocks. Q_s is positive semidefinite (kernel: v_s), so the mlSC objective remains a convex QP.

mlsynth.utils.mlsc_helpers.penalty.build_block(v_s: ndarray) ndarray#

Return the single-block penalty matrix Q_s for one aggregate.

mlsynth.utils.mlsc_helpers.penalty.build_penalty_matrix(v_population: ndarray, disagg_to_agg: ndarray) ndarray#

Assemble the full block-diagonal Q matrix.

Parameters:
  • v_population (np.ndarray) – Length-M population weights for each disaggregate column.

  • disagg_to_agg (np.ndarray) – Length-M integer block-membership index for each column.

Returns:

np.ndarray – Block-diagonal Q of shape (M, M).

MlSC QP solver.

Solves Equation 5.2 of Bottmer (2025):

min over omega in R^M
    || Y_agg_treated_pre  -  X_disagg_pre @ omega ||^2
    + lambda * sigma_y^2 * omega^T Q omega
s.t.  1^T omega = 1,   omega >= 0.

The objective is a convex QP. We seed cvxpy with the warm-start trick the upstream package uses (Bottmer’s reference implementation): solve the classical aggregate-level SC first, then expand each aggregate weight w_s proportionally to the population shares v_s to produce a feasible disaggregate starting point. This typically takes the SCS solver a handful of iterations to reach optimality even on noisy panels.

mlsynth.utils.mlsc_helpers.optimization.solve_mlsc(inputs: MLSCInputs, Q: ndarray, lambda_val: float, sigma_y2: float, solver: Any = None) Tuple[ndarray, ndarray, str]#

Solve the mlSC QP.

Parameters:
  • inputs (MLSCInputs) – Pre-processed two-level panel.

  • Q (np.ndarray) – Block-diagonal penalty matrix of shape (M, M).

  • lambda_val (float) – Penalty value (already chosen via heuristic or fixed).

  • sigma_y2 (float) – Outcome variance, used as the penalty’s scale factor so that lambda is scale-invariant as in the paper.

  • solver (Any) – cvxpy solver. None -> cp.SCS.

Returns:

  • omega (np.ndarray) – Optimal disaggregate weights, length M.

  • aggregate_weights (np.ndarray) – Implied w_s = sum_c omega_sc, length S.

  • status (str) – cvxpy solver status string.

Counterfactual path and effect summaries for mlSC.

The aggregate-level counterfactual is just X_disagg @ omega evaluated across all T periods. The pre-period RMSE and post-period mean gap come from comparing this trajectory against the observed aggregate treated outcome.

mlsynth.utils.mlsc_helpers.inference.counterfactual_path(inputs: MLSCInputs, omega: ndarray) MLSCInference#

Build the counterfactual and gap series.

mlsynth.utils.mlsc_helpers.inference.summarize_effects(inputs: MLSCInputs, inference: MLSCInference) Tuple[float, float]#

Return (att, pre_rmse) for the aggregate treated unit.

Plotting helper for mlSC.

Wraps mlsynth.utils.resultutils.plot_estimates so we get the standard observed-vs-counterfactual chart, where “observed” is the aggregate treated series and “counterfactual” is X_disagg @ omega.

mlsynth.utils.mlsc_helpers.plotter.plot_mlsc(results: MLSCResults, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | dict | str = False, time_axis_label: str = 'Time', outcome_label: str = 'Outcome', treatment_label: str = 'Treatment', unit_label: str = 'Unit') None#

Render the mlSC aggregate counterfactual against the observed series.

Structured containers for the mlSC pipeline.

All matrices follow mlsynth’s standard (T, N) orientation (rows = time, columns = unit), matching datautils.dataprep.

class mlsynth.utils.mlsc_helpers.structures.MLSCDesign(omega: ndarray, aggregate_weights: ndarray, lambda_used: float, sigma_eps2: float, sigma_y2: float, lambda_est: str, solver_status: str)#

Optimization output and selected hyperparameters.

Parameters:
  • omega (np.ndarray) – Optimal disaggregate weights, length M. Satisfies sum(omega) = 1 and omega >= 0.

  • aggregate_weights (np.ndarray) – Implied aggregate weights w_s = sum_c omega_sc for each control aggregate, length S.

  • lambda_used (float) – Penalty value actually applied.

  • sigma_eps2 (float) – Estimated noise variance from Appendix G.

  • sigma_y2 (float) – Estimated outcome variance from Appendix G.

  • lambda_est (str) – Selection rule that was used ('heuristic' or 'fixed').

  • solver_status (str) – cvxpy solver status string.

aggregate_weights: ndarray#
lambda_est: str#
lambda_used: float#
omega: ndarray#
sigma_eps2: float#
sigma_y2: float#
solver_status: str#
class mlsynth.utils.mlsc_helpers.structures.MLSCInference(counterfactual: ndarray, gap: ndarray)#

Counterfactual path and post-period gap.

Parameters:
  • counterfactual (np.ndarray) – Estimated aggregate counterfactual X_disagg @ omega, length T.

  • gap (np.ndarray) – Observed minus counterfactual, length T.

counterfactual: ndarray#
gap: ndarray#
class mlsynth.utils.mlsc_helpers.structures.MLSCInputs(Y_agg_treated: ndarray, X_disagg: ndarray, v_population: ndarray, disagg_to_agg: ndarray, agg_labels: Sequence, disagg_labels: Sequence, Y_disagg_pre_full: ndarray, disagg_to_agg_full: ndarray, treated_agg_idx_full: int, T: int, T0: int, treated_unit_name: str, time_labels: ndarray, Ywide_agg: object, outcome: str)#

Pre-processed data fed into the mlSC optimization and inference loops.

Parameters:
  • Y_agg_treated (np.ndarray) – Observed aggregate-level treated outcome series, length T.

  • X_disagg (np.ndarray) – Disaggregate-control outcome matrix of shape (T, M) where M is the total number of disaggregate control units (sum of C_s across all non-treated aggregates). Columns are ordered first by aggregate label and then by disaggregate label, with this ordering preserved in disagg_to_agg and v_population.

  • v_population (np.ndarray) – Population aggregation weights v_sc for each disaggregate control column, length M. Within each aggregate block the entries sum to 1.

  • disagg_to_agg (np.ndarray) – Length-M integer array giving the aggregate-block index for each disaggregate column. Aggregate blocks are indexed 0..S-1.

  • agg_labels (Sequence) – Labels of the S control aggregates in block order.

  • disagg_labels (Sequence) – Labels of the M disaggregate-control columns in column order.

  • Y_disagg_pre_full (np.ndarray) – Pre-treatment disaggregate outcome matrix for all aggregates (including the treated one), shape (T0, M_full). Used by the Appendix-G variance decomposition.

  • disagg_to_agg_full (np.ndarray) – Length-M_full aggregate-index assignment matching Y_disagg_pre_full columns.

  • treated_agg_idx_full (int) – Index of the treated aggregate within Y_disagg_pre_full blocks (used to exclude it from the variance estimate).

  • T (int) – Total number of time periods.

  • T0 (int) – Number of pre-treatment periods.

  • treated_unit_name (str) – Label of the treated aggregate.

  • time_labels (np.ndarray) – Time labels in original order, length T.

  • Ywide_agg (object) – Wide aggregate-level outcome frame (rows = time, columns = aggregate unit) preserved from dataprep for downstream plotting.

  • outcome (str) – Outcome variable name.

property M: int#

Number of disaggregate control columns.

property S: int#

Number of control aggregate units.

T: int#
T0: int#
X_disagg: ndarray#
Y_agg_treated: ndarray#
Y_disagg_pre_full: ndarray#
Ywide_agg: object#
agg_labels: Sequence#
disagg_labels: Sequence#
disagg_to_agg: ndarray#
disagg_to_agg_full: ndarray#
outcome: str#
time_labels: ndarray#
treated_agg_idx_full: int#
treated_unit_name: str#
v_population: ndarray#
class mlsynth.utils.mlsc_helpers.structures.MLSCResults(inputs: MLSCInputs, design: MLSCDesign, inference: MLSCInference, att: float, pre_rmse: float, donor_weights: dict, aggregate_donor_weights: dict)#

Public MLSC.fit() return container.

Parameters:
  • inputs (MLSCInputs) – Pre-processed two-level panel.

  • design (MLSCDesign) – Optimization outputs and variance-decomposition diagnostics.

  • inference (MLSCInference) – Counterfactual path and gap.

  • att (float) – Mean post-period gap.

  • pre_rmse (float) – Pre-period RMSE of the aggregate counterfactual against the observed treated series.

  • donor_weights (dict) – Mapping from disaggregate-unit label to its weight. Convenience view of design.omega.

  • aggregate_donor_weights (dict) – Mapping from aggregate-unit label to its implied weight w_s = sum_c omega_sc. Convenience view of design.aggregate_weights.

aggregate_donor_weights: dict#
att: float#
design: MLSCDesign#
donor_weights: dict#
inference: MLSCInference#
inputs: MLSCInputs#
pre_rmse: float#

The README hierarchical-factor DGP from the author’s reference package, packaged as simulate_mlsc_sample so the Verification replication runs as a one-liner.

README data-generating process for Bottmer (2025) mlSC.

Implements the simulation distributed with the author’s reference package (multi-level-sc-estimator/README.md): a hierarchical linear factor model with state-level loadings and within-state county deviations.

\[y_{s, c, t} = (\alpha_s + \eta_{sc}) \cdot f_t + \varepsilon_{sct},\]

with

  • \(f_t \sim \mathcal{N}(0, \sigma_{\text{time}}^2)\) — one shared time factor;

  • \(\alpha_s \sim \mathcal{N}(0, \sigma_{\text{state}}^2)\) — state-level loading;

  • \(\eta_{sc} \sim \mathcal{N}(0, \sigma_{\text{county}}^2)\) — within-state county deviation;

  • \(\varepsilon_{sct} \sim \mathcal{N}(0, \sigma_{\text{eps}}^2)\) — idiosyncratic shock.

Aggregation to the state level uses equal within-state weights (\(v_{sc} = 1 / C_s\)), so \(y_{st} = (1 / C_s) \sum_c y_{sct}\). The default README parameters are \(N_s = 10\), \(C_s = 10\), \(T = 20\), \((\sigma_{\text{time}}, \sigma_{\text{state}}, \sigma_{\text{county}}, \sigma_{\text{eps}}) = (1.0, 0.8, 0.5, 0.3)\).

The simulation never adds a treatment effect — the true ATT is exactly zero, so the Monte Carlo target is unbiasedness of the estimated ATT and value-for-value agreement with the reference implementation.

class mlsynth.utils.mlsc_helpers.simulation.MLSCSample(df_agg: DataFrame, df_disagg: DataFrame, data_s: ndarray, data_c: ndarray, n_c: ndarray, w_c: List[ndarray], idx: int, t: int)#

One draw from the README’s hierarchical-factor DGP.

df_agg, df_disagg

Long panels ready for mlsynth.MLSC. Columns: state / county (disagg only) / time / y / treated.

Type:

pd.DataFrame

data_s#

Aggregate-level outcome matrix, shape (N_states, T).

Type:

np.ndarray

data_c#

Disaggregate-level outcome matrix, shape (N_states * C_s, T).

Type:

np.ndarray

n_c#

Counties per state, shape (N_states,).

Type:

np.ndarray

w_c#

Within-state population weights (each array sums to one). One entry per state, in the order the reference impl expects.

Type:

list of np.ndarray

idx#

Treated state index.

Type:

int

t#

Treated period index (0-based; the README uses t = T - 1).

Type:

int

data_c: ndarray#
data_s: ndarray#
df_agg: DataFrame#
df_disagg: DataFrame#
idx: int#
n_c: ndarray#
t: int#
w_c: List[ndarray]#
mlsynth.utils.mlsc_helpers.simulation.simulate_mlsc_sample(N_states: int = 10, counties_per_state: int = 10, T: int = 20, sigma_time: float = 1.0, sigma_state: float = 0.8, sigma_county: float = 0.5, sigma_eps: float = 0.3, treated_idx: int = 0, treated_t: int | None = None, rng: Generator | None = None) MLSCSample#

Draw one sample from the Bottmer (2025) README DGP.

Parameters:
  • N_states (int, default 10) – Number of aggregate units (e.g. states).

  • counties_per_state (int, default 10) – Number of disaggregate units inside each aggregate unit. Constant across states in the README; pass a per-state count via the reference implementation if you need imbalance.

  • T (int, default 20) – Total number of periods.

  • sigma_time, sigma_state, sigma_county, sigma_eps (float) – Standard deviations for the time factor, state-level loading, within-state county deviation, and idiosyncratic shock.

  • treated_idx (int, default 0) – Aggregate unit assigned to treatment.

  • treated_t (int or None) – First treated period. Defaults to T - 1 (the README’s choice).

  • rng (np.random.Generator, optional) – NumPy RNG. Defaults to np.random.default_rng().

Returns:

MLSCSample

Example#

import pandas as pd
from mlsynth import MLSC

# Two long-form panels sharing outcome / time / treat column names.
#
#   df_agg     — one row per (state, year): the aggregate panel.
#   df_disagg  — one row per (county, year): the disaggregate panel.
#                Must contain an 'agg_id' column mapping each county
#                to its parent state, and may optionally carry a
#                'population' column for non-uniform v_sc.

df_agg    = pd.read_csv("state_panel.csv")
df_disagg = pd.read_csv("county_panel.csv")

config = {
    "df_agg":         df_agg,
    "df_disagg":      df_disagg,
    "outcome":        "sales",
    "time":           "year",
    "treat":          "treated",        # binary 0/1, equal at both levels
    "unitid_agg":     "state",          # column in df_agg
    "unitid_disagg":  "county_fips",    # column in df_disagg
    "agg_id":         "state",          # column in df_disagg
    "weight_col":     "population",     # optional; defaults to 1 / C_s
    "lambda_est":     "heuristic",      # or "fixed" with lambda_val
    "display_graphs": True,
}

results = MLSC(config).fit()

# Point estimate, fit diagnostic, learned penalty.
print(results.att)                          # mean post-period aggregate gap
print(results.pre_rmse)                     # pre-period RMSE
print(results.design.lambda_used)           # selected lambda
print(results.design.sigma_eps2,
      results.design.sigma_y2)              # Appendix-G variance estimates

# Disaggregate and implied aggregate weights.
print(results.donor_weights)                # {county_fips: omega_sc}
print(results.aggregate_donor_weights)      # {state:        w_s = sum_c omega_sc}

# Counterfactual trajectory (length T) and the observed - counterfactual gap.
cf  = results.inference.counterfactual
gap = results.inference.gap

# Sensitivity: sweep lambda by hand.
for lam in [0.0, 1e-2, 1e-1, 1.0, 10.0, 1e6]:
    r = MLSC({**config, "lambda_est": "fixed", "lambda_val": lam}).fit()
    print(f"lambda={lam:>8.4f}  ATT={r.att:.4f}  preRMSE={r.pre_rmse:.4f}")

Verification#

Empirical replication against the author’s reference code (Path A) plus a Monte Carlo unbiasedness check (Path B). Bottmer’s published empirical application — the effect of a state minimum-wage policy change on county-level teen employment in Iowa — uses the Quarterly Workforce Indicators (QWI) panel that ships with her reference package (tests/ia_emp_app_teen_empl.csv). The simulation block in her README spells out a hierarchical linear-factor DGP that’s free of proprietary data and is the design mlSC_estimator is calibrated against. Both validations are wired in below.

The DGP is packaged in mlsynth.utils.mlsc_helpers.simulation.simulate_mlsc_sample():

\[y_{s, c, t} = (\alpha_s + \eta_{sc}) \cdot f_t + \varepsilon_{sct}, \quad y_{s, t} = \frac{1}{C_s} \sum_c y_{s, c, t},\]

with \(f_t \sim \mathcal{N}(0, 1)\), \(\alpha_s \sim \mathcal{N}(0, 0.8^2)\), \(\eta_{sc} \sim \mathcal{N}(0, 0.5^2)\), and \(\varepsilon_{sct} \sim \mathcal{N}(0, 0.3^2)\) at the README defaults \(N_s = 10\), \(C_s = 10\), \(T = 20\). The simulation never adds a treatment effect, so the true ATT is exactly zero and the Monte Carlo target is unbiasedness.

Path A: value-for-value vs the reference code (one draw)#

On the README’s exact panel (np.random.default_rng(42)), mlsynth’s MLSC reproduces both the selected penalty and the reported ATT to solver tolerance:

import numpy as np
from mlsynth import MLSC
from mlsynth.utils.mlsc_helpers.simulation import simulate_mlsc_sample
from multi_level_sc_estimator.mlSC import mlSC_estimator       # reference

s = simulate_mlsc_sample(rng=np.random.default_rng(42))
tau_ref, lam_ref, _ = mlSC_estimator(
    s.data_s, s.data_c, s.idx, s.n_c, s.t, s.w_c,
    lambda_est="heuristic")
res = MLSC({"df_agg": s.df_agg, "df_disagg": s.df_disagg,
             "outcome": "y", "time": "time", "treat": "treated",
             "unitid_agg": "state", "unitid_disagg": "county",
             "agg_id": "state", "lambda_est": "heuristic",
             "display_graphs": False}).fit()
print(f"REFERENCE  tau_hat = {tau_ref:+.6f}  lambda = {lam_ref:.6f}")
print(f"MLSYNTH    tau_hat = {res.att:+.6f}  "
      f"lambda = {res.design.lambda_used:.6f}")

prints:

REFERENCE  tau_hat = +0.011928  lambda = 1.970185
MLSYNTH    tau_hat = +0.011930  lambda = 1.970185

Differences: \(|\hat\tau_{\text{mlsynth}} - \hat\tau_{\text{ref}}| = 1.5 \times 10^{-6}\), \(|\hat\lambda_{\text{mlsynth}} - \hat\lambda_{\text{ref}}| = 4.4 \times 10^{-16}\) — i.e. solver noise on the ATT and machine precision on the penalty.

Path B: unbiasedness across 200 draws#

Looping the comparison across \(M = 200\) draws confirms (a) both estimators are unbiased for the true zero ATT, and (b) they remain in near-machine agreement throughout:

def one_rep(seed):
    s = simulate_mlsc_sample(rng=np.random.default_rng(seed))
    tau_ref, _, _ = mlSC_estimator(
        s.data_s, s.data_c, s.idx, s.n_c, s.t, s.w_c,
        lambda_est="heuristic")
    res = MLSC({"df_agg": s.df_agg, "df_disagg": s.df_disagg,
                 "outcome": "y", "time": "time", "treat": "treated",
                 "unitid_agg": "state", "unitid_disagg": "county",
                 "agg_id": "state", "lambda_est": "heuristic",
                 "display_graphs": False}).fit()
    return float(tau_ref), float(res.att)

import numpy as np
tau_pairs = np.array([one_rep(s) for s in range(200)])
tau_ref, tau_ml = tau_pairs[:, 0], tau_pairs[:, 1]
print(f"reference: mean={tau_ref.mean():+.4f}  std={tau_ref.std(ddof=1):.4f}  "
      f"RMSE={np.sqrt((tau_ref**2).mean()):.4f}")
print(f"mlsynth  : mean={tau_ml.mean():+.4f}   std={tau_ml.std(ddof=1):.4f}  "
      f"RMSE={np.sqrt((tau_ml**2).mean()):.4f}")
print(f"max |Δ| = {np.abs(tau_ml - tau_ref).max():.2e}, "
      f"median |Δ| = {np.median(np.abs(tau_ml - tau_ref)):.2e}")

prints:

reference: mean=-0.0103  std=0.1663  RMSE=0.1662
mlsynth  : mean=-0.0103  std=0.1663  RMSE=0.1662
max |Δ| = 5.76e-04, median |Δ| = 1.05e-05

The Monte Carlo standard error of the mean at \(M = 200\) is \(\sigma / \sqrt{200} \approx 0.012\), so the observed bias of \(-0.010\) is well inside one MC SE of zero — the heuristic \(\hat\lambda\) does not introduce systematic bias, and mlsynth and the reference impl produce statistically identical samples.

Empirical application#

For reference, Bottmer’s empirical application — Iowa’s 2007 minimum- wage hike on county-level teen employment, run on the Quarterly Workforce Indicators panel that ships with her reference package (tests/ia_emp_app_teen_empl.csv) — uses the same heuristic-lambda configuration. Because the panel is public, mlsynth’s MLSC can be pointed at it directly through the two-DataFrame API; see the estimator’s docstring for the data-cleaning conventions the package expects (drop counties with any NaN over the analysis window, mirror the state-level treatment indicator on each contained county, normalize within-state weights to sum to one).

References#

Bottmer, L. (2025). Synthetic Control with Disaggregated Data. Stanford University job-market paper. PDF.

The reference Python implementation by the paper’s author, multi-levelSC (Apache-2.0), inspired several details of this rewrite — in particular the variance-decomposition implementation and the classical-SC warm-start used by the cvxpy solver.

Abadie, A., Diamond, A., and Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490):493-505.