Multi-Level Synthetic Control (mlSC)#
Overview#
Multi-Level Synthetic Control (mlSC) is the estimator proposed by Bottmer (2025) for panels with two levels of aggregation: an aggregate level at which treatment is assigned (e.g. states), and a finer disaggregate level at which outcomes are also observed (e.g. counties within those states). Such panels are common in policy evaluation — state-wide cigarette and minimum-wage policies, for instance, are routinely studied with county-level outcome data — and they raise a question that classical SC does not address: at what level of aggregation should the estimator operate?
Existing practice spans the full range: some studies aggregate every- thing to the state level and apply classical SC; others enlarge the donor pool with all disaggregated control units; still others disaggregate both treated and control sides. Each choice has a different bias / noise profile. The mlSC estimator turns the aggregation choice into a data-driven regularization problem: the disaggregate weights are allowed to vary freely, but a ridge-type penalty shrinks them toward the classical-SC solution, with the penalty strength selected from data.
Two structural properties distinguish mlSC from the rest of the
mlsynth toolkit. First, it operates on two long-form
DataFrames, one at the aggregate level (df_agg) and one at the
disaggregate level (df_disagg), with an agg_id column on
df_disagg mapping each disaggregate unit to its parent aggregate.
Every other estimator in the package takes a single panel. Second,
the entire spectrum from classical SC to fully disaggregated SC is
recovered as limiting cases of a single penalty parameter
\(\lambda\) — at \(\lambda \to \infty\) mlSC reduces to the
classical SC of Abadie, Diamond, and Hainmueller (2010); at
\(\lambda = 0\) it recovers the fully-disaggregated control SC
(dGSC-AD in the paper’s notation); intermediate values pick a
data-driven mixture.
Mathematical Formulation#
Setup#
Let \(s = 0, 1, \dots, S\) index \(S + 1\) aggregate units (e.g. states), with \(s = 0\) the single treated aggregate. Each aggregate \(s\) contains \(C_s\) disaggregate units (e.g. counties), indexed \(c = 1, \dots, C_s\). Time periods are \(t = 1, \dots, T\), with treatment assigned in period \(T_0 + 1\) and absorbing thereafter (treatment never turns off). Let \(Y_{sct}\) denote the disaggregate outcome and \(Y_{st}\) the aggregate outcome. The two are linked by population aggregation weights \(v_{sc}\) summing to one within each aggregate:
The default in mlsynth.config_models.MLSCConfig is uniform
weights \(v_{sc} = 1 / C_s\); non-uniform population weights can
be supplied through the weight_col field.
The dGSC Family and the Aggregation Choice#
Section 4 of the paper introduces the disaggregated general SC
(dGSC) class, parametrized by a weight matrix
\(W_{c_0 s c}\) indexed by treated disaggregate units
\(c_0\), control aggregates \(s\), and control disaggregates
\(c\):
with \(W\) chosen to minimize the pre-treatment squared error subject to convex-hull constraints \(\sum_{s, c} W_{c_0 s c} = 1\) and \(W \geq 0\). The four edge cases discussed in Sections 3-4 of the paper correspond to constraints on \(W\):
dGSC-AA — aggregate treated, aggregate controls (the classical SC of Abadie et al., 2010). All within-block weights are proportional to \(v_{sc}\) and equal across treated disaggregates.
dGSC-AD — aggregate treated, disaggregate controls. The treated unit is kept at the aggregate level (weights equal across \(c_0\)); disaggregate control weights vary freely within convexity.
dGSC-DA — disaggregate treated, aggregate controls.
dGSC — disaggregate on both sides.
Proposition 1 of the paper shows that all four cases share the same
classical-SC objective, differing only in the donor pool implied by
the weight constraints. Section 6 demonstrates empirically that
disaggregating the controls is the primary source of estimation
gains; disaggregating the treated unit alone usually worsens
performance (Jensen’s inequality: matching an average is easier than
matching each component). For this reason, the mlSC estimator
implemented in mlsynth focuses on the aggregate-treated,
flexible-disaggregate-controls regime — the dGSC-AD case — with a
penalty that shrinks toward classical SC.
The mlSC Objective#
With the treated unit kept aggregated (the paper’s preferred variant, Equation 5.2), the weight matrix collapses to a single vector \(\omega_{sc}\) over all disaggregate control units. Let \(w_s = \sum_c \omega_{sc}\) denote the implied aggregate weight on donor unit \(s\). The mlSC estimator solves
The penalty term measures how far the disaggregate weight vector deviates from “population-share times aggregate weight”. When \(\lambda \to \infty\) the penalty forces \(\omega_{sc} = v_{sc} \, w_s\), which collapses the estimator to classical SC at the aggregate level. When \(\lambda = 0\) the disaggregate weights are free to vary within the convex hull, recovering the fully-disaggregated dGSC-AD estimator. The scale factor \(\hat\sigma_y^2\) keeps \(\lambda\) itself scale-invariant across panels.
The Block-Diagonal Penalty Matrix#
Stacking the within-block penalty terms gives a clean quadratic form. For each aggregate \(s\), let \(\omega_s \in \mathbb{R}^{C_s}\) be the within-block weight vector and \(v_s \in \mathbb{R}^{C_s}\) the corresponding population weights. Defining \(A_s = I - v_s \mathbf{1}^\top\), the within-block contribution is
with
The full penalty matrix
\(Q \in \mathbb{R}^{M \times M}\), with \(M = \sum_s C_s\)
the total number of disaggregate control units, is block-diagonal
across \(s\) with blocks \(Q_s\). \(Q_s\) is symmetric
positive semidefinite with \(v_s\) in its kernel (any
weight vector proportional to \(v_s\) incurs zero penalty), so
the mlSC optimization remains a convex QP and admits standard
cvxpy solvers. mlsynth uses SCS by default; this
matches the reference implementation and avoids any commercial-solver
dependency.
Selecting \(\lambda\): Heuristic and Fixed Modes#
Two penalty-selection rules are exposed via
mlsynth.config_models.MLSCConfig.lambda_est:
'heuristic'— the closed-form rule from Section 5.2 of the paper,\[\hat\lambda \;=\; 2 \, \hat\sigma_\varepsilon^2 \,/\, \hat\sigma_y^2,\]derived from the oracle optimal \(\lambda^*\) in a stylized hierarchical random-effects model under \(T = S = C_s = 2\) (Appendix B of the paper). Intuitively, the heuristic imposes a larger penalty when the panel is noisier, since added flexibility is more likely to overfit noise than to extract signal.
'fixed'— use a user-supplied \(\lambda\) directly. Useful for sensitivity analysis (sweep \(\lambda\) and inspect how the aggregate weight pattern shifts between classical-SC and disaggregated-SC regimes) and for reproducing the paper’s grid- search experiments.
The cross-validation-over-time rule from Section 5.2 is not yet
exposed in v1 of the mlsynth implementation; it is on the
short-term roadmap.
Variance Decomposition (Appendix G)#
The heuristic requires plug-in estimates of
\(\sigma_\varepsilon^2\) and \(\sigma_y^2\). Following the
paper’s Appendix G, these are obtained from a simplified hierarchical
random-effects model \(Y_{sct} = \alpha_s + \eta_{sc} +
\varepsilon_{sct}\) fit to the pre-treatment slice of df_disagg.
For each control aggregate \(s\) (the treated aggregate is
excluded so the post-period unobservability does not contaminate the
estimates), define
and
with \(\overline{Y}_{s\cdot\cdot}\) the pre-treatment grand mean
over aggregate \(s\). The plug-in estimates are then the
non-treated averages \(\hat\sigma_\varepsilon^2 =
\mathrm{mean}_s \widehat{\mathrm{Var}}(\varepsilon)_s\) and
\(\hat\sigma_y^2 = \mathrm{mean}_s \widehat{\mathrm{Var}}(y)_s\).
The implementation in
mlsynth.utils.mlsc_helpers.variance.estimate_variance_components()
matches the reference Python package and the paper’s description
to the letter.
Counterfactual and Effect Summaries#
Given the optimal \(\omega\), the aggregate counterfactual is
and the average treatment effect on the treated aggregate is
reported as MLSCResults.att. The pre-period RMSE between
the observed treated series and its mlSC reconstruction,
\(\mathrm{RMSE}_{\text{pre}} = \big[\frac{1}{T_0}
\sum_{t \le T_0} (Y_{0t} - \hat Y_{0t})^2\big]^{1/2}\), is reported
as MLSCResults.pre_rmse.
Two-DataFrame API and Validation#
Because the aggregate-level and disaggregate-level data live at
different units of observation, they are passed as two separate
long-form DataFrames sharing the same outcome, time, and
treat column names. df_disagg additionally carries an
agg_id column mapping each disaggregate unit to its parent
aggregate, and an optional weight_col for non-uniform
\(v_{sc}\). The configuration validator enforces four
cross-DataFrame invariants:
Treatment timing alignment. The pre-period count \(T_0\) implied by
df_aggmust match the one implied bydf_disagg; any disagreement raisesmlsynth.exceptions.MlsynthDataError.Treated-aggregate consistency. Every disaggregate unit with
treat = 1must map (viaagg_id) to the same aggregate unit, which must equal the treated aggregate indf_agg.Single treatment cohort. All treated disaggregate units must share the same treatment start period (treatment-never-turns-off is part of the mlSC framework).
Aggregate-label coverage. Every value of
agg_idindf_disaggmust appear as aunitid_aggindf_agg.
These invariants are enforced once in
mlsynth.utils.mlsc_helpers.setup.prepare_mlsc_inputs(), which
reuses mlsynth.utils.datautils.dataprep() on both panels
(including the disaggregate cohorts shape) and then assembles the
matrices the rest of the pipeline consumes.
Assumptions (Bottmer 2026)#
mlSC inherits the canonical SC identification stack and adds the structural conditions needed for the hierarchical penalty to make sense – and for the heuristic \(\lambda\) to be calibrated. The paper’s formal assumptions:
A1 (linear aggregation). The aggregate outcome is a known weighted average of its disaggregate components, \(Y_{st} = \sum_{c = 1}^{C_s} v_{sc} Y_{sct}\) with \(\sum_c v_{sc} = 1\) for every aggregate. The weights \(v_{sc}\) are observable to the analyst (uniform by default; population shares are the canonical alternative). Required for the estimand at the aggregate level to be a clean function of the disaggregate potential outcomes.
A2 (binary, sharp, absorbing treatment at the aggregate level). A single aggregate unit (\(s = 0\)) is treated; the treatment indicator \(W_{sct}\) is 1 iff \(s = 0\) and \(t > T_0\), and once on it stays on. Equivalently defined at the aggregate level. Multiple treated aggregates are out of scope of the paper’s main theory (the paper discusses staggered designs only as a future direction; mlsynth’s FECT / Synthetic Difference-in-Differences (SDID) are the staggered-aware alternatives).
A3 (SUTVA at the disaggregate level). No interference between disaggregate units. Note this is stronger than canonical-SC SUTVA which is at the aggregate level: mlSC pulls within-state variation directly into the weight fit, so an exposed county influencing a non-exposed county within the same donor state breaks identification at the level mlSC operates on.
A4 (hierarchical latent-factor outcome model – Section 6.1, Appendix G). The untreated disaggregate outcome decomposes as
with a shared set of time factors \(\beta_t\), aggregate-level loadings \(\alpha_s\), and within-aggregate disaggregate deviations \(\eta_{sc}\). This is what licenses the variance- decomposition recipe (Appendix G) feeding the heuristic \(\hat\lambda = 2 \hat\sigma_\varepsilon^2 / \hat\sigma_y^2\). The model also implies the four-way MSE decomposition the paper uses to characterise the central trade-off (oracle bias + restriction bias + estimation error + post-period noise).
A5 (feasibility / convex hull for dGSC-AD). The treated aggregate’s pre-period trajectory lies in the convex hull of the disaggregate donor pool \(\{Y_{sct}\}_{s \ge 1, c}\). With disaggregation, the donor pool is \(\sum_{s \ge 1} C_s\) units wide – typically an order of magnitude larger than the \(S\)-state aggregate hull – so this is easier to satisfy than canonical SC’s hull condition, which is the main empirical motivation for the estimator. The penalty term is what stops the expanded hull from degenerating into a non-unique solution at zero pre-period RMSE.
Headline theoretical finding (paper Section 1, MSE decomposition). Under A1-A4, the post-treatment MSE of any dGSC estimator decomposes as
The classical SC at \(\lambda \to \infty\) minimises estimation error but pays the restriction bias; the fully disaggregated SC at \(\lambda = 0\) removes the restriction bias but inflates estimation error from the wider donor pool. The mlSC penalty traces the bias-variance frontier between the two, with the data-driven \(\hat\lambda\) picking the sweet spot. Bottmer emphasises this is not a standard bias-variance trade- off – it is a flexibility-vs-noise-sensitivity trade-off, which is why the heuristic scales as \(\sigma_\varepsilon^2 / \sigma_y^2\) (the disaggregate-noise share of total variance).
When the assumptions bind: practical diagnostics#
Linear aggregation with known weights (A1). The aggregate outcome must be a known linear function of the disaggregate outcomes. Some outcomes don’t aggregate cleanly – a state-level unemployment rate, for instance, is a ratio whose state-level value is not the population-weighted mean of county-level ratios.
Plausibly violated when the outcome is a ratio (rate, share, log-of-aggregate), an order statistic (median income), or a normalised index. Diagnostic: compute \(\sum_c v_{sc} Y_{sct}\) directly from
df_disaggand compare to the corresponding row ofdf_agg; if the two disagree by more than rounding noise for any(s, t), A1 is failing. Re-derive the disaggregate outcome so that the aggregation is exact (e.g. work with county-level employment counts, not unemployment rates), or move to canonical SC on the aggregate panel alone.Binary, sharp, absorbing treatment (A2). Multiple treated aggregates, treatment that turns off, or continuous dose break the identification.
Plausibly violated when the policy is rescinded mid- panel, when several states adopted at different times, or when the treatment intensity varies. Diagnostic: the config validator enforces single-cohort by construction; if you have to silently coerce a staggered-adoption panel into a single cohort to get past validation, that is a structural warning. Use FECT / Synthetic Difference-in-Differences (SDID) for staggered adoption, Continuous-Treatment Synthetic Control (CTSC) for continuous dose.
SUTVA at the disaggregate level (A3). Disaggregate units within donor states must not interfere with each other in a way that the aggregation washes out.
Plausibly violated when counties within a donor state share a labour market, a school district, or a media market that an exposed county can influence. Diagnostic: split each donor state into core / periphery counties by a geographic or economic criterion, refit, and check whether the implied \(\omega_{sc}\) and pre-RMSE move substantially. If they do, the within-state interference is doing real work and you need a spillover-aware setup (Spillover-Aware Synthetic Control (SPILLSYNTH), Spatial Synthetic Difference-in-Differences (SpSyDiD)) which dispenses with the disaggregate SUTVA.
Hierarchical latent-factor outcome model (A4). The Appendix-G variance decomposition that feeds \(\hat\lambda\) is derived under a specific state-effect + county-within-state + iid shock structure. If the disaggregate variation is not well-described by a clean state-level component plus a uniform-variance idiosyncratic shock, the heuristic mis-prices the flexibility-vs-noise trade-off.
Plausibly violated when some counties are systematically noisier than others (urban vs rural variance pattern), or when within-state cross-county correlation is high (a state-wide industry shock hitting every county similarly). Diagnostic: read
results.design.sigma_eps2andresults.design.sigma_y2and refit withlambda_est="fixed"swept over a grid spanning2 * sigma_eps2 / sigma_y2by an order of magnitude in each direction; if the implied ATT moves by more than its bootstrap SE across that range, the heuristic is unstable on your panel and you should switch to"cross-validation over time"once that selector ships.Convex-hull feasibility with disaggregate donors (A5). Expanding the donor pool by disaggregation usually makes the hull easier to satisfy, but in pathological cases – a coastal mega-state matched only by tiny interior counties – the wider pool may still miss the treated aggregate.
Plausibly violated when
results.pre_rmseis large andresults.aggregate_donor_weightsis highly concentrated on a single donor state (the penalty is pulling toward the classical SC but the classical SC is itself missing). Diagnostic: sweeplambda_est="fixed"over a wide grid (the existing example block in the docs shows the recipe); if pre-RMSE stays large at \(\lambda = 0\), the disaggregate hull is structurally missing the treated aggregate and you need Imperfect Synthetic Controls (ISCM) (which identifies the effect even when the treated unit is outside the hull).Aggregation weights stable over time. A1 implicitly assumes the \(v_{sc}\) are fixed across \(t\). If population shares drift substantially within the analysis window, the disaggregate-to-aggregate map itself becomes a moving target.
Plausibly violated when the analysis window spans a decade or more of demographic change (urban-rural migration; county boundary changes). Diagnostic: re-run with
weight_colset to a time-averaged share vs an end-of-period share; if the ATT moves, the drift is binding. Use the end-of-period share if the post-period weighting is what the policy question targets, or interpolate \(v_{sc}\) over time and feed in a time-varying weight column (not currently supported in v1; a roadmap item).
When to use mlSC – and when not to#
Reach for mlSC when:
You have disaggregate data underneath an aggregate-level policy: county outcomes beneath a state-wide tax change, store-level outcomes beneath a chain-wide pricing change, household outcomes beneath a regional-government program. The estimand stays at the aggregate level; the disaggregate data is information to be exploited.
The canonical SC pre-fit at the aggregate level is mediocre – the small number of aggregates makes the hull thin and the pre-period RMSE elevated. Disaggregating the donor pool widens the hull by an order of magnitude (counties vs states), and the penalty keeps the wider hull from degenerating into a non-unique solution.
You want a single data-driven aggregation choice rather than the manual “do I use state or county donors?” call. The heuristic \(\hat\lambda\) and the (forthcoming) cross-validation-over-time selector turn that judgment into a tunable parameter.
Pre-period \(T_0\) is short enough that classical SC overfits but long enough that the heuristic-variance decomposition is well-identified (a dozen pre-periods or more, per Bottmer’s Section 6 calibration).
Do not use mlSC when:
You only have aggregate-level data. mlSC requires
df_disaggwith at least a handful of disaggregate units per donor aggregate. Without it, fall back to canonical canonical SCM / Two-Step Synthetic Control / Forward Difference-in-Differences (FDID).Treatment is assigned at the disaggregate level. The paper enforces aggregate-level assignment (A2); if the policy hits a single county rather than a whole state, you want a disaggregate-target estimator (canonical SCM, MicroSynth (User-Level Balancing SC) for individual-level treatment with covariate balancing).
The aggregate outcome isn’t a linear function of disaggregate outcomes. Ratios, order statistics, and log-of-aggregate outcomes break A1. Either re-derive the outcome so the aggregation is exact, or move to canonical SC.
The treated aggregate is structurally outside the disaggregate hull. Even at \(\lambda = 0\) the pre-period RMSE stays large – the wider hull doesn’t help because the treated unit’s level is unreachable by any convex combination of donor disaggregates. Use Imperfect Synthetic Controls (ISCM), whose A2(b) mechanism identifies the effect through donors that use the treated aggregate as a positive-weight donor.
You need within-aggregate treatment effect heterogeneity. mlSC targets the aggregate ATT. The paper notes (Introduction) that disaggregating the treated unit (dGSC-DA / dGSC) is the door to heterogeneity but typically worsens out-of-sample performance for the aggregate-level effect. For genuinely heterogeneous-effects questions, run county-by-county canonical SC instead.
Outcomes are extremely noisy at the disaggregate level. When the within-state idiosyncratic shock variance dominates the total variance, the disaggregate data is more noise than signal and the heuristic pushes \(\lambda \to \infty\) (collapsing mlSC to classical SC). In that regime use canonical SC directly to skip the extra bookkeeping.
Staggered adoption across multiple aggregates. Single-cohort is the paper’s main scope; for staggered designs use FECT or Synthetic Difference-in-Differences (SDID).
Spillovers within a donor aggregate. A3 fails if counties within a donor state influence each other in a way the aggregation cannot wash out. Use Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD).
Distributional questions (Lorenz curves, QTEs). mlSC targets the mean aggregate ATT; for distributional effects use Distributional Synthetic Control (DSC).
Continuous treatment. Use Continuous-Treatment Synthetic Control (CTSC).
You need a sparse, hand-interpretable single weight vector for policy storytelling. mlSC’s disaggregate weight matrix \(\omega_{sc}\) is intrinsically a many-county object; you can report the implied aggregate weights
results.aggregate_donor_weights(Bottmer’s \(w_s = \sum_c \omega_{sc}\)) but those are derived rather than directly optimised, and at small \(\lambda\) they will not be sparse. If the headline story has to be “California ≈ 0.4 Utah + 0.3 Montana + 0.2 Nevada”, canonical SC remains the cleaner deliverable.
Core API#
Multi-Level Synthetic Control (mlSC) estimator.
Implements:
Bottmer, L. (2025). “Synthetic Control with Disaggregated Data.” Stanford University job-market paper.
Reference Python package: multi-levelSC (Apache-2.0), reimplemented here
under mlsynth’s MIT license following the SCDI / SPCD / LEXSCM / TASC
architectural conventions.
The mlSC estimator solves a hierarchical-aggregation regularized SC problem
on a two-level panel (e.g. state-level treated unit + county-level
controls). Treatment is assigned at the aggregate level; the disaggregate
data enters as an enlarged donor pool, with a ridge-type penalty that
shrinks each disaggregate weight omega_sc toward v_sc * w_s
(population-share times implied aggregate weight). At lambda -> infinity
the estimator recovers classical SC; at lambda = 0 it recovers the fully-
disaggregated control SC (dGSC-AD in the paper’s notation). v1 implements
the Section 5.2 heuristic lambda = 2 * sigma_eps^2 / sigma_y^2 and a
user-supplied fixed lambda.
- class mlsynth.estimators.mlsc.MLSC(config: MLSCConfig | dict)#
Bases:
objectMulti-Level Synthetic Control (mlSC) estimator.
Operates on two long-form DataFrames: one aggregate-level (
df_agg, e.g. state by time) and one disaggregate-level (df_disagg, e.g. county by time with anagg_idcolumn mapping each county to its state). Treatment is assigned at the aggregate level. The optimization weights all disaggregate control units to minimize pre-treatment fit on the aggregate treated series, with a hierarchical penalty that shrinks toward classical SC.- Parameters:
config (MLSCConfig or dict) – Configuration object. See
mlsynth.config_models.MLSCConfig.- Returns:
MLSCResults – Container with the optimal disaggregate weights, implied aggregate weights, counterfactual path, ATT, and pre-period RMSE.
Notes
Both DataFrames must share the same
outcome,time, andtreatcolumn names and must agree on treatment timing. The treated row indf_disaggmust map (viaagg_id) to the treated aggregate unit indf_agg.References
Bottmer, L. (2025). Synthetic Control with Disaggregated Data. Stanford University job-market paper.
- fit() MLSCResults#
Run the mlSC pipeline and return the learned design.
Configuration#
- class mlsynth.config_models.MLSCConfig(*, df_agg: ~pandas.DataFrame, df_disagg: ~pandas.DataFrame, outcome: str, time: str, treat: str, unitid_agg: str, unitid_disagg: str, agg_id: str, weight_col: str | None = None, lambda_est: ~typing.Literal['heuristic', 'fixed'] = 'heuristic', lambda_val: ~typing.Annotated[float, ~annotated_types.Ge(ge=0)] = 0.0001, solver: ~typing.Any = None, display_graphs: bool = True, save: bool | str | ~typing.Dict[str, str] = False, counterfactual_color: str | ~typing.List[str] = <factory>, treated_color: str = 'black')#
Configuration for the Multi-Level Synthetic Control (mlSC) estimator.
Implements the data-driven hierarchical-aggregation estimator of Bottmer (2025, “Synthetic Control with Disaggregated Data”). Unlike the rest of mlsynth’s estimators, mlSC operates on a two-level panel: an aggregate-level DataFrame (e.g. state-by-time) and a disaggregate-level DataFrame (e.g. county-by-time, with a column linking each county to its parent state). Treatment is assigned at the aggregate level and the estimand is the aggregate-level ATT, but the disaggregate data enters the donor pool with a ridge-type penalty that shrinks the disaggregated weights toward “population-share times aggregate weight” — recovering classical SC in the large-penalty limit and fully-disaggregated SC at penalty zero.
v1 implements only the heuristic and fixed-lambda penalty-selection paths from Section 5.2 of the paper; cross-validation will follow.
- df_agg: DataFrame#
- df_disagg: DataFrame#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Helper Modules#
Two-DataFrame data preparation for mlSC.
Wraps datautils.dataprep once on each panel and assembles the matrices
the rest of the pipeline consumes. Validation enforces alignment between the
aggregate and disaggregate panels: matching treatment timing, complete
agg_id coverage, and population weights that sum to 1 within each
aggregate (or default to uniform).
- mlsynth.utils.mlsc_helpers.setup.prepare_mlsc_inputs(df_agg: DataFrame, df_disagg: DataFrame, outcome: str, time: str, treat: str, unitid_agg: str, unitid_disagg: str, agg_id: str, weight_col: str | None) MLSCInputs#
Validate, pivot, and stack the two-level panel for mlSC.
- Parameters:
df_agg, df_disagg (pd.DataFrame) – Long-form aggregate and disaggregate panels.
outcome, time, treat (str) – Column names shared by both panels.
unitid_agg, unitid_disagg, agg_id, weight_col (str) – Aggregate and disaggregate identifier columns, the disaggregate-> aggregate mapping column, and an optional population-weights column.
- Returns:
MLSCInputs – Pre-processed matrices, weight vector, block-index map, and metadata.
Variance decomposition for the mlSC heuristic penalty.
Implements Appendix G of Bottmer (2025): under the simplified hierarchical random-effects model
Y_sct = alpha_s + eta_sc + eps_sct,
the unit-level pre-treatment means yield consistent estimates of
sigma_eps^2 and sigma_y^2. The mlSC heuristic
lambda = 2 * sigma_eps^2 / sigma_y^2 (Section 5.2) is built from these.
The decomposition is computed only over the control aggregates — the treated aggregate is excluded, matching the upstream package and the paper’s description (“taking the average estimated variance across all other aggregated units except for the treated unit”).
- mlsynth.utils.mlsc_helpers.variance.estimate_variance_components(inputs: MLSCInputs) Tuple[float, float]#
Return
(sigma_eps2, sigma_y2)per Appendix G of Bottmer (2025).- Parameters:
inputs (MLSCInputs) – Pre-processed mlSC inputs. Uses
Y_disagg_pre_full,disagg_to_agg_full, andtreated_agg_idx_fullto run the decomposition over the full disaggregate pre-treatment panel, excluding the treated aggregate.- Returns:
sigma_eps2 (float) – Estimated within-disaggregate-unit noise variance, averaged across non-treated aggregates.
sigma_y2 (float) – Estimated total disaggregate-outcome variance, averaged across non-treated aggregates.
- mlsynth.utils.mlsc_helpers.variance.heuristic_lambda(sigma_eps2: float, sigma_y2: float) float#
Closed-form heuristic
lambda = 2 * sigma_eps^2 / sigma_y^2.
Block-diagonal hierarchical-aggregation penalty matrix for mlSC.
The mlSC objective contains the ridge-type penalty
sum_{s, c} (omega_sc - v_sc * w_s)^2, w_s = sum_c omega_sc.
For each aggregate block s with population weights v_s in R^{C_s}
satisfying 1^T v_s = 1, the within-block penalty equals
omega_s^T Q_s omega_s where
- Q_s = (I - v_s 1^T)^T (I - v_s 1^T)
= I - v_s 1^T - 1 v_s^T + ||v_s||^2 * 1 1^T.
The full penalty matrix Q in R^{M x M} is block-diagonal with these
Q_s blocks. Q_s is positive semidefinite (kernel: v_s), so the
mlSC objective remains a convex QP.
- mlsynth.utils.mlsc_helpers.penalty.build_block(v_s: ndarray) ndarray#
Return the single-block penalty matrix
Q_sfor one aggregate.
- mlsynth.utils.mlsc_helpers.penalty.build_penalty_matrix(v_population: ndarray, disagg_to_agg: ndarray) ndarray#
Assemble the full block-diagonal
Qmatrix.- Parameters:
v_population (np.ndarray) – Length-
Mpopulation weights for each disaggregate column.disagg_to_agg (np.ndarray) – Length-
Minteger block-membership index for each column.
- Returns:
np.ndarray – Block-diagonal
Qof shape(M, M).
MlSC QP solver.
Solves Equation 5.2 of Bottmer (2025):
min over omega in R^M
|| Y_agg_treated_pre - X_disagg_pre @ omega ||^2
+ lambda * sigma_y^2 * omega^T Q omega
s.t. 1^T omega = 1, omega >= 0.
The objective is a convex QP. We seed cvxpy with the warm-start trick the
upstream package uses (Bottmer’s reference implementation): solve the
classical aggregate-level SC first, then expand each aggregate weight
w_s proportionally to the population shares v_s to produce a feasible
disaggregate starting point. This typically takes the SCS solver a handful
of iterations to reach optimality even on noisy panels.
- mlsynth.utils.mlsc_helpers.optimization.solve_mlsc(inputs: MLSCInputs, Q: ndarray, lambda_val: float, sigma_y2: float, solver: Any = None) Tuple[ndarray, ndarray, str]#
Solve the mlSC QP.
- Parameters:
inputs (MLSCInputs) – Pre-processed two-level panel.
Q (np.ndarray) – Block-diagonal penalty matrix of shape
(M, M).lambda_val (float) – Penalty value (already chosen via heuristic or fixed).
sigma_y2 (float) – Outcome variance, used as the penalty’s scale factor so that
lambdais scale-invariant as in the paper.solver (Any) – cvxpy solver.
None->cp.SCS.
- Returns:
omega (np.ndarray) – Optimal disaggregate weights, length
M.aggregate_weights (np.ndarray) – Implied
w_s = sum_c omega_sc, lengthS.status (str) – cvxpy solver status string.
Counterfactual path and effect summaries for mlSC.
The aggregate-level counterfactual is just X_disagg @ omega evaluated
across all T periods. The pre-period RMSE and post-period mean gap come
from comparing this trajectory against the observed aggregate treated
outcome.
- mlsynth.utils.mlsc_helpers.inference.counterfactual_path(inputs: MLSCInputs, omega: ndarray) MLSCInference#
Build the counterfactual and gap series.
- mlsynth.utils.mlsc_helpers.inference.summarize_effects(inputs: MLSCInputs, inference: MLSCInference) Tuple[float, float]#
Return
(att, pre_rmse)for the aggregate treated unit.
Plotting helper for mlSC.
Wraps mlsynth.utils.resultutils.plot_estimates so we get the standard
observed-vs-counterfactual chart, where “observed” is the aggregate treated
series and “counterfactual” is X_disagg @ omega.
- mlsynth.utils.mlsc_helpers.plotter.plot_mlsc(results: MLSCResults, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | dict | str = False, time_axis_label: str = 'Time', outcome_label: str = 'Outcome', treatment_label: str = 'Treatment', unit_label: str = 'Unit') None#
Render the mlSC aggregate counterfactual against the observed series.
Structured containers for the mlSC pipeline.
All matrices follow mlsynth’s standard (T, N) orientation (rows = time,
columns = unit), matching datautils.dataprep.
- class mlsynth.utils.mlsc_helpers.structures.MLSCDesign(omega: ndarray, aggregate_weights: ndarray, lambda_used: float, sigma_eps2: float, sigma_y2: float, lambda_est: str, solver_status: str)#
Optimization output and selected hyperparameters.
- Parameters:
omega (np.ndarray) – Optimal disaggregate weights, length
M. Satisfiessum(omega) = 1andomega >= 0.aggregate_weights (np.ndarray) – Implied aggregate weights
w_s = sum_c omega_scfor each control aggregate, lengthS.lambda_used (float) – Penalty value actually applied.
sigma_eps2 (float) – Estimated noise variance from Appendix G.
sigma_y2 (float) – Estimated outcome variance from Appendix G.
lambda_est (str) – Selection rule that was used (
'heuristic'or'fixed').solver_status (str) – cvxpy solver status string.
- aggregate_weights: ndarray#
- omega: ndarray#
- class mlsynth.utils.mlsc_helpers.structures.MLSCInference(counterfactual: ndarray, gap: ndarray)#
Counterfactual path and post-period gap.
- Parameters:
counterfactual (np.ndarray) – Estimated aggregate counterfactual
X_disagg @ omega, lengthT.gap (np.ndarray) – Observed minus counterfactual, length
T.
- counterfactual: ndarray#
- gap: ndarray#
- class mlsynth.utils.mlsc_helpers.structures.MLSCInputs(Y_agg_treated: ndarray, X_disagg: ndarray, v_population: ndarray, disagg_to_agg: ndarray, agg_labels: Sequence, disagg_labels: Sequence, Y_disagg_pre_full: ndarray, disagg_to_agg_full: ndarray, treated_agg_idx_full: int, T: int, T0: int, treated_unit_name: str, time_labels: ndarray, Ywide_agg: object, outcome: str)#
Pre-processed data fed into the mlSC optimization and inference loops.
- Parameters:
Y_agg_treated (np.ndarray) – Observed aggregate-level treated outcome series, length
T.X_disagg (np.ndarray) – Disaggregate-control outcome matrix of shape
(T, M)whereMis the total number of disaggregate control units (sum ofC_sacross all non-treated aggregates). Columns are ordered first by aggregate label and then by disaggregate label, with this ordering preserved indisagg_to_aggandv_population.v_population (np.ndarray) – Population aggregation weights
v_scfor each disaggregate control column, lengthM. Within each aggregate block the entries sum to 1.disagg_to_agg (np.ndarray) – Length-
Minteger array giving the aggregate-block index for each disaggregate column. Aggregate blocks are indexed 0..S-1.agg_labels (Sequence) – Labels of the
Scontrol aggregates in block order.disagg_labels (Sequence) – Labels of the
Mdisaggregate-control columns in column order.Y_disagg_pre_full (np.ndarray) – Pre-treatment disaggregate outcome matrix for all aggregates (including the treated one), shape
(T0, M_full). Used by the Appendix-G variance decomposition.disagg_to_agg_full (np.ndarray) – Length-
M_fullaggregate-index assignment matchingY_disagg_pre_fullcolumns.treated_agg_idx_full (int) – Index of the treated aggregate within
Y_disagg_pre_fullblocks (used to exclude it from the variance estimate).T (int) – Total number of time periods.
T0 (int) – Number of pre-treatment periods.
treated_unit_name (str) – Label of the treated aggregate.
time_labels (np.ndarray) – Time labels in original order, length
T.Ywide_agg (object) – Wide aggregate-level outcome frame (rows = time, columns = aggregate unit) preserved from
dataprepfor downstream plotting.outcome (str) – Outcome variable name.
- X_disagg: ndarray#
- Y_agg_treated: ndarray#
- Y_disagg_pre_full: ndarray#
- disagg_to_agg: ndarray#
- disagg_to_agg_full: ndarray#
- time_labels: ndarray#
- v_population: ndarray#
- class mlsynth.utils.mlsc_helpers.structures.MLSCResults(inputs: MLSCInputs, design: MLSCDesign, inference: MLSCInference, att: float, pre_rmse: float, donor_weights: dict, aggregate_donor_weights: dict)#
Public
MLSC.fit()return container.- Parameters:
inputs (MLSCInputs) – Pre-processed two-level panel.
design (MLSCDesign) – Optimization outputs and variance-decomposition diagnostics.
inference (MLSCInference) – Counterfactual path and gap.
att (float) – Mean post-period gap.
pre_rmse (float) – Pre-period RMSE of the aggregate counterfactual against the observed treated series.
donor_weights (dict) – Mapping from disaggregate-unit label to its weight. Convenience view of
design.omega.aggregate_donor_weights (dict) – Mapping from aggregate-unit label to its implied weight
w_s = sum_c omega_sc. Convenience view ofdesign.aggregate_weights.
- design: MLSCDesign#
- inference: MLSCInference#
- inputs: MLSCInputs#
The README hierarchical-factor DGP from the author’s reference package,
packaged as simulate_mlsc_sample so the Verification replication
runs as a one-liner.
README data-generating process for Bottmer (2025) mlSC.
Implements the simulation distributed with the author’s reference package (multi-level-sc-estimator/README.md): a hierarchical linear factor model with state-level loadings and within-state county deviations.
with
\(f_t \sim \mathcal{N}(0, \sigma_{\text{time}}^2)\) — one shared time factor;
\(\alpha_s \sim \mathcal{N}(0, \sigma_{\text{state}}^2)\) — state-level loading;
\(\eta_{sc} \sim \mathcal{N}(0, \sigma_{\text{county}}^2)\) — within-state county deviation;
\(\varepsilon_{sct} \sim \mathcal{N}(0, \sigma_{\text{eps}}^2)\) — idiosyncratic shock.
Aggregation to the state level uses equal within-state weights (\(v_{sc} = 1 / C_s\)), so \(y_{st} = (1 / C_s) \sum_c y_{sct}\). The default README parameters are \(N_s = 10\), \(C_s = 10\), \(T = 20\), \((\sigma_{\text{time}}, \sigma_{\text{state}}, \sigma_{\text{county}}, \sigma_{\text{eps}}) = (1.0, 0.8, 0.5, 0.3)\).
The simulation never adds a treatment effect — the true ATT is exactly zero, so the Monte Carlo target is unbiasedness of the estimated ATT and value-for-value agreement with the reference implementation.
- class mlsynth.utils.mlsc_helpers.simulation.MLSCSample(df_agg: DataFrame, df_disagg: DataFrame, data_s: ndarray, data_c: ndarray, n_c: ndarray, w_c: List[ndarray], idx: int, t: int)#
One draw from the README’s hierarchical-factor DGP.
- df_agg, df_disagg
Long panels ready for
mlsynth.MLSC. Columns:state/county(disagg only) /time/y/treated.- Type:
pd.DataFrame
- data_s#
Aggregate-level outcome matrix, shape
(N_states, T).- Type:
np.ndarray
- data_c#
Disaggregate-level outcome matrix, shape
(N_states * C_s, T).- Type:
np.ndarray
- n_c#
Counties per state, shape
(N_states,).- Type:
np.ndarray
- w_c#
Within-state population weights (each array sums to one). One entry per state, in the order the reference impl expects.
- Type:
list of np.ndarray
- data_c: ndarray#
- data_s: ndarray#
- df_agg: DataFrame#
- df_disagg: DataFrame#
- n_c: ndarray#
- mlsynth.utils.mlsc_helpers.simulation.simulate_mlsc_sample(N_states: int = 10, counties_per_state: int = 10, T: int = 20, sigma_time: float = 1.0, sigma_state: float = 0.8, sigma_county: float = 0.5, sigma_eps: float = 0.3, treated_idx: int = 0, treated_t: int | None = None, rng: Generator | None = None) MLSCSample#
Draw one sample from the Bottmer (2025) README DGP.
- Parameters:
N_states (int, default 10) – Number of aggregate units (e.g. states).
counties_per_state (int, default 10) – Number of disaggregate units inside each aggregate unit. Constant across states in the README; pass a per-state count via the reference implementation if you need imbalance.
T (int, default 20) – Total number of periods.
sigma_time, sigma_state, sigma_county, sigma_eps (float) – Standard deviations for the time factor, state-level loading, within-state county deviation, and idiosyncratic shock.
treated_idx (int, default 0) – Aggregate unit assigned to treatment.
treated_t (int or None) – First treated period. Defaults to
T - 1(the README’s choice).rng (np.random.Generator, optional) – NumPy RNG. Defaults to
np.random.default_rng().
- Returns:
MLSCSample
Example#
import pandas as pd
from mlsynth import MLSC
# Two long-form panels sharing outcome / time / treat column names.
#
# df_agg — one row per (state, year): the aggregate panel.
# df_disagg — one row per (county, year): the disaggregate panel.
# Must contain an 'agg_id' column mapping each county
# to its parent state, and may optionally carry a
# 'population' column for non-uniform v_sc.
df_agg = pd.read_csv("state_panel.csv")
df_disagg = pd.read_csv("county_panel.csv")
config = {
"df_agg": df_agg,
"df_disagg": df_disagg,
"outcome": "sales",
"time": "year",
"treat": "treated", # binary 0/1, equal at both levels
"unitid_agg": "state", # column in df_agg
"unitid_disagg": "county_fips", # column in df_disagg
"agg_id": "state", # column in df_disagg
"weight_col": "population", # optional; defaults to 1 / C_s
"lambda_est": "heuristic", # or "fixed" with lambda_val
"display_graphs": True,
}
results = MLSC(config).fit()
# Point estimate, fit diagnostic, learned penalty.
print(results.att) # mean post-period aggregate gap
print(results.pre_rmse) # pre-period RMSE
print(results.design.lambda_used) # selected lambda
print(results.design.sigma_eps2,
results.design.sigma_y2) # Appendix-G variance estimates
# Disaggregate and implied aggregate weights.
print(results.donor_weights) # {county_fips: omega_sc}
print(results.aggregate_donor_weights) # {state: w_s = sum_c omega_sc}
# Counterfactual trajectory (length T) and the observed - counterfactual gap.
cf = results.inference.counterfactual
gap = results.inference.gap
# Sensitivity: sweep lambda by hand.
for lam in [0.0, 1e-2, 1e-1, 1.0, 10.0, 1e6]:
r = MLSC({**config, "lambda_est": "fixed", "lambda_val": lam}).fit()
print(f"lambda={lam:>8.4f} ATT={r.att:.4f} preRMSE={r.pre_rmse:.4f}")
Verification#
Empirical replication against the author’s reference code (Path A)
plus a Monte Carlo unbiasedness check (Path B). Bottmer’s published
empirical application — the effect of a state minimum-wage policy
change on county-level teen employment in Iowa — uses the Quarterly
Workforce Indicators (QWI) panel that ships with her reference
package (tests/ia_emp_app_teen_empl.csv). The simulation block in
her README spells out a hierarchical linear-factor DGP that’s free of
proprietary data and is the design mlSC_estimator is calibrated
against. Both validations are wired in below.
The DGP is packaged in
mlsynth.utils.mlsc_helpers.simulation.simulate_mlsc_sample():
with \(f_t \sim \mathcal{N}(0, 1)\), \(\alpha_s \sim \mathcal{N}(0, 0.8^2)\), \(\eta_{sc} \sim \mathcal{N}(0, 0.5^2)\), and \(\varepsilon_{sct} \sim \mathcal{N}(0, 0.3^2)\) at the README defaults \(N_s = 10\), \(C_s = 10\), \(T = 20\). The simulation never adds a treatment effect, so the true ATT is exactly zero and the Monte Carlo target is unbiasedness.
Path A: value-for-value vs the reference code (one draw)#
On the README’s exact panel (np.random.default_rng(42)), mlsynth’s
MLSC reproduces both the selected penalty and the reported ATT to
solver tolerance:
import numpy as np
from mlsynth import MLSC
from mlsynth.utils.mlsc_helpers.simulation import simulate_mlsc_sample
from multi_level_sc_estimator.mlSC import mlSC_estimator # reference
s = simulate_mlsc_sample(rng=np.random.default_rng(42))
tau_ref, lam_ref, _ = mlSC_estimator(
s.data_s, s.data_c, s.idx, s.n_c, s.t, s.w_c,
lambda_est="heuristic")
res = MLSC({"df_agg": s.df_agg, "df_disagg": s.df_disagg,
"outcome": "y", "time": "time", "treat": "treated",
"unitid_agg": "state", "unitid_disagg": "county",
"agg_id": "state", "lambda_est": "heuristic",
"display_graphs": False}).fit()
print(f"REFERENCE tau_hat = {tau_ref:+.6f} lambda = {lam_ref:.6f}")
print(f"MLSYNTH tau_hat = {res.att:+.6f} "
f"lambda = {res.design.lambda_used:.6f}")
prints:
REFERENCE tau_hat = +0.011928 lambda = 1.970185
MLSYNTH tau_hat = +0.011930 lambda = 1.970185
Differences: \(|\hat\tau_{\text{mlsynth}} - \hat\tau_{\text{ref}}| = 1.5 \times 10^{-6}\), \(|\hat\lambda_{\text{mlsynth}} - \hat\lambda_{\text{ref}}| = 4.4 \times 10^{-16}\) — i.e. solver noise on the ATT and machine precision on the penalty.
Path B: unbiasedness across 200 draws#
Looping the comparison across \(M = 200\) draws confirms (a) both estimators are unbiased for the true zero ATT, and (b) they remain in near-machine agreement throughout:
def one_rep(seed):
s = simulate_mlsc_sample(rng=np.random.default_rng(seed))
tau_ref, _, _ = mlSC_estimator(
s.data_s, s.data_c, s.idx, s.n_c, s.t, s.w_c,
lambda_est="heuristic")
res = MLSC({"df_agg": s.df_agg, "df_disagg": s.df_disagg,
"outcome": "y", "time": "time", "treat": "treated",
"unitid_agg": "state", "unitid_disagg": "county",
"agg_id": "state", "lambda_est": "heuristic",
"display_graphs": False}).fit()
return float(tau_ref), float(res.att)
import numpy as np
tau_pairs = np.array([one_rep(s) for s in range(200)])
tau_ref, tau_ml = tau_pairs[:, 0], tau_pairs[:, 1]
print(f"reference: mean={tau_ref.mean():+.4f} std={tau_ref.std(ddof=1):.4f} "
f"RMSE={np.sqrt((tau_ref**2).mean()):.4f}")
print(f"mlsynth : mean={tau_ml.mean():+.4f} std={tau_ml.std(ddof=1):.4f} "
f"RMSE={np.sqrt((tau_ml**2).mean()):.4f}")
print(f"max |Δ| = {np.abs(tau_ml - tau_ref).max():.2e}, "
f"median |Δ| = {np.median(np.abs(tau_ml - tau_ref)):.2e}")
prints:
reference: mean=-0.0103 std=0.1663 RMSE=0.1662
mlsynth : mean=-0.0103 std=0.1663 RMSE=0.1662
max |Δ| = 5.76e-04, median |Δ| = 1.05e-05
The Monte Carlo standard error of the mean at \(M = 200\) is
\(\sigma / \sqrt{200} \approx 0.012\), so the observed bias of
\(-0.010\) is well inside one MC SE of zero — the heuristic
\(\hat\lambda\) does not introduce systematic bias, and mlsynth
and the reference impl produce statistically identical samples.
Empirical application#
For reference, Bottmer’s empirical application — Iowa’s 2007 minimum-
wage hike on county-level teen employment, run on the Quarterly
Workforce Indicators panel that ships with her reference package
(tests/ia_emp_app_teen_empl.csv) — uses the same heuristic-lambda
configuration. Because the panel is public, mlsynth’s MLSC can be
pointed at it directly through the two-DataFrame API; see the
estimator’s docstring for the data-cleaning conventions the package
expects (drop counties with any NaN over the analysis window, mirror
the state-level treatment indicator on each contained county, normalize
within-state weights to sum to one).
References#
Bottmer, L. (2025). Synthetic Control with Disaggregated Data. Stanford University job-market paper. PDF.
The reference Python implementation by the paper’s author, multi-levelSC (Apache-2.0), inspired several details of this rewrite — in particular the variance-decomposition implementation and the classical-SC warm-start used by the cvxpy solver.
Abadie, A., Diamond, A., and Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490):493-505.