Lexicographic Synthetic Control (LEXSCM)#
Overview#
LEXSCM is a tool for synthetic experimental design: given a panel of units, a subset of which are eligible for treatment, it chooses which \(m\) of them to actually treat so that the rest of the panel forms a credible synthetic control for the treated group, and so that the resulting experiment has enough statistical power to detect the effects the analyst cares about. It is the design-stage analogue of the estimation-stage synthetic control methods elsewhere in mlsynth: instead of taking the treated unit as given and building a donor combination, it takes the donor pool as given and chooses the treated combination.
The method follows Abadie & Zhao, Synthetic Controls for Experimental Design [ABADIE2024], with the power / minimum-detectable-effect (MDE) machinery drawn from the synthetic-experimental-design work of Vives-i-Bastida (2022). LEXSCM optimizes two objectives in a strict priority order (“lexicographically”):
Validity – the treated combination reproduces the population trajectory on the pre-treatment predictors (small imbalance);
Power – subject to validity, the design detects the smallest possible effect (small minimum detectable effect).
The pipeline has four stages, each in its own helper module:
Treated-tuple selection (
lexsearch) – search the \(\binom{M}{m}\) candidate treated combinations for the most balanced ones, under an optional budget.Control fit (
fast_scm_control) – build a synthetic control for each candidate treated group and score its pre-treatment fit.Power analysis (
lexpower) – a moving-block placebo MDE curve over a grid of post-treatment horizons.Recommendation (
lexselect) – a lexicographic rule (validity gate \(\to\) power \(\to\) stability \(\to\) cost) returning a single recommended design plus a Pareto frontier.
Mathematical Formulation#
Setup and notation#
We observe \(J\) units over a pre-treatment period of length \(T_0\). Stack the pre-period outcomes as \(Y \in \mathbb{R}^{T_0 \times J}\) (rows = time, columns = units) and, optionally, \(K\) time-invariant or pre-period covariates as \(Z \in \mathbb{R}^{K \times J}\). The two are stacked vertically into the predictor matrix
so each column \(X_{\cdot, j}\) is unit \(j\)’s predictor profile.
A subset \(\mathcal{C} \subseteq \{1, \dots, J\}\) of size
\(M = |\mathcal{C}|\) is flagged as treatment-eligible (the
candidate_col); the design must pick \(m\) of these.
A population weighting vector \(f \in \mathbb{R}^J\),
\(f \ge 0\), \(\mathbf{1}^\top f = 1\) (uniform \(1/J\) by
default, or a weight_col such as population or revenue) defines the
estimand’s target trajectory – the \(f\)-weighted average unit
The pre-period predictor rows are split into an estimation window \(E\) (the first \(\lfloor \texttt{frac\_E} \cdot T_0 \rfloor\) time rows, plus all covariate rows) and a held-out blank window \(B\) (the remaining pre-period time rows). \(E\) is where the design is fit; \(B\) is a pre-treatment “dress rehearsal” with no treatment, used to validate fit and to calibrate power.
Over \(E\) the predictors are row-standardized against the population target:
with \(\sigma_t\) floored away from zero. Centring on \(\bar{X}_t\) puts the population target at the origin; scaling by the cross-sectional spread \(\sigma_t\) makes mixed-scale predictors (outcomes in dollars, covariates in percent) commensurable so no single predictor dominates the fit. The \(J \times J\) Gram matrix
summarizes all pairwise predictor inner products over \(E\).
Identifying assumptions#
LEXSCM inherits the design-based identification of Abadie & Zhao [ABADIE2024]. The untreated potential outcomes are assumed to follow an (approximate) linear factor model
with common time effects \(\delta_t\), latent time factors \(\theta_t\), unit loadings \(\mu_j\), and mean-zero transitory shocks \(\varepsilon_{j, t}\).
Assumption 1 (approximate balance). There exist treatment weights \(w\) on the chosen treated set \(S\) and control weights \(v\) on the remainder such that the synthetic treated and synthetic control reproduce the population target on the pre-period predictors, \(\bar{X} - \sum_{j \in S} w_j X_j \approx 0\). Remark. This is the design analogue of the SCM convex-hull condition, and it is the only substantive requirement. Crucially it is not imposed as an axiom: the achieved imbalance \(\lVert \bar{X} - \sum_j w_j X_j\rVert\) is a measurable goodness-of-fit quantity, reported for every design, on which the validity of the bias bound and the inference is conditional (Abadie & Zhao, p.13). The analyst checks the \(\approx 0\) condition rather than assuming it.
Assumption 2 (factor structure controls bias). Under the linear factor model, matching the treated and synthetic-control trajectories on a long enough pre-period drives the latent loadings \(\mu_j\) into alignment, so the design-based bias of the average treatment effect on the treated is bounded by the achieved imbalance and shrinks as \(T_0\) grows and the fit tightens. Remark. This is why Stage 1 minimizes imbalance rather than any in-sample treatment contrast: imbalance is the quantity the bias bound is written in.
Assumption 3 (placebo exchangeability / weak stationarity). On the blank window \(B\) no treatment has occurred, so the treated-minus-control gap is pure noise; that gap process is weakly stationary and free of anticipation, so its serial-dependence structure carries over to the post-treatment window. Remark. This is what licenses the Stage 3 power analysis: the post-treatment null distribution of the test statistic is reconstructed by moving-block resampling the blank-window gaps, which preserves autocorrelation – the time-series-robust inference of Abadie & Zhao rather than an i.i.d. permutation.
Stage 1 – Treated-tuple selection#
For a treated set \(S\) with \(|S| = m\) and simplex weights \(w \in \Delta(S) = \{w \ge 0 : \mathbf{1}^\top w = 1\}\), the synthetic treated unit is \(\sum_{j \in S} w_j X_j\). Its imbalance against the population target is, over the estimation window,
where \(G_{SS}\) is the \(m \times m\) sub-Gram matrix on the rows
and columns of \(S\). Because the design is \(f\)-centred the
population target sits at the origin, so \(L(S)\) is the squared
distance from the population centroid to the convex hull of the selected
donors, and \(\sqrt{L(S)}\) is the achieved imbalance of
Assumption 1. Stage 1 returns the top_K sets of smallest \(L(S)\),
subject to the budget below.
How a single tuple is built: the inner simplex QP#
“Building” a tuple \(S\) means solving its inner problem
\(\min_{w \in \Delta(S)} w^\top G_{SS}\, w\) for the optimal
treatment weights \(w(S)\); the synthetic treated unit is then
\(\sum_{j \in S} w_j(S)\, X_j\) and the design’s achieved imbalance is
\(\sqrt{w(S)^\top G_{SS}\, w(S)}\). This convex quadratic program over
the probability simplex is solved by an Away-step Frank-Wolfe (AFW)
routine in pure NumPy (_afw_single), chosen because every iterate stays
exactly on the simplex (no projection step) and the away move removes the
zig-zagging that plain Frank-Wolfe suffers near a face-constrained optimum
– so the support of \(w\) (which donors carry positive weight) sharpens
in a handful of iterations.
Write \(Q = G_{SS}\), \(f(w) = w^\top Q w\), \(\nabla f(w) = 2 Q w\). From a vertex start \(w^{(0)} = e_{\arg\min_i Q_{ii}}\) (the single donor closest to the target), each iteration:
Pick two vertices. The Frank-Wolfe vertex \(s = \arg\min_i [\nabla f(w)]_i\) and the away vertex \(a = \arg\max_{i \in \operatorname{supp}(w)} [\nabla f(w)]_i\) – the currently active donor the gradient most wants to shed.
Choose the direction. Compare the FW direction \(d_{\mathrm{FW}} = e_s - w\) against the away direction \(d_{\mathrm{AW}} = w - e_a\) by \(\langle \nabla f, d\rangle\) and take whichever descends more. The away step’s maximal feasible length is \(\gamma_{\max} = w_a / (1 - w_a)\); the FW step caps at \(1\).
Exact line search. Because \(f\) is quadratic the optimal step along \(d\) is closed-form,
\[\gamma^\star = \operatorname{clip}\!\Bigl( -\frac{w^\top Q d}{d^\top Q d},\; 0,\; \gamma_{\max}\Bigr),\]then \(w \leftarrow w + \gamma^\star d\), dropping \(a\) from the active set when a full away step empties it.
Certified lower bound. The Frank-Wolfe gap gives a running lower bound \(f(w) + \min_i[\nabla f(w)]_i - \nabla f(w)^\top w \le f(w^\star)\) on the tuple’s true minimal loss; iteration stops once the duality gap \(\nabla f(w)^\top w - \min_i[\nabla f(w)]_i\) drops below
tol.
Two-pass precision. Scoring every candidate tuple to convergence would
be wasteful, so Stage 1 runs AFW at two fidelities. A vectorized
batched AFW (_afw_batched, iters=80, all \(m \times m\)
problems advanced together with einsum) ranks thousands of tuples in
one sweep; only the surviving top_K are re-solved by the scalar
_afw_single at iters=600, tol=1e-14 to pin \(w(S)\) and the
loss to full precision. Each returned
TreatedDesign then
carries its indices, the high-precision weights \(w(S)\),
loss \(= L(S)\), imbalance \(= \sqrt{L(S)}\),
total_cost, and a label-keyed weight_dict.
Why a search and not a full enumeration or an exact MIP#
The obvious approach – enumerate every treated combination and keep the best – is defeated by combinatorics. With \(M\) eligible markets and \(m\) to treat there are \(\binom{M}{m}\) combinations. For a modest design with \(M = 120\) eligible markets choosing \(m = 4\) that is already
candidate tuples, and the count explodes super-polynomially in \(m\): \(\binom{120}{6} \approx 3.8 \times 10^{9}\) and \(\binom{200}{6} \approx 8.2 \times 10^{10}\). Scoring a simplex QP for each is hopeless past small cases.
The natural fallback – a branch-and-bound integer program with a convex relaxation lower bound to prune – also fails here, for a structural reason specific to this objective. The population target is the \(f\)-weighted centroid of the candidate predictors, so it lies inside the convex hull of the full candidate set. Any convex (cardinality-free) relaxation of “distance from the origin to the hull of a chosen subset” is therefore \(\approx 0\) over the entire upper half of the branch-and-bound tree: the relaxation cannot certify that a partial selection is bad, so it cannot prune. An exact MIP would degenerate into near-exhaustive enumeration with extra overhead.
Both failures are also unnecessary. Abadie & Zhao [ABADIE2024] require only that the chosen design be feasible and approximately balanced; validity is conditional on the achieved imbalance, a reported quantity, not on a certificate of global optimality. LEXSCM therefore:
Enumerates exactly when \(\binom{M}{m} \le \texttt{enumerate\_max}\) (default \(3{,}000{,}000\)). This is the gold standard – it returns the certified global
top_Kand reports termination statusOPTIMAL.Runs a strengthened multi-start local search otherwise: greedy construction from diverse seeds, best-improvement swap descent to a local optimum, and basin-hopping kicks to escape it. In Monte Carlo this lands on the exact optimum 83-100% of the time and within roughly \(1\%\) (mean) / \(7\%\) (worst case) of the minimal imbalance – immaterial under the approximate-balance criterion. Termination status is
FEASIBLE.
Because the local search has no MIP optimality gap, LEXSCM reports a
consensus diagnostic in its place: the fraction of independent starts
that converged to the incumbent (consensus_rate), the number of
distinct local optima seen, and the incumbent-improvement trail. High
consensus across many random starts is the practical confidence signal
that the incumbent is the global optimum. (A convex hull lower bound is
also reported, but only as advisory information – for the reason above
it is \(\approx 0\) and is deliberately not turned into an
optimality gap.)
Building tuples in the heuristic regime#
When \(\binom{M}{m}\) exceeds enumerate_max the same per-tuple QP is
reused, but the set of tuples scored is grown adaptively by a multi-start
local search (_local_search):
Seeding. \(2 \times\)
n_startsseeds: then_startsunits with the smallest \(G_{jj}\) (single donors already nearest the population centroid – the cheapest places to start near balance) plusn_startsuniformly random units for diversity.Greedy construction. From a seed, repeatedly add the candidate that most lowers the batched loss of the partial tuple until \(|S| = m\), skipping any addition that would breach the budget.
Best-improvement descent. Score the full 1-swap neighbourhood (replace one member with one non-member), move to the steepest improving swap, and repeat to a local optimum.
Basin-hopping kicks. Perturb the incumbent with a random 2-swap
kickand re-descend, keeping the better basin; repeatedn_kicks(4) times per start.
Every distinct tuple ever scored is cached, and the global top_K of
that cache is returned and re-solved to full precision exactly as in the
exact path. The consensus block – how many independent starts’ final
optima coincide with the incumbent – is the confidence diagnostic that
stands in for the absent MIP gap.
Budget constraints#
When a per-unit treatment cost \(c_j\) is supplied (unit_cost_col,
constant within unit) and a total budget \(B_{\max}\), every
returned design must satisfy the hard knapsack constraint
Two mechanisms enforce it. First, a sound feasibility presolve removes any eligible unit \(i\) that cannot belong to any budget-feasible \(m\)-tuple, i.e. whenever
This only ever drops provably impossible units; it never touches the
objective, so it cannot change which feasible design is optimal. Second,
the search itself respects the budget at every step: exact enumeration
filters combinations by their cost sum, and the local search rejects any
greedy addition, swap, or kick that would breach \(B_{\max}\). If no
\(m\)-tuple fits the budget the stage reports INFEASIBLE.
Stage 1 returns a solver-style diagnostics block (stats) with the
problem definition, presolve removals, search method and number of subsets
evaluated, the incumbent and worst-in-pool imbalance, the advisory
relaxation bound, the solution pool of top_K tuples, and termination
status / runtime.
Stage 2 – Control fit#
For each candidate treated set \(S\) with weights \(w\), the
synthetic treated trajectory is \(X_{\cdot, S}\, w\). A synthetic
control is then built from the non-treated units: control weights
\(v\) solve a ridge-penalized quadratic program (penalty
lambda_penalty) matching the synthetic treated over \(E\), subject
to an exclusion constraint \(v_j = 0\) for all \(j \in S\) (a
treated unit cannot also be its own control). The per-period gap is
Pre-treatment fit is summarized by the normalized mean-squared error
(NMSE) of the synthetic treated against the target on both windows
(nmse_E, nmse_B). The blank-window gaps
\(e_B = \{e_t : t \in B\}\) (residuals_B) are, under Assumption 3,
pure placebo noise; they are the raw material for the power analysis. This
stage is identical to the pre-rebuild control path – only the search and
power stages around it changed.
Stage 3 – Power analysis (minimum detectable effect)#
Stage 3 asks: how large must a sustained treatment effect be for this design to detect it? The answer is a minimum-detectable-effect (MDE) curve over post-treatment horizon lengths, computed entirely from the Stage 2 placebo residuals with one consistent resampling model for both the null and the alternative.
What “at each horizon” means – and what it does not#
The curve is indexed by the length \(h\) of the post-treatment
window (n_post_grid, default \(h = 2, \dots, 8\)), not by calendar
period. There is no “MDE at period \(t = 91\)”. Instead, for each
candidate duration \(h\), the analysis manufactures many synthetic
\(h\)-long gap windows by resampling the placebo residuals and asks what
constant effect a window of that length could detect. Reading the curve as
\(h\) grows tells you how detectability improves the longer the
experiment runs.
Moving-block resampling#
Let the placebo pool be one or more residual series – the chosen design’s
blank-window gaps \(e_B\), optionally pooled with donor-unit placebo
gaps. The scale \(\sigma\) is the standard deviation of the pooled
residuals (all series concatenated, ddof=1, floored at
\(10^{-12}\)). The block length defaults to
where \(\tilde L\) is the median series length: floored at 1, and
capped at the horizon \(h\) so a block never exceeds the window it
fills. To draw one length-\(h\) window
(block_resample_windows()):
pick a placebo series uniformly at random, then repeatedly cut contiguous
blocks of length \(\ell\) from a random offset with wraparound
(np.take(..., mode="wrap")), concatenate them, and truncate to
\(h\). Wrapping preserves the within-series autocorrelation
(Assumption 3); the random series choice reproduces the cross-unit placebo
distribution.
Test statistic and critical value#
The statistic is the mean absolute gap over the window,
Resampling \(n_\text{null}\) (default 4000) placebo windows gives the null distribution; its \((1 - \alpha)\) empirical quantile is the critical value \(c_\alpha\).
Power, the effect grid, and the MDE#
A treatment effect is modelled as a constant additive shift \(\tau\) applied to every point of a resampled window – the homogeneous-effect working assumption behind the MDE. Its power is estimated from a fresh draw of \(n_\text{power}\) (default 2000) windows (same residual model, new randomness), so the null and the alternative differ only by the shift:
The effect is swept on a grid of \(n_\text{grid} = 64\) points in
standard-deviation units, \(\tau / \sigma \in [0, \texttt{max\_sd}]\)
(default cap \(8\)). The search walks the grid until
\(\operatorname{power}(\tau)\) first reaches power_target, then
linearly interpolates between the last sub-threshold point
\((g_0, p_0)\) and the crossing point \((g, p)\) for a finer value:
If the grid is exhausted without reaching the target, the horizon is infeasible and returns \(\infty\).
Three reported scales#
Each horizon reports the MDE on three scales:
mde_sd\(= \mathrm{MDE}(h)/\sigma\) – effect-size units, the primary scale (matching Vives-i-Bastida’s “detect effects larger than \(0.1\) s.d.” convention) and numerically robust: nothing is divided by a fragile mean, so zero-mean or near-zero outcomes cannot blow the calculation up;mde_abs\(= \mathrm{MDE}(h)\) – outcome units;mde_pct– the manager-facing percentage, \(100 \cdot \mathrm{MDE}(h) / \lvert\text{baseline}\rvert\), where the baseline is the counterfactual level over the matching window, \(\text{baseline} = \operatorname{mean}\bigl(\texttt{synthetic\_treated} [-h:]\bigr)\). This is guarded: when \(\lvert\text{baseline}\rvert\) falls below a floor (default one \(\sigma\)) the percentage is returned asNaNdeliberately, so a near-zero or sign-flipping level cannot manufacture a spurious “we can detect a 0.3% effect.”
Detectability curve and horizon collapse#
detectability_curve() sweeps
the horizon grid and returns curve_sd
(\(h \mapsto \mathrm{MDE}(h)/\sigma\)), curve_pct (the percentage
curve), the per-horizon details, and min_horizon_mde_le_0p1sd – the
shortest horizon whose MDE falls to \(0.1\sigma\), a quick read on how
long the experiment must run to clear the conventional effect-size bar.
Stage 4 then collapses the curve to a single representative MDE per the
mde_horizon setting:
"late"(default) – the MDE at the longest horizon; the conservative choice for a sustained-exposure experiment."early_min"– the smallest MDE across feasible horizons (most optimistic detectability)."early_mean"– the mean MDE across feasible horizons (the percentage averaged only over horizons where it is defined).
Stage 4 – Final recommendation (lexicographic selection)#
The final stage turns the per-design metrics – imbalance (validity),
mde_sd (power), stability \(=\) out-of-sample
\(\text{NMSE}_B\) (fit robustness), and total_cost – into one
recommendation, applying the method’s priorities in strict order:
Validity gate. Keep only designs whose imbalance is within a relative slack of the best achievable balance,
\[\text{imbalance}(S) \le \bigl(1 + \texttt{imbalance\_tol}\bigr)\cdot \min_{S'} \text{imbalance}(S') ,\](default
imbalance_tol\(= 0.25\)). This is the set of designs that satisfy Assumption 1 well enough to be trustworthy. (A degenerate tolerance that would empty the gate falls back to the single best-balanced design.)Power. Among gated designs with a feasible MDE, choose the smallest
mde_sd– the most detectable valid design.Tie-breaks. Break ties by better out-of-sample stability (\(\text{NMSE}_B\)), then by lower
total_cost.
The recommendation is returned as a tuple of fields:
winner (the chosen design), shortlist (the top
max_shortlist ranked designs), pareto_ids (the Pareto frontier on
imbalance \(\downarrow\) vs. mde_sd \(\downarrow\), always
exposed for transparency), a status, a human-readable
explanation, and a per-design table (which becomes
results.summary). The status is one of:
OK– a valid, adequately powered design was found;POWER_NOT_ESTABLISHED– no gated design reached the power target within the effect grid, so the best-balanced design is recommended and the power caveat is flagged (the pipeline degrades gracefully rather than crashing);EMPTY– no candidate designs were supplied.
Worked numeric walkthrough#
To make the two core mechanisms concrete, here is a single tuple built and powered end to end on a deliberately tiny panel (\(T_0 = 6\) pre-period rows, \(J = 5\) units, treat \(m = 2\)). Every number below is the actual helper output, not an illustration.
Building the tuple (Stage 1)#
After \(f\)-centring and row-standardising the predictors, the Gram matrix’s diagonal – each unit’s squared distance to the population centroid – is
Unit 1 (0-indexed) sits closest to the target, so it seeds the AFW vertex
start. With \(\binom{5}{2} = 10\) the search enumerates exactly
(status OPTIMAL, 10 subsets scored). The winning tuple is
\(S = \{0, 1\}\), and its inner simplex QP returns treatment weights
The standalone high-precision re-solve reproduces this loss with a Frank-Wolfe lower bound equal to it to \(10^{-6}\) – the QP is at its certified optimum. The synthetic treated unit is \(0.41\,X_0 + 0.59\,X_1\), and \(1.2838\) is the achieved imbalance on which the bias bound (Assumption 1-2) is conditional.
Powering the design (Stage 3)#
Take this design’s blank-window placebo gaps to be a length-\(L = 24\) series with \(\sigma = 0.8695\). The auto block length is \(\ell = \min\bigl(h, \operatorname{round}(24^{1/3})\bigr) = \min(h, 3)\), so at \(h = 2\) it is capped to \(2\) and at \(h \ge 3\) it is \(3\). With a counterfactual level of \(\text{baseline} = 12.0\), the per-horizon MDE is:
\(h\) |
\(\ell\) |
\(c_\alpha\) |
|
|
|
|---|---|---|---|---|---|
2 |
2 |
1.245 |
2.037 |
1.771 |
14.8% |
4 |
3 |
1.108 |
1.672 |
1.454 |
12.1% |
8 |
3 |
0.981 |
1.282 |
1.115 |
9.3% |
Reading down the table is the detectability curve: as the horizon
lengthens, averaging over more periods shrinks both the critical value
\(c_\alpha\) and the MDE, so a sustained effect of
\(\approx 9\%\) of the counterfactual becomes detectable by
\(h = 8\), versus \(\approx 15\%\) at \(h = 2\). Under
mde_horizon="late" Stage 4 carries the \(h = 8\) row
(mde_sd \(= 1.28\)) as this design’s power score; under
"early_min" it would carry the smallest feasible mde_sd.
Standardized Post-Fit and Power Analysis#
LEXSCM’s Stage 3 already produces the lexicographic MDE curve used to rank
designs against each other; the standardized post-fit attached to
LEXSCMResults.post_fit is the complementary surface used once a design
has been chosen – it is the same
SyntheticControlPostFit object SYNDES,
MAREX, and PANGEO expose, so downstream consumers (dashboards, paper-style
reports, comparison tables) read identical fields across the family.
pf = res.post_fit # SyntheticControlPostFit
pf.ate, pf.ate_percent, pf.total_effect # treatment-effect scalars
pf.rmse_fit, pf.rmse_blank, pf.rmse_post # pre / blank / post fit quality
pf.p_value, pf.ci_lower, pf.ci_upper # conformal inference
pf.power # PowerAnalysis (see below)
The trajectories pf.treated_series and pf.control_series are the
winning candidate’s predictions.synthetic_treated and
predictions.synthetic_control – per-unit weighted aggregates of the
treated and control donors over the full timeline. The phase boundaries
(n_fit, n_blank, n_post) line up with the same E / B / post split LEXSCM
uses internally (so n_fit + n_blank = T0 and the 30%-of-pre-tail
holdout convention applies via frac_E = 0.7).
Where the unified power analysis fits in#
Stage 3’s detectability_curve()
is the design-selection workhorse – it converts the moving-block placebo
null on the B window into a per-horizon MDE used by Stage 4’s lexicographic
selector. The post-fit power analysis is the post-selection companion:
analytical Gaussian + AR(1) variance inflation on the realised gap residuals,
matching the MAREX / SYNDES / PANGEO power surfaces exactly so the same
diagnostic table can be produced for every family member.
with \(\hat\sigma_{\text{placebo}}\) the SD of the gap on the B (blank / holdout) window, \(\hat\rho\) the lag-1 autocorrelation of those residuals clipped to \((-0.99, 0.99)\), and \(\mathrm{VIF}(T, \rho) = \tfrac{1}{T}\bigl(1 + 2\sum_{k=1}^{T-1} (1-k/T)\rho^k\bigr)\) the standard AR(1) variance-inflation factor (textbook \(1/T\) when \(\rho = 0\)). See Synthetic Controls for Experimental Design (MAREX) for the full derivation; the same module powers all three estimators.
p = res.post_fit.power # PowerAnalysis
p.headline.mde_absolute # MDE at the realised T_post
p.headline.mde_pct # ... as % of post-period baseline
p.headline.power_at_observed # power to detect res.post_fit.ate
p.curve # tuple of MDEPoint per horizon
p.sigma_placebo # σ̂ used (B window in LEXSCM)
p.serial_correlation # ρ̂ AR(1) of the B residuals
Two MDEs, complementary roles#
Stage 3 MDE (
best_candidate.mde_results) – moving-block placebo null on the B window, used to rank designs against each other. Aggregated to a representative scalar bymde_horizon(late/early_min/early_mean) and consumed by Stage 4’s lexicographic gate.Post-fit MDE (
res.post_fit.power) – analytical Gaussian + AR(1) MDE consumed after a design has been chosen, on the same surface that MAREX / SYNDES / PANGEO produce. Use this when reporting a single detectability number alongside the realised ATE / CI.
Power-analysis failures (e.g. degenerate B-window residuals) never break a
fit; res.post_fit.power is simply left as None. To compute on a
non-default horizon grid or significance level call
compute_power_analysis() directly.
Core API#
- class mlsynth.estimators.lexscm.LEXSCM(config)#
Bases:
objectLexicographic Synthetic Control (LEXSCM) estimator.
This estimator automatically designs synthetic control experiments by jointly optimizing:
Pre-treatment fit (validity)
Statistical power (detectability of effects)
The user interacts with the estimator through a single entry point: fit(), while all modeling choices are controlled via the configuration object.
- Parameters:
config (dict or LEXSCMConfig) – Configuration object specifying data inputs, identification strategy, synthetic control settings, search budget, and inference parameters.
The following fields are supported:
Required
df(pd.DataFrame): Panel dataset containing unit-level observations over time.outcome(str): Name of the outcome variable.unitid(str): Column identifying observational units.time(str): Time index column.candidate_col(str): Boolean (0/1 or True/False) column indicating units eligible for treatment assignment.m(int): Number of units selected per treated tuple.
Identification / Design
post_col(str, optional): Indicator for post-treatment period (0/1).frac_E(float): Fraction of pre-treatment period used for estimation window E.unit_cost_col(str, optional): Per-unit treatment cost column (must be constant within unit).budget(float, optional): Total budget constraint on selected treated units.seed(int): Random seed for reproducibility.
Synthetic Control Specification
weight_col(str, optional): Unit-level weights (e.g., population, revenue).covariates(list of str, optional): Covariates included in synthetic control construction.lambda_penalty(float): Regularization penalty for control mismatch in quadratic program.
Search / Optimization Budget
top_K(int): Number of top candidate tuples returned by branch-and-bound.top_P(int): Number of seed units used for BnB initialization.
Inference / Power Analysis
alpha(float): Significance level for statistical testing.n_post_grid(list of int): Post-treatment horizons used for MDE detectability curves.n_sims(int): Number of Monte Carlo simulations for null distribution.post_imputation({“mean”, “max”, “double_max”}): Method for imputing post-treatment outcomes.test_statistic({“mean_abs”, “mean”, “rms”}): Test statistic used for treatment effect evaluation.delta(float): Absolute minimum detectable effect threshold.relative_delta(float, optional): Relative MDE threshold (> 1.0).target_mde_horizon(str): Target horizon for MDE evaluation (e.g., early_mde_avg).max_shortlist(int): Maximum number of candidate designs retained after filtering.
System / Logging
verbose(bool): If True, enables progress logging.
- Returns:
LEXSCM – Initialized estimator ready for .fit() execution.
See also
LEXSCM.fitExecutes the full optimization and estimation pipeline
Notes
The pipeline integrates: search, estimation, evaluation, power analysis, selection.
Designed for fully automated experimental design under constraints.
References
This implementation is based on the synthetic experimental design framework:
https://economics.mit.edu/sites/default/files/2026-02/Synthetic%20Controls%20for%20Experimental%20Design%20Feb%202026.pdf Develops the formal framework for selecting treated units using synthetic controls to reduce bias and improve experimental design in aggregate settings.
https://ivalua.cat/sites/default/files/2023-03/Vives-i-Bastida_2022_anon.pdf Provides a practical guide to synthetic experimental design in policy contexts, including inference and design trade-offs.
- fit(**kwargs) LEXSCM#
Run the full Synthetic Experiment Design pipeline.
- This method executes the end-to-end workflow:
prepares panel data
searches over candidate treated unit sets
fits synthetic controls
evaluates pre-treatment fit
estimates statistical power (MDE)
selects the optimal design
- Parameters:
**kwargs (dict, optional) – Reserved for future extensions (currently unused).
- Returns:
LEXSCMResults – Object containing: - summary : ranked table of candidate designs - best_candidate : selected experimental design - all_candidates : full evaluated set - bnb_metadata : search diagnostics - additional dataset and inference diagnostics
Configuration#
- class mlsynth.config_models.LEXSCMConfig(*, df: ~pandas.DataFrame, outcome: str, unitid: str, time: str, candidate_col: str, m: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)], post_col: str | None = None, weight_col: str | None = None, unit_cost_col: str | None = None, budget: ~typing.Annotated[float | None, ~annotated_types.Gt(gt=0.0)] = None, seed: int = 42, frac_E: float = 0.7, covariates: ~typing.List[str] | None = None, lambda_penalty: float = 0.1, top_K: int = 20, top_P: int = 10, alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05, n_post_grid: ~typing.List[int] = <factory>, n_sims: int = 1000, post_imputation: ~typing.Literal['mean', 'max', 'double_max'] = 'mean', test_statistic: ~typing.Literal['mean_abs', 'mean', 'rms'] = 'mean_abs', mde_horizon: ~typing.Literal['early_mean', 'early_min', 'late'] = 'late', max_shortlist: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 5, power_target: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.8, imbalance_tol: ~typing.Annotated[float, ~annotated_types.Ge(ge=0)] = 0.25, display_graph: bool = False, verbose: bool = True)#
Configuration for LEXSCM - Fast Synthetic Experiment Design pipeline.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Result Containers#
LEXSCM.fit() returns a
LEXSCMResults carrying the
winning SEDCandidate
(weights, predictions, losses, inference), the full candidate shortlist, the
Stage-1 branch-and-bound metadata, and the time / unit metadata blocks. In
addition, results.post_fit is the standardized
SyntheticControlPostFit shared across the
MAREX family (LEXSCM / MAREX / SYNDES / PANGEO): same ATE / RMSE / SMD /
inference / power surface, regardless of which estimator produced the design.
- class mlsynth.utils.fast_scm_helpers.structure.LEXSCMResults(summary: pandas.DataFrame, best_candidate: mlsynth.utils.fast_scm_helpers.structure.SEDCandidate, all_candidates: List[mlsynth.utils.fast_scm_helpers.structure.SEDCandidate], bnb_metadata: Dict[str, Any], time: mlsynth.utils.fast_scm_helpers.structure.TimeInfo, units: mlsynth.utils.fast_scm_helpers.structure.UnitInfo, outcome: str, y_pop_mean_t: numpy.ndarray = <factory>, post_fit: Optional[Any] = None)#
Bases:
object- best_candidate: SEDCandidate#
- summary: DataFrame#
- time: TimeInfo#
- units: UnitInfo#
- y_pop_mean_t: ndarray#
- class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#
Bases:
objectStandardized post-fit diagnostics for a single synthetic control design.
Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left
None.- control_series: ndarray#
- gap_series: ndarray#
- power: PowerAnalysis | None = None#
- treated_series: ndarray#
- class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#
Bases:
objectStandardized power-analysis output attached to
SyntheticControlPostFit.Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same
SyntheticControlPostFitshape so every covariate-aware SCM-family estimator gets the surface for free.- curve#
MDE / power values across the requested
post_gridhorizons (so callers can read a detectability curve).
- serial_correlation#
Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.
- Type:
- baseline#
Mean of the control trajectory on the post window (denominator for
mde_pct). NaN when no post window exists.- Type:
- method#
"analytical_ar1"for the closed-form Gaussian + AR(1) MDE used here. Reserved for future"monte_carlo"extensions.- Type:
Helper Modules#
Treated-tuple selection (Stage 1)#
Exact enumeration / multi-start local search over treated \(m\)-tuples, with the budget presolve and the away-step Frank-Wolfe simplex solver.
Treated-unit selection for synthetic-control experimental design.
Drop-in replacement for the branch-and-bound search in
fast_scm_helpers.fast_scm_bb. It selects the top_K candidate
treated m-tuples that best reproduce the population predictor vector,
i.e. that minimise the imbalance
- L(S) = min_{w in simplex(S)} || sum_{j in S} w_j Xtilde_j ||^2
= w’ G_SS w, G = Xtilde’ Xtilde (PSD Gram matrix)
where Xtilde are the f-centred, standardised predictors over the
estimation window E. Because the design is f-centred, the population
target is the donor centroid (the origin), so L is the squared
distance from the origin to the convex hull of the selected donors and
sqrt(L) is the achieved imbalance.
Why this is a search and not an exact MIP#
Abadie & Zhao (2026, “Synthetic Controls for Experimental Design”, p.13)
do not require global optimality of the weights – only that the
chosen design is feasible and approximately balanced,
Xbar - sum_j w_j X_j ~= 0. Validity of the bias bound and the
inference is conditional on the achieved imbalance, a goodness-of-fit
quantity reported here, not on a certificate of optimality.
Moreover, an exact bound is structurally weak: because the origin lies inside the convex hull of all candidates, the convex relaxation lower bound is ~0 over the upper half of any branch-and-bound tree, so it cannot prune there. We therefore:
enumerate exactly when
C(M, m)is small (the gold standard), andotherwise run a strengthened multi-start local search, which in Monte Carlo lands on the exact optimum 83-100% of the time and within ~1% (mean) / ~7% (worst) of the minimal imbalance – immaterial under the approximate-balance criterion above.
The achieved imbalance of every returned design is reported so the
analyst can check the ~= 0 condition that actually licenses the
method.
- class mlsynth.utils.fast_scm_helpers.lexsearch.TreatedDesign(indices: 'List[int]', weights: 'np.ndarray', loss: 'float', imbalance: 'float', total_cost: 'float' = 0.0, label: 'str' = '', labels: 'List[Any]' = <factory>, weight_dict: 'Dict[Any, float]' = <factory>, full_weights: 'Optional[np.ndarray]' = None)#
-
- weights: ndarray#
- mlsynth.utils.fast_scm_helpers.lexsearch.select_treated_designs(G: ndarray, candidate_idx: Sequence[int], m: int, top_K: int = 20, *, unit_costs: ndarray | None = None, budget: float | None = None, method: str = 'auto', enumerate_max: int = 3000000, n_starts: int = 16, iters: int = 80, random_state: int = 0, unit_index: Any | None = None) Dict[str, Any]#
Select the
top_Ktreated m-tuples with smallest imbalance.- Parameters:
G ((n, n) PSD Gram matrix of f-centred standardised predictors.)
candidate_idx (indices eligible for treatment.)
m (number of treated units per design.)
top_K (how many designs to return.)
unit_costs, budget (optional per-unit costs and total budget.)
method (“enumerate” forces exact; “heuristic” forces local search;) – “auto” enumerates when C(M, m) <=
enumerate_maxelse searches.
- Returns:
dict –
Dictionary with:
"top_designs" : list[TreatedDesign] sorted by imbalance. "stats" : diagnostics, incl. whether the result is the exact global top-K and the achieved-imbalance range.
Control fit (Stage 2)#
- mlsynth.utils.fast_scm_helpers.fast_scm_control.evaluate_candidates(candidates: List[Solution], X: ndarray, X_E: ndarray, Y: ndarray, f: ndarray, E_idx: ndarray, B_idx: ndarray, lambda_penalty: float, index_set: IndexSet) List[SEDCandidate]#
Evaluate candidate treated-unit subsets by constructing synthetic controls.
- Parameters:
candidates (list of Solution) – Candidate subsets from branch-and-bound, each containing indices and weights.
X (np.ndarray, shape (T, N)) – Full feature matrix (outcomes + covariates).
X_E (np.ndarray, shape (T_E, N)) – Standardized feature matrix over the estimation period.
Y (np.ndarray, shape (T, J)) – Outcome matrix.
f (np.ndarray, shape (N,)) – Weight vector used to construct the target series.
E_idx (np.ndarray) – Indices for estimation period.
B_idx (np.ndarray) – Indices for validation/backcast period.
lambda_penalty (float) – Regularization strength for control QP.
- Returns:
results (list of SEDCandidate) – Evaluated candidates including weights, predictions, and loss metrics.
Notes
For each candidate subset:
Construct synthetic treated unit:
treated_vec_E = X_E[:, treated_idx] @ w
Solve for synthetic control weights
v.Compute full time-series:
synthetic treated
synthetic control
treatment effects
Evaluate fit using NMSE on estimation and validation periods.
Additional Details#
Target series is defined as:
target = X[:, :J] @ f[:J]
(weighted average over outcome units).
Candidates with failed QP solves are skipped.
Results are packaged into structured SEDCandidate objects.
Power / MDE (Stage 3)#
Moving-block placebo minimum-detectable-effect analysis.
Power / minimum-detectable-effect (MDE) analysis for LEXSCM designs.
Rebuilt to be a consistent nonparametric permutation analysis, replacing the
earlier version whose null was quasi-empirical while its alternative was a pure
Gaussian shift (an internally inconsistent power calc), and whose effect was
normalised by mean(synth_treated[-n_post:]) floored at 1e-8 (which exploded
on zero-mean outcomes).
Design#
One residual model for null and alternative. Both are built by moving-block resampling the placebo residuals (the held-out blank-period gaps treated - control, which are pure noise under H0), so serial dependence is preserved – matching Abadie & Zhao’s time-series-robust inference.
MDE in standard-deviation units (effect size), the scale used by Vives-i-Bastida (“detect effects > 0.1 s.d.”), plus the absolute outcome-unit value. No division by a fragile level, so zero-mean outcomes are fine.
Adaptive effect grid: the search extends until
power_targetis reached (or a cap), so a detectable-but-large effect never returns NaN.Multi-series placebo pooling: pass the candidate’s blank gaps and the donor-unit placebo gaps; the null resamples within a randomly chosen series, giving the paper’s cross-unit placebo distribution.
Test statistic: S(e) = mean(|e_t|) over the post window (matches the
permutation test used for inference).
- mlsynth.utils.fast_scm_helpers.lexpower.block_resample_windows(series_list, n_post, n_draws, block_len, rng)#
Return an (n_draws, n_post) array of moving-block resampled windows.
Each draw: pick a random placebo series, then tile moving blocks (with wraparound) until the window has length
n_post. Preserves within-series autocorrelation and pools across series.
- mlsynth.utils.fast_scm_helpers.lexpower.compute_mde(noise_pool, n_post: int, *, alpha: float = 0.05, power_target: float = 0.8, block_len: int | None = None, n_null: int = 4000, n_power: int = 2000, max_sd: float = 8.0, n_grid: int = 64, random_state: int = 0, baseline: float | None = None, baseline_floor: float | None = None) Dict#
Minimum detectable (constant) effect at horizon
n_post.- Parameters:
noise_pool (1D array, or list of 1D arrays) – Placebo residual series (held-out blank-period gaps; optionally also the donor-unit placebo gaps). Outcome units.
n_post (int) – Post-treatment window length.
alpha, power_target (float) – Test level and target power (e.g. 0.05 and 0.80).
block_len (int, optional) – Moving-block length; default ~ L**(1/3).
max_sd (float) – Largest effect (in residual-SD units) the adaptive grid will probe.
baseline (float, optional) – Counterfactual outcome level the effect is expressed relative to (e.g.
mean(synthetic_treated[-n_post:])). When supplied, a percentage MDE is reported – the manager-facing “we can detect a Y% effect” number.baseline_floor (float, optional) – Minimum
abs(baseline)for which a percentage is trustworthy; defaults to one residual SD (sigma). Below the floormde_pctisNaN, deliberately, so zero-mean / sign-flipping outcomes cannot reproduce the spurious blow-up that motivated dropping level-normalisation here.
- Returns:
dict –
Dictionary with:
mde_sd : MDE in residual-standard-deviation units (np.inf if not reached within ``max_sd``). mde_abs : MDE in outcome units (= mde_sd * sigma). mde_pct : MDE as a percentage of ``baseline`` (NaN if no baseline was given or it falls below ``baseline_floor``). baseline : the level used for the percentage (NaN if none). sigma : residual SD scale. c_alpha : critical value of the placebo null statistic. power_at_mde: achieved power at the reported MDE. feasible : whether power_target was reached within max_sd. curve : list of (effect_sd, power) probed.
- mlsynth.utils.fast_scm_helpers.lexpower.detectability_curve(noise_pool, n_post_grid, *, baseline_series=None, **kw) Dict#
MDE at each horizon in
n_post_grid.Returns
{'details', 'curve_sd', 'curve_pct', ...}. Ifbaseline_series(the synthetic-treated counterfactual level) is supplied, each horizonwalso gets a percentage MDE relative tomean(baseline_series[-w:])– the level over the matching post window. Seecompute_mde()for the guard that returnsNaNwhen that level is not a trustworthy magnitude.
Recommendation (Stage 4)#
Lexicographic design selection.
Final design recommendation for LEXSCM (lexicographic selection).
Replaces the old select_best_tuple, whose efficiency score multiplied an
MDE that was often NaN/Inf and whose Pareto frame could empty out and raise an
opaque IndexError.
The selection is lexicographic, matching the method’s premise (Abadie & Zhao: approximate balance first, then power):
Validity gate. Keep designs whose imbalance is within
imbalance_tol(relative) of the best achievable balance – the set of “good treated fits”.Power. Among the gated designs, pick the smallest MDE (most detectable).
Tie-breaks. Then better out-of-sample stability, then lower cost.
If no gated design has a feasible MDE, it still returns the best-by-balance
design and flags status='POWER_NOT_ESTABLISHED' instead of crashing. A
Pareto frontier (imbalance vs MDE) is always exposed for transparency.
- class mlsynth.utils.fast_scm_helpers.lexselect.DesignMetrics(design_id: 'str', indices: 'List[int]', imbalance: 'float', mde_sd: 'float' = inf, mde_abs: 'float' = inf, mde_pct: 'float' = nan, mde_feasible: 'bool' = False, stability: 'float' = nan, total_cost: 'float' = 0.0, labels: 'List[Any]' = <factory>)#
- class mlsynth.utils.fast_scm_helpers.lexselect.Recommendation(winner: 'Optional[DesignMetrics]', shortlist: 'List[DesignMetrics]', pareto_ids: 'List[str]', status: 'str', explanation: 'str', table: 'List[Dict[str, Any]]')#
-
- shortlist: List[DesignMetrics]#
- winner: DesignMetrics | None#
- mlsynth.utils.fast_scm_helpers.lexselect.select_design(designs: List[DesignMetrics], *, imbalance_tol: float = 0.25, imbalance_abs: float | None = None, max_shortlist: int = 5) Recommendation#
Lexicographic recommendation: balance gate -> power -> stability -> cost.
Data preparation and matrix construction#
Utilities for constructing the design matrices, the population target, the estimation/blank split, and the Gram matrix.
- mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_X_tilde(X: ndarray, f: ndarray, idx: ndarray)#
Standardize the feature matrix X over the estimation period, returning a normalized matrix and its Gram matrix.
- Parameters:
X (np.ndarray, shape (T, N)) – Full feature matrix including outcomes and covariates.
f (np.ndarray, shape (N,)) – Weight vector (typically group means) used for centering.
idx (np.ndarray, shape (T_E,)) – Indices corresponding to the estimation period.
J (int) – Number of outcome columns in X (the first J columns).
- Returns:
X_E (np.ndarray, shape (T_E, N)) – Standardized X over the estimation period. Each row is centered by the weighted mean mu = X[:, :J] @ f[:J] and scaled by per-time standard deviation across all columns.
G (np.ndarray, shape (N, N)) – Gram matrix of the standardized X: G = X_E.T @ X_E.
Notes
Standardization is performed only over the estimation period rows (idx).
Weighted mean is applied to the first J columns (outcome variables) only.
Rows with near-zero standard deviation are scaled with sigma=1.0 to avoid division by zero.
This function is fully vectorized using NumPy for efficiency.
- mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_Y_matrix(working_df, outcome, time, unitid, unit_index)#
Construct outcome matrix in (time × unit) format.
- Parameters:
working_df (pd.DataFrame) – Input panel data.
outcome (str) – Outcome variable name.
time (str) – Time index column.
unitid (str) – Unit identifier column.
unit_index (IndexSet) – Ordered unit index.
- Returns:
np.ndarray, shape (T, J) – Wide-format outcome matrix.
- mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_Z_matrix(working_df, covariates, time, unitid, unit_index)#
Construct stacked covariate matrix, collapsing time-invariant variables to a single row to prevent unintended over-weighting.
- Parameters:
working_df (pd.DataFrame) – Input panel data.
covariates (list of str) – List of covariate column names.
time (str) – Time column.
unitid (str) – Unit identifier column.
unit_index (IndexSet) – Unit ordering.
- Returns:
np.ndarray or None – Stacked covariate matrix (N rows x Units).
- mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_candidate_mask(working_df, candidate_col, unit_index, unitid)#
Construct boolean mask identifying candidate units.
- Parameters:
working_df (pd.DataFrame) – Preprocessed panel data.
candidate_col (str) – Column indicating eligibility for treatment.
unit_index (IndexSet) – Full set of unit labels.
unitid (str) – Unit identifier column name.
- Returns:
np.ndarray – Boolean mask of candidate units aligned with unit_index.
- mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_f_vector(working_df, weight_col, unitid, unit_index)#
Construct unit weighting vector.
- Parameters:
working_df (pd.DataFrame) – Input dataset.
weight_col (str or None) – Optional column defining unit weights.
unitid (str) – Unit identifier column.
unit_index (IndexSet) – Unit ordering.
- Returns:
np.ndarray – Normalized weight vector over units.
- mlsynth.utils.fast_scm_helpers.fast_scm_setup.prepare_experiment_inputs(Y: ndarray, Z: ndarray | None = None, f: ndarray | None = None, candidate_mask: ndarray | None = None, m: int = 5) Tuple[ndarray, ndarray, ndarray, int, int]#
Combine inputs into a unified feature matrix and aligned selection structures.
- Parameters:
Y (np.ndarray, shape (T, J)) – Outcome matrix.
Z (np.ndarray, optional) – Covariate matrix. If provided, concatenated with Y along columns.
f (np.ndarray, optional) – Weight vector of length N. Defaults to uniform weights if None.
candidate_mask (np.ndarray of bool, optional) – Boolean mask indicating candidate columns. If shorter than N, it will be extended with False values. If None, defaults to selecting only Y columns.
m (int, default=5) – Minimum required number of candidate units.
- Returns:
X (np.ndarray, shape (T, N)) – Combined feature matrix (Y and optionally Z).
f (np.ndarray, shape (N,)) – Weight vector aligned with X.
candidate_idx (np.ndarray) – Indices of candidate columns in X.
T (int) – Number of time periods.
N (int) – Total number of columns in X.
- Raises:
ValueError – If candidate_mask is longer than N or if fewer than m candidates are available.
Notes
Candidate mask is automatically extended to match the full feature dimension.
By default, only outcome columns (Y) are considered candidates.
- mlsynth.utils.fast_scm_helpers.fast_scm_setup.split_periods(T0: int, n_covariates: int, frac_E: float = 0.7, post_df: DataFrame | None = None, time_col: str = 'time') Tuple[ndarray, ndarray, ndarray, int, int]#
Returns indices and time-period counts.
Standardized post-fit (shared across the MAREX family) – the
compute_post_fit() and
compute_power_analysis() helpers that
populate results.post_fit live outside this package so SYNDES, MAREX,
and PANGEO all consume the same diagnostics module:
Standardized post-fit diagnostics for synthetic control designs and the matching power-analysis surface that consumes them.
After any MAREX-family estimator (LEXSCM, MAREX, SYNDES, PANGEO, …) solves its design problem, downstream consumers (the SAGE dashboard, paper-style reports, comparison tables) all need the same numbers:
the post-treatment ATT, total effect, percentage lift, per-period gap;
pre / blank / post root-mean-squared-error of the synthetic gap;
inference scalars (p-value, CI bounds) when computed;
covariate-balance standardized mean differences (SMDs) when covariates were used in the design.
This module exposes one frozen dataclass (SyntheticControlPostFit)
and three free functions:
compute_smd()– standalone, panel-independent SMDfrom any (cov_matrix, treated_w, control_w);
compute_post_fit()– the full diagnostic bundle fromtrajectories + boundaries + (optional) covariate matrix + (optional) inference;
compute_post_fit_marex()– adapter that builds the bundle from a
MAREXResults+MAREXPanelpair.
The free-function entry points are deliberately small and reusable, so the LEXSCM / SYNDES / PANGEO equivalents can be added one-at-a-time without touching this module: they just compose the same primitives.
- class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#
Bases:
objectMinimum detectable effect at a single post-treatment horizon.
- class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#
Bases:
objectStandardized power-analysis output attached to
SyntheticControlPostFit.Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same
SyntheticControlPostFitshape so every covariate-aware SCM-family estimator gets the surface for free.- curve#
MDE / power values across the requested
post_gridhorizons (so callers can read a detectability curve).
- serial_correlation#
Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.
- Type:
- baseline#
Mean of the control trajectory on the post window (denominator for
mde_pct). NaN when no post window exists.- Type:
- method#
"analytical_ar1"for the closed-form Gaussian + AR(1) MDE used here. Reserved for future"monte_carlo"extensions.- Type:
- class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#
Bases:
objectStandardized post-fit diagnostics for a single synthetic control design.
Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left
None.- control_series: ndarray#
- gap_series: ndarray#
- power: PowerAnalysis | None = None#
- treated_series: ndarray#
- mlsynth.utils.post_fit.compute_post_fit(treated_series: ndarray, control_series: ndarray, *, n_fit: int, n_blank: int = 0, n_post: int | None = None, cov_matrix: ndarray | None = None, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None, treated_weights: ndarray | None = None, control_weights: ndarray | None = None, population_weights: ndarray | None = None, inference: Any | None = None, n_treated_units: int | None = None) SyntheticControlPostFit#
Compute a
SyntheticControlPostFitfrom trajectories + boundaries.The trajectories
treated_seriesandcontrol_seriesare the estimator’s own synthetic constructs (Σⱼ wⱼ Yⱼ and Σⱼ vⱼ Yⱼ in Abadie-Zhou notation).n_postdefaults tolen(treated_series) - n_fit - n_blank.Covariate balance fields are populated when
cov_matrix+treated_weights+control_weightsare all supplied (the natural inputs for any MAREX-family design). Thecompute_smd()helper does the work, so the SMD numbers are exactly consistent with a standalone call tocompute_smd().Inference scalars are pulled from the estimator’s inference object via
_extract_inference(), which knows about the four common shapes (LEXSCMInference, MAREXMAREXInference, SYNDESSYNDESInference, or a plain dict). All inference fields are optional.
- mlsynth.utils.post_fit.compute_post_fit_marex(raw, panel, *, cov_scales: ndarray | None = None) SyntheticControlPostFit#
Adapt a
MAREXResults+MAREXPanelpair into aSyntheticControlPostFit.Pulls the aggregate synthetic-treated / synthetic-control trajectories from
raw.globres, the (T0, blank_periods) split frompanel.T0andpanel.blank_periods, the inference object fromraw.globres.inference, and the covariate matrix frompanel.covariates(when present).
- mlsynth.utils.post_fit.compute_power_analysis(post_fit: SyntheticControlPostFit, *, alpha: float = 0.05, power_target: float = 0.8, post_grid: Sequence[int] | None = None) PowerAnalysis#
Analytical MDE + power curve for a design’s
SyntheticControlPostFit.Uses the placebo / blank-period gap residuals (or the pre-period gap when no blank window was carved out) to estimate the noise standard deviation
sigma_placeboand the AR(1) autocorrelationrho, then computes the minimum detectable effect for each horizonTinpost_gridvia the Gaussian formulaMDE(T) = (z_{1-alpha/2} + z_{power}) * sigma_placebo * sqrt(VIF(T, rho)),
where
VIF(T, rho) = Var(mean of T AR(1) periods) / sigma_placebo^2. The headline MDE usesT = post_fit.n_post(the realised post window).- Parameters:
post_fit (SyntheticControlPostFit) – The standardized post-fit from any MAREX-family estimator.
alpha (float, default 0.05) – Two-sided significance level.
power_target (float, default 0.80) – Target power for the MDE.
post_grid (sequence of int, optional) – Post-treatment horizons at which to compute MDE. Defaults to a small geometric grid centered on
post_fit.n_postso users see the detectability tradeoff vs. running the experiment longer.
- Returns:
PowerAnalysis – Headline MDE + a curve over the requested horizons.
- mlsynth.utils.post_fit.compute_smd(cov_matrix: ndarray, treated_weights: ndarray, control_weights: ndarray, *, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None) Dict[str, Any]#
Standardized mean differences between weighted treated and control means.
- Parameters:
cov_matrix (ndarray, shape (N, M)) – Per-unit covariate values; rows align to
treated_weightsandcontrol_weights.treated_weights, control_weights (ndarray, shape (N,)) – Non-negative weights with disjoint supports. They are renormalised to sum to 1 internally (so callers may pass raw sums-to-K weights).
cov_names (sequence of str, optional) – Names for the M covariates. Defaults to
("cov_0", "cov_1", ...).cov_scales (ndarray, shape (M,), optional) – Pre-computed per-covariate standardization scales (cross-unit std). Defaults to the std of
cov_matrixcolumns. Passing the value already cached bybuild_covariate_matrixis the right move.
- Returns:
dict with keys
smd(the per-covariate dict),smd_abs_max,and
smd_squared_sum. Returns empty / NaN summaries if either weightvector is all-zero.
Example: choosing treated markets under a budget#
This end-to-end example simulates a panel of sales markets, flags a subset as treatment-eligible with heterogeneous treatment costs, and lets LEXSCM choose which \(m = 3\) markets to treat under a fixed budget – then inspects the solver-style search diagnostics and the lexicographic recommendation.
Generate a synthetic panel#
import numpy as np
import pandas as pd
from mlsynth import LEXSCM
def generate_synthetic_sales_panel(
n_units: int = 100,
n_time_periods: int = 100,
n_candidates: int = 40,
treatment_start: int = 80,
seed: int = 42,
sigma: float = 3.0,
sales_scale: float = 100.0,
) -> pd.DataFrame:
np.random.seed(seed)
unit_fe = np.random.normal(0, 8, size=n_units)
unit_trend = np.random.normal(0.3, 0.15, size=n_units)
unit_sensitivity = np.random.uniform(0.8, 1.2, size=n_units)
t = np.arange(n_time_periods)
common_factor = (
1.5 * t + 120
+ 12 * np.sin(2 * np.pi * t / 52)
+ np.random.normal(0, 3.5, n_time_periods)
)
sales = np.zeros((n_time_periods, n_units))
for j in range(n_units):
sales[:, j] = (
sales_scale
+ common_factor * unit_sensitivity[j]
+ unit_fe[j]
+ unit_trend[j] * t
+ np.random.normal(0, sigma, n_time_periods)
)
sales = np.maximum(sales, 5.0)
# heterogeneous per-market treatment costs
base_cost = np.random.lognormal(mean=12, sigma=0.6, size=n_units) * 5
size_factor = np.random.uniform(0.7, 1.4, n_units)
treatment_cost = np.round((base_cost * size_factor) / 10) * 10
unit_ids = np.repeat(np.arange(n_units), n_time_periods)
time_ids = np.tile(np.arange(n_time_periods), n_units)
sales_flat = sales.ravel(order="F")
df = pd.DataFrame({
"unitid": unit_ids,
"time": time_ids,
"sales": sales_flat,
"treatment_cost": np.repeat(treatment_cost, n_time_periods),
})
candidate_mask = np.zeros(n_units, dtype=bool)
candidate_idx = np.random.choice(n_units, size=n_candidates, replace=False)
candidate_mask[candidate_idx] = True
df["candidate"] = np.repeat(candidate_mask, n_time_periods)
df["post"] = (df["time"] >= treatment_start).astype(int)
return df
Run LEXSCM and inspect the design#
df = generate_synthetic_sales_panel(
n_units=120,
n_candidates=40,
n_time_periods=100,
treatment_start=90,
seed=4545,
)
config = {
"df": df,
"outcome": "sales",
"unitid": "unitid",
"time": "time",
"candidate_col": "candidate",
"m": 3, # treat 3 of the 40 eligible markets
"post_col": "post",
"unit_cost_col": "treatment_cost",
"budget": 4_000_000, # hard knapsack cap on selected costs
"lambda_penalty": 0.5,
"mde_horizon": "late", # conservative MDE at the longest horizon
"power_target": 0.8,
"imbalance_tol": 0.25, # validity gate: within 25% of best balance
"top_K": 20,
}
results = LEXSCM(config).fit()
# --- Stage 1: solver-style search diagnostics ---
stats = results.bnb_metadata["stats"]
print("status :", stats["termination"]["status"]) # OPTIMAL / FEASIBLE
print("method :", stats["search"]["method"])
print("subsets scored :", stats["search"]["subsets_evaluated"])
print("C(M,m) :", stats["problem"]["feasible_region_C(M,m)"])
print("best imbalance :", round(stats["incumbent"]["imbalance"], 4))
# --- Stage 4: lexicographic recommendation tuple ---
rec = results.bnb_metadata["recommendation"]
print("rec status :", rec["status"]) # OK / POWER_NOT_ESTABLISHED
print("winner :", rec["winner"])
print("pareto ids :", rec["pareto_ids"])
print(rec["explanation"])
# --- the recommended design ---
print(results.summary) # ranked per-design table
best = results.best_candidate
print("treated weights:", best.treated_weight_dict)
print("control weights:", best.control_weight_dict)
With \(M = 40\) eligible markets and \(m = 3\),
\(\binom{40}{3} = 9{,}880\) is well under enumerate_max, so the
search enumerates exactly and reports status = OPTIMAL – a
certified global top_K. Raising \(M\) and \(m\) until
\(\binom{M}{m}\) exceeds the cap flips the search to the multi-start
local solver (status = FEASIBLE), at which point the consensus
block under stats["search"] reports how many independent starts agreed
on the incumbent.
References#
Abadie, A., & Zhao, J. (2026). “Synthetic Controls for Experimental Design.” arXiv Working Paper 2108.02196. (The February 2026 revision develops the design-based bias bound and the approximate-balance criterion used here.) https://arxiv.org/abs/2108.02196
Vives-i-Bastida, J. (2022). “Synthetic Experimental Design for a UBI pilot study.” Working paper (https://ivalua.cat/sites/default/files/2023-03/Vives-i-Bastida_2022_anon.pdf)
Doudchenko, N., Khosravi, K., Pouget-Abadie, J., Lahaie, S., et al. “Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls.” – related optimization view of synthetic experimental design.