Lexicographic Synthetic Control (LEXSCM)

Contents

Lexicographic Synthetic Control (LEXSCM)#

Overview#

LEXSCM is a tool for synthetic experimental design: given a panel of units, a subset of which are eligible for treatment, it chooses which \(m\) of them to actually treat so that the rest of the panel forms a credible synthetic control for the treated group, and so that the resulting experiment has enough statistical power to detect the effects the analyst cares about. It is the design-stage analogue of the estimation-stage synthetic control methods elsewhere in mlsynth: instead of taking the treated unit as given and building a donor combination, it takes the donor pool as given and chooses the treated combination.

The method follows Abadie & Zhao, Synthetic Controls for Experimental Design [ABADIE2024], with the power / minimum-detectable-effect (MDE) machinery drawn from the synthetic-experimental-design work of Vives-i-Bastida (2022). LEXSCM optimizes two objectives in a strict priority order (“lexicographically”):

  1. Validity – the treated combination reproduces the population trajectory on the pre-treatment predictors (small imbalance);

  2. Power – subject to validity, the design detects the smallest possible effect (small minimum detectable effect).

The pipeline has four stages, each in its own helper module:

  1. Treated-tuple selection (lexsearch) – search the \(\binom{M}{m}\) candidate treated combinations for the most balanced ones, under an optional budget.

  2. Control fit (fast_scm_control) – build a synthetic control for each candidate treated group and score its pre-treatment fit.

  3. Power analysis (lexpower) – a moving-block placebo MDE curve over a grid of post-treatment horizons.

  4. Recommendation (lexselect) – a lexicographic rule (validity gate \(\to\) power \(\to\) stability \(\to\) cost) returning a single recommended design plus a Pareto frontier.

Mathematical Formulation#

Setup and notation#

We observe \(J\) units over a pre-treatment period of length \(T_0\). Stack the pre-period outcomes as \(Y \in \mathbb{R}^{T_0 \times J}\) (rows = time, columns = units) and, optionally, \(K\) time-invariant or pre-period covariates as \(Z \in \mathbb{R}^{K \times J}\). The two are stacked vertically into the predictor matrix

\[\begin{split}X = \begin{bmatrix} Y \\ Z \end{bmatrix} \in \mathbb{R}^{(T_0 + K) \times J},\end{split}\]

so each column \(X_{\cdot, j}\) is unit \(j\)’s predictor profile. A subset \(\mathcal{C} \subseteq \{1, \dots, J\}\) of size \(M = |\mathcal{C}|\) is flagged as treatment-eligible (the candidate_col); the design must pick \(m\) of these.

A population weighting vector \(f \in \mathbb{R}^J\), \(f \ge 0\), \(\mathbf{1}^\top f = 1\) (uniform \(1/J\) by default, or a weight_col such as population or revenue) defines the estimand’s target trajectory – the \(f\)-weighted average unit

\[\bar{X} = X f, \qquad \bar{X}_t = X_{t, \cdot}\, f .\]

The pre-period predictor rows are split into an estimation window \(E\) (the first \(\lfloor \texttt{frac\_E} \cdot T_0 \rfloor\) time rows, plus all covariate rows) and a held-out blank window \(B\) (the remaining pre-period time rows). \(E\) is where the design is fit; \(B\) is a pre-treatment “dress rehearsal” with no treatment, used to validate fit and to calibrate power.

Over \(E\) the predictors are row-standardized against the population target:

\[\widetilde{X}_{t, j} = \frac{X_{t, j} - \bar{X}_t}{\sigma_t}, \qquad \sigma_t = \operatorname{sd}_j\!\bigl(X_{t, \cdot}\bigr),\]

with \(\sigma_t\) floored away from zero. Centring on \(\bar{X}_t\) puts the population target at the origin; scaling by the cross-sectional spread \(\sigma_t\) makes mixed-scale predictors (outcomes in dollars, covariates in percent) commensurable so no single predictor dominates the fit. The \(J \times J\) Gram matrix

\[G = \widetilde{X}_E^\top \widetilde{X}_E \succeq 0\]

summarizes all pairwise predictor inner products over \(E\).

Identifying assumptions#

LEXSCM inherits the design-based identification of Abadie & Zhao [ABADIE2024]. The untreated potential outcomes are assumed to follow an (approximate) linear factor model

\[Y_{j, t}(0) = \delta_t + \theta_t^\top \mu_j + \varepsilon_{j, t},\]

with common time effects \(\delta_t\), latent time factors \(\theta_t\), unit loadings \(\mu_j\), and mean-zero transitory shocks \(\varepsilon_{j, t}\).

Assumption 1 (approximate balance). There exist treatment weights \(w\) on the chosen treated set \(S\) and control weights \(v\) on the remainder such that the synthetic treated and synthetic control reproduce the population target on the pre-period predictors, \(\bar{X} - \sum_{j \in S} w_j X_j \approx 0\). Remark. This is the design analogue of the SCM convex-hull condition, and it is the only substantive requirement. Crucially it is not imposed as an axiom: the achieved imbalance \(\lVert \bar{X} - \sum_j w_j X_j\rVert\) is a measurable goodness-of-fit quantity, reported for every design, on which the validity of the bias bound and the inference is conditional (Abadie & Zhao, p.13). The analyst checks the \(\approx 0\) condition rather than assuming it.

Assumption 2 (factor structure controls bias). Under the linear factor model, matching the treated and synthetic-control trajectories on a long enough pre-period drives the latent loadings \(\mu_j\) into alignment, so the design-based bias of the average treatment effect on the treated is bounded by the achieved imbalance and shrinks as \(T_0\) grows and the fit tightens. Remark. This is why Stage 1 minimizes imbalance rather than any in-sample treatment contrast: imbalance is the quantity the bias bound is written in.

Assumption 3 (placebo exchangeability / weak stationarity). On the blank window \(B\) no treatment has occurred, so the treated-minus-control gap is pure noise; that gap process is weakly stationary and free of anticipation, so its serial-dependence structure carries over to the post-treatment window. Remark. This is what licenses the Stage 3 power analysis: the post-treatment null distribution of the test statistic is reconstructed by moving-block resampling the blank-window gaps, which preserves autocorrelation – the time-series-robust inference of Abadie & Zhao rather than an i.i.d. permutation.

Stage 1 – Treated-tuple selection#

For a treated set \(S\) with \(|S| = m\) and simplex weights \(w \in \Delta(S) = \{w \ge 0 : \mathbf{1}^\top w = 1\}\), the synthetic treated unit is \(\sum_{j \in S} w_j X_j\). Its imbalance against the population target is, over the estimation window,

\[L(S) = \min_{w \in \Delta(S)} \Bigl\lVert \sum_{j \in S} w_j \widetilde{X}_{E, j} \Bigr\rVert_2^2 = \min_{w \in \Delta(S)} w^\top G_{SS}\, w ,\]

where \(G_{SS}\) is the \(m \times m\) sub-Gram matrix on the rows and columns of \(S\). Because the design is \(f\)-centred the population target sits at the origin, so \(L(S)\) is the squared distance from the population centroid to the convex hull of the selected donors, and \(\sqrt{L(S)}\) is the achieved imbalance of Assumption 1. Stage 1 returns the top_K sets of smallest \(L(S)\), subject to the budget below.

How a single tuple is built: the inner simplex QP#

“Building” a tuple \(S\) means solving its inner problem \(\min_{w \in \Delta(S)} w^\top G_{SS}\, w\) for the optimal treatment weights \(w(S)\); the synthetic treated unit is then \(\sum_{j \in S} w_j(S)\, X_j\) and the design’s achieved imbalance is \(\sqrt{w(S)^\top G_{SS}\, w(S)}\). This convex quadratic program over the probability simplex is solved by an Away-step Frank-Wolfe (AFW) routine in pure NumPy (_afw_single), chosen because every iterate stays exactly on the simplex (no projection step) and the away move removes the zig-zagging that plain Frank-Wolfe suffers near a face-constrained optimum – so the support of \(w\) (which donors carry positive weight) sharpens in a handful of iterations.

Write \(Q = G_{SS}\), \(f(w) = w^\top Q w\), \(\nabla f(w) = 2 Q w\). From a vertex start \(w^{(0)} = e_{\arg\min_i Q_{ii}}\) (the single donor closest to the target), each iteration:

  1. Pick two vertices. The Frank-Wolfe vertex \(s = \arg\min_i [\nabla f(w)]_i\) and the away vertex \(a = \arg\max_{i \in \operatorname{supp}(w)} [\nabla f(w)]_i\) – the currently active donor the gradient most wants to shed.

  2. Choose the direction. Compare the FW direction \(d_{\mathrm{FW}} = e_s - w\) against the away direction \(d_{\mathrm{AW}} = w - e_a\) by \(\langle \nabla f, d\rangle\) and take whichever descends more. The away step’s maximal feasible length is \(\gamma_{\max} = w_a / (1 - w_a)\); the FW step caps at \(1\).

  3. Exact line search. Because \(f\) is quadratic the optimal step along \(d\) is closed-form,

    \[\gamma^\star = \operatorname{clip}\!\Bigl( -\frac{w^\top Q d}{d^\top Q d},\; 0,\; \gamma_{\max}\Bigr),\]

    then \(w \leftarrow w + \gamma^\star d\), dropping \(a\) from the active set when a full away step empties it.

  4. Certified lower bound. The Frank-Wolfe gap gives a running lower bound \(f(w) + \min_i[\nabla f(w)]_i - \nabla f(w)^\top w \le f(w^\star)\) on the tuple’s true minimal loss; iteration stops once the duality gap \(\nabla f(w)^\top w - \min_i[\nabla f(w)]_i\) drops below tol.

Two-pass precision. Scoring every candidate tuple to convergence would be wasteful, so Stage 1 runs AFW at two fidelities. A vectorized batched AFW (_afw_batched, iters=80, all \(m \times m\) problems advanced together with einsum) ranks thousands of tuples in one sweep; only the surviving top_K are re-solved by the scalar _afw_single at iters=600, tol=1e-14 to pin \(w(S)\) and the loss to full precision. Each returned TreatedDesign then carries its indices, the high-precision weights \(w(S)\), loss \(= L(S)\), imbalance \(= \sqrt{L(S)}\), total_cost, and a label-keyed weight_dict.

Why a search and not a full enumeration or an exact MIP#

The obvious approach – enumerate every treated combination and keep the best – is defeated by combinatorics. With \(M\) eligible markets and \(m\) to treat there are \(\binom{M}{m}\) combinations. For a modest design with \(M = 120\) eligible markets choosing \(m = 4\) that is already

\[\binom{120}{4} = 8{,}214{,}570\]

candidate tuples, and the count explodes super-polynomially in \(m\): \(\binom{120}{6} \approx 3.8 \times 10^{9}\) and \(\binom{200}{6} \approx 8.2 \times 10^{10}\). Scoring a simplex QP for each is hopeless past small cases.

The natural fallback – a branch-and-bound integer program with a convex relaxation lower bound to prune – also fails here, for a structural reason specific to this objective. The population target is the \(f\)-weighted centroid of the candidate predictors, so it lies inside the convex hull of the full candidate set. Any convex (cardinality-free) relaxation of “distance from the origin to the hull of a chosen subset” is therefore \(\approx 0\) over the entire upper half of the branch-and-bound tree: the relaxation cannot certify that a partial selection is bad, so it cannot prune. An exact MIP would degenerate into near-exhaustive enumeration with extra overhead.

Both failures are also unnecessary. Abadie & Zhao [ABADIE2024] require only that the chosen design be feasible and approximately balanced; validity is conditional on the achieved imbalance, a reported quantity, not on a certificate of global optimality. LEXSCM therefore:

  • Enumerates exactly when \(\binom{M}{m} \le \texttt{enumerate\_max}\) (default \(3{,}000{,}000\)). This is the gold standard – it returns the certified global top_K and reports termination status OPTIMAL.

  • Runs a strengthened multi-start local search otherwise: greedy construction from diverse seeds, best-improvement swap descent to a local optimum, and basin-hopping kicks to escape it. In Monte Carlo this lands on the exact optimum 83-100% of the time and within roughly \(1\%\) (mean) / \(7\%\) (worst case) of the minimal imbalance – immaterial under the approximate-balance criterion. Termination status is FEASIBLE.

Because the local search has no MIP optimality gap, LEXSCM reports a consensus diagnostic in its place: the fraction of independent starts that converged to the incumbent (consensus_rate), the number of distinct local optima seen, and the incumbent-improvement trail. High consensus across many random starts is the practical confidence signal that the incumbent is the global optimum. (A convex hull lower bound is also reported, but only as advisory information – for the reason above it is \(\approx 0\) and is deliberately not turned into an optimality gap.)

Building tuples in the heuristic regime#

When \(\binom{M}{m}\) exceeds enumerate_max the same per-tuple QP is reused, but the set of tuples scored is grown adaptively by a multi-start local search (_local_search):

  • Seeding. \(2 \times\) n_starts seeds: the n_starts units with the smallest \(G_{jj}\) (single donors already nearest the population centroid – the cheapest places to start near balance) plus n_starts uniformly random units for diversity.

  • Greedy construction. From a seed, repeatedly add the candidate that most lowers the batched loss of the partial tuple until \(|S| = m\), skipping any addition that would breach the budget.

  • Best-improvement descent. Score the full 1-swap neighbourhood (replace one member with one non-member), move to the steepest improving swap, and repeat to a local optimum.

  • Basin-hopping kicks. Perturb the incumbent with a random 2-swap kick and re-descend, keeping the better basin; repeated n_kicks (4) times per start.

Every distinct tuple ever scored is cached, and the global top_K of that cache is returned and re-solved to full precision exactly as in the exact path. The consensus block – how many independent starts’ final optima coincide with the incumbent – is the confidence diagnostic that stands in for the absent MIP gap.

Budget constraints#

When a per-unit treatment cost \(c_j\) is supplied (unit_cost_col, constant within unit) and a total budget \(B_{\max}\), every returned design must satisfy the hard knapsack constraint

\[\sum_{j \in S} c_j \le B_{\max}.\]

Two mechanisms enforce it. First, a sound feasibility presolve removes any eligible unit \(i\) that cannot belong to any budget-feasible \(m\)-tuple, i.e. whenever

\[c_i + \sum_{\text{(}m-1\text{) cheapest other candidates}} c_j \;>\; B_{\max}.\]

This only ever drops provably impossible units; it never touches the objective, so it cannot change which feasible design is optimal. Second, the search itself respects the budget at every step: exact enumeration filters combinations by their cost sum, and the local search rejects any greedy addition, swap, or kick that would breach \(B_{\max}\). If no \(m\)-tuple fits the budget the stage reports INFEASIBLE.

Stage 1 returns a solver-style diagnostics block (stats) with the problem definition, presolve removals, search method and number of subsets evaluated, the incumbent and worst-in-pool imbalance, the advisory relaxation bound, the solution pool of top_K tuples, and termination status / runtime.

Stage 2 – Control fit#

For each candidate treated set \(S\) with weights \(w\), the synthetic treated trajectory is \(X_{\cdot, S}\, w\). A synthetic control is then built from the non-treated units: control weights \(v\) solve a ridge-penalized quadratic program (penalty lambda_penalty) matching the synthetic treated over \(E\), subject to an exclusion constraint \(v_j = 0\) for all \(j \in S\) (a treated unit cannot also be its own control). The per-period gap is

\[e_t = (X_{\cdot, S}\, w)_t - (X v)_t .\]

Pre-treatment fit is summarized by the normalized mean-squared error (NMSE) of the synthetic treated against the target on both windows (nmse_E, nmse_B). The blank-window gaps \(e_B = \{e_t : t \in B\}\) (residuals_B) are, under Assumption 3, pure placebo noise; they are the raw material for the power analysis. This stage is identical to the pre-rebuild control path – only the search and power stages around it changed.

Stage 3 – Power analysis (minimum detectable effect)#

Stage 3 asks: how large must a sustained treatment effect be for this design to detect it? The answer is a minimum-detectable-effect (MDE) curve over post-treatment horizon lengths, computed entirely from the Stage 2 placebo residuals with one consistent resampling model for both the null and the alternative.

What “at each horizon” means – and what it does not#

The curve is indexed by the length \(h\) of the post-treatment window (n_post_grid, default \(h = 2, \dots, 8\)), not by calendar period. There is no “MDE at period \(t = 91\)”. Instead, for each candidate duration \(h\), the analysis manufactures many synthetic \(h\)-long gap windows by resampling the placebo residuals and asks what constant effect a window of that length could detect. Reading the curve as \(h\) grows tells you how detectability improves the longer the experiment runs.

Moving-block resampling#

Let the placebo pool be one or more residual series – the chosen design’s blank-window gaps \(e_B\), optionally pooled with donor-unit placebo gaps. The scale \(\sigma\) is the standard deviation of the pooled residuals (all series concatenated, ddof=1, floored at \(10^{-12}\)). The block length defaults to

\[\ell = \max\!\bigl(1,\ \min(h,\ \operatorname{round}(\tilde L^{1/3}))\bigr),\]

where \(\tilde L\) is the median series length: floored at 1, and capped at the horizon \(h\) so a block never exceeds the window it fills. To draw one length-\(h\) window (block_resample_windows()): pick a placebo series uniformly at random, then repeatedly cut contiguous blocks of length \(\ell\) from a random offset with wraparound (np.take(..., mode="wrap")), concatenate them, and truncate to \(h\). Wrapping preserves the within-series autocorrelation (Assumption 3); the random series choice reproduces the cross-unit placebo distribution.

Test statistic and critical value#

The statistic is the mean absolute gap over the window,

\[S(e) = \frac{1}{h} \sum_{t=1}^{h} \lvert e_t \rvert .\]

Resampling \(n_\text{null}\) (default 4000) placebo windows gives the null distribution; its \((1 - \alpha)\) empirical quantile is the critical value \(c_\alpha\).

Power, the effect grid, and the MDE#

A treatment effect is modelled as a constant additive shift \(\tau\) applied to every point of a resampled window – the homogeneous-effect working assumption behind the MDE. Its power is estimated from a fresh draw of \(n_\text{power}\) (default 2000) windows (same residual model, new randomness), so the null and the alternative differ only by the shift:

\[\operatorname{power}(\tau) = \Pr\bigl[\, S(e + \tau) \ge c_\alpha \,\bigr] \approx \frac{1}{n_\text{power}} \sum_{b} \mathbf{1}\!\bigl\{ S(e^{(b)} + \tau) \ge c_\alpha \bigr\}.\]

The effect is swept on a grid of \(n_\text{grid} = 64\) points in standard-deviation units, \(\tau / \sigma \in [0, \texttt{max\_sd}]\) (default cap \(8\)). The search walks the grid until \(\operatorname{power}(\tau)\) first reaches power_target, then linearly interpolates between the last sub-threshold point \((g_0, p_0)\) and the crossing point \((g, p)\) for a finer value:

\[\frac{\mathrm{MDE}(h)}{\sigma} = g_0 + (\texttt{power\_target} - p_0)\,\frac{g - g_0}{p - p_0} .\]

If the grid is exhausted without reaching the target, the horizon is infeasible and returns \(\infty\).

Three reported scales#

Each horizon reports the MDE on three scales:

  • mde_sd \(= \mathrm{MDE}(h)/\sigma\)effect-size units, the primary scale (matching Vives-i-Bastida’s “detect effects larger than \(0.1\) s.d.” convention) and numerically robust: nothing is divided by a fragile mean, so zero-mean or near-zero outcomes cannot blow the calculation up;

  • mde_abs \(= \mathrm{MDE}(h)\)outcome units;

  • mde_pct – the manager-facing percentage, \(100 \cdot \mathrm{MDE}(h) / \lvert\text{baseline}\rvert\), where the baseline is the counterfactual level over the matching window, \(\text{baseline} = \operatorname{mean}\bigl(\texttt{synthetic\_treated} [-h:]\bigr)\). This is guarded: when \(\lvert\text{baseline}\rvert\) falls below a floor (default one \(\sigma\)) the percentage is returned as NaN deliberately, so a near-zero or sign-flipping level cannot manufacture a spurious “we can detect a 0.3% effect.”

Detectability curve and horizon collapse#

detectability_curve() sweeps the horizon grid and returns curve_sd (\(h \mapsto \mathrm{MDE}(h)/\sigma\)), curve_pct (the percentage curve), the per-horizon details, and min_horizon_mde_le_0p1sd – the shortest horizon whose MDE falls to \(0.1\sigma\), a quick read on how long the experiment must run to clear the conventional effect-size bar. Stage 4 then collapses the curve to a single representative MDE per the mde_horizon setting:

  • "late" (default) – the MDE at the longest horizon; the conservative choice for a sustained-exposure experiment.

  • "early_min" – the smallest MDE across feasible horizons (most optimistic detectability).

  • "early_mean" – the mean MDE across feasible horizons (the percentage averaged only over horizons where it is defined).

Stage 4 – Final recommendation (lexicographic selection)#

The final stage turns the per-design metrics – imbalance (validity), mde_sd (power), stability \(=\) out-of-sample \(\text{NMSE}_B\) (fit robustness), and total_cost – into one recommendation, applying the method’s priorities in strict order:

  1. Validity gate. Keep only designs whose imbalance is within a relative slack of the best achievable balance,

    \[\text{imbalance}(S) \le \bigl(1 + \texttt{imbalance\_tol}\bigr)\cdot \min_{S'} \text{imbalance}(S') ,\]

    (default imbalance_tol \(= 0.25\)). This is the set of designs that satisfy Assumption 1 well enough to be trustworthy. (A degenerate tolerance that would empty the gate falls back to the single best-balanced design.)

  2. Power. Among gated designs with a feasible MDE, choose the smallest mde_sd – the most detectable valid design.

  3. Tie-breaks. Break ties by better out-of-sample stability (\(\text{NMSE}_B\)), then by lower total_cost.

The recommendation is returned as a tuple of fields: winner (the chosen design), shortlist (the top max_shortlist ranked designs), pareto_ids (the Pareto frontier on imbalance \(\downarrow\) vs. mde_sd \(\downarrow\), always exposed for transparency), a status, a human-readable explanation, and a per-design table (which becomes results.summary). The status is one of:

  • OK – a valid, adequately powered design was found;

  • POWER_NOT_ESTABLISHED – no gated design reached the power target within the effect grid, so the best-balanced design is recommended and the power caveat is flagged (the pipeline degrades gracefully rather than crashing);

  • EMPTY – no candidate designs were supplied.

Worked numeric walkthrough#

To make the two core mechanisms concrete, here is a single tuple built and powered end to end on a deliberately tiny panel (\(T_0 = 6\) pre-period rows, \(J = 5\) units, treat \(m = 2\)). Every number below is the actual helper output, not an illustration.

Building the tuple (Stage 1)#

After \(f\)-centring and row-standardising the predictors, the Gram matrix’s diagonal – each unit’s squared distance to the population centroid – is

\[\operatorname{diag}(G) = (5.77,\ 3.64,\ 6.08,\ 6.27,\ 8.24).\]

Unit 1 (0-indexed) sits closest to the target, so it seeds the AFW vertex start. With \(\binom{5}{2} = 10\) the search enumerates exactly (status OPTIMAL, 10 subsets scored). The winning tuple is \(S = \{0, 1\}\), and its inner simplex QP returns treatment weights

\[w(S) = (0.4098,\ 0.5902), \qquad L(S) = w^\top G_{SS}\, w = 1.6481, \qquad \sqrt{L(S)} = 1.2838 .\]

The standalone high-precision re-solve reproduces this loss with a Frank-Wolfe lower bound equal to it to \(10^{-6}\) – the QP is at its certified optimum. The synthetic treated unit is \(0.41\,X_0 + 0.59\,X_1\), and \(1.2838\) is the achieved imbalance on which the bias bound (Assumption 1-2) is conditional.

Powering the design (Stage 3)#

Take this design’s blank-window placebo gaps to be a length-\(L = 24\) series with \(\sigma = 0.8695\). The auto block length is \(\ell = \min\bigl(h, \operatorname{round}(24^{1/3})\bigr) = \min(h, 3)\), so at \(h = 2\) it is capped to \(2\) and at \(h \ge 3\) it is \(3\). With a counterfactual level of \(\text{baseline} = 12.0\), the per-horizon MDE is:

\(h\)

\(\ell\)

\(c_\alpha\)

mde_sd

mde_abs

mde_pct

2

2

1.245

2.037

1.771

14.8%

4

3

1.108

1.672

1.454

12.1%

8

3

0.981

1.282

1.115

9.3%

Reading down the table is the detectability curve: as the horizon lengthens, averaging over more periods shrinks both the critical value \(c_\alpha\) and the MDE, so a sustained effect of \(\approx 9\%\) of the counterfactual becomes detectable by \(h = 8\), versus \(\approx 15\%\) at \(h = 2\). Under mde_horizon="late" Stage 4 carries the \(h = 8\) row (mde_sd \(= 1.28\)) as this design’s power score; under "early_min" it would carry the smallest feasible mde_sd.

Standardized Post-Fit and Power Analysis#

LEXSCM’s Stage 3 already produces the lexicographic MDE curve used to rank designs against each other; the standardized post-fit attached to LEXSCMResults.post_fit is the complementary surface used once a design has been chosen – it is the same SyntheticControlPostFit object SYNDES, MAREX, and PANGEO expose, so downstream consumers (dashboards, paper-style reports, comparison tables) read identical fields across the family.

pf = res.post_fit                          # SyntheticControlPostFit
pf.ate, pf.ate_percent, pf.total_effect    # treatment-effect scalars
pf.rmse_fit, pf.rmse_blank, pf.rmse_post   # pre / blank / post fit quality
pf.p_value, pf.ci_lower, pf.ci_upper       # conformal inference
pf.power                                   # PowerAnalysis (see below)

The trajectories pf.treated_series and pf.control_series are the winning candidate’s predictions.synthetic_treated and predictions.synthetic_control – per-unit weighted aggregates of the treated and control donors over the full timeline. The phase boundaries (n_fit, n_blank, n_post) line up with the same E / B / post split LEXSCM uses internally (so n_fit + n_blank = T0 and the 30%-of-pre-tail holdout convention applies via frac_E = 0.7).

Where the unified power analysis fits in#

Stage 3’s detectability_curve() is the design-selection workhorse – it converts the moving-block placebo null on the B window into a per-horizon MDE used by Stage 4’s lexicographic selector. The post-fit power analysis is the post-selection companion: analytical Gaussian + AR(1) variance inflation on the realised gap residuals, matching the MAREX / SYNDES / PANGEO power surfaces exactly so the same diagnostic table can be produced for every family member.

\[\mathrm{MDE}(T) = \bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr) \cdot \hat\sigma_{\text{placebo}} \cdot \sqrt{\mathrm{VIF}(T, \hat\rho)},\]

with \(\hat\sigma_{\text{placebo}}\) the SD of the gap on the B (blank / holdout) window, \(\hat\rho\) the lag-1 autocorrelation of those residuals clipped to \((-0.99, 0.99)\), and \(\mathrm{VIF}(T, \rho) = \tfrac{1}{T}\bigl(1 + 2\sum_{k=1}^{T-1} (1-k/T)\rho^k\bigr)\) the standard AR(1) variance-inflation factor (textbook \(1/T\) when \(\rho = 0\)). See Synthetic Controls for Experimental Design (MAREX) for the full derivation; the same module powers all three estimators.

p = res.post_fit.power                      # PowerAnalysis
p.headline.mde_absolute                     # MDE at the realised T_post
p.headline.mde_pct                          # ... as % of post-period baseline
p.headline.power_at_observed                # power to detect res.post_fit.ate
p.curve                                     # tuple of MDEPoint per horizon
p.sigma_placebo                             # σ̂ used (B window in LEXSCM)
p.serial_correlation                        # ρ̂ AR(1) of the B residuals

Two MDEs, complementary roles#

  • Stage 3 MDE (best_candidate.mde_results) – moving-block placebo null on the B window, used to rank designs against each other. Aggregated to a representative scalar by mde_horizon (late / early_min / early_mean) and consumed by Stage 4’s lexicographic gate.

  • Post-fit MDE (res.post_fit.power) – analytical Gaussian + AR(1) MDE consumed after a design has been chosen, on the same surface that MAREX / SYNDES / PANGEO produce. Use this when reporting a single detectability number alongside the realised ATE / CI.

Power-analysis failures (e.g. degenerate B-window residuals) never break a fit; res.post_fit.power is simply left as None. To compute on a non-default horizon grid or significance level call compute_power_analysis() directly.

Core API#

class mlsynth.estimators.lexscm.LEXSCM(config)#

Bases: object

Lexicographic Synthetic Control (LEXSCM) estimator.

This estimator automatically designs synthetic control experiments by jointly optimizing:

  1. Pre-treatment fit (validity)

  2. Statistical power (detectability of effects)

The user interacts with the estimator through a single entry point: fit(), while all modeling choices are controlled via the configuration object.

Parameters:

config (dict or LEXSCMConfig) – Configuration object specifying data inputs, identification strategy, synthetic control settings, search budget, and inference parameters.

The following fields are supported:

Required

  • df (pd.DataFrame): Panel dataset containing unit-level observations over time.

  • outcome (str): Name of the outcome variable.

  • unitid (str): Column identifying observational units.

  • time (str): Time index column.

  • candidate_col (str): Boolean (0/1 or True/False) column indicating units eligible for treatment assignment.

  • m (int): Number of units selected per treated tuple.

Identification / Design

  • post_col (str, optional): Indicator for post-treatment period (0/1).

  • frac_E (float): Fraction of pre-treatment period used for estimation window E.

  • unit_cost_col (str, optional): Per-unit treatment cost column (must be constant within unit).

  • budget (float, optional): Total budget constraint on selected treated units.

  • seed (int): Random seed for reproducibility.

Synthetic Control Specification

  • weight_col (str, optional): Unit-level weights (e.g., population, revenue).

  • covariates (list of str, optional): Covariates included in synthetic control construction.

  • lambda_penalty (float): Regularization penalty for control mismatch in quadratic program.

Search / Optimization Budget

  • top_K (int): Number of top candidate tuples returned by branch-and-bound.

  • top_P (int): Number of seed units used for BnB initialization.

Inference / Power Analysis

  • alpha (float): Significance level for statistical testing.

  • n_post_grid (list of int): Post-treatment horizons used for MDE detectability curves.

  • n_sims (int): Number of Monte Carlo simulations for null distribution.

  • post_imputation ({“mean”, “max”, “double_max”}): Method for imputing post-treatment outcomes.

  • test_statistic ({“mean_abs”, “mean”, “rms”}): Test statistic used for treatment effect evaluation.

  • delta (float): Absolute minimum detectable effect threshold.

  • relative_delta (float, optional): Relative MDE threshold (> 1.0).

  • target_mde_horizon (str): Target horizon for MDE evaluation (e.g., early_mde_avg).

  • max_shortlist (int): Maximum number of candidate designs retained after filtering.

System / Logging

  • verbose (bool): If True, enables progress logging.

Returns:

LEXSCM – Initialized estimator ready for .fit() execution.

See also

LEXSCM.fit

Executes the full optimization and estimation pipeline

Notes

  • The pipeline integrates: search, estimation, evaluation, power analysis, selection.

  • Designed for fully automated experimental design under constraints.

References

This implementation is based on the synthetic experimental design framework:

fit(**kwargs) LEXSCM#

Run the full Synthetic Experiment Design pipeline.

This method executes the end-to-end workflow:
  • prepares panel data

  • searches over candidate treated unit sets

  • fits synthetic controls

  • evaluates pre-treatment fit

  • estimates statistical power (MDE)

  • selects the optimal design

Parameters:

**kwargs (dict, optional) – Reserved for future extensions (currently unused).

Returns:

LEXSCMResults – Object containing: - summary : ranked table of candidate designs - best_candidate : selected experimental design - all_candidates : full evaluated set - bnb_metadata : search diagnostics - additional dataset and inference diagnostics

Configuration#

class mlsynth.config_models.LEXSCMConfig(*, df: ~pandas.DataFrame, outcome: str, unitid: str, time: str, candidate_col: str, m: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)], post_col: str | None = None, weight_col: str | None = None, unit_cost_col: str | None = None, budget: ~typing.Annotated[float | None, ~annotated_types.Gt(gt=0.0)] = None, seed: int = 42, frac_E: float = 0.7, covariates: ~typing.List[str] | None = None, lambda_penalty: float = 0.1, top_K: int = 20, top_P: int = 10, alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05, n_post_grid: ~typing.List[int] = <factory>, n_sims: int = 1000, post_imputation: ~typing.Literal['mean', 'max', 'double_max'] = 'mean', test_statistic: ~typing.Literal['mean_abs', 'mean', 'rms'] = 'mean_abs', mde_horizon: ~typing.Literal['early_mean', 'early_min', 'late'] = 'late', max_shortlist: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 5, power_target: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.8, imbalance_tol: ~typing.Annotated[float, ~annotated_types.Ge(ge=0)] = 0.25, display_graph: bool = False, verbose: bool = True)#

Configuration for LEXSCM - Fast Synthetic Experiment Design pipeline.

class Config#
arbitrary_types_allowed = True#
extra = 'forbid'#
alpha: float#
budget: float | None#
candidate_col: str#
covariates: List[str] | None#
display_graph: bool#
frac_E: float#
imbalance_tol: float#
lambda_penalty: float#
m: int#
max_shortlist: int#
mde_horizon: Literal['early_mean', 'early_min', 'late']#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_post_grid: List[int]#
n_sims: int#
post_col: str | None#
post_imputation: Literal['mean', 'max', 'double_max']#
power_target: float#
seed: int#
test_statistic: Literal['mean_abs', 'mean', 'rms']#
top_K: int#
top_P: int#
unit_cost_col: str | None#
verbose: bool#
weight_col: str | None#

Result Containers#

LEXSCM.fit() returns a LEXSCMResults carrying the winning SEDCandidate (weights, predictions, losses, inference), the full candidate shortlist, the Stage-1 branch-and-bound metadata, and the time / unit metadata blocks. In addition, results.post_fit is the standardized SyntheticControlPostFit shared across the MAREX family (LEXSCM / MAREX / SYNDES / PANGEO): same ATE / RMSE / SMD / inference / power surface, regardless of which estimator produced the design.

class mlsynth.utils.fast_scm_helpers.structure.LEXSCMResults(summary: pandas.DataFrame, best_candidate: mlsynth.utils.fast_scm_helpers.structure.SEDCandidate, all_candidates: List[mlsynth.utils.fast_scm_helpers.structure.SEDCandidate], bnb_metadata: Dict[str, Any], time: mlsynth.utils.fast_scm_helpers.structure.TimeInfo, units: mlsynth.utils.fast_scm_helpers.structure.UnitInfo, outcome: str, y_pop_mean_t: numpy.ndarray = <factory>, post_fit: Optional[Any] = None)#

Bases: object

all_candidates: List[SEDCandidate]#
best_candidate: SEDCandidate#
bnb_metadata: Dict[str, Any]#
outcome: str#
post_fit: Any | None = None#
summary: DataFrame#
time: TimeInfo#
units: UnitInfo#
y_pop_mean_t: ndarray#
class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#

Bases: object

Standardized post-fit diagnostics for a single synthetic control design.

Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.

ate: float | None = None#
ate_per_period: ndarray | None = None#
ate_percent: float | None = None#
ci_lower: float | None = None#
ci_upper: float | None = None#
control_series: ndarray#
covariate_names: Tuple[str, ...] = ()#
covariate_smd: Dict[str, float] | None = None#
covariate_smd_abs_max: float | None = None#
covariate_smd_control_vs_pop: Dict[str, float] | None = None#
covariate_smd_control_vs_pop_abs_max: float | None = None#
covariate_smd_control_vs_pop_squared_sum: float | None = None#
covariate_smd_squared_sum: float | None = None#
covariate_smd_treated_vs_pop: Dict[str, float] | None = None#
covariate_smd_treated_vs_pop_abs_max: float | None = None#
covariate_smd_treated_vs_pop_squared_sum: float | None = None#
cumulative_effect: ndarray | None = None#
gap_series: ndarray#
inference_method: str | None = None#
n_blank: int#
n_fit: int#
n_post: int#
p_value: float | None = None#
power: PowerAnalysis | None = None#
rmse_blank: float | None = None#
rmse_fit: float | None = None#
rmse_post: float | None = None#
total_effect: float | None = None#
treated_series: ndarray#
class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#

Bases: object

Standardized power-analysis output attached to SyntheticControlPostFit.

Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same SyntheticControlPostFit shape so every covariate-aware SCM-family estimator gets the surface for free.

headline#

MDE for the actual n_post horizon of the realised design.

Type:

MDEPoint

curve#

MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).

Type:

list of MDEPoint

alpha#

Two-sided significance level assumed.

Type:

float

power_target#

Target power the MDEs are computed at (default 0.80).

Type:

float

sigma_placebo#

Standard deviation of the placebo gap series used as the noise scale.

Type:

float

serial_correlation#

Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.

Type:

float

baseline#

Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.

Type:

float

method#

"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Reserved for future "monte_carlo" extensions.

Type:

str

alpha: float#
baseline: float#
curve: Tuple[MDEPoint, ...]#
headline: MDEPoint#
mde_by_horizon() Dict[int, float]#

{post_periods: mde_pct} for quick lookup.

method: str = 'analytical_ar1'#
power_target: float#
serial_correlation: float#
sigma_placebo: float#
class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#

Bases: object

Minimum detectable effect at a single post-treatment horizon.

mde_absolute: float#
mde_pct: float#
post_periods: int#
power_at_observed: float | None = None#
se: float#

Helper Modules#

Treated-tuple selection (Stage 1)#

Exact enumeration / multi-start local search over treated \(m\)-tuples, with the budget presolve and the away-step Frank-Wolfe simplex solver.

Treated-unit selection for synthetic-control experimental design.

Drop-in replacement for the branch-and-bound search in fast_scm_helpers.fast_scm_bb. It selects the top_K candidate treated m-tuples that best reproduce the population predictor vector, i.e. that minimise the imbalance

L(S) = min_{w in simplex(S)} || sum_{j in S} w_j Xtilde_j ||^2

= w’ G_SS w, G = Xtilde’ Xtilde (PSD Gram matrix)

where Xtilde are the f-centred, standardised predictors over the estimation window E. Because the design is f-centred, the population target is the donor centroid (the origin), so L is the squared distance from the origin to the convex hull of the selected donors and sqrt(L) is the achieved imbalance.

Why this is a search and not an exact MIP#

Abadie & Zhao (2026, “Synthetic Controls for Experimental Design”, p.13) do not require global optimality of the weights – only that the chosen design is feasible and approximately balanced, Xbar - sum_j w_j X_j ~= 0. Validity of the bias bound and the inference is conditional on the achieved imbalance, a goodness-of-fit quantity reported here, not on a certificate of optimality.

Moreover, an exact bound is structurally weak: because the origin lies inside the convex hull of all candidates, the convex relaxation lower bound is ~0 over the upper half of any branch-and-bound tree, so it cannot prune there. We therefore:

  • enumerate exactly when C(M, m) is small (the gold standard), and

  • otherwise run a strengthened multi-start local search, which in Monte Carlo lands on the exact optimum 83-100% of the time and within ~1% (mean) / ~7% (worst) of the minimal imbalance – immaterial under the approximate-balance criterion above.

The achieved imbalance of every returned design is reported so the analyst can check the ~= 0 condition that actually licenses the method.

class mlsynth.utils.fast_scm_helpers.lexsearch.TreatedDesign(indices: 'List[int]', weights: 'np.ndarray', loss: 'float', imbalance: 'float', total_cost: 'float' = 0.0, label: 'str' = '', labels: 'List[Any]' = <factory>, weight_dict: 'Dict[Any, float]' = <factory>, full_weights: 'Optional[np.ndarray]' = None)#
full_weights: ndarray | None = None#
imbalance: float#
indices: List[int]#
label: str = ''#
labels: List[Any]#
loss: float#
total_cost: float = 0.0#
weight_dict: Dict[Any, float]#
weights: ndarray#
mlsynth.utils.fast_scm_helpers.lexsearch.select_treated_designs(G: ndarray, candidate_idx: Sequence[int], m: int, top_K: int = 20, *, unit_costs: ndarray | None = None, budget: float | None = None, method: str = 'auto', enumerate_max: int = 3000000, n_starts: int = 16, iters: int = 80, random_state: int = 0, unit_index: Any | None = None) Dict[str, Any]#

Select the top_K treated m-tuples with smallest imbalance.

Parameters:
  • G ((n, n) PSD Gram matrix of f-centred standardised predictors.)

  • candidate_idx (indices eligible for treatment.)

  • m (number of treated units per design.)

  • top_K (how many designs to return.)

  • unit_costs, budget (optional per-unit costs and total budget.)

  • method (“enumerate” forces exact; “heuristic” forces local search;) – “auto” enumerates when C(M, m) <= enumerate_max else searches.

Returns:

dict

Dictionary with:

"top_designs" : list[TreatedDesign] sorted by imbalance.
"stats"       : diagnostics, incl. whether the result is the exact
                global top-K and the achieved-imbalance range.

Control fit (Stage 2)#

mlsynth.utils.fast_scm_helpers.fast_scm_control.evaluate_candidates(candidates: List[Solution], X: ndarray, X_E: ndarray, Y: ndarray, f: ndarray, E_idx: ndarray, B_idx: ndarray, lambda_penalty: float, index_set: IndexSet) List[SEDCandidate]#

Evaluate candidate treated-unit subsets by constructing synthetic controls.

Parameters:
  • candidates (list of Solution) – Candidate subsets from branch-and-bound, each containing indices and weights.

  • X (np.ndarray, shape (T, N)) – Full feature matrix (outcomes + covariates).

  • X_E (np.ndarray, shape (T_E, N)) – Standardized feature matrix over the estimation period.

  • Y (np.ndarray, shape (T, J)) – Outcome matrix.

  • f (np.ndarray, shape (N,)) – Weight vector used to construct the target series.

  • E_idx (np.ndarray) – Indices for estimation period.

  • B_idx (np.ndarray) – Indices for validation/backcast period.

  • lambda_penalty (float) – Regularization strength for control QP.

Returns:

results (list of SEDCandidate) – Evaluated candidates including weights, predictions, and loss metrics.

Notes

For each candidate subset:

  1. Construct synthetic treated unit:

    treated_vec_E = X_E[:, treated_idx] @ w
    
  2. Solve for synthetic control weights v.

  3. Compute full time-series:

    • synthetic treated

    • synthetic control

    • treatment effects

  4. Evaluate fit using NMSE on estimation and validation periods.

Additional Details#

  • Target series is defined as:

    target = X[:, :J] @ f[:J]
    

    (weighted average over outcome units).

  • Candidates with failed QP solves are skipped.

  • Results are packaged into structured SEDCandidate objects.

Power / MDE (Stage 3)#

Moving-block placebo minimum-detectable-effect analysis.

Power / minimum-detectable-effect (MDE) analysis for LEXSCM designs.

Rebuilt to be a consistent nonparametric permutation analysis, replacing the earlier version whose null was quasi-empirical while its alternative was a pure Gaussian shift (an internally inconsistent power calc), and whose effect was normalised by mean(synth_treated[-n_post:]) floored at 1e-8 (which exploded on zero-mean outcomes).

Design#

  • One residual model for null and alternative. Both are built by moving-block resampling the placebo residuals (the held-out blank-period gaps treated - control, which are pure noise under H0), so serial dependence is preserved – matching Abadie & Zhao’s time-series-robust inference.

  • MDE in standard-deviation units (effect size), the scale used by Vives-i-Bastida (“detect effects > 0.1 s.d.”), plus the absolute outcome-unit value. No division by a fragile level, so zero-mean outcomes are fine.

  • Adaptive effect grid: the search extends until power_target is reached (or a cap), so a detectable-but-large effect never returns NaN.

  • Multi-series placebo pooling: pass the candidate’s blank gaps and the donor-unit placebo gaps; the null resamples within a randomly chosen series, giving the paper’s cross-unit placebo distribution.

Test statistic: S(e) = mean(|e_t|) over the post window (matches the permutation test used for inference).

mlsynth.utils.fast_scm_helpers.lexpower.block_resample_windows(series_list, n_post, n_draws, block_len, rng)#

Return an (n_draws, n_post) array of moving-block resampled windows.

Each draw: pick a random placebo series, then tile moving blocks (with wraparound) until the window has length n_post. Preserves within-series autocorrelation and pools across series.

mlsynth.utils.fast_scm_helpers.lexpower.compute_mde(noise_pool, n_post: int, *, alpha: float = 0.05, power_target: float = 0.8, block_len: int | None = None, n_null: int = 4000, n_power: int = 2000, max_sd: float = 8.0, n_grid: int = 64, random_state: int = 0, baseline: float | None = None, baseline_floor: float | None = None) Dict#

Minimum detectable (constant) effect at horizon n_post.

Parameters:
  • noise_pool (1D array, or list of 1D arrays) – Placebo residual series (held-out blank-period gaps; optionally also the donor-unit placebo gaps). Outcome units.

  • n_post (int) – Post-treatment window length.

  • alpha, power_target (float) – Test level and target power (e.g. 0.05 and 0.80).

  • block_len (int, optional) – Moving-block length; default ~ L**(1/3).

  • max_sd (float) – Largest effect (in residual-SD units) the adaptive grid will probe.

  • baseline (float, optional) – Counterfactual outcome level the effect is expressed relative to (e.g. mean(synthetic_treated[-n_post:])). When supplied, a percentage MDE is reported – the manager-facing “we can detect a Y% effect” number.

  • baseline_floor (float, optional) – Minimum abs(baseline) for which a percentage is trustworthy; defaults to one residual SD (sigma). Below the floor mde_pct is NaN, deliberately, so zero-mean / sign-flipping outcomes cannot reproduce the spurious blow-up that motivated dropping level-normalisation here.

Returns:

dict

Dictionary with:

mde_sd      : MDE in residual-standard-deviation units (np.inf if not
              reached within ``max_sd``).
mde_abs     : MDE in outcome units (= mde_sd * sigma).
mde_pct     : MDE as a percentage of ``baseline`` (NaN if no baseline
              was given or it falls below ``baseline_floor``).
baseline    : the level used for the percentage (NaN if none).
sigma       : residual SD scale.
c_alpha     : critical value of the placebo null statistic.
power_at_mde: achieved power at the reported MDE.
feasible    : whether power_target was reached within max_sd.
curve       : list of (effect_sd, power) probed.

mlsynth.utils.fast_scm_helpers.lexpower.detectability_curve(noise_pool, n_post_grid, *, baseline_series=None, **kw) Dict#

MDE at each horizon in n_post_grid.

Returns {'details', 'curve_sd', 'curve_pct', ...}. If baseline_series (the synthetic-treated counterfactual level) is supplied, each horizon w also gets a percentage MDE relative to mean(baseline_series[-w:]) – the level over the matching post window. See compute_mde() for the guard that returns NaN when that level is not a trustworthy magnitude.

Recommendation (Stage 4)#

Lexicographic design selection.

Final design recommendation for LEXSCM (lexicographic selection).

Replaces the old select_best_tuple, whose efficiency score multiplied an MDE that was often NaN/Inf and whose Pareto frame could empty out and raise an opaque IndexError.

The selection is lexicographic, matching the method’s premise (Abadie & Zhao: approximate balance first, then power):

  1. Validity gate. Keep designs whose imbalance is within imbalance_tol (relative) of the best achievable balance – the set of “good treated fits”.

  2. Power. Among the gated designs, pick the smallest MDE (most detectable).

  3. Tie-breaks. Then better out-of-sample stability, then lower cost.

If no gated design has a feasible MDE, it still returns the best-by-balance design and flags status='POWER_NOT_ESTABLISHED' instead of crashing. A Pareto frontier (imbalance vs MDE) is always exposed for transparency.

class mlsynth.utils.fast_scm_helpers.lexselect.DesignMetrics(design_id: 'str', indices: 'List[int]', imbalance: 'float', mde_sd: 'float' = inf, mde_abs: 'float' = inf, mde_pct: 'float' = nan, mde_feasible: 'bool' = False, stability: 'float' = nan, total_cost: 'float' = 0.0, labels: 'List[Any]' = <factory>)#
design_id: str#
imbalance: float#
indices: List[int]#
labels: List[Any]#
mde_abs: float = inf#
mde_feasible: bool = False#
mde_pct: float = nan#
mde_sd: float = inf#
stability: float = nan#
total_cost: float = 0.0#
class mlsynth.utils.fast_scm_helpers.lexselect.Recommendation(winner: 'Optional[DesignMetrics]', shortlist: 'List[DesignMetrics]', pareto_ids: 'List[str]', status: 'str', explanation: 'str', table: 'List[Dict[str, Any]]')#
explanation: str#
pareto_ids: List[str]#
shortlist: List[DesignMetrics]#
status: str#
table: List[Dict[str, Any]]#
winner: DesignMetrics | None#
mlsynth.utils.fast_scm_helpers.lexselect.select_design(designs: List[DesignMetrics], *, imbalance_tol: float = 0.25, imbalance_abs: float | None = None, max_shortlist: int = 5) Recommendation#

Lexicographic recommendation: balance gate -> power -> stability -> cost.

Data preparation and matrix construction#

Utilities for constructing the design matrices, the population target, the estimation/blank split, and the Gram matrix.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_X_tilde(X: ndarray, f: ndarray, idx: ndarray)#

Standardize the feature matrix X over the estimation period, returning a normalized matrix and its Gram matrix.

Parameters:
  • X (np.ndarray, shape (T, N)) – Full feature matrix including outcomes and covariates.

  • f (np.ndarray, shape (N,)) – Weight vector (typically group means) used for centering.

  • idx (np.ndarray, shape (T_E,)) – Indices corresponding to the estimation period.

  • J (int) – Number of outcome columns in X (the first J columns).

Returns:

  • X_E (np.ndarray, shape (T_E, N)) – Standardized X over the estimation period. Each row is centered by the weighted mean mu = X[:, :J] @ f[:J] and scaled by per-time standard deviation across all columns.

  • G (np.ndarray, shape (N, N)) – Gram matrix of the standardized X: G = X_E.T @ X_E.

Notes

  • Standardization is performed only over the estimation period rows (idx).

  • Weighted mean is applied to the first J columns (outcome variables) only.

  • Rows with near-zero standard deviation are scaled with sigma=1.0 to avoid division by zero.

  • This function is fully vectorized using NumPy for efficiency.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_Y_matrix(working_df, outcome, time, unitid, unit_index)#

Construct outcome matrix in (time × unit) format.

Parameters:
  • working_df (pd.DataFrame) – Input panel data.

  • outcome (str) – Outcome variable name.

  • time (str) – Time index column.

  • unitid (str) – Unit identifier column.

  • unit_index (IndexSet) – Ordered unit index.

Returns:

np.ndarray, shape (T, J) – Wide-format outcome matrix.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_Z_matrix(working_df, covariates, time, unitid, unit_index)#

Construct stacked covariate matrix, collapsing time-invariant variables to a single row to prevent unintended over-weighting.

Parameters:
  • working_df (pd.DataFrame) – Input panel data.

  • covariates (list of str) – List of covariate column names.

  • time (str) – Time column.

  • unitid (str) – Unit identifier column.

  • unit_index (IndexSet) – Unit ordering.

Returns:

np.ndarray or None – Stacked covariate matrix (N rows x Units).

mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_candidate_mask(working_df, candidate_col, unit_index, unitid)#

Construct boolean mask identifying candidate units.

Parameters:
  • working_df (pd.DataFrame) – Preprocessed panel data.

  • candidate_col (str) – Column indicating eligibility for treatment.

  • unit_index (IndexSet) – Full set of unit labels.

  • unitid (str) – Unit identifier column name.

Returns:

np.ndarray – Boolean mask of candidate units aligned with unit_index.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_f_vector(working_df, weight_col, unitid, unit_index)#

Construct unit weighting vector.

Parameters:
  • working_df (pd.DataFrame) – Input dataset.

  • weight_col (str or None) – Optional column defining unit weights.

  • unitid (str) – Unit identifier column.

  • unit_index (IndexSet) – Unit ordering.

Returns:

np.ndarray – Normalized weight vector over units.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.prepare_experiment_inputs(Y: ndarray, Z: ndarray | None = None, f: ndarray | None = None, candidate_mask: ndarray | None = None, m: int = 5) Tuple[ndarray, ndarray, ndarray, int, int]#

Combine inputs into a unified feature matrix and aligned selection structures.

Parameters:
  • Y (np.ndarray, shape (T, J)) – Outcome matrix.

  • Z (np.ndarray, optional) – Covariate matrix. If provided, concatenated with Y along columns.

  • f (np.ndarray, optional) – Weight vector of length N. Defaults to uniform weights if None.

  • candidate_mask (np.ndarray of bool, optional) – Boolean mask indicating candidate columns. If shorter than N, it will be extended with False values. If None, defaults to selecting only Y columns.

  • m (int, default=5) – Minimum required number of candidate units.

Returns:

  • X (np.ndarray, shape (T, N)) – Combined feature matrix (Y and optionally Z).

  • f (np.ndarray, shape (N,)) – Weight vector aligned with X.

  • candidate_idx (np.ndarray) – Indices of candidate columns in X.

  • T (int) – Number of time periods.

  • N (int) – Total number of columns in X.

Raises:

ValueError – If candidate_mask is longer than N or if fewer than m candidates are available.

Notes

  • Candidate mask is automatically extended to match the full feature dimension.

  • By default, only outcome columns (Y) are considered candidates.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.split_periods(T0: int, n_covariates: int, frac_E: float = 0.7, post_df: DataFrame | None = None, time_col: str = 'time') Tuple[ndarray, ndarray, ndarray, int, int]#

Returns indices and time-period counts.

Standardized post-fit (shared across the MAREX family) – the compute_post_fit() and compute_power_analysis() helpers that populate results.post_fit live outside this package so SYNDES, MAREX, and PANGEO all consume the same diagnostics module:

Standardized post-fit diagnostics for synthetic control designs and the matching power-analysis surface that consumes them.

After any MAREX-family estimator (LEXSCM, MAREX, SYNDES, PANGEO, …) solves its design problem, downstream consumers (the SAGE dashboard, paper-style reports, comparison tables) all need the same numbers:

  • the post-treatment ATT, total effect, percentage lift, per-period gap;

  • pre / blank / post root-mean-squared-error of the synthetic gap;

  • inference scalars (p-value, CI bounds) when computed;

  • covariate-balance standardized mean differences (SMDs) when covariates were used in the design.

This module exposes one frozen dataclass (SyntheticControlPostFit) and three free functions:

  • compute_smd() – standalone, panel-independent SMD

    from any (cov_matrix, treated_w, control_w);

  • compute_post_fit() – the full diagnostic bundle from

    trajectories + boundaries + (optional) covariate matrix + (optional) inference;

  • compute_post_fit_marex() – adapter that builds the bundle from a

    MAREXResults + MAREXPanel pair.

The free-function entry points are deliberately small and reusable, so the LEXSCM / SYNDES / PANGEO equivalents can be added one-at-a-time without touching this module: they just compose the same primitives.

class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#

Bases: object

Minimum detectable effect at a single post-treatment horizon.

mde_absolute: float#
mde_pct: float#
post_periods: int#
power_at_observed: float | None = None#
se: float#
class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#

Bases: object

Standardized power-analysis output attached to SyntheticControlPostFit.

Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same SyntheticControlPostFit shape so every covariate-aware SCM-family estimator gets the surface for free.

headline#

MDE for the actual n_post horizon of the realised design.

Type:

MDEPoint

curve#

MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).

Type:

list of MDEPoint

alpha#

Two-sided significance level assumed.

Type:

float

power_target#

Target power the MDEs are computed at (default 0.80).

Type:

float

sigma_placebo#

Standard deviation of the placebo gap series used as the noise scale.

Type:

float

serial_correlation#

Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.

Type:

float

baseline#

Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.

Type:

float

method#

"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Reserved for future "monte_carlo" extensions.

Type:

str

alpha: float#
baseline: float#
curve: Tuple[MDEPoint, ...]#
headline: MDEPoint#
mde_by_horizon() Dict[int, float]#

{post_periods: mde_pct} for quick lookup.

method: str = 'analytical_ar1'#
power_target: float#
serial_correlation: float#
sigma_placebo: float#
class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#

Bases: object

Standardized post-fit diagnostics for a single synthetic control design.

Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.

ate: float | None = None#
ate_per_period: ndarray | None = None#
ate_percent: float | None = None#
ci_lower: float | None = None#
ci_upper: float | None = None#
control_series: ndarray#
covariate_names: Tuple[str, ...] = ()#
covariate_smd: Dict[str, float] | None = None#
covariate_smd_abs_max: float | None = None#
covariate_smd_control_vs_pop: Dict[str, float] | None = None#
covariate_smd_control_vs_pop_abs_max: float | None = None#
covariate_smd_control_vs_pop_squared_sum: float | None = None#
covariate_smd_squared_sum: float | None = None#
covariate_smd_treated_vs_pop: Dict[str, float] | None = None#
covariate_smd_treated_vs_pop_abs_max: float | None = None#
covariate_smd_treated_vs_pop_squared_sum: float | None = None#
cumulative_effect: ndarray | None = None#
gap_series: ndarray#
inference_method: str | None = None#
n_blank: int#
n_fit: int#
n_post: int#
p_value: float | None = None#
power: PowerAnalysis | None = None#
rmse_blank: float | None = None#
rmse_fit: float | None = None#
rmse_post: float | None = None#
total_effect: float | None = None#
treated_series: ndarray#
mlsynth.utils.post_fit.compute_post_fit(treated_series: ndarray, control_series: ndarray, *, n_fit: int, n_blank: int = 0, n_post: int | None = None, cov_matrix: ndarray | None = None, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None, treated_weights: ndarray | None = None, control_weights: ndarray | None = None, population_weights: ndarray | None = None, inference: Any | None = None, n_treated_units: int | None = None) SyntheticControlPostFit#

Compute a SyntheticControlPostFit from trajectories + boundaries.

The trajectories treated_series and control_series are the estimator’s own synthetic constructs (Σⱼ wⱼ Yⱼ and Σⱼ vⱼ Yⱼ in Abadie-Zhou notation). n_post defaults to len(treated_series) - n_fit - n_blank.

Covariate balance fields are populated when cov_matrix + treated_weights + control_weights are all supplied (the natural inputs for any MAREX-family design). The compute_smd() helper does the work, so the SMD numbers are exactly consistent with a standalone call to compute_smd().

Inference scalars are pulled from the estimator’s inference object via _extract_inference(), which knows about the four common shapes (LEXSCM Inference, MAREX MAREXInference, SYNDES SYNDESInference, or a plain dict). All inference fields are optional.

mlsynth.utils.post_fit.compute_post_fit_marex(raw, panel, *, cov_scales: ndarray | None = None) SyntheticControlPostFit#

Adapt a MAREXResults + MAREXPanel pair into a SyntheticControlPostFit.

Pulls the aggregate synthetic-treated / synthetic-control trajectories from raw.globres, the (T0, blank_periods) split from panel.T0 and panel.blank_periods, the inference object from raw.globres.inference, and the covariate matrix from panel.covariates (when present).

mlsynth.utils.post_fit.compute_power_analysis(post_fit: SyntheticControlPostFit, *, alpha: float = 0.05, power_target: float = 0.8, post_grid: Sequence[int] | None = None) PowerAnalysis#

Analytical MDE + power curve for a design’s SyntheticControlPostFit.

Uses the placebo / blank-period gap residuals (or the pre-period gap when no blank window was carved out) to estimate the noise standard deviation sigma_placebo and the AR(1) autocorrelation rho, then computes the minimum detectable effect for each horizon T in post_grid via the Gaussian formula

MDE(T) = (z_{1-alpha/2} + z_{power}) * sigma_placebo * sqrt(VIF(T, rho)),

where VIF(T, rho) = Var(mean of T AR(1) periods) / sigma_placebo^2. The headline MDE uses T = post_fit.n_post (the realised post window).

Parameters:
  • post_fit (SyntheticControlPostFit) – The standardized post-fit from any MAREX-family estimator.

  • alpha (float, default 0.05) – Two-sided significance level.

  • power_target (float, default 0.80) – Target power for the MDE.

  • post_grid (sequence of int, optional) – Post-treatment horizons at which to compute MDE. Defaults to a small geometric grid centered on post_fit.n_post so users see the detectability tradeoff vs. running the experiment longer.

Returns:

PowerAnalysis – Headline MDE + a curve over the requested horizons.

mlsynth.utils.post_fit.compute_smd(cov_matrix: ndarray, treated_weights: ndarray, control_weights: ndarray, *, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None) Dict[str, Any]#

Standardized mean differences between weighted treated and control means.

Parameters:
  • cov_matrix (ndarray, shape (N, M)) – Per-unit covariate values; rows align to treated_weights and control_weights.

  • treated_weights, control_weights (ndarray, shape (N,)) – Non-negative weights with disjoint supports. They are renormalised to sum to 1 internally (so callers may pass raw sums-to-K weights).

  • cov_names (sequence of str, optional) – Names for the M covariates. Defaults to ("cov_0", "cov_1", ...).

  • cov_scales (ndarray, shape (M,), optional) – Pre-computed per-covariate standardization scales (cross-unit std). Defaults to the std of cov_matrix columns. Passing the value already cached by build_covariate_matrix is the right move.

Returns:

  • dict with keys smd (the per-covariate dict), smd_abs_max,

  • and smd_squared_sum. Returns empty / NaN summaries if either weight

  • vector is all-zero.

Example: choosing treated markets under a budget#

This end-to-end example simulates a panel of sales markets, flags a subset as treatment-eligible with heterogeneous treatment costs, and lets LEXSCM choose which \(m = 3\) markets to treat under a fixed budget – then inspects the solver-style search diagnostics and the lexicographic recommendation.

Generate a synthetic panel#

import numpy as np
import pandas as pd

from mlsynth import LEXSCM


def generate_synthetic_sales_panel(
    n_units: int = 100,
    n_time_periods: int = 100,
    n_candidates: int = 40,
    treatment_start: int = 80,
    seed: int = 42,
    sigma: float = 3.0,
    sales_scale: float = 100.0,
) -> pd.DataFrame:

    np.random.seed(seed)

    unit_fe = np.random.normal(0, 8, size=n_units)
    unit_trend = np.random.normal(0.3, 0.15, size=n_units)
    unit_sensitivity = np.random.uniform(0.8, 1.2, size=n_units)

    t = np.arange(n_time_periods)
    common_factor = (
        1.5 * t + 120
        + 12 * np.sin(2 * np.pi * t / 52)
        + np.random.normal(0, 3.5, n_time_periods)
    )

    sales = np.zeros((n_time_periods, n_units))
    for j in range(n_units):
        sales[:, j] = (
            sales_scale
            + common_factor * unit_sensitivity[j]
            + unit_fe[j]
            + unit_trend[j] * t
            + np.random.normal(0, sigma, n_time_periods)
        )
    sales = np.maximum(sales, 5.0)

    # heterogeneous per-market treatment costs
    base_cost = np.random.lognormal(mean=12, sigma=0.6, size=n_units) * 5
    size_factor = np.random.uniform(0.7, 1.4, n_units)
    treatment_cost = np.round((base_cost * size_factor) / 10) * 10

    unit_ids = np.repeat(np.arange(n_units), n_time_periods)
    time_ids = np.tile(np.arange(n_time_periods), n_units)
    sales_flat = sales.ravel(order="F")

    df = pd.DataFrame({
        "unitid": unit_ids,
        "time": time_ids,
        "sales": sales_flat,
        "treatment_cost": np.repeat(treatment_cost, n_time_periods),
    })

    candidate_mask = np.zeros(n_units, dtype=bool)
    candidate_idx = np.random.choice(n_units, size=n_candidates, replace=False)
    candidate_mask[candidate_idx] = True
    df["candidate"] = np.repeat(candidate_mask, n_time_periods)
    df["post"] = (df["time"] >= treatment_start).astype(int)

    return df

Run LEXSCM and inspect the design#

df = generate_synthetic_sales_panel(
    n_units=120,
    n_candidates=40,
    n_time_periods=100,
    treatment_start=90,
    seed=4545,
)

config = {
    "df": df,
    "outcome": "sales",
    "unitid": "unitid",
    "time": "time",
    "candidate_col": "candidate",
    "m": 3,                         # treat 3 of the 40 eligible markets
    "post_col": "post",
    "unit_cost_col": "treatment_cost",
    "budget": 4_000_000,            # hard knapsack cap on selected costs
    "lambda_penalty": 0.5,
    "mde_horizon": "late",          # conservative MDE at the longest horizon
    "power_target": 0.8,
    "imbalance_tol": 0.25,          # validity gate: within 25% of best balance
    "top_K": 20,
}

results = LEXSCM(config).fit()

# --- Stage 1: solver-style search diagnostics ---
stats = results.bnb_metadata["stats"]
print("status        :", stats["termination"]["status"])      # OPTIMAL / FEASIBLE
print("method        :", stats["search"]["method"])
print("subsets scored :", stats["search"]["subsets_evaluated"])
print("C(M,m)        :", stats["problem"]["feasible_region_C(M,m)"])
print("best imbalance :", round(stats["incumbent"]["imbalance"], 4))

# --- Stage 4: lexicographic recommendation tuple ---
rec = results.bnb_metadata["recommendation"]
print("rec status    :", rec["status"])      # OK / POWER_NOT_ESTABLISHED
print("winner        :", rec["winner"])
print("pareto ids    :", rec["pareto_ids"])
print(rec["explanation"])

# --- the recommended design ---
print(results.summary)                        # ranked per-design table
best = results.best_candidate
print("treated weights:", best.treated_weight_dict)
print("control weights:", best.control_weight_dict)

With \(M = 40\) eligible markets and \(m = 3\), \(\binom{40}{3} = 9{,}880\) is well under enumerate_max, so the search enumerates exactly and reports status = OPTIMAL – a certified global top_K. Raising \(M\) and \(m\) until \(\binom{M}{m}\) exceeds the cap flips the search to the multi-start local solver (status = FEASIBLE), at which point the consensus block under stats["search"] reports how many independent starts agreed on the incumbent.

References#

Abadie, A., & Zhao, J. (2026). “Synthetic Controls for Experimental Design.” arXiv Working Paper 2108.02196. (The February 2026 revision develops the design-based bias bound and the approximate-balance criterion used here.) https://arxiv.org/abs/2108.02196

Vives-i-Bastida, J. (2022). “Synthetic Experimental Design for a UBI pilot study.” Working paper (https://ivalua.cat/sites/default/files/2023-03/Vives-i-Bastida_2022_anon.pdf)

Doudchenko, N., Khosravi, K., Pouget-Abadie, J., Lahaie, S., et al. “Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls.” – related optimization view of synthetic experimental design.