Lexicographic Synthetic Control (LEXSCM)

Lexicographic Synthetic Control (LEXSCM)#

Overview#

LEXSCM is a tool for synthetic experimental design: given a panel of units, a subset of which are eligible for treatment, it chooses which $m$ of them to actually treat so that the rest of the panel forms a credible synthetic control for the treated group, and so that the resulting experiment has enough statistical power to detect the effects the analyst cares about. It is the design-stage analogue of the estimation-stage synthetic control methods elsewhere in mlsynth: instead of taking the treated unit as given and building a donor combination, it takes the donor pool as given and chooses the treated combination.

The method follows Abadie & Zhao, Synthetic Controls for Experimental Design [ABADIE2024], with the power / minimum-detectable-effect (MDE) machinery drawn from the synthetic-experimental-design work of Vives-i-Bastida (2022). LEXSCM optimizes two objectives in a strict priority order (“lexicographically”):

Validity – the treated combination reproduces the population trajectory on the pre-treatment predictors (small imbalance);
Power – subject to validity, the design detects the smallest possible effect (small minimum detectable effect).

The pipeline has four stages, each in its own helper module:

Treated-tuple selection (lexsearch) – search the $\binom{M}{m}$ candidate treated combinations for the most balanced ones, under an optional budget.
Control fit (fast_scm_control) – build a synthetic control for each candidate treated group and score its pre-treatment fit.
Power analysis (lexpower) – a moving-block placebo MDE curve over a grid of post-treatment horizons.
Recommendation (lexselect) – a lexicographic rule (validity gate $\to$ power $\to$ stability $\to$ cost) returning a single recommended design plus a Pareto frontier.

When to use this estimator#

Reach for LEXSCM before an experiment, not after one. The typical setting is a geo-experiment or marketing test: you have a panel of markets, only some of which can be treated (a budget, an operational constraint, or a policy makes the rest off-limits), and you must decide which of the eligible markets to treat. Treat too few or the wrong ones and the remaining markets cannot reconstruct the treated trajectory, so the effect is biased; treat markets whose gap is too noisy and the experiment cannot detect the lift you care about. LEXSCM chooses the treated combination so that the untreated markets form a credible synthetic control (validity) and the resulting design has the power to detect a meaningfully small effect (power), in that strict priority order.

A concrete example: a retailer plans a price promotion and can run it in at most three of its forty largest metros. LEXSCM searches the eligible metros, returns the three whose combined sales path the rest of the chain can reproduce most closely on the pre-period, and reports the minimum lift that design could detect over a planned eight-week window – together with a budget gate, geography (spillover) exclusions, and coverage quotas so the chosen markets are non-adjacent and spread across regions. Once the promotion runs, the same object realizes into a standard effect report.

Notation#

There are $N$ units $\mathcal{N} \coloneqq \{1, \dots, N\}$, observed over the pre-treatment window $\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}$ of length $T_0$, with $\mathcal{T} \coloneqq \{1, \dots, T\}$ 1-indexed and the intervention taking effect after $T_0$ (the post-period $\mathcal{T}_2 \coloneqq \{t \in \mathcal{T} : t > T_0\}$; the design phase uses only $\mathcal{T}_1$). Unit $j$’s outcome at time $t$ is $y_{jt}$. A subset $\mathcal{E} \subseteq \mathcal{N}$ of size $M = |\mathcal{E}|$ is flagged as treatment-eligible (the candidate_col); the design must pick $m$ of these as the treated set $\mathcal{S} \subseteq \mathcal{E}$, $|\mathcal{S}| = m$. Treatment weights on the chosen treated set are $\mathbf{w}$ and control weights on the remainder are $\mathbf{v}$, both on a simplex; an optimiser carries the star superscript $\mathbf{w}^\ast$. The per-period gap between the synthetic treated and synthetic control is $e_t$; a sustained treatment effect is $\tau$ (the quantity the power analysis sizes), and $\alpha$ is the significance level. The interference graph is $\mathbf{A}$ with conflict-neighbours $\mathcal{A}(\mathcal{S})$ (below).

Mathematical Formulation#

Setup and notation#

We observe $N$ units $\mathcal{N} \coloneqq \{1, \dots, N\}$ over a pre-treatment period $\mathcal{T}_1$ of length $T_0$. Stack the pre-period outcomes as $\mathbf{Y} \in \mathbb{R}^{T_0 \times N}$ (rows = time, columns = units) and, optionally, $K$ time-invariant or pre-period covariates as $\mathbf{Z} \in \mathbb{R}^{K \times N}$. The two are stacked vertically into the predictor matrix

\[\begin{split}\mathbf{X} = \begin{bmatrix} \mathbf{Y} \\ \mathbf{Z} \end{bmatrix} \in \mathbb{R}^{(T_0 + K) \times N},\end{split}\]

so each column $\mathbf{x}_j$ is unit $j$’s predictor profile. A subset $\mathcal{E} \subseteq \mathcal{N}$ of size $M = |\mathcal{E}|$ is flagged as treatment-eligible (the candidate_col); the design must pick $m$ of these.

A population weighting vector $\mathbf{f} \in \mathbb{R}^N$, $\mathbf{f} \ge 0$, $\mathbf{1}^\top \mathbf{f} = 1$ (uniform $1/N$ by default, or a weight_col such as population or revenue) defines the estimand’s target trajectory – the $\mathbf{f}$-weighted average unit

\[\bar{\mathbf{x}} \coloneqq \mathbf{X} \mathbf{f}, \qquad \bar{x}_t = \mathbf{X}_{t, \cdot}\, \mathbf{f} .\]

The pre-period predictor rows are split into an estimation window $E$ (the first $\lfloor \texttt{frac\_E} \cdot T_0 \rfloor$ time rows, plus all covariate rows) and a held-out blank window $B$ (the remaining pre-period time rows). $E$ is where the design is fit; $B$ is a pre-treatment “dress rehearsal” with no treatment, used to validate fit and to calibrate power.

Over $E$ the predictors are row-standardized against the population target:

\[\widetilde{X}_{t, j} = \frac{X_{t, j} - \bar{x}_t}{\sigma_t}, \qquad \sigma_t = \operatorname{sd}_j\!\bigl(X_{t, \cdot}\bigr),\]

with $\sigma_t$ floored away from zero. Centring on $\bar{x}_t$ puts the population target at the origin; scaling by the cross-sectional spread $\sigma_t$ makes mixed-scale predictors (outcomes in dollars, covariates in percent) commensurable so no single predictor dominates the fit. The $N \times N$ Gram matrix

\[\mathbf{G} \coloneqq \widetilde{\mathbf{X}}_E^\top \widetilde{\mathbf{X}}_E \succeq 0\]

summarizes all pairwise predictor inner products over $E$.

Identifying assumptions#

LEXSCM inherits the design-based identification of Abadie & Zhao [ABADIE2024]. The untreated potential outcomes are assumed to follow an (approximate) linear factor model

\[y_{jt}^N = \delta_t + \theta_t^\top \mu_j + \varepsilon_{j, t},\]

with common time effects $\delta_t$, latent time factors $\theta_t$, unit loadings $\mu_j$, and mean-zero transitory shocks $\varepsilon_{j, t}$.

Approximate balance. There exist treatment weights $\mathbf{w}$ on the chosen treated set $\mathcal{S}$ and control weights $\mathbf{v}$ on the remainder such that the synthetic treated and synthetic control reproduce the population target on the pre-period predictors, $\bar{\mathbf{x}} - \sum_{j \in \mathcal{S}} w_j \mathbf{x}_j \approx 0$.

Remark. This is the design analogue of the SCM convex-hull condition, and it is the only substantive requirement. Crucially it is not imposed as an axiom: the achieved imbalance $\lVert \bar{\mathbf{x}} - \sum_j w_j \mathbf{x}_j\rVert$ is a measurable goodness-of-fit quantity, reported for every design, on which the validity of the bias bound and the inference is conditional (Abadie & Zhao, p.13). The analyst checks the $\approx 0$ condition rather than assuming it.
Factor structure controls bias. Under the linear factor model, matching the treated and synthetic-control trajectories on a long enough pre-period drives the latent loadings $\mu_j$ into alignment, so the design-based bias of the average treatment effect on the treated is bounded by the achieved imbalance and shrinks as $T_0$ grows and the fit tightens.

Remark. This is why Stage 1 minimizes imbalance rather than any in-sample treatment contrast: imbalance is the quantity the bias bound is written in.
Placebo exchangeability / weak stationarity. On the blank window $B$ no treatment has occurred, so the treated-minus-control gap is pure noise; that gap process is weakly stationary and free of anticipation, so its serial-dependence structure carries over to the post-treatment window.

Remark. This is what licenses the Stage 3 power analysis: the post-treatment null distribution of the test statistic is reconstructed by moving-block resampling the blank-window gaps, which preserves autocorrelation – the time-series-robust inference of Abadie & Zhao rather than an i.i.d. permutation.

Stage 1 – Treated-tuple selection#

For a treated set $\mathcal{S}$ with $|\mathcal{S}| = m$ and simplex weights $\mathbf{w} \in \Delta(\mathcal{S}) \coloneqq \{\mathbf{w} \ge 0 : \mathbf{1}^\top \mathbf{w} = 1\}$, the synthetic treated unit is $\sum_{j \in \mathcal{S}} w_j \mathbf{x}_j$. Its imbalance against the population target is, over the estimation window,

\[L(\mathcal{S}) = \min_{\mathbf{w} \in \Delta(\mathcal{S})} \Bigl\lVert \sum_{j \in \mathcal{S}} w_j \widetilde{\mathbf{x}}_{E, j} \Bigr\rVert_2^2 = \min_{\mathbf{w} \in \Delta(\mathcal{S})} \mathbf{w}^\top \mathbf{G}_{\mathcal{S}\mathcal{S}}\, \mathbf{w} ,\]

where $\mathbf{G}_{\mathcal{S}\mathcal{S}}$ is the $m \times m$ sub-Gram matrix on the rows and columns of $\mathcal{S}$. Because the design is $\mathbf{f}$-centred the population target sits at the origin, so $L(\mathcal{S})$ is the squared distance from the population centroid to the convex hull of the selected donors, and $\sqrt{L(\mathcal{S})}$ is the achieved imbalance of Assumption 1. Stage 1 returns the top_K sets of smallest $L(\mathcal{S})$, subject to the budget below.

Weakly targeted designs (`targeting_penalty`)#

By default Stage 1 is fully targeted: the weights $\mathbf{w}$ are free to contort onto the population mean (Abadie & Zhao’s representative-experiment goal, so the estimand is the population ATE $\tau_t = \sum_j f_j(Y^I_{jt} - Y^N_{jt})$). In practice you often treat the chosen markets as a group – equal- or population-weighted – not with bespoke fractional weights, and you may prefer the treated group to look like itself rather than the population. targeting_penalty $= \gamma \ge 0$ adds an anchor toward the group’s own equal-weight aggregate,

\[\min_{\mathbf{w} \in \Delta(\mathcal{S})}\ \mathbf{w}^\top \mathbf{G}_{\mathcal{S}\mathcal{S}}\, \mathbf{w} \;+\; \gamma\,\bigl\lVert \mathbf{w} - \tfrac{1}{m}\mathbf{1} \bigr\rVert_2^2 .\]

On the simplex $\mathbf{1}^\top\mathbf{w}=1$, so $\lVert \mathbf{w} - \tfrac1m\mathbf{1}\rVert_2^2 = \mathbf{w}^\top\mathbf{w} - \tfrac1m$ and the penalty is exactly a diagonal ridge, $\min_{\mathbf{w}\in\Delta} \mathbf{w}^\top(\mathbf{G}_{\mathcal{S}\mathcal{S}} + \gamma\mathbf{I})\mathbf{w}$. The reported imbalance stays the true targeting distance $\sqrt{\mathbf{w}^\top\mathbf{G}_{\mathcal{S}\mathcal{S}}\mathbf{w}}$ at the penalized weights. $\gamma = 0$ (default) is the fully targeted design; $\gamma \to \infty$ selects the best equal-weight $m$-tuple; in between is weakly targeted, sliding the estimand from the population ATE toward the treated group’s own ATT. This is also the mechanism that discourages idiosyncratic treated sets: free weights can hit the population mean even for an odd set, whereas the anchor favours sets that sit near the population naturally (with near-equal weights), which the donor pool can also reconstruct.

How a single tuple is built: the inner simplex QP#

“Building” a tuple $\mathcal{S}$ means solving its inner problem $\min_{\mathbf{w} \in \Delta(\mathcal{S})} \mathbf{w}^\top \mathbf{G}_{\mathcal{S}\mathcal{S}}\, \mathbf{w}$ for the optimal treatment weights $\mathbf{w}(\mathcal{S})$; the synthetic treated unit is then $\sum_{j \in \mathcal{S}} w_j(\mathcal{S})\, \mathbf{x}_j$ and the design’s achieved imbalance is $\sqrt{\mathbf{w}(\mathcal{S})^\top \mathbf{G}_{\mathcal{S}\mathcal{S}}\, \mathbf{w}(\mathcal{S})}$. This convex quadratic program over the probability simplex is solved by an Away-step Frank-Wolfe (AFW) routine in pure NumPy (_afw_single), chosen because every iterate stays exactly on the simplex (no projection step) and the away move removes the zig-zagging that plain Frank-Wolfe suffers near a face-constrained optimum – so the support of $\mathbf{w}$ (which donors carry positive weight) sharpens in a handful of iterations.

Write $\mathbf{Q} = \mathbf{G}_{\mathcal{S}\mathcal{S}}$, $f(\mathbf{w}) = \mathbf{w}^\top \mathbf{Q} \mathbf{w}$, $\nabla f(\mathbf{w}) = 2 \mathbf{Q} \mathbf{w}$. From a vertex start $\mathbf{w}^{(0)} = \mathbf{e}_{\operatorname*{argmin}_i Q_{ii}}$ (the single donor closest to the target), each iteration (all vectors below are the iterate $\mathbf{w}$ and its simplex vertices $\mathbf{e}_i$):

Pick two vertices. The Frank-Wolfe vertex $s = \operatorname*{argmin}_i [\nabla f(\mathbf{w})]_i$ and the away vertex $a = \operatorname*{argmax}_{i \in \operatorname{supp}(\mathbf{w})} [\nabla f(\mathbf{w})]_i$ – the currently active donor the gradient most wants to shed.
Choose the direction. Compare the FW direction $\mathbf{d}_{\mathrm{FW}} = \mathbf{e}_s - \mathbf{w}$ against the away direction $\mathbf{d}_{\mathrm{AW}} = \mathbf{w} - \mathbf{e}_a$ by $\langle \nabla f, \mathbf{d}\rangle$ and take whichever descends more. The away step’s maximal feasible length is $\gamma_{\max} = w_a / (1 - w_a)$; the FW step caps at $1$.
Exact line search. Because $f$ is quadratic the optimal step along $\mathbf{d}$ is closed-form,

\[\gamma^\ast = \operatorname{clip}\!\Bigl( -\frac{\mathbf{w}^\top \mathbf{Q} \mathbf{d}}{\mathbf{d}^\top \mathbf{Q} \mathbf{d}},\; 0,\; \gamma_{\max}\Bigr),\]

then $\mathbf{w} \leftarrow \mathbf{w} + \gamma^\ast \mathbf{d}$, dropping $a$ from the active set when a full away step empties it.
Certified lower bound. The Frank-Wolfe gap gives a running lower bound $f(\mathbf{w}) + \min_i[\nabla f(\mathbf{w})]_i - \nabla f(\mathbf{w})^\top \mathbf{w} \le f(\mathbf{w}^\ast)$ on the tuple’s true minimal loss; iteration stops once the duality gap $\nabla f(\mathbf{w})^\top \mathbf{w} - \min_i[\nabla f(\mathbf{w})]_i$ drops below tol.

Two-pass precision. Scoring every candidate tuple to convergence would be wasteful, so Stage 1 runs AFW at two fidelities. A vectorized batched AFW (_afw_batched, iters=80, all $m \times m$ problems advanced together with einsum) ranks thousands of tuples in one sweep; only the surviving top_K are re-solved by the scalar _afw_single at iters=600, tol=1e-14 to pin $\mathbf{w}(\mathcal{S})$ and the loss to full precision. Each returned TreatedDesign then carries its indices, the high-precision weights $\mathbf{w}(\mathcal{S})$, loss $= L(\mathcal{S})$, imbalance $= \sqrt{L(\mathcal{S})}$, total_cost, and a label-keyed weight_dict.

Why a search and not a full enumeration or an exact MIP#

The obvious approach – enumerate every treated combination and keep the best – is defeated by combinatorics. With $M$ eligible markets and $m$ to treat there are $\binom{M}{m}$ combinations. For a modest design with $M = 120$ eligible markets choosing $m = 4$ that is already

\[\binom{120}{4} = 8{,}214{,}570\]

candidate tuples, and the count explodes super-polynomially in $m$: $\binom{120}{6} \approx 3.8 \times 10^{9}$ and $\binom{200}{6} \approx 8.2 \times 10^{10}$. Scoring a simplex QP for each is hopeless past small cases.

The natural fallback – a branch-and-bound integer program with a convex relaxation lower bound to prune – also fails here, for a structural reason specific to this objective. The population target is the $f$-weighted centroid of the candidate predictors, so it lies inside the convex hull of the full candidate set. Any convex (cardinality-free) relaxation of “distance from the origin to the hull of a chosen subset” is therefore $\approx 0$ over the entire upper half of the branch-and-bound tree: the relaxation cannot certify that a partial selection is bad, so it cannot prune. An exact MIP would degenerate into near-exhaustive enumeration with extra overhead.

Both failures are also unnecessary. Abadie & Zhao [ABADIE2024] require only that the chosen design be feasible and approximately balanced; validity is conditional on the achieved imbalance, a reported quantity, not on a certificate of global optimality. LEXSCM therefore:

Enumerates exactly when $\binom{M}{m} \le \texttt{enumerate\_max}$ (default $3{,}000{,}000$). This is the gold standard – it returns the certified global top_K and reports termination status OPTIMAL.
Runs a strengthened multi-start local search otherwise: greedy construction from diverse seeds, best-improvement swap descent to a local optimum, and basin-hopping kicks to escape it. In Monte Carlo this lands on the exact optimum 83-100% of the time and within roughly $1\%$ (mean) / $7\%$ (worst case) of the minimal imbalance – immaterial under the approximate-balance criterion. Termination status is FEASIBLE.

Because the local search has no MIP optimality gap, LEXSCM reports a consensus diagnostic in its place: the fraction of independent starts that converged to the incumbent (consensus_rate), the number of distinct local optima seen, and the incumbent-improvement trail. High consensus across many random starts is the practical confidence signal that the incumbent is the global optimum. (A convex hull lower bound is also reported, but only as advisory information – for the reason above it is $\approx 0$ and is deliberately not turned into an optimality gap.)

Building tuples in the heuristic regime#

When $\binom{M}{m}$ exceeds enumerate_max the same per-tuple QP is reused, but the set of tuples scored is grown adaptively by a multi-start local search (_local_search):

Seeding. $2 \times$ n_starts seeds: the n_starts units with the smallest $G_{jj}$ (single donors already nearest the population centroid – the cheapest places to start near balance) plus n_starts uniformly random units for diversity.
Greedy construction. From a seed, repeatedly add the candidate that most lowers the batched loss of the partial tuple until $|\mathcal{S}| = m$, skipping any addition that would breach the budget.
Best-improvement descent. Score the full 1-swap neighbourhood (replace one member with one non-member), move to the steepest improving swap, and repeat to a local optimum.
Basin-hopping kicks. Perturb the incumbent with a random 2-swap kick and re-descend, keeping the better basin; repeated n_kicks (4) times per start.

Every distinct tuple ever scored is cached, and the global top_K of that cache is returned and re-solved to full precision exactly as in the exact path. The consensus block – how many independent starts’ final optima coincide with the incumbent – is the confidence diagnostic that stands in for the absent MIP gap.

Budget constraints#

When a per-unit treatment cost $c_j$ is supplied (unit_cost_col, constant within unit) and a total budget $B_{\max}$, every returned design must satisfy the hard knapsack constraint

\[\sum_{j \in \mathcal{S}} c_j \le B_{\max}.\]

Two mechanisms enforce it. First, a sound feasibility presolve removes any eligible unit $i$ that cannot belong to any budget-feasible $m$-tuple, i.e. whenever

\[c_i + \sum_{\text{(}m-1\text{) cheapest other candidates}} c_j \;>\; B_{\max}.\]

This only ever drops provably impossible units; it never touches the objective, so it cannot change which feasible design is optimal. Second, the search itself respects the budget at every step: exact enumeration filters combinations by their cost sum, and the local search rejects any greedy addition, swap, or kick that would breach $B_{\max}$. If no $m$-tuple fits the budget the stage reports INFEASIBLE.

Stage 1 returns a solver-style diagnostics block (stats) with the problem definition, presolve removals, search method and number of subsets evaluated, the incumbent and worst-in-pool imbalance, the advisory relaxation bound, the solution pool of top_K tuples, and termination status / runtime.

Stage 2 – Control fit#

For each candidate treated set $\mathcal{S}$ with weights $\mathbf{w}$, the synthetic treated trajectory is $\mathbf{X}_{\cdot, \mathcal{S}}\, \mathbf{w}$. A synthetic control is then built from the non-treated units: control weights $\mathbf{v}$ solve a ridge-penalized quadratic program (penalty lambda_penalty) matching the synthetic treated over $E$, subject to an exclusion constraint $v_j = 0$ for all $j \in \mathcal{S}$ (a treated unit cannot also be its own control; the spillover adjacency widens this to $\mathcal{S} \cup \mathcal{A}(\mathcal{S})$). The per-period gap is

\[e_t = (\mathbf{X}_{\cdot, \mathcal{S}}\, \mathbf{w})_t - (\mathbf{X} \mathbf{v})_t .\]

Pre-treatment fit is summarized by the normalized mean-squared error (NMSE) of the synthetic treated against the target on both windows (nmse_E, nmse_B). The blank-window gaps $e_B = \{e_t : t \in B\}$ (residuals_B) are, under Assumption 3, pure placebo noise; they are the raw material for the power analysis. This stage is identical to the pre-rebuild control path – only the search and power stages around it changed.

Spillover / interference exclusions#

When treating a set of units, interference is a design concern, and Vives-i-Bastida (2022) handles it with two exclusion criteria that LEXSCM implements directly. Both read off one conflict graph, encoded by a symmetric matrix $\mathbf{A} \in \{0, 1\}^{N \times N}$ with $A_{ij} = 1$ iff units $i$ and $j$ interfere (zero diagonal). $\mathbf{A}$ is supplied either as a cluster_col ($A_{ij} = 1$ iff $i, j$ share a cluster, e.g. a state or province) or as an adjacency / spillover matrix ($A_{ij} = 1$ iff the entry exceeds spillover_threshold); the two combine entrywise by logical OR. The matrix is aligned to the IndexSet, the single source of truth for unit identity. Write the conflict-neighbours of a treated set $\mathcal{S}$ as

\[\mathcal{A}(\mathcal{S}) \coloneqq \{\, k : A_{jk} = 1 \ \text{for some}\ j \in \mathcal{S} \,\}.\]

“No interference” (Stage 1, a treatment criterion). The treated set $\mathcal{S} = \operatorname{supp}(\mathbf{w})$ must be conflict-free – an independent set of $\mathbf{A}$, $A_{ij} = 0$ for all $i, j \in \mathcal{S}$. This restricts only the admissible supports, so it enters exactly where the cardinality constraint $\lVert \mathbf{w} \rVert_0 = m$ already lives – a filter on the candidate tuples, not a term in the inner weight program – and only shrinks the search.
“Exclusion restriction” (Stage 2, a control criterion). The donor pool drops $\mathcal{S} \cup \mathcal{A}(\mathcal{S})$: the Stage-2 program pins $v_k = 0$ for every $k \in \mathcal{S} \cup \mathcal{A}(\mathcal{S})$, so a treated unit’s spillover neighbours cannot enter its synthetic control and contaminate the counterfactual.

If no conflict-free $m$-tuple exists (e.g. $m$ exceeds the number of clusters), or the exclusions empty every donor pool, the fit raises MlsynthConfigError rather than returning a degenerate design. With no cluster_col / adjacency supplied, $\mathbf{A} = \mathbf{0}$ and the behaviour is exactly the unconstrained search.

# at most one treated unit per state, and no treated unit's
# same-state neighbours used as its donors
LEXSCM({"df": df, "outcome": "y", "unitid": "unit", "time": "year",
        "candidate_col": "eligible", "m": 2, "cluster_col": "state"}).fit()

Coverage quotas and size bands#

Two further treatment criteria from the same design checklist restrict which units may be treated – both, like the spillover “no interference” rule, act on the admissible supports and never enter the inner weight program.

Coverage / stratification. Give each unit a stratum (region / tier / segment) via stratum_col. min_per_stratum requires at least that many treated units from every stratum that has a candidate (“test in every region”); max_per_stratum allows at most that many (“a quota”). Formally the treated set $\mathcal{S}$ must satisfy $\texttt{min} \le |\{j \in \mathcal{S} : g(j) = s\}| \le \texttt{max}$ for each candidate stratum $s$, where $g(j)$ is unit $j$’s stratum. (Setting max_per_stratum = 1 mirrors a cluster_col quota on the treated side, but unlike the spillover rule it does not also clean the donor pool.)
Treated-unit size bands. With size_col and min_size / max_size, only units whose size lies in $[\texttt{min\_size}, \texttt{max\_size}]$ are eligible for treatment (they remain available as donors). The floor is a power / operational minimum; the ceiling encodes the convex-hull limit – a unit far larger than the rest cannot be reproduced by a convex combination of the others (Vives-i-Bastida excludes big cities for exactly this reason).

Either constraint raises MlsynthConfigError when it is infeasible (e.g. min_per_stratum over more strata than $m$, or fewer than $m$ candidates inside the size band).

# cover every region, at most two per region, and only mid-sized markets
LEXSCM({"df": df, "outcome": "y", "unitid": "unit", "time": "year",
        "candidate_col": "eligible", "m": 4,
        "stratum_col": "region", "min_per_stratum": 1, "max_per_stratum": 2,
        "size_col": "population", "min_size": 50_000, "max_size": 2_000_000}).fit()

Forced-in and forbidden markets#

Two hard market lists complete the treatment-criteria vocabulary, matching SYNDES and GeoLift. to_be_treated pins specified markets into the treated set: every candidate $m$-tuple the search considers must contain them, which is useful when a market is decided in advance (a client insists on testing it) and the design only has to choose the rest around it. not_to_be_treated does the opposite – those markets are removed from the treatment pool but stay available as donors – for a market already running another campaign, with poor data, or earmarked as a clean control. A forced-in market must be a treatment candidate (candidate_col true and within any size band), at most $m$ may be forced, and a market cannot appear in both lists; each violation raises MlsynthConfigError. Forcing in a market that is hard to synthesize costs imbalance, so check the achieved fit.

# always test DMA 501; never test DMA 635 (it stays an eligible donor)
LEXSCM({"df": df, "outcome": "y", "unitid": "unit", "time": "year",
        "candidate_col": "eligible", "m": 4,
        "to_be_treated": [501], "not_to_be_treated": [635]}).fit()

Stage 3 – Power analysis (minimum detectable effect)#

Stage 3 asks: how large must a sustained treatment effect be for this design to detect it? The answer is a minimum-detectable-effect (MDE) curve over post-treatment horizon lengths, computed entirely from the Stage 2 placebo residuals with one consistent resampling model for both the null and the alternative.

What “at each horizon” means – and what it does not#

The curve is indexed by the length $h$ of the post-treatment window (n_post_grid, default $h = 2, \dots, 8$), not by calendar period. There is no “MDE at period $t = 91$”. Instead, for each candidate duration $h$, the analysis manufactures many synthetic $h$-long gap windows by resampling the placebo residuals and asks what constant effect a window of that length could detect. Reading the curve as $h$ grows tells you how detectability improves the longer the experiment runs.

Moving-block resampling#

Let the placebo pool be one or more residual series – the chosen design’s blank-window gaps $e_B$, optionally pooled with donor-unit placebo gaps. The scale $\sigma$ is the standard deviation of the pooled residuals (all series concatenated, ddof=1, floored at $10^{-12}$). The block length defaults to

\[\ell = \max\!\bigl(1,\ \min(h,\ \operatorname{round}(\tilde L^{1/3}))\bigr),\]

where $\tilde L$ is the median series length: floored at 1, and capped at the horizon $h$ so a block never exceeds the window it fills. To draw one length-$h$ window (block_resample_windows()): pick a placebo series uniformly at random, then repeatedly cut contiguous blocks of length $\ell$ from a random offset with wraparound (np.take(..., mode="wrap")), concatenate them, and truncate to $h$. Wrapping preserves the within-series autocorrelation (Assumption 3); the random series choice reproduces the cross-unit placebo distribution.

Test statistic and critical value#

The statistic is the mean absolute gap over the window,

\[S(e) = \frac{1}{h} \sum_{t=1}^{h} \lvert e_t \rvert .\]

Resampling $n_\text{null}$ (default 4000) placebo windows gives the null distribution; its $(1 - \alpha)$ empirical quantile is the critical value $c_\alpha$.

Power, the effect grid, and the MDE#

A treatment effect is modelled as a constant additive shift $\tau$ applied to every point of a resampled window – the homogeneous-effect working assumption behind the MDE. Its power is estimated from a fresh draw of $n_\text{power}$ (default 2000) windows (same residual model, new randomness), so the null and the alternative differ only by the shift:

\[\operatorname{power}(\tau) = \Pr\bigl[\, S(e + \tau) \ge c_\alpha \,\bigr] \approx \frac{1}{n_\text{power}} \sum_{b} \mathbf{1}\!\bigl\{ S(e^{(b)} + \tau) \ge c_\alpha \bigr\}.\]

The effect is swept on a grid of $n_\text{grid} = 64$ points in standard-deviation units, $\tau / \sigma \in [0, \texttt{max\_sd}]$ (default cap $8$). The search walks the grid until $\operatorname{power}(\tau)$ first reaches power_target, then linearly interpolates between the last sub-threshold point $(g_0, p_0)$ and the crossing point $(g, p)$ for a finer value:

\[\frac{\mathrm{MDE}(h)}{\sigma} = g_0 + (\texttt{power\_target} - p_0)\,\frac{g - g_0}{p - p_0} .\]

If the grid is exhausted without reaching the target, the horizon is infeasible and returns $\infty$.

Three reported scales#

Each horizon reports the MDE on three scales:

mde_sd $= \mathrm{MDE}(h)/\sigma$ – effect-size units, the primary scale (matching Vives-i-Bastida’s “detect effects larger than $0.1$ s.d.” convention) and numerically robust: nothing is divided by a fragile mean, so zero-mean or near-zero outcomes cannot blow the calculation up;
mde_abs $= \mathrm{MDE}(h)$ – outcome units;
mde_pct – the manager-facing percentage, $100 \cdot \mathrm{MDE}(h) / \lvert\text{baseline}\rvert$, where the baseline is the counterfactual level over the matching window, $\text{baseline} = \operatorname{mean}\bigl(\texttt{synthetic\_treated} [-h:]\bigr)$. This is guarded: when $\lvert\text{baseline}\rvert$ falls below a floor (default one $\sigma$) the percentage is returned as NaN deliberately, so a near-zero or sign-flipping level cannot manufacture a spurious “we can detect a 0.3% effect.”

Detectability curve and horizon collapse#

detectability_curve() sweeps the horizon grid and returns curve_sd ($h \mapsto \mathrm{MDE}(h)/\sigma$), curve_pct (the percentage curve), the per-horizon details, and min_horizon_mde_le_0p1sd – the shortest horizon whose MDE falls to $0.1\sigma$, a quick read on how long the experiment must run to clear the conventional effect-size bar. Stage 4 then collapses the curve to a single representative MDE per the mde_horizon setting:

"late" (default) – the MDE at the longest horizon; the conservative choice for a sustained-exposure experiment.
"early_min" – the smallest MDE across feasible horizons (most optimistic detectability).
"early_mean" – the mean MDE across feasible horizons (the percentage averaged only over horizons where it is defined).

Stage 4 – Final recommendation (lexicographic selection)#

The final stage turns the per-design metrics – imbalance (validity), mde_sd (power), stability $=$ out-of-sample $\text{NMSE}_B$ (fit robustness), and total_cost – into one recommendation, applying the method’s priorities in strict order:

Validity gate. Keep only designs whose imbalance is within a relative slack of the best achievable balance,

\[\text{imbalance}(S) \le \bigl(1 + \texttt{imbalance\_tol}\bigr)\cdot \min_{S'} \text{imbalance}(S') ,\]

(default imbalance_tol $= 0.25$). This is the set of designs that satisfy Assumption 1 well enough to be trustworthy. (A degenerate tolerance that would empty the gate falls back to the single best-balanced design.)
Power. Among gated designs with a feasible MDE, choose the smallest mde_sd – the most detectable valid design.
Tie-breaks. Break ties by better out-of-sample stability ($\text{NMSE}_B$), then by lower total_cost.

The recommendation is returned as a tuple of fields: winner (the chosen design), shortlist (the top max_shortlist ranked designs), pareto_ids (the Pareto frontier on imbalance $\downarrow$ vs. mde_sd $\downarrow$, always exposed for transparency), a status, a human-readable explanation, and a per-design table (which becomes results.search.shortlist). The status is one of:

OK – a valid, adequately powered design was found;
POWER_NOT_ESTABLISHED – no gated design reached the power target within the effect grid, so the best-balanced design is recommended and the power caveat is flagged (the pipeline degrades gracefully rather than crashing);
EMPTY – no candidate designs were supplied.

Worked numeric walkthrough#

To make the two core mechanisms concrete, here is a single tuple built and powered end to end on a deliberately tiny panel ($T_0 = 6$ pre-period rows, $N = 5$ units, treat $m = 2$). Every number below is the actual helper output, not an illustration.

Building the tuple (Stage 1)#

After $f$-centring and row-standardising the predictors, the Gram matrix’s diagonal – each unit’s squared distance to the population centroid – is

\[\operatorname{diag}(\mathbf{G}) = (5.77,\ 3.64,\ 6.08,\ 6.27,\ 8.24).\]

Unit 1 (0-indexed) sits closest to the target, so it seeds the AFW vertex start. With $\binom{5}{2} = 10$ the search enumerates exactly (status OPTIMAL, 10 subsets scored). The winning tuple is $\mathcal{S} = \{0, 1\}$, and its inner simplex QP returns treatment weights

\[\mathbf{w}(\mathcal{S}) = (0.4098,\ 0.5902), \qquad L(\mathcal{S}) = \mathbf{w}^\top \mathbf{G}_{\mathcal{S}\mathcal{S}}\, \mathbf{w} = 1.6481, \qquad \sqrt{L(\mathcal{S})} = 1.2838 .\]

The standalone high-precision re-solve reproduces this loss with a Frank-Wolfe lower bound equal to it to $10^{-6}$ – the QP is at its certified optimum. The synthetic treated unit is $0.41\,\mathbf{x}_0 + 0.59\,\mathbf{x}_1$, and $1.2838$ is the achieved imbalance on which the bias bound (Assumption 1-2) is conditional.

Powering the design (Stage 3)#

Take this design’s blank-window placebo gaps to be a length-$L = 24$ series with $\sigma = 0.8695$. The auto block length is $\ell = \min\bigl(h, \operatorname{round}(24^{1/3})\bigr) = \min(h, 3)$, so at $h = 2$ it is capped to $2$ and at $h \ge 3$ it is $3$. With a counterfactual level of $\text{baseline} = 12.0$, the per-horizon MDE is:

$h$	$\ell$	$c_\alpha$	`mde_sd`	`mde_abs`	`mde_pct`
2	2	1.245	2.037	1.771	14.8%
4	3	1.108	1.672	1.454	12.1%
8	3	0.981	1.282	1.115	9.3%

Reading down the table is the detectability curve: as the horizon lengthens, averaging over more periods shrinks both the critical value $c_\alpha$ and the MDE, so a sustained effect of $\approx 9\%$ of the counterfactual becomes detectable by $h = 8$, versus $\approx 15\%$ at $h = 2$. Under mde_horizon="late" Stage 4 carries the $h = 8$ row (mde_sd $= 1.28$) as this design’s power score; under "early_min" it would carry the smallest feasible mde_sd.

Standardized Post-Fit and Power Analysis#

LEXSCM’s Stage 3 already produces the lexicographic MDE curve used to rank designs against each other; the standardized post-fit attached to LEXSCMResults.post_fit is the complementary surface used once a design has been chosen – it is the same SyntheticControlPostFit object SYNDES, MAREX, and PANGEO expose, so downstream consumers (dashboards, paper-style reports, comparison tables) read identical fields across the family.

pf = res.report.additional_outputs["post_fit"]   # SyntheticControlPostFit
pf.ate, pf.ate_percent, pf.total_effect    # treatment-effect scalars
pf.rmse_fit, pf.rmse_blank, pf.rmse_post   # pre / blank / post fit quality
pf.p_value, pf.ci_lower, pf.ci_upper       # conformal inference
pf.power                                   # PowerAnalysis (see below)

The trajectories pf.treated_series and pf.control_series are the winning candidate’s predictions.synthetic_treated and predictions.synthetic_control – per-unit weighted aggregates of the treated and control donors over the full timeline. The phase boundaries (n_fit, n_blank, n_post) line up with the same E / B / post split LEXSCM uses internally (so n_fit + n_blank = T0 and the 30%-of-pre-tail holdout convention applies via frac_E = 0.7).

Where the unified power analysis fits in#

Stage 3’s detectability_curve() is the design-selection workhorse – it converts the moving-block placebo null on the B window into a per-horizon MDE used by Stage 4’s lexicographic selector. The post-fit power analysis is the post-selection companion: analytical Gaussian + AR(1) variance inflation on the realised gap residuals, matching the MAREX / SYNDES / PANGEO power surfaces exactly so the same diagnostic table can be produced for every family member.

\[\mathrm{MDE}(T) = \bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr) \cdot \widehat{\sigma}_{\text{placebo}} \cdot \sqrt{\mathrm{VIF}(T, \widehat{\rho})},\]

with $\widehat{\sigma}_{\text{placebo}}$ the SD of the gap on the B (blank / holdout) window, $\widehat{\rho}$ the lag-1 autocorrelation of those residuals clipped to $(-0.99, 0.99)$, and $\mathrm{VIF}(T, \rho) = \tfrac{1}{T}\bigl(1 + 2\sum_{k=1}^{T-1} (1-k/T)\rho^k\bigr)$ the standard AR(1) variance-inflation factor (textbook $1/T$ when $\rho = 0$). See Synthetic Controls for Experimental Design (MAREX) for the full derivation; the same module powers all three estimators.

p = res.power                               # PowerAnalysis (single source)
p.headline.mde_absolute                     # MDE at the realised T_post
p.headline.mde_pct                          # ... as % of post-period baseline
p.headline.power_at_observed                # power to detect res.report.att
p.curve                                     # tuple of MDEPoint per horizon
p.sigma_placebo                             # σ̂ used (B window in LEXSCM)
p.serial_correlation                        # ρ̂ AR(1) of the B residuals

Two MDEs, complementary roles#

Stage 3 MDE (best_candidate.mde_results) – moving-block placebo null on the B window, used to rank designs against each other. Aggregated to a representative scalar by mde_horizon (late / early_min / early_mean) and consumed by Stage 4’s lexicographic gate.
Post-fit MDE (res.power) – analytical Gaussian + AR(1) MDE consumed after a design has been chosen, on the same surface that MAREX / SYNDES / PANGEO produce. Use this when reporting a single detectability number alongside the realised ATE / CI.

Power-analysis failures (e.g. degenerate B-window residuals) never break a fit; res.power is simply left as None. To compute on a non-default horizon grid or significance level call compute_power_analysis() directly.

Verification#

LEXSCM is validated against the synthetic-experimental-design framework it implements (Abadie & Zhao 2026; Vives-i-Bastida 2022). Path A: the Walmart 45-store placebo design tracks pre-period to ~2.7% of mean sales and yields a placebo effect of ~0.9% whose permutation test fails to reject (CI covers zero) – the paper’s “no spurious effect” result. Path B: on the paper’s exact Section-5 linear-factor DGP (J = 15, T_E = 20) the design recovers the planted effect with MAE far below its scale (~0.24 for the single-treated-unit design, falling to ~0.16 at m = 2). Pinned in benchmarks/cases/lexscm_walmart.py and lexscm_design_mc.py; see the dedicated page LEXSCM — Synthetic Experimental Design (Abadie-Zhao 2026; Vives-i-Bastida 2022).

Core API#

class mlsynth.estimators.lexscm.LEXSCM(config)#

Bases: object

Lexicographic Synthetic Control (LEXSCM) estimator.

This estimator automatically designs synthetic control experiments by jointly optimizing:

Pre-treatment fit (validity)

Statistical power (detectability of effects)

The user interacts with the estimator through a single entry point: fit(), while all modeling choices are controlled via the configuration object.

Parameters:

config (dict or LEXSCMConfig) – Configuration object specifying data inputs, identification strategy, synthetic control settings, search budget, and inference parameters.

The following fields are supported:

Required

df (pd.DataFrame): Panel dataset containing unit-level observations over time.
outcome (str): Name of the outcome variable.
unitid (str): Column identifying observational units.
time (str): Time index column.
candidate_col (str): Boolean (0/1 or True/False) column indicating units eligible for treatment assignment.
m (int): Number of units selected per treated tuple.

Identification / Design

post_col (str, optional): Indicator for post-treatment period (0/1).
frac_E (float): Fraction of pre-treatment period used for estimation window E.
unit_cost_col (str, optional): Per-unit treatment cost column (must be constant within unit).
budget (float, optional): Total budget constraint on selected treated units.
seed (int): Random seed for reproducibility.

Synthetic Control Specification

weight_col (str, optional): Unit-level weights (e.g., population, revenue).
covariates (list of str, optional): Covariates included in synthetic control construction.
lambda_penalty (float): Regularization penalty for control mismatch in quadratic program.

Search / Optimization Budget

top_K (int): Number of top candidate tuples returned by the treated-design selection search.
top_P (int): Deprecated/unused (the multi-start search seeds itself); retained for backward-compatible configs.

Inference / Power Analysis

alpha (float): Significance level for statistical testing.
n_post_grid (list of int): Post-treatment horizons used for MDE detectability curves.
n_sims (int): Number of Monte Carlo simulations for null distribution.
post_imputation ({“mean”, “max”, “double_max”}): Method for imputing post-treatment outcomes.
test_statistic ({“mean_abs”, “mean”, “rms”}): Test statistic used for treatment effect evaluation.
delta (float): Absolute minimum detectable effect threshold.
relative_delta (float, optional): Relative MDE threshold (> 1.0).
target_mde_horizon (str): Target horizon for MDE evaluation (e.g., early_mde_avg).
max_shortlist (int): Maximum number of candidate designs retained after filtering.

System / Logging

verbose (bool): If True, enables progress logging.

Returns:

LEXSCM – Initialized estimator ready for .fit() execution.

Configuration#

class mlsynth.config_models.LEXSCMConfig(*, df: ~pandas.DataFrame, outcome: str, unitid: str, time: str, candidate_col: str, m: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)], post_col: str | None = None, weight_col: str | None = None, unit_cost_col: str | None = None, budget: ~typing.Annotated[float | None, ~annotated_types.Gt(gt=0.0)] = None, targeting_penalty: ~typing.Annotated[float, ~annotated_types.Ge(ge=0)] = 0.0, cluster_col: str | None = None, adjacency: ~typing.Any | None = None, spillover_threshold: ~typing.Annotated[float, ~annotated_types.Ge(ge=0)] = 0.0, stratum_col: str | None = None, min_per_stratum: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=1)] = None, max_per_stratum: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=1)] = None, size_col: str | None = None, min_size: float | None = None, max_size: float | None = None, to_be_treated: ~typing.List | None = None, not_to_be_treated: ~typing.List | None = None, seed: int = 42, frac_E: float = 0.7, covariates: ~typing.List[str] | None = None, lambda_penalty: float = 0.1, top_K: int = 20, top_P: int = 10, alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05, n_post_grid: ~typing.List[int] = <factory>, n_sims: int = 1000, post_imputation: ~typing.Literal['mean', 'max', 'double_max'] = 'mean', test_statistic: ~typing.Literal['mean_abs', 'mean', 'rms'] = 'mean_abs', mde_horizon: ~typing.Literal['early_mean', 'early_min', 'late'] = 'late', max_shortlist: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 5, power_target: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.8, imbalance_tol: ~typing.Annotated[float, ~annotated_types.Ge(ge=0)] = 0.25, display_graph: bool = False, verbose: bool = True)#

Configuration for LEXSCM - Fast Synthetic Experiment Design pipeline.

class Config#

arbitrary_types_allowed = True#

extra = 'forbid'#

adjacency: Any | None#

alpha: float#

budget: float | None#

candidate_col: str#

cluster_col: str | None#

covariates: List[str] | None#

display_graph: bool#

frac_E: float#

imbalance_tol: float#

lambda_penalty: float#

m: int#

max_per_stratum: int | None#

max_shortlist: int#

max_size: float | None#

mde_horizon: Literal['early_mean', 'early_min', 'late']#

min_per_stratum: int | None#

min_size: float | None#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_post_grid: List[int]#

n_sims: int#

not_to_be_treated: List | None#

post_col: str | None#

post_imputation: Literal['mean', 'max', 'double_max']#

power_target: float#

seed: int#

size_col: str | None#

spillover_threshold: float#

stratum_col: str | None#

targeting_penalty: float#

test_statistic: Literal['mean_abs', 'mean', 'rms']#

to_be_treated: List | None#

top_K: int#

top_P: int#

unit_cost_col: str | None#

verbose: bool#

weight_col: str | None#

Result Containers#

LEXSCM.fit() returns a LEXSCMResults, which is a DesignResult – the experimental-design half of mlsynth’s two-family result contract. LEXSCM chooses which units to treat before any intervention, so it returns a design that resolves to an effect report. The surface is small and grouped – one obvious home for each thing.

Front door (the standardized contract):

res.report – the realized effect as an EffectResult: the single source for the ATT / CI / pre-fit (flat att / att_ci / counterfactual / gap plus the standard sub-models). Its additional_outputs['post_fit'] carries the full SyntheticControlPostFit – the MAREX-family shared bundle with per-period effects and covariate-balance SMDs – alongside ate_per_period and cumulative_effect.
res.power – the design’s MDE / power analysis (the single source for power; None if power analysis failed).
res.selected_units / res.assignment / res.design_weights – the chosen design (the treated units, the treated/control split, and the implied WeightsResults).
res.metadata – the lexicographic recommendation diagnostics.

Grouped detail:

res.search (LEXSCMSearch) – how the design was chosen: shortlist (the ranked table), candidates (every evaluated design), winner (the chosen SEDCandidate with its weights / predictions / losses / raw MDE curve), and selection (the Stage-1 treated-design selection diagnostics).
res.panel (LEXSCMPanel) – the panel structure: time, units, outcome, population_mean.

class mlsynth.utils.fast_scm_helpers.structure.LEXSCMResults(*, report: BaseEstimatorResults | None = None, assignment: Any | None = None, selected_units: Any | None = None, design_weights: WeightsResults | None = None, power: Any | None = None, metadata: Dict[str, Any] | None = None, search: LEXSCMSearch, panel: LEXSCMPanel, **extra_data: Any)#

Bases: DesignResult

Top-level container returned by mlsynth.LEXSCM.fit().

A DesignResult (the experimental-design family) with a deliberately small, grouped surface – one obvious home for each thing:

Front door (the standardized contract).

report – the realized effect as an EffectResult (the single source for ATT / CI / p-value / pre-fit; its additional_outputs['post_fit'] holds the full SyntheticControlPostFit with the per-period effects and covariate-balance SMDs).
power – the design’s MDE / power analysis (the single source for power; None if power analysis failed).
selected_units / assignment / design_weights – the design.
metadata – the lexicographic recommendation diagnostics.

Grouped detail.

search – the candidate search (LEXSCMSearch): shortlist, candidates, winner, selection.
panel – the panel structure (LEXSCMPanel): time, units, outcome, population_mean.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function MlsynthResult.Config.<lambda>>}}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

panel: LEXSCMPanel#

search: LEXSCMSearch#

class mlsynth.utils.fast_scm_helpers.structure.LEXSCMSearch(shortlist: Any, candidates: List[SEDCandidate], winner: SEDCandidate, selection: Dict[str, Any])#

The design search – how the treated set was chosen.

shortlist#

Ranked table of candidate designs (the recommendation shortlist).

Type:: pd.DataFrame

candidates#

Every evaluated candidate design.

Type:: list of SEDCandidate

winner#

The selected design (winner of the lexicographic search), with its weights / predictions / losses / inference / raw MDE curve.

Type:: SEDCandidate

selection#

Treated-design selection diagnostics (the imbalance-minimizing enumerate-or-local-search) plus the recommendation record.

Type:: dict

candidates: List[SEDCandidate]#

selection: Dict[str, Any]#

shortlist: Any#

winner: SEDCandidate#

class mlsynth.utils.fast_scm_helpers.structure.LEXSCMPanel(time: TimeInfo, units: UnitInfo, outcome: str, population_mean: ndarray)#

The panel structure the design was built on.

time#

Fitting / blank / post period layout.

Type:: TimeInfo

units#

Treated / control unit rosters.

Type:: UnitInfo

outcome#

Outcome column name.

Type:: str

population_mean#

Population-mean outcome series over time (the design target).

Type:: np.ndarray

outcome: str#

population_mean: ndarray#

time: TimeInfo#

units: UnitInfo#

class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#

Bases: object

Standardized post-fit diagnostics for a single synthetic control design.

Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.

ate: float | None = None#

ate_per_period: ndarray | None = None#

ate_percent: float | None = None#

ci_lower: float | None = None#

ci_upper: float | None = None#

control_series: ndarray#

covariate_names: Tuple[str, ...] = ()#

covariate_smd: Dict[str, float] | None = None#

covariate_smd_abs_max: float | None = None#

covariate_smd_control_vs_pop: Dict[str, float] | None = None#

covariate_smd_control_vs_pop_abs_max: float | None = None#

covariate_smd_control_vs_pop_squared_sum: float | None = None#

covariate_smd_squared_sum: float | None = None#

covariate_smd_treated_vs_pop: Dict[str, float] | None = None#

covariate_smd_treated_vs_pop_abs_max: float | None = None#

covariate_smd_treated_vs_pop_squared_sum: float | None = None#

cumulative_effect: ndarray | None = None#

gap_series: ndarray#

inference_method: str | None = None#

n_blank: int#

n_fit: int#

n_post: int#

p_value: float | None = None#

power: PowerAnalysis | None = None#

rmse_blank: float | None = None#

rmse_fit: float | None = None#

rmse_post: float | None = None#

total_effect: float | None = None#

treated_series: ndarray#

class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#

Bases: object

Standardized power-analysis output attached to SyntheticControlPostFit.

Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same SyntheticControlPostFit shape so every covariate-aware SCM-family estimator gets the surface for free.

headline#

MDE for the actual n_post horizon of the realised design.

Type:: MDEPoint

curve#

MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).

Type:: list of MDEPoint

alpha#

Two-sided significance level assumed.

Type:: float

power_target#

Target power the MDEs are computed at (default 0.80).

Type:: float

sigma_placebo#

Standard deviation of the placebo gap series used as the noise scale.

Type:: float

serial_correlation#

Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.

Type:: float

baseline#

Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.

Type:: float

method#

"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Reserved for future "monte_carlo" extensions.

Type:: str

alpha: float#

baseline: float#

curve: Tuple[MDEPoint, ...]#

headline: MDEPoint#

mde_by_horizon() → Dict[int, float]#: {post_periods: mde_pct} for quick lookup.

method: str = 'analytical_ar1'#

power_target: float#

serial_correlation: float#

sigma_placebo: float#

class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#

Bases: object

Minimum detectable effect at a single post-treatment horizon.

mde_absolute: float#

mde_pct: float#

post_periods: int#

power_at_observed: float | None = None#

se: float#

Helper Modules#

Treated-tuple selection (Stage 1)#

Exact enumeration / multi-start local search over treated $m$-tuples, with the budget presolve and the away-step Frank-Wolfe simplex solver.

Treated-unit selection for synthetic-control experimental design.

Drop-in replacement for the branch-and-bound search in fast_scm_helpers.fast_scm_bb. It selects the top_K candidate treated m-tuples that best reproduce the population predictor vector, i.e. that minimise the imbalance

L(S) = min_{w in simplex(S)} || sum_{j in S} w_j Xtilde_j ||^2
= w’ G_SS w, G = Xtilde’ Xtilde (PSD Gram matrix)

where Xtilde are the f-centred, standardised predictors over the estimation window E. Because the design is f-centred, the population target is the donor centroid (the origin), so L is the squared distance from the origin to the convex hull of the selected donors and sqrt(L) is the achieved imbalance.

Why this is a search and not an exact MIP#

Abadie & Zhao (2026, “Synthetic Controls for Experimental Design”, p.13) do not require global optimality of the weights – only that the chosen design is feasible and approximately balanced, Xbar - sum_j w_j X_j ~= 0. Validity of the bias bound and the inference is conditional on the achieved imbalance, a goodness-of-fit quantity reported here, not on a certificate of optimality.

Moreover, an exact bound is structurally weak: because the origin lies inside the convex hull of all candidates, the convex relaxation lower bound is ~0 over the upper half of any branch-and-bound tree, so it cannot prune there. We therefore:

enumerate exactly when C(M, m) is small (the gold standard), and
otherwise run a strengthened multi-start local search, which in Monte Carlo lands on the exact optimum 83-100% of the time and within ~1% (mean) / ~7% (worst) of the minimal imbalance – immaterial under the approximate-balance criterion above.

The achieved imbalance of every returned design is reported so the analyst can check the ~= 0 condition that actually licenses the method.

class mlsynth.utils.fast_scm_helpers.lexsearch.TreatedDesign(indices: 'List[int]', weights: 'np.ndarray', loss: 'float', imbalance: 'float', total_cost: 'float' = 0.0, label: 'str' = '', labels: 'List[Any]' = <factory>, weight_dict: 'Dict[Any, float]' = <factory>, full_weights: 'Optional[np.ndarray]' = None)#

full_weights: ndarray | None = None#

imbalance: float#

indices: List[int]#

label: str = ''#

labels: List[Any]#

loss: float#

total_cost: float = 0.0#

weight_dict: Dict[Any, float]#

weights: ndarray#

mlsynth.utils.fast_scm_helpers.lexsearch.select_treated_designs(G: ndarray, candidate_idx: Sequence[int], m: int, top_K: int = 20, *, unit_costs: ndarray | None = None, budget: float | None = None, method: str = 'auto', enumerate_max: int = 3000000, n_starts: int = 16, iters: int = 80, random_state: int = 0, unit_index: Any | None = None, conflict: ndarray | None = None, strata: ndarray | None = None, min_per_stratum: int | None = None, max_per_stratum: int | None = None, size_band: Any | None = None, targeting_penalty: float = 0.0, forced: Sequence[int] | None = None) → Dict[str, Any]#

Select the top_K treated m-tuples with smallest imbalance.

targeting_penalty (gamma >= 0) makes the design weakly targeted: it adds gamma || w - (1/m) 1 ||^2 to the treated QP, pulling the weights toward the group’s own equal-weight aggregate so the treated group looks more like itself than like the population mean (trading some population representativeness – the ATE – for a deployable equal-weight group closer to its own ATT). On the simplex this is exactly a diagonal ridge, min_w w'(G_SS + gamma I) w; the reported imbalance remains the true targeting distance sqrt(w' G w) at the penalized w. gamma=0 (the default) is the fully targeted design, unchanged.

Parameters:

G ((n, n) PSD Gram matrix of f-centred standardised predictors.)
candidate_idx (indices eligible for treatment.)
m (number of treated units per design.)
top_K (how many designs to return.)
unit_costs, budget (optional per-unit costs and total budget.)
method (“enumerate” forces exact; “heuristic” forces local search;) – “auto” enumerates when C(M, m) <= enumerate_max else searches.

Returns:

dict –

Dictionary with:

"top_designs" : list[TreatedDesign] sorted by imbalance.
"stats"       : diagnostics, incl. whether the result is the exact
                global top-K and the achieved-imbalance range.

Control fit (Stage 2)#

mlsynth.utils.fast_scm_helpers.fast_scm_control.evaluate_candidates(candidates: List[Solution], X: ndarray, X_E: ndarray, Y: ndarray, f: ndarray, E_idx: ndarray, B_idx: ndarray, lambda_penalty: float, index_set: IndexSet, conflict: ndarray | None = None) → List[SEDCandidate]#

Evaluate candidate treated-unit subsets by constructing synthetic controls.

Parameters:

candidates (list of Solution) – Candidate subsets from branch-and-bound, each containing indices and weights.
X (np.ndarray, shape (T, N)) – Full feature matrix (outcomes + covariates).
X_E (np.ndarray, shape (T_E, N)) – Standardized feature matrix over the estimation period.
Y (np.ndarray, shape (T, J)) – Outcome matrix.
f (np.ndarray, shape (N,)) – Weight vector used to construct the target series.
E_idx (np.ndarray) – Indices for estimation period.
B_idx (np.ndarray) – Indices for validation/backcast period.
lambda_penalty (float) – Regularization strength for control QP.

Returns:

results (list of SEDCandidate) – Evaluated candidates including weights, predictions, and loss metrics.

Notes

For each candidate subset:

Construct synthetic treated unit:

treated_vec_E = X_E[:, treated_idx] @ w

Solve for synthetic control weights v.
Compute full time-series:
- synthetic treated
- synthetic control
- treatment effects
Evaluate fit using NMSE on estimation and validation periods.

Additional Details#

Target series is defined as:
```
target = X[:, :J] @ f[:J]
```
(weighted average over outcome units).
Candidates with failed QP solves are skipped.
Results are packaged into structured SEDCandidate objects.

Power / MDE (Stage 3)#

Moving-block placebo minimum-detectable-effect analysis.

Power / minimum-detectable-effect (MDE) analysis for LEXSCM designs.

Rebuilt to be a consistent nonparametric permutation analysis, replacing the earlier version whose null was quasi-empirical while its alternative was a pure Gaussian shift (an internally inconsistent power calc), and whose effect was normalised by mean(synth_treated[-n_post:]) floored at 1e-8 (which exploded on zero-mean outcomes).

Design#

One residual model for null and alternative. Both are built by moving-block resampling the placebo residuals (the held-out blank-period gaps treated - control, which are pure noise under H0), so serial dependence is preserved – matching Abadie & Zhao’s time-series-robust inference.
MDE in standard-deviation units (effect size), the scale used by Vives-i-Bastida (“detect effects > 0.1 s.d.”), plus the absolute outcome-unit value. No division by a fragile level, so zero-mean outcomes are fine.
Adaptive effect grid: the search extends until power_target is reached (or a cap), so a detectable-but-large effect never returns NaN.
Multi-series placebo pooling: pass the candidate’s blank gaps and the donor-unit placebo gaps; the null resamples within a randomly chosen series, giving the paper’s cross-unit placebo distribution.

Test statistic: S(e) = mean(|e_t|) over the post window (matches the permutation test used for inference).

mlsynth.utils.fast_scm_helpers.lexpower.block_resample_windows(series_list, n_post, n_draws, block_len, rng)#

Return an (n_draws, n_post) array of moving-block resampled windows.

Each draw: pick a random placebo series, then tile moving blocks (with wraparound) until the window has length n_post. Preserves within-series autocorrelation and pools across series.

mlsynth.utils.fast_scm_helpers.lexpower.compute_mde(noise_pool, n_post: int, *, alpha: float = 0.05, power_target: float = 0.8, block_len: int | None = None, n_null: int = 4000, n_power: int = 2000, max_sd: float = 8.0, n_grid: int = 64, random_state: int = 0, baseline: float | None = None, baseline_floor: float | None = None) → Dict#

Minimum detectable (constant) effect at horizon n_post.

Parameters:

noise_pool (1D array, or list of 1D arrays) – Placebo residual series (held-out blank-period gaps; optionally also the donor-unit placebo gaps). Outcome units.
n_post (int) – Post-treatment window length.
alpha, power_target (float) – Test level and target power (e.g. 0.05 and 0.80).
block_len (int, optional) – Moving-block length; default ~ L**(1/3).
max_sd (float) – Largest effect (in residual-SD units) the adaptive grid will probe.
baseline (float, optional) – Counterfactual outcome level the effect is expressed relative to (e.g. mean(synthetic_treated[-n_post:])). When supplied, a percentage MDE is reported – the manager-facing “we can detect a Y% effect” number.
baseline_floor (float, optional) – Minimum abs(baseline) for which a percentage is trustworthy; defaults to one residual SD (sigma). Below the floor mde_pct is NaN, deliberately, so zero-mean / sign-flipping outcomes cannot reproduce the spurious blow-up that motivated dropping level-normalisation here.

Returns:

dict –

Dictionary with:

mde_sd      : MDE in residual-standard-deviation units (np.inf if not
              reached within ``max_sd``).
mde_abs     : MDE in outcome units (= mde_sd * sigma).
mde_pct     : MDE as a percentage of ``baseline`` (NaN if no baseline
              was given or it falls below ``baseline_floor``).
baseline    : the level used for the percentage (NaN if none).
sigma       : residual SD scale.
c_alpha     : critical value of the placebo null statistic.
power_at_mde: achieved power at the reported MDE.
feasible    : whether power_target was reached within max_sd.
curve       : list of (effect_sd, power) probed.

mlsynth.utils.fast_scm_helpers.lexpower.detectability_curve(noise_pool, n_post_grid, *, baseline_series=None, **kw) → Dict#

MDE at each horizon in n_post_grid.

Returns {'details', 'curve_sd', 'curve_pct', ...}. If baseline_series (the synthetic-treated counterfactual level) is supplied, each horizon w also gets a percentage MDE relative to mean(baseline_series[-w:]) – the level over the matching post window. See compute_mde() for the guard that returns NaN when that level is not a trustworthy magnitude.

Recommendation (Stage 4)#

Lexicographic design selection.

Final design recommendation for LEXSCM (lexicographic selection).

Replaces the old select_best_tuple, whose efficiency score multiplied an MDE that was often NaN/Inf and whose Pareto frame could empty out and raise an opaque IndexError.

The selection is lexicographic, matching the method’s premise (Abadie & Zhao: approximate balance first, then power):

Validity gate. Keep designs whose imbalance is within imbalance_tol (relative) of the best achievable balance – the set of “good treated fits”.

Power. Among the gated designs, pick the smallest MDE (most detectable).

Tie-breaks. Then better out-of-sample stability, then lower cost.

If no gated design has a feasible MDE, it still returns the best-by-balance design and flags status='POWER_NOT_ESTABLISHED' instead of crashing. A Pareto frontier (imbalance vs MDE) is always exposed for transparency.

class mlsynth.utils.fast_scm_helpers.lexselect.DesignMetrics(design_id: 'str', indices: 'List[int]', imbalance: 'float', mde_sd: 'float' = inf, mde_abs: 'float' = inf, mde_pct: 'float' = nan, mde_feasible: 'bool' = False, stability: 'float' = nan, total_cost: 'float' = 0.0, labels: 'List[Any]' = <factory>)#

design_id: str#

imbalance: float#

indices: List[int]#

labels: List[Any]#

mde_abs: float = inf#

mde_feasible: bool = False#

mde_pct: float = nan#

mde_sd: float = inf#

stability: float = nan#

total_cost: float = 0.0#

class mlsynth.utils.fast_scm_helpers.lexselect.Recommendation(winner: 'Optional[DesignMetrics]', shortlist: 'List[DesignMetrics]', pareto_ids: 'List[str]', status: 'str', explanation: 'str', table: 'List[Dict[str, Any]]')#

explanation: str#

pareto_ids: List[str]#

shortlist: List[DesignMetrics]#

status: str#

table: List[Dict[str, Any]]#

winner: DesignMetrics | None#

mlsynth.utils.fast_scm_helpers.lexselect.select_design(designs: List[DesignMetrics], *, imbalance_tol: float = 0.25, imbalance_abs: float | None = None, max_shortlist: int = 5) → Recommendation#: Lexicographic recommendation: balance gate -> power -> stability -> cost.

Data preparation and matrix construction#

Utilities for constructing the design matrices, the population target, the estimation/blank split, and the Gram matrix.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_X_tilde(X: ndarray, f: ndarray, idx: ndarray)#

Standardize the feature matrix X over the estimation period, returning a normalized matrix and its Gram matrix.

Parameters:

X (np.ndarray, shape (T, N)) – Full feature matrix including outcomes and covariates.
f (np.ndarray, shape (N,)) – Weight vector (typically group means) used for centering.
idx (np.ndarray, shape (T_E,)) – Indices corresponding to the estimation period.
J (int) – Number of outcome columns in X (the first J columns).

Returns:

X_E (np.ndarray, shape (T_E, N)) – Standardized X over the estimation period. Each row is centered by the weighted mean mu = X[:, :J] @ f[:J] and scaled by per-time standard deviation across all columns.
G (np.ndarray, shape (N, N)) – Gram matrix of the standardized X: G = X_E.T @ X_E.

Notes

Standardization is performed only over the estimation period rows (idx).
Weighted mean is applied to the first J columns (outcome variables) only.
Rows with near-zero standard deviation are scaled with sigma=1.0 to avoid division by zero.
This function is fully vectorized using NumPy for efficiency.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_Y_matrix(working_df, outcome, time, unitid, unit_index)#

Construct outcome matrix in (time × unit) format.

Parameters:

working_df (pd.DataFrame) – Input panel data.
outcome (str) – Outcome variable name.
time (str) – Time index column.
unitid (str) – Unit identifier column.
unit_index (IndexSet) – Ordered unit index.

Returns:

np.ndarray, shape (T, J) – Wide-format outcome matrix.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_Z_matrix(working_df, covariates, time, unitid, unit_index)#

Construct stacked covariate matrix, collapsing time-invariant variables to a single row to prevent unintended over-weighting.

Parameters:

working_df (pd.DataFrame) – Input panel data.
covariates (list of str) – List of covariate column names.
time (str) – Time column.
unitid (str) – Unit identifier column.
unit_index (IndexSet) – Unit ordering.

Returns:

np.ndarray or None – Stacked covariate matrix (N rows x Units).

mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_candidate_mask(working_df, candidate_col, unit_index, unitid)#

Construct boolean mask identifying candidate units.

Parameters:

working_df (pd.DataFrame) – Preprocessed panel data.
candidate_col (str) – Column indicating eligibility for treatment.
unit_index (IndexSet) – Full set of unit labels.
unitid (str) – Unit identifier column name.

Returns:

np.ndarray – Boolean mask of candidate units aligned with unit_index.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.build_f_vector(working_df, weight_col, unitid, unit_index)#

Construct unit weighting vector.

Parameters:

working_df (pd.DataFrame) – Input dataset.
weight_col (str or None) – Optional column defining unit weights.
unitid (str) – Unit identifier column.
unit_index (IndexSet) – Unit ordering.

Returns:

np.ndarray – Normalized weight vector over units.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.prepare_experiment_inputs(Y: ndarray, Z: ndarray | None = None, f: ndarray | None = None, candidate_mask: ndarray | None = None, m: int = 5) → Tuple[ndarray, ndarray, ndarray, int, int]#

Combine inputs into a unified feature matrix and aligned selection structures.

Parameters:

Y (np.ndarray, shape (T, J)) – Outcome matrix.
Z (np.ndarray, optional) – Covariate matrix. If provided, concatenated with Y along columns.
f (np.ndarray, optional) – Weight vector of length N. Defaults to uniform weights if None.
candidate_mask (np.ndarray of bool, optional) – Boolean mask indicating candidate columns. If shorter than N, it will be extended with False values. If None, defaults to selecting only Y columns.
m (int, default=5) – Minimum required number of candidate units.

Returns:

X (np.ndarray, shape (T, N)) – Combined feature matrix (Y and optionally Z).
f (np.ndarray, shape (N,)) – Weight vector aligned with X.
candidate_idx (np.ndarray) – Indices of candidate columns in X.
T (int) – Number of time periods.
N (int) – Total number of columns in X.

Raises:

ValueError – If candidate_mask is longer than N or if fewer than m candidates are available.

Notes

Candidate mask is automatically extended to match the full feature dimension.
By default, only outcome columns (Y) are considered candidates.

mlsynth.utils.fast_scm_helpers.fast_scm_setup.split_periods(T0: int, n_covariates: int, frac_E: float = 0.7, post_df: DataFrame | None = None, time_col: str = 'time') → Tuple[ndarray, ndarray, ndarray, int, int]#: Returns indices and time-period counts.

Standardized post-fit (shared across the MAREX family) – the compute_post_fit() and compute_power_analysis() helpers that populate the post-fit bundle live outside this package so SYNDES, MAREX, and PANGEO all consume the same diagnostics module:

Standardized post-fit diagnostics for synthetic control designs and the matching power-analysis surface that consumes them.

After any MAREX-family estimator (LEXSCM, MAREX, SYNDES, PANGEO, …) solves its design problem, downstream consumers (the SAGE dashboard, paper-style reports, comparison tables) all need the same numbers:

the post-treatment ATT, total effect, percentage lift, per-period gap;

pre / blank / post root-mean-squared-error of the synthetic gap;

inference scalars (p-value, CI bounds) when computed;

covariate-balance standardized mean differences (SMDs) when covariates were used in the design.

This module exposes one frozen dataclass (SyntheticControlPostFit) and three free functions:

compute_smd() – standalone, panel-independent SMD
from any (cov_matrix, treated_w, control_w);

compute_post_fit() – the full diagnostic bundle from
trajectories + boundaries + (optional) covariate matrix + (optional) inference;

compute_post_fit_marex() – adapter that builds the bundle from a
MAREXResults + MAREXPanel pair.

The free-function entry points are deliberately small and reusable, so the LEXSCM / SYNDES / PANGEO equivalents can be added one-at-a-time without touching this module: they just compose the same primitives.

class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#

Bases: object

Minimum detectable effect at a single post-treatment horizon.

mde_absolute: float#

mde_pct: float#

post_periods: int#

power_at_observed: float | None = None#

se: float#

Bases: object

Standardized power-analysis output attached to SyntheticControlPostFit.

headline#

MDE for the actual n_post horizon of the realised design.

Type:: MDEPoint

curve#

MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).

Type:: list of MDEPoint

alpha#

Two-sided significance level assumed.

Type:: float

power_target#

Target power the MDEs are computed at (default 0.80).

Type:: float

sigma_placebo#

Standard deviation of the placebo gap series used as the noise scale.

Type:: float

serial_correlation#

Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.

Type:: float

baseline#

Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.

Type:: float

method#

"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Reserved for future "monte_carlo" extensions.

Type:: str

alpha: float#

baseline: float#

curve: Tuple[MDEPoint, ...]#

headline: MDEPoint#

mde_by_horizon() → Dict[int, float]#: {post_periods: mde_pct} for quick lookup.

method: str = 'analytical_ar1'#

power_target: float#

serial_correlation: float#

sigma_placebo: float#

Bases: object

Standardized post-fit diagnostics for a single synthetic control design.

Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.

ate: float | None = None#

ate_per_period: ndarray | None = None#

ate_percent: float | None = None#

ci_lower: float | None = None#

ci_upper: float | None = None#

control_series: ndarray#

covariate_names: Tuple[str, ...] = ()#

covariate_smd: Dict[str, float] | None = None#

covariate_smd_abs_max: float | None = None#

covariate_smd_control_vs_pop: Dict[str, float] | None = None#

covariate_smd_control_vs_pop_abs_max: float | None = None#

covariate_smd_control_vs_pop_squared_sum: float | None = None#

covariate_smd_squared_sum: float | None = None#

covariate_smd_treated_vs_pop: Dict[str, float] | None = None#

covariate_smd_treated_vs_pop_abs_max: float | None = None#

covariate_smd_treated_vs_pop_squared_sum: float | None = None#

cumulative_effect: ndarray | None = None#

gap_series: ndarray#

inference_method: str | None = None#

n_blank: int#

n_fit: int#

n_post: int#

p_value: float | None = None#

power: PowerAnalysis | None = None#

rmse_blank: float | None = None#

rmse_fit: float | None = None#

rmse_post: float | None = None#

total_effect: float | None = None#

treated_series: ndarray#

mlsynth.utils.post_fit.compute_post_fit(treated_series: ndarray, control_series: ndarray, *, n_fit: int, n_blank: int = 0, n_post: int | None = None, cov_matrix: ndarray | None = None, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None, treated_weights: ndarray | None = None, control_weights: ndarray | None = None, population_weights: ndarray | None = None, inference: Any | None = None, n_treated_units: int | None = None) → SyntheticControlPostFit#

Compute a SyntheticControlPostFit from trajectories + boundaries.

The trajectories treated_series and control_series are the estimator’s own synthetic constructs (Σⱼ wⱼ Yⱼ and Σⱼ vⱼ Yⱼ in Abadie-Zhou notation). n_post defaults to len(treated_series) - n_fit - n_blank.

Covariate balance fields are populated when cov_matrix + treated_weights + control_weights are all supplied (the natural inputs for any MAREX-family design). The compute_smd() helper does the work, so the SMD numbers are exactly consistent with a standalone call to compute_smd().

Inference scalars are pulled from the estimator’s inference object via _extract_inference(), which knows about the four common shapes (LEXSCM Inference, MAREX MAREXInference, SYNDES SYNDESInference, or a plain dict). All inference fields are optional.

mlsynth.utils.post_fit.compute_post_fit_marex(raw, panel, *, cov_scales: ndarray | None = None) → SyntheticControlPostFit#

Adapt a MAREXResults + MAREXPanel pair into a SyntheticControlPostFit.

Pulls the aggregate synthetic-treated / synthetic-control trajectories from raw.globres, the (T0, blank_periods) split from panel.T0 and panel.blank_periods, the inference object from raw.globres.inference, and the covariate matrix from panel.covariates (when present).

mlsynth.utils.post_fit.compute_power_analysis(post_fit: SyntheticControlPostFit, *, alpha: float = 0.05, power_target: float = 0.8, post_grid: Sequence[int] | None = None) → PowerAnalysis#

Analytical MDE + power curve for a design’s SyntheticControlPostFit.

Uses the placebo / blank-period gap residuals (or the pre-period gap when no blank window was carved out) to estimate the noise standard deviation sigma_placebo and the AR(1) autocorrelation rho, then computes the minimum detectable effect for each horizon T in post_grid via the Gaussian formula

MDE(T) = (z_{1-alpha/2} + z_{power}) * sigma_placebo * sqrt(VIF(T, rho)),

where VIF(T, rho) = Var(mean of T AR(1) periods) / sigma_placebo^2. The headline MDE uses T = post_fit.n_post (the realised post window).

Parameters:

post_fit (SyntheticControlPostFit) – The standardized post-fit from any MAREX-family estimator.
alpha (float, default 0.05) – Two-sided significance level.
power_target (float, default 0.80) – Target power for the MDE.
post_grid (sequence of int, optional) – Post-treatment horizons at which to compute MDE. Defaults to a small geometric grid centered on post_fit.n_post so users see the detectability tradeoff vs. running the experiment longer.

Returns:

PowerAnalysis – Headline MDE + a curve over the requested horizons.

mlsynth.utils.post_fit.compute_smd(cov_matrix: ndarray, treated_weights: ndarray, control_weights: ndarray, *, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None) → Dict[str, Any]#

Standardized mean differences between weighted treated and control means.

Parameters:

cov_matrix (ndarray, shape (N, M)) – Per-unit covariate values; rows align to treated_weights and control_weights.
treated_weights, control_weights (ndarray, shape (N,)) – Non-negative weights with disjoint supports. They are renormalised to sum to 1 internally (so callers may pass raw sums-to-K weights).
cov_names (sequence of str, optional) – Names for the M covariates. Defaults to ("cov_0", "cov_1", ...).
cov_scales (ndarray, shape (M,), optional) – Pre-computed per-covariate standardization scales (cross-unit std). Defaults to the std of cov_matrix columns. Passing the value already cached by build_covariate_matrix is the right move.

Returns:

dict with keys smd (the per-covariate dict), smd_abs_max,
and smd_squared_sum. Returns empty / NaN summaries if either weight
vector is all-zero.

mlsynth.utils.post_fit.to_effect_result(pf: SyntheticControlPostFit, *, time_periods: ndarray | None = None, intervention_time: Any | None = None, method_name: str | None = None, donor_weights: Dict[str, float] | None = None) → Any#

Convert a SyntheticControlPostFit into a standardized EffectResult.

The single, family-wide adapter from the rich post-fit bundle to the contract’s EffectResult view, so every MAREX-family estimator (LEXSCM, MAREX, SYNDES, PANGEO) gets report for free instead of hand-copying fields. The realized effect’s standard scalars populate the standard sub-models; everything the contract has no slot for (per-period effects, cumulative effect, covariate SMDs, and the full post_fit object itself) is carried in additional_outputs so it remains discoverable.

Example: choosing treated markets under a budget#

This end-to-end example simulates a panel of sales markets, flags a subset as treatment-eligible with heterogeneous treatment costs, and lets LEXSCM choose which $m = 3$ markets to treat under a fixed budget – then inspects the solver-style search diagnostics and the lexicographic recommendation.

Generate a synthetic panel#

import numpy as np
import pandas as pd

from mlsynth import LEXSCM


def generate_synthetic_sales_panel(
    n_units: int = 100,
    n_time_periods: int = 100,
    n_candidates: int = 40,
    treatment_start: int = 80,
    seed: int = 42,
    sigma: float = 3.0,
    sales_scale: float = 100.0,
) -> pd.DataFrame:

    np.random.seed(seed)

    unit_fe = np.random.normal(0, 8, size=n_units)
    unit_trend = np.random.normal(0.3, 0.15, size=n_units)
    unit_sensitivity = np.random.uniform(0.8, 1.2, size=n_units)

    t = np.arange(n_time_periods)
    common_factor = (
        1.5 * t + 120
        + 12 * np.sin(2 * np.pi * t / 52)
        + np.random.normal(0, 3.5, n_time_periods)
    )

    sales = np.zeros((n_time_periods, n_units))
    for j in range(n_units):
        sales[:, j] = (
            sales_scale
            + common_factor * unit_sensitivity[j]
            + unit_fe[j]
            + unit_trend[j] * t
            + np.random.normal(0, sigma, n_time_periods)
        )
    sales = np.maximum(sales, 5.0)

    # heterogeneous per-market treatment costs
    base_cost = np.random.lognormal(mean=12, sigma=0.6, size=n_units) * 5
    size_factor = np.random.uniform(0.7, 1.4, n_units)
    treatment_cost = np.round((base_cost * size_factor) / 10) * 10

    unit_ids = np.repeat(np.arange(n_units), n_time_periods)
    time_ids = np.tile(np.arange(n_time_periods), n_units)
    sales_flat = sales.ravel(order="F")

    df = pd.DataFrame({
        "unitid": unit_ids,
        "time": time_ids,
        "sales": sales_flat,
        "treatment_cost": np.repeat(treatment_cost, n_time_periods),
    })

    candidate_mask = np.zeros(n_units, dtype=bool)
    candidate_idx = np.random.choice(n_units, size=n_candidates, replace=False)
    candidate_mask[candidate_idx] = True
    df["candidate"] = np.repeat(candidate_mask, n_time_periods)
    df["post"] = (df["time"] >= treatment_start).astype(int)

    return df

Run LEXSCM and inspect the design#

df = generate_synthetic_sales_panel(
    n_units=120,
    n_candidates=40,
    n_time_periods=100,
    treatment_start=90,
    seed=4545,
)

config = {
    "df": df,
    "outcome": "sales",
    "unitid": "unitid",
    "time": "time",
    "candidate_col": "candidate",
    "m": 3,                         # treat 3 of the 40 eligible markets
    "post_col": "post",
    "unit_cost_col": "treatment_cost",
    "budget": 4_000_000,            # hard knapsack cap on selected costs
    "lambda_penalty": 0.5,
    "mde_horizon": "late",          # conservative MDE at the longest horizon
    "power_target": 0.8,
    "imbalance_tol": 0.25,          # validity gate: within 25% of best balance
    "top_K": 20,
}

results = LEXSCM(config).fit()

# --- Stage 1: solver-style search diagnostics ---
stats = results.search.selection["stats"]
print("status        :", stats["termination"]["status"])      # OPTIMAL / FEASIBLE
print("method        :", stats["search"]["method"])
print("subsets scored :", stats["search"]["subsets_evaluated"])
print("C(M,m)        :", stats["problem"]["feasible_region_C(M,m)"])
print("best imbalance :", round(stats["incumbent"]["imbalance"], 4))

# --- Stage 4: lexicographic recommendation tuple ---
rec = results.search.selection["recommendation"]
print("rec status    :", rec["status"])      # OK / POWER_NOT_ESTABLISHED
print("winner        :", rec["winner"])
print("pareto ids    :", rec["pareto_ids"])
print(rec["explanation"])

# --- the recommended design ---
print(results.search.shortlist)               # ranked per-design table
best = results.search.winner
print("treated weights:", best.treated_weight_dict)
print("control weights:", best.control_weight_dict)

With $M = 40$ eligible markets and $m = 3$, $\binom{40}{3} = 9{,}880$ is well under enumerate_max, so the search enumerates exactly and reports status = OPTIMAL – a certified global top_K. Raising $M$ and $m$ until $\binom{M}{m}$ exceeds the cap flips the search to the multi-start local solver (status = FEASIBLE), at which point the consensus block under stats["search"] reports how many independent starts agreed on the incumbent.

Example: spillover-aware selection#

Suppose the markets sit inside larger regions and treating two markets in the same region would let the intervention spill across them (and onto each other’s donors). Assign every unit a region and pass it as cluster_col: LEXSCM then refuses to co-treat two markets from the same region (Stage 1) and refuses to build a treated market’s synthetic control from its own-region peers (Stage 2).

import numpy as np

df = generate_synthetic_sales_panel(n_units=120, n_candidates=40,
                                    n_time_periods=100, treatment_start=90,
                                    seed=4545)

# assign each unit to one of 8 regions (constant within unit)
rng = np.random.default_rng(0)
region_of = rng.integers(0, 8, size=df["unitid"].nunique())
df["region"] = df["unitid"].map(lambda u: f"R{region_of[u]}")

config = {
    "df": df, "outcome": "sales", "unitid": "unitid", "time": "time",
    "candidate_col": "candidate", "m": 3, "post_col": "post",
    "cluster_col": "region",        # <-- the only new line vs the budget example
    "mde_horizon": "late", "top_K": 20,
}
results = LEXSCM(config).fit()

treated = list(results.selected_units)
treated_regions = {df.set_index("unitid")["region"].loc[int(u)] for u in treated}
donor_regions = {df.set_index("unitid")["region"].loc[int(d)]
                 for d in results.design_weights.donor_weights}

print("treated markets :", treated)
print("treated regions :", treated_regions)       # all distinct -- one per region
print("donor regions   :", donor_regions)
assert treated_regions.isdisjoint(donor_regions)   # donors avoid treated regions

Instead of clusters you can hand LEXSCM a spillover/adjacency matrix – a pandas.DataFrame indexed and columned by unit id, with a positive entry wherever two units interfere – via adjacency=... and an optional spillover_threshold (units conflict when the entry exceeds it). A cluster_col and an adjacency may be supplied together; their conflicts combine. If the constraint admits no conflict-free m-tuple (for instance m larger than the number of regions), the fit raises MlsynthConfigError.

A ready-made matrix ships with mlsynth – you don’t need a shapefile or GIS toolchain. basedata/markets/dma_adjacency.csv is a 206 x 206 border matrix over US Nielsen Designated Market Areas (built from the public TopoJSON via shared-arc contiguity; see basedata/markets/build_dma_adjacency.py), so “no two treated markets may share a border” is one read_csv away:

import pandas as pd

adj = pd.read_csv("basedata/markets/dma_adjacency.csv", index_col=0)
meta = pd.read_csv("basedata/markets/dma_metadata.csv")          # state, lat, long
se = [n for n in meta.loc[meta.state.isin({"FL","GA","AL","MS","SC","NC","TN"}),
                          "dma_name"] if n in adj.index]
A = adj.loc[se, se]                                              # induced subgraph

LEXSCM({"df": df, "outcome": "sales", "unitid": "dma", "time": "week",
        "candidate_col": "eligible", "m": 4, "adjacency": A}).fit()

On a spatially-correlated Southeast panel this is the difference between an unconstrained design that co-treats a cluster of bordering Gulf-Coast markets (Hattiesburg, Jackson, Mobile-Pensacola, Montgomery – with Biloxi-Gulfport and Dothan as donors, both bordering treated markets) and the spillover-aware design, which spreads treatment across non-adjacent markets and keeps every donor clear of a treated market.

Example: South & Midwest design with grouped factors#

This example exercises every selection constraint at once on real geography. We take the South and Midwest DMAs from the bundled map, group them by census division (South Atlantic, East/West South Central, East/West North Central), and draw outcomes from a grouped linear factor model (after the grouped factor structure of Liao, Shi & Zheng [RelaxSC]): markets in the same division share factor loadings, so they co-move – the latent-group structure the constraints are designed for. Each division is both a stratum (for coverage / quota) and, through the DMA borders, a source of spillover.

Build the panel#

import numpy as np
import pandas as pd
from mlsynth import LEXSCM

DIVISION = {  # census divisions across the South + Midwest
    "FL": "S. Atlantic", "GA": "S. Atlantic", "NC": "S. Atlantic",
    "SC": "S. Atlantic", "VA": "S. Atlantic", "WV": "S. Atlantic",
    "MD": "S. Atlantic",
    "KY": "E.S. Central", "TN": "E.S. Central", "MS": "E.S. Central",
    "AL": "E.S. Central",
    "AR": "W.S. Central", "LA": "W.S. Central", "OK": "W.S. Central",
    "TX": "W.S. Central",
    "OH": "E.N. Central", "IN": "E.N. Central", "IL": "E.N. Central",
    "MI": "E.N. Central", "WI": "E.N. Central",
    "MN": "W.N. Central", "IA": "W.N. Central", "MO": "W.N. Central",
    "ND": "W.N. Central", "SD": "W.N. Central", "NE": "W.N. Central",
    "KS": "W.N. Central",
}

meta = pd.read_csv("basedata/markets/dma_metadata.csv")
adj = pd.read_csv("basedata/markets/dma_adjacency.csv", index_col=0)
meta = meta[meta.state.isin(DIVISION)].copy()
meta["division"] = meta.state.map(DIVISION)
names = [n for n in meta.dma_name if n in adj.index]          # 145 DMAs
meta = meta[meta.dma_name.isin(names)].reset_index(drop=True)
A = adj.loc[names, names]                                     # border matrix

# --- grouped linear factor model: same division -> shared loadings ---
rng = np.random.default_rng(11)
n, T, T_post, r = len(names), 60, 12, 4
div = meta.set_index("dma_name").loc[names, "division"].values
Lam_div = {d: rng.normal(size=r) for d in sorted(set(div))}
Lam = np.array([Lam_div[d] for d in div]) + 0.15 * rng.normal(size=(n, r))
F = np.cumsum(rng.normal(size=(T, r)), 0)                     # latent factors
population = np.round(rng.lognormal(12.5, 0.8, n)).astype(int)
cost = population * 5.0                       # $5/person intervention cost
Y = 100 + rng.normal(0, 5, n) + F @ Lam.T + rng.normal(0, 1.0, (T, n))
cand = sorted(rng.choice(n, 18, replace=False).tolist())     # eligible markets

df = pd.DataFrame([
    {"market": names[j], "week": t, "sales": Y[t, j],
     "eligible": int(j in cand), "post": int(t >= T - T_post),
     "division": div[j], "population": int(population[j]),
     "cost": float(cost[j])}
    for j in range(n) for t in range(T)
])

base = dict(df=df, outcome="sales", unitid="market", time="week",
            candidate_col="eligible", post_col="post", top_K=8, verbose=False)

Each constraint, one at a time#

# 1. Spillover: no two treated markets share a border, and no donor borders a
#    treated market (the DMA adjacency matrix).
LEXSCM({**base, "m": 3, "adjacency": A}).fit()

# 2. Coverage: at least one treated market in EVERY census division
#    (m must be >= the 5 divisions).
LEXSCM({**base, "m": 5, "stratum_col": "division", "min_per_stratum": 1}).fit()

# 3. Quota: at most one treated market per division (a spread-out design).
LEXSCM({**base, "m": 4, "stratum_col": "division", "max_per_stratum": 1}).fit()

# 4. Size band: only treat markets at or above the median population.
LEXSCM({**base, "m": 3, "size_col": "population",
        "min_size": int(np.median(population))}).fit()

# 5. Compose constraints: cover all five divisions (min 1) AND at most two
#    per division (max 2), at m=6 -- exactly one division carries two.
LEXSCM({**base, "m": 6, "stratum_col": "division",
        "min_per_stratum": 1, "max_per_stratum": 2}).fit()

# 6. With a size band on top, a division may lose all its eligible markets;
#    coverage then only requires the divisions that still HAVE a large enough
#    market (here East-South-Central drops out, so four divisions are covered).
LEXSCM({**base, "m": 4, "stratum_col": "division", "min_per_stratum": 1,
        "size_col": "population", "min_size": int(np.median(population))}).fit()

With the grouped factors, an unconstrained design tends to concentrate treatment in whichever division is easiest to balance; the coverage and quota constraints force it to spread across divisions (so the experiment speaks to the whole footprint), the adjacency rule keeps treated markets from contaminating each other, and the size band drops markets too small to power – or too large to reproduce from the rest. All compose on the same Stage-1 admissible-tuple layer, and an impossible combination (for example min_per_stratum=1 with m below the division count) raises MlsynthConfigError.

Under a budget#

Experiments cost money, and the cost is largely driven by market size – a bigger DMA means a bigger population to treat. Here cost is $\$5$ per person, and the program carries a hard $\$10\text{M}$ knapsack (unit_cost_col + budget) that composes with everything above.

BUDGET = 10_000_000
bud = {**base, "unit_cost_col": "cost", "budget": BUDGET}

# budget alone: the most balanced affordable design (here ~$6.9M of $10M)
LEXSCM({**bud, "m": 4}).fit()

# budget + coverage: one market per division AND total cost <= $10M -- the
# knapsack pushes the design toward the *cheaper* (smaller) market in each
# division (here ~$8.3M, all five divisions covered)
LEXSCM({**bud, "m": 5, "stratum_col": "division", "min_per_stratum": 1}).fit()

The constraints can genuinely conflict: requiring coverage of all five divisions and only above-median markets and a $10M cap asks for five large (expensive) markets that the budget cannot afford, so the fit fails loudly rather than silently dropping a criterion.

from mlsynth.exceptions import MlsynthConfigError

try:
    LEXSCM({**bud, "m": 5, "stratum_col": "division", "min_per_stratum": 1,
            "size_col": "population",
            "min_size": int(np.median(population))}).fit()
except MlsynthConfigError as e:
    print(e)

prints the binding constraint and by how much, not just “infeasible”:

LEXSCM design is infeasible -- the binding constraint(s):
  - budget: the 5 cheapest eligible markets cost $10,100,765, over the
    $10,000,000 budget by $100,765. Raise the budget to >= $10,100,765,
    reduce m, or relax the size band.

Every infeasibility – candidate pool, budget, coverage, quota, or spillover – is audited up front and reported in this same have vs need -> minimal fix shape, and all binding constraints are listed together (so you fix them in one pass rather than one error at a time). The audit only reports: it never silently relaxes a constraint you set. All of them raise the one MlsynthConfigError, so a caller catches a single type and surfaces exactly which design ask was impossible and the smallest change that would satisfy it.

References#

Abadie, A., & Zhao, J. (2026). “Synthetic Controls for Experimental Design.” arXiv Working Paper 2108.02196. (The February 2026 revision develops the design-based bias bound and the approximate-balance criterion used here.) https://arxiv.org/abs/2108.02196

Vives-i-Bastida, J. (2022). “Synthetic Experimental Design for a UBI pilot study.” Working paper (https://ivalua.cat/sites/default/files/2023-03/Vives-i-Bastida_2022_anon.pdf)

Doudchenko, N., Khosravi, K., Pouget-Abadie, J., Lahaie, S., et al. “Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls.” – related optimization view of synthetic experimental design.

Lexicographic Synthetic Control (LEXSCM)

Contents

Lexicographic Synthetic Control (LEXSCM)#

Overview#

When to use this estimator#

Notation#

Mathematical Formulation#

Setup and notation#

Identifying assumptions#

Stage 1 – Treated-tuple selection#

Weakly targeted designs (targeting_penalty)#

How a single tuple is built: the inner simplex QP#

Why a search and not a full enumeration or an exact MIP#

Building tuples in the heuristic regime#

Budget constraints#

Stage 2 – Control fit#

Spillover / interference exclusions#

Coverage quotas and size bands#

Forced-in and forbidden markets#

Stage 3 – Power analysis (minimum detectable effect)#

What “at each horizon” means – and what it does not#

Moving-block resampling#

Test statistic and critical value#

Power, the effect grid, and the MDE#

Three reported scales#

Detectability curve and horizon collapse#

Stage 4 – Final recommendation (lexicographic selection)#

Worked numeric walkthrough#

Building the tuple (Stage 1)#

Powering the design (Stage 3)#

Standardized Post-Fit and Power Analysis#

Where the unified power analysis fits in#

Two MDEs, complementary roles#

Verification#

Core API#

Configuration#

Result Containers#

Helper Modules#

Treated-tuple selection (Stage 1)#

Why this is a search and not an exact MIP#

Control fit (Stage 2)#

Additional Details#

Power / MDE (Stage 3)#

Design#

Recommendation (Stage 4)#

Data preparation and matrix construction#

Example: choosing treated markets under a budget#

Generate a synthetic panel#

Run LEXSCM and inspect the design#

Example: spillover-aware selection#

Example: South & Midwest design with grouped factors#

Build the panel#

Each constraint, one at a time#

Under a budget#

References#

Weakly targeted designs (`targeting_penalty`)#