Synthetic Design (SYNDES)

Contents

Synthetic Design (SYNDES)#

When to Use This Estimator#

Most synthetic-control work takes the treated unit as given and asks only how to weight the donors. SYNDES answers the prior question Doudchenko et al. [SYNDES] pose: when you are about to run an experiment and have pre-treatment outcome data, which units should you treat? Treating units at random – or by hand – leaves accuracy on the table, because the variance of the resulting treatment-effect estimate depends on which units are treated and how the rest are weighted into a synthetic comparison.

The authors argue this is exactly the regime of market-level experiments: treatment can only be applied to coarse units (media markets, regions, whole products), each unit is expensive to treat (so K is small), and interference or equilibrium effects rule out a more granular randomization. In that setting the experimenter both chooses the treated set and estimates the effect, and SYNDES does both at once – it minimizes the mean squared error of the average-treatment-effect-on-the-treated estimator directly over the joint choice of treatment assignment and synthetic weights. Use it when:

  • you control assignment and have a panel of pre-treatment outcomes;

  • you want a small, well-chosen treated set rather than a random one;

  • you are willing to solve a mixed-integer program for a provably optimal design (or to bound the achievable power of one).

Notation#

We observe an outcome \(y_{it}\) for all units \(\mathcal{N} \coloneqq \{1, \ldots, N\}\) over pre-treatment periods \(t \in \mathcal{T}_1 \coloneqq \{1, \ldots, T_0\}\) of length \(T_0 = T\). After period \(T_0\) the experimenter assigns a binary treatment \(D_i \in \{0, 1\}\) to be applied over the post-treatment periods \(\mathcal{T}_2 \coloneqq \{t : t > T_0\}\) (of length \(S - T_0\)), with exactly K treated units (\(\sum_i D_i = K\)). The assignment vector \(\mathbf{D} \coloneqq (D_1, \ldots, D_N)^\top\) is itself a decision variable. Each unit has potential outcomes \((y_{it}^N, y_{it}^I)\) and observed outcome \(y_{it} = y_{it}(D_i)\). Synthetic weights \(\mathbf{w}\) live on the simplex (non-negative, summing to one on the relevant side). The estimand is the weighted average treatment effect on the treated (wATET) \(\tau \coloneqq \sum_{i:D_i=1} w_i \tau_i\), where \(\tau_i\) is unit \(i\)’s additive effect.

Note

Notation bridge. The single-treated-unit synthetic-control canon (treated \(j=1\), donor pool \(\mathcal{N}_0 \coloneqq \mathcal{N} \setminus \{1\}\)) takes the treated unit as given, so it does not fit a design problem in which K treated units are themselves chosen. We therefore keep the design index \(i\) for units, with the assignment vector \(\mathbf{D}\) a decision variable, and write \(T_0\) for the pre-treatment length (the page’s \(T\)) and \(\mathcal{T}_1\) / \(\mathcal{T}_2\) for the pre/post period sets.

The design problem#

Under the outcome model \(y_{it}^N = \mu_{it} + \varepsilon_{it}\) with mean-zero, homoskedastic noise (\(\operatorname{Var}\varepsilon_{it} = \sigma^2\)) and additive effects \(y_{it} = y_{it}^N + D_i \tau_i\), the conditional MSE of the per-unit synthetic-control estimator \(\widehat{\tau}_i \coloneqq y_{i,T_0+1} - \sum_{j:D_j=0} w^i_j y_{j,T_0+1}\) is

\[\mathbb{E}\bigl[(\widehat{\tau}_i - \tau_i)^2 \mid \mathbf{D}, \mathbf{w}\bigr] = \Bigl(\mu_{i,T_0+1} - \textstyle\sum_{j:D_j=0} w^i_j \mu_{j,T_0+1}\Bigr)^2 + \sigma^2\Bigl(1 + \textstyle\sum_{j:D_j=0} (w^i_j)^2\Bigr).\]

The first term is a bias from imperfect pre-treatment matching; the second a variance that grows with the weight concentration. SYNDES minimizes the empirical, pre-period analogue of this MSE jointly over \((\mathbf{D}, \mathbf{w})\) – the \(\sigma^2 \sum w^2\) term becomes the ridge penalty \(\lambda\) below. Because the choice of treated set makes the estimand itself stochastic, the target is the wATET for the units SYNDES selects, not a fixed population ATE.

The three MIP formulations#

The joint optimization over assignment and weights is a bilevel / mixed-integer program and is NP-hard. Doudchenko et al. give three forms, all exposed through mode and all solved as MIPs (auxiliary variables linearize the weight-assignment products). They differ in how the treated and control sides are weighted.

Per-unit (mode="per_unit")#

A separate synthetic control for every treated unit:

\[\operatorname*{argmin}_{\mathbf{D}, \{w^i_j\}} \; \frac{1}{K T_0} \sum_{i} \sum_{t \in \mathcal{T}_1} D_i \Bigl(y_{it} - \textstyle\sum_j w^i_j (1 - D_j) y_{jt}\Bigr)^2 + \frac{\lambda}{K} \sum_i \sum_j D_i (w^i_j)^2,\]

subject to \(w^i_j \ge 0\), \(\sum_i D_i = K\), and \(\sum_j w^i_j(1 - D_j) = 1\) for each treated \(i\). Each treated unit draws its own simplex of control weights.

Two-way global (mode="two_way_global")#

A single weight vector applied to both sides of one global contrast:

\[\operatorname*{argmin}_{\mathbf{D}, \{w_i\}} \; \frac{1}{T_0} \sum_{t \in \mathcal{T}_1} \Bigl(\textstyle\sum_i w_i D_i y_{it} - \sum_i w_i (1 - D_i) y_{it}\Bigr)^2 + \lambda \sum_i w_i^2,\]

subject to \(w_i \ge 0\), \(\sum_i D_i = K\), \(\sum_i w_i D_i = 1\) and \(\sum_i w_i (1 - D_i) = 1\). mlsynth linearizes \(q_i \coloneqq w_i D_i\) and enforces the two normalizations with \(\sum_i q_i = 1\), \(\sum_i w_i = 2\), so the per-period contrast is \((2\mathbf{q} - \mathbf{w})^\top \mathbf{y}_t\).

One-way global (mode="one_way_global")#

The two-way program with the treated weights pinned equal (a simple average, \(w_i = 1/K\) on the treated), while the control side stays a free synthetic control:

\[\operatorname*{argmin}_{\mathbf{D}, \mathbf{c}} \; \frac{1}{T_0}\sum_{t \in \mathcal{T}_1} \Bigl(\tfrac{1}{K}\textstyle\sum_i D_i y_{it} - \sum_i c_i y_{it}\Bigr)^2 + \lambda\Bigl(\tfrac{1}{K} + \textstyle\sum_i c_i^2\Bigr),\]

subject to \(c_i \ge 0\), \(\sum_i c_i = 1\), \(c_i \le 1 - D_i\) (treated units carry no control weight) and \(\sum_i D_i = K\).

Warning

One-way global is not difference-in-means. Only the treated side is fixed at \(1/K\); the control side \(\mathbf{c}\) is a free synthetic control to be optimized. Pinning both sides (treated \(1/K\), control \(1/(N-K)\)) would be the randomized difference-in-means baseline, a different (and weaker) design.

Assumptions / Remarks.

Assumption 1 (additive effects, homoskedastic noise). Outcomes follow \(y_{it}^N = \mu_{it} + \varepsilon_{it}\) with \(\mathbb{E}\varepsilon_{it}=0\), \(\operatorname{Var}\varepsilon_{it} = \sigma^2\), and treatment adds \(\tau_i\). Remark. This is what makes the MSE above decompose into the matching-bias and weight-variance terms the MIP minimizes; \(\sigma^2\) is unknown and is supplied through lam (default: the pre-period sample variance).

Assumption 2 (admissible weights). Weights are non-negative and normalized on their side (a convex combination), so the synthetic comparison does not extrapolate. Remark. The simplex is what gives the design an interpretable “synthetic unit” reading and bounds the variance term.

Assumption 3 (homogeneous vs. heterogeneous effects). When effects are homogeneous (\(\tau_i \equiv \tau\)) any weighted average recovers \(\tau\), so the global modes can choose weights freely to minimize MSE. When effects are heterogeneous, different weightings target different estimands; per_unit (or the fixed-treated-weight one_way_global) keeps the estimand well-defined. Remark. This is the authors’ guidance for choosing a mode – it is about which wATET you are willing to target, not just fit.

Assumption 4 (sharp null for inference). The permutation test targets the sharp null \(\tau_i = 0\) for all treated \(i\). Remark. Correct test size holds under the exchangeability the moving-block permutation imposes; the authors note this requires “rather strong assumptions” in finite samples.

Inference and minimum detectable effect#

The test and the power curve are both functionals of a single scalar series: the per-period contrast the design induces. Any fitted design (any mode) reduces to a unit-level contrast vector \(\mathbf{c} \in \mathbb{R}^N\), and the estimated treatment effect at period \(t\) is its projection onto the cross-section,

\[g_t \;\coloneqq\; \mathbf{y}_t^\top \mathbf{c}, \qquad \widehat{\tau} \;=\; \frac{1}{|\mathcal{T}_2|}\sum_{t \in \mathcal{T}_2} g_t .\]

For the two global modes \(\mathbf{c} = 2\mathbf{q} - \mathbf{w}\) (per-unit treated weight minus control weight); for per_unit the \(K\) per-unit synthetic-control estimators are averaged, giving \(c_j = (D_j - \sum_i q_{ij})/K\). Because the design balances the treated and synthetic-control sides over the pre-period, under the sharp null \(\tau_i \equiv 0\) the series has mean zero, \(\mathbb{E}[g_t]=0\), and its pre-period fluctuations are a direct readout of the estimator’s noise. Every quantity below is computed from \(\{g_t\}\).

The moving-block permutation test#

SYNDES tests \(H_0: \tau_i \equiv 0\) with the moving-block permutation test of Chernozhukov, Wuethrich and Zhu (2021). The observed statistic is the post-period mean \(\widehat{\tau}\); the reference distribution is built by cyclically shifting the stacked contrast series and recomputing the length-\(|\mathcal{T}_2|\) block mean at every offset. The two-sided \(p\)-value is the share of shifted block means at least as large in magnitude as \(\widehat{\tau}\), and the test is exact under the exchangeability the cyclic shift imposes.

Where the per-period standard error comes from#

The test compares a length-\(h\) block mean of \(g_t\) to its null distribution, so the standard error that governs detectability at horizon \(h \coloneqq n_{\text{post}}\) is the standard deviation of that block mean. It is estimated on the pre-period contrast \(\{g_t\}_{t\in\mathcal{T}_1}\) – the only periods untouched by the post-period effect. The i.i.d. per-period scale is the sample SD

\[\widehat{\sigma}_{\text{perm}} = \sqrt{\frac{1}{T_0 - 1}\sum_{t\in\mathcal{T}_1}\bigl(g_t - \bar g\bigr)^2},\]

reported as sigma_perm. Were the \(g_t\) serially independent, the mean of \(h\) of them would have standard error \(\widehat{\sigma}_{\text{perm}}/\sqrt{h}\) – the textbook scaling.

Serial correlation matters#

Contrast residuals are almost never independent: the weighting absorbs the level but persistent factors (seasonality, business cycles, slow trends) leak through – exactly the correlation the moving-block test is exposed to. The variance of a length-\(h\) mean is then \(\sigma_{\mathrm{LR}}^2/h\), not \(\sigma_{\text{perm}}^2/h\), where \(\sigma_{\mathrm{LR}}^2\) is the long-run variance. power_analysis() estimates it with a Newey-West (Bartlett-kernel) HAC estimator on the pre-period contrast,

\[\widehat{\sigma}_{\mathrm{LR}}^2 = \widehat{\gamma}_0 + 2\sum_{k=1}^{L}\Bigl(1 - \frac{k}{L+1}\Bigr)\widehat{\gamma}_k, \qquad \widehat{\gamma}_k = \frac{1}{T_0}\sum_{t=k+1}^{T_0}(g_t-\bar g)(g_{t-k}-\bar g),\]

with automatic bandwidth \(L = \lfloor 4\,(T_0/100)^{2/9}\rfloor\). The Bartlett weights \(1 - k/(L+1)\) keep the estimate non-negative, and \(\widehat{\sigma}_{\mathrm{LR}}\) (reported as long_run_sigma) collapses to \(\widehat{\sigma}_{\text{perm}}\) when the contrast is white noise. When a short horizon leaves too few overlapping blocks the estimator falls back to the i.i.d. scaling.

The per-horizon MDE#

Collecting terms, the null standard error of the ATT estimator at horizon \(h\) is \(\mathrm{SE}(h) = \widehat{\sigma}_{\mathrm{LR}}/\sqrt{h}\), and for a two-sided level-\(\alpha\) test with target power \(1-\beta\) the minimum detectable effect – the smallest true ATT the design would reject \(H_0\) for with probability \(1-\beta\) – is

\[\mathrm{MDE}(h) = \bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)\, \frac{\widehat{\sigma}_{\mathrm{LR}}}{\sqrt{h}}, \qquad \mathrm{MDE}_\%(h) = 100 \times \frac{\mathrm{MDE}(h)}{\text{baseline}},\]

with baseline the mean pre-period outcome over the SYNDES-selected treated units by default (alternatives: "overall", "control", or a user scalar). This is exactly the number tabulated for horizons \(h = 1,\dots,12\) in res.power_curve and returned by power_analysis(). The power to detect a given true effect \(\tau\) at horizon \(h\) uses the same SE,

\[\pi(\tau, h) = \Phi\!\Bigl(\frac{|\tau|}{\mathrm{SE}(h)} - z_{1-\alpha/2}\Bigr) + \Phi\!\Bigl(-\frac{|\tau|}{\mathrm{SE}(h)} - z_{1-\alpha/2}\Bigr).\]

Two estimators of the same standard error#

mlsynth exposes two SYNDES power surfaces. They share the contrast series \(\{g_t\}\) and the multiplier \(z_{1-\alpha/2}+z_{1-\beta}\), and differ only in how they turn the pre-period contrast into \(\mathrm{SE}(h)\):

  • res.power_curve and power_analysis() – the nonparametric Newey-West HAC above, \(\mathrm{SE}(h)=\widehat{\sigma}_{\mathrm{LR}}/\sqrt{h}\), tabulated per horizon \(h=1,\dots,12\).

  • res.post_fit.power – the family-wide parametric surface, which models the contrast as an AR(1) and uses \(\mathrm{SE}(h)=\widehat{\sigma}_{\text{placebo}}\sqrt{\mathrm{VIF}(h,\widehat{\rho})}\) (next section).

Both collapse to the textbook \((z_{1-\alpha/2}+z_{1-\beta})\,\widehat{\sigma}/\sqrt{h}\) when the contrast is serially uncorrelated; they differ only in how the serial-correlation correction is extrapolated (a nonparametric lag sum versus a one-parameter AR(1) fit).

Standardized Post-Fit and Power Analysis#

Every call to SYNDES.fit() attaches a SyntheticControlPostFit to res.post_fit — the same diagnostic surface used by LEXSCM, MAREX, and PANGEO. This is the one-stop container downstream consumers (dashboards, paper-style reports, comparison tables) read from, regardless of which member of the family produced the design:

import numpy as np
import pandas as pd
from mlsynth import SYNDES

rng = np.random.default_rng(0)
n_units, n_periods, n_post = 8, 20, 6
factors = rng.normal(size=(n_periods, 2))
loadings = rng.uniform(0.3, 1.0, size=(n_units, 2))
level = rng.uniform(8.0, 12.0, size=n_units)
Y = level + factors @ loadings.T + rng.normal(scale=0.3, size=(n_periods, n_units))
df = pd.DataFrame(
    [{"unit": j, "time": t, "Y": float(Y[t, j]),
      "post": int(t >= n_periods - n_post)}
     for j in range(n_units) for t in range(n_periods)]
)
res = SYNDES({"df": df, "outcome": "Y", "unitid": "unit", "time": "time",
              "K": 3, "mode": "two_way_global", "post_col": "post",
              "run_inference": True}).fit()

pf = res.post_fit                            # SyntheticControlPostFit
print(pf.ate, pf.ate_percent, pf.total_effect)   # treatment-effect scalars
print(pf.rmse_fit, pf.rmse_post)                 # pre / post fit quality
print(pf.p_value, pf.ci_lower, pf.ci_upper)      # permutation inference
print(pf.power)                                  # PowerAnalysis (see below)

The synthetic treated / control trajectories used to populate post_fit are the per-unit weighted aggregates Y[:, j] @ treated_weights and Y[:, j] @ control_weights over the full timeline. SYNDES has no pre-period blank window (its inference is a moving-block permutation on the post-period rather than a placebo test on a held-out pre-tail), so pf.n_blank = 0 and the power-analysis module falls back to the pre-period gap as its placebo proxy. Mathematically the MDE surface is the same Gaussian + AR(1) construction used across the family:

\[\mathrm{MDE}(T) = \bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr) \cdot \widehat{\sigma}_{\text{placebo}} \cdot \sqrt{\mathrm{VIF}(T, \widehat{\rho})},\]

with \(\widehat{\sigma}_{\text{placebo}}\) the per-period contrast SD on the pre-period (the SYNDES paper’s “pre-period imbalance”), \(\widehat{\rho}\) the lag-1 autocorrelation of that contrast clipped to \((-0.99, 0.99)\), and \(\mathrm{VIF}(T, \rho) = \tfrac{1}{T}\bigl(1 + 2\sum_{k=1}^{T-1} (1-k/T)\rho^k\bigr)\) the AR(1) variance-inflation factor (textbook \(1/T\) when \(\rho = 0\)). See Synthetic Controls for Experimental Design (MAREX) for the full derivation; the same module powers all three estimators.

import numpy as np
import pandas as pd
from mlsynth import SYNDES

rng = np.random.default_rng(0)
n_units, n_periods, n_post = 8, 20, 6
factors = rng.normal(size=(n_periods, 2))
loadings = rng.uniform(0.3, 1.0, size=(n_units, 2))
level = rng.uniform(8.0, 12.0, size=n_units)
Y = level + factors @ loadings.T + rng.normal(scale=0.3, size=(n_periods, n_units))
df = pd.DataFrame(
    [{"unit": j, "time": t, "Y": float(Y[t, j]),
      "post": int(t >= n_periods - n_post)}
     for j in range(n_units) for t in range(n_periods)]
)
res = SYNDES({"df": df, "outcome": "Y", "unitid": "unit", "time": "time",
              "K": 3, "mode": "two_way_global", "post_col": "post",
              "run_inference": True}).fit()

p = res.post_fit.power                        # PowerAnalysis
print(p.headline.mde_absolute)                # MDE at the realised T_post
print(p.headline.mde_pct)                     # ... as % of post-period baseline
print(p.headline.power_at_observed)           # power to detect res.post_fit.ate
print(p.curve)                                # tuple of MDEPoint per horizon

A second, per-horizon table comes back from every fit by default as res.power_curve – a SYNDESPower over post-period horizons 1..12, so you can read the minimum-detectable-effect curve without a separate power_analysis() call:

import pandas as pd
from mlsynth import SYNDES

rng = __import__("numpy").random.default_rng(0)
n_units, n_periods, n_post = 8, 20, 6
Y = (rng.uniform(8.0, 12.0, n_units)
     + rng.normal(size=(n_periods, 2)) @ rng.uniform(0.3, 1.0, (n_units, 2)).T
     + rng.normal(scale=0.3, size=(n_periods, n_units)))
df = pd.DataFrame(
    [{"unit": j, "time": t, "Y": float(Y[t, j]),
      "post": int(t >= n_periods - n_post)}
     for j in range(n_units) for t in range(n_periods)]
)
res = SYNDES({"df": df, "outcome": "Y", "unitid": "unit", "time": "time",
              "K": 3, "mode": "two_way_global", "post_col": "post"}).fit()

print(res.power_curve.to_dataframe())         # n_post 1..12, mde_absolute, mde_percent

Power-analysis failures (e.g. degenerate pre-period contrast) never break a fit; res.post_fit.power and res.power_curve are simply left as None in that case. For a custom horizon grid, significance level, or baseline, call power_analysis() (per-horizon table) or compute_power_analysis() (the headline AR(1) surface) directly.

post_col vs T0#

SYNDES accepts either a scalar T0 (count of pre-treatment periods) or post_col (a 0/1 column marking the post-treatment window). Both express the same pre/post split — passing post_col is just the more ergonomic form when the panel already carries an experiment-window flag. If both are supplied and disagree, post_col wins and a UserWarning is emitted so the override is visible.

Choosing among the modes#

Mode

Weighting

Use when

per_unit

one synthetic control per treated unit

effects are heterogeneous; you want unit-level estimates and the tightest per-unit fit

two_way_global

one weight vector, both sides free

effects are homogeneous; you want the lowest-MSE single contrast

one_way_global

treated fixed at 1/K, control free

heterogeneous effects but a simple, fixed treated average is the target estimand

Solver runtime and the 5%-gap default#

The SYNDES MIP is structurally hard. The two_way_global formulation contains a bilinear product \(q_i \coloneqq w_i D_i\) between the continuous weight \(w_i\) and the binary assignment \(D_i\), encoded via the standard McCormick linearisation (q_i \le D_i, q_i \le w_i, q_i \ge w_i - (1 - D_i)). McCormick is the tightest linear relaxation of a bilinear term, but it is still loose at the root LP — so the SCIP optimality gap closes slowly on long panels even when the primal incumbent is essentially optimal. For example on the Walmart weekly-sales panel (\(N = 45,\ T_0 = 128\), \(K = 3\)) SCIP finds the optimal treated set within a minute, then spends an additional 30+ minutes proving optimality by climbing the dual bound. The treated set itself does not change during this proof phase.

This matters because in practice our SCM bias bounds do not require optimality of the solver. Abadie and Zhao (2026) (2026, eq. 10 discussion, p. 10 and 13), writing about their formulation, state explicitly:

“we do not strictly require optimality of \(\{w^*, v^*\}\), provided \(\{w^*, v^*\}\) is feasible and \(\bar{X} - \sum_j w^*_j X_j \approx 0\) and \(X_j - \sum_i v^*_{ij} X_i \approx 0\) for all j such that \(w^*_j > 0\).”

Their Theorems 1 and 2 are written in terms of the residual fit, not the QP optimality gap, so a 5%-suboptimal solution that achieves approximate balance inherits the same econometric guarantees as a proven-optimal one. SYNDES is not the same problem as the ones AZ are concerned with, but the conclusion still holds.

mlsynth therefore exposes two SCIP-knob fields on SYNDESConfig and defaults them to the production-friendly setting:

  • gap_limit (default 0.05, i.e. 5%) – handed to SCIP as scip_params={"limits/gap": value}. The MIP terminates as soon as the primal-dual gap is within this fraction of the incumbent.

  • time_limit (default 60.0 seconds) – wall-clock cap on the solve, passed through as scip_params={"limits/time": value}.

With these defaults Walmart-scale designs return in under a minute with a known \(\le 5\%\) gap to the (provable) optimum. Tighten either knob – or set it to None – for research-grade optimality:

import numpy as np
import pandas as pd
from mlsynth import SYNDES

# A small balanced panel: 8 units, 20 periods (last 6 are post-treatment).
rng = np.random.default_rng(0)
n_units, n_periods, n_post = 8, 20, 6
factors = rng.normal(size=(n_periods, 2))
loadings = rng.uniform(0.3, 1.0, size=(n_units, 2))
level = rng.uniform(8.0, 12.0, size=n_units)
Y = level + factors @ loadings.T + rng.normal(scale=0.3, size=(n_periods, n_units))
df = pd.DataFrame(
    [{"unit": j, "time": t, "Y": float(Y[t, j]),
      "post": int(t >= n_periods - n_post)}
     for j in range(n_units) for t in range(n_periods)]
)

base = {"df": df, "outcome": "Y", "unitid": "unit", "time": "time",
        "K": 3, "mode": "two_way_global", "post_col": "post"}

# Default: 5% gap, 60s wall-clock — production-suitable.
SYNDES(base).fit()

# Loosen the gap to return in seconds when you just need a
# plausible design for prototyping.
SYNDES({**base, "gap_limit": 0.25, "time_limit": 5.0}).fit()

# Disable both limits for an asymptotic-optimality run. Be
# prepared for hours-long solves on long panels.
SYNDES({**base, "gap_limit": None, "time_limit": None}).fit()

The MIP status codes user_limit and user_limit_inaccurate (SCIP’s “stopped early with a valid incumbent”) are accepted as successful returns alongside the standard optimal / optimal_inaccurate codes — again, because the theory only needs the incumbent’s feasibility, not the proof of optimality.

Note

If you have a commercial solver (Gurobi, CPLEX, MOSEK) installed, pass solver="GUROBI" and the MIP closes the gap orders of magnitude faster than SCIP — these solvers handle MIQP / MIQCP relaxations natively. The default of SCIP is chosen because it ships with mlsynth (via pyscipopt) with no license required.

Multiple Treatment Arms#

When a single experiment runs several treatment arms (e.g. different creatives, offers, or price points, each rolled out to its own set of markets), pass an arm column. SYNDES then solves the design problem independently within each arm’s units and returns a SYNDESMultiArmResults — a dict of per-arm results keyed by arm label. Every option (mode, K, lam, inference) applies within each arm, and K is interpreted per arm (so it must be smaller than the smallest arm’s unit count).

import numpy as np
import pandas as pd
from mlsynth import SYNDES

# Two arms (A, B), each with 6 markets over 20 periods (last 6 post).
rng = np.random.default_rng(0)
n_per_arm, n_periods, n_post = 6, 20, 6
rows = []
for arm in ("A", "B"):
    factors = rng.normal(size=(n_periods, 2))
    for j in range(n_per_arm):
        loading = rng.uniform(0.3, 1.0, size=2)
        level = rng.uniform(8.0, 12.0)
        series = level + factors @ loading + rng.normal(scale=0.3, size=n_periods)
        for t in range(n_periods):
            rows.append({"DMA": f"{arm}{j}", "week": t, "sales": float(series[t]),
                         "treat": arm, "post": int(t >= n_periods - n_post)})
df = pd.DataFrame(rows)

res = SYNDES({
    "df": df, "outcome": "sales", "unitid": "DMA", "time": "week",
    "arm": "treat",                 # categorical arm label per unit
    "K": 3, "mode": "two_way_global", "post_col": "post",
    "run_inference": True,
}).fit()

print(res.arm_designs["A"])              # full SYNDESResults for arm A
print(res.atet_by_arm())                 # {arm: ATET}
print(res.selected_unit_labels_by_arm()) # {arm: treated units}

The arm column must be constant within each unit over time. arm is not compatible with the global costs/budget constraint (the cost vector is defined over all units, not per arm). When arm is None (default), a single SYNDESResults is returned, exactly as before.

Across groups: arm vs. quotas vs. restrictions#

Three mechanisms touch on grouping/geography; they are distinct, not interchangeable. Pick by what you actually want:

  • A separate experiment per group – its own estimand, its own donor pool: arm (above). This is the SYNDES analog of MAREX’s cluster (Synthetic Controls for Experimental Design (MAREX)), which bakes the same idea into a per-cluster objective. arm partitions the units and solves an independent design in each.

  • One design representative of every group (coverage): stratum_col with min_per_stratum / max_per_stratum – one shared donor pool, one estimand, with a quota of treated units per group.

  • One design with geographic / forcing limits: the restriction suite (force in/out, border conflict, size band, donor rules).

A constraint cannot split one design into K independent designs, so arm is not a special case of the quota – it is a different object (K estimands and K disjoint donor pools). Note the asymmetry with MAREX: there the restrictions compose with cluster (they apply within each cluster), because cluster lives in the objective; in SYNDES arm runs separate solves and does not combine with the restriction fields – to get both, loop a restricted SYNDES over each arm’s sub-panel.

Example#

SYNDES takes a long balanced panel and a pre/post split (post_col or T0). The example below is self-contained – it generates a small panel and runs end to end (pyscipopt ships with mlsynth, so the SCIP solver is available on install). The same call shape serves all three designs.

import numpy as np
import pandas as pd
from mlsynth import SYNDES, power_analysis

# A small balanced panel: 8 units, 20 periods (last 6 are post-treatment).
rng = np.random.default_rng(0)
n_units, n_periods, n_post = 8, 20, 6
factors = rng.normal(size=(n_periods, 2))
loadings = rng.uniform(0.3, 1.0, size=(n_units, 2))
level = rng.uniform(8.0, 12.0, size=n_units)          # positive unit baselines
Y = level + factors @ loadings.T + rng.normal(scale=0.3, size=(n_periods, n_units))
df = pd.DataFrame(
    [{"unit": j, "time": t, "Y": float(Y[t, j]),
      "post": int(t >= n_periods - n_post)}
     for j in range(n_units) for t in range(n_periods)]
)

res = SYNDES({
    "df": df, "outcome": "Y", "unitid": "unit", "time": "time",
    "K": 3, "mode": "two_way_global", "post_col": "post",
    "run_inference": True, "alpha": 0.05, "solver": "SCIP",
}).fit()

print(res.design.selected_unit_labels)   # which units to treat
print(res.design.control_weights)        # synthetic-control weights
print(res.design.pre_fit_rmse)           # pre-period balance of the design
print(res.inference.atet, res.inference.p_value)

mde = power_analysis(res, n_post_periods=[4, 8, 12], power=0.80)
print(mde.to_dataframe())                # minimum detectable effect by horizon

A budget constraint (costs + budget) adds \(\sum_i \mathrm{cost}_i D_i \le B\) to the MIP; mode="two_way_global" also accepts K=None to let the program choose the number of treated units.

Design restrictions (geography, clustering, size, forcing)#

Because SYNDES selects the treated set by a MIP over a binary assignment vector \(D\), the geographic and clustering restrictions GEOLIFT and LEXSCM expose become linear constraints on \(D\) – the same vocabulary, enforced exactly rather than by filtering enumerated candidates:

  • to_be_treated – units forced into the treated set (\(D_i = 1\)); not_to_be_treated – units forbidden from treatment (\(D_i = 0\)), which stay available as control donors.

  • cluster_col and/or adjacency with spillover_threshold – a spillover/interference conflict graph (built by the same helper LEXSCM uses). Two units that share a cluster, or whose adjacency entry exceeds the threshold, may not both be treated (\(D_i + D_j \le 1\)), so the design never places treated units where they would interfere.

  • stratum_col with min_per_stratum / max_per_stratum – coverage quotas: at least min_per_stratum treated units in every stratum that contains a treatable unit, and at most max_per_stratum in any stratum (\(\min \le \sum_{i \in s} D_i \le \max\)).

  • size_col with min_size / max_size – a treated-unit size band; units outside it lose treatment eligibility (\(D_i = 0\)) but remain donors.

The cluster_col / stratum_col / size_col columns must be constant within each unit. Restrictions compose with each other and with costs / budget, and they flow through every selection rule (in-sample, holdout, ic). They are not available with mode="two_way_global_annealed" (no MIP) or an arm column (restrictions are global, not per-arm). Infeasible combinations – forcing more units than K, or forbidding so many that fewer than K remain treatable – raise a translated MlsynthConfigError rather than leaking a solver INFEASIBLE.

import pandas as pd
from mlsynth import SYNDES

res = SYNDES({
    "df": df, "outcome": "Y", "unitid": "location", "time": "date",
    "K": 3, "mode": "two_way_global", "post_col": "post",
    "to_be_treated": ["chicago"],          # force this market in
    "cluster_col": "dma",                  # no two treated in one DMA
    "stratum_col": "region",               # ...
    "min_per_stratum": 1,                  # at least one treated per region
    "size_col": "population",              # only mid-sized markets treatable
    "min_size": 5e5, "max_size": 5e6,
}).fit()

print(sorted(res.design.selected_unit_labels.tolist()))

Restricting the donor pool (region-matched, non-bordering)#

The restrictions above constrain who is treated. A separate family constrains who may be a treated unit’s donor – the control side. The primitive is a donor-exclusion relation \(B_{ij}\) (“if \(i\) is treated, \(j\) may not be its donor”), enforced by coupling the assignment \(D\) to the mode’s control weights, so it works in every mode:

\[\begin{split}\text{one-way global:}\quad & c_j \le 1 - D_i, \\ \text{two-way global:}\quad & w_j - q_j \le 1 - D_i, \\ \text{per-unit:}\quad & w_{ij} = 0.\end{split}\]

Two convenience knobs (and one escape hatch) fill \(B\):

  • donor_region_col – a donor must share the treated unit’s value in this column (e.g. a Census region). So a Midwest market can borrow from another Midwest market but never from a Northeast one.

  • exclude_bordering_donors – a treated unit’s spillover neighbours (from cluster_col / adjacency + spillover_threshold) are dropped from its donor pool, so a donor cannot sit right across the treated unit’s border (the Vives-i-Bastida exclusion restriction). Requires a conflict source.

  • donor_exclusion – an explicit per-pair matrix (DataFrame keyed by unit label, entry \([i, j] > 0\) forbids \(j\) as a donor for \(i\)), combined with the above by union. The fully general form.

A subtlety worth knowing: the global modes share one donor vector across all treated units, so donor_region_col there forces the treated set into a single region (one donor vector cannot be same-region as treated units from two regions). The per_unit mode gives each treated unit its own synthetic control, so it supports a genuinely multi-region design – Detroit drawing from the Midwest and a Pennsylvania market drawing from the Northeast, in the same fit. (Region exclusion in a global mode emits \(O(N^2)\) constraints; on a very large panel prefer per_unit or scoping the panel to the region.)

res = SYNDES({
    "df": df, "outcome": "Y", "unitid": "dma", "time": "date",
    "K": 3, "mode": "per_unit", "post_col": "post",
    "donor_region_col": "census_region",   # donors share the treated region
    "adjacency": dma_borders,              # contiguity matrix
    "exclude_bordering_donors": True,      # ...and never a bordering donor
}).fit()

Solution pool (top_K): a menu, not one answer#

The MIP returns the single MSE-optimal design, but that is optimal for fit alone – it discards every other feasible design, some of which may be cheaper, more detectable, or operationally preferable at a negligible fit cost. Setting top_K > 1 returns a ranked pool of the best top_K distinct designs, obtained by no-good cuts: after each solve the chosen treated set \(S\) is forbidden (\(\sum_{i \in S} D_i \le |S|-1\)) and the MIP is re-solved for the next-best design. The pool is attached as results.pool – a list of dicts ranked by MSE. Each entry is actionable, not merely rankable: it carries everything needed to deploy that design, not just the rank-1 winner kept on results.design. Its keys are:

  • markets – labels of the treated units (the design’s arms).

  • control_group – labels of the donor units carrying nonzero control weight (the synthetic-control pool backing the treated arms).

  • objective – the MIP objective (fit) the design was ranked by.

  • pre_fit_rmse – root-mean-square pre-period contrast.

  • mde_pct – minimum detectable effect at the realised post horizon, as a percent of the treated baseline. This is the entry’s power_curve value at that horizon (the Newey-West HAC MDE), so the headline number and the curve always agree.

  • cost – summed cost of the treated units (None when no costs given).

  • design – the full SYNDESDesign for the entry, with its treated, control, and contrast weights.

  • power_curve – the entry’s own SYNDESPower over horizons 1..12 (None if the computation is degenerate), so every candidate is comparable on power, not just the rank-1 winner on results.power_curve.

Because the objective only ranks fit, the value is precisely the re-scoring on the dimensions it ignored: a manager can trade a small fit increase for lower cost or higher power, then read the chosen entry’s control_group and design weights to deploy it. top_K=1 (default) is unchanged – only the optimum is returned and results.pool is None.

The example below imports a subset of the GeoLift pre-test panel (the same 40-market data the GeoLift Market Selection (GEOLIFT) page uses) and returns a five-design menu:

import pandas as pd
from mlsynth import SYNDES

df = pd.read_csv(                                      # GeoLift_PreTest, 40 mkts x 90d
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
    "refs/heads/main/basedata/geolift_market_data.csv"
)
markets = sorted(df["location"].unique())[:8]         # 8-market subset keeps the MIP small
df = df[df["location"].isin(markets)].copy()
cut = sorted(df["date"].unique())[-14]                # last 14 days = post window
df["post"] = (df["date"] >= cut).astype(int)

# top_K=5 runs five MIPs (one per no-good cut), so expect ~a minute on SCIP.
res = SYNDES({
    "df": df, "outcome": "Y", "unitid": "location", "time": "date",
    "K": 3, "mode": "two_way_global", "post_col": "post", "top_K": 5,
    "gap_limit": 0.2, "time_limit": 10.0,
}).fit()

print(sorted(res.design.selected_unit_labels.tolist()))   # the MSE-optimal design
for d in res.pool:                                         # ranked menu, best fit first
    print("treated:", sorted(d["markets"]),
          "| control:", sorted(d["control_group"]),        # donors backing the SC
          "| obj:", round(d["objective"], 1),
          "| mde%:", round(d["mde_pct"], 3))
# any entry is deployable: read its control group and weights, not just rank-1
chosen = res.pool[1]
print(chosen["control_group"], chosen["design"].control_weights)

The lesson is the whole point of the menu: the rank-1 design minimises MSE, but fit is not power. On this subset the best-fitting design is not the most detectable — a design ranked further down the pool, with an essentially identical objective, can carry a materially smaller minimum detectable effect (or a lower cost once costs is supplied). The menu lets you choose the design that agrees with what you will actually deploy, instead of the one number the optimiser happens to minimise.

Out-of-sample selection (holdout_frac)#

The MIP ranks designs by the in-sample pre-period contrast, so the winning treated set is the one that balances best over the periods it was fit on. When the treated unit moves with the donor pool only transiently, that tight balance need not persist, and the in-sample optimum can overfit. Setting holdout_frac turns selection into a train/validate procedure, in the spirit of the holdout split LEXSCM and MAREX use: the top_K candidate pool is learned on the leading \(1 - \texttt{holdout\_frac}\) of the pre-period, and the winner is the candidate whose held-out contrast error on the trailing holdout_frac is smallest. For example, holdout_frac=0.3 is a 70/30 split; the out-of-sample error of a design with contrast weights \(\mathbf{c}\) is \(\sqrt{\operatorname{mean}((\mathbf{Y}^{\text{val}} \mathbf{c})^2)}\), the validation-tail analogue of pre_fit_rmse.

The returned results.pool is then ranked by this out-of-sample error (rank-1 is the holdout winner kept on results.design), and each entry carries an oos_rmse key alongside the in-sample pre_fit_rmse. Holdout selection needs a candidate pool to choose among, so top_K >= 2 is required, and it applies to the MIP modes only (not the annealed relaxation). Power and inference are computed exactly as in the in-sample path. holdout_frac=None (the default) leaves selection on the in-sample optimum, unchanged.

res = SYNDES({
    "df": df, "outcome": "Y", "unitid": "location", "time": "date",
    "K": 3, "mode": "two_way_global", "post_col": "post",
    "top_K": 5, "holdout_frac": 0.3,            # 70/30 train/validate
    "gap_limit": 0.2, "time_limit": 10.0,
}).fit()

for d in res.pool:                              # ranked by out-of-sample error
    print("treated:", sorted(d["markets"]),
          "| oos:", round(d["oos_rmse"], 3),
          "| in-sample:", round(d["pre_fit_rmse"], 3))

In the cross-method comparison, compare_methods() defaults to this holdout selection for SYNDES (syndes_holdout_frac=0.3), adds an oos_rmse column to the comparison table, and orders the SYNDES rows by it; pass syndes_holdout_frac=None to compare in-sample designs instead.

Information-criterion selection (selection="ic")#

The holdout split spends part of the pre-period on validation, which is noisy exactly when the pre-period is short – the regime SYNDES is built for. An information criterion avoids the split: it scores every candidate on the whole pre-window and penalises the model-selection flexibility directly. Pouliot, Xie & Liu [SCInfoCrit] show that in short-\(T_0\) synthetic-control settings such a criterion outperforms cross-validation / holdout. Setting selection="ic" ranks the top_K pool (solved on the full pre-period) by

\[\mathrm{IC}(d) = \mathrm{SSR}^{\text{pre}}(d) + 2\,\hat{\sigma}^2\,\mathrm{df}(d),\]

mirroring Pouliot et al.’s \(\mathbb{E}\lVert \mathbf{Y} - \hat{\mathbf{Y}} \rVert^2 + 2\sigma^2\,\mathrm{df}\). Here \(\mathrm{SSR}^{\text{pre}}\) is the in-sample contrast sum of squares; \(\mathrm{df} = \lvert A\rvert - 1\) with \(A\) the active control donors (their closed form for the unpenalised SCM, in which searching over which donors to use is free); and \(\hat{\sigma}^2\) is a Mallows-\(C_p\)-style noise estimate – the best-fitting candidate’s per-period contrast variance. The candidate with the smallest IC wins, so a design that buys a tighter fit by activating more donors is penalised for it.

The returned results.pool is ranked by IC (rank-1 is the winner on results.design), and each entry carries an ic value and its df. Like holdout, IC selection needs top_K >= 2 and a MIP mode; it does not use holdout_frac. The selection field unifies the three rules – None (default) infers "holdout" when holdout_frac is set and "in_sample" otherwise, so existing configs are unchanged.

res = SYNDES({
    "df": df, "outcome": "Y", "unitid": "location", "time": "date",
    "K": 3, "mode": "two_way_global", "post_col": "post",
    "top_K": 5, "selection": "ic",              # IC over the whole pre-window
    "gap_limit": 0.2, "time_limit": 10.0,
}).fit()

for d in res.pool:                              # ranked by information criterion
    print("treated:", sorted(d["markets"]),
          "| ic:", round(d["ic"], 1), "| df:", d["df"])

Pass syndes_options={"selection": "ic"} to compare_methods() to use IC selection there (it overrides the default holdout).

Ranking the menu by power#

Each menu entry carries its own per-horizon power_curve (a SYNDESPower over horizons 1..12), computed with the same Newey-West machinery as the previous section, so the whole pool is comparable on power – not just the rank-1 winner on res.power_curve. Reading an entry’s MDE at the horizon you plan to run is a single array lookup. The example below takes a 20-market GeoLift subset and plots every menu design’s MDE at horizon \(h = 3\), sorted most-detectable first – turning the menu into a power ranking the MSE objective never sees:

import matplotlib.pyplot as plt
import pandas as pd
from mlsynth import SYNDES

df = pd.read_csv(                                      # GeoLift_PreTest panel
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
    "refs/heads/main/basedata/geolift_market_data.csv"
)
markets = sorted(df["location"].unique())[:20]        # 20-market subset
df = df[df["location"].isin(markets)].copy()
cut = sorted(df["date"].unique())[-14]                # last 14 days = post window
df["post"] = (df["date"] >= cut).astype(int)

# top_K=6 runs six MIPs (one per no-good cut) -- expect ~a minute on SCIP.
res = SYNDES({
    "df": df, "outcome": "Y", "unitid": "location", "time": "date",
    "K": 3, "mode": "two_way_global", "post_col": "post", "top_K": 6,
    "gap_limit": 0.2, "time_limit": 10.0,
}).fit()

H = 3                                                  # horizon to plan for
scored = [("+".join(map(str, sorted(d["markets"]))),
           float(d["power_curve"].mde_percent[H - 1]))   # 1-indexed horizon
          for d in res.pool]
scored.sort(key=lambda r: r[1])                        # smallest MDE -> left

labels = [s[0] for s in scored]
mde = [s[1] for s in scored]
fig, ax = plt.subplots(figsize=(10, 4.5))
ax.bar(range(len(mde)), mde, color="#4C72B0")
ax.set_xticks(range(len(mde)))
ax.set_xticklabels(labels, rotation=40, ha="right", fontsize=8)
ax.set_ylabel(f"MDE at horizon {H} (% of treated baseline)")
ax.set_xlabel("SYNDES menu design (treated set) -- most detectable first")
ax.set_title(f"SYNDES top_K menu: minimum detectable effect at horizon {H}")
fig.tight_layout()
fig.savefig("syndes_menu_mde_h3.png", dpi=120)

Reading the bars left to right gives the designs in order of detectability at the horizon you will actually run: the leftmost is the most powerful experiment in the pool, which – per the lesson above – need not be the rank-1 (best-fitting) design.

Pareto recommendation#

Ranking by power alone has the opposite blind spot to ranking by fit: the most detectable design may balance the pre-period poorly. The honest object is the trade-off between the two, so whenever a pool is produced SYNDES attaches a recommendation to res.recommendation (a SYNDESRecommendation), mirroring the LEXSCM recommender. It has two parts.

First, a Pareto frontier on fit versus power, where fit is the pre-period RMSE between the treated group and the weighted average of the controls (downwards) and power is mde_pct at the realised horizon (downwards): the designs for which neither can be improved without worsening the other. Dominated designs – including, very often, the rank-1 best-fitting design, which buys its fit at the cost of power – are set aside. The frontier is always exposed in res.recommendation.pareto_ids for transparency, with cost as a tie-break rather than a third axis.

Second, a single recommended design picked by a GeoLift-style composite score: each design is dense-ranked on fit and on power (best metric ranks first, exactly as GeoLift’s market-selection score aggregates its component ranks), and the two ranks are combined with the configurable power_weight / fit_weight (normalised to sum to one; default 0.51 / 0.49, a slight preference for power). The smallest combined score wins, with cost then pre-period RMSE breaking ties. Selection never raises: with no design whose MDE is finite it falls back to the best-fitting design and reports status="POWER_NOT_ESTABLISHED"; with no pool, status="EMPTY".

import pandas as pd
from mlsynth import SYNDES

df = pd.read_csv(                                      # GeoLift_PreTest panel
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
    "refs/heads/main/basedata/geolift_market_data.csv"
)
markets = sorted(df["location"].unique())[:20]        # 20-market subset
df = df[df["location"].isin(markets)].copy()
cut = sorted(df["date"].unique())[-14]
df["post"] = (df["date"] >= cut).astype(int)

# top_K=6 runs six MIPs -- expect ~a minute on SCIP. power_weight/fit_weight
# default to 0.51/0.49; raise power_weight to prefer detectability harder.
res = SYNDES({
    "df": df, "outcome": "Y", "unitid": "location", "time": "date",
    "K": 3, "mode": "two_way_global", "post_col": "post", "top_K": 6,
    "gap_limit": 0.2, "time_limit": 10.0,
}).fit()

rec = res.recommendation
print(rec.status, rec.weights)                        # 'OK' {'power':0.51,'fit':0.49}
print(rec.winner.design_id, rec.winner.markets)       # the recommended design
print(rec.winner.control_group)                       # its synthetic-control donors
print(rec.pareto_ids)                                 # fit-power Pareto frontier
print(pd.DataFrame(rec.table))                        # every design, scored + flagged

Designs are numbered by fit: D1 is always the best-fitting design (smallest pre-period RMSE), D2 the next, and so on. The recommended design need not be D1 – when power outweighs fit it is a higher-numbered design with a smaller MDE, which is exactly the trade-off the recommendation surfaces.

When display_graph=True and a pool exists, two figures are drawn: Panel A, the normal SYNDES design plot (synthetic treated vs synthetic control), and Panel B, this fit-versus-power Pareto frontier with the recommended design starred (x-axis: the pre-period RMSE). Panel B can also be drawn on its own with plot_syndes_pareto(). Both render in the in-house mlsynth plot style.

Comparing designs across methods (GEOLIFT vs SYNDES)#

SYNDES and GEOLIFT use different optimisers but share a grammar: each emits a design that reduces to a unit-level contrast \(\mathbf{c} = \mathbf{w}_{\text{treated}} - \mathbf{w}_{\text{control}}\) (both sides summing to one) over the same panel. From \(\mathbf{c}\) two comparable numbers follow – the pre-period fit RMSE \(\sqrt{\operatorname{mean}((\mathbf{Y}_{\text{pre}}\mathbf{c})^2)}\) and, by injecting a known effect at a fixed horizon, a minimum detectable effect. So the two methods’ designs can be placed on one fit-versus-power plane and their Pareto frontiers overlaid.

The one rule that keeps the comparison honest is that the MDE must be computed by a single shared harness for both methods – same horizon, same effect grid, same moving-block permutation null, same baseline – so the frontier gap reflects the designs, not two different power methodologies. mlsynth.utils.design_compare does exactly this. It rests on the fact that adding an effect \(\tau\) to the treated units shifts the contrast mean \(\mathbf{y}_t^\top\mathbf{c}\) by exactly \(\tau\) (the treated weights sum to one), so the length-\(h\) block-mean null shifted by \(\tau\) is the alternative, and the power at \(\tau\) is the share of that shifted null clearing the two-sided critical value.

The example fixes the horizon at five post-periods and overlays both frontiers on the full native GeoLift GeoLift_Test panel (all 40 markets):

compare_methods() is the one-call wrapper: feed it the common options (panel, treated-set size, horizon, post window) plus any per-method overrides, and it fits both estimators and scores them on the shared plane, returning the comparison in dataframe (.table) and plot (.plot()) form.

import pandas as pd
from mlsynth import compare_methods

df = pd.read_csv(                                       # full native GeoLift_Test panel
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
    "refs/heads/main/basedata/geolift_test_data.csv"
)                                                       # all 40 markets

cmp = compare_methods(
    df, outcome="Y", unitid="location", time="date",
    treated_size=3, horizon=5, n_post=5, top_K=6,       # only five post-periods
    syndes_options={"gap_limit": 0.05, "time_limit": 20.0},  # ~2 min for SYNDES
)
print(cmp.table[["method", "label", "fit_rmse", "mde_pct", "pareto"]])
cmp.plot()                                              # overlaid frontiers

cmp.table has one row per design scored on the shared plane (fit_rmse, mde_pct at horizon five) with a per-method pareto flag, and cmp.plot() overlays the two frontiers so you can read directly which method’s designs dominate, and where. cmp.syndes and cmp.geolift keep the underlying fits for further inspection. To do it by hand instead – fit each method yourself and pass the designs through from_syndes() / from_geolift() into compare_pareto() – see those functions.

Verification#

Note

Simulation (all three designs). Following the paper’s Section 5, each replication draws a fresh noisy panel (stationary AR(1) factors + unit levels), re-solves the design MIP on the pre-period, estimates the ATT on the post-period and runs the moving-block permutation test. Setup: \(N=10\) units, \(T_{\text{pre}}=18\), \(T_{\text{post}}=6\), \(K=3\), \(\sigma=0.25\), 40 replications; the effect is injected at \(\tau\) equal to the mean analytic MDE (0.165). Rejection at the 5% level:

design

MDE

bias

RMSE

size

power

per_unit

0.157

0.020

0.098

0.12

0.50

two_way_global

0.166

0.013

0.095

0.12

0.50

one_way_global

0.171

-0.004

0.115

0.23

0.45

random DiM (baseline)

0.096

0.982

0.15

0.25

The paper’s headline result reproduces: all three SYNDES designs are approximately unbiased and cut estimator RMSE roughly ten-fold versus a randomized difference-in-means design (~0.10 vs. 0.98). The moving-block permutation test is mildly over-sized / under-powered at this short pre-period – the design-optimized contrast tightens the pre-period permutation null, and the analytic MDE is a normal-theory benchmark – a finite-sample inference caveat (the authors note correct sizes hold “under rather strong assumptions”) that shrinks as the pre-period grows. The simulation script ships alongside the estimator’s tests.

A second, data-faithful replication reproduces the paper’s own Monte Carlo (Section 5, Table 1) on the exact BLS unemployment panel the authors use (basedata/urate_cps.csv, footnote 4): each simulation samples a 10×10 panel, the design selects \(K \in \{3, 7\}\) treated units on the pre-period, a homogeneous 0.05 effect is added to the treated post-periods, and the ATET RMSE (×1000) is compared to Table 1. mlsynth’s three design modes land at 8.7 / 9.2 / 9.2 (K=3) and 7.7 / 9.0 / 9.0 (K=7) against the paper’s 8.5 / 8.4 / 8.5 and 8.3 / 8.4 / 8.5, and all beat the randomized difference-in-means baseline — the paper’s headline. See SYNDES — Doudchenko et al. (2021) BLS Monte Carlo; run it with python benchmarks/run_benchmarks.py syndes_bls.

Core API#

Synthetic Design (SYNDES) estimator.

Implements the three mixed-integer programming formulations of Doudchenko, Khosravi, Pouget-Abadie, Lahaie, Lubin, Mirrokni, Spiess, and Imbens (2021), “Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls” (arXiv:2112.00278). SYNDES jointly selects treated units and synthetic-control weights by solving a single MIP that minimises the post-period mean squared error of the resulting ATT estimator.

The three formulations exposed via the mode field correspond directly to Section 3 of the paper:

  • "per_unit" — separate SC weights w_{ji} per treated

    unit i. Trades a tighter per-unit fit against a richer parameter space.

  • "two_way_global" — single weight vector w_i applied

    symmetrically to the treated and control contrasts. Recommended when treatment effects are homogeneous.

  • "one_way_global"two_way_global with the treated weights

    pinned to 1/K; the SC step only adjusts the control combination. Easiest to interpret as a “weighted difference-in-means”.

Inference defaults to the moving-block permutation test of Chernozhukov, Wuethrich, and Zhu (2021), applied uniformly to all three modes via the shared contrast-vector dispatch in mlsynth.utils.syndes_helpers.inference.

Two additional MIP features round out the estimator:

  • Budget constraint (paper section 1): supply costs (length N) and budget to add sum_i c_i D_i <= B to the MIP.

  • Annealed relaxation (mode="two_way_global_annealed"): simulated-annealing alternative to the MIP for the symmetric two-way formulation. Useful when a commercial MIP solver is unavailable or the problem size makes the MIP impractical.

Post-fit, see mlsynth.power_analysis() for per-horizon minimum-detectable-effect tables.

class mlsynth.estimators.syndes.SYNDES(config: SYNDESConfig | dict)#

Bases: object

Synthetic Design (Doudchenko et al. 2021) estimator.

Parameters:

config (SYNDESConfig or dict) – Configuration object. See mlsynth.config_models.SYNDESConfig.

Returns:

SYNDESResults or RelaxedSolverResults – For the three MIP modes, a SYNDESResults container with the optimised design and optional permutation inference. For mode="two_way_global_annealed" the relaxed solver returns a RelaxedSolverResults container with design, trace, inputs, and optional inference.

fit() SYNDESResults | RelaxedSolverResults | SYNDESMultiArmResults#

Solve the MIP (or relaxation), run optional inference, return results.

Returns a single result when no arm column is configured; otherwise solves the SYNDES design independently within each arm’s units and returns a SYNDESMultiArmResults keyed by arm label.

Configuration#

class mlsynth.config_models.SYNDESConfig(*, df: DataFrame, outcome: str, unitid: str, time: str, K: Annotated[int | None, Gt(gt=0)] = None, mode: Literal['per_unit', 'two_way_global', 'one_way_global', 'two_way_global_annealed'] = 'two_way_global', lam: Annotated[float | None, Ge(ge=0.0)] = None, T0: Annotated[int | None, Gt(gt=0)] = None, post_col: str | None = None, alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.1, run_inference: bool = True, solver: Any = 'SCIP', relaxed_max_iter: Annotated[int, Gt(gt=0)] = 40, relaxed_decay: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.97, gap_limit: Annotated[float | None, Ge(ge=0.0), Lt(lt=1.0)] = 0.05, time_limit: Annotated[float | None, Gt(gt=0.0)] = 60.0, display_graph: bool = False, verbose: bool = False, costs: List[float] | None = None, budget: Annotated[float | None, Gt(gt=0.0)] = None, top_K: Annotated[int, Ge(ge=1)] = 1, arm: str | None = None, power_weight: Annotated[float, Gt(gt=0)] = 0.51, fit_weight: Annotated[float, Gt(gt=0)] = 0.49, max_shortlist: Annotated[int, Gt(gt=0)] = 5, to_be_treated: List | None = None, not_to_be_treated: List | None = None, cluster_col: str | None = None, adjacency: DataFrame | None = None, spillover_threshold: float = 0.0, stratum_col: str | None = None, min_per_stratum: int | None = None, max_per_stratum: int | None = None, size_col: str | None = None, min_size: float | None = None, max_size: float | None = None, donor_region_col: str | None = None, exclude_bordering_donors: bool = False, donor_exclusion: DataFrame | None = None, selection: Literal['in_sample', 'holdout', 'ic'] | None = None, holdout_frac: Annotated[float | None, Gt(gt=0.0), Lt(lt=1.0)] = None)#

Configuration for the Synthetic Design (SYNDES) estimator.

Implements the three MIP formulations of Doudchenko, Khosravi, Pouget-Abadie, Lahaie, Lubin, Mirrokni, Spiess, and Imbens (2021), “Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls” (arXiv:2112.00278). The estimator jointly chooses

  • which units to treat (binary assignment D), and

  • the synthetic-control weights w used to build the counterfactual,

by minimising a single mean-squared-error objective. Three formulations are exposed, each with a different geometry over the treated/control sample-variance terms (Theorem 1 of the paper):

  • "per_unit" – separate SC weights for each treated

    unit (paper’s “per-unit” problem).

  • "two_way_global" – single weight vector applied

    symmetrically to treated and control (paper’s “two-way global” problem).

  • "one_way_global""two_way_global" with equal

    weights pinned on the treated set (paper’s “one-way global” problem).

  • "two_way_global_annealed" – simulated-annealing relaxation

    of two_way_global (mlsynth-specific extension; not in the paper).

Parameters:
  • K (int or None) – Number of treated units. Required for per_unit and one_way_global. May be None for two_way_global (Doudchenko et al. 2021, paragraph after eq. 9, note that the K-constraint is mathematically optional in the symmetric formulation); when None the MIP picks the cardinality of the treated set endogenously, with at least one treated and one control unit.

  • mode (str) – Paper-aligned mode name (see above).

  • lam (float or None) – Penalty on the squared weights. None defaults to the sample variance of the pre-treatment outcomes (Section 6 of the paper).

  • T0 (int or None) – Number of pre-treatment periods. If neither T0 nor post_col is supplied, the entire panel is treated as pre-treatment (design-only / planning mode – no post period, so no ATT/inference is produced).

  • post_col (str or None) – Optional 0/1 column identifying post-treatment periods.

  • alpha (float) – Two-sided significance level for the permutation test.

  • run_inference (bool) – Whether to run the moving-block permutation test (Chernozhukov-Wuethrich-Zhu (2021) style; see Appendix A.4 of the paper).

  • solver (Any) – CVXPY-compatible MIP solver. Defaults to SCIP.

  • display_graph (bool) – Whether to plot the design.

  • verbose (bool) – Solver verbosity.

K: int | None#
T0: int | None#
adjacency: pd.DataFrame | None#
alpha: float#
arm: str | None#
budget: float | None#
cluster_col: str | None#
costs: List[float] | None#
display_graph: bool#
donor_exclusion: pd.DataFrame | None#
donor_region_col: str | None#
exclude_bordering_donors: bool#
fit_weight: float#
gap_limit: float | None#
holdout_frac: float | None#
lam: float | None#
max_per_stratum: int | None#
max_shortlist: int#
max_size: float | None#
min_per_stratum: int | None#
min_size: float | None#
mode: Literal['per_unit', 'two_way_global', 'one_way_global', 'two_way_global_annealed']#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

not_to_be_treated: List | None#
post_col: str | None#
power_weight: float#
relaxed_decay: float#
relaxed_max_iter: int#
run_inference: bool#
selection: Literal['in_sample', 'holdout', 'ic'] | None#
size_col: str | None#
solver: Any#
spillover_threshold: float#
stratum_col: str | None#
time_limit: float | None#
to_be_treated: List | None#
top_K: int#
verbose: bool#

Result Containers#

SYNDES.fit() returns a SYNDESResults, bundling the optimized SYNDESDesign (assignment, treated/control/contrast weights, the pre-period contrast_series and pre_fit_rmse, objective value), the prepared SYNDESInputs, and optional SYNDESInference. The mode="two_way_global_annealed" path instead returns a RelaxedSolverResults.

Structured containers for the SYNDES synthetic design pipeline.

class mlsynth.utils.syndes_helpers.structures.SYNDESDesign(mode: str, objective_value: float, lambda_value: float, assignment: ~numpy.ndarray, selected_unit_indices: ~numpy.ndarray, selected_unit_labels: ~numpy.ndarray, assignment_by_unit: ~typing.Dict[~typing.Any, int], w: ~numpy.ndarray | None = None, q: ~numpy.ndarray | None = None, z: ~numpy.ndarray | None = None, raw_results: ~typing.Dict[str, ~typing.Any] = <factory>, treated_weights: ~numpy.ndarray | None = None, control_weights: ~numpy.ndarray | None = None, contrast_weights: ~numpy.ndarray | None = None, contrast_series: ~numpy.ndarray | None = None, pre_fit_rmse: float | None = None)#

Bases: object

Optimized SYNDES design solution.

Contains the outcome of solving the mixed-integer program, including treatment assignment and synthetic control weights.

Parameters:
  • mode (str) – Optimization mode used.

  • objective_value (float) – Optimal value of the objective function.

  • lambda_value (float) – Regularization parameter used in optimization.

  • assignment (np.ndarray) – Binary treatment assignment vector D.

  • selected_unit_indices (np.ndarray) – Integer indices of treated units.

  • selected_unit_labels (np.ndarray) – Original labels of treated units.

  • assignment_by_unit (dict) – Mapping from unit label to treatment indicator.

  • w (np.ndarray or None, optional) – Synthetic control weights (global or per-unit depending on mode).

  • q (np.ndarray or None, optional) – Auxiliary optimization variables (mode-dependent).

  • z (np.ndarray or None, optional) – Additional binary or continuous decision variables.

  • raw_results (dict, optional) – Raw solver output.

  • treated_weights (np.ndarray or None, optional) – Normalized weights over treated units.

  • control_weights (np.ndarray or None, optional) – Normalized weights over control units.

  • contrast_weights (np.ndarray or None, optional) – Difference weights used in global estimators.

  • contrast_series (np.ndarray or None, optional) – Pre-period fitted contrast Y_pre @ contrast_weights, shape (T_pre,) – the per-period treated-minus-synthetic gap the design balances. The design’s “prediction” of the (zero) pre-period effect.

  • pre_fit_rmse (float or None, optional) – Root mean squared pre-period contrast, sqrt(mean(contrast_series^2)) – how tightly the design balances treated and control before launch.

Notes

This object represents the solution to the design stage only. Post-treatment estimation and inference are handled separately.

assignment: ndarray#
assignment_by_unit: Dict[Any, int]#
contrast_series: ndarray | None = None#
contrast_weights: ndarray | None = None#
control_weights: ndarray | None = None#
lambda_value: float#
mode: str#
objective_value: float#
pre_fit_rmse: float | None = None#
q: ndarray | None = None#
raw_results: Dict[str, Any]#
selected_unit_indices: ndarray#
selected_unit_labels: ndarray#
treated_weights: ndarray | None = None#
w: ndarray | None = None#
z: ndarray | None = None#
class mlsynth.utils.syndes_helpers.structures.SYNDESInference(atet: float, p_value: float, reject: bool, alpha: float, method: str, null_stats: ndarray | None = None)#

Bases: object

Permutation-based inference results for SYNDES.

Parameters:
  • atet (float) – Estimated average treatment effect on treated units.

  • p_value (float) – Permutation-based p-value.

  • reject (bool) – Whether the null hypothesis is rejected at level alpha.

  • alpha (float) – Significance level used for the test.

  • method (str) – Name of inference procedure.

  • null_stats (np.ndarray or None) – Empirical null distribution of test statistics.

Notes

Inference is currently implemented for the global_2way mode only.

alpha: float#
atet: float#
method: str#
null_stats: ndarray | None = None#
p_value: float#
reject: bool#
class mlsynth.utils.syndes_helpers.structures.SYNDESInputs(Y_pre: ndarray, Y_post: ndarray | None, unit_index: IndexSet, time_index: IndexSet, pre_time_index: IndexSet, post_time_index: IndexSet | None, outcome: str)#

Bases: object

Preprocessed panel data for SYNDES estimation.

Contains matrix representations of outcomes and index mappings used in optimization.

Parameters:
  • Y_pre (np.ndarray) – Pre-treatment outcome matrix of shape (T_pre, N) – rows are time periods, columns are units.

  • Y_post (np.ndarray or None) – Post-treatment outcome matrix of shape (T_post, N), if available.

  • unit_index (IndexSet) – Mapping from unit labels to integer indices.

  • time_index (IndexSet) – Mapping from time labels to integer indices.

  • pre_time_index (IndexSet) – Index set for pre-treatment periods.

  • post_time_index (IndexSet or None) – Index set for post-treatment periods.

  • outcome (str) – Name of outcome variable.

Notes

All matrices are aligned such that rows correspond to units and columns correspond to time periods.

Y_post: ndarray | None#
Y_pre: ndarray#
outcome: str#
post_time_index: IndexSet | None#
pre_time_index: IndexSet#
time_index: IndexSet#
unit_index: IndexSet#
class mlsynth.utils.syndes_helpers.structures.SYNDESMultiArmResults(arm_designs: Dict[Any, Any], arm: str)#

Bases: object

Per-arm SYNDES designs.

Returned by mlsynth.estimators.SYNDES when an arm column is configured: the SYNDES design problem is solved independently within each arm’s units, and each arm’s full result (a SYNDESResults for the MIP modes, or a RelaxedSolverResults for the annealed mode) is collected here.

Parameters:
  • arm_designs (dict) – {arm_label: SYNDESResults} – one independent SYNDES solution per arm (or RelaxedSolverResults under mode="two_way_global_annealed").

  • arm (str) – Name of the arm column the units were partitioned on.

arm: str#
arm_designs: Dict[Any, Any]#
atet_by_arm() Dict[Any, float | None]#

{arm_label: ATET} across arms (None where no inference).

property mode: str#
selected_unit_labels_by_arm() Dict[Any, Any]#

{arm_label: selected_unit_labels} across arms.

class mlsynth.utils.syndes_helpers.structures.SYNDESProblemComponents(mode: str, objective: Expression, constraints: List[Constraint], variables: Dict[str, Variable], assignment_variable: Variable)#

Bases: object

CVXPY components for an SYNDES optimization problem.

This container stores the symbolic elements required to construct and solve the mixed-integer program underlying SYNDES.

Parameters:
  • mode (str) – SYNDES formulation (“global_2way”, “global_equal_weights”, “per_unit”).

  • objective (cp.Expression) – CVXPY objective function.

  • constraints (list of cp.Constraint) – Linear and convex constraints defining the feasible set.

  • variables (dict of str -> cp.Variable) – Named CVXPY decision variables (e.g., weights, assignment variables).

  • assignment_variable (cp.Variable) – Binary treatment assignment variable D.

with_constraints(additional_constraints)#

Return a new problem with extra constraints appended.

Notes

This object is intentionally “solver-agnostic” and is used to separate model construction from optimization execution.

assignment_variable: Variable#
constraints: List[Constraint]#
mode: str#
objective: Expression#
variables: Dict[str, Variable]#
with_constraints(additional_constraints: List[Constraint]) SYNDESProblemComponents#

Append additional constraints to the current formulation.

Parameters:

additional_constraints (list of cp.Constraint) – Extra constraints to include in the optimization problem.

Returns:

SYNDESProblemComponents – New instance with extended constraint set.

class mlsynth.utils.syndes_helpers.structures.SYNDESResults(design: SYNDESDesign, inputs: SYNDESInputs, inference: SYNDESInference | None = None, post_fit: Any | None = None, pool: Any | None = None, power_curve: Any | None = None, recommendation: Any | None = None)#

Bases: object

Complete SYNDES estimation output.

This object bundles the optimized design, prepared inputs, and optional inference results.

Parameters:
  • design (SYNDESDesign) – Optimization solution.

  • inputs (SYNDESInputs) – Preprocessed data used in estimation.

  • inference (SYNDESInference or None) – Optional inference results.

  • post_fit (SyntheticControlPostFit, optional) – Standardized post-fit diagnostics (ATE / total effect / percentage lift / fit RMSEs / inference / power analysis) attached at the end of mlsynth.estimators.SYNDES.fit() via mlsynth.utils.post_fit.compute_post_fit(). Same shape and semantics as the MAREX / LEXSCM / PANGEO equivalents, so downstream consumers see a single uniform surface across the family.

  • power_curve (SYNDESPower, optional) – Per-horizon minimum-detectable-effect table over post-period horizons 1..12, attached by default at the end of mlsynth.SYNDES.fit() so the MDE curve comes back without a separate mlsynth.power_analysis() call. power_curve.to_dataframe() is the tidy (n_post, mde_absolute, mde_percent) table. Left as None if the computation is degenerate (e.g. too-short pre-period). For a custom horizon grid, significance level, or baseline, call mlsynth.power_analysis() directly.

  • recommendation (SYNDESRecommendation, optional) – Present when a solution pool exists (top_K > 1): the Pareto frontier on fit vs power and a single recommended design chosen by a GeoLift-style composite score (see mlsynth.utils.syndes_helpers.select.recommend_syndes()). None for a single-design fit.

Notes

This is the primary return object of SYNDES.fit().

Properties#

modestr

Alias for design.mode.

selected_unit_labelsnp.ndarray

Labels of treated units selected by the design.

design: SYNDESDesign#
inference: SYNDESInference | None = None#
inputs: SYNDESInputs#
property mode: str#

Return the optimization mode used.

pool: Any | None = None#
post_fit: Any | None = None#
power_curve: Any | None = None#
recommendation: Any | None = None#
property selected_unit_labels: ndarray#

Return labels of selected treated units.

In addition, SYNDES.fit() attaches a SyntheticControlPostFit as results.post_fit — the standardized diagnostics container shared with the rest of the MAREX family (LEXSCM / MAREX / PANGEO). It carries the ATE / total effect / percentage lift / per-period gap, pre- and post-period RMSEs, the inference triple (p-value, CI), and a PowerAnalysis block with the headline MDE and the MDE-versus-horizon curve.

class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#

Bases: object

Standardized post-fit diagnostics for a single synthetic control design.

Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.

ate: float | None = None#
ate_per_period: ndarray | None = None#
ate_percent: float | None = None#
ci_lower: float | None = None#
ci_upper: float | None = None#
control_series: ndarray#
covariate_names: Tuple[str, ...] = ()#
covariate_smd: Dict[str, float] | None = None#
covariate_smd_abs_max: float | None = None#
covariate_smd_control_vs_pop: Dict[str, float] | None = None#
covariate_smd_control_vs_pop_abs_max: float | None = None#
covariate_smd_control_vs_pop_squared_sum: float | None = None#
covariate_smd_squared_sum: float | None = None#
covariate_smd_treated_vs_pop: Dict[str, float] | None = None#
covariate_smd_treated_vs_pop_abs_max: float | None = None#
covariate_smd_treated_vs_pop_squared_sum: float | None = None#
cumulative_effect: ndarray | None = None#
gap_series: ndarray#
inference_method: str | None = None#
n_blank: int#
n_fit: int#
n_post: int#
p_value: float | None = None#
power: PowerAnalysis | None = None#
rmse_blank: float | None = None#
rmse_fit: float | None = None#
rmse_post: float | None = None#
total_effect: float | None = None#
treated_series: ndarray#
class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#

Bases: object

Standardized power-analysis output attached to SyntheticControlPostFit.

Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same SyntheticControlPostFit shape so every covariate-aware SCM-family estimator gets the surface for free.

headline#

MDE for the actual n_post horizon of the realised design.

Type:

MDEPoint

curve#

MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).

Type:

list of MDEPoint

alpha#

Two-sided significance level assumed.

Type:

float

power_target#

Target power the MDEs are computed at (default 0.80).

Type:

float

sigma_placebo#

Standard deviation of the placebo gap series used as the noise scale.

Type:

float

serial_correlation#

Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.

Type:

float

baseline#

Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.

Type:

float

method#

"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Reserved for future "monte_carlo" extensions.

Type:

str

alpha: float#
baseline: float#
curve: Tuple[MDEPoint, ...]#
headline: MDEPoint#
mde_by_horizon() Dict[int, float]#

{post_periods: mde_pct} for quick lookup.

method: str = 'analytical_ar1'#
power_target: float#
serial_correlation: float#
sigma_placebo: float#
class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#

Bases: object

Minimum detectable effect at a single post-treatment horizon.

mde_absolute: float#
mde_pct: float#
post_periods: int#
power_at_observed: float | None = None#
se: float#

Helper Modules#

Data preparation – the only DataFrame touchpoint: pivots to wide pre/post matrices and builds the unit/time IndexSetes.

Data preparation helpers for SYNDES.

mlsynth.utils.syndes_helpers.setup.prepare_syndes_inputs(df: DataFrame, outcome: str, unitid: str, time: str, T0: int | None = None, post_col: str | None = None) SYNDESInputs#

Pivot long panel data and split it into pre/post matrices for SYNDES.

Parameters:
  • df (pd.DataFrame) – Long balanced panel data.

  • outcome, unitid, time (str) – Column names identifying the outcome, units, and time periods.

  • T0 (Optional[int]) – Number of pre-treatment periods when post_col is not supplied.

  • post_col (Optional[str]) – Optional 0/1 or boolean column identifying post-treatment periods.

Returns:

SYNDESInputs – Wide pre/post matrices and label metadata.

The CVXPY objective/constraint builders for the three MIP formulations.

Objective and constraint builders for SYNDES optimization problems.

This module follows the formulations of Doudchenko et al. (2021, arXiv:2112.00278). The three modes — global_2way, global_equal_weights and per_unit — express the pre-treatment residual implicitly inside cp.sum_squares rather than via explicit auxiliary residual variables z_t. The previous implementation introduced one explicit z_t decision variable plus one linear equality constraint per pre-treatment period, which on long panels (e.g. the Walmart weekly-sales panel, T = 128) added T extra columns and T extra rows to SCIP’s LP relaxation at every branch-and-bound node and dominated the solve time. Inlining the residual lets cvxpy emit a single second-order-cone epigraph and gives SCIP an LP that is roughly T rows smaller per node, matching the implicit-residual pattern used by MAREX (see mlsynth.utils.marex_helpers.formulation for the analogous cp.sum_squares(Xbar - Y_T @ w) style).

mlsynth.utils.syndes_helpers.formulation.build_global_2way_components(Y: ndarray, D: Variable, K: int, lam: float) SYNDESProblemComponents#

Construct full SYNDES problem (global two-way formulation).

Returns a complete CVXPY optimization specification consisting of: objective, constraints, variables and assignment variable.

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • D (cp.Variable) – Binary assignment vector.

  • K (int) – Number of treated units (may be None for global modes).

  • lam (float) – Weight regularization parameter.

Returns:

SYNDESProblemComponents – Fully specified optimization problem.

mlsynth.utils.syndes_helpers.formulation.build_global_2way_constraints(Y: ndarray, D: Variable, K: int | None, variables: Dict[str, Variable]) List[Constraint]#

Build constraints for the global two-way SYNDES formulation.

The formulation enforces:

  • exactly K treated units (or 1 <= sum(D) <= N-1 when K is None);

  • the McCormick linearisation of q_i = w_i * D_i;

  • the normalisation sum_i q_i = 1 (treated weight), sum_i w_i = 2 (treated weight + control weight, each summing to 1).

The residual z_t = sum_i (2 q_i - w_i) Y_{t,i} is no longer a decision variable; it is computed inline by build_global_2way_objective().

Parameters:
  • Y (np.ndarray) – Outcome matrix of shape (T, N).

  • D (cp.Variable) – Binary treatment assignment vector.

  • K (int, optional) – Number of treated units.

  • variables (dict) – CVXPY variables {"w", "q"}.

Returns:

list of cp.Constraint – Constraints defining the feasible region.

mlsynth.utils.syndes_helpers.formulation.build_global_2way_objective(Y: ndarray, lam: float, variables: Dict[str, Variable]) Expression#

Construct objective for global two-way SYNDES.

Objective corresponds to:

\[\frac{1}{T} \sum_t z_t^2 \,+\, \lambda \, \| w \|_2^2,\]

where \(z_t = \sum_i (2 q_i - w_i) Y_{t,i}\) is the treated-minus-control contrast residual. The residual is computed inline (as a single cp.sum_squares of the length-T vector Y @ (2 q - w)) rather than via T auxiliary variables and T equality constraints; cvxpy compiles it into a single second-order-cone epigraph, which dramatically reduces SCIP’s LP relaxation size per branch-and-bound node on long panels.

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • lam (float) – Regularization strength on weights.

  • variables (dict) – CVXPY variables {"w", "q"}.

Returns:

cp.Expression – Convex objective expression.

mlsynth.utils.syndes_helpers.formulation.build_global_2way_variables(T: int, N: int) Dict[str, Variable]#

Create CVXPY decision variables for the global two-way SYNDES formulation.

The pre-treatment residual is now expressed implicitly inside the objective (see build_global_2way_objective()) so no auxiliary z_t variables are returned.

Parameters:
  • T (int) – Number of pre-treatment time periods. Retained for signature compatibility; not used.

  • N (int) – Number of units.

Returns:

dict – Dictionary containing:

  • "w": unit weights (N,)

  • "q": treated-weight interaction variables (N,)

mlsynth.utils.syndes_helpers.formulation.build_global_equal_weights_components(Y: ndarray, D: Variable, K: int, lam: float) SYNDESProblemComponents#

Build full equal-weight global SYNDES problem.

Parameters:
  • Y (np.ndarray) – Outcome matrix.

  • D (cp.Variable) – Assignment variable.

  • K (int) – Number of treated units.

  • lam (float) – Regularization parameter.

Returns:

SYNDESProblemComponents – Complete optimization specification.

mlsynth.utils.syndes_helpers.formulation.build_global_equal_weights_constraints(Y: ndarray, D: Variable, K: int | None, variables: Dict[str, Variable]) List[Constraint]#

Build constraints for the one-way global SYNDES formulation.

The treated side is a simple average (weight 1/K on each treated unit); the control side is a free synthetic control. With c the control weights and D the assignment, the per-period contrast is

\[z_t = \frac{1}{K} \sum_i D_i Y_{i,t} - \sum_i c_i Y_{i,t},\]

subject to sum_i D_i = K, sum_i c_i = 1, c_i >= 0 and c_i <= 1 - D_i (so treated units carry no control weight). The residual is now computed inline by build_global_equal_weights_objective().

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • D (cp.Variable) – Binary assignment vector.

  • K (int) – Number of treated units (required for this mode).

  • variables (dict) – Contains "c".

Returns:

list of cp.Constraint – Feasibility constraints for assignment, control simplex.

mlsynth.utils.syndes_helpers.formulation.build_global_equal_weights_objective(Y: ndarray, K: int, lam: float, variables: Dict[str, Variable]) Expression#

Construct objective for the one-way global SYNDES formulation.

The objective is

\[\frac{1}{T} \sum_t z_t^2 \,+\, \lambda \left( \frac{1}{K} + \| c \|_2^2 \right),\]

where \(z_t = \frac{1}{K} \sum_i D_i Y_{i,t} - \sum_i c_i Y_{i,t}\) is the residual, 1/K is the (constant) penalty contributed by the pinned treated weights and \(\|c\|_2^2\) is the penalty on the free control weights. The residual vector is computed implicitly as (1/K) * (Y @ D) - Y @ c, a single length-T cvxpy expression that compiles into one second-order-cone epigraph.

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • K (int) – Number of treated units.

  • lam (float) – Regularization parameter (the paper’s \(\sigma^2\)).

  • variables (dict) – Contains "c".

Returns:

cp.Expression – Objective function.

mlsynth.utils.syndes_helpers.formulation.build_global_equal_weights_variables(T: int, N: int) Dict[str, Variable]#

Create CVXPY variables for the one-way global formulation.

This is the paper’s one-way global design (Doudchenko et al. 2021, eq. “one-way global”): the treated weights are pinned to 1/K (a simple average of the treated units), while the control weights remain free synthetic-control weights to be optimised. Only the control weights c are decision variables here; the assignment D is passed in separately. The pre-treatment residual is now expressed implicitly inside the objective (see build_global_equal_weights_objective()) so no auxiliary z_t variables are returned.

Parameters:
  • T (int) – Number of time periods. Retained for signature compatibility; not used.

  • N (int) – Number of units.

Returns:

dict – Contains:

  • "c": free control-side synthetic weights (N,), nonneg.

mlsynth.utils.syndes_helpers.formulation.build_per_unit_components(Y: ndarray, D: Variable, K: int, lam: float) SYNDESProblemComponents#

Construct full per-unit SYNDES optimization problem.

Parameters:
  • Y (np.ndarray) – Outcome matrix.

  • D (cp.Variable) – Assignment vector.

  • K (int) – Number of treated units.

  • lam (float) – Regularization parameter.

Returns:

SYNDESProblemComponents – Full per-unit optimization specification.

mlsynth.utils.syndes_helpers.formulation.build_per_unit_constraints(Y: ndarray, D: Variable, K: int, variables: Dict[str, Variable]) List[Constraint]#

Build constraints for per-unit SYNDES formulation.

Each treated unit constructs its own synthetic control using control units only.

Structure:

  • D selects treated units;

  • each treated unit i has weights over donor pool j;

  • q_{i,j} enforces the interaction q_{i,j} = w_{i,j} (1 - D_j).

The per-unit residual is now computed inline by build_per_unit_objective().

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • D (cp.Variable) – Binary treatment assignment.

  • K (int) – Number of treated units.

  • variables (dict) – Contains w and q.

Returns:

list of cp.Constraint – Constraints defining per-unit synthetic control system.

mlsynth.utils.syndes_helpers.formulation.build_per_unit_objective(Y: ndarray, K: int, lam: float, variables: Dict[str, Variable], D: Variable) Expression#

Construct objective for per-unit SYNDES formulation.

Objective corresponds to:

\[\frac{1}{KT} \sum_i \sum_t z_{i,t}^2 + \frac{\lambda}{K} \| w \|_F^2,\]

where \(z_{i,t} = D_i Y_{i,t} - \sum_j q_{i,j} Y_{j,t}\) is the per-unit residual. Each per-unit residual vector is computed implicitly as D[i] * Y[:, i] - Y @ q[i, :] (a length-T cvxpy expression) and stacked over the N units before the Frobenius-norm squared. cvxpy compiles the result into a single SOC epigraph, eliminating the N * T auxiliary z_{i,t} variables and N * T equality constraints used previously.

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • K (int) – Number of treated units.

  • lam (float) – Regularization parameter.

  • variables (dict) – Contains "w" and "q".

  • D (cp.Variable) – Binary treatment assignment (used inline to scale each treated unit’s contribution).

Returns:

cp.Expression – Objective function.

mlsynth.utils.syndes_helpers.formulation.build_per_unit_variables(T: int, N: int) Dict[str, Variable]#

Create CVXPY variables for the per-unit SYNDES formulation.

This formulation constructs a separate synthetic control for each treated unit i. The per-period, per-unit residual z_{i,t} = D_i Y_{i,t} - sum_j q_{i,j} Y_{j,t} is computed inline by build_per_unit_objective() and not stored as a decision variable.

Parameters:
  • T (int) – Number of pre-treatment periods. Retained for signature compatibility; not used.

  • N (int) – Number of units.

Returns:

dict – Contains:

  • "w": (N, N) unit-specific weights.

  • "q": (N, N) interaction terms q_{i,j} = w_{i,j} (1 - D_j).

mlsynth.utils.syndes_helpers.formulation.build_syndes_problem_components(Y: ndarray, D: Variable, K: int, lam: float, mode: str) SYNDESProblemComponents#

Dispatch SYNDES formulation builder based on mode.

Parameters:
  • Y (np.ndarray) – Outcome matrix.

  • D (cp.Variable) – Binary treatment assignment.

  • K (int) – Number of treated units.

  • lam (float) – Regularization parameter.

  • mode ({“global_2way”, “global_equal_weights”, “per_unit”}) – SYNDES formulation selector.

Returns:

SYNDESProblemComponents – Fully specified optimization problem.

Raises:

ValueError – If mode is not recognized.

mlsynth.utils.syndes_helpers.formulation.unpack_problem_components(components: SYNDESProblemComponents) Tuple[Expression, List[Constraint], Dict[str, Variable]]#

Unpack SYNDES problem components.

Parameters:

components (SYNDESProblemComponents) – Structured optimization container.

Returns:

tuple – (objective, constraints, variables)

The solver wrapper: builds the MIP, applies optional budget constraints, solves, and extracts the assignment, weights, and pre-period prediction.

Solver-facing optimization utilities for SYNDES.

mlsynth.utils.syndes_helpers.optimization.estimate_lambda(Y: ndarray) float#

Estimate SYNDES penalty parameter as average within-unit variance.

Parameters:

Y (np.ndarray) – Pre-treatment outcome matrix of shape (T, N).

Returns:

float – Estimated lambda value.

Raises:
  • MlsynthDataError – If Y is not 2D.

  • MlsynthConfigError – If fewer than 2 time periods are provided.

mlsynth.utils.syndes_helpers.optimization.solve_synthetic_design(Y: ndarray, K: int | None, mode: str = 'global_2way', lam: float | None = None, solver: Any = 'SCIP', verbose: bool = False, unit_index: IndexSet | None = None, costs: ndarray | None = None, budget: float | None = None, gap_limit: float | None = None, time_limit: float | None = None, forbidden_sets: list | None = None, restrictions: Any | None = None) SYNDESDesign#

Solve the SYNDES synthetic design optimization problem.

Parameters:
  • Y (np.ndarray) – Pre-treatment outcome matrix of shape (T, N).

  • K (int) – Number of treated units.

  • mode ({“global_2way”, “global_equal_weights”, “per_unit”}, optional) – SYNDES formulation type.

  • lam (float, optional) – Regularization parameter. If None, estimated from Y.

  • solver (Any, optional) – CVXPY-compatible solver specification.

  • verbose (bool, optional) – Whether to enable solver verbosity.

  • unit_index (IndexSet, optional) – Mapping from indices to unit labels.

Returns:

SYNDESDesign – Optimized design object.

Raises:
  • MlsynthConfigError – If inputs are invalid or lambda is negative.

  • MlsynthEstimationError – If optimization fails or is infeasible.

mlsynth.utils.syndes_helpers.optimization.solve_synthetic_design_pool(Y: ndarray, K: int, top_K: int = 1, *, mode: str = 'global_2way', lam: float | None = None, solver: Any = 'SCIP', verbose: bool = False, unit_index: IndexSet | None = None, costs: ndarray | None = None, budget: float | None = None, gap_limit: float | None = None, time_limit: float | None = None, restrictions: Any | None = None) list#

Top-top_K SYNDES designs, ranked by MSE, via no-good cuts.

Re-solves the SYNDES MIP top_K times; after each solve the chosen treated set is forbidden (sum_{i in S} D_i <= |S|-1) so the next solve returns the next-best distinct design. The returned list is ranked by objective value (non-decreasing), and solve_synthetic_design_pool(..., top_K=1) returns [solve_synthetic_design(...)] – the single optimum unchanged. Stops early (returning fewer than top_K) once the feasible region is exhausted.

Parameters:
  • top_K (int) – Number of distinct designs to enumerate (>= 1).

  • (all other parameters as in :func:`solve_synthetic_design`).

Returns:

list of SYNDESDesign – Distinct designs ordered by ascending MSE; [0] is the global optimum.

The moving-block permutation test (shared contrast dispatch across modes).

Inference helpers for SYNDES.

mlsynth.utils.syndes_helpers.inference.permutation_test_global(Y_pre: ndarray, Y_post: ndarray, design: SYNDESDesign, alpha: float = 0.1, include_null_stats: bool = True) SYNDESInference#

Moving-block permutation test for any SYNDES / Synthetic-Design mode.

Generalises the original global_2way-only implementation to the full set of MIP formulations from Doudchenko et al. (2021): global_2way, global_equal_weights (paper’s “one-way global”) and per_unit. The test follows the Chernozhukov, Wuethrich, and Zhu (2021) permutation-across-time logic: we treat each period’s cross-unit contrast as exchangeable under the no-effect null and compare the post-period mean to the null distribution obtained by cyclically shifting the stacked panel.

mlsynth.utils.syndes_helpers.inference.permutation_test_relaxed_global(Y_pre: ndarray, Y_post: ndarray, design: RelaxedDesign, alpha: float = 0.1, include_null_stats: bool = True) RelaxedInference#

Moving-block permutation test for a relaxed two-way SYNDES design.

Mirrors permutation_test_global() but consumes the relaxed solver’s contrast_weights directly rather than reconstructing them from q and w.

Parameters:
  • Y_pre (np.ndarray) – Pre-treatment outcome matrix of shape (T_pre, N).

  • Y_post (np.ndarray) – Post-treatment outcome matrix of shape (T_post, N).

  • design (RelaxedDesign) – Best-state design from solve_two_way_relaxed().

  • alpha (float, optional) – Significance level for the test.

  • include_null_stats (bool, optional) – Whether to attach the empirical null distribution to the result.

Returns:

RelaxedInference – Permutation-inference output.

Raises:

MlsynthDataError – If Y_post is missing or empty.

The minimum-detectable-effect power analysis (Newey-West long-run SE).

Minimum-detectable-effect (MDE) power analysis for SYNDES designs.

The Doudchenko et al. (2021) Synthetic Design paper computes power curves by Monte Carlo simulation (Appendix A.4, Figure 2): seed a known true ATET, run the permutation test, repeat, and tally the rejection rate. Repeating that procedure across a grid of effect sizes traces out the rejection probability as a function of the true effect.

For a single fitted design we short-circuit that loop by appealing to the asymptotic normality of the permutation test statistic under the null. The moving-block permutation test compares the post-period mean contrast mean_t (Y_t @ c) to the distribution of length-n_post block means, so the relevant null standard error is the std of those block means. We estimate it on the pre-period contrast series per_period = Y_pre @ c:

SE(n_post) = std_s ( mean_{u in block_s} per_period[u] ),

the std over overlapping length-n_post blocks. This captures any serial correlation in the outcomes (the same correlation the block permutation test is exposed to); under independence it reduces to sigma_perm / sqrt(n_post), the textbook MDE, and we fall back to that scaling when a horizon leaves too few blocks. The MDE at significance level alpha (two-sided) and power 1 - beta is

MDE_abs(n_post) = (z_{1 - alpha/2} + z_{1 - beta}) * SE(n_post),

where sigma_perm = std_t (Y_t @ c) is reported as the per-period contrast std. We report the MDE alongside its percentage version

MDE_pct(n_post) = 100 * MDE_abs(n_post) / baseline,

where baseline defaults to the mean pre-period outcome on the SYNDES-selected treated units (so MDE_pct reads as a percentage of treated-unit baseline). Other baselines available: "overall" (full panel mean), "control" (the SC-weighted control mean under the design’s contrast), or a user-supplied scalar.

Use power_analysis() as the public entry point; pass it the mlsynth.utils.syndes_helpers.structures.SYNDESResults returned by mlsynth.SYNDES.fit() (or any of the legacy SYNDES modes).

class mlsynth.utils.syndes_helpers.power.SYNDESPower(n_post_periods: ndarray, mde_absolute: ndarray, mde_percent: ndarray, sigma_perm: float, baseline: float, baseline_kind: str, alpha: float, power: float, contrast: ndarray, long_run_sigma: float = 0.0)#

Per-horizon MDE table for a fitted SYNDES design.

Parameters:
  • n_post_periods (np.ndarray) – Horizons evaluated, shape (H,).

  • mde_absolute (np.ndarray) – MDE in the same units as the outcome, shape (H,).

  • mde_percent (np.ndarray) – 100 * MDE_abs / baseline, shape (H,).

  • sigma_perm (float) – Ordinary std of the per-period contrast applied to the pre-period outcomes (the i.i.d. per-period scale).

  • long_run_sigma (float) – Newey-West (Bartlett HAC) long-run std of the per-period contrast – the serial-correlation-robust scale the MDE actually rests on. Equals sigma_perm when the contrast series is serially uncorrelated.

  • baseline (float) – Baseline outcome level used to convert mde_absolute into a percentage.

  • baseline_kind (str) – Tag identifying which baseline was used ("treated", "overall", "control", or "custom").

  • alpha (float) – Two-sided significance level used to build the CI.

  • power (float) – Target power 1 - beta used to compute the MDE.

  • contrast (np.ndarray) – The unit-level contrast vector that maps outcomes to the ATT estimator. Stored for downstream inspection.

alpha: float#
baseline: float#
baseline_kind: str#
contrast: ndarray#
long_run_sigma: float = 0.0#
mde_absolute: ndarray#
mde_percent: ndarray#
n_post_periods: ndarray#
power: float#
sigma_perm: float#
to_dataframe()#

Return a tidy (n_post, mde_abs, mde_pct) DataFrame.

mlsynth.utils.syndes_helpers.power.power_analysis(results: SYNDESResults, n_post_periods: Iterable[int] = range(1, 13), alpha: float = 0.05, power: float = 0.8, baseline: str | float = 'treated') SYNDESPower#

Compute the per-horizon minimum detectable effect for a SYNDES design.

Parameters:
  • results (SYNDESResults) – Output of mlsynth.SYNDES.fit() or mlsynth.SYNDES.fit(). Only the design and inputs fields are read.

  • n_post_periods (iterable of int, default range(1, 13)) – Horizons (in post-treatment periods) at which to report the MDE.

  • alpha (float, default 0.05) – Two-sided significance level.

  • power (float, default 0.80) – Target power for the MDE (1 - beta).

  • baseline (str or float, default "treated") – Denominator for the percentage MDE. Choices:

    • "treated" (default) – mean pre-period outcome over the SYNDES-selected treated units.

    • "overall" – mean pre-period outcome over every unit.

    • "control" – SC-weighted mean pre-period control outcome implied by the design’s contrast.

    • float – user-supplied baseline value.

Returns:

SYNDESPower – Frozen container with per-horizon MDE in absolute and percentage units.

Pareto recommendation – the composite-score selector that builds res.recommendation from the solution pool:

Pareto recommendation for the SYNDES solution pool.

The SYNDES MIP ranks designs by fit (the mean-squared pre-period imbalance) alone. But the best-fitting design is rarely the most detectable one, and a manager usually wants a single, defensible recommendation that balances the two. This module turns the top_K pool (already re-scored on power and cost by mlsynth.estimators.syndes._syndes_pool_menu()) into:

  • a Pareto frontier on (fit downwards, power downwards) – where fit is the pre-period RMSE between the treated group and the weighted average of the controls – the designs for which neither fit nor minimum-detectable-effect can be improved without worsening the other (cost enters only as a tie-break), always exposed for transparency; and

  • a single recommended design chosen by a GeoLift-style composite score: the weighted mean of the two dense ranks (fit and power), defaulting to a slight preference for power (0.51) over fit (0.49). Cost, then pre-period RMSE, break ties.

The selection never raises: with no feasibly-powered design it degrades to the best-fitting one and reports status="POWER_NOT_ESTABLISHED"; with an empty pool it returns status="EMPTY". The design mirrors the LEXSCM recommender in mlsynth.utils.fast_scm_helpers.lexselect; the difference is the selection rule (a weighted score rather than a lexicographic validity gate).

class mlsynth.utils.syndes_helpers.select.SYNDESDesignMetrics(design_id: str, markets: List[Any], control_group: List[Any], objective: float, mde_pct: float, mde_feasible: bool, cost: float | None = None, pre_fit_rmse: float | None = None, design: Any = None)#

Per-design criteria a SYNDES recommendation trades off.

Parameters:
  • design_id (str) – Stable id "D1" .. "Dn" in pool order (so D1 is the MSE-optimal, best-fitting design).

  • markets (list) – Treated-unit labels (the design’s arms).

  • control_group (list) – Donor labels backing the synthetic control.

  • objective (float) – The MIP objective (fit / mean-squared pre-period imbalance); lower is better.

  • mde_pct (float) – Minimum detectable effect at the realised horizon, as a percent of the treated baseline; lower is better. inf/nan means power could not be established.

  • mde_feasible (bool) – Whether mde_pct is finite.

  • cost (float or None) – Summed cost of the treated set (tie-break), or None.

  • pre_fit_rmse (float or None) – Root-mean-square pre-period contrast (secondary tie-break), or None.

  • design (Any) – The underlying SYNDESDesign (carried for deployment), or None.

control_group: List[Any]#
cost: float | None = None#
design: Any = None#
design_id: str#
markets: List[Any]#
mde_feasible: bool#
mde_pct: float#
objective: float#
pre_fit_rmse: float | None = None#
class mlsynth.utils.syndes_helpers.select.SYNDESRecommendation(winner: SYNDESDesignMetrics | None, shortlist: List[SYNDESDesignMetrics], pareto_ids: List[str], status: str, explanation: str, table: List[Dict[str, Any]], weights: Dict[str, float])#

Outcome of scoring a SYNDES pool.

Parameters:
  • winner (SYNDESDesignMetrics or None) – The recommended design (None only when the pool is empty).

  • shortlist (list of SYNDESDesignMetrics) – Top designs by composite score (best first), at most max_shortlist.

  • pareto_ids (list of str) – Design ids on the (fit, power) Pareto frontier (always exposed).

  • status (str) – "OK" (a feasibly-powered design was recommended), "POWER_NOT_ESTABLISHED" (no finite MDE; best-fitting design returned), or "EMPTY" (no pool).

  • explanation (str) – Human-readable summary of the recommendation.

  • table (list of dict) – One row per design with its metrics, composite score, and the pareto / winner flags – ready for a DataFrame.

  • weights (dict) – The normalised {"power": ..., "fit": ...} weights used.

explanation: str#
pareto_ids: List[str]#
shortlist: List[SYNDESDesignMetrics]#
status: str#
table: List[Dict[str, Any]]#
weights: Dict[str, float]#
winner: SYNDESDesignMetrics | None#
mlsynth.utils.syndes_helpers.select.recommend_syndes(pool: List[Dict[str, Any]], *, power_weight: float = 0.51, fit_weight: float = 0.49, max_shortlist: int = 5) SYNDESRecommendation#

Recommend a single SYNDES design from the solution pool.

Parameters:
  • pool (list of dict) – The results.pool menu (each entry carries markets, control_group, objective, mde_pct, cost, pre_fit_rmse, and design).

  • power_weight, fit_weight (float) – Relative weight on power versus fit in the composite score. Normalised to sum to one internally; defaults 0.51 / 0.49 (a slight preference for power).

  • max_shortlist (int) – Maximum number of designs in the returned shortlist.

Returns:

SYNDESRecommendation

Cross-method comparison – score GEOLIFT and SYNDES designs on one shared fit-vs-power plane:

Cross-estimator design comparison on a common fit-vs-power plane.

GEOLIFT and SYNDES differ in their optimisers but share a grammar: each emits a design that reduces to a unit-level contrast c = w_treated - w_control (both sides summing to one) over the same panel. From c two comparable numbers follow – the pre-period fit RMSE sqrt(mean((Y_pre @ c)**2)) and, by injecting a known effect at a fixed horizon, a minimum detectable effect.

Scoring every design from either method through one shared simulation harness – same horizon, same effect grid, same moving-block permutation null, same baseline – puts them on identical axes, so the per-method Pareto frontiers can be overlaid and compared directly. This is the point of the module: the frontier gap then reflects the designs, not two different power methodologies.

The MDE simulation rests on one fact true for any normalised design: adding an effect tau to the treated units’ outcomes shifts the contrast mean Y_t @ c by exactly tau (because the treated weights sum to one). So the alternative distribution of the length-h block mean is the null distribution shifted by tau, and the power at tau is the share of the (empirical, moving-block) null whose shifted magnitude clears the two-sided critical value.

class mlsynth.utils.design_compare.DesignComparison(table: ~pandas.DataFrame, syndes: ~typing.Any = None, geolift: ~typing.Any = None, specs: ~typing.List[~mlsynth.utils.design_compare.DesignSpec] = <factory>, horizon: int = 5)#

Result of compare_methods().

Parameters:
  • table (pd.DataFrame) – One row per design across all requested methods, with method, label, treated, fit_rmse, mde_pct (at horizon), and a per-method pareto flag – the dataframe form of the comparison.

  • syndes, geolift (optional) – The underlying fitted results (None for any method not requested), kept for inspection.

  • specs (list of DesignSpec) – The common-currency designs that were scored.

  • horizon (int) – The post-period horizon the MDE was simulated at.

geolift: Any = None#
horizon: int = 5#
plot(ax=None)#

Overlay the per-method Pareto frontiers (the plot form).

specs: List[DesignSpec]#
syndes: Any = None#
table: DataFrame#
class mlsynth.utils.design_compare.DesignSpec(method: str, label: str, contrast: Dict[Any, float], treated: List[Any], oos_rmse: float | None = None)#

A design from any estimator, in the common cross-method currency.

Parameters:
  • method (str) – Originating method, e.g. "SYNDES" or "GEOLIFT".

  • label (str) – Human-readable design label (shown on the plot).

  • contrast (dict) – {unit_label: weight} for w_treated - w_control; missing units are zero. Treated weights are positive and sum to one, control weights negative and sum to minus one.

  • treated (list) – Treated-unit labels (used for the baseline and the label).

  • oos_rmse (float, optional) – Out-of-sample (holdout) contrast RMSE of the design, when it was selected by SYNDES’s holdout selector. None for in-sample SYNDES designs and for GEOLIFT (which has no holdout notion).

contrast: Dict[Any, float]#
label: str#
method: str#
oos_rmse: float | None = None#
treated: List[Any]#
mlsynth.utils.design_compare.compare_methods(df: DataFrame, *, outcome: str, unitid: str, time: str, treated_size: int, horizon: int = 5, post_col: str | None = None, n_post: int | None = None, top_K: int = 6, alpha: float = 0.1, power_target: float = 0.8, effects_pct: Sequence[float] | None = None, methods: Sequence[str] = ('SYNDES', 'GEOLIFT'), syndes_holdout_frac: float | None = 0.3, syndes_options: Dict[str, Any] | None = None, geolift_options: Dict[str, Any] | None = None) DesignComparison#

Fit GEOLIFT and SYNDES on one panel and compare them on the shared plane.

A one-call wrapper around the adapters + compare_pareto(): it fits each requested method on the same data with the same treated-set size and post window, then scores every design through the one shared MDE harness so the frontiers are comparable.

Parameters:
  • df (pd.DataFrame) – Long panel.

  • outcome, unitid, time (str) – Column names.

  • treated_size (int) – Number of treated units (SYNDES K and GEOLIFT treatment_size).

  • horizon (int, default 5) – Post-period horizon for the simulated MDE.

  • post_col (str, optional) – 0/1 post-treatment column. Provide this or n_post.

  • n_post (int, optional) – If post_col is not given, mark the last n_post periods as post.

  • top_K (int, default 6) – SYNDES solution-pool size (number of candidate SYNDES designs).

  • alpha, power_target (float) – Significance level and target power for the MDE.

  • effects_pct (sequence of float, optional) – Common effect grid (percent of baseline) for the MDE simulation.

  • methods (sequence of str, default ("SYNDES", "GEOLIFT")) – Which methods to fit and compare.

  • syndes_holdout_frac (float or None, default 0.3) – Holdout fraction for SYNDES design selection: SYNDES learns its pool on the leading 1 - syndes_holdout_frac of the pre-period and is ranked by out-of-sample contrast error on the tail (column oos_rmse; SYNDES rows are ordered ascending by it). None reverts SYNDES to in-sample selection (oos_rmse is NaN). Ignored when top_K < 2 (a pool is required to validate). An explicit holdout_frac in syndes_options overrides this.

  • syndes_options, geolift_options (dict, optional) – Per-method overrides merged over the defaults (e.g. {"time_limit": 20.0} to give SYNDES a larger budget).

Returns:

DesignComparison.table (dataframe form) and .plot() (plot form).

mlsynth.utils.design_compare.compare_pareto(specs: Sequence[DesignSpec], Ywide: DataFrame, n_pre: int, *, horizon: int = 5, effects_pct: Sequence[float] | None = None, alpha: float = 0.1, power_target: float = 0.8) DataFrame#

Score designs from any method(s) on the shared fit-vs-power plane.

Parameters:
  • specs (sequence of DesignSpec) – Designs to compare (typically from from_syndes() and from_geolift(), run on the same panel).

  • Ywide (pd.DataFrame) – Outcome panel, rows = time, columns = unit labels (the canonical order every contrast is aligned to).

  • n_pre (int) – Number of leading rows that are pre-treatment.

  • horizon (int, default 5) – Post-period horizon for the MDE simulation.

  • effects_pct (sequence of float, optional) – Ascending grid of effect sizes as a percent of the treated baseline. Defaults to 0.25 .. 30 in 0.25 steps – the common grid both methods are simulated over.

  • alpha, power_target (float) – Two-sided significance level and target power.

Returns:

pd.DataFrame – One row per design: method, label, treated, fit_rmse, mde_pct (at horizon), and a per-method pareto flag.

mlsynth.utils.design_compare.from_geolift(res) List[DesignSpec]#

Build DesignSpecs from a fitted GEOLIFT result (its candidate designs).

The treated side is the equal-weighted (1/K) average of the selected markets, matching SYNDES’s normalised treated weights, so both methods’ contrasts share the sum(w_treated) = 1 convention.

mlsynth.utils.design_compare.from_syndes(res) List[DesignSpec]#

Build DesignSpecs from a fitted SYNDES result (its solution pool).

Falls back to the single returned design when no pool exists (top_K == 1).

mlsynth.utils.design_compare.plot_compare_pareto(frame: DataFrame, ax=None)#

Overlay each method’s fit-vs-power Pareto frontier on shared axes.

Renders in the in-house mlsynth style. Returns the axis drawn on.

mlsynth.utils.design_compare.simulated_mde(pre_contrast: ndarray, *, horizon: int, effects_abs: Sequence[float], alpha: float = 0.1, power_target: float = 0.8) float#

Minimum detectable effect from a contrast’s moving-block permutation null.

Parameters:
  • pre_contrast (np.ndarray) – The pre-period contrast series g_t = Y_t @ c, shape (T0,).

  • horizon (int) – Post-period horizon h the test averages over.

  • effects_abs (sequence of float) – Ascending grid of candidate effect sizes, in outcome units.

  • alpha, power_target (float) – Two-sided significance level and target power.

Returns:

float – Smallest grid effect reaching power_target, or inf if none does.

Standardized post-fit (shared across the MAREX family) — the compute_post_fit() / compute_power_analysis() / compute_smd() helpers that populate res.post_fit live outside this package so LEXSCM, MAREX, and PANGEO all consume the same diagnostics machinery:

Standardized post-fit diagnostics for synthetic control designs and the matching power-analysis surface that consumes them.

After any MAREX-family estimator (LEXSCM, MAREX, SYNDES, PANGEO, …) solves its design problem, downstream consumers (the SAGE dashboard, paper-style reports, comparison tables) all need the same numbers:

  • the post-treatment ATT, total effect, percentage lift, per-period gap;

  • pre / blank / post root-mean-squared-error of the synthetic gap;

  • inference scalars (p-value, CI bounds) when computed;

  • covariate-balance standardized mean differences (SMDs) when covariates were used in the design.

This module exposes one frozen dataclass (SyntheticControlPostFit) and three free functions:

  • compute_smd() – standalone, panel-independent SMD

    from any (cov_matrix, treated_w, control_w);

  • compute_post_fit() – the full diagnostic bundle from

    trajectories + boundaries + (optional) covariate matrix + (optional) inference;

  • compute_post_fit_marex() – adapter that builds the bundle from a

    MAREXResults + MAREXPanel pair.

The free-function entry points are deliberately small and reusable, so the LEXSCM / SYNDES / PANGEO equivalents can be added one-at-a-time without touching this module: they just compose the same primitives.

class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#

Bases: object

Minimum detectable effect at a single post-treatment horizon.

mde_absolute: float#
mde_pct: float#
post_periods: int#
power_at_observed: float | None = None#
se: float#
class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#

Bases: object

Standardized power-analysis output attached to SyntheticControlPostFit.

Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same SyntheticControlPostFit shape so every covariate-aware SCM-family estimator gets the surface for free.

headline#

MDE for the actual n_post horizon of the realised design.

Type:

MDEPoint

curve#

MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).

Type:

list of MDEPoint

alpha#

Two-sided significance level assumed.

Type:

float

power_target#

Target power the MDEs are computed at (default 0.80).

Type:

float

sigma_placebo#

Standard deviation of the placebo gap series used as the noise scale.

Type:

float

serial_correlation#

Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.

Type:

float

baseline#

Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.

Type:

float

method#

"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Reserved for future "monte_carlo" extensions.

Type:

str

alpha: float#
baseline: float#
curve: Tuple[MDEPoint, ...]#
headline: MDEPoint#
mde_by_horizon() Dict[int, float]#

{post_periods: mde_pct} for quick lookup.

method: str = 'analytical_ar1'#
power_target: float#
serial_correlation: float#
sigma_placebo: float#
class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#

Bases: object

Standardized post-fit diagnostics for a single synthetic control design.

Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.

ate: float | None = None#
ate_per_period: ndarray | None = None#
ate_percent: float | None = None#
ci_lower: float | None = None#
ci_upper: float | None = None#
control_series: ndarray#
covariate_names: Tuple[str, ...] = ()#
covariate_smd: Dict[str, float] | None = None#
covariate_smd_abs_max: float | None = None#
covariate_smd_control_vs_pop: Dict[str, float] | None = None#
covariate_smd_control_vs_pop_abs_max: float | None = None#
covariate_smd_control_vs_pop_squared_sum: float | None = None#
covariate_smd_squared_sum: float | None = None#
covariate_smd_treated_vs_pop: Dict[str, float] | None = None#
covariate_smd_treated_vs_pop_abs_max: float | None = None#
covariate_smd_treated_vs_pop_squared_sum: float | None = None#
cumulative_effect: ndarray | None = None#
gap_series: ndarray#
inference_method: str | None = None#
n_blank: int#
n_fit: int#
n_post: int#
p_value: float | None = None#
power: PowerAnalysis | None = None#
rmse_blank: float | None = None#
rmse_fit: float | None = None#
rmse_post: float | None = None#
total_effect: float | None = None#
treated_series: ndarray#
mlsynth.utils.post_fit.compute_post_fit(treated_series: ndarray, control_series: ndarray, *, n_fit: int, n_blank: int = 0, n_post: int | None = None, cov_matrix: ndarray | None = None, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None, treated_weights: ndarray | None = None, control_weights: ndarray | None = None, population_weights: ndarray | None = None, inference: Any | None = None, n_treated_units: int | None = None) SyntheticControlPostFit#

Compute a SyntheticControlPostFit from trajectories + boundaries.

The trajectories treated_series and control_series are the estimator’s own synthetic constructs (Σⱼ wⱼ Yⱼ and Σⱼ vⱼ Yⱼ in Abadie-Zhou notation). n_post defaults to len(treated_series) - n_fit - n_blank.

Covariate balance fields are populated when cov_matrix + treated_weights + control_weights are all supplied (the natural inputs for any MAREX-family design). The compute_smd() helper does the work, so the SMD numbers are exactly consistent with a standalone call to compute_smd().

Inference scalars are pulled from the estimator’s inference object via _extract_inference(), which knows about the four common shapes (LEXSCM Inference, MAREX MAREXInference, SYNDES SYNDESInference, or a plain dict). All inference fields are optional.

mlsynth.utils.post_fit.compute_post_fit_marex(raw, panel, *, cov_scales: ndarray | None = None) SyntheticControlPostFit#

Adapt a MAREXResults + MAREXPanel pair into a SyntheticControlPostFit.

Pulls the aggregate synthetic-treated / synthetic-control trajectories from raw.globres, the (T0, blank_periods) split from panel.T0 and panel.blank_periods, the inference object from raw.globres.inference, and the covariate matrix from panel.covariates (when present).

mlsynth.utils.post_fit.compute_power_analysis(post_fit: SyntheticControlPostFit, *, alpha: float = 0.05, power_target: float = 0.8, post_grid: Sequence[int] | None = None) PowerAnalysis#

Analytical MDE + power curve for a design’s SyntheticControlPostFit.

Uses the placebo / blank-period gap residuals (or the pre-period gap when no blank window was carved out) to estimate the noise standard deviation sigma_placebo and the AR(1) autocorrelation rho, then computes the minimum detectable effect for each horizon T in post_grid via the Gaussian formula

MDE(T) = (z_{1-alpha/2} + z_{power}) * sigma_placebo * sqrt(VIF(T, rho)),

where VIF(T, rho) = Var(mean of T AR(1) periods) / sigma_placebo^2. The headline MDE uses T = post_fit.n_post (the realised post window).

Parameters:
  • post_fit (SyntheticControlPostFit) – The standardized post-fit from any MAREX-family estimator.

  • alpha (float, default 0.05) – Two-sided significance level.

  • power_target (float, default 0.80) – Target power for the MDE.

  • post_grid (sequence of int, optional) – Post-treatment horizons at which to compute MDE. Defaults to a small geometric grid centered on post_fit.n_post so users see the detectability tradeoff vs. running the experiment longer.

Returns:

PowerAnalysis – Headline MDE + a curve over the requested horizons.

mlsynth.utils.post_fit.compute_smd(cov_matrix: ndarray, treated_weights: ndarray, control_weights: ndarray, *, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None) Dict[str, Any]#

Standardized mean differences between weighted treated and control means.

Parameters:
  • cov_matrix (ndarray, shape (N, M)) – Per-unit covariate values; rows align to treated_weights and control_weights.

  • treated_weights, control_weights (ndarray, shape (N,)) – Non-negative weights with disjoint supports. They are renormalised to sum to 1 internally (so callers may pass raw sums-to-K weights).

  • cov_names (sequence of str, optional) – Names for the M covariates. Defaults to ("cov_0", "cov_1", ...).

  • cov_scales (ndarray, shape (M,), optional) – Pre-computed per-covariate standardization scales (cross-unit std). Defaults to the std of cov_matrix columns. Passing the value already cached by build_covariate_matrix is the right move.

Returns:

  • dict with keys smd (the per-covariate dict), smd_abs_max,

  • and smd_squared_sum. Returns empty / NaN summaries if either weight

  • vector is all-zero.

mlsynth.utils.post_fit.to_effect_result(pf: SyntheticControlPostFit, *, time_periods: ndarray | None = None, intervention_time: Any | None = None, method_name: str | None = None, donor_weights: Dict[str, float] | None = None) Any#

Convert a SyntheticControlPostFit into a standardized EffectResult.

The single, family-wide adapter from the rich post-fit bundle to the contract’s EffectResult view, so every MAREX-family estimator (LEXSCM, MAREX, SYNDES, PANGEO) gets report for free instead of hand-copying fields. The realized effect’s standard scalars populate the standard sub-models; everything the contract has no slot for (per-period effects, cumulative effect, covariate SMDs, and the full post_fit object itself) is carried in additional_outputs so it remains discoverable.