Synthetic Design (SYNDES)


When to Use This Estimator#

Most synthetic-control work takes the treated unit as given and asks only how to weight the donors. SYNDES answers the prior question Doudchenko et al. [SYNDES] pose: when you are about to run an experiment and have pre-treatment outcome data, which units should you treat? Treating units at random – or by hand – leaves accuracy on the table, because the variance of the resulting treatment-effect estimate depends on which units are treated and how the rest are weighted into a synthetic comparison.

The authors argue this is exactly the regime of market-level experiments: treatment can only be applied to coarse units (media markets, regions, whole products), each unit is expensive to treat (so K is small), and interference or equilibrium effects rule out a more granular randomization. In that setting the experimenter both chooses the treated set and estimates the effect, and SYNDES does both at once – it minimizes the mean squared error of the average-treatment-effect-on-the-treated estimator directly over the joint choice of treatment assignment and synthetic weights. Use it when:

  • you control assignment and have a panel of pre-treatment outcomes;

  • you want a small, well-chosen treated set rather than a random one;

  • you are willing to solve a mixed-integer program for a provably optimal design (or to bound the achievable power of one).

Notation#

We observe an outcome \(Y_{it}\) for units \(i = 1, \ldots, N\) over pre-treatment periods \(t = 1, \ldots, T\). At \(t = T\) the experimenter assigns a binary treatment \(D_i \in \{0, 1\}\) to be applied over the \(S - T\) post-treatment periods (so \(S\) is the total number of periods), with exactly K treated units (\(\sum_i D_i = K\)). Each unit has potential outcomes \((Y_{it}(0), Y_{it}(1))\) and observed outcome \(Y_{it} = Y_{it}(D_i)\). Synthetic weights \(w\) live on the simplex (non-negative, summing to one on the relevant side). The estimand is the weighted average treatment effect on the treated (wATET) \(\tau = \sum_{i:D_i=1} w_i \tau_i\), where \(\tau_i\) is unit \(i\)’s additive effect.

Note

Notation bridge. The single-treated-unit synthetic-control canon (treated \(j=0\), donors \(1,\ldots,N\)) does not fit a design problem with K chosen treated units, so we follow the paper’s convention: units are indexed \(i\), the assignment vector \(D\) is itself a decision variable, and \(T\) denotes the pre-treatment length.

The design problem#

Under the outcome model \(Y_{it}(0) = \mu_{it} + \varepsilon_{it}\) with mean-zero, homoskedastic noise (\(\operatorname{Var}\varepsilon_{it} = \sigma^2\)) and additive effects \(Y_{it} = Y_{it}(0) + D_i \tau_i\), the conditional MSE of the per-unit synthetic-control estimator \(\hat\tau_i = Y_{i,T+1} - \sum_{j:D_j=0} w^i_j Y_{j,T+1}\) is

\[\mathbb{E}\bigl[(\hat\tau_i - \tau_i)^2 \mid D, w\bigr] = \Bigl(\mu_{i,T+1} - \textstyle\sum_{j:D_j=0} w^i_j \mu_{j,T+1}\Bigr)^2 + \sigma^2\Bigl(1 + \textstyle\sum_{j:D_j=0} (w^i_j)^2\Bigr).\]

The first term is a bias from imperfect pre-treatment matching; the second a variance that grows with the weight concentration. SYNDES minimizes the empirical, pre-period analogue of this MSE jointly over \((D, w)\) – the \(\sigma^2 \sum w^2\) term becomes the ridge penalty \(\lambda\) below. Because the choice of treated set makes the estimand itself stochastic, the target is the wATET for the units SYNDES selects, not a fixed population ATE.
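The decomposition can be checked numerically. The sketch below is illustrative only: the means, weights, and noise variance are made-up values, and `conditional_mse` is a hypothetical helper, not part of mlsynth. It evaluates the squared-bias and variance terms for a diffuse and a concentrated weight vector, then compares the formula with a Monte Carlo draw under a zero true effect.

```python
import numpy as np

def conditional_mse(mu_treated, mu_controls, w, sigma2):
    """Squared matching bias plus noise variance for one treated unit."""
    w = np.asarray(w, dtype=float)
    bias = mu_treated - float(np.dot(w, mu_controls))
    variance = sigma2 * (1.0 + float(np.sum(w ** 2)))
    return bias ** 2 + variance

# Hypothetical values: one treated unit and three controls.
mu_treated = 10.0
mu_controls = np.array([9.0, 10.5, 11.0])
sigma2 = 0.25

w_spread = np.array([1 / 3, 1 / 3, 1 / 3])   # diffuse weights
w_concentrated = np.array([0.0, 1.0, 0.0])   # all weight on one control

mse_spread = conditional_mse(mu_treated, mu_controls, w_spread, sigma2)
mse_concentrated = conditional_mse(mu_treated, mu_controls, w_concentrated, sigma2)

# Monte Carlo check of the formula for the diffuse weights.
rng = np.random.default_rng(0)
n_draws = 200_000
eps_treated = rng.normal(scale=np.sqrt(sigma2), size=n_draws)
eps_controls = rng.normal(scale=np.sqrt(sigma2), size=(n_draws, 3))
# With a true effect of zero, the estimation error is tau_hat itself.
tau_hat = (mu_treated + eps_treated) - (mu_controls + eps_controls) @ w_spread
mse_simulated = float(np.mean(tau_hat ** 2))

print(mse_spread, mse_concentrated, mse_simulated)
```

In this toy case the concentrated weights lose on both terms: they match the treated mean less well and they double the variance multiplier.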

The three MIP formulations#

The joint optimization over assignment and weights is a bilevel / mixed-integer program and is NP-hard. Doudchenko et al. give three forms, all exposed through mode and all solved as MIPs (auxiliary variables linearize the weight-assignment products). They differ in how the treated and control sides are weighted.

Per-unit (mode="per_unit")#

A separate synthetic control for every treated unit:

\[\min_{D, \{w^i_j\}} \; \frac{1}{KT} \sum_{i} \sum_{t} D_i \Bigl(Y_{it} - \textstyle\sum_j w^i_j (1 - D_j) Y_{jt}\Bigr)^2 + \frac{\lambda}{K} \sum_i \sum_j D_i (w^i_j)^2,\]

subject to \(w^i_j \ge 0\), \(\sum_i D_i = K\), and \(\sum_j w^i_j(1 - D_j) = 1\) for each treated \(i\). Each treated unit draws its own simplex of control weights.
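To make the per-unit objective concrete, here is a minimal sketch that evaluates it for a fixed assignment and fixed weights. It does not solve the MIP; `per_unit_objective` and the toy panel are assumptions made for illustration, and the (T, N) layout of `Y` follows the `Y_pre` convention documented further down.

```python
import numpy as np

def per_unit_objective(Y, D, W, lam):
    """Evaluate the per-unit objective for a fixed design.

    Y   : (T, N) pre-period outcomes, rows = periods, columns = units.
    D   : (N,) 0/1 assignment vector.
    W   : (N, N) array; row i holds treated unit i's weights over all units.
    lam : ridge penalty on the squared weights.
    """
    T, _ = Y.shape
    treated = np.flatnonzero(D)
    K = treated.size
    fit, penalty = 0.0, 0.0
    for i in treated:
        synthetic = Y @ (W[i] * (1 - D))      # only control units contribute
        fit += np.sum((Y[:, i] - synthetic) ** 2)
        penalty += np.sum(W[i] ** 2)
    return float(fit / (K * T) + lam * penalty / K)

# Toy panel in which both treated units are exact convex combinations of controls.
rng = np.random.default_rng(0)
T, N = 10, 5
Y = rng.normal(size=(T, N))
Y[:, 0] = Y[:, 2]
Y[:, 1] = 0.5 * Y[:, 3] + 0.5 * Y[:, 4]

D = np.array([1, 1, 0, 0, 0])
W = np.zeros((N, N))
W[0, 2] = 1.0
W[1, 3] = W[1, 4] = 0.5

objective = per_unit_objective(Y, D, W, lam=0.2)
print(objective)
```

Because the fit is exact here, only the ridge term remains, which shows how the penalty prefers the evenly spread weights of the second treated unit over the single-donor match of the first.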

Two-way global (mode="two_way_global")#

A single weight vector applied to both sides of one global contrast:

\[\min_{D, \{w_i\}} \; \frac{1}{T} \sum_t \Bigl(\textstyle\sum_i w_i D_i Y_{it} - \sum_i w_i (1 - D_i) Y_{it}\Bigr)^2 + \lambda \sum_i w_i^2,\]

subject to \(w_i \ge 0\), \(\sum_i D_i = K\), \(\sum_i w_i D_i = 1\) and \(\sum_i w_i (1 - D_i) = 1\). mlsynth linearizes \(q_i = w_i D_i\) and enforces the two normalizations with \(\sum_i q_i = 1\), \(\sum_i w_i = 2\), so the per-period contrast is \((2q - w)^\top Y_t\).
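The identity behind the \((2q - w)^\top Y_t\) form can be verified directly. The following sketch uses a random toy panel and a hand-picked treated set (both assumptions made for illustration) to confirm that the linearized contrast equals the treated-weighted path minus the control-weighted path, and that the two normalizations imply \(\sum_i q_i = 1\) and \(\sum_i w_i = 2\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 6, 12
Y = rng.normal(size=(T, N))            # rows = periods, columns = units

D = np.zeros(N)
D[[0, 3]] = 1.0                        # hypothetical treated set, K = 2

# One weight vector, normalized to one on each side.
w = np.empty(N)
w[D == 1] = [0.4, 0.6]
w[D == 0] = [0.1, 0.2, 0.3, 0.4]

q = w * D                              # the product the MIP linearizes

contrast_direct = Y @ (w * D) - Y @ (w * (1 - D))
contrast_linear = Y @ (2 * q - w)

lam = 0.5
objective = float(np.mean(contrast_linear ** 2) + lam * np.sum(w ** 2))
print(np.allclose(contrast_direct, contrast_linear), objective)
```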

One-way global (mode="one_way_global")#

The two-way program with the treated weights pinned equal (a simple average, \(w_i = 1/K\) on the treated), while the control side stays a free synthetic control:

\[\min_{D, c} \; \frac{1}{T}\sum_t \Bigl(\tfrac{1}{K}\textstyle\sum_i D_i Y_{it} - \sum_i c_i Y_{it}\Bigr)^2 + \lambda\Bigl(\tfrac{1}{K} + \textstyle\sum_i c_i^2\Bigr),\]

subject to \(c_i \ge 0\), \(\sum_i c_i = 1\), \(c_i \le 1 - D_i\) (treated units carry no control weight) and \(\sum_i D_i = K\).

Warning

One-way global is not difference-in-means. Only the treated side is fixed at \(1/K\); the control side c is a free synthetic control to be optimized. Pinning both sides (treated \(1/K\), control \(1/(N-K)\)) would be the randomized difference-in-means baseline, a different (and weaker) design.
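The distinction can be illustrated numerically. In the sketch below the treated set is held fixed and a crude random search over the control simplex stands in for the optimizer; this is an assumption made for brevity, not how mlsynth solves the problem. Since the difference-in-means control weights are one feasible point, the best candidate can never do worse than them.

```python
import numpy as np

def one_way_objective(Y, D, c, lam):
    """One-way global objective for a fixed treated set D and control weights c."""
    K = D.sum()
    gap = Y @ (D / K) - Y @ c
    return float(np.mean(gap ** 2) + lam * (1.0 / K + np.sum(c ** 2)))

rng = np.random.default_rng(2)
N, T, K = 7, 15, 2
Y = rng.normal(loc=10.0, size=(T, N))  # toy pre-period panel
D = np.zeros(N)
D[:K] = 1.0
lam = 0.1

# Difference-in-means: equal weight on every control unit.
c_dim = (1 - D) / (N - K)
obj_dim = one_way_objective(Y, D, c_dim, lam)

# Random feasible control weights: zero on treated units, simplex on controls.
candidates = [c_dim]
for _ in range(2000):
    c = np.zeros(N)
    c[D == 0] = rng.dirichlet(np.ones(N - K))
    candidates.append(c)

obj_best = min(one_way_objective(Y, D, c, lam) for c in candidates)
print(obj_dim, obj_best)
```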

Assumptions / Remarks.

Assumption 1 (additive effects, homoskedastic noise). Outcomes follow \(Y_{it}(0) = \mu_{it} + \varepsilon_{it}\) with \(\mathbb{E}\varepsilon_{it}=0\), \(\operatorname{Var}\varepsilon_{it} = \sigma^2\), and treatment adds \(\tau_i\). Remark. This is what makes the MSE above decompose into the matching-bias and weight-variance terms the MIP minimizes; \(\sigma^2\) is unknown and is supplied through lam (default: the pre-period sample variance).

Assumption 2 (admissible weights). Weights are non-negative and normalized on their side (a convex combination), so the synthetic comparison does not extrapolate. Remark. The simplex is what gives the design an interpretable “synthetic unit” reading and bounds the variance term.

Assumption 3 (homogeneous vs. heterogeneous effects). When effects are homogeneous (\(\tau_i \equiv \tau\)) any weighted average recovers \(\tau\), so the global modes can choose weights freely to minimize MSE. When effects are heterogeneous, different weightings target different estimands; per_unit (or the fixed-treated-weight one_way_global) keeps the estimand well-defined. Remark. This is the authors’ guidance for choosing a mode – it is about which wATET you are willing to target, not just fit.

Assumption 4 (sharp null for inference). The permutation test targets the sharp null \(\tau_i = 0\) for all treated \(i\). Remark. Correct test size holds under the exchangeability the moving-block permutation imposes; the authors note this requires “rather strong assumptions” in finite samples.

Inference and minimum detectable effect#

For any mode the fitted design yields a unit-level contrast vector c such that the ATT estimate at period \(t\) is \(Y_t^\top c\) (treated weights minus control weights; for per_unit the \(K\) per-unit estimators are averaged). SYNDES tests the sharp null with the moving-block permutation test of Chernozhukov, Wuethrich and Zhu (2021): the post-period mean contrast is compared to the distribution obtained by cyclically shifting the stacked panel.
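A simplified sketch of a moving-block permutation p-value follows. It assumes the input is the stacked pre- and post-period contrast series \(Y_t^\top c\) and uses the absolute post-period mean as the test statistic; `moving_block_pvalue` is a hypothetical helper, and the mlsynth implementation may differ in its statistic and in other details.

```python
import numpy as np

def moving_block_pvalue(contrast, n_post):
    """Share of cyclic shifts whose post-window statistic is at least the observed one."""
    contrast = np.asarray(contrast, dtype=float)
    n_total = contrast.size

    def stat(series):
        return abs(series[-n_post:].mean())

    observed = stat(contrast)
    # Shift 0 is the identity, so the p-value is never below 1 / n_total.
    shifted = np.array([stat(np.roll(contrast, s)) for s in range(n_total)])
    return float(np.mean(shifted >= observed))

# Perfect pre-period balance followed by a unit effect in 6 post periods.
effect_series = np.concatenate([np.zeros(18), np.ones(6)])
p_effect = moving_block_pvalue(effect_series, n_post=6)

# A flat series carries no evidence against the null.
flat_series = np.full(24, 0.5)
p_flat = moving_block_pvalue(flat_series, n_post=6)

print(p_effect, p_flat)
```

With 24 periods the smallest attainable p-value is 1/24, which is one reason short panels limit the resolution of this test.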

For design-time power, power_analysis() returns a per-horizon minimum detectable effect (MDE). Because the moving-block test averages a contrast over correlated periods, the relevant null standard error is the Newey-West (Bartlett HAC) long-run std of the per-period contrast, not the i.i.d. \(\sigma_{\text{perm}}/\sqrt{n_{\text{post}}}\):

\[\mathrm{MDE}(n_{\text{post}}) = (z_{1-\alpha/2} + z_{1-\beta})\, \frac{\hat\sigma_{\mathrm{LR}}}{\sqrt{n_{\text{post}}}},\]

where \(\hat\sigma_{\mathrm{LR}}\) is reported as long_run_sigma. The MDE reduces to the textbook i.i.d. formula when the contrast series is serially uncorrelated.
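A minimal sketch of this calculation is given below. The function names, the lag choice, and the simulated AR(1) contrast are all assumptions made for illustration; this is not the `power_analysis()` implementation. It computes a Bartlett-kernel (Newey-West) long-run standard deviation and plugs it into the MDE formula.

```python
import numpy as np
from statistics import NormalDist

def newey_west_sigma(x, n_lags):
    """Bartlett-kernel long-run standard deviation of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    lrv = float(np.dot(x, x) / n)                      # lag-0 autocovariance
    for k in range(1, n_lags + 1):
        gamma_k = float(np.dot(x[k:], x[:-k]) / n)     # lag-k autocovariance
        lrv += 2.0 * (1.0 - k / (n_lags + 1)) * gamma_k
    return float(np.sqrt(max(lrv, 0.0)))

def mde(sigma_lr, n_post, alpha=0.05, power=0.80):
    """Two-sided normal-theory minimum detectable effect."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * sigma_lr / np.sqrt(n_post)

# Simulated AR(1) contrast with positive serial correlation.
rng = np.random.default_rng(3)
n, rho = 400, 0.6
contrast = np.zeros(n)
for t in range(1, n):
    contrast[t] = rho * contrast[t - 1] + rng.normal(scale=0.1)

sigma_iid = float(np.std(contrast))
sigma_lr = newey_west_sigma(contrast, n_lags=8)
print(sigma_iid, sigma_lr, mde(sigma_lr, n_post=6))
```

With positively autocorrelated contrasts the long-run value exceeds the naive standard deviation, so an MDE based on the i.i.d. formula would be too optimistic.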

Standardized Post-Fit and Power Analysis#

Every call to SYNDES.fit() attaches a SyntheticControlPostFit to res.post_fit — the same diagnostic surface used by LEXSCM, MAREX, and PANGEO. This is the one-stop container downstream consumers (dashboards, paper-style reports, comparison tables) read from, regardless of which member of the family produced the design:

pf = res.post_fit                          # SyntheticControlPostFit
pf.ate, pf.ate_percent, pf.total_effect    # treatment-effect scalars
pf.rmse_fit, pf.rmse_post                  # pre / post fit quality
pf.p_value, pf.ci_lower, pf.ci_upper       # permutation inference
pf.power                                   # PowerAnalysis (see below)

The synthetic treated / control trajectories used to populate post_fit are the per-unit weighted aggregates Y[:, j] @ treated_weights and Y[:, j] @ control_weights over the full timeline. SYNDES has no pre-period blank window (its inference is a moving-block permutation on the post-period rather than a placebo test on a held-out pre-tail), so pf.n_blank = 0 and the power-analysis module falls back to the pre-period gap as its placebo proxy. Mathematically the MDE surface is the same Gaussian + AR(1) construction used across the family:

\[\mathrm{MDE}(T) = \bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr) \cdot \hat\sigma_{\text{placebo}} \cdot \sqrt{\mathrm{VIF}(T, \hat\rho)},\]

with \(\hat\sigma_{\text{placebo}}\) the per-period contrast SD on the pre-period (the SYNDES paper’s “pre-period imbalance”), \(\hat\rho\) the lag-1 autocorrelation of that contrast clipped to \((-0.99, 0.99)\), and \(\mathrm{VIF}(T, \rho) = \tfrac{1}{T}\bigl(1 + 2\sum_{k=1}^{T-1} (1-k/T)\rho^k\bigr)\) the AR(1) variance-inflation factor (textbook \(1/T\) when \(\rho = 0\)). See Synthetic Controls for Experimental Design (MAREX) for the full derivation; the same module powers every estimator in the family.
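The variance-inflation factor is simple to compute by hand. The sketch below implements the formula as written above; `ar1_vif` and `mde_ar1` are hypothetical names rather than mlsynth functions, and the inputs are illustrative.

```python
import numpy as np
from statistics import NormalDist

def ar1_vif(T, rho):
    """Variance of a T-period mean of a unit-variance AR(1) series."""
    k = np.arange(1, T)
    return float((1.0 + 2.0 * np.sum((1.0 - k / T) * rho ** k)) / T)

def mde_ar1(sigma_placebo, T, rho, alpha=0.05, power=0.80):
    """Gaussian MDE with the AR(1) variance-inflation factor."""
    rho = float(np.clip(rho, -0.99, 0.99))   # same clipping range as in the text
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * sigma_placebo * np.sqrt(ar1_vif(T, rho))

# Positive autocorrelation raises the MDE at every horizon.
for horizon in (4, 8, 12):
    print(horizon, mde_ar1(0.1, horizon, rho=0.0), mde_ar1(0.1, horizon, rho=0.5))
```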

p = res.post_fit.power                      # PowerAnalysis
p.headline.mde_absolute                     # MDE at the realised T_post
p.headline.mde_pct                          # ... as % of post-period baseline
p.headline.power_at_observed                # power to detect res.post_fit.ate
p.curve                                     # tuple of MDEPoint per horizon

Power-analysis failures (e.g. degenerate pre-period contrast) never break a fit; res.post_fit.power is simply left as None in that case. To compute on a non-default horizon grid or significance level call compute_power_analysis() directly.

SYNDES accepts either a scalar T0 (count of pre-treatment periods) or post_col (a 0/1 column marking the post-treatment window). Both express the same pre/post split — passing post_col is just the more ergonomic form when the panel already carries an experiment-window flag. If both are supplied and disagree, post_col wins and a UserWarning is emitted so the override is visible.

Choosing among the modes#

| Mode | Weighting | Use when |
|---|---|---|
| per_unit | one synthetic control per treated unit | effects are heterogeneous; you want unit-level estimates and the tightest per-unit fit |
| two_way_global | one weight vector, both sides free | effects are homogeneous; you want the lowest-MSE single contrast |
| one_way_global | treated fixed at 1/K, control free | heterogeneous effects but a simple, fixed treated average is the target estimand |

Solver runtime and the 5%-gap default#

The SYNDES MIP is structurally hard. The two_way_global formulation contains a bilinear product \(q_i = w_i D_i\) between the continuous weight \(w_i\) and the binary assignment \(D_i\), encoded via the standard McCormick linearisation (\(q_i \le D_i\), \(q_i \le w_i\), \(q_i \ge w_i - (1 - D_i)\)). McCormick is the tightest linear relaxation of a bilinear term, but it is still loose at the root LP, so the SCIP optimality gap closes slowly on long panels even when the primal incumbent is essentially optimal. For example, on the Walmart weekly-sales panel (\(N = 45\), \(T_0 = 128\), \(K = 3\)) SCIP finds the optimal treated set within a minute, then spends an additional 30+ minutes proving optimality by climbing the dual bound. The treated set itself does not change during this proof phase.
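The looseness of the relaxation is easy to see on a single term. The sketch below computes the range of \(q\) allowed by the McCormick constraints for given \(w\) and \(D\); it additionally assumes the bound \(q \ge 0\) and \(w \in [0, 1]\), which are not spelled out above, and `mccormick_interval` is a hypothetical helper. At binary \(D\) the interval collapses to the product \(wD\); at a fractional \(D\), as in the LP relaxation, it does not.

```python
def mccormick_interval(w, D):
    """Feasible range of q under q <= D, q <= w, q >= w - (1 - D), q >= 0."""
    lower = max(0.0, w - (1.0 - D))
    upper = min(D, w)
    return lower, upper

# Binary assignment: the interval pins q to the product w * D.
binary_gaps = []
for w in (0.0, 0.3, 1.0):
    for D in (0.0, 1.0):
        lower, upper = mccormick_interval(w, D)
        binary_gaps.append((abs(lower - w * D), abs(upper - w * D)))

# Fractional assignment (root LP): q may range over a whole interval.
lower_frac, upper_frac = mccormick_interval(0.5, 0.5)
fractional_width = upper_frac - lower_frac

print(binary_gaps, fractional_width)
```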

This matters because, in practice, SCM bias bounds do not require the solver to reach optimality. Abadie and Zhao (2026, eq. 10 discussion, pp. 10 and 13), writing about their formulation, state explicitly:

“we do not strictly require optimality of \(\{w^*, v^*\}\), provided \(\{w^*, v^*\}\) is feasible and \(\bar{X} - \sum_j w^*_j X_j \approx 0\) and \(X_j - \sum_i v^*_{ij} X_i \approx 0\) for all j such that \(w^*_j > 0\).”

Their Theorems 1 and 2 are written in terms of the residual fit, not the QP optimality gap, so a 5%-suboptimal solution that achieves approximate balance inherits the same econometric guarantees as a proven-optimal one. Abadie and Zhao study a different problem from SYNDES, but the same reasoning carries over.

mlsynth therefore exposes two SCIP-knob fields on SYNDESConfig and defaults them to the production-friendly setting:

  • gap_limit (default 0.05, i.e. 5%) – handed to SCIP as scip_params={"limits/gap": value}. The MIP terminates as soon as the primal-dual gap is within this fraction of the incumbent.

  • time_limit (default 60.0 seconds) – wall-clock cap on the solve, passed through as scip_params={"limits/time": value}.

With these defaults Walmart-scale designs return in under a minute with a known \(\le 5\%\) gap to the (provable) optimum. Tighten either knob – or set it to None – for research-grade optimality:

from mlsynth import SYNDES

# Shared configuration; df is assumed to be a long balanced panel
# that carries a 0/1 "post" column.
base = {
    "df": df, "outcome": "y", "unitid": "unit", "time": "time",
    "K": 3, "mode": "two_way_global", "post_col": "post",
}

# Default: 5% gap, 60s wall-clock; production-suitable.
SYNDES(base).fit()

# Loosen the gap to return in seconds when you just need a
# plausible design for prototyping.
SYNDES({**base, "gap_limit": 0.25, "time_limit": 5.0}).fit()

# Disable both limits for a proven-optimality run. Be
# prepared for hours-long solves on long panels.
SYNDES({**base, "gap_limit": None, "time_limit": None}).fit()

The MIP status codes user_limit and user_limit_inaccurate (SCIP’s “stopped early with a valid incumbent”) are accepted as successful returns alongside the standard optimal / optimal_inaccurate codes — again, because the theory only needs the incumbent’s feasibility, not the proof of optimality.

Note

If you have a commercial solver (Gurobi, CPLEX, MOSEK) installed, pass solver="GUROBI" and the MIP closes the gap orders of magnitude faster than SCIP — these solvers handle MIQP / MIQCP relaxations natively. The default of SCIP is chosen because it ships with mlsynth (via pyscipopt) with no license required.

Multiple Treatment Arms#

When a single experiment runs several treatment arms (e.g. different creatives, offers, or price points, each rolled out to its own set of markets), pass an arm column. SYNDES then solves the design problem independently within each arm’s units and returns a SYNDESMultiArmResults — a dict of per-arm results keyed by arm label. Every option (mode, K, lam, inference) applies within each arm, and K is interpreted per arm (so it must be smaller than the smallest arm’s unit count).

res = SYNDES({
    "df": df, "outcome": "sales", "unitid": "DMA", "time": "week",
    "arm": "treat",                 # categorical arm label per unit
    "K": 3, "mode": "two_way_global", "post_col": "post",
    "run_inference": True,
}).fit()

res.arm_designs["A"]                 # full SYNDESResults for arm A
res.atet_by_arm()                    # {arm: ATET}
res.selected_unit_labels_by_arm()    # {arm: treated units}

The arm column must be constant within each unit over time. arm is not compatible with the global costs/budget constraint (the cost vector is defined over all units, not per arm). When arm is None (default), a single SYNDESResults is returned, exactly as before.

Example#

SYNDES takes a long balanced panel and a pre/post split (post_col or T0). The example below is self-contained – it generates a small panel and runs end to end (pyscipopt ships with mlsynth, so the SCIP solver is available on install). The same call shape serves all three designs.

import numpy as np
import pandas as pd
from mlsynth import SYNDES, power_analysis

# A small balanced panel: 8 units, 20 periods (last 6 are post-treatment).
rng = np.random.default_rng(0)
n_units, n_periods, n_post = 8, 20, 6
factors = rng.normal(size=(n_periods, 2))
loadings = rng.uniform(0.3, 1.0, size=(n_units, 2))
level = rng.uniform(8.0, 12.0, size=n_units)          # positive unit baselines
Y = level + factors @ loadings.T + rng.normal(scale=0.3, size=(n_periods, n_units))
df = pd.DataFrame(
    [{"unit": j, "time": t, "Y": float(Y[t, j]),
      "post": int(t >= n_periods - n_post)}
     for j in range(n_units) for t in range(n_periods)]
)

res = SYNDES({
    "df": df, "outcome": "Y", "unitid": "unit", "time": "time",
    "K": 3, "mode": "two_way_global", "post_col": "post",
    "run_inference": True, "alpha": 0.05, "solver": "SCIP",
}).fit()

print(res.design.selected_unit_labels)   # which units to treat
print(res.design.control_weights)        # synthetic-control weights
print(res.design.pre_fit_rmse)           # pre-period balance of the design
print(res.inference.atet, res.inference.p_value)

mde = power_analysis(res, n_post_periods=[4, 8, 12], power=0.80)
print(mde.to_dataframe())                # minimum detectable effect by horizon

A budget constraint (costs + budget) adds \(\sum_i \text{cost}_i D_i \le B\) to the MIP; mode="two_way_global" also accepts K=None to let the program choose the number of treated units.
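On a toy problem the effect of the budget constraint can be seen by enumeration. The sketch below uses made-up costs and a made-up budget and lists which size-K treated sets remain feasible; the MIP enforces the same inequality without enumerating.

```python
from itertools import combinations

costs = [4.0, 1.0, 3.0, 2.0, 5.0]   # hypothetical per-unit treatment costs
budget = 6.0
K = 2

all_sets = list(combinations(range(len(costs)), K))
affordable = [s for s in all_sets if sum(costs[i] for i in s) <= budget]

print(len(all_sets), len(affordable))
print(affordable)
```

Only the affordable sets remain candidates, so the optimal design under a budget can differ from the unconstrained one whenever the best-balancing units are also the most expensive.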

Verification#

Note

Simulation (all three designs). Following the paper’s Section 5, each replication draws a fresh noisy panel (stationary AR(1) factors + unit levels), re-solves the design MIP on the pre-period, estimates the ATT on the post-period and runs the moving-block permutation test. Setup: \(N=10\) units, \(T_{\text{pre}}=18\), \(T_{\text{post}}=6\), \(K=3\), \(\sigma=0.25\), 40 replications; the effect is injected at \(\tau\) equal to the mean analytic MDE (0.165). Rejection at the 5% level:

| design | MDE | bias | RMSE | size | power |
|---|---|---|---|---|---|
| per_unit | 0.157 | 0.020 | 0.098 | 0.12 | 0.50 |
| two_way_global | 0.166 | 0.013 | 0.095 | 0.12 | 0.50 |
| one_way_global | 0.171 | -0.004 | 0.115 | 0.23 | 0.45 |
| random DiM (baseline) | | 0.096 | 0.982 | 0.15 | 0.25 |

The paper’s headline result reproduces: all three SYNDES designs are approximately unbiased and cut estimator RMSE roughly ten-fold versus a randomized difference-in-means design (~0.10 vs. 0.98). At this short pre-period the moving-block permutation test is mildly over-sized and under-powered: the design-optimized contrast tightens the pre-period permutation null, and the analytic MDE is only a normal-theory benchmark. This is a finite-sample inference caveat (the authors note correct sizes hold “under rather strong assumptions”) that shrinks as the pre-period grows. The simulation script ships alongside the estimator’s tests.

Core API#

Synthetic Design (SYNDES) estimator.

Implements the three mixed-integer programming formulations of Doudchenko, Khosravi, Pouget-Abadie, Lahaie, Lubin, Mirrokni, Spiess, and Imbens (2021), “Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls” (arXiv:2112.00278). SYNDES jointly selects treated units and synthetic-control weights by solving a single MIP that minimises a pre-period estimate of the mean squared error of the resulting ATT estimator.

The three formulations exposed via the mode field correspond directly to Section 3 of the paper:

  • "per_unit" – separate SC weights w_{ji} per treated unit i. Trades a tighter per-unit fit against a richer parameter space.

  • "two_way_global" – single weight vector w_i applied symmetrically to the treated and control contrasts. Recommended when treatment effects are homogeneous.

  • "one_way_global" – two_way_global with the treated weights pinned to 1/K; the SC step only adjusts the control combination. Easiest to interpret as a “weighted difference-in-means”.

Inference defaults to the moving-block permutation test of Chernozhukov, Wuethrich, and Zhu (2021), applied uniformly to all three modes via the shared contrast-vector dispatch in mlsynth.utils.syndes_helpers.inference.

Two additional MIP features round out the estimator:

  • Budget constraint (paper section 1): supply costs (length N) and budget to add sum_i c_i D_i <= B to the MIP.

  • Annealed relaxation (mode="two_way_global_annealed"): simulated-annealing alternative to the MIP for the symmetric two-way formulation. Useful when a commercial MIP solver is unavailable or the problem size makes the MIP impractical.

Post-fit, see mlsynth.power_analysis() for per-horizon minimum-detectable-effect tables.

class mlsynth.estimators.syndes.SYNDES(config: SYNDESConfig | dict)#

Bases: object

Synthetic Design (Doudchenko et al. 2021) estimator.

Parameters:

config (SYNDESConfig or dict) – Configuration object. See mlsynth.config_models.SYNDESConfig.

Returns:

SYNDESResults or RelaxedSolverResults – For the three MIP modes, a SYNDESResults container with the optimised design and optional permutation inference. For mode="two_way_global_annealed" the relaxed solver returns a RelaxedSolverResults container with design, trace, inputs, and optional inference.

fit() → SYNDESResults | RelaxedSolverResults | SYNDESMultiArmResults#

Solve the MIP (or relaxation), run optional inference, return results.

Returns a single result when no arm column is configured; otherwise solves the SYNDES design independently within each arm’s units and returns a SYNDESMultiArmResults keyed by arm label.

Configuration#

class mlsynth.config_models.SYNDESConfig(*, df: DataFrame, outcome: str, unitid: str, time: str, K: Annotated[int | None, Gt(gt=0)] = None, mode: Literal['per_unit', 'two_way_global', 'one_way_global', 'two_way_global_annealed'] = 'two_way_global', lam: Annotated[float | None, Ge(ge=0.0)] = None, T0: Annotated[int | None, Gt(gt=0)] = None, post_col: str | None = None, alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.1, run_inference: bool = True, solver: Any = 'SCIP', relaxed_max_iter: Annotated[int, Gt(gt=0)] = 40, relaxed_decay: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.97, gap_limit: Annotated[float | None, Ge(ge=0.0), Lt(lt=1.0)] = 0.05, time_limit: Annotated[float | None, Gt(gt=0.0)] = 60.0, display_graph: bool = False, verbose: bool = False, costs: List[float] | None = None, budget: Annotated[float | None, Gt(gt=0.0)] = None, arm: str | None = None)#

Configuration for the Synthetic Design (SYNDES) estimator.

Implements the three MIP formulations of Doudchenko, Khosravi, Pouget-Abadie, Lahaie, Lubin, Mirrokni, Spiess, and Imbens (2021), “Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls” (arXiv:2112.00278). The estimator jointly chooses

  • which units to treat (binary assignment D), and

  • the synthetic-control weights w used to build the counterfactual,

by minimising a single mean-squared-error objective. The three paper formulations, each with a different geometry over the treated/control sample-variance terms (Theorem 1 of the paper), are exposed alongside one mlsynth-specific relaxation:

  • "per_unit" – separate SC weights for each treated unit (paper’s “per-unit” problem).

  • "two_way_global" – single weight vector applied symmetrically to treated and control (paper’s “two-way global” problem).

  • "one_way_global" – "two_way_global" with equal weights pinned on the treated set (paper’s “one-way global” problem).

  • "two_way_global_annealed" – simulated-annealing relaxation of two_way_global (mlsynth-specific extension; not in the paper).

Parameters:
  • K (int or None) – Number of treated units. Required for per_unit and one_way_global. May be None for two_way_global (Doudchenko et al. 2021, paragraph after eq. 9, note that the K-constraint is mathematically optional in the symmetric formulation); when None the MIP picks the cardinality of the treated set endogenously, with at least one treated and one control unit.

  • mode (str) – Paper-aligned mode name (see above).

  • lam (float or None) – Penalty on the squared weights. None defaults to the sample variance of the pre-treatment outcomes (Section 6 of the paper).

  • T0 (int or None) – Number of pre-treatment periods. If neither T0 nor post_col is supplied, the entire panel is treated as pre-treatment (design-only / planning mode – no post period, so no ATT/inference is produced).

  • post_col (str or None) – Optional 0/1 column identifying post-treatment periods.

  • alpha (float) – Two-sided significance level for the permutation test.

  • run_inference (bool) – Whether to run the moving-block permutation test (Chernozhukov-Wuethrich-Zhu (2021) style; see Appendix A.4 of the paper).

  • solver (Any) – CVXPY-compatible MIP solver. Defaults to SCIP.

  • display_graph (bool) – Whether to plot the design.

  • verbose (bool) – Solver verbosity.

K: int | None#
T0: int | None#
alpha: float#
arm: str | None#
budget: float | None#
costs: List[float] | None#
display_graph: bool#
gap_limit: float | None#
lam: float | None#
mode: Literal['per_unit', 'two_way_global', 'one_way_global', 'two_way_global_annealed']#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

post_col: str | None#
relaxed_decay: float#
relaxed_max_iter: int#
run_inference: bool#
solver: Any#
time_limit: float | None#
verbose: bool#

Result Containers#

SYNDES.fit() returns a SYNDESResults, bundling the optimized SYNDESDesign (assignment, treated/control/contrast weights, the pre-period contrast_series and pre_fit_rmse, objective value), the prepared SYNDESInputs, and optional SYNDESInference. The mode="two_way_global_annealed" path instead returns a RelaxedSolverResults.

Structured containers for the SYNDES synthetic design pipeline.

class mlsynth.utils.syndes_helpers.structures.SYNDESDesign(mode: str, objective_value: float, lambda_value: float, assignment: ndarray, selected_unit_indices: ndarray, selected_unit_labels: ndarray, assignment_by_unit: Dict[Any, int], w: ndarray | None = None, q: ndarray | None = None, z: ndarray | None = None, raw_results: Dict[str, Any] = <factory>, treated_weights: ndarray | None = None, control_weights: ndarray | None = None, contrast_weights: ndarray | None = None, contrast_series: ndarray | None = None, pre_fit_rmse: float | None = None)#

Bases: object

Optimized SYNDES design solution.

Contains the outcome of solving the mixed-integer program, including treatment assignment and synthetic control weights.

Parameters:
  • mode (str) – Optimization mode used.

  • objective_value (float) – Optimal value of the objective function.

  • lambda_value (float) – Regularization parameter used in optimization.

  • assignment (np.ndarray) – Binary treatment assignment vector D.

  • selected_unit_indices (np.ndarray) – Integer indices of treated units.

  • selected_unit_labels (np.ndarray) – Original labels of treated units.

  • assignment_by_unit (dict) – Mapping from unit label to treatment indicator.

  • w (np.ndarray or None, optional) – Synthetic control weights (global or per-unit depending on mode).

  • q (np.ndarray or None, optional) – Auxiliary optimization variables (mode-dependent).

  • z (np.ndarray or None, optional) – Additional binary or continuous decision variables.

  • raw_results (dict, optional) – Raw solver output.

  • treated_weights (np.ndarray or None, optional) – Normalized weights over treated units.

  • control_weights (np.ndarray or None, optional) – Normalized weights over control units.

  • contrast_weights (np.ndarray or None, optional) – Difference weights used in global estimators.

  • contrast_series (np.ndarray or None, optional) – Pre-period fitted contrast Y_pre @ contrast_weights, shape (T_pre,) – the per-period treated-minus-synthetic gap the design balances. The design’s “prediction” of the (zero) pre-period effect.

  • pre_fit_rmse (float or None, optional) – Root mean squared pre-period contrast, sqrt(mean(contrast_series^2)) – how tightly the design balances treated and control before launch.

Notes

This object represents the solution to the design stage only. Post-treatment estimation and inference are handled separately.

assignment: ndarray#
assignment_by_unit: Dict[Any, int]#
contrast_series: ndarray | None = None#
contrast_weights: ndarray | None = None#
control_weights: ndarray | None = None#
lambda_value: float#
mode: str#
objective_value: float#
pre_fit_rmse: float | None = None#
q: ndarray | None = None#
raw_results: Dict[str, Any]#
selected_unit_indices: ndarray#
selected_unit_labels: ndarray#
treated_weights: ndarray | None = None#
w: ndarray | None = None#
z: ndarray | None = None#
class mlsynth.utils.syndes_helpers.structures.SYNDESInference(atet: float, p_value: float, reject: bool, alpha: float, method: str, null_stats: ndarray | None = None)#

Bases: object

Permutation-based inference results for SYNDES.

Parameters:
  • atet (float) – Estimated average treatment effect on treated units.

  • p_value (float) – Permutation-based p-value.

  • reject (bool) – Whether the null hypothesis is rejected at level alpha.

  • alpha (float) – Significance level used for the test.

  • method (str) – Name of inference procedure.

  • null_stats (np.ndarray or None) – Empirical null distribution of test statistics.

Notes

Inference is implemented for all three MIP modes through a shared contrast vector.

alpha: float#
atet: float#
method: str#
null_stats: ndarray | None = None#
p_value: float#
reject: bool#
class mlsynth.utils.syndes_helpers.structures.SYNDESInputs(Y_pre: ndarray, Y_post: ndarray | None, unit_index: IndexSet, time_index: IndexSet, pre_time_index: IndexSet, post_time_index: IndexSet | None, outcome: str)#

Bases: object

Preprocessed panel data for SYNDES estimation.

Contains matrix representations of outcomes and index mappings used in optimization.

Parameters:
  • Y_pre (np.ndarray) – Pre-treatment outcome matrix of shape (T_pre, N) – rows are time periods, columns are units.

  • Y_post (np.ndarray or None) – Post-treatment outcome matrix of shape (T_post, N), if available.

  • unit_index (IndexSet) – Mapping from unit labels to integer indices.

  • time_index (IndexSet) – Mapping from time labels to integer indices.

  • pre_time_index (IndexSet) – Index set for pre-treatment periods.

  • post_time_index (IndexSet or None) – Index set for post-treatment periods.

  • outcome (str) – Name of outcome variable.

Notes

All matrices are aligned such that rows correspond to time periods and columns correspond to units.

Y_post: ndarray | None#
Y_pre: ndarray#
outcome: str#
post_time_index: IndexSet | None#
pre_time_index: IndexSet#
time_index: IndexSet#
unit_index: IndexSet#
class mlsynth.utils.syndes_helpers.structures.SYNDESMultiArmResults(arm_designs: Dict[Any, Any], arm: str)#

Bases: object

Per-arm SYNDES designs.

Returned by mlsynth.estimators.SYNDES when an arm column is configured: the SYNDES design problem is solved independently within each arm’s units, and each arm’s full result (a SYNDESResults for the MIP modes, or a RelaxedSolverResults for the annealed mode) is collected here.

Parameters:
  • arm_designs (dict) – {arm_label: SYNDESResults} – one independent SYNDES solution per arm (or RelaxedSolverResults under mode="two_way_global_annealed").

  • arm (str) – Name of the arm column the units were partitioned on.

arm: str#
arm_designs: Dict[Any, Any]#
atet_by_arm() → Dict[Any, float | None]#

{arm_label: ATET} across arms (None where no inference).

property mode: str#
selected_unit_labels_by_arm() Dict[Any, Any]#

{arm_label: selected_unit_labels} across arms.

class mlsynth.utils.syndes_helpers.structures.SYNDESProblemComponents(mode: str, objective: Expression, constraints: List[Constraint], variables: Dict[str, Variable], assignment_variable: Variable)#

Bases: object

CVXPY components for a SYNDES optimization problem.

This container stores the symbolic elements required to construct and solve the mixed-integer program underlying SYNDES.

Parameters:
  • mode (str) – SYNDES formulation (“global_2way”, “global_equal_weights”, “per_unit”).

  • objective (cp.Expression) – CVXPY objective function.

  • constraints (list of cp.Constraint) – Linear and convex constraints defining the feasible set.

  • variables (dict of str -> cp.Variable) – Named CVXPY decision variables (e.g., weights, assignment variables).

  • assignment_variable (cp.Variable) – Binary treatment assignment variable D.

with_constraints(additional_constraints)#

Return a new problem with extra constraints appended.

Notes

This object is intentionally “solver-agnostic” and is used to separate model construction from optimization execution.

assignment_variable: Variable#
constraints: List[Constraint]#
mode: str#
objective: Expression#
variables: Dict[str, Variable]#
with_constraints(additional_constraints: List[Constraint]) SYNDESProblemComponents#

Append additional constraints to the current formulation.

Parameters:

additional_constraints (list of cp.Constraint) – Extra constraints to include in the optimization problem.

Returns:

SYNDESProblemComponents – New instance with extended constraint set.
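The non-mutating behaviour described above (a new instance comes back, the original is left alone) can be sketched with a plain dataclass. This is an illustration only: ProblemComponentsSketch is a hypothetical stand-in, the strings are placeholders for CVXPY objects, and whether the real SYNDESProblemComponents is frozen or uses dataclasses.replace is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, List


@dataclass(frozen=True)
class ProblemComponentsSketch:
    """Hypothetical stand-in mirroring the documented fields."""
    mode: str
    objective: Any
    constraints: List[Any]
    variables: Dict[str, Any]
    assignment_variable: Any

    def with_constraints(self, additional_constraints: List[Any]) -> "ProblemComponentsSketch":
        # Build a fresh list so the original container is never mutated.
        return replace(self, constraints=[*self.constraints, *additional_constraints])


# Placeholder strings stand in for cp.Expression / cp.Constraint objects.
base = ProblemComponentsSketch("global_2way", "objective", ["sum(D) == K"], {}, "D")
extended = base.with_constraints(["costs @ D <= budget"])
```

Returning a copy keeps the base formulation reusable, for example to solve the same design with and without a budget constraint.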

class mlsynth.utils.syndes_helpers.structures.SYNDESResults(design: SYNDESDesign, inputs: SYNDESInputs, inference: SYNDESInference | None = None, post_fit: Any | None = None)#

Bases: object

Complete SYNDES estimation output.

This object bundles the optimized design, prepared inputs, and optional inference results.

Parameters:
  • design (SYNDESDesign) – Optimization solution.

  • inputs (SYNDESInputs) – Preprocessed data used in estimation.

  • inference (SYNDESInference or None) – Optional inference results.

  • post_fit (SyntheticControlPostFit, optional) – Standardized post-fit diagnostics (ATE / total effect / percentage lift / fit RMSEs / inference / power analysis) attached at the end of mlsynth.estimators.SYNDES.fit() via mlsynth.utils.post_fit.compute_post_fit(). Same shape and semantics as the MAREX / LEXSCM / PANGEO equivalents, so downstream consumers see a single uniform surface across the family.

Notes

This is the primary return object of SYNDES.fit().

Properties#

mode (str) – Alias for design.mode.

selected_unit_labels (np.ndarray) – Labels of treated units selected by the design.

design: SYNDESDesign#
inference: SYNDESInference | None = None#
inputs: SYNDESInputs#
property mode: str#

Return the optimization mode used.

post_fit: Any | None = None#
property selected_unit_labels: ndarray#

Return labels of selected treated units.

In addition, SYNDES.fit() attaches a SyntheticControlPostFit as results.post_fit, the standardized diagnostics container shared with the rest of the MAREX family (LEXSCM / MAREX / PANGEO). It carries the ATE, total effect, percentage lift, and per-period gap; the pre- and post-period RMSEs; the inference triple (p-value plus the lower and upper CI bounds); and a PowerAnalysis block with the headline MDE and the MDE-versus-horizon curve.

class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#

Bases: object

Standardized post-fit diagnostics for a single synthetic control design.

Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.

ate: float | None = None#
ate_per_period: ndarray | None = None#
ate_percent: float | None = None#
ci_lower: float | None = None#
ci_upper: float | None = None#
control_series: ndarray#
covariate_names: Tuple[str, ...] = ()#
covariate_smd: Dict[str, float] | None = None#
covariate_smd_abs_max: float | None = None#
covariate_smd_control_vs_pop: Dict[str, float] | None = None#
covariate_smd_control_vs_pop_abs_max: float | None = None#
covariate_smd_control_vs_pop_squared_sum: float | None = None#
covariate_smd_squared_sum: float | None = None#
covariate_smd_treated_vs_pop: Dict[str, float] | None = None#
covariate_smd_treated_vs_pop_abs_max: float | None = None#
covariate_smd_treated_vs_pop_squared_sum: float | None = None#
cumulative_effect: ndarray | None = None#
gap_series: ndarray#
inference_method: str | None = None#
n_blank: int#
n_fit: int#
n_post: int#
p_value: float | None = None#
power: PowerAnalysis | None = None#
rmse_blank: float | None = None#
rmse_fit: float | None = None#
rmse_post: float | None = None#
total_effect: float | None = None#
treated_series: ndarray#
class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#

Bases: object

Standardized power-analysis output attached to SyntheticControlPostFit.

Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same SyntheticControlPostFit shape so every covariate-aware SCM-family estimator gets the surface for free.

headline#

MDE for the actual n_post horizon of the realised design.

Type:

MDEPoint

curve#

MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).

Type:

tuple of MDEPoint

alpha#

Two-sided significance level assumed.

Type:

float

power_target#

Target power the MDEs are computed at (default 0.80).

Type:

float

sigma_placebo#

Standard deviation of the placebo gap series used as the noise scale.

Type:

float

serial_correlation#

Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.

Type:

float

baseline#

Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.

Type:

float

method#

"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Reserved for future "monte_carlo" extensions.

Type:

str

alpha: float#
baseline: float#
curve: Tuple[MDEPoint, ...]#
headline: MDEPoint#
mde_by_horizon() Dict[int, float]#

{post_periods: mde_pct} for quick lookup.

method: str = 'analytical_ar1'#
power_target: float#
serial_correlation: float#
sigma_placebo: float#
class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#

Bases: object

Minimum detectable effect at a single post-treatment horizon.

mde_absolute: float#
mde_pct: float#
post_periods: int#
power_at_observed: float | None = None#
se: float#
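The "analytical Gaussian approximation with AR(1) variance inflation" behind PowerAnalysis can be illustrated with the textbook formula for the variance of a mean of n AR(1) observations. This is a sketch under stated assumptions: the page does not show the library's exact inflation formula, and the function names below are hypothetical.

```python
import math
from statistics import NormalDist


def ar1_se_of_mean(sigma: float, rho: float, n: int) -> float:
    """SE of an n-period mean when the series has std sigma and lag-1 autocorrelation rho.

    Textbook result: Var = sigma^2 / n * (1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho^k).
    """
    inflation = 1.0 + 2.0 * sum((1.0 - k / n) * rho ** k for k in range(1, n))
    return sigma * math.sqrt(inflation / n)


def mde_pct_by_horizon(sigma, rho, baseline, horizons, alpha=0.05, power_target=0.80):
    """Return {post_periods: mde_pct}, the shape documented for mde_by_horizon()."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power_target)
    return {n: 100.0 * z * ar1_se_of_mean(sigma, rho, n) / baseline for n in horizons}


curve = mde_pct_by_horizon(sigma=2.0, rho=0.3, baseline=100.0, horizons=[1, 2, 4])
```

With positive serial correlation the MDE still shrinks as the horizon grows, but more slowly than the 1/sqrt(n) rate that holds under independence.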

Helper Modules#

Data preparation – the only DataFrame touchpoint: pivots to wide pre/post matrices and builds the unit/time IndexSet objects.

Data preparation helpers for SYNDES.

mlsynth.utils.syndes_helpers.setup.prepare_syndes_inputs(df: DataFrame, outcome: str, unitid: str, time: str, T0: int | None = None, post_col: str | None = None) SYNDESInputs#

Pivot long panel data and split it into pre/post matrices for SYNDES.

Parameters:
  • df (pd.DataFrame) – Long balanced panel data.

  • outcome, unitid, time (str) – Column names identifying the outcome, units, and time periods.

  • T0 (Optional[int]) – Number of pre-treatment periods when post_col is not supplied.

  • post_col (Optional[str]) – Optional 0/1 or boolean column identifying post-treatment periods.

Returns:

SYNDESInputs – Wide pre/post matrices and label metadata.
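The pivot-and-split step can be sketched without pandas. The helper below is hypothetical and is not the library's implementation; it assumes a balanced panel supplied as (unit, time, value) records and a split after the first T0 periods, and it only illustrates the (T, N) layout (time on rows, units on columns) that the rest of the helpers expect.

```python
import numpy as np


def pivot_and_split(rows, T0):
    """Pivot (unit, time, value) records to a (T, N) matrix and split at T0."""
    units = sorted({u for u, _, _ in rows})
    times = sorted({t for _, t, _ in rows})
    u_pos = {u: j for j, u in enumerate(units)}
    t_pos = {t: i for i, t in enumerate(times)}

    Y = np.full((len(times), len(units)), np.nan)
    for u, t, v in rows:
        Y[t_pos[t], u_pos[u]] = v          # time on rows, units on columns
    if np.isnan(Y).any():
        raise ValueError("panel is not balanced")

    return Y[:T0], Y[T0:], units, times


# Toy panel: 3 units observed over 5 periods, first 3 periods are pre-treatment.
rows = [(u, t, 10.0 * i + t) for i, u in enumerate(["A", "B", "C"]) for t in range(1, 6)]
Y_pre, Y_post, units, times = pivot_and_split(rows, T0=3)
```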

The CVXPY objective/constraint builders for the three MIP formulations.

Objective and constraint builders for SYNDES optimization problems.

This module follows the formulations of Doudchenko et al. (2021, arXiv:2112.00278). The three modes — global_2way, global_equal_weights and per_unit — express the pre-treatment residual implicitly inside cp.sum_squares rather than via explicit auxiliary residual variables z_t. The previous implementation introduced one explicit z_t decision variable plus one linear equality constraint per pre-treatment period, which on long panels (e.g. the Walmart weekly-sales panel, T = 128) added T extra columns and T extra rows to SCIP’s LP relaxation at every branch-and-bound node and dominated the solve time. Inlining the residual lets cvxpy emit a single second-order-cone epigraph and gives SCIP an LP that is roughly T rows smaller per node, matching the implicit-residual pattern used by MAREX (see mlsynth.utils.marex_helpers.formulation for the analogous cp.sum_squares(Xbar - Y_T @ w) style).

mlsynth.utils.syndes_helpers.formulation.build_global_2way_components(Y: ndarray, D: Variable, K: int, lam: float) SYNDESProblemComponents#

Construct full SYNDES problem (global two-way formulation).

Returns a complete CVXPY optimization specification consisting of: objective, constraints, variables and assignment variable.

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • D (cp.Variable) – Binary assignment vector.

  • K (int) – Number of treated units (may be None for global modes).

  • lam (float) – Weight regularization parameter.

Returns:

SYNDESProblemComponents – Fully specified optimization problem.

mlsynth.utils.syndes_helpers.formulation.build_global_2way_constraints(Y: ndarray, D: Variable, K: int | None, variables: Dict[str, Variable]) List[Constraint]#

Build constraints for the global two-way SYNDES formulation.

The formulation enforces:

  • exactly K treated units (or 1 <= sum(D) <= N-1 when K is None);

  • the McCormick linearisation of q_i = w_i * D_i;

  • the normalisation sum_i q_i = 1 (treated weight), sum_i w_i = 2 (treated weight + control weight, each summing to 1).

The residual z_t = sum_i (2 q_i - w_i) Y_{t,i} is no longer a decision variable; it is computed inline by build_global_2way_objective().

Parameters:
  • Y (np.ndarray) – Outcome matrix of shape (T, N).

  • D (cp.Variable) – Binary treatment assignment vector.

  • K (int, optional) – Number of treated units.

  • variables (dict) – CVXPY variables {"w", "q"}.

Returns:

list of cp.Constraint – Constraints defining the feasible region.
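To see why a McCormick linearisation is exact here, the sketch below computes the interval that the four standard McCormick inequalities leave for q given w and D. It assumes each weight is bounded by 0 <= w_i <= w_max with w_max = 1 (an assumption; the bound actually used by the builder is not shown on this page). The function name is hypothetical.

```python
def mccormick_interval(w: float, d: float, w_max: float = 1.0):
    """Feasible [lo, hi] for q under the McCormick envelope of q = w * d.

    Inequalities (for 0 <= w <= w_max, 0 <= d <= 1):
        q >= 0
        q >= w + w_max * d - w_max
        q <= w_max * d
        q <= w
    """
    lo = max(0.0, w + w_max * d - w_max)
    hi = min(w, w_max * d)
    return lo, hi


# For binary d the interval collapses to the single point w * d ...
binary_cases = {(w, d): mccormick_interval(w, d) for w in (0.0, 0.25, 1.0) for d in (0, 1)}

# ... while for a fractional d (as in an LP relaxation) it is a genuine interval.
relaxed_case = mccormick_interval(0.5, 0.5)
```

This is why the product can be replaced by linear constraints without loss once D is restricted to binary values.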

mlsynth.utils.syndes_helpers.formulation.build_global_2way_objective(Y: ndarray, lam: float, variables: Dict[str, Variable]) Expression#

Construct objective for global two-way SYNDES.

Objective corresponds to:

\[\frac{1}{T} \sum_t z_t^2 \,+\, \lambda \, \| w \|_2^2,\]

where \(z_t = \sum_i (2 q_i - w_i) Y_{t,i}\) is the treated-minus-control contrast residual. The residual is computed inline (as a single cp.sum_squares of the length-T vector Y @ (2 q - w)) rather than via T auxiliary variables and T equality constraints; cvxpy compiles it into a single second-order-cone epigraph, which dramatically reduces SCIP’s LP relaxation size per branch-and-bound node on long panels.

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • lam (float) – Regularization strength on weights.

  • variables (dict) – CVXPY variables {"w", "q"}.

Returns:

cp.Expression – Convex objective expression.
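As a numerical check of the formula above, the objective can be evaluated in NumPy for a fixed assignment and weight vector. This sketch is not the CVXPY builder: it takes D and w as given numbers, forms q = w * D directly, and uses a hypothetical function name.

```python
import numpy as np


def global_2way_objective(Y, D, w, lam):
    """Evaluate (1/T) * sum_t z_t^2 + lam * ||w||_2^2 with z = Y @ (2q - w)."""
    Y = np.asarray(Y, dtype=float)
    D = np.asarray(D, dtype=float)
    w = np.asarray(w, dtype=float)
    q = w * D                      # treated weights; zero on control units
    z = Y @ (2.0 * q - w)          # treated-minus-control contrast, length T
    return float(np.mean(z ** 2) + lam * np.sum(w ** 2))


# Unit 0 is treated; units 1 and 2 average exactly to unit 0's path.
Y = np.array([[1.0, 0.0, 2.0],
              [2.0, 2.0, 2.0],
              [3.0, 4.0, 2.0]])
D = np.array([1, 0, 0])
w = np.array([1.0, 0.5, 0.5])      # sum(q) = 1 and sum(w) = 2, as required
value = global_2way_objective(Y, D, w, lam=0.1)
```

Because the fit is perfect in this toy panel, only the ridge term lam * ||w||^2 = 0.1 * 1.5 remains.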

mlsynth.utils.syndes_helpers.formulation.build_global_2way_variables(T: int, N: int) Dict[str, Variable]#

Create CVXPY decision variables for the global two-way SYNDES formulation.

The pre-treatment residual is now expressed implicitly inside the objective (see build_global_2way_objective()) so no auxiliary z_t variables are returned.

Parameters:
  • T (int) – Number of pre-treatment time periods. Retained for signature compatibility; not used.

  • N (int) – Number of units.

Returns:

dict – Dictionary containing:

  • "w": unit weights (N,)

  • "q": treated-weight interaction variables (N,)

mlsynth.utils.syndes_helpers.formulation.build_global_equal_weights_components(Y: ndarray, D: Variable, K: int, lam: float) SYNDESProblemComponents#

Build full equal-weight global SYNDES problem.

Parameters:
  • Y (np.ndarray) – Outcome matrix.

  • D (cp.Variable) – Assignment variable.

  • K (int) – Number of treated units.

  • lam (float) – Regularization parameter.

Returns:

SYNDESProblemComponents – Complete optimization specification.

mlsynth.utils.syndes_helpers.formulation.build_global_equal_weights_constraints(Y: ndarray, D: Variable, K: int | None, variables: Dict[str, Variable]) List[Constraint]#

Build constraints for the one-way global SYNDES formulation.

The treated side is a simple average (weight 1/K on each treated unit); the control side is a free synthetic control. With c the control weights and D the assignment, the per-period contrast is

\[z_t = \frac{1}{K} \sum_i D_i Y_{i,t} - \sum_i c_i Y_{i,t},\]

subject to sum_i D_i = K, sum_i c_i = 1, c_i >= 0 and c_i <= 1 - D_i (so treated units carry no control weight). The residual is now computed inline by build_global_equal_weights_objective().

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • D (cp.Variable) – Binary assignment vector.

  • K (int) – Number of treated units (required for this mode).

  • variables (dict) – Contains "c".

Returns:

list of cp.Constraint – Feasibility constraints for assignment, control simplex.

mlsynth.utils.syndes_helpers.formulation.build_global_equal_weights_objective(Y: ndarray, K: int, lam: float, variables: Dict[str, Variable]) Expression#

Construct objective for the one-way global SYNDES formulation.

The objective is

\[\frac{1}{T} \sum_t z_t^2 \,+\, \lambda \left( \frac{1}{K} + \| c \|_2^2 \right),\]

where \(z_t = \frac{1}{K} \sum_i D_i Y_{i,t} - \sum_i c_i Y_{i,t}\) is the residual, 1/K is the (constant) penalty contributed by the pinned treated weights and \(\|c\|_2^2\) is the penalty on the free control weights. The residual vector is computed implicitly as (1/K) * (Y @ D) - Y @ c, a single length-T cvxpy expression that compiles into one second-order-cone epigraph.

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • K (int) – Number of treated units.

  • lam (float) – Regularization parameter (the paper’s \(\sigma^2\)).

  • variables (dict) – Contains "c".

Returns:

cp.Expression – Objective function.
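The one-way objective can be checked the same way. The sketch below evaluates it in NumPy for a fixed assignment D and control weights c; it is an illustration with a hypothetical function name, not the CVXPY builder.

```python
import numpy as np


def equal_weights_objective(Y, D, c, lam):
    """Evaluate (1/T) * sum_t z_t^2 + lam * (1/K + ||c||_2^2)."""
    Y = np.asarray(Y, dtype=float)
    D = np.asarray(D, dtype=float)
    c = np.asarray(c, dtype=float)
    K = D.sum()
    z = (Y @ D) / K - Y @ c        # simple treated average minus synthetic control
    return float(np.mean(z ** 2) + lam * (1.0 / K + np.sum(c ** 2)))


# Units 0 and 1 are treated (K = 2); units 2 and 3 form the synthetic control.
Y = np.array([[1.0, 3.0, 2.0, 2.0],
              [2.0, 2.0, 2.0, 2.0],
              [3.0, 1.0, 2.0, 5.0]])
D = np.array([1, 1, 0, 0])
c = np.array([0.0, 0.0, 0.5, 0.5])
value = equal_weights_objective(Y, D, c, lam=0.1)
```

Here the only misfit is in the last period (z = -1.5), giving 2.25 / 3 = 0.75 for the fit term plus 0.1 * (0.5 + 0.5) for the penalty.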

mlsynth.utils.syndes_helpers.formulation.build_global_equal_weights_variables(T: int, N: int) Dict[str, Variable]#

Create CVXPY variables for the one-way global formulation.

This is the paper’s one-way global design (Doudchenko et al. 2021, eq. “one-way global”): the treated weights are pinned to 1/K (a simple average of the treated units), while the control weights remain free synthetic-control weights to be optimised. Only the control weights c are decision variables here; the assignment D is passed in separately. The pre-treatment residual is now expressed implicitly inside the objective (see build_global_equal_weights_objective()) so no auxiliary z_t variables are returned.

Parameters:
  • T (int) – Number of time periods. Retained for signature compatibility; not used.

  • N (int) – Number of units.

Returns:

dict – Contains:

  • "c": free control-side synthetic weights (N,), nonneg.

mlsynth.utils.syndes_helpers.formulation.build_per_unit_components(Y: ndarray, D: Variable, K: int, lam: float) SYNDESProblemComponents#

Construct full per-unit SYNDES optimization problem.

Parameters:
  • Y (np.ndarray) – Outcome matrix.

  • D (cp.Variable) – Assignment vector.

  • K (int) – Number of treated units.

  • lam (float) – Regularization parameter.

Returns:

SYNDESProblemComponents – Full per-unit optimization specification.

mlsynth.utils.syndes_helpers.formulation.build_per_unit_constraints(Y: ndarray, D: Variable, K: int, variables: Dict[str, Variable]) List[Constraint]#

Build constraints for per-unit SYNDES formulation.

Each treated unit constructs its own synthetic control using control units only.

Structure:

  • D selects treated units;

  • each treated unit i has weights over donor pool j;

  • q_{i,j} enforces the interaction q_{i,j} = w_{i,j} (1 - D_j).

The per-unit residual is now computed inline by build_per_unit_objective().

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • D (cp.Variable) – Binary treatment assignment.

  • K (int) – Number of treated units.

  • variables (dict) – Contains w and q.

Returns:

list of cp.Constraint – Constraints defining per-unit synthetic control system.

mlsynth.utils.syndes_helpers.formulation.build_per_unit_objective(Y: ndarray, K: int, lam: float, variables: Dict[str, Variable], D: Variable) Expression#

Construct objective for per-unit SYNDES formulation.

Objective corresponds to:

\[\frac{1}{KT} \sum_i \sum_t z_{i,t}^2 + \frac{\lambda}{K} \| w \|_F^2,\]

where \(z_{i,t} = D_i Y_{i,t} - \sum_j q_{i,j} Y_{j,t}\) is the per-unit residual. Each per-unit residual vector is computed implicitly as D[i] * Y[:, i] - Y @ q[i, :] (a length-T cvxpy expression) and stacked over the N units before the Frobenius-norm squared. cvxpy compiles the result into a single SOC epigraph, eliminating the N * T auxiliary z_{i,t} variables and N * T equality constraints used previously.

Parameters:
  • Y (np.ndarray) – Outcome matrix (T, N).

  • K (int) – Number of treated units.

  • lam (float) – Regularization parameter.

  • variables (dict) – Contains "w" and "q".

  • D (cp.Variable) – Binary treatment assignment (used inline to scale each treated unit’s contribution).

Returns:

cp.Expression – Objective function.
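A NumPy evaluation of the per-unit objective for fixed D and w makes the stacking explicit. This is a sketch with a hypothetical function name; it assumes the weight rows of untreated units are zero, which is how the toy input is constructed rather than something the page states.

```python
import numpy as np


def per_unit_objective(Y, D, w, lam):
    """Evaluate (1/(K*T)) * sum_{i,t} z_{i,t}^2 + (lam/K) * ||w||_F^2."""
    Y = np.asarray(Y, dtype=float)
    D = np.asarray(D, dtype=float)
    w = np.asarray(w, dtype=float)
    T = Y.shape[0]
    K = D.sum()
    q = w * (1.0 - D)[np.newaxis, :]           # q[i, j] = w[i, j] * (1 - D[j])
    Z = Y * D[np.newaxis, :] - Y @ q.T         # column i: D_i * Y[:, i] - Y @ q[i, :]
    return float(np.sum(Z ** 2) / (K * T) + (lam / K) * np.sum(w ** 2))


Y = np.array([[1.0, 0.0, 2.0],
              [2.0, 2.0, 2.0],
              [3.0, 4.0, 2.0]])
D = np.array([1, 0, 0])                        # only unit 0 is treated
w = np.zeros((3, 3))
w[0] = [0.0, 0.5, 0.5]                         # unit 0's donor weights
value = per_unit_objective(Y, D, w, lam=0.1)
```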

mlsynth.utils.syndes_helpers.formulation.build_per_unit_variables(T: int, N: int) Dict[str, Variable]#

Create CVXPY variables for the per-unit SYNDES formulation.

This formulation constructs a separate synthetic control for each treated unit i. The per-period, per-unit residual z_{i,t} = D_i Y_{i,t} - sum_j q_{i,j} Y_{j,t} is computed inline by build_per_unit_objective() and not stored as a decision variable.

Parameters:
  • T (int) – Number of pre-treatment periods. Retained for signature compatibility; not used.

  • N (int) – Number of units.

Returns:

dict – Contains:

  • "w": (N, N) unit-specific weights.

  • "q": (N, N) interaction terms q_{i,j} = w_{i,j} (1 - D_j).

mlsynth.utils.syndes_helpers.formulation.build_syndes_problem_components(Y: ndarray, D: Variable, K: int, lam: float, mode: str) SYNDESProblemComponents#

Dispatch SYNDES formulation builder based on mode.

Parameters:
  • Y (np.ndarray) – Outcome matrix.

  • D (cp.Variable) – Binary treatment assignment.

  • K (int) – Number of treated units.

  • lam (float) – Regularization parameter.

  • mode ({“global_2way”, “global_equal_weights”, “per_unit”}) – SYNDES formulation selector.

Returns:

SYNDESProblemComponents – Fully specified optimization problem.

Raises:

ValueError – If mode is not recognized.

mlsynth.utils.syndes_helpers.formulation.unpack_problem_components(components: SYNDESProblemComponents) Tuple[Expression, List[Constraint], Dict[str, Variable]]#

Unpack SYNDES problem components.

Parameters:

components (SYNDESProblemComponents) – Structured optimization container.

Returns:

tuple – (objective, constraints, variables)

The solver wrapper: builds the MIP, applies optional budget constraints, solves, and extracts the assignment, weights, and pre-period prediction.

Solver-facing optimization utilities for SYNDES.

mlsynth.utils.syndes_helpers.optimization.estimate_lambda(Y: ndarray) float#

Estimate SYNDES penalty parameter as average within-unit variance.

Parameters:

Y (np.ndarray) – Pre-treatment outcome matrix of shape (T, N).

Returns:

float – Estimated lambda value.

Raises:
  • MlsynthDataError – If Y is not 2D.

  • MlsynthConfigError – If fewer than 2 time periods are provided.
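"Average within-unit variance" can be read as the mean, across units, of each unit's variance over time. The sketch below follows that reading; the use of the sample variance (ddof=1) and the plain ValueError exceptions are assumptions, and the function name is hypothetical.

```python
import numpy as np


def average_within_unit_variance(Y):
    """Mean over units (columns) of the time-series variance of each unit."""
    Y = np.asarray(Y, dtype=float)
    if Y.ndim != 2:
        raise ValueError("Y must be a 2D (T, N) matrix")
    if Y.shape[0] < 2:
        raise ValueError("need at least 2 time periods")
    # ddof=1 (sample variance) is an assumption, not confirmed by the docs.
    return float(np.mean(np.var(Y, axis=0, ddof=1)))


Y = np.array([[1.0, 2.0, 0.0],
              [2.0, 2.0, 2.0],
              [3.0, 2.0, 4.0]])
lam = average_within_unit_variance(Y)   # per-unit variances are 1, 0 and 4
```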

mlsynth.utils.syndes_helpers.optimization.solve_synthetic_design(Y: ndarray, K: int | None, mode: str = 'global_2way', lam: float | None = None, solver: Any = 'SCIP', verbose: bool = False, unit_index: IndexSet | None = None, costs: ndarray | None = None, budget: float | None = None, gap_limit: float | None = None, time_limit: float | None = None) SYNDESDesign#

Solve the SYNDES synthetic design optimization problem.

Parameters:
  • Y (np.ndarray) – Pre-treatment outcome matrix of shape (T, N).

  • K (int or None) – Number of treated units.

  • mode ({“global_2way”, “global_equal_weights”, “per_unit”}, optional) – SYNDES formulation type.

  • lam (float, optional) – Regularization parameter. If None, estimated from Y.

  • solver (Any, optional) – CVXPY-compatible solver specification.

  • verbose (bool, optional) – Whether to enable solver verbosity.

  • unit_index (IndexSet, optional) – Mapping from indices to unit labels.

Returns:

SYNDESDesign – Optimized design object.

Raises:
  • MlsynthConfigError – If inputs are invalid or lambda is negative.

  • MlsynthEstimationError – If optimization fails or is infeasible.

The moving-block permutation test (shared contrast dispatch across modes).

Inference helpers for SYNDES.

mlsynth.utils.syndes_helpers.inference.permutation_test_global(Y_pre: ndarray, Y_post: ndarray, design: SYNDESDesign, alpha: float = 0.1, include_null_stats: bool = True) SYNDESInference#

Moving-block permutation test for any SYNDES / Synthetic-Design mode.

Generalises the original global_2way-only implementation to the full set of MIP formulations from Doudchenko et al. (2021): global_2way, global_equal_weights (paper’s “one-way global”) and per_unit. The test follows the Chernozhukov, Wuethrich, and Zhu (2021) permutation-across-time logic: we treat each period’s cross-unit contrast as exchangeable under the no-effect null and compare the post-period mean to the null distribution obtained by cyclically shifting the stacked panel.
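The cyclic-shift logic can be sketched on a one-dimensional contrast series (for example Y @ c stacked over pre and post periods). The statistic (absolute post-block mean), the inclusion of the identity shift, and the reject rule p <= alpha are assumptions made for this illustration; the function name is hypothetical.

```python
import numpy as np


def cyclic_shift_test(contrast_pre, contrast_post, alpha=0.1):
    """Moving-block permutation test via cyclic shifts of the stacked series."""
    pre = np.asarray(contrast_pre, dtype=float)
    post = np.asarray(contrast_post, dtype=float)
    if post.size == 0:
        raise ValueError("post-period contrast is empty")
    series = np.concatenate([pre, post])
    n_post = post.size
    # One statistic per cyclic shift: |mean| of the last n_post entries.
    null_stats = np.array(
        [abs(np.roll(series, s)[-n_post:].mean()) for s in range(series.size)]
    )
    observed = null_stats[0]                    # shift 0 is the actual ordering
    p_value = float(np.mean(null_stats >= observed))
    return p_value, p_value <= alpha, null_stats


pre = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.0, 0.0]
post = [5.0, 5.0]
p_value, reject, null_stats = cyclic_shift_test(pre, post)
```

With T stacked periods the smallest attainable p-value is 1/T, so short panels cannot reject at small alpha regardless of effect size.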

mlsynth.utils.syndes_helpers.inference.permutation_test_relaxed_global(Y_pre: ndarray, Y_post: ndarray, design: RelaxedDesign, alpha: float = 0.1, include_null_stats: bool = True) RelaxedInference#

Moving-block permutation test for a relaxed two-way SYNDES design.

Mirrors permutation_test_global() but consumes the relaxed solver’s contrast_weights directly rather than reconstructing them from q and w.

Parameters:
  • Y_pre (np.ndarray) – Pre-treatment outcome matrix of shape (T_pre, N).

  • Y_post (np.ndarray) – Post-treatment outcome matrix of shape (T_post, N).

  • design (RelaxedDesign) – Best-state design from solve_two_way_relaxed().

  • alpha (float, optional) – Significance level for the test.

  • include_null_stats (bool, optional) – Whether to attach the empirical null distribution to the result.

Returns:

RelaxedInference – Permutation-inference output.

Raises:

MlsynthDataError – If Y_post is missing or empty.

The minimum-detectable-effect power analysis (Newey-West long-run SE).

Minimum-detectable-effect (MDE) power analysis for SYNDES designs.

The Doudchenko et al. (2021) Synthetic Design paper computes power curves by Monte Carlo simulation (Appendix A.4, Figure 2): seed a known true ATET, run the permutation test, repeat, and tally the rejection rate. Repeating that procedure across a grid of effect sizes traces out the rejection probability as a function of the true effect.

For a single fitted design we short-circuit that loop by appealing to the asymptotic normality of the permutation test statistic under the null. The moving-block permutation test compares the post-period mean contrast mean_t (Y_t @ c) to the distribution of length-n_post block means, so the relevant null standard error is the std of those block means. We estimate it on the pre-period contrast series per_period = Y_pre @ c:

SE(n_post) = std_s ( mean_{u in block_s} per_period[u] ),

the std over overlapping length-n_post blocks. This captures any serial correlation in the outcomes (the same correlation the block permutation test is exposed to); under independence it reduces to sigma_perm / sqrt(n_post), the textbook MDE, and we fall back to that scaling when a horizon leaves too few blocks. The MDE at significance level alpha (two-sided) and power 1 - beta is

MDE_abs(n_post) = (z_{1 - alpha/2} + z_{1 - beta}) * SE(n_post),

where sigma_perm = std_t (Y_t @ c) is reported as the per-period contrast std. We report the MDE alongside its percentage version

MDE_pct(n_post) = 100 * MDE_abs(n_post) / baseline,

where baseline defaults to the mean pre-period outcome on the SYNDES-selected treated units (so MDE_pct reads as a percentage of treated-unit baseline). Other baselines available: "overall" (full panel mean), "control" (the SC-weighted control mean under the design’s contrast), or a user-supplied scalar.
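The block-mean standard error and the MDE formulas above can be sketched as follows. The min_blocks threshold for the fallback and the use of ddof=1 are assumptions (the page only says "too few blocks"), and the function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def block_mean_se(per_period, n_post, min_blocks=3):
    """Std of overlapping length-n_post block means of the pre-period contrast."""
    x = np.asarray(per_period, dtype=float)
    n_blocks = x.size - n_post + 1
    if n_blocks < min_blocks:
        # Fallback to the i.i.d. scaling sigma_perm / sqrt(n_post).
        return float(np.std(x, ddof=1) / np.sqrt(n_post))
    block_means = np.array([x[s:s + n_post].mean() for s in range(n_blocks)])
    return float(np.std(block_means, ddof=1))


def mde(per_period, n_post, baseline, alpha=0.05, power=0.8):
    """Return (MDE_abs, MDE_pct) at a two-sided level alpha and the given power."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    mde_abs = z * block_mean_se(per_period, n_post)
    return mde_abs, 100.0 * mde_abs / baseline


per_period = np.array([0.5, -0.2, 0.1, 0.4, -0.6, 0.3, -0.1, 0.2, -0.4, 0.0])
mde_abs, mde_pct = mde(per_period, n_post=2, baseline=50.0)
```

At alpha = 0.05 and power 0.80 the multiplier is roughly 2.8, so the MDE is about 2.8 standard errors of the block mean.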

Use power_analysis() as the public entry point; pass it the mlsynth.utils.syndes_helpers.structures.SYNDESResults returned by mlsynth.SYNDES.fit() (or any of the legacy SYNDES modes).

class mlsynth.utils.syndes_helpers.power.SYNDESPower(n_post_periods: ndarray, mde_absolute: ndarray, mde_percent: ndarray, sigma_perm: float, baseline: float, baseline_kind: str, alpha: float, power: float, contrast: ndarray, long_run_sigma: float = 0.0)#

Per-horizon MDE table for a fitted SYNDES design.

Parameters:
  • n_post_periods (np.ndarray) – Horizons evaluated, shape (H,).

  • mde_absolute (np.ndarray) – MDE in the same units as the outcome, shape (H,).

  • mde_percent (np.ndarray) – 100 * MDE_abs / baseline, shape (H,).

  • sigma_perm (float) – Ordinary std of the per-period contrast applied to the pre-period outcomes (the i.i.d. per-period scale).

  • long_run_sigma (float) – Newey-West (Bartlett HAC) long-run std of the per-period contrast – the serial-correlation-robust scale the MDE actually rests on. Equals sigma_perm when the contrast series is serially uncorrelated.

  • baseline (float) – Baseline outcome level used to convert mde_absolute into a percentage.

  • baseline_kind (str) – Tag identifying which baseline was used ("treated", "overall", "control", or "custom").

  • alpha (float) – Two-sided significance level used to compute the MDE.

  • power (float) – Target power 1 - beta used to compute the MDE.

  • contrast (np.ndarray) – The unit-level contrast vector that maps outcomes to the ATT estimator. Stored for downstream inspection.

alpha: float#
baseline: float#
baseline_kind: str#
contrast: ndarray#
long_run_sigma: float = 0.0#
mde_absolute: ndarray#
mde_percent: ndarray#
n_post_periods: ndarray#
power: float#
sigma_perm: float#
to_dataframe()#

Return a tidy (n_post, mde_abs, mde_pct) DataFrame.

mlsynth.utils.syndes_helpers.power.power_analysis(results: SYNDESResults, n_post_periods: Iterable[int] = range(1, 13), alpha: float = 0.05, power: float = 0.8, baseline: str | float = 'treated') SYNDESPower#

Compute the per-horizon minimum detectable effect for a SYNDES design.

Parameters:
  • results (SYNDESResults) – Output of mlsynth.SYNDES.fit(). Only the design and inputs fields are read.

  • n_post_periods (iterable of int, default range(1, 13)) – Horizons (in post-treatment periods) at which to report the MDE.

  • alpha (float, default 0.05) – Two-sided significance level.

  • power (float, default 0.80) – Target power for the MDE (1 - beta).

  • baseline (str or float, default "treated") – Denominator for the percentage MDE. Choices:

    • "treated" (default) – mean pre-period outcome over the SYNDES-selected treated units.

    • "overall" – mean pre-period outcome over every unit.

    • "control" – SC-weighted mean pre-period control outcome implied by the design’s contrast.

    • float – user-supplied baseline value.

Returns:

SYNDESPower – Frozen container with per-horizon MDE in absolute and percentage units.

Standardized post-fit (shared across the MAREX family) — the compute_post_fit() / compute_power_analysis() / compute_smd() helpers that populate res.post_fit live outside this package so LEXSCM, MAREX, and PANGEO all consume the same diagnostics machinery:

Standardized post-fit diagnostics for synthetic control designs and the matching power-analysis surface that consumes them.

After any MAREX-family estimator (LEXSCM, MAREX, SYNDES, PANGEO, …) solves its design problem, downstream consumers (the SAGE dashboard, paper-style reports, comparison tables) all need the same numbers:

  • the post-treatment ATT, total effect, percentage lift, per-period gap;

  • pre / blank / post root-mean-squared-error of the synthetic gap;

  • inference scalars (p-value, CI bounds) when computed;

  • covariate-balance standardized mean differences (SMDs) when covariates were used in the design.

This module exposes one frozen dataclass (SyntheticControlPostFit) and three free functions:

  • compute_smd() – standalone, panel-independent SMD from any (cov_matrix, treated_w, control_w);

  • compute_post_fit() – the full diagnostic bundle from trajectories + boundaries + (optional) covariate matrix + (optional) inference;

  • compute_post_fit_marex() – adapter that builds the bundle from a MAREXResults + MAREXPanel pair.

The free-function entry points are deliberately small and reusable, so the LEXSCM / SYNDES / PANGEO equivalents can be added one-at-a-time without touching this module: they just compose the same primitives.

class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#

Bases: object

Minimum detectable effect at a single post-treatment horizon.

mde_absolute: float#
mde_pct: float#
post_periods: int#
power_at_observed: float | None = None#
se: float#
class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#

Bases: object

Standardized power-analysis output attached to SyntheticControlPostFit.

Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same SyntheticControlPostFit shape so every covariate-aware SCM-family estimator gets the surface for free.

headline#

MDE for the actual n_post horizon of the realised design.

Type:

MDEPoint

curve#

MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).

Type:

list of MDEPoint

alpha#

Two-sided significance level assumed.

Type:

float

power_target#

Target power the MDEs are computed at (default 0.80).

Type:

float

sigma_placebo#

Standard deviation of the placebo gap series used as the noise scale.

Type:

float

serial_correlation#

Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.

Type:

float

baseline#

Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.

Type:

float

method#

"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Other values, such as "monte_carlo", are reserved for future extensions.

Type:

str

alpha: float#
baseline: float#
curve: Tuple[MDEPoint, ...]#
headline: MDEPoint#
mde_by_horizon() Dict[int, float]#

{post_periods: mde_pct} for quick lookup.

method: str = 'analytical_ar1'#
power_target: float#
serial_correlation: float#
sigma_placebo: float#
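
The lookup returned by mde_by_horizon() can be pictured with a minimal sketch. MDEPointSketch below is a hypothetical stand-in that only mirrors the documented MDEPoint fields, the free function is an assumption about how the method behaves based on its one-line description, and the numbers are purely illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MDEPointSketch:
    """Hypothetical stand-in mirroring the documented MDEPoint fields."""

    post_periods: int
    mde_absolute: float
    mde_pct: float
    se: float
    power_at_observed: Optional[float] = None


def mde_by_horizon(curve: Tuple[MDEPointSketch, ...]) -> Dict[int, float]:
    """Map each post-treatment horizon to its percentage MDE."""
    return {point.post_periods: point.mde_pct for point in curve}


# Illustrative values only; they do not come from any real design.
curve = (
    MDEPointSketch(post_periods=4, mde_absolute=2.0, mde_pct=10.0, se=0.7),
    MDEPointSketch(post_periods=8, mde_absolute=1.5, mde_pct=7.5, se=0.5),
)
lookup = mde_by_horizon(curve)
```

Keying the dictionary by post_periods lets a caller read off the detectability at a planned experiment length without scanning the curve.
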
class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#

Bases: object

Standardized post-fit diagnostics for a single synthetic control design.

Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.

ate: float | None = None#
ate_per_period: ndarray | None = None#
ate_percent: float | None = None#
ci_lower: float | None = None#
ci_upper: float | None = None#
control_series: ndarray#
covariate_names: Tuple[str, ...] = ()#
covariate_smd: Dict[str, float] | None = None#
covariate_smd_abs_max: float | None = None#
covariate_smd_control_vs_pop: Dict[str, float] | None = None#
covariate_smd_control_vs_pop_abs_max: float | None = None#
covariate_smd_control_vs_pop_squared_sum: float | None = None#
covariate_smd_squared_sum: float | None = None#
covariate_smd_treated_vs_pop: Dict[str, float] | None = None#
covariate_smd_treated_vs_pop_abs_max: float | None = None#
covariate_smd_treated_vs_pop_squared_sum: float | None = None#
cumulative_effect: ndarray | None = None#
gap_series: ndarray#
inference_method: str | None = None#
n_blank: int#
n_fit: int#
n_post: int#
p_value: float | None = None#
power: PowerAnalysis | None = None#
rmse_blank: float | None = None#
rmse_fit: float | None = None#
rmse_post: float | None = None#
total_effect: float | None = None#
treated_series: ndarray#
mlsynth.utils.post_fit.compute_post_fit(treated_series: ndarray, control_series: ndarray, *, n_fit: int, n_blank: int = 0, n_post: int | None = None, cov_matrix: ndarray | None = None, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None, treated_weights: ndarray | None = None, control_weights: ndarray | None = None, population_weights: ndarray | None = None, inference: Any | None = None, n_treated_units: int | None = None) SyntheticControlPostFit#

Compute a SyntheticControlPostFit from trajectories + boundaries.

The trajectories treated_series and control_series are the estimator’s own synthetic constructs (Σⱼ wⱼ Yⱼ and Σⱼ vⱼ Yⱼ in Abadie-Zhao notation). n_post defaults to len(treated_series) - n_fit - n_blank.
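
To make the window bookkeeping concrete, here is a minimal sketch of how two trajectories could be split into fit, blank, and post windows. It does not call mlsynth: split_gap_windows is a hypothetical helper, and both the gap definition (treated minus control) and the ATE as the mean post-window gap are assumptions, not confirmed library behaviour. Only the n_post default is taken from the description above.

```python
import numpy as np


def split_gap_windows(treated, control, n_fit, n_blank=0, n_post=None):
    """Hypothetical helper: gap series plus per-window summaries."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    if treated.shape != control.shape:
        raise ValueError("trajectories must have the same length")
    if n_post is None:
        # Documented default: whatever remains after the fit and blank windows.
        n_post = len(treated) - n_fit - n_blank

    gap = treated - control  # assumption: gap = treated minus control
    fit = gap[:n_fit]
    blank = gap[n_fit:n_fit + n_blank]
    post = gap[n_fit + n_blank:n_fit + n_blank + n_post]

    def rmse(window):
        # An empty window has no RMSE; None mirrors the optional fields.
        return float(np.sqrt(np.mean(window ** 2))) if window.size else None

    return {
        "gap": gap,
        "n_post": n_post,
        "rmse_fit": rmse(fit),
        "rmse_blank": rmse(blank),
        "rmse_post": rmse(post),
        # Assumption: the ATE is the mean gap over the post window.
        "ate": float(post.mean()) if post.size else None,
    }


# Toy trajectories: perfect pre-period fit, then a constant lift of 2.
treated = [10.0, 11.0, 12.0, 13.0, 16.0, 17.0]
control = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
out = split_gap_windows(treated, control, n_fit=3, n_blank=1)
```

With six periods, n_fit=3 and n_blank=1, the default leaves a post window of two periods.
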

Covariate balance fields are populated when cov_matrix + treated_weights + control_weights are all supplied (the natural inputs for any MAREX-family design). The compute_smd() helper does the work, so the SMD numbers are exactly consistent with a standalone call to compute_smd().

Inference scalars are pulled from the estimator’s inference object via _extract_inference(), which knows about the four common shapes (LEXSCM Inference, MAREX MAREXInference, SYNDES SYNDESInference, or a plain dict). All inference fields are optional.

mlsynth.utils.post_fit.compute_post_fit_marex(raw, panel, *, cov_scales: ndarray | None = None) SyntheticControlPostFit#

Adapt a MAREXResults + MAREXPanel pair into a SyntheticControlPostFit.

Pulls the aggregate synthetic-treated / synthetic-control trajectories from raw.globres, the (T0, blank_periods) split from panel.T0 and panel.blank_periods, the inference object from raw.globres.inference, and the covariate matrix from panel.covariates (when present).

mlsynth.utils.post_fit.compute_power_analysis(post_fit: SyntheticControlPostFit, *, alpha: float = 0.05, power_target: float = 0.8, post_grid: Sequence[int] | None = None) PowerAnalysis#

Analytical MDE + power curve for a design’s SyntheticControlPostFit.

Uses the placebo / blank-period gap residuals (or the pre-period gap when no blank window was carved out) to estimate the noise standard deviation sigma_placebo and the AR(1) autocorrelation rho, then computes the minimum detectable effect for each horizon T in post_grid via the Gaussian formula

MDE(T) = (z_{1-alpha/2} + z_{power}) * sigma_placebo * sqrt(VIF(T, rho)),

where VIF(T, rho) = Var(mean of T AR(1) periods) / sigma_placebo^2. The headline MDE uses T = post_fit.n_post (the realised post window).
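
The formula can be sketched in a few lines of standard-library Python. The function names are hypothetical, and the closed form used for VIF assumes a stationary AR(1) process whose marginal standard deviation is sigma_placebo, so that VIF(T, rho) = (1/T) * [1 + 2 * sum_{k=1}^{T-1} (1 - k/T) * rho^k]. The library's exact estimator of rho and its variance convention may differ.

```python
from statistics import NormalDist, fmean


def lag1_autocorr(residuals):
    """Sample lag-1 autocorrelation of a gap series."""
    mean = fmean(residuals)
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    if denom == 0.0:
        return 0.0  # constant series: no usable correlation signal
    num = sum(a * b for a, b in zip(centered[:-1], centered[1:]))
    return num / denom


def ar1_vif(horizon, rho):
    """Var(mean of `horizon` stationary AR(1) periods) / marginal variance."""
    tail = sum((1 - k / horizon) * rho ** k for k in range(1, horizon))
    return (1 + 2 * tail) / horizon


def mde(horizon, sigma, rho, alpha=0.05, power_target=0.80):
    """Gaussian two-sided MDE at the given post-treatment horizon."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power_target)
    return multiplier * sigma * ar1_vif(horizon, rho) ** 0.5


# Same noise scale and horizon, with and without serial correlation.
independent = mde(horizon=4, sigma=1.0, rho=0.0)
correlated = mde(horizon=4, sigma=1.0, rho=0.5)
```

With rho = 0 the factor reduces to 1/T, recovering the usual sigma / sqrt(T) standard error; positive rho raises the factor, so the same horizon detects only larger effects.
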

Parameters:
  • post_fit (SyntheticControlPostFit) – The standardized post-fit from any MAREX-family estimator.

  • alpha (float, default 0.05) – Two-sided significance level.

  • power_target (float, default 0.80) – Target power for the MDE.

  • post_grid (sequence of int, optional) – Post-treatment horizons at which to compute MDE. Defaults to a small geometric grid centered on post_fit.n_post so users see the detectability tradeoff vs. running the experiment longer.

Returns:

PowerAnalysis – Headline MDE + a curve over the requested horizons.

mlsynth.utils.post_fit.compute_smd(cov_matrix: ndarray, treated_weights: ndarray, control_weights: ndarray, *, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None) Dict[str, Any]#

Standardized mean differences between weighted treated and control means.

Parameters:
  • cov_matrix (ndarray, shape (N, M)) – Per-unit covariate values; rows align to treated_weights and control_weights.

  • treated_weights, control_weights (ndarray, shape (N,)) – Non-negative weights with disjoint supports. They are renormalised to sum to 1 internally, so callers may pass raw weights that sum to K.

  • cov_names (sequence of str, optional) – Names for the M covariates. Defaults to ("cov_0", "cov_1", ...).

  • cov_scales (ndarray, shape (M,), optional) – Pre-computed per-covariate standardization scales (cross-unit std). Defaults to the std of cov_matrix columns. When available, pass the scales already cached by build_covariate_matrix.

Returns:

dict with keys smd (the per-covariate dict), smd_abs_max, and smd_squared_sum. Returns empty / NaN summaries if either weight vector is all-zero.
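
The computation described above can be sketched with NumPy. smd_sketch is a hypothetical re-implementation written from the parameter descriptions, not the library's code: the population standard deviation (ddof=0) as the default scale and the exact shape of the empty / NaN return are assumptions.

```python
import numpy as np


def smd_sketch(cov_matrix, treated_weights, control_weights,
               cov_names=None, cov_scales=None):
    """Hypothetical sketch of a weighted standardized mean difference."""
    X = np.asarray(cov_matrix, dtype=float)
    w_t = np.asarray(treated_weights, dtype=float)
    w_c = np.asarray(control_weights, dtype=float)
    if cov_names is None:
        cov_names = [f"cov_{j}" for j in range(X.shape[1])]

    if w_t.sum() == 0 or w_c.sum() == 0:
        # Assumed shape of the documented "empty / NaN summaries".
        return {"smd": {}, "smd_abs_max": float("nan"),
                "smd_squared_sum": float("nan")}

    # Renormalise so raw weights that sum to K are accepted.
    w_t = w_t / w_t.sum()
    w_c = w_c / w_c.sum()

    # Assumption: default scale is the cross-unit population std (ddof=0).
    if cov_scales is None:
        scales = X.std(axis=0)
    else:
        scales = np.asarray(cov_scales, dtype=float)

    diff = (w_t @ X - w_c @ X) / scales
    return {
        "smd": {name: float(d) for name, d in zip(cov_names, diff)},
        "smd_abs_max": float(np.max(np.abs(diff))),
        "smd_squared_sum": float(np.sum(diff ** 2)),
    }


# Four units, two covariates; unit 0 treated, units 1 and 3 as controls.
X = [[1.0, 2.0], [3.0, 2.0], [1.0, 6.0], [3.0, 6.0]]
result = smd_sketch(X, treated_weights=[2, 0, 0, 0],
                    control_weights=[0, 1, 0, 1])
empty = smd_sketch(X, treated_weights=[0, 0, 0, 0],
                   control_weights=[0, 1, 0, 1])
```

Here the treated mean is (1, 2), the control mean is (3, 4), and the column scales are (1, 2), giving standardized differences of -2 and -1.
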