Synthetic Controls for Experimental Design (MAREX)

When to Use This Estimator#

The estimators elsewhere in mlsynth are retrospective: a treatment has already happened and you reweight donors to reconstruct the treated unit’s counterfactual. MAREX, due to Abadie and Zhao (2026) [ABADIE2024], is prospective — it designs an experiment. Before any treatment is assigned, and using only pre-experimental data, it chooses which aggregate units to treat and which to hold out as controls, so that the experiment you are about to run yields a credible estimate.

The motivating setting is a firm (say a ride-sharing company) that wants to test a new policy but can only deploy it in a few whole markets. A within-market A/B test is contaminated by interference, because treated and control drivers compete. Randomizing whole markets to treatment is unbiased ex ante, but with a handful of large units it routinely produces treated and control groups with very different baselines, so any single realisation can be far from the truth. MAREX instead picks the treated and control markets so that their pre-experiment predictors match the population. This is a non-randomized design that, the paper shows, substantially reduces estimation bias relative to randomization.

Reach for MAREX when:

Units are large aggregates (markets, regions, stores) and only one or a few can be treated.
You control the assignment and want to choose it well, rather than estimate after the fact.
Interference or equity rules out within-unit randomization, forcing whole-unit treatment.

Notation#

There are \(N\) units \(\mathcal{N} \coloneqq \{1, \dots, N\}\) and \(T\) periods \(t \in \mathcal{T} \coloneqq \{1, \dots, T\}\), 1-indexed; the experiment takes effect after period \(T_0\), splitting \(\mathcal{T}\) into the pre-experiment window \(\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}\) (of length \(T_0\)) and the experimental window \(\mathcal{T}_2 \coloneqq \{t \in \mathcal{T} : t > T_0\}\). Because MAREX designs an experiment rather than reweighting around one already-treated unit, units are indexed generically by \(i, j \in \mathcal{N}\) with no forced treated unit. Each unit has a pre-intervention predictor vector \(\mathbf{x}_j\) (pre-period outcomes and optional covariates); \(\bar{\mathbf{x}} = \sum_j f_j \mathbf{x}_j\) is the population predictor mean for known weights \(f_j\) (e.g. market shares, or \(1/N\)). The experimenter chooses treated weights \(\mathbf{w}\) and control weights \(\mathbf{v}\), both on the simplex, and disjoint:

\[\sum_j w_j = 1,\quad \sum_j v_j = 1,\quad w_j, v_j \ge 0,\quad w_j v_j = 0 \;\;\forall j.\]

Units with \(w_j > 0\) are treated; among the rest, units with \(v_j > 0\) form the synthetic control. Writing \(y_{jt}\) for the observed outcome (treated units realise \(y_{jt}^I\) over \(\mathcal{T}_2\), everyone else \(y_{jt}^N\)), the design estimator of the average effect is

\[\widehat{\tau}_t(\mathbf{w}, \mathbf{v}) = \sum_j w_j y_{jt} - \sum_j v_j y_{jt}, \qquad t \in \mathcal{T}_2.\]
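
As a minimal sketch of the estimator above (not mlsynth's internal code), the contrast is two weighted averages of the observed experimental-window outcomes. The function name, the tolerance, and the toy numbers are illustrative assumptions.

```python
import numpy as np

def design_estimate(w, v, Y_post, tol=1e-9):
    """Return tau_hat_t = sum_j w_j y_jt - sum_j v_j y_jt for each period.

    w, v   : length-N treated / control weight vectors
    Y_post : (N, T_post) observed outcomes over the experimental window
    """
    w, v = np.asarray(w, float), np.asarray(v, float)
    Y_post = np.asarray(Y_post, float)
    # Both weight vectors must lie on the simplex ...
    if min(w.min(), v.min()) < -tol:
        raise ValueError("weights must be non-negative")
    if abs(w.sum() - 1) > tol or abs(v.sum() - 1) > tol:
        raise ValueError("weights must sum to one")
    # ... and be disjoint: no unit is both treated and a control.
    if np.any(w * v > tol):
        raise ValueError("w and v must have disjoint support")
    return w @ Y_post - v @ Y_post

# Toy panel: 4 units, 3 experimental periods (made-up numbers).
Y_post = np.array([[10.0, 11.0, 12.0],
                   [ 8.0,  8.0,  8.0],
                   [ 9.0,  9.0, 10.0],
                   [ 7.0,  7.0,  6.0]])
w = np.array([1.0, 0.0, 0.0, 0.0])   # unit 0 is treated
v = np.array([0.0, 0.5, 0.5, 0.0])   # units 1 and 2 form the synthetic control
tau_hat = design_estimate(w, v, Y_post)
print(tau_hat)
```

Unit 3 carries zero weight on both sides, which the constraints allow: it is simply unused by the design.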

Assumptions#

Assumption 1 (linear factor model). Potential outcomes follow

\[y_{jt}^N = \delta_t + \boldsymbol{\theta}_t^\top \mathbf{z}_j + \boldsymbol{\lambda}_t^\top \boldsymbol{\mu}_j + \varepsilon_{jt}, \qquad y_{jt}^I = \upsilon_t + \boldsymbol{\gamma}_t^\top \mathbf{z}_j + \boldsymbol{\eta}_t^\top \boldsymbol{\mu}_j + \xi_{jt},\]

with observed covariates \(\mathbf{z}_j\), unobserved factors \(\boldsymbol{\mu}_j\), and mean-zero idiosyncratic noise.

Remark. This is the interactive-fixed-effects model of the SC literature (Abadie-Diamond-Hainmueller 2010), extended with a separate factor structure for the treated potential outcome — necessary because a design must choose a treatment group, not just a comparison group.

Assumption 2 (regularity). The factor loadings are non-degenerate (\(F \le T_E\), with \(F\) the number of factors and \(T_E\) the number of fitting periods, and the smallest eigenvalue bounded below) and the noise is i.i.d. sub-Gaussian with common variance, independent across the two potential outcomes; dependence across units is allowed.

Remark. These conditions keep the factor structure recoverable from the pre-experiment window and the noise well-behaved, so the population predictor mean \(\bar{\mathbf{x}}\) is a meaningful matching target rather than an artifact of a degenerate loading matrix.

Assumption 3 / 4 (fit quality). A weight vector reproducing the population predictor means exists exactly (Assumption 3), or approximately within a tolerance \(d\) (Assumption 4). This is the design-time analogue of “the treated unit lies in the convex hull of the donors.”

Remark. Under these conditions Abadie & Zhao bound the bias of \(\widehat{\tau}_t(\mathbf{w}, \mathbf{v})\) and develop the permutation test below; the better the pre-experiment match, the smaller the bias.

Mathematical Formulation#

The Design Optimization#

To match the population predictor mean, MAREX chooses \(\mathbf{w}, \mathbf{v}\) and a binary selection mask \(\mathbf{z}\) (not the covariates \(\mathbf{z}_j\) of Assumption 1). The constraints \(w_j \le z_j\) and \(v_j \le 1 - z_j\) ensure that a unit is treated or a control, never both. The base design minimises

\[\min_{\mathbf{w}, \mathbf{v}, \mathbf{z}}\; \Bigl\| \bar{\mathbf{x}} - \sum_j w_j \mathbf{x}_j \Bigr\|_2^2 + \Bigl\| \bar{\mathbf{x}} - \sum_j v_j \mathbf{x}_j \Bigr\|_2^2 \quad \text{s.t. the simplex / disjointness / cardinality constraints,}\]

with the number of treated units pinned by m_eq (exactly) or bounded by m_min/m_max. This is a mixed-integer quadratic program (the binary z); mlsynth solves it with SCIP by default, or — via relaxed=True — relaxes z to \([0, 1]\), solves the QP, and discretises post hoc.
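
For a handful of units, the cardinality-constrained design can be illustrated without a MIQP solver: enumerate every treated set of size m, fit each side by simplex-constrained least squares, and keep the best split. The sketch below does this with projected gradient descent. It is a toy stand-in for intuition, not how mlsynth (via SCIP) solves the problem; the function names are hypothetical and the solution is only as accurate as the fixed iteration budget allows.

```python
from itertools import combinations
import numpy as np

def project_simplex(u):
    """Euclidean projection of u onto the probability simplex."""
    s = np.sort(u)[::-1]
    css = np.cumsum(s) - 1.0
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(s - css / k > 0)[0][-1]
    return np.maximum(u - css[rho] / (rho + 1), 0.0)

def simplex_fit(X, target, iters=500):
    """Approximately solve min_w ||target - X w||^2 over the simplex."""
    n = X.shape[1]
    w = np.full(n, 1.0 / n)
    # Step 1/L, where L = 2 * sigma_max(X)^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2 + 1e-12)
    for _ in range(iters):
        w = project_simplex(w - step * 2.0 * X.T @ (X @ w - target))
    return w, float(np.sum((target - X @ w) ** 2))

def brute_force_design(X, f, m):
    """X: (P, N) predictors in columns; f: population weights; m: treated count."""
    N = X.shape[1]
    xbar = X @ f                       # population predictor mean
    best = None
    for treated in combinations(range(N), m):
        treated = list(treated)
        control = [j for j in range(N) if j not in treated]
        w_s, loss_w = simplex_fit(X[:, treated], xbar)
        v_s, loss_v = simplex_fit(X[:, control], xbar)
        if best is None or loss_w + loss_v < best[0]:
            w, v = np.zeros(N), np.zeros(N)
            w[treated], v[control] = w_s, v_s
            best = (loss_w + loss_v, w, v)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 6))            # 5 predictors, 6 units (simulated)
f = np.full(6, 1 / 6)                  # uniform population weights
loss, w, v = brute_force_design(X, f, m=2)
print("treated:", np.flatnonzero(w), "objective:", round(loss, 4))
```

Enumeration costs one pair of fits per candidate treated set, so it is only practical for small N and m; that combinatorial growth is what the MIQP formulation hands to branch-and-bound.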

Most of the MIQP’s cost is SCIP proving a design optimal, which grows steeply with the number of markets and treated units. Two options manage that. A warm_start (a list of treated unit labels) seeds the search with a known good design — for instance LEXSCM’s top candidate, which solves a near-identical problem by lexicographic search rather than by proof: MAREX(..., warm_start=lexscm_warm_start(lex.fit())) (the helper lexscm_warm_start lives in mlsynth.utils.marex_helpers.warmstart). The seed is only a hint, so a solve that runs to completion returns the identical proven optimum; it just starts the branch-and-bound from a strong incumbent instead of hunting for one. Paired with time_limit (seconds), MAREX then returns the best design found within the budget — typically the seed, confirmed or refined — instead of paying the full optimality proof. Both default to off and apply to the exact MIQP only.

mlsynth exposes four objective variants through design (clear names that map to the paper’s formulations):

"standard" — match each predictor mean with both synthetic units (formulation 5);
"weakly_targeted" — match the treated synthetic to the mean and softly tie the control synthetic to it (weight beta);
"penalized" — standard plus a distance penalty that down-weights units far from the population mean (lambda1 / lambda2);
"unit_penalized" — standard plus unit-level penalties (lambda1_unit / lambda2_unit).

Covariates#

By default the design matches on pre-period outcomes. Passing covariates (time-invariant column names) appends them to the predictor vector, \(\mathbf{x}_j = [\mathbf{y}^E_j ; \mathbf{z}_j]\), exactly as in the paper — the synthetic treated and control are then balanced on both pre-period outcomes and covariates (with an optional covariate_weight scale). When the pre-period is long, the outcomes already encode the covariates’ contribution, so covariates matter most when few pre-periods are available.
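
A rough sketch of how such a predictor matrix could be assembled is shown below. The helper name is hypothetical, and the order of operations (standardise each column to unit variance, then scale the covariate block by covariate_weight) is an assumption made for illustration; mlsynth's exact convention may differ.

```python
import numpy as np

def _unit_variance(A):
    """Scale each column to unit cross-unit variance (constant columns untouched)."""
    sd = A.std(axis=0, ddof=1)
    return A / np.where(sd > 0, sd, 1.0)

def build_predictors(Y_pre, Z=None, covariate_weight=1.0, standardize=False):
    """Stack pre-period outcomes (N, T_E) and covariates (N, K) into rows x_j."""
    Y_pre = np.asarray(Y_pre, float)
    blocks = [_unit_variance(Y_pre) if standardize else Y_pre]
    if Z is not None:
        Z = np.asarray(Z, float)
        Z = _unit_variance(Z) if standardize else Z
        blocks.append(covariate_weight * Z)   # assumed: weight scales the block
    return np.hstack(blocks)

# Toy data: 4 units, 3 pre-periods, 2 time-invariant covariates.
Y_pre = np.array([[1, 2, 3], [2, 3, 5], [3, 5, 4], [4, 4, 6]])
Z = np.array([[0.1, 10], [0.4, 20], [0.2, 15], [0.3, 30]])
X = build_predictors(Y_pre, Z, covariate_weight=2.0, standardize=True)
x_bar = X.mean(axis=0)                        # population mean with f_j = 1/N
print(X.shape, x_bar.shape)
```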

Clustering, Costs, and Budgets#

Passing a cluster column solves the design within each cluster (one or a few treated units per cluster), which better approximates the population predictor distribution and limits interpolation bias (paper OA.1). Per-unit costs and a budget (scalar or per-cluster) add a knapsack constraint \(\sum_j c_j w_j \le B\), so the chosen treatment group respects a spend cap.

Geographic design restrictions#

MAREX already carries most geographic structure natively. Its cluster field is the paper’s “region as a distinct experimental design” (p.13: weather patterns dependent across cities in the same region motivate treating each region separately); m_min / m_max per cluster is a stratum quota (“at least / at most this many treated per region”); costs / budget is the cost bound; and because each cluster’s control synthetic \(v_{\cdot,k}\) is built only from cluster-k members, donors are automatically same-region. The same restriction vocabulary SYNDES and GEOLIFT expose then adds the rest, as constraints on the MIP:

to_be_treated / not_to_be_treated – force a market in (one leadership already committed to) or out (a regulated market, or one already under another test); a forbidden market stays a donor.
adjacency + spillover_threshold – no two treated markets may share a border (\(\sum_k z_{ik} + \sum_k z_{jk} \le 1\)), for when one test market’s campaign would bleed into a neighbouring one and bias the lift.
size_col + min_size / max_size – a treated-unit size band (the floor is a power minimum, the ceiling is synthesizability); out-of-band markets stay donors.
cluster_col – at most one treated market per cluster value (\(\sum_k z_{ik} + \sum_k z_{jk} \le 1\) for same-cluster pairs), a no-two-from-one-cluster rule for markets that share a media buy or retail footprint. This is distinct from the cluster design grouping above: it adds conflict constraints to the treated set rather than splitting the objective.
stratum_col + min_per_stratum / max_per_stratum – a coverage quota on the treated set of one design (at least / at most this many treated per stratum). Use this when the design is a single cluster but the read still has to span regions; when each region is its own design, cluster with per-cluster m_min / m_max expresses the same quota.
exclude_bordering_donors – drop a treated market’s bordering neighbours from its (within-cluster) control pool, so a market partly treated by spillover never sits in its counterfactual. Requires adjacency.

These are supported in the exact MIQP only (not relaxed=True, whose post-hoc rounding cannot guarantee them). An over-constrained design – e.g. forbidding every market of a region, leaving its required treated synthetic with no member – raises a translated MlsynthEstimationError naming the restrictions. The example below runs on the bundled real US DMA contiguity map and Census regions, with a region-grouped linear factor sales model (after Liao, Shi & Zheng [RelaxSC]); the geography is real, and the sales are simulated from a fixed seed, so they are reproducible.

import numpy as np
import pandas as pd
from mlsynth import MAREX

base = ("https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
        "refs/heads/main/basedata/markets")
adj = pd.read_csv(f"{base}/dma_adjacency.csv", index_col=0)
meta = pd.read_csv(f"{base}/dma_metadata.csv")
CDC = {**{s: "Midwest" for s in ["OH", "IN", "MI", "KY"]},
       **{s: "South" for s in ["WV", "VA", "TN", "GA", "FL", "SC"]}}
meta = meta[meta["state"].isin(CDC)].copy()
meta["region"] = meta["state"].map(CDC)
names = [n for n in meta["dma_name"] if n in adj.index]
meta = meta[meta["dma_name"].isin(names)].reset_index(drop=True)
adj = adj.loc[names, names]

rng = np.random.default_rng(7)
reg = meta["region"].to_numpy(); n, r, T, T0 = len(names), 3, 30, 24
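# Draw one loading vector per region once, so markets in a region share it
# (building the dict inside the list comprehension would redraw it per market).
region_load = {g: rng.normal(size=r) for g in sorted(set(reg))}
Lam = np.array([region_load[g] for g in reg]) + 0.15 * rng.normal(size=(n, r))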
F = np.cumsum(rng.normal(size=(T, r)), axis=0)
pop = np.round(rng.lognormal(12.5, 0.7, n)).astype(int)
Y = 100 + rng.normal(0, 5, n) + F @ Lam.T + rng.normal(0, 1, (T, n))
df = pd.DataFrame([{"dma": names[j], "week": t, "sales": float(Y[t, j]),
                    "region": reg[j], "population": int(pop[j])}
                   for j in range(n) for t in range(T)])

B = dict(df=df, outcome="sales", unitid="dma", time="week",
         cluster="region", T0=T0)                    # cluster = Census region

MAREX({**B, "m_max": 2}).fit()                       # native quota: <=2 per region
MAREX({**B, "m_max": 2, "to_be_treated": ["Atlanta, GA"],
       "not_to_be_treated": ["Miami-Fort Lauderdale, FL"]}).fit()
MAREX({**B, "m_max": 2, "adjacency": adj,
       "spillover_threshold": 0.5}).fit()            # no two treated border
MAREX({**B, "m_max": 2, "size_col": "population",
       "min_size": 200_000, "max_size": 3_000_000}).fit()
MAREX({**B, "m_max": 1, "adjacency": adj, "spillover_threshold": 0.5,
       "exclude_bordering_donors": True}).fit()      # non-bordering donors

Across groups: cluster vs. quotas vs. restrictions#

Three mechanisms touch on grouping and geography: cluster, per-cluster quotas, and the restriction suite. They are distinct, not interchangeable, and cluster is not made obsolete by the restrictions:

A separate experiment per group, each with its own representativeness target and donor pool: cluster. This is the design’s objective, not a constraint: each cluster k reconstructs its own predictor mean \(\overline{X}_k\). It is the analog of the arm in SYNDES (see Synthetic Design (SYNDES)), but built into the objective rather than run as separate solves. That is why, in MAREX, the restrictions compose with cluster (they apply within each cluster) and the per-cluster cardinality m_min / m_max is the stratum quota.
One design with geographic / forcing limits within those clusters: the restriction suite above (force in/out, border conflict, size band, donor exclusion).

So a stratum quota is not an alternative to cluster; in MAREX it is a property of the clustering. Drop cluster (a single global cluster) and you recover one design against the whole-population mean – a different estimand, not a constrained version of the clustered one.
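
The border-conflict restriction (adjacency with spillover_threshold) can be sanity-checked on any proposed treated set after the fact. The sketch below is a hypothetical helper, not part of mlsynth: it uses a plain dict of dicts rather than the DataFrame the estimator accepts, and it assumes two markets conflict when their adjacency weight is strictly above the threshold.

```python
from itertools import combinations

def border_conflicts(treated, adjacency, threshold=0.0):
    """Return the treated pairs whose adjacency weight exceeds the threshold."""
    return [(a, b) for a, b in combinations(sorted(treated), 2)
            if adjacency.get(a, {}).get(b, 0.0) > threshold]

# Toy contiguity map: A borders B, B borders C, A and C do not touch.
adjacency = {
    "A": {"B": 1, "C": 0},
    "B": {"A": 1, "C": 1},
    "C": {"A": 0, "B": 1},
}
print(border_conflicts(["A", "C"], adjacency, threshold=0.5))   # no conflict
print(border_conflicts(["A", "B"], adjacency, threshold=0.5))   # one conflict
```

An empty list means the treated set satisfies the pairwise rule \(\sum_k z_{ik} + \sum_k z_{jk} \le 1\) for every bordering pair.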

Inference#

When inference=True with blank_periods > 0, the last few pre-experiment periods are held out as blanks. No treatment is active there, so the synthetic treated minus synthetic control gap is pure noise, and its distribution calibrates inference for the post-period effect. MAREX reports a permutation p-value for the global null of no effect, per-period p-values, and a confidence band from split-conformal prediction (Lei et al. 2018; Vovk, Gammerman, and Shafer 2005) built on the held-out blank periods. The permutations rearrange only the blank and post-intervention periods, not all periods as in Chernozhukov-Wuthrich-Zhu (2021). All of these are returned on MAREXInference.
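
To make the blank-period logic concrete, here is a small exact permutation test over the pooled blank and post-period gaps. It is a sketch under stated assumptions, not mlsynth's implementation: the function name is hypothetical, the test statistic (mean absolute gap over the periods labelled "post") is chosen for illustration, and every relabelling is enumerated rather than sampled.

```python
from itertools import combinations

def blank_permutation_pvalue(gap_blank, gap_post):
    """Share of relabellings whose statistic is at least the observed one."""
    pooled = list(gap_blank) + list(gap_post)
    k = len(gap_post)

    def stat(idx):
        # Mean absolute gap over the periods treated as "post".
        return sum(abs(pooled[i]) for i in idx) / k

    observed = stat(range(len(gap_blank), len(pooled)))
    draws = [stat(idx) for idx in combinations(range(len(pooled)), k)]
    # Small tolerance so the observed labelling always counts itself.
    return sum(d >= observed - 1e-12 for d in draws) / len(draws)

gap_blank = [0.1, -0.2, 0.05, -0.1]    # noise-only gaps on the blank window
gap_post = [1.0, 1.2]                  # gaps once the treatment is active
p = blank_permutation_pvalue(gap_blank, gap_post)
print(round(p, 4))
```

With four blank and two post periods there are only 15 relabellings, so the smallest attainable p-value is 1/15; more blank periods give a finer grid.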

Standardized Post-Fit and Power Analysis#

Every call to MAREX.fit() attaches a SyntheticControlPostFit to res.post_fit. This is the single, estimator-agnostic surface for the diagnostic numbers a consumer of the design typically needs: effects, fit RMSEs, conformal / permutation inference, covariate balance (when covariates were used), and power analysis. It is computed by compute_post_fit() from MAREX’s own synthetic_treated / synthetic_control trajectories and weight vectors, so by construction it agrees with what the underlying optimization produced.

pf = res.post_fit                          # SyntheticControlPostFit
pf.ate, pf.ate_percent, pf.total_effect    # treatment-effect scalars
pf.rmse_fit, pf.rmse_blank, pf.rmse_post   # fit quality, per phase
pf.p_value, pf.ci_lower, pf.ci_upper       # inference (when computed)
pf.covariate_smd                           # treated-vs-control SMD dict
pf.covariate_smd_treated_vs_pop            # treated-vs-population
pf.covariate_smd_control_vs_pop            # control-vs-population
pf.power                                   # PowerAnalysis (see below)

Three Standardized Mean Differences#

When covariates=[...] is set, the post-fit reports the three covariate balance diagnostics that match the structure of Abadie & Zhao’s objective. Each is a per-covariate signed dict (covariate_smd_*) plus two summary scalars (max absolute SMD, sum of squared SMDs). With \(\bar{\mathbf{x}}\) the population covariate aggregate, \(\mathbf{x}_w \coloneqq \sum_j w_j \mathbf{x}_j\), \(\mathbf{x}_v \coloneqq \sum_j v_j \mathbf{x}_j\), and \(s_m\) the cross-unit standard deviation of covariate \(m\), each comparison is the unit-free vector

\[\mathrm{SMD}_m^{(a,b)} = \frac{x_a[m] - x_b[m]}{s_m}.\]

The three pairs (a, b) reported are:

covariate_smd — \((\mathbf{x}_w, \mathbf{x}_v)\): synthetic treated vs synthetic control. The internal-validity check (“is the experiment apples-to-apples?”).
covariate_smd_treated_vs_pop — \((\mathbf{x}_w, \bar{\mathbf{x}})\): synthetic treated vs population aggregate. Tracks the first term of MAREX’s objective, \(\|\bar{\mathbf{x}} - \sum_j w_j \mathbf{x}_j\|^2\). Tells you whether the chosen treated group represents the population.
covariate_smd_control_vs_pop — \((\mathbf{x}_v, \bar{\mathbf{x}})\): synthetic control vs population aggregate. Tracks the second term of the objective. Tells you whether the control set represents the population.

As a rule of thumb, \(|\mathrm{SMD}| < 0.1\) is conventionally read as “well balanced” and values below \(0.25\) as acceptable; anything larger is a red flag.
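
The three comparisons share one formula and differ only in which pair of weight vectors is plugged in. A minimal sketch follows; the function name is hypothetical, and using the sample standard deviation (ddof=1) for \(s_m\) is an assumption.

```python
import numpy as np

def smd(X, a, b):
    """Per-covariate SMD between two weighted aggregates.

    X    : (N, M) covariate matrix, one row per unit
    a, b : length-N weight vectors (treated, control, or population weights)
    """
    X = np.asarray(X, float)
    s = X.std(axis=0, ddof=1)          # cross-unit SD of each covariate
    return (np.asarray(a) @ X - np.asarray(b) @ X) / s

X = np.array([[1, 10], [2, 20], [3, 30], [4, 40]])
f = np.full(4, 0.25)                   # population weights
w = np.array([0.0, 0.5, 0.5, 0.0])     # synthetic treated
v = np.array([0.5, 0.0, 0.0, 0.5])     # synthetic control

treated_vs_control = smd(X, w, v)
treated_vs_pop = smd(X, w, f)
control_vs_pop = smd(X, v, f)
max_abs = float(np.max(np.abs(treated_vs_control)))     # summary scalar
sum_sq = float(np.sum(treated_vs_control ** 2))         # summary scalar
print(treated_vs_control, max_abs, sum_sq)
```

In this toy case both synthetic units reproduce the population mean exactly, so all three SMD vectors are zero.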

Power Analysis and Minimum Detectable Effect#

Power analysis answers the pre-experiment planning question: given the design I’ve chosen, how large a treatment effect can I detect with high probability? This is the dual of inference: inference asks “is the observed effect distinguishable from noise?”, power asks “what effect sizes would be?”

The paper’s inference is built to stay valid under serial dependence and non-stationarity; staying squarely within that framework, mlsynth also reports a planning-oriented minimum detectable effect – an analytical, AR(1)-inflated Gaussian MDE computed from the same residual series the permutation test draws on. Set inference=True and the result auto-populates res.post_fit.power — blank_periods defaults to max(1, floor(0.3 * T0)) so you do not need to pick a scalar yourself (matching the LEXSCM / SYNDES / PANGEO convention).

Where the noise standard deviation comes from#

Under the linear factor model of Assumption 1, the per-period contrast \(g_t \coloneqq \sum_j w_j y_{jt} - \sum_j v_j y_{jt}\) has expectation zero under the no-effect null. Its sample SD on the blank window \(\mathcal{B}\) (the held-out tail of the pre-period) is the natural estimator of the noise scale:

\[\widehat{\sigma}_{\text{placebo}} = \sqrt{\frac{1}{|\mathcal{B}| - 1} \sum_{t \in \mathcal{B}} \bigl(g_t - \bar g\bigr)^2}.\]

When no blank window is carved out (inference=False) the pre-period gap serves as the placebo proxy. The blank-window estimator is preferred because it uses periods that played no role in fitting the weights — it is honest in exactly the same sense Chernozhukov-Wuthrich-Zhu’s conformal residuals are.

Serial correlation matters#

Synthetic-control gap residuals are virtually always serially correlated: the donor weighting absorbs the level but the persistent components of the factor structure (business cycles, seasonality, slow trends) leak through. Ignoring this systematically under-states the SE at long horizons. We model it as an AR(1) process with lag-1 autocorrelation

\[\widehat{\rho} = \frac{\sum_t g_t g_{t-1}}{\sum_t g_t^2},\]

clipped to \((-0.99, 0.99)\) for numerical safety. The variance of the mean of \(T\) consecutive AR(1) periods, expressed as a multiple of \(\sigma^2\), is the variance inflation factor

\[\mathrm{VIF}(T, \rho) = \frac{1}{T}\!\left(1 + 2 \sum_{k=1}^{T-1}\!\Bigl(1 - \frac{k}{T}\Bigr)\rho^k\right),\]

which collapses to the textbook \(1/T\) when \(\rho = 0\) and grows substantially for \(\rho > 0.3\). The same formula is used by PANGEO’s power module.
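
The three quantities defined so far can be written out directly from their formulas. This standard-library sketch mirrors the equations above; the function names are hypothetical and it is not PANGEO's or mlsynth's own code.

```python
import math

def sigma_placebo(g):
    """Sample SD (divisor |B| - 1) of the blank-window gaps g_t."""
    gbar = sum(g) / len(g)
    return math.sqrt(sum((x - gbar) ** 2 for x in g) / (len(g) - 1))

def ar1_rho(g, clip=0.99):
    """Lag-1 autocorrelation sum_t g_t g_{t-1} / sum_t g_t^2, clipped."""
    num = sum(g[t] * g[t - 1] for t in range(1, len(g)))
    den = sum(x * x for x in g)
    rho = num / den if den > 0 else 0.0
    return max(-clip, min(clip, rho))

def vif(T, rho):
    """Variance of the mean of T consecutive AR(1) periods, per unit sigma^2."""
    return (1.0 + 2.0 * sum((1 - k / T) * rho ** k for k in range(1, T))) / T

print(vif(4, 0.0))    # no serial correlation: the textbook 1/T
print(vif(2, 0.5))    # (1/2) * (1 + 2 * (1/2) * 0.5)
```

For \(T = 2\) the sum has a single term, so \(\mathrm{VIF}(2, \rho) = (1 + \rho)/2\): at \(\rho = 0.5\) averaging two periods cuts the variance by only a quarter instead of a half.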

The MDE formula#

Combining: the standard error of the mean of \(T\) post-period contrasts under \(H_0\) is \(\mathrm{SE}(T) = \widehat{\sigma}_{\text{placebo}} \, \sqrt{\mathrm{VIF}(T, \widehat{\rho})}\). For a two-sided test at level \(\alpha\) with target power \(1 - \beta\), the minimum detectable effect is

\[\mathrm{MDE}(T) = \bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr) \cdot \widehat{\sigma}_{\text{placebo}} \cdot \sqrt{\mathrm{VIF}(T, \widehat{\rho})}.\]

The corresponding power to detect a given true effect \(\tau\) at horizon \(T\) is

\[\pi(\tau, T) = \Phi\!\Bigl(\frac{|\tau|}{\mathrm{SE}(T)} - z_{1-\alpha/2}\Bigr) + \Phi\!\Bigl(-\frac{|\tau|}{\mathrm{SE}(T)} - z_{1-\alpha/2}\Bigr),\]

which is reported as power_at_observed for each horizon point using the realised \(\widehat{\tau}\).
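
Both formulas need only the normal quantile and CDF, so they can be reproduced with the standard library. The sketch below takes the standard error \(\mathrm{SE}(T)\) as an input; the function names are hypothetical and the defaults follow the 0.05 / 0.80 convention used in this section.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def mde(se, alpha=0.05, power_target=0.80):
    """Minimum detectable effect: (z_{1-alpha/2} + z_{1-beta}) * SE."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power_target)   # z_{1-beta}, power = 1 - beta
    return (z_alpha + z_power) * se

def power(tau, se, alpha=0.05):
    """Two-sided power to detect a true effect tau at standard error se."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(tau) / se
    return _STD_NORMAL.cdf(shift - z_alpha) + _STD_NORMAL.cdf(-shift - z_alpha)

se = 1.0                        # e.g. sigma_placebo * sqrt(VIF(T, rho))
print(round(mde(se), 3))        # about 2.8 standard errors
print(round(power(mde(se), se), 3))
print(round(power(0.0, se), 3)) # with no effect, power equals the test size
```

The two functions are consistent by construction: evaluating the power at the MDE returns the target power, up to the negligible second tail term.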

What the surface looks like#

p = res.post_fit.power                  # PowerAnalysis dataclass

p.headline.mde_absolute                 # MDE at the realised T_post
p.headline.mde_pct                      # ... as % of post-period baseline
p.headline.se                           # implied SE of mean(g_t) over T_post
p.headline.power_at_observed            # power to detect res.post_fit.ate

p.curve                                 # tuple of MDEPoint, one per horizon
for pt in p.curve:
    print(pt.post_periods, pt.mde_absolute, pt.mde_pct, pt.power_at_observed)

p.sigma_placebo                         # σ̂ used (from blank or pre window)
p.serial_correlation                    # ρ̂ AR(1) of the placebo gaps
p.baseline                              # mean(synthetic_control) on post window
p.alpha, p.power_target                 # 0.05 / 0.80 by default
p.method                                # "analytical_ar1"

The default horizon grid covers \(T \in \{1, 2, 4, 6, 8, 12\}\) plus the realised n_post, so the table also doubles as a “how long do I need to run?” answer — pick the smallest \(T\) whose MDE drops below your target effect size.
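
Reading the run length off the curve is a one-line scan. The helper below is a hypothetical convenience, not part of mlsynth; it relies only on the post_periods and mde_absolute attributes shown above and uses a namedtuple as a stand-in for MDEPoint, with made-up numbers.

```python
from collections import namedtuple

def shortest_adequate_horizon(curve, target_effect):
    """Smallest horizon whose MDE is below the target effect, or None."""
    adequate = [pt.post_periods for pt in curve
                if pt.mde_absolute < abs(target_effect)]
    return min(adequate) if adequate else None

# Stand-in for the MDEPoint entries of p.curve (illustrative values).
Point = namedtuple("Point", "post_periods mde_absolute")
curve = [Point(1, 3.0), Point(2, 2.2), Point(4, 1.6), Point(6, 1.3)]

print(shortest_adequate_horizon(curve, target_effect=2.0))
print(shortest_adequate_horizon(curve, target_effect=1.0))
```

A None result means no horizon on the grid is long enough, which points back to the design itself rather than the run length.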

Practical reading#

A typical MAREX run with T_post = 6, blank_periods = 4 and modest serial correlation (\(\widehat{\rho} \approx 0.5\)) on a Walmart-style sales panel produces an MDE on the order of 0.05–0.15% of mean sales, well below the 1–3% effect sizes that typical marketing interventions aim for; this is the quantitative substance of “good designs are well-powered”. Conversely, an MDE well above the expected effect signals that the design needs more treated units (a lower m_eq / m_max is typically worse for power) or more post-periods (a longer experiment).

Opting out#

The power computation is wrapped in a try/except in solve_marex(), so a power-analysis failure (e.g. degenerate residual variance) never breaks the fit; res.post_fit.power is simply left as None. To compute power on a non-default horizon grid or significance level, call the free function directly:

from mlsynth.utils.post_fit import compute_power_analysis

alt = compute_power_analysis(
    res.post_fit, alpha=0.10, power_target=0.90,
    post_grid=[2, 4, 8, 16, 32, 52],     # weekly horizons out to a year
)

Monte Carlo: Recovering the Treatment Effect#

The block below replicates the qualitative finding of the paper’s simulation study (Section 5) using mlsynth’s own reimplementation of the linear-factor DGP. A sample is drawn, the design is fit on the pre-period, the treated units realise \(y^I\) in the experiment, and the estimate is compared to the true average effect.

import numpy as np
import pandas as pd
from mlsynth import MAREX
from mlsynth.utils.marex_helpers.simulation import generate_marex_sample

rng = np.random.default_rng(0)

def design_mae(sample, **card):
    J, T = sample.Y_N.shape
    T0 = sample.T0
    df = pd.DataFrame(
        [{"unit": f"u{j}", "time": t, "y": float(sample.Y_N[j, t])}
         for j in range(J) for t in range(T)]
    )
    res = MAREX({"df": df, "outcome": "y", "unitid": "unit",
                 "time": "time", "T0": T0, **card}).fit()
    w = res.globres.treated_weights_agg
    v = res.globres.control_weights_agg
    treated = np.where(w > 1e-8)[0]
    Y_obs = sample.Y_N.copy()
    Y_obs[treated, T0:] = sample.Y_I[treated, T0:]   # experiment realises Y^I
    tau_hat = w @ Y_obs[:, T0:] - v @ Y_obs[:, T0:]
    return np.mean(np.abs(tau_hat - sample.tau[T0:]))

maes, scales = [], []
for _ in range(5):
    s = generate_marex_sample(J=12, T=30, T0=25, rng=rng)
    maes.append(design_mae(s, m_min=1, m_max=11))     # Unconstrained
    scales.append(np.mean(np.abs(s.tau[s.T0:])))
print(f"MAE {np.mean(maes):.2f}  vs  effect scale {np.mean(scales):.2f}")

The synthetic-control design recovers the average treatment effect with mean absolute error far below the effect’s own scale (≈ 4.4 vs. ≈ 14, i.e. under a third), the central message of the paper’s Table 2. Over the paper’s full 1000 simulations the error also decreases as more units are allowed into the treated group (the Unconstrained design is best), with the largest gains moving from one to two or three treated units.

Note

This is a Path-B replication: it reproduces the simulation study’s conclusions from public DGPs and mlsynth code, with no dependency on the authors’ replication package. It is locked in as mlsynth.tests.test_marex_replication.

Empirical Application: Walmart (Placebo Experiment)#

We replicate the paper’s empirical illustration (Section 4) on the Walmart store-sales panel (basedata/walmart_weekly_sales.csv): weekly sales for 45 stores over 143 weeks (Feb 2010 – Oct 2012). Following the paper, we design a placebo experiment with a fictitious intervention at week 129: \(T_0 = 128\) pre-experiment weeks, of which the first \(T_E = 100\) are the fitting period and the last 28 are blank, leaving 15 experimental weeks. The design uses the constrained formulation with \(m = 2\) treated stores, uniform weights, and predictors normalised to unit variance (standardize).

import pandas as pd
from mlsynth import MAREX

df = pd.read_csv(
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
    "refs/heads/main/basedata/walmart_weekly_sales.csv"
)

res = MAREX({
    "df": df, "outcome": "sales", "unitid": "store", "time": "week",
    "T0": 128, "blank_periods": 28, "T_post": 15,   # TE=100, 28 blank, 15 post
    "m_eq": 2,                  # constrained design, two treated stores
    "design": "standard",
    "standardize": True,        # unit-variance predictors (paper's normalisation)
    "inference": True,
    "display_graph": True,
}).fit()

print("treated stores:", res.treated_units)              # [1, 15]
print("placebo p-value:", round(res.globres.inference.global_p_value, 3))

Because the intervention is a placebo (no real effect), a correct design should produce synthetic treated and control units that track closely and an estimated effect near zero. mlsynth reproduces that, and its p-value lands close to the paper’s headline number:

Walmart placebo design (m = 2)#
Quantity	`mlsynth`	Paper (Section 4)
Pre-fit RMSE / mean sales	2.2%	small (close tracking)
Experimental ATT / mean sales	-1.0%	near zero
Placebo permutation p-value	0.937	0.933
Confidence band covers zero	yes (all post weeks)	yes

The synthetic treated and control units track to within ~2% of mean sales over the fitting and blank periods, the estimated placebo effect is ~1% of sales in magnitude, and the permutation test fails to reject the null of no effect (\(p = 0.937\), close to the paper’s \(0.933\)). This is the “no spurious effect” result a good design should deliver on a placebo.

Note

This uses the exact MIQP (relaxed=False, the default) with standardize=True; the unit-variance normalisation is essential here because Walmart stores differ enormously in sales level, and without it the level differences dominate the match. The solve takes roughly a minute with the open-source SCIP solver (the paper used commercial Gurobi).

Note

Cross-validated against the authors’ code, without Gurobi. The durable benchmark benchmarks/cases/marex_walmart.py runs this design on the full 45-store panel, additionally matching on the four store-level covariates the reference data carries (temperature, fuel price, CPI, unemployment – the R code’s “few covariates” configuration), and compares it cell-by-cell against a live run of SCDesign’s own design code. The authors’ published routine is a Gurobi non-convex MIQP, but their constrained (cardinality-\(K\)) design – Synthetic_Experiment_Cardinality_Constraint, which enumerates the partitions of size \(\le K\) and solves each treated/control synthetic control through the open quadprog path – is exactly the design m_eq solves and needs no commercial solver. MAREX (SCIP) and SCDesign (quadprog) select the same two treated stores with treated weights agreeing to \(2\times10^{-4}\), the same pre-period fit, and the same placebo effect to \(10^{-4}\) of mean sales; see benchmarks/reference/marex_walmart/ for the captured reference run.

Correspondence with the Authors’ Code#

The authors’ R replication code (Random_Data_Generator.R, Synthetic_Experiments.R, Different_optimization_methods.R) maps directly onto mlsynth’s implementation, which was checked against it:

Authors’ R ↔ `mlsynth` MAREX#
Authors’ R	`mlsynth`
DGP (`Random_Data_Generator.R`)	`generate_marex_sample()`
Formulation (5), Gurobi non-convex QCQP	`design="standard"` (MIQP with binary `z`; same optimum)
Penalization formulation	`design="penalized"` (identical \(\lambda\) distance penalty)
Cardinality formulation	`m_eq` / `m_min` / `m_max`
Predictors \(\mathbf{x} = [\mathbf{y}^E ; \mathbf{z}]\)	`covariates=[...]` (matched on pre-outcomes + covariates)
“treated = smaller set” swap	applied in `solve_marex()`
Exact permutation test (sum statistic)	permutation inference (mlsynth defaults to a mean statistic / sampled permutations)

Driving mlsynth’s solve_design on the authors’ exact DGP and predictor matrix recovers the average treatment effect with error below the effect’s own scale, and the estimate degrades gracefully as the noise SD rises from 1 to 5 to 10 (the settings of the paper’s Figures 2-7), matching the paper’s qualitative findings.

Note

Two faithfulness details from their R code: rnorm(N, 0, noise.variance) passes the value as a standard deviation, so the figures’ “variance” 1/5/10 are SDs; and the R code uses random population weights \(f_j\) whereas the 2026 paper (and mlsynth) use \(f_j = 1/J\). Also note that the unconstrained standard design (formulation 5) is degenerate — many disjoint splits match \(\bar X\) equally well, so the realised design (and hence a single ATE estimate) is solver-dependent; the cardinality-constrained design is the stable, recommended choice.

Example#

import pandas as pd
from mlsynth import MAREX

# df: a long panel with one row per (market, period), assumed already loaded
res = MAREX({
    "df": df, "outcome": "revenue", "unitid": "market", "time": "week",
    "T0": 40,            # 40 pre-experiment weeks
    # Equivalently: pass a 0/1 column marking the experiment window
    # "post_col": "in_experiment",
    "m_eq": 2,           # treat exactly two markets
    "design": "standard",
    "inference": True,   # blank_periods defaults to floor(0.3 * T0) = 12
    "display_graph": True,
}).fit()

print("treated markets:", res.treated_units)
print("global p-value:", res.globres.inference.global_p_value)
for label, c in res.clusters.items():
    print(label, c.unit_weight_map["Treated"])

Verification#

Validated against Abadie & Zhao’s Section 4 Walmart application (their reference code is jinglongzhao2/SCDesign): on a 10-store subset of walmart_weekly_sales.csv MAREX’s exact MIQP designs a placebo experiment that tracks closely pre-period (pre-fit RMSE ~2.7% of mean sales, matching LEXSCM) and yields a placebo effect indistinguishable from zero (~1% of mean, CI covering zero) – the paper’s “no spurious effect” result. This is an independent commit-stamped check (MAREX’s own optimizer) complementing the LEXSCM Walmart benchmark. See MAREX — Abadie & Zhao (2026) Walmart design (live R cross-validation); run it with python benchmarks/run_benchmarks.py marex_walmart.

Note

The benchmark uses the exact MIQP (free SCIP), not the relaxed continuous-z mode: the relaxation shares the design objective but drops the integrality that defines the selection, so its top-m rounding is degenerate and non-deterministic for small treated counts. The authors’ full 45-store MIQP uses Gurobi, so the validator is Path A on a subset rather than a live R cross-validation.

Core API#

MAREX: Synthetic Controls for Experimental Design (Abadie & Zhao 2026).

MAREX designs an experiment on aggregate units (e.g. markets): using only pre-experimental data it selects which units to treat (treated weights w) and which untreated units form the synthetic control (control weights v), on the simplex and disjoint (a unit is treated or a control, never both). The synthetic treated and synthetic control units are built to reproduce population predictor means, so their post-period difference estimates the average treatment effect. Optional clustering treats one (or a few) units per cluster; optional blank-period placebo inference yields p-values and confidence bands.

class mlsynth.estimators.scexp.MAREX(config: MAREXConfig | dict)#

Bases: object

Synthetic-control experimental design estimator (Abadie & Zhao 2026).

Parameters: config (MAREXConfig or dict) – Configuration object. See mlsynth.config_models.MAREXConfig.
Returns: MAREXResults – Per-cluster and aggregate treated/control weights, synthetic series, the selected treated units, and (optionally) placebo inference.

fit() → MAREXResults#: Run the MAREX design and return MAREXResults.

Configuration

class mlsynth.config_models.MAREXConfig(*, df: DataFrame, outcome: str, unitid: str, time: str, T0: int | None = None, post_col: str | None = None, cluster: str | None = None, design: str = 'standard', covariates: List[str] | None = None, covariate_weight: float = 1.0, standardize: bool = False, program_type: str = 'MIQP', display_graph: bool = False, beta: float = 1e-06, lambda1: float = 0.0, lambda2: float = 0.0, xi: float = 0.0, lambda1_unit: float = 0.0, lambda2_unit: float = 0.0, costs: List[float] | None = None, budget: int | Dict[int, int] | None = None, blank_periods: int | None = None, m_eq: int | None = None, m_min: int | None = None, m_max: int | None = None, exclusive: bool = True, relaxed: bool = False, solver: Any = None, verbose: bool = False, warm_start: List | None = None, time_limit: Annotated[float | None, Gt(gt=0.0)] = None, inference: bool = False, T_post: int | None = None, to_be_treated: List | None = None, not_to_be_treated: List | None = None, adjacency: DataFrame | None = None, spillover_threshold: float = 0.0, exclude_bordering_donors: bool = False, size_col: str | None = None, min_size: float | None = None, max_size: float | None = None, cluster_col: str | None = None, stratum_col: str | None = None, min_per_stratum: Annotated[int | None, Ge(ge=1)] = None, max_per_stratum: Annotated[int | None, Ge(ge=1)] = None, top_K: Annotated[int, Ge(ge=1)] = 1, power_weight: Annotated[float, Gt(gt=0)] = 0.51, fit_weight: Annotated[float, Gt(gt=0)] = 0.49, max_shortlist: Annotated[int, Ge(ge=1)] = 5, alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05, power_target: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.8)#

Configuration for the Synthetic Experiment Design estimator (MAREX) in mlsynth.

T0: int | None#

T_post: int | None#

adjacency: pd.DataFrame | None#

alpha: float#

beta: float#

blank_periods: int | None#

budget: int | Dict[int, int] | None#

cluster: str | None#

cluster_col: str | None#

costs: List[float] | None#

covariate_weight: float#

covariates: List[str] | None#

design: str#

display_graph: bool#

exclude_bordering_donors: bool#

exclusive: bool#

fit_weight: float#

inference: bool#

lambda1: float#

lambda1_unit: float#

lambda2: float#

lambda2_unit: float#

m_eq: int | None#

m_max: int | None#

m_min: int | None#

max_per_stratum: int | None#

max_shortlist: int#

max_size: float | None#

min_per_stratum: int | None#

min_size: float | None#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

not_to_be_treated: List | None#

post_col: str | None#

power_target: float#

power_weight: float#

program_type: str#

relaxed: bool#

size_col: str | None#

solver: Any#

spillover_threshold: float#

standardize: bool#

stratum_col: str | None#

time_limit: float | None#

to_be_treated: List | None#

top_K: int#

classmethod validate_design_params(values: Any) → Any#

verbose: bool#

warm_start: List | None#

xi: float#

Result Containers

MAREX.fit() returns a MAREXResults: a dict of per-cluster MAREXClusterDesign objects, the aggregate MAREXGlobalDesign, the MAREXStudy hyperparameters, and (optionally) MAREXInference.

Note

MAREX.fit() returns a DesignResult (the experimental-design family, not an EffectResult): MAREX designs an experiment, so it exposes the standardized design surface – res.report (the realized effect as an EffectResult, the single source for ATT / CI / pre-fit; res.report.att / res.report.counterfactual / …), res.selected_units / res.assignment (treated vs control), res.design_weights, and res.power. The MAREX-specific design detail stays on res.clusters / res.study / res.globres / res.post_fit (the same SyntheticControlPostFit that res.report is built from).

Frozen dataclass containers for the MAREX (synthetic experimental design) pipeline.

Implements the containers for:

Abadie, A., & Zhao, J. (2026). “Synthetic Controls for Experimental Design.”

MAREX designs an experiment on aggregate units: it chooses treated weights w and control weights v (on the simplex, disjoint via w_j v_j = 0) so the synthetic treated and synthetic control units reproduce population predictor means. All containers are frozen (immutable) per the repository convention; inference, when requested, is computed up front and embedded.

class mlsynth.utils.marex_helpers.structures.MAREXClusterDesign(label: str, members: List[Any], cardinality: int, treated_weights: ndarray, control_weights: ndarray, selection_indicators: ndarray, synthetic_treated: ndarray, synthetic_control: ndarray, pre_treatment_means: ndarray, rmse: float, unit_weight_map: Dict[str, Dict[Any, float]], inference: MAREXInference | None = None)#

Bases: object

Design and synthetics for a single cluster.

Parameters:

label (str) – Cluster label.
members (list) – Unit labels in this cluster.
cardinality (int) – Number of units in the cluster.
treated_weights (np.ndarray) – Treated weights w for this cluster’s column, shape (N,).
control_weights (np.ndarray) – Control weights v for this cluster’s column, shape (N,).
selection_indicators (np.ndarray) – Binary selection mask z over the cluster’s members.
synthetic_treated (np.ndarray) – Synthetic treated outcome over the full timeline, shape (T,).
synthetic_control (np.ndarray) – Synthetic control outcome over the full timeline, shape (T,).
pre_treatment_means (np.ndarray) – Cluster predictor means used as the matching target.
rmse (float) – Pre-treatment fit RMSE (synthetic treated vs control).
unit_weight_map (dict) – {"Treated": {unit: w}, "Control": {unit: v}} for non-zero weights.
inference (MAREXInference, optional) – Inference for this cluster (None unless requested).
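The unit_weight_map layout described above can be illustrated with a minimal sketch. The helper name build_unit_weight_map and its tolerance are hypothetical and not part of mlsynth; the sketch only shows how the non-zero entries of w and v might be keyed by unit label, assuming the two weight vectors are aligned with the list of unit labels.

```python
from typing import Any, Dict, Sequence


def build_unit_weight_map(
    units: Sequence[Any],
    w: Sequence[float],
    v: Sequence[float],
    tol: float = 1e-8,  # hypothetical cutoff for "non-zero"
) -> Dict[str, Dict[Any, float]]:
    """Keep only the non-zero treated (w) and control (v) weights, keyed by unit."""
    treated = {u: float(wi) for u, wi in zip(units, w) if abs(wi) > tol}
    control = {u: float(vi) for u, vi in zip(units, v) if abs(vi) > tol}
    # Disjointness (w_j * v_j = 0): no unit may appear on both sides.
    overlap = set(treated) & set(control)
    if overlap:
        raise ValueError(f"units are both treated and control: {sorted(overlap)}")
    return {"Treated": treated, "Control": control}


units = ["A", "B", "C", "D"]
w = [0.0, 1.0, 0.0, 0.0]   # unit B is the single treated unit
v = [0.6, 0.0, 0.4, 0.0]   # units A and C form the synthetic control
weight_map = build_unit_weight_map(units, w, v)
print(weight_map)
```

Unit D carries zero weight on both sides, so it appears in neither dictionary.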

cardinality: int#

control_weights: ndarray#

inference: MAREXInference | None = None#

label: str#

members: List[Any]#

pre_treatment_means: ndarray#

rmse: float#

selection_indicators: ndarray#

synthetic_control: ndarray#

synthetic_treated: ndarray#

treated_weights: ndarray#

unit_weight_map: Dict[str, Dict[Any, float]]#

class mlsynth.utils.marex_helpers.structures.MAREXGlobalDesign(Y_full: ndarray, Y_fit: ndarray, Y_blank: ndarray | None, treated_weights_agg: ndarray, control_weights_agg: ndarray, synthetic_treated: ndarray, synthetic_control: ndarray, inference: MAREXInference | None = None)#

Bases: object

Aggregated (population-level) design and synthetics.

Parameters:

Y_full (np.ndarray) – Observed outcome matrix, shape (N, T).
Y_fit (np.ndarray) – Fitting slice, shape (N, T_fit).
Y_blank (np.ndarray, optional) – Held-out blank pre-periods, shape (N, Tb) (None if none).
treated_weights_agg (np.ndarray) – Cluster-size-weighted aggregate treated weights, shape (N,).
control_weights_agg (np.ndarray) – Cluster-size-weighted aggregate control weights, shape (N,).
synthetic_treated (np.ndarray) – Aggregate synthetic treated outcome, shape (T,).
synthetic_control (np.ndarray) – Aggregate synthetic control outcome, shape (T,).
inference (MAREXInference, optional) – Aggregate inference (None unless requested).

Y_blank: ndarray | None#

Y_fit: ndarray#

Y_full: ndarray#

control_weights_agg: ndarray#

inference: MAREXInference | None = None#

synthetic_control: ndarray#

synthetic_treated: ndarray#

treated_weights_agg: ndarray#

class mlsynth.utils.marex_helpers.structures.MAREXInference(treated_effects: ndarray, placebo_effects: ndarray, fulltreated_effects: ndarray, s_obs: float, global_p_value: float, per_period_pvals: ndarray, ci: ndarray, alpha: float = 0.05)#

Bases: object

Placebo/permutation inference for one synthetic treated-vs-control pair.

Parameters:

treated_effects (np.ndarray) – Post-period synthetic treated minus synthetic control, shape (T1,).
placebo_effects (np.ndarray) – The same contrast on the blank (held-out pre) periods, shape (Tb,).
fulltreated_effects (np.ndarray) – The contrast over the whole timeline, shape (T,).
s_obs (float) – Observed test statistic (mean absolute post-period effect).
global_p_value (float) – Permutation p-value for the global null of no effect.
per_period_pvals (np.ndarray) – Per-post-period p-values, shape (T1,).
ci (np.ndarray) – Split-conformal confidence band over the full timeline, shape (T, 2) (pre-period rows are NaN).
alpha (float) – Two-sided significance level.
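As a rough illustration of how a band with NaN pre-period rows could arise, the sketch below builds a conformal-style band from the absolute blank-period gaps. It is not the library's implementation: the function name, the assumption that the blank window is the last Tb pre-periods, and the specific quantile rule are all assumptions made for the example.

```python
import math


def split_conformal_band(gap, T0, Tb, alpha=0.05):
    """Return a list of (lower, upper) pairs over the full timeline.

    gap : synthetic treated minus synthetic control, length T
    T0  : number of pre-treatment periods
    Tb  : number of held-out blank periods (assumed to be the pre-period tail)
    """
    residuals = sorted(abs(g) for g in gap[T0 - Tb:T0])
    # Finite-sample conformal rank: the ceil((Tb + 1)(1 - alpha))-th smallest residual.
    k = math.ceil((Tb + 1) * (1 - alpha))
    q = residuals[k - 1] if k <= Tb else math.inf  # too few blanks -> unbounded band
    band = []
    for t, g in enumerate(gap):
        if t < T0:
            band.append((math.nan, math.nan))  # pre-period rows are NaN
        else:
            band.append((g - q, g + q))
    return band


gap = [0.1, -0.2, 0.05, -0.1, 0.3, 1.0, 1.2]
band = split_conformal_band(gap, T0=5, Tb=3, alpha=0.25)
print(band[5], band[6])
```

With only a handful of blank periods the rank k exceeds Tb at small alpha, which is why the sketch falls back to an unbounded band in that case.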

alpha: float = 0.05#

ci: ndarray#

fulltreated_effects: ndarray#

global_p_value: float#

per_period_pvals: ndarray#

placebo_effects: ndarray#

s_obs: float#

treated_effects: ndarray#

class mlsynth.utils.marex_helpers.structures.MAREXRecommendation(winner: Dict[str, Any], shortlist: List[Dict[str, Any]], pareto: List[int], weights: Dict[str, float], status: str)#

Bases: object

Composite power-vs-fit recommendation over a MAREX solution pool.

Parameters:

winner (dict) – The recommended pool entry (lowest composite score among power-feasible designs).
shortlist (list of dict) – Pool entries ordered by composite score, truncated to max_shortlist.
pareto (list of int) – Indices (into the pool) of the designs on the fit-vs-power Pareto front.
weights (dict) – Normalised {"power": pw, "fit": fw} (sum to one).
status (str) – "OK" when at least one design has a finite MDE, else "POWER_NOT_ESTABLISHED".
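The fields above can be made concrete with a small sketch. The pool-entry keys "rmse" and "mde", the max-scaling inside the composite score, and the function names are assumptions chosen for illustration; only the normalised weights, the Pareto front over fit and power, and the two status strings come from the description above.

```python
import math


def normalise_weights(power_weight=0.51, fit_weight=0.49):
    """Scale the two weights so they sum to one."""
    total = power_weight + fit_weight
    return {"power": power_weight / total, "fit": fit_weight / total}


def pareto_front(pool):
    """Indices of designs not dominated on (rmse, mde); lower is better on both."""
    front = []
    for i, a in enumerate(pool):
        dominated = any(
            b["rmse"] <= a["rmse"] and b["mde"] <= a["mde"]
            and (b["rmse"] < a["rmse"] or b["mde"] < a["mde"])
            for j, b in enumerate(pool) if j != i
        )
        if not dominated:
            front.append(i)
    return front


def recommend(pool, weights):
    """Pick the lowest composite score among designs with a finite MDE."""
    feasible = [i for i, d in enumerate(pool) if math.isfinite(d["mde"])]
    if not feasible:
        return None, "POWER_NOT_ESTABLISHED"
    # Hypothetical scaling: divide each criterion by its maximum over feasible designs.
    max_mde = max(pool[i]["mde"] for i in feasible) or 1.0
    max_rmse = max(pool[i]["rmse"] for i in feasible) or 1.0

    def score(i):
        return (weights["power"] * pool[i]["mde"] / max_mde
                + weights["fit"] * pool[i]["rmse"] / max_rmse)

    return min(feasible, key=score), "OK"


pool = [
    {"rmse": 1.0, "mde": 0.30},
    {"rmse": 1.5, "mde": 0.10},
    {"rmse": 2.0, "mde": 0.40},
    {"rmse": 1.2, "mde": math.inf},  # power could not be established
]
weights = normalise_weights()
front = pareto_front(pool)
winner_idx, status = recommend(pool, weights)
print(front, winner_idx, status)
```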

pareto: List[int]#

shortlist: List[Dict[str, Any]]#

status: str#

weights: Dict[str, float]#

winner: Dict[str, Any]#

class mlsynth.utils.marex_helpers.structures.MAREXResults(*, report: BaseEstimatorResults | None = None, assignment: Any | None = None, selected_units: Any | None = None, design_weights: WeightsResults | None = None, power: Any | None = None, metadata: Dict[str, Any] | None = None, clusters: Dict[str, MAREXClusterDesign], study: MAREXStudy, globres: MAREXGlobalDesign, post_fit: Any | None = None, pool: List[Dict[str, Any]] | None = None, recommendation: Any | None = None, **extra_data: Any)#

Bases: DesignResult

User-facing output of the MAREX estimator.

A DesignResult (the experimental-design family): MAREX designs an experiment (it chooses which units to treat), so it populates the standardized design surface – report (the realized effect as an EffectResult, the single source for ATT / CI / pre-fit), selected_units / assignment / design_weights, and power – while the MAREX-specific design detail stays in the typed fields below.

Parameters:

clusters (dict of {str: MAREXClusterDesign}) – Per-cluster design (a single "0" entry when no cluster column).
study (MAREXStudy) – Design hyperparameters.
globres (MAREXGlobalDesign) – Aggregate design and synthetics.
post_fit (SyntheticControlPostFit, optional) – Standardized post-fit diagnostics (ATE / total effect / percentage lift / fit RMSEs / inference / covariate SMDs). Computed via mlsynth.utils.post_fit.compute_post_fit(). None only when an estimator failure leaves the result partially constructed. Also mirrored into report (the standardized effect view).

clusters: Dict[str, MAREXClusterDesign]#

property control_units: List[Any]#: Units assigned to control (non-zero aggregate control weight).

globres: MAREXGlobalDesign#

property mode: str#: Solver mode reported to downstream consumers.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function MlsynthResult.Config.<lambda>>}}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

pool: List[Dict[str, Any]] | None#

post_fit: Any | None#

recommendation: Any | None#

study: MAREXStudy#

property synthetic_control: ndarray#: Aggregate synthetic control outcome, shape (T,).

property synthetic_treated: ndarray#: Aggregate synthetic treated outcome, shape (T,).

property treated_units: List[Any]#: Units assigned to treatment (non-zero aggregate treated weight).

class mlsynth.utils.marex_helpers.structures.MAREXStudy(design: str, T0: int, blank_periods: int, beta: float = 1e-06, lambda1: float = 0.0, lambda2: float = 0.0, xi: float = 0.0)#

Bases: object

Design hyperparameters of a MAREX study (was StudyConfig).

T0: int#

beta: float = 1e-06#

blank_periods: int#

design: str#

lambda1: float = 0.0#

lambda2: float = 0.0#

xi: float = 0.0#

In addition, MAREX.fit() attaches a SyntheticControlPostFit as results.post_fit: the standardized diagnostics container shared across the MAREX family (LEXSCM, MAREX, SYNDES, PANGEO). It carries the ATE / total / lift / per-period / cumulative effect summaries, the inference triple (\(p\)-value, CI lower bound, CI upper bound), the pre-/blank-/post-period RMSEs, the three standardized-mean-difference blocks (treated-vs-control, treated-vs-population, control-vs-population), and, when a valid noise window exists, a PowerAnalysis block with the headline MDE and the MDE-versus-horizon curve.

class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#

Bases: object

Standardized post-fit diagnostics for a single synthetic control design.

Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.
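To make the effect-summary fields concrete, here is a minimal sketch that derives them from two series. The function name is hypothetical, the timeline is assumed to be ordered fit, then blank, then post, and the percentage lift is assumed to be the ATE divided by the mean post-period control level; the library's own definitions may differ.

```python
import math


def summarise_gap(treated, control, n_fit, n_blank, n_post):
    """Effect summaries from synthetic treated and control series (illustrative only)."""
    gap = [t - c for t, c in zip(treated, control)]
    start = n_fit + n_blank
    post_gap = gap[start:start + n_post]
    post_control = control[start:start + n_post]

    ate = sum(post_gap) / n_post                      # average post-period gap
    total_effect = sum(post_gap)                      # summed post-period gap
    baseline = sum(post_control) / n_post             # assumed denominator for the lift
    ate_percent = 100.0 * ate / baseline

    cumulative, running = [], 0.0
    for g in post_gap:
        running += g
        cumulative.append(running)

    rmse_fit = math.sqrt(sum(g * g for g in gap[:n_fit]) / n_fit)
    return {
        "gap_series": gap,
        "ate": ate,
        "total_effect": total_effect,
        "ate_percent": ate_percent,
        "cumulative_effect": cumulative,
        "rmse_fit": rmse_fit,
    }


treated = [10.0, 11.0, 10.0, 12.0, 13.0, 14.0]
control = [10.0, 11.0, 10.0, 12.0, 12.0, 12.0]
summary = summarise_gap(treated, control, n_fit=3, n_blank=1, n_post=2)
print(summary["ate"], summary["total_effect"], summary["ate_percent"])
```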

ate: float | None = None#

ate_per_period: ndarray | None = None#

ate_percent: float | None = None#

ci_lower: float | None = None#

ci_upper: float | None = None#

control_series: ndarray#

covariate_names: Tuple[str, ...] = ()#

covariate_smd: Dict[str, float] | None = None#

covariate_smd_abs_max: float | None = None#

covariate_smd_control_vs_pop: Dict[str, float] | None = None#

covariate_smd_control_vs_pop_abs_max: float | None = None#

covariate_smd_control_vs_pop_squared_sum: float | None = None#

covariate_smd_squared_sum: float | None = None#

covariate_smd_treated_vs_pop: Dict[str, float] | None = None#

covariate_smd_treated_vs_pop_abs_max: float | None = None#

covariate_smd_treated_vs_pop_squared_sum: float | None = None#

cumulative_effect: ndarray | None = None#

gap_series: ndarray#

inference_method: str | None = None#

n_blank: int#

n_fit: int#

n_post: int#

p_value: float | None = None#

power: PowerAnalysis | None = None#

rmse_blank: float | None = None#

rmse_fit: float | None = None#

rmse_post: float | None = None#

total_effect: float | None = None#

treated_series: ndarray#

class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#

Bases: object

Standardized power-analysis output attached to SyntheticControlPostFit.

Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same SyntheticControlPostFit shape so every covariate-aware SCM-family estimator gets the surface for free.

headline#

MDE for the actual n_post horizon of the realised design.

Type: MDEPoint

curve#

MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).

Type: tuple of MDEPoint

alpha#

Two-sided significance level assumed.

Type: float

power_target#

Target power the MDEs are computed at (default 0.80).

Type: float

sigma_placebo#

Standard deviation of the placebo gap series used as the noise scale.

Type: float

serial_correlation#

Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.

Type: float

baseline#

Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.

Type: float

method#

"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here; other values are reserved for future extensions such as "monte_carlo".

Type: str

alpha: float#

baseline: float#

curve: Tuple[MDEPoint, ...]#

headline: MDEPoint#

mde_by_horizon() → Dict[int, float]#: {post_periods: mde_pct} for quick lookup.

method: str = 'analytical_ar1'#

power_target: float#

serial_correlation: float#

sigma_placebo: float#
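A Gaussian MDE with AR(1) variance inflation can be sketched in a few lines. This is an illustration under stated assumptions, not the library's exact formula: it uses the large-sample inflation factor (1 + rho) / (1 - rho) for the variance of a mean of AR(1) noise, and the function name mde_ar1 is hypothetical.

```python
import math
from statistics import NormalDist


def mde_ar1(sigma_placebo, serial_correlation, n_post, alpha=0.05, power_target=0.8):
    """Return (mde_absolute, se) for the mean effect over n_post periods."""
    rho = serial_correlation
    # Assumed large-sample variance inflation for the mean of an AR(1) series.
    inflation = (1.0 + rho) / (1.0 - rho)
    se = sigma_placebo * math.sqrt(inflation / n_post)
    # Two-sided test at level alpha, detected with probability power_target.
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power_target)
    return (z_alpha + z_power) * se, se


headline_mde, headline_se = mde_ar1(sigma_placebo=1.0, serial_correlation=0.0, n_post=4)

# MDE-versus-horizon curve with positive serial correlation.
curve = {h: mde_ar1(1.0, 0.5, h)[0] for h in (1, 2, 4, 8)}
print(round(headline_mde, 4), curve)
```

Longer horizons shrink the standard error of the mean gap, so the curve decreases in the number of post periods, while positive serial correlation raises it at every horizon.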

class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#

Bases: object

Minimum detectable effect at a single post-treatment horizon.

mde_absolute: float#

mde_pct: float#

post_periods: int#

power_at_observed: float | None = None#

se: float#

Helper Modules

Input preparation for MAREX: long panel -> design-ready arrays.

class mlsynth.utils.marex_helpers.setup.MAREXPanel(Y_full: DataFrame, clusters: ndarray, T0: int, blank_periods: int, unit_index: IndexSet, time_index: IndexSet, covariates: ndarray | None = None, covariate_names: Tuple[str, ...] = ())#

Prepared MAREX inputs.

unit_index / time_index are the single source of truth for unit and time identity (label <-> integer index). Y_full rows align to unit_index.labels and its columns to time_index.labels; the clusters vector is in unit_index order. Every downstream consumer (optimizer, orchestrator, restriction builder) indexes through these IndexSets rather than re-deriving identity from a DataFrame.

T0: int#

Y_full: DataFrame#

blank_periods: int#

clusters: ndarray#

covariate_names: Tuple[str, ...] = ()#

covariates: ndarray | None = None#

time_index: IndexSet#

unit_index: IndexSet#

mlsynth.utils.marex_helpers.setup.prepare_marex_panel(df: DataFrame, outcome: str, unitid: str, time: str, cluster: str | None, T0: int | None, inference: bool, blank_periods: int, T_post: int | None, covariates: List[str] | None = None) → MAREXPanel#

Pivot the long panel to units x time and forward the resolved T0 / blank_periods.

The MAREX config validator is the single source of truth for resolving T0 (from either an explicit scalar or a post_col 0/1 column) and the default 30%-of-pre-tail blank window — by the time this helper runs both are concrete integers. covariates columns are aggregated to a per-unit pre-period mean via mlsynth.utils.datautils.build_covariate_matrix() and returned as an (N, R) matrix aligned to the unit order. The matrix is left un-normalised here so MAREX’s existing standardize=True flag (applied to the combined [Y_fit; covariate_weight * Z] predictor matrix in marex_helpers.optimization) keeps its previous behaviour.

Design-formulation primitives for MAREX (Abadie & Zhao 2026).

The experimenter chooses treated weights w and control weights v per cluster on the simplex, with a binary selection mask z linking them (w_j <= z_j, v_j <= 1 - z_j) so a unit is either treated or a control, never both (the disjointness w_j v_j = 0). These helpers build the cvxpy variables, constraints, and the design-specific objective; the objective form is selected by design:

"base" – match each cluster mean with both synthetic units;
"weak" – match the treated synthetic to the mean and softly tie the control synthetic to it (weight beta);
"eq11" – base plus cluster-level distance penalties (lambda1 / lambda2);
"unit" – base plus unit-level penalties (xi / lambda1_unit / lambda2_unit).

mlsynth.utils.marex_helpers.formulation.build_constraints(w, v, z, M, cluster_members, cluster_labels, m_eq, m_min, m_max, costs, budget_dict, exclusive, restrictions=None)#: Simplex, disjointness, cardinality, cost, exclusivity, and (optional) geographic-restriction constraints.
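The linking constraints described in the module docstring (simplex weights, w_j <= z_j, v_j <= 1 - z_j) can be checked on a candidate design with a short pure-Python sketch. The helper name is_feasible_design is hypothetical, and the sketch covers a single cluster only; cardinality, cost, and geographic restrictions are left out.

```python
def is_feasible_design(w, v, z, tol=1e-9):
    """Check simplex and treated/control linking constraints for one cluster."""

    def on_simplex(x):
        return all(xi >= -tol for xi in x) and abs(sum(x) - 1.0) <= tol

    if not (on_simplex(w) and on_simplex(v)):
        return False
    # w_j <= z_j: only selected units may carry treated weight.
    # v_j <= 1 - z_j: only unselected units may carry control weight.
    return all(
        wj <= zj + tol and vj <= 1.0 - zj + tol
        for wj, vj, zj in zip(w, v, z)
    )


good = is_feasible_design(w=[0.0, 1.0, 0.0], v=[0.5, 0.0, 0.5], z=[0, 1, 0])
bad = is_feasible_design(w=[0.5, 0.5, 0.0], v=[0.5, 0.0, 0.5], z=[1, 1, 0])
print(good, bad)
```

In the second design unit 0 is selected for treatment yet still carries control weight, which violates v_j <= 1 - z_j and hence the disjointness w_j v_j = 0.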

mlsynth.utils.marex_helpers.formulation.build_membership_mask(clusters, label_to_k, N, K)#: Boolean (N, K) mask of unit-to-cluster membership.

mlsynth.utils.marex_helpers.formulation.build_objective(Y_fit, Xbar_clusters, cluster_members, w, v, z, design, beta=1e-06, lambda1=0.0, lambda2=0.0, xi=0.0, lambda1_unit=0.0, lambda2_unit=0.0, D1=None, D2_list=None, zeta=0.0)#

Design-specific cvxpy objective (see module docstring).

zeta adds an optional integrality penalty z (1 - z) used by the relaxed (continuous-z) solve; it is 0 for the exact MIQP.
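A tiny numeric sketch of the penalty term z (1 - z) shows why it pushes a relaxed solution towards binary values. The function name is hypothetical and the penalty is summed over units here as an assumption about how the term enters the objective.

```python
def integrality_penalty(z, zeta):
    """zeta * sum_j z_j (1 - z_j): zero at binary z, largest at z_j = 0.5."""
    return zeta * sum(zj * (1.0 - zj) for zj in z)


binary_penalty = integrality_penalty([0.0, 1.0, 1.0], zeta=2.0)
fractional_penalty = integrality_penalty([0.5, 0.5], zeta=2.0)
print(binary_penalty, fractional_penalty)
```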

mlsynth.utils.marex_helpers.formulation.compute_cluster_means_members(Y_fit, M, cluster_labels)#: Per-cluster predictor means and member index arrays.

mlsynth.utils.marex_helpers.formulation.get_per_cluster_param(param, klabel, default=None)#: Resolve a possibly-per-cluster parameter to its value for klabel.

mlsynth.utils.marex_helpers.formulation.init_cvxpy_variables(N, K, boolean=True)#

Treated (w), control (v) weights and selection (z).

z is binary for the exact MIQP (boolean=True) or continuous in [0, 1] for the relaxed QP (boolean=False).

mlsynth.utils.marex_helpers.formulation.precompute_distances(Y_fit, Xbar_clusters, cluster_members)#: Unit-to-cluster-mean distances D1 and within-cluster pairwise D2.

mlsynth.utils.marex_helpers.formulation.prepare_clusters(Y_full, clusters)#: Coerce Y_full/clusters to arrays and return cluster bookkeeping.

mlsynth.utils.marex_helpers.formulation.prepare_fit_slices(Y_full_np, T0, blank_periods)#: Split the pre-period into a fitting slice and a held-out blank slice.
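The fit/blank split can be pictured with plain list slicing. The sketch assumes the blank window is the tail of the pre-period (the last blank_periods columns before T0) and uses a hypothetical function name; it is not the library routine.

```python
def split_pre_period(Y_full, T0, blank_periods):
    """Split each unit's pre-period into a fitting slice and a held-out blank slice.

    Y_full is a list of rows (one per unit), each of length T.
    """
    if not 0 <= blank_periods < T0:
        raise ValueError("blank_periods must satisfy 0 <= blank_periods < T0")
    t_fit = T0 - blank_periods
    Y_fit = [row[:t_fit] for row in Y_full]
    # No blank window requested -> None, mirroring the optional Y_blank field.
    Y_blank = [row[t_fit:T0] for row in Y_full] if blank_periods else None
    return Y_fit, Y_blank


Y_full = [
    [1, 2, 3, 4, 5, 6],
    [2, 3, 4, 5, 6, 7],
]
Y_fit, Y_blank = split_pre_period(Y_full, T0=4, blank_periods=1)
print(Y_fit, Y_blank)
```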

mlsynth.utils.marex_helpers.formulation.validate_costs_budget(costs, budget, N, cluster_labels, K)#: Validate cost/budget inputs; return (costs_np, budget_dict).

mlsynth.utils.marex_helpers.formulation.validate_scm_inputs(Y_full, T0, blank_periods, design, beta=1e-06, lambda1=0.0, lambda2=0.0, xi=0.0, lambda1_unit=0.0, lambda2_unit=0.0)#: Validate shapes and design/parameter compatibility (raises ValueError).

MAREX design optimizers (Abadie & Zhao 2026).

solve_design solves the exact mixed-integer design (binary selection z); solve_design_relaxed relaxes z to [0, 1], solves the QP, then discretizes post hoc. Both return a raw result dict consumed by the orchestrator.

mlsynth.utils.marex_helpers.optimization.post_hoc_discretize(w_opt, v_opt, cluster_members, cluster_labels, m_eq=None, m_min=None, m_max=None, trim_threshold=0.01, Y_fit=None, Y_blank=None)#: Round relaxed weights to a feasible integer design (was internal).

mlsynth.utils.marex_helpers.optimization.solve_design(Y_full, T0, clusters, blank_periods=0, m_eq=None, m_min=None, m_max=None, exclusive=True, design='standard', beta=1e-06, lambda1=0.0, lambda2=0.0, xi=0.0, lambda1_unit=0.0, lambda2_unit=0.0, costs=None, budget=None, covariates=None, covariate_weight=1.0, standardize=False, solver='SCIP', verbose=False, restrictions=None, forbidden=None, warm_start=None, time_limit=None)#

Exact mixed-integer MAREX design (was SCMEXP).

forbidden is an optional list of previously-chosen assignments, each a list of (unit_idx, cluster_idx) pairs; for every one a no-good cut sum z[pairs] <= |pairs| - 1 is added, forbidding that exact design so a re-solve yields the next-best distinct one (used to build a solution pool).

warm_start is an optional iterable of treated unit indices (row order of Y_full); each is seeded z[j, cluster(j)] = 1 as a SCIP partial-solution MIP start. It is a hint – the proven optimum is unchanged – but gives the branch-and-bound a strong incumbent immediately. time_limit (seconds) caps the SCIP solve and returns the best incumbent found; with both set, MAREX can refine a seed design under a budget instead of proving optimality. Defaults of None leave the original solve path untouched.

mlsynth.utils.marex_helpers.optimization.solve_design_pool(Y_full, T0, clusters, *, top_K=1, **kwargs)#

Enumerate up to top_K distinct exact designs via no-good cuts.

Calls solve_design() repeatedly, forbidding each chosen assignment so the next solve returns the next-best distinct design. Stops early when the feasible region is exhausted (the solver becomes infeasible).
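The pool-building loop can be mimicked on a toy problem where brute-force enumeration stands in for the MIQP solver. Everything here is illustrative: the loss (squared gap between the treated mean and the population mean of one predictor), the function names, and the data are assumptions, and only the no-good cut sum z[S] <= |S| - 1 is taken from the description above.

```python
from itertools import combinations


def design_loss(x, treated):
    """Toy objective: squared gap between treated mean and population mean."""
    pop_mean = sum(x) / len(x)
    treated_mean = sum(x[j] for j in treated) / len(treated)
    return (treated_mean - pop_mean) ** 2


def violates_no_good_cut(treated, forbidden):
    """True if some earlier design S has sum_{j in S} z_j > |S| - 1."""
    chosen = set(treated)
    return any(
        sum(1 for j in prev if j in chosen) > len(prev) - 1
        for prev in forbidden
    )


def solve_once(x, m, forbidden):
    """Brute-force stand-in for one exact solve with m treated units."""
    candidates = [
        c for c in combinations(range(len(x)), m)
        if not violates_no_good_cut(c, forbidden)
    ]
    return min(candidates, key=lambda c: design_loss(x, c)) if candidates else None


def solve_pool(x, m, top_K):
    """Collect up to top_K distinct designs, forbidding each one after it is found."""
    forbidden, pool = [], []
    for _ in range(top_K):
        best = solve_once(x, m, forbidden)
        if best is None:  # feasible region exhausted
            break
        pool.append(best)
        forbidden.append(best)
    return pool


x = [1.0, 2.0, 4.0, 9.0, 11.0]
pool = solve_pool(x, m=2, top_K=3)
print(pool)
```

Asking for more designs than exist simply returns every feasible design once, matching the early stop described above.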

mlsynth.utils.marex_helpers.optimization.solve_design_relaxed(Y_full, T0, clusters, blank_periods=0, m_eq=None, m_min=None, m_max=None, exclusive=True, design='standard', beta=1e-06, lambda1=0.0, lambda2=0.0, xi=0.0, lambda1_unit=0.0, lambda2_unit=0.0, costs=None, budget=None, covariates=None, covariate_weight=1.0, standardize=False, solver=None, verbose=False, zeta=0.0, trim_threshold=0.01)#: Relaxed (continuous-z) design with post-hoc discretization (was SCMEXP_REL).

Placebo/permutation inference for a MAREX synthetic treated-vs-control pair.

The held-out blank pre-periods act as placebos: the synthetic treated minus synthetic control there should be noise, so its distribution calibrates a permutation p-value and a split-conformal confidence band for the post-period effect (Abadie & Zhao 2026, OA; Chernozhukov-Wuthrich-Zhu 2021).

mlsynth.utils.marex_helpers.inference.compute_inference(Y_treated: ndarray, Y_control: ndarray, T0: int, TcE: int, Tb: int, alpha: float = 0.05, max_combinations: int = 1000, random_state: int | None = None) → MAREXInference#

Compute permutation inference for one synthetic contrast.

Parameters:

Y_treated, Y_control (np.ndarray) – Synthetic treated / control outcomes over the full timeline, shape (T,).
T0 (int) – Number of pre-treatment periods.
TcE (int) – Start index of the blank (held-out) window.
Tb (int) – Number of blank periods.
alpha (float) – Two-sided significance level.
max_combinations (int) – Number of permutation draws for the global test.
random_state (int, optional) – Seed for reproducibility.

Returns:

MAREXInference
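One way to picture the global test is a permutation scheme that pools blank and post gaps and asks how often a random draw of post-length looks as extreme as the observed post window. The sketch below is an assumption-laden illustration (function name, pooling scheme, add-one p-value correction, and blank window taken as the pre-period tail are all choices made here), not a description of compute_inference itself.

```python
import random


def placebo_p_value(gap, T0, Tb, n_draws=1000, seed=0):
    """Return (s_obs, p_value) for the mean absolute post-period gap.

    gap is the synthetic treated minus synthetic control series (a list).
    """
    blank = gap[T0 - Tb:T0]      # held-out pre-periods: should be pure noise
    post = gap[T0:]
    T1 = len(post)
    s_obs = sum(abs(g) for g in post) / T1

    pooled = blank + post
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(pooled, T1)          # random window of the same length
        s_perm = sum(abs(g) for g in draw) / T1
        if s_perm >= s_obs - 1e-12:            # tolerance for summation order
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return s_obs, (1 + hits) / (1 + n_draws)


gap = [0.1, -0.2, 0.05, -0.1, 0.15, -0.05, 0.1, -0.12, 1.0, 1.2, 0.9]
s_obs, p_value = placebo_p_value(gap, T0=8, Tb=6)
print(round(s_obs, 3), p_value)
```

With six blank periods and three post periods there are only 84 distinct windows, so the smallest attainable p-value is bounded away from zero; a longer blank window gives a finer reference distribution.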

Top-level MAREX solve: run the design optimizer and assemble frozen results.

mlsynth.utils.marex_helpers.orchestration.solve_marex(Y_full, T0, clusters, design='standard', blank_periods=0, m_eq=None, m_min=None, m_max=None, exclusive=True, beta=1e-06, lambda1=0.0, lambda2=0.0, xi=0.0, lambda1_unit=0.0, lambda2_unit=0.0, costs=None, budget=None, covariates=None, covariate_names=(), covariate_weight=1.0, standardize=False, solver=None, verbose=False, relaxed=False, inference=False, alpha=0.05, max_combinations=1000, random_state=42, unit_index=None, time_index=None, restrictions=None, warm_start=None, time_limit=None) → MAREXResults#

Solve the MAREX design and return a frozen MAREXResults.

With relaxed=True the continuous-z QP with post-hoc discretization is used; otherwise the exact MIQP. With inference=True, blank-period placebo inference is computed for every cluster and the aggregate.

Linear-factor DGP for the MAREX simulation study (Abadie & Zhao 2026, Sec. 5).

Reimplements the paper’s baseline data-generating process (Assumption 1, equations 12a/12b) so the simulation study can be replicated without the authors’ code. Potential outcomes are

\[Y^N_{jt} = \delta_t + \theta_t' Z_j + \lambda_t' \mu_j + \varepsilon_{jt}\]

\[Y^I_{jt} = \upsilon_t + \gamma_t' Z_j + \eta_t' \mu_j + \xi_{jt}\]

with \(Z_j\) (\(R\) observed) and \(\mu_j\) (\(F\) unobserved) covariates, sorted time effects, and i.i.d. \(\mathrm{Normal}(0, \sigma^2)\) noise.
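A stdlib-only sketch of a linear-factor draw of this shape is given below. The dimensions mirror the defaults of generate_marex_sample, but the Uniform(0, 1) draws for covariates, loadings, and time effects, the function name, and the way tau is averaged are assumptions made for illustration; the paper's calibration may differ.

```python
import random


def simulate_linear_factor(J=15, R=7, F=11, T=30, T0=25, sigma=1.0, seed=0):
    """Draw (Y_N, Y_I, tau) from a linear-factor model with observed and unobserved covariates."""
    rng = random.Random(seed)

    def uniform(n):
        return [rng.random() for _ in range(n)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    Z = [uniform(R) for _ in range(J)]    # observed covariates Z_j
    mu = [uniform(F) for _ in range(J)]   # unobserved covariates mu_j
    delta, upsilon = sorted(uniform(T)), sorted(uniform(T))   # sorted time effects
    theta = [uniform(R) for _ in range(T)]
    gamma = [uniform(R) for _ in range(T)]
    lam = [uniform(F) for _ in range(T)]
    eta = [uniform(F) for _ in range(T)]

    Y_N = [[delta[t] + dot(theta[t], Z[j]) + dot(lam[t], mu[j]) + rng.gauss(0.0, sigma)
            for t in range(T)] for j in range(J)]
    Y_I = [[upsilon[t] + dot(gamma[t], Z[j]) + dot(eta[t], mu[j]) + rng.gauss(0.0, sigma)
            for t in range(T)] for j in range(J)]

    # Sample-average effect per period; set to zero before the experiment starts.
    tau = [0.0 if t < T0 else sum(Y_I[j][t] - Y_N[j][t] for j in range(J)) / J
           for t in range(T)]
    return Y_N, Y_I, tau


Y_N, Y_I, tau = simulate_linear_factor()
print(len(Y_N), len(Y_N[0]), tau[-1])
```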

class mlsynth.utils.marex_helpers.simulation.MAREXSample(Y_N: ndarray, Y_I: ndarray, tau: ndarray, T0: int, Z: ndarray)#

One simulated sample.

Y_N#

Potential outcomes under no treatment, shape (J, T).

Type: np.ndarray

Y_I#

Potential outcomes under treatment, shape (J, T).

Type: np.ndarray

tau#

True average treatment effect tau_t per period, shape (T,) (zero in the pre-period).

Type: np.ndarray

T0#

Number of pre-treatment periods.

Type: int

Z#

Observed (time-invariant) covariates that generate the outcome, shape (J, R).

Type: np.ndarray

T0: int#

Y_I: ndarray#

Y_N: ndarray#

Z: ndarray#

tau: ndarray#

mlsynth.utils.marex_helpers.simulation.generate_marex_sample(J: int = 15, R: int = 7, F: int = 11, T: int = 30, T0: int = 25, sigma: float = 1.0, rng: Generator | None = None) → MAREXSample#

Draw one sample from the paper’s baseline linear-factor DGP (Sec. 5).

Returns: MAREXSample

Plotting for MAREX: synthetic treated vs control (or the treatment effect).

mlsynth.utils.marex_helpers.plotter.plot_marex(results: MAREXResults, clusters: List[str] | None = None, plot_type: str = 'treatment', global_result: bool = True, figsize: tuple = (12, 6), donor_cloud: bool = False) → None#

Plot MAREX treatment effects (or predictions), one panel per cluster + global.

Parameters:

results (MAREXResults) – Output of mlsynth.estimators.MAREX.
clusters (list of str, optional) – Cluster labels to plot (default: all; a lone "0" cluster is skipped).
plot_type ({“treatment”, “prediction”}) – Plot the treated-minus-control effect, or both synthetic series.
global_result (bool) – Include the aggregate (global) panel.
donor_cloud (bool) – On the global "prediction" panel, overlay one faint line per unit (the rows of results.globres.Y_full) behind the series – the “observed data” cloud of Abadie & Zhao’s Figure 4. Ignored on the "treatment" (effect) plot and on cluster panels, which carry no per-unit outcome matrix. The population mean (a thick solid line) is drawn whenever the per-unit matrix is available, with or without the cloud; the synthetic treated and control are thinner dashed red / blue lines so they stand out against it.

The shared post-fit module (compute_smd(), compute_post_fit(), and compute_power_analysis()) lives outside the marex_helpers package so that the other MAREX-family estimators (LEXSCM, SYNDES, PANGEO) can call the same single-source-of-truth diagnostics:

Standardized post-fit diagnostics for synthetic control designs and the matching power-analysis surface that consumes them.

After any MAREX-family estimator (LEXSCM, MAREX, SYNDES, PANGEO, …) solves its design problem, downstream consumers (the SAGE dashboard, paper-style reports, comparison tables) all need the same numbers:

the post-treatment ATT, total effect, percentage lift, per-period gap;

pre / blank / post root-mean-squared-error of the synthetic gap;

inference scalars (p-value, CI bounds) when computed;

covariate-balance standardized mean differences (SMDs) when covariates were used in the design.

This module exposes one frozen dataclass (SyntheticControlPostFit) and three free functions:

compute_smd() – standalone, panel-independent SMD from any (cov_matrix, treated_w, control_w);

compute_post_fit() – the full diagnostic bundle from trajectories + boundaries + (optional) covariate matrix + (optional) inference;

compute_post_fit_marex() – adapter that builds the bundle from a MAREXResults + MAREXPanel pair.

The free-function entry points are deliberately small and reusable, so the LEXSCM / SYNDES / PANGEO equivalents can be added one-at-a-time without touching this module: they just compose the same primitives.
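A covariate-balance SMD of the kind listed above can be sketched from a covariate matrix and two weight vectors. The scaling by the population standard deviation of each covariate and the function names are assumptions made for this example; compute_smd() may use a different denominator.

```python
from statistics import pstdev


def weighted_mean(column, weights):
    """Weighted average of one covariate (weights assumed to sum to one)."""
    return sum(x * w for x, w in zip(column, weights))


def standardized_mean_differences(cov_matrix, treated_w, control_w):
    """Per-covariate (treated mean - control mean) / population SD.

    cov_matrix is a list of rows, one per unit, each with R covariate values.
    """
    n_cov = len(cov_matrix[0])
    smds = []
    for r in range(n_cov):
        column = [row[r] for row in cov_matrix]
        sd = pstdev(column)  # assumed scale: SD across all units
        diff = weighted_mean(column, treated_w) - weighted_mean(column, control_w)
        smds.append(diff / sd if sd > 0 else float("nan"))
    return smds


cov_matrix = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
treated_w = [0.0, 1.0, 0.0, 0.0]
control_w = [1.0, 0.0, 0.0, 0.0]
smds = standardized_mean_differences(cov_matrix, treated_w, control_w)
print([round(s, 4) for s in smds])
```

Because the second covariate is the first scaled by ten, both SMDs coincide, which illustrates that the measure is unit-free.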

class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#

Bases: object

Minimum detectable effect at a single post-treatment horizon.

mde_absolute: float#

mde_pct: float#

post_periods: int#

power_at_observed: float | None = None#

se: float#

class mlsynth.utils.post_fit.PowerAnalysis#

Bases: object

Standardized power-analysis output attached to SyntheticControlPostFit.

headline#

MDE for the actual n_post horizon of the realised design.

Type: MDEPoint

curve#

MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).

Type: tuple of MDEPoint

alpha#

Two-sided significance level assumed.

Type: float

power_target#

Target power the MDEs are computed at (default 0.80).

Type: float

sigma_placebo#

Standard deviation of the placebo gap series used as the noise scale.

Type: float

serial_correlation#

Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.

Type: float

baseline#

Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.

Type: float

method#

"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Reserved for future "monte_carlo" extensions.

Type: str

alpha: float#

baseline: float#

curve: Tuple[MDEPoint, ...]#

headline: MDEPoint#

mde_by_horizon() → Dict[int, float]#

Returns {post_periods: mde_pct} for quick lookup.

method: str = 'analytical_ar1'#

power_target: float#

serial_correlation: float#

sigma_placebo: float#

class mlsynth.utils.post_fit.SyntheticControlPostFit#

Bases: object

Standardized post-fit diagnostics for a single synthetic control design.

Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.

ate: float | None = None#

ate_per_period: ndarray | None = None#

ate_percent: float | None = None#

ci_lower: float | None = None#

ci_upper: float | None = None#

control_series: ndarray#

covariate_names: Tuple[str, ...] = ()#

covariate_smd: Dict[str, float] | None = None#

covariate_smd_abs_max: float | None = None#

covariate_smd_control_vs_pop: Dict[str, float] | None = None#

covariate_smd_control_vs_pop_abs_max: float | None = None#

covariate_smd_control_vs_pop_squared_sum: float | None = None#

covariate_smd_squared_sum: float | None = None#

covariate_smd_treated_vs_pop: Dict[str, float] | None = None#

covariate_smd_treated_vs_pop_abs_max: float | None = None#

covariate_smd_treated_vs_pop_squared_sum: float | None = None#

cumulative_effect: ndarray | None = None#

gap_series: ndarray#

inference_method: str | None = None#

n_blank: int#

n_fit: int#

n_post: int#

p_value: float | None = None#

power: PowerAnalysis | None = None#

rmse_blank: float | None = None#

rmse_fit: float | None = None#

rmse_post: float | None = None#

total_effect: float | None = None#

treated_series: ndarray#

mlsynth.utils.post_fit.compute_post_fit(treated_series: ndarray, control_series: ndarray, *, n_fit: int, n_blank: int = 0, n_post: int | None = None, cov_matrix: ndarray | None = None, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None, treated_weights: ndarray | None = None, control_weights: ndarray | None = None, population_weights: ndarray | None = None, inference: Any | None = None, n_treated_units: int | None = None) → SyntheticControlPostFit#

Compute a SyntheticControlPostFit from trajectories + boundaries.

The trajectories treated_series and control_series are the estimator’s own synthetic constructs (Σⱼ wⱼ Yⱼ and Σⱼ vⱼ Yⱼ in Abadie-Zhao notation). n_post defaults to len(treated_series) - n_fit - n_blank.

Covariate balance fields are populated when cov_matrix + treated_weights + control_weights are all supplied (the natural inputs for any MAREX-family design). The compute_smd() helper does the work, so the SMD numbers are exactly consistent with a standalone call to compute_smd().

Inference scalars are pulled from the estimator’s inference object via _extract_inference(), which knows about the four common shapes (LEXSCM Inference, MAREX MAREXInference, SYNDES SYNDESInference, or a plain dict). All inference fields are optional.
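The window bookkeeping described above can be illustrated with a small standalone sketch. It is not the library's implementation: post_fit_sketch is a hypothetical name, plain lists stand in for ndarrays, and the exact definitions (RMSE per window, ATT as the mean post-window gap, percentage lift relative to the mean post-window control series) are assumptions inferred from the field names.

```python
from itertools import accumulate
from math import sqrt
from typing import Dict, List, Optional, Sequence


def rmse(values: Sequence[float]) -> Optional[float]:
    """Root-mean-squared value of a gap slice; None for an empty window."""
    return sqrt(sum(v * v for v in values) / len(values)) if values else None


def post_fit_sketch(
    treated: Sequence[float],
    control: Sequence[float],
    *,
    n_fit: int,
    n_blank: int = 0,
    n_post: Optional[int] = None,
) -> Dict[str, object]:
    """Hypothetical stand-in for the fit / blank / post split (assumed semantics)."""
    if len(treated) != len(control):
        raise ValueError("treated and control series must have equal length")
    if n_post is None:
        # Documented default: whatever remains after the fit and blank windows.
        n_post = len(treated) - n_fit - n_blank
    if n_post < 0 or n_fit + n_blank + n_post > len(treated):
        raise ValueError("window lengths exceed the series length")

    gap: List[float] = [t - c for t, c in zip(treated, control)]
    blank_start, post_start = n_fit, n_fit + n_blank
    post_gap = gap[post_start:post_start + n_post]
    post_control = list(control[post_start:post_start + n_post])

    ate = sum(post_gap) / n_post if n_post else None
    baseline = sum(post_control) / n_post if n_post else None
    return {
        "n_post": n_post,
        "rmse_fit": rmse(gap[:blank_start]),
        "rmse_blank": rmse(gap[blank_start:post_start]),
        "rmse_post": rmse(post_gap),
        "ate": ate,
        "total_effect": sum(post_gap) if n_post else None,
        "cumulative_effect": list(accumulate(post_gap)),
        # Assumption: lift is expressed in percent of the post-window control mean.
        "ate_percent": 100.0 * ate / baseline if n_post and baseline else None,
    }


summary = post_fit_sketch(
    [10.0, 10.0, 10.0, 10.0, 12.0, 12.0],
    [10.0, 10.0, 10.0, 10.0, 10.0, 10.0],
    n_fit=3,
    n_blank=1,
)
print(summary)
```

Keeping the blank window separate from the fit window matters because the blank-period gap is an out-of-sample check on the design, whereas the fit-period gap is what the optimizer minimized.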

mlsynth.utils.post_fit.compute_post_fit_marex(raw, panel, *, cov_scales: ndarray | None = None) → SyntheticControlPostFit#

Adapt a MAREXResults + MAREXPanel pair into a SyntheticControlPostFit.

Pulls the aggregate synthetic-treated / synthetic-control trajectories from raw.globres, the (T0, blank_periods) split from panel.T0 and panel.blank_periods, the inference object from raw.globres.inference, and the covariate matrix from panel.covariates (when present).

mlsynth.utils.post_fit.compute_power_analysis(post_fit: SyntheticControlPostFit, *, alpha: float = 0.05, power_target: float = 0.8, post_grid: Sequence[int] | None = None) → PowerAnalysis#

Analytical MDE + power curve for a design’s SyntheticControlPostFit.

Uses the placebo / blank-period gap residuals (or the pre-period gap when no blank window was carved out) to estimate the noise standard deviation sigma_placebo and the AR(1) autocorrelation rho, then computes the minimum detectable effect for each horizon T in post_grid via the Gaussian formula

MDE(T) = (z_{1-alpha/2} + z_{power}) * sigma_placebo * sqrt(VIF(T, rho)),

where VIF(T, rho) = Var(mean of T AR(1) periods) / sigma_placebo^2. The headline MDE uses T = post_fit.n_post (the realised post window).

Parameters:

post_fit (SyntheticControlPostFit) – The standardized post-fit from any MAREX-family estimator.
alpha (float, default 0.05) – Two-sided significance level.
power_target (float, default 0.80) – Target power for the MDE.
post_grid (sequence of int, optional) – Post-treatment horizons at which to compute MDE. Defaults to a small geometric grid centered on post_fit.n_post so users see the detectability tradeoff vs. running the experiment longer.

Returns:

PowerAnalysis – Headline MDE + a curve over the requested horizons.
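The Gaussian formula above can be sketched in a few lines of standard-library Python. This is an illustration rather than the module's code: ar1_vif and mde_sketch are hypothetical names, and the sketch assumes sigma_placebo is the marginal standard deviation of a stationary AR(1) gap process, so that the variance of a T-period mean is sigma_placebo^2 * [T + 2 * sum_{k=1}^{T-1} (T - k) * rho^k] / T^2.

```python
from math import sqrt
from statistics import NormalDist


def ar1_vif(T: int, rho: float) -> float:
    """Var(mean of T stationary AR(1) periods) divided by the marginal variance."""
    if T < 1:
        raise ValueError("T must be at least 1")
    # The lag-k autocorrelation is rho**k, and lag k occurs (T - k) times on
    # each side of the diagonal of the T x T correlation matrix.
    cross = sum((T - k) * rho**k for k in range(1, T))
    return (T + 2.0 * cross) / T**2


def mde_sketch(
    T: int,
    sigma_placebo: float,
    rho: float,
    alpha: float = 0.05,
    power_target: float = 0.80,
) -> float:
    """MDE(T) = (z_{1-alpha/2} + z_{power}) * sigma_placebo * sqrt(VIF(T, rho))."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power_target)
    return (z_alpha + z_power) * sigma_placebo * sqrt(ar1_vif(T, rho))


# Compare independent noise (rho = 0) with positively correlated noise (rho = 0.5).
for horizon in (1, 4, 8, 16):
    print(horizon, mde_sketch(horizon, 1.0, 0.0), mde_sketch(horizon, 1.0, 0.5))
```

With rho = 0 the factor reduces to 1/T, the familiar variance of an i.i.d. mean; positive rho raises it, so a longer experiment buys less detectability than the i.i.d. rate suggests.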

mlsynth.utils.post_fit.compute_smd(cov_matrix: ndarray, treated_weights: ndarray, control_weights: ndarray, *, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None) → Dict[str, Any]#

Standardized mean differences between weighted treated and control means.

Parameters:

cov_matrix (ndarray, shape (N, M)) – Per-unit covariate values; rows align to treated_weights and control_weights.
treated_weights, control_weights (ndarray, shape (N,)) – Non-negative weights with disjoint supports. They are renormalised to sum to 1 internally (so callers may pass raw sums-to-K weights).
cov_names (sequence of str, optional) – Names for the M covariates. Defaults to ("cov_0", "cov_1", ...).
cov_scales (ndarray, shape (M,), optional) – Pre-computed per-covariate standardization scales (cross-unit std). Defaults to the std of cov_matrix columns. Passing the value already cached by build_covariate_matrix is the right move.

Returns:

dict with keys smd (the per-covariate dict), smd_abs_max, and smd_squared_sum. Returns empty / NaN summaries if either weight vector is all-zero.
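As a rough illustration of the computation described above, the following dependency-free sketch renormalises the two weight vectors, forms the weighted covariate means, and divides their difference by a cross-unit standard deviation. The name smd_sketch is hypothetical, plain lists stand in for ndarrays, and the use of the population standard deviation (ddof = 0) as the default scale is an assumption; the library's convention may differ.

```python
from statistics import pstdev
from typing import Dict, Optional, Sequence


def smd_sketch(
    cov_rows: Sequence[Sequence[float]],
    treated_w: Sequence[float],
    control_w: Sequence[float],
    cov_names: Optional[Sequence[str]] = None,
) -> Dict[str, object]:
    """Hypothetical stand-in: (treated mean - control mean) / cross-unit std."""
    columns = list(zip(*cov_rows))  # one tuple of N unit values per covariate
    names = list(cov_names) if cov_names else [f"cov_{m}" for m in range(len(columns))]

    t_sum, c_sum = sum(treated_w), sum(control_w)
    if t_sum == 0 or c_sum == 0:
        # Mirrors the documented behaviour for an all-zero weight vector.
        return {"smd": {}, "smd_abs_max": float("nan"), "smd_squared_sum": float("nan")}

    smd: Dict[str, float] = {}
    for name, col in zip(names, columns):
        scale = pstdev(col)  # assumption: population std across units
        if scale == 0:
            raise ValueError(f"covariate {name!r} is constant across units")
        # Dividing by the weight sums renormalises raw sums-to-K weights to 1.
        t_mean = sum(w * x for w, x in zip(treated_w, col)) / t_sum
        c_mean = sum(w * x for w, x in zip(control_w, col)) / c_sum
        smd[name] = (t_mean - c_mean) / scale

    return {
        "smd": smd,
        "smd_abs_max": max(abs(v) for v in smd.values()),
        "smd_squared_sum": sum(v * v for v in smd.values()),
    }


covariates = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
balanced = smd_sketch(covariates, [2.0, 0.0, 0.0, 2.0], [0.0, 1.0, 1.0, 0.0])
lopsided = smd_sketch(covariates, [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0])
print(balanced["smd_abs_max"], lopsided["smd"])
```

In the first design the treated pair (units 1 and 4) and the control pair (units 2 and 3) share the same weighted means, so every SMD is zero; the second design contrasts the two extreme units and produces a large imbalance.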

mlsynth.utils.post_fit.to_effect_result(pf: SyntheticControlPostFit, *, time_periods: ndarray | None = None, intervention_time: Any | None = None, method_name: str | None = None, donor_weights: Dict[str, float] | None = None) → Any#

Convert a SyntheticControlPostFit into a standardized EffectResult.

The single, family-wide adapter from the rich post-fit bundle to the contract’s EffectResult view, so every MAREX-family estimator (LEXSCM, MAREX, SYNDES, PANGEO) gets report for free instead of hand-copying fields. The realized effect’s standard scalars populate the standard sub-models; everything the contract has no slot for (per-period effects, cumulative effect, covariate SMDs, and the full post_fit object itself) is carried in additional_outputs so it remains discoverable.

References#

Abadie, A., & Zhao, J. (2026). “Synthetic Controls for Experimental Design.” See [ABADIE2024].

Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies.” Journal of the American Statistical Association 105(490):493-505.

Chernozhukov, V., Wüthrich, K., & Zhu, Y. (2021). “An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls.” Journal of the American Statistical Association 116(536):1849-1864.
