Synthetic Controls for Experimental Design (MAREX)#
When to Use This Estimator#
The estimators elsewhere in mlsynth are retrospective: a treatment has
already happened and you reweight donors to reconstruct the treated unit’s
counterfactual. MAREX, due to Abadie and Zhao (2026) [ABADIE2024], is
prospective — it designs an experiment. Before any treatment is assigned,
and using only pre-experimental data, it chooses which aggregate units to
treat and which to hold out as controls, so that the experiment you are
about to run yields a credible estimate.
The motivating setting is a firm (say, a ride-sharing company) that wants to test a new policy but can only deploy it in a few whole markets. A within-market A/B test is contaminated by interference (treated and control drivers compete). Randomizing whole markets to treatment is unbiased ex ante but, with a handful of large units, routinely produces treated and control groups with very different baselines, so the estimate from any single realisation can be far from the truth. MAREX instead picks the treated and control markets so that their pre-experiment predictors match the population: a non-randomized design that, the paper shows, substantially reduces estimation bias relative to randomization.
Reach for MAREX when:
Units are large aggregates (markets, regions, stores) and only one or a few can be treated.
You control the assignment and want to choose it well, rather than estimate after the fact.
Interference or equity rules out within-unit randomization, forcing whole-unit treatment.
Notation#
There are \(J\) units and \(T\) periods, with \(T_0\) pre-experiment periods; the experiment runs over \(t = T_0 + 1, \dots, T\). Each unit has a pre-intervention predictor vector \(X_j\) (pre-period outcomes and optional covariates); \(\bar X = \sum_j f_j X_j\) is the population predictor mean for known weights \(f_j\) (e.g. market shares, or \(1/J\)). The experimenter chooses treated weights \(w\) and control weights \(v\), both on the simplex, and disjoint:
\[
\sum_{j=1}^{J} w_j = 1, \qquad \sum_{j=1}^{J} v_j = 1, \qquad w_j \ge 0, \quad v_j \ge 0, \quad w_j v_j = 0 \quad \text{for all } j.
\]
Units with \(w_j > 0\) are treated; among the rest, units with \(v_j > 0\) form the synthetic control. Writing \(Y_{jt}\) for the observed outcome (treated units realise \(Y^I_{jt}\) post-treatment, everyone else \(Y^N_{jt}\)), the design estimator of the average effect is
\[
\hat\tau_t(w, v) = \sum_{j=1}^{J} w_j Y_{jt} - \sum_{j=1}^{J} v_j Y_{jt}, \qquad t = T_0 + 1, \dots, T.
\]
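As a minimal sketch of the estimator just described, the per-period effect is the weighted treated average minus the weighted control average. The function name `design_estimate` and the toy numbers are hypothetical and are not part of mlsynth.

```python
from typing import List, Sequence


def design_estimate(w: Sequence[float], v: Sequence[float],
                    Y_post: Sequence[Sequence[float]]) -> List[float]:
    """Per-period estimate: sum_j w_j Y_jt - sum_j v_j Y_jt.

    Y_post[j][t] is the observed outcome of unit j in experimental period t.
    """
    n_units, n_periods = len(Y_post), len(Y_post[0])
    return [
        sum(w[j] * Y_post[j][t] for j in range(n_units))
        - sum(v[j] * Y_post[j][t] for j in range(n_units))
        for t in range(n_periods)
    ]


# Toy data (hypothetical): unit 0 is treated, units 1 and 2 are controls.
w = [1.0, 0.0, 0.0]
v = [0.0, 0.5, 0.5]
Y_post = [[12.0, 13.0], [10.0, 10.0], [10.0, 12.0]]
tau_hat = design_estimate(w, v, Y_post)
print(tau_hat)
```

Because `w` and `v` are disjoint, each unit contributes to at most one of the two sums.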
Assumptions#
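Assumption 1 (linear factor model). Potential outcomes follow
\[
Y^N_{jt} = \delta_t + \theta_t' Z_j + \lambda_t' \mu_j + \varepsilon_{jt}, \qquad
Y^I_{jt} = \upsilon_t + \gamma_t' Z_j + \eta_t' \mu_j + \xi_{jt},
\]
with observed covariates \(Z_j\), unobserved factors \(\mu_j\), and mean-zero idiosyncratic noise \(\varepsilon_{jt}\), \(\xi_{jt}\).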
Remark. This is the interactive-fixed-effects model of the SC literature (Abadie-Diamond-Hainmueller 2010), extended with a separate factor structure for the treated potential outcome — necessary because a design must choose a treatment group, not just a comparison group.
Assumption 2 (regularity). The factor loadings are non-degenerate (the number of factors \(F\) is at most the number of fitting periods \(T_E\), with smallest eigenvalue bounded below) and the noise is i.i.d. sub-Gaussian with common variance, independent across the two potential outcomes; dependence across units is allowed.
Assumption 3 / 4 (fit quality). A weight vector reproducing the population predictor means exists exactly (Assumption 3), or approximately within a tolerance \(d\) (Assumption 4). This is the design-time analogue of “the treated unit lies in the convex hull of the donors.”
Remark. Under these conditions Abadie & Zhao bound the bias of \(\hat\tau_t(w, v)\) and develop the permutation test below; the better the pre-experiment match, the smaller the bias.
Mathematical Formulation#
The Design Optimization#
MAREX chooses \(w, v\) (and a binary selection mask \(z\), with
\(w_j \le z_j\), \(v_j \le 1 - z_j\), so a unit is treated or a
control, never both) to match the population predictor mean. The base design
minimises
\[
\Big\| \bar X - \sum_{j} w_j X_j \Big\|^2 + \Big\| \bar X - \sum_{j} v_j X_j \Big\|^2
\]
with the number of treated units pinned by m_eq (exactly) or bounded by
m_min/m_max. This is a mixed-integer quadratic program (the binary
z); mlsynth solves it with SCIP by default, or — via relaxed=True —
relaxes z to \([0, 1]\), solves the QP, and discretises post hoc.
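To make the base design concrete, the sketch below evaluates the two-term matching objective and checks the simplex and disjointness constraints for a candidate pair of weight vectors. It only scores a given design and does not solve the mixed-integer program. All function names and the toy predictors are hypothetical, not mlsynth internals.

```python
def weighted_mean(weights, X):
    """Return sum_j weights_j * X_j for predictor rows X[j]."""
    dim = len(X[0])
    return [sum(wj * xj[k] for wj, xj in zip(weights, X)) for k in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def is_feasible(w, v, tol=1e-9):
    """Both vectors on the simplex, and no unit carries both kinds of weight."""
    on_simplex = all(abs(sum(u) - 1.0) <= tol and min(u) >= -tol for u in (w, v))
    disjoint = all(abs(wj * vj) <= tol for wj, vj in zip(w, v))
    return on_simplex and disjoint


def base_objective(w, v, X, f):
    """||Xbar - sum_j w_j X_j||^2 + ||Xbar - sum_j v_j X_j||^2."""
    xbar = weighted_mean(f, X)
    return sq_dist(xbar, weighted_mean(w, X)) + sq_dist(xbar, weighted_mean(v, X))


# Toy predictors for four units (hypothetical), uniform population weights.
X = [[1.0, 2.0], [3.0, 2.0], [2.0, 1.0], [2.0, 3.0]]
f = [0.25] * 4
good = ([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5])  # both sides hit the mean
poor = ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])  # one unit on each side
print(base_objective(*good, X, f), base_objective(*poor, X, f))
```

A solver searches over all feasible pairs; this helper is only meant for comparing hand-picked candidate designs.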
mlsynth exposes four objective variants through design (clear names
that map to the paper’s formulations):
"standard": match each predictor mean with both synthetic units (formulation 5);
"weakly_targeted": match the treated synthetic to the mean and softly tie the control synthetic to it (weight beta);
"penalized": standard plus a distance penalty that down-weights units far from the population mean (lambda1 / lambda2);
"unit_penalized": standard plus unit-level penalties (lambda1_unit / lambda2_unit).
Covariates#
By default the design matches on pre-period outcomes. Passing covariates
(time-invariant column names) appends them to the predictor vector,
\(X_j = [Y^E_j ; Z_j]\), exactly as in the paper — the synthetic treated and
control are then balanced on both pre-period outcomes and covariates (with an
optional covariate_weight scale). When the pre-period is long, the outcomes
already encode the covariates’ contribution, so covariates matter most when few
pre-periods are available.
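A minimal sketch of how the stacked predictor vector \(X_j = [Y^E_j ; Z_j]\) could be assembled, with the covariate block scaled by `covariate_weight`. The helper name `build_predictors` is hypothetical; mlsynth's own assembly may differ in detail (for example, in when standardization is applied).

```python
def build_predictors(Y_fit, Z=None, covariate_weight=1.0):
    """Return one predictor row per unit: pre-period outcomes, then scaled covariates.

    Y_fit[j] holds unit j's fitting-period outcomes; Z[j] holds its
    time-invariant covariates (or Z is None when no covariates are used).
    """
    if Z is None:
        return [list(row) for row in Y_fit]
    return [
        list(y_row) + [covariate_weight * z for z in z_row]
        for y_row, z_row in zip(Y_fit, Z)
    ]


# Two units, two pre-periods, one covariate (hypothetical numbers).
X = build_predictors([[1.0, 2.0], [3.0, 4.0]], Z=[[10.0], [20.0]], covariate_weight=0.5)
print(X)
```

With many pre-periods the outcome block dominates the row, which is one way to see why covariates matter most when the pre-period is short.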
Clustering, Costs, and Budgets#
Passing a cluster column solves the design within each cluster (one or a
few treated units per cluster), which better approximates the population
predictor distribution and limits interpolation bias (paper OA.1). Per-unit
costs and a budget (scalar or per-cluster) add a knapsack constraint
\(\sum_j c_j w_j \le B\), so the chosen treatment group respects a spend cap.
Inference#
When inference=True with blank_periods > 0, the last few pre-experiment
periods are held out as blanks: there the synthetic treated minus synthetic
control is pure noise, so its distribution calibrates inference for the
post-period effect. MAREX reports a permutation p-value for the global null of
no effect, per-period p-values, and a split-conformal confidence band
(Chernozhukov-Wuthrich-Zhu 2021), all on
MAREXInference.
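The idea behind the global test can be sketched as follows. The statistic is the mean absolute post-period contrast; the reference distribution here is built by letting every same-sized subset of the pooled blank and post periods play the role of the post window. That enumeration scheme is an assumption made for illustration and may differ from what mlsynth implements; the function name is hypothetical.

```python
from itertools import combinations


def permutation_p_value(blank_gaps, post_gaps):
    """Share of relabelings whose mean |gap| is at least the observed one."""
    pooled = list(blank_gaps) + list(post_gaps)
    k = len(post_gaps)
    s_obs = sum(abs(g) for g in post_gaps) / k
    stats = [
        sum(abs(pooled[i]) for i in idx) / k
        for idx in combinations(range(len(pooled)), k)
    ]
    # Small tolerance so floating-point ties count as "at least as extreme".
    return sum(s >= s_obs - 1e-12 for s in stats) / len(stats)


blank = [0.1, -0.2, 0.05, -0.1]          # hypothetical blank-period contrasts
p_effect = permutation_p_value(blank, [2.0, 2.5])    # clear effect
p_null = permutation_p_value(blank, [0.1, -0.05])    # no effect
print(round(p_effect, 3), round(p_null, 3))
```

With four blank and two post periods there are only 15 relabelings, so the smallest attainable p-value is 1/15; longer blank windows give a finer reference distribution.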
Standardized Post-Fit and Power Analysis#
Every call to MAREX.fit() attaches a
SyntheticControlPostFit to res.post_fit.
This is the single, estimator-agnostic surface for the diagnostic numbers a
consumer of the design typically needs: effects, fit RMSEs, conformal /
permutation inference, covariate balance (when covariates were used), and
power analysis. It is computed by
compute_post_fit() from MAREX’s own
synthetic_treated / synthetic_control trajectories and weight vectors,
so by construction it agrees with what the underlying optimization produced.
pf = res.post_fit # SyntheticControlPostFit
pf.ate, pf.ate_percent, pf.total_effect # treatment-effect scalars
pf.rmse_fit, pf.rmse_blank, pf.rmse_post # fit quality, per phase
pf.p_value, pf.ci_lower, pf.ci_upper # inference (when computed)
pf.covariate_smd # treated-vs-control SMD dict
pf.covariate_smd_treated_vs_pop # treated-vs-population
pf.covariate_smd_control_vs_pop # control-vs-population
pf.power # PowerAnalysis (see below)
Three Standardized Mean Differences#
When covariates=[...] is set, the post-fit reports the three covariate
balance diagnostics that match the structure of Abadie & Zhao’s objective.
Each is a per-covariate signed dict (covariate_smd_*) plus two summary
scalars (max absolute SMD, sum of squared SMDs). With \(\bar X\) the
population covariate aggregate, \(X_w := \sum_j w_j X_j\),
\(X_v := \sum_j v_j X_j\), and \(s_m\) the cross-unit standard
deviation of covariate \(m\), each comparison is the unit-free vector
\[
\mathrm{SMD}_m(a, b) = \frac{a_m - b_m}{s_m}.
\]
The three pairs (a, b) reported are:
covariate_smd: (X_w, X_v), synthetic treated vs synthetic control. The internal-validity check (“is the experiment apples-to-apples?”).
covariate_smd_treated_vs_pop: (X_w, \bar X), synthetic treated vs population aggregate. Tracks the first term of MAREX’s objective, \(\|\bar X - \sum_j w_j X_j\|^2\). Tells you whether the chosen treated group represents the population.
covariate_smd_control_vs_pop: (X_v, \bar X), synthetic control vs population aggregate. Tracks the second term of the objective. Tells you whether the control set represents the population.
As a rule of thumb, \(|\mathrm{SMD}| < 0.1\) is conventionally read as “well balanced”, values below \(0.25\) as acceptable, and anything larger as a red flag.
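A small sketch of the standardized mean difference and the rule-of-thumb reading. It assumes the cross-unit standard deviation is the sample SD (ddof = 1), which may not match mlsynth's convention; the function names are hypothetical.

```python
from statistics import stdev


def smd(a, b, X):
    """Per-covariate (a_m - b_m) / s_m, with s_m the SD of column m across units."""
    out = []
    for m in range(len(a)):
        s_m = stdev([row[m] for row in X])  # sample SD, an assumption
        out.append((a[m] - b[m]) / s_m)
    return out


def balance_label(value):
    """Map |SMD| to the conventional rule-of-thumb bands."""
    size = abs(value)
    if size < 0.1:
        return "well balanced"
    if size < 0.25:
        return "acceptable"
    return "red flag"


# Three units, two covariates (hypothetical numbers).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_w, X_v = [2.5, 20.0], [2.0, 25.0]
diffs = smd(X_w, X_v, X)
print(diffs, [balance_label(d) for d in diffs])
```

The same `smd` call with the population aggregate as `b` gives the two versus-population diagnostics.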
Power Analysis and Minimum Detectable Effect#
Power analysis answers the pre-experiment planning question: given the design I’ve chosen, how large a treatment effect can I detect with high probability? This is the dual of inference: inference asks “is the observed effect distinguishable from noise?”, power asks “what effect sizes would be?”
The paper develops permutation inference for MAREX but does not provide a
matching MDE. mlsynth fills this gap with an analytical, AR(1)-inflated
Gaussian MDE computed from the same residual series the permutation test
draws on. Set inference=True and the result auto-populates
res.post_fit.power — blank_periods defaults to
max(1, floor(0.3 * T0)) so you do not need to pick a scalar yourself
(matching the LEXSCM / SYNDES / PANGEO convention).
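The default blank window and the resulting fit / blank / post split can be sketched as below. The helper names are hypothetical and the zero-based index ranges are an illustrative convention, not mlsynth's internal representation.

```python
from math import floor


def default_blank_periods(T0: int) -> int:
    """Default blank window: max(1, floor(0.3 * T0))."""
    return max(1, floor(0.3 * T0))


def split_timeline(T: int, T0: int, blank_periods: int):
    """Return zero-based index ranges for the fitting, blank, and post windows."""
    fit = range(0, T0 - blank_periods)
    blank = range(T0 - blank_periods, T0)  # held-out tail of the pre-period
    post = range(T0, T)
    return fit, blank, post


# A 143-period panel with 128 pre-periods and 28 blanks leaves 100 fitting periods.
fit, blank, post = split_timeline(T=143, T0=128, blank_periods=28)
print(len(fit), len(blank), len(post), default_blank_periods(40))
```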
Where the noise standard deviation comes from#
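Under the linear factor model of Assumption 1, the per-period contrast \(g_t := \sum_j w_j Y_{jt} - \sum_j v_j Y_{jt}\) has expectation zero under the no-effect null. Its sample SD on the blank window \(\mathcal{B}\) (the held-out tail of the pre-period) is the natural estimator of the noise scale:
\[
\hat\sigma_{\text{placebo}} = \sqrt{\frac{1}{|\mathcal{B}| - 1} \sum_{t \in \mathcal{B}} \big( g_t - \bar g_{\mathcal{B}} \big)^2}, \qquad \bar g_{\mathcal{B}} = \frac{1}{|\mathcal{B}|} \sum_{t \in \mathcal{B}} g_t.
\]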
When no blank window is carved out (inference=False) the pre-period gap
serves as the placebo proxy. The blank-window estimator is preferred because
it uses periods that played no role in fitting the weights — it is honest in
exactly the same sense Chernozhukov-Wuthrich-Zhu’s conformal residuals are.
Serial correlation matters#
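Synthetic-control gap residuals are virtually always serially correlated: the donor weighting absorbs the level but the persistent components of the factor structure (business cycles, seasonality, slow trends) leak through. Ignoring this systematically under-states the SE at long horizons. We model it as an AR(1) process whose lag-1 autocorrelation, estimated from the \(n\) placebo gaps \(g_1, \dots, g_n\) with mean \(\bar g\), is
\[
\hat\rho = \frac{\sum_{t=2}^{n} (g_t - \bar g)(g_{t-1} - \bar g)}{\sum_{t=1}^{n} (g_t - \bar g)^2},
\]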
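clipped to \((-0.99, 0.99)\) for numerical safety. The variance of the mean of \(T\) consecutive AR(1) periods, expressed as a multiple of \(\sigma^2\), is the variance inflation factor
\[
\mathrm{VIF}(T, \rho) = \frac{1}{T} \left[ 1 + 2 \sum_{k=1}^{T-1} \left( 1 - \frac{k}{T} \right) \rho^k \right],
\]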
which collapses to the textbook \(1/T\) when \(\rho = 0\) and grows substantially for \(\rho > 0.3\). The same formula is used by PANGEO’s power module.
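The two ingredients of the serial-correlation adjustment can be sketched in a few lines: a clipped lag-1 autocorrelation estimate and the variance of the mean of \(T\) consecutive stationary AR(1) draws. The normalisation of the autocorrelation estimate is an assumption, and the function names are hypothetical rather than mlsynth's.

```python
def lag1_autocorr(gaps, clip=0.99):
    """Lag-1 sample autocorrelation of the placebo gaps, clipped to (-clip, clip)."""
    n = len(gaps)
    mean = sum(gaps) / n
    denom = sum((g - mean) ** 2 for g in gaps)
    if denom == 0.0:
        return 0.0  # constant series: treat as uncorrelated
    num = sum((gaps[t] - mean) * (gaps[t - 1] - mean) for t in range(1, n))
    return max(-clip, min(clip, num / denom))


def ar1_vif(T, rho):
    """Var(mean of T AR(1) periods) / sigma^2 = (1/T)[1 + 2 sum_k (1 - k/T) rho^k]."""
    return (1.0 + 2.0 * sum((1.0 - k / T) * rho ** k for k in range(1, T))) / T


rho_hat = lag1_autocorr([1.0, -1.0, 1.0, -1.0])  # alternating series
print(rho_hat, ar1_vif(4, 0.0), ar1_vif(2, 0.5))
```

For two periods with correlation 0.5 the variance of the mean is \((2 + 2 \cdot 0.5)/4 = 0.75\) of \(\sigma^2\), which the closed form reproduces.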
The MDE formula#
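Combining: the standard error of the mean of \(T\) post-period contrasts under \(H_0\) is \(\mathrm{SE}(T) = \hat\sigma_{\text{placebo}} \, \sqrt{\mathrm{VIF}(T, \hat\rho)}\). For a two-sided test at level \(\alpha\) with target power \(1 - \beta\), the minimum detectable effect is
\[
\mathrm{MDE}(T) = \big( z_{1 - \alpha/2} + z_{1 - \beta} \big) \, \mathrm{SE}(T),
\]
where \(z_q\) is the \(q\)-quantile of the standard normal distribution.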
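The corresponding power to detect a given true effect \(\tau\) at horizon \(T\), with \(\Phi\) the standard normal CDF and \(z_{1 - \alpha/2}\) its \(1 - \alpha/2\) quantile, is
\[
\mathrm{Power}(\tau, T) = \Phi\!\left( \frac{|\tau|}{\mathrm{SE}(T)} - z_{1 - \alpha/2} \right),
\]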
which is reported as power_at_observed for each horizon point using the
realised \(\hat\tau\).
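A sketch of the Gaussian MDE and the matching power calculation, taking the standard error as an input. It uses the usual one-tail approximation to two-sided power, which is an assumption about the exact form used by mlsynth; the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def mde(se, alpha=0.05, power_target=0.80):
    """Minimum detectable effect: (z_{1-alpha/2} + z_{power}) * SE."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power_target)
    return (z_alpha + z_power) * se


def power_at(tau, se, alpha=0.05):
    """Approximate power to detect a true effect tau given the standard error."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(abs(tau) / se - z_alpha)


m = mde(se=1.0)
print(round(m, 3), round(power_at(m, se=1.0), 3))
```

By construction, the power evaluated at the MDE equals the target power, which is a quick consistency check on any implementation.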
What the surface looks like#
p = res.post_fit.power # PowerAnalysis dataclass
p.headline.mde_absolute # MDE at the realised T_post
p.headline.mde_pct # ... as % of post-period baseline
p.headline.se # implied SE of mean(g_t) over T_post
p.headline.power_at_observed # power to detect res.post_fit.ate
p.curve # tuple of MDEPoint, one per horizon
for pt in p.curve:
    print(pt.post_periods, pt.mde_absolute, pt.mde_pct, pt.power_at_observed)
p.sigma_placebo # σ̂ used (from blank or pre window)
p.serial_correlation # ρ̂ AR(1) of the placebo gaps
p.baseline # mean(synthetic_control) on post window
p.alpha, p.power_target # 0.05 / 0.80 by default
p.method # "analytical_ar1"
The default horizon grid covers \(T \in \{1, 2, 4, 6, 8, 12\}\) plus the
realised n_post, so the table also doubles as a “how long do I need to
run?” answer — pick the smallest \(T\) whose MDE drops below your
target effect size.
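The “how long do I need to run?” reading can be sketched by computing an MDE for each horizon and picking the first one that clears a target effect. This is a standalone approximation with hypothetical function names; it does not call mlsynth and assumes the AR(1)-inflated Gaussian MDE described above.

```python
from math import sqrt
from statistics import NormalDist


def ar1_vif(T, rho):
    """Variance of the mean of T AR(1) periods, as a multiple of sigma^2."""
    return (1.0 + 2.0 * sum((1.0 - k / T) * rho ** k for k in range(1, T))) / T


def mde_curve(sigma, rho, grid=(1, 2, 4, 6, 8, 12), alpha=0.05, power_target=0.80):
    """Map each horizon T in the grid to its minimum detectable effect."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0) + std_normal.inv_cdf(power_target)
    return {T: z * sigma * sqrt(ar1_vif(T, rho)) for T in grid}


def shortest_horizon(curve, target_effect):
    """Smallest T whose MDE is at or below the target, or None if none qualifies."""
    for T in sorted(curve):
        if curve[T] <= target_effect:
            return T
    return None


curve = mde_curve(sigma=1.0, rho=0.0)   # hypothetical noise scale, no serial correlation
print(shortest_horizon(curve, 1.0), shortest_horizon(curve, 0.5))
```

Positive serial correlation flattens the curve, so the same target effect requires a longer experiment.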
Practical reading#
A typical MAREX run with T_post = 6, blank_periods = 4 and modest
serial correlation (\(\hat\rho \approx 0.5\)) on a Walmart-style sales
panel produces an MDE on the order of 0.05–0.15% of mean sales, well
below the 1–3% effect sizes typical marketing interventions aim for; this
is the quantitative substance of “good designs are well-powered”.
Conversely, an MDE much above the expected effect is a signal that the design
needs more treated units (a lower m_eq or m_max is typically worse for power)
or more post-periods (extend the experiment).
Opting out#
The power computation is wrapped in a try/except in
solve_marex() — a power
analysis failure (e.g. degenerate residual variance) never breaks the fit,
res.post_fit.power is just left as None. To compute power on a
non-default horizon grid or significance level, call the free function
directly:
from mlsynth.utils.post_fit import compute_power_analysis
alt = compute_power_analysis(
    res.post_fit, alpha=0.10, power_target=0.90,
    post_grid=[2, 4, 8, 16, 32, 52],  # weekly horizons out to a year
)
Monte Carlo: Recovering the Treatment Effect#
The block below replicates the qualitative finding of the paper’s simulation
study (Section 5) using mlsynth’s own reimplementation of the linear-factor
DGP. A sample is drawn, the design is fit on the pre-period, the treated units
realise \(Y^I\) in the experiment, and the estimate is compared to the true
average effect.
import numpy as np
import pandas as pd
from mlsynth import MAREX
from mlsynth.utils.marex_helpers.simulation import generate_marex_sample
rng = np.random.default_rng(0)
def design_mae(sample, **card):
    J, T = sample.Y_N.shape
    T0 = sample.T0
    df = pd.DataFrame(
        [{"unit": f"u{j}", "time": t, "y": float(sample.Y_N[j, t])}
         for j in range(J) for t in range(T)]
    )
    res = MAREX({"df": df, "outcome": "y", "unitid": "unit",
                 "time": "time", "T0": T0, **card}).fit()
    w = res.globres.treated_weights_agg
    v = res.globres.control_weights_agg
    treated = np.where(w > 1e-8)[0]
    Y_obs = sample.Y_N.copy()
    Y_obs[treated, T0:] = sample.Y_I[treated, T0:]  # experiment realises Y^I
    tau_hat = w @ Y_obs[:, T0:] - v @ Y_obs[:, T0:]
    return np.mean(np.abs(tau_hat - sample.tau[T0:]))
maes, scales = [], []
for _ in range(5):
    s = generate_marex_sample(J=12, T=30, T0=25, rng=rng)
    maes.append(design_mae(s, m_min=1, m_max=11))  # Unconstrained
    scales.append(np.mean(np.abs(s.tau[s.T0:])))
print(f"MAE {np.mean(maes):.2f} vs effect scale {np.mean(scales):.2f}")
The synthetic-control design recovers the average treatment effect with mean absolute error far below the effect’s own scale (≈ 4.4 vs. ≈ 14, i.e. under a third), the central message of the paper’s Table 2. Over the paper’s full 1000 simulations the error also decreases as more units are allowed into the treated group (the Unconstrained design is best), with the largest gains moving from one to two or three treated units.
Note
This is a Path-B replication: it reproduces the simulation study’s
conclusions from public DGPs and mlsynth code, with no dependency on the
authors’ replication package. It is locked in as
mlsynth.tests.test_marex_replication.
Empirical Application: Walmart (Placebo Experiment)#
We replicate the paper’s empirical illustration (Section 4) on the Walmart
store-sales panel (basedata/walmart_weekly_sales.csv): weekly sales for
45 stores over 143 weeks (Feb 2010 – Oct 2012). Following the paper, we
design a placebo experiment with a fictitious intervention at week 129:
\(T_0 = 128\) pre-experiment weeks, of which the first \(T_E = 100\) are
the fitting period and the last 28 are blank, leaving 15 experimental weeks. The
design uses the constrained formulation with \(m = 2\) treated stores,
uniform weights, and predictors normalised to unit variance (standardize).
import pandas as pd
from mlsynth import MAREX
df = pd.read_csv(
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
    "refs/heads/main/basedata/walmart_weekly_sales.csv"
)
res = MAREX({
    "df": df, "outcome": "sales", "unitid": "store", "time": "week",
    "T0": 128, "blank_periods": 28, "T_post": 15,  # TE=100, 28 blank, 15 post
    "m_eq": 2,  # constrained design, two treated stores
    "design": "standard",
    "standardize": True,  # unit-variance predictors (paper's normalisation)
    "inference": True,
    "display_graph": True,
}).fit()
print("treated stores:", res.treated_units) # [1, 15]
print("placebo p-value:", round(res.globres.inference.global_p_value, 3))
Because the intervention is a placebo (no real effect), a correct design should
produce synthetic treated and control units that track closely and an estimated
effect near zero. mlsynth reproduces exactly that — and the paper’s headline
number:
| Quantity | mlsynth | Paper (Section 4) |
|---|---|---|
| Pre-fit RMSE / mean sales | 2.2% | small (close tracking) |
| Experimental ATT / mean sales | -1.0% | near zero |
| Placebo permutation p-value | 0.937 | 0.933 |
| Confidence band covers zero | yes (all post weeks) | yes |
The synthetic treated and control units track to within ~2% of mean sales over the fitting and blank periods, the estimated placebo effect is ~1% of sales, and the permutation test fails to reject the null of no effect (\(p = 0.937\), matching the paper’s \(0.933\)) — exactly the “no spurious effect” result a good design should deliver on a placebo.
Note
This uses the exact MIQP (relaxed=False, the default) with
standardize=True; the unit-variance normalisation is essential here
because Walmart stores differ enormously in sales level, and without it the
level differences dominate the match. The solve takes roughly a minute with
the open-source SCIP solver (the paper used commercial Gurobi).
Example#
import pandas as pd
from mlsynth import MAREX
# df is assumed to be a long panel already in memory: one row per (market, period)
res = MAREX({
    "df": df, "outcome": "revenue", "unitid": "market", "time": "week",
    "T0": 40,  # 40 pre-experiment weeks
    # Equivalently: pass a 0/1 column marking the experiment window
    # "post_col": "in_experiment",
    "m_eq": 2,  # treat exactly two markets
    "design": "standard",
    "inference": True,  # blank_periods defaults to floor(0.3 * T0) = 12
    "display_graph": True,
}).fit()
print("treated markets:", res.treated_units)
print("global p-value:", res.globres.inference.global_p_value)
for label, c in res.clusters.items():
    print(label, c.unit_weight_map["Treated"])
Core API#
MAREX: Synthetic Controls for Experimental Design (Abadie & Zhao 2026).
MAREX designs an experiment on aggregate units (e.g. markets): using only
pre-experimental data it selects which units to treat (treated weights w)
and which untreated units form the synthetic control (control weights v),
on the simplex and disjoint (a unit is treated or a control, never both). The
synthetic treated and synthetic control units are built to reproduce population
predictor means, so their post-period difference estimates the average
treatment effect. Optional clustering treats one (or a few) units per cluster;
optional blank-period placebo inference yields p-values and confidence bands.
- class mlsynth.estimators.scexp.MAREX(config: MAREXConfig | dict)#
Bases: object
Synthetic-control experimental design estimator (Abadie & Zhao 2026).
- Parameters:
config (MAREXConfig or dict) – Configuration object. See mlsynth.config_models.MAREXConfig.
- Returns:
MAREXResults – Per-cluster and aggregate treated/control weights, synthetic series, the selected treated units, and (optionally) placebo inference.
- fit() MAREXResults#
Run the MAREX design and return
MAREXResults.
Configuration#
- class mlsynth.config_models.MAREXConfig(*, df: DataFrame, outcome: str, unitid: str, time: str, T0: int | None = None, post_col: str | None = None, cluster: str | None = None, design: str = 'standard', covariates: List[str] | None = None, covariate_weight: float = 1.0, standardize: bool = False, program_type: str = 'MIQP', display_graph: bool = False, beta: float = 1e-06, lambda1: float = 0.0, lambda2: float = 0.0, xi: float = 0.0, lambda1_unit: float = 0.0, lambda2_unit: float = 0.0, costs: List[float] | None = None, budget: int | Dict[int, int] | None = None, blank_periods: int | None = None, m_eq: int | None = None, m_min: int | None = None, m_max: int | None = None, exclusive: bool = True, relaxed: bool = False, solver: Any = None, verbose: bool = False, inference: bool = False, T_post: int | None = None)#
Configuration for the Synthetic Experiment Design estimator (MAREX) in mlsynth.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
Result Containers#
MAREX.fit() returns a
MAREXResults: a dict of
per-cluster MAREXClusterDesign
objects, the aggregate
MAREXGlobalDesign, the
MAREXStudy hyperparameters, and
(optionally) MAREXInference.
Frozen dataclass containers for the MAREX (synthetic experimental design) pipeline.
Implements the containers for:
Abadie, A., & Zhao, J. (2026). “Synthetic Controls for Experimental Design.”
MAREX designs an experiment on aggregate units: it chooses treated weights
w and control weights v (on the simplex, disjoint via w_j v_j = 0)
so the synthetic treated and synthetic control units reproduce population
predictor means. All containers are frozen (immutable) per the repository
convention; inference, when requested, is computed up front and embedded.
- class mlsynth.utils.marex_helpers.structures.MAREXClusterDesign(label: str, members: List[Any], cardinality: int, treated_weights: ndarray, control_weights: ndarray, selection_indicators: ndarray, synthetic_treated: ndarray, synthetic_control: ndarray, pre_treatment_means: ndarray, rmse: float, unit_weight_map: Dict[str, Dict[Any, float]], inference: MAREXInference | None = None)#
Bases: object
Design and synthetics for a single cluster.
- Parameters:
label (str) – Cluster label.
members (list) – Unit labels in this cluster.
cardinality (int) – Number of units in the cluster.
treated_weights (np.ndarray) – Treated weights w for this cluster’s column, shape (N,).
control_weights (np.ndarray) – Control weights v for this cluster’s column, shape (N,).
selection_indicators (np.ndarray) – Binary selection mask z over the cluster’s members.
synthetic_treated (np.ndarray) – Synthetic treated outcome over the full timeline, shape (T,).
synthetic_control (np.ndarray) – Synthetic control outcome over the full timeline, shape (T,).
pre_treatment_means (np.ndarray) – Cluster predictor means used as the matching target.
rmse (float) – Pre-treatment fit RMSE (synthetic treated vs control).
unit_weight_map (dict) – {"Treated": {unit: w}, "Control": {unit: v}} for non-zero weights.
inference (MAREXInference, optional) – Inference for this cluster (None unless requested).
- control_weights: ndarray#
- inference: MAREXInference | None = None#
- pre_treatment_means: ndarray#
- selection_indicators: ndarray#
- synthetic_control: ndarray#
- synthetic_treated: ndarray#
- treated_weights: ndarray#
- class mlsynth.utils.marex_helpers.structures.MAREXGlobalDesign(Y_full: ndarray, Y_fit: ndarray, Y_blank: ndarray | None, treated_weights_agg: ndarray, control_weights_agg: ndarray, synthetic_treated: ndarray, synthetic_control: ndarray, inference: MAREXInference | None = None)#
Bases: object
Aggregated (population-level) design and synthetics.
- Parameters:
Y_full (np.ndarray) – Observed outcome matrix, shape (N, T).
Y_fit (np.ndarray) – Fitting slice, shape (N, T_fit).
Y_blank (np.ndarray, optional) – Held-out blank pre-periods, shape (N, Tb) (None if none).
treated_weights_agg (np.ndarray) – Cluster-size-weighted aggregate treated weights, shape (N,).
control_weights_agg (np.ndarray) – Cluster-size-weighted aggregate control weights, shape (N,).
synthetic_treated (np.ndarray) – Aggregate synthetic treated outcome, shape (T,).
synthetic_control (np.ndarray) – Aggregate synthetic control outcome, shape (T,).
inference (MAREXInference, optional) – Aggregate inference (None unless requested).
- Y_fit: ndarray#
- Y_full: ndarray#
- control_weights_agg: ndarray#
- inference: MAREXInference | None = None#
- synthetic_control: ndarray#
- synthetic_treated: ndarray#
- treated_weights_agg: ndarray#
- class mlsynth.utils.marex_helpers.structures.MAREXInference(treated_effects: ndarray, placebo_effects: ndarray, fulltreated_effects: ndarray, s_obs: float, global_p_value: float, per_period_pvals: ndarray, ci: ndarray, alpha: float = 0.05)#
Bases: object
Placebo/permutation inference for one synthetic treated-vs-control pair.
- Parameters:
treated_effects (np.ndarray) – Post-period synthetic treated minus synthetic control, shape (T1,).
placebo_effects (np.ndarray) – The same contrast on the blank (held-out pre) periods, shape (Tb,).
fulltreated_effects (np.ndarray) – The contrast over the whole timeline, shape (T,).
s_obs (float) – Observed test statistic (mean absolute post-period effect).
global_p_value (float) – Permutation p-value for the global null of no effect.
per_period_pvals (np.ndarray) – Per-post-period p-values, shape (T1,).
ci (np.ndarray) – Split-conformal confidence band over the full timeline, shape (T, 2) (pre-period rows are NaN).
alpha (float) – Two-sided significance level.
- ci: ndarray#
- fulltreated_effects: ndarray#
- per_period_pvals: ndarray#
- placebo_effects: ndarray#
- treated_effects: ndarray#
- class mlsynth.utils.marex_helpers.structures.MAREXResults(clusters: Dict[str, MAREXClusterDesign], study: MAREXStudy, globres: MAREXGlobalDesign, post_fit: Any | None = None)#
Bases: object
User-facing output of the MAREX estimator.
- Parameters:
clusters (dict of {str: MAREXClusterDesign}) – Per-cluster design (a single "0" entry when no cluster column).
study (MAREXStudy) – Design hyperparameters.
globres (MAREXGlobalDesign) – Aggregate design and synthetics.
post_fit (SyntheticControlPostFit, optional) – Standardized post-fit diagnostics (ATE / total effect / percentage lift / fit RMSEs / inference / covariate SMDs). Computed at the end of mlsynth.estimators.MAREX.fit() via mlsynth.utils.post_fit.compute_post_fit_marex(). None only when an estimator failure leaves the result partially constructed.
- clusters: Dict[str, MAREXClusterDesign]#
- globres: MAREXGlobalDesign#
- study: MAREXStudy#
- property synthetic_control: ndarray#
Aggregate synthetic control outcome, shape
(T,).
- property synthetic_treated: ndarray#
Aggregate synthetic treated outcome, shape
(T,).
- class mlsynth.utils.marex_helpers.structures.MAREXStudy(design: str, T0: int, blank_periods: int, beta: float = 1e-06, lambda1: float = 0.0, lambda2: float = 0.0, xi: float = 0.0)#
Bases: object
Design hyperparameters of a MAREX study (was StudyConfig).
In addition, MAREX.fit() attaches a
SyntheticControlPostFit as
results.post_fit: the standardized diagnostics container shared across the
MAREX family (LEXSCM, MAREX, SYNDES, PANGEO). It carries the ATE / total /
lift / per-period / cumulative effect summaries, the inference triple
(\(p\), CI), the pre-/blank-/post-period RMSEs, the three
standardized-mean-difference blocks (treated-vs-control, treated-vs-population,
control-vs-population), and — when a valid noise window exists — a
PowerAnalysis block with the headline MDE and
the MDE-versus-horizon curve.
- class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#
Bases: object
Standardized post-fit diagnostics for a single synthetic control design.
Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.
- control_series: ndarray#
- gap_series: ndarray#
- power: PowerAnalysis | None = None#
- treated_series: ndarray#
- class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#
Bases: object
Standardized power-analysis output attached to SyntheticControlPostFit.
Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same SyntheticControlPostFit shape so every covariate-aware SCM-family estimator gets the surface for free.
- curve#
MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).
- serial_correlation#
Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.
- Type: float
- baseline#
Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.
- Type: float
- method#
"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Reserved for future "monte_carlo" extensions.
- Type: str
Helper Modules#
Input preparation for MAREX: long panel -> design-ready arrays.
- class mlsynth.utils.marex_helpers.setup.MAREXPanel(Y_full: DataFrame, clusters: ndarray, T0: int, blank_periods: int, covariates: ndarray | None = None, covariate_names: Tuple[str, ...] = ())#
Prepared MAREX inputs.
- Y_full: DataFrame#
- clusters: ndarray#
- mlsynth.utils.marex_helpers.setup.prepare_marex_panel(df: DataFrame, outcome: str, unitid: str, time: str, cluster: str | None, T0: int | None, inference: bool, blank_periods: int, T_post: int | None, covariates: List[str] | None = None) MAREXPanel#
Pivot the long panel to units x time and forward the resolved T0 / blank_periods.
The MAREX config validator is the single source of truth for resolving T0 (from either an explicit scalar or a post_col 0/1 column) and the default 30%-of-pre-tail blank window; by the time this helper runs both are concrete integers.
covariates columns are aggregated to a per-unit pre-period mean via mlsynth.utils.datautils.build_covariate_matrix() and returned as an (N, R) matrix aligned to the unit order. The matrix is left un-normalised here so MAREX’s existing standardize=True flag (applied to the combined [Y_fit; covariate_weight * Z] predictor matrix in marex_helpers.optimization) keeps its previous behaviour.
Design-formulation primitives for MAREX (Abadie & Zhao 2026).
The experimenter chooses treated weights w and control weights v per
cluster on the simplex, with a binary selection mask z linking them
(w_j <= z_j, v_j <= 1 - z_j) so a unit is either treated or a control,
never both (the disjointness w_j v_j = 0). These helpers build the cvxpy
variables, constraints, and the design-specific objective; the objective form
is selected by design:
"base" – match each cluster mean with both synthetic units;
"weak" – match the treated synthetic to the mean and softly tie the control synthetic to it (weight beta);
"eq11" – base plus cluster-level distance penalties (lambda1 / lambda2);
"unit" – base plus unit-level penalties (xi / lambda1_unit / lambda2_unit).
- mlsynth.utils.marex_helpers.formulation.build_constraints(w, v, z, M, cluster_members, cluster_labels, m_eq, m_min, m_max, costs, budget_dict, exclusive)#
Simplex, disjointness, cardinality, cost, and exclusivity constraints.
- mlsynth.utils.marex_helpers.formulation.build_membership_mask(clusters, label_to_k, N, K)#
Boolean (N, K) mask of unit-to-cluster membership.
- mlsynth.utils.marex_helpers.formulation.build_objective(Y_fit, Xbar_clusters, cluster_members, w, v, z, design, beta=1e-06, lambda1=0.0, lambda2=0.0, xi=0.0, lambda1_unit=0.0, lambda2_unit=0.0, D1=None, D2_list=None, zeta=0.0)#
Design-specific cvxpy objective (see module docstring).
zeta adds an optional integrality penalty z (1 - z) used by the relaxed (continuous-z) solve; it is 0 for the exact MIQP.
- mlsynth.utils.marex_helpers.formulation.compute_cluster_means_members(Y_fit, M, cluster_labels)#
Per-cluster predictor means and member index arrays.
- mlsynth.utils.marex_helpers.formulation.get_per_cluster_param(param, klabel, default=None)#
Resolve a possibly-per-cluster parameter to its value for klabel.
- mlsynth.utils.marex_helpers.formulation.init_cvxpy_variables(N, K, boolean=True)#
Treated (w), control (v) weights and selection (z). z is binary for the exact MIQP (boolean=True) or continuous in [0, 1] for the relaxed QP (boolean=False).
- mlsynth.utils.marex_helpers.formulation.precompute_distances(Y_fit, Xbar_clusters, cluster_members)#
Unit-to-cluster-mean distances D1 and within-cluster pairwise D2.
- mlsynth.utils.marex_helpers.formulation.prepare_clusters(Y_full, clusters)#
Coerce Y_full / clusters to arrays and return cluster bookkeeping.
- mlsynth.utils.marex_helpers.formulation.prepare_fit_slices(Y_full_np, T0, blank_periods)#
Split the pre-period into a fitting slice and a held-out blank slice.
- mlsynth.utils.marex_helpers.formulation.validate_costs_budget(costs, budget, N, cluster_labels, K)#
Validate cost/budget inputs; return (costs_np, budget_dict).
- mlsynth.utils.marex_helpers.formulation.validate_scm_inputs(Y_full, T0, blank_periods, design, beta=1e-06, lambda1=0.0, lambda2=0.0, xi=0.0, lambda1_unit=0.0, lambda2_unit=0.0)#
Validate shapes and design/parameter compatibility (raises ValueError).
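As an illustration of the fitting / blank split that `prepare_fit_slices` performs, the sketch below carves the last `blank_periods` pre-periods off a units x time matrix. The name `split_pre_period` is hypothetical, and taking the blank window from the tail of the pre-period is an assumption based on the "pre-tail" wording used for the default blank window.

```python
import numpy as np

def split_pre_period(Y, T0, blank_periods):
    """Hypothetical: split the first T0 columns into (fit, blank) slices."""
    Y = np.asarray(Y, dtype=float)          # shape (N, T), units x time
    if not 0 <= blank_periods < T0 <= Y.shape[1]:
        raise ValueError("need 0 <= blank_periods < T0 <= T")
    n_fit = T0 - blank_periods
    Y_fit = Y[:, :n_fit]                    # used to fit the design weights
    Y_blank = Y[:, n_fit:T0]                # held out for validation / placebo inference
    return Y_fit, Y_blank

# Toy panel: 2 units, 10 periods, 8 pre-periods, last 2 of them held out.
Y = np.arange(20, dtype=float).reshape(2, 10)
Y_fit, Y_blank = split_pre_period(Y, T0=8, blank_periods=2)
print(Y_fit.shape, Y_blank.shape)
```

Post-experiment columns (index `T0` onward) appear in neither slice, so the design is chosen from pre-experimental data only.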
MAREX design optimizers (Abadie & Zhao 2026).
solve_design solves the exact mixed-integer design (binary selection z);
solve_design_relaxed relaxes z to [0, 1], solves the QP, then
discretizes post hoc. Both return a raw result dict consumed by the
orchestrator.
- mlsynth.utils.marex_helpers.optimization.post_hoc_discretize(w_opt, v_opt, cluster_members, cluster_labels, m_eq=None, m_min=None, m_max=None, trim_threshold=0.01, Y_fit=None, Y_blank=None)#
Round relaxed weights to a feasible integer design (was internal).
- mlsynth.utils.marex_helpers.optimization.solve_design(Y_full, T0, clusters, blank_periods=0, m_eq=None, m_min=None, m_max=None, exclusive=True, design='standard', beta=1e-06, lambda1=0.0, lambda2=0.0, xi=0.0, lambda1_unit=0.0, lambda2_unit=0.0, costs=None, budget=None, covariates=None, covariate_weight=1.0, standardize=False, solver='SCIP', verbose=False)#
Exact mixed-integer MAREX design (was SCMEXP).
- mlsynth.utils.marex_helpers.optimization.solve_design_relaxed(Y_full, T0, clusters, blank_periods=0, m_eq=None, m_min=None, m_max=None, exclusive=True, design='standard', beta=1e-06, lambda1=0.0, lambda2=0.0, xi=0.0, lambda1_unit=0.0, lambda2_unit=0.0, costs=None, budget=None, covariates=None, covariate_weight=1.0, standardize=False, solver=None, verbose=False, zeta=0.0, trim_threshold=0.01)#
Relaxed (continuous-z) design with post-hoc discretization (was SCMEXP_REL).
Placebo/permutation inference for a MAREX synthetic treated-vs-control pair.
The held-out blank pre-periods act as placebos: the synthetic treated minus synthetic control there should be noise, so its distribution calibrates a permutation p-value and a split-conformal confidence band for the post-period effect (Abadie & Zhao 2026, OA; Chernozhukov-Wuthrich-Zhu 2021).
- mlsynth.utils.marex_helpers.inference.compute_inference(Y_treated: ndarray, Y_control: ndarray, T0: int, TcE: int, Tb: int, alpha: float = 0.05, max_combinations: int = 1000, random_state: int | None = None) MAREXInference#
Compute permutation inference for one synthetic contrast.
- Parameters:
Y_treated, Y_control (np.ndarray) – Synthetic treated / control outcomes over the full timeline, shape (T,).
T0 (int) – Number of pre-treatment periods.
TcE (int) – Start index of the blank (held-out) window.
Tb (int) – Number of blank periods.
alpha (float) – Two-sided significance level.
max_combinations (int) – Number of permutation draws for the global test.
random_state (int, optional) – Seed for reproducibility.
- Returns:
MAREXInference
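The idea of using blank periods as placebos can be sketched as follows. This is not the algorithm inside `compute_inference`: the function name, the test statistic (mean absolute gap), the assumption that the blank window is the last `Tb` pre-periods, and the constant-width band are all illustrative choices made here.

```python
import numpy as np

def placebo_inference_sketch(gap, T0, Tb, alpha=0.1, n_draws=1000, seed=0):
    """Hypothetical: permutation p-value + split-conformal-style band from blank gaps."""
    gap = np.asarray(gap, dtype=float)      # synthetic treated minus synthetic control
    if not 1 <= Tb <= T0 < gap.size:
        raise ValueError("need 1 <= Tb <= T0 < T")
    blank, post = gap[T0 - Tb:T0], gap[T0:]

    # Split-conformal half-width: the k-th smallest |blank gap|, infinite if Tb is too small.
    k = int(np.ceil((1 - alpha) * (Tb + 1)))
    half_width = np.inf if k > Tb else np.sort(np.abs(blank))[k - 1]

    # Permutation test: shuffle blank and post gaps together, recompute the statistic.
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([blank, post])
    observed = np.mean(np.abs(post))
    draws = np.array([
        np.mean(np.abs(rng.permutation(pooled)[:post.size]))
        for _ in range(n_draws)
    ])
    p_value = (1 + np.sum(draws >= observed)) / (1 + n_draws)
    return float(p_value), post - half_width, post + half_width

# Toy gap: small noise-like values before T0, a constant effect of 5 afterwards.
gap = np.concatenate([0.1 * np.sin(np.arange(20)), np.full(5, 5.0)])
p_value, lower, upper = placebo_inference_sketch(gap, T0=20, Tb=10)
print(p_value, lower.min(), upper.max())
```

Note that with `alpha = 0.05` and only 10 blank periods the conformal rank exceeds `Tb` and the band becomes infinite, which is one practical reason to hold out enough blank periods.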
Top-level MAREX solve: run the design optimizer and assemble frozen results.
- mlsynth.utils.marex_helpers.orchestration.solve_marex(Y_full, T0, clusters, design='standard', blank_periods=0, m_eq=None, m_min=None, m_max=None, exclusive=True, beta=1e-06, lambda1=0.0, lambda2=0.0, xi=0.0, lambda1_unit=0.0, lambda2_unit=0.0, costs=None, budget=None, covariates=None, covariate_names=(), covariate_weight=1.0, standardize=False, solver=None, verbose=False, relaxed=False, inference=False, alpha=0.05, max_combinations=1000, random_state=42) MAREXResults#
Solve the MAREX design and return a frozen MAREXResults.
With relaxed=True the continuous-z QP with post-hoc discretization is used; otherwise the exact MIQP. With inference=True, blank-period placebo inference is computed for every cluster and the aggregate.
Linear-factor DGP for the MAREX simulation study (Abadie & Zhao 2026, Sec. 5).
Reimplements the paper’s baseline data-generating process (Assumption 1, equations 12a/12b) so the simulation study can be replicated without the authors’ code. Potential outcomes are
Y^N_jt = delta_t + theta_t’ Z_j + lambda_t’ mu_j + eps_jt
Y^I_jt = upsilon_t + gamma_t’ Z_j + eta_t’ mu_j + xi_jt
with Z_j (R observed) and mu_j (F unobserved) covariates, sorted time effects, and i.i.d. Normal(0, sigma^2) noise.
- class mlsynth.utils.marex_helpers.simulation.MAREXSample(Y_N: ndarray, Y_I: ndarray, tau: ndarray, T0: int)#
One simulated sample.
- Y_N#
Potential outcomes under no treatment, shape (J, T).
- Type:
np.ndarray
- Y_I#
Potential outcomes under treatment, shape (J, T).
- Type:
np.ndarray
- tau#
True average treatment effect tau_t per period, shape (T,) (zero in the pre-period).
- Type:
np.ndarray
- Y_I: ndarray#
- Y_N: ndarray#
- tau: ndarray#
- mlsynth.utils.marex_helpers.simulation.generate_marex_sample(J: int = 15, R: int = 7, F: int = 11, T: int = 30, T0: int = 25, sigma: float = 1.0, rng: Generator | None = None) MAREXSample#
Draw one sample from the paper’s baseline linear-factor DGP (Sec. 5).
- Returns:
MAREXSample
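The structure of the two potential-outcome equations can be sketched in a few lines of NumPy. This is not `generate_marex_sample`: the uniform and normal draws for the covariates, loadings, and time effects are placeholder assumptions (the paper's Section 5 fixes its own distributions), and `tau` is computed with equal unit weights.

```python
import numpy as np

def simulate_linear_factor(J=15, R=7, F=11, T=30, T0=25, sigma=1.0, seed=0):
    """Hypothetical sketch of a linear-factor DGP with placeholder distributions."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(size=(J, R))            # observed covariates Z_j
    mu = rng.uniform(size=(J, F))           # unobserved factors mu_j
    delta = np.sort(rng.normal(size=T))     # sorted time effects (untreated)
    upsilon = np.sort(rng.normal(size=T))   # sorted time effects (treated)
    theta, gamma = rng.uniform(size=(T, R)), rng.uniform(size=(T, R))
    lam, eta = rng.uniform(size=(T, F)), rng.uniform(size=(T, F))

    # Y^N_jt = delta_t + theta_t' Z_j + lambda_t' mu_j + eps_jt
    Y_N = delta + Z @ theta.T + mu @ lam.T + sigma * rng.normal(size=(J, T))
    # Y^I_jt = upsilon_t + gamma_t' Z_j + eta_t' mu_j + xi_jt
    Y_I = upsilon + Z @ gamma.T + mu @ eta.T + sigma * rng.normal(size=(J, T))

    tau = (Y_I - Y_N).mean(axis=0)          # equal-weight average effect per period
    tau[:T0] = 0.0                          # no treatment effect before the experiment
    return Y_N, Y_I, tau

Y_N, Y_I, tau = simulate_linear_factor()
print(Y_N.shape, Y_I.shape, tau.shape)
```

The defaults mirror the documented signature (`J=15`, `R=7`, `F=11`, `T=30`, `T0=25`), so the output shapes match the `MAREXSample` fields.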
Plotting for MAREX: synthetic treated vs control (or the treatment effect).
- mlsynth.utils.marex_helpers.plotter.plot_marex(results: MAREXResults, clusters: List[str] | None = None, plot_type: str = 'treatment', global_result: bool = True, figsize: tuple = (12, 6)) None#
Plot MAREX treatment effects (or predictions), one panel per cluster + global.
- Parameters:
results (MAREXResults) – Output of mlsynth.estimators.MAREX.
clusters (list of str, optional) – Cluster labels to plot (default: all; a lone "0" cluster is skipped).
plot_type ({“treatment”, “prediction”}) – Plot the treated-minus-control effect, or both synthetic series.
global_result (bool) – Include the aggregate (global) panel.
The shared post-fit module — compute_smd(),
compute_post_fit(), and
compute_power_analysis() — lives outside the
marex_helpers package so the other MAREX-family estimators (LEXSCM,
SYNDES, PANGEO) can call into the same one-source-of-truth diagnostics:
Standardized post-fit diagnostics for synthetic control designs and the matching power-analysis surface that consumes them.
After any MAREX-family estimator (LEXSCM, MAREX, SYNDES, PANGEO, …) solves its design problem, downstream consumers (the SAGE dashboard, paper-style reports, comparison tables) all need the same numbers:
the post-treatment ATT, total effect, percentage lift, per-period gap;
pre / blank / post root-mean-squared-error of the synthetic gap;
inference scalars (p-value, CI bounds) when computed;
covariate-balance standardized mean differences (SMDs) when covariates were used in the design.
This module exposes one frozen dataclass (SyntheticControlPostFit)
and three free functions:
compute_smd() – standalone, panel-independent SMD from any (cov_matrix, treated_w, control_w);
compute_post_fit() – the full diagnostic bundle from trajectories + boundaries + (optional) covariate matrix + (optional) inference;
compute_post_fit_marex() – adapter that builds the bundle from a MAREXResults + MAREXPanel pair.
The free-function entry points are deliberately small and reusable, so the LEXSCM / SYNDES / PANGEO equivalents can be added one-at-a-time without touching this module: they just compose the same primitives.
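The headline numbers listed above (post-treatment ATT, total effect, percentage lift, and RMSE by window) can be sketched from two trajectories and the window boundaries. The function below is hypothetical and is not `compute_post_fit`; in particular, using the post-window control mean as the denominator of the percentage lift is an assumption.

```python
import numpy as np

def rmse(x):
    """Root-mean-squared value of a gap slice, or None for an empty window."""
    return float(np.sqrt(np.mean(np.square(x)))) if x.size else None

def post_fit_sketch(treated, control, n_fit, n_blank=0):
    """Hypothetical: summary diagnostics from synthetic treated/control series."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    gap = treated - control
    pre_end = n_fit + n_blank
    post_gap, post_ctrl = gap[pre_end:], control[pre_end:]
    att = float(post_gap.mean()) if post_gap.size else None
    baseline = float(post_ctrl.mean()) if post_ctrl.size else None
    return {
        "ate": att,
        "total_effect": float(post_gap.sum()) if post_gap.size else None,
        # Assumption: lift is measured relative to the post-window control mean.
        "ate_percent": 100.0 * att / baseline if att is not None and baseline else None,
        "rmse_fit": rmse(gap[:n_fit]),
        "rmse_blank": rmse(gap[n_fit:pre_end]),
        "rmse_post": rmse(post_gap),
    }

# Toy series: perfect pre-period fit, then a lift of 2 on a baseline of 10.
out = post_fit_sketch([10, 10, 10, 10, 12, 12], [10] * 6, n_fit=3, n_blank=1)
print(out)
```

A small `rmse_blank` relative to `rmse_fit` suggests the weights generalise beyond the periods used to fit them.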
- class mlsynth.utils.post_fit.MDEPoint(post_periods: int, mde_absolute: float, mde_pct: float, se: float, power_at_observed: float | None = None)#
Bases: object
Minimum detectable effect at a single post-treatment horizon.
- class mlsynth.utils.post_fit.PowerAnalysis(headline: MDEPoint, curve: Tuple[MDEPoint, ...], alpha: float, power_target: float, sigma_placebo: float, serial_correlation: float, baseline: float, method: str = 'analytical_ar1')#
Bases: object
Standardized power-analysis output attached to SyntheticControlPostFit.
Built from the placebo / blank-period gap variance and an analytical Gaussian approximation, with AR(1) variance inflation to handle serial correlation in the gap residuals. The intent matches the per-estimator power modules already in the library (PangeoPower, SPCDPowerAnalysis, SYNDESPower) but consumes the same SyntheticControlPostFit shape so every covariate-aware SCM-family estimator gets the surface for free.
- curve#
MDE / power values across the requested post_grid horizons (so callers can read a detectability curve).
- serial_correlation#
Lag-1 (AR(1)) autocorrelation of the placebo gap residuals used to inflate the variance for serial dependence.
- Type:
float
- baseline#
Mean of the control trajectory on the post window (denominator for mde_pct). NaN when no post window exists.
- Type:
float
- method#
"analytical_ar1" for the closed-form Gaussian + AR(1) MDE used here. Reserved for future "monte_carlo" extensions.
- Type:
str
- class mlsynth.utils.post_fit.SyntheticControlPostFit(treated_series: ndarray, control_series: ndarray, gap_series: ndarray, n_fit: int, n_blank: int, n_post: int, ate: float | None = None, total_effect: float | None = None, ate_percent: float | None = None, ate_per_period: ndarray | None = None, cumulative_effect: ndarray | None = None, p_value: float | None = None, ci_lower: float | None = None, ci_upper: float | None = None, inference_method: str | None = None, rmse_fit: float | None = None, rmse_blank: float | None = None, rmse_post: float | None = None, covariate_names: Tuple[str, ...] = (), covariate_smd: Dict[str, float] | None = None, covariate_smd_abs_max: float | None = None, covariate_smd_squared_sum: float | None = None, covariate_smd_treated_vs_pop: Dict[str, float] | None = None, covariate_smd_treated_vs_pop_abs_max: float | None = None, covariate_smd_treated_vs_pop_squared_sum: float | None = None, covariate_smd_control_vs_pop: Dict[str, float] | None = None, covariate_smd_control_vs_pop_abs_max: float | None = None, covariate_smd_control_vs_pop_squared_sum: float | None = None, power: PowerAnalysis | None = None)#
Bases: object
Standardized post-fit diagnostics for a single synthetic control design.
Field semantics are estimator-agnostic; every MAREX-family adapter populates the same shape. Any field that isn’t naturally computable for the producing estimator is left None.
- control_series: ndarray#
- gap_series: ndarray#
- power: PowerAnalysis | None = None#
- treated_series: ndarray#
- mlsynth.utils.post_fit.compute_post_fit(treated_series: ndarray, control_series: ndarray, *, n_fit: int, n_blank: int = 0, n_post: int | None = None, cov_matrix: ndarray | None = None, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None, treated_weights: ndarray | None = None, control_weights: ndarray | None = None, population_weights: ndarray | None = None, inference: Any | None = None, n_treated_units: int | None = None) SyntheticControlPostFit#
Compute a SyntheticControlPostFit from trajectories + boundaries.
The trajectories treated_series and control_series are the estimator’s own synthetic constructs (Σⱼ wⱼ Yⱼ and Σⱼ vⱼ Yⱼ in Abadie-Zhao notation). n_post defaults to len(treated_series) - n_fit - n_blank.
Covariate balance fields are populated when cov_matrix + treated_weights + control_weights are all supplied (the natural inputs for any MAREX-family design). The compute_smd() helper does the work, so the SMD numbers are exactly consistent with a standalone call to compute_smd().
Inference scalars are pulled from the estimator’s inference object via _extract_inference(), which knows about the four common shapes (LEXSCMInference, MAREX MAREXInference, SYNDES SYNDESInference, or a plain dict). All inference fields are optional.
- mlsynth.utils.post_fit.compute_post_fit_marex(raw, panel, *, cov_scales: ndarray | None = None) SyntheticControlPostFit#
Adapt a MAREXResults + MAREXPanel pair into a SyntheticControlPostFit.
Pulls the aggregate synthetic-treated / synthetic-control trajectories from raw.globres, the (T0, blank_periods) split from panel.T0 and panel.blank_periods, the inference object from raw.globres.inference, and the covariate matrix from panel.covariates (when present).
- mlsynth.utils.post_fit.compute_power_analysis(post_fit: SyntheticControlPostFit, *, alpha: float = 0.05, power_target: float = 0.8, post_grid: Sequence[int] | None = None) PowerAnalysis#
Analytical MDE + power curve for a design’s SyntheticControlPostFit.
Uses the placebo / blank-period gap residuals (or the pre-period gap when no blank window was carved out) to estimate the noise standard deviation sigma_placebo and the AR(1) autocorrelation rho, then computes the minimum detectable effect for each horizon T in post_grid via the Gaussian formula
MDE(T) = (z_{1-alpha/2} + z_{power}) * sigma_placebo * sqrt(VIF(T, rho)),
where VIF(T, rho) = Var(mean of T AR(1) periods) / sigma_placebo^2. The headline MDE uses T = post_fit.n_post (the realised post window).
- Parameters:
post_fit (SyntheticControlPostFit) – The standardized post-fit from any MAREX-family estimator.
alpha (float, default 0.05) – Two-sided significance level.
power_target (float, default 0.80) – Target power for the MDE.
post_grid (sequence of int, optional) – Post-treatment horizons at which to compute MDE. Defaults to a small geometric grid centered on post_fit.n_post so users see the detectability tradeoff vs. running the experiment longer.
- Returns:
PowerAnalysis – Headline MDE + a curve over the requested horizons.
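The Gaussian MDE formula can be evaluated with the standard library alone. The sketch below assumes `sigma_placebo` is the marginal standard deviation of a stationary AR(1) gap, which gives the textbook variance of a T-period mean, VIF(T, rho) = (1/T) [1 + 2 Σ_{k=1}^{T-1} (1 - k/T) rho^k]; whether `compute_power_analysis` uses exactly this expression is not confirmed here, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ar1_vif(T, rho):
    """Var(mean of T stationary AR(1) periods) / sigma^2."""
    if T < 1 or not -1.0 < rho < 1.0:
        raise ValueError("need T >= 1 and |rho| < 1")
    return (1.0 + 2.0 * sum((1 - k / T) * rho ** k for k in range(1, T))) / T

def mde(T, sigma, rho, alpha=0.05, power=0.8):
    """Hypothetical: two-sided Gaussian minimum detectable effect at horizon T."""
    z = NormalDist().inv_cdf                 # standard normal quantile function
    return (z(1 - alpha / 2) + z(power)) * sigma * sqrt(ar1_vif(T, rho))

# Detectability curve: the MDE shrinks as the experiment runs longer.
curve = {T: mde(T, sigma=1.0, rho=0.5) for T in (2, 4, 8)}
iid_mde = mde(4, sigma=1.0, rho=0.0)         # with rho = 0 the VIF reduces to 1/T
print(curve, iid_mde)
```

Positive serial correlation inflates the MDE relative to the independent case, because each extra post-period carries less new information.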
- mlsynth.utils.post_fit.compute_smd(cov_matrix: ndarray, treated_weights: ndarray, control_weights: ndarray, *, cov_names: Sequence[str] | None = None, cov_scales: ndarray | None = None) Dict[str, Any]#
Standardized mean differences between weighted treated and control means.
- Parameters:
cov_matrix (ndarray, shape (N, M)) – Per-unit covariate values; rows align to treated_weights and control_weights.
treated_weights, control_weights (ndarray, shape (N,)) – Non-negative weights with disjoint supports. They are renormalised to sum to 1 internally (so callers may pass raw sums-to-K weights).
cov_names (sequence of str, optional) – Names for the M covariates. Defaults to ("cov_0", "cov_1", ...).
cov_scales (ndarray, shape (M,), optional) – Pre-computed per-covariate standardization scales (cross-unit std). Defaults to the std of cov_matrix columns. Passing the value already cached by build_covariate_matrix is the right move.
- Returns:
dict with keys smd (the per-covariate dict), smd_abs_max, and smd_squared_sum. Returns empty / NaN summaries if either weight vector is all-zero.
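A minimal NumPy sketch of the standardized mean difference follows. It mirrors the documented inputs and output keys, but `smd_sketch` is a hypothetical stand-in, not `compute_smd` itself: the population standard deviation (`ddof=0`) and the guard for zero-variance columns are assumptions.

```python
import numpy as np

def smd_sketch(cov_matrix, treated_weights, control_weights, cov_names=None):
    """Hypothetical: (weighted treated mean - weighted control mean) / cross-unit std."""
    cov = np.asarray(cov_matrix, dtype=float)
    w = np.asarray(treated_weights, dtype=float)
    v = np.asarray(control_weights, dtype=float)
    if w.sum() == 0 or v.sum() == 0:
        return {"smd": {}, "smd_abs_max": float("nan"), "smd_squared_sum": float("nan")}
    w, v = w / w.sum(), v / v.sum()          # renormalise each weight vector to sum to 1
    scale = cov.std(axis=0)                  # assumption: population std (ddof=0)
    scale = np.where(scale > 0, scale, 1.0)  # assumption: leave constant columns unscaled
    d = (w @ cov - v @ cov) / scale
    names = cov_names or [f"cov_{m}" for m in range(cov.shape[1])]
    return {
        "smd": {name: float(x) for name, x in zip(names, d)},
        "smd_abs_max": float(np.max(np.abs(d))),
        "smd_squared_sum": float(np.sum(d ** 2)),
    }

# Toy design: units 1-2 treated, units 3-4 control; raw weights need not sum to 1.
cov = np.array([[1.0, 10.0], [3.0, 10.0], [1.0, 30.0], [3.0, 30.0]])
out = smd_sketch(cov, [1, 1, 0, 0], [0, 0, 2, 2])
print(out)
```

In the toy design the first covariate is perfectly balanced while the second differs by two cross-unit standard deviations, so `smd_abs_max` flags the imbalance.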
References#
Abadie, A., & Zhao, J. (2026). “Synthetic Controls for Experimental Design.” See [ABADIE2024].
Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies.” Journal of the American Statistical Association 105(490):493-505.
Chernozhukov, V., Wuthrich, K., & Zhu, Y. (2021). “An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls.” Journal of the American Statistical Association 116(536):1849-1864.