MicroSynth (User-Level Balancing SC)#
Overview#
MicroSynth implements Robbins & Davenport (2021, J. Stat. Software), “microsynth: Synthetic Control Methods for Disaggregated and Micro-Level Data in R”. It is the user-level cousin of classical synthetic control: rather than reweighting a small donor pool of aggregate units (states, cities) to match a single treated unit’s pre-trajectory, MicroSynth reweights a large pool of individual control users to match a group of treated users on covariate moments.
This is the right tool when:
The unit of analysis is an individual user (or household, or block-group) — not an aggregate region.
There are many treated units (typically thousands or millions) rather than one.
The setting is marketing-science / ad-attribution / holdout-contamination measurement, where you have user-level impression logs and want to estimate causal lift without trusting a potentially contaminated randomized holdout.
Compared to the aggregate-unit SC estimators in mlsynth
(Forward Difference-in-Differences (FDID), Synthetic Difference-in-Differences (SDID), Partially Pooled SCM (PPSCM), Sparse Synthetic Control (SparseSC), …)
MicroSynth has a dramatically larger donor pool but a much smaller
balancing constraint set — the dual problem lives in
\(\mathbb{R}^{d+1}\) where \(d\) is the number of
covariates, regardless of how many control users there are. This
is what makes single-machine MicroSynth tractable on millions of
users.
Mathematical Formulation#
Setup#
Let \(\mathcal{T}\) and \(\mathcal{C}\) denote the sets of treated and control users, with sizes \(n_T\) and \(n_C\). For each user \(i\) we observe a covariate vector \(X_i \in \mathbb{R}^d\) and a post-treatment outcome \(Y_i\). The treatment indicator is the actual exposure (impressions, not assignment), so contamination of a randomized holdout is absorbed: a holdout-arm user who actually saw the ad is treated; a treated-arm user who in fact got no impressions is a control.
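The estimand is the population ATT on the actually-exposed group:
\[\tau_{\mathrm{ATT}} = \mathbb{E}\bigl[\,Y_i(1) - Y_i(0) \mid i \in \mathcal{T}\,\bigr],\]
where \(Y_i(1)\) and \(Y_i(0)\) are user \(i\)'s potential outcomes with and without exposure.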
Primal QP#
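MicroSynth solves a constrained quadratic program for non-negative simplex weights on the controls:
\[\min_{w \in \mathbb{R}^{n_C}} \ \tfrac{1}{2}\,\bigl\lVert w - \tfrac{1}{n_C}\mathbf{1} \bigr\rVert_2^2 \quad \text{s.t.} \quad X_C^{\top} w = \bar{X}_T, \quad \mathbf{1}^{\top} w = 1, \quad w \ge 0,\]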
where \(\bar{X}_T = (1/n_T) \sum_{i \in \mathcal{T}} X_i\) is the treated group’s covariate mean and \(X_C \in \mathbb{R}^{n_C \times d}\) stacks the control covariates.
The objective pulls weights toward the uniform \(1/n_C\) baseline (so the solution doesn’t collapse onto one user); the equality constraints exactly balance every covariate moment between treated and reweighted controls; the simplex constraints preserve the “synthetic” interpretation (non-negative, sum-to-one).
The square-loss penalty, combined with the non-negativity constraint, makes \(w\) sparse: controls far from the treated profile end up with \(w_i = 0\) exactly, and only the controls genuinely close to the treated profile receive mass. This is the “Synth” in MicroSynth.
Dual Ascent#
The primal is high-dimensional (\(n_C\) variables can be in the millions). The dual is low-dimensional: one Lagrange multiplier per equality constraint, so \(\theta = (\lambda, \nu) \in \mathbb{R}^{d+1}\). Solving the dual via L-BFGS-B with the analytical gradient is fast and parallelizable in \(n_C\). The primal weights are recovered in closed form via the KKT relationship (written here with the multipliers signed so that they enter additively):
\[w_i = \max\!\bigl(0,\ \tfrac{1}{n_C} + X_{C,i}^{\top}\lambda + \nu\bigr), \qquad i \in \mathcal{C},\]
normalized so \(\sum_i w_i = 1\).
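The dual ascent described above can be sketched in a few lines of NumPy and SciPy. This is a minimal illustration, not the library's solve_microsynth_dual: the function name solve_dual_sketch is hypothetical, the sign convention matches the additive KKT form, and the multipliers are internally rescaled by \(n_C\) (an assumption made here purely to keep the problem well-conditioned).

```python
import numpy as np
from scipy.optimize import minimize


def solve_dual_sketch(X_C, xbar_T, max_iter=500, gtol=1e-8):
    """Hypothetical sketch of the (d+1)-dimensional dual; not the mlsynth solver."""
    n_C, d = X_C.shape
    # Each row is [x_i, 1]: d balance constraints plus the sum-to-one constraint.
    A = np.column_stack([X_C, np.ones(n_C)])
    b = np.append(xbar_T, 1.0)

    def neg_dual(theta):
        # theta is n_C times the raw multiplier vector (lambda, nu).
        # KKT: w_i = max(0, 1/n_C + x_i' lambda + nu).
        w = np.maximum(0.0, 1.0 + A @ theta) / n_C
        value = 0.5 * n_C * (w @ w) - theta @ b
        grad = A.T @ w - b  # residual of the equality constraints
        return value, grad

    res = minimize(
        neg_dual, np.zeros(d + 1), jac=True, method="L-BFGS-B",
        options={"maxiter": max_iter, "gtol": gtol, "ftol": 1e-15},
    )
    w = np.maximum(0.0, 1.0 + A @ res.x) / n_C
    w = w / w.sum()  # final normalization so the weights sum to one
    return w, res.x / n_C  # weights and the raw (lambda, nu) vector
```

The optimizer only ever sees a vector of length \(d + 1\); the control users enter through matrix-vector products, which is the part that parallelizes over \(n_C\).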
Counterfactual and ATT#
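With \(\hat{w}\) solved, the synthetic counterfactual outcome and ATT are:
\[\hat{Y}^{(0)} = \sum_{i \in \mathcal{C}} \hat{w}_i\, Y_i, \qquad \widehat{\mathrm{ATT}} = \frac{1}{n_T} \sum_{i \in \mathcal{T}} Y_i \;-\; \hat{Y}^{(0)}.\]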
When there are multiple post-treatment periods, the same
\(\hat{w}\) is applied to every post-period outcome and the
final scalar att is the mean of the per-period gaps. The full
per-period vector is exposed on
MicroSynthResults.gap_trajectory.
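As a small illustration of the multi-period rule just described, the sketch below applies one weight vector to every post-period and averages the per-period gaps. The helper name att_from_weights is hypothetical and is not part of mlsynth; it only mirrors the description above.

```python
import numpy as np


def att_from_weights(Y_T, Y_C, w):
    """Hypothetical helper: per-period gaps and scalar ATT from fixed weights."""
    # Accept both (n, T_post) matrices and collapsed (n,) vectors.
    Y_T = np.asarray(Y_T, dtype=float).reshape(len(Y_T), -1)
    Y_C = np.asarray(Y_C, dtype=float).reshape(len(Y_C), -1)
    counterfactual = w @ Y_C                        # weighted control mean per period
    gap_trajectory = Y_T.mean(axis=0) - counterfactual
    att = float(gap_trajectory.mean())              # mean of the per-period gaps
    return counterfactual, gap_trajectory, att
```

With two treated users, three controls and two post-periods, the function returns one gap per period and their mean as the scalar ATT.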
Identifying Assumption#
Selection-on-observables: conditional on \(X\), treatment exposure is independent of the potential outcomes. In marketing applications this means \(X\) must include every feature the ad-targeting system uses that also predicts conversion. Typical required covariates: prior-engagement metrics, device platform, audience-segment / persona membership, geo, demographics, frequency exposure to parallel campaigns, time-of-day patterns.
Selection-on-observables is the headline assumption, but in a Snap-style ad-attribution deployment several others are doing silent work. Each is listed here together with the realistic failure mode you would see in a marketing-science setting and a diagnostic that flags it.
Selection-on-observables on every conversion-predictive feature. The covariate vector \(X\) must contain every signal the bidder / targeting model conditions on that also predicts conversion. If a targeting feature is missing, MicroSynth’s reweighting closes balance only on the features you gave it and leaves selection bias on the one you did not.
Plausibly violated when the bidder optimises against a model that uses features the analyst does not have access to – on-device signals, third-party audience segments, latent embeddings, in-market scoring. Diagnostic: probe the unobserved-intent residual by regressing post-period conversion on the residual of a saw-ad model that conditions on \(X\); a non-zero coefficient is unobserved confounding that MicroSynth cannot remove. The existing “When Balancing Is Not Enough” section below makes this concrete: when intent is latent, the as-treated MicroSynth ATT overstates the per-exposure effect by ~29% even with all SMDs below 1e-3.
SUTVA at the user level (no network spillovers in conversion). The synthetic-control framing treats each user’s potential outcome as a function of their own exposure only. Exposed users influencing unexposed users (a friend talks about the ad, an organic post amplifies the campaign) breaks the comparison: the control pool itself has been partially treated.
Plausibly violated when the campaign is viral or social by design – influencer-led launches, group-chat-shareable AR lenses, referral mechanics. Diagnostic: split controls by social distance to the exposed cohort (e.g. friends-of-treated vs. network-distant controls) and refit; a non-trivial gap between the two ATTs is a SUTVA failure. For genuinely spillover-prone designs, switch to a spillover-aware aggregate estimator (Spillover-Aware Synthetic Control (SPILLSYNTH), Spatial Synthetic Difference-in-Differences (SpSyDiD)).
Overlap: the treated covariate mean lies in the convex hull of the controls. The primal QP enforces \(X_C^{\!\top} w = \bar X_T\) with \(w\) on the simplex. There is a feasible solution if and only if \(\bar X_T\) is in the convex hull of the rows of \(X_C\); if not, no reweighting can balance every constraint and the dual still returns a vector, but the residual imbalance is real.
Plausibly violated when the campaign targeted a covariate cell that the control pool barely contains – a brand-new audience-segment launch, a country where the ad ran but very few organic users live, an iOS-only push with mostly Android in the control pool. Diagnostic: read MicroSynthResults.design.feasibility_message and the per-covariate smd_after; if the feasibility flag is False or any SMD exceeds balance_tol, the hull condition is failing. The fix is to widen the control pool (drop sub-population filters), drop a covariate that is genuinely outside support, or accept the residual imbalance and discuss its sign.
Linear functional form (or sufficient basis expansion) of the outcome in \(X\). Balancing only the first moments of \(X\) gives an unbiased ATT when the conditional expectation \(\mathbb{E}[Y(0) \mid X]\) is linear in \(X\). If the expectation is nonlinear (e.g. age enters as a smooth bump rather than a slope), first-moment balance is not enough – the doubly robust property of the balancing approach (Lin et al. 2023) only holds under linearity in one of the outcome or selection models.
Plausibly violated when engagement metrics enter non-linearly (saturation effects, threshold heaps in prior-engagement). Diagnostic: add quadratic terms and selected interactions to covariates and rerun – if the ATT moves materially, the linear specification was binding. The KDD paper (Section 4) explicitly recommends including higher-order moments of skewed user-engagement covariates for exactly this reason.
Pre-period parallel mean for the rebalanced control group. Because the constraints are contemporaneous moment balance, the counterfactual at \(t > T_0\) is trustworthy only if the rebalanced controls would have moved in parallel with the treated group absent treatment. The covariates should therefore include pre-period outcome levels (the Roanoke / Snap recipe: include pre-intervention outcome trajectories as constraint moments).
Plausibly violated when the analyst forgot to include pre-period outcomes in the constraint set, or when there is a secular trend in the treated pool’s outcome that no covariate captures. Diagnostic: plot MicroSynthResults.gap_trajectory over the pre-period (include enough pre-periods to see a trend) – a non-flat pre-period gap is a parallel-trends violation. Robbins, Saunders & Kilmer (2017) build the constraint set explicitly out of all pre-period outcome-by-time cells for this reason.
Stable covariates over the analysis window (no compositional drift). The primal solves a single \(w\) and applies it to every post-period. Implicit: the donor pool’s covariate vector is sufficient to characterise it across \([t = 0, T]\).
Plausibly violated when the user base churns mid-campaign (new cohorts join, old cohorts age out), or when a covariate itself shifts after \(T_0\) (e.g. country_tier re-classification, persona-segment redefinition). Diagnostic: rebuild \(X\) on the post-period sample only, recompute \(\bar X\) for the rebalanced controls, and check that smd_after is still tight; drift shows up as post-period SMDs that have crept above the pre-period tolerance.
Treatment indicator is the actually-realised exposure, used consistently with the estimand. MicroSynth identifies the ATT on the actually-exposed group when treat is the impression column. If you instead use the assignment column, you get an ITT under balancing on \(X\). Mixing the two – naming an assignment column treat but interpreting the answer per-exposure – is a specification error, not an assumption failure of the method.
Plausibly violated when the team operationalises “treated” as “assigned” because that is what the experimentation platform logs, but reports per-exposure lift. Diagnostic: always sanity check the printed treated-fraction against the impression log; if they disagree, the wrong column was passed.
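The functional-form diagnostic above (add quadratic terms and selected interactions, then rerun) can be prepared with a few lines of pandas. This is a sketch under assumptions: the helper name add_second_moments and the `_sq` / `_x_` column suffixes are hypothetical, and the choice to skip squares of binary columns is made here only because they would duplicate the original column.

```python
from itertools import combinations

import pandas as pd


def add_second_moments(df, covariates):
    """Hypothetical helper: append squares and pairwise products of covariates."""
    out = df.copy()
    names = list(covariates)
    for c in covariates:
        # The square of a 0/1 column equals the column itself, so skip it.
        if out[c].nunique() > 2:
            out[f"{c}_sq"] = out[c] ** 2
            names.append(f"{c}_sq")
    for a, b in combinations(covariates, 2):
        out[f"{a}_x_{b}"] = out[a] * out[b]
        names.append(f"{a}_x_{b}")
    return out, names
```

The returned name list would then replace the original covariates entry in the configuration for the rerun; if the ATT moves materially between the two fits, the linear specification was binding.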
When not to use MicroSynth#
Clean randomised AB test with full compliance. MicroSynth’s whole selling point is removing observational selection bias. If the experimentation platform delivered a non-contaminated holdout and exposure compliance is near-complete, a plain difference of means is both unbiased and lower-variance. MicroSynth then adds variance (the bootstrap, the constraint set) without buying identification.
Confounding is dominated by unobserved features (latent intent). This is the boundary case spelled out below in “When Balancing Is Not Enough”. When holdout leakage is driven by an in-market signal the analyst does not have, MicroSynth zeros out SMDs on every observed covariate and still returns a biased ATT. Stay on the randomised arms – report ITT under MicroSynth balancing for precision, and divide by the compliance gap to get a covariate-balanced CACE / Wald (the section below shows the full recipe).
Aggregate region-level data, single treated unit. A one-state, one-policy DMI-style design is what classical aggregate SC was built for. MicroSynth’s dual is \((d + 1)\)-dimensional but the primal must have many controls; with a handful of aggregate donors the QP degenerates and the convex-hull / overlap argument is exactly the classical SC argument. Use canonical SCM, Two-Step Synthetic Control, Forward Difference-in-Differences (FDID), or Factor Model Approach (FMA) instead.
The distribution of the outcome is the object of interest. MicroSynth balances means (or moments you specify) and returns a scalar ATT. If the question is “does the campaign compress the lower tail of session length?” or “what is the QTE at the 90th percentile of basket size?”, switch to Distributional Synthetic Control (DSC) – the Wasserstein-barycenter machinery is designed exactly for that.
The treatment is continuous or multi-valued (ad dose). MicroSynth encodes a binary saw-ad / not-saw-ad column. Multi-valued exposure (one impression vs ten vs a hundred), spend dose, or auction price needs the continuous-treatment framework in Continuous-Treatment Synthetic Control (CTSC).
Spillovers / interference within the user graph. SUTVA at the user level is a hard assumption; viral and social-by-design campaigns violate it. Covariate balancing on the user pool does nothing about spillovers. Switch to a spillover-aware design (Spillover-Aware Synthetic Control (SPILLSYNTH), Spatial Synthetic Difference-in-Differences (SpSyDiD)) and accept that you are now identifying an aggregate quantity, not a user-level lift.
Convex-hull condition fails on the targeting axis. If the campaign was narrowly targeted – an iOS-only push to a brand-new audience segment with almost no organic match in the control pool – feasibility_message will fire and the residual SMDs will be visibly above tolerance. There is no balancing fix here: either widen the control pool (relax the segment filter, pool across countries), drop the constraint that is outside support, or acknowledge the residual imbalance in the writeup.
You have billions of users and a single-machine budget. The in-memory dual scales as \(O(N d K)\); at Snap-scale this is a cluster job, not a workstation job. Switch to the distributed DistEB / DistMS variants in Lin et al. (2023), which are designed to run as MapReduce gradient steps over PySpark.
Tiny treated cohort (handful of users) with many covariates. With \(n_T\) small, \(\bar X_T\) is itself noisy, the balance constraints are noisy targets, and the bootstrap CI widens to uselessness. Aggregate the treated cohort up to a meaningful unit (campaign-level, segment-level) and run an aggregate SC, or prune covariates to those with credible cross-validated predictive signal.
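The convex-hull condition mentioned in the list above can be checked directly as a linear feasibility problem before any fit is attempted. The sketch below is one possible way to do that with scipy.optimize.linprog; the helper name mean_in_convex_hull is hypothetical and this check is not described as part of mlsynth.

```python
import numpy as np
from scipy.optimize import linprog


def mean_in_convex_hull(X_C, xbar_T):
    """Hypothetical check: is xbar_T a convex combination of the rows of X_C?"""
    n_C, d = X_C.shape
    # Equality constraints: X_C' w = xbar_T and 1' w = 1.
    A_eq = np.vstack([X_C.T, np.ones((1, n_C))])
    b_eq = np.append(xbar_T, 1.0)
    # Zero objective: only feasibility matters. bounds=(0, None) enforces w >= 0.
    res = linprog(c=np.zeros(n_C), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.status == 0  # status 0 means a feasible optimum was found
```

This LP has one variable per control user, so on very large pools it is better run on a random subsample: a feasible subsample proves overlap for the full pool, while an infeasible one is only suggestive.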
Diagnostics#
The dual solver returns weights that — when the treated group’s
covariate mean lies in the convex hull of the controls’ covariate
matrix — achieve all balance constraints to numerical precision.
mlsynth reports four diagnostics per fit:
SMD before and after weighting: per-covariate standardized mean difference. After weighting these should be at the balance_tol floor (default 1e-4).
Effective sample size (ESS) = 1 / sum(w^2): how many effective control units carry the weight. ESS close to \(n_C\) is healthy; ESS \(\ll n_C\) means a small fraction of controls dominate the counterfactual.
Max weight: the largest single control-user weight, a concentration indicator.
Feasibility flag: False if any final SMD exceeds balance_tol. This diagnoses convex-hull violations where no reweighting can equalize covariates.
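For readers who want to recompute these diagnostics by hand, here is a minimal NumPy sketch. The function names smd and ess are hypothetical stand-ins, and the pooled SD is computed from the unweighted samples with ddof=1, which is an assumption about a detail the text does not pin down.

```python
import numpy as np


def smd(X_T, X_C, w=None):
    """Per-covariate standardized mean difference (hypothetical helper)."""
    mean_T = X_T.mean(axis=0)
    # Unweighted control mean before the fit, weighted mean after it.
    mean_C = X_C.mean(axis=0) if w is None else w @ X_C
    pooled_sd = np.sqrt((X_T.var(axis=0, ddof=1) + X_C.var(axis=0, ddof=1)) / 2.0)
    return (mean_T - mean_C) / pooled_sd


def ess(w):
    """Effective sample size, 1 / sum(w^2)."""
    return 1.0 / np.sum(np.asarray(w) ** 2)
```

Max weight is simply w.max(), and the feasibility flag corresponds to checking that every absolute SMD after weighting is below the chosen tolerance.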
Inference#
The default run_inference=True runs a paired stratified
bootstrap: resample \(n_T\) treated users and \(n_C\)
control users separately, refit the dual, repeat n_bootstrap
times. The percentile CI and SE come from the bootstrap
distribution.
Each bootstrap rep is fast because the dual ascent re-converges
quickly from cold start (the dual is convex and low-dimensional);
with n_bootstrap = 500 on 100K users + 20 covariates, total
inference time is in the low minutes.
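The resampling scheme can be sketched independently of the solver. In the block below, stratified_bootstrap is a hypothetical name and fit_att is a caller-supplied function standing in for "refit the dual and recompute the ATT"; neither is the library's paired_bootstrap_ci.

```python
import numpy as np


def stratified_bootstrap(X_T, X_C, Y_T, Y_C, fit_att,
                         n_bootstrap=500, seed=1400, ci_level=0.95):
    """Hypothetical sketch: resample treated and control users separately."""
    rng = np.random.default_rng(seed)
    n_T, n_C = len(X_T), len(X_C)
    atts = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # Draw with replacement within each arm, keeping (n_T, n_C) fixed.
        idx_T = rng.integers(0, n_T, size=n_T)
        idx_C = rng.integers(0, n_C, size=n_C)
        atts[b] = fit_att(X_T[idx_T], X_C[idx_C], Y_T[idx_T], Y_C[idx_C])
    alpha = 1.0 - ci_level
    ci = np.quantile(atts, [alpha / 2.0, 1.0 - alpha / 2.0])  # percentile CI
    se = atts.std(ddof=1)                                      # bootstrap SE
    return se, ci, atts
```

Any estimator with the signature fit_att(X_T, X_C, Y_T, Y_C) can be plugged in, for example a function that solves the dual on the resampled covariates and returns the weighted outcome gap.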
Core API#
MicroSynth estimator (Robbins-Davenport 2021).
User-level balancing synthetic control. Solves a constrained QP for
non-negative simplex weights on the control population that exactly
balance covariate moments against the treated group’s moments, then
reads off the ATT as the weighted-mean outcome difference. Scales
to N_C in the millions on a single machine because the dual
optimization is in R^{d+1} regardless of N_C.
See mlsynth.config_models.MicroSynthConfig for the public
configuration. Helpers live in
mlsynth.utils.microsynth_helpers.
- class mlsynth.estimators.microsynth.MicroSynth(config: MicroSynthConfig | dict)#
Bases: object
User-level balancing synthetic control estimator.
- Parameters:
config (MicroSynthConfig or dict) – Configuration object. See mlsynth.config_models.MicroSynthConfig.
- Returns:
MicroSynthResults – Typed container with the dual-ascent weights, balance diagnostics, counterfactual trajectory, ATT, and (optionally) a paired stratified bootstrap CI.
Notes
Unlike aggregate-unit estimators in mlsynth (FDID, SDID, PPSCM, SparseSC, etc.), MicroSynth treats individual users as units. There can be thousands of treated users; the control “donor pool” is the entire untreated population. Covariate moments listed in covariates – and optionally pre-treatment outcome values listed in outcome_lag_periods – are exactly balanced between treated and weighted controls by a quadratic program.
The identifying assumption is selection-on-observables: given the covariate set, treatment exposure is independent of potential outcomes. Marketing applications typically need covariates that include audience-segment / persona membership, device, geo, prior engagement, and frequency exposure to parallel campaigns; missing any of those that influence both exposure and the outcome introduces residual bias.
Examples
>>> import pandas as pd
>>> from mlsynth import MicroSynth
>>> df = pd.read_csv("user_panel.csv")
>>> res = MicroSynth({
...     "df": df, "outcome": "converted",
...     "treat": "saw_ad", "unitid": "user_id", "time": "week",
...     "covariates": ["age", "device", "prior_engagement",
...                    "country_tier", "gender"],
...     "display_graphs": False,
... }).fit()
>>> res.att
0.052
- fit() MicroSynthResults#
Run the dual-ascent fit and (optionally) bootstrap CI.
Configuration#
- class mlsynth.config_models.MicroSynthConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', covariates: ~typing.List[str], outcome_lag_periods: ~typing.List[~typing.Any] | None = None, standardize_covariates: bool = True, balance_tol: ~typing.Annotated[float, ~annotated_types.Gt(gt=0)] = 0.0001, max_iter: ~typing.Annotated[int, ~annotated_types.Ge(ge=10)] = 500, gtol: ~typing.Annotated[float, ~annotated_types.Gt(gt=0)] = 1e-08, run_inference: bool = True, n_bootstrap: ~typing.Annotated[int, ~annotated_types.Ge(ge=2)] = 500, seed: int = 1400)#
Configuration for the MicroSynth estimator.
Implements Robbins & Davenport (2021, J. Stat. Software), “microsynth: Synthetic Control Methods for Disaggregated and Micro-Level Data in R”. A user-level balancing estimator: solve a constrained QP for non-negative simplex weights on control users that exactly balance covariate moments against the treated group’s moments, then read off the ATT as the weighted-mean outcome difference.
Unlike aggregate-unit SCM estimators in mlsynth, MicroSynth operates at the individual-user level with many treated units and a large donor pool of controls. The dimension of the dual problem is set by the number of balancing constraints (d + 1), not by the number of controls, making it tractable for N_C in the millions on a single machine.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Helper Modules#
Long-DataFrame ingestion for MicroSynth.
Converts a long-format panel (one row per (user, time)) into the
matrices the dual solver needs:
X_T, X_C – treated and control covariate matrices, one row per user.
Y_T, Y_C – post-treatment outcome matrices, one row per user and one column per post-treatment period.
Conventions:
A “treated user” is any unit that has treat = 1 for at least one period (the actual-exposure indicator).
A “control user” has treat = 0 for every period.
The cohort time T0 is the first period where any user has treat = 1. Users with treatment onsets at different times (staggered adoption) are rejected – MicroSynth assumes a single cohort.
Covariates listed in covariates must be time-invariant per user (a single value per user_id). Time-varying features should be collapsed by the caller, or passed via outcome_lag_periods if they’re pre-treatment outcomes.
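The conventions above translate into a handful of pandas group-by checks. The sketch below is illustrative only: classify_users is a hypothetical name, the error messages are made up, and the actual validation inside prepare_microsynth_inputs may differ.

```python
import pandas as pd


def classify_users(df, treat, unitid, time, covariates):
    """Hypothetical sketch of the ingestion conventions; not the mlsynth helper."""
    # Treated = treat == 1 in at least one period; control = treat == 0 throughout.
    ever_treated = df.groupby(unitid)[treat].max().astype(bool)
    # First treated period per treated user.
    onsets = df.loc[df[treat] == 1].groupby(unitid)[time].min()
    if onsets.nunique() > 1:
        raise ValueError("staggered adoption: treatment onsets differ across users")
    T0 = onsets.iloc[0] if len(onsets) else None
    # Each covariate must take a single value per user.
    varying = df.groupby(unitid)[list(covariates)].nunique().gt(1).any()
    bad = list(varying[varying].index)
    if bad:
        raise ValueError(f"covariates vary within user: {bad}")
    treated = ever_treated[ever_treated].index.tolist()
    controls = ever_treated[~ever_treated].index.tolist()
    return treated, controls, T0
```

Running a check like this before fitting gives an early, readable error for the two most common input problems: staggered onsets and covariates that were not collapsed to one value per user.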
- mlsynth.utils.microsynth_helpers.setup.prepare_microsynth_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, covariates: Sequence[str], outcome_lag_periods: Sequence[Any] | None = None, standardize: bool = True) MicroSynthInputs#
Build MicroSynth inputs from a long-format panel.
- Parameters:
df (pd.DataFrame) – Long-format panel: one row per (user, time).
outcome, treat, unitid, time (str) – Column names.
covariates (Sequence[str]) – Columns in df to use as balancing covariates. Each must be time-invariant per user.
outcome_lag_periods (Sequence, optional) – Specific pre-treatment time labels whose outcome values become additional balancing constraints.
standardize (bool) – Z-score covariates across all users before fitting.
- Returns:
MicroSynthInputs
L-BFGS-B dual ascent for the MicroSynth QP.
The primal is:
min_w (1/2) || w - 1/n_C ||^2
s.t. X_C^T w = x_bar_T (d balancing constraints)
1^T w = 1 (sum-to-one)
w >= 0 (non-negativity)
The dual lives in R^{d+1} regardless of how many control units N_C there are – one Lagrange multiplier per balance constraint plus one for the sum-to-one constraint. This is the entire reason MicroSynth scales to millions of control users on a single machine: solving an R^{d+1} convex program (typically d <= 30 in marketing settings) and reading the primal off the KKT relationship in closed form.
The dual objective and its closed-form gradient are derived from the convex conjugate of the primal objective + non-negativity indicator. See Snap KDD 2023 (Lin et al.) Eq. (11)-(15) for the matching formulation in the distributed setting.
- class mlsynth.utils.microsynth_helpers.dual_solver.DualSolverResult(w: 'np.ndarray', dual_lambda: 'np.ndarray', dual_nu: 'float', n_iterations: 'int', converged: 'bool')#
-
- dual_lambda: ndarray#
- w: ndarray#
- mlsynth.utils.microsynth_helpers.dual_solver.solve_microsynth_dual(X_C: ndarray, xbar_T: ndarray, max_iter: int = 500, gtol: float = 1e-08) DualSolverResult#
Solve the MicroSynth dual via L-BFGS-B.
- Parameters:
X_C (np.ndarray) – Control-user covariate matrix, shape (n_C, d).
xbar_T (np.ndarray) – Treated-group covariate mean, shape (d,).
max_iter (int) – L-BFGS-B maximum iterations.
gtol (float) – Gradient tolerance.
- Returns:
DualSolverResult – Primal weights w (shape (n_C,)), dual variables lambda (shape (d,)) and nu (scalar), iteration count, and convergence flag.
Balance diagnostics for MicroSynth.
Functions to assess whether the dual solver’s weights actually achieved covariate balance, how concentrated the weights are, and how many effective control units remain after weighting.
- mlsynth.utils.microsynth_helpers.diagnostics.effective_sample_size(w: ndarray) float#
Effective sample size, 1 / sum(w^2).
Equal weights give ESS = n_C. A degenerate single-user solution gives ESS = 1. Lower ESS means the weighted estimator depends on fewer effective observations.
- mlsynth.utils.microsynth_helpers.diagnostics.feasibility_check(smd_after: ndarray, balance_tol: float) Tuple[bool, str]#
Did every balancing constraint achieve |SMD| < balance_tol?
If not, the treated group’s covariate mean lies outside the convex hull of the controls’ covariate matrix, and no choice of non-negative weights summing to 1 can satisfy all constraints exactly. The QP returns the closest feasible point but the estimator is biased.
- mlsynth.utils.microsynth_helpers.diagnostics.standardized_mean_difference(X_T: ndarray, X_C: ndarray, w: ndarray | None = None) ndarray#
Per-covariate SMD between treated and (optionally weighted) controls.
SMD is defined as (mean_T - mean_C) / pooled_sd where pooled_sd = sqrt((var_T + var_C) / 2). By convention |SMD| < 0.1 is considered balanced.
Paired stratified bootstrap inference for MicroSynth.
Resample treated users and control users separately with replacement,
preserving the original (n_T, n_C) allocation. For each resample,
refit the dual solver and recompute the ATT. The bootstrap
distribution of ATT-estimates yields the CI.
Single-user weight bootstrapping is not used here – it requires re-standardization that complicates inference. Pair-wise resampling on the user blocks is the standard ATT bootstrap and matches the practice in Wang-Zubizarreta (2019) and the original Robbins-Davenport reference implementation.
- mlsynth.utils.microsynth_helpers.inference.paired_bootstrap_ci(X_T: ndarray, X_C: ndarray, Y_T: ndarray, Y_C: ndarray, n_bootstrap: int, seed: int, max_iter: int = 500, gtol: float = 1e-08, ci_level: float = 0.95) Tuple[float, ndarray, ndarray, int]#
Paired stratified bootstrap on (treated, control) blocks.
- Returns:
se (float) – Bootstrap standard error of the ATT.
ci (np.ndarray) – Percentile CI at ci_level, shape (2,).
boot_atts (np.ndarray) – Full bootstrap distribution, shape (n_complete,).
n_complete (int) – Number of bootstrap reps that converged (out of n_bootstrap).
Plot helpers for MicroSynth.
Two diagnostics:
Love plot: per-covariate SMD before and after weighting. The standard balance diagnostic that marketing-science folks recognize from propensity-score work.
Lift trajectory: per-post-period gap with a bootstrap band (only meaningful when T_post > 1).
- mlsynth.utils.microsynth_helpers.plotter.plot_microsynth(results: MicroSynthResults, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | str | dict = False) None#
Render the love plot + (if applicable) the lift trajectory.
Uses matplotlib lazily so that the module imports cleanly even when matplotlib is unavailable.
Typed result containers for MicroSynth.
- class mlsynth.utils.microsynth_helpers.structures.MicroSynthDesign(w: ndarray, dual_lambda: ndarray, dual_nu: float, smd_before: ndarray, smd_after: ndarray, ess: float, max_weight: float, feasible: bool, feasibility_message: str, n_iterations: int, converged: bool)#
Outputs of the dual ascent + balance diagnostics.
- Parameters:
w (np.ndarray) – Control-side weights on the simplex, shape (n_C,). sum(w) == 1, w >= 0.
dual_lambda (np.ndarray) – Lagrange multipliers for the covariate balance constraints, shape (d,).
dual_nu (float) – Lagrange multiplier for the sum-to-one constraint.
smd_before (np.ndarray) – Per-covariate standardized mean difference between treated and unweighted controls, shape (d,).
smd_after (np.ndarray) – Per-covariate SMD after applying w, shape (d,). Should be near zero on every constraint.
ess (float) – Effective sample size of the weighted control group, 1 / sum(w^2).
max_weight (float) – Largest single control-user weight.
feasible (bool) – True if every |smd_after_k| < balance_tol. False signals that the QP did not achieve balance and the treated group may lie outside the convex hull of controls.
feasibility_message (str) – Human-readable diagnostic.
n_iterations (int) – L-BFGS-B iterations to convergence.
converged (bool) – Whether the optimizer reported success.
- dual_lambda: ndarray#
- smd_after: ndarray#
- smd_before: ndarray#
- w: ndarray#
- class mlsynth.utils.microsynth_helpers.structures.MicroSynthInference(method: str, att: float, se: float, ci: ndarray, n_bootstrap: int, bootstrap_atts: ndarray)#
Bootstrap confidence interval and standard error.
- bootstrap_atts: ndarray#
- ci: ndarray#
- class mlsynth.utils.microsynth_helpers.structures.MicroSynthInputs(X_T: ndarray, X_C: ndarray, Y_T: ndarray, Y_C: ndarray, treated_unit_names: Sequence, control_unit_names: Sequence, covariate_names: Sequence, cohort_time: Any, covariate_sd: ndarray | None, outcome: str)#
Pre-processed user-level matrices for MicroSynth.
- Parameters:
X_T (np.ndarray) – Treated-user covariate matrix, shape (n_T, d). Already standardized if standardize_covariates=True.
X_C (np.ndarray) – Control-user covariate matrix, shape (n_C, d). Same standardization as X_T.
Y_T (np.ndarray) – Treated-user post-treatment outcomes, shape (n_T, T_post) where T_post is the number of post-treatment periods. If T_post = 1 this is collapsed to (n_T,).
Y_C (np.ndarray) – Control-user post-treatment outcomes, shape (n_C, T_post) or (n_C,) matching Y_T.
treated_unit_names (Sequence) – Identifiers of the treated users, in row order of X_T.
control_unit_names (Sequence) – Identifiers of the control users, in row order of X_C.
covariate_names (Sequence[str]) – Labels of the balancing constraints in column order of X_T / X_C. Includes both the user-supplied covariates and any outcome_lag_periods columns.
n_T, n_C, d, T_post (int) – Cached shapes.
cohort_time (Any) – The treatment-onset time inferred from df.
covariate_sd (np.ndarray) – Pooled SD used for standardization, shape (d,). None if standardization was disabled.
outcome (str) – Outcome column name.
- X_C: ndarray#
- X_T: ndarray#
- Y_C: ndarray#
- Y_T: ndarray#
- class mlsynth.utils.microsynth_helpers.structures.MicroSynthResults(inputs: MicroSynthInputs, design: MicroSynthDesign, inference: MicroSynthInference, counterfactual: ndarray, gap: ndarray, gap_trajectory: ndarray, att: float, donor_weights: Dict[Any, float])#
Public return container for MicroSynth.fit().
- Parameters:
inputs (MicroSynthInputs) – Pre-processed inputs.
design (MicroSynthDesign) – Weights, dual variables, balance diagnostics.
inference (MicroSynthInference) – Bootstrap CI on the ATT (or method = "none" if disabled).
counterfactual (np.ndarray) – Weighted-control outcomes per post-treatment period, shape matches Y_T.
gap (np.ndarray) – Treated mean minus counterfactual, per post-treatment period. Shape matches Y_T.
gap_trajectory (np.ndarray) – Per-post-period gap, always 1-D (length T_post).
att (float) – Mean of gap_trajectory.
donor_weights (Dict[Any, float]) – {control_user_name: w_i} for all controls with w_i > 0.
- counterfactual: ndarray#
- design: MicroSynthDesign#
- gap: ndarray#
- gap_trajectory: ndarray#
- inference: MicroSynthInference#
- inputs: MicroSynthInputs#
Example: Holdout-Contamination Recovery#
The motivating use case: a randomized holdout was supposed to be clean, but some held-out users were contaminated (got served the ad anyway through other audience segments). Naive ITT (using the assignment column) understates lift; naive TOT (using the impression column without balancing) overstates lift because the ad-bidder cherry-picked engaged users. MicroSynth treats the impression log as the treatment indicator and rebalances:
# Triangulate against ITT and naive TOT to verify the contamination story.
itt = df.query("week > 0").groupby("assigned_exposed")["converted"].mean()
tot = df.query("week > 0").groupby("saw_ad")["converted"].mean()
print(f"  ITT lift       = {itt[1] - itt[0]:+.4f}  (contamination-biased)")
print(f"  Naive TOT      = {tot[1] - tot[0]:+.4f}  (selection-biased)")
print(f"  MicroSynth ATT = {results.att:+.4f}  (causal estimate)")
Simulation Study: Contamination Recovery#
The most informative way to convince yourself the method is doing what it claims is to run it against a data-generating process where you know the ground truth. The script below simulates the randomized-holdout-with-contamination setting end-to-end:
2000 users, randomly assigned 1200/800 to exposed/holdout.
300 of the 800 holdouts get contaminated (saw ads anyway), with contamination biased toward high-engagement, older users — the realistic case where the ad-bidder cherry-picks the same kind of users that would convert at higher baseline rates.
True lift is a constant +5 percentage points on conversion.
Three estimators are computed on the same data:
ITT (assignment-based): biased toward zero by contamination — treats contaminated holdouts as “control” even though they got ads.
Naive TOT (impression-based, no balancing): biased upward by bidder selection — the actually-exposed users are positively selected on covariates that predict conversion.
MicroSynth: takes impressions as the treatment indicator, reweights the clean holdouts to match the actually-exposed group on covariates, and computes the lift on the rebalanced controls.
The triangulation pattern to look for is
ITT < MicroSynth ≈ truth < Naive TOT. The simulation reproduces
this pattern on average across replications.
import numpy as np
import pandas as pd
from scipy.special import expit
from mlsynth import MicroSynth
# ---- Constants ----
N_USERS = 2000
N_ASSIGNED_EXPOSED = 1200
CONTAMINATION_COUNT = 300
TRUE_LIFT = 0.05
N_SIMS = 200
COVS = ["age", "device", "gender", "country_tier", "prior_engagement"]
def simulate_one(rng):
    """Generate one contaminated-holdout panel as a long DataFrame."""
    n = N_USERS
    age = rng.standard_normal(n)
    prior_engagement = rng.standard_normal(n)
    device = rng.binomial(1, 0.4, n).astype(float)
    gender = rng.binomial(1, 0.5, n).astype(float)
    country_tier = rng.standard_normal(n)
    # Conversion propensity under control.
    logit_p0 = (
        -1.5 + 0.30 * age + 0.60 * prior_engagement + 0.20 * device
        - 0.10 * gender + 0.20 * country_tier
    )
    p0 = expit(logit_p0)
    p1 = np.clip(p0 + TRUE_LIFT, 0, 1)
    Y0 = rng.binomial(1, p0)
    Y1 = rng.binomial(1, p1)
    # Randomized assignment.
    perm = rng.permutation(n)
    assigned_exposed = np.zeros(n, dtype=bool)
    assigned_exposed[perm[:N_ASSIGNED_EXPOSED]] = True
    # Non-random contamination: bidder picks up high-engagement,
    # older holdouts via other audience segments.
    holdout_idx = np.where(~assigned_exposed)[0]
    contam_score = expit(
        0.8 * prior_engagement[holdout_idx]
        + 0.5 * age[holdout_idx]
        + 0.4 * country_tier[holdout_idx]
    )
    probs = contam_score / contam_score.sum()
    contam_local = rng.choice(
        len(holdout_idx), size=CONTAMINATION_COUNT,
        replace=False, p=probs,
    )
    saw_ads = assigned_exposed.copy()
    saw_ads[holdout_idx[contam_local]] = True
    Y_obs = np.where(saw_ads, Y1, Y0)
    # Long-form panel: one pre-period (week 0) and one post-period
    # (week 1). Time-invariant covariates broadcast across both.
    rows = []
    for i in range(n):
        base = dict(
            user_id=f"u{i:05d}",
            age=age[i], device=device[i], gender=gender[i],
            country_tier=country_tier[i],
            prior_engagement=prior_engagement[i],
            assigned_exposed=int(assigned_exposed[i]),
        )
        rows.append({**base, "week": 0, "converted": 0, "saw_ad": 0})
        rows.append({
            **base, "week": 1,
            "converted": int(Y_obs[i]),
            "saw_ad": int(saw_ads[i]),
        })
    return pd.DataFrame(rows)
def estimate_itt(post_df):
    grp = post_df.groupby("assigned_exposed")["converted"].mean()
    return grp[1] - grp[0]
def estimate_naive_tot(post_df):
    grp = post_df.groupby("saw_ad")["converted"].mean()
    return grp[1] - grp[0]
# ---- One representative draw with full diagnostics ----
df_demo = simulate_one(np.random.default_rng(42))
post_demo = df_demo[df_demo["week"] == 1]
res = MicroSynth({
    "df": df_demo, "outcome": "converted", "treat": "saw_ad",
    "unitid": "user_id", "time": "week",
    "covariates": COVS,
    "run_inference": True, "n_bootstrap": 200, "seed": 42,
    "display_graphs": False,
}).fit()
itt = estimate_itt(post_demo)
tot = estimate_naive_tot(post_demo)
print(f"TRUE LIFT = {TRUE_LIFT:+.4f}")
print(f"ITT (contaminated) = {itt:+.4f} bias = {itt - TRUE_LIFT:+.4f}")
print(f"Naive TOT (biased) = {tot:+.4f} bias = {tot - TRUE_LIFT:+.4f}")
print(f"MicroSynth = {res.att:+.4f} bias = {res.att - TRUE_LIFT:+.4f}")
print(f" 95% CI = [{res.inference.ci[0]:+.4f}, {res.inference.ci[1]:+.4f}]")
print(f" Feasibility: {res.design.feasibility_message}")
print(f" ESS / n_C = {res.design.ess:.1f} / {len(res.design.w)}")
print(f" max |SMD| after weighting: {abs(res.design.smd_after).max():.2e}")
# ---- Monte Carlo replications ----
itt_vec = np.empty(N_SIMS)
naive_vec = np.empty(N_SIMS)
ms_vec = np.empty(N_SIMS)
rng_mc = np.random.default_rng(7)
for s in range(N_SIMS):
    sim_rng = np.random.default_rng(rng_mc.integers(2**32))
    df_s = simulate_one(sim_rng)
    post_s = df_s[df_s["week"] == 1]
    itt_vec[s] = estimate_itt(post_s)
    naive_vec[s] = estimate_naive_tot(post_s)
    ms_vec[s] = MicroSynth({
        "df": df_s, "outcome": "converted", "treat": "saw_ad",
        "unitid": "user_id", "time": "week",
        "covariates": COVS,
        "run_inference": False, "display_graphs": False,
    }).fit().att
def summarize(vec, name):
    bias = vec.mean() - TRUE_LIFT
    sd = vec.std(ddof=1)
    rmse = np.sqrt(((vec - TRUE_LIFT) ** 2).mean())
    print(f" {name:<15} mean = {vec.mean():+.4f} "
          f"bias = {bias:+.4f} SD = {sd:.4f} RMSE = {rmse:.4f}")
print()
print(f"Monte Carlo, {N_SIMS} replications:")
print(f" TRUE LIFT = {TRUE_LIFT:+.4f}")
summarize(itt_vec, "ITT")
summarize(naive_vec, "Naive TOT")
summarize(ms_vec, "MicroSynth")
Expected output (seed-dependent, but the pattern is stable):
TRUE LIFT = +0.0500
ITT (contaminated) = +0.0342 bias = -0.0158
Naive TOT (biased) = +0.0893 bias = +0.0393
MicroSynth = +0.0410 bias = -0.0090
95% CI = [-0.0033, +0.0949]
Feasibility: Balance achieved (max |SMD| = 2.32e-05 < tol = 1.00e-04).
ESS / n_C = 417.1 / 500
max |SMD| after weighting: 2.32e-05
Monte Carlo, 200 replications:
TRUE LIFT = +0.0500
ITT mean = +0.0319 bias = -0.0181 SD = 0.0211 RMSE = 0.0277
Naive TOT mean = +0.0791 bias = +0.0291 SD = 0.0198 RMSE = 0.0351
MicroSynth mean = +0.0528 bias = +0.0028 SD = 0.0203 RMSE = 0.0204
Across 200 replications, MicroSynth recovers the true lift with a bias of about 0.3 percentage points (+0.0028), while ITT and Naive TOT carry biases of -1.8pp and +2.9pp respectively, in opposite directions. MicroSynth’s RMSE is also the lowest (0.0204 versus 0.0277 and 0.0351): its variance is comparable to ITT’s, so removing the bias translates directly into smaller total error. In the single-draw diagnostic, every standardized mean difference is driven to about 2e-5 after weighting, below the 1e-4 tolerance, so the balance constraints are satisfied. The effective sample size is 417 of the 500 clean holdouts (about 83%), which indicates little weight concentration.
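The two diagnostics quoted above can be reproduced by hand from any weight vector. The sketch below is a minimal illustration, not mlsynth’s implementation: it assumes an entropy-balancing objective in the style of Hainmueller (2012), solved by Newton’s method on the dual, a Kish effective sample size, and SMDs scaled by the treated-group standard deviation. The helper names (softmax_weights, dual_objective, entropy_balance, kish_ess, smd) and the toy data are hypothetical; the library’s own definitions of ess and smd_after may differ.

```python
import numpy as np

def softmax_weights(Z, lam):
    """Exponential-tilt weights, w_i proportional to exp(Z[i] @ lam), summing to 1."""
    eta = Z @ lam
    w = np.exp(eta - eta.max())          # subtract the max for numerical stability
    return w / w.sum()

def dual_objective(Z, lam):
    """Log-sum-exp dual; its gradient is the weighted mean of the rows of Z."""
    eta = Z @ lam
    m = eta.max()
    return m + np.log(np.exp(eta - m).sum())

def entropy_balance(X_c, target, max_iter=100, tol=1e-10):
    """Reweight control rows so their weighted covariate means equal `target`."""
    Z = X_c - target                     # balance <=> weighted mean of Z is zero
    lam = np.zeros(Z.shape[1])           # one dual variable per covariate
    for _ in range(max_iter):
        w = softmax_weights(Z, lam)
        grad = Z.T @ w                   # remaining imbalance
        if np.abs(grad).max() < tol:
            break
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
        step = np.linalg.solve(hess, grad)
        t, f0 = 1.0, dual_objective(Z, lam)
        while dual_objective(Z, lam - t * step) > f0 and t > 1e-8:
            t *= 0.5                     # backtrack until the dual decreases
        lam = lam - t * step
    return softmax_weights(Z, lam)

def kish_ess(w):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return w.sum() ** 2 / (w ** 2).sum()

def smd(X_t, X_c, w):
    """Standardized mean difference per covariate, scaled by the treated SD."""
    return (X_t.mean(axis=0) - X_c.T @ (w / w.sum())) / X_t.std(axis=0, ddof=1)

# Toy data: 1500 exposed users shifted relative to 500 clean holdouts.
rng = np.random.default_rng(0)
X_t = rng.standard_normal((1500, 3)) + np.array([0.3, 0.2, 0.1])
X_c = rng.standard_normal((500, 3))
uniform = np.full(len(X_c), 1 / len(X_c))
w = entropy_balance(X_c, X_t.mean(axis=0))

print("max |SMD| before:", np.abs(smd(X_t, X_c, uniform)).max())
print("max |SMD| after: ", np.abs(smd(X_t, X_c, w)).max())
print("ESS / n_C:", kish_ess(w), "/", len(X_c))
```

Uniform weights give an ESS equal to the number of controls; the further the ESS falls below that, the more the estimate leans on a few heavily weighted users.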
When Balancing Is Not Enough: ITT vs. As-Treated vs. CACE#
The study above is the happy case: contamination is selected on observed covariates, so balancing on them removes the bias. Reality is rarely so kind. Suppose the thing that makes a held-out user see the ad anyway – latent purchase intent, in-market status – is unobserved, and that same intent also lifts sales. Now the actually-exposed users are positively selected on a confounder you cannot put in the balancing constraint, and reweighting on age / income only removes the slice of that selection the covariates happen to explain. No amount of balancing recovers the truth from an as-treated comparison, because the bias lives in a variable the method never sees.
The decisive move is not to regroup users by what they received (exposed vs. not) – that is exactly what reintroduces the selection. Keep users in their randomized arm and let MicroSynth balance for precision, then either report the intent-to-treat (ITT) effect or divide it by the compliance gap (the difference in exposure rates between the two arms) to recover the per-exposure effect. The latter is a covariate-balanced Wald ratio, which estimates the complier average causal effect (CACE).
The helper mlsynth.utils.microsynth_helpers.simulate_ad_holdout()
generates exactly this DGP – randomized assignment, holdout leakage
selected on latent intent, and an unobserved confounder in the sales
equation – and encodes treatment two ways: D_itt (assigned arm)
and D_att (actually exposed).
from mlsynth import MicroSynth
from mlsynth.utils.microsynth_helpers import simulate_ad_holdout
df, truth = simulate_ad_holdout(n_per_arm=8000, delta=1.0, seed=1)
gap = truth["compliance_gap"]
def att(treat_col):
    return MicroSynth({
        "df": df, "outcome": "sales", "treat": treat_col,
        "unitid": "user_id", "time": "time",
        "covariates": ["age", "income"],
        "run_inference": False, "display_graphs": False,
    }).fit().att
as_treated = att("D_att") # regroup by exposure -- the WRONG move
itt = att("D_itt") # randomized arms -- correct ITT
cace = itt / gap # per-exposure -- covariate-balanced Wald
print(f"true per-exposure delta = {truth['delta_per_exposure']:.3f}")
print(f"true ITT effect = {truth['itt_effect']:.3f}")
print(f"as-treated ATT = {as_treated:.3f} (biased: balancing "
f"cannot remove unobserved intent)")
print(f"ITT ATT = {itt:.3f} (~ true ITT effect)")
print(f"CACE = ITT / gap = {cace:.3f} (~ true per-exposure delta)")
Representative output:
true per-exposure delta = 1.000
true ITT effect = 0.779
as-treated ATT = 1.286 (biased: balancing cannot remove unobserved intent)
ITT ATT = 0.806 (~ true ITT effect)
CACE = ITT / gap = 1.035 (~ true per-exposure delta)
The as-treated estimate overstates the per-exposure effect by ~29%
even though balancing drives every standardized mean difference on
age and income below 1e-3 – the leftover bias is the unobserved
intent. ITT lands on the diluted campaign effect, and the Wald ratio
recovers the per-exposure effect while never breaking randomization.
The lesson is the boundary of the method: MicroSynth removes
imbalance on the covariates you give it; it is the estimand (ITT,
CACE), not the balancing, that handles non-compliance and unobserved
selection.
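The arithmetic behind the Wald ratio can be checked without the library. Writing \(\hat p_z\) for the observed exposure rate in arm \(z\), the estimator is \(\widehat{\mathrm{CACE}} = \widehat{\mathrm{ITT}} / (\hat p_1 - \hat p_0)\). The sketch below is a NumPy-only illustration under assumed settings: the data-generating process, the variable names, and the diff_in_means helper are hypothetical and are not those of simulate_ad_holdout(), and the covariate-balancing step is left out, since here it would only tighten the ITT estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
delta = 1.0                                    # true per-exposure effect (assumed)

z = rng.integers(0, 2, n)                      # randomized arm: 1 = exposed, 0 = holdout
intent = rng.standard_normal(n)                # latent purchase intent, never observed
leak_prob = 1 / (1 + np.exp(-(-1.5 + 1.5 * intent)))
leaked = rng.random(n) < leak_prob             # high-intent holdouts leak more often
d = np.where(z == 1, 1, leaked.astype(int))    # actual exposure
sales = delta * d + intent + rng.standard_normal(n)

def diff_in_means(y, g):
    """Mean of y where g == 1 minus mean of y where g == 0."""
    return y[g == 1].mean() - y[g == 0].mean()

as_treated = diff_in_means(sales, d)           # regrouped by exposure: confounded by intent
itt = diff_in_means(sales, z)                  # randomized arms: diluted but unconfounded
gap = diff_in_means(d, z)                      # compliance gap, p_hat_1 - p_hat_0
cace = itt / gap                               # Wald ratio

print(f"as-treated = {as_treated:.3f}")
print(f"ITT        = {itt:.3f}")
print(f"gap        = {gap:.3f}")
print(f"CACE       = {cace:.3f}  (target: {delta:.3f})")
```

Because the per-exposure effect is constant in this toy DGP, the Wald ratio targets delta itself, while the as-treated contrast absorbs the intent difference between exposed and unexposed users.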
References#
Robbins, M.W., & Davenport, S. (2021). “microsynth: Synthetic Control Methods for Disaggregated and Micro-Level Data in R.” Journal of Statistical Software 97(2):1-31.
Robbins, M.W., Saunders, J., & Kilmer, B. (2017). “A Framework for Synthetic Control Methods With High-Dimensional, Micro-Level Data: Evaluating a Neighborhood-Specific Crime Intervention.” Journal of the American Statistical Association 112(517):109-126.
Hainmueller, J. (2012). “Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies.” Political Analysis 20(1):25-46.
Lin, S., Xu, M., Zhang, X., Chao, S.-K., Huang, Y.-K., & Shi, X. (2023). “Balancing Approach for Causal Inference at Scale.” In Proceedings of KDD ’23, 4485-4496. (Distributed-computing implementation for large-scale settings.)