GeoLift Market Selection (GEOLIFT)#

Overview#

Most estimators in mlsynth are retrospective: a treatment has happened and we want its effect. GEOLIFT is prospective — a tool for synthetic experimental design in geo-experiments. Before any ad spend, it answers:

Which markets should be treated, for how long, so that a real lift would be detectable?

It is a faithful port of Meta’s GeoLift market-selection routine onto the mlsynth Augmented-SCM machinery (Ben-Michael, Feller & Rothstein, 2021 [BMFR2021]), with conformal inference (Chernozhukov, Wüthrich & Zhu, 2021 [CWZ2021]) and the standardized design/effect result contract. Reach for it when test markets must be chosen up front, you want a minimum detectable effect (MDE) and power per candidate region plus the deployable synthetic control, you may need to force markets in or out, and — once the experiment runs — you want to realize the chosen design into an effect report.

Mathematical Formulation#

Setup and notation#

There are $N$ markets $\mathcal{N} \coloneqq \{1, \dots, N\}$ and $T$ periods $t \in \mathcal{T} \coloneqq \{1, \dots, T\}$. The design uses only the pre-treatment window $\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}$ of length $T_0$; if post-treatment data exist they are sliced off (see Pre/post split). The outcome of market $j$ at time $t$ is $y_{jt}$, with market series $\mathbf{y}_j = (y_{j1}, \dots, y_{jT})^\top \in \mathbb{R}^{T}$.

A candidate test region is a set $\mathcal{S} \subseteq \mathcal{N}$ of $k \coloneqq |\mathcal{S}|$ markets (the treatment_size). It plays the role of the canonical treated unit through its aggregate series

\[\mathbf{y}^{\mathcal{S}}, \qquad y^{\mathcal{S}}_t \coloneqq \operatorname{agg}_{j \in \mathcal{S}} y_{jt}, \qquad \operatorname{agg} \in \Bigl\{\textstyle\sum,\ \operatorname{mean}\Bigr\}.\]

The donor pool is every other market, $\mathcal{N}_0(\mathcal{S}) \coloneqq \mathcal{N} \setminus \mathcal{S}$ with $N_0 \coloneqq N - k$, giving the donor matrix $\mathbf{Y}_0^{\mathcal{S}} \coloneqq [\mathbf{y}_j]_{j \in \mathcal{N}_0(\mathcal{S})} \in \mathbb{R}^{T \times N_0}$. The sum aggregate is GeoLift’s default (the right object for total spend/lift); the mean keeps $\mathbf{y}^{\mathcal{S}}$ at donor scale — inside the donor convex hull — for a better-posed fit.

Stage 1 — Candidate nomination#

Enumerating all $\binom{N}{k}$ regions is intractable, so GeoLift nominates a tractable shortlist by correlation similarity. On the pre-period, form the Pearson correlation matrix $\mathbf{P} = [\rho_{ij}] \in \mathbb{R}^{N \times N}$,

\[\rho_{ij} \coloneqq \frac{\sum_{t \in \mathcal{T}_1}(y_{it} - \bar{y}_i)(y_{jt} - \bar{y}_j)} {\sqrt{\sum_{t \in \mathcal{T}_1}(y_{it} - \bar{y}_i)^2}\; \sqrt{\sum_{t \in \mathcal{T}_1}(y_{jt} - \bar{y}_j)^2}}, \qquad \bar{y}_i \coloneqq T_0^{-1}\!\!\sum_{t \in \mathcal{T}_1} y_{it}.\]

For each anchor $i$, let $\pi_i$ order the other markets by descending correlation, $\rho_{i,\pi_i(1)} \ge \rho_{i,\pi_i(2)} \ge \dots \ge \rho_{i,\pi_i(N-1)}$. The deterministic nominee anchored at $i$ is that market plus its $k-1$ nearest neighbours,

\[\mathcal{S}_i \coloneqq \{i\} \cup \bigl\{\pi_i(1), \dots, \pi_i(k-1)\bigr\},\]

and the shortlist is $\{\mathcal{S}_i\}_{i \in \mathcal{N}}$ deduplicated — $N$ candidates instead of $\binom{N}{k}$. The stochastic (“paired-jitter”) variant replaces ranks $1, \dots, k-1$ by one draw from each adjacent pair $\{1,2\}, \{3,4\}, \dots$, exploring near-rank neighbours (run_stochastic; stochastic_mode="global" is faithful to GeoLift, "per_anchor" draws independently per anchor).

Forcing constraints. Given a forced-in set $\mathcal{S}_{\mathrm{in}}$ (to_be_treated) and a forbidden set $\mathcal{S}_{\mathrm{out}}$ (not_to_be_treated, $\mathcal{S}_{\mathrm{in}} \cap \mathcal{S}_{\mathrm{out}} = \varnothing$), nominees are drawn from the free pool $\mathcal{F} \coloneqq \mathcal{N} \setminus (\mathcal{S}_{\mathrm{in}} \cup \mathcal{S}_{\mathrm{out}})$ at size $k - |\mathcal{S}_{\mathrm{in}}|$ and unioned with the forced-in set, so every candidate satisfies $\mathcal{S}_{\mathrm{in}} \subseteq \mathcal{S}$ and $\mathcal{S} \cap \mathcal{S}_{\mathrm{out}} = \varnothing$.

Stage 2 — The synthetic control#

For a candidate $\mathcal{S}$, the counterfactual is a weighted donor combination. The default is the Augmented SCM [BMFR2021]. Each period is first centred by the donor mean $\mu_t \coloneqq N_0^{-1}\sum_{j \in \mathcal{N}_0}\! y_{jt}$ (the augsynth intercept), giving $\widetilde{\mathbf{y}}^{\mathcal{S}} = \mathbf{y}^{\mathcal{S}} - \boldsymbol{\mu}$ and $\widetilde{\mathbf{Y}}_0 = \mathbf{Y}_0 - \boldsymbol{\mu}\mathbf{1}^\top$. A base simplex SCM is solved on the pre-period,

\[\mathbf{w}^{\mathrm{scm}} \in \operatorname*{argmin}_{\mathbf{w} \in \Delta^{N_0}} \bigl\| \widetilde{\mathbf{y}}^{\mathcal{S}}_{\mathcal{T}_1} - \widetilde{\mathbf{Y}}_{0,\mathcal{T}_1}\mathbf{w} \bigr\|_2^2, \quad \Delta^{N_0} \coloneqq \{\mathbf{w} \in \mathbb{R}_{\ge 0}^{N_0} : \|\mathbf{w}\|_1 = 1\},\]

then ridge-augmented to close the residual pre-period imbalance,

\[\mathbf{w}^\ast = \mathbf{w}^{\mathrm{scm}} + \widetilde{\mathbf{Y}}_{0,\mathcal{T}_1}^{\top} \bigl(\widetilde{\mathbf{Y}}_{0,\mathcal{T}_1}\widetilde{\mathbf{Y}}_{0,\mathcal{T}_1}^{\top} + \lambda \mathbf{I}\bigr)^{+} \bigl(\widetilde{\mathbf{y}}^{\mathcal{S}}_{\mathcal{T}_1} - \widetilde{\mathbf{Y}}_{0,\mathcal{T}_1}\mathbf{w}^{\mathrm{scm}}\bigr),\]

with the penalty $\lambda$ chosen by leave-one-period-out cross-validation. The counterfactual and gap follow the canon,

\[\widehat{y}^{\mathcal{S}}_t \coloneqq \bigl(\mathbf{Y}_0\mathbf{w}^\ast\bigr)_t, \qquad \tau_t \coloneqq y^{\mathcal{S}}_t - \widehat{y}^{\mathcal{S}}_t, \qquad \widehat{\tau} \coloneqq |\mathcal{T}_2|^{-1}\!\!\sum_{t \in \mathcal{T}_2}\! \tau_t .\]

The augment=None variant is the plain simplex SCM with an explicit intercept $\alpha = \operatorname{mean}_{\mathcal{T}_1}(\mathbf{y}^{\mathcal{S}} - \mathbf{Y}_0\mathbf{w}^\ast)$, predicting $\widehat{y}^{\mathcal{S}}_t = \alpha + (\mathbf{Y}_0\mathbf{w}^\ast)_t$.

Pre-fit quality is the scaled L2 imbalance — the fitted pre-period imbalance relative to the imbalance of uniform donor weights $\mathbf{w}^{\mathrm{unif}} \coloneqq N_0^{-1}\mathbf{1}$,

\[\kappa(\mathcal{S}) \coloneqq \frac{\bigl\|\mathbf{Y}_{0,\mathcal{T}_1}\mathbf{w}^\ast - \mathbf{y}^{\mathcal{S}}_{\mathcal{T}_1}\bigr\|_2} {\bigl\|\mathbf{Y}_{0,\mathcal{T}_1}\mathbf{w}^{\mathrm{unif}} - \mathbf{y}^{\mathcal{S}}_{\mathcal{T}_1}\bigr\|_2} \;\in\; [0, \infty),\]

so $\kappa = 0$ is a perfect fit and $\kappa = 1$ is no better than the donor average. It is unitless (hence comparable across regions of different magnitudes) and is reported per candidate.

Stage 3 — Power simulation#

Power is estimated by placebo-in-time experiments carved from the end of the pre-period. For a treatment duration $\ell$ (a durations entry) and a lookback placement $s \in \{1, \dots, L\}$ (with $L =$ lookback_window), the pseudo-treatment window is the $\ell$ periods ending $s - 1$ before $T_0$,

\[\mathcal{T}^{(s,\ell)}_{2} \coloneqq \{\,T_0 - \ell - s + 2,\ \dots,\ T_0 - s + 1\,\}, \qquad \mathcal{T}^{(s,\ell)}_{1} \coloneqq \{1, \dots, T_0 - \ell - s + 1\},\]

faithful to GeoLift’s max_time - tp - sim + 2 with $T_0$ the pre-period end. The SCM is fit on $\mathcal{T}^{(s,\ell)}_{1}$. For an effect size $\delta$ (an effect_sizes entry) a known multiplicative lift is injected on the pseudo-post block, $y^{\mathcal{S},(\delta)}_t = (1+\delta)\,y^{\mathcal{S}}_t$ for $t \in \mathcal{T}^{(s,\ell)}_2$, which shifts the gap by $\delta\,y^{\mathcal{S}}_t$ there.

Detection uses the conformal test [CWZ2021]. With the post-block statistic

\[S_q(\boldsymbol{\tau}) \coloneqq \Bigl( |\mathcal{T}_2|^{-1/2} \textstyle\sum_{t \in \mathcal{T}_2} |\tau_t|^{q} \Bigr)^{1/q} \qquad (q = 1 \text{ by default}),\]

the joint-null p-value compares the observed statistic to $n_s$ i.i.d. permutations $\Pi$ of the residual path,

\[p \coloneqq \frac{1}{n_s}\sum_{\Pi} \mathbf{1}\!\bigl\{\, S_q(\boldsymbol{\tau}_{\mathcal{T}_2}) \le S_q\bigl((\Pi\boldsymbol{\tau})_{\mathcal{T}_2}\bigr) \,\bigr\},\]

and an effect is detected when $p < \alpha$. The permutation set $\Pi$ follows conformal_type: "iid" (the augsynth/GeoLift default — $n_s$ independent draws) or "block" (the $T$ moving-block cyclic shifts $\Pi_k(t) = ((t + k) \bmod T)$, which preserve serial dependence and are deterministic, ignoring $n_s$). Power is the detection rate across the $L$ lookback placements,

\[\beta(\mathcal{S}, \ell, \delta) \coloneqq \frac{1}{L}\sum_{s=1}^{L} \mathbf{1}\!\bigl\{\, p^{(s)}(\mathcal{S}, \ell, \delta) < \alpha \,\bigr\}.\]

Fit-once, sweep-$\delta$ (an exact optimization)

The injection touches only the post block, so the pre-period the cross-validation sees is identical across effect sizes; the CV-selected $\lambda$ is therefore the same for every $\delta$. mlsynth cross-validates once per $(\mathcal{S}, \ell, s)$ and reuses $\lambda$ across $\delta$ (augsynth’s own behaviour). This is provably identical to GeoLift’s per-$\delta$ refit — pinned by test_simulate_lookback_cv_once_equals_per_es_refit — at $1/|\{\delta\}|$ the cross-validation cost.

Stage 4 — MDE and the composite rank#

The minimum detectable effect for a region/duration is the smallest-magnitude effect whose power clears the threshold $\beta_0$ (power_threshold, default $0.8$),

\[\delta^\ast(\mathcal{S}, \ell) \coloneqq \operatorname*{arg\,min}_{\delta \,:\, \beta(\mathcal{S},\ell,\delta) \ge \beta_0} |\delta|,\]

with GeoLift’s signed positive/negative tie rule. Writing $\widehat{\delta}$ for the recovered lift at $\delta^\ast$ and the recovery error $\eta(\mathcal{S},\ell) \coloneqq |\widehat{\delta} - \delta^\ast|$, the composite rank is the mean of three dense ranks ($\operatorname{dr}$ over the surviving candidates), faithful to GeoLift,

\[r(\mathcal{S}, \ell) = \operatorname{rank}\!\left(\tfrac13\Bigl[ \operatorname{dr}\!\bigl(|\delta^\ast|\bigr) + \operatorname{dr}\!\bigl(\beta\bigr) + \operatorname{dr}\!\bigl(\eta\bigr) \Bigr]\right), \qquad \text{(lower is better).}\]

Note

Two GeoLift-fidelity quirks, replicated as-is: $\operatorname{dr}(\beta)$ is ascending (an MDE whose power sits just above $\beta_0$ is a tighter estimate of the threshold, so it ranks better), and the scaled L2 imbalance $\kappa$ is not a ranking term — only $\delta^\ast$, $\beta$, and $\eta$ enter. Both are documented and one line to change.

Identifying assumptions#

Pre-period synthesizability. The aggregate $\mathbf{y}^{\mathcal{S}}$ lies in (or near) the span/convex hull of the donor pool over $\mathcal{T}_1$. Quantified by $\kappa(\mathcal{S})$: a low value certifies that the synthetic tracks the region, the prerequisite for a credible counterfactual.

Remark. With how="sum" the target is $k\times$ donor scale and can sit outside the convex hull, inflating $\kappa$ toward 1; how="mean" restores synthesizability. The choice is the user’s.
Exchangeability under the null. The conformal test treats the residual path as exchangeable under $H_0$: no effect, which the all-period refit underlying $p$ is designed to deliver [CWZ2021].
Stationarity of the placebo windows. Power from the lookback placements transports to the real experiment only if the pre-period dynamics resemble the experiment window — the usual SC stability assumption.

Budget planning (CPIC)#

Setting cpic (cost per incremental conversion) turns each candidate’s MDE into a spend, faithful to GeoLiftMarketSelection:

\[\mathrm{investment} \;=\; \mathrm{cpic} \times \delta \times \sum_{i \in \mathcal{S}} \sum_{t \in \mathcal{W}} Y_{it},\]

i.e. cost-per-incremental $\times$ effect size $\times$ the summed treated volume over the (lookback) window — the baseline outcome, on the total scale, independent of the mean-of-units fit. The shortlist carries an investment column; supplying budget drops candidates whose detectable investment exceeds it (GeoLift’s abs(budget) > abs(Investment) gate). The realized report adds the post-test cost $= \mathrm{cpic} \times$ incremental outcome. The investment is a deterministic data transform, so it matches GeoLiftMarketSelection to the cent (durable case geolift_cpic); ROI (a value/margin per conversion) is a planned extension beyond GeoLift’s cost-only cpic.

Inference and the realized design#

The design phase reports $\delta^\ast$ and $\beta$ per candidate. When a post_col leaves a post window, GEOLIFT.fit() realizes the winning design under the hood — applying the winner’s pre-period weights $\mathbf{w}^\ast$ to the full panel and running conformal inference: the per-period effect $\tau_t$ for $t \in \mathcal{T}_2$, prediction intervals by test inversion (a grid of nulls $\tau_0$, the interval being the non-rejected range at level $\alpha$), and the joint-null p-value $p$ — exposed on result.report (the DesignResult resolving to its EffectResult). A design over a no-effect post window returns $p$ non-significant and intervals covering zero.

Pipeline and Options#

The estimator is a thin front door over the helper pipeline; each stage above is a tested leaf in mlsynth/utils/geolift_helpers/. The public surface is a single GEOLIFT.fit() — realization and plotting are handled inside it, driven by the data and config (a post_col triggers realization; display_graphs triggers the plot).

treatment_size $k$, durations $\{\ell\}$, effect_sizes $\{\delta\}$, lookback_window $L$.
to_be_treated / not_to_be_treated — $\mathcal{S}_{\mathrm{in}}$ / $\mathcal{S}_{\mathrm{out}}$.
post_col — a 0/1 column marking post-treatment periods; the design slices to $\mathcal{T}_1$, so it is identical whether you pass the full post-treatment panel or a pre-only one (the “rerun after treatment” invariance). Different post lengths simply change $T_0$.
how ($\operatorname{sum}$ / $\operatorname{mean}$), augment ("ridge" / None), alpha $\alpha$, power_threshold $\beta_0$, ns $n_s$, run_stochastic / stochastic_mode.
conformal_type — the conformal permutation scheme, "iid" (default, matching GeoLift) or "block" (moving-block cyclic shifts for serially-dependent residuals; GeoLift’s conformal_type = "block" option).

Scanning several durations yields an MDE per duration (”$\ell = 7$ detects 10%, but $\ell = 14$ is needed for 5%”):

res = GEOLIFT({..., "durations": [7, 14, 21]}).fit()
res.power[["candidate", "duration", "mde", "power"]]   # one row per (S, l)

Plotting#

With display_graphs (default True), GEOLIFT.fit() plots the recommended design in the mlsynth house style (mlsynth.utils.plotting.mlsynth_style()): the design phase shows $\mathbf{y}^{\mathcal{S}}$ vs $\widehat{\mathbf{y}}^{\mathcal{S}}$ over $\mathcal{T}_1$; the post phase (when the design was realized) adds the conformal band and the per-period gap $\tau_t$ over $\mathcal{T}_2$, with the intervention line at $T_0$. The standalone helper mlsynth.utils.geolift_helpers.marketselect.plotter.plot_design() re-draws from a result on demand.

Example: GeoLift’s 40-Market Panel#

The package ships GeoLift’s example panel (basedata/geolift_market_data.csv): $N = 40$ markets over $T = 90$ days. We design a $k = 3$ test region, then realize it over a 10-day no-effect post window (so the realized effect should be null).

import pandas as pd
from mlsynth import GEOLIFT

df = pd.read_csv(
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
    "refs/heads/main/basedata/geolift_market_data.csv"
)
df["post"] = df["date"].isin(sorted(df["date"].unique())[-10:]).astype(int)

geo = GEOLIFT({
    "df": df, "outcome": "Y", "unitid": "location", "time": "date",
    "post_col": "post",                 # design on the pre-80, post-10 reserved
    "treatment_size": 3, "durations": [14], "effect_sizes": [0.0, 0.1, 0.2],
    "lookback_window": 3, "how": "mean", "augment": "ridge", "ns": 100,
})
res = geo.fit()              # designs, auto-realizes (post window) and plots
print(res.selected_units, res.search.winner.mde, res.search.winner.power)
print("joint conformal p:", round(res.report.inference.p_value, 3))   # ~0.66 -> null

Without a post_col the same call returns a design-only result (res.report is None); set display_graphs=False to suppress the plot.

The synthetic tracks the test markets, $\widehat{\tau} \approx 0$, and the joint conformal p-value is far from significant — the correct null over a placebo post period.

Every option at once — the full market-selection call#

GeoLift’s GeoLiftMarketSelection exposes the whole design surface in one call:

MarketSelections <- GeoLiftMarketSelection(data = GeoTestData_PreTest,
    treatment_periods = c(10, 15),
    N = c(2, 3, 4, 5),
    Y_id = "Y", location_id = "location", time_id = "time",
    effect_size = seq(0, 0.2, 0.05),
    lookback_window = 1,
    include_markets = c("chicago"),
    exclude_markets = c("honolulu"),
    cpic = 7.50, budget = 100000,
    alpha = 0.1, Correlations = TRUE,
    fixed_effects = TRUE, side_of_test = "two_sided")

Every argument maps onto a GEOLIFT config field:

import pandas as pd
from mlsynth import GEOLIFT

df = pd.read_csv(                                      # GeoLift_PreTest, 40 mkts x 90d
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
    "refs/heads/main/basedata/geolift_market_data.csv"
)

cfg = {
    "df": df, "outcome": "Y", "unitid": "location", "time": "date",
    "durations": [10, 15],                       # treatment_periods
    "effect_sizes": [0.0, 0.05, 0.10, 0.15, 0.20],
    "lookback_window": 1,
    "to_be_treated": ["chicago"],                # include_markets
    "not_to_be_treated": ["honolulu"],           # exclude_markets
    "cpic": 7.50, "budget": 100000.0,            # budget planning
    "alpha": 0.1,
    "fixed_effects": True,                       # GeoLift default
    "how": "mean", "augment": "ridge",
    "conformal_type": "iid",                     # two-sided |stat| permutation
    "display_graphs": False,
}
# N = c(2,3,4,5): GEOLIFT takes one treatment_size; scan by looping it.
shortlist = pd.concat(
    GEOLIFT({**cfg, "treatment_size": k}).fit().search.shortlist for k in (2, 3, 4, 5)
).sort_values("rank")
shortlist[["candidate", "duration", "mde", "power", "investment", "scaled_l2", "rank"]]

A few argument notes: Correlations=TRUE is GeoLift’s correlation ranking, which mlsynth always uses to nominate candidates (Stage 1); side_of_test = "two_sided" is the default conformal statistic (the symmetric $|x|^q$ norm); cpic + budget drop candidates whose detectable investment busts the budget (see Budget planning above).

The recommended designs (GeoLift’s BestMarkets table; mlsynth’s shortlist carries the same candidates and reproduces Investment to the cent, see GEOLIFT — Meta’s GeoLift walkthrough (augsynth cross-validation)):

Test markets	Dur	Investment	AvgATT	L2	Rank
chicago, cincinnati, houston, portland	15	74,118.38	159.36	0.197	1
chicago, portland	15	64,563.75	290.01	0.174	1
chicago, cincinnati, houston, portland	10	99,027.75	316.62	0.197	3
chicago, portland	10	43,646.25	300.94	0.168	3
chicago, houston, portland	10	75,389.25	350.31	0.231	5
chicago, cincinnati, houston, nashville, san d.	15	95,755.50	146.80	0.270	6
atlanta, chicago	15	81,348.75	336.78	0.446	7
atlanta, chicago, cleveland, las vegas	15	86,661.75	220.82	0.532	7

(28 designs total; the maximum surviving investment is $99,321.75 < the $100k budget, so the budget gate held.) Lower Rank is better; the smallest, lowest- imbalance region that clears power wins — here chicago, portland.

Reading the results — plots, tables, and weights#

Everything the design and the realized report produce lives on the result object. The complete, runnable example below loads the data, realizes a design with a budget (cpic), and then draws every view — the power / MDE / budget table, observed vs synthetic, the effect with its conformal band, the donor weights, the realized cost, and the built-in plot:

import pandas as pd
import matplotlib.pyplot as plt
from mlsynth import GEOLIFT
from mlsynth.utils.geolift_helpers.marketselect.plotter import plot_design

url = ("https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
       "refs/heads/main/basedata/geolift_test_data.csv")
df = pd.read_csv(url)                                   # GeoLift_Test: 40 mkts x 105 days
dates = sorted(df["date"].unique())
df["post"] = df["date"].isin(dates[90:]).astype(int)   # last 15 days = treatment window

res = GEOLIFT({
    "df": df, "outcome": "Y", "unitid": "location", "time": "date",
    "treatment_size": 2, "to_be_treated": ["chicago", "portland"],
    "durations": [15], "effect_sizes": [0.0, 0.10], "post_col": "post",
    "cpic": 7.50, "how": "mean", "fixed_effects": True, "display_graphs": False,
}).fit()

# headline + the power / MDE / budget shortlist (one row per candidate x duration)
print(res.selected_units, res.report.effects.att, res.report.inference.p_value)
print(res.power[["candidate", "duration", "mde", "power", "investment"]])
print(res.report.weights.summary_stats["cost"])        # realized spend = cpic x incremental

# observed vs synthetic
ts = res.report.time_series
plt.plot(ts.time_periods, ts.observed_outcome, "k", label="observed")
plt.plot(ts.time_periods, ts.counterfactual_outcome, "r--", label="synthetic")
plt.axvline(ts.intervention_time, color="grey", ls=":"); plt.legend(); plt.show()

# the effect (gap) with its conformal prediction band
d = res.report.inference.details
plt.plot(ts.time_periods, ts.estimated_gap, "k")
plt.fill_between(d["periods"], d["lower"], d["upper"], alpha=0.3)
plt.axhline(0, color="grey", ls=":"); plt.show()

# donor weights, biggest contributors
w = res.report.weights.donor_weights
top = dict(sorted(w.items(), key=lambda kv: -abs(kv[1]))[:10])
plt.bar(top.keys(), top.values()); plt.xticks(rotation=45, ha="right"); plt.show()

# the built-in plot (design + realized phases: conformal band + gap)
plot_design(res, report=res.report, show=True)

For the full power-vs-effect-size curve of a region (GeoLift’s GeoLiftPower plot) — power rising through the threshold marks the MDE — run the scoring helpers directly:

import pandas as pd
import matplotlib.pyplot as plt
from mlsynth.utils.datautils import geoex_dataprep
from mlsynth.utils.geolift_helpers.marketselect.helpers.batch import run_simulations
from mlsynth.utils.geolift_helpers.marketselect.helpers.aggregate import compute_power

url = ("https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
       "refs/heads/main/basedata/geolift_market_data.csv")
Ywide = geoex_dataprep(pd.read_csv(url), "location", "date", "Y")["Ywide"]
cube = run_simulations(
    Ywide, [frozenset({"chicago", "portland"})], durations=[15],
    lookback_window=1, effect_sizes=[0.0, 0.05, 0.10, 0.15, 0.20],
    fixed_effects=True, ns=500)
pw = compute_power(cube, alpha=0.10)
plt.plot(pw["effect_size"], pw["power"], marker="o")
plt.axhline(0.8, ls="--", color="grey")                # power threshold -> MDE crossing
plt.xlabel("injected lift"); plt.ylabel("power"); plt.show()

Multi-cell designs#

A multi-cell experiment runs several treatments at once — different channels, budgets, or creatives — each on its own group of geos (“cells” $A, B, \dots$), all measured against a shared control pool over the same window (GeoLift’s GeoLiftMultiCell). The dedicated estimator is mlsynth.MULTICELLGEOLIFT; its data model is a unit-level cell-membership column ("A" / "B" / … for treated geos; blank or a control_label for controls) plus a post_col window:

import pandas as pd
from mlsynth import MULTICELLGEOLIFT

url = ("https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
       "refs/heads/main/basedata/geolift_test_data.csv")
df = pd.read_csv(url)                                   # GeoLift_Test: 40 mkts x 105 days
dates = sorted(df["date"].unique())
df["post"] = df["date"].isin(dates[90:]).astype(int)   # last 15 days = treatment window

#   cell:  "A" -> social-media markets, "B" -> paid-search markets, "" -> control
cell = {"chicago": "A", "portland": "A", "atlanta": "B", "boston": "B"}
df["cell"] = df["location"].map(cell).fillna("")        # blank = shared control pool

res = MULTICELLGEOLIFT({
    "df": df, "outcome": "Y", "unitid": "location", "time": "date",
    "cell_column_name": "cell", "post_col": "post", "fixed_effects": True,
}).fit()

res.cells["A"].effects.att          # cell A's per-unit ATT (a full EffectResult)
res.cells["A"].inference.p_value    # cell A's conformal p
res.comparison                      # pairwise [{cell_a, cell_b, att_diff, winner}, ...]
res.winner                          # cell that wins every comparison, or None

Each cell is measured with the same fixed-effect ASCM + conformal inference as a single cell, excluding the other cells’ markets from its donor pool (they are treated, hence contaminated). The cross-cell winner uses GeoLift’s non-overlapping-CI rule; with one cell it is identical to single-cell GEOLIFT. See MULTICELLGEOLIFT — multi-cell GeoLift analysis for the full treatment, the per-cell plots (plot_multicell), and the augsynth cross-validation.

Verification#

The realized effect report is cross-validated against GeoLift/augsynth value-for-value on the package’s own GeoLift_Walkthrough example: with fixed_effects=True (the default), GEOLIFT reproduces the walkthrough’s per-unit ATT (155.6), percent lift (5.4%), summed incremental (4667), and conformal p-value (0.01). See GEOLIFT — Meta’s GeoLift walkthrough (augsynth cross-validation) for the four ingredients required to match (unit fixed effects, mean-of-units fit target, the all-period conformal refit, and augsynth’s period-space ridge ASCM) and the calibration/placebo evidence behind them. The market-selection stages (no published table) remain a faithful port validated end-to-end on GeoLift’s own data, with each documented divergence (the CV-once optimization proven exact, the corrected per-anchor RNG) available as an opt-in, tested swap.

[BMFR2021] (1,2)

Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The Augmented Synthetic Control Method. Journal of the American Statistical Association.

[CWZ2021] (1,2,3)

Chernozhukov, V., Wüthrich, K., & Zhu, Y. (2021). An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls. Journal of the American Statistical Association.

Core API#

class mlsynth.GEOLIFT(config: GeoLiftConfig | dict)#

GeoLift market-selection design.

Chooses which markets to treat before an experiment by simulating power over the historical (pre-treatment) panel, then – once outcomes are observed (a post_col leaving a pre/post split) – realizes the chosen design into a standardized effect report with conformal inference.

Parameters:: config (GeoLiftConfig or dict) – Configuration. See mlsynth.utils.geolift_helpers.config.GeoLiftConfig.

Examples

>>> from mlsynth import GEOLIFT
>>> res = GEOLIFT({"df": panel, "outcome": "Y", "unitid": "location",
...                "time": "date", "treatment_size": 3, "durations": [14],
...                "effect_sizes": [0.0, 0.1, 0.2]}).fit()
>>> res.selected_units

fit() → GEOLIFTResults#

Run the market-selection design and return the result.

Behaviour is driven by the data and config, not by manual sequencing:

always runs the design (candidate nomination -> power -> MDE -> rank -> per-candidate synthetic controls);
when a post_col leaves a post-treatment window, realizes the winning design on the full panel under the hood, populating result.report (conformal effect report) – the DesignResult resolving to its EffectResult;
when display_graphs is set, plots the recommended design (design phase, or the realized post phase).

class mlsynth.utils.geolift_helpers.config.GeoLiftConfig(*, df: DataFrame, outcome: str, unitid: str, time: str, treatment_size: int, to_be_treated: List | None = None, not_to_be_treated: List | None = None, durations: List[int], effect_sizes: List[float], lookback_window: int = 1, post_col: str | None = None, how: str = 'sum', augment: str | None = 'ridge', fixed_effects: bool = True, alpha: float = 0.1, power_threshold: float = 0.8, cpic: float | None = None, budget: float | None = None, ns: int = 1000, conformal_type: str = 'iid', run_stochastic: bool = False, stochastic_mode: str = 'global', seed: int = 0, display_graphs: bool = True)#

Configuration for the GeoLift market-selection design (GEOLIFT).

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].