Proximal Inference Synthetic Control (PROXIMAL)#
When to Use This Estimator#
Proximal inference is, by design, a different theory of identification from everything else in the synthetic-control family – the Bayesian (Bayesian Synthetic Control with a Soft Simplex Constraint (BVS-SS)), staggered-adoption (Sequential Synthetic Difference-in-Differences (Sequential SDiD)), matrix-completion (Matrix Completion with Nuclear Norm Minimization (MCNNM)), and forward-selection (Forward Difference-in-Differences (FDID)) variants alike. All of those identify the counterfactual by matching: they assume some combination of donors reproduces the treated unit’s latent trajectory, and they treat a good pre-treatment fit as evidence that the assumption holds. PROXIMAL begins from the opposite admission – that a time-varying confounder you cannot match away is present, and that a good-looking pre-fit can still be biased – and identifies the effect by instrumenting that confounder instead of matching it. Because this is a genuinely different identification strategy, the sections below build it up from scratch: what a proxy is, what a surrogate is, and why this counts as its own theory. First, the regimes where it pays off.
The synthetic control (SC) method of Abadie and co-authors [ABADIE2010] is justified by a latent-factor model: each unit’s outcome is driven by a common, time-varying confounder \(\boldsymbol{\lambda}_t\) (the “interactive fixed effect”) loaded differently across units. Classical SC regresses the treated unit’s pre-treatment outcomes on the donors’ and takes the fitted weights as the synthetic control. Abadie shows this is (approximately) unbiased only as the number of pre-treatment periods grows without bound, and even then only when a good pre-treatment fit is attainable.
That leaves two regimes where classical SC is unreliable, and where PROXIMAL is the right tool:
Short pre-period / poor pre-fit. With few pre-treatment periods, or when no convex combination of donors closely tracks the treated unit, the bias bound does not bite and the OLS/WLS weights are inconsistent – the donor outcomes are error-laden proxies of \(\boldsymbol{\lambda}_t\), so the regressor is correlated with the residual (a textbook errors-in-variables problem). The bias does not vanish as the pre-period grows.
Long or structurally-broken post-period. When the post-period is long or contains trend breaks, extrapolating a pre-period fit forward is fragile. If you also observe surrogates – post-treatment series predictive of the treatment effect – PROXIMAL can borrow that post-period information to sharpen the estimate, which classical SC simply discards.
The fix, due to Shi, Li, Miao, Hu and Tchetgen Tchetgen [ProxSCM], is to stop using every control as a regressor. Instead, split the controls: some become donors that build the synthetic control, and the rest become proxies (negative controls) that are associated with the units only through the latent factor \(\boldsymbol{\lambda}_t\). The proxies serve as instruments that purge the measurement error, yielding consistent weights and valid inference via the generalized method of moments (GMM). Liu, Tchetgen Tchetgen and Varjão [LiuTchetgenVar] extend this to surrogates, time-varying correlates of the causal effect observed post-treatment.
A Different Theory of Identification#
Most of causal inference identifies effects by removing confounding. Either you condition on enough covariates that treatment is as-good-as-random – the no-unmeasured-confounding (ignorability) assumption – or, in synthetic control, you find donor weights that reproduce the treated unit so closely that whatever drove selection is matched away. Both routes assume the confounding can be observed and neutralized.
Proximal causal inference makes a different bet. It concedes that an unmeasured confounder remains – here, the latent factor \(\boldsymbol{\lambda}_t\) that drives both the outcomes and the timing of treatment – and that you will never observe it directly. Rather than assume it away, it asks for two observable shadows of that confounder and uses them to algebraically subtract the confounding from the estimate. This is exactly the logic epidemiologists use with negative controls to detect and correct hidden bias (Lipsitch et al.; Shi, Miao, Nelson and Tchetgen Tchetgen [ShiNegControl], who give the double-negative-control identification and multiply-robust estimation theory PROXIMAL descends from). The proximal SC papers import it into panel data: \(\boldsymbol{\lambda}_t\) is the confounder, and the control units are its shadows.
| | Matching / ignorability (classical SC, DiD, the rest of mlsynth) | Proximal / negative control (PROXIMAL) |
|---|---|---|
| Core assumption | The confounder is matched or conditioned away; pre-fit is good. | A confounder remains; we observe valid proxies for it. |
| What identifies the effect | A donor combination that reproduces the treated trajectory. | Proxies that instrument the latent factor. |
| Good pre-fit is… | necessary evidence the design is credible. | neither necessary nor sufficient – bias can hide behind it. |
| Fails when | no convex/linear match exists, or the pre-period is short. | no variable is a valid proxy (or proxies are irrelevant). |
The practical upshot: PROXIMAL is not a better way to fit the donors, nor a shrinkage/pooling trick like the Bayesian or staggered variants. It changes the assumption you must defend – from “my synthetic control matches” to “I have valid proxies for the latent confounder.”
What Counts as a Proxy?#
A proxy (synonymously, a negative control) is a variable that
(1) is associated with the latent confounder \(\boldsymbol{\lambda}_t\), but
(2) has no direct causal link to the treated unit’s outcome – its only connection to that outcome runs through \(\boldsymbol{\lambda}_t\).
Condition (1) is relevance (a proxy unrelated to the factor is useless, just like a weak instrument); condition (2) is exclusion (a proxy with its own path to the outcome would inject new bias). Proxies come in two roles, mirroring the negative-control pair:
A negative-control outcome is not affected by the treatment but is driven by the same latent factor. In SC the donor outcomes themselves play this role – controls are, by the no-interference assumption, unaffected by the treated unit’s treatment – and they build the synthetic control.
A negative-control exposure is associated with the latent factor but is not a direct cause of the outcome. In SC the outcomes of controls excluded from the donor pool serve here: they proxy the factor but do not enter the synthetic control. These are the \(\mathbf{Z}_0\) in the formulas below.
Where do real proxies come from?
Epidemiology (the origin). To study whether the flu vaccine cuts flu hospitalization – confounded by unmeasured health-seeking behavior – one uses a non-flu outcome such as injury/trauma hospitalization as a negative-control outcome: the vaccine cannot plausibly affect it, yet it shares the health-seeking confounder, so a non-zero “effect” on it exposes the bias.
Synthetic control. Control units dropped from the donor pool because they ran similar interventions or risk spillover are ideal proxies: they track the common factor but violate no-interference if used as donors. (In Abadie’s tobacco study, 38 of 50 states were eligible but only a handful received weight; the rest can be proxies.) So can treatment-free contemporaneous covariates of the donors – a sector index, market trading volume, weather – that move with \(\boldsymbol{\lambda}_t\) but are not caused by the treatment.
Marketing / geo experiments. In a regional campaign, a category-demand or search-volume index in untreated regions, or foot-traffic in markets the campaign never reached: associated with the macro demand factor, but with no direct line to the treated region’s sales.
What Counts as a Surrogate?#
A surrogate is a post-treatment variable driven by the same latent factors as the causal effect itself – not the confounder of the untreated outcome. It is predictive of how big the effect is, period by period. The defining contrast with a proxy:
a proxy carries information about \(\boldsymbol{\lambda}_t\), the confounder of the untreated outcome, and is used in the pre-period to recover the donor weights;
a surrogate carries information about \(\boldsymbol{\rho}_t\), the factors of the treatment effect, and is used in the post-period to sharpen or extend the estimate.
Loosely: a proxy cleans up the denominator (confounding); a surrogate informs the numerator (the effect). Crucially, a surrogate may itself be affected by the treatment – that is fine, because it is removed from the donor pool and used only to learn the effect’s trajectory, never to build the counterfactual.
Where do real surrogates come from?
Panic of 1907 (the paper’s example). The bid prices of the two other trusts that also suffered bank runs are useless as donors (the crisis hit them too), but their post-crisis movements track the very shock driving Knickerbocker’s effect – making them strong surrogates. Even Knickerbocker’s own bid price is used this way.
Marketing. After a price cut, fast downstream signals – app opens, add-to-cart rate, repeat-visit rate – respond to the same demand shock as revenue. They predict the revenue effect and arrive quickly, which is valuable when the post-launch revenue series is short or noisy.
Spillovers / partial treatment. Geographies that are partially treated or absorb spillover should not be donors, but they carry the treatment-effect signal and so make good surrogates.
Long-run effects. An early leading indicator of a long-horizon outcome (a classic “surrogate endpoint” in clinical trials) lets you estimate a long-run effect from a short post-treatment window.
The Methods#
PROXIMAL exposes six estimators. They are idiosyncratic – each
makes a different identification bet and needs different inputs – so you
choose the ones you want with the methods argument and the
estimator runs exactly those (validating that your inputs support them):
| Method | What it uses | Paper |
|---|---|---|
| PI | Donors + donor proxies; pre-period moments only. | Shi et al. [ProxSCM] |
| PIS | Adds surrogates + surrogate proxies; pre and post data. | Liu et al. [LiuTchetgenVar] |
| PIPost | Surrogates, post-treatment data only. | Liu et al. [LiuTchetgenVar] |
| SPSC | Donors only – a single proxy type, with the treated unit’s own outcome as the instrument. | Park & Tchetgen Tchetgen [SPSC] |
| DR | Donors + donor proxies; doubly robust – consistent if either the outcome or the weighting model is right. | Qiu et al. [DRProx] |
| PIPW | Donors + donor proxies; a weighting-only estimator (treatment confounding bridge), no outcome model. | Qiu et al. [DRProx] |
PROXIMAL({..., "methods": ["SPSC"]}) # SPSC alone (no proxies needed)
PROXIMAL({..., "methods": ["PI"]}) # classic proximal inference
PROXIMAL({..., "methods": ["DR", "PIPW"]}) # doubly robust + weighting
PROXIMAL({..., "methods": ["PI", "PIS", "PIPost", "SPSC", "DR", "PIPW"]}) # all six
methods is required – there is no implicit default – so a run
only ever computes what you asked for. The config layer enforces input
consistency: "PI"/"PIS"/"PIPost"/"DR"/"PIPW" require
donor proxies (and, for the surrogate methods, surrogate units and
proxies), whereas "SPSC" needs only the donor pool. Results are
returned on a
PROXIMALResults, with
results.methods mapping each requested method to its fit.
What Each Method Does in Practice#
Beyond the econometrics, the six methods answer different practical questions. Classical SC just asks “what weighted blend of controls tracks my treated unit?” – these methods each go further in a distinct way.
PI – de-noise the synthetic control. “Build a synthetic version of my treated unit from clean controls, but correct for the fact that the controls are noisy stand-ins for the thing that actually drives my outcome.” A retailer launches a loyalty program in one metro; nearby metros are controls, but their sales are noisy proxies of a shared regional demand cycle, so a plain SC blend is biased. PI uses a second set of metros – ones kept out of the blend (say, because they ran their own promotions) – as instruments to purge that noise, so the counterfactual isn’t distorted by metro-specific blips.
PIS – borrow fast signals when the outcome is slow or broken. “My post-period is long or has a structural break, and the outcome itself is noisy – lean on quick-moving signals that respond to the same shock as the effect.” After a price change, monthly revenue is noisy and the clean post-window is short, but app engagement (sessions, add-to-cart, repeat visits) moves with the same demand shock as revenue. PIS folds those surrogates in – using both pre- and post-launch data – to sharpen the revenue-effect estimate.
PIPost – estimate the effect from post-launch data alone. “I don’t have a usable pre-period for the controls, but I do have surrogates after launch.” Maybe clean control logging only began at rollout, or the pre-period is contaminated. Because the treated outcome splits into a donor-matched piece and a surrogate-driven effect piece, PIPost recovers the effect from post-treatment data only – at the cost of some efficiency.
SPSC – the no-proxy fallback. “All I have is my treated series and a pool of other series – no curated proxy or surrogate groups.” A flagship store’s sales versus a pool of other stores, with nothing but the sales panel. SPSC treats the other stores as noisy proxies of the flagship’s own counterfactual and uses the flagship’s own pre-period as the instrument, returning a de-noised synthetic flagship plus conformal bands that stay valid even with a short post-window. It is the most practical proximal method when no natural second proxy group exists.
DR – hedge against getting the model wrong. “I have both a synthetic control I trust and a weighting model I trust – but I’m not sure which is right, and I don’t want the answer to hinge on that.” DR combines an outcome model (the synthetic control) with a weighting model (how the confounding shifts at the intervention) so the ATT is consistent if either one is correctly specified – you get two chances to be right. Useful in a vaccine roll-out study where you can build a synthetic control of hospitalizations and model how disease pressure shifted, and want robustness to misspecification of either.
PIPW – weight, don’t model the outcome. “I’d rather not commit to a model for the treated unit’s counterfactual trajectory at all.” PIPW estimates the effect purely by re-weighting the pre-period to look like the post-period (a covariate-shift / inverse-probability-style weight built from the proxies), with no synthetic-control trajectory. It is the natural choice when the outcome is hard to model but the shift in the confounding is easier to capture.
Notation#
Let \(j = 1\) denote the sole treated unit, with all units \(\mathcal{N} \coloneqq \{1, \ldots, N\}\) and donor/control pool \(\mathcal{N}_0 \coloneqq \mathcal{N} \setminus \{1\}\) of cardinality \(N_0\). A subset \(\mathcal{D} \subseteq \mathcal{N}_0\) is the donor pool used to build the synthetic control; the remaining controls are repurposed as proxies. Time runs over \(t \in \mathcal{T} \coloneqq \{1, \ldots, T\}\), split by the intervention into a pre-treatment window \(\mathcal{T}_1 \coloneqq \{1, \ldots, T_0\}\) and a post-treatment window \(\mathcal{T}_2 \coloneqq \{T_0 + 1, \ldots, T\}\); the post-period has \(T - T_0\) periods (Shi et al.’s \(T_1\)). Potential outcomes are \(y^N_{jt}\) (no intervention) and \(y^I_{jt}\) (under intervention), and we observe \(y_{jt} = y^I_{jt}\) for the treated unit in the post-period (\(j = 1\), \(t \in \mathcal{T}_2\)) and \(y_{jt} = y^N_{jt}\) otherwise.
Stacking the donor pool, let \(\mathbf{W}_t \in \mathbb{R}^{|\mathcal{D}|}\) be the donor outcomes at time \(t\), with weight vector \(\boldsymbol{\alpha}\). Let \(\mathbf{Z}_{0t}\) be the donor proxies, \(\mathbf{X}_t \in \mathbb{R}^{H}\) the surrogate outcomes with coefficients \(\boldsymbol{\gamma}\), and \(\mathbf{Z}_{1t}\) the surrogate proxies. The estimand is the average treatment effect on the treated, \(\tau \coloneqq (T - T_0)^{-1} \sum_{t \in \mathcal{T}_2} \bigl(y^I_{1t} - y^N_{1t}\bigr)\), the mean post-period gap between the treated unit’s observed and treatment-free outcomes.
Notation bridge
The source papers write the treated outcome \(Y_t\), donors \(W_t\), donor proxies \(Z_{0,t}\), surrogates \(X_t\), surrogate proxies \(Z_{1,t}\), the donor latent factor \(\lambda_t\), and the effect’s latent factor \(\rho_t\). We keep \(\mathbf{W}, \mathbf{Z}_0, \mathbf{X}, \mathbf{Z}_1, \boldsymbol{\lambda}, \boldsymbol{\rho}\) and write the treated unit as \(j = 1\).
Why Standard SC Fails Here#
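Assume the interactive fixed-effects model \(y^N_{jt} = \boldsymbol{\mu}_j^\top \boldsymbol{\lambda}_t + \varepsilon_{jt}\),
where \(\boldsymbol{\lambda}_t\) is an unobserved common factor and \(\boldsymbol{\mu}_j\) a unit-specific loading. A synthetic control exists if the treated loading is a weighted average of the donor loadings, \(\boldsymbol{\mu}_1 = \sum_{j \in \mathcal{D}} \alpha_j \boldsymbol{\mu}_j\). Then in the pre-period, substituting \(\boldsymbol{\mu}_j^\top \boldsymbol{\lambda}_t = y_{jt} - \varepsilon_{jt}\) for each donor gives \(y_{1t} = \sum_{j \in \mathcal{D}} \alpha_j y_{jt} + \bigl(\varepsilon_{1t} - \sum_{j \in \mathcal{D}} \alpha_j \varepsilon_{jt}\bigr)\).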
The donor outcomes \(y_{jt}\) are noisy proxies of \(\boldsymbol{\lambda}_t\): they carry the idiosyncratic errors \(\varepsilon_{jt}\), which also appear in the residual. Regressing \(y_{1t}\) on them is therefore an errors-in-variables regression, and the OLS/WLS weights are inconsistent even as \(T_0 \to \infty\) (Ferman and Pinto). PROXIMAL breaks this correlation with an instrument.
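A minimal simulation, not taken from the source papers, illustrates the errors-in-variables point. It assumes a hypothetical design in which each donor equals one latent factor plus unit-variance noise; under that assumption the OLS weights converge to half the true weights, however long the pre-period. All names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T0, n_factors = 5000, 2              # very long pre-period on purpose
alpha_true = np.array([0.7, 0.3])    # hypothetical true donor weights

lam = rng.normal(size=(T0, n_factors))                   # latent factors lambda_t
W = lam + rng.normal(scale=1.0, size=lam.shape)          # donors = factor + idiosyncratic error
y1 = lam @ alpha_true + rng.normal(scale=0.1, size=T0)   # treated unit, pre-period

# Classical SC-style regression of the treated unit on the donors (no constraints).
alpha_ols, *_ = np.linalg.lstsq(W, y1, rcond=None)

print("true weights:", alpha_true)
print("OLS weights :", alpha_ols.round(3))   # attenuated toward zero
```

The attenuation factor of one half follows from the assumed unit variances of factor and noise; other noise levels give a different, but still non-vanishing, bias.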
Mathematical Formulation#
Proximal Inference (PI)#
Suppose we observe proxies \(\mathbf{Z}_{0t}\) – e.g. the outcomes of controls excluded from the donor pool, or contemporaneous covariates – that are associated with the units only through \(\boldsymbol{\lambda}_t\) in the pre-period. Then the pre-period residual \(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha}\) is orthogonal to the proxies, giving the moment condition \(\mathbb{E}[\mathbf{Z}_{0t}(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha})] = 0\) for \(t \in \mathcal{T}_1\).
Unlike the OLS normal equation \(\mathbb{E}[\mathbf{W}_t(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha})] = 0\), this estimating function is mean-zero at the truth because \(\mathbf{Z}_{0t}\) is uncorrelated with the measurement error. Solving it by GMM yields a consistent \(\widehat{\boldsymbol{\alpha}}\), and the ATT is the mean post-period gap \(\widehat{\tau} = (T - T_0)^{-1} \sum_{t \in \mathcal{T}_2} \bigl(y_{1t} - \mathbf{W}_t^\top \widehat{\boldsymbol{\alpha}}\bigr)\).
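The proxy moment can be sketched in a few lines of NumPy. This is an illustration under simplifying assumptions, not the mlsynth implementation: it uses the linear moment \(\mathbb{E}[\mathbf{Z}_{0t}(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha})] = 0\), a just-identified design (as many proxies as donors, so GMM reduces to solving a linear system), and a simulated panel whose names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T0, T = 2000, 2500
alpha_true = np.array([0.7, 0.3])   # hypothetical donor weights
tau_true = 1.0                      # hypothetical constant treatment effect

lam = rng.normal(size=(T, 2))                    # latent factors
W = lam + rng.normal(size=lam.shape)             # donor outcomes (noisy proxies of lam)
Z0 = lam + rng.normal(size=lam.shape)            # donor proxies: same factors, independent noise
y1 = lam @ alpha_true + rng.normal(scale=0.1, size=T)
post = np.arange(T) >= T0
pre = ~post
y1[post] += tau_true                             # apply the effect after T0

# OLS on the pre-period: biased by measurement error in W.
alpha_ols, *_ = np.linalg.lstsq(W[pre], y1[pre], rcond=None)

# Proxy (IV) moment: solve Z0' (y1 - W alpha) = 0 on the pre-period.
alpha_pi = np.linalg.solve(Z0[pre].T @ W[pre], Z0[pre].T @ y1[pre])

# ATT = mean post-period gap between the treated unit and its synthetic control.
att_pi = np.mean(y1[post] - W[post] @ alpha_pi)

print("OLS weights:", alpha_ols.round(3))
print("PI weights :", alpha_pi.round(3))
print("PI ATT     :", round(float(att_pi), 3))
```

With more proxies than donors the system is over-identified and a weighted GMM objective replaces the direct solve.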
Adding Surrogates (PIS)#
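Surrogates \(\mathbf{X}_t\) are post-treatment series driven by the same latent factors \(\boldsymbol{\rho}_t\) as the treatment effect: for \(t \in \mathcal{T}_2\), the effect \(y^I_{1t} - y^N_{1t}\) loads on \(\boldsymbol{\rho}_t\) with coefficient \(\boldsymbol{\theta}\) and the surrogates load on it through \(\boldsymbol{\Phi}\), so that \(\mathbb{E}[\mathbf{X}_t^\top \boldsymbol{\gamma} \mid \boldsymbol{\rho}_t] = \boldsymbol{\theta}^\top \boldsymbol{\rho}_t\) whenever \(\boldsymbol{\Phi} \boldsymbol{\gamma} = \boldsymbol{\theta}\).
With surrogate proxies \(\mathbf{Z}_{1t}\) instrumenting \(\mathbf{X}_t\), the effect coefficient \(\boldsymbol{\gamma}\) (with \(\boldsymbol{\Phi} \boldsymbol{\gamma} = \boldsymbol{\theta}\)) is identified by a second, post-period moment. The stacked conditions are \(\mathbb{E}[\mathbf{Z}_{0t}(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha})] = 0\) for \(t \in \mathcal{T}_1\) and \(\mathbb{E}[\mathbf{Z}_{1t}(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha} - \mathbf{X}_t^\top \boldsymbol{\gamma})] = 0\) for \(t \in \mathcal{T}_2\),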
and the ATT is \(\widehat{\tau} = (T - T_0)^{-1} \sum_{t \in \mathcal{T}_2} \mathbf{X}_t^\top \widehat{\boldsymbol{\gamma}}\).
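A two-step sketch of the surrogate logic follows. It is an assumption-laden illustration rather than the library's joint GMM: one effect factor, one uncontaminated surrogate and one surrogate proxy, just-identified moments solved sequentially, and a simulated panel whose names and values (`theta`, `phi`, noise scales) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
T0, T = 2000, 2500
post = np.arange(T) >= T0
pre = ~post
alpha_true = np.array([0.7, 0.3])
theta, phi = 1.0, 2.0                    # effect loading and surrogate loading on rho

lam = rng.normal(size=(T, 2))            # donor latent factors
rho = 1.0 + rng.normal(size=T)           # effect factor, mean one
W = lam + rng.normal(size=(T, 2))        # donor outcomes
Z0 = lam + rng.normal(size=(T, 2))       # donor proxies
X = (phi * rho + rng.normal(scale=0.5, size=T))[:, None]   # surrogate outcome
Z1 = (rho + rng.normal(scale=0.5, size=T))[:, None]        # surrogate proxy
y1 = lam @ alpha_true + rng.normal(scale=0.1, size=T)
y1[post] += theta * rho[post]            # time-varying treatment effect
true_att = theta * rho[post].mean()

# Step 1 (pre-period): donor weights from the proxy moment.
alpha_hat = np.linalg.solve(Z0[pre].T @ W[pre], Z0[pre].T @ y1[pre])

# Step 2 (post-period): instrument the surrogate with its proxy on the donor-adjusted gap.
gap_post = y1[post] - W[post] @ alpha_hat
gamma_hat = np.linalg.solve(Z1[post].T @ X[post], Z1[post].T @ gap_post)

# ATT = mean fitted effect X_t' gamma over the post-period.
att_pis = np.mean(X[post] @ gamma_hat)
print(round(float(true_att), 3), round(float(att_pis), 3))
```

In this design the implied coefficient is `theta / phi = 0.5`; the fitted effect path `X[post] @ gamma_hat` is what replaces the raw post-period gap.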
Post-Treatment-Only (PIPost)#
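Because the post-period outcome carries both a latent-factor component (matched by donors) and a surrogate-driven effect component, both \(\boldsymbol{\alpha}\) and \(\boldsymbol{\gamma}\) can be estimated from a single post-period IV fit, using \((\mathbf{Z}_{0t}, \mathbf{Z}_{1t})\) to instrument \((\mathbf{W}_t, \mathbf{X}_t)\), i.e. from the moment \(\mathbb{E}\bigl[(\mathbf{Z}_{0t}^\top, \mathbf{Z}_{1t}^\top)^\top (y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha} - \mathbf{X}_t^\top \boldsymbol{\gamma})\bigr] = 0\) for \(t \in \mathcal{T}_2\).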
This is the most economical method – it needs no pre-period – but also the least efficient, since it discards pre-treatment information.
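The post-only fit can be sketched as a single IV solve. As before this is a hedged illustration, assuming a just-identified linear design with hypothetical names and parameter values; only post-treatment observations are simulated, to stress that no pre-period enters.

```python
import numpy as np

rng = np.random.default_rng(3)
n_post = 2000                            # post-treatment periods only
alpha_true = np.array([0.7, 0.3])
theta, phi = 1.0, 2.0

lam = rng.normal(size=(n_post, 2))       # donor latent factors
rho = 1.0 + rng.normal(size=n_post)      # effect factor
W = lam + rng.normal(size=(n_post, 2))   # donor outcomes
Z0 = lam + rng.normal(size=(n_post, 2))  # donor proxies
X = (phi * rho + rng.normal(scale=0.5, size=n_post))[:, None]   # surrogate
Z1 = (rho + rng.normal(scale=0.5, size=n_post))[:, None]        # surrogate proxy
y1 = lam @ alpha_true + theta * rho + rng.normal(scale=0.1, size=n_post)

# One stacked IV fit: (Z0, Z1) instrument (W, X).
instruments = np.hstack([Z0, Z1])
regressors = np.hstack([W, X])
coef = np.linalg.solve(instruments.T @ regressors, instruments.T @ y1)
alpha_post, gamma_post = coef[:2], coef[2:]

att_pipost = np.mean(X @ gamma_post)
true_att = theta * rho.mean()
print(alpha_post.round(3), gamma_post.round(3), round(float(att_pipost), 3))
```

Both coefficient blocks come from the same post-period sample, which is why the method gives up the precision a long pre-period would otherwise contribute to the donor weights.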
Inference: GMM Sandwich with HAC#
Each method stacks its moment conditions into \(\mathbf{U}_t(\boldsymbol{\theta})\) for parameters \(\boldsymbol{\theta} = (\boldsymbol{\alpha}, \boldsymbol{\gamma}, \tau)\) and solves the GMM problem \(\widehat{\boldsymbol{\theta}} \coloneqq \operatorname*{argmin}_{\boldsymbol{\theta}}\, \bar{\mathbf{U}}(\boldsymbol{\theta})^\top \boldsymbol{\Omega}^{-1} \bar{\mathbf{U}}(\boldsymbol{\theta})\). Standard errors come from the sandwich variance
where \(\mathbf{G}\) is the Jacobian of the moment conditions and \(\boldsymbol{\Omega}\) is the heteroskedasticity- and autocorrelation-consistent (HAC) long-run variance of the moments,
with \(k(\cdot)\) the Bartlett kernel and bandwidth \(J = \bigl\lfloor 4 (\,(T - T_0)/100\,)^{2/9} \bigr\rfloor\). (For PIPost the normalization uses the post-period count \(T - T_0\) in place of \(T\).) The HAC middle is what makes the intervals valid under serially correlated errors.
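The HAC piece can be made concrete with a short sketch. It assumes the common Newey-West form, \(\widehat{\boldsymbol{\Omega}} = \widehat{\boldsymbol{\Gamma}}_0 + \sum_{j=1}^{J} \bigl(1 - \tfrac{j}{J+1}\bigr)\bigl(\widehat{\boldsymbol{\Gamma}}_j + \widehat{\boldsymbol{\Gamma}}_j^\top\bigr)\), with uncentered autocovariances since the moments are mean-zero at the truth; the library's exact normalization may differ. The function names are hypothetical.

```python
import numpy as np

def bartlett_bandwidth(n_post):
    """Bandwidth rule quoted in the text: floor(4 * (n_post / 100) ** (2 / 9))."""
    return int(np.floor(4.0 * (n_post / 100.0) ** (2.0 / 9.0)))

def hac_long_run_variance(U, bandwidth):
    """Bartlett-kernel long-run variance of moment rows U (time x moments)."""
    U = np.asarray(U, dtype=float)
    n = U.shape[0]
    omega = U.T @ U / n                            # lag-0 term
    for lag in range(1, bandwidth + 1):
        weight = 1.0 - lag / (bandwidth + 1.0)     # Bartlett weight
        gamma = U[lag:].T @ U[:-lag] / n           # lag-`lag` autocovariance
        omega += weight * (gamma + gamma.T)
    return omega

# Demo on a positively autocorrelated AR(1) "moment" series.
rng = np.random.default_rng(4)
n = 5000
shocks = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + shocks[t]
U = u[:, None]

J = bartlett_bandwidth(n)          # here the series length stands in for T - T0
naive = hac_long_run_variance(U, 0)[0, 0]
hac = hac_long_run_variance(U, J)[0, 0]
print(J, round(float(naive), 2), round(float(hac), 2))
```

With positive serial correlation the HAC estimate exceeds the lag-0 variance, so a white-noise formula would understate the standard errors.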
Assumptions#
Assumption 1 (interactive fixed effects). The untreated outcome obeys \(y^N_{jt} = \boldsymbol{\mu}_j^\top \boldsymbol{\lambda}_t + \varepsilon_{jt}\) with \(\mathbb{E}[\varepsilon_{jt} \mid \boldsymbol{\lambda}_t] = 0\), and there is no interference (the treated unit’s status does not affect controls).
Remark. The latent factor \(\boldsymbol{\lambda}_t\) is the unmeasured confounder: it both drives the outcome and is associated with treatment timing. This is the standard SC data-generating model; PROXIMAL does not need it to be stationary, so trending or non-stationary factors are allowed.
Assumption 2 (existence of a synthetic control). There exist weights \(\boldsymbol{\alpha}\) with \(\boldsymbol{\mu}_1 = \sum_{j \in \mathcal{D}} \alpha_j \boldsymbol{\mu}_j\) (and, for surrogates, \(\boldsymbol{\gamma}\) with \(\boldsymbol{\Phi} \boldsymbol{\gamma} = \boldsymbol{\theta}\)).
Remark. A necessary condition is that the donor pool be at least as large as the number of latent factors (\(|\mathcal{D}| \ge \dim \boldsymbol{\lambda}_t\)), and likewise that there be at least as many surrogates as effect factors. Weights need not be non-negative or sum to one – the simplex is optional, used only for interpretability or to avoid extrapolation.
Assumption 3 (valid proxies). The proxies satisfy \(\mathbf{Z}_{0t} \perp\!\!\!\perp \{y_{1t}, \mathbf{W}_t\} \mid \boldsymbol{\lambda}_t\) for \(t \in \mathcal{T}_1\) (and analogously for \(\mathbf{Z}_{1t}\) in the post-period).
Remark. Proxies must touch the units only through the latent factor – they carry information about \(\boldsymbol{\lambda}_t\) but have no direct causal link to the treated outcome. Outcomes of controls excluded from the donor pool (e.g. units dropped for similar interventions or spillover risk) and treatment-free contemporaneous covariates are natural candidates. Proxy choice is a pre-specified, domain-knowledge decision, not a data-driven search.
Assumption 4 (relevance / completeness). The cross-moment \(\mathbb{E}[\mathbf{Z}_{0t} \mathbf{W}_t^\top]\) has full column rank (and a completeness condition holds for nonparametric identification).
Remark. This is the instrument-relevance condition: the proxies must be strongly associated with the latent factor, so that variation in \(\mathbf{W}_t\) is recoverable from variation in \(\mathbf{Z}_{0t}\). It fails precisely when the proxies are unrelated to \(\boldsymbol{\lambda}_t\), in which case they cannot purge the measurement error.
Assumption 5 (stationary, weakly dependent errors). The error processes are stationary and weakly dependent.
Remark. This is weaker than i.i.d. errors: it permits serial correlation, which is why inference uses the HAC variance rather than a white-noise formula. The latent factors themselves may still be non-stationary.
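Assumption 4 lends itself to a quick diagnostic. The sketch below is a heuristic check and not part of mlsynth: it inspects the smallest singular value of the sample cross-moment between proxies and donors, on simulated data with hypothetical names, contrasting relevant proxies with pure-noise ones.

```python
import numpy as np

def weakest_direction(Z0_pre, W_pre):
    """Smallest singular value of the sample cross-moment E[Z0 W'] (pre-period)."""
    cross = Z0_pre.T @ W_pre / Z0_pre.shape[0]
    return float(np.linalg.svd(cross, compute_uv=False).min())

rng = np.random.default_rng(7)
T0 = 2000
lam = rng.normal(size=(T0, 2))                  # latent factors
W_pre = lam + rng.normal(size=(T0, 2))          # donor outcomes
Z_relevant = lam + rng.normal(size=(T0, 2))     # proxies that load on the factors
Z_irrelevant = rng.normal(size=(T0, 2))         # proxies unrelated to the factors

sv_relevant = weakest_direction(Z_relevant, W_pre)
sv_irrelevant = weakest_direction(Z_irrelevant, W_pre)
print(round(sv_relevant, 3), round(sv_irrelevant, 3))
```

A smallest singular value near zero signals that some combination of donor weights is not pinned down by the proxies, the panel analogue of a weak instrument.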
Contaminated surrogates
In practice “pure” surrogates are rare. Often a surrogate is an
alternative outcome of the treated unit, or the outcome of another
affected unit, and so is contaminated by the donor latent factor
\(\boldsymbol{\lambda}_t\) as well as the effect factor
\(\boldsymbol{\rho}_t\) (Appendix A.3 of [LiuTchetgenVar]).
mlsynth handles this by residualizing the surrogate outcomes against
the donor proxies and donor outcomes on the pre-period (a
confounding-bridge projection) before the surrogate stage, so the
surrogates used downstream carry the effect signal net of
\(\boldsymbol{\lambda}_t\).
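One plausible reading of that residualization step, offered as a sketch and not as the library's code, is a pre-period IV projection of each surrogate on the donors (instrumented by the donor proxies), subtracted from the surrogate in every period. The simulated design and all names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
T0, T = 1000, 2000
post = np.arange(T) >= T0
pre = ~post

lam = np.log(np.arange(1, T + 1))[:, None] + rng.normal(size=(T, 2))  # trending donor factors
rho = 1.0 + rng.normal(size=T)                                        # effect factor
Theta = np.array([[0.6, 0.4], [0.4, 0.6]])                            # contamination loadings

W = lam + rng.normal(scale=0.3, size=(T, 2))     # donor outcomes
Z0 = lam + rng.normal(scale=0.3, size=(T, 2))    # donor proxies
# Contaminated surrogates: load on lam in every period and on rho after treatment.
X = lam @ Theta + (rho * post)[:, None] + rng.normal(scale=0.3, size=(T, 2))

# Pre-period bridge: solve Z0' (X - W B) = 0 column by column.
B = np.linalg.solve(Z0[pre].T @ W[pre], Z0[pre].T @ X[pre])
X_clean = X - W @ B                              # surrogates net of the donor factors

def corr_with_factor(series):
    """Post-period correlation with the summed donor factors."""
    return float(np.corrcoef(series[post], lam[post].sum(axis=1))[0, 1])

corr_before = corr_with_factor(X[:, 0])
corr_after = corr_with_factor(X_clean[:, 0])
print(round(corr_before, 3), round(corr_after, 3))
```

After the projection the surrogate is essentially uncorrelated with the donor factors, leaving the effect factor plus noise for the surrogate stage.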
Example#
The block below is self-contained: simulate one panel from the surrogate
data-generating process of [LiuTchetgenVar] – two trending donor factors
\(\boldsymbol{\lambda}_t\), one effect factor \(\boldsymbol{\rho}_t\)
with mean one (so the true ATT is \(\approx 1\)), and contaminated
surrogates that load on both – then fit PROXIMAL and read off the ATT
and standard error for all three methods.
import numpy as np
import pandas as pd
from mlsynth import PROXIMAL
rng = np.random.default_rng(4)
F, T0, T, H = 2, 100, 200, 2 # donor factors, pre, total, surrogates
post = np.arange(T) >= T0
noise = 0.3
lam = np.log(np.arange(1, T + 1))[:, None] + rng.normal(size=(T, F)) # trending factors
rho = 1.0 + rng.normal(size=T) # effect factor, mean 1
Theta = np.array([[0.6, 0.4], [0.4, 0.6]]) # surrogate contamination
Y = lam.sum(1) + rng.normal(scale=noise, size=T)
Y[post] += rho[post] # apply the effect
true_att = rho[post].mean()
W = lam + rng.normal(scale=noise, size=(T, F)) # donor outcomes
Z0 = lam + rng.normal(scale=noise, size=(T, F)) # donor proxies
X = lam @ Theta + np.outer(rho * post, np.ones(H)) + rng.normal(scale=noise, size=(T, H))
Z1 = np.outer(rho, np.ones(H)) + lam @ Theta + rng.normal(scale=noise, size=(T, H))
# Long panel: each donor unit carries (outcome=W, donorproxy=Z0); each surrogate
# unit carries (donorproxy column = surrogate outcome X, surrogatevar = Z1).
rows = []
for t in range(T):
    rows.append({"unit": "treated", "time": t, "y": Y[t], "dp": 0.0, "sv": 0.0,
                 "treat": int(post[t])})
    for j in range(F):
        rows.append({"unit": f"donor{j}", "time": t, "y": W[t, j], "dp": Z0[t, j],
                     "sv": 0.0, "treat": 0})
    for k in range(H):
        rows.append({"unit": f"surr{k}", "time": t, "y": 0.0, "dp": X[t, k],
                     "sv": Z1[t, k], "treat": 0})
df = pd.DataFrame(rows)
res = PROXIMAL({
"df": df, "outcome": "y", "treat": "treat", "unitid": "unit", "time": "time",
"methods": ["PI", "PIS", "PIPost"],
"donors": [f"donor{j}" for j in range(F)],
"surrogates": [f"surr{k}" for k in range(H)],
"vars": {"donorproxies": ["dp"], "surrogatevars": ["sv"]},
"display_graphs": False,
}).fit()
print(f"true ATT = {true_att:.3f}")
for name, fit in res.methods.items():
print(f"{name:6s} ATT = {fit.att:+.3f} SE = {fit.att_se:.3f}")
A representative run prints (true ATT ≈ 1.05):
PI ATT = +1.001 SE = 0.138
PIS ATT = +1.018 SE = 0.129
PIPost ATT = +1.080 SE = 0.120
res is a
PROXIMALResults:
res.pi / res.pis / res.pipost hold the per-method
ProximalMethodFit
objects, res.methods maps the names that ran, and convenience accessors
(res.att, res.att_se, res.donor_weights,
res.att_by_method()) forward to the headline PI fit.
Empirical Illustration: Panic of 1907#
[LiuTchetgenVar] apply the surrogate method to the Panic of 1907, using data from [fohlin2021]. The crisis brought down the Knickerbocker Trust, a major New York bank. We have log stock prices for 59 trusts, with Knickerbocker as the treated unit. Two other trusts also suffered bank runs and seven were tied to major firms; setting those aside and dropping one trust missing a period leaves 48 potential controls. The logged bid price of the 48 controls serves as the donor proxy for Knickerbocker’s log price – a sensible proxy, since the bid reflects macro forces driving the overall price.
import pandas as pd
import numpy as np
from mlsynth import PROXIMAL
file_path = "https://github.com/jgreathouse9/mlsynth/raw/refs/heads/main/basedata/trust.dta"
df = pd.read_stata(file_path)
df = df[df["ID"] != 1].copy() # Drop the unbalanced unit; copy so later column assignments are safe
surrogates = df[df['introuble'] == 1]['ID'].unique().tolist() # affected trusts
donors = df[df['type'] == "normal"]['ID'].unique().tolist() # pure controls
price_cols = ["bid_itp", "ask_itp"]
df[price_cols] = df[price_cols].apply(np.log) # log, per the paper
df['Panic'] = np.where((df['time'] > 229) & (df['ID'] == 34), 1, 0)
treat, outcome, unitid, time = "Panic", "prc_log", "ID", "date"
var_dict = {"donorproxies": ["bid_itp"], "surrogatevars": ["ask_itp"]}
# Donors-only proximal inference (PI)
res_pi = PROXIMAL({
"df": df, "treat": treat, "time": time, "outcome": outcome, "unitid": unitid,
"methods": ["PI"],
"treated_color": "black", "counterfactual_color": ["blue"],
"display_graphs": True, "vars": var_dict, "donors": donors,
}).fit()
# Adding surrogates (PI, PIS, PIPost)
res_surr = PROXIMAL({
"df": df, "treat": treat, "time": time, "outcome": outcome, "unitid": unitid,
"methods": ["PI", "PIS", "PIPost"],
"treated_color": "black", "counterfactual_color": ["blue", "red", "lime"],
"display_graphs": True, "vars": var_dict, "donors": donors,
"surrogates": surrogates, # the affected trusts, repurposed as surrogates
}).fit()
print(res_surr.att_by_method())
This pulls the data straight from the repository (48 pure-control donors, 3 affected trusts as surrogates) and prints the ATT for each method:
{'PI': -1.148, 'PIS': -1.148, 'PIPost': -1.220}
which closely matches the paper’s full-window Table 3 estimates (PI -1.138, PI-S -1.134, PI-P -1.220): PIPost agrees to three decimals, while PI and PIS differ by 0.010 and 0.014.
Using the bid price as a proxy, the synthetic control fits the pre-intervention series well. The affected trusts – which would be discarded in a classical SC analysis because they violate the no-interference assumption – are instead repurposed as surrogates: they do not enter the donor pool, but their post-intervention movements help pin down the latent effect factors. The asking price of those trusts is their surrogate proxy. Even using only post-intervention data (PIPost), the estimate largely agrees with the donors-only proximal inference.
Single Proxy Synthetic Control (SPSC)#
PI, PIS and PIPost all require two proxy types: outcome proxies (the donors) and a separate group of treatment/surrogate proxies (\(\mathbf{Z}_0\), \(\mathbf{Z}_1\)) to instrument them. Park and Tchetgen Tchetgen [SPSC] show this can be reduced to a single proxy type – the donor outcomes alone – by a clever change of perspective.
Instead of viewing the donors as proxies of a latent factor, SPSC views them as error-prone proxies of the treated unit’s own treatment-free potential outcome \(y^N_{1t}\). It posits a synthetic-control bridge function \(h^\star\) that is conditionally unbiased for that outcome, \(y^N_{1t} = \mathbb{E}[h^\star(\mathbf{W}_t) \mid y^N_{1t}]\). With a linear bridge \(h^\star(\mathbf{W}_t) = \mathbf{W}_t^\top \boldsymbol{\gamma}\), this is the “reverse” measurement-error regression \(\mathbf{W}_t^\top \boldsymbol{\gamma} = y^N_{1t} + e_t\) with \(\mathbb{E}[e_t \mid y^N_{1t}] = 0\),
so the treated unit’s own pre-treatment outcome is a valid instrument for the donors – no second proxy group is needed. The identifying moment (Theorem 3.1 of [SPSC]) is \(\mathbb{E}[\,\phi(y_{1t})\,(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\gamma})\,] = 0\) over \(t \in \mathcal{T}_1\), where \(\phi(\cdot)\) is a basis of the treated outcome (the identity by default).
Why use it. SPSC trades the need for a curated proxy/surrogate group for
a single, always-available instrument – the treated series itself – which
makes it the most practical proximal method when no natural second proxy
group exists. It pairs naturally with a conformal prediction interval
for the per-period effect (spsc_conformal=True), valid even with a
short post-period.
Estimation. Because there are typically far fewer instruments than donors, \(\boldsymbol{\gamma}\) is estimated by a ridge-regularized GMM (penalty selected by leave-one-out cross-validation), and the ATT is the mean post-period gap with a GMM sandwich (HAC) standard error. Two variants handle trends: SPSC-NoDT uses the raw outcome as the instrument, while SPSC-DT first residualizes the treated outcome against a cubic B-spline time trend – essential when the series is non-stationary (the analogue of the time-varying estimating function in \(\Psi_{\text{pre}}\)).
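The ridge-regularized GMM step can be sketched in closed form. This is a simplified illustration, not the reference implementation: it uses the identity basis \(\phi(y) = y\), an identity weighting matrix, no detrending, and a fixed hypothetical penalty in place of leave-one-out cross-validation. Function and variable names are assumptions.

```python
import numpy as np

def ridge_gmm_weights(W_pre, y_pre, penalty):
    """Minimize ||b - A g||^2 + penalty * ||g||^2 with A = phi'W/T0, b = phi'y/T0."""
    n_pre, n_donors = W_pre.shape
    basis = y_pre.reshape(-1, 1)        # identity basis: the treated outcome itself
    A = basis.T @ W_pre / n_pre         # (moments x donors) cross-moment
    b = basis.T @ y_pre / n_pre         # (moments,) target moment
    return np.linalg.solve(A.T @ A + penalty * np.eye(n_donors), A.T @ b)

rng = np.random.default_rng(6)
n_pre, n_donors = 200, 10               # one instrument, ten donors: under-identified
factor = rng.normal(size=n_pre)
loadings = rng.uniform(0.5, 1.5, size=n_donors)
W_pre = factor[:, None] * loadings + rng.normal(scale=0.3, size=(n_pre, n_donors))
y_pre = factor + rng.normal(scale=0.3, size=n_pre)

gamma_hat = ridge_gmm_weights(W_pre, y_pre, penalty=1e-6)

# Sample analogue of E[ y_1t (y_1t - W_t' gamma) ]; near zero for a small penalty.
moment = float(np.mean(y_pre * (y_pre - W_pre @ gamma_hat)))
print(gamma_hat.round(3), moment)
```

Without the penalty the single moment cannot pin down ten weights; the ridge term selects the minimum-norm solution, and a larger penalty trades moment fit for stability.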
Select it with methods=["SPSC"]. Unlike PI/PIS/PIPost it needs no
proxy variables at all – just the treated series and the donor pool:
import pandas as pd
import numpy as np
from mlsynth import PROXIMAL
raw = pd.read_stata("https://github.com/jgreathouse9/mlsynth/raw/refs/heads/main/basedata/trust.dta")
raw["prc_log"] = raw["prc_log"].astype(float)
# Park & Tchetgen Tchetgen's window: 1906-01-05 to 1908-12-30 (T0=217).
win = raw[(raw["date"] >= "1906-01-05") & (raw["date"] <= "1908-12-31")].copy()
# Treated unit = average log price of the two most-affected trusts.
treated = (win[win["type"].isin(["Knickerbocker", "Trust Co of Am"])]
           .groupby(["date", "time"], as_index=False)
           .agg(prc_log=("prc_log", "mean")))
treated["ID"] = "treated"
# Donors = the weakly-connected "normal" trusts (drop the one unbalanced unit).
donors_df = win[(win["type"] == "normal") & (win["ID"] != 1)][
    ["ID", "date", "time", "prc_log"]].copy()
donors_df["ID"] = donors_df["ID"].astype(str)
df = pd.concat([treated[["ID", "date", "time", "prc_log"]], donors_df], ignore_index=True)
df["Panic"] = np.where((df["time"] >= 230) & (df["ID"] == "treated"), 1, 0)
donor_ids = sorted(donors_df["ID"].unique())
res = PROXIMAL({
    "df": df, "treat": "Panic", "time": "date", "outcome": "prc_log", "unitid": "ID",
    "methods": ["SPSC"],          # SPSC alone -- no proxies needed
    "donors": donor_ids,
    "spsc_detrend": True,         # SPSC-DT
    "display_graphs": False,
}).fit()
print(res.spsc.att, res.spsc.att_se, res.spsc.metadata["variant"])
This reproduces the paper’s Table 3: SPSC-DT ATT -0.815 (SE 0.067) and,
with spsc_detrend=False, SPSC-NoDT ATT -0.812 (SE 0.085) – against
the paper’s -0.816 / 0.066 and -0.813 / 0.084.
Conformal intervals. Set spsc_conformal=True (optionally
spsc_conformal_periods=[...] to cover only some post-periods) to attach
pointwise prediction intervals for the per-period effect, returned on
res.spsc.metadata["conformal"] as {"periods", "lower", "upper"}.
Over the Panic post-period these reproduce the average interval width of
the paper’s Figure 3 (≈ 0.07 for SPSC-DT). The inversion re-fits the
weights on a grid of candidate effects per period, so it is opt-in for
cost.
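The grid inversion can be pictured with the following sketch. It is only an illustration of the idea, under stated assumptions: the weights are re-fit by plain least squares rather than the package's ridge-GMM, the p-value is a simple rank of absolute residuals, and `conformal_interval`, the grid, and the simulated panel are hypothetical names and data that do not come from mlsynth.

```python
import numpy as np

def conformal_interval(y, W, T0, t, grid, alpha=0.1):
    """Range of candidate effects at post-period t that a residual-rank test does not reject."""
    idx = np.r_[np.arange(T0), t]      # pre-periods plus the single post-period t
    accepted = []
    for beta in grid:
        y_null = y[idx].copy()
        y_null[-1] -= beta             # impose H0: remove the candidate effect at t
        # Re-fit the weights under H0 (plain least squares here, NOT ridge-GMM).
        gamma, *_ = np.linalg.lstsq(W[idx], y_null, rcond=None)
        resid = np.abs(y_null - W[idx] @ gamma)
        p_value = np.mean(resid >= resid[-1])  # share of residuals at least as large as period t's
        if p_value > alpha:
            accepted.append(beta)
    if not accepted:
        return np.nan, np.nan
    return min(accepted), max(accepted)

# Hypothetical one-factor panel with a constant effect of 1.5.
rng = np.random.default_rng(1)
T0, T1, N = 60, 5, 8
factor = rng.normal(size=T0 + T1)
W = factor[:, None] + 0.3 * rng.normal(size=(T0 + T1, N))
y = factor + 0.3 * rng.normal(size=T0 + T1)
y[T0:] += 1.5

# Centre the grid on the naive gap from the pre-period fit at the first post-period.
gamma_pre, *_ = np.linalg.lstsq(W[:T0], y[:T0], rcond=None)
gap_hat = y[T0] - W[T0] @ gamma_pre
grid = gap_hat + np.linspace(-2.0, 2.0, 81)
lower, upper = conformal_interval(y, W, T0, t=T0, grid=grid)
```

Each candidate effect costs one re-fit, so the work scales with the grid size times the number of covered periods, which is the cost the opt-in flag guards against.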
Nonparametric (series) SPSC. By default the treated unit’s own outcome
enters the moment conditions linearly – the reference’s identity
Y.basis. Park & Tchetgen Tchetgen’s supplement (S1.6) notes that a
rich basis of the outcome – “polynomials, trigonometric functions,
splines, or wavelets” – spans a larger space of the latent factor and so
identifies a bridge that need not be linear. Set spsc_basis_degree=p
(\(p \ge 2\)) to replace the instrument with the polynomial sieve
\([\,y_{1t},\,y_{1t}^2,\,\dots,\,y_{1t}^p\,]\). This over-identifies the
ridge-GMM (more moments than donor weights) and is the right choice when
the synthetic-control relationship is nonlinear in the donor outcomes;
spsc_basis_degree=1 (the default) is bit-for-bit the linear single
proxy. The fitted variant is labelled accordingly
(res.spsc.metadata["variant"] becomes e.g. "SPSC-DT-NP3"), and the
detrending and conformal machinery carry the same sieve.
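The polynomial sieve itself is simple to picture. The helper below is a minimal sketch, assuming the instrument is exactly the stacked powers of the treated outcome; `polynomial_sieve` is a hypothetical name, and whether the package rescales the columns or adds an intercept is not stated here.

```python
import numpy as np

def polynomial_sieve(y_pre, degree):
    """Stack [y, y^2, ..., y^degree] column-wise; degree=1 is the linear single proxy."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    return np.column_stack([y_pre ** k for k in range(1, degree + 1)])

y_pre = np.array([0.5, 1.0, 2.0])
Phi = polynomial_sieve(y_pre, degree=3)   # one row per period, one column per power
```

Each added power contributes one more moment condition, so the number of instrument columns equals the chosen degree in this sketch.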
Doubly Robust Proximal Synthetic Control (DR & PIPW)#
PI, PIS, PIPost and SPSC all rest on getting one model right – an outcome model (the synthetic control). Qiu, Shi, Miao, Dobriban and Tchetgen Tchetgen [DRProx] add a second, complementary nuisance and combine the two so you only need one of them to be correct.
There are two bridges (each augmented with an intercept):
the outcome bridge \(h(\mathbf{W}_t) = (1, \mathbf{W}_t)^\top \boldsymbol{\alpha}\) – a pre-period IV fit of the treated outcome on the donors, instrumented by the proxies (the PI idea); and
the treatment confounding bridge \(q(\mathbf{Z}_t) = \exp\{(1, \mathbf{Z}_t)^\top \boldsymbol{\beta}\}\) – a covariate-shift / likelihood-ratio weight capturing how the unmeasured confounding shifts at the intervention, solving \(\mathbb{E}_{\text{pre}}[q(\mathbf{Z})(1,\mathbf{W})] = \mathbb{E}_{\text{post}}[(1,\mathbf{W})]\).
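They give three estimands: an outcome-only form (from \(h\)), a weighting-only form (from \(q\)), and a doubly robust (DR) form that combines the two.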
The DR form is consistent if either \(h\) or \(q\) is
correctly specified – not necessarily both. PIPW exposes the
weighting-only estimator (no outcome model at all); the outcome-only form
is the existing PI.
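Written out with the two bridges above, and with \(\mathbb{E}_{\text{pre}}\) and \(\mathbb{E}_{\text{post}}\) denoting averages over pre- and post-treatment periods, the three forms can be stated as follows. This is a reconstruction from the bridge definitions and from our reading of [DRProx], not a formula copied from the package, so the labels \(\tau_h\), \(\tau_q\) and \(\tau_{\text{DR}}\) are assumptions:

\[
\begin{aligned}
\tau_{h} &= \mathbb{E}_{\text{post}}[\,Y - h(\mathbf{W})\,], \\
\tau_{q} &= \mathbb{E}_{\text{post}}[\,Y\,] - \mathbb{E}_{\text{pre}}[\,q(\mathbf{Z})\,Y\,], \\
\tau_{\text{DR}} &= \mathbb{E}_{\text{post}}[\,Y - h(\mathbf{W})\,] - \mathbb{E}_{\text{pre}}[\,q(\mathbf{Z})\,\{Y - h(\mathbf{W})\}\,].
\end{aligned}
\]

If \(h\) is correct, the pre-period term in \(\tau_{\text{DR}}\) has mean zero and \(\tau_{\text{DR}}\) reduces to \(\tau_h\). If \(q\) is correct, the balancing equation \(\mathbb{E}_{\text{pre}}[q(\mathbf{Z})(1,\mathbf{W})] = \mathbb{E}_{\text{post}}[(1,\mathbf{W})]\) cancels the two \(h\) terms and \(\tau_{\text{DR}}\) reduces to \(\tau_q\).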
Estimation. Each is a just-identified GMM (alpha by IV, beta by
a small nonlinear solve, the means in closed form), so the parameters solve
the moment equations exactly and the ATT standard error is the GMM sandwich
with a Bartlett-HAC middle. DR returns the outcome-bridge synthetic
control as its counterfactual; PIPW, being a pure weighting estimator,
has no imputed trajectory (its counterfactual is NaN).
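To make the just-identified structure concrete, here is a minimal NumPy sketch of the point estimates only. It assumes the two bridges exactly as written above, solves for beta with a plain Newton iteration, and omits the sandwich standard error; `dr_proximal_att` and the simulated draw are hypothetical and are not the packaged estimator.

```python
import numpy as np

def dr_proximal_att(Y, W, Z, T0, n_iter=50, tol=1e-10):
    """Point estimates for the DR and PIPW forms from just-identified bridges."""
    T = len(Y)
    Wt = np.column_stack([np.ones(T), W])   # (1, W_t)
    Zt = np.column_stack([np.ones(T), Z])   # (1, Z_t)
    pre, post = slice(0, T0), slice(T0, T)

    # Outcome bridge h: pre-period IV, solving Zt'(Y - Wt alpha) = 0.
    alpha = np.linalg.solve(Zt[pre].T @ Wt[pre], Zt[pre].T @ Y[pre])
    resid = Y - Wt @ alpha                  # Y_t - h(W_t)

    # Treatment bridge q: Newton solve of E_pre[q(Z)(1, W)] = E_post[(1, W)].
    target = Wt[post].mean(axis=0)
    beta = np.zeros(Zt.shape[1])
    for _ in range(n_iter):
        q = np.exp(Zt[pre] @ beta)
        F = (q[:, None] * Wt[pre]).mean(axis=0) - target
        if np.max(np.abs(F)) < tol:
            break
        J = (q[:, None] * Wt[pre]).T @ Zt[pre] / T0   # Jacobian dF/dbeta
        beta = beta - np.linalg.solve(J, F)
    q = np.exp(Zt[pre] @ beta)

    att_dr = resid[post].mean() - np.mean(q * resid[pre])
    att_pipw = Y[post].mean() - np.mean(q * Y[pre])
    return att_dr, att_pipw, alpha, beta

# One hypothetical draw in the spirit of the DGP described in this section (true effect 2).
rng = np.random.default_rng(0)
T, T0, nU = 2000, 1000, 2
U = np.empty((T, nU)); U[0] = rng.normal(size=nU)
for t in range(1, T):
    U[t] = 0.1 * U[t - 1] + 0.9 * rng.normal(size=nU)
Y = 2.0 * (np.arange(T) >= T0) + 2 * U.sum(axis=1) + rng.normal(size=T)
W = 2 * U + rng.normal(size=(T, nU))
Z = 2 * U + rng.normal(size=(T, nU))
att_dr, att_pipw, alpha, beta = dr_proximal_att(Y, W, Z, T0)
```

Because `(1, Z)` and `(1, W)` have the same number of columns, both solves are square systems, which is what "just-identified" means here.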
Both consume the same inputs as PI – donors W and the donor
proxies Z – so just add them to methods. The block below is a
runnable proof of the agreement claimed in Replication Status: it
draws from the reference implementation’s own DGP
(DR_Proximal_SC/simulation/normal: true.ATE = 2, AR(1) confounders,
\(W_j = 2U_j + \text{noise}\), \(Z_j = 2U_j + \text{noise}\)), runs
the packaged PROXIMAL, and checks recovery, Wald coverage, and double
robustness:
import numpy as np
import pandas as pd
from mlsynth import PROXIMAL
TRUE = 2.0
def gen(T, rng, nU=2, misspecify=False):
    """Reference DGP of the DR proximal code (DR_Proximal_SC/simulation/normal)."""
    T0 = T // 2
    U = np.empty((T, nU)); U[0] = rng.normal(size=nU)
    for t in range(1, T):
        U[t] = 0.1 * U[t - 1] + 0.9 * rng.normal(size=nU)
    sigU = U.sum(1)
    signal = sigU if not misspecify else sigU + 0.7 * sigU ** 2  # nonlinear -> breaks h-bridge
    Y = TRUE * (np.arange(1, T + 1) > T0) + 2 * signal + rng.normal(size=T)
    W = 2 * U + rng.normal(size=(T, nU))  # donor outcomes
    Z = 2 * U + rng.normal(size=(T, nU))  # donor proxies
    rows = []
    for t in range(T):
        rows.append({"unit": "treated", "time": t, "y": float(Y[t]), "dp": 0.0,
                     "treat": int(t >= T0)})
        for j in range(nU):
            rows.append({"unit": f"d{j}", "time": t, "y": float(W[t, j]),
                         "dp": float(Z[t, j]), "treat": 0})
    return pd.DataFrame(rows), nU
def fit(df, nU, methods):
    return PROXIMAL({
        "df": df, "outcome": "y", "treat": "treat", "unitid": "unit", "time": "time",
        "methods": methods, "donors": [f"d{j}" for j in range(nU)],
        "vars": {"donorproxies": ["dp"]}, "display_graphs": False,
    }).fit().methods
# (1) recovery + (2) 95% Wald coverage at T=1000
acc = {"DR": [], "PIPW": []}; cov = {"DR": 0, "PIPW": 0}
for r in range(200):
    m = fit(*gen(1000, np.random.default_rng(r)), ["DR", "PIPW"])
    for k in ("DR", "PIPW"):
        acc[k].append(m[k].att)
        cov[k] += abs(m[k].att - TRUE) <= 1.96 * m[k].att_se
for k in ("DR", "PIPW"):
    print(f"{k:5s} mean ATT={np.mean(acc[k]):.3f} coverage={cov[k]/200:.0%}")
# DR mean ATT=2.007 coverage=91%
# PIPW mean ATT=2.007 coverage=99%
# (3) double robustness: misspecify the outcome bridge -> PI collapses, DR holds
pi, dr = [], []
for r in range(120):
    m = fit(*gen(1000, np.random.default_rng(1000 + r), misspecify=True), ["PI", "DR"])
    pi.append(m["PI"].att); dr.append(m["DR"].att)
print(f"misspecified h: PI={np.mean(pi):.2f} (collapses) DR={np.mean(dr):.2f} (holds)")
# misspecified h: PI=4.30 (collapses) DR=1.99 (holds)
Over-identified / empirical use
The paper’s real analyses (Brazil, Florida, Kansas) use a separate,
larger set of proxy units Z than donors W, which makes the
GMM over-identified. mlsynth’s DR/PIPW are the just-identified form
(Z = the donor proxies, matched to W). In the over-identified
regime with many near-collinear control-unit instruments, the GMM
minimizer is ill-conditioned and its value is sensitive to the
optimizer, so those published point estimates are not bit-reproducible
across languages. We therefore validate DR/PIPW synthetically (Path
B; see Replication Status) rather than against the empirical tables.
Replication Status#
Note
Reference-code validation (Path A). mlsynth’s PI, PIS and
PIPost were checked value-for-value against the authors’ reference
implementation (freshtaste/proximal) on identical data-generating
draws. Both the ATT and the GMM/HAC standard error match to machine
precision for all three methods. A coverage Monte Carlo confirms the
inference is correct: nominal-95% Wald intervals attain ≈ 93.8%
coverage (PI), identical to the reference – restored from a 63.8%
undercoverage caused by an earlier Jacobian-scaling bug in the GMM
sandwich.
Empirical (Path A, Panic of 1907). Running mlsynth on the trust
panel (see Empirical Illustration: Panic of 1907) reproduces the
full-window Table 3 of [LiuTchetgenVar] to within rounding: PI -1.148
vs. -1.138, PI-S -1.148 vs. -1.134, PI-P -1.220 vs. -1.220.
SPSC (Path A, single proxy). SPSC is a value-for-value port of the
authors’ reference R package (github.com/qkrcks0218/SPSC) and
reproduces its Panic-of-1907 Table 3: SPSC-NoDT ATT -0.812 / SE 0.085
(paper -0.813 / 0.084) and SPSC-DT ATT -0.815 / SE 0.067 (paper -0.816 /
0.066). The tiny ATT gap is one donor (48 vs. 49: the reference keeps a
unit that is unbalanced in this build). The conformal prediction
intervals of [SPSC] are also ported and reproduce the average interval
width of the paper’s Figure 3 (≈ 0.07 for SPSC-DT).
SPSC (Path B, durable IFEM Monte Carlo). The authors ship a
self-contained interactive-fixed-effects DGP in their package README
(the “Toy Example from Interactive Fixed Effect Models,”
\(\mathrm{True.ATT}=3\), a trending donor pool). The durable
benchmark spsc_ifem_mc redraws it 60 times and drives mlsynth’s
SPSC: both SPSC-DT and SPSC-NoDT recover the true ATT essentially without
bias (biases ≈ 0.006 and 0.008), but only the detrended SPSC-DT
delivers honest inference – its 95% Wald intervals cover near nominal
while SPSC-NoDT under-covers because its constant-gap model is forced
through a trending counterfactual. This reproduces the supplement’s
central finding ([SPSC] Figures S2-S6): detrending is what buys correct
coverage when the untreated trajectories drift.
Simulation (Path B). The robustness claim of [LiuTchetgenVar] Sec.
4.1 reproduces, and is pinned by the durable benchmark
proximal_surrogates_mc (the authors’ freshtaste/proximal dgp.py):
under a trending latent factor
(\(\boldsymbol{\lambda}_t \sim N(\log t, 1)\)), classical SC is biased
by the trend (mean ATT ≈ 1.30 against the true 1.0, MSE ≈ 0.19) while
PI/PIS/PIPost recover the truth (biases ≲ 0.003) with near-nominal Wald
coverage and lower MSE; PIS attains the lowest MSE of the three
(≈ 0.05). See Example for a one-draw illustration.
DR & PIPW (Path B) – runnable proof, not a claim. The DR/PIPW
agreement is demonstrated by the runnable Monte Carlo above (the
Doubly Robust section), which draws from the reference implementation’s
own DGP (DR_Proximal_SC/simulation/normal, true.ATE = 2) and
drives the packaged PROXIMAL. At T = 1000 over 200 reps both
estimators recover the truth – DR and PIPW mean ATT = 2.007
(sd 0.11) – with Wald coverage of 91% (DR) and 99% (PIPW) against
the 95% nominal. The double-robustness headline also reproduces:
misspecifying the outcome bridge (Y nonlinear in the confounder)
biases the outcome-only PI estimator (mean ATT ≈ 4.3) while
DR stays at 1.99, rescued by the correct treatment-confounding
bridge. Copy-paste the block to re-derive these numbers, or run the
durable benchmark dr_proximal_mc – it drives the same DGP through
the packaged estimators and pins recovery, coverage, and the
double-robustness collapse (PI ≈ 4.23 vs DR ≈ 1.96 under misspecification).
The over-identified empirical analyses (Brazil/Florida/Kansas) are not
bit-reproducible cross-language (ill-conditioned GMM; see the admonition
above), so DR/PIPW rest on this synthetic validation.
Per the project’s replication contract
(agents/agents_estimators.md), PROXIMAL is considered validated on
the strength of the machine-precision agreement with the reference code
plus the reproduced simulation behavior.
Core API#
Proximal Inference (PROXIMAL) estimator.
Implements:
Shi, X., Li, K., Miao, W., Hu, M., & Tchetgen Tchetgen, E. (2023). “Theory for Identification and Inference with Synthetic Controls: A Proximal Causal Inference Framework.” arXiv:2108.13935.
Liu, J., Tchetgen Tchetgen, E. J., & Varjao, C. (2023). “Proximal Causal Inference for Synthetic Control with Surrogates.” arXiv:2308.09527.
PROXIMAL treats donor outcomes as negative controls instrumented by donor proxies, and optionally adds surrogate outcomes instrumented by surrogate proxies. It runs up to three methods on the same panel:
PI – Proximal Inference (donors only).
PIS – Proximal Inference with surrogates (full-sample two-stage).
PIPost – post-treatment-only surrogate variant.
PIS and PIPost run only when surrogate units are configured. Every method closes with a GMM sandwich variance for the ATT (HAC/Bartlett middle).
See mlsynth.utils.proximal_helpers for the algorithmic pieces.
- class mlsynth.estimators.proximal.PROXIMAL(config: PROXIMALConfig | dict)#
Bases: object
Proximal Inference (PROXIMAL) estimator.
- Parameters:
config (PROXIMALConfig or dict) – Configuration object. See mlsynth.config_models.PROXIMALConfig.
- Returns:
PROXIMALResults – Container with the PI fit (always) and the PIS / PIPost fits when surrogates are configured, plus convenience accessors forwarding to the headline PI method.
- fit() PROXIMALResults#
Run the proximal pipeline and return a
PROXIMALResults.
Configuration#
- class mlsynth.config_models.PROXIMALConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: str | ~typing.List[str] = <factory>, treated_color: str = 'black', plot: ~mlsynth.config_models.PlotConfig = <factory>, methods: ~typing.Annotated[~typing.List[str], ~annotated_types.MinLen(min_length=1)], donors: ~typing.Annotated[~typing.List[str | int], ~annotated_types.MinLen(min_length=1)], surrogates: ~typing.List[str | int] = <factory>, vars: ~typing.Dict[str, ~typing.List[str]] = <factory>, spsc_detrend: bool = True, spsc_lambda: float | None = None, spsc_spline_df: ~typing.Annotated[int, ~annotated_types.Ge(ge=3)] = 5, spsc_basis_degree: ~typing.Annotated[int, ~annotated_types.Ge(ge=1)] = 1, spsc_conformal: bool = False, spsc_conformal_periods: ~typing.List[int] | None = None)#
Configuration for the Proximal Inference (PROXIMAL) estimator.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Result Containers#
PROXIMAL.fit() returns a
PROXIMALResults, whose
pi / pis / pipost fields each hold a
ProximalMethodFit
(counterfactual, gap, ATT, GMM/HAC standard error, pre/post RMSE, donor
weights) for the methods that ran. The prepared panel is exposed as a
PROXIMALInputs.
Frozen dataclasses for the Proximal Inference (PROXIMAL) estimator.
PROXIMAL bundles up to three proximal causal-inference estimators that all run on the same prepared panel:
PI – Proximal Inference with negative-control donor outcomes W and donor proxies Z0 (Shi, Li, Miao, Hu and Tchetgen Tchetgen 2023, arXiv:2108.13935). A pre-period IV fit imputes the post-period counterfactual.
PIS – Proximal Inference with Surrogates. Adds a second stage projecting the treatment effect onto surrogate outcomes X instrumented by surrogate proxies Z1 (Liu, Tchetgen Tchetgen and Varjao 2023, arXiv:2308.09527), estimated on the full sample.
PIPost – the post-treatment-only surrogate variant of PIS.
PIS and PIPost run only when surrogate units are configured, so the user can compare the available estimates side by side. Every method closes with a GMM sandwich variance for the ATT (HAC/Bartlett middle), validated value-for-value against the authors’ reference code.
The three layers below (inputs, per-method fit, top-level results) keep the pipeline pluggable and mirror the CLUSTERSC container design.
- class mlsynth.utils.proximal_helpers.structures.PROXIMALInputs(y: ndarray, donor_outcomes: ndarray, donor_proxies: ndarray | None, surrogate_outcomes: ndarray | None, surrogate_proxies: ndarray | None, T: int, T0: int, bandwidth: int, time_labels: ndarray, treated_unit_name: Any, donor_names: Sequence, methods: Sequence[str] = ('PI',), spsc_detrend: bool = True, spsc_lambda: float | None = None, spsc_spline_df: int = 5, spsc_basis_degree: int = 1, spsc_conformal: bool = False, spsc_conformal_periods: Sequence[int] | None = None)#
Bases: object
Preprocessed panel data for the proximal pipeline.
- Parameters:
y (np.ndarray) – Treated-unit outcome over all T periods, shape (T,).
donor_outcomes (np.ndarray) – Donor outcomes W, shape (T, n_donors).
donor_proxies (np.ndarray) – Donor proxies Z0 (instruments for W), shape (T, n_donors).
surrogate_outcomes (np.ndarray or None) – Cleaned surrogate outcomes X, shape (T, n_surrogate_vars); None when no surrogates are configured.
surrogate_proxies (np.ndarray or None) – Surrogate proxies Z1 (instruments for X); None when no surrogates are configured.
T (int) – Total number of periods.
T0 (int) – Number of pre-treatment periods.
bandwidth (int) – Bartlett HAC truncation lag used for all GMM standard errors.
time_labels (np.ndarray) – Length-T time labels.
treated_unit_name (Any) – Identifier of the treated unit.
donor_names (Sequence) – Length-n_donors donor labels (column order of donor_outcomes).
methods (Sequence of str) – Which estimators to run: any of "PI", "PIS", "PIPost", "SPSC".
spsc_detrend (bool) – Whether SPSC detrends the treated outcome against a B-spline time trend (SPSC-DT vs SPSC-NoDT).
spsc_lambda (float or None) – log10 ridge penalty for SPSC; None selects it by LOO-CV.
spsc_spline_df (int) – Degrees of freedom of the SPSC detrend B-spline basis.
spsc_basis_degree (int) – Degree of the polynomial sieve on the SPSC treated-outcome instrument (1 = linear single proxy; >=2 = nonparametric / series SPSC).
spsc_conformal (bool) – Whether to compute SPSC conformal prediction intervals.
spsc_conformal_periods (Sequence of int or None) – Absolute post-period indices to cover with conformal intervals; None covers every post-treatment period.
- donor_outcomes: ndarray#
- time_labels: ndarray#
- y: ndarray#
- class mlsynth.utils.proximal_helpers.structures.PROXIMALResults(inputs: ~mlsynth.utils.proximal_helpers.structures.PROXIMALInputs, pi: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None, pis: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None, pipost: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None, spsc: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None = None, dr: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None = None, pipw: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None = None, selected_variant: str = 'PI', metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#
Bases: object
Top-level container returned by mlsynth.PROXIMAL.fit().
- Parameters:
inputs (PROXIMALInputs) – Preprocessed panel.
pi (ProximalMethodFit or None) – Proximal Inference fit (always populated).
pis (ProximalMethodFit or None) – Proximal-with-surrogates fit (populated only when surrogates are configured).
pipost (ProximalMethodFit or None) – Post-treatment surrogate fit (populated only when surrogates are configured).
selected_variant (str) – Which fit is exposed via the convenience aliases att, att_se, counterfactual, gap, donor_weights – one of "PI", "PIS", "PIPost". Defaults to "PI".
metadata (dict) – Free-form pipeline diagnostics.
- ci_by_method() Dict[str, Tuple[float, float]]#
{method: (lower, upper)} Wald CIs from the GMM standard errors.
- property counterfactual: ndarray#
Counterfactual of the primary variant.
- dr: ProximalMethodFit | None = None#
- property gap: ndarray#
Gap of the primary variant.
- inputs: PROXIMALInputs#
- property methods: Dict[str, ProximalMethodFit]#
{method_name: fit} for the methods that were run, in order.
- pi: ProximalMethodFit | None#
- pipost: ProximalMethodFit | None#
- pipw: ProximalMethodFit | None = None#
- pis: ProximalMethodFit | None#
- se_by_method() Dict[str, float | None]#
{method: ATT standard error} across the methods that were run.
- spsc: ProximalMethodFit | None = None#
- class mlsynth.utils.proximal_helpers.structures.ProximalMethodFit(name: str, counterfactual: ~numpy.ndarray, gap: ~numpy.ndarray, time_varying_effect: ~numpy.ndarray, att: float, att_se: float | None, pre_rmse: float, post_rmse: float, alpha_weights: ~numpy.ndarray, donor_weights: ~typing.Dict[~typing.Any, float], metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#
Bases: object
Single proximal-method (PI / PIS / PIPost) fit output.
- Parameters:
name (str) – Method identifier ("PI", "PIS", or "PIPost").
counterfactual (np.ndarray) – Estimated counterfactual outcome path, shape (T,).
gap (np.ndarray) – Observed treated minus counterfactual, shape (T,).
time_varying_effect (np.ndarray) – Estimated time-varying treatment effect, shape (T,). For PI this equals gap; for the surrogate methods it is the fitted X gamma series.
att (float) – Mean post-treatment gap.
att_se (float or None) – GMM/HAC standard error of the ATT (None if inference failed).
pre_rmse (float) – Root-mean-squared pre-treatment gap.
post_rmse (float) – Root-mean-squared post-treatment gap.
alpha_weights (np.ndarray) – Estimated donor coefficients alpha.
donor_weights (dict) – Mapping {donor_name: coefficient}.
metadata (dict) – Free-form per-method diagnostics.
- alpha_weights: ndarray#
- counterfactual: ndarray#
- gap: ndarray#
- time_varying_effect: ndarray#
Helper Modules#
Data preparation – pivots the long panel, builds the donor/surrogate
outcome and proxy matrices, residualizes contaminated surrogates, and packs
everything into the typed
PROXIMALInputs.
Data preparation for the PROXIMAL estimator.
Pivots a long panel into the typed PROXIMALInputs container:
the treated outcome, the donor outcome matrix W and donor proxy
matrix Z0, and – when surrogate units are configured – the cleaned
surrogate outcome matrix X and surrogate proxy matrix Z1.
Surrogate outcomes are residualized against the donor proxies/outcomes on
the pre-period via mlsynth.utils.datautils.clean_surrogates2(),
matching the construction in Liu, Tchetgen Tchetgen and Varjao (2023).
- mlsynth.utils.proximal_helpers.setup.prepare_proximal_inputs(df: DataFrame, outcome: str, unitid: str, time: str, treat: str, donors: List[str | int], surrogates: List[str | int], vars: Dict[str, List[str]], methods: Sequence[str] = ('PI',), spsc_detrend: bool = True, spsc_lambda: float | None = None, spsc_spline_df: int = 5, spsc_basis_degree: int = 1, spsc_conformal: bool = False, spsc_conformal_periods: Sequence[int] | None = None) PROXIMALInputs#
Pivot a long panel into the typed inputs the PROXIMAL pipeline expects.
Only the matrices the requested methods need are built: donor proxies (Z0) are built when a method consuming them (PI/PIS/PIPost) is requested, and surrogate matrices when PIS/PIPost is requested. SPSC needs only the donor outcomes and the treated series.
- Parameters:
df (pd.DataFrame) – Long balanced panel data.
outcome, unitid, time, treat (str) – Column names identifying the outcome, units, time periods, and the treatment indicator.
donors (list) – Donor unit identifiers used to build W.
surrogates (list) – Surrogate unit identifiers used to build X / Z1.
vars (dict) – Proxy-variable map ("donorproxies" / "surrogatevars").
methods (sequence of str) – Estimators to prepare for.
spsc_detrend, spsc_lambda, spsc_spline_df, spsc_conformal, spsc_conformal_periods – SPSC options, forwarded onto PROXIMALInputs.
- Returns:
PROXIMALInputs – Prepared outcome/proxy matrices and label metadata.
- Raises:
MlsynthDataError – If no configured donors are present in the panel.
The Bartlett kernel and HAC long-run variance shared by the PI family.
HAC variance machinery for proximal GMM inference.
The three proximal estimators (estimation) all close with a GMM
sandwich variance for the ATT. The “meat” of that sandwich is a
Heteroskedasticity- and Autocorrelation-Consistent (HAC) estimate of the
long-run variance of the stacked moment conditions, formed with a
Bartlett kernel. This mirrors the reference implementation of
Shi, X., Li, K., Miao, W., Hu, M., & Tchetgen Tchetgen, E. (2023). “Theory for Identification and Inference with Synthetic Controls: A Proximal Causal Inference Framework.” arXiv:2108.13935.
and was validated value-for-value against the authors’ code
(freshtaste/proximal).
- mlsynth.utils.proximal_helpers.inference.bartlett(lag_order: int, truncation_lag: int) float#
Bartlett kernel weight 1 - |lag_order| / (truncation_lag + 1).
- Parameters:
lag_order (int) – Current lag order (0 returns weight 1).
truncation_lag (int) – Bandwidth; lags beyond it receive weight 0.
- Returns:
float – The kernel weight for lag_order.
- mlsynth.utils.proximal_helpers.inference.hac(moment_conditions: ~numpy.ndarray, truncation_lag: int, kernel: ~typing.Callable[[int, int], float] = <function bartlett>) ndarray#
HAC long-run covariance of stacked moment conditions.
- Parameters:
moment_conditions (np.ndarray) – Moment matrix of shape (n_obs, n_moments) (rows are time periods, columns are moment conditions).
truncation_lag (int) – Kernel bandwidth (number of autocovariance lags to include).
kernel (Callable[[int, int], float], optional) – Lag-weighting kernel. Defaults to bartlett().
- Returns:
np.ndarray – The (n_moments, n_moments) HAC covariance estimate.
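For orientation, the two helpers can be sketched in a few lines of NumPy. This is an illustrative version only: it assumes the moments are used as given (no demeaning) and that each autocovariance is divided by the number of observations, neither of which is confirmed by the text above, and the names `bartlett_weight` and `hac_covariance` are hypothetical.

```python
import numpy as np

def bartlett_weight(lag, truncation_lag):
    """1 - |lag| / (truncation_lag + 1) inside the bandwidth, 0 beyond it."""
    if abs(lag) > truncation_lag:
        return 0.0
    return 1.0 - abs(lag) / (truncation_lag + 1)

def hac_covariance(moments, truncation_lag):
    """Bartlett-weighted long-run covariance of an (n_obs, n_moments) array."""
    n = moments.shape[0]
    omega = moments.T @ moments / n                    # lag-0 term
    for lag in range(1, truncation_lag + 1):
        gamma = moments[lag:].T @ moments[:-lag] / n   # lag-th autocovariance
        omega += bartlett_weight(lag, truncation_lag) * (gamma + gamma.T)
    return omega

rng = np.random.default_rng(0)
moments = rng.normal(size=(200, 3))
omega = hac_covariance(moments, truncation_lag=4)
```

The linearly declining weights are what keep the estimate positive semi-definite, which matters because it sits in the middle of the sandwich variance.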
Each estimator lives in its own subpackage so new proximal methods can be
added as new subpackages. pi, pis and pipost are the two-proxy
GMM family; spsc is the single-proxy ridge-GMM plus conformal
inference.
Proximal Inference (PI) – donors-only, two-proxy GMM.
Implements the PI estimator of Shi, Li, Miao, Hu and Tchetgen Tchetgen
(2023, arXiv:2108.13935): donor outcomes W are negative-control
outcomes instrumented by donor proxies Z0 on the pre-period, and the
fitted relationship imputes the post-period counterfactual. Closes with the
GMM sandwich variance of the ATT (HAC/Bartlett middle), validated
value-for-value against the authors’ reference code (freshtaste/proximal).
- mlsynth.utils.proximal_helpers.pi.estimation.estimate_pi(outcome_vector: ndarray, design_matrix: ndarray, instrument_matrix: ndarray, num_pre_treatment_periods: int, num_post_periods_for_effect_eval: int, total_periods: int, hac_truncation_lag: int, common_aux_covariates_1: ndarray | None = None, common_aux_covariates_2: ndarray | None = None) Tuple[ndarray, ndarray, float]#
Proximal Inference (PI) counterfactual, donor weights, and ATT SE.
Stage 1 estimates donor coefficients alpha on the pre-period via the just-identified IV moment Z0' (Y - W alpha) = 0; the fitted W alpha is the counterfactual. The ATT standard error is the GMM sandwich variance with a HAC (Bartlett) middle.
- Parameters:
outcome_vector (np.ndarray) – Treated outcome, shape (total_periods,).
design_matrix (np.ndarray) – Donor outcomes W, shape (total_periods, n_donors).
instrument_matrix (np.ndarray) – Donor proxies Z0 (instruments for W), same column count as design_matrix.
num_pre_treatment_periods (int) – Number of pre-treatment periods T0.
num_post_periods_for_effect_eval (int) – Number of post-treatment periods used to average the ATT.
total_periods (int) – Total number of periods T.
hac_truncation_lag (int) – Bartlett bandwidth for the HAC variance.
common_aux_covariates_1, common_aux_covariates_2 (np.ndarray, optional) – Optional covariates augmenting both W and Z0. If one is given, both must be.
- Returns:
counterfactual (np.ndarray) – Predicted counterfactual W alpha (original donors only), shape (total_periods,).
alpha (np.ndarray) – Donor coefficients (original donors only).
se_tau (float) – Standard error of the ATT (np.nan if GMM inference fails).
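The Stage 1 moment has a one-line closed form, sketched below for the point estimate only. This is an illustration under simplifying assumptions (no auxiliary covariates, no intercept, no standard error); `pi_sketch` and the simulated panel are hypothetical and do not reproduce `estimate_pi`.

```python
import numpy as np

def pi_sketch(Y, W, Z0, T0):
    """Pre-period just-identified IV, then impute the counterfactual W alpha."""
    # Solve Z0_pre' (Y_pre - W_pre alpha) = 0 for alpha.
    alpha = np.linalg.solve(Z0[:T0].T @ W[:T0], Z0[:T0].T @ Y[:T0])
    counterfactual = W @ alpha
    gap = Y - counterfactual
    return counterfactual, alpha, gap[T0:].mean()   # ATT = mean post-period gap

# Hypothetical panel: two latent confounders, donors and proxies both noisy copies.
rng = np.random.default_rng(2)
T, T0, n = 1000, 500, 2
U = rng.normal(size=(T, n))
W = 2 * U + rng.normal(size=(T, n))    # donor outcomes
Z0 = 2 * U + rng.normal(size=(T, n))   # donor proxies
Y = 2 * U.sum(axis=1) + rng.normal(size=T)
Y[T0:] += 1.5                          # illustrative effect

counterfactual, alpha, att = pi_sketch(Y, W, Z0, T0)
```

Regressing `Y` on `W` directly would attenuate the coefficients because `W` measures the confounders with error; instrumenting with `Z0` removes that attenuation as long as the two noise terms are independent.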
Proximal Inference with Surrogates (PIS) – full-sample two-stage GMM.
Implements the surrogate estimator of Liu, Tchetgen Tchetgen and Varjao
(2023, arXiv:2308.09527). Stage 1 fits donor coefficients alpha on the
pre-period; Stage 2 projects the post-period residual onto surrogate
outcomes X instrumented by surrogate proxies Z1. Closes with the
joint GMM sandwich variance of the ATT (HAC/Bartlett middle).
- mlsynth.utils.proximal_helpers.pis.estimation.estimate_pi_surrogate(outcome_vector: ndarray, design_matrix_main: ndarray, instrument_matrix_main: ndarray, instrument_matrix_surrogate: ndarray, surrogate_outcome_matrix: ndarray, num_pre_treatment_periods: int, num_post_periods_for_effect_eval: int, total_periods: int, hac_truncation_lag: int, aux_covariates_main_1: ndarray | None = None, aux_covariates_main_2: ndarray | None = None, aux_covariates_surrogate: ndarray | None = None) Tuple[float, ndarray, ndarray, float]#
Proximal Inference with surrogates (PIS).
Stage 1 fits donor coefficients alpha on the pre-period (Z0' (Y - W alpha) = 0). Stage 2 projects the post-period residual onto surrogate outcomes X instrumented by surrogate proxies Z1 (Z1' (Y - W alpha - X gamma) = 0); X gamma is the time-varying effect. The ATT SE is the joint GMM sandwich variance.
- Parameters:
outcome_vector (np.ndarray) – Treated outcome, shape (total_periods,).
design_matrix_main (np.ndarray) – Donor outcomes W.
instrument_matrix_main (np.ndarray) – Donor proxies Z0 (instruments for W).
instrument_matrix_surrogate (np.ndarray) – Surrogate proxies Z1 (instruments for X).
surrogate_outcome_matrix (np.ndarray) – Surrogate outcomes X.
num_pre_treatment_periods, num_post_periods_for_effect_eval, total_periods (int) – Pre/post/total period counts.
hac_truncation_lag (int) – Bartlett bandwidth.
aux_covariates_main_1, aux_covariates_main_2 (np.ndarray, optional) – Optional covariates augmenting W and Z0.
aux_covariates_surrogate (np.ndarray, optional) – Optional covariates augmenting X and Z1.
- Returns:
tau (float) – ATT (mean post-period time-varying effect).
taut (np.ndarray) – Time-varying effect over all periods (pre-period entries are the Stage-1 residuals), shape (total_periods,).
alpha (np.ndarray) – Donor coefficients (original donors only).
se_tau (float) – Standard error of the ATT (np.nan if GMM inference fails).
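The two stages can be sketched as two small linear solves. The version below is only an illustration of the moments quoted above: it assumes no auxiliary covariates, returns X gamma on every period rather than the Stage-1 residuals on the pre-period, and skips the joint sandwich variance. `pis_sketch` and the simulated data are hypothetical.

```python
import numpy as np

def pis_sketch(Y, W, Z0, X, Z1, T0):
    """Stage 1: pre-period IV for alpha. Stage 2: post-period IV of the residual on X."""
    alpha = np.linalg.solve(Z0[:T0].T @ W[:T0], Z0[:T0].T @ Y[:T0])
    resid = Y - W @ alpha
    # Solve Z1_post' (resid_post - X_post gamma) = 0 for gamma.
    gamma = np.linalg.solve(Z1[T0:].T @ X[T0:], Z1[T0:].T @ resid[T0:])
    effect = X @ gamma                      # time-varying effect
    return alpha, gamma, effect, effect[T0:].mean()

# Hypothetical panel: the effect v_t is latent; X and Z1 are two noisy readings of it.
rng = np.random.default_rng(3)
T, T0, n = 1000, 500, 2
U = rng.normal(size=(T, n))
W = 2 * U + rng.normal(size=(T, n))
Z0 = 2 * U + rng.normal(size=(T, n))
v = 1.5 + 0.5 * rng.normal(size=T)
X = v[:, None] + 0.3 * rng.normal(size=(T, 1))    # surrogate outcome
Z1 = v[:, None] + 0.3 * rng.normal(size=(T, 1))   # surrogate proxy
Y = 2 * U.sum(axis=1) + rng.normal(size=T)
Y[T0:] += v[T0:]

alpha, gamma, effect, att = pis_sketch(Y, W, Z0, X, Z1, T0)
```

Averaging the fitted `X gamma` instead of the raw post-period residual is where the efficiency gain of the surrogate approach comes from in this toy setup, since the surrogate is much less noisy than the residual.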
Post-treatment-only proximal surrogate estimator (PIPost).
Implements the post-only variant of Liu, Tchetgen Tchetgen and Varjao
(2023, arXiv:2308.09527): donor and surrogate coefficients are estimated
jointly from a single post-treatment IV fit, using (Z0, Z1) to
instrument (W, X). The GMM sandwich variance is scaled by the number of
post-treatment periods T1.
- mlsynth.utils.proximal_helpers.pipost.estimation.estimate_pi_surrogate_post(outcome_vector: ndarray, main_covariates: ndarray, main_instruments: ndarray, surrogate_instruments: ndarray, surrogate_covariates: ndarray, treatment_start_period: int, num_post_treatment_periods_analyzed: int, hac_truncation_lag: int, aux_main_covariates: ndarray | None = None, aux_main_instruments: ndarray | None = None, aux_surrogate_covariates: ndarray | None = None) Tuple[float, ndarray, ndarray, float]#
Post-treatment proximal surrogate estimator (PIPost).
Estimates donor and surrogate coefficients jointly on the post-treatment period in a single just-identified IV fit (Z' (Y - [W X] params) = 0), with the surrogate block X gamma giving the time-varying effect. The GMM sandwich variance here is scaled by the number of post-treatment periods T1.
- Parameters:
outcome_vector (np.ndarray) – Treated outcome, shape (total_periods,).
main_covariates (np.ndarray) – Donor outcomes W.
main_instruments (np.ndarray) – Donor proxies Z0.
surrogate_instruments (np.ndarray) – Surrogate proxies Z1.
surrogate_covariates (np.ndarray) – Surrogate outcomes X.
treatment_start_period (int) – Index of the first post-treatment period T0.
num_post_treatment_periods_analyzed (int) – Number of post-treatment periods T1.
hac_truncation_lag (int) – Bartlett bandwidth.
aux_main_covariates, aux_main_instruments, aux_surrogate_covariates (np.ndarray, optional) – Optional covariates augmenting the design/instrument blocks.
- Returns:
tau (float) – ATT (mean post-period time-varying effect).
taut (np.ndarray) – Time-varying effect X gamma over all periods.
params_W (np.ndarray) – Donor coefficients (original donors only).
se_tau (float) – Standard error of the ATT (np.nan if GMM inference fails).
Single Proxy Synthetic Control: ridge-GMM with the treated unit’s own (optionally detrended) outcome as the instrument, plus the GMM/HAC ATT standard error and conformal prediction intervals.
Single Proxy Synthetic Control (SPSC).
Implements:
Park, C., & Tchetgen Tchetgen, E. J. (2025). “Single Proxy Synthetic Control.” Journal of Causal Inference 13(1), 20230079. https://doi.org/10.1515/jci-2023-0079
Unlike the two-proxy proximal estimators (PI/PIS/PIPost), SPSC needs only
one type of proxy: the donor outcomes themselves. It views the donor
outcomes W as error-prone proxies of the treated unit’s treatment-free
potential outcome, and uses the treated unit’s own (optionally detrended)
pre-treatment outcome as the instrument. A ridge-regularized GMM recovers
the synthetic-control weights gamma; the ATT is the mean post-period
gap, with a GMM sandwich (HAC) standard error.
This is a faithful port of the authors’ reference R package
(github.com/qkrcks0218/SPSC), validated value-for-value on the Panic of
1907 application (Table 3): SPSC-NoDT ATT -0.811 / SE 0.085 (paper -0.813 /
0.084) and SPSC-DT ATT -0.815 / SE 0.067 (paper -0.816 / 0.066).
- mlsynth.utils.proximal_helpers.spsc.estimation.estimate_spsc(outcome_vector: ndarray, donor_outcomes: ndarray, num_pre_treatment_periods: int, detrend: bool = True, spline_df: int = 5, ridge_lambda: float | None = None, basis_degree: int = 1) → Tuple[ndarray, ndarray, float, float, ndarray, float]#
Single Proxy Synthetic Control estimate.
- Parameters:
outcome_vector (np.ndarray) – Treated outcome over all T periods, shape (T,).
donor_outcomes (np.ndarray) – Donor outcomes W, shape (T, N) – the single proxy group.
num_pre_treatment_periods (int) – Number of pre-treatment periods T0.
detrend (bool, default True) – If True, residualize the treated outcome against a cubic B-spline time trend (SPSC-DT); otherwise SPSC-NoDT.
spline_df (int, default 5) – Degrees of freedom of the detrend B-spline basis.
ridge_lambda (float or None, default None) – log10 ridge penalty. None selects it by leave-one-out CV over 10**[-6, ..., 2].
basis_degree (int, default 1) – Degree of the polynomial sieve applied to the treated-outcome instrument (the reference’s Y.basis). 1 is the linear single proxy; >=2 is the nonparametric (series) SPSC, which spans a richer space of the outcome and over-identifies the bridge – useful when the synthetic-control bridge is nonlinear in the donor outcomes.
- Returns:
counterfactual (np.ndarray) – Synthetic control W gamma over all periods, shape (T,).
gamma (np.ndarray) – Donor weights.
att (float) – Mean post-treatment gap.
se (float) – GMM/HAC standard error of the ATT (np.nan if T1 <= 1).
trend (np.ndarray) – Estimated treated-outcome trend (zeros if detrend=False).
lambda_opt (float) – Selected log10 ridge penalty.
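A minimal sketch of the ridge-regularized GMM idea, under simplifying assumptions: no detrending, a fixed penalty on the raw (not log10) scale, an identity weighting matrix, and an instrument basis (1, y, ..., y**degree) whose constant term is an assumption of this example. The function name `ridge_gmm_weights` is hypothetical and this is not the reference port.

```python
import numpy as np

def ridge_gmm_weights(y_pre, W_pre, lam, basis_degree=1):
    """Hypothetical sketch: ridge-penalized GMM with a polynomial basis of y as instrument."""
    T0 = len(y_pre)
    # Instrument basis (1, y, ..., y**basis_degree); the constant column is an assumption.
    Phi = np.vander(y_pre, N=basis_degree + 1, increasing=True)
    G = Phi.T @ W_pre / T0      # moment Jacobian, shape (p, N)
    b = Phi.T @ y_pre / T0      # moment target, shape (p,)
    # Closed-form minimizer of ||b - G gamma||^2 + lam * ||gamma||^2.
    return np.linalg.solve(G.T @ G + lam * np.eye(W_pre.shape[1]), G.T @ b)

# Noiseless toy panel with two donors and two instrument functions (just-identified).
rng = np.random.default_rng(1)
T, T0 = 60, 40
W = rng.normal(size=(T, 2)) + np.array([1.0, -1.0])
gamma_true = np.array([0.7, 0.3])
y = W @ gamma_true
y[T0:] += 2.0                   # constant treatment effect after T0
gamma = ridge_gmm_weights(y[:T0], W[:T0], lam=1e-8)
counterfactual = W @ gamma
att = float(np.mean(y[T0:] - counterfactual[T0:]))
```

With more donors than instrument functions the moment conditions alone do not pin down the weights, and the ridge penalty is what makes the solution unique.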
Conformal prediction intervals for SPSC (Park & Tchetgen Tchetgen 2025, Sec. 3.5).
Constructs pointwise prediction intervals for the per-period treatment
effect xi_t = y^1_{0t} - y^0_{0t} by inverting the permutation test of
Chernozhukov, Wuthrich and Zhu (2021). For a candidate effect xi at a
post-period s, the treated outcome is “un-treated” (y_s - xi),
appended to the pre-period sample, the synthetic-control weights are
re-fit (with the ridge penalty held fixed), and a conformal p-value is
formed from the rank of the appended residual among all residuals. The
interval is the set of xi not rejected at the target level.
This is a faithful port of the conformal.interval branch of the
authors’ reference R package (github.com/qkrcks0218/SPSC), and unlike
the asymptotic GMM standard error it remains valid with a short
post-treatment period.
- mlsynth.utils.proximal_helpers.spsc.conformal.conformal_intervals(outcome_vector: ndarray, donor_outcomes: ndarray, num_pre_treatment_periods: int, gamma: ndarray, ridge_lambda: float, detrend: bool, spline_df: int, att_se: float, periods: Sequence[int] | None = None, alpha: float = 0.05, window: float = 25.0, grid_size: int = 101, basis_degree: int = 1) → Dict[str, ndarray]#
Pointwise conformal prediction intervals for the per-period effect.
- Parameters:
outcome_vector (np.ndarray) – Treated outcome, shape (T,).
donor_outcomes (np.ndarray) – Donor outcomes W, shape (T, N).
num_pre_treatment_periods (int) – T0.
gamma (np.ndarray) – Point-estimate SC weights (used to center the search grid).
ridge_lambda (float) – log10 ridge penalty held fixed during the inversion.
detrend (bool) – Whether the SPSC fit detrends (must match the point fit).
spline_df (int) – Detrend B-spline degrees of freedom.
att_se (float) – Asymptotic ATT standard error, used to scale the search grid. If not finite, a data-driven width is used.
periods (sequence of int, optional) – Post-treatment period indices (absolute, in [T0, T)) to cover. Defaults to every post-treatment period.
alpha (float, default 0.05) – Target miscoverage (95% interval).
window (float, default 25.0) – Half-width of the (SE-scaled) coarse search grid.
grid_size (int, default 101) – Number of coarse grid points (the reference uses 101).
- Returns:
dict – {"periods": int array, "lower": float array, "upper": float array} – prediction interval for xi_t at each covered period.
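The test-inversion logic can be sketched in a few lines. In the sketch below a plain ridge regression stands in for the SPSC ridge-GMM refit, the candidate grid is fixed by hand rather than SE-scaled, and there is no detrending; all three are assumptions of this example, as are the function names.

```python
import numpy as np

def ridge_fit(y, W, lam):
    """Plain ridge regression, standing in for the SPSC refit (an assumption)."""
    return np.linalg.solve(W.T @ W + lam * np.eye(W.shape[1]), W.T @ y)

def conformal_pvalue(y, W, T0, s, xi, lam):
    """p-value for H0: the effect at post-period index s equals xi."""
    idx = np.append(np.arange(T0), s)      # pre-period rows plus period s
    y_aug = y[idx].astype(float)
    y_aug[-1] -= xi                        # "un-treat" the post-period outcome
    resid = y_aug - W[idx] @ ridge_fit(y_aug, W[idx], lam)
    # Share of |residuals| at least as large as the appended one.
    return float(np.mean(np.abs(resid) >= np.abs(resid[-1])))

def conformal_interval(y, W, T0, s, grid, lam, alpha=0.05):
    """Range of candidate effects that the conformal test does not reject."""
    kept = [float(xi) for xi in grid if conformal_pvalue(y, W, T0, s, xi, lam) > alpha]
    return (min(kept), max(kept)) if kept else (float("nan"), float("nan"))

rng = np.random.default_rng(2)
T, T0 = 45, 40
W = rng.normal(size=(T, 3))
y = W @ np.array([0.5, 0.3, 0.2]) + 0.1 * rng.normal(size=T)
y[T0:] += 1.0                              # true effect of 1 in every post period
lo, hi = conformal_interval(y, W, T0, T0, np.linspace(-1.0, 3.0, 81), lam=1e-3)
p_far = conformal_pvalue(y, W, T0, T0, 100.0, lam=1e-3)
```

Because the appended residual always counts itself, the p-value is never below 1 / (T0 + 1); a candidate far from the truth, such as `xi = 100`, lands near that floor and is rejected.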
The doubly-robust family: shared confounding-bridge fits and the GMM
sandwich (bridges), the doubly-robust estimator (dr), and the
treatment-bridge weighting estimator (pipw).
Shared building blocks for the doubly-robust proximal estimators.
Implements the two confounding-bridge fits and the just-identified GMM
sandwich used by both the doubly-robust (dr) and treatment-bridge
weighting (pipw) estimators of
Qiu, H., Shi, X., Miao, W., Dobriban, E., & Tchetgen Tchetgen, E. (2024). “Doubly robust proximal synthetic controls.” Biometrics 80(2), ujae055.
Bridges (with an intercept column prepended to W and Z):
outcome bridge h_alpha(W) = (1, W) alpha – a just-identified IV fit of the treated outcome on the donors W instrumented by the proxies Z on the pre-period.
treatment bridge q_beta(Z) = exp((1, Z) beta) – a covariate-shift / likelihood-ratio weight solving the pre-period moment E_pre[q(Z)(1, W)] = E_post[(1, W)].
Both estimators are just-identified, so the parameters solve the empirical
moment equations exactly and the asymptotic variance is the GMM sandwich
G^{-1} Omega G^{-T} / T with a Bartlett-HAC Omega.
- mlsynth.utils.proximal_helpers.bridges.augment(matrix: ndarray) → ndarray#
Prepend an intercept column of ones.
- mlsynth.utils.proximal_helpers.bridges.fit_outcome_bridge(Y_pre: ndarray, Wc_pre: ndarray, Zc_pre: ndarray) → ndarray#
Just-identified IV for alpha: E_pre[(1,Z)(Y - (1,W) alpha)] = 0.
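Because the system is square, the moment equation reduces to one linear solve. The sketch below is an illustration under that assumption (as many proxies as donors); the `_sketch` names are hypothetical stand-ins, not the library functions.

```python
import numpy as np

def augment_sketch(matrix):
    """Prepend an intercept column of ones."""
    return np.column_stack([np.ones(matrix.shape[0]), matrix])

def fit_outcome_bridge_sketch(Y_pre, Wc_pre, Zc_pre):
    """Solve sum_t Zc_t (Y_t - Wc_t' alpha) = 0 for alpha (square system)."""
    return np.linalg.solve(Zc_pre.T @ Wc_pre, Zc_pre.T @ Y_pre)

# Noiseless toy pre-period: the bridge coefficients are recovered exactly.
rng = np.random.default_rng(4)
T0 = 50
W_pre = rng.normal(size=(T0, 2))
Z_pre = W_pre + 0.3 * rng.normal(size=(T0, 2))   # proxies correlated with the donors
Wc_pre, Zc_pre = augment_sketch(W_pre), augment_sketch(Z_pre)
alpha_true = np.array([0.5, 0.7, 0.3])           # intercept first
Y_pre = Wc_pre @ alpha_true
alpha_hat = fit_outcome_bridge_sketch(Y_pre, Wc_pre, Zc_pre)
```

The solve fails or becomes unstable when the cross-product of proxies and donors is close to singular, for example when the proxies carry less independent variation than the donors.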
- mlsynth.utils.proximal_helpers.bridges.fit_treatment_bridge(Zc_pre: ndarray, Wc_pre: ndarray, psi: ndarray, beta_init: ndarray | None = None) → ndarray#
Solve E_pre[exp((1,Z) beta) (1,W)] = psi for beta. psi = E_post[(1, W)] is the post-period donor mean. The system is square (dim(beta) = dim(W)+1); a Newton/hybr solve from beta=0 (with a logistic-regression fallback init) recovers it.
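To make the square nonlinear system concrete, the sketch below swaps in a plain, undamped Newton iteration written in NumPy instead of a library root-finder, and omits the logistic-regression fallback. The function name and the toy data are assumptions; `psi` is built from a known `beta_true` (standing in for the post-period donor mean) so that the solution can be checked.

```python
import numpy as np

def fit_treatment_bridge_sketch(Zc_pre, Wc_pre, psi, max_iter=50, tol=1e-10):
    """Newton iterations for mean_t exp(Zc_t' beta) Wc_t = psi, started at beta = 0."""
    beta = np.zeros(Zc_pre.shape[1])
    for _ in range(max_iter):
        q = np.exp(Zc_pre @ beta)                        # current weights, shape (T0,)
        F = (Wc_pre * q[:, None]).mean(axis=0) - psi     # moment residual
        if np.max(np.abs(F)) < tol:
            break
        J = (Wc_pre * q[:, None]).T @ Zc_pre / len(q)    # Jacobian dF/dbeta
        beta = beta - np.linalg.solve(J, F)
    return beta

rng = np.random.default_rng(3)
T0 = 200
W = rng.normal(size=(T0, 2))
Z = W + 0.2 * rng.normal(size=(T0, 2))
Wc = np.column_stack([np.ones(T0), W])
Zc = np.column_stack([np.ones(T0), Z])
beta_true = np.array([-0.05, 0.2, -0.1])
psi = (Wc * np.exp(Zc @ beta_true)[:, None]).mean(axis=0)   # target moments
beta_hat = fit_treatment_bridge_sketch(Zc, Wc, psi)
q_hat = np.exp(Zc @ beta_hat)
balance_gap = (Wc * q_hat[:, None]).mean(axis=0) - psi
```

An undamped Newton step can diverge when the target is far from the starting point, which is one reason a production solver would add safeguards such as a trust region or a better initial guess.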
- mlsynth.utils.proximal_helpers.bridges.gmm_sandwich_se(theta: ndarray, moments: Callable[[ndarray], ndarray], param_index: int, total_periods: int, bandwidth: int, eps: float = 1e-06) → float#
Sandwich SE for one parameter of a just-identified GMM.
- Parameters:
theta (np.ndarray) – Solved parameter vector.
moments (callable) – theta -> U returning the (T, p) per-period moment matrix.
param_index (int) – Index into theta of the parameter whose SE is wanted.
total_periods (int) – T (sandwich normalization).
bandwidth (int) – Bartlett-HAC bandwidth.
- Returns:
float – sqrt(Cov[param_index, param_index]); np.nan if the Jacobian is singular or the variance is negative.
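The sandwich G^{-1} Omega G^{-T} / T can be sketched as follows. The central-difference Jacobian and the Bartlett weight convention 1 - lag / (bandwidth + 1) are assumptions of this example, and the function names are hypothetical; the library's own conventions may differ.

```python
import numpy as np

def bartlett_hac(U, bandwidth):
    """Long-run covariance of the (T, p) moment matrix with Bartlett weights."""
    T = U.shape[0]
    omega = U.T @ U / T
    for lag in range(1, bandwidth + 1):
        weight = 1.0 - lag / (bandwidth + 1.0)
        gamma_lag = U[lag:].T @ U[:-lag] / T
        omega += weight * (gamma_lag + gamma_lag.T)
    return omega

def sandwich_se_sketch(theta, moments, param_index, total_periods, bandwidth, eps=1e-6):
    """Hypothetical sketch of the SE from G^{-1} Omega G^{-T} / T."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    G = np.empty((p, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = eps
        # Central difference of the mean moment with respect to theta[j].
        up = moments(theta + step).mean(axis=0)
        down = moments(theta - step).mean(axis=0)
        G[:, j] = (up - down) / (2 * eps)
    try:
        G_inv = np.linalg.inv(G)
    except np.linalg.LinAlgError:
        return float("nan")
    cov = G_inv @ bartlett_hac(moments(theta), bandwidth) @ G_inv.T / total_periods
    var = cov[param_index, param_index]
    return float(np.sqrt(var)) if var >= 0 else float("nan")

# Sanity check on the sample mean: with bandwidth 0 the SE is std / sqrt(T).
rng = np.random.default_rng(5)
y = rng.normal(size=100)
theta_hat = np.array([y.mean()])
moments = lambda theta: (y - theta[0])[:, None]
se_iid = sandwich_se_sketch(theta_hat, moments, 0, len(y), bandwidth=0)
se_hac = sandwich_se_sketch(theta_hat, moments, 0, len(y), bandwidth=3)
```

For the sample-mean moment the Jacobian is -1, so the sandwich collapses to the familiar variance of a mean; a positive bandwidth adds the weighted autocovariances.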
Doubly robust proximal synthetic control (DR).
Implements the doubly-robust ATT of Qiu, Shi, Miao, Dobriban and Tchetgen Tchetgen (2024, Biometrics):
phi* = E_post[Y - h(W)] - E_pre[q(Z){Y - h(W)}],
which is consistent if either the outcome bridge h or the
treatment bridge q is correctly specified. Both nuisances are fit on
the pre-period; the estimand and all nuisance/auxiliary parameters are
stacked into one just-identified GMM, and the ATT standard error is the
GMM sandwich with a Bartlett-HAC middle. Validated against the authors’
reference code (QIU-Hongxiang-David/DR_Proximal_SC): the just-identified
point estimate matches by construction, and the sandwich SE is calibrated
(~95% Wald coverage on their simulation DGP).
- mlsynth.utils.proximal_helpers.dr.estimation.estimate_dr(outcome_vector: ndarray, donor_outcomes: ndarray, donor_proxies: ndarray, num_pre_treatment_periods: int, hac_bandwidth: int) → Tuple[ndarray, ndarray, ndarray, float, float]#
Doubly-robust proximal ATT.
- Parameters:
outcome_vector (np.ndarray) – Treated outcome Y, shape (T,).
donor_outcomes (np.ndarray) – Donor outcomes W (the outcome-bridge regressors), shape (T, n_donors).
donor_proxies (np.ndarray) – Supplemental proxies Z (instruments for h and regressors for q), shape (T, n_proxies).
num_pre_treatment_periods (int) – T0.
hac_bandwidth (int) – Bartlett-HAC bandwidth for the sandwich SE.
- Returns:
counterfactual (np.ndarray) – Outcome-bridge synthetic control h(W) = (1, W) alpha over all periods, shape (T,).
alpha (np.ndarray) – Outcome-bridge coefficients (intercept first).
beta (np.ndarray) – Treatment-bridge coefficients (intercept first).
att (float) – Doubly-robust ATT estimate phi.
se (float) – GMM/HAC standard error of phi.
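One half of the double-robustness property is easy to see numerically: when the outcome bridge is exact on the pre-period, the weighted correction term vanishes and phi does not depend on the treatment-bridge coefficients. The sketch below evaluates the formula for given `alpha` and `beta`; the function name and toy data are assumptions, and no nuisance fitting or inference is shown.

```python
import numpy as np

def dr_att_sketch(Y, Wc, Zc, T0, alpha, beta):
    """phi = E_post[Y - h(W)] - E_pre[q(Z) {Y - h(W)}], h = Wc alpha, q = exp(Zc beta)."""
    resid = Y - Wc @ alpha
    q_pre = np.exp(Zc[:T0] @ beta)
    return float(resid[T0:].mean() - np.mean(q_pre * resid[:T0]))

rng = np.random.default_rng(6)
T, T0 = 30, 20
Wc = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
Zc = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
alpha = np.array([0.5, 0.7, 0.3])
Y = Wc @ alpha            # outcome bridge holds exactly before treatment
Y[T0:] += 2.0             # true effect
# Two arbitrary (misspecified) treatment bridges give the same answer.
phi_a = dr_att_sketch(Y, Wc, Zc, T0, alpha, np.array([0.3, -0.2, 0.1]))
phi_b = dr_att_sketch(Y, Wc, Zc, T0, alpha, np.zeros(3))
```

The mirror-image case, a correct treatment bridge with a misspecified outcome bridge, relies on the balancing moment E_pre[q(Z)(1, W)] = E_post[(1, W)] instead.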
Treatment-bridge (proximal inverse-probability weighting) estimator (PIPW).
Implements the weighting-only ATT of Qiu, Shi, Miao, Dobriban and Tchetgen Tchetgen (2024, Biometrics):
phi* = E_post[Y] - E_pre[q(Z) Y],
where q_beta(Z) = exp((1, Z) beta) is the treatment confounding bridge
(a covariate-shift / likelihood-ratio weight) solving the pre-period moment
E_pre[q(Z)(1, W)] = E_post[(1, W)]. Unlike the outcome-bridge methods,
this relies on no model for the treated unit’s counterfactual outcome
trajectory – only on correctly modelling the weights. The estimand and
auxiliary means are stacked into one just-identified GMM with a Bartlett-HAC
sandwich SE.
- mlsynth.utils.proximal_helpers.pipw.estimation.estimate_pipw(outcome_vector: ndarray, donor_outcomes: ndarray, donor_proxies: ndarray, num_pre_treatment_periods: int, hac_bandwidth: int) → Tuple[ndarray, float, float]#
Treatment-bridge weighting ATT.
- Parameters:
outcome_vector (np.ndarray) – Treated outcome Y, shape (T,).
donor_outcomes (np.ndarray) – Donor outcomes W used in the weighting moment, shape (T, n_donors).
donor_proxies (np.ndarray) – Supplemental proxies Z (treatment-bridge regressors), shape (T, n_proxies).
num_pre_treatment_periods (int) – T0.
hac_bandwidth (int) – Bartlett-HAC bandwidth for the sandwich SE.
- Returns:
beta (np.ndarray) – Treatment-bridge coefficients (intercept first).
att (float) – Weighting ATT estimate phi.
se (float) – GMM/HAC standard error of phi.
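A tiny worked example shows how the weights do the work. It uses one binary donor series with Z = W, so the likelihood-ratio weights are known in closed form; the function name and the data are assumptions chosen for arithmetic convenience, not a realistic panel.

```python
import numpy as np

def pipw_att_sketch(Y, Zc, T0, beta):
    """phi = E_post[Y] - E_pre[q(Z) Y] with q(Z) = exp(Zc beta)."""
    q_pre = np.exp(Zc[:T0] @ beta)
    return float(Y[T0:].mean() - np.mean(q_pre * Y[:T0]))

# Four pre-periods followed by four post-periods.
W = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0])
T0 = 4
Zc = np.column_stack([np.ones_like(W), W])       # (1, Z) with Z = W
Y = 2.0 + 3.0 * W
Y[T0:] += 1.5                                    # true effect
# P(W=1) moves from 0.5 (pre) to 0.75 (post), so q(0) = 0.5 and q(1) = 1.5.
beta = np.array([np.log(0.5), np.log(3.0)])
q_pre = np.exp(Zc[:T0] @ beta)
balance_gap = (q_pre[:, None] * Zc[:T0]).mean(axis=0) - Zc[T0:].mean(axis=0)
phi = pipw_att_sketch(Y, Zc, T0, beta)
```

The reweighted pre-period matches the post-period donor mean (0.75), so the reweighted pre-period outcome mean (4.25) is a valid counterfactual for the post-period mean (5.75), and their difference recovers the effect of 1.5.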
Drives the requested methods on a prepared panel and assembles the per-method fits.
Run the requested proximal estimators and assemble per-method fits.
Dispatches over inputs.methods – any of PI, PIS, PIPost,
SPSC – and packages each into a ProximalMethodFit
(counterfactual, gap, ATT, GMM/HAC standard error, pre/post RMSE, donor
weights). Only the requested methods run; the config layer guarantees the
inputs each method needs are present.
- mlsynth.utils.proximal_helpers.orchestration.run_proximal(inputs: PROXIMALInputs) → Dict[str, ProximalMethodFit]#
Run each estimator named in inputs.methods and return the fits.
- Parameters:
inputs (PROXIMALInputs) – Prepared panel from prepare_proximal_inputs().
- Returns:
dict – {method_name: ProximalMethodFit} for the requested methods, in request order.
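The dispatch-and-package pattern can be sketched as below. Everything here is hypothetical: `FitSummary` is a stand-in for ProximalMethodFit with only some of its fields, the two toy estimators are placeholders for the real methods, and the RMSE definitions (root mean squared gap over each window) are an assumption.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class FitSummary:
    """Hypothetical stand-in for ProximalMethodFit (subset of fields)."""
    counterfactual: np.ndarray
    gap: np.ndarray
    att: float
    pre_rmse: float
    post_rmse: float

def summarize_fit(y, counterfactual, T0):
    gap = y - counterfactual
    return FitSummary(
        counterfactual=counterfactual,
        gap=gap,
        att=float(gap[T0:].mean()),
        pre_rmse=float(np.sqrt(np.mean(gap[:T0] ** 2))),
        post_rmse=float(np.sqrt(np.mean(gap[T0:] ** 2))),
    )

def run_methods(y, T0, requested, estimators):
    """Run only the requested estimators; dict insertion order keeps request order."""
    return {name: summarize_fit(y, estimators[name](y, T0), T0) for name in requested}

# Toy estimators that return a flat counterfactual path.
estimators = {
    "mean": lambda y, T0: np.full_like(y, y[:T0].mean()),
    "last": lambda y, T0: np.full_like(y, y[T0 - 1]),
}
y = np.array([1.0, 2.0, 3.0, 5.0, 6.0])
fits = run_methods(y, 3, ["last", "mean"], estimators)
```

Since Python dictionaries preserve insertion order, iterating over `fits` yields the methods in the order they were requested.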
The trajectories-and-gap overlay plot across methods.
Diagnostic plot for PROXIMAL results.
- mlsynth.utils.proximal_helpers.plotter.plot_proximal(results: PROXIMALResults) → None#
Two-panel plot: trajectories + gap, with one overlay per method run.