Proximal Inference Synthetic Control (PROXIMAL)

When to Use This Estimator

Proximal inference is, by design, a different theory of identification from everything else in the synthetic-control family – the Bayesian (Bayesian Synthetic Control with a Soft Simplex Constraint (BVS-SS)), staggered-adoption (Sequential Synthetic Difference-in-Differences (Sequential SDiD)), matrix-completion (Matrix Completion with Nuclear Norm Minimization (MCNNM)), and forward-selection (Forward Difference-in-Differences (FDID)) variants alike. All of those identify the counterfactual by matching: they assume some combination of donors reproduces the treated unit’s latent trajectory, and they treat a good pre-treatment fit as evidence that the assumption holds. PROXIMAL begins from the opposite admission – that a time-varying confounder you cannot match away is present, and that a good-looking pre-fit can still be biased – and identifies the effect by instrumenting that confounder instead of matching it. Because this is a genuinely different identification strategy, the sections below build it up from scratch: what a proxy is, what a surrogate is, and why this counts as its own theory. First, the regimes where it pays off.

The synthetic control (SC) method of Abadie and co-authors [ABADIE2010] is justified by a latent-factor model: each unit’s outcome is driven by a common, time-varying confounder \(\boldsymbol{\lambda}_t\) (the “interactive fixed effect”) loaded differently across units. Classical SC regresses the treated unit’s pre-treatment outcomes on the donors’ and takes the fitted weights as the synthetic control. The authors show this is (approximately) unbiased only as the number of pre-treatment periods grows without bound, and even then only when a good pre-treatment fit is attainable.

That leaves two regimes where classical SC is unreliable, and where PROXIMAL is the right tool:

  1. Short pre-period / poor pre-fit. With few pre-treatment periods, or when no convex combination of donors closely tracks the treated unit, the bias bound does not bite and the OLS/WLS weights are inconsistent – the donor outcomes are error-laden proxies of \(\boldsymbol{\lambda}_t\), so the regressor is correlated with the residual (a textbook errors-in-variables problem). The bias does not vanish as the pre-period grows.

  2. Long or structurally-broken post-period. When the post-period is long or contains trend breaks, extrapolating a pre-period fit forward is fragile. If you also observe surrogates – post-treatment series predictive of the treatment effect – PROXIMAL can borrow that post-period information to sharpen the estimate, which classical SC simply discards.

The fix, due to Shi, Li, Miao, Hu and Tchetgen Tchetgen [ProxSCM], is to stop using every control as a regressor. Instead, split the controls: some become donors that build the synthetic control, and the rest become proxies (negative controls) that are associated with the units only through the latent factor \(\boldsymbol{\lambda}_t\). The proxies serve as instruments that purge the measurement error, yielding consistent weights and valid inference via the generalized method of moments (GMM). Liu, Tchetgen Tchetgen and Varjão [LiuTchetgenVar] extend this to surrogates, time-varying correlates of the causal effect observed post-treatment.

A Different Theory of Identification

Most of causal inference identifies effects by removing confounding. Either you condition on enough covariates that treatment is as-good-as-random – the no-unmeasured-confounding (ignorability) assumption – or, in synthetic control, you find donor weights that reproduce the treated unit so closely that whatever drove selection is matched away. Both routes assume the confounding can be observed and neutralized.

Proximal causal inference makes a different bet. It concedes that an unmeasured confounder remains – here, the latent factor \(\boldsymbol{\lambda}_t\) that drives both the outcomes and the timing of treatment – and that you will never observe it directly. Rather than assume it away, it asks for two observable shadows of that confounder and uses them to algebraically subtract the confounding from the estimate. This is exactly the logic epidemiologists use with negative controls to detect and correct hidden bias (Lipsitch et al.; Shi, Miao, Nelson and Tchetgen Tchetgen [ShiNegControl], who give the double-negative-control identification and multiply-robust estimation theory PROXIMAL descends from). The proximal SC papers import it into panel data: \(\boldsymbol{\lambda}_t\) is the confounder, and the control units are its shadows.

| | Matching / ignorability (classical SC, DiD, the rest of mlsynth) | Proximal / negative control (PROXIMAL) |
| --- | --- | --- |
| Core assumption | The confounder is matched or conditioned away; pre-fit is good. | A confounder remains; we observe valid proxies for it. |
| What identifies the effect | A donor combination that reproduces the treated trajectory. | Proxies that instrument the latent factor. |
| Good pre-fit is… | necessary evidence the design is credible. | neither necessary nor sufficient – bias can hide behind it. |
| Fails when | no convex/linear match exists, or the pre-period is short. | no variable is a valid proxy (or proxies are irrelevant). |

The practical upshot: PROXIMAL is not a better way to fit the donors, nor a shrinkage/pooling trick like the Bayesian or staggered variants. It changes the assumption you must defend – from “my synthetic control matches” to “I have valid proxies for the latent confounder.”

What Counts as a Proxy?

A proxy (synonymously, a negative control) is a variable that is

  1. associated with the latent confounder \(\boldsymbol{\lambda}_t\), but

  2. has no direct causal link to the treated unit’s outcome – its only connection to that outcome runs through \(\boldsymbol{\lambda}_t\).

Condition (1) is relevance (a proxy unrelated to the factor is useless, just like a weak instrument); condition (2) is exclusion (a proxy with its own path to the outcome would inject new bias). Proxies come in two roles, mirroring the negative-control pair:

  • A negative-control outcome is not affected by the treatment but is driven by the same latent factor. In SC the donor outcomes themselves play this role – controls are, by the no-interference assumption, unaffected by the treated unit’s treatment – and they build the synthetic control.

  • A negative-control exposure is associated with the latent factor but is not a direct cause of the outcome. In SC the outcomes of controls excluded from the donor pool serve here: they proxy the factor but do not enter the synthetic control. These are the \(\mathbf{Z}_0\) in the formulas below.

Where do real proxies come from?

  • Epidemiology (the origin). To study whether the flu vaccine cuts flu hospitalization – confounded by unmeasured health-seeking behavior – one uses a non-flu outcome such as injury/trauma hospitalization as a negative-control outcome: the vaccine cannot plausibly affect it, yet it shares the health-seeking confounder, so a non-zero “effect” on it exposes the bias.

  • Synthetic control. Control units dropped from the donor pool because they ran similar interventions or risk spillover are ideal proxies: they track the common factor but violate no-interference if used as donors. (In Abadie’s tobacco study, 38 of 50 states were eligible but only a handful received weight; the rest can be proxies.) So can treatment-free contemporaneous covariates of the donors – a sector index, market trading volume, weather – that move with \(\boldsymbol{\lambda}_t\) but are not caused by the treatment.

  • Marketing / geo experiments. In a regional campaign, a category-demand or search-volume index in untreated regions, or foot-traffic in markets the campaign never reached: associated with the macro demand factor, but with no direct line to the treated region’s sales.

What Counts as a Surrogate?

A surrogate is a post-treatment variable driven by the same latent factors as the causal effect itself – not the confounder of the untreated outcome. It is predictive of how big the effect is, period by period. The defining contrast with a proxy:

  • a proxy carries information about \(\boldsymbol{\lambda}_t\), the confounder of the untreated outcome, and is used in the pre-period to recover the donor weights;

  • a surrogate carries information about \(\boldsymbol{\rho}_t\), the factors of the treatment effect, and is used in the post-period to sharpen or extend the estimate.

Loosely: a proxy cleans up the denominator (confounding); a surrogate informs the numerator (the effect). Crucially, a surrogate may itself be affected by the treatment – that is fine, because it is removed from the donor pool and used only to learn the effect’s trajectory, never to build the counterfactual.

Where do real surrogates come from?

  • Panic of 1907 (the paper’s example). The bid prices of the two other trusts that also suffered bank runs are useless as donors (the crisis hit them too), but their post-crisis movements track the very shock driving Knickerbocker’s effect – making them strong surrogates. Even Knickerbocker’s own bid price is used this way.

  • Marketing. After a price cut, fast downstream signals – app opens, add-to-cart rate, repeat-visit rate – respond to the same demand shock as revenue. They predict the revenue effect and arrive quickly, which is valuable when the post-launch revenue series is short or noisy.

  • Spillovers / partial treatment. Geographies that are partially treated or absorb spillover should not be donors, but they carry the treatment-effect signal and so make good surrogates.

  • Long-run effects. An early leading indicator of a long-horizon outcome (a classic “surrogate endpoint” in clinical trials) lets you estimate a long-run effect from a short post-treatment window.

The Methods

PROXIMAL exposes six estimators. They are idiosyncratic – each makes a different identification bet and needs different inputs – so you choose the ones you want with the methods argument and the estimator runs exactly those (validating that your inputs support them):

| Method | What it uses | Paper |
| --- | --- | --- |
| PI | Donors + donor proxies; pre-period moments only. | Shi et al. [ProxSCM] |
| PIS | Adds surrogates + surrogate proxies; pre and post data. | Liu et al. [LiuTchetgenVar] |
| PIPost | Surrogates, post-treatment data only. | Liu et al. [LiuTchetgenVar] |
| SPSC | Donors only – a single proxy type, with the treated unit’s own outcome as the instrument. | Park & Tchetgen Tchetgen [SPSC] |
| DR | Donors + donor proxies; doubly robust – consistent if either the outcome or the weighting model is right. | Qiu et al. [DRProx] |
| PIPW | Donors + donor proxies; a weighting-only estimator (treatment confounding bridge), no outcome model. | Qiu et al. [DRProx] |

PROXIMAL({..., "methods": ["SPSC"]})              # SPSC alone (no proxies needed)
PROXIMAL({..., "methods": ["PI"]})                # classic proximal inference
PROXIMAL({..., "methods": ["DR", "PIPW"]})        # doubly robust + weighting
PROXIMAL({..., "methods": ["PI", "PIS", "PIPost", "SPSC", "DR", "PIPW"]})  # all six

methods is required – there is no implicit default – so a run only ever computes what you asked for. The config layer enforces input consistency: "PI"/"PIS"/"PIPost"/"DR"/"PIPW" require donor proxies (and, for the surrogate methods, surrogate units and proxies), whereas "SPSC" needs only the donor pool. Results are returned as a PROXIMALResults object, with results.methods mapping each requested method to its fit.

What Each Method Does in Practice

Beyond the econometrics, the six methods answer different practical questions. Classical SCM just asks “what weighted blend of controls tracks my treated unit?” – these methods each go further in a distinct way.

PI – de-noise the synthetic control. “Build a synthetic version of my treated unit from clean controls, but correct for the fact that the controls are noisy stand-ins for the thing that actually drives my outcome.” A retailer launches a loyalty program in one metro; nearby metros are controls, but their sales are noisy proxies of a shared regional demand cycle, so a plain SC blend is biased. PI uses a second set of metros – ones kept out of the blend (say, because they ran their own promotions) – as instruments to purge that noise, so the counterfactual isn’t distorted by metro-specific blips.

PIS – borrow fast signals when the outcome is slow or broken. “My post-period is long or has a structural break, and the outcome itself is noisy – lean on quick-moving signals that respond to the same shock as the effect.” After a price change, monthly revenue is noisy and the clean post-window is short, but app engagement (sessions, add-to-cart, repeat visits) moves with the same demand shock as revenue. PIS folds those surrogates in – using both pre- and post-launch data – to sharpen the revenue-effect estimate.

PIPost – estimate the effect from post-launch data alone. “I don’t have a usable pre-period for the controls, but I do have surrogates after launch.” Maybe clean control logging only began at rollout, or the pre-period is contaminated. Because the treated outcome splits into a donor-matched piece and a surrogate-driven effect piece, PIPost recovers the effect from post-treatment data only – at the cost of some efficiency.

SPSC – the no-proxy fallback. “All I have is my treated series and a pool of other series – no curated proxy or surrogate groups.” A flagship store’s sales versus a pool of other stores, with nothing but the sales panel. SPSC treats the other stores as noisy proxies of the flagship’s own counterfactual and uses the flagship’s own pre-period as the instrument, returning a de-noised synthetic flagship plus conformal bands that stay valid even with a short post-window. It is the most practical proximal method when no natural second proxy group exists.

DR – hedge against getting the model wrong. “I have both a synthetic control I trust and a weighting model I trust – but I’m not sure which is right, and I don’t want the answer to hinge on that.” DR combines an outcome model (the synthetic control) with a weighting model (how the confounding shifts at the intervention) so the ATT is consistent if either one is correctly specified – you get two chances to be right. Useful in a vaccine roll-out study where you can build a synthetic control of hospitalizations and model how disease pressure shifted, and want robustness to misspecification of either.

PIPW – weight, don’t model the outcome. “I’d rather not commit to a model for the treated unit’s counterfactual trajectory at all.” PIPW estimates the effect purely by re-weighting the pre-period to look like the post-period (a covariate-shift / inverse-probability-style weight built from the proxies), with no synthetic-control trajectory. It is the natural choice when the outcome is hard to model but the shift in the confounding is easier to capture.

Notation

Let \(j = 1\) denote the sole treated unit, with all units \(\mathcal{N} \coloneqq \{1, \ldots, N\}\) and donor/control pool \(\mathcal{N}_0 \coloneqq \mathcal{N} \setminus \{1\}\) of cardinality \(N_0\). A subset \(\mathcal{D} \subseteq \mathcal{N}_0\) is the donor pool used to build the synthetic control; the remaining controls are repurposed as proxies. Time runs over \(t \in \mathcal{T} \coloneqq \{1, \ldots, T\}\), split by the intervention into a pre-treatment window \(\mathcal{T}_1 \coloneqq \{1, \ldots, T_0\}\) and a post-treatment window \(\mathcal{T}_2 \coloneqq \{T_0 + 1, \ldots, T\}\); the post-period has \(T - T_0\) periods (Shi et al.’s \(T_1\)). Potential outcomes are \(y^N_{jt}\) and \(y^I_{jt}\), and we observe

\[\begin{split}y_{1t} = \begin{cases} y^N_{1t}, & t \in \mathcal{T}_1, \\ y^I_{1t}, & t \in \mathcal{T}_2. \end{cases}\end{split}\]

Stacking the donor pool, let \(\mathbf{W}_t \in \mathbb{R}^{|\mathcal{D}|}\) be the donor outcomes at time \(t\), with weight vector \(\boldsymbol{\alpha}\). Let \(\mathbf{Z}_{0t}\) be the donor proxies, \(\mathbf{X}_t \in \mathbb{R}^{H}\) the surrogate outcomes with coefficients \(\boldsymbol{\gamma}\), and \(\mathbf{Z}_{1t}\) the surrogate proxies. The estimand is the average treatment effect on the treated,

\[\tau \coloneqq \frac{1}{T - T_0} \sum_{t \in \mathcal{T}_2} \bigl(y^I_{1t} - y^N_{1t}\bigr).\]

Notation bridge

The source papers write the treated outcome \(Y_t\), donors \(W_t\), donor proxies \(Z_{0,t}\), surrogates \(X_t\), surrogate proxies \(Z_{1,t}\), the donor latent factor \(\lambda_t\), and the effect’s latent factor \(\rho_t\). We keep \(\mathbf{W}, \mathbf{Z}_0, \mathbf{X}, \mathbf{Z}_1, \boldsymbol{\lambda}, \boldsymbol{\rho}\) and write the treated unit as \(j = 1\).

Why Standard SC Fails Here

Assume the interactive fixed-effects model

\[y^N_{jt} = \boldsymbol{\mu}_j^\top \boldsymbol{\lambda}_t + \varepsilon_{jt},\]

where \(\boldsymbol{\lambda}_t\) is an unobserved common factor and \(\boldsymbol{\mu}_j\) a unit-specific loading. A synthetic control exists if the treated loading is a weighted average of the donor loadings, \(\boldsymbol{\mu}_1 = \sum_{j \in \mathcal{D}} \alpha_j \boldsymbol{\mu}_j\). Then in the pre-period

\[y_{1t} = \sum_{j \in \mathcal{D}} \alpha_j y_{jt} + \Bigl(\varepsilon_{1t} - \sum_{j \in \mathcal{D}} \alpha_j \varepsilon_{jt}\Bigr).\]

The donor outcomes \(y_{jt}\) are noisy proxies of \(\boldsymbol{\lambda}_t\): they carry the idiosyncratic errors \(\varepsilon_{jt}\), which also appear in the residual. Regressing \(y_{1t}\) on them is therefore an errors-in-variables regression, and the OLS/WLS weights are inconsistent even as \(T_0 \to \infty\) (Ferman and Pinto). PROXIMAL breaks this correlation with an instrument.
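The errors-in-variables point can be checked numerically. The sketch below is a toy simulation, not part of mlsynth: it assumes one latent factor, one donor, unit loadings (so the true weight is 1) and unit-variance noise, all of which are illustrative choices. Even with a very long pre-period the OLS weight settles near 0.5 rather than 1.

```python
import numpy as np

rng = np.random.default_rng(0)
T0 = 20_000                                  # deliberately huge pre-period
noise = 1.0                                  # sd of idiosyncratic errors (assumed)

lam = rng.normal(size=T0)                    # one latent factor
y1 = lam + rng.normal(scale=noise, size=T0)  # treated unit, loading 1
w = lam + rng.normal(scale=noise, size=T0)   # single donor, loading 1 -> true alpha = 1

# No-intercept OLS of the treated outcome on the donor outcome.
alpha_ols = float(w @ y1 / (w @ w))

# Population limit is var(lam) / (var(lam) + noise**2) = 0.5, not 1.
print(f"true alpha = 1.000, OLS alpha = {alpha_ols:.3f}")
```

The attenuation factor depends only on the signal-to-noise ratio of the donor, not on \(T_0\), which is why a longer pre-period does not help.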

Mathematical Formulation

Proximal Inference (PI)

Suppose we observe proxies \(\mathbf{Z}_{0t}\) – e.g. the outcomes of controls excluded from the donor pool, or contemporaneous covariates – that are associated with the units only through \(\boldsymbol{\lambda}_t\) in the pre-period. Then the pre-period residual \(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha}\) is orthogonal to the proxies, giving the moment condition

\[\mathbb{E}\!\left[\mathbf{Z}_{0t}\bigl(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha}\bigr)\right] = 0, \qquad t \in \mathcal{T}_1.\]

Unlike the OLS normal equation \(\mathbb{E}[\mathbf{W}_t(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha})] = 0\), this estimating function is mean-zero at the truth because \(\mathbf{Z}_{0t}\) is uncorrelated with the measurement error. Solving it by GMM yields a consistent \(\widehat{\boldsymbol{\alpha}}\), and the ATT is the mean post-period gap

\[\widehat{\tau} = \frac{1}{T - T_0} \sum_{t \in \mathcal{T}_2} \bigl(y_{1t} - \mathbf{W}_t^\top \widehat{\boldsymbol{\alpha}}\bigr).\]
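As a minimal sketch of the PI moment condition, the snippet below solves it by two-stage least squares on simulated data and then takes the mean post-period gap. The function name `pi_fit`, the data-generating process and the sample sizes are hypothetical, and this is plain NumPy rather than mlsynth's GMM routine.

```python
import numpy as np

def pi_fit(y, W, Z0, T0):
    """Solve the pre-period moment E[Z0 (y - W'alpha)] = 0 by 2SLS; return (alpha, att)."""
    y_pre, W_pre, Z_pre = y[:T0], W[:T0], Z0[:T0]
    # First stage: project the donor outcomes onto the proxies.
    W_hat = Z_pre @ np.linalg.lstsq(Z_pre, W_pre, rcond=None)[0]
    # Second stage: regress the treated outcome on the projected donors.
    alpha = np.linalg.lstsq(W_hat, y_pre, rcond=None)[0]
    # ATT: mean post-period gap between observed and synthetic outcome.
    att = float(np.mean(y[T0:] - W[T0:] @ alpha))
    return alpha, att

rng = np.random.default_rng(1)
T0, T, F, noise = 5000, 5500, 2, 0.5              # hypothetical sizes
lam = rng.normal(size=(T, F))                      # latent factors
y = lam.sum(axis=1) + rng.normal(scale=noise, size=T)
y[T0:] += 1.0                                      # constant true effect of 1
W = lam + rng.normal(scale=noise, size=(T, F))     # donors (true alpha = [1, 1])
Z0 = lam + rng.normal(scale=noise, size=(T, F))    # proxies with independent noise

alpha_ols = np.linalg.lstsq(W[:T0], y[:T0], rcond=None)[0]
alpha_pi, att_pi = pi_fit(y, W, Z0, T0)
print("OLS weights:", alpha_ols.round(3))          # attenuated toward zero
print("PI weights: ", alpha_pi.round(3), " ATT:", round(att_pi, 3))
```

With as many proxies as donors the system is just-identified and 2SLS coincides with the GMM solution; with more proxies than donors an efficient GMM weighting would differ from this sketch.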

Adding Surrogates (PIS)

Surrogates \(\mathbf{X}_t\) are post-treatment series driven by the same latent factors \(\boldsymbol{\rho}_t\) as the treatment effect:

\[y^I_{1t} - y^N_{1t} = \boldsymbol{\rho}_t^\top \boldsymbol{\theta} + \delta_t, \qquad \mathbf{X}_t = \boldsymbol{\Phi}^\top \boldsymbol{\rho}_t + \boldsymbol{\epsilon}_{X,t}.\]

With surrogate proxies \(\mathbf{Z}_{1t}\) instrumenting \(\mathbf{X}_t\), the effect coefficient \(\boldsymbol{\gamma}\) (with \(\boldsymbol{\Phi} \boldsymbol{\gamma} = \boldsymbol{\theta}\)) is identified by a second, post-period moment. The stacked conditions are

\[\mathbb{E}\!\left[\mathbf{Z}_{0t}\bigl(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha}\bigr)\right] = 0,\ t \in \mathcal{T}_1, \qquad \mathbb{E}\!\left[\mathbf{Z}_{1t}\bigl(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha} - \mathbf{X}_t^\top \boldsymbol{\gamma}\bigr)\right] = 0,\ t \in \mathcal{T}_2,\]

and the ATT is \(\widehat{\tau} = (T - T_0)^{-1} \sum_{t \in \mathcal{T}_2} \mathbf{X}_t^\top \widehat{\boldsymbol{\gamma}}\).
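The stacked conditions can be illustrated with a sequential solve: the pre-period moment pins down \(\boldsymbol{\alpha}\), and the post-period moment then pins down \(\boldsymbol{\gamma}\) given \(\boldsymbol{\alpha}\). The sketch assumes a just-identified system and pure (uncontaminated) surrogates; the helper `iv` and the simulated data are hypothetical and not mlsynth code.

```python
import numpy as np

def iv(y, R, Z):
    """2SLS solution of the moment E[Z (y - R'beta)] = 0."""
    R_hat = Z @ np.linalg.lstsq(Z, R, rcond=None)[0]
    return np.linalg.lstsq(R_hat, y, rcond=None)[0]

rng = np.random.default_rng(2)
T0, T, F, noise = 3000, 6000, 2, 0.3             # hypothetical sizes
post = np.arange(T) >= T0
lam = rng.normal(size=(T, F))                     # donor factors
rho = 1.0 + rng.normal(size=T)                    # effect factor, mean 1

y = lam.sum(axis=1) + rho * post + rng.normal(scale=noise, size=T)
W = lam + rng.normal(scale=noise, size=(T, F))    # donors
Z0 = lam + rng.normal(scale=noise, size=(T, F))   # donor proxies
X = (rho * post)[:, None] + rng.normal(scale=noise, size=(T, 1))  # surrogate
Z1 = rho[:, None] + rng.normal(scale=noise, size=(T, 1))          # surrogate proxy

# Pre-period moment: donor weights.
alpha = iv(y[~post], W[~post], Z0[~post])
# Post-period moment: surrogate coefficient on the donor-adjusted outcome.
resid_post = y[post] - W[post] @ alpha
gamma = iv(resid_post, X[post], Z1[post])

att_pis = float(np.mean(X[post] @ gamma))
true_att = float(rho[post].mean())
print(f"true ATT = {true_att:.3f}, PIS ATT = {att_pis:.3f}")
```

In an over-identified system the joint GMM solution would generally differ from this two-step one.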

Post-Treatment-Only (PIPost)

Because the post-period outcome carries both a latent-factor component (matched by donors) and a surrogate-driven effect component, both \(\boldsymbol{\alpha}\) and \(\boldsymbol{\gamma}\) can be estimated from a single post-period IV fit, using \((\mathbf{Z}_{0t}, \mathbf{Z}_{1t})\) to instrument \((\mathbf{W}_t, \mathbf{X}_t)\):

\[\begin{split}\mathbb{E}\!\left[ \begin{pmatrix} \mathbf{Z}_{0t} \\ \mathbf{Z}_{1t} \end{pmatrix} \bigl(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\alpha} - \mathbf{X}_t^\top \boldsymbol{\gamma}\bigr)\right] = 0, \qquad t \in \mathcal{T}_2.\end{split}\]

This is the most economical method – it needs no pre-period – but also the least efficient, since it discards pre-treatment information.
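A minimal sketch of the single post-period fit, under the same kind of toy assumptions (just-identified, pure surrogates, simulated data with hypothetical sizes): stack the instruments and regressors and solve the sample moment in closed form. No pre-period observations are generated at all.

```python
import numpy as np

rng = np.random.default_rng(3)
n_post, F, noise = 4000, 2, 0.3                   # post-period only (hypothetical sizes)
lam = rng.normal(size=(n_post, F))                # donor factors
rho = 1.0 + rng.normal(size=n_post)               # effect factor, mean 1

y = lam.sum(axis=1) + rho + rng.normal(scale=noise, size=n_post)
W = lam + rng.normal(scale=noise, size=(n_post, F))
Z0 = lam + rng.normal(scale=noise, size=(n_post, F))
X = rho[:, None] + rng.normal(scale=noise, size=(n_post, 1))
Z1 = rho[:, None] + rng.normal(scale=noise, size=(n_post, 1))

R = np.hstack([W, X])     # regressors: donors and surrogates
Z = np.hstack([Z0, Z1])   # instruments: donor proxies and surrogate proxies

# Just-identified case: the sample moment Z'(y - R beta) = 0 has a closed form.
beta = np.linalg.solve(Z.T @ R, Z.T @ y)
alpha, gamma = beta[:F], beta[F:]

att_pipost = float(np.mean(X @ gamma))
print(f"true ATT = {rho.mean():.3f}, PIPost ATT = {att_pipost:.3f}")
```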

Inference: GMM Sandwich with HAC

Each method stacks its moment conditions into \(\mathbf{U}_t(\boldsymbol{\theta})\) for parameters \(\boldsymbol{\theta} = (\boldsymbol{\alpha}, \boldsymbol{\gamma}, \tau)\) (here \(\boldsymbol{\theta}\) denotes the full parameter vector, not the effect loading of the PIS section) and solves the GMM problem \(\widehat{\boldsymbol{\theta}} \coloneqq \operatorname*{argmin}_{\boldsymbol{\theta}}\, \bar{\mathbf{U}}(\boldsymbol{\theta})^\top \boldsymbol{\Omega}^{-1} \bar{\mathbf{U}}(\boldsymbol{\theta})\). Standard errors come from the sandwich variance

\[\mathrm{Cov} = \mathbf{G}^{-1} \boldsymbol{\Omega} \bigl(\mathbf{G}^{-1}\bigr)^\top, \qquad \mathrm{SE}(\widehat{\tau}) = \sqrt{\frac{\mathrm{Cov}[-1,-1]}{T}},\]

where \(\mathbf{G}\) is the Jacobian of the moment conditions and \(\boldsymbol{\Omega}\) is the heteroskedasticity- and autocorrelation-consistent (HAC) long-run variance of the fitted moments \(\mathbf{g}_t \coloneqq \mathbf{U}_t(\widehat{\boldsymbol{\theta}})\),

\[\boldsymbol{\Omega} = \frac{1}{T} \sum_{\ell=-J}^{J} k(\ell, J) \sum_{t} \mathbf{g}_t \mathbf{g}_{t+\ell}^\top,\]

with \(k(\cdot)\) the Bartlett kernel and bandwidth \(J = \bigl\lfloor 4 (\,(T - T_0)/100\,)^{2/9} \bigr\rfloor\). (For PIPost the normalization uses the post-period count \(T - T_0\) in place of \(T\).) The HAC middle is what makes the intervals valid under serially correlated errors.
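The HAC middle can be sketched in a few lines of NumPy. The helper names `bartlett_hac` and `nw_bandwidth` are hypothetical, the moments are left uncentered as in the formula above, and the Bartlett weight \(1 - |\ell|/(J+1)\) is the usual Newey-West normalization, assumed here rather than taken from the mlsynth source.

```python
import numpy as np

def nw_bandwidth(n_post):
    """Bandwidth rule J = floor(4 * (n_post / 100) ** (2 / 9))."""
    return int(np.floor(4 * (n_post / 100) ** (2 / 9)))

def bartlett_hac(g, J):
    """Long-run variance of the moment rows g (T x m) with Bartlett weights."""
    g = np.asarray(g, dtype=float)
    T = g.shape[0]
    omega = g.T @ g / T                        # lag-0 term
    for lag in range(1, J + 1):
        weight = 1.0 - lag / (J + 1)           # Bartlett kernel (assumed normalization)
        gamma_l = g[lag:].T @ g[:-lag] / T     # lag-l autocovariance
        omega += weight * (gamma_l + gamma_l.T)  # covers +l and -l
    return omega

# Toy check on serially correlated moments: an AR(1) series with coefficient 0.6.
rng = np.random.default_rng(7)
e = rng.normal(size=400)
g = np.zeros((400, 1))
for t in range(1, 400):
    g[t, 0] = 0.6 * g[t - 1, 0] + e[t]

J = nw_bandwidth(100)
naive = bartlett_hac(g, 0)[0, 0]   # ignores autocorrelation
hac = bartlett_hac(g, J)[0, 0]     # adds the weighted autocovariances
print(f"J = {J}, naive = {naive:.3f}, HAC = {hac:.3f}")
```

With positively autocorrelated moments the HAC value exceeds the lag-0 variance, which is what widens the intervals relative to a white-noise formula.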

Assumptions

Assumption 1 (interactive fixed effects). The untreated outcome obeys \(y^N_{jt} = \boldsymbol{\mu}_j^\top \boldsymbol{\lambda}_t + \varepsilon_{jt}\) with \(\mathbb{E}[\varepsilon_{jt} \mid \boldsymbol{\lambda}_t] = 0\), and there is no interference (the treated unit’s status does not affect controls).

Remark. The latent factor \(\boldsymbol{\lambda}_t\) is the unmeasured confounder: it both drives the outcome and is associated with treatment timing. This is the standard SC data-generating model; PROXIMAL does not need it to be stationary, so trending or non-stationary factors are allowed.

Assumption 2 (existence of a synthetic control). There exist weights \(\boldsymbol{\alpha}\) with \(\boldsymbol{\mu}_1 = \sum_{j \in \mathcal{D}} \alpha_j \boldsymbol{\mu}_j\) (and, for surrogates, \(\boldsymbol{\gamma}\) with \(\boldsymbol{\Phi} \boldsymbol{\gamma} = \boldsymbol{\theta}\)).

Remark. A necessary condition is that the donor pool be at least as large as the number of latent factors (\(|\mathcal{D}| \ge \dim \boldsymbol{\lambda}_t\)), and likewise that there be at least as many surrogates as effect factors. Weights need not be non-negative or sum to one – the simplex is optional, used only for interpretability or to avoid extrapolation.

Assumption 3 (valid proxies). The proxies satisfy \(\mathbf{Z}_{0t} \perp\!\!\!\perp \{y_{1t}, \mathbf{W}_t\} \mid \boldsymbol{\lambda}_t\) for \(t \in \mathcal{T}_1\) (and analogously for \(\mathbf{Z}_{1t}\) in the post-period).

Remark. Proxies must touch the units only through the latent factor – they carry information about \(\boldsymbol{\lambda}_t\) but have no direct causal link to the treated outcome. Outcomes of controls excluded from the donor pool (e.g. units dropped for similar interventions or spillover risk) and treatment-free contemporaneous covariates are natural candidates. Proxy choice is a pre-specified, domain-knowledge decision, not a data-driven search.

Assumption 4 (relevance / completeness). The cross-moment \(\mathbb{E}[\mathbf{Z}_{0t} \mathbf{W}_t^\top]\) has full column rank (and a completeness condition holds for nonparametric identification).

Remark. This is the instrument-relevance condition: the proxies must be strongly associated with the latent factor, so that variation in \(\mathbf{W}_t\) is recoverable from variation in \(\mathbf{Z}_{0t}\). It fails precisely when the proxies are unrelated to \(\boldsymbol{\lambda}_t\), in which case they cannot purge the measurement error.
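One informal way to probe relevance is to look at the singular values of the sample cross-moment between proxies and donors on the pre-period. The sketch below is a diagnostic idea under toy assumptions, not an mlsynth feature; the function name and the simulated proxies are hypothetical.

```python
import numpy as np

def cross_moment_singular_values(Z0_pre, W_pre):
    """Singular values of the sample analogue of E[Z0 W'] on the pre-period."""
    M = Z0_pre.T @ W_pre / Z0_pre.shape[0]
    return np.linalg.svd(M, compute_uv=False)

rng = np.random.default_rng(4)
T0, F, noise = 5000, 2, 0.3                             # hypothetical sizes
lam = rng.normal(size=(T0, F))
W = lam + rng.normal(scale=noise, size=(T0, F))         # donors
Z_good = lam + rng.normal(scale=noise, size=(T0, F))    # loads on the factor
Z_bad = rng.normal(size=(T0, F))                        # unrelated to the factor

sv_good = cross_moment_singular_values(Z_good, W)
sv_bad = cross_moment_singular_values(Z_bad, W)
print("relevant proxies:  ", sv_good.round(3))   # both well away from zero
print("irrelevant proxies:", sv_bad.round(3))    # both close to zero
```

A smallest singular value near zero signals a near rank-deficient cross-moment, the proximal analogue of a weak instrument.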

Assumption 5 (stationary, weakly dependent errors). The error processes are stationary and weakly dependent.

Remark. This is weaker than i.i.d. errors: it permits serial correlation, which is why inference uses the HAC variance rather than a white-noise formula. The latent factors themselves may still be non-stationary.

Contaminated surrogates

In practice “pure” surrogates are rare. Often a surrogate is an alternative outcome of the treated unit, or the outcome of another affected unit, and so is contaminated by the donor latent factor \(\boldsymbol{\lambda}_t\) as well as the effect factor \(\boldsymbol{\rho}_t\) (Appendix A.3 of [LiuTchetgenVar]). mlsynth handles this by residualizing the surrogate outcomes against the donor proxies and donor outcomes on the pre-period (a confounding-bridge projection) before the surrogate stage, so the surrogates used downstream carry the effect signal net of \(\boldsymbol{\lambda}_t\).
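One reading of that residualization step is sketched below: estimate, on the pre-period, a bridge matrix that maps donor outcomes to surrogate outcomes using the donor proxies as instruments, then subtract the fitted donor component everywhere. This is an illustration under simulated data, not mlsynth's implementation; the helper `iv`, the loadings `Theta` and the sizes are assumptions.

```python
import numpy as np

def iv(Y, R, Z):
    """2SLS solution of E[Z (Y - R'B)] = 0; Y may have several columns."""
    R_hat = Z @ np.linalg.lstsq(Z, R, rcond=None)[0]
    return np.linalg.lstsq(R_hat, Y, rcond=None)[0]

rng = np.random.default_rng(5)
T0, T, F, H, noise = 3000, 6000, 2, 2, 0.3
post = np.arange(T) >= T0
lam = rng.normal(size=(T, F))                    # donor factors
rho = 1.0 + rng.normal(size=T)                   # effect factor
Theta = np.array([[0.6, 0.4], [0.4, 0.6]])       # contamination loadings (assumed)

W = lam + rng.normal(scale=noise, size=(T, F))
Z0 = lam + rng.normal(scale=noise, size=(T, F))
# Contaminated surrogates: load on lam always, and on rho after treatment.
X = lam @ Theta + np.outer(rho * post, np.ones(H)) + rng.normal(scale=noise, size=(T, H))

# Pre-period bridge: B is F x H and approximates Theta.
B = iv(X[~post], W[~post], Z0[~post])
X_clean = X - W @ B                              # surrogates net of the donor factor

def corr_with_factor(M):
    """Absolute post-period correlation between a surrogate and the first factor."""
    return abs(np.corrcoef(M[post, 0], lam[post, 0])[0, 1])

print("raw surrogate vs factor:  ", round(corr_with_factor(X), 3))
print("clean surrogate vs factor:", round(corr_with_factor(X_clean), 3))
```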

Example

The block below is self-contained: simulate one panel from the surrogate data-generating process of [LiuTchetgenVar] – two trending donor factors \(\boldsymbol{\lambda}_t\), one effect factor \(\boldsymbol{\rho}_t\) with mean one (so the true ATT is \(\approx 1\)), and contaminated surrogates that load on both – then fit PROXIMAL and read off the ATT and standard error for the three methods fitted here (PI, PIS and PIPost).

import numpy as np
import pandas as pd
from mlsynth import PROXIMAL

rng = np.random.default_rng(4)
F, T0, T, H = 2, 100, 200, 2            # donor factors, pre, total, surrogates
post = np.arange(T) >= T0
noise = 0.3

lam = np.log(np.arange(1, T + 1))[:, None] + rng.normal(size=(T, F))  # trending factors
rho = 1.0 + rng.normal(size=T)                                        # effect factor, mean 1
Theta = np.array([[0.6, 0.4], [0.4, 0.6]])                            # surrogate contamination

Y = lam.sum(1) + rng.normal(scale=noise, size=T)
Y[post] += rho[post]                                                  # apply the effect
true_att = rho[post].mean()

W  = lam + rng.normal(scale=noise, size=(T, F))     # donor outcomes
Z0 = lam + rng.normal(scale=noise, size=(T, F))     # donor proxies
X  = lam @ Theta + np.outer(rho * post, np.ones(H)) + rng.normal(scale=noise, size=(T, H))
Z1 = np.outer(rho, np.ones(H)) + lam @ Theta + rng.normal(scale=noise, size=(T, H))

# Long panel: each donor unit carries (outcome=W, donorproxy=Z0); each surrogate
# unit carries (donorproxy column = surrogate outcome X, surrogatevar = Z1).
rows = []
for t in range(T):
    rows.append({"unit": "treated", "time": t, "y": Y[t], "dp": 0.0, "sv": 0.0,
                 "treat": int(post[t])})
    for j in range(F):
        rows.append({"unit": f"donor{j}", "time": t, "y": W[t, j], "dp": Z0[t, j],
                     "sv": 0.0, "treat": 0})
    for k in range(H):
        rows.append({"unit": f"surr{k}", "time": t, "y": 0.0, "dp": X[t, k],
                     "sv": Z1[t, k], "treat": 0})
df = pd.DataFrame(rows)

res = PROXIMAL({
    "df": df, "outcome": "y", "treat": "treat", "unitid": "unit", "time": "time",
    "methods": ["PI", "PIS", "PIPost"],
    "donors": [f"donor{j}" for j in range(F)],
    "surrogates": [f"surr{k}" for k in range(H)],
    "vars": {"donorproxies": ["dp"], "surrogatevars": ["sv"]},
    "display_graphs": False,
}).fit()

print(f"true ATT = {true_att:.3f}")
for name, fit in res.methods.items():
    print(f"{name:6s} ATT = {fit.att:+.3f}  SE = {fit.att_se:.3f}")

A representative run prints (true ATT ≈ 1.05):

PI     ATT = +1.001  SE = 0.138
PIS    ATT = +1.018  SE = 0.129
PIPost ATT = +1.080  SE = 0.120

res is a PROXIMALResults: res.pi / res.pis / res.pipost hold the per-method ProximalMethodFit objects, res.methods maps the names that ran, and convenience accessors (res.att, res.att_se, res.donor_weights, res.att_by_method()) forward to the headline PI fit.

Empirical Illustration: Panic of 1907

[LiuTchetgenVar] apply the surrogate method to the Panic of 1907, using data from [fohlin2021]. The crisis brought down the Knickerbocker Trust, a major New York bank. We have log stock prices for 59 trusts, with Knickerbocker as the treated unit. Two other trusts also suffered bank runs and seven were tied to major firms; setting these aside leaves 49 potential controls, and dropping one trust that is missing a period leaves the 48 donors used below. The logged bid price of these controls serves as the donor proxy for Knickerbocker’s log price – a sensible proxy, since the bid reflects macro forces driving the overall price.

import pandas as pd
import numpy as np
from mlsynth import PROXIMAL

file_path = "https://github.com/jgreathouse9/mlsynth/raw/refs/heads/main/basedata/trust.dta"
df = pd.read_stata(file_path)
df = df[df["ID"] != 1]  # Drop the unbalanced unit

surrogates = df[df['introuble'] == 1]['ID'].unique().tolist()  # affected trusts
donors = df[df['type'] == "normal"]['ID'].unique().tolist()    # pure controls

log_cols = ["bid_itp", "ask_itp"]  # named to avoid shadowing the built-in vars()
df[log_cols] = df[log_cols].apply(np.log)  # log, per the paper
df['Panic'] = np.where((df['time'] > 229) & (df['ID'] == 34), 1, 0)

treat, outcome, unitid, time = "Panic", "prc_log", "ID", "date"
var_dict = {"donorproxies": ["bid_itp"], "surrogatevars": ["ask_itp"]}

# Donors-only proximal inference (PI)
res_pi = PROXIMAL({
    "df": df, "treat": treat, "time": time, "outcome": outcome, "unitid": unitid,
    "methods": ["PI"],
    "treated_color": "black", "counterfactual_color": ["blue"],
    "display_graphs": True, "vars": var_dict, "donors": donors,
}).fit()

# Adding surrogates (PI, PIS, PIPost)
res_surr = PROXIMAL({
    "df": df, "treat": treat, "time": time, "outcome": outcome, "unitid": unitid,
    "methods": ["PI", "PIS", "PIPost"],
    "treated_color": "black", "counterfactual_color": ["blue", "red", "lime"],
    "display_graphs": True, "vars": var_dict, "donors": donors,
    "surrogates": surrogates,  # the affected trusts, repurposed as surrogates
}).fit()

print(res_surr.att_by_method())

This pulls the data straight from the repository (48 pure-control donors, 3 affected trusts as surrogates) and prints the ATT for each method:

{'PI': -1.148, 'PIS': -1.148, 'PIPost': -1.220}

which closely matches the paper’s full-window Table 3 estimates (PI -1.138, PI-S -1.134, PI-P -1.220): PIPost agrees to three decimals, while PI and PIS differ from the published values by about 0.01.

Using the bid price as a proxy, the synthetic control fits the pre-intervention series well. The affected trusts – which would be discarded in a classical SC analysis because they violate the no-interference assumption – are instead repurposed as surrogates: they do not enter the donor pool, but their post-intervention movements help pin down the latent effect factors. The asking price of those trusts is their surrogate proxy. Even using only post-intervention data (PIPost), the estimate largely agrees with the donors-only proximal inference.

Single Proxy Synthetic Control (SPSC)

PI, PIS and PIPost all require two proxy types: outcome proxies (the donors) and a separate group of treatment/surrogate proxies (\(\mathbf{Z}_0\), \(\mathbf{Z}_1\)) to instrument them. Park and Tchetgen Tchetgen [SPSC] show this can be reduced to a single proxy type – the donor outcomes alone – by a clever change of perspective.

Instead of viewing the donors as proxies of a latent factor, SPSC views them as error-prone proxies of the treated unit’s own treatment-free potential outcome \(y^N_{1t}\). It posits a synthetic-control bridge function \(h^\star\) that is conditionally unbiased for that outcome, \(y^N_{1t} = \mathbb{E}[h^\star(\mathbf{W}_t) \mid y^N_{1t}]\). With a linear bridge \(h^\star(\mathbf{W}_t) = \mathbf{W}_t^\top \boldsymbol{\gamma}\), this is the “reverse” measurement-error regression

\[\mathbf{W}_t^\top \boldsymbol{\gamma} = y^N_{1t} + \bar{\varepsilon}_t, \qquad \mathbb{E}[\bar{\varepsilon}_t \mid y^N_{1t}] = 0,\]

so the treated unit’s own pre-treatment outcome is a valid instrument for the donors – no second proxy group is needed. The identifying moment (Theorem 3.1 of [SPSC]) is \(\mathbb{E}[\,\phi(y_{1t})\,(y_{1t} - \mathbf{W}_t^\top \boldsymbol{\gamma})\,] = 0\) over \(t \in \mathcal{T}_1\), where \(\phi(\cdot)\) is a basis of the treated outcome (the identity by default).

Why use it. SPSC trades the need for a curated proxy/surrogate group for a single, always-available instrument – the treated series itself – which makes it the most practical proximal method when no natural second proxy group exists. It pairs naturally with a conformal prediction interval for the per-period effect (spsc_conformal=True), valid even with a short post-period.

Estimation. Because there are typically far fewer instruments than donors, \(\boldsymbol{\gamma}\) is estimated by a ridge-regularized GMM (penalty selected by leave-one-out cross-validation), and the ATT is the mean post-period gap with a GMM sandwich (HAC) standard error. Two variants handle trends: SPSC-NoDT uses the raw outcome as the instrument, while SPSC-DT first residualizes the treated outcome against a cubic B-spline time trend – essential when the series is non-stationary (the analogue of the time-varying estimating function in \(\Psi_{\text{pre}}\)).
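The ridge-GMM step can be illustrated with a minimal sketch. It assumes the identity basis (the treated pre-period outcome is the only instrument), a fixed penalty lam rather than the leave-one-out choice, and no detrending; the function name spsc_ridge_gmm_sketch is hypothetical and is not part of mlsynth.

```python
import numpy as np

def spsc_ridge_gmm_sketch(y_pre, W_pre, lam):
    """Ridge-penalized GMM for the linear bridge W @ gamma (illustrative only).

    Sample moment: mean_t[ y_t * (y_t - W_t @ gamma) ] = 0 over the pre-period.
    With one moment and N donors the system is under-identified, so we minimize
    ||b - A @ gamma||^2 + lam * ||gamma||^2, which has a closed form.
    """
    y_pre = np.asarray(y_pre, dtype=float)
    W_pre = np.asarray(W_pre, dtype=float)
    T0, N = W_pre.shape
    Phi = y_pre[:, None]            # (T0, 1): identity basis of the treated outcome
    A = Phi.T @ W_pre / T0          # (1, N): instrument-by-donor cross moments
    b = Phi.T @ y_pre / T0          # (1,):   instrument-by-outcome cross moment
    return np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ b)

# Toy pre-period panel (synthetic numbers, not the trust data).
rng = np.random.default_rng(0)
W_demo = rng.normal(size=(40, 5)) + 1.0
y_demo = W_demo @ np.array([0.4, 0.3, 0.2, 0.1, 0.0]) + 0.1 * rng.normal(size=40)
gamma_demo = spsc_ridge_gmm_sketch(y_demo, W_demo, lam=1e-3)
print(gamma_demo.round(3))
```

The penalty is what makes the solve well-posed here: without it, a single moment cannot pin down N donor weights.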

Select it with methods=["SPSC"]. Unlike PI/PIS/PIPost it needs no proxy variables at all – just the treated series and the donor pool:

import pandas as pd
import numpy as np
from mlsynth import PROXIMAL

raw = pd.read_stata("https://github.com/jgreathouse9/mlsynth/raw/refs/heads/main/basedata/trust.dta")
raw["prc_log"] = raw["prc_log"].astype(float)

# Park & Tchetgen Tchetgen's window: 1906-01-05 to 1908-12-30 (T0=217).
win = raw[(raw["date"] >= "1906-01-05") & (raw["date"] <= "1908-12-31")].copy()

# Treated unit = average log price of the two most-affected trusts.
treated = (win[win["type"].isin(["Knickerbocker", "Trust Co of Am"])]
           .groupby(["date", "time"], as_index=False)
           .agg(prc_log=("prc_log", "mean")))
treated["ID"] = "treated"

# Donors = the weakly-connected "normal" trusts (drop the one unbalanced unit).
donors_df = win[(win["type"] == "normal") & (win["ID"] != 1)][
    ["ID", "date", "time", "prc_log"]].copy()
donors_df["ID"] = donors_df["ID"].astype(str)

df = pd.concat([treated[["ID", "date", "time", "prc_log"]], donors_df], ignore_index=True)
df["Panic"] = np.where((df["time"] >= 230) & (df["ID"] == "treated"), 1, 0)
donor_ids = sorted(donors_df["ID"].unique())

res = PROXIMAL({
    "df": df, "treat": "Panic", "time": "date", "outcome": "prc_log", "unitid": "ID",
    "methods": ["SPSC"],          # SPSC alone -- no proxies needed
    "donors": donor_ids,
    "spsc_detrend": True,         # SPSC-DT
    "display_graphs": False,
}).fit()

print(res.spsc.att, res.spsc.att_se, res.spsc.metadata["variant"])

This reproduces the paper’s Table 3: SPSC-DT ATT -0.815 (SE 0.067) and, with spsc_detrend=False, SPSC-NoDT ATT -0.812 (SE 0.085) – against the paper’s -0.816 / 0.066 and -0.813 / 0.084.

Conformal intervals. Set spsc_conformal=True (optionally spsc_conformal_periods=[...] to cover only some post-periods) to attach pointwise prediction intervals for the per-period effect, returned on res.spsc.metadata["conformal"] as {"periods", "lower", "upper"}. Over the Panic post-period these reproduce the average interval width of the paper’s Figure 3 (≈ 0.07 for SPSC-DT). The inversion re-fits the weights on a grid of candidate effects per period, so it is opt-in for cost.
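A small helper for summarizing the returned intervals might look like the sketch below. It assumes only the {"periods", "lower", "upper"} layout described above, with "lower" and "upper" aligned element-wise to "periods"; the helper names and the example numbers are hypothetical.

```python
import numpy as np

def average_interval_width(conformal):
    """Mean width of the pointwise intervals across the covered periods."""
    lower = np.asarray(conformal["lower"], dtype=float)
    upper = np.asarray(conformal["upper"], dtype=float)
    return float(np.mean(upper - lower))

def periods_excluding_zero(conformal):
    """Periods whose interval lies entirely above or below zero (pointwise check)."""
    lower = np.asarray(conformal["lower"], dtype=float)
    upper = np.asarray(conformal["upper"], dtype=float)
    keep = (lower > 0) | (upper < 0)
    return [p for p, k in zip(conformal["periods"], keep) if k]

# Hypothetical stand-in for res.spsc.metadata["conformal"].
example = {"periods": [230, 231, 232],
           "lower": [-0.90, -0.85, -0.02],
           "upper": [-0.80, -0.78, 0.05]}
print(average_interval_width(example))
print(periods_excluding_zero(example))
```

Because the intervals are pointwise, the second helper flags individual periods and should not be read as a joint test over the whole post-period.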

Nonparametric (series) SPSC. By default the treated unit’s own outcome enters the moment conditions linearly – the reference’s identity Y.basis. Park & Tchetgen Tchetgen’s supplement (S1.6) notes that a rich basis of the outcome – “polynomials, trigonometric functions, splines, or wavelets” – spans a larger space of the latent factor and so identifies a bridge that need not be linear. Set spsc_basis_degree=p (\(p \ge 2\)) to replace the instrument with the polynomial sieve \([\,y_{1t},\,y_{1t}^2,\,\dots,\,y_{1t}^p\,]\). This over-identifies the ridge-GMM (more moments than donor weights) and is the right choice when the synthetic-control relationship is nonlinear in the donor outcomes; spsc_basis_degree=1 (the default) is bit-for-bit the linear single proxy. The fitted variant is labelled accordingly (res.spsc.metadata["variant"] becomes e.g. "SPSC-DT-NP3"), and the detrending and conformal machinery carry the same sieve.
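For intuition, the polynomial sieve itself is just a matrix of powers of the treated outcome. The sketch below builds it with NumPy; polynomial_sieve is a hypothetical name, and any centering or scaling the package may apply before forming the moments is not shown.

```python
import numpy as np

def polynomial_sieve(y, degree):
    """Return the (T, degree) instrument matrix [y, y^2, ..., y^degree]."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    y = np.asarray(y, dtype=float)
    return np.column_stack([y ** k for k in range(1, degree + 1)])

y_demo = np.array([1.0, 2.0, 3.0])
print(polynomial_sieve(y_demo, 1))   # one column: the linear single proxy
print(polynomial_sieve(y_demo, 3))   # three columns: y, y^2, y^3
```

Each extra column adds one moment condition per period average, which is how a higher degree enlarges the instrument set.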

Doubly Robust Proximal Synthetic Control (DR & PIPW)#

PI, PIS, PIPost and SPSC all rest on getting one model right – an outcome model (the synthetic control). Qiu, Shi, Miao, Dobriban and Tchetgen Tchetgen [DRProx] add a second, complementary nuisance and combine the two so you only need one of them to be correct.

There are two bridges (each augmented with an intercept):

  • the outcome bridge \(h(\mathbf{W}_t) = (1, \mathbf{W}_t)^\top \boldsymbol{\alpha}\) – a pre-period IV fit of the treated outcome on the donors, instrumented by the proxies (the PI idea); and

  • the treatment confounding bridge \(q(\mathbf{Z}_t) = \exp\{(1, \mathbf{Z}_t)^\top \boldsymbol{\beta}\}\) – a covariate-shift / likelihood-ratio weight capturing how the unmeasured confounding shifts at the intervention, solving \(\mathbb{E}_{\text{pre}}[q(\mathbf{Z})(1,\mathbf{W})] = \mathbb{E}_{\text{post}}[(1,\mathbf{W})]\).

They give three estimands:

\[\begin{split}\text{outcome only:}\quad & \tau = \mathbb{E}_{\text{post}}[y_{1t} - h(\mathbf{W})], \\ \text{weighting only (PIPW):}\quad & \tau = \mathbb{E}_{\text{post}}[y_{1t}] - \mathbb{E}_{\text{pre}}[q(\mathbf{Z})\,y_{1t}], \\ \text{doubly robust (DR):}\quad & \tau = \mathbb{E}_{\text{post}}[y_{1t} - h(\mathbf{W})] - \mathbb{E}_{\text{pre}}[q(\mathbf{Z})\{y_{1t} - h(\mathbf{W})\}].\end{split}\]

The DR form is consistent if either \(h\) or \(q\) is correctly specified – not necessarily both. PIPW exposes the weighting-only estimator (no outcome model at all); the outcome-only form is the existing PI.

Estimation. Each is a just-identified GMM (alpha by IV, beta by a small nonlinear solve, the means in closed form), so the parameters solve the moment equations exactly and the ATT standard error is the GMM sandwich with a Bartlett-HAC middle. DR returns the outcome-bridge synthetic control as its counterfactual; PIPW, being a pure weighting estimator, has no imputed trajectory (its counterfactual is NaN).
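Once the two bridges are fitted, the three estimands reduce to sample means. The sketch below takes already-fitted values of h and q as inputs (fitting them is not shown) and evaluates the plug-in versions of the displayed formulas; proximal_estimands is a hypothetical helper, not an mlsynth function.

```python
import numpy as np

def proximal_estimands(y, h_fit, q_fit, post):
    """Plug-in outcome-only, PIPW and DR estimates from fitted bridges.

    y      : treated outcome, shape (T,)
    h_fit  : fitted outcome bridge h(W_t), shape (T,)
    q_fit  : fitted treatment bridge q(Z_t), shape (T,); only pre-period values are used
    post   : boolean mask, True in post-treatment periods
    """
    y, h_fit, q_fit = (np.asarray(a, dtype=float) for a in (y, h_fit, q_fit))
    post = np.asarray(post, dtype=bool)
    pre = ~post
    resid = y - h_fit
    outcome_only = resid[post].mean()
    pipw = y[post].mean() - (q_fit[pre] * y[pre]).mean()
    dr = resid[post].mean() - (q_fit[pre] * resid[pre]).mean()
    return {"outcome": float(outcome_only), "PIPW": float(pipw), "DR": float(dr)}

# Tiny example: h fits the pre-period exactly, so DR equals the outcome-only estimate.
out = proximal_estimands(y=[1.0, 1.0, 3.0, 3.0], h_fit=[1.0, 1.0, 1.0, 1.0],
                         q_fit=[1.0, 1.0, 1.0, 1.0], post=[False, False, True, True])
print(out)
```

The DR line makes the correction term explicit: it subtracts the q-weighted pre-period residual, which is zero whenever the outcome bridge fits the pre-period exactly.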

Both consume the same inputs as PI – donors W and the donor proxies Z – so just add them to methods. The block below is a runnable proof of the agreement claimed in Replication Status: it draws from the reference implementation’s own DGP (DR_Proximal_SC/simulation/normal: true.ATE = 2, AR(1) confounders, \(W_j = 2U_j + \text{noise}\), \(Z_j = 2U_j + \text{noise}\)), runs the packaged PROXIMAL, and checks recovery, Wald coverage, and double robustness:

import numpy as np
import pandas as pd
from mlsynth import PROXIMAL

TRUE = 2.0

def gen(T, rng, nU=2, misspecify=False):
    """Reference DGP of DR_Proximal_SC (simulation/normal); returns (long panel, nU)."""
    T0 = T // 2
    U = np.empty((T, nU)); U[0] = rng.normal(size=nU)
    for t in range(1, T):
        U[t] = 0.1 * U[t - 1] + 0.9 * rng.normal(size=nU)
    sigU = U.sum(1)
    signal = sigU if not misspecify else sigU + 0.7 * sigU ** 2   # nonlinear -> breaks h-bridge
    Y = TRUE * (np.arange(1, T + 1) > T0) + 2 * signal + rng.normal(size=T)
    W = 2 * U + rng.normal(size=(T, nU))                          # donor outcomes
    Z = 2 * U + rng.normal(size=(T, nU))                          # donor proxies
    rows = []
    for t in range(T):
        rows.append({"unit": "treated", "time": t, "y": float(Y[t]), "dp": 0.0,
                     "treat": int(t >= T0)})
        for j in range(nU):
            rows.append({"unit": f"d{j}", "time": t, "y": float(W[t, j]),
                         "dp": float(Z[t, j]), "treat": 0})
    return pd.DataFrame(rows), nU

def fit(df, nU, methods):
    return PROXIMAL({
        "df": df, "outcome": "y", "treat": "treat", "unitid": "unit", "time": "time",
        "methods": methods, "donors": [f"d{j}" for j in range(nU)],
        "vars": {"donorproxies": ["dp"]}, "display_graphs": False,
    }).fit().methods

# (1) recovery + (2) 95% Wald coverage at T=1000
acc = {"DR": [], "PIPW": []}; cov = {"DR": 0, "PIPW": 0}
for r in range(200):
    m = fit(*gen(1000, np.random.default_rng(r)), ["DR", "PIPW"])
    for k in ("DR", "PIPW"):
        acc[k].append(m[k].att)
        cov[k] += abs(m[k].att - TRUE) <= 1.96 * m[k].att_se
for k in ("DR", "PIPW"):
    print(f"{k:5s} mean ATT={np.mean(acc[k]):.3f}  coverage={cov[k]/200:.0%}")
# DR    mean ATT=2.007  coverage=91%
# PIPW  mean ATT=2.007  coverage=99%

# (3) double robustness: misspecify the outcome bridge -> PI collapses, DR holds
pi, dr = [], []
for r in range(120):
    m = fit(*gen(1000, np.random.default_rng(1000 + r), misspecify=True), ["PI", "DR"])
    pi.append(m["PI"].att); dr.append(m["DR"].att)
print(f"misspecified h:  PI={np.mean(pi):.2f} (collapses)  DR={np.mean(dr):.2f} (holds)")
# misspecified h:  PI=4.30 (collapses)  DR=1.99 (holds)

Over-identified / empirical use

The paper’s real analyses (Brazil, Florida, Kansas) use a separate, larger set of proxy units Z than donors W, which makes the GMM over-identified. mlsynth’s DR/PIPW are the just-identified form (Z = the donor proxies, matched to W). In the over-identified regime with many near-collinear control-unit instruments, the GMM minimizer is ill-conditioned and its value is sensitive to the optimizer, so those published point estimates are not bit-reproducible across languages. We therefore validate DR/PIPW synthetically (Path B; see Replication Status) rather than against the empirical tables.

Replication Status#

Note

Reference-code validation (Path A). mlsynth’s PI, PIS and PIPost were checked value-for-value against the authors’ reference implementation (freshtaste/proximal) on identical data-generating draws. Both the ATT and the GMM/HAC standard error match to machine precision for all three methods. A coverage Monte Carlo confirms the inference is correct: nominal-95% Wald intervals attain ≈ 93.8% coverage (PI), identical to the reference. An earlier Jacobian-scaling bug in the GMM sandwich had depressed coverage to 63.8%; fixing it restored the near-nominal figure.

Empirical (Path A, Panic of 1907). Running mlsynth on the trust panel (see Empirical Illustration: Panic of 1907) reproduces the full-window Table 3 of [LiuTchetgenVar] to within 0.015: PI -1.148 vs. -1.138, PI-S -1.148 vs. -1.134, PI-P -1.220 vs. -1.220.

SPSC (Path A, single proxy). SPSC is a value-for-value port of the authors’ reference R package (github.com/qkrcks0218/SPSC) and reproduces its Panic-of-1907 Table 3: SPSC-NoDT ATT -0.812 / SE 0.085 (paper -0.813 / 0.084) and SPSC-DT ATT -0.815 / SE 0.067 (paper -0.816 / 0.066). The tiny ATT gap traces to one donor (48 here vs. 49 in the reference, which keeps a unit that is unbalanced in this build). The conformal prediction intervals of [SPSC] are also ported and reproduce the average interval width of the paper’s Figure 3 (≈ 0.07 for SPSC-DT).

SPSC (Path B, durable IFEM Monte Carlo). The authors ship a self-contained interactive-fixed-effects DGP in their package README (the “Toy Example from Interactive Fixed Effect Models,” \(\mathrm{True.ATT}=3\), a trending donor pool). The durable benchmark spsc_ifem_mc redraws it 60 times and drives mlsynth’s SPSC: both SPSC-DT and SPSC-NoDT recover the true ATT essentially without bias (biases ≈ 0.006 and 0.008), but only the detrended SPSC-DT delivers honest inference – its 95% Wald intervals cover near nominal while SPSC-NoDT under-covers because its constant-gap model is forced through a trending counterfactual. This reproduces the supplement’s central finding ([SPSC] Figures S2-S6): detrending is what buys correct coverage when the untreated trajectories drift.

Simulation (Path B). The robustness claim of [LiuTchetgenVar] Sec. 4.1 reproduces, and is pinned by the durable benchmark proximal_surrogates_mc (the authors’ freshtaste/proximal dgp.py): under a trending latent factor (\(\boldsymbol{\lambda}_t \sim N(\log t, 1)\)), classical SC is biased by the trend (mean ATT ≈ 1.30 against the true 1.0, MSE ≈ 0.19) while PI/PIS/PIPost recover the truth (biases ≲ 0.003) with near-nominal Wald coverage and lower MSE; PIS attains the lowest MSE of the three (≈ 0.05). See Example for a one-draw illustration.

DR & PIPW (Path B) – runnable proof, not a claim. The DR/PIPW agreement is demonstrated by the runnable Monte Carlo above (the Doubly Robust section), which draws from the reference implementation’s own DGP (DR_Proximal_SC/simulation/normal, true.ATE = 2) and drives the packaged PROXIMAL. At T = 1000 over 200 reps both estimators recover the truth – DR and PIPW mean ATT = 2.007 (sd 0.11) – with Wald coverage of 91% (DR) and 99% (PIPW) against the 95% nominal. The double-robustness headline also reproduces: misspecifying the outcome bridge (Y nonlinear in the confounder) biases the outcome-only PI estimator (mean ATT 4.3) while DR stays at 1.99, rescued by the correct treatment-confounding bridge. Copy-paste the block to re-derive these numbers, or run the durable benchmark dr_proximal_mc – it drives the same DGP through the packaged estimators and pins recovery, coverage, and the double-robustness collapse (PI ≈ 4.23 vs DR ≈ 1.96 under misspecification). The over-identified empirical analyses (Brazil/Florida/Kansas) are not bit-reproducible cross-language (ill-conditioned GMM; see the admonition above), so DR/PIPW rest on this synthetic validation.

Per the project’s replication contract (agents/agents_estimators.md), PROXIMAL is considered validated on the strength of the machine-precision agreement with the reference code plus the reproduced simulation behavior.

Core API#

Proximal Inference (PROXIMAL) estimator.

Implements:

Shi, X., Li, K., Miao, W., Hu, M., & Tchetgen Tchetgen, E. (2023). “Theory for Identification and Inference with Synthetic Controls: A Proximal Causal Inference Framework.” arXiv:2108.13935.

Liu, J., Tchetgen Tchetgen, E. J., & Varjao, C. (2023). “Proximal Causal Inference for Synthetic Control with Surrogates.” arXiv:2308.09527.

PROXIMAL treats donor outcomes as negative controls instrumented by donor proxies, and optionally adds surrogate outcomes instrumented by surrogate proxies. It runs up to three methods on the same panel:

  1. PI – Proximal Inference (donors only).

  2. PIS – Proximal Inference with surrogates (full-sample two-stage).

  3. PIPost – post-treatment-only surrogate variant.

PIS and PIPost run only when surrogate units are configured. Every method closes with a GMM sandwich variance for the ATT (HAC/Bartlett middle).

See mlsynth.utils.proximal_helpers for the algorithmic pieces.

class mlsynth.estimators.proximal.PROXIMAL(config: PROXIMALConfig | dict)#

Bases: object

Proximal Inference (PROXIMAL) estimator.

Parameters:

config (PROXIMALConfig or dict) – Configuration object. See mlsynth.config_models.PROXIMALConfig.

Returns:

PROXIMALResults – Container with the PI fit (always) and the PIS / PIPost fits when surrogates are configured, plus convenience accessors forwarding to the headline PI method.

fit() PROXIMALResults#

Run the proximal pipeline and return a PROXIMALResults.

Configuration#

class mlsynth.config_models.PROXIMALConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: str | ~typing.List[str] = <factory>, treated_color: str = 'black', plot: ~mlsynth.config_models.PlotConfig = <factory>, methods: ~typing.Annotated[~typing.List[str], ~annotated_types.MinLen(min_length=1)], donors: ~typing.Annotated[~typing.List[str | int], ~annotated_types.MinLen(min_length=1)], surrogates: ~typing.List[str | int] = <factory>, vars: ~typing.Dict[str, ~typing.List[str]] = <factory>, spsc_detrend: bool = True, spsc_lambda: float | None = None, spsc_spline_df: ~typing.Annotated[int, ~annotated_types.Ge(ge=3)] = 5, spsc_basis_degree: ~typing.Annotated[int, ~annotated_types.Ge(ge=1)] = 1, spsc_conformal: bool = False, spsc_conformal_periods: ~typing.List[int] | None = None)#

Configuration for the Proximal Inference (PROXIMAL) estimator.

classmethod check_methods_and_vars(values: Any) Any#
counterfactual_color: str | List[str]#
donors: List[str | int]#
methods: List[str]#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

spsc_basis_degree: int#
spsc_conformal: bool#
spsc_conformal_periods: List[int] | None#
spsc_detrend: bool#
spsc_lambda: float | None#
spsc_spline_df: int#
surrogates: List[str | int]#
vars: Dict[str, List[str]]#

Result Containers#

PROXIMAL.fit() returns a PROXIMALResults, whose pi / pis / pipost / spsc / dr / pipw fields each hold a ProximalMethodFit (counterfactual, gap, ATT, GMM/HAC standard error, pre/post RMSE, donor weights) for the methods that ran. The prepared panel is exposed on the inputs field as a PROXIMALInputs.

Frozen dataclasses for the Proximal Inference (PROXIMAL) estimator.

PROXIMAL bundles up to three proximal causal-inference estimators that all run on the same prepared panel:

  • PI – Proximal Inference with negative-control donor outcomes W and donor proxies Z0 (Shi, Li, Miao, Hu and Tchetgen Tchetgen 2023, arXiv:2108.13935). A pre-period IV fit imputes the post-period counterfactual.

  • PIS – Proximal Inference with Surrogates. Adds a second stage projecting the treatment effect onto surrogate outcomes X instrumented by surrogate proxies Z1 (Liu, Tchetgen Tchetgen and Varjao 2023, arXiv:2308.09527), estimated on the full sample.

  • PIPost – the post-treatment-only surrogate variant of PIS.

PIS and PIPost run only when surrogate units are configured, so the user can compare the available estimates side by side. Every method closes with a GMM sandwich variance for the ATT (HAC/Bartlett middle), validated value-for-value against the authors’ reference code.

The three layers below (inputs, per-method fit, top-level results) keep the pipeline pluggable and mirror the CLUSTERSC container design.

class mlsynth.utils.proximal_helpers.structures.PROXIMALInputs(y: ndarray, donor_outcomes: ndarray, donor_proxies: ndarray | None, surrogate_outcomes: ndarray | None, surrogate_proxies: ndarray | None, T: int, T0: int, bandwidth: int, time_labels: ndarray, treated_unit_name: Any, donor_names: Sequence, methods: Sequence[str] = ('PI',), spsc_detrend: bool = True, spsc_lambda: float | None = None, spsc_spline_df: int = 5, spsc_basis_degree: int = 1, spsc_conformal: bool = False, spsc_conformal_periods: Sequence[int] | None = None)#

Bases: object

Preprocessed panel data for the proximal pipeline.

Parameters:
  • y (np.ndarray) – Treated-unit outcome over all T periods, shape (T,).

  • donor_outcomes (np.ndarray) – Donor outcomes W, shape (T, n_donors).

  • donor_proxies (np.ndarray) – Donor proxies Z0 (instruments for W), shape (T, n_donors).

  • surrogate_outcomes (np.ndarray or None) – Cleaned surrogate outcomes X, shape (T, n_surrogate_vars); None when no surrogates are configured.

  • surrogate_proxies (np.ndarray or None) – Surrogate proxies Z1 (instruments for X); None when no surrogates are configured.

  • T (int) – Total number of periods.

  • T0 (int) – Number of pre-treatment periods.

  • bandwidth (int) – Bartlett HAC truncation lag used for all GMM standard errors.

  • time_labels (np.ndarray) – Length-T time labels.

  • treated_unit_name (Any) – Identifier of the treated unit.

  • donor_names (Sequence) – Length-n_donors donor labels (column order of donor_outcomes).

  • methods (Sequence of str) – Which estimators to run: any of "PI", "PIS", "PIPost", "SPSC".

  • spsc_detrend (bool) – Whether SPSC detrends the treated outcome against a B-spline time trend (SPSC-DT vs SPSC-NoDT).

  • spsc_lambda (float or None) – log10 ridge penalty for SPSC; None selects it by LOO-CV.

  • spsc_spline_df (int) – Degrees of freedom of the SPSC detrend B-spline basis.

  • spsc_basis_degree (int) – Degree of the polynomial sieve on the SPSC treated-outcome instrument (1 = linear single proxy; >=2 = nonparametric / series SPSC).

  • spsc_conformal (bool) – Whether to compute SPSC conformal prediction intervals.

  • spsc_conformal_periods (Sequence of int or None) – Absolute post-period indices to cover with conformal intervals; None covers every post-treatment period.

T: int#
T0: int#
bandwidth: int#
donor_names: Sequence#
donor_outcomes: ndarray#
donor_proxies: ndarray | None#
property has_surrogates: bool#

True when surrogate outcomes and proxies are both available.

methods: Sequence[str] = ('PI',)#
property n_donors: int#

Number of donor units.

property n_post: int#

Number of post-treatment periods.

spsc_basis_degree: int = 1#
spsc_conformal: bool = False#
spsc_conformal_periods: Sequence[int] | None = None#
spsc_detrend: bool = True#
spsc_lambda: float | None = None#
spsc_spline_df: int = 5#
surrogate_outcomes: ndarray | None#
surrogate_proxies: ndarray | None#
time_labels: ndarray#
treated_unit_name: Any#
y: ndarray#
class mlsynth.utils.proximal_helpers.structures.PROXIMALResults(inputs: ~mlsynth.utils.proximal_helpers.structures.PROXIMALInputs, pi: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None, pis: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None, pipost: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None, spsc: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None = None, dr: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None = None, pipw: ~mlsynth.utils.proximal_helpers.structures.ProximalMethodFit | None = None, selected_variant: str = 'PI', metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

Top-level container returned by mlsynth.PROXIMAL.fit().

Parameters:
  • inputs (PROXIMALInputs) – Preprocessed panel.

  • pi (ProximalMethodFit or None) – Proximal Inference fit (always populated).

  • pis (ProximalMethodFit or None) – Proximal-with-surrogates fit (populated only when surrogates are configured).

  • pipost (ProximalMethodFit or None) – Post-treatment surrogate fit (populated only when surrogates are configured).

  • selected_variant (str) – Which fit is exposed via the convenience aliases att, att_se, counterfactual, gap, donor_weights – one of "PI", "PIS", "PIPost". Defaults to "PI".

  • metadata (dict) – Free-form pipeline diagnostics.

property att: float#

ATT of the primary variant.

att_by_method() Dict[str, float]#

{method: ATT} across the methods that were run.

property att_se: float | None#

ATT standard error of the primary variant.

ci_by_method() Dict[str, Tuple[float, float]]#

{method: (lower, upper)} Wald CIs from the GMM standard errors.

property counterfactual: ndarray#

Counterfactual of the primary variant.

property donor_weights: Dict[Any, float]#

Donor weights of the primary variant.

dr: ProximalMethodFit | None = None#
property gap: ndarray#

Gap of the primary variant.

inputs: PROXIMALInputs#
metadata: Dict[str, Any]#
property methods: Dict[str, ProximalMethodFit]#

{method_name: fit} for the methods that were run, in order.

property mode: str#

Solver mode reported to downstream consumers.

pi: ProximalMethodFit | None#
pipost: ProximalMethodFit | None#
pipw: ProximalMethodFit | None = None#
pis: ProximalMethodFit | None#
property pre_rmse: float#

Pre-treatment RMSE of the primary variant.

se_by_method() Dict[str, float | None]#

{method: ATT standard error} across the methods that were run.

selected_variant: str = 'PI'#
spsc: ProximalMethodFit | None = None#
class mlsynth.utils.proximal_helpers.structures.ProximalMethodFit(name: str, counterfactual: ~numpy.ndarray, gap: ~numpy.ndarray, time_varying_effect: ~numpy.ndarray, att: float, att_se: float | None, pre_rmse: float, post_rmse: float, alpha_weights: ~numpy.ndarray, donor_weights: ~typing.Dict[~typing.Any, float], metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

Single proximal-method (PI / PIS / PIPost) fit output.

Parameters:
  • name (str) – Method identifier ("PI", "PIS", or "PIPost").

  • counterfactual (np.ndarray) – Estimated counterfactual outcome path, shape (T,).

  • gap (np.ndarray) – Observed treated minus counterfactual, shape (T,).

  • time_varying_effect (np.ndarray) – Estimated time-varying treatment effect, shape (T,). For PI this equals gap; for the surrogate methods it is the fitted X gamma series.

  • att (float) – Mean post-treatment gap.

  • att_se (float or None) – GMM/HAC standard error of the ATT (None if inference failed).

  • pre_rmse (float) – Root-mean-squared pre-treatment gap.

  • post_rmse (float) – Root-mean-squared post-treatment gap.

  • alpha_weights (np.ndarray) – Estimated donor coefficients alpha.

  • donor_weights (dict) – Mapping {donor_name: coefficient}.

  • metadata (dict) – Free-form per-method diagnostics.

alpha_weights: ndarray#
att: float#
att_se: float | None#
property ci: Tuple[float, float]#

Two-sided 95% Wald CI for the ATT from the GMM standard error.

counterfactual: ndarray#
donor_weights: Dict[Any, float]#
gap: ndarray#
metadata: Dict[str, Any]#
name: str#
post_rmse: float#
pre_rmse: float#
time_varying_effect: ndarray#
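The Wald interval behind the ci property is simple arithmetic on att and att_se. The sketch below reproduces that arithmetic with the standard library only; wald_ci is a hypothetical stand-alone helper, and the inputs are the SPSC-DT figures quoted in the replication notes.

```python
from statistics import NormalDist

def wald_ci(att, att_se, level=0.95):
    """Two-sided Wald interval att +/- z * att_se under a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for level=0.95
    return att - z * att_se, att + z * att_se

lo, hi = wald_ci(-0.815, 0.067)
print(round(lo, 3), round(hi, 3))   # roughly -0.946 and -0.684
```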

Helper Modules#

Data preparation – pivots the long panel, builds the donor/surrogate outcome and proxy matrices, residualizes contaminated surrogates, and packs everything into the typed PROXIMALInputs.

Data preparation for the PROXIMAL estimator.

Pivots a long panel into the typed PROXIMALInputs container: the treated outcome, the donor outcome matrix W and donor proxy matrix Z0, and – when surrogate units are configured – the cleaned surrogate outcome matrix X and surrogate proxy matrix Z1.

Surrogate outcomes are residualized against the donor proxies/outcomes on the pre-period via mlsynth.utils.datautils.clean_surrogates2(), matching the construction in Liu, Tchetgen Tchetgen and Varjao (2023).

mlsynth.utils.proximal_helpers.setup.prepare_proximal_inputs(df: DataFrame, outcome: str, unitid: str, time: str, treat: str, donors: List[str | int], surrogates: List[str | int], vars: Dict[str, List[str]], methods: Sequence[str] = ('PI',), spsc_detrend: bool = True, spsc_lambda: float | None = None, spsc_spline_df: int = 5, spsc_basis_degree: int = 1, spsc_conformal: bool = False, spsc_conformal_periods: Sequence[int] | None = None) PROXIMALInputs#

Pivot a long panel into the typed inputs the PROXIMAL pipeline expects.

Only the matrices the requested methods need are built: donor proxies (Z0) are built when a method consuming them (PI/PIS/PIPost) is requested, and surrogate matrices when PIS/PIPost is requested. SPSC needs only the donor outcomes and the treated series.

Parameters:
  • df (pd.DataFrame) – Long balanced panel data.

  • outcome, unitid, time, treat (str) – Column names identifying the outcome, units, time periods, and the treatment indicator.

  • donors (list) – Donor unit identifiers used to build W.

  • surrogates (list) – Surrogate unit identifiers used to build X/Z1.

  • vars (dict) – Proxy-variable map ("donorproxies" / "surrogatevars").

  • methods (sequence of str) – Estimators to prepare for.

  • spsc_detrend, spsc_lambda, spsc_spline_df, spsc_basis_degree, spsc_conformal, spsc_conformal_periods – SPSC options, forwarded onto PROXIMALInputs.

Returns:

PROXIMALInputs – Prepared outcome/proxy matrices and label metadata.

Raises:

MlsynthDataError – If no configured donors are present in the panel.

The Bartlett kernel and HAC long-run variance shared by the PI family.

HAC variance machinery for proximal GMM inference.

The three proximal estimators (estimation) all close with a GMM sandwich variance for the ATT. The “meat” of that sandwich is a Heteroskedasticity- and Autocorrelation-Consistent (HAC) estimate of the long-run variance of the stacked moment conditions, formed with a Bartlett kernel. This mirrors the reference implementation of

Shi, X., Li, K., Miao, W., Hu, M., & Tchetgen Tchetgen, E. (2023). “Theory for Identification and Inference with Synthetic Controls: A Proximal Causal Inference Framework.” arXiv:2108.13935.

and was validated value-for-value against the authors’ code (freshtaste/proximal).

mlsynth.utils.proximal_helpers.inference.bartlett(lag_order: int, truncation_lag: int) float#

Bartlett kernel weight 1 - |lag_order| / (truncation_lag + 1).

Parameters:
  • lag_order (int) – Current lag order (0 returns weight 1).

  • truncation_lag (int) – Bandwidth; lags beyond it receive weight 0.

Returns:

float – The kernel weight for lag_order.

mlsynth.utils.proximal_helpers.inference.hac(moment_conditions: ~numpy.ndarray, truncation_lag: int, kernel: ~typing.Callable[[int, int], float] = <function bartlett>) ndarray#

HAC long-run covariance of stacked moment conditions.

Parameters:
  • moment_conditions (np.ndarray) – Moment matrix of shape (n_obs, n_moments) (rows are time periods, columns are moment conditions).

  • truncation_lag (int) – Kernel bandwidth (number of autocovariance lags to include).

  • kernel (Callable[[int, int], float], optional) – Lag-weighting kernel. Defaults to bartlett().

Returns:

np.ndarray – The (n_moments, n_moments) HAC covariance estimate.
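As a rough illustration of what these two functions compute, the sketch below implements a Bartlett weight and a HAC long-run covariance in plain NumPy. It assumes uncentered moments and a 1/n normalization of each autocovariance; the packaged bartlett() and hac() may differ in those details, and the names used here are deliberately different from theirs.

```python
import numpy as np

def bartlett_weight(lag_order, truncation_lag):
    """Bartlett weight 1 - |lag| / (L + 1), and 0 beyond the truncation lag L."""
    if abs(lag_order) > truncation_lag:
        return 0.0
    return 1.0 - abs(lag_order) / (truncation_lag + 1)

def hac_covariance(moments, truncation_lag):
    """Long-run covariance: Gamma_0 + sum_l w_l * (Gamma_l + Gamma_l')."""
    g = np.asarray(moments, dtype=float)          # (n_obs, n_moments)
    n = g.shape[0]
    omega = g.T @ g / n                           # lag-0 term
    for lag in range(1, truncation_lag + 1):
        gamma = g[lag:].T @ g[:-lag] / n          # lag-l autocovariance
        omega += bartlett_weight(lag, truncation_lag) * (gamma + gamma.T)
    return omega

rng = np.random.default_rng(0)
g_demo = rng.normal(size=(200, 2))
print(hac_covariance(g_demo, truncation_lag=4).round(3))
```

The linearly declining weights are what keep the estimate positive semi-definite, which is why the Bartlett kernel is the usual default for the sandwich middle.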

Each estimator lives in its own subpackage so new proximal methods can be added as new subpackages. pi, pis and pipost are the two-proxy GMM family; spsc is the single-proxy ridge-GMM plus conformal inference.

Proximal Inference (PI) – donors-only, two-proxy GMM.

Implements the PI estimator of Shi, Li, Miao, Hu and Tchetgen Tchetgen (2023, arXiv:2108.13935): donor outcomes W are negative-control outcomes instrumented by donor proxies Z0 on the pre-period, and the fitted relationship imputes the post-period counterfactual. Closes with the GMM sandwich variance of the ATT (HAC/Bartlett middle), validated value-for-value against the authors’ reference code (freshtaste/proximal).

mlsynth.utils.proximal_helpers.pi.estimation.estimate_pi(outcome_vector: ndarray, design_matrix: ndarray, instrument_matrix: ndarray, num_pre_treatment_periods: int, num_post_periods_for_effect_eval: int, total_periods: int, hac_truncation_lag: int, common_aux_covariates_1: ndarray | None = None, common_aux_covariates_2: ndarray | None = None) Tuple[ndarray, ndarray, float]#

Proximal Inference (PI) counterfactual, donor weights, and ATT SE.

Stage 1 estimates donor coefficients alpha on the pre-period via the just-identified IV moment Z0' (Y - W alpha) = 0; the fitted W alpha is the counterfactual. The ATT standard error is the GMM sandwich variance with a HAC (Bartlett) middle.

Parameters:
  • outcome_vector (np.ndarray) – Treated outcome, shape (total_periods,).

  • design_matrix (np.ndarray) – Donor outcomes W, shape (total_periods, n_donors).

  • instrument_matrix (np.ndarray) – Donor proxies Z0 (instruments for W), same column count as design_matrix.

  • num_pre_treatment_periods (int) – Number of pre-treatment periods T0.

  • num_post_periods_for_effect_eval (int) – Number of post-treatment periods used to average the ATT.

  • total_periods (int) – Total number of periods T.

  • hac_truncation_lag (int) – Bartlett bandwidth for the HAC variance.

  • common_aux_covariates_1, common_aux_covariates_2 (np.ndarray, optional) – Optional covariates augmenting both W and Z0. If one is given, both must be.

Returns:

  • counterfactual (np.ndarray) – Predicted counterfactual W alpha (original donors only), shape (total_periods,).

  • alpha (np.ndarray) – Donor coefficients (original donors only).

  • se_tau (float) – Standard error of the ATT (np.nan if GMM inference fails).
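The point-estimation part of Stage 1 can be sketched in a few lines. The version below solves the just-identified moment Z0' (Y - W alpha) = 0 on the pre-period and averages the post-period gap; it omits the auxiliary covariates and the GMM/HAC standard error, and pi_iv_sketch is a hypothetical name rather than the packaged estimate_pi.

```python
import numpy as np

def pi_iv_sketch(y, W, Z0, T0):
    """Just-identified IV fit of y on W with instruments Z0, using periods [0, T0)."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    Z0 = np.asarray(Z0, dtype=float)
    # Solve Z0_pre' W_pre alpha = Z0_pre' y_pre (requires as many proxies as donors).
    alpha = np.linalg.solve(Z0[:T0].T @ W[:T0], Z0[:T0].T @ y[:T0])
    counterfactual = W @ alpha
    att = float((y - counterfactual)[T0:].mean())
    return counterfactual, alpha, att

# Synthetic check: a noise-free outcome with a constant post-period effect of 2.
rng = np.random.default_rng(1)
T, T0 = 60, 40
W_demo = rng.normal(size=(T, 3))
Z_demo = W_demo + 0.5 * rng.normal(size=(T, 3))
y_demo = W_demo @ np.array([0.5, 0.3, 0.2]) + 2.0 * (np.arange(T) >= T0)
_, alpha_demo, att_demo = pi_iv_sketch(y_demo, W_demo, Z_demo, T0)
print(alpha_demo.round(3), round(att_demo, 3))
```

Using Z0 rather than W on the left of the normal equations is the whole difference from OLS: it removes the errors-in-variables bias that arises when the donors are noisy proxies of the latent factor.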

Proximal Inference with Surrogates (PIS) – full-sample two-stage GMM.

Implements the surrogate estimator of Liu, Tchetgen Tchetgen and Varjao (2023, arXiv:2308.09527). Stage 1 fits donor coefficients alpha on the pre-period; Stage 2 projects the post-period residual onto surrogate outcomes X instrumented by surrogate proxies Z1. Closes with the joint GMM sandwich variance of the ATT (HAC/Bartlett middle).

mlsynth.utils.proximal_helpers.pis.estimation.estimate_pi_surrogate(outcome_vector: ndarray, design_matrix_main: ndarray, instrument_matrix_main: ndarray, instrument_matrix_surrogate: ndarray, surrogate_outcome_matrix: ndarray, num_pre_treatment_periods: int, num_post_periods_for_effect_eval: int, total_periods: int, hac_truncation_lag: int, aux_covariates_main_1: ndarray | None = None, aux_covariates_main_2: ndarray | None = None, aux_covariates_surrogate: ndarray | None = None) Tuple[float, ndarray, ndarray, float]#

Proximal Inference with surrogates (PIS).

Stage 1 fits donor coefficients alpha on the pre-period (Z0' (Y - W alpha) = 0). Stage 2 projects the post-period residual onto surrogate outcomes X instrumented by surrogate proxies Z1 (Z1' (Y - W alpha - X gamma) = 0); X gamma is the time-varying effect. The ATT SE is the joint GMM sandwich variance.

Parameters:
  • outcome_vector (np.ndarray) – Treated outcome, shape (total_periods,).

  • design_matrix_main (np.ndarray) – Donor outcomes W.

  • instrument_matrix_main (np.ndarray) – Donor proxies Z0 (instruments for W).

  • instrument_matrix_surrogate (np.ndarray) – Surrogate proxies Z1 (instruments for X).

  • surrogate_outcome_matrix (np.ndarray) – Surrogate outcomes X.

  • num_pre_treatment_periods, num_post_periods_for_effect_eval, total_periods (int) – Pre/post/total period counts.

  • hac_truncation_lag (int) – Bartlett bandwidth.

  • aux_covariates_main_1, aux_covariates_main_2 (np.ndarray, optional) – Optional covariates augmenting W and Z0.

  • aux_covariates_surrogate (np.ndarray, optional) – Optional covariates augmenting X and Z1.

Returns:

  • tau (float) – ATT (mean post-period time-varying effect).

  • taut (np.ndarray) – Time-varying effect over all periods (pre-period entries are the Stage-1 residuals), shape (total_periods,).

  • alpha (np.ndarray) – Donor coefficients (original donors only).

  • se_tau (float) – Standard error of the ATT (np.nan if GMM inference fails).
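
The two stages can be sketched as two successive linear solves. This is a hedged illustration only: `pis_point_estimate` and the simulated data are hypothetical, the sketch assumes Stage 2 uses post-period rows as the description states, and it skips auxiliary covariates and the joint sandwich variance.

```python
import numpy as np

def pis_point_estimate(y, W, Z0, Z1, X, T0):
    """Illustrative two-stage fit (hypothetical helper, not mlsynth's code)."""
    # Stage 1 (pre-period): Z0' (Y - W alpha) = 0.
    alpha = np.linalg.solve(Z0[:T0].T @ W[:T0], Z0[:T0].T @ y[:T0])
    resid = y - W @ alpha
    # Stage 2 (post-period): Z1' (Y - W alpha - X gamma) = 0.
    gamma = np.linalg.solve(Z1[T0:].T @ X[T0:], Z1[T0:].T @ resid[T0:])
    taut = resid.copy()              # pre-period entries stay as Stage-1 residuals
    taut[T0:] = X[T0:] @ gamma       # post-period entries: time-varying effect
    tau = taut[T0:].mean()           # ATT: mean post-period effect
    return tau, taut, alpha, gamma

# Simulated toy panel (assumption): rho drives the effect and both surrogate series.
rng = np.random.default_rng(0)
T, T0 = 60, 40
lam = rng.normal(size=(T, 2))
W = lam + 0.1 * rng.normal(size=(T, 2))
Z0 = lam + 0.1 * rng.normal(size=(T, 2))
rho = 2.0 + rng.normal(size=(T, 1))
X = rho + 0.1 * rng.normal(size=(T, 1))      # surrogate outcome
Z1 = rho + 0.1 * rng.normal(size=(T, 1))     # surrogate proxy (instrument for X)
y = lam @ np.array([0.6, 0.4]) + 0.1 * rng.normal(size=T)
y[T0:] += rho[T0:, 0]                        # time-varying treatment effect

tau, taut, alpha, gamma = pis_point_estimate(y, W, Z0, Z1, X, T0)
```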

Post-treatment-only proximal surrogate estimator (PIPost).

Implements the post-only variant of Liu, Tchetgen Tchetgen and Varjao (2023, arXiv:2308.09527): donor and surrogate coefficients are estimated jointly from a single post-treatment IV fit, using (Z0, Z1) to instrument (W, X). The GMM sandwich variance is scaled by the number of post-treatment periods T1.

mlsynth.utils.proximal_helpers.pipost.estimation.estimate_pi_surrogate_post(outcome_vector: ndarray, main_covariates: ndarray, main_instruments: ndarray, surrogate_instruments: ndarray, surrogate_covariates: ndarray, treatment_start_period: int, num_post_treatment_periods_analyzed: int, hac_truncation_lag: int, aux_main_covariates: ndarray | None = None, aux_main_instruments: ndarray | None = None, aux_surrogate_covariates: ndarray | None = None) Tuple[float, ndarray, ndarray, float]#

Post-treatment proximal surrogate estimator (PIPost).

Estimates donor and surrogate coefficients jointly on the post-treatment period in a single just-identified IV fit (Z' (Y - [W X] params) = 0), with the surrogate block X gamma giving the time-varying effect. The GMM sandwich variance here is scaled by the number of post-treatment periods T1.

Parameters:
  • outcome_vector (np.ndarray) – Treated outcome, shape (total_periods,).

  • main_covariates (np.ndarray) – Donor outcomes W.

  • main_instruments (np.ndarray) – Donor proxies Z0.

  • surrogate_instruments (np.ndarray) – Surrogate proxies Z1.

  • surrogate_covariates (np.ndarray) – Surrogate outcomes X.

  • treatment_start_period (int) – Zero-based index of the first post-treatment period; this equals T0, the number of pre-treatment periods.

  • num_post_treatment_periods_analyzed (int) – Number of post-treatment periods T1.

  • hac_truncation_lag (int) – Bartlett bandwidth.

  • aux_main_covariates, aux_main_instruments, aux_surrogate_covariates (np.ndarray, optional) – Optional covariates augmenting the design/instrument blocks.

Returns:

  • tau (float) – ATT (mean post-period time-varying effect).

  • taut (np.ndarray) – Time-varying effect X gamma over all periods.

  • params_W (np.ndarray) – Donor coefficients (original donors only).

  • se_tau (float) – Standard error of the ATT (np.nan if GMM inference fails).

Single Proxy Synthetic Control: ridge-GMM with the treated unit’s own (optionally detrended) outcome as the instrument, plus the GMM/HAC ATT standard error and conformal prediction intervals.

Single Proxy Synthetic Control (SPSC).

Implements:

Park, C., & Tchetgen Tchetgen, E. J. (2025). “Single Proxy Synthetic Control.” Journal of Causal Inference 13(1), 20230079. https://doi.org/10.1515/jci-2023-0079

Unlike the two-proxy proximal estimators (PI/PIS/PIPost), SPSC needs only one type of proxy: the donor outcomes themselves. It views the donor outcomes W as error-prone proxies of the treated unit’s treatment-free potential outcome, and uses the treated unit’s own (optionally detrended) pre-treatment outcome as the instrument. A ridge-regularized GMM recovers the synthetic-control weights gamma; the ATT is the mean post-period gap, with a GMM sandwich (HAC) standard error.

This is a faithful port of the authors’ reference R package (github.com/qkrcks0218/SPSC), validated value-for-value on the Panic of 1907 application (Table 3): SPSC-NoDT ATT -0.811 / SE 0.085 (paper -0.813 / 0.084) and SPSC-DT ATT -0.815 / SE 0.067 (paper -0.816 / 0.066).

mlsynth.utils.proximal_helpers.spsc.estimation.estimate_spsc(outcome_vector: ndarray, donor_outcomes: ndarray, num_pre_treatment_periods: int, detrend: bool = True, spline_df: int = 5, ridge_lambda: float | None = None, basis_degree: int = 1) Tuple[ndarray, ndarray, float, float, ndarray, float]#

Single Proxy Synthetic Control estimate.

Parameters:
  • outcome_vector (np.ndarray) – Treated outcome over all T periods, shape (T,).

  • donor_outcomes (np.ndarray) – Donor outcomes W, shape (T, N) – the single proxy group.

  • num_pre_treatment_periods (int) – Number of pre-treatment periods T0.

  • detrend (bool, default True) – If True, residualize the treated outcome against a cubic B-spline time trend (SPSC-DT); otherwise SPSC-NoDT.

  • spline_df (int, default 5) – Degrees of freedom of the detrend B-spline basis.

  • ridge_lambda (float or None, default None) – log10 ridge penalty. None selects it by leave-one-out CV over 10**[-6, ..., 2].

  • basis_degree (int, default 1) – Degree of the polynomial sieve applied to the treated-outcome instrument (the reference’s Y.basis). 1 is the linear single proxy; >=2 is the nonparametric (series) SPSC, which spans a richer space of the outcome and over-identifies the bridge – useful when the synthetic-control bridge is nonlinear in the donor outcomes.

Returns:

  • counterfactual (np.ndarray) – Synthetic control W gamma over all periods, shape (T,).

  • gamma (np.ndarray) – Donor weights.

  • att (float) – Mean post-treatment gap.

  • se (float) – GMM/HAC standard error of the ATT (np.nan if T1 <= 1).

  • trend (np.ndarray) – Estimated treated-outcome trend (zeros if detrend=False).

  • lambda_opt (float) – Selected log10 ridge penalty.
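
To make the role of the log10 ridge penalty concrete, here is a rough sketch of a ridge-regularized GMM fit. It rests on several assumptions that may differ from the library and the reference R package: the instrument basis is taken to be (1, y), the GMM weighting matrix is the identity, and detrending and cross-validation are left out. The name `ridge_gmm_weights` is hypothetical.

```python
import numpy as np

def ridge_gmm_weights(y_pre, W_pre, log10_lambda):
    """Identity-weighted ridge GMM (illustrative assumption, not mlsynth's code)."""
    T0, N = W_pre.shape
    Phi = np.column_stack([np.ones(T0), y_pre])   # assumed instrument basis (1, y)
    G = Phi.T @ W_pre / T0                        # moment Jacobian, shape (2, N)
    g = Phi.T @ y_pre / T0                        # moment target, shape (2,)
    lam = 10.0 ** log10_lambda                    # penalty is supplied on the log10 scale
    # Minimise ||g - G gamma||^2 + lam * ||gamma||^2 in closed form.
    return np.linalg.solve(G.T @ G + lam * np.eye(N), G.T @ g)

# Simulated data (assumption): five donors, a constant effect of -1 after T0.
rng = np.random.default_rng(0)
T, T0, N = 70, 50, 5
W = 1.0 + rng.normal(size=(T, N))
y = W @ np.full(N, 0.2) + 0.1 * rng.normal(size=T)
y[T0:] -= 1.0

gamma_weak = ridge_gmm_weights(y[:T0], W[:T0], log10_lambda=-6.0)
gamma_strong = ridge_gmm_weights(y[:T0], W[:T0], log10_lambda=2.0)
att = np.mean(y[T0:] - W[T0:] @ gamma_weak)       # mean post-period gap
```

With more donors than moments the unpenalized problem has many solutions, so the penalty is what pins down a unique weight vector; a larger penalty shrinks the weights toward zero.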

Conformal prediction intervals for SPSC (Park & Tchetgen Tchetgen 2025, Sec. 3.5).

Constructs pointwise prediction intervals for the per-period treatment effect xi_t = y^1_{0t} - y^0_{0t} by inverting the permutation test of Chernozhukov, Wuthrich and Zhu (2021). For a candidate effect xi at a post-period s, the treated outcome is “un-treated” (y_s - xi), appended to the pre-period sample, the synthetic-control weights are re-fit (with the ridge penalty held fixed), and a conformal p-value is formed from the rank of the appended residual among all residuals. The interval is the set of xi not rejected at the target level.

This is a faithful port of the conformal.interval branch of the authors’ reference R package (github.com/qkrcks0218/SPSC), and unlike the asymptotic GMM standard error it remains valid with a short post-treatment period.

mlsynth.utils.proximal_helpers.spsc.conformal.conformal_intervals(outcome_vector: ndarray, donor_outcomes: ndarray, num_pre_treatment_periods: int, gamma: ndarray, ridge_lambda: float, detrend: bool, spline_df: int, att_se: float, periods: Sequence[int] | None = None, alpha: float = 0.05, window: float = 25.0, grid_size: int = 101, basis_degree: int = 1) Dict[str, ndarray]#

Pointwise conformal prediction intervals for the per-period effect.

Parameters:
  • outcome_vector (np.ndarray) – Treated outcome, shape (T,).

  • donor_outcomes (np.ndarray) – Donor outcomes W, shape (T, N).

  • num_pre_treatment_periods (int) – T0.

  • gamma (np.ndarray) – Point-estimate SC weights (used to center the search grid).

  • ridge_lambda (float) – log10 ridge penalty held fixed during the inversion.

  • detrend (bool) – Whether the SPSC fit detrends (must match the point fit).

  • spline_df (int) – Detrend B-spline degrees of freedom.

  • att_se (float) – Asymptotic ATT standard error, used to scale the search grid. If not finite, a data-driven width is used.

  • periods (sequence of int, optional) – Post-treatment period indices (absolute, in [T0, T)) to cover. Defaults to every post-treatment period.

  • alpha (float, default 0.05) – Target miscoverage level; the default 0.05 gives a 95% interval.

  • window (float, default 25.0) – Half-width of the (SE-scaled) coarse search grid.

  • grid_size (int, default 101) – Number of coarse grid points (the reference uses 101).

  • basis_degree (int, default 1) – Degree of the polynomial sieve applied to the treated-outcome instrument, as in estimate_spsc.

Returns:

dict – {"periods": int array, "lower": float array, "upper": float array}: the prediction interval for xi_t at each covered period.
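
The test inversion described above can be sketched for a single post-period as follows. This is a simplified illustration, not the library's routine: the weights are re-fit with a stand-in ridge least-squares fit rather than the SPSC ridge-GMM, a single fixed grid replaces the coarse search, and the names `fit_weights` and `conformal_interval` are hypothetical.

```python
import numpy as np

def fit_weights(y, W, lam=1e-2):
    """Stand-in ridge least-squares fit (assumption; SPSC itself uses ridge-GMM)."""
    N = W.shape[1]
    return np.linalg.solve(W.T @ W + lam * np.eye(N), W.T @ y)

def conformal_interval(y, W, T0, s, grid, alpha=0.05):
    """Invert the residual-rank test at post-period index s over candidate effects."""
    accepted = []
    for xi in grid:
        # "Un-treat" period s under the candidate effect and append it to the pre-sample.
        y_aug = np.append(y[:T0], y[s] - xi)
        W_aug = np.vstack([W[:T0], W[s]])
        gamma = fit_weights(y_aug, W_aug)        # re-fit with the penalty held fixed
        resid = np.abs(y_aug - W_aug @ gamma)
        # p-value: share of residuals at least as large as the appended one.
        p_value = np.mean(resid >= resid[-1])
        if p_value > alpha:
            accepted.append(xi)
    if not accepted:
        return np.nan, np.nan
    # Report the hull of the accepted set; the set itself need not be contiguous.
    return min(accepted), max(accepted)

# Simulated data (assumption): three donors, an effect of 3 after T0.
rng = np.random.default_rng(1)
T, T0, N = 50, 40, 3
W = rng.normal(size=(T, N))
y = W @ np.array([0.5, 0.3, 0.2]) + 0.5 * rng.normal(size=T)
y[T0:] += 3.0

s = T0                                            # first post-treatment period
center = y[s] - W[s] @ fit_weights(y[:T0], W[:T0])
grid = np.linspace(center - 5.0, center + 5.0, 101)
lower, upper = conformal_interval(y, W, T0, s, grid)
```

The smallest attainable p-value is 1 / (T0 + 1), so with very few pre-treatment periods no candidate can be rejected at small alpha and the interval spans the whole grid.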

The doubly-robust family: shared confounding-bridge fits and the GMM sandwich (bridges), the doubly-robust estimator (dr), and the treatment-bridge weighting estimator (pipw).

Shared building blocks for the doubly-robust proximal estimators.

Implements the two confounding-bridge fits and the just-identified GMM sandwich used by both the doubly-robust (dr) and treatment-bridge weighting (pipw) estimators of

Qiu, H., Shi, X., Miao, W., Dobriban, E., & Tchetgen Tchetgen, E. (2024). “Doubly robust proximal synthetic controls.” Biometrics 80(2), ujae055.

Bridges (with an intercept column prepended to W and Z):

  • outcome bridge h_alpha(W) = (1, W) alpha – a just-identified IV fit of the treated outcome on the donors W instrumented by the proxies Z on the pre-period.

  • treatment bridge q_beta(Z) = exp((1, Z) beta) – a covariate-shift / likelihood-ratio weight solving the pre-period moment E_pre[q(Z)(1, W)] = E_post[(1, W)].

Both estimators are just-identified, so the parameters solve the empirical moment equations exactly and the asymptotic variance is the GMM sandwich G^{-1} Omega G^{-T} / T with a Bartlett-HAC Omega.

mlsynth.utils.proximal_helpers.bridges.augment(matrix: ndarray) ndarray#

Prepend an intercept column of ones.

mlsynth.utils.proximal_helpers.bridges.fit_outcome_bridge(Y_pre: ndarray, Wc_pre: ndarray, Zc_pre: ndarray) ndarray#

Just-identified IV for alpha: E_pre[(1,Z)(Y - (1,W) alpha)] = 0.

mlsynth.utils.proximal_helpers.bridges.fit_treatment_bridge(Zc_pre: ndarray, Wc_pre: ndarray, psi: ndarray, beta_init: ndarray | None = None) ndarray#

Solve E_pre[exp((1,Z) beta) (1,W)] = psi for beta.

psi = E_post[(1, W)] is the post-period donor mean. The system is square (dim(beta) = dim(W)+1); a Newton/hybr solve from beta=0 (with a logistic-regression fallback init) recovers it.
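
As a rough illustration of that square system, the sketch below solves it with a plain Newton iteration in NumPy. It is not the library's solver: the source mentions a Newton/hybr solve with a logistic-regression fallback, which is omitted here, and the names `augment_ones` and `treatment_bridge_newton` are hypothetical stand-ins.

```python
import numpy as np

def augment_ones(M):
    """Prepend an intercept column of ones (stand-in for the library's augment)."""
    return np.column_stack([np.ones(M.shape[0]), M])

def treatment_bridge_newton(Zc_pre, Wc_pre, psi, max_iter=50, tol=1e-10):
    """Solve mean_pre[exp(Zc beta) * Wc] = psi by Newton steps from beta = 0."""
    beta = np.zeros(Zc_pre.shape[1])
    for _ in range(max_iter):
        q = np.exp(Zc_pre @ beta)                         # bridge weights q(Z)
        resid = (q[:, None] * Wc_pre).mean(axis=0) - psi  # moment violation
        if np.max(np.abs(resid)) < tol:
            break
        # Jacobian of the moment in beta: mean_t[q_t * Wc_t Zc_t'].
        J = (q[:, None] * Wc_pre).T @ Zc_pre / len(q)
        beta = beta - np.linalg.solve(J, resid)
    return beta

# Simulated pre-period data (assumption): proxies are noisy copies of the donors.
rng = np.random.default_rng(0)
W_pre = rng.normal(size=(200, 2))
Z_pre = W_pre + 0.3 * rng.normal(size=(200, 2))
Wc_pre, Zc_pre = augment_ones(W_pre), augment_ones(Z_pre)

# If the post-period donor mean equals the pre-period mean, no reweighting is needed.
beta_null = treatment_bridge_newton(Zc_pre, Wc_pre, Wc_pre.mean(axis=0))

# A mild shift in the donor means gives a nontrivial weight function.
psi_shift = Wc_pre.mean(axis=0) + np.array([0.0, 0.2, -0.1])
beta_shift = treatment_bridge_newton(Zc_pre, Wc_pre, psi_shift)
```

The first entry of psi is always 1, so the intercept in beta normalizes the weights to average one over the pre-period.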

mlsynth.utils.proximal_helpers.bridges.gmm_sandwich_se(theta: ndarray, moments: Callable[[ndarray], ndarray], param_index: int, total_periods: int, bandwidth: int, eps: float = 1e-06) float#

Sandwich SE for one parameter of a just-identified GMM.

Parameters:
  • theta (np.ndarray) – Solved parameter vector.

  • moments (callable) – theta -> U returning the (T, p) per-period moment matrix.

  • param_index (int) – Index into theta of the parameter whose SE is wanted.

  • total_periods (int) – T (sandwich normalization).

  • bandwidth (int) – Bartlett-HAC bandwidth.

Returns:

float – sqrt(Cov[param_index, param_index]); np.nan if the Jacobian is singular or the variance is negative.
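
A compact sketch of the sandwich computation may help fix ideas. The details here are assumptions rather than the library's exact choices: the Jacobian is taken by central differences of the mean moment, and the Bartlett weights follow the common Newey-West convention 1 - lag / (bandwidth + 1). The function names are hypothetical.

```python
import numpy as np

def bartlett_hac(U, bandwidth):
    """Bartlett-kernel HAC estimate of the long-run variance of the rows of U."""
    T = U.shape[0]
    omega = U.T @ U / T
    for lag in range(1, bandwidth + 1):
        weight = 1.0 - lag / (bandwidth + 1.0)       # assumed Newey-West weights
        cross = U[lag:].T @ U[:-lag] / T             # lag-`lag` autocovariance
        omega += weight * (cross + cross.T)
    return omega

def sandwich_se(theta, moments, param_index, total_periods, bandwidth, eps=1e-6):
    """SE of one parameter from G^{-1} Omega G^{-T} / T (illustrative sketch)."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    G = np.empty((p, p))
    for k in range(p):
        step = np.zeros(p)
        step[k] = eps
        # Central difference of the averaged moment with respect to theta[k].
        G[:, k] = (moments(theta + step).mean(axis=0)
                   - moments(theta - step).mean(axis=0)) / (2.0 * eps)
    omega = bartlett_hac(moments(theta), bandwidth)
    try:
        G_inv = np.linalg.inv(G)
    except np.linalg.LinAlgError:
        return np.nan                                # singular Jacobian
    variance = (G_inv @ omega @ G_inv.T / total_periods)[param_index, param_index]
    return float(np.sqrt(variance)) if variance >= 0 else np.nan

# Sanity check on the simplest GMM: a sample mean with moment U_t = y_t - theta.
y = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
se_mean = sandwich_se([y.mean()], lambda th: (y - th[0])[:, None], 0, len(y), 0)
```

With bandwidth 0 the HAC middle reduces to the plain moment variance, so for the sample mean the result is the familiar standard deviation divided by sqrt(T).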

Doubly robust proximal synthetic control (DR).

Implements the doubly-robust ATT of Qiu, Shi, Miao, Dobriban and Tchetgen Tchetgen (2024, Biometrics):

phi* = E_post[Y - h(W)] - E_pre[q(Z){Y - h(W)}],

which is consistent if either the outcome bridge h or the treatment bridge q is correctly specified. Both nuisances are fit on the pre-period; the estimand and all nuisance/auxiliary parameters are stacked into one just-identified GMM, and the ATT standard error is the GMM sandwich with a Bartlett-HAC middle. Validated against the authors’ reference code (QIU-Hongxiang-David/DR_Proximal_SC): the just-identified point estimate matches by construction, and the sandwich SE is calibrated (~95% Wald coverage on their simulation DGP).

mlsynth.utils.proximal_helpers.dr.estimation.estimate_dr(outcome_vector: ndarray, donor_outcomes: ndarray, donor_proxies: ndarray, num_pre_treatment_periods: int, hac_bandwidth: int) Tuple[ndarray, ndarray, ndarray, float, float]#

Doubly-robust proximal ATT.

Parameters:
  • outcome_vector (np.ndarray) – Treated outcome Y, shape (T,).

  • donor_outcomes (np.ndarray) – Donor outcomes W (the outcome-bridge regressors), shape (T, n_donors).

  • donor_proxies (np.ndarray) – Supplemental proxies Z (instruments for h and regressors for q), shape (T, n_proxies).

  • num_pre_treatment_periods (int) – T0.

  • hac_bandwidth (int) – Bartlett-HAC bandwidth for the sandwich SE.

Returns:

  • counterfactual (np.ndarray) – Outcome-bridge synthetic control h(W) = (1, W) alpha over all periods, shape (T,).

  • alpha (np.ndarray) – Outcome-bridge coefficients (intercept first).

  • beta (np.ndarray) – Treatment-bridge coefficients (intercept first).

  • att (float) – Doubly-robust ATT estimate phi.

  • se (float) – GMM/HAC standard error of phi.
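
Given fitted bridge coefficients, the doubly-robust formula is a short computation. The sketch below only evaluates phi for supplied alpha and beta; fitting the bridges and the sandwich SE are out of scope, and `dr_att` is a hypothetical helper rather than part of the library.

```python
import numpy as np

def dr_att(y, W, Z, T0, alpha, beta):
    """phi = E_post[Y - h(W)] - E_pre[q(Z) {Y - h(W)}] for given bridge coefficients."""
    ones = np.ones(len(y))
    Wc = np.column_stack([ones, W])          # (1, W)
    Zc = np.column_stack([ones, Z])          # (1, Z)
    resid = y - Wc @ alpha                   # Y - h(W), with h(W) = (1, W) alpha
    q_pre = np.exp(Zc[:T0] @ beta)           # q(Z) = exp((1, Z) beta) on the pre-period
    return resid[T0:].mean() - np.mean(q_pre * resid[:T0])

# Tiny hand-checkable example: h(W) = W fits the pre-period exactly and q = 1.
y = np.array([1.0, 2.0, 3.0, 5.0])
W = np.array([[1.0], [2.0], [3.0], [4.0]])
Z = np.zeros((4, 1))
phi = dr_att(y, W, Z, T0=2, alpha=np.array([0.0, 1.0]), beta=np.zeros(2))
# Residuals are [0, 0, 0, 1]: post-period mean 0.5, weighted pre-period mean 0.
```

The second term is a weighted average of pre-period outcome-bridge residuals; it vanishes when h fits the pre-period perfectly and otherwise corrects the post-period gap for bridge misfit.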

Treatment-bridge (proximal inverse-probability weighting) estimator (PIPW).

Implements the weighting-only ATT of Qiu, Shi, Miao, Dobriban and Tchetgen Tchetgen (2024, Biometrics):

phi* = E_post[Y] - E_pre[q(Z) Y],

where q_beta(Z) = exp((1, Z) beta) is the treatment confounding bridge (a covariate-shift / likelihood-ratio weight) solving the pre-period moment E_pre[q(Z)(1, W)] = E_post[(1, W)]. Unlike the outcome-bridge methods, this relies on no model for the treated unit’s counterfactual outcome trajectory – only on correctly modelling the weights. The estimand and auxiliary means are stacked into one just-identified GMM with a Bartlett-HAC sandwich SE.

mlsynth.utils.proximal_helpers.pipw.estimation.estimate_pipw(outcome_vector: ndarray, donor_outcomes: ndarray, donor_proxies: ndarray, num_pre_treatment_periods: int, hac_bandwidth: int) Tuple[ndarray, float, float]#

Treatment-bridge weighting ATT.

Parameters:
  • outcome_vector (np.ndarray) – Treated outcome Y, shape (T,).

  • donor_outcomes (np.ndarray) – Donor outcomes W used in the weighting moment, shape (T, n_donors).

  • donor_proxies (np.ndarray) – Supplemental proxies Z (treatment-bridge regressors), shape (T, n_proxies).

  • num_pre_treatment_periods (int) – T0.

  • hac_bandwidth (int) – Bartlett-HAC bandwidth for the sandwich SE.

Returns:

  • beta (np.ndarray) – Treatment-bridge coefficients (intercept first).

  • att (float) – Weighting ATT estimate phi.

  • se (float) – GMM/HAC standard error of phi.
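
For intuition, the weighting formula can be evaluated directly once beta is known. This is a minimal sketch with a hypothetical helper `pipw_att`; it takes beta as given and does not fit the treatment bridge or compute the standard error.

```python
import numpy as np

def pipw_att(y, Z, T0, beta):
    """phi = E_post[Y] - E_pre[q(Z) Y] with q(Z) = exp((1, Z) beta)."""
    Zc_pre = np.column_stack([np.ones(T0), Z[:T0]])   # (1, Z) on the pre-period
    q_pre = np.exp(Zc_pre @ beta)                      # treatment-bridge weights
    return y[T0:].mean() - np.mean(q_pre * y[:T0])

# With beta = 0 every weight is one, so phi is post-mean minus pre-mean.
y = np.array([1.0, 3.0, 4.0, 6.0])
Z = np.zeros((4, 1))
phi_unweighted = pipw_att(y, Z, T0=2, beta=np.zeros(2))   # 5.0 - 2.0
```

The unweighted case is a plain before-and-after comparison; the fitted weights are what adjust the pre-period average for the shift in the latent confounder between periods.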

Drives the requested methods on a prepared panel and assembles the per-method fits.

Run the requested proximal estimators and assemble per-method fits.

Dispatches over inputs.methods – any of PI, PIS, PIPost, SPSC – and packages each into a ProximalMethodFit (counterfactual, gap, ATT, GMM/HAC standard error, pre/post RMSE, donor weights). Only the requested methods run; the config layer guarantees the inputs each method needs are present.

mlsynth.utils.proximal_helpers.orchestration.run_proximal(inputs: PROXIMALInputs) Dict[str, ProximalMethodFit]#

Run each estimator named in inputs.methods and return the fits.

Parameters:

inputs (PROXIMALInputs) – Prepared panel from prepare_proximal_inputs().

Returns:

dict – {method_name: ProximalMethodFit} for the requested methods, in request order.

The trajectories-and-gap overlay plot across methods.

Diagnostic plot for PROXIMAL results.

mlsynth.utils.proximal_helpers.plotter.plot_proximal(results: PROXIMALResults) None#

Two-panel plot: trajectories + gap, with one overlay per method run.