Panel Data Approach (PDA)#
When to Use This Estimator#
The panel data approach (PDA) of Hsiao, Ching and Wan [HCW] estimates a single treated unit’s untreated counterfactual by a linear regression on the control units, fit over the pre-treatment window and extrapolated out-of-sample. Unlike the synthetic control method, PDA imposes no constraints on the coefficients (no simplex, no non-negativity), so it is a plain projection justified by a latent-factor model: if every unit loads on a small set of common factors, the treated unit’s untreated path is a linear combination of the controls’ paths plus an orthogonal error.
The challenge is which controls and how many. Classical PDA was built
for low dimensions (few controls relative to pre-periods) and chooses controls
by AIC/BIC, which break down once the number of controls N approaches or
exceeds the pre-period length T1. mlsynth packages three
high-dimensional PDA variants that resolve this differently, each with the
estimation and inference theory of its own paper:
L2-relaxation (
l2; Shi & Wang [l2relax]) – a dense estimator (a “cousin of ridge”) for when the factor model makes the projection coefficients dense; toleratesN > T1; prediction is robust to heteroskedasticity.LASSO (
lasso; Li & Bell [LASSOPDA]) – an L1 estimator that selects a sparse set of relevant controls; computationally far cheaper than AIC/BIC and works forN > T1.Forward selection (
fs; Shi & Huang [fsPDA]) – a greedy procedure that grows the control set one unit at a time, with valid post-selection inference and no sparsity requirement (works for dense or sparse models).
All three target the single-treated-unit, many-candidate-controls regime
and produce a time-varying treatment effect and an average treatment effect
(ATE) with a HAC-based confidence interval. The practical choice among them
(detailed below) follows the authors’ own arguments: l2 when the
coefficients are dense, lasso when only a few controls matter and you want
an interpretable selection, fs when you want a cheap, predictive ensemble
with honest post-selection inference regardless of sparsity.
Notation#
We adopt the notation of Shi & Wang [l2relax]. For a positive integer n,
\([n] = \{1, \ldots, n\}\); \(\mathbf{I}_n\) is the identity,
\(\mathbf{1}_n\), \(\mathbf{0}_n\) the all-ones/zeros vectors. For a
matrix \(\mathbf{A} = (a_{ij})\) and index sets
\(\mathcal{S}, \mathcal{Q}\), the submatrix is
\(\mathbf{A}_{\mathcal{S}\mathcal{Q}} = (a_{ij})_{i\in\mathcal{S}, j\in\mathcal{Q}}\)
and the subvector \(\mathbf{x}_{\mathcal{Q}} = (x_i)_{i\in\mathcal{Q}}\).
\(\phi_{\min}(\cdot)\), \(\phi_{\max}(\cdot)\) denote the smallest and
largest eigenvalues; \(\|\mathbf{A}\|_\infty = \max_{ij}|a_{ij}|\),
\(\|\mathbf{A}\|_2 = \sqrt{\phi_{\max}(\mathbf{A}'\mathbf{A})}\),
\(\|\mathbf{x}\|_1 = \sum_i |x_i|\).
The two time-series operators are central. Over an index set \(\mathcal{S}\) of periods,
the sample mean and sample covariance of a series over \(\mathcal{S}\). The intervention occurs after the pre-period \(\mathcal{T}_1 = \{1, \ldots, T_1\}\); the post-period is \(\mathcal{T}_2 = \{T_1+1, \ldots, T\}\) with \(T_2 = |\mathcal{T}_2|\). Unit \(j=0\) is treated, \(\mathcal{N} = \{1,\ldots,N\}\) the controls.
L2-relaxation (l2, Shi & Wang)#
Idea. Under the factor model the projection coefficient \(\boldsymbol{\beta}^0 = \boldsymbol{\Omega}^{-1}\boldsymbol{\Lambda} (\boldsymbol{\Lambda}'\boldsymbol{\Omega}^{-1}\boldsymbol{\Lambda} + \mathbf{I}_q)^{-1}\boldsymbol{\lambda}_0\) is dense in general – almost no entries are exactly zero. Sparse methods (LASSO) are then mis-matched. With \(\hat{\boldsymbol{\Sigma}} = \Gamma_{\mathcal{T}_1}(\mathbf{x}_t, \mathbf{x}_t')\) and \(\hat{\boldsymbol{\eta}} = \Gamma_{\mathcal{T}_1}(\mathbf{x}_t, y_t)\), OLS solves the KKT condition \(\hat{\boldsymbol{\Sigma}}\boldsymbol{\beta} = \hat{\boldsymbol{\eta}}\), which is unstable or non-unique once \(N\) is close to or exceeds \(T_1\). L2-relaxation relaxes the sup-norm of this moment condition by a tuning parameter \(\tau\) and minimizes the coefficient norm:
This is the “bias-variance trade-off” made explicit: tolerating a small
violation \(\tau\) of the OLS moment condition shrinks the variance. At
\(\tau = 0\) it reduces to (ridgeless) OLS; at \(\tau \ge
\|\hat{\boldsymbol{\eta}}\|_\infty\) it gives \(\boldsymbol{\beta} =
\mathbf{0}\). mlsynth picks \(\tau\) by sequential out-of-sample
validation on the tail of the training window (the validated \(\tau\)
tracks the infeasible-optimal one, and both shrink toward zero as the sample
grows) over a log-spaced grid down to \(10^{-4}\max|\hat{\boldsymbol{\eta}}|\)
(the optimum is often a tiny fraction of the cap). This is time-respecting –
the fit never sees periods later than the validation tail – unlike the
released L2relax.CV, whose 5-block K-fold trains on both past and future
of each block.
Note
Standardisation. Following the authors’ released L2relax, the treated
and control series are standardised (demeaned and scaled to unit
variance) before forming \(\hat{\boldsymbol{\Sigma}}\) /
\(\hat{\boldsymbol{\eta}}\), and the solution is mapped back to the
original scale. This is the default (l2_standardize=True) – the
\(\ell_2\) penalty is scale-sensitive, so standardisation is both
recommended and what reproduces the paper’s empirical results; on the Hong
Kong panel it moves the L2 estimate from \(2.48\%\) to \(2.61\%\)
(closer to Shi & Wang’s \(2.65\%\)). Set l2_standardize=False for the
raw-scale variant.
Assumptions (Shi & Wang).
Assumption 1 (loadings). \(\|\boldsymbol{\lambda}_0\|_\infty + \|\boldsymbol{\Lambda}\|_\infty \le C\), and there is a \(q\)-unit subset making \(\boldsymbol{\Lambda}_{\mathcal{Q}\cdot}\) full column rank; the average factor strength \(\xi_N = \phi_{\min}(\boldsymbol{\Lambda}' \boldsymbol{\Lambda}/N)\) may vanish (weak factors allowed).
Assumption 2 (errors). The idiosyncratic covariance \(\boldsymbol{\Omega}\) has eigenvalues bounded between \(\underline\sigma^2\) and \(\overline\sigma^2\); errors may be heteroskedastic and cross-sectionally dependent.
Assumption 3-4 (sampling). In- and out-of-sample sampling errors of the sample moments are \(O_p(T_1^{-1/2})\) for low-dimensional pieces and \(O_p(\sqrt{\log N / (N\wedge T_1)})\) for high-dimensional ones (holding under time-series weak dependence, not just i.i.d.).
Assumption 5 (ATE inference). The oracle prediction error \(\epsilon_t^*\) and the effect-plus-error \(d_t^* = \Delta_t - \mathbb{E}[\Delta_t] + \epsilon_t^*\) have finite, positive long-run variances \(\rho^2_{\epsilon^*}\), \(\rho^2_{d^*}\) consistently estimable by HAC, and a sequential CLT applies.
Remark. The coefficient estimator is consistent for the oracle target (Theorem 1) and – importantly – the prediction error is asymptotically irrelevant to heteroskedasticity (Theorem 2): unlike the coefficient MSE, the out-of-sample MSE does not depend on the noise heterogeneity \(\psi_{\max}\).
Inference (Shi & Wang, Theorem 3; single treated unit). With pre-period prediction residuals \(e_t = y_t - \hat{y}_t\) (\(t\in\mathcal{T}_1\)) and post-period effects \(\hat{\Delta}_t\) (\(t\in\mathcal{T}_2\)),
where \(\hat{\rho}^2_{(1)}\) is the HAC long-run variance of the pre-period residuals (first-stage estimation uncertainty) and \(\hat{\rho}^2_{(2)}\) is the HAC long-run variance of the de-meaned post-period effects (post-period averaging). Both sources of uncertainty enter, which matters when \(T_1\) and \(T_2\) are comparable.
When to use. Dense, factor-driven coefficients; high dimension (\(N>T_1\) permitted); when prediction accuracy and heteroskedasticity- robustness matter more than identifying a handful of controls.
LASSO (lasso, Li & Bell)#
Idea. When only a few controls are truly relevant, an L1 penalty selects them and shrinks the rest. Li & Bell estimate
with \(\lambda\) chosen by (leave-one-out) cross-validation, then predict the counterfactual as in the shared model. LASSO works for \(N > T_1\) (where AIC/AICC/BIC cannot even be computed) and is far cheaper.
Assumptions (Li & Bell). They relax HCW’s linear-conditional-mean assumption and drop one of HCW’s identification conditions. The key conditions are: a factor model with \(\mathrm{Rank}(\tilde{B}) = K\) (enough independent factor variation among the controls); a weakly dependent, weakly stationary panel so laws of large numbers and CLTs apply to partial sums; consistency of the pre-period least-squares pieces (\(\hat{\delta}_1 - \delta_1, \hat{\delta}-\delta = O_p(T_1^{-1/2})\)); and \(\rho\)-mixing with geometric decay plus a finite limit \(\eta = \lim T_2/T_1\). Sparsity (only \(m\) of \(\boldsymbol{\beta}\) non-zero, \(m\) fixed or \(o(T_1)\)) is assumed for the high-dimensional selection.
Remark. Li & Bell prove consistency \(\hat{\Delta}_1 - \Delta_1 = O_p(T_1^{-1/2} + T_2^{-1/2})\) (estimation error has two parts: first-stage \(O_p(T_1^{-1/2})\) and post-averaging \(O_p(T_2^{-1/2})\)), holding even when \(y_t\) is trend-stationary.
Inference (Li & Bell, Theorem 3.2). With \(\hat{\Sigma} = \hat{\Sigma}_1 + \hat{\Sigma}_2\),
where \(\hat{\Sigma}_2\) is the Newey-West HAC long-run variance of the post-period effects, and \(\hat{\Sigma}_1\) is the first-stage (pre-period estimation) variance – the OLS prediction variance of the mean post-period counterfactual on the selected support. Li & Bell note \(\hat{\Sigma}_1\) is negligible when \(T_1 \gg T_2\), so the post-period term dominates in long-pre-period panels.
When to use. A genuinely sparse set of relevant controls; very large \(N\) (even \(N/T_1 \to \infty\)); when an interpretable, computa- tionally cheap selection is preferred. (For selection consistency, Li & Bell note the adaptive LASSO; for prediction, plain LASSO already beats AIC/BIC and leave-many-out CV in their simulations.)
Forward selection (fs, Shi & Huang)#
Idea. Rather than penalize, grow the control set greedily. Start empty; at each step add the control whose inclusion maximizes the pre-treatment OLS \(R^2\) (equivalently minimizes the residual sum of squares). The number of selected controls \(R\) is a tuning parameter chosen by a modified BIC (Wang, Li & Tsai),
with \(\hat{\sigma}^2(\hat{U}_r)\) the pre-period residual variance of OLS on the \(r\)-unit set. The counterfactual is the OLS extrapolation on the chosen set \(\hat{U}_{\hat{R}}\). Forward selection evaluates \(\sum_r (N-r+1)\) regressions – linear in \(N\) – versus the \(2^N\) of exhaustive subset search.
Assumptions (Shi & Huang). Asymptotics are multi-index: \(N\to\infty\) with \(T_1 = T_1(N)\) deterministic, \(\log N / T_1 \to 0\), and \(T_2 = T_2(N) \to \infty\) with \(\log N / T_2 \to 0\) (\(N\) may exceed \(T_1\)).
Assumption 1 (sparse Riesz / restricted eigenvalue). The minimal eigenvalue of the population Gram matrix over any \(u\)-unit subset (\(u \le (1+\delta_1)R\)) is bounded below – a condition the authors show is a natural implication of the latent factor model, not an ad hoc restriction.
Assumption 2 (second moments). Sample second moments converge at the high-dimensional rate \(O_p(\sqrt{\log N / T_1})\) with bounded fourth moments.
Assumption 3 (post-period). Analogous convergence and long-run-variance bounds on the post-treatment data.
Assumption 4 (weak dependence). The series are strong (\(\alpha\)-) mixing with geometric decay, so a Berry-Esseen bound for heterogeneous time series applies.
Remark. The validity is uniform over a class of DGPs (Theorem 1) – it holds whether the true coefficients are dense or sparse, which separates fsPDA from the post-selection-inference literature that needs sparsity or the oracle property. Theorem 2 shows the greedy algorithm attains a regression variance asymptotically as small as the best \(u\)-unit subset, so the cheap forward search is statistically efficient.
Inference (Shi & Huang, Eq. 4). Because forward selection uses only the pre-period and, under weak dependence, the pre- and post-periods become asymptotically independent (sample splitting), the naive conditional t-statistic is valid:
where \(\hat{\rho}^2_{\tau\hat{U}}\) is the HAC long-run variance of the de-meaned post-period effects. No first-stage variance term is needed – the asymptotic independence absorbs it – which makes fsPDA’s inference the simplest of the three.
Note
Long-run-variance estimator. mlsynth defaults to the prewhitened
Newey-West estimator (Andrews-Monahan VAR(1) prewhitening + Bartlett kernel
with the data-driven NW(1994) bandwidth + finite-sample adjustment) – R’s
sandwich::lrvar(..., prewhite = TRUE, adjust = TRUE), which Shi & Huang
use in their application scripts. Prewhitening is essential when the
treatment-effect series is strongly serially dependent: monthly growth rates
mean-revert (lag-1 autocorrelation around \(-0.45\) in the luxury-watch
panel), and a plain Bartlett kernel cannot absorb that, leaving
\(\hat\rho\) nearly double its true value and the test far too
conservative. Setting lrvar_lag instead switches to the released
est.fsPDA package’s fixed-lag Bartlett estimator
(default lag \(\lfloor T_2^{1/4}\rfloor\), capped at
\(\lfloor\sqrt{T_2}\rfloor\)); on the watch panel that no-prewhitening
form gives an insignificant \(t \approx -1.15\), versus the prewhitened
default’s \(-2.51\) (the paper reports \(-2.457\)).
When to use. A large candidate-control pool where the goal is to synthesize an ensemble that mimics the outcome (not to interpret which controls are “causal”); when computational efficiency and honest post-selection inference matter; and regardless of whether the underlying model is sparse. (Shi & Huang recommend the adaptive LASSO instead when the identity of a few causal controls is the object of interest.)
Choosing among the three#
Method |
Coefficient structure |
Inference variance |
Use when |
|---|---|---|---|
|
dense (factor-implied) |
pre + post HAC (both terms) |
dense coefficients; |
|
sparse |
post HAC (+ first stage, small if |
few relevant controls; interpretable selection; very large |
|
dense or sparse |
post HAC only (sample splitting) |
large pool; predictive ensemble; cheap; honest post-selection inference |
Empirical Illustration: Hong Kong economic integration#
The original HCW [HCW] application – and Shi & Huang’s Example 1 – evaluates the effect of economic integration with mainland China on Hong Kong’s quarterly real-GDP growth, using 24 comparison economies. Here all three PDA variants run on the same data, and the package returns the time-varying effect, the ATE, and a HAC confidence interval for each.
import pandas as pd
from mlsynth import PDA
url = "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/HongKong.csv"
df = pd.read_csv(url)
res = PDA({"df": df, "outcome": "GDP", "treat": "Integration",
"unitid": "Country", "time": "Time",
"methods": ["l2", "LASSO", "fs"], "alpha": 0.05,
"display_graphs": True}).fit()
for name, fit in res.fits.items():
sel = "-" if fit.selected_donors is None else len(fit.selected_donors)
print(f"{name:6s} ATE {fit.att:.4f} SE {fit.att_se:.4f} "
f"95% CI ({fit.ci[0]:.4f}, {fit.ci[1]:.4f}) p={fit.p_value:.3f} donors={sel}")
This prints:
l2 ATE 0.0248 SE 0.0032 95% CI (0.0185, 0.0311) p=0.000 donors=24
lasso ATE 0.0330 SE 0.0054 95% CI (0.0224, 0.0436) p=0.000 donors=11
fs ATE 0.0285 SE 0.0059 95% CI (0.0169, 0.0401) p=0.000 donors=7
All three find a significant positive integration effect on Hong Kong’s
GDP growth – roughly +2.5 to +3.3 percentage points – differing only by their
selection philosophy: l2 keeps all 24 controls with dense, signed
coefficients (pre-RMSE 0.013); lasso keeps 11; fs keeps a parsimonious
7 (Malaysia, New Zealand, Norway, Austria, Canada, Thailand, Australia). The
estimates bracket the Forward-DiD result on the same data (0.025), a useful
cross-method check.
Verification#
Note
Empirical (Path A, Hong Kong). All three variants run on the HCW Hong Kong panel (above) and agree on a significant positive integration effect, consistent with the literature and the Forward-DiD cross-check (0.025).
Simulation study (Path B): forward selection vs LASSO#
Shi & Huang’s (2023) Table 1 compares forward selection against LASSO on a
four-factor DGP, re-implemented in
mlsynth.utils.pda_helpers.simulation.simulate_pda_panel(): four factors
(f1 i.i.d.; f2 AR(1) 0.9; f3 MA(2) (0.8,0.4); f4 ARMA(1,1)
(0.5,0.5) under the dynamic structure, all i.i.d. N(0,1) under the
i.i.d. structure); loadings U(1,2) on the treated + 4 relevant controls
and U(-0.1,0.1) on the remaining 96; idiosyncratic N(0,0.5); one treated
unit, \(N=100\) controls, \(T_1=T_2\). Shocks D1-D7 set the
post-period ATE (D1-D3 null → size; D4-D7 non-zero → power).
Driving the packaged PDA (methods=["fs","LASSO"],
fs_intercept=False) at \(T_1=100\) (200 reps for size, 60 for power):
factors |
method |
# donors |
size (D1) |
power (D5) |
|---|---|---|---|---|
i.i.d. |
fs |
3.9 |
0.090 |
1.00 |
i.i.d. |
LASSO |
9.5 |
0.065 |
1.00 |
dynamic |
fs |
4.5 |
0.075 |
1.00 |
dynamic |
LASSO |
15.0 |
0.140 |
1.00 |
The paper’s geometry reproduces. Forward selection is parsimonious – it
keeps to ~the 4 relevant donors in both structures – while LASSO
over-selects (9-15 donors). Forward selection’s test is correctly sized
(≈ 0.05-0.09) under both factor structures, the robustness Shi & Huang
emphasise. LASSO is correctly sized under i.i.d. factors (0.065, matching
the paper’s 0.058) but its size inflates under dynamic factors (0.140;
paper’s modified-BIC LASSO 0.184) – the size inflation the paper reports is a
dynamic-factor phenomenon, not an i.i.d. one. Both tests are fully powered at
D5 (mean-1 shift). Durable case: pda_table1.
Note
mlsynth’s LASSO is cross-validated; the paper’s is not. Shi & Huang
select the Lasso penalty with a modified BIC (Remark 4 cont., p.521:
“we tune the constants in the modified BIC to allow Lasso to take in more
variables”); mlsynth’s L1-PDA selects it with LassoCV (5-fold
cross-validation). The two are different penalty rules, so the LASSO cells
above are mlsynth’s CV variant, not a cell-by-cell match of the paper’s
Lasso. What both share – and what the benchmark pins – is the geometry:
LASSO over-selects relative to fs and its size inflates under dynamic
factors, while forward selection (the paper’s method, validated cell-by-
cell on Hong Kong in pda_hongkong) stays parsimonious and correctly
sized.
The fs_intercept knob – valid size on factor data
Achieving the correct fs size above required a fix. The released fsPDA
R package (est.fsPDA.R) fits the donor regression with an intercept;
on the paper’s mean-zero factor DGP that intercept absorbs a spurious
pre-period constant which extrapolates into the post window, biasing the
gap and inflating the null rejection rate to ~0.20. The paper’s Table 1
was produced by the simulation code (FS.R), which fits without an
intercept and yields valid size. mlsynth exposes both via
PDAConfig.fs_intercept (default False = the no-intercept, valid-size
form; set True for panels with genuine unit level shifts). With the
default, the fs D1 rejection rate drops from ~0.20 (intercept) to the
~0.05-0.09 reported above (no intercept).
One honest caveat: under dynamic factors fs shows mild residual size inflation at small \(T\) (the “imprecise long-run-variance” effect the paper itself notes), shrinking toward 5% as \(T_1 \to \infty\).
L2-relaxation (Shi & Wang). The l2 method’s out-of-sample MPSE falls
with \(T\) and its test approaches the nominal 5% size as \(T_1 \to
\infty\), matching Shi & Wang’s Table 2 (size \(0.142\) at \(T_1=50\)
→ \(0.072\) at \(200\)). Its per-fit cross-validation over the
\(\tau\) grid makes large Monte Carlos expensive (~5 s/fit), so the full
table is summarized rather than swept here.
Multiple treated units#
When several units are treated by the same intervention, Shi & Wang’s L2-relaxation PDA fits a counterfactual per treated unit against the shared control pool and aggregates the effects into a per-period cross-sectional ATE,
with \(\hat{\Sigma}_e\) the cross-sectional covariance of the pre-period
prediction residuals (so the standard error is constant across post-periods).
mlsynth.utils.pda_helpers.multitreat.run_pda_multitreat() implements this.
Because every per-unit fit shares the same
\(\hat{\boldsymbol{\Sigma}} = X'X/T_1\), all \(J\) fits run through a
single OSQP factorisation – the batched solver
mlsynth.utils.pda_helpers.l2.batch.l2_relax_batch() updates only the
constraint bounds per (unit, tau) (hundreds of conic solves become hundreds
of warm-started ADMM updates). On the Brexit study (52 UK firms vs 300 controls)
this reproduces the paper’s first-day return ATE of \(-4.3\%\)
(\(t \approx -7.4\)); see benchmarks/cases/pda_brexit.py.
Core API#
Panel Data Approach (PDA) for program evaluation.
A thin, NumPy-first orchestration over mlsynth.utils.pda_helpers. PDA
(Hsiao, Ching & Wan 2012) predicts a single treated unit’s untreated
counterfactual by a linear regression on the control units fit over the
pre-treatment window, then extrapolates out-of-sample to estimate the ATE.
Three high-dimensional variants are supported, each with the inference theory
from its own paper:
l2– L2-relaxation (Shi & Wang 2024): dense ridge-like coefficients viamin ||beta||_2^2 s.t. ||eta_hat - Sigma_hat beta||_inf <= tau; ATE inference combines pre-residual and post-effect HAC long-run variances.LASSO– L1 (Li & Bell 2017): sparse donor selection by cross-validated LASSO; HAC t-test on the ATE with a first-stage variance term.fs– forward selection (Shi & Huang 2023): greedy R^2 selection with a modified-BIC stopping rule; post-selection HAC t-test (sample splitting).
Set method for one variant or methods to run several at once.
- class mlsynth.estimators.pda.PDA(config: PDAConfig | dict)#
Bases:
objectPanel Data Approach estimator (l2 / LASSO / fs).
- Parameters:
config (PDAConfig or dict) – Validated configuration. Beyond the common fields, PDA reads
method/methods(which variant(s) to run),tau(the l2 relaxation parameter;Noneselects it by validation), andalpha(CI level).
References
Hsiao, C., Ching, H. S., & Wan, S. K. (2012). A panel data approach for program evaluation. Journal of Applied Econometrics, 27(5), 705-740.
Shi, Z., & Wang, Y. (2024). L2-relaxation for Economic Prediction.
Li, K. T., & Bell, D. R. (2017). Estimation of average treatment effects with panel data. Journal of Econometrics, 197(1), 65-75.
Shi, Z., & Huang, J. (2023). Forward-selected panel data approach for program evaluation. Journal of Econometrics, 234(2), 512-535.
- fit() PDAResults#
Run the requested PDA variant(s) and return typed results.
- Returns:
PDAResults – Container of per-variant
PDAMethodFitobjects (coefficients, counterfactual, gap, ATE, HAC standard error, CI, p-value, donor weights), with convenience aliases (att,att_se,counterfactual,donor_weights) forwarding to the primary variant.
Configuration#
- class mlsynth.config_models.PDAConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', plot: ~mlsynth.config_models.PlotConfig = <factory>, method: ~typing.Annotated[str, _PydanticGeneralMetadata(pattern='^(LASSO|l2|fs)$')] = 'fs', methods: ~typing.List[str] | None = None, tau: float | None = None, alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05, fs_intercept: bool = False, lrvar_lag: int | None = None, l2_standardize: bool = True)#
Configuration for the Panel Data Approach (PDA) estimator.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Result Containers#
PDA.fit() returns a
PDAResults, whose fits maps
each variant to a
PDAMethodFit (coefficients,
intercept, counterfactual, gap, ATE, HAC standard error, CI, p-value, donor
weights, and the selected-donor list for lasso/fs). The prepared,
NumPy-only panel is exposed as a
PDAInputs, with units and time
addressed through an IndexSet.
Note
PDA.fit() returns an EffectResult on the
standardized two-family contract. It is a dispatcher over the variants in
res.fits (l2 / lasso / fs); the selected variant drives the flat
accessors (res.att / res.att_ci / res.counterfactual /
res.gap / res.donor_weights / res.pre_rmse), which resolve
through the standardized sub-models. res.donor_weights are the regression
coefficients (PDA is a regression counterfactual, not a simplex average);
res.att_by_method() / res.se_by_method() report every variant.
Frozen, NumPy-first containers for the Panel Data Approach (PDA).
PDA (Hsiao, Ching & Wan 2012) predicts a treated unit’s untreated
counterfactual by a linear regression on the control units fit over the
pre-treatment window, then extrapolates out-of-sample. mlsynth exposes
three high-dimensional PDA variants, each in its own subpackage with the
inference theory from its own paper:
l2– L2-relaxation (Shi & Wang 2024).lasso– L1/LASSO (Li & Bell 2017).fs– forward-selected PDA (Shi & Huang 2023).
Everything below is pure NumPy; units/time are addressed through
IndexSet. The only DataFrame touchpoint is setup.
- class mlsynth.utils.pda_helpers.structures.PDAInputs(unit_index: ~mlsynth.utils.fast_scm_helpers.structure.IndexSet, time_index: ~mlsynth.utils.fast_scm_helpers.structure.IndexSet, y: ~numpy.ndarray, X: ~numpy.ndarray, T0: int, treated_label: ~typing.Any, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#
Bases:
objectPreprocessed, NumPy-only panel for the PDA engine.
- Parameters:
unit_index (IndexSet) – All
Ndonor units (column order ofX).time_index (IndexSet) – All
Tperiods (row order ofyandX).y (np.ndarray) – Treated-unit outcome over all periods, shape
(T,).X (np.ndarray) – Donor outcomes, shape
(T, N).T0 (int) – Number of pre-treatment periods (
T1); post isT2 = T - T0.treated_label (Any) – Identifier of the treated unit.
metadata (dict) – Free-form provenance.
- X: ndarray#
- property donor_labels: ndarray#
- time_index: IndexSet#
- unit_index: IndexSet#
- y: ndarray#
- class mlsynth.utils.pda_helpers.structures.PDAMethodFit(name: str, beta: ~numpy.ndarray, intercept: float, counterfactual: ~numpy.ndarray, gap: ~numpy.ndarray, att: float, att_se: float, ci: ~typing.Tuple[float, float], p_value: float, donor_weights: ~typing.Dict[~typing.Any, float], selected_donors: ~typing.List[~typing.Any] | None = None, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#
Bases:
objectA single PDA-variant fit (l2 / lasso / fs).
- beta: ndarray#
- counterfactual: ndarray#
- gap: ndarray#
- class mlsynth.utils.pda_helpers.structures.PDAResults(*, effects: ~mlsynth.config_models.EffectsResults | None = None, fit_diagnostics: ~mlsynth.config_models.FitDiagnosticsResults | None = None, time_series: ~mlsynth.config_models.TimeSeriesResults | None = None, weights: ~mlsynth.config_models.WeightsResults | None = None, inference: ~mlsynth.config_models.InferenceResults | None = None, method_details: ~mlsynth.config_models.MethodDetailsResults | None = None, sub_method_results: ~typing.Dict[str, ~typing.Any] | None = None, additional_outputs: ~typing.Dict[str, ~typing.Any] | None = None, raw_results: ~typing.Dict[str, ~typing.Any] | None = None, execution_summary: ~typing.Dict[str, ~typing.Any] | None = None, plot_config: ~mlsynth.config_models.PlotConfig | None = None, inputs: ~mlsynth.utils.pda_helpers.structures.PDAInputs, fits: ~typing.Dict[str, ~mlsynth.utils.pda_helpers.structures.PDAMethodFit], selected_variant: str = 'fs', metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#
Bases:
BaseEstimatorResultsTop-level container returned by
mlsynth.PDA.fit().An
EffectResult(the observational report): a dispatcher over the PDA variants infits, it lifts the selected variant’s quantities into the standardized sub-models so the flat accessors (att/att_ci/counterfactual/gap/donor_weights/pre_rmse) resolve through the base contract.donor_weightshere are the regression coefficients (PDA is a regression/factor counterfactual, not a simplex donor average). The PDA-specific fields keep every variant fit.- Parameters:
inputs (PDAInputs) – Preprocessed panel.
fits (dict) –
{variant_name: PDAMethodFit}for every variant run (l2 / lasso / fs).selected_variant (str) – Which variant drives the standardized surface / flat accessors.
- fits: Dict[str, PDAMethodFit]#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function BaseEstimatorResults.Config.<lambda>>}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Helper Modules#
Data preparation – the only DataFrame touchpoint: pivots to NumPy, builds the unit/time ``IndexSet``es, and splits pre/post.
Long-DataFrame -> NumPy boundary for PDA (the only pandas touchpoint).
- mlsynth.utils.pda_helpers.setup.derive_treatment(df: DataFrame, unitid: str, time: str, treat: str) Tuple[Any, Any]#
Read the single treated unit and its first treated period from
treat.
- mlsynth.utils.pda_helpers.setup.prepare_pda_inputs(df: DataFrame, *, unitid: str, time: str, outcome: str, treat: str) PDAInputs#
Pivot the panel to NumPy, build
IndexSets, split pre/post.- Returns:
PDAInputs – Pure-NumPy container: treated vector
y(T,), donor matrixX(T, N),T0, and unit/timeIndexSets.
Shared HAC long-run-variance machinery (Bartlett/Newey-West) and the N(0,1) test used by all three variants.
Shared HAC long-run-variance machinery for PDA inference.
All three PDA variants base inference on a heteroskedasticity- and autocorrelation-consistent (HAC) long-run variance of a scalar time series of post-period treatment effects (and, for some variants, pre-period prediction residuals). The Bartlett/Newey-West and uniform kernels are provided; the default truncation lag follows Newey-West.
- mlsynth.utils.pda_helpers.inference.fspda_lrvar_lag(T2: int, lrvar_lag: int | None = None) int#
Bartlett-kernel truncation lag for the fsPDA long-run variance.
Follows Shi & Huang’s released
fsPDApackage (est.fsPDA): the default isfloor(T2 ** (1/4))and a user-supplied lag must be a non-negative integer no larger thanfloor(sqrt(T2)).
- mlsynth.utils.pda_helpers.inference.hac_lrv(z: ndarray, lag: int | None = None, kernel: str = 'bartlett') float#
HAC long-run variance of a mean-zero-able scalar series
z.lrv = gamma_0 + sum_{l=1}^{L} k(l) * 2 * gamma_lwithgamma_lthe sample autocovariance andkthe Bartlett (1 - l/(L+1)) or uniform kernel. The series is de-meaned first.
- mlsynth.utils.pda_helpers.inference.lrvar_prewhite_nw(x: ndarray) float#
Prewhitened Newey-West long-run variance of the mean of
x.Andrews-Monahan (1992) VAR(1) prewhitening + a Bartlett kernel with the Newey-West (1994) data-driven bandwidth + the finite-sample adjustment – i.e. R’s
sandwich::lrvar(x, type = "Newey-West", prewhite = TRUE, adjust = TRUE), which Shi & Huang use for the post-selection t-test in their applications (app1_luxury_watch/fsPDA.R). Returns the variance of the sample mean (so the t-statistic ismean(x) / sqrt(lrvar_prewhite_nw(x))).Prewhitening is what makes this differ sharply from the plain Bartlett estimator on strongly serially-dependent effects (e.g. monthly growth rates, which mean-revert): the VAR(1) step removes the bulk of the dependence before the kernel smooths the remainder, then the long-run factor
1 / (1 - A)recolors it.
- mlsynth.utils.pda_helpers.inference.newey_west_lag(n: int) int#
Newey-West (1994) automatic truncation lag
floor(4 (n/100)^(2/9)).
- mlsynth.utils.pda_helpers.inference.normal_test(att: float, se: float, alpha: float = 0.05)#
Two-sided N(0,1) test: returns (p_value, (ci_lower, ci_upper)).
L2-relaxation (Shi & Wang): the relaxation primal, tau validation, and the two-term HAC ATE inference.
L2-relaxation estimation (Shi & Wang 2024).
Solves the L2-relaxation primal (their Eq. 3)
min_beta (1/2) ||beta||_2^2 s.t. || eta_hat - Sigma_hat beta ||_inf <= tau,
with Sigma_hat = X'X / T1 and eta_hat = X'y / T1 over the pre-period.
Following the authors’ released L2relax (ishwang1/L2relax-PDA),
the treated and control series are standardised (demeaned and scaled to unit
variance) before forming Sigma / eta, and the solution is mapped back to
the original scale: beta = sd_y * (beta_tilde / sd_X) and intercept
alpha = mean_y - mean_X . beta. Standardisation is the default (it is what
the replication code does and what reproduces the paper’s empirical results);
set standardize=False for the raw-scale variant. The tuning parameter
tau is chosen by out-of-sample validation over a log-spaced grid (the
optimal tau is often a tiny fraction of max|eta|).
- mlsynth.utils.pda_helpers.l2.estimation.cross_validate_tau(y_pre: ndarray, X_pre: ndarray, val_frac: float = 0.2, n_coarse: int = 40, n_fine: int = 40, standardize: bool = True, tau_grid: ndarray | None = None) float#
Validate
tauby sequential out-of-sample validation on the tail.The training window is split in time order – fit on the earlier
1 - val_fracfraction, validate on the most-recentval_fractail – so no future period informs an earlier counterfactual (unlike a K-fold split over contiguous blocks, which trains on both past and future of each fold).tau_gridoverrides the automatic log-spaced grid (e.g. to match a reference grid); otherwise the grid is log-spaced up tomax|eta|(the optimaltauis often a tiny fraction of that).
- mlsynth.utils.pda_helpers.l2.estimation.fit_l2(y: ndarray, X: ndarray, T0: int, tau: float | None = None, standardize: bool = True, tau_grid: ndarray | None = None) Tuple[ndarray, float, ndarray, float]#
Fit L2-relaxation; return
(beta, intercept, counterfactual, tau).
- mlsynth.utils.pda_helpers.l2.estimation.l2_relax(y_pre: ndarray, X_pre: ndarray, tau: float, standardize: bool = True) Tuple[ndarray, float]#
Solve the L2-relaxation primal for coefficients and intercept.
ATE inference for L2-relaxation PDA (Shi & Wang 2024, Theorem 3).
Single treated unit. With the post-period estimated effects
Delta_hat_t = y_t - y_hat_t (t in T2) and the pre-period prediction
residuals e_t = y_t - y_hat_t (t in T1), the ATE Delta_bar is
asymptotically normal,
Z_hat = Delta_bar / sqrt( rho_hat_(1)^2 / T1 + rho_hat_(2)^2 / T2 ) -> N(0, 1),
where rho_hat_(1)^2 is the HAC long-run variance of the pre-period
residuals and rho_hat_(2)^2 is the HAC long-run variance of the de-meaned
post-period effects. Both estimation uncertainty (pre) and post-period noise
contribute.
- mlsynth.utils.pda_helpers.l2.inference.l2_ate_inference(y: ndarray, counterfactual: ndarray, T0: int, alpha: float = 0.05) Tuple[float, float, Tuple[float, float], float]#
Return
(att, se, ci, p_value)for the L2-relaxation ATE.
LASSO (Li & Bell): cross-validated L1 estimation and the HAC t-test with a first-stage variance term.
L1/LASSO PDA estimation (Li & Bell 2017).
Selects control units and estimates coefficients by LASSO on the pre-period, with the penalty chosen by cross-validation, then predicts the treated unit’s counterfactual out-of-sample.
- mlsynth.utils.pda_helpers.lasso.estimation.fit_lasso(y: ndarray, X: ndarray, T0: int, cv: int = 5, random_state: int = 0) Tuple[ndarray, float, ndarray, ndarray]#
Fit LASSO PDA; return
(beta, intercept, counterfactual, support_mask).
ATE inference for LASSO PDA (Li & Bell 2017, Theorem 3.2).
The test statistic is sqrt(T2) * Delta_bar / sqrt(Sigma_hat) with
Sigma_hat = Sigma_hat_1 + Sigma_hat_2:
Sigma_hat_2– the HAC (Newey-West) long-run variance of the post-period effects, the always-present post-period averaging variance;Sigma_hat_1– the first-stage (pre-period estimation) variance, here the OLS prediction variance of the mean post-period counterfactual on the LASSO-selected support. Li & Bell note it is negligible whenT1 >> T2.
Equivalently the ATE standard error is
SE = sqrt(var_firststage + Sigma_hat_2 / T2).
- mlsynth.utils.pda_helpers.lasso.inference.lasso_ate_inference(y: ndarray, X: ndarray, counterfactual: ndarray, support: ndarray, T0: int, alpha: float = 0.05) Tuple[float, float, Tuple[float, float], float]#
Return
(att, se, ci, p_value)for the LASSO PDA ATE.
Forward selection (Shi & Huang): greedy R^2 selection with modified-BIC stopping and the post-selection HAC t-test.
Forward-selected PDA estimation (Shi & Huang 2023).
Greedy forward selection of control units: at each step add the donor whose
inclusion maximizes the pre-treatment OLS R^2 (equivalently minimizes the
residual variance sigma^2 = mean(e^2)). Selection stops as soon as the
modified information criterion
IC(r) = log( sigma^2(U_r) ) + B * r, B = log(log N) * log(T1) / T1
stops decreasing (the stopping rule of Wang, Li & Tsai used in the authors’
fsPDA R package), starting from the intercept-only IC = log(var(y1)).
The counterfactual is the OLS extrapolation on the selected set. This is a
direct port of est.fsPDA.R.
- mlsynth.utils.pda_helpers.fs.estimation.forward_select(y: ndarray, X: ndarray, T0: int, intercept: bool = False) Tuple[List[int], ndarray, float, ndarray]#
Forward-select donors with the stop-on-increase IC rule, then refit OLS.
Returns
(selected_indices, beta_full, intercept, counterfactual)wherebeta_fullis anN-vector with zeros off the selected support.- Parameters:
intercept (bool, default False) – When
Falsethe donor regression is fit without a constant and the IC baseline islog(mean(y_pre**2))– the paper’s simulation convention (Shi & Huang 2023,simulation/FS.R). This is the correctly-specified form for the factor DGP (mean-zero factors, no unit level shift) and yields valid test size. WhenTruea constant is included and the baseline islog(var(y_pre))– the releasedfsPDAR package convention (est.fsPDA.R), appropriate for panels with genuine level differences but size-distorting on centered data.
Post-selection ATE inference for forward-selected PDA (Shi & Huang 2023, Eq. 4).
Because forward selection uses pre-treatment data only and the pre/post periods become asymptotically independent under weak dependence, the naive conditional t-statistic is valid:
Z_U = Delta_bar / sqrt(lrvar_hat) -> N(0, 1),
where lrvar_hat is the long-run variance of the sample mean of the
(de-meaned) post-period treatment effects. By default mlsynth uses the
prewhitened Newey-West estimator (sandwich::lrvar(..., prewhite = TRUE,
adjust = TRUE)) that Shi & Huang use in their applications
(app1_luxury_watch/fsPDA.R); on serially-dependent effects this is far less
conservative than a plain Bartlett kernel. Supplying lrvar_lag instead
switches to the released est.fsPDA package’s fixed-lag Bartlett estimator
(floor(T2 ** (1/4)) if the cap, see fspda_lrvar_lag()).
- mlsynth.utils.pda_helpers.fs.inference.fs_ate_inference(y: ndarray, counterfactual: ndarray, T0: int, alpha: float = 0.05, lrvar_lag: int | None = None) Tuple[float, float, Tuple[float, float], float]#
Return
(att, se, ci, p_value)for the forward-selected PDA ATE.
Run loop assembling the per-variant fits.
Run the requested PDA variant(s) and assemble typed per-method fits.
- mlsynth.utils.pda_helpers.orchestration.resolve_methods(method: str, methods: List[str] | None) List[str]#
Internal method keys to run (explicit
methodswin overmethod).
- mlsynth.utils.pda_helpers.orchestration.run_pda(inputs: PDAInputs, methods: List[str], tau: float | None, alpha: float, fs_intercept: bool = False, lrvar_lag: int | None = None, l2_standardize: bool = True, l2_tau_grid: Sequence[float] | None = None) Dict[str, PDAMethodFit]#
Fit each requested PDA variant with its own paper’s inference.