SEQ_SDID: Sequential Synthetic DiD (Arkhangelsky & Samkov 2025)
- Estimator: Sequential Synthetic Difference-in-Differences (Sequential SDiD), mlsynth.SequentialSDID
- Source: Arkhangelsky, D. & Samkov, A. (2025), “Sequential Synthetic Difference in Differences,” arXiv:2404.00164v2.
- Replication type: Path B, the paper’s Monte Carlo (Section 5.2.2, “Experiment 2: Calibrated State-Level Panel”; Table 1 and Figures 4-5).
- Status: Verified (geometry). The headline coverage/RMSE contrast is reproduced; the exact Table-1 cells require the authors’ (non-public) CPS panel.
Validation strategy
The paper’s central empirical claim is about inference: when parallel trends fail because adoption timing is correlated with unobserved interactive fixed effects, standard difference-in-differences is severely biased and its confidence intervals under-cover, whereas Sequential SDiD stays approximately unbiased with near-nominal coverage. Table 1 quantifies this on a state-by-year panel calibrated to March-CPS women’s log wages: 95% CI coverage of ~0.95 for Sequential SDiD against ~0.70 for DiD, with lower RMSE at every lag.
That panel is not public, so the cells cannot be matched value-for-value. Instead we re-implement the design from the paper’s description (scenario 1, paper only) and reproduce its geometry: the same qualitative ranking and an even sharper version of the same coverage collapse.
A convenient feature of the method makes the comparison airtight: the paper’s
standard-DiD comparator is the same estimator at \(\eta \to \infty\) (the
“Original Results” line in Figure 1; the stacked-DiD limit of Remark 2.2),
exposed as mode="sdid_imputation". Both arms therefore share the Bayesian
bootstrap and differ only in the weighting.
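Why a very large \(\eta\) turns a weighted comparison into a DiD-style one can be seen in a toy setting. The sketch below is an illustration under stated assumptions, not the paper's or mlsynth's objective: it uses two donors, weights constrained to sum to one, and a ridge penalty \(\eta\,(w^2 + (1-w)^2)\). The function ridge_weight and the data are hypothetical.

```python
def ridge_weight(target, donor_a, donor_b, eta):
    """Weight on donor_a that minimises
        sum_t (target_t - w*a_t - (1-w)*b_t)^2 + eta * (w^2 + (1-w)^2)
    with the second donor receiving 1 - w (an assumed, simplified objective).
    Setting the derivative to zero gives the closed form below.
    """
    d = [a - b for a, b in zip(donor_a, donor_b)]   # donor contrast
    r = [y - b for y, b in zip(target, donor_b)]    # target relative to donor_b
    num = sum(di * ri for di, ri in zip(d, r)) + eta
    den = sum(di * di for di in d) + 2.0 * eta
    return num / den

# Hypothetical pre-period paths: donor_a tracks the target, donor_b does not.
target = [1.0, 2.0, 3.0]
donor_a = [1.1, 2.1, 2.9]
donor_b = [0.0, 0.5, 0.2]

for eta in [0.0, 0.05, 1.0, 100.0, 1e9]:
    w = ridge_weight(target, donor_a, donor_b, eta)
    print(f"eta={eta:g}: weights = ({w:.4f}, {1 - w:.4f})")
```

With a small penalty the weight concentrates on the donor that tracks the target; as the penalty grows the fit term becomes negligible and the weights approach equal shares, which is the equal-weight comparison that a DiD contrast uses.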
The data-generating process
The DGP is packaged in mlsynth.utils.seq_sdid_helpers.simulate. Following the paper’s recipe:
- Structural truth is fixed, only shocks are redrawn. The authors freeze the estimated structural components (two-way FE plus a low-rank interactive fixed effect) and generate new draws by resampling the idiosyncratic AR shocks. calibrate_staggered_ife() draws the structure once; each draw of simulate_replication() redraws only the AR(2) noise. This is what makes the within-panel bootstrap a valid measure of the sampling variability that the Monte Carlo averages over.
- The IFE is a differential linear trend: the canonical rank-one interactive fixed effect, \(\lambda_i \, f_t\) with \(f_t = t/T\). Adoption is tilted toward high-loading (steeper-trending) units, so treatment timing is correlated with the unobserved trend. DiD assumes a common trend and is biased; Sequential SDiD balances the loading against later-adopting and never-treated donors and is not.
- Cohorts are enlarged by replicating each unit four times (Section 5.2.1), so cohort aggregates concentrate.
- Only donor-balanced cohorts are estimated. A cohort needs at least two later / never-treated donor cohorts to balance its loading, so a_max is capped to the sixth-latest cohort (the latest cohorts are donor-starved; see “Limitations” in the estimator’s Sequential Synthetic Difference-in-Differences (Sequential SDiD) documentation).
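The first two ingredients of the recipe can be sketched in a few lines. This is a minimal illustration, not the code in mlsynth.utils.seq_sdid_helpers.simulate: the panel has one treated and one control unit, and every number (fixed effects, loadings, AR(2) coefficients, adoption date) is a hypothetical choice. It shows a frozen structure with redrawn AR(2) shocks, and the bias a two-group DiD picks up when the treated unit has the steeper trend.

```python
import random

T = 20          # number of periods (hypothetical)
TAU = 1.0       # true effect, constant across lags
ADOPT = 10      # adoption period of the treated unit (hypothetical)

# Structure chosen once and then frozen: unit effects, time effects, loadings.
ALPHA = {"treated": 0.5, "control": -0.2}
BETA = [0.1 * t for t in range(T)]
LAMBDA = {"treated": 2.0, "control": 0.5}   # treated unit trends more steeply

def structural(unit, t):
    """Noise-free outcome: two-way FE + rank-one IFE lambda_i * t/T + effect."""
    treated_now = unit == "treated" and t >= ADOPT
    return ALPHA[unit] + BETA[t] + LAMBDA[unit] * t / T + TAU * treated_now

def ar2_noise(rng, n, phi1=0.5, phi2=0.2, sigma=0.1):
    """One AR(2) shock path started from zero; coefficients are illustrative."""
    e = [0.0, 0.0]
    for _ in range(n):
        e.append(phi1 * e[-1] + phi2 * e[-2] + rng.gauss(0.0, sigma))
    return e[2:]

def replication(seed):
    """New panel: same structure, fresh AR(2) shocks for each unit."""
    rng = random.Random(seed)
    return {u: [structural(u, t) + eps
                for t, eps in enumerate(ar2_noise(rng, T))]
            for u in ALPHA}

def did(panel, lag):
    """Two-group DiD: change from ADOPT-1 to ADOPT+lag, treated minus control."""
    pre, post = ADOPT - 1, ADOPT + lag
    return ((panel["treated"][post] - panel["treated"][pre])
            - (panel["control"][post] - panel["control"][pre]))

# Without noise, the DiD bias equals the loading gap times the elapsed trend.
noiseless = {u: [structural(u, t) for t in range(T)] for u in ALPHA}
for lag in range(4):
    bias = did(noiseless, lag) - TAU
    trend_gap = (LAMBDA["treated"] - LAMBDA["control"]) * (lag + 1) / T
    print(f"lag {lag}: DiD bias {bias:.3f}, trend gap {trend_gap:.3f}")
```

In this toy the bias grows linearly with the lag because \(f_t = t/T\) keeps pulling the two units apart; calling replication() with different seeds changes only the shocks around that fixed bias.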
Reproducing Table 1’s geometry
import warnings

import numpy as np
from mlsynth import SequentialSDID
from mlsynth.utils.seq_sdid_helpers.simulate import (
    calibrate_staggered_ife, simulate_replication)

# Draw the structural components once; every replication reuses them.
design = calibrate_staggered_ife(seed=2024)
# True effect tau, estimator setting K, M Monte Carlo draws, B bootstrap reps.
tau, K, M, B = 1.0, 4, 40, 50

def fit(df, mode):
    # Fit one arm and return its event-study estimates and confidence intervals.
    res = SequentialSDID({"df": df, "outcome": "y", "treat": "treat",
                          "unitid": "unit", "time": "year", "mode": mode, "eta": 0.05,
                          "K": K, "a_max": design.a_max, "n_bootstrap": B, "seed": 7,
                          "display_graphs": False}).fit()
    return res.event_study.tau, res.event_study.ci

# Per-draw share of lags whose CI contains the true effect, by arm.
cov = {"ssdid": [], "sdid_imputation": []}
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for m in range(M):
        # Same frozen design, fresh shocks for each replication.
        df = simulate_replication(design, np.random.default_rng(8000 + m), tau=tau)
        for mode in cov:
            tau_hat, ci = fit(df, mode)
            cov[mode].append(((ci[:, 0] <= tau) & (tau <= ci[:, 1])).mean())

print("SSDiD coverage", np.mean(cov["ssdid"]))
print("DiD coverage", np.mean(cov["sdid_imputation"]))
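The loop above records only coverage, although each fit also returns the point estimates needed for bias and RMSE. A helper along the following lines could summarise all three from stored per-draw results. It is a sketch: the name summarise is hypothetical, and the aggregation convention (compute each metric per lag across draws, then average over lags) is an assumption, since the source does not state how its summary figures are aggregated.

```python
from math import sqrt

def summarise(estimates, intervals, tau):
    """Summarise Monte Carlo output.

    estimates: one sequence per draw of per-lag point estimates.
    intervals: one sequence per draw of per-lag (lower, upper) bounds.
    Metrics are computed per lag across draws, then averaged over lags
    (an assumed convention).
    """
    n_draws, n_lags = len(estimates), len(estimates[0])
    bias, rmse, cover = [], [], []
    for k in range(n_lags):
        errs = [est[k] - tau for est in estimates]
        bias.append(abs(sum(errs) / n_draws))
        rmse.append(sqrt(sum(e * e for e in errs) / n_draws))
        cover.append(sum(ci[k][0] <= tau <= ci[k][1] for ci in intervals) / n_draws)

    def avg(xs):
        return sum(xs) / len(xs)

    return {"coverage": avg(cover), "mean_abs_bias": avg(bias), "rmse": avg(rmse)}

# Toy input: two draws, two lags, true effect 1.0.
toy_estimates = [[1.1, 0.9], [0.9, 1.3]]
toy_intervals = [[(0.8, 1.4), (0.5, 0.95)], [(0.6, 1.2), (1.0, 1.6)]]
toy_summary = summarise(toy_estimates, toy_intervals, tau=1.0)
print(toy_summary)
```

Because the function only indexes its inputs, it should accept the arrays returned by fit as well as plain lists, provided the interval array has the lower bound in column 0 and the upper bound in column 1 as in the loop above.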
Results
At \(M = 40\) draws, \(B = 50\) bootstrap reps (the paper uses \(M = 1000\), \(B = 100\)):
| Metric | Sequential SDiD | Standard DiD | Paper (SSDiD / DiD) |
|---|---|---|---|
| 95% CI coverage | 0.945 | 0.45 | ~0.95 / ~0.70 |
| mean \(\lvert\mathrm{bias}\rvert\) | 0.062 | 0.305 | - |
| RMSE | 0.252 | 0.346 | SSDiD < DiD |
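With only \(M = 40\) draws, the coverage figures carry visible Monte Carlo noise. A rough bound on it follows from the fact that each draw contributes a value in \([0, 1]\), whose variance is at most \(p(1-p)\) when its mean is \(p\). The sketch below applies that bound to the two reported coverages; it assumes the draws are independent, and coverage_se_bound is a hypothetical helper name.

```python
from math import sqrt

def coverage_se_bound(coverage, n_draws):
    """Upper bound on the Monte Carlo standard error of an average of
    n_draws independent values in [0, 1] whose mean is `coverage`."""
    return sqrt(coverage * (1.0 - coverage) / n_draws)

M = 40  # Monte Carlo draws used for the table
for label, cov in [("Sequential SDiD", 0.945), ("Standard DiD", 0.45)]:
    print(f"{label}: coverage {cov:.3f}, MC s.e. at most {coverage_se_bound(cov, M):.3f}")
```

Both bounds are well under 0.1, while the two coverages differ by about 0.5, so the contrast between the arms is not an artifact of the small number of draws.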
What it confirms
- Sequential SDiD delivers valid inference (coverage 0.945, essentially the nominal 0.95) under an IFE violation that breaks DiD.
- Standard DiD’s coverage collapses to 0.45 and its bias is about five times larger; its CIs are unreliable in exactly the regime the method targets. (The collapse is sharper than the paper’s ~0.70 because the reconstructed differential-trend violation is stronger than the CPS calibration; the direction and ranking are the paper’s.)
- Sequential SDiD has lower RMSE, the second half of Table 1’s finding.
A noiseless corollary, pinned in test_seq_sdid.py, underlies the result:
on a noiseless rank-one IFE the estimator recovers the effect to machine
precision for every donor-balanced cohort, so the design’s reliability is not a
tolerance artifact.
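The mechanism behind the noiseless result can be illustrated in a stripped-down setting. The sketch below is not the test in test_seq_sdid.py and not the estimator itself: it assumes the loadings are known, uses two donors, and picks weights that sum to one and reproduce the treated unit's loading. All numbers and function names are hypothetical.

```python
T, ADOPT, TAU = 20, 10, 1.0     # hypothetical panel length, adoption date, effect
LAM_TREATED = 1.6               # loading of the treated unit
LAM_DONORS = [0.4, 2.0]         # loadings of two donors that are never treated here
ALPHA_DONORS = [-0.1, 0.7]      # donor unit effects

def outcome(alpha, lam, t, treated=False):
    """Noiseless outcome: unit effect + time effect + lam * t/T (+ effect)."""
    effect = TAU if (treated and t >= ADOPT) else 0.0
    return alpha + 0.1 * t + lam * t / T + effect

def change(alpha, lam, lag, treated=False):
    """Outcome change from the last pre-period to `lag` periods after adoption."""
    return (outcome(alpha, lam, ADOPT + lag, treated)
            - outcome(alpha, lam, ADOPT - 1, treated))

def weighted_did(weights, lag):
    """Treated change minus the weighted average of donor changes."""
    treated = change(0.3, LAM_TREATED, lag, treated=True)
    donors = sum(w * change(a, lam, lag)
                 for w, a, lam in zip(weights, ALPHA_DONORS, LAM_DONORS))
    return treated - donors

# Weights that sum to one and match the treated loading exactly.
w0 = (LAM_DONORS[1] - LAM_TREATED) / (LAM_DONORS[1] - LAM_DONORS[0])
balanced = [w0, 1.0 - w0]
uniform = [0.5, 0.5]            # equal weights, as in a plain DiD contrast

for lag in range(4):
    err_balanced = weighted_did(balanced, lag) - TAU
    err_uniform = weighted_did(uniform, lag) - TAU
    print(f"lag {lag}: balanced error {err_balanced:.2e}, uniform error {err_uniform:.3f}")
```

Once the weighted donor loading equals the treated loading, the trend term cancels at every lag and only floating-point rounding remains, whereas equal weights leave a bias proportional to the loading gap.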
The durable check lives in benchmarks/cases/seq_sdid_mc.py:
python benchmarks/run_benchmarks.py --case seq_sdid_mc