Sequential Synthetic Difference-in-Differences (Sequential SDiD)

Overview#

Sequential Synthetic Difference-in-Differences (Sequential SDiD, arXiv:2404.00164v2, Arkhangelsky & Samkov, 2025) is an event-study estimator for staggered-adoption designs that remains robust when the parallel-trends assumption fails. It adapts the canonical SDiD of Arkhangelsky et al. (2021) by operating on cohort-level aggregates and sequentially imputing treated cells with their estimated counterfactuals so that bias from early cohorts does not cascade into later ones.

The estimator is asymptotically equivalent to an infeasible oracle OLS regression that knows the unobserved interactive fixed effects (Proposition 3.1 of the paper), giving it the first formal efficiency guarantees for an SC-type method. Five structural differences distinguish Sequential SDiD from the canonical SDID estimator already in mlsynth:

It works on aggregated cohort outcomes \(y_{a, t} \coloneqq n_a^{-1} \sum_{i:\,A_i = a} y_{i, t}\) rather than unit-level data, with cohort shares \(\pi_a \coloneqq n_a / n\) carrying the unit-count information.
Weights satisfy only the sum-to-one constraint \(\mathbf{1}^\top\mathbf{w} = 1\); non-negativity is dropped, so individual weights may be negative.
The unit-weight penalty is the population-share-scaled \(\eta^{2} \sum_j w_j^2 / \pi_j\); the time-weight penalty is \(\eta^{2} \sum_l \lambda_l^2\).
Donors for cohort \(a\) are restricted to later-adopting cohorts \(j > a\) (including the never-treated cohort), not the universe of controls.
Cohort-by-horizon effects are estimated in a sequential cascade: each \(\widehat{\tau}_{a, k}^{\,\mathrm{SSDiD}}\) is computed, then the treated cell \(y_{a, a + k}\) is overwritten with \(y_{a, a + k} - \widehat{\tau}_{a, k}^{\,\mathrm{SSDiD}}\) so subsequent \((a', k')\) steps see an imputed panel free of treatment contamination.

When to Use This Estimator#

Event studies with staggered adoption – units switching on at different dates – are the workhorse design of applied micro. The modern estimators built for them (Callaway & Sant’Anna 2020, de Chaisemartin & d’Haultfœuille 2020, Sun & Abraham 2020, Borusyak et al. 2024) fixed the heterogeneous-effects bug in two-way fixed effects, but they still rest on parallel trends: absent treatment, treated and comparison cohorts would have moved together. When unobserved interactive fixed effects (latent unit factors loading on latent time factors) drive selection into treatment, parallel trends fails and those estimators are biased – exactly the regime Sequential SDiD (Arkhangelsky & Samkov 2025) targets.

Its bet is to model the confounder. Within a linear interactive-fixed-effects model, Sequential SDiD is proven asymptotically equivalent to the infeasible oracle OLS that knows the latent factors (Prop. 3.1). That equivalence buys three things at once: robustness to parallel-trends violations of the IFE type, asymptotic normality with standard inference, and the first formal efficiency guarantee for an SC-type estimator. The sequential imputation – estimate the earliest cohort, subtract its effect, reuse the cleaned panel for later cohorts – gives a cohesive analysis of the whole event study, in contrast to per-cohort SC methods (Partially Pooled SCM (PPSCM), Cattaneo et al. 2021) that treat each adoption cohort as a separate problem.

Reach for Sequential SDiD when#

You have a staggered-adoption event study and you suspect parallel trends fails because of unobserved confounders that look like interactive fixed effects (selection on latent trends, not just levels).
Adoption cohorts are reasonably large. The method averages outcomes within each cohort and leans on a law of large numbers to kill idiosyncratic noise; this is the estimator’s core requirement and its main limitation.
You want valid, standard inference (asymptotic normality) and a defensible efficiency claim, plus a dynamic event-study path of cohort-by-horizon effects.
You are in the fixed-\(T\), large-\(N\) regime typical of county/firm/individual panels aggregated into a handful of adoption cohorts.

Do not use Sequential SDiD when#

Cohorts are small (few units per adoption date, or single-unit cohorts). The within-cohort averaging has nothing to average and the oracle equivalence does not engage. Use Partially Pooled SCM (PPSCM), whose partial pooling is designed for noisy per-unit cohort fits, or canonical Synthetic Difference-in-Differences (SDID) per cohort.
There is a single treated unit or one common adoption time. The sequential cascade has nothing to cascade; use canonical Synthetic Difference-in-Differences (SDID) (many units, one time) or classic SC (Two-Step Synthetic Control, Forward Difference-in-Differences (FDID), Synthetic Control with Multiple Outcomes (SCMO)) for a single unit.
Parallel trends is credible and you want the simplest transparent thing. Standard staggered-DiD (Callaway-Sant’Anna and relatives) is fine; on the Bailey-Goodman-Bacon application where PT holds, Sequential SDiD merely reproduces standard DiD.
SUTVA is violated by spillovers onto the comparison cohorts – use Spatial Synthetic Difference-in-Differences (SpSyDiD) or Spillover-Aware Synthetic Control (SPILLSYNTH).
Distributional questions (quantiles, tails) – use Distributional Synthetic Control (DSC).

Mathematical Formulation#

Notation#

Let \(\mathcal{N} \coloneqq \{1, \dots, N\}\) index the underlying units, with generic unit \(i \in \mathcal{N}\), and let \(t \in \mathcal{T} \coloneqq \{1, \dots, T\}\) index the (1-indexed) periods. Each unit carries a (possibly infinite) adoption period \(A_i\), the first period in which it is treated (so horizon \(k = 0\) is period \(A_i\) itself); the never-treated cohort uses \(A_i = +\infty\). Units are grouped into cohorts indexed by adoption period \(a\), with \(n_a\) units in cohort \(a\), total \(n = \sum_a n_a\), and cohort share \(\pi_a \coloneqq n_a / n\). The unit-level outcome is \(y_{i, t}\); the cohort-level aggregate

\[y_{a, t} \;\coloneqq\; \frac{1}{n_a} \sum_{i:\, A_i = a} y_{i, t}\]

plays the role of a series in \(\mathbb{R}^{T}\), and the later-adopting cohorts of a treated cohort \(a\) (those with adoption period \(j > a\), including the never-treated cohort) form its donor pool, stacked into the donor matrix \(\mathbf{Y}_0\). Unit (donor) weights are the vector \(\mathbf{w}\) with entries \(w_j\) and optimiser \(\mathbf{w}^\ast\); time weights are the vector \(\boldsymbol{\lambda}\) with entries \(\lambda_l\). The per-(cohort, horizon) effect is \(\tau_{a, k}\) and the pooled horizon effect is \(\tau_k\). The symbol \(\tau\) is reserved for treatment effects throughout, so the regularization strength is written \(\eta \ge 0\) (never \(\tau\)). The significance level is the bare \(\alpha\) (matching SequentialSDIDConfig.alpha), distinct from the subscripted cohort fixed effect \(\alpha_a\) of the model in Setup.
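The aggregation defined above can be sketched in a few lines. This is only an illustration under stated assumptions: the unit-level outcomes are held in a matrix with rows as periods and columns as units, the never-treated cohort is encoded as the largest int64, and `aggregate_cohorts` and the toy numbers are hypothetical names and data rather than part of mlsynth.

```python
import numpy as np

NEVER = np.iinfo(np.int64).max  # stand-in for A_i = +infinity (assumed encoding)

def aggregate_cohorts(Y_units, adoption):
    """Return (cohort periods, cohort means of shape (T, A), cohort shares)."""
    cohorts = np.unique(adoption)  # ascending, so NEVER comes last
    Y_agg = np.column_stack(
        [Y_units[:, adoption == a].mean(axis=1) for a in cohorts]
    )
    pi = np.array([(adoption == a).mean() for a in cohorts])  # n_a / n
    return cohorts, Y_agg, pi

# Toy panel: T = 2 periods, N = 5 units (hypothetical numbers).
Y_units = np.array([[1.0, 3.0, 5.0, 7.0, 9.0],
                    [2.0, 4.0, 6.0, 8.0, 10.0]])
adoption = np.array([2, 2, 3, NEVER, NEVER])
cohorts, Y_agg, pi = aggregate_cohorts(Y_units, adoption)
```

Here the three columns of `Y_agg` are the cohort series for adoption period 2, adoption period 3, and the never-treated cohort, and `pi` holds the matching shares.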

Setup#

For each cohort the implementation computes the aggregate \(y_{a, t}\) above with share \(\pi_a\). Under Assumption 2.2 of the paper, the aggregate outcomes inherit the interactive-fixed-effects structure

\[y_{a, t} \;=\; \alpha_a + \beta_t + \theta_a^\top \psi_t + \mathbf{1}\{a \le t\}\,\tau_{a, t - a} + \epsilon_{a, t},\]

where \(\theta_a^\top \psi_t\) captures unobserved confounders that break parallel trends. Aggregation drives \(\epsilon_{a, t}\) to zero under the paper’s “large cohort” asymptotics, leaving the IFE structure identifiable.
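To see why the interactive term matters, the following sketch evaluates a plain two-by-two DiD on untreated outcomes generated from the model above with a rank-one factor and no noise. All parameter values and function names are hypothetical; the point is only that the placebo DiD equals \((\theta_a - \theta_j)(\psi_{\text{post}} - \psi_{\text{pre}})\), which is zero only when the loadings coincide.

```python
def untreated_outcome(alpha, beta, theta, psi):
    """No-intervention outcome alpha_a + beta_t + theta_a * psi_t (rank-one factor)."""
    return alpha + beta + theta * psi

def did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Plain two-by-two difference-in-differences."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical parameters; nobody is treated, so the true effect is zero.
beta_pre, beta_post = 0.1, 0.4
psi_pre, psi_post = 0.5, 1.5
alpha_a, theta_a = 1.0, 1.0      # "treated" cohort
alpha_j, theta_j = -0.5, 0.2     # comparison cohort

placebo = did(
    untreated_outcome(alpha_a, beta_pre, theta_a, psi_pre),
    untreated_outcome(alpha_a, beta_post, theta_a, psi_post),
    untreated_outcome(alpha_j, beta_pre, theta_j, psi_pre),
    untreated_outcome(alpha_j, beta_post, theta_j, psi_post),
)
bias_formula = (theta_a - theta_j) * (psi_post - psi_pre)

# With equal loadings the fixed effects difference out and the placebo vanishes.
placebo_equal = did(
    untreated_outcome(alpha_a, beta_pre, theta_a, psi_pre),
    untreated_outcome(alpha_a, beta_post, theta_a, psi_post),
    untreated_outcome(alpha_j, beta_pre, theta_a, psi_pre),
    untreated_outcome(alpha_j, beta_post, theta_a, psi_post),
)
```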

Identifying assumptions#

Interactive fixed effects (Assumption 2.2). The no-intervention outcomes follow the additive-plus-factor structure above: cohort effect \(\alpha_a\), time effect \(\beta_t\), and a low-rank interactive term \(\theta_a^\top \psi_t\) of latent cohort loadings on latent time factors, the component that breaks parallel trends.

Remark. It is precisely the \(\theta_a^\top \psi_t\) term that defeats standard staggered-DiD: when adoption correlates with the loading \(\theta_a\), comparison cohorts do not trace the treated cohort’s counterfactual trend. Sequential SDiD models this term rather than assuming it away, which is what buys robustness when parallel trends fails.
Large adoption cohorts. Cohort sizes \(n_a\) grow with the sample so that aggregation drives the idiosyncratic noise \(\epsilon_{a, t}\) to zero, leaving the IFE structure identifiable from the cohort aggregates.

Remark. This is the estimator’s core requirement and its main limitation: the within-cohort average must have enough units to average. With single-unit or tiny cohorts the law of large numbers does not engage and the oracle equivalence (Prop. 3.1) does not hold — use Partially Pooled SCM (PPSCM) or canonical Synthetic Difference-in-Differences (SDID) instead.
Donor balance. Each treated cohort \(a\) is balanced only against the cohorts adopting after it (\(j > a\)) plus any never-treated cohort. It therefore needs at least two such donor cohorts to balance even a rank-one interactive fixed effect: matching a one-dimensional loading together with the sum-to-one (intercept) direction means spanning \((1, \theta_a)\), which is two linear conditions on the weights.

Remark. With a single donor the sum-to-one constraint forces \(\mathbf{w} = [1]\) and the cohort effect collapses to an unbalanced DiD, biased whenever the loadings differ; that bias also cascades backward through the sequential imputation. The latest cohorts are the most exposed, so SequentialSDID.fit() warns when a cohort is donor-starved (see Limitations).
No interference across cohorts (SUTVA). Treating one cohort does not change the outcomes of the comparison cohorts that serve as its donors, so the donor aggregates carry only no-intervention outcomes.

Remark. Spillovers onto the comparison cohorts contaminate the counterfactual; when they are present use Spatial Synthetic Difference-in-Differences (SpSyDiD) or Spillover-Aware Synthetic Control (SPILLSYNTH) rather than Sequential SDiD.

Algorithm 1#

For each horizon \(k = 0, 1, \dots, K\) (outer loop) and each treated cohort \(a = a_{\min}, \dots, a_{\max}\) (inner loop), Sequential SDiD runs three steps.

Step 1: Solve two regularized QPs. Both QPs are equality-constrained convex quadratic programs (no non-negativity). The unit-weight QP is

\[\widehat{\mathbf{w}}^{(a, k)}, \widehat{w}_0 \;=\; \operatorname*{argmin}_{\sum_{j > a} w_j = 1} \quad \sum_{l < a + k} \left( w_0 + \sum_{j > a} w_j \, y_{j, l} - y_{a, l} \right)^{\!2} \;+\; \eta^2 \sum_{j > a} \frac{w_j^2}{\pi_j},\]

and the time-weight QP is

\[\widehat{\boldsymbol{\lambda}}^{(a, k)}, \widehat{\lambda}_0 \;=\; \operatorname*{argmin}_{\sum_{l < a + k} \lambda_l = 1} \quad \sum_{j > a} \left( \lambda_0 + \sum_{l < a + k} \lambda_l \, y_{j, l} - y_{j, a + k} \right)^{\!2} \;+\; \eta^2 \sum_{l < a + k} \lambda_l^{\,2}.\]

Both are solved in closed form via their KKT linear systems in mlsynth.utils.seq_sdid_helpers.weights.
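A minimal sketch of the KKT route for the unit-weight QP follows. It is not the code in mlsynth.utils.seq_sdid_helpers.weights; the function name `unit_weights_kkt` and the toy data are hypothetical, and the sketch assumes \(\eta > 0\) so that the KKT matrix is nonsingular. Writing \(\mathbf{Z} = [\mathbf{1}, \mathbf{Y}]\), \(\mathbf{z} = (w_0, \mathbf{w})\), \(\mathbf{D} = \operatorname{diag}(0, \eta^2/\pi_j)\) and \(\mathbf{c} = (0, \mathbf{1})\), stationarity plus the constraint give the linear system \((\mathbf{Z}^\top\mathbf{Z} + \mathbf{D})\mathbf{z} + \nu\mathbf{c} = \mathbf{Z}^\top\mathbf{y}\), \(\mathbf{c}^\top\mathbf{z} = 1\).

```python
import numpy as np

def unit_weights_kkt(Y_pre_donors, y_pre_treated, pi_donors, eta):
    """Solve the unit-weight QP through its KKT system (sketch, assumes eta > 0)."""
    T_pre, J = Y_pre_donors.shape
    Z = np.column_stack([np.ones(T_pre), Y_pre_donors])     # intercept column first
    H = Z.T @ Z + np.diag(np.r_[0.0, eta**2 / pi_donors])   # intercept is unpenalised
    c = np.r_[0.0, np.ones(J)]                              # sum-to-one acts on w only
    kkt = np.block([[H, c[:, None]], [c[None, :], np.zeros((1, 1))]])
    rhs = np.r_[Z.T @ y_pre_treated, 1.0]
    sol = np.linalg.solve(kkt, rhs)                         # (w_0, w, multiplier)
    return sol[1:J + 1], sol[0]

# Hypothetical pre-period data: T_pre = 4 periods, J = 3 donor cohorts.
Y_pre_donors = np.array([[1.0, 2.0, 0.5],
                         [1.5, 2.2, 0.9],
                         [1.2, 2.9, 1.1],
                         [2.0, 3.1, 1.4]])
y_pre_treated = np.array([1.1, 1.6, 1.8, 2.2])
pi_donors = np.array([0.2, 0.3, 0.4])

w, w0 = unit_weights_kkt(Y_pre_donors, y_pre_treated, pi_donors, eta=0.5)
w_large_eta, _ = unit_weights_kkt(Y_pre_donors, y_pre_treated, pi_donors, eta=1e3)
```

The weights sum to one but are free to be negative. With a very large `eta` the penalty dominates and the weights approach the share-proportional vector `pi_donors / pi_donors.sum()`.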

Step 2: Weighted double-difference.

\[\widehat{\tau}_{a, k}^{\,\mathrm{SSDiD}} \;=\; \left( y_{a, a + k} - \sum_{j > a} \widehat{w}_j^{(a, k)} \, y_{j, a + k} \right) \;-\; \sum_{l < a + k} \widehat{\lambda}_l^{(a, k)} \left( y_{a, l} - \sum_{j > a} \widehat{w}_j^{(a, k)} \, y_{j, l} \right).\]

This is the same SDID-style contrast as the canonical estimator, just evaluated on cohort-level aggregates.
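The contrast can be checked by hand on a tiny example. The sketch below assumes the weights are already given (here uniform, purely for illustration); `double_difference` and the numbers are hypothetical.

```python
import numpy as np

def double_difference(y_treated_pre, y_treated_post,
                      Y_donors_pre, y_donors_post, w, lam):
    """Post-period gap minus the time-weighted average of pre-period gaps."""
    gap_post = y_treated_post - y_donors_post @ w
    gap_pre = y_treated_pre - Y_donors_pre @ w      # one gap per pre-period
    return float(gap_post - lam @ gap_pre)

# Two pre-periods, two donor cohorts (hypothetical numbers).
Y_donors_pre = np.array([[1.0, 3.0],
                         [1.0, 3.0]])
y_donors_post = np.array([2.0, 4.0])
y_treated_pre = np.array([1.0, 2.0])
y_treated_post = 5.0
w = np.array([0.5, 0.5])
lam = np.array([0.5, 0.5])

tau = double_difference(y_treated_pre, y_treated_post,
                        Y_donors_pre, y_donors_post, w, lam)
```

With these inputs the synthetic donor series is 2 in both pre-periods and 3 in the post-period, so the post gap is 2, the pre gaps are \(-1\) and \(0\), and the estimate is \(2 - (-0.5) = 2.5\).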

Step 3: Sequential imputation.

\[y_{a, a + k} \;\coloneqq\; y_{a, a + k} \;-\; \widehat{\tau}_{a, k}^{\,\mathrm{SSDiD}}.\]

The treated cell is replaced with its estimated counterfactual in place in the panel matrix. When the outer loop advances to a longer horizon or the inner loop advances to a later cohort, subsequent QPs use this imputed panel — which is the mechanism that prevents bias from cascading through the estimator.
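The three steps can be put together in a compact sketch of the outer/inner loop. This is an illustration, not the mlsynth implementation: all names are hypothetical, the never-treated cohort is assumed to be encoded as the largest int64, and the KKT system is solved by least squares with an explicit cutoff (a substitution made here so that a singular system at \(\eta = 0\) is tolerated). The toy panel is a noiseless rank-one factor model with a constant true effect of 2.

```python
import numpy as np

NEVER = np.iinfo(np.int64).max  # encoding assumed for the never-treated cohort

def sum_to_one_ridge(X, y, penalty):
    """Minimise ||b0 + X b - y||^2 + sum(penalty * b**2) subject to sum(b) = 1."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])           # unpenalised intercept first
    H = Z.T @ Z + np.diag(np.r_[0.0, penalty])
    c = np.r_[0.0, np.ones(p)]                     # constraint acts on b only
    kkt = np.block([[H, c[:, None]], [c[None, :], np.zeros((1, 1))]])
    rhs = np.r_[Z.T @ y, 1.0]
    # Least squares with an explicit cutoff tolerates a singular H (e.g. eta = 0).
    sol = np.linalg.lstsq(kkt, rhs, rcond=1e-10)[0]
    return sol[1:p + 1]

def sequential_sdid_sketch(Y, pi, periods, a_min, a_max, K, eta=0.0):
    """Y has shape (T, A); periods holds the 1-based adoption period per column."""
    Y = np.array(Y, dtype=float)                   # work on a copy
    T = Y.shape[0]
    treated = sorted(int(a) for a in periods if a_min <= a <= a_max)
    effects = {}
    for k in range(K + 1):                         # outer loop: horizons
        for a in treated:                          # inner loop: cohorts
            t = a + k                              # period of the treated cell
            if t > T:
                continue
            col = int(np.flatnonzero(periods == a)[0])
            donors = np.flatnonzero(periods > a)   # later adopters and never-treated
            pre = np.arange(t - 1)                 # rows for periods l < a + k
            Yd_pre, ya_pre = Y[np.ix_(pre, donors)], Y[pre, col]
            w = sum_to_one_ridge(Yd_pre, ya_pre, eta**2 / pi[donors])      # Step 1
            lam = sum_to_one_ridge(Yd_pre.T, Y[t - 1, donors],
                                   np.full(pre.size, eta**2))
            gap_post = Y[t - 1, col] - Y[t - 1, donors] @ w
            gap_pre = ya_pre - Yd_pre @ w
            tau = float(gap_post - lam @ gap_pre)  # Step 2
            effects[(a, k)] = tau
            Y[t - 1, col] -= tau                   # Step 3: impute the cell
    return effects, Y

# Noiseless rank-one factor panel (hypothetical numbers), true effect 2.0.
periods = np.array([3, 4, 5, NEVER])
pi = np.array([0.2, 0.2, 0.2, 0.4])
alpha = np.array([0.5, -0.2, 0.3, 0.0])
theta = np.array([1.0, 0.4, -0.6, 0.2])
psi = np.array([0.3, 1.1, 0.7, 1.9, 1.2, 2.4])
beta = 0.1 * np.arange(1, 7)
treated_cell = np.arange(1, 7)[:, None] >= periods[None, :]
Y = alpha[None, :] + beta[:, None] + np.outer(psi, theta) + 2.0 * treated_cell
effects, Y_imputed = sequential_sdid_sketch(Y, pi, periods, a_min=3, a_max=5, K=1)
```

In this design the cohorts adopting in periods 3 and 4 each have at least two donors at horizon 0, so their effects are recovered up to numerical precision, and so is the horizon-1 effect of cohort 3, which relies on the imputed cell of cohort 4. The cohort adopting in period 5 has only the never-treated cohort as a donor, so nothing guarantees recovery for it or for the later steps that reuse its imputed cell.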

Pooled Event Study#

Cohort-specific effects are aggregated into a single event-study trajectory via Equation 2.5 of the paper:

\[\widehat{\tau}_k^{\,\mathrm{SSDiD}}(\mu) \;=\; \sum_{a \in [a_{\min}, a_{\max}]} \mu_a \, \widehat{\tau}_{a, k}^{\,\mathrm{SSDiD}}, \qquad \mu_a \coloneqq \frac{\pi_a}{\sum_{a' \in [a_{\min}, a_{\max}]} \pi_{a'}}.\]

The default mu is proportional to cohort shares (i.e. larger cohorts get more weight), recovering the unit-uniform interpretation common in the DiD literature. The result lives on SeqSDIDEventStudy.tau.
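As a worked instance of the pooling formula, the sketch below applies share-proportional weights to two treated cohorts. The dictionaries and the helper `pooled_effect` are hypothetical and only mirror the arithmetic of Equation 2.5; they are not the library's data structures.

```python
def pooled_effect(cohort_effects, shares, cohorts, k):
    """Share-weighted average of cohort effects at horizon k (mu_a = pi_a / sum pi)."""
    total = sum(shares[a] for a in cohorts)
    return sum(shares[a] / total * cohort_effects[(a, k)] for a in cohorts)

# Hypothetical shares of the two treated cohorts and their estimated effects.
shares = {3: 0.2, 4: 0.3}
cohort_effects = {(3, 0): 1.0, (4, 0): 2.0, (3, 1): 1.5, (4, 1): 2.5}

tau_0 = pooled_effect(cohort_effects, shares, cohorts=[3, 4], k=0)
tau_1 = pooled_effect(cohort_effects, shares, cohorts=[3, 4], k=1)
```

The normalised weights are \(\mu_3 = 0.4\) and \(\mu_4 = 0.6\), giving \(0.4 \cdot 1.0 + 0.6 \cdot 2.0 = 1.6\) at horizon 0 and \(0.4 \cdot 1.5 + 0.6 \cdot 2.5 = 2.1\) at horizon 1.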

The \(\eta \to \infty\) Limit#

Remark 2.2 of the paper notes that as \(\eta \to \infty\), the unit-weight QP’s penalty \(\sum w_j^2 / \pi_j\) forces \(w_j \propto \pi_j\) (each unit in the donor pool gets equal weight), and the time-weight QP’s penalty \(\sum \lambda_l^{\,2}\) forces \(\lambda_l = 1 / (a + k - 1)\) (uniform). The resulting estimator is a sequential DiD imputation estimator closely related to Borusyak, Jaravel, and Spiess (2024). The mlsynth implementation exposes this mode via SequentialSDIDConfig.mode:

mode = "ssdid": paper’s main estimator with finite eta (default).
mode = "sdid_imputation": forces the \(\eta \to \infty\) limit internally and returns the sequential-DiD-style result.
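The limiting weights have a closed form, which the following sketch writes out. It only restates the limit described in Remark 2.2 and makes no claim about how the "sdid_imputation" mode is coded internally; `limit_weights` and the inputs are hypothetical.

```python
def limit_weights(pi_donors, a, k):
    """Weights in the eta -> infinity limit for the (a, k) step."""
    total = sum(pi_donors)
    w = [p / total for p in pi_donors]      # unit weights proportional to shares
    t_pre = a + k - 1                       # number of periods l < a + k
    lam = [1.0 / t_pre] * t_pre             # uniform time weights
    return w, lam

# Cohort adopting in period 3, horizon k = 1, two donor cohorts (hypothetical shares).
w, lam = limit_weights([0.1, 0.3], a=3, k=1)
```

Here the donor weights are 0.25 and 0.75 and each of the three pre-event periods receives weight one third, so the double-difference reduces to a share-weighted DiD against the pre-period average.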

Inference (Section 2.3)#

Inference uses the Bayesian bootstrap of Rubin (1981) and Chamberlain & Imbens (2003). At each bootstrap iteration:

Draw independent weights \(\xi_i \sim \mathrm{Exp}(1)\) for every underlying unit (not cohort).
Reconstruct cohort-level outcomes as weighted means:

\[y_{a, t}(\xi) \;=\; \frac{\sum_{i:\, A_i = a} y_{i, t} \, \xi_i}{\sum_{i:\, A_i = a} \xi_i}.\]
Re-run Algorithm 1 on the perturbed panel.
Record the pooled event-study vector.

Standard errors are sample standard deviations of the bootstrap replicates, and confidence intervals are Wald-type at SequentialSDIDConfig.alpha. The full replicate matrix is retained on SeqSDIDEventStudy.bootstrap_draws in case quantile-based intervals are preferred downstream.
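One bootstrap reweighting step can be sketched as follows, assuming a unit-level outcome matrix with rows as periods and columns as units. The helper `reweighted_cohort_means` and the toy data are hypothetical; the seed simply reuses the default value shown for SequentialSDIDConfig.seed.

```python
import numpy as np

def reweighted_cohort_means(Y_units, adoption, xi):
    """Cohort-level outcomes as xi-weighted means over each cohort's units."""
    cohorts = np.unique(adoption)
    cols = []
    for a in cohorts:
        members = adoption == a
        cols.append(Y_units[:, members] @ xi[members] / xi[members].sum())
    return np.column_stack(cols)            # shape (T, number of cohorts)

# Toy panel: T = 2 periods, N = 5 units in two cohorts (hypothetical numbers).
Y_units = np.array([[1.0, 3.0, 5.0, 7.0, 9.0],
                    [2.0, 4.0, 6.0, 8.0, 10.0]])
adoption = np.array([2, 2, 3, 3, 3])

rng = np.random.default_rng(1400)
xi = rng.exponential(scale=1.0, size=Y_units.shape[1])   # one Exp(1) draw per unit
Y_boot = reweighted_cohort_means(Y_units, adoption, xi)

# Equal weights reproduce the plain cohort means.
Y_plain = reweighted_cohort_means(Y_units, adoption, np.ones(5))
```

Each perturbed cohort mean is a convex combination of that cohort's unit outcomes, so it stays between the smallest and largest unit value in the cohort. In a full bootstrap this step would be followed by re-running Algorithm 1 on `Y_boot`.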

Limitations#

The paper’s formal guarantees require large adoption cohorts — cohort sizes that grow with the sample so the aggregation kills the idiosyncratic noise. The algorithm still runs on single-treated-unit panels (e.g., the Proposition 99 dataset), but with only one treated cohort and one never-treated cohort the time-weight QP becomes effectively underdetermined; the practical recommendation is to use canonical SDID for those panels and reserve Sequential SDiD for genuine staggered designs with multiple sizable cohorts.

A second requirement is donor balance. Each treated cohort \(a\) is balanced only against the cohorts adopting after it (\(j > a\)) plus any never-treated cohort. It therefore needs at least two such donor cohorts to balance even a rank-one interactive fixed effect: matching a one-dimensional loading together with the sum-to-one (intercept) direction means spanning \((1, \theta_a)\), which is two linear conditions on the weights. With a single donor the sum-to-one constraint forces \(\mathbf{w} = [1]\) and the cohort effect collapses to an unbalanced DiD, biased whenever the loadings differ; that bias also cascades backward through the sequential imputation. On a noiseless rank-one factor the estimator recovers the effect exactly for every cohort that is balanced, and the entire residual bias is attributable to the donor-starved late cohorts. Because the latest cohorts are the most exposed (the default a_max is the last adopting cohort), SequentialSDID.fit() emits a UserWarning naming any donor-starved cohort and the largest a_max that keeps every estimated cohort balanced; the policy is to report rather than silently relax.
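The donor-count rule is easy to check before fitting. The sketch below is a hypothetical pre-flight helper, not the check inside SequentialSDID.fit(); it assumes the never-treated cohort is represented by `math.inf` and treats fewer than two donors as donor-starved.

```python
import math

def donor_report(cohort_periods, min_donors=2):
    """Count later-adopting donor cohorts for each finitely-adopting cohort."""
    treated = sorted(a for a in cohort_periods if math.isfinite(a))
    counts = {a: sum(1 for j in cohort_periods if j > a) for a in treated}
    starved = [a for a in treated if counts[a] < min_donors]
    balanced = [a for a in treated if counts[a] >= min_donors]
    # Donor counts fall as a rises, so the balanced cohorts form a prefix.
    largest_balanced_a_max = max(balanced) if balanced else None
    return counts, starved, largest_balanced_a_max

# Three treated cohorts plus a never-treated cohort.
with_never = donor_report([3, 4, 5, math.inf])
# The same treated cohorts without a never-treated cohort.
without_never = donor_report([3, 4, 5])
```

With a never-treated cohort, only the cohort adopting in period 5 is starved and an upper bound of 4 keeps every estimated cohort balanced. Without it, cohorts 4 and 5 are both starved and the bound drops to 3.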

Verification#

Path B: the paper’s Section 5.2.2 calibrated-panel Monte Carlo (Table 1) is reconstructed from its description, because the authors’ CPS log-wage panel is not public. Under an interactive-fixed-effects violation of parallel trends with adoption correlated to the leading loading, standard DiD’s 95% CI coverage collapses (\(\approx 0.45\); paper \(\approx 0.70\)) while Sequential SDiD stays near nominal (\(0.945\)) with roughly five-times-smaller bias and lower RMSE, matching the paper’s “DiD severely biased, Sequential SDiD reliable” result. A noiseless rank-one IFE corollary pins exact machine-precision recovery for every donor-balanced cohort. The durable check is benchmarks/cases/seq_sdid_mc.py; see the dedicated replication page, SEQ_SDID — Sequential Synthetic DiD (Arkhangelsky & Samkov 2025), for the design, code, and the full table.

Core API#

Sequential Synthetic Difference-in-Differences (Sequential SDiD) estimator.

Implements:

Arkhangelsky, D., & Samkov, A. (2025). “Sequential Synthetic Difference in Differences.” arXiv:2404.00164v2.

The estimator targets event-study designs with staggered treatment adoption and remains robust when the parallel-trends assumption is violated by interactive fixed effects. It operates on cohort-level aggregates rather than unit-level data, sequentially imputes treated outcomes with their estimated counterfactuals, and uses weights that sum to one but are not constrained to be non-negative, with a population-share-scaled L2 penalty.

Output is a typed mlsynth.utils.seq_sdid_helpers.structures.SeqSDIDResults container exposing:

cohort_effects – cohort-by-horizon point estimates tau_hat_{a, k}^SSDiD
event_study – the pooled horizon-k effects tau_hat_k^SSDiD(mu) with bootstrap CIs
inference_detail – bootstrap configuration summary (named inference before the contract migration)
raw_event_study – the non-bootstrap point-estimate vector

class mlsynth.estimators.seq_sdid.SequentialSDID(config: SequentialSDIDConfig | dict)#

Bases: object

Sequential Synthetic Difference-in-Differences estimator.

Parameters: config (SequentialSDIDConfig or dict) – Configuration object. See mlsynth.config_models.SequentialSDIDConfig.
Returns: SeqSDIDResults – Typed container with cohort-by-horizon effects, the pooled event-study trajectory, and Bayesian-bootstrap SE / CI.

Notes

The two-way fixed-effects representation underlying canonical SDiD requires parallel trends; Sequential SDiD relaxes this by modelling interactive fixed effects directly. The theoretical guarantees in the paper require that adoption cohorts be relatively large; on single-treated-unit panels the algorithm still runs but the formal efficiency results don’t apply.

Each treated cohort’s counterfactual is balanced against the cohorts that adopt after it (plus any never-treated cohort). A cohort therefore needs at least two such donor cohorts to balance even a rank-one interactive fixed effect – with a single donor its effect collapses to an unbalanced DiD and is biased under interactive fixed effects, a bias that also cascades backward through the sequential imputation. The latest cohorts are the most exposed; fit() emits a UserWarning naming any donor-starved cohort and the largest a_max that keeps every estimated cohort balanced.

References

Arkhangelsky, D., & Samkov, A. (2025). “Sequential Synthetic Difference in Differences.” arXiv:2404.00164v2.

Examples

>>> import pandas as pd
>>> from mlsynth import SequentialSDID
>>> df = pd.read_csv("...")
>>> res = SequentialSDID({
...     "df": df, "outcome": "y", "treat": "treated",
...     "unitid": "unit", "time": "year",
...     "n_bootstrap": 200, "eta": 0.0, "display_graphs": False,
... }).fit()
>>> res.event_study.tau

fit() → SeqSDIDResults#: Run Algorithm 1 + bootstrap inference and return the typed result.

Configuration#

class mlsynth.config_models.SequentialSDIDConfig(*, df: DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', plot: PlotConfig = <factory>, eta: Annotated[float, Ge(ge=0)] = 0.0, mode: Literal['ssdid', 'sdid_imputation'] = 'ssdid', K: Annotated[int | None, Ge(ge=0)] = None, a_min: int | None = None, a_max: int | None = None, n_bootstrap: Annotated[int, Ge(ge=0)] = 500, alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05, seed: int = 1400)#

Configuration for the Sequential Synthetic Difference-in-Differences estimator.

Implements Arkhangelsky & Samkov (2025, arXiv:2404.00164v2). Operates on cohort-level aggregates and is robust to violations of parallel trends induced by interactive fixed effects. Inherits the standard df / outcome / treat / unitid / time panel interface from BaseEstimatorConfig.

K: int | None#

a_max: int | None#

a_min: int | None#

alpha: float#

eta: float#

mode: Literal['ssdid', 'sdid_imputation']#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_bootstrap: int#

seed: int#

Helper Modules#

Cohort-level aggregation for Sequential SDiD.

Reuses mlsynth.utils.datautils.dataprep() to identify treated / never-treated units and their adoption periods, then aggregates the unit-level outcomes into the cohort-level Y_{a,t} panel that Algorithm 1 consumes.

mlsynth.utils.seq_sdid_helpers.setup.prepare_seq_sdid_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, a_min: int | None = None, a_max: int | None = None, K: int | None = None) → SeqSDIDInputs#

Aggregate the panel into cohort-level outcomes Y_{a,t} and shares.

Parameters:

df, outcome, treat, unitid, time – Standard mlsynth panel inputs.
a_min, a_max (int, optional) – Earliest / latest treated cohort (1-based time index) to estimate. Default: span all treated cohorts in the data.
K (int, optional) – Maximum event-time horizon. Default: T - a_max so every estimable effect fits inside the panel.

Closed-form QP solvers for the Sequential SDiD weights.

Both weight problems in Algorithm 1 are equality-constrained convex quadratic programs with the simplex sum constraint and no non-negativity constraint. We solve them via the KKT linear system rather than handing them to cvxpy — the bootstrap calls these tens of thousands of times, and the linear system is small (~ N_cohort + 1 unknowns).

mlsynth.utils.seq_sdid_helpers.weights.solve_time_qp(Y_pre_donors: ndarray, y_event_donors: ndarray, eta: float) → Tuple[ndarray, float]#

Solve the Sequential SDiD time-weight QP for one (a, k) step.

Optimization (paper Algorithm 1, line 5):

min over (lambda, lambda_0)
    sum_{j > a} (lambda_0 + sum_l lambda_l Y_{j, l} - Y_{j, a + k})^2
    + eta^2 * sum_l lambda_l^2
s.t.  sum_l lambda_l = 1.

Parameters:

Y_pre_donors (np.ndarray) – Donor outcomes in the pre-event window, shape (T_pre, J).
y_event_donors (np.ndarray) – Donor outcomes at event time a + k, shape (J,).
eta (float) – Non-negative regularization parameter.

Returns:

lambda_w (np.ndarray) – Optimal time weights, shape (T_pre,), summing to 1.
lambda_0 (float) – Optimal intercept.

mlsynth.utils.seq_sdid_helpers.weights.solve_unit_qp(Y_pre_donors: ndarray, y_pre_treated: ndarray, pi_donors: ndarray, eta: float) → Tuple[ndarray, float]#

Solve the Sequential SDiD unit-weight QP for one (a, k) step.

Optimization (paper Algorithm 1, line 5):

min over (omega, omega_0)
    sum_{l < a + k} (omega_0 + sum_j omega_j Y_{j, l} - Y_{a, l})^2
    + eta^2 * sum_j omega_j^2 / pi_j
s.t.  sum_j omega_j = 1.

Parameters:

Y_pre_donors (np.ndarray) – Pre-event outcomes of later-adopting cohorts, shape (T_pre, J) where T_pre = a + k - 1 and J is the number of donor cohorts (j > a).
y_pre_treated (np.ndarray) – Pre-event outcomes of the treated cohort, shape (T_pre,).
pi_donors (np.ndarray) – Cohort shares pi_j for the donor cohorts, shape (J,).
eta (float) – Non-negative regularization parameter.

Returns:

omega (np.ndarray) – Optimal unit weights, shape (J,), summing to 1.
omega_0 (float) – Optimal intercept.

Algorithm 1: Sequential SDiD outer / inner loop.

Iterates over k = 0, 1, ..., K (outer) and treated cohorts a = a_min, ..., a_max (inner). At each (k, a) step the routine solves the two regularized QPs, computes the weighted double-difference tau_{a,k}, and overwrites Y_{a, a+k} with its estimated counterfactual Y_{a, a+k} - tau_{a,k}. Later steps therefore see a panel where every previously-estimated treated cell has been replaced by its imputed counterfactual — this is the sequential cascade that gives the estimator its name.

mlsynth.utils.seq_sdid_helpers.algorithm.pooled_event_study(cohort_effects: Dict[Tuple[int, int], SeqSDIDCohortEffect], pi: ndarray, cohort_periods: ndarray, a_min: int, a_max: int, K: int) → ndarray#

Aggregate per-cohort effects into horizon-k pooled estimates.

Implements tau_hat_k^SSDiD(mu) = sum_a mu_a * tau_hat_{a, k} with mu_a = pi_a / sum_{a' in [a_min, a_max]} pi_a', the cohort-share weighting recommended in the paper (Eq. 2.5).

Parameters:

cohort_effects (Dict[(int, int), SeqSDIDCohortEffect]) – Output of run_sequential_sdid().
pi (np.ndarray) – Cohort shares.
cohort_periods (np.ndarray) – 1-based time indices.
a_min, a_max (int) – Range of treated cohorts that participated in estimation.
K (int) – Maximum horizon.

Returns:

np.ndarray – Length-(K + 1) array of pooled effects.

mlsynth.utils.seq_sdid_helpers.algorithm.run_sequential_sdid(Y_agg: ndarray, pi: ndarray, cohort_periods: ndarray, treated_cohort_indices: ndarray, a_min: int, a_max: int, K: int, eta: float, in_place_imputation: bool = True) → Tuple[ndarray, Dict[Tuple[int, int], SeqSDIDCohortEffect]]#

Run Algorithm 1 of Arkhangelsky & Samkov (2025).

Parameters:

Y_agg (np.ndarray) – Cohort-level outcome matrix, shape (T, A).
pi (np.ndarray) – Cohort shares, length A, summing to 1 across the whole sample.
cohort_periods (np.ndarray) – 1-based time index of each cohort’s adoption period, length A. Never-treated cohorts use np.iinfo(np.int64).max.
treated_cohort_indices (np.ndarray) – Column indices into Y_agg (and pi) identifying the finitely-adopting cohorts.
a_min, a_max (int) – Earliest / latest cohort adoption-period index to estimate.
K (int) – Maximum horizon k to estimate.
eta (float) – Regularization parameter (>= 0). At eta -> infinity the unit weights collapse to omega_j proportional to pi_j and the time weights to 1 / (a + k - 1) (Remark 2.2 of the paper).
in_place_imputation (bool) – Whether to overwrite each treated cell Y_{a, a+k} with its estimated counterfactual as the loops proceed. The paper’s Algorithm 1 does this; the flag is exposed for diagnostics. Default: True. Y_agg is copied before modification, so the caller’s input is never mutated.

Returns:

Y_imputed (np.ndarray) – The (possibly imputed) Y_agg matrix.
cohort_effects (Dict[Tuple[int, int], SeqSDIDCohortEffect]) – Per-(cohort_period, k) point estimates plus the fitted weights.

Bayesian-bootstrap inference for Sequential SDiD (paper Section 2.3).

For each bootstrap iteration we:

Draw xi_i ~ Exp(1) for every underlying unit (not cohort).
Re-weight the cohort-level outcomes: Y_{a, t}(xi) = sum_{i: A_i = a} Y_{i, t} xi_i / sum_{i: A_i = a} xi_i.
Re-run Algorithm 1 on the perturbed panel.
Record the pooled event-study vector.

Wald-type SE/CI come from the sample standard deviation of the bootstrap replicate matrix. Bootstrap draws are also retained on the result object in case the user wants quantile-based intervals later.

mlsynth.utils.seq_sdid_helpers.inference.bayesian_bootstrap_event_study(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, inputs, eta: float, n_bootstrap: int, seed: int) → ndarray#

Return (n_bootstrap, K + 1) matrix of bootstrap-replicate event-study vectors.

The bootstrap reweighting follows Section 2.3 of Arkhangelsky & Samkov (2025): independent xi_i ~ Exp(1) are drawn for every unit, and the cohort-level outcomes are reconstructed as weighted means with those weights.

mlsynth.utils.seq_sdid_helpers.inference.wald_intervals(tau_hat: ndarray, bootstrap_draws: ndarray, alpha: float = 0.05) → Tuple[ndarray, ndarray]#

Return (se, ci) Wald-type SE and confidence intervals.

se is the sample standard deviation of the bootstrap replicates; ci is the standard normal Wald interval centered at tau_hat.
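A minimal sketch of Wald intervals built from a replicate matrix, alongside the quantile-based alternative mentioned for the retained draws, is shown below. It is not the library's wald_intervals; the function names are hypothetical and the use of `ddof=1` for the sample standard deviation is an assumption.

```python
import numpy as np
from statistics import NormalDist

def wald_from_draws(tau_hat, draws, alpha=0.05):
    """Normal-approximation intervals centred at tau_hat; draws has shape (B, K + 1)."""
    se = draws.std(axis=0, ddof=1)                   # assumed: sample SD with ddof=1
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # about 1.96 for alpha = 0.05
    ci = np.column_stack([tau_hat - z * se, tau_hat + z * se])
    return se, ci

def percentile_from_draws(draws, alpha=0.05):
    """Quantile-based intervals taken directly from the replicates."""
    return np.quantile(draws, [alpha / 2.0, 1.0 - alpha / 2.0], axis=0).T

# Three hypothetical replicates for two horizons.
draws = np.array([[1.0, 10.0],
                  [2.0, 12.0],
                  [3.0, 14.0]])
tau_hat = np.array([2.0, 12.0])
se, ci = wald_from_draws(tau_hat, draws)
pct = percentile_from_draws(draws)
```

Both interval arrays have one row per horizon and two columns (lower, upper). The Wald version is symmetric around the point estimate, while the percentile version follows the shape of the bootstrap distribution.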

Event-study chart for Sequential SDiD.

Plots tau_hat_k^SSDiD against the event-time horizon k, with the bootstrap Wald band as a shaded region. Reuses matplotlib directly to stay independent of the existing SDID_plot helper, which is wired for the canonical-SDID dict shape.

mlsynth.utils.seq_sdid_helpers.plotter.plot_seq_sdid(results: SeqSDIDResults, title: str = 'Sequential SDiD event study', save: bool | str | dict = False) → None#: Render the event-study trajectory with the bootstrap CI band.

Note

SequentialSDID.fit() returns an EffectResult on the standardized two-family contract. Sequential SDiD is an event-study estimator, so its standardized time_series is laid out over event-time horizons (res.gap is the pooled horizon effect tau_hat_k) and res.att is the simple average of those pooled effects. The per-(cohort, horizon) effects, the pooled event study, and the bootstrap config stay on res.cohort_effects / res.event_study / res.inference_detail.

Typed result containers for Sequential SDiD.

All matrices follow mlsynth’s (T, N) orientation (rows = time, columns = cohort).

class mlsynth.utils.seq_sdid_helpers.structures.SeqSDIDCohortEffect(cohort_period: int, k: int, tau: float, omega: ndarray, lambda_w: ndarray)#

Single cohort-by-horizon estimate.

Parameters:

cohort_period (int) – 1-based time index of the cohort’s adoption period.
k (int) – Horizon (event-time offset), with k = 0 the first treated period.
tau (float) – Point estimate tau_{a,k}^SSDiD.
omega (np.ndarray) – Unit weights solving the (a, k) QP, aligned with the slice of Y_agg corresponding to later cohorts j > a.
lambda_w (np.ndarray) – Time weights solving the (a, k) QP, aligned with pre-event periods l < a + k.

cohort_period: int#

k: int#

lambda_w: ndarray#

omega: ndarray#

tau: float#

class mlsynth.utils.seq_sdid_helpers.structures.SeqSDIDEventStudy(horizons: ndarray, tau: ndarray, se: ndarray, ci: ndarray, bootstrap_draws: ndarray, alpha: float)#

Pooled horizon-k effects tau_hat_k^SSDiD(mu).

Parameters:

horizons (np.ndarray) – Length-(K + 1) array of event-time horizons k = 0, 1, ..., K.
tau (np.ndarray) – Pooled effects aligned with horizons.
se (np.ndarray) – Bootstrap standard errors aligned with tau.
ci (np.ndarray) – Array of shape (K + 1, 2) holding the Wald confidence intervals (lower and upper bound per horizon).
bootstrap_draws (np.ndarray) – Bootstrap replicate matrix of shape (B, K + 1), retained for downstream diagnostics or alternative quantile-based intervals.
alpha (float) – Significance level used for ci.

alpha: float#

bootstrap_draws: ndarray#

ci: ndarray#

horizons: ndarray#

se: ndarray#

tau: ndarray#

class mlsynth.utils.seq_sdid_helpers.structures.SeqSDIDInference(n_bootstrap: int, method: str, seed: int)#

Bayesian bootstrap inference summary.

method: str#

n_bootstrap: int#

seed: int#

class mlsynth.utils.seq_sdid_helpers.structures.SeqSDIDInputs(Y_agg: ndarray, pi: ndarray, cohort_periods: ndarray, cohort_labels: Sequence, treated_cohort_indices: ndarray, time_labels: ndarray, n_units: int, a_min: int, a_max: int, K: int)#

Aggregated cohort-level panel fed into Sequential SDiD.

Parameters:

Y_agg (np.ndarray) – Cohort-level outcome matrix of shape (T, A) where A is the number of distinct adoption cohorts (treated cohorts in ascending order followed by the never-treated cohort, if present).
pi (np.ndarray) – Cohort shares pi_a = n_a / n, length A, summing to 1 over the entire sample.
cohort_periods (np.ndarray) – Length-A array of cohort adoption periods (1-based time indices). The never-treated cohort is encoded as np.iinfo(np.int64).max and lives at index A - 1.
cohort_labels (Sequence) – Human-readable labels for each cohort (e.g. adoption year), aligned with cohort_periods.
treated_cohort_indices (np.ndarray) – Integer indices into Y_agg’s second axis identifying treated (i.e. finitely-adopting) cohorts.
time_labels (np.ndarray) – Original time labels, length T.
n_units (int) – Total number of underlying units.
a_min (int) – Earliest treated cohort to estimate (1-based time index).
a_max (int) – Latest treated cohort to estimate.
K (int) – Maximum event-time horizon to estimate.

K: int#

Y_agg: ndarray#

a_max: int#

a_min: int#

cohort_labels: Sequence#

cohort_periods: ndarray#

n_units: int#

pi: ndarray#

time_labels: ndarray#

treated_cohort_indices: ndarray#

class mlsynth.utils.seq_sdid_helpers.structures.SeqSDIDResults(*, effects: EffectsResults | None = None, fit_diagnostics: FitDiagnosticsResults | None = None, time_series: TimeSeriesResults | None = None, weights: WeightsResults | None = None, inference: InferenceResults | None = None, method_details: MethodDetailsResults | None = None, sub_method_results: Dict[str, Any] | None = None, additional_outputs: Dict[str, Any] | None = None, raw_results: Dict[str, Any] | None = None, execution_summary: Dict[str, Any] | None = None, plot_config: PlotConfig | None = None, inputs: SeqSDIDInputs, cohort_effects: Dict[Tuple[int, int], SeqSDIDCohortEffect], event_study: SeqSDIDEventStudy, inference_detail: SeqSDIDInference, eta: float, mode: str, raw_event_study: ndarray)#

Public SequentialSDID.fit() return container.

This container is an EffectResult (the observational report). Sequential SDiD is an event-study estimator, so its standardized time_series is laid out over event-time horizons rather than calendar time: time_periods holds the horizons k = 0, 1, ..., K, gap holds the pooled horizon effects tau_hat_k, and counterfactual is the no-effect baseline. The flat att is the simple average of the pooled horizon effects. The full SSDiD detail (per-(cohort, horizon) effects, the pooled event study, and the bootstrap configuration) stays in the typed fields below.

Parameters:

inputs (SeqSDIDInputs) – Aggregated panel + cohort metadata.
cohort_effects (Dict[Tuple[int, int], SeqSDIDCohortEffect]) – Per-(cohort_period, k) effects.
event_study (SeqSDIDEventStudy) – Pooled horizon-k effects with bootstrap inference.
inference_detail (SeqSDIDInference) – Bootstrap configuration summary. This field was named inference before the contract migration; the standardized inference slot now holds the InferenceResults.
eta (float) – Regularization parameter actually used.
mode (str) – "ssdid" (the paper’s main estimator) or "sdid_imputation" (the eta -> infinity Borusyak-style limit from Remark 2.2).
raw_event_study (np.ndarray) – Non-bootstrapped pooled effect vector of length K + 1 (the same numbers as event_study.tau; kept separately for clarity).

cohort_effects: Dict[Tuple[int, int], SeqSDIDCohortEffect]#

eta: float#

event_study: SeqSDIDEventStudy#

inference_detail: SeqSDIDInference#

inputs: SeqSDIDInputs#

mode: str#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function BaseEstimatorResults.Config.<lambda>>}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

raw_event_study: np.ndarray#
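The event-time layout described for this container can be illustrated with a few made-up numbers. The sketch below only mimics the arithmetic stated above (horizons as the time axis, pooled effects as the gap, and att as their simple average); the values are invented and the variables are plain NumPy arrays, not fields of a fitted result.

```python
import numpy as np

K = 3
time_periods = np.arange(K + 1)             # horizons k = 0, 1, ..., K
gap = np.array([0.10, 0.20, 0.30, 0.40])    # made-up pooled effects tau_hat_k
counterfactual = np.zeros(K + 1)            # no-effect baseline

# The flat ATT is the unweighted mean over horizons, so every horizon
# counts equally regardless of how many cohorts contribute to it.
att = gap.mean()
print(f"att = {att:.3f}")
```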

Example#

import pandas as pd
from mlsynth import SequentialSDID

df = pd.read_csv("staggered_panel.csv")  # state-level panel; the "treated" column takes values in {0, 1}

results = SequentialSDID({
    "df":           df,
    "outcome":      "log_wage",
    "treat":        "treated",
    "unitid":       "state",
    "time":         "year",
    "eta":          1.0,
    "K":            10,
    "n_bootstrap":  500,
    "alpha":        0.05,
    "display_graphs": True,
}).fit()

# Pooled event-study trajectory (Eq. 2.5 of the paper).
for k, tau, se in zip(results.event_study.horizons,
                      results.event_study.tau,
                      results.event_study.se):
    print(f"k = {k:>2}  tau = {tau:+.3f}  se = {se:.3f}")

# Cohort-by-horizon decomposition.
for (a, k), effect in results.cohort_effects.items():
    print(f"cohort a = {a}  horizon k = {k}  tau = {effect.tau:+.3f}")

# Bayesian-bootstrap replicate matrix of shape (B, K + 1), for quantile CIs
# or downstream diagnostics.
print(results.event_study.bootstrap_draws.shape)
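The cohort-by-horizon dictionary printed above can also be reshaped into a table for inspection. The following sketch assumes only what the example shows: keys are (cohort_period, k) tuples and each value exposes a tau attribute. The Effect namedtuple and the numbers are stand-ins so the snippet runs without a fitted model, and effects_to_table is a hypothetical helper.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for SeqSDIDCohortEffect; only .tau is used here.
Effect = namedtuple("Effect", ["tau"])
cohort_effects = {
    (3, 0): Effect(0.10),
    (3, 1): Effect(0.20),
    (4, 0): Effect(0.15),
}

def effects_to_table(cohort_effects) -> pd.DataFrame:
    """Pivot {(cohort, horizon): effect} into a cohort x horizon table of tau."""
    records = [
        {"cohort": a, "horizon": k, "tau": effect.tau}
        for (a, k), effect in cohort_effects.items()
    ]
    long = pd.DataFrame.from_records(records)
    # Cells that were never estimated show up as NaN.
    return long.pivot(index="cohort", columns="horizon", values="tau")

table = effects_to_table(cohort_effects)
print(table)
```

Later cohorts typically have fewer estimable horizons, so the lower-right corner of such a table is often empty.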

References#

Arkhangelsky, D., & Samkov, A. (2025). “Sequential Synthetic Difference in Differences.” arXiv:2404.00164v2.

Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). “Synthetic Difference-in-Differences.” American Economic Review 111(12): 4088-4118.

Borusyak, K., Jaravel, X., & Spiess, J. (2024). “Revisiting Event-Study Designs: Robust and Efficient Estimation.” Review of Economic Studies.
