Synthetic Difference-in-Differences (SDID)#

When to Use This Estimator#

Difference-in-differences (DiD) and synthetic control (SC) are usually pitched as tools for different problems. DiD is used when many units are treated and you are willing to assume parallel trends – that treated and control outcomes would have moved in lockstep absent treatment, after removing additive unit and time fixed effects. SC is used when one (or a few) units are treated and parallel trends plainly fails, so you instead re-weight the donors to match the treated unit’s pre-treatment path.

Synthetic Difference-in-Differences (SDID), due to Arkhangelsky, Athey, Hirshberg, Imbens and Wager (2021, AER), argues these two strategies rest on closely related assumptions and combines the best of both. It fits a two-way fixed-effects regression that is doubly weighted – by SC-style unit weights \(w_i\) and DiD-style time weights \(\lambda_t\):

\[(\widehat{\tau}, \widehat{\mu}, \widehat{\alpha}, \widehat{\beta}) = \operatorname*{arg\,min}_{\tau, \mu, \alpha, \beta} \sum_{i \in \mathcal{N}}\sum_{t \in \mathcal{T}} \bigl(y_{it} - \mu - \alpha_i - \beta_t - d_{it}\tau\bigr)^2\, \widehat{w}_i\, \widehat{\lambda}_t .\]

The weights make the regression local: it leans on control units whose past resembles the treated unit’s, and on pre-periods that resemble the post-period. Reach for SDID when:

DiD is tempting but pre-trends are not parallel. SDID re-weights controls so their trend becomes parallel (not identical – the unit fixed effects \(\alpha_i\) absorb level gaps) to the treated unit, then runs DiD on the re-weighted panel. It “automates” the usual practice of hunting for comparable units/periods to make parallel trends plausible, with statistical guarantees – addressing the pre-testing concerns of Roth.
SC is tempting but the pre-fit is imperfect or you want valid inference. Adding unit fixed effects (and an intercept in the weight problem) means the donors only need to be parallel to the treated unit, not match it exactly, and the design admits large-panel inference.
You want robustness without choosing. Where DiD has been used, SDID is competitive with or better than DiD; where SC has been used, it is competitive with or better than SC. The weighting also often improves precision by removing predictable structure – in the Prop 99 study, SDID’s standard error (8.4) is smaller than DiD’s (17.7) despite being the more flexible estimator.

Note

The localization is not a free lunch: if outcomes have little systematic heterogeneity across units or periods, unequal weighting can worsen precision relative to plain DiD. SDID helps most when there is real structure (trends, levels) for the weights to exploit.

Do not use SDID when#

Spillovers / interference contaminate the donor pool. SDID assumes the controls are untreated and unaffected by the treatment (SUTVA). If treatment leaks to neighbours – cross-border shopping, migration, geographic advertising – the weighted controls are biased. Use Spatial Synthetic Difference-in-Differences (SpSyDiD), which separates the direct ATT from the spillover term.
Staggered adoption where you want partial pooling or an interactive fixed-effects guarantee. SDID runs per cohort and averages, which is fine for an overall ATT, but it does not pool information across cohorts the way Partially Pooled SCM (PPSCM) does, nor does it give the oracle-OLS efficiency of Sequential Synthetic Difference-in-Differences (Sequential SDiD). Prefer those when cohorts are many and individual cohort fits are noisy.
The treated unit sits far outside the donor convex hull / the donor pool is huge and noisy. SDID’s unit weights are non-negative and constrained to sum to one; a treated path that no convex combination of donors can parallel will fit poorly. A factor-model estimator (Factor Model Approach (FMA)) or a low-rank/denoising approach (Cluster Synthetic Controls (CLUSTERSC), Matrix Completion with Nuclear Norm Minimization (MCNNM)) is better suited there.
A single treated unit, short panel, and you want the interpretable sparse convex-weight story as the deliverable. Classic SC and its refinements (Two-Step Synthetic Control, Forward Difference-in-Differences (FDID), Synthetic Control with Multiple Outcomes (SCMO)) are more transparent; SDID’s double weighting buys little when there is only one treated unit and no time structure for the time weights to exploit.
Distributional questions (quantile effects, Lorenz, tails). SDID targets the mean ATT; use Distributional Synthetic Control (DSC).

What SDID Does in Practice#

Beyond the econometrics: SDID answers “what would the treated unit have done?” by building a synthetic comparison that is parallel to it, not a clone, and by trusting the recent, relevant past more than the distant past.

Policy / geo evaluation. A state raises cigarette taxes (Prop 99); a city introduces congestion pricing; a country reunifies. You have a long panel of comparison regions whose levels differ wildly and whose pre-trends are not parallel. SDID re-weights the comparison regions to parallel the treated one and downweights ancient history that no longer looks like the policy window.
Marketing / pricing roll-outs. A pricing change launches in some markets. Plain DiD over all markets is biased if the treated markets were on a different trajectory; pure SC ignores that fixed level differences are harmless. SDID handles both, and – via time weights – discounts pre-launch months that don’t resemble the post-launch regime (seasonal shifts, a pre-launch promo).
Staggered roll-outs. When units adopt at different dates, SDID runs per cohort and aggregates (Clarke et al., 2023), yielding both an overall ATT and a dynamic event-study path (Ciccia, 2024).

Notation#

Let \(y_{it}\) be the outcome of unit \(i\) in period \(t\), with units \(i \in \mathcal{N} \coloneqq \{1, \dots, N\}\) and periods \(t \in \mathcal{T} \coloneqq \{1, \dots, T\}\), 1-indexed, and let \(d_{it} \in \{0, 1\}\) be the treatment indicator. Unlike the single-treated SC family, SDID admits several treated units, so there is no distinguished \(i = 1\). The first \(N_{co}\) units are never-treated controls (donors); the remaining \(N_{tr} = N - N_{co}\) are treated, exposed after their adoption period. \(T_{pre}\) and \(T_{post}\) count pre- and post-treatment periods.

The unit weights \(\mathbf{w} = (w_1, \dots, w_{N_{co}})^\top\) are supported on the controls and lie on the simplex \(\Delta^{N_{co}} \coloneqq \{\mathbf{w} \in \mathbb{R}_{\ge 0}^{N_{co}} : \|\mathbf{w}\|_1 = 1\}\); the time weights \(\boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_{T_{pre}})^\top\) are supported on the pre-period (Arkhangelsky et al.’s \(\lambda\), kept distinct from the regularization symbols below).

\(\zeta\) is the unit-weight regularization parameter, \(\zeta = (N_{tr} T_{post})^{1/4}\,\widehat\sigma\) with \(\widehat\sigma\) the standard deviation of the first-differenced control outcomes (Arkhangelsky et al. 2021; the synthdid zeta.omega). The treated count \(N_{tr}\) enters per cohort, so a block with several treated units is regularized more strongly than a single-treated design on the same panel; for one treated unit it reduces to \((T_{post})^{1/4}\widehat\sigma\).

The optimisers are written \(\mathbf{w}^\ast\) and \(\boldsymbol{\lambda}^\ast\). The estimand is the average treatment effect on the treated, \(\tau\) (denoted \(\widehat{ATT}\) in aggregate).

Notation bridge

The mlsynth implementation generalizes the single-treated block design to cohorts: cohort \(a\) is the set \(I^a \subseteq \{N_{co} + 1, \dots, N\}\) of units first treated in period \(a\), with size \(N_{tr}^a = |I^a|\) and \(T_{tr}^a = T - a + 1\) post-periods; \(A = \{a_1, \dots, a_K\}\) collects the distinct adoption periods, and \(T_{post} = \sum_{a \in A} N_{tr}^a T_{tr}^a\) is the aggregate post-treatment exposure (Clarke et al., 2023). The classical single-treated case (California) is the one-cohort special case, where the cohort ATT and the overall ATT coincide (and this aggregate exposure reduces to the post-period count \(T_{post}\) above).
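The cohort bookkeeping above can be sketched in a few lines of plain Python. The unit labels, adoption periods, and panel length are made up for illustration, and the variable names are hypothetical rather than mlsynth's internals.

```python
from collections import defaultdict

# Hypothetical toy panel: T = 10 periods (1-indexed) and three treated units,
# each mapped to the period a in which it is first treated.
T = 10
adoption = {"u7": 6, "u8": 6, "u9": 8}

# Group treated units into cohorts I^a keyed by adoption period a.
cohorts = defaultdict(list)
for unit, a in adoption.items():
    cohorts[a].append(unit)

A = sorted(cohorts)                         # distinct adoption periods
N_tr = {a: len(cohorts[a]) for a in A}      # N_tr^a = |I^a|
T_tr = {a: T - a + 1 for a in A}            # post-periods per cohort
T_post = sum(N_tr[a] * T_tr[a] for a in A)  # aggregate post-treatment exposure

print(A, N_tr, T_tr, T_post)
```

With a single cohort containing one treated unit, `T_post` collapses to that unit's post-period count, which is the single-treated special case described above.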

Assumptions#

SDID’s formal guarantees are developed under an interactive fixed-effects (latent factor) model for the outcome,

\[y_{it} = \boldsymbol{\gamma}_i^\top \mathbf{v}_t + \tau d_{it} + \varepsilon_{it},\]

where \(\boldsymbol{\gamma}_i\) are latent unit factors and \(\mathbf{v}_t\) latent time factors (a generalization of additive \(\alpha_i + \beta_t\) two-way fixed effects).

Assumption 1 (latent factor outcome model). The systematic part of the outcome is \(\boldsymbol{\gamma}_i^\top \mathbf{v}_t\); deviations \(\varepsilon_{it}\) are mean-zero given the systematic component and the treatment assignment.

Remark. This is strictly more general than DiD’s additive \(\alpha_i + \beta_t\). When the factor structure is additive, plain DiD is already consistent; SDID is designed to also handle the interactive case, where DiD is biased.
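A small noise-free example may make the remark concrete. It is only a sketch under stated assumptions: one latent factor, a single treated unit that loads more heavily on it than the controls do, and invented numbers throughout. In that setting the plain DiD estimate misses \(\tau\) by exactly (treated loading minus mean control loading) times (post-period factor mean minus pre-period factor mean).

```python
from statistics import mean

# Hypothetical noise-free panel from y_it = gamma_i * v_t + tau * d_it.
tau = 2.0
gamma = [1.0, 1.2, 0.8, 2.0]        # unit loadings; the last unit is treated
v = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # one common time factor
treated, T_pre = 3, 4               # 0-based: unit 3 is treated from period index 4

y = [
    [g * vt + (tau if (i == treated and t >= T_pre) else 0.0)
     for t, vt in enumerate(v)]
    for i, g in enumerate(gamma)
]

def did(y, treated, T_pre):
    """Plain DiD: treated post-minus-pre change minus the mean control change."""
    change = [mean(row[T_pre:]) - mean(row[:T_pre]) for row in y]
    controls = [c for i, c in enumerate(change) if i != treated]
    return change[treated] - mean(controls)

# Bias implied by the interactive structure.
bias = (gamma[treated] - mean(gamma[:treated])) * (mean(v[T_pre:]) - mean(v[:T_pre]))
print(did(y, treated, T_pre), tau + bias)
```

Setting every loading in `gamma` to the same value makes the bias term vanish, which is the additive case in which plain DiD is already consistent.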

Assumption 2 (selection on the systematic part only). Treatment assignment \(d_{it}\) may depend on the latent factors \(\boldsymbol{\gamma}_i, \mathbf{v}_t\) (units are not randomized) but not on the idiosyncratic error \(\varepsilon\).

Remark. This is what lets policies be adopted non-randomly – California was not a coin flip – yet still be identified: the confounding must run through the persistent latent structure that the weights and fixed effects soak up, not through transitory shocks.

Assumption 3 (weak cross-unit dependence). The error vectors \(\varepsilon_i\) are independent across units, though correlation within a unit over time is allowed.

Remark. Serial correlation within a unit is the norm in panel data and is permitted; this is why the time-weight problem is left unregularized (it must accommodate within-unit temporal correlation) while the unit-weight problem is regularized. Cross-unit independence is what powers the placebo variance estimator.

Assumption 4 (weighted parallel trends, achieved by construction). There exist unit weights making the treated trajectory parallel to the weighted control trajectory over the pre-period, and time weights making each control’s post-period mean a constant offset from its weighted pre-period mean.

Remark. Unlike DiD – which assumes parallel trends on the raw data – SDID constructs weights to make parallel trends hold on the re-weighted panel, then proceeds. The graphical “parallel trends” check is thus performed on adjusted data, automatically and with guarantees.

Why Unit Weights and Why Time Weights#

Unit weights are chosen so the treated unit’s pre-treatment path is parallel to the weighted-control path. Two differences from classical SC (Abadie et al., 2010) make this work inside a fixed-effects regression:

an intercept \(w_0\) is allowed, so the weights need only make trends parallel rather than coincident – the unit fixed effects \(\alpha_i\) absorb any constant level gap; and
a ridge penalty \(T_{pre}\,\zeta^2 \|\mathbf{w}\|_2^2\) is added (with \(\zeta = (N_{tr} T_{post})^{1/4}\widehat{\sigma}\), \(\widehat{\sigma}\) the SD of first-differenced control outcomes) to disperse and uniquely pin down the weights.

Time weights are chosen so that, for the control units, the weighted average of pre-treatment outcomes predicts the post-treatment average up to a constant. The argument for them mirrors the argument for unit weights: down-weighting pre-periods that look nothing like the post-period removes bias and improves precision. This is the data-driven counterpart to event-study practice, which implicitly puts all comparison weight on the last pre-period – SDID instead lets the data choose which pre-periods are informative. The time-weight problem is left unregularized (Assumption 3).

Together, unit and time weights plus unit fixed effects make the DiD contrast both more robust (it leans on comparable units and periods) and, typically, more precise (predictable structure is removed), which is why SDID’s standard errors can be smaller than DiD’s despite its added flexibility.
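To make the unit-weight problem tangible, here is a minimal sketch that solves it by projected gradient descent on the simplex. This is a swapped-in technique chosen for brevity, not the solver mlsynth uses, and `project_simplex`, `sketch_unit_weights`, and `objective` are hypothetical names. The ridge is scaled by the number of pre-periods, and the data are random placeholders in which the treated path is a level-shifted blend of two donors.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def objective(w, Y0_pre, y_tr_pre, zeta):
    """Centered squared error plus ridge; centering profiles out the intercept."""
    Xc = Y0_pre - Y0_pre.mean(axis=0)
    bc = y_tr_pre - y_tr_pre.mean()
    return np.sum((Xc @ w - bc) ** 2) + Y0_pre.shape[0] * zeta**2 * np.sum(w**2)

def sketch_unit_weights(Y0_pre, y_tr_pre, zeta, n_iter=5000):
    """Projected gradient descent for the ridge-penalised simplex problem."""
    T0, n_donors = Y0_pre.shape
    Xc = Y0_pre - Y0_pre.mean(axis=0)
    bc = y_tr_pre - y_tr_pre.mean()
    ridge = T0 * zeta**2
    # Step size 1/L, with L the Lipschitz constant of the gradient.
    L = 2.0 * (np.linalg.norm(Xc, 2) ** 2 + ridge)
    w = np.full(n_donors, 1.0 / n_donors)
    for _ in range(n_iter):
        grad = 2.0 * (Xc.T @ (Xc @ w - bc) + ridge * w)
        w = project_simplex(w - grad / L)
    # Recover the intercept that absorbs the constant level gap.
    w0 = y_tr_pre.mean() - Y0_pre.mean(axis=0) @ w
    return w0, w

rng = np.random.default_rng(0)
Y0_pre = rng.random((12, 6))                   # (T0, N_donors) placeholder data
y_tr_pre = Y0_pre[:, :2].mean(axis=1) + 5.0    # parallel to two donors, shifted up
w0_hat, w_hat = sketch_unit_weights(Y0_pre, y_tr_pre, zeta=0.1)
uniform = np.full(6, 1 / 6)
print(w0_hat, w_hat.round(3))
```

The intercept soaks up the level shift of 5, so the weights only have to reproduce the shape of the treated path. This is the "parallel, not coincident" point made above.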

Mathematical Formulation#

Setup#

Using the cohort notation introduced above (\(I^a\), \(N_{tr}^a\), \(T_{tr}^a\), the adoption-period set \(A\), and the aggregate exposure \(T_{post}\)), recall that the classical Arkhangelsky et al. (2021) SDID estimator targets a single cohort. The mlsynth implementation runs that estimator per cohort, accumulates the cohort-specific effects, and then aggregates them in two complementary ways (Ciccia, 2024).

Cohort-Specific SDID (Equation 2)#

For a single cohort \(a\), SDID fits unit weights \(\mathbf{w}\) over \(N_{co}\) donor units and time weights \(\boldsymbol{\lambda}\) over the cohort’s pre-treatment window \(t < a\) by solving two convex programs:

\[\mathbf{w}^\ast \;=\; \operatorname*{arg\,min}_{\sum w_i = 1,\ w_i \geq 0} \sum_{t = 1}^{a - 1} \left( \bar y_{I^a, t} - w_0 - \sum_{i = 1}^{N_{co}} w_i\, y_{it} \right)^{\!2} + T_0\, \zeta^2 \|\mathbf{w}\|_2^2,\]

\[\boldsymbol{\lambda}^\ast \;=\; \operatorname*{arg\,min}_{\sum \lambda_t = 1,\ \lambda_t \geq 0} \sum_{i = 1}^{N_{co}} \left( \bar y_{i, [a, T]} - \lambda_0 - \sum_{t = 1}^{a - 1} \lambda_t\, y_{it} \right)^{\!2},\]

where \(T_0 = a - 1\) is the number of pre-treatment periods of cohort \(a\), \(w_0\) and \(\lambda_0\) are free intercepts, \(\bar y_{I^a, t}\) is the treated-unit mean at time \(t\), \(\bar y_{i, [a, T]}\) is donor \(i\)’s mean over the post-treatment window, and \(\zeta\) is a regularization parameter scaled by the standard deviation of first-differenced donor outcomes. The cohort-specific SDID estimator is then

\[\widehat{\tau}_a^{\,sdid} \;=\; \frac{1}{T_{tr}^a} \sum_{t = a}^{T} \left( \frac{1}{N_{tr}^a} \sum_{i \in I^a} y_{it} - \sum_{i = 1}^{N_{co}} w_i\, y_{it} \right) - \sum_{t = 1}^{a - 1} \lambda_t \left( \frac{1}{N_{tr}^a} \sum_{i \in I^a} y_{it} - \sum_{i = 1}^{N_{co}} w_i\, y_{it} \right).\]

This is Equation 2 of Ciccia (2024). Each cohort is fit independently inside mlsynth.utils.sdid_helpers.cohort.estimate_cohort_sdid_effects().
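The double-difference form above and the weighted two-way fixed-effects regression from the introduction give the same number for a block design. The sketch below checks this on random placeholder data. It assumes, following the convention in Arkhangelsky et al. (2021), that treated units carry weight \(1/N_{tr}\) and post-periods weight \(1/T_{post}\) in the regression. The unit and time weights are arbitrary positive simplex draws rather than fitted SDID weights, because the identity holds for any such weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N_co, N_tr, T_pre, T_post = 5, 2, 6, 3
N, T = N_co + N_tr, T_pre + T_post
Y = rng.normal(size=(N, T))          # rows: controls first, then treated units
Y[N_co:, T_pre:] += 1.5              # placeholder effect on the treated post block

# Placeholder weights (not fitted): any positive simplex weights will do here.
w = rng.dirichlet(np.ones(N_co))
lam = rng.dirichlet(np.ones(T_pre))

# Double-difference form: post-period gap minus time-weighted pre-period gap.
gap = Y[N_co:].mean(axis=0) - w @ Y[:N_co]
tau_dd = gap[T_pre:].mean() - lam @ gap[:T_pre]

# Weighted TWFE form: regress y on unit dummies, time dummies, and d_it.
unit_w = np.concatenate([w, np.full(N_tr, 1 / N_tr)])
time_w = np.concatenate([lam, np.full(T_post, 1 / T_post)])
i_idx, t_idx = np.divmod(np.arange(N * T), T)    # row-major (unit, period) index
d = ((i_idx >= N_co) & (t_idx >= T_pre)).astype(float)
X = np.column_stack([np.eye(N)[i_idx], np.eye(T)[t_idx], d])
sw = np.sqrt(unit_w[i_idx] * time_w[t_idx])      # sqrt of the product weights
coef, *_ = np.linalg.lstsq(X * sw[:, None], Y.ravel() * sw, rcond=None)
tau_twfe = coef[-1]

print(tau_dd, tau_twfe)
```

The dummy columns are collinear, so `lstsq` returns a minimum-norm solution for the fixed effects. The coefficient on `d` is still uniquely determined.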

Cohort-Specific Event Study (Equation 3)#

The cohort ATT is the average of a sequence of dynamic effects, one per post-treatment offset \(\ell \in \{1, \dots, T_{tr}^a\}\):

\[\widehat\tau_{a, \ell}^{\,sdid} \;=\; \frac{1}{N_{tr}^a} \sum_{i \in I^a} y_{i, a - 1 + \ell} \;-\; \sum_{i = 1}^{N_{co}} w_i\, y_{i, a - 1 + \ell} \;-\; \sum_{t = 1}^{a - 1} \lambda_t \left( \frac{1}{N_{tr}^a} \sum_{i \in I^a} y_{it} - \sum_{i = 1}^{N_{co}} w_i\, y_{it} \right).\]

The first two terms are the post-treatment gap between the treated cohort and its synthetic control at offset \(\ell\); the third term is the time-weighted pre-treatment baseline. By construction,

\[\widehat\tau_a^{\,sdid} \;=\; \frac{1}{T_{tr}^a} \sum_{\ell = 1}^{T_{tr}^a} \widehat\tau_{a, \ell}^{\,sdid},\]

i.e. the cohort ATT is the sample mean of its dynamic effects (Equation 4 of Ciccia 2024). These effects are exposed on the result object as SDIDCohort.event_effects.
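The decomposition can be verified numerically. The sketch below uses random placeholder outcomes and uniform placeholder weights (not fitted SDID weights), with hypothetical variable names. It computes the dynamic effects for one cohort and confirms that their mean equals the cohort ATT.

```python
import numpy as np

rng = np.random.default_rng(1)
T, a = 10, 7                         # adoption period a is 1-indexed
N_co, N_tr = 4, 2
Y_co = rng.normal(size=(N_co, T))    # donor outcomes, one row per donor
Y_tr = rng.normal(size=(N_tr, T)) + 1.0

# Placeholder weights; in practice these come from the two convex programs.
w = np.full(N_co, 1 / N_co)
lam = np.full(a - 1, 1 / (a - 1))

# Per-period gap between the treated mean and the weighted controls.
gap = Y_tr.mean(axis=0) - w @ Y_co
baseline = lam @ gap[: a - 1]        # time-weighted pre-treatment baseline

# Offset ell refers to period a - 1 + ell (1-indexed), i.e. index a - 2 + ell.
T_tr = T - a + 1
event_effects = {ell: gap[a - 2 + ell] - baseline for ell in range(1, T_tr + 1)}
tau_a = gap[a - 1:].mean() - baseline

print(event_effects, tau_a)
```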

Pooled Event Study (Equation 6)#

Let \(A_\ell = \{a \in A : a - 1 + \ell \le T\}\) be the set of cohorts for which the \(\ell\)-th dynamic effect is computable, and \(N_{tr}^\ell = \sum_{a \in A_\ell} N_{tr}^a\) the corresponding treated-unit count. The pooled event-study estimator is

\[\widehat\tau_\ell^{\,sdid} \;=\; \sum_{a \in A_\ell} \frac{N_{tr}^a}{N_{tr}^\ell} \widehat\tau_{a, \ell}^{\,sdid},\]

a treated-unit-weighted average of the cohort-specific dynamic effects. This is the central quantity Ciccia (2024) recommends researchers report. In the mlsynth API it is SDIDEventStudy.tau, indexed by the corresponding event time on SDIDEventStudy.event_times.
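As a worked illustration of the pooling rule, the sketch below aggregates two cohorts' dynamic effects. The numbers and the function name `pooled_event_study` are invented, and the list length of each cohort's effects stands in for \(T_{tr}^a\).

```python
# Hypothetical cohort-level dynamic effects, keyed by adoption period a;
# position ell - 1 in each list holds the effect at offset ell.
event_effects = {6: [1.0, 1.5, 2.0, 2.5, 3.0], 8: [0.5, 1.0, 1.5]}
N_tr = {6: 2, 8: 1}                  # treated units per cohort

def pooled_event_study(event_effects, N_tr):
    """Treated-unit-weighted average of cohort effects at each offset ell."""
    max_ell = max(len(eff) for eff in event_effects.values())
    pooled = {}
    for ell in range(1, max_ell + 1):
        # A_ell: cohorts observed at least ell periods after adoption.
        A_ell = [a for a, eff in event_effects.items() if len(eff) >= ell]
        n_ell = sum(N_tr[a] for a in A_ell)
        pooled[ell] = sum(N_tr[a] * event_effects[a][ell - 1] for a in A_ell) / n_ell
    return pooled

pooled = pooled_event_study(event_effects, N_tr)
print(pooled)
```

Offsets 4 and 5 are informed by the early cohort alone, so the pooled path at long horizons reflects a changing composition of cohorts.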

Overall ATT (Equation 7)#

Define \(T_{tr} = \max_{a \in A} T_{tr}^a\), the post-treatment length of the earliest cohort. The overall ATT of Clarke et al. (2023) admits the equivalent disaggregated form

\[\widehat{ATT} \;=\; \frac{1}{T_{post}} \sum_{\ell = 1}^{T_{tr}} N_{tr}^\ell \, \widehat\tau_\ell^{\,sdid},\]

i.e. the average of the pooled event-study effects weighted by the number of treated units contributing to each offset. This is SDIDInference.att, with a placebo-based standard error and confidence interval at SDIDInference.se / SDIDInference.ci.
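The equivalence can be checked by hand on a toy example. The sketch below assumes the cohort-level form of the overall ATT is the exposure-weighted average \(\sum_a N_{tr}^a T_{tr}^a \widehat\tau_a^{\,sdid} / T_{post}\), and compares it with the disaggregated form above. All numbers are invented.

```python
# Hypothetical cohort-level dynamic effects, keyed by adoption period a.
event_effects = {6: [1.0, 1.5, 2.0, 2.5, 3.0], 8: [0.5, 1.0, 1.5]}
N_tr = {6: 2, 8: 1}
T_tr = {a: len(eff) for a, eff in event_effects.items()}
T_post = sum(N_tr[a] * T_tr[a] for a in event_effects)   # aggregate exposure

# Route 1: exposure-weighted average of the cohort ATTs.
cohort_att = {a: sum(eff) / len(eff) for a, eff in event_effects.items()}
att_cohort = sum(N_tr[a] * T_tr[a] * cohort_att[a] for a in cohort_att) / T_post

# Route 2: pooled event-study effects weighted by N_tr^ell.
att_event = 0.0
for ell in range(1, max(T_tr.values()) + 1):
    A_ell = [a for a in event_effects if T_tr[a] >= ell]
    n_ell = sum(N_tr[a] for a in A_ell)
    tau_ell = sum(N_tr[a] * event_effects[a][ell - 1] for a in A_ell) / n_ell
    att_event += n_ell * tau_ell
att_event /= T_post

print(att_cohort, att_event)
```

Both routes give 23/13 here: the cohort ATTs are 2.0 and 1.0, with exposures 10 and 3.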

Inference#

Arkhangelsky et al. (2021) give three procedures for the variance of the ATT, generalized to cohort and event-time effects by Clarke et al. (2023). The SDIDConfig.vce option selects among them; the label used is recorded on SDIDInference.method.

placebo (the default, Algorithm 4): For each of \(B\) iterations (SDIDConfig.B), a control unit is reassigned as a pseudo-treated unit and removed from the donor pool, the full SDID pipeline is rerun on the remaining controls, and the variance of the resulting placebo effects estimates the variance of the actual estimator. This is the only procedure defined for a single treated unit, and it is what the canonical Proposition 99 example uses. The implementation lives in mlsynth.utils.sdid_helpers.inference.estimate_placebo_variance(). The two-sided placebo p-value on SDIDInference.p_value uses the canonical \(((k + 1) / (B + 1))\) correction, where \(k\) counts the placebo iterations whose \(|\widehat\tau^{\,*}_{att}|\) is at least as large as the observed \(|\widehat{ATT}|\).
jackknife (Algorithm 3): The fitted unit weights \(\widehat{\mathbf{w}}\) and time weights \(\widehat{\boldsymbol{\lambda}}\) are held fixed and each unit is left out in turn; the variance is the standard fixed-weights jackknife \(\tfrac{N-1}{N}\sum_i (\widehat{ATT}_{(-i)} - \overline{ATT})^2\), where \(\overline{ATT}\) is the mean of the leave-one-out estimates. It is deterministic and fast (no re-solve of the weight problems), but it is undefined when a cohort has a single treated unit, because dropping the sole treated unit leaves no treated observations, and it returns NaN there, matching the synthdid R package.
bootstrap (Algorithm 2): Units are resampled with replacement, degenerate all-treated or all-control resamples are discarded, and the full SDID estimate (weights re-fit) is recomputed on each resample; the variance is that of the resampled estimates. Like the jackknife it needs more than one treated unit and returns NaN otherwise.
noinference: Skips variance estimation; SDIDInference.se, the interval, and the p-value are NaN.

The jackknife and bootstrap are implemented for the block (single adoption period) design, matching synthdid’s vcov.R; a staggered-adoption panel raises an error that directs you to the placebo procedure. For the jackknife and bootstrap the p-value on SDIDInference.p_value is the asymptotic-normal \(2\,(1 - \Phi(|\widehat{ATT}| / \widehat{se}))\), matching the confidence intervals those methods construct.
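The arithmetic behind the reported p-values and the jackknife standard error is short enough to sketch directly. The function names below are hypothetical and the inputs are made-up numbers. In the real pipeline the placebo and leave-one-out estimates come from rerunning SDID, which is not shown here.

```python
from statistics import NormalDist, mean

def placebo_p_value(att, placebo_atts):
    """Two-sided placebo p-value with the (k + 1) / (B + 1) correction."""
    k = sum(abs(p) >= abs(att) for p in placebo_atts)
    return (k + 1) / (len(placebo_atts) + 1)

def jackknife_se(leave_one_out):
    """Square root of (N - 1) / N times the sum of squared deviations."""
    n = len(leave_one_out)
    centre = mean(leave_one_out)
    return ((n - 1) / n * sum((x - centre) ** 2 for x in leave_one_out)) ** 0.5

def normal_p_value(att, se):
    """Asymptotic-normal two-sided p-value 2 * (1 - Phi(|att| / se))."""
    return 2 * (1 - NormalDist().cdf(abs(att) / se))

# Invented placebo draws: only -6.0 is at least as extreme as the estimate.
placebo_atts = [-3.0, 1.0, 4.5, -0.5, 2.0, -6.0, 0.2, 1.1, -2.2]
print(placebo_p_value(-5.0, placebo_atts))
print(jackknife_se([1.0, 2.0, 3.0]))
print(normal_p_value(1.96, 1.0))
```

The `+ 1` in the numerator and denominator means the placebo p-value can never be smaller than \(1 / (B + 1)\), so very small p-values require a large `B`.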

The three methods are cross-validated against synthdid: on a three-treated block panel the deterministic jackknife reproduces the authors’ R implementation value-for-value (\(10.557\)), and the placebo and bootstrap match in magnitude (they are stochastic, with independent RNG streams across the two languages).

Two-DataFrame and Single-Cohort Convergence#

When the panel has a single treated unit (e.g., California in the Proposition 99 study), mlsynth.utils.datautils.dataprep() returns a single-treated payload rather than a cohorts dict. The mlsynth.utils.sdid_helpers.setup.prepare_sdid_inputs() helper unifies both shapes into a single cohorts_dict keyed by adoption period index (1-based), which is what the cohort estimator’s \(\ell = t - (a - 1)\) math requires. In the single-cohort case, the cohort ATT and the overall ATT are numerically identical by construction.

Core API#

Synthetic Difference-in-Differences (SDID) estimator with event-study output.

Implements:

Arkhangelsky, D., Athey, S., Hirshberg, D., Imbens, G., & Wager, S. (2021). “Synthetic Difference-in-Differences.” American Economic Review.

Ciccia, D. (2024). “A Short Note on Event-Study Synthetic Difference-in-Differences Estimators.” arXiv:2407.09565.

Clarke, D., Pailanir, D., Athey, S., & Imbens, G. (2023). “Synthetic difference in differences estimation.” arXiv preprint.

The estimator handles both the canonical single-treated-unit setup (e.g. Proposition 99) and staggered-adoption designs with multiple cohorts. Output is a typed mlsynth.utils.sdid_helpers.structures.SDIDResults object that exposes:

inference.att / inference.se / inference.ci / inference.p_value
the overall ATT and its placebo-based inference (Ciccia 2024 Eq. 7);

event_study.tau / event_study.se / event_study.ci / event_study.event_times
the pooled event-study estimator (Ciccia 2024 Eq. 6);

cohorts[a] for each adoption period a: the cohort ATT
tau_a^sdid (Eq. 2), the cohort-specific event-time effects tau_{a, ell}^sdid (Eq. 3), and the cohort’s actual vs. bias-corrected synthetic control trajectories.

class mlsynth.estimators.sdid.SDID(config: SDIDConfig | dict)#

Bases: object

Synthetic Difference-in-Differences estimator with event-study output.

Parameters: config (SDIDConfig or dict) – Configuration object. See mlsynth.config_models.SDIDConfig.
Returns: SDIDResults – Typed container with the overall ATT and placebo inference (SDIDResults.inference), the pooled event-study estimator (SDIDResults.event_study), and the per-cohort decomposition (SDIDResults.cohorts).

Notes

The estimator accepts either a single treatment date (the canonical SDID setup) or a staggered-adoption panel. dataprep distinguishes the two cases automatically.

Examples

>>> import pandas as pd
>>> from mlsynth import SDID
>>> df = pd.read_csv(
...     "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
...     "refs/heads/main/basedata/smoking_data.csv"
... )
>>> df["Proposition 99"] = df["Proposition 99"].astype(int)
>>> res = SDID({
...     "df": df, "outcome": "cigsale", "treat": "Proposition 99",
...     "unitid": "state", "time": "year", "B": 200,
...     "display_graphs": False,
... }).fit()
>>> res.inference.att
-14.485...

fit() → SDIDResults#: Run the SDID pipeline and return the typed result container.

Configuration#

class mlsynth.config_models.SDIDConfig(*, df: pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: typing.List[str] = <factory>, treated_color: str = 'black', plot: mlsynth.config_models.PlotConfig = <factory>, vce: typing.Literal['placebo', 'jackknife', 'bootstrap', 'noinference'] = 'placebo', B: typing.Annotated[int, annotated_types.Ge(ge=0)] = 500, seed: int = 1400, intercept_adjust: bool = False)#

Configuration for the Synthetic Difference-in-Differences (SDID) estimator.

Implements Arkhangelsky, Athey, Hirshberg, Imbens & Wager (2021)’s SDID with the event-study aggregation of Ciccia (2024, arXiv:2407.09565). Inherits the standard df / outcome / treat / unitid / time panel-data interface from BaseEstimatorConfig.

B: int#

intercept_adjust: bool#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

seed: int#

vce: Literal['placebo', 'jackknife', 'bootstrap', 'noinference']#

Helper Modules#

Data preparation for SDID.

Calls mlsynth.utils.datautils.dataprep() and packages its return shape (single-treated or cohorts) into a uniform cohorts_dict that the math helpers consume. This replaces the inline if "cohorts" not in prep restructuring block that used to live in SDID.fit().

mlsynth.utils.sdid_helpers.setup.prepare_sdid_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str) → SDIDInputs#

Prepare panel data for the SDID pipeline.

Parameters:

df (pd.DataFrame) – Long-form balanced panel.
outcome, treat, unitid, time (str) – Column names identifying the outcome, treatment indicator, units, and time periods.

Returns:

SDIDInputs – Pre-processed cohorts payload and metadata.

Unit-weight, time-weight, and regularization solvers for SDID.

Both SDID weight programs are simplex-constrained least squares with a free intercept (and, for the unit weights, an L2 ridge):

unit weights omega minimise ||a + Y0_pre @ omega - y_treated_pre||^2 + T0 * zeta^2 * ||omega||^2 subject to sum(omega) = 1, omega >= 0 (Arkhangelsky et al. 2021 eq. 4 / Clarke et al. 2023 eq. 4);
time weights lambda minimise ||a + lambda @ Y0_pre - mean_post||^2 subject to sum(lambda) = 1, lambda >= 0 (eq. 6).

Rather than canonicalise these through cvxpy on every call – expensive in the placebo / jackknife loops, which re-solve them hundreds of times – we solve them natively with the library’s active-set simplex QP (mlsynth.utils.bilevel.active_set.solve_simplex_qp()). Two standard reductions make the active-set primitive applicable without changing the optimum:

the free intercept a is profiled out by centering the design and target over the observation axis (for fixed weights, the optimal intercept is the mean residual, which is exactly what centering enforces); the intercept is recovered afterwards as a* = mean(target) - colmean(X) @ w;
the unit-weight ridge lambda_r = T0 * zeta^2 is folded in by stacking a sqrt(lambda_r) * I block beneath the centered design with a zero target, so ||X_aug w - b_aug||^2 == ||X_c w - b_c||^2 + lambda_r ||w||^2.

Both reduced programs are strictly convex on SDID’s panels (the unit program by the positive ridge; the time program because the donors outnumber the pre-periods), so the optimum is unique and the active-set solution coincides with CLARABEL’s to solver tolerance – the Prop 99 ATT is preserved to within the pinned benchmark tolerance. Parity is asserted in tests/test_sdid_weights_native.py.
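Both reductions are algebraic identities that hold at any feasible weight vector, which can be confirmed numerically. The sketch below uses random placeholder arrays and hypothetical variable names. It does not call the library's active-set solver and only checks that the reformulated objectives agree with the originals.

```python
import numpy as np

rng = np.random.default_rng(0)
T0, N_co = 8, 5
X = rng.normal(size=(T0, N_co))      # stand-in for Y0_pre (observations x donors)
b = rng.normal(size=T0)              # stand-in for y_treated_pre
w = rng.dirichlet(np.ones(N_co))     # an arbitrary point on the simplex
lam_r = 0.7                          # stand-in for T0 * zeta**2

# Reduction 1: profile out the intercept by centering over the observation axis.
Xc, bc = X - X.mean(axis=0), b - b.mean()
a_star = b.mean() - X.mean(axis=0) @ w           # optimal intercept for this w
loss_with_intercept = np.sum((a_star + X @ w - b) ** 2)
loss_centered = np.sum((Xc @ w - bc) ** 2)

# Reduction 2: fold the ridge into an augmented least-squares problem.
X_aug = np.vstack([Xc, np.sqrt(lam_r) * np.eye(N_co)])
b_aug = np.concatenate([bc, np.zeros(N_co)])
loss_aug = np.sum((X_aug @ w - b_aug) ** 2)
loss_ridge = loss_centered + lam_r * np.sum(w ** 2)

print(loss_with_intercept, loss_centered)
print(loss_aug, loss_ridge)
```

Because the identities hold for every feasible `w`, minimising the augmented problem over the simplex returns the same weights as the original problem, and the intercept is then recovered from `a_star`.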

mlsynth.utils.sdid_helpers.weights.compute_regularization(donor_outcomes_pre_treatment: ndarray, num_post_treatment_periods: int, num_treated_units: int = 1) → float#

Compute regularization parameter zeta for unit weights.

Parameters:

donor_outcomes_pre_treatment (np.ndarray) – Donor outcomes in pre-treatment period, shape (T0, N_donors).
num_post_treatment_periods (int) – Number of post-treatment periods (T_post).
num_treated_units (int, optional) – Number of treated units in the cohort (N_tr), by default 1. Arkhangelsky et al. (2021) fold the treated count into the unit-weight ridge; the synthdid R package uses eta.omega = ((N - N0) * (T - T0))^(1/4) = (N_tr * T_post)^(1/4). A single treated unit (the default) leaves zeta at the (T_post)^(1/4) form, so single-treated designs are unchanged.

Returns:

float – The regularization parameter zeta. If donor_outcomes_pre_treatment has fewer than 2 time periods, first differences cannot be formed, so std_dev_of_first_differenced_donor_outcomes falls back to 1.0 (the current default) and zeta reduces to (num_treated_units * num_post_treatment_periods) ** 0.25. A panel this short may hold too little data for robust estimation.

Notes

The regularization parameter zeta is calculated as: zeta = ((num_treated_units * num_post_treatment_periods) ** 0.25) * std_dev_of_first_differenced_donor_outcomes where std_dev_of_first_differenced_donor_outcomes is the standard deviation of the first-differenced outcomes of donor units in the pre-treatment period. This matches the synthdid unit-weight tuning parameter zeta.omega.

Examples

>>> import numpy as np
>>> from mlsynth.utils.sdid_helpers.weights import compute_regularization
>>> T0_ex, N_donors_ex = 10, 5
>>> Y0_pre_donors_ex = np.random.rand(T0_ex, N_donors_ex) * 100
>>> T_post_ex = 5
>>> zeta = compute_regularization(Y0_pre_donors_ex, T_post_ex)
>>> print(f"Zeta: {zeta:.2f}")
Zeta: ...

>>> # Example with insufficient pre-treatment periods for diff
>>> Y0_short_pre_donors_ex = np.random.rand(1, N_donors_ex)
>>> zeta_short = compute_regularization(Y0_short_pre_donors_ex, T_post_ex)
>>> # Based on fallback std_dev_of_first_differenced_donor_outcomes = 1.0
>>> # Expected: (5**0.25) * 1.0 = 1.495...
>>> print(f"Zeta for short pre-period: {zeta_short:.2f}")
Zeta for short pre-period: 1.50

mlsynth.utils.sdid_helpers.weights.fit_time_weights(donor_outcomes_pre_treatment: ndarray, mean_donor_outcomes_post_treatment: ndarray) → Tuple[float | None, ndarray | None]#

Fit time weights for SDID.

Parameters:

donor_outcomes_pre_treatment (np.ndarray) – Donor outcomes in pre-treatment period, shape (T0, N_donors).
mean_donor_outcomes_post_treatment (np.ndarray) – Mean outcome of each donor unit in post-treatment period, shape (N_donors,).

Returns:

Tuple[Optional[float], Optional[np.ndarray]] –

intercept (Optional[float]) – The estimated intercept term (beta_0 in some notations). Returns None if the optimization fails or does not converge.
time_weights (Optional[np.ndarray]) – The estimated time weights (lambda_t in some notations). Shape (num_pre_treatment_periods,). These weights sum to 1 and are non-negative. Returns None if the optimization fails or does not converge.

Notes

This function solves an optimization problem to find time weights and an intercept that best reconstruct the average post-treatment donor outcomes using a weighted average of pre-treatment donor outcomes. The objective is to minimize the sum of squared differences between mean_donor_outcomes_post_treatment and intercept + time_weights @ donor_outcomes_pre_treatment, subject to sum(time_weights) = 1 and time_weights >= 0.

Examples

>>> import numpy as np
>>> from mlsynth.utils.sdid_helpers.weights import fit_time_weights
>>> T0_ex, N_donors_ex = 5, 3
>>> Y0_pre_donors_ex = np.random.rand(T0_ex, N_donors_ex)
>>> Y0_post_donors_mean_ex = np.random.rand(N_donors_ex)
>>> intercept_val, time_w_val = fit_time_weights(Y0_pre_donors_ex, Y0_post_donors_mean_ex)
>>> if time_w_val is not None:
...     print(f"Time weights shape: {time_w_val.shape}")
...     print(f"Sum of time weights: {np.sum(time_w_val):.2f}")
Time weights shape: (5,)
Sum of time weights: 1.00

mlsynth.utils.sdid_helpers.weights.unit_weights(donor_outcomes_pre_treatment: ndarray, mean_treated_outcome_pre_treatment: ndarray, regularization_parameter_zeta: float) → Tuple[float | None, ndarray | None]#

Fit unit (donor) weights for SDID.

Parameters:

donor_outcomes_pre_treatment (np.ndarray) – Donor outcomes in pre-treatment period, shape (T0, N_donors).
mean_treated_outcome_pre_treatment (np.ndarray) – Mean outcome of treated units in pre-treatment period, shape (T0,).
regularization_parameter_zeta (float) – Regularization parameter.

Returns:

Tuple[Optional[float], Optional[np.ndarray]] –

intercept (Optional[float]) – The estimated intercept term (beta_0 in some notations). Returns None if the optimization fails or does not converge.
unit_weights (Optional[np.ndarray]) – The estimated donor weights (omega_j in some notations). Shape (N_donors,). These weights sum to 1 and are non-negative. Returns None if the optimization fails or does not converge.

Notes

This function solves an optimization problem to find donor weights and an intercept that best reconstruct the pre-treatment trajectory of the (mean) treated unit using a weighted average of donor unit outcomes. The objective is to minimize the sum of squared differences between mean_treated_outcome_pre_treatment and intercept + donor_outcomes_pre_treatment @ unit_weights, plus an L2 penalty on unit_weights scaled by T0 * regularization_parameter_zeta ** 2, where T0 is the number of pre-treatment periods. Constraints are sum(unit_weights) = 1 and unit_weights >= 0.

Examples

>>> import numpy as np
>>> from mlsynth.utils.sdid_helpers.weights import unit_weights
>>> T0_ex, N_donors_ex = 10, 5
>>> Y0_pre_donors_ex = np.random.rand(T0_ex, N_donors_ex)
>>> y_pre_mean_treated_ex = np.random.rand(T0_ex)
>>> zeta_ex = 0.1
>>> intercept_val, unit_w_val = unit_weights(
...     Y0_pre_donors_ex, y_pre_mean_treated_ex, zeta_ex
... )
>>> if unit_w_val is not None:
...     print(f"Unit weights shape: {unit_w_val.shape}")
...     print(f"Sum of unit weights: {np.sum(unit_w_val):.2f}")
Unit weights shape: (5,)
Sum of unit weights: 1.00

Per-cohort SDID estimator.

Implements the cohort-specific SDID effects from Arkhangelsky et al. (2021) as re-expressed in Equations 2 and 3 of Ciccia (2024). For each cohort with adoption period a this routine:

fits unit weights omega and time weights lambda on the cohort’s pre-treatment window (the heavy lifting lives in weights),
computes the bias-corrected synthetic-control trajectory,
extracts the cohort-specific event-time effects tau_{a, ell} = Y_{0, a-1+ell} - Y_{0, a-1+ell}^{SC} - bias_correction (Equation 3 of Ciccia 2024),
averages those into the cohort ATT tau_a^sdid (Equation 2),
and pushes each event-time effect into the pooled accumulator that feeds the event-study aggregation in event_study.
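
The middle steps of the list above can be sketched as follows, assuming the unit weights omega and time weights lambda have already been fitted. The helper name sketch_cohort_effects and the toy data are hypothetical; the sketch only shows the arithmetic of the bias correction, not the library's code path.

```python
import numpy as np


def sketch_cohort_effects(y_treated_mean, donors, omega, lam, n_pre):
    """Hypothetical helper: per-period effects and cohort ATT from fitted weights."""
    synthetic = donors @ omega            # raw synthetic-control path, shape (T,)
    gap = y_treated_mean - synthetic      # treated minus synthetic, every period
    bias_correction = lam @ gap[:n_pre]   # lambda-weighted pre-period gap
    effects = gap - bias_correction       # one bias-corrected effect per period
    att = effects[n_pre:].mean()          # average over post-treatment periods
    return effects, att


T, n_pre, n_donors = 8, 5, 4
rng = np.random.default_rng(1)
donors = rng.normal(size=(T, n_donors))
omega = np.full(n_donors, 1.0 / n_donors)
lam = np.full(n_pre, 1.0 / n_pre)

# Treated path = synthetic path + constant level gap of 3 + effect of 2 after adoption.
y_treated_mean = donors @ omega + 3.0
y_treated_mean[n_pre:] += 2.0

effects, att = sketch_cohort_effects(y_treated_mean, donors, omega, lam, n_pre)
print(round(att, 6))  # 2.0
```

The constant level gap of 3 is removed by the bias correction, so only the post-adoption shift of 2 survives in the ATT.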

Function body and signatures are verbatim from the previous sdidutils.estimate_cohort_sdid_effects so the Prop 99 numbers do not shift across the refactor.

mlsynth.utils.sdid_helpers.cohort.estimate_cohort_sdid_effects(cohort_adoption_period: int, cohort_data_dict: Dict[str, Any], pooled_event_time_effects_accumulator: DefaultDict[float, List[Tuple[int, float]]]) → Dict[str, Any]#

Estimate Synthetic Difference-in-Differences (SDID) effects for a specific cohort.

This function calculates SDID treatment effects, synthetic control outcomes, and related metrics for a single cohort of treated units. It involves estimating unit (donor) weights and time weights, computing a bias correction term, and then deriving the treatment effects relative to the cohort’s specific treatment adoption period cohort_adoption_period.

The results from each cohort (event-time effects and number of treated units) are accumulated into the pooled_event_time_effects_accumulator dictionary, which is modified in place.

Parameters:

cohort_adoption_period (int) – Adoption period (treatment start time) for the current cohort. This is typically a specific time period index (e.g., year).
cohort_data_dict (Dict[str, Any]) – A dictionary containing data specific to the current cohort. Expected keys:
- “y” : np.ndarray Outcome matrix for treated units in this cohort. Shape (total_time_periods, N_treated_cohort), where total_time_periods is the total number of time periods in the panel, and N_treated_cohort is the number of treated units in this specific cohort.
- “donor_matrix” : np.ndarray Matrix of outcomes for all donor units available to this cohort. Shape (total_time_periods, N_donors).
- “total_periods” : int Total number of time periods (total_time_periods) in the panel.
- “pre_periods” : int Number of pre-treatment periods (num_pre_treatment_periods_cohort) relative to this cohort’s adoption period cohort_adoption_period.
- “post_periods” : int Number of post-treatment periods (num_post_treatment_periods_cohort) relative to cohort_adoption_period.
- “treated_indices” : List[int] List of original indices identifying the treated units in this cohort. Used to determine N_treated_cohort.
pooled_event_time_effects_accumulator (DefaultDict[float, List[Tuple[int, float]]]) – A dictionary (typically collections.defaultdict(list)) that accumulates event-time effects across all cohorts. This dictionary is updated in place by this function, adding the contributions from the current cohort.
- Keys are event times ell (float, relative to treatment start, e.g., -2, -1, 0, 1, 2).
- Values are lists of tuples, where each tuple is (N_treated_cohort, effect_value).

Returns:

Dict[str, Any] – A dictionary containing detailed results for the processed cohort:

“effects” : np.ndarray Array of (event_time, treatment_effect) pairs for all total_time_periods periods. Shape (total_time_periods, 2).
“pre_effects” : np.ndarray Array of (event_time, treatment_effect) pairs for pre-intervention periods. Shape (N_pre_effects, 2) or empty if no pre-effects.
“post_effects” : np.ndarray Array of (event_time, treatment_effect) pairs for post-intervention periods (including event time 0). Shape (N_post_effects, 2) or empty.
“actual” : np.ndarray Mean actual outcome trajectory for the treated units in this cohort. Shape (total_time_periods,).
“counterfactual” : np.ndarray Raw synthetic control outcome trajectory (cohort_donor_outcomes_matrix @ optimal_unit_weights_vector). Shape (total_time_periods,). Can contain NaNs if weights are not estimated.
“fitted_counterfactual” : np.ndarray Bias-corrected synthetic control outcome trajectory. Shape (total_time_periods,). Can contain NaNs.
“att” : float Average Treatment Effect on the Treated (ATT) for this cohort, averaged over its post-intervention periods. NaN if no post-periods or if effects cannot be calculated.
“treatment_effects_series” : np.ndarray Time series of treatment effects (actual - fitted_counterfactual) for all total_time_periods periods. Shape (total_time_periods,).
“ell” : np.ndarray Array of event times relative to this cohort’s treatment start cohort_adoption_period. Shape (total_time_periods,). For example, ell = 0 corresponds to period cohort_adoption_period.

Examples

>>> import warnings
>>> from collections import defaultdict
>>> import numpy as np
>>> # Conceptual example due to complex data setup
>>> # Assume 'adoption_period_ex' is the treatment start year for this cohort
>>> adoption_period_ex = 2005
>>> # Assume 'cohort_data_example' is a dict with keys like "y", "donor_matrix", etc.
>>> # and 'pooled_effects_accumulator_ex' is a defaultdict(list)
>>> total_periods_ex, n_treated_ex, n_donors_ex, n_pre_periods_ex = 10, 2, 5, 5 # Example dimensions
>>> cohort_data_example_ex = {
...     "y": np.random.rand(total_periods_ex, n_treated_ex),
...     "donor_matrix": np.random.rand(total_periods_ex, n_donors_ex),
...     "total_periods": total_periods_ex,
...     "pre_periods": n_pre_periods_ex, # Number of pre-treatment periods for this cohort
...     "post_periods": total_periods_ex - n_pre_periods_ex,
...     "treated_indices": list(range(n_treated_ex))
... }
>>> pooled_effects_accumulator_ex = defaultdict(list)
>>> # Mock dependent functions for a runnable example
>>> with warnings.catch_warnings(): # Suppress potential warnings from mock data
...     warnings.simplefilter("ignore")
...     # Mocking internal weight and regularization functions
...     # These would normally perform complex optimizations
...     mock_zeta_ex = 0.1
...     mock_unit_w_ex = np.full(n_donors_ex, 1.0/n_donors_ex)
...     mock_time_w_ex = np.full(n_pre_periods_ex, 1.0/n_pre_periods_ex)
...     from unittest.mock import patch
...     with patch('mlsynth.utils.sdidutils.compute_regularization', return_value=mock_zeta_ex), \
...          patch('mlsynth.utils.sdidutils.unit_weights', return_value=(0.0, mock_unit_w_ex)), \
...          patch('mlsynth.utils.sdidutils.fit_time_weights', return_value=(0.0, mock_time_w_ex)):
...         results_cohort_ex = estimate_cohort_sdid_effects(
...             adoption_period_ex, cohort_data_example_ex, pooled_effects_accumulator_ex
...         )
>>> print(f"Cohort ATT: {results_cohort_ex['att']:.3f}") # Example output
Cohort ATT: ...
>>> # pooled_effects_accumulator_ex would be updated in place
>>> # print(len(pooled_effects_accumulator_ex[-1])) # Example check

Event-study SDID aggregation.

Implements the pooled and aggregate event-study estimators from Ciccia (2024, arXiv:2407.09565). Given per-cohort effects from cohort, this module aggregates them into:

the pooled event-time effects tau_ell^sdid (Equation 6 of Ciccia 2024), with weights proportional to the per-cohort treated-unit counts at each event-time horizon;
the overall ATT (Equation 7) as a treated-unit-weighted average of the pooled event-study effects;
placebo-based standard errors and confidence intervals for both.
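
The two aggregation steps can be sketched directly on an accumulator of the documented shape (event time mapped to a list of (N_treated_cohort, effect) tuples). The helper names are hypothetical, and treating ell >= 0 as the post-treatment horizons is an assumption taken from the per-cohort documentation; the library's own aggregation may differ in detail.

```python
from collections import defaultdict


def sketch_pool_event_effects(accumulator):
    """Hypothetical helper: treated-count-weighted average per event time."""
    pooled, counts = {}, {}
    for ell, contributions in accumulator.items():
        n_total = sum(n for n, _ in contributions)
        pooled[ell] = sum(n * tau for n, tau in contributions) / n_total
        counts[ell] = n_total
    return pooled, counts


def sketch_overall_att(pooled, counts):
    """Weight each post-treatment horizon by its treated-unit count."""
    post = [ell for ell in pooled if ell >= 0]  # assumption: ell >= 0 is post
    total = sum(counts[ell] for ell in post)
    return sum(counts[ell] * pooled[ell] for ell in post) / total


acc = defaultdict(list)
acc[0.0] += [(2, -1.0), (1, -4.0)]   # both cohorts observed at ell = 0
acc[1.0] += [(2, -3.0)]              # only the first cohort reaches ell = 1
acc[-1.0] += [(2, 0.5), (1, -1.0)]   # a pre-treatment horizon

pooled, counts = sketch_pool_event_effects(acc)
att = sketch_overall_att(pooled, counts)
print(pooled[0.0], att)  # -2.0 -2.4
```

Horizons reached by fewer treated units (here ell = 1) carry proportionally less weight in the overall ATT.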

Function body of estimate_event_study_sdid is verbatim from the previous sdidutils location.

mlsynth.utils.sdid_helpers.event_study.estimate_event_study_sdid(prepped_event_study_data: Dict[str, Any], placebo_iterations: int = 1000, seed: int = 1400, vce: str = 'placebo') → Dict[str, Any]#

Estimate event-study SDID effects with placebo inference for variance, SE, and 95% CI.

Parameters:

prepped_event_study_data (Dict[str, Any]) – Preprocessed data from a function like dataprep_event_study_sdid. Expected to contain a ‘cohorts’ key, which is a dictionary mapping cohort adoption periods (int) to cohort-specific data dictionaries.
placebo_iterations (int, optional) – Number of placebo resamples (B) for variance estimation, by default 1000.
seed (int, optional) – Random seed for reproducibility of placebo sampling, by default 1400.
vce (str, optional) – Variance estimator, passed through from the vce config option, by default 'placebo'.

Returns:

Dict[str, Any] – A dictionary containing various estimates:

“tau_a_ell” : Dict[int, Dict[str, Any]] Per-cohort detailed results. Keys are cohort adoption periods. Values are dictionaries from estimate_cohort_sdid_effects.
“tau_ell” : Dict[float, float] Pooled event-time effects (weighted average across cohorts). Keys are event times ell, values are the pooled effect estimates.
“att” : float Overall Average Treatment Effect on Treated, aggregated across all cohorts and post-treatment periods.
“att_se” : float Standard error for the overall ATT, estimated via placebo inference.
“att_ci” : List[float] 95% Confidence interval [lower, upper] for the overall ATT.
“cohort_estimates” : Dict[int, Dict[str, Any]] Per-cohort summary statistics. Keys are cohort adoption periods. Values are dicts with “att”, “att_se”, “att_ci”, and “event_estimates” (a dict of event_time -> {tau, se, ci}).
“pooled_estimates” : Dict[float, Dict[str, Any]] Pooled event-time estimates with SE and CI. Keys are event times ell. Values are dicts with “tau”, “se”, “ci”.
“placebo_att_values” : List[float] List of ATT values obtained from each placebo iteration. Useful for diagnostics or alternative inference methods.

Examples

>>> import warnings
>>> import numpy as np
>>> # Conceptual example due to the complexity of `prepped_event_study_data` data
>>> # `prepped_data_example` would be the output of a data preparation function
>>> # specific to event study SDID, containing multiple cohorts.
>>> prepped_data_example = {
...     "cohorts": {
...         2005: { # Data for cohort treated in 2005
...             "y": np.random.rand(10, 2), "donor_matrix": np.random.rand(10, 5),
...             "total_periods": 10, "pre_periods": 5, "post_periods": 5,
...             "treated_indices": [0, 1]
...         },
...         2006: { # Data for cohort treated in 2006
...             "y": np.random.rand(10, 1), "donor_matrix": np.random.rand(10, 5),
...             "total_periods": 10, "pre_periods": 6, "post_periods": 4,
...             "treated_indices": [2]
...         }
...     }
... }
>>> # Mock dependent functions for a runnable example
>>> from unittest.mock import patch
>>> mock_zeta = 0.1
>>> mock_unit_w = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
>>> mock_time_w_c1 = np.full(5, 0.2)
>>> mock_time_w_c2 = np.full(6, 1/6)
>>> # This example is highly simplified and primarily tests structure
>>> with warnings.catch_warnings(): # Suppress potential warnings
...     warnings.simplefilter("ignore")
...     # Mocking internal weight and regularization functions
...     # Need to handle calls for each cohort within estimate_cohort_sdid_effects
...     # and also for each placebo iteration within estimate_placebo_variance
...     # This level of mocking is complex for a simple docstring example.
...     # We'll assume the function runs and check for key existence.
...     with patch('mlsynth.utils.sdidutils.compute_regularization', return_value=mock_zeta), \
...          patch('mlsynth.utils.sdidutils.unit_weights', return_value=(0.0, mock_unit_w)), \
...          patch('mlsynth.utils.sdidutils.fit_time_weights',
...                side_effect=[(0.0, mock_time_w_c1), (0.0, mock_time_w_c2)] * (1 + 10)):  # 1 real + B mock iterations
...         results_event_study = estimate_event_study_sdid(prepped_data_example, placebo_iterations=10, seed=42)
>>> print("Overall ATT:", results_event_study["att"]) # Example output
Overall ATT: ...
>>> print("Pooled estimate for event time 0:", results_event_study["pooled_estimates"].get(0.0, {}).get("tau"))
Pooled estimate for event time 0: ...
>>> assert "placebo_att_values" in results_event_study

Variance estimators for SDID.

Arkhangelsky et al. (2021) propose three procedures for the variance of the SDID ATT, extended to the staggered / event-study setting by Clarke et al. (2023). All three are provided here and selected through the vce config option:

placebo (Algorithm 4) – control units are repeatedly reassigned as pseudo-treated units, the full SDID pipeline is rerun, and the variance of the resulting effects estimates the variance of the actual estimator. This is the only procedure defined for a single treated unit and is the default.
jackknife (Algorithm 3) – the fitted unit/time weights are held fixed and each unit is left out in turn; the variance follows the standard fixed-weights jackknife. Undefined (NaN) when a cohort has a single treated unit, matching the synthdid R package.
bootstrap (Algorithm 2) – units are resampled with replacement and the full SDID estimate (weights re-fit) is recomputed on each resample; the variance is the variance of the resampled estimates. Undefined (NaN) for a single treated unit, matching synthdid.

The jackknife and bootstrap are implemented for the block (single adoption period) design, matching synthdid’s vcov.R; staggered adoption uses the placebo procedure.

mlsynth.utils.sdid_helpers.inference.estimate_bootstrap_variance(prepped_event_study_data: Dict[str, Any], num_bootstrap_iterations: int, seed: int) → Dict[str, Any]#

Clustered (block) bootstrap variance of the SDID ATT (Algorithm 2).

Resamples the N units with replacement, discards a resample that is all treated or all control, and recomputes the full SDID estimate – weights re-fit – on each resample. The variance is the population variance of the resampled ATTs (matching synthdid’s sqrt((B-1)/B) * sd(...)).

Returns NaN when the cohort has a single treated unit, matching synthdid, whose bootstrap is undefined unless more than one unit is treated.
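
The resampling loop can be sketched as below. To keep the example short, a plain DiD function stands in for the full SDID estimate (where the weights would be re-fit on each resample); the helper names, the stand-in estimator, and the choice to redraw discarded resamples until B valid ones are collected are all assumptions of this sketch.

```python
import numpy as np


def simple_did(Y, treated_mask, n_pre):
    """Stand-in estimator (plain DiD); SDID would re-fit omega and lambda here."""
    gap = Y[:, treated_mask].mean(axis=1) - Y[:, ~treated_mask].mean(axis=1)
    return gap[n_pre:].mean() - gap[:n_pre].mean()


def sketch_bootstrap_variance(Y, treated_mask, n_pre, n_boot, seed, estimator=simple_did):
    """Hypothetical helper: unit-level bootstrap variance of an ATT estimator."""
    treated_mask = np.asarray(treated_mask, dtype=bool)
    if treated_mask.sum() < 2 or treated_mask.all():
        return float("nan")  # undefined with a single treated unit or no controls
    rng = np.random.default_rng(seed)
    n_units = Y.shape[1]
    estimates = []
    while len(estimates) < n_boot:
        idx = rng.integers(0, n_units, size=n_units)  # resample units with replacement
        mask = treated_mask[idx]
        if mask.all() or not mask.any():
            continue  # discard all-treated or all-control resamples
        estimates.append(estimator(Y[:, idx], mask, n_pre))
    return float(np.var(estimates))  # population variance (ddof = 0)


rng = np.random.default_rng(0)
Y = rng.normal(size=(8, 6))  # (T, N) orientation
mask = np.array([True, True, False, False, False, False])
variance = sketch_bootstrap_variance(Y, mask, n_pre=5, n_boot=50, seed=1)
print(np.sqrt(variance))  # bootstrap standard error
```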

mlsynth.utils.sdid_helpers.inference.estimate_jackknife_variance(prepped_event_study_data: Dict[str, Any]) → Dict[str, Any]#

Fixed-weights jackknife variance of the SDID ATT (Algorithm 3).

Holds the fitted unit weights omega and time weights lambda fixed, leaves out each unit (treated or control) in turn, and recomputes the ATT from the closed-form weighted DID. When a control is dropped, omega is renormalized over the retained controls (synthdid’s sum_normalize); when a treated unit is dropped, omega is unchanged. The variance is the standard jackknife form ((N - 1) / N) * sum_i (att_(-i) - mean)^2, matching synthdid’s jackknife().

Returns NaN when the cohort has a single treated unit (leaving out the sole treated unit is undefined) or when the fitted weights are not available.
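
A minimal sketch of the fixed-weights jackknife described above, assuming omega and lambda are already fitted. The helper names are hypothetical, and returning NaN when the dropped control carried all of the weight is an assumption of this sketch rather than documented behaviour.

```python
import numpy as np


def weighted_did(Y_co, Y_tr, omega, lam, n_pre):
    """Closed-form weighted DiD for fixed unit weights omega and time weights lam."""
    gap = Y_tr.mean(axis=1) - Y_co @ omega
    return gap[n_pre:].mean() - lam @ gap[:n_pre]


def sketch_fixed_weight_jackknife(Y_co, Y_tr, omega, lam, n_pre):
    """Hypothetical helper: leave each unit out in turn, weights held fixed."""
    n_co, n_tr = Y_co.shape[1], Y_tr.shape[1]
    if n_tr < 2:
        return float("nan")  # dropping the sole treated unit is undefined
    atts = []
    for j in range(n_co):  # drop one control, renormalize omega over the rest
        keep = np.arange(n_co) != j
        w = omega[keep]
        if w.sum() <= 0:
            return float("nan")  # assumption: undefined if unit j held all weight
        atts.append(weighted_did(Y_co[:, keep], Y_tr, w / w.sum(), lam, n_pre))
    for i in range(n_tr):  # drop one treated unit, omega unchanged
        keep = np.arange(n_tr) != i
        atts.append(weighted_did(Y_co, Y_tr[:, keep], omega, lam, n_pre))
    atts = np.asarray(atts)
    n = atts.size
    # Standard jackknife form: ((N - 1) / N) * sum_i (att_(-i) - mean)^2
    return float((n - 1) / n * np.sum((atts - atts.mean()) ** 2))


rng = np.random.default_rng(2)
Y_co, Y_tr = rng.normal(size=(8, 5)), rng.normal(size=(8, 2))
omega, lam = np.full(5, 0.2), np.full(5, 0.2)
variance = sketch_fixed_weight_jackknife(Y_co, Y_tr, omega, lam, n_pre=5)
print(np.sqrt(variance))  # jackknife standard error
```

Because no weights are re-fit, each leave-one-out estimate is a cheap closed-form computation, which is the main practical appeal of this procedure.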

mlsynth.utils.sdid_helpers.inference.estimate_placebo_variance(prepped_event_study_data: Dict[str, Any], num_placebo_iterations: int, seed: int) → Dict[str, Any]#

Estimate variance of ATT and event-time effects using placebo inference.

Parameters:

prepped_event_study_data (Dict[str, Any]) – Preprocessed data from dataprep_event_study_sdid or similar.
num_placebo_iterations (int) – Number of placebo iterations.
seed (int) – Random seed for reproducibility.

Returns:

Dict[str, Any] – Dictionary containing variance estimates and placebo ATT values:

“att_variance” (float): Variance of the overall ATT.
“cohort_variances” (Dict[int, float]): Variances of cohort-specific ATTs. Keys are cohort adoption periods.
“event_variances” (Dict[int, Dict[int, float]]): Variances of cohort-specific event-time effects. Outer keys are cohort adoption periods, inner keys are event times ell.
“pooled_event_variances” (Dict[float, float]): Variances of pooled event-time effects. Keys are event times ell.
“placebo_att_values” (List[float]): List of ATT values from each placebo iteration, useful for diagnostics.

Notes

This function performs placebo tests by iteratively reassigning control units as pseudo-treated units and re-estimating effects. The variance of these placebo effects is then used as an estimate of the variance of the actual treatment effects. A warning is issued if the number of unique control units is less than the total number of treated units across all cohorts, as this may compromise the reliability of placebo inference.
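
The placebo loop can be sketched for a single cohort as follows. A plain DiD function stands in for the full SDID pipeline, which would re-fit the weights on every draw; the helper names, the stand-in estimator, and the use of the population variance (ddof = 0) are assumptions of this sketch.

```python
import numpy as np


def simple_did(Y_co, Y_tr, n_pre):
    """Stand-in estimator (plain DiD); SDID would re-fit omega and lambda here."""
    gap = Y_tr.mean(axis=1) - Y_co.mean(axis=1)
    return gap[n_pre:].mean() - gap[:n_pre].mean()


def sketch_placebo_variance(Y_co, n_treated, n_pre, n_iter, seed, estimator=simple_did):
    """Hypothetical helper: variance of effects on pseudo-treated controls."""
    n_co = Y_co.shape[1]
    if n_co <= n_treated:
        raise ValueError("need more control units than treated units")
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_iter):
        perm = rng.permutation(n_co)
        pseudo_treated, remaining = perm[:n_treated], perm[n_treated:]
        # Controls received no treatment, so any estimated effect is noise.
        draws.append(estimator(Y_co[:, remaining], Y_co[:, pseudo_treated], n_pre))
    draws = np.asarray(draws)
    return float(draws.var()), draws


rng = np.random.default_rng(3)
Y_co = rng.normal(size=(10, 12))  # (T, N_donors); true treated units are excluded
variance, placebo_atts = sketch_placebo_variance(Y_co, n_treated=2, n_pre=6, n_iter=200, seed=42)
print(np.sqrt(variance))  # placebo standard error
```

The guard on the number of controls reflects the warning mentioned in the Notes: with too few controls there are not enough distinct pseudo-treated draws for the placebo distribution to be informative.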

Examples

>>> import warnings
>>> import numpy as np
>>> # Conceptual example due to the complexity of `prepped_event_study_data` data
>>> # `prepped_data_example` would be the output of a data preparation function
>>> # specific to event study SDID, containing multiple cohorts.
>>> prepped_data_example = {
...     "cohorts": {
...         2005: { # Data for cohort treated in 2005
...             "y": np.random.rand(10, 2), "donor_matrix": np.random.rand(10, 5),
...             "total_periods": 10, "pre_periods": 5, "post_periods": 5,
...             "treated_indices": [0, 1] # Original treated indices
...         },
...         2006: { # Data for cohort treated in 2006
...             "y": np.random.rand(10, 1), "donor_matrix": np.random.rand(10, 5),
...             "total_periods": 10, "pre_periods": 6, "post_periods": 4,
...             "treated_indices": [2] # Original treated index
...         }
...     }
... }
>>> # Mock dependent functions for a runnable example
>>> from unittest.mock import patch
>>> mock_zeta = 0.1
>>> mock_unit_w = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
>>> mock_time_w_c1 = np.full(5, 0.2)
>>> mock_time_w_c2 = np.full(6, 1/6)
>>> # This example is highly simplified.
>>> with warnings.catch_warnings():
...     warnings.simplefilter("ignore")
...     # Mocking internal weight and regularization functions.
...     # The side_effect needs to cover calls for each cohort and each placebo iteration.
...     # For num_placebo_iterations=3, and 2 cohorts, estimate_cohort_sdid_effects is called 2*3=6 times.
...     # Each call to estimate_cohort_sdid_effects calls fit_time_weights once.
...     # So, fit_time_weights needs 6 return values.
...     fit_time_weights_returns = []
...     for _ in range(3): # num_placebo_iterations
...         fit_time_weights_returns.append((0.0, mock_time_w_c1)) # For cohort 2005 placebo
...         fit_time_weights_returns.append((0.0, mock_time_w_c2)) # For cohort 2006 placebo
...
...     with patch('mlsynth.utils.sdidutils.compute_regularization', return_value=mock_zeta), \
...          patch('mlsynth.utils.sdidutils.unit_weights', return_value=(0.0, mock_unit_w)), \
...          patch('mlsynth.utils.sdidutils.fit_time_weights', side_effect=fit_time_weights_returns):
...         variance_results = estimate_placebo_variance(prepped_data_example, num_placebo_iterations=3, seed=42)
>>> print(f"ATT Variance: {variance_results['att_variance']}") # Example output
ATT Variance: ...
>>> assert "placebo_att_values" in variance_results
>>> assert len(variance_results["placebo_att_values"]) <= 3 # Can be less if NaNs occur

Top-level SDID procedure (Ciccia 2024-style event-study aggregation).

Sequence:

prepare_sdid_inputs() packs the panel into a uniform cohorts dict.
estimate_event_study_sdid() fits all cohorts, aggregates the pooled event-study estimator, and runs the placebo procedure.
assemble_results() wraps the raw dictionary into typed frozen dataclasses (SDIDResults etc.).

mlsynth.utils.sdid_helpers.orchestration.assemble_results(inputs: SDIDInputs, raw: Dict[str, Any], intercept_adjust: bool = False) → SDIDResults#

Wrap the raw dict from estimate_event_study_sdid into typed objects.

mlsynth.utils.sdid_helpers.orchestration.run_sdid(df, outcome: str, treat: str, unitid: str, time: str, B: int = 500, seed: int = 1400, vce: str = 'placebo', intercept_adjust: bool = False) → SDIDResults#

End-to-end SDID pipeline producing a typed SDIDResults object.

Display plot for SDID.

A single treated cohort is drawn as an observed-versus-counterfactual chart through the shared in-house Plotter (mlsynth.utils.plotting), so SDID looks like every other single-treated-unit estimator. A staggered design (more than one adoption cohort) keeps the pooled event-study chart rendered by mlsynth.utils.resultutils.SDID_plot(), the only sensible aggregate view when cohorts adopt at different times.

mlsynth.utils.sdid_helpers.plotter.plot_sdid(results: SDIDResults, **plot_kwargs: Any) → None#

Render the SDID display plot, choosing the view by treated structure.

Note

SDID.fit() returns an EffectResult on the standardized two-family contract: res.att / res.att_ci / res.counterfactual / res.gap / res.pre_rmse resolve through the standardized sub-models (the flat counterfactual / gap are the treated-unit-weighted aggregate across cohorts). The placebo inference, the pooled event study, and the per-cohort decomposition stay on res.inference_detail / res.event_study / res.cohorts (the bare res.inference slot is reserved for the standardized ATT-level InferenceResults).

Typed result containers for the SDID pipeline.

All matrices follow mlsynth’s (T, N) orientation (rows = time, columns = unit), matching mlsynth.utils.datautils.dataprep(). The Ciccia (2024) quantities are surfaced as first-class fields rather than buried in a nested metadata dict.

class mlsynth.utils.sdid_helpers.structures.SDIDCohort(adoption_period: int, n_treated: int, n_post: int, att: float, att_se: float, att_ci: Tuple[float, float], event_effects: Dict[int, SDIDEventEffect], actual: ndarray, counterfactual: ndarray, unit_weights: ndarray | None = None, time_weights: ndarray | None = None)#

Per-cohort SDID estimator output (Ciccia 2024 Eqs. 2 and 3).

Parameters:

adoption_period (int) – Treatment-onset period for this cohort.
n_treated (int) – Number of treated units in this cohort (N_tr^a).
n_post (int) – Number of post-treatment periods for this cohort (T_tr^a).
att (float) – Cohort ATT tau_a^sdid (Equation 2 of Ciccia 2024).
att_se (float) – Placebo-based standard error for att.
att_ci (Tuple[float, float]) – 95 percent confidence interval for att.
event_effects (Dict[int, SDIDEventEffect]) – Cohort-specific event-time effects tau_{a, ell}^sdid (Equation 3), keyed by event time ell (negative for pre, non-negative for post).
actual (np.ndarray) – Mean treated-unit outcome trajectory, shape (T,).
counterfactual (np.ndarray) – Bias-corrected synthetic control trajectory, shape (T,).

actual: ndarray#

adoption_period: int#

att: float#

att_ci: Tuple[float, float]#

att_se: float#

counterfactual: ndarray#

event_effects: Dict[int, SDIDEventEffect]#

n_post: int#

n_treated: int#

time_weights: ndarray | None = None#

unit_weights: ndarray | None = None#

class mlsynth.utils.sdid_helpers.structures.SDIDEventEffect(ell: int, tau: float, se: float, ci: Tuple[float, float])#

Single event-time effect with placebo-based SE and CI.

ci: Tuple[float, float]#

ell: int#

se: float#

tau: float#

class mlsynth.utils.sdid_helpers.structures.SDIDEventStudy(event_times: ndarray, tau: ndarray, se: ndarray, ci: ndarray)#

Pooled event-study estimator (Ciccia 2024 Equation 6).

Parameters:

event_times (np.ndarray) – Event-time offsets ell covered by the pooled estimator.
tau (np.ndarray) – Pooled effects tau_ell^sdid, aligned with event_times.
se (np.ndarray) – Placebo-based standard errors aligned with tau.
ci (np.ndarray) – Length-2 CI tuples aligned with tau, shape (L, 2).

ci: ndarray#

event_times: ndarray#

se: ndarray#

tau: ndarray#

class mlsynth.utils.sdid_helpers.structures.SDIDInference(att: float, se: float, ci: Tuple[float, float], p_value: float, placebo_att: ndarray, method: str, n_placebo: int)#

Overall ATT and placebo inference (Ciccia 2024 Equation 7).

Parameters:

att (float) – Treated-unit-weighted aggregate ATT across cohorts.
se (float) – Placebo-based standard error.
ci (Tuple[float, float]) – 95 percent confidence interval.
p_value (float) – Two-sided p-value. For the placebo method this is the permutation p-value (#{b : |placebo_b| >= |att|} + 1) / (B + 1); for the jackknife and bootstrap it is the asymptotic-normal 2 * (1 - Phi(|att| / se)).
placebo_att (np.ndarray) – Vector of placebo ATT estimates, useful for diagnostics. Empty for the jackknife and bootstrap methods (which do not form a placebo distribution over the null).
method (str) – Inference method label: "placebo", "jackknife", "bootstrap", or "noinference".
n_placebo (int) – Number of placebo iterations actually completed (may be smaller than the configured B when some iterations yield NaN).
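
The two p-value formulas in the p_value entry can be written out as a short sketch. The helper names are hypothetical, and using the number of non-NaN placebo draws as B is an assumption based on the n_placebo description.

```python
import math

import numpy as np


def sketch_placebo_p_value(att, placebo_atts):
    """Permutation p-value: (#{|placebo| >= |att|} + 1) / (B + 1)."""
    placebo = np.asarray(placebo_atts, dtype=float)
    placebo = placebo[~np.isnan(placebo)]  # keep completed iterations only
    exceed = int(np.sum(np.abs(placebo) >= abs(att)))
    return (exceed + 1) / (placebo.size + 1)


def sketch_normal_p_value(att, se):
    """Asymptotic-normal p-value: 2 * (1 - Phi(|att| / se))."""
    z = abs(att) / se
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 * (1.0 - phi)


p_perm = sketch_placebo_p_value(-2.0, [0.5, -2.5, 1.0, np.nan, -0.3])
print(p_perm)  # 0.4
print(round(sketch_normal_p_value(1.96, 1.0), 3))  # 0.05
```

The +1 in numerator and denominator keeps the permutation p-value strictly positive, so its smallest attainable value is 1 / (B + 1).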

att: float#

ci: Tuple[float, float]#

method: str#

n_placebo: int#

p_value: float#

placebo_att: ndarray#

se: float#

class mlsynth.utils.sdid_helpers.structures.SDIDInputs(cohorts_dict: Dict[int, Dict[str, Any]], treated_unit_name: Any, donor_names: Sequence, time_labels: ndarray, n_pre: int, n_post: int, Ywide: Any, outcome: str)#

Pre-processed two-DataFrame view of the SDID panel.

Parameters:

cohorts_dict (Dict[int, Dict[str, Any]]) – The cohort-keyed payload consumed by the math helpers. Keys are cohort adoption periods (integers); values follow the schema of estimate_cohort_sdid_effects.
treated_unit_name (Any) – Label of the (canonical) treated aggregate. For staggered designs this is the label of an arbitrary representative treated unit.
donor_names (Sequence) – Labels of the donor units in the order matching the donor matrices.
time_labels (np.ndarray) – Time labels in original order.
n_pre (int) – Pre-treatment period count (relative to the earliest cohort).
n_post (int) – Post-treatment period count.
Ywide (Any) – The wide outcome frame produced by dataprep; kept for plotting.
outcome (str) – Outcome variable name.

Ywide: Any#

cohorts_dict: Dict[int, Dict[str, Any]]#

donor_names: Sequence#

n_post: int#

n_pre: int#

outcome: str#

time_labels: ndarray#

treated_unit_name: Any#

class mlsynth.utils.sdid_helpers.structures.SDIDResults(*, effects: EffectsResults | None = None, fit_diagnostics: FitDiagnosticsResults | None = None, time_series: TimeSeriesResults | None = None, weights: WeightsResults | None = None, inference: InferenceResults | None = None, method_details: MethodDetailsResults | None = None, sub_method_results: Dict[str, Any] | None = None, additional_outputs: Dict[str, Any] | None = None, raw_results: Dict[str, Any] | None = None, execution_summary: Dict[str, Any] | None = None, plot_config: PlotConfig | None = None, inputs: SDIDInputs, inference_detail: SDIDInference, event_study: SDIDEventStudy, cohorts: Dict[int, SDIDCohort], raw: Dict[str, Any])#

Public SDID.fit() return container.

An EffectResult (the observational report): it populates the standardized sub-models so the flat accessors (att / att_ci / counterfactual / gap / pre_rmse) resolve through the base contract. The flat counterfactual / gap are the treated-unit-weighted aggregate trajectories across cohorts (for a single cohort, that cohort’s path). The full SDID detail – the placebo inference, the pooled event study, and the per-cohort decomposition – stays in the typed fields below.

Parameters:

inputs (SDIDInputs) – Pre-processed panel.
inference_detail (SDIDInference) – Overall ATT and placebo inference (Equation 7). Was inference before the contract migration; the standardized inference slot now holds the ATT-level InferenceResults.
event_study (SDIDEventStudy) – Pooled event-study estimator (Equation 6).
cohorts (Dict[int, SDIDCohort]) – Per-cohort estimator outputs (Equations 2 and 3), keyed by adoption period.
raw (Dict[str, Any]) – Raw dictionary returned by mlsynth.utils.sdid_helpers.event_study.estimate_event_study_sdid(), retained for reproducibility and downstream tooling.

cohorts: Dict[int, SDIDCohort]#

event_study: SDIDEventStudy#

inference_detail: SDIDInference#

inputs: SDIDInputs#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function BaseEstimatorResults.Config.<lambda>>}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

raw: Dict[str, Any]#

Example#

import pandas as pd
from mlsynth import SDID

df = pd.read_csv(
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
    "refs/heads/main/basedata/smoking_data.csv"
)
df["Proposition 99"] = df["Proposition 99"].astype(int)

results = SDID({
    "df":       df,
    "outcome":  "cigsale",
    "treat":    "Proposition 99",
    "unitid":   "state",
    "time":     "year",
    "B":        500,        # placebo / bootstrap resamples
    "vce":      "placebo",  # or "jackknife" / "bootstrap" / "noinference"
    "display_graphs": True,
}).fit()

# Overall ATT (Ciccia 2024 Eq. 7) and placebo inference.
print(results.inference_detail.att)        # -15.605 (matches Arkhangelsky et al. 2021)
print(results.inference_detail.se)
print(results.inference_detail.ci)
print(results.inference_detail.p_value)

# Pooled event-study trajectory (Ciccia 2024 Eq. 6).
es = results.event_study
for ell, tau, se in zip(es.event_times, es.tau, es.se):
    print(f"ell={int(ell):>3}  tau={tau:+.3f}  se={se:.3f}")

# Per-cohort decomposition (Ciccia 2024 Eqs. 2 and 3).
for adoption_period, cohort in results.cohorts.items():
    print(adoption_period, cohort.n_treated, cohort.att)
    print(cohort.event_effects[1])  # effect at event time ell = 1

Replication: Proposition 99#

Note

Empirical replication (Path A). Run on the California smoking panel (39 states, 1970-2000; California treated by Proposition 99 from 1989), mlsynth’s SDID reproduces the headline estimate of [aersdid] to three significant figures:

Quantity	mlsynth	Reference
Overall ATT	-15.605	-15.6 (Arkhangelsky et al. 2021, Table 1; `synthdid` R: -15.604)
Placebo SE (B = 500)	7.58	8.4 (placebo SE, Table 1)
95% CI	(-30.5, -0.7)
Placebo p-value	0.032

The point estimate agrees with the authors’ synthdid package (-15.604) to within 0.002. The placebo standard error is in the same range (7.6 vs. 8.4); it is a resampling estimate and varies with the placebo draw and B. As Arkhangelsky et al. emphasize, SDID’s -15.6 is considerably smaller in magnitude than the DiD estimate (-27.3) and smaller than SC (-19.6), and its SE is smaller than DiD’s (17.7) – the localization payoff.
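
As a quick arithmetic check, the reported 95% interval is consistent with the usual normal approximation ATT ± 1.96 × SE applied to the mlsynth column of the table. That the library forms the interval exactly this way is an assumption here; the sketch only confirms that the tabulated numbers agree with one another.

```python
att, se = -15.605, 7.58   # mlsynth column of the replication table
z = 1.96                  # two-sided 95% normal critical value

lower, upper = att - z * se, att + z * se
print(f"({lower:.1f}, {upper:.1f})")  # (-30.5, -0.7)
```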

Per the project’s replication contract (agents/agents_estimators.md), SDID is considered done: the published empirical ATT is reproduced on the same data, with the point estimate matching the reference to three significant figures.

Cross-validation. The same estimate is matched to the authors’ own synthdid R package (\(|\Delta| = 1.6\times 10^{-3}\)) and pinned in benchmarks/cases/sdid_prop99.py; see the dedicated page SDID — Synthetic Difference-in-Differences (Arkhangelsky et al. 2021).

References#

Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). “Synthetic Difference-in-Differences.” American Economic Review 111(12):4088-4118.

Ciccia, D. (2024). “A Short Note on Event-Study Synthetic Difference-in-Differences Estimators.” arXiv:2407.09565.

Clarke, D., Pailanir, D., Athey, S., & Imbens, G. (2023). “Synthetic difference in differences estimation.” arXiv preprint.
