Spatial Synthetic Difference-in-Differences (SpSyDiD)#
Overview#
SpSyDiD (Serenini, R., & Masek, F. (2024). “Spatial Synthetic Difference-in-Differences,” SSRN 4736857) extends the Synthetic Difference-in-Differences (SDID) estimator of Arkhangelsky-Athey-Hirshberg-Imbens-Wager (2021) with a spatial spillover term. The estimator separates two estimands that standard SDID confounds when SUTVA is violated by geographic spillovers:
\(\widehat \tau\) – the direct ATT on the directly-treated units (identical in form to standard SDID).
\(\widehat \tau_s\) – the indirect / spillover coefficient per unit of neighbour-treatment exposure \((WD)_{it} = \sum_j w_{ij} D_{jt}\).
The implied population ATE follows from Serenini & Masek’s eq. 14,
\[\widehat{ATE} = \widehat \tau \, (1 + \overline{WD}),\]
where \(\overline{WD}\) is the average exposure across the directly + indirectly treated units in the post-period.
The user supplies a row-standardised \(N \times N\) spatial
weight matrix \(W\). Helpers in
mlsynth.utils.spsydid_helpers.spatial cover the standard
constructions:
knn_weights() – \(k\)-nearest neighbours from coordinates.
inverse_distance_weights() – \(w_{ij} \propto 1/d_{ij}^p\) with optional cutoff.
contiguity_weights() – queen / rook contiguity from an adjacency dictionary.
When \(W = 0\) (no spatial structure) or no donor has any treated neighbour, SpSyDiD numerically reduces to plain SDID with \(\widehat \tau_s = 0\).
When to Use This Method#
Every difference-based estimator – DiD, synthetic control, and plain Synthetic Difference-in-Differences (SDID) – rests on SUTVA: a control unit’s outcome is unaffected by anyone else’s treatment. Geography routinely breaks this. When a policy in the treated region leaks to its neighbours, those neighbours are exactly the units a synthetic control wants to lean on, and the leakage corrupts the comparison. Serenini & Masek (2024) make the bias explicit:
Spillovers onto units *inside* the donor pool bias and render inconsistent the standard SDID ATT – the synthetic control is built from partially-treated donors, so the “untreated” benchmark is itself moving with the treatment.
Spillovers *outside* the donor pool leave the ATT identifiable but make the population ATE unidentified, because the indirect effect on exposed-but-excluded units is never measured.
SpSyDiD targets this regime directly. It adds a single spatial exposure term \((WD)_{it} = \sum_j w_{ij} D_{jt}\) to the doubly-weighted SDID regression, so the estimator returns two numbers: the direct ATT \(\hat\tau\) (same form as SDID) and the per-exposure indirect coefficient \(\hat\tau_s\). The population ATE then follows from \(\widehat{ATE} = \hat\tau\,(1 + \overline{WD})\) (eq. 14). Relative to the older Spatial DiD of Delgado & Florax (2015), the synthetic weighting sharpens identification of the indirect effect while keeping SDID’s robustness for the direct effect.
Reach for SpSyDiD whenever there is a plausible mechanism for the treatment to leak from the directly-treated units to a subset of the donor pool through spatial or structural proximity, and you can supply a credible row-standardised weight matrix \(W\) encoding that proximity. Typical examples:
Immigration policy with cross-border relocation. Arizona’s 2007 LAWA legislation directly affected Arizona’s noncitizen Hispanic population but also displaced workers to neighbouring states. SDID alone would either bias the ATT (if you include the spillover-affected states as controls) or be unable to estimate the spillover at all.
State tax changes with cross-border shopping. A state sales-tax increase affects that state’s revenue directly and leaks via cross-border shopping into neighbouring states.
Local advertising campaigns with geographic spillovers across DMA boundaries.
Vaccine mandates with cross-state mobility effects.
Do not use SpSyDiD when#
SUTVA holds / there is no spillover concern. With \(W = 0\) or no treated neighbours, SpSyDiD reduces numerically to plain Synthetic Difference-in-Differences (SDID) with \(\hat\tau_s = 0\); the extra exposure column just adds noise. Use Synthetic Difference-in-Differences (SDID) – it is faster and more parsimonious.
You cannot defend a spatial weight matrix. The whole identification of \(\hat\tau_s\) runs through \(W\). If proximity is not the spillover channel (e.g., interference flows through an unobserved social or supply-chain network you cannot encode), a misspecified \(W\) buys biased indirect effects; consider Spillover-Aware Synthetic Control (SPILLSYNTH), which models spillover through donor membership rather than a fixed geographic kernel.
Interference is global or non-local. SpSyDiD assumes exposure is a local, distance-decaying function of neighbours’ treatment. General equilibrium effects that hit every unit equally are absorbed into the time effects and cannot be separated.
You only need the direct ATT and the donor pool is clean. If the spillover-affected units can simply be dropped from the donor pool and the indirect effect is not of interest, plain Synthetic Difference-in-Differences (SDID) on the pruned pool is the simpler honest choice.
The question is distributional or there is no spatial structure. For distributional questions (quantiles, tails), use Distributional Synthetic Control (DSC); for a single treated unit with no spatial structure, use Two-Step Synthetic Control or Forward Difference-in-Differences (FDID).
Mathematical Formulation#
Setup#
We observe \(N\) units indexed \(i = 1, \dots, N\) over \(T\) periods. Treatment begins at \(T_0 + 1\). Let \(Y_{it}\) be the outcome, \(D_{it} \in \{0, 1\}\) the direct treatment indicator, and \(W \in \mathbb{R}^{N \times N}_+\) a row-standardised spatial weight matrix with zero diagonal. The spillover exposure of unit \(i\) at time \(t\) is
\[(WD)_{it} = \sum_j w_{ij} D_{jt}.\]
The estimator auto-partitions the panel into
\(\mathcal{I}_{\mathrm{tr}}\) – directly treated units (\(D_{it} = 1\) for some \(t\)),
\(\mathcal{I}_{\mathrm{sp}}\) – indirectly treated units (\(D = 0\) always but \((WD) > 0\) for some \(t\)),
\(\mathcal{C}\) – pure controls (\(D = 0\) and \((WD) = 0\) for all \(t\)).
Only \(\mathcal{C}\) is used to fit the SDID unit / time weights.
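The three-way partition can be illustrated with plain NumPy. This is a minimal sketch on a hypothetical 5-unit panel with a hand-written \(W\); it mirrors the definitions above and is not the library's internal code.

```python
import numpy as np

# Hypothetical panel: 5 units, 4 periods; unit 0 is treated from period 2 onward.
N, T = 5, 4
D = np.zeros((N, T))
D[0, 2:] = 1.0

# Hand-written row-standardised W with zero diagonal (an assumption for
# illustration): units 0, 1, 2 neighbour each other; units 3 and 4 only
# neighbour each other.
W = np.array([
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
])

# Spillover exposure (WD)_{it} = sum_j w_ij * D_jt, shape (N, T).
exposure = W @ D

ever_treated = D.max(axis=1) > 0
ever_exposed = exposure.max(axis=1) > 0

direct = np.flatnonzero(ever_treated)                     # I_tr
spillover = np.flatnonzero(~ever_treated & ever_exposed)  # I_sp
pure = np.flatnonzero(~ever_treated & ~ever_exposed)      # C

print(direct, spillover, pure)
```

Units 1 and 2 have a treated neighbour and so land in the spillover set, while units 3 and 4 never see any exposure and remain pure controls.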
Algorithm#
Step 1 – SDID weights from pure controls. Following Arkhangelsky et al. (2021), fit the unit weights \(\widehat \omega \in \mathbb{R}^{|\mathcal{C}|}_+\) and time weights \(\widehat \lambda \in \mathbb{R}^{T_0}_+\) (each summing to 1) on \(\mathcal{C}\) only. The regularisation parameter is \(\zeta = T_{\mathrm{post}}^{1/4} \cdot \mathrm{sd}(\Delta Y)\) where \(\mathrm{sd}(\Delta Y)\) is the standard deviation of the first-differenced pre-period donor outcomes.
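The regularisation rule in Step 1 is a one-liner. The sketch below assumes the pre-period donor outcomes are stored as an array of shape (number of controls, \(T_0\)) and uses the sample standard deviation (`ddof=1`); both the orientation and the degrees-of-freedom choice are assumptions, and the function name `sdid_zeta` is hypothetical.

```python
import numpy as np

def sdid_zeta(donor_outcomes_pre, num_post_periods):
    """zeta = T_post^(1/4) * sd(first-differenced pre-period donor outcomes)."""
    # Assumed layout: rows are control units, columns are pre-periods,
    # so first differences are taken along the time axis.
    diffs = np.diff(donor_outcomes_pre, axis=1)
    # ddof=1 (sample standard deviation) is an assumption of this sketch.
    return num_post_periods ** 0.25 * diffs.std(ddof=1)

# Tiny illustrative panel: 2 controls, 3 pre-periods, 16 post-periods.
Y_pre = np.array([[0.0, 1.0, 3.0],
                  [2.0, 2.0, 5.0]])
zeta = sdid_zeta(Y_pre, num_post_periods=16)
print(zeta)
```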
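Step 2 – assemble the full weight vector. Set
\[\widehat \omega_i = \begin{cases} 1 / N_{\mathrm{tr}}, & i \in \mathcal{I}_{\mathrm{tr}}, \\ 1 / N_{\mathrm{sp}}, & i \in \mathcal{I}_{\mathrm{sp}}, \\ \widehat \omega_i^{\mathrm{SDID}}, & i \in \mathcal{C}, \end{cases}\]
where \(N_{\mathrm{tr}} = |\mathcal{I}_{\mathrm{tr}}|\), \(N_{\mathrm{sp}} = |\mathcal{I}_{\mathrm{sp}}|\), and \(\widehat \omega_i^{\mathrm{SDID}}\) is the Step 1 unit weight.
Time weights are SDID-fit for the pre-period and uniform \(1 / T_{\mathrm{post}}\) for the post-period.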
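Step 3 – augmented two-way FE WLS regression. Solve
\[(\widehat \tau, \widehat \tau_s, \widehat \mu, \widehat \alpha, \widehat \beta) = \arg \min \sum_{i, t} \bigl[ Y_{it} - \mu - \alpha_i - \beta_t - \tau D_{it} - \tau_s (WD)_{it} \bigr]^2 \widehat \omega_i\, \widehat \lambda_t.\]
The augmented design jointly recovers the direct effect \(\widehat \tau\) (the ATT) and the spillover coefficient \(\widehat \tau_s\).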
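Steps 2 and 3 can be sketched with a dummy-variable design and `numpy.linalg.lstsq`. The example below is an illustration under stated assumptions, not the library's implementation: the panel is noise-free, the control-unit and pre-period weights are hand-picked stand-ins for the SDID-fit values, and the intercept \(\mu\) is absorbed by the fixed-effect dummies.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, T0 = 6, 8, 5
tau_true, tau_s_true = 2.0, 1.0

# Unit 0 is treated after T0; units 1-2 neighbour it; units 3-5 are pure controls.
D = np.zeros((N, T))
D[0, T0:] = 1.0
W = np.zeros((N, N))
W[0, 1], W[0, 2] = 0.5, 0.5
W[1, 0], W[1, 2] = 0.5, 0.5
W[2, 0] = 1.0
W[3, 4], W[4, 5], W[5, 3] = 1.0, 1.0, 1.0
WD = W @ D

# Noise-free two-way fixed-effects outcomes with planted effects.
alpha = rng.standard_normal(N)
beta = np.linspace(0.0, 1.0, T)
Y = alpha[:, None] + beta[None, :] + tau_true * D + tau_s_true * WD

# Step 2: full weight vectors. Control and pre-period weights are
# hypothetical stand-ins for the SDID-fit values.
omega = np.array([1.0, 0.5, 0.5, 0.5, 0.3, 0.2])  # 1/N_tr, 1/N_sp, 1/N_sp, controls
lam = np.concatenate([[0.1, 0.1, 0.2, 0.3, 0.3], np.full(T - T0, 1.0 / (T - T0))])

# Step 3: weighted least squares on [D, WD, unit dummies, time dummies].
unit_idx, time_idx = np.divmod(np.arange(N * T), T)  # row-major (i, t) order
X = np.column_stack([D.ravel(), WD.ravel(), np.eye(N)[unit_idx], np.eye(T)[time_idx]])
sw = np.sqrt(np.outer(omega, lam).ravel())           # sqrt of omega_i * lambda_t
coef, *_ = np.linalg.lstsq(X * sw[:, None], Y.ravel() * sw, rcond=None)
tau_hat, tau_s_hat = coef[0], coef[1]
print(tau_hat, tau_s_hat)
```

The dummy design is rank-deficient by one (unit and time dummies both sum to the constant), so `lstsq` returns a minimum-norm solution; the coefficients on \(D\) and \(WD\) are unaffected by that normalisation.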
Step 4 – combine. The implied population ATE is \(\widehat{ATE} = \widehat \tau \cdot (1 + \overline{WD})\) where \(\overline{WD}\) is the average exposure across \(\mathcal{I}_{\mathrm{tr}} \cup \mathcal{I}_{\mathrm{sp}}\) in the post-period.
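As a worked example of Step 4, the arithmetic below uses a hypothetical \(\widehat \tau\) and hypothetical post-period exposures for one directly treated and two spillover units; the numbers are illustrative only.

```python
import numpy as np

tau_hat = 2.0  # hypothetical direct effect

# Post-period exposure (WD)_{it} for the units in I_tr and I_sp
# (rows are units, columns are post-periods; values are made up).
post_exposure = np.array([
    [0.0, 0.0],  # directly treated unit with no treated neighbours
    [0.5, 0.5],  # spillover unit, half of its neighbours treated
    [1.0, 1.0],  # spillover unit, all of its neighbours treated
])

mean_wd = post_exposure.mean()        # average exposure over units and periods
ate_hat = tau_hat * (1.0 + mean_wd)   # formula stated in Step 4
print(mean_wd, ate_hat)
```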
Identification assumptions#
A1. No anticipation – units do not adjust outcomes in advance of the treatment.
A2. Parallel trends – in the absence of treatment, treated, spillover, and control units would have followed similar trends, conditional on unit and time fixed effects.
A3. Additivity and linearity of spillovers – the potential outcome of a unit depends linearly and additively on its own treatment status and the treatment exposure of its neighbours, captured by \((WD)_{it}\).
A4. Limited interference – spillovers operate exclusively through the structure defined by the exogenous \(W\). No other local or global interference mechanisms are assumed.
A5. Synthetic-control transferability – the SDID synthetic control built on the pure controls also approximates the counterfactual trajectory for the indirectly-treated units. This holds when spillover-affected units are spatially / structurally similar to directly-treated units, which is typically the case in geographic spillover settings (neighbours of treated states tend to resemble treated states).
Connection to existing methods#
When \(W = 0\) (no spatial structure), the spillover column vanishes and SpSyDiD reduces to plain SDID with \(\widehat \tau_s = 0\).
When \(\widehat \omega_i = 1 / N_{\mathrm{co}}\) for all controls (uniform weights), SpSyDiD reduces to the Spatial Difference-in-Differences estimator of Delgado & Florax (2015).
When the panel is balanced and there is no spillover, SpSyDiD’s \(\widehat \tau\) matches SDID’s ATT even if \(W\) is non-trivial.
Core API#
Spatial Synthetic Difference-in-Differences (SpSyDiD) estimator.
Serenini, R., & Masek, F. (2024). “Spatial Synthetic Difference-in-Differences.” SSRN 4736857.
Extends Arkhangelsky-Athey-Hirshberg-Imbens-Wager (2021) SDID with a spatial spillover term so the estimator can disentangle two estimands that the standard SDID confounds when SUTVA is violated by geographic spillovers:
\(\widehat \tau\) – direct effect on the directly-treated units (the ATT, identical in form to standard SDID).
\(\widehat \tau_s\) – spillover coefficient per unit of neighbour-treatment exposure \((WD)_{it} = \sum_j w_{ij} D_{jt}\).
The user supplies a row-standardised \(N \times N\) spatial weight
matrix \(W\) (helpers in
mlsynth.utils.spsydid_helpers.spatial cover the standard
constructions: k-NN from coordinates, inverse distance, queen / rook
contiguity from adjacency). Donors are auto-partitioned into directly
treated, spillover-exposed, and pure controls based on
\(D\) and \(W\). The SDID unit / time weights are computed on
the pure controls; the final WLS regression jointly estimates
\(\tau\) and \(\tau_s\).
When \(W = 0\) (no spatial structure) or no donor has any treated neighbour, SpSyDiD numerically reduces to plain SDID with \(\widehat \tau_s = 0\).
- class mlsynth.estimators.spsydid.SpSyDiD(config: SpSyDiDConfig | dict)#
Bases: object
Spatial Synthetic Difference-in-Differences estimator.
- Parameters:
config (SpSyDiDConfig or dict) – Configuration object. See
mlsynth.config_models.SpSyDiDConfig.
- fit() SpSyDiDResults#
Run Algorithm 1 of Serenini & Masek (2024).
Configuration#
- class mlsynth.config_models.SpSyDiDConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', spatial_matrix: ~typing.Any, unit_order: ~typing.List[~typing.Any] | None = None, row_standardize_spatial: bool = True)#
Configuration for the Spatial Synthetic Difference-in-Differences estimator.
Serenini & Masek (2024). “Spatial Synthetic Difference-in-Differences,” SSRN 4736857. Extends SDID (Arkhangelsky et al. 2021) with a spatial spillover term so the estimator separates the direct ATT from the indirect (spillover) effect on units exposed via the spatial weight matrix \(W\).
- Parameters:
spatial_matrix (np.ndarray) – Square \(N \times N\) spatial weight matrix. Rows / columns must align with unit_order (or sorted(df[unitid].unique()) if unit_order is None). Use the helpers in mlsynth.utils.spsydid_helpers.spatial to build W from coordinates (k-NN, inverse distance) or from an adjacency list (queen / rook contiguity).
unit_order (list, optional) – Canonical ordering of unit ids matching the rows / columns of spatial_matrix. If None (default), units are ordered by sorted(df[unitid].unique()).
row_standardize_spatial (bool) – Row-standardise W internally before computing exposure. Default True. Skip when the caller has already standardised.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
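Because the rows and columns of spatial_matrix must line up with the unit ordering, a matrix built in some other order has to be either paired with a matching unit_order or permuted into sorted order. The sketch below shows the permutation with plain NumPy; the state codes and matrix are hypothetical.

```python
import numpy as np

# W as it was constructed, in a hypothetical "construction order".
built_order = ["TX", "AZ", "NM"]
W_built = np.array([
    [0.0, 0.0, 1.0],  # TX neighbours NM
    [0.0, 0.0, 1.0],  # AZ neighbours NM
    [0.5, 0.5, 0.0],  # NM neighbours TX and AZ
])

# Option 1: keep W_built and pass unit_order=built_order in the config.
# Option 2: permute rows and columns into sorted order, the default
# ordering used when unit_order is None.
sorted_order = sorted(built_order)                    # ['AZ', 'NM', 'TX']
perm = [built_order.index(u) for u in sorted_order]   # positions in W_built
W_sorted = W_built[np.ix_(perm, perm)]
print(W_sorted)
```

Permuting only the rows (or only the columns) would silently assign neighbours to the wrong units, so both axes are reindexed together.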
Helper Modules#
Spatial weight matrix utilities for SpSyDiD.
The estimator accepts a row-standardised \(N \times N\) spatial weight
matrix \(W\) directly. These helpers cover the common ways one builds
\(W\) in practice, so users can either plug in their own matrix or
construct one from coordinates / adjacency information without an external
dependency on libpysal.
- mlsynth.utils.spsydid_helpers.spatial.contiguity_weights(adjacency: Dict[int, Iterable[int]], unit_order: Sequence, row_standardized: bool = True) ndarray#
Build a contiguity (queen / rook) spatial weight matrix.
- Parameters:
adjacency (dict) – {unit_id: iterable of neighbour unit_ids}.
unit_order (sequence) – Length-N canonical ordering of unit ids matching the panel.
row_standardized (bool) – Divide each row by its row sum (i.e., uniform weight 1/k_i across neighbours).
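To make the construction concrete, here is a standalone sketch of a contiguity matrix built from an adjacency dictionary. The function name `contiguity_sketch` is hypothetical and the code is an approximation of the behaviour described above, not the helper's actual source.

```python
import numpy as np

def contiguity_sketch(adjacency, unit_order, row_standardized=True):
    """Binary contiguity matrix from {unit: neighbours}, optionally row-standardised."""
    pos = {u: k for k, u in enumerate(unit_order)}
    W = np.zeros((len(unit_order), len(unit_order)))
    for u, neighbours in adjacency.items():
        for v in neighbours:
            if u != v:                      # keep the diagonal at zero
                W[pos[u], pos[v]] = 1.0
    if row_standardized:
        sums = W.sum(axis=1, keepdims=True)
        # Rows without neighbours stay at zero instead of dividing by zero.
        W = np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)
    return W

# Hypothetical chain 1 - 2 - 3 plus an isolated unit 4.
adjacency = {1: [2], 2: [1, 3], 3: [2], 4: []}
W = contiguity_sketch(adjacency, unit_order=[1, 2, 3, 4])
print(W)
```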
- mlsynth.utils.spsydid_helpers.spatial.inverse_distance_weights(coords: ndarray, cutoff: float | None = None, power: float = 1.0, row_standardized: bool = True) ndarray#
Build an inverse-distance spatial weight matrix.
\(w_{ij} = 1 / d(i, j)^{\text{power}}\) for i != j, zero elsewhere. Entries beyond cutoff (Euclidean distance) are set to zero.
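The inverse-distance rule can be sketched in a few lines of NumPy. The function name `inverse_distance_sketch` is hypothetical, and the treatment of coincident points (weight zero) is an assumption of this sketch rather than documented behaviour.

```python
import numpy as np

def inverse_distance_sketch(coords, cutoff=None, power=1.0):
    """Row-standardised w_ij = 1 / d(i, j)**power, zero beyond `cutoff`."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # pairwise Euclidean distances
    mask = dist > 0                                  # drops the diagonal
    if cutoff is not None:
        mask &= (dist <= cutoff)                     # zero out pairs beyond the cutoff
    W = np.zeros_like(dist)
    W[mask] = 1.0 / dist[mask] ** power
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)

# Three hypothetical units on a line at x = 0, 1, 3.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
W = inverse_distance_sketch(coords)
W_cut = inverse_distance_sketch(coords, cutoff=2.0)
print(W)
print(W_cut)
```

With the cutoff at 2, the pair at distance 3 drops out, so the first and last units put all their weight on the middle one.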
- mlsynth.utils.spsydid_helpers.spatial.knn_weights(coords: ndarray, k: int, row_standardized: bool = True) ndarray#
Build a \(k\)-nearest-neighbour spatial weight matrix from coords.
- Parameters:
coords (np.ndarray) – Shape (N, d) of unit coordinates in some metric space (e.g. (lat, lon) or projected (x, y)). Euclidean distance is used; project to a metric CRS for geographic data.
k (int) – Number of neighbours per unit (excluding self).
row_standardized (bool) – If True (default), divide each row by k so weights sum to 1.
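A \(k\)-nearest-neighbour matrix is easy to sketch, and the sketch also shows that such a \(W\) is generally asymmetric. The function name `knn_sketch` is hypothetical, and breaking distance ties by index order is an assumption; the helper's own tie-breaking is not documented here.

```python
import numpy as np

def knn_sketch(coords, k):
    """Row-standardised k-NN weights: 1/k on each unit's k nearest neighbours."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                  # a unit is never its own neighbour
    # Stable sort, so ties are broken by index order (an assumption).
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    W = np.zeros_like(dist)
    np.put_along_axis(W, nearest, 1.0 / k, axis=1)
    return W

# Four hypothetical units on a line; the last one is far from the rest.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [6.0, 0.0]])
W = knn_sketch(coords, k=2)
print(W)
```

The remote unit still gets two neighbours, but no other unit lists it as a neighbour, so treating it would expose nobody.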
- mlsynth.utils.spsydid_helpers.spatial.row_standardize(W: ndarray, warn_isolated: bool = False) ndarray#
Divide each row of W by its row sum.
Rows with zero sum (units with no neighbours) are left as zero. The paper’s algorithm assumes row-standardised \(W\) so the spillover term \((WD)_{it} = \sum_j w_{ij} D_{jt}\) lies in \([0, 1]\).
- Parameters:
W (np.ndarray) – Non-negative weight matrix.
warn_isolated (bool) – If True, emit a RuntimeWarning when one or more rows sum to zero (units with no spatial neighbours). Such units can never be classified as spillover-exposed and contribute a constant-zero exposure column, which is easy to miss. Off by default to keep the low-level helper quiet; enabled at the mlsynth.utils.spsydid_helpers.setup.prepare_spsydid_inputs() boundary.
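The behaviour described above, including the zero-row case, can be approximated as follows. The function name `row_standardize_sketch` and the warning text are hypothetical.

```python
import warnings
import numpy as np

def row_standardize_sketch(W, warn_isolated=False):
    """Divide each row by its sum; rows that sum to zero are left as zero."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    if warn_isolated and np.any(sums == 0):
        warnings.warn("some units have no spatial neighbours", RuntimeWarning)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)

# Hypothetical raw weights; the third unit is isolated.
W_raw = np.array([[0.0, 2.0, 2.0],
                  [1.0, 0.0, 3.0],
                  [0.0, 0.0, 0.0]])
W_std = row_standardize_sketch(W_raw)

# With unit 0 treated, exposure is a weighted share and stays in [0, 1].
d = np.array([1.0, 0.0, 0.0])
exposure = W_std @ d
print(W_std)
print(exposure)
```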
- mlsynth.utils.spsydid_helpers.spatial.validate_spatial_matrix(W: ndarray, n_units: int) ndarray#
Sanity-check W and return a float-array copy.
- Checks:
shape is (n_units, n_units);
entries are finite and non-negative;
diagonal is zero (a unit is not its own spatial neighbour).
The matrix is not automatically row-standardised here – pass it through row_standardize() first if needed.
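The three checks listed above translate directly into NumPy. In this sketch the function name `validate_sketch`, the use of `ValueError`, and the messages are all assumptions; the real helper may raise a different exception type.

```python
import numpy as np

def validate_sketch(W, n_units):
    """Check shape, finiteness, non-negativity and zero diagonal; return a float copy."""
    W = np.array(W, dtype=float)  # np.array copies, so the caller's input is untouched
    if W.shape != (n_units, n_units):
        raise ValueError(f"expected shape {(n_units, n_units)}, got {W.shape}")
    if not np.all(np.isfinite(W)) or np.any(W < 0):
        raise ValueError("entries must be finite and non-negative")
    if np.any(np.diag(W) != 0):
        raise ValueError("diagonal must be zero")
    return W

W_ok = validate_sketch([[0, 1], [1, 0]], n_units=2)
print(W_ok.dtype)

try:
    validate_sketch([[1, 0], [0, 0]], n_units=2)   # non-zero diagonal
except ValueError as err:
    print("rejected:", err)
```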
Micro-panel data preparation for SpSyDiD.
Converts a long-format panel + a spatial weight matrix into the
SpSyDiDInputs container expected by run_spsydid(). The
donor pool is auto-partitioned into three classes following Serenini
& Masek (2024):
Directly treated – units with \(D_{it} = 1\) for some t.
Indirectly treated (spillover-exposed) – units with \(D_{it} = 0\) for all t but \((WD)_{it} > 0\) for some t, i.e. they have at least one spatial neighbour who is treated.
Pure controls – units with \(D = 0\) and \((WD) = 0\) for all t. Only these are used to fit the SDID unit / time weights.
- mlsynth.utils.spsydid_helpers.setup.prepare_spsydid_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, spatial_matrix: ndarray, unit_order: Sequence | None = None, row_standardize_spatial: bool = True) SpSyDiDInputs#
Pivot a long-format panel into SpSyDiDInputs.
- Parameters:
df (pd.DataFrame) – Balanced long panel with columns unitid, time, outcome, treat.
outcome, treat, unitid, time (str) – Column names.
spatial_matrix (np.ndarray) – Square (N, N) spatial weight matrix. Rows / columns must be ordered consistently with the units in the panel (use unit_order to fix the ordering; otherwise sorted unique unitid values are used).
unit_order (sequence, optional) – Canonical ordering of unit ids that matches the rows / columns of spatial_matrix. If None (default), units are ordered by sorted(df[unitid].unique()).
row_standardize_spatial (bool) – If True (default), row-standardise W before storing it. Skip when the caller has already standardised.
SDID weight-computation primitives duplicated for SpSyDiD.
These functions are intentionally duplicated from
mlsynth/utils/sdid_helpers/weights.py rather than imported. The
duplication isolates SpSyDiD from future changes to the SDID pipeline
so silent behavioural drift cannot occur. If the upstream SDID
formulas change, this module should be updated deliberately.
Wraps the Arkhangelsky-Athey-Hirshberg-Imbens-Wager (2021) unit-weight QP, time-weight QP, and the \(\zeta = T_{\text{post}}^{1/4} \cdot \mathrm{std}(\Delta Y)\) regularisation rule.
- mlsynth.utils.spsydid_helpers.weights.compute_regularization(donor_outcomes_pre: ndarray, num_post_periods: int) float#
SDID \(\zeta = T_{\text{post}}^{1/4} \cdot \mathrm{sd}(\Delta Y)\).
The standard deviation is of the first-differenced pre-period donor outcomes (Arkhangelsky et al. 2021 Section 3).
- mlsynth.utils.spsydid_helpers.weights.fit_time_weights(donor_outcomes_pre: ndarray, mean_donor_outcomes_post: ndarray) Tuple[float | None, ndarray | None]#
SDID time-weight QP.
Solve for (beta_0, lambda) minimising \(\| \beta_0 \mathbf 1 + \Lambda^\top \mathrm{Y}_{0,\mathrm{pre}} - \bar y_{0,\mathrm{post}} \|_2^2\) subject to sum(lambda) == 1 and lambda >= 0.
- mlsynth.utils.spsydid_helpers.weights.fit_unit_weights(donor_outcomes_pre: ndarray, mean_treated_outcome_pre: ndarray, zeta: float) Tuple[float | None, ndarray | None]#
SDID unit-weight QP.
Solve for (omega_0, omega) minimising \(\| \omega_0 \mathbf 1 + \mathrm Y_{0,\mathrm{pre}} \omega - \bar y_{1,\mathrm{pre}} \|_2^2 + T_0 \zeta^2 \|\omega\|_2^2\) subject to sum(omega) == 1 and omega >= 0.
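For intuition, the unit-weight problem can be solved approximately with a Frank-Wolfe iteration on the simplex. This is a sketch that swaps in Frank-Wolfe for whatever QP solver the library uses; the function name `unit_weights_sketch`, the `(T0, N_co)` array orientation, and the iteration count are assumptions.

```python
import numpy as np

def unit_weights_sketch(Y_pre, y_treated_pre, zeta, n_iter=2000):
    """Approximate (omega_0, omega) for the ridge-penalised simplex problem.

    Y_pre: (T0, N_co) control outcomes; y_treated_pre: (T0,) treated average.
    """
    T0, n_co = Y_pre.shape
    # Demeaning over time profiles out the intercept omega_0.
    A = Y_pre - Y_pre.mean(axis=0)
    b = y_treated_pre - y_treated_pre.mean()
    eta = T0 * zeta ** 2
    omega = np.full(n_co, 1.0 / n_co)               # start from uniform weights
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ omega - b) + 2.0 * eta * omega
        d = -omega.copy()
        d[np.argmin(grad)] += 1.0                   # move towards the best vertex
        curvature = np.sum((A @ d) ** 2) + eta * np.sum(d ** 2)
        if curvature <= 1e-15:
            break
        # Exact line search for a quadratic objective, clipped to the segment.
        step = np.clip(-(grad @ d) / (2.0 * curvature), 0.0, 1.0)
        omega = omega + step * d
    omega_0 = float(np.mean(y_treated_pre - Y_pre @ omega))
    return omega_0, omega

# Synthetic illustration: the treated path is a shifted mix of controls 0 and 1.
rng = np.random.default_rng(0)
Y_pre = rng.standard_normal((10, 3)).cumsum(axis=0)
y_tr = 0.5 * Y_pre[:, 0] + 0.5 * Y_pre[:, 1] + 1.0
omega_0, omega = unit_weights_sketch(Y_pre, y_tr, zeta=0.1)
print(omega_0, omega)
```

Each update is a convex combination of the current weights and a simplex vertex, so the constraints sum(omega) == 1 and omega >= 0 hold at every iteration without an explicit projection.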
Orchestration pipeline for Spatial Synthetic Difference-in-Differences.
Implements Algorithm 1 of Serenini & Masek (2024):
Compute SDID unit / time weights using only the pure controls as donors (Arkhangelsky et al. 2021 QPs duplicated in weights).
Fix the per-unit weight as \(\omega_i = 1 / N_{\mathrm{tr}}\) for directly-treated units, \(\omega_i = 1 / N_{\mathrm{sp}}\) for indirectly-treated units, and SDID-fit \(\omega_i\) for pure controls.
Run the weighted two-way FE regression
\[(\widehat \tau, \widehat \tau_s, \widehat \mu, \widehat \alpha, \widehat \beta) = \arg \min \sum_{i, t} \bigl[ Y_{it} - \mu - \alpha_i - \beta_t - \tau D_{it} - \tau_s (WD)_{it} \bigr]^2 \widehat \omega_i\, \widehat \lambda_t.\]
The augmented design recovers the direct effect \(\widehat \tau\) and the spillover coefficient \(\widehat \tau_s\) jointly.
The implied population ATE is \(\widehat \tau (1 + \overline{WD})\) with \(\overline{WD}\) the average exposure across directly + indirectly treated units (paper eq. 14).
- mlsynth.utils.spsydid_helpers.pipeline.run_spsydid(inputs: SpSyDiDInputs) SpSyDiDResults#
Run Algorithm 1 of Serenini & Masek (2024).
Frozen dataclasses for the Spatial Synthetic Difference-in-Differences estimator.
Serenini & Masek (2024). “Spatial Synthetic Difference-in-Differences.” SSRN 4736857. Extends Arkhangelsky-Athey-Hirshberg-Imbens-Wager (2021) SDID with a spatial spillover term \(\tau_s\) so the estimator can disentangle the direct ATT on the directly-treated units from the indirect (spillover) effect on units exposed via a spatial weight matrix \(W\).
Two estimands fall out of one regression:
\(\widehat \tau\) – direct effect on the directly-treated units (the ATT, identical in form to standard SDID).
\(\widehat \tau_s\) – spillover effect per unit of neighbour-treatment exposure \((WD)_{it} = \sum_j w_{ij} D_{jt}\).
- class mlsynth.utils.spsydid_helpers.structures.SpSyDiDInputs(outcome_matrix: ndarray, treatment_matrix: ndarray, spatial_matrix: ndarray, exposure_matrix: ndarray, unit_names: List[Any], time_labels: ndarray, T: int, T0: int, direct_indices: ndarray, spillover_indices: ndarray, pure_control_indices: ndarray)#
Preprocessed panel + spatial weights for SpSyDiD.
- outcome_matrix#
(N, T) panel of outcomes, ordered by unit_names.
- Type:
np.ndarray
- treatment_matrix#
(N, T) panel of 0/1 treatment indicators.
- Type:
np.ndarray
- spatial_matrix#
Row-standardised spatial weight matrix, shape (N, N), ordered consistently with unit_names.
- Type:
np.ndarray
- exposure_matrix#
Pre-computed spillover exposure \((WD)_{it} = \sum_j w_{ij} D_{jt}\), shape (N, T).
- Type:
np.ndarray
- time_labels#
Length-T ordering of time-period labels.
- Type:
np.ndarray
- direct_indices#
Indices of directly treated units (those with D=1 at some t).
- Type:
np.ndarray
- spillover_indices#
Indices of indirectly treated units (D=0 always but (WD)_it > 0 at some t).
- Type:
np.ndarray
- pure_control_indices#
Indices of pure controls (D=0 and (WD)=0 for all t).
- Type:
np.ndarray
- direct_indices: ndarray#
- exposure_matrix: ndarray#
- outcome_matrix: ndarray#
- pure_control_indices: ndarray#
- spatial_matrix: ndarray#
- spillover_indices: ndarray#
- time_labels: ndarray#
- treatment_matrix: ndarray#
- class mlsynth.utils.spsydid_helpers.structures.SpSyDiDResults(inputs: ~mlsynth.utils.spsydid_helpers.structures.SpSyDiDInputs, att: float, aite: float, ate: float, unit_weights: ~typing.Dict[~typing.Any, float], time_weights: ~numpy.ndarray, zeta: float, weights: ~typing.Any | None = None, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#
Top-level container returned by mlsynth.SpSyDiD.fit().
- inputs#
Preprocessed panel + W matrix + auto-detected partition.
- Type:
SpSyDiDInputs
- att#
Direct treatment effect on the treated \(\widehat \tau\) (identical in form to standard SDID).
- Type:
float
- aite#
Average indirect treatment effect per unit of exposure \(\widehat \tau_s\). Multiply by the average exposure to recover the population-level spillover.
- Type:
float
- ate#
Implied population-level ATE \(\widehat \tau \cdot (1 + \overline{WD})\) per the paper’s eq. 14, with \(\overline{WD}\) the average exposure across the directly + indirectly treated units.
- Type:
float
- unit_weights#
Mapping {unit_name: omega} – the per-unit weights used in the final WLS regression (SDID-style for pure controls, uniform \(1/N_{tr}\) for directly treated and \(1/N_{sp}\) for indirectly treated).
- Type:
dict
- time_weights#
Length-T0 SDID time weights for the pre-period (post-period weights are uniform \(1/T_{\text{post}}\) and not stored).
- Type:
np.ndarray
- zeta#
SDID regularisation parameter from Arkhangelsky et al. 2021 (used in the unit-weight QP for pure controls).
- Type:
float
- inputs: SpSyDiDInputs#
- time_weights: ndarray#
Example#
A self-contained one-draw Monte Carlo on an \(8 \times 8\) spatial grid. Six well-spaced units receive treatment of magnitude \(\tau = 2.0\); their \(k = 4\) neighbours absorb a spillover of \(\tau_s = 1.0\) per unit of exposure. SpSyDiD, given the same \(W\), recovers both parameters.
"""One draw of a spatial spillover simulation."""
import numpy as np
import pandas as pd
from mlsynth import SpSyDiD
from mlsynth.utils.spsydid_helpers.spatial import knn_weights
# ---------------------------------------------------------------------
# 1. Lay out an 8x8 grid of units
# ---------------------------------------------------------------------
rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(8), np.arange(8))
coords = np.column_stack([xs.flatten(), ys.flatten()])
N = coords.shape[0]
T_pre, T_post = 16, 8
T = T_pre + T_post
W = knn_weights(coords, k=4, row_standardized=True)
# ---------------------------------------------------------------------
# 2. Two-way FE DGP with planted direct + spillover effects
# ---------------------------------------------------------------------
tau_true = 2.0
tau_s_true = 1.0
unit_fe = rng.standard_normal(N) * 0.5
time_fe = np.linspace(0.0, 1.0, T)
Y0 = (
    unit_fe[:, None]
    + time_fe[None, :]
    + rng.standard_normal((N, T)) * 0.2
)
D = np.zeros((N, T), dtype=float)
for u in (0, 7, 24, 39, 56, 63):
    D[u, T_pre:] = 1.0
Y = Y0 + tau_true * D + tau_s_true * (W @ D)
# ---------------------------------------------------------------------
# 3. Long DataFrame
# ---------------------------------------------------------------------
rows = [
    {"unit": i, "time": t, "y": float(Y[i, t]), "D": float(D[i, t])}
    for i in range(N)
    for t in range(T)
]
df = pd.DataFrame(rows)
# ---------------------------------------------------------------------
# 4. Fit SpSyDiD
# ---------------------------------------------------------------------
res = SpSyDiD({
    "df": df,
    "outcome": "y",
    "treat": "D",
    "unitid": "unit",
    "time": "time",
    "spatial_matrix": W,
}).fit()
# ---------------------------------------------------------------------
# 5. Inspect the output
# ---------------------------------------------------------------------
print(f"true tau = {tau_true:+.3f} tau_hat = {res.att:+.3f}")
print(f"true tau_s = {tau_s_true:+.3f} tau_s_hat = {res.aite:+.3f}")
print(f"ATE = {res.ate:+.3f}")
print(f"partition : {res.inputs.N_direct} direct, "
      f"{res.inputs.N_spillover} spillover, "
      f"{res.inputs.N_pure} pure controls")
print(f"mean post-period exposure on treated union = "
      f"{res.metadata['mean_exposure_post_treated']:.3f}")
Verification (Path-B Monte Carlo)#
Serenini & Masek (2024) include an empirical example (the Arizona
2007 LAWA effect on noncitizen Hispanic share, Tables 8-11) but do
not release the CPS panel used to construct it – their public
replication repo
(renanserenini/spatial_SDID) ships only the
simulation code and a BLS unemployment panel for two Monte Carlo
exercises. We therefore satisfy the Path-B contract by reproducing
those two simulation findings against the authors’ own driver
(functions_ssdid.py in their repo), invoking
SpSyDiD(config).fit() end-to-end on every replication.
The reference panels and adjacency matrices ship with mlsynth in
basedata/:
state_unemployment.csv – BLS monthly state unemployment 1976-2014.
US_no_islands_matrix.gal – queen-contiguity W for the 49 contiguous states.
spsydid_bls_county_subset.csv – the BLS county-employment slice (2002-2004, states WY/OR/PA/AL) used in the county-level MC.
spsydid_county_matrices.pkl – per-state county adjacency matrices.
State-level Monte Carlo (40 rolling-window replications)#
Reproduces State_Level_Simulations.ipynb: at each 3-year window
starting in 1975..2014, treat Arkansas (FIPS 5) only and inject
\(\text{ATT} = 25\%\) of mean unemployment plus
\(\rho = 0.8\) spillover via the queen-contiguity W. We compare
the authors’ reference algorithm against SpSyDiD(config).fit() on
the same 40 panels to test for per-rep agreement.
                        ref-mean   ref-sd   mlsynth-mean   mlsynth-sd
ATT bias                 +0.0187   0.3204        +0.0189       0.3229
rho bias (tau_s/ATT)     +0.0596   0.9228        +0.0669       0.9965
per-rep correlation:   ATT 0.9917   rho 0.9948
Both estimators recover the paper’s headline finding: the mean ATT bias is essentially zero (~0.019 against an ATT magnitude of ~1.5 percentage points). Per-replication, the two implementations agree to ~0.02 on every panel realisation; the small residual stems from the different unit-weight assignment for affected rows (mlsynth: \(1/N_{sp}\); reference: mean of treated-unit SDID weights). Both choices are valid downstream of the SDID weight QPs.
The driver is examples/spsydid/replicate_state_level_mc.py;
run with python -m examples.spsydid.replicate_state_level_mc
--reps 40.
County-level Monte Carlo (4 states x 200 reps)#
Reproduces Monte_Carlo_Simulations.ipynb: for each of WY, OR, PA,
AL, randomly draw 10% of counties as directly treated (multiple
treated units per rep), inject \(\text{ATT} = -25\%\) of mean
unemployment plus \(\rho = 0.5\) spillover, fit
SpSyDiD(config).fit(), repeat. The four states span 23-67
counties, so the test is whether the SUTVA correction works across
panel sizes.
state   #counties   #treated   ATT bias mean   ATT bias sd   AITE bias mean   AITE bias sd
WY             23          2          -0.003         0.260           -0.018          0.143
OR             36          4          -0.023         0.225           -0.022          0.183
PA             67          7          +0.034         0.246           +0.062          0.127
AL             67          7          +0.028         0.228           -0.000          0.139
In every cell the absolute mean ATT bias is below 0.04 against an ATT magnitude of ~-1.5 – the spatial-DGP-induced bias of plain SDID is cleanly removed by the SpSyDiD correction at the county scale.
The driver is examples/spsydid/replicate_county_level_mc.py;
run with python -m examples.spsydid.replicate_county_level_mc
--reps 200.
References#
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). “Synthetic Difference-in-Differences.” American Economic Review 111(12):4088-4118.
Delgado, M. S., & Florax, R. J. G. M. (2015). “Difference-in-Differences Techniques for Spatial Data: Local Autocorrelation and Spatial Interaction.” Economics Letters 137:123-126.
Serenini, R., & Masek, F. (2024). “Spatial Synthetic Difference-in-Differences.” SSRN Working Paper 4736857.