Dynamic Synthetic Control for Auto-Regressive processes (DSCAR)#
Overview#
DSCAR is mlsynth’s implementation of the Dynamic Synthetic Control method of Zheng & Chen (2024). DSCAR extends classical Abadie-Diamond-Hainmueller (2010) synthetic control to settings with:
time-varying confounders \(X_{it}\) (e.g., meteorological variables in an air-pollution panel),
an explicit auto-regressive outcome model \(Y_{it}(0) = \delta_t + \beta_t^\prime X_{it} + \rho_t Y_{i, t-1} + \varepsilon_{it}\),
spatial dependence in the residuals \(\varepsilon_{it}\), and
multiple treated units sharing a common intervention time.
The matching weights are time-varying – computed afresh at every post-period via empirical-likelihood maximisation under per-period matching constraints (equations 2.7-2.9 of the paper). This is different from the L2 simplex weight typical of mlsynth’s other estimators; the EL formulation lets DSCAR attain exact covariate matches as \(N_{co}, N_{tr} \to \infty\) with \(T\) fixed.
The acronym DSCAR is used in mlsynth to distinguish this estimator
from the Distributional Synthetic Control of Gunsilius (2023),
which ships under mlsynth.DSC.
When to Use This Method#
The motivating problem in Zheng & Chen (2024) is an air-pollution alert: a city authority orders mandatory emission cuts when bad air is forecast, and you want to know whether the alert actually lowered pollution at the affected monitoring stations. Three features of that data break the classical synthetic-control toolkit, and they recur far beyond air quality – in marketing panels from wearables and loyalty apps, in clinical monitoring, in any micro-level spatio-temporal study:
Time-varying confounders. The thing that drives the outcome (meteorology for pollution, weather/promotions for sales, vitals for health) moves every period. Classical SC matches on time-invariant covariates plus the whole pre-treatment outcome trajectory; it has no natural way to match a confounder that is a different value at every \(t\).
Autoregressive outcomes. Hourly pollutant concentrations, daily prices, and weekly sales carry the previous period forward (\(Y_{it}(0) = \delta_t + \beta_t' X_{it} + \rho_t Y_{i,t-1}(0) + \varepsilon_{it}\)). The lagged outcome is itself a confounder that has to be matched.
Many units, short panel, spatial dependence. Micro-level studies have lots of units (dozens to hundreds of stations) observed over a short window, and neighbouring units’ shocks are correlated. Classical SC’s consistency needs the pre-period \(T_0 \to \infty\); here \(T_0\) is small and what grows is the number of units.
DSCAR (Zheng & Chen’s Dynamic Synthetic Control) is built for exactly this regime. Instead of one fixed weight vector matched on the full \((p + T_0)\)-dimensional pre-trajectory, it constructs a fresh weight vector at every post-period that matches only the current confounder state and one lagged outcome – a \((p+1)\)-dimensional match. Two consequences follow, and they are the reasons to reach for it:
Matching becomes feasible. A low-dimensional per-period match is far easier to satisfy exactly than SC’s full-trajectory match; the paper proves the exact match is attained with probability approaching one. The weights are pinned down by empirical-likelihood maximisation (\(\max \prod_i w_{it}\) under the matching constraints), which guarantees a unique solution and lets the authors characterise the weights’ asymptotic order – the lever behind the consistency proof.
The asymptotics run in \(N\), not \(T_0\). \(\hat\tau_t\) is consistent as \(N_{tr}, N_{co} \to \infty\) with \(T\) fixed, in marked contrast to Abadie et al. (2010), which needs \(T_0 \to \infty\). This is what makes DSCAR a micro-data estimator: it naturally handles multiple treated units sharing a common intervention time and spatially dependent outcomes and errors.
DSCAR also turns the unconfoundedness assumption from an article of faith into something testable. Because \(Y_{it} = Y_{it}(0)\) for every unit before treatment, you can run the estimator on the pre-period and check whether the pseudo-effects are zero (Section 3.1; with an FDR correction across periods). A non-zero pre-period effect flags a misspecified model – a cue to change covariates or add higher-order lags. For post-period significance, the paper supplies a normalised placebo test that rescales each control’s placebo path, addressing the asymmetry that ordinary placebo tests ignore when treated and control units have different error variances.
Dynamic per-period matching vs. fixed-trajectory SC#
Both methods build a counterfactual from a weighted average of controls; the difference is what gets matched and when.
| | Classical SC (ADH 2010) | DSCAR (Zheng & Chen 2024) |
|---|---|---|
| Weights | one vector, constant in time | re-solved every post-period |
| Match target | time-invariant covariates + full \(T_0\) outcome path | current confounder state + one lagged outcome |
| Confounders | time-invariant | time-varying |
| Outcome model | factor model \(\lambda_t' \xi_i\) | autoregressive \(\rho_t Y_{i,t-1}\) |
| Treated units | typically one | many, common timing |
| Cross-unit dependence | independent donors | spatial dependence allowed |
| Consistency regime | \(T_0 \to \infty\) | \(N_{tr}, N_{co} \to \infty\), \(T\) fixed |
| Weight solver | quadratic program (may be non-unique) | empirical likelihood (unique) |
Reach for DSCAR when#
The outcome is strongly autocorrelated – hourly pollutant concentrations, daily prices, weekly sales – so an AR(1) (or AR(\(k\))) structure is a defensible model of the untreated path.
You have time-varying covariates you want to match period-by-period, not a static covariate snapshot.
You are in the high-\(N\), moderate-\(T\) micro-panel regime (e.g., 50-100 monitoring stations or stores × 50-100 periods), possibly with many treated units that all switch at a common time.
Outcomes and shocks are spatially dependent across units (nearby stations, neighbouring stores) – DSCAR’s theory accommodates this where classic placebo inference does not.
You want to test unconfoundedness / model specification on the pre-period rather than assume it, and you want a placebo test that is normalised to handle treated/control variance asymmetry.
Do not use DSCAR when#
You have a single treated aggregate unit, a long pre-period, and no time-varying confounders – the canonical Prop 99 / Basque setting. Classic SC and its mlsynth refinements (Two-Step Synthetic Control, Synthetic Control with Multiple Outcomes (SCMO), Forward Difference-in-Differences (FDID)) are faster, simpler, and give an interpretable convex-weight story. DSCAR’s per-period EL refinement is overkill when the covariate trajectory does not matter.
The outcome is not autoregressive and has no time-varying confounders. The AR matching constraint (2.9) buys you nothing; use a factor-model estimator (Factor Model Approach (FMA)) or classic SC.
The panel is small in \(N\) (a handful of units). DSCAR’s consistency runs in \(N\); with few units the asymptotics do not engage and the per-period match is noisy. Prefer Two-Step Synthetic Control or Forward Difference-in-Differences (FDID), whose theory tolerates a long-\(T_0\), small-\(N\) panel.
The treatment moves the confounders (Assumption 3 is violated – e.g., the alert itself changes the meteorology you match on). DSCAR is then biased; you need a method that models the confounder response, or a proximal/IV design (Proximal Inference Synthetic Control (PROXIMAL), Synthetic IV).
You need distributional effects (quantiles, Lorenz, tails) rather than the mean ATT – use Distributional Synthetic Control (DSC), which ships separately as mlsynth.DSC.
There is interference between treated and control units beyond the modelled spatial error dependence (treatment spillovers onto the donor pool). Use Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD).
Method#
The estimator works in three steps. Given a panel of \(N\) units over \(T\) periods, \(N_{tr}\) of which are directly treated starting at \(t = T_0 + 1\):
Variable-importance matrix \(V_t\). For each \(t\), fit an OLS of \(Y_t\) on \((Y_{t-1}, X_t)\) across the cross-section (full panel for \(t \leq T_0\), donors only for \(t > T_0\)), and set V_t = diag(|coefficients|). This is the per-period analogue of the SCM V matrix.
Per-period EL weights \(w_t^*\). Solve the convex QP
\[\min_w (Z_{1t} - Z_{0t} w)^\prime V_t (Z_{1t} - Z_{0t} w) \qquad \text{s.t.} \quad \sum_i w_i = 1,\ 0 \leq w_i \leq 1,\]
where \(Z_{1t}\) stacks the treated-mean covariate target at \(t\) and the lagged-outcome target, and \(Z_{0t}\) holds the donor-side analogues. When the QP residual is small enough (default \(\leq 0.01\) mean absolute), refine by maximising \(\prod_i w_i\) subject to the same matching constraints – this is the empirical-likelihood step that gives DSCAR its asymptotic-theory guarantees (Theorem 1 of the paper).
Dynamic matching of the lag. For \(t > T_0 + 1\), the treated-side lagged-outcome target is the previously-estimated counterfactual \(\widehat \mu_{t-1}(0)\), not the observed treated outcome (which carries the treatment). This recursion makes the bias term in equation (2.11) stochastically small.
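The treatment-effect estimator is
\[\widehat \tau_t = \frac{1}{N_{tr}} \sum_{i:\, D_i = 1} Y_{it} - \widehat \mu_t(0), \qquad \widehat \mu_t(0) = \sum_{j:\, D_j = 0} w_{jt}^* Y_{jt},\]
and the headline ATT is the post-period mean of \(\widehat \tau_t\).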
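As a minimal NumPy sketch of that last step, the effect path and the headline ATT can be computed from a per-period weight matrix as follows. The weights here are random simplex vectors standing in for the solved \(w_t^*\), and all sizes and variable names are illustrative assumptions rather than mlsynth internals.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tr, n_co, T, T0 = 3, 6, 10, 6          # toy sizes (illustrative only)

Y_tr = rng.normal(size=(n_tr, T))        # treated outcomes, shape (n_tr, T)
Y_co = rng.normal(size=(n_co, T))        # donor outcomes, shape (n_co, T)
Y_tr[:, T0:] += 2.0                      # planted post-period effect

# Placeholder weights: one simplex vector per period, shape (T, n_co).
# In DSCAR these would come from the per-period QP / EL solve.
W = rng.dirichlet(np.ones(n_co), size=T)

Y_treated_mean = Y_tr.mean(axis=0)       # treated mean per period
Y0_hat = (W * Y_co.T).sum(axis=1)        # sum_j w_jt * Y_jt per period
gap = Y_treated_mean - Y0_hat            # tau_hat_t
att = gap[T0:].mean()                    # post-period mean of tau_hat_t

print(f"ATT = {att:+.3f}")
```

Because the weights are placeholders, the printed ATT is only a rough estimate of the planted effect; the point is the shape of the computation, not the number.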
Assumptions#
The Zheng & Chen (2024) consistency theorem requires:
Consistency: \(Y_{it} = D_{it} Y_{it}(1) + (1 - D_{it}) Y_{it}(0)\) for all \(i, t\).
Unconfoundedness: \(\mathbb{E}[Y_{it}(0) | Y_{i, t-1}(0), X_{it}, D_i = 1] = \mathbb{E}[Y_{it}(0) | Y_{i, t-1}(0), X_{it}, D_i = 0]\).
Unaffected confounders: \(X_{it} = X_{it}(0) = X_{it}(1)\) – the treatment does not move the confounders.
Treatment-overlap: \(\mathbb{P}(D_i = 1 | X_{it}, Y_{i, t-1}(0)) < 1\) with probability one.
For inference (Section 3 of the paper), additional finite-moment and positive-definite-covariance conditions on \(\varepsilon_{it}\) are required.
Theorem 1 (consistency). Under Assumptions 1-7 and the linear model \(Y_{it}(0) = \delta_t + \beta_t^\prime X_{it} + \rho_t Y_{i, t-1}(0) + \varepsilon_{it}\), the DSCAR estimator \(\widehat \tau_t\) converges to \(\tau_t\) in probability as both \(N_{tr}, N_{co} \to \infty\) with \(T\) fixed. The asymptotic regime is in marked contrast with Abadie et al. (2010), which requires \(T_0 \to \infty\).
Core API#
Dynamic Synthetic Control for Auto-Regressive processes (DSCAR).
Zheng, X., & Chen, S. X. (2024). Dynamic synthetic control method for evaluating treatment effects in auto-regressive processes. Journal of the Royal Statistical Society Series B, 86(1):155-176.
DSCAR extends Abadie-Diamond-Hainmueller (2010) to panels with time-varying confounders, spatial dependence in the residuals, and an auto-regressive outcome model. The matching weights are computed per post-period via empirical likelihood (equations 2.7-2.9 of the paper), so the synthetic control tracks both the current covariate state and the previous-period potential outcome.
Public API: DSCAR(config).fit() -> DSCARResults.
(The acronym DSCAR is used in mlsynth to distinguish this method
from the Distributional Synthetic Control of Gunsilius (2023), which
ships under mlsynth.DSC.)
- class mlsynth.estimators.dscar.DSCAR(config: DSCARConfig | dict)#
Bases: object
Dynamic Synthetic Control for Auto-Regressive processes.
- Parameters:
config (DSCARConfig or dict) – See
mlsynth.config_models.DSCARConfig.
- fit() DSCARResults#
Configuration#
- class mlsynth.config_models.DSCARConfig(*, df: DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', exog_covariates: List[str] | None = None, lagged_outcome: str | None = None, placebo_reps: int = 0, el_tolerance: float = 0.01, fdr_alpha: float = 0.05, seed: int = 0)#
Configuration for the Dynamic Synthetic Control AR (DSCAR) estimator.
Zheng & Chen (2024), JRSS-B 86(1):155-176, “Dynamic synthetic control method for evaluating treatment effects in auto-regressive processes.” Extends Abadie-Diamond-Hainmueller (2010) to settings with time-varying confounders, an auto-regressive outcome, spatial dependence in the residuals, and possibly multiple treated units that all turn on at a common intervention time.
The DSC weights are computed per post-period via empirical-likelihood maximisation under per-period matching constraints (equations 2.7-2.9 of the paper), allowing exact matches as \(N_{co}, N_{tr} \to \infty\) with \(T\) fixed – the asymptotic regime that suits a typical air-pollution / hourly panel.
- Parameters:
exog_covariates (list of str, optional) – Time-varying exogenous covariate columns to include in the per-period matching. None skips the covariate match (DSC then matches on the lagged outcome only).
lagged_outcome (str, optional) – Column name supplying the externally-computed \(Y_{i, t-1}\) value at the first sample period. For later periods the lag is read off the panel itself. None drops the lag constraint at t = 1.
placebo_reps (int) – Number of normalised-placebo replications for the SE on the DSC ATT (Section 3.2). 0 (default) skips placebo inference.
el_tolerance (float) – Threshold for the QP residual mean|Z_1 - Z_0 w| at which the EL refinement step is attempted; matches the R reference’s default 0.01. Smaller values fall back to QP weights more often.
fdr_alpha (float) – Significance level for the BY-adjusted pre-period unconfoundedness test (Section 3.1).
seed (int) – RNG seed for the placebo draw.
References
Zheng, X., & Chen, S. X. (2024). Dynamic synthetic control method for evaluating treatment effects in auto-regressive processes. Journal of the Royal Statistical Society Series B, 86(1):155-176.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Helper Modules#
Panel reshape for the Dynamic Synthetic Control estimator.
DSC operates on a long-format panel with a treated indicator, an optional lagged-outcome column (pm25_lag1 in the paper’s air-pollution example), and one or more time-varying exogenous covariates. This module:
Validates the columns the user named in DSCARConfig.
Pivots to wide format (N, T) with treated units first.
Constructs the one-period-lag outcome cube Y_lag1 – using the user-provided lag column for t = 1 and Y[:, t - 1] for later periods.
Stacks the exogenous covariates into an (N, T, p) cube.
- mlsynth.utils.dscar_helpers.setup.prepare_dsc_inputs(df: DataFrame, *, outcome: str, treat: str, unitid: str, time: str, exog_covariates: Sequence[str] | None = None, lagged_outcome: str | None = None) DSCARInputs#
Pivot a long-format panel into the inputs DSC consumes.
- Parameters:
df (pandas.DataFrame) – Long-format panel with one row per unit-time.
outcome, treat, unitid, time (str) – Column names. treat should be 1 at every row where the unit is part of the directly-treated group (regardless of pre / post timing); the per-row pre/post split is inferred from the first period at which any treat == 1 row appears.
exog_covariates (sequence of str, optional) – Time-varying exogenous covariate columns. When None the DSC matching uses only the lagged outcome.
lagged_outcome (str, optional) – Column carrying the externally-supplied Y_{t-1} at the first sample period. When None, the lag for t = 1 is NaN for every unit and the corresponding matching constraint is dropped at t = 1.
- Returns:
DSCARInputs
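The two array manipulations described above (treated units first, then a one-period lag whose first column comes from an external source) can be sketched in NumPy as below. The helper names `order_treated_first` and `build_lag` are hypothetical and are not part of mlsynth; the sketch starts from an already-pivoted (N, T) array.

```python
import numpy as np

def order_treated_first(Y, is_treated):
    """Reorder rows so treated units come first (stable within each group)."""
    mask = np.asarray(is_treated, dtype=bool)
    order = np.argsort(~mask, kind="stable")   # False (treated) sorts first
    return Y[order], order

def build_lag(Y, first_lag=None):
    """One-period lag of an (N, T) panel.

    Column 0 is filled from `first_lag` when supplied, else left as NaN;
    columns 1.. are the previous period's outcomes.
    """
    N, T = Y.shape
    Y_lag1 = np.full((N, T), np.nan)
    if first_lag is not None:
        Y_lag1[:, 0] = first_lag
    Y_lag1[:, 1:] = Y[:, :-1]
    return Y_lag1

Y = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
Y_sorted, order = order_treated_first(Y, [False, True, False])
lag_with = build_lag(Y_sorted, first_lag=np.array([3.5, 0.5, 6.5]))
lag_without = build_lag(Y_sorted)          # first column stays NaN
```

Leaving the first column as NaN when no lag is supplied mirrors the documented behaviour of dropping the lag constraint in the first period.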
Per-period weight solver for Dynamic Synthetic Control.
Implements equations (2.7) and (2.17-2.18) of Zheng & Chen (2024):
QP step (the convex-hull feasibility check). Solve
\[\min_w (Z_1 - Z_0 w)' V (Z_1 - Z_0 w) + \big( \sum_i w_i - 1 \big)^2 \qquad \text{s.t.} \quad \sum_i w_i = 1,\ 0 \le w_i \le 1.\]
This always has a solution and gives a feasible starting point.
EL refinement (the empirical-likelihood step). When the QP solution has mean |Z_1 - Z_0 w| <= eps, refine by
\[\max_w \prod_i w_i \qquad \text{s.t.} \quad Z_0 w = Z_1,\ \sum_i w_i = 1,\ 0 \le w_i \le 1.\]
This is the empirical-likelihood maximisation that gives DSC its asymptotic-theory guarantees (Theorem 1 of the paper). When the EL step diverges, fail open: use the QP solution.
The R reference (Dynamic_Synthetic_Control_new.R) uses
LowRankQP for step 1 and NlcOptim::solnl for step 2; we
mirror those with cvxpy and scipy.optimize.minimize respectively.
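To make the EL refinement concrete, here is a NumPy-only sketch that solves the same problem through the textbook Lagrange-dual form of empirical likelihood, where \(w_i = 1 / (n (1 + \lambda' g_i))\) with \(g_i = z_i - Z_1\), using a damped Newton iteration on \(\lambda\). This is a swapped-in technique chosen to keep the example dependency-free; it is not the scipy.optimize.minimize call the module is described as using, and the function name `el_weights` is hypothetical. It assumes the target lies strictly inside the convex hull of the donor columns.

```python
import numpy as np

def el_weights(Z0, Z1, iters=50, tol=1e-10):
    """Maximise prod(w) s.t. Z0 @ w = Z1, sum(w) = 1, via the EL dual.

    Z0: (k, n) donor columns; Z1: (k,) target inside their convex hull.
    """
    G = (Z0 - Z1[:, None]).T                 # (n, k): rows g_i = z_i - Z1
    n, k = G.shape

    def log_el(lam):
        d = 1.0 + G @ lam
        return -np.inf if np.any(d <= 0) else np.log(d).sum()

    lam = np.zeros(k)
    for _ in range(iters):
        d = 1.0 + G @ lam
        Gd = G / d[:, None]
        grad = Gd.sum(axis=0)                # gradient of the concave dual
        if np.linalg.norm(grad) < tol:
            break
        step = np.linalg.solve(Gd.T @ Gd, grad)   # Newton ascent direction
        s = 1.0
        # Halve the step until the dual does not decrease (keeps all d > 0).
        while log_el(lam + s * step) < log_el(lam):
            s *= 0.5
        lam = lam + s * step
    return 1.0 / (n * (1.0 + G @ lam))

rng = np.random.default_rng(0)
Z0 = rng.normal(size=(2, 40))                # 40 donors, 2 matching variables
Z1 = np.array([0.1, -0.2])                   # target near the donor centre
w = el_weights(Z0, Z1)
print(w.sum(), Z0 @ w)
```

At the dual optimum the weights are strictly positive, sum to one, and reproduce the target exactly, which is the property the per-period match relies on.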
- mlsynth.utils.dscar_helpers.weights.solve_dsc_weights(Z1: ndarray, Z0: ndarray, V_diag: ndarray, *, el_tolerance: float = 0.01) Tuple[ndarray, bool]#
One-period DSC weight solve: QP feasibility + EL refinement.
- Parameters:
Z1 (np.ndarray) – Length-k treated targets (per-period covariates + lagged outcome).
Z0 (np.ndarray) – Shape (k, N_co) donor targets.
V_diag (np.ndarray) – Length-k variable-importance diagonal for the QP.
el_tolerance (float) – Maximum mean absolute mismatch |Z_1 - Z_0 w| at which the EL refinement step is attempted. (R reference uses 0.01.) When the QP residual exceeds this, the EL step is skipped and the QP weights are returned.
- Returns:
weights (np.ndarray) – Length-N_co simplex weight vector.
used_el (bool) – True if EL refinement succeeded and was used; False otherwise (QP fallback).
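The QP step and the tolerance gate can be illustrated with a small NumPy sketch. It minimises the V-weighted mismatch over the simplex by projected gradient descent with a sort-based simplex projection, which is a stand-in for the cvxpy solve described above, and then checks the mean absolute residual against `el_tolerance`. The function names (`project_simplex`, `qp_weights`, `solve_sketch`) and the iteration count are assumptions made for illustration.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def v_loss(w, Z1, Z0, V_diag):
    r = Z1 - Z0 @ w
    return float(r @ (V_diag * r))

def qp_weights(Z1, Z0, V_diag, iters=5000):
    """Projected gradient descent on the V-weighted squared mismatch."""
    A = Z0 * np.sqrt(V_diag)[:, None]        # ||A w - b||^2 equals v_loss
    b = Z1 * np.sqrt(V_diag)
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    w = np.full(Z0.shape[1], 1.0 / Z0.shape[1])
    for _ in range(iters):
        w = project_simplex(w - 2.0 * A.T @ (A @ w - b) / L)
    return w

def solve_sketch(Z1, Z0, V_diag, el_tolerance=0.01):
    w = qp_weights(Z1, Z0, V_diag)
    resid = float(np.mean(np.abs(Z1 - Z0 @ w)))
    return w, resid, resid <= el_tolerance   # last item: attempt EL refinement?

rng = np.random.default_rng(0)
Z0 = rng.uniform(-1.0, 1.0, size=(2, 30))
V_diag = np.array([0.6, 0.5])
Z1_in = Z0 @ rng.dirichlet(np.ones(30))      # reachable: inside the donor hull
Z1_out = np.array([10.0, 10.0])              # unreachable: far outside the hull

w_in, resid_in, attempt_in = solve_sketch(Z1_in, Z0, V_diag)
w_out, resid_out, attempt_out = solve_sketch(Z1_out, Z0, V_diag)
print(resid_in, attempt_in, resid_out, attempt_out)
```

For the unreachable target the residual cannot fall below the tolerance, so the gate stays closed and the QP weights would be returned as they are.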
- mlsynth.utils.dscar_helpers.weights.variable_importance(Y: ndarray, X: ndarray, Y_lag1: ndarray, T0: int) ndarray#
Per-period OLS-coefficient magnitudes used as the V diagonal.
For each time t in 1..T, regress Y[:, t] on [Y_lag1[:, t], X[:, t, :]] over the panel and return |coefficients| (ignoring the intercept). Mirrors the R reference’s paramt matrix.
- Parameters:
Y, X, Y_lag1 (np.ndarray) – Outcome, covariate cube, and lagged outcome (see DSCARInputs).
T0 (int) – Pre-period length. For t <= T0 the full panel is used; for t > T0 only the donor rows are used (the treated rows carry the post-treatment outcomes, which would contaminate the coefficient estimate).
- Returns:
np.ndarray – Shape (T, 1 + p) per-period absolute coefficient matrix. Column 0 is |rho_t| (lagged outcome); columns 1..p are |beta_t| (exogenous covariates).
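A minimal sketch of the per-period regression, assuming treated units occupy the first rows and using 0-based period indices (so `t < T0` is the pre-period). The function name `importance_sketch` and its explicit `n_treated` argument are hypothetical; the documented `variable_importance` signature does not take that argument. The toy panel is noise-free so the coefficients are recovered exactly.

```python
import numpy as np

def importance_sketch(Y, X, Y_lag1, T0, n_treated):
    """Per-period |OLS coefficients| of Y_t on (1, Y_{t-1}, X_t)."""
    N, T = Y.shape
    V = np.full((T, 1 + X.shape[2]), np.nan)
    for t in range(T):
        # Full panel before treatment, donors only afterwards.
        rows = slice(0, N) if t < T0 else slice(n_treated, N)
        y = Y[rows, t]
        design = np.column_stack([np.ones(len(y)), Y_lag1[rows, t], X[rows, t, :]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        V[t] = np.abs(coef[1:])              # drop the intercept
    return V

rng = np.random.default_rng(0)
N, T, T0, n_treated = 12, 5, 3, 2
X = rng.normal(size=(N, T, 1))
y_init = rng.normal(size=N)                  # externally supplied first lag
Y = np.zeros((N, T))
Y_lag1 = np.zeros((N, T))
for t in range(T):
    Y_lag1[:, t] = y_init if t == 0 else Y[:, t - 1]
    Y[:, t] = 0.6 * Y_lag1[:, t] - 0.5 * X[:, t, 0]   # rho = 0.6, beta = -0.5
    if t >= T0:
        Y[:n_treated, t] += 2.0              # treatment hits treated rows only

V = importance_sketch(Y, X, Y_lag1, T0, n_treated)
print(V)
```

Restricting the post-period fit to donor rows is what keeps the planted treatment effect out of the coefficient estimates.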
End-to-end Dynamic Synthetic Control orchestrator.
Wraps the per-period weight solver in weights with the
algorithm’s recursive structure (Section 2.2 of Zheng & Chen 2024):
For the pre-period and the first post-period, match on the observed lagged outcome \(Y_{i, t-1}\).
For each subsequent post-period \(t > T_0 + 1\), replace the treated unit’s lagged-outcome target by the previously-estimated counterfactual \(\widehat \mu_{t-1}(0)\). This is the dynamic matching that makes the bias term in eq. (2.11) stochastically small.
- mlsynth.utils.dscar_helpers.pipeline.run_dsc(inputs: DSCARInputs, *, el_tolerance: float = 0.01, placebo_reps: int = 0, do_fdr_test: bool = True, fdr_alpha: float = 0.05, seed: int = 0) DSCARFit#
Walk the panel period by period and assemble the DSCARFit.
- Parameters:
inputs (DSCARInputs) – Output of prepare_dsc_inputs().
el_tolerance (float) – Mean-absolute-mismatch threshold for triggering EL refinement.
placebo_reps (int) – If > 0, run a normalised placebo test (Section 3.2) with this many random control-only “treated” draws and populate DSCARFit.se / DSCARFit.placebo_atts.
do_fdr_test (bool) – If True, run the FDR-controlled per-pre-period unconfoundedness test (Section 3.1) and populate DSCARFit.pre_period_pvalues / DSCARFit.pre_period_min_pvalue_adj.
fdr_alpha (float) – Significance level for the FDR test.
seed (int) – RNG seed for placebo draws.
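The recursive lag target is easiest to see on a noise-free toy panel. The sketch below compares the dynamic rule (after the first post-period, match on the previously estimated counterfactual) with a naive rule that keeps matching on the observed treated lag. To stay short it uses a stand-in weight solver: the minimum-norm solution of the matching equations with a sum-to-one constraint, which may produce negative weights and is not the QP / EL solver. All names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tr, n_co, T, T0 = 3, 10, 8, 5
N = n_tr + n_co

# Noise-free AR(1) untreated outcomes: Y_t(0) = 0.5 x_t + 0.6 Y_{t-1}(0).
x = rng.normal(size=(N, T))
Y = np.zeros((N, T))
Y[:, 0] = 0.5 * x[:, 0]
for t in range(1, T):
    Y[:, t] = 0.5 * x[:, t] + 0.6 * Y[:, t - 1]
Y[:n_tr, T0:] += 2.0                         # constant planted effect

y_tr, x_tr = Y[:n_tr].mean(axis=0), x[:n_tr].mean(axis=0)
Y_co, x_co = Y[n_tr:], x[n_tr:]

def match_weights(Z1, Z0):
    """Stand-in solver: min-norm w with Z0 @ w = Z1 and sum(w) = 1."""
    A = np.vstack([Z0, np.ones(Z0.shape[1])])
    b = np.append(Z1, 1.0)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def run(dynamic):
    Y0_hat = np.full(T, np.nan)
    for t in range(1, T):
        # Observed lag through the first post-period (index T0); afterwards
        # the dynamic rule switches to the estimated counterfactual.
        use_observed = (t <= T0) or not dynamic
        lag_target = y_tr[t - 1] if use_observed else Y0_hat[t - 1]
        Z1 = np.array([lag_target, x_tr[t]])
        Z0 = np.vstack([Y_co[:, t - 1], x_co[:, t]])
        Y0_hat[t] = match_weights(Z1, Z0) @ Y_co[:, t]
    return y_tr - Y0_hat

gap_dynamic = run(dynamic=True)
gap_naive = run(dynamic=False)
print(gap_dynamic[T0:], gap_naive[T0:])
```

In this toy design the dynamic rule recovers the planted effect in every post-period, while the naive rule absorbs part of the effect through the contaminated lag from the second post-period onward.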
Inference for the Dynamic Synthetic Control method.
Two procedures from Sections 3.1 and 3.2 of Zheng & Chen (2024):
FDR-controlled unconfoundedness test (Section 3.1). For each pre-period t, test H_0: gap_t = 0 using a z-statistic z_t = gap_t / (sigma * v_eta_t), where v_eta_t = sqrt(1/n_treated + sum_i w_{t, i}^2) and sigma is the residual SD from the pre-period AR-1 model Y_it(0) = delta_t + beta_t' X_it + rho_t Y_{i, t-1} + eps_it. A Benjamini-Yekutieli correction is applied across the T0 per-period tests.
Normalised placebo test (Section 3.2). Sample n_treated donor units uniformly at random K times; re-run DSC treating them as the placebo “treated” group; normalise each placebo’s post-period mean effect by its own per-rep SD so the empirical distribution is on the same scale as the real ATT. The placebo SD of these normalised statistics is the SE for att.
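The pre-period test can be sketched directly from the formulas above: a z-statistic per period, a two-sided normal p-value, and a Benjamini-Yekutieli step-up adjustment. The function names and the toy inputs are assumptions for illustration; `sigma` is taken as given rather than estimated from the AR-1 residuals.

```python
import math
import numpy as np

def pre_period_pvalues(gap_pre, W_pre, sigma, n_treated):
    """Two-sided p-values for H_0: gap_t = 0 in each pre-period."""
    v_eta = np.sqrt(1.0 / n_treated + (W_pre ** 2).sum(axis=1))
    z = gap_pre / (sigma * v_eta)
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])

def by_adjust(p):
    """Benjamini-Yekutieli adjusted p-values (valid under dependence)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))  # harmonic correction factor
    order = np.argsort(p)
    scaled = p[order] * m * c_m / np.arange(1, m + 1)
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

# Toy inputs: 3 pre-periods, 4 donors with uniform weights, 4 treated units.
W_pre = np.full((3, 4), 0.25)
v = math.sqrt(1.0 / 4 + 4 * 0.25 ** 2)      # v_eta for every period
gap_pre = np.array([1.96 * v, 0.0, -1.0 * v])
p_values = pre_period_pvalues(gap_pre, W_pre, sigma=1.0, n_treated=4)
min_p_adjusted = by_adjust(p_values).min()
print(p_values, min_p_adjusted)
```

A small adjusted minimum p-value would flag at least one pre-period with a non-zero pseudo-effect, which the documentation treats as a cue to revisit the covariates or lag order.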
- mlsynth.utils.dscar_helpers.inference.fdr_unconfoundedness_test(*, inputs: DSCARInputs, weights: np.ndarray, gap: np.ndarray) Tuple[np.ndarray, float]#
Per-pre-period z-tests + BY-adjusted minimum p-value.
- Returns:
p_values (np.ndarray) – Shape (T0,) two-sided p-values for H_0: gap_t = 0.
min_p_adjusted (float) – Smallest BY-adjusted p-value across the T0 tests.
- mlsynth.utils.dscar_helpers.inference.normalised_placebo_test(*, inputs: DSCARInputs, weights: np.ndarray, att_observed: float, placebo_reps: int, el_tolerance: float, seed: int) Tuple[np.ndarray, float]#
Run the normalised placebo procedure (Section 3.2 of Zheng & Chen 2024).
- Returns:
placebo_atts (np.ndarray) – Length-placebo_reps array of placebo post-period mean effects (normalised to match the observed-treated SD scale).
se (float) – Empirical SE of att_observed, computed from the placebo distribution.
Frozen dataclasses for the DSC estimator pipeline.
- class mlsynth.utils.dscar_helpers.structures.DSCARFit(weights: ndarray, Y0_hat: ndarray, Y_treated_mean: ndarray, gap: ndarray, att: float, att_relative: float, se: float | None = None, placebo_atts: ndarray | None = None, pre_period_pvalues: ndarray | None = None, pre_period_min_pvalue_adj: float | None = None, n_exact_matched_periods: int = 0, v_diagonal: ndarray | None = None)#
Per-period DSC weights + counterfactual + treatment-effect path.
- Parameters:
weights (np.ndarray) – Shape (T, n_donors) per-period simplex weight matrix.
Y0_hat (np.ndarray) – Length-T estimated counterfactual outcome for the treated group (per-hour mean across treated units, following Zheng & Chen 2024 Section 5).
Y_treated_mean (np.ndarray) – Length-T observed per-hour mean across treated units.
gap (np.ndarray) – Length-T per-period effect Y_treated_mean - Y0_hat.
att (float) – Mean of gap over the post-period.
att_relative (float) – 1 - mu1 / mu0 where mu1, mu0 are post-period means of Y_treated_mean and Y0_hat respectively.
se (Optional[float]) – Standard error of att from the normalised placebo run (Section 3.2). None when placebo_reps == 0.
placebo_atts (Optional[np.ndarray]) – Length-placebo_reps post-period mean effects from the normalised placebo runs.
pre_period_pvalues (Optional[np.ndarray]) – Length-T0 per-pre-period two-sided p-values for H_0: gap_t = 0 (Section 3.1).
pre_period_min_pvalue_adj (Optional[float]) – Benjamini-Yekutieli-adjusted minimum pre-period p-value.
n_exact_matched_periods (int) – Number of periods at which the EL refinement step succeeded (T_matched in the paper’s notation).
v_diagonal (Optional[np.ndarray]) – Shape (T, p + 1) per-period variable-importance vector used in the QP (the diagonal of V_t).
- Y0_hat: ndarray#
- Y_treated_mean: ndarray#
- gap: ndarray#
- weights: ndarray#
- class mlsynth.utils.dscar_helpers.structures.DSCARInputs(Y: ndarray, Y_lag1: ndarray, X: ndarray, var_names: Tuple[str, ...], y_name: str, treated_labels: Tuple[Any, ...], donor_labels: Tuple[Any, ...], time_labels: ndarray, N: int, T: int, T0: int, T1: int, n_treated: int)#
Preprocessed panel for DSC.
- Parameters:
Y (np.ndarray) – Shape (N, T) outcome panel ordered with the n_treated treated units first (rows 0 .. n_treated - 1), then donor units.
Y_lag1 (np.ndarray) – Shape (N, T) one-period-lag outcome. Column t = 0 carries the user-provided pre-period lag; columns t >= 1 equal Y[:, t - 1].
X (np.ndarray) – Shape (N, T, p) exogenous-covariate cube. p may be 0.
var_names (tuple of str) – Length-p names of the exogenous covariates (informational).
y_name (str) – Outcome column name (informational).
treated_labels (tuple) – Labels of the directly-treated units, in panel row order.
donor_labels (tuple) – Labels of the donor units, in panel row order.
time_labels (np.ndarray) – Length-T time labels.
N (int) – Total number of units.
T (int) – Total number of time periods.
T0 (int) – Number of pre-treatment periods.
T1 (int) – Number of post-treatment periods.
n_treated (int) – Number of directly-treated units.
- X: ndarray#
- Y: ndarray#
- Y_lag1: ndarray#
- time_labels: ndarray#
- class mlsynth.utils.dscar_helpers.structures.DSCARResults(inputs: DSCARInputs, fit: DSCARFit, method: str = 'dsc')#
Top-level DSC result container.
- property counterfactual: ndarray#
- property gap: ndarray#
- inputs: DSCARInputs#
- property weights: ndarray#
Example#
A tiny AR(1) panel with a planted treatment effect of \(\tau = 2\) on unit 0:
import numpy as np
import pandas as pd
from mlsynth import DSCAR

rng = np.random.default_rng(0)
N, T, T0 = 8, 30, 20

# Simulate an AR(1) outcome driven by one time-varying covariate.
x = rng.standard_normal((N, T)) * 0.5
eps = rng.standard_normal((N, T)) * 0.3
Y = np.zeros((N, T))
for t in range(1, T):
    Y[:, t] = 0.5 * x[:, t] + 0.6 * Y[:, t - 1] + eps[:, t]
Y[0, T0:] += 2.0  # treatment effect on unit 0

# Long-format panel: one row per unit-period.
rows = [
    {"unit": f"u{i}", "year": t,
     "y": float(Y[i, t]), "x1": float(x[i, t]),
     "y_lag1": float(Y[i, t - 1]) if t >= 1 else 0.0,
     "treat": int(i == 0 and t >= T0)}
    for i in range(N) for t in range(T)
]
df = pd.DataFrame(rows)

res = DSCAR({
    "df": df, "outcome": "y", "treat": "treat",
    "unitid": "unit", "time": "year",
    "exog_covariates": ["x1"], "lagged_outcome": "y_lag1",
    "display_graphs": False,
}).fit()
print(f"DSCAR ATT = {res.att:+.3f} (true tau = 2.0)")
Empirical replication: Beijing PM2.5 air-pollution alerts#
DSCAR ships with the two air-pollution panels used in Zheng & Chen (2024) Section 5:
basedata/beijing_pm25_orange_alert.csv – the orange alert starting 17 Nov 2016, 94 monitoring stations × 72 hours, 20 treated.
basedata/beijing_pm25_red_alert.csv – the red alert starting 16 Dec 2016, 66 stations × 72 hours, 20 treated.
The pre-period is the 48 hours before the alert; the post-period is the 24 hours after.
import pandas as pd
from mlsynth import DSCAR

df = pd.read_csv("https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/beijing_pm25_orange_alert.csv")

# Treatment flag: alerted stations, after the 48-hour pre-period.
df["treat_indicator"] = (
    (df["alert_if"] == 1) & (df["hour_eps"] > 48)
).astype(int)

res = DSCAR({
    "df": df, "outcome": "pm25", "treat": "treat_indicator",
    "unitid": "id_eps", "time": "hour_eps",
    "exog_covariates": ["WSPM", "humi", "dewp", "pres"],
    "lagged_outcome": "pm25_lag1",
    "display_graphs": False,
}).fit()

# Post-period means of the counterfactual and of the observed treated series.
mu0 = res.fit.Y0_hat[48:].mean()
mu1 = res.fit.Y_treated_mean[48:].mean()
print(f"orange alert ATT = {res.att:+.4f} (paper -33.8)")
print(f" relative reduction = {100 * res.att_relative:+.2f}% (paper -24.3%)")
print(f" mu_0 = {mu0:.2f} (paper 139.0)")
print(f" mu_1 = {mu1:.2f} (paper 105.3)")
prints:
orange alert ATT = -33.7830 (paper -33.8)
relative reduction = -24.29% (paper -24.3%)
mu_0 = 139.07 (paper 139.0)
mu_1 = 105.28 (paper 105.3)
Path-A regression status:
Orange alert: ATT matches the paper to 0.05 μg/m³ and the relative-reduction figure to 0.01 percentage points – this is a faithful Path-A replication.
Red alert: my implementation produces ATT = −55.7 μg/m³ (relative reduction 21.9%) against the paper’s reported −70.4 μg/m³ (relative reduction 26.2%). I do not know the cause, as the final code is unpublished. The qualitative finding holds (large negative ATT, ~20% reduction), but the magnitude falls short of the paper’s by ~21%. The reference R script that was emailed to me two years ago (as of this writing, eg2/Eg_Air_Pollution_eps_201616_12_16_final.R) contains a commented-out per-unit pressure / humidity de-meaning block, suggesting the paper’s red-alert numbers were produced with preprocessing that the released code does not actually perform. The pytest regression TestPathABeijingAlerts::test_red_att_qualitative asserts the qualitative ATT bound rather than the paper’s exact magnitude.
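The replication tolerances quoted above are plain arithmetic on the reported figures, and can be checked in a few lines. The numbers below are copied from this section; nothing is recomputed from data.

```python
# Figures as reported in this section (mlsynth output vs. the paper).
orange_att, orange_paper = -33.7830, -33.8        # ug/m^3
orange_rel, orange_rel_paper = -24.29, -24.3      # percent
red_att, red_paper = -55.7, -70.4                 # ug/m^3

orange_att_diff = abs(orange_att - orange_paper)
orange_rel_diff = abs(orange_rel - orange_rel_paper)
# Share of the paper's red-alert magnitude that the implementation misses.
red_shortfall = (abs(red_paper) - abs(red_att)) / abs(red_paper)

print(f"orange ATT difference       = {orange_att_diff:.3f} ug/m^3")
print(f"orange relative difference  = {orange_rel_diff:.2f} pp")
print(f"red ATT shortfall           = {100 * red_shortfall:.1f}%")
```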
The driver is examples/dscar/replicate_beijing_alerts.py; run
with python -m examples.dscar.replicate_beijing_alerts.
References#
Zheng, X., & Chen, S. X. (2024). “Dynamic synthetic control method for evaluating treatment effects in auto-regressive processes.” Journal of the Royal Statistical Society Series B 86(1):155-176.
Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies.” Journal of the American Statistical Association 105(490):493-505.
Chen, S. X., & Van Keilegom, I. (2009). “A review on empirical likelihood methods for regression.” Test 18(3):415-447.
Owen, A. (1988). “Empirical likelihood ratio confidence intervals for a single functional.” Biometrika 75(2):237-249.