FDID — Forward Difference-in-Differences (Li 2024)#
- Estimator: Forward Difference-in-Differences (FDID), mlsynth.FDID
- Source: Li, Kathleen T. (2024), “Frontiers: A Simple Forward Difference-in-Differences Method,” Marketing Science 43(2) [Li2024].
- Replication type: Path A (the author’s released public empirical application, Hong Kong GDP, reproduced cell by cell) and Path B (the paper’s own Monte Carlo, Web Appendix E, Table 5).
- Status: Fully verified; empirical application and simulation both reproduced.
Validation strategy#
Li’s headline application — the effect of opening physical stores on an online-first retailer’s city-level sales — runs on a confidential retailer dataset that cannot be redistributed. For context, that study reports a Forward DiD effect of opening a store in Atlanta of about +$75,143 in monthly sales (an 86% lift, pre-period \(R^2 = 0.76\)), with ordinary DiD and synthetic control — which fit Atlanta’s steep pre-trend poorly — overstating it.
That headline number cannot be checked value-for-value, but it does not have to be: alongside the paper the author released a public companion replication (MATLAB and R) on the Hsiao, Ching & Wan (2012) Hong Kong GDP panel, which mlsynth reproduces cell by cell (Path A). We additionally reproduce the paper’s own Monte Carlo (Path B), which exercises the mismatched-control regimes a single empirical case cannot.
Path A — Hong Kong GDP#
The author’s public dataset, basedata/HongKong.csv, is the Hsiao, Ching & Wan (2012) panel: Hong Kong plus 24 OECD and Asian control economies over 61 quarters. Treatment, the political and economic integration with mainland China, begins at quarter 44, so \(T_1 = 44\).
import pandas as pd
from mlsynth import FDID

df = pd.read_csv("basedata/HongKong.csv")
res = FDID({"df": df, "outcome": "GDP", "treat": "Integration",
            "unitid": "Country", "time": "Time",
            "display_graphs": False, "verbose": False}).fit()

# Headline quantities compared against the author's released output
res.fdid.att, res.fdid.att_percent, res.fdid.r_squared, len(res.fdid.selected_names)
Forward DiD selects 9 of the 24 controls and reproduces the author’s
released MATLAB/R output (ForwardDID_Readme.txt) cell by cell:
| Metric | FDID (mlsynth) | FDID (Li) | DID (mlsynth) | DID (Li) |
|---|---|---|---|---|
| ATT | 0.0254 | 0.0254 | 0.0317 | 0.0317 |
| % ATT | 53.84 | 53.84 | 77.62 | 77.62 |
| pre-period \(R^2\) | 0.843 | 0.843 | 0.505 | 0.505 |
| controls used | 9 | 9 | 24 | 24 |
The 95% confidence interval (0.0163, 0.0345) and the standardized ATT
(t-statistic) \(\approx 5.49\) likewise match the released values.
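As a rough consistency check on the figures above, the sketch below backs out the standard error and t-statistic implied by the rounded ATT and confidence interval. It assumes a symmetric normal-approximation interval (ATT ± 1.96 × SE); that is an assumption made here for illustration, not something the released output states. Because the inputs are rounded to four decimals, the result can only approximate the reported 5.49.

```python
# Values copied from the text above (rounded to four decimals).
att = 0.0254
ci_low, ci_high = 0.0163, 0.0345

z = 1.96  # ASSUMPTION: symmetric 95% normal-approximation interval

midpoint = (ci_low + ci_high) / 2      # should coincide with the ATT
se = (ci_high - ci_low) / (2 * z)      # implied standard error
t_stat = att / se                      # implied standardized ATT

print(f"midpoint={midpoint:.4f}  se={se:.5f}  t={t_stat:.2f}")
```

The interval is centred on the ATT, and the implied t-statistic comes out at roughly 5.47, which agrees with the reported value up to input rounding.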
Forward DiD’s far higher pre-period fit (\(R^2 = 0.843\) versus DiD’s \(0.505\)) explains the gap between the two estimates: the all-controls average tracks Hong Kong’s pre-integration path poorly, so plain DiD overstates the effect, while the forward search keeps only the 9 economies that co-move with Hong Kong.
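The forward search can be illustrated with a small NumPy sketch. This is not mlsynth's implementation. It is one plain reading of the idea, under stated assumptions: controls are added greedily by pre-period \(R^2\) of the DiD fit (treated = intercept + mean of selected controls), and the subset with the highest \(R^2\) along the path is kept. The helper names did_fit and forward_select and the toy data are hypothetical.

```python
import numpy as np

def did_fit(y_pre, controls_pre):
    """Pre-period DiD fit: treated = alpha + mean(selected controls) + error."""
    avg = controls_pre.mean(axis=1)
    alpha = np.mean(y_pre - avg)                 # intercept shift
    resid = y_pre - alpha - avg
    r2 = 1.0 - (resid @ resid) / np.sum((y_pre - y_pre.mean()) ** 2)
    return alpha, r2

def forward_select(y_pre, X_pre):
    """Greedy forward search over control columns; return the best-R^2 subset."""
    remaining = list(range(X_pre.shape[1]))
    selected, best_subset, best_r2 = [], [], -np.inf
    while remaining:
        # Try each remaining control and keep the one giving the highest R^2.
        scores = [did_fit(y_pre, X_pre[:, selected + [j]])[1] for j in remaining]
        k = int(np.argmax(scores))
        selected.append(remaining.pop(k))
        if scores[k] > best_r2:
            best_r2, best_subset = scores[k], list(selected)
    return best_subset, best_r2

# Toy pre-period panel: columns 0-2 share the treated unit's loading, 3-5 do not.
rng = np.random.default_rng(0)
T0 = 40
f = np.sin(0.7 * np.arange(T0))
good = f[:, None] + 0.05 * rng.standard_normal((T0, 3))
bad = 3.0 * f[:, None] + 0.05 * rng.standard_normal((T0, 3))
X_pre = np.hstack([good, bad])
y_pre = 2.0 + f + 0.05 * rng.standard_normal(T0)

subset, r2 = forward_select(y_pre, X_pre)
print(sorted(subset), round(r2, 3))
```

On this toy panel the mismatched columns never enter the chosen subset, because adding any of them pulls the control average away from the treated unit's path and lowers the pre-period fit.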
The durable check lives in benchmarks/cases/fdid_hongkong.py:
python benchmarks/run_benchmarks.py --case fdid_hongkong
Path B — the simulation design#
The four DGPs and their factor structure are packaged in
mlsynth.utils.fdid_helpers.simulation.simulate_fdid_sample(): three
common factors — f1 AR(1) 0.8, f2 ARMA(1,1) (-0.6, 0.8),
f3 MA(2) (0.9, 0.4), innovations \(N(0,1)\) — with outcomes
\(a_0 + c_0 \sum_k f_{kt} + \varepsilon\) for the treated unit and
\(1 + c \sum_k f_{kt} + \varepsilon\) for the controls (the first half of
the donor pool loading \(c_1\), the second half \(c_2\)). The four
DGPs vary \((a_0, c_0, c_1, c_2)\):
- DGP1 (1,1,1,1) and DGP3 (2,1,1,1): all controls share the treated unit’s factor loading, so ordinary DiD is applicable.
- DGP2 (1,1,1,2) and DGP4 (2,1,1,2): half the controls carry the wrong loading, so the all-controls DiD average is contaminated.
The true ATT is \(0\), and the reported risk is the predictive MSE \(\mathrm{PMSE} = M^{-1}\sum_j \widehat{\mathrm{ATT}}_j^2\).
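The design described above can be sketched in standalone NumPy. This is not the packaged simulate_fdid_sample() but an illustration under labeled assumptions: the ARMA(1,1) pair (-0.6, 0.8) is read as (AR, MA) coefficients, the idiosyncratic errors \(\varepsilon\) are taken to be \(N(0,1)\), and a burn-in of 50 periods is used. The function names simulate_factors and simulate_panel are hypothetical.

```python
import numpy as np

def simulate_factors(T, rng, burn=50):
    """Three common factors with N(0,1) innovations; burn-in length is an assumption."""
    L = T + burn + 2
    e = rng.standard_normal((3, L))
    f1, f2, f3 = np.zeros(L), np.zeros(L), np.zeros(L)
    for t in range(2, L):
        f1[t] = 0.8 * f1[t - 1] + e[0, t]                        # AR(1)
        f2[t] = -0.6 * f2[t - 1] + e[1, t] + 0.8 * e[1, t - 1]   # ARMA(1,1), assumed (AR, MA)
        f3[t] = e[2, t] + 0.9 * e[2, t - 1] + 0.4 * e[2, t - 2]  # MA(2)
    return np.vstack([f1, f2, f3])[:, -T:]

def simulate_panel(a0, c0, c1, c2, N, T1, T2, rng):
    """Treated series and N control series; true ATT is 0, so no effect is added."""
    T = T1 + T2
    fsum = simulate_factors(T, rng).sum(axis=0)
    y_treated = a0 + c0 * fsum + rng.standard_normal(T)          # eps ~ N(0,1) assumed
    # First half of the donor pool loads c1, second half loads c2.
    loadings = np.where(np.arange(N) < N // 2, c1, c2)
    y_controls = 1.0 + loadings[:, None] * fsum[None, :] + rng.standard_normal((N, T))
    return y_treated, y_controls

# DGP2: (a0, c0, c1, c2) = (1, 1, 1, 2)
y_treated, y_controls = simulate_panel(1, 1, 1, 2, N=10, T1=12, T2=6,
                                       rng=np.random.default_rng(0))
print(y_treated.shape, y_controls.shape)
```

Seeding the generator per replication, as the reproduction code in the next section does with seed + j, makes each Monte Carlo draw repeatable.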
Reproducing Table 5#
import numpy as np
from mlsynth import FDID
from mlsynth.utils.fdid_helpers.simulation import simulate_fdid_sample

def pmse_cell(dgp, N, T1, T2, M, seed=0):
    """Return (FDID PMSE, DID PMSE) for one design cell over M replications."""
    fdid_sq, did_sq = [], []
    for j in range(M):
        rng = np.random.default_rng(seed + j)  # one reproducible stream per replication
        sample = simulate_fdid_sample(dgp=dgp, N=N, T1=T1, T2=T2, rng=rng)
        res = FDID({"df": sample.df, "outcome": "y", "treat": "treat",
                    "unitid": "unit", "time": "time",
                    "display_graphs": False, "verbose": False}).fit()
        fdid_sq.append(res.fdid.att ** 2)  # true ATT = 0, so squared error = att^2
        did_sq.append(res.did.att ** 2)
    return float(np.mean(fdid_sq)), float(np.mean(did_sq))

for dgp in (1, 2, 3, 4):
    for T1, T2 in [(12, 6), (24, 12), (48, 24)]:
        fdid_pmse, did_pmse = pmse_cell(dgp, N=60, T1=T1, T2=T2, M=1000)
        print(f"DGP{dgp} ({T1},{T2}): FDID={fdid_pmse:.4f} DID={did_pmse:.4f}")
Results#
At \(M = 1{,}000\) replications (Li uses \(M = 10{,}000\); the shorter runtime is the only material change), this tracks Table 5 cell by cell, up to Monte Carlo noise:
| DGP | \((T_1, T_2)\) | DID (mlsynth) | DID (Li) | FDID (mlsynth) | FDID (Li) |
|---|---|---|---|---|---|
| 1 | (12, 6) | 0.265 | 0.259 | 0.325 | 0.315 |
| 1 | (24, 12) | 0.127 | 0.128 | 0.147 | 0.146 |
| 1 | (48, 24) | 0.065 | 0.063 | 0.075 | 0.071 |
| 2 | (12, 6) | 1.202 | 1.037 | 0.431 | 0.385 |
| 2 | (24, 12) | 0.765 | 0.746 | 0.177 | 0.180 |
| 2 | (48, 24) | 0.451 | 0.473 | 0.084 | 0.082 |
| 3 | (12, 6) | 0.265 | 0.252 | 0.325 | 0.303 |
| 3 | (24, 12) | 0.127 | 0.123 | 0.147 | 0.143 |
| 3 | (48, 24) | 0.065 | 0.064 | 0.075 | 0.072 |
| 4 | (12, 6) | 1.202 | 1.038 | 0.431 | 0.391 |
| 4 | (24, 12) | 0.765 | 0.744 | 0.177 | 0.171 |
| 4 | (48, 24) | 0.451 | 0.454 | 0.084 | 0.081 |
What it confirms#
The two headline findings reproduce:
- When all controls are valid (DGP1, DGP3), DiD is the parsimonious, efficient choice and edges out Forward DiD by a small margin at every horizon; Forward DiD pays only a small efficiency cost for the safety of the search.
- When half the controls are mismatched (DGP2, DGP4), DiD’s PMSE stays large at every panel size (DGP2 at \((48,24)\): DID still 0.451) because the contaminating controls bias the all-controls average, while Forward DiD’s PMSE collapses (0.084) because the forward search discards them. Forward DiD wins decisively when DiD is invalid, which is Li’s central result.
The mlsynth figures for DGP1 and DGP3 (and for DGP2 and DGP4) are identical, which also confirms the estimator’s intercept invariance: moving \(a_0\) from 1 to 2 changes nothing because Forward DiD’s \(\widehat\alpha\) absorbs it. The \((12, 6)\) cells run slightly hot under DGP2 and DGP4 (DID 1.202 versus Li’s 1.037 and 1.038; FDID 0.431 versus 0.385 and 0.391), consistent with Monte Carlo noise at \(M = 1{,}000\) versus Li’s \(M = 10{,}000\).
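To judge whether a gap between an \(M = 1{,}000\) cell and Li's value is within simulation noise, one option is to attach a Monte Carlo standard error to each PMSE estimate. The sketch below assumes the per-replication ATT estimates have been kept (the reproduction code above returns only their mean squares, so it would need a small change to expose them). The helper name pmse_with_se is hypothetical, and the four input values are made up purely to show the arithmetic.

```python
import numpy as np

def pmse_with_se(att_draws):
    """PMSE (mean of squared ATT estimates, true ATT = 0) and its Monte Carlo SE."""
    sq = np.asarray(att_draws, dtype=float) ** 2
    pmse = sq.mean()
    se = sq.std(ddof=1) / np.sqrt(sq.size)   # standard error of the mean of att^2
    return pmse, se

# Illustrative input only; in practice pass the M per-replication ATT estimates.
pmse, se = pmse_with_se([1.0, -1.0, 2.0, -2.0])
print(pmse, round(se, 4))
```

A difference between two cells that is small relative to this standard error is what "Monte Carlo noise" means in practice; raising \(M\) shrinks the standard error at rate \(1/\sqrt{M}\).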