Forward Difference-in-Differences (FDID)

When to Use This Estimator

Difference-in-differences (DiD) is the workhorse of quasi-experimental causal inference, but it rests on a parallel-trends assumption: the treated unit’s untreated outcome would have moved in lockstep with the average of the controls. With a large, heterogeneous pool of candidate controls, that assumption is rarely credible for the pool as a whole – most of the controls are simply the wrong comparison. The usual escape hatches each have a catch:

  • Plain DiD uses every control with equal weight. One badly mismatched control contaminates the average, and its bias does not shrink as the panel grows.

  • Synthetic control (SC, [ABADIE2010]) weights the controls on the simplex, but is justified only as the pre-period grows without bound, and has no inference theory when the data are non-stationary of unknown form – exactly the regime of most marketing and macro panels.

  • The panel-data approach of Hsiao, Ching and Wan ([HCW]) and its forward-selected variant ([fsPDA]) fit an unrestricted regression on the controls. When the number of controls \(N\) exceeds the number of pre-treatment periods \(T_1\) – common in store/geo studies – they overfit in-sample and predict poorly out-of-sample.

Forward DiD (Li [Li2024]) targets precisely this regime: many candidate controls, a short-to-moderate pre-period, and a need for valid inference under non-stationarity. It keeps DiD’s transparency – an equal-weighted comparison group plus a single intercept – but chooses which controls enter the comparison by a greedy forward search on pre-treatment fit. Because only one parameter is estimated after selection (the DiD intercept \(\alpha\)), however many controls are selected, the fitted model has no free coefficients with which to overfit, and the textbook DiD standard error applies. Its advantages, in Li’s own summary:

  1. It is a flexible drop-in for DiD, usable even when DiD’s all-controls parallel trend is too restrictive.

  2. It accommodates any number of controls, including \(N > T_1\).

  3. There are no overfitting concerns – one parameter after selection.

  4. It is computationally cheap: a greedy \(O(N^2)\) search rather than the \(2^N\) subsets of the optimal procedure.

  5. It has inference theory valid for stationary and non-stationary data, which SC and HCW lack.

Forward Selection vs. Matching and Weighting

Every synthetic-control-family estimator answers the same question – what comparison reproduces the treated unit’s untreated path? – but each makes a different structural bet about the comparison. Forward DiD’s bet is distinctive and worth stating plainly:

A *subset* of the controls shares the treated unit’s trend; find that subset and average it with equal weights.

It does not try to weight all controls cleverly (SC), nor regress on all of them (HCW), nor trust all of them (DiD). It selects. The selection is greedy: add, one at a time, the control that most improves pre-treatment fit, trace the fit as the subset grows, and keep the subset that fits best.

                                  DiD                          Synthetic control               Forward DiD
Comparison                        all controls, equal weight   all controls, simplex weights   a selected subset, equal weight
Free parameters                   1 (intercept)                \(N\) weights                   1 (intercept), after selection
Overfitting risk                  none                         controlled by the simplex       none (one parameter)
Key assumption                    all controls are parallel    treated is in the convex hull   some subset is parallel
Inference under non-stationarity  standard                     none available                  standard (Prop. 2.1)

The equal weights are the crux. Because the selected controls enter through a single average – not \(|\mathcal{D}|\) separate coefficients – adding more controls cannot increase the model’s degrees of freedom. This is an implicit regularization that buys both the overfitting immunity and the clean, DiD-style inference theory. Forward DiD is therefore best read as DiD with a principled, data-driven choice of comparison group, not as a new weighting scheme.

Notation

Index the units by \(j\), with \(j = 0\) the single treated unit and \(\mathcal{N} = \{1, \ldots, N\}\) the control units. A selected subset \(\mathcal{D} \subseteq \mathcal{N}\) is the comparison group; write its equal-weighted average outcome as

\[\bar{y}_{\mathcal{D}, t} = \frac{1}{|\mathcal{D}|} \sum_{j \in \mathcal{D}} y_{jt}.\]

Time runs over \(t \in \{1, \ldots, T\}\); the intervention starts at \(T_1 + 1\), giving a pre-period \(\mathcal{T}_1 = \{1, \ldots, T_1\}\) and a post-period \(\mathcal{T}_2 = \{T_1 + 1, \ldots, T\}\) of length \(T_2 = T - T_1\). Potential outcomes are \(y^0_{jt}\) (untreated) and \(y^1_{jt}\) (treated); we observe \(y_{0t} = y^0_{0t}\) for \(t \in \mathcal{T}_1\) and \(y_{0t} = y^1_{0t}\) for \(t \in \mathcal{T}_2\). The estimand is the average treatment effect on the treated,

\[\mathrm{ATT} = \frac{1}{T_2} \sum_{t \in \mathcal{T}_2} \bigl(y^1_{0t} - y^0_{0t}\bigr).\]

Notation bridge

Li [Li2024] writes the treated outcome \(y_{tr,t}\), the selected control set \(\mathcal{N}_{co}\) with size \(N_{co}\), the control average \(\bar{y}_{\mathcal{N}_{co}, t}\), the intercept \(\alpha\), and \(T_1\) / \(T_2\) for the pre/post counts (treatment at \(T_1 + 1\)). We keep \(j = 0\) for the treated unit, \(\mathcal{D}\) for the selected comparison group, and \(\bar{y}_{\mathcal{D}, t}\) for its average.

Mathematical Formulation

The DiD model

For a fixed comparison group \(\mathcal{D}\), Forward DiD posits that the treated unit’s untreated outcome equals the group average plus a constant level shift:

\[y^0_{0t} = \alpha + \bar{y}_{\mathcal{D}, t} + v_t, \qquad t = 1, \ldots, T,\]

with \(\alpha\) an unknown intercept and \(v_t\) a zero-mean, weakly dependent error. Crucially, \(y^0_{0t}\) and \(\bar{y}_{\mathcal{D}, t}\) may each be non-stationary (trending) provided their difference is stationary – this is the Forward DiD parallel-trends condition. The intercept is estimated by least squares on the pre-period,

\[\hat{\alpha} = \frac{1}{T_1} \sum_{t \in \mathcal{T}_1} \bigl(y_{0t} - \bar{y}_{\mathcal{D}, t}\bigr),\]

so the in-sample fit and out-of-sample counterfactual are

\[\hat{y}^0_{0t} = \hat{\alpha} + \bar{y}_{\mathcal{D}, t}, \qquad t = 1, \ldots, T,\]

and the ATT is the mean post-period gap

\[\widehat{\mathrm{ATT}} = \frac{1}{T_2} \sum_{t \in \mathcal{T}_2} \bigl(y_{0t} - \hat{y}^0_{0t}\bigr).\]

Because the model has a single parameter, the pre-treatment fit quality is summarized by an \(R^2\) (identical to the adjusted \(R^2\): the coefficient on \(\bar{y}_{\mathcal{D}, t}\) is fixed at one, so no slope is estimated and there is no degrees-of-freedom correction to make):

\[R^2_{\mathcal{D}} = 1 - \frac{\sum_{t \in \mathcal{T}_1} \hat{v}_t^2} {\sum_{t \in \mathcal{T}_1} (y_{0t} - \bar{y}_0)^2}, \qquad \hat{v}_t = y_{0t} - \bar{y}_{\mathcal{D}, t} - \hat{\alpha},\]

where \(\bar{y}_0\) is the treated unit’s pre-period mean.
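For a fixed comparison group, the four formulas above (\(\hat{\alpha}\), the counterfactual, the ATT and \(R^2_{\mathcal{D}}\)) fit in a few lines of NumPy. This is an illustrative sketch, not mlsynth's code: `did_fit` is a hypothetical helper, and the toy panel (random-walk controls, a level shift of 2, a true effect of +1) is invented for the demonstration. Note that the levels are non-stationary while the treated-minus-average gap is not, which is exactly the situation the model allows.

```python
import numpy as np


def did_fit(y0, donors, T1):
    """DiD fit of the treated series on the equal-weighted average of `donors`.

    y0     : (T,) treated outcomes; the first T1 entries are pre-treatment.
    donors : (T, k) outcomes of the comparison group D.
    Returns (alpha_hat, fitted/counterfactual path, ATT, pre-period R^2).
    """
    ybar_D = donors.mean(axis=1)                   # \bar y_{D,t}
    alpha = np.mean(y0[:T1] - ybar_D[:T1])         # least-squares intercept on the pre-period
    y0_hat = alpha + ybar_D                        # in-sample fit (t <= T1) and counterfactual (t > T1)
    att = np.mean(y0[T1:] - y0_hat[T1:])           # mean post-period gap
    v = y0[:T1] - y0_hat[:T1]                      # pre-period residuals \hat v_t
    ss_tot = np.sum((y0[:T1] - y0[:T1].mean()) ** 2)
    r2 = 1.0 - np.sum(v ** 2) / ss_tot
    return alpha, y0_hat, att, r2


# Toy panel: trending controls, treated = 2 + their average + small noise, effect +1 after T1.
rng = np.random.default_rng(0)
T, T1 = 30, 20
donors = np.cumsum(rng.standard_normal((T, 3)), axis=0)   # random walks: non-stationary levels
y0 = 2.0 + donors.mean(axis=1) + 0.1 * rng.standard_normal(T)
y0[T1:] += 1.0                                             # true treatment effect
alpha, y0_hat, att, r2 = did_fit(y0, donors, T1)
```

`alpha` should land close to 2 and `att` close to 1, because the stationary gap is all the model needs.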

The forward-selection algorithm

Maximizing \(R^2_{\mathcal{D}}\) is equivalent to minimizing the pre-period residual variance \(T_1^{-1} \sum_{t \in \mathcal{T}_1} \hat{v}_t^2\). Forward DiD searches over comparison groups greedily:

  1. Step 1. For each single control \(i \in \mathcal{N}\), form the one-unit comparison group and compute its pre-period \(R^2\). Keep the best single control, \(\hat{c}_1\).

  2. Step 2. Add to \(\{\hat{c}_1\}\) each of the remaining \(N - 1\) controls in turn; keep the two-unit group with the highest \(R^2\).

  3. Step 3. Continue, adding one control at a time, until all \(N\) controls are in. This yields \(N\) nested groups (sizes \(1, 2, \ldots, N\)); select the one with the largest \(R^2\) and call it \(\hat{\mathcal{D}}\) (Li’s \(\mathcal{N}_{co}\)).

The greedy search evaluates \(1 + 2 + \cdots + N = N(N+1)/2\) sub-models rather than the \(2^N\) of the exhaustive procedure (for \(N = 60\), that is 1,830 versus \(1.15 \times 10^{18}\)). The final group \(\hat{\mathcal{D}}\) is then plugged into the DiD formulas above for the ATT, its standard error, and the \(R^2\).
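Read literally, Steps 1–3 translate into a double loop that rebuilds every candidate average from scratch. The sketch below is a hypothetical, deliberately naive rendering (not mlsynth's implementation) on an invented toy panel; it also counts the sub-models it scores, which should equal \(N(N+1)/2\).

```python
import numpy as np


def pre_r2(y_pre, avg_pre):
    """Pre-period R^2 of y = alpha + avg + v, with alpha at its least-squares value."""
    y_c = y_pre - y_pre.mean()
    resid = y_c - (avg_pre - avg_pre.mean())       # \hat v_t after profiling out alpha
    return 1.0 - resid @ resid / (y_c @ y_c)


def forward_select_naive(y, Y, T1):
    """Literal greedy search (Steps 1-3): rebuild every candidate average from scratch."""
    y_pre, Y_pre = y[:T1], Y[:T1]
    remaining = list(range(Y.shape[1]))
    chosen, path, n_evals = [], [], 0
    while remaining:
        scores = []
        for c in remaining:                        # try each remaining control in turn
            avg = Y_pre[:, chosen + [c]].mean(axis=1)
            scores.append(pre_r2(y_pre, avg))
            n_evals += 1
        best = int(np.argmax(scores))
        chosen.append(remaining.pop(best))         # grow the nested group by one control
        path.append(scores[best])
    k_star = int(np.argmax(path)) + 1              # size of the best-fitting nested group
    return chosen[:k_star], np.array(path), n_evals


# Toy panel: controls 0-3 share the treated unit's factor loading (1); controls 4-7 load with 2.
rng = np.random.default_rng(1)
T, T1, N = 36, 24, 8
f = np.cumsum(rng.standard_normal(T))
loadings = np.array([1.0] * 4 + [2.0] * 4)
Y = f[:, None] * loadings + 0.3 * rng.standard_normal((T, N))
y = 1.0 + f + 0.3 * rng.standard_normal(T)
selected, path, n_evals = forward_select_naive(y, Y, T1)   # n_evals == N * (N + 1) // 2 == 36
```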

How mlsynth computes this: incremental means and a batched \(R^2\)

Read literally, each step of the algorithm re-forms a subset average from scratch and re-fits a DiD regression for every remaining candidate – an \(O(N)\) rebuild times an \(O(N)\) candidate loop times the per-candidate work, repeated \(N\) times. mlsynth’s forward_did_select() instead collapses each step into a handful of vectorized NumPy operations through three observations.

1. The comparison average is updated incrementally, never rebuilt. Let \(\mathbf{m}^{(k)} \in \mathbb{R}^{T}\) be the running average over the \(k\) already-selected controls. Adding control \(c\) gives the \((k+1)\)-average by a single running-mean update,

\[\mathbf{m}^{(k+1)} = \mathbf{m}^{(k)} + \frac{\mathbf{y}_c - \mathbf{m}^{(k)}}{k + 1},\]

which is \(O(T)\) rather than \(O(kT)\). This is _update_synthetic_control() (current_mean + (control - current_mean) / (k + 1)).
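The update identity is easy to confirm numerically: feeding controls in one at a time through the \(O(T)\) update reproduces the from-scratch average at every step. A minimal sketch on random data, written independently of `_update_synthetic_control()`:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 10, 6
Y = rng.standard_normal((T, N))      # columns = controls, in the order they are selected

m = np.zeros(T)                       # running average m^(k); k = 0 before any control is added
max_err = 0.0
for k in range(N):
    m = m + (Y[:, k] - m) / (k + 1)              # O(T) running-mean update
    rebuilt = Y[:, : k + 1].mean(axis=1)         # O(kT) rebuild, for comparison only
    max_err = max(max_err, float(np.abs(m - rebuilt).max()))
```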

2. All candidate averages for a step are built in one matrix. At step \(k\), let \(\mathbf{Y}_{\mathcal{R}} \in \mathbb{R}^{T_1 \times |\mathcal{R}|}\) stack the pre-period columns of the remaining candidates \(\mathcal{R}\). Every candidate \((k+1)\)-average – one per column – is formed simultaneously by broadcasting the running pre-period mean \(\mathbf{m}^{(k)}_{\mathcal{T}_1}\):

\[\mathbf{M} = \frac{k\,\mathbf{m}^{(k)}_{\mathcal{T}_1}\mathbf{1}^\top + \mathbf{Y}_{\mathcal{R}}}{k + 1} \in \mathbb{R}^{T_1 \times |\mathcal{R}|}.\]

In code this is the one line new_means = (current_mean_pre[:, None] * k + candidates) / (k + 1) inside _select_best_donor().
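The broadcast can be checked against a per-candidate loop. In this sketch the variable names mirror the quoted line, but the data are random stand-ins and `loop` is only a reference computation:

```python
import numpy as np

rng = np.random.default_rng(0)
T1, k = 12, 3
current_mean_pre = rng.standard_normal(T1)     # m^(k) on the pre-period, k controls already selected
candidates = rng.standard_normal((T1, 5))      # Y_R: one column per remaining control

# All (k+1)-averages at once: column l adds candidate l to the current group.
new_means = (current_mean_pre[:, None] * k + candidates) / (k + 1)

# Reference: one candidate at a time via the running-mean update.
loop = np.column_stack([
    current_mean_pre + (candidates[:, l] - current_mean_pre) / (k + 1)
    for l in range(candidates.shape[1])
])
```

The `[:, None]` turns the length-\(T_1\) mean into a column so it broadcasts across all \(|\mathcal{R}|\) candidate columns, which is the \(\mathbf{m}^{(k)}_{\mathcal{T}_1}\mathbf{1}^\top\) term of the formula.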

3. The intercept \(\alpha\) drops out, so scoring is pure inner products. This is the step that removes the per-candidate regression entirely. Profiling out \(\alpha\) from the DiD loss is exactly centering: the fitted residual for candidate column \(\ell\) is \(\hat v_t = (y_{0t} - \bar y_0) - (M_{t\ell} - \bar M_\ell)\). Writing \(\tilde{\mathbf{y}} = \mathbf{y}_{0,\mathcal{T}_1} - \bar y_0\) (precomputed once, with its norm \(\lVert\tilde{\mathbf{y}}\rVert^2 = \mathrm{ss}_{\text{tot}}\)), the residual sum of squares for all candidates is

\[\mathrm{SSR}_\ell = \mathrm{ss}_{\text{tot}} + \underbrace{\lVert \mathbf{M}_\ell - \bar M_\ell \rVert^2}_{\text{column SS}} - 2\,\underbrace{\tilde{\mathbf{y}}^\top (\mathbf{M}_\ell - \bar M_\ell)}_{\text{one matrix--vector product}}, \qquad R^2_\ell = 1 - \frac{\mathrm{SSR}_\ell}{\mathrm{ss}_{\text{tot}}}.\]

The cross term for the whole candidate set is the single matvec \(\tilde{\mathbf{y}}^\top(\mathbf{M} - \bar{\mathbf{M}})\); the column sums of squares are one reduction. This is _r2_batch() – no candidate is ever regressed, and \(\alpha\) is never explicitly solved during the search (it is recovered only once, for the winning group, in did_from_mean()).
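The batched score can be verified against an explicit intercept fit per candidate. The sketch below follows the \(\mathrm{SSR}_\ell\) expansion above; `r2_batch` and `r2_one` are hypothetical names (not mlsynth's `_r2_batch()` itself) and the data are random.

```python
import numpy as np


def r2_batch(y_pre, M):
    """R^2 for every candidate column of M (T1 x R), with alpha profiled out by centring."""
    y_tilde = y_pre - y_pre.mean()                 # centred treated series, computed once
    ss_tot = y_tilde @ y_tilde
    Mc = M - M.mean(axis=0)                        # centre each candidate average
    col_ss = (Mc ** 2).sum(axis=0)                 # ||M_l - Mbar_l||^2 for every l
    cross = y_tilde @ Mc                           # one vector-matrix product for all candidates
    ssr = ss_tot + col_ss - 2.0 * cross
    return 1.0 - ssr / ss_tot


def r2_one(y_pre, m):
    """Reference: solve for alpha explicitly for a single candidate average m."""
    alpha = np.mean(y_pre - m)
    v = y_pre - m - alpha
    return 1.0 - v @ v / np.sum((y_pre - y_pre.mean()) ** 2)


rng = np.random.default_rng(0)
y_pre = rng.standard_normal(20)
M = rng.standard_normal((20, 7))
batched = r2_batch(y_pre, M)
looped = np.array([r2_one(y_pre, M[:, l]) for l in range(M.shape[1])])
```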

Taken together, a forward step costs \(O(T_1 |\mathcal{R}|)\) and the entire search is \(O(T_1 N^2)\), with the inner loop expressed as a broadcast, a matrix–vector product, a column reduction, and an argmax – no Python-level loop over candidates and no per-candidate solve. This is what lets the implementation run the selection over \(\sim\)1,500 controls, and what makes the 12-cell, \(M = 1{,}000\)-replication Monte Carlo in Verification tractable.
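Putting the three observations together gives the whole greedy search in about twenty lines. This is a sketch in the spirit of forward_did_select(), not its source; the function name and toy panel are hypothetical. The only loop is over forward steps; within a step, every candidate is scored by the broadcast, the centring, one vector-matrix product and an argmax.

```python
import numpy as np


def forward_select_vectorized(y, Y, T1):
    """Greedy Forward DiD selection with incremental means and batched R^2.

    y : (T,) treated outcome; Y : (T, N) controls.
    Returns (selected column indices, R^2 of each nested group).
    """
    y_tilde = y[:T1] - y[:T1].mean()               # centred treated series, computed once
    ss_tot = y_tilde @ y_tilde
    Y_pre = Y[:T1]
    remaining = np.arange(Y.shape[1])
    mean_pre = np.zeros(T1)                        # running average m^(k) on the pre-period
    order, path = [], []
    for k in range(Y.shape[1]):
        M = (mean_pre[:, None] * k + Y_pre[:, remaining]) / (k + 1)   # all candidate averages
        Mc = M - M.mean(axis=0)                                      # profile out alpha
        r2 = 1.0 - (ss_tot + (Mc ** 2).sum(axis=0) - 2.0 * (y_tilde @ Mc)) / ss_tot
        best = int(np.argmax(r2))
        mean_pre = M[:, best]                      # the winner's average becomes m^(k+1)
        order.append(int(remaining[best]))
        path.append(float(r2[best]))
        remaining = np.delete(remaining, best)
    k_star = int(np.argmax(path)) + 1              # keep the best-fitting nested group
    return order[:k_star], np.array(path)


# Toy panel with 50 controls, each loading 1 or 2 on a random-walk factor.
rng = np.random.default_rng(2)
T, T1, N = 36, 24, 50
f = np.cumsum(rng.standard_normal(T))
Y = f[:, None] * rng.choice([1.0, 2.0], size=N) + 0.3 * rng.standard_normal((T, N))
y = 1.0 + f + 0.3 * rng.standard_normal(T)
selected, path = forward_select_vectorized(y, Y, T1)
```

Recomputing the \(R^2\) of `selected` directly, with an explicit intercept, should reproduce `path.max()`.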

Assumptions

Assumption 1 (Forward DiD parallel trends). There exists a subset \(\mathcal{D} \subseteq \mathcal{N}\) and a constant \(\alpha\) such that \(y^0_{0t} = \alpha + \bar{y}_{\mathcal{D}, t} + v_t\) for all \(t\), where \(v_t\) is a weakly dependent process with zero mean and finite variance.

Remark. This says the gap between the treated unit and the selected comparison group is stable across the pre- and post-periods up to a mean-zero shock. It is strictly weaker than DiD’s requirement that all controls be parallel: it asks only that some equal-weighted subset be parallel. Both \(y^0_{0t}\) and \(\bar{y}_{\mathcal{D}, t}\) may trend arbitrarily (even non-linearly), so long as their difference is trendless – which is what makes the method valid under non-stationarity.

Assumption 2 (no anticipation / no interference). Controls are untreated throughout, and the treated unit’s outcome equals its untreated potential outcome in the pre-period.

Remark. Standard in the DiD/SC literature. It is what lets the pre-period identify the comparison group: if controls were themselves affected by the intervention (spillover), their pre/post relationship to the treated unit would shift and selection would be biased.

Assumption 3 (regularity for inference). The partial sums of \(v_t\) obey a central limit theorem; errors are weakly dependent with finite long-run variance.

Remark. This is what delivers the asymptotic normality in Proposition 2.1 below, and it holds for the broad class of stationary, weakly-dependent error processes – it does not require \(v_t\) to be i.i.d. or the levels \(y^0_{0t}\) to be stationary.

When not to use Forward DiD

Assumption 1 fails when no subset of controls can track the treated unit – most importantly when the treated unit lies outside the range of the controls (e.g. its outcome trends upward more steeply than every control’s). Equal weights cannot extrapolate beyond the controls, so no selection rescues it. In that regime Li points to methods that let the treated unit sit outside the control hull: the augmented DiD ([ADID]), factor-model / interactive-fixed-effect estimators, or SC with an intercept.

The pretreatment \(R^2\) returned by FDID is the natural empirical check on Assumption 1. The script below draws two panels with the same underlying common factor and the same true ATT of zero, differing only in the treated unit’s factor loading:

  • Panel A (treated_loading = 1). The treated unit shares the controls’ single-factor loading. Assumption 1 holds for any subset of the donors.

  • Panel B (treated_loading = 3). The treated unit loads on the factor three times as heavily as every control, so it moves three times as far whenever the factor moves. No equal-weighted subset of the donors can reproduce that amplitude, so Assumption 1 fails.

import numpy as np
import pandas as pd

from mlsynth import FDID


def make_panel(*, treated_loading, n_controls=40, T1=24, T2=12, seed=0):
    """Synthetic panel with one common smoothly-trending factor.

    The treated unit's loading on the factor is ``treated_loading``; every
    control loads with coefficient 1. True ATT = 0 by construction.
    """
    rng = np.random.default_rng(seed)
    T = T1 + T2
    f = np.cumsum(rng.standard_normal(T)) / np.sqrt(T)
    eps_tr = 0.10 * rng.standard_normal(T)
    eps_co = 0.10 * rng.standard_normal((n_controls, T))
    y_tr = 1.0 + treated_loading * f + eps_tr
    y_co = 1.0 + 1.0 * f[None, :] + eps_co
    rows = []
    for t in range(T):
        rows.append({"unit": "treated", "time": t, "y": float(y_tr[t]),
                     "treat": int(t >= T1)})
        for i in range(n_controls):
            rows.append({"unit": f"c{i}", "time": t, "y": float(y_co[i, t]),
                         "treat": 0})
    return pd.DataFrame(rows)


for label, loading in [("Forward PTA holds (loading=1)", 1.0),
                        ("Forward PTA fails (loading=3)", 3.0)]:
    df = make_panel(treated_loading=loading, seed=0)
    res = FDID({"df": df, "outcome": "y", "treat": "treat",
                 "unitid": "unit", "time": "time",
                 "display_graphs": False}).fit()
    print(f"{label:35s}  FDID ATT = {res.fdid.att:+.3f}  "
           f"R^2 = {res.fdid.r_squared:.3f}  "
           f"selected {len(res.fdid.selected_names)} donors")

prints:

Forward PTA holds (loading=1)        FDID ATT = -0.009  R^2 = 0.975  selected 4 donors
Forward PTA fails (loading=3)        FDID ATT = -0.802  R^2 = 0.588  selected 2 donors

Two lessons jump out:

  1. The \(R^2\) is the warning signal. When Forward PTA holds, FDID hits \(R^2 \approx 0.98\) and recovers the true zero ATT to within noise. When it fails, the in-sample fit drops to \(R^2 \approx 0.59\) – a much weaker fit on a panel of the same dimensions. Compare the two against the same threshold you would apply in a forecast exercise (Li’s empirical applications report \(R^2\) of 0.76-0.91 on Atlanta / San Diego / San Jose). When the pre-fit is weak, distrust the post-fit ATT.

  2. The bias is large, and its sign follows the common factor. The controls move one-for-one with the factor while the treated unit moves three-for-one, so the DiD intercept can absorb only the pre-period level gap; any shift in the factor between the pre- and post-periods passes into the ATT at twice its size. In this draw the estimate lands at \(-0.80\) against a true 0, and a draw in which the factor shifted the other way would bias it upward instead. A pre-period placebo will typically be off as well, because the in-sample residuals inherit the factor’s serially correlated movements instead of behaving like noise.
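To see how the sign of the bias is set, the sketch below reuses make_panel's data recipe (treated loading 3, 40 controls, \(T_1 = 24\), \(T_2 = 12\)) over 200 seeds. To isolate the mechanism it skips the forward search and averages all controls, which are exchangeable here; `did_att_all_controls` is a hypothetical helper. Under that simplification the DiD estimate should track \((\text{loading} - 1)\) times the shift in the factor's mean between the pre- and post-periods, and it comes out negative in some draws and positive in others.

```python
import numpy as np


def did_att_all_controls(y_tr, Y_co, T1):
    """Plain DiD ATT using the equal-weighted average of every control (rows of Y_co)."""
    ybar = Y_co.mean(axis=0)
    alpha = np.mean(y_tr[:T1] - ybar[:T1])
    return float(np.mean(y_tr[T1:] - alpha - ybar[T1:]))


T1, T2, n_controls, loading = 24, 12, 40, 3.0
T = T1 + T2
att_hat, predicted = [], []
for seed in range(200):
    rng = np.random.default_rng(seed)
    f = np.cumsum(rng.standard_normal(T)) / np.sqrt(T)       # same factor recipe as make_panel
    y_tr = 1.0 + loading * f + 0.10 * rng.standard_normal(T)
    Y_co = 1.0 + f[None, :] + 0.10 * rng.standard_normal((n_controls, T))
    att_hat.append(did_att_all_controls(y_tr, Y_co, T1))
    # Treated minus controls is (loading - 1) * f + noise; alpha removes only its pre-period mean.
    predicted.append((loading - 1.0) * (f[T1:].mean() - f[:T1].mean()))
att_hat, predicted = np.array(att_hat), np.array(predicted)
share_negative = float(np.mean(att_hat < 0))   # fraction of draws with a negative estimate
```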

If your application reports \(R^2\) materially below the threshold you would consider acceptable for a forecast (say, < 0.7), do not read the ATT as a causal effect: it is at least partly a symptom of misspecification. Switch to one of the methods Li flags for the out-of-hull case, namely the augmented DiD, a factor-model / interactive-fixed-effects estimator, or synthetic control with an intercept. No choice of equal-weighted comparison group within FDID will recover the effect when the treated unit sits outside the range of the controls.
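One inexpensive pre-period check is an in-time placebo: pretend the intervention started a few periods before \(T_1\), fit only on the periods before that fake date, and see whether the placebo "effect" is near zero. A sketch under stated assumptions: it applies plain DiD to a fixed comparison group (here all controls) rather than re-running the forward search, and `did_gap` / `in_time_placebo` are hypothetical helpers, not mlsynth functions. With treated loading 1 the placebo gap is pure noise; with loading 3 it equals twice the factor's mean shift inside the pre-period plus noise, so it is informative only when the factor actually moves there.

```python
import numpy as np


def did_gap(y, donors, t0):
    """Mean gap after t0 for the DiD fit y_t = alpha + mean(donors_t) + v_t on t < t0."""
    ybar = donors.mean(axis=1)
    alpha = np.mean(y[:t0] - ybar[:t0])
    return float(np.mean(y[t0:] - alpha - ybar[t0:]))


def in_time_placebo(y, donors, T1, n_placebo):
    """Pretend treatment started n_placebo periods before T1, using pre-period data only."""
    return did_gap(y[:T1], donors[:T1], T1 - n_placebo)


T1, T2, n_controls = 24, 12, 40
T = T1 + T2
placebo = {}
for loading in (1.0, 3.0):
    rng = np.random.default_rng(0)
    f = np.cumsum(rng.standard_normal(T)) / np.sqrt(T)
    y = 1.0 + loading * f + 0.10 * rng.standard_normal(T)
    donors = 1.0 + f[:, None] + 0.10 * rng.standard_normal((T, n_controls))
    placebo[loading] = in_time_placebo(y, donors, T1, n_placebo=8)
```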

Inference

Because Forward DiD estimates a single parameter, its inference is the textbook DiD inference. Let \(\hat{\sigma}^2_{\mathcal{D}} = T_1^{-1} \sum_{t \in \mathcal{T}_1} \hat{v}_t^2\) be the pre-period residual variance on the selected group. Li’s Proposition 2.1 establishes

\[\left| \Pr\!\left( \frac{\sqrt{T_2}\,(\widehat{\mathrm{ATT}} - \mathrm{ATT})} {\hat{\sigma}_{\mathcal{D}}} \le a \right) - \Phi(a) \right| \to 0, \qquad a \in \mathbb{R},\]

as \(T_1, T_2 \to \infty\), where \(\Phi\) is the standard-normal CDF. mlsynth reports a standard error that also carries the estimation error in \(\hat{\alpha}\):

\[\mathrm{SE}(\widehat{\mathrm{ATT}}) = \hat{\sigma}_{\mathcal{D}} \sqrt{\frac{1}{T_1} + \frac{1}{T_2}},\]

since \(\widehat{\mathrm{ATT}} - \mathrm{ATT} = -T_1^{-1} \sum_{\mathcal{T}_1} v_t + T_2^{-1} \sum_{\mathcal{T}_2} v_t\) contributes one \(1/T_1\) and one \(1/T_2\) variance term. This collapses to Proposition 2.1’s \(\hat{\sigma}_{\mathcal{D}} / \sqrt{T_2}\) when \(T_1 \gg T_2\). The 95% Wald interval and two-sided p-value follow in the usual way.
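As a worked example of the SE formula and the Wald interval, the sketch below implements them as written above (divisor \(T_1\) in \(\hat{\sigma}^2_{\mathcal{D}}\)); it is not mlsynth's code, and `att_inference` is a hypothetical name. With residuals of mean square 1, \(T_1 = 24\), \(T_2 = 12\) and an ATT of 0.5, the standard error is \(\sqrt{1/24 + 1/12} = \sqrt{0.125} \approx 0.354\), so the z-statistic is about 1.41 and the two-sided p-value about 0.157.

```python
from statistics import NormalDist

import numpy as np


def att_inference(v_pre, att, T2, level=0.95):
    """SE, Wald CI and two-sided p-value for the ATT, following the formulas above."""
    T1 = v_pre.size
    sigma = np.sqrt(np.mean(v_pre ** 2))             # \hat sigma_D from pre-period residuals
    se = sigma * np.sqrt(1.0 / T1 + 1.0 / T2)        # carries alpha-hat error (1/T1) and post noise (1/T2)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)      # about 1.96 for level = 0.95
    ci = (att - z * se, att + z * se)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(att) / se))
    return se, ci, p_value


# Worked example: residuals with mean square 1, T1 = 24, T2 = 12, ATT = 0.5.
v_pre = np.array([1.0, -1.0] * 12)
se, ci, p_value = att_inference(v_pre, att=0.5, T2=12)
```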

Consistency of the selection

Li also shows the greedy search recovers a valid comparison group. Under Assumption 1 and the appendix’s regularity conditions, with \(N\) fixed, the empirical forward selection selects (one of) the same subset(s) the infeasible procedure based on true error variances would select, with probability approaching one as \(T_1 \to \infty\) (Proposition 2.2; Proposition D.1 handles ties). Proposition D.2 extends this to the case where \(N\) grows with \(T_1\) under a latent group structure. Intuitively, by the law of large numbers each step’s empirical \(R^2\) converges to its population value, so the greedy path tracks the population-optimal path.

Example

The block below is self-contained. It draws one panel from Li’s Web Appendix E data-generating process (three common factors, 60 controls), in the configuration where half the controls are the wrong comparison: the treated unit and the first 30 controls load on the common factors with weight 1, while the last 30 load with weight 2 (Li’s “DGP2”). The true ATT is zero. Forward DiD should select from the matching half and beat plain DiD, which is contaminated by the mismatched half.

import numpy as np
from mlsynth import FDID
from mlsynth.utils.fdid_helpers.simulation import simulate_fdid_sample

sample = simulate_fdid_sample(dgp=2, N=60, T1=24, T2=12,
                               rng=np.random.default_rng(0))

res = FDID({"df": sample.df, "outcome": "y", "treat": "treat",
            "unitid": "unit", "time": "time",
            "display_graphs": False}).fit()

sel = res.fdid.selected_names
matching = sum(int(s[1:]) < 60 // 2 for s in sel)
print(f"FDID: ATT={res.fdid.att:+.3f}  R2={res.fdid.r_squared:.3f}  "
      f"selected {len(sel)} donors, {matching} from the matching group")
print(f"DID : ATT={res.did.att:+.3f}  R2={res.did.r_squared:.3f}  (all 60 donors)")

A representative single draw prints:

FDID: ATT=-0.556  R2=0.918  selected 4 donors, 4 from the matching group
DID : ATT=-0.924  R2=0.632  (all 60 donors)

Forward DiD picks only matching controls, lifting the pre-fit \(R^2\) from 0.63 to 0.92 and landing closer to the true zero effect than DiD – which is dragged off by the 30 mismatched controls it is forced to include. (A single draw is noisy; the averaged behaviour over many draws is in Verification below.)

res is an FDIDResults: res.fdid and res.did are the two FDIDMethodFit objects, the convenience accessors (res.att, res.att_se, res.counterfactual, res.donor_weights) forward to the Forward DiD fit, and res.att_by_method() / res.ci_by_method() return both side by side.

Empirical Illustration: Hong Kong’s economic integration

Forward DiD is the DiD analogue of the forward-selected panel-data approach ([fsPDA]), and it shines on exactly the data those methods target. We use the Hsiao, Ching and Wan ([HCW]) panel of quarterly real-GDP growth for Hong Kong and 24 comparison economies, with Hong Kong’s economic integration with mainland China as the intervention (44 pre-treatment quarters, 17 post).

import pandas as pd
from mlsynth import FDID

url = "https://raw.githubusercontent.com/jgreathouse9/mlsynth/main/basedata/HongKong.csv"
df = pd.read_csv(url)

res = FDID({"df": df, "outcome": "GDP", "treat": "Integration",
            "unitid": "Country", "time": "Time", "display_graphs": True}).fit()

print(f"FDID ATT {res.fdid.att:.4f}  SE {res.fdid.att_se:.4f}  "
      f"R2 {res.fdid.r_squared:.3f}  ({len(res.fdid.selected_names)} of "
      f"{res.inputs.n_donors} controls)")
print("selected:", res.fdid.selected_names)
print(f"DID  ATT {res.did.att:.4f}  SE {res.did.att_se:.4f}  R2 {res.did.r_squared:.3f}")

This prints:

FDID ATT 0.0254  SE 0.0046  R2 0.843  (9 of 24 controls)
selected: ['Philippines', 'Singapore', 'Thailand', 'Norway', 'Mexico',
           'Korea', 'Indonesia', 'New Zealand', 'Malaysia']
DID  ATT 0.0317  SE 0.0082  R2 0.505

Forward DiD keeps 9 of the 24 economies (seven Asia-Pacific economies plus Norway and Mexico) and in doing so lifts the pre-intervention \(R^2\) from 0.51 (all-controls DiD) to 0.84, while cutting the standard error from 0.0082 to 0.0046. The selected comparison group implies a post-integration GDP-growth effect of about +2.5 percentage points, more precisely estimated and better-fitting than the all-controls DiD’s +3.2.

Verification

Monte Carlo replication (Path B). Li’s empirical application – the effect of opening physical stores on an online-first retailer’s city-level sales – runs on a confidential retailer dataset, so it cannot be reproduced value-for-value. Per the project’s replication contract (agents/agents_estimators.md), Forward DiD is therefore validated by reproducing the paper’s own Monte Carlo, Web Appendix E.

The four DGPs and their factor structure are all packaged in mlsynth.utils.fdid_helpers.simulation.simulate_fdid_sample(): three common factors – f1 AR(1) 0.8, f2 ARMA(1,1) (-0.6, 0.8), f3 MA(2) (0.9, 0.4), innovations \(N(0,1)\) – with outcomes \(a_0 + c_0 \sum_k f_{kt} + \varepsilon\) for the treated unit and \(1 + c \sum_k f_{kt} + \varepsilon\) for the controls (first half loading \(c_1\), second half \(c_2\)). Four DGPs vary \((a_0, c_0, c_1, c_2)\): DGP1 (1,1,1,1) and DGP3 (2,1,1,1) (all controls match – DiD is applicable); DGP2 (1,1,1,2) and DGP4 (2,1,1,2) (half the controls have the wrong loading – DiD breaks). True ATT \(= 0\) and \(\mathrm{PMSE} = M^{-1} \sum_j \widehat{\mathrm{ATT}}_j^2\).
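For readers without mlsynth, the design can be sketched directly in NumPy. This is an independent reading of the description above, not `simulate_fdid_sample()`, and three details are assumptions: the ARMA(1,1) pair (-0.6, 0.8) is read as (AR, MA), the idiosyncratic errors \(\varepsilon\) are \(N(0,1)\), and a 100-period burn-in is discarded. Under these assumptions, plain DiD's PMSE for DGP1 at \((24, 12)\) should sit near the analytical value \((1 + 1/60)(1/12 + 1/24) \approx 0.127\), and it is clearly inflated under DGP2.

```python
import numpy as np

# (a0, c0, c1, c2) for each DGP, as listed above.
DGP_PARAMS = {1: (1, 1, 1, 1), 2: (1, 1, 1, 2), 3: (2, 1, 1, 1), 4: (2, 1, 1, 2)}


def simulate_factors(T, rng, burn=100):
    """f1 AR(1) 0.8; f2 ARMA(1,1), AR -0.6 / MA 0.8 (assumed order); f3 MA(2) 0.9, 0.4."""
    n = T + burn
    u = rng.standard_normal((3, n + 2))             # N(0,1) innovations; 2 extra for MA lags
    f1, f2 = np.zeros(n), np.zeros(n)
    for t in range(1, n):
        f1[t] = 0.8 * f1[t - 1] + u[0, t + 2]
        f2[t] = -0.6 * f2[t - 1] + u[1, t + 2] + 0.8 * u[1, t + 1]
    f3 = u[2, 2:] + 0.9 * u[2, 1:-1] + 0.4 * u[2, :-2]
    return np.vstack([f1, f2, f3])[:, burn:]        # drop the burn-in, shape (3, T)


def simulate_panel(dgp, N, T1, T2, rng):
    """Treated series (T,) and controls (T, N); true ATT = 0 (N(0,1) errors assumed)."""
    a0, c0, c1, c2 = DGP_PARAMS[dgp]
    T = T1 + T2
    F = simulate_factors(T, rng).sum(axis=0)        # sum_k f_kt
    c = np.where(np.arange(N) < N // 2, c1, c2)     # first half load c1, second half c2
    y_tr = a0 + c0 * F + rng.standard_normal(T)
    Y_co = 1.0 + np.outer(F, c) + rng.standard_normal((T, N))
    return y_tr, Y_co


def did_pmse(dgp, N=60, T1=24, T2=12, M=300, seed=0):
    """PMSE of plain (all-controls) DiD over M independent draws."""
    sq = []
    for j in range(M):
        y_tr, Y_co = simulate_panel(dgp, N, T1, T2, np.random.default_rng(seed + j))
        ybar = Y_co.mean(axis=1)
        alpha = np.mean(y_tr[:T1] - ybar[:T1])
        sq.append(np.mean(y_tr[T1:] - alpha - ybar[T1:]) ** 2)
    return float(np.mean(sq))


pmse_dgp1, pmse_dgp2 = did_pmse(1), did_pmse(2)
```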

Replicating Table 5 takes only a short script:

import numpy as np
from mlsynth import FDID
from mlsynth.utils.fdid_helpers.simulation import simulate_fdid_sample

def pmse_cell(dgp, N, T1, T2, M, seed=0):
    fdid_sq, did_sq = [], []
    for j in range(M):
        rng = np.random.default_rng(seed + j)
        sample = simulate_fdid_sample(dgp=dgp, N=N, T1=T1, T2=T2, rng=rng)
        res = FDID({"df": sample.df, "outcome": "y", "treat": "treat",
                    "unitid": "unit", "time": "time",
                    "display_graphs": False, "verbose": False}).fit()
        fdid_sq.append(res.fdid.att ** 2)   # true ATT = 0, so squared error = att**2
        did_sq.append(res.did.att ** 2)
    return float(np.mean(fdid_sq)), float(np.mean(did_sq))

for dgp in (1, 2, 3, 4):
    for T1, T2 in [(12, 6), (24, 12), (48, 24)]:
        fdid_pmse, did_pmse = pmse_cell(dgp, N=60, T1=T1, T2=T2, M=1000)
        print(f"DGP{dgp} ({T1},{T2}): FDID={fdid_pmse:.4f}  DID={did_pmse:.4f}")

At \(M = 1{,}000\) replications per cell (Li uses \(M = 10{,}000\); the replication count is the only change to the design), this reproduces Table 5 closely:

DGP  \((T_1, T_2)\)  DID (mlsynth)  DID (Li)  FDID (mlsynth)  FDID (Li)
1    (12, 6)         0.265          0.259     0.325           0.315
1    (24, 12)        0.127          0.128     0.147           0.146
1    (48, 24)        0.065          0.063     0.075           0.071
2    (12, 6)         1.202          1.037     0.431           0.385
2    (24, 12)        0.765          0.746     0.177           0.180
2    (48, 24)        0.451          0.473     0.084           0.082
3    (12, 6)         0.265          0.252     0.325           0.303
3    (24, 12)        0.127          0.123     0.147           0.143
3    (48, 24)        0.065          0.064     0.075           0.072
4    (12, 6)         1.202          1.038     0.431           0.391
4    (24, 12)        0.765          0.744     0.177           0.171
4    (48, 24)        0.451          0.454     0.084           0.081

The two headline findings reproduce. When all controls are valid (DGP1, DGP3), DiD is the parsimonious efficient choice and edges out Forward DiD at every horizon, with a PMSE about 13 to 19 percent lower in the mlsynth columns. When half the controls are mismatched (DGP2, DGP4), DiD’s PMSE stays roughly three to five times Forward DiD’s at every horizon (DGP2 at \((48,24)\): DID still 0.451), because the contaminating controls bias the all-controls average; Forward DiD’s PMSE collapses (0.084) because the forward search discards them. Forward DiD pays a modest efficiency cost when DiD is valid and wins decisively when it is not, which is Li’s central result. The identical mlsynth values for DGP1/DGP3 (and DGP2/DGP4), which the script draws from the same seeds, also confirm the estimator’s intercept invariance: moving \(a_0\) from 1 to 2 changes nothing because Forward DiD’s \(\widehat\alpha\) absorbs it. The \((12, 6)\) cell runs hotter than Li’s under DGP2/4 (DID 1.202 vs 1.037, FDID 0.431 vs 0.385), plausibly Monte Carlo noise at \(M = 1{,}000\) vs Li’s \(M = 10{,}000\).
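Whether a cell-level gap is Monte Carlo noise can be quantified by attaching a standard error to each PMSE: with \(M\) independent draws the PMSE is a sample mean of squared errors, so its standard error is the sample standard deviation of those squares divided by \(\sqrt{M}\). A minimal sketch with hypothetical helper names; it treats the published value as fixed, ignoring the smaller Monte Carlo error in Li's \(M = 10{,}000\) estimate. The draws below are simulated stand-ins that show the mechanics, not replication output; in practice one would keep the per-draw ATTs that pmse_cell() squares.

```python
import numpy as np


def pmse_with_se(att_draws):
    """PMSE (true ATT = 0) and its Monte Carlo standard error from M independent draws."""
    sq = np.asarray(att_draws, dtype=float) ** 2     # squared error of each draw
    return float(sq.mean()), float(sq.std(ddof=1) / np.sqrt(sq.size))


def gap_in_se(pmse, se, reference):
    """Distance from a published PMSE, in Monte Carlo standard errors of our estimate."""
    return (pmse - reference) / se


# Stand-in draws: ATT estimates with variance 1.2 (illustrative only).
rng = np.random.default_rng(0)
draws = rng.normal(0.0, np.sqrt(1.2), size=1000)
pmse, se = pmse_with_se(draws)
```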

For reference, Li’s confidential store-opening study reports a Forward DiD effect of opening a store in Atlanta of +$75,143 in monthly sales (an 86% lift, pre-period \(R^2 = 0.76\)), with DiD and SC – which fit Atlanta’s steep pre-trend poorly – overstating it.

Core API

Forward Difference-in-Differences (FDID) estimator.

Implements the forward-selection difference-in-differences method of Li (2023), Frontiers: A Simple Forward Difference-in-Differences Method, Marketing Science. FDID greedily grows the control group one donor at a time, keeping the subset that maximises pre-treatment fit, and reports both the forward-selected estimate (FDID) and the textbook all-donor difference-in-differences benchmark (DID), each with Li (2023) analytical standard errors.

The estimator is a thin orchestration layer over mlsynth.utils.fdid_helpers: it validates configuration, prepares the panel, runs forward selection, assembles a typed FDIDResults, and optionally plots the counterfactuals.

class mlsynth.estimators.fdid.FDID(config: FDIDConfig | dict)

Bases: object

Forward Difference-in-Differences (FDID) estimator.

Parameters:

config (FDIDConfig or dict) – Validated configuration (or a compatible dictionary). See mlsynth.config_models.FDIDConfig for the available fields (df, outcome, treat, unitid, time, display_graphs, save, counterfactual_color, treated_color, verbose).

References

Li, K. T. (2023). Frontiers: A Simple Forward Difference-in-Differences Method. Marketing Science, 43(2), 267-279. https://doi.org/10.1287/mksc.2022.0212

Examples

>>> import pandas as pd
>>> from mlsynth import FDID
>>> url = "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/basque_data.csv"
>>> data = pd.read_csv(url)
>>> config = {
...     "df": data,
...     "outcome": data.columns[2],
...     "treat": data.columns[-1],
...     "unitid": data.columns[0],
...     "time": data.columns[1],
...     "display_graphs": False,
... }
>>> results = FDID(config).fit()
>>> round(results.att, 3)
fit() → FDIDResults

Run forward selection and return the typed FDID results.

Returns:

FDIDResults – Container exposing the forward-selected fdid fit (primary) and the all-donor did benchmark, plus convenience aliases (att, att_se, counterfactual, gap, donor_weights).

Raises:
  • MlsynthDataError – If panel balancing or data preparation fails.

  • MlsynthEstimationError – If there are too few pre-periods or forward selection fails.

Configuration

class mlsynth.config_models.FDIDConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', verbose: bool = True)

Configuration for the Forward Difference-in-Differences (FDID) estimator. Inherits all common configuration parameters from BaseEstimatorConfig.

Additional Parameters

plot_did : bool, default=True

Whether to display a plot for the standard DID estimator. Has no effect on FDID or ADID plots.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to pydantic’s ConfigDict (pydantic.config.ConfigDict).

verbose: bool

Result Containers

FDID.fit() returns an FDIDResults, whose fdid and did fields each hold an FDIDMethodFit (counterfactual, gap, ATT, analytical standard error, 95% CI, p-value, pre-period RMSE and \(R^2\), selected donor names and equal weights, and – for Forward DiD – the \(R^2\) selection path). The prepared panel is exposed as an FDIDInputs.

Frozen dataclasses for the Forward Difference-in-Differences estimator.

FDID (Li 2023, Frontiers: A Simple Forward Difference-in-Differences Method, Marketing Science) builds the control group for a single treated unit by forward selection: it greedily adds the donor that most improves pre-treatment fit (R^2 between the treated unit and the running donor average), tracks the R^2 path, and keeps the subset that maximises it. The synthetic control is the simple average of the selected donors, with a difference-in-differences intercept.

Two estimates are always returned side by side:

  • FDID – the forward-selected difference-in-differences (best donor subset).

  • DID – the textbook two-way difference-in-differences using all donors (the average of every control unit). This is the natural benchmark the forward search improves upon.

Both carry Li (2023) analytical standard errors. The three layers below (inputs, per-method fit, top-level results) mirror the CLUSTERSC / PROXIMAL container design used elsewhere in mlsynth.

class mlsynth.utils.fdid_helpers.structures.FDIDInputs(y: ~numpy.ndarray, donor_matrix: ~numpy.ndarray, pre_periods: int, post_periods: int, T: int, donor_names: ~typing.Sequence, time_labels: ~numpy.ndarray, treated_unit_name: ~typing.Any, verbose: bool = True, prepped: ~typing.Dict[str, ~typing.Any] = <factory>)

Bases: object

Preprocessed panel data for the FDID pipeline.

Parameters:
  • y (np.ndarray) – Treated-unit outcome over all T periods, shape (T,).

  • donor_matrix (np.ndarray) – Donor outcomes, shape (T, n_donors).

  • pre_periods (int) – Number of pre-treatment periods T0 (\(T_1\) in the formulation above).

  • post_periods (int) – Number of post-treatment periods T1 = T - T0 (\(T_2\) in the formulation above).

  • T (int) – Total number of periods.

  • donor_names (Sequence) – Length-n_donors donor labels (column order of donor_matrix).

  • time_labels (np.ndarray) – Length-T time labels.

  • treated_unit_name (Any) – Identifier of the treated unit.

  • verbose (bool) – Whether the forward-selection path is recorded step by step.

  • prepped (dict) – The raw mlsynth.utils.datautils.dataprep() dictionary, kept so the plotter can reuse the prepared panel.

T: int
donor_matrix: ndarray
donor_names: Sequence
property n_donors: int

Number of donor units.

post_periods: int
pre_periods: int
prepped: Dict[str, Any]
time_labels: ndarray
treated_unit_name: Any
verbose: bool = True
y: ndarray
class mlsynth.utils.fdid_helpers.structures.FDIDMethodFit(name: str, counterfactual: numpy.ndarray, gap: numpy.ndarray, att: float, att_se: float, att_percent: float, satt: float, pre_rmse: float, r_squared: float, intercept: float, p_value: float, ci: typing.Tuple[float, float], selected_indices: typing.List[int], selected_names: typing.List[typing.Any], donor_weights: typing.Dict[typing.Any, float], r2_path: numpy.ndarray | None = None, intermediary: list | None = None, metadata: typing.Dict[str, typing.Any] = <factory>)

Bases: object

Single FDID/DID fit output.

Parameters:
  • name (str) – Method identifier ("FDID" or "DID").

  • counterfactual (np.ndarray) – Estimated counterfactual outcome path, shape (T,).

  • gap (np.ndarray) – Observed treated minus counterfactual, shape (T,).

  • att (float) – Mean post-treatment treatment effect.

  • att_se (float) – Li (2023) analytical standard error of the ATT.

  • att_percent (float) – ATT as a percentage of the post-period counterfactual mean.

  • satt (float) – Standardised ATT (att / se * sqrt(T1)).

  • pre_rmse (float) – Root-mean-squared pre-treatment fit error.

  • r_squared (float) – Pre-treatment R^2 of the difference-in-differences fit.

  • intercept (float) – Difference-in-differences intercept: the treated unit's pre-period mean minus the pre-period mean of the donor average.

  • p_value (float) – Two-sided p-value for the ATT.

  • ci (tuple of float) – (lower, upper) 95% confidence interval for the ATT.

  • selected_indices (list of int) – Column indices of the donors retained (all donors for DID).

  • selected_names (list) – Donor labels corresponding to selected_indices.

  • donor_weights (dict) – Mapping {donor_name: weight} (equal weights over the selected donors).

  • r2_path (np.ndarray or None) – R^2 after each forward-selection step (FDID only; None for DID).

  • intermediary (list or None) – Per-step diagnostics when verbose (FDID only).

  • metadata (dict) – Free-form per-method diagnostics.
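Several of these fields are simple functions of the treated path and the counterfactual. The following is a minimal NumPy sketch. It assumes that pre_rmse is the RMSE of the pre-period gap, that att_percent divides the ATT by the post-period counterfactual mean (as stated above), and that r_squared is 1 - SSR/SST over the pre-period. The helper fit_summary is hypothetical and not part of mlsynth.

```python
import numpy as np


def fit_summary(y: np.ndarray, counterfactual: np.ndarray, pre_periods: int) -> dict:
    """Hypothetical helper: FDIDMethodFit-style summary numbers from two paths."""
    gap = y - counterfactual                        # observed minus counterfactual, shape (T,)
    pre_gap, post_gap = gap[:pre_periods], gap[pre_periods:]
    att = float(post_gap.mean())                    # mean post-treatment effect
    # ATT relative to the post-period counterfactual mean, in percent
    att_percent = 100.0 * att / float(counterfactual[pre_periods:].mean())
    pre_rmse = float(np.sqrt(np.mean(pre_gap ** 2)))
    y_pre = y[:pre_periods]
    ss_tot = float(np.sum((y_pre - y_pre.mean()) ** 2))
    r_squared = 1.0 - float(np.sum(pre_gap ** 2)) / ss_tot   # 1 - SSR / SST
    return {"att": att, "att_percent": att_percent,
            "pre_rmse": pre_rmse, "r_squared": r_squared}


y = np.array([1.0, 2.0, 3.0, 4.0, 9.0, 10.0])
cf = np.array([1.1, 1.9, 3.1, 3.9, 5.0, 6.0])
result = fit_summary(y, cf, pre_periods=4)
print(result)   # att 4.0, pre_rmse ~0.1, r_squared ~0.992
```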

att: float
att_percent: float
att_se: float
ci: Tuple[float, float]
counterfactual: ndarray
donor_weights: Dict[Any, float]
gap: ndarray
intercept: float
intermediary: list | None = None
metadata: Dict[str, Any]
name: str
p_value: float
pre_rmse: float
r2_path: ndarray | None = None
r_squared: float
satt: float
selected_indices: List[int]
selected_names: List[Any]
class mlsynth.utils.fdid_helpers.structures.FDIDResults(inputs: mlsynth.utils.fdid_helpers.structures.FDIDInputs, fdid: mlsynth.utils.fdid_helpers.structures.FDIDMethodFit, did: mlsynth.utils.fdid_helpers.structures.FDIDMethodFit, selected_variant: str = 'FDID', metadata: typing.Dict[str, typing.Any] = <factory>)

Bases: object

Top-level container returned by mlsynth.FDID.fit().

Parameters:
  • inputs (FDIDInputs) – Preprocessed panel.

  • fdid (FDIDMethodFit) – Forward-selected difference-in-differences fit (primary).

  • did (FDIDMethodFit) – Textbook difference-in-differences using all donors.

  • selected_variant (str) – Which fit the convenience aliases att, att_se, counterfactual, gap, donor_weights, and pre_rmse expose: "FDID" or "DID". Defaults to "FDID".

  • metadata (dict) – Free-form pipeline diagnostics.

property att: float

ATT of the primary variant.

att_by_method() → Dict[str, float]

{method: ATT} for both fits.

property att_se: float

ATT standard error of the primary variant.

ci_by_method() → Dict[str, Tuple[float, float]]

{method: (lower, upper)} confidence intervals for both fits.

property counterfactual: ndarray

Counterfactual of the primary variant.

did: FDIDMethodFit
property donor_weights: Dict[Any, float]

Donor weights of the primary variant.

fdid: FDIDMethodFit
property gap: ndarray

Gap of the primary variant.

inputs: FDIDInputs
metadata: Dict[str, Any]
property methods: Dict[str, FDIDMethodFit]

{method_name: fit} for both fits, FDID first.

property pre_rmse: float

Pre-treatment RMSE of the primary variant.

se_by_method() → Dict[str, float]

{method: ATT standard error} for both fits.

selected_variant: str = 'FDID'
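The convenience aliases forward to whichever fit selected_variant names. Below is a stripped-down sketch of that pattern. It assumes frozen dataclasses, as the module docstring describes. MiniFit and MiniResults are hypothetical stand-ins that keep only a few fields of FDIDMethodFit and FDIDResults.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class MiniFit:
    """Hypothetical stand-in for FDIDMethodFit (two fields kept)."""
    name: str
    att: float


@dataclass(frozen=True)
class MiniResults:
    """Hypothetical stand-in for FDIDResults."""
    fdid: MiniFit
    did: MiniFit
    selected_variant: str = "FDID"
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def methods(self) -> Dict[str, MiniFit]:
        # FDID first, mirroring the documented ordering
        return {"FDID": self.fdid, "DID": self.did}

    @property
    def att(self) -> float:
        # Alias: forward to the fit named by selected_variant
        return self.methods[self.selected_variant].att

    def att_by_method(self) -> Dict[str, float]:
        return {name: fit.att for name, fit in self.methods.items()}


res = MiniResults(fdid=MiniFit("FDID", 1.5), did=MiniFit("DID", 3.0))
print(res.att)                                     # 1.5
print(res.att_by_method())                         # {'FDID': 1.5, 'DID': 3.0}
print(MiniResults(res.fdid, res.did, "DID").att)   # 3.0
```

Keeping both fits in the container and switching only the alias target means the benchmark DID estimate is never lost when FDID is the headline result.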

Helper Modules

Data preparation – balances the panel, pivots it, validates the pre-period count, and packs everything into the typed FDIDInputs.

Data preparation for the Forward Difference-in-Differences estimator.

mlsynth.utils.fdid_helpers.setup.prepare_fdid_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, verbose: bool = True) → FDIDInputs

Balance the panel, pivot it, and package it into FDIDInputs.

Parameters:
  • df (pd.DataFrame) – Long panel with outcome, treatment, unit, and time columns.

  • outcome, treat, unitid, time (str) – Column names identifying the outcome, treatment indicator, unit, and time period.

  • verbose (bool, default True) – Whether the forward-selection path should be recorded step by step.

Returns:

FDIDInputs – Preprocessed panel ready for forward selection.

Raises:
  • MlsynthDataError – If panel balancing or data preparation fails (e.g. no donor units).

  • MlsynthEstimationError – If fewer than two pre-treatment periods are available.
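The pivot-and-split step can be sketched with pandas. This assumes a balanced long panel with a single treated unit whose treatment indicator switches to 1 at adoption. The helper pivot_panel is hypothetical. It skips the balancing done by prepare_fdid_inputs and raises ValueError as a stand-in for MlsynthEstimationError.

```python
import pandas as pd


def pivot_panel(df: pd.DataFrame, outcome: str, treat: str,
                unitid: str, time: str) -> dict:
    """Hypothetical helper: arrays in the FDIDInputs layout (no balancing step)."""
    wide = df.pivot(index=time, columns=unitid, values=outcome).sort_index()
    treated_rows = df[df[treat] == 1]
    treated_name = treated_rows[unitid].iloc[0]          # assumes one treated unit
    first_treated = treated_rows[time].min()             # adoption period
    pre_periods = int((wide.index < first_treated).sum())
    if pre_periods < 2:
        # stand-in for the MlsynthEstimationError raised by prepare_fdid_inputs
        raise ValueError("need at least two pre-treatment periods")
    donors = wide.drop(columns=treated_name)
    return {
        "y": wide[treated_name].to_numpy(),               # shape (T,)
        "donor_matrix": donors.to_numpy(),                # shape (T, n_donors)
        "donor_names": list(donors.columns),
        "pre_periods": pre_periods,
        "post_periods": len(wide) - pre_periods,
        "T": len(wide),
        "treated_unit_name": treated_name,
    }


df = pd.DataFrame({
    "unit": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "time": [1, 2, 3, 4] * 3,
    "y": [1.0, 2.0, 3.0, 6.0, 1.0, 2.0, 2.5, 3.5, 0.5, 1.5, 2.5, 3.0],
    "treat": [0, 0, 0, 1] + [0] * 8,
})
out = pivot_panel(df, "y", "treat", "unit", "time")
print(out["pre_periods"], out["donor_names"], out["donor_matrix"].shape)   # 3 ['B', 'C'] (4, 2)
```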

The forward-selection core and the difference-in-differences arithmetic. The public entry points are forward_did_select (the vectorized greedy search) and did_from_mean (the DiD fit for a fixed comparison group). The private helpers documented below are the incremental-mean and batched \(R^2\) primitives described in the "How mlsynth computes this" section.

Forward-selection and difference-in-differences estimation for FDID.

This module holds the heavy numerical core of the Forward Difference-in-Differences estimator of Li (2023):

  • forward_did_select() – the vectorised forward-selection loop that greedily adds the donor most improving pre-treatment R^2, tracks the R^2 path, and returns the optimal donor subset alongside the textbook all-donor difference-in-differences benchmark.

  • did_from_mean() – the difference-in-differences estimate for a given donor average (ATT, fit, analytical inference, and vectors).

Both previously lived in the shared selector_helpers grab-bag and the legacy estutils module; they are FDID-specific and now live with the rest of the FDID pipeline.

mlsynth.utils.fdid_helpers.estimation._choose_optimal_subset(selected: List[int], R2_path: ndarray) → Tuple[List[int], ndarray]

Keep the donor prefix up to (and including) the R^2-maximising step.

mlsynth.utils.fdid_helpers.estimation._compute_fdid_result(treated_outcome: ndarray, control_outcomes: ndarray, optimal_idxs: List[int], pre_periods: int, R2_path: ndarray, donor_names: List[Any]) → Dict[str, Any]

Difference-in-differences result for the selected donor subset.

mlsynth.utils.fdid_helpers.estimation._r2_batch(y_c: ndarray, ss_tot: float, X_pre: ndarray) → ndarray

Pre-treatment R^2 of each candidate donor average vs the treated unit.

Parameters:
  • y_c (np.ndarray) – Centred treated pre-treatment vector (y - mean(y)).

  • ss_tot (float) – Total sum of squares of y_c.

  • X_pre (np.ndarray) – Candidate pre-treatment vectors, shape (T0, N).

Returns:

np.ndarray – R^2 for each candidate, shape (N,).
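The DiD fit of the treated unit on a donor average x is x + (mean(y) - mean(x)). Its pre-period residual is therefore y_c - x_c, with both vectors centred, so every candidate's R^2 can be computed in one broadcast. Below is a minimal sketch under that assumption. r2_batch_sketch is hypothetical. Because the docstring does not say whether mlsynth centres X_pre before calling _r2_batch, the sketch centres the candidates itself.

```python
import numpy as np


def r2_batch_sketch(y_pre: np.ndarray, X_pre: np.ndarray) -> np.ndarray:
    """Hypothetical: pre-period DiD R^2 of each column of X_pre against y_pre."""
    y_c = y_pre - y_pre.mean()                    # centred treated vector, shape (T0,)
    ss_tot = float(y_c @ y_c)                     # total sum of squares
    X_c = X_pre - X_pre.mean(axis=0)              # centre every candidate column
    resid = y_c[:, None] - X_c                    # DiD residuals, shape (T0, N)
    return 1.0 - (resid ** 2).sum(axis=0) / ss_tot


y_pre = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
X_pre = np.column_stack([y_pre + 5.0,             # pure level shift of the treated path
                         y_pre[::-1],             # same values, reversed in time
                         np.ones(6)])             # flat donor
r = r2_batch_sketch(y_pre, X_pre)
print(r)   # R^2 = 1, -3, 0
```

A level shift fits perfectly because the intercept absorbs it. The time-reversed donor gets a negative R^2, which means it fits worse than a flat line.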

mlsynth.utils.fdid_helpers.estimation._record_verbose_step(intermediary_results: list, it: int, best_idx: int, best_r2: float, r2_cand: ndarray, selected: List[int], donor_names: List[Any], current_mean_pre: ndarray, k: int) → None

Append one forward-selection step to the verbose diagnostics log.

mlsynth.utils.fdid_helpers.estimation._select_best_donor(X_pre: ndarray, current_mean_pre: ndarray, k: int, remaining_idx: ndarray, y_c: ndarray, ss_tot: float) → Tuple[int, float, ndarray]

Pick the remaining donor whose addition maximises pre-period R^2.

mlsynth.utils.fdid_helpers.estimation._update_synthetic_control(current_mean: ndarray, control_outcomes: ndarray, best_idx: int, k: int) → ndarray

Incrementally fold a newly selected donor into the running average.

mlsynth.utils.fdid_helpers.estimation.did_from_mean(treated: ndarray, mean_ctrl: ndarray, pre_periods: int) → Dict[str, Any]

Difference-in-differences estimate from a pre-computed donor average.

Parameters:
  • treated (np.ndarray) – Treated-unit outcome vector, shape (T,).

  • mean_ctrl (np.ndarray) – Average outcome of the selected donor pool, shape (T,).

  • pre_periods (int) – Number of pre-treatment periods T0.

Returns:

dict – Structured result with Effects, Fit, Inference, and Vectors blocks.
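For a fixed comparison group, the DiD arithmetic is short:

  • the intercept is the pre-period mean of the treated unit minus that of the donor average;

  • the counterfactual is the donor average shifted by that intercept;

  • the ATT is the mean post-period gap.

Below is a minimal sketch assuming these textbook definitions. did_sketch is hypothetical and returns a flat dict instead of the Effects / Fit / Inference / Vectors blocks of did_from_mean.

```python
import numpy as np


def did_sketch(treated: np.ndarray, mean_ctrl: np.ndarray, pre_periods: int) -> dict:
    """Hypothetical: textbook DiD from a donor-average path."""
    pre = slice(0, pre_periods)
    post = slice(pre_periods, None)
    intercept = float(treated[pre].mean() - mean_ctrl[pre].mean())
    counterfactual = mean_ctrl + intercept        # parallel shift of the donor average
    gap = treated - counterfactual                # observed minus counterfactual
    return {
        "intercept": intercept,
        "att": float(gap[post].mean()),
        "counterfactual": counterfactual,
        "gap": gap,
        "pre_residuals": gap[pre],                # input to the inference step
    }


treated = np.array([10.0, 11.0, 12.0, 16.0, 17.0])
mean_ctrl = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
out = did_sketch(treated, mean_ctrl, pre_periods=3)
print(out["intercept"], out["att"])   # 6.0 3.0
```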

mlsynth.utils.fdid_helpers.estimation.forward_did_select(treated_outcome: ndarray, control_outcomes: ndarray, pre_periods: int, donor_names: List[Any], verbose: bool = False) → Dict[str, Any]

Run Li (2023) forward-selected difference-in-differences.

Sequentially adds the control unit that most improves pre-treatment fit (R^2) with the treated unit, tracks the path of R^2 values, and returns both the textbook all-donor DID and the optimal FDID estimate.

Parameters:
  • treated_outcome (np.ndarray) – Treated-unit outcome vector, shape (T,).

  • control_outcomes (np.ndarray) – Outcome matrix for all potential control units, shape (T, N).

  • pre_periods (int) – Number of pre-treatment periods T0.

  • donor_names (list) – Donor labels; length must equal N.

  • verbose (bool, default False) – If True, attach per-step diagnostics under "intermediary".

Returns:

dict – {"DID": <all-donor result>, "FDID": <forward-selected result>}.
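The greedy search can be sketched end to end:

  • at each step, every remaining donor is tried;

  • each candidate average is formed with the running-mean update (k * mean + x_j) / (k + 1);

  • the donor with the highest pre-period DiD R^2 is kept;

  • the final subset is the prefix up to the peak of the R^2 path.

This plain-NumPy sketch works under those assumptions and is not the mlsynth implementation. forward_did_sketch is hypothetical, and it breaks ties by taking the first maximum.

```python
import numpy as np


def forward_did_sketch(y: np.ndarray, X: np.ndarray, pre_periods: int):
    """Hypothetical greedy forward selection of DiD donors (X has shape (T, N))."""
    y_pre, X_pre = y[:pre_periods], X[:pre_periods]
    y_c = y_pre - y_pre.mean()
    ss_tot = float(y_c @ y_c)
    remaining = list(range(X.shape[1]))
    selected, r2_path = [], []
    mean_pre = np.zeros(pre_periods)                 # running donor average (pre-period)
    for k in range(X.shape[1]):                      # k donors already selected
        # Candidate averages for every remaining donor, via the running-mean update
        cand = (k * mean_pre[:, None] + X_pre[:, remaining]) / (k + 1)
        cand_c = cand - cand.mean(axis=0)
        r2 = 1.0 - ((y_c[:, None] - cand_c) ** 2).sum(axis=0) / ss_tot
        j = int(np.argmax(r2))                       # position within `remaining`
        selected.append(remaining.pop(j))
        r2_path.append(float(r2[j]))
        mean_pre = cand[:, j]                        # fold the winner into the average
    stop = int(np.argmax(r2_path))                   # step with the highest R^2
    return selected[: stop + 1], np.array(r2_path)


t = np.arange(12, dtype=float)
f = np.sin(t)                                        # common factor
p = np.where(np.arange(12) % 2 == 0, 1.0, -1.0)      # alternating perturbation
y = 2.0 + f                                          # treated unit (no effect)
X = np.column_stack([
    f + 1.0 + 0.1 * p,     # donor 0: matched factor, perturbed
    f - 3.0 - 0.1 * p,     # donor 1: matched factor, opposite perturbation
    2.0 * f,               # donor 2: mismatched loading
    t,                     # donor 3: unrelated linear trend
])
subset, path = forward_did_sketch(y, X, pre_periods=8)
print(sorted(subset))      # [0, 1]
```

Donors 0 and 1 are each imperfect alone, but their perturbations cancel in the average. The R^2 path therefore peaks at step two, and the mismatched donors are dropped.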

References

Li, K. T. (2023). Frontiers: A Simple Forward Difference-in-Differences Method. Marketing Science, 43(2), 267-279. https://doi.org/10.1287/mksc.2022.0212

The Li (2023) analytical standard error, confidence interval, and p-value.

Analytical inference for Forward Difference-in-Differences (Li 2023).

Li (2023) derives a closed-form variance for the difference-in-differences ATT estimator. Writing the pre-treatment residuals of the treated unit against its difference-in-differences fit as e_t, the post-period average treatment effect has asymptotic variance

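\[\operatorname{Var}(\widehat{\mathrm{ATT}}) = \frac{\omega_1 + \omega_2}{T_1},\]

where \(\omega_2 = T_0^{-1} \sum_{t=1}^{T_0} e_t^2\) is the pre-period residual variance. The term \(\omega_1 = (T_1 / T_0)\, \omega_2\) accounts for the sampling error of the intercept, which is estimated from only \(T_0\) pre-treatment periods, so that \(\omega_1 / T_1 = \omega_2 / T_0\). The standard error is the square root of this quantity.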
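The two terms have a simple reading. The following worked sketch assumes, for simplicity, i.i.d. errors u_t with variance sigma^2 around the DiD model y_t = alpha + mean_ctrl_t + u_t; Li's result covers more general settings. Under that assumption, the estimated intercept absorbs the pre-period average of u_t and the ATT estimate absorbs the post-period average:

```latex
\begin{aligned}
\hat{\alpha} &= \alpha + \frac{1}{T_0}\sum_{t=1}^{T_0} u_t, \\
\widehat{\mathrm{ATT}} - \mathrm{ATT}
  &= \frac{1}{T_1}\sum_{t=T_0+1}^{T_0+T_1} u_t - \frac{1}{T_0}\sum_{t=1}^{T_0} u_t, \\
\operatorname{Var}\big(\widehat{\mathrm{ATT}}\big)
  &= \frac{\sigma^2}{T_1} + \frac{\sigma^2}{T_0}
   = \frac{1}{T_1}\Big(\underbrace{\tfrac{T_1}{T_0}\,\sigma^2}_{\omega_1}
      + \underbrace{\sigma^2}_{\omega_2}\Big).
\end{aligned}
```

Replacing sigma^2 by the mean squared pre-period residual gives the omega_2 and omega_1 above.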

mlsynth.utils.fdid_helpers.inference.did_inference(att: float, pre_residuals: ndarray, pre_periods: int, post_periods: int) → Tuple[float, Tuple[float, float], float, float]

Compute the Li (2023) analytical SE, 95% CI, p-value, and SATT.

Parameters:
  • att (float) – Estimated average treatment effect on the treated.

  • pre_residuals (np.ndarray) – Pre-treatment residuals of the treated unit against its difference-in-differences fit, shape (T0,).

  • pre_periods (int) – Number of pre-treatment periods T0.

  • post_periods (int) – Number of post-treatment periods T1.

Returns:

  • se (float) – Analytical standard error of the ATT (nan if undefined).

  • ci (tuple of float) – (lower, upper) 95% confidence interval.

  • p_value (float) – Two-sided p-value for the ATT.

  • satt (float) – Standardised ATT (att / se * sqrt(T1)).
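The variance formula above translates into a few lines of Python. The following sketch assumes a normal reference distribution, with 1.96 as the 95% critical value. It returns nan when the variance is undefined or zero. did_inference_sketch is hypothetical, and it leaves out SATT.

```python
import math

import numpy as np


def did_inference_sketch(att: float, pre_residuals: np.ndarray,
                         pre_periods: int, post_periods: int):
    """Hypothetical: analytical SE, 95% CI and two-sided p-value for a DiD ATT."""
    nan = float("nan")
    if pre_periods < 1 or post_periods < 1:
        return nan, (nan, nan), nan
    residuals = np.asarray(pre_residuals, dtype=float)
    omega_2 = float(np.mean(residuals ** 2))             # pre-period residual variance
    omega_1 = (post_periods / pre_periods) * omega_2     # intercept-estimation term
    se = math.sqrt((omega_1 + omega_2) / post_periods)
    if se == 0.0:
        return nan, (nan, nan), nan                      # perfect pre-fit: no finite z
    ci = (att - 1.96 * se, att + 1.96 * se)              # normal-approximation 95% CI
    p_value = math.erfc(abs(att / se) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return se, ci, p_value


se, ci, p = did_inference_sketch(3.0, np.array([1.0, -1.0, 1.0, -1.0]), 4, 2)
print(se, ci, p)   # se = sqrt(0.75): sigma^2 / T1 + sigma^2 / T0 with sigma^2 = 1
```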

Assembly of the raw selection output into the typed result containers.

Assemble typed FDID results from the raw estimation dictionaries.

mlsynth.utils.fdid_helpers.results_assembly.assemble_fdid_results(selector_output: Dict[str, Dict[str, Any]], inputs: FDIDInputs) → FDIDResults

Build the typed FDIDResults container.

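Parameters:
  • selector_output (dict) – Raw output of forward_did_select(), {"DID": <all-donor result>, "FDID": <forward-selected result>}.

  • inputs (FDIDInputs) – Preprocessed panel.

Returns: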

FDIDResults – Container exposing the FDID (primary) and DID fits.

The observed-versus-counterfactual overlay plot for the FDID and DID fits.

Plotting wrapper for the Forward Difference-in-Differences estimator.

mlsynth.utils.fdid_helpers.plotter.plot_fdid(results: FDIDResults, *, time: str, unitid: str, outcome: str, treat: str, treated_color: str, counterfactual_color: str | List[str], save: bool | dict) → None

Plot observed vs FDID and DID counterfactuals.

Plotting failures are downgraded to warnings so a rendering problem never masks a successful estimation.
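Downgrading plot errors to warnings is a small wrapper pattern. Below is a generic sketch that treats the plotting call as any callable. safe_plot is hypothetical and is not the plot_fdid source.

```python
import warnings
from typing import Any, Callable


def safe_plot(plot_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
    """Hypothetical wrapper: run plot_fn, turning any exception into a warning."""
    try:
        plot_fn(*args, **kwargs)
    except Exception as exc:  # broad on purpose: plotting must never fail the fit
        warnings.warn(f"Plotting failed and was skipped: {exc}", RuntimeWarning, stacklevel=2)


def broken_plot() -> None:
    raise ValueError("unknown colour")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    safe_plot(broken_plot)      # the caller carries on; results would still be returned
print(len(caught), caught[0].category.__name__)   # 1 RuntimeWarning
```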

The Web Appendix E Monte Carlo DGPs (DGP1-DGP4), packaged as simulate_fdid_sample() so the replication in Verification runs as a one-liner.

Web Appendix E Monte Carlo DGPs for the Forward DiD method.

Implements the four data-generating processes from Li, Shi & Huang (2023) Web Appendix E. Each draw produces one treated unit and N controls over T1 + T2 periods, generated from three common factors:

\[\begin{split}f_{1t} &= 0.8 f_{1,t-1} + v_{1t}, \\ f_{2t} &= -0.6 f_{2,t-1} + v_{2t} + 0.8 v_{2,t-1}, \\ f_{3t} &= v_{3t} + 0.9 v_{3,t-1} + 0.4 v_{3,t-2},\end{split}\]

with \(v_{kt} \sim \mathcal{N}(0, 1)\) and outcomes

\[\begin{split}y_{tr,t} &= a_0 + c_0 \mathbf{1}' f_t + \varepsilon_{tr,t}, \\ y_{it} &= 1 + c_1 \mathbf{1}' f_t + \varepsilon_{it} \quad i \le N/2, \\ y_{it} &= 1 + c_2 \mathbf{1}' f_t + \varepsilon_{it} \quad i > N/2,\end{split}\]

where \(\varepsilon_{it} \sim \mathcal{N}(0, 1)\). The four DGPs vary \((a_0, c_0, c_1, c_2)\):

DGP   (a_0, c_0, c_1, c_2)   Design
1     (1, 1, 1, 1)           All controls match (DiD is applicable)
2     (1, 1, 1, 2)           Half the controls have mismatched loadings
3     (2, 1, 1, 1)           Treated unit has a different intercept
4     (2, 1, 1, 2)           Different intercept and half-mismatched loadings

True ATT is zero in every DGP (matching the paper’s PMSE convention; the PMSE is invariant to a constant treatment effect).

Note

The appendix prints \(f_{2t} = -0.6 f_{1,t-1} + \dots\) for the lag term. However, the Monte Carlo numbers in Li’s Table 5 match the alternative reading \(-0.6 f_{2,t-1}\), an ARMA(1,1) process in \(f_2\) itself. The latter is used here: it reproduces the paper’s DID PMSE values closely (within ~3%), while the literal reading reproduces only the FDID column.
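The factor processes, outcome equations and parameter table above translate directly into NumPy. This minimal sketch of one draw makes three assumptions: lagged factors and shocks start at zero with no burn-in, the \(f_{2,t-1}\) reading from the note is used, and N is even. draw_dgp_sketch is hypothetical and is not simulate_fdid_sample, whose burn-in and RNG handling are not documented here.

```python
import numpy as np

PARAMS = {1: (1, 1, 1, 1), 2: (1, 1, 1, 2), 3: (2, 1, 1, 1), 4: (2, 1, 1, 2)}  # (a_0, c_0, c_1, c_2)


def draw_dgp_sketch(dgp: int, N: int = 60, T1: int = 24, T2: int = 12, rng=None):
    """Hypothetical: one (treated, controls) draw; controls have shape (N, T)."""
    rng = np.random.default_rng() if rng is None else rng
    a0, c0, c1, c2 = PARAMS[dgp]
    T = T1 + T2
    v = rng.normal(size=(3, T))                    # factor shocks v_kt ~ N(0, 1)
    f = np.zeros((3, T))
    for t in range(T):
        lag1 = t >= 1                              # pre-sample values treated as zero
        f[0, t] = 0.8 * (f[0, t - 1] if lag1 else 0.0) + v[0, t]
        f[1, t] = (-0.6 * (f[1, t - 1] if lag1 else 0.0) + v[1, t]
                   + 0.8 * (v[1, t - 1] if lag1 else 0.0))
        f[2, t] = (v[2, t] + 0.9 * (v[2, t - 1] if lag1 else 0.0)
                   + 0.4 * (v[2, t - 2] if t >= 2 else 0.0))
    common = f.sum(axis=0)                         # 1' f_t, shape (T,)
    y_treated = a0 + c0 * common + rng.normal(size=T)
    loadings = np.where(np.arange(N) < N // 2, c1, c2)   # first half c_1, second half c_2
    y_controls = 1.0 + loadings[:, None] * common[None, :] + rng.normal(size=(N, T))
    return y_treated, y_controls


y_tr, y_ctrl = draw_dgp_sketch(2, rng=np.random.default_rng(42))
print(y_tr.shape, y_ctrl.shape)   # (36,) (60, 36)
```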

class mlsynth.utils.fdid_helpers.simulation.FDIDSample(df: DataFrame, Y_treated: ndarray, Y_controls: ndarray, T1: int, T2: int, dgp: int)

One draw from a Web Appendix E DGP.

df

Long panel with columns unit / time / y / treat ready to feed to mlsynth.FDID.

Type:

pd.DataFrame

Y_treated

Treated-unit outcome path, shape (T,).

Type:

np.ndarray

Y_controls

Control outcomes, shape (N, T). Rows 0..N//2-1 carry loading c_1; rows N//2..N-1 carry loading c_2.

Type:

np.ndarray

T1, T2

Pre- / post-treatment period counts.

Type:

int

dgp

Which of the four DGPs was drawn.

Type:

int

T1: int
T2: int
Y_controls: ndarray
Y_treated: ndarray
df: DataFrame
dgp: int
mlsynth.utils.fdid_helpers.simulation.simulate_fdid_sample(dgp: int, N: int = 60, T1: int = 24, T2: int = 12, rng: Generator | None = None) → FDIDSample

Draw one sample from FDID Web Appendix E DGP dgp (1-4).

Parameters:
  • dgp (int) – Which DGP to draw (1, 2, 3, or 4).

  • N (int, default 60) – Number of control units (the paper uses N = 60).

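  • T1, T2 (int, defaults 24 and 12) – Pre- and post-treatment period counts. Here T1 counts pre-treatment periods, whereas in FDIDInputs T1 = T - T0 is the post-period count.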

  • rng (np.random.Generator, optional) – NumPy RNG. Defaults to np.random.default_rng().

Returns:

FDIDSample
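FDIDSample.df is described as a long panel with unit / time / y / treat columns. The pandas sketch below builds such a frame from a treated path and an (N, T) control matrix. It assumes the treated unit is labelled "treated", controls are labelled c0, c1, ..., time runs from 1 to T, and treat switches on after T1 periods. to_long_panel and these labels are hypothetical and may differ from what simulate_fdid_sample produces.

```python
import numpy as np
import pandas as pd


def to_long_panel(y_treated: np.ndarray, y_controls: np.ndarray, T1: int) -> pd.DataFrame:
    """Hypothetical: stack treated + controls into a long unit/time/y/treat frame."""
    N, T = y_controls.shape
    Y = np.vstack([y_treated[None, :], y_controls])      # shape (N + 1, T)
    units = ["treated"] + [f"c{i}" for i in range(N)]
    df = pd.DataFrame({
        "unit": np.repeat(units, T),
        "time": np.tile(np.arange(1, T + 1), N + 1),
        "y": Y.ravel(),                                    # row-major: unit by unit
    })
    # Treatment is on only for the treated unit, after the T1 pre-periods
    df["treat"] = ((df["unit"] == "treated") & (df["time"] > T1)).astype(int)
    return df


panel = to_long_panel(np.arange(5.0), np.ones((3, 5)), T1=3)
print(panel.shape, int(panel["treat"].sum()))   # (20, 4) 2
```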