Factor Model Approach (FMA)

Factor Model Approach (FMA)#

When to Use This Method#

Quasi-experiments are the bread and butter of marketing causal inference: a policy or firm decision switches on for one unit (a state legalises recreational marijuana, a digitally native retailer opens a physical showroom in Brooklyn) and you want the sales effect against a pool of untreated cities, stores, or categories. The two reflexive choices each strain in this setting:

Difference-in-differences (DiD) assumes the time effect is the same for the treated unit and every control – the parallel-trends assumption. Marketing panels (sales, share, macro series) rarely oblige: different markets ride different seasonal, regional, and business-cycle waves. When that homogeneity fails, DiD carries a large estimation bias that does not shrink as the panel grows – in Li and Sonnier’s own MSE comparison it is so biased they drop it from the table.
Synthetic control (SC) relaxes that by weighting controls on the simplex (nonnegative, sum-to-one, no intercept), which pins the counterfactual inside the convex hull of the controls. That is a feature when the treated unit is “surrounded” by donors and you want an interpretable convex-combination weight story, but a bug when the treated unit sits outside the donor range, and SC has no inference theory when controls are many and the data are non-stationary of unknown form.

The factor model approach (FMA) of Li and Sonnier ([FMA]) targets exactly the regime SC and DiD struggle with: many control units, time effects that differ across units, a treated path that may lie outside the donor range, and a need for honest confidence intervals. Instead of weighting the controls directly, it first projects the control panel onto a small set of latent factors (interactive fixed effects: $y^N_{jt} = \boldsymbol{\lambda}_j^\top \mathbf{f}_t + e_{jt}$), then regresses the treated unit’s pre-period outcome on those factors with no constraint on the loadings. Three properties follow, and they are the reasons to reach for it:

Heterogeneous time effects. The interactive $\boldsymbol{\lambda}_j^\top \mathbf{f}_t$ structure nests two-way fixed effects (DiD) as the special case $\boldsymbol{\lambda}_j = (\xi_j, 1)^\top$, $\mathbf{f}_t = (1, \delta_t)^\top$, and also absorbs unit-specific trends and cross-sectional dependence. You are not betting on a common shock.
The counterfactual can leave the donor hull. Because the loadings are unrestricted (no nonnegativity, no sum-to-one, no zero intercept), FMA fits treated units whose level or trajectory is outside the range of every control – the case SC structurally cannot represent.
Robust to a large, growing donor pool. Unconstrained regression on the raw controls (the Hsiao-Ching-Wan approach) overfits and breaks down once $N_0$ approaches or exceeds $T_0$. FMA’s dimension reduction sidesteps this: it benefits from more controls (they sharpen the factor estimates) rather than being destabilised by them. The same single factor extraction also scales cheaply to many treated units and staggered timing, since it is done once.

The paper’s headline contribution is the fourth reason: valid, computationally cheap inference. FMA delivers a closed-form normal confidence interval for the ATT (Theorems 3.1 / 3.3) that is valid for both stationary and non-stationary data and – critically – accommodates treated and control units with different error variances. The previously standard Xu (2017) bootstrap assumes $\sigma^2_{\text{tr}} = \sigma^2_{\text{co}}$; when that fails (the norm for non-randomised quasi-experiments) it over- or under-covers badly, while permutation/placebo tests need an analogous symmetry. In the California beer application the estimated variance ratio is $\widehat{\sigma}^2_{\text{tr}} / \widehat{\sigma}^2_{\text{co}} \approx 37$, so the bootstrap interval comes out less than half its honest width. FMA’s normal interval needs no such assumption and no resampling loop.

Factor projection vs. donor weighting#

Every synthetic-control-family estimator answers what reproduces the treated unit’s untreated path? FMA’s structural bet is distinctive:

The controls’ untreated outcomes share a low-dimensional latent factor structure; estimate those factors, then load the treated unit onto them without constraint.

	DiD	Synthetic control	Factor model (FMA)
Comparison built from	all controls, equal weight	all controls, simplex weights	latent factors of the controls
Time effects	homogeneous (common shock)	implicit via weights	heterogeneous $\boldsymbol{\lambda}_j^\top \mathbf{f}_t$
Counterfactual outside donor hull	allowed	no – convex hull	yes – unrestricted loadings
Many controls ($N_0 \gtrsim T_0$)	bias does not shrink	overfits if restrictions relaxed	benefits – sharper factors
Inference, non-stationary data	standard	none available	normal CI (Thm 3.1 / 3.3)
Unequal treated/control variance	n/a	placebo needs symmetry	handled directly

Reach for FMA when#

You have a single treated unit (or a handful, or many – the factor step generalises) and a large control pool, especially $N_0$ large relative to the pre-period $T_0$, where unrestricted regression on raw donors would overfit.
The outcome is plausibly driven by a few common latent factors – regional sales, market share, macro-linked categories – so the control matrix is approximately low-rank. This is what the factor projection exploits.
You expect heterogeneous time dynamics across units (different seasonality, trends, cross-sectional dependence) that violate DiD’s parallel-trends assumption.
The treated unit’s level or trajectory may sit outside the range of the controls, so SC’s convex-hull restriction would distort the fit.
You need formal inference – a hypothesis test or confidence interval for the ATT – and you cannot defend the equal-variance assumption behind the Xu (2017) bootstrap or the symmetry behind permutation/placebo tests. FMA’s normal CI is valid under unequal variances and under non-stationarity, and it is far cheaper than resampling.
You have enough data for the asymptotics: the paper’s simulations show the normal CI is reliable for $N_0 \ge 30$ and $T_0 \ge 30$ (at $T - T_0 = 20$). For smaller panels the paper suggests a $t_{T_0 - (N_0 + 1)}$ reference distribution, which mlsynth does not yet expose; treat the normal CI cautiously there.

Do not use FMA when#

The control matrix has no low-rank / factor structure. FMA, like the PCA-based estimators in Cluster Synthetic Controls (CLUSTERSC), leans entirely on a few factors explaining the controls. If the spectrum decays slowly, the factors are noise; prefer a balancing estimator (MicroSynth (User-Level Balancing SC) for user-level data) or a selection estimator (Forward Difference-in-Differences (FDID)).
A sparse, interpretable convex-combination weight is the deliverable. FMA’s loadings are unconstrained and not a “California = 0.4 Utah + 0.3 Montana” story. If policy storytelling needs nonnegative donor weights that sum to one, use Two-Step Synthetic Control, Forward Difference-in-Differences (FDID), or classic SC.
The donor pool is tiny ($N_0 \le 10$) with a clean canonical SC fit. Factor extraction adds variance without identification gain; Two-Step Synthetic Control or Forward Difference-in-Differences (FDID) are more honest, and Two-Step Synthetic Control will even test whether SC’s restrictions are needed.
The sample is very small ($N_0, T_0 \approx 20$). The normal approximation degrades and the recommended $t$-correction is not wired into the public API.
You want efficiency from the treated unit’s own pre-history with many treated units. Factors are estimated from controls only; the efficiency loss is negligible for a single treated unit but grows with the number of treated units (paper, footnote 3).
Distributional questions (quantile effects, Lorenz/tail changes). FMA targets the mean ATT; use Distributional Synthetic Control (DSC).
Continuous or multi-valued treatment, or spillovers/interference across units. FMA encodes a single binary intervention and assumes the controls are untreated; use Continuous-Treatment Synthetic Control (CTSC) for dose response and Spillover-Aware Synthetic Control (SPILLSYNTH) / Spatial Synthetic Difference-in-Differences (SpSyDiD) under interference.

Overview#

FMA implements Li, K. T., & Sonnier, G. P. (2023). “Statistical Inference for the Factor Model Approach to Estimate Causal Effects in Quasi-Experimental Settings.” Journal of Marketing Research, 60(3):449-472. The estimator constructs a counterfactual for a single treated unit by

extracting principal-component factors from the control panel (with the number of factors chosen by the paper’s modified Bai-Ng criterion for stationary outcomes or Bai (2004) IPC1 for non- stationary outcomes);
projecting the treated unit’s pre-treatment outcomes onto a constant plus the factors via OLS to recover the loading $\widehat{\boldsymbol{\lambda}}_1$;
forming the counterfactual $\widehat{y}_{1t}^N \coloneqq \widetilde{\mathbf{f}}_t^\top \widehat{\boldsymbol{\lambda}}_1$ for every period; the ATT is the mean post-treatment gap.

The paper’s distinctive contribution is valid statistical inference for the ATT. FMA in mlsynth exposes three procedures in parallel; the user picks any subset via FMAConfig.inference_methods:

"asymptotic" (default) – Theorem 3.1 (stationary) / Theorem 3.3 (non-stationary) normal CI for the ATT, built from the variance decomposition $\widehat{\Omega} \coloneqq \widehat{\Omega}_1 + \widehat{\Omega}_2$.
"bootstrap" – Web Appendix F residual bootstrap for per-period $\tau_t$ CIs.
"placebo" – Web Appendix G control-as-pseudo-treated band.

Notation#

Let $j = 1$ denote the treated unit, with all units $\mathcal{N} \coloneqq \{1, \dots, N\}$ and donor pool $\mathcal{N}_0 \coloneqq \mathcal{N} \setminus \{1\}$ of cardinality $N_0$ (the control units). Time runs over $t \in \mathcal{T} \coloneqq \{1, \dots, T\}$, 1-indexed; the intervention takes effect after period $T_0$, splitting $\mathcal{T}$ into the pre-period $\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}$ (of length $T_0$) and the post-period $\mathcal{T}_2 \coloneqq \{t \in \mathcal{T} : t > T_0\}$ (of length $T - T_0$).

The observed outcome is $y_{jt}$, decomposed via Abadie potential outcomes as $y_{jt} = y_{jt}^N + (y_{jt}^I - y_{jt}^N)\, d_{jt}$, with treatment dummy $d_{jt}$ (one only for $j = 1$ and $t \in \mathcal{T}_2$). The no-intervention outcome follows the interactive factor model $y_{jt}^N = \boldsymbol{\lambda}_j^\top \mathbf{f}_t + e_{jt}$, with the $r$-vector of latent common factors $\mathbf{f}_t \in \mathbb{R}^{r}$, unit-specific loadings $\boldsymbol{\lambda}_j \in \mathbb{R}^{r}$, and idiosyncratic error $e_{jt}$. Estimated factors are $\widehat{\mathbf{f}}_t$; the augmented regressor stacking a constant is $\widetilde{\mathbf{f}}_t \coloneqq [1, \widehat{\mathbf{f}}_t^\top]^\top$. The estimated treated loading is $\widehat{\boldsymbol{\lambda}}_1$, the counterfactual $\widehat{y}_{1t}^N \coloneqq \widetilde{\mathbf{f}}_t^\top \widehat{\boldsymbol{\lambda}}_1$, the per-period effect $\tau_t \coloneqq y_{1t} - \widehat{y}_{1t}^N$, and the ATT $\widehat{\tau} \coloneqq |\mathcal{T}_2|^{-1} \sum_{t \in \mathcal{T}_2} \tau_t$. The control-error and treated-error standard deviations are $\sigma_{\text{co}}$ and $\sigma_{\text{tr}}$; the significance level is $\alpha$.

Assumptions#

Assumption 1 (factor structure). The no-intervention outcomes admit a low-rank interactive structure $y_{jt}^N = \boldsymbol{\lambda}_j^\top \mathbf{f}_t + e_{jt}$ with a fixed number $r$ of common factors, and the control panel’s leading sample eigenvalues separate from the noise floor (an approximate-factor / pervasiveness condition).

Remark. This is FMA’s structural bet, shared with the PCA estimators in Cluster Synthetic Controls (CLUSTERSC): the whole method leans on a few factors explaining the controls. If the spectrum decays slowly the extracted factors are noise, and the projection in the next step has nothing real to load onto.

Assumption 2 (no anticipation, untreated controls). Treatment has no effect before $T_0$ ($y_{1t} = y_{1t}^N$ for $t \in \mathcal{T}_1$), and every control $j \in \mathcal{N}_0$ is untreated over $\mathcal{T}$, so the factors and the treated loading are estimated from no-intervention outcomes only.

Remark. Pre-period contamination – the treated unit reacting before the nominal date, or a control caught by the same shock – biases the loading fit and hence the counterfactual. Date $T_0$ at the first plausible response and quarantine contaminated donors.

Assumption 3 (stable loadings). The treated unit’s loading $\boldsymbol{\lambda}_1$ recovered on $\mathcal{T}_1$ continues to govern its no-intervention path on $\mathcal{T}_2$, so the pre-period projection extrapolates forward.

Remark. Unlike SC, FMA places no constraint on $\boldsymbol{\lambda}_1$ – no nonnegativity, sum-to-one, or zero intercept – so the counterfactual may leave the donor hull. What licenses the forward projection is constancy of the loading across the two windows, not convex-hull support; a regime change in $\mathbf{f}_t$ unrelated to treatment breaks it even with a perfect pre-fit.

Assumption 4 (regularity for inference). The errors $e_{jt}$ satisfy the moment and weak-dependence conditions of Li & Sonnier’s Theorems 3.1 / 3.3, and both $T_0$ and $N_0$ are large enough for the factor estimates to converge – but treated and control error variances need not be equal ($\sigma_{\text{tr}} \ne \sigma_{\text{co}}$ is permitted).

Remark. This is the load-bearing relaxation: the asymptotic CI below allows unequal treated/control variances, which is exactly where the Xu (2017) bootstrap miscovers. The cost is needing enough data for the normal approximation – the paper’s simulations support $N_0, T_0 \ge 30$.

Mathematical Formulation#

Setup#

We observe one treated unit ($j = 1$) and $N_0$ control units over $T$ periods. Treatment begins after $T_0$. Under the factor model

\[y_{jt}^N = \boldsymbol{\lambda}_j^\top \mathbf{f}_t + e_{jt}, \qquad y_{jt} = y_{jt}^N + d_{jt}\, \tau_t,\]

the goal is to estimate

\[\widehat{\tau} \coloneqq \frac{1}{T - T_0} \sum_{t \in \mathcal{T}_2} \tau_t = \frac{1}{T - T_0} \sum_{t \in \mathcal{T}_2} (y_{1t} - y_{1t}^N).\]

Factor extraction#

Factors $\widehat{\mathbf{f}}_t$ are extracted from the control panel via PCA after demeaning (or standardising) each control series. The number of factors $r$ is chosen by one of two criteria:

Modified Bai-Ng (MBN) – stationary data. Choose $r \in \{0, 1, \dots, r_{\max}\}$ minimising

\[PC_{MBN}(r) = \frac{1}{N_0 T} \sum_{j \in \mathcal{N}_0} \sum_{t \in \mathcal{T}} (y_{jt} - \widehat{\boldsymbol{\lambda}}_j^\top \widehat{\mathbf{f}}_t)^2 + c_{N, T}\, r\, \widehat{\sigma}^2 \frac{N_0 + T}{N_0 T} \log \frac{N_0 + T}{N_0 T},\]

with the small-sample adjustment

\[c_{N, T} = \frac{(N_0 + \max(70 - N_0, 0)) (T + \max(70 - T, 0))}{N_0 T}.\]

When N_0, T >= 70 the adjustment collapses to 1 and the criterion is identical to Bai-Ng (2002) $PC_{p1}$.
Bai (2004) IPC1 – non-stationary data. Replaces the small-sample factor with a log-log adjustment suited to non- stationary factors.

Override the data-driven selection by passing FMAConfig.n_factors directly.

Loading and counterfactual#

With $\widetilde{\mathbf{f}}_t \coloneqq [1, \widehat{\mathbf{f}}_t^\top]^\top$ and pre-period OLS,

\[\widehat{\boldsymbol{\lambda}}_1 = \biggl(\sum_{t \in \mathcal{T}_1} \widetilde{\mathbf{f}}_t \widetilde{\mathbf{f}}_t^\top\biggr)^{-1} \sum_{t \in \mathcal{T}_1} \widetilde{\mathbf{f}}_t\, y_{1t}, \qquad \widehat{y}_{1t}^N = \widetilde{\mathbf{f}}_t^\top \widehat{\boldsymbol{\lambda}}_1.\]

The ATT is $\widehat{\tau} = (T - T_0)^{-1} \sum_{t \in \mathcal{T}_2} (y_{1t} - \widehat{y}_{1t}^N)$.

Asymptotic inference (Theorem 3.1 / 3.3)#

Write $\bar{\widetilde{\mathbf{f}}}_2 \coloneqq (T - T_0)^{-1} \sum_{t \in \mathcal{T}_2} \widetilde{\mathbf{f}}_t$ and $\widehat{\boldsymbol{\Psi}} \coloneqq (\sum_{t \in \mathcal{T}_1} \widetilde{\mathbf{f}}_t \widetilde{\mathbf{f}}_t^\top)^{-1}$. The paper shows

\[\widehat{\Omega} = \widehat{\Omega}_1 + \widehat{\Omega}_2, \quad \widehat{\Omega}_1 = \frac{T - T_0}{T_0}\, \bar{\widetilde{\mathbf{f}}}_2^\top\, \widehat{\boldsymbol{\Psi}}\, \bar{\widetilde{\mathbf{f}}}_2, \quad \widehat{\Omega}_2 = \widehat{\sigma}_e^2,\]

with $\widehat{\sigma}_e^2$ the variance of the pre-treatment residuals. The $(1 - \alpha)$ CI for the ATT is

\[\widehat{\tau} \pm z_{1 - \alpha / 2}\, \frac{\sqrt{\widehat{\Omega}}}{\sqrt{T - T_0}}.\]

A two-sided z-test of $H_0: \tau = 0$ reports the p-value.

Bootstrap inference (Web Appendix F)#

The paper notes that the per-period $\tau_t$ CI cannot shrink to zero as $T_0, T - T_0, N_0 \to \infty$ because its leading term is the idiosyncratic shock $e_{1t}$ itself. A residual bootstrap therefore drives the per-period CI:

Compute pre-period residuals $\widehat{e}_{1t} = y_{1t} - \widetilde{\mathbf{f}}_t^\top \widehat{\boldsymbol{\lambda}}_1$, $t \in \mathcal{T}_1$.
For each bootstrap draw $b = 1, \dots, B$:
- Sample $e^\ast_{1t}$ from $\{\widehat{e}_{1t}\}$ with replacement for every $t \in \mathcal{T}$.
- Form $y^\ast_{1t} = \widetilde{\mathbf{f}}_t^\top \widehat{\boldsymbol{\lambda}}_1 + e^\ast_{1t}$.
- Re-estimate $\widehat{\boldsymbol{\lambda}}^{\ast}_1$ from the bootstrap pre-period.
- Compute $\tau^\ast_{1t} = y^\ast_{1t} - \widetilde{\mathbf{f}}_t^\top \widehat{\boldsymbol{\lambda}}^{\ast}_1$ for $t \in \mathcal{T}_2$.
The $(1 - \alpha)$ CI for $\tau_t$ is

\[\bigl[\widehat{\tau}_{1t} - \tau^\ast_{1t, ((1 - \alpha/2) B)}, \widehat{\tau}_{1t} - \tau^\ast_{1t, (\alpha B / 2)}\bigr].\]

Placebo inference (Web Appendix G)#

Set each control unit in turn as a pseudo-treated unit and re-fit the factor model on the remaining controls; collect the $N_0$ pseudo-ATT curves and report their pointwise $(\alpha/2, 1 - \alpha/2)$ quantile band. The paper notes the placebo test is sensitive to error-variance heterogeneity between the treated unit and the controls – when that assumption is violated the asymptotic CI from the previous paragraph is preferred.

Core API#

Factor Model Approach (FMA) estimator.

Li, K. T., & Sonnier, G. P. (2023). “Statistical Inference for the Factor Model Approach to Estimate Causal Effects in Quasi-Experimental Settings.” Journal of Marketing Research, 60(3), 449-472.

FMA estimates the ATT for a single treated unit by:

Extracting principal-component factors from the control panel (paper Section 3.1; factor count chosen by the modified Bai-Ng (MBN) criterion for stationary data or Bai (2004) IPC1 for non-stationary data – see Web Appendix D).
Projecting the treated unit’s pre-treatment outcomes onto a constant + the factors via OLS to recover the loading.
Forming the counterfactual hat y^0_{1, t} = F_aug_t' lambda_hat for every period; the ATT is the mean post-treatment gap.

Inference is the paper’s main contribution. Three procedures live side by side:

asymptotic (default) – Theorem 3.1 (stationary) / Theorem 3.3 (non-stationary) normal CI for the ATT, built from the variance decomposition Omega_hat = Omega1 + Omega2.
bootstrap – Web Appendix F residual bootstrap for per-period ATT_t CIs.
placebo – Web Appendix G control-as-pseudo-treated band.

Activate any combination via FMAConfig.inference_methods.

class mlsynth.estimators.fma.FMA(config: FMAConfig | dict)#

Bases: object

Factor Model Approach (Li & Sonnier 2023) estimator.

Parameters:: config (FMAConfig or dict) – Configuration object. See mlsynth.config_models.FMAConfig.
Returns:: FMAResults – Frozen container with the optimised design, asymptotic + bootstrap + placebo inference, counterfactual, ATT, and per-period gap.

fit() → FMAResults#: Run the FMA pipeline end to end.

Configuration#

class mlsynth.config_models.FMAConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', plot: ~mlsynth.config_models.PlotConfig = <factory>, stationarity: ~typing.Literal['stationary', 'nonstationary'] = 'nonstationary', preprocessing: ~typing.Literal['demean', 'standardize'] = 'demean', n_factors: ~typing.Annotated[int | None, ~annotated_types.Ge(ge=1)] = None, max_factors: ~typing.Annotated[int, ~annotated_types.Ge(ge=1)] = 10, alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05, inference_methods: ~typing.List[str] = <factory>, n_bootstrap: ~typing.Annotated[int, ~annotated_types.Ge(ge=100)] = 1000, bootstrap_seed: int = 0)#

Configuration for the Factor Model Approach (FMA) estimator.

Implements Li & Sonnier (2023), “Statistical Inference for the Factor Model Approach to Estimate Causal Effects in Quasi-Experimental Settings”, JMR 60(3):449-472. FMA estimates the ATT for a single treated unit by extracting principal- component factors from the control panel, projecting the treated unit’s pre-period outcomes onto those factors, and using the fitted loading to predict the untreated potential outcome in the post-period.

Parameters:

stationarity ({“stationary”, “nonstationary”}) – Selects the factor-selection criterion: "stationary" uses the paper’s modified Bai-Ng (MBN) criterion (Web Appendix D.1); "nonstationary" uses Bai (2004) IPC1 with a log-log adjustment for non-stationary factors. Default: "nonstationary" (the paper’s recommendation for general applied settings).
preprocessing ({“demean”, “standardize”}) – Preprocessing applied to the control panel before PCA.
n_factors (int or None) – Override the data-driven factor count. None triggers the criterion in stationarity.
max_factors (int) – Upper bound passed to the factor-selection routine.
alpha (float) – Two-sided significance level for CIs.
inference_methods (list of {“asymptotic”, “bootstrap”, “placebo”}) – Inference procedures to run. Defaults to ["asymptotic"], which gives the paper’s Theorem 3.1 normal CI for the ATT. Add "bootstrap" to get per-period ATT_t CIs via the Web Appendix F residual bootstrap, and "placebo" to get the Web Appendix G control-as-pseudo-treated band.
n_bootstrap (int) – Number of bootstrap replicates (Web Appendix F). Ignored when "bootstrap" is not in inference_methods.
bootstrap_seed (int) – Seed for the bootstrap RNG.

alpha: float#

bootstrap_seed: int#

classmethod check_fma_params(values: Any) → Any#

inference_methods: List[str]#

max_factors: int#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_bootstrap: int#

n_factors: int | None#

preprocessing: Literal['demean', 'standardize']#

stationarity: Literal['stationary', 'nonstationary']#

Helper Modules#

Data preparation for the FMA estimator.

mlsynth.utils.fma_helpers.setup.prepare_fma_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, preprocessing: str = 'demean', stationarity: str = 'nonstationary') → FMAInputs#

Pivot the panel and assemble FMA inputs.

Parameters:

df (pd.DataFrame) – Long balanced panel with one row per (unit, time).
outcome, treat, unitid, time (str) – Column names.
preprocessing ({“demean”, “standardize”}) – Preprocessing applied to the control panel before factor extraction. Demeaning (default) follows Bai (2003); standardizing is preferred when the control series have very heterogeneous variances.
stationarity ({“stationary”, “nonstationary”}) – Selects the factor-selection criterion downstream: "stationary" uses the modified Bai-Ng (MBN) criterion (Li & Sonnier 2023, Web Appendix D), "nonstationary" uses Bai (2004) IPC1.

Returns:

FMAInputs – Preprocessed panel.

Raises:

MlsynthDataError, MlsynthConfigError –

Factor extraction + selection for FMA.

Wraps the bai_ng.nbpiid() helper to apply the appropriate Li & Sonnier (2023) selection criterion:

stationarity="stationary" -> modified Bai-Ng (MBN) – nbpiid criterion code 10 with N_co + max(70 - N_co, 0) / T + max(70 - T, 0) adjustments (Web Appendix D.1).
stationarity="nonstationary" -> Bai (2004) IPC1 with the log-log adjustment – nbpiid criterion code 11.

A user may also bypass the data-driven selection by passing n_factors directly to extract_factors().

mlsynth.utils.fma_helpers.factors.extract_factors(control_panel: ndarray, stationarity: str = 'nonstationary', preprocessing: str = 'demean', n_factors: int | None = None, max_factors: int = 10) → Tuple[int, ndarray, ndarray, str]#

Estimate the factor structure from the control panel.

Parameters:

control_panel (np.ndarray) – (T, N_co) matrix of control-unit outcomes.
stationarity ({“stationary”, “nonstationary”}) – Drives the factor-selection criterion (MBN vs IPC1).
preprocessing ({“demean”, “standardize”}) – Mapped to nbpiid’s preprocessing_method_code: 1 for demean, 2 for standardize.
n_factors (int, optional) – Override the data-driven selection. None triggers MBN/IPC1.
max_factors (int) – Upper bound passed to the selection routine; ignored when n_factors is supplied.

Returns:

n_factors_selected (int) – Final number of factors used.
common_component (np.ndarray) – (T, N_co) reconstruction of the control panel.
factors (np.ndarray) – (T, r) matrix of estimated factors.
source (str) – "MBN", "IPC1", or "user".

Treated-unit projection and counterfactual construction for FMA.

Given the factor estimates F_hat from the control panel, the treated unit’s factor loading is recovered by OLS of its pre-period outcomes on the factors (with a constant):

\[\hat \lambda_1 = (\sum_{t=1}^{T_0} \tilde F_t \tilde F_t')^{-1} \sum_{t=1}^{T_0} \tilde F_t \, y_{1, t},\]

with $\tilde F_t = [1, F_{ht}']'$. The full-period counterfactual is $\hat y^0_{1, t} = \tilde F_t' \hat\lambda_1$.

mlsynth.utils.fma_helpers.fit.estimate_loading_and_counterfactual(treated_outcome: ndarray, factors: ndarray, T0: int) → Tuple[ndarray, ndarray, ndarray, float]#

OLS projection of the treated pre-period onto the factors.

Parameters:

treated_outcome (np.ndarray) – Treated outcome series, shape (T,).
factors (np.ndarray) – Estimated factor matrix, shape (T, r).
T0 (int) – Number of pre-treatment periods.

Returns:

lambda_hat (np.ndarray) – Loading vector (r + 1,) (constant + factor loadings).
counterfactual (np.ndarray) – Predicted untreated outcome for every period, shape (T,).
factors_with_const (np.ndarray) – Factor matrix with the constant column prepended, shape (T, r + 1). Returned for downstream variance plug-ins.
residual_variance (float) – Variance of the pre-treatment residuals.

Inference procedures for FMA (Li & Sonnier 2023).

Three procedures live here and can run in any combination:

asymptotic_inference() – Theorem 3.1 (stationary) / Theorem 3.3 (non-stationary) normal CI for the ATT.
bootstrap_inference() – Web Appendix F residual bootstrap for per-period ATT_t CIs. Uses the pre-period residuals as the bootstrap distribution and refits the loading on each draw, so the CIs reflect the joint variability of the loading estimate and the idiosyncratic shock at time t.
placebo_inference() – Web Appendix G placebo test where every control is treated as a pseudo-treated unit in turn. Returns the pointwise quantile band across the placebo ATT curves.

mlsynth.utils.fma_helpers.inference.asymptotic_inference(treated_outcome: ndarray, counterfactual: ndarray, factors_with_const: ndarray, residual_variance: float, T0: int, alpha: float = 0.05) → Tuple[float, float, float, float]#

Theorem 3.1 normal CI for the ATT.

Returns:: se_att, lower, upper, p_value

mlsynth.utils.fma_helpers.inference.bootstrap_inference(treated_outcome: ndarray, factors: ndarray, counterfactual: ndarray, T0: int, alpha: float = 0.05, n_replicates: int = 1000, seed: int = 0) → dict#

Web Appendix F residual bootstrap for per-period ATT_t CIs.

Procedure#

Compute pre-period residuals e_hat_1t = y_1t - F_aug_t' lambda_hat for t = 1, ..., T0.
For each bootstrap draw b = 1, …, B: a. Sample e*_1t from {e_hat} with replacement for every t. b. Form y*_1t = F_aug_t' lambda_hat + e*_1t. c. Re-fit the loading on the bootstrap pre-period. d. Compute Delta*_1t = y*_1t - F_aug_t' lambda*_hat for t > T0.
The (1 - alpha) CI for Delta_1t is [Delta_hat_1t - q_{1 - alpha/2}, Delta_hat_1t - q_{alpha/2}], with quantiles taken across bootstrap replicates of Delta*.

mlsynth.utils.fma_helpers.inference.placebo_inference(control_outcomes: ndarray, treated_outcome: ndarray, T0: int, n_factors: int | None, stationarity: str, preprocessing: str, alpha: float = 0.05, max_factors: int = 10) → dict#

Web Appendix G: pseudo-ATT curves with each control as the treated unit.

For each control k, swap it into the treated slot, refit the factor model on the remaining N_co - 1 controls, project the pseudo-treated pre-period onto those factors, compute the pseudo- ATT curve. The output band is the pointwise alpha/2 / (1 - alpha/2) quantile across the placebo curves at each period.

Parameters:

control_outcomes (np.ndarray) – (T, N_co) control panel.
treated_outcome (np.ndarray) – Real treated outcome series, shape (T,). Used for the leading row of the curves matrix (so the caller can compare).
T0 (int) – Pre-treatment periods.
n_factors (int or None) – Number of factors to fit on each placebo iteration. None means re-select per iteration via the criterion in stationarity.
stationarity ({“stationary”, “nonstationary”})
preprocessing ({“demean”, “standardize”})
alpha (float) – Significance level for the quantile band.
max_factors (int) – Upper bound on the factor-selection routine.

Returns:

dict – Keys curves ((N_co + 1, T); first row is the real treated unit), q_lower, q_upper ((T,) bands).

The Web Appendix E.1 Monte Carlo DGP1, packaged as simulate_fma_sample() so the Verification replication below runs as a one-liner.

Li & Sonnier (2023) DGP1 / DGP2 simulation helpers for the FMA estimator.

Implements the data-generating processes from the paper’s Web Appendix E.

DGP1 (Section 4.1, stationary factors, Hsiao-Ching-Wan 2012):

\[\begin{split}f_{1t} &= 0.8\, f_{1,t-1} + v_{1t}, \\ f_{2t} &= -0.68\, f_{2,t-1} + v_{2t} + 0.8\, v_{2,t-1}, \\ f_{3t} &= v_{3t} + 0.9\, v_{3,t-1} + 0.4\, v_{3,t-2}, \qquad v_{kt} \sim \mathcal{N}(0, 1).\end{split}\]

DGP2 (Web Appendix E.1, non-stationary factors):

\[\begin{split}F_{1t} &= (0.2 + \xi_t)\, t + \varepsilon_{5t} \qquad \text{(linear trend, random slope)} \\ F_{2t} &= F_{2,t-1} + \varepsilon_{4t} \qquad \text{(drift-less unit root)} \\ F_{3t} &= \sqrt{t} + \varepsilon_{6t} + 0.9\, \varepsilon_{6,t-1} + 0.4\, \varepsilon_{6,t-2} \qquad \text{(sqrt-}t\text{ trend, MA(2) errors)}\end{split}\]

with $\xi_t \sim \mathrm{Uniform}[0,1]$ and $\varepsilon_{kt} \sim \mathcal{N}(0, 1)$. (The paper’s TeX prints F_{2t} = F_{4,t-1} + ε_{4t}, which is a typo; the surrounding text describes $F_{2t}$ as a drift-less unit root, so F_{4,t-1} reads as F_{2,t-1}.)

Either DGP feeds the same factor model

\[y_{it}^0 = \alpha + F_t' \lambda_i + e_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T,\]

with loadings $\lambda_{il} \stackrel{\text{iid}}{\sim} \mathcal{N}(1, 1)$ and idiosyncratic shocks

\[e_{1t} \sim \mathcal{N}(0, \sigma_{\text{tr}}^2), \qquad e_{it} \sim \mathcal{N}(0, \sigma_{\text{co}}^2) \quad (i \ge 2).\]

The paper studies three variance regimes via the keyword variance_case:

"equal" — $\sigma_{\text{tr}}^2 = \sigma_{\text{co}}^2 = 1$ (Figures 2, W.5; both the new CI and Xu’s bootstrap nominal);
"treated_smaller" — $\sigma_{\text{tr}} = 0.5 \sigma_{\text{co}}$ (Figures 3, W.6; Xu’s bootstrap overcovers);
"treated_larger" — $\sigma_{\text{tr}} = 2 \sigma_{\text{co}}$ (Figures 4, W.7; Xu’s bootstrap undercovers).

True ATT = 0 in every draw; the paper’s centred statistic $\sqrt{T_2}(\hat\Delta_1 - \Delta_1)$ is invariant to a constant treatment effect (see the equation following Equation 4.2), so coverage is unaffected by setting $\Delta_{1t} = 0$.

class mlsynth.utils.fma_helpers.simulation.FMASample(df: DataFrame, Y_treated: ndarray, Y_controls: ndarray, factors: ndarray, T1: int, T2: int, sigma_tr: float, sigma_co: float, dgp: str = 'dgp1')#

One draw from a Li & Sonnier DGP.

df#

Long panel with columns unit / time / y / D ready for mlsynth.FMA.

Type:: pd.DataFrame

Y_treated#

Treated-unit outcome path, shape (T,).

Type:: np.ndarray

Y_controls#

Control outcomes, shape (N_co, T).

Type:: np.ndarray

factors#

Common factor matrix, shape (T, 3).

Type:: np.ndarray

T1, T2

Pre- / post-treatment period counts.

Type:: int

sigma_tr, sigma_co

Idiosyncratic error standard deviations actually drawn.

Type:: float

dgp#

Which DGP was drawn ("dgp1" or "dgp2").

Type:: str

T1: int#

T2: int#

Y_controls: ndarray#

Y_treated: ndarray#

df: DataFrame#

dgp: str = 'dgp1'#

factors: ndarray#

sigma_co: float#

sigma_tr: float#

mlsynth.utils.fma_helpers.simulation.simulate_fma_sample(dgp: Literal['dgp1', 'dgp2'] = 'dgp1', N_co: int = 30, T1: int = 30, T2: int = 20, variance_case: Literal['equal', 'treated_smaller', 'treated_larger'] = 'equal', alpha: float = 0.0, rng: Generator | None = None) → FMASample#

Draw one sample from a Li & Sonnier (2023) DGP.

Parameters:

dgp ({“dgp1”, “dgp2”}, default "dgp1") – Stationary (DGP1, Section 4.1) or non-stationary (DGP2, Web Appendix E.1) factor process.
N_co (int, default 30) – Number of control units (the paper’s coverage exercise uses 30).
T1, T2 (int, default 30, 20) – Pre- and post-treatment period counts.
variance_case ({“equal”, “treated_smaller”, “treated_larger”}) – Which of the paper’s three variance regimes to simulate.
alpha (float, default 0.0) – Intercept shared by every unit (the paper’s $\alpha$; coverage is invariant to its value).
rng (np.random.Generator, optional) – NumPy RNG. Defaults to np.random.default_rng().

Returns:

FMASample

Diagnostic plot for FMA results.

mlsynth.utils.fma_helpers.plotter.plot_fma(results: FMAResults) → None#: Two-panel plot: trajectories + per-period gap with CI bands.

Note

FMA.fit() returns an EffectResult on the standardized two-family contract: res.att / res.att_ci / res.counterfactual / res.gap / res.pre_rmse resolve through the standardized sub-models (FMA is a factor-model counterfactual, so it carries no donor weights). The full asymptotic/bootstrap/placebo inference is on res.inference_detail (the bare res.inference slot is reserved for the standardized ATT-level InferenceResults).

Frozen dataclasses for the Factor Model Approach (FMA) estimator.

FMA implements Li and Sonnier (2023), “Statistical Inference for the Factor Model Approach to Estimate Causal Effects in Quasi-Experimental Settings”, JMR 60(3):449-472. The estimator constructs a counterfactual for a single treated unit by (i) extracting principal-component factors from the control panel, (ii) projecting the treated unit’s pre-period outcomes onto those factors, and (iii) using the resulting loadings to predict the treated unit’s untreated potential outcomes in the post-period.

Inference is the paper’s main contribution. Three procedures live side-by-side in FMAInference:

asymptotic – Theorem 3.1 (stationary) / Theorem 3.3 (non-stationary) normal CIs for the ATT.
bootstrap – Web Appendix F residual bootstrap for per-period ATT_t CIs.
placebo – Web Appendix G control-as-pseudo-treated bands.

The four layers below (inputs, design, inference, results) keep that pipeline pluggable.

class mlsynth.utils.fma_helpers.structures.FMADesign(n_factors: int, n_factors_source: str, factors: ndarray, lambda_hat: ndarray, counterfactual: ndarray, gap: ndarray, common_component: ndarray, residual_variance: float)#

Factor-model design produced by the FMA fit.

Parameters:

n_factors (int) – Number of common factors selected (or user-supplied).
n_factors_source (str) – "MBN" (modified Bai-Ng), "IPC1" (Bai 2004), "user" (user override).
factors (np.ndarray) – Estimated factors F_hat, shape (T, r).
lambda_hat (np.ndarray) – Estimated factor loading for the treated unit, shape (r + 1,) (intercept + loadings).
counterfactual (np.ndarray) – Predicted treated potential outcome under no treatment, shape (T,).
gap (np.ndarray) – Observed treated minus counterfactual, shape (T,).
common_component (np.ndarray) – Reconstructed control panel via factor projection, shape (T, N_co).
residual_variance (float) – Variance of the pre-treatment residuals y_1t - F_hat_t' lambda_hat.

common_component: ndarray#

counterfactual: ndarray#

factors: ndarray#

gap: ndarray#

lambda_hat: ndarray#

n_factors: int#

n_factors_source: str#

residual_variance: float#

class mlsynth.utils.fma_helpers.structures.FMAInference(method: str, alpha: float, att: float, asymptotic_att_se: float = nan, asymptotic_att_lower: float = nan, asymptotic_att_upper: float = nan, asymptotic_att_p_value: float = nan, bootstrap_att_t_lower: ~numpy.ndarray = <factory>, bootstrap_att_t_upper: ~numpy.ndarray = <factory>, bootstrap_replicates: ~numpy.ndarray = <factory>, bootstrap_n_replicates: int = 0, placebo_att_curves: ~numpy.ndarray = <factory>, placebo_quantile_lower: ~numpy.ndarray = <factory>, placebo_quantile_upper: ~numpy.ndarray = <factory>)#

Combined asymptotic + bootstrap + placebo inference output.

Parameters:

method (str) – Primary inference method tag ("asymptotic", "bootstrap", "placebo", or "none").
alpha (float) – Two-sided significance level.
att (float) – Mean post-treatment gap.
asymptotic_att_se (float) – Standard error of the ATT from the paper’s Theorem 3.1.
asymptotic_att_lower (float)
asymptotic_att_upper (float) – (1 - alpha) asymptotic CI bounds.
asymptotic_att_p_value (float) – Two-sided z-test of H_0: ATT = 0.
bootstrap_att_t_lower (np.ndarray)
bootstrap_att_t_upper (np.ndarray) – Per-period bootstrap CI bounds for ATT_t, shape (n_post,). Empty when method != "bootstrap".
bootstrap_replicates (np.ndarray) – (B, n_post) matrix of bootstrap replicate ATT_t.
bootstrap_n_replicates (int) – Number of bootstrap draws actually completed.
placebo_att_curves (np.ndarray) – (N_co, T) matrix of pseudo-ATT curves (one per control used as a placebo treated unit). Empty when method != "placebo".
placebo_quantile_lower (np.ndarray)
placebo_quantile_upper (np.ndarray) – Pointwise (alpha/2, 1 - alpha/2) quantile bands across the placebo curves at each period. Shape (T,).

alpha: float#

asymptotic_att_lower: float = nan#

asymptotic_att_p_value: float = nan#

asymptotic_att_se: float = nan#

asymptotic_att_upper: float = nan#

att: float#

bootstrap_att_t_lower: ndarray#

bootstrap_att_t_upper: ndarray#

bootstrap_n_replicates: int = 0#

bootstrap_replicates: ndarray#

method: str#

placebo_att_curves: ndarray#

placebo_quantile_lower: ndarray#

placebo_quantile_upper: ndarray#

class mlsynth.utils.fma_helpers.structures.FMAInputs(treated_outcome: ndarray, control_outcomes: ndarray, donor_names: ndarray, treated_unit_name: Any, T: int, T0: int, time_labels: ndarray, preprocessing: str, stationarity: str)#

Preprocessed panel data for FMA estimation.

Parameters:

treated_outcome (np.ndarray) – Outcome series for the treated unit, shape (T,).
control_outcomes (np.ndarray) – Outcome matrix for the N_co control units, shape (T, N_co).
donor_names (np.ndarray) – Labels of the control units, length N_co.
treated_unit_name (Any) – Label of the treated unit.
T (int) – Total number of panel periods.
T0 (int) – Number of pre-treatment periods.
time_labels (np.ndarray) – Labels of the time periods, length T.
preprocessing (str) – "demean" (default) or "standardize" – preprocessing applied to the control panel before PCA.
stationarity (str) – "stationary" or "nonstationary" – selects the appropriate factor-selection criterion (MBN vs Bai-IPC1).

property N_co: int#: Number of control units.

T: int#

T0: int#

control_outcomes: ndarray#

donor_names: ndarray#

property n_post: int#: Number of post-treatment periods.

preprocessing: str#

stationarity: str#

time_labels: ndarray#

treated_outcome: ndarray#

treated_unit_name: Any#

class mlsynth.utils.fma_helpers.structures.FMAResults(*, effects: ~mlsynth.config_models.EffectsResults | None = None, fit_diagnostics: ~mlsynth.config_models.FitDiagnosticsResults | None = None, time_series: ~mlsynth.config_models.TimeSeriesResults | None = None, weights: ~mlsynth.config_models.WeightsResults | None = None, inference: ~mlsynth.config_models.InferenceResults | None = None, method_details: ~mlsynth.config_models.MethodDetailsResults | None = None, sub_method_results: ~typing.Dict[str, ~typing.Any] | None = None, additional_outputs: ~typing.Dict[str, ~typing.Any] | None = None, raw_results: ~typing.Dict[str, ~typing.Any] | None = None, execution_summary: ~typing.Dict[str, ~typing.Any] | None = None, plot_config: ~mlsynth.config_models.PlotConfig | None = None, inputs: ~mlsynth.utils.fma_helpers.structures.FMAInputs, design: ~mlsynth.utils.fma_helpers.structures.FMADesign, inference_detail: ~mlsynth.utils.fma_helpers.structures.FMAInference, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Top-level container returned by mlsynth.FMA.fit().

An EffectResult (the observational report): it populates the standardized sub-models so the flat accessors (att / att_ci / counterfactual / gap / pre_rmse) resolve through the base contract. FMA is a factor-model counterfactual, so it carries no donor weights; the rich asymptotic/bootstrap/placebo inference lives on inference_detail (the standardized inference slot mirrors the ATT-level asymptotic CI).

Parameters:

inputs (FMAInputs) – Preprocessed panel.
design (FMADesign) – Factor-model design.
inference_detail (FMAInference) – Full inference output (was inference before the contract migration).
metadata (dict) – Free-form pipeline diagnostics.

design: FMADesign#

inference_detail: FMAInference#

inputs: FMAInputs#

metadata: Dict[str, Any]#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function BaseEstimatorResults.Config.<lambda>>}}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property n_factors: int#: Selected number of factors.

Example#

A self-contained one-draw Monte Carlo at a non-stationary two-factor DGP – the regime the paper’s empirical California / Brooklyn applications fall into. The example fits FMA in all three inference modes and prints the headline output.

"""One draw of a Li & Sonnier (2023) factor-model simulation."""

import numpy as np
import pandas as pd

from mlsynth import FMA


# ---------------------------------------------------------------------
# 1. Simulate one panel from a non-stationary two-factor DGP
# ---------------------------------------------------------------------

rng = np.random.default_rng(0)
J = 20           # control units
T_pre = 30
T_post = 10
T = T_pre + T_post
r_true = 2
tau_true = 1.0   # additive treatment effect on the treated unit

F = rng.standard_normal((T, r_true)).cumsum(axis=0)        # non-stationary
lam = rng.standard_normal((J + 1, r_true))
eps = rng.standard_normal((T, J + 1)) * 0.5
Y0 = F @ lam.T + eps
Y = Y0.copy()
Y[T_pre:, 0] += tau_true                                    # unit 0 treated

rows = [
    {
        "unit": j,
        "time": t,
        "y": float(Y[t, j]),
        "D": int(j == 0 and t >= T_pre),
    }
    for j in range(J + 1)
    for t in range(T)
]
df = pd.DataFrame(rows)


# ---------------------------------------------------------------------
# 2. Fit FMA with all three inference modes
# ---------------------------------------------------------------------

results = FMA({
    "df": df,
    "outcome": "y",
    "treat": "D",
    "unitid": "unit",
    "time": "time",
    "stationarity": "nonstationary",         # IPC1 factor selection
    "preprocessing": "demean",
    "inference_methods": ["asymptotic", "bootstrap", "placebo"],
    "n_bootstrap": 500,
    "alpha": 0.05,
    "display_graphs": False,
}).fit()


# ---------------------------------------------------------------------
# 3. Inspect the output
# ---------------------------------------------------------------------

print(f"true tau              = {tau_true:+.3f}")
print(f"ATT_hat               = {results.att:+.3f}")
print(f"r selected            = {results.design.n_factors} "
      f"({results.design.n_factors_source})")
print(f"pre-RMSE              = {results.pre_rmse:.4f}")
inf = results.inference_detail
print(f"asymptotic 95% CI ATT = "
      f"[{inf.asymptotic_att_lower:+.3f}, {inf.asymptotic_att_upper:+.3f}]")
print(f"asymptotic p-value    = {inf.asymptotic_att_p_value:.3f}")

import pandas as pd
print("\nPer-period ATT_t with bootstrap CI:")
print(pd.DataFrame({
    "t": np.arange(T_pre + 1, T + 1),
    "att_t": results.gap[T_pre:],
    "boot_lower": inf.bootstrap_att_t_lower,
    "boot_upper": inf.bootstrap_att_t_upper,
}).round(3).to_string(index=False))

Verification#

Monte Carlo replication (Path B). Li & Sonnier’s empirical applications – California beer sales and a Brooklyn eyeglass retailer’s showroom – run on Nielsen retail scanner data and a proprietary retailer panel, neither of which is redistributable, so the estimator is validated through the paper’s own Monte Carlo. The methodological contribution of the paper is that the new asymptotic CI from Theorem 3.1 attains nominal coverage regardless of whether $\sigma_{\text{tr}} = \sigma_{\text{co}}$, while the Xu (2017) bootstrap miscovers badly when the two variances differ (Figures 2-4 Panel A, plus the non-stationary mirror in Web Appendix E.1’s Figures W.5-W.7 and the sample-size sweeps in W.8-W.11).

Both DGPs – the stationary Section 4 DGP1 and the non-stationary Appendix E.1 DGP2 – are packaged in mlsynth.utils.fma_helpers.simulation.simulate_fma_sample(), together with the three variance regimes ("equal" / "treated_smaller" / "treated_larger"). True ATT is zero in every draw; the paper’s centred statistic $\sqrt{T - T_0}\,(\widehat{\tau} - \tau)$ is invariant to $\tau$ (see the equation following 4.2), so coverage doesn’t depend on its value.

Replicating the headline coverage findings is a 15-line script:

import numpy as np
from mlsynth import FMA
from mlsynth.utils.fma_helpers.simulation import simulate_fma_sample

def coverage_cell(dgp, variance_case, N_co, T1, T2, M, alpha=0.05):
    stationarity = "stationary" if dgp == "dgp1" else "nonstationary"
    covers = []
    for j in range(M):
        s = simulate_fma_sample(dgp=dgp, N_co=N_co, T1=T1, T2=T2,
                                  variance_case=variance_case,
                                  rng=np.random.default_rng(j))
        res = FMA({"df": s.df, "outcome": "y", "treat": "D",
                   "unitid": "unit", "time": "time",
                   "stationarity": stationarity,
                   "inference_methods": ["asymptotic"],
                   "alpha": alpha, "display_graphs": False}).fit()
        covers.append(res.inference_detail.asymptotic_att_lower <= 0.0
                       <= res.inference_detail.asymptotic_att_upper)
    return float(np.mean(covers))

for dgp in ("dgp1", "dgp2"):
    for case in ("equal", "treated_smaller", "treated_larger"):
        print(dgp, case,
              coverage_cell(dgp, case, N_co=30, T1=30, T2=20, M=1000))

At $M = 1{,}000$ (Li & Sonnier use $M = 100{,}000$; runtime difference is the only material change) mlsynth reproduces all four appendix-figure families. The MC standard error at $M = 1{,}000$ is $\sqrt{0.95 \cdot 0.05 / 1000} \approx 0.7$ pp.

Section 4 + Web Appendix E.1: three variance regimes#

$N_0 = 30$, $T_0 = 30$, $T - T_0 = 20$. Reproduces Figures 2-4 Panel A (DGP1, stationary) and Figures W.5-W.7 Panel A (DGP2, non-stationary).

DGP	Variance case	$\sigma_{\text{tr}}$	$\sigma_{\text{co}}$	mlsynth	paper
DGP1	`"equal"` (Fig 2A)	1.0	1.0	0.947	0.95
DGP1	`"treated_smaller"` (Fig 3A)	0.5	1.0	0.946	0.95
DGP1	`"treated_larger"` (Fig 4A)	2.0	1.0	0.951	0.95
DGP2	`"equal"` (Fig W.5A)	1.0	1.0	0.935	0.95
DGP2	`"treated_smaller"` (Fig W.6A)	0.5	1.0	0.944	0.95
DGP2	`"treated_larger"` (Fig W.7A)	2.0	1.0	0.929	0.95

Every cell lands within $\pm 2.1$ pp of nominal – well inside three MC standard errors. The headline takeaway is the equality of the three numbers within each DGP: coverage does not deteriorate when $\sigma_{\text{tr}} \ne \sigma_{\text{co}}$, which is precisely the regime where the Xu (2017) bootstrap mistakes the treated-error variance for the control-error variance and either over- or under-covers (Panel B of Figures 3-4 and W.6-W.7 in the paper).

Web Appendix E.2.1: other $(T_0, N_0)$ combinations#

Equal variance ($\sigma_{\text{tr}} = \sigma_{\text{co}} = 1$), $T - T_0 = 20$, sample-size sweeps over both DGPs reproducing Figures W.8-W.9 (stationary) and W.10-W.11 (non-stationary).

DGP	$T_0$	$N_0$	mlsynth	paper
DGP1	30	60	0.936	0.95 (Fig W.8A)
DGP1	60	30	0.939	0.95 (Fig W.8B)
DGP1	60	60	0.944	0.95 (Fig W.9A)
DGP1	120	120	0.949	0.95 (Fig W.9B)
DGP2	30	60	0.939	0.95 (Fig W.10A)
DGP2	60	30	0.936	0.95 (Fig W.10B)
DGP2	60	60	0.935	0.95 (Fig W.11A)
DGP2	120	120	0.955	0.95 (Fig W.11B)

All eight cells within $\pm 1.5$ pp of nominal, and the largest cell ($T_0 = N_0 = 120$) hits the asymptotic regime cleanly – the underlying theory is consistent in both panel dimensions, as the paper asserts.

Not replicated here#

Web Appendix E.2.2 (small-N/T $t$-distribution) – the paper recommends switching to $t_{T_0 - (N_0 + 1)}$ for very small samples, but the suggested degrees-of-freedom value is non-positive in the figure’s specific $(T_0, N_0) = (20, 20)$ configuration, and FMA’s public API exposes only the normal CI; the small-sample refinement is left as a future addition.
Web Appendix E.3 / W.15-W.17 (DGP3) – the paper bootstraps DGP3’s first factor from the proprietary Brooklyn sales panel, so the precise DGP cannot be reconstructed from public data; the corresponding empirical-application coverage check is therefore out of scope.

For reference, the corresponding empirical applications report:

California recreational marijuana legalization on beer sales – $\widehat{\tau} = -\$88{,}400$ weekly ($-0.464\%$), 95% CI $[-\$407{,}400,\ \$230{,}600]$ (Table 2; not significant). The Xu bootstrap’s interval is less than half as wide ($[-\$229{,}000,\ \$82{,}400]$), invalidly so given the variance ratio $\widehat{\sigma}_{\text{tr}}^2 / \widehat{\sigma}_{\text{co}}^2 \approx 37.4$.
Brooklyn showroom opening on weekly sales – $\widehat{\tau} = +\$2{,}446$ weekly ($+27.2\%$).

References#

Bai, J. (2004). “Estimating Cross-Section Common Stochastic Trends in Nonstationary Panel Data.” Journal of Econometrics 122(1):137-183.

Bai, J., & Ng, S. (2002). “Determining the Number of Factors in Approximate Factor Models.” Econometrica 70(1):191-221.

Li, K. T., & Sonnier, G. P. (2023). “Statistical Inference for the Factor Model Approach to Estimate Causal Effects in Quasi-Experimental Settings.” Journal of Marketing Research 60(3):449-472.

Factor Model Approach (FMA)

Contents

Factor Model Approach (FMA)#

When to Use This Method#

Factor projection vs. donor weighting#

Reach for FMA when#

Do not use FMA when#

Overview#

Notation#

Assumptions#

Mathematical Formulation#

Setup#

Factor extraction#

Loading and counterfactual#

Asymptotic inference (Theorem 3.1 / 3.3)#

Bootstrap inference (Web Appendix F)#

Placebo inference (Web Appendix G)#

Core API#

Configuration#

Helper Modules#

Procedure#

Example#

Verification#

Section 4 + Web Appendix E.1: three variance regimes#

Web Appendix E.2.1: other \((T_0, N_0)\) combinations#

Not replicated here#

References#