MCNNM — Matrix Completion with Nuclear-Norm Minimization (Athey et al. 2021)#

Estimator:

Matrix Completion with Nuclear Norm Minimization (MCNNM)mlsynth.MCNNM

Source:

Athey, S., Bayati, M., Doudchenko, N., Imbens, G., & Khosravi, K. (2021), “Matrix Completion Methods for Causal Panel Data Models,” Journal of the American Statistical Association 116(536):1716-1730.

Replication type:

Path A — Proposition 99 empirical, with a cross-validation against the causaltensor reference implementation.

Status:

Fully verified — estimand reproduced and matched against an independent MC-NNM implementation.

Validation strategy#

MC-NNM imputes the treated post-period cells of the outcome matrix as missing entries via low-rank completion with two-way fixed effects. Athey et al. report a Proposition 99 effect of roughly -20 packs per capita. mlsynth recovers that estimand, and we cross-validate against causaltensor.MC_NNM_with_cross_validation on the same matrix.

Why this is an estimand-level cross-check#

Both implementations solve the same SOFT-IMPUTE objective (Athey et al. 2021, eq. 4.3), but they differ in the two-way fixed-effect sub-solver:

  • mlsynth fits the unit/time effects on the observed cells only — the Athey et al. observed-set \(\mathcal{O}\) convention.

  • causaltensor fits them on the full \(O - M\) matrix.

With each library choosing its own regulariser by cross-validation, the two completed matrices are therefore close but not bit-identical. We accordingly cross-validate the ATT — the public estimand surfaced by res.att — not the raw low-rank fit.

Path A — Proposition 99#

import pandas as pd
from mlsynth import MCNNM

df = pd.read_csv("basedata/smoking_data.csv")
df["treat"] = df["Proposition 99"].astype(int)
res = MCNNM({"df": df[["state", "year", "cigsale", "treat"]],
             "outcome": "cigsale", "treat": "treat",
             "unitid": "state", "time": "year",
             "display_graphs": False}).fit()
res.att        # -19.83

Cross-validation against causaltensor#

import numpy as np, causaltensor as ct

wide = df.pivot(index="state", columns="year", values="cigsale").sort_index()
states, years = wide.index.tolist(), wide.columns.tolist()
O = wide.values.astype(float)
ti, sc = states.index("California"), years.index(1989)
Omega = np.ones_like(O); Omega[ti, sc:] = 0   # 1 = observed, 0 = missing
_, _, _, tau = ct.MC_NNM_with_cross_validation(O, Omega)
tau            # -20.27

Quantity

mlsynth

causaltensor

JASA headline

ATT (packs/capita)

-19.83

-20.27

\(\approx\) -20

The two CV-selected ATTs agree to \(|\Delta| = 0.44\) packs — about 2% of the estimand — with the residual attributable to the documented FE-solver difference plus the two independent cross-validation grids.

Durable check#

The benchmark lives in benchmarks/cases/mcnnm_prop99.py and runs in the default suite (skipping gracefully if causaltensor is absent):

pip install causaltensor
python benchmarks/run_benchmarks.py --case mcnnm_prop99

It asserts the ATT lands near the published -20 (tol 1.5) and matches causaltensor to within 1.0 pack.

References#

Athey, S., Bayati, M., Doudchenko, N., Imbens, G., & Khosravi, K. (2021). “Matrix Completion Methods for Causal Panel Data Models.” Journal of the American Statistical Association 116(536):1716-1730.