Matrix Completion with Nuclear Norm Minimization (MCNNM)
When to Use This Estimator
MCNNM implements the matrix-completion estimator of Athey, Bayati,
Doudchenko, Imbens and Khosravi [MCNNM]. The case for using it rests on a
unifying result: the authors show (their Theorem 1) that the
unconfoundedness, synthetic-control, and difference-in-differences estimators
all minimize the same least-squares objective and differ only in the
restrictions they impose (unconfoundedness reweights time periods,
synthetic control reweights control units, DiD imposes parallel trends).
MC-NNM replaces those hard restrictions with a nuclear-norm regularization
on a low-rank factor model, exploiting the cross-sectional and time-series
structure at once.
The practical payoff is robustness across regimes. Because it does not commit
to one restriction, MC-NNM performs well whether the panel is wide
(N >> T), tall (T >> N), or roughly square (N ~ T), settings
in which the unconfoundedness, synthetic-control, or DiD estimators individually
degrade. Reach for it when you have a single treated unit or many, a single
adoption date or staggered adoption, and you would rather regularize the
latent structure than assume which comparison (units vs. periods) is the right
one. The cost is that the output is a low-rank imputation rather than an
interpretable set of donor weights.
Do not use MCNNM when
An interpretable donor-weight story is the deliverable. MC-NNM imputes a low-rank matrix; it does not hand you a sparse “California = 0.4 Utah + 0.3 Montana” convex combination. If the weights are the result, use Two-Step Synthetic Control, Synthetic Control with Multiple Outcomes (SCMO), or Forward Difference-in-Differences (FDID).
There is no low-rank structure. The nuclear-norm regularization is only useful when the control matrix is approximately low rank plus noise. With a slowly decaying spectrum, prefer a balancing estimator such as MicroSynth (User-Level Balancing SC) or a selection estimator such as Forward Difference-in-Differences (FDID).
Missingness is informative (MNAR / self-masking) – the probability a cell is observed depends on its own value, as in recommender systems. MC-NNM assumes a structured-but-exogenous missingness pattern; use Synthetic Nearest Neighbors / Causal Matrix Completion (SNN), which is built for missing-not-at-random data.
Spillovers contaminate the control block (SUTVA fails). The low-rank signal model treats controls as untreated; use Spatial Synthetic Difference-in-Differences (SpSyDiD) or Spillover-Aware Synthetic Control (SPILLSYNTH).
Treatment is endogenous and you have an instrument. MC-NNM imputes \(Y(0)\) but does not break simultaneity; use Synthetic IV.
Distributional questions (quantiles, tails) – MC-NNM targets the mean ATT; use Distributional Synthetic Control (DSC).
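The low-rank condition above can be probed informally before fitting: two-way demean the control outcomes and check how much of the squared Frobenius norm the leading singular values carry. The sketch below is an illustrative diagnostic, not an mlsynth function; the name spectrum_share, the simple row/column demeaning, and the cut-off k are assumptions made for this example.

```python
import numpy as np

def spectrum_share(Y_controls: np.ndarray, k: int) -> float:
    """Share of the squared Frobenius norm carried by the top-k singular values
    of the two-way demeaned matrix (hypothetical diagnostic, not part of mlsynth)."""
    # Remove additive unit and time effects: subtract row and column means,
    # then add back the grand mean (two-way demeaning of a balanced panel).
    Z = (Y_controls
         - Y_controls.mean(axis=1, keepdims=True)
         - Y_controls.mean(axis=0, keepdims=True)
         + Y_controls.mean())
    s = np.linalg.svd(Z, compute_uv=False)  # singular values, largest first
    energy = s ** 2
    return float(energy[:k].sum() / energy.sum())

rng = np.random.default_rng(0)
N, T, r = 30, 40, 2
signal = rng.normal(size=(N, r)) @ rng.normal(size=(r, T))  # rank-2 interaction
noise = rng.normal(size=(N, T))

share_low_rank = spectrum_share(signal + 0.1 * noise, k=r)  # steep spectrum
share_noise = spectrum_share(noise, k=r)                    # flat spectrum
print(f"low-rank panel: {share_low_rank:.3f}, pure noise: {share_noise:.3f}")
```

A share that stays well below 1 for small k is the slowly decaying spectrum that argues against MC-NNM.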
Notation
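The outcome panel is the \(N \times T\) matrix \(\mathbf{Y} = (Y_{it})\) for units \(i = 1, \ldots, N\) over periods \(t = 1, \ldots, T\). A treatment-indicator matrix \(\mathbf{D}\) marks treated cells; the observed (untreated) cells form the index set \(\mathcal{O}\), and \(P_{\mathcal{O}}\) is the projection that zeros out the rest (\(P_{\mathcal{O}}^{\perp}\) its complement). The untreated potential outcomes follow a low-rank component plus two-way fixed effects,
\[ \mathbf{Y}(0) = \mathbf{L}^{*} + \boldsymbol{\Gamma}\mathbf{1}_T^{\top} + \mathbf{1}_N\boldsymbol{\Delta}^{\top} + \boldsymbol{\varepsilon}, \qquad \text{i.e.} \quad Y_{it}(0) = L^{*}_{it} + \Gamma_i + \Delta_t + \varepsilon_{it}, \]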
with unit effects \(\boldsymbol{\Gamma} \in \mathbb{R}^N\), time effects \(\boldsymbol{\Delta} \in \mathbb{R}^T\), and mean-zero noise. The nuclear norm \(\|\mathbf{L}\|_{*} = \sum_i \sigma_i(\mathbf{L})\) sums the singular values. The intervention splits the panel into pre-period \(\mathcal{T}_1 = \{1, \ldots, T_0\}\) and post-period; the treatment effect on a treated cell is \(\hat\tau_{it} = Y_{it} - \widehat{Y}_{it}(0)\) and the ATT is its average over treated cells.
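To make the notation concrete, the data-generating process can be simulated in a few lines, which is also a convenient way to test any implementation against a known truth. Sizes, rank, noise scale and the constant effect tau are arbitrary choices for this sketch; NumPy broadcasting of Gamma[:, None] and Delta[None, :] stands in for \(\boldsymbol{\Gamma}\mathbf{1}_T^{\top}\) and \(\mathbf{1}_N\boldsymbol{\Delta}^{\top}\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, r, T0, tau = 20, 30, 3, 20, -2.0

L_star = rng.normal(size=(N, r)) @ rng.normal(size=(r, T))  # rank-r component
Gamma = rng.normal(size=N)                                  # unit effects
Delta = rng.normal(size=T)                                  # time effects
eps = 0.1 * rng.normal(size=(N, T))                         # mean-zero noise

# Untreated potential outcomes: Y(0) = L* + Gamma 1_T^T + 1_N Delta^T + eps.
Y0 = L_star + Gamma[:, None] + Delta[None, :] + eps

# Unit 0 is treated from (0-based) column T0 onward; every other cell is untreated.
D = np.zeros((N, T), dtype=int)
D[0, T0:] = 1
Y = Y0 + tau * D          # observed outcomes
observed = D == 0         # the index set O as a boolean mask

# Nuclear norm = sum of singular values; numerical rank = count of non-negligible ones.
sv = np.linalg.svd(L_star, compute_uv=False)
nuclear_norm = sv.sum()
numerical_rank = int((sv > 1e-10 * sv[0]).sum())

# With the true Y(0) in hand, the ATT is the mean of Y - Y(0) over treated cells.
att_oracle = (Y - Y0)[D == 1].mean()
```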
The estimator
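Treating the treated cells as missing, MC-NNM solves (paper Eq. 4.3)
\[ (\widehat{\mathbf{L}}, \widehat{\boldsymbol{\Gamma}}, \widehat{\boldsymbol{\Delta}}) = \arg\min_{\mathbf{L},\,\boldsymbol{\Gamma},\,\boldsymbol{\Delta}} \left\{ \frac{1}{|\mathcal{O}|} \sum_{(i,t) \in \mathcal{O}} \left( Y_{it} - L_{it} - \Gamma_i - \Delta_t \right)^2 + \lambda \|\mathbf{L}\|_{*} \right\}. \]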
Only the low-rank part \(\mathbf{L}\) is regularized; the unit/time fixed effects are estimated explicitly and left unregularized, which markedly improves the imputations (paper Section 4). The counterfactual for a treated cell is read off the completed matrix, \(\widehat{Y}_{it}(0) = \widehat{L}_{it} + \widehat\Gamma_i + \widehat\Delta_t\).
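The objective can be read as a plain function of \((\mathbf{L}, \boldsymbol{\Gamma}, \boldsymbol{\Delta})\): the squared residual averaged over observed cells, plus \(\lambda\) times the nuclear norm of \(\mathbf{L}\) alone. The helper below is a hypothetical sketch; its \(1/|\mathcal{O}|\) scaling follows paper Eq. 4.3 as we read it, and mlsynth may parameterise the penalty differently. The toy check shows that treated (unobserved) cells never enter and that only \(\mathbf{L}\) is penalised.

```python
import numpy as np

def mcnnm_objective(Y, mask, L, gamma, delta, lam):
    """Mean squared residual over observed cells + lam * ||L||_* (hypothetical helper).
    The fixed effects gamma (N,) and delta (T,) enter the fit but are not penalised."""
    obs = mask.astype(bool)
    resid = Y - L - gamma[:, None] - delta[None, :]
    fit = np.sum(resid[obs] ** 2) / obs.sum()
    return fit + lam * np.linalg.norm(L, ord="nuc")

# Toy check: a pure two-way fixed-effects panel is fit exactly with L = 0.
gamma = np.array([1.0, 2.0, 3.0])
delta = np.array([0.0, 0.5, 1.0, 1.5])
Y = gamma[:, None] + delta[None, :]
mask = np.ones_like(Y)
mask[0, 2:] = 0                  # unit 0 treated in the last two periods
Y_contaminated = Y.copy()
Y_contaminated[0, 2:] += 100.0   # treated cells never enter the objective

zero_L = np.zeros_like(Y)
obj_exact = mcnnm_objective(Y_contaminated, mask, zero_L, gamma, delta, lam=0.5)
obj_penalised = mcnnm_objective(Y, mask, zero_L + 0.1, gamma, delta, lam=0.5)
```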
Algorithm (SOFT-IMPUTE)
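The problem is solved by singular-value soft-thresholding [Mazumder2010] (paper Eq. 4.4-4.5). Writing the singular value decomposition as \(\mathbf{A} = \mathbf{S}\boldsymbol{\Sigma}\mathbf{R}^{\top}\), the shrink operator is \(\mathrm{shrink}_\lambda(\mathbf{A}) = \mathbf{S}\,\widetilde{\boldsymbol{\Sigma}}\,\mathbf{R}^{\top}\), where \(\widetilde{\boldsymbol{\Sigma}}\) replaces each singular value \(\sigma\) by \(\max(\sigma - \lambda, 0)\). With \(\boldsymbol{\Gamma}, \boldsymbol{\Delta}\) at their current values, iterate
\[ \mathbf{L}_{k+1} = \mathrm{shrink}_{\lambda|\mathcal{O}|/2}\Big( P_{\mathcal{O}}\big(\mathbf{Y} - \boldsymbol{\Gamma}\mathbf{1}_T^{\top} - \mathbf{1}_N\boldsymbol{\Delta}^{\top}\big) + P_{\mathcal{O}}^{\perp}(\mathbf{L}_k) \Big), \]
re-fitting the fixed effects from their first-order conditions after each step
until convergence. The regularization strength \(\lambda\) is chosen by
cross-validation over the observed cells (an n_lambda-point grid,
n_folds folds).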
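A compact NumPy sketch of the iteration, under stated assumptions: the threshold thr is applied directly to singular values (absorbing any rescaling of \(\lambda\)), the fixed effects get one pass of their observed-cell first-order conditions (row and column means of the current residual) per iteration, and convergence is judged on the relative change of the full fitted matrix. The function names are hypothetical; this illustrates the algorithm, it is not mlsynth's mcnnm_fit.

```python
import numpy as np

def shrink(A, thr):
    """Singular-value soft-thresholding: each sigma becomes max(sigma - thr, 0)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - thr, 0.0)) @ Vt

def soft_impute_fe(Y, mask, thr, max_iter=500, tol=1e-6):
    """Sketch of SOFT-IMPUTE with unpenalised unit/time effects.
    Returns L, gamma, delta and the completed matrix L + gamma + delta."""
    obs = mask.astype(bool)
    N, T = Y.shape
    L, gamma, delta = np.zeros((N, T)), np.zeros(N), np.zeros(T)
    n_row = np.maximum(obs.sum(axis=1), 1)   # observed cells per unit
    n_col = np.maximum(obs.sum(axis=0), 1)   # observed cells per period
    fitted = np.zeros((N, T))
    for _ in range(max_iter):
        # Fixed-effect first-order conditions restricted to observed cells.
        gamma = np.where(obs, Y - L - delta[None, :], 0.0).sum(axis=1) / n_row
        delta = np.where(obs, Y - L - gamma[:, None], 0.0).sum(axis=0) / n_col
        # P_O(Y - FE) on observed cells, current L on missing cells, then shrink.
        target = np.where(obs, Y - gamma[:, None] - delta[None, :], L)
        L = shrink(target, thr)
        new_fitted = L + gamma[:, None] + delta[None, :]
        step = np.linalg.norm(new_fitted - fitted)
        fitted = new_fitted
        if step <= tol * max(np.linalg.norm(fitted), 1.0):
            break
    return L, gamma, delta, fitted

# Usage on simulated data: rank-2 interaction + two-way effects, unit 0 treated late.
rng = np.random.default_rng(3)
N, T, T0 = 30, 40, 25
A = rng.normal(size=(N, 2))
A[0] = [2.0, -1.5]                      # give unit 0 a clearly non-additive path
truth = (A @ rng.normal(size=(2, T))
         + rng.normal(size=N)[:, None] + rng.normal(size=T)[None, :])
Y = truth + 0.05 * rng.normal(size=(N, T))
mask = np.ones((N, T), dtype=int)
mask[0, T0:] = 0                        # unit 0's post-treatment cells are missing
Y[0, T0:] += 3.0                        # a treatment effect; never used in the fit

_, _, _, completed = soft_impute_fe(Y, mask, thr=1.0)
_, _, _, fe_only = soft_impute_fe(Y, mask, thr=1e6)   # huge threshold forces L = 0
rmse_mc = np.sqrt(np.mean((completed[0, T0:] - truth[0, T0:]) ** 2))
rmse_fe = np.sqrt(np.mean((fe_only[0, T0:] - truth[0, T0:]) ** 2))
```

Passing a huge threshold keeps \(\mathbf{L} = \mathbf{0}\), so the second call reproduces a fixed-effects-only imputation, a useful baseline when judging how much the low-rank part contributes.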
Assumptions and remarks
Assumption 1 (low rank + two-way effects). The untreated outcomes are a low-rank matrix plus unit and time fixed effects, with mean-zero idiosyncratic noise. Remark. This is the single structural assumption that subsumes the others: a rank-\(r\) factor model nests interactive fixed effects, while the explicit two-way effects absorb additive unit/time shifts so the nuclear-norm penalty only has to recover the interaction structure.
Assumption 2 (missingness / no anticipation). Treatment makes a cell’s untreated outcome missing; the observed (untreated) cells are informative for the missing ones, and there is no anticipation. Remark. Staggered adoption produces a “staircase” of missing cells, which the mask-based completion fills directly – MC-NNM’s main advantage over fixed-rank interactive-fixed-effects methods that need a rectangular treated block.
Assumption 3 (regularization rate). \(\lambda\) shrinks at the rate the theory prescribes so the completion is consistent; in practice it is selected by cross-validation on held-out observed cells. Remark. Choosing \(\lambda\) too small overfits noise into \(\mathbf{L}\); too large over-shrinks and biases the imputed counterfactual toward the fixed-effects-only fit.
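The two failure modes in the remark can be seen in the shrink operator alone: at a zero threshold every singular direction survives, noise included, and once the threshold reaches the largest singular value the low-rank part is zeroed, leaving only the fixed effects to form the counterfactual. A small illustration on a random matrix (sizes and seed are arbitrary):

```python
import numpy as np

def shrink(A, thr):
    """Soft-threshold the singular values of A by thr."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - thr, 0.0)) @ Vt

rng = np.random.default_rng(2)
A = rng.normal(size=(15, 12))
s = np.linalg.svd(A, compute_uv=False)

# Rank of shrink(A, thr) = number of singular values strictly above thr.
ranks = [int(np.sum(s > thr)) for thr in (0.0, 0.5 * s[0], s[0])]

no_shrinkage = shrink(A, 0.0)     # reproduces A exactly: noise is kept in L
full_shrinkage = shrink(A, s[0])  # every singular value thresholded to 0: L = 0
```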
Assumption 4 (jackknife inference). The leave-one-control refits are
approximately exchangeable, so their dispersion estimates the ATT’s sampling
variability. Remark. MC-NNM has no closed-form standard error; mlsynth
follows the matrix-completion literature in using a jackknife (see Inference).
Causal use and staggered adoption
MCNNM marks the treated post-treatment cells as missing, imputes their
untreated outcomes, and forms \(\hat\tau_{it} = Y_{it} - \widehat{Y}_{it}(0)\),
aggregated to the ATT over treated cells. Adoption times are detected with
mlsynth.utils.datautils.dataprep(), and the result exposes two
staggered-aware aggregations beyond the overall ATT:
- cohort_att – {adoption_time: ATT} for each adoption cohort.
- event_study – {relative_time: average effect}, re-centred on each unit’s own adoption date (negative keys are pre-adoption fit checks, ~0; non-negative keys are the dynamic effects).
With multiple adoption times, display_graphs=True draws an event-study
plot (effect vs. time-since-adoption) rather than a single calendar trajectory,
so cohorts at different event times are not blended.
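The three aggregations can be sketched from a gap matrix (observed minus fitted) and each treated unit's adoption period. This is an illustrative reimplementation under assumptions: cohort ATTs average all treated cells in a cohort, and negative event times use the pre-adoption fit residuals of ever-treated units. The function name and arguments are hypothetical and mlsynth's internals may differ.

```python
import numpy as np
from collections import defaultdict

def aggregate_effects(Y, completed, D, time_labels, adoption):
    """Y, completed, D: (N, T) arrays; time_labels: list of period labels;
    adoption: {unit_index: adoption period label} for ever-treated units."""
    gap = Y - completed              # observed minus fitted in every cell
    treated = D.astype(bool)
    att = float(gap[treated].mean())
    # Calendar time: pool every unit treated in period t.
    att_by_period = {time_labels[t]: float(gap[treated[:, t], t].mean())
                     for t in range(D.shape[1]) if treated[:, t].any()}
    col = {label: j for j, label in enumerate(time_labels)}
    by_cohort, by_event = defaultdict(list), defaultdict(list)
    for i, a in adoption.items():
        by_cohort[a].extend(gap[i, treated[i]])
        for t in range(D.shape[1]):
            # Re-centre on the unit's own adoption date; negative = pre-adoption.
            by_event[t - col[a]].append(gap[i, t])
    cohort_att = {a: float(np.mean(v)) for a, v in by_cohort.items()}
    event_study = {k: float(np.mean(v)) for k, v in sorted(by_event.items())}
    return att, att_by_period, cohort_att, event_study

# Toy staggered panel: unit 0 adopts in 2003, unit 1 in 2004, units 2-3 never.
labels = list(range(2000, 2006))
D = np.zeros((4, 6), dtype=int)
D[0, 3:] = 1
D[1, 4:] = 1
completed = np.zeros((4, 6))
Y = np.zeros((4, 6))
for i, start in ((0, 3), (1, 4)):
    Y[i, start:] = np.arange(1, 6 - start + 1)   # effect 1, 2, ... after adoption
att, att_by_period, cohort_att, event_study = aggregate_effects(
    Y, completed, D, labels, {0: 2003, 1: 2004})
```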
Inference
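Setting inference=True runs a leave-one-control jackknife for the ATT:
drop one control unit, refit at the cross-validation-selected \(\lambda\),
recompute the ATT, and repeat for each of the q control units. With
\(\hat\tau_{(j)}\) the ATT after deleting control j and
\(\bar\tau = q^{-1}\sum_{j=1}^{q}\hat\tau_{(j)}\), the variance estimate is
\(\widehat{\mathrm{se}}^2 = \tfrac{q-1}{q}\sum_{j=1}^{q}(\hat\tau_{(j)} - \bar\tau)^2\),
and the confidence interval is a two-sided Wald interval at significance level
alpha (default 0.05, i.e. 95% coverage). This jackknife is a standard choice
for matrix-completion estimators, for which no analytic SE exists; it captures
donor-pool uncertainty. The interval is returned on MCNNMInference (se, ci).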
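The jackknife formula translates directly into code. The sketch below assumes the q leave-one-control ATTs have already been computed (each refit is a full MC-NNM fit at the CV-selected \(\lambda\)), centres the Wald interval at the full-sample ATT, and uses the standard normal critical value; those choices, the function name, and the toy numbers are assumptions rather than documented mlsynth behaviour.

```python
import numpy as np
from statistics import NormalDist

def jackknife_se_ci(att, loo_atts, alpha=0.05):
    """Leave-one-control jackknife SE and two-sided Wald CI (illustrative sketch).
    att: full-sample ATT; loo_atts: the q ATTs re-estimated with one control dropped."""
    tau = np.asarray(loo_atts, dtype=float)
    q = tau.size
    se = float(np.sqrt((q - 1) / q * np.sum((tau - tau.mean()) ** 2)))
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return se, (att - z * se, att + z * se)

# Toy numbers: three controls, so three leave-one-out ATTs.
se, ci = jackknife_se_ci(att=2.0, loo_atts=[1.0, 2.0, 3.0])
```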
Example
Proposition 99 – California’s 1988 tobacco-control program. MC-NNM imputes
California’s post-1988 counterfactual per-capita cigarette sales by matrix
completion and reports the ATT; with display_graphs=True it draws the
observed-vs-counterfactual chart.
```python
import pandas as pd
from mlsynth import MCNNM

# Proposition 99 panel in long format (one row per state-year).
url = ("https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
       "refs/heads/main/basedata/smoking_data.csv")
df = pd.read_csv(url)

res = MCNNM({
    "df": df, "outcome": "cigsale", "treat": "Proposition 99",
    "unitid": "state", "time": "year",
    "inference": True,       # jackknife ATT SE / CI
    "display_graphs": True,  # observed vs MC-NNM counterfactual
}).fit()

print(f"ATT (avg 1989-2000) = {res.att:+.2f} packs/capita")
lo, hi = res.inference.ci    # populated because inference=True
print(f"jackknife 95% CI = [{lo:+.2f}, {hi:+.2f}]")
print(f"gap by 2000 = {res.att_by_period[2000]:+.2f}")  # keyed by year label
print(f"selected lambda = {res.best_lambda:.2f}")
```
Verification
Note
Empirical (Proposition 99). MC-NNM fits California's pre-treatment path
almost exactly and yields an average ATT of about \(-20\) packs per capita,
widening to roughly \(-30\) by 2000, consistent with Abadie, Diamond &
Hainmueller [ABADIE2010] and with the synthetic-control / SNN estimates
elsewhere in mlsynth. The jackknife confidence interval excludes zero.
Regime robustness. Because MC-NNM regularizes rather than restricts, the
same estimator runs unchanged on wide (N >> T), tall (T >> N) and
square panels and on staggered adoption, where the unconfoundedness / SC /
DiD special cases (paper Theorem 1) individually break down.
Core API
MCNNM: Matrix Completion with Nuclear Norm Minimization (Athey et al. 2021).
Athey, S., Bayati, M., Doudchenko, N., Imbens, G. & Khosravi, K. (2021). “Matrix Completion Methods for Causal Panel Data Models.” Journal of the American Statistical Association 116(536):1716-1730.
MC-NNM estimates causal effects in panel data by treating the treated unit/period cells as missing entries of the outcome matrix and imputing them via low-rank matrix completion. It models the untreated-outcome matrix as a low-rank component plus two-way (unit and time) fixed effects,
\[ Y = L^* + \Gamma 1_T^\top + 1_N \Delta^\top + \varepsilon, \]
regularising only the low-rank part \(L\) via its nuclear norm (the sum of singular values). The fixed effects are estimated explicitly and left unregularised, which substantially improves imputation. The problem is solved by the SOFT-IMPUTE iteration (singular-value soft-thresholding) with the regularisation strength chosen by cross-validation over the observed cells.
MC-NNM nests the unconfoundedness, synthetic-control, and
difference-in-differences estimators (paper Theorem 1): all minimise the
same objective and differ only in the restrictions/regularisation they
impose. By regularising rather than imposing hard restrictions, MC-NNM
performs well whether N >> T, T >> N, or N ~ T – regimes
where the unconfoundedness or synthetic-control approaches individually
break down.
This estimator targets the block / staggered-adoption causal setting: control units and treated units’ pre-treatment periods are the observed entries; treated post-treatment cells are imputed, and the treatment effect is the observed outcome minus the imputed counterfactual.
- class mlsynth.estimators.mcnnm.MCNNM(config: MCNNMConfig | dict)
Bases: object
Matrix Completion with Nuclear Norm Minimization estimator.
- Parameters:
config (MCNNMConfig or dict) – Configuration object. See
mlsynth.config_models.MCNNMConfig.
- fit() → MCNNMResults
Run MC-NNM and return
MCNNMResults.
Configuration
- class mlsynth.config_models.MCNNMConfig(*, df: DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', estimate_unit_fe: bool = True, estimate_time_fe: bool = True, n_lambda: Annotated[int, Ge(ge=2)] = 40, n_folds: Annotated[int, Ge(ge=2)] = 5, inference: bool = False, alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05, random_state: int = 0)
Configuration for the MC-NNM estimator.
Athey, Bayati, Doudchenko, Imbens & Khosravi (2021), “Matrix Completion Methods for Causal Panel Data Models” (JASA). Imputes the treated cells of the outcome matrix via nuclear-norm-regularised low-rank matrix completion with unregularised two-way fixed effects (SOFT-IMPUTE, threshold chosen by cross-validation). Inherits the standard df/outcome/treat/unitid/time interface.
- Parameters:
estimate_unit_fe (bool) – Estimate (unregularised) unit fixed effects. Default True.
estimate_time_fe (bool) – Estimate (unregularised) time fixed effects. Default True.
n_lambda (int) – Number of candidate singular-value thresholds in the CV grid.
n_folds (int) – Cross-validation folds over the observed cells.
inference (bool) – Run a leave-one-control jackknife for the ATT SE / CI. Default False (it refits the model once per control unit).
alpha (float) – Two-sided level for the jackknife confidence interval.
random_state (int) – Seed for the CV fold assignment.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
Result Containers
MCNNM.fit() returns a
MCNNMResults: the ATT, the
completed counterfactual matrix and per-cell effects, att_by_period,
cohort_att and event_study aggregations, the low-rank factors
(unit_factors, time_factors, singular_values, rank), the fixed
effects, the CV-selected best_lambda, implied (non-unique) donor weights,
and – when requested – the
MCNNMInference jackknife.
Frozen dataclasses for the MC-NNM estimator.
Athey, Bayati, Doudchenko, Imbens & Khosravi (2021), Matrix Completion Methods for Causal Panel Data Models (JASA). MC-NNM imputes the missing (treated) entries of the outcome matrix by nuclear-norm-regularised low-rank matrix completion with unregularised two-way fixed effects, then forms treatment effects as observed minus imputed.
- class mlsynth.utils.mcnnm_helpers.structures.MCNNMInference(method: str, se: float, ci: tuple, alpha_level: float, n_jackknife: int)
Bases: object
Leave-one-control jackknife inference for the MC-NNM ATT.
- class mlsynth.utils.mcnnm_helpers.structures.MCNNMInputs(Y: ndarray, mask: ndarray, D: ndarray, treated_idx: ndarray, T0: int, unit_names: List[Any], time_labels: ndarray)
Bases: object
Preprocessed panel for MC-NNM.
- Y
Observed outcomes, shape (N, T).
- Type:
np.ndarray
- mask
Observation indicator, shape (N, T); 1 = observed (control / pre-treatment), 0 = missing (treated post-treatment).
- Type:
np.ndarray
- D
Treatment indicators, shape (N, T) (1 where treated).
- Type:
np.ndarray
- treated_idx
Indices of ever-treated units.
- Type:
np.ndarray
- time_labels
- Type:
np.ndarray
- D: ndarray
- Y: ndarray
- mask: ndarray
- time_labels: ndarray
- treated_idx: ndarray
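The relation between D and mask under staggered adoption (the “staircase” of missing cells) can be sketched directly from adoption columns. This is a hypothetical helper for illustration, assuming absorbing treatment; prepare_mcnnm_inputs derives the same objects from the treat column of the long DataFrame, and its exact handling of other treatment patterns is not documented here.

```python
import numpy as np

def build_masks(N, T, adoption_col):
    """Hypothetical helper: adoption_col maps an ever-treated unit's row index to
    the column index of its first treated period (treatment assumed absorbing)."""
    D = np.zeros((N, T), dtype=int)
    for i, t_adopt in adoption_col.items():
        D[i, t_adopt:] = 1              # treated from adoption onward
    mask = 1 - D                        # 1 = observed (untreated), 0 = to be imputed
    treated_idx = np.array(sorted(adoption_col), dtype=int)
    return D, mask, treated_idx

# Two cohorts: unit 0 adopts at column 2, unit 1 at column 3; units 2-3 are controls.
D, mask, treated_idx = build_masks(N=4, T=5, adoption_col={0: 2, 1: 3})
```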
- class mlsynth.utils.mcnnm_helpers.structures.MCNNMResults(inputs: MCNNMInputs, att: float, counterfactual: ndarray, effects: ndarray, att_by_period: Dict[Any, float], cohort_att: Dict[Any, float], event_study: Dict[int, float], L: ndarray, gamma: ndarray, delta: ndarray, best_lambda: float, rank: int, unit_factors: ndarray | None = None, time_factors: ndarray | None = None, singular_values: ndarray | None = None, weights: Any | None = None, inference: MCNNMInference | None = None, metadata: Dict[str, Any] = <factory>)
Bases: object
Top-level container returned by mlsynth.MCNNM.fit().
- inputs
- Type:
MCNNMInputs
- att
Average treatment effect on the treated (observed minus imputed over the treated cells).
- Type:
float
- counterfactual
Full fitted matrix L + Gamma + Delta, shape (N, T); on treated cells this is the imputed untreated potential outcome.
- Type:
np.ndarray
- effects
Per-cell effects (observed minus imputed) on treated cells; NaN elsewhere, shape (N, T).
- Type:
np.ndarray
- att_by_period
{period_label: mean effect across treated units} post-treatment (calendar time – pools cohorts at each period).
- Type:
Dict[Any, float]
- cohort_att
{adoption_time_label: mean ATT for that adoption cohort} – the cohort-specific effects under staggered adoption.
- Type:
Dict[Any, float]
- event_study
{relative_time: mean effect across treated cells at that event time}, where relative time is period - adoption period for each treated unit. Negative keys are pre-adoption (a placebo / fit-quality check, ~0); non-negative keys are the dynamic treatment effects.
- Type:
Dict[int, float]
- L
Estimated low-rank matrix, shape (N, T).
- Type:
np.ndarray
- gamma
Unit fixed effects, shape (N,).
- Type:
np.ndarray
- delta
Time fixed effects, shape (T,).
- Type:
np.ndarray
- unit_factors
Unit loadings \(U \Sigma^{1/2}\) from the SVD of L, shape (N, rank) – MC-NNM’s “under the hood”: each unit’s position in the latent factor space.
- Type:
np.ndarray
- time_factors
Time factors \(V \Sigma^{1/2}\), shape (T, rank).
- Type:
np.ndarray
- singular_values
Singular values of L (the nuclear-norm spectrum).
- Type:
np.ndarray
- weights
Implied per-treated-unit donor weights, obtained by projecting the treated unit’s low-rank row onto the control rows. MC-NNM is a factorisation (not a weighting) estimator, so these are a derived, non-unique diagnostic – flagged as such in summary_stats.
- Type:
WeightsResults, optional
- inference
MCNNMInference when inference=True; else None.
- Type:
object, optional
- L: ndarray
- counterfactual: ndarray
- delta: ndarray
- effects: ndarray
- gamma: ndarray
- inference: MCNNMInference | None = None
- inputs: MCNNMInputs
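The factor and weight diagnostics can be reproduced from any estimated \(L\). In the sketch below the factors follow the \(U \Sigma^{1/2}\), \(V \Sigma^{1/2}\) split listed above, and the implied weights are taken to be the minimum-norm least-squares projection of the treated row onto the control rows; mlsynth's exact projection (any constraints, and its WeightsResults wrapper) is not documented here, so treat that choice as an assumption. The last lines show why the weights are non-unique: adding a left-null-space direction of the control rows leaves the fit unchanged.

```python
import numpy as np

def factor_decomposition(L, rel_tol=1e-8):
    """Split L = U S V^T into unit loadings U S^{1/2} and time factors V S^{1/2}."""
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    rank = int(np.sum(s > rel_tol * s[0])) if s[0] > 0 else 0
    root = np.sqrt(s[:rank])
    unit_factors = U[:, :rank] * root        # shape (N, rank)
    time_factors = Vt[:rank].T * root        # shape (T, rank)
    return unit_factors, time_factors, s, rank

def implied_weights(L, treated, controls, rcond=1e-10):
    """Minimum-norm least-squares w such that w @ L[controls] ~= L[treated]."""
    # lstsq returns the minimum-norm solution when the system is rank deficient;
    # rcond treats numerically zero singular values as exactly zero.
    w, *_ = np.linalg.lstsq(L[controls].T, L[treated], rcond=rcond)
    return w

rng = np.random.default_rng(4)
L = rng.normal(size=(12, 2)) @ rng.normal(size=(2, 20))  # rank-2 stand-in for L-hat
unit_f, time_f, s, rank = factor_decomposition(L)

controls = np.arange(1, 12)              # unit 0 plays the treated unit
w = implied_weights(L, treated=0, controls=controls)

# Non-uniqueness: a left-null-space direction of L[controls] changes w, not the fit.
U_c, _, _ = np.linalg.svd(L[controls])   # L[controls] has shape (11, 20)
null_vec = U_c[:, -1]                    # paired with a numerically zero singular value
alt_w = w + null_vec
```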
Helper Modules
Data preparation – the DataFrame touchpoint: pivots to the outcome matrix and the observed/treated masks.
Panel ingestion for the MC-NNM estimator.
- mlsynth.utils.mcnnm_helpers.setup.prepare_mcnnm_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str) → MCNNMInputs
Pivot a long panel into MCNNMInputs.
Observed entries (mask == 1) are the control units and the treated units’ pre-treatment periods; the treated post-treatment cells are the missing entries MC-NNM imputes.
The SOFT-IMPUTE completion solver and the cross-validation over \(\lambda\).
Core MC-NNM engine: soft-impute with unregularised fixed effects.
Athey, S., Bayati, M., Doudchenko, N., Imbens, G. & Khosravi, K. (2021). “Matrix Completion Methods for Causal Panel Data Models.” Journal of the American Statistical Association 116(536):1716-1730.
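The model is \(Y = L^* + \Gamma 1_T^\top + 1_N \Delta^\top + \varepsilon\), and the estimator (paper eq. 4.3)
\[ (\hat L, \hat\Gamma, \hat\Delta) = \arg\min_{L,\,\Gamma,\,\Delta} \; \frac{1}{|\mathcal{O}|} \sum_{(i,t) \in \mathcal{O}} \left( Y_{it} - L_{it} - \Gamma_i - \Delta_t \right)^2 + \lambda \|L\|_* \]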
regularises only the low-rank part \(L\) (the unit/time fixed effects \(\Gamma, \Delta\) are estimated explicitly, unregularised, to reduce bias). It is solved by the SOFT-IMPUTE iteration (eq. 4.4-4.5): soft-threshold the singular values of the filled-in matrix, then re-fit the fixed effects, until convergence.
- mlsynth.utils.mcnnm_helpers.completion.mcnnm_cv(Y: ndarray, mask: ndarray, *, est_u: bool = True, est_v: bool = True, n_lam: int = 40, n_folds: int = 5, max_iter: int = 400, tol: float = 1e-05, random_state: int = 0) → dict
Select the threshold by K-fold cross-validation over observed cells.
For each candidate threshold, a fraction of observed entries is held out, the model is fit on the rest, and out-of-sample squared error is averaged over folds; the threshold minimising it is chosen, then the final fit uses all observed entries (paper “Cross-validation”).
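The cross-validation loop can be sketched as follows, assuming balanced random folds over the observed cells, mean squared error on the held-out cells, and a hand-picked threshold grid; mlsynth builds its own n_lam-point grid, which is not reproduced here. The compact solver repeats the SOFT-IMPUTE-with-fixed-effects iteration so the block runs on its own; all names are hypothetical.

```python
import numpy as np

def shrink(A, thr):
    """Soft-threshold the singular values of A by thr."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - thr, 0.0)) @ Vt

def fit_completed(Y, mask, thr, max_iter=300, tol=1e-6):
    """Compact SOFT-IMPUTE with unpenalised unit/time effects; returns L + gamma + delta."""
    obs = mask.astype(bool)
    N, T = Y.shape
    L, delta, fitted = np.zeros((N, T)), np.zeros(T), np.zeros((N, T))
    n_row, n_col = np.maximum(obs.sum(1), 1), np.maximum(obs.sum(0), 1)
    for _ in range(max_iter):
        gamma = np.where(obs, Y - L - delta, 0.0).sum(1) / n_row
        delta = np.where(obs, Y - L - gamma[:, None], 0.0).sum(0) / n_col
        L = shrink(np.where(obs, Y - gamma[:, None] - delta, L), thr)
        new = L + gamma[:, None] + delta
        done = np.linalg.norm(new - fitted) <= tol * max(np.linalg.norm(new), 1.0)
        fitted = new
        if done:
            break
    return fitted

def cv_threshold(Y, mask, thresholds, n_folds=5, seed=0):
    """K-fold CV over observed cells; returns the best threshold and per-threshold MSE."""
    rng = np.random.default_rng(seed)
    obs_flat = np.flatnonzero(mask.ravel())
    fold = rng.permutation(obs_flat.size) % n_folds   # balanced random folds
    errors = []
    for thr in thresholds:
        sse = 0.0
        for k in range(n_folds):
            held = obs_flat[fold == k]
            train = mask.ravel().copy()
            train[held] = 0                           # hide this fold's observed cells
            pred = fit_completed(Y, train.reshape(mask.shape), thr).ravel()
            sse += np.sum((Y.ravel()[held] - pred[held]) ** 2)
        errors.append(sse / obs_flat.size)
    best = thresholds[int(np.argmin(errors))]
    return best, errors

rng = np.random.default_rng(5)
N, T = 20, 25
truth = (rng.normal(size=(N, 2)) @ rng.normal(size=(2, T))
         + rng.normal(size=N)[:, None] + rng.normal(size=T))
Y = truth + 0.1 * rng.normal(size=(N, T))
mask = np.ones((N, T), dtype=int)
mask[:3, 18:] = 0                            # three treated units, last 7 periods missing
thresholds = [100.0, 10.0, 3.0, 1.0, 0.3]    # hypothetical grid, largest first
best, errors = cv_threshold(Y, mask, thresholds)
```

The largest threshold zeroes \(L\) and so acts as the fixed-effects-only benchmark; CV picks a smaller threshold whenever the interaction structure carries out-of-sample signal.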
- mlsynth.utils.mcnnm_helpers.completion.mcnnm_fit(Y: ndarray, mask: ndarray, thr: float, *, est_u: bool = True, est_v: bool = True, max_iter: int = 400, tol: float = 1e-05) → dict
Fit MC-NNM for a given SVD threshold thr.
- Parameters:
Y (np.ndarray) – Outcome matrix, shape (N, T). Missing entries may hold any value; only observed (mask == 1) entries are used.
mask (np.ndarray) – Observation indicator, shape (N, T); 1 = observed, 0 = missing (to be imputed).
thr (float) – Singular-value soft-threshold (the regularisation strength).
est_u, est_v (bool) – Estimate unit / time fixed effects.
- Returns:
dict with L (low-rank matrix), gamma (N,), delta (T,), and completed = L + gamma + delta (the full fitted matrix).
Run loop: completion, ATT and staggered aggregations, factor decomposition, and the jackknife.
Orchestration for the MC-NNM estimator (Athey et al. 2021).
- mlsynth.utils.mcnnm_helpers.pipeline.run_mcnnm(inputs: MCNNMInputs, *, est_u: bool = True, est_v: bool = True, n_lam: int = 40, n_folds: int = 5, max_iter: int = 400, tol: float = 1e-05, inference: bool = False, alpha_level: float = 0.05, random_state: int = 0, adoption_times: dict | None = None) → MCNNMResults
Run MC-NNM (CV over the threshold) and assemble MCNNMResults.
- Parameters:
inputs (MCNNMInputs)
est_u, est_v (bool) – Estimate unit / time fixed effects (recommended; default True).
n_lam (int) – Number of candidate thresholds in the CV grid.
n_folds (int) – Cross-validation folds over observed cells.
inference (bool) – If True, run a leave-one-control jackknife (at the CV-selected threshold) for the ATT SE / CI.