Matrix Completion with Nuclear Norm Minimization (MCNNM)


When to Use This Estimator#

MCNNM implements the matrix-completion estimator of Athey, Bayati, Doudchenko, Imbens and Khosravi [MCNNM]. The case for it rests on a unifying result: the authors show (their Theorem 1) that the unconfoundedness, synthetic-control, and difference-in-differences estimators all minimize the same least-squares objective and differ only in the restrictions they impose. Unconfoundedness weights time periods, synthetic control weights control units, and DiD imposes parallel trends. MC-NNM replaces those hard restrictions with nuclear-norm regularization of a low-rank factor model, exploiting the cross-sectional and time-series structure at once.

The practical payoff is robustness across regimes. Because it does not commit to one restriction, MC-NNM performs well whether the panel is wide (N >> T), tall (T >> N), or roughly square (N ~ T) – settings where the unconfoundedness (horizontal-regression), synthetic-control (vertical-regression), or DiD estimators individually degrade. Reach for it when you have a single treated unit or many, a single adoption date or staggered adoption, and you would rather regularize the latent structure than assume which comparison (units vs. periods) is the right one. The cost is interpretability: the counterfactual comes from a low-rank imputation rather than from an interpretable set of donor weights.

Do not use MCNNM when#

Notation#

The outcome panel is the \(N \times T\) matrix \(\mathbf{Y} = (Y_{it})\) for units \(i = 1, \ldots, N\) over periods \(t = 1, \ldots, T\). A treatment-indicator matrix \(\mathbf{D}\) marks treated cells; the observed (untreated) cells form the index set \(\mathcal{O}\), and \(P_{\mathcal{O}}\) is the projection that zeros out the rest (\(P_{\mathcal{O}}^{\perp}\) its complement). The untreated potential outcomes follow a low-rank component plus two-way fixed effects,

\[\mathbf{Y}(0) = \mathbf{L}^{*} + \boldsymbol{\Gamma}\mathbf{1}_T^{\top} + \mathbf{1}_N \boldsymbol{\Delta}^{\top} + \boldsymbol{\varepsilon},\]

with unit effects \(\boldsymbol{\Gamma} \in \mathbb{R}^N\), time effects \(\boldsymbol{\Delta} \in \mathbb{R}^T\), and mean-zero noise. The nuclear norm \(\|\mathbf{L}\|_{*} = \sum_i \sigma_i(\mathbf{L})\) sums the singular values. The intervention splits the panel into pre-period \(\mathcal{T}_1 = \{1, \ldots, T_0\}\) and post-period; the treatment effect on a treated cell is \(\hat\tau_{it} = Y_{it} - \widehat{Y}_{it}(0)\) and the ATT is its average over treated cells.

The estimator#

Treating the treated cells as missing, MC-NNM solves (paper Eq. 4.3)

\[(\widehat{\mathbf{L}}, \widehat{\boldsymbol{\Gamma}}, \widehat{\boldsymbol{\Delta}}) = \operatorname*{argmin}_{\mathbf{L}, \boldsymbol{\Gamma}, \boldsymbol{\Delta}} \frac{1}{|\mathcal{O}|} \bigl\| P_{\mathcal{O}}(\mathbf{Y} - \mathbf{L} - \boldsymbol{\Gamma}\mathbf{1}_T^{\top} - \mathbf{1}_N\boldsymbol{\Delta}^{\top}) \bigr\|_F^2 + \lambda \|\mathbf{L}\|_{*}.\]

Only the low-rank part \(\mathbf{L}\) is regularized; the unit/time fixed effects are estimated explicitly and left unregularized, which markedly improves the imputations (paper Section 4). The counterfactual for a treated cell is read off the completed matrix, \(\widehat{Y}_{it}(0) = \widehat{L}_{it} + \widehat\Gamma_i + \widehat\Delta_t\).
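As a rough illustration of the objective above, the sketch below evaluates it with NumPy on a toy panel. The function name mcnnm_objective and the toy data are hypothetical and are not part of mlsynth; the point is only to show how the projection onto observed cells and the nuclear-norm penalty enter.

```python
import numpy as np

def mcnnm_objective(Y, mask, L, gamma, delta, lam):
    """Masked mean squared error over observed cells plus lam * ||L||_*."""
    mask = np.asarray(mask, dtype=bool)
    fitted = L + gamma[:, None] + delta[None, :]   # L + Gamma 1_T' + 1_N Delta'
    resid = np.where(mask, Y - fitted, 0.0)        # P_O(.): zero the treated cells
    loss = np.sum(resid ** 2) / mask.sum()         # (1 / |O|) * squared Frobenius norm
    nuclear = np.linalg.svd(L, compute_uv=False).sum()  # sum of singular values
    return loss + lam * nuclear

# Toy panel (illustrative): rank-1 L plus two-way effects, no noise.
rng = np.random.default_rng(0)
u, v = rng.normal(size=6), rng.normal(size=5)
L_true = np.outer(u, v)
gamma, delta = rng.normal(size=6), rng.normal(size=5)
Y = L_true + gamma[:, None] + delta[None, :]

mask = np.ones((6, 5), dtype=int)
mask[0, 3:] = 0            # unit 0 is treated in the last two periods
Y[0, 3:] += 100.0          # treated outcomes are ignored by the objective

obj = mcnnm_objective(Y, mask, L_true, gamma, delta, lam=0.1)
print(obj, 0.1 * np.linalg.norm(u) * np.linalg.norm(v))
```

At the true parameters the data-fit term vanishes on the observed cells, so the objective reduces to the penalty; for a rank-1 matrix the nuclear norm is the product of the two vector norms.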

Algorithm (SOFT-IMPUTE)#

The problem is solved by singular-value soft-thresholding [Mazumder2010] (paper Eq. 4.4-4.5). With the shrink operator \(\mathrm{shrink}_\lambda(\mathbf{A}) = \mathbf{S}\,\widetilde{\boldsymbol{\Sigma}}\,\mathbf{R}^{\top}\) (each singular value replaced by \(\max(\sigma - \lambda, 0)\)), iterate

\[\mathbf{L}_{k+1} = \mathrm{shrink}_{\lambda|\mathcal{O}|/2} \bigl\{ P_{\mathcal{O}}(\mathbf{Y} - \widehat{\mathrm{FE}}) + P_{\mathcal{O}}^{\perp}(\mathbf{L}_k) \bigr\},\]

re-fitting the fixed effects from their first-order conditions after each step until convergence. The regularization strength \(\lambda\) is chosen by cross-validation over the observed cells (an n_lambda-point grid, n_folds folds).
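The iteration can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the mlsynth solver: the names shrink and soft_impute are hypothetical, the fixed effects are updated by one alternating pass of observed-cell means per iteration, and convergence is judged on the change in the fitted matrix.

```python
import numpy as np

def shrink(A, thr):
    """Soft-threshold the singular values of A by thr."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - thr, 0.0)) @ Vt

def soft_impute(Y, mask, lam, max_iter=500, tol=1e-8):
    """Illustrative SOFT-IMPUTE with unregularised two-way fixed effects."""
    mask = np.asarray(mask, dtype=bool)
    N, T = Y.shape
    L, gamma, delta = np.zeros((N, T)), np.zeros(N), np.zeros(T)
    thr = lam * mask.sum() / 2.0                  # threshold lambda * |O| / 2
    row_n = np.maximum(mask.sum(axis=1), 1)
    col_n = np.maximum(mask.sum(axis=0), 1)
    fitted = np.zeros((N, T))
    for _ in range(max_iter):
        # Fixed effects: means of the observed residuals, given the current L.
        gamma = np.where(mask, Y - L - delta[None, :], 0.0).sum(axis=1) / row_n
        delta = np.where(mask, Y - L - gamma[:, None], 0.0).sum(axis=0) / col_n
        fe = gamma[:, None] + delta[None, :]
        # Observed cells take Y - FE; missing cells keep the current L.
        L = shrink(np.where(mask, Y - fe, L), thr)
        new_fitted = L + fe
        converged = (np.linalg.norm(new_fitted - fitted)
                     <= tol * max(1.0, np.linalg.norm(fitted)))
        fitted = new_fitted
        if converged:
            break
    return L, gamma, delta

# Toy panel (illustrative): rank-1 structure plus two-way effects.
rng = np.random.default_rng(1)
N, T = 20, 15
Y = (np.outer(rng.normal(size=N), rng.normal(size=T))
     + rng.normal(size=N)[:, None] + rng.normal(size=T)[None, :])
mask = np.ones((N, T), dtype=bool)
mask[:3, 10:] = False                             # three units treated late

L0, g0, d0 = soft_impute(Y, mask, lam=0.0)        # no shrinkage
L_big, g_big, d_big = soft_impute(Y, mask, lam=1e6)  # everything shrunk away
```

The two extreme calls mirror the remark on regularization: with no penalty the observed cells are reproduced exactly, and with a very large penalty the low-rank part collapses to zero, leaving the fixed-effects-only fit.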

Assumptions and remarks#

Assumption 1 (low rank + two-way effects). The untreated outcomes are a low-rank matrix plus unit and time fixed effects, with mean-zero idiosyncratic noise. Remark. This is the single structural assumption that subsumes the others: a rank-\(r\) factor model nests interactive fixed effects, while the explicit two-way effects absorb additive unit/time shifts so the nuclear-norm penalty only has to recover the interaction structure.

Assumption 2 (missingness / no anticipation). Treatment makes a cell’s untreated outcome missing; the observed (untreated) cells are informative for the missing ones, and there is no anticipation. Remark. Staggered adoption produces a “staircase” of missing cells, which the mask-based completion fills directly – MC-NNM’s main advantage over fixed-rank interactive-fixed-effects methods that need a rectangular treated block.

Assumption 3 (regularization rate). \(\lambda\) shrinks at the rate the theory prescribes so the completion is consistent; in practice it is selected by cross-validation on held-out observed cells. Remark. Choosing \(\lambda\) too small overfits noise into \(\mathbf{L}\); too large over-shrinks and biases the imputed counterfactual toward the fixed-effects-only fit.

Assumption 4 (jackknife inference). The leave-one-control refits are approximately exchangeable, so their dispersion estimates the ATT’s sampling variability. Remark. MC-NNM has no closed-form standard error; mlsynth follows the matrix-completion literature in using a jackknife (see Inference).

Causal use and staggered adoption#

MCNNM marks the treated post-treatment cells as missing, imputes their untreated outcomes, and forms \(\hat\tau_{it} = Y_{it} - \widehat{Y}_{it}(0)\), aggregated to the ATT over treated cells. Adoption times are detected with mlsynth.utils.datautils.dataprep(), and the result exposes two staggered-aware aggregations beyond the overall ATT:

  • cohort_att: {adoption_time: ATT} for each adoption cohort.

  • event_study: {relative_time: average effect}, re-centred on each unit’s own adoption date (negative keys are pre-adoption fit checks, ~0; non-negative keys are the dynamic effects).

With multiple adoption times, display_graphs=True draws an event-study plot (effect vs. time-since-adoption) rather than a single calendar trajectory, so cohorts at different event times are not blended.
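To make the two aggregations concrete, here is a minimal sketch that computes them from an outcome matrix and a completed counterfactual. The helper staggered_aggregations is hypothetical, and it keys cohorts by column index rather than by the period labels that mlsynth reports; how the library computes these internally is not shown in this page.

```python
import numpy as np

def staggered_aggregations(Y, counterfactual, adoption_col):
    """adoption_col maps a treated unit's row index to its first treated column."""
    gap = Y - counterfactual
    cohort, event = {}, {}
    for i, t0 in adoption_col.items():
        cohort.setdefault(t0, []).extend(gap[i, t0:])       # post-adoption cells only
        for t in range(gap.shape[1]):
            event.setdefault(t - t0, []).append(gap[i, t])  # re-centre on own adoption
    cohort_att = {k: float(np.mean(v)) for k, v in sorted(cohort.items())}
    event_study = {k: float(np.mean(v)) for k, v in sorted(event.items())}
    return cohort_att, event_study

# Toy panel: unit 0 adopts at column 2, unit 1 at column 3, unit 2 is a control.
Y = np.array([[0., 0., 1., 2., 3.],
              [0., 0., 0., 1., 2.],
              [0., 0., 0., 0., 0.]])
counterfactual = np.zeros_like(Y)
cohort_att, event_study = staggered_aggregations(Y, counterfactual, {0: 2, 1: 3})
print(cohort_att)   # {2: 2.0, 3: 1.5}
print(event_study)  # {-3: 0.0, -2: 0.0, -1: 0.0, 0: 1.0, 1: 2.0, 2: 3.0}
```

In this toy case both units share the same dynamic path (1, 2, 3 after adoption), which the event-time average recovers; a calendar-time average at column 3 would instead blend the second period of one cohort with the first period of the other.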

Inference#

Setting inference=True runs a leave-one-control jackknife for the ATT: drop one control unit, refit at the cross-validation-selected \(\lambda\), and recompute the ATT. With \(q\) control units, leave-one-out estimates \(\hat\tau_{(j)}\), and their mean \(\bar\tau\), the variance estimate is \(\widehat{\mathrm{se}}^2 = \tfrac{q-1}{q}\sum_{j=1}^{q}(\hat\tau_{(j)} - \bar\tau)^2\), which feeds a Wald interval at level alpha. This is a standard inference approach for matrix-completion estimators, which have no analytic SE, and it captures donor-pool uncertainty. The interval is returned on MCNNMInference (se, ci).
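The arithmetic of the jackknife step can be sketched as follows. The function jackknife_se_ci is hypothetical, and two details are assumptions rather than documented behaviour: the interval uses a standard-normal critical value, and it is centred on the full-sample ATT.

```python
import numpy as np
from statistics import NormalDist

def jackknife_se_ci(att, att_loo, alpha=0.05):
    """Jackknife SE from leave-one-control ATTs, plus a Wald interval."""
    att_loo = np.asarray(att_loo, dtype=float)
    q = att_loo.size                                   # number of control deletions
    se = np.sqrt((q - 1) / q * np.sum((att_loo - att_loo.mean()) ** 2))
    z = NormalDist().inv_cdf(1 - alpha / 2)            # assumed normal critical value
    return se, (att - z * se, att + z * se)

# Illustrative numbers only, not estimates from any data set.
se, (lo, hi) = jackknife_se_ci(att=2.0, att_loo=[1.0, 2.0, 3.0])
print(se, lo, hi)
```

With three leave-one-out estimates of 1, 2 and 3, the squared deviations sum to 2, so the variance estimate is (2/3) * 2 = 4/3.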

Example#

Proposition 99 – California’s 1988 tobacco-control program. MC-NNM imputes California’s post-1988 counterfactual per-capita cigarette sales by matrix completion and reports the ATT; with display_graphs=True it draws the observed-vs-counterfactual chart.

import pandas as pd
from mlsynth import MCNNM

url = ("https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
       "refs/heads/main/basedata/smoking_data.csv")
df = pd.read_csv(url)

res = MCNNM({
    "df": df, "outcome": "cigsale", "treat": "Proposition 99",
    "unitid": "state", "time": "year",
    "inference": True,          # jackknife ATT SE / CI
    "display_graphs": True,     # observed vs MC-NNM counterfactual
}).fit()

print(f"ATT (avg 1989-2000) = {res.att:+.2f} packs/capita")
lo, hi = res.inference.ci
print(f"jackknife 95% CI    = [{lo:+.2f}, {hi:+.2f}]")
print(f"gap by 2000         = {res.att_by_period[2000]:+.2f}")
print(f"selected lambda     = {res.best_lambda:.2f}")

Verification#

Note

Empirical (Proposition 99). MC-NNM achieves a near-exact pre-treatment fit for California and an ATT of about \(-20\) packs per capita, with the gap widening to roughly \(-30\) by 2000 – consistent with Abadie, Diamond & Hainmueller [ABADIE2010] and with the synthetic-control / SNN estimates elsewhere in mlsynth. The jackknife confidence interval excludes zero.

Regime robustness. Because MC-NNM regularizes rather than restricts, the same estimator runs unchanged on wide (N >> T), tall (T >> N) and square panels and on staggered adoption, where the unconfoundedness / SC / DiD special cases (paper Theorem 1) individually break down.

Core API#

MCNNM: Matrix Completion with Nuclear Norm Minimization (Athey et al. 2021).

Athey, S., Bayati, M., Doudchenko, N., Imbens, G. & Khosravi, K. (2021). “Matrix Completion Methods for Causal Panel Data Models.” Journal of the American Statistical Association 116(536):1716-1730.

MC-NNM estimates causal effects in panel data by treating the treated unit/period cells as missing entries of the outcome matrix and imputing them via low-rank matrix completion. It models the untreated-outcome matrix as a low-rank component plus two-way (unit and time) fixed effects,

\[(\widehat L, \widehat\Gamma, \widehat\Delta) = \arg\min_{L, \Gamma, \Delta} \tfrac{1}{|\mathcal{O}|} \| P_\mathcal{O}(Y - L - \Gamma 1_T^\top - 1_N \Delta^\top)\|_F^2 + \lambda \|L\|_*,\]

regularising only the low-rank part \(L\) via its nuclear norm (the sum of singular values). The fixed effects are estimated explicitly and left unregularised, which substantially improves imputation. The problem is solved by the SOFT-IMPUTE iteration (singular-value soft-thresholding) with the regularisation strength chosen by cross-validation over the observed cells.

MC-NNM nests the unconfoundedness, synthetic-control, and difference-in-differences estimators (paper Theorem 1): all minimise the same objective and differ only in the restrictions/regularisation they impose. By regularising rather than imposing hard restrictions, MC-NNM performs well whether N >> T, T >> N, or N ~ T – regimes where the unconfoundedness or synthetic-control approaches individually break down.

This estimator targets the block / staggered-adoption causal setting: control units and treated units’ pre-treatment periods are the observed entries; treated post-treatment cells are imputed, and the treatment effect is the observed outcome minus the imputed counterfactual.

class mlsynth.estimators.mcnnm.MCNNM(config: MCNNMConfig | dict)#

Bases: object

Matrix Completion with Nuclear Norm Minimization estimator.

Parameters:

config (MCNNMConfig or dict) – Configuration object. See mlsynth.config_models.MCNNMConfig.

fit() → MCNNMResults#

Run MC-NNM and return MCNNMResults.

Configuration#

class mlsynth.config_models.MCNNMConfig(*, df: DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', estimate_unit_fe: bool = True, estimate_time_fe: bool = True, n_lambda: Annotated[int, Ge(ge=2)] = 40, n_folds: Annotated[int, Ge(ge=2)] = 5, inference: bool = False, alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05, random_state: int = 0)#

Configuration for the MC-NNM estimator.

Athey, Bayati, Doudchenko, Imbens & Khosravi (2021), “Matrix Completion Methods for Causal Panel Data Models” (JASA). Imputes the treated cells of the outcome matrix via nuclear-norm-regularised low-rank matrix completion with unregularised two-way fixed effects (SOFT-IMPUTE, threshold chosen by cross-validation). Inherits the standard df / outcome / treat / unitid / time interface.

Parameters:
  • estimate_unit_fe (bool) – Estimate (unregularised) unit fixed effects. Default True.

  • estimate_time_fe (bool) – Estimate (unregularised) time fixed effects. Default True.

  • n_lambda (int) – Number of candidate singular-value thresholds in the CV grid (at least 2). Default 40.

  • n_folds (int) – Cross-validation folds over the observed cells (at least 2). Default 5.

  • inference (bool) – Run a leave-one-control jackknife for the ATT SE / CI. Default False (it refits the model once per control unit).

  • alpha (float) – Two-sided level for the jackknife confidence interval, strictly between 0 and 1. Default 0.05.

  • random_state (int) – Seed for the CV fold assignment. Default 0.

alpha: float#
estimate_time_fe: bool#
estimate_unit_fe: bool#
inference: bool#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_folds: int#
n_lambda: int#
random_state: int#

Result Containers#

MCNNM.fit() returns a MCNNMResults: the ATT, the completed counterfactual matrix and per-cell effects, att_by_period, cohort_att and event_study aggregations, the low-rank factors (unit_factors, time_factors, singular_values, rank), the fixed effects, the CV-selected best_lambda, implied (non-unique) donor weights, and – when requested – the MCNNMInference jackknife.
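The relationship between the low-rank fields can be illustrated with a plain SVD. This is a sketch under the conventions documented for these fields (rank counted relative to the largest singular value, factors scaled by the square root of the singular values); the helper factorize is hypothetical and is not the library routine.

```python
import numpy as np

def factorize(L, rel_tol=1e-6):
    """Split L into unit and time factors whose product reproduces L."""
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    rank = int(np.sum(s > rel_tol * s[0])) if s[0] > 0 else 0
    root = np.sqrt(s[:rank])
    unit_factors = U[:, :rank] * root        # U Sigma^{1/2}, shape (N, rank)
    time_factors = Vt[:rank].T * root        # V Sigma^{1/2}, shape (T, rank)
    return unit_factors, time_factors, s, rank

# Toy rank-2 matrix (illustrative).
rng = np.random.default_rng(0)
L = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
unit_factors, time_factors, singular_values, rank = factorize(L)
print(rank, np.allclose(unit_factors @ time_factors.T, L))
```

Splitting the singular values evenly between the two sides is one common convention; any invertible rotation of the factors gives the same L, so the factors are identified only up to that rotation.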

Frozen dataclasses for the MC-NNM estimator.

Athey, Bayati, Doudchenko, Imbens & Khosravi (2021), Matrix Completion Methods for Causal Panel Data Models (JASA). MC-NNM imputes the missing (treated) entries of the outcome matrix by nuclear-norm-regularised low-rank matrix completion with unregularised two-way fixed effects, then forms treatment effects as observed minus imputed.

class mlsynth.utils.mcnnm_helpers.structures.MCNNMInference(method: str, se: float, ci: tuple, alpha_level: float, n_jackknife: int)#

Bases: object

Leave-one-control jackknife inference for the MC-NNM ATT.

method#

"jackknife".

Type:

str

se#
Type:

float

ci#
Type:

tuple of float

alpha_level#
Type:

float

n_jackknife#
Type:

int

alpha_level: float#
ci: tuple#
method: str#
n_jackknife: int#
se: float#
class mlsynth.utils.mcnnm_helpers.structures.MCNNMInputs(Y: ndarray, mask: ndarray, D: ndarray, treated_idx: ndarray, T0: int, unit_names: List[Any], time_labels: ndarray)#

Bases: object

Preprocessed panel for MC-NNM.

Y#

Observed outcomes, shape (N, T).

Type:

np.ndarray

mask#

Observation indicator, shape (N, T); 1 observed (control / pre-treatment), 0 missing (treated post-treatment).

Type:

np.ndarray

D#

Treatment indicators, shape (N, T) (1 where treated).

Type:

np.ndarray

treated_idx#

Indices of ever-treated units.

Type:

np.ndarray

T0#

First treated period.

Type:

int

unit_names#
Type:

list

time_labels#
Type:

np.ndarray

D: ndarray#
property N: int#
property T: int#
T0: int#
Y: ndarray#
mask: ndarray#
time_labels: ndarray#
treated_idx: ndarray#
unit_names: List[Any]#
class mlsynth.utils.mcnnm_helpers.structures.MCNNMResults(inputs: MCNNMInputs, att: float, counterfactual: ndarray, effects: ndarray, att_by_period: Dict[Any, float], cohort_att: Dict[Any, float], event_study: Dict[int, float], L: ndarray, gamma: ndarray, delta: ndarray, best_lambda: float, rank: int, unit_factors: ndarray | None = None, time_factors: ndarray | None = None, singular_values: ndarray | None = None, weights: Any | None = None, inference: MCNNMInference | None = None, metadata: Dict[str, Any] = <factory>)#

Bases: object

Top-level container returned by mlsynth.MCNNM.fit().

inputs#
Type:

MCNNMInputs

att#

Average treatment effect on the treated (observed minus imputed over the treated cells).

Type:

float

counterfactual#

Full fitted matrix L + Gamma + Delta, shape (N, T); on treated cells this is the imputed untreated potential outcome.

Type:

np.ndarray

effects#

Per-cell effects (observed minus imputed) on treated cells; NaN elsewhere, shape (N, T).

Type:

np.ndarray

att_by_period#

{period_label: mean effect across treated units} post-treatment (calendar time – pools cohorts at each period).

Type:

dict

cohort_att#

{adoption_time_label: mean ATT for that adoption cohort} – the cohort-specific effects under staggered adoption.

Type:

dict

event_study#

{relative_time: mean effect across treated cells at that event time} where relative time is period - adoption period for each treated unit. Negative keys are pre-adoption (a placebo / fit-quality check, ~0); non-negative keys are the dynamic treatment effects.

Type:

dict

L#

Estimated low-rank matrix, shape (N, T).

Type:

np.ndarray

gamma#

Unit fixed effects, shape (N,).

Type:

np.ndarray

delta#

Time fixed effects, shape (T,).

Type:

np.ndarray

best_lambda#

Cross-validation-selected singular-value threshold.

Type:

float

rank#

Numerical rank of L: the number of singular values exceeding 1e-6 times the largest one.

Type:

int

unit_factors#

Unit loadings \(U \Sigma^{1/2}\) from the SVD of L, shape (N, rank): each unit’s position in the latent factor space.

Type:

np.ndarray

time_factors#

Time factors \(V \Sigma^{1/2}\), shape (T, rank).

Type:

np.ndarray

singular_values#

Singular values of L (the nuclear-norm spectrum).

Type:

np.ndarray

weights#

Implied per-treated-unit donor weights, obtained by projecting the treated unit’s low-rank row onto the control rows. MC-NNM is a factorisation (not a weighting) estimator, so these are a derived, non-unique diagnostic – flagged as such in summary_stats.

Type:

WeightsResults, optional

inference#

MCNNMInference when inference=True; else None.

Type:

MCNNMInference, optional

metadata#
Type:

dict

L: ndarray#
att: float#
att_by_period: Dict[Any, float]#
best_lambda: float#
cohort_att: Dict[Any, float]#
counterfactual: ndarray#
delta: ndarray#
effects: ndarray#
event_study: Dict[int, float]#
gamma: ndarray#
inference: MCNNMInference | None = None#
inputs: MCNNMInputs#
metadata: Dict[str, Any]#
rank: int#
singular_values: ndarray | None = None#
time_factors: ndarray | None = None#
unit_factors: ndarray | None = None#
weights: Any | None = None#
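The description of weights as a projection of the treated unit’s low-rank row onto the control rows can be sketched with ordinary least squares. This is only an illustration: the helper implied_donor_weights is hypothetical, and the choice of the minimum-norm solution (and the rcond cutoff) is an assumption, since the page does not say how mlsynth resolves the non-uniqueness.

```python
import numpy as np

def implied_donor_weights(L, treated_row, control_rows):
    """Least-squares weights w such that w @ L[control_rows] ~ L[treated_row]."""
    A = L[control_rows].T                      # (T, n_controls)
    b = L[treated_row]                         # (T,)
    # lstsq returns the minimum-norm solution when A is rank deficient.
    w, *_ = np.linalg.lstsq(A, b, rcond=1e-10)
    return w

# Toy rank-1 L: the treated row is twice a common time profile,
# and both control rows equal that profile.
profile = np.array([1.0, 2.0, 3.0, 4.0])
L = np.vstack([2.0 * profile, profile, profile])
w = implied_donor_weights(L, treated_row=0, control_rows=[1, 2])
print(w)
```

Here any weights summing to 2 reproduce the treated row exactly, which is the non-uniqueness the attribute description warns about; the minimum-norm choice splits the weight evenly.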

Helper Modules#

Data preparation – the DataFrame touchpoint: pivots to the outcome matrix and the observed/treated masks.

Panel ingestion for the MC-NNM estimator.

mlsynth.utils.mcnnm_helpers.setup.prepare_mcnnm_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str) → MCNNMInputs#

Pivot a long panel into MCNNMInputs.

Observed entries (mask == 1) are the control units and the treated units’ pre-treatment periods; the treated post-treatment cells are the missing entries MC-NNM imputes.
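A minimal sketch of the pivot step with pandas, assuming a balanced long panel and a 0/1 treatment column. The function panel_to_matrices and the toy column names are hypothetical; prepare_mcnnm_inputs itself may handle validation and labelling differently.

```python
import numpy as np
import pandas as pd

def panel_to_matrices(df, outcome, treat, unitid, time):
    """Pivot a long panel into (N, T) outcome, mask and treatment matrices."""
    Y = df.pivot(index=unitid, columns=time, values=outcome)
    D = df.pivot(index=unitid, columns=time, values=treat)
    D_mat = D.to_numpy().astype(int)           # 1 where treated
    mask = 1 - D_mat                           # 1 observed, 0 to be imputed
    return Y.to_numpy(dtype=float), mask, D_mat, list(Y.index), Y.columns.to_numpy()

# Toy long panel: unit "A" is treated in period 3, unit "B" never.
df = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "B"],
    "year": [1, 2, 3, 1, 2, 3],
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "d": [0, 0, 1, 0, 0, 0],
})
Y, mask, D, unit_names, time_labels = panel_to_matrices(df, "y", "d", "state", "year")
print(Y)
print(mask)
```

The mask is simply the complement of the treatment indicator, so control units and pre-treatment periods of treated units are the observed entries.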

The SOFT-IMPUTE completion solver and the cross-validation over \(\lambda\).

Core MC-NNM engine: soft-impute with unregularised fixed effects.

Athey, S., Bayati, M., Doudchenko, N., Imbens, G. & Khosravi, K. (2021). “Matrix Completion Methods for Causal Panel Data Models.” Journal of the American Statistical Association 116(536):1716-1730.

The model is \(Y = L^* + \Gamma 1_T^\top + 1_N \Delta^\top + \varepsilon\), and the estimator (paper eq. 4.3)

\[(\widehat L, \widehat\Gamma, \widehat\Delta) = \arg\min_{L, \Gamma, \Delta} \frac{1}{|\mathcal{O}|} \bigl\| P_\mathcal{O}(Y - L - \Gamma 1_T^\top - 1_N \Delta^\top) \bigr\|_F^2 + \lambda \|L\|_*,\]

regularises only the low-rank part \(L\) (the unit/time fixed effects \(\Gamma, \Delta\) are estimated explicitly, unregularised, to reduce bias). It is solved by the SOFT-IMPUTE iteration (eq. 4.4-4.5): soft-threshold the singular values of the filled-in matrix, then re-fit the fixed effects, until convergence.

mlsynth.utils.mcnnm_helpers.completion.mcnnm_cv(Y: ndarray, mask: ndarray, *, est_u: bool = True, est_v: bool = True, n_lam: int = 40, n_folds: int = 5, max_iter: int = 400, tol: float = 1e-05, random_state: int = 0) → dict#

Select the threshold by K-fold cross-validation over observed cells.

For each candidate threshold, a fraction of observed entries is held out, the model is fit on the rest, and out-of-sample squared error is averaged over folds; the threshold minimising it is chosen, then the final fit uses all observed entries (paper “Cross-validation”).
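The cross-validation loop described above can be sketched as follows. To stay short, this sketch uses a plain SOFT-IMPUTE without fixed effects and a fixed number of iterations; the names soft_impute and cv_threshold, the fold assignment, and the threshold grid are all illustrative assumptions rather than the behaviour of mcnnm_cv.

```python
import numpy as np

def soft_impute(Y, mask, thr, n_iter=200):
    """Plain SOFT-IMPUTE (no fixed effects) at singular-value threshold thr."""
    L = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(np.where(mask, Y, L), full_matrices=False)
        L = (U * np.maximum(s - thr, 0.0)) @ Vt
    return L

def cv_threshold(Y, mask, thresholds, n_folds=5, seed=0):
    """Pick the threshold with the lowest held-out squared error."""
    mask = np.asarray(mask, dtype=bool)
    obs = np.argwhere(mask)                            # (row, col) of observed cells
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(obs)) % n_folds        # balanced random fold labels
    mse = np.zeros(len(thresholds))
    for k in range(n_folds):
        held = obs[folds == k]
        train = mask.copy()
        train[held[:, 0], held[:, 1]] = False          # hide the held-out cells
        for j, thr in enumerate(thresholds):
            L = soft_impute(Y, train, thr)
            err = Y[held[:, 0], held[:, 1]] - L[held[:, 0], held[:, 1]]
            mse[j] += np.mean(err ** 2) / n_folds      # average over folds
    return thresholds[int(np.argmin(mse))], mse

# Toy panel (illustrative): noisy rank-1 matrix with two treated cells.
rng = np.random.default_rng(0)
Y = np.outer(rng.normal(size=8), rng.normal(size=6)) + 0.1 * rng.normal(size=(8, 6))
mask = np.ones((8, 6), dtype=bool)
mask[0, 4:] = False
thresholds = np.geomspace(0.01, 5.0, 5)
best, mse = cv_threshold(Y, mask, thresholds, n_folds=4)
print(best, mse)
```

Only observed cells are ever held out, so the treated cells never enter the error being minimised; after selection, a final fit on all observed cells at the chosen threshold would produce the imputations.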

mlsynth.utils.mcnnm_helpers.completion.mcnnm_fit(Y: ndarray, mask: ndarray, thr: float, *, est_u: bool = True, est_v: bool = True, max_iter: int = 400, tol: float = 1e-05) → dict#

Fit MC-NNM for a given SVD threshold thr.

Parameters:
  • Y (np.ndarray) – Outcome matrix, shape (N, T). Missing entries may hold any value; only observed (mask == 1) entries are used.

  • mask (np.ndarray) – Observation indicator, shape (N, T); 1 observed, 0 missing (to be imputed).

  • thr (float) – Singular-value soft-threshold (the regularisation strength).

  • est_u, est_v (bool) – Estimate unit / time fixed effects.

Returns:

  • dict with L (low-rank matrix), gamma (N,), delta (T,), and completed = L + gamma + delta (the full fitted matrix).

Run loop: completion, ATT and staggered aggregations, factor decomposition, and the jackknife.

Orchestration for the MC-NNM estimator (Athey et al. 2021).

mlsynth.utils.mcnnm_helpers.pipeline.run_mcnnm(inputs: MCNNMInputs, *, est_u: bool = True, est_v: bool = True, n_lam: int = 40, n_folds: int = 5, max_iter: int = 400, tol: float = 1e-05, inference: bool = False, alpha_level: float = 0.05, random_state: int = 0, adoption_times: dict | None = None) → MCNNMResults#

Run MC-NNM (CV over the threshold) and assemble MCNNMResults.

Parameters:
  • inputs (MCNNMInputs)

  • est_u, est_v (bool) – Estimate unit / time fixed effects (recommended; default True).

  • n_lam (int) – Number of candidate thresholds in the CV grid.

  • n_folds (int) – Cross-validation folds over observed cells.

  • inference (bool) – If True, run a leave-one-control jackknife (at the CV-selected threshold) for the ATT SE / CI.