Nonlinear Synthetic Control (NSC)#
Overview#
NSC implements Tian (2023), “The Synthetic Control Method with Nonlinear Outcomes: Estimating the Impact of the 2019 Anti-Extradition Law Amendments Bill Protests on Hong Kong’s Economy” (arXiv:2306.01967). NSC generalises the canonical Abadie-Diamond-Hainmueller (2010) synthetic-control method to panel-data settings where the untreated potential outcome is a nonlinear function of the underlying predictors.
Three structural changes versus canonical SC:
Drops the non-negativity restriction on donor weights — only the adding-up constraint \(\sum_j w_j = 1\) remains. The resulting affine-weight SC widens the set of treated units the method can handle, since it no longer requires the treated unit to sit inside the convex hull of the donors.
Adds an elastic-net penalty on the weights, with the L1 term weighted by the pairwise pretreatment matching discrepancies between the treated unit and each donor. This biases the estimator towards near-neighbour matching when the outcome is highly nonlinear and towards spread-out weights when it is more linear.
Eigenvalue-scales the tuning parameters so the dimensionless \((a^*, b^*) \in [0, 1]\) admit a coarse cross-validation grid. The paper recommends grid size 0.1 with coordinate-descent convergence.
Inference defaults to the Doudchenko-Imbens (2017) variance estimator: for every period the variance of the gap is approximated by the MSE of predicting each donor’s outcome from the other donors under the same \((a^*, b^*)\) regime. Per-period and ATT normal-based confidence intervals are returned.
Mathematical Formulation#
Setup#
For each unit \(i\) we observe an outcome \(Y_{it}\) and a
(K \times 1) vector of pretreatment matching variables
\(Z_i = [X_i; Y_{i,1}, \dots, Y_{i,T_0}]'\). Unit 1 receives
treatment at \(T_0 + 1\) and remains treated thereafter. The
untreated potential outcome follows the interactive fixed-effects
model
with \(F(\cdot)\) a strictly monotonic link function and a smooth conditional-expectation function \(G(\cdot) = E_\varepsilon[F(\cdot)]\).
NSC weight problem#
The weights solve (Tian 2023, eq. 7)
Pretreatment fit term (first norm) – minimised by ordinary affine SC weights.
L1 penalty \(a \sum_j |w_j| \, \|Z_1 - Z_j\|\) – pushes weight onto donors that are close to the treated unit in the matching variables. As \(a \to \infty\) it collapses to the one-nearest-neighbour estimator.
L2 penalty \(b \sum_j w_j^2\) – spreads the weights. As \(b \to \infty\) the weights become uniform and the estimator collapses to a difference-in-differences.
Eigenvalue scaling#
The paper scales the raw multipliers so the dimensionless tuning parameters live on \([0, 1]\). With \(n = \min(J, K)\) and \(\lambda_1 \le \dots \le \lambda_n\) the sorted non-zero eigenvalues of \(Z_0 Z_0'\),
where \(\tilde \lambda_*\) come from the eigenvalues of \(Z_0 Z_0' + \mathrm{diag}(b)\). At \(a^* = 1\) only the nearest neighbour receives weight; at \(b^* = 1\) the weights are roughly uniform.
Cross-validation#
R-faithful (NSC.R, fn_cv): for each donor \(j\), fit NSC
weights on its pre-period matching vector using the other
\(J - 1\) donors plus one randomly drawn extra donor (the pool
size stays at \(J\) so the eigenvalue scaling of
\((a^*, b^*)\) is comparable to the final treated-unit fit).
The score is the post-period MSPE of donor \(j\)’s held-out
outcomes against that weighted combination. The (a, b) selected
minimises the average held-out MSPE across donors. Coordinate
descent:
Initialise \(b^* = 0\).
Sweep \(a^*\) over the grid; pick the minimiser.
Hold \(a^*\); sweep \(b^*\); pick the minimiser.
Iterate until \((a^*, b^*)\) stops moving.
The extra-donor draw uses seed (default 123 to match the
reference R script’s set.seed(123)). A previous in-sample
“controls” target (predicting the donor’s pre-period from the
others, i.e. evaluating on the same data the fit used) was removed
in favour of this proper held-out score; the legacy
cv_target = "treated" option no longer validates.
Inference#
Doudchenko-Imbens (2017): for each period \(t\) the variance of the gap \(\hat\tau_t = Y_{1, t} - Y_{1, t}^{\text{SC}}\) is estimated by averaging the squared leave-one-control prediction residuals at that period. Normal-based CIs follow:
A two-sided z-test of \(H_0: \text{ATT} = 0\) reports the p-value.
Assumptions (Tian 2023)#
NSC inherits the interactive-fixed-effects (IFE) model of Abadie-Diamond-Hainmueller (2010) and adds the structural conditions needed for the synthetic-control bias to vanish under a nonlinear outcome. The paper’s formal assumptions:
A1 (IFE for the untreated latent). Each unit’s untreated latent outcome is \(Y_{it}^{0*} = X_i' \beta_t + \mu_i' \lambda_t + \varepsilon_{it}\), where \(X_i\) are observed predictors, \(\mu_i\) unobserved loadings, and \(\varepsilon_{it}\) an idiosyncratic shock.
A2 (transitory shocks). Shocks \(\varepsilon_{it}\) are independent across \(i, t\), have zero conditional mean given \((X_j, \mu_j, D_{js})\) for all \(j, s\), and bounded \(p\)-th moments for some even \(p \ge 2\).
A4 (factor non-degeneracy). The smallest eigenvalue of \(T_0^{-1} \sum_t \lambda_t \lambda_t'\) is bounded away from zero. Translation: the unobserved factors carry persistent identifying variation across the pre-period.
A5 (overlap / continuous support). The composite predictor \(H = [X', \mu']\) has a density bounded away from zero on a compact convex support, units are iid draws from that distribution, and the treatment-assignment probability is non-degenerate. Practical reading: for almost every value of the treated unit’s predictors there is a control with near-identical predictors in the population – with enough donors, some of those near-twins land in the sample.
A6 (smooth nonlinear link). The observed outcome is \(Y_{it}^0 = F(X_i' \beta_t + \mu_i' \lambda_t + \varepsilon_{it})\) with \(F\) strictly monotonic and \(G(\cdot) = \mathbb{E}_\varepsilon[F(\cdot)]\) smooth. The paper does not require the analyst to know \(F\) or \(G\) – only that they exist with these regularity properties.
A7 (nearest-:math:`M` weighting). The synthetic control uses only the \(M > k + T_0\) nearest neighbours (in observed predictors and pre-period outcomes) of the treated unit; donors outside that neighbourhood get weight zero.
Under A2, A4, A5, A6, A7 the NSC estimator is asymptotically unbiased as \(T_0 \to \infty\) provided the donor pool grows super-polynomially in \(T_0\) (Theorem 2: \(J = O(T_0^{b(T_0)})\) with \(b'(\cdot) > 0\)). The fast growth is what makes Assumption 5’s near-twins available in finite samples, so the local-neighbourhood matching in A7 is non-empty.
When the assumptions bind: practical diagnostics#
IFE / smooth-link DGP for the untreated outcome (A1, A6). The untreated potential outcome is generated by a smooth monotone function of a low-rank factor model.
Plausibly violated when the outcome is genuinely discrete (binary, ordinal, low-count) –
Fis then non-monotonic between intervals, A6 is violated, and matching on observed pre-period outcomes no longer implies matching on the latent predictors (paper’s Remark 8). Diagnostic: histogram the outcome; if it heaps on a small number of values, switch to a discrete-choice or DSC-style framework (Distributional Synthetic Control (DSC) for the distributional question).Convex-hull / near-twin availability in the sample (A5). NSC dropped the non-negativity restriction precisely because affine weights tolerate the treated unit lying outside the donor convex hull – nearest neighbours can be reached with negative weights on far donors. But identification still requires the population density to put mass near the treated unit’s predictors.
Plausibly violated when the treated unit is qualitatively different from every potential donor – a coastal mega-state against only interior small states, a tech-led economy with only commodity-led donors. Diagnostic: read
res.pre_rmse; a large pre-period RMSE despite the eigenvalue-scaled L1/L2 freedom signals no near-twin in the sample. Increasing the donor pool is the only fix – NSC does not invent population mass that is not there.Donor pool size grows with \(T_0\) (Theorem 2). Bias vanishes asymptotically only if \(J\) grows super-polynomially in \(T_0\). In finite samples, the actionable reading is: more donors are unambiguously better in the nonlinear case because they make the nearest-\(M\) matching tighter (eq. 5).
Plausibly violated when \(J\) is small relative to \(T_0\) – a country-level study with a dozen donors and decades of pre-period. Diagnostic: re-run NSC after dropping the bottom-half of donors by closeness; if ATT shifts substantially, the small donor pool was binding. Cross-check against the Monte Carlo table below, which shows bias falling almost in half when \(J\) doubles from 25 to 50.
Pre-period factor non-degeneracy (A4). The time factors \(\lambda_t\) must move enough across the pre-period for the unobserved loadings to be identified.
Plausibly violated when the pre-period is flat or co-trending across all units (e.g. all donors follow the same secular trend with no idiosyncratic shocks). Diagnostic: an SVD on the donor pre-matrix should show a clear spectrum, not a single dominant component drowning out the rest.
Outcome is monotone in the latent index (A6). The link function \(F\) is strictly increasing in its argument. Many transformations satisfy this (logistic, square, log) but capped or saturating outcomes do not (e.g. an outcome with a true ceiling that some donors but not the treated unit hit, making the same latent value produce different observed outcomes).
Plausibly violated when the outcome has a hard floor or ceiling that some donors are pressed against. Diagnostic: inspect the empirical distribution of donor outcomes against its theoretical bounds; if a non-trivial fraction sit at the boundary, A6 fails for those donors and they should be dropped or re-coded.
Doudchenko-Imbens variance reflects a normal limit. The closed-form CI assumes the gap is approximately normal with the leave-one-control variance estimate. With very small \(J\) or heavy-tailed outcomes the normal approximation breaks.
Plausibly violated when \(J < 10\) or the donor outcomes are heavy-tailed. Diagnostic: bootstrap the post-period gap by resampling donors; if the bootstrap distribution is visibly skewed or shows heavy tails, treat the closed-form CI as a guide and report the bootstrap CI alongside.
When to use NSC – and when not to#
Reach for NSC when:
The outcome is a smooth nonlinear function of latent predictors – bounded growth metrics, log-GDP, count outcomes modelled continuously, share variables on \((0, 1)\).
The treated unit sits at or near the boundary of the donor predictor distribution – canonical SC’s non-negativity restriction would force interpolation bias from far donors, and affine weights with the L1 closeness penalty can reach the treated unit by extrapolating through the closest neighbours.
You have a large donor pool relative to the pre-period (\(J \gg T_0\)) so that nearest-\(M\) matching is sharp.
You’re willing to trade interpretability for bias: NSC returns weights that can be negative, which means “subtract some donors”, a step beyond the canonical SC story of “a convex combination of similar units”.
Do not use NSC when:
Outcome is binary, ordinal, or low-count. A6 requires a strictly monotonic link \(F\); discrete outcomes need discrete-choice or distributional methods (Distributional Synthetic Control (DSC)) instead.
You need genuinely sparse, non-negative weights. NSC’s whole premise is dropping non-negativity. If the policy story is “California is a convex combination of these four states”, use canonical SCM or Two-Step Synthetic Control – NSC’s negative weights are an identification gain but a rhetorical loss.
Very small donor pool (\(J \le 10\) or so). Theorem 2 fails, the leave-one-control inference becomes unstable, and the L1-anchored nearest-neighbour story degenerates.
Outcome with hard floors or ceilings binding for some donors. Bounded growth where multiple donors are pinned at the boundary violates A6 for those donors. Drop or re-code them, or move to a censored-regression-aware estimator.
Pre-period too short to identify the factor structure. With \(T_0 \lesssim r\) (factor count), A4 is at the edge and the cross-validation grid is choosing essentially noise. Use Forward Difference-in-Differences (FDID) (which is designed for short panels) or Factor Model Approach (FMA) (which jointly estimates a low-rank factor model).
You suspect interference / spillovers across units. NSC inherits SUTVA from canonical SC; a treated unit influencing donors breaks A2’s mean-independence. Switch to Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD).
Continuous treatment. NSC encodes a single binary intervention. Continuous dose belongs in Continuous-Treatment Synthetic Control (CTSC).
Monte Carlo: NSC vs OSC bias under nonlinearity#
The paper’s Section 4 simulation: a rank-2 IFE latent process rescaled to \([0, 1]\) and raised to the power \(r\) (\(r = 1\) linear; \(r = 2\) nonlinear), with a small post-period treatment effect grid. The table below reproduces the qualitative finding – NSC beats OSC (the non-negativity-restricted original) on bias in the nonlinear regime, and matches it in the linear one.
"""Tian (2023) Section 4 DGP: NSC vs OSC bias as nonlinearity rises.
This runs a handful of replications (per (J, T0, r) cell). The full
paper uses 5000 reps; this snippet keeps it tractable while still
reproducing the qualitative ordering."""
import numpy as np
import pandas as pd
from mlsynth import NSC
def gen_panel(J, T0, T_post, r, seed):
rng = np.random.default_rng(seed)
T = T0 + T_post
X = rng.uniform(0.0, 2 * np.sqrt(3.0), size=(J + 1, 2))
mu = rng.uniform(0.0, 2 * np.sqrt(3.0), size=(J + 1, 4))
beta_t = rng.normal(10.0, 1.0, size=(T, 2))
lam_t = rng.normal(10.0, 1.0, size=(T, 4))
eps = rng.normal(0.0, 1.0, size=(T, J + 1))
Y_star = (X @ beta_t.T).T + (mu @ lam_t.T).T + eps
Yn = (Y_star - Y_star.min()) / (Y_star.max() - Y_star.min())
return Yn ** r # (T, J+1) untreated outcomes
def att_bias(Y0, T0, J, seed):
T_post = Y0.shape[0] - T0
tau = np.linspace(0.02, 0.2, T_post)
Y = Y0.copy(); Y[T0:, 0] += tau
rows = [{"unit": j, "time": t, "y": float(Y[t, j]),
"D": int(j == 0 and t >= T0)}
for j in range(J + 1) for t in range(Y.shape[0])]
df = pd.DataFrame(rows)
res = NSC({"df": df, "outcome": "y", "treat": "D",
"unitid": "unit", "time": "time",
"cv_grid_size": 0.1, "run_inference": False,
"display_graphs": False, "seed": seed}).fit()
# Per-period |bias| averaged across post-period, ×100 (paper scale).
return float(np.mean(np.abs(res.gap[T0:] - tau)) * 100.0)
for (J, T0) in [(25, 15), (50, 15), (50, 30)]:
for r in (1, 2):
biases = []
for rep in range(20):
Y0 = gen_panel(J=J, T0=T0, T_post=10, r=r, seed=10 * rep + 1)
biases.append(att_bias(Y0, T0=T0, J=J, seed=10 * rep + 2))
print(f" J={J:>3d}, T0={T0:>2d}, r={r}: "
f"|bias|x100 mean = {np.mean(biases):.2f}, "
f"SD = {np.std(biases, ddof=1):.2f}")
Representative output at one cell (mlsynth, 20 reps; the paper uses 5000):
J=50, T0=15, r=1: |bias|x100 mean = 0.82, SD = 0.28 (paper NSC: 0.77)
J=50, T0=15, r=2: |bias|x100 mean = 0.87, SD = 0.51 (paper NSC: 0.74)
The qualitative paper findings to look for as you fill in the remaining cells:
Bias falls as \(J\) grows – direct evidence of the Theorem 2 mechanism: more donors makes the near-twin pool denser and the nearest-\(M\) matching tighter. Paper Table 1: NSC bias drops from 0.99 (J=25) to 0.77 (J=50) in the linear case and from 0.92 to 0.74 in the nonlinear case at T0=15.
NSC matches OSC in the linear case and wins in the nonlinear case. Paper Table 1: at (J=25, T0=30), OSC bias is 1.79 (r=1) and 1.40 (r=2); NSC stays at 0.91 and 0.87. The non-negativity restriction in OSC pays an extrapolation penalty in the nonlinear regime that NSC’s affine + L1-anchored weighting avoids.
Coverage of the closed-form CI sits at 0.93-0.95 across the paper’s \((J, T_0, r)\) grid – the Doudchenko-Imbens variance estimator is well-calibrated for this DGP. mlsynth’s CI reproduces that coverage band.
Empirical: Proposition 99 (California, 1989-2000)#
Revisits the Abadie-Diamond-Hainmueller (2010) tobacco-control case study. mlsynth’s NSC, R-faithful to the reference implementation, recovers the paper’s published \((a^*, b^*) = (0.3, 0.7)\) and produces per-year gaps within a pack or two of the paper’s reported numbers.
import numpy as np
import pandas as pd
from mlsynth import NSC
df = pd.read_stata("synth_smoking.dta") # Abadie-Diamond-Hainmueller (2010)
df = df.sort_values(["state", "year"]).reset_index(drop=True)
df["treatment"] = ((df["state"] == "California") & (df["year"] >= 1989)).astype(int)
res = NSC({
"df": df, "outcome": "cigsale", "treat": "treatment",
"unitid": "state", "time": "year",
"cv_grid_size": 0.1, "run_inference": True,
"display_graphs": False, "seed": 42,
}).fit()
print(f"selected (a*, b*) = ({res.design.a_star}, {res.design.b_star})")
print(f"pre-RMSE = {res.pre_rmse:.4f}")
print(f"ATT (1989-2000) = {res.att:+.4f} packs/capita")
print(f"ATT 95% CI = ({res.inference.att_lower:+.2f}, "
f"{res.inference.att_upper:+.2f})")
gaps = pd.DataFrame({
"year": res.inputs.time_labels.astype(int),
"gap": res.gap,
"ci_low": res.inference.gap_lower,
"ci_high": res.inference.gap_upper,
})
for y in (1990, 1995, 2000):
row = gaps.loc[gaps.year == y].iloc[0]
print(f" {y}: gap = {row.gap:+6.2f} CI=({row.ci_low:+.2f}, {row.ci_high:+.2f})")
Output:
selected (a*, b*) = (0.3, 0.7) # matches the paper exactly
pre-RMSE = 1.2450
ATT (1989-2000) = -19.1313 packs/capita
ATT 95% CI = (-25.51, -12.75)
1990: gap = -9.05 CI=(-26.38, +8.27) # paper: -9.5
1995: gap = -22.62 CI=(-46.03, +0.78) # paper: -24.5
2000: gap = -27.01 CI=(-54.31, +0.29) # paper: -28.7
Two things are worth flagging about this replication.
The selected \((a^*, b^*) = (0.3, 0.7)\) is grid-stable across seeds 42, 789, 1000, 2024 and one grid step away from the paper for other seeds (e.g. seed 123 lands on \((0.2, 0.8)\)). This jitter is the same RNG noise the reference R script exhibits across
set.seed()values – unavoidable when the CV burnsJrandom draws per grid point and Python’s RNG differs from R’s.Sparsity comparison vs OSC: the original SC concentrates weight on Utah (0.385), Montana (0.271), Nevada (0.186). NSC spreads weight across roughly 20 states – the L1-anchored affine fit is recruiting more near-neighbour Western states with smaller weights (some negative – Alabama -0.015, Tennessee -0.071 in the paper’s Table 2). The pre-period fit is similar (NSC pre-RMSE 1.25 vs OSC ~2.0) but the spread weights reduce the variance of the post-period counterfactual.
The take-away mirrors the paper’s: NSC’s affine + L1-anchored formulation finds a tighter pre-period fit by pulling in close-neighbour Western states (with some negative weights to correct for the few far donors it admits), which then translates into a smoother and slightly larger counterfactual decline. The ATT of ~-19 packs/capita over 1989-2000 is in the same ballpark as the canonical Abadie et al. result while using a strictly larger weight space.
Core API#
Nonlinear Synthetic Control (NSC) estimator.
Tian, W. (2023). “The Synthetic Control Method with Nonlinear Outcomes: Estimating the Impact of the 2019 Anti-Extradition Law Amendments Bill Protests on Hong Kong’s Economy.” arXiv:2306.01967.
NSC generalises the canonical Abadie-Diamond-Hainmueller (2010) synthetic-control method to nonlinear outcomes by
dropping the non-negativity restriction on donor weights (only the adding-up constraint \(\sum_j w_j = 1\) remains);
augmenting the weight-fitting objective with a pairwise-distance- weighted L1 penalty (favours donors close to the treated unit in pretreatment matching variables) and an L2 penalty (spreads the weights);
scaling the tuning parameters by the eigenvalues of \(Z_0 Z_0'\) so the dimensionless tuning parameters \((a^*, b^*) \in [0, 1]\) admit a coarse cross-validation grid.
Inference defaults to the Doudchenko-Imbens (2017) variance
estimator: for each post-treatment period the variance of the gap
is estimated by the MSE of predicting each donor’s outcome from the
others under the same (a^*, b^*) regime.
- class mlsynth.estimators.nsc.NSC(config: NSCConfig | dict)#
Bases:
objectNonlinear Synthetic Control (Tian 2023) estimator.
- Parameters:
config (NSCConfig or dict) – Configuration object. See
mlsynth.config_models.NSCConfig.- Returns:
NSCResults – Frozen container with the optimised weights, the coordinate- descent CV trace (if any), Doudchenko-Imbens inference, the counterfactual series, and the ATT.
- fit() NSCResults#
Run the NSC pipeline end to end.
Configuration#
- class mlsynth.config_models.NSCConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', a: ~typing.Annotated[float | None, ~annotated_types.Ge(ge=0.0), ~annotated_types.Le(le=1.0)] = None, b: ~typing.Annotated[float | None, ~annotated_types.Ge(ge=0.0), ~annotated_types.Le(le=1.0)] = None, cv_grid_size: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Le(le=0.5)] = 0.1, cv_target: ~typing.Literal['controls'] = 'controls', cv_max_iterations: ~typing.Annotated[int, ~annotated_types.Ge(ge=1), ~annotated_types.Le(le=20)] = 3, covariates: ~typing.List[str] | None = None, standardize: bool = True, alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05, run_inference: bool = True, seed: int = 123)#
Configuration for the Nonlinear Synthetic Control (NSC) estimator.
Implements Tian (2023), “The Synthetic Control Method with Nonlinear Outcomes” (arXiv:2306.01967). The estimator generalises Abadie-Diamond-Hainmueller (2010) synthetic control to nonlinear outcome functions by
dropping the non-negativity restriction on donor weights,
adding a pairwise-distance-weighted L1 penalty plus an L2 penalty in the weight-fitting objective, and
scaling the tuning parameters by the eigenvalues of \(Z_0 Z_0'\) so they can be cross-validated on
[0, 1].
- Parameters:
a (float or None) – Dimensionless L1-discrepancy tuning parameter on
[0, 1]. Higher values concentrate weight on units close to the treated one in pretreatment matching variables (paper eq. (7)).Nonetriggers coordinate-descent CV.b (float or None) – Dimensionless L2 tuning parameter on
[0, 1]. Higher values spread weights more evenly across donors.Nonetriggers coordinate-descent CV.cv_grid_size (float) – Step of the CV grid on
[0, 1]. Paper default is 0.1.cv_target ({“controls”, “treated”}) – CV target.
"controls"(paper default) leaves each donor out in turn and predicts it from the others;"treated"scores on the treated unit’s pretreatment fit.cv_max_iterations (int) – Hard cap on coordinate-descent iterations for the CV sweep.
covariates (list of str, optional) – Optional covariate columns to use as additional matching variables alongside the pretreatment outcomes; collapsed to per-unit pretreatment means before being stacked into
Z_0.alpha (float) – Two-sided significance level for the Doudchenko-Imbens confidence intervals.
run_inference (bool) – Whether to compute the Doudchenko-Imbens variance estimator and the per-period / ATT CIs.
display_graphs (bool) – Whether to render the diagnostic NSC plot.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Helper Modules#
Data preparation helpers for the NSC estimator.
Builds the NSCInputs container by pivoting a long balanced
panel and assembling the pre-period matching matrix Z_0. By
default the matching variables are the pre-period outcomes (the
paper’s empirical convention); additional covariates can be stacked
via the covariates argument.
- mlsynth.utils.nsc_helpers.setup.prepare_nsc_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, covariates: List[str] | None = None, standardize: bool = True) NSCInputs#
Pivot the panel and assemble the NSC matching matrix.
- Parameters:
df (pd.DataFrame) – Long balanced panel with one row per
(unit, time).outcome, treat, unitid, time (str) – Column names.
covariates (list of str, optional) – Optional column names to use as additional matching variables (each collapsed to its per-unit pre-treatment mean before being stacked alongside the pre-period outcomes in
Z_0).standardize (bool, default True) – Centre each matching-variable column to mean 0 and scale to sample standard deviation 1 across the full (treated + donors) stack – matches R’s
scale(Z, center=T, scale=apply(Z,2,sd))in the reference NSC implementation.
- Returns:
NSCInputs – Preprocessed panel.
- Raises:
MlsynthDataError – If the panel is unbalanced or has missing entries.
MlsynthConfigError – If
covariatesreferences unknown columns.
NSC inner optimisation: eigenvalue-scaled (a, b) + simplex-affine QP.
The NSC weight problem (Tian 2023, eq. 7) is
The raw penalty multipliers a and b are scaled by the
eigenvalues of \(Z_0 Z_0'\) so the dimensionless tuning
parameters a_star, b_star \in [0, 1] (cf. paper paragraph
discussing eq. 7):
bscales as \(b^* \cdot \lambda_{\lceil n b^* \rceil}\), where \(\lambda_1 \le \dots \le \lambda_n\) are the non-zero eigenvalues of \(Z_0 Z_0'\). Atb_star = 1the L2 dominates and the weights are roughly uniform.ascales analogously by the eigenvalues of \(Z_0 Z_0' + \mathrm{diag}(b)\). Ata_star = 1only the nearest neighbour receives weight.
The QP is solved directly through Clarabel (one cached constraint-skeleton per donor count) to skip the CVXPY canonicalisation overhead — matches the pattern used in SparseSC, SIV, and SYNDES.
- mlsynth.utils.nsc_helpers.optimization.design_eigenvalues(Z0: ndarray) ndarray#
Return ascending non-zero eigenvalues of
Z_0 Z_0'.- Parameters:
Z0 (np.ndarray) –
(J, p)matching matrix of donor units.
- mlsynth.utils.nsc_helpers.optimization.fit_nsc(Z1: ndarray, Z0: ndarray, a_star: float, b_star: float, eigvals: ndarray | None = None) Tuple[ndarray, float, float]#
Convenience wrapper: scale
(a_star, b_star)and solve the QP.- Returns:
w (np.ndarray) – Donor weights.
a_scaled (float) – Raw L1 multiplier used.
b_scaled (float) – Raw L2 multiplier used.
- mlsynth.utils.nsc_helpers.optimization.scale_a(a_star: float, Z0: ndarray, b_raw: float) float#
Eigenvalue-scaled L1-discrepancy multiplier.
Uses the eigenvalues of
Z_0 Z_0' + b * I(wherebis the already-scaled raw L2 multiplier) so the L1 anchor maxes out at the nearest-neighbour solution whena_star = 1regardless of the L2 penalty.
- mlsynth.utils.nsc_helpers.optimization.scale_b(b_star: float, eigvals: ndarray) float#
Eigenvalue-scaled L2 multiplier.
- mlsynth.utils.nsc_helpers.optimization.solve_nsc_weights(Z1: ndarray, Z0: ndarray, a: float, b: float) ndarray#
Solve the NSC weight QP for one (a, b) pair.
The L1 penalty \(a \sum_j |w_j| \, d_j\) is rewritten using auxiliary non-negative variables \(u_j \ge |w_j|\); the full QP becomes
\[\min_{w, u} \quad \| Z_1 - Z_0' w \|^2 + a \, d^\top u + b \, w^\top w \quad \text{s.t.} \quad \sum_j w_j = 1, \; u_j \ge w_j, \; u_j \ge -w_j.\]- Parameters:
Z1 (np.ndarray) – Treated matching vector of length
p.Z0 (np.ndarray) – Donor matching matrix of shape
(J, p).a (float) – Raw L1-discrepancy multiplier (post eigenvalue scaling).
b (float) – Raw L2 multiplier (post eigenvalue scaling).
- Returns:
np.ndarray – Donor weight vector of length
J, summing to 1.
Coordinate-descent cross-validation for NSC tuning parameters.
Implements Tian (2023, page 13) selection of \((a^*, b^*)\) exactly
as in the reference R implementation (NSC.R, function fn_cv):
Initialise
b_star = 0.Sweep
a_staron the grid[0, 1]with stepgrid_size; pick the value minimising the CV-MSPE.Holding
a_starfixed, sweepb_staron the same grid.Iterate steps 2-3 until
(a_star, b_star)stops moving (or a max-iteration budget is reached).
CV objective (R-faithful, “controls” target)#
For each donor j:
Treat donor
jas a pseudo-treated unit. The fitting target is its pre-period matching vector.The donor pool for the fit is the other
J - 1donors PLUS one randomly drawn extra donor (with replacement-style indexing), so the per-fold pool size stays atJ– matching the dimensionality of the real treated-unit fit and keeping the eigenvalue-based scaling of(a, b)consistent across folds (NSC.R lines 110-115).Fit NSC weights at the candidate
(a, b)on PRE-period matching data, then evaluate the post-period MSPE: actual donorjoutcomes vs. those weights applied to the other donors’ post-period outcomes.
This is a true held-out test error – the post-period outcomes never
enter the fit. The legacy target="treated" option (which scored on
the same data it fit to) has been removed; passing it now raises.
- mlsynth.utils.nsc_helpers.crossval.cv_select(Z1: ndarray, Z0: ndarray, Y_donors: ndarray, T0: int, grid_size: float = 0.1, max_iterations: int = 3, target: str = 'controls', seed: int | Generator | None = 123) Tuple[float, float, NSCCVTrace]#
Coordinate-descent selection of
(a_star, b_star)on[0, 1].- Parameters:
Z1 (np.ndarray) – Treated matching vector, length
p. Currently unused by the R-faithful “controls” target (kept in the signature for symmetry withfit_nsc()and for future use).Z0 (np.ndarray) – Standardized donor matching matrix, shape
(J, p).Y_donors (np.ndarray) – Full-period donor outcome matrix, shape
(T, J).T0 (int) – Number of pre-treatment periods.
grid_size (float, default 0.1) – Step of the search grid; the candidate set is
[0, grid_size, 2*grid_size, ..., 1].max_iterations (int, default 3) – Hard cap on coordinate-descent iterations.
target ({“controls”}) – Only the R-faithful “controls” target is supported. The legacy “treated” target (training-error MSPE on the treated unit’s pretreatment fit) has been removed.
seed (int, Generator, or None, default 123) – Seed (or RNG) for the extra-donor draws. Defaults to
123to match the reference R script’sset.seed(123)convention.
- Returns:
a_star (float) – Optimal dimensionless L1 multiplier on
[0, 1].b_star (float) – Optimal dimensionless L2 multiplier on
[0, 1].trace (NSCCVTrace) – Coordinate-descent diagnostics.
Doudchenko-Imbens (2017) inference for NSC, R-faithful flavour.
Mirrors the placebo-style variance estimator used in the reference
R implementation (NSC.R, lines 200-220): for each donor j,
hold it out and refit the NSC weights using the other J - 1 donors
PLUS one randomly drawn extra donor (keeping the pool size at J).
Record the full-period residual Y_{j,t} - sum_k w_k Y_{k,t} for
that fold. The per-period standard error is
se_t = sqrt(sum_j perr_{j,t}^2 / (J - 1))
and the gap CI at significance alpha is the normal interval
tau_hat_t ± z_{1-alpha/2} * se_t. The ATT CI is built from the
average of the per-period post-period SEs in the same way as the
R script’s reporting (an arithmetic mean of the ITE over the
post-period, with SE = average post-period SE / sqrt(T1)).
- mlsynth.utils.nsc_helpers.inference.doudchenko_imbens_inference(treated_outcome: ndarray, donor_outcomes: ndarray, counterfactual: ndarray, Z0: ndarray, T0: int, a_star: float, b_star: float, alpha: float = 0.05, seed: int | Generator | None = 123) NSCInference#
Compute per-period and ATT inference from the leave-one-control fits.
- Parameters:
treated_outcome (np.ndarray) – Treated outcome series, shape
(T,).donor_outcomes (np.ndarray) – Donor outcome matrix, shape
(T, J).counterfactual (np.ndarray) – SC imputation of the treated outcome, shape
(T,).Z0 (np.ndarray) – Standardized donor matching matrix, shape
(J, p).T0 (int) – Number of pre-treatment periods.
a_star, b_star (float) – Dimensionless tuning parameters that produced
counterfactual. Used unchanged for the leave-one-control re-fits.alpha (float, default 0.05) – Two-sided significance level.
seed (int, Generator, or None, default 123) – Seed (or RNG) for the extra-donor draws. Matches the reference R script’s
set.seed(123).
Diagnostic plot for NSC results.
- mlsynth.utils.nsc_helpers.plotter.plot_nsc(results: NSCResults) None#
Two-panel plot: trajectories + gap with CI.
The left panel overlays the treated and synthetic-control series over the full panel; the right panel shows the per-period gap plus the Doudchenko-Imbens confidence band when available.
Frozen dataclasses for the Nonlinear Synthetic Control (NSC) estimator.
NSC implements Tian (2023), “The Synthetic Control Method with Nonlinear Outcomes” (arXiv:2306.01967). The estimator generalises the Abadie-Diamond-Hainmueller (2010) synthetic-control procedure to settings where the outcome is a nonlinear function of the underlying predictors. The key modifications versus the canonical Abadie SC are:
the non-negativity constraint on donor weights is relaxed (only the adding-up constraint \(\sum_j w_j = 1\) remains);
the objective adds an L1 penalty weighted by pairwise pretreatment matching discrepancies (favours donors close to the treated unit) plus an L2 penalty (spreads the weights);
the tuning parameters \(a\), \(b\) are scaled by the eigenvalues of \(Z_0 Z_0'\), so the dimensionless tuning parameters \(a^*, b^* \in [0, 1]\) can be cross-validated on a coarse grid.
The five layers below (inputs, design, fold-level CV summary, the full grid trace, inference) keep the pipeline pluggable and inspectable.
- class mlsynth.utils.nsc_helpers.structures.NSCCVTrace(a_grid: ndarray, b_grid: ndarray, a_mspe_curve: ndarray, b_mspe_curve: ndarray, iterations: int, converged: bool, target: str)#
Coordinate-descent trace of the (a_star, b_star) sweep.
- Parameters:
a_grid (np.ndarray) – Candidate values of
a_staron the grid.b_grid (np.ndarray) – Candidate values of
b_staron the grid.a_mspe_curve (np.ndarray) – MSPE at each
a_star(withb_starheld at its current value) on the final coordinate-descent iteration.b_mspe_curve (np.ndarray) – MSPE at each
b_star(witha_starheld at the coordinate-descent estimate) on the final iteration.iterations (int) – Number of coordinate-descent iterations actually run.
converged (bool) –
Trueiff(a_star, b_star)did not move on the last iteration.target (str) – Which CV target was used:
"controls"(predict each donor’s pre-period from the others – paper default) or"treated"(predict the treated unit’s pre-period outcomes from the donors).
- a_grid: ndarray#
- a_mspe_curve: ndarray#
- b_grid: ndarray#
- b_mspe_curve: ndarray#
- class mlsynth.utils.nsc_helpers.structures.NSCDesign(w: ndarray, donor_weights: Dict[Any, float], a_star: float, b_star: float, a_scaled: float, b_scaled: float, eigvals: ndarray)#
Optimised NSC weights and the tuning state that produced them.
- Parameters:
w (np.ndarray) – Donor weight vector
(J,). Sums to 1; may contain negative entries because NSC drops the non-negativity restriction.donor_weights (dict) –
{donor_name: weight}– convenience view ofw.a_star (float) – Dimensionless L1-discrepancy tuning parameter on
[0, 1].b_star (float) – Dimensionless L2 tuning parameter on
[0, 1].a_scaled (float) – Raw L1 penalty multiplier the optimiser actually used (after eigenvalue scaling).
b_scaled (float) – Raw L2 penalty multiplier (after eigenvalue scaling).
eigvals (np.ndarray) – Sorted (ascending) non-zero eigenvalues of
Z_0 Z_0'used for the (a, b) scaling. Length \(n = \min(J, p)\).
- eigvals: ndarray#
- w: ndarray#
- class mlsynth.utils.nsc_helpers.structures.NSCInference(method: str, alpha: float, period_variance: ~numpy.ndarray = <factory>, period_se: ~numpy.ndarray = <factory>, gap: ~numpy.ndarray = <factory>, gap_lower: ~numpy.ndarray = <factory>, gap_upper: ~numpy.ndarray = <factory>, att: float = nan, att_se: float = nan, att_lower: float = nan, att_upper: float = nan, p_value: float = nan)#
Doudchenko-Imbens (2017) inference for NSC.
The variance of the per-period estimator is approximated by the MSE of predicting each control unit’s outcome at that period using the other controls (the “leave-one-control-out” estimator). A normal CI is then formed period-by-period.
- Parameters:
method (str) –
"doudchenko_imbens"or"none".alpha (float) – Two-sided significance level used to build the CIs.
period_variance (np.ndarray) – Per-period variance estimate, shape
(T,).period_se (np.ndarray) – Per-period standard error \(\sqrt{\text{period\_variance}}\).
gap (np.ndarray) – Treated minus counterfactual at every period, shape
(T,).gap_lower (np.ndarray) – Lower CI bound for the per-period gap, shape
(T,).gap_upper (np.ndarray) – Upper CI bound for the per-period gap, shape
(T,).att (float) – Average post-treatment gap.
att_se (float) – Standard error of
att(post-period variance averaged and divided byn_post).att_lower, att_upper (float) –
(1 - alpha)CI for the ATT.p_value (float) – Two-sided z-test of
H_0: ATT = 0.
- gap: ndarray#
- gap_lower: ndarray#
- gap_upper: ndarray#
- period_se: ndarray#
- period_variance: ndarray#
- class mlsynth.utils.nsc_helpers.structures.NSCInputs(treated_outcome: ndarray, donor_outcomes: ndarray, matching_matrix: ndarray, treated_matching_vector: ndarray, donor_names: ndarray, treated_unit_name: Any, T: int, T0: int, time_labels: ndarray)#
Preprocessed panel data for NSC estimation.
- Parameters:
treated_outcome (np.ndarray) – Outcome series for the treated unit over the full panel, shape
(T,).donor_outcomes (np.ndarray) – Outcome matrix for the J donor units over the full panel, shape
(T, J).matching_matrix (np.ndarray) – Pre-period matching matrix
Z_0of donor units used by the eigenvalue scaling, shape(J, p)wherep = T0(paper default: pretreatment outcomes; additional covariates may be stacked downstream).treated_matching_vector (np.ndarray) – Pre-period matching vector
Z_1for the treated unit, shape(p,).donor_names (np.ndarray) – Names / labels of the donor units, length
J.treated_unit_name (Any) – Label of the treated unit.
T (int) – Total number of panel periods.
T0 (int) – Number of pre-treatment periods.
time_labels (np.ndarray) – Labels of the time periods, length
T.
- donor_names: ndarray#
- donor_outcomes: ndarray#
- matching_matrix: ndarray#
- time_labels: ndarray#
- treated_matching_vector: ndarray#
- treated_outcome: ndarray#
- class mlsynth.utils.nsc_helpers.structures.NSCResults(inputs: ~mlsynth.utils.nsc_helpers.structures.NSCInputs, design: ~mlsynth.utils.nsc_helpers.structures.NSCDesign, cv_trace: ~mlsynth.utils.nsc_helpers.structures.NSCCVTrace | None, inference: ~mlsynth.utils.nsc_helpers.structures.NSCInference, counterfactual: ~numpy.ndarray, gap: ~numpy.ndarray, att: float, pre_rmse: float, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#
Top-level container returned by
mlsynth.NSC.fit().- Parameters:
inputs (NSCInputs) – Preprocessed panel.
design (NSCDesign) – Optimised weights plus the tuning state.
cv_trace (NSCCVTrace or None) – Coordinate-descent diagnostics;
Nonewhen the user suppliedaandbexplicitly.inference (NSCInference) – Per-period and ATT inference.
counterfactual (np.ndarray) – Synthetic-control imputation of the treated outcome at every period, shape
(T,).gap (np.ndarray) – Treated minus counterfactual, shape
(T,).att (float) – Mean post-treatment gap.
pre_rmse (float) – Root mean squared pre-treatment fit error.
metadata (dict) – Free-form pipeline diagnostics (anchor of (a*, b*), iteration count, condition number of
Z_0 Z_0', …).
- counterfactual: ndarray#
- cv_trace: NSCCVTrace | None#
- gap: ndarray#
- inference: NSCInference#
Example#
A self-contained one-draw Monte Carlo at the paper’s nonlinear DGP (Tian 2023, Section 4): latent linear factor model, rescale to \([0, 1]\), square (so \(r = 2\)), apply a small treatment effect to the treated unit’s post-period.
"""One draw of the Tian (2023) nonlinear-outcome simulation."""
import numpy as np
import pandas as pd
from mlsynth import NSC
# ---------------------------------------------------------------------
# 1. Simulate one nonlinear panel
# ---------------------------------------------------------------------
rng = np.random.default_rng(0)
J = 12 # donors
T_pre = 12 # pre-treatment periods
T_post = 6 # post-treatment periods
T = T_pre + T_post
tau_true = 0.10 # treatment effect (on the unit interval after rescaling)
# Latent linear factor model: Y* = X' beta_t + mu' lambda_t + epsilon
X = rng.uniform(0.0, np.sqrt(12.0), size=(J + 1, 2))
mu = rng.uniform(0.0, np.sqrt(12.0), size=(J + 1, 4))
beta_t = rng.normal(10.0, 1.0, size=(T, 2))
lam_t = rng.normal(10.0, 1.0, size=(T, 4))
eps = rng.normal(0.0, 1.0, size=(T, J + 1))
Y_star = (X @ beta_t.T).T + (mu @ lam_t.T).T + eps
# Rescale to [0, 1] then apply nonlinear transformation (r = 2 in the paper).
Yn = (Y_star - Y_star.min()) / (Y_star.max() - Y_star.min())
Y_control = Yn ** 2
# Apply the additive treatment effect to unit 0 in the post-period.
Y = Y_control.copy()
Y[T_pre:, 0] += tau_true
rows = [
{
"unit": j,
"time": t,
"y": float(Y[t, j]),
"D": int(j == 0 and t >= T_pre),
}
for j in range(J + 1)
for t in range(T)
]
df = pd.DataFrame(rows)
# ---------------------------------------------------------------------
# 2. Fit NSC with CV-selected (a*, b*)
# ---------------------------------------------------------------------
results = NSC({
"df": df,
"outcome": "y",
"treat": "D",
"unitid": "unit",
"time": "time",
"cv_target": "controls", # paper default
"cv_grid_size": 0.1,
"cv_max_iterations": 3,
"alpha": 0.05,
"run_inference": True,
"display_graphs": False,
}).fit()
# ---------------------------------------------------------------------
# 3. Inspect the output
# ---------------------------------------------------------------------
print(f"truth tau = {tau_true:+.3f}")
print(f"ATT_hat = {results.att:+.3f}")
print(f"a* = {results.design.a_star:.2f}, b* = {results.design.b_star:.2f}")
print(f"pre-RMSE = {results.pre_rmse:.4f}")
print(f"95% CI ATT = [{results.inference.att_lower:+.3f}, "
f"{results.inference.att_upper:+.3f}]")
print(f"p-value = {results.inference.p_value:.3f}")
# Per-period diagnostics:
import pandas as pd
print(pd.DataFrame({
"t": np.arange(T),
"gap": results.gap,
"ci_low": results.inference.gap_lower,
"ci_high": results.inference.gap_upper,
}).round(3).to_string(index=False))
References#
Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490):493-505.
Doudchenko, N., & Imbens, G. W. (2017). “Balancing, Regression, Difference-In-Differences and Synthetic Control Methods: A Synthesis.” NBER Working Paper 22791.
Tian, W. (2023). “The Synthetic Control Method with Nonlinear Outcomes: Estimating the Impact of the 2019 Anti-Extradition Law Amendments Bill Protests on Hong Kong’s Economy.” arXiv:2306.01967.