Nonlinear Synthetic Control (NSC)

Nonlinear Synthetic Control (NSC)#

Overview#

NSC implements Tian (2023), “The Synthetic Control Method with Nonlinear Outcomes: Estimating the Impact of the 2019 Anti-Extradition Law Amendments Bill Protests on Hong Kong’s Economy” (arXiv:2306.01967). NSC generalises the canonical Abadie-Diamond-Hainmueller (2010) synthetic-control method to panel-data settings where the untreated potential outcome is a nonlinear function of the underlying predictors.

Three structural changes versus canonical SC:

Drops the non-negativity restriction on donor weights — only the adding-up constraint \(\mathbf{1}^\top \mathbf{w} = 1\) remains. The resulting affine-weight SC widens the set of treated units the method can handle, since it no longer requires the treated unit to sit inside the convex hull of the donors.
Adds an elastic-net penalty on the weights, with the L1 term weighted by the pairwise pretreatment matching discrepancies between the treated unit and each donor. This biases the estimator towards near-neighbour matching when the outcome is highly nonlinear and towards spread-out weights when it is more linear.
Eigenvalue-scales the tuning parameters so the dimensionless \((a^*, b^*) \in [0, 1]\) admit a coarse cross-validation grid. The paper recommends grid size 0.1 with coordinate-descent convergence.

Inference defaults to the Doudchenko-Imbens (2017) variance estimator: for every period the variance of the gap is approximated by the MSE of predicting each donor’s outcome from the other donors under the same \((a^*, b^*)\) regime. Per-period and ATT normal-based confidence intervals are returned.

When to use NSC – and when not to#

Reach for NSC when:

The outcome is a smooth nonlinear function of latent predictors – bounded growth metrics, log-GDP, count outcomes modelled continuously, share variables on \((0, 1)\).
The treated unit sits at or near the boundary of the donor predictor distribution – canonical SC’s non-negativity restriction would force interpolation bias from far donors, and affine weights with the L1 closeness penalty can reach the treated unit by extrapolating through the closest neighbours.
You have a large donor pool relative to the pre-period (\(N_0 \gg T_0\)) so that nearest-\(M\) matching is sharp.
You’re willing to trade interpretability for bias: NSC returns weights that can be negative, which means “subtract some donors”, a step beyond the canonical SC story of “a convex combination of similar units”.

Do not use NSC when:

Outcome is binary, ordinal, or low-count. A6 requires a strictly monotonic link \(\mathbf{F}\); discrete outcomes need discrete-choice or distributional methods (Distributional Synthetic Control (DSC)) instead.
You need genuinely sparse, non-negative weights. NSC’s whole premise is dropping non-negativity. If the policy story is “California is a convex combination of these four states”, use canonical SCM or Two-Step Synthetic Control – NSC’s negative weights are an identification gain but a rhetorical loss.
Very small donor pool (\(N_0 \le 10\) or so). Theorem 2 fails, the leave-one-control inference becomes unstable, and the L1-anchored nearest-neighbour story degenerates.
Outcome with hard floors or ceilings binding for some donors. Bounded growth where multiple donors are pinned at the boundary violates A6 for those donors. Drop or re-code them, or move to a censored-regression-aware estimator.
Pre-period too short to identify the factor structure. With \(T_0 \lesssim r\) (factor count), A4 is at the edge and the cross-validation grid is choosing essentially noise. Use Forward Difference-in-Differences (FDID) (which is designed for short panels) or Factor Model Approach (FMA) (which jointly estimates a low-rank factor model).
You suspect interference / spillovers across units. NSC inherits SUTVA from canonical SC; a treated unit influencing donors breaks A2’s mean-independence. Switch to Spillover-Aware Synthetic Control (SPILLSYNTH) or Spatial Synthetic Difference-in-Differences (SpSyDiD).
Continuous treatment. NSC encodes a single binary intervention. Continuous dose belongs in Continuous-Treatment Synthetic Control (CTSC).

A concrete example: a city raises its minimum wage and you want the effect on a bounded outcome – say the employment rate, a share on \((0, 1)\) whose untreated path is a smooth nonlinear function of latent local conditions. There is one treated city, a long pre-policy history, and a large pool of untreated cities, several of them close neighbours of the treated one. NSC builds a synthetic city from an affine (possibly negative-weight) blend that, anchored by the L1 closeness penalty, leans on those near-neighbours – reaching a treated unit that may sit outside the donors’ convex hull – and reads the policy effect as the post-period gap.

Notation#

Let \(j = 1\) denote the treated unit, with all units \(\mathcal{N} \coloneqq \{1, \dots, N\}\) and donor pool \(\mathcal{N}_0 \coloneqq \mathcal{N} \setminus \{1\}\) of cardinality \(N_0\). Time runs over \(t \in \mathcal{T} \coloneqq \{1, \dots, T\}\), 1-indexed; the intervention takes effect after period \(T_0\), splitting \(\mathcal{T}\) into the pre-period \(\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}\) (of length \(T_0\)) and the post-period \(\mathcal{T}_2 \coloneqq \{t \in \mathcal{T} : t > T_0\}\).

For each unit \(j\) we observe a scalar outcome \(y_{jt}\), with treated series \(\mathbf{y}_1 = (y_{11}, \dots, y_{1T})^\top \in \mathbb{R}^{T}\) and donor series \(\mathbf{y}_j\) stacked into the donor matrix \(\mathbf{Y}_0 \coloneqq [\mathbf{y}_j]_{j \in \mathcal{N}_0} \in \mathbb{R}^{T \times N_0}\) (one column per donor). Alongside the outcome each unit carries a \((K \times 1)\) vector of pretreatment matching variables \(\mathbf{z}_j \coloneqq [\mathbf{x}_j; y_{j,1}, \dots, y_{j,T_0}]'\), collecting observed predictors \(\mathbf{x}_j\) and the unit’s pre-period outcomes; the donors’ matching vectors stack into \(\mathbf{Z}_0 \coloneqq [\mathbf{z}_j]_{j \in \mathcal{N}_0}\).

Donor weights are \(\mathbf{w} \in \mathbb{R}^{N_0}\), constrained only by the affine adding-up condition \(\mathbf{1}^\top \mathbf{w} = 1\) (NSC drops the non-negativity restriction, so \(\mathbf{w}\) need not lie on the simplex \(\Delta^{N_0}\)); the optimiser is \(\mathbf{w}^\ast\). The synthetic counterfactual is \(\widehat{\mathbf{y}}_1 \coloneqq \mathbf{Y}_0\,\mathbf{w}^\ast\) with entries \(\widehat{y}_{1t}\), the per-period effect is \(\tau_t \coloneqq y_{1t} - \widehat{y}_{1t}\), and the ATT is \(\widehat{\tau} \coloneqq |\mathcal{T}_2|^{-1} \sum_{t \in \mathcal{T}_2} \tau_t\). The significance level is \(\alpha\), and \((a, b)\) (with eigenvalue-scaled dimensionless counterparts \((a^*, b^*) \in [0, 1]\)) are the L1 / L2 tuning parameters.

Assumptions (Tian 2023)#

NSC inherits the interactive-fixed-effects (IFE) model of Abadie-Diamond-Hainmueller (2010) and adds the structural conditions needed for the synthetic-control bias to vanish under a nonlinear outcome. The paper’s formal assumptions:

A1 (IFE for the untreated latent). Each unit’s untreated latent outcome is \(y_{jt}^{0*} = \mathbf{x}_j' \boldsymbol{\beta}_t + \boldsymbol{\mu}_j' \boldsymbol{\lambda}_t + \varepsilon_{jt}\), where \(\mathbf{x}_j\) are observed predictors, \(\boldsymbol{\mu}_j\) unobserved loadings, and \(\varepsilon_{jt}\) an idiosyncratic shock.

Remark. This is the workhorse structural model: an untreated path driven by a low-rank factor structure plus noise. It is what lets pre-period matching on the latent index carry over to the post-period – the same loadings govern both windows – and it is shared with canonical SC; NSC’s departure is only that this latent index is observed through a nonlinear link (A6).

A2 (transitory shocks). Shocks \(\varepsilon_{jt}\) are independent across \(j, t\), have zero conditional mean given \((\mathbf{x}_k, \boldsymbol{\mu}_k, d_{ks})\) for all \(k, s\), and bounded \(p\)-th moments for some even \(p \ge 2\).

Remark. Mean-independence carries the no-confounding / no-interference content: the treatment of unit \(1\) may not move any donor’s shocks (SUTVA), and the noise must not anticipate assignment. A donor contaminated by the same shock as the treated unit (a spillover, a co-incident policy) breaks it – quarantine such units from the donor pool before fitting.

A4 (factor non-degeneracy). The smallest eigenvalue of \(T_0^{-1} \sum_t \boldsymbol{\lambda}_t \boldsymbol{\lambda}_t'\) is bounded away from zero. Translation: the unobserved factors carry persistent identifying variation across the pre-period.

Remark. The loadings are identified only if the time factors actually move over the pre-period. A flat or fully co-trending pre-period (all units share one secular trend, no idiosyncratic shocks) starves the match of variation; an SVD on the donor pre-matrix should show a clear spectrum rather than a single dominant component.

A5 (overlap / continuous support). The composite predictor \(\mathbf{h} = [\mathbf{x}', \boldsymbol{\mu}']\) has a density bounded away from zero on a compact convex support, units are iid draws from that distribution, and the treatment-assignment probability is non-degenerate. Practical reading: for almost every value of the treated unit’s predictors there is a control with near-identical predictors in the population – with enough donors, some of those near-twins land in the sample.

Remark. This is the population-level near-twin condition that the nearest-\(M\) matching (A7) cashes out in finite samples. Affine weights tolerate a treated unit outside the donors’ convex hull, but identification still needs population mass near its predictors; a treated unit qualitatively unlike every donor (a coastal mega-state against only interior small states) leaves a large pre-period RMSE that more donors, not more tuning, must fix.

A6 (smooth nonlinear link). The observed outcome is \(y_{jt}^0 = \mathbf{F}(\mathbf{x}_j' \boldsymbol{\beta}_t + \boldsymbol{\mu}_j' \boldsymbol{\lambda}_t + \varepsilon_{jt})\) with \(\mathbf{F}\) strictly monotonic and \(\mathbf{G}(\cdot) = \mathbb{E}_\varepsilon[\mathbf{F}(\cdot)]\) smooth. The paper does not require the analyst to know \(\mathbf{F}\) or \(\mathbf{G}\) – only that they exist with these regularity properties.

Remark. Strict monotonicity is what makes matching on observed pre-period outcomes equivalent to matching on the latent index – the heart of NSC’s nonlinear generalisation. It fails for genuinely discrete outcomes (the link is non-monotone between intervals, paper’s Remark 8) and for outcomes pressed against a hard floor or ceiling, where the same latent value yields different observed outcomes; histogram the outcome and inspect its bounds before trusting the fit.

A7 (nearest-\(M\) weighting). The synthetic control uses only the \(M > k + T_0\) nearest neighbours (in observed predictors and pre-period outcomes) of the treated unit; donors outside that neighbourhood get weight zero.

Remark. Local matching is the finite-sample mechanism that A5’s population overlap and Theorem 2’s fast donor growth deliver: with enough donors the near-twin pool is non-empty, so restricting to the \(M\) nearest sharpens the fit without losing the treated unit. A donor pool small relative to \(T_0\) (a dozen donors, decades of pre-period) leaves the neighbourhood thin – the regime where the Monte Carlo below shows bias halving as the pool doubles.

Under A2, A4, A5, A6, A7 the NSC estimator is asymptotically unbiased as \(T_0 \to \infty\) provided the donor pool grows super-polynomially in \(T_0\) (Theorem 2: \(N_0 = O(T_0^{b(T_0)})\) with \(b'(\cdot) > 0\)). The fast growth is what makes Assumption 5’s near-twins available in finite samples, so the local-neighbourhood matching in A7 is non-empty.

Mathematical Formulation#

Setup#

Unit \(j = 1\) receives treatment at \(T_0 + 1\) and remains treated thereafter. Stacking each unit’s matching vector \(\mathbf{z}_j = [\mathbf{x}_j; y_{j,1}, \dots, y_{j,T_0}]'\) (see Notation), the untreated potential outcome follows the interactive fixed-effects model

\[y_{jt}^0 = \mathbf{F}(\mathbf{x}_j' \boldsymbol{\beta}_t + \boldsymbol{\mu}_j' \boldsymbol{\lambda}_t + \varepsilon_{jt}),\]

with \(\mathbf{F}(\cdot)\) a strictly monotonic link function and a smooth conditional-expectation function \(\mathbf{G}(\cdot) \coloneqq \mathbb{E}_\varepsilon[\mathbf{F}(\cdot)]\).

NSC weight problem#

The weights solve (Tian 2023, eq. 7)

\[\mathbf{w}^\ast \in \operatorname*{argmin}_{\mathbf{w}} \quad \biggl\| \mathbf{z}_1 - \sum_{j \in \mathcal{N}_0} w_j \mathbf{z}_j \biggr\|^2 + a \sum_{j \in \mathcal{N}_0} |w_j| \, \| \mathbf{z}_1 - \mathbf{z}_j \| + b \sum_{j \in \mathcal{N}_0} w_j^{\,2} \quad \text{s.t.} \quad \sum_{j \in \mathcal{N}_0} w_j = 1.\]

Pretreatment fit term (first norm) – minimised by ordinary affine SC weights.
L1 penalty \(a \sum_j |w_j| \, \|\mathbf{z}_1 - \mathbf{z}_j\|\) – pushes weight onto donors that are close to the treated unit in the matching variables. As \(a \to \infty\) it collapses to the one-nearest-neighbour estimator.
L2 penalty \(b \sum_j w_j^2\) – spreads the weights. As \(b \to \infty\) the weights become uniform and the estimator collapses to a difference-in-differences.

Eigenvalue scaling#

The paper scales the raw multipliers so the dimensionless tuning parameters live on \([0, 1]\). With \(n = \min(N_0, K)\) and \(\lambda_1 \le \dots \le \lambda_n\) the sorted non-zero eigenvalues of \(\mathbf{Z}_0 \mathbf{Z}_0'\),

\[b = b^* \, \lambda_{\lceil n b^* \rceil}, \qquad a = a^* \, \widetilde\lambda_{\lceil n a^* \rceil},\]

where \(\widetilde\lambda_*\) come from the eigenvalues of \(\mathbf{Z}_0 \mathbf{Z}_0' + \operatorname{diag}(b)\). At \(a^* = 1\) only the nearest neighbour receives weight; at \(b^* = 1\) the weights are roughly uniform.

Cross-validation#

R-faithful (NSC.R, fn_cv): for each donor \(j\), fit NSC weights on its pre-period matching vector using the other \(N_0 - 1\) donors plus one randomly drawn extra donor (the pool size stays at \(N_0\) so the eigenvalue scaling of \((a^*, b^*)\) is comparable to the final treated-unit fit). The score is the post-period MSPE of donor \(j\)’s held-out outcomes against that weighted combination. The (a, b) selected minimises the average held-out MSPE across donors. Coordinate descent:

Initialise \(b^* = 0\).
Sweep \(a^*\) over the grid; pick the minimiser.
Hold \(a^*\); sweep \(b^*\); pick the minimiser.
Iterate until \((a^*, b^*)\) stops moving.

The extra-donor draw uses seed (default 123 to match the reference R script’s set.seed(123)). A previous in-sample “controls” target (predicting the donor’s pre-period from the others, i.e. evaluating on the same data the fit used) was removed in favour of this proper held-out score; the legacy cv_target = "treated" option no longer validates.

Inference#

Doudchenko-Imbens (2017): for each period \(t\) the variance of the gap \(\tau_t = y_{1t} - \widehat{y}_{1t}\) is estimated by averaging the squared leave-one-control prediction residuals at that period. Normal-based CIs follow:

\[\tau_t \pm z_{1 - \alpha/2} \cdot \widehat\sigma_t, \quad \widehat{\tau} \pm z_{1 - \alpha/2} \cdot \frac{\sqrt{\overline{\widehat\sigma_t^{\,2}}}}{\sqrt{n_{\text{post}}}}.\]

A two-sided z-test of \(H_0: \tau = 0\) reports the p-value.

When the assumptions bind: practical diagnostics#

IFE / smooth-link DGP for the untreated outcome (A1, A6). The untreated potential outcome is generated by a smooth monotone function of a low-rank factor model.

Plausibly violated when the outcome is genuinely discrete (binary, ordinal, low-count) – F is then non-monotonic between intervals, A6 is violated, and matching on observed pre-period outcomes no longer implies matching on the latent predictors (paper’s Remark 8). Diagnostic: histogram the outcome; if it heaps on a small number of values, switch to a discrete-choice or DSC-style framework (Distributional Synthetic Control (DSC) for the distributional question).
Convex-hull / near-twin availability in the sample (A5). NSC dropped the non-negativity restriction precisely because affine weights tolerate the treated unit lying outside the donor convex hull – nearest neighbours can be reached with negative weights on far donors. But identification still requires the population density to put mass near the treated unit’s predictors.

Plausibly violated when the treated unit is qualitatively different from every potential donor – a coastal mega-state against only interior small states, a tech-led economy with only commodity-led donors. Diagnostic: read res.pre_rmse; a large pre-period RMSE despite the eigenvalue-scaled L1/L2 freedom signals no near-twin in the sample. Increasing the donor pool is the only fix – NSC does not invent population mass that is not there.
Donor pool size grows with \(T_0\) (Theorem 2). Bias vanishes asymptotically only if \(N_0\) grows super-polynomially in \(T_0\). In finite samples, the actionable reading is: more donors are unambiguously better in the nonlinear case because they make the nearest-\(M\) matching tighter (eq. 5).

Plausibly violated when \(N_0\) is small relative to \(T_0\) – a country-level study with a dozen donors and decades of pre-period. Diagnostic: re-run NSC after dropping the bottom-half of donors by closeness; if ATT shifts substantially, the small donor pool was binding. Cross-check against the Monte Carlo table below, which shows bias falling almost in half when \(N_0\) doubles from 25 to 50.
Pre-period factor non-degeneracy (A4). The time factors \(\boldsymbol{\lambda}_t\) must move enough across the pre-period for the unobserved loadings to be identified.

Plausibly violated when the pre-period is flat or co-trending across all units (e.g. all donors follow the same secular trend with no idiosyncratic shocks). Diagnostic: an SVD on the donor pre-matrix should show a clear spectrum, not a single dominant component drowning out the rest.
Outcome is monotone in the latent index (A6). The link function \(\mathbf{F}\) is strictly increasing in its argument. Many transformations satisfy this (logistic, square, log) but capped or saturating outcomes do not (e.g. an outcome with a true ceiling that some donors but not the treated unit hit, making the same latent value produce different observed outcomes).

Plausibly violated when the outcome has a hard floor or ceiling that some donors are pressed against. Diagnostic: inspect the empirical distribution of donor outcomes against its theoretical bounds; if a non-trivial fraction sit at the boundary, A6 fails for those donors and they should be dropped or re-coded.
Doudchenko-Imbens variance reflects a normal limit. The closed-form CI assumes the gap is approximately normal with the leave-one-control variance estimate. With very small \(N_0\) or heavy-tailed outcomes the normal approximation breaks.

Plausibly violated when \(N_0 < 10\) or the donor outcomes are heavy-tailed. Diagnostic: bootstrap the post-period gap by resampling donors; if the bootstrap distribution is visibly skewed or shows heavy tails, treat the closed-form CI as a guide and report the bootstrap CI alongside.

Monte Carlo: NSC vs OSC bias under nonlinearity#

The paper’s Section 4 simulation: a rank-2 IFE latent process rescaled to \([0, 1]\) and raised to the power \(r\) (\(r = 1\) linear; \(r = 2\) nonlinear), with a small post-period treatment effect grid. The table below reproduces the qualitative finding – NSC beats OSC (the non-negativity-restricted original) on bias in the nonlinear regime, and matches it in the linear one.

Note

Verification. NSC is validated two ways: a cross-validation against Tian’s own R implementation on Proposition 99 (weights match to correlation 0.989; effect path \(-9.1/-22.6/-27.0\) vs the paper’s \(-9.5/-24.5/-28.7\)) and the Path-B nonlinear Monte Carlo (near-nominal coverage; error shrinking with the donor pool). See the dedicated page NSC — Nonlinear Synthetic Control (Tian 2023); durable checks nsc_prop99 and nsc_mc.

"""Tian (2023) Section 4 DGP: NSC vs OSC bias as nonlinearity rises.

This runs a handful of replications (per (J, T0, r) cell). The full
paper uses 5000 reps; this snippet keeps it tractable while still
reproducing the qualitative ordering."""

import numpy as np
import pandas as pd
from mlsynth import NSC

def gen_panel(J, T0, T_post, r, seed):
    rng = np.random.default_rng(seed)
    T = T0 + T_post
    X = rng.uniform(0.0, 2 * np.sqrt(3.0), size=(J + 1, 2))
    mu = rng.uniform(0.0, 2 * np.sqrt(3.0), size=(J + 1, 4))
    beta_t = rng.normal(10.0, 1.0, size=(T, 2))
    lam_t = rng.normal(10.0, 1.0, size=(T, 4))
    eps = rng.normal(0.0, 1.0, size=(T, J + 1))
    Y_star = (X @ beta_t.T).T + (mu @ lam_t.T).T + eps
    Yn = (Y_star - Y_star.min()) / (Y_star.max() - Y_star.min())
    return Yn ** r           # (T, J+1) untreated outcomes

def att_bias(Y0, T0, J, seed):
    T_post = Y0.shape[0] - T0
    tau = np.linspace(0.02, 0.2, T_post)
    Y = Y0.copy(); Y[T0:, 0] += tau
    rows = [{"unit": j, "time": t, "y": float(Y[t, j]),
             "D": int(j == 0 and t >= T0)}
            for j in range(J + 1) for t in range(Y.shape[0])]
    df = pd.DataFrame(rows)
    res = NSC({"df": df, "outcome": "y", "treat": "D",
               "unitid": "unit", "time": "time",
               "cv_grid_size": 0.1, "run_inference": False,
               "display_graphs": False, "seed": seed}).fit()
    # Per-period |bias| averaged across post-period, ×100 (paper scale).
    return float(np.mean(np.abs(res.gap[T0:] - tau)) * 100.0)

for (J, T0) in [(25, 15), (50, 15), (50, 30)]:
    for r in (1, 2):
        biases = []
        for rep in range(20):
            Y0 = gen_panel(J=J, T0=T0, T_post=10, r=r, seed=10 * rep + 1)
            biases.append(att_bias(Y0, T0=T0, J=J, seed=10 * rep + 2))
        print(f"  J={J:>3d}, T0={T0:>2d}, r={r}: "
              f"|bias|x100 mean = {np.mean(biases):.2f}, "
              f"SD = {np.std(biases, ddof=1):.2f}")

Representative output at one cell (mlsynth, 20 reps; the paper uses 5000):

J=50, T0=15, r=1: |bias|x100 mean = 0.82, SD = 0.28   (paper NSC: 0.77)
J=50, T0=15, r=2: |bias|x100 mean = 0.87, SD = 0.51   (paper NSC: 0.74)

The qualitative paper findings to look for as you fill in the remaining cells:

Bias falls as \(N_0\) grows – direct evidence of the Theorem 2 mechanism: more donors makes the near-twin pool denser and the nearest-\(M\) matching tighter. Paper Table 1: NSC bias drops from 0.99 (J=25) to 0.77 (J=50) in the linear case and from 0.92 to 0.74 in the nonlinear case at T0=15.
NSC matches OSC in the linear case and wins in the nonlinear case. Paper Table 1: at (J=25, T0=30), OSC bias is 1.79 (r=1) and 1.40 (r=2); NSC stays at 0.91 and 0.87. The non-negativity restriction in OSC pays an extrapolation penalty in the nonlinear regime that NSC’s affine + L1-anchored weighting avoids.
Coverage of the closed-form CI sits at 0.93-0.95 across the paper’s \((N_0, T_0, r)\) grid – the Doudchenko-Imbens variance estimator is well-calibrated for this DGP. mlsynth’s CI reproduces that coverage band.

Empirical: Proposition 99 (California, 1989-2000)#

Revisits the Abadie-Diamond-Hainmueller (2010) tobacco-control case study. mlsynth’s NSC, R-faithful to the reference implementation, recovers the paper’s published \((a^*, b^*) = (0.3, 0.7)\) and produces per-year gaps within a pack or two of the paper’s reported numbers.

import numpy as np
import pandas as pd
from mlsynth import NSC

df = pd.read_stata("synth_smoking.dta")            # Abadie-Diamond-Hainmueller (2010)
df = df.sort_values(["state", "year"]).reset_index(drop=True)
df["treatment"] = ((df["state"] == "California") & (df["year"] >= 1989)).astype(int)

res = NSC({
    "df": df, "outcome": "cigsale", "treat": "treatment",
    "unitid": "state", "time": "year",
    "cv_grid_size": 0.1, "run_inference": True,
    "display_graphs": False, "seed": 42,
}).fit()

print(f"selected (a*, b*) = ({res.design.a_star}, {res.design.b_star})")
print(f"pre-RMSE          = {res.pre_rmse:.4f}")
print(f"ATT (1989-2000)   = {res.att:+.4f}  packs/capita")
print(f"ATT 95% CI        = ({res.inference_detail.att_lower:+.2f}, "
      f"{res.inference_detail.att_upper:+.2f})")

gaps = pd.DataFrame({
    "year": res.inputs.time_labels.astype(int),
    "gap": res.gap,
    "ci_low": res.inference_detail.gap_lower,
    "ci_high": res.inference_detail.gap_upper,
})
for y in (1990, 1995, 2000):
    row = gaps.loc[gaps.year == y].iloc[0]
    print(f"  {y}: gap = {row.gap:+6.2f}  CI=({row.ci_low:+.2f}, {row.ci_high:+.2f})")

Output:

selected (a*, b*) = (0.3, 0.7)        # matches the paper exactly
pre-RMSE          = 1.2450
ATT (1989-2000)   = -19.1313  packs/capita
ATT 95% CI        = (-25.51, -12.75)
  1990: gap =  -9.05  CI=(-26.38, +8.27)        # paper: -9.5
  1995: gap = -22.62  CI=(-46.03, +0.78)        # paper: -24.5
  2000: gap = -27.01  CI=(-54.31, +0.29)        # paper: -28.7

Two things are worth flagging about this replication.

The selected \((a^*, b^*) = (0.3, 0.7)\) is grid-stable across seeds 42, 789, 1000, 2024 and one grid step away from the paper for other seeds (e.g. seed 123 lands on \((0.2, 0.8)\)). This jitter is the same RNG noise the reference R script exhibits across set.seed() values – unavoidable when the CV burns J random draws per grid point and Python’s RNG differs from R’s.
Sparsity comparison vs OSC: the original SC concentrates weight on Utah (0.385), Montana (0.271), Nevada (0.186). NSC spreads weight across roughly 20 states – the L1-anchored affine fit is recruiting more near-neighbour Western states with smaller weights (some negative – Alabama -0.015, Tennessee -0.071 in the paper’s Table 2). The pre-period fit is similar (NSC pre-RMSE 1.25 vs OSC ~2.0) but the spread weights reduce the variance of the post-period counterfactual.

The take-away mirrors the paper’s: NSC’s affine + L1-anchored formulation finds a tighter pre-period fit by pulling in close-neighbour Western states (with some negative weights to correct for the few far donors it admits), which then translates into a smoother and slightly larger counterfactual decline. The ATT of ~-19 packs/capita over 1989-2000 is in the same ballpark as the canonical Abadie et al. result while using a strictly larger weight space.

Core API#

Nonlinear Synthetic Control (NSC) estimator.

Tian, W. (2023). “The Synthetic Control Method with Nonlinear Outcomes: Estimating the Impact of the 2019 Anti-Extradition Law Amendments Bill Protests on Hong Kong’s Economy.” arXiv:2306.01967.

NSC generalises the canonical Abadie-Diamond-Hainmueller (2010) synthetic-control method to nonlinear outcomes by

dropping the non-negativity restriction on donor weights (only the adding-up constraint \(\sum_j w_j = 1\) remains);
augmenting the weight-fitting objective with a pairwise-distance- weighted L1 penalty (favours donors close to the treated unit in pretreatment matching variables) and an L2 penalty (spreads the weights);
scaling the tuning parameters by the eigenvalues of \(Z_0 Z_0'\) so the dimensionless tuning parameters \((a^*, b^*) \in [0, 1]\) admit a coarse cross-validation grid.

Inference defaults to the Doudchenko-Imbens (2017) variance estimator: for each post-treatment period the variance of the gap is estimated by the MSE of predicting each donor’s outcome from the others under the same (a^*, b^*) regime.

class mlsynth.estimators.nsc.NSC(config: NSCConfig | dict)#

Bases: object

Nonlinear Synthetic Control (Tian 2023) estimator.

Parameters:: config (NSCConfig or dict) – Configuration object. See mlsynth.config_models.NSCConfig.
Returns:: NSCResults – Frozen container with the optimised weights, the coordinate- descent CV trace (if any), Doudchenko-Imbens inference, the counterfactual series, and the ATT.

fit() → NSCResults#: Run the NSC pipeline end to end.

Configuration#

class mlsynth.config_models.NSCConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', plot: ~mlsynth.config_models.PlotConfig = <factory>, a: ~typing.Annotated[float | None, ~annotated_types.Ge(ge=0.0), ~annotated_types.Le(le=1.0)] = None, b: ~typing.Annotated[float | None, ~annotated_types.Ge(ge=0.0), ~annotated_types.Le(le=1.0)] = None, cv_grid_size: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Le(le=0.5)] = 0.1, cv_target: ~typing.Literal['controls'] = 'controls', cv_max_iterations: ~typing.Annotated[int, ~annotated_types.Ge(ge=1), ~annotated_types.Le(le=20)] = 3, covariates: ~typing.List[str] | None = None, standardize: bool = True, alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05, run_inference: bool = True, seed: int = 123)#

Configuration for the Nonlinear Synthetic Control (NSC) estimator.

Implements Tian (2023), “The Synthetic Control Method with Nonlinear Outcomes” (arXiv:2306.01967). The estimator generalises Abadie-Diamond-Hainmueller (2010) synthetic control to nonlinear outcome functions by

dropping the non-negativity restriction on donor weights,
adding a pairwise-distance-weighted L1 penalty plus an L2 penalty in the weight-fitting objective, and
scaling the tuning parameters by the eigenvalues of \(Z_0 Z_0'\) so they can be cross-validated on [0, 1].

Parameters:

a (float or None) – Dimensionless L1-discrepancy tuning parameter on [0, 1]. Higher values concentrate weight on units close to the treated one in pretreatment matching variables (paper eq. (7)). None triggers coordinate-descent CV.
b (float or None) – Dimensionless L2 tuning parameter on [0, 1]. Higher values spread weights more evenly across donors. None triggers coordinate-descent CV.
cv_grid_size (float) – Step of the CV grid on [0, 1]. Paper default is 0.1.
cv_target ({“controls”, “treated”}) – CV target. "controls" (paper default) leaves each donor out in turn and predicts it from the others; "treated" scores on the treated unit’s pretreatment fit.
cv_max_iterations (int) – Hard cap on coordinate-descent iterations for the CV sweep.
covariates (list of str, optional) – Optional covariate columns to use as additional matching variables alongside the pretreatment outcomes; collapsed to per-unit pretreatment means before being stacked into Z_0.
alpha (float) – Two-sided significance level for the Doudchenko-Imbens confidence intervals.
run_inference (bool) – Whether to compute the Doudchenko-Imbens variance estimator and the per-period / ATT CIs.
display_graphs (bool) – Whether to render the diagnostic NSC plot.

a: float | None#

alpha: float#

b: float | None#

classmethod check_nsc_params(values: Any) → Any#

covariates: List[str] | None#

cv_grid_size: float#

cv_max_iterations: int#

cv_target: Literal['controls']#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

run_inference: bool#

seed: int#

standardize: bool#

Helper Modules#

Data preparation helpers for the NSC estimator.

Builds the NSCInputs container by pivoting a long balanced panel and assembling the pre-period matching matrix Z_0. By default the matching variables are the pre-period outcomes (the paper’s empirical convention); additional covariates can be stacked via the covariates argument.

mlsynth.utils.nsc_helpers.setup.prepare_nsc_inputs(df: DataFrame, outcome: str, treat: str, unitid: str, time: str, covariates: List[str] | None = None, standardize: bool = True) → NSCInputs#

Pivot the panel and assemble the NSC matching matrix.

Parameters:

df (pd.DataFrame) – Long balanced panel with one row per (unit, time).
outcome, treat, unitid, time (str) – Column names.
covariates (list of str, optional) – Optional column names to use as additional matching variables (each collapsed to its per-unit pre-treatment mean before being stacked alongside the pre-period outcomes in Z_0).
standardize (bool, default True) – Centre each matching-variable column to mean 0 and scale to sample standard deviation 1 across the full (treated + donors) stack – matches R’s scale(Z, center=T, scale=apply(Z,2,sd)) in the reference NSC implementation.

Returns:

NSCInputs – Preprocessed panel.

Raises:

MlsynthDataError – If the panel is unbalanced or has missing entries.
MlsynthConfigError – If covariates references unknown columns.

NSC inner optimisation: eigenvalue-scaled (a, b) + simplex-affine QP.

The NSC weight problem (Tian 2023, eq. 7) is

\[\min_w \; \| Z_1 - Z_0' w \|^2 + a \sum_j |w_j| \, \| Z_1 - Z_0[j] \| + b \sum_j w_j^{\,2} \quad \text{s.t.} \quad \sum_j w_j = 1.\]

The raw penalty multipliers a and b are scaled by the eigenvalues of \(Z_0 Z_0'\) so the dimensionless tuning parameters a_star, b_star \in [0, 1] (cf. paper paragraph discussing eq. 7):

b scales as \(b^* \cdot \lambda_{\lceil n b^* \rceil}\), where \(\lambda_1 \le \dots \le \lambda_n\) are the non-zero eigenvalues of \(Z_0 Z_0'\). At b_star = 1 the L2 dominates and the weights are roughly uniform.
a scales analogously by the eigenvalues of \(Z_0 Z_0' + \mathrm{diag}(b)\). At a_star = 1 only the nearest neighbour receives weight.

The QP is solved directly through Clarabel (one cached constraint-skeleton per donor count) to skip the CVXPY canonicalisation overhead — matches the pattern used in SparseSC, SIV, and SYNDES.

mlsynth.utils.nsc_helpers.optimization.design_eigenvalues(Z0: ndarray) → ndarray#

Return ascending non-zero eigenvalues of Z_0 Z_0'.

Parameters:: Z0 (np.ndarray) – (J, p) matching matrix of donor units.

mlsynth.utils.nsc_helpers.optimization.fit_nsc(Z1: ndarray, Z0: ndarray, a_star: float, b_star: float, eigvals: ndarray | None = None) → Tuple[ndarray, float, float]#

Convenience wrapper: scale (a_star, b_star) and solve the QP.

Returns:

w (np.ndarray) – Donor weights.
a_scaled (float) – Raw L1 multiplier used.
b_scaled (float) – Raw L2 multiplier used.

mlsynth.utils.nsc_helpers.optimization.scale_a(a_star: float, Z0: ndarray, b_raw: float) → float#

Eigenvalue-scaled L1-discrepancy multiplier.

Uses the eigenvalues of Z_0 Z_0' + b * I (where b is the already-scaled raw L2 multiplier) so the L1 anchor maxes out at the nearest-neighbour solution when a_star = 1 regardless of the L2 penalty.

mlsynth.utils.nsc_helpers.optimization.scale_b(b_star: float, eigvals: ndarray) → float#: Eigenvalue-scaled L2 multiplier.

mlsynth.utils.nsc_helpers.optimization.solve_nsc_weights(Z1: ndarray, Z0: ndarray, a: float, b: float) → ndarray#

Solve the NSC weight QP for one (a, b) pair.

The L1 penalty \(a \sum_j |w_j| \, d_j\) is rewritten using auxiliary non-negative variables \(u_j \ge |w_j|\); the full QP becomes

\[\min_{w, u} \quad \| Z_1 - Z_0' w \|^2 + a \, d^\top u + b \, w^\top w \quad \text{s.t.} \quad \sum_j w_j = 1, \; u_j \ge w_j, \; u_j \ge -w_j.\]

Parameters:

Z1 (np.ndarray) – Treated matching vector of length p.
Z0 (np.ndarray) – Donor matching matrix of shape (J, p).
a (float) – Raw L1-discrepancy multiplier (post eigenvalue scaling).
b (float) – Raw L2 multiplier (post eigenvalue scaling).

Returns:

np.ndarray – Donor weight vector of length J, summing to 1.

Coordinate-descent cross-validation for NSC tuning parameters.

Implements Tian (2023, page 13) selection of \((a^*, b^*)\) exactly as in the reference R implementation (NSC.R, function fn_cv):

Initialise b_star = 0.
Sweep a_star on the grid [0, 1] with step grid_size; pick the value minimising the CV-MSPE.
Holding a_star fixed, sweep b_star on the same grid.
Iterate steps 2-3 until (a_star, b_star) stops moving (or a max-iteration budget is reached).

CV objective (R-faithful, “controls” target)#

For each donor j:

Treat donor j as a pseudo-treated unit. The fitting target is its pre-period matching vector.
The donor pool for the fit is the other J - 1 donors PLUS one randomly drawn extra donor (with replacement-style indexing), so the per-fold pool size stays at J – matching the dimensionality of the real treated-unit fit and keeping the eigenvalue-based scaling of (a, b) consistent across folds (NSC.R lines 110-115).
Fit NSC weights at the candidate (a, b) on PRE-period matching data, then evaluate the post-period MSPE: actual donor j outcomes vs. those weights applied to the other donors’ post-period outcomes.

This is a true held-out test error – the post-period outcomes never enter the fit. The legacy target="treated" option (which scored on the same data it fit to) has been removed; passing it now raises.

mlsynth.utils.nsc_helpers.crossval.cv_select(Z1: ndarray, Z0: ndarray, Y_donors: ndarray, T0: int, grid_size: float = 0.1, max_iterations: int = 3, target: str = 'controls', seed: int | Generator | None = 123) → Tuple[float, float, NSCCVTrace]#

Coordinate-descent selection of (a_star, b_star) on [0, 1].

Parameters:

Z1 (np.ndarray) – Treated matching vector, length p. Currently unused by the R-faithful “controls” target (kept in the signature for symmetry with fit_nsc() and for future use).
Z0 (np.ndarray) – Standardized donor matching matrix, shape (J, p).
Y_donors (np.ndarray) – Full-period donor outcome matrix, shape (T, J).
T0 (int) – Number of pre-treatment periods.
grid_size (float, default 0.1) – Step of the search grid; the candidate set is [0, grid_size, 2*grid_size, ..., 1].
max_iterations (int, default 3) – Hard cap on coordinate-descent iterations.
target ({“controls”}) – Only the R-faithful “controls” target is supported. The legacy “treated” target (training-error MSPE on the treated unit’s pretreatment fit) has been removed.
seed (int, Generator, or None, default 123) – Seed (or RNG) for the extra-donor draws. Defaults to 123 to match the reference R script’s set.seed(123) convention.

Returns:

a_star (float) – Optimal dimensionless L1 multiplier on [0, 1].
b_star (float) – Optimal dimensionless L2 multiplier on [0, 1].
trace (NSCCVTrace) – Coordinate-descent diagnostics.

Doudchenko-Imbens (2017) inference for NSC, R-faithful flavour.

Mirrors the placebo-style variance estimator used in the reference R implementation (NSC.R, lines 200-220): for each donor j, hold it out and refit the NSC weights using the other J - 1 donors PLUS one randomly drawn extra donor (keeping the pool size at J). Record the full-period residual Y_{j,t} - sum_k w_k Y_{k,t} for that fold. The per-period standard error is

se_t = sqrt(sum_j perr_{j,t}^2 / (J - 1))

and the gap CI at significance alpha is the normal interval tau_hat_t ± z_{1-alpha/2} * se_t. The ATT CI is built from the average of the per-period post-period SEs in the same way as the R script’s reporting (an arithmetic mean of the ITE over the post-period, with SE = average post-period SE / sqrt(T1)).

mlsynth.utils.nsc_helpers.inference.doudchenko_imbens_inference(treated_outcome: ndarray, donor_outcomes: ndarray, counterfactual: ndarray, Z0: ndarray, T0: int, a_star: float, b_star: float, alpha: float = 0.05, seed: int | Generator | None = 123) → NSCInference#

Compute per-period and ATT inference from the leave-one-control fits.

Parameters:

treated_outcome (np.ndarray) – Treated outcome series, shape (T,).
donor_outcomes (np.ndarray) – Donor outcome matrix, shape (T, J).
counterfactual (np.ndarray) – SC imputation of the treated outcome, shape (T,).
Z0 (np.ndarray) – Standardized donor matching matrix, shape (J, p).
T0 (int) – Number of pre-treatment periods.
a_star, b_star (float) – Dimensionless tuning parameters that produced counterfactual. Used unchanged for the leave-one-control re-fits.
alpha (float, default 0.05) – Two-sided significance level.
seed (int, Generator, or None, default 123) – Seed (or RNG) for the extra-donor draws. Matches the reference R script’s set.seed(123).

Diagnostic plot for NSC results.

mlsynth.utils.nsc_helpers.plotter.plot_nsc(results: NSCResults) → None#

Two-panel plot: trajectories + gap with CI.

The left panel overlays the treated and synthetic-control series over the full panel; the right panel shows the per-period gap plus the Doudchenko-Imbens confidence band when available.

Note

NSC.fit() returns an EffectResult on the standardized two-family contract: res.att / res.att_ci / res.counterfactual / res.gap / res.donor_weights / res.pre_rmse resolve through the standardized sub-models. The rich Doudchenko-Imbens per-period bands are on res.inference_detail (the bare res.inference slot is reserved for the standardized ATT-level InferenceResults).

Frozen dataclasses for the Nonlinear Synthetic Control (NSC) estimator.

NSC implements Tian (2023), “The Synthetic Control Method with Nonlinear Outcomes” (arXiv:2306.01967). The estimator generalises the Abadie-Diamond-Hainmueller (2010) synthetic-control procedure to settings where the outcome is a nonlinear function of the underlying predictors. The key modifications versus the canonical Abadie SC are:

the non-negativity constraint on donor weights is relaxed (only the adding-up constraint \(\sum_j w_j = 1\) remains);
the objective adds an L1 penalty weighted by pairwise pretreatment matching discrepancies (favours donors close to the treated unit) plus an L2 penalty (spreads the weights);
the tuning parameters \(a\), \(b\) are scaled by the eigenvalues of \(Z_0 Z_0'\), so the dimensionless tuning parameters \(a^*, b^* \in [0, 1]\) can be cross-validated on a coarse grid.

The five layers below (inputs, design, fold-level CV summary, the full grid trace, inference) keep the pipeline pluggable and inspectable.

class mlsynth.utils.nsc_helpers.structures.NSCCVTrace(a_grid: ndarray, b_grid: ndarray, a_mspe_curve: ndarray, b_mspe_curve: ndarray, iterations: int, converged: bool, target: str)#

Coordinate-descent trace of the (a_star, b_star) sweep.

Parameters:

a_grid (np.ndarray) – Candidate values of a_star on the grid.
b_grid (np.ndarray) – Candidate values of b_star on the grid.
a_mspe_curve (np.ndarray) – MSPE at each a_star (with b_star held at its current value) on the final coordinate-descent iteration.
b_mspe_curve (np.ndarray) – MSPE at each b_star (with a_star held at the coordinate-descent estimate) on the final iteration.
iterations (int) – Number of coordinate-descent iterations actually run.
converged (bool) – True iff (a_star, b_star) did not move on the last iteration.
target (str) – Which CV target was used: "controls" (predict each donor’s pre-period from the others – paper default) or "treated" (predict the treated unit’s pre-period outcomes from the donors).

a_grid: ndarray#

a_mspe_curve: ndarray#

b_grid: ndarray#

b_mspe_curve: ndarray#

converged: bool#

iterations: int#

target: str#

class mlsynth.utils.nsc_helpers.structures.NSCDesign(w: ndarray, donor_weights: Dict[Any, float], a_star: float, b_star: float, a_scaled: float, b_scaled: float, eigvals: ndarray)#

Optimised NSC weights and the tuning state that produced them.

Parameters:

w (np.ndarray) – Donor weight vector (J,). Sums to 1; may contain negative entries because NSC drops the non-negativity restriction.
donor_weights (dict) – {donor_name: weight} – convenience view of w.
a_star (float) – Dimensionless L1-discrepancy tuning parameter on [0, 1].
b_star (float) – Dimensionless L2 tuning parameter on [0, 1].
a_scaled (float) – Raw L1 penalty multiplier the optimiser actually used (after eigenvalue scaling).
b_scaled (float) – Raw L2 penalty multiplier (after eigenvalue scaling).
eigvals (np.ndarray) – Sorted (ascending) non-zero eigenvalues of Z_0 Z_0' used for the (a, b) scaling. Length \(n = \min(J, p)\).

a_scaled: float#

a_star: float#

b_scaled: float#

b_star: float#

donor_weights: Dict[Any, float]#

eigvals: ndarray#

w: ndarray#

class mlsynth.utils.nsc_helpers.structures.NSCInference(method: str, alpha: float, period_variance: ~numpy.ndarray = <factory>, period_se: ~numpy.ndarray = <factory>, gap: ~numpy.ndarray = <factory>, gap_lower: ~numpy.ndarray = <factory>, gap_upper: ~numpy.ndarray = <factory>, att: float = nan, att_se: float = nan, att_lower: float = nan, att_upper: float = nan, p_value: float = nan)#

Doudchenko-Imbens (2017) inference for NSC.

The variance of the per-period estimator is approximated by the MSE of predicting each control unit’s outcome at that period using the other controls (the “leave-one-control-out” estimator). A normal CI is then formed period-by-period.

Parameters:

method (str) – "doudchenko_imbens" or "none".
alpha (float) – Two-sided significance level used to build the CIs.
period_variance (np.ndarray) – Per-period variance estimate, shape (T,).
period_se (np.ndarray) – Per-period standard error \(\sqrt{\text{period\_variance}}\).
gap (np.ndarray) – Treated minus counterfactual at every period, shape (T,).
gap_lower (np.ndarray) – Lower CI bound for the per-period gap, shape (T,).
gap_upper (np.ndarray) – Upper CI bound for the per-period gap, shape (T,).
att (float) – Average post-treatment gap.
att_se (float) – Standard error of att (post-period variance averaged and divided by n_post).
att_lower, att_upper (float) – (1 - alpha) CI for the ATT.
p_value (float) – Two-sided z-test of H_0: ATT = 0.

alpha: float#

att: float = nan#

att_lower: float = nan#

att_se: float = nan#

att_upper: float = nan#

gap: ndarray#

gap_lower: ndarray#

gap_upper: ndarray#

method: str#

p_value: float = nan#

period_se: ndarray#

period_variance: ndarray#

class mlsynth.utils.nsc_helpers.structures.NSCInputs(treated_outcome: ndarray, donor_outcomes: ndarray, matching_matrix: ndarray, treated_matching_vector: ndarray, donor_names: ndarray, treated_unit_name: Any, T: int, T0: int, time_labels: ndarray)#

Preprocessed panel data for NSC estimation.

Parameters:

treated_outcome (np.ndarray) – Outcome series for the treated unit over the full panel, shape (T,).
donor_outcomes (np.ndarray) – Outcome matrix for the J donor units over the full panel, shape (T, J).
matching_matrix (np.ndarray) – Pre-period matching matrix Z_0 of donor units used by the eigenvalue scaling, shape (J, p) where p = T0 (paper default: pretreatment outcomes; additional covariates may be stacked downstream).
treated_matching_vector (np.ndarray) – Pre-period matching vector Z_1 for the treated unit, shape (p,).
donor_names (np.ndarray) – Names / labels of the donor units, length J.
treated_unit_name (Any) – Label of the treated unit.
T (int) – Total number of panel periods.
T0 (int) – Number of pre-treatment periods.
time_labels (np.ndarray) – Labels of the time periods, length T.

property J: int#: Number of donor units.

T: int#

T0: int#

donor_names: ndarray#

donor_outcomes: ndarray#

matching_matrix: ndarray#

property n_post: int#: Number of post-treatment periods.

time_labels: ndarray#

treated_matching_vector: ndarray#

treated_outcome: ndarray#

treated_unit_name: Any#

class mlsynth.utils.nsc_helpers.structures.NSCResults(*, effects: ~mlsynth.config_models.EffectsResults | None = None, fit_diagnostics: ~mlsynth.config_models.FitDiagnosticsResults | None = None, time_series: ~mlsynth.config_models.TimeSeriesResults | None = None, weights: ~mlsynth.config_models.WeightsResults | None = None, inference: ~mlsynth.config_models.InferenceResults | None = None, method_details: ~mlsynth.config_models.MethodDetailsResults | None = None, sub_method_results: ~typing.Dict[str, ~typing.Any] | None = None, additional_outputs: ~typing.Dict[str, ~typing.Any] | None = None, raw_results: ~typing.Dict[str, ~typing.Any] | None = None, execution_summary: ~typing.Dict[str, ~typing.Any] | None = None, plot_config: ~mlsynth.config_models.PlotConfig | None = None, inputs: ~mlsynth.utils.nsc_helpers.structures.NSCInputs, design: ~mlsynth.utils.nsc_helpers.structures.NSCDesign, cv_trace: ~mlsynth.utils.nsc_helpers.structures.NSCCVTrace | None, inference_detail: ~mlsynth.utils.nsc_helpers.structures.NSCInference, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Top-level container returned by mlsynth.NSC.fit().

An EffectResult (the observational report): it populates the standardized sub-models so the flat accessors (att / att_ci / counterfactual / gap / donor_weights / pre_rmse) resolve through the base contract. The rich Doudchenko-Imbens per-period bands live on inference_detail (the standardized inference slot mirrors the ATT-level CI), and the NSC tuning detail stays on design / cv_trace.

Parameters:

inputs (NSCInputs) – Preprocessed panel.
design (NSCDesign) – Optimised weights plus the tuning state.
cv_trace (NSCCVTrace or None) – Coordinate-descent diagnostics; None when the user supplied a and b explicitly.
inference_detail (NSCInference) – Per-period and ATT inference (was inference before the contract migration; the standardized inference slot now holds the ATT-level InferenceResults).
metadata (dict) – Free-form pipeline diagnostics.

cv_trace: NSCCVTrace | None#

design: NSCDesign#

property donor_weights: Dict[Any, float]#: Alias for design.donor_weights (original-keyed).

inference_detail: NSCInference#

inputs: NSCInputs#

metadata: Dict[str, Any]#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function BaseEstimatorResults.Config.<lambda>>}}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Example#

A self-contained one-draw Monte Carlo at the paper’s nonlinear DGP (Tian 2023, Section 4): latent linear factor model, rescale to \([0, 1]\), square (so \(r = 2\)), apply a small treatment effect to the treated unit’s post-period.

"""One draw of the Tian (2023) nonlinear-outcome simulation."""

import numpy as np
import pandas as pd

from mlsynth import NSC


# ---------------------------------------------------------------------
# 1. Simulate one nonlinear panel
# ---------------------------------------------------------------------

rng = np.random.default_rng(0)

J = 12                    # donors
T_pre = 12                # pre-treatment periods
T_post = 6                # post-treatment periods
T = T_pre + T_post
tau_true = 0.10           # treatment effect (on the unit interval after rescaling)

# Latent linear factor model: Y* = X' beta_t + mu' lambda_t + epsilon
X = rng.uniform(0.0, np.sqrt(12.0), size=(J + 1, 2))
mu = rng.uniform(0.0, np.sqrt(12.0), size=(J + 1, 4))
beta_t = rng.normal(10.0, 1.0, size=(T, 2))
lam_t = rng.normal(10.0, 1.0, size=(T, 4))
eps = rng.normal(0.0, 1.0, size=(T, J + 1))
Y_star = (X @ beta_t.T).T + (mu @ lam_t.T).T + eps

# Rescale to [0, 1] then apply nonlinear transformation (r = 2 in the paper).
Yn = (Y_star - Y_star.min()) / (Y_star.max() - Y_star.min())
Y_control = Yn ** 2

# Apply the additive treatment effect to unit 0 in the post-period.
Y = Y_control.copy()
Y[T_pre:, 0] += tau_true

rows = [
    {
        "unit": j,
        "time": t,
        "y": float(Y[t, j]),
        "D": int(j == 0 and t >= T_pre),
    }
    for j in range(J + 1)
    for t in range(T)
]
df = pd.DataFrame(rows)


# ---------------------------------------------------------------------
# 2. Fit NSC with CV-selected (a*, b*)
# ---------------------------------------------------------------------

results = NSC({
    "df": df,
    "outcome": "y",
    "treat": "D",
    "unitid": "unit",
    "time": "time",
    "cv_target": "controls",     # paper default
    "cv_grid_size": 0.1,
    "cv_max_iterations": 3,
    "alpha": 0.05,
    "run_inference": True,
    "display_graphs": False,
}).fit()


# ---------------------------------------------------------------------
# 3. Inspect the output
# ---------------------------------------------------------------------

print(f"truth   tau = {tau_true:+.3f}")
print(f"ATT_hat     = {results.att:+.3f}")
print(f"a* = {results.design.a_star:.2f}, b* = {results.design.b_star:.2f}")
print(f"pre-RMSE    = {results.pre_rmse:.4f}")
print(f"95% CI ATT  = [{results.inference_detail.att_lower:+.3f}, "
      f"{results.inference_detail.att_upper:+.3f}]")
print(f"p-value     = {results.inference_detail.p_value:.3f}")

# Per-period diagnostics:
import pandas as pd
print(pd.DataFrame({
    "t": np.arange(T),
    "gap": results.gap,
    "ci_low": results.inference_detail.gap_lower,
    "ci_high": results.inference_detail.gap_upper,
}).round(3).to_string(index=False))

References#

Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490):493-505.

Doudchenko, N., & Imbens, G. W. (2017). “Balancing, Regression, Difference-In-Differences and Synthetic Control Methods: A Synthesis.” NBER Working Paper 22791.

Tian, W. (2023). “The Synthetic Control Method with Nonlinear Outcomes: Estimating the Impact of the 2019 Anti-Extradition Law Amendments Bill Protests on Hong Kong’s Economy.” arXiv:2306.01967.

Nonlinear Synthetic Control (NSC)

Contents

Nonlinear Synthetic Control (NSC)#

Overview#

When to use NSC – and when not to#

Notation#

Assumptions (Tian 2023)#

Mathematical Formulation#

Setup#

NSC weight problem#

Eigenvalue scaling#

Cross-validation#

Inference#

When the assumptions bind: practical diagnostics#

Monte Carlo: NSC vs OSC bias under nonlinearity#

Empirical: Proposition 99 (California, 1989-2000)#

Core API#

Configuration#

Helper Modules#

CV objective (R-faithful, “controls” target)#

Example#

References#