Synthetic Control with Multiple Outcomes (SCMO)

Contents

Synthetic Control with Multiple Outcomes (SCMO)#

When to Use This Estimator#

The synthetic control (SC) method of Abadie and co-authors [ABADIE2010] builds a counterfactual for one treated unit as a convex combination of donors that reproduces the treated unit’s single outcome over the pre-treatment period. That single-outcome design faces a bias dilemma in the panels most applied work actually has:

  • Short pre-period. With few pre-treatment periods, a flexible donor pool can fit the pre-period too well, latching onto idiosyncratic noise rather than the latent factors – an overfit that predicts poorly out-of-sample.

  • Long pre-period. Matching over many periods mitigates overfitting but is fragile to structural breaks in the outcome-predictor relationship, which reintroduces bias.

SCMO targets exactly this regime by supplementing the time dimension with an outcome dimension. When you observe several related outcomes that share the same latent drivers – a “domain” such as {GDP, industrial production, CPI, trade} for an economy, or {math score, reading score, attendance} for a school – a single set of donor weights can be matched on all of them at once. Each extra outcome is an additional, partially independent view of the same donor loadings, so the weights are pinned down with far fewer pre-treatment periods. Tian, Lee and Panchenko [TianLeePanchenko] show the bias then shrinks at rate \(O(1/\sqrt{K T_0})\) in the number of outcomes \(K\) and periods \(T_0\), versus \(O(1/\sqrt{T_0})\) for single-outcome SC – a smaller order. The headline demonstration is striking: a synthetic West Germany matched on nine economic indicators in the single year 1989 tracks 30 years of its GDP almost as well as one matched on the entire 1960-1989 GDP trajectory.

Use SCMO when you have one treated unit, a moderate-to-short pre-period, and a set of related outcomes driven by common factors. It is the right tool when single-outcome SC overfits (too few periods) or when you simply want a single, interpretable comparison group that is credible across several outcomes at once.

Matching on Outcomes, Not Just Time#

Most empirical work with several outcomes runs a separate SC for each one, getting a different donor mix every time – hard to interpret and statistically wasteful. SCMO instead estimates one common weight vector by balancing the treated unit against the donors across the whole outcome domain. There are two ways to combine the outcomes, due to the two papers SCMO implements:

Scheme

What it balances

Paper

concatenated

The stacked standardized pre-treatment series of all \(K\) outcomes (a single SC fit on a \(K T_0\)-long matching vector).

Tian-Lee-Panchenko [TianLeePanchenko]

averaged

The average of the standardized outcomes within each period (a single SC fit on a \(T_0\)-long matching vector).

Sun-Ben-Michael-Feller [SunBenMichaelFeller]

separate

The primary outcome’s pre-treatment trajectory alone – the conventional single-outcome SC baseline.

Abadie et al. [ABADIE2010]

MA

A convex model-average of concatenated and averaged, chosen by pre-treatment fit.

(this implementation)

Both papers agree on the surrounding recipe: standardize each outcome (per period) before matching, since outcomes have different scales; optionally de-mean (intercept-shift) to allow stable level differences between treated and donors; and constrain the weights to the simplex (non-negative, summing to one).

Notation#

We observe \(K\) related outcomes in a domain \(\mathbb{K} = \{1, \ldots, K\}\) for \(N + 1\) units over \(T\) periods. Unit \(j = 0\) is the sole treated unit, and \(\mathcal{N} = \{1, \ldots, N\}\) indexes the donors. Treatment begins at \(T_0 + 1\), giving a pre-period \(\mathcal{T}_1 = \{1, \ldots, T_0\}\) and a post-period \(\mathcal{T}_2 = \{T_0 + 1, \ldots, T\}\) of length \(T_2 = T - T_0\). Write \(y_{jtk}\) for unit \(j\)’s outcome \(k\) at time \(t\), and \(\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_N)\) for the common donor weights (a single vector shared across all \(K\) outcomes). The estimand is the per-outcome ATT,

\[\tau_k = \frac{1}{T_2} \sum_{t \in \mathcal{T}_2} \bigl(y^1_{0tk} - y^0_{0tk}\bigr), \qquad k \in \mathbb{K},\]

with the primary outcome’s \(\tau\) the headline estimate.

Notation bridge

Both papers write the treated unit \(i = 1\) and donors \(i = 2, \ldots, N+1\); we use \(j = 0\) for the treated unit and \(\mathcal{N}\) for donors. Their common weights are \(\hat{w}_j\) / \(\gamma_i\); we use \(\boldsymbol{\gamma}\). mlsynth builds matching variables through a spec (which outcomes, which period(s), and per-variable transforms level/log/ per_capita/raw) rather than a fixed list, so the same engine covers “match \(K\) outcomes over \(T_0\) periods” and “match a cross-section of indicators in one year”.

Mathematical Formulation#

The factor model#

Both papers assume the untreated potential outcome follows an interactive fixed-effects (factor) model with loadings that are common across the outcomes in a domain:

\[y^0_{jtk} = \delta_{tk} + \boldsymbol{\mu}_j^\top \boldsymbol{\lambda}_{tk} + \varepsilon_{jtk},\]

where \(\boldsymbol{\lambda}_{tk}\) are time- and outcome-specific factors, \(\boldsymbol{\mu}_j\) are unit loadings shared across outcomes \(k\) (the key assumption), and \(\varepsilon_{jtk}\) are transitory shocks. Because the loadings are shared, each outcome is a separate window onto the same \(\boldsymbol{\mu}_j\), so \(K\) outcomes over \(T_0\) periods give \(K T_0\) matching equations to pin them down.

The matching matrix and the two weighting schemes#

Let \(\tilde{y}_{jtk}\) denote outcome \(k\), standardized in each period by its cross-unit standard deviation (and optionally de-meaned by the unit’s pre-period mean). The donor weights solve a simplex-constrained least-squares problem; the two schemes differ only in what they balance:

\[\text{concatenated:}\quad \hat{\boldsymbol{\gamma}}^{\text{cat}} = \operatorname*{argmin}_{\boldsymbol{\gamma} \in \Delta^{N-1}} \sum_{k=1}^{K} \sum_{t \in \mathcal{T}_1} \Bigl( \tilde{y}_{0tk} - \sum_{j \in \mathcal{N}} \gamma_j \tilde{y}_{jtk} \Bigr)^2,\]
\[\text{averaged:}\quad \hat{\boldsymbol{\gamma}}^{\text{avg}} = \operatorname*{argmin}_{\boldsymbol{\gamma} \in \Delta^{N-1}} \sum_{t \in \mathcal{T}_1} \Bigl( \bar{\tilde{y}}_{0t} - \sum_{j \in \mathcal{N}} \gamma_j \bar{\tilde{y}}_{jt} \Bigr)^2, \quad \bar{\tilde{y}}_{jt} = \tfrac{1}{K} \sum_{k} \tilde{y}_{jtk},\]

where \(\Delta^{N-1}\) is the simplex. The counterfactual for the primary outcome and its ATT are then

\[\hat{y}^0_{0tk} = \sum_{j \in \mathcal{N}} \hat{\gamma}_j\, y_{jtk}, \qquad \hat{\tau}_k = \frac{1}{T_2} \sum_{t \in \mathcal{T}_2} \bigl( y_{0tk} - \hat{y}^0_{0tk} \bigr),\]

with a de-meaned (intercept-shifted) variant \(\hat{y}^0_{0tk} = \bar{y}_{0\cdot k} + \sum_j \hat{\gamma}_j (y_{jtk} - \bar{y}_{j \cdot k})\) when demean=True ([DoudchenkoImbens2017]; Sun-Ben-Michael-Feller Eq. 1).

Why more outcomes help#

Tian-Lee-Panchenko’s Proposition 1 gives the bias rates: for the single-outcome SC, \(|\mathbb{E}[\hat{\tau}^{\text{sep}}] - \tau| = O(1/\sqrt{T_0})\), while for the multiple-outcome SC, \(|\mathbb{E}[\hat{\tau}^{\text{cat}}] - \tau| = O(1/\sqrt{K T_0})\). More related outcomes shrink the bias at a faster order. Sun-Ben-Michael- Feller sharpen this for the averaged scheme: averaging reduces the bias due to overfitting by \(1/\sqrt{K}\) (like concatenation) and the bias due to poor pre-treatment fit by a further \(1/\sqrt{K}\) – but, as both papers note, equal-weight averaging can wash out signal that is specific to individual outcomes. Which scheme wins is therefore regime-dependent (see When to Use Concatenated vs Averaged).

Assumptions#

Assumption 1 (shared-loading factor model). The untreated outcomes obey \(y^0_{jtk} = \delta_{tk} + \boldsymbol{\mu}_j^\top \boldsymbol{\lambda}_{tk} + \varepsilon_{jtk}\) with loadings \(\boldsymbol{\mu}_j\) common across outcomes in the domain.

Remark. This is the substantive assumption – the outcomes must be driven by the same unit-level latent traits (a “domain”), exactly the premise of standard factor analysis. If outcomes loaded on different unobservables, the benefit of borrowing across them would vanish and SCMO would reduce to the single-outcome order of bias. Sun-Ben-Michael-Feller phrase the equivalent condition as low rank of the stacked model-component matrix \(L\).

Assumption 2 (transitory shocks). The \(\varepsilon_{jtk}\) are mean-zero given the factors and loadings, independent across units and outcomes, with bounded moments.

Remark. Independence is across outcomes; the shared interactive fixed effects still induce correlation along the factor dimensions. Different scales/volatilities across outcomes are handled by standardizing each outcome per period before matching.

Assumption 3 (feasibility / convex hull). There exist weights \(\hat{w}_j \ge 0\) summing to one such that the treated unit’s matching variables lie (approximately) in the convex hull of the donors’: \(\sum_j \hat{w}_j \boldsymbol{\mu}_j = \boldsymbol{\mu}_0\) and \(\sum_j \hat{w}_j y_{jtk} = y_{0tk}\) for \(t \le T_0\).

Remark. This is the multiple-outcome analogue of Abadie et al.’s perfect-fit condition. When the treated unit sits outside the donor hull (extreme levels), de-meaning relaxes it to a parallel-trends-style condition by allowing a constant level gap per outcome – which also corrects size distortion of the permutation/conformal test.

Assumption 4 (consistency for inference). The estimated weights yield a consistent estimate of the de-meaned counterfactual as \(T_0, N \to \infty\).

Remark. This is what the Chernozhukov-Wuethrich-Zhu conformal test (below) needs for asymptotically valid size; Sun-Ben-Michael-Feller verify it for the averaged weights in their Online Appendix A.

Inference#

mlsynth uses a single inference procedure for every weighting scheme: the conformal test of Chernozhukov, Wuethrich and Zhu [CWZ2021], in the multiple-outcome form of Sun-Ben-Michael-Feller (Online Appendix A). Under the sharp null \(H_0: \tau = \tau_0\), form the adjusted residuals \(\hat{u}_{tk}\) (the gap, with the post-period shifted by \(\tau_0\)) and the per-period statistic

\[S_q(\hat{u}_t) = \Bigl( \tfrac{1}{\sqrt{K}} \sum_{k=1}^{K} |\hat{u}_{tk}|^q \Bigr)^{1/q},\]

with \(q = 1\) (the average effect) by default; larger \(q\) targets effects concentrated on a few outcomes. The conformal p-value ranks the post-treatment statistic against the pre-treatment (moving-block) distribution,

\[\hat{p}(\boldsymbol{\tau}_0) = \frac{1}{T} \sum_{t \in \mathcal{T}_1} \mathbf{1}\{S_q(\hat{u}_{T}) \le S_q(\hat{u}_t)\} + \frac{1}{T},\]

and inverting the test over a grid of \(\tau_0\) yields a confidence interval for the ATT. Because the matching matrix is built from pre-period information, the SC weights do not depend on the post-period outcome, so the test inversion is essentially free (no refitting). With a single predicted outcome the statistic reduces to \(|\text{gap}_t|\).

Why this replaces placebo/permutation. The conformal test is exact-in-finite- sample under exchangeability and applies identically to the concatenated and averaged fits, so SCMO does not need the Abadie permutation test or a separate agnostic-conformal path – one method serves both schemes.

When to Use Concatenated vs Averaged#

  • Concatenated (Tian-Lee-Panchenko) keeps every outcome’s information separate, so it shines when the outcomes are distinct views of the shared loadings (different factor trajectories per outcome). It is the safer default and is what reproduces the West Germany result below.

  • Averaged (Sun-Ben-Michael-Feller) collapses the outcomes to one averaged series, which denoises when the outcomes share a strong common factor and the per-outcome noise is large – but can blur signal that lives in a single outcome. Prefer it when the domain is tightly co-moving and noisy.

  • MA hedges between the two by pre-treatment fit; separate is the single-outcome baseline for comparison.

mlsynth’s SCMO ships four schemes, three of which are genuinely multi-outcome and one a single-outcome baseline. The headline trade-off is between the two main multi-outcome variants:

  • Concatenated (Tian-Lee-Panchenko, 2024). Stack every outcome’s pre-period vector on top of the others, then solve one simplex QP for donor weights that balance the whole stack at once. Each outcome contributes its own per-period constraints; nothing is averaged away.

  • Averaged (Sun-Ben-Michael-Feller, 2025). First compute a per-period average (or user-supplied index) of all \(K\) outcomes, then run a single-outcome SC fit on that index. The per-outcome noise is denoised by the average; what’s left is the common factor signal.

  • MA (model-averaged). Fits both of the above and weights them by pre-treatment fit. A reasonable hedge when you’re unsure.

  • Separate. The single-outcome SC baseline (one weight vector per outcome), kept for comparison.

The mirror-image rule of thumb is:

  • Prefer Averaged when outcomes share a strong common factor and the per-outcome noise is large. Averaging acts as a denoiser; Sun-Ben-Michael-Feller prove that the bias reduction grows as \(K\) (the number of outcomes) grows, so the more closely co-moving outcomes you can stack, the bigger the averaging dividend. Educational testing (math + reading + attendance with shared school factors) is the canonical fit.

  • Prefer Concatenated when outcomes are distinct views of the shared loadings – each outcome’s trajectory looks different even though they share unit-level loadings. Averaging blurs the signal in this regime because the distinct trajectories partially cancel. Multi-indicator economy panels (GDP, energy, trade, patents…) where each indicator has its own time pattern are the canonical fit – which is why the West Germany replication below uses concatenated.

  • Prefer MA when you can’t tell which regime you’re in – it picks by pre-treatment fit and is rarely the worst of the three.

  • Prefer Separate when you only have one outcome anyway, or you want a different donor mix per outcome for substantive reporting (e.g.\ separate “education” vs “labour” subdomains).

SCMO is a multi-outcome generalisation; it inherits all of classical SC’s identification requirements (donor units untreated, treated unit in the convex hull of donor loadings, no anticipation / spillover) and adds one of its own:

  • You only have one related outcome. With \(K = 1\) SCMO collapses to vanilla SC – use Two-Step Synthetic Control (single-outcome with formal pre-trends testing) or Forward Difference-in-Differences (FDID) instead.

  • The outcomes do not share latent factors. SCMO’s bias reduction comes from estimating one set of weights that balances all outcomes’ loadings simultaneously. If GDP, school enrolment and unemployment in your panel are driven by genuinely different latent processes, stacking them just adds noise to the QP – the Tian-Lee-Panchenko / Sun-Ben-Michael-Feller bias gains evaporate. Diagnostic: refit with the separate scheme; if the donor weight vectors across outcomes look unrelated (no cross-outcome agreement on which donors carry weight), you probably don’t have a shared-loading domain.

  • The treated unit is outside the convex hull on at least one outcome. Each extra outcome adds a new dimension the synthetic must match. The more outcomes you stack, the harder the hull condition becomes (more constraints, smaller feasible region). If the pre-period fit is poor on most outcomes, SCMO will still return a weight vector but its bias bounds break – use Forward Difference-in-Differences (FDID) (which permits an offset and a freely selected donor subset) or augmented-SC variants.

  • Outcomes are measured on very different scales without prior standardisation. Concatenation puts every period of every outcome into a single objective; an outcome measured in millions will dominate one measured in proportions unless you demean and scale. Use demean=True (the default) and consider pre-standardising.

  • Long, clean pre-period for a single primary outcome. If you have, say, 30 years of stable annual GDP, single-outcome SC has enough information to nail the weights without help. SCMO’s advantage is in the short-pre-period regime; in the long one, the extra outcomes are mostly book-keeping overhead.

The Monte Carlo below builds two contrasting data-generating processes with the same \(K = 8\) outcomes, \(T_0 = 5\) short pre-period, and known ATT of \(+3\) on the primary outcome:

  • DGP A (Averaged-favoured). A single common factor trajectory \(F_t\) is shared across all outcomes; each outcome carries the same \(F\) plus heavy per-outcome noise. Averaging cancels the per-outcome noise.

  • DGP B (Concatenated-favoured). Each of the \(K\) outcomes has its own factor trajectory (different \(F^{(k)}_t\)), even though all outcomes share the same unit-level loadings \(\Lambda_i\). Averaging across outcomes blurs the per-outcome trajectories; concatenation keeps them as separate match constraints.

import numpy as np
import pandas as pd
from mlsynth import SCMO


def build(*, mode, N=30, T0=5, T1=10, K=8, r=2, TRUE=3.0, noise=1.0, seed=0):
    """mode='averaged' -> outcomes share one factor trajectory + heavy noise.
       mode='concatenated' -> each outcome has its own factor trajectory."""
    rng = np.random.default_rng(seed)
    T = T0 + T1
    phi = rng.normal(size=(N, r))                    # shared loadings
    if mode == "averaged":
        F = rng.normal(size=(T, r))
        Fs = [F for _ in range(K)]                   # all K outcomes share F
    else:
        Fs = [rng.normal(size=(T, r)) for _ in range(K)]
    rows = []
    for i in range(N):
        a = rng.normal(size=K)
        for t in range(T):
            rec = {"unit": f"u{i}", "time": t,
                   "treat": int(i == 0 and t >= T0)}
            for k in range(K):
                y = a[k] + phi[i] @ Fs[k][t] + rng.normal(scale=noise)
                if i == 0 and t >= T0 and k == 0:
                    y += TRUE
                rec[f"y{k+1}"] = y
            rows.append(rec)
    return pd.DataFrame(rows), T0, K


NREPS = 50
for mode in ("averaged", "concatenated"):
    atts = {"concatenated": [], "averaged": [], "MA": [], "separate": []}
    for s in range(NREPS):
        df, T0, K = build(mode=mode, seed=s)
        spec = {"year": list(range(T0)),
                "vars": {f"y{k+1}": f"y{k+1}" for k in range(K)}}
        try:
            res = SCMO({"df": df, "outcome": "y1", "treat": "treat",
                         "unitid": "unit", "time": "time", "spec": spec,
                         "schemes": ["concatenated", "averaged",
                                       "MA", "separate"],
                         "demean": True, "display_graphs": False}).fit()
            for k, v in res.att_by_method().items():
                atts[k].append(v - 3.0)
        except Exception:
            continue
    print(f"\nDGP={mode}  (true ATT = 3.0)   N = {len(atts['averaged'])} reps")
    for k in ("concatenated", "averaged", "MA", "separate"):
        a = np.array(atts[k])
        print(f"  {k:13s}  mean bias = {a.mean():+7.3f}   "
               f"RMSE = {np.sqrt((a**2).mean()):.3f}")

prints (deterministic with the seeds above):

DGP=averaged  (true ATT = 3.0)   N = 50 reps
  concatenated   mean bias =  +0.100   RMSE = 0.744
  averaged       mean bias =  +0.097   RMSE = 0.742
  MA             mean bias =  +0.120   RMSE = 0.788
  separate       mean bias =  +0.101   RMSE = 1.132

DGP=concatenated  (true ATT = 3.0)   N = 50 reps
  concatenated   mean bias =  -0.055   RMSE = 0.769
  averaged       mean bias =  -0.087   RMSE = 0.895
  MA             mean bias =  -0.094   RMSE = 0.750
  separate       mean bias =  -0.062   RMSE = 0.910

Three takeaways:

  1. Both multi-outcome schemes beat single-outcome SC by a wide margin in either regime. separate’s RMSE is 50% worse in DGP A and 18% worse in DGP B than the best multi-outcome competitor. That’s the headline of both papers: with a short \(T_0\), any form of multi-outcome stacking is a strict improvement.

  2. Averaged ties or beats concatenated when the DGP genuinely averages: in DGP A, averaged’s RMSE is 0.742 vs concatenated’s 0.744. The advantage is small on this calibration but the Sun-Ben-Michael-Feller theory says it grows with \(K\).

  3. Concatenated wins when outcomes are distinct: in DGP B, concatenated’s RMSE is 0.769 vs averaged’s 0.895 – a 16% reduction. MA’s pre-fit weighting hedges close to the winner in both regimes (0.788 / 0.750), which is why it’s the safest default when you don’t know which DGP you’re in.

Empirical Illustration: West Germany, matched on 1989 alone#

The canonical SCMO demonstration (Tian-Lee-Panchenko Section 4) revisits the 1990 German reunification. Instead of matching on 30 years of GDP, it matches West Germany to 16 OECD donors on nine economic indicators in the single year 1989 – private social expenditure, energy-per-GDP, electricity and patents per capita, real GDP growth, CPI, trade openness, total tax revenue, and GDP per capita.

import pandas as pd
import numpy as np
from mlsynth import SCMO

url = "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/germany_augmented.csv"
df = pd.read_csv(url)
df["Reunification"] = ((df["country"] == "West Germany") & (df["year"] >= 1990)).astype(int)

spec = {"year": 1989, "vars": {
    "private_social_exp": "Private social expenditure",
    "energy_gdp": "Total primary energy supply per unit of GDP",
    "electricity_pc": ("Electricity generation", "per_capita"),
    "patents_pc": ("Triadic patent families", "per_capita"),
    "gdp_growth": "Real GDP growth", "cpi": "CPI: all items",
    "trade": "trade", "tax": "Total tax revenue", "gdp_pc": "gdp"}}

res = SCMO({"df": df, "outcome": "gdp", "treat": "Reunification",
            "unitid": "country", "time": "year", "spec": spec,
            "schemes": ["concatenated", "averaged", "MA"],
            "conformal_alpha": 0.1, "display_graphs": True}).fit()

for name, fit in res.fits.items():
    print(f"{name:13s} ATT {fit.att:8.1f}  p={fit.p_value:.3f}  "
          f"90% CI ({fit.ci[0]:.0f}, {fit.ci[1]:.0f})")

This prints:

concatenated  ATT  -1462.8  p=0.056  90% CI (-1568, -1357)
averaged      ATT  -1720.4  p=0.056  90% CI (-1949, -1492)
MA            ATT  -1462.8  p=0.056  90% CI (-1568, -1357)

The concatenated SC – fit on a single year’s nine indicators, never shown the GDP path – matches West Germany’s pre-1990 GDP trajectory to a root-mean- squared error of 110 (vs. 74 for the conventional SC fit directly to 30 years of GDP), and implies post-reunification per-capita GDP about $1,463 below the synthetic, conformally significant at the 5.6% level. The donor mix is France 0.27, Netherlands 0.25, USA 0.21, Switzerland 0.14, Japan 0.09, Norway 0.04. The averaged scheme is looser here (pre-RMSE 185) because averaging the nine distinct indicators blurs their individual signal – the concatenated scheme is preferred for this domain.

Verification#

Note

Empirical (Path A, German reunification). mlsynth’s concatenated scheme reproduces the Tian-Lee-Panchenko (2024) German reunification result value-for-value: matching West Germany on the nine 1989 indicators yields the published donor weights (France 0.267, Netherlands 0.248, USA 0.208, Switzerland 0.135, Japan 0.092, Norway 0.043, Belgium 0.007) and a pre-1990 GDP-fit RMSE of 110, against a reference QP implementation of the paper’s fn_W to the third decimal.

Simulation (Path B). A factor-model Monte Carlo (shared loadings, \(N=30\) donors, \(T_0=4\), \(K\) related outcomes, true ATT \(= 3\), 400 reps) reproduces the paper’s bias-reduction finding. RMSE of the estimated ATT:

\(K\)

separate

concatenated

1

1.139

1.139

3

1.175

0.916

6

1.068

0.794

The concatenated SC’s error falls monotonically as outcomes are added, while the single-outcome SC does not improve – the \(O(1/\sqrt{K T_0})\) versus \(O(1/\sqrt{T_0})\) gap of Proposition 1. Under a shared common factor with large per-outcome noise (the Sun-Ben-Michael-Feller regime) the averaged scheme also beats the single-outcome SC (e.g. at \(K=6\): separate 1.598, concatenated 1.382, averaged 1.408), confirming both papers’ analyses and the regime-dependence of which scheme wins.

Simulation study: Tian-Lee-Panchenko Table 1 (Path B)#

We reproduce the concatenated paper’s own Monte Carlo (Tian-Lee-Panchenko 2024, Section 3, Table 1) through the packaged SCMO. Their DGP has \(N = 30\) units (1 treated, 29 control), one post period, zero true effect, and \(Y_{it,k} = \delta_{t,k} + Z_i'\theta_{t,k} + \mu_i'\lambda_{t,k} + \varepsilon_{it,k}\) with shared predictors \(Z_i\) (2 observed), \(\mu_i\) (4 unobserved) drawn once from \(U[-d, d]\), large level differences \(\delta, \theta, \lambda \sim N(\omega_k, 1)\), \(\omega_k \sim N(0, 10^2)\), and \(\varepsilon \sim N(0,1)\). The estimand is the effect on outcome 1 at \(t = T_0+1\); with a zero true effect the average absolute bias and SD of \(\hat\tau\) approach the half-normal floor \(\sqrt{2/\pi} \approx 0.80\) and \(1.00\) as the fit improves. The concatenated (de-meaned) scheme reproduces Table 1 at \(d = 1\) – bias and SD fall monotonically as \(K\) and \(T_0\) grow toward the floor (800 reps; the paper uses 5000):

Average absolute bias / SD of \(\hat\tau\), \(d=1\)#

\(T_0\)

\(K\)

mlsynth (bias / SD)

paper (bias / SD)

5

1

1.47 / 1.83

1.43 / 1.81

5

10

1.26 / 1.64

1.22 / 1.54

20

1

1.26 / 1.59

1.18 / 1.49

20

10

1.11 / 1.40

1.08 / 1.36

import numpy as np
import pandas as pd
from mlsynth import SCMO

N = 30
def tlp_sim(d, T0, K, rng):                       # Tian-Lee-Panchenko Sec 3 draw
    T = T0 + 1
    Z = np.empty((N, 2)); mu = np.empty((N, 4))
    Z[1:] = rng.uniform(-1, 1, (N - 1, 2)); mu[1:] = rng.uniform(-1, 1, (N - 1, 4))
    Z[0] = rng.uniform(-d, d, 2); mu[0] = rng.uniform(-d, d, 4)
    Y = np.empty((N, T, K))
    for k in range(K):
        w = rng.normal(0, 10)
        Y[:, :, k] = (rng.normal(w, 1, T)[None, :] + Z @ rng.normal(w, 1, (T, 2)).T
                      + mu @ rng.normal(w, 1, (T, 4)).T + rng.normal(0, 1, (N, T)))
    df = pd.DataFrame([{"unit": f"u{i}", "time": t, "treat": int(i == 0 and t >= T0),
                        **{f"y{k+1}": float(Y[i, t, k]) for k in range(K)}}
                       for i in range(N) for t in range(T)])
    spec = {"year": list(range(T0)), "vars": {f"y{k+1}": f"y{k+1}" for k in range(K)}}
    res = SCMO({"df": df, "outcome": "y1", "treat": "treat", "unitid": "unit",
                "time": "time", "spec": spec, "schemes": ["concatenated"],
                "demean": True, "display_graphs": False}).fit()
    return res.att_by_method()["concatenated"]

for K in (1, 3, 10):                              # half-normal floor: bias~0.80, sd~1.00
    tau = np.array([tlp_sim(1.0, 10, K, np.random.default_rng(K)) for _ in range(300)])
    print(f"K={K:>2}:  bias={np.mean(np.abs(tau)):.2f}  sd={np.std(tau):.2f}")
# bias/SD shrink toward the floor as K grows

Simulation study: Sun-Ben-Michael-Feller averaging gain (Path B)#

Sun-Ben-Michael-Feller (2025) prove that, under a shared factor structure, the multiple-outcome schemes lower the weight-estimation bias relative to fitting each outcome separately, and that averaging lowers it furthest. Their Table 1 gives the leading bias terms (\(N\) fixed):

SBMF Table 1 – leading bias terms#

scheme

bias from imperfect fit

bias from overfitting

separate

\(O(1)\)

\(O(1/\sqrt{T_0})\)

concatenated

\(O(1)\)

\(O(1/\sqrt{T_0 K})\)

averaged

\(O(1/\sqrt{K})\)

\(O(1/\sqrt{T_0 K})\)

The distinctive claim is the averaged column: averaging the \(K\) outcomes denoises the matching target, so it alone shaves the imperfect-fit bias by \(1/\sqrt{K}\) while separate and concatenated stay \(O(1)\) there. To see this through the packaged estimator we isolate the object the theorem bounds – the weight bias \((\hat\gamma - \gamma^\star)\) – by applying each scheme’s estimated donor weights to the noiseless donor structure (the observed-data counterfactual is otherwise swamped by the treated unit’s irreducible post-period noise). On their supplement DGP \(Y_{it,k} = \rho\,\phi_i \mu_t + (1-\rho)\,\phi_{ik}\mu_{tk} + \varepsilon_{it,k}\) (\(\rho = 0.5\), a common rank-one factor plus outcome-specific idiosyncratic factors), 200 reps:

Pure weight bias (RMSE of \(\hat\gamma\) applied to structure)#

\(K\)

separate

concatenated

averaged

1

0.112

0.112

0.112

4

0.149

0.138

0.140

8

0.140

0.120

0.097

16

0.135

0.087

0.071

The ordering is exactly Table 1’s: separate is flat in \(K\) (\(O(1)\)), while concatenated and averaged fall as outcomes are added, and averaged is lowest at large \(K\) – the \(1/\sqrt{K}\) imperfect-fit reduction that is unique to averaging.

import numpy as np
import pandas as pd
from mlsynth import SCMO

N, T0, TPOST, RHO, TAU = 20, 40, 4, 0.5, 2.0

def _ar1(T, rng, c=0.5):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = c * x[t - 1] + rng.normal()
    return x

def one_rep(K, rng):
    """SBMF-regime draw; returns each scheme's weight bias on the noiseless structure."""
    T = T0 + TPOST
    gstar = rng.dirichlet(np.ones(N) * 0.5)
    phi = rng.normal(0, 1, N); phi0 = gstar @ phi; mu = _ar1(T, rng)
    m0 = donor1 = None; structs = []
    for k in range(K):
        if k == 0:                                # outcome 1: common only (oracle holds)
            phik, muk, phi0k = phi.copy(), mu.copy(), phi0
        else:                                     # k>=2: idiosyncratic, treated loading independent
            phik, muk, phi0k = rng.normal(0, 1, N), _ar1(T, rng), rng.normal()
        donor = RHO * phi[:, None] * mu[None, :] + (1 - RHO) * phik[:, None] * muk[None, :]
        treated = RHO * phi0 * mu + (1 - RHO) * phi0k * muk
        structs.append(np.vstack([treated, donor]))
        if k == 0:
            m0, donor1 = treated.copy(), donor
    Y = np.stack(structs, axis=2) + rng.normal(0, 1, (N + 1, T, K))
    Y[0, T0:, 0] += TAU
    df = pd.DataFrame([{"unit": f"u{i}", "time": t, "treat": int(i == 0 and t >= T0),
                        **{f"y{k+1}": float(Y[i, t, k]) for k in range(K)}}
                       for i in range(N + 1) for t in range(T)])
    spec = {"year": list(range(T0)), "vars": {f"y{k+1}": f"y{k+1}" for k in range(K)}}
    res = SCMO({"df": df, "outcome": "y1", "treat": "treat", "unitid": "unit",
                "time": "time", "spec": spec,
                "schemes": ["separate", "concatenated", "averaged"],
                "demean": False, "display_graphs": False}).fit()
    out = {}
    for sch in ("separate", "concatenated", "averaged"):
        g = np.array([res.fits[sch].donor_weights.get(f"u{j}", 0.0) for j in range(1, N + 1)])
        out[sch] = float(np.mean(donor1[:, T0:].T @ g - m0[T0:]))
    return out

for K in (1, 4, 8, 16):
    e = {s: [] for s in ("separate", "concatenated", "averaged")}
    rng = np.random.default_rng(100 + K)
    for _ in range(100):
        r = one_rep(K, rng)
        for s in e: e[s].append(r[s])
    rmse = lambda v: float(np.sqrt(np.mean(np.array(v) ** 2)))
    print(f"K={K:>2}:  sep={rmse(e['separate']):.3f}  cat={rmse(e['concatenated']):.3f}  "
          f"avg={rmse(e['averaged']):.3f}")
# separate flat; concatenated & averaged fall; averaged lowest at large K

Core API#

Synthetic Control with Multiple Outcomes (SCMO).

A thin, NumPy-first orchestration over mlsynth.utils.scmo_helpers. SCMO builds the synthetic control by matching the treated unit to donors on a matching matrix assembled from one or more related outcomes/predictors (optionally across several pre-treatment periods), rather than a single outcome’s long trajectory. Two weighting schemes from the literature are supported, plus a model-average and the conventional baseline:

  • concatenated – Tian, Lee & Panchenko (2024): stack the standardized matching columns and solve one simplex SC.

  • averaged – Sun, Ben-Michael & Feller (2025): match the average of the standardized outcomes within each period (extra 1/sqrt(K) bias gain).

  • separate – the conventional single-outcome SC baseline.

  • MA – convex model-average of the above, chosen by pre-treatment fit.

Matching is configured by a spec ({"year": int|list, "vars": {...}}); when omitted it is built by stacking the primary outcome and addout over the pre-treatment period. Inference defaults to the Abadie permutation (placebo) test; conformal intervals are available via inference="conformal".

class mlsynth.estimators.scmo.SCMO(config: SCMOConfig | dict)#

Bases: object

Synthetic Control with Multiple Outcomes estimator.

Parameters:

config (SCMOConfig or dict) – Validated configuration. Beyond the common fields (df, outcome, treat, unitid, time, display_graphs, save, colors), SCMO reads spec (matching specification), schemes / method, demean, inference, addout and conformal_alpha.

References

Tian, W., Lee, S., & Panchenko, V. (2024). Synthetic Controls with Multiple Outcomes. arXiv:2304.02272.

Sun, L., Ben-Michael, E., & Feller, A. (2025). Using Multiple Outcomes to Improve the Synthetic Control Method. Review of Economics and Statistics.

fit() SCMOResults#

Build the matching matrix, fit the requested schemes, and return results.

Returns:

SCMOResults – Container of per-scheme SCMOMethodFit objects with convenience aliases (att, counterfactual, gap, donor_weights) forwarding to the primary scheme.

Configuration#

class mlsynth.config_models.SCMOConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', addout: str | ~typing.List[str] = <factory>, method: ~typing.Annotated[str, _PydanticGeneralMetadata(pattern='^(TLP|SBMF|BOTH)$')] = 'TLP', conformal_alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.1, spec: ~typing.Dict[str, ~typing.Any] | None = None, schemes: ~typing.List[str] | None = None, demean: bool = False, conformal_q: ~typing.Annotated[float, ~annotated_types.Gt(gt=0)] = 1.0)#

Configuration for the Synthetic Control with Multiple Outcomes (SCMO) estimator.

addout: str | List[str]#
conformal_alpha: float#
conformal_q: float#
demean: bool#
method: str#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

schemes: List[str] | None#
spec: Dict[str, Any] | None#

Result Containers#

SCMO.fit() returns an SCMOResults, whose fits maps each weighting scheme to an SCMOMethodFit (counterfactual, gap, ATT, donor weights, and the CWZ conformal p-value and CI). The prepared, NumPy-only panel is exposed as an SCMOInputs, with units and time addressed through an IndexSet.

Frozen, NumPy-first containers for Synthetic Control with Multiple Outcomes.

SCMO (Tian-Lee-Panchenko 2024; Sun-Ben-Michael-Feller 2025) builds the synthetic control by matching the treated unit to donors on a matching matrix Z assembled from one or more related outcomes/predictors – optionally across several pre-treatment periods – rather than a single outcome’s long trajectory. Everything below is pure NumPy; the only DataFrame touchpoint is mlsynth.utils.scmo_helpers.setup.prepare_scmo_inputs().

Units and time are addressed through IndexSet (immutable label<->integer maps) so downstream code never reaches back into pandas.

class mlsynth.utils.scmo_helpers.structures.SCMOInputs(unit_index: ~mlsynth.utils.fast_scm_helpers.structure.IndexSet, time_index: ~mlsynth.utils.fast_scm_helpers.structure.IndexSet, treated_idx: int, donor_idx: ~numpy.ndarray, Y: ~numpy.ndarray, T0: int, Z: ~numpy.ndarray, predictor_labels: ~typing.Sequence, col_period: ~numpy.ndarray | None = None, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

Preprocessed, NumPy-only panel for the SCMO engine.

Parameters:
  • unit_index (IndexSet) – All N units; row order of Y and Z.

  • time_index (IndexSet) – All T periods; column order of Y.

  • treated_idx (int) – Row index (into unit_index) of the treated unit.

  • donor_idx (np.ndarray) – Row indices of the donor pool.

  • Y (np.ndarray) – Primary-outcome panel, shape (N, T) (rows = units).

  • T0 (int) – Number of pre-treatment periods.

  • Z (np.ndarray) – Standardized matching matrix, shape (N, P) (rows = units).

  • predictor_labels (Sequence) – Length-P labels for the columns of Z.

  • metadata (dict) – Free-form provenance (spec, demean flag, dropped columns, …).

property T: int#
T0: int#
Y: ndarray#
property Y_donors: ndarray#
Z: ndarray#
property Z_donors: ndarray#
property Z_treated: ndarray#
col_period: ndarray | None = None#
donor_idx: ndarray#
property donor_labels: ndarray#
metadata: Dict[str, Any]#
property n_donors: int#
predictor_labels: Sequence#
time_index: IndexSet#
treated_idx: int#
property treated_label: Any#
unit_index: IndexSet#
property y_treated: ndarray#
class mlsynth.utils.scmo_helpers.structures.SCMOMethodFit(name: str, weights: ~numpy.ndarray, counterfactual: ~numpy.ndarray, gap: ~numpy.ndarray, att: float, pre_rmse: float, donor_weights: ~typing.Dict[~typing.Any, float], att_se: float | None = None, ci: ~typing.Tuple[float, float] = (nan, nan), p_value: float | None = None, metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

A single weighting-scheme fit (concatenated / averaged / separate / MA).

att: float#
att_se: float | None = None#
ci: Tuple[float, float] = (nan, nan)#
counterfactual: ndarray#
donor_weights: Dict[Any, float]#
gap: ndarray#
metadata: Dict[str, Any]#
name: str#
p_value: float | None = None#
pre_rmse: float#
weights: ndarray#
class mlsynth.utils.scmo_helpers.structures.SCMOResults(inputs: ~mlsynth.utils.scmo_helpers.structures.SCMOInputs, fits: ~typing.Dict[str, ~mlsynth.utils.scmo_helpers.structures.SCMOMethodFit], selected_variant: str = 'concatenated', metadata: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: object

Top-level container returned by mlsynth.SCMO.fit().

property att: float#
att_by_method() Dict[str, float]#
property counterfactual: ndarray#
property donor_weights: Dict[Any, float]#
fits: Dict[str, SCMOMethodFit]#
property gap: ndarray#
inputs: SCMOInputs#
metadata: Dict[str, Any]#
property pre_rmse: float#
selected_variant: str = 'concatenated'#

Helper Modules#

Data preparation – the only DataFrame touchpoint: pivots the long panel to NumPy, builds the unit/time ``IndexSet``es, and assembles the matching matrix.

Long-DataFrame -> NumPy boundary for SCMO (the only pandas touchpoint).

mlsynth.utils.scmo_helpers.setup.prepare_scmo_inputs(df: DataFrame, *, unitid: str, time: str, outcome: str, spec: Dict[str, Any], treated_unit: Any, intervention_time: Any) SCMOInputs#

Pivot the panel to NumPy, build IndexSets and the matching matrix.

Parameters:
  • df (pd.DataFrame) – Long balanced panel (one row per unit-period).

  • unitid, time, outcome (str) – Column names for unit id, time, and the primary outcome.

  • spec (dict) – Matching specification consumed by mlsynth.utils.scmo_helpers.matrix_builder.build_matching_matrix().

  • treated_unit (Any) – Label of the treated unit.

  • intervention_time (Any) – First treated period; pre-period is time < intervention_time.

Returns:

SCMOInputs – Pure-NumPy container for the estimation engine.

The spec-driven matching-matrix builder (per-outcome standardization, level/log/per_capita transforms, complete-cases column drop).

Spec-driven construction of the SCMO matching matrix Z (pure NumPy).

A spec describes how to assemble the columns of Z from a long panel:

spec = {
    "year": 1989,                       # int, or list[int] to stack periods
    "vars": {
        "private_social_exp": "Private social expenditure",   # raw column
        "electricity_pc": ("Electricity generation", "per_capita"),
        "patents_pc":     ("Triadic patent families", "per_capita"),
        "gdp_growth":     "Real GDP growth",
        "gdp_pc":         "gdp",
        ...
    },
    "per_capita_denominator": "Population levels",   # optional, default
}

Rules are either a bare column name (raw level) or (column, op) with op in {"level", "log", "per_capita", "raw"}. Each resulting column is standardized by its cross-unit SD (Tian-Lee-Panchenko footnote 5); columns that are not complete across all units are dropped (complete.cases).

mlsynth.utils.scmo_helpers.matrix_builder.build_matching_matrix(df: DataFrame, *, unitid: str, time: str, spec: Dict[str, Any], unit_index: IndexSet) Tuple[ndarray, List[str]]#

Assemble the standardized matching matrix Z from spec.

Parameters:
  • df (pd.DataFrame) – Long panel (one row per unit-period) with the spec’s columns.

  • unitid, time (str) – Unit and time column names.

  • spec (dict) – Matching specification (see module docstring).

  • unit_index (IndexSet) – Canonical unit ordering for the rows of Z.

Returns:

  • Z (np.ndarray) – Standardized matching matrix, shape (N, P).

  • labels (list of str) – Length-P predictor labels (name or name@year when several periods are stacked).

  • col_period (np.ndarray) – Length-P period that each column belongs to (used by the averaged scheme to average across outcomes within a period).

The simplex SC solver and the weighting schemes (concatenated / averaged / separate / model-average) plus de-meaning.

Synthetic-control weight solvers for SCMO (NumPy/cvxpy, no estutils.Opt).

The simplex solver replaces Opt.SCopt(scm_model_type="SIMPLEX"): it minimizes the squared imbalance between the treated and donor matching vectors subject to the convex-combination constraint (weights >= 0, sum 1), exactly the Tian-Lee-Panchenko / Abadie program.

mlsynth.utils.scmo_helpers.solvers.simplex_weights(Z_treated: ndarray, Z_donors: ndarray) ndarray#

Convex (simplex) SC weights minimizing ||Z_treated - Z_donors' w||^2.

Parameters:
  • Z_treated (np.ndarray) – Treated matching vector, shape (P,).

  • Z_donors (np.ndarray) – Donor matching matrix, shape (J, P).

Returns:

np.ndarray – Donor weights, shape (J,); non-negative and summing to one.

SCMO estimation: weighting schemes, demeaning, counterfactual, effects.

All functions are pure NumPy and operate on the arrays in SCMOInputs, so the same core drives both the real fit and the permutation placebos in mlsynth.utils.scmo_helpers.inference.

Schemes#

  • concatenated (Tian-Lee-Panchenko): match the full standardized matching matrix Z.

  • averaged (Sun-Ben-Michael-Feller): match the average of the standardized outcomes within each period.

  • separate (conventional): match the primary outcome’s standardized pre-treatment trajectory alone.

  • MA: convex model-average of the concatenated and averaged counterfactuals (chosen by pre-treatment fit).

Demeaning (demean=True) is the Doudchenko-Imbens / Sun-Ben-Michael-Feller intercept shift: the counterfactual is the treated unit’s pre-period mean plus the weighted donor deviations, allowing levels to differ.

mlsynth.utils.scmo_helpers.estimation.fit_scheme(inputs: SCMOInputs, scheme: str, demean: bool = False) SCMOMethodFit#

Fit one weighting scheme on the real treated unit.

mlsynth.utils.scmo_helpers.estimation.model_average(inputs: SCMOInputs, fits: List[SCMOMethodFit]) SCMOMethodFit#

Convex model-average of several fits by pre-treatment fit (the old BOTH).

The Chernozhukov-Wuethrich-Zhu conformal inference (multi-outcome form).

Conformal inference for SCMO (Chernozhukov-Wuethrich-Zhu, multi-outcome form).

This is the single inference procedure used for every weighting scheme, after Sun, Ben-Michael & Feller (2025), Online Appendix A, which adapts the conformal test of Chernozhukov, Wuethrich & Zhu (2021) to multiple outcomes.

For a constant-effect null H0: tau = tau0 the synthetic-control weights do not depend on the post-period outcome (the matching matrix Z is built from pre-period information), so the adjusted residual is simply the gap shifted by tau0 in the post-period. The per-period test statistic is:

S_q(u_t) = ( (1/sqrt(K)) * sum_k |u_tk|^q )^{1/q},   default q = 1,

and the conformal p-value ranks the post-treatment statistic against the distribution of pre-treatment (moving-block) statistics. Inverting the test over a grid of tau0 yields a confidence interval for the ATT. With a single predicted outcome (K = 1) the statistic reduces to |gap|.

mlsynth.utils.scmo_helpers.inference.conformal_inference(y: ndarray, counterfactual: ndarray, T0: int, alpha: float = 0.1, q: float = 1.0, n_grid: int = 600) Tuple[float, float, Tuple[float, float]]#

CWZ conformal ATT, p-value, and confidence interval from a gap series.

Tests the average post-treatment effect (the scalar case of Sun-Ben-Michael- Feller Online Appendix A / Chernozhukov-Wuethrich-Zhu 2021): the test statistic is the post-period mean gap, and the reference distribution is the set of pre-treatment moving-block means of the same length. Inverting the test over tau0 yields the ATT confidence interval. (For a single predicted outcome the per-period S_q reduces to |gap|; q only bites with K > 1 outcomes.)

Parameters:
  • y, counterfactual (np.ndarray) – Observed treated outcome and estimated counterfactual, shape (T,).

  • T0 (int) – Number of pre-treatment periods.

  • alpha (float) – Miscoverage rate (e.g. 0.1 -> 90% interval).

  • q (float) – Norm exponent (kept for the multi-outcome generalization; inert for K=1).

  • n_grid (int) – Resolution of the test-inversion grid for the confidence interval.

Returns:

  • att (float) – Mean post-treatment gap.

  • p_value (float) – Conformal p-value for the sharp null of no average effect.

  • ci (tuple of float) – (lower, upper) confidence interval for the ATT.

Scheme resolution, treatment derivation, spec construction, and the run loop.

Glue the SCMO pieces together: resolve schemes, run fits, attach inference.

mlsynth.utils.scmo_helpers.orchestration.build_spec(explicit_spec: Dict[str, Any] | None, outcome: str, addout, pre_years: List[Any]) Dict[str, Any]#

Use an explicit spec, or stack the outcome + auxiliary outcomes over the pre-period.

mlsynth.utils.scmo_helpers.orchestration.derive_treatment(df: DataFrame, unitid: str, time: str, treat: str) Tuple[Any, Any, List[Any]]#

Read treated unit, intervention time, and pre-period years off the treat column.

mlsynth.utils.scmo_helpers.orchestration.resolve_schemes(schemes: List[str] | None, method: str) List[str]#

Pick the weighting schemes to run (explicit schemes win over method).

mlsynth.utils.scmo_helpers.orchestration.run_scmo(inputs: SCMOInputs, schemes: List[str], demean: bool, conformal_alpha: float, conformal_q: float = 1.0) Dict[str, SCMOMethodFit]#

Fit each requested scheme and attach CWZ conformal inference to every fit.