Harmonic Synthetic Control (HSC)

Harmonic Synthetic Control (HSC)#

When to use this estimator#

Reach for HSC, due to Liu and Xu (2026) [HSC], when your outcome is nonstationary with unit-specific stochastic trends and you do not know, ex ante, whether that trend variation is shared across units or idiosyncratic to the treated unit. This is the regime where standard synthetic control is most fragile, and where the usual fixes force an awkward all-or-nothing choice:

Matching on raw levels (plain SC, SC-with-intercept, SDID) uses all the trend variation for donor weighting. That is ideal when the trend is shared (a convex combination of donors reproduces it), but when the treated unit has its own stochastic trend it produces spurious matching — a tight pre-period fit that does not persist out of sample (Granger-Newbold; Phillips, 1986), with bias that does not vanish as the pre-period grows.
Hard pre-filtering (differencing, or the Hamilton trend-cycle split of Synthetic Business Cycle (SBC)) removes the stochastic trend before matching. That neutralizes the idiosyncratic-trend risk, but it also discards genuinely shared trend variation that would have helped pin down the donor weights.

HSC replaces this binary choice with a soft, data-driven allocation: a single parameter $\rho \in [0, 1]$, chosen by rolling-origin cross-validation, continuously interpolates between SC on differences ($\rho \to 0$) and SC on levels with an intercept/trend ($\rho \to 1$). The method does not need to know which regime it is in — cross-validation finds the allocation that predicts best out of sample.

Concretely, use HSC for trending, nonstationary outcomes where the shared-vs-idiosyncratic split is unknown:

Marketing / business science. Brand or category sales, market share, or traffic after an event — where some markets share a common demand trend but the focal market may also drift on its own.
Macro / policy. GDP per capita, unemployment, emissions — the canonical nonstationary outcomes, where whether the treated economy’s trend is common or idiosyncratic is rarely known in advance.

The flip side: if your outcome is plausibly stationary, the spurious-matching concern is muted and a conventional SCM is simpler. And if you are confident the stochastic trend is purely idiosyncratic, hard differencing (or Synthetic Business Cycle (SBC)) is a defensible fixed choice; HSC buys robustness precisely when you are unsure.

Notation#

Let $j = 1$ denote the treated unit, with all units $\mathcal{N} \coloneqq \{1, \dots, N\}$ and donor pool $\mathcal{N}_0 \coloneqq \mathcal{N} \setminus \{1\}$ of cardinality $N_0$. Time runs over $t \in \mathcal{T} \coloneqq \{1, \dots, T\}$, 1-indexed; the intervention takes effect after period $T_0$, splitting $\mathcal{T}$ into the pre-period $\mathcal{T}_1 \coloneqq \{t \in \mathcal{T} : t \le T_0\}$ (of length $T_0$) and the post-period $\mathcal{T}_2 \coloneqq \{t \in \mathcal{T} : t > T_0\}$ (of length $T - T_0$).

The treated series is $\mathbf{y}_1 = (y_{11}, \dots, y_{1T})^\top \in \mathbb{R}^{T}$ with scalar outcomes $y_{1t}$; each donor $j \in \mathcal{N}_0$ contributes a series $\mathbf{y}_j$, stacked into the donor matrix $\mathbf{Y}_0 \coloneqq [\mathbf{y}_j]_{j \in \mathcal{N}_0} \in \mathbb{R}^{T \times N_0}$ (one column per donor). Restricted to the pre-period these are the treated pre-outcomes $\mathbf{y}_{1,\mathcal{T}_1} \in \mathbb{R}^{T_0}$ and the donor pre-outcomes $\mathbf{Y}_{0,\mathcal{T}_1} \in \mathbb{R}^{T_0 \times N_0}$, with their post-period counterparts $\mathbf{Y}_{0,\mathcal{T}_2}$.

Donor weights are $\mathbf{w} \in \mathbb{R}^{N_0}$, constrained to the unit simplex $\Delta^{N_0} \coloneqq \{\mathbf{w} \in \mathbb{R}_{\ge 0}^{N_0} : \|\mathbf{w}\|_1 = 1\}$; the optimiser is $\mathbf{w}^\ast$ and the fitted vector $\widehat{\mathbf{w}}$. The synthetic counterfactual is $\widehat{\mathbf{y}}_1$ with entries $\widehat{y}_{1t}$, the per-period effect is $\tau_t \coloneqq y_{1t} - \widehat{y}_{1t}$, and the ATT is $\widehat{\tau} \coloneqq |\mathcal{T}_2|^{-1} \sum_{t \in \mathcal{T}_2} \tau_t$. The pre-treatment discrepancy under weights $\mathbf{w}$ is $\mathbf{r}(\mathbf{w}) \coloneqq \mathbf{y}_{1,\mathcal{T}_1} - \mathbf{Y}_{0,\mathcal{T}_1}\,\mathbf{w}$.

Two page-specific objects govern the spectral allocation: $\mathbf{D}_q$ is the $q$-th order difference operator ($q \in \{1, 2\}$) and $\mathbf{K}_q \coloneqq \mathbf{D}_q^\top \mathbf{D}_q$ the roughness matrix. The treated-unit-specific smooth component is $\mathbf{e}$ (absorbing the idiosyncratic low-frequency residual), the allocation parameter is $\rho \in [0, 1]$ with its cross-validated optimum $\widehat{\rho}$, and the donor-ridge strength is $\zeta \ge 0$.

The Outcome Decomposition#

HSC decomposes untreated potential outcomes into three pieces (Liu-Xu Eq. (1)); for a generic unit $j \in \mathcal{N}$,

\[y_{jt}^N \;=\; L_{jt} \;+\; R_{jt} \;+\; \varepsilon_{jt},\]

where $L_{jt} = \boldsymbol{\lambda}_j^\top \mathbf{f}_t$ is a shared low-rank component (a convex combination of donors that matches the treated loadings reproduces it), $R_{jt}$ is an idiosyncratic stochastic trend (long-run variance grows without bound), and $\varepsilon_{jt}$ is idiosyncratic short-run noise. The difficulty is $R$: independent stochastic trends produce realized correlations that do not vanish as $T_0$ grows, so longer pre-periods do not help. Whether the treated unit’s trend variation is mostly shared (in $L$) or idiosyncratic (in $R$) is generally unknown — which is exactly what HSC’s allocation parameter is designed to handle.

Assumptions#

HSC inherits the standard synthetic-control identification conditions for the shared component and adds two structural conditions on the idiosyncratic trend that license the spectral metric and the forecast step. Each is stated formally with a plain-language remark.

Assumption 1 (three-way outcome model). Untreated potential outcomes follow the decomposition above, $y_{jt}^N = \boldsymbol{\lambda}_j^\top \mathbf{f}_t + R_{jt} + \varepsilon_{jt}$, where $\boldsymbol{\lambda}_j^\top \mathbf{f}_t$ is a shared low-rank factor structure, $R_{jt}$ is an integrated (unit-root) idiosyncratic process, and $\varepsilon_{jt}$ is mean-zero short-run noise with $\mathbb{E}[\varepsilon_{jt} \mid \mathbf{f}_t, \boldsymbol{\lambda}_j, R_{jt}] = 0$.

Remark. This is the interactive-fixed-effects model behind synthetic-control consistency theory (Abadie et al., 2010; Bai, 2009), augmented with a unit-specific stochastic trend $R$. The augmentation is the whole point: classical SC assumes the donor pool spans all of the treated unit’s systematic variation, whereas HSC explicitly allows a piece ($R$) that no donor combination can reproduce.

Assumption 2 (the shared component is in the donor hull). The treated unit’s factor loading lies in the convex hull of the donor loadings — there exist weights $\mathbf{w} \in \Delta^{N_0}$ with $\sum_{j \in \mathcal{N}_0} w_j \boldsymbol{\lambda}_j = \boldsymbol{\lambda}_1$, hence $\sum_{j \in \mathcal{N}_0} w_j L_{jt} = L_{1t}$ for all $t$.

Remark. This is the usual “treated unit in the convex hull of donors” requirement of Abadie et al. (2010), but imposed on the shared component only. HSC does not ask the donors to reproduce the treated unit’s idiosyncratic trend — that is what the smooth component $\mathbf{e}$ is for — so the hull condition is materially weaker here than in level-matching SC.

Assumption 3 (spectral separation of the idiosyncratic trend). The idiosyncratic trend $R_{1t}$ is low-frequency: it is integrated of order $q$ so that $\mathbf{D}_q R_1$ is stationary and short-memory, and its energy is concentrated at low frequencies, separated from the broadband short-run noise $\varepsilon$.

Remark. This is the assumption that gives the metric $\mathbf{W}_{\rho,q}$ and the smoother $\mathbf{S}_{\rho,q}$ their meaning. The smoother extracts the low-frequency part of the pre-period residual into $\mathbf{e}$; the metric down-weights exactly that part when matching donors. If $R$ were itself high-frequency, no smoother could separate it from noise and the spectral allocation would have nothing to exploit. Under independent stochastic trends, level-matching $R^2$ does not vanish as $T_0 \to \infty$ (Granger–Newbold, 1974; Phillips, 1986) — the pre-period fit looks excellent but is spurious — which is precisely the failure mode Assumption 3 lets HSC avoid.

Assumption 4 (the smooth component is forecastable). The smooth component $\mathbf{e}$ absorbing $R_1$ (plus any unmatched low-frequency residual) follows a model the post-period forecaster extrapolates consistently — by default an integrated AR(1), which is correctly specified when $R_1$ is an ARIMA(1,1,0) trend.

Remark. The post-period counterfactual error decomposes into the donor-matching error (controlled by Assumption 2) and the forecast error of $\mathbf{e}$. The latter dominates at long horizons, so the relevant model is the one for $R_1$’s differences. forecaster="last" is the conservative fallback (a driftless random walk) when even an AR(1) on the differences is suspect.

Assumption 5 (no anticipation, no interference, clean donors). Donor units are untreated throughout the sample; the treated unit’s pre-period outcomes are untreated potential outcomes (no anticipation); and treatment of the focal unit does not spill over onto donors (SUTVA).

Remark. These are the standard SCM design conditions and are not specific to HSC; they are what make $\mathbf{Y}_{0,\mathcal{T}_1}$ and $\mathbf{Y}_{0,\mathcal{T}_2}$ valid controls and the pre-period a clean training window for both $\mathbf{w}$ and $\mathbf{e}$.

Diagnostic: shared vs idiosyncratic stochastic trends#

The HSC paper’s central claim is that the choice between matching on levels and matching on differences should be made by the data, not a fixed prior. The script below builds two panels with the same true ATT of zero and the same number of donors, differing only in whether the treated unit’s stochastic trend is the one the donors share or its own independent random walk:

Panel A – mode='shared'. Donors and treated each carry the same random walk plus their own short-run noise. Assumption 2 holds trivially (the treated factor loading sits inside the donor hull) and Assumption 3 is irrelevant.
Panel B – mode='idio'. Donors share a random walk but the treated unit is driven by an independent random walk. The treated loading on the donors’ shared factor is zero, so a convex combination of the donors cannot reproduce the treated trajectory. This is the Granger-Newbold / Phillips (1986) spurious-matching regime: the level-matched pre-period fit will look great by chance, but the post-period gap is mostly the difference between two independent random walks – not a treatment effect.

import numpy as np
import pandas as pd

from mlsynth import CLUSTERSC, HSC


def panel(*, mode, n_donors=10, T0=40, T1=20, seed=0):
    """One shared random walk for the donors. Treated either shares
    it (mode='shared') or has its own independent RW (mode='idio').
    True ATT = 0 by construction.
    """
    rng = np.random.default_rng(seed)
    T = T0 + T1
    shared_rw = np.cumsum(rng.standard_normal(T))
    eps_d = 0.3 * rng.standard_normal((n_donors, T))
    donors = shared_rw[None, :] + eps_d
    if mode == "shared":
        treated_trend = shared_rw
    else:
        treated_trend = np.cumsum(rng.standard_normal(T))
    treated = treated_trend + 0.3 * rng.standard_normal(T)
    rows = [{"unit": "T", "t": t, "y": float(treated[t]),
             "treat": int(t >= T0)} for t in range(T)]
    for i, d in enumerate(donors):
        rows.extend({"unit": f"d{i}", "t": t, "y": float(d[t]),
                      "treat": 0} for t in range(T))
    return pd.DataFrame(rows)


for mode, label in [
    ("shared", "shared stochastic trend"),
    ("idio",   "idiosyncratic stochastic trend"),
]:
    df = panel(mode=mode, seed=1)
    h = HSC({"df": df, "outcome": "y", "treat": "treat",
               "unitid": "unit", "time": "t",
               "display_graphs": False}).fit()
    sc = CLUSTERSC({"df": df, "outcome": "y", "treat": "treat",
                      "unitid": "unit", "time": "t",
                      "method": "PCR", "objective": "SIMPLEX",
                      "cluster": False,
                      "display_graphs": False}).fit()
    print(f"\n{label}  (true ATT = 0)")
    print(f"  HSC  ATT = {h.att:+7.3f}   rho* = {h.design.selected_rho:.3f}")
    print(f"  SC   ATT = {sc.att:+7.3f}")

prints (deterministic with the seed above):

shared stochastic trend  (true ATT = 0)
  HSC  ATT =  +0.079   rho* = 0.970
  SC   ATT =  -0.082

idiosyncratic stochastic trend  (true ATT = 0)
  HSC  ATT =  -5.002   rho* = 0.200
  SC   ATT = -13.364

Three takeaways:

HSC’s selected $\widehat{\rho}$ is the diagnostic. When the trend is shared, cross-validation picks $\widehat{\rho} \approx 1$ (essentially level-matching SC). When the trend is idiosyncratic, it drops to $\widehat{\rho} \approx 0.2$ (almost pure differences matching). Reading off $\widehat{\rho}$ after fitting tells you which regime the data live in.
Spurious matching is large and one-sided. On a panel where the true ATT is zero, level-matched SC reports an apparent effect of $-13.4$ simply because two independent random walks drifted apart over 60 periods. This bias does not vanish as $T_0$ grows – it is the Phillips (1986) phenomenon, not finite-sample noise.
HSC pays a small efficiency cost in the shared regime. In Panel A, both methods recover the true zero ATT to within noise (SC’s $-0.08$ vs HSC’s $+0.08$); in Panel B, HSC’s bias ($-5.0$) is well below SC’s ($-13.4$) but is not zero because cross-validation with a finite pre-period cannot completely undo the spurious component. The choice between HSC and a fixed-$\rho$ estimator is therefore a Bayesian one: if you are confident which regime you are in, take the corresponding endpoint; if you are not, HSC’s CV pays a small price in the easy regime and a large dividend in the hard one.

When to prefer SBC over HSC#

HSC and Synthetic Business Cycle (SBC) are both targeted at the spurious-matching failure mode of classical SC, but they take opposite engineering bets. HSC dials a soft $\rho \in [0, 1]$ between levels and differences; SBC commits to a hard Hamilton-filter trend-cycle split and forecasts the treated unit’s trend from its own history. The cases where SBC is the better choice:

The trend / cycle decomposition is part of the substantive story. Macro applications where you specifically want a trend-vs-cycle interpretation (German reunification’s reunification dividend on trend versus the catch-up boom on the cycle) get more out of SBC’s explicit decomposition than out of HSC’s single $\widehat{\rho}$.
The treated unit has a long, self-forecastable trend. SBC’s Step 2 forecasts the treated trend univariately. If the treated unit’s own history is rich enough that you’d happily fit an AR(p) on it without donors, SBC’s modularity is an asset, not a liability.
You prefer a fixed structural recipe to a CV-tuned knob. HSC’s $\widehat{\rho}$ is chosen by rolling-origin CV, which is itself noisy in finite samples. SBC’s Hamilton filter has no hyperparameter to tune at the SC step.
You want explicit asymptotic unbiasedness theory. SBC has a clean Theorem 1; HSC’s CV-based guarantee is less crisp.
The Hamilton filter’s stationary-cycle assumption is defensible. Macro outcomes (GDP, unemployment, prices) where business-cycle properties are well documented are exactly SBC’s comfort zone.

Conversely, prefer HSC when the trend-vs-cycle distinction is not substantively interesting (sales, market share, daily prices), when you have reason to think the shared-trend regime is plausible and don’t want SBC to discard that variation by construction, or when your outcome is outside the macroeconomic comfort zone of the Hamilton filter.

Graphical comparison#

The block below builds two contrasting panels and fits both HSC and mlsynth.SBC on each, then overlays the observed treated trajectory with both counterfactuals. Both panels have a true ATT of zero by construction; the closer a method’s dashed line tracks the solid observed line in the post-period (right of the dotted intervention line), the better that method estimates the (zero) treatment effect on this panel.

Panel A – sales-like data: donors and the treated unit share a strong deterministic linear trend, short pre-period ($T_0 = 16$). HSC’s cross-validation picks $\widehat{\rho} \approx 0.97$ (essentially level-matching), exploits the shared trend, and hugs the observed series. SBC strips the trend by Hamilton-filtering and then has to forecast it from a short pre-period of treated observations, which extrapolates poorly – the SBC counterfactual visibly undershoots.
Panel B – macro-like data: the treated unit has its own deterministic linear trend that none of the donors share, but donors and treated share a strong stationary cycle. Long pre-period ($T_0 = 120$). SBC’s univariate AR-trend forecast of the treated trend is essentially exact, and the cycle-only SC step uses the donors where they actually help. HSC has no donor combination that can reproduce the treated trend and is forced into a compromise $\widehat{\rho} \approx 0.5$, which drifts off in the post-period.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from mlsynth import HSC, SBC


def panel_HSC_wins(*, n_donors=10, T0=16, T1=12, seed=0):
    """Sales-like: shared deterministic growth + short pre-period."""
    rng = np.random.default_rng(seed)
    T = T0 + T1
    t = np.arange(T)
    shared = 100 + 2.0 * t
    donors = shared[None, :] + 1.5 * rng.standard_normal((n_donors, T))
    treated = shared + 1.5 * rng.standard_normal(T)
    rows = [{"unit": "T", "t": k, "y": float(treated[k]),
             "treat": int(k >= T0)} for k in range(T)]
    for i, d in enumerate(donors):
        rows.extend({"unit": f"d{i}", "t": k, "y": float(d[k]),
                      "treat": 0} for k in range(T))
    return pd.DataFrame(rows), T0, T1


def panel_SBC_wins(*, n_donors=10, T0=120, T1=20, seed=1):
    """Macro-like: treated has own linear trend, donors share a cycle only."""
    rng = np.random.default_rng(seed)
    T = T0 + T1
    t = np.arange(T)
    treated_trend = 100 + 0.8 * t
    cycle = np.zeros(T)
    for i in range(1, T):
        cycle[i] = 0.6 * cycle[i - 1] + rng.standard_normal()
    donors = np.zeros((n_donors, T))
    for i in range(n_donors):
        base = 100 + 4 * rng.standard_normal()
        donors[i] = base + 5.0 * cycle + 0.3 * rng.standard_normal(T)
    treated = treated_trend + 5.0 * cycle + 0.3 * rng.standard_normal(T)
    rows = [{"unit": "T", "t": k, "y": float(treated[k]),
             "treat": int(k >= T0)} for k in range(T)]
    for i, d in enumerate(donors):
        rows.extend({"unit": f"d{i}", "t": k, "y": float(d[k]),
                      "treat": 0} for k in range(T))
    return pd.DataFrame(rows), T0, T1


fig, axes = plt.subplots(1, 2, figsize=(13, 4.8))
for ax, (label, builder) in zip(axes, [
    ("Panel A: shared trend, short pre  (HSC favoured)", panel_HSC_wins),
    ("Panel B: own trend on treated, shared cycle  (SBC favoured)", panel_SBC_wins),
]):
    df, T0, T1 = builder()
    h = HSC({"df": df, "outcome": "y", "treat": "treat",
              "unitid": "unit", "time": "t",
              "display_graphs": False}).fit()
    s = SBC({"df": df, "outcome": "y", "treat": "treat",
              "unitid": "unit", "time": "t", "h": T1,
              "display_graphs": False}).fit()
    obs = df.loc[df["unit"] == "T", "y"].to_numpy()
    t = np.arange(obs.size)
    ax.plot(t, obs, "k-", lw=2.2, label="Observed (true ATT=0)")
    ax.plot(t, h.counterfactual_full, "--", color="tab:blue", lw=1.8,
             label=f"HSC counterfactual  (ATT={h.att:+.2f})")
    ax.plot(t, s.counterfactual_full, "--", color="tab:red", lw=1.8,
             label=f"SBC counterfactual  (ATT={s.att:+.2f})")
    ax.axvline(T0 - 0.5, color="grey", ls=":", alpha=0.7)
    ax.set_title(label, fontsize=11)
    ax.set_xlabel("t"); ax.set_ylabel("y")
    ax.legend(loc="best", fontsize=8.5)
    ax.grid(alpha=0.2)
plt.tight_layout()
plt.show()

prints (deterministic with the seeds above):

Panel A HSC:  ATT = -0.24   rho* = 0.97
Panel A SBC:  ATT = -6.62

Panel B HSC:  ATT = +6.22   rho* = 0.50
Panel B SBC:  ATT = -1.63

Read the plot the same way you’d read a Figure-3-style synthetic control plot: the post-period vertical gap between solid (observed) and dashed (counterfactual) is the estimated ATT. The closer that gap is to zero, the closer the method is to recovering the true (zero) effect on this synthetic panel.

Mathematical Formulation#

The Profiled Metric (Proposition 1)#

HSC jointly estimates donor weights $\mathbf{w}$ and a treated-unit-specific smooth component $\mathbf{e}$ that absorbs low-frequency residual variation, with the roughness of $\mathbf{e}$ penalized by $\|\mathbf{D}_q \mathbf{e}\|_2^2$. Profiling out $\mathbf{e}$ reduces the problem to a donor-weight QP under a $\rho$-dependent metric. With $\lambda_\rho = \rho/(1-\rho)$,

\[\mathbf{S}_{\rho,q} = (\mathbf{I}_{T_0} + \lambda_\rho \mathbf{K}_q)^{-1}, \qquad \mathbf{W}_{\rho,q} = \tfrac{1}{\rho}\,(\mathbf{I}_{T_0} - \mathbf{S}_{\rho,q}),\]

the donor weights solve a ridge-regularized simplex QP,

\[\widehat{\mathbf{w}}(\rho, q) = \operatorname*{argmin}_{\mathbf{w} \in \Delta^{N_0}} \; \mathbf{r}(\mathbf{w})^\top \mathbf{W}_{\rho,q}\, \mathbf{r}(\mathbf{w}) + \zeta\,\|\mathbf{w}\|_2^2,\]

and the fitted smooth component is $\widehat{\mathbf{e}} = \mathbf{S}_{\rho,q}\,(\mathbf{y}_{1,\mathcal{T}_1} - \mathbf{Y}_{0,\mathcal{T}_1}\widehat{\mathbf{w}})$. The smoother $\mathbf{S}_{\rho,q}$ extracts the low-frequency (trend-like) part of the residual; the metric $\mathbf{W}_{\rho,q}$ down-weights exactly that part in donor matching, which is what alleviates the spurious-matching risk.

Note

Donor ridge. The penalty coefficient $\zeta$ is set by ridge. A float gives a relative ridge $\texttt{ridge} \cdot \operatorname{tr}(\mathbf{Y}_{0,\mathcal{T}_1}^\top \mathbf{W}_{\rho,q} \mathbf{Y}_{0,\mathcal{T}_1})/N_0$ (default 1e-6). The string ridge="sdid" uses the data-driven SDID-style penalty $\zeta^2 T_0$ with $\zeta = (T - T_0)^{1/4} \widehat{\sigma}_{\Delta \mathbf{Y}_0}$ (Arkhangelsky et al., 2021; Liu-Xu §7), where $\widehat{\sigma}_{\Delta \mathbf{Y}_0}$ is the standard deviation of the donors’ first differences. The SDID ridge diversifies the donor weights (no corner solutions) and is the configuration the paper uses for its empirical application. On the 1997 Hong Kong handover it spreads weight broadly across all 11 donors (largest Korea ≈ 0.18, vs ≈ 0.41 under the near-unregularized relative ridge) while leaving the counterfactual essentially unchanged — see Empirical Application: the 1997 Hong Kong Handover below.

The Two Endpoints#

The metric extends continuously to the boundary, giving the two classical estimators as special cases:

$\rho \to 0$: $\mathbf{S} = \mathbf{I}$, $\mathbf{W} = \mathbf{K}_q$ — donor matching on q-th differences, and $\mathbf{e}$ absorbs the entire level discrepancy.
$\rho \to 1$: $\mathbf{S} = \mathbf{P}_0$ (the projector onto $\operatorname{Null}(\mathbf{K}_q)$), $\mathbf{W} = \mathbf{I} - \mathbf{P}_0$ — donor matching on levels with an intercept ($q=1$) or intercept plus linear trend ($q=2$).

Interior $\rho$ interpolates smoothly between them — a soft spectral transformation of the data rather than a hard filter.

Counterfactual and Forecasting#

In the post-period the smooth component is extrapolated by a time-series forecaster and added to the donor-matched component:

\[\widehat{y}_{1,T_0+h}^{\,N} \;=\; \mathbf{Y}_{0,\,T_0+h}^\top \widehat{\mathbf{w}} \;+\; \widehat{e}_{T_0+h}.\]

The default forecaster is a closed-form ARIMA(1, 1, 0) (an integrated AR(1) on the differences of $\widehat{\mathbf{e}}$), the correctly specified model for an idiosyncratic ARIMA(1,1,0) stochastic trend; forecaster="last" carries the last fitted value forward.

Selecting the Allocation#

$\rho$ is chosen by rolling-origin cross-validation (scikit-learn’s TimeSeriesSplit): for each candidate $\rho$ and each expanding-window fold, HSC is fit on the training block and scored on the held-out block by $\mathbf{Y}_{0,\text{val}}\widehat{\mathbf{w}} + \operatorname{forecast}(\widehat{\mathbf{e}})$. The $\rho$ with the lowest average out-of-sample error is selected, with no ex-ante assumption about whether the stochastic trend is shared or idiosyncratic.

Monte Carlo Study: Regime Adaptation#

The block below is self-contained. It draws panels from the paper’s data-generating process (Appendix C.1) under two regimes — a shared stochastic trend (rho_u=1) and an idiosyncratic one (rho_u=0) — and compares HSC against the two fixed strategies it interpolates: SC with an intercept (levels) and SC on first differences. The treated unit gets no effect, so the post-period RMSE is pure counterfactual error.

import numpy as np
import pandas as pd
import cvxpy as cp
from mlsynth import HSC

def integ_ar1(T, phi, innov):
    d = np.zeros(T)
    for t in range(1, T):
        d[t] = phi * d[t - 1] + innov[t]
    return np.cumsum(d)

def simulate(rng, Lam, lam0, alpha_d, N0, T0, Tpost, kappa, rho_u, phe=0.25):
    T = T0 + Tpost
    F = np.column_stack([
        np.cumsum(rng.normal(0, 2, T)),                 # random walk
        integ_ar1(T, 0.5, rng.normal(0, 2, T)),         # ARIMA(1,1,0)
        np.r_[0.0, np.cumsum(rng.normal(0, 1, T - 1))], # extra trend factor
    ])
    L = np.vstack([lam0, Lam]) @ F.T
    uc = rng.normal(0, np.sqrt(1 - phe**2), T)
    E = np.zeros((N0 + 1, T))
    for j in range(N0 + 1):
        ui = rng.normal(0, np.sqrt(1 - phe**2), T)
        E[j] = integ_ar1(T, phe, np.sqrt(rho_u) * uc + np.sqrt(1 - rho_u) * ui)
    alpha = np.concatenate([[0.0], alpha_d])
    Y = L + kappa * E + rng.normal(0, 1, (N0 + 1, T)) + alpha[:, None] + rng.normal(0, 1, T)
    return Y                                            # (N0+1, T); row 0 = treated

def to_df(Y, T0):
    return pd.DataFrame(
        [{"unit": f"u{j:02d}", "time": t, "y": float(Y[j, t]),
          "treat": int(j == 0 and t >= T0)}
         for j in range(Y.shape[0]) for t in range(Y.shape[1])]
    )

def simplex_fit(X, y):
    w = cp.Variable(X.shape[1])
    cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)),
               [w >= 0, cp.sum(w) == 1]).solve(solver=cp.CLARABEL)
    return np.clip(w.value, 0, None)

def sc_int(Xpre, Ypre, Xpost):       # SC with intercept (levels)
    w = simplex_fit(Xpre - Xpre.mean(0), Ypre - Ypre.mean())
    return Xpost @ w + (Ypre.mean() - Xpre.mean(0) @ w)

def sc_diff(Xpre, Ypre, Xpost):      # SC on first differences
    w = simplex_fit(np.diff(Xpre, axis=0), np.diff(Ypre))
    return Ypre[-1] + (Xpost - Xpre[-1]) @ w

N0, T0, Tpost, R = 20, 100, 10, 60
m = np.random.default_rng(0)
Lam = np.clip(m.normal(0, 0.5, (N0, 3)), -2, 2)            # donor loadings (fixed)
S8 = m.choice(N0, 8, replace=False)
lam0 = m.dirichlet(np.ones(8) * 0.5) @ Lam[S8]             # treated loading, in hull
alpha_d = m.uniform(5, 15, N0)                            # donor fixed effects

rmse = lambda e: float(np.sqrt(np.mean(np.asarray(e) ** 2)))
for label, rho_u in [("common drift", 1.0), ("idiosyncratic", 0.0)]:
    ei, ed, eh, rhos = [], [], [], []
    for r in range(R):
        rng = np.random.default_rng(1000 + r)
        Y = simulate(rng, Lam, lam0, alpha_d, N0, T0, Tpost, kappa=2.0, rho_u=rho_u)
        Ypre, Xpre = Y[0, :T0], Y[1:, :T0].T
        Xpost, Y0post = Y[1:, T0:].T, Y[0, T0:]
        ei.append(sc_int(Xpre, Ypre, Xpost) - Y0post)
        ed.append(sc_diff(Xpre, Ypre, Xpost) - Y0post)
        res = HSC({"df": to_df(Y, T0), "outcome": "y", "unitid": "unit",
                   "time": "time", "treat": "treat"}).fit()
        eh.append(res.design.counterfactual_post - Y0post)
        rhos.append(res.selected_rho)
    print(f"{label:>14}: SC-INT {rmse(ei):5.2f} | SC-diff {rmse(ed):5.2f} | "
          f"HSC {rmse(eh):5.2f} | mean rho_hat {np.mean(rhos):.2f}")

Running it reproduces the paper’s headline (and matches the standalone skeleton bit-for-bit):

 common drift: SC-INT  1.15 | SC-diff  1.48 | HSC  1.21 | mean rho_hat 0.86
idiosyncratic: SC-INT 10.60 | SC-diff  6.11 | HSC  6.46 | mean rho_hat 0.48

HSC is good in both regimes, tracking whichever fixed estimator is best, while each fixed estimator fails in one: SC-on-levels is excellent under a shared trend (1.15) but catastrophic when the trend is idiosyncratic (10.60); SC-on-differences is the reverse. And the cross-validated allocation moves the right way on its own — $\widehat{\rho} \approx 0.86$ (lean to levels) under a shared trend, dropping to $\approx 0.48$ (lean to differencing) when the trend is idiosyncratic.

Note

The small gap to the oracle-best fixed method in each regime (1.21 vs 1.15; 6.46 vs 6.11) is the expected price of adaptation — cross-validation is not an oracle — and is what buys robustness across regimes.

Across the full $(\kappa, \rho_u)$ grid#

Running the same harness at the paper’s actual sample sizes ($N_0 = 50$ donors, $T_0 = 200$, $T_{\text{post}} = 20$, q=1, arima110 forecaster) over the full Liu-Xu (2026) grid — amplitude $\kappa \in \{0, 0.5, 1, 2\}$ of the unit-specific stochastic trend and cross-unit sharing $\rho_u \in \{0, 0.5, 1\}$, 80 reps/cell — reproduces both of the paper’s Monte-Carlo figures, all through the packaged HSC estimator.

Figure 7 — the cross-validated $\widehat{\rho}$ adapts to the regime. When the stochastic trend is shared across units ($\rho_u = 1$) or absent ($\kappa = 0$), level matching identifies the donor weights and $\widehat{\rho}$ sits near 1; when the trend is idiosyncratic ($\kappa \ge 0.5$, $\rho_u \le 0.5$), the donor pool cannot reproduce it, so cross-validation pushes $\widehat{\rho}$ down toward differencing:

Mean cross-validated $\widehat{\rho}$#
$\kappa$	$\rho_u = 0$	$\rho_u = 0.5$	$\rho_u = 1$
0.0	0.89	0.92	0.92
0.5	0.51	0.55	0.94
1.0	0.55	0.59	0.89
2.0	0.49	0.43	0.90

Figure 6 — HSC ties or beats the baselines in every cell. The post-period RMSE ratio of HSC to plain level-matching SC is below 1 throughout (0.34-0.77): HSC ties SC tightly when the trend is shared/absent (ratio $\approx$ 0.34-0.39, both near the noise floor) and the advantage widens with $\kappa$ as idiosyncratic drift grows — exactly the §6.2 pattern.

Post-period RMSE ratio, HSC / plain SC#
$\kappa$	$\rho_u = 0$	$\rho_u = 0.5$	$\rho_u = 1$
0.0	0.39	0.34	0.34
0.5	0.62	0.49	0.36
1.0	0.68	0.64	0.37
2.0	0.77	0.57	0.38

To reproduce, set N0, T0, Tpost = 50, 200, 20 in the block above and sweep kappa and rho_u over the grid (the paper uses $R = 500$ reps; 80 already gives the pattern above).

Inference#

The authors are explicit that uncertainty quantification “rests on additional assumptions beyond those required for the point estimator” and leave it to future work, pointing toward a prediction interval that combines donor-weight estimation uncertainty with out-of-sample forecast-error calibration for the smooth component.

Empirical Application: the 1997 Hong Kong Handover#

The paper’s headline application (Liu-Xu §7) revisits the economic-integration question of Hsiao, Ching and Wan (2012): what happened to Hong Kong’s real GDP per capita after the 1997 handover to mainland China? The panel (basedata/hong_kong_handover.csv) is annual real GDP per capita for Hong Kong and 11 OECD donors, with treatment switching on in 1997 ($T_0 = 36$ pre-treatment years, 1961–1996). Following the authors, the post-treatment window runs 1997–2003 ($T_{\text{post}} = 7$) — the counterfactual of an integrated process is only trustworthy a few years out (see the note below), so the analysis is not extended to the end of the raw series. GDP per capita is the canonical nonstationary, trending outcome for which HSC is built, and it is unknown a priori whether Hong Kong’s growth trend is shared with the donor pool or idiosyncratic — exactly the regime HSC’s cross-validated $\rho$ resolves.

import numpy as np
import pandas as pd
from mlsynth import HSC

df = pd.read_csv(
    "https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
    "refs/heads/main/basedata/hong_kong_handover.csv"
)
df = df[df["year"] <= 2003]               # authors' post-window: 1997-2003

res = HSC({
    "df": df, "outcome": "gdp", "unitid": "country",
    "time": "year", "treat": "Handover",
    "ridge": "sdid",          # SDID-style ridge -> diversified weights
    # refine the rho grid: the CV optimum (~0.09) lies between the default
    # grid's 0.0 and 0.2 points, so the coarse default would miss it.
    "rho_grid": list(np.round(np.arange(0.0, 0.98, 0.01), 2)),
    "display_graphs": True,
}).fit()

print("selected rho:", round(res.selected_rho, 3))   # 0.09
print("2003 effect:", round(res.treatment_effect[-1]))  # -1902
print("ATT 1997-2003:", round(res.att))              # -1734
for k, v in sorted(res.weights_by_donor.items(), key=lambda kv: -kv[1]):
    print(f"  {k:12s} {v:.3f}")

Donor weights. With the SDID ridge the design leans on a broad mix of donors rather than a handful — the largest weight is Korea ≈ 0.18, followed by Germany ≈ 0.14, the US ≈ 0.13 and Italy ≈ 0.11, with the remaining seven donors each contributing 0.04–0.09. This matches the paper’s reported weights almost exactly (Korea 0.18, Germany 0.14, US 0.13, Italy 0.11, max < 0.19). Under the near-unregularized relative ridge the same fit instead collapses most of the weight onto Korea (≈ 0.41) — the SDID ridge is what buys the diversification.

The counterfactual path. Cross-validation selects $\widehat{\rho} = 0.09$ (paper: 0.11) — a genuine interior allocation, leaning toward differencing but keeping some level information. Hong Kong sits below its synthetic counterfactual throughout the post-handover window, a persistent shortfall spanning the Asian Financial Crisis (1998) and the 2003 SARS epidemic:

HSC counterfactual vs. observed real GDP per capita (USD), `ridge="sdid"`, fine grid#
Year	Observed	Counterfactual	Effect
1997	26,632	26,093	+538
1998	24,857	26,528	-1,671
1999	25,238	27,651	-2,413
2000	26,933	28,809	-1,876
2001	26,885	29,225	-2,340
2002	27,210	29,688	-2,478
2003	28,097	29,999	-1,902

The 2003 effect is -$1,902 against a counterfactual of $29,999 — matching the paper’s headline (≈ -$1,900 against ≈ $30,000) to the dollar. The average post-treatment effect is $\widehat{\text{ATT}} =$ -$1,734 per capita.

Note

Refine the $\rho$ grid for empirical work. The default rho_grid [0, 0.2, 0.5, 0.8, 0.97] is coarse — fine for the validated Monte Carlo, but here the true CV optimum ($\rho \approx 0.09$) falls in the gap between the grid’s 0.0 and 0.2 points. Both neighbours are ~8-10% worse on the CV criterion than the interior optimum, so the coarse grid selects an unrelated point (0.5) and inflates the largest weight to 0.21. Passing a fine grid (np.arange(0, 0.98, 0.01)) recovers $\widehat{\rho} = 0.09$ and the paper’s weights. This was the dominant source of the gap to the paper — not the data.

Robustness Checks#

The 2003 effect (≈ -$1,900) and counterfactual (≈ $30,000) are stable across the estimator’s tuning choices. Re-fitting (fine $\rho$ grid throughout) under the relative ridge, second-order differencing ($q = 2$), and the last-value forecaster:

Hong Kong robustness (post-window 1997-2003, fine grid)#
Configuration	$\widehat{\rho}$	max weight	cf 2003	effect 2003	ATT
`ridge="sdid"`, `q=1`, ARIMA(1,1,0)	0.09	0.18	29,999	-1,902	-1,734
`ridge=1e-6`, `q=1`, ARIMA(1,1,0)	0.56	0.41	29,857	-1,760	-1,744
`ridge="sdid"`, `q=2`, ARIMA(1,1,0)	0.00	0.17	30,200	-2,103	-1,818
`ridge="sdid"`, `q=1`, `forecaster="last"`	0.00	0.18	30,708	-2,611	-2,319

Across every configuration the 2003 effect stays in a tight -$1,760 to -$2,610 band and the 2003 counterfactual in $29,800–$30,700. The ridge choice changes how the weight is spread (max donor weight 0.18 under the SDID ridge vs. 0.41 under the relative ridge) far more than it changes the counterfactual; the differencing order and forecaster move it only modestly. The rolling-origin CV curve over $\rho$ confirms a real interior optimum near 0.09 that the default grid straddles — the criterion rises ~8-10% at the default grid’s 0.0/0.2/0.5 points and explodes toward $\rho = 1$, the signature of a partly idiosyncratic trend:

Rolling-origin CV error by $\rho$ (`ridge="sdid"`, fine grid)#
$\rho$	CV mean squared error	vs. optimum
0.00 (default grid)	1.178e6	+8.3%
0.05	1.103e6	+1.3%
0.09 (selected)	1.088e6	—
0.11 (paper)	1.094e6	+0.6%
0.20 (default grid)	1.193e6	+9.6%
0.50 (default grid)	1.177e6	+8.2%
0.97 (default grid)	4.69e6	+331%

Note

This is a close Path-A empirical replication: on the same data, the authors’ 1997–2003 window, and a fine $\rho$ grid, HSC matches the paper’s headline numbers essentially to the dollar — $\widehat{\rho} = 0.09$ (paper 0.11), a -$1,902 2003 effect against a $29,999 counterfactual (paper ≈ -$1,900 / ≈ $30,000), and donor weights Korea 0.18 / Germany 0.14 / US 0.13 / Italy 0.11 (paper 0.18 / 0.14 / 0.13 / 0.11). The remaining $\rho$-gap (0.09 vs 0.11) is within the CV’s grid resolution. The validated bit-for-bit check for HSC is the Monte Carlo above, which matches the standalone estimation skeleton exactly.

Core API#

Harmonic Synthetic Control (HSC) estimator.

Implements:

Liu, Z., & Xu, Y. (2026). “The Harmonic Synthetic Control Method.”

HSC addresses the spurious-matching risk of synthetic control on nonstationary outcomes with unit-specific stochastic trends. Instead of the binary choice between matching on raw levels (vulnerable to spurious matching from idiosyncratic trends) and matching on differences (which discards useful shared trend variation), HSC uses a soft, data-driven allocation:

donor weights are estimated under a frequency-dependent metric W_{rho,q} that downweights low-frequency (trend-like) residual variation, and
a treated-unit-specific smooth component E absorbs the low-frequency residual and is forecast forward to complete the counterfactual.

A single allocation parameter rho in [0, 1] – selected by rolling-origin cross-validation – continuously interpolates between SC on q-th differences (rho -> 0) and SC on levels with an intercept/trend (rho -> 1). See mlsynth.utils.hsc_helpers for the algorithmic pieces.

class mlsynth.estimators.hsc.HSC(config: HSCConfig | dict)#

Bases: object

Harmonic Synthetic Control (HSC) estimator.

Parameters:: config (HSCConfig or dict) – Configuration object. See mlsynth.config_models.HSCConfig.
Returns:: HSCResults – Donor weights, the fitted smooth component, the counterfactual, and the cross-validated allocation rho.

Notes

Following Liu & Xu (2026), this ships the HSC point estimator only. Uncertainty quantification rests on additional assumptions beyond those needed for the point estimate (the authors point to a Cattaneo-Feng-Titiunik (2021) style prediction interval) and is deliberately not implemented here.

fit() → HSCResults#: Run the HSC pipeline.

Configuration#

class mlsynth.config_models.HSCConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = False, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', plot: ~mlsynth.config_models.PlotConfig = <factory>, q: ~typing.Literal[1, 2] = 1, rho_grid: ~typing.List[float] = <factory>, cv_splits: ~typing.Annotated[int, ~annotated_types.Ge(ge=2)] = 3, ridge: float | ~typing.Literal['sdid'] = 1e-06, forecaster: ~typing.Literal['arima110', 'last'] = 'arima110')#

Configuration for the Harmonic Synthetic Control (HSC) estimator.

Implements:

Liu, Z., & Xu, Y. (2026). “The Harmonic Synthetic Control Method.”

HSC matches donors under a frequency-dependent metric and absorbs the treated unit’s low-frequency residual into a smooth component that is forecast forward. A single allocation parameter rho in [0, 1], selected by rolling-origin cross-validation, interpolates between synthetic control on q-th differences (rho -> 0) and synthetic control on levels with an intercept/trend (rho -> 1).

Parameters:

q (int) – Smoothness order of the difference operator (1 or 2). q=1 controls a stochastic trend / random-walk drift; q=2 controls a local-linear trend. Default 1.
rho_grid (list of float) – Candidate allocation values in [0, 1] searched by CV. Default [0.0, 0.2, 0.5, 0.8, 0.97].
cv_splits (int) – Number of rolling-origin folds (sklearn TimeSeriesSplit). Default 3.
ridge (float) – Relative ridge coefficient on the donor weights for QP conditioning. Default 1e-6.
forecaster ({“arima110”, “last”}) – Forecaster for the smooth residual. "arima110" (default) is a closed-form ARIMA(1, 1, 0); "last" carries the last value forward.
display_graphs (bool) – Show the observed-vs-counterfactual plot after fitting.

Notes

Following Liu & Xu (2026), HSC ships the point estimator only; uncertainty quantification is deliberately out of scope, so there are no inference options here.

cv_splits: int#

display_graphs: bool#

forecaster: Literal['arima110', 'last']#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

q: Literal[1, 2]#

rho_grid: List[float]#

ridge: float | Literal['sdid']#

Result Containers#

HSC.fit() returns an HSCResults, bundling the fitted HSCDesign (donor weights, the smooth component, the cross-validated rho and its CV curve, the counterfactual) and the prepared HSCInputs.

Structured containers for the Harmonic Synthetic Control (HSC) pipeline.

Implements the dataclasses used throughout HSC, which itself implements:

Liu, Z., & Xu, Y. (2026). “The Harmonic Synthetic Control Method.”

All containers are frozen (immutable) per the repository convention.

class mlsynth.utils.hsc_helpers.structures.HSCDesign(selected_rho: float, q: int, omega: ndarray, smooth_pre: ndarray, donor_match_pre: ndarray, counterfactual_post: ndarray, smooth_forecast: ndarray, donor_match_post: ndarray, cv_curve: Dict[float, float], forecaster: str)#

Bases: object

Fitted HSC design.

Parameters:

selected_rho (float) – Cross-validated allocation parameter in [0, 1]. Near 1 the design matches on levels (with an intercept/trend); near 0 it matches on q-th differences and the smooth component absorbs the trend.
q (int) – Smoothness order used.
omega (np.ndarray) – Donor weights on the simplex, shape (N,).
smooth_pre (np.ndarray) – Fitted treated-unit smooth residual E over the pre-period, shape (T0,) (S_{rho,q} (Y_pre - X_pre omega)).
donor_match_pre (np.ndarray) – Donor-matched component over the pre-period, X_pre @ omega.
counterfactual_post (np.ndarray) – Post-treatment counterfactual, shape (n_post,) (X_post @ omega + forecast(E)).
smooth_forecast (np.ndarray) – Forecast of the smooth residual over the post-period.
donor_match_post (np.ndarray) – Donor-matched component over the post-period, X_post @ omega.
cv_curve (dict) – {rho: rolling-origin CV error} used to select selected_rho.
forecaster (str) – Name of the residual forecaster used (e.g. "arima110").

counterfactual_post: ndarray#

cv_curve: Dict[float, float]#

donor_match_post: ndarray#

donor_match_pre: ndarray#

forecaster: str#

omega: ndarray#

q: int#

selected_rho: float#

smooth_forecast: ndarray#

smooth_pre: ndarray#

class mlsynth.utils.hsc_helpers.structures.HSCInputs(y_target: ndarray, donor_matrix: ndarray, T: int, T0: int, treated_unit_name: Any, donor_names: Sequence, time_labels: ndarray, q: int)#

Bases: object

Preprocessed panel data for HSC.

Parameters:

y_target (np.ndarray) – Treated unit’s outcome over the full timeline, shape (T,).
donor_matrix (np.ndarray) – Donor outcomes, shape (T, N) (rows = periods, cols = donors).
T (int) – Total number of periods.
T0 (int) – Number of pre-treatment periods.
treated_unit_name (Any) – Label of the treated unit.
donor_names (Sequence) – Labels of the donor units (length N).
time_labels (np.ndarray) – Time-period labels (length T).
q (int) – Smoothness order of the difference operator (1 or 2).

T: int#

T0: int#

property X_post: ndarray#: Donor post-treatment matrix, shape (T - T0, N).

property X_pre: ndarray#: Donor pre-treatment matrix, shape (T0, N).

property Y_post: ndarray#: Treated post-treatment outcomes, shape (T - T0,).

property Y_pre: ndarray#: Treated pre-treatment outcomes, shape (T0,).

donor_matrix: ndarray#

donor_names: Sequence#

property n_post: int#: Number of post-treatment periods.

q: int#

time_labels: ndarray#

treated_unit_name: Any#

y_target: ndarray#

class mlsynth.utils.hsc_helpers.structures.HSCResults(*, effects: EffectsResults | None = None, fit_diagnostics: FitDiagnosticsResults | None = None, time_series: TimeSeriesResults | None = None, weights: WeightsResults | None = None, inference: InferenceResults | None = None, method_details: MethodDetailsResults | None = None, sub_method_results: Dict[str, Any] | None = None, additional_outputs: Dict[str, Any] | None = None, raw_results: Dict[str, Any] | None = None, execution_summary: Dict[str, Any] | None = None, plot_config: PlotConfig | None = None, inputs: HSCInputs, design: HSCDesign, counterfactual_full: ndarray, treatment_effect: ndarray, weights_by_donor: Dict[Any, float])#

Bases: BaseEstimatorResults

User-facing output of the HSC estimator.

An EffectResult (the observational report): it populates the standardized sub-models so the flat accessors att / counterfactual / gap / donor_weights / pre_rmse resolve through the base contract. The HSC-specific fields below carry the harmonic decomposition detail.

Parameters:

inputs (HSCInputs) – Preprocessed panel data.
design (HSCDesign) – Fitted design (weights, smooth component, counterfactual).
counterfactual_full (np.ndarray) – Counterfactual over the full timeline, shape (T,) (pre-period in-sample fit followed by the post-period counterfactual; mirrored into res.counterfactual).
treatment_effect (np.ndarray) – Post-period treated minus counterfactual, shape (n_post,). (Note: the contract’s res.gap is the full-length treated-minus-CF series; this field stays post-only for HSC consumers.)
weights_by_donor (dict) – {donor_label: weight} for donors with non-trivial weight (mirrored into res.donor_weights).

counterfactual_full: np.ndarray#

design: HSCDesign#

inputs: HSCInputs#

property mode: str#: Solver mode reported to downstream consumers.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'json_encoders': {<class 'numpy.ndarray'>: <function BaseEstimatorResults.Config.<lambda>>}}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property selected_rho: float#: Cross-validated allocation parameter.

treatment_effect: np.ndarray#

weights_by_donor: Dict[Any, float]#

Helper Modules#

Data preparation for HSC.

Wraps datautils.dataprep to pivot the long panel into the treated outcome vector and donor matrix HSC consumes, and validates that the pre-treatment window is long enough for the difference operator and the rolling-origin cross-validation.

mlsynth.utils.hsc_helpers.setup.prepare_hsc_inputs(df: DataFrame, outcome: str, unitid: str, time: str, treat: str, q: int, n_splits: int = 3) → HSCInputs#

Pivot panel data into the (T,) / (T, N) layout HSC consumes.

Parameters:

df (pd.DataFrame) – Long balanced panel data.
outcome, unitid, time, treat (str) – Column names for outcome, units, time, and the binary treatment indicator (1 for the treated unit in post-treatment periods).
q (int) – Smoothness order (1 or 2); determines the minimum pre-period length.
n_splits (int) – Rolling-origin CV folds; used only to validate the pre-period is long enough to cross-validate.

Returns:

HSCInputs

The profiled HSC metric and the donor-weight quadratic program.

Implements the profiled representation (Proposition 1 of Liu & Xu, 2026). For a fixed allocation rho in (0, 1) and smoothness order q in {1, 2}:

lambda_rho = rho / (1 - rho) K_q = D_q’ D_q (roughness matrix) S_{rho,q} = (I + lambda_rho K_q)^{-1} (smoother) W_{rho,q} = (1 / rho) (I - S_{rho,q}) (donor-matching metric)

with the boundary cases

rho = 0: S = I, W = K_q (SC on q-th differences) rho = 1: S = P_0, W = I - P_0 (SC on levels + intercept/trend)

where P_0 projects onto Null(K_q) (constants for q=1, linear trends for q=2). Donor weights then solve a ridge-regularized simplex QP under the metric W_{rho,q}.

mlsynth.utils.hsc_helpers.formulation.difference_operator(T: int, q: int) → ndarray#: q-th order difference operator D_q of shape (T - q, T).

mlsynth.utils.hsc_helpers.formulation.fit_donor_weights(X: ndarray, Y: ndarray, W: ndarray, ridge: float = 1e-06, ridge_abs: float | None = None, solver: object | None = None) → ndarray#

Solve the simplex-constrained, metric-weighted donor QP.

argmin_{omega in simplex} (Y - X omega)' W (Y - X omega) + c ||omega||^2, where the ridge coefficient c is either ridge_abs (absolute, e.g. the SDID-style zeta^2 T0 from sdid_ridge_coefficient()) when supplied, or the relative ridge * trace(X'WX) / N otherwise.

Parameters:

X (np.ndarray) – Donor matrix, shape (T, N).
Y (np.ndarray) – Treated outcome, shape (T,).
W (np.ndarray) – Donor-matching metric, shape (T, T).
ridge (float) – Relative ridge coefficient (used only when ridge_abs is None).
ridge_abs (float, optional) – Absolute ridge coefficient on ||omega||^2. Overrides ridge.
solver (optional) – CVXPY solver; defaults to Clarabel.

Returns:

np.ndarray – Non-negative donor weights summing to one, shape (N,).

mlsynth.utils.hsc_helpers.formulation.roughness_matrix(T: int, q: int) → ndarray#: Roughness matrix K_q = D_q' D_q of shape (T, T).

mlsynth.utils.hsc_helpers.formulation.sdid_ridge_coefficient(X_pre: ndarray, n_post: int) → float#

SDID-style absolute ridge coefficient on ||omega||^2.

Following Liu & Xu (2026) §7 and Arkhangelsky et al. (2021), the donor penalty is zeta^2 * T0 with zeta = T_post^{1/4} * sigma_dX, where sigma_dX is the standard deviation of the first differences of the donor outcomes over the pre-period. This scales the ridge to the data and yields diversified (non-corner) donor weights.

Parameters:

X_pre (np.ndarray) – Donor pre-treatment matrix, shape (T0, N).
n_post (int) – Number of post-treatment periods.

Returns:

float – Absolute coefficient on ||omega||^2 (zeta^2 * T0).

mlsynth.utils.hsc_helpers.formulation.smoother_and_metric(T: int, q: int, rho: float) → Tuple[ndarray, ndarray]#

Return (S_{rho,q}, W_{rho,q}) for the profiled HSC problem.

Parameters:

T (int) – Length of the (pre-treatment) series.
q (int) – Smoothness order (1 or 2).
rho (float) – Allocation parameter in [0, 1].

Returns:

S (np.ndarray) – Smoother that extracts the smooth residual E from the discrepancy.
W (np.ndarray) – Symmetric PSD donor-matching metric (a seminorm).

Forecasters for the HSC smooth residual E.

In post-treatment periods the treated-unit smooth component E is extrapolated by a time-series forecaster and added to the donor-matched component to form the counterfactual. The default is an ARIMA(1, 1, 0) forecaster – the correctly specified model for the paper’s idiosyncratic ARIMA(1, 1, 0) stochastic-trend DGP.

The ARIMA(1, 1, 0) forecast is computed in closed form (no iterative MLE): the first differences follow a zero-mean AR(1), so future increments decay geometrically and the level path integrates them forward. This is fully deterministic, which keeps HSC results exactly reproducible.

mlsynth.utils.hsc_helpers.forecast.arima110_forecast(x: ndarray, h: int) → ndarray#

Closed-form ARIMA(1, 1, 0) forecast of a level series x.

Fits a zero-mean AR(1) to diff(x) by least squares, then forecasts h future increments d_{t+k} = phi^k d_t and integrates to levels.

Parameters:

x (np.ndarray) – Observed level series.
h (int) – Forecast horizon.

Returns:

np.ndarray – Length-h forecast of the level path.

mlsynth.utils.hsc_helpers.forecast.forecast_smooth(E: ndarray, h: int, method: str = 'arima110') → ndarray#: Dispatch a forecaster by name over the smooth residual E.

mlsynth.utils.hsc_helpers.forecast.last_value_forecast(x: ndarray, h: int) → ndarray#: Carry the last observed value forward as a constant.

HSC fit at a fixed allocation, and rolling-origin CV over the allocation.

fit_at_rho solves the profiled donor QP and returns the smooth residual; select_rho_by_cv chooses rho by sklearn’s rolling-origin TimeSeriesSplit, scoring each candidate by its out-of-sample prediction error (donor match + forecast of the smooth component) on held-out pre-period blocks.

mlsynth.utils.hsc_helpers.optimization.fit_at_rho(X_pre: ndarray, Y_pre: ndarray, rho: float, q: int, ridge: float = 1e-06, solver: object | None = None, ridge_abs: float | None = None) → Tuple[ndarray, ndarray]#

Fit the profiled HSC problem at a fixed rho.

ridge_abs (e.g. the SDID-style coefficient) overrides the relative ridge when supplied.

Returns:

omega (np.ndarray) – Donor weights on the simplex, shape (N,).
E_pre (np.ndarray) – Fitted smooth residual S_{rho,q} (Y_pre - X_pre omega), (T0,).

mlsynth.utils.hsc_helpers.optimization.select_rho_by_cv(X_pre: ndarray, Y_pre: ndarray, q: int, rho_grid: Sequence[float], n_splits: int = 3, ridge: float = 1e-06, forecaster: str = 'arima110', solver: object | None = None, ridge_abs: float | None = None) → Tuple[float, Dict[float, float]]#

Select rho by rolling-origin cross-validation.

For each candidate rho and each expanding-window TimeSeriesSplit fold, HSC is fit on the training block; the held-out block is predicted by X_val @ omega + forecast(E) and scored by mean squared error. The rho with the lowest average fold error wins.

Returns:

best_rho (float) – Selected allocation parameter.
cv_curve (dict) – {rho: mean CV error}.

Top-level HSC solve and effect summary.

solve_hsc runs the full pipeline – select rho by rolling-origin CV, fit the profiled donor QP on the whole pre-period, forecast the smooth residual into the post-period, and assemble the counterfactual. The post-treatment counterfactual is

Y_hat_post = X_post @ omega + forecast(E),

the donor-matched component plus the extrapolated smooth component.

mlsynth.utils.hsc_helpers.orchestration.solve_hsc(inputs: HSCInputs, q: int, rho_grid: Sequence[float], n_splits: int = 3, ridge: object = 1e-06, forecaster: str = 'arima110', solver: object | None = None) → HSCDesign#

Run HSC end-to-end and return the fitted design.

ridge is either a float (relative ridge ridge * trace(X'WX)/N) or the string "sdid", which uses the data-driven SDID-style coefficient zeta^2 T0 (computed once from the full pre-period donors and the post-treatment horizon) for diversified donor weights.

mlsynth.utils.hsc_helpers.orchestration.summarize_effects(inputs: HSCInputs, design: HSCDesign) → Tuple[float, ndarray, ndarray]#

Assemble ATT, full-timeline counterfactual, and per-period effects.

Returns:

att (float) – Mean post-period treatment effect.
counterfactual_full (np.ndarray) – Pre-period in-sample fit (X_pre @ omega + E) stacked with the post-period counterfactual, shape (T,).
treatment_effect (np.ndarray) – Y_post - counterfactual_post, shape (n_post,).

Plotting for HSC: observed treated series vs the HSC counterfactual.

mlsynth.utils.hsc_helpers.plotter.plot_hsc(results: HSCResults) → None#

Plot the treated outcome against the HSC counterfactual.

Parameters:: results (HSCResults) – Output of mlsynth.estimators.HSC.
Raises:: MlsynthPlottingError – If required arrays are missing from the result.

References#

Liu, Z., & Xu, Y. (2026). “The Harmonic Synthetic Control Method.” See [HSC].

Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies.” Journal of the American Statistical Association 105(490):493-505.

Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). “Synthetic Difference-in-Differences.” American Economic Review 111(12):4088-4118.

Bai, J. (2009). “Panel Data Models with Interactive Fixed Effects.” Econometrica 77(4):1229-1279.

Cattaneo, M. D., Feng, Y., & Titiunik, R. (2021). “Prediction Intervals for Synthetic Control Methods.” Journal of the American Statistical Association 116(536):1865-1880.

Granger, C. W. J., & Newbold, P. (1974). “Spurious Regressions in Econometrics.” Journal of Econometrics 2(2):111-120.

Hsiao, C., Ching, H. S., & Wan, S. K. (2012). “A Panel Data Approach for Program Evaluation: Measuring the Benefits of Political and Economic Integration of Hong Kong with Mainland China.” Journal of Applied Econometrics 27(5):705-740.

Phillips, P. C. B. (1986). “Understanding Spurious Regressions in Econometrics.” Journal of Econometrics 33(3):311-340.

Harmonic Synthetic Control (HSC)

Contents

Harmonic Synthetic Control (HSC)#

When to use this estimator#

Notation#

The Outcome Decomposition#

Assumptions#

Diagnostic: shared vs idiosyncratic stochastic trends#

When to prefer SBC over HSC#

Graphical comparison#

Mathematical Formulation#

The Profiled Metric (Proposition 1)#

The Two Endpoints#

Counterfactual and Forecasting#

Selecting the Allocation#

Monte Carlo Study: Regime Adaptation#

Across the full \((\kappa, \rho_u)\) grid#

Inference#

Empirical Application: the 1997 Hong Kong Handover#

Robustness Checks#

Core API#

Configuration#

Result Containers#

Helper Modules#

References#