Time-Aware Synthetic Control (TASC)

Overview

Time-Aware Synthetic Control (TASC) arXiv:2601.03099 is a state-space synthetic-control estimator. Unlike classical SC (Forward Difference-in-Differences (FDID), Two-Step Synthetic Control) or robust-SC variants (Cluster Synthetic Controls (CLUSTERSC), Proximal Inference Synthetic Control (PROXIMAL)), which treat the ordering of pre-intervention time indices as interchangeable, TASC explicitly models the temporal evolution of the latent factors driving the panel. It embeds the standard SC outcome matrix inside a linear-Gaussian state-space model with a constant trend matrix \(A\), fits the model parameters via the Expectation-Maximization (EM) algorithm with a Kalman-filter + Rauch-Tung-Striebel (RTS) smoother E-step and a closed-form M-step, and produces both a point counterfactual and a posterior-based confidence band in one pass.

Two structural properties distinguish TASC from the rest of the mlsynth toolkit:

  • Time-awareness. Because \(A\) is shared across periods, permuting the pre-intervention time indices changes the fit. Permutation-invariant methods (classical SC, robust SC, nuclear-norm matrix completion) produce identical counterfactuals under the same permutation; TASC does not. Section 5.1 of the paper demonstrates this empirically, and Proposition A.1 in Appendix A formalizes it via a data-processing-inequality argument.

  • Approximately low-rank signal under omnidirectional noise. The observation matrix decomposes as \(Y = H X + E\), where \(H X\) is exactly rank \(d\) and \(E\) is full-rank observation noise, so \(Y\) itself is only approximately low rank. TASC therefore tolerates substantial measurement noise, even where PCA-style denoising (used by Robust SC) breaks down, because it does not assume the principal directions of \(Y\) are noise-free.
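The time-awareness point can be checked numerically. The sketch below is illustrative only and is not the mlsynth implementation. It runs on a hypothetical noise-free latent path and uses two stand-ins: an unconstrained least-squares donor regression in place of a permutation-invariant SC fit, and a plain least-squares VAR(1) fit in place of TASC's learned \(A\).

```python
import numpy as np

rng = np.random.default_rng(0)
T0, n_donors = 30, 5

# Hypothetical 2-d latent state following x_t = A x_{t-1}: a damped rotation (noise-free).
theta = 0.3
A_true = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
X = np.empty((T0, 2))
X[0] = [1.0, 0.0]
for t in range(1, T0):
    X[t] = A_true @ X[t - 1]

H_donors = rng.normal(size=(n_donors, 2))
Y_donors = H_donors @ X.T + 0.1 * rng.normal(size=(n_donors, T0))   # (n, T0)
y_target = np.array([1.0, -0.5]) @ X.T + 0.1 * rng.normal(size=T0)   # (T0,)

def ls_donor_weights(y, Yd):
    """Unconstrained least-squares donor weights: a permutation-invariant stand-in for SC."""
    w, *_ = np.linalg.lstsq(Yd.T, y, rcond=None)
    return w

def var1_fit(X):
    """Least-squares VAR(1) fit x_t ~ A x_{t-1}; rows of X are time periods."""
    At, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return At.T

perm = rng.permutation(T0)
w = ls_donor_weights(y_target, Y_donors)
w_perm = ls_donor_weights(y_target[perm], Y_donors[:, perm])
A_hat, A_perm = var1_fit(X), var1_fit(X[perm])
```

Permuting the periods leaves the regression weights unchanged. It does, however, destroy the lag structure that the VAR(1) fit relies on, and TASC's \(A\) relies on the same structure.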

When to use TASC instead of something else

The Rho-Illick-Narasipura-Abadie-Hsu-Misra (2026) paper runs a four-cell (2 × 2) ablation comparing TASC against vanilla SC, Robust SC, and the Causal Impact Model under independent variation of the observation-noise covariance \(R\) and the state-perturbation covariance \(Q\) (Section 5.2, Figures 3-4 of the paper). The practical recommendations:

  • Use TASC when observation noise is high. Across the two large-\(R\) cells (small-\(Q\) and large-\(Q\)) TASC delivers the smallest median RMSE in the paper’s simulation. PCA-style denoising (Robust SC) and simplex shrinkage (vanilla SC) break down because they assume the principal directions of the observation matrix are noise-free; TASC’s full-rank noise model \(r_t \sim \mathcal{N}(0, R)\) is a much better fit when the noise is omnidirectional.

  • Use TASC when the donor panel has a persistent, smoothly-varying trend. “Persistent” means the trend extends past the intervention point. This is the strong-trend regime (small \(Q\), non-trivial \(A\)). The Kalman + RTS smoother extrapolates the trend forward; PCA / nuclear-norm methods don’t.

  • Use TASC when you need a posterior credible band for free. TASC is a generative model. The RTS smoother returns the full posterior covariance at every period, so a +/- 1.96 sigma band on the counterfactual (at the default \(\alpha = 0.05\)) is part of the fit’s output. Among the other mlsynth estimators, only Bayesian Synthetic Control with a Soft Simplex Constraint (BVS-SS) (Bayesian spike-and-slab) also ships credible bands; the rest require an external bootstrap or subsampling pass.

When not to reach for TASC:

  • The pre-intervention trend is weak or absent (the paper’s “\(A \approx 0\)”, large-\(Q\) regime). The smaller the trend, the smaller TASC’s edge over classical SC; in the small-\(R\), large-\(Q\) cell of the paper’s ablation, vanilla SC matches or beats TASC.

  • Observation noise is small and the signal is cleanly low-dimensional. Under small \(R\), hard singular-value thresholding (Robust SC) recovers the signal almost exactly, and TASC’s full-rank, omnidirectional \(R\) pays for flexibility it doesn’t need.

  • Long-horizon forecasting in noisy regimes. The paper’s Figures 5-6 show that under large \(R\) and large \(Q\), TASC’s RMSE rises noticeably from horizon 51-60 to horizon 91-100 (in the small-\(Q\) cells it stays stable). If you need a 5-year-out counterfactual on a noisy panel, look at Factor Model Approach (FMA) or Matrix Completion with Nuclear Norm Minimization (MCNNM) first.

  • Time indices are not really ordered (you’re modelling a cross-section that happens to be indexed by time, or the periods are interchangeable up to relabelling). In the paper’s controlled test (Section 5.1, Figure 2), permuting the time indices raises TASC’s mean RMSE by 48.5% and its RMSE standard deviation by 25.7%, which shows how heavily the fit leans on the ordering. If the time ordering is meaningless, use a permutation-invariant estimator like Two-Step Synthetic Control or Cluster Synthetic Controls (CLUSTERSC).

Assumptions (and how to spot violations)

TASC inherits the assumptions of a linear-Gaussian state-space model. Section 3 of the paper lays them out; the practitioner-facing restatement is:

  1. Linear-Gaussian dynamics. The hidden state evolves as \(x_t = A x_{t-1} + q_{t-1}\) with \(q_{t-1}\) zero-mean Gaussian. Equivalently, the trend in the latent factors is well-approximated by a stable first-order vector autoregression (VAR(1)) in the state vector, and the perturbations around the trend are homoscedastic and uncorrelated across time.

    Plausibly violated when: the latent factor evolution is strongly nonlinear (regime switches, structural breaks), has fat-tailed shocks, or shows volatility clustering. Diagnostic: examine the smoothed state residuals from the pre-period fit; non-Gaussian QQ-plot tails or Ljung-Box-significant autocorrelation suggest misspecification.

  2. Constant trend matrix \(A\). The dynamics that hold over the pre-period are assumed to continue unchanged through the post-period. This is the “trend persists past the intervention point” assumption that gives TASC its long-horizon advantage.

    Plausibly violated when: the intervention itself triggers a regime change in the donor units (e.g., a tax change that affects neighbouring states’ growth dynamics, not just their levels). TASC is by construction unable to detect a post-period change in \(A\): EM learns \(A\) from the pre-period alone and holds it fixed during the post-period pass, in which the target’s outcomes are treated as missing. Diagnostic: split the pre-period into two halves and refit on each. If the estimated \(A\) differs materially, the constant-\(A\) assumption is shaky and the post-period forecast is suspect.

  3. Observation model \(y_t = H x_t + r_t\), \(R\) full rank, \(d \ll \min(n, T)\). The signal is low-rank with rank \(d\) (the latent-state dimension); the noise \(r_t\) is Gaussian with a positive-definite covariance. Importantly, \(R\) does NOT have to be diagonal: TASC handles correlated cross-donor noise via the full \(R\) (set diagonal_R=False in the config).

    Plausibly violated when: the residual cross-section is rank-deficient (some donors are exact linear combinations of others, e.g., aggregated subseries paired with their components), or the true signal is full-rank (no shared factors; every donor moves independently). In both cases the low-rank-signal-plus-full-rank-noise split is misspecified, so any fixed \(d\) either underfits (too low) or overfits (too high).

  4. No unobserved confounders that shift the treated unit’s outcome, but not the donors’, between \(T_0\) and \(T_0 + 1\). This is the standard SC unconfoundedness assumption, not specific to TASC, but worth restating: TASC’s counterfactual is informative about the treatment effect only if any post-period shock to the donor pool is also reflected in what the target would have done absent treatment.

    Plausibly violated when: a covariate that drives the treated unit’s outcome (but is uncorrelated with the donors) shifts at the intervention time. TASC has no covariate hook, so this kind of confounding can only be diagnosed externally.

  5. Hidden-state dimension \(d\) correctly specified. TASC takes \(d\) as a user hyperparameter (the d field of TASCConfig). The paper’s Section 5.3 shows that underestimating \(d\) is worse than overestimating it; if in doubt, err on the high side.

    Plausibly violated when: the data has more latent factors than you’ve allowed for. Diagnostic: increase \(d\) and refit; if the RMSE on a held-out pre-period segment drops materially, you were underfitting.
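The split-half diagnostic for assumption 2 can be approximated cheaply before a full refit. The sketch below is a proxy, not the TASC refit described above. It takes one rank-\(d\) SVD latent path for the whole pre-period, fits a least-squares VAR(1) on each half, and compares the eigenvalues. The helper names and the toy panels are hypothetical.

```python
import numpy as np

def latent_path(Y_pre, d):
    """Rank-d SVD latent path, shape (T0, d); one basis shared by both halves."""
    U, s, Vt = np.linalg.svd(Y_pre, full_matrices=False)
    return Vt[:d].T * s[:d]

def var1_fit(X):
    """Least-squares A in x_t ~ A x_{t-1}; rows of X are periods."""
    At, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return At.T

def split_half_eigenvalues(Y_pre, d):
    """Eigenvalues of the VAR(1) matrix fitted separately on each half of the pre-period."""
    X = latent_path(Y_pre, d)
    half = X.shape[0] // 2
    eig_first = np.sort_complex(np.linalg.eigvals(var1_fit(X[:half])))
    eig_second = np.sort_complex(np.linalg.eigvals(var1_fit(X[half:])))
    return eig_first, eig_second

# Hypothetical rank-1 panels: one with a constant decay rate, one with a break at t = 10.
h = np.array([1.0, 2.0, -1.0])
t = np.arange(20)
x_stable = 0.9 ** t
x_break = np.where(t < 10, 0.9 ** t, 0.9 ** 9 * 0.5 ** (t - 9))

stable = split_half_eigenvalues(np.outer(h, x_stable), d=1)
broken = split_half_eigenvalues(np.outer(h, x_break), d=1)
```

Eigenvalues are compared rather than raw entries because they summarize the dynamics in a single number per mode. A material gap between the halves, like the 0.9 vs 0.5 in the broken panel, is the signal to distrust the constant-\(A\) forecast.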

Mathematical Formulation

Let \(Y \in \mathbb{R}^{N \times T}\) be the outcome matrix with units in rows and periods in columns. The first row corresponds to the treated target unit, whose outcome at period \(t\) is written \(y_{0, t}\) below (the first entry of \(y_t\)); the remaining \(n = N - 1\) rows are donors. Pre-intervention periods are \(t = 1, \dots, T_0\). The post-intervention window is \(t = T_0 + 1, \dots, T\). During that window the target’s untreated outcome is unobserved, and it is the very quantity TASC reconstructs.

State-Space Model

The TASC generative model (Eqs. (2)-(3) of the paper) is a classical linear-Gaussian state-space model:

\[\begin{split}\begin{aligned} x_t &= A \, x_{t-1} + q_{t-1}, & q_{t-1} &\sim \mathcal{N}(0, Q), \\ y_t &= H \, x_t + r_t, & r_t &\sim \mathcal{N}(0, R), \end{aligned}\end{split}\]

with initial state \(x_0 \sim \mathcal{N}(m_0, P_0)\). The hidden state \(x_t \in \mathbb{R}^d\) has dimension \(d \ll \min(n, T)\), which is precisely what preserves the low-rank structure of the signal \(H X\). The complete parameter set is

\[\theta \;=\; \{A, H, Q, R, m_0, P_0\}, \quad A \in \mathbb{R}^{d \times d}, \quad H \in \mathbb{R}^{N \times d}, \quad Q \in \mathbb{R}^{d \times d}, \quad R \in \mathbb{R}^{N \times N}.\]

All three covariance matrices \(Q, R, P_0\) are positive definite. The TASCConfig flags diagonal_Q and diagonal_R control whether the M-step constrains \(Q\) and \(R\) to be diagonal (the paper’s default; see Algorithm 7) or updates the full symmetric covariance.
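To see the low-rank-plus-full-rank-noise structure concretely, the sketch below draws a panel from the model above and checks ranks: \(HX\) has rank \(d\), while \(Y\) is full rank. The parameter values are hypothetical. This illustrates the generative model and is not an mlsynth routine.

```python
import numpy as np

def simulate_panel(A, H, Q, R, m0, P0, T, rng):
    """Draw X (d, T) and Y (N, T) from x_t = A x_{t-1} + q_{t-1}, y_t = H x_t + r_t."""
    d, N = A.shape[0], H.shape[0]
    x = rng.multivariate_normal(m0, P0)                       # x_0 ~ N(m_0, P_0)
    X = np.empty((d, T))
    for t in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(d), Q)   # state transition
        X[:, t] = x
    E = rng.multivariate_normal(np.zeros(N), R, size=T).T     # r_t stacked as columns
    return X, H @ X + E

rng = np.random.default_rng(1)
d, N, T = 2, 8, 40
A = 0.9 * np.eye(d)
H = rng.normal(size=(N, d))
X, Y = simulate_panel(A, H, 0.1 * np.eye(d), 0.5 * np.eye(N), np.zeros(d), np.eye(d), T, rng)

signal_rank = np.linalg.matrix_rank(H @ X)   # rank of the signal H X
panel_rank = np.linalg.matrix_rank(Y)        # rank of the noisy panel Y
```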

Relationship to the Linear Factor Model

The classical SC linear factor model from Abadie & Gardeazabal (2003),

\[Y_{i,t} \;=\; \delta_t + \theta_t^\top Z_i + \lambda_t^\top \mu_i + \epsilon_{i,t},\]

can be cast as a state-space model with latent state \(x_t = (\delta_t, \theta_t, \lambda_t)\) and observation rows \(h_i = (1, Z_i, \mu_i)\). The crucial distinction is that linear factor models impose no dynamics on \(x_t\) (equivalently, \(A = 0\) and \(x_t = q_{t-1}\)), whereas TASC enforces a stable trend through \(A\). This is what gives TASC its long-horizon forecast accuracy under correct specification, at the cost of greater sensitivity to misspecification when temporal dynamics are complex.
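The mapping from the factor model to the observation equation can be verified directly. The sketch below uses hypothetical random values; note that `theta_t` here is the factor-model coefficient \(\theta_t\), not the parameter set \(\theta\). Stacking \(h_i = (1, Z_i, \mu_i)\) into \(H\) and \(x_t = (\delta_t, \theta_t, \lambda_t)\) into the columns of \(X\) reproduces the noise-free factor-model panel exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, r, f = 4, 6, 2, 3                      # units, periods, covariates, latent factors
Z, mu = rng.normal(size=(N, r)), rng.normal(size=(N, f))          # unit-level Z_i, mu_i
delta = rng.normal(size=T)                                          # delta_t
theta_t, lam = rng.normal(size=(T, r)), rng.normal(size=(T, f))     # theta_t, lambda_t

# Factor model evaluated entrywise (noise omitted): delta_t + theta_t' Z_i + lambda_t' mu_i
Y_factor = delta[None, :] + Z @ theta_t.T + mu @ lam.T              # (N, T)

# Same panel as H X: rows h_i = (1, Z_i, mu_i), columns x_t = (delta_t, theta_t, lambda_t)
H = np.hstack([np.ones((N, 1)), Z, mu])                             # (N, 1 + r + f)
X = np.vstack([delta[None, :], theta_t.T, lam.T])                   # (1 + r + f, T)
Y_state = H @ X
```

With \(A = 0\) the transition equation adds nothing beyond this static mapping. What TASC adds is a nonzero \(A\) linking \(x_{t-1}\) to \(x_t\).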

The Counterfactual via Infinite-Variance Kalman Filtering

In the post-intervention window the target’s observed value is unavailable. TASC handles this by formally setting the target’s observation-noise variance to \(+\infty\) (Section 4.2 of the paper). Partition

\[\begin{split}y_t = \begin{pmatrix} y_{t,1} \\ y_{t,2} \end{pmatrix}, \quad H = \begin{pmatrix} h_1^\top \\ H_2 \end{pmatrix}, \quad R' = \begin{pmatrix} \infty & 0 \\ 0 & R_2 \end{pmatrix},\end{split}\]

where \(y_{t,2}, r_{t,2} \in \mathbb{R}^n\), \(H_2 \in \mathbb{R}^{n \times d}\), and \(R_2 \in \mathbb{R}^{n \times n}\). Under \(R'\), the Schur-complement inverse of the innovation covariance

\[\begin{split}(S_k)^{-1} \;=\; \begin{pmatrix} 0 & 0 \\ 0 & (H_2 P_{k|k-1} H_2^\top + R_2)^{-1} \end{pmatrix}\end{split}\]

is zero in its first block row and column. The Kalman gain therefore picks up no contribution from the target row, and the post-intervention filter update depends only on the donor block. This is implemented in mlsynth.utils.tasc_helpers.filtering.kalman_filter_inf_variance_step() (Algorithm 5), and the full forward pass is composed by mlsynth.utils.tasc_helpers.filtering.kalman_filter_full() following Algorithm 3.
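The equivalence between the infinite-variance update and a donor-only update can be sketched in a few lines. This is a minimal NumPy illustration, not the code of kalman_filter_inf_variance_step(), and the function names are hypothetical. It assumes \(y \in \mathbb{R}^N\), \(H \in \mathbb{R}^{N \times d}\), and a noise covariance with zero target/donor cross terms, as in \(R'\). It compares the donor-only update with a standard update in which \(R_{1,1}\) is merely very large.

```python
import numpy as np

def kalman_step(y, m_prev, P_prev, A, H, Q, R):
    """Standard Kalman predict + update for observation y = H x + r."""
    m_pred = A @ m_prev
    P_pred = A @ P_prev @ A.T + Q
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T      # gain P_pred H^T S^{-1} (S, P_pred symmetric)
    m = m_pred + K @ (y - H @ m_pred)
    P = P_pred - K @ S @ K.T
    return m, P

def kalman_step_target_missing(y_donors, m_prev, P_prev, A, H, Q, R):
    """R_{1,1} = inf: the target row drops out, so update with the donor block (H_2, R_2)."""
    return kalman_step(y_donors, m_prev, P_prev, A, H[1:], Q, R[1:, 1:])

rng = np.random.default_rng(3)
d, N = 2, 5
A, Q = 0.9 * np.eye(d), 0.1 * np.eye(d)
H = rng.normal(size=(N, d))
R = np.diag(rng.uniform(0.5, 1.0, size=N))    # zero target/donor cross terms, as in R'
m_prev, P_prev = np.zeros(d), np.eye(d)
y = rng.normal(size=N)

m_inf, P_inf = kalman_step_target_missing(y[1:], m_prev, P_prev, A, H, Q, R)

# A huge but finite R_{1,1} should give (numerically) the same update, whatever the target value.
R_big = R.copy()
R_big[0, 0] = 1e12
m_big, P_big = kalman_step(np.r_[123.0, y[1:]], m_prev, P_prev, A, H, Q, R_big)
```

Dropping the target row avoids forming an actual infinity. Because the donor-only update is the exact limit, the arbitrary value placed in the target slot has no effect.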

Once the forward pass produces \((m_k, P_k)_{k=0}^T\), the backward Rauch-Tung-Striebel smoother (mlsynth.utils.tasc_helpers.smoothing.rts_smoother(), Algorithm 6) returns the smoothed posterior

\[m_k^s, \; P_k^s, \; G_k \quad \text{for } k = T, T-1, \dots, 0,\]

with

\[\begin{split}\begin{aligned} m_{k+1|k} &= A \, m_k, \\ P_{k+1|k} &= A \, P_k \, A^\top + Q, \\ G_k &= P_k \, A^\top \, P_{k+1|k}^{-1}, \\ m_k^s &= m_k + G_k \left( m_{k+1}^s - m_{k+1|k} \right), \\ P_k^s &= P_k + G_k \left( P_{k+1}^s - P_{k+1|k} \right) G_k^\top. \end{aligned}\end{split}\]
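The recursion above translates almost line for line into code. The sketch below is an illustration, not the rts_smoother() source. It assumes the array layout described for TASCFilteredStates: index 0 holds the prior, and the shapes are (T + 1, d) and (T + 1, d, d). Like the mlsynth docstring, it leaves \(G_T\) zero-filled.

```python
import numpy as np

def rts_smoother(m, P, A, Q):
    """Backward RTS pass over filtered means m (T+1, d) and covariances P (T+1, d, d)."""
    T = m.shape[0] - 1
    m_s, P_s = m.copy(), P.copy()          # at k = T the smoothed and filtered posteriors coincide
    G = np.zeros_like(P)                   # G[T] stays zero and unused
    for k in range(T - 1, -1, -1):
        m_pred = A @ m[k]                  # m_{k+1|k}
        P_pred = A @ P[k] @ A.T + Q        # P_{k+1|k}
        # G_k = P_k A^T P_{k+1|k}^{-1}, via a solve (both covariances are symmetric)
        G[k] = np.linalg.solve(P_pred, A @ P[k]).T
        m_s[k] = m[k] + G[k] @ (m_s[k + 1] - m_pred)
        P_s[k] = P[k] + G[k] @ (P_s[k + 1] - P_pred) @ G[k].T
    return m_s, P_s, G

# Scalar random walk (A = Q = 1) with hypothetical filtered values, small enough to check by hand.
m = np.array([[0.0], [1.0], [2.0]])
P = np.ones((3, 1, 1))
m_s, P_s, G = rts_smoother(m, P, np.eye(1), np.eye(1))
```

Working the recursion by hand for this example gives \(m^s = (0.75, 1.5, 2)\) and \(P^s = (0.6875, 0.75, 1)\). Smoothing pulls the earlier means toward later evidence and shrinks their variance.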

The counterfactual for the target unit is then read off the smoothed latent state via \(h_1\):

\[\hat y_{0, t} \;=\; h_1^\top \, m_t^s, \qquad t = 1, \dots, T,\]

and the posterior variance of the observation (not just the latent target) is

\[\operatorname{Var}(y_{0, t} \mid y_{1:T_0}, y_{2:N, \, T_0+1:T}) \;=\; h_1^\top \, P_t^s \, h_1 \;+\; R_{1, 1}.\]

The corresponding \((1 - \alpha)\)-confidence band is

\[\hat y_{0, t} \;\pm\; z_{1 - \alpha / 2} \, \sqrt{\,h_1^\top P_t^s h_1 + R_{1, 1}\,}.\]

These are populated unconditionally on TASCResults.inference, with \(\alpha\) controlled by the TASCConfig.alpha field.
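Given smoothed states, the point path and the band are a few lines of arithmetic. The sketch below assumes the smoothed-state layout documented for counterfactual_with_ci(): index 0 is the smoothed prior, indices 1..T are the periods, H[0] is \(h_1\), and R[0, 0] is \(R_{1,1}\). It uses the standard library's NormalDist for \(z_{1-\alpha/2}\). It is not the mlsynth source.

```python
from statistics import NormalDist

import numpy as np

def counterfactual_band(m_s, P_s, H, R, alpha=0.05):
    """Return y_hat, lower, upper, variance for t = 1..T from smoothed states."""
    h1 = H[0]
    y_hat = m_s[1:] @ h1                                         # h_1^T m_t^s
    var = np.einsum("i,tij,j->t", h1, P_s[1:], h1) + R[0, 0]     # h_1^T P_t^s h_1 + R_{1,1}
    z = NormalDist().inv_cdf(1 - alpha / 2)                      # about 1.96 for alpha = 0.05
    sd = np.sqrt(var)
    return y_hat, y_hat - z * sd, y_hat + z * sd, var

# Hypothetical smoothed states for T = 3 periods with d = 2.
m_s = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 1.0]])
P_s = np.zeros((4, 2, 2))
P_s[1] = np.eye(2)
H = np.array([[1.0, 2.0], [0.5, -1.0]])
R = np.diag([4.0, 1.0])
y_hat, lower, upper, var = counterfactual_band(m_s, P_s, H, R)
```

The first period has extra latent uncertainty, so its variance is \(h_1^\top h_1 + R_{1,1} = 5 + 4 = 9\). The other periods carry only the observation noise \(R_{1,1} = 4\).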

Learning \(\theta\) from Pre-Intervention Data (EM)

The parameter set \(\theta\) is learned by Expectation-Maximization on the pre-intervention slice \(Y_{\text{pre}} \in \mathbb{R}^{N \times T_0}\). Each outer iteration of mlsynth.utils.tasc_helpers.em.em_pre() (Algorithm 2) runs:

  1. E-step (filtering pass): apply the standard Kalman filter (Algorithm 4) for \(k = 1, \dots, T_0\) to obtain \((m_k, P_k)\).

  2. E-step (smoothing pass): apply the RTS smoother backward to obtain \((m_k^s, P_k^s, G_k)\) for \(k = T_0, \dots, 0\).

  3. M-step (closed-form MLE update): Algorithm 7. Define the sufficient statistics

    \[\begin{split}\begin{aligned} \Sigma &= \frac{1}{T_0} \sum_{k=1}^{T_0} \left( P_k^s + m_k^s {m_k^s}^\top \right), & \Phi &= \frac{1}{T_0} \sum_{k=1}^{T_0} \left( P_{k-1}^s + m_{k-1}^s {m_{k-1}^s}^\top \right), \\ B &= \frac{1}{T_0} \sum_{k=1}^{T_0} y_k \, {m_k^s}^\top, & C &= \frac{1}{T_0} \sum_{k=1}^{T_0} \left( P_k^s G_{k-1}^\top + m_k^s {m_{k-1}^s}^\top \right), \\ D &= \frac{1}{T_0} \sum_{k=1}^{T_0} y_k \, y_k^\top. \end{aligned}\end{split}\]

    The update is then

    \[\begin{split}\begin{aligned} A' &\leftarrow C \, \Phi^{-1}, & H' &\leftarrow B \, \Sigma^{-1}, \\ Q' &\leftarrow \operatorname{Diag}\!\left( \Sigma - 2 C A'^\top + A' \Phi A'^\top \right), & R' &\leftarrow \operatorname{Diag}\!\left( D - 2 B H'^\top + H' \Sigma H'^\top \right), \\ m_0' &\leftarrow m_0^s, & P_0' &\leftarrow P_0^s + (m_0^s - m_0)(m_0^s - m_0)^\top, \end{aligned}\end{split}\]

    where \(\operatorname{Diag}(\cdot)\) zeroes the off-diagonal entries when diagonal_Q=True / diagonal_R=True (the paper’s default) or returns the symmetric matrix unchanged otherwise.
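The sufficient statistics and updates above can be sketched in NumPy as follows. This is a minimal illustration, and the function is hypothetical, not the mlsynth code. It assumes smoothed arrays indexed 0..T0 with index 0 the smoothed prior, as in the m_step() docstring. It writes \(Q'\) and \(R'\) in the expanded symmetric form \(\Sigma - C A'^\top - A' C^\top + A' \Phi A'^\top\), which equals the \(-2 C A'^\top\) form at \(A' = C \Phi^{-1}\).

```python
import numpy as np

def m_step(Y, m_s, P_s, G, m0_prev, diagonal_Q=True, diagonal_R=True):
    """Closed-form updates; Y is (N, T0), m_s (T0+1, d), P_s and G (T0+1, d, d)."""
    T0 = Y.shape[1]
    cur, prev = m_s[1:], m_s[:-1]                        # m_k^s and m_{k-1}^s, k = 1..T0
    Sigma = (P_s[1:].sum(axis=0) + cur.T @ cur) / T0
    Phi = (P_s[:-1].sum(axis=0) + prev.T @ prev) / T0
    B = Y @ cur / T0
    # C: average of P_k^s G_{k-1}^T + m_k^s m_{k-1}^s^T
    C = (np.einsum("kij,klj->il", P_s[1:], G[:-1]) + cur.T @ prev) / T0
    D = Y @ Y.T / T0

    A = np.linalg.solve(Phi, C.T).T                      # C Phi^{-1}
    H = np.linalg.solve(Sigma, B.T).T                    # B Sigma^{-1}
    Q = Sigma - C @ A.T - A @ C.T + A @ Phi @ A.T
    R = D - B @ H.T - H @ B.T + H @ Sigma @ H.T
    Q, R = (Q + Q.T) / 2, (R + R.T) / 2                  # guard against round-off asymmetry
    if diagonal_Q:
        Q = np.diag(np.diag(Q))
    if diagonal_R:
        R = np.diag(np.diag(R))
    m0 = m_s[0]
    P0 = P_s[0] + np.outer(m_s[0] - m0_prev, m_s[0] - m0_prev)
    return A, H, Q, R, m0, P0

# Exact, zero-variance hypothetical states: x_k = 0.5 x_{k-1}, y_k = (2, 3)^T x_k.
m_s = np.array([[1.0], [0.5], [0.25], [0.125]])
P_s = np.zeros((4, 1, 1))
G = np.zeros((4, 1, 1))
Y = np.array([[2.0], [3.0]]) @ m_s[1:].T
A, H, Q, R, m0, P0 = m_step(Y, m_s, P_s, G, m0_prev=np.zeros(1))
```

On this noise-free input the update recovers \(A = 0.5\) and \(H = (2, 3)^\top\) with zero residual covariances. That makes it a quick sanity check on the index alignment of \(C\).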

The loop terminates after TASCConfig.n_em_iter outer iterations or, if TASCConfig.em_tol is set, as soon as the maximum absolute change in \((A, H)\) between successive iterations falls below the threshold.
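The stopping rule can be sketched independently of the E- and M-steps. Below, a hypothetical step callable stands in for one filter + smoother + M-step pass. The loop records the max-abs change in \((A, H)\) at each iteration, as em_param_deltas does, and stops once that change drops below em_tol. Whether mlsynth uses a strict or non-strict comparison is an assumption here.

```python
import numpy as np

def em_loop(step, A, H, n_em_iter, em_tol=None):
    """Run step(A, H) -> (A_new, H_new) up to n_em_iter times; stop early on small changes."""
    deltas = []
    for _ in range(n_em_iter):
        A_new, H_new = step(A, H)
        delta = max(np.abs(A_new - A).max(), np.abs(H_new - H).max())
        deltas.append(delta)
        A, H = A_new, H_new
        if em_tol is not None and delta < em_tol:
            break
    return A, H, np.array(deltas)

# Toy contraction standing in for one full EM pass: every entry halves each iteration.
A_fin, H_fin, deltas = em_loop(lambda A, H: (A / 2, H / 2),
                               np.ones((1, 1)), np.ones((3, 1)),
                               n_em_iter=50, em_tol=0.2)
```

Here the per-iteration changes are 0.5, 0.25, 0.125, so the loop stops after three iterations instead of fifty. With em_tol=None it always runs the full n_em_iter.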

Spectral Initialization

EM is sensitive to initialization (as noted in the paper’s Section 7). TASC therefore warm-starts \(\theta^{(0)}\) from a truncated SVD of the pre-intervention matrix (mlsynth.utils.tasc_helpers.setup.initialize_parameters()):

\[Y_{\text{pre}} \;=\; U \, \operatorname{diag}(s) \, V^\top, \qquad H^{(0)} = U_{:, 1:d} \, \operatorname{diag}(s_{1:d}), \qquad X^{(0)} = V_{:, 1:d}.\]

The transition matrix \(A^{(0)}\) is obtained from a ridge-regularized AR(1) least-squares fit on the latent trajectory \(X^{(0)}\); \(Q^{(0)}\) and \(R^{(0)}\) are seeded from the corresponding residual variances; and \(m_0^{(0)}, P_0^{(0)}\) are taken from the first row of \(X^{(0)}\) and \(Q^{(0)}\) respectively.
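Below is a minimal sketch of this warm start under three assumptions: diagonal seeds for \(Q^{(0)}\) and \(R^{(0)}\), a hypothetical ridge penalty of 1e-6 that also floors the variances so they stay positive definite, and the scaling convention written above, with the singular values folded into \(H^{(0)}\). The initialize_parameters() docstring instead describes folding them into the latent path. Either split reproduces the same rank-\(d\) approximation of \(Y_{\text{pre}}\).

```python
import numpy as np

def spectral_init(Y_pre, d, ridge=1e-6):
    """Warm-start (A, H, Q, R, m0, P0) from a truncated SVD of Y_pre (N, T0)."""
    U, s, Vt = np.linalg.svd(Y_pre, full_matrices=False)
    H0 = U[:, :d] * s[:d]                      # H^(0) = U_{:,1:d} diag(s_{1:d})
    X0 = Vt[:d].T                              # X^(0) = V_{:,1:d}, rows are periods
    prev, nxt = X0[:-1], X0[1:]
    # Ridge-regularized AR(1): nxt ~ prev A^T
    A0 = np.linalg.solve(prev.T @ prev + ridge * np.eye(d), prev.T @ nxt).T
    Q0 = np.diag((nxt - prev @ A0.T).var(axis=0) + ridge)      # state residual variances
    R0 = np.diag((Y_pre - H0 @ X0.T).var(axis=1) + ridge)      # observation residual variances
    m00, P00 = X0[0].copy(), Q0.copy()                          # first latent row, and Q^(0)
    return A0, H0, Q0, R0, m00, P00

rng = np.random.default_rng(4)
Y_pre = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 20))    # hypothetical rank-2 panel
A0, H0, Q0, R0, m00, P00 = spectral_init(Y_pre, d=2)
```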

Treatment Effect and Pre-Period Fit

For post-treatment periods \(t = T_0 + 1, \dots, T\), the average treatment effect on the treated is the mean of the post-period gap between the observed target and its TASC reconstruction:

\[\widehat{\mathrm{ATT}} \;=\; \frac{1}{T - T_0} \sum_{t = T_0 + 1}^{T} \left( y_{0, t} - \hat y_{0, t} \right),\]

reported as TASCResults.att. The pre-period RMSE between the observed target and the smoother’s pre-treatment fit,

\[\mathrm{RMSE}_{\text{pre}} \;=\; \sqrt{ \frac{1}{T_0} \sum_{t = 1}^{T_0} \left( y_{0, t} - \hat y_{0, t} \right)^2 },\]

is reported as TASCResults.pre_rmse and serves as the primary fit diagnostic.
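Both summary numbers follow directly from the observed target path and its reconstruction. The helper below is a hypothetical stand-in, not summarize_effects() itself. It assumes 1-D arrays over all \(T\) periods, with the first \(T_0\) entries pre-treatment.

```python
import numpy as np

def att_and_pre_rmse(y_target, y_hat, T0):
    """Post-period mean gap (ATT) and pre-period RMSE, returned as plain floats."""
    gap = np.asarray(y_target, dtype=float) - np.asarray(y_hat, dtype=float)
    att = gap[T0:].mean()
    pre_rmse = np.sqrt(np.mean(gap[:T0] ** 2))
    return float(att), float(pre_rmse)

att, pre_rmse = att_and_pre_rmse([1, 2, 3, 10, 12], [1, 2, 4, 8, 9], T0=3)
# gaps are (0, 0, -1, 2, 3): ATT = 2.5, pre-RMSE = sqrt(1/3), about 0.577
```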

Complexity

The dominant cost of TASC is \(O(N_1 \, T_0 \, N^3)\), where \(N_1\) is the number of EM iterations and the \(N^3\) term arises from inverting the innovation covariance during the Kalman filter. The post-EM full-window pass adds \(O(T \, N^3)\), which is negligible when \(T \ll N_1 \, T_0\). Constraining \(R\) to be diagonal in the M-step (the default) does not change the filter’s inner-loop complexity. It does cut the number of free noise parameters from \(N(N+1)/2\) to \(N\), which lowers estimation variance and improves numerical stability in moderate-\(N\) regimes.
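A quick back-of-the-envelope helper makes the orders of magnitude concrete. The sizes used (N = 40 units, T0 = 30, T = 40, 50 EM iterations) are hypothetical. The counts are leading-order terms with constants dropped, not measured runtimes.

```python
def cov_free_params(N, diagonal):
    """Free parameters in an N x N covariance: N if diagonal, N(N+1)/2 if full symmetric."""
    return N if diagonal else N * (N + 1) // 2

def cost_orders(n_em_iter, T0, T, N):
    """Leading-order counts: EM is N_1 * T0 * N^3, the post-EM full pass is T * N^3."""
    return n_em_iter * T0 * N ** 3, T * N ** 3

em_cost, full_pass_cost = cost_orders(n_em_iter=50, T0=30, T=40, N=40)   # 96_000_000 and 2_560_000
diag_params, full_params = cov_free_params(40, True), cov_free_params(40, False)  # 40 and 820
```

At these sizes the post-EM pass is under 3% of the EM cost. A full \(R\) would carry roughly 20 times as many free parameters as a diagonal one.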

Algorithm 1 and the Theoretical Appendix

The paper’s Algorithm 1 is the abstract “SC Family of Methods” frame (target-side regression on donors), which TASC instantiates implicitly through the state-space machinery rather than as a discrete code path. Appendix A’s Proposition A.1 (Kalman sufficiency, information loss by permutation invariance, dominance) is the theoretical justification for TASC’s edge over permutation-invariant SC variants; it does not correspond to a separate routine in mlsynth.

Core API

Time-Aware Synthetic Control (TASC) estimator.

Implements:

Rho, S., Illick, C., Narasipura, S., Abadie, A., Hsu, D., & Misra, V. (2026). “Time-Aware Synthetic Control.” arXiv:2601.03099.

TASC embeds the standard SC panel inside a linear-Gaussian state-space model

x_t = A x_{t-1} + q_{t-1},  q ~ N(0, Q)
y_t = H x_t + r_t,  r ~ N(0, R)

and learns the parameters theta = {A, H, Q, R, m_0, P_0} from the pre-treatment data via EM (Kalman filter + RTS smoother E-step, closed-form MLE M-step). Counterfactual inference runs a Kalman filter pass with the target’s observation-noise variance R_{1, 1} set to infinity over the post-treatment window, followed by RTS smoothing. The counterfactual is then y_hat_{0, t} = h_1^T m_t^s, with posterior CIs derived from h_1^T P_t^s h_1 + R_{1, 1}.

Algorithm flow (see mlsynth.utils.tasc_helpers):

  • setup.py : Algorithm 3 line 0 (prepare_tasc_inputs, init theta_0)
  • em.py : Algorithm 2 (EM_pre)
  • filtering.py : Algorithms 4 and 5 (Kalman filter passes)
  • smoothing.py : Algorithm 6 (RTS smoother)
  • mstep.py : Algorithm 7 (closed-form MLE M-step)
  • orchestration.py : Algorithm 3 (run_tasc + summarize_effects)
  • inference.py : Algorithm 3 footer (h_1^T m^s, posterior CIs)

class mlsynth.estimators.tasc.TASC(config: TASCConfig | dict)

Bases: object

Time-Aware Synthetic Control (TASC) estimator.

Fits a linear-Gaussian state-space model to a single-treated-unit synthetic control panel via EM, and forms counterfactual estimates and posterior confidence intervals from a Kalman / RTS pass that treats the target’s post-treatment observations as missing.

Parameters:

config (TASCConfig or dict) – Configuration object specifying the panel inputs and the EM / inference hyperparameters. See mlsynth.config_models.TASCConfig.

Returns:

TASCResults – Container with the learned model, EM diagnostics, smoothed states, counterfactual path, and posterior confidence intervals. See mlsynth.utils.tasc_helpers.structures.TASCResults.

Notes

  • In the wide matrix produced by datautils.dataprep, each column is a unit and each row is a time period. Internally, TASC transposes this to the paper’s Y in R^{N x T} orientation.

  • The hidden state dimension d should be small relative to min(n_donors, T) to preserve the low-rank structure of the signal.

  • EM is sensitive to initialization. Defaults use a spectral (top-d SVD) start which generally produces stable fits.

References

Rho, S., Illick, C., Narasipura, S., Abadie, A., Hsu, D., & Misra, V. (2026). “Time-Aware Synthetic Control.” arXiv:2601.03099.

Examples

>>> from mlsynth import TASC
>>> # panel: long, balanced pandas DataFrame with state, year, sales, and a 0/1 treated column
>>> config = {
...     "df": panel,
...     "outcome": "sales",
...     "unitid": "state",
...     "time": "year",
...     "treat": "treated",
...     "d": 2,
... }
>>> results = TASC(config).fit()
fit() → TASCResults

Run the TASC pipeline and return the full results container.

Returns:

TASCResults – Final design, inputs, posterior inference, and summary effects.

Configuration

class mlsynth.config_models.TASCConfig(*, df: pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', d: Annotated[int, Ge(ge=1)], n_em_iter: Annotated[int, Ge(ge=1)] = 50, em_tol: Annotated[float | None, Gt(gt=0.0)] = None, diagonal_Q: bool = True, diagonal_R: bool = True, alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05, seed: int | None = None)

Configuration for the Time-Aware Synthetic Control (TASC) estimator.

Implements the state-space model of Rho, Illick, Narasipura, Abadie, Hsu, and Misra (2026, arXiv:2601.03099) with EM learning (Kalman filter + RTS smoother on the E-step, closed-form MLE on the M-step) and Kalman-with-infinite-variance counterfactual inference.

alpha: float
d: int
diagonal_Q: bool
diagonal_R: bool
em_tol: float | None
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

n_em_iter: int
seed: int | None

Helper Modules

Data preparation and EM initialization helpers for TASC.

datautils.dataprep returns the wide outcome matrix in the (T, N) orientation (rows = time, columns = unit). The TASC paper works with Y in R^{N x T} (rows = unit, columns = time), so the wrappers in this module take care of the transpose once and only once.
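A minimal NumPy sketch of that one-time transpose follows. It assumes the (T, N) wide matrix is available as an array and that the target's column index is known. The function name is hypothetical. The slicing mirrors the Y_pre and Y_post_donors fields documented for TASCInputs.

```python
import numpy as np

def to_paper_layout(Ywide, target_col, T0):
    """Turn a (T, N) time-by-unit array into Y (N, T) with the target in row 0."""
    donors = np.delete(Ywide, target_col, axis=1)            # (T, N - 1), donor columns in order
    Y_full = np.vstack([Ywide[:, target_col], donors.T])     # (N, T): target row, then donors
    Y_pre = Y_full[:, :T0]                                   # (N, T0)
    Y_post_donors = Y_full[1:, T0:] if T0 < Y_full.shape[1] else None   # (N - 1, T - T0)
    return Y_full, Y_pre, Y_post_donors

Ywide = np.arange(12.0).reshape(4, 3)    # hypothetical: T = 4 periods, N = 3 units
Y_full, Y_pre, Y_post_donors = to_paper_layout(Ywide, target_col=1, T0=3)
```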

mlsynth.utils.tasc_helpers.setup.initialize_parameters(Y_pre: ndarray, d: int, seed: int | None = None) → TASCParameters

Spectral initialization for the EM parameters.

Performs a thin SVD of Y_pre (N x T0) and uses the top-d left singular vectors as the initial observation matrix H. The initial latent trajectory is then the corresponding right singular vectors scaled by the singular values, from which a simple AR(1) least squares fit gives A. Q, R, P0 are seeded from the associated residual variances.

Parameters:
  • Y_pre (np.ndarray) – Pre-treatment outcome matrix of shape (N, T0).

  • d (int) – Hidden state dimension.

  • seed (int or None) – Reserved for tie-breaking; not currently consumed but retained for API symmetry with other estimators.

Returns:

TASCParameters – Initial theta_0.

mlsynth.utils.tasc_helpers.setup.prepare_tasc_inputs(df: DataFrame, outcome: str, unitid: str, time: str, treat: str) → TASCInputs

Run dataprep and reshape the result into the paper’s N x T layout.

Parameters:
  • df (pd.DataFrame) – Long balanced panel data.

  • outcome, unitid, time, treat (str) – Column names identifying the outcome, units, time periods, and the binary treatment indicator.

Returns:

TASCInputs – Pre / post matrices in (N, T) orientation along with metadata.

Kalman filter passes for TASC.

Implements Algorithm 4 (standard Kalman filter) and Algorithm 5 (Kalman filter with infinite observation-noise variance on the target row) from Rho, Illick, Narasipura, Abadie, Hsu, Misra (2026, arXiv:2601.03099).

The “infinite variance” trick (Sec 4.2) lets the post-treatment update use only the donor rows to refine the latent state, while the target row’s contribution to the Kalman gain is zeroed out by Schur complement.

mlsynth.utils.tasc_helpers.filtering.kalman_filter_full(Y_pre: ndarray, Y_post_donors: ndarray, params: TASCParameters) → TASCFilteredStates

Forward pass over all T periods (Algorithm 3, lines 1-3).

The first T0 updates use Algorithm 4; the remaining T - T0 updates use Algorithm 5 with the target’s observation variance set to infinity.

Parameters:
  • Y_pre (np.ndarray) – Pre-treatment slice of shape (N, T0).

  • Y_post_donors (np.ndarray) – Post-treatment donor-only slice of shape (N - 1, T - T0).

  • params (TASCParameters) – EM-learned model parameters.

mlsynth.utils.tasc_helpers.filtering.kalman_filter_inf_variance_step(y_donors_k: ndarray, m_prev: ndarray, P_prev: ndarray, params: TASCParameters) → Tuple[ndarray, ndarray]

Single Kalman filter step with R_{1,1} = inf (Algorithm 5).

The target row is treated as missing. We partition

H = [h_1^T; H_2], R = diag(inf, R_2)

so that the inverse innovation covariance has zero in the (1, 1) block by Schur complement, and only the donor block contributes to the update.

Parameters:
  • y_donors_k (np.ndarray) – Donor observations at time k, shape (N - 1,).

  • m_prev, P_prev (np.ndarray) – Previous filtered mean and covariance.

  • params (TASCParameters) – Current model parameters.

mlsynth.utils.tasc_helpers.filtering.kalman_filter_pre(Y_pre: ndarray, params: TASCParameters) → TASCFilteredStates

Run Algorithm 4 across the pre-treatment window.

Parameters:
  • Y_pre (np.ndarray) – Pre-treatment outcome matrix of shape (N, T0).

  • params (TASCParameters) – Current model parameters.

mlsynth.utils.tasc_helpers.filtering.kalman_filter_step(y_k: ndarray, m_prev: ndarray, P_prev: ndarray, params: TASCParameters) → Tuple[ndarray, ndarray]

Single Kalman filter step (Algorithm 4).

Rauch-Tung-Striebel smoother for TASC (Algorithm 6).

Operates on the output of the forward Kalman pass and produces smoothed state estimates plus the smoother-gain matrices G_k required for the M-step (Algorithm 7 computes C from G_{k-1}).

mlsynth.utils.tasc_helpers.smoothing.rts_smoother(filtered: TASCFilteredStates, params: TASCParameters) → TASCSmoothedStates

Backward smoothing pass.

Parameters:
  • filtered (TASCFilteredStates) – Output of kalman_filter_pre or kalman_filter_full. Index 0 holds the prior; indices 1..T hold filtered posteriors.

  • params (TASCParameters) – Model parameters used in the forward pass.

Returns:

TASCSmoothedStates – Smoothed means and covariances at indices 0..T, plus the smoother gains G_k at indices 0..T-1. G[T] is zero-filled and unused.

Closed-form M-step for TASC (Algorithm 7).

Given the smoothed sufficient statistics, returns new theta' that maximizes the expected complete-data log-likelihood. Q and R may be constrained to be diagonal (the paper’s default) via the corresponding flags.

mlsynth.utils.tasc_helpers.mstep.m_step(Y_pre: ndarray, smoothed: TASCSmoothedStates, prev_params: TASCParameters, diagonal_Q: bool = True, diagonal_R: bool = True) → TASCParameters

Maximum-likelihood parameter update (Algorithm 7).

Parameters:
  • Y_pre (np.ndarray) – Pre-treatment outcomes, shape (N, T0). The number of columns must equal the number of smoothed time steps (T0, excluding the index-0 prior).

  • smoothed (TASCSmoothedStates) – Output of the RTS smoother. Indices 0..T-1 supply m_{k-1}^s and G_{k-1}; indices 1..T supply m_k^s.

  • prev_params (TASCParameters) – Current parameters, used to seed m_0 and P_0 updates.

  • diagonal_Q, diagonal_R (bool) – If True (default), the corresponding covariance update is restricted to its diagonal as in the paper.

EM loop on pre-intervention data (Algorithm 2: EM_pre).

Each outer iteration runs:

  1. Forward Kalman filter pass over Y_pre (Algorithm 4).

  2. Backward RTS smoother pass (Algorithm 6).

  3. Closed-form M-step update (Algorithm 7).

Optionally terminates early when the max-abs change in (A, H) falls below em_tol.

mlsynth.utils.tasc_helpers.em.em_pre(Y_pre: ndarray, init_params: TASCParameters, n_em_iter: int, em_tol: float | None = None, diagonal_Q: bool = True, diagonal_R: bool = True) → Tuple[TASCParameters, ndarray, TASCFilteredStates, TASCSmoothedStates]

Run EM_pre over the pre-treatment window.

Parameters:
  • Y_pre (np.ndarray) – Pre-treatment outcome matrix of shape (N, T0).

  • init_params (TASCParameters) – Initial parameters theta^{(0)}.

  • n_em_iter (int) – Maximum number of EM iterations (N_1 in the paper).

  • em_tol (float or None) – If not None, EM stops once the max absolute change in (A, H) is below this threshold.

  • diagonal_Q, diagonal_R (bool) – Forwarded to m_step.

Returns:

  • params (TASCParameters) – Final parameter estimate.

  • deltas (np.ndarray) – Max-abs change in (A, H) at each iteration (length equal to the number of iterations actually run).

  • filtered (TASCFilteredStates) – Forward filtered states from the final iteration.

  • smoothed (TASCSmoothedStates) – RTS smoothed states from the final iteration.

Top-level TASC procedure (Algorithm 3).

Sequence:

  1. EM_pre on the pre-treatment data to learn theta.

  2. Forward Kalman pass: Algorithm 4 on t = 1..T0 and Algorithm 5 on t = T0 + 1..T.

  3. Backward RTS smoother on the full window.

  4. Counterfactual + posterior CI computation.

mlsynth.utils.tasc_helpers.orchestration.run_tasc(inputs: TASCInputs, d: int, n_em_iter: int, em_tol: float | None, diagonal_Q: bool, diagonal_R: bool, alpha: float, seed: int | None = None)

Execute Algorithm 3 end-to-end.

Returns:

  • design (TASCDesign) – Learned model with EM diagnostics and final filtered / smoothed states across all T periods.

  • inference (TASCInference) – Counterfactual and posterior-based CIs.

mlsynth.utils.tasc_helpers.orchestration.summarize_effects(inputs: TASCInputs, inference: TASCInference) → tuple[float, float]

Compute ATT (post-period mean gap) and pre-period RMSE.

Both are returned as plain floats for storage on TASCResults.

Counterfactual inference for TASC.

Given the smoothed states from the full forward / backward pass (Algorithm 3, lines 1-3 of the post-EM block), the counterfactual is

y_hat_{0, t} = h_1^T m_t^s

and the posterior variance of the target observation is

Var(y_{0, t} | Y_pre, Y_post_donors) = h_1^T P_t^s h_1 + R_{1, 1}

(adding back the observation noise restores the variance of an observation rather than the latent target).

mlsynth.utils.tasc_helpers.inference.counterfactual_with_ci(smoothed: TASCSmoothedStates, params: TASCParameters, alpha: float) → TASCInference

Compute the TASC counterfactual path and posterior CIs.

Parameters:
  • smoothed (TASCSmoothedStates) – Smoothed states from the full pass over all T periods. Index 0 holds the smoothed prior; indices 1..T hold the smoothed posteriors for each observation timestep.

  • params (TASCParameters) – Final EM-learned parameters. H[0] is h_1 and R[0, 0] is the target’s observation-noise variance R_{1, 1}.

  • alpha (float) – Significance level. The bands are y_hat +/- z_{1 - alpha/2} * sd.

Plotting helper for TASC.

Wraps mlsynth.utils.resultutils.plot_estimates so we get the standard observed-vs-counterfactual chart with the posterior-based CI band shaded behind the counterfactual curve.

mlsynth.utils.tasc_helpers.plotter.plot_tasc(results: TASCResults, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | dict = False, time_axis_label: str = 'Time', outcome_label: str | None = None, treatment_label: str = 'Treatment', unit_label: str = 'Unit') → None

Render the TASC counterfactual against the observed series.

Structured containers for the TASC pipeline.

All matrices follow the paper’s convention Y in R^{N x T} with rows = units (target as the first row, donors below) and columns = time. This is the transpose of datautils.dataprep’s Ywide (which is time x unit). The transpose is performed once in setup.prepare_tasc_inputs.

class mlsynth.utils.tasc_helpers.structures.TASCDesign(parameters: TASCParameters, n_em_iter_used: int, em_param_deltas: ndarray, filtered: TASCFilteredStates, smoothed: TASCSmoothedStates)

Learned model and EM diagnostics.

Parameters:
  • parameters (TASCParameters) – Final EM-estimated parameters.

  • n_em_iter_used (int) – Number of EM iterations actually executed (may be less than the cap if em_tol triggered early stopping).

  • em_param_deltas (np.ndarray) – Per-iteration max absolute change in (A, H). Length equal to n_em_iter_used.

  • filtered (TASCFilteredStates) – Forward filtered states from the final full pass.

  • smoothed (TASCSmoothedStates) – Backward smoothed states from the final full pass.

em_param_deltas: ndarray
filtered: TASCFilteredStates
n_em_iter_used: int
parameters: TASCParameters
smoothed: TASCSmoothedStates
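The sketch below shows how em_param_deltas and n_em_iter_used relate. It assumes EM stops as soon as the largest absolute change in A or H drops below em_tol. run_em and the toy update function are hypothetical stand-ins for the real E- and M-steps.

```python
import numpy as np


def run_em(update, A, H, n_em_iter, em_tol=None):
    """Hypothetical EM driver that records the per-iteration parameter change.

    update(A, H) -> (A_new, H_new) stands in for one E-step + M-step.
    """
    deltas = []
    for _ in range(n_em_iter):
        A_new, H_new = update(A, H)
        # Max absolute elementwise change across both A and H.
        delta = max(np.max(np.abs(A_new - A)), np.max(np.abs(H_new - H)))
        deltas.append(delta)
        A, H = A_new, H_new
        if em_tol is not None and delta < em_tol:
            break                      # early stop: parameters have settled
    return A, H, np.array(deltas)      # len(deltas) plays the role of n_em_iter_used
```

With a toy update that halves both matrices each iteration, the deltas shrink geometrically, and em_tol cuts the loop short.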
class mlsynth.utils.tasc_helpers.structures.TASCFilteredStates(m: ndarray, P: ndarray)

Output of the forward Kalman pass.

Parameters:
  • m (np.ndarray) – Filtered means stacked over time, shape (T + 1, d). Index 0 holds the prior m0; index k >= 1 holds the posterior mean m_{k|k}.

  • P (np.ndarray) – Filtered covariances, shape (T + 1, d, d). Index 0 holds P0.

P: ndarray
m: ndarray
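Here is a minimal sketch of a forward Kalman pass that fills arrays with this layout. It makes two assumptions. First, the transition is applied to the prior, so x_1 = A x_0 + q_1 with x_0 ~ N(m0, P0). Second, after T0 it drops row 0 of the observation model, so the target's post-treatment values are treated as missing, as described for TASCInputs.y_target. kalman_filter is a hypothetical name, not mlsynth's code.

```python
import numpy as np


def kalman_filter(Y, A, H, Q, R, m0, P0, T0=None):
    """Hypothetical forward pass returning m (T + 1, d) and P (T + 1, d, d).

    Y is (N, T) with the target in row 0. If T0 is given, the target row is
    treated as missing for columns t >= T0 (zero-indexed).
    """
    N, T = Y.shape
    d = A.shape[0]
    m = np.zeros((T + 1, d))
    P = np.zeros((T + 1, d, d))
    m[0], P[0] = m0, P0                        # index 0 holds the prior
    for k in range(1, T + 1):
        # Predict step.
        m_pred = A @ m[k - 1]
        P_pred = A @ P[k - 1] @ A.T + Q
        # Rows observed at this timestep (donors only after T0).
        obs = np.arange(N)
        if T0 is not None and k - 1 >= T0:
            obs = obs[1:]
        Hk, Rk, yk = H[obs], R[np.ix_(obs, obs)], Y[obs, k - 1]
        # Update step: K = P_pred Hk^T S^{-1}.
        S = Hk @ P_pred @ Hk.T + Rk
        K = np.linalg.solve(S, Hk @ P_pred).T
        m[k] = m_pred + K @ (yk - Hk @ m_pred)
        P[k] = P_pred - K @ S @ K.T
    return m, P
```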
class mlsynth.utils.tasc_helpers.structures.TASCInference(counterfactual: ndarray, ci_lower: ndarray, ci_upper: ndarray, posterior_variance: ndarray, alpha: float)

Counterfactual point estimates and posterior confidence intervals.

Parameters:
  • counterfactual (np.ndarray) – Estimated counterfactual for the target unit across all T periods, y_hat_{0, t} = h_1^T m_t^s.

  • ci_lower (np.ndarray) – Lower confidence band, shape (T,).

  • ci_upper (np.ndarray) – Upper confidence band, shape (T,).

  • posterior_variance (np.ndarray) – Posterior variance of the target row, h_1^T P_t^s h_1 + R_{1,1}, shape (T,).

  • alpha (float) – Significance level used to build the bands.

alpha: float
ci_lower: ndarray
ci_upper: ndarray
counterfactual: ndarray
posterior_variance: ndarray
class mlsynth.utils.tasc_helpers.structures.TASCInputs(Y_full: ndarray, Y_pre: ndarray, Y_post_donors: ndarray | None, T0: int, T: int, N: int, treated_unit_name: str, donor_names: Sequence, time_labels: ndarray, pre_periods: int, post_periods: int, Ywide: object, y_target: ndarray)

Pre-processed panel data fed into the TASC EM and inference loops.

Parameters:
  • Y_full (np.ndarray) – Outcome matrix of shape (N, T) with the target in row 0 and the n = N - 1 donor units in rows 1..N-1.

  • Y_pre (np.ndarray) – Pre-treatment slice Y_full[:, :T0] of shape (N, T0).

  • Y_post_donors (np.ndarray or None) – Post-treatment donor-only slice Y_full[1:, T0:] of shape (n, T - T0). None if T0 == T (no post period available).

  • T0 (int) – Number of pre-treatment periods.

  • T (int) – Total number of periods.

  • N (int) – Total number of units (target + donors).

  • treated_unit_name (str) – Identifier of the treated unit.

  • donor_names (Sequence) – Identifiers for the donor units in the order matching rows 1..N-1.

  • time_labels (np.ndarray) – Time labels in their original order, length T.

  • pre_periods (int) – Alias of T0 kept for compatibility with plotting helpers that expect a processed_data_dict from dataprep.

  • post_periods (int) – T - T0.

  • Ywide (object) – The wide pandas frame produced by dataprep (rows = time, columns = units). Retained so that plot_estimates can use it directly without re-pivoting.

  • y_target (np.ndarray) – Convenience copy of the full observed target series, length T (post-treatment values are the observed values, used only for plotting and effect computation; TASC treats them as missing during filtering).

N: int
T: int
T0: int
Y_full: ndarray
Y_post_donors: ndarray | None
Y_pre: ndarray
Ywide: object
donor_names: Sequence
post_periods: int
pre_periods: int
time_labels: ndarray
treated_unit_name: str
y_target: ndarray
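The documented slices of Y_full can be reproduced with plain NumPy indexing. split_panel is a hypothetical helper, included only to make the shapes concrete.

```python
import numpy as np


def split_panel(Y_full, T0):
    """Hypothetical helper mirroring the documented TASCInputs slices."""
    N, T = Y_full.shape
    Y_pre = Y_full[:, :T0]                                   # (N, T0)
    Y_post_donors = Y_full[1:, T0:] if T0 < T else None      # (N - 1, T - T0)
    post_periods = T - T0
    return Y_pre, Y_post_donors, post_periods
```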
class mlsynth.utils.tasc_helpers.structures.TASCParameters(A: ndarray, H: ndarray, Q: ndarray, R: ndarray, m0: ndarray, P0: ndarray)

State-space parameters theta = {A, H, Q, R, m0, P0}.

Parameters:
  • A (np.ndarray) – Transition matrix, shape (d, d).

  • H (np.ndarray) – Observation matrix, shape (N, d). Row 0 is h_1^T.

  • Q (np.ndarray) – State-noise covariance, shape (d, d).

  • R (np.ndarray) – Observation-noise covariance, shape (N, N).

  • m0 (np.ndarray) – Initial state mean, shape (d,).

  • P0 (np.ndarray) – Initial state covariance, shape (d, d).

A: ndarray
H: ndarray
P0: ndarray
Q: ndarray
R: ndarray
m0: ndarray
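The generative model these parameters describe can be sketched as a standard linear-Gaussian state-space simulation. The sketch assumes x_0 ~ N(m0, P0), x_t = A x_{t-1} + q_t and y_t = H x_t + r_t. simulate_lgssm is hypothetical and need not match the details of simulate_tasc_sample or the paper's Equations 2-3. Because H is N x d, the noise-free signal H x_t has rank at most d across time.

```python
import numpy as np


def simulate_lgssm(A, H, Q, R, m0, P0, T, rng):
    """Hypothetical draw of states X (T, d) and observations Y (N, T)."""
    d, N = A.shape[0], H.shape[0]
    x = rng.multivariate_normal(m0, P0)                      # x_0 ~ N(m0, P0)
    X = np.zeros((T, d))
    Y = np.zeros((N, T))
    for t in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(d), Q)  # state transition
        X[t] = x
        Y[:, t] = H @ x + rng.multivariate_normal(np.zeros(N), R)  # observation
    return X, Y
```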
class mlsynth.utils.tasc_helpers.structures.TASCResults(inputs: TASCInputs, design: TASCDesign, inference: TASCInference, att: float, pre_rmse: float)

Public TASC.fit() return container.

Parameters:
  • inputs (TASCInputs) – Pre-processed panel data.

  • design (TASCDesign) – Learned model, EM diagnostics, and filtered / smoothed state arrays.

  • inference (TASCInference) – Counterfactual and posterior-based confidence intervals.

  • att (float) – Average treatment effect on the treated across post-treatment periods: mean(y_{0, t} - y_hat_{0, t}) over t > T0.

  • pre_rmse (float) – Root mean squared error between the observed target and its smoother-based fit over the pre-treatment window.

att: float
design: TASCDesign
inference: TASCInference
inputs: TASCInputs
pre_rmse: float
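Suppose the observed target series and the counterfactual are both length T and zero-indexed, so post-treatment periods start at index T0. Then the two scalar summaries reduce to a few lines. att_and_pre_rmse is a hypothetical helper written from the definitions above.

```python
import numpy as np


def att_and_pre_rmse(y_target, counterfactual, T0):
    """Hypothetical helper: post-period mean gap and pre-period RMSE."""
    gap = np.asarray(y_target) - np.asarray(counterfactual)
    att = float(np.mean(gap[T0:]))                    # mean over t > T0 (1-indexed)
    pre_rmse = float(np.sqrt(np.mean(gap[:T0] ** 2)))
    return att, pre_rmse
```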
class mlsynth.utils.tasc_helpers.structures.TASCSmoothedStates(m_s: ndarray, P_s: ndarray, G: ndarray)

Output of the RTS backward pass.

Parameters:
  • m_s (np.ndarray) – Smoothed means, shape (T + 1, d). Index 0 is m_0^s.

  • P_s (np.ndarray) – Smoothed covariances, shape (T + 1, d, d). Index 0 is P_0^s.

  • G (np.ndarray) – RTS smoother gain matrices, shape (T + 1, d, d). G[k] is the gain used to smooth time k from k + 1; G[T] is unused.

G: ndarray
P_s: ndarray
m_s: ndarray
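Here is a minimal sketch of a standard Rauch-Tung-Striebel backward pass that produces arrays in this layout from filtered means and covariances. It assumes the same transition model as the forward pass, x_{k+1} = A x_k + q_{k+1}. rts_smoother is a hypothetical name. The gain at index T is never computed, which matches the note that G[T] is unused.

```python
import numpy as np


def rts_smoother(m, P, A, Q):
    """Hypothetical RTS pass over filtered m (T + 1, d) and P (T + 1, d, d)."""
    T = m.shape[0] - 1
    d = m.shape[1]
    m_s, P_s = m.copy(), P.copy()            # at index T, smoothed == filtered
    G = np.zeros((T + 1, d, d))              # G[T] stays unused (zeros)
    for k in range(T - 1, -1, -1):
        P_pred = A @ P[k] @ A.T + Q          # P_{k+1|k}
        # G_k = P_k A^T P_{k+1|k}^{-1}, via a solve instead of an explicit inverse.
        G[k] = np.linalg.solve(P_pred, A @ P[k]).T
        m_s[k] = m[k] + G[k] @ (m_s[k + 1] - A @ m[k])
        P_s[k] = P[k] + G[k] @ (P_s[k + 1] - P_pred) @ G[k].T
    return m_s, P_s, G
```

As a check, consider a constant scalar state (A = 1, Q = 0) observed twice with unit noise. Every smoothed estimate collapses to the final filtered one.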

Example

from mlsynth import TASC

# TASC accepts either a TASCConfig instance or a plain dict.
# df is a long-form pandas DataFrame with one row per (unit, time) pair.
config = {
    "df": df,
    "outcome": "sales",
    "unitid": "state",
    "time": "year",
    "treat": "treated",        # binary 0/1 treatment indicator
    "d": 2,                    # hidden state dimension (small)
    "n_em_iter": 50,           # N_1 in Algorithm 2
    "em_tol": 1e-4,            # optional early-stopping on max |delta(A, H)|
    "diagonal_Q": True,        # paper default; set False for full covariance
    "diagonal_R": True,
    "alpha": 0.05,             # significance level for posterior CIs
    "display_graphs": True,
}

results = TASC(config).fit()

# Point estimate and fit diagnostic
print(results.att)              # mean post-period gap, y_{0,t} - h_1' m_t^s
print(results.pre_rmse)         # pre-period RMSE of the smoother's target fit

# Counterfactual path and posterior confidence band
cf = results.inference.counterfactual     # length-T vector
lo = results.inference.ci_lower           # length-T vector
hi = results.inference.ci_upper           # length-T vector

# Learned model and EM diagnostics
theta = results.design.parameters         # A, H, Q, R, m0, P0
print(theta.A.shape, theta.H.shape)
print(results.design.n_em_iter_used)      # number of EM iterations executed
print(results.design.em_param_deltas)     # per-iteration max |delta(A, H)|

# Smoothed latent trajectory (useful for downstream diagnostics)
m_s = results.design.smoothed.m_s         # shape (T + 1, d)
P_s = results.design.smoothed.P_s         # shape (T + 1, d, d)

# Inputs preserved on the result object for plotting / re-analysis
print(results.inputs.Y_full.shape)            # (N, T)
print(results.inputs.Y_pre.shape)             # (N, T0)
if results.inputs.Y_post_donors is not None:  # None when there is no post period
    print(results.inputs.Y_post_donors.shape) # (N - 1, T - T0)
print(results.inputs.treated_unit_name)
print(list(results.inputs.donor_names))
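A common follow-up is to list the post-treatment periods in which the observed target falls outside the posterior band. The hypothetical helper below does this. It assumes the bands are pointwise intervals of the form y_hat +/- z * sd, so the flags are not adjusted for multiple comparisons across periods.

```python
import numpy as np


def post_periods_outside_band(y_target, ci_lower, ci_upper, T0):
    """Hypothetical helper: True where the observed post-period value is outside the band."""
    y_post = np.asarray(y_target)[T0:]
    lo = np.asarray(ci_lower)[T0:]
    hi = np.asarray(ci_upper)[T0:]
    return (y_post < lo) | (y_post > hi)
```

It can be called on a fitted result as post_periods_outside_band(results.inputs.y_target, results.inference.ci_lower, results.inference.ci_upper, results.inputs.T0).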

Verification

This section checks the implementation in two ways: an empirical replication of the paper's Proposition 99 application (Path A) and a Monte Carlo study on the Section 5 state-space design (Path B). Path A reruns the Proposition 99 California tobacco illustration from Section 6.1 of [TASC] on the long-form panel basedata/prop99_packsales.csv shipped with mlsynth. It reproduces the post-1988 divergence between observed California cigarette sales and the TASC counterfactual shown in the paper's Figure 10. Path B replicates the four-cell \((Q, R)\) ablation grid (Figures 3 and 4) by drawing panels directly from TASC's own generative state-space model. It then compares mlsynth.TASC against a simplex-constrained Synthetic Control baseline, the same baseline the paper benchmarks.

Path A: Proposition 99 California (Section 6.1)

The paper runs TASC on per-capita cigarette sales alone (no auxiliary predictors) with hidden-state dimension \(d = 2\). Running mlsynth.TASC on the same long-form panel reproduces the qualitative pattern of the paper's Figure 10:

import pandas as pd
from mlsynth import TASC

df = pd.read_csv("basedata/prop99_packsales.csv")
df["treat"] = ((df["state"] == "California")
                & (df["year"] >= 1989)).astype(int)

res = TASC({"df": df, "outcome": "cigsale", "unitid": "state",
             "time": "year", "treat": "treat", "d": 2,
             "n_em_iter": 50, "em_tol": 1e-4, "alpha": 0.05,
             "seed": 0, "display_graphs": False}).fit()
yhat = res.inference.counterfactual
print(f"pre-RMSE = {res.pre_rmse:.3f}  ATT = {res.att:.3f}")

prints:

pre-RMSE = 0.767  ATT = -16.793

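with the following trajectory (selected years, packs per capita):

| Year | California (observed) | TASC counterfactual | Gap (observed - counterfactual) |
|------|-----------------------|---------------------|---------------------------------|
| 1985 | 102.80 | 102.66 | +0.14 |
| 1988 | 90.10 | 91.88 | -1.78 |
| 1989 | 82.40 | 88.30 | -5.90 |
| 1990 | 77.80 | 84.37 | -6.57 |
| 1995 | 56.40 | 76.50 | -20.10 |
| 2000 | 41.60 | 65.14 | -23.54 |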

The pre-treatment fit tracks California's observed series closely. The pre-RMSE is \(0.77\) packs per capita against an outcome level of roughly 100 packs, and the 1985 and 1988 gaps are \(+0.14\) and \(-1.78\). The divergence opens at the 1989 intervention and widens across the post-treatment years shown, reaching roughly \(-24\) packs by 2000. The paper's Figure 10 shows a gap of about \(-25\) to \(-30\) packs at the same horizon. The average treatment effect over the post-treatment years (1989 onward) is \(\widehat{\mathrm{ATT}} = -16.8\) packs per capita per year, in the same neighbourhood as Abadie, Diamond and Hainmueller's classical estimate.

Path B: Section 5 state-space ablation grid

The paper's Section 5.2 ablation sweeps a \(2 \times 2\) grid of state-perturbation and observation-noise covariance scales \((Q, R)\) (Figures 3-4). A "small" covariance has diagonal variance \(0.01\) (average \(|r_t| \approx 0.084\)), and a "big" covariance has diagonal variance \(1.0\) (average \(|r_t| \approx 0.836\)). Panels are drawn from TASC's own generative model (Equations 2-3), so this is a correctly specified Monte Carlo. The DGP is packaged as mlsynth.utils.tasc_helpers.simulation.simulate_tasc_sample(). The script below compares the post-period RMSE of mlsynth.TASC (\(d_{\mathrm{fit}} = d_{\mathrm{true}} = 5\)) against a simplex-constrained Synthetic Control baseline. Note that the script uses q_scale \(\in \{0.01, 0.10\}\) and r_scale \(\in \{0.01, 1.0\}\), so its "big Q" setting is q_scale = 0.10 rather than 1.0.

import numpy as np
import scipy.optimize as opt
from mlsynth import TASC
from mlsynth.utils.tasc_helpers.simulation import simulate_tasc_sample

def sc_simplex(Y, T0):
    """Post-period RMSE of simplex-constrained SC for the target in row 0."""
    y, X = Y[0], Y[1:].T                  # target series; donors as (T, n) columns
    n = X.shape[1]
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]  # weights sum to 1
    bnds = [(0.0, 1.0)] * n                                  # non-negative weights
    res = opt.minimize(lambda w: ((X[:T0] @ w - y[:T0]) ** 2).sum(),
                       np.full(n, 1.0 / n), method="SLSQP",
                       bounds=bnds, constraints=cons)
    return float(np.sqrt(np.mean((y[T0:] - X[T0:] @ res.x) ** 2)))

def tasc_rmse(sample, d_fit=5):
    """Post-period RMSE of the TASC counterfactual for the target unit."""
    res = TASC({"df": sample.df, "outcome": "y", "treat": "treat",
                "unitid": "unit", "time": "time", "d": d_fit,
                "n_em_iter": 30, "em_tol": 1e-4, "alpha": 0.05,
                "seed": 0, "display_graphs": False}).fit()
    y0, T0 = sample.Y[0], sample.T0
    return float(np.sqrt(np.mean(
        (y0[T0:] - res.inference.counterfactual[T0:]) ** 2)))

M = 30
for q, r in [(0.01, 0.01), (0.01, 1.0), (0.10, 0.01), (0.10, 1.0)]:
    sc_, tasc_ = [], []
    for seed in range(M):
        s = simulate_tasc_sample(q_scale=q, r_scale=r,
                                   rng=np.random.default_rng(seed))
        sc_.append(sc_simplex(s.Y, s.T0))
        tasc_.append(tasc_rmse(s))
    print(f"q={q:.2f} r={r:.2f}  TASC={np.median(tasc_):.3f}  "
           f"SC={np.median(sc_):.3f}")

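At \(M = 30\), \(N = 38\), \(T = 100\), \(T_0 = 50\) and \(d_{\mathrm{true}} = 5\), the printed medians are:

| Regime | (q_scale, r_scale) | TASC median RMSE | SC median RMSE | Margin (SC / TASC) |
|--------|--------------------|------------------|----------------|--------------------|
| small Q, small R | (0.01, 0.01) | 0.116 | 0.196 | 1.7x |
| small Q, big R | (0.01, 1.0) | 1.103 | 1.198 | 1.1x |
| big Q, small R | (0.10, 0.01) | 0.117 | 0.524 | 4.5x |
| big Q, big R | (0.10, 1.0) | 1.130 | 1.301 | 1.2x |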

TASC has the lower median RMSE in all four regimes. Its margin over SC is largest in the high-\(Q\) / low-\(R\) cell, the same regime where the paper's Figure 4 shows TASC's strongest dominance: a fitted state-space model can extract the persistent low-rank signal that the simplex projection cannot exploit. Under high observation noise (\(R = 1.0\)), SC still trails TASC but by a narrower margin, reflecting the noise floor common to both estimators. The paper's Figures 3-4 also include the Robust Synthetic Control of Amjad, Shah and Shen (2018) and the Causal Impact Model of Brodersen et al. (2015) as additional comparators that are not in mlsynth. The comparison above against the canonical simplex-SC baseline is the slice of those results that mlsynth can reproduce directly.
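The noise floor can be illustrated with an oracle that knows the latent signal exactly. Its post-period RMSE against noisy observations is still about sqrt(r), which is consistent with both methods sitting near an RMSE of 1 in the \(R = 1.0\) cells. The sketch assumes r_scale is the variance of the target's observation noise, which is an assumption about simulate_tasc_sample's parameterization. oracle_rmse is a hypothetical helper.

```python
import numpy as np


def oracle_rmse(r, T_post=50_000, seed=0):
    """RMSE of a predictor that knows the latent signal exactly.

    Assumes post-period observations are signal + N(0, r) noise.
    """
    rng = np.random.default_rng(seed)
    signal = np.cumsum(rng.normal(scale=0.3, size=T_post))   # arbitrary latent path
    y = signal + rng.normal(scale=np.sqrt(r), size=T_post)    # noisy observations
    return float(np.sqrt(np.mean((y - signal) ** 2)))         # error is pure noise


for r in (0.01, 1.0):
    print(f"r={r:.2f}  oracle RMSE={oracle_rmse(r):.3f}  sqrt(r)={np.sqrt(r):.3f}")
```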

The Monte Carlo is consistent with the paper's headline finding. When the data-generating process carries a persistent low-rank temporal signal, as it does in many policy panels with strong trends, explicitly fitting that temporal structure through a state-space model lowers post-period prediction error relative to permutation-invariant alternatives. The advantage widens as the latent signal strengthens (large \(Q\)).

References

Rho, S., Illick, C., Narasipura, S., Abadie, A., Hsu, D., & Misra, V. (2026). Time-Aware Synthetic Control. arXiv:2601.03099.

Durbin, J., & Koopman, S. J. (2012). Time Series Analysis by State Space Methods. Oxford Statistical Science Series 38, 2nd edition. Oxford University Press.