Time-Aware Synthetic Control (TASC)#
Overview#
Time-Aware Synthetic Control (TASC) arXiv:2601.03099 is a state-space synthetic-control estimator. Unlike classical SC (Forward Difference-in-Differences (FDID), Two-Step Synthetic Control) or robust-SC variants (Cluster Synthetic Controls (CLUSTERSC), Proximal Inference Synthetic Control (PROXIMAL)), which treat the ordering of pre-intervention time indices as interchangeable, TASC explicitly models the temporal evolution of the latent factors driving the panel. It embeds the standard SC outcome matrix inside a linear-Gaussian state-space model with a constant trend matrix \(A\), fits the model parameters via the Expectation-Maximization (EM) algorithm with a Kalman-filter + Rauch-Tung-Striebel (RTS) smoother E-step and a closed-form M-step, and produces both a point counterfactual and a posterior-based confidence band in one pass.
Two structural properties distinguish TASC from the rest of the
mlsynth toolkit:
Time-awareness. Because \(A\) is shared across periods, permuting the pre-intervention time indices changes the fit. Permutation-invariant methods (classical SC, robust SC, nuclear-norm matrix completion) produce identical counterfactuals under the same permutation; TASC does not. Section 5.1 of the paper formalizes this via a data-processing-inequality argument (Proposition A.1).
Approximately low-rank signal under omnidirectional noise. The observation matrix decomposes as \(Y = H X + E\) where \(H X\) is exactly rank-\(d\) and \(E\) is full-rank observation noise. TASC therefore tolerates substantial measurement noise — even when PCA-style denoising (used by Robust SC) breaks down — because it does not assume the principal directions are noise-free.
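The time-awareness property can be illustrated with a small NumPy sketch. It is not TASC itself: unconstrained least-squares donor weights stand in for a permutation-invariant SC fit, and a lag-1 regression slope stands in for any statistic that uses temporal order. All names here (ls_weights, ar1_coef) are hypothetical and not part of mlsynth.

```python
import numpy as np

rng = np.random.default_rng(0)
T0, n = 40, 5
donors = rng.normal(size=(T0, n)).cumsum(axis=0)          # (T0, n) random-walk donor paths
target = donors @ np.array([0.5, 0.3, 0.2, 0.0, 0.0]) + 0.1 * rng.normal(size=T0)

def ls_weights(y, X):
    """Unconstrained least-squares donor weights (a stand-in for SC weights)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def ar1_coef(x):
    """Lag-1 regression slope of a demeaned series; depends on time ordering."""
    x = x - x.mean()
    return float(x[1:] @ x[:-1] / (x[:-1] @ x[:-1]))

perm = rng.permutation(T0)                                 # shuffle the time index
w_original = ls_weights(target, donors)
w_permuted = ls_weights(target[perm], donors[perm])

# Row-wise regression ignores ordering, so the weights coincide.
print(np.allclose(w_original, w_permuted))
# The lag-1 slope uses ordering, so shuffling time changes it.
print(ar1_coef(target), ar1_coef(target[perm]))
```

Any estimator built only from row-wise pairs of target and donor values behaves like ls_weights here; an estimator with a shared transition matrix behaves like ar1_coef.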
When to use TASC instead of something else#
The Rho-Illick-Narasipura-Abadie-Hsu-Misra (2026) paper runs a 4-cell ablation comparing TASC against vanilla SC, Robust SC, and the Causal Impact Model under independent variation of the observation-noise covariance \(R\) and the state-perturbation covariance \(Q\) (Section 5.2, Figures 3-4 of the paper). The clean recommendation:
Use TASC when observation noise is high. Across the two large-\(R\) cells (small-\(Q\) and large-\(Q\)), TASC delivers the smallest median RMSE in the paper’s simulation. PCA-style denoising (Robust SC) and simplex shrinkage (vanilla SC) break down because they assume the principal directions of the observation matrix are noise-free; TASC’s assumption of full-rank noise, \(r_t \sim \mathcal{N}(0, R)\), is a much better fit when the noise is omnidirectional.
Use TASC when the donor panel has a persistent, smoothly-varying trend. “Persistent” means the trend extends past the intervention point. This is the strong-trend regime (small \(Q\), non-trivial \(A\)). The Kalman + RTS smoother extrapolates the trend forward; PCA / nuclear-norm methods don’t.
Use TASC when you need a posterior credible band for free. TASC is a generative model. The RTS smoother returns the full posterior covariance at every period, so a +/- 1.96 sigma band on the counterfactual is part of the fit’s output. Within mlsynth, only Bayesian Synthetic Control with a Soft Simplex Constraint (BVS-SS) (Bayesian spike-and-slab) and TASC itself ship credible bands; the rest require an external bootstrap or subsampling pass.
When not to reach for TASC:
The pre-intervention trend is weak or absent (the paper writes “\(A \approx 0\)” — large \(Q\) regime). The smaller the trend, the smaller TASC’s edge over classical SC; in the small-\(R\), large-\(Q\) cell of the paper’s ablation, vanilla SC matches or beats TASC.
Observation noise is small AND has low-dimensional structure. Under small \(R\), hard singular-value thresholding (Robust SC) cleans the signal exactly, and TASC’s omnidirectional-\(R\) assumption pays a price for flexibility it doesn’t need.
Long-horizon forecasting in noisy regimes. The paper’s Figures 5-6 show that under large \(R\) and large \(Q\), TASC’s RMSE rises noticeably from horizon 51-60 to horizon 91-100 (small-\(Q\) is stable). If you need a 5-year-out counterfactual on a noisy panel, look at Factor Model Approach (FMA) or Matrix Completion with Nuclear Norm Minimization (MCNNM) first.
Time indices are not really ordered (you’re modelling a cross-section that happens to be indexed by time, or the periods are interchangeable up to relabelling). Permuting time indices costs TASC 48.5% on mean RMSE and 25.7% on the RMSE standard deviation in the paper’s controlled test (Section 5.1, Figure 2). If the time ordering is meaningless, use a permutation-invariant estimator like Two-Step Synthetic Control or Cluster Synthetic Controls (CLUSTERSC).
Assumptions (and how to spot violations)#
TASC inherits the assumptions of a linear-Gaussian state-space model. Section 3 of the paper lays them out; the practitioner-facing restatement is:
Linear-Gaussian dynamics. The hidden state evolves as \(x_t = A x_{t-1} + q_t\) with \(q_t\) zero-mean Gaussian. Equivalently: the trend in the latent factors is well-approximated by a stable linear AR(1) at the level of the state vector, and the perturbations around the trend are homoscedastic and uncorrelated across time.
Plausibly violated when: the latent factor evolution is strongly nonlinear (regime switches, breakpoints, structural breaks), has fat-tailed shocks, or has volatility clustering. Diagnostic: examine the smoothed state residuals from the pre-period fit; non-Gaussian QQ-plot tails or Ljung-Box-significant autocorrelation suggest misspecification.
Constant trend matrix \(A\). The dynamics that hold over the pre-period are assumed to continue unchanged through the post-period. This is the “trend persists past the intervention point” assumption that gives TASC its long-horizon advantage.
Plausibly violated when: the intervention itself triggers a regime change in the donor units (e.g. a tax change that affects neighbouring states’ growth dynamics, not just their levels). TASC is by construction unable to detect a post-period change in \(A\): the post-period target outcomes are treated as missing, so they cannot inform the update. Diagnostic: split the pre-period into two halves and refit on each. If the estimated \(A\) differs materially, the constant-\(A\) assumption is shaky and the post-period forecast is suspect.
Observation model \(y_t = H x_t + r_t\), \(R\) full rank, \(d \ll \min(n, T)\). The signal is low-rank with rank \(d\) (the latent-state dimension); the noise \(r_t\) is Gaussian with a positive-definite covariance. Importantly, \(R\) does NOT have to be diagonal: TASC handles correlated cross-donor noise via the full \(R\) (set diagonal_R = False in the config).
Plausibly violated when: the residual cross-section is rank-deficient (some donors are exact linear combinations of others, e.g. aggregated subseries paired with their components), or when the true signal is full-rank (no shared factors, so every donor moves independently). In both cases the chosen \(d\) ends up wrong and the model either underfits (low \(d\)) or overfits (high \(d\)).
No unobserved confounders that affect donors AND treated unit between \(T_0\) and \(T_0 + 1\). This is the standard SC unconfoundedness assumption, not specific to TASC, but worth restating: TASC’s counterfactual is informative about the treatment effect only if any post-period shock to the donor pool is also reflected in what the target would have done absent treatment.
Plausibly violated when: a covariate that drives the treated unit’s outcome (but is uncorrelated with the donors) shifts at the intervention time. TASC has no covariate hook, so this kind of confounding can only be diagnosed externally.
Hidden-state dimension \(d\) correctly specified. TASC takes \(d\) as a user hyperparameter (the d field of TASCConfig). The paper’s Section 5.3 shows that underestimating \(d\) is worse than overestimating it; if in doubt, err on the high side.
Plausibly violated when: the data has more latent factors than you’ve allowed for. Diagnostic: increase \(d\) and refit; if the RMSE on a held-out pre-period segment drops materially, you were underfitting.
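The split-half diagnostic for the constant-\(A\) assumption can be approximated without refitting TASC. The sketch below is a cheap proxy under stated assumptions: it takes a truncated-SVD latent path of the pre-period matrix, fits a least-squares AR(1) matrix on each half, and reports the relative difference. The helper names (ar1_fit, split_half_drift, rotation, simulate) and the noise-free test panels are illustrative and not part of mlsynth.

```python
import numpy as np

def ar1_fit(X):
    """Least-squares A in x_t ~ A x_{t-1}; X has shape (d, T)."""
    past, future = X[:, :-1], X[:, 1:]
    return future @ past.T @ np.linalg.inv(past @ past.T)

def split_half_drift(Y_pre, d):
    """Relative change in the AR(1) matrix between the two pre-period halves."""
    U, s, Vt = np.linalg.svd(Y_pre, full_matrices=False)
    X = s[:d, None] * Vt[:d]                  # proxy latent path, shape (d, T0)
    half = X.shape[1] // 2
    A_first, A_second = ar1_fit(X[:, :half]), ar1_fit(X[:, half:])
    return np.linalg.norm(A_first - A_second) / np.linalg.norm(A_first)

def rotation(angle, scale=0.98):
    """Scaled 2x2 rotation, used here as a stable transition matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return scale * np.array([[c, -s], [s, c]])

def simulate(mats, H, x0):
    """Noise-free panel with columns H x_t, where x_t = mats[t] @ x_{t-1}."""
    x, cols = x0, []
    for A in mats:
        x = A @ x
        cols.append(H @ x)
    return np.column_stack(cols)

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 2))
x0 = np.array([1.0, 0.5])
stable = simulate([rotation(0.3)] * 40, H, x0)                          # constant A
broken = simulate([rotation(0.3)] * 20 + [rotation(0.8)] * 20, H, x0)   # A changes halfway

drift_stable = split_half_drift(stable, d=2)
drift_broken = split_half_drift(broken, d=2)
print(drift_stable, drift_broken)
```

On real, noisy panels the drift is never exactly zero, so the number is best read relative to the drift seen on placebo splits rather than against a fixed threshold.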
Mathematical Formulation#
Let \(Y \in \mathbb{R}^{N \times T}\) be the outcome matrix with units in rows and periods in columns. The first row corresponds to the treated target unit; the remaining \(n = N - 1\) rows are donors. Pre-intervention periods are \(t = 1, \dots, T_0\); the post-intervention window is \(t = T_0 + 1, \dots, T\), during which the target row is unobserved (the very quantity TASC reconstructs).
State-Space Model#
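The TASC generative model (Eqs. (2)-(3) of the paper) is a classical linear-Gaussian state-space model:
\[\begin{split}\begin{aligned} x_t &= A x_{t-1} + q_t, & q_t &\sim \mathcal{N}(0, Q), \\ y_t &= H x_t + r_t, & r_t &\sim \mathcal{N}(0, R), \end{aligned}\end{split}\]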
with initial state \(x_0 \sim \mathcal{N}(m_0, P_0)\). The hidden state \(x_t \in \mathbb{R}^d\) has dimension \(d \ll \min(n, T)\), which is precisely what preserves the low-rank structure of the signal \(H X\). The complete parameter set is
\[\theta = \{A, H, Q, R, m_0, P_0\}.\]
All three covariance matrices \(Q, R, P_0\) are positive definite.
The TASCConfig flags diagonal_Q and diagonal_R control
whether the M-step constrains \(Q\) and \(R\) to be diagonal
(the paper’s default — see Algorithm 7) or updates the full symmetric
covariance.
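To make the generative model concrete, the following is a minimal simulation sketch in NumPy. The function name simulate_lgssm and the parameter values are illustrative assumptions, not part of mlsynth; the output follows the \(N \times T\) orientation used on this page, with the target in row 0.

```python
import numpy as np

def simulate_lgssm(A, H, Q, R, m0, P0, T, seed=0):
    """Draw one panel from x_t = A x_{t-1} + q_t, y_t = H x_t + r_t."""
    rng = np.random.default_rng(seed)
    d, N = A.shape[0], H.shape[0]
    x = rng.multivariate_normal(m0, P0)            # initial state x_0
    X = np.empty((d, T))
    Y = np.empty((N, T))
    for t in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(d), Q)        # state transition
        X[:, t] = x
        Y[:, t] = H @ x + rng.multivariate_normal(np.zeros(N), R)  # noisy observation
    return X, Y

d, N, T = 2, 6, 50
A = np.array([[0.95, 0.05], [0.0, 0.9]])
H = np.random.default_rng(1).normal(size=(N, d))
Q = 0.05 * np.eye(d)
R = 0.5 * np.eye(N)                                # full-rank observation noise
X, Y = simulate_lgssm(A, H, Q, R, np.zeros(d), np.eye(d), T)

# The signal H X has rank d; the observed panel Y is full rank because of R.
print(np.linalg.matrix_rank(H @ X), np.linalg.matrix_rank(Y))
```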
Relationship to the Linear Factor Model#
The classical SC linear factor model from Abadie & Gardeazabal (2003),
\[Y_{it} = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \varepsilon_{it},\]
can be cast as a state-space model with latent state \(x_t = (\delta_t, \theta_t, \lambda_t)\) and observation rows \(h_i = (1, Z_i, \mu_i)\). The crucial distinction is that linear factor models impose no dynamics on \(x_t\) (or equivalently \(A = 0\), \(x_t = q_t\)), whereas TASC enforces a stable trend through \(A\). This is what gives TASC its long-horizon forecast accuracy under correct specification, at the cost of greater sensitivity to misspecification when temporal dynamics are complex.
The Counterfactual via Infinite-Variance Kalman Filtering#
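In the post-intervention window the target’s observed value is unavailable. TASC handles this by formally setting the target’s observation-noise variance to \(+\infty\) (Section 4.2 of the paper). Partition
\[\begin{split}y_t = \begin{pmatrix} y_{t,1} \\ y_{t,2} \end{pmatrix}, \quad H = \begin{pmatrix} h_1^\top \\ H_2 \end{pmatrix}, \quad r_t = \begin{pmatrix} r_{t,1} \\ r_{t,2} \end{pmatrix}, \quad R' = \begin{pmatrix} \infty & 0 \\ 0 & R_2 \end{pmatrix},\end{split}\]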
where \(y_{t,2}, r_{t,2} \in \mathbb{R}^n\), \(H_2 \in \mathbb{R}^{n \times d}\), and \(R_2 \in \mathbb{R}^{n \times n}\). Under \(R'\), the Schur-complement inverse of the innovation covariance
has a zero in its (1, 1) block. The Kalman gain therefore picks up no
contribution from the target row, and the post-intervention filter
update depends only on the donor block. This is implemented in
mlsynth.utils.tasc_helpers.filtering.kalman_filter_inf_variance_step()
(Algorithm 5), and the full forward pass is composed by
mlsynth.utils.tasc_helpers.filtering.kalman_filter_full()
following Algorithm 3.
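The effect of the infinite-variance device can be checked numerically. The sketch below is an assumption-laden illustration, not the mlsynth implementation: kalman_update and inf_variance_step are hypothetical helpers that take the matrices directly rather than a TASCParameters object, and R is taken to be diagonal. Updating with the donor block alone gives essentially the same posterior as a full update in which the target variance is set to a very large finite number.

```python
import numpy as np

def kalman_update(y, m_pred, P_pred, H, R):
    """Standard Kalman measurement update."""
    S = H @ P_pred @ H.T + R                      # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T          # gain P H^T S^{-1} (S and P symmetric)
    m = m_pred + K @ (y - H @ m_pred)
    P = P_pred - K @ S @ K.T
    return m, P

def inf_variance_step(y_donors, m_prev, P_prev, A, H, Q, R):
    """Predict, then update with the donor block only (target is row 0)."""
    m_pred = A @ m_prev
    P_pred = A @ P_prev @ A.T + Q
    return kalman_update(y_donors, m_pred, P_pred, H[1:], R[1:, 1:])

rng = np.random.default_rng(1)
d, N = 2, 4
A = np.array([[0.9, 0.1], [0.0, 0.8]])
H = rng.normal(size=(N, d))
Q = 0.1 * np.eye(d)
R = np.diag([0.5, 0.3, 0.2, 0.4])
m_prev, P_prev = np.zeros(d), np.eye(d)
y = rng.normal(size=N)                            # y[0] plays the (ignored) target value

m_inf, P_inf = inf_variance_step(y[1:], m_prev, P_prev, A, H, Q, R)

# Finite approximation: a huge target variance makes the target row uninformative.
R_big = R.copy()
R_big[0, 0] = 1e8
m_big, P_big = kalman_update(y, A @ m_prev, A @ P_prev @ A.T + Q, H, R_big)
print(np.allclose(m_inf, m_big, atol=1e-5), np.allclose(P_inf, P_big, atol=1e-5))
```

Dropping the target row outright avoids the ill-conditioned innovation covariance that the large-variance approximation produces.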
Once the forward pass produces \((m_k, P_k)_{k=0}^T\), the
backward Rauch-Tung-Striebel smoother
(mlsynth.utils.tasc_helpers.smoothing.rts_smoother(), Algorithm
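6) returns the smoothed posterior
\[x_k \mid \text{observed data} \sim \mathcal{N}(m_k^s, P_k^s), \qquad k = 0, \dots, T,\]
with
\[\begin{split}\begin{aligned} G_k &= P_k A^\top \left( A P_k A^\top + Q \right)^{-1}, \\ m_k^s &= m_k + G_k \left( m_{k+1}^s - A m_k \right), \\ P_k^s &= P_k + G_k \left( P_{k+1}^s - A P_k A^\top - Q \right) G_k^\top, \end{aligned}\end{split}\]
initialized at \(m_T^s = m_T\), \(P_T^s = P_T\).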
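The counterfactual for the target unit is then read off the smoothed latent state via \(h_1\):
\[\hat{y}_{0,t} = h_1^\top m_t^s,\]
and the posterior variance of the observation (not just the latent target) is
\[\operatorname{Var}\left( y_{0,t} \mid \text{observed data} \right) = h_1^\top P_t^s h_1 + R_{1,1}.\]
The corresponding \((1 - \alpha)\)-confidence band is
\[\hat{y}_{0,t} \pm z_{1 - \alpha/2} \sqrt{h_1^\top P_t^s h_1 + R_{1,1}}.\]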
These are populated unconditionally on TASCResults.inference,
with \(\alpha\) controlled by the TASCConfig.alpha field.
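Given smoothed means and covariances, the counterfactual and its band reduce to a few array operations. The following sketch assumes arrays shaped (T, d) and (T, d, d) and uses a hypothetical helper name, counterfactual_band; it mirrors the formulas described above rather than the mlsynth code path.

```python
import numpy as np
from statistics import NormalDist

def counterfactual_band(m_s, P_s, h1, r11, alpha=0.05):
    """Point counterfactual h_1^T m_t^s with a (1 - alpha) Gaussian band.

    m_s: (T, d) smoothed means, P_s: (T, d, d) smoothed covariances,
    h1: (d,) target row of H, r11: target observation-noise variance.
    """
    y_hat = m_s @ h1
    var = np.einsum("i,tij,j->t", h1, P_s, h1) + r11   # h_1^T P_t^s h_1 + R_11
    z = NormalDist().inv_cdf(1 - alpha / 2)            # about 1.96 for alpha = 0.05
    half_width = z * np.sqrt(var)
    return y_hat, y_hat - half_width, y_hat + half_width

m_s = np.array([[1.0, 2.0], [3.0, 4.0]])
P_s = np.stack([np.eye(2), np.eye(2)])
h1 = np.array([1.0, 0.5])
y_hat, lower, upper = counterfactual_band(m_s, P_s, h1, r11=0.75)
print(y_hat, lower, upper)
```

Adding r11 widens the band from a statement about the latent target to a statement about an observation of it.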
Learning \(\theta\) from Pre-Intervention Data (EM)#
The parameter set \(\theta\) is learned by Expectation-Maximization
on the pre-intervention slice \(Y_{\text{pre}} \in
\mathbb{R}^{N \times T_0}\). Each outer iteration of
mlsynth.utils.tasc_helpers.em.em_pre() (Algorithm 2) runs:
E-step (filtering pass): apply the standard Kalman filter (Algorithm 4) for \(k = 1, \dots, T_0\) to obtain \((m_k, P_k)\).
E-step (smoothing pass): apply the RTS smoother backward to obtain \((m_k^s, P_k^s, G_k)\) for \(k = T_0, \dots, 0\).
M-step (closed-form MLE update): Algorithm 7. Define the sufficient statistics
\[\begin{split}\begin{aligned} \Sigma &= \frac{1}{T_0} \sum_{k=1}^{T_0} \left( P_k^s + m_k^s {m_k^s}^\top \right), & \Phi &= \frac{1}{T_0} \sum_{k=1}^{T_0} \left( P_{k-1}^s + m_{k-1}^s {m_{k-1}^s}^\top \right), \\ B &= \frac{1}{T_0} \sum_{k=1}^{T_0} y_k \, {m_k^s}^\top, & C &= \frac{1}{T_0} \sum_{k=1}^{T_0} \left( P_k^s G_{k-1}^\top + m_k^s {m_{k-1}^s}^\top \right), \\ D &= \frac{1}{T_0} \sum_{k=1}^{T_0} y_k \, y_k^\top. \end{aligned}\end{split}\]
The update is then
\[\begin{split}\begin{aligned} A' &\leftarrow C \, \Phi^{-1}, & H' &\leftarrow B \, \Sigma^{-1}, \\ Q' &\leftarrow \operatorname{Diag}\!\left( \Sigma - 2 C A'^\top + A' \Phi A'^\top \right), & R' &\leftarrow \operatorname{Diag}\!\left( D - 2 B H'^\top + H' \Sigma H'^\top \right), \\ m_0' &\leftarrow m_0^s, & P_0' &\leftarrow P_0^s + (m_0^s - m_0)(m_0^s - m_0)^\top, \end{aligned}\end{split}\]
where \(\operatorname{Diag}(\cdot)\) zeroes the off-diagonal entries when diagonal_Q=True / diagonal_R=True (the paper’s default) or returns the symmetric matrix unchanged otherwise.
The loop terminates after TASCConfig.n_em_iter outer
iterations or, if TASCConfig.em_tol is set, as soon as the
maximum absolute change in \((A, H)\) between successive iterations
falls below the threshold.
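The M-step statistics and updates translate almost line by line into NumPy. The sketch below is a simplified illustration, not the mlsynth m_step: the function name m_step_sketch is hypothetical, a single diagonal flag covers both covariances, and the \(m_0, P_0\) updates are omitted. As a sanity check it is run on a noise-free trajectory, where the updates should recover the true \(A\) and \(H\).

```python
import numpy as np

def m_step_sketch(Y, m_s, P_s, G, diagonal=True):
    """Closed-form updates for A, H, Q, R from smoothed statistics.

    Y: (N, T0); m_s: (T0 + 1, d); P_s and G: (T0 + 1, d, d), index 0 = time 0.
    """
    T0 = Y.shape[1]
    cur, prev = m_s[1:], m_s[:-1]
    Sigma = (P_s[1:].sum(axis=0) + cur.T @ cur) / T0
    Phi = (P_s[:-1].sum(axis=0) + prev.T @ prev) / T0
    B = (Y @ cur) / T0
    # sum_k P_k^s G_{k-1}^T + m_k^s m_{k-1}^{s T}
    C = (np.einsum("kij,klj->il", P_s[1:], G[:-1]) + cur.T @ prev) / T0
    D = (Y @ Y.T) / T0
    A_new = C @ np.linalg.inv(Phi)
    H_new = B @ np.linalg.inv(Sigma)
    Q_new = Sigma - 2 * C @ A_new.T + A_new @ Phi @ A_new.T
    R_new = D - 2 * B @ H_new.T + H_new @ Sigma @ H_new.T
    if diagonal:
        Q_new, R_new = np.diag(np.diag(Q_new)), np.diag(np.diag(R_new))
    return A_new, H_new, Q_new, R_new

# Noise-free check: exact states, zero smoothed covariances and gains.
A_true = 0.95 * np.array([[np.cos(0.4), -np.sin(0.4)], [np.sin(0.4), np.cos(0.4)]])
H_true = np.array([[1.0, 0.0], [0.5, 1.0], [-1.0, 2.0]])
T0, d = 30, 2
m_s = np.empty((T0 + 1, d))
m_s[0] = [1.0, 0.0]
for k in range(1, T0 + 1):
    m_s[k] = A_true @ m_s[k - 1]
Y = H_true @ m_s[1:].T
P_s = np.zeros((T0 + 1, d, d))
G = np.zeros((T0 + 1, d, d))

A_new, H_new, Q_new, R_new = m_step_sketch(Y, m_s, P_s, G)
print(np.allclose(A_new, A_true), np.allclose(H_new, H_true))
```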
Spectral Initialization#
EM is sensitive to initialization (as noted in the paper’s Section 7).
TASC therefore warm-starts \(\theta^{(0)}\) from a truncated SVD
of the pre-intervention matrix
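(mlsynth.utils.tasc_helpers.setup.initialize_parameters()):
\[Y_{\text{pre}} \approx U_d S_d V_d^\top, \qquad H^{(0)} = U_d, \qquad X^{(0)} = S_d V_d^\top,\]
where \(U_d\), \(S_d\), and \(V_d\) hold the top-\(d\) left singular vectors, singular values, and right singular vectors. The transition matrix \(A^{(0)}\) is obtained from a ridge-regularized AR(1) least-squares fit on the latent trajectory \(X^{(0)}\); \(Q^{(0)}\) and \(R^{(0)}\) are seeded from the corresponding residual variances; and \(m_0^{(0)}, P_0^{(0)}\) are taken from the first time point of \(X^{(0)}\) and from \(Q^{(0)}\) respectively.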
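A spectral warm start along these lines can be sketched in NumPy. This is an illustration under stated assumptions rather than the mlsynth initialize_parameters routine: the function name spectral_init, the ridge value, and the small variance floor are all hypothetical choices.

```python
import numpy as np

def spectral_init(Y_pre, d, ridge=1e-6, floor=1e-8):
    """Truncated-SVD warm start for (A, H, Q, R, m0, P0); Y_pre is (N, T0)."""
    U, s, Vt = np.linalg.svd(Y_pre, full_matrices=False)
    H0 = U[:, :d]                                   # top-d left singular vectors
    X0 = s[:d, None] * Vt[:d]                       # scaled right vectors, shape (d, T0)
    past, future = X0[:, :-1], X0[:, 1:]
    # Ridge-regularized least squares for x_t ~ A x_{t-1}
    A0 = future @ past.T @ np.linalg.inv(past @ past.T + ridge * np.eye(d))
    Q0 = np.diag((future - A0 @ past).var(axis=1) + floor)   # state residual variances
    R0 = np.diag((Y_pre - H0 @ X0).var(axis=1) + floor)      # observation residual variances
    m0 = X0[:, 0].copy()                            # first time point of the latent path
    P0 = Q0.copy()
    return A0, H0, Q0, R0, m0, P0, X0

rng = np.random.default_rng(0)
N, T0, d = 8, 30, 2
Y_low_rank = rng.normal(size=(N, d)) @ rng.normal(size=(d, T0)).cumsum(axis=1)
A0, H0, Q0, R0, m0, P0, X0 = spectral_init(Y_low_rank, d)

# For an exactly rank-d panel the truncated factors reproduce Y_pre.
print(np.allclose(H0 @ X0, Y_low_rank))
```

The variance floor keeps \(Q^{(0)}\) and \(R^{(0)}\) positive definite even when the residuals vanish, as they do in this noise-free example.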
Treatment Effect and Pre-Period Fit#
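For post-treatment periods \(t = T_0 + 1, \dots, T\), the average treatment effect on the treated is the mean of the post-period gap between the observed target and its TASC reconstruction:
\[\widehat{\mathrm{ATT}} = \frac{1}{T - T_0} \sum_{t = T_0 + 1}^{T} \left( y_{0,t} - \hat{y}_{0,t} \right),\]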
reported as TASCResults.att. The pre-period RMSE between the
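observed target and the smoother’s pre-treatment fit,
\[\mathrm{RMSE}_{\text{pre}} = \sqrt{\frac{1}{T_0} \sum_{t=1}^{T_0} \left( y_{0,t} - \hat{y}_{0,t} \right)^2},\]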
is reported as TASCResults.pre_rmse and serves as the primary
fit diagnostic.
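Both summaries are simple functions of the gap between the observed and reconstructed target series. A minimal sketch, assuming two length-\(T\) arrays and a hypothetical helper name summarize (not the mlsynth summarize_effects signature):

```python
import numpy as np

def summarize(y_target, y_hat, T0):
    """Return (ATT, pre-period RMSE) from observed and counterfactual paths."""
    gap = np.asarray(y_target, dtype=float) - np.asarray(y_hat, dtype=float)
    att = float(gap[T0:].mean())                       # mean post-period gap
    pre_rmse = float(np.sqrt(np.mean(gap[:T0] ** 2)))  # pre-period fit error
    return att, pre_rmse

y_target = [1.0, 2.0, 3.0, 10.0, 12.0]
y_hat = [1.0, 2.0, 4.0, 7.0, 8.0]
att, pre_rmse = summarize(y_target, y_hat, T0=3)
print(att, pre_rmse)
```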
Complexity#
The dominant cost of TASC is \(O(N_1 \, T_0 \, N^3)\), where \(N_1\) is the number of EM iterations and the \(N^3\) term arises from inverting the innovation covariance during the Kalman filter. The post-EM full-window pass adds \(O(T \, N^3)\), which is negligible when \(T \ll N_1 \, T_0\). Constraining \(R\) to be diagonal in the M-step (the default) does not change the filter’s inner-loop complexity, but it reduces the number of free parameters (and with it estimator variance) and improves numerical stability in moderate-\(N\) regimes.
Algorithm 1 and the Theoretical Appendix#
The paper’s Algorithm 1 is the abstract “SC Family of Methods” frame
(target-side regression on donors), which TASC instantiates implicitly
through the state-space machinery rather than as a discrete code path.
Appendix A’s Proposition A.1 (Kalman sufficiency, information loss by
permutation invariance, dominance) is the theoretical justification
for TASC’s edge over permutation-invariant SC variants; it does not
correspond to a separate routine in mlsynth.
Core API#
Time-Aware Synthetic Control (TASC) estimator.
Implements:
Rho, S., Illick, C., Narasipura, S., Abadie, A., Hsu, D., & Misra, V. (2026). “Time-Aware Synthetic Control.” arXiv:2601.03099.
TASC embeds the standard SC panel inside a linear-Gaussian state-space model
x_t = A x_{t-1} + q_{t-1}, q ~ N(0, Q)
y_t = H x_t + r_t, r ~ N(0, R)
and learns the parameters theta = {A, H, Q, R, m_0, P_0} from the pre-treatment
data via EM (Kalman filter + RTS smoother E-step, closed-form MLE M-step).
Counterfactual inference runs a Kalman filter pass with the target’s
observation-noise variance R_{1, 1} set to infinity over the post-treatment
window, followed by RTS smoothing. The counterfactual is then
y_hat_{0, t} = h_1^T m_t^s, with posterior CIs derived from
h_1^T P_t^s h_1 + R_{1, 1}.
Algorithm flow (see mlsynth.utils.tasc_helpers):
setup.py : Algorithm 3 line 0  prepare_tasc_inputs, init theta_0
em.py : Algorithm 2  EM_pre
filtering.py : Algorithms 4 and 5  Kalman filter passes
smoothing.py : Algorithm 6  RTS smoother
mstep.py : Algorithm 7  closed-form MLE M-step
orchestration.py : Algorithm 3  run_tasc + summarize_effects
inference.py : Algorithm 3 footer  h_1^T m^s, posterior CIs
- class mlsynth.estimators.tasc.TASC(config: TASCConfig | dict)#
Bases: object
Time-Aware Synthetic Control (TASC) estimator.
Fits a linear-Gaussian state-space model to a single-treated-unit synthetic control panel via EM, and forms counterfactual estimates and posterior confidence intervals from a Kalman / RTS pass that treats the target’s post-treatment observations as missing.
- Parameters:
config (TASCConfig or dict) – Configuration object specifying the panel inputs and the EM / inference hyperparameters. See mlsynth.config_models.TASCConfig.
- Returns:
TASCResults – Container with the learned model, EM diagnostics, smoothed states, counterfactual path, and posterior confidence intervals. See mlsynth.utils.tasc_helpers.structures.TASCResults.
Notes
Each column of the input panel is a unit, each row is a time period (datautils.dataprep convention). Internally, TASC reshapes the data to the paper’s Y in R^{N x T} orientation.
The hidden state dimension d should be small relative to min(n_donors, T) to preserve the low-rank structure of the signal.
EM is sensitive to initialization. Defaults use a spectral (top-d SVD) start which generally produces stable fits.
References
Rho, S., Illick, C., Narasipura, S., Abadie, A., Hsu, D., & Misra, V. (2026). “Time-Aware Synthetic Control.” arXiv:2601.03099.
Examples
>>> from mlsynth import TASC
>>> config = {
...     "df": panel,
...     "outcome": "sales",
...     "unitid": "state",
...     "time": "year",
...     "treat": "treated",
...     "d": 2,
... }
>>> results = TASC(config).fit()
- fit() → TASCResults#
Run the TASC pipeline and return the learned design.
- Returns:
TASCResults – Final design, inputs, posterior inference, and summary effects.
Configuration#
- class mlsynth.config_models.TASCConfig(*, df: DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = True, save: bool | str = False, counterfactual_color: List[str] = <factory>, treated_color: str = 'black', d: Annotated[int, Ge(ge=1)], n_em_iter: Annotated[int, Ge(ge=1)] = 50, em_tol: Annotated[float | None, Gt(gt=0.0)] = None, diagonal_Q: bool = True, diagonal_R: bool = True, alpha: Annotated[float, Gt(gt=0.0), Lt(lt=1.0)] = 0.05, seed: int | None = None)#
Configuration for the Time-Aware Synthetic Control (TASC) estimator.
Implements the state-space model of Rho, Illick, Narasipura, Abadie, Hsu, and Misra (2026, arXiv:2601.03099) with EM learning (Kalman filter + RTS smoother on the E-step, closed-form MLE on the M-step) and Kalman-with- infinite-variance counterfactual inference.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
Helper Modules#
Data preparation and EM initialization helpers for TASC.
datautils.dataprep returns the wide outcome matrix in the
(T, N) orientation (rows = time, columns = unit). The TASC paper works
with Y in R^{N x T} (rows = unit, columns = time), so the wrappers in
this module take care of the transpose once and only once.
- mlsynth.utils.tasc_helpers.setup.initialize_parameters(Y_pre: ndarray, d: int, seed: int | None = None) → TASCParameters#
Spectral initialization for the EM parameters.
Performs a thin SVD of Y_pre (N x T0) and uses the top-d left singular vectors as the initial observation matrix H. The initial latent trajectory is then the corresponding right singular vectors scaled by the singular values, from which a simple AR(1) least squares fit gives A. Q, R, P0 are seeded from the associated residual variances.
- Parameters:
Y_pre (np.ndarray) – Pre-treatment outcome matrix of shape (N, T0).
d (int) – Hidden state dimension.
seed (int or None) – Reserved for tie-breaking; not currently consumed but retained for API symmetry with other estimators.
- Returns:
TASCParameters – Initial theta_0.
- mlsynth.utils.tasc_helpers.setup.prepare_tasc_inputs(df: DataFrame, outcome: str, unitid: str, time: str, treat: str) → TASCInputs#
Run dataprep and reshape the result into the paper’s N x T layout.
- Parameters:
df (pd.DataFrame) – Long balanced panel data.
outcome, unitid, time, treat (str) – Column names identifying the outcome, units, time periods, and the binary treatment indicator.
- Returns:
TASCInputs – Pre / post matrices in (N, T) orientation along with metadata.
Kalman filter passes for TASC.
Implements Algorithm 4 (standard Kalman filter) and Algorithm 5 (Kalman filter with infinite observation-noise variance on the target row) from Rho, Illick, Narasipura, Abadie, Hsu, Misra (2026, arXiv:2601.03099).
The “infinite variance” trick (Sec 4.2) lets the post-treatment update use only the donor rows to refine the latent state, while the target row’s contribution to the Kalman gain is zeroed out by Schur complement.
- mlsynth.utils.tasc_helpers.filtering.kalman_filter_full(Y_pre: ndarray, Y_post_donors: ndarray, params: TASCParameters) → TASCFilteredStates#
Forward pass over all T periods (Algorithm 3, lines 1-3).
The first T0 updates use Algorithm 4; the remaining T - T0 updates use Algorithm 5 with the target’s observation variance set to infinity.
- Parameters:
Y_pre (np.ndarray) – Pre-treatment slice of shape (N, T0).
Y_post_donors (np.ndarray) – Post-treatment donor-only slice of shape (N - 1, T - T0).
params (TASCParameters) – EM-learned model parameters.
- mlsynth.utils.tasc_helpers.filtering.kalman_filter_inf_variance_step(y_donors_k: ndarray, m_prev: ndarray, P_prev: ndarray, params: TASCParameters) → Tuple[ndarray, ndarray]#
Single Kalman filter step with R_{1,1} = inf (Algorithm 5).
The target row is treated as missing. We partition
H = [h_1^T; H_2], R = diag(inf, R_2)
so that the inverse innovation covariance has zero in the (1, 1) block by Schur complement, and only the donor block contributes to the update.
- Parameters:
y_donors_k (np.ndarray) – Donor observations at time k, shape (N - 1,).
m_prev, P_prev (np.ndarray) – Previous filtered mean and covariance.
params (TASCParameters) – Current model parameters.
- mlsynth.utils.tasc_helpers.filtering.kalman_filter_pre(Y_pre: ndarray, params: TASCParameters) → TASCFilteredStates#
Run Algorithm 4 across the pre-treatment window.
- Parameters:
Y_pre (np.ndarray) – Pre-treatment outcome matrix of shape (N, T0).
params (TASCParameters) – Current model parameters.
- mlsynth.utils.tasc_helpers.filtering.kalman_filter_step(y_k: ndarray, m_prev: ndarray, P_prev: ndarray, params: TASCParameters) → Tuple[ndarray, ndarray]#
Single Kalman filter step (Algorithm 4).
Rauch-Tung-Striebel smoother for TASC (Algorithm 6).
Operates on the output of the forward Kalman pass and produces smoothed
state estimates plus the smoother-gain matrices G_k required for the
M-step (Algorithm 7 computes C from G_{k-1}).
- mlsynth.utils.tasc_helpers.smoothing.rts_smoother(filtered: TASCFilteredStates, params: TASCParameters) → TASCSmoothedStates#
Backward smoothing pass.
- Parameters:
filtered (TASCFilteredStates) – Output of kalman_filter_pre or kalman_filter_full. Index 0 holds the prior; indices 1..T hold filtered posteriors.
params (TASCParameters) – Model parameters used in the forward pass.
- Returns:
TASCSmoothedStates – Smoothed means and covariances at indices 0..T, plus the smoother gains G_k at indices 0..T-1. G[T] is zero-filled and unused.
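The index convention described above (index 0 is the prior, indices 1..T are filtered posteriors, and the last gain is zero-filled) can be seen in a compact backward pass. This is a sketch of the standard Rauch-Tung-Striebel recursion using plain arrays, with a hypothetical name rts_smoother_sketch; it is not the mlsynth routine, which takes TASCFilteredStates and TASCParameters.

```python
import numpy as np

def rts_smoother_sketch(m, P, A, Q):
    """Backward RTS pass over filtered means m (T+1, d) and covariances P (T+1, d, d)."""
    T = m.shape[0] - 1
    m_s, P_s = m.copy(), P.copy()          # smoothed values at index T equal the filtered ones
    G = np.zeros_like(P)                   # G[T] stays zero-filled and unused
    for k in range(T - 1, -1, -1):
        m_pred = A @ m[k]                  # one-step prediction from the filtered state
        P_pred = A @ P[k] @ A.T + Q
        G[k] = P[k] @ A.T @ np.linalg.inv(P_pred)
        m_s[k] = m[k] + G[k] @ (m_s[k + 1] - m_pred)
        P_s[k] = P[k] + G[k] @ (P_s[k + 1] - P_pred) @ G[k].T
    return m_s, P_s, G

# Scalar example with T = 1: prior (0, 1), one filtered posterior (1, 0.5).
A = np.array([[1.0]])
Q = np.array([[1.0]])
m = np.array([[0.0], [1.0]])
P = np.array([[[1.0]], [[0.5]]])
m_s, P_s, G = rts_smoother_sketch(m, P, A, Q)
print(m_s.ravel(), P_s.ravel(), G.ravel())
```

In the scalar example the predicted variance is 2, so the gain is 0.5; the smoothed prior mean moves halfway toward the later estimate and its variance shrinks from 1 to 0.625.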
Closed-form M-step for TASC (Algorithm 7).
Given the smoothed sufficient statistics, returns new theta' that
maximizes the expected complete-data log-likelihood. Q and R may be
constrained to be diagonal (the paper’s default) via the corresponding
flags.
- mlsynth.utils.tasc_helpers.mstep.m_step(Y_pre: ndarray, smoothed: TASCSmoothedStates, prev_params: TASCParameters, diagonal_Q: bool = True, diagonal_R: bool = True) → TASCParameters#
Maximum-likelihood parameter update (Algorithm 7).
- Parameters:
Y_pre (np.ndarray) – Pre-treatment outcomes, shape (N, T). The number of columns must equal the smoother’s T.
smoothed (TASCSmoothedStates) – Output of the RTS smoother. Indices 0..T-1 supply m_{k-1}^s and G_{k-1}; indices 1..T supply m_k^s.
prev_params (TASCParameters) – Current parameters, used to seed m_0 and P_0 updates.
diagonal_Q, diagonal_R (bool) – If True (default), the corresponding covariance update is restricted to its diagonal as in the paper.
EM loop on pre-intervention data (Algorithm 2: EM_pre).
Each outer iteration runs:
Forward Kalman filter pass over Y_pre (Algorithm 4).
Backward RTS smoother pass (Algorithm 6).
Closed-form M-step update (Algorithm 7).
Optionally terminates early when the max-abs change in (A, H) falls
below em_tol.
- mlsynth.utils.tasc_helpers.em.em_pre(Y_pre: ndarray, init_params: TASCParameters, n_em_iter: int, em_tol: float | None = None, diagonal_Q: bool = True, diagonal_R: bool = True) → Tuple[TASCParameters, ndarray, TASCFilteredStates, TASCSmoothedStates]#
Run EM_pre over the pre-treatment window.
- Parameters:
Y_pre (np.ndarray) – Pre-treatment outcome matrix of shape (N, T0).
init_params (TASCParameters) – Initial parameters theta^{(0)}.
n_em_iter (int) – Maximum number of EM iterations (N_1 in the paper).
em_tol (float or None) – If not None, EM stops once the max absolute change in (A, H) is below this threshold.
diagonal_Q, diagonal_R (bool) – Forwarded to m_step.
- Returns:
params (TASCParameters) – Final parameter estimate.
deltas (np.ndarray) – Max-abs change in (A, H) at each iteration (length equal to the number of iterations actually run).
filtered (TASCFilteredStates) – Forward filtered states from the final iteration.
smoothed (TASCSmoothedStates) – RTS smoothed states from the final iteration.
Top-level TASC procedure (Algorithm 3).
Sequence:
EM_pre on the pre-treatment data to learn theta.
Forward Kalman pass: Algorithm 4 on t = 1..T0 and Algorithm 5 on t = T0 + 1..T.
Backward RTS smoother on the full window.
Counterfactual + posterior CI computation.
- mlsynth.utils.tasc_helpers.orchestration.run_tasc(inputs: TASCInputs, d: int, n_em_iter: int, em_tol: float | None, diagonal_Q: bool, diagonal_R: bool, alpha: float, seed: int | None = None)#
Execute Algorithm 3 end-to-end.
- Returns:
design (TASCDesign) – Learned model with EM diagnostics and final filtered / smoothed states across all T periods.
inference (TASCInference) – Counterfactual and posterior-based CIs.
- mlsynth.utils.tasc_helpers.orchestration.summarize_effects(inputs: TASCInputs, inference: TASCInference) → tuple[float, float]#
Compute ATT (post-period mean gap) and pre-period RMSE.
Both are returned as plain floats for storage on TASCResults.
Counterfactual inference for TASC.
Given the smoothed states from the full forward / backward pass (Algorithm 3, lines 1-3 of the post-EM block), the counterfactual is
y_hat_{0, t} = h_1^T m_t^s
and the posterior variance of the target observation is
Var(y_{0, t} | y_{1:T_donors}) = h_1^T P_t^s h_1 + R_{1, 1}
(adding back the observation noise restores the variance of an observation rather than the latent target).
- mlsynth.utils.tasc_helpers.inference.counterfactual_with_ci(smoothed: TASCSmoothedStates, params: TASCParameters, alpha: float) → TASCInference#
Compute the TASC counterfactual path and posterior CIs.
- Parameters:
smoothed (TASCSmoothedStates) – Smoothed states from the full pass over all T periods. Index 0 holds the smoothed prior; indices 1..T hold the smoothed posteriors for each observation timestep.
params (TASCParameters) – Final EM-learned parameters. H[0] is h_1 and R[0, 0] is the target’s observation-noise variance R_{1, 1}.
alpha (float) – Significance level. The bands are y_hat +/- z_{1 - alpha/2} * sd.
Plotting helper for TASC.
Wraps mlsynth.utils.resultutils.plot_estimates so we get the standard
observed-vs-counterfactual chart with the posterior-based CI band shaded
behind the counterfactual curve.
- mlsynth.utils.tasc_helpers.plotter.plot_tasc(results: TASCResults, treated_color: str = 'black', counterfactual_color: str | List[str] = 'red', save: bool | dict = False, time_axis_label: str = 'Time', outcome_label: str | None = None, treatment_label: str = 'Treatment', unit_label: str = 'Unit') → None#
Render the TASC counterfactual against the observed series.
Structured containers for the TASC pipeline.
All matrices follow the paper’s convention Y in R^{N x T} with rows = units
(target as the first row, donors below) and columns = time. This is the
transpose of datautils.dataprep’s Ywide (which is time x unit). The
transpose is performed once in setup.prepare_tasc_inputs.
- class mlsynth.utils.tasc_helpers.structures.TASCDesign(parameters: TASCParameters, n_em_iter_used: int, em_param_deltas: ndarray, filtered: TASCFilteredStates, smoothed: TASCSmoothedStates)#
Learned model and EM diagnostics.
- Parameters:
parameters (TASCParameters) – Final EM-estimated parameters.
n_em_iter_used (int) – Number of EM iterations actually executed (may be less than the cap if em_tol triggered early stopping).
em_param_deltas (np.ndarray) – Per-iteration max absolute change in (A, H). Length equal to n_em_iter_used.
filtered (TASCFilteredStates) – Forward filtered states from the final full pass.
smoothed (TASCSmoothedStates) – Backward smoothed states from the final full pass.
- em_param_deltas: ndarray#
- filtered: TASCFilteredStates#
- parameters: TASCParameters#
- smoothed: TASCSmoothedStates#
- class mlsynth.utils.tasc_helpers.structures.TASCFilteredStates(m: ndarray, P: ndarray)#
Output of the forward Kalman pass.
- Parameters:
m (np.ndarray) – Filtered means stacked over time, shape (T + 1, d). Index 0 holds the prior m0; index k >= 1 holds the posterior mean m_{k|k}.
P (np.ndarray) – Filtered covariances, shape (T + 1, d, d). Index 0 holds P0.
- P: ndarray#
- m: ndarray#
- class mlsynth.utils.tasc_helpers.structures.TASCInference(counterfactual: ndarray, ci_lower: ndarray, ci_upper: ndarray, posterior_variance: ndarray, alpha: float)#
Counterfactual point estimates and posterior confidence intervals.
- Parameters:
counterfactual (np.ndarray) – Estimated counterfactual for the target unit across all T periods, y_hat_{0, t} = h_1^T m_t^s.
ci_lower (np.ndarray) – Lower confidence band, shape (T,).
ci_upper (np.ndarray) – Upper confidence band, shape (T,).
posterior_variance (np.ndarray) – Posterior variance of the target row, h_1^T P_t^s h_1 + R_{1,1}, shape (T,).
alpha (float) – Significance level used to build the bands.
- ci_lower: ndarray#
- ci_upper: ndarray#
- counterfactual: ndarray#
- posterior_variance: ndarray#
- class mlsynth.utils.tasc_helpers.structures.TASCInputs(Y_full: ndarray, Y_pre: ndarray, Y_post_donors: ndarray | None, T0: int, T: int, N: int, treated_unit_name: str, donor_names: Sequence, time_labels: ndarray, pre_periods: int, post_periods: int, Ywide: object, y_target: ndarray)#
Pre-processed panel data fed into the TASC EM and inference loops.
- Parameters:
Y_full (np.ndarray) – Outcome matrix of shape (N, T) with the target in row 0 and the n = N - 1 donor units in rows 1..N-1.
Y_pre (np.ndarray) – Pre-treatment slice Y_full[:, :T0] of shape (N, T0).
Y_post_donors (np.ndarray or None) – Post-treatment donor-only slice Y_full[1:, T0:] of shape (n, T - T0). None if T0 == T (no post period available).
T0 (int) – Number of pre-treatment periods.
T (int) – Total number of periods.
N (int) – Total number of units (target + donors).
treated_unit_name (str) – Identifier of the treated unit.
donor_names (Sequence) – Identifiers for the donor units in the order matching rows 1..N-1.
time_labels (np.ndarray) – Time labels in their original order, length T.
pre_periods (int) – Alias of T0 kept for compatibility with plotting helpers that expect a processed_data_dict from dataprep.
post_periods (int) – T - T0.
Ywide (object) – The wide pandas frame produced by dataprep (rows = time, columns = units). Retained so that plot_estimates can use it directly without re-pivoting.
y_target (np.ndarray) – Convenience copy of the full observed target series, length T (post-treatment values are the observed values, used only for plotting and effect computation; TASC treats them as missing during filtering).
- Y_full: ndarray#
- Y_pre: ndarray#
- time_labels: ndarray#
- y_target: ndarray#
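The slicing conventions described above can be sketched with plain NumPy. This is only an illustration of the documented shapes on a made-up matrix, not the mlsynth preprocessing code; the sizes are arbitrary.

```python
import numpy as np

N, T, T0 = 4, 10, 6                                    # illustrative sizes only
Y_full = np.arange(N * T, dtype=float).reshape(N, T)   # row 0 = target unit

Y_pre = Y_full[:, :T0]                                 # shape (N, T0)
Y_post_donors = Y_full[1:, T0:] if T0 < T else None    # shape (N - 1, T - T0)
y_target = Y_full[0].copy()                            # length T

print(Y_pre.shape, Y_post_donors.shape, y_target.shape)
```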
- class mlsynth.utils.tasc_helpers.structures.TASCParameters(A: ndarray, H: ndarray, Q: ndarray, R: ndarray, m0: ndarray, P0: ndarray)#
State-space parameters theta = {A, H, Q, R, m0, P0}.
- Parameters:
  - A (np.ndarray) – Transition matrix, shape (d, d).
  - H (np.ndarray) – Observation matrix, shape (N, d). Row 0 is h_1^T.
  - Q (np.ndarray) – State-noise covariance, shape (d, d).
  - R (np.ndarray) – Observation-noise covariance, shape (N, N).
  - m0 (np.ndarray) – Initial state mean, shape (d,).
  - P0 (np.ndarray) – Initial state covariance, shape (d, d).
- A: ndarray#
- H: ndarray#
- P0: ndarray#
- Q: ndarray#
- R: ndarray#
- m0: ndarray#
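To make the roles of these six arrays concrete, here is a minimal simulation sketch. It assumes the standard linear-Gaussian form x_t = A x_{t-1} + q_t with q_t ~ N(0, Q) and y_t = H x_t + r_t with r_t ~ N(0, R), which matches the shapes listed above; the particular values are arbitrary and are not taken from the paper or from mlsynth.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, T = 2, 5, 40                          # illustrative sizes only

A = 0.95 * np.eye(d)                        # transition matrix, (d, d)
H = rng.normal(size=(N, d))                 # observation matrix, (N, d)
Q = 0.01 * np.eye(d)                        # state-noise covariance, (d, d)
R = 0.10 * np.eye(N)                        # observation-noise covariance, (N, N)
m0, P0 = np.zeros(d), np.eye(d)             # initial state mean and covariance

x = rng.multivariate_normal(m0, P0)         # x_0 ~ N(m0, P0)
Y = np.empty((N, T))
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(d), Q)         # state step
    Y[:, t] = H @ x + rng.multivariate_normal(np.zeros(N), R)   # observation

print(Y.shape)                              # (5, 40)
```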
- class mlsynth.utils.tasc_helpers.structures.TASCResults(inputs: TASCInputs, design: TASCDesign, inference: TASCInference, att: float, pre_rmse: float)#
Public TASC.fit() return container.
- Parameters:
  - inputs (TASCInputs) – Pre-processed panel data.
  - design (TASCDesign) – Learned model, EM diagnostics, and filtered / smoothed state arrays.
  - inference (TASCInference) – Counterfactual and posterior-based confidence intervals.
  - att (float) – Average treatment effect on the treated across post-treatment periods, mean(y_{0, t} - y_hat_{0, t}) for t > T0.
  - pre_rmse (float) – Root mean squared error between the observed target and its smoother-based fit over the pre-treatment window.
- design: TASCDesign#
- inference: TASCInference#
- inputs: TASCInputs#
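The two scalar summaries follow directly from the definitions above. A small worked sketch on made-up numbers (with 0-based arrays, the post-treatment periods t > T0 are the slice [T0:]):

```python
import numpy as np

T0 = 3
y_obs = np.array([10.0, 11.0, 12.0, 9.0, 8.0])     # made-up observed target
y_hat = np.array([10.5, 11.0, 11.5, 12.0, 12.0])   # made-up fit / counterfactual

# Mean post-period gap between observed and counterfactual outcomes.
att = float(np.mean(y_obs[T0:] - y_hat[T0:]))

# Root mean squared pre-period fitting error.
pre_rmse = float(np.sqrt(np.mean((y_obs[:T0] - y_hat[:T0]) ** 2)))

print(att, round(pre_rmse, 3))                     # -3.5 0.408
```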
- class mlsynth.utils.tasc_helpers.structures.TASCSmoothedStates(m_s: ndarray, P_s: ndarray, G: ndarray)#
Output of the RTS backward pass.
- Parameters:
  - m_s (np.ndarray) – Smoothed means, shape (T + 1, d). Index 0 is m_0^s.
  - P_s (np.ndarray) – Smoothed covariances, shape (T + 1, d, d). Index 0 is P_0^s.
  - G (np.ndarray) – RTS smoother gain matrices, shape (T + 1, d, d). G[k] is the gain used to smooth time k from k + 1; G[T] is unused.
- G: ndarray#
- P_s: ndarray#
- m_s: ndarray#
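For readers who want to see where arrays of these shapes come from, the following is a textbook Kalman filter plus RTS backward pass, written as a sketch under simplifying assumptions: every entry of Y is observed (whereas TASC treats the target's post-treatment values as missing), and the function name kalman_rts and the demo values are hypothetical. It is not the mlsynth implementation.

```python
import numpy as np


def kalman_rts(Y, A, H, Q, R, m0, P0):
    """Kalman forward pass plus RTS backward pass for a fully observed Y (N, T)."""
    T = Y.shape[1]
    d = A.shape[0]
    m_f = np.zeros((T + 1, d))          # filtered means, index 0 = prior
    P_f = np.zeros((T + 1, d, d))       # filtered covariances
    m_p = np.zeros((T + 1, d))          # one-step-ahead predicted means
    P_p = np.zeros((T + 1, d, d))       # one-step-ahead predicted covariances
    m_f[0], P_f[0] = m0, P0

    for t in range(1, T + 1):
        m_p[t] = A @ m_f[t - 1]
        P_p[t] = A @ P_f[t - 1] @ A.T + Q
        S = H @ P_p[t] @ H.T + R                        # innovation covariance
        K = np.linalg.solve(S, H @ P_p[t]).T            # gain P_p H^T S^{-1}
        m_f[t] = m_p[t] + K @ (Y[:, t - 1] - H @ m_p[t])
        P_f[t] = P_p[t] - K @ S @ K.T

    m_s, P_s = m_f.copy(), P_f.copy()   # at index T, smoothed = filtered
    G = np.zeros((T + 1, d, d))         # G[T] stays zero (unused)
    for k in range(T - 1, -1, -1):
        G[k] = np.linalg.solve(P_p[k + 1], A @ P_f[k]).T    # P_f A^T P_p^{-1}
        m_s[k] = m_f[k] + G[k] @ (m_s[k + 1] - m_p[k + 1])
        P_s[k] = P_f[k] + G[k] @ (P_s[k + 1] - P_p[k + 1]) @ G[k].T
    return m_s, P_s, G


rng = np.random.default_rng(0)
d, N, T = 2, 4, 25
H = rng.normal(size=(N, d))
Y = rng.normal(size=(N, T))             # placeholder data, not a real panel
m_s, P_s, G = kalman_rts(Y, 0.9 * np.eye(d), H, 0.1 * np.eye(d),
                         0.5 * np.eye(N), np.zeros(d), np.eye(d))
print(m_s.shape, P_s.shape, G.shape)    # (26, 2) (26, 2, 2) (26, 2, 2)
```

The index conventions mirror the ones documented above: index 0 holds the smoothed initial state, and G[k] links time k to time k + 1.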
Example#
```python
from mlsynth import TASC

# df: long-form pandas DataFrame (one row per unit-period), loaded beforehand.
# TASC accepts either a TASCConfig instance or a plain dict.
config = {
    "df": df,
    "outcome": "sales",
    "unitid": "state",
    "time": "year",
    "treat": "treated",        # binary 0/1 treatment indicator
    "d": 2,                    # hidden state dimension (small)
    "n_em_iter": 50,           # N_1 in Algorithm 2
    "em_tol": 1e-4,            # optional early-stopping on max |delta(A, H)|
    "diagonal_Q": True,        # paper default; set False for full covariance
    "diagonal_R": True,
    "alpha": 0.05,             # significance level for posterior CIs
    "display_graphs": True,
}
results = TASC(config).fit()

# Point estimate and fit diagnostic
print(results.att)             # mean post-period gap, y_{0,t} - h_1' m_t^s
print(results.pre_rmse)        # pre-period RMSE of the smoother's target fit

# Counterfactual path and posterior confidence band
cf = results.inference.counterfactual    # length-T vector
lo = results.inference.ci_lower          # length-T vector
hi = results.inference.ci_upper          # length-T vector

# Learned model and EM diagnostics
theta = results.design.parameters        # A, H, Q, R, m0, P0
print(theta.A.shape, theta.H.shape)
print(results.design.n_em_iter_used)     # number of EM iterations executed
print(results.design.em_param_deltas)    # per-iteration max |delta(A, H)|

# Smoothed latent trajectory (useful for downstream diagnostics)
m_s = results.design.smoothed.m_s        # shape (T + 1, d)
P_s = results.design.smoothed.P_s        # shape (T + 1, d, d)

# Inputs preserved on the result object for plotting / re-analysis
results.inputs.Y_full.shape              # (N, T)
results.inputs.Y_pre.shape               # (N, T0)
results.inputs.Y_post_donors.shape       # (N - 1, T - T0) (None if no post)
results.inputs.treated_unit_name
results.inputs.donor_names
```
Verification#
This section checks the implementation in two ways: an empirical
replication against the authors’ published numbers (Path A) and a
Section 5 state-space Monte Carlo (Path B). Path A reruns the
classical Proposition 99 California-tobacco illustration from Section
6.1 of [TASC] using the long-form panel
basedata/prop99_packsales.csv shipped with mlsynth, and
reproduces the post-1988 divergence between observed California
cigarette sales and the TASC counterfactual shown in the paper’s
Figure 10. Path B replicates the four-cell \((Q, R)\) ablation grid
(Figures 3 and 4) by drawing panels directly from TASC’s own
generative state-space model and comparing mlsynth.TASC against a
simplex-constrained Synthetic Control baseline, the same baseline
the paper benchmarks.
Path A: Proposition 99 California (Section 6.1)#
The paper runs TASC on per-capita cigarette sales alone (no auxiliary
predictors) with hidden-state dimension \(d = 2\). mlsynth.TASC
on the same long-form panel reproduces the qualitative pattern of the
paper’s Figure 10 directly:
```python
import pandas as pd
from mlsynth import TASC

df = pd.read_csv("basedata/prop99_packsales.csv")
df["treat"] = ((df["state"] == "California")
               & (df["year"] >= 1989)).astype(int)

res = TASC({"df": df, "outcome": "cigsale", "unitid": "state",
            "time": "year", "treat": "treat", "d": 2,
            "n_em_iter": 50, "em_tol": 1e-4, "alpha": 0.05,
            "seed": 0, "display_graphs": False}).fit()

yhat = res.inference.counterfactual
print(f"pre-RMSE = {res.pre_rmse:.3f} ATT = {res.att:.3f}")
```

prints:

```
pre-RMSE = 0.767 ATT = -16.793
```
with the year-by-year trajectory

| Year | California (observed) | TASC counterfactual | Gap |
|---|---|---|---|
| 1985 | 102.80 | 102.66 | +0.14 |
| 1988 | 90.10 | 91.88 | -1.78 |
| 1989 | 82.40 | 88.30 | -5.90 |
| 1990 | 77.80 | 84.37 | -6.57 |
| 1995 | 56.40 | 76.50 | -20.10 |
| 2000 | 41.60 | 65.14 | -23.54 |
The pre-treatment fit tracks California’s observed series closely (pre-RMSE \(= 0.77\) packs against an outcome scale of roughly 100 packs; the 1985 and 1988 gaps are \(+0.14\) and \(-1.78\)). The divergence opens at the 1989 intervention and the gap widens steadily across the years shown, reaching roughly \(-24\) packs by 2000 against the paper’s Figure 10 gap of about \(-25\) to \(-30\) packs at the same horizon. The average treatment effect over the post-treatment years (1989 onward) is \(\widehat{\mathrm{ATT}} = -16.8\) packs per capita per year, in the same neighbourhood as Abadie, Diamond and Hainmueller’s classical estimate.
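The Gap column is simply observed minus counterfactual. The short check below recomputes it from the six rows reported in the table; it uses only those published values, so the mean of these six gaps is not the ATT, which averages over every post-treatment year.

```python
# (observed, TASC counterfactual) for the six years reported in the table
rows = {
    1985: (102.80, 102.66),
    1988: (90.10, 91.88),
    1989: (82.40, 88.30),
    1990: (77.80, 84.37),
    1995: (56.40, 76.50),
    2000: (41.60, 65.14),
}

gaps = {year: round(obs - cf, 2) for year, (obs, cf) in rows.items()}
for year, gap in gaps.items():
    print(f"{year}: {gap:+.2f}")

# Among the reported post-treatment years the gap grows more negative.
post = [gaps[y] for y in (1989, 1990, 1995, 2000)]
print(post == sorted(post, reverse=True))
```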
Path B: Section 5 state-space ablation grid#
The paper’s Section 5.2 ablation sweeps a \(2 \times 2\) grid of
state-perturbation and observation-noise covariance scales
\((Q, R)\) (Figures 3-4): a “small” covariance has diagonal
variance \(0.01\) (average \(|r_t| \approx 0.084\)) and a
“big” covariance has diagonal variance \(1.0\) (average
\(|r_t| \approx 0.836\)). Panels are drawn from TASC’s own
generative model (Equations 2-3), so this is a correctly specified
Monte Carlo. The DGP is packaged as
mlsynth.utils.tasc_helpers.simulation.simulate_tasc_sample();
the script below compares the post-period RMSE of mlsynth.TASC
(\(d_{\mathrm{fit}} = d_{\mathrm{true}} = 5\)) against a
simplex-constrained Synthetic Control baseline. Note that the script
sets q_scale to 0.10, not 1.0, in its two “big Q” cells.
```python
import numpy as np
import scipy.optimize as opt
from mlsynth import TASC
from mlsynth.utils.tasc_helpers.simulation import simulate_tasc_sample


def sc_simplex(Y, T0):
    """Post-period RMSE of a simplex-constrained SC fit on the pre-period."""
    y, X = Y[0], Y[1:].T                  # target series, (T, n) donor matrix
    n = X.shape[1]
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    bnds = [(0.0, 1.0)] * n
    r = opt.minimize(lambda w: ((X[:T0] @ w - y[:T0]) ** 2).sum(),
                     np.full(n, 1.0 / n), method="SLSQP",
                     bounds=bnds, constraints=cons)
    return float(np.sqrt(np.mean((y[T0:] - X[T0:] @ r.x) ** 2)))


def tasc_rmse(sample, d_fit=5):
    """Post-period RMSE of the TASC counterfactual for the target unit."""
    r = TASC({"df": sample.df, "outcome": "y", "treat": "treat",
              "unitid": "unit", "time": "time", "d": d_fit,
              "n_em_iter": 30, "em_tol": 1e-4, "alpha": 0.05,
              "seed": 0, "display_graphs": False}).fit()
    y0, T0 = sample.Y[0], sample.T0
    return float(np.sqrt(np.mean(
        (y0[T0:] - r.inference.counterfactual[T0:]) ** 2)))


M = 30                                    # Monte Carlo replications per cell
for q, r in [(0.01, 0.01), (0.01, 1.0), (0.10, 0.01), (0.10, 1.0)]:
    sc_, tasc_ = [], []
    for seed in range(M):
        s = simulate_tasc_sample(q_scale=q, r_scale=r,
                                 rng=np.random.default_rng(seed))
        sc_.append(sc_simplex(s.Y, s.T0))
        tasc_.append(tasc_rmse(s))
    print(f"q={q:.2f} r={r:.2f} TASC={np.median(tasc_):.3f} "
          f"SC={np.median(sc_):.3f}")
```
prints (at \(M = 30\), \(N = 38\), \(T = 100\), \(T_0 = 50\), \(d_{\mathrm{true}} = 5\)):

| Regime | TASC median RMSE | SC median RMSE | Margin (SC / TASC) |
|---|---|---|---|
| small Q, small R | 0.116 | 0.196 | 1.7x |
| small Q, big R | 1.103 | 1.198 | 1.1x |
| big Q, small R | 0.117 | 0.524 | 4.5x |
| big Q, big R | 1.130 | 1.301 | 1.2x |
TASC has a lower median RMSE than SC in all four regimes, and the
margin over SC is largest in the high-\(Q\) / low-\(R\) cell, the
same regime where the paper’s Figure 4 identifies TASC’s strongest
dominance (a fitted state-space model extracts the persistent
low-rank signal that the simplex projection cannot exploit). Under
high observation noise (\(R = 1.0\)), the SC simplex projection
still trails TASC but by a narrower margin, reflecting the noise
floor common to both estimators. The paper’s Figures 3-4 also
include the Robust Synthetic Control of Amjad, Shah and Shen (2018)
and the Causal Impact Model of Brodersen et al. (2015) as additional
comparators that are not in mlsynth; the ordering above against the
canonical simplex-SC baseline is the slice of those comparisons that
mlsynth can reproduce directly.
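The Margin column is the ratio of the two medians. A quick recomputation from the values reported in the table (nothing here is re-simulated):

```python
# (TASC median RMSE, SC median RMSE) copied from the table above
table = {
    "small Q, small R": (0.116, 0.196),
    "small Q, big R": (1.103, 1.198),
    "big Q, small R": (0.117, 0.524),
    "big Q, big R": (1.130, 1.301),
}

margins = {regime: sc / tasc for regime, (tasc, sc) in table.items()}
for regime, margin in margins.items():
    print(f"{regime}: SC / TASC = {margin:.1f}x")

best = max(margins, key=margins.get)    # regime with the largest TASC advantage
print(best)
```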
The takeaway matches the paper’s headline finding: when the data-generating process carries a persistent low-rank temporal signal, as it does in many policy panels with strong trends, explicitly fitting that temporal structure through a state-space model lowers post-period prediction error relative to permutation-invariant alternatives, and the advantage widens as the latent signal strengthens (large \(Q\)).
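What "permutation-invariant" means in practice can be seen in a few lines. The sketch below uses unconstrained least squares as a simple stand-in for the simplex-constrained SC fit (a deliberate simplification, not the paper's baseline) and shows that shuffling the pre-period time indices leaves the fitted donor weights unchanged, while a quantity that depends on time ordering, here a lag-1 regression coefficient, does change. All data are simulated random walks, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T0, n = 30, 5
X_pre = np.cumsum(rng.normal(size=(T0, n)), axis=0)   # donor random walks
w_true = np.array([0.4, 0.3, 0.2, 0.1, 0.0])
y_pre = X_pre @ w_true + 0.05 * rng.normal(size=T0)   # target pre-period series


def ls_weights(X, y):
    """Least-squares donor weights (stand-in for a constrained SC fit)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def lag1_coef(y):
    """Slope from regressing y_t on y_{t-1}; depends on the time ordering."""
    return float(y[1:] @ y[:-1] / (y[:-1] @ y[:-1]))


perm = rng.permutation(T0)                            # shuffle the time indices
same_weights = np.allclose(ls_weights(X_pre, y_pre),
                           ls_weights(X_pre[perm], y_pre[perm]))
print(same_weights)                                   # True
print(lag1_coef(y_pre), lag1_coef(y_pre[perm]))       # the two values differ
```

The squared-error objective is a sum over time periods, so reordering the terms cannot change its minimizer; a model with a shared transition matrix has no such invariance.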
References#
Rho, S., Illick, C., Narasipura, S., Abadie, A., Hsu, D., & Misra, V. (2026). Time-Aware Synthetic Control. arXiv:2601.03099.
Durbin, J., & Koopman, S. J. (2012). Time Series Analysis by State Space Methods. Oxford Statistical Science Series 38, 2nd edition. Oxford University Press.