Bayesian Synthetic Control with a Soft Simplex Constraint (BVS-SS)#
Overview#
BVS-SS arXiv:2503.06454 is a Bayesian synthetic control estimator that wraps two ideas around the standard SCM regression \(Y = Xw + \varepsilon\):
Spike-and-slab variable selection. Each donor is either active (\(\gamma_i = 1\)) or excluded (\(\gamma_i = 0,\ w_i = 0\)), with a Bernoulli prior on inclusion. This handles high-dimensional panels where
N(donor count) is comparable to or exceedsT, and provides posterior inclusion probabilities \(P(\gamma_i = 1 \mid y)\) that quantify donor relevance.Soft simplex constraint. Selected weights are drawn around a Dirichlet mean \(\mu_\gamma\) with a learnable variance \(\tau\). When \(\tau \to 0\) the prior collapses to the hard simplex of Abadie & Gardeazabal (2003); when \(\tau \to \infty\) it becomes an unconstrained spike-and-slab regression. The posterior of \(\tau\) tells the user whether the simplex constraint is supported by the data.
Compared to other Bayesian SCMs in mlsynth, BVS-SS is the only
one that simultaneously selects donors and estimates how strictly
the simplex should hold. This makes it the right tool when (a) the
donor pool is large, (b) you suspect some donors are irrelevant, and
(c) the appropriateness of the simplex constraint is itself a
modeling question (e.g. Hong Kong handover, Basque conflict, NFP tax
evasion — all cases where the treated unit’s level may legitimately
exceed any convex combination of donors).
Sampling is by a custom Metropolis-within-Gibbs scheme implemented in
pure numpy + scipy — no PyMC, NumPyro, or JAX
dependencies.
Mathematical Formulation#
Let \(Y \in \mathbb{R}^{T_0}\) be the pre-treatment outcome of the treated unit and \(X \in \mathbb{R}^{T_0 \times N}\) the contemporaneous donor matrix. Both are demeaned column-wise using the pre-treatment means before entering the sampler (this matches the paper’s working setup and makes the soft-simplex prior numerically well-conditioned).
Likelihood and Hierarchical Prior#
The Gaussian likelihood is
The hierarchical prior over \((w, \mu, \tau, \phi, \gamma)\) from Eq. (2) of the paper is
with the convention that \(\mu_i = w_i = 0\) whenever \(\gamma_i = 0\). The fixed-\(\alpha = 1\) case (uniform Dirichlet on the active simplex) is what BVS-SS implements in this package; the paper’s simulations and both empirical applications use this setting.
Interpreting the soft-constraint variance \(\tau\) is the key modeling insight:
As \(\tau \downarrow 0\), the prior of \(w_\gamma\) concentrates at \(\mu_\gamma \in \Delta^{|\gamma| - 1}\), i.e. the hard simplex.
As \(\tau \uparrow \infty\), the prior on \(w_\gamma\) is effectively uninformative, recovering an unconstrained spike-and-slab regression à la Kim et al. (2020).
The posterior of \(\tau\) is therefore a data-driven indicator of whether the simplex constraint is appropriate for the application at hand. The paper proves (Theorem 4) that as \(\tau \to \infty\) the BVS-SS posterior of \(\gamma\) becomes identical to the unconstrained spike-and-slab posterior, so the data really does pick between the two regimes through \(\tau\).
Marginal Likelihood and Posterior Conditionals#
A Woodbury-identity calculation (Eq. (4) of the paper) yields
with the two repeated quantities
The \(\phi\) and \(w_\gamma\) full conditionals are conjugate (Eqs. (6)–(7)):
The \(\tau\) conditional has no closed form. The \((\gamma, \mu)\) block is the technically interesting piece — it cannot be updated coordinate-wise (the simplex constraint \(\sum_{i: \gamma_i = 1} \mu_i = 1\) makes single-coordinate moves degenerate) so the paper introduces a two-coordinate Gibbs update over pairs \((\gamma_i, \gamma_j, \mu_i, \mu_j)\).
Metropolis-within-Gibbs Sampler (Algorithm 1)#
One outer iteration performs three updates in order.
1. Pair update for \((\gamma_i, \gamma_j, \mu_i, \mu_j)\). For each unordered pair \(i < j\), fix \(\mu_{-(i,j)}\) and define the residual mass \(s = 1 - \sum_{k \neq i, j} \mu_k\). Three cases follow:
\(s = 0\): the simplex constraint forces \(\mu_i = \mu_j = 0\), no draw needed.
\(s > 0\): enumerate the four possible inclusion patterns \((\gamma_i, \gamma_j) \in \{(0,0), (1,0), (0,1), (1,1)\}\). The \((0,0)\) case is infeasible (would violate the simplex). The other three have closed-form conditional posterior probabilities (Lemma S2 of the paper). When the drawn pattern is \((1, 1)\), the conditional distribution of \(\mu_i\) is the univariate truncated normal
\[\mu_i \;\sim\; \mathcal{N}_{(0,\,s)}\!\left(\beta_{i,j},\; (\phi \Lambda_{i,j})^{-1}\right), \qquad \mu_j = s - \mu_i,\]where (Lemma S1)
\[\Lambda_{i,j} \;=\; (X_i - X_j)^\top \Sigma_{\gamma^{ij}, \tau} (X_i - X_j), \quad \beta_{i,j} \;=\; \frac{1}{\Lambda_{i,j}}\, (X_i - X_j)^\top \Sigma_{\gamma^{ij}, \tau} \!\left(y - s X_j - \!\!\!\sum_{k \neq i, j} \mu_k X_k\right).\]
The pair-update sweep visits every \((i, j)\) pair once per outer iteration. With \(\alpha = 1\) all probabilities involve only the standard normal CDF and a single truncated-normal draw — no rejection sampling and no numerical integration.
2. \(\phi\) Gibbs draw. Plug the updated \(\mu\) into the closed-form Gamma conditional above.
3. \(\tau\) MH steps. Repeat for \(n_\tau\) iterations a log-random-walk proposal
with the lower boundary \(\log \tau_{\min}\) reflected (proposals below \(\tau_{\min}\) are mirrored back into the support). The acceptance ratio is
where the final \(\tau^* / \tau\) factor is the Jacobian of the log-space random walk.
Counterfactual Imputation and ATT#
The sampler returns posterior draws \(\{\mu^{(t)}\}_{t = 1}^{S}\) after burn-in. The treated unit’s counterfactual at every period \(t\) is constructed from the column-demeaned donor matrix \(\tilde{X}\) (using the same pre-treatment means as the sampler) via
where \(\bar y_{\text{pre}}\) is the pre-treatment mean of the treated outcome. The post-treatment ATT per draw is
and the reported headline ATT is the posterior mean of \(\widehat{\mathrm{ATT}}^{(t)}\). Credible intervals are percentile bands over \(\widehat{\mathrm{ATT}}^{(t)}\) at level \(1 - \mathrm{ci\_alpha}\) (default 95 %). Pointwise bands on the counterfactual are computed analogously period-by-period.
Note that BVS-SS computes the counterfactual directly from the
\(\mu\) posterior rather than drawing \(w_\gamma\) from its
Eq. (7) conditional. The paper’s empirical Section 6 uses this
lower-variance estimator; the implementation in mlsynth matches
that choice.
When BVS-SS Is the Right Tool#
The paper documents three settings where BVS-SS materially outperforms the classical SCM:
High-dimensional donor pools (Section 5.2). When
Nis comparable to or exceedsT_0, the standard quadratic program does not have a unique solution and Lasso-style alternatives tend to over-select. BVS-SS’s spike-and-slab structure recovers a sparse subset and converges to the oracle OLS estimator as the sample grows.Simplex-violating data (Section 5.2, \(\|w^*\|_1 \in \{2, 3\}\)). When the true sum of weights exceeds 1 — as it plausibly does for outliers like Hong Kong’s GDP in the late 1990s — methods that enforce the hard simplex are biased, while BVS-SS’s posterior of \(\tau\) moves away from zero and the model adapts.
Anti-tax-evasion / anti-corruption policy evaluation (Section 6). Replicating Carvalho et al. (2018) and Shi & Huang (2023), BVS-SS recovers the published ATT magnitudes with substantially smaller credible intervals than Lasso-based ArCo, while selecting only ~2 donors on average instead of the entire pool.
A handy diagnostic is the posterior of \(\tau\): small
posterior mass near zero is a sign the data agrees with the simplex
constraint, while bulk away from zero is evidence to relax it. BVS-SS
exposes results.posterior.tau for direct inspection.
Assumptions (Xu & Zhou 2025)#
The paper proves high-dimensional strong selection consistency – the posterior probability of the true active-donor model converging to 1 under the true DGP – in Theorem 2, under the technical conditions A1-A5 (Section 4.1). Stated for the working \(\tau = 0\) (hard-simplex) limit:
A1 (Restricted eigenvalue on the donor matrix). Each column \(X_j\) of the donor matrix satisfies \(\| X_j \|_2^2 = T_0\), and there exists \(\underline\lambda \in (0, 1]\) such that \(\lambda_{\min}(X_\gamma^\top X_\gamma) \ge T_0 \underline\lambda\) for every candidate model \(\gamma\) in the sparse model space \(\mathbb{S}_L\). Translation: no two donors are perfectly collinear, no donor is a near-duplicate of a linear combination of a few others, and the minimum eigenvalue of every “reasonable-size” donor submatrix is bounded away from zero.
A2 (Inclusion-prior penalty). The Bernoulli inclusion probability satisfies \(\theta / (1 - \theta) = N^{-c_\theta L}\) for some universal \(c_\theta > 0\). Translation: the prior penalises large models geometrically in the donor count – a default \(\theta \approx 1/N\) keeps the prior expected model size around 1.
A3 (Noise-precision prior). The Gamma prior on \(\phi\) has shape \(\kappa_1 \in (0, T_0]\) and rate \(\kappa_2 \in [0, \sigma^2 T_0 / 2]\). Translation: the prior on the inverse error variance is not pathologically informative (neither shape nor rate scale faster than \(T_0\)).
A4 (True DGP + signal strength / \(\beta\)-min).** The true outcome is \(Y \mid X \sim \mathcal{N}(X_{\gamma^*} \mu_{\gamma^*}^*, \sigma^2 I)\) with \(\ell^* := |\gamma^*| \le L \wedge \sqrt{L \log N}\), \(\mu_{\gamma^*}^* \in \Delta^{\ell^* - 1}\), and a lower bound on the smallest non-zero weight,
Translation: the true donor pool is sparse, lies on the simplex, and every truly-active donor carries weight detectable above the noise floor. The \(\beta\)-min lower bound is the standard “signal large enough to identify” condition shared with all high-dimensional consistency theory (Yang et al. 2016).
A5 (Sample-size lower bound). \(L \ge 3\) and \(T_0 \ge c_M \ell^* \log N\) for some \(c_M > 0\). Translation: the pre-period must grow with the (log of) the donor pool size for selection consistency to take hold.
Theorem 2 (Xu & Zhou 2025). Under A1-A5, the posterior inclusion probability of the true model satisfies \(p(\gamma^* \mid y) \xrightarrow{p^*} 1\) as \(T_0, N \to \infty\) along any allowed sequence. Theorem 3 then bounds the posterior expected predictive loss on the test (post-treatment) sample at the order \(T_1 \ell^* \log N / T_0\), which is the “oracle” rate for the constrained problem – i.e., BVS-SS is asymptotically as efficient as if an oracle had told you the true active set in advance.
For the soft-simplex limit \(\tau \to \infty\), Theorem 4 shows the BVS-SS posterior of \(\gamma\) converges to the unconstrained spike-and-slab posterior. The two regimes are genuine endpoints of the same family; the data picks between them through the learned posterior of \(\tau\).
When the assumptions bind: practical diagnostics#
Donor near-collinearity (A1). If two donors carry essentially the same pre-period information, the minimum eigenvalue of \(X_\gamma^\top X_\gamma\) is near zero on models that include both – the restricted-eigenvalue condition fails and the spike-and-slab will flip between them without ever converging on a single representative.
Plausibly violated when the donor pool contains near-clones (two product categories whose time series differ only by sampling noise; two states with essentially identical industry mix). Diagnostic: inspect
results.inclusion_probs– if two donors each show \(P(\text{included}) \approx 0.4\) with all others below 0.1, the model is splitting credit between near-duplicates. Drop one or merge them before refitting.Prior penalty scale (A2). A too-large \(\theta\) floods the posterior with large-model mass; a too-small one blocks even truly-active donors.
Plausibly violated when the default \(\theta\) was chosen blindly. Diagnostic: track the posterior model size
results.posterior.gamma.sum(axis=0); if its posterior mean is essentially the prior mean \(N \theta\), the data is not informing model size and you should tighten \(\theta\). The paper’s empirical applications use \(\theta = 0.2\).Sparse, simplex-supported truth (A4). Selection consistency requires that the true weight vector is sparse and on the simplex. If the truth is genuinely dense (many donors each contributing a small fraction) or genuinely off-simplex (weights summing to substantially more or less than 1), the consistency story does not apply – though the procedure still produces a well-defined posterior.
Plausibly violated when the treated unit is structurally outside the donor convex hull (Hong Kong handover; very extreme growth unit). Diagnostic: the posterior of \(\tau\) is your friend here – if
results.posterior.tau.mean()is materially above zero, the data is telling you the simplex does not hold and you should reach for the unconstrained-spike-and-slab limit (Kim et al. 2020) or for an estimator that explicitly handles outside-hull treated units (Imperfect Synthetic Controls (ISCM)).Long-enough pre-period (A5). Selection consistency requires \(T_0 \gtrsim \ell^* \log N\). With 20 pre-period observations and 100 donors, you need at most \(\ell^* \le 20 / \log 100 \approx 4\) truly-active donors for the theory to apply – realistic in practice but a binding constraint at small \(T_0\).
Plausibly violated when the pre-period is short and the donor pool is wide. Diagnostic: monitor MCMC mixing diagnostics on the posterior model size; if the posterior of \(|\gamma|\) is unstable across chains (different seeds select markedly different active sets), the sample-size-vs-donor-count regime is too tight for stable selection. Either lengthen the pre-period (aggregate to a finer time grid) or pre-screen donors using domain knowledge.
Gaussian likelihood with shock independence. The model assumes \(Y_t \sim \mathcal{N}(X_t w, \phi^{-1})\) with iid errors. Strong autocorrelation in the pre-period residuals or heavy tails breaks the variance estimate that feeds into both \(\phi\) and the \(\beta\)-min threshold.
Plausibly violated when the outcome has unit-root-like persistence or heavy outliers. Diagnostic: ADF / KPSS on the pre-period residual of the OLS-on-included-donors fit; a non-stationary residual flags this. First-difference the outcome and donors before refitting, or move to a cycle-decomposing estimator (Synthetic Business Cycle (SBC)) before BVS-SS.
Posterior of \(\tau\) as the soft-simplex test. The single diagnostic that summarises the simplex-vs-unconstrained decision: small posterior mass of \(\tau\) near zero means the data agrees with the simplex; bulk above the prior mean means the data prefers the relaxation.
Practical rule of thumb: compare
results.posterior.tauagainst the prior mean \(a_1 / a_2\). If the posterior is concentrated well below the prior mean, the simplex is supported. If the posterior sits at or above the prior mean – the China-watches case in the empirical application below has posterior mean \(\tau \approx 0.03\) against a prior mean of \(0.1\), supporting the simplex – the data is consistent with the constraint. The Hong Kong handover case (Hsiao 2012, motivation in Section 1.1 of the paper) shows the opposite: posterior \(\tau\) mass elevated above the prior, telling the user to relax the simplex.
When to use BVSS – and when not to#
Reach for BVSS when:
The donor pool is large relative to the pre-period (\(N \gtrsim T_0\)). The classical SCM quadratic program has no unique solution; Lasso over-selects; BVSS’s spike-and-slab cleanly returns a sparse posterior on inclusion.
The simplex constraint itself is in question. Hong Kong handover, Basque conflict, China anti-corruption watches – cases where the treated unit’s level may legitimately exceed any convex combination of donors. BVSS’s learned \(\tau\) tells you which regime the data prefers.
You want posterior distributions, not point estimates. The full posterior of the ATT, of each donor’s weight, and of inclusion are returned – useful for downstream uncertainty propagation (e.g. into a policy-evaluation report’s CI reporting).
You want to formalise the “which donors do I trust” decision. Posterior inclusion probabilities replace the practitioner’s eyeball “I’ll keep these states because they look similar” step with a probability statement.
Do not use BVSS when:
Small donor pool with clear pre-fit. With \(N = 8\) states and a tight pre-fit on the canonical SCM, BVSS’s per-pair Gibbs sampler is overkill; the spike-and-slab uncertainty just propagates noise that the data does not actually contain. Use canonical SCM, Two-Step Synthetic Control, or Forward Difference-in-Differences (FDID).
Treated unit is severely outside the donor hull. BVSS’s soft simplex can relax (\(\tau\) learns), but the posterior weights are still anchored to the simplex prior. For the structural outside-hull case, Imperfect Synthetic Controls (ISCM)’s moment- condition framework is identification-aware in a way BVSS is not.
Distribution of the outcome is the object of interest. BVSS targets the mean ATT through a Gaussian likelihood on levels. Distributional questions (Lorenz curves, QTEs, inequality measures) need Distributional Synthetic Control (DSC).
You need speed. MCMC is materially slower than optimisation-based SC. The China-watches application takes ~10 minutes for 1000 iterations on commodity hardware; for large grid searches or batch processing across many policy-evaluation cells, an optimisation-based estimator (canonical SCM, Two-Step Synthetic Control, Forward Difference-in-Differences (FDID)) is the right default.
Outcomes with hard floors or ceilings, counts, or heavy tails. A4’s Gaussian-likelihood / \(\beta\)-min argument breaks. Pre-process (log, difference, Winsorise) before BVSS, or move to a discrete-outcome estimator.
Strong serial correlation in pre-period residuals. The iid-error assumption inflates \(\phi\)’s posterior precision and the inclusion threshold; weights become artificially sharp. First-difference the panel before feeding into BVSS, or use a cycle-decomposing estimator (Synthetic Business Cycle (SBC)).
Treated unit’s outcome is non-stationary. A5 governs the pre-period sample size; the consistency story does not cover unit-root outcomes. First-difference, or move to a stationary-cycle approach.
Continuous or multi-valued treatment. BVSS encodes binary treatment via a single treated-unit indicator. Continuous dose belongs in Continuous-Treatment Synthetic Control (CTSC).
You need a single sparse interpretable weight vector for policy storytelling. BVSS returns posterior expected weights – a Bayesian model average. If the goal is the canonical “California = 0.385 Utah + 0.271 Montana + 0.186 Nevada + …” story, run canonical SCM alongside and report both. The inclusion probabilities from BVSS sit at a different rhetorical altitude.
Core API#
Bayesian Variable Selection with Soft Simplex constraint (BVS-SS).
Implements:
Xu, Y., & Zhou, Q. (2025). “Bayesian Synthetic Control with a Soft Simplex Constraint.” arXiv:2503.06454.
BVS-SS layers a spike-and-slab prior on top of a soft simplex prior on
the donor weights, with the soft-constraint variance tau learned
from the data. Posterior samples are drawn by a Metropolis-within-
Gibbs sampler that jointly updates pairs of inclusion indicators and
the corresponding weights (Lemmas S1, S2 of the paper).
See mlsynth.utils.bvss_helpers for the algorithmic pieces.
- class mlsynth.estimators.bvss.BVSS(config: BVSSConfig | dict)#
Bases:
objectBayesian Synthetic Control with a soft simplex constraint.
- Parameters:
config (BVSSConfig or dict) – Configuration object. See
mlsynth.config_models.BVSSConfig.- Returns:
BVSSResults – Posterior samples of
\mu, \phi, \tau, the implied 0/1 inclusion indicators, the posterior-mean counterfactual with credible bands, and the ATT mean + credible interval.
- fit() BVSSResults#
Run the BVS-SS Gibbs sampler and assemble results.
Configuration#
- class mlsynth.config_models.BVSSConfig(*, df: ~pandas.DataFrame, outcome: str, treat: str, unitid: str, time: str, display_graphs: bool = False, save: bool | str = False, counterfactual_color: ~typing.List[str] = <factory>, treated_color: str = 'black', n_iter: ~typing.Annotated[int, ~annotated_types.Ge(ge=10)] = 2000, burn_in: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 1000, kappa1: ~typing.Annotated[float, ~annotated_types.Gt(gt=0)] = 1.0, kappa2: ~typing.Annotated[float, ~annotated_types.Gt(gt=0)] = 1.0, theta: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.25, tau_a: ~typing.Annotated[float, ~annotated_types.Gt(gt=0)] = 0.01, tau_b: ~typing.Annotated[float, ~annotated_types.Gt(gt=0)] = 0.1, n_tau: ~typing.Annotated[int, ~annotated_types.Ge(ge=1)] = 11, tau_min: ~typing.Annotated[float, ~annotated_types.Gt(gt=0)] = 1e-06, ci_alpha: ~typing.Annotated[float, ~annotated_types.Gt(gt=0.0), ~annotated_types.Lt(lt=1.0)] = 0.05, init_phi: ~typing.Annotated[float, ~annotated_types.Gt(gt=0)] = 0.8, init_tau: ~typing.Annotated[float, ~annotated_types.Gt(gt=0)] = 1.0, init_mu: ~typing.List[float] | None = None, verbose: bool = False, seed: int | None = None)#
Configuration for the Bayesian Synthetic Control with Soft Simplex (BVS-SS).
Implements:
Xu, Y., & Zhou, Q. (2025). “Bayesian Synthetic Control with a Soft Simplex Constraint.” arXiv:2503.06454.
- Parameters:
n_iter (int) – Total Gibbs iterations (including burn-in).
burn_in (int) – Number of warm-up iterations discarded before reporting.
kappa1, kappa2 (float) – Gamma hyperparameters for the prior on the observation precision phi. Paper Section 5.1 uses kappa1=kappa2=1.
theta (float) – Bernoulli prior inclusion probability per donor. Paper’s empirical Section 6 uses theta in 0.2–0.25.
tau_a, tau_b (float) – Gamma-prior shape (a1) and rate (a2) for the soft-constraint variance tau. Paper Section 5.1 uses (0.01, 0.1).
n_tau (int) – Number of MH steps for tau per outer iteration.
tau_min (float) – Numerical floor on tau; proposals below are reflected.
ci_alpha (float) – Two-sided significance level for credible intervals (default 0.05 gives 95% bands).
init_phi, init_tau (float) – Initial values for phi and tau.
init_mu (list, optional) – Initial weight vector of length N. Defaults to the uniform simplex mu_i = 1 / N.
display_graphs (bool) – Display the observed-vs-counterfactual plot after the fit.
verbose (bool) – Show a tqdm progress bar during MCMC.
seed (int, optional) – Seed for the numpy.random.Generator used inside the sampler.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Helper Modules#
Data preparation for BVS-SS.
Wraps mlsynth.utils.datautils.dataprep() and demeans the
pre-treatment block before passing it to the Gibbs sampler. The
post-treatment donor block is demeaned using the pre-treatment column
means so that out-of-sample counterfactual paths are computed in the
same centered coordinate system as the sampler.
- mlsynth.utils.bvss_helpers.setup.prepare_bvss_inputs(df: DataFrame, outcome: str, unitid: str, time: str, treat: str) BVSSInputs#
Pivot panel data and demean to match the BVS-SS sampler’s contract.
- Parameters:
df (pd.DataFrame) – Long balanced panel data.
outcome, unitid, time, treat (str) – Column names identifying the outcome, units, time periods, and the binary treatment indicator.
- Returns:
BVSSInputs
Math primitives for the BVS-SS Metropolis-within-Gibbs sampler.
Implements Eqs. (4) and (5) of Xu & Zhou (2025), “Bayesian Synthetic Control with a Soft Simplex Constraint” (arXiv:2503.06454), plus the log-determinant of the model-complexity factor \(A_\alpha(\gamma, \tau)\) when \(\alpha = 1\) (Lemma S2).
All functions are pure numpy and side-effect free.
- mlsynth.utils.bvss_helpers.posterior.AM(gamma_vec: ndarray, tau: float, theta: float, Gram: ndarray, N: int) float#
Log of the model-complexity factor for a given inclusion vector.
Computes the part of \(\log p(\gamma, \mu_\gamma)\) that depends on the structure of
gammafor the \(\alpha = 1\) case (Lemma S2 + Eq. (2) of the paper):\[\log A_1(\gamma, \tau) = |\gamma| \log \theta + (N - |\gamma|) \log (1 - \theta) + \log (|\gamma| - 1)! - \tfrac{1}{2} \log \det V_{\gamma, \tau}.\]The \(-\tfrac{|\gamma|}{2} \log \tau\) term is handled separately inside
loglike().
- mlsynth.utils.bvss_helpers.posterior.RSS(gamma_vec: ndarray, tau: float, z: ndarray, X: ndarray, Gram: ndarray) float#
Quadratic form
z^T \Sigma_{\gamma, \tau} z.Uses
\[\Sigma_{\gamma, \tau} = I - X_\gamma V_{\gamma, \tau}^{-1} X_\gamma^T,\]so that
\[z^T \Sigma_{\gamma, \tau} z = z^T z - (X_\gamma^T z)^T V^{-1} (X_\gamma^T z).\]
- mlsynth.utils.bvss_helpers.posterior.RSS2(gamma_vec: ndarray, tau: float, z1: ndarray, z2: ndarray, X: ndarray, Gram: ndarray) float#
Bilinear form
z_1^T \Sigma_{\gamma, \tau} z_2.The symmetric counterpart of
RSS(), used inside the pair Gibbs step to evaluate the cross term that determines \(\beta_{i,j}\) (Lemma S1 / Eq. S3 of the paper).
- mlsynth.utils.bvss_helpers.posterior.VM(gamma_vec: ndarray, tau: float, Gram: ndarray) ndarray#
Posterior covariance matrix for the selected predictors.
Implements Eq. (5):
\[V_{\gamma, \tau} = X_\gamma^T X_\gamma + \tau^{-1} I.\]- Parameters:
gamma_vec (np.ndarray) – Length-
Nbinary inclusion indicator.tau (float) – Prior precision parameter for the included coefficients.
Gram (np.ndarray) – Pre-computed Gram matrix
X^T X.
- Returns:
np.ndarray – Either
V_{\gamma, \tau}of shape(|gamma|, |gamma|)or the scalar fallback[[1/tau]]when no predictors are selected.
- mlsynth.utils.bvss_helpers.posterior.loglike(gamma_vec: ndarray, tau: float, mu: ndarray, phi: float, Y: ndarray, X: ndarray, Gram: ndarray) float#
Log of the marginal likelihood
p(y | \gamma, \mu_\gamma, \tau, \phi).Implements Eq. (4) of the paper:
\[\log p(y \mid \gamma, \mu_\gamma, \tau, \phi) = \tfrac{M}{2} \log \phi - \tfrac{|\gamma|}{2} \log \tau - \tfrac{1}{2} \log \det V_{\gamma, \tau} - \tfrac{\phi}{2} (y - X_\gamma \mu_\gamma)^T \Sigma_{\gamma, \tau} (y - X_\gamma \mu_\gamma).\]
Joint Gibbs update of (gamma_i, gamma_j, mu_i, mu_j) for one donor pair.
Implements the novel two-coordinate Gibbs move of Xu & Zhou (2025),
Section 3 / Lemmas S1-S2. Given the other entries of \mu fixed,
the four-case conditional distribution of the inclusion pair
\((\gamma_i, \gamma_j)\) is:
(0, 0): infeasible if
s = 1 - \sum_{k\neq i,j} \mu_k > 0.(1, 0): forces
\mu_i = s, \mu_j = 0.(0, 1): forces
\mu_i = 0, \mu_j = s.(1, 1): draws
\mu_i = ufrom a truncated normal \(N_{(0, s)}(\beta_{i,j}, (\phi \Lambda_{i,j})^{-1})\) and sets\mu_j = s - u.
When s = 0 the simplex constraint already pins
\mu_i = \mu_j = 0 and no draw is needed.
Metropolis-Hastings update for the soft-constraint variance tau.
Implements the log-random-walk proposal of Xu & Zhou (2025), Eq. (S1):
log tau^* = log tau + N(0, 1),
with a reflective barrier at log tau_min so the chain never drops
below a numerical floor. The acceptance probability comes from
- rho = min{1,
p(y | mu, tau^*, phi) p(tau^*) / [p(y | mu, tau, phi) p(tau)] * (tau^* / tau)},
where the Jacobian factor \tau^* / \tau arises from the log-RW
proposal. The (tau prior) is Gamma(a, b).
- mlsynth.utils.bvss_helpers.mh.MH_tau(tau: float, gamma_vec: ndarray, mu: ndarray, phi: float, Y: ndarray, X: ndarray, Gram: ndarray, a: float = 0.01, b: float = 0.1, tau_min: float = 1e-06, nrep: int = 11, rng: Generator | None = None) float#
Run
nrepMH steps for\tauand return the final draw.- Parameters:
tau (float) – Current value of tau.
gamma_vec (np.ndarray) – Current inclusion indicator.
mu (np.ndarray) – Current weight vector.
phi (float) – Current observation-noise precision.
Y, X, Gram (np.ndarray) – Response, predictor matrix, and Gram matrix.
a, b (float) – Gamma-prior shape and rate parameters for tau.
tau_min (float) – Numerical floor; proposals below are reflected.
nrep (int) – Number of MH steps per call (paper’s
n_\tau).rng (numpy.random.Generator, optional) – Used for both proposal noise and accept/reject draws. Falls back to
numpy.randomwhenNone.
- Returns:
float – tau value after the final (
nrep-th) step.
Outer Metropolis-within-Gibbs loop for BVS-SS.
Implements Algorithm 1 of Xu & Zhou (2025), arXiv:2503.06454. The sweep order per outer iteration:
For every donor pair
(i, j)withi < j: jointly update(\gamma_i, \gamma_j, \mu_i, \mu_j)via the closed-form four-case Gibbs draw (Lemma S2 + Lemma S1).Sample
\phifrom its full conditionalGamma((M+\kappa_1)/2, (\kappa_2 + RSS) / 2)(Eq. (6)).Run
n_\tauMH steps for\tauagainst its target conditional (Eq. (S1)).
The w Gibbs step (Eq. (7)) is not drawn in this implementation —
counterfactuals are computed directly from \mu samples, matching
the reference script’s behavior and reducing posterior variance.
- mlsynth.utils.bvss_helpers.sampler.gibbs_BVS(Y: ndarray, X: ndarray, Gram: ndarray, M: int, N: int, size: int, kappa1: float = 1.0, kappa2: float = 1.0, theta: float = 0.25, tau_min: float = 1e-06, a: float = 0.01, b: float = 0.1, n_tau: int = 11, init_mu: ndarray | None = None, init_phi: float = 0.8, init_tau: float = 1.0, verbose: bool = False, rng: Generator | None = None) dict#
Run
sizeMCMC iterations targeting the BVS-SS posterior.- Parameters:
Y (np.ndarray) – Demeaned outcome vector of length
M.X (np.ndarray) – Demeaned predictor matrix of shape
(M, N).Gram (np.ndarray) –
X.T @ X.M (int) – Number of observations (pre-treatment time periods).
N (int) – Number of donor predictors.
size (int) – Total number of MCMC iterations (including burn-in).
kappa1, kappa2 (float) – Gamma hyperparameters for the prior on
\phi.theta (float) – Prior inclusion probability for each predictor.
tau_min (float) – Numerical floor for
\tau.a, b (float) – Gamma-prior shape / rate for
\tau.n_tau (int) – Number of MH steps for
\tauper outer iteration.init_mu (np.ndarray, optional) – Initial weight vector. Defaults to the uniform-simplex
\mu = 1 / N.init_phi, init_tau (float) – Initial values for the precision parameters.
verbose (bool) – Show a tqdm progress bar.
rng (numpy.random.Generator, optional) – Random number generator for reproducibility. Falls back to
numpy.randomand the globalscipy.statsstate whenNone.
- Returns:
dict – Mapping with keys:
musample:(N, size)posterior samples of\mu.phisample: length-sizeposterior samples of\phi.tausample: length-sizeposterior samples of\tau.gammasample:(N, size)0/1 inclusion indicators implied by\mu.
Counterfactual paths, credible bands, and ATT inference for BVS-SS.
Given posterior samples \mu^{(t)} from the Gibbs sampler, the
counterfactual treated outcome at every period is:
Y_hat^{(0)}_t = X_demean_t \mu^{(t)} + mean_Y,
where mean_Y is the pre-treatment mean of the treated outcome (the
sampler operates on demeaned data; we add the mean back to recover the
original outcome scale).
The post-treatment ATT for each draw is:
ATT^{(t)} = (1 / T_post) \sum_{t > T_0}
(Y^{(1)}_t - Y_hat^{(0), (t)}_t)
= (1 / T_post) \sum_{t > T_0}
((Y^{(1)}_t - mean_Y) - X_demean_t \mu^{(t)}).
- mlsynth.utils.bvss_helpers.inference.compute_inference(inputs: BVSSInputs, mu_samples: ndarray, ci_alpha: float = 0.05) BVSSInference#
Assemble
BVSSInferencefrom posterior\musamples.- Parameters:
inputs (BVSSInputs) – Demeaned panel inputs.
mu_samples (np.ndarray) – Shape
(N, n_samples)posterior samples of\muafter burn-in.ci_alpha (float) – Two-sided significance level. Credible bands are at the
ci_alpha / 2and1 - ci_alpha / 2percentiles.
Observed vs counterfactual plot for BVS-SS, with posterior CI band.
- mlsynth.utils.bvss_helpers.plotter.plot_bvss(results: BVSSResults, title: str | None = None) None#
Plot observed series, BVS-SS counterfactual, and pointwise CI band.
Mirrors the layout the BVS-SS paper uses in its empirical figures: one continuous black series for the observed outcome, a dashed blue posterior-mean counterfactual, a shaded CI band, and a vertical line at
T_0.
Structured containers for the BVS-SS estimator.
Implements containers for the BVS-SS pipeline of Xu & Zhou (2025), arXiv:2503.06454.
- class mlsynth.utils.bvss_helpers.structures.BVSSInference(att_mean: float, att_ci_lower: float, att_ci_upper: float, att_samples: ndarray, ci_alpha: float, counterfactual_mean: ndarray, counterfactual_lower: ndarray, counterfactual_upper: ndarray)#
Point estimate plus credible interval for the ATT.
- Parameters:
att_mean (float) – Posterior mean ATT over the post-treatment horizon.
np.nanwhen no post window exists.att_ci_lower, att_ci_upper (float) – Credible interval bounds at level
1 - ci_alpha.att_samples (np.ndarray) – Length-
n_samplesper-MCMC-draw ATT.ci_alpha (float) – Significance level used to build the interval.
counterfactual_mean (np.ndarray) – Length-
Tposterior-mean counterfactual (in original outcome units, not demeaned).counterfactual_lower, counterfactual_upper (np.ndarray) – Length-
Tpointwise credible bands.
- att_samples: ndarray#
- counterfactual_lower: ndarray#
- counterfactual_mean: ndarray#
- counterfactual_upper: ndarray#
- class mlsynth.utils.bvss_helpers.structures.BVSSInputs(Y_pre_demean: ndarray, X_pre_demean: ndarray, X_post_demean: ndarray | None, Gram: ndarray, mean_Y: float, mean_X: ndarray, T0: int, T: int, N: int, treated_unit_name: str, donor_names: Sequence, time_labels: ndarray, y_target: ndarray)#
Demeaned panel data fed into the Gibbs sampler.
- Parameters:
Y_pre_demean (np.ndarray) – Length-
T0demeaned treated pre-treatment outcome.X_pre_demean (np.ndarray) – Shape
(T0, N)demeaned donor matrix over the pre-treatment window.X_post_demean (np.ndarray or None) – Shape
(T_post, N)demeaned donor matrix over the post window (uses the pre-treatment column means).Noneif there is no post period.Gram (np.ndarray) – Pre-computed
X_pre_demean.T @ X_pre_demean.mean_Y (float) – Pre-treatment mean of the treated outcome, used to undo the demeaning when forming counterfactual paths.
mean_X (np.ndarray) – Length-
Nper-donor pre-treatment means.T0 (int) – Number of pre-treatment periods.
T (int) – Total number of periods.
N (int) – Number of donor units.
treated_unit_name (str)
donor_names (Sequence)
time_labels (np.ndarray)
y_target (np.ndarray) – Original (un-demeaned) treated outcome over all
Tperiods.
- Gram: ndarray#
- X_pre_demean: ndarray#
- Y_pre_demean: ndarray#
- mean_X: ndarray#
- time_labels: ndarray#
- y_target: ndarray#
- class mlsynth.utils.bvss_helpers.structures.BVSSPosterior(mu: ndarray, phi: ndarray, tau: ndarray, gamma: ndarray, burn_in: int, n_iter: int)#
MCMC samples drawn by the BVS-SS Gibbs sampler.
- Parameters:
mu (np.ndarray) – Shape
(N, n_samples)posterior samples of\mu(after burn-in).phi (np.ndarray) – Length-
n_samplesposterior samples of\phi.tau (np.ndarray) – Length-
n_samplesposterior samples of\tau.gamma (np.ndarray) – Shape
(N, n_samples)0/1 inclusion indicators implied bymu.burn_in (int) – Number of warm-up iterations dropped before this slice.
n_iter (int) – Total iterations the chain ran for (including burn-in).
- gamma: ndarray#
- mu: ndarray#
- phi: ndarray#
- tau: ndarray#
- class mlsynth.utils.bvss_helpers.structures.BVSSResults(inputs: BVSSInputs, posterior: BVSSPosterior, inference: BVSSInference, inclusion_probs: dict, weight_means: dict)#
Public
BVSS.fit()return container.- Parameters:
inputs (BVSSInputs)
posterior (BVSSPosterior)
inference (BVSSInference)
inclusion_probs (dict) –
donor_label -> P(\gamma_i = 1 | y)posterior inclusion frequencies over the post burn-in samples.weight_means (dict) –
donor_label -> E[\mu_i | y]posterior mean weights.
- inference: BVSSInference#
- inputs: BVSSInputs#
- posterior: BVSSPosterior#
Example#
A minimal end-to-end run on the China anti-corruption / luxury-watch panel (the same outcome series the empirical replication below benchmarks against). With the default priors and sampler settings, the BVSS API is just five fields plus a display toggle:
import pandas as pd
from mlsynth import BVSS
url = "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/china_watches_long.csv"
data = pd.read_csv(url)
config = {
"df": data,
"outcome": "y",
"unitid": "unit",
"time": "time",
"treat": "treat",
"display_graphs": True,
}
results = BVSS(config).fit()
# ------------------------------------------------------------------
# Headline ATT
# ------------------------------------------------------------------
print(f"ATT posterior mean: {results.inference.att_mean:.4f}")
print(f"95% credible interval: "
f"[{results.inference.att_ci_lower:.4f}, "
f"{results.inference.att_ci_upper:.4f}]")
# ------------------------------------------------------------------
# Donor selection: posterior inclusion probabilities and weight means
# ------------------------------------------------------------------
for donor in sorted(results.weight_means,
key=lambda k: -results.weight_means[k])[:5]:
w_bar = results.weight_means[donor]
p_in = results.inclusion_probs[donor]
print(f" {donor:25s}: weight = {w_bar:.4f}, P(included) = {p_in:.3f}")
# ------------------------------------------------------------------
# Soft-simplex diagnostic: posterior of tau
# ------------------------------------------------------------------
print(f"\nPosterior mean of tau: {results.posterior.tau.mean():.4f}")
print(f"Posterior mean of phi: {results.posterior.phi.mean():.4f}")
# Small tau values => simplex is well-supported by the data.
# Large tau values => the data prefers a relaxation.
# ------------------------------------------------------------------
# Counterfactual path and per-period bands (in original outcome units)
# ------------------------------------------------------------------
results.inference.counterfactual_mean # shape (T,)
results.inference.counterfactual_lower # shape (T,)
results.inference.counterfactual_upper # shape (T,)
# ------------------------------------------------------------------
# Raw MCMC samples for downstream analysis (after burn-in)
# ------------------------------------------------------------------
results.posterior.mu # (N, n_post_samples)
results.posterior.phi # (n_post_samples,)
results.posterior.tau # (n_post_samples,)
results.posterior.gamma # (N, n_post_samples) 0/1 inclusion indicators
Verification#
Empirical replication against the authors’ published numbers (Path
A). Xu & Zhou’s Section 6.2 [BVSS] benchmarks BVS-SS on the China
anti-corruption case: the Eight-Point policy announced in January 2013
sharply depressed luxury-watch imports, an outcome the paper measures
through the monthly growth rate of the customs category “watches with
case of, or clad with, precious metal” (from the fdPDA R package
of Shi & Huang 2023). The donor pool is the remaining \(N = 87\)
HS commodity categories over Feb 2010 - Dec 2015. mlsynth.BVSS
on the same panel reproduces the paper’s Table 3 headline values
essentially exactly.
Path A: Shi & Huang (2023) luxury-watch imports#
The replication panel (1 treated category + 87 donor categories
\(\times\) 71 months, \(T_0 = 35\)) is hosted at
china_import_final.csv in
jgreathouse9/RepACCFDID.
mlsynth.BVSS with the paper’s published hyperparameters
(\(\kappa_1 = \kappa_2 = 1\), \(a_1 = 0.01\), \(a_2 =
0.1\), \(\alpha = 1\), \(\theta = 0.2\), 1000 iterations, 500
discarded as burn-in) reproduces the published Table 3 values:
import pandas as pd
import numpy as np
from mlsynth import BVSS
url = ("https://raw.githubusercontent.com/jgreathouse9/RepACCFDID/"
"refs/heads/main/china_import_final.csv")
raw = pd.read_csv(url).rename(columns={"Unnamed: 0": "yyyymm"})
T = len(raw)
T0 = int((raw["yyyymm"].astype(int) < 201301).sum()) # = 35
donor_cols = [c for c in raw.columns if c not in ("yyyymm", "treated")]
rows = [{"unit": "watches", "time": t, "y": float(raw["treated"].iloc[t]),
"treat": int(t >= T0)} for t in range(T)]
for c in donor_cols:
rows += [{"unit": c, "time": t, "y": float(raw[c].iloc[t]),
"treat": 0} for t in range(T)]
df = pd.DataFrame(rows)
res = BVSS({
"df": df, "outcome": "y", "treat": "treat",
"unitid": "unit", "time": "time",
"n_iter": 1000, "burn_in": 500,
"kappa1": 1.0, "kappa2": 1.0, "theta": 0.2,
"tau_a": 0.01, "tau_b": 0.1, "ci_alpha": 0.05,
"init_phi": 1.0, "init_tau": 1.0, "seed": 0,
"display_graphs": False,
}).fit()
tau = res.posterior.tau
phi = res.posterior.phi
gamma_count = res.posterior.gamma.sum(axis=0)
print(f"ATT = {res.inference.att_mean:+.3f} "
f"({res.inference.att_ci_lower:+.3f}, "
f"{res.inference.att_ci_upper:+.3f})")
print(f"tau = {tau.mean():.3f}")
print(f"phi = {phi.mean():.2f}")
print(f"|gamma|= {gamma_count.mean():.2f}")
prints (one chain at seed=0, after ~10 minutes on commodity
hardware):
Statistic |
Published (Xu & Zhou 2025, Table 3) |
Replicated here |
|---|---|---|
ATT (mean) |
\(-0.021\) |
\(-0.020\) |
ATT (95% credible interval) |
\((-0.032,\ -0.008)\) |
\((-0.033,\ -0.003)\) |
\(\phi\) (posterior mean) |
\(20.86\) |
\(19.94\) |
\(\phi\) (95% CI) |
\((12.22,\ 32.76)\) |
\((11.41,\ 31.68)\) |
\(\tau\) (posterior mean) |
\(0.069\) |
\(0.030\) |
\(|\gamma|\) (posterior mean) |
\(5.09\) |
\(8.89\) |
The headline ATT matches to three decimals (\(\widehat{\mathrm{ATT}} = -0.020\) here vs. \(-0.021\) in the paper) and the 95% credible interval lines up closely with the published \((-0.032,\ -0.008)\). The observation-noise precision \(\phi\) is reproduced essentially exactly (\(19.94\) vs. \(20.86\), well within the published interval). The mild discrepancies in \(\tau\) and \(|\gamma|\) are MCMC-seed sensitive: a different RNG draw shifts the posterior of the soft-simplex tightness parameter without moving the headline ATT. The ordering – “the data supports the simplex constraint (\(\tau\) near zero) and the relevant donor set is small (a handful of commodity categories)” – carries through to the same substantive conclusion the paper reports: a statistically meaningful negative ATT on luxury-watch imports after the anti-corruption announcement, with posterior credibility that excludes zero.
Dependencies#
By design the BVS-SS implementation in mlsynth avoids the
heavier probabilistic-programming stack. The sampler depends only on
numpy and the following modules from scipy:
scipy.linalg—solveanddetscipy.special—factorialscipy.stats—norm,truncnorm,gamma
The optional tqdm progress bar is imported lazily and only when
verbose=True is passed.