Welcome to mlsynth 0.1.2

Synthetic control, for everyone.

mlsynth is an open-source Python toolbox of synthetic-control methods for program evaluation. It implements the classical Abadie-Diamond-Hainmueller estimator alongside a growing catalogue of modern variants – Bayesian spike-and-slab selection, state-space modelling, instrumental variants, sequential difference-in-differences, matrix completion – under a single long-DataFrame API. Every estimator’s documentation page includes a Verification section that reproduces the original paper’s reported numbers where applicable.

For example, the following code replicates Abadie, Diamond and Hainmueller’s Proposition 99 study end-to-end. It loads the panel shipped with the library, fits TSSC (which auto-selects between four SC-class variants based on a pre-trends test), and prints the recommended ATT with a 95% subsampling confidence interval:

import pandas as pd
from mlsynth import TSSC

# Long panel: 50 US states x 31 years of per-capita cigarette sales.
url = ("https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
       "main/basedata/prop99_packsales.csv")
df = pd.read_csv(url)
df["treat"] = ((df["state"] == "California")
                & (df["year"] >= 1989)).astype(int)

res = TSSC({"df": df, "outcome": "cigsale", "unitid": "state",
             "time": "year", "treat": "treat",
             "display_graphs": False, "seed": 0}).fit()

print(f"recommended: {res.recommended_method}")
print(f"ATT = {res.att:+.2f} packs/yr  "
       f"(95% CI: {res.att_ci[0]:+.2f}, {res.att_ci[1]:+.2f})")

prints:

recommended: SC
ATT = -14.95 packs/yr  (95% CI: -16.06, -9.65)
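The script above relies on a long-format panel (one row per unit and period) with a 0/1 treatment column. As a minimal sketch of how that indicator can be built and sanity-checked before fitting, the snippet below uses a small made-up panel rather than the shipped Prop 99 data. The helper `summarize_treatment` and the column names `unit` and `y` are hypothetical and are not part of mlsynth.

```python
import pandas as pd

# Hypothetical toy panel (not the shipped Prop 99 data): 3 units x 4 years.
df = pd.DataFrame({
    "unit": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "year": [2000, 2001, 2002, 2003] * 3,
    "y": [10.0, 11.0, 9.0, 8.0, 10.5, 11.2, 11.9, 12.4, 9.8, 10.1, 10.9, 11.5],
})

# Same construction as in the script above: 1 for the treated unit from the
# first treated period onward, 0 everywhere else.
df["treat"] = ((df["unit"] == "A") & (df["year"] >= 2002)).astype(int)


def summarize_treatment(panel, unitid, time, treat):
    """Illustrative helper (not an mlsynth function): report the treated units,
    the first treated period, and the number of pre-treatment periods."""
    treated_rows = panel[panel[treat] == 1]
    treated_units = sorted(treated_rows[unitid].unique())
    first_treated = treated_rows[time].min()
    n_pre = panel.loc[panel[time] < first_treated, time].nunique()
    # Treatment should be absorbing: once switched on, it stays on for that unit.
    ordered = panel.sort_values([unitid, time])
    steps = ordered.groupby(unitid)[treat].diff().fillna(0)
    absorbing = bool((steps >= 0).all())
    return {"treated_units": treated_units, "first_treated": int(first_treated),
            "n_pre": int(n_pre), "absorbing": absorbing}


summary = summarize_treatment(df, "unit", "year", "treat")
print(summary)
```

A check like this catches a mistyped unit name or cutoff year, either of which would otherwise produce an all-zero or misaligned treatment column.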

This short script is a representative example of what mlsynth can do. In addition to classical SC, mlsynth also supports Bayesian variable selection (Bayesian Synthetic Control with a Soft Simplex Constraint (BVS-SS)), staggered-adoption sequential difference-in-differences (Sequential Synthetic Difference-in-Differences (Sequential SDiD), Spatial Synthetic Difference-in-Differences (SpSyDiD)), instrumental synthetic control (Synthetic IV), matrix completion under missingness (Matrix Completion with Nuclear Norm Minimization (MCNNM)), state-space time-aware control (Time-Aware Synthetic Control (TASC)), and clustered / robust high-dimensional variants (Cluster Synthetic Controls (CLUSTERSC), Multi-Level Synthetic Control (mlSC), Synthetic Controls for Experimental Design (MAREX)).

For a guided tour of the estimator catalogue, start with the About mlsynth page. Browse the Estimators sidebar for the full list grouped by methodology.

mlsynth builds on top of numpy, pandas, scipy, scikit-learn, cvxpy, pydantic, and statsmodels; convex programs are routed through cvxpy’s solver stack.

Not sure which estimator to use? Walk through A practitioner’s decision tree – a sequence of identification and design questions that funnels you from “what kind of problem do I have?” down to one or two methods, with the catalogue grouped by family.

Community.

The mlsynth community spans economists, statisticians, and data scientists who use synthetic-control methods for program evaluation across policy, marketing, sports, and public health. We welcome you to join us!

Development.

mlsynth is maintained by Jared Greathouse (Georgia State University). The project would not be possible without the kind efforts of and discussions with Jason Coupet, Kathy Li, Mani Bayani, Zhentao Shi, and Jaume Vives-i-Bastida, along with a growing list of contributors.

News.

The verification campaign now covers thirty-two of the thirty-six estimators in mlsynth. Each covered estimator is audited in one of three ways: against its source paper by reproducing an empirical table value on the authors’ own data (“Path A”), against its source paper by reproducing a Monte Carlo result from the paper’s simulation section (“Path B”), or against an authoritative reference implementation. See the Replications page for the full catalogue with headline numbers.