Welcome to mlsynth 1.0.0

Synthetic control, for everyone.

mlsynth is an open-source Python toolbox of synthetic-control methods for program evaluation. It implements the classical Abadie-Diamond-Hainmueller estimator alongside a growing catalogue of modern variants – Bayesian spike-and-slab selection, state-space modelling, instrumental-variable approaches, sequential difference-in-differences, matrix completion – under a single long-DataFrame API. Every estimator’s documentation page includes a Verification section that reproduces the original paper’s reported numbers where applicable.

For example, the following code reproduces Abadie, Diamond and Hainmueller’s Proposition 99 study and cross-checks it against a robust variant. It loads the California tobacco panel shipped with the library, fits the classical synthetic control on the canonical predictor set, then fits an ordinary-least-squares principal-component-regression control on the same outcomes, and prints both estimated effects:

import pandas as pd
from mlsynth import VanillaSC, CLUSTERSC

# California tobacco panel, 1970-2000: per-capita cigarette sales plus the
# Abadie-Diamond-Hainmueller predictors.
url = ("https://raw.githubusercontent.com/jgreathouse9/mlsynth/"
       "main/basedata/augmented_cali_long.csv")
df = pd.read_csv(url)
df = df[df["year"] <= 2000].copy()
df["treated"] = ((df["state"] == "California")
                 & (df["year"] >= 1989)).astype(int)

# Three lagged-outcome predictors, as in ADH (2010), Table 2.
for L in (1975, 1980, 1988):
    df[f"cig{L}"] = df["state"].map(
        df[df["year"] == L].set_index("state")["cigsale"])

covariates = ["loginc", "p_cig", "pct15-24", "pc_beer",
              "cig1975", "cig1980", "cig1988"]
windows = {"loginc": (1980, 1988), "p_cig": (1980, 1988),
           "pct15-24": (1980, 1988), "pc_beer": (1984, 1988),
           "cig1975": (1975, 1975), "cig1980": (1980, 1980),
           "cig1988": (1988, 1988)}

common = dict(df=df, outcome="cigsale", treat="treated",
              unitid="state", time="year", display_graphs=False)

# Classical synthetic control on the canonical predictor set.
sc = VanillaSC({**common, "covariates": covariates,
                "covariate_windows": windows,
                "backend": "mscmt", "canonical_v": "min.loss.w",
                "seed": 0}).fit()

# Robust principal-component-regression control on the same outcomes.
pcr = CLUSTERSC({**common, "method": "pcr",
                 "pcr_objective": "OLS"}).fit()

print(f"VanillaSC ATT = {sc.att:+.2f} packs/yr")
print(f"OLS-PCR   ATT = {pcr.att:+.2f} packs/yr")

prints:

VanillaSC ATT = -18.98 packs/yr
OLS-PCR   ATT = -21.39 packs/yr

The classical synthetic control reproduces ADH’s headline estimate – a drop of about nineteen packs per capita – and the principal-component control, built on a different identifying assumption, agrees on a large reduction. Fitting an estimator and then stress-testing the finding against a second one is the everyday way mlsynth is used.
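To make the idea behind the classical estimator concrete, here is a minimal sketch of outcome-only synthetic-control weights: non-negative donor weights that sum to one and minimise the pre-treatment fit. It is not mlsynth's implementation and not the VanillaSC API; it omits the predictor-weighting step entirely, the helper names `project_to_simplex` and `fit_sc_weights` are hypothetical, the solver is plain projected gradient descent, and the panel is made-up toy data.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_sc_weights(Y0_pre, y1_pre, n_iter=2000):
    """Minimise 0.5 * ||y1_pre - Y0_pre @ w||^2 over the simplex by
    projected gradient descent (hypothetical helper, not mlsynth API)."""
    n_donors = Y0_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)        # start from equal weights
    step = 1.0 / np.linalg.norm(Y0_pre, 2) ** 2  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = Y0_pre.T @ (Y0_pre @ w - y1_pre)
        w = project_to_simplex(w - step * grad)
    return w


# Toy panel (made-up, standardised outcomes): 30 periods, 5 donors.
# Treatment starts at period 20 and lowers the treated outcome by 15.
rng = np.random.default_rng(0)
T_pre, T_post, n_donors = 20, 10, 5
Y0 = rng.normal(size=(T_pre + T_post, n_donors))
true_w = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
y1 = Y0 @ true_w
y1[T_pre:] -= 15.0

w = fit_sc_weights(Y0[:T_pre], y1[:T_pre])
att = np.mean(y1[T_pre:] - Y0[T_pre:] @ w)  # mean post-period gap
print(np.round(w, 3), round(float(att), 2))
```

Because the toy treated unit is an exact convex combination of two donors, the fitted weights should recover that combination and the mean post-period gap should recover the built-in effect. Real panels rarely fit this cleanly, which is one reason to cross-check against a second estimator.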

This short script is a representative example of what mlsynth can do. Beyond classical SC, mlsynth supports Bayesian variable selection (Bayesian Synthetic Control with a Soft Simplex Constraint (BVS-SS)), staggered-adoption sequential difference-in-differences (Sequential Synthetic Difference-in-Differences (Sequential SDiD), Spatial Synthetic Difference-in-Differences (SpSyDiD)), instrumental synthetic control (Synthetic IV), matrix completion under missingness (Matrix Completion with Nuclear Norm Minimization (MCNNM)), state-space time-aware control (Time-Aware Synthetic Control (TASC)), and clustered / robust high-dimensional variants (Cluster Synthetic Controls (CLUSTERSC), Multi-Level Synthetic Control (mlSC), Synthetic Controls for Experimental Design (MAREX)).

For a guided tour of the estimator catalogue, start with the About mlsynth page. Browse the Estimators sidebar for the full list grouped by methodology.

mlsynth builds on top of numpy, pandas, scipy, scikit-learn, cvxpy, pydantic, and statsmodels; convex programs are routed through cvxpy’s solver stack.

Installation.

Install the latest version straight from GitHub:

pip install -U git+https://github.com/jgreathouse9/mlsynth.git

mlsynth runs on Python 3.9 and later. The base install carries every core dependency and runs every estimator except two that rely on heavier, specialised backends. Those backends ship as optional extras, so you install only the weight you use:

Extra	Adds	Needed for
`design`	`pyscipopt` (the SCIP mixed-integer solver)	the experimental-design estimators Synthetic Design (SYNDES) and Synthetic Controls for Experimental Design (MAREX), whose market-selection step is a mixed-integer quadratic program
`bayes`	`numpyro` (JAX-based MCMC)	Spillover-Detecting Synthetic Control (SPOTSYNTH)’s Bayesian synthetic-control mode
`all`	both of the above	the full feature set

Request an extra with bracket syntax, quoting the specifier so the shell does not glob the brackets:

# SCIP solver for SYNDES / MAREX
pip install -U "mlsynth[design] @ git+https://github.com/jgreathouse9/mlsynth.git"

# NumPyro for SPOTSYNTH's Bayesian mode
pip install -U "mlsynth[bayes] @ git+https://github.com/jgreathouse9/mlsynth.git"

# everything
pip install -U "mlsynth[all] @ git+https://github.com/jgreathouse9/mlsynth.git"

Both extra backends are imported lazily, so `import mlsynth` and importing any estimator class always succeed on the base install. The extra is consulted only when you actually run the design optimiser (SYNDES / MAREX) or SPOTSYNTH’s Bayesian path; without it, those calls raise a clear error naming the missing package. `pyscipopt` ships prebuilt wheels that bundle SCIP, so `mlsynth[design]` is normally a plain install with no separate solver setup. The test suite is a development artifact and is not shipped in the installed package – clone the repository to run `pytest`.
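The lazy-import behaviour can be pictured with a small, generic sketch. This is not mlsynth's actual code: the helper `require_optional` and the wording of the error message are assumptions made for illustration; only the backend and extra names are taken from the table of extras.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional backend on first use, or fail with a clear hint.

    Hypothetical helper illustrating the pattern; not part of mlsynth.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is not installed; "
            f"install the '{extra}' extra to use this feature."
        ) from exc


def run_design_step():
    """Stand-in for a code path that needs the SCIP backend."""
    # The import happens here, at call time, not when the module is loaded.
    scip = require_optional("pyscipopt", "design")
    return scip


# Defining run_design_step never touches pyscipopt; only calling it does.
stdlib_module = require_optional("json", "design")  # an installed module imports fine
print(stdlib_module.__name__)
```

The design choice is that the cost of a missing backend is paid only by the code path that needs it, so the base install stays importable.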

Not sure which estimator to use? Walk the practitioner’s decision tree – a sequence of identification and design questions that funnels you from “what kind of problem do I have?” down to one or two methods, with the catalogue grouped by family.

Community.

The mlsynth community spans economists, statisticians, and data scientists who use synthetic-control methods for program evaluation across policy, marketing, sports, and public health. We welcome you to join us!

To share feature requests and bug reports, use the issue tracker.
To follow development, watch the mlsynth repository on GitHub.

Development.

mlsynth is maintained by Jared Greathouse (Georgia State University). The project would not be possible without the kind efforts of and discussions with Jason Coupet, Kathy Li, Mani Bayani, Zhentao Shi, and Jaume Vives-i-Bastida, along with a growing list of contributors.

News.

The verification campaign now covers thirty-two of the thirty-six estimators in mlsynth. Each covered estimator is audited against its source paper, either by reproducing an empirical table value on the authors’ own data (“Path A”) or by reproducing a Monte Carlo from the paper’s simulation section (“Path B”), or else against an authoritative reference implementation. See the Replications page for the full catalogue with headline numbers.
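In spirit, a “Path A” check compares an estimate with a published figure up to a stated tolerance. The sketch below is only an illustration of that idea, not the project's verification harness: the helper `reproduces` and the tolerance of 0.5 packs are assumptions, while the two estimates and the “about nineteen packs” headline are the ones quoted earlier on this page.

```python
import math


def reproduces(estimate, reported, abs_tol):
    """True when an estimate matches a reported figure within abs_tol."""
    return math.isclose(estimate, reported, rel_tol=0.0, abs_tol=abs_tol)


# VanillaSC ATT from the example on this page, checked against the
# "about nineteen packs" headline with an illustrative tolerance.
print(reproduces(-18.98, -19.0, abs_tol=0.5))  # prints True

# The OLS-PCR estimate is a different estimator, so it is not expected
# to land inside the same tolerance band.
print(reproduces(-21.39, -19.0, abs_tol=0.5))  # prints False
```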