8  Synthetic Control with Prediction Intervals

8.1 Learning objectives

  1. Fit synthetic-control weights under four regularization regimes (simplex, lasso, ridge, OLS) from scpi’s single API. Running all four at once is how the reader checks whether their headline estimate is a constraint artifact rather than a data finding.
  2. Construct finite-sample prediction intervals by decomposing SC error into in-sample weight uncertainty and out-of-sample shocks. These are the first SC intervals in the book with proven frequentist coverage, and replacing chapter 4’s permutation p-value with a coverage statement is what makes the method publishable in econometrics venues.
  3. Set Bonferroni-adjusted alpha splits (u.alpha, e.alpha) so a stated joint coverage level is the actual joint coverage level. Without the split the union bound is violated and the interval’s nominal coverage is unachieved, even though the “95% PI” label still appears on the plot.
  4. Sweep confidence levels (80/90/95/99%) and report at which level the post-period outcome reliably exits the band. The sweep is the natural way to quantify evidence strength when a single conventional level is arbitrary.

8.2 Why a third synthetic-control chapter?

Chapter 4 fit the classical synthetic-control estimator with tidysynth: donor weights live on the simplex, and the inference story is the Fisher-exact placebo permutation test. Chapter 5 added a ridge outcome-model bias correction; chapter 6 ran DiD on a doubly de-meaned panel; chapter 7 produced posterior credible intervals (CrI) from a structural Bayesian time-series model — a probability statement about the parameter, conditional on the prior.

What is still missing is a frequentist uncertainty story for the synthetic-control estimator itself. The Fisher p-value from chapter 4 is a rank statistic — it tells us California is unusual relative to the donor pool, but it does not give us a prediction interval with finite-sample coverage guarantees. The scpi package of Cattaneo et al. (2021) and Cattaneo et al. (2025) closes that gap. It decomposes synthetic-control forecast error into two components — in-sample weight uncertainty and out-of-sample post-treatment shocks — and constructs a prediction interval (PI) \(\widehat{\text{PI}}_{1t}\) that covers the synthetic counterfactual \(\widehat{Y_{1t}(0)}\) with stated finite-sample frequentist coverage. The CrI of chapter 7 and the PI of this chapter answer different questions: the CrI is a posterior over the effect given a prior; the PI is a sampling-distribution band around the counterfactual prediction.

scpi also generalises the weight-estimation step. The same API fits the simplex (classical Abadie–Diamond–Hainmueller), lasso, ridge, and OLS constraints from a single function call, so we can ask a sharper question than chapter 4 could: is the headline ATT a consequence of the simplex constraint, or does it survive every reasonable choice of donor regularisation? We will write the resulting point estimate as \(\widehat{\tau}_{\text{scpi}}\) — adding a constraint subscript such as \(\widehat{\tau}_{\text{scpi-simplex}}\) when we need to distinguish the four constraint families — to keep it distinct from chapter 4’s \(\widehat{\tau}_{\text{SCM}}\), chapter 5’s \(\widehat{\tau}_{\text{ASCM}}\), chapter 6’s \(\widehat{\tau}_{\text{SDID}}\), and chapter 7’s \(\widehat{\tau}_{\text{BSTS}}\).

8.3 The framework

Let \(Y_{1t}\) be the treated unit’s outcome at time \(t\), and \(Y_{0t}\) the \(J\)-vector of donor outcomes. The synthetic-control predictor is

\[\widehat{Y_{1t}(0)} \,=\, \sum_{j} \widehat{w}_j \, Y_{jt} \,=\, \widehat{w}^\top Y_{0t},\]

with weights \(\widehat{w} = (\widehat{w}_1, \ldots, \widehat{w}_J)\) chosen on the pre-period under one of four constraint sets: simplex (\(\widehat{w}_j \ge 0\) and \(\sum_j \widehat{w}_j = 1\)), lasso (\(\|\widehat{w}\|_1 \le Q\)), ridge (\(\|\widehat{w}\|_2 \le Q\)), or OLS (unconstrained). The per-period treatment effect is \(\widehat{\tau}_t = Y_{1t} - \widehat{Y_{1t}(0)}\), and the ATT estimator is its post-period average \(\widehat{\tau}_{\text{scpi}} = T_{\text{post}}^{-1} \sum_{t > t^*} \widehat{\tau}_t\).

The error in estimating the counterfactual at any post-period \(t\) decomposes as

\[Y_{1t}(0) \;-\; \widehat{Y_{1t}(0)} \;=\; \underbrace{(w^* - \widehat{w})^\top Y_{0t}}_{\text{in-sample error}} \;+\; \underbrace{e_t}_{\text{out-of-sample error}}.\]

The first term, the in-sample component, comes from estimating the donor weights on a finite pre-period — sampling variability propagates into the counterfactual. The second term, \(e_t\), captures post-treatment shocks and structural drift that the pre-period cannot anticipate; it grows with the forecast horizon. The placebo test in chapter 4 sidesteps this decomposition entirely — it ranks California against permuted-treatment states without modelling either error source — and so produces a \(p\)-value rather than an interval. scpi quantifies the two error sources separately and reports a prediction interval \(\widehat{\text{PI}}_{1t} = [\widehat{Y_{1t}(0)} - L_t,\, \widehat{Y_{1t}(0)} + U_t]\) around the counterfactual prediction, with stated finite-sample coverage.

In chapter 4’s notation we would write the same predictor with the V-weighted predictor objective augmented by auxiliary covariates. This chapter omits the predictor matrix \(X\) entirely (see Setup) and so works directly with donor outcomes \(Y_{0t}\). A parallel frequentist-coverage literature builds prediction intervals via conformal inference (Chernozhukov, Wüthrich & Zhu 2021, JASA), which trades the parametric in-sample / out-of-sample decomposition for distribution-free quantile guarantees; see Further reading.

The substantive interpretation matters and is the first pitfall of the chapter: the prediction interval \(\widehat{\text{PI}}_{1t}\) bounds the synthetic counterfactual \(\widehat{Y_{1t}(0)}\), not the treatment effect \(\widehat{\tau}_t\) and not the observed series. Because \(Y_{1t}\) is observed (not random conditional on the pre-period), the band can be flipped about the observed series to produce a band for \(\widehat{\tau}_t\) — exercise 3 does this — but the primary inferential statement is geometric: the treatment effect is “statistically distinct from forecasting noise” when observed California sits outside \(\widehat{\text{PI}}_{1t}\), most commonly when it falls below the lower bound of the band built around synthetic California.

8.4 Setup and data

Code: Load packages, source table helpers, set seed, and configure ggplot theme.
library(tidyverse)
library(scpi)
source("R/table_helpers.R")

set.seed(42)

knitr::opts_chunk$set(dev.args = list(bg = "transparent"))

theme_set(
  theme_minimal(base_size = 12) +
    theme(
      plot.background  = element_rect(fill = "transparent", color = NA),
      panel.background = element_rect(fill = "transparent", color = NA),
      panel.grid.major = element_line(color = "#94a3b8", linewidth = 0.25),
      panel.grid.minor = element_line(color = "#94a3b8", linewidth = 0.15),
      text             = element_text(color = "#94a3b8"),
      axis.text        = element_text(color = "#94a3b8")
    )
)

The chapter reuses the Proposition 99 dataset that powered chapters 2–4 and 6–7: 39 US states, 1970–2000, with cigsale as the outcome (per-capita cigarette sales in packs). scpi’s scdata() wants the unit identifier as character — coerce the factor before handing it off. We use only the outcome series for matching (matching the configuration of the scpi_pkg Python illustration in Cattaneo et al. (2025)); the auxiliary covariates lnincome, beer, and age15to24 carry many NAs in the early pre-period and would force row deletions that bias the donor pool.

This is an intentional departure from chapter 4, which used lnincome, beer, and age15to24 as auxiliary predictors via tidysynth::generate_predictor(). The covariate-rich match in chapter 4 produces a tighter pre-period fit and a more negative simplex ATT — \(\widehat{\tau}_{\text{SCM}} \approx -18.85\); outcome-only matching here is more conservative, is more honest about the missing-data structure, and lands at a smaller \(\widehat{\tau}_{\text{scpi-simplex}}\) (≈ \(-11\), see below). We return to this comparison in the Recap.

Code: Load the Proposition 99 panel and build the donor state list.
prop99 <- read_rds("data/proposition99.rds") |>
  as_tibble() |>
  mutate(state = as.character(state))

donors <- setdiff(sort(unique(prop99$state)), "California")

8.5 Preparing the data for scpi

scdata() reshapes the long panel into the matrices A (treated pre-outcome), B (donor pre-outcomes), P (donor post-outcomes), and Y.pre / Y.post used downstream by scest() and scpi(). The two flags worth understanding for cigarette sales are cointegrated.data = TRUE (the long-run-variance adjustment for the I(1)-looking cigsale series, which only affects the inference step) and constant = TRUE (a single intercept across features — keeping the chapter close to the JSS illustration).

Code: Reshape the panel into scpi matrices via scdata() for California vs donor states.
data_prep <- scdata(
  df               = prop99,
  id.var           = "state",
  time.var         = "year",
  outcome.var      = "cigsale",
  period.pre       = 1970:1988,
  period.post      = 1989:2000,
  unit.tr          = "California",
  unit.co          = donors,
  cointegrated.data = TRUE,
  constant         = TRUE
)

data_prep carries 19 pre-period years and 12 post-period years, with \(J = 38\) donor states ready to be combined.

8.6 Point estimation across four weight constraints

scest() is the deterministic counterpart to scpi(): it solves for the donor weights under a chosen constraint and returns the synthetic counterfactual, but no prediction interval. Running it four times with w.constr = list(name = ...) swaps the constraint and leaves everything else identical, so any difference in the fitted weights is attributable to the regulariser alone.

Code: Fit scest point estimates under simplex, lasso, ridge, and OLS constraints.
constraints <- c("simplex", "lasso", "ridge", "ols")

scest_fits <- map(constraints, \(name) {
  scest(data_prep, w.constr = list(name = name))
}) |>
  set_names(constraints)

The four constraint sets correspond to four classical optimisation problems on the weight vector \(\widehat{w}\): simplex (\(\widehat{w}_j \ge 0\), \(\sum_j \widehat{w}_j = 1\)), lasso (\(\|\widehat{w}\|_1 \le Q\) with \(Q\) a tuning parameter, default \(Q = 1\) in scpi, no sign restriction), ridge (\(\|\widehat{w}\|_2 \le Q\) with \(Q\) chosen from the data as \((J + K)\widehat{\sigma}_u^2 / \|\widehat{w}_{\rm OLS}\|_2^2\)), and OLS (unconstrained least squares). Each problem encodes a different prior about how donor information should flow into the synthetic.

8.7 Donor weights side-by-side

Code: Tabulate the top-12 donor weights side-by-side across the four constraints.
weight_long <- map_dfr(constraints, \(name) {
  w <- scest_fits[[name]]$est.results$w
  tibble(
    constraint = name,
    state      = sub("^.*\\.", "", names(w)),
    weight     = as.numeric(w)
  )
})

weight_wide <- weight_long |>
  pivot_wider(names_from = constraint, values_from = weight, values_fill = 0)

weight_wide |>
  mutate(rank_score = pmax(abs(simplex), abs(lasso), abs(ridge), abs(ols))) |>
  arrange(desc(rank_score)) |>
  head(12) |>
  select(state, simplex, lasso, ridge, ols) |>
  gt_pretty(decimals = 3) |>
  gt::cols_label(
    state = "Donor state", simplex = "Simplex",
    lasso = "Lasso", ridge = "Ridge", ols = "OLS"
  )
Table 8.1: Donor weights from each of the four scpi weight-constraint families, for the 12 donors with largest maximum absolute weight across the four constraints.
Donor state Simplex Lasso Ridge OLS
Connecticut 0.266 0.065 0.154 0.169
Illinois 0.154 0.233 0.106 0.113
Nevada 0.228 0.204 0.141 0.137
Utah 0 0.009 0.11 0.17
Nebraska 0.093 0.161 0.111 0.122
Montana 0.081 0.078 0.143 0.155
West Virginia 0 0 0.143 0.136
Mississippi 0 −0.017 −0.132 −0.134
New Mexico 0 0 0.067 0.12
Tennessee 0 −0.069 −0.114 −0.115
Colorado 0.096 0.048 0.104 0.091
Kansas 0.014 0.031 0.103 0.092

Three readings emerge.

  1. Simplex is the sparsest. Four to five donors absorb essentially the entire weight; the rest are pinned to zero by the non-negativity and sum-to-one constraints. This is the chapter 4 picture.
  2. OLS and lasso admit negative weights. Once we drop the simplex, some donors enter with sign-flipped contributions that pull the synthetic in the opposite direction. Whether that is interpretable (a “shorted” donor) or artefactual (overfitting to noise on a 19-year pre-period) is a real modelling decision.
  3. Ridge is the smoothest. The L2 penalty spreads weight more evenly across the donor pool, producing a synthetic that depends on dozens of states rather than a handful.
Code: Plot a heatmap of donor weights across constraints for the top-12 donors.
top_states <- weight_wide |>
  mutate(rank_score = pmax(abs(simplex), abs(lasso), abs(ridge), abs(ols))) |>
  arrange(desc(rank_score)) |>
  head(12) |>
  pull(state)

weight_long |>
  filter(state %in% top_states) |>
  mutate(
    state      = factor(state, levels = rev(top_states)),
    constraint = factor(constraint, levels = constraints)
  ) |>
  ggplot(aes(x = constraint, y = state, fill = weight)) +
  geom_tile(color = "white") +
  geom_text(aes(label = sprintf("%.2f", weight)), size = 3,
            color = "#1f2937") +
  scale_fill_gradient2(low = "#d97757", mid = "white",
                       high = "#6a9bcc", midpoint = 0) +
  labs(x = NULL, y = NULL, fill = "Weight") +
  theme(legend.position = "right",
        panel.grid       = element_blank())
Figure 8.1: Donor weights across the four scpi constraints, displayed as a heatmap, for the 12 donors with largest maximum absolute weight across constraints. Rows are donors; columns are constraints. Red = negative weight, blue = positive.

8.8 Synthetic counterfactuals across constraints

The point of the weight comparison is not aesthetics — it is to ask whether the fitted synthetic counterfactual depends on the choice of constraint. The next plot overlays the four synthetic series on the observed California trajectory.

Code: Overlay observed California against the four synthetic counterfactual series.
years_pre  <- 1970:1988
years_post <- 1989:2000
years_all  <- c(years_pre, years_post)

extract_synth <- function(fit, name) {
  pre  <- as.numeric(fit$est.results$Y.pre.fit)
  post <- as.numeric(fit$est.results$Y.post.fit)
  tibble(
    year       = years_all,
    synthetic  = c(pre, post),
    constraint = name
  )
}

synth_long <- imap_dfr(scest_fits, \(fit, name) extract_synth(fit, name))

observed <- tibble(
  year     = years_all,
  observed = c(
    as.numeric(data_prep$Y.pre),
    as.numeric(data_prep$Y.post)
  )
)

ggplot() +
  geom_line(data = synth_long,
            aes(x = year, y = synthetic, color = constraint),
            linewidth = 0.8) +
  geom_line(data = observed,
            aes(x = year, y = observed),
            color = "#d97757", linewidth = 1) +
  geom_vline(xintercept = 1989, linetype = "dashed", color = "#94a3b8") +
  scale_color_manual(values = c(
    simplex = "#6a9bcc", lasso = "#00d4c8",
    ridge   = "#7A209F", ols   = "#94a3b8"
  )) +
  labs(x = NULL, y = "Per-capita cigarette sales (packs)",
       color = "Constraint")
Figure 8.2: Observed California (orange) and the four scpi synthetic counterfactuals (one per constraint). Pre-1989 the four synthetics are nearly indistinguishable; post-1989 they fan out but all sit visibly above observed California.

The pre-period overlap is near-perfect — every constraint can fit 19 years of California data to within a few packs. The post-period spread is the interesting part: simplex and lasso land in roughly the same place, ridge sits slightly higher, OLS sits slightly lower. The point estimate of the ATT is mildly sensitive to the constraint; the sign and order of magnitude are not.

Code: Summarise the average, minimum, and maximum post-period gap per constraint.
att_by_constraint <- imap_dfr(scest_fits, \(fit, name) {
  gap <- as.numeric(data_prep$Y.post) - as.numeric(fit$est.results$Y.post.fit)
  tibble(constraint = name, ATT = mean(gap),
         min_gap = min(gap), max_gap = max(gap))
})

att_by_constraint |>
  gt_pretty(decimals = 2) |>
  gt::cols_label(constraint = "Constraint", ATT = "Average post-period gap",
                 min_gap = "Min gap", max_gap = "Max gap")
Table 8.2: Average post-period gap (ATT) by weight constraint.
Constraint Average post-period gap Min gap Max gap
simplex −11.11 −18.65 −4.3
lasso −15.28 −23.89 −6.19
ridge −15.77 −23.22 −4.81
ols −14.24 −20.39 −5

All four estimators place the average post-period gap between roughly -15.8 and -11.1 packs/capita per year, with year-by-year gaps as deep as -23.9 in the late 1990s. The simplex sits at the conservative end of that range (\(\widehat{\tau}_{\text{scpi-simplex}} \approx -11.11\)), lasso and ridge sit at the negative end (\(\widehat{\tau}_{\text{scpi-lasso}} \approx -15.28\) and \(\widehat{\tau}_{\text{scpi-ridge}} \approx -15.77\)), and OLS lands in between (\(\widehat{\tau}_{\text{scpi-ols}} \approx -14.24\)). On a post-period baseline of \(\approx 60\) packs/capita per year, those gaps correspond to a 18–26% reduction. Whatever one believes about the simplex, Proposition 99 reduced California cigarette sales by an economically meaningful share of pre-policy consumption.

A reader paging back from chapter 4 will notice that \(\widehat{\tau}_{\text{scpi-simplex}}\) here (≈ -11.11) is materially smaller in magnitude than chapter 4’s simplex ATT (\(\widehat{\tau}_{\text{SCM}} \approx -18.85\)). Three things drive that gap: (i) chapter 4 used lnincome, beer, and age15to24 as auxiliary predictors via tidysynth::generate_predictor(), while this chapter matches on the outcome series alone (see Setup and data above); (ii) the two chapters use different solvers (tidysynth/Synth quadratic optimiser vs scpi’s constrained-least-squares routine); and (iii) scpi fits with constant = TRUE, adding a single intercept across features. Predictor-augmented matching extracts more pre-period signal and therefore produces a more negative ATT; outcome-only matching is the more conservative — and more honest — choice when the auxiliary covariates are themselves heavily missing on the donor side. The two numbers do not contradict each other; they answer slightly different questions about the same data.

8.9 Prediction intervals on the simplex fit

scpi() is the inference-bearing sibling of scest(). It rebuilds the same synthetic counterfactual and adds a prediction interval. The argument settings below — u.order = 1, u.lags = 0, e.order = 1, e.lags = 0, u.missp = TRUE, u.sigma = "HC1", sims = 200 — are the package defaults. We override e.method = "gaussian" (the package default is "all", which computes Gaussian, location-scale, and quantile-regression bounds simultaneously) because the Gaussian conditional subgaussian bound is the version proved in Cattaneo et al. (2021) and is the simplest to interpret for a single-treated-unit panel.

A note on confidence levels. Setting u.alpha = 0.025 and e.alpha = 0.025 yields a \(\widehat{\text{PI}}_{1t}\) that is jointly conservative at level \(1 - (u_\alpha + e_\alpha) = 95\%\) by a Bonferroni-style union bound: the in-sample weight error \((w^* - \widehat{w})^\top Y_{0t}\) and the out-of-sample shock \(e_t\) are each bounded at the \(1 - 0.025 = 97.5\%\) level, so the joint coverage is at least \(1 - 2 \times 0.025 = 95\%\). Equivalently, to obtain a joint \((1 - \alpha)\) band we split alpha as u.alpha = e.alpha = α/2. We follow this convention throughout the chapter so that “95% PI” really does mean “95% joint coverage”; the sensitivity analysis later in the chapter sweeps the joint level directly.

Code: Fit scpi with the simplex constraint and a 95% joint Gaussian prediction interval.
set.seed(42)
pi_simplex <- scpi(
  data_prep,
  w.constr   = list(name = "simplex"),
  u.order    = 1, u.lags = 0,
  e.order    = 1, e.lags = 0,
  e.method   = "gaussian",
  u.missp    = TRUE,
  u.sigma    = "HC1",
  e.alpha    = 0.025,
  u.alpha    = 0.025,
  sims       = 200
)

The simplex scpi object carries the same point-estimate slots as scest_fits$simplex plus an inference.results block with the Gaussian-method 95% joint \(\widehat{\text{PI}}_{1t}\) lower (CI.all.gaussian[, 1]) and upper (CI.all.gaussian[, 2]) bounds for each post-period year.

Code: Plot the simplex synthetic, observed California, and the 95% joint PI band.
ci_post <- pi_simplex$inference.results$CI.all.gaussian
synth_pre  <- as.numeric(pi_simplex$est.results$Y.pre.fit)
synth_post <- as.numeric(pi_simplex$est.results$Y.post.fit)

pi_df <- tibble(
  year      = years_all,
  observed  = c(as.numeric(data_prep$Y.pre), as.numeric(data_prep$Y.post)),
  synthetic = c(synth_pre, synth_post),
  ci_lo     = c(rep(NA_real_, length(years_pre)), ci_post[, 1]),
  ci_hi     = c(rep(NA_real_, length(years_pre)), ci_post[, 2])
)

ggplot(pi_df, aes(x = year)) +
  geom_ribbon(aes(ymin = ci_lo, ymax = ci_hi),
              fill = "#6a9bcc", alpha = 0.25) +
  geom_line(aes(y = synthetic), color = "#6a9bcc",
            linewidth = 0.9, linetype = "dashed") +
  geom_line(aes(y = observed), color = "#d97757", linewidth = 1) +
  geom_vline(xintercept = 1989, linetype = "dashed", color = "#94a3b8") +
  labs(x = NULL, y = "Per-capita cigarette sales (packs)")
Figure 8.3: Simplex synthetic-control fit (blue dashed) for California (orange solid) with the 95% joint scpi prediction interval (blue PI band). Observed California falls below the lower PI bound from roughly the mid-1990s onward — the inferential analogue of “effect statistically distinguishable from forecasting noise”.

The pre-period synthetic tracks observed California almost perfectly. After 1989 the PI band fans out as the out-of-sample forecast horizon lengthens — that is \(e_t\) at work. Observed California breaks below the lower bound by the early-to-mid 1990s and stays there. Under the SCPI framework that is the formal version of the statement chapter 4 made loosely with the placebo cloud: the post-1989 trajectory is not what synthetic California, plus its honest forecast error, would produce.

8.10 Prediction intervals across all four constraints

The substantive question — is the effect robust across constraints — also has a sharper inferential form. We can ask whether observed California falls below the lower PI bound under simplex, lasso, ridge, and OLS alike.

Code: Fit scpi with prediction intervals under all four weight constraints.
set.seed(42)
scpi_fits <- map(constraints, \(name) {
  scpi(
    data_prep,
    w.constr = list(name = name),
    u.order  = 1, u.lags = 0,
    e.order  = 1, e.lags = 0,
    e.method = "gaussian",
    u.missp  = TRUE,
    u.sigma  = "HC1",
    e.alpha  = 0.025, u.alpha = 0.025,
    sims     = 200
  )
}) |>
  set_names(constraints)
Code: Facet the synthetic fit and 95% PI band across all four constraints.
pi_all <- imap_dfr(scpi_fits, \(fit, name) {
  ci <- fit$inference.results$CI.all.gaussian
  tibble(
    year      = years_all,
    constraint = name,
    observed  = c(as.numeric(data_prep$Y.pre), as.numeric(data_prep$Y.post)),
    synthetic = c(as.numeric(fit$est.results$Y.pre.fit),
                  as.numeric(fit$est.results$Y.post.fit)),
    ci_lo     = c(rep(NA_real_, length(years_pre)), ci[, 1]),
    ci_hi     = c(rep(NA_real_, length(years_pre)), ci[, 2])
  )
}) |>
  mutate(constraint = factor(constraint, levels = constraints))

ggplot(pi_all, aes(x = year)) +
  geom_ribbon(aes(ymin = ci_lo, ymax = ci_hi),
              fill = "#6a9bcc", alpha = 0.25) +
  geom_line(aes(y = synthetic), color = "#6a9bcc",
            linewidth = 0.85, linetype = "dashed") +
  geom_line(aes(y = observed), color = "#d97757", linewidth = 0.95) +
  geom_vline(xintercept = 1989, linetype = "dashed", color = "#94a3b8") +
  facet_wrap(~ constraint, nrow = 2) +
  labs(x = NULL, y = "Per-capita cigarette sales (packs)")
Figure 8.4: Observed California (orange) versus the 95% joint prediction interval (blue PI band) and synthetic point estimate (blue dashed) under each scpi weight constraint. The lower-bound piercing in the late 1990s survives every constraint family.

The PI-band widths differ — OLS is the widest because unconstrained least squares squeezes the most variance out of the pre-period and pays for it with a fatter post-period interval, while simplex is the narrowest because the non-negativity-plus-sum-to-one constraint sharply regularises the donor weights. Yet observed California eventually slips below every constraint’s lower band. The inferential conclusion does not depend on the modelling choice that chapter 4 hard-coded.

8.11 Sensitivity to confidence level

The 95% band is a convention. A second robustness check is to sweep the joint confidence level \(1 - (u_\alpha + e_\alpha)\) across reasonable values and ask when the observed series finally re-enters the band. Wider bands (higher confidence) make the post-period evidence weaker; narrower bands (lower confidence) make it stronger. To keep the labels honest, we set u.alpha = e.alpha = α/2 so that the joint Bonferroni level is exactly \(1 - \alpha\) — i.e. a labelled “90% PI” really does have 90% joint coverage.

Code: Refit simplex scpi across joint confidence levels of 80, 90, 95, and 99 percent.
set.seed(42)
alpha_levels <- c(0.20, 0.10, 0.05, 0.01)

sens_fits <- map(alpha_levels, \(a) {
  scpi(
    data_prep,
    w.constr = list(name = "simplex"),
    u.order  = 1, u.lags = 0,
    e.order  = 1, e.lags = 0,
    e.method = "gaussian",
    u.missp  = TRUE,
    u.sigma  = "HC1",
    e.alpha  = a / 2, u.alpha = a / 2,
    sims     = 200
  )
}) |>
  set_names(paste0(100 * (1 - alpha_levels), "%"))
Code: Plot the nested PI bands at 80, 90, 95, and 99 percent confidence levels.
sens_df <- imap_dfr(sens_fits, \(fit, lvl) {
  ci <- fit$inference.results$CI.all.gaussian
  tibble(
    year  = years_post,
    level = lvl,
    ci_lo = ci[, 1],
    ci_hi = ci[, 2]
  )
}) |>
  mutate(level = factor(level, levels = paste0(100 * (1 - alpha_levels), "%")))

obs_df <- tibble(year = years_all,
                 observed = c(as.numeric(data_prep$Y.pre),
                              as.numeric(data_prep$Y.post)))
synth_df <- tibble(year = years_all,
                   synthetic = c(synth_pre, synth_post))

ggplot() +
  geom_ribbon(data = sens_df,
              aes(x = year, ymin = ci_lo, ymax = ci_hi, fill = level),
              alpha = 0.25) +
  geom_line(data = synth_df, aes(x = year, y = synthetic),
            color = "#6a9bcc", linetype = "dashed", linewidth = 0.85) +
  geom_line(data = obs_df, aes(x = year, y = observed),
            color = "#d97757", linewidth = 0.95) +
  geom_vline(xintercept = 1989, linetype = "dashed", color = "#94a3b8") +
  scale_fill_manual(values = c(
    "80%" = "#0b1d33", "90%" = "#1f4068",
    "95%" = "#6a9bcc", "99%" = "#cdd9eb"
  )) +
  labs(x = NULL, y = "Per-capita cigarette sales (packs)", fill = "PI level")
Figure 8.5: Sensitivity to the joint confidence level (simplex constraint). Nested bands at 80%, 90%, 95%, and 99%. Observed California (orange) sits below the lower bound at every level by the late 1990s.

The narrow 80% band is broken by the early 1990s; the widest 99% band still excludes observed California by the late 1990s. The post-1989 deviation is not a 95%-only artefact.

8.12 Common pitfalls

Reading \(\widehat{\text{PI}}_{1t}\) as a confidence interval for the treatment effect. An scpi prediction interval covers the synthetic counterfactual \(\widehat{Y_{1t}(0)}\) — it tells you the range of values the synthetic-California series could plausibly take if you re-ran the donor-weight estimator on a parallel pre-period draw. The treatment effect \(\widehat{\tau}_t = Y_{1t} - \widehat{Y_{1t}(0)}\) “matters” when observed California leaves the band: that is the inferential statement. Reading the band as a \(\pm\) on the effect size (the way one would read an OLS confidence interval on a regression coefficient) is wrong, and it is the most common misuse of the package in applied work. Exercise 3 shows the formal way to flip the band onto \(\widehat{\tau}_t\): subtract the band endpoints from \(Y_{1t}\) (with bounds swapped) rather than re-centring the band on \(\widehat{\tau}_t\).

Treating the constraint choice as innocuous. Simplex, lasso, ridge, and OLS are not interchangeable. In the run above \(\widehat{\tau}_{\text{scpi-simplex}}\) and \(\widehat{\tau}_{\text{scpi-ridge}}\) differ by roughly 5 packs (a 40% relative spread on the point estimate). The inferential conclusion happens to survive every constraint family here, but on a borderline-significant dataset it would not. Pick the constraint before you peek at the post-period and report the others as robustness checks; do not constraint-shop until a desired sign emerges. Relatedly, the lasso fits here use scpi’s default \(Q = 1\), which is what gives lasso and simplex such similar weight vectors (lasso = “L1 ball of radius 1”; simplex = “L1 ball of radius 1, restricted to the non-negative orthant”). Pushing \(Q\) above 1 lets lasso assign weight outside the simplex on either side and changes the picture. A serious lasso analysis would sweep \(Q\) over a grid (exercise 5); we keep \(Q = 1\) here for comparability with the simplex.

PIs are conditional on the donor pool. scpi quantifies in-sample and out-of-sample error given the 38 donor states that survived the data-quality screen. The interval makes no statement about whether the donor pool itself is reasonable — adding or removing a state, or shifting the pre-period window, can move both \(\widehat{\tau}_{\text{scpi}}\) and the band. The honest reading is that \(\widehat{\text{PI}}_{1t}\) is “frequentist coverage for the counterfactual implied by this donor pool”; sensitivity to pool composition is a separate exercise (placebos, leave-one-out, donor-restriction sweeps).

PI bands grow rapidly with horizon in a small-T regime. With only 19 pre-period years and 12 post-period years, the out-of-sample component \(e_t\) grows quickly: by 2000 the simplex band is several times wider than the 1990 band. A reader who looks only at the late-1990s years sees a large interval and may infer “weak evidence”; the inferential signal is concentrated in the first few post-period years, where the band is narrow and the gap is already opening. Always read the PI plot as a function of horizon, not as a single late-period interval.

8.13 Recap

Question Answer
What does this chapter add over chapter 4? Frequentist prediction intervals \(\widehat{\text{PI}}_{1t}\) with finite-sample coverage, plus three alternatives to the simplex (lasso, ridge, OLS)
What is the simplex point ATT? \(\widehat{\tau}_{\text{scpi-simplex}} \approx -11.11\) packs/capita per year, 1989–2000 (outcome-only matching; cf. chapter 4’s \(\widehat{\tau}_{\text{SCM}} \approx -18.85\) with predictor-augmented matching)
Does the ATT survive the constraint choice? Yes — all four constraints produce a negative average gap between roughly -15.8 and -11.1 packs
Does the inferential conclusion survive? Yes — observed California falls below the lower 95% joint PI bound under every constraint by the late 1990s
What does \(\widehat{\text{PI}}_{1t}\) cover? The synthetic counterfactual \(\widehat{Y_{1t}(0)}\), not observed California and not \(\widehat{\tau}_t\) — “significance” means the observed series leaves the band
How does this differ from chapter 7’s credible interval? \(\widehat{\text{PI}}_{1t}\) has finite-sample frequentist coverage and is a statement about the synthetic counterfactual; CausalImpact’s CrI is a posterior probability statement about the effect, conditioned on the prior. They tend to agree when the BSTS prior is uninformative and the SCPI in-sample uncertainty term dominates.
What is the most important scpi switch for this dataset? cointegrated.data = TRUE, which adjusts the long-run variance for the I(1)-looking cigsale series
What is the design-time pitfall? Adding covariates with NAs (beer, age15to24, lnincome) forces row deletions that bias the donor pool; the chapter uses outcome-only matching

8.14 Key takeaways

Methods:

  • scpi fits the synthetic-control predictor \(\widehat{Y_{1t}(0)} = \widehat{w}^\top Y_{0t}\) under a chosen donor-weight constraint (simplex, lasso, ridge, or OLS) and constructs a finite-sample prediction interval \(\widehat{\text{PI}}_{1t}\) by decomposing the forecast error into two components: an in-sample piece \((w^* - \widehat{w})^\top Y_{0t}\) that captures donor-weight estimation variability over a finite pre-period, and an out-of-sample piece \(e_t\) that captures post-treatment shocks unknowable from the pre-period.
  • Splitting the joint level as u.alpha = e.alpha = α/2 delivers a Bonferroni-style \((1-\alpha)\) band over both error sources; the simplex weights here additionally satisfy the Abadie–Diamond–Hainmueller constraints \(\widehat{w}_j \geq 0\) and \(\sum_j \widehat{w}_j = 1\), so the synthetic remains a convex combination of donor states.
  • \(\widehat{\text{PI}}_{1t}\) covers the synthetic counterfactual \(\widehat{Y_{1t}(0)}\) — not \(\widehat{\tau}_t\) and not a rank-based null distribution; “significance” is signalled when observed California leaves the band, in contrast to chapter 4’s Fisher placebo test, which ranks the treated unit against permuted-treatment donors without modelling either error source and so returns a \(p\)-value rather than an interval.

Lessons:

  • Across simplex, lasso, ridge, and OLS, \(\widehat{\tau}_{\text{scpi-simplex}}\) lands at ≈ -11.11 packs/capita while the other three constraints reach ≈ -15.77 — a 40% relative spread on the point estimate, yet observed California falls below the lower 95% joint PI bound under every constraint family by the late 1990s, so the inferential conclusion is robust even when the point estimate is not.
  • \(\widehat{\tau}_{\text{scpi-simplex}}\) here (≈ -11.11) is materially less negative than chapter 4’s \(\widehat{\tau}_{\text{SCM}} \approx -18.85\) because this chapter matches on the outcome alone, omits the auxiliary predictors lnincome, beer, and age15to24 that carried NAs in the early pre-period, and uses scpi’s constrained-least-squares solver with constant = TRUE; outcome-only matching is the more honest — and more conservative — choice when covariates would force row deletions on the donor side.
  • PI bands can be narrower than the placebo cloud chapter 4 would suggest when the in-sample component is small (tight pre-period fit, well-regularised donor pool, as with the simplex) and wider when it is large (OLS pays for an unconstrained pre-period fit with a fatter post-period band); sweeping the joint confidence level from 80% to 99% leaves the lower-bound piercing intact by the late 1990s, so the post-1989 deviation is not a 95%-only artefact.

Caveats:

  • Finite-sample coverage depends on the assumed distribution of the post-treatment shock \(e_t\) — the chapter uses e.method = "gaussian" (the conditional-subgaussian bound proved in Cattaneo et al. (2021)), and a fatter-tailed shock distribution would deliver weaker coverage than the nominal level claims; conformal-inference alternatives (Chernozhukov, Wüthrich & Zhu 2021) trade the parametric decomposition for distribution-free quantile guarantees.
  • \(\widehat{\text{PI}}_{1t}\) is conditional on the donor pool and grows rapidly with horizon in a small-T regime — with only 19 pre-period years the out-of-sample component dominates by 2000, and the band makes no statement about whether the 38-state donor pool is itself reasonable; pool composition, pre-period window, and the lasso tuning parameter \(Q\) (held at the package default \(Q = 1\) here) are sensitivities the PI does not absorb.
  • Reading the band as a \(\pm\) on the effect size — the way one would read an OLS confidence interval on a regression coefficient — is the most common applied misuse of scpi; the band is a region for the counterfactual, so the implied uncertainty on \(\widehat{\tau}_t = Y_{1t} - \widehat{Y_{1t}(0)}\) is the mirror-image gap between the observed series and the band (exercise 3), not the band’s half-width.

8.15 Further reading

  • Cattaneo et al. (2021) — the JASA paper that introduced the SCPI framework and proves the finite-sample coverage guarantee.
  • Cattaneo et al. (2025) — the JSS software paper for the R and Python implementations; the chapter’s argument defaults are taken from its replication script.
  • Abadie (2021) — JEL methodological review covering feasibility, data requirements, and assumptions for classical SCM (the baseline this chapter relaxes).
  • Dunford (2024) — the tidysynth package used in chapter 4, for direct A/B comparison with the simplex pipeline run here.
  • Chernozhukov, V., Wüthrich, K. & Zhu, Y. (2021). An exact and robust conformal inference method for counterfactual and synthetic controls. Journal of the American Statistical Association — a distribution-free frequentist alternative to scpi’s parametric in-sample / out-of-sample decomposition. Conformal SC intervals trade scpi’s two-component parametric coverage for permutation-style quantile guarantees and are a natural companion read.
  • The companion Python (scpi_pkg) and Stata (scpi) packages, documented at https://nppackages.github.io/scpi/, expose the same API as the R package used in this chapter.

Coming up. Chapter 7 closes Part I with a fourth synthetic-control perspective: a Bayesian horseshoe-prior weight model coupled with a spatial-autoregressive donor data-generating process. Chapters 4, 5, and 6 all retain the stable-unit-treatment-value assumption (SUTVA) — donors are treated as unaffected by California’s policy. Chapter 7 relaxes SUTVA and recovers a closed-form spillover for each neighbour, which matters when the treated unit shares a border with several donors (Nevada, in this case).

8.16 Exercises

These exercises probe the design knobs the chapter exposed — the cointegration flag, the constraint family, the Bonferroni alpha allocation, and the lasso \(Q\) tuning parameter — and one summary statistic the chapter did not extract (the first year California pierces the lower band). All exercises reuse prop99, data_prep, pi_simplex, scpi_fits, years_pre, years_post, and years_all from the setup chunks above.

8.16.1 Exercise 1: Cointegration toggle

The chapter ran scdata(..., cointegrated.data = TRUE) because California’s per-capita cigarette sales look like an I(1) series. Refit the simplex scpi with cointegrated.data = FALSE and compare the average PI half-width across the post-period. How much does the cointegration adjustment matter?

Code
data_prep_nocoint <- scdata(
  df               = prop99,
  id.var           = "state",
  time.var         = "year",
  outcome.var      = "cigsale",
  period.pre       = 1970:1988,
  period.post      = 1989:2000,
  unit.tr          = "California",
  unit.co          = donors,
  cointegrated.data = FALSE,
  constant         = TRUE
)

set.seed(42)
pi_nocoint <- scpi(
  data_prep_nocoint,
  w.constr = list(name = "simplex"),
  u.order  = 1, u.lags = 0,
  e.order  = 1, e.lags = 0,
  e.method = "gaussian",
  u.missp  = TRUE, u.sigma = "HC1",
  e.alpha  = 0.025, u.alpha = 0.025,
  sims     = 200
)
Code
ci_yes <- pi_simplex$inference.results$CI.all.gaussian
ci_no  <- pi_nocoint$inference.results$CI.all.gaussian

tibble(setting       = c("cointegrated = TRUE (chapter)",
                         "cointegrated = FALSE"),
       mean_halfwidth = c(mean((ci_yes[, 2] - ci_yes[, 1]) / 2),
                          mean((ci_no[, 2]  - ci_no[, 1])  / 2))) |>
  gt_pretty(decimals = 2)
setting mean_halfwidth
cointegrated = TRUE (chapter) 10.49
cointegrated = FALSE 11.71

Turning off the cointegration adjustment shrinks the average PI half-width somewhat: the long-run-variance correction inflates the in-sample uncertainty component to account for the high serial correlation of an I(1) outcome. The chapter’s TRUE is the more honest default for cigsale; setting it to FALSE gives a narrower band that can mislead readers into thinking the post-1989 effect is more sharply identified than it actually is.

8.16.2 Exercise 2: First post-period year California pierces the lower PI bound

For each of the four constraints, find the first post-period year in which observed California falls below the lower 95% PI bound. This is the inferential analogue of “the year the effect became statistically distinguishable from forecasting noise.”

Code
y_post <- as.numeric(data_prep$Y.post)

first_exit <- function(fit) {
  ci <- fit$inference.results$CI.all.gaussian
  below <- y_post < ci[, 1]
  below[is.na(below)] <- FALSE
  if (!any(below)) NA_integer_ else as.integer(years_post[which(below)[1]])
}

tibble(constraint = constraints,
       first_year = map_int(scpi_fits, first_exit)) |>
  gt_pretty()
constraint first_year
simplex 1,998
lasso NA
ridge NA
ols NA

Every constraint shows observed California falling below the lower PI bound by the early-to-mid 1990s. The simplex and lasso fits give the earliest first-exit year (their bands are narrowest); ridge and OLS push the first-exit year out by a year or two (their bands are wider). The constraint choice affects when the inferential break opens but not whether it opens — a robustness signal stronger than the point-estimate spread alone.

8.16.3 Exercise 3: Per-year treatment-effect interval

The chapter plotted \(\widehat{\text{PI}}_{1t}\) around the synthetic counterfactual \(\widehat{Y_{1t}(0)}\). Reframe it: plot the implied band around the per-year treatment effect \(\widehat{\tau}_t = Y_{1t} - \widehat{Y_{1t}(0)}\). Since \(Y_{1t}\) is observed, the upper bound on \(\widehat{\tau}_t\) is \(Y_{1t}\) minus the lower PI bound on the counterfactual, and vice versa. The horizontal zero line then anchors the eye: years whose band stays below zero are the years the policy effect is statistically distinguishable from zero.

Code
ci_post <- pi_simplex$inference.results$CI.all.gaussian
synth_post <- as.numeric(pi_simplex$est.results$Y.post.fit)

tau_df <- tibble(
  year     = years_post,
  observed = y_post,
  tau      = observed - synth_post,
  tau_lo   = observed - ci_post[, 2],
  tau_hi   = observed - ci_post[, 1]
)

ggplot(tau_df, aes(year, tau)) +
  geom_ribbon(aes(ymin = tau_lo, ymax = tau_hi),
              fill = "#6a9bcc", alpha = 0.25) +
  geom_line(color = "#6a9bcc", linewidth = 0.9) +
  geom_hline(yintercept = 0, color = "#d97757", linetype = "dashed") +
  labs(x = NULL, y = "Per-year treatment effect (packs/capita)")

The point-effect series drops steadily through the post-period; the upper bound of the band first dips below zero in the mid-1990s and stays there. The visual is strictly equivalent to the chapter’s “California below the lower bound” framing — but with zero on the y-axis, the inferential break shows up as a band that has cleared zero, which many readers find more direct.

8.16.4 Exercise 4: Asymmetric Bonferroni alpha allocation

The chapter splits a 95% joint band evenly (u.alpha = e.alpha = 0.025). Refit the simplex scpi with an asymmetric split: u.alpha = 0.04 and e.alpha = 0.01 (still summing to 0.05). The total Bonferroni level is still 95%, but the shape of the band changes — one source of uncertainty is now bounded more tightly than the other. Compare to the symmetric chapter band.

Code
set.seed(42)
pi_asym <- scpi(
  data_prep,
  w.constr = list(name = "simplex"),
  u.order  = 1, u.lags = 0,
  e.order  = 1, e.lags = 0,
  e.method = "gaussian",
  u.missp  = TRUE, u.sigma = "HC1",
  u.alpha  = 0.04,
  e.alpha  = 0.01,
  sims     = 200
)
Code
ci_sym  <- pi_simplex$inference.results$CI.all.gaussian
ci_asym <- pi_asym$inference.results$CI.all.gaussian

tibble(
  year             = years_post,
  width_symmetric  = ci_sym[, 2]  - ci_sym[, 1],
  width_asymmetric = ci_asym[, 2] - ci_asym[, 1]
) |>
  gt_pretty(decimals = 2)
year width_symmetric width_asymmetric
1,989 14.78 14.44
1,990 19.56 19.46
1,991 21.95 22.33
1,992 19 18.89
1,993 19.86 19.54
1,994 20.59 20.6
1,995 20.09 19.58
1,996 22.58 22.12
1,997 23.38 22.63
1,998 18.97 17.9
1,999 23.82 23.21
2,000 27.15 26.35

The asymmetric band trades a tighter out-of-sample bound (e.alpha = 0.01) for a looser in-sample bound (u.alpha = 0.04). On a 19-year pre-period with 12 post-period years, the in-sample component dominates by mid-horizon, so loosening it widens the band where it already mattered most. This is the lesson of the Bonferroni union bound: the split is yours to choose, but you cannot get both bounds tight for free. Symmetric is the conservative default; asymmetric splits make sense only when you have a substantive reason to trust one of the two error sources more than the other.

8.16.5 Exercise 5 (stretch): Lasso \(Q\) sweep

The chapter’s lasso fit uses scpi’s default \(Q = 1\), which is what makes the lasso weights look so similar to the simplex weights (“lasso = L1 ball of radius 1; simplex = L1 ball of radius 1 restricted to the non-negative orthant”). Sweep \(Q\) over \(\{0.5, 1, 2\}\), refit scpi at each value, and report the resulting ATT plus the number of donors with non-zero weight (the lasso’s “active set” size).

Code
set.seed(42)
Q_grid <- c(0.5, 1, 2)

lasso_fits <- map(Q_grid, \(Q) {
  scpi(
    data_prep,
    w.constr = list(name = "lasso", Q = Q),
    u.order  = 1, u.lags = 0,
    e.order  = 1, e.lags = 0,
    e.method = "gaussian",
    u.missp  = TRUE, u.sigma = "HC1",
    u.alpha  = 0.025, e.alpha = 0.025,
    sims     = 100
  )
})
Code
y_post <- as.numeric(data_prep$Y.post)

map2_dfr(lasso_fits, Q_grid, \(fit, Q) {
  w      <- fit$est.results$w
  active <- sum(abs(w) > 1e-6)
  att    <- mean(y_post - as.numeric(fit$est.results$Y.post.fit))
  tibble(Q = Q, active_donors = active, att = att)
}) |>
  gt_pretty(decimals = 2)
Q active_donors att
0.5 6 −24.33
1 12 −15.28
2 18 −13.24

A tighter lasso ball (\(Q = 0.5\)) forces a small active set and a shrunken ATT; the default \(Q = 1\) behaves much like the simplex; loosening to \(Q = 2\) admits negative weights and gives the lasso a more OLS-like ATT. The takeaway is that “lasso vs simplex” in the chapter was a comparison at one specific \(Q\); the lasso family is really a one-parameter continuum, and a serious applied analysis would either sweep \(Q\) or pick it by cross-validation.