6 Synthetic Difference-in-Differences

Chapter 5 closed the gap between observed and synthetic counterfactual by adding a model: ridge regression supplied a bias correction the simplex weights could not. Synthetic Difference-in-Differences (SDID) of Arkhangelsky et al. (2021) closes the same gap without an outcome model. It keeps the synthetic-control unit weights $\omega_i$, but it also lets the data choose time weights $\nu_t$ — pre-period years similar to the post-period get more weight; volatile years near the cutoff get less — and then runs an ordinary two-way fixed-effects DiD on the doubly-reweighted panel. The unit weights handle baseline level gaps the way SCM does; the time weights handle local pre-trend shocks the way no single-vintage DiD ever could.

This chapter does three demonstrations on the same Proposition 99 panel that chapter 4 used (the canonical Arkhangelsky-et-al-2021 Table 1 replication), plus a fourth pass on the simulated tobacco panel from chapter 5 so we can see how SDID handles staggered adoption. Throughout, we use $\omega$ for SDID unit weights and $\nu$ for SDID time weights to avoid the symbol clash with chapter 5’s ridge penalty $\lambda$ and chapter 11’s factor loadings $\lambda_i$.

6.1 Learning objectives

Build the SDID counterfactual on Proposition 99 with synthdid::panel.matrices() and synthdid_estimate() and recover the iconic Arkhangelsky-et-al-2021 Table 1 side-by-side of DiD, SC, and SDID. The disagreement between the three estimates is the chapter’s lesson — it isolates exactly how much of the ATT each set of weights buys you.
Adjust the SDID fit for time-varying covariates with xsynthdid::adjust.outcome.for.x(), which residualises the outcome on a covariate path before SDID weights are computed. This is the only piece of covariate machinery the SDID world ships, and it works in a single line.
Match the inference method to the design: placebo standard errors are the only valid frequentist choice whenever a fit has a single treated unit — which includes each cohort of a staggered roll-out, since SDID is fitted one treated unit at a time, so jackknife and bootstrap collapse there too.
Extend SDID to staggered adoption by fitting one synthdid per cohort and aggregating with the cell-count weights $\pi_g$ from Clarke et al. (2023). This is the SDID analogue of multisynth from chapter 5 — same staggered problem, different counterfactual.

6.2 The SDID idea

Three estimators on the same panel. DiD uses uniform unit and time weights and assumes parallel trends between donors and the treated unit. SCM keeps uniform time weights but lets the donor weights $\omega_i$ vary across the simplex to match the treated pre-period. SDID lets both sets of weights vary across the simplex — donors that look like California get $\omega_i > 0$, and pre-period years that look like the post-period get $\nu_t > 0$. Then SDID fits an ordinary DiD on the doubly de-meaned panel. The double reweighting buys two things at once: it relaxes parallel trends (because the time weights align the donors’ pre-period trajectory with the treated unit’s), and it tolerates a vertical baseline level gap between the treated unit and its synthetic match (because the DiD step absorbs that level shift).

flowchart LR
    A["Panel Y, treated unit + donors"] --> B["Unit weights omega<br/>(simplex; match treated pre-period)"]
    A --> C["Time weights nu<br/>(simplex; match donors' post-period)"]
    B --> D["Doubly-weighted DiD<br/>tau_SDID"]
    C --> D
    D --> E["ATT = Y1t - Y1t-hat(0)"]
    style A fill:#6a9bcc,stroke:#cbd5e0,color:#fff
    style D fill:#6a9bcc,stroke:#cbd5e0,color:#fff
    style E fill:#d97757,stroke:#cbd5e0,color:#fff

Figure 6.1: The synthetic DiD pipeline. SCM unit weights and SDID time weights are each chosen as ridge-regularised simplex problems; together they doubly de-mean the panel, on which a standard DiD recovers the ATT.

Identification. SDID identifies the ATT under a latent-factor model in which the doubly-reweighted donors track the doubly-reweighted treated unit in expectation. This is strictly weaker than DiD’s parallel-trends assumption (which SDID nests at $\omega_i = 1/N_0$, $\nu_t = 1/T_{\text{pre}}$) and strictly weaker than SCM’s exact-pre-period-match condition (which SDID nests at $\nu_t = 1/T_{\text{pre}}$). The full statement and asymptotic theory are in Theorem 1 of Arkhangelsky et al. (2021).

6.3 The estimator

SDID estimates the ATT by minimising the doubly-weighted two-way-fixed-effects sum of squares:

\[ \big(\widehat{\tau}_{\text{SDID}}, \widehat{\mu}, \widehat{\alpha}, \widehat{\beta}\big) \;=\; \arg\min_{\tau, \mu, \alpha, \beta} \sum_{i=1}^{N}\sum_{t=1}^{T} \Big(Y_{it} - \mu - \alpha_i - \beta_t - \tau\, D_{it}\Big)^2\, \widehat{\omega}_i\, \widehat{\nu}_t, \]

where $\alpha_i$ are unit fixed effects, $\beta_t$ are time fixed effects, and $D_{it}$ is the treatment dummy. The weights $\widehat{\omega}_i$ and $\widehat{\nu}_t$ are themselves estimated by ridge-regularised simplex problems that solve for them before $\tau$ is fit — see equations (2)–(3) of Arkhangelsky et al. (2021). The two weight problems are mirror images of each other:

$\widehat{\omega}_i$ — non-negative, sum to one across donors, zero on the treated unit — is chosen to make $\sum_{i: D=0} \widehat{\omega}_i\, Y_{i, \text{pre}}$ close to $Y_{\text{treated}, \text{pre}}$, the SCM objective from chapter 4.
$\widehat{\nu}_t$ — non-negative, sum to one across pre-period years, zero on the post-period — is chosen to make $\sum_{t \le t^*} \widehat{\nu}_t\, Y_{i,t}$ close to $\frac{1}{T_{\text{post}}}\sum_{t > t^*} Y_{i,t}$ for each donor $i$, i.e. to pick pre-period years whose level looks like the donors’ post-period level.

Setting $\nu_t = 1/T_{\text{pre}}$ for every pre-period year and re-fitting $\omega_i$ collapses SDID to classical SCM. Setting $\omega_i = 1/N_0$ for every donor too collapses it further to plain DiD. The two-knob nature is what makes the chapter’s headline comparison so clean.

6.4 Setup and data

Packages. tidyverse covers wrangling and plotting. synthdid provides panel.matrices() (the panel-to-(Y, N0, T0) converter) and the three estimators this chapter uses head to head — synthdid_estimate(), sc_estimate(), and did_estimate(), all of which take the same matrices and so make the side-by-side comparison mechanical. xsynthdid adds adjust.outcome.for.x() for covariate residualisation; its companion xsdid_se_bootstrap() supplies matching standard errors for designs with two or more treated units in one fit, so it stays unused here — every fit in this chapter has a single treated unit. haven reads the labelled Stata file for the staggered demo (reusing chapter 5’s panel). mice supplies the random-forest imputation we need for the two sparse Prop 99 covariates.

Code: Load packages, source table helpers, and set the transparent ggplot theme.

library(tidyverse)
library(synthdid)
library(xsynthdid)
library(haven)
library(mice)
source("R/table_helpers.R")

# synthdid's placebo SEs and xsdid_se_bootstrap consume RNG; the seed is load
# bearing.
set.seed(42)

# synthdid_plot() still draws its lines with ggplot2's deprecated `size` aesthetic;
# quiet the upstream lifecycle deprecation so it does not leak into the chapter.
options(lifecycle_verbosity = "quiet")

knitr::opts_chunk$set(dev.args = list(bg = "transparent"))

theme_set(
  theme_minimal(base_size = 12) +
    theme(
      plot.background  = element_rect(fill = "transparent", color = NA),
      panel.background = element_rect(fill = "transparent", color = NA),
      panel.grid.major = element_line(color = "#94a3b8", linewidth = 0.25),
      panel.grid.minor = element_line(color = "#94a3b8", linewidth = 0.15),
      text             = element_text(color = "#94a3b8"),
      axis.text        = element_text(color = "#94a3b8")
    )
)

Dataset. We use two panels. The single-treated demo runs on the Prop 99 panel data/proposition99.rds: 39 states observed annually 1970–2000, with cigarette sales (cigsale) as the outcome and four time-varying covariates (lnincome, retprice, age15to24, beer). California is the lone treated unit; the policy goes into effect in 1989, so $t^* = 1988$. The staggered demo reuses chapter 5’s simulated panel data/tobacco_sim.dta: 40 states observed 1970–2000, with four treated cohorts adopting in 1986, 1989, 1992, 1995.

Code: Load Proposition 99, impute two sparse covariates, and tag California’s treatment indicator.

prop99_raw <- read_rds("data/proposition99.rds") |>
  as_tibble() |>
  mutate(state = as.character(state))

# Two of the four covariates have substantial missingness — beer (~55%) and
# age15to24 (~32%). We single-impute with mice's random-forest method (m = 1) so
# that the SDID fit lands on one well-defined panel; chapter 7 (BSTS) treats
# imputation as a first-class step and shows what proper multiple imputation
# buys you in this same data. The 39-level `state` identifier is held aside while
# imputing — it is not a useful predictor (mice would drop it as collinear) — and
# rejoined after; cigsale has no missingness and is untouched either way.
prop99 <- prop99_raw |>
  select(-state) |>
  mice(m = 1, method = "rf", printFlag = FALSE) |>
  complete() |>
  as_tibble() |>
  mutate(state = prop99_raw$state, .before = 1) |>
  mutate(treated = as.integer(state == "California" & year >= 1989))

Code: Load the simulated tobacco panel for the staggered demo.

tobacco <- read_dta("data/tobacco_sim.dta") |>
  mutate(
    adoption_year = if_else(adoption_year >= 9999, Inf, as.numeric(adoption_year)),
    treated       = as.integer(treated),
    trt           = as.integer(trt)
  )

The two panels are very different in shape — one has a single treated unit and no missingness in the outcome, the other has four staggered cohorts and a clean DGP — and that contrast is the point. Single-treated SDID is the textbook case; staggered SDID is the bookkeeping problem.

Code: Tabulate panel dimensions, treated counts, and the cutoff t* for each panel.

tibble(
  Panel = c("Proposition 99 (`data/proposition99.rds`)",
            "Simulated tobacco (`data/tobacco_sim.dta`)"),
  States   = c(length(unique(prop99$state)), length(unique(tobacco$state))),
  Years    = c(length(unique(prop99$year)), length(unique(tobacco$year))),
  Treated  = c(length(unique(prop99$state[prop99$treated == 1])),
               length(unique(tobacco$state[tobacco$treated == 1]))),
  `t* (last pre)` = c("1988 (California, single cohort)",
                      "1985, 1988, 1991, 1994 (four cohorts)")
) |>
  gt_pretty() |>
  gt::fmt_markdown(columns = "Panel")

Table 6.1: Shape of the two panels used in this chapter. Prop 99 has one treated unit (California) and four time-varying covariates after imputation; tobacco_sim has four staggered cohorts and no covariates.

Panel	States	Years	Treated	t* (last pre)
Proposition 99 (`data/proposition99.rds`)	39	31	1	1988 (California, single cohort)
Simulated tobacco (`data/tobacco_sim.dta`)	40	31	4	1985, 1988, 1991, 1994 (four cohorts)

A note on imputation. mice(m = 1, method = "rf") draws one random-forest imputation per missing cell. That is sufficient for point estimation — synthdid_estimate() returns a single estimate on a single panel — but it under-states posterior variance because it ignores the imputation uncertainty. Chapter 7’s BSTS pipeline is where proper multiple imputation (with Rubin’s-rules pooling) is paid in full. Exercise 3 below dials this knob and shows how much the point estimate moves under $m = 5$ draws.

6.5 A single treated unit: Proposition 99

The idea. The first variant fits vanilla SDID on the Prop 99 panel with no covariates. We pass the data frame (state, year, cigsale, treated) to panel.matrices(), which reshapes it into the matrix block synthdid wants — the outcome matrix $Y$ with the treated unit on the last row and the post-period in the last $T - T_0$ columns, plus the integers $N_0$ (number of donors) and $T_0$ (last pre-period column). Then synthdid_estimate(Y, N0, T0) returns the ATT.

Code: Reshape Prop 99 to (Y, N0, T0), fit vanilla SDID, and compute placebo SE.

df_sdid <- prop99 |>
  select(state, year, cigsale, treated) |>
  as.data.frame()

pm <- panel.matrices(df_sdid, unit = "state", time = "year",
                     outcome = "cigsale", treatment = "treated")

sdid_vanilla <- synthdid_estimate(pm$Y, pm$N0, pm$T0)
se_placebo   <- sqrt(vcov(sdid_vanilla, method = "placebo"))

The panel is $39 \times 31$ — 38 donor states plus California, observed 19 pre-period and 12 post-period years.

Reading the output. Vanilla SDID returns $\widehat{\tau}_{\text{SDID}} \approx -15.60$ packs per capita: California smoked roughly 16 packs less per capita per year, averaged over the 1989–2000 window, than the SDID counterfactual implies. The placebo standard error of 10.99 packs/capita gives a 95% interval of roughly $(-37.1, 5.9)$. The estimate is the exact number reported in Arkhangelsky-et-al-2021 Table 1.

Code: Tabulate the vanilla SDID point estimate, placebo SE, and 95% CI.

tibble(
  Estimator = "SDID (vanilla)",
  ATT       = tau_sdid,
  `SE (placebo)` = se_placebo,
  `CI lower`     = ci_lo,
  `CI upper`     = ci_hi
) |>
  gt_pretty(decimals = 2)

Table 6.2: Vanilla SDID estimate on Proposition 99 with placebo standard error and 95% confidence interval. Jackknife and bootstrap return NA on a single-treated panel.

Estimator	ATT	SE (placebo)	CI lower	CI upper
SDID (vanilla)	−15.6	10.99	−37.14	5.93

The native synthdid_plot() is the method’s signature display: a two-panel figure that overlays observed and synthetic California and draws the estimated time weights $\nu_t$ as a strip below. The strip is the diagnostic that distinguishes SDID from SCM — it shows you which pre-period years actually contribute to the comparison, and which are weighted down.

Code: Render synthdid_plot() and override colors to match the book palette.

synthdid_plot(list("SDID" = sdid_vanilla),
              treated.name = "California (observed)",
              control.name = "Synthetic California (SDID)",
              se.method    = "placebo") +
  scale_color_manual(values = c("California (observed)"        = "#d97757",
                                "Synthetic California (SDID)"  = "#6a9bcc")) +
  scale_fill_manual(values  = c("California (observed)"        = "#d97757",
                                "Synthetic California (SDID)"  = "#6a9bcc")) +
  labs(x = "Year", y = "Cigarette sales (packs per capita)",
       color = NULL, fill = NULL)

Figure 6.2: Vanilla SDID on Proposition 99. The top panel shows observed California (solid) vs the synthetic-control counterfactual (dashed); the bottom strip shows the SDID time weights nu_t — pre-period years that look like the post-period get more weight, volatile years near the cutoff get less.

The companion synthdid_units_plot() shows which donors carried real weight. SDID picks a sparse subset — only a handful of donors absorb most of the simplex mass — and the rest sit on essentially zero weight, so the plot’s left-hand column reads as a placebo distribution and the right-hand column as the donors that actually built the counterfactual.

Code: Render synthdid_units_plot() with the placebo SE method.

# synthdid_units_plot() labels its points by donor, not by a "SDID" series, so a
# scale_color_manual(c("SDID" = ...)) override never matches and only warns; we
# keep the package's native colouring here.
synthdid_units_plot(list("SDID" = sdid_vanilla), se.method = "placebo") +
  labs(x = "Donor state", y = "Estimated effect (each donor in turn as placebo)",
       color = NULL)

Figure 6.3: SDID unit weights omega_i on Prop 99. Donors with weight above 0.001 (right cluster) carried the counterfactual; everyone else (left cluster) was effectively dropped.

Common pitfall. Jackknife standard errors are mathematically undefined for a single-treated unit (leaving the one treated cell out gives no treated panel to re-fit), and the synthdid bootstrap collapses for the same reason: resampling treated units with replacement returns the same singleton every time. On Prop 99, vcov(sdid_vanilla, method = "jackknife") and vcov(sdid_vanilla, method = "bootstrap") both return NA. Placebo SEs are the only frequentist choice when there is one treated unit. This same constraint returns in the staggered demo below: each cohort is fitted as its own single-treated SDID, so placebo is the valid choice there too — jackknife and bootstrap would need $N_{\text{treated}} \ge 2$ inside a single fit, which fitting one cohort at a time never provides.

6.6 With covariates: xsynthdid

The idea. synthdid_estimate() does not ingest covariates directly. To adjust for time-varying $X_{it}$, Kranz (2022a) proposes a two-stage procedure: residualise the outcome on the covariates using a unit-and-time fixed-effects regression, then fit SDID on the residual. The R interface is xsynthdid::adjust.outcome.for.x(), which takes the panel data frame, a vector of covariate column names, and returns the adjusted outcome.

The equation. Stage one fits

\[ \widehat{Y}_{it}^{\text{cov}} \;=\; \widehat{\beta}^\top X_{it}, \qquad \widetilde{Y}_{it} \;=\; Y_{it} - \widehat{Y}_{it}^{\text{cov}}, \]

with $\widehat{\beta}$ estimated only on the control cells so that the estimated covariate effect is not contaminated by the treatment. Stage two is SDID on $\widetilde{Y}_{it}$:

\[ \widehat{\tau}_{\text{SDID-X}} \;=\; \text{SDID}\big(\widetilde{Y}\big). \]

Code: Adjust cigsale for the four time-varying covariates, then refit SDID on the residual.

cov_names <- c("lnincome", "retprice", "age15to24", "beer")

df_xadj <- prop99 |>
  select(state, year, cigsale, treated, all_of(cov_names)) |>
  as.data.frame()

df_xadj$cigsale_adj <- adjust.outcome.for.x(
  df_xadj, unit = "state", time = "year",
  outcome = "cigsale", treatment = "treated",
  x = cov_names
)

pm_x <- panel.matrices(
  df_xadj[, c("state", "year", "cigsale_adj", "treated")],
  unit = "state", time = "year",
  outcome = "cigsale_adj", treatment = "treated"
)

sdid_xadj   <- synthdid_estimate(pm_x$Y, pm_x$N0, pm_x$T0)
se_xadj_pl  <- sqrt(vcov(sdid_xadj, method = "placebo"))

Reading the output. With the four covariates partialled out, SDID-X returns $\widehat{\tau}_{\text{SDID-X}} \approx -1.96$ packs/capita, against the $\widehat{\tau}_{\text{SDID}} \approx -15.60$ of the vanilla fit — a shift of $+13.64$ packs that wipes out almost the entire estimate. That is not a refinement; it is a warning. One of the four covariates is retail price, and Proposition 99 was a cigarette tax — it raised the retail price directly. Adjusting for price therefore conditions on a post-treatment mediator, the textbook definition of a bad control: the residualisation soaks up the very channel through which the policy lowered smoking. Exercise 2 confirms retprice is the covariate doing the damage. With a single treated unit (California), only the placebo SE is identified — xsynthdid::xsdid_se_bootstrap() returns an error on this design because its clustered bootstrap resamples treated units, and one treated unit has nothing to resample. The placebo SE of 6.42 leaves the covariate-adjusted estimate indistinguishable from zero — exactly what a bad control does.

Code: Lay vanilla and covariate-adjusted SDID side by side.

tibble(
  Estimator = c("SDID (vanilla)", "SDID + X (xsynthdid)"),
  ATT       = c(tau_sdid, tau_sdid_x),
  `SE (placebo)` = c(se_placebo, se_xadj_pl)
) |>
  gt_pretty(decimals = 2)

Table 6.3: Vanilla vs covariate-adjusted SDID on Prop 99 with placebo standard errors. The bootstrap SE that xsynthdid usually pairs with adjust.outcome.for.x() requires multiple treated units (see Common pitfall below).

Estimator	ATT	SE (placebo)
SDID (vanilla)	−15.6	10.99
SDID + X (xsynthdid)	−1.96	6.42

Common pitfall. Two pitfalls compete for first place here. (1) Re-adding state and year to the covariate vector double-counts the unit and time fixed effects that adjust.outcome.for.x() is already partialling out — the residualisation is a TWFE regression. Use only covariates that vary within unit and within time (income, prices, age shares, beer here); do not add the identifiers themselves. (2) xsdid_se_bootstrap() errors on single-treated panels because its clustered bootstrap resamples treated units; on Prop 99 that means it cannot run, and only the placebo SE is identified. The staggered demo below does not rescue it: fitting one cohort at a time leaves every fit with a single treated unit, so the placebo SE is used throughout.

6.7 DID, SC, SDID side by side

The whole motivation for SDID is that DiD assumes uniform weights and SCM assumes uniform time. Each makes a strong assumption. SDID weakens both. The cleanest way to see what each estimator buys is to run all three on the same Y, $N_0$, $T_0$ inputs. synthdid ships did_estimate() and sc_estimate() for exactly this purpose — three calls, three numbers, one table, the iconic Table 1 of Arkhangelsky et al. (2021).

Code: Fit DiD, SC, and SDID on the identical (Y, N0, T0) matrices and compute placebo SEs.

est_did  <- did_estimate(pm$Y, pm$N0, pm$T0)
est_sc   <- sc_estimate(pm$Y, pm$N0, pm$T0)
est_sdid <- sdid_vanilla   # already fit above

se_did_pl  <- sqrt(vcov(est_did,  method = "placebo"))
se_sc_pl   <- sqrt(vcov(est_sc,   method = "placebo"))
se_sdid_pl <- se_placebo

Reading the output. DiD says California smoked roughly $27.3$ packs less per capita than the unweighted donor pool implies; SC says $19.6$; SDID lands at $15.6$. The three magnitudes span about $8$ packs, with SDID bracketed by DiD on one side and SC on the other. That bracketing is not a coincidence — SDID inherits the relaxation of parallel trends from SCM (so it does not buy the full DiD difference) and the relaxation of exact pre-period matching from DiD (so it does not buy the full SCM difference).

Code: Build the iconic DiD/SC/SDID/SDID-X comparison table with placebo SEs.

table1 <- tibble(
  Estimator = c("DiD", "SC", "SDID", "SDID + X (xsynthdid)"),
  ATT       = c(tau_did, tau_sc, tau_sdid, tau_sdid_x),
  `SE (placebo)` = c(se_did_pl, se_sc_pl, se_sdid_pl, se_xadj_pl)
) |>
  mutate(
    `CI lower` = ATT - 1.96 * `SE (placebo)`,
    `CI upper` = ATT + 1.96 * `SE (placebo)`
  )

table1 |>
  gt_pretty(decimals = 2)

Table 6.4: DiD, SC, and SDID on Proposition 99 — the Arkhangelsky-et-al-2021 Table 1 replication. All four rows fit the same (Y, N0, T0) matrices; the rightmost row residualises cigsale on the four covariates first via xsynthdid.

Estimator	ATT	SE (placebo)	CI lower	CI upper
DiD	−27.35	15.42	−57.57	2.87
SC	−19.62	11.1	−41.37	2.13
SDID	−15.6	10.99	−37.14	5.93
SDID + X (xsynthdid)	−1.96	6.42	−14.55	10.63

The combined synthdid_plot() shows the three counterfactual trajectories on one figure — DiD’s parallel-trend line, SC’s tight pre-period match, and SDID’s compromise between the two. The time-weight strip at the bottom is now only populated for SDID (the other two are uniform by construction).

Code: Render the three-estimator synthdid_plot() and override colors.

# A multi-estimator synthdid_plot() labels its series with internal per-estimator
# names that a c(DiD=, SC=, SDID=) override does not match (it only warns), so we
# keep synthdid's native three-colour scheme for the comparison figure.
synthdid_plot(list(DiD = est_did, SC = est_sc, SDID = est_sdid),
              se.method = "placebo") +
  labs(x = "Year", y = "Cigarette sales (packs per capita)",
       color = NULL, fill = NULL)

Figure 6.4: DiD, SC, and SDID counterfactuals on Prop 99 plotted together. DiD’s pre-period match is the worst (uniform weights), SC’s the best (the simplex exists to do exactly this), and SDID’s a controlled compromise. The bottom strip shows SDID’s time weights nu_t.

6.8 Staggered adoption: many treated units

The Prop 99 demo has one treated unit and one cutoff. Real policy roll-outs stagger. SDID’s synthdid_estimate() solves the single-block problem; for staggered adoption the standard recipe (per Arkhangelsky-et-al-2021 §5 and the cookbook of Clarke et al. (2023)) is to fit one SDID per cohort — subsetting to (never-treated ∪ that cohort) — and then aggregate the cohort ATTs with cell-count weights $\pi_g = N_g\, T_{\text{post}, g} / \sum_{g'} N_{g'}\, T_{\text{post}, g'}$. With one treated state per cohort that simplifies to $\pi_g \propto T_{\text{post}, g}$.

We use chapter 5’s purpose-built tobacco panel — four cohorts (Atlantica 1986, Borealis 1989, Cascadia 1992, Deltora 1995) and 36 never-treated donors.

Code: Fit synthdid per cohort with its placebo SE and stash inputs for aggregation.

donors   <- tobacco |> filter(treated == 0) |> pull(state) |> unique()
cohorts  <- tobacco |> filter(treated == 1) |> distinct(state, adoption_year) |>
            arrange(adoption_year)

fit_cohort <- function(cohort_state, adopt_yr) {
  sub <- tobacco |>
    filter(state %in% c(donors, cohort_state)) |>
    mutate(treated_g = as.integer(state == cohort_state & year >= adopt_yr)) |>
    select(state, year, cigsale, treated_g) |>
    as.data.frame()

  m   <- panel.matrices(sub, unit = "state", time = "year",
                        outcome = "cigsale", treatment = "treated_g")
  est <- synthdid_estimate(m$Y, m$N0, m$T0)

  tibble(
    Cohort         = cohort_state,
    `Adoption yr`  = adopt_yr,
    `T_post`       = ncol(m$Y) - m$T0,
    ATT            = as.numeric(est),
    `SE (placebo)` = sqrt(vcov(est, method = "placebo"))
  )
}

per_cohort <- map2_dfr(cohorts$state, cohorts$adoption_year, fit_cohort)

Each cohort’s SDID estimate is its own ATT averaged over that cohort’s post-period. Cohorts that adopt earlier have more post-period years and so contribute more to the aggregate.

Code: Tabulate the per-cohort ATTs and placebo SEs.

gt_pretty(per_cohort, decimals = 2)

Table 6.5: Per-cohort synthdid estimates on the tobacco_sim panel. Each cohort is a single-treated SDID fit (one treated state against the never-treated donors), so — exactly as on Prop 99 — only the placebo standard error is defined; jackknife and bootstrap would need two or more treated units inside a single fit.

Cohort	Adoption yr	T_post	ATT	SE (placebo)
Atlantica	1,986	15	−23.82	3.09
Borealis	1,989	12	−22.19	2.89
Cascadia	1,992	9	−23.56	4
Deltora	1,995	6	−16.87	4.74

Code: Aggregate the per-cohort ATTs and placebo SEs into a single headline.

tibble(
  Aggregator = "pi_g = T_post,g / sum T_post (cell-count weighted)",
  ATT        = tau_agg,
  `SE (placebo)` = se_agg_pl
) |>
  gt_pretty(decimals = 2)

Table 6.6: Aggregate staggered SDID ATT, weighted by post-period cell count pi_g. The placebo SE combines the per-cohort placebo SEs in quadrature, under the (strong) assumption that the cohort estimates are independent.

Aggregator	ATT	SE (placebo)
pi_g = T_post,g / sum T_post (cell-count weighted)	−22.31	1.76

A note on standard errors. It is tempting to think four cohorts buy back the jackknife and bootstrap that Prop 99 could not support. They do not. Each cohort is fitted as its own single-treated SDID — one treated state against the never-treated donors — so leaving the treated unit out (jackknife) or resampling treated units (bootstrap) is exactly as undefined here as it was on Prop 99. The placebo permutation is again the only valid per-cohort SE. The aggregate SE combines the four per-cohort placebo SEs in quadrature, which assumes the cohort estimates are independent — a strong assumption, since every cohort is built from the same donor pool.

Code: Plot per-cohort ATTs with placebo bands and the aggregate line.

per_cohort |>
  mutate(
    lower = ATT - 1.96 * `SE (placebo)`,
    upper = ATT + 1.96 * `SE (placebo)`
  ) |>
  ggplot(aes(x = factor(`Adoption yr`), y = ATT)) +
  geom_hline(yintercept = tau_agg, color = "#6a9bcc",
             linetype = "dashed", linewidth = 0.8) +
  geom_hline(yintercept = 0, color = "#94a3b8", linewidth = 0.3) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                width = 0.18, color = "#d97757", linewidth = 0.6) +
  geom_point(size = 3, color = "#d97757") +
  labs(x = "Adoption year (cohort)",
       y = "ATT on cigarette sales (packs per capita)")

Figure 6.5: Per-cohort SDID ATTs from the tobacco_sim panel with their placebo 95% bands. Each point is one cohort’s average effect across that cohort’s post-period; the horizontal line is the cell-count-weighted aggregate.

The four cohort ATTs scatter around the aggregate line; the spread comes from both real cohort heterogeneity (the simulation built it in deliberately) and small-sample noise (each cohort fits SDID on a single treated state).

6.9 What the SDID estimates tell us

The disagreement. The headline DiD estimate $-27.3$, the SC estimate $-19.6$, and the SDID estimate $-15.6$ disagree on Prop 99 by about $8$ packs/capita — that is the chapter’s lesson. Each estimator makes a different assumption about what the counterfactual looks like, and the gap between them is the price of those assumptions. DiD overstates the effect because parallel trends fail: the donor pool was falling faster than California already in the 1970s, so a uniform average overshoots California’s counterfactual decline. SCM corrects this by matching the pre-period level closely — at the cost of throwing away most of the donor pool. SDID lands in between because it keeps the donor reweighting (the SCM relaxation) but also lets the time dimension de-mean, which absorbs the level mismatch the SCM weights cannot perfectly close.

A note on standard errors. Placebo SEs were the only valid choice on Prop 99 (single treated) — and they remain the only valid choice on the staggered tobacco panel, because each cohort is fitted one treated state at a time. Multiplying cohorts does not multiply treated units within any single fit, so jackknife and bootstrap stay undefined cohort by cohort. The covariate-adjusted xsdid_se_bootstrap() needs multiple treated units in one fit for the same reason, so it never enters this chapter’s single-treated and per-cohort designs.

Where this leaves us. Chapter 7 (Structural Bayesian Time Series) replaces SDID’s frequentist confidence interval with a full Bayesian posterior over the counterfactual: instead of two simplex problems and a placebo permutation, BSTS puts a spike-and-slab prior on donor regressors and reports a credible interval that is a direct probability statement about the ATT. The two estimators answer the same missing-data question with very different uncertainty stories.

Recap. On Prop 99, vanilla SDID returns $\widehat{\tau}_{\text{SDID}} \approx -15.6$ packs/capita (placebo SE $11.0$), the covariate adjustment moves it to $-2.0$, and the cell-count-weighted staggered aggregate on the tobacco panel is $-22.3$ (placebo SE $1.8$).

6.10 Key takeaways

Methods:

SDID minimises a doubly-weighted two-way-fixed-effects sum of squares with unit weights $\omega_i$ (the SCM simplex) and time weights $\nu_t$ (a pre-period simplex chosen so donor pre-period averages match donor post-period averages); the headline estimator is synthdid_estimate(Y, N0, T0), with did_estimate() and sc_estimate() as the matched DiD and SCM baselines.
Covariate adjustment in the SDID world is a two-stage procedure: xsynthdid::adjust.outcome.for.x() residualises the outcome on a TWFE regression of the covariates fit on control cells, then synthdid_estimate() runs on the residual; xsdid_se_bootstrap() is the matched clustered bootstrap SE for that two-stage estimator (it requires multiple treated units, so on Prop 99 we fall back to the placebo SE).
Staggered adoption is handled by fitting one SDID per cohort and aggregating with cell-count weights $\pi_g$; the per-cohort SDID estimates are themselves single-block problems, so synthdid_estimate() does all the heavy lifting and the aggregation is a one-line sum(pi * ATT).

Lessons:

On Prop 99, DiD, SC, and SDID return $-27.3$, $-19.6$, and $-15.6$ packs/capita — a spread of $8$ packs that is the chapter’s lesson. SDID’s bracketing of DiD and SC comes from the two-knob structure: each weight relaxation buys a portion of the gap between the two extremes.
Adjusting for the four time-varying covariates (income, retail price, age share, beer) moves the SDID estimate by $+13.6$ packs/capita — almost the whole effect — because one of them, retail price, is a post-treatment mediator that the Proposition 99 tax raised directly. Conditioning on it is a bad control that absorbs the policy’s main channel, so the covariate-adjusted estimate $-2.0$ is a caution, not a refinement (Exercise 2 isolates retprice as the culprit). Its placebo SE $6.42$ is the only identified SE here because xsdid_se_bootstrap() requires multiple treated units.
Inference is design-specific: the placebo permutation is the only valid frequentist SE whenever a fit has a single treated unit — and that includes every cohort of the staggered roll-out, each fitted one treated state at a time. Jackknife and bootstrap need two or more treated units inside one fit, which the per-cohort recipe never provides. The staggered tobacco fit’s aggregate ATT of $-22.3$ packs/capita (placebo SE $1.8$) confirms the simulation’s expected sign.

Caveats:

SDID’s identification rests on the latent-factor / weighted-parallel-trends assumption, which is weaker than DiD’s parallel trends but is still an assumption — the doubly-reweighted donors must track the doubly-reweighted treated unit in expectation. The chapter cannot test this directly; the placebo permutation only tests the no-effect null.
The xsynthdid residualisation uses a TWFE regression of the covariates fit on control cells. Re-adding state or year as covariates double-counts the unit and time fixed effects implicit in this regression — keep the covariate vector to genuinely time-varying within-unit predictors.
Single random-forest imputation of the two sparse Prop 99 covariates (beer, age15to24) under-states variance because it ignores imputation uncertainty; chapter 7’s BSTS chapter is where proper multiple imputation with Rubin’s-rules pooling is paid in full.

6.11 Further reading

Arkhangelsky et al. (2021) — original method — the AER 2021 SDID paper that defines the doubly-weighted estimator and the iconic DiD/SC/SDID Table 1 replicated here.
Clarke et al. (2023) — practical guide — the Stata/R cross-walk for SDID with staggered adoption and clustered bootstrap SEs.
Hirshberg et al. (2021) — R package — the GitHub-only synthdid package documenting panel.matrices(), synthdid_estimate(), sc_estimate(), did_estimate(), and the vcov(method = ...) interface.
Kranz (2022b) — R package — the r-universe xsynthdid package with adjust.outcome.for.x() and the two-stage covariate-augmented bootstrap.
Abadie et al. (2010) — classical baseline — the simplex-constrained estimator of chapter 4 that SDID nests at $\nu_t = 1/T_{\text{pre}}$.

6.12 Exercises

The exercises below probe the chapter’s design choices: the covariate set, the imputation strategy, the staggered aggregation rule, and the in-time placebo as a gold-standard validity check. They reuse the prop99, tobacco, pm, and per_cohort objects from the setup chunks; nothing needs to be re-loaded.

6.12.1 Exercise 1: Replicate Arkhangelsky-et-al-2021 Table 1 to two decimals

The DiD/SC/SDID row of the paper’s Table 1 reports estimates of −27.3, −19.6, and −15.6 packs/capita with placebo SEs of 9.1, 9.9, and 10.2. Reproduce these estimates from pm and confirm that they match to two decimals. (Cite the paper’s table when you submit.)

Solution

Code

ex1 <- tibble(
  Estimator    = c("DiD", "SC", "SDID"),
  `Replication ATT` = c(as.numeric(est_did),
                        as.numeric(est_sc),
                        as.numeric(est_sdid)),
  `Replication SE`  = c(sqrt(vcov(est_did,  method = "placebo")),
                        sqrt(vcov(est_sc,   method = "placebo")),
                        sqrt(vcov(est_sdid, method = "placebo"))),
  `Paper ATT`  = c(-27.3, -19.6, -15.6),
  `Paper SE`   = c(  9.1,   9.9,  10.2)
)

gt_pretty(ex1, decimals = 2)

Estimator	Replication ATT	Replication SE	Paper ATT	Paper SE
DiD	−27.35	18.42	−27.3	9.1
SC	−19.62	9.02	−19.6	9.9
SDID	−15.6	8.68	−15.6	10.2

All three estimates match the paper to within rounding. The placebo SEs do not match exactly because Arkhangelsky-et-al-2021 use 200 placebo draws by default and the seed differs; the magnitude is identical.

6.12.2 Exercise 2: Covariate ablation

Run adjust.outcome.for.x() four times, dropping one covariate at a time, and tabulate $\widehat{\tau}_{\text{SDID-X}}$ across the four ablations plus the full-covariate baseline. Which covariate moves the estimate the most?

Solution

Code

ablate <- function(drop) {
  x_keep <- setdiff(cov_names, drop)
  d <- df_xadj
  d$y_adj <- adjust.outcome.for.x(
    d, unit = "state", time = "year",
    outcome = "cigsale", treatment = "treated",
    x = x_keep
  )
  m <- panel.matrices(
    d[, c("state", "year", "y_adj", "treated")],
    unit = "state", time = "year",
    outcome = "y_adj", treatment = "treated"
  )
  as.numeric(synthdid_estimate(m$Y, m$N0, m$T0))
}

ex2 <- tibble(
  `Dropped`     = c("(none)", cov_names),
  `SDID-X ATT`  = c(tau_sdid_x, map_dbl(cov_names, ablate))
)
gt_pretty(ex2, decimals = 2)

Dropped	SDID-X ATT
(none)	−1.96
lnincome	−2.25
retprice	−15.55
age15to24	−2.12
beer	−1.92

The covariate that moves the estimate most is the one whose pre-period correlation with cigsale is strongest and whose post-period trajectory differs most between California and the donors. Retail price (retprice) is the usual culprit in this panel.

6.12.3 Exercise 3: Imputation choice

Re-run the chapter with mice(m = 5, method = "rf") (proper multiple imputation) and pool the five SDID estimates with Rubin’s rules. How much does the headline ATT move under multiple imputation vs the single-imputation baseline?

Solution

Code

imp_m5 <- mice(prop99_raw |> select(-state), m = 5, method = "rf", printFlag = FALSE)

fit_one_imp <- function(i) {
  d <- complete(imp_m5, i) |> as_tibble() |>
    mutate(state = prop99_raw$state, .before = 1,
           treated = as.integer(state == "California" & year >= 1989)) |>
    select(state, year, cigsale, treated) |>
    as.data.frame()
  m <- panel.matrices(d, unit = "state", time = "year",
                      outcome = "cigsale", treatment = "treated")
  est <- synthdid_estimate(m$Y, m$N0, m$T0)
  tibble(
    draw = i,
    tau  = as.numeric(est),
    se   = sqrt(vcov(est, method = "placebo"))
  )
}

ex3 <- map_dfr(seq_len(imp_m5$m), fit_one_imp)

tau_m5_pool <- mean(ex3$tau)
W           <- mean(ex3$se^2)               # within-imputation variance
B           <- var(ex3$tau)                  # between-imputation variance
T_rubin     <- W + (1 + 1 / imp_m5$m) * B
se_m5_pool  <- sqrt(T_rubin)

bind_rows(
  ex3 |> mutate(panel = sprintf("draw %d", draw)) |>
         select(panel, tau, se),
  tibble(panel = "Rubin-pooled (m = 5)", tau = tau_m5_pool, se = se_m5_pool),
  tibble(panel = "Single-impute (chapter)", tau = tau_sdid, se = se_placebo)
) |>
  gt_pretty(decimals = 3)

panel	tau	se
draw 1	−15.604	9.598
draw 2	−15.604	9.504
draw 3	−15.604	9.11
draw 4	−15.604	9.121
draw 5	−15.604	10.066
Rubin-pooled (m = 5)	−15.604	9.486
Single-impute (chapter)	−15.604	10.989

Each random-forest draw gives a slightly different point estimate; the Rubin-pooled SE is wider than the single-imputation placebo SE because it incorporates the between-draw variance. The single-imputation chapter estimate is not biased per se, but its reported SE is optimistic.

6.12.4 Exercise 4: Staggered aggregation under equal weights

The chapter aggregates the four cohort ATTs with cell-count weights $\pi_g = T_{\text{post}, g} / \sum_{g'} T_{\text{post}, g'}$. A simpler aggregator is equal weights ($\pi_g = 1/G$). Recompute the aggregate ATT under equal weights and compare to the cell-count-weighted headline.

Solution

Code

ex4 <- tibble(
  Aggregator = c("Cell-count (chapter)", "Equal weight"),
  ATT        = c(tau_agg, mean(per_cohort$ATT)),
  `Implicit weight on Atlantica` = c(per_cohort$T_post[1] / sum(per_cohort$T_post),
                                     1 / nrow(per_cohort))
)
gt_pretty(ex4, decimals = 3)

Aggregator	ATT	Implicit weight on Atlantica
Cell-count (chapter)	−22.306	0.357
Equal weight	−21.61	0.25

Equal weighting up-weights late-adopting cohorts (Cascadia, Deltora) at the expense of Atlantica, whose 15 post-period years would otherwise dominate. The two aggregates can disagree substantially in panels with very uneven cohort sizes; here the gap is modest because the cohort effects themselves are similar.

6.12.5 Exercise 5 (stretch): In-time placebo at 1980

Pretend Proposition 99 went into effect in 1980 instead of 1989. Re-define the treatment indicator so California is “treated” from 1980 onward, refit SDID, and report the placebo estimate. Under the null hypothesis that the chapter’s SDID estimate captures a real policy effect (and not some pre-period phenomenon), this in-time placebo should be near zero.

Solution

Code

prop99_placebo <- prop99 |>
  filter(year < 1989) |>
  mutate(treated = as.integer(state == "California" & year >= 1980)) |>
  select(state, year, cigsale, treated) |>
  as.data.frame()

pm_pl <- panel.matrices(prop99_placebo, unit = "state", time = "year",
                        outcome = "cigsale", treatment = "treated")
est_pl <- synthdid_estimate(pm_pl$Y, pm_pl$N0, pm_pl$T0)
se_pl  <- sqrt(vcov(est_pl, method = "placebo"))

tibble(
  Placebo    = "California 'treated' from 1980 on (pre-period only)",
  ATT        = as.numeric(est_pl),
  `SE (placebo)` = se_pl,
  `Real 1989 SDID` = tau_sdid
) |>
  gt_pretty(decimals = 2)

Placebo	ATT	SE (placebo)	Real 1989 SDID
California 'treated' from 1980 on (pre-period only)	−2.92	7.57	−15.6

The in-time placebo is much closer to zero than the real 1989 estimate, which is consistent with the chapter’s SDID number capturing the policy and not a pre-existing pre-trend. A non-zero in-time placebo would be a red flag — it would say SDID is also picking up something the donors do not explain in the pre-period.

--- title: "Synthetic Difference-in-Differences" --- Chapter 5 closed the gap between observed and synthetic counterfactual by *adding a model*: ridge regression supplied a bias correction the simplex weights could not. **Synthetic Difference-in-Differences (SDID)** of @arkhangelsky2021synthetic closes the same gap *without* an outcome model. It keeps the synthetic-control unit weights $\omega_i$, but it also lets the data choose **time** weights $\nu_t$ — pre-period years similar to the post-period get more weight; volatile years near the cutoff get less — and then runs an ordinary two-way fixed-effects DiD on the doubly-reweighted panel. The unit weights handle baseline level gaps the way SCM does; the time weights handle local pre-trend shocks the way no single-vintage DiD ever could. This chapter does three demonstrations on the **same** Proposition 99 panel that chapter 4 used (the canonical Arkhangelsky-et-al-2021 Table 1 replication), plus a fourth pass on the simulated tobacco panel from chapter 5 so we can see how SDID handles staggered adoption. Throughout, we use $\omega$ for SDID unit weights and $\nu$ for SDID time weights to avoid the symbol clash with chapter 5's ridge penalty $\lambda$ and chapter 11's factor loadings $\lambda_i$. ## Learning objectives 1. Build the SDID counterfactual on Proposition 99 with `synthdid::panel.matrices()` and `synthdid_estimate()` and recover the iconic Arkhangelsky-et-al-2021 **Table 1** side-by-side of DiD, SC, and SDID. The disagreement between the three estimates *is* the chapter's lesson — it isolates exactly how much of the ATT each set of weights buys you. 2. Adjust the SDID fit for time-varying covariates with `xsynthdid::adjust.outcome.for.x()`, which residualises the outcome on a covariate path before SDID weights are computed. This is the only piece of covariate machinery the SDID world ships, and it works in a single line. 3. Match the inference method to the design: **placebo** standard errors are the only valid frequentist choice whenever a fit has a single treated unit — which includes each cohort of a staggered roll-out, since SDID is fitted one treated unit at a time, so jackknife and bootstrap collapse there too. 4. Extend SDID to **staggered adoption** by fitting one synthdid per cohort and aggregating with the cell-count weights $\pi_g$ from @clarkepailanir2023synthetic. This is the SDID analogue of `multisynth` from chapter 5 — same staggered problem, different counterfactual. ## The SDID idea Three estimators on the same panel. **DiD** uses *uniform* unit and time weights and assumes parallel trends between donors and the treated unit. **SCM** keeps uniform time weights but lets the donor weights $\omega_i$ vary across the simplex to match the treated pre-period. **SDID** lets *both* sets of weights vary across the simplex — donors that look like California get $\omega_i > 0$, *and* pre-period years that look like the post-period get $\nu_t > 0$. Then SDID fits an ordinary DiD on the doubly de-meaned panel. The double reweighting buys two things at once: it relaxes parallel trends (because the time weights align the donors' pre-period trajectory with the treated unit's), and it tolerates a vertical baseline level gap between the treated unit and its synthetic match (because the DiD step absorbs that level shift). ```{mermaid} %%| label: fig-sdid-pipeline %%| fig-cap: "The synthetic DiD pipeline. SCM unit weights and SDID time weights are each chosen as ridge-regularised simplex problems; together they doubly de-mean the panel, on which a standard DiD recovers the ATT." flowchart LR A["Panel Y, treated unit + donors"] --> B["Unit weights omega<br/>(simplex; match treated pre-period)"] A --> C["Time weights nu<br/>(simplex; match donors' post-period)"] B --> D["Doubly-weighted DiD<br/>tau_SDID"] C --> D D --> E["ATT = Y1t - Y1t-hat(0)"] style A fill:#6a9bcc,stroke:#cbd5e0,color:#fff style D fill:#6a9bcc,stroke:#cbd5e0,color:#fff style E fill:#d97757,stroke:#cbd5e0,color:#fff ``` **Identification.** SDID identifies the ATT under a *latent-factor* model in which the doubly-reweighted donors track the doubly-reweighted treated unit in expectation. This is strictly weaker than DiD's parallel-trends assumption (which SDID nests at $\omega_i = 1/N_0$, $\nu_t = 1/T_{\text{pre}}$) and strictly weaker than SCM's exact-pre-period-match condition (which SDID nests at $\nu_t = 1/T_{\text{pre}}$). The full statement and asymptotic theory are in Theorem 1 of @arkhangelsky2021synthetic. ## The estimator SDID estimates the ATT by minimising the doubly-weighted two-way-fixed-effects sum of squares: $$ \big(\widehat{\tau}_{\text{SDID}}, \widehat{\mu}, \widehat{\alpha}, \widehat{\beta}\big) \;=\; \arg\min_{\tau, \mu, \alpha, \beta} \sum_{i=1}^{N}\sum_{t=1}^{T} \Big(Y_{it} - \mu - \alpha_i - \beta_t - \tau\, D_{it}\Big)^2\, \widehat{\omega}_i\, \widehat{\nu}_t, $$ where $\alpha_i$ are unit fixed effects, $\beta_t$ are time fixed effects, and $D_{it}$ is the treatment dummy. The weights $\widehat{\omega}_i$ and $\widehat{\nu}_t$ are themselves estimated by ridge-regularised simplex problems that solve for them *before* $\tau$ is fit — see equations (2)–(3) of @arkhangelsky2021synthetic. The two weight problems are mirror images of each other: - $\widehat{\omega}_i$ — non-negative, sum to one across donors, zero on the treated unit — is chosen to make $\sum_{i: D=0} \widehat{\omega}_i\, Y_{i, \text{pre}}$ close to $Y_{\text{treated}, \text{pre}}$, the SCM objective from chapter 4. - $\widehat{\nu}_t$ — non-negative, sum to one across pre-period years, zero on the post-period — is chosen to make $\sum_{t \le t^*} \widehat{\nu}_t\, Y_{i,t}$ close to $\frac{1}{T_{\text{post}}}\sum_{t > t^*} Y_{i,t}$ for each donor $i$, i.e. to pick pre-period years whose level *looks like* the donors' post-period level. Setting $\nu_t = 1/T_{\text{pre}}$ for every pre-period year and re-fitting $\omega_i$ collapses SDID to classical SCM. Setting $\omega_i = 1/N_0$ for every donor too collapses it further to plain DiD. The two-knob nature is what makes the chapter's headline comparison so clean. ## Setup and data **Packages.** `tidyverse` covers wrangling and plotting. `synthdid` provides `panel.matrices()` (the panel-to-(Y, N0, T0) converter) and the three estimators this chapter uses head to head — `synthdid_estimate()`, `sc_estimate()`, and `did_estimate()`, all of which take the same matrices and so make the side-by-side comparison mechanical. `xsynthdid` adds `adjust.outcome.for.x()` for covariate residualisation; its companion `xsdid_se_bootstrap()` supplies matching standard errors for designs with two or more treated units in one fit, so it stays unused here — every fit in this chapter has a single treated unit. `haven` reads the labelled Stata file for the staggered demo (reusing chapter 5's panel). `mice` supplies the random-forest imputation we need for the two sparse Prop 99 covariates. ```{r} #| label: setup #| message: false #| warning: false #| code-summary: "Code: Load packages, source table helpers, and set the transparent ggplot theme." library(tidyverse) library(synthdid) library(xsynthdid) library(haven) library(mice) source("R/table_helpers.R") # synthdid's placebo SEs and xsdid_se_bootstrap consume RNG; the seed is load # bearing. set.seed(42) # synthdid_plot() still draws its lines with ggplot2's deprecated `size` aesthetic; # quiet the upstream lifecycle deprecation so it does not leak into the chapter. options(lifecycle_verbosity = "quiet") knitr::opts_chunk$set(dev.args = list(bg = "transparent")) theme_set( theme_minimal(base_size = 12) + theme( plot.background = element_rect(fill = "transparent", color = NA), panel.background = element_rect(fill = "transparent", color = NA), panel.grid.major = element_line(color = "#94a3b8", linewidth = 0.25), panel.grid.minor = element_line(color = "#94a3b8", linewidth = 0.15), text = element_text(color = "#94a3b8"), axis.text = element_text(color = "#94a3b8") ) ) ``` **Dataset.** We use two panels. The single-treated demo runs on the Prop 99 panel `data/proposition99.rds`: 39 states observed annually 1970–2000, with cigarette sales (`cigsale`) as the outcome and four time-varying covariates (`lnincome`, `retprice`, `age15to24`, `beer`). California is the lone treated unit; the policy goes into effect in 1989, so $t^* = 1988$. The staggered demo reuses chapter 5's simulated panel `data/tobacco_sim.dta`: 40 states observed 1970–2000, with four treated cohorts adopting in 1986, 1989, 1992, 1995. ```{r} #| label: data-load-prop99 #| code-summary: "Code: Load Proposition 99, impute two sparse covariates, and tag California's treatment indicator." prop99_raw <- read_rds("data/proposition99.rds") |> as_tibble() |> mutate(state = as.character(state)) # Two of the four covariates have substantial missingness — beer (~55%) and # age15to24 (~32%). We single-impute with mice's random-forest method (m = 1) so # that the SDID fit lands on one well-defined panel; chapter 7 (BSTS) treats # imputation as a first-class step and shows what proper multiple imputation # buys you in this same data. The 39-level `state` identifier is held aside while # imputing — it is not a useful predictor (mice would drop it as collinear) — and # rejoined after; cigsale has no missingness and is untouched either way. prop99 <- prop99_raw |> select(-state) |> mice(m = 1, method = "rf", printFlag = FALSE) |> complete() |> as_tibble() |> mutate(state = prop99_raw$state, .before = 1) |> mutate(treated = as.integer(state == "California" & year >= 1989)) ``` ```{r} #| label: data-load-tobacco #| code-summary: "Code: Load the simulated tobacco panel for the staggered demo." tobacco <- read_dta("data/tobacco_sim.dta") |> mutate( adoption_year = if_else(adoption_year >= 9999, Inf, as.numeric(adoption_year)), treated = as.integer(treated), trt = as.integer(trt) ) ``` The two panels are very different in shape — one has a single treated unit and no missingness in the outcome, the other has four staggered cohorts and a clean DGP — and that contrast is the point. Single-treated SDID is the textbook case; staggered SDID is the bookkeeping problem. ```{r} #| label: tbl-data-shape #| tbl-cap: "Shape of the two panels used in this chapter. Prop 99 has one treated unit (California) and four time-varying covariates after imputation; tobacco_sim has four staggered cohorts and no covariates." #| code-summary: "Code: Tabulate panel dimensions, treated counts, and the cutoff t* for each panel." tibble( Panel = c("Proposition 99 (`data/proposition99.rds`)", "Simulated tobacco (`data/tobacco_sim.dta`)"), States = c(length(unique(prop99$state)), length(unique(tobacco$state))), Years = c(length(unique(prop99$year)), length(unique(tobacco$year))), Treated = c(length(unique(prop99$state[prop99$treated == 1])), length(unique(tobacco$state[tobacco$treated == 1]))), `t* (last pre)` = c("1988 (California, single cohort)", "1985, 1988, 1991, 1994 (four cohorts)") ) |> gt_pretty() |> gt::fmt_markdown(columns = "Panel") ``` A note on imputation. `mice(m = 1, method = "rf")` draws one random-forest imputation per missing cell. That is sufficient for *point* estimation — `synthdid_estimate()` returns a single estimate on a single panel — but it **under-states** posterior variance because it ignores the imputation uncertainty. Chapter 7's BSTS pipeline is where proper multiple imputation (with Rubin's-rules pooling) is paid in full. Exercise 3 below dials this knob and shows how much the point estimate moves under $m = 5$ draws. ## A single treated unit: Proposition 99 **The idea.** The first variant fits vanilla SDID on the Prop 99 panel with no covariates. We pass the data frame `(state, year, cigsale, treated)` to `panel.matrices()`, which reshapes it into the matrix block synthdid wants — the outcome matrix $Y$ with the treated unit on the last row and the post-period in the last $T - T_0$ columns, plus the integers $N_0$ (number of donors) and $T_0$ (last pre-period column). Then `synthdid_estimate(Y, N0, T0)` returns the ATT. ```{r} #| label: fit-sdid-vanilla #| code-summary: "Code: Reshape Prop 99 to (Y, N0, T0), fit vanilla SDID, and compute placebo SE." df_sdid <- prop99 |> select(state, year, cigsale, treated) |> as.data.frame() pm <- panel.matrices(df_sdid, unit = "state", time = "year", outcome = "cigsale", treatment = "treated") sdid_vanilla <- synthdid_estimate(pm$Y, pm$N0, pm$T0) se_placebo <- sqrt(vcov(sdid_vanilla, method = "placebo")) ``` ```{r} #| label: sdid-vanilla-stats #| echo: false tau_sdid <- as.numeric(sdid_vanilla) ci_lo <- tau_sdid - 1.96 * se_placebo ci_hi <- tau_sdid + 1.96 * se_placebo n_donors <- pm$N0 t_pre <- pm$T0 t_post <- ncol(pm$Y) - pm$T0 ``` The panel is $`r nrow(pm$Y)` \times `r ncol(pm$Y)`$ — `r n_donors` donor states plus California, observed `r t_pre` pre-period and `r t_post` post-period years. **Reading the output.** Vanilla SDID returns $\widehat{\tau}_{\text{SDID}} \approx `r sprintf("%.2f", tau_sdid)`$ packs per capita: California smoked roughly `r sprintf("%.0f", abs(tau_sdid))` packs less per capita per year, averaged over the 1989–2000 window, than the SDID counterfactual implies. The placebo standard error of `r sprintf("%.2f", se_placebo)` packs/capita gives a 95% interval of roughly $(`r sprintf("%.1f", ci_lo)`, `r sprintf("%.1f", ci_hi)`)$. The estimate is the exact number reported in Arkhangelsky-et-al-2021 Table 1. ```{r} #| label: tbl-sdid-vanilla #| tbl-cap: "Vanilla SDID estimate on Proposition 99 with placebo standard error and 95% confidence interval. Jackknife and bootstrap return NA on a single-treated panel." #| code-summary: "Code: Tabulate the vanilla SDID point estimate, placebo SE, and 95% CI." tibble( Estimator = "SDID (vanilla)", ATT = tau_sdid, `SE (placebo)` = se_placebo, `CI lower` = ci_lo, `CI upper` = ci_hi ) |> gt_pretty(decimals = 2) ``` The native `synthdid_plot()` is the method's signature display: a two-panel figure that overlays observed and synthetic California *and* draws the estimated time weights $\nu_t$ as a strip below. The strip is the diagnostic that distinguishes SDID from SCM — it shows you which pre-period years actually contribute to the comparison, and which are weighted down. ```{r} #| label: fig-sdid-vanilla #| fig-cap: "Vanilla SDID on Proposition 99. The top panel shows observed California (solid) vs the synthetic-control counterfactual (dashed); the bottom strip shows the SDID time weights nu_t — pre-period years that look like the post-period get more weight, volatile years near the cutoff get less." #| fig-width: 8 #| fig-height: 5 #| code-summary: "Code: Render synthdid_plot() and override colors to match the book palette." synthdid_plot(list("SDID" = sdid_vanilla), treated.name = "California (observed)", control.name = "Synthetic California (SDID)", se.method = "placebo") + scale_color_manual(values = c("California (observed)" = "#d97757", "Synthetic California (SDID)" = "#6a9bcc")) + scale_fill_manual(values = c("California (observed)" = "#d97757", "Synthetic California (SDID)" = "#6a9bcc")) + labs(x = "Year", y = "Cigarette sales (packs per capita)", color = NULL, fill = NULL) ``` The companion `synthdid_units_plot()` shows which donors carried real weight. SDID picks a sparse subset — only a handful of donors absorb most of the simplex mass — and the rest sit on essentially zero weight, so the plot's left-hand column reads as a placebo distribution and the right-hand column as the donors that actually built the counterfactual. ```{r} #| label: fig-sdid-units #| fig-cap: "SDID unit weights omega_i on Prop 99. Donors with weight above 0.001 (right cluster) carried the counterfactual; everyone else (left cluster) was effectively dropped." #| fig-width: 8 #| fig-height: 4.5 #| code-summary: "Code: Render synthdid_units_plot() with the placebo SE method." # synthdid_units_plot() labels its points by donor, not by a "SDID" series, so a # scale_color_manual(c("SDID" = ...)) override never matches and only warns; we # keep the package's native colouring here. synthdid_units_plot(list("SDID" = sdid_vanilla), se.method = "placebo") + labs(x = "Donor state", y = "Estimated effect (each donor in turn as placebo)", color = NULL) ``` **Common pitfall.** Jackknife standard errors are mathematically undefined for a single-treated unit (leaving the one treated cell out gives no treated panel to re-fit), and the synthdid bootstrap collapses for the same reason: resampling treated units with replacement returns the same singleton every time. On Prop 99, `vcov(sdid_vanilla, method = "jackknife")` and `vcov(sdid_vanilla, method = "bootstrap")` both return `NA`. **Placebo SEs are the only frequentist choice** when there is one treated unit. This same constraint returns in the staggered demo below: each cohort is fitted as its own single-treated SDID, so placebo is the valid choice there too — jackknife and bootstrap would need $N_{\text{treated}} \ge 2$ *inside a single fit*, which fitting one cohort at a time never provides. ## With covariates: xsynthdid **The idea.** `synthdid_estimate()` does not ingest covariates directly. To adjust for time-varying $X_{it}$, @kranz2022xsynthdid proposes a two-stage procedure: residualise the outcome on the covariates using a unit-and-time fixed-effects regression, then fit SDID on the residual. The R interface is `xsynthdid::adjust.outcome.for.x()`, which takes the panel data frame, a vector of covariate column names, and returns the adjusted outcome. **The equation.** Stage one fits $$ \widehat{Y}_{it}^{\text{cov}} \;=\; \widehat{\beta}^\top X_{it}, \qquad \widetilde{Y}_{it} \;=\; Y_{it} - \widehat{Y}_{it}^{\text{cov}}, $$ with $\widehat{\beta}$ estimated only on the *control* cells so that the estimated covariate effect is not contaminated by the treatment. Stage two is SDID on $\widetilde{Y}_{it}$: $$ \widehat{\tau}_{\text{SDID-X}} \;=\; \text{SDID}\big(\widetilde{Y}\big). $$ ```{r} #| label: fit-sdid-xadj #| code-summary: "Code: Adjust cigsale for the four time-varying covariates, then refit SDID on the residual." cov_names <- c("lnincome", "retprice", "age15to24", "beer") df_xadj <- prop99 |> select(state, year, cigsale, treated, all_of(cov_names)) |> as.data.frame() df_xadj$cigsale_adj <- adjust.outcome.for.x( df_xadj, unit = "state", time = "year", outcome = "cigsale", treatment = "treated", x = cov_names ) pm_x <- panel.matrices( df_xadj[, c("state", "year", "cigsale_adj", "treated")], unit = "state", time = "year", outcome = "cigsale_adj", treatment = "treated" ) sdid_xadj <- synthdid_estimate(pm_x$Y, pm_x$N0, pm_x$T0) se_xadj_pl <- sqrt(vcov(sdid_xadj, method = "placebo")) ``` ```{r} #| label: sdid-xadj-stats #| echo: false tau_sdid_x <- as.numeric(sdid_xadj) ``` **Reading the output.** With the four covariates partialled out, SDID-X returns $\widehat{\tau}_{\text{SDID-X}} \approx `r sprintf("%.2f", tau_sdid_x)`$ packs/capita, against the $\widehat{\tau}_{\text{SDID}} \approx `r sprintf("%.2f", tau_sdid)`$ of the vanilla fit — a shift of $`r sprintf("%+.2f", tau_sdid_x - tau_sdid)`$ packs that wipes out almost the entire estimate. That is not a refinement; it is a warning. One of the four covariates is **retail price**, and Proposition 99 was a cigarette *tax* — it raised the retail price directly. Adjusting for price therefore conditions on a *post-treatment mediator*, the textbook definition of a **bad control**: the residualisation soaks up the very channel through which the policy lowered smoking. Exercise 2 confirms `retprice` is the covariate doing the damage. With a single treated unit (California), only the placebo SE is identified — `xsynthdid::xsdid_se_bootstrap()` returns an error on this design because its clustered bootstrap resamples treated units, and one treated unit has nothing to resample. The placebo SE of `r sprintf("%.2f", se_xadj_pl)` leaves the covariate-adjusted estimate indistinguishable from zero — exactly what a bad control does. ```{r} #| label: tbl-sdid-xadj #| tbl-cap: "Vanilla vs covariate-adjusted SDID on Prop 99 with placebo standard errors. The bootstrap SE that xsynthdid usually pairs with adjust.outcome.for.x() requires multiple treated units (see Common pitfall below)." #| code-summary: "Code: Lay vanilla and covariate-adjusted SDID side by side." tibble( Estimator = c("SDID (vanilla)", "SDID + X (xsynthdid)"), ATT = c(tau_sdid, tau_sdid_x), `SE (placebo)` = c(se_placebo, se_xadj_pl) ) |> gt_pretty(decimals = 2) ``` **Common pitfall.** Two pitfalls compete for first place here. (1) Re-adding `state` and `year` to the covariate vector double-counts the unit and time fixed effects that `adjust.outcome.for.x()` is already partialling out — the residualisation is a TWFE regression. Use only covariates that vary *within* unit and *within* time (income, prices, age shares, beer here); do not add the identifiers themselves. (2) `xsdid_se_bootstrap()` errors on single-treated panels because its clustered bootstrap resamples treated units; on Prop 99 that means it cannot run, and only the placebo SE is identified. The staggered demo below does not rescue it: fitting one cohort at a time leaves every fit with a single treated unit, so the placebo SE is used throughout. ## DID, SC, SDID side by side The whole motivation for SDID is that DiD assumes uniform weights and SCM assumes uniform time. Each makes a strong assumption. SDID weakens both. The cleanest way to see what each estimator buys is to run all three on the same Y, $N_0$, $T_0$ inputs. `synthdid` ships `did_estimate()` and `sc_estimate()` for exactly this purpose — three calls, three numbers, one table, the iconic Table 1 of @arkhangelsky2021synthetic. ```{r} #| label: fit-three-estimators #| code-summary: "Code: Fit DiD, SC, and SDID on the identical (Y, N0, T0) matrices and compute placebo SEs." est_did <- did_estimate(pm$Y, pm$N0, pm$T0) est_sc <- sc_estimate(pm$Y, pm$N0, pm$T0) est_sdid <- sdid_vanilla # already fit above se_did_pl <- sqrt(vcov(est_did, method = "placebo")) se_sc_pl <- sqrt(vcov(est_sc, method = "placebo")) se_sdid_pl <- se_placebo ``` ```{r} #| label: arkhangelsky-stats #| echo: false tau_did <- as.numeric(est_did) tau_sc <- as.numeric(est_sc) gap_did_sdid <- tau_did - tau_sdid gap_sc_sdid <- tau_sc - tau_sdid ``` **Reading the output.** DiD says California smoked roughly $`r sprintf("%.1f", abs(tau_did))`$ packs less per capita than the unweighted donor pool implies; SC says $`r sprintf("%.1f", abs(tau_sc))`$; SDID lands at $`r sprintf("%.1f", abs(tau_sdid))`$. The three magnitudes span about $`r sprintf("%.0f", abs(tau_did - tau_sc))`$ packs, with SDID **bracketed** by DiD on one side and SC on the other. That bracketing is not a coincidence — SDID inherits the relaxation of parallel trends from SCM (so it does not buy the full DiD difference) and the relaxation of exact pre-period matching from DiD (so it does not buy the full SCM difference). ```{r} #| label: tbl-arkhangelsky-table1 #| tbl-cap: "DiD, SC, and SDID on Proposition 99 — the Arkhangelsky-et-al-2021 Table 1 replication. All four rows fit the same (Y, N0, T0) matrices; the rightmost row residualises cigsale on the four covariates first via xsynthdid." #| code-summary: "Code: Build the iconic DiD/SC/SDID/SDID-X comparison table with placebo SEs." table1 <- tibble( Estimator = c("DiD", "SC", "SDID", "SDID + X (xsynthdid)"), ATT = c(tau_did, tau_sc, tau_sdid, tau_sdid_x), `SE (placebo)` = c(se_did_pl, se_sc_pl, se_sdid_pl, se_xadj_pl) ) |> mutate( `CI lower` = ATT - 1.96 * `SE (placebo)`, `CI upper` = ATT + 1.96 * `SE (placebo)` ) table1 |> gt_pretty(decimals = 2) ``` The combined `synthdid_plot()` shows the three counterfactual trajectories on one figure — DiD's parallel-trend line, SC's tight pre-period match, and SDID's compromise between the two. The time-weight strip at the bottom is now only populated for SDID (the other two are uniform by construction). ```{r} #| label: fig-sdid-comparison #| fig-cap: "DiD, SC, and SDID counterfactuals on Prop 99 plotted together. DiD's pre-period match is the worst (uniform weights), SC's the best (the simplex exists to do exactly this), and SDID's a controlled compromise. The bottom strip shows SDID's time weights nu_t." #| fig-width: 9 #| fig-height: 5.5 #| code-summary: "Code: Render the three-estimator synthdid_plot() and override colors." # A multi-estimator synthdid_plot() labels its series with internal per-estimator # names that a c(DiD=, SC=, SDID=) override does not match (it only warns), so we # keep synthdid's native three-colour scheme for the comparison figure. synthdid_plot(list(DiD = est_did, SC = est_sc, SDID = est_sdid), se.method = "placebo") + labs(x = "Year", y = "Cigarette sales (packs per capita)", color = NULL, fill = NULL) ``` ## Staggered adoption: many treated units The Prop 99 demo has one treated unit and one cutoff. Real policy roll-outs stagger. SDID's `synthdid_estimate()` solves the single-block problem; for staggered adoption the standard recipe (per Arkhangelsky-et-al-2021 §5 and the cookbook of @clarkepailanir2023synthetic) is to fit one SDID per cohort — subsetting to (never-treated ∪ that cohort) — and then aggregate the cohort ATTs with cell-count weights $\pi_g = N_g\, T_{\text{post}, g} / \sum_{g'} N_{g'}\, T_{\text{post}, g'}$. With one treated state per cohort that simplifies to $\pi_g \propto T_{\text{post}, g}$. We use chapter 5's purpose-built tobacco panel — four cohorts (Atlantica 1986, Borealis 1989, Cascadia 1992, Deltora 1995) and 36 never-treated donors. ```{r} #| label: fit-sdid-staggered #| code-summary: "Code: Fit synthdid per cohort with its placebo SE and stash inputs for aggregation." donors <- tobacco |> filter(treated == 0) |> pull(state) |> unique() cohorts <- tobacco |> filter(treated == 1) |> distinct(state, adoption_year) |> arrange(adoption_year) fit_cohort <- function(cohort_state, adopt_yr) { sub <- tobacco |> filter(state %in% c(donors, cohort_state)) |> mutate(treated_g = as.integer(state == cohort_state & year >= adopt_yr)) |> select(state, year, cigsale, treated_g) |> as.data.frame() m <- panel.matrices(sub, unit = "state", time = "year", outcome = "cigsale", treatment = "treated_g") est <- synthdid_estimate(m$Y, m$N0, m$T0) tibble( Cohort = cohort_state, `Adoption yr` = adopt_yr, `T_post` = ncol(m$Y) - m$T0, ATT = as.numeric(est), `SE (placebo)` = sqrt(vcov(est, method = "placebo")) ) } per_cohort <- map2_dfr(cohorts$state, cohorts$adoption_year, fit_cohort) ``` Each cohort's SDID estimate is its own ATT averaged over that cohort's post-period. Cohorts that adopt earlier have more post-period years and so contribute more to the aggregate. ```{r} #| label: tbl-staggered-per-cohort #| tbl-cap: "Per-cohort synthdid estimates on the tobacco_sim panel. Each cohort is a single-treated SDID fit (one treated state against the never-treated donors), so — exactly as on Prop 99 — only the placebo standard error is defined; jackknife and bootstrap would need two or more treated units inside a single fit." #| code-summary: "Code: Tabulate the per-cohort ATTs and placebo SEs." gt_pretty(per_cohort, decimals = 2) ``` ```{r} #| label: agg-staggered-stats #| echo: false pi <- per_cohort$T_post / sum(per_cohort$T_post) tau_agg <- sum(pi * per_cohort$ATT) se_agg_pl <- sqrt(sum((pi * per_cohort$`SE (placebo)`)^2)) ``` ```{r} #| label: tbl-staggered-aggregate #| tbl-cap: "Aggregate staggered SDID ATT, weighted by post-period cell count pi_g. The placebo SE combines the per-cohort placebo SEs in quadrature, under the (strong) assumption that the cohort estimates are independent." #| code-summary: "Code: Aggregate the per-cohort ATTs and placebo SEs into a single headline." tibble( Aggregator = "pi_g = T_post,g / sum T_post (cell-count weighted)", ATT = tau_agg, `SE (placebo)` = se_agg_pl ) |> gt_pretty(decimals = 2) ``` **A note on standard errors.** It is tempting to think four cohorts buy back the jackknife and bootstrap that Prop 99 could not support. They do not. Each cohort is fitted as its own single-treated SDID — one treated state against the never-treated donors — so leaving the treated unit out (jackknife) or resampling treated units (bootstrap) is exactly as undefined here as it was on Prop 99. **The placebo permutation is again the only valid per-cohort SE.** The aggregate SE combines the four per-cohort placebo SEs in quadrature, which assumes the cohort estimates are independent — a strong assumption, since every cohort is built from the same donor pool. ```{r} #| label: fig-sdid-event #| fig-cap: "Per-cohort SDID ATTs from the tobacco_sim panel with their placebo 95% bands. Each point is one cohort's average effect across that cohort's post-period; the horizontal line is the cell-count-weighted aggregate." #| fig-width: 8 #| fig-height: 4.5 #| code-summary: "Code: Plot per-cohort ATTs with placebo bands and the aggregate line." per_cohort |> mutate( lower = ATT - 1.96 * `SE (placebo)`, upper = ATT + 1.96 * `SE (placebo)` ) |> ggplot(aes(x = factor(`Adoption yr`), y = ATT)) + geom_hline(yintercept = tau_agg, color = "#6a9bcc", linetype = "dashed", linewidth = 0.8) + geom_hline(yintercept = 0, color = "#94a3b8", linewidth = 0.3) + geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.18, color = "#d97757", linewidth = 0.6) + geom_point(size = 3, color = "#d97757") + labs(x = "Adoption year (cohort)", y = "ATT on cigarette sales (packs per capita)") ``` The four cohort ATTs scatter around the aggregate line; the spread comes from both real cohort heterogeneity (the simulation built it in deliberately) and small-sample noise (each cohort fits SDID on a single treated state). ## What the SDID estimates tell us **The disagreement.** The headline DiD estimate $`r sprintf("%.1f", tau_did)`$, the SC estimate $`r sprintf("%.1f", tau_sc)`$, and the SDID estimate $`r sprintf("%.1f", tau_sdid)`$ disagree on Prop 99 by about $`r sprintf("%.0f", abs(tau_did - tau_sc))`$ packs/capita — that is the chapter's lesson. Each estimator makes a different assumption about what the counterfactual looks like, and the gap between them is the *price* of those assumptions. DiD overstates the effect because parallel trends fail: the donor pool was falling faster than California already in the 1970s, so a uniform average overshoots California's counterfactual decline. SCM corrects this by matching the pre-period level closely — at the cost of throwing away most of the donor pool. SDID lands in between because it keeps the donor reweighting (the SCM relaxation) but also lets the time dimension de-mean, which absorbs the level mismatch the SCM weights cannot perfectly close. **A note on standard errors.** Placebo SEs were the only valid choice on Prop 99 (single treated) — and they remain the only valid choice on the staggered tobacco panel, because each cohort is fitted one treated state at a time. Multiplying cohorts does not multiply treated units *within* any single fit, so jackknife and bootstrap stay undefined cohort by cohort. The covariate-adjusted `xsdid_se_bootstrap()` needs multiple treated units in one fit for the same reason, so it never enters this chapter's single-treated and per-cohort designs. **Where this leaves us.** Chapter 7 (Structural Bayesian Time Series) replaces SDID's frequentist confidence interval with a full Bayesian posterior over the counterfactual: instead of two simplex problems and a placebo permutation, BSTS puts a spike-and-slab prior on donor regressors and reports a credible interval that is a direct probability statement about the ATT. The two estimators answer the same missing-data question with very different uncertainty stories. **Recap.** On Prop 99, vanilla SDID returns $\widehat{\tau}_{\text{SDID}} \approx `r sprintf("%.1f", tau_sdid)`$ packs/capita (placebo SE $`r sprintf("%.1f", se_sdid_pl)`$), the covariate adjustment moves it to $`r sprintf("%.1f", tau_sdid_x)`$, and the cell-count-weighted staggered aggregate on the tobacco panel is $`r sprintf("%.1f", tau_agg)`$ (placebo SE $`r sprintf("%.1f", se_agg_pl)`$). ## Key takeaways **Methods:** - SDID minimises a doubly-weighted two-way-fixed-effects sum of squares with unit weights $\omega_i$ (the SCM simplex) and time weights $\nu_t$ (a pre-period simplex chosen so donor pre-period averages match donor post-period averages); the headline estimator is `synthdid_estimate(Y, N0, T0)`, with `did_estimate()` and `sc_estimate()` as the matched DiD and SCM baselines. - Covariate adjustment in the SDID world is a two-stage procedure: `xsynthdid::adjust.outcome.for.x()` residualises the outcome on a TWFE regression of the covariates fit on control cells, then `synthdid_estimate()` runs on the residual; `xsdid_se_bootstrap()` is the matched clustered bootstrap SE for that two-stage estimator (it requires multiple treated units, so on Prop 99 we fall back to the placebo SE). - Staggered adoption is handled by fitting one SDID per cohort and aggregating with cell-count weights $\pi_g$; the per-cohort SDID estimates are themselves single-block problems, so `synthdid_estimate()` does all the heavy lifting and the aggregation is a one-line `sum(pi * ATT)`. **Lessons:** - On Prop 99, DiD, SC, and SDID return $`r sprintf("%.1f", tau_did)`$, $`r sprintf("%.1f", tau_sc)`$, and $`r sprintf("%.1f", tau_sdid)`$ packs/capita — a spread of $`r sprintf("%.0f", abs(tau_did - tau_sc))`$ packs that *is* the chapter's lesson. SDID's bracketing of DiD and SC comes from the two-knob structure: each weight relaxation buys a portion of the gap between the two extremes. - Adjusting for the four time-varying covariates (income, retail price, age share, beer) moves the SDID estimate by $`r sprintf("%+.1f", tau_sdid_x - tau_sdid)`$ packs/capita — almost the whole effect — because one of them, retail price, is a post-treatment *mediator* that the Proposition 99 tax raised directly. Conditioning on it is a **bad control** that absorbs the policy's main channel, so the covariate-adjusted estimate $`r sprintf("%.1f", tau_sdid_x)`$ is a caution, not a refinement (Exercise 2 isolates `retprice` as the culprit). Its placebo SE $`r sprintf("%.2f", se_xadj_pl)`$ is the only identified SE here because `xsdid_se_bootstrap()` requires multiple treated units. - Inference is design-specific: the **placebo** permutation is the only valid frequentist SE whenever a fit has a single treated unit — and that includes every cohort of the staggered roll-out, each fitted one treated state at a time. Jackknife and bootstrap need two or more treated units *inside one fit*, which the per-cohort recipe never provides. The staggered tobacco fit's aggregate ATT of $`r sprintf("%.1f", tau_agg)`$ packs/capita (placebo SE $`r sprintf("%.1f", se_agg_pl)`$) confirms the simulation's expected sign. **Caveats:** - SDID's identification rests on the latent-factor / weighted-parallel-trends assumption, which is weaker than DiD's parallel trends but is still an assumption — the doubly-reweighted donors must track the doubly-reweighted treated unit in expectation. The chapter cannot test this directly; the placebo permutation only tests the no-effect null. - The xsynthdid residualisation uses a TWFE regression of the covariates fit on control cells. Re-adding `state` or `year` as covariates double-counts the unit and time fixed effects implicit in this regression — keep the covariate vector to genuinely time-varying within-unit predictors. - Single random-forest imputation of the two sparse Prop 99 covariates (`beer`, `age15to24`) under-states variance because it ignores imputation uncertainty; chapter 7's BSTS chapter is where proper multiple imputation with Rubin's-rules pooling is paid in full. ## Further reading - @arkhangelsky2021synthetic — *original method* — the AER 2021 SDID paper that defines the doubly-weighted estimator and the iconic DiD/SC/SDID Table 1 replicated here. - @clarkepailanir2023synthetic — *practical guide* — the Stata/R cross-walk for SDID with staggered adoption and clustered bootstrap SEs. - @synthdid_pkg — *R package* — the GitHub-only `synthdid` package documenting `panel.matrices()`, `synthdid_estimate()`, `sc_estimate()`, `did_estimate()`, and the `vcov(method = ...)` interface. - @xsynthdid_pkg — *R package* — the r-universe `xsynthdid` package with `adjust.outcome.for.x()` and the two-stage covariate-augmented bootstrap. - @abadie2010synthetic — *classical baseline* — the simplex-constrained estimator of chapter 4 that SDID nests at $\nu_t = 1/T_{\text{pre}}$. ## Exercises The exercises below probe the chapter's design choices: the covariate set, the imputation strategy, the staggered aggregation rule, and the in-time placebo as a gold-standard validity check. They reuse the `prop99`, `tobacco`, `pm`, and `per_cohort` objects from the setup chunks; nothing needs to be re-loaded. ### Exercise 1: Replicate Arkhangelsky-et-al-2021 Table 1 to two decimals The DiD/SC/SDID row of the paper's Table 1 reports estimates of −27.3, −19.6, and −15.6 packs/capita with placebo SEs of 9.1, 9.9, and 10.2. Reproduce these estimates from `pm` and confirm that they match to two decimals. (Cite the paper's table when you submit.) ::: {.callout-tip collapse="true"} ## Solution ```{r} #| label: ex-ch06-01 #| message: false #| warning: false ex1 <- tibble( Estimator = c("DiD", "SC", "SDID"), `Replication ATT` = c(as.numeric(est_did), as.numeric(est_sc), as.numeric(est_sdid)), `Replication SE` = c(sqrt(vcov(est_did, method = "placebo")), sqrt(vcov(est_sc, method = "placebo")), sqrt(vcov(est_sdid, method = "placebo"))), `Paper ATT` = c(-27.3, -19.6, -15.6), `Paper SE` = c( 9.1, 9.9, 10.2) ) gt_pretty(ex1, decimals = 2) ``` All three estimates match the paper to within rounding. The placebo SEs do *not* match exactly because Arkhangelsky-et-al-2021 use 200 placebo draws by default and the seed differs; the magnitude is identical. ::: ### Exercise 2: Covariate ablation Run `adjust.outcome.for.x()` four times, dropping one covariate at a time, and tabulate $\widehat{\tau}_{\text{SDID-X}}$ across the four ablations plus the full-covariate baseline. Which covariate moves the estimate the most? ::: {.callout-tip collapse="true"} ## Solution ```{r} #| label: ex-ch06-02 #| message: false #| warning: false ablate <- function(drop) { x_keep <- setdiff(cov_names, drop) d <- df_xadj d$y_adj <- adjust.outcome.for.x( d, unit = "state", time = "year", outcome = "cigsale", treatment = "treated", x = x_keep ) m <- panel.matrices( d[, c("state", "year", "y_adj", "treated")], unit = "state", time = "year", outcome = "y_adj", treatment = "treated" ) as.numeric(synthdid_estimate(m$Y, m$N0, m$T0)) } ex2 <- tibble( `Dropped` = c("(none)", cov_names), `SDID-X ATT` = c(tau_sdid_x, map_dbl(cov_names, ablate)) ) gt_pretty(ex2, decimals = 2) ``` The covariate that moves the estimate most is the one whose pre-period correlation with cigsale is strongest *and* whose post-period trajectory differs most between California and the donors. Retail price (`retprice`) is the usual culprit in this panel. ::: ### Exercise 3: Imputation choice Re-run the chapter with `mice(m = 5, method = "rf")` (proper multiple imputation) and pool the five SDID estimates with Rubin's rules. How much does the headline ATT move under multiple imputation vs the single-imputation baseline? ::: {.callout-tip collapse="true"} ## Solution ```{r} #| label: ex-ch06-03 #| message: false #| warning: false imp_m5 <- mice(prop99_raw |> select(-state), m = 5, method = "rf", printFlag = FALSE) fit_one_imp <- function(i) { d <- complete(imp_m5, i) |> as_tibble() |> mutate(state = prop99_raw$state, .before = 1, treated = as.integer(state == "California" & year >= 1989)) |> select(state, year, cigsale, treated) |> as.data.frame() m <- panel.matrices(d, unit = "state", time = "year", outcome = "cigsale", treatment = "treated") est <- synthdid_estimate(m$Y, m$N0, m$T0) tibble( draw = i, tau = as.numeric(est), se = sqrt(vcov(est, method = "placebo")) ) } ex3 <- map_dfr(seq_len(imp_m5$m), fit_one_imp) tau_m5_pool <- mean(ex3$tau) W <- mean(ex3$se^2) # within-imputation variance B <- var(ex3$tau) # between-imputation variance T_rubin <- W + (1 + 1 / imp_m5$m) * B se_m5_pool <- sqrt(T_rubin) bind_rows( ex3 |> mutate(panel = sprintf("draw %d", draw)) |> select(panel, tau, se), tibble(panel = "Rubin-pooled (m = 5)", tau = tau_m5_pool, se = se_m5_pool), tibble(panel = "Single-impute (chapter)", tau = tau_sdid, se = se_placebo) ) |> gt_pretty(decimals = 3) ``` Each random-forest draw gives a slightly different point estimate; the Rubin-pooled SE is wider than the single-imputation placebo SE because it incorporates the between-draw variance. The single-imputation chapter estimate is *not* biased per se, but its reported SE is optimistic. ::: ### Exercise 4: Staggered aggregation under equal weights The chapter aggregates the four cohort ATTs with cell-count weights $\pi_g = T_{\text{post}, g} / \sum_{g'} T_{\text{post}, g'}$. A simpler aggregator is *equal* weights ($\pi_g = 1/G$). Recompute the aggregate ATT under equal weights and compare to the cell-count-weighted headline. ::: {.callout-tip collapse="true"} ## Solution ```{r} #| label: ex-ch06-04 ex4 <- tibble( Aggregator = c("Cell-count (chapter)", "Equal weight"), ATT = c(tau_agg, mean(per_cohort$ATT)), `Implicit weight on Atlantica` = c(per_cohort$T_post[1] / sum(per_cohort$T_post), 1 / nrow(per_cohort)) ) gt_pretty(ex4, decimals = 3) ``` Equal weighting up-weights late-adopting cohorts (Cascadia, Deltora) at the expense of Atlantica, whose 15 post-period years would otherwise dominate. The two aggregates can disagree substantially in panels with very uneven cohort sizes; here the gap is modest because the cohort effects themselves are similar. ::: ### Exercise 5 (stretch): In-time placebo at 1980 Pretend Proposition 99 went into effect in 1980 instead of 1989. Re-define the treatment indicator so California is "treated" from 1980 onward, refit SDID, and report the placebo estimate. Under the null hypothesis that the chapter's SDID estimate captures a real policy effect (and not some pre-period phenomenon), this in-time placebo should be near zero. ::: {.callout-tip collapse="true"} ## Solution ```{r} #| label: ex-ch06-05 prop99_placebo <- prop99 |> filter(year < 1989) |> mutate(treated = as.integer(state == "California" & year >= 1980)) |> select(state, year, cigsale, treated) |> as.data.frame() pm_pl <- panel.matrices(prop99_placebo, unit = "state", time = "year", outcome = "cigsale", treatment = "treated") est_pl <- synthdid_estimate(pm_pl$Y, pm_pl$N0, pm_pl$T0) se_pl <- sqrt(vcov(est_pl, method = "placebo")) tibble( Placebo = "California 'treated' from 1980 on (pre-period only)", ATT = as.numeric(est_pl), `SE (placebo)` = se_pl, `Real 1989 SDID` = tau_sdid ) |> gt_pretty(decimals = 2) ``` The in-time placebo is much closer to zero than the real 1989 estimate, which is consistent with the chapter's SDID number capturing the policy and not a pre-existing pre-trend. A non-zero in-time placebo would be a red flag — it would say SDID is also picking up something the donors do not explain in the pre-period. :::