Chapter 08 — Case Studies for Bivariate Regression

Does every extra dollar of health spending buy you more life? Or does the U.S. prove that spending and outcomes can come apart?

A coefficient can be statistically significant yet economically modest — or economically large yet imprecise. Economic significance asks whether the effect is large enough to matter for policy; statistical significance asks whether it is different from zero. In cross-country regressions, always interpret both, and use heteroskedasticity-robust standard errors — richer countries show more variation, making robust SEs essential for valid inference (KC 8.2).

[Interactive widget. Controls: Outcome variable, Show residuals. Outputs: Intercept (b1), Slope (b2), se(b2), t-statistic, R-squared.]

Try this

Start with Life Expectancy. R² ≈ 0.32 and each extra $1,000 of spending predicts about 1.1 more years of life. Spending matters, but it explains only a third of the cross-country variation.
Switch to Infant Mortality. The slope flips negative — more spending, fewer infant deaths. The same predictor, the same sample, but a totally different lens on health.
Toggle residuals on for both outcomes. The U.S. has the largest residual in each model. A persistent anomaly: the U.S. underperforms what spending alone would predict, regardless of which outcome you use.

Take-away: In cross-country regressions, report both the magnitude and the precision of the effect — and always use robust SEs to handle heteroskedasticity. Read §8.1 in the chapter →
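To make the robust-SE advice concrete, here is a minimal, dependency-free sketch of a bivariate OLS fit that reports both the classical and the HC1 standard error of the slope. The function name `ols_bivariate` and the eight data points are made up for illustration (they are not the OECD sample), and HC1 is assumed to mean the White estimator scaled by n/(n − 2).

```python
import math

def ols_bivariate(x, y):
    """Fit y = b1 + b2*x by OLS; return estimates plus classical and HC1 slope SEs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    resid = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

    # Classical SE: assumes one common error variance for every observation.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # HC1 SE: each observation contributes its own squared residual,
    # weighted by its squared distance from the mean of x.
    meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    se_hc1 = math.sqrt(n / (n - 2) * meat / sxx ** 2)

    return {"b1": b1, "b2": b2, "se_classical": se_classical,
            "se_hc1": se_hc1, "t_hc1": b2 / se_hc1}

# Made-up illustration data: the scatter widens as x grows.
spending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
outcome = [1.1, 1.9, 3.2, 3.7, 5.6, 5.2, 8.1, 6.6]

fit = ols_bivariate(spending, outcome)
print(f"slope = {fit['b2']:.3f}")
print(f"classical se = {fit['se_classical']:.3f}, HC1 se = {fit['se_hc1']:.3f}")
print(f"robust t-statistic = {fit['t_hc1']:.2f}")
```

The slope is identical under both approaches; only the standard error, and therefore the t-statistic, changes. When every squared residual is the same size, the two standard errors coincide.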

Richer countries spend more on health — but does spending rise proportionally with income, or faster? The answer decides whether health care is a luxury or a necessity.

Income elasticity of demand measures how spending changes with income. An elasticity near 1.0 means spending rises roughly in proportion to income: a 1% increase in GDP is associated with roughly a 1% increase in health spending. Health care is then a normal good (positive elasticity) that is neither a clear luxury (elasticity > 1) nor a clear necessity (elasticity < 1) in cross-country data.

[Interactive widget. Controls: Sample, Highlight outliers. Outputs: Slope (b2), se(b2), R-squared, Elasticity at mean.]

Try this

Start with All 34 Countries. R² ≈ 0.60 and slope ≈ 0.09. GDP explains 60% of health-spending variation, but two countries pull the fit down.
Switch to Exclude USA & LUX. R² jumps to ~0.93 and the slope rises to 0.13. Removing two of 34 countries transforms the fit — influence matters far more than sample size here.
Turn highlights on and compare the two fitted lines. Both flag the U.S. and Luxembourg as far from the trend. The 32-country line is the "typical OECD" pattern; the 34-country line mixes it with two exceptional cases.

Take-away: With income elasticity ≈ 1 in the "typical" OECD subset, health care behaves as a normal good — outliers distort the slope but don't overturn the elasticity story. Read §8.2 in the chapter →
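The "Elasticity at mean" readout can be sketched as the fitted slope rescaled by the sample means, slope × mean(x) / mean(y). The numbers below are hypothetical, chosen only to show that the same formula returns 1 when spending is exactly proportional to income and something below 1 when there is a fixed spending floor.

```python
def elasticity_at_mean(x, y):
    """OLS slope of y on x, rescaled to an elasticity at the sample means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope * x_bar / y_bar

# Hypothetical GDP-per-capita values (not the OECD data).
gdp = [20000.0, 30000.0, 40000.0, 50000.0]

# Case 1: spending is always 10% of GDP, so it scales one-for-one in percent terms.
proportional = [0.10 * g for g in gdp]

# Case 2: a fixed floor plus 5% of GDP, so spending grows less than proportionally.
with_floor = [1000.0 + 0.05 * g for g in gdp]

e_prop = elasticity_at_mean(gdp, proportional)
e_floor = elasticity_at_mean(gdp, with_floor)
print(f"proportional spending: elasticity = {e_prop:.2f}")
print(f"floor plus share:      elasticity = {e_floor:.2f}")
```

A slope of 0.09 or 0.13 says little on its own; it is the rescaling by the means that turns it into a unit-free elasticity.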

How fragile is your regression? If dropping one country flips the slope or doubles the R², your story isn't about the data — it's about that country.

A few extreme observations can dramatically alter regression results. The outlier workflow has four steps: (1) identify outliers visually; (2) assess their influence on coefficients; (3) test robustness by re-running without them; (4) interpret results in context. Never silently drop outliers — disclose and justify any exclusion.

[Interactive widget. Controls: Countries to drop (by |residual|), default 0. Outputs: Slope (b2), R-squared, Dropped, Remaining n.]

Try this

Start at 0 drops. R² ≈ 0.60, slope ≈ 0.09. This is the baseline story on all 34 countries.
Slide to 2. The USA and Luxembourg (the two largest residuals) are removed and R² jumps to ~0.93. Two of 34 countries carry most of the model's "misfit" — dropping them tightens the relationship dramatically.
Slide up to 5 or 6. R² keeps creeping up but the marginal gain flattens. After the first few influential points, further exclusion is just chasing natural scatter — time to stop.

Take-away: Influence analysis shows which observations drive your results — report both with-and-without specifications so readers can judge robustness. Read §8.2 in the chapter →
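The drop-and-refit exercise in this section can be sketched in a few lines. This is one possible reading of the slider, assuming residuals are ranked once from the full-sample fit; the helper names `fit_line` and `drop_largest_residuals` and the eight labelled points are hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope, r_squared) for an OLS fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    ssr = sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b1, b2, 1 - ssr / sst

def drop_largest_residuals(labels, x, y, k):
    """Drop the k points with the largest |residual| in the full fit, then refit."""
    b1, b2, _ = fit_line(x, y)
    abs_resid = [abs(yi - (b1 + b2 * xi)) for xi, yi in zip(x, y)]
    order = sorted(range(len(x)), key=lambda i: abs_resid[i], reverse=True)
    dropped = order[:k]
    keep = [i for i in range(len(x)) if i not in dropped]
    refit = fit_line([x[i] for i in keep], [y[i] for i in keep])
    return [labels[i] for i in dropped], refit

# Made-up data: y = 2x exactly, except for two perturbed points (F and H).
labels = ["A", "B", "C", "D", "E", "F", "G", "H"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 20.0, 14.0, 10.0]

_, slope_all, r2_all = fit_line(x, y)
dropped, (_, slope_sub, r2_sub) = drop_largest_residuals(labels, x, y, k=2)

# Report both specifications side by side, and name what was excluded.
print(f"All points:        slope = {slope_all:.2f}, R² = {r2_all:.2f}")
print(f"Without {dropped}: slope = {slope_sub:.2f}, R² = {r2_sub:.2f}")
```

Printing the dropped labels next to both fits keeps the exclusion disclosed rather than silent.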

Some stocks amplify the market; others cushion you from it. One number — the CAPM beta — decides which camp each stock is in.

Beta measures how much a stock's returns co-move with the overall market; this is its systematic risk. Beta < 1 marks a "defensive" stock (it moves less than one-for-one with the market); beta > 1 marks an "aggressive" stock that amplifies market moves. Only systematic risk is priced in efficient markets; idiosyncratic risk diversifies away in portfolios (KC 8.6).

[Interactive widget. Controls: Stock, Show 45° line (beta = 1). Outputs: Alpha, Beta, se(beta), t(beta), R-squared.]

Try this

Start with Coca-Cola and toggle the 45° reference line on. Beta ≈ 0.61 and the fitted slope is visibly flatter than the reference. A defensive stock — it moves less than the market in both directions.
Switch to Target. Beta climbs above 1.0 and the scatter widens. Target amplifies market moves and carries more idiosyncratic risk than the two consumer staples.
Switch to Walmart. Beta lands between the other two: still defensive, like Coca-Cola, the other consumer-staple giant. Stable demand for everyday goods keeps beta low across both staples despite very different company histories.

Take-away: Beta separates stocks into defensive (< 1) and amplifying (> 1) — and in efficient markets only beta-risk is priced; diversifying away everything else costs nothing. Read §8.3 in the chapter →
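Beta is the OLS slope of a stock's excess return on the market's excess return, which equals their covariance divided by the market's variance. The sketch below computes it by hand and applies the defensive/amplifying split from this section. The return series are invented (they are not the Coca-Cola, Target, or Walmart data), and `capm_alpha_beta` and `classify_beta` are hypothetical helper names.

```python
def capm_alpha_beta(stock_excess, market_excess):
    """Return (alpha, beta) from regressing stock excess returns on market excess returns."""
    n = len(market_excess)
    m_bar = sum(market_excess) / n
    s_bar = sum(stock_excess) / n
    cov = sum((m - m_bar) * (s - s_bar)
              for m, s in zip(market_excess, stock_excess))
    var_m = sum((m - m_bar) ** 2 for m in market_excess)
    beta = cov / var_m                 # slope: co-movement with the market
    alpha = s_bar - beta * m_bar       # intercept: average return beyond beta * market
    return alpha, beta

def classify_beta(beta):
    """Label a stock using the beta = 1 reference line."""
    if beta < 1:
        return "defensive"
    if beta > 1:
        return "amplifying"
    return "market-like"

# Invented monthly excess returns, for illustration only.
market = [0.02, -0.01, 0.03, -0.04, 0.01]
steady_stock = [0.6 * m for m in market]            # moves 0.6-for-1 with the market
jumpy_stock = [0.001 + 1.3 * m for m in market]     # moves 1.3-for-1, plus a small alpha

for name, returns in [("steady", steady_stock), ("jumpy", jumpy_stock)]:
    alpha, beta = capm_alpha_beta(returns, market)
    print(f"{name}: alpha = {alpha:.4f}, beta = {beta:.2f} ({classify_beta(beta)})")
```

On a scatter of stock against market excess returns, the 45° reference line is exactly the beta = 1 boundary this classification uses.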

If CAPM fits, the residuals should look like pure noise. Do they? And which stock carries the most firm-specific surprise?

R² in CAPM measures the fraction of return variation explained by market movements; the remainder is idiosyncratic risk, which the residuals capture as firm-specific shocks. In efficient markets this portion diversifies away in a portfolio and earns no risk premium. A well-specified CAPM regression shows residuals randomly scattered around zero with no fan shape or curve.

[Interactive widget. Controls: Stock, Plot type. Outputs: Mean residual, SD residual, Max |residual|, R-squared.]

Try this

Pick Coca-Cola and view Residuals vs. Fitted. The cloud is roughly centered on zero with no fan. Homoskedasticity holds approximately — the CAPM assumptions don't obviously break for this stock.
Switch to Residual Histogram. The shape is roughly bell-like with fat tails. Most months are ordinary; a handful of large residuals mark firm-specific news events.
Cycle through the three stocks in histogram view. Target's residual SD is clearly largest. Target carries the most idiosyncratic risk — and thus the most that a diversified portfolio can wash away.

Take-away: What CAPM leaves unexplained is idiosyncratic risk — in efficient markets that's free to diversify away, so it carries no reward. Read §8.3 in the chapter →
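The split between market-driven and firm-specific variation can be checked numerically: with an intercept in the regression, the stock's return variance equals beta² times the market variance plus the residual variance, and R² is the first share. The sketch below uses invented returns and a hypothetical helper `capm_decomposition`; it divides by n throughout so the identity holds exactly, which may differ slightly from the divisor behind the "SD residual" readout.

```python
def capm_decomposition(stock, market):
    """Split a stock's return variance into systematic and idiosyncratic parts."""
    n = len(market)
    m_bar = sum(market) / n
    s_bar = sum(stock) / n
    var_m = sum((m - m_bar) ** 2 for m in market) / n
    var_s = sum((s - s_bar) ** 2 for s in stock) / n
    cov = sum((m - m_bar) * (s - s_bar) for m, s in zip(market, stock)) / n

    beta = cov / var_m
    alpha = s_bar - beta * m_bar
    resid = [s - (alpha + beta * m) for m, s in zip(market, stock)]

    systematic = beta ** 2 * var_m                   # variance explained by the market
    idiosyncratic = sum(e ** 2 for e in resid) / n   # variance left in the residuals
    return {
        "beta": beta,
        "total": var_s,
        "systematic": systematic,
        "idiosyncratic": idiosyncratic,
        "r2": systematic / var_s,
        "mean_resid": sum(resid) / n,
        "max_abs_resid": max(abs(e) for e in resid),
    }

# Invented monthly excess returns, for illustration only.
market = [0.02, -0.01, 0.03, -0.04, 0.01, 0.00]
stock = [0.015, -0.002, 0.012, -0.030, 0.011, -0.004]

d = capm_decomposition(stock, market)
print(f"beta = {d['beta']:.2f}, R² = {d['r2']:.2%}")
print(f"systematic share    = {d['systematic'] / d['total']:.2%}")
print(f"idiosyncratic share = {d['idiosyncratic'] / d['total']:.2%}")
print(f"mean residual = {d['mean_resid']:.2e}, max |residual| = {d['max_abs_resid']:.4f}")
```

The mean residual is zero by construction in any OLS fit with an intercept, so the informative diagnostics are the residual spread and shape, not the mean.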

Okun's 1962 rule of thumb says every 1-point jump in unemployment costs you two points of GDP growth. Does that still hold in the modern U.S. economy?

Okun's Law is a remarkably stable empirical regularity linking unemployment changes to GDP growth. The basic relationship holds across time periods and countries, but the exact coefficient drifts due to structural changes in labor markets, productivity trends, and institutional differences. Original Okun: slope ≈ −2.0. Our U.S. 1961–2019 estimate: slope ≈ −1.59.

[Interactive widget. Controls: Predict for URATE change (default 0.0), Show Okun's original (slope = -2.0). Outputs: Intercept (b1), Slope (b2), se(b2), R-squared, Predicted GDP growth.]

Try this

Set the prediction slider to +3 (unemployment rises 3 points). Predicted GDP growth drops to about −1.8%. That's a severe-recession forecast — consistent with what the U.S. saw in 2008–09 and 1982.
Toggle Okun's original slope = −2.0 on. The dashed reference is visibly steeper than the fitted line. The modern U.S. relationship is slightly weaker than Okun's 1962 original — structural changes in labor markets have softened the link.
Hover the points in the lower-right cluster. These are recession years: 2008–09, 1982, 1974–75. Big unemployment rises paired with negative GDP growth — the textbook Okun pattern made concrete.

Take-away: Okun's rule-of-thumb survives decades and countries, but not at exactly the same slope — the relationship's stability matters more than the coefficient's precise value. Read §8.4 in the chapter →
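The prediction slider is just the fitted line evaluated at a chosen unemployment change. The sketch below uses the slope quoted in this section (−1.59). The intercept is not stated on this page, so it is backed out from the "+3 points gives about −1.8%" example; treat that value as an approximation, not the estimated coefficient. Reusing the same intercept with Okun's −2.0 slope is purely illustrative.

```python
SLOPE_FITTED = -1.59      # from the text: U.S. 1961-2019 estimate
SLOPE_OKUN_1962 = -2.0    # from the text: Okun's original rule of thumb

# Assumption: intercept implied by the text's example (+3 points -> about -1.8%).
# The actual fitted intercept is not given on this page.
INTERCEPT_IMPLIED = -1.8 - SLOPE_FITTED * 3

def predict_growth(urate_change, intercept=INTERCEPT_IMPLIED, slope=SLOPE_FITTED):
    """Predicted real GDP growth (%) for a given change in the unemployment rate."""
    return intercept + slope * urate_change

for change in (-1.0, 0.0, 1.0, 3.0):
    fitted = predict_growth(change)
    original = predict_growth(change, slope=SLOPE_OKUN_1962)
    print(f"unemployment change {change:+.1f}: "
          f"fitted slope -> {fitted:+.2f}%, Okun's slope -> {original:+.2f}%")

# Unemployment change at which predicted growth is zero under the fitted slope.
break_even = -INTERCEPT_IMPLIED / SLOPE_FITTED
print(f"predicted growth hits zero at a {break_even:+.2f}-point unemployment change")
```

The steeper original slope produces a deeper predicted downturn for the same rise in unemployment, which is the visual gap between the dashed and fitted lines.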

Okun's Law is stable — but for how long at a stretch? Slide the window to find the years where the model quietly stops working.

Structural breaks shift long-run relationships when something in the economy changes. Policy changes, technological shifts, and economic crises can reshape coefficients mid-sample. Visual inspection of actual vs. predicted values over time identifies periods when the model weakens or strengthens. Persistent runs of positive or negative residuals are red flags for a break.

[Interactive widget. Controls: Start year (default 1961), End year (default 2019), Show. Outputs: Subperiod slope, Subperiod R², Years, Mean |residual|.]

Try this

Set the range to 1961–2007 and note R² and slope. Then extend to 1961–2019. The fit barely moves: the relationship estimated on 1961–2007 carries over almost unchanged to the full sample.
Set 2008–2019 only. The slope and R² drift from the full-sample numbers. The "jobless recovery" period left systematic prediction errors — a candidate structural break.
Toggle to Residuals. Scan for runs of positive or negative values, especially after 2008. Persistent runs of one sign are the visual signature of serial correlation or a structural break.

Take-away: Time-series regressions are not static truths — scanning actual-vs-predicted plots is the cheapest way to catch a model that has quietly stopped working. Read §8.4 in the chapter →
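Two of the checks in this section are easy to automate: re-estimating the slope on a window of years, and measuring the longest run of same-sign residuals. The sketch below is a rough illustration on synthetic data with a built-in break in the year 2000 (it is not the U.S. series); `subperiod_slope` and `longest_same_sign_run` are hypothetical helper names.

```python
def subperiod_slope(years, x, y, start, end):
    """OLS slope of y on x using only observations with start <= year <= end."""
    xs = [xi for yr, xi in zip(years, x) if start <= yr <= end]
    ys = [yi for yr, yi in zip(years, y) if start <= yr <= end]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    return sxy / sxx

def longest_same_sign_run(residuals):
    """Length of the longest streak of consecutive residuals with the same sign."""
    best = current = 0
    prev_sign = 0
    for e in residuals:
        sign = (e > 0) - (e < 0)          # +1, -1, or 0
        if sign != 0 and sign == prev_sign:
            current += 1
        else:
            current = 1 if sign != 0 else 0
        prev_sign = sign
        best = max(best, current)
    return best

# Synthetic series with a deliberate break: slope -2 before 2000, slope -1 after.
years = list(range(1990, 2010))
x = [[-1.0, 0.0, 1.0, 2.0][i % 4] for i in range(len(years))]
y = [3.0 - 2.0 * xi if yr < 2000 else 2.0 - 1.0 * xi
     for yr, xi in zip(years, x)]

print(f"1990-1999 slope: {subperiod_slope(years, x, y, 1990, 1999):.2f}")
print(f"2000-2009 slope: {subperiod_slope(years, x, y, 2000, 2009):.2f}")
print(f"1990-2009 slope: {subperiod_slope(years, x, y, 1990, 2009):.2f}")

# A long streak of one sign is the pattern to look for in a residual plot.
example_residuals = [0.1, 0.2, -0.3, -0.1, -0.5, -0.2, 0.4]
print(f"longest same-sign run: {longest_same_sign_run(example_residuals)}")
```

A long run is only a flag, not a formal test; it says where to look, and the subperiod slopes then show how much the relationship actually moved.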

Python Libraries and Code

You've explored the key concepts interactively; now reproduce them in Python. This self-contained code block covers the main regressions you practiced above. Copy it into an empty notebook and run it.

# =============================================================================
# CHAPTER 8 CHEAT SHEET: Case Studies for Bivariate Regression
# =============================================================================

# --- Libraries ---
import pandas as pd                       # data loading and manipulation
import matplotlib.pyplot as plt           # creating plots and visualizations
from statsmodels.formula.api import ols   # OLS regression with R-style formulas

# =============================================================================
# STEP 1: Load OECD health data from a URL
# =============================================================================
# pd.read_stata() reads Stata .dta files — this dataset covers 34 OECD countries
url_health = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_HEALTH2009.DTA"
data_health = pd.read_stata(url_health)

print(f"Health dataset: {data_health.shape[0]} countries, {data_health.shape[1]} variables")

# =============================================================================
# STEP 2: Descriptive statistics — summarize before modeling
# =============================================================================
# .describe() gives mean, std, min, quartiles, max for each variable
print(data_health[['hlthpc', 'lifeexp', 'infmort', 'gdppc']].describe().round(2))

# =============================================================================
# STEP 3: Health outcomes regression with robust standard errors
# =============================================================================
# Does higher health spending improve life expectancy?
model_life = ols('lifeexp ~ hlthpc', data=data_health).fit()

slope_life = model_life.params['hlthpc']
r2_life    = model_life.rsquared

print(f"Life expectancy: slope = {slope_life:.5f}, R² = {r2_life:.4f}")
print(f"Each extra $1,000 in spending → {slope_life*1000:.2f} more years of life expectancy")

# Robust standard errors adjust for non-constant error variance (heteroskedasticity)
model_life_robust = model_life.get_robustcov_results(cov_type='HC1')
print(model_life_robust.summary())   # print() so the table displays even mid-cell

# =============================================================================
# STEP 4: Health spending vs GDP — income elasticity
# =============================================================================
# How much of health spending is driven by national income?
model_gdp = ols('hlthpc ~ gdppc', data=data_health).fit()

slope_gdp = model_gdp.params['gdppc']
r2_gdp    = model_gdp.rsquared

# Income elasticity at the mean: (slope × mean_x) / mean_y
mean_gdp  = data_health['gdppc'].mean()
mean_hlth = data_health['hlthpc'].mean()
elasticity = (slope_gdp * mean_gdp) / mean_hlth

print(f"Health spending on GDP: slope = {slope_gdp:.4f}, R² = {r2_gdp:.4f}")
print(f"Income elasticity at the mean: {elasticity:.2f} (≈1.0 → normal good)")

# =============================================================================
# STEP 5: Outlier robustness — excluding USA and Luxembourg
# =============================================================================
# Two countries drive much of the model's "misfit" — test robustness by excluding them
data_subset = data_health[(data_health['code'] != 'USA') &
                          (data_health['code'] != 'LUX')]

model_subset = ols('hlthpc ~ gdppc', data=data_subset).fit()

print(f"\nAll 34 countries:  slope = {slope_gdp:.4f}, R² = {r2_gdp:.4f}")
print(f"Excluding USA/LUX: slope = {model_subset.params['gdppc']:.4f}, R² = {model_subset.rsquared:.4f}")
print("Removing 2 of 34 countries transforms R² — always check for influential observations!")

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for ax, df, mdl, title in zip(
        axes,
        [data_health, data_subset],
        [model_gdp, model_subset],
        ['All 34 Countries', 'Excluding USA & Luxembourg']):
    ax.scatter(df['gdppc'], df['hlthpc'], s=50, alpha=0.7)
    ax.plot(df['gdppc'], mdl.fittedvalues, color='red', linewidth=2)
    ax.set_xlabel('GDP per capita ($)')
    ax.set_ylabel('Health spending per capita ($)')
    ax.set_title(f'{title}  (R² = {mdl.rsquared:.2f})')
    ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# =============================================================================
# STEP 6: CAPM — estimating Coca-Cola's beta (systematic risk)
# =============================================================================
# Beta measures how a stock's excess return co-moves with the market excess return
url_capm = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_CAPM.DTA"
data_capm = pd.read_stata(url_capm)

model_capm = ols('rko_rf ~ rm_rf', data=data_capm).fit()

alpha = model_capm.params['Intercept']   # excess return beyond CAPM prediction
beta  = model_capm.params['rm_rf']       # systematic risk
r2_capm = model_capm.rsquared

print(f"Coca-Cola CAPM: alpha = {alpha:.4f}, beta = {beta:.4f}, R² = {r2_capm:.4f}")
if beta < 1:
    print("Beta < 1 → defensive stock (moves less than the market)")
else:
    print("Beta ≥ 1 → stock moves at least one-for-one with the market")
print(f"R² = {r2_capm:.2%} explained by market; {1-r2_capm:.2%} is idiosyncratic risk")

# Full regression table
print(model_capm.summary())   # print() so the table displays even mid-cell

# =============================================================================
# STEP 7: Okun's Law — GDP growth vs unemployment change
# =============================================================================
# Okun (1962): each +1 point in unemployment → ≈ -2 points in GDP growth
url_gdp = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_GDPUNEMPLOY.DTA"
data_gdp = pd.read_stata(url_gdp)

model_okun = ols('rgdpgrowth ~ uratechange', data=data_gdp).fit()

slope_okun = model_okun.params['uratechange']
r2_okun    = model_okun.rsquared

print(f"Okun's Law: slope = {slope_okun:.2f} (Okun's original: -2.0)")
print(f"R² = {r2_okun:.4f} — unemployment explains {r2_okun*100:.0f}% of GDP growth variation")

# Scatter plot with fitted line
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(data_gdp['uratechange'], data_gdp['rgdpgrowth'], s=50, alpha=0.7)
ax.plot(data_gdp['uratechange'], model_okun.fittedvalues, color='red', linewidth=2,
        label=f'Fitted: slope = {slope_okun:.2f}')
ax.axhline(y=0, color='gray', linestyle=':', linewidth=1, alpha=0.5)
ax.set_xlabel('Change in unemployment rate (percentage points)')
ax.set_ylabel('Real GDP growth (%)')
ax.set_title(f"Okun's Law: GDP Growth vs Unemployment Change  (R² = {r2_okun:.2f})")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
