Chapter 08 of 18 · Interactive Dashboard

Case Studies for Bivariate Regression

Explore health economics, financial markets, and macroeconomics through interactive regression analysis. Toggle outcomes, compare stocks, exclude outliers, and test Okun's Law.

Health Outcomes Across Countries

Does every extra dollar of health spending buy you more life? Or does the U.S. prove that spending and outcomes can come apart?

A coefficient can be statistically significant yet economically modest, or economically large yet imprecise. Economic significance asks whether the effect is large enough to matter for policy; statistical significance asks whether it is distinguishable from zero. In cross-country regressions, always interpret both, and use heteroskedasticity-robust standard errors: richer countries show more variation, so classical standard errors can misstate precision and robust SEs are essential for valid inference (KC 8.2).
What you can do here
  • Toggle between Life Expectancy and Infant Mortality as the outcome variable.
  • Toggle residuals on to see how far each country sits from the fitted line.
  • Hover any point to read the country name, spending, and outcome.
Stat cards: Intercept (b1), Slope (b2), se(b2), t-statistic, R-squared, n
Try this
  1. Start with Life Expectancy. R² ≈ 0.32 and each extra $1,000 of spending predicts about 1.1 more years of life. Spending matters, but it explains only about a third of the cross-country variation.
  2. Switch to Infant Mortality. The slope turns negative: more spending, fewer infant deaths. The same predictor and the same sample, but a different lens on health.
  3. Toggle residuals on for both outcomes. The U.S. has the largest residual in each model. This is a persistent anomaly: the U.S. underperforms what spending alone would predict, regardless of which outcome you use.

Take-away: In cross-country regressions, report both the magnitude and the precision of the effect, and use robust SEs to guard against heteroskedasticity. Read §8.1 in the chapter →
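The robust-SE advice in this section can be sketched by hand. The example below uses synthetic data (not the OECD dataset) in which the error spread grows with spending, then computes classical and HC1 standard errors with NumPy. All variable names and data-generating numbers are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in for a cross-country sample (NOT the OECD data):
# x = spending in $1,000s; the error spread grows with x (heteroskedasticity).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 8.0, size=n)
y = 75.0 + 1.0 * x + rng.normal(0.0, 0.5 * x)

X = np.column_stack([np.ones(n), x])      # design matrix: intercept, slope
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                     # OLS coefficients (b1, b2)
e = y - X @ b                             # residuals
k = X.shape[1]                            # number of estimated coefficients

# Classical covariance: s^2 (X'X)^-1, valid only under constant error variance
s2 = e @ e / (n - k)
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# HC1 covariance: n/(n-k) * (X'X)^-1 [X' diag(e^2) X] (X'X)^-1
meat = X.T @ (X * e[:, None] ** 2)
se_hc1 = np.sqrt(np.diag(n / (n - k) * XtX_inv @ meat @ XtX_inv))

print(f"slope b2 = {b[1]:.3f}")
print(f"classical se(b2) = {se_classical[1]:.4f}, HC1 se(b2) = {se_hc1[1]:.4f}")
```

The coefficients are identical under both approaches; only the standard errors (and hence t-statistics) change.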

Health Expenditures vs. GDP

Richer countries spend more on health, but does spending rise proportionally with income, or faster? The answer decides whether health care is a luxury or a necessity.

Income elasticity of demand measures how spending changes with income. An elasticity near 1.0 means health spending rises roughly in proportion to income: a 1% increase in GDP is associated with roughly a 1% increase in health spending. By this measure, health care in cross-country data is a normal good that is neither a luxury (elasticity > 1) nor a necessity (elasticity < 1).
What you can do here
  • Toggle between All 34 Countries and Exclude USA & LUX to see how two outliers shift the story.
  • Toggle highlights to mark the USA and Luxembourg in pink.
  • Watch the slope, R², and elasticity-at-mean update in the stat cards.
Stat cards: Slope (b2), se(b2), R-squared, Elasticity at mean, n
Try this
  1. Start with All 34 Countries. R² ≈ 0.60 and slope ≈ 0.09. GDP explains about 60% of health-spending variation, but two countries pull the fit down.
  2. Switch to Exclude USA & LUX. R² jumps to about 0.93 and the slope rises to 0.13. Removing two of 34 countries transforms the fit; the influence of individual observations matters far more than sample size here.
  3. Turn highlights on and compare the two fitted lines. Both flag the U.S. and Luxembourg as far from the trend. The 32-country line is the "typical OECD" pattern; the 34-country line mixes it with two exceptional cases.

Take-away: With income elasticity ≈ 1 in the "typical" OECD subset, health care behaves as a normal good; outliers distort the slope but don't overturn the elasticity story. Read §8.2 in the chapter →
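The elasticity-at-mean stat card follows a simple formula: slope × mean(x) / mean(y). A minimal sketch with made-up numbers (not the OECD data) is below; the helper name `elasticity_at_mean` is hypothetical.

```python
import numpy as np

def elasticity_at_mean(x, y):
    """Return (slope, elasticity) from a bivariate OLS of y on x.

    Elasticity at the mean = slope * mean(x) / mean(y).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)   # least-squares line
    return slope, slope * x.mean() / y.mean()

# Made-up numbers for illustration (NOT the OECD data):
# GDP per capita and health spending per capita, both in $1,000s.
gdp = np.array([10.0, 20.0, 30.0, 40.0])
health = 1.0 + 0.1 * gdp                         # exact line with slope 0.1

slope, elasticity = elasticity_at_mean(gdp, health)
print(f"slope = {slope:.3f}, elasticity at mean = {elasticity:.3f}")
```

In this toy example the intercept is positive, so the elasticity at the mean is below 1 even though the slope is constant; with a zero intercept it would equal exactly 1.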

Outlier Detection and Influence

How fragile is your regression? If dropping one country flips the slope or doubles the R², your story isn't about the data; it's about that country.

A few extreme observations can dramatically alter regression results. The outlier workflow has four steps: (1) identify outliers visually; (2) assess their influence on coefficients; (3) test robustness by re-running without them; (4) interpret results in context. Never silently drop outliers: disclose and justify any exclusion.
What you can do here
  • Slide the "drop" count from 0 to 8; the widget removes the countries with the largest residuals first.
  • Watch the slope and R² update live as you drop.
  • Check the pink markers: dropped countries are highlighted on the scatter.
Stat cards: Slope (b2), R-squared, Dropped, Remaining n
Try this
  1. Start at 0 drops. R² ≈ 0.60, slope ≈ 0.09. This is the baseline story on all 34 countries.
  2. Slide to 2. The USA and Luxembourg (the two largest residuals) are removed and R² jumps to about 0.93. Two of 34 countries carry most of the model's "misfit"; dropping them tightens the relationship dramatically.
  3. Slide up to 5 or 6. R² keeps creeping up but the marginal gain flattens. After the first few influential points, further exclusion is just chasing natural scatter, so it is time to stop.

Take-away: Influence analysis shows which observations drive your results. Report both the with-outlier and without-outlier specifications so readers can judge robustness. Read §8.2 in the chapter →
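The drop-and-refit exercise can be sketched as follows. This is an assumption about how such a slider might work, not the widget's actual code: residuals are ranked once from the full-sample fit, the k largest in absolute value are removed, and the line is refit. The data are made up and the function names are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Bivariate OLS: return intercept, slope, and R-squared."""
    slope, intercept = np.polyfit(x, y, deg=1)
    resid = y - (intercept + slope * x)
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return intercept, slope, r2

def refit_without_largest_residuals(x, y, k):
    """Drop the k observations with the largest |residual| from the
    full-sample fit, then refit. Returns (dropped_indices, slope, r2)."""
    intercept, slope, _ = fit_line(x, y)
    abs_resid = np.abs(y - (intercept + slope * x))
    dropped = np.argsort(abs_resid)[::-1][:k]        # k largest, ranked once
    keep = np.setdiff1d(np.arange(len(x)), dropped)
    _, slope_k, r2_k = fit_line(x[keep], y[keep])
    return sorted(dropped.tolist()), slope_k, r2_k

# Made-up data: an exact line plus two contaminated points (indices 3 and 7).
x = np.arange(10, dtype=float)
y = 2.0 + 0.5 * x
y[3] += 10.0
y[7] -= 8.0

for k in (0, 2):
    dropped, slope_k, r2_k = refit_without_largest_residuals(x, y, k)
    print(f"drop {k}: dropped={dropped}, slope={slope_k:.3f}, R2={r2_k:.3f}")
```

Ranking by raw residuals can miss high-leverage points that pull the fitted line toward themselves; leverage-based measures such as Cook's distance are a common complement.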

CAPM: Stock Betas

Some stocks amplify the market; others cushion you from it. One number, the CAPM beta, decides which camp each stock is in.

Beta measures how much a stock's returns co-move with the overall market; this is its systematic risk. Beta < 1 marks a "defensive" stock (it moves less than one-for-one with the market); beta > 1 marks a "growth" stock that amplifies market moves. Only systematic risk is priced in efficient markets; idiosyncratic risk diversifies away in portfolios (KC 8.6).
What you can do here
  • Toggle between Coca-Cola, Target, and Walmart to compare their betas and R².
  • Toggle the 45° (beta = 1) reference line to see how the fitted slope compares to the market benchmark.
  • Read the alpha, beta, se(beta), and R² in the stat cards.
Stat cards: Alpha, Beta, se(beta), t(beta), R-squared, n
Try this
  1. Start with Coca-Cola and toggle the 45° reference line on. Beta ≈ 0.61 and the fitted slope is visibly flatter than the reference. This is a defensive stock: it moves less than the market in both directions.
  2. Switch to Target. Beta climbs above 1.0 and the scatter widens. Target amplifies market moves and carries more idiosyncratic risk than the two consumer staples.
  3. Switch to Walmart. Beta lands between the other two and stays defensive, like the other consumer-staple giant. Stable demand for everyday goods keeps beta low across both staples despite very different company histories.

Take-away: Beta separates stocks into defensive (< 1) and amplifying (> 1). In efficient markets only beta risk is priced; diversifying away everything else costs nothing. Read §8.3 in the chapter →
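Beta is the OLS slope of a stock's excess return on the market's excess return, which equals Cov(stock, market) / Var(market). The sketch below applies that formula to made-up returns (not the chapter's CAPM dataset); the function and series names are hypothetical.

```python
import numpy as np

def capm_alpha_beta(stock_excess, market_excess):
    """OLS of stock excess return on market excess return.

    beta  = Cov(stock, market) / Var(market)
    alpha = mean(stock) - beta * mean(market)
    """
    s = np.asarray(stock_excess, dtype=float)
    m = np.asarray(market_excess, dtype=float)
    beta = np.cov(s, m, ddof=1)[0, 1] / np.var(m, ddof=1)
    alpha = s.mean() - beta * m.mean()
    return alpha, beta

def label(beta):
    """Classify with the text's rule: below 1 is defensive, above 1 amplifies."""
    if beta < 1:
        return "defensive"
    if beta > 1:
        return "amplifying"
    return "market-like"

# Made-up monthly excess returns for illustration only.
market = np.array([0.02, -0.01, 0.03, -0.04, 0.01, 0.05])
stock_a = 0.001 + 0.6 * market      # hypothetical low-beta stock
stock_b = -0.002 + 1.3 * market     # hypothetical high-beta stock

for name, r in [("stock_a", stock_a), ("stock_b", stock_b)]:
    alpha, beta = capm_alpha_beta(r, market)
    print(f"{name}: alpha={alpha:.4f}, beta={beta:.2f} -> {label(beta)}")
```

A point estimate just below or above 1 is not decisive on its own; comparing (beta − 1) to se(beta) indicates whether the difference from the market benchmark is statistically meaningful.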

CAPM Residual Diagnostics

If CAPM fits, the residuals should look like pure noise. Do they? And which stock carries the most firm-specific surprise?

R² in CAPM measures the fraction of return variation explained by market movements; the rest is idiosyncratic risk. Residuals capture firm-specific shocks; in efficient markets this portion diversifies away in a portfolio and earns no risk premium. A well-specified CAPM regression shows residuals randomly scattered around zero with no fan shape or curve.
What you can do here
  • Toggle between Coca-Cola, Target, and Walmart.
  • Toggle Residuals vs. Fitted to check for patterns (fans, curves) that would break CAPM's assumptions.
  • Toggle Residual Histogram to inspect the bell shape and the tails.
Stat cards: Mean residual, SD residual, Max |residual|, R-squared
Try this
  1. Pick Coca-Cola and view Residuals vs. Fitted. The cloud is roughly centered on zero with no fan. Homoskedasticity holds approximately, so the regression assumptions don't obviously break for this stock.
  2. Switch to Residual Histogram. The shape is roughly bell-like with fat tails. Most months are ordinary; a handful of large residuals mark firm-specific news events.
  3. Cycle through the three stocks in histogram view. Target's residual SD is clearly largest. Target carries the most idiosyncratic risk, and thus the most that a diversified portfolio can wash away.

Take-away: What CAPM leaves unexplained is idiosyncratic risk. In efficient markets that risk is free to diversify away, so it carries no reward. Read §8.3 in the chapter →
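The split between market-driven and firm-specific variation can be checked numerically. The sketch below uses synthetic returns (not the chapter's CAPM dataset) and computes the same kinds of quantities as the stat cards; dividing by n − 2 for the residual SD is an assumption about the convention used.

```python
import numpy as np

# Synthetic monthly excess returns for illustration only.
rng = np.random.default_rng(1)
market = rng.normal(0.005, 0.04, size=120)
stock = 0.001 + 0.8 * market + rng.normal(0.0, 0.05, size=120)

beta, alpha = np.polyfit(market, stock, deg=1)   # slope first, then intercept
fitted = alpha + beta * market
resid = stock - fitted

# With an intercept, Var(stock) = Var(fitted) + Var(resid) holds exactly in-sample.
total_var = np.var(stock)
systematic_var = np.var(fitted)          # equals beta^2 * Var(market)
idiosyncratic_var = np.var(resid)
r2 = systematic_var / total_var

print(f"mean residual   = {resid.mean():.2e}")   # zero up to rounding, by OLS
print(f"SD residual     = {resid.std(ddof=2):.4f}")
print(f"max |residual|  = {np.abs(resid).max():.4f}")
print(f"R-squared       = {r2:.3f} (market share of return variance)")
print(f"idiosyncratic   = {1 - r2:.3f}")
```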

Okun's Law: GDP Growth vs. Unemployment

Okun's 1962 rule of thumb says every 1-point jump in unemployment costs you two points of GDP growth. Does that still hold in the modern U.S. economy?

Okun's Law is a remarkably stable empirical regularity linking unemployment changes to GDP growth. The basic relationship holds across time periods and countries, but the exact coefficient drifts due to structural changes in labor markets, productivity trends, and institutional differences. Original Okun: slope ≈ −2.0. Our U.S. 1961–2019 estimate: slope ≈ −1.59.
What you can do here
  • Slide the URATE change slider from −3 to +5 and watch the predicted GDP growth update.
  • Toggle Okun's original slope = −2.0 as a dashed reference line.
  • Hover individual years to identify recessions in the lower-right cluster.
Stat cards: Intercept (b1), Slope (b2), se(b2), R-squared, Predicted GDP growth, n
Try this
  1. Set the prediction slider to +3 (unemployment rises 3 points). Predicted GDP growth drops to about −1.8%. That's a severe-recession forecast, consistent with what the U.S. saw in 2008–09 and 1982.
  2. Toggle Okun's original slope = −2.0 on. The dashed reference is visibly steeper than the fitted line. The modern U.S. relationship is somewhat weaker than Okun's 1962 original; structural changes in labor markets have plausibly softened the link.
  3. Hover the points in the lower-right cluster. These are recession years: 2008–09, 1982, 1974–75. Big unemployment rises paired with negative GDP growth: the textbook Okun pattern made concrete.

Take-away: Okun's rule of thumb survives decades and countries, but not at exactly the same slope; the relationship's stability matters more than the coefficient's precise value. Read §8.4 in the chapter →
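The prediction slider amounts to plugging a value into the fitted line. The sketch below uses the slope quoted in this section (−1.59). The intercept is not quoted here, so 3.0 is an assumption backed out from the "+3 points gives about −1.8%" example, and applying the same intercept to Okun's original slope is a further assumption.

```python
SLOPE_FITTED = -1.59     # U.S. 1961-2019 estimate quoted in the text
SLOPE_OKUN = -2.0        # Okun's original rule of thumb
INTERCEPT = 3.0          # ASSUMPTION: implied by the quoted prediction, not an estimate

def predicted_gdp_growth(urate_change, slope=SLOPE_FITTED, intercept=INTERCEPT):
    """Predicted real GDP growth (%) for a change in the unemployment rate (points)."""
    return intercept + slope * urate_change

for du in (-1, 0, 1, 3):
    fitted = predicted_gdp_growth(du)
    original = predicted_gdp_growth(du, slope=SLOPE_OKUN)
    print(f"unemployment change {du:+d}: fitted {fitted:+.2f}%, Okun slope {original:+.2f}%")
```

Under these assumptions the gap between the two lines widens with the size of the unemployment change, which is why the dashed reference looks most different at the extremes of the slider.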

Actual vs. Predicted GDP Growth Over Time

Okun's Law is stable, but for how long at a stretch? Slide the window to find the years where the model quietly stops working.

Structural breaks shift long-run relationships when something in the economy changes. Policy changes, technological shifts, and economic crises can reshape coefficients mid-sample. Visual inspection of actual vs. predicted values over time identifies periods when the model weakens or strengthens. Persistent runs of positive or negative residuals are red flags for a break.
What you can do here
  • Slide the start and end year to fit Okun's Law over any subperiod from 1961 to 2019.
  • Toggle Actual + Predicted to compare the two series over time.
  • Toggle Residuals to scan for runs of one sign, a telltale of a structural break.
Stat cards: Subperiod slope, Subperiod R², Years, Mean |residual|
Try this
  1. Set the range to 1961–2007 and note R² and slope. Then extend to 1961–2019. The fit barely moves: the 1961–2007 Okun relationship survives most of the full sample.
  2. Set 2008–2019 only. The slope and R² drift from the full-sample numbers. The "jobless recovery" period left systematic prediction errors, a candidate structural break.
  3. Toggle to Residuals. Scan for runs of positive or negative values, especially after 2008. Persistent runs of one sign are the visual signature of serial correlation or a structural break.

Take-away: Time-series regressions are not static truths. Scanning actual-vs-predicted plots is the cheapest way to catch a model that has quietly stopped working. Read §8.4 in the chapter →
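Subperiod fitting and run-scanning can be sketched in a few lines. The series below is made up, with a break deliberately built in at 2008, so the numbers say nothing about the real U.S. data; the function names are hypothetical.

```python
import numpy as np

def subperiod_slope(years, x, y, start, end):
    """Fit y on x using only observations with start <= year <= end."""
    mask = (years >= start) & (years <= end)
    slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
    return slope, intercept

def longest_same_sign_run(residuals):
    """Length of the longest streak of consecutive residuals sharing a sign."""
    signs = np.sign(np.asarray(residuals, dtype=float)).tolist()
    longest = current = 0
    previous = None
    for s in signs:
        current = current + 1 if s == previous else 1
        longest = max(longest, current)
        previous = s
    return longest

# Made-up series with a built-in break: slope -2.0 through 2007,
# slope -1.0 afterwards, no noise.
years = np.arange(1961, 2020)
du = np.sin(years)                          # arbitrary unemployment changes
growth = np.where(years <= 2007, 3.0 - 2.0 * du, 2.0 - 1.0 * du)

early, _ = subperiod_slope(years, du, growth, 1961, 2007)
late, _ = subperiod_slope(years, du, growth, 2008, 2019)
full_slope, full_intercept = subperiod_slope(years, du, growth, 1961, 2019)
resid = growth - (full_intercept + full_slope * du)

print(f"1961-2007 slope: {early:.2f}, 2008-2019 slope: {late:.2f}")
print(f"full-sample slope: {full_slope:.2f}")
print(f"longest same-sign residual run: {longest_same_sign_run(resid)}")
```

A long run is only a visual cue; a formal check would compare subperiod coefficients with a test designed for structural breaks.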

Python Libraries and Code

You've explored the key concepts interactively; now reproduce them in Python. This self-contained code block covers the core regressions you practiced above. Copy it into an empty notebook and run it.

# =============================================================================
# CHAPTER 8 CHEAT SHEET: Case Studies for Bivariate Regression
# =============================================================================

# --- Libraries ---
import pandas as pd                       # data loading and manipulation
import matplotlib.pyplot as plt           # creating plots and visualizations
from statsmodels.formula.api import ols   # OLS regression with R-style formulas

# =============================================================================
# STEP 1: Load OECD health data from a URL
# =============================================================================
# pd.read_stata() reads Stata .dta files; this dataset covers 34 OECD countries
url_health = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_HEALTH2009.DTA"
data_health = pd.read_stata(url_health)

print(f"Health dataset: {data_health.shape[0]} countries, {data_health.shape[1]} variables")

# =============================================================================
# STEP 2: Descriptive statistics (summarize before modeling)
# =============================================================================
# .describe() gives mean, std, min, quartiles, max for each variable
print(data_health[['hlthpc', 'lifeexp', 'infmort', 'gdppc']].describe().round(2))

# =============================================================================
# STEP 3: Health outcomes regression with robust standard errors
# =============================================================================
# Does higher health spending improve life expectancy?
model_life = ols('lifeexp ~ hlthpc', data=data_health).fit()

slope_life = model_life.params['hlthpc']
r2_life    = model_life.rsquared

print(f"Life expectancy: slope = {slope_life:.5f}, R² = {r2_life:.4f}")
print(f"Each extra $1,000 in spending → {slope_life*1000:.2f} more years of life expectancy")

# Robust standard errors adjust for non-constant error variance (heteroskedasticity)
model_life_robust = model_life.get_robustcov_results(cov_type='HC1')
print(model_life_robust.summary())   # print() so the table shows even mid-cell

# =============================================================================
# STEP 4: Health spending vs GDP (income elasticity)
# =============================================================================
# How much of health spending is driven by national income?
model_gdp = ols('hlthpc ~ gdppc', data=data_health).fit()

slope_gdp = model_gdp.params['gdppc']
r2_gdp    = model_gdp.rsquared

# Income elasticity at the mean: (slope × mean_x) / mean_y
mean_gdp  = data_health['gdppc'].mean()
mean_hlth = data_health['hlthpc'].mean()
elasticity = (slope_gdp * mean_gdp) / mean_hlth

print(f"Health spending on GDP: slope = {slope_gdp:.4f}, R² = {r2_gdp:.4f}")
print(f"Income elasticity at the mean: {elasticity:.2f} (≈1.0 → normal good)")

# =============================================================================
# STEP 5: Outlier robustness (excluding USA and Luxembourg)
# =============================================================================
# Two countries drive much of the model's "misfit"; test robustness by excluding them
data_subset = data_health[(data_health['code'] != 'USA') &
                          (data_health['code'] != 'LUX')]

model_subset = ols('hlthpc ~ gdppc', data=data_subset).fit()

print(f"\nAll 34 countries:  slope = {slope_gdp:.4f}, R² = {r2_gdp:.4f}")
print(f"Excluding USA/LUX: slope = {model_subset.params['gdppc']:.4f}, R² = {model_subset.rsquared:.4f}")
print("Removing 2 of 34 countries transforms R²: always check for influential observations!")

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for ax, df, mdl, title in zip(
        axes,
        [data_health, data_subset],
        [model_gdp, model_subset],
        ['All 34 Countries', 'Excluding USA & Luxembourg']):
    ax.scatter(df['gdppc'], df['hlthpc'], s=50, alpha=0.7)
    ax.plot(df['gdppc'], mdl.fittedvalues, color='red', linewidth=2)
    ax.set_xlabel('GDP per capita ($)')
    ax.set_ylabel('Health spending per capita ($)')
    ax.set_title(f'{title}  (R² = {mdl.rsquared:.2f})')
    ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# =============================================================================
# STEP 6: CAPM, estimating Coca-Cola's beta (systematic risk)
# =============================================================================
# Beta measures how a stock's excess return co-moves with the market excess return
url_capm = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_CAPM.DTA"
data_capm = pd.read_stata(url_capm)

model_capm = ols('rko_rf ~ rm_rf', data=data_capm).fit()

alpha = model_capm.params['Intercept']   # excess return beyond CAPM prediction
beta  = model_capm.params['rm_rf']       # systematic risk
r2_capm = model_capm.rsquared

print(f"Coca-Cola CAPM: alpha = {alpha:.4f}, beta = {beta:.4f}, R² = {r2_capm:.4f}")
print("Beta < 1 → defensive stock (moves less than the market)")
print(f"R² = {r2_capm:.2%} explained by market; {1-r2_capm:.2%} is idiosyncratic risk")

# Full regression table
print(model_capm.summary())   # print() so the table shows even mid-cell

# =============================================================================
# STEP 7: Okun's Law (GDP growth vs unemployment change)
# =============================================================================
# Okun (1962): each +1 point in unemployment → ≈ -2 points in GDP growth
url_gdp = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_GDPUNEMPLOY.DTA"
data_gdp = pd.read_stata(url_gdp)

model_okun = ols('rgdpgrowth ~ uratechange', data=data_gdp).fit()

slope_okun = model_okun.params['uratechange']
r2_okun    = model_okun.rsquared

print(f"Okun's Law: slope = {slope_okun:.2f} (Okun's original: -2.0)")
print(f"R² = {r2_okun:.4f}: unemployment change explains {r2_okun*100:.0f}% of GDP growth variation")

# Scatter plot with fitted line
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(data_gdp['uratechange'], data_gdp['rgdpgrowth'], s=50, alpha=0.7)
ax.plot(data_gdp['uratechange'], model_okun.fittedvalues, color='red', linewidth=2,
        label=f'Fitted: slope = {slope_okun:.2f}')
ax.axhline(y=0, color='gray', linestyle=':', linewidth=1, alpha=0.5)
ax.set_xlabel('Change in unemployment rate (percentage points)')
ax.set_ylabel('Real GDP growth (%)')
ax.set_title(f"Okun's Law: GDP Growth vs Unemployment Change  (R² = {r2_okun:.2f})")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()