Chapter 08 of 18 · Interactive Dashboard

Case Studies for Bivariate Regression

Explore health economics, financial markets, and macroeconomics through interactive regression analysis. Toggle outcomes, compare stocks, exclude outliers, and test Okun's Law.

Health Outcomes Across Countries

Does every extra dollar of health spending buy you more life? Or does the U.S. prove that spending and outcomes can come apart?

A coefficient can be statistically significant yet economically modest — or economically large yet imprecise. Economic significance asks whether the effect is large enough to matter for policy; statistical significance asks whether it is different from zero. In cross-country regressions, always interpret both, and use heteroskedasticity-robust standard errors — richer countries show more variation, making robust SEs essential for valid inference (KC 8.2).
What you can do here
  • Toggle between Life Expectancy and Infant Mortality as the outcome variable.
  • Toggle residuals on to see how far each country sits from the fitted line.
  • Hover any point to read the country name, spending, and outcome.
Stat cards: Intercept (b1), Slope (b2), se(b2), t-statistic, R-squared, n
Try this
  1. Start with Life Expectancy. R² ≈ 0.32 and each extra $1,000 of spending predicts about 1.1 more years of life. Spending matters, but it explains only a third of the cross-country variation.
  2. Switch to Infant Mortality. The slope flips negative — more spending, fewer infant deaths. The same predictor, the same sample, but a totally different lens on health.
  3. Toggle residuals on for both outcomes. The U.S. has the largest residual in each model. A persistent anomaly: the U.S. underperforms what spending alone would predict, regardless of which outcome you use.

Take-away: In cross-country regressions, report both the magnitude and the precision of the effect — and always use robust SEs to handle heteroskedasticity. Read §8.1 in the chapter →
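
The difference between classical and heteroskedasticity-robust standard errors can be sketched in a few lines of plain Python. This is a minimal illustration, not the dashboard's implementation: the helper `ols_with_se` and the synthetic data (noise that grows with spending) are assumptions made for the example, and the numbers it prints say nothing about the OECD sample.

```python
import math
import random


def ols_with_se(x, y):
    """Bivariate OLS: intercept, slope, classical se(slope), HC1 se(slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = mean_y - b2 * mean_x
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

    # Classical SE assumes one common error variance s^2 for every observation.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # HC1 weights each squared residual by its own (x - mean_x)^2,
    # then applies the n / (n - 2) small-sample correction.
    meat = sum((xi - mean_x) ** 2 * e ** 2 for xi, e in zip(x, resid))
    se_hc1 = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return b1, b2, se_classical, se_hc1


# Synthetic data (NOT the OECD sample): error spread grows with spending.
random.seed(8)
spend = [random.uniform(1.0, 8.0) for _ in range(34)]          # $ thousands
life = [75 + 1.0 * s + random.gauss(0, 0.4 * s) for s in spend]

b1, b2, se_c, se_r = ols_with_se(spend, life)
print(f"slope        = {b2:.3f}")
print(f"classical se = {se_c:.3f}, t = {b2 / se_c:.2f}")
print(f"HC1 se       = {se_r:.3f}, t = {b2 / se_r:.2f}")
```

The slope is identical under both approaches; only the standard error, and therefore the t-statistic, changes. That is why the magnitude (economic significance) and the precision (statistical significance) are reported separately.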

Health Expenditures vs. GDP

Richer countries spend more on health — but does spending rise proportionally with income, or faster? The answer decides whether health care is a luxury or a necessity.

Income elasticity of demand measures how spending changes with income. An elasticity near 1.0 means health spending rises roughly in proportion to income: a 1% increase in GDP is associated with roughly a 1% increase in health spending. On that reading health care is a normal good that is neither a clear luxury (elasticity > 1) nor a clear necessity (elasticity < 1) in cross-country data.
What you can do here
  • Toggle between All 34 Countries and Exclude USA & LUX to see how two outliers shift the story.
  • Toggle highlights to mark the USA and Luxembourg in pink.
  • Watch the slope, R², and elasticity-at-mean update in the stat cards.
Stat cards: Slope (b2), se(b2), R-squared, Elasticity at mean, n
Try this
  1. Start with All 34 Countries. R² ≈ 0.60 and slope ≈ 0.09. GDP explains 60% of health-spending variation, but two countries pull the fit down.
  2. Switch to Exclude USA & LUX. R² jumps to ~0.93 and the slope rises to 0.13. Removing two of 34 countries transforms the fit — influence matters far more than sample size here.
  3. Turn highlights on and compare the two fitted lines. Both flag the U.S. and Luxembourg as far from the trend. The 32-country line is the "typical OECD" pattern; the 34-country line mixes it with two exceptional cases.

Take-away: With income elasticity ≈ 1 in the "typical" OECD subset, health care behaves as a normal good — outliers distort the slope but don't overturn the elasticity story. Read §8.2 in the chapter →
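
The "elasticity at mean" figure can be written out explicitly. The notation below is an assumption chosen to match the stat cards: y is health spending per capita, x is GDP per capita, and b1, b2 are the fitted intercept and slope.

```latex
\[
\varepsilon_{\text{mean}}
  \;=\; \frac{dy}{dx}\cdot\frac{\bar{x}}{\bar{y}}
  \;=\; b_2\,\frac{\bar{x}}{\bar{y}}
\]
% The OLS line passes through the sample means, so \bar{y} = b_1 + b_2\bar{x}:
\[
\varepsilon_{\text{mean}} \;=\; \frac{b_2\,\bar{x}}{b_1 + b_2\,\bar{x}}
\]
```

With a positive slope and positive means, the second form shows that the elasticity at the mean equals 1 exactly when the intercept is zero, exceeds 1 when the intercept is negative, and falls below 1 when the intercept is positive.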

Outlier Detection and Influence

How fragile is your regression? If dropping one country flips the slope or doubles the R², your story isn't about the data — it's about that country.

A few extreme observations can dramatically alter regression results. The outlier workflow has four steps: (1) identify outliers visually; (2) assess their influence on coefficients; (3) test robustness by re-running without them; (4) interpret results in context. Never silently drop outliers — disclose and justify any exclusion.
What you can do here
  • Slide the "drop" count from 0 to 8 — the widget removes the countries with the largest residuals first.
  • Watch the slope and R² update live as you drop.
  • Check the pink markers — dropped countries are highlighted on the scatter.
Stat cards: Slope (b2), R-squared, Dropped, Remaining n
Try this
  1. Start at 0 drops. R² ≈ 0.60, slope ≈ 0.09. This is the baseline story on all 34 countries.
  2. Slide to 2. The USA and Luxembourg (the two largest residuals) are removed and R² jumps to ~0.93. Two of 34 countries carry most of the model's "misfit" — dropping them tightens the relationship dramatically.
  3. Slide up to 5 or 6. R² keeps creeping up but the marginal gain flattens. After the first few influential points, further exclusion is just chasing natural scatter — time to stop.

Take-away: Influence analysis shows which observations drive your results. Report both specifications, with and without the excluded observations, so readers can judge robustness. Read §8.2 in the chapter →
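
The "drop the largest residuals, then refit" step of the workflow can be sketched as follows. This is an illustration under stated assumptions: the helper names `fit_line` and `drop_largest_residuals` are hypothetical, residuals are ranked once from the full-sample fit (the widget may rank them differently), and the eight labelled points are made up.

```python
def fit_line(x, y):
    """Return (intercept, slope, r_squared) for a bivariate OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ssr / sst


def drop_largest_residuals(labels, x, y, k):
    """Refit after removing the k points with the largest |residual|
    measured against the full-sample fit."""
    b1, b2, _ = fit_line(x, y)
    order = sorted(range(len(x)),
                   key=lambda i: abs(y[i] - b1 - b2 * x[i]),
                   reverse=True)
    dropped = order[:k]
    keep = [i for i in range(len(x)) if i not in set(dropped)]
    _, slope_k, r2_k = fit_line([x[i] for i in keep], [y[i] for i in keep])
    return {"dropped": [labels[i] for i in dropped],
            "n": len(keep), "slope": slope_k, "r2": r2_k}


# Made-up data: y = 2x everywhere except point "D".
labels = list("ABCDEFGH")
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0, 4.0, 6.0, 20.0, 10.0, 12.0, 14.0, 16.0]

# Report every specification side by side instead of silently dropping points.
for k in range(3):
    res = drop_largest_residuals(labels, x, y, k)
    print(k, res["dropped"], res["n"], round(res["slope"], 3), round(res["r2"], 3))
```

Printing the dropped labels alongside each refit keeps the exclusion visible, which is the disclosure the workflow asks for.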

CAPM: Stock Betas

Some stocks amplify the market; others cushion you from it. One number — the CAPM beta — decides which camp each stock is in.

Beta measures how much a stock's returns co-move with the overall market; this is its systematic risk. Beta < 1 marks a "defensive" stock (it moves less than one-for-one with the market); beta > 1 marks an "aggressive" stock that amplifies market moves. Only systematic risk is priced in efficient markets, because idiosyncratic risk diversifies away in portfolios (KC 8.6).
What you can do here
  • Toggle between Coca-Cola, Target, and Walmart to compare their betas and R².
  • Toggle the 45° (beta = 1) reference line to see how the fitted slope compares to the market benchmark.
  • Read the alpha, beta, se(beta), and R² in the stat cards.
Stat cards: Alpha, Beta, se(beta), t(beta), R-squared, n
Try this
  1. Start with Coca-Cola and toggle the 45° reference line on. Beta ≈ 0.61 and the fitted slope is visibly flatter than the reference. A defensive stock — it moves less than the market in both directions.
  2. Switch to Target. Beta climbs above 1.0 and the scatter widens. Target amplifies market moves and carries more idiosyncratic risk than the two consumer staples.
  3. Switch to Walmart. Beta lands between the other two and the stock still reads as defensive, like fellow consumer-staple giant Coca-Cola. Stable demand for everyday goods keeps beta low across both staples despite very different company histories.

Take-away: Beta separates stocks into defensive (< 1) and amplifying (> 1) — and in efficient markets only beta-risk is priced; diversifying away everything else costs nothing. Read §8.3 in the chapter →
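
Beta is simply the OLS slope of a stock's excess return on the market's excess return, that is, their covariance divided by the market's variance. The sketch below assumes hypothetical helpers `capm_fit` and `classify` and uses six made-up monthly returns built so that the true beta is 0.6; it is not the dashboard's data.

```python
def capm_fit(stock_excess, market_excess):
    """Return (alpha, beta) from regressing stock excess returns on market excess returns."""
    n = len(market_excess)
    mean_m = sum(market_excess) / n
    mean_s = sum(stock_excess) / n
    var_m = sum((m - mean_m) ** 2 for m in market_excess)
    cov_sm = sum((m - mean_m) * (s - mean_s)
                 for m, s in zip(market_excess, stock_excess))
    beta = cov_sm / var_m                 # slope: systematic risk
    alpha = mean_s - beta * mean_m        # intercept: return beyond the CAPM prediction
    return alpha, beta


def classify(beta):
    """Label a stock by how its beta compares with the market benchmark of 1."""
    if beta < 1:
        return "defensive"
    if beta > 1:
        return "amplifying"
    return "market-like"


# Made-up monthly excess returns (%), constructed with alpha = 0.1 and beta = 0.6.
market = [2.0, -1.0, 3.0, -2.0, 1.0, -3.0]
stock = [0.1 + 0.6 * m for m in market]

alpha, beta = capm_fit(stock, market)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, label = {classify(beta)}")
```

A point estimate below 1 is only suggestive on its own; comparing beta against 1 using se(beta) is what the stat cards make possible.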

CAPM Residual Diagnostics

If CAPM fits, the residuals should look like pure noise. Do they? And which stock carries the most firm-specific surprise?

R² in CAPM measures the fraction of return variation explained by market movements — the rest is idiosyncratic risk. Residuals capture firm-specific shocks (idiosyncratic risk); in efficient markets this portion diversifies away in a portfolio and earns no risk premium. A well-specified CAPM regression shows residuals randomly scattered around zero with no fan shape or curve.
What you can do here
  • Toggle between Coca-Cola, Target, and Walmart.
  • Toggle Residuals vs. Fitted to check for patterns (fans, curves) that would break CAPM's assumptions.
  • Toggle Residual Histogram to inspect the bell shape and the tails.
Stat cards: Mean residual, SD residual, Max |residual|, R-squared
Try this
  1. Pick Coca-Cola and view Residuals vs. Fitted. The cloud is roughly centered on zero with no fan. Homoskedasticity holds approximately — the CAPM assumptions don't obviously break for this stock.
  2. Switch to Residual Histogram. The shape is roughly bell-like with fat tails. Most months are ordinary; a handful of large residuals mark firm-specific news events.
  3. Cycle through the three stocks in histogram view. Target's residual SD is clearly largest. Target carries the most idiosyncratic risk — and thus the most that a diversified portfolio can wash away.

Take-away: What CAPM leaves unexplained is idiosyncratic risk — in efficient markets that's free to diversify away, so it carries no reward. Read §8.3 in the chapter →
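
The residual summaries in the stat cards, and the split of return variation into a market part and a firm-specific part, can be sketched as below. The function name `capm_residual_stats`, the n − 2 divisor for the residual SD, and the six made-up returns are all assumptions for illustration; the widget may use different conventions.

```python
import math


def capm_residual_stats(stock_excess, market_excess):
    """Fit the CAPM line and summarize what it leaves unexplained."""
    n = len(stock_excess)
    mean_m = sum(market_excess) / n
    mean_s = sum(stock_excess) / n
    sxx = sum((m - mean_m) ** 2 for m in market_excess)
    beta = sum((m - mean_m) * (s - mean_s)
               for m, s in zip(market_excess, stock_excess)) / sxx
    alpha = mean_s - beta * mean_m
    resid = [s - alpha - beta * m for m, s in zip(market_excess, stock_excess)]
    ssr = sum(e ** 2 for e in resid)                    # idiosyncratic variation
    sst = sum((s - mean_s) ** 2 for s in stock_excess)  # total variation
    return {
        "mean_resid": sum(resid) / n,                 # zero by construction in OLS
        "sd_resid": math.sqrt(ssr / (n - 2)),         # assumption: n - 2 divisor
        "max_abs_resid": max(abs(e) for e in resid),
        "r2": 1 - ssr / sst,
        "systematic_share": beta ** 2 * sxx / sst,    # equals R-squared
    }


# Made-up monthly excess returns (%): market exposure of 0.6 plus firm-specific shocks.
market = [2.0, -1.0, 3.0, -2.0, 1.0, -3.0]
shocks = [0.5, -0.5, -1.0, 1.0, 0.8, -0.8]
stock = [0.6 * m + e for m, e in zip(market, shocks)]

stats = capm_residual_stats(stock, market)
for name, value in stats.items():
    print(f"{name:>17}: {value:.4f}")
```

Because the mean residual is zero in any regression with an intercept, it cannot signal misspecification by itself; the residual SD and the residual plots carry the diagnostic information.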

Okun's Law: GDP Growth vs. Unemployment

Okun's 1962 rule of thumb says every 1-point jump in unemployment costs you two points of GDP growth. Does that still hold in the modern U.S. economy?

Okun's Law is a remarkably stable empirical regularity linking unemployment changes to GDP growth. The basic relationship holds across time periods and countries, but the exact coefficient drifts due to structural changes in labor markets, productivity trends, and institutional differences. Original Okun: slope ≈ −2.0. Our U.S. 1961–2019 estimate: slope ≈ −1.59.
What you can do here
  • Slide the URATE change slider from −3 to +5 and watch the predicted GDP growth update.
  • Toggle Okun's original slope = −2.0 as a dashed reference line.
  • Hover individual years to identify recessions in the lower-right cluster.
Stat cards: Intercept (b1), Slope (b2), se(b2), R-squared, Predicted GDP growth, n
Try this
  1. Set the prediction slider to +3 (unemployment rises 3 points). Predicted GDP growth drops to about −1.8%. That's a severe-recession forecast — consistent with what the U.S. saw in 2008–09 and 1982.
  2. Toggle Okun's original slope = −2.0 on. The dashed reference is visibly steeper than the fitted line. The modern U.S. relationship is slightly weaker than Okun's 1962 original — structural changes in labor markets have softened the link.
  3. Hover the points in the lower-right cluster. These are recession years: 2008–09, 1982, 1974–75. Big unemployment rises paired with negative GDP growth — the textbook Okun pattern made concrete.

Take-away: Okun's rule-of-thumb survives decades and countries, but not at exactly the same slope — the relationship's stability matters more than the coefficient's precise value. Read §8.4 in the chapter →
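
What the prediction slider does is evaluate the fitted line at a chosen unemployment change. The sketch below takes the slope of −1.59 from the text; the intercept of 2.97 is an assumption backed out from the rounded figures above (a +3 point change giving about −1.8%), so treat the dashboard's Intercept card as the authoritative value. The function names are hypothetical.

```python
OKUN_SLOPE_FITTED = -1.59   # U.S. 1961-2019 estimate quoted in the text
# Assumption: implied by -1.8 ≈ b1 + (-1.59)(3); not read from the data.
INTERCEPT_IMPLIED = 2.97


def predicted_gdp_growth(urate_change,
                         intercept=INTERCEPT_IMPLIED,
                         slope=OKUN_SLOPE_FITTED):
    """Predicted real GDP growth (%) for a change in the unemployment rate (points)."""
    return intercept + slope * urate_change


def breakeven_urate_change(intercept=INTERCEPT_IMPLIED, slope=OKUN_SLOPE_FITTED):
    """Unemployment change at which predicted GDP growth is exactly zero."""
    return -intercept / slope


for change in (-1.0, 0.0, 1.0, 3.0):
    print(f"unemployment change {change:+.1f} -> "
          f"predicted growth {predicted_gdp_growth(change):+.2f}%")

print(f"zero-growth unemployment change: {breakeven_urate_change():+.2f} points")
```

Passing `slope=-2.0` reproduces the dashed reference line only if the same intercept is assumed for it, which is a further assumption of this sketch.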

Actual vs. Predicted GDP Growth Over Time

Okun's Law is stable — but for how long at a stretch? Slide the window to find the years where the model quietly stops working.

Structural breaks shift long-run relationships when something in the economy changes. Policy changes, technological shifts, and economic crises can reshape coefficients mid-sample. Visual inspection of actual vs. predicted values over time identifies periods when the model weakens or strengthens. Persistent runs of positive or negative residuals are red flags for a break.
What you can do here
  • Slide the start and end year to fit Okun's Law over any subperiod from 1961 to 2019.
  • Toggle Actual + Predicted to compare the two series over time.
  • Toggle Residuals to scan for runs of one sign — a telltale of a structural break.
Stat cards: Subperiod slope, Subperiod R², Years, Mean |residual|
Try this
  1. Set the range to 1961–2007 and note R² and slope. Then extend to 1961–2019. The fit barely moves: the relationship estimated on 1961–2007 carries over almost unchanged to the full sample.
  2. Set 2008–2019 only. The slope and R² drift from the full-sample numbers. The "jobless recovery" period left systematic prediction errors — a candidate structural break.
  3. Toggle to Residuals. Scan for runs of positive or negative values, especially after 2008. Persistent runs of one sign are the visual signature of serial correlation or a structural break.

Take-away: Time-series regressions are not static truths — scanning actual-vs-predicted plots is the cheapest way to catch a model that has quietly stopped working. Read §8.4 in the chapter →
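
Subperiod fitting and the scan for runs of same-sign residuals can be sketched as below. The helpers `fit_line`, `subperiod_fit`, and `longest_same_sign_run` are hypothetical names, and the twelve-year series is synthetic, built with a deliberate break halfway through so that the pattern is easy to see.

```python
def fit_line(x, y):
    """Return (intercept, slope) for a bivariate OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


def subperiod_fit(years, x, y, start, end):
    """Fit the line using only observations with start <= year <= end."""
    keep = [i for i, yr in enumerate(years) if start <= yr <= end]
    return fit_line([x[i] for i in keep], [y[i] for i in keep])


def longest_same_sign_run(residuals):
    """Length of the longest stretch of consecutive residuals sharing one sign."""
    best = run = 0
    prev_sign = 0
    for e in residuals:
        sign = (e > 0) - (e < 0)
        if sign != 0 and sign == prev_sign:
            run += 1
        else:
            run = 1 if sign != 0 else 0
        prev_sign = sign
        best = max(best, run)
    return best


# Synthetic series with a break after 2005: the line shifts from 3 - 2x to 1 - x.
years = list(range(2000, 2012))
du = [-1.0, 0.0, 1.0, -0.5, 0.5, 2.0] * 2
growth = [3.0 - 2.0 * d for d in du[:6]] + [1.0 - 1.0 * d for d in du[6:]]

b1_full, b2_full = fit_line(du, growth)
resid_full = [g - b1_full - b2_full * d for d, g in zip(du, growth)]
early = subperiod_fit(years, du, growth, 2000, 2005)
late = subperiod_fit(years, du, growth, 2006, 2011)

print(f"full sample slope: {b2_full:.2f}")
print(f"2000-2005 slope:   {early[1]:.2f}")
print(f"2006-2011 slope:   {late[1]:.2f}")
print(f"longest same-sign residual run: {longest_same_sign_run(resid_full)}")
```

In this constructed example the full-sample line sits between the two regimes, so it over-predicts in one period and under-predicts in the other. A long run of one sign cannot by itself distinguish a structural break from serial correlation, so it is a prompt for further checks rather than a verdict.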

Code Summary

You've explored the key concepts interactively; now reproduce them in code. These self-contained blocks, given below in Python, Stata, and R, cover everything you practiced above. Pick your language, copy the code, and run it.

# =============================================================================
# CHAPTER 8 CHEAT SHEET: Case Studies for Bivariate Regression
# =============================================================================

# --- Libraries ---
import pandas as pd                       # data loading and manipulation
import matplotlib.pyplot as plt           # creating plots and visualizations
import pyfixest as pf                     # fast OLS estimation with feols()
# !pip install pyfixest                   # uncomment if running in Google Colab

# =============================================================================
# STEP 1: Load OECD health data from a URL
# =============================================================================
# pd.read_stata() reads Stata .dta files — this dataset covers 34 OECD countries
url_health = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_HEALTH2009.DTA"
data_health = pd.read_stata(url_health)

print(f"Health dataset: {data_health.shape[0]} countries, {data_health.shape[1]} variables")

# =============================================================================
# STEP 2: Descriptive statistics — summarize before modeling
# =============================================================================
# .describe() gives mean, std, min, quartiles, max for each variable
print(data_health[['hlthpc', 'lifeexp', 'infmort', 'gdppc']].describe().round(2))

# =============================================================================
# STEP 3: Health outcomes regression with robust standard errors
# =============================================================================
# Does higher health spending improve life expectancy?
fit_life = pf.feols('lifeexp ~ hlthpc', data=data_health)

slope_life = fit_life.coef()['hlthpc']
r2_life    = fit_life._r2

print(f"Life expectancy: slope = {slope_life:.5f}, R² = {r2_life:.4f}")
print(f"Each extra $1,000 in spending → {slope_life*1000:.2f} more years of life expectancy")

# Robust standard errors adjust for non-constant error variance (heteroskedasticity)
fit_life_robust = pf.feols('lifeexp ~ hlthpc', data=data_health, vcov='HC1')
fit_life_robust.summary()

# =============================================================================
# STEP 4: Health spending vs GDP — income elasticity
# =============================================================================
# How much of health spending is driven by national income?
fit_gdp = pf.feols('hlthpc ~ gdppc', data=data_health)

slope_gdp = fit_gdp.coef()['gdppc']
r2_gdp    = fit_gdp._r2

# Income elasticity at the mean: (slope × mean_x) / mean_y
mean_gdp  = data_health['gdppc'].mean()
mean_hlth = data_health['hlthpc'].mean()
elasticity = (slope_gdp * mean_gdp) / mean_hlth

print(f"Health spending on GDP: slope = {slope_gdp:.4f}, R² = {r2_gdp:.4f}")
print(f"Income elasticity at the mean: {elasticity:.2f} (≈1.0 → normal good)")

# =============================================================================
# STEP 5: Outlier robustness — excluding USA and Luxembourg
# =============================================================================
# Two countries drive much of the model's "misfit" — test robustness by excluding them
data_subset = data_health[(data_health['code'] != 'USA') &
                          (data_health['code'] != 'LUX')]

fit_subset = pf.feols('hlthpc ~ gdppc', data=data_subset)

print(f"\nAll 34 countries:  slope = {slope_gdp:.4f}, R² = {r2_gdp:.4f}")
print(f"Excluding USA/LUX: slope = {fit_subset.coef()['gdppc']:.4f}, R² = {fit_subset._r2:.4f}")
print("Removing 2 of 34 countries transforms R² — always check for influential observations!")

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for ax, df, ft, title in zip(
        axes,
        [data_health, data_subset],
        [fit_gdp, fit_subset],
        ['All 34 Countries', 'Excluding USA & Luxembourg']):
    ax.scatter(df['gdppc'], df['hlthpc'], s=50, alpha=0.7)
    ax.plot(df['gdppc'], ft.predict(), color='red', linewidth=2)
    ax.set_xlabel('GDP per capita ($)')
    ax.set_ylabel('Health spending per capita ($)')
    ax.set_title(f'{title}  (R² = {ft._r2:.2f})')
    ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# =============================================================================
# STEP 6: CAPM — estimating Coca-Cola's beta (systematic risk)
# =============================================================================
# Beta measures how a stock's excess return co-moves with the market excess return
url_capm = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_CAPM.DTA"
data_capm = pd.read_stata(url_capm)

fit_capm = pf.feols('rko_rf ~ rm_rf', data=data_capm)

alpha = fit_capm.coef()['Intercept']     # excess return beyond CAPM prediction
beta  = fit_capm.coef()['rm_rf']         # systematic risk
r2_capm = fit_capm._r2

print(f"Coca-Cola CAPM: alpha = {alpha:.4f}, beta = {beta:.4f}, R² = {r2_capm:.4f}")
print("Beta < 1 → defensive stock (moves less than the market)")
print(f"R² = {r2_capm:.2%} explained by market; {1-r2_capm:.2%} is idiosyncratic risk")

# Full regression table
fit_capm.summary()

# =============================================================================
# STEP 7: Okun's Law — GDP growth vs unemployment change
# =============================================================================
# Okun (1962): each +1 point in unemployment → ≈ -2 points in GDP growth
url_gdp = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_GDPUNEMPLOY.DTA"
data_gdp = pd.read_stata(url_gdp)

fit_okun = pf.feols('rgdpgrowth ~ uratechange', data=data_gdp)

slope_okun = fit_okun.coef()['uratechange']
r2_okun    = fit_okun._r2

print(f"Okun's Law: slope = {slope_okun:.2f} (Okun's original: -2.0)")
print(f"R² = {r2_okun:.4f}: unemployment changes explain {r2_okun*100:.0f}% of GDP growth variation")

# Scatter plot with fitted line
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(data_gdp['uratechange'], data_gdp['rgdpgrowth'], s=50, alpha=0.7)
ax.plot(data_gdp['uratechange'], fit_okun.predict(), color='red', linewidth=2,
        label=f'Fitted: slope = {slope_okun:.2f}')
ax.axhline(y=0, color='gray', linestyle=':', linewidth=1, alpha=0.5)
ax.set_xlabel('Change in unemployment rate (percentage points)')
ax.set_ylabel('Real GDP growth (%)')
ax.set_title(f"Okun's Law: GDP Growth vs Unemployment Change  (R² = {r2_okun:.2f})")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
* =============================================================================
* CHAPTER 8 CHEAT SHEET: Case Studies for Bivariate Regression
* =============================================================================

* --- Setup ---
clear all                                // start with a clean workspace
set more off                             // do not pause output for long results

* =============================================================================
* STEP 1: Load OECD health data from a URL
* =============================================================================
* use loads a Stata .dta file; "clear" drops any data already in memory
use "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_HEALTH2009.DTA", clear

describe                                 // list all variables, types, and labels
display "Observations: " _N              // _N is Stata's built-in observation count

* =============================================================================
* STEP 2: Descriptive statistics — summarize before modeling
* =============================================================================
* summarize gives n, mean, std, min, max for key variables
summarize hlthpc lifeexp infmort gdppc

* "detail" adds median, skewness, kurtosis, and percentiles
summarize hlthpc, detail

* =============================================================================
* STEP 3: Health outcomes regression with robust standard errors
* =============================================================================
* Does higher health spending improve life expectancy?
* regress fits OLS: depvar followed by independent variables
regress lifeexp hlthpc

// After running regress, Stata stores results you can reference:
display "Slope on hlthpc: " _b[hlthpc]
display "R-squared:       " e(r2)
// "\$" prints a literal dollar sign; an unescaped "$1" would be read as a global macro
display "Each extra \$1,000 in spending → " _b[hlthpc]*1000 " more years of life"

// Robust standard errors adjust for heteroskedasticity (non-constant error variance)
// Adding ", robust" or ", vce(robust)" re-estimates with HC1 standard errors
regress lifeexp hlthpc, vce(robust)

* =============================================================================
* STEP 4: Health spending vs GDP — income elasticity
* =============================================================================
* How much of health spending is driven by national income?
regress hlthpc gdppc

// Income elasticity at the mean: (slope × mean_x) / mean_y
// After regress, _b[gdppc] holds the slope coefficient
summarize gdppc
local mean_gdp = r(mean)
summarize hlthpc
local mean_hlth = r(mean)
local elasticity = (_b[gdppc] * `mean_gdp') / `mean_hlth'

display "Income elasticity at the mean: " `elasticity' "  (≈1.0 → normal good)"

* =============================================================================
* STEP 5: Outlier robustness — excluding USA and Luxembourg
* =============================================================================
* Two countries drive much of the model's "misfit" — test robustness
* First run with all 34 countries (already done above), then exclude outliers

// "if" restricts the estimation sample without dropping data
regress hlthpc gdppc if code != "USA" & code != "LUX"

display "Excluding USA/LUX: slope = " _b[gdppc] ", R² = " e(r2)
display "Removing 2 of 34 countries transforms R² — always check for influential observations!"

// Scatter with fitted line — all countries
predict hlthpc_hat                       // fitted values from the last regression (the USA/LUX-excluded fit); lfit below refits its own line
twoway (scatter hlthpc gdppc, msize(small))                                  ///
       (lfit hlthpc gdppc, lcolor(red) lwidth(medthick)),                    ///
    title("All 34 Countries") xtitle("GDP per capita ($)")                   ///
    ytitle("Health spending per capita ($)") legend(off) name(all, replace)

// Scatter with fitted line — excluding USA and Luxembourg
twoway (scatter hlthpc gdppc if code != "USA" & code != "LUX", msize(small)) ///
       (lfit hlthpc gdppc if code != "USA" & code != "LUX",                  ///
            lcolor(red) lwidth(medthick)),                                    ///
    title("Excluding USA & Luxembourg") xtitle("GDP per capita ($)")         ///
    ytitle("Health spending per capita ($)") legend(off) name(excl, replace)

graph combine all excl, title("Outlier Robustness Check")

* =============================================================================
* STEP 6: CAPM — estimating Coca-Cola's beta (systematic risk)
* =============================================================================
* Beta measures how a stock's excess return co-moves with the market
use "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_CAPM.DTA", clear

regress rko_rf rm_rf

// _b[_cons] is the intercept (alpha); _b[rm_rf] is the slope (beta)
display "Alpha (excess return): " _b[_cons]
display "Beta (systematic risk): " _b[rm_rf]
display "R-squared: " e(r2)
display "Beta < 1 → defensive stock (moves less than the market)"

// Scatter plot with fitted line
twoway (scatter rko_rf rm_rf, msize(small) mcolor(navy%60))                  ///
       (lfit rko_rf rm_rf, lcolor(red) lwidth(medthick)),                    ///
    xtitle("Market excess return (%)") ytitle("Coca-Cola excess return (%)")  ///
    title("CAPM: Coca-Cola vs Market") legend(off)

* =============================================================================
* STEP 7: Okun's Law — GDP growth vs unemployment change
* =============================================================================
* Okun (1962): each +1 point in unemployment → ≈ -2 points in GDP growth
use "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_GDPUNEMPLOY.DTA", clear

regress rgdpgrowth uratechange

display "Okun's Law slope: " _b[uratechange] "  (Okun's original: -2.0)"
display "R-squared: " e(r2)

// predict fitted values and residuals for plotting
predict gdp_hat                          // fitted values from last regression
predict gdp_resid, residuals             // residuals = actual - fitted

// Scatter plot with fitted line
twoway (scatter rgdpgrowth uratechange, msize(small) mcolor(navy%60))        ///
       (lfit rgdpgrowth uratechange, lcolor(red) lwidth(medthick)),          ///
    xtitle("Change in unemployment rate (pp)")                                ///
    ytitle("Real GDP growth (%)")                                             ///
    title("Okun's Law: GDP Growth vs Unemployment Change")                    ///
    yline(0, lcolor(gray) lpattern(dot)) legend(off)
Paste into your Stata do-file editor
# =============================================================================
# CHAPTER 8 CHEAT SHEET: Case Studies for Bivariate Regression
# =============================================================================

# --- Libraries ---
library(haven)           # read Stata .dta files
library(fixest)          # fast OLS estimation with feols()
library(dplyr)           # data manipulation
library(ggplot2)         # grammar of graphics

# =============================================================================
# STEP 1: Load OECD health data from a URL
# =============================================================================
url_health <- "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_HEALTH2009.DTA"
data_health <- read_dta(url_health)

cat("Health dataset:", nrow(data_health), "countries,", ncol(data_health), "variables\n")

# =============================================================================
# STEP 2: Descriptive statistics — summarize before modeling
# =============================================================================
summary(data_health[, c("hlthpc", "lifeexp", "infmort", "gdppc")])

# =============================================================================
# STEP 3: Health outcomes regression with robust standard errors
# =============================================================================
# Does higher health spending improve life expectancy?
model_life <- feols(lifeexp ~ hlthpc, data = data_health, vcov = "HC1")
summary(model_life)

cat("Each extra $1,000 in spending \u2192",
    round(coef(model_life)["hlthpc"] * 1000, 2), "more years of life\n")

# =============================================================================
# STEP 4: Health spending vs GDP — income elasticity
# =============================================================================
model_gdp <- feols(hlthpc ~ gdppc, data = data_health)

# Income elasticity at the mean: (slope x mean_x) / mean_y
elasticity <- coef(model_gdp)["gdppc"] * mean(data_health$gdppc) /
              mean(data_health$hlthpc)
cat("Income elasticity at the mean:", round(elasticity, 2),
    "(\u2248 1.0 \u2192 normal good)\n")

# =============================================================================
# STEP 5: Outlier robustness — excluding USA and Luxembourg
# =============================================================================
data_subset <- data_health |> filter(!code %in% c("USA", "LUX"))
model_subset <- feols(hlthpc ~ gdppc, data = data_subset)

# r2() returns every R² variant by default; type = "r2" selects the plain one
cat("\nAll 34 countries:  slope =", round(coef(model_gdp)["gdppc"], 4),
    ", R\u00b2 =", round(r2(model_gdp, type = "r2"), 4), "\n")
cat("Excluding USA/LUX: slope =", round(coef(model_subset)["gdppc"], 4),
    ", R\u00b2 =", round(r2(model_subset, type = "r2"), 4), "\n")

# etable() from fixest compares models side by side
etable(model_gdp, model_subset, headers = c("All", "Excl. USA/LUX"))

# =============================================================================
# STEP 6: CAPM — estimating Coca-Cola's beta (systematic risk)
# =============================================================================
url_capm <- "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_CAPM.DTA"
data_capm <- read_dta(url_capm)

model_capm <- feols(rko_rf ~ rm_rf, data = data_capm)
summary(model_capm)

cat("Alpha (excess return):", round(coef(model_capm)["(Intercept)"], 4), "\n")
cat("Beta (systematic risk):", round(coef(model_capm)["rm_rf"], 4), "\n")
cat("Beta < 1 \u2192 defensive stock (moves less than the market)\n")

# =============================================================================
# STEP 7: Okun's Law — GDP growth vs unemployment change
# =============================================================================
url_gdp <- "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_GDPUNEMPLOY.DTA"
data_gdp <- read_dta(url_gdp)

model_okun <- feols(rgdpgrowth ~ uratechange, data = data_gdp)
summary(model_okun)

cat("Okun's Law slope:", round(coef(model_okun)["uratechange"], 2),
    "(Okun's original: -2.0)\n")

ggplot(data_gdp, aes(x = uratechange, y = rgdpgrowth)) +
  geom_point(color = "steelblue", size = 2, alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ x, color = "red",
              linewidth = 1.2, se = FALSE) +
  geom_hline(yintercept = 0, color = "gray", linetype = "dotted") +
  labs(x = "Change in unemployment rate (pp)", y = "Real GDP growth (%)",
       title = "Okun's Law: GDP Growth vs Unemployment Change") +
  theme_minimal()
Paste into your R console or RStudio