Chapter 07 of 18 · Interactive Dashboard

Statistical Inference for Bivariate Regression

Slide, toggle, and simulate to build intuition for t-statistics, confidence intervals, hypothesis testing, p-values, and robust standard errors.

t-Distribution Explorer

When the sample is small, how confident can we be about β₂? The normal bell curve overpromises precision: it does not account for the extra uncertainty from estimating σ itself. That is the job of the t-distribution.

The t-distribution is the small-sample cousin of the normal. It is bell-shaped and symmetric, but its tails are heavier because we have to estimate the population variance from the sample. It is indexed by degrees of freedom (df = n − 2 for bivariate regression), and converges to the normal as df grows; for n > 100 the difference is negligible.
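A quick way to see this convergence outside the dashboard is to compare critical values with scipy (a minimal sketch; the df grid is illustrative):

```python
from scipy import stats

# 97.5th percentile = the two-sided 5% critical value
z_crit = stats.norm.ppf(0.975)                  # about 1.96 for N(0,1)
for df in [1, 3, 10, 27, 100]:
    t_crit = stats.t.ppf(0.975, df)             # heavier tails -> larger cutoff
    print(f"df={df:>3}: t_crit={t_crit:.3f} vs z_crit={z_crit:.3f}")
```

As df rises, the t critical value falls toward 1.96, which is exactly what the slider shows.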
What you can do here
  • Slide degrees of freedom from 1 up to 100 and watch the t-curve melt into N(0,1).
  • Toggle "95% Tails" to see the two rejection regions.
  • Toggle "95% Center" to see the complementary confidence region.
df
5
t crit (2.5%)
—
z crit (N(0,1))
1.960
Difference
—
Variance df/(df-2)
—
Try this
  1. Set df = 3 (only n = 5 observations). The t critical value is ≈ 3.18 versus 1.96 for the normal. With tiny samples you need much more extreme t-values to reject.
  2. Slide df to 27 (the house-price sample). t crit ≈ 2.052, only ~5% above 1.96. At this sample size the t and normal are already close enough that either gives almost the same decision.
  3. Push df to 100. The two curves visually overlap. This is why large-sample theory happily uses the normal approximation.
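The "Variance df/(df-2)" readout can be cross-checked against scipy's exact t-distribution variance (a minimal sketch; the df values are illustrative):

```python
from scipy import stats

# A t-distribution with df > 2 has variance df/(df-2); it blows up as df -> 2
# and approaches 1 (the N(0,1) variance) as df grows.
for df in [3, 5, 27, 100]:
    exact = stats.t.var(df)              # scipy's exact variance
    formula = df / (df - 2)              # the dashboard's readout
    print(f"df={df:>3}: var={exact:.4f}  df/(df-2)={formula:.4f}")
```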

Take-away: When σ is estimated, the t-distribution is the honest ruler: heavy-tailed for small samples, normal-like for big ones. Read §7.2 in the chapter →

Confidence Interval Simulator

You ran one regression and got one confidence interval. So what does 95% actually mean: about this interval, or about the procedure?

A 95% confidence interval is a statement about the procedure, not this specific interval. Across many resamples, 95% of constructed intervals contain β₂. It does NOT mean there is a 95% probability that β₂ lies inside the one you computed: β₂ is fixed; the interval is the random thing. The formula is b₂ ± t(n−2, α/2) · se(b₂).
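The simulator's core loop can be sketched in a few lines (all parameters here, true slope 2, n = 30, σ = 5, 1,000 replications, are illustrative, not the dashboard's internals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta2, n, sigma, n_sims = 2.0, 30, 5.0, 1000
x = np.linspace(0, 10, n)
t_crit = stats.t.ppf(0.975, n - 2)                # 95% two-sided critical value
covered = 0
for _ in range(n_sims):
    y = 1.0 + beta2 * x + rng.normal(0, sigma, n)
    b2, b1 = np.polyfit(x, y, 1)                  # OLS slope and intercept
    resid = y - (b1 + b2 * x)
    se2 = np.sqrt(resid @ resid / (n - 2) / ((x - x.mean()) ** 2).sum())
    covered += (b2 - t_crit * se2 <= beta2 <= b2 + t_crit * se2)
print(f"Coverage over {n_sims} intervals: {covered / n_sims:.3f}")
```

The printed coverage should settle near 0.95, the procedure's long-run promise.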
What you can do here
  • Increase the number of CIs to see the long-run coverage settle down.
  • Shrink the sample size n to watch each interval get wider.
  • Dial the confidence level from 80% to 99% and see the trade-off between width and coverage.
  • Click Resimulate to draw a fresh batch of samples.
True slope
2.0000
Coverage rate
—
Avg CI width
—
Missed
—
Try this
  1. At n = 30 with 100 intervals, count the pink (missed) ones. You should see roughly 5; click Resimulate a few times. The miss rate jitters around 5, exactly what "95% coverage" promises.
  2. Drop n to 10. Intervals get much wider but coverage stays near 95%. The t-distribution automatically inflates the interval to compensate for small samples.
  3. Switch the confidence level to 80%. Intervals narrow sharply and about 20 in 100 miss. Width and coverage are two ends of the same lever.

Take-away: A CI is an instrument whose long-run hit rate you control: the parameter is fixed; the intervals are the random things. Read §7.3 in the chapter →

Hypothesis Testing Framework

Is a slope of $73.77/sq ft really different from a round number like $90? Or from zero? Hypothesis testing turns the question into a reproducible decision.

A hypothesis test is a reproducible rule for deciding about a parameter. Write the null H0 and alternative Ha, pick a significance level α, compute t = (b₂ − null) / se(b₂), turn t into a p-value, and reject H0 when p < α. Two-sided tests catch deviations either way; one-sided tests target a direction and have more power there at the cost of none in the other tail. We never "accept" H0; we only fail to reject it.
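The five-step recipe can be wrapped in a small helper (a sketch: `t_test` is a hypothetical name, and the inputs are the chapter's house-price values, b₂ = 73.77, se = 11.17, df = 27):

```python
from scipy import stats

def t_test(b2, se, null, df, alpha=0.05, tail="two"):
    """Return (t, p, decision) for H0: beta2 = null; 'tail' picks the alternative."""
    t = (b2 - null) / se
    if tail == "two":
        p = 2 * stats.t.sf(abs(t), df)       # both tails
    elif tail == "right":
        p = stats.t.sf(t, df)                # upper tail only
    else:
        p = stats.t.cdf(t, df)               # lower tail only
    return t, p, "Reject H0" if p < alpha else "Fail to reject H0"

print(t_test(73.77, 11.17, 0, 27))                  # size clearly matters
print(t_test(73.77, 11.17, 90, 27))                 # consistent with $90/sq ft
print(t_test(73.77, 11.17, 90, 27, tail="left"))    # one-sided p roughly halves
```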
What you can do here
  • Slide the null value to probe different claims: try 0 (is there any effect?) and 90 (does each sq ft add $90?).
  • Dial α between 1% and 10% to move the decision threshold.
  • Toggle Two-Sided / Right-Tail / Left-Tail to reshape the rejection region.
b2 (estimate)
73.77
se(b2)
11.17
t-statistic
—
p-value
—
Decision
—
Try this
  1. Set null = 0 (two-sided). t = 6.60 sits far in the tail; p ≈ 0. House size clearly matters for price.
  2. Set null = 90 (two-sided). t = −1.45, p = 0.158. We fail to reject: the data are consistent with a $90/sq ft effect.
  3. Keep null = 90 and switch to Left-Tail. The p-value halves to ~0.079. Directional tests are more powerful in their target direction, at the price of blindness in the other.

Take-away: A hypothesis test converts a claim about β₂ into a decision rule you can defend, but you can only ever fail to reject H0, never accept it. Read §7.4 in the chapter →

p-Value Explorer

What does "p < 0.05" actually mean? And why does moving the t-statistic a little shrink the p-value so much?

The p-value is the probability of seeing a test statistic at least this extreme if H0 were true. Small p means your data would be rare under H0: strong evidence against it. Rough thresholds: < 0.01 strong, < 0.05 moderate, < 0.10 weak. A small p does NOT mean H0 is false with that probability, and it says nothing about effect size.
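Behind the slider, the p-value is just a tail area under the t-distribution (a sketch using df = 27, the house-price degrees of freedom; the t grid is illustrative):

```python
from scipy import stats

df = 27
for t in [1.0, 1.96, 2.05, 3.0]:
    p_two = 2 * stats.t.sf(t, df)         # two-sided: both tails count
    p_one = stats.t.sf(t, df)             # one-sided: right tail only
    print(f"t={t:.2f}: two-sided p={p_two:.4f}, one-sided p={p_one:.4f}")
```

Note the two-sided p is always exactly double the one-sided p for the same |t|.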
What you can do here
  • Drag the t-statistic between −5 and 5 and watch the shaded area update live.
  • Change df to see how sample size shifts the curve shape.
  • Toggle the test type to switch between one- and two-tailed p-values.
|t|
—
p-value
—
Significant at 5%?
—
Significant at 1%?
—
Try this
  1. Set t = 1.96 with df = 27 (two-sided). The p-value lands just above 0.05. The t-tail needs slightly more extreme values than the normal does.
  2. Nudge t to 2.05 (the df = 27 critical value, 2.052, rounded). p lands essentially at 0.05, the boundary between "reject" and "fail to reject" at the 5% level.
  3. Drag t from 0 to 5 slowly. The p-value slides from 1.0 toward 0. The mapping is monotonic but very nonlinear: small changes near the critical value move p a lot.

Take-away: p measures how surprising your data would be under H0, not how likely H0 is, and not how big the effect is. Read §7.5 in the chapter →

Economic vs. Statistical Significance

A news headline says an effect is "statistically significant." Does that mean it matters in dollars? Not always: sample size can dress up a trivial effect or hide an important one.

Statistical and economic significance answer different questions. Statistical asks "is the effect different from zero?", driven by t = b₂/se(b₂), which grows with n. Economic asks "is the effect large enough to matter?", driven by the coefficient's size in context. Large n can make trivial effects significant; small n can hide real ones. Always report both the estimate and its confidence interval.
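A short simulation makes the point concrete (a sketch; the true slope of 0.10, σ = 5, the seed, and the n grid are all illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_slope, sigma = 0.10, 5.0
results = []
for n in [50, 500, 2000]:
    x = rng.uniform(0, 10, n)
    y = 1.0 + true_slope * x + rng.normal(0, sigma, n)
    b2, b1 = np.polyfit(x, y, 1)                  # OLS slope and intercept
    resid = y - (b1 + b2 * x)
    se2 = np.sqrt(resid @ resid / (n - 2) / ((x - x.mean()) ** 2).sum())
    p = 2 * stats.t.sf(abs(b2 / se2), n - 2)      # two-sided p for H0: slope = 0
    results.append((n, b2, se2, p))
    print(f"n={n:>5}: b2={b2:.3f}, se={se2:.4f}, p={p:.4f}")
```

The standard error shrinks roughly like 1/√n, so the same tiny slope eventually crosses any significance threshold even though its economic size never changes.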
What you can do here
  • Slide the true slope to set the population effect size.
  • Slide sample size n from 10 to 2,000; this is the key lever.
  • Adjust the error σ to change noise.
  • Click Resimulate for a fresh random draw.
True slope
—
Estimated b2
—
t-statistic
—
p-value
—
Significant?
—
Try this
  1. Set slope = 0.10 and n = 50, then push n to 2,000. The same tiny slope flips from non-significant to highly significant. "Significant" does not mean "big."
  2. Set slope = 2.0 and n = 10, then bump n to 30. The clearly large effect flips from often failing significance to almost always passing. Small samples can hide real effects.
  3. Compare slope = 0.05 at n = 2,000 against slope = 1.5 at n = 15. One is "significant but tiny"; the other is "large but not significant." The second finding is usually more useful for a real decision.

Take-away: Significance and magnitude are two different dials; always report the coefficient and its confidence interval, not just the p-value. Read §7.4 in the chapter →

Robust Standard Errors

What happens to a textbook standard error when the noise is bigger in some parts of the data than others? The short answer: it lies, and our p-values lie with it.

Heteroskedasticity-robust (HC1) standard errors stay valid even when the error variance changes with x. Default SEs assume homoskedasticity (constant Var[u|x]). When that fails, coefficients are still unbiased but the default SEs are wrong, invalidating tests and CIs. HC1 SEs work in both cases, which is why they are the modern best practice for cross-sectional data.
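A minimal simulation shows the contrast (a sketch: the fan-shaped error process, sample size, and seed are illustrative, not the dashboard's internals):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(1, 10, n)
u = rng.normal(0, 1, n) * (x ** 2) / 10       # error spread grows with x: a "fan"
df_sim = pd.DataFrame({"x": x, "y": 1.0 + 2.0 * x + u})

default = smf.ols("y ~ x", data=df_sim).fit()               # assumes homoskedasticity
robust = smf.ols("y ~ x", data=df_sim).fit(cov_type="HC1")  # heteroskedasticity-robust
print(f"default se(b2): {default.bse['x']:.4f}")
print(f"robust  se(b2): {robust.bse['x']:.4f}")
```

With this fan-shaped error process the robust SE typically comes out larger, which is the protection against overconfident p-values.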
What you can do here
  • Toggle Homoskedastic / Heteroskedastic to change how errors scale with x.
  • Slide sample size n to see how stable each SE is.
  • Dial severity to push the heteroskedastic case harder.
  • Click Resimulate for a fresh random draw.
Standard se(b2)
—
Robust se(b2)
—
Ratio (Robust/Std)
—
b2
—
True slope
2.0000
Try this
  1. Start on Homoskedastic. Standard and robust SEs are nearly identical; the ratio hugs 1.0. With constant variance, both formulas agree.
  2. Switch to Heteroskedastic at 100% severity. The residual plot opens into a fan. The robust SE is usually larger, which protects against false positives.
  3. Click Resimulate a few times under heteroskedasticity. Standard SEs bounce more; robust SEs are steadier. Using robust SEs is like paying for insurance: you only notice the value when the default fails.

Take-away: When the error variance can change with x, reach for robust (HC1) SEs; they cost you nothing when homoskedasticity holds. Read §7.7 in the chapter →

House Price Data: Full Inference

Twenty-nine houses, one regression line: how much can we actually say about the per-square-foot premium?

The six inference tools (t-distribution, standard errors, CIs, hypothesis tests, p-values, robust SEs) work together on real data. For these 29 Central Davis houses (1999): b₂ = $73.77/sq ft, se(b₂) = 11.17, 95% CI = [50.84, 96.70], robust SE = 11.33. The CI excludes 0 (the effect is real) but includes 90 (we cannot rule out a $90/sq ft premium).
What you can do here
  • Slide the null value to test different claims about $/sq ft.
  • Change the confidence level to tighten or widen the CI.
  • Toggle Standard / Robust to compare SE types on real data.
b2 (slope)
—
se(b2)
—
t-statistic
—
p-value
—
CI
—
R-squared
—
Try this
  1. Set null = 0. t = 6.60, p ≈ 0. House size predicts price beyond reasonable doubt.
  2. Set null = 90, then try null = 97. 90 sits inside the 95% CI, so we fail to reject; 97 is outside and the decision flips to "reject." The CI and the test point at the same boundary.
  3. Toggle between Standard and Robust SEs. The SE barely changes (11.17 vs 11.33), so the CI barely moves. Heteroskedasticity looks mild in this dataset; robust and standard agree.

Take-away: On real data, a narrow CI and a large-magnitude t-statistic tell the same story: house size matters, by about $74/sq ft. Read §7.1 in the chapter →

Python Libraries and Code

You've explored the key concepts interactively; now reproduce them in Python. This self-contained code block covers everything you practiced above. Copy it into an empty notebook and run it.

# =============================================================================
# CHAPTER 7 CHEAT SHEET: Statistical Inference for Bivariate Regression
# =============================================================================

# --- Libraries ---
import pandas as pd                       # data loading and manipulation
import matplotlib.pyplot as plt           # creating plots and visualizations
from statsmodels.formula.api import ols   # OLS regression with R-style formulas
from scipy import stats                   # t-distribution and critical values

# =============================================================================
# STEP 1: Load data directly from a URL
# =============================================================================
# pd.read_stata() reads Stata .dta files -- the house price dataset has 29 houses
url = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_HOUSE.DTA"
data_house = pd.read_stata(url)

print(f"Dataset: {data_house.shape[0]} observations, {data_house.shape[1]} variables")

# =============================================================================
# STEP 2: Estimate the regression and extract key statistics
# =============================================================================
# The t-statistic measures how many standard errors the estimate is from zero
model = ols('price ~ size', data=data_house).fit()

slope     = model.params['size']       # marginal effect: $/sq ft
intercept = model.params['Intercept']
se_slope  = model.bse['size']          # standard error of the slope
t_stat    = model.tvalues['size']      # t = b2 / se(b2)
p_value   = model.pvalues['size']      # two-sided p-value for H0: b2 = 0

print(f"Estimated equation: price = {intercept:,.0f} + {slope:.2f} x size")
print(f"Standard error of slope: {se_slope:.2f}")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.6f}")

# Full regression table (coefficients, std errors, t-stats, p-values, R2)
print(model.summary())

# =============================================================================
# STEP 3: Confidence interval -- a range of plausible values for b2
# =============================================================================
# CI = b2 +/- t_crit x se(b2), using T(n-2) distribution
n  = len(data_house)
df = n - 2                                        # degrees of freedom
t_crit = stats.t.ppf(0.975, df)                   # critical value for 95% CI

ci_lower = slope - t_crit * se_slope
ci_upper = slope + t_crit * se_slope

print(f"Degrees of freedom: {df}")
print(f"Critical t-value (alpha=0.05, two-sided): {t_crit:.4f}")
print(f"95% CI for slope: [{ci_lower:.2f}, {ci_upper:.2f}]")
print(f"Interpretation: each sq ft adds between ${ci_lower:.0f} and ${ci_upper:.0f} to price")

# =============================================================================
# STEP 4: Hypothesis tests -- does size matter? Is the effect $90/sq ft?
# =============================================================================
# Test 1: Statistical significance (H0: b2 = 0)
print(f"Test H0: b2 = 0  ->  t = {t_stat:.2f}, p = {p_value:.6f}  ->  Reject H0")

# Test 2: Two-sided test for a specific value (H0: b2 = 90)
null_value = 90
t_90 = (slope - null_value) / se_slope
p_90 = 2 * (1 - stats.t.cdf(abs(t_90), df))

print(f"Test H0: b2 = 90  ->  t = {t_90:.4f}, p = {p_90:.4f}  ->  Fail to reject H0")
print(f"  (90 is inside the 95% CI [{ci_lower:.2f}, {ci_upper:.2f}])")

# =============================================================================
# STEP 5: One-sided test -- does size increase price by less than $90/sq ft?
# =============================================================================
# H0: b2 >= 90  vs  Ha: b2 < 90 (lower-tailed test)
p_lower = stats.t.cdf(t_90, df)                   # one-sided p-value (left tail)

print(f"One-sided test H0: b2 >= 90 vs Ha: b2 < 90")
print(f"  t = {t_90:.4f}, one-sided p = {p_lower:.4f}")
print(f"  Fail to reject at 5% (p = {p_lower:.3f} > 0.05)")
print(f"  Would reject at 10% (p = {p_lower:.3f} < 0.10)")

# =============================================================================
# STEP 6: Robust standard errors -- valid with or without heteroskedasticity
# =============================================================================
# HC1 robust SEs protect against non-constant variance in the errors
robust_model = ols('price ~ size', data=data_house).fit(cov_type='HC1')

print(f"{'':20s} {'Standard':>12s} {'Robust (HC1)':>12s}")
print("-" * 46)
print(f"{'SE(size)':<20s} {se_slope:>12.2f} {robust_model.bse['size']:>12.2f}")
print(f"{'t-statistic':<20s} {t_stat:>12.2f} {robust_model.tvalues['size']:>12.2f}")
print(f"{'p-value':<20s} {p_value:>12.6f} {robust_model.pvalues['size']:>12.6f}")
Open empty Colab notebook →