Chapter 07 of 18 · Interactive Dashboard

Statistical Inference for Bivariate Regression

Slide, toggle, and simulate to build intuition for t-statistics, confidence intervals, hypothesis testing, p-values, and robust standard errors.

t-Distribution Explorer

When the sample is small, how confident can we be about β₂? The normal bell curve overpromises precision: it does not account for the extra uncertainty from estimating σ itself. That is the job of the t-distribution.

The t-distribution is the small-sample cousin of the normal. It is bell-shaped and symmetric, but its tails are heavier because we have to estimate the population variance from the sample. It is indexed by degrees of freedom (df = n − 2 for bivariate regression), and converges to the normal as df grows; for n > 100 the difference is negligible.
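A quick way to see this convergence outside the dashboard is to compare critical values with scipy (a minimal sketch; the df grid is illustrative):

```python
from scipy import stats

# 97.5th percentile = the two-sided 5% critical value
z_crit = stats.norm.ppf(0.975)                  # about 1.96 for N(0,1)
for df in [1, 3, 10, 27, 100]:
    t_crit = stats.t.ppf(0.975, df)             # heavier tails -> larger cutoff
    print(f"df={df:>3}: t_crit={t_crit:.3f} vs z_crit={z_crit:.3f}")
```

As df rises, the t critical value falls toward 1.96, which is exactly what the slider shows.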
What you can do here
  • Slide degrees of freedom from 1 up to 100 and watch the t-curve melt into N(0,1).
  • Toggle "95% Tails" to see the two rejection regions.
  • Toggle "95% Center" to see the complementary confidence region.
df
5
t crit (2.5%)
—
z crit (N(0,1))
1.960
Difference
—
Variance df/(df-2)
—
Try this
  1. Set df = 3 (only n = 5 observations). The t critical value is ≈ 3.18 versus 1.96 for the normal. With tiny samples you need much more extreme t-values to reject.
  2. Slide df to 27 (the house-price sample). t crit ≈ 2.052, only ~5% above 1.96. At this sample size the t and normal are already close enough that either gives almost the same decision.
  3. Push df to 100. The two curves visually overlap. This is why large-sample theory happily uses the normal approximation.
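The "Variance df/(df-2)" readout can be cross-checked against scipy's exact t-distribution variance (a minimal sketch; the df values are illustrative):

```python
from scipy import stats

# A t-distribution with df > 2 has variance df/(df-2); it blows up as df -> 2
# and approaches 1 (the N(0,1) variance) as df grows.
for df in [3, 5, 27, 100]:
    exact = stats.t.var(df)              # scipy's exact variance
    formula = df / (df - 2)              # the dashboard's readout
    print(f"df={df:>3}: var={exact:.4f}  df/(df-2)={formula:.4f}")
```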

Take-away: When σ is estimated, the t-distribution is the honest ruler: heavy-tailed for small samples, normal-like for big ones. Read §7.2 in the chapter →

Confidence Interval Simulator

You ran one regression and got one confidence interval. So what does 95% actually mean: about this interval, or about the procedure?

A 95% confidence interval is a statement about the procedure, not this specific interval. Across many resamples, 95% of constructed intervals contain β₂. It does NOT mean there is a 95% probability that β₂ lies inside the one you computed: β₂ is fixed; the interval is the random thing. The formula is b₂ ± t(n−2, α/2) · se(b₂).
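The simulator's core loop can be sketched in a few lines (all parameters here, true slope 2, n = 30, σ = 5, 1,000 replications, are illustrative, not the dashboard's internals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta2, n, sigma, n_sims = 2.0, 30, 5.0, 1000
x = np.linspace(0, 10, n)
t_crit = stats.t.ppf(0.975, n - 2)                # 95% two-sided critical value
covered = 0
for _ in range(n_sims):
    y = 1.0 + beta2 * x + rng.normal(0, sigma, n)
    b2, b1 = np.polyfit(x, y, 1)                  # OLS slope and intercept
    resid = y - (b1 + b2 * x)
    se2 = np.sqrt(resid @ resid / (n - 2) / ((x - x.mean()) ** 2).sum())
    covered += (b2 - t_crit * se2 <= beta2 <= b2 + t_crit * se2)
print(f"Coverage over {n_sims} intervals: {covered / n_sims:.3f}")
```

The printed coverage should settle near 0.95, the procedure's long-run promise.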
What you can do here
  • Increase the number of CIs to see the long-run coverage settle down.
  • Shrink the sample size n to watch each interval get wider.
  • Dial the confidence level from 80% to 99% and see the trade-off between width and coverage.
  • Click Resimulate to draw a fresh batch of samples.
True slope
2.0000
Coverage rate
—
Avg CI width
—
Missed
—
Try this
  1. At n = 30 with 100 intervals, count the pink (missed) ones. You should see roughly 5; click Resimulate a few times. The miss rate jitters around 5, exactly what "95% coverage" promises.
  2. Drop n to 10. Intervals get much wider but coverage stays near 95%. The t-distribution automatically inflates the interval to compensate for small samples.
  3. Switch the confidence level to 80%. Intervals narrow sharply and about 20 in 100 miss. Width and coverage are two ends of the same lever.

Take-away: A CI is an instrument whose long-run hit rate you control: the parameter is fixed; the intervals are the random things. Read §7.3 in the chapter →

Hypothesis Testing Framework

Is a slope of $73.77/sq ft really different from a round number like $90? Or from zero? Hypothesis testing turns the question into a reproducible decision.

A hypothesis test is a reproducible rule for deciding about a parameter. Write the null H0 and alternative Ha, pick a significance level α, compute t = (b₂ − null) / se(b₂), turn t into a p-value, and reject H0 when p < α. Two-sided tests catch deviations either way; one-sided tests target a direction and have more power there at the cost of none in the other tail. We never "accept" H0; we only fail to reject it.
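The five-step recipe can be wrapped in a small helper (a sketch: `t_test` is a hypothetical name, and the inputs are the chapter's house-price values, b₂ = 73.77, se = 11.17, df = 27):

```python
from scipy import stats

def t_test(b2, se, null, df, alpha=0.05, tail="two"):
    """Return (t, p, decision) for H0: beta2 = null; 'tail' picks the alternative."""
    t = (b2 - null) / se
    if tail == "two":
        p = 2 * stats.t.sf(abs(t), df)       # both tails
    elif tail == "right":
        p = stats.t.sf(t, df)                # upper tail only
    else:
        p = stats.t.cdf(t, df)               # lower tail only
    return t, p, "Reject H0" if p < alpha else "Fail to reject H0"

print(t_test(73.77, 11.17, 0, 27))                  # size clearly matters
print(t_test(73.77, 11.17, 90, 27))                 # consistent with $90/sq ft
print(t_test(73.77, 11.17, 90, 27, tail="left"))    # one-sided p roughly halves
```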
What you can do here
  • Slide the null value to probe different claims: try 0 (is there any effect?) and 90 (does each sq ft add $90?).
  • Dial α between 1% and 10% to move the decision threshold.
  • Toggle Two-Sided / Right-Tail / Left-Tail to reshape the rejection region.
b2 (estimate)
73.77
se(b2)
11.17
t-statistic
—
p-value
—
Decision
—
Try this
  1. Set null = 0 (two-sided). t = 6.60 sits far in the tail; p ≈ 0. House size clearly matters for price.
  2. Set null = 90 (two-sided). t = −1.45, p = 0.158. We fail to reject: the data are consistent with a $90/sq ft effect.
  3. Keep null = 90 and switch to Left-Tail. The p-value halves to ~0.079. Directional tests are more powerful in their target direction, at the price of blindness in the other.

Take-away: A hypothesis test converts a claim about β₂ into a decision rule you can defend, but you can only ever fail to reject H0, never accept it. Read §7.4 in the chapter →

p-Value Explorer

What does "p < 0.05" actually mean? And why does moving the t-statistic a little shrink the p-value so much?

The p-value is the probability of seeing a test statistic at least this extreme if H0 were true. Small p means your data would be rare under H0: strong evidence against it. Rough thresholds: < 0.01 strong, < 0.05 moderate, < 0.10 weak. A small p does NOT mean H0 is false with that probability, and it says nothing about effect size.
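Behind the slider, the p-value is just a tail area under the t-distribution (a sketch using df = 27, the house-price degrees of freedom; the t grid is illustrative):

```python
from scipy import stats

df = 27
for t in [1.0, 1.96, 2.05, 3.0]:
    p_two = 2 * stats.t.sf(t, df)         # two-sided: both tails count
    p_one = stats.t.sf(t, df)             # one-sided: right tail only
    print(f"t={t:.2f}: two-sided p={p_two:.4f}, one-sided p={p_one:.4f}")
```

Note the two-sided p is always exactly double the one-sided p for the same |t|.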
What you can do here
  • Drag the t-statistic between −5 and 5 and watch the shaded area update live.
  • Change df to see how sample size shifts the curve shape.
  • Toggle the test type to switch between one- and two-tailed p-values.
|t|
—
p-value
—
Significant at 5%?
—
Significant at 1%?
—
Try this
  1. Set t = 1.96 with df = 27 (two-sided). The p-value lands just above 0.05. The t-tail needs slightly more extreme values than the normal does.
  2. Nudge t to 2.05 (the df = 27 critical value, 2.052, rounded). p lands essentially at 0.05, the boundary between "reject" and "fail to reject" at the 5% level.
  3. Drag t from 0 to 5 slowly. The p-value slides from 1.0 toward 0. The mapping is monotonic but very nonlinear: small changes near the critical value move p a lot.

Take-away: p measures how surprising your data would be under H0, not how likely H0 is, and not how big the effect is. Read §7.5 in the chapter →

Economic vs. Statistical Significance

A news headline says an effect is "statistically significant." Does that mean it matters in dollars? Not always: sample size can dress up a trivial effect or hide an important one.

Statistical and economic significance answer different questions. Statistical asks "is the effect different from zero?", driven by t = b₂/se(b₂), which grows with n. Economic asks "is the effect large enough to matter?", driven by the coefficient's size in context. Large n can make trivial effects significant; small n can hide real ones. Always report both the estimate and its confidence interval.
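A short simulation makes the point concrete (a sketch; the true slope of 0.10, σ = 5, the seed, and the n grid are all illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_slope, sigma = 0.10, 5.0
results = []
for n in [50, 500, 2000]:
    x = rng.uniform(0, 10, n)
    y = 1.0 + true_slope * x + rng.normal(0, sigma, n)
    b2, b1 = np.polyfit(x, y, 1)                  # OLS slope and intercept
    resid = y - (b1 + b2 * x)
    se2 = np.sqrt(resid @ resid / (n - 2) / ((x - x.mean()) ** 2).sum())
    p = 2 * stats.t.sf(abs(b2 / se2), n - 2)      # two-sided p for H0: slope = 0
    results.append((n, b2, se2, p))
    print(f"n={n:>5}: b2={b2:.3f}, se={se2:.4f}, p={p:.4f}")
```

The standard error shrinks roughly like 1/√n, so the same tiny slope eventually crosses any significance threshold even though its economic size never changes.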
What you can do here
  • Slide the true slope to set the population effect size.
  • Slide sample size n from 10 to 2,000; this is the key lever.
  • Adjust the error σ to change noise.
  • Click Resimulate for a fresh random draw.
True slope
—
Estimated b2
—
t-statistic
—
p-value
—
Significant?
—
Try this
  1. Set slope = 0.10 and n = 50, then push n to 2,000. The same tiny slope flips from non-significant to highly significant. "Significant" does not mean "big."
  2. Set slope = 2.0 and n = 10, then bump n to 30. The clearly large effect flips from often failing significance to almost always passing. Small samples can hide real effects.
  3. Compare slope = 0.05 at n = 2,000 against slope = 1.5 at n = 15. One is "significant but tiny"; the other is "large but not significant." The second finding is usually more useful for a real decision.

Take-away: Significance and magnitude are two different dials; always report the coefficient and its confidence interval, not just the p-value. Read §7.4 in the chapter →

Robust Standard Errors

What happens to a textbook standard error when the noise is bigger in some parts of the data than others? The short answer: it lies, and our p-values lie with it.

Heteroskedasticity-robust (HC1) standard errors stay valid even when the error variance changes with x. Default SEs assume homoskedasticity (constant Var[u|x]). When that fails, coefficients are still unbiased but the default SEs are wrong, invalidating tests and CIs. HC1 SEs work in both cases, which is why they are the modern best practice for cross-sectional data.
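A minimal simulation shows the contrast (a sketch: the fan-shaped error process, sample size, and seed are illustrative, not the dashboard's internals):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(1, 10, n)
u = rng.normal(0, 1, n) * (x ** 2) / 10       # error spread grows with x: a "fan"
df_sim = pd.DataFrame({"x": x, "y": 1.0 + 2.0 * x + u})

default = smf.ols("y ~ x", data=df_sim).fit()               # assumes homoskedasticity
robust = smf.ols("y ~ x", data=df_sim).fit(cov_type="HC1")  # heteroskedasticity-robust
print(f"default se(b2): {default.bse['x']:.4f}")
print(f"robust  se(b2): {robust.bse['x']:.4f}")
```

With this fan-shaped error process the robust SE typically comes out larger, which is the protection against overconfident p-values.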
What you can do here
  • Toggle Homoskedastic / Heteroskedastic to change how errors scale with x.
  • Slide sample size n to see how stable each SE is.
  • Dial severity to push the heteroskedastic case harder.
  • Click Resimulate for a fresh random draw.
Standard se(b2)
—
Robust se(b2)
—
Ratio (Robust/Std)
—
b2
—
True slope
2.0000
Try this
  1. Start on Homoskedastic. Standard and robust SEs are nearly identical; the ratio hugs 1.0. With constant variance, both formulas agree.
  2. Switch to Heteroskedastic at 100% severity. The residual plot opens into a fan. The robust SE is usually larger, which protects against false positives.
  3. Click Resimulate a few times under heteroskedasticity. Standard SEs bounce more; robust SEs are steadier. Using robust SEs is like paying for insurance: you only notice the value when the default fails.

Take-away: When the error variance can change with x, reach for robust (HC1) SEs; they cost you nothing when homoskedasticity holds. Read §7.7 in the chapter →

House Price Data: Full Inference

Twenty-nine houses, one regression line: how much can we actually say about the per-square-foot premium?

The six inference tools (t-distribution, standard errors, CIs, hypothesis tests, p-values, robust SEs) work together on real data. For these 29 Central Davis houses (1999): b₂ = $73.77/sq ft, se(b₂) = 11.17, 95% CI = [50.84, 96.70], robust SE = 11.33. The CI excludes 0 (the effect is real) but includes 90 (we cannot rule out a $90/sq ft premium).
What you can do here
  • Slide the null value to test different claims about $/sq ft.
  • Change the confidence level to tighten or widen the CI.
  • Toggle Standard / Robust to compare SE types on real data.
b2 (slope)
—
se(b2)
—
t-statistic
—
p-value
—
CI
—
R-squared
—
Try this
  1. Set null = 0. t = 6.60, p ≈ 0. House size predicts price beyond reasonable doubt.
  2. Set null = 90, then try null = 97. 90 sits inside the 95% CI, so we fail to reject; 97 is outside and the decision flips to "reject." The CI and the test point at the same boundary.
  3. Toggle between Standard and Robust SEs. The SE barely changes (11.17 vs 11.33), so the CI barely moves. Heteroskedasticity looks mild in this dataset; robust and standard agree.

Take-away: On real data, a narrow CI and a large-magnitude t-statistic tell the same story: house size matters, by about $74/sq ft. Read §7.1 in the chapter →

Python Libraries and Code

You've explored the key concepts interactively; now reproduce them in Python. This self-contained code block covers everything you practiced above. Copy it into an empty notebook and run it.

# =============================================================================
# CHAPTER 7 CHEAT SHEET: Statistical Inference for Bivariate Regression
# =============================================================================

# --- Libraries ---
import pandas as pd                       # data loading and manipulation
import matplotlib.pyplot as plt           # creating plots and visualizations
from statsmodels.formula.api import ols   # OLS regression with R-style formulas
from scipy import stats                   # t-distribution and critical values

# =============================================================================
# STEP 1: Load data directly from a URL
# =============================================================================
# pd.read_stata() reads Stata .dta files -- the house price dataset has 29 houses
url = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_HOUSE.DTA"
data_house = pd.read_stata(url)

print(f"Dataset: {data_house.shape[0]} observations, {data_house.shape[1]} variables")

# =============================================================================
# STEP 2: Estimate the regression and extract key statistics
# =============================================================================
# The t-statistic measures how many standard errors the estimate is from zero
model = ols('price ~ size', data=data_house).fit()

slope     = model.params['size']       # marginal effect: $/sq ft
intercept = model.params['Intercept']
se_slope  = model.bse['size']          # standard error of the slope
t_stat    = model.tvalues['size']      # t = b2 / se(b2)
p_value   = model.pvalues['size']      # two-sided p-value for H0: b2 = 0

print(f"Estimated equation: price = {intercept:,.0f} + {slope:.2f} x size")
print(f"Standard error of slope: {se_slope:.2f}")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.6f}")

# Full regression table (coefficients, std errors, t-stats, p-values, R2)
print(model.summary())

# =============================================================================
# STEP 3: Confidence interval -- a range of plausible values for b2
# =============================================================================
# CI = b2 +/- t_crit x se(b2), using T(n-2) distribution
n  = len(data_house)
df = n - 2                                        # degrees of freedom
t_crit = stats.t.ppf(0.975, df)                   # critical value for 95% CI

ci_lower = slope - t_crit * se_slope
ci_upper = slope + t_crit * se_slope

print(f"Degrees of freedom: {df}")
print(f"Critical t-value (alpha=0.05, two-sided): {t_crit:.4f}")
print(f"95% CI for slope: [{ci_lower:.2f}, {ci_upper:.2f}]")
print(f"Interpretation: each sq ft adds between ${ci_lower:.0f} and ${ci_upper:.0f} to price")

# =============================================================================
# STEP 4: Hypothesis tests -- does size matter? Is the effect $90/sq ft?
# =============================================================================
# Test 1: Statistical significance (H0: b2 = 0)
print(f"Test H0: b2 = 0  ->  t = {t_stat:.2f}, p = {p_value:.6f}  ->  Reject H0")

# Test 2: Two-sided test for a specific value (H0: b2 = 90)
null_value = 90
t_90 = (slope - null_value) / se_slope
p_90 = 2 * (1 - stats.t.cdf(abs(t_90), df))

print(f"Test H0: b2 = 90  ->  t = {t_90:.4f}, p = {p_90:.4f}  ->  Fail to reject H0")
print(f"  (90 is inside the 95% CI [{ci_lower:.2f}, {ci_upper:.2f}])")

# =============================================================================
# STEP 5: One-sided test -- does size increase price by less than $90/sq ft?
# =============================================================================
# H0: b2 >= 90  vs  Ha: b2 < 90 (lower-tailed test)
p_lower = stats.t.cdf(t_90, df)                   # one-sided p-value (left tail)

print(f"One-sided test H0: b2 >= 90 vs Ha: b2 < 90")
print(f"  t = {t_90:.4f}, one-sided p = {p_lower:.4f}")
print(f"  Fail to reject at 5% (p = {p_lower:.3f} > 0.05)")
print(f"  Would reject at 10% (p = {p_lower:.3f} < 0.10)")

# =============================================================================
# STEP 6: Robust standard errors -- valid with or without heteroskedasticity
# =============================================================================
# HC1 robust SEs protect against non-constant variance in the errors
robust_model = ols('price ~ size', data=data_house).fit(cov_type='HC1')

print(f"{'':20s} {'Standard':>12s} {'Robust (HC1)':>12s}")
print("-" * 46)
print(f"{'SE(size)':<20s} {se_slope:>12.2f} {robust_model.bse['size']:>12.2f}")
print(f"{'t-statistic':<20s} {t_stat:>12.2f} {robust_model.tvalues['size']:>12.2f}")
print(f"{'p-value':<20s} {p_value:>12.6f} {robust_model.pvalues['size']:>12.6f}")
Open empty Colab notebook →