Chapter 04 of 18 · Interactive Dashboard

Statistical Inference for the Mean

Explore confidence intervals, hypothesis testing, and the t-distribution through interactive controls — using the same datasets and examples as the book.

Standard Error & Sample Size

How precise is our $41k estimate of average earnings? And how much more data would you need to halve that uncertainty?

The standard error se(X̄) = s/√n measures the precision of the sample mean as an estimate of μ. It quantifies sampling uncertainty — smaller SEs mean more precise estimates. Because SE shrinks with √n, not with n, quadrupling the sample size is required to halve the standard error.
What you can do here
  • Slide the sample size n from 10 to 500 and watch SE and the 95% CI width update.
  • Compare to the book's n = 171 — the default position of the slider.
  • Hover the curve to read SE at any point along the 1/√n shape.
[Stat cards: n = 171 · SD (s) · SE (s/√n) · approx 95% CI width]
Try This
  1. Set n = 171 (the book's sample), then set n = 684 (4× larger). SE halves from ~$1,952 to ~$976 — the √n rule in action: halving the SE costs four times more data.
  2. Set n = 50. SE jumps to ~$3,611 and the approximate 95% CI widens to ~$14,400 — under-sampling doesn't just widen the interval slightly, it nearly doubles it.
  3. Set n = 400. SE drops to ~$1,276 — collecting 229 more observations shaves only ~$676 off the SE, the diminishing-returns shoulder of the 1/√n curve.

Take-away: SE = s/√n quantifies precision; doubling precision always costs four times more data. Read §4.1 in the chapter →
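The √n rule is easy to verify in a few lines of Python. This is a sketch: the sample SD below is back-solved from the dashboard's SE of ~$1,952 at n = 171, so treat it as an approximation rather than the book's exact value.

```python
import numpy as np

# Assumed sample SD, back-solved from the dashboard's SE (~$1,952 at n = 171)
s = 25_527.0

for n in [50, 171, 400, 684]:
    se = s / np.sqrt(n)                    # SE shrinks with sqrt(n), not n
    print(f"n = {n:4d}: SE = ${se:,.0f}")

# Quadrupling n (171 -> 684) halves the SE: the ratio is sqrt(4) = 2
print((s / np.sqrt(171)) / (s / np.sqrt(684)))
```

Note that going from n = 171 to n = 400 more than doubles the sample but cuts the SE by only about a third, which is the diminishing-returns shoulder the dashboard curve shows.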

t-Distribution Explorer

Why do we need a whole family of bell curves — one for every sample size? Because estimating σ from the sample adds uncertainty the normal curve can't see.

The t-distribution is used when σ is unknown and estimated from the sample. It resembles the standard normal N(0, 1) but with fatter tails that reflect the extra uncertainty from using s in place of σ. As n grows, the t-distribution approaches the normal — which is why the normal is a good approximation for large samples (the familiar "n ≥ 30" rule of thumb).
What you can do here
  • Toggle degrees of freedom between df = 4, 10, 30, and 100.
  • Toggle the Normal N(0,1) overlay to compare the two curves directly.
  • Watch the t-critical value in the stat cards โ€” it shrinks toward 1.96 as df grows.
[Stat cards: df = 30 · t-crit (5%, two-sided) · z-crit (Normal) = 1.960 · tail area beyond ±1.96]
Try This
  1. Set df = 4 with the Normal overlay on. The t-tails are visibly fatter and t-crit = 2.78 (vs 1.96 for the normal) — with tiny samples, you need a much larger t-statistic to reject at 5%.
  2. Set df = 30. t-crit drops to ~2.04, just barely above 1.96 — the classic "n ≥ 30 is large enough" rule is exactly this small gap becoming negligible.
  3. Set df = 100. The two curves are visually indistinguishable and tail area beyond ±1.96 is ~5% — for reasonable samples, the normal is a perfectly usable stand-in for the t.

Take-away: The t-distribution pays for estimating σ with fatter tails and refunds the cost as n grows. Read §4.2 in the chapter →
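The same convergence can be read off numerically rather than visually. This short sketch prints the two-sided 5% critical value and the t-tail area beyond ±1.96 for the dashboard's four df settings:

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)                 # normal critical value, 1.960
for df in [4, 10, 30, 100]:
    t_crit = stats.t.ppf(0.975, df)            # two-sided 5% critical value
    tail = 2 * stats.t.sf(1.96, df)            # t probability beyond +/-1.96
    print(f"df = {df:3d}: t-crit = {t_crit:.3f}, tail beyond ±1.96 = {tail:.3f}")
```

The critical value falls from 2.776 at df = 4 toward the normal's 1.960, and the tail area decays toward 5%, which is exactly what the overlay toggle shows graphically.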

Confidence Interval Visualizer

A point estimate of $41k is a single number. What range of μ values is the data actually compatible with — and how does that range change with our confidence requirement?

A confidence interval provides a range of plausible values for μ. A 95% CI means: if we repeated the sampling procedure many times, approximately 95% of the resulting intervals would contain the true μ. The interval is constructed as X̄ ± t_(α/2) × se(X̄) — wider intervals indicate less precision.
What you can do here
  • Slide the confidence level from 80% to 99% and watch the CI expand or contract.
  • Toggle between Female Earnings and Gas Prices to see how SE differences reshape the CI.
  • Compare the margin of error and CI width in the stat cards at each setting.
[Stat cards: confidence level = 95% · t-critical · margin of error · CI lower · CI upper · width]
Try This
  1. At 95%, note the CI for Female Earnings: [$37,559, $45,266] (width $7,707). Push to 99%. The CI widens to about $10,171 — a 4-point jump in confidence buys you a 30% wider interval.
  2. Drop the confidence level to 90%. The CI narrows to ~$6,457 — you gain precision but accept that roughly 1 in 10 such intervals would miss the true μ.
  3. Switch to Gas Prices at 95%. The CI is only about $0.11 wide because SE is tiny, and $3.81 (the CA average) sits far outside — the data rejects the state average before any formal test is run.

Take-away: Confidence intervals trade precision for confidence — the interval widens every time you demand a stronger claim. Read §4.3 in the chapter →
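The confidence-vs-width trade can be reproduced from summary statistics alone. The mean and SD below are back-solved (approximately) from the dashboard's Female Earnings cards, so treat them as assumptions rather than the book's exact figures:

```python
import numpy as np
from scipy import stats

# Assumed summary stats from the dashboard cards: n = 171,
# mean ~= $41,413, sample SD ~= $25,527 (so SE ~= $1,952)
n, xbar, s = 171, 41_413.0, 25_527.0
se = s / np.sqrt(n)

for conf in [0.90, 0.95, 0.99]:
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, n - 1)   # critical value from t(n-1)
    margin = t_crit * se
    print(f"{conf:.0%} CI: [${xbar - margin:,.0f}, ${xbar + margin:,.0f}]"
          f"  width ${2 * margin:,.0f}")
```

Stepping from 95% to 99% widens the interval by roughly a third, matching Try This #1 above.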

Two-Sided Hypothesis Test

Is the population earnings mean really $40,000 — or is our $41k sample mean too far off for that story to hold?

A hypothesis test evaluates whether data provide sufficient evidence to reject a specific claim (H0) about a parameter. The t-statistic measures how many standard errors the estimate is from the hypothesized value. The p-value is the probability of observing a result at least as extreme as ours if H0 were true. Small p-values (< α, typically 0.05) provide evidence against H0 and lead us to reject it.
What you can do here
  • Slide the hypothesized value μ0 from $30k to $55k.
  • Watch the t-statistic, p-value, and REJECT / FAIL-TO-REJECT badge update live.
  • See the shaded rejection regions on the t-distribution shift as μ0 moves.
[Stat cards: μ0 = $40,000 · SE · t-statistic · p-value · ±t-crit (5%) · decision badge]
Try This
  1. Set μ0 = $40,000 (the book's example). t ≈ 0.72, p ≈ 0.470 — the t-marker sits well inside the white region, so we fail to reject: $40,000 remains a plausible μ.
  2. Drag μ0 down toward $35,000 and watch p. The μ0 where p first crosses 0.05 is exactly the 95% CI endpoint — tests and CIs are the same object seen from opposite sides.
  3. Set μ0 = $50,000. t is sharply negative, p is tiny, and the decision flips to REJECT — $50k is too far from the sample mean to survive as a population mean.

Take-away: A hypothesis test is just a confidence interval read in reverse — the values you'd reject are the ones outside the interval. Read §4.4 in the chapter →
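Both Try This outcomes can be checked from summary statistics. The mean and SE below are approximate values read off the dashboard, assumed for illustration:

```python
from scipy import stats

# Approximate dashboard values: n = 171, mean ~= $41,413, SE ~= $1,952
n, xbar, se = 171, 41_413.0, 1_952.0

for mu0 in [40_000, 50_000]:
    t_stat = (xbar - mu0) / se                    # SEs from the hypothesized value
    p_val = 2 * stats.t.sf(abs(t_stat), n - 1)    # two-sided p-value
    verdict = "REJECT" if p_val < 0.05 else "fail to reject"
    print(f"H0: mu = ${mu0:,}: t = {t_stat:+.2f}, p = {p_val:.4f} -> {verdict}")
```

$40,000 survives (t ≈ 0.72, p ≈ 0.47) while $50,000 is rejected decisively, the same flip the slider produces.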

One-Sided Hypothesis Test

If theory says earnings should be higher than $40k — not just different — can we extract more statistical power from the same data?

One-sided tests concentrate the rejection region in one tail of the distribution, making them more powerful for detecting effects in the specified direction. Use them when theory predicts a specific direction. The one-sided p-value is exactly half the two-sided p-value (when the effect is in the predicted direction), and critical values are smaller — e.g. 1.65 vs ±1.96 at α = 0.05.
What you can do here
  • Slide μ0 to set the hypothesized value.
  • Toggle the alternative direction between Ha: μ > μ0 and Ha: μ < μ0.
  • Compare the one-sided and two-sided p-values side by side in the stat cards.
[Stat cards: μ0 = $40,000 · t-statistic · p-value (one-sided) · p-value (two-sided) · t-crit (one-sided, 5%) · decision badge]
Try This
  1. Set μ0 = $40,000, Ha: μ > $40,000. One-sided p ≈ 0.235, exactly half of the two-sided 0.470 — neither test is significant, but the directional version uses its power efficiently by only looking right.
  2. Set μ0 = $38,000 with Ha: μ > $38,000. The one-sided test rejects at 5% while the two-sided barely hangs on — the directional version's "extra power" is real and visible here.
  3. Toggle to Ha: μ < μ0 while the t-stat is positive. p jumps toward 1.0 — testing the wrong direction leaves you defenseless: the data offer essentially no support for your alternative.

Take-away: One-sided tests halve the p-value in the predicted direction but leave you unable to detect effects that go the other way. Read §4.6 in the chapter →
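The halving relationship, and the penalty for testing the wrong direction, both follow directly from the tail areas. A sketch using the dashboard's default test (t ≈ 0.72 with n = 171, so df = 170):

```python
from scipy import stats

df = 170
t_stat = 0.72                              # t from the dashboard's mu0 = $40,000 test

p_two = 2 * stats.t.sf(abs(t_stat), df)    # both tails
p_upper = stats.t.sf(t_stat, df)           # Ha: mu > mu0 (predicted direction)
p_lower = stats.t.cdf(t_stat, df)          # Ha: mu < mu0 (wrong direction here)

print(f"two-sided p           = {p_two:.3f}")
print(f"one-sided p (mu > mu0) = {p_upper:.3f}  (exactly two-sided / 2)")
print(f"one-sided p (mu < mu0) = {p_lower:.3f}  (large: wrong direction)")
```

Because the upper and lower tail probabilities sum to one, a positive t-statistic makes the lower-tailed p-value large, which is why toggling the direction in Try This #3 pushes p toward 1.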

Gasoline Prices: Statistical vs Practical Significance

Yolo County gas is $0.14/gallon cheaper than the California average — statistically significant at p < 0.0001. But is that $2 per fill-up worth crossing town for?

Even small practical differences can be statistically significant when the sample gives precise estimates. The gasoline-price gap of $0.14 looks trivial, but the standard error is tiny ($0.027), so the t-statistic is large (−5.26) — the difference sits many standard errors from zero. Statistical significance answers "Is there a difference?"; practical significance asks "Does the difference matter?" Both questions matter in econometrics.
What you can do here
  • Slide the hypothesized CA state average from $3.40 to $4.10.
  • Watch the t-statistic, p-value, and REJECT / FAIL badge update live.
  • Compare H0 to the sample mean ($3.67) — the closer they are, the larger p becomes.
[Stat cards: H0 = $3.81 · sample mean · SE · t-statistic · p-value · decision badge]
Try This
  1. Set H0 = $3.81 (the book's CA average). t = −5.26, p < 0.0001, REJECT — the tiny SE ($0.027) turns a $0.14 gap into overwhelming statistical evidence.
  2. Slide H0 toward $3.67 (the sample mean). p grows steeply; it crosses 0.05 near the 95% CI endpoints around $3.62 and $3.72 — inside that narrow window, the data fails to reject.
  3. Set H0 = $3.67 exactly. t ≈ 0, p ≈ 1.0 — we cannot reject that μ equals our own sample mean, a reassuring sanity check on the machinery.

Take-away: With a small SE, tiny effects become statistically significant — always ask whether the difference also matters practically. Read §4.5 in the chapter →
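The statistical-vs-practical contrast fits in a few lines. The summary values below are approximations read off the dashboard cards, and since the gas sample size isn't displayed here, the normal approximation is used for the p-value:

```python
from scipy import stats

# Approximate dashboard values: sample mean $3.67, SE $0.027, CA average $3.81
mean_gas, se_gas, mu0 = 3.67, 0.027, 3.81

z = (mean_gas - mu0) / se_gas        # the gap is about -5.2 standard errors
p = 2 * stats.norm.sf(abs(z))        # far below 0.0001: statistically significant
savings = (mu0 - mean_gas) * 15      # practically: about $2.10 on a 15-gallon fill

print(f"z = {z:.2f}, p = {p:.1e}, savings per tank = ${savings:.2f}")
```

The test screams "real difference" while the dollar figure asks whether anyone should care, which is exactly the distinction this section draws.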

Inference for Proportions

A poll says 52.1% plan to vote Democrat. Is that a real lead — or just sampling noise dressed up as a majority?

Proportions are simply means of 0/1 binary data, so the whole inference toolkit carries over. Confidence intervals, hypothesis tests, and standard errors for proportions use the same logic as for means — only the SE formula changes: se(p̂) = √(p̂(1−p̂)/n), which follows from the variance of a binary variable. The sample proportion p̂ is literally the sample mean of the 0/1 codings.
What you can do here
  • Slide the sample proportion p̂ from 0.30 to 0.70.
  • Slide the sample size n from 50 to 5,000 and watch SE shrink.
  • Slide the hypothesized proportion p0 to test any null value.
  • Watch the z-statistic, p-value, 95% CI, and decision badge update together.
[Stat cards: p̂ = 0.52 · n = 921 · p₀ = 0.50 · SE · 95% CI lower · 95% CI upper · z-statistic · p-value · decision badge]
Try This
  1. Keep defaults (p̂ = 0.52, n = 921, p0 = 0.50). This matches the book's voter example: SE ≈ 0.0165, 95% CI ≈ [48.9%, 55.3%] — since 50% sits inside, we fail to reject and the "majority" is within sampling noise.
  2. Keep p̂ = 0.52 and push n to 2,500. The CI tightens enough that 50% now falls outside — the same observed proportion becomes statistically significant purely because SE shrank with √n.
  3. Set p̂ = 0.50 exactly. z = 0, p = 1.0, and the CI centers on p0 — when the estimate equals the hypothesis, there is nothing to reject.

Take-away: Proportions are means of 0/1 data — the whole inference toolkit (SE, CI, z-tests) carries over, with only the SE formula changing. Read §4.7 in the chapter →
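Try This #2's significance flip can be reproduced directly. The sketch uses the voter counts from this chapter's example (480 of 921) and assumes, as the slider does, that the larger sample keeps the same p̂:

```python
import numpy as np

# Voter example: 480 of 921 respondents, p_hat ~= 0.521; the n = 2,500
# case assumes the same observed proportion at the larger sample size
p_hat, p0 = 480 / 921, 0.50

for n in [921, 2500]:
    se = np.sqrt(p_hat * (1 - p_hat) / n)        # SE formula for a proportion
    lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
    verdict = "excludes" if (p0 < lo or p0 > hi) else "includes"
    print(f"n = {n}: SE = {se:.4f}, 95% CI = [{lo:.3f}, {hi:.3f}] -> {verdict} 50%")
```

Nothing about the observed proportion changes between the two rows; only the SE shrinks, and that alone moves 50% from inside the interval to outside it.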

Python Libraries and Code

You've explored the key concepts interactively — now reproduce them in Python. This self-contained code block covers everything you practiced above. Copy it into an empty notebook and run it.

# =============================================================================
# CHAPTER 4 CHEAT SHEET: Statistical Inference for the Mean
# =============================================================================

# --- Libraries ---
import numpy as np                       # numerical operations (sqrt, linspace)
import pandas as pd                      # data loading and manipulation
import matplotlib.pyplot as plt          # creating plots and visualizations
from scipy import stats                  # t-distribution and normal distribution

# =============================================================================
# STEP 1: Load data directly from a URL
# =============================================================================
# pd.read_stata() reads Stata .dta files — this sample has 171 female workers
url = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_EARNINGS.DTA"
data_earnings = pd.read_stata(url)
earnings = data_earnings['earnings']

print(f"Dataset: {len(earnings)} observations")

# =============================================================================
# STEP 2: Sample statistics and standard error
# =============================================================================
# The standard error measures the precision of the sample mean as an estimate
# of the population mean. It shrinks with sqrt(n), not n — so quadrupling the
# sample size is needed to halve the SE.
n             = len(earnings)
mean_earnings = earnings.mean()
std_earnings  = earnings.std(ddof=1)           # ddof=1 for sample std dev
se_earnings   = std_earnings / np.sqrt(n)      # standard error = s / sqrt(n)

print(f"Sample mean:        ${mean_earnings:,.2f}")
print(f"Standard deviation: ${std_earnings:,.2f}")
print(f"Standard error:     ${se_earnings:,.2f}")

# =============================================================================
# STEP 3: t-distribution vs standard normal
# =============================================================================
# When sigma is unknown (estimated by s), we use the t-distribution instead of
# the normal. It has fatter tails — more probability in the extremes — but
# approaches the normal as n grows (the "n >= 30" rule of thumb).
x = np.linspace(-4, 4, 200)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, stats.norm.pdf(x), '--', linewidth=2, label='Standard Normal')
ax.plot(x, stats.t.pdf(x, df=4), linewidth=2, label='t(4) — fatter tails')
ax.plot(x, stats.t.pdf(x, df=30), linewidth=2, label='t(30) — nearly normal')
ax.set_xlabel('t value')
ax.set_ylabel('Density')
ax.set_title('t-Distribution Approaches Normal as Degrees of Freedom Increase')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# =============================================================================
# STEP 4: Confidence interval โ€” a range of plausible values for mu
# =============================================================================
# A 95% CI means: if we repeatedly sampled and built CIs, about 95% would
# contain the true mu. Formula: x-bar +/- t_crit * SE
alpha  = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)    # critical value from t(n-1)
margin = t_crit * se_earnings

ci_lower = mean_earnings - margin
ci_upper = mean_earnings + margin

print(f"Critical value (t_170, 0.025): {t_crit:.4f}")
print(f"Margin of error:               ${margin:,.2f}")
print(f"95% CI: [${ci_lower:,.2f}, ${ci_upper:,.2f}]")

# Compare 90%, 95%, and 99% — higher confidence requires wider intervals
print(f"\n{'Level':<8} {'Lower':>12} {'Upper':>12} {'Width':>10}")
print("-" * 44)
for conf in [0.90, 0.95, 0.99]:
    tc = stats.t.ppf(1 - (1 - conf) / 2, n - 1)
    lo = mean_earnings - tc * se_earnings
    hi = mean_earnings + tc * se_earnings
    print(f"{conf*100:.0f}%{lo:>14,.2f}{hi:>14,.2f}{hi - lo:>12,.2f}")

# =============================================================================
# STEP 5: Two-sided hypothesis test โ€” H0: mu = $40,000
# =============================================================================
# The t-statistic measures how many standard errors the estimate is from the
# hypothesized value. The p-value is the probability of a result at least as
# extreme as ours if H0 were true.
mu0    = 40000
t_stat = (mean_earnings - mu0) / se_earnings
p_val  = 2 * (1 - stats.t.cdf(abs(t_stat), n - 1))  # two-sided p-value

print(f"H0: mu = ${mu0:,}  vs  Ha: mu != ${mu0:,}")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value:     {p_val:.4f}")
print(f"Decision:    {'Reject H0' if p_val < 0.05 else 'Do not reject H0'} at alpha = 0.05")

# One-sided (upper-tailed): H0: mu <= 40000 vs Ha: mu > 40000
p_val_one = 1 - stats.t.cdf(t_stat, n - 1)            # upper tail only
t_crit_one = stats.t.ppf(0.95, n - 1)                  # one-sided critical value

print(f"\nOne-sided (upper): p-value = {p_val_one:.4f} (= two-sided / 2)")
print(f"One-sided critical value:    {t_crit_one:.4f} (vs two-sided {t_crit:.4f})")

# =============================================================================
# STEP 6: Statistical vs practical significance โ€” gasoline prices
# =============================================================================
# A $0.14 price gap looks small, but the SE is tiny — so the t-statistic is
# huge and p < 0.0001. Statistical significance says "the difference is real";
# practical significance asks "does it matter?"
url_gas   = "https://raw.githubusercontent.com/quarcs-lab/data-open/master/AED/AED_GASPRICE.DTA"
data_gas  = pd.read_stata(url_gas)
price     = data_gas['price']

n_gas     = len(price)
mean_gas  = price.mean()
se_gas    = price.std(ddof=1) / np.sqrt(n_gas)
mu0_gas   = 3.81                                        # CA state average
t_gas     = (mean_gas - mu0_gas) / se_gas
p_gas     = 2 * (1 - stats.t.cdf(abs(t_gas), n_gas - 1))

print(f"Gas price test: H0: mu = ${mu0_gas}")
print(f"Sample mean: ${mean_gas:.4f}   SE: ${se_gas:.4f}")
print(f"t = {t_gas:.4f},  p = {p_gas:.6f}")
print(f"Decision: {'Reject H0' if p_gas < 0.05 else 'Do not reject H0'}")
print(f"Practical: ${mu0_gas - mean_gas:.2f}/gallon * 15 gal = ${(mu0_gas - mean_gas)*15:.2f} per tank")

# =============================================================================
# STEP 7: Inference for proportions โ€” binary (0/1) data
# =============================================================================
# Proportions are just means of 0/1 variables. The same CI and hypothesis-test
# logic applies — only the SE formula changes: sqrt(p_hat*(1-p_hat)/n).
n_voters  = 921
n_dem     = 480
p_hat     = n_dem / n_voters
se_prop   = np.sqrt(p_hat * (1 - p_hat) / n_voters)

# 95% confidence interval (use z for large-sample proportions)
ci_lo = p_hat - 1.96 * se_prop
ci_hi = p_hat + 1.96 * se_prop

# Hypothesis test: H0: p = 0.50
p0        = 0.50
se_h0     = np.sqrt(p0 * (1 - p0) / n_voters)
z_stat    = (p_hat - p0) / se_h0
p_val_z   = 2 * (1 - stats.norm.cdf(abs(z_stat)))

print(f"Sample proportion: {p_hat:.4f} ({p_hat*100:.1f}%)")
print(f"SE: {se_prop:.4f}")
print(f"95% CI: [{ci_lo:.4f}, {ci_hi:.4f}]")
print(f"\nH0: p = {p0}  z = {z_stat:.4f}  p-value = {p_val_z:.4f}")
print(f"Decision: {'Reject H0' if abs(z_stat) > 1.96 else 'Do not reject H0'}")
Open empty Colab notebook →