p-values and Confidence Intervals: Linear Regression


🪄 Step 1: Intuition & Motivation

  • Core Idea: Now that our regression model is trained, we know how well it fits (thanks to $R^2$). But what if we want to know which features truly matter? That’s where p-values and confidence intervals step in — they help us separate real signal from random noise.

  • Simple Analogy: Imagine you’re listening to a band where each musician (feature) claims they’re important to the final song (the prediction). P-values help you decide which musicians are actually playing loud enough to make a real difference, and which are just pretending to strum.


🌱 Step 2: Core Concept

What’s Happening Under the Hood?

Each regression coefficient $\beta_j$ represents the influence of a particular feature on the target, holding the other features fixed.
But is this influence real or just due to chance?

We test this through hypothesis testing:

  • Null Hypothesis ($H_0$): The coefficient $\beta_j = 0$ (feature has no effect).
  • Alternative Hypothesis ($H_1$): The coefficient $\beta_j \neq 0$ (feature does affect the target).

We use the p-value to decide:

If the p-value is small (usually < 0.05), the evidence against $H_0$ is strong — meaning that feature likely matters.

Why It Works This Way

When we estimate coefficients, we’re using sample data — and samples always have randomness.
If a coefficient looks large, we ask: Could this have happened by random chance?

By comparing the estimated coefficient to its estimated standard error, we compute a t-statistic — which measures “how many standard errors away” our estimate is from zero.
A large t-statistic → small p-value → statistically significant.
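As a sketch of that computation (the numbers below are made up for illustration), the t-statistic and its two-sided p-value can be derived with `scipy.stats`:

```python
from scipy import stats

# Hypothetical values for illustration: an estimated coefficient,
# its standard error, and the residual degrees of freedom (n - p - 1).
beta_hat = 2.5
se = 0.8
df = 47

t_stat = beta_hat / se                     # "how many standard errors from zero"
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value under H0

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Here $t = 3.125$, far enough from zero that the p-value falls well below 0.05.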

How It Fits in ML Thinking
P-values are the bridge between statistical inference and machine learning.
In classical statistics, we ask: “Is this feature truly significant?”
In modern ML, we ask: “Does this feature consistently improve prediction performance?”
Different questions, same roots — both assess trust in relationships between variables.

📐 Step 3: Mathematical Foundation

t-Statistic and p-value

For each coefficient $\beta_j$:

$$ t_j = \frac{\hat{\beta_j}}{SE(\hat{\beta_j})} $$

Where:

  • $\hat{\beta_j}$ = estimated coefficient
  • $SE(\hat{\beta_j})$ = standard error of that coefficient

We then find the probability of observing a $t_j$ this extreme under $H_0$ (assuming $\beta_j = 0$).
That probability is the p-value.

Interpretation:

  • Small p-value (< 0.05): Strong evidence against $H_0$.
  • Large p-value: The feature may not have a meaningful effect.

The p-value measures "how surprised" we'd be to see this coefficient if the feature truly had no impact.
Smaller p = "Wow, this effect seems too strong to be random!"
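The full pipeline (estimate → standard error → t-statistic → p-value) can be verified on simulated data where one feature genuinely drives the target and another is pure noise; all numbers below are fabricated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data: x1 truly drives y (true beta = 3), x2 is pure noise.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])  # design matrix with intercept
p = X.shape[1]

# OLS estimates via the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual variance and the standard error of each coefficient
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - p)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

t_stats = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p)

for name, t, pv in zip(["intercept", "x1", "x2"], t_stats, p_values):
    print(f"{name}: t = {t:.2f}, p = {pv:.3g}")
```

The real feature's p-value is astronomically small, while the noise feature's p-value is large; the test recovers which "musician" is actually playing.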
Confidence Interval (CI)

$$ CI_j = \hat{\beta_j} \pm t^* \cdot SE(\hat{\beta_j}) $$

Where $t^*$ is the critical value from the t-distribution (depends on desired confidence, e.g., 95%).

Interpretation:
A 95% CI means: If we repeated this experiment many times, 95% of those intervals would contain the true coefficient.

If 0 lies inside the CI → coefficient might not be significant.
If 0 lies outside → effect likely real.

A confidence interval is like a range of reasonable guesses for the true value of $\beta$.
Wide interval = uncertain estimate.
Narrow interval = precise, confident estimate.
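A minimal sketch of the interval formula, using made-up numbers for the estimate, its standard error, and the degrees of freedom:

```python
from scipy import stats

# Hypothetical values for illustration
beta_hat = 2.5
se = 0.8
df = 47

t_crit = stats.t.ppf(0.975, df)   # critical value t* for a 95% interval
lower = beta_hat - t_crit * se
upper = beta_hat + t_crit * se

contains_zero = lower <= 0 <= upper
print(f"95% CI: [{lower:.2f}, {upper:.2f}]  contains 0: {contains_zero}")
```

Since 0 falls outside this interval, the corresponding two-sided test at the 5% level would reject $H_0$; the CI and the p-value always agree in this way.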

🧠 Step 4: Key Ideas and Assumptions

1️⃣ Sampling variability:
Coefficients are estimates — if data changes, so do their values.

2️⃣ Normality of residuals:
P-values assume errors are roughly normally distributed.

3️⃣ Independence:
If features are highly correlated (multicollinearity), standard errors inflate → p-values become unreliable.

4️⃣ Significance vs. importance:
Statistical significance ≠ practical importance.
A feature might be significant but have a tiny effect size.
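The multicollinearity point (3️⃣) can be seen directly by fitting the same target twice, once with an independent second feature and once with a near-duplicate of the first; all data here is simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

def coef_se(X, y):
    """Standard errors of the OLS coefficients for design matrix X."""
    n_obs, n_par = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n_obs - n_par)
    return np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)

x2_indep = rng.normal(size=n)              # unrelated to x1
x2_dup = x1 + 0.05 * rng.normal(size=n)    # nearly identical to x1

se_indep = coef_se(np.column_stack([np.ones(n), x1, x2_indep]), y)
se_dup = coef_se(np.column_stack([np.ones(n), x1, x2_dup]), y)

print("SE of x1's coefficient, independent x2:", se_indep[1])
print("SE of x1's coefficient, collinear  x2:", se_dup[1])
```

The standard error of $\hat{\beta_1}$ blows up in the collinear fit, which is exactly why its p-value stops being trustworthy.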


⚖️ Step 5: Strengths, Limitations & Trade-offs

Strengths:

  • Quantifies certainty about feature effects.
  • Helps identify which predictors are truly relevant.
  • Works well when assumptions (normality, independence) hold.

Limitations:

  • Overreliance on p-values can mislead (especially with large datasets).
  • Sensitive to sample size — large $n$ can make even trivial effects "significant."
  • Assumes perfect model specification (which rarely holds in ML).

P-values give interpretability and confidence, but modern ML often favors validation metrics (e.g., RMSE, cross-validation) over pure significance testing.
Both views are valuable — one focuses on truth, the other on usefulness.
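The sample-size caveat is easy to demonstrate: a trivially small true slope (0.02 here, all data simulated) becomes "significant" once $n$ is large enough:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def slope_p_value(n, true_slope=0.02):
    """Two-sided p-value for the slope in simple linear regression."""
    x = rng.normal(size=n)
    y = true_slope * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))[1]
    t = beta[1] / se
    return 2 * stats.t.sf(abs(t), n - 2)

p_small_n = slope_p_value(100)
p_large_n = slope_p_value(1_000_000)

print("n = 100:       p =", p_small_n)
print("n = 1,000,000: p =", p_large_n)
```

The effect size (0.02) never changed; only the sample size did. Statistical significance says nothing about practical importance.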

🚧 Step 6: Common Misunderstandings

  • “A small p-value means a strong effect.”
    Not necessarily — it just means the effect is unlikely to be zero, not that it’s large.

  • “Non-significant = feature is useless.”
    No — maybe the data is noisy, or the sample size is too small.

  • “P-values tell you model quality.”
    They don’t. They only assess individual coefficients, not overall model performance.


🧩 Step 7: Mini Summary

🧠 What You Learned: P-values test whether each regression coefficient likely represents a real effect, while confidence intervals show a plausible range for that effect.

⚙️ How It Works: Compute $t_j = \frac{\hat{\beta_j}}{SE(\hat{\beta_j})}$, derive a p-value, and check whether 0 lies in the confidence interval.

🎯 Why It Matters: These concepts help distinguish statistical noise from genuine signal — the foundation of rigorous model interpretation.
