p-values and Confidence Intervals: Linear Regression
🪄 Step 1: Intuition & Motivation
Core Idea: Now that our regression model is trained, we know how well it fits (thanks to $R^2$). But what if we want to know which features truly matter? That’s where p-values and confidence intervals step in — they help us separate real signal from random noise.
Simple Analogy: Imagine you’re listening to a band where each musician (feature) claims they’re important to the final song (the prediction). P-values help you decide which musicians are actually playing loud enough to make a real difference, and which are just pretending to strum.
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
Each regression coefficient $\beta_j$ represents the influence of a particular feature on the target, assuming others are fixed.
But is this influence real or just due to chance?
We test this through hypothesis testing:
- Null Hypothesis ($H_0$): The coefficient $\beta_j = 0$ (feature has no effect).
- Alternative Hypothesis ($H_1$): The coefficient $\beta_j \neq 0$ (feature does affect the target).
We use the p-value to decide:
If the p-value is small (usually < 0.05), the evidence against $H_0$ is strong — meaning that feature likely matters.
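A minimal sketch of this decision rule, assuming the `statsmodels` library and purely synthetic data (the feature names and effect sizes below are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: y depends on x1 (real effect) but not on x2 (pure noise).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 + rng.normal(scale=2.0, size=n)  # x2 is deliberately irrelevant

X = sm.add_constant(np.column_stack([x1, x2]))  # add an intercept column
model = sm.OLS(y, X).fit()

# One p-value per coefficient: small for x1 (real signal), large for x2 (noise).
print(model.pvalues)          # order: const, x1, x2
print(model.pvalues < 0.05)   # the "does this feature likely matter?" check
```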
Why It Works This Way
When we estimate coefficients, we’re using sample data — and samples always have randomness.
If a coefficient looks large, we ask: Could this have happened by random chance?
By comparing the estimated coefficient to its estimated standard error, we compute a t-statistic — which measures “how many standard errors away” our estimate is from zero.
A large t-statistic → small p-value → statistically significant.
How It Fits in ML Thinking
In classical statistics, we ask: “Is this feature truly significant?”
In modern ML, we ask: “Does this feature consistently improve prediction performance?”
Different questions, same roots — both assess trust in relationships between variables.
📐 Step 3: Mathematical Foundation
t-Statistic and p-value
For each coefficient $\beta_j$:
$$ t_j = \frac{\hat{\beta_j}}{SE(\hat{\beta_j})} $$
Where:
- $\hat{\beta_j}$ = estimated coefficient
- $SE(\hat{\beta_j})$ = standard error of that coefficient
We then find the probability of observing a $t_j$ this extreme under $H_0$ (assuming $\beta_j = 0$).
That probability is the p-value.
Interpretation:
- Small p-value (< 0.05): Strong evidence against $H_0$.
- Large p-value: Feature may not have a meaningful effect.
Smaller p = “Wow, this effect seems too strong to be random!”
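To mirror the formula directly, here is a small sketch (again on illustrative synthetic data, assuming `statsmodels` and `scipy`) that recomputes $t_j$ and the two-sided p-value by hand from the fitted model's estimates and standard errors:

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

# Illustrative fit: true coefficients are (intercept=1.0, beta1=3.0, beta2=0.0).
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 3.0, 0.0]) + rng.normal(scale=2.0, size=200)
model = sm.OLS(y, X).fit()

beta_hat = model.params      # estimated coefficients
se = model.bse               # their standard errors
t_stat = beta_hat / se       # t_j = beta_hat_j / SE(beta_hat_j)

# Two-sided p-value under H0: beta_j = 0, using the t-distribution
# with the model's residual degrees of freedom.
p_value = 2 * stats.t.sf(np.abs(t_stat), df=model.df_resid)

print(np.round(t_stat, 3))
print(np.round(p_value, 4))  # should match model.pvalues
```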
Confidence Interval (CI)
$$ \hat{\beta_j} \pm t^* \cdot SE(\hat{\beta_j}) $$
Where $t^*$ is the critical value from the t-distribution (depends on the desired confidence level, e.g., 95%).
Interpretation:
A 95% CI means: If we repeated this experiment many times, 95% of those intervals would contain the true coefficient.
If 0 lies inside the CI → coefficient might not be significant.
If 0 lies outside → effect likely real.
Wide interval = uncertain estimate.
Narrow interval = precise, confident estimate.
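A hedged sketch of both routes, building the 95% interval by hand from $\hat{\beta_j} \pm t^* \cdot SE(\hat{\beta_j})$ and reading it directly from `statsmodels`, on the same kind of illustrative data:

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 3.0, 0.0]) + rng.normal(scale=2.0, size=200)
model = sm.OLS(y, X).fit()

# Critical value t* for a 95% two-sided interval.
t_star = stats.t.ppf(0.975, df=model.df_resid)

lower = model.params - t_star * model.bse
upper = model.params + t_star * model.bse
print(np.column_stack([lower, upper]))   # manual CI
print(model.conf_int(alpha=0.05))        # built-in CI (should match)

# True where 0 is excluded from the interval, i.e. the effect is likely real.
print((lower > 0) | (upper < 0))
```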
🧠 Step 4: Key Ideas and Assumptions
1️⃣ Sampling variability:
Coefficients are estimates — if data changes, so do their values.
2️⃣ Normality of residuals:
P-values assume errors are roughly normally distributed.
3️⃣ Independence:
If features are highly correlated (multicollinearity), standard errors inflate → p-values become unreliable (a quick VIF check is sketched after this list).
4️⃣ Significance vs. importance:
Statistical significance ≠ practical importance.
A feature might be significant but have a tiny effect size.
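One hedged way to check assumption 3️⃣ before trusting p-values is the variance inflation factor (VIF); a sketch assuming `statsmodels` and an illustrative feature matrix where two columns are nearly duplicates:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative features: x2 is nearly a copy of x1, so the two are highly correlated.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.05, size=300)
x3 = rng.normal(size=300)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF well above ~10 is a common rule of thumb for problematic collinearity;
# inflated VIFs mean inflated standard errors and unreliable p-values.
for j in range(1, X.shape[1]):   # skip the intercept column
    print(f"feature {j}: VIF = {variance_inflation_factor(X, j):.1f}")
```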
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Quantifies uncertainty about feature effects.
- Helps identify which predictors are truly relevant.
- Works well when assumptions (normality, independence) hold.
Limitations:
- Overreliance on p-values can mislead (especially with large datasets).
- Sensitive to sample size — large $n$ can make even trivial effects “significant.”
- Assumes correct model specification (which rarely holds in ML).
Trade-off: Classical statistics leans on significance testing, but modern ML often favors validation metrics (e.g., RMSE, cross-validation); a brief comparison is sketched below.
Both views are valuable — one focuses on truth, the other on usefulness.
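For contrast, a small sketch of the ML-flavored check mentioned above, assuming scikit-learn: rather than asking whether a coefficient is significant, it asks whether including a feature improves cross-validated RMSE (the data and feature choices are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)            # irrelevant feature
y = 3.0 * x1 + rng.normal(scale=2.0, size=n)

def cv_rmse(X):
    # 5-fold cross-validated RMSE (sklearn reports negative errors by convention).
    scores = cross_val_score(LinearRegression(), X, y,
                             scoring="neg_root_mean_squared_error", cv=5)
    return -scores.mean()

print(cv_rmse(np.column_stack([x1])))        # with only the real feature
print(cv_rmse(np.column_stack([x1, x2])))    # adding the noise feature barely helps
```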
🚧 Step 6: Common Misunderstandings
“A small p-value means a strong effect.”
Not necessarily — it just means the effect is unlikely to be zero, not that it’s large.
“Non-significant = feature is useless.”
No — maybe the data is noisy, or sample size too small.
“P-values tell you model quality.”
They don’t. They only assess individual coefficients, not overall model performance.
🧩 Step 7: Mini Summary
🧠 What You Learned: P-values test whether each regression coefficient likely represents a real effect, while confidence intervals show a plausible range for that effect.
⚙️ How It Works: Compute $t = \frac{\hat{\beta}}{SE(\hat{\beta})}$, derive a p-value, and check whether 0 lies in the confidence interval.
🎯 Why It Matters: These concepts help distinguish statistical noise from genuine signal — the foundation of rigorous model interpretation.