R-squared and Adjusted R-squared: Linear Regression
🎯 Core Idea
- R-squared ($R^2$): Measures the proportion of variance in the dependent variable explained by the independent variables.
- Adjusted R-squared ($\bar{R}^2$): Corrects $R^2$ by penalizing the inclusion of irrelevant predictors, preventing overfitting.
🌱 Intuition & Real-World Analogy
- R-squared as “explanation power”: Imagine you’re explaining why people’s salaries vary. If your model explains 70% of the variation, then $R^2 = 0.7$. The remaining 30% is “unexplained randomness.”
- Analogy 1 – Movie Recommendation: R-squared is like measuring how much of a person’s movie taste is captured by factors like genre, director, and actor. Add more features and you can only capture more, or at least not less.
- Analogy 2 – Exam Scores: Suppose you’re predicting student exam scores using hours studied. If this explains 60% of score differences ($R^2 = 0.6$), adding “favorite color” as a feature won’t reduce $R^2$. But Adjusted $R^2$ would penalize such useless predictors.
📐 Mathematical Foundation
1. R-squared
$$ R^2 = 1 - \frac{SS_\text{res}}{SS_\text{tot}} $$
Where:
- $SS_\text{res} = \sum (y_i - \hat{y}_i)^2$ → Residual Sum of Squares (unexplained variance).
- $SS_\text{tot} = \sum (y_i - \bar{y})^2$ → Total Sum of Squares (total variance in the data).
So, $R^2$ = proportion of total variance explained by the model.
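As a quick sanity check, here is a minimal sketch (the toy data and names like `r2_manual` are purely illustrative) that computes $R^2$ directly from the two sums of squares and compares it with scikit-learn's `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy data: a noisy linear relationship (purely illustrative)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 5.0 + rng.normal(0, 2.0, size=50)

# Fit a simple least-squares line and predict
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# R^2 from the definition: 1 - SS_res / SS_tot
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)           # manual computation from the formula
print(r2_score(y, y_hat))  # agrees with scikit-learn's r2_score
```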
2. Adjusted R-squared
$$ \bar{R}^2 = 1 - \left(1 - R^2\right) \cdot \frac{n - 1}{n - p - 1} $$
Where:
- $n$ = number of observations.
- $p$ = number of predictors/features.
Key property: Unlike $R^2$, Adjusted $R^2$ can decrease when irrelevant predictors are added.
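Adjusted $R^2$ has no dedicated scikit-learn metric, so a small helper (the name `adjusted_r2` is made up here for illustration) applying the formula above might look like this:

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 given plain R^2, n observations, and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R^2 = 0.70, but more predictors => a larger penalty
print(adjusted_r2(0.70, n=50, p=2))   # ~0.687
print(adjusted_r2(0.70, n=50, p=20))  # ~0.493
```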
Deep Dive: Why R-squared never decreases when predictors are added
- $SS_\text{res}$ always decreases (or stays the same) when more predictors are added, because the fit can at worst set the new coefficient to zero and recover the previous model.
- Therefore, in-sample $R^2$ cannot go down.
- But this doesn’t mean the model is better; it may just be memorizing noise (see the sketch below).
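A minimal sketch of this effect, assuming synthetic data with a pure-noise column standing in for a useless predictor (all names and numbers are illustrative): in-sample $R^2$ does not drop when the noise column is added, while adjusted $R^2$ typically does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
X_useful = rng.uniform(0, 10, size=(n, 1))
y = 2.5 * X_useful[:, 0] + rng.normal(0, 3.0, size=n)

def in_sample_r2(X, y):
    """LinearRegression.score returns in-sample R^2 on the same data."""
    return LinearRegression().fit(X, y).score(X, y)

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Add a pure-noise column (the "favorite color" of the exam analogy)
X_noisy = np.hstack([X_useful, rng.normal(size=(n, 1))])

r2_base, r2_noisy = in_sample_r2(X_useful, y), in_sample_r2(X_noisy, y)
print(r2_base, r2_noisy)               # R^2 never decreases
print(adjusted_r2(r2_base, n, p=1),
      adjusted_r2(r2_noisy, n, p=2))   # adjusted R^2 typically drops
```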
⚖️ Strengths, Limitations & Trade-offs
R-squared Strengths:
- Easy to interpret (“explains X% of variance”).
- Bounded between 0 and 1 for a standard in-sample OLS fit with an intercept (it can go negative in other settings; see Pitfalls below).
R-squared Limitations:
- Always increases (or stays the same) when more features are added → misleading.
- High $R^2$ does not imply a good model (can result from overfitting).
- Does not indicate causality.
Adjusted R-squared Strengths:
- Penalizes extra features → better for comparing models with different numbers of predictors.
- Useful for model selection.
Adjusted R-squared Limitations:
- Still not foolproof: it penalizes only the number of predictors, not how informative each one actually is.
- Can be biased in small samples.
🔍 Variants & Extensions
- Pseudo R-squared (for logistic regression): Since variance-explained doesn’t make sense for categorical outcomes, alternative metrics like McFadden’s $R^2$ are used.
- Cross-validated $R^2$: Instead of in-sample fit, evaluates predictive performance on unseen data.
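For the cross-validated variant, scikit-learn's `cross_val_score` can report $R^2$ on held-out folds; a rough sketch on synthetic data (everything below is illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(size=200)

# scoring="r2" evaluates R^2 on each held-out fold, not in-sample
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())  # estimate of out-of-sample R^2
```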
🚧 Common Challenges & Pitfalls
- Misinterpretation: A high $R^2$ doesn’t mean the model is correct, only that it fits the training data well.
- Overfitting Trap: Adding irrelevant predictors inflates $R^2$ but lowers generalization.
- Adjusted $R^2$ Misuse: Not a silver bullet; still possible to include harmful features if data is noisy.
- Negative $R^2$: Possible when model performs worse than the mean predictor (common in poor fits).
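As an illustration of the last point, `r2_score` goes negative as soon as predictions are worse than simply guessing the mean (the numbers below are made up):

```python
from sklearn.metrics import r2_score

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [9.0, 7.0, 5.0, 3.0]  # systematically reversed predictions

print(r2_score(y_true, y_pred))  # -3.0: worse than always predicting the mean (6.0)
```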
📚 Reference Pointers
- Wikipedia: Coefficient of Determination
- Penn State STAT 501: R-squared vs Adjusted R-squared
- Elements of Statistical Learning (Hastie, Tibshirani, Friedman) – Chapter on Model Assessment