R-squared and Adjusted R-squared: Linear Regression


🎯 Core Idea

  • R-squared ($R^2$): Measures the proportion of variance in the dependent variable explained by the independent variables.
  • Adjusted R-squared ($\bar{R}^2$): Corrects $R^2$ by penalizing the inclusion of irrelevant predictors, preventing overfitting.

🌱 Intuition & Real-World Analogy

  • R-squared as “explanation power”: Imagine you’re explaining why people’s salaries vary. If your model explains 70% of the variation, then $R^2 = 0.7$. The remaining 30% is “unexplained randomness.”

  • Analogy 1 – Movie Recommendation: R-squared is like measuring how much of a person’s movie taste is captured by factors like genre, director, and actor. Add more features, and you can only capture more—or at least not less.

  • Analogy 2 – Exam Scores: Suppose you’re predicting student exam scores using hours studied. If this explains 60% of score differences ($R^2 = 0.6$), adding “favorite color” as a feature won’t reduce $R^2$. But Adjusted $R^2$ would penalize such useless predictors.


📐 Mathematical Foundation

1. R-squared

$$ R^2 = 1 - \frac{SS_\text{res}}{SS_\text{tot}} $$

Where:

  • $SS_\text{res} = \sum (y_i - \hat{y}_i)^2$ → Residual Sum of Squares (unexplained variance).
  • $SS_\text{tot} = \sum (y_i - \bar{y})^2$ → Total Sum of Squares (total variance in the data).

So, $R^2$ = proportion of total variance explained by the model.
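
To make the formula concrete, here is a minimal NumPy sketch that computes $R^2$ directly from $SS_\text{res}$ and $SS_\text{tot}$; the array values are made-up illustration data.

```python
import numpy as np

# Made-up illustration data: true targets and model predictions.
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

ss_res = np.sum((y - y_hat) ** 2)        # residual (unexplained) variation
ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation around the mean

r_squared = 1 - ss_res / ss_tot
print(f"R^2 = {r_squared:.3f}")
```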

2. Adjusted R-squared

$$ \bar{R}^2 = 1 - \left(1 - R^2\right) \cdot \frac{n - 1}{n - p - 1} $$

Where:

  • $n$ = number of observations.
  • $p$ = number of predictors/features.

Key property: Unlike $R^2$, Adjusted $R^2$ can decrease when irrelevant predictors are added.
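
A small sketch of the adjustment itself, assuming you already have an $R^2$ value: with the same $R^2$, a larger predictor count $p$ shrinks the adjusted value.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Same R^2 = 0.70, but more predictors lowers the adjusted value.
print(adjusted_r_squared(0.70, n=50, p=2))   # ~0.687
print(adjusted_r_squared(0.70, n=50, p=10))  # ~0.623
```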

Deep Dive: Why R-squared never decreases
  • $SS_\text{res}$ can only decrease (or stay the same) when more predictors are added: the least-squares fit can always set the new coefficient to zero and recover the previous model, so it never does worse in-sample.
  • Therefore, $R^2$ cannot go down.
  • But this doesn’t mean the model is better; it may just be memorizing noise (see the sketch below).
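
The sketch below illustrates this with randomly generated data (assuming scikit-learn is available): adding a pure-noise predictor does not lower the in-sample $R^2$, but the adjusted $R^2$ typically drops.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(scale=1.0, size=n)   # one real signal
noise_feature = rng.normal(size=(n, 1))             # pure-noise predictor

def fit_r2(X, y):
    model = LinearRegression().fit(X, y)
    return r2_score(y, model.predict(X))

def adj_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2_small = fit_r2(x, y)
r2_big = fit_r2(np.hstack([x, noise_feature]), y)

print(r2_big >= r2_small)                            # True: in-sample R^2 never drops
print(adj_r2(r2_small, n, 1), adj_r2(r2_big, n, 2))  # adjusted R^2 usually drops
```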

⚖️ Strengths, Limitations & Trade-offs

R-squared Strengths:

  • Easy to interpret (“explains X% of variance”).
  • Always between 0 and 1 for a standard least-squares fit with an intercept, evaluated on the training data.

R-squared Limitations:

  • Always increases (or stays the same) when more features are added → misleading.
  • High $R^2$ does not imply a good model (can result from overfitting).
  • Does not indicate causality.

Adjusted R-squared Strengths:

  • Penalizes extra features → better for comparing models with different numbers of predictors.
  • Useful for model selection.

Adjusted R-squared Limitations:

  • Still not foolproof: it penalizes only the number of predictors, not whether they are actually informative.
  • Can be biased in small samples.

🔍 Variants & Extensions

  • Pseudo R-squared (for logistic regression): Since variance-explained doesn’t make sense for categorical outcomes, alternative metrics like McFadden’s $R^2$ are used.
  • Cross-validated $R^2$: Instead of in-sample fit, evaluates predictive performance on unseen data (see the sketch below).
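
As a sketch of the cross-validated variant (assuming scikit-learn and synthetic illustration data), the built-in "r2" scorer evaluates $R^2$ on held-out folds rather than on the training fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(size=200)

model = LinearRegression()
in_sample = model.fit(X, y).score(X, y)                       # optimistic in-sample R^2
cv_scores = cross_val_score(model, X, y, scoring="r2", cv=5)  # R^2 on unseen folds
print(in_sample, cv_scores.mean())
```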

🚧 Common Challenges & Pitfalls

  • Misinterpretation: A high $R^2$ doesn’t mean the model is correct, only that it fits the training data well.
  • Overfitting Trap: Adding irrelevant predictors inflates $R^2$ but lowers generalization.
  • Adjusted $R^2$ Misuse: Not a silver bullet; still possible to include harmful features if data is noisy.
  • Negative $R^2$: Possible when the model performs worse than simply predicting the mean, most often seen on held-out data or badly misspecified fits (see the sketch below).
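
A quick illustration of the negative-$R^2$ case using scikit-learn's `r2_score` (made-up numbers): predicting the mean gives exactly 0, and anything worse goes negative.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Predicting the mean gives R^2 = 0; anything worse goes negative.
print(r2_score(y_true, np.full_like(y_true, y_true.mean())))  # 0.0
print(r2_score(y_true, np.array([5.0, 4.0, 3.0, 2.0, 1.0])))  # -3.0
```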
