R-squared and Adjusted R-squared: Linear Regression
🎯 Core Idea
- R-squared ($R^2$): Measures the proportion of variance in the dependent variable explained by the independent variables.
- Adjusted R-squared ($\bar{R}^2$): Corrects $R^2$ by penalizing the inclusion of irrelevant predictors, preventing overfitting.
🌱 Intuition & Real-World Analogy
- R-squared as “explanation power”: Imagine you’re explaining why people’s salaries vary. If your model explains 70% of the variation, then $R^2 = 0.7$. The remaining 30% is “unexplained randomness.”
- Analogy 1 – Movie Recommendation: R-squared is like measuring how much of a person’s movie taste is captured by factors like genre, director, and actor. Add more features and you can only capture more, or at least not less.
- Analogy 2 – Exam Scores: Suppose you’re predicting student exam scores using hours studied. If this explains 60% of score differences ($R^2 = 0.6$), adding “favorite color” as a feature won’t reduce $R^2$. But Adjusted $R^2$ would penalize such useless predictors.
📐 Mathematical Foundation
1. R-squared
$$ R^2 = 1 - \frac{SS_\text{res}}{SS_\text{tot}} $$
Where:
- $SS_\text{res} = \sum (y_i - \hat{y}_i)^2$ → Residual Sum of Squares (unexplained variance).
- $SS_\text{tot} = \sum (y_i - \bar{y})^2$ → Total Sum of Squares (total variance in the data).
So, $R^2$ = proportion of total variance explained by the model.
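As a quick sanity check, here is a minimal sketch (the toy data and names like `r2_manual` are purely illustrative) that computes $R^2$ directly from the two sums of squares and compares it with scikit-learn's `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy data: a noisy linear relationship (purely illustrative)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 5.0 + rng.normal(0, 2.0, size=50)

# Fit a simple least-squares line and predict
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# R^2 from the definition: 1 - SS_res / SS_tot
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)           # manual computation from the formula
print(r2_score(y, y_hat))  # agrees with scikit-learn's r2_score
```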
2. Adjusted R-squared
$$ \bar{R}^2 = 1 - \left(1 - R^2\right) \cdot \frac{n - 1}{n - p - 1} $$
Where:
- $n$ = number of observations.
- $p$ = number of predictors/features.
Key property: Unlike $R^2$, Adjusted $R^2$ can decrease when irrelevant predictors are added.
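Adjusted $R^2$ has no dedicated scikit-learn metric, so a small helper (the name `adjusted_r2` is made up here for illustration) applying the formula above might look like this:

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 given plain R^2, n observations, and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R^2 = 0.70, but more predictors => a larger penalty
print(adjusted_r2(0.70, n=50, p=2))   # ~0.687
print(adjusted_r2(0.70, n=50, p=20))  # ~0.493
```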
Deep Dive: Why R-squared never decreases when predictors are added
- $SS_\text{res}$ always decreases (or stays the same) when more predictors are added, because the fit can at worst set the new coefficient to zero and recover the previous model.
- Therefore, in-sample $R^2$ cannot go down.
- But this doesn’t mean the model is better; it may just be memorizing noise (see the sketch below).
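A minimal sketch of this effect, assuming synthetic data with a pure-noise column standing in for a useless predictor (all names and numbers are illustrative): in-sample $R^2$ does not drop when the noise column is added, while adjusted $R^2$ typically does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
X_useful = rng.uniform(0, 10, size=(n, 1))
y = 2.5 * X_useful[:, 0] + rng.normal(0, 3.0, size=n)

def in_sample_r2(X, y):
    """LinearRegression.score returns in-sample R^2 on the same data."""
    return LinearRegression().fit(X, y).score(X, y)

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Add a pure-noise column (the "favorite color" of the exam analogy)
X_noisy = np.hstack([X_useful, rng.normal(size=(n, 1))])

r2_base, r2_noisy = in_sample_r2(X_useful, y), in_sample_r2(X_noisy, y)
print(r2_base, r2_noisy)               # R^2 never decreases
print(adjusted_r2(r2_base, n, p=1),
      adjusted_r2(r2_noisy, n, p=2))   # adjusted R^2 typically drops
```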
⚖️ Strengths, Limitations & Trade-offs
R-squared Strengths:
- Easy to interpret (“explains X% of variance”).
- Bounded between 0 and 1 for a standard in-sample OLS fit with an intercept (it can go negative in other settings; see Pitfalls below).
R-squared Limitations:
- Always increases (or stays the same) when more features are added → misleading.
- High $R^2$ does not imply a good model (can result from overfitting).
- Does not indicate causality.
Adjusted R-squared Strengths:
- Penalizes extra features → better for comparing models with different numbers of predictors.
- Useful for model selection.
Adjusted R-squared Limitations:
- Still not foolproof: it penalizes only the number of predictors, not how informative each one actually is.
- Can be biased in small samples.
🔍 Variants & Extensions
- Pseudo R-squared (for logistic regression): Since variance-explained doesn’t make sense for categorical outcomes, alternative metrics like McFadden’s $R^2$ are used.
- Cross-validated $R^2$: Instead of in-sample fit, evaluates predictive performance on unseen data.
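For the cross-validated variant, scikit-learn's `cross_val_score` can report $R^2$ on held-out folds; a rough sketch on synthetic data (everything below is illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(size=200)

# scoring="r2" evaluates R^2 on each held-out fold, not in-sample
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())  # estimate of out-of-sample R^2
```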
🚧 Common Challenges & Pitfalls
- Misinterpretation: A high $R^2$ doesn’t mean the model is correct, only that it fits the training data well.
- Overfitting Trap: Adding irrelevant predictors inflates $R^2$ but lowers generalization.
- Adjusted $R^2$ Misuse: Not a silver bullet; still possible to include harmful features if data is noisy.
- Negative $R^2$: Possible when model performs worse than the mean predictor (common in poor fits).
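As an illustration of the last point, `r2_score` goes negative as soon as predictions are worse than simply guessing the mean (the numbers below are made up):

```python
from sklearn.metrics import r2_score

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [9.0, 7.0, 5.0, 3.0]  # systematically reversed predictions

print(r2_score(y_true, y_pred))  # -3.0: worse than always predicting the mean (6.0)
```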
📚 Reference Pointers
- Wikipedia: Coefficient of Determination
- Penn State STAT 501: R-squared vs Adjusted R-squared
- Elements of Statistical Learning (Hastie, Tibshirani, Friedman) – Chapter on Model Assessment