R-squared and Adjusted R-squared: Linear Regression
🪄 Step 1: Intuition & Motivation
Core Idea: Once we’ve trained our regression model, the next big question is: how good is it? That’s where R-squared and Adjusted R-squared come in. These metrics tell us how well our model explains the variation in the target — or in plain English, how much of the “story” our model actually captures.
Simple Analogy: Imagine your target variable ($y$) as a messy room. R-squared measures how much of that mess your model manages to organize neatly into labeled boxes (explained variance). Adjusted R-squared then walks in and says, “Hey, don’t just add random boxes that don’t help! I’ll only reward you if each new box actually helps tidy the room.”
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
R-squared quantifies how much of the variation in your data is captured by your model’s predictions.
Let’s define two quantities:
Total variation in $y$ (TSS) — how much $y$ varies overall:
$$ TSS = \sum (y_i - \bar{y})^2 $$

Residual variation (RSS) — how much of that variation the model failed to explain:
$$ RSS = \sum (y_i - \hat{y_i})^2 $$
Then, R-squared is just:
$$ R^2 = 1 - \frac{RSS}{TSS} $$

If $R^2 = 1$, your model explains everything (perfect fit).
If $R^2 = 0$, your model explains nothing (you might as well predict the mean).
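Here is a minimal sketch of these two quantities and the resulting $R^2$, assuming NumPy and scikit-learn are available (the numbers are made up purely for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative target values and some model's predictions (made-up numbers)
y     = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

tss = np.sum((y - y.mean()) ** 2)   # total variation around the mean
rss = np.sum((y - y_hat) ** 2)      # variation the model failed to explain

print(1 - rss / tss)                # R^2 computed from the formula above
print(r2_score(y, y_hat))           # same value from scikit-learn
```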
Why It Works This Way
R-squared compares your model’s errors against a lazy baseline: just predicting the mean every time.
If your model reduces the error a lot compared to the mean, $R^2$ goes up.
But there’s a catch — R-squared never decreases, even if you add useless features.
That’s where Adjusted R-squared comes in — it penalizes unnecessary complexity.
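To see the catch in action, here is a small sketch on synthetic data, assuming scikit-learn's `LinearRegression`: appending a pure-noise column never lowers the training $R^2$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                                    # two informative features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

noise = rng.normal(size=(n, 1))                                # a column unrelated to y
X_plus_noise = np.hstack([X, noise])

r2_base  = LinearRegression().fit(X, y).score(X, y)
r2_noisy = LinearRegression().fit(X_plus_noise, y).score(X_plus_noise, y)

print(r2_base, r2_noisy)   # r2_noisy >= r2_base: training R^2 never goes down
```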
How It Fits in ML Thinking
Adjusted R-squared bakes a complexity penalty into the metric itself, making it the statistical ancestor of today’s regularization penalties (like L1 and L2).
📐 Step 3: Mathematical Foundation
R-squared ($R^2$)
$$ R^2 = 1 - \frac{RSS}{TSS} $$
Where:
- $RSS = \sum (y_i - \hat{y_i})^2$ (residual sum of squares)
- $TSS = \sum (y_i - \bar{y})^2$ (total sum of squares)
Interpretation:
$R^2$ tells you the proportion of total variance in $y$ that’s explained by the model.
If $R^2 = 0.8$, it means “80% of the variance in the outcome can be explained by the model.”
Think of $R^2$ as your model’s report card (each grade is illustrated in the sketch after this list):
- 1.0 → “You’ve explained everything!”
- 0.0 → “You didn’t explain anything new!”
- Negative → “You made things worse than just predicting the mean!”
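To make the report card concrete, here is a quick sketch with made-up numbers that hits all three grades:

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([2.0, 4.0, 6.0, 8.0])

print(r2_score(y, y))                           # 1.0  -> explained everything
print(r2_score(y, np.full_like(y, y.mean())))   # 0.0  -> no better than predicting the mean
print(r2_score(y, np.full_like(y, 20.0)))       # negative -> worse than predicting the mean
```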
Adjusted R-squared ($\bar{R}^2$)
$$ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} $$
Where:
- $n$ = number of samples
- $p$ = number of predictors (features)
Interpretation:
Adjusted R-squared corrects for adding too many features that don’t help.
When you add a useless feature, $R^2$ goes up slightly, but $\bar{R}^2$ drops, warning you that the improvement isn’t genuine.
“Don’t just bring more friends to the group project — unless they actually do work, I’ll lower your grade.”
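scikit-learn has no built-in adjusted R-squared, so here is a minimal sketch (synthetic data, hypothetical `adjusted_r2` helper) showing the penalty at work when useless features are added:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    """Adjusted R^2 from plain R^2, n samples, and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(1)
n = 60
X = rng.normal(size=(n, 1))                       # one genuinely useful feature
y = 2 * X[:, 0] + rng.normal(scale=1.0, size=n)

X_big = np.hstack([X, rng.normal(size=(n, 5))])   # plus five useless features

r2_small = LinearRegression().fit(X, y).score(X, y)
r2_big   = LinearRegression().fit(X_big, y).score(X_big, y)

print(r2_small, adjusted_r2(r2_small, n, p=1))
print(r2_big,   adjusted_r2(r2_big,   n, p=6))    # R^2 creeps up; the adjustment claws it back
```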
🧠 Step 4: Key Ideas and Assumptions
1️⃣ Variance-based interpretation:
R-squared assumes your data has measurable variance to begin with. If your target is almost constant, R-squared becomes meaningless (see the sketch after this list).
2️⃣ Monotonicity:
R-squared never decreases when adding features — but Adjusted R-squared can, if those features add noise instead of value.
3️⃣ Model context:
R-squared only compares models using the same dependent variable. It’s not absolute truth — it’s a relative measure of “fit.”
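Here is a tiny sketch, with made-up near-constant targets, of why point 1️⃣ matters: once TSS is almost zero, even a microscopic prediction error sends $R^2$ off a cliff.

```python
import numpy as np

def r2(y, y_hat):
    tss = np.sum((y - y.mean()) ** 2)
    rss = np.sum((y - y_hat) ** 2)
    return 1 - rss / tss

# A target that is almost constant (made-up values)
y = np.array([5.000001, 4.999999, 5.000002, 4.999998])

print(r2(y, np.full_like(y, 5.0)))    # ~0: predicting the mean
print(r2(y, np.full_like(y, 5.001)))  # enormously negative, despite being off by only 0.001
```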
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Easy to interpret — intuitive measure of “fit.”
- Adjusted R-squared discourages overfitting.
- Helpful for comparing nested models (one inside another).

Limitations:
- R-squared alone can mislead — it always increases with more variables.
- Doesn’t measure predictive accuracy on new data.
- Can’t compare across different targets or data scales.

Trade-off: a high R-squared means a good fit on the training data, but not always good generalization.
Use Adjusted R-squared (and cross-validation) for a reality check, as sketched below.
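As a sketch of that reality check, assuming scikit-learn: `cross_val_score` with `scoring="r2"` reports out-of-sample $R^2$ per fold, a far better guide to generalization than the training-set number.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(scale=1.0, size=200)

model = LinearRegression()
print(model.fit(X, y).score(X, y))                       # in-sample (training) R^2
print(cross_val_score(model, X, y, scoring="r2", cv=5))  # out-of-sample R^2, one per fold
```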
🚧 Step 6: Common Misunderstandings
“A higher R-squared always means a better model.”
Not necessarily — you might just be overfitting with too many features.

“R-squared can tell you how accurate your predictions are.”
It can’t. It measures fit to the training data, not prediction performance (see the sketch below).

“R-squared can decrease with more variables.”
No — it can’t. But Adjusted R-squared can, and that’s the point.
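To illustrate the second point, here is a small sketch on synthetic data with a deliberately over-flexible polynomial fit; the training $R^2$ looks spectacular while the held-out $R^2$ usually tells a humbler story.

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(scale=0.2, size=20)    # roughly linear data with noise

x_train, y_train = x[::2], y[::2]             # 10 points used for fitting
x_test,  y_test  = x[1::2], y[1::2]           # 10 points held out

coefs = np.polyfit(x_train, y_train, deg=9)   # degree-9 polynomial: hits every training point

print(r2_score(y_train, np.polyval(coefs, x_train)))  # training R^2: essentially perfect
print(r2_score(y_test,  np.polyval(coefs, x_test)))   # held-out R^2: typically much worse
```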
🧩 Step 7: Mini Summary
🧠 What You Learned: R-squared measures how much variance your model explains, while Adjusted R-squared penalizes unnecessary complexity.
⚙️ How It Works: R-squared compares your model’s errors to a naive mean predictor; Adjusted R-squared corrects for added features.
🎯 Why It Matters: These metrics teach you to balance fit with parsimony — the essence of good modeling.