Feature Scaling: Linear Regression


🎯 Core Idea

Feature scaling transforms features so they lie on comparable ranges. The closed-form Ordinary Least Squares (OLS) solution is unaffected in the sense that its predictions do not change under scaling (only the coefficients rescale), but scaling is critical for the two cases below (illustrated in the sketch after the list):

  • Gradient Descent convergence (step size depends on feature magnitude).
  • Regularization penalties (L1/L2 shrinkage depends on absolute feature values).
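
To make both points concrete, here is a minimal sketch assuming scikit-learn and NumPy; the synthetic salary/age data and the Ridge alpha are illustrative assumptions, not values from this article. Closed-form OLS gives identical predictions with or without scaling, while the L2-penalized fit does not:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical features on very different scales: "salary" (tens of thousands)
# and "age" (tens), as in the currency analogy below.
n = 200
salary = rng.normal(50_000, 15_000, n)
age = rng.normal(40, 10, n)
X = np.column_stack([salary, age])
y = 0.0002 * salary + 0.5 * age + rng.normal(0, 1, n)

X_scaled = StandardScaler().fit_transform(X)

# Closed-form OLS: predictions are identical with or without scaling
# (only the coefficients rescale).
ols_raw = LinearRegression().fit(X, y)
ols_std = LinearRegression().fit(X_scaled, y)
print(np.allclose(ols_raw.predict(X), ols_std.predict(X_scaled)))      # True

# Ridge (L2 penalty): the fit changes, because the penalty treats a
# coefficient measured per dollar and one measured per year the same way.
ridge_raw = Ridge(alpha=1.0).fit(X, y)
ridge_std = Ridge(alpha=1.0).fit(X_scaled, y)
print(np.allclose(ridge_raw.predict(X), ridge_std.predict(X_scaled)))  # typically False
```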

🌱 Intuition & Real-World Analogy

  • Why scaling matters: Imagine walking downhill on a tilted landscape. If one axis stretches much longer than another, your steps zig-zag instead of moving smoothly toward the valley (minimum). That's gradient descent with unscaled features.
  • Analogy 1 (shoes): If one shoe is a sneaker and the other a ski boot, youโ€™ll struggle to walk straight. Scaling makes both shoes the same size.
  • Analogy 2 (currency): Suppose one feature is “salary in dollars” (10,000s) and another is “age in years” (tens). Without scaling, salary dominates simply because of its units, not its predictive power. Scaling puts them in fair competition.

๐Ÿ“ Mathematical Foundation

  1. Gradient Descent Dependence on Scale

    • Gradient step:
    $$ \theta_j^{(t+1)} = \theta_j^{(t)} - \eta \cdot \frac{\partial J}{\partial \theta_j} $$

    If the features $x_j$ have very different magnitudes, then $\frac{\partial J}{\partial \theta_j}$ differs wildly across dimensions, producing slow, zig-zag convergence (see the gradient-descent sketch after this list).

  2. Standardization (most common):

    $$ x'_i = \frac{x_i - \mu}{\sigma} $$
    • $x_i$: original feature value
    • $\mu$: mean of the feature
    • $\sigma$: standard deviation of the feature

    Result: mean = 0, variance = 1.

  3. Normalization (Min-Max scaling):

    $$ x'_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} $$

    Maps values into $[0,1]$. Sensitive to outliers.

  4. Impact on Regularization (L1/L2):

    • Ridge: $\lambda \sum_j \theta_j^2$
    • Lasso: $\lambda \sum_j |\theta_j|$

    Without scaling, the penalty is applied unevenly: a feature measured on a large scale needs only a small coefficient, so it is barely penalized, while a feature on a small scale needs a large coefficient and is shrunk heavily. Scaling puts all coefficients on the same footing, ensuring fair shrinkage.

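As a sketch of the formulas above, the NumPy snippet below implements standardization and min-max scaling directly from the definitions and then runs plain batch gradient descent on raw versus standardized features; the synthetic data, learning rates, and iteration count are illustrative assumptions:

```python
import numpy as np

def standardize(X):
    """x' = (x - mean) / std, computed per feature (column)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max(X):
    """x' = (x - min) / (max - min), mapping each feature into [0, 1]."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def gradient_descent(X, y, lr, n_iter=5000):
    """Plain batch gradient descent on the mean-squared-error cost."""
    Xb = np.column_stack([np.ones(len(X)), X])          # prepend intercept column
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = 2 / len(y) * Xb.T @ (Xb @ theta - y)     # dJ/dtheta for MSE
        theta -= lr * grad
    return theta, np.mean((Xb @ theta - y) ** 2)

rng = np.random.default_rng(0)
salary = rng.normal(50_000, 15_000, 200)    # large-magnitude feature
age = rng.normal(40, 10, 200)               # small-magnitude feature
X = np.column_stack([salary, age])
y = 0.0002 * salary + 0.5 * age + rng.normal(0, 1, 200)

print(min_max(X).min(axis=0), min_max(X).max(axis=0))   # each feature now spans [0, 1]

# On raw features the largest stable learning rate is set by the salary scale,
# so the age and intercept directions crawl; on standardized features one
# moderate step size works for every direction.
_, mse_raw = gradient_descent(X, y, lr=1e-10)
_, mse_std = gradient_descent(standardize(X), y, lr=0.1)
print(f"MSE after 5000 steps - raw: {mse_raw:.2f}, standardized: {mse_std:.2f}")
```

The contrast in the final print is the whole point: with unscaled features the step size must be kept tiny to stay stable in the dominant direction, so the loss is still far from its minimum after thousands of iterations, while the standardized run converges quickly.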

โš–๏ธ Strengths, Limitations & Trade-offs

Strengths

  • Faster, more stable gradient descent.
  • Prevents dominance of high-range features.
  • Ensures fair effect of regularization.

Limitations

  • Not needed for OLS closed-form (normal equation).
  • Choice of scaling method (standardization vs. normalization) can affect model behavior.
  • Outlier sensitivity (min-max scaling heavily distorted by extremes).

Trade-offs

  • Standardization vs Normalization: standardization handles outliers better, but normalization is bounded and interpretable.

๐Ÿ” Variants & Extensions

  • Robust Scaling: Uses the median and interquartile range (IQR), making it more resistant to outliers (see the sketch after this list):

    $$ x'_i = \frac{x_i - \text{median}}{\text{IQR}} $$
  • Unit Vector Scaling (Normalization to length 1): Useful in text embeddings or cosine similarity contexts:

    $$ x' = \frac{x}{\|x\|_2} $$
  • Adaptive Scaling (e.g., Batch Normalization in deep learning): normalizes intermediate activations using per-mini-batch statistics during training.

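A sketch of the first two variants, assuming scikit-learn is available; RobustScaler and Normalizer are its built-in counterparts to the formulas above, and the toy data with an injected outlier is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, Normalizer

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (100, 2))
X[0, 0] = 1_000.0                           # inject a single extreme outlier

# Robust scaling: x' = (x - median) / IQR, per feature (column).
median = np.median(X, axis=0)
iqr = np.percentile(X, 75, axis=0) - np.percentile(X, 25, axis=0)
X_robust = (X - median) / iqr
print(np.allclose(X_robust, RobustScaler().fit_transform(X)))        # True

# Unit-vector scaling: each sample (row) is divided by its L2 norm, so only
# its direction is kept -- useful before cosine-similarity computations.
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
print(np.allclose(X_unit, Normalizer(norm="l2").fit_transform(X)))   # True
```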

🚧 Common Challenges & Pitfalls

  • Mistaking necessity: Candidates often think scaling changes the OLS fit; the predictions stay the same, but the coefficients rescale, which changes how they are interpreted.
  • Forgetting regularization effects: Without scaling, one feature may be unfairly penalized more than another.
  • Improper pipeline handling: Scaling must be fit only on training data (leakage risk if test data is scaled using its own statistics); see the pipeline sketch after this list.
  • Over-scaling categorical features: One-hot encoded features are already 0/1 โ€” scaling them often hurts interpretability.
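
As a leakage-safe pattern for the pipeline pitfall above, here is a minimal sketch assuming scikit-learn; the scaler sits inside a Pipeline so its statistics are learned from the training split only (the data and alpha are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(50_000, 15_000, 300), rng.normal(40, 10, 300)])
y = 0.0002 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler lives inside the pipeline, so its mean and std are learned from
# the training split only; the test split is transformed with those same
# statistics, which avoids leaking test-set information into preprocessing.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)
print(f"test R^2: {model.score(X_test, y_test):.3f}")
```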

📚 Reference Pointers

  • The Elements of Statistical Learning (Hastie, Tibshirani & Friedman): scaling in regression
  • Deep Learning (Goodfellow, Bengio & Courville): normalization effects in optimization (Chapter 8)
  • Batch Normalization paper (Ioffe & Szegedy, 2015): arXiv:1502.03167
  • Scikit-learn preprocessing documentation for practical variants