Linear Regression Interview Cheatsheet & Quick Reference
🎯 Core Idea
Linear Regression models the relationship between input features and a continuous target by fitting the best straight line (or hyperplane) that minimizes prediction errors.
🧠 How it Works: The Intuition
Linear Regression tries to find a mathematical equation (a line) that explains how inputs relate to outputs. It adjusts the slope and intercept so that the line best represents the observed data.
Step 1
Define a linear function that maps inputs (features) to outputs (predictions).
Step 2
Measure how far predictions are from actual values using a loss function (typically Mean Squared Error).
Step 3
Optimize the parameters (weights and bias) by minimizing this loss, often using the Normal Equation or Gradient Descent.
Step 4
Use the learned parameters to make predictions on new, unseen data.
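As a concrete illustration of these four steps, here is a minimal sketch using scikit-learn's LinearRegression on synthetic data; the data and variable names are placeholders, not from the source.

```python
# Minimal sketch of the four steps above, using scikit-learn on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                                   # 200 samples, 3 features
y = 4.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Steps 1-3: define the linear model and fit it (squared error is minimized internally).
model = LinearRegression(fit_intercept=True)
model.fit(X, y)

# Step 4: predict on new, unseen data and measure the error.
X_new = rng.normal(size=(5, 3))
print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("train MSE:", mean_squared_error(y, model.predict(X)))
print("predictions:", model.predict(X_new))
```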
📈 Mathematical Foundation & Complexity
This section contains the critical technical details for an interview setting.
🔍 View the Core Math & Equations
- Hypothesis Function (vector form and per-example expansion):
  $$ \hat{y} = X\beta, \qquad \hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n $$
- Loss Function (Mean Squared Error):
  $$ J(\beta) = \frac{1}{2m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2 $$
- Closed-form Solution (Normal Equation):
  $$ \hat{\beta} = (X^T X)^{-1} X^T y $$
- Gradient Descent Update Rule (a NumPy sketch of both solvers follows this list):
  $$ \beta := \beta - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x^{(i)} $$
- Time Complexity:
  - Closed-form (Normal Equation): O(n³) due to matrix inversion (plus O(m·n²) to form XᵀX).
  - Gradient Descent: O(m·n·k), where m = samples, n = features, k = iterations.
- Space Complexity: O(m·n) to store the dataset and O(n) for the parameters.
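To make the two solvers concrete, here is a minimal NumPy sketch of the Normal Equation and the Gradient Descent update above; the function and variable names are illustrative, and it assumes a leading column of ones has already been appended to X for the bias term.

```python
# Minimal NumPy sketch of the Normal Equation and Gradient Descent solvers.
# Assumes X already includes a leading column of ones for the bias term.
import numpy as np

def normal_equation(X, y):
    # beta_hat = (X^T X)^{-1} X^T y; solve() is preferred over an explicit inverse for stability.
    return np.linalg.solve(X.T @ X, X.T @ y)

def gradient_descent(X, y, alpha=0.1, iters=1000):
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(iters):
        grad = (X.T @ (X @ beta - y)) / m   # gradient of the (1/2m) squared-error loss
        beta -= alpha * grad                # beta := beta - alpha * grad
    return beta

# Tiny usage example on synthetic data; both solvers should agree closely.
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.05, size=100)
print(normal_equation(X, y))
print(gradient_descent(X, y))
```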
✅ Pros & ❌ Cons
A quick summary of the primary trade-offs.
- Pro 1: Simple, interpretable, and easy to implement.
- Pro 2: Fast training for small to medium datasets.
- Pro 3: Provides insight into feature importance via coefficients (most meaningful when features are on comparable scales).
- Con 1: Assumes linear relationship between features and target.
- Con 2: Sensitive to multicollinearity and outliers.
- Con 3: Performance degrades with high-dimensional or non-linear data.
🛠️ Practical Application
Guidance on using the algorithm in a real-world project.
🔧 View Key Hyperparameters
- `fit_intercept`: Whether to calculate the intercept (`True` by default).
- `normalize`: Whether to normalize features before fitting (deprecated and removed in recent scikit-learn releases; scale features in a preprocessing step instead).
- `learning_rate` (for Gradient Descent): Controls the step size; too high causes divergence, too low slows convergence.
- `max_iter`: Number of iterations for iterative solvers such as Gradient Descent.
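As a rough illustration of where these knobs appear in practice, here is a short scikit-learn sketch; the specific values are arbitrary examples, not recommendations.

```python
# Illustrative hyperparameter settings; the values are arbitrary examples.
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Closed-form solver: essentially only the intercept behaviour to decide.
ols = LinearRegression(fit_intercept=True)

# Gradient-descent-style solver: learning rate and iteration budget matter,
# and features should be scaled (replacing the old `normalize` flag).
sgd = make_pipeline(
    StandardScaler(),
    SGDRegressor(learning_rate="constant", eta0=0.01, max_iter=1000, tol=1e-3),
)
```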
⚠️ Interviewer’s Trap: Assumptions & Pitfalls
- Key Assumption 1: Relationship between features and target is linear.
- Key Assumption 2: Errors are normally distributed with constant variance (homoscedasticity).
- Key Assumption 3: Features are not highly correlated (no strong multicollinearity).
- Common Pitfall 1: Forgetting to scale features when using Gradient Descent.
- Common Pitfall 2: Misinterpreting coefficients as causation instead of correlation.
- Common Pitfall 3: Ignoring the impact of outliers, which can skew results.
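A quick sketch of how one might sanity-check these assumptions and pitfalls; the diagnostics and thresholds below are common rules of thumb, not from the source.

```python
# Rough diagnostic checks for the assumptions above (thresholds are rules of thumb).
import numpy as np

def check_assumptions(X, y, beta):
    residuals = y - X @ beta

    # Homoscedasticity / normality: residuals should be roughly centred with stable spread.
    print("residual mean:", residuals.mean(), "residual std:", residuals.std())

    # Multicollinearity: a very large condition number of X^T X signals strongly correlated features.
    print("condition number of X^T X:", np.linalg.cond(X.T @ X))

    # Outliers: flag residuals more than 3 standard deviations from the mean.
    outliers = np.abs(residuals - residuals.mean()) > 3 * residuals.std()
    print("potential outliers:", int(outliers.sum()))
```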