1. Linear Regression


πŸ“ Flashcards

⚑ Short Theories

Regression predicts continuous outputs; classification predicts discrete labels.

Overfitting = high variance, underfitting = high bias.

Gradient descent iteratively adjusts weights to minimize the cost function.

Batch GD uses all data per step; SGD uses one sample; mini-batch balances both.

Scikit-learn’s LinearRegression uses OLS; SGDRegressor uses iterative GD.

🎀 Interview Q&A

Q1: What is Linear Regression and when is it used?

🎯 TL;DR: Linear Regression models the relationship between input features and a continuous target by fitting a linear function.


🌱 Conceptual Explanation

It assumes that the dependent variable can be expressed as a weighted sum of independent variables plus an error term. Think of it like drawing the best straight line through a scatter plot of data.

πŸ“ Technical / Math Details

Univariate case:

$$ \hat{y} = w_0 + w_1x $$


Multivariate case:

$$ \hat{y} = w^T x $$


where $w_0$ is the bias, $w_1, w_2, \dots, w_n$ are the weights, and $x$ is the feature vector.
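
As a minimal sketch (toy numbers, not from any real dataset), the multivariate form reduces to a dot product once the bias is folded into the weight vector:

```python
import numpy as np

# Hypothetical fitted parameters: bias w0 followed by weights w1, w2
w = np.array([2.0, 0.5, -1.3])

# Two samples; the leading 1.0 in each row lets w0 act as the intercept
X = np.array([[1.0, 3.0, 0.5],
              [1.0, 1.2, 2.0]])

# y_hat = w^T x, computed for both rows at once
y_hat = X @ w
print(y_hat)  # [2.85 0.  ]
```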

βš–οΈ Trade-offs & Production Notes

  • Fast, interpretable, baseline model.
  • Struggles with non-linear relationships.
  • Sensitive to outliers.

🚨 Common Pitfalls

  • Forgetting to scale features before applying GD.
  • Ignoring multicollinearity.

πŸ—£ Interview-ready Answer

“Linear Regression predicts continuous outcomes by fitting a linear equation between features and target; it’s simple, interpretable, but limited for non-linear data.”

Q2: Explain the cost function in Linear Regression.

🎯 TL;DR: The cost function (MSE) measures average squared error between predicted and true values; we minimize it.


🌱 Conceptual Explanation

The cost function quantifies model error. By minimizing it, the model finds the line of best fit.

πŸ“ Technical / Math Details

$$ J(w) = \frac{1}{2m} \sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})^2 $$
  • $m$: number of samples
  • $\hat{y}^{(i)}$: predicted value
  • $y^{(i)}$: true value
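
A short NumPy sketch of this cost on made-up values (purely illustrative):

```python
import numpy as np

def mse_cost(y_hat, y):
    """J(w) = (1 / 2m) * sum of squared residuals."""
    m = len(y)
    return np.sum((y_hat - y) ** 2) / (2 * m)

# Toy values purely for illustration
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])
print(mse_cost(y_pred, y_true))  # (0.25 + 0.25 + 1.0) / (2 * 3) = 0.25
```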

βš–οΈ Trade-offs & Production Notes

  • Convex β†’ guarantees global minimum.
  • Sensitive to outliers since errors are squared.

🚨 Common Pitfalls

  • Confusing cost with evaluation metrics.
  • Using non-scaled features β†’ slow GD convergence.

πŸ—£ Interview-ready Answer

“The cost function in Linear Regression is mean squared error, which penalizes large deviations by squaring residuals.”

Q3: How does Gradient Descent work in Linear Regression?

🎯 TL;DR: Gradient descent iteratively updates weights by moving in the opposite direction of the gradient of the cost function.


🌱 Conceptual Explanation

It’s like walking downhill blindfoldedβ€”each step follows the slope until you reach the valley (minimum cost).

πŸ“ Technical / Math Details

Update rule:

$$ w_j := w_j - \alpha \frac{\partial J(w)}{\partial w_j} $$
  • $\alpha$: learning rate
  • $\frac{\partial J}{\partial w_j}$: gradient
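
A compact batch gradient-descent loop for this update rule on toy data (the learning rate and iteration count are illustrative, not tuned):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=2000):
    """Batch GD on the (1/2m) MSE cost; X must include a bias column of 1s."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        y_hat = X @ w
        grad = (X.T @ (y_hat - y)) / m  # dJ/dw
        w -= alpha * grad               # step opposite the gradient
    return w

# Toy data from y = 1 + 2x, so the loop should recover roughly [1, 2]
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
print(gradient_descent(X, y))  # approximately [1. 2.]
```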

βš–οΈ Trade-offs & Production Notes

  • Batch GD: stable, but slow for large data.
  • SGD: faster, but noisier.
  • Mini-batch: balance between both.
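
To make the mini-batch variant concrete, a small sketch that reuses the same gradient step but computes it on a random subset of rows each iteration (batch size and seed are arbitrary choices):

```python
import numpy as np

def minibatch_gd(X, y, alpha=0.1, n_iters=2000, batch_size=16, seed=0):
    """Mini-batch GD: each update uses a random subset of the data."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        idx = rng.choice(m, size=min(batch_size, m), replace=False)
        Xb, yb = X[idx], y[idx]
        grad = (Xb.T @ (Xb @ w - yb)) / len(idx)  # gradient on the mini-batch only
        w -= alpha * grad
    return w
```

With `batch_size=1` this reduces to SGD; with `batch_size=m` it is plain batch GD.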

🚨 Common Pitfalls

  • Learning rate too small β†’ slow.
  • Too large β†’ divergence.

πŸ—£ Interview-ready Answer

“Gradient Descent reduces error by updating weights opposite the gradient of the cost until convergence.”

Q4: What’s the difference between Univariate and Multivariate Linear Regression?

🎯 TL;DR: Univariate uses one feature; multivariate uses multiple features to predict the target.


🌱 Conceptual Explanation

Univariate draws a line in 2D space; multivariate draws a hyperplane in higher dimensions.

πŸ“ Technical / Math Details

  • Univariate:
    $$ \hat{y} = w_0 + w_1x $$
  • Multivariate:
    $$ \hat{y} = w^T x $$

βš–οΈ Trade-offs & Production Notes

  • Multivariate captures richer relationships.
  • Risk of multicollinearity with many features.

🚨 Common Pitfalls

  • Using too many irrelevant features β†’ overfitting.

πŸ—£ Interview-ready Answer

“Univariate Linear Regression predicts with one feature, while multivariate uses multiple features, forming a hyperplane.”

Q5: Explain Overfitting vs Underfitting in Regression.

🎯 TL;DR: Overfitting β†’ memorizes training data; underfitting β†’ too simple to learn patterns.


🌱 Conceptual Explanation

Overfit = model too complex, performs poorly on new data. Underfit = model too simple, fails even on training data.

πŸ“ Technical / Math Details

  • Overfit: high variance.
  • Underfit: high bias.

βš–οΈ Trade-offs & Production Notes

  • Use regularization (L1/L2) to combat overfitting.
  • Add features or complexity to reduce underfitting.

🚨 Common Pitfalls

  • Evaluating only on training set.

πŸ—£ Interview-ready Answer

“Overfitting means the model is too complex and generalizes poorly; underfitting means it’s too simple to capture patterns.”

Q6: How does Scikit-learn implement Linear Regression?

🎯 TL;DR: LinearRegression uses closed-form OLS; SGDRegressor uses iterative optimization.


🌱 Conceptual Explanation

Scikit-learn provides both analytical and iterative implementations for Linear Regression, making it easy to use in practice.

πŸ“ Technical / Math Details

  • LinearRegression: solves $$ w = (X^TX)^{-1}X^Ty $$
  • SGDRegressor: applies gradient descent updates.

βš–οΈ Trade-offs & Production Notes

  • OLS is fast for small/medium data.
  • SGD scales better for huge datasets.

🚨 Common Pitfalls

  • Not normalizing input data before SGD.

πŸ—£ Interview-ready Answer

“Scikit-learn’s LinearRegression uses closed-form OLS, while SGDRegressor applies gradient descent for scalability.”

πŸ“ Key Formulas

Univariate Regression Equation
$$ \hat{y} = w_0 + w_1 x $$
  • $w_0$: bias/intercept
  • $w_1$: slope/weight
  • $x$: input feature
    Interpretation: Models output as a straight line function of input.
Multivariate Regression Equation
$$ \hat{y} = w^T x $$
  • $w$: weight vector
  • $x$: feature vector (with bias term $x_0=1$)
    Interpretation: Generalizes line to a hyperplane in higher dimensions.
Mean Squared Error (Cost Function)
$$ J(w) = \frac{1}{2m} \sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})^2 $$
  • $m$: number of samples
  • $y^{(i)}$: true output
  • $\hat{y}^{(i)}$: predicted output
    Interpretation: Average squared deviation between predictions and truth; convex, easy to optimize.
Gradient Descent Update Rule
$$ w_j := w_j - \alpha \frac{\partial J(w)}{\partial w_j} $$
  • $\alpha$: learning rate
  • $\frac{\partial J}{\partial w_j}$: gradient for parameter $w_j$
    Interpretation: Iteratively adjusts weights to minimize cost function.

βœ… Cheatsheet

  • Regression predicts continuous outputs; classification β†’ discrete.
  • Overfitting = high variance, underfitting = high bias.
  • Univariate: one feature β†’ straight line; multivariate: multiple features β†’ hyperplane.
  • Cost function: MSE, convex, optimized via gradient descent.
  • Batch vs SGD vs Mini-batch: trade-off between stability and speed.
  • Scikit-learn: LinearRegression (OLS), SGDRegressor (gradient descent).