2.1. Performance & Capacity in ML Systems



⚡ Short Theories

Training complexity measures how expensive it is to fit the model. Evaluation complexity measures inference cost. Sample complexity is how much training data the model needs to generalize.

Linear regression is lightweight in both training and inference, making it useful for low-latency applications.

Deep neural networks demand large datasets and compute but capture complex functions, trading speed for accuracy.

Performance (latency) and capacity (throughput) constraints, formalized as SLAs, define the practical feasibility of ML models in production.

Funnel modeling balances cost and accuracy: cheap models handle bulk data, deeper models refine smaller subsets.

🎤 Interview Q&A

Q1: What are the three types of ML complexities and why do they matter?

🎯 TL;DR: Training, evaluation, and sample complexity guide model choice based on cost, latency, and data needs.


🌱 Conceptual Explanation

Complexity defines resources consumed: training (how long to fit), evaluation (latency at prediction), and sample (data volume required). They directly influence feasibility.

📐 Technical / Math Details

  • Training: $O(nfe)$ for linear regression ($n$ samples, $f$ features, $e$ epochs); much higher for deep nets, scaling with network size per sample per epoch.
  • Evaluation: $O(f)$ for linear models vs. $O(fn_{l1} + n_{l1}n_{l2} + \dots)$ for neural nets.
  • Sample: grows with model capacity (e.g., VC dimension).
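The first two bullets can be turned into back-of-envelope operation counts. This is an illustrative sketch only: the function names and the example sizes (1M samples, 100 features, a 100→256→64→1 net) are assumptions, and constants, memory traffic, and hardware effects are ignored.

```python
# Rough multiply-add counts for the complexity formulas above (sketch;
# all concrete sizes below are illustrative assumptions).

def linreg_train_ops(n, f, e):
    """Gradient-descent linear regression: O(n * f * e)."""
    return n * f * e

def linreg_eval_ops(f):
    """One dot product per prediction: O(f)."""
    return f

def dnn_eval_ops(f, layers):
    """Feed-forward pass: O(f*n_l1 + n_l1*n_l2 + ...)."""
    widths = [f] + list(layers)
    return sum(a * b for a, b in zip(widths, widths[1:]))

# 1M samples, 100 features, 10 epochs vs. a 100 -> 256 -> 64 -> 1 net.
print(linreg_train_ops(1_000_000, 100, 10))  # 1_000_000_000
print(linreg_eval_ops(100))                  # 100
print(dnn_eval_ops(100, [256, 64, 1]))       # 42048
```

Note the asymmetry: training cost is paid once, while the evaluation count is paid on every request, which is why the $O(f)$ vs. $O(fn_{l1} + \dots)$ gap dominates SLA discussions.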

⚖️ Trade-offs & Production Notes

  • High training cost can be amortized; eval cost hits every request.
  • Limited data means models with low sample complexity perform better.

🚨 Common Pitfalls

  • Ignoring inference complexity → SLA violations.
  • Underestimating data requirements for high-capacity models.

🗣 Interview-ready Answer

“ML algorithms differ in training, evaluation, and sample complexity. These matter because they determine data needs, inference latency, and whether we can meet SLA constraints.”


Q2: How do SLAs shape ML system design?

🎯 TL;DR: SLAs force us to balance accuracy with latency and throughput.


🌱 Conceptual Explanation

SLAs set strict bounds on latency (e.g., 500ms at 99th percentile) and throughput (QPS). Model choices must align with these limits.

📐 Technical / Math Details

  • If p99 inference latency exceeds the SLA bound, the system is out of compliance.
  • Capacity: e.g., a 1000 QPS requirement means the model must sustain that load, typically distributed across shards.
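A quick capacity check follows from these two numbers. The sketch below applies a Little's-law-style estimate; the function name and the concrete figures (1000 QPS, 500 ms p99, 50 concurrent requests per shard) are illustrative assumptions, not prescriptions.

```python
import math

def shards_needed(target_qps, p99_latency_s, concurrency_per_shard):
    """Back-of-envelope capacity estimate (Little's-law-style sketch)."""
    # In-flight requests at steady state ~= QPS * latency.
    in_flight = target_qps * p99_latency_s
    return math.ceil(in_flight / concurrency_per_shard)

# 1000 QPS at 500 ms p99, each shard handling 50 concurrent requests:
print(shards_needed(1000, 0.5, 50))  # 10
```

The same arithmetic also shows why shaving latency helps capacity: halving p99 latency halves the in-flight requests and thus the shards needed at a fixed QPS.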

⚖️ Trade-offs & Production Notes

  • Sharding/distribution can help scale capacity.
  • Faster models may be less accurate but necessary for SLA compliance.

🚨 Common Pitfalls

  • Optimizing solely for accuracy and ignoring SLA metrics.
  • Assuming capacity can be scaled infinitely.

🗣 Interview-ready Answer

“SLAs impose latency and throughput constraints, so we often trade some accuracy for faster inference or use distributed setups.”


Q3: Why is the funnel-based modeling approach effective?

🎯 TL;DR: It applies cheap models to filter bulk data, saving expensive models for small subsets.


🌱 Conceptual Explanation

Instead of running a deep net on 100M docs, use a fast filter (linear/tree) first. Then refine top candidates with slower, more accurate models.

📐 Technical / Math Details

  • Stage 1: $O(f)$ linear scoring over millions of documents.
  • Stage 2: $O(fn_{l1} + n_{l1}n_{l2} + \dots)$ DNN over the top few hundred.
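The two stages above can be sketched as a small ranking funnel. This is a minimal illustration: the scorers are hypothetical stand-in lambdas, and `stage1_keep` is the threshold the trade-off notes below warn about.

```python
# Two-stage funnel sketch (hypothetical scorers): a cheap score filters
# the full corpus, an expensive score re-ranks only the survivors.

def funnel_rank(docs, cheap_score, expensive_score, stage1_keep=100):
    # Stage 1: cheap O(f) model applied to every document.
    shortlisted = sorted(docs, key=cheap_score, reverse=True)[:stage1_keep]
    # Stage 2: expensive model applied only to the shortlist.
    return sorted(shortlisted, key=expensive_score, reverse=True)

docs = list(range(10_000))
top = funnel_rank(
    docs,
    cheap_score=lambda d: d % 997,            # stand-in for a linear scorer
    expensive_score=lambda d: -abs(d - 5000), # stand-in for a DNN scorer
    stage1_keep=100,
)
print(len(top))  # 100
```

The key design choice is `stage1_keep`: too small and good candidates are filtered out before the accurate model ever sees them (the recall-loss pitfall below); too large and the expensive stage dominates cost.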

⚖️ Trade-offs & Production Notes

  • Balances accuracy and performance.
  • Requires careful thresholding to avoid filtering out good candidates early.

🚨 Common Pitfalls

  • Poorly tuned thresholds → recall loss.
  • Complex funnel design adds engineering overhead.

🗣 Interview-ready Answer

“Funnel modeling lets us handle scale by applying fast models broadly, then using complex models only on a narrowed candidate set.”


Q4: Compare linear regression, tree-based, and deep neural nets for performance and capacity.

🎯 TL;DR: Linear = fastest but simple, trees = moderate cost/good generalization, DNNs = accurate but expensive.


🌱 Conceptual Explanation

Different models suit different resource constraints. Linear models win on speed, trees balance generalization, and DNNs maximize accuracy at cost.

📐 Technical / Math Details

  • Linear regression: Train $O(nfe)$, Eval $O(f)$.
  • MART: Train $O(ndfn_{trees})$, Eval $O(fdn_{trees})$.
  • DNNs: Training is much slower (scales with samples, epochs, and network size); Eval $O(fn_{l1} + n_{l1}n_{l2} + \dots)$.

⚖️ Trade-offs & Production Notes

  • Use linear for SLA-critical cases.
  • Use trees for medium data, balanced performance.
  • Use DNNs for complex tasks when capacity allows.

🚨 Common Pitfalls

  • Assuming DNNs are always the best choice.
  • Ignoring the evaluation cost of deep or numerous trees.

🗣 Interview-ready Answer

“Linear models are fastest, trees balance cost and generalization, and deep nets provide accuracy at much higher training and inference costs.”


Q5: How does distribution help with ML system performance?

🎯 TL;DR: Sharding spreads workload, reducing latency and meeting capacity needs.


🌱 Conceptual Explanation

If 100M documents take too long, divide them across 1000 machines. Each shard processes fewer docs, reducing wall-clock time.

📐 Technical / Math Details

If single-machine time is $T$, then with $k$ shards the time is roughly $T/k$ (ignoring coordination overhead).
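The $T/k$ model can be extended with the overhead term the notes below mention. A minimal sketch, assuming a fixed fan-out/gather cost; the function name and the example numbers (100 s of work, 1000 shards, 20 ms overhead) are illustrative.

```python
def sharded_latency(single_machine_s, k, overhead_s=0.0):
    """Ideal T/k scaling plus a fixed fan-out/gather overhead (sketch)."""
    return single_machine_s / k + overhead_s

# 100 s of single-machine work across 1000 shards, 20 ms coordination:
print(sharded_latency(100.0, 1000, overhead_s=0.02))  # ~0.12 s
```

Once $T/k$ shrinks below the overhead term, adding shards stops helping, which is exactly the "overestimating perfect scaling" pitfall listed below.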

⚖️ Trade-offs & Production Notes

  • Distribution improves capacity.
  • Network/coordination overhead must be considered.

🚨 Common Pitfalls

  • Overestimating perfect scaling.
  • Ignoring system bottlenecks like I/O or synchronization.

🗣 Interview-ready Answer

“Distributed inference shards queries across machines, cutting down latency and increasing throughput, though overhead limits perfect scaling.”


📐 Key Formulas

Linear Regression Complexity
  • Training: $$O(nfe)$$
  • Evaluation: $$O(f)$$
    $n$: samples, $f$: features, $e$: epochs.
    Interpretation: Very efficient, scales linearly with data and features.
Neural Network Evaluation Complexity
$$ O(f n_{l1} + n_{l1} n_{l2} + ... ) $$
  • $f$: features, $n_{li}$: neurons at layer $i$.
    Interpretation: Cost grows with layer size; deeper nets are slower.
MART Complexity
  • Training: $$O(ndfn_{trees})$$
  • Evaluation: $$O(fdn_{trees})$$
    $n$: samples, $d$: depth, $f$: features, $n_{trees}$: number of trees.
    Interpretation: Cost scales with the number and depth of trees; slower than linear models but typically cheaper than deep nets at inference.

✅ Cheatsheet

  • Training Complexity: Time to fit model.
  • Evaluation Complexity: Latency at inference.
  • Sample Complexity: Data needed to generalize.
  • Linear Models: Fast, SLA-friendly.
  • Tree Models: Balanced generalization.
  • DNNs: High accuracy, high cost.
  • Funnel Approach: Fast → complex layers to balance performance & capacity.