2.1. Performance & Capacity in ML Systems



⚡ Short Theories

Training complexity measures how expensive it is to fit the model. Evaluation complexity measures inference cost. Sample complexity is how much training data the model needs to generalize.

Linear regression is lightweight in both training and inference, making it useful for low-latency applications.

Deep neural networks demand large datasets and compute but capture complex functions, trading speed for accuracy.

Performance (latency) and capacity (throughput) constraints, formalized as SLAs, define the practical feasibility of ML models in production.

Funnel modeling balances cost and accuracy: cheap models handle bulk data, deeper models refine smaller subsets.

🎤 Interview Q&A

Q1: What are the three types of ML complexities and why do they matter?

🎯 TL;DR: Training, evaluation, and sample complexity guide model choice based on cost, latency, and data needs.


🌱 Conceptual Explanation

Complexity defines resources consumed: training (how long to fit), evaluation (latency at prediction), and sample (data volume required). They directly influence feasibility.

📐 Technical / Math Details

  • Training: $O(nfe)$ for linear regression ($n$ samples, $f$ features, $e$ epochs); much higher for deep nets, scaling with network size per sample per epoch.
  • Evaluation: $O(f)$ for linear models vs. $O(fn_{l1} + n_{l1}n_{l2} + \dots)$ for neural nets.
  • Sample: grows with model capacity (e.g., VC dimension).
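The first two bullets can be turned into back-of-envelope operation counts. This is an illustrative sketch only: the function names and the example sizes (1M samples, 100 features, a 100→256→64→1 net) are assumptions, and constants, memory traffic, and hardware effects are ignored.

```python
# Rough multiply-add counts for the complexity formulas above (sketch;
# all concrete sizes below are illustrative assumptions).

def linreg_train_ops(n, f, e):
    """Gradient-descent linear regression: O(n * f * e)."""
    return n * f * e

def linreg_eval_ops(f):
    """One dot product per prediction: O(f)."""
    return f

def dnn_eval_ops(f, layers):
    """Feed-forward pass: O(f*n_l1 + n_l1*n_l2 + ...)."""
    widths = [f] + list(layers)
    return sum(a * b for a, b in zip(widths, widths[1:]))

# 1M samples, 100 features, 10 epochs vs. a 100 -> 256 -> 64 -> 1 net.
print(linreg_train_ops(1_000_000, 100, 10))  # 1_000_000_000
print(linreg_eval_ops(100))                  # 100
print(dnn_eval_ops(100, [256, 64, 1]))       # 42048
```

Note the asymmetry: training cost is paid once, while the evaluation count is paid on every request, which is why the $O(f)$ vs. $O(fn_{l1} + \dots)$ gap dominates SLA discussions.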

⚖️ Trade-offs & Production Notes

  • High training cost can be amortized; eval cost hits every request.
  • Limited data means models with low sample complexity perform better.

🚨 Common Pitfalls

  • Ignoring inference complexity → SLA violations.
  • Underestimating data requirements for high-capacity models.

🗣 Interview-ready Answer

“ML algorithms differ in training, evaluation, and sample complexity. These matter because they determine data needs, inference latency, and whether we can meet SLA constraints.”


Q2: How do SLAs shape ML system design?

🎯 TL;DR: SLAs force us to balance accuracy with latency and throughput.


🌱 Conceptual Explanation

SLAs set strict bounds on latency (e.g., 500ms at 99th percentile) and throughput (QPS). Model choices must align with these limits.

📐 Technical / Math Details

  • If p99 inference latency exceeds the SLA bound, the system is out of compliance.
  • Capacity: e.g., a 1000 QPS requirement means the model must sustain that load, typically distributed across shards.
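A quick capacity check follows from these two numbers. The sketch below applies a Little's-law-style estimate; the function name and the concrete figures (1000 QPS, 500 ms p99, 50 concurrent requests per shard) are illustrative assumptions, not prescriptions.

```python
import math

def shards_needed(target_qps, p99_latency_s, concurrency_per_shard):
    """Back-of-envelope capacity estimate (Little's-law-style sketch)."""
    # In-flight requests at steady state ~= QPS * latency.
    in_flight = target_qps * p99_latency_s
    return math.ceil(in_flight / concurrency_per_shard)

# 1000 QPS at 500 ms p99, each shard handling 50 concurrent requests:
print(shards_needed(1000, 0.5, 50))  # 10
```

The same arithmetic also shows why shaving latency helps capacity: halving p99 latency halves the in-flight requests and thus the shards needed at a fixed QPS.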

⚖️ Trade-offs & Production Notes

  • Sharding/distribution can help scale capacity.
  • Faster models may be less accurate but necessary for SLA compliance.

🚨 Common Pitfalls

  • Optimizing solely for accuracy and ignoring SLA metrics.
  • Assuming capacity can be scaled infinitely.

🗣 Interview-ready Answer

“SLAs impose latency and throughput constraints, so we often trade some accuracy for faster inference or use distributed setups.”


Q3: Why is the funnel-based modeling approach effective?

🎯 TL;DR: It applies cheap models to filter bulk data, saving expensive models for small subsets.


🌱 Conceptual Explanation

Instead of running a deep net on 100M docs, use a fast filter (linear/tree) first. Then refine top candidates with slower, more accurate models.

📐 Technical / Math Details

  • Stage 1: $O(f)$ linear scoring over millions of documents.
  • Stage 2: $O(fn_{l1} + n_{l1}n_{l2} + \dots)$ DNN over the top few hundred.
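The two stages above can be sketched as a small ranking funnel. This is a minimal illustration: the scorers are hypothetical stand-in lambdas, and `stage1_keep` is the threshold the trade-off notes below warn about.

```python
# Two-stage funnel sketch (hypothetical scorers): a cheap score filters
# the full corpus, an expensive score re-ranks only the survivors.

def funnel_rank(docs, cheap_score, expensive_score, stage1_keep=100):
    # Stage 1: cheap O(f) model applied to every document.
    shortlisted = sorted(docs, key=cheap_score, reverse=True)[:stage1_keep]
    # Stage 2: expensive model applied only to the shortlist.
    return sorted(shortlisted, key=expensive_score, reverse=True)

docs = list(range(10_000))
top = funnel_rank(
    docs,
    cheap_score=lambda d: d % 997,            # stand-in for a linear scorer
    expensive_score=lambda d: -abs(d - 5000), # stand-in for a DNN scorer
    stage1_keep=100,
)
print(len(top))  # 100
```

The key design choice is `stage1_keep`: too small and good candidates are filtered out before the accurate model ever sees them (the recall-loss pitfall below); too large and the expensive stage dominates cost.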

⚖️ Trade-offs & Production Notes

  • Balances accuracy and performance.
  • Requires careful thresholding to avoid filtering out good candidates early.

🚨 Common Pitfalls

  • Poorly tuned thresholds → recall loss.
  • Complex funnel design adds engineering overhead.

🗣 Interview-ready Answer

“Funnel modeling lets us handle scale by applying fast models broadly, then using complex models only on a narrowed candidate set.”


Q4: Compare linear regression, tree-based, and deep neural nets for performance and capacity.

🎯 TL;DR: Linear = fastest but simple, trees = moderate cost/good generalization, DNNs = accurate but expensive.


🌱 Conceptual Explanation

Different models suit different resource constraints. Linear models win on speed, trees balance generalization, and DNNs maximize accuracy at cost.

📐 Technical / Math Details

  • Linear regression: Train $O(nfe)$, Eval $O(f)$.
  • MART: Train $O(ndfn_{trees})$, Eval $O(fdn_{trees})$.
  • DNNs: Training is much slower (scales with samples, epochs, and network size); Eval $O(fn_{l1} + n_{l1}n_{l2} + \dots)$.

⚖️ Trade-offs & Production Notes

  • Use linear for SLA-critical cases.
  • Use trees for medium data, balanced performance.
  • Use DNNs for complex tasks when capacity allows.

🚨 Common Pitfalls

  • Assuming DNNs are always the best choice.
  • Ignoring the evaluation cost of deep or numerous trees.

🗣 Interview-ready Answer

“Linear models are fastest, trees balance cost and generalization, and deep nets provide accuracy at much higher training and inference costs.”


Q5: How does distribution help with ML system performance?

🎯 TL;DR: Sharding spreads workload, reducing latency and meeting capacity needs.


🌱 Conceptual Explanation

If 100M documents take too long, divide them across 1000 machines. Each shard processes fewer docs, reducing wall-clock time.

📐 Technical / Math Details

If single-machine time is $T$, then with $k$ shards the time is roughly $T/k$ (ignoring coordination overhead).
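The $T/k$ model can be extended with the overhead term the notes below mention. A minimal sketch, assuming a fixed fan-out/gather cost; the function name and the example numbers (100 s of work, 1000 shards, 20 ms overhead) are illustrative.

```python
def sharded_latency(single_machine_s, k, overhead_s=0.0):
    """Ideal T/k scaling plus a fixed fan-out/gather overhead (sketch)."""
    return single_machine_s / k + overhead_s

# 100 s of single-machine work across 1000 shards, 20 ms coordination:
print(sharded_latency(100.0, 1000, overhead_s=0.02))  # ~0.12 s
```

Once $T/k$ shrinks below the overhead term, adding shards stops helping, which is exactly the "overestimating perfect scaling" pitfall listed below.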

⚖️ Trade-offs & Production Notes

  • Distribution improves capacity.
  • Network/coordination overhead must be considered.

🚨 Common Pitfalls

  • Overestimating perfect scaling.
  • Ignoring system bottlenecks like I/O or synchronization.

🗣 Interview-ready Answer

“Distributed inference shards queries across machines, cutting down latency and increasing throughput, though overhead limits perfect scaling.”


📐 Key Formulas

Linear Regression Complexity
  • Training: $$O(nfe)$$
  • Evaluation: $$O(f)$$
    $n$: samples, $f$: features, $e$: epochs.
    Interpretation: Very efficient, scales linearly with data and features.
Neural Network Evaluation Complexity
$$ O(f n_{l1} + n_{l1} n_{l2} + ... ) $$
  • $f$: features, $n_{li}$: neurons at layer $i$.
    Interpretation: Cost grows with layer size; deeper nets are slower.
MART Complexity
  • Training: $$O(ndfn_{trees})$$
  • Evaluation: $$O(fdn_{trees})$$
    $n$: samples, $d$: depth, $f$: features, $n_{trees}$: number of trees.
    Interpretation: Cost scales with the number and depth of trees; slower than linear models but typically cheaper than deep nets at inference.

✅ Cheatsheet

  • Training Complexity: Time to fit model.
  • Evaluation Complexity: Latency at inference.
  • Sample Complexity: Data needed to generalize.
  • Linear Models: Fast, SLA-friendly.
  • Tree Models: Balanced generalization.
  • DNNs: High accuracy, high cost.
  • Funnel Approach: Fast → complex layers to balance performance & capacity.