2.2. Model Registry & Versioning
🪄 Step 1: Intuition & Motivation
Core Idea (in 1 short paragraph): A model registry is the official “source of truth” for your models — not just the file with weights, but what it is, how it was trained, where it came from, and how to reproduce it. Think of it as a passport office for models: every model gets a unique identity, a history, stamps for where it’s allowed to travel (stages like Staging → Production), and the paperwork to recreate it exactly.
Simple Analogy (one only): It’s a museum archive: storing the painting (model artifact), the curator’s notes (metrics, params), the exhibition history (lineage), and the exact lighting and frame (runtime environment). With this, you can rehang the same exhibition years later.
🌱 Step 2: Core Concept
Model registries (like MLflow Model Registry or Vertex AI Model Registry) standardize how models move from experiment to production while staying reproducible.
What’s Happening Under the Hood?
A registry typically tracks:
- Artifacts: the model file(s) — weights, tokenizer, signature (input/output schema).
- Metadata: metrics (AUC, loss), hyperparameters, dataset versions, code commit SHA.
- Lineage: which data, features, and code produced this model; upstream runs and downstream deployments.
- Stages & Tags:
  - Stages: e.g., None → Staging → Production → Archived.
  - Tags: free-form labels like `model:v3.2-prod`, `training:data-2025-10-28`, `quant:int8`.
- Permissions & Audit: who promoted, when, and why (notes or approval tickets).
- Rollbacks: ability to promote a prior version back to Production quickly.
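For concreteness, here is a minimal sketch of registering a model with these pieces attached, using the MLflow Python API. The model name `recsys` comes from the promotion example below; the hyperparameters, metric, and tag keys (`code_sha`, `data_snapshot`) are illustrative assumptions, not MLflow conventions:

```python
import mlflow
from mlflow.models.signature import infer_signature
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative training run; data and hyperparameters are placeholders.
X, y = make_classification(n_samples=500, random_state=42)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    # Metadata: hyperparameters, metrics, and lineage references as tags.
    mlflow.log_params({"max_iter": 200, "C": 1.0})
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.set_tags({
        "code_sha": "abc1234",               # git commit of the training code
        "data_snapshot": "data-2025-10-28",  # dataset version reference
    })

    # Artifact + signature (input/output schema), registered under one name.
    signature = infer_signature(X, model.predict(X))
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        signature=signature,
        registered_model_name="recsys",  # creates a new version in the registry
    )
```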
Promotion flow example:
Experiment Run → Register as recsys v7 → Stage: Staging → Shadow/Offline validation → Canary → Stage: Production → Monitor → If regressions: rollback to v6.
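A sketch of the stage transitions in that flow, using MLflow's classic stage-based client API (newer MLflow releases favor aliases instead of stages, but stages map directly onto the flow above):

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote version 7 of "recsys" to Staging for shadow/offline validation.
client.transition_model_version_stage(
    name="recsys", version="7", stage="Staging"
)

# After validation and the canary pass, promote to Production,
# archiving whichever version currently holds that stage.
client.transition_model_version_stage(
    name="recsys", version="7", stage="Production",
    archive_existing_versions=True,
)
```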
Why It Works This Way
- Consistency: One canonical place prevents “which model are we serving?” chaos.
- Reproducibility: Capturing data/code/env links means you can recreate results precisely.
- Velocity with Safety: Stages and tags let you ship fast while keeping a paper trail and instant rollback.
How It Fits in ML Thinking
The registry sits at the intersection of experimentation and operations:
- From the data side: it references datasets, feature sets, and time windows used.
- From the code side: it pins commit hashes, dependency lockfiles, and Docker images.
- From the ops side: it governs deployment stages, approvals, and rollbacks.
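One way those references can be pinned to a registered version is via model-version tags. A hedged sketch follows; the tag keys and digest values are conventions invented for this example, not built-in MLflow fields:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Pin the exact code, environment, and data behind version 7 of "recsys".
for key, value in {
    "git_sha": "abc1234",
    "docker_image": "registry.example.com/recsys-train@sha256:deadbeef",
    "requirements_lock": "requirements.txt@sha256:cafef00d",
    "dataset_snapshot": "data-2025-10-28",
}.items():
    client.set_model_version_tag("recsys", "7", key, value)
```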
📐 Step 3: Mathematical Foundation
There’s no heavy math here, but two tiny concepts help intuition: deterministic identity and semantic versioning.
Deterministic Identity via Hashes
A run (training or packaging) can be uniquely identified by a content hash:
$$ \text{run\_id} = H(\text{code\_sha} \,\|\, \text{data\_snapshot\_id} \,\|\, \text{params} \,\|\, \text{env\_lock}) $$

- $H(\cdot)$ is a cryptographic hash (conceptually).
- If any ingredient changes (data, code, params, environment), the identity changes.
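In code, this is just a content hash over the canonicalized ingredients. A minimal sketch (the SHA-256 choice and the 12-character truncation are arbitrary display decisions):

```python
import hashlib
import json

def run_id(code_sha: str, data_snapshot_id: str,
           params: dict, env_lock: str) -> str:
    """Deterministic identity: same ingredients -> same id; any change -> new id."""
    payload = "|".join([
        code_sha,
        data_snapshot_id,
        json.dumps(params, sort_keys=True),  # canonical param ordering
        env_lock,
    ])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

print(run_id("abc1234", "data-2025-10-28",
             {"C": 1.0, "max_iter": 200},
             "requirements.txt@sha256:cafef00d"))
```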
Semantic Versioning for Models
Treat versions like:
$$ \text{major.minor.patch} $$

- major: architecture/feature space changes (breaking).
- minor: retrain on new data or improved calibration (compatible).
- patch: metadata fix, small bug, or environment tweak (no metric shifts).
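A tiny sketch of what bumping under these semantics looks like (the function name and string format are assumptions for illustration):

```python
def bump(version: str, level: str) -> str:
    """Bump a model version string according to the semantics above."""
    major, minor, patch = map(int, version.split("."))
    if level == "major":   # architecture / feature-space change (breaking)
        return f"{major + 1}.0.0"
    if level == "minor":   # retrain on new data, compatible interface
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # metadata or environment tweak

assert bump("3.2.1", "minor") == "3.3.0"
```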
🧠 Step 4: Assumptions or Key Ideas
- Saving `model.pkl` isn't enough: you also need data versions, code commit, and environment to reproduce.
- Environment pinning is essential: lock the Docker image plus Conda/requirements files to freeze exact libs and CUDA versions.
- Lineage matters: link model → dataset snapshot → feature definitions → code version.
- Stages & tags enforce governance: who promoted what, backed by which test evidence.
- Rollbacks are first-class: every promotion plan includes a rollback plan.
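Because rollbacks are first-class, the rollback plan can be as simple as re-promoting the previous version. Again using MLflow's stage API as an illustrative assumption:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# v7 regressed in Production: re-promote v6 and archive v7 in one call.
client.transition_model_version_stage(
    name="recsys", version="6", stage="Production",
    archive_existing_versions=True,  # moves v7 out of Production
)
```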
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths
- Single source of truth: artifacts + metrics + lineage.
- Faster incident response with one-click rollback.
- Trustworthy audits for experiments and deployments.
- Enables side-by-side comparisons and promotions with confidence.
Limitations
- Operational overhead to maintain metadata quality.
- Cultural shift required: disciplined tagging, notes, approvals.
- Storage and retention policies needed for large artifacts.
Trade-offs
- Speed vs. Rigor: strict governance slows ad-hoc experiments, but saves days during incidents.
- Granularity vs. Cost: more detailed lineage (data/feature/version) improves traceability but increases storage and tooling complexity.
🚧 Step 6: Common Misunderstandings (Optional)
- “If I can load the model file, I can reproduce it.”
  → You also need the exact environment and exact data snapshot to match metrics.
- “Tags are decoration.”
  → Tags encode deploy intent and routing (e.g., `canary`, `shadow`, `prod`), enabling safe automation.
- “Lineage is overkill.”
  → Without lineage, drift/outage investigations turn into guesswork; mean time to recovery balloons.
🧩 Step 7: Mini Summary
🧠 What You Learned: A model registry tracks artifacts, metadata, lineage, stages, and tags so models move from experiment to production safely and reproducibly.
⚙️ How It Works: Register models with tied data/code/env references; promote through stages with approvals; use tags for routing; and keep rollbacks ready.
🎯 Why It Matters: It turns “a file on disk” into a reproducible, governable asset you can trust under pressure.