2.1. Memory Systems — Short-Term vs Long-Term Context


🪄 Step 1: Intuition & Motivation

  • Core Idea: If an agent is to act like a thinking being, it needs more than logic — it needs memory. Without memory, even the smartest model becomes a goldfish: it can reason deeply for one moment but forget everything once the next prompt arrives.

    Memory Systems give agents continuity — the ability to recall past interactions, learn from experience, and adapt over time.

  • Simple Analogy: Imagine meeting someone every day who forgets who you are. You’d spend half your life reintroducing yourself. Agents without memory are exactly that — brilliant conversationalists with amnesia.


🌱 Step 2: Core Concept

Let’s peek inside how agents remember information — and how that memory shapes their intelligence.


What’s Happening Under the Hood?

Agents maintain two broad categories of memory:

  1. Short-Term Memory (STM) — the working memory that holds recent context (like what was said in the last few turns).

    • This is typically managed through context windows (the tokens an LLM can “see” at once).
    • Once the window overflows, old messages are forgotten.
  2. Long-Term Memory (LTM) — the deep storage that holds knowledge across many sessions.

    • This is often implemented using vector databases like FAISS or ChromaDB.
    • Instead of remembering exact text, it stores embeddings — numerical representations of meaning.

When the agent faces a new query, it searches through this vector memory to find semantically similar past experiences and brings them back into context.
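
To make this concrete, here is a minimal sketch of long-term memory backed by ChromaDB's in-process client and its default embedding function. The helper names (`remember`, `recall`) are illustrative, not part of any framework:

```python
# Minimal sketch: long-term memory backed by ChromaDB (in-memory client).
# `remember` and `recall` are illustrative helpers, not framework APIs.
import chromadb

client = chromadb.Client()                        # ephemeral, in-process store
memory = client.create_collection("agent_memory")

def remember(text: str, entry_id: str) -> None:
    """Embed and store one memory entry (Chroma embeds automatically)."""
    memory.add(documents=[text], ids=[entry_id])

def recall(query: str, k: int = 3) -> list[str]:
    """Retrieve the k most semantically similar past entries."""
    results = memory.query(query_texts=[query], n_results=k)
    return results["documents"][0]

remember("User prefers concise answers with code examples.", "pref-1")
remember("User is building a RAG pipeline over PDF manuals.", "ctx-1")

# On a new turn, pull relevant memories back into the prompt context:
print(recall("What format does the user like responses in?", k=1))
```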


Why It Works This Way

Because LLMs can’t truly remember — their weights are fixed once trained. What we call “memory” is actually retrieval — pulling relevant old data into the current conversation window.

This design mirrors human cognition:

  • Short-term memory helps us keep track of what we’re currently doing.
  • Long-term memory helps us recall past experiences and lessons.

By combining both, agents achieve something close to “continuity of thought.”


How It Fits in ML Thinking

In ML terms, memory systems act as external attention mechanisms. They extend the Transformer’s internal attention span by offloading context to an external storage, then selectively pulling it back when needed.

This architecture — called Retrieval-Augmented Generation (RAG) — lies at the heart of modern intelligent agents and knowledge systems.
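
Schematically, the retrieve-then-generate loop looks like the sketch below. It reuses the illustrative `recall` helper from the earlier sketch; `llm_complete` is a stub standing in for whatever model client you actually use:

```python
# Sketch of the RAG step: pull relevant memories, splice them into the
# prompt, then generate. `llm_complete` is a stub, not a real client.

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for an actual LLM client call."""
    return f"[model response given prompt of {len(prompt)} chars]"

def answer_with_memory(user_query: str, k: int = 3) -> str:
    retrieved = recall(user_query, k)             # long-term memory lookup
    context = "\n".join(f"- {m}" for m in retrieved)
    prompt = (
        "Relevant past context:\n"
        f"{context}\n\n"
        f"Current question: {user_query}\n"
        "Answer using the context where it helps."
    )
    return llm_complete(prompt)
```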


📐 Step 3: Mathematical Foundation

Let’s describe how retrieval-based memory works in mathematical form.

Memory Retrieval Equation

Given a current query vector $q$ and stored memory embeddings $M = \{m_1, m_2, \dots, m_n\}$, the agent retrieves the most relevant memories using cosine similarity:

$$ \text{sim}(q, m_i) = \frac{q \cdot m_i}{\lVert q \rVert \, \lVert m_i \rVert} $$

The top-$k$ similar memories are then added back into the agent’s input context.

Think of cosine similarity as a measure of semantic closeness: the more two thoughts (vectors) point in the same direction, the more related they are in meaning.
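
Here is the retrieval equation worked out in a few lines of NumPy, with toy three-dimensional embeddings standing in for real ones (which typically have hundreds of dimensions):

```python
# Worked example of the retrieval equation: cosine similarity between a
# query vector and stored memory embeddings, then a top-k selection.
import numpy as np

q = np.array([0.9, 0.1, 0.3])                    # current query embedding
M = np.array([[0.8, 0.2, 0.4],                   # memory m_1 (similar topic)
              [0.1, 0.9, 0.2],                   # memory m_2 (different topic)
              [0.7, 0.0, 0.5]])                  # memory m_3 (similar topic)

# sim(q, m_i) = (q . m_i) / (||q|| * ||m_i||)
sims = M @ q / (np.linalg.norm(M, axis=1) * np.linalg.norm(q))

k = 2
top_k = np.argsort(sims)[::-1][:k]               # indices of the best matches
print(sims.round(3), "-> retrieve memories:", top_k)
```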

🧠 Step 4: Types of Memory

Memory in agent systems often mirrors human cognitive structures.

| Type | Purpose | Analogy |
| --- | --- | --- |
| Episodic Memory | Stores specific experiences — what happened, when, and how. | Like remembering a past conversation. |
| Semantic Memory | Stores general knowledge and facts derived from experience. | Like remembering that Paris is the capital of France. |
| Procedural Memory | Encodes learned routines or skills. | Like remembering how to solve a math problem. |

Most agent frameworks (like LangGraph and CrewAI) combine episodic and semantic memories to enable contextually rich interactions.
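
One common storage pattern (a sketch, not any specific framework's API) is to tag each entry with its memory type so retrieval can filter by kind:

```python
# Sketch: tag each stored entry with its memory type so retrieval can
# handle episodic, semantic, and procedural memories separately.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class MemoryEntry:
    text: str
    kind: str          # "episodic" | "semantic" | "procedural"
    created_at: datetime

entries = [
    MemoryEntry("User asked about FAISS index tuning yesterday.",
                "episodic", datetime.now(timezone.utc)),
    MemoryEntry("FAISS is a library for fast vector similarity search.",
                "semantic", datetime.now(timezone.utc)),
    MemoryEntry("To refresh the index: re-embed documents, then re-add them.",
                "procedural", datetime.now(timezone.utc)),
]

# e.g., ground a factual answer only in semantic memories:
facts = [e.text for e in entries if e.kind == "semantic"]
print(facts)
```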


🧠 Step 5: Implementation Principles

Let’s connect this concept to practical design:

  1. Vector Storage (FAISS / ChromaDB): Store embeddings for past conversations or actions.

    • Each entry has: text, vector, timestamp, and metadata (topic, task ID, etc.).
  2. Retrieval Policy: When a new prompt arrives, decide:

    • Recall: retrieve old related memories from storage.
    • Requery: start fresh when no relevant memory exists.
  3. Summarization & Compression: To avoid token overflow, old memory entries are summarized and stored as compact notes. (Example: a 1000-word log → “Agent successfully found top 5 tools after 3 attempts.”)

  4. Relevance Decay: Older or less-used memories are gradually deprioritized or deleted — mimicking human forgetting (sketched below).
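
Here is a small sketch of relevance decay with pruning: raw retrieval scores are down-weighted exponentially by age, and entries that fall below a threshold are forgotten. The one-week half-life and 0.1 threshold are arbitrary assumptions:

```python
# Sketch of relevance decay: down-weight similarity scores by age, then
# prune anything below a threshold. Constants are assumed, not canonical.
import math
import time

HALF_LIFE_S = 7 * 24 * 3600        # assumed: relevance halves every week
PRUNE_BELOW = 0.1                  # assumed: forget entries scoring under this

def decayed_score(base_similarity: float, stored_at: float) -> float:
    """Down-weight a raw similarity score by how old the memory is."""
    age = time.time() - stored_at
    return base_similarity * math.exp(-math.log(2) * age / HALF_LIFE_S)

# A month-old memory with raw similarity 0.9 scores far lower today:
month_old = time.time() - 30 * 24 * 3600
score = decayed_score(0.9, month_old)
print(f"{score:.3f}", "-> prune" if score < PRUNE_BELOW else "-> keep")
```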


⚖️ Step 6: Strengths, Limitations & Trade-offs

Strengths:

  • Gives agents continuity of thought across long tasks.
  • Enables self-learning — building knowledge over time.
  • Reduces hallucination by grounding reasoning in prior results.

Limitations:

  • Memory can drift — irrelevant or outdated info may dominate retrieval.
  • Large vector stores can slow retrieval without re-indexing.
  • Over-reliance on memory can prevent fresh reasoning.

The key trade-off is between rich context and performance. Too much memory causes confusion; too little causes forgetfulness. Smart agents balance both using dynamic relevance scoring and memory pruning.

🚧 Step 7: Common Misunderstandings

  • “LLMs can remember past sessions.” Not by default — they only see what’s in the input window unless connected to external memory.

  • “All memory should be stored forever.” No — long-term memory needs pruning to avoid drift and redundancy.

  • “More memory = better performance.” Actually, more memory can slow retrieval and introduce irrelevant context.


🧩 Step 8: Mini Summary

🧠 What You Learned: Memory systems give agents persistence — the ability to recall, reason, and adapt based on prior interactions.

⚙️ How It Works: Agents combine short-term context windows with long-term vector retrieval to simulate learning and continuity.

🎯 Why It Matters: Without memory, agents can’t maintain goals, learn from mistakes, or perform complex, multi-step reasoning — memory is the bridge between “thinking” and “experience.”
