5.3. Prompting, In-Context Learning, and RLHF


🪄 Step 1: Intuition & Motivation

  • Core Idea: Transformers today don’t just predict words — they reason, adapt, and follow instructions. What changed? We didn’t rewrite their architecture — we changed how we talk to them.

That’s the power of prompting and in-context learning (ICL) — getting a frozen model to behave as if it were retrained for your specific task, on the fly.

Then, to align their outputs with human values, we add a final layer of refinement — RLHF (Reinforcement Learning from Human Feedback) — teaching models not only to predict correctly, but to respond helpfully, honestly, and safely.

Together, these techniques transformed Transformers from “text predictors” into “reasoning assistants.”


  • Simple Analogy: Imagine a brilliant student (the pretrained Transformer). They’ve read everything ever written, but they don’t know what you want unless you phrase it right. A good prompt gives context and expectations. In-context learning lets them pick up your desired behavior from a few examples. RLHF teaches them manners — so they don’t just answer correctly, but answer usefully.

🌱 Step 2: Core Concept

We’ll explore the full picture step by step:

  1. Prompting & Few-Shot Abilities
  2. In-Context Learning (Meta-Learning Emergence)
  3. Instruction Tuning & RLHF (Alignment through Feedback)

1️⃣ Prompting — Talking to Transformers

At its core, prompting is programming through natural language. You feed the model a prompt that defines the task, and it predicts the continuation.

Example:

Translate English to French:
English: I love pizza
French:

The model predicts → “J’adore la pizza.”

Same architecture, no retraining — just a new context.

Prompt Types:

| Type | Description | Example |
| --- | --- | --- |
| Zero-shot | Just the instruction | “Summarize this text:” |
| One-shot | One example provided | “Translate: I love → J’adore” |
| Few-shot | Several examples | “I love → J’aime; I run → Je cours; I eat → Je mange” |

Mechanism: The model uses its learned statistical associations to infer what you’re asking for based on the pattern in the prompt.

Prompting is like giving your model a quick “job brief.” You don’t retrain it — you just tell it how to think for the next few sentences.
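To make the prompt types concrete, here is a minimal sketch that assembles zero- and few-shot prompts as plain context strings. The `build_prompt` helper and the translation examples are illustrative only; no particular model API is assumed.

```python
# Minimal sketch: zero-, one-, and few-shot prompts are just differently
# structured context strings fed to the same frozen model. The helper and
# the examples below are illustrative; no particular model API is assumed.

def build_prompt(instruction, examples, query):
    """Assemble an instruction, optional solved examples, and the new query."""
    lines = [instruction]
    for source, target in examples:            # each example demonstrates the mapping
        lines.append(f"English: {source}")
        lines.append(f"French: {target}")
    lines.append(f"English: {query}")
    lines.append("French:")                     # the model continues from here
    return "\n".join(lines)

zero_shot = build_prompt("Translate English to French:", [], "I love pizza")
few_shot = build_prompt(
    "Translate English to French:",
    [("I love", "J'aime"), ("I run", "Je cours")],
    "I eat",
)

print(zero_shot)
print("---")
print(few_shot)   # the frozen model would be expected to continue with "Je mange"
```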

2️⃣ In-Context Learning — Learning Without Weight Updates

In-context learning (ICL) is the model’s ability to learn from examples in the prompt, without gradient descent.

What’s Happening:

When you give examples like:

Input: 2 + 2 = 4
Input: 3 + 5 = 8
Input: 7 + 9 = 

The model figures out the pattern and continues with “16.” It’s not updating its parameters — it’s performing implicit reasoning based on patterns in the input sequence.

Why This Emerges:

During pretraining, the model learns to predict the next token given a context, so it becomes very good at recognizing patterns from examples. This behaves like a form of meta-learning: learning how to learn new tasks from context.

Mathematically, the Transformer implicitly minimizes an expected loss over multiple possible tasks during training, so it learns to adapt its internal attention to new instructions.

In other words: The model learns not just language, but how to learn from language.


In-context learning is like a detective scanning clues on the fly — no memory change, no training session, just clever pattern recognition in real-time.
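A runnable sketch of the same idea, assuming the Hugging Face transformers library (with a backend such as PyTorch) is installed: the pretrained weights stay frozen, and only the prompt steers the behavior. A model as small as GPT-2 follows in-context patterns only weakly, so the continuation may not be the correct sum; the point is the mechanism, not the accuracy.

```python
# In-context learning demo: a frozen pretrained model, steered only by the prompt.
# Small models follow the pattern unreliably; nothing here updates any weights.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Input: 2 + 2 = 4\n"
    "Input: 3 + 5 = 8\n"
    "Input: 7 + 9 ="
)

# Greedy decoding of a short continuation from the few-shot context.
out = generator(prompt, max_new_tokens=4, do_sample=False)
print(out[0]["generated_text"])
```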

3️⃣ Instruction Tuning and RLHF — Teaching the Model Human Values

Pretrained Transformers are powerful but neutral — they’ll complete any text, including unhelpful or unsafe continuations. To make them follow instructions and align with human intentions, we apply two finishing steps:


🧭 (a) Instruction Tuning

Fine-tune the model on datasets of (prompt → expected response) pairs. Example:

User: Explain gravity simply.
Model: Gravity is a force that pulls things toward each other.

This step turns a raw model into an instruction follower (like GPT-3 → InstructGPT).
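As a rough illustration of what an instruction-tuning example looks like under the hood, the sketch below concatenates the prompt and the expected response and masks the prompt tokens out of the loss (-100 is the common PyTorch ignore index for cross-entropy). The word-level tokenizer is a toy stand-in, not a real tokenizer.

```python
# Minimal sketch of preparing one instruction-tuning (SFT) example:
# prompt and response are concatenated, but the loss is computed only on
# the response tokens (label -100 is conventionally ignored by the loss).
IGNORE_INDEX = -100

def toy_tokenize(text):
    """Stand-in tokenizer: map each word to a fake integer id."""
    return [abs(hash(word)) % 50_000 for word in text.split()]

def build_sft_example(prompt, response):
    prompt_ids = toy_tokenize(prompt)
    response_ids = toy_tokenize(response)
    input_ids = prompt_ids + response_ids
    # Mask the prompt so the model is only trained to produce the response.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels

input_ids, labels = build_sft_example(
    "User: Explain gravity simply.\nModel:",
    "Gravity is a force that pulls things toward each other.",
)
print(len(input_ids), labels[:6])  # prompt positions carry the ignore label
```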


💬 (b) RLHF (Reinforcement Learning from Human Feedback)

This adds an extra optimization loop:

  1. Supervised Fine-Tuning (SFT): Train on labeled helpful responses.
  2. Reward Model (RM): Train a separate model to score responses based on human preferences (helpfulness, harmlessness).
  3. Policy Optimization (PPO): Fine-tune the model to maximize reward from the RM.

Formally, PPO maximizes:

$$ \mathbb{E}_{x \sim \text{data},\; y \sim \pi_\theta} [r(x, y)] $$

where $r(x, y)$ is the reward model’s score for response $y$ to input $x$.

Effect: The model learns what humans prefer, not just what text fits statistically.
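To make step 2 (the reward model) concrete, here is a small sketch of the pairwise loss a reward model is typically trained with: for each prompt, the human-preferred response should score higher than the rejected one. The scores below are placeholder numbers; a real reward model computes them from the (prompt, response) text.

```python
# Sketch of reward-model training under the usual pairwise-preference setup:
# the chosen response should outscore the rejected one (Bradley-Terry style loss).
import torch
import torch.nn.functional as F

# Hypothetical reward-model scores for a batch of (chosen, rejected) pairs.
r_chosen = torch.tensor([1.8, 0.4, 2.1])
r_rejected = torch.tensor([0.9, 0.7, 1.0])

# Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
print(loss.item())
```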


RLHF is like etiquette school for LLMs — they already know the facts, but now they learn how to speak helpfully, politely, and responsibly.

📐 Step 3: Mathematical Foundation

In-Context Learning as Implicit Meta-Learning

When training on a mixture of tasks $\{T_i\}$, the model learns to minimize:

$$ \mathbb{E}_{T_i}[\mathcal{L}(f_\theta; T_i)] $$

This implicitly teaches it to perform task inference given a prompt context.

At inference, the model “updates its belief” about the task from the context tokens, even though $\theta$ (the parameters) are fixed — hence learning without learning.

The Transformer’s attention acts like a working memory that reconfigures itself dynamically — it’s meta-learning through inference.
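A toy way to see “learning without weight updates”: below, one fixed procedure (closed-form ridge regression over the in-context examples) adapts to a new hidden rule for every task, purely from context. This is only an analogy for what a frozen Transformer does with its prompt, not a claim about what attention layers literally compute.

```python
# A fixed function adapting per task from context alone, with no weight updates,
# as an analogy for in-context learning. Toy linear tasks, illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def in_context_predict(x_ctx, y_ctx, x_query, lam=1e-3):
    """Fixed procedure: fit the context with ridge regression, predict the query."""
    d = x_ctx.shape[1]
    w = np.linalg.solve(x_ctx.T @ x_ctx + lam * np.eye(d), x_ctx.T @ y_ctx)
    return x_query @ w

for task in range(3):                       # three different "tasks"
    w_true = rng.normal(size=4)             # each task has its own hidden rule
    x_ctx = rng.normal(size=(8, 4))         # in-context examples
    y_ctx = x_ctx @ w_true
    x_query = rng.normal(size=4)
    pred = in_context_predict(x_ctx, y_ctx, x_query)
    print(f"task {task}: prediction {pred:.3f}, true {x_query @ w_true:.3f}")
```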

Reward Optimization in RLHF (Simplified)

We fine-tune the policy $\pi_\theta$ to maximize reward under a constraint (to stay close to the original model):

$$ \max_\theta \; \mathbb{E}_{y \sim \pi_\theta} [r(y)] - \beta\, D_{KL}(\pi_\theta \,\|\, \pi_{ref}) $$

Here:

  • $r(y)$ = reward model score
  • $\pi_{ref}$ = reference model (e.g., SFT model)
  • $D_{KL}$ = KL-divergence to prevent “going rogue”

The $\beta$ term balances creativity vs. safety — small $\beta$ allows more deviation, large $\beta$ enforces conservative behavior.
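A minimal numeric sketch of this objective, with placeholder log-probabilities standing in for the per-token outputs of the policy and reference models; $\beta$ and the reward value are arbitrary illustrative numbers.

```python
# Sketch of the KL-regularized RLHF objective: reward-model score minus
# beta times a KL estimate that keeps the policy near the reference model.
import torch

beta = 0.1

# Hypothetical per-token log-probs of one sampled response under each model.
logp_policy = torch.tensor([-1.2, -0.8, -2.0, -0.5])
logp_ref = torch.tensor([-1.0, -0.9, -1.8, -0.7])

reward = torch.tensor(1.4)                       # reward-model score r(y)

# A common per-sample estimate of the KL term: sum of log-prob differences.
kl_estimate = (logp_policy - logp_ref).sum()

objective = reward - beta * kl_estimate          # quantity to maximize
print(objective.item())
```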


🧠 Step 4: Key Ideas

  • Prompting: Directs pretrained models using language cues.
  • In-Context Learning: Lets models adapt behavior from examples without retraining.
  • Instruction Tuning: Trains models to follow explicit instructions.
  • RLHF: Aligns outputs with human preferences using feedback-based rewards.
  • Emergent Meta-Learning: Transformers learn to infer new tasks dynamically during inference.

⚖️ Step 5: Strengths, Limitations & Trade-offs

Strengths:

  • Enables powerful zero/few-shot generalization.
  • No retraining needed for new tasks.
  • Aligns outputs with human expectations and ethics.

Limitations:

  • Prompts can be ambiguous or easily manipulated.
  • RLHF may bias models toward majority opinions.
  • In-context learning consumes longer context windows → higher memory cost.

Prompting and RLHF bridge capability and alignment: Prompting shows “what models can do.” RLHF ensures “what they should do.” Balancing freedom and safety is the ongoing frontier of Transformer design.

🚧 Step 6: Common Misunderstandings

  • “In-context learning changes model weights.” It doesn’t — the model dynamically adapts attention patterns without gradient updates.
  • “RLHF trains the model from scratch.” No, it fine-tunes a pretrained model using feedback.
  • “Prompts directly control reasoning.” They influence reasoning, but outcomes still depend on training priors and reward shaping.

🧩 Step 7: Mini Summary

🧠 What You Learned: Transformers can adapt to new tasks from text alone (in-context learning) and align to human intent via RLHF.

⚙️ How It Works: Prompting sets context; ICL performs task inference; RLHF tunes models through preference-based rewards.

🎯 Why It Matters: This trio is the foundation of today’s most capable and aligned models — turning static neural networks into dynamic conversational agents.
