Prompt Repetition
Problem Being Solved
LLMs are trained as Causal Language Models, where each token attends only to previous tokens. This leads to:
- Context-Question Problem: The question is still unknown while the context is being processed
- Options-First MCQ Problem: The answer choices are processed before the question they belong to has been seen
- Position/Index Problem: Attention to specific positions weakens in long lists
Prompt repetition lets the second copy of the prompt attend to the entire first copy, mimicking some of the benefits of bidirectional attention.
When to use this skill
- When using lightweight models: claude-haiku, gemini-flash, gpt-4o-mini, etc.
- Options-First MCQ: Multiple choice where answer choices appear before the question
- Context + Question: Searching for specific information in long contexts
- Index/Position Tasks: Position-based queries in inventories or lists
- NPC Dialogue: Maintaining consistency for game AI characters
- Non-Reasoning Tasks: Tasks that do not use Chain-of-Thought
How It Works
Limitations of Causal Attention
[Context] → [Question]
    ↓
Cannot reference Question content when processing Context tokens
Attention weights for Context are already finalized by the time Question tokens appear
How Prompt Repetition Solves This
[First Pass]              [Second Pass]
Context → Question   →   Context' → Question'
                            ↑          ↑
                  Can reference entire first pass
In the second repetition, the model reprocesses information across the entire first prompt and strengthens attention weights on key concepts, resulting in improved performance.
Note: This does not change the model architecture to bidirectional; it is a prompt engineering technique to mitigate the limitations of causal models.
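The visibility argument above can be illustrated with a toy causal mask. This is a minimal sketch, not a model of real attention: tokens are plain labels, and `visible_to` is a hypothetical helper that lists what a causal model may attend to at a given position.

```python
def visible_to(tokens: list[str], position: int) -> list[str]:
    """Return the tokens a causal model can attend to at `position`
    (the token itself and everything before it)."""
    return tokens[: position + 1]


single = ["ctx_1", "ctx_2", "question"]
repeated = single + single  # the prompt repeated twice

# Single pass: the last context token (index 1) cannot see the question.
print("question" in visible_to(single, 1))    # False

# Repeated: the same context token in the second copy sits at index 4,
# so the first copy's question (index 2) falls inside its causal window.
print("question" in visible_to(repeated, 4))  # True
```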
Research Results (Google Research 2025)
| Metric | Result |
| --- | --- |
| Significant improvement (p < 0.1) | 47 / 70 benchmark-model combinations |
| Performance degradation | 0 |
| Neutral | 23 |
| Improvement rate | 67% |
Most dramatic improvement: Gemini 2.0 Flash-Lite on NameIndex: 21.33% → 97.33% (+76%p)
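The figures above can be checked with a few lines of arithmetic. The snippet only re-derives numbers already stated in this section; the variable names are illustrative.

```python
wins, losses, neutral = 47, 0, 23
total = wins + losses + neutral
improvement_rate = wins / total

print(total)                      # 70
print(f"{improvement_rate:.0%}")  # 67%

# NameIndex gain for Gemini 2.0 Flash-Lite, in percentage points
before, after = 21.33, 97.33
print(round(after - before, 2))   # 76.0
```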
Tested Models
- Gemini 2.0 Flash / Flash Lite
- GPT-4o / GPT-4o-mini
- Claude 3.7 Sonnet / Claude 3 Haiku
- Deepseek V3
Tested Benchmarks
- ARC (Challenge) - Scientific reasoning
- OpenBookQA - Open-domain QA
- GSM8K - Math problems
- MMLU-Pro - Multitask language understanding
- MATH - Mathematical problem solving
- NameIndex / MiddleMatch - Custom position tasks
Application Procedure
Step 1: Verify Auto-Apply Target Models
| Provider | Auto-apply models | Excluded models |
| --- | --- | --- |
| Claude | haiku series | opus, sonnet |
| Gemini | flash, flash-lite | pro, ultra |
| OpenAI | gpt-4o-mini, gpt-low | gpt-4o, gpt-4 |
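One way to encode the table above is a substring check against the auto-apply names. This is a sketch under assumptions: `is_auto_apply_target` and the `AUTO_APPLY_KEYS` tuple are hypothetical names, and matching by substring is a simplification. The check is a positive match only, because an exclusion test on "gpt-4o" would also reject "gpt-4o-mini".

```python
# Substrings taken from the "Auto-apply models" column of the table
AUTO_APPLY_KEYS = ("haiku", "flash", "gpt-4o-mini", "gpt-low")


def is_auto_apply_target(model: str) -> bool:
    """Return True when the model name matches an auto-apply entry.

    Excluded models (opus, sonnet, pro, ultra, gpt-4o, gpt-4) and unknown
    models match nothing in the list and are therefore left untouched.
    """
    name = model.lower()
    return any(key in name for key in AUTO_APPLY_KEYS)


print(is_auto_apply_target("gpt-4o-mini"))        # True
print(is_auto_apply_target("gpt-4o"))             # False
print(is_auto_apply_target("claude-3.7-sonnet"))  # False
```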
Step 2: Determine Repetition Count by Task Type
| Task Type | Keyword Pattern | Repetitions | Expected Improvement |
| --- | --- | --- | --- |
| Options-First MCQ | A. B. C. D. choices first | 2× | +15-40%p |
| Index/Position | slot, position, index, N-th | 3× | +50-76%p |
| Context + Question | General question | 2× | +5-15%p |
| With CoT | step by step, think through | 0× (not applied) | ~0% |
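The table above maps onto a small keyword classifier. The sketch below is an assumption about how such a lookup could be written, not the skill's actual implementation: `repetitions_for` and both regular expressions are hypothetical, and keyword matching will misfire on prompts that merely mention words like "index".

```python
import re

# Chain-of-Thought cues: repetition is not applied
COT = re.compile(r"step by step|think through", re.IGNORECASE)
# Position cues: slot / position / index keywords, or ordinals such as "25th"
POSITION = re.compile(
    r"\b(?:slot|position|index)\b|\b\d+(?:st|nd|rd|th)\b", re.IGNORECASE
)


def repetitions_for(prompt: str) -> int:
    """Pick a repetition count from the task-type table."""
    if COT.search(prompt):
        return 0  # With CoT: not applied
    if POSITION.search(prompt):
        return 3  # Index/Position tasks
    return 2      # Options-first MCQ and Context + Question both use 2


print(repetitions_for("Think through this step by step."))     # 0
print(repetitions_for("What item is in slot 25?"))             # 3
print(repetitions_for("Which city is the capital of France?")) # 2
```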
Step 3: Check Token Limits
max_context = model_context_window * 0.8
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))
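The same check can be wrapped in a function so it is testable. This is a sketch that assumes the caller already has a token count from the model's tokenizer; `cap_repetitions` and its parameter names are hypothetical.

```python
def cap_repetitions(prompt_token_count: int, repetitions: int,
                    model_context_window: int, ratio: float = 0.8) -> int:
    """Reduce `repetitions` so the repeated prompt stays within
    `ratio` of the context window, never going below 1."""
    max_context = int(model_context_window * ratio)
    if prompt_token_count * repetitions > max_context:
        # Largest count that still fits; 1 means the prompt is sent unchanged
        repetitions = max(1, max_context // prompt_token_count)
    return repetitions


print(cap_repetitions(50_000, 3, 128_000))   # 2
print(cap_repetitions(1_000, 3, 128_000))    # 3
print(cap_repetitions(120_000, 2, 128_000))  # 1
```

A result of 1 does not guarantee that the prompt fits within the budget; it only means no repetition is added.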
Step 4: Prompt Transformation
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        Repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)
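A short usage sketch of the function above, repeated here so the snippet runs on its own. The sample prompt is illustrative only.

```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt `times` times, separated by a blank line."""
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)


prompt = "A. Paris\nB. London\nWhich city is the capital of France?"

doubled = apply_prompt_repetition(prompt)
print(doubled.count("capital of France"))            # 2
print(doubled == prompt + "\n\n" + prompt)           # True

# Counts of 0 or 1 leave the prompt unchanged, which is how the
# "0× (not applied)" case for CoT prompts can be handled.
print(apply_prompt_repetition(prompt, 0) == prompt)  # True
```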
Practical Examples
Example 1: Options-First MCQ (Greatest Effect)
Before:
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
After (repetition ×2 applied):
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
Expected output:
A
Accuracy: original 78% → after repetition 93% (+15%p)
Example 2: Index/Position Tasks (Maximum Effect)
Before:
Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map
What item is in slot 25?
After (repetition ×3 applied):
(The full prompt above, repeated 3 times)
Expected output:
Dragon Scale
Accuracy: original 21% → after repetition 97% (+76%p)
Example 3: Tool Call Prompt Handling
Note: Prompts containing tool call instructions are also repeated in their entirety. The full-repetition approach was adopted for implementation simplicity and consistency.
Before:
Use the calculator tool to compute 234 * 567.
What is the result?
After (repetition ×2):
Use the calculator tool to compute 234 * 567.
What is the result?
Use the calculator tool to compute 234 * 567.
What is the result?
Research results show that full repetition including tool call sections is also effective.
Production-Ready Implementation
Auto-Apply Transformer
"""prompt_repetition_transformer.py"""
from dataclasses import dataclass, field
from typing import Optional, Callable, List
import re

# Context window sizes (in tokens) for the models that receive auto-apply
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
    "gpt-low": 128_000,
}

AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())

# Phrases that signal Chain-of-Thought prompting; repetition is skipped for these
COT_PATTERNS = [
    r"step by step",
    r"think through",
    r"let's think",
    r"reasoning:",
    r"chain of thought",
]

# Phrases that signal index/position queries
POSITION_PATTERNS = [
    r"slot \d+",
    r"position \d+",
    r"index \d+",
    r"\d+(st|nd|rd|th)",
    r"item \d+",
    r"row \d+",
    r"column \d+",
]


@dataclass
class PromptRepetitionConfig:
    """Prompt repetition configuration"""
    default_repetitions: int = 2
    position_repetitions: int = 3
    separator: str = "\n\n"
    max_context_ratio: float = 0.8
    applied_marker: str = "<!-- prompt-repetition-applied -->"


class PromptRepetitionTransformer:
    """Auto-apply prompt repetition transformer for lightweight models"""

    def __init__(self, config: Optional[PromptRepetitionConfig] = None):
        self.config = config or PromptRepetitionConfig()

    def should_apply(self, model: str, prompt: str) -> bool:
        """Determine whether to auto-apply"""
        # Skip prompts that have already been transformed
        if self.config.applied_marker in prompt:
            return False

        # Only target models whose name contains an auto-apply entry
        model_lower = model.lower()
        if not any(m in model_lower for m in AUTO_APPLY_MODELS):
            return False

        # Skip prompts that ask for Chain-of-Thought reasoning
        prompt_lower = prompt.lower()
        for pattern in COT_PATTERNS:
            if re.search(pattern, prompt_lower):
                return False

        return True
def determine_repetitions(self, prompt: str, model: str) -<