What is Karpathy's LLM Wiki pattern?

Andrej Karpathy's LLM Wiki gist describes building persistent knowledge bases in which an LLM incrementally maintains interlinked markdown files instead of re-retrieving raw documents on every query. The pattern has three layers: immutable raw sources, an LLM-owned wiki (summaries, entity pages, synthesis), and a schema file (CLAUDE.md or AGENTS.md) that defines conventions and workflows.
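
As a rough illustration of the three layers, the sketch below scaffolds them on disk. The directory names `raw` and `wiki` and the helper `scaffold_layers` are assumptions made for this example; the gist leaves the exact layout to your schema file.

```python
import tempfile
from pathlib import Path


def scaffold_layers(root: Path, schema_name: str = "CLAUDE.md") -> dict[str, Path]:
    """Create the three layers under `root` and return their paths."""
    layout = {
        "raw": root / "raw",           # layer 1: immutable sources (directory name assumed)
        "wiki": root / "wiki",         # layer 2: LLM-owned markdown pages (directory name assumed)
        "schema": root / schema_name,  # layer 3: conventions and workflows
    }
    layout["raw"].mkdir(parents=True, exist_ok=True)
    layout["wiki"].mkdir(parents=True, exist_ok=True)
    if not layout["schema"].exists():
        # Placeholder text only; the real schema is something you write and refine.
        layout["schema"].write_text("# Wiki conventions\n", encoding="utf-8")
    return layout


demo_root = Path(tempfile.mkdtemp())
layout = scaffold_layers(demo_root)
print(sorted(p.name for p in demo_root.iterdir()))  # ['CLAUDE.md', 'raw', 'wiki']
```

Keeping raw sources in their own directory makes the "immutable" rule easy to enforce: the agent can be told to read from it but write only to the wiki directory.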

How is LLM Wiki different from RAG?

RAG retrieves document chunks at query time, so knowledge is re-derived on every question. LLM Wiki compiles knowledge once during ingest, keeps cross-references and flagged contradictions up to date over time, and answers queries from pre-built pages. Karpathy and implementers argue that below ~50K–100K tokens (~150–200 dense pages), a pure context/wiki approach beats RAG on reliability and needs no vector DB infrastructure.

Where is the original LLM Wiki gist?

The idea file lives at gist.github.com/karpathy/442a6bf555914893e9891c11519de94f (llm-wiki.md). It is designed to be copy-pasted into Claude Code, Codex, OpenCode, or similar agents. As of June 2026 it has 5,000+ stars and 5,000+ forks on GitHub Gists.

What are the three core operations in LLM Wiki?

Ingest: you drop in a source; the LLM reads it, writes summary pages, updates entity and concept pages across 10–15 files, and appends an entry to log.md. Query: the LLM searches wiki pages and synthesizes an answer with citations; good answers get filed back into the wiki. Lint: a health check for contradictions, stale claims, orphan pages, and missing cross-references.

What are index.md and log.md in LLM Wiki?

index.md is a content catalog: every wiki page with a link and a one-line summary, organized by category. The LLM reads it first to find relevant pages before drilling in. log.md is chronological and append-only, recording ingests, queries, and lint passes. Greppable entries like `## [2026-04-02] ingest | Article Title` enable simple timeline queries.
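
To show why that entry format is convenient, here is a minimal sketch that writes and parses log headings of the form given above. The `LogEntry` class, the helper names, and the second sample entry are hypothetical; only the `## [date] operation | title` shape comes from the gist.

```python
import re
from dataclasses import dataclass
from datetime import date

# Matches headings such as: ## [2026-04-02] ingest | Article Title
LOG_HEADING = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


@dataclass(frozen=True)
class LogEntry:
    day: date
    operation: str  # assumed values: "ingest", "query", "lint"
    title: str


def format_entry(day: date, operation: str, title: str) -> str:
    """Render one heading line to append to log.md."""
    return f"## [{day.isoformat()}] {operation} | {title}"


def parse_log(text: str) -> list[LogEntry]:
    """Collect every heading line that matches the entry format; skip body text."""
    entries = []
    for line in text.splitlines():
        match = LOG_HEADING.match(line.strip())
        if match:
            day, operation, title = match.groups()
            entries.append(LogEntry(date.fromisoformat(day), operation, title))
    return entries


sample = """\
## [2026-04-02] ingest | Article Title
Touched 12 pages.
## [2026-04-03] lint | Weekly health check
"""
entries = parse_log(sample)
ingests = [entry.title for entry in entries if entry.operation == "ingest"]
print(ingests)  # ['Article Title']
```

Because each entry starts with a fixed heading prefix, the same timeline query also works with plain `grep "^## \[" log.md`, with no parser at all.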

How does LLM Wiki relate to Google's Open Knowledge Format?

Karpathy's gist describes the pattern; Google's OKF v0.1 (June 2026) formalizes interoperable conventions (YAML frontmatter, type field, concept graphs) for the same LLM-wiki shape at organizational scale. See our OKF guide for the enterprise spec; this guide covers Karpathy's original three-layer architecture.

Karpathy LLM Wiki Pattern: Agent Memory Guide | explainx.ai Blog


| Question | Answer |
| --- | --- |
| Gist URL | karpathy/llm-wiki.md |
| Core idea | Persistent, compounding wiki, not per-query retrieval |
| Layer 1 | Raw sources (immutable) |
| Layer 2 | Wiki (LLM-owned markdown graph) |
| Layer 3 | Schema (`CLAUDE.md` / `AGENTS.md`) |
| Operations | Ingest → Query → Lint |
| Special files | `index.md` (catalog), `log.md` (timeline) |
| Human role | Curate sources, ask questions, steer analysis |
| LLM role | Summarize, cross-reference, file, bookkeeping |
| vs RAG | Wiki wins below ~50K–100K tokens; RAG for millions+ |

| Page type | Purpose |
| --- | --- |
| Source summaries | One page per ingested document |
| Entity pages | People, companies, concepts |
| Topic summaries | Evolving synthesis |
| Comparisons / analyses | Filed from query operations |
| `overview.md` | High-level map of the domain |

| Check | Action |
| --- | --- |
| Contradictions between pages | Flag or reconcile (domain-dependent) |
| Stale claims superseded by newer sources | Update or mark superseded |
| Orphan pages (no inbound links) | Link or merge |
| Concepts mentioned but no dedicated page | Create stub pages |
| Missing cross-references | Add links |
| Data gaps | Suggest web search or new sources |
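
Two of these checks (orphan pages and concepts without a dedicated page) are purely structural, so they can run without an LLM. The sketch below assumes pages link to each other with `[[Page Name]]` wikilinks and that catalog pages such as `index` should not count as inbound links; both are assumptions for illustration, and the function names are hypothetical.

```python
import re

# Assumed link syntax: [[Page Name]], optionally [[Page Name|alias]] or [[Page Name#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def outbound_links(body: str) -> set[str]:
    """Return the set of page names a page body links to."""
    return {target.strip() for target in WIKILINK.findall(body)}


def lint_links(
    pages: dict[str, str],
    catalog_pages: frozenset[str] = frozenset({"index", "log"}),
) -> dict[str, list[str]]:
    """Report pages with no inbound links and link targets that have no page."""
    inbound: set[str] = set()    # targets linked from ordinary pages
    mentioned: set[str] = set()  # targets linked from any page, catalogs included
    for name, body in pages.items():
        targets = outbound_links(body) - {name}  # ignore self-links
        mentioned |= targets
        if name not in catalog_pages:
            inbound |= targets
    return {
        "orphans": sorted(set(pages) - inbound - catalog_pages),
        "missing_pages": sorted(mentioned - set(pages)),
    }


pages = {
    "index": "- [[Transformers]]\n- [[Scaling Laws]]\n",
    "Transformers": "Builds on [[Attention]].",
    "Scaling Laws": "Relates to [[Transformers]].",
}
report = lint_links(pages)
print(report)  # {'orphans': ['Scaling Laws'], 'missing_pages': ['Attention']}
```

Catalog links are excluded from the inbound count because `index.md` lists every page, which would otherwise hide every orphan. Contradictions and stale claims need semantic judgment, so those checks stay with the LLM.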

| File | Orientation | Purpose |
| --- | --- | --- |
| `index.md` | Content | Catalog of all pages: link, one-line summary, optional metadata (date, source count). Updated on every ingest. Query entry point. |
| `log.md` | Chronological | Append-only timeline of ingests, queries, lint passes |
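
One way to picture the catalog is as a rendering of per-page metadata. The sketch below groups pages by category and emits one link-plus-summary line per page; the `PageInfo` fields, the sample paths, and the heading layout are assumptions, not a format the gist prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageInfo:
    path: str      # path relative to the wiki root (hypothetical layout)
    title: str
    summary: str   # one-line summary shown in the catalog
    category: str


def render_index(pages: list[PageInfo]) -> str:
    """Render index.md text: one section per category, one line per page."""
    by_category: dict[str, list[PageInfo]] = defaultdict(list)
    for page in pages:
        by_category[page.category].append(page)

    lines = ["# Index", ""]
    for category in sorted(by_category):
        lines.append(f"## {category}")
        for page in sorted(by_category[category], key=lambda p: p.title):
            lines.append(f"- [{page.title}]({page.path}): {page.summary}")
        lines.append("")
    return "\n".join(lines)


index_text = render_index([
    PageInfo("sources/llm-wiki-gist.md", "LLM Wiki gist", "Idea file describing the pattern.", "Sources"),
    PageInfo("entities/karpathy.md", "Andrej Karpathy", "Author of the LLM Wiki gist.", "Entities"),
])
print(index_text)
```

In the pattern itself the LLM edits `index.md` directly during ingest; a deterministic renderer like this is mainly useful as a lint-time cross-check that the catalog still matches the pages on disk.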

| Domain | Example |
| --- | --- |
| Personal | Goals, health, psychology: journal + articles → structured self-model |
| Research | Papers over months → evolving thesis wiki |
| Reading a book | Chapter-by-chapter filing → personal Tolkien Gateway |
| Business/team | Slack, meetings, docs → LLM-maintained internal wiki |
| Competitive analysis | Due diligence, market maps |
| Trip planning, courses, hobbies | Any accumulating knowledge |

| Corpus size | Best approach |
| --- | --- |
| < ~50K–100K tokens (~150–200 dense pages) | LLM Wiki / full context: 100% retrieval reliability, no vector DB, global reasoning |
| Millions of tokens+ | RAG: won't fit in context |
| In between / production | Hybrid: stable core in wiki, dynamic mass in RAG |
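
The table can be read as a simple threshold rule. The sketch below encodes it under stated assumptions: roughly four characters per token (a crude heuristic for English text; real tokenizers vary), a wiki limit at the top of the ~50K–100K range, and "millions" interpreted as one million tokens or more. The function names and defaults are illustrative only.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; use a real tokenizer when accuracy matters."""
    return int(len(text) / chars_per_token)


def choose_approach(
    total_tokens: int,
    wiki_limit: int = 100_000,    # assumed: upper end of the ~50K-100K range
    rag_floor: int = 1_000_000,   # assumed reading of "millions of tokens"
) -> str:
    """Map a corpus size to 'wiki', 'hybrid', or 'rag' per the table above."""
    if total_tokens < wiki_limit:
        return "wiki"
    if total_tokens >= rag_floor:
        return "rag"
    return "hybrid"


for size in (40_000, 500_000, 5_000_000):
    print(size, choose_approach(size))
# 40000 wiki
# 500000 hybrid
# 5000000 rag
```

The thresholds are defaults to tune rather than hard limits: the practical ceiling depends on the model's context window and on how much of the wiki a typical query needs to read.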

| Project | Focus | Link |
| --- | --- | --- |
| AutoSci | Research agent; contradiction edges; self-evolving wiki; 3 papers end-to-end | github.com/skyllwt/AutoSci |
| memwiki | Coding-agent memory (`.memory/` + hooks for Claude/Cursor/Copilot) | github.com/hereisSwapnil/memwiki |
| secure-llm-wiki | Untrusted-source isolation; four-eyes review; provenance | github.com/NicoBleh/secure-llm-wiki |
| interview-doc-agent | Personal career library; context vs RAG proof | github.com/Shilren/interview-doc-agent |
| Dense-Mem | MCP memory server; typed claims, conflicts, graph | github.com/markhuangai/dense-mem |
| LLM-Wiki-MCP | Wiki as MCP-accessible system; provenance-aware ingest | github.com/Electro-resonance/LLM-WIKI-MCP |
| synthadoc | Web chat UI, lint, scheduled ingest | github.com/axoviq-ai/synthadoc |
| synto | Local-first; per-role providers; Ollama-friendly | github.com/kytmanov/synto |
| my-llm-wiki | Agentic arXiv/GitHub/YouTube ingest + D3 graph demo | github.com/MuhammadSaqlainAslam/my-llm-wiki |
| Google OKF v0.1 | Vendor-neutral spec formalizing the pattern | OKF guide |

|  | Karpathy LLM Wiki | Google OKF | CLAUDE.md |
| --- | --- | --- | --- |
| What | Pattern / idea file | Formal spec v0.1 | Agent convention file |
| Scope | Any domain you define | Org knowledge graphs | Single-repo instructions |
| Required metadata | You define in schema | `type` in YAML frontmatter | None required |
| Interoperability | Bespoke per wiki | Cross-vendor bundles | Tool-specific |
| Best for | Personal/team wikis, research | Enterprise catalogs, BigQuery | Coding agent behavior |

Karpathy LLM Wiki: The Pattern Behind Agent Memory (Complete Guide)


TL;DR

The Problem With RAG-Only Workflows

Three Layers

1. Raw sources (immutable)

2. The wiki (LLM-owned)

3. The schema (`CLAUDE.md` / `AGENTS.md`)

Three Operations

Ingest

Query

Lint

index.md vs log.md

Use Cases (From the Gist)

LLM Wiki vs RAG: The Magnitude Question

Optional: CLI Tools

Ecosystem: What People Built

Community design debates (worth knowing)

How to Start (Minimal)

LLM Wiki vs OKF vs CLAUDE.md

Why This Works (Karpathy's Argument)

Summary