How did GLM-5.2 score against Claude Fable 5 on Kilo Code's planning benchmark?

Claude Fable 5 scored 9.1 and GLM-5.2 scored 9.0 on the same planning task with the same prompt and rubric. The gap is within the noise of a single-run evaluation. Both models made the same architectural decisions on fast SHA-256 hashing, unknown-flag caching, and keeping environment variables out of the rollout hash.
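To make the phrase "keeping environment variables out of the rollout hash" concrete, here is a minimal sketch of a percentage rollout bucket built on SHA-256. It is an illustration only, not the benchmark task or either model's plan: the function names, the `ROLLOUT_PERCENT` table, and the flag and user identifiers are all hypothetical. The point it shows is that the bucket depends only on the flag and the user, so the same user lands in the same bucket in every environment, while the environment only selects the rollout percentage.

```python
import hashlib

# Hypothetical per-environment rollout percentages (assumption, not from the benchmark).
ROLLOUT_PERCENT = {"staging": 100, "production": 25}


def rollout_bucket(flag_key: str, user_id: str, buckets: int = 100) -> int:
    """Map (flag, user) to a stable bucket in [0, buckets).

    The environment is deliberately NOT part of the hashed input, so a
    user's bucket is identical in staging and production.
    """
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce to a bucket.
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(flag_key: str, user_id: str, environment: str) -> bool:
    """The environment picks the percentage; the hash decides the bucket."""
    percent = ROLLOUT_PERCENT[environment]
    return rollout_bucket(flag_key, user_id) < percent
```

With this shape, raising the production percentage only ever adds users to the rollout, because nobody's bucket changes when the environment or the percentage does.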

What was the one area where Fable 5 outperformed GLM-5.2?

Fable 5 explicitly spelled out a create-time cache trap in its plan. GLM-5.2 left the same constraint implicit. That one difference accounts for Fable 5's 0.1-point lead.

How much cheaper is GLM-5.2 compared to Claude Fable 5?

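GLM-5.2 lists at $1.40 per million input tokens and $4.40 per million output tokens. Claude Fable 5 lists at $10 per million input tokens and $50 per million output tokens. That puts GLM-5.2 at about one-seventh of Fable 5's input price and about one-eleventh of its output price, or roughly one-tenth overall depending on the mix of input and output tokens.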
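The list prices above can be turned into a per-request comparison with a few lines of arithmetic. The sketch below uses only the four prices quoted in this answer. The workload size (200,000 input tokens and 20,000 output tokens) is a made-up example, and the `Pricing` class and variable names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List prices in US dollars per million tokens."""
    input_per_mtok: float
    output_per_mtok: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of one request at list price."""
        total = (input_tokens * self.input_per_mtok
                 + output_tokens * self.output_per_mtok)
        return total / 1_000_000


# Prices as quoted in the text above.
GLM_5_2 = Pricing(input_per_mtok=1.40, output_per_mtok=4.40)
FABLE_5 = Pricing(input_per_mtok=10.00, output_per_mtok=50.00)

# Price ratios, GLM-5.2 relative to Fable 5.
input_ratio = GLM_5_2.input_per_mtok / FABLE_5.input_per_mtok
output_ratio = GLM_5_2.output_per_mtok / FABLE_5.output_per_mtok

# Hypothetical planning request: 200k tokens in, 20k tokens out (assumption).
glm_cost = GLM_5_2.cost(200_000, 20_000)
fable_cost = FABLE_5.cost(200_000, 20_000)

if __name__ == "__main__":
    print(f"input ratio:  {input_ratio:.3f}")
    print(f"output ratio: {output_ratio:.3f}")
    print(f"GLM-5.2: ${glm_cost:.3f}  Fable 5: ${fable_cost:.2f}")
```

For that example workload, GLM-5.2 comes to about $0.37 and Fable 5 to $3.00, so the blended ratio sits between the input ratio (0.14) and the output ratio (0.088). Input-heavy workloads drift toward one-seventh, and output-heavy workloads toward one-eleventh.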

Does this mean GLM-5.2 plans as well as Fable 5 across all tasks?

No. This was one task and one run, so it is not proof of parity across the board. It is, however, a strong signal that open-weight models are closing the planning gap, at a price point that changes the economics of when to reach for a frontier model and when an affordable alternative will do.

GLM-5.2 vs Claude Fable 5: Kilo Code's Planning Benchmark Shows a Near-Tie at 1/10th the Price

GLM-5.2 vs Claude Fable 5 on Kilo Code planning benchmarks — near-tie at 1/10th the price. Comparison updated July 1 with Fable live again.

Jun 19, 2026 · 1 min read · Yash Thakker

AI Models · Zhipu AI · GLM · Claude Fable 5 · Anthropic · Benchmarks · Agentic Coding

A quick look at how GLM-5.2 goes toe-to-toe with Claude Fable 5 on planning.

Jul 9, 2026

GPT-5.6 Sol, Terra, Luna vs Claude Fable 5: Complete Frontier Comparison

OpenAI launches GPT-5.6 publicly July 9; Fable 5 live globally since July 1. Sol Ultra leads Terminal-Bench at 91.9%; Fable leads SWE-Bench Pro at 80.3%. Terra matches Fable on terminal work at half the price — tier-by-tier guide.

Jun 15, 2026

GLM-5.2 Beats Fable 5 on Reasoning — 24 Hours After the U.S. Export Ban

The U.S. pulled Fable 5 on June 12. Within 48 hours, two Chinese labs had released models that beat it on key benchmarks — fully open source, at a fraction of the cost. Here is what GLM-5.2 is, what it can do, and what the timing means.

Jul 28, 2026

Opus 5 on SlopCodeBench: 24% Strict Pass, Still Can't Run Lights-Off

SlopCodeBench measures whether a model can maintain a codebase across incrementally revealed checkpoints, not just solve one problem once. humanlayer's dhorthy ran Claude Opus 5, Opus 4.8, and Sonnet 5 through 17 checkpoints and watched live for six hours. Opus 5 won on strict pass rate but also tripled the code volume — this post breaks down what the numbers mean and what HN argued about.

Related posts

GPT-5.6 Sol, Terra, Luna vs Claude Fable 5: Complete Frontier Comparison

GLM-5.2 Beats Fable 5 on Reasoning — 24 Hours After the U.S. Export Ban

Opus 5 on SlopCodeBench: 24% Strict Pass, Still Can't Run Lights-Off
