← Back to blog

explainx / blog

Sakana Fugu: One Model API to Orchestrate All the Others

Sakana AI just released Fugu Ultra — a multi-agent orchestration model that matches Anthropic Fable 5 and Mythos Preview performance without single-vendor dependency. Here's what it does, how it works, and why it matters for AI sovereignty.

·8 min read·Yash Thakker
AI AgentsMulti-Agent SystemsFoundation ModelsAI SovereigntyLLMsOrchestration
Sakana Fugu: One Model API to Orchestrate All the Others

There's a new way to think about frontier AI performance: don't build a bigger model, build a smarter coordinator.

Sakana Fugu, released today by Sakana AI, is the first production model built on this premise. It's a multi-agent orchestration system that presents itself as a single foundation model — you call one API endpoint, and internally Fugu decides whether to answer directly or assemble a team of specialized models to handle it. The complexity never reaches your code.

And the benchmarks are hard to dismiss. Fugu Ultra matches Anthropic's Fable 5 and Mythos Preview across the industry's most rigorous engineering, scientific, and reasoning tests — while neither of those models is even in its agent pool. They can't be. They're subject to export controls.

That last part is the point.


Why Orchestration, Why Now

For the past few years, progress in AI meant one thing: bigger models, more compute, more data. It worked. But it also created a concentration problem that is no longer theoretical.

Anthropic's Fable 5 and Mythos models — today's most capable frontier models — recently had export controls imposed on them. Organizations that built critical infrastructure on those APIs found their access could shift or disappear overnight due to regulatory boundaries and foreign policy decisions.

Sakana's answer isn't a new mega-model. It's a system designed so that no single provider is a point of failure.

"Collective intelligence serves as the practical hedge against this concentration of power." — Sakana AI

Fugu's agent pool is explicitly swappable. If a provider restricts access, Fugu routes around the disruption. As new models arrive — including Sakana's own — they fold into the pool and pass the gains to users automatically.

Live WorkshopAug 1–2, 2026 · 2 days

Claude for Work

Use Claude as a thought partner for writing, research & decisions — no coding required. 2 live sessions with Yash Thakker.

Register now

Claude for Work is a 2-day live workshop on using Claude to supercharge your daily work — writing, research, analysis, and decision-making — without any coding required. Learn how to set up Claude Projects with custom instructions, run deep-research sprints, co-write documents that sound like you, and build repeatable prompt systems for your team. August 1–2, 2026. Hosted by Yash Thakker, founder of AISOLO Technologies, instructor to 350,000+ students.

Includes 1-year access to all session recordings, a personal prompt library, Discord community access, and a certificate of completion. No coding or technical background required. Designed for managers, marketers, founders, and writers.


What Fugu Actually Is

Fugu is itself a language model. Not a wrapper or a router built with if/else logic — a trained orchestrator that has learned:

  • When to delegate vs. solve directly
  • How to break complex tasks into agent subtasks
  • How agents should communicate with each other
  • How to combine their outputs into a single, reliable answer

This is built on two ICLR 2026 papers from Sakana AI: TRINITY (an evolved LLM coordinator) and Conductor (learning to orchestrate agents in natural language). The academic grounding matters — this isn't prompt engineering dressed up as a product.

From the outside, you call one model. On the inside, a coordinated system of experts does the work.


Fugu vs. Fugu Ultra

Both models share the same OpenAI-compatible API.

FuguFugu Ultra
Optimized forLow latency, everyday tasksMaximum quality, complex tasks
Best use casesCode review, chatbots, Codex-style toolsResearch, security assessment, patent analysis
Compliance controlsOpt specific agents out of poolSame
LatencyLowerHigher (deeper coordination)

The Benchmark Numbers

Fugu vs Frontier Models — Benchmark Comparison Grid

Fugu Ultra goes head-to-head with today's frontier models across coding, reasoning, scientific, and agentic benchmarks. The comparison is against Opus 4.8 (Anthropic's best publicly accessible model), Gemini 3.1 Pro, and GPT-5.5 — not Fable 5 or Mythos, which are under export controls and not in Fugu's pool.

Fugu Benchmark Table — Detailed Scores vs Opus 4.8, Gemini 3.1 Pro, GPT-5.5

Key results at a glance:

BenchmarkFuguFugu UltraOpus 4.8Gemini 3.1 ProGPT-5.5
SWE Bench Pro59.073.769.254.258.6
TerminalBench 2.180.282.174.670.378.2
LiveCodeBench92.993.287.888.585.3
LiveCodeBench Pro87.890.884.882.988.4
Humanity's Last Exam47.250.049.844.441.4
CharXiv Reasoning85.186.684.283.384.1
GPQA-D95.595.592.094.393.6
SciCode60.158.753.558.956.1
Long Context Reasoning74.773.367.772.774.3
MRCRv286.693.687.984.994.8

Fugu Ultra leads or ties on 8 of 10 benchmarks. The standard Fugu model leads on SciCode and Long Context Reasoning — suggesting the lighter model is better calibrated for document-heavy tasks where over-coordination adds noise.


Fugu Ultra Matches Fable 5

This is the headline Sakana is leading with, and it deserves unpacking.

Sakana explicitly claims Fugu Ultra "stands shoulder-to-shoulder" with Fable 5 (Anthropic's current frontier model, released earlier this year) and Mythos Preview on the industry's most rigorous benchmarks. Importantly, neither model is in Fugu's agent pool — they're under export controls and not publicly accessible.

Fugu achieves frontier-level performance by orchestrating models that are accessible, then combining their outputs intelligently. The implication: you don't need access to Fable 5 to get Fable 5-level results.

For teams that have been locked out of Fable 5 or Mythos due to export restrictions — or who simply want independence from single-vendor agreements — this is a meaningful alternative.


What Beta Users Built

Close to 500 early users put Fugu through real, demanding workflows during the beta. Three use cases stood out:

Code Review

"For code review, Fugu Ultra is significantly better than GPT-5.5. It gives comprehensive answers and finds the bugs others miss. Where other tools flag about three issues, Fugu surfaced more than twenty. It's become the model I run all my reviews through." — Software Engineer

Long-Session Agent Products

"Raw output quality is on par with top frontier models, but Fugu showed unusually strong persona stability across long sessions, holding its identity where other models drift. For agent products, that may matter more than raw benchmark scores." — Executive at Enterprise Platform Company

Security Assessment

"Given one scoped instruction, Fugu drove a full security assessment end-to-end — recon, XSS/SQLi checks, auth review, and a clean report with evidence and retest steps — staying inside scope and avoiding destructive actions." — Cyber Security Engineer

The pattern across beta feedback is consistent: Fugu's value compounds on long, multi-step tasks. A single prompt doesn't reveal it. A 20-step research workflow does.


The Architecture Advantage

Traditional multi-agent systems require you to build the orchestration layer: pick models, define handoffs, write routing logic, handle failures. That complexity lives in your code.

Fugu internalizes it. The orchestration is trained, not hardcoded. This means:

  • No integration overhead — one API call, one endpoint
  • Dynamic routing — Fugu decides which specialist handles which subtask at runtime
  • Failure resilience — if an agent in the pool becomes unavailable, Fugu adapts
  • Automatic improvement — as better models enter the pool, performance improves without any change to your code

For developers, this is the equivalent of moving from managing your own servers to using a managed cloud service — except the managed layer is intelligence routing, not infrastructure routing.


Availability and Pricing

Sakana Fugu is generally available today through a single OpenAI-compatible API:

  • Subscription tier — designed for everyday use and team deployments
  • Pay-as-you-go — for heavier workloads and enterprise use cases

Both Fugu and Fugu Ultra are on the same API. Switching between them is a model parameter change.

Visit the Sakana AI Fugu product page to get started.


What Comes Next

Sakana has flagged their roadmap priorities:

  1. Expanding the agent pool — including open-weight models and Sakana's own upcoming models
  2. Stronger long-running coordination — deeper support for multi-day agentic tasks
  3. User control — more granular configuration of how Fugu orchestrates on your behalf

The architecture compounds naturally: every new capable model that enters the open ecosystem potentially enters Fugu's pool. Unlike a monolithic model that requires an expensive retraining cycle to improve, Fugu improves incrementally as the broader ecosystem does.


The Bigger Picture

The AI industry in 2026 has a geopolitical problem. Frontier capability is increasingly concentrated in a handful of US-based providers subject to export controls, regulatory changes, and policy shifts that can happen faster than engineering teams can adapt.

Sakana Fugu is the first production system explicitly designed around this constraint. It doesn't try to out-scale the frontier labs. It learns to coordinate them — and route around them when access disappears.

Whether that's enough to hold its position as Fable 5 evolves (Anthropic today updated Fable 5 with expanded capabilities) remains to be seen. But the architecture is sound: collective intelligence is more resilient than any single model, and Fugu is the first system that has made collective intelligence feel like a single model.


Related Reading on ExplainX


Following the frontier model landscape? Subscribe to the ExplainX newsletter for weekly breakdowns of what's actually worth building on.

Related posts