explainx / blog
LongCat-2.0: Meituan's 1.6T MoE Open Model Trained on AI ASIC Superpods
LongCat-2.0: 1.6T MoE, 48B active, 1M context, Terminal-Bench 70.8. Trained on 50K+ AI ASICs over 35T tokens. Claude Code, OpenClaw, Hermes: full guide.

June 30, 2026: Meituan announced the open-source release of LongCat-2.0, a 1.6 trillion-parameter MoE model with ~48B activated parameters per token, trained on 50,000+ AI ASIC accelerators over 35+ trillion tokens with no rollbacks or irrecoverable loss spikes. On the same day that X shipped hosted MCP servers and Fable 5 stayed offline for an 18th day, another open-weight frontier coder entered the field.
This is not the LongCat Video Avatar talking-head model. LongCat-2.0 is Meituan's text/code/agent LLM, positioned for repository-level edits, automated task execution, and long-horizon agent workflows via Claude Code, OpenClaw, and Hermes.

| Item | Detail |
|---|---|
| Released | June 30, 2026 (official blog) |
| Organization | Meituan (food delivery / local services giant) |
| Architecture | MoE: 1.6T total, ~48B active per token |
| Context | Trained on 1M-context data (hundreds of billions of tokens) |
| Attention | LongCat Sparse Attention (LSA), an evolution of DeepSeek Sparse Attention |
| Training | 50K+ AI ASICs in superpods, 35T+ tokens, deterministic ops |
| Harnesses | Claude Code, OpenClaw, Hermes |
| Weights | Open-source announced; Hugging Face says "coming soon" |
| HN signal | 43 points; community focused on ASIC training story and weight availability |

Three threads make this release bigger than another MoE drop:
Meituan states that both the full training run and large-scale deployment run on AI ASIC superpods totaling tens of thousands of accelerators, not on a mature Nvidia GPU stack. Community speculation on Hacker News points to Huawei Ascend 910C-class accelerators, though Meituan has not named a vendor in the launch post.
The claim: 35+ trillion tokens pre-trained with no rollbacks or irrecoverable loss spikes, offered as evidence that alternative hardware can sustain frontier-scale runs given investment in deterministic operators, fault recovery, and memory-aware parallelism.
That is strategically significant while US export controls fragment access to Fable 5 and Nvidia-class compute remains concentrated.
Day 18 of the Fable ban: international developers still need unrestricted frontier coders. Kimi K2.7-Code (June 12), GLM-5.2, and now LongCat-2.0 extend the open-weight ladder, each with a different harness fit and hardware cost.
LongCat-2.0 ships integrated with Claude Code, OpenClaw, and Hermes, rather than "API only, figure out agents yourself." Meituan measured Terminal-Bench 2.1 and SWE-bench Pro through Claude Code sandboxes, aligning with how explainx.ai readers actually evaluate models (Claude Code MCP guide, /mcp-servers).
LongCat-2.0 builds on LongCat-Flash, pushing parameter efficiency and long-context speed.
Agent workloads drive long-input processing. DeepSeek Sparse Attention (DSA) uses fine-grained sparsity, but Meituan's profiling identifies the Lightning Indexer as a bottleneck (output discontinuity, quadratic scoring cost).
LSA adds three orthogonal indexer improvements:
| Component | What it does |
|---|---|
| Streaming-aware Indexing (SI) | Reshapes token selection for coalesced HBM access: contiguous reads instead of fragmented scatter |
| Cross-Layer Indexing (CLI) | One indexing pass serves multiple consecutive layers at inference, since saliency is stable across adjacent layers |
| Hierarchical Indexing (HI) | Coarse-to-fine scoring: block-level recall, then fine token selection inside candidates |
LSA extends to Multi-Token Prediction (MTP) for speculative decoding: draft and target models share indexing passes across MTP steps.
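
The coarse-to-fine idea behind Hierarchical Indexing can be illustrated with a toy sketch. This is not Meituan's implementation: the block size, the choice of a block's maximum token score as its coarse score, and the function name `hierarchical_select` are all assumptions made for illustration.

```python
from typing import List


def hierarchical_select(scores: List[float], block_size: int,
                        top_blocks: int, top_tokens: int) -> List[int]:
    """Toy coarse-to-fine selection over per-token saliency scores.

    Assumption: a block's coarse score is the max of its token scores.
    A real indexer would compute coarse scores cheaply, without scoring
    every token first; full scores are used here only to keep this short.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Stage 1 (coarse): split positions into contiguous blocks and rank them.
    blocks = [range(start, min(start + block_size, len(scores)))
              for start in range(0, len(scores), block_size)]
    ranked = sorted(blocks, key=lambda b: max(scores[i] for i in b),
                    reverse=True)
    candidates = [i for b in ranked[:top_blocks] for i in b]
    # Stage 2 (fine): pick the best tokens inside the recalled blocks only.
    best = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return sorted(best[:top_tokens])  # token positions in ascending order


scores = [0.1, 0.9, 0.2, 0.3, 0.8, 0.7, 0.05, 0.0]
print(hierarchical_select(scores, block_size=2, top_blocks=2, top_tokens=3))
# → [1, 4, 5]
```

Because the fine stage only looks inside recalled blocks, the selected positions cluster into contiguous regions, which is the property the table associates with block-level recall.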

LongCat-2.0 adds 135B N-gram Embedding parameters (n-gram size 5), expanding the embedding space ~100× via token combinations. Meituan's scaling logic:
Inference benefit: shifting params from experts to N-gram Embedding reduces large-batch decoding memory I/O.

| Parameter | LongCat-2.0 | Kimi K2.7-Code (comparison) |
|---|---|---|
| Total params | 1.6T | 1T |
| Active per token | ~48B | 32B |
| Context training | Up to 1M | 256K |
| Attention | LSA (sparse) | MLA |
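
One way to read the table is the share of parameters that fire per token. The sketch below only does that arithmetic on the table's figures; treating the approximate "~48B" as exactly 48B is an assumption.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of total parameters activated per token."""
    return active_params / total_params


# (total, active) parameter counts taken from the comparison table.
models = {
    "LongCat-2.0": (1.6e12, 48e9),
    "Kimi K2.7-Code": (1.0e12, 32e9),
}
for name, (total, active) in models.items():
    print(f"{name}: {active_fraction(total, active):.1%} active per token")
# → LongCat-2.0: 3.0% active per token
# → Kimi K2.7-Code: 3.2% active per token
```

By this measure the two models are similarly sparse; the difference is mainly in absolute size.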
Meituan's infrastructure section is unusually detailed for a launch blog and worth reading in full at longcat.chat/blog/longcat-2.0.
Highlights:
HN takeaway: the software community around non-Nvidia ASICs is "still less developed" (Meituan's words), so Meituan built custom stacks to compensate. If Ascend-class clusters can train a 1.6T MoE cleanly, the compute diversification narrative accelerates.
Scores are from Meituan's table. * = externally reported metric; all other scores were measured in Meituan's in-house unified harness. "-" = not available.
| Model | Terminal-Bench 2.1 | SWE-bench Pro | SWE-bench Multilingual |
|---|---|---|---|
| LongCat-2.0 | 70.8 | 59.5 | 77.3 |
| Gemini 3.1 Pro | 70.7* | 54.2* | 76.9* |
| GPT-5.5 | 73.8* | 58.6* | - |
| Opus 4.6 | - | 57.3* | 77.8* |
| Opus 4.7 | 71.7* | 64.3* | 80.5* |
| Opus 4.8 | 78.9* | 69.2* | 84.8* |
Methodology notes (from Meituan):
LongCat-2.0 edges Gemini 3.1 Pro on Terminal-Bench in this table and sits between GPT-5.5 and Opus 4.7 on SWE-bench Pro, but trails Opus 4.8 on all three code-agent columns where external numbers exist.
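
These comparisons can be checked directly against the table. The sketch below uses only the scores listed there, with `None` standing in for the table's "-"; the helper name `gap_vs` is made up for this example.

```python
from typing import Dict, Optional, Tuple

COLUMNS = ("Terminal-Bench 2.1", "SWE-bench Pro", "SWE-bench Multilingual")
Row = Tuple[Optional[float], Optional[float], Optional[float]]
SCORES: Dict[str, Row] = {
    "LongCat-2.0": (70.8, 59.5, 77.3),
    "Gemini 3.1 Pro": (70.7, 54.2, 76.9),
    "GPT-5.5": (73.8, 58.6, None),
    "Opus 4.6": (None, 57.3, 77.8),
    "Opus 4.7": (71.7, 64.3, 80.5),
    "Opus 4.8": (78.9, 69.2, 84.8),
}


def gap_vs(baseline: str, other: str) -> Dict[str, Optional[float]]:
    """Per-benchmark difference (baseline minus other); None if a score is missing."""
    out: Dict[str, Optional[float]] = {}
    for col, a, b in zip(COLUMNS, SCORES[baseline], SCORES[other]):
        out[col] = None if a is None or b is None else round(a - b, 1)
    return out


for model in SCORES:
    if model != "LongCat-2.0":
        print(model, gap_vs("LongCat-2.0", model))
```

A positive value means LongCat-2.0 scores higher in that column. Starred and in-house numbers come from different harnesses, so the gaps are indicative rather than exact.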
| Model | FORTE | BrowseComp | RWSearch |
|---|---|---|---|
| LongCat-2.0 | 73.2 | 79.9 | 78.8 |
| Gemini 3.1 Pro | 70.3 | 85.9* | 76.3 |
| GPT-5.5 | 77.8 | 84.4* | 85.3 |
| Opus 4.8 | 77.2 | 84.3* | 77.3 |
FORTE (Full-cycle Office Real-world Task Evaluation) covers 15 corporate professions, is compatible with OpenClaw, Hermes, and Claude Code, and uses a 45-minute task timeout.
| Model | IFEval | Writing Bench | IMO-AnswerBench | GPQA-diamond |
|---|---|---|---|---|
| LongCat-2.0 | 90.0 | 83.8 | 81.8 | 88.9 |
| Opus 4.8 | 86.0 | 85.2 | 75.3 | 92.4 |
Honesty filter: apart from the starred external numbers, these scores are Meituan-measured. Independent reproduction on DevThrottle and community harnesses typically lags open-weight launches by 1 to 2 weeks. Treat launch numbers as directional until third parties confirm.
LongCat-2.0 uses a MOPD multi-expert post-training architecture:
| Expert group | Focus |
|---|---|
| Agent Experts | Code, work, search: tool invocation, parameter parsing, self-correction instead of infinite loops |
| Reasoning Experts | Math, STEM, multi-hop: adaptive compute by difficulty |
| Interaction Experts | Instruction following, hallucination suppression, safety bounds |
Fusion via MOPD combines agentic execution, reasoning depth, and alignment. It is the same "specialist merge" pattern other Chinese labs use when one base MoE serves multiple product surfaces.

The launch page includes scenario tabs: codebase migration (full plugin rewrite to new SDK), web app development, agentic research, data analysis, presentation generation, and creative writing.
The codebase migration demo is the developer hook: read the full repo and migration docs, map the architecture, rewrite the plugin while preserving behavior, and compile clean on the first build. It is the same long-horizon promise Kimi K2.7-Code markets.
From the official page:
| Channel | Status |
|---|---|
| Try it | Web demo on longcat.chat |
| API Access | Listed on launch page |
| GitHub | Linked from announcement |
| Hugging Face | meituan-longcat/LongCat-2.0 (weights coming soon) |
| Claude Code / OpenClaw / Hermes | Harness integration claimed at launch |
Self-hosting reality check: Hacker News commenters note llama.cpp does not support LongCat models today; expect Transformers + vLLM/SGLang when weights land. At 1.6T, even 2-bit quantization implies ~400GB+ of weight storage before KV cache, which is "if you have to ask, you can't run it" territory on consumer GPUs.
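
The ~400GB figure follows from simple arithmetic. The sketch below counts raw weight bytes only. It assumes all 1.6T parameters are stored at the same bit width and ignores quantization scales, KV cache, and activations, so real checkpoints would be somewhat larger.

```python
def weight_storage_gb(total_params: float, bits_per_param: float) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 1.6e12  # LongCat-2.0 total parameter count from the spec table

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: {weight_storage_gb(TOTAL_PARAMS, bits):,.0f} GB")
# → 16-bit: 3,200 GB
# →  8-bit: 1,600 GB
# →  4-bit: 800 GB
# →  2-bit: 400 GB
```

Even the most aggressive row is far beyond a single consumer GPU, which is why the realistic path is multi-node serving once weights land.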
The HN thread (~43 points, 3 hours after launch) split four ways:
1. ASIC training is the real story
"This is the real news story... frontier-scale training on alternative hardware platforms." (gardnr)
Speculation: Huawei Ascend 910C clusters. If true, LongCat-2.0 is evidence of non-Nvidia frontier pretraining at 1.6T scale, a geopolitical parallel to the Fable export-control fight.
2. Weights not yet downloadable
The Hugging Face repo is live but weights are pending. GitHub links reportedly returned 404 at launch per early commenters, so verify before planning fine-tunes.
3. Early quality probes mixed
One HN user tested a nuclear-reactor fuel question: LongCat gave a well-reasoned but incorrect answer (Pu-241 vs U-235), while Qwen 3.7 Plus and Gemini Flash answered correctly. With n=1 this is not a verdict, but it is a reminder to eval on your own domain before production routing.
4. Architecture lineage
Discussion of DeepSeek Sparse Attention lineage: Meituan extends DSA with LSA rather than shipping a pure finetune of DeepSeek V4-Pro, though the community acknowledges the architectural debt to DeepSeek's sparse-attention research.
| Model | Org | Total / Active | Weights status | Standout claim |
|---|---|---|---|---|
| LongCat-2.0 | Meituan | 1.6T / ~48B | Coming soon | Terminal-Bench 70.8, ASIC training, 1M context |
| Kimi K2.7-Code | Moonshot | 1T / 32B | Available (Modified MIT) | MCP Mark 81.1 vs Opus 76.4, $0.95/M API |
| GLM-5.2 | Zhipu | - | Available (MIT) | BridgeBench reasoning, security parity narrative |
| Opus 4.8 | Anthropic | Closed | API | Official Fable 5 fallback; leads LongCat table on SWE-bench Pro |
Practical guidance: use Kimi/GLM today if you need downloadable weights now; watch LongCat for independent SWE-bench reproduction and weight drop; keep Opus 4.8 as closed baseline until Fable returns.
For enterprise migration playbooks during the ban, see Fable 5 open-source alternatives.
Before routing production agents to LongCat-2.0:
LongCat-2.0 is Meituan's bid for open-weight frontier agents: larger than Kimi K2.7-Code, trained on AI ASIC superpods at 1.6T / 48B active, with credible in-house coding benchmarks and Claude Code / OpenClaw / Hermes integration on day one.
The headline for infrastructure watchers: 35T tokens, zero irrecoverable spikes, 50K ASICs. Frontier training is no longer exclusively an Nvidia story.
The headline for developers: weights are coming soon, Opus 4.8 still leads on external SWE-bench numbers in Meituan's own table, and Fable 5 is still offline. LongCat joins the open ladder; it does not restore the closed frontier.
Run your eval when weights drop. The benchmark that matters is on your codebase.
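
One way to act on that advice is a tiny pass-rate harness over tasks drawn from your own repository. The sketch below is deliberately model-agnostic. Each agent is a hypothetical callable that you would wire to whichever harness you use, and the `Task` format is an assumption, not something LongCat-2.0 or any harness prescribes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the model output is acceptable


def pass_rates(tasks: List[Task],
               agents: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Run every task through every agent and return each agent's pass rate."""
    results: Dict[str, float] = {}
    for agent_name, run_agent in agents.items():
        passed = 0
        for task in tasks:
            try:
                passed += bool(task.check(run_agent(task.prompt)))
            except Exception:
                pass  # an agent or checker crash counts as a failure
        results[agent_name] = passed / len(tasks) if tasks else 0.0
    return results


# Stand-in tasks and agents for demonstration only; replace with real calls.
tasks = [
    Task("add", "Return 2+2 as a number", lambda out: out.strip() == "4"),
    Task("echo", "Say ok", lambda out: "ok" in out.lower()),
]
agents = {
    "stub-a": lambda prompt: "4" if "2+2" in prompt else "OK",
    "stub-b": lambda prompt: "5",
}
print(pass_rates(tasks, agents))  # → {'stub-a': 1.0, 'stub-b': 0.0}
```

In practice the `check` functions would run your test suite or build against the model's patch, which matters more than any string match.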
LongCat-2.0 specs and benchmarks accurate as of June 30, 2026 per longcat.chat/blog/longcat-2.0. Verify Hugging Face weight availability, API pricing, and independent benchmark reproduction before production use.