What is Qwen-AgentWorld?

Qwen-AgentWorld is a language model from Alibaba's Qwen team trained to simulate agent environments — predicting what a terminal, search engine, MCP server, browser, Android UI, or OS would return given an agent's action. It covers seven domains in a single model and is available open-source at 35B-A3B scale (MoE, 3B active parameters, 256K context).

How is Qwen-AgentWorld different from a regular LLM used as an agent?

A regular agent LLM learns to choose actions. Qwen-AgentWorld learns to predict environment observations — what happens next after an action is taken. This is the world-model side of the agent loop, not the policy side. The two capabilities are complementary and can be unified in one model.

What is AgentWorldBench?

AgentWorldBench is a seven-domain evaluation benchmark released alongside Qwen-AgentWorld. Each sample is paired with a ground-truth observation from a real environment execution, enabling reference-grounded scoring across five dimensions: format, factuality, consistency, realism, and quality.

Does controllable Sim RL actually beat training in real environments?

In their search experiment, controllable Sim RL reached 50.3% F1 at step 60 vs. 45.6% for Real RL trained with a live search engine. The key is controllability — uncontrolled simulation produced negligible gains, and in one domain (MCP Tool Decathlon) actually hurt performance.

Can world-model training transfer to agent tasks without agentic fine-tuning?

Yes. LWM RL warm-up on single-turn, tool-call-free world-model tasks transferred to multi-turn, tool-calling agent benchmarks across seven evaluations — including three domains entirely absent from world-model training — without any subsequent agent-specific RL.

What model sizes are available?

Two scales: 35B-A3B (open-source, MoE with 3B active parameters, 256K context) and 397B-A17B (the flagship scale used in the AgentWorldBench comparisons). The 35B-A3B model is available on Hugging Face and ModelScope.

Qwen-AgentWorld: Language World Model for AI Agents — 2026 | explainx.ai Blog

explainx.ainewsletter3.5k

workshops ↗

Qwen-AgentWorld: Language World Model for AI Agents — 2026 | explainx.ai Blog | explainx.ai

Alibaba's Qwen team just released something most AI labs haven't tried: a model trained not to act in environments, but to model them. Qwen-AgentWorld is a language world model — a single model that predicts what seven different agent environments (terminals, search engines, MCP servers, browsers, Android UIs, desktop OSes, and code editors) will return after any given action.

The flagship 397B-A17B version outperforms GPT-5.4 and Claude Opus 4.8 on AgentWorldBench, the new seven-domain benchmark released alongside it. The 35B-A3B version (MoE, 3B active parameters, 256K context) is fully open-source on Hugging Face.

The paper dropped June 23, 2026. The GitHub and HuggingFace repos are live.

The Core Idea: Model the Environment, Not Just the Agent

Every AI agent tutorial shows the same loop: agent observes state → agent picks action → environment returns new state → repeat. Almost all research optimizes one side of this loop — the policy (which action to take). Nobody has explicitly trained a language model to optimize the other side: predicting what the environment returns.

That's what Qwen-AgentWorld does. Given the interaction history and the agent's next action, the world model predicts the environment's response. For a terminal, that means predicting the exact shell output. For a search engine, that means generating realistic URLs, snippets, and rankings. For an MCP server, that means predicting the correct API response while maintaining referential consistency across sequential calls.

This isn't template generation. Getting these predictions right requires the same capabilities that make a good agent: multi-step causal reasoning, long-context state tracking, and domain-specific knowledge. For background on how agentic systems operate in these kinds of environments, see our overview of MCP servers and how to connect them to AI tools.

Seven Domains, One Model

Qwen-AgentWorld covers four text-based and three GUI-based environment types:

Text Environments

Terminal — shell output, file system state, process behavior

Model	Overall
Qwen-AgentWorld-397B-A17B	58.71
GPT-5.4	58.25
Claude Opus 4.8	— (below GPT-5.4)
Qwen-AgentWorld-35B-A3B	56.39
Claude Sonnet 4.6	56.04

	Tool Decathlon	MCPMark
Qwen3.5-35B-A3B-SFT	32.4	21.5
+ Sim RL (uncontrolled)	31.5	24.6
+ Sim RL (controlled)	36.1	33.8

	F1 by Item	F1 by Row
Qwen3.5-35B-A3B-SFT	34.02	13.72
+ Sim RL (controlled)	50.31	24.21

Benchmark	Base	+ LWM RL	Δ
Terminal-Bench 2.0	33.3	39.6	+6.3
SWE-Bench Verified	64.5	67.9	+3.4
SWE-Bench Pro	42.2	47.4	+5.2
WideSearch F1 Item	33.4	46.2	+12.8
Claw-Eval (OOD)	53.6	64.9	+11.3
QwenClawBench (OOD)	39.8	49.4	+9.7
BFCL v4 (OOD)	62.3	71.3	+9.0

bash

# SGLang
python -m sglang.launch_server \
    --model-path Qwen/Qwen-AgentWorld-35B-A3B \
    --port 8000 \
    --tensor-parallel-size 4 \
    --context-length 262144 \
    --reasoning-parser qwen3

# vLLM
vllm serve Qwen/Qwen-AgentWorld-35B-A3B \
    --port 8000 \
    --tensor-parallel-size 4 \
    --max-model-len 262144 \
    --reasoning-parser qwen3 \
    --trust-remote-code

python

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-AgentWorld-35B-A3B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-AgentWorld-35B-A3B")

Qwen-AgentWorld: The First Language World Model for General AI Agents (2026)

The Core Idea: Model the Environment, Not Just the Agent

Seven Domains, One Model

Related posts

Thoughtworks Zero-Cost Fallacy — Open Source in the Agentic Era

OpenClaw Foundation: 501(c)(3) Non-Profit, Full-Time Team, Major Partners

OpenClaw iOS and Android Apps Launch: Agents in Your Pocket (June 30)

How It Was Trained

Stage 1: Continual Pre-Training

Stage 2: Supervised Fine-Tuning

Stage 3: Reinforcement Learning

AgentWorldBench Results

What the Model Reasons About

Two Paradigms for Using World Models in Agent Training

Paradigm I: Decoupled Simulation (World Model as Environment)

Paradigm II: Agent Foundation Model (World Modeling as Agent Capability)

Deployment

What This Changes (and What It Doesn't)

The Bigger Picture

Where to Go