What is HY-World 2.0?

HY-World 2.0 is Tencent Hunyuan’s multi-modal world-model framework for both world generation (text or a single image → navigable 3D scene) and world reconstruction (multi-view images or video → 3D). Its outputs are described as real 3D assets, namely meshes and 3D Gaussian Splatting (3DGS) scenes, meant to be editable and importable into engines such as Blender, Unity, Unreal, and Isaac Sim. Primary repo: https://github.com/Tencent-Hunyuan/HY-World-2.0

What is WorldMirror 2.0 and what is available today?

WorldMirror 2.0 is a unified feed-forward reconstruction model (~1.2B parameters per the README's model zoo) that predicts depth, surface normals, camera parameters, point clouds, and 3DGS attributes in a single forward pass. As of the project’s April 2026 announcement, its inference code and checkpoints are released, with weights hosted on Hugging Face; the README documents a Python API, CLI, and Gradio demo under hyworld2.worldrecon.
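Depth, camera parameters, and point clouds are tightly linked: per-pixel depth plus intrinsics and a pose is enough to recover a point cloud. The sketch below is a generic pinhole back-projection to illustrate that relationship, not WorldMirror 2.0's code. It assumes z-depth (distance along the optical axis), intrinsics fx, fy, cx, cy, and a camera-to-world rotation and translation; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(depth: Sequence[Sequence[float]],
                    fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Back-project a z-depth map into camera-space points (pinhole model).

    depth[v][u] is the depth at pixel column u, row v; pixels with
    non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth at this pixel
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def camera_to_world(points: Sequence[Point],
                    rotation: Sequence[Sequence[float]],
                    translation: Sequence[float]) -> List[Point]:
    """Apply a camera-to-world rigid transform: p_world = R @ p_cam + t."""
    return [
        tuple(sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
              for i in range(3))
        for p in points
    ]


# A 2x2 depth map at constant depth 2, principal point between the pixels.
cam_pts = unproject_depth([[2.0, 2.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
world_pts = camera_to_world(cam_pts, identity, [0.0, 0.0, 1.0])
print(world_pts[0])  # (-1.0, -1.0, 3.0)
```

Production code vectorizes this with NumPy or PyTorch; the loop form only makes the per-pixel formula x = (u - cx) z / fx explicit.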

How is this different from video-only world models?

The team contrasts HY-World with models that output pixel videos (they cite Genie 3, Cosmos, and their earlier HY-World 1.5 line as video-style). Their argument: videos are hard to edit, duration-limited, and can flicker across views, whereas explicit 3D representations are persistent, view-consistent, and can be rendered in real time on consumer GPUs after a one-time generation cost.

What is still “coming soon” in the open-source plan?

The README marks as not yet released: full world-generation inference (WorldNav + world composition), HY-Pano 2.0 panorama weights/code, WorldStereo 2.0 weights/code, and WorldNav trajectory planning. WorldMirror 2.0 and the technical report are released; older WorldStereo/WorldMirror/HunyuanWorld 1.0 artifacts are pointed to as interim references.

What are the practical install constraints?

The README recommends CUDA 12.4, Python 3.10, and PyTorch 2.4.0 with cu124 wheels, followed by installing requirements.txt and FlashAttention (v3 built from source, or flash-attn from pip). Multi-GPU CLI runs use FSDP and bf16, and the README states that the number of input images must be at least the GPU count (e.g. 8 images for 8 processes). Re-check DOCUMENTATION.md after updates.

Where does explainx.ai fit for builders?

HY-World is research/engine tooling—not an explainx.ai product. For adjacent workflows (browser agents, skills, 3D on the web), see /tools and /skills; for long-context and model releases, see our blog index. Always validate licenses (License.txt in repo) before shipping derivatives.

Tencent Hunyuan HY-World 2.0: 3D world models, WorldMirror 2.0, and open-source plan


HY-World 2.0 is Tencent Hunyuan’s open multi-modal world model stack: it ingests text, single-view images, multi-view images, and video, and targets persistent 3D outputs such as meshes, 3D Gaussian Splatting (3DGS) scenes, and point clouds, not just another MP4. The team positions it as “building a playable world” versus “watching a movie that ends.”

This post summarizes the public GitHub README and docs as of early May 2026; weights, APIs, and benchmarks should be re-checked on the repo and DOCUMENTATION.md before you freeze a reproduction.

Try it (vendor-hosted): 3d-models.hunyuan.tencent.com/world. The README notes that demand can be high.

TL;DR

Topic	Takeaway
Core pitch	3D assets (3DGS / mesh / points) with engine import, vs non-editable video world models
Reconstruction (shipping)	WorldMirror 2.0 — multi-view / video → 3D, ~1.2B params, HF weights, Python API + CLI + Gradio
Generation (roadmap)	Four-stage pipeline: HY-Pano 2.0 (panorama) → WorldNav (trajectory) → WorldStereo 2.0 (expansion) → WorldMirror 2.0 + 3DGS learning
Open today	Technical report plus WorldMirror 2.0 code and checkpoints (per the README's April 16, 2026 news entry)
Not open yet	Full world generation inference, HY-Pano 2.0, WorldStereo 2.0, WorldNav (all listed coming soon)

Two capabilities: generation vs reconstruction

World generation (per README): turn text or a single image into a navigable scene via the staged pipeline above—panorama, planning, stereo expansion, then composition with WorldMirror 2.0 and 3DGS training.

World reconstruction: WorldMirror 2.0 is the feed-forward workhorse—one forward pass estimates depth, surface normals, camera parameters, point clouds, and 3DGS-style attributes from multi-view stills or casual video, with flexible resolution (README cites 50K–500K pixels).
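If you preprocess images yourself, a small helper can keep them inside the pixel range the README cites. This is a hypothetical sketch: the pipeline may already resize internally, the 50K/500K bounds are taken from the README text above, and any patch-size alignment the model needs is not handled here.

```python
import math
from typing import Tuple

# Range quoted in the README (about 50K to 500K pixels); re-check upstream.
MIN_PIXELS = 50_000
MAX_PIXELS = 500_000


def fit_pixel_budget(width: int, height: int,
                     min_pixels: int = MIN_PIXELS,
                     max_pixels: int = MAX_PIXELS) -> Tuple[int, int]:
    """Rescale (width, height), keeping the aspect ratio, so the pixel count
    falls inside [min_pixels, max_pixels]. In-range sizes are returned as-is."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    area = width * height
    if area > max_pixels:
        scale = math.sqrt(max_pixels / area)
        # Round down when shrinking so the area stays at or below the cap.
        return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
    if area < min_pixels:
        scale = math.sqrt(min_pixels / area)
        # Round up when enlarging so the area reaches the floor.
        return math.ceil(width * scale), math.ceil(height * scale)
    return width, height


print(fit_pixel_budget(4000, 3000))  # 12 MP photo, shrunk under 500K pixels
print(fit_pixel_budget(640, 480))    # already in range: (640, 480)
```

Scaling both sides by the square root of the area ratio preserves the aspect ratio, which matters if camera intrinsics are later derived from the resized frames.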

Architecture (high level)

The README diagrams the generation pipeline as HY-Pano 2.0 → WorldNav → WorldStereo 2.0 → WorldMirror 2.0 + 3DGS splatting, turning a text prompt or a single RGB image into a composed 3D world. Technical details live in the technical report (linked from the repo); this article does not reproduce its figures.
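Generation inference is not released yet, so the most that can be shown is the stage ordering the README describes. The sketch below is entirely hypothetical: every type and function is a stand-in invented for illustration, each stage returns placeholder data, and the assumption that the WorldStereo 2.0 role consumes both the panorama and the WorldNav trajectory is ours, not the repo's.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pose = Tuple[float, float, float]


@dataclass
class Panorama:      # stage 1 output (HY-Pano 2.0 role)
    prompt: str


@dataclass
class Trajectory:    # stage 2 output (WorldNav role)
    poses: List[Pose]


@dataclass
class ViewSet:       # stage 3 output (WorldStereo 2.0 role)
    frames: List[str]


@dataclass
class World:         # stage 4 output (WorldMirror 2.0 + 3DGS role)
    num_views: int
    representation: str


def generate_panorama(prompt: str) -> Panorama:
    # Placeholder: a real stage would synthesize a 360-degree image.
    return Panorama(prompt=prompt)


def plan_trajectory(pano: Panorama, steps: int = 4) -> Trajectory:
    # Placeholder: step forward along the z axis, one unit per pose.
    return Trajectory(poses=[(0.0, 0.0, float(i)) for i in range(steps)])


def expand_views(pano: Panorama, traj: Trajectory) -> ViewSet:
    # Placeholder: one frame label per planned pose.
    return ViewSet(frames=[f"{pano.prompt}@{pose}" for pose in traj.poses])


def compose_world(views: ViewSet) -> World:
    # Placeholder: reconstruction and splatting would consume all frames here.
    return World(num_views=len(views.frames), representation="3dgs")


def generate_world(prompt: str) -> World:
    pano = generate_panorama(prompt)
    traj = plan_trajectory(pano)
    views = expand_views(pano, traj)
    return compose_world(views)


print(generate_world("a foggy harbor at dawn"))
# World(num_views=4, representation='3dgs')
```

The point of the staging is that only the last step produces an explicit 3D asset; everything before it produces 2D imagery or camera paths that feed reconstruction.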

Open-source plan (checklist from README)

Item	Status in README
Technical report	Released
WorldMirror 2.0 code & checkpoints	Released
Full world generation inference (WorldNav + composition)	Planned
HY-Pano 2.0 weights & code	Planned (HunyuanWorld 1.0 noted as interim)
WorldStereo 2.0 weights & code	Planned (WorldStereo as interim)
WorldNav	Planned

Treat checkboxes as intent; license, export rules, and GPU support still gate real adoption.

Getting started with WorldMirror 2.0

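The README's minimal Python usage:

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

# Load the WorldMirror 2.0 checkpoint (model ID as given in the README).
pipeline = WorldMirrorPipeline.from_pretrained('tencent/HY-World-2.0')

# Run a single feed-forward reconstruction over the inputs at this path.
result = pipeline('path/to/images')
```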

Optional priors (camera / depth) are passed as paths; the repo points to a prior preparation guide in DOCUMENTATION.md.

CLI (single GPU):

```bash
python -m hyworld2.worldrecon.pipeline --input_path path/to/images
```

Multi-GPU runs use torchrun with the --use_fsdp and --enable_bf16 flags. One operational constraint matters: the number of input images must be at least the number of GPU processes (e.g. 8 images for 8 processes).
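A pre-flight check can catch the image-count constraint before a torchrun launch fails. This is a hypothetical helper, not part of hyworld2; the accepted file extensions are an assumption, so check DOCUMENTATION.md for what the loader actually reads.

```python
from pathlib import Path
from typing import List

# Accepted extensions are an assumption; check DOCUMENTATION.md for the real list.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def list_images(input_dir: str) -> List[Path]:
    """Return image files directly inside input_dir, sorted by name."""
    return sorted(p for p in Path(input_dir).iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTS)


def check_multi_gpu_inputs(num_images: int, num_gpus: int) -> None:
    """Fail fast if there are fewer input images than GPU processes."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_images < num_gpus:
        raise ValueError(
            f"{num_images} image(s) for {num_gpus} GPU process(es); the README "
            "requires at least one image per process")


def max_processes(num_images: int, available_gpus: int) -> int:
    """Largest process count that keeps images >= processes (minimum 1)."""
    return max(1, min(num_images, available_gpus))


# Example: 3 images on an 8-GPU node, so launch at most 3 processes.
n = max_processes(num_images=3, available_gpus=8)
check_multi_gpu_inputs(num_images=3, num_gpus=n)
```

Lowering the process count is usually simpler than padding the input with duplicate frames, which could bias the reconstruction.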

Gradio:

```bash
python -m hyworld2.worldrecon.gradio_app
```

Environment: a conda environment with Python 3.10, CUDA 12.4, and torch 2.4.0 from the cu124 wheels, then pip install -r requirements.txt and FlashAttention (either a v3 source build or pip install flash-attn).
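Before debugging a failed run, it helps to confirm the environment matches the recommended versions. A minimal sketch, assuming the versions quoted above; it only looks for the pip package's flash_attn module, and a from-source v3 build may expose a different module name.

```python
import importlib.util
import sys
from typing import Dict, List, Optional

# Recommended versions as quoted from the README; re-check before relying on them.
EXPECTED = {"python": "3.10", "torch": "2.4.0", "cuda": "12.4"}


def describe_env() -> Dict[str, Optional[str]]:
    """Collect the versions that matter for the recommended setup."""
    info: Dict[str, Optional[str]] = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
        "flash_attn": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch
        info["torch"] = torch.__version__.split("+")[0]  # "2.4.0+cu124" -> "2.4.0"
        info["cuda"] = torch.version.cuda  # CUDA torch was built with; None on CPU builds
    if importlib.util.find_spec("flash_attn") is not None:
        info["flash_attn"] = "found"
    return info


def mismatches(info: Dict[str, Optional[str]]) -> List[str]:
    """List every expected version that differs from what was found."""
    return [f"{key}: expected {want}, found {info.get(key)}"
            for key, want in EXPECTED.items() if info.get(key) != want]


if __name__ == "__main__":
    env = describe_env()
    print(env)
    for problem in mismatches(env):
        print("WARN", problem)
```

torch.version.cuda reports the toolkit the wheel was compiled against, not the installed driver, so a match here does not guarantee the GPU driver is new enough.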

Benchmarks (as reported—verify in the report)

The README includes tables for:

WorldStereo 2.0 — camera metrics and single-view-generated reconstruction on Tanks-and-Temples / MipNeRF360 vs baselines such as SEVA, Gen3C, Lyra, FlashWorld.
WorldMirror 2.0 — point map accuracy / completeness on 7-Scenes, NRGBD, DTU at low / medium / high inference resolutions, with and without prior injection; comparisons include Pow3R and MapAnything under varying prior conditions.
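Accuracy and completeness are commonly defined as nearest-neighbor distances in opposite directions. The sketch below uses that common definition (mean distances, brute-force search); the report's exact protocol (alignment, scale normalization, mean vs. median, outlier thresholds) may differ, and real evaluations use a KD-tree instead of an O(N·M) scan.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_dist(p: Point, cloud: Sequence[Point]) -> float:
    """Euclidean distance from p to its closest point in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def accuracy(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean distance from each predicted point to the nearest ground-truth point."""
    if not pred or not gt:
        raise ValueError("point clouds must be non-empty")
    return sum(_nearest_dist(p, gt) for p in pred) / len(pred)


def completeness(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean distance from each ground-truth point to the nearest predicted point."""
    if not pred or not gt:
        raise ValueError("point clouds must be non-empty")
    return sum(_nearest_dist(g, pred) for g in gt) / len(gt)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0)]           # prediction misses the second point
print(accuracy(pred, gt))          # 0.0: everything predicted is correct
print(completeness(pred, gt))      # 0.5: half the scene is missing
```

The two numbers are complementary: a sparse but precise prediction scores well on accuracy and poorly on completeness, which is why the tables report both.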

Rule of thumb: read the technical report for protocol detail—leaderboard numbers without split / preprocessing context mislead buyers and paper reviewers alike.

Why teams care (strategic, not hype)

Game / sim / robotics: Persistent 3D fits Unreal / Unity / Isaac pipelines better than frame dumps. One-time reconstruction cost plus cheap real-time rendering matches interactive RL and digital-twin workflows—if export and license terms align.

Caution: World generation end-to-end is not fully open yet; most hackers will live in WorldMirror reconstruction until WorldNav / HY-Pano 2.0 / WorldStereo 2.0 ship.

Related on explainx.ai

WebGPU complete guide (2026): browser-side 3D/GPU context
Tencent Hy3 — 295B MoE text model for agents — Hunyuan's July 2026 coding release
How diffusion image generation works — complementary generative-media primer
AI tools directory — discover utilities by task
Agent skills registry — repo-native agent playbooks

Primary sources

Repository: github.com/Tencent-Hunyuan/HY-World-2.0
Documentation: DOCUMENTATION.md (English) · DOCUMENTATION_zh.md (中文)
Model hub: README cites WorldMirrorPipeline.from_pretrained('tencent/HY-World-2.0') — confirm the exact Hugging Face card from the repo’s Model Zoo table
Product page: 3d-models.hunyuan.tencent.com/world

Citation (from README)

```bibtex
@article{hyworld22026,
  title={HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds},
  author={Team HY-World},
  journal={arXiv preprint},
  year={2026}
}
```

HY-World 2.0 is a fast-moving research release. Treat this explainx.ai article as May 6, 2026 orientation text—validate LICENSE, weights, and CLI flags on the official repository before production use.


Related posts

NVIDIA Cosmos 3: Open Physical AI World Models for Robots and Autonomous Systems

What Are World Models? The AI Systems That Simulate Reality (Starchild-1 and Beyond)

"What Happens to Creativity When AI Makes Copying Free?" — The shadcn Debate, Explained
