Figure AI published Helix-02 Bedroom Tidy on May 8, 2026: two Helix-02 humanoid robots reset a bedroom in under two minutes, running a single learned Vision-Language-Action policy that coordinates whole-room locomanipulation—opening doors, hanging clothes, pushing furniture, taking out trash, and working together to make a bed. Each robot infers its partner's intent from motion alone, with no shared planner, no message passing, and no central coordinator.
Figure calls this the first demonstration of a single learned neural network performing multi-humanoid collaborative locomanipulation, directly from pixels to actions.
Primary source
- Figure AI: Helix-02 Bedroom Tidy (official announcement, May 8, 2026)
TL;DR
| Topic | Takeaway |
|---|---|
| What it is | Two Helix-02 humanoids coordinate to reset a bedroom in <2 minutes, running a single Vision-Language-Action policy with no central planner |
| Key behaviors | Open doors, hang clothes, push furniture, operate foot pedals while balancing, make a bed together with deformable fabric manipulation |
| Coordination model | Each robot reads the scene through its own cameras and infers partner intent from motion—like two humans folding a sheet, no message passing |
| Technical claim | First single learned neural network for multi-humanoid collaborative locomanipulation from pixels to actions |
| System continuity | Same underlying approach that learned logistics, laundry, kitchen tasks—no algorithm changes, just new data |
| Watch video | Figure AI news page |
The full task list: whole-room locomanipulation
In the demonstration video, the two Helix-02 robots perform:
- Open a door with whole-body coordination — localize lever handle, depress it, pull door inward while maintaining balance, reposition body as door swings
- Push an office chair under a desk — grasp with both hands, generate controlled forces through foot placement and body posture
- Hang a garment on a coat tree — carry clothing across room, drape onto narrow fixture with both hands, manage fabric that can fold over itself
- Place headphones on a vertical stand — pick up, reorient mid-air, seat headband over narrow stand
- Close an open book — pick up, flip cover closed, handle hinged object with flexing pages and shifting mass
- Operate trash can foot pedal — pick up trash, shift weight to one leg, depress pedal with opposite foot to open lid, drop item in, use foot as end-effector while balancing dynamically
- Coordinate two humanoids around a bed — take complementary positions on opposite sides, act on the same large deformable object without interfering
- Manipulate bedding with bimanual whole-body motions — lift, unfurl, spread, fold, smooth comforter, correct wrinkles and bunched edges as fabric settles
All behaviors run from the same learned system—no scripted handoffs between subtasks.
Why collaborative bed-making is hard: three compounding challenges
Figure highlights three layers of difficulty that interact:
1. Two humanoids = more than two single-robot problems in parallel
Every action one robot takes redefines the problem the other is solving. Each robot reads its partner's intent from motion alone, in real time, while its own actions simultaneously change what the partner sees.
This is fundamentally different from running two independent robots in separate spaces.
2. The central object is deformable
The comforter has no fixed pose, no rigid geometry, no canonical grasp. There is no natural seam between "your half" and "mine."
Each robot must:
- Commit to a contact point while predicting what the other will do
- Update both predictions tens of times per second as fabric folds, drapes, and slides under shared tension
3. The whole sequence runs in two minutes
This bedroom reset requires whole-room locomanipulation: the robot walks naturally between locations, balances dynamically on one leg, and switches between rigid, deformable, articulated, and collaborative manipulation—without scripted handoffs.
At policy rate, that's thousands of consecutive correct decisions, every one conditioned on a fast-moving scene that includes a second humanoid acting under the same constraints.
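As a rough sense of scale, the decision count is the episode length multiplied by the policy rate. Figure's post does not state the rate, so the rates below are assumptions chosen only to span "tens of times per second", and the 120-second episode treats "under two minutes" as an upper bound.

```python
def decision_count(duration_s: float, policy_hz: float) -> int:
    """Number of policy steps in an episode of the given length."""
    return round(duration_s * policy_hz)


# Assumed values, not taken from Figure's post.
ASSUMED_RATES_HZ = (10, 30, 50)
EPISODE_S = 120  # "under two minutes", used as an upper bound

for hz in ASSUMED_RATES_HZ:
    per_robot = decision_count(EPISODE_S, hz)
    print(f"{hz:>3} Hz: {per_robot} steps per robot, {2 * per_robot} across both")
```

Under these assumed rates the count lands between roughly one thousand and six thousand steps per robot, which is consistent with "thousands of consecutive correct decisions".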
New behaviors Helix-02 learned (just by adding data)
Figure emphasizes that no changes to the core algorithm were required. The same system that learned previous tasks now performs:
| Behavior | Technical challenge |
|---|---|
| Open doors with whole-body coordination | Localize handle, apply force while maintaining balance, reposition as door moves |
| Push furniture using stance and balance | Generate controlled forces through foot placement and body posture rather than arm motion alone |
| Drape clothing onto narrow fixtures | Manage fabric that can fold over itself and obscure contact points |
| Place objects with in-hand reorientation | Reorient mid-air and seat onto narrow vertical stand |
| Close a book with dexterous bimanual control | Handle hinged object whose pages flex and mass shifts as it folds |
| Operate trash can foot pedal with single-leg balance | Use foot as end-effector while balancing dynamically |
| Coordinate two humanoids around a shared object | Take complementary positions and act without interfering |
| Manipulate bedding with bimanual whole-body motions | Lift, unfurl, spread, fold, smooth fabric; correct wrinkles as fabric settles |
Each capability demonstrates the integration of locomotion, dexterity, and sensing from a single learned policy.
No central planner: how the robots infer intent from motion
Traditional multi-robot systems use:
- Shared planners that assign tasks to each robot
- Message passing to communicate state and intent
- Central coordinators that orchestrate actions
Helix-02 uses none of these.
Instead, each robot:
- Reads the room through its own cameras
- Infers its partner's intent from the partner's visible motion
- Makes independent decisions conditioned on the full scene (including the other robot)
Figure compares this to how two people fold a sheet: you watch your partner's hands, predict where they'll move next, and adjust your own actions accordingly—without verbal instructions or a shared plan.
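The structure described above (independent decisions, each based only on what a robot observes, with nothing exchanged between robots) can be illustrated with a toy loop. This is a minimal sketch, not Figure's implementation: `toy_policy` is a hypothetical hand-written rule standing in for the learned network, 2D positions stand in for camera pixels, and `GAIN` is an arbitrary assumed constant.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Position = Tuple[float, float]      # (x across the bed, y along the bed)
Observation = List[Position]        # toy stand-in for camera pixels
Policy = Callable[[int, Observation], Position]

GAIN = 0.3  # assumed step size for the toy policy


@dataclass
class Robot:
    name: str
    position: Position


def toy_policy(me: int, obs: Observation) -> Position:
    """Hypothetical rule: line up with the observed partners along the bed."""
    _, my_y = obs[me]
    partner_ys = [y for i, (_, y) in enumerate(obs) if i != me]
    target_y = sum(partner_ys) / len(partner_ys)
    return (0.0, GAIN * (target_y - my_y))  # stay on own side, move along y


def step(robots: List[Robot], policy: Policy) -> None:
    # Every robot sees the same world snapshot and chooses its own action.
    # No plan, message, or assignment is passed between robots.
    snapshot = [r.position for r in robots]
    actions = [policy(i, snapshot) for i in range(len(robots))]
    for robot, (dx, dy) in zip(robots, actions):
        x, y = robot.position
        robot.position = (x + dx, y + dy)


def run(steps: int) -> List[Robot]:
    robots = [Robot("left", (-1.0, 0.0)), Robot("right", (1.0, 2.0))]
    for _ in range(steps):
        step(robots, toy_policy)
    return robots


if __name__ == "__main__":
    for robot in run(20):
        print(robot)
```

Both robots run the same function and end up aligned along the bed purely because each reacts to where the other is. The real system replaces the hand-written rule with a learned policy operating on images, which is a far harder problem than this sketch suggests.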
In principle, the same approach could extend to the following, although the demo itself shows only two identical robots:
- More than two robots (each reads N−1 partners' motions)
- Heterogeneous teams (different robot types in the same scene)
- Human-robot collaboration (robots infer human intent from motion)
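The "N−1 partners" figure implies a simple count: each robot's load grows linearly with team size, while the total number of observer-partner relationships across the team grows quadratically. The helper below is a hypothetical illustration of that arithmetic only; it says nothing about how well a learned policy would actually perform with larger teams.

```python
def partner_observations(n_robots: int) -> int:
    """Total directed (observer, partner) pairs when each robot watches all others."""
    if n_robots < 1:
        raise ValueError("n_robots must be at least 1")
    return n_robots * (n_robots - 1)


for n in (2, 3, 10):
    print(f"{n} robots: {n - 1} partners each, {partner_observations(n)} pairs in total")
```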
Why this matters: from isolated skills to shared goals
Figure's broader vision:
"Most useful work in the real world happens in shared spaces: homes, warehouses, factories, and other environments where people, objects, and other robots are constantly moving. That means robots of the future will need more than isolated skills. They will need to act in scenes shaped by other agents; watching what others are doing, reacting in real time, and depending on each other's actions to make progress toward a shared goal."
This demonstration is a proof of concept for that future:
- Shared spaces: a bedroom with movable obstacles (furniture, clothes, trash)
- Multiple agents: two humanoids, plus implicitly the human who left the room messy
- Real-time reaction: each robot updates its predictions tens of times per second
- Shared goal: a clean bedroom requires both robots to contribute
Figure contrasts this with its February 2025 demonstration (two robots putting away groceries), calling the bedroom demo a "major step" in complexity and integration.
System continuity: same architecture, more data
Figure emphasizes that Helix-02's core algorithm is unchanged. The system that learned:
- Logistics tasks (picking, packing, sorting)
- Laundry folding (deformable object manipulation)
- Kitchen cleanup (rigid objects, articulated drawers)
- Living room tidying (whole-room navigation and manipulation)
...now performs collaborative bedroom reset by adding new data, not by redesigning the architecture.
This is a data-scaling story: the more diverse tasks Helix sees during training, the broader its generalization at deployment.
Contrast with task-specific controllers:
- Traditional robot systems require separate controllers for each task (door opening, fabric manipulation, foot-pedal operation)
- Helix uses a single learned Vision-Language-Action policy that handles all behaviors
Trade-off:
- Pro: Easier to add new behaviors (just collect data and retrain)
- Con: Debugging failures is harder (no explicit "door-opening module" to inspect)
What "Vision-Language-Action" means in this context
Figure describes the system as a Vision-Language-Action (VLA) policy:
- Vision: RGB camera streams from each robot's onboard cameras
- Language: Task specification (e.g., "reset the bedroom") and object labels inferred from scene
- Action: Motor commands for locomotion (walking, balancing) and manipulation (grasping, pushing, draping)
The policy is end-to-end learned: pixels and language → actions, with no hand-designed perception modules or hard-coded manipulation primitives.
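The pixels-and-language-to-actions contract can be written down as a small interface. This is a sketch under stated assumptions: `Observation`, `Action`, `VLAPolicy`, `ZeroPolicy`, and `control_step` are hypothetical names invented for illustration and do not come from Figure's post or any published Helix API.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Observation:
    images: Sequence[bytes]   # placeholder for onboard RGB frames
    instruction: str          # task specification, e.g. "reset the bedroom"


@dataclass(frozen=True)
class Action:
    joint_targets: Sequence[float]  # placeholder for whole-body motor commands


class VLAPolicy(Protocol):
    """Anything that maps an observation directly to an action."""

    def act(self, obs: Observation) -> Action: ...


class ZeroPolicy:
    """Placeholder that returns a zero command; a real VLA would run a network here."""

    def __init__(self, n_joints: int) -> None:
        self.n_joints = n_joints

    def act(self, obs: Observation) -> Action:
        return Action(joint_targets=[0.0] * self.n_joints)


def control_step(policy: VLAPolicy, obs: Observation) -> Action:
    # One call from raw inputs to motor command, with no separate
    # perception module or manipulation primitive in between.
    return policy.act(obs)


obs = Observation(images=[b""], instruction="reset the bedroom")
print(control_step(ZeroPolicy(n_joints=3), obs))
```

The point of the single `act` method is that locomotion and manipulation share one output, so there is no seam where a hand-designed module would hand off to another.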
Related systems in the VLA family:
- Google's RT-2 (vision-language-action for tabletop manipulation)
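- OpenAI's VPT (Video PreTraining: a policy that learns to act in Minecraft from video; it is not language-conditioned, so it is a close relative rather than a strict VLA)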
- Figure's earlier Helix model (announced February 2025)
The bedroom demo extends VLA to multi-robot collaboration without adding explicit communication channels: each robot infers its partner's intent from the same visual stream it uses for everything else.
Developer and researcher reaction
Early responses from robotics researchers on X and LinkedIn:
Technical praise:
- "First VLA-based multi-humanoid locomanipulation demo I've seen. The bed-making without message passing is wild." — robotics PhD students
- "Deformable object coordination is way harder than rigid task allocation. Fabric has infinite DOF and no canonical grasp." — manipulation researchers
Comparison to competitors:
- Tesla Optimus demos show single-robot folding and sorting; Figure says it is the first to show multi-robot coordination on deformables
- Boston Dynamics' Atlas does impressive whole-body dynamics but hasn't shown collaborative manipulation with a second humanoid
Skepticism about generality:
- "Two minutes for a bedroom—how much was cherry-picked? Show me 100 runs with failure stats." — ML safety researchers
- "No message passing is cool in principle, but does it scale to 10 robots? Quadratic visual attention gets expensive." — multi-agent systems folks
Hiring signal:
- Figure closes the post with "If you want to help build it, we're hiring", which suggests the team is still growing
What this unlocks for real-world deployment
If the system generalizes beyond this demo, it enables:
| Application | Multi-robot coordination need |
|---|---|
| Warehouse logistics | Multiple robots sort, pack, and load—sharing conveyors, carts, and pallets |
| Hospital orderly tasks | Robots change beds, restock supplies, transport patients—without blocking each other in hallways |
| Disaster response | Humanoids clear debris, stabilize structures, extract victims—coordinating around unstable objects |
| Home assistance | Robots cook, clean, and organize—one chops vegetables while another loads dishwasher |
| Factory assembly | Humanoids hand off parts, hold workpieces steady, and operate tools together |
These applications share some or all of the following needs:
- Shared deformable objects (fabric, cables, debris)
- Dynamic repositioning (robots move around each other)
- Intent inference (predict partner actions without explicit communication)
The Helix-02 bedroom demo is a vertical slice of these capabilities.
Limitations and open questions
Figure's post does not address:
- Success rate: Is this one successful run or median performance across many trials?
- Scene variation: Does it work in different bedrooms with different furniture layouts, bedding types, and clutter distributions?
- Failure modes: What happens when one robot drops the comforter mid-fold? Does the partner recover or does the task fail?
- Scaling: Does intent inference from motion work with 3+ robots? What about 10 robots in a warehouse?
- Human in the loop: Can a human and robot coordinate on the bed-making task using the same policy, or does it require robot-robot pairing?
- Training data: How many hours of teleoperation or simulation rollouts were needed to learn these behaviors?
These are standard research-to-product gaps. Expect Figure to release more technical details if they submit to a robotics conference (ICRA, CoRL, RSS).
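The success-rate question can be made concrete with a confidence interval on a binomial proportion. The sketch below uses the Wilson score interval, a standard choice for small samples; the trial counts are hypothetical examples, not data from Figure.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a success rate (z = 1.96 gives about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


# Hypothetical outcomes, for illustration only.
for successes, trials in [(1, 1), (9, 10), (90, 100)]:
    low, high = wilson_interval(successes, trials)
    print(f"{successes}/{trials}: true rate plausibly between {low:.2f} and {high:.2f}")
```

A single successful video is compatible with a true success rate anywhere from about 21% to 100%, which is why reviewers ask for many runs with failure statistics.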
What to watch for next
Based on Figure's trajectory and this demo:
- Fleet deployment: Will Figure show multiple Helix-02 units operating in a real warehouse or hospital?
- Human-robot collaboration: Can Helix infer human intent from motion and coordinate accordingly (e.g., human and robot fold laundry together)?
- Failure recovery: Demos of the system handling failures gracefully—dropped objects, blocked paths, unexpected obstacles
- Open-source components: Figure previously open-sourced some simulation tools; will they release datasets or policy checkpoints for researchers?
- Commercial partnerships: Announcements of pilot deployments with logistics, healthcare, or manufacturing customers
The bedroom demo is a research milestone; the next stage is proving it works at scale in production.
Bottom line
Figure's Helix-02 bedroom demonstration is the first public showing of a single learned neural network coordinating two humanoid robots on whole-room locomanipulation with deformable object collaboration—all without central planners or message passing. The technical claim is that each robot infers its partner's intent from motion alone, the way two humans coordinate when folding a sheet.
If the system generalizes beyond this demo, it represents a major step toward multi-robot deployments in shared human environments—warehouses, hospitals, homes, and factories where robots must react to each other and work together without explicit communication protocols.
The open questions are success rate, scene variation, failure recovery, and scaling to 3+ robots—standard research-to-product gaps that Figure will need to address before commercial deployment.
Watch Figure's careers page if you want to help build the next generation of collaborative humanoid systems.
Sources
- Primary: Figure AI: Helix-02 Bedroom Tidy (May 8, 2026)
- Company: figure.ai
- Careers: figure.ai/careers
- Previous demo: Figure robots coordinate on groceries (February 2025, referenced in the post)
This article is an independent technical summary for developers and researchers on explainx.ai and is not sponsored by or affiliated with Figure AI. Technical claims and performance characteristics are based on Figure's May 8, 2026 announcement; verify with Figure's official documentation and published research before citing in production planning or academic work.