What is Figure Helix-02?

Helix-02 is the learned Vision-Language-Action (VLA) system that controls Figure AI's humanoid robots. In the May 8, 2026 demonstration, two robots running Helix-02 coordinated to reset a bedroom in under two minutes, performing tasks like opening doors, hanging clothes, and making a bed together.

How do the two robots coordinate without a central planner?

Each Helix-02 robot reads the room through its own cameras and infers its partner's intent from motion alone—similar to how two people coordinate when folding a sheet. There is no shared planner, no message passing, and no central coordinator; they run a single learned Vision-Language-Action system independently.

What makes collaborative bed-making difficult for robots?

Bed-making combines three compound challenges: two humanoids redefining each other's problems in real time, manipulating deformable fabric with no fixed pose or natural 'your half/my half' seam, and completing the entire sequence in two minutes—requiring thousands of consecutive correct decisions at policy rate.

What new behaviors did Helix-02 learn?

With no changes to the core algorithm, Helix-02 learned to: open doors with whole-body coordination, push furniture using stance and balance, drape clothing onto narrow fixtures, place objects with in-hand reorientation, close books with dexterous bimanual control, operate trash can foot pedals while balancing on one leg, and coordinate two humanoids around shared deformable objects.

Is this the first demonstration of multi-humanoid collaborative manipulation?

According to Figure's announcement, this is the first demonstration of a single learned neural network performing multi-humanoid collaborative locomanipulation directly from pixels to actions. Figure previously showed two robots coordinating on groceries in February 2025, but this bedroom demo represents a major step forward in complexity and integration.

How does this differ from task-specific controllers?

Helix handles this setting without task-specific controllers. It is a single learned system that expands as Figure adds more data. The same approach that learned logistics, laundry, kitchen cleanup, and living room tidying now performs collaborative bedroom reset—demonstrating the generality of the underlying architecture.

Figure Helix-02: two humanoid robots collaborate to tidy | explainx.ai Blog

Figure AI published Helix-02 Bedroom Tidy on May 8, 2026: two Helix-02 humanoid robots reset a bedroom in under two minutes, running a single learned Vision-Language-Action policy that coordinates whole-room locomanipulation—opening doors, hanging clothes, pushing furniture, taking out trash, and working together to make a bed. Each robot infers its partner's intent from motion alone, with no shared planner, no message passing, and no central coordinator.

Figure calls this the first demonstration of a single learned neural network performing multi-humanoid collaborative locomanipulation, directly from pixels to actions.

Primary source

Figure AI: Helix-02 Bedroom Tidy (official announcement, May 8, 2026)

TL;DR

Topic	Takeaway
What it is	Two Helix-02 humanoids coordinate to reset a bedroom in <2 minutes, running a single Vision-Language-Action policy with no central planner
Key behaviors	Open doors, hang clothes, push furniture, operate foot pedals while balancing, make a bed together with deformable fabric manipulation
Coordination model	Each robot reads the scene through its own cameras and infers partner intent from motion—like two humans folding a sheet, no message passing
Technical claim	First single learned neural network for multi-humanoid collaborative locomanipulation from pixels to actions
System continuity	Same underlying approach that learned logistics, laundry, kitchen tasks—no algorithm changes, just new data
Watch video	Figure AI news page

The full task list: whole-room locomanipulation

In the demonstration video, the two Helix-02 robots perform:

Open a door with whole-body coordination — localize lever handle, depress it, pull door inward while maintaining balance, reposition body as door swings
Push an office chair under a desk — grasp with both hands, generate controlled forces through foot placement and body posture
Hang a garment on a coat tree — carry clothing across room, drape onto narrow fixture with both hands, manage fabric that can fold over itself
Place headphones on a vertical stand — pick up, reorient mid-air, seat headband over narrow stand
Close an open book — pick up, flip cover closed, handle hinged object with flexing pages and shifting mass
Operate trash can foot pedal — pick up trash, shift weight to one leg, depress pedal with opposite foot to open lid, drop item in, use foot as end-effector while balancing dynamically
Coordinate two humanoids around a bed — take complementary positions on opposite sides, act on the same large deformable object without interfering
Manipulate bedding with bimanual whole-body motions — lift, unfurl, spread, fold, smooth comforter, correct wrinkles and bunched edges as fabric settles

All behaviors run from the same learned system—no scripted handoffs between subtasks.

Why collaborative bed-making is hard: three compounding challenges

Figure highlights three layers of difficulty that interact:

1. Two humanoids = more than two single-robot problems in parallel

Every action one robot takes redefines the problem the other is solving. Each robot reads its partner's intent from motion alone, in real time, while its own actions simultaneously change what the partner sees.

This is fundamentally different from running two independent robots in separate spaces.

2. The central object is deformable

The comforter has no fixed pose, no rigid geometry, no canonical grasp. There is no natural seam between "your half" and "mine."

Each robot must:

Commit to a contact point while predicting what the other will do
Update both predictions tens of times per second as fabric folds, drapes, and slides under shared tension

3. The whole sequence runs in two minutes

This bedroom reset requires whole-room locomanipulation: the robot walks naturally between locations, balances dynamically on one leg, and switches between rigid, deformable, articulated, and collaborative manipulation—without scripted handoffs.

At policy rate, that's thousands of consecutive correct decisions, every one conditioned on a fast-moving scene that includes a second humanoid acting under the same constraints.
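To make the "thousands of decisions" figure concrete, here is a rough back-of-envelope sketch. The control rates and the per-step reliability are illustrative assumptions, not numbers from Figure's post, which only says the robots update "tens of times per second". Treating steps as independent is also a simplification, since a learned policy can often recover from a poor step.

```python
# Back-of-envelope: how many policy decisions fit in a two-minute run?
# The rates and reliability below are assumed for illustration only.

EPISODE_SECONDS = 120  # "under two minutes" from the announcement


def decisions_per_episode(policy_hz: float, seconds: float = EPISODE_SECONDS) -> int:
    """Number of policy steps one robot takes at a given control rate."""
    return int(policy_hz * seconds)


def run_success_probability(per_step_success: float, steps: int) -> float:
    """Chance that every step is acceptable, assuming independent steps."""
    return per_step_success ** steps


for hz in (10, 30, 50):  # hypothetical rates in the "tens of Hz" range
    steps = decisions_per_episode(hz)
    p_run = run_success_probability(0.9999, steps)  # assumed per-step reliability
    print(f"{hz:>2} Hz -> {steps} steps per robot, P(no bad step) = {p_run:.2f}")
```

Under these assumptions, even a 99.99% per-step reliability leaves a noticeable chance of at least one bad step in a run, which is why long-horizon tasks are demanding.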

New behaviors Helix-02 learned (just by adding data)

Figure emphasizes that no changes to the core algorithm were required. The same system that learned previous tasks now performs:

Behavior	Technical challenge
Open doors with whole-body coordination	Localize handle, apply force while maintaining balance, reposition as door moves
Push furniture using stance and balance	Generate controlled forces through foot placement and body posture rather than arm motion alone
Drape clothing onto narrow fixtures	Manage fabric that can fold over itself and obscure contact points
Place objects with in-hand reorientation	Reorient mid-air and seat onto narrow vertical stand
Close a book with dexterous bimanual control	Handle hinged object whose pages flex and mass shifts as it folds
Operate trash can foot pedal with single-leg balance	Use foot as end-effector while balancing dynamically
Coordinate two humanoids around a shared object	Take complementary positions and act without interfering
Manipulate bedding with bimanual whole-body motions	Lift, unfurl, spread, fold, smooth fabric; correct wrinkles as fabric settles

Each capability demonstrates the integration of locomotion, dexterity, and sensing from a single learned policy.

No central planner: how the robots infer intent from motion

Traditional multi-robot systems use:

Shared planners that assign tasks to each robot
Message passing to communicate state and intent
Central coordinators that orchestrate actions

Helix-02 uses none of these.

Instead, each robot:

Reads the room through its own cameras
Infers its partner's intent from the partner's visible motion
Makes independent decisions conditioned on the full scene (including the other robot)

Figure compares this to how two people fold a sheet: you watch your partner's hands, predict where they'll move next, and adjust your own actions accordingly—without verbal instructions or a shared plan.
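The sheet-folding analogy can be sketched as a toy simulation. This is a minimal illustration under invented assumptions, not Figure's method: the robots are reduced to points moving along one axis, `shared_policy` is a hand-written rule standing in for a learned network, and names such as `Observation` and `simulate` are hypothetical. What it does show is the structure described above: the same policy runs on both robots, each decides from its own observation only, and the partner's intent is estimated from its visible motion.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What one robot 'sees': its own position and the partner's last two positions."""
    own_x: float
    partner_x: float
    partner_prev_x: float


def shared_policy(obs: Observation, goal_x: float = 10.0, step: float = 1.0) -> float:
    """The same function runs on both robots; returns a displacement along the bed."""
    partner_velocity = obs.partner_x - obs.partner_prev_x  # intent inferred from motion
    predicted_partner_x = obs.partner_x + partner_velocity
    if obs.own_x >= goal_x:
        return 0.0
    # Do not pull the sheet ahead of where the partner is expected to be.
    if obs.own_x > predicted_partner_x + 1.0:
        return 0.0
    return min(step, goal_x - obs.own_x)


def simulate(start_a: float, start_b: float, ticks: int = 30) -> tuple[float, float]:
    a, b = start_a, start_b
    prev_a, prev_b = a, b
    for _ in range(ticks):
        # Each robot decides from its own observation; no messages are exchanged.
        move_a = shared_policy(Observation(a, b, prev_b))
        move_b = shared_policy(Observation(b, a, prev_a))
        prev_a, prev_b = a, b
        a, b = a + move_a, b + move_b
    return a, b


print(simulate(start_a=3.0, start_b=0.0))
```

In this run the robot that starts ahead waits on the first tick because its partner is stationary, then resumes once it sees the partner moving. Both end at the goal without any shared plan.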

In principle, the same approach could extend to the following, though the demo shows none of them:

More than two robots (each reads N−1 partners' motions)
Heterogeneous teams (different robot types in the same scene)
Human-robot collaboration (robots infer human intent from motion)
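The N−1 figure above implies that the number of robot-to-robot observation relationships grows quadratically with team size. The short sketch below just counts them; the function names are hypothetical. Whether onboard compute actually grows with N depends on the architecture, since a camera frame has a fixed pixel count regardless of how many partners appear in it.

```python
def partners_per_robot(n_robots: int) -> int:
    """How many partners each robot must track when nothing is communicated."""
    return max(n_robots - 1, 0)


def directed_observation_links(n_robots: int) -> int:
    """Total 'robot i watches robot j' relationships across the team."""
    return n_robots * partners_per_robot(n_robots)


for n in (2, 3, 10):
    print(f"{n} robots: {partners_per_robot(n)} partners each, "
          f"{directed_observation_links(n)} observation links in total")
```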

Why this matters: from isolated skills to shared goals

Figure's broader vision:

"Most useful work in the real world happens in shared spaces: homes, warehouses, factories, and other environments where people, objects, and other robots are constantly moving. That means robots of the future will need more than isolated skills. They will need to act in scenes shaped by other agents; watching what others are doing, reacting in real time, and depending on each other's actions to make progress toward a shared goal."

This demonstration is proof-of-concept for that future:

Shared spaces — bedroom with dynamic obstacles (furniture, clothes, trash)
Multiple agents — two humanoids plus implicit human who left the room messy
Real-time reaction — each robot updates predictions tens of times per second
Shared goal — "bedroom is clean" requires both robots to contribute

Figure contrasts this with their February 2025 demonstration (two robots putting away groceries), calling the bedroom demo a "major step" in complexity and integration.

System continuity: same architecture, more data

Figure emphasizes that Helix-02's core algorithm is unchanged. The system that learned:

Logistics tasks (picking, packing, sorting)
Laundry folding (deformable object manipulation)
Kitchen cleanup (rigid objects, articulated drawers)
Living room tidying (whole-room navigation and manipulation)

...now performs collaborative bedroom reset by adding new data, not by redesigning the architecture.

This is a data-scaling story: the more diverse tasks Helix sees during training, the broader its generalization at deployment.

Contrast with task-specific controllers:

Traditional robot systems require separate controllers for each task (door opening, fabric manipulation, foot-pedal operation)
Helix uses a single learned Vision-Language-Action policy that handles all behaviors

Trade-off:

Pro: Easier to add new behaviors (just collect data and retrain)
Con: Debugging failures is harder (no explicit "door-opening module" to inspect)

What "Vision-Language-Action" means in this context

Figure describes the system as a Vision-Language-Action (VLA) policy:

Vision: RGB camera streams from each robot's onboard cameras
Language: Task specification (e.g., "reset the bedroom") and object labels inferred from scene
Action: Motor commands for locomotion (walking, balancing) and manipulation (grasping, pushing, draping)

The policy is end-to-end learned: pixels and language → actions, with no hand-designed perception modules or hard-coded manipulation primitives.
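As a rough sketch of what "pixels and language → actions" means at the interface level, the types below describe one control step. Everything here is an assumption made for illustration: Figure has not published this interface, and `VLAObservation`, `VLAAction`, `VLAPolicy`, and `ZeroPolicy` are hypothetical names. The placeholder policy returns zeros where a real system would run a neural network.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class VLAObservation:
    camera_frames: Sequence[bytes]  # raw RGB frames from onboard cameras (format assumed)
    instruction: str                # natural-language task, e.g. "reset the bedroom"


@dataclass(frozen=True)
class VLAAction:
    joint_targets: tuple[float, ...]  # one target per actuated joint (count assumed)


class VLAPolicy(Protocol):
    def act(self, obs: VLAObservation) -> VLAAction: ...


class ZeroPolicy:
    """Placeholder that satisfies the interface; a real policy would be a neural network."""

    def __init__(self, n_joints: int) -> None:
        self.n_joints = n_joints

    def act(self, obs: VLAObservation) -> VLAAction:
        return VLAAction(joint_targets=(0.0,) * self.n_joints)


def control_step(policy: VLAPolicy, obs: VLAObservation) -> VLAAction:
    # No perception module or task-specific primitive sits between input and output.
    return policy.act(obs)


obs = VLAObservation(camera_frames=[b"\x00" * 12], instruction="reset the bedroom")
action = control_step(ZeroPolicy(n_joints=4), obs)
print(len(action.joint_targets))
```

The point of the sketch is the shape of the contract: one call takes images plus text and returns motor targets, so door opening, draping, and pedal operation all pass through the same `act` method.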

Related systems in the VLA family:

Google's RT-2 (vision-language-action for tabletop manipulation)
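OpenAI's VPT (Video PreTraining: actions learned from Minecraft gameplay video; not language-conditioned, so a looser relative than a true VLA)
Figure's original Helix (announced February 2025, the predecessor to Helix-02)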

The bedroom demo extends VLA to multi-robot collaboration without adding explicit communication channels: each robot infers its partner's intent from what it sees of the partner's motion, not from exchanged messages.

Developer and researcher reaction

Early responses from robotics researchers on X and LinkedIn:

Technical praise:

"First VLA-based multi-humanoid locomanipulation demo I've seen. The bed-making without message passing is wild." — robotics PhD students
"Deformable object coordination is way harder than rigid task allocation. Fabric has infinite DOF and no canonical grasp." — manipulation researchers

Comparison to competitors:

Tesla Optimus demos show single-robot folding and sorting; Figure appears to be the first to show multi-robot coordination on deformables
Boston Dynamics' Atlas does impressive whole-body dynamics but hasn't shown collaborative manipulation with a second humanoid

Skepticism about generality:

"Two minutes for a bedroom—how much was cherry-picked? Show me 100 runs with failure stats." — ML safety researchers
"No message passing is cool in principle, but does it scale to 10 robots? Quadratic visual attention gets expensive." — multi-agent systems folks

Hiring signal:

Figure closes the post with "If you want to help build it, we're hiring," which suggests the team is still expanding

What this unlocks for real-world deployment

If the system generalizes beyond this demo, it enables:

Application	Multi-robot coordination need
Warehouse logistics	Multiple robots sort, pack, and load—sharing conveyors, carts, and pallets
Hospital orderly tasks	Robots change beds, restock supplies, transport patients—without blocking each other in hallways
Disaster response	Humanoids clear debris, stabilize structures, extract victims—coordinating around unstable objects
Home assistance	Robots cook, clean, and organize—one chops vegetables while another loads dishwasher
Factory assembly	Humanoids hand off parts, hold workpieces steady, and operate tools together

These scenarios draw on some combination of:

Shared deformable objects (fabric, cables, debris)
Dynamic repositioning (robots move around each other)
Intent inference (predict partner actions without explicit communication)

The Helix-02 bedroom demo is a vertical slice of these capabilities.

Limitations and open questions

Figure's post does not address:

Success rate: Is this one successful run or median performance across many trials?
Scene variation: Does it work in different bedrooms with different furniture layouts, bedding types, and clutter distributions?
Failure modes: What happens when one robot drops the comforter mid-fold? Does the partner recover or does the task fail?
Scaling: Does intent inference from motion work with 3+ robots? What about 10 robots in a warehouse?
Human in the loop: Can a human and robot coordinate on the bed-making task using the same policy, or does it require robot-robot pairing?
Training data: How many hours of teleoperation or simulation rollouts were needed to learn these behaviors?
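The success-rate question can be made quantitative. The sketch below computes a Wilson score interval for a success rate, a standard way to show how little a single successful run says about reliability. The trial counts are hypothetical examples, not data from Figure.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical outcomes: one published run versus a 100-run evaluation.
for successes, trials in ((1, 1), (90, 100)):
    low, high = wilson_interval(successes, trials)
    print(f"{successes}/{trials} successes -> 95% interval [{low:.2f}, {high:.2f}]")
```

A single success is consistent with a true success rate anywhere from roughly one in five up to perfect, which is why reviewers ask for many runs with failure statistics.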

These are standard research-to-product gaps. Expect Figure to release more technical details if they submit to a robotics conference (ICRA, CoRL, RSS).

What to watch for next

Based on Figure's trajectory and this demo:

Fleet deployment: Will Figure show multiple Helix-02 units operating in a real warehouse or hospital?
Human-robot collaboration: Can Helix infer human intent from motion and coordinate accordingly (e.g., human and robot fold laundry together)?
Failure recovery: Demos of the system handling failures gracefully—dropped objects, blocked paths, unexpected obstacles
Open-source components: Will Figure release datasets, simulation tools, or policy checkpoints for researchers?
Commercial partnerships: Announcements of pilot deployments with logistics, healthcare, or manufacturing customers

The bedroom demo is a research milestone; the next stage is proving it works at scale in production.

Related on explainx.ai

Tau Robotics — $30/hr humanoid cleaning in SF
What are agent skills? Complete guide — portable task descriptions for AI agents
Claude Managed Agents Dreaming — multiagent orchestration in software
AI benchmarks: complete guide — evaluating AI system performance
Terminal Bench 2.0 — agent evaluation frameworks
Stanford AI Index 2026 — industry trends and metrics

Bottom line

By Figure's account, the Helix-02 bedroom demonstration is the first public showing of a single learned neural network coordinating two humanoid robots on whole-room locomanipulation with deformable object collaboration, all without central planners or message passing. The technical claim is that each robot infers its partner's intent from motion alone, the way two humans coordinate when folding a sheet.

If the system generalizes beyond this demo, it represents a major step toward multi-robot deployments in shared human environments—warehouses, hospitals, homes, and factories where robots must react to each other and work together without explicit communication protocols.

The open questions are success rate, scene variation, failure recovery, and scaling to 3+ robots—standard research-to-product gaps that Figure will need to address before commercial deployment.

Watch Figure's careers page if you want to help build the next generation of collaborative humanoid systems.

Sources

Primary: Figure AI: Helix-02 Bedroom Tidy (May 8, 2026)
Company: figure.ai
Careers: figure.ai/careers
Previous demo: Figure robots coordinate on groceries (February 2025, referenced in the post)

This article is an independent technical summary for developers and researchers on explainx.ai and is not sponsored by or affiliated with Figure AI. Technical claims and performance characteristics are based on Figure's May 8, 2026 announcement; verify with Figure's official documentation and published research before citing in production planning or academic work.
