2026-06-12 (updated 2026-06-12)

Learning to Assist: Collaborative VLAs for Implicit Human-Robot Collaboration

This work shows that end-to-end imitation learning with vision-language-action (VLA) models can support collaborative manipulation. It identifies demonstration action leakage as a failure mode causing premature assistive behavior, and proposes an inference-time steering method. A 16-participant user study on a long-horizon assembly task demonstrates that steering enables longer execution horizons, faster collaboration, and fewer failures.

Source: arXiv Robotics
Authors: Leo Xu, Letian Li, Alex Cuellar, Michael Hagenow

[Submitted on 10 Jun 2026]

Abstract: Human-robot collaboration (HRC) combines the complementary strengths of humans and robots to improve task efficiency. However, many existing collaborative systems rely on hand-engineered pipelines, limiting their scalability and flexibility for new tasks. In this work, we show that models trained end-to-end with imitation learning, specifically vision-language-action (VLA) models, can support collaborative manipulation, and characterize the key factors affecting their real-world performance. We evaluate two state-of-the-art models and identify a failure mode of action-chunking policies in implicit HRC, where demonstration action leakage (i.e., action chunks crossing latent task transitions) can cause premature assistive behavior. We find that this issue increases with longer execution horizons and occurs in real-world collaborative VLA systems, such as when a robot attempts to hand over a tool before the person is ready. We propose an inference-time steering method to mitigate these erroneous assistive actions while preserving policy performance. Finally, through a 16-participant user study on a long-horizon collaborative assembly task, we show that steering enables a longer execution horizon while mitigating premature assistance, leading to faster collaboration and fewer failures compared to a shorter-horizon policy.
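
As a rough illustration of the leakage idea in the abstract (action chunks crossing latent task transitions), the toy sketch below counts how many fixed-length chunks of a single demonstration straddle one transition. It is a counting exercise under stated assumptions, not the paper's method: the function name `chunks_crossing_transition`, the single known transition index, and the sliding-window chunking are all hypothetical choices made for illustration.

```python
def chunks_crossing_transition(num_steps, transition_step, horizon):
    """Return start indices of chunks that straddle the transition.

    Assumptions (hypothetical, for illustration only):
    - a demonstration is a sequence of `num_steps` actions,
    - steps before `transition_step` belong to one sub-task and steps
      at or after it belong to the next (e.g. the assistive action),
    - chunks are sliding windows covering steps [s, s + horizon).
    """
    starts = range(0, num_steps - horizon + 1)
    # A chunk straddles the transition when it contains at least one
    # pre-transition step (s < transition_step) and at least one
    # post-transition step (transition_step < s + horizon).
    return [s for s in starts if s < transition_step < s + horizon]


if __name__ == "__main__":
    for horizon in (1, 5, 10, 20):
        crossing = chunks_crossing_transition(100, 50, horizon)
        total = 100 - horizon + 1
        print(f"horizon={horizon:2d}: {len(crossing)} of {total} chunks cross")
```

In this toy setting the number of straddling chunks is `horizon - 1`, so longer chunks contain post-transition actions more often. That is one plausible reading of why the abstract reports the issue growing with longer execution horizons; the paper itself should be consulted for the actual analysis.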

Subjects: Robotics (cs.RO)

Cite as: arXiv:2606.12475 [cs.RO]

(or arXiv:2606.12475v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2606.12475

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Leo Xu
[v1] Wed, 10 Jun 2026 05:42:49 UTC (13,470 KB)
