2026-05-20 · Original · 2 min read · Updated: 2026-06-12

COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones

The scarcity of large-scale, high-quality demonstration data remains a bottleneck in scaling imitation learning for robotic manipulation. We present COBALT, a teleoperation platform designed to democratize robot learning at scale both in simulation and in the real world. By leveraging vectorized environments, our scalable, load-balanced infrastructure supports concurrent teleoperation by multiple users on a single GPU, yielding a significant reduction in teleoperation cost. Operators can connect from nearly anywhere on Earth using commonly available devices, including single or dual smartphones, VR headsets, 3D mice, and keyboards. An in-memory data cache and efficient video streaming keep control and rendering synchronous, sustaining dozens of concurrent users at 20 Hz with sub-100 ms end-to-end latency for up to 8 concurrent users per GPU. We also demonstrate stable operation supporting 256 simulated clients across 8 GPUs, underscoring the system's ability to scale across hardware and within individual servers. We perform a comprehensive user study showing that phone-based teleoperation performs comparably to or better than specialized hardware, enabling faster, more ergonomic data collection. To ensure data quality, COBALT logs a suite of real-time metrics to automatically filter suboptimal demonstrations. We further demonstrate that a structured user training curriculum significantly improves data collection quality. Guided by insights from our user study, we crowdsource the collection of a large-scale, high-quality pilot dataset with 7500+ demonstrations (50+ hours) collected with smartphones across nine countries over five days. We validate the dataset's quality by training state-of-the-art imitation learning algorithms. Please visit cobalt-teleop.github.io for more details.

Source: arXiv Robotics

Authors: Ayush Agarwal, Ansh Gandhi, Jeremy A. Collins, Omar Rayyan, Aryan Sarswat, Ranjani Koushik, Masoud Moghani, Ajay Mandlekar, Animesh Garg

Article intelligence

Engineers · Advanced

Key points

COBALT uses vectorized environments and load-balanced infrastructure to support concurrent teleoperation by multiple users on a single GPU, significantly reducing teleoperation cost.
Operators can connect from nearly anywhere using single or dual smartphones, VR headsets, 3D mice, or keyboards; control runs at 20 Hz with sub-100 ms end-to-end latency for up to 8 concurrent users per GPU.
A user study found phone-based teleoperation comparable or superior to specialized hardware, enabling faster and more ergonomic data collection.
A crowdsourced pilot dataset of 7,500+ demonstrations (50+ hours), collected with smartphones across nine countries over five days, was validated by training state-of-the-art imitation learning algorithms on it.
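
The first key point, several operators sharing one GPU through a vectorized environment, can be illustrated with a small sketch. The abstract does not describe COBALT's scheduler, so everything here is an assumption made for illustration: the class names `GpuWorker` and `SlotBalancer`, the least-loaded placement rule, and the idea that each user occupies one slot of a per-GPU vectorized environment.

```python
from dataclasses import dataclass, field


@dataclass
class GpuWorker:
    """One GPU running a vectorized environment with a fixed number of slots."""
    gpu_id: int
    num_slots: int
    users: dict = field(default_factory=dict)  # slot index -> user id

    def free_slot(self):
        """Return the lowest unoccupied slot index, or None if the worker is full."""
        for slot in range(self.num_slots):
            if slot not in self.users:
                return slot
        return None


class SlotBalancer:
    """Hypothetical least-loaded balancer; not COBALT's actual scheduler."""

    def __init__(self, num_gpus, slots_per_gpu):
        self.workers = [GpuWorker(g, slots_per_gpu) for g in range(num_gpus)]

    def connect(self, user_id):
        """Place a user on the least-loaded GPU and return (gpu_id, slot)."""
        # Fewest active users wins; ties go to the lowest GPU id.
        worker = min(self.workers, key=lambda w: (len(w.users), w.gpu_id))
        slot = worker.free_slot()
        if slot is None:
            raise RuntimeError("all environment slots are occupied")
        worker.users[slot] = user_id
        return worker.gpu_id, slot

    def disconnect(self, gpu_id, slot):
        """Free a slot so that a later user can reuse it."""
        self.workers[gpu_id].users.pop(slot, None)
```

With `SlotBalancer(num_gpus=2, slots_per_gpu=8)`, four successive `connect` calls land on `(0, 0)`, `(1, 0)`, `(0, 1)`, and `(1, 1)`, so load spreads across GPUs before any single environment fills up. A production system would also need to weigh network locality and session reconnects, which this sketch ignores.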

Why it matters

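Demonstration data is scarce and expensive to collect, which limits how far imitation learning for robotic manipulation can scale. COBALT lowers that cost by letting multiple operators share a single GPU through vectorized environments and by accepting commodity devices such as smartphones as controllers.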

Technical impact

May affect GPU provisioning and compute cost for teleoperation data collection, since several operators can share one GPU.
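
The figures quoted in the abstract allow a rough back-of-the-envelope capacity estimate. The sketch below is illustrative arithmetic only: the function names are made up for this example, the 8-users-per-GPU figure is the point up to which the abstract reports sub-100 ms latency (not a hard limit), and "7500+" and "50+ hours" are lower bounds, so the per-demonstration duration is only an approximate ratio.

```python
import math

CONTROL_RATE_HZ = 20            # control rate reported in the abstract
USERS_PER_GPU_LOW_LATENCY = 8   # sub-100 ms latency is reported up to this many users per GPU


def tick_budget_ms(rate_hz=CONTROL_RATE_HZ):
    """Time available per control step, in milliseconds."""
    return 1000.0 / rate_hz


def gpus_needed(concurrent_users, users_per_gpu=USERS_PER_GPU_LOW_LATENCY):
    """GPUs required to keep every user within the low-latency operating point."""
    return math.ceil(concurrent_users / users_per_gpu)


def mean_demo_seconds(total_hours, num_demos):
    """Average demonstration length implied by total hours and demo count."""
    return total_hours * 3600.0 / num_demos


if __name__ == "__main__":
    print(tick_budget_ms())              # 50.0 ms per step at 20 Hz
    print(gpus_needed(24))               # 3 GPUs for 24 low-latency users
    print(mean_demo_seconds(50, 7500))   # 24.0 s per demonstration (rough)
```

For comparison, the 256-simulated-client test across 8 GPUs corresponds to 32 clients per GPU, which the abstract describes as stable operation rather than as meeting the sub-100 ms latency figure.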

[2605.19138] COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones

[Submitted on 18 May 2026]


Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cite as: arXiv:2605.19138 [cs.RO]

(or arXiv:2605.19138v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2605.19138

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ayush Agarwal

[v1] Mon, 18 May 2026 21:37:32 UTC (6,921 KB)
