Cosmos 3: Omnimodal World Models for Physical AI
NVIDIA introduces Cosmos 3, a family of omnimodal world models that jointly process and generate language, image, video, audio, and action sequences using a unified mixture-of-transformers architecture. It achieves state-of-the-art results on understanding and generation tasks, and is released as open source under the OpenMDW-1.1 license.
[Submitted on 1 Jun 2026]
Authors: Aditi and 290 other authors (191 additional authors not shown)
Abstract: We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies the modalities critical for Physical AI, effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation shows that Cosmos 3 establishes a new state of the art across a diverse suite of understanding and generation tasks, demonstrating that omnimodal world models can serve as scalable, general-purpose backbones for embodied agents. At the time the technical report was written, our post-trained Cosmos 3 models were ranked the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License (this https URL) at this https URL and this https URL. The project website is available at this https URL.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Robotics (cs.RO)
Cite as: arXiv:2606.02800 [cs.CV]
(or arXiv:2606.02800v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2606.02800
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yin Cui
[v1] Mon, 1 Jun 2026 19:12:30 UTC (30,203 KB)