NVIDIA Research Advances Robotics From Simulation to the Real World
At ICRA, NVIDIA Research highlights eight papers on sim-to-real transfer, enabling robots to perceive, reason, plan, and act in dynamic environments. Methods like ScheduleStream, COMPASS, Grasp-MPC, SPARR, and SEAL improve coordination, navigation, grasping, assembly, and task execution, with significant gains in success rates and robustness.
Article intelligence
Key points
- NVIDIA presents 8 papers on sim-to-real transfer at ICRA
- Methods include multi-arm coordination, cross-robot navigation, novel object grasping, precision assembly, and vision-language-action models
- COMPASS achieves a 4.5x improvement in average success rate over an imitation learning baseline, with further gains reported for Grasp-MPC, SPARR, and SEAL
- Large-scale open datasets released to accelerate robotics research
Robotics is entering a new phase: moving from controlled demos and scripted automation toward generalizable, reliable embodied autonomy in the real world.
At the International Conference on Robotics and Automation (ICRA), eight of NVIDIA Research’s 28 accepted papers show how simulation-to-real transfer is becoming a foundation for that shift, helping robots perceive, reason, plan and act across dynamic, unpredictable environments.
Together, the papers span the full stack of challenges robot developers face: coordinating multiple arms in parallel, building policies that generalize across robot bodies, grasping novel objects in clutter, performing precise assembly and developing vision-language-action models that reason before they move.
The throughline is clear: sim-to-real is becoming a foundation for robots that can adapt, generalize, and operate with greater reliability outside the lab.
Coordinating Arms, Navigating Bodies, Grasping Objects
Picture a pharmaceutical lab run by robotic arms: picking up tubes, transferring liquids, mixing reagents — each step taking different amounts of time, all requiring careful coordination.
Traditional robot scheduling software handles those steps sequentially, one arm at a time.
ScheduleStream changes that by running computations on GPUs, letting multiple arms plan movements and operate in parallel. The result is a 3x speedup across multi-arm planning scenarios, on hardware like the NVIDIA Jetson edge AI platform. Code for the framework is available on GitHub.
https://blogs.nvidia.com/wp-content/uploads/2026/05/supplementary.mp4
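To see why parallel scheduling pays off when steps take different amounts of time, here is a minimal Python sketch. It is not ScheduleStream's algorithm or API: the `Task` class, the arm names and the durations are all hypothetical, and the scheduler is a simple greedy pass that starts each task as soon as its arm is free and its prerequisites are done.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    arm: str            # which arm performs the task (hypothetical labels)
    duration: float     # seconds, illustrative values only
    after: list = field(default_factory=list)  # names of prerequisite tasks

def sequential_makespan(tasks):
    """One task at a time: total time is the sum of all durations."""
    return sum(t.duration for t in tasks)

def parallel_makespan(tasks):
    """Start each task once its arm is idle and its prerequisites are finished.

    Assumes `tasks` is listed in a valid order (prerequisites come first).
    """
    arm_free = {}  # arm -> time at which it becomes idle
    finish = {}    # task name -> finish time
    for t in tasks:
        ready = max((finish[p] for p in t.after), default=0.0)
        start = max(ready, arm_free.get(t.arm, 0.0))
        finish[t.name] = start + t.duration
        arm_free[t.arm] = finish[t.name]
    return max(finish.values())

tasks = [
    Task("pick_tube", "left", 2.0),
    Task("pick_pipette", "right", 3.0),
    Task("transfer_liquid", "right", 4.0, after=["pick_tube", "pick_pipette"]),
    Task("fetch_reagent", "left", 5.0, after=["pick_tube"]),
    Task("mix", "left", 2.0, after=["transfer_liquid", "fetch_reagent"]),
]

print(sequential_makespan(tasks))  # 16.0
print(parallel_makespan(tasks))    # 9.0
```

In this toy lab workflow the two arms overlap the liquid transfer with the reagent fetch, so the whole job finishes in 9 seconds instead of 16. The numbers are made up for illustration and say nothing about the 3x figure reported for ScheduleStream.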
A robot that learns to navigate through a space — avoiding obstacles and finding its destination — usually learns to do it in one body. Put the same navigation software into a differently shaped robot and it often falls apart, because its parts all move differently.
The COMPASS policy framework solves this by first building the baseline navigation functionality using imitation learning and then using residual reinforcement learning in NVIDIA Isaac Lab to build specialists for diverse robot embodiments. Crucially, no real-world robot data is involved at any stage: everything is trained in Isaac Lab simulation.
Compared with an imitation learning baseline, COMPASS achieved a 4.5x improvement in average success rate. It also seamlessly transfers to real-world environments, demonstrating around 80% success across 20 real-world navigation trials on autonomous mobile robots and humanoids.
COMPASS is agent-friendly, with dedicated skills — and developers can connect the pipeline with NVIDIA Omniverse NuRec to post-train and validate robots in a digital twin of a novel environment before deployment.
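The base-plus-residual idea can be sketched in a few lines of Python. This is only an illustration of how a shared policy and an embodiment-specific correction might be combined, not COMPASS code: `base_policy`, `make_specialist`, the hand-written residuals and the speed limits are all assumptions, standing in for networks that would be learned in simulation.

```python
from typing import Callable, Tuple

Vec = Tuple[float, float]

def base_policy(goal_dir: Vec) -> Vec:
    """Stand-in for the shared base policy: command a velocity toward the goal."""
    return goal_dir

def make_specialist(residual: Callable[[Vec], Vec], max_speed: float) -> Callable[[Vec], Vec]:
    """Compose the base policy with an embodiment-specific residual correction."""
    def clamp(v: float) -> float:
        return max(-max_speed, min(max_speed, v))

    def policy(goal_dir: Vec) -> Vec:
        bx, by = base_policy(goal_dir)
        rx, ry = residual(goal_dir)
        # Final action = base action + residual, limited to what the body can do.
        return (clamp(bx + rx), clamp(by + ry))

    return policy

# Hypothetical residuals; in practice these would be learned with RL in simulation.
wheeled = make_specialist(lambda g: (0.0, -g[1]), max_speed=1.0)         # cancels sideways motion
humanoid = make_specialist(lambda g: (-0.5 * g[0], 0.0), max_speed=0.6)  # slows forward motion

goal = (1.0, 0.5)
print(wheeled(goal))   # (1.0, 0.0)
print(humanoid(goal))  # (0.5, 0.5)
```

The design point is that the base policy is shared and only the small residual differs per robot, which is one plausible reading of why a specialist can be built for each embodiment without relearning navigation from scratch.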
Most grasping systems identify the object, predict a grasp, plan a path, then execute. But the last few centimeters are where small errors matter most.
Grasp-MPC adaptively computes robotic grasps, continuously correcting the robot’s motion as it closes in on the object, rather than carrying out a fixed plan — the way a person grabs something by feeling rather than calculating every joint angle in advance.
To build the policy, the researchers generated 2 million simulated trajectories across 8,000 objects using annotations from the GraspGen dataset and motion planning data from cuRobo, a CUDA-accelerated library for robot motion generation.
After training on both successful and failed trajectories, Grasp-MPC learned to grasp novel objects in cluttered tabletops and shelves — achieving around 75% overall success on real robots, compared with a baseline of 41%.
https://blogs.nvidia.com/wp-content/uploads/2026/05/Sequential-Object-Grasping-2.mp4
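The difference between carrying out a fixed plan and continuously correcting can be shown with a toy Python example. This is a generic receding-horizon sketch on a grid, not the Grasp-MPC controller: the candidate moves, the distance-based cost and the `observe` disturbance are all hypothetical, and a learned cost or value function would replace plain distance in a real system.

```python
import math

# Candidate one-step moves on a grid (hypothetical action set).
CANDIDATE_STEPS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def move(pos, step):
    return (pos[0] + step[0], pos[1] + step[1])

def best_step(pos, target):
    """Pick the candidate move with the lowest cost (here, distance to target)."""
    return min(CANDIDATE_STEPS, key=lambda s: distance(move(pos, s), target))

def open_loop(start, observe, steps):
    """Plan every move from the first observation, then execute without looking."""
    target = observe(0)
    plan, pos = [], start
    for _ in range(steps):
        step = best_step(pos, target)
        plan.append(step)
        pos = move(pos, step)
    pos = start
    for step in plan:  # blind execution of the stored plan
        pos = move(pos, step)
    return pos

def closed_loop(start, observe, steps):
    """Re-observe the target and re-choose the move at every step."""
    pos = start
    for t in range(steps):
        pos = move(pos, best_step(pos, observe(t)))
    return pos

def observe(t):
    """Hypothetical disturbance: the object is nudged after the third step."""
    return (5, 5) if t < 3 else (5, 7)

final_target = observe(7)
print(distance(open_loop((0, 0), observe, 8), final_target))    # 2.0
print(distance(closed_loop((0, 0), observe, 8), final_target))  # 0.0
```

The fixed plan ends two cells away from the nudged object, while the closed-loop version reaches it, which is the intuition behind correcting motion during the final approach.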
Deformable Cluster Manipulation introduces a framework that tackles a parallel challenge: enabling systems to grasp not just one object, but a whole bundle of flexible, tangled material at once.
The framework was motivated by a real-world task: clearing a mass of tree branches that have grown over a power line, where there’s no single clean object to grab. The system uses its entire arm, not just the gripper: wrapping it around the branch cluster and sweeping it aside, the way someone might gather an armful of cables or push a tangle of brush out of the way.
The researchers built a tree generator using biological growth equations to create synthetic trees of many different shapes and sizes — then trained the system across thousands of them in NVIDIA Isaac open simulation frameworks.
The policy deploys to real branches zero shot. Beyond power lines, the researchers see potential in cable management, agricultural inspection and anywhere robots need to handle a tangle rather than a single graspable item.
Clearing tree branches in zero-shot sim-to-real deployment.
Assembling With Precision
Precise assembly — threading a nut onto a bolt, inserting a gear onto a gearshaft, pressing a peg into a hole — is notoriously hard to get right with simulation alone.
The real world is complex. Real surfaces aren’t perfectly smooth. Sensors don’t behave as specified. Tiny discrepancies that a simulator ignores can stop a robot in its tracks.
The SPARR method addresses this by splitting the job in two. A policy trained in Isaac Lab learns the general strategy for the assembly task in simulation. Then, on the actual hardware, a second layer learns to correct for whatever the simulator got wrong — using the robot’s own camera and without any human demonstrations or guidance.
SPARR improves success rates by 38% and reduces cycle time by around 30% compared with zero-shot sim-to-real baselines.
On National Institute of Standards and Technology (NIST) assembly tasks not seen during training, success improves by nearly 75% — approaching the results of methods that require a human in the loop.
The Refinery framework takes on the next layer of difficulty in assembly: tasks with multiple sequential steps, where how step one is finished determines whether step two is even possible. It’s like assembling furniture — leave a panel at the wrong angle, and the next fastener won’t go in.
By understanding how success varies across initial conditions and training across hundreds of simulated assembly scenarios, Refinery learns how to complete each step and leave each component in a position that sets up the next. It achieves 91% success in simulation, a mean improvement of nearly 11% over baselines, with comparable results in the real world. Its policies can also be chained to handle long, multi-part sequences.
Action Models That Keep Their Word
The PEEK pipeline helps robots see past the clutter. In a typical manipulation task, the robot’s camera picks up everything in the scene — but most of it is irrelevant noise.
One task demonstrated on the PEEK project page is “give the banana to NVIDIA founder and CEO Jensen Huang”: a photo of Huang sits on a table alongside a photo of Michael Jordan, a collection of unrelated objects and other distractors.
A human doing the task instantly focuses on the banana and the right photo; a standard robot policy has to process everything and often gets confused. PEEK solves this by having a vision language model read the task instruction and annotate the robot’s camera view accordingly: it draws a movement path and highlights the objects that matter, while fading out everything else.
The policy then acts on that annotated view rather than the raw scene. For a policy trained purely in simulation, adding PEEK produced a 41x real-world improvement in accuracy. For large VLA models and smaller policies, gains range from 2x to 3.5x. Because it works at the image level, PEEK integrates with any camera-based policy without modification.
https://blogs.nvidia.com/wp-content/uploads/2026/05/8x_small_cube_move_clutter_2x_1.mp4
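Working at the image level means the annotation step can be thought of as a plain image transform. The Python sketch below is a simplified illustration, not PEEK's implementation: the `annotate_view` function, the box format, the dimming factor and the path marker value are all assumptions, and the boxes and path are hard-coded here where a vision language model would supply them.

```python
def annotate_view(image, keep_boxes, path, dim_divisor=4, path_value=255):
    """Return a copy of `image` with irrelevant pixels dimmed and a path drawn.

    image:      list of rows of grayscale ints (0-255)
    keep_boxes: list of (row0, col0, row1, col1), inclusive, around relevant objects
    path:       list of (row, col) waypoints to mark
    """
    def inside(r, c):
        return any(r0 <= r <= r1 and c0 <= c <= c1 for r0, c0, r1, c1 in keep_boxes)

    # Keep pixels inside any box; fade everything else.
    out = [
        [px if inside(r, c) else px // dim_divisor for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]
    # Overlay the suggested movement path.
    for r, c in path:
        out[r][c] = path_value
    return out

image = [[200] * 6 for _ in range(4)]  # a flat 4x6 test image
view = annotate_view(image, keep_boxes=[(0, 0, 1, 1)], path=[(1, 2), (2, 3), (3, 4)])
for row in view:
    print(row)
```

Because the output is just another image of the same size, any policy that consumes camera frames could take `view` in place of `image`, which matches the article's point about integrating without modifying the policy.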
Do What You Say — a collaboration with researchers at Carnegie Mellon University, University of Utah and University of Sydney — addresses a specific failure mode that matters more as robots tackle longer, more complex tasks.
Give a robot an instruction like “store everything on this table inside the cabinet” or “prepare a Manhattan,” and it has to break that down into individual steps and execute them in sequence.
The problem is that the AI model can correctly reason through what it needs to do — and then execute something different.
The method, called SEAL, fixes this at runtime without any retraining: the robot generates several candidate action sequences, thinks through where each one would actually lead and picks the one whose predicted outcome matches what it said it would do. SEAL delivers up to 15% accuracy gains over prior work, with robustness against rephrased instructions, changed objects, scene clutter and shifted camera angles.
https://blogs.nvidia.com/wp-content/uploads/2026/05/rollout_put_the_red_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate_real_exec_43_success_length_359.mp4
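The generate, predict and select loop can be sketched as a generic best-of-N selection in Python. This is an assumption-laden illustration rather than SEAL itself: the article does not describe how outcomes are predicted or scored, so `predict_outcome` is a toy forward model, `agreement` is a simple count of satisfied goal conditions, and the scene and candidates are invented around the mug-and-pudding task named in the video above.

```python
def predict_outcome(scene, actions):
    """Toy forward model: apply (object, destination) moves to a scene dictionary."""
    result = dict(scene)
    for obj, dest in actions:
        result[obj] = dest
    return result

def agreement(outcome, stated_goal):
    """Count how many stated goal conditions hold in the predicted outcome."""
    return sum(outcome.get(obj) == place for obj, place in stated_goal.items())

def select_actions(scene, stated_goal, candidates):
    """Pick the candidate whose predicted outcome best matches the stated plan."""
    return max(candidates, key=lambda a: agreement(predict_outcome(scene, a), stated_goal))

scene = {"mug": "table", "pudding": "table"}
stated_goal = {"mug": "plate", "pudding": "right_of_plate"}  # what the model said it would do

candidates = [
    [("mug", "plate")],                                   # stops early
    [("mug", "plate"), ("pudding", "left_of_plate")],     # contradicts the stated plan
    [("mug", "plate"), ("pudding", "right_of_plate")],    # consistent with the stated plan
]

chosen = select_actions(scene, stated_goal, candidates)
print(chosen)  # [('mug', 'plate'), ('pudding', 'right_of_plate')]
```

Since the selection happens over already-generated candidates, nothing in the underlying model changes, which is consistent with the article's description of a fix applied at runtime without retraining.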
In addition to papers, NVIDIA is expanding robotics research infrastructure with large-scale open datasets. The NVIDIA Physical AI Dataset is the world’s largest open dataset for physical AI development, surpassing 15 million downloads, while NVIDIA Isaac GR00T X Embodiment Sim has become one of the most-downloaded robotics datasets.
Universities Accelerate Physical AI Research With NVIDIA Technologies
Robotics teams from universities such as Carnegie Mellon University (CMU), ETH Zurich, MIT and University of Texas at Austin are tapping NVIDIA technologies to move physical AI research from simulation to real-world systems — with nearly 50 accepted papers referencing NVIDIA-accelerated simulation, robot learning and compute.
Examples include a paper from CMU demonstrating a robotic control framework trained in NVIDIA Isaac Lab and MIT work on large language model-guided reinforcement learning powered by NVIDIA GPUs.
Explore NVIDIA Research’s physical AI work. Developers can get started with Isaac Lab and Isaac Sim.