2026-06-05 04:00 UTC | Updated: 2026-06-30 13:03 UTC

MoDex: A Diffusion Policy for Sequential Multi-Object Dexterous Grasping

MoDex is a diffusion-based policy that enables a dexterous hand to sequentially grasp multiple objects without releasing those already held. Conditioned on an opposition space and a point cloud, it uses only a subset of the hand's degrees of freedom per grasp and reserves the rest for later grasps. Two-stage training (imitation learning followed by RL fine-tuning) improves success rates in simulation and on real hardware.

Source: arXiv Robotics
Authors: Haofei Lu, Hongjia Liu, Yifei Dong, Florian T. Pokorny, Jens Lundell, Danica Kragic


[Submitted on 3 Jun 2026]


Abstract: This work addresses sequentially grasping multiple objects with a single dexterous hand without releasing those already held. Most dexterous grasping methods commit all of the hand's degrees of freedom to a single object, underutilizing its dexterity and leaving no redundancy for subsequent grasps. The proposed solution, MoDex, is a diffusion policy that predicts the next gripper pose directly from observations, conditioned on an opposition space and point cloud. The opposition space condition specifies which fingers participate in the current grasp, enabling the gripper to use only a subset of its available degrees of freedom while reserving the remaining degrees of freedom for subsequent grasps. To facilitate sim-to-real transfer, MoDex is trained in two stages: first through imitation learning on expert demonstrations, and subsequently through reinforcement learning fine-tuning, which consistently improves success rates over the pre-trained policy. We evaluate MoDex in simulation on a MuJoCo-based Franka Emika Panda robot equipped with an Allegro Hand and on the corresponding real-world hardware platform. Across both simulation and real-world experiments, MoDex achieves higher success rates than the evaluated learning-based baselines, improving performance by 2.92-17.92% and 6.67-17.78%, respectively. Project page: this https URL.
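The idea of restricting a grasp to a subset of fingers can be illustrated with a small sketch. This is not MoDex's implementation: the abstract does not describe how the opposition space is encoded, and in MoDex the condition is an input to the diffusion policy rather than a hard mask. The sketch below assumes a 16-DoF Allegro Hand laid out as 4 fingers with 4 joints each, and uses hypothetical helper names (`finger_mask`, `apply_masked_target`) to show how a per-finger selection could leave the joints of non-participating fingers untouched while a new pose is applied to the rest.

```python
# Hypothetical sketch: not MoDex's actual encoding or API.
# Assumed joint layout for a 16-DoF Allegro Hand: 4 fingers x 4 joints each.
FINGERS = ("index", "middle", "ring", "thumb")
JOINTS_PER_FINGER = 4


def finger_mask(active_fingers):
    """Return a 16-entry 0/1 mask marking the joints of the active fingers."""
    unknown = set(active_fingers) - set(FINGERS)
    if unknown:
        raise ValueError(f"unknown fingers: {sorted(unknown)}")
    return [
        1 if finger in active_fingers else 0
        for finger in FINGERS
        for _ in range(JOINTS_PER_FINGER)
    ]


def apply_masked_target(current, predicted, mask):
    """Move only masked joints to the predicted values; hold the rest."""
    if not (len(current) == len(predicted) == len(mask)):
        raise ValueError("current, predicted, and mask must have equal length")
    return [p if m else c for c, p, m in zip(current, predicted, mask)]


# Example: other fingers already hold an object, so the next grasp
# uses only the thumb and index finger.
mask = finger_mask({"thumb", "index"})
current = [0.5] * 16      # joint angles (rad) while holding the first object
predicted = [1.0] * 16    # stand-in for a policy's predicted hand pose
target = apply_masked_target(current, predicted, mask)
```

In this toy example the middle and ring finger joints keep their current angles, so an object held by those fingers would not be released while the thumb and index finger form the next grasp.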

Comments: Submitted to CoRL 2026

Subjects: Robotics (cs.RO)

Cite as: arXiv:2606.05407 [cs.RO]

(or arXiv:2606.05407v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2606.05407

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Haofei Lu
[v1] Wed, 3 Jun 2026 20:22:10 UTC (7,739 KB)
