DiffusionVS: A Generative Framework for Robust Visual Servoing Based on Diffusion Policy

The paper presents DiffusionVS, a diffusion-based visual servoing method that uses conditional denoising to generate camera velocity and online training for improved generalization. It achieves nearly 100% success in simulation and 93% in physical experiments, and can be integrated into existing visual servoing networks to boost performance.

Source: arXiv Robotics. Authors: Hongkang Cui, Rui He, Haoyao Chen

[Submitted on 17 Jun 2026]

Abstract: Visual servoing is a fundamental technique in robotic manipulation and navigation. Regression-based visual servoing frequently experiences trajectory jitter as a result of noise-sensitive single-step mappings and the accumulation of errors during distribution shifts. In contrast, Diffusion Policy maintains temporal consistency by predicting action sequences and improves robustness through implicit data augmentation.

This paper presents a novel diffusion-based servoing method. Based on Diffusion Policy, the proposed approach uses normalized image coordinates of observed tag corners as input and generates camera velocity through conditional denoising. To overcome the generalization limitations of models trained on static datasets, an online training paradigm is adopted, continuously expanding the diversity of training data through interactive experience collection. This strategy substantially enhances both the performance and generalization capability of the model. Comprehensive simulations and real-world experiments demonstrate the effectiveness of the proposed method, achieving success rates of nearly 100% in simulation and 93% in physical experiments. Beyond the specific pipeline, we further validate the generality of the diffusion mechanism. Experiments show that existing visual servoing networks consistently achieve improved performance when integrated with our diffusion-based module. These results indicate that the proposed strategy possesses broad applicability and can enhance various visual servoing systems beyond the specific architecture presented here.
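
To make the input/output relationship described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It converts tag-corner pixels to normalized image coordinates, then runs a standard DDPM-style reverse (denoising) loop to produce a short sequence of camera velocities conditioned on those coordinates. Everything specific here is an assumption for illustration: four corners giving an 8-D condition, a 6-D velocity (vx, vy, vz, wx, wy, wz), the linear noise schedule, the horizon and step count, and in particular `make_oracle_predictor`, a hypothetical stand-in that plays the role a trained noise-prediction network would play. The abstract does not state which sampler or network the paper uses.

```python
import numpy as np


def normalize_corners(pixels, fx, fy, cx, cy):
    """Map pixel coordinates (N, 2) to normalized image coordinates (N, 2)
    using pinhole intrinsics: x = (u - cx) / fx, y = (v - cy) / fy."""
    pixels = np.asarray(pixels, dtype=float)
    x = (pixels[:, 0] - cx) / fx
    y = (pixels[:, 1] - cy) / fy
    return np.stack([x, y], axis=1)


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def make_oracle_predictor(alpha_bars, horizon, seed=0):
    """HYPOTHETICAL stand-in for a trained network eps_theta(a_k, k, obs).

    It pretends the 'correct' velocity is a fixed linear map of the 8-D
    condition and returns the exact noise implied by that target, so the
    sampler below can be exercised without any training."""
    weights = 0.1 * np.random.default_rng(seed).standard_normal((8, 6))

    def target(condition):
        # Same 6-D velocity repeated over the prediction horizon.
        return np.tile(condition @ weights, (horizon, 1))

    def predictor(noisy_actions, k, condition):
        clean = target(condition)
        return (noisy_actions - np.sqrt(alpha_bars[k]) * clean) / np.sqrt(
            1.0 - alpha_bars[k]
        )

    return predictor, target


def sample_velocity(predictor, condition, horizon, schedule, rng):
    """DDPM reverse process: start from Gaussian noise and iteratively
    denoise a (horizon, 6) velocity sequence conditioned on `condition`."""
    betas, alphas, alpha_bars = schedule
    actions = rng.standard_normal((horizon, 6))
    for k in reversed(range(len(betas))):
        eps = predictor(actions, k, condition)
        mean = (
            actions - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps
        ) / np.sqrt(alphas[k])
        if k > 0:
            # Inject noise on every step except the last one.
            actions = mean + np.sqrt(betas[k]) * rng.standard_normal(actions.shape)
        else:
            actions = mean
    return actions


# Illustrative usage with made-up intrinsics and corner pixels.
corner_pixels = [[300.0, 220.0], [340.0, 220.0], [340.0, 260.0], [300.0, 260.0]]
condition = normalize_corners(corner_pixels, 600.0, 600.0, 320.0, 240.0).ravel()
schedule = make_schedule(num_steps=50)
predictor, target = make_oracle_predictor(schedule[2], horizon=4)
velocities = sample_velocity(
    predictor, condition, 4, schedule, np.random.default_rng(1)
)
print(velocities.shape)  # (4, 6)
```

In a receding-horizon setup, typically only the first few velocities of the generated sequence would be executed before re-observing the corners and sampling again. Whether DiffusionVS does exactly this is not stated in the abstract.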

Comments: 8 pages, 4 figures, 7 tables

Subjects: Robotics (cs.RO)

Cite as: arXiv:2606.19397 [cs.RO]

(or arXiv:2606.19397v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2606.19397

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Hongkang Cui
[v1] Wed, 17 Jun 2026 08:06:05 UTC (2,709 KB)
