DiffusionVS: A Generative Framework for Robust Visual Servoing Based on Diffusion Policy
The paper presents DiffusionVS, a diffusion-based visual servoing method that generates camera velocities through conditional denoising and uses online training to improve generalization. It achieves nearly 100% success in simulation and 93% in physical experiments, and it can be integrated into existing visual servoing networks to improve their performance.
[Submitted on 17 Jun 2026]
Authors: Hongkang Cui and one other author
Abstract: Visual servoing is a fundamental technique in robotic manipulation and navigation. Regression-based visual servoing methods often suffer from trajectory jitter, because their single-step mappings are sensitive to noise and errors accumulate under distribution shift. In contrast, Diffusion Policy maintains temporal consistency by predicting action sequences and improves robustness through implicit data augmentation.
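In Diffusion Policy, predicting action sequences is usually paired with receding-horizon execution: the model predicts several future actions, the robot executes only the first few, and it then replans from a fresh observation. The sketch below assumes that scheme. `run_chunked_policy`, the toy 1-D "feature error", the stand-in policy, and the horizon values are all hypothetical illustrations; the abstract does not state the horizons DiffusionVS uses.

```python
import numpy as np


def run_chunked_policy(policy, get_obs, apply_action, total_steps, action_horizon):
    """Receding-horizon execution of a policy that predicts an action sequence.

    `policy(obs)` is assumed to return an array of shape (prediction_horizon, action_dim);
    only the first `action_horizon` actions are executed before re-planning from a new observation.
    """
    if action_horizon < 1:
        raise ValueError("action_horizon must be at least 1")
    log = []
    while len(log) < total_steps:
        chunk = np.asarray(policy(get_obs()))
        if len(chunk) == 0:
            raise ValueError("policy returned an empty action sequence")
        for action in chunk[:action_horizon]:
            if len(log) >= total_steps:
                break
            apply_action(action)
            log.append(action)
    return np.stack(log)


# Toy usage: a 1-D "feature error" driven toward zero by a hypothetical stand-in policy.
state = np.array([1.0])
calls = {"n": 0}


def toy_policy(obs):
    calls["n"] += 1
    return np.tile(-0.5 * obs, (8, 1))  # 8-step chunk of a constant proportional velocity


def apply_velocity(v, dt=0.1):
    state[:] = state + dt * v  # integrate the commanded velocity over one time step


actions = run_chunked_policy(toy_policy, lambda: state.copy(), apply_velocity,
                             total_steps=10, action_horizon=4)
print(actions.shape, calls["n"])  # prints (10, 1) 3
```

Executing several steps of one coherent predicted sequence before replanning is the mechanism usually credited for smoother motion; setting `action_horizon=1` and predicting a single action reduces this loop to the single-step scheme the paragraph contrasts against.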
This paper presents a novel diffusion-based servoing method. Built on Diffusion Policy, the proposed approach takes the normalized image coordinates of observed tag corners as input and generates camera velocities through conditional denoising. To overcome the limited generalization of models trained on static datasets, we adopt an online training paradigm that continuously expands the diversity of the training data through interactive experience collection. This strategy substantially improves both the performance and the generalization capability of the model. Comprehensive simulations and real-world experiments demonstrate the effectiveness of the proposed method, with success rates of nearly 100% in simulation and 93% in physical experiments. Beyond this specific pipeline, we further validate the generality of the diffusion mechanism: existing visual servoing networks consistently achieve improved performance when integrated with our diffusion-based module. These results indicate that the proposed strategy is broadly applicable and can enhance visual servoing systems beyond the architecture presented here.
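The input/output path described in this paragraph (normalized tag-corner coordinates in, a camera-velocity sequence out via conditional denoising) can be sketched with standard DDPM ancestral sampling. This is a minimal NumPy sketch under stated assumptions: the pinhole normalization x = (u - c_x)/f_x, y = (v - c_y)/f_y, the linear β schedule, 6-D velocities per step, the camera intrinsics, and every function name are illustrative choices, not details given in the abstract. In place of a trained network, `make_oracle_eps` is a hypothetical noise predictor that is exact when the data is one known velocity chunk; it only checks that the sampler recovers that chunk.

```python
import numpy as np


def linear_schedule(n_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption; the paper's schedule is not given)."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def normalize_corners(pixels, K):
    """Map (N, 2) pixel coordinates to normalized image coordinates with pinhole intrinsics K."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.stack([(pixels[:, 0] - cx) / fx, (pixels[:, 1] - cy) / fy], axis=1)


def ddpm_sample(eps_model, cond, schedule, horizon, action_dim=6, rng=None):
    """DDPM ancestral sampling of a (horizon, action_dim) velocity sequence given `cond`."""
    betas, alphas, alpha_bars = schedule
    rng = np.random.default_rng(0) if rng is None else rng
    a = rng.standard_normal((horizon, action_dim))  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps = eps_model(a, t, cond)  # predicted noise at step t
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added on the final step
            a = a + np.sqrt(betas[t]) * rng.standard_normal(a.shape)
    return a


def make_oracle_eps(target, schedule):
    """Hypothetical stand-in for a trained network: the exact noise predictor when the
    data distribution is the single point `target`. It ignores `cond`; a real model would not."""
    _, _, alpha_bars = schedule

    def eps_model(a_t, t, cond):
        return (a_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

    return eps_model


# Hypothetical intrinsics and a tag whose four corners are seen around the image center.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
corners_px = np.array([[300.0, 220.0], [340.0, 220.0], [340.0, 260.0], [300.0, 260.0]])
cond = normalize_corners(corners_px, K).ravel()  # 8-D conditioning vector

schedule = linear_schedule(50)
target = np.tile([0.01, -0.02, 0.0, 0.0, 0.0, 0.005], (8, 1))  # hypothetical velocity chunk
velocities = ddpm_sample(make_oracle_eps(target, schedule), cond, schedule, horizon=8)
print(np.allclose(velocities, target, atol=1e-6))  # prints True
```

In a real system `eps_model` would be the trained conditional network, and the online training paradigm would keep adding interaction data to its training set; neither is modeled in this sketch.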
Comments: 8 pages, 4 figures, 7 tables
Subjects: Robotics (cs.RO)
Cite as: arXiv:2606.19397 [cs.RO]
(or arXiv:2606.19397v1 [cs.RO] for this version)
https://doi.org/10.48550/arXiv.2606.19397
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Hongkang Cui
[v1] Wed, 17 Jun 2026 08:06:05 UTC (2,709 KB)