VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents
VASO is a framework that uses formal verification to guide the self-evolution of LLM-generated robot skill contracts. On Clearpath Jackal and PX4 quadcopter tasks, it achieves 97.2% formal-specification compliance with fewer than 100 optimization samples, outperforming execution-feedback, prompt-optimization, and fine-tuning baselines. It is the first framework to close the loop between formal verification and self-evolving skills for physical AI agents.
[2606.05395] VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents
[Submitted on 3 Jun 2026]
Title:VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents
View a PDF of the paper titled VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents, by Yunhao Yang and 5 other authors
View PDF HTML (experimental)
Abstract:Reusable robot skills are becoming the basic units through which embodied agents turn open-ended instructions into long-horizon physical behavior. We argue that, while foundation models have collapsed the cost of creating these skills, the cost of trusting them has not. Existing skill-evolution loops refine skills through execution feedback, unit tests, environment reward, or LLM self-critique, but these signals provide only trace-level evidence: they show that a skill worked on sampled executions, not that skill-induced plans satisfy temporal safety contracts under untested conditions. We introduce VASO, a framework for verification-guided self-evolution of LLM-generated robot skill contracts. In VASO, each skill is represented as a semantic contract with two coupled interfaces: a formal interface that aligns robot states, observations, and control commands with logical propositions for model checking, and a planner-facing interface that guides executable behavior generation. A model checker first filters logically inconsistent skill contracts, then verifies plans induced by the skill against global and local temporal specifications. When verification fails, VASO translates the counterexample trace into a textual gradient that updates the reusable skill contract while keeping foundation-model weights frozen. On Clearpath Jackal and PX4 quadcopter tasks, VASO reaches 97.2% formal-specification compliance using fewer than 100 optimization samples, outperforming execution-feedback, prompt-optimization, and fine-tuning baselines. To our knowledge, VASO is the first framework that closes the loop between formal verification and self-evolving LLM-generated skills for physical AI agents: formal counterexamples become optimization feedback for reusable robot skill contracts, rather than merely verifying one-off plans, tuning planner prompts, or fine-tuning model weights.
Comments: Project webpage: this https URL
Subjects:
Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as: arXiv:2606.05395 [cs.RO]
(or arXiv:2606.05395v1 [cs.RO] for this version)
https://doi.org/10.48550/arXiv.2606.05395
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yunhao Yang [view email] [v1] Wed, 3 Jun 2026 20:02:35 UTC (18,510 KB)
Full-text links:
Access Paper:
View a PDF of the paper titled VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents, by Yunhao Yang and 5 other authors
View PDF
HTML (experimental)
TeX Source
view license
Current browse context:
cs.RO
new | recent | 2026-06
Change to browse by:
cs cs.AI
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Loading...
Data provided by:
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Code, Data and Media Associated with this Article
alphaXiv Toggle
alphaXiv (What is alphaXiv?)
Links to Code Toggle
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub Toggle
DagsHub (What is DagsHub?)
GotitPub Toggle
Gotit.pub (What is GotitPub?)
Huggingface Toggle
Hugging Face (What is Huggingface?)
ScienceCast Toggle
ScienceCast (What is ScienceCast?)
Demos
Demos
Replicate Toggle
Replicate (What is Replicate?)
Spaces Toggle
Hugging Face Spaces (What is Spaces?)
Spaces Toggle
TXYZ.AI (What is TXYZ.AI?)
Related Papers
Recommenders and Search Tools
Link to Influence Flower
Influence Flower (What are Influence Flowers?)
Core recommender toggle
CORE Recommender (What is CORE?)
Author
Venue
Institution
Topic
About arXivLabs
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)