Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach
This paper proposes a path planning method for autonomous vehicles in threat-laden environments using Deep Deterministic Policy Gradient (DDPG). Threats are modeled as circular no-go zones. The DDPG agent learns a direct mapping from state to actions via trial and error, with a reward function comprising attractive, repulsive, and energy penalty components. Compared to traditional pseudospectral optimal control, DDPG is significantly faster while producing effective paths, making it suitable for real-time applications.
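The abstract does not specify the vehicle model or the exact failure test, so the following is only a minimal sketch under stated assumptions. It assumes a constant-speed planar vehicle whose state is position and heading and whose action is a bounded heading change. A mission fails if the vehicle enters any circular no-go zone or does not reach a neighborhood of the goal within a step horizon. The names `NoGoZone`, `step`, `status`, `rollout`, and `greedy_policy` are hypothetical, as are the speed, time step, turn limit, goal radius, and horizon. They are illustrative choices, not the authors' settings, and the greedy policy merely stands in for a trained DDPG actor.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class NoGoZone:
    """Circular threat region (hypothetical representation)."""
    cx: float
    cy: float
    radius: float

    def contains(self, x: float, y: float) -> bool:
        # Touching or entering the disk counts as a violation.
        return math.hypot(x - self.cx, y - self.cy) <= self.radius


@dataclass(frozen=True)
class State:
    x: float
    y: float
    heading: float  # radians


def wrap_angle(a: float) -> float:
    """Map an angle to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def step(s: State, heading_change: float, speed: float = 1.0, dt: float = 0.1,
         max_turn: float = math.radians(15.0)) -> State:
    """Advance a constant-speed vehicle by one time step (assumed kinematics)."""
    # Clip the action to the assumed turn limit so it stays feasible.
    dpsi = max(-max_turn, min(max_turn, heading_change))
    psi = wrap_angle(s.heading + dpsi)
    return State(s.x + speed * dt * math.cos(psi),
                 s.y + speed * dt * math.sin(psi),
                 psi)


def status(s: State, zones: Sequence[NoGoZone], goal: Tuple[float, float],
           goal_radius: float = 0.5) -> str:
    """Classify the current state as 'fail', 'success', or 'running'."""
    if any(z.contains(s.x, s.y) for z in zones):
        return "fail"
    if math.hypot(s.x - goal[0], s.y - goal[1]) <= goal_radius:
        return "success"
    return "running"


def rollout(start: State, policy: Callable[[State], float],
            zones: Sequence[NoGoZone], goal: Tuple[float, float],
            max_steps: int = 500, goal_radius: float = 0.5) -> str:
    """Simulate one mission; entering a zone or timing out is a failure."""
    s = start
    for _ in range(max_steps):
        result = status(s, zones, goal, goal_radius)
        if result != "running":
            return result
        s = step(s, policy(s))
    # Not reaching the goal neighborhood within the horizon also fails.
    return "success" if status(s, zones, goal, goal_radius) == "success" else "fail"


def greedy_policy(goal: Tuple[float, float]) -> Callable[[State], float]:
    """Hypothetical stand-in for a trained DDPG actor: turn toward the goal."""
    def act(s: State) -> float:
        bearing = math.atan2(goal[1] - s.y, goal[0] - s.x)
        return wrap_angle(bearing - s.heading)
    return act


if __name__ == "__main__":
    goal = (5.0, 0.0)
    start = State(0.0, 0.0, 0.0)
    print(rollout(start, greedy_policy(goal), [], goal))                         # success
    print(rollout(start, greedy_policy(goal), [NoGoZone(2.5, 0.0, 1.0)], goal))  # fail
```

The greedy policy flies straight through a zone that lies on the direct line to the goal. That failure case is exactly what a learned policy, shaped by the repulsive reward term, is meant to avoid.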
[Submitted on 5 Jun 2026]
Authors: Qiang Le, Yaguang Yang, Isaac E. Weintraub
Abstract: Path planning for autonomous vehicles in threat-laden environments is a fundamental challenge because the problem is nonlinear and nonconvex even in the simplest scenarios. While traditional optimal control methods can find optimal paths, their computation time is often too long for real-time decision-making. To address this challenge, we propose a method based on Deep Deterministic Policy Gradient (DDPG) and model the threat as one or more circular 'no-go' zones. A mission is regarded as a failure if the vehicle enters a restricted zone at any time or does not reach a neighborhood of the destination. The DDPG agent is trained through trial and error in a simulated environment, learning a direct mapping from its current state (position and heading) to a series of feasible actions that guide it safely to its destination. The reward function has three parts: (a) an attractive field centered at the final destination, (b) repulsive fields centered at the centers of the circular obstacles, and (c) a penalty on control energy consumption (the magnitude of heading change) that indirectly favors straight paths. Using these incentives, DDPG trains the agent to find the largest possible set of starting points from which a safe path to the destination is guaranteed. This provides critical information for mission planning, showing beforehand whether a task is achievable from a given starting point and assisting pre-mission planning activities. The approach is validated in simulation, and the DDPG method is compared with a traditional optimal control (pseudospectral) method. The results show that the learning-based agent produces effective paths while being significantly faster, making it a better fit for real-time applications.
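The abstract names the three reward terms but not their functional forms or weights. The following is a minimal sketch built on three assumptions. The attractive term is linear (the negative distance to the goal). The repulsive penalty grows linearly toward each obstacle's center and vanishes beyond an assumed influence margin. The energy penalty is the absolute heading change. The function name `reward` and the parameters `k_att`, `k_rep`, `k_energy`, and `influence` are hypothetical, not values taken from the paper.

```python
import math
from typing import Iterable, Tuple


def reward(x: float, y: float, heading_change: float,
           goal: Tuple[float, float],
           zones: Iterable[Tuple[float, float, float]],
           k_att: float = 1.0, k_rep: float = 1.0,
           k_energy: float = 0.1, influence: float = 1.0) -> float:
    """Hypothetical three-part reward for one time step.

    zones: (cx, cy, radius) for each circular no-go zone.
    """
    # (a) Attractive field centered at the destination:
    #     the closer the vehicle is to the goal, the higher the reward.
    r_att = -k_att * math.hypot(x - goal[0], y - goal[1])

    # (b) Repulsive fields centered at each obstacle's center:
    #     zero beyond radius + influence, growing linearly toward the center.
    r_rep = 0.0
    for cx, cy, radius in zones:
        d_center = math.hypot(x - cx, y - cy)
        r_rep -= k_rep * max(0.0, (radius + influence) - d_center)

    # (c) Control-energy penalty on the magnitude of the heading change,
    #     which indirectly favors straight paths.
    r_energy = -k_energy * abs(heading_change)

    return r_att + r_rep + r_energy


if __name__ == "__main__":
    zones = [(0.0, 0.0, 1.0)]
    # Straight move well clear of the obstacle: only the attractive term is active.
    print(reward(4.0, 0.0, 0.0, goal=(5.0, 0.0), zones=zones))  # -1.0
    # Turning near the obstacle: all three terms contribute.
    print(reward(1.5, 0.0, 0.5, goal=(5.0, 0.0), zones=zones))
```

The linear repulsive form stays bounded even inside an obstacle, which avoids the singularity of the classical 1/d artificial-potential term. The abstract does not state which form the authors use, and entering a zone would in any case end the episode as a failure.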
Comments: 14 pages, 12 figures
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
Cite as: arXiv:2606.07855 [cs.RO]
(or arXiv:2606.07855v1 [cs.RO] for this version)
https://doi.org/10.48550/arXiv.2606.07855
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yaguang Yang
[v1] Fri, 5 Jun 2026 21:35:54 UTC (734 KB)