AI News Hub

Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach

This paper proposes a path planning method for autonomous vehicles in threat-laden environments using Deep Deterministic Policy Gradient (DDPG). Threats are modeled as circular no-go zones. The DDPG agent learns a direct mapping from state to actions via trial and error, with a reward function comprising attractive, repulsive, and energy penalty components. Compared to traditional pseudospectral optimal control, DDPG is significantly faster while producing effective paths, making it suitable for real-time applications.
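
To make the threat model concrete, here is a minimal sketch of the simulated environment such an agent could be trained in. It assumes, as the abstract states, that the state is position plus heading and that a mission fails if the vehicle enters a circular no-go zone or never reaches a neighborhood of the destination. The kinematics (constant speed, action = heading change), the values of `speed`, `dt`, `max_turn`, `goal_radius` and `max_steps`, and the helper names are all hypothetical. `pursuit_policy` is a hand-written stand-in for the trained DDPG actor, not the paper's method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Circular no-go zone (threat) with center (cx, cy) and radius r."""
    cx: float
    cy: float
    r: float

    def contains(self, x: float, y: float) -> bool:
        return math.hypot(x - self.cx, y - self.cy) <= self.r


def step(state, dpsi, speed=1.0, dt=0.1, max_turn=0.3):
    """Advance the state (x, y, psi) by one time step.

    The action is a heading change dpsi, clipped to a hypothetical turn
    limit; speed, dt and max_turn are illustrative assumptions.
    """
    x, y, psi = state
    psi += max(-max_turn, min(max_turn, dpsi))
    return (x + speed * dt * math.cos(psi), y + speed * dt * math.sin(psi), psi)


def rollout(policy, start, goal, zones, goal_radius=0.5, max_steps=500):
    """Simulate one mission and apply the failure criteria.

    Returns False if the vehicle enters any zone or does not reach the
    goal neighborhood within max_steps, True otherwise.
    """
    state = start
    for _ in range(max_steps):
        x, y = state[:2]
        if any(z.contains(x, y) for z in zones):
            return False
        if math.hypot(goal[0] - x, goal[1] - y) <= goal_radius:
            return True
        state = step(state, policy(state, goal))
    return False


def pursuit_policy(state, goal):
    """Hand-written stand-in for a trained actor: steer straight at the goal."""
    x, y, psi = state
    bearing = math.atan2(goal[1] - y, goal[0] - x)
    # Wrap the heading error into [-pi, pi) so the vehicle takes the shorter turn.
    return (bearing - psi + math.pi) % (2 * math.pi) - math.pi


def safe_starts(policy, starts, goal, zones):
    """Keep the candidate starting states from which the mission succeeds."""
    return [s for s in starts if rollout(policy, s, goal, zones)]
```

For example, with `zones = [Zone(5.0, 0.0, 1.0)]` and `goal = (10.0, 0.0)`, `safe_starts(pursuit_policy, [(0.0, 0.0, 0.0), (0.0, 5.0, 0.0)], goal, zones)` keeps only the second start: from the origin, straight-line pursuit runs through the zone, which is exactly the situation a learned policy has to handle by detouring. Sweeping `safe_starts` over a grid of candidate starts is one way to estimate the set of starting points from which a safe path exists.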

Source: arXiv Robotics | Authors: Qiang Le, Yaguang Yang, Isaac E. Weintraub

[2606.07855] Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach

[Submitted on 5 Jun 2026]

Abstract: Path planning for autonomous vehicles in threat-laden environments is a fundamental challenge because the problem is nonlinear and nonconvex even in the simplest scenarios. While traditional optimal control methods can find ideal paths, their computation time is often too long for real-time decision-making. To address this challenge, we propose a method based on Deep Deterministic Policy Gradient (DDPG) and model the threats as one or more circular 'no-go' zones. A mission is regarded as a failure if the vehicle enters a restricted zone at any time or does not reach a neighborhood of the destination. The DDPG agent is trained through trial and error in a simulated environment, learning a direct mapping from its current state (position and heading) to a series of feasible actions that guide it safely to its destination. The reward function has three parts: (a) an attractive field centered at the final destination, (b) repulsive fields centered at the centers of the circular obstacles, and (c) a penalty on control energy consumption (the magnitude of heading change) that indirectly favors straight paths. Using these incentives, DDPG trains the agent to find the largest possible set of starting points from which a safe path to the destination is guaranteed. This provides critical information for mission planning, showing beforehand whether a task is achievable from a given starting point and thereby assisting pre-mission planning. The approach is validated in simulation, and the DDPG method is compared with a traditional optimal control (pseudospectral) method. The results show that the learning-based agent produces effective paths while being significantly faster, making it a better fit for real-time applications.
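
The abstract names the three reward terms but not their functional forms. The sketch below shows one plausible way to write them, assuming a linear attractive term on distance to the destination, a Khatib-style repulsive potential that is active only inside a hypothetical influence radius `rho = r + margin` around each zone center, and a penalty proportional to the magnitude of the heading change `dpsi`. The gains `k_att`, `k_rep`, `k_e`, the `margin`, and the function name are illustrative assumptions, not values or code from the paper.

```python
import math


def reward(pos, dpsi, goal, zones, k_att=1.0, k_rep=1.0, k_e=0.1, margin=1.0):
    """Hypothetical per-step shaped reward: attractive + repulsive + energy terms.

    pos, goal: (x, y) tuples; dpsi: heading change applied this step;
    zones: iterable of (cx, cy, r) circles. Gains and margin are assumptions.
    """
    x, y = pos
    # (a) Attractive field centered at the destination: the reward rises
    #     (becomes less negative) as the vehicle gets closer.
    r_att = -k_att * math.hypot(goal[0] - x, goal[1] - y)

    # (b) Repulsive fields centered at each zone center, switched on only
    #     inside an influence radius rho (Khatib-style potential).
    r_rep = 0.0
    for cx, cy, r in zones:
        d = max(math.hypot(x - cx, y - cy), 1e-6)  # guard against d == 0
        rho = r + margin
        if d < rho:
            r_rep -= 0.5 * k_rep * (1.0 / d - 1.0 / rho) ** 2

    # (c) Control-energy penalty on the magnitude of the heading change,
    #     which indirectly favors straight paths.
    r_energy = -k_e * abs(dpsi)

    return r_att + r_rep + r_energy
```

For instance, `reward((0.0, 0.0), 0.0, (3.0, 4.0), [])` returns `-5.0`, since only the attractive term is nonzero. The abstract does not state whether terminal bonuses or penalties are added for the success and failure cases, so none are included here.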

Comments: 14 pages, 12 figures

Subjects: Robotics (cs.RO); Optimization and Control (math.OC)

Cite as: arXiv:2606.07855 [cs.RO] (or arXiv:2606.07855v1 [cs.RO] for this version)

DOI: https://doi.org/10.48550/arXiv.2606.07855 (arXiv-issued DOI via DataCite, pending registration)

Submission history: from Yaguang Yang. [v1] Fri, 5 Jun 2026 21:35:54 UTC (734 KB)