Abstract
Background: In most prior research, the autonomous driving problem is solved with either policy gradients or DQN. In this paper, we address the high variance of policy gradients and the overestimation of action values in DQN. We used DDQN because it has lower variance and resolves the overestimation problem of DQN.
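For reference (using the standard notation of van Hasselt et al., not reproduced from this abstract, with online parameters \theta and target parameters \theta^-), the two learning targets can be written as:

y_t^{DQN}  = r_{t+1} + \gamma \max_a Q(s_{t+1}, a; \theta^-)

y_t^{DDQN} = r_{t+1} + \gamma \, Q(s_{t+1}, \arg\max_a Q(s_{t+1}, a; \theta); \theta^-)

In the DQN target, the same maximization both selects and evaluates the next action, which biases the estimate upward; in the DDQN target, the online network selects the action and the target network evaluates it, reducing this bias.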
Aim: The main aim of this paper is to propose a framework for an autonomous driving model that takes raw sensor information as input and predicts actions as output, which can then be used to drive the simulated car.
Objective: The main objective of this paper is to use DDQN together with a discretization technique to solve the autonomous driving problem and obtain better results even though the underlying action space is continuous.
Methods: To bridge self-driving cars and reinforcement learning, we used Double Deep Q-Networks, which help prevent the overestimation of action values by decoupling action selection from action evaluation. To handle the continuous action space, we used a discretization technique in which the continuous control variables are grouped into bins and each bin is assigned a value such that the ordering relationship between the bins is preserved.
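As a minimal sketch of such a discretization (the bin grid and action count below are illustrative assumptions, not the exact values used in the paper), a continuous steering/throttle command can be mapped to a fixed set of ordered bins:

import itertools
import numpy as np

# Illustrative bin grid (an assumption): 5 steering levels, 3 throttle levels.
STEER_BINS = np.linspace(-1.0, 1.0, 5)     # [-1.0, -0.5, 0.0, 0.5, 1.0]
THROTTLE_BINS = np.linspace(0.0, 1.0, 3)   # [0.0, 0.5, 1.0]

# Discrete action set: every (steer, throttle) pair; the index ordering
# preserves the ordering of the underlying continuous values.
ACTIONS = list(itertools.product(STEER_BINS, THROTTLE_BINS))   # 15 actions

def action_to_control(action_index):
    """Map a discrete action index back to a continuous (steer, throttle) pair."""
    steer, throttle = ACTIONS[action_index]
    return float(steer), float(throttle)

def control_to_action(steer, throttle):
    """Map a continuous (steer, throttle) pair to the nearest discrete action index."""
    i = int(np.argmin(np.abs(STEER_BINS - steer)))
    j = int(np.argmin(np.abs(THROTTLE_BINS - throttle)))
    return i * len(THROTTLE_BINS) + j

With such a mapping, the discrete indices form the DDQN action space while neighbouring indices still correspond to similar continuous controls.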
Results: The experimental results showed improved performance of the agent. The agent was tested under different conditions, such as curved roads and traffic, and it was able to drive under these conditions as well. We also illustrated how DDQN outperformed policy gradients simply by adding a discretization technique to make the action space discrete and by overcoming the overestimation of Q-values.
Conclusion: A Gym environment and a reward function were designed for DDQN to work. We also used CARLA as a virtual simulator for training. Finally, we demonstrated that our agent can perform well in different cases and conditions. Going forward, the agent can be improved to also follow traffic-light rules and other road-safety measures.
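As a minimal sketch of the kind of Gym-style wrapper and reward function described above (the CARLA interaction is hidden behind hypothetical helper methods, and the reward terms are illustrative assumptions rather than the paper's exact design; action_to_control refers to the discretization helper sketched earlier):

import gym
import numpy as np
from gym import spaces

class CarlaDrivingEnv(gym.Env):
    """Hypothetical Gym wrapper around one CARLA driving episode (sketch only)."""

    def __init__(self, num_actions=15, image_shape=(84, 84, 3)):
        self.action_space = spaces.Discrete(num_actions)
        self.observation_space = spaces.Box(0, 255, shape=image_shape, dtype=np.uint8)

    def reset(self):
        # Placeholder: respawn the vehicle and return the first camera frame.
        return self._get_camera_frame()

    def step(self, action_index):
        steer, throttle = action_to_control(action_index)   # discretization helper above
        self._apply_control(steer, throttle)                # advance the simulator
        obs = self._get_camera_frame()
        # Illustrative reward: reward forward speed, penalize collisions and lane departure.
        reward = self._speed_kmh() / 10.0 - 100.0 * self._collided() - 10.0 * self._off_lane()
        done = bool(self._collided())
        return obs, reward, done, {}

    # These helpers stand in for the actual CARLA client calls (assumptions).
    def _get_camera_frame(self): ...
    def _apply_control(self, steer, throttle): ...
    def _speed_kmh(self): ...
    def _collided(self): ...
    def _off_lane(self): ...

Any standard DDQN implementation can then interact with such an environment through the usual reset/step interface.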
Keywords:
Deep Reinforcement Learning, DDQN, Artificial Intelligence, Autonomous Driving, CARLA, Overestimation.
Graphical Abstract