Abstract
Background: In most prior research, the autonomous driving problem is solved with either policy gradients or DQN. In this paper, we address the high variance of policy gradients and the overestimation of action values in DQN. We used DDQN because it has lower variance and resolves the overestimation problem of DQN.
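For reference (using the standard notation of van Hasselt et al., not reproduced from this abstract, with online parameters \theta and target parameters \theta^-), the two learning targets can be written as:

y_t^{DQN}  = r_{t+1} + \gamma \max_a Q(s_{t+1}, a; \theta^-)

y_t^{DDQN} = r_{t+1} + \gamma \, Q(s_{t+1}, \arg\max_a Q(s_{t+1}, a; \theta); \theta^-)

In the DQN target, the same maximization both selects and evaluates the next action, which biases the estimate upward; in the DDQN target, the online network selects the action and the target network evaluates it, reducing this bias.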
Aim: The main aim of this paper is to propose a framework for an autonomous driving model that takes raw sensor information as input and predicts actions as output, which can then be used to drive the simulated car.
Objective: The main objective of this paper is to use DDQN together with a discretization technique to solve the autonomous driving problem and obtain better results even though the underlying action space is continuous.
Methods: To bridge self-driving cars and reinforcement learning, we used Double Deep Q-Networks, which help prevent the overestimation of action values by decoupling action selection from action evaluation. To handle the continuous action space, we used a discretization technique in which the continuous control variables are grouped into bins and each bin is assigned a value such that the ordering relationship between the bins is preserved.
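As a minimal sketch of such a discretization (the bin grid and action count below are illustrative assumptions, not the exact values used in the paper), a continuous steering/throttle command can be mapped to a fixed set of ordered bins:

import itertools
import numpy as np

# Illustrative bin grid (an assumption): 5 steering levels, 3 throttle levels.
STEER_BINS = np.linspace(-1.0, 1.0, 5)     # [-1.0, -0.5, 0.0, 0.5, 1.0]
THROTTLE_BINS = np.linspace(0.0, 1.0, 3)   # [0.0, 0.5, 1.0]

# Discrete action set: every (steer, throttle) pair; the index ordering
# preserves the ordering of the underlying continuous values.
ACTIONS = list(itertools.product(STEER_BINS, THROTTLE_BINS))   # 15 actions

def action_to_control(action_index):
    """Map a discrete action index back to a continuous (steer, throttle) pair."""
    steer, throttle = ACTIONS[action_index]
    return float(steer), float(throttle)

def control_to_action(steer, throttle):
    """Map a continuous (steer, throttle) pair to the nearest discrete action index."""
    i = int(np.argmin(np.abs(STEER_BINS - steer)))
    j = int(np.argmin(np.abs(THROTTLE_BINS - throttle)))
    return i * len(THROTTLE_BINS) + j

With such a mapping, the discrete indices form the DDQN action space while neighbouring indices still correspond to similar continuous controls.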
Results: The experimental results showed improved performance of the agent. The agent was tested under different conditions, such as curved roads and traffic, and it was able to drive under these conditions as well. We also illustrated how DDQN outperformed policy gradients simply by adding a discretization technique to make the action space discrete and by overcoming the overestimation of Q-values.
Conclusion: A Gym environment and a reward function were designed for DDQN to work. We also used CARLA as a virtual simulator for training. Finally, we demonstrated that our agent can perform well in different cases and conditions. Going forward, the agent can be improved to also follow traffic-light rules and other road-safety measures.
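As a minimal sketch of the kind of Gym-style wrapper and reward function described above (the CARLA interaction is hidden behind hypothetical helper methods, and the reward terms are illustrative assumptions rather than the paper's exact design; action_to_control refers to the discretization helper sketched earlier):

import gym
import numpy as np
from gym import spaces

class CarlaDrivingEnv(gym.Env):
    """Hypothetical Gym wrapper around one CARLA driving episode (sketch only)."""

    def __init__(self, num_actions=15, image_shape=(84, 84, 3)):
        self.action_space = spaces.Discrete(num_actions)
        self.observation_space = spaces.Box(0, 255, shape=image_shape, dtype=np.uint8)

    def reset(self):
        # Placeholder: respawn the vehicle and return the first camera frame.
        return self._get_camera_frame()

    def step(self, action_index):
        steer, throttle = action_to_control(action_index)   # discretization helper above
        self._apply_control(steer, throttle)                # advance the simulator
        obs = self._get_camera_frame()
        # Illustrative reward: reward forward speed, penalize collisions and lane departure.
        reward = self._speed_kmh() / 10.0 - 100.0 * self._collided() - 10.0 * self._off_lane()
        done = bool(self._collided())
        return obs, reward, done, {}

    # These helpers stand in for the actual CARLA client calls (assumptions).
    def _get_camera_frame(self): ...
    def _apply_control(self, steer, throttle): ...
    def _speed_kmh(self): ...
    def _collided(self): ...
    def _off_lane(self): ...

Any standard DDQN implementation can then interact with such an environment through the usual reset/step interface.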
Keywords:
Deep Reinforcement Learning, DDQN, Artificial Intelligence, Autonomous Driving, CARLA, Overestimation.
Graphical Abstract