SYSTEMS AND METHODS FOR END-TO-END LEARNING OF OPTIMAL DRIVING POLICY
DRIVE
December 8, 2022
A system for learning optimal driving behavior for autonomous vehicles comprises a deep neural network, a first stage training module, and a second stage training module. The deep neural network comprises a feature learning network, configured to receive sensor data from a vehicle as input and to output spatial temporal feature embeddings, and a decision action network, configured to receive the spatial temporal feature embeddings as input and to output an optimal driving policy for the vehicle. The first stage training module is configured to train the feature learning network during a first training stage using an object detection loss. The second stage training module is configured to train the decision action network during a second training stage using reinforcement learning.
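The two-stage schedule described above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the abstract: the class and function names are hypothetical, a single linear unit stands in for each network, binary cross-entropy on an "obstacle present" label stands in for the object detection loss, REINFORCE stands in for the unspecified reinforcement learning method, and the feature learning network is held fixed during the second stage.

```python
import math
import random
from dataclasses import dataclass, field


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class TwoPartModel:
    """Hypothetical toy stand-in for the two-part network."""
    # Feature learning network: one linear unit, sensor data -> scalar embedding.
    feat_w: list = field(default_factory=lambda: [0.0, 0.0])
    # Decision action network: one logit weight per action
    # (assumed actions: 0 = keep speed, 1 = brake).
    act_w: list = field(default_factory=lambda: [0.0, 0.0])

    def embed(self, sensors):
        return sum(w * s for w, s in zip(self.feat_w, sensors))

    def policy(self, sensors):
        z = self.embed(sensors)
        return softmax([w * z for w in self.act_w])


def detection_loss(model, data):
    """Mean binary cross-entropy; an assumed stand-in for an object detection loss."""
    total = 0.0
    for sensors, label in data:
        p = min(max(sigmoid(model.embed(sensors)), 1e-12), 1.0 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(data)


def train_stage_one(model, data, epochs=200, lr=0.5):
    """Stage 1: update only the feature weights with the detection-style loss."""
    for _ in range(epochs):
        for sensors, label in data:
            err = sigmoid(model.embed(sensors)) - label  # d(BCE)/d(embedding)
            model.feat_w = [w - lr * err * s for w, s in zip(model.feat_w, sensors)]


def train_stage_two(model, data, rng, episodes=2000, lr=0.1):
    """Stage 2: update only the action weights with REINFORCE (features frozen)."""
    for _ in range(episodes):
        sensors, label = rng.choice(data)
        z = model.embed(sensors)
        probs = softmax([w * z for w in model.act_w])
        action = 0 if rng.random() < probs[0] else 1
        # Assumed toy reward: +1 for braking iff an obstacle is present, else -1.
        reward = 1.0 if action == label else -1.0
        # grad of log pi(action) w.r.t. act_w[k] is (1[k == action] - probs[k]) * z
        model.act_w = [
            w + lr * reward * ((1.0 if k == action else 0.0) - probs[k]) * z
            for k, w in enumerate(model.act_w)
        ]


if __name__ == "__main__":
    # Each sample: ([normalized distance to nearest object, bias], obstacle label).
    samples = [([0.9, 1.0], 0), ([0.8, 1.0], 0), ([0.2, 1.0], 1), ([0.1, 1.0], 1)]
    model = TwoPartModel()
    train_stage_one(model, samples)
    train_stage_two(model, samples, random.Random(0))
    print("P(brake | near object):", model.policy([0.1, 1.0])[1])
    print("P(brake | far object): ", model.policy([0.9, 1.0])[1])
```

Keeping the two parameter groups in separate fields makes the staging explicit: each training function touches only its own group, so the second stage learns on top of embeddings that were shaped by the detection-style objective. Whether the actual system freezes the feature learning network during the second stage is not stated in the abstract.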