MANAGING ALEATORIC AND EPISTEMIC UNCERTAINTY IN REINFORCEMENT LEARNING, WITH APPLICATIONS TO AUTONOMOUS VEHICLE CONTROL
DRIVE
July 2, 2025
Methods relating to the control of autonomous vehicles using a reinforcement learning agent include a plurality of training sessions (110-1, ..., 110-K) in which the agent interacts with an environment, each session having a different initial value and yielding a state-action quantile function $Z_{k,\tau}(s,a) = F^{-1}_{Z_k(s,a)}(\tau)$ dependent on the state $s$ and the action $a$. The methods further include a first uncertainty estimation (114) on the basis of a variability measure $\operatorname{Var}_\tau\left[\mathbb{E}_k Z_{k,\tau}(s,a)\right]$, relating to a variability with respect to the quantile $\tau$ of the average $\mathbb{E}_k Z_{k,\tau}(s,a)$ of the plurality of state-action quantile functions evaluated for a state-action pair; and a second uncertainty estimation (116) on the basis of a variability measure $\operatorname{Var}_k\left[\mathbb{E}_\tau Z_{k,\tau}(s,a)\right]$, relating to an ensemble variability, for the plurality of state-action quantile functions evaluated for a state-action pair. The state-action pair may correspond either to a tentative decision, which is verified before execution, or to possible decisions by the agent, in order to guide additional training.
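The two variability measures can be illustrated with a minimal sketch. It assumes that the $K$ quantile functions have already been evaluated for one state-action pair on a shared grid of quantile levels $\tau$, that population variance serves as the variability measure, and that, following the title, the first estimate reflects aleatoric uncertainty and the second epistemic uncertainty. The function names and the threshold-based check in `verify_decision` are hypothetical illustrations, not the method claimed in the abstract.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def uncertainty_estimates(z: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (first, second) uncertainty estimates for one state-action pair.

    Assumption: z[k][i] holds Z_{k,tau_i}(s, a), i.e. ensemble member k's
    quantile function evaluated at quantile level tau_i, with the same
    tau grid used for every member.
    """
    if not z or not z[0]:
        raise ValueError("need at least one member and one quantile level")
    n_members = len(z)
    n_tau = len(z[0])

    # E_k Z_{k,tau}: average over ensemble members, one value per tau.
    mean_over_k = [fmean(z[k][i] for k in range(n_members)) for i in range(n_tau)]
    # Var_tau of that average: spread of the returns themselves (aleatoric).
    first = pvariance(mean_over_k)

    # E_tau Z_{k,tau}: average over quantile levels, one value per member.
    mean_over_tau = [fmean(row) for row in z]
    # Var_k of that average: disagreement between members (epistemic).
    second = pvariance(mean_over_tau)
    return first, second


def verify_decision(
    z: Sequence[Sequence[float]],
    first_threshold: float,
    second_threshold: float,
) -> bool:
    """Hypothetical check: accept a tentative decision only if both
    uncertainty estimates are below their (assumed) thresholds."""
    first, second = uncertainty_estimates(z)
    return first <= first_threshold and second <= second_threshold


if __name__ == "__main__":
    # Members agree, but the return distribution is wide.
    print(uncertainty_estimates([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]))
    # Each member is certain, but the members disagree.
    print(uncertainty_estimates([[0.0, 0.0], [2.0, 2.0]]))
```

The two example calls separate the cases: identical members with a spread-out distribution give a nonzero first estimate and a zero second estimate, while confident but disagreeing members give the opposite. An estimate of the second kind is the signal one would use to pick state-action pairs for additional training.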