{
  "$type": "site.standard.document",
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreidf5qux67wmw6trqke3qnzki77fsz7iqgiwyhbyguw3rqamnn7klu"
    },
    "mimeType": "image/png",
    "size": 122339
  },
  "description": "Methods relating to the control of autonomous vehicles using a reinforcement learning agent include a plurality of training sessions (110-1, ..., 110-K), in which the agent interacts with an environment, each having a different initial value and yielding a state-action quantile function…",
  "path": "/patents/1404680",
  "publishedAt": "2025-07-02T00:00:00.000Z",
  "site": "at://did:plc:oql6ds5vnff4ugar6rruliwd/site.standard.publication/3mn3ohu7oxx5w",
  "tags": [
    "B60W60/001",
    "VOLVO AUTONOMOUS SOLUTIONS AB [SE]"
  ],
  "textContent": "Methods relating to the control of autonomous vehicles using a reinforcement learning agent include a plurality of training sessions (110-1, ..., 110-K), in which the agent interacts with an environment, each having a different initial value and yielding a state-action quantile function Z_{k,τ}(s,a) = F^{-1}_{Z_k(s,a)}(τ) dependent on state (s) and action (a). The methods further include a first uncertainty estimation (114) on the basis of a variability measure Var_τ[E_k Z_{k,τ}(s,a)], relating to a variability with respect to quantile τ, of an average E_k Z_{k,τ}(s,a) of the plurality of state-action quantile functions evaluated for a state-action pair; and a second uncertainty estimation (116) on the basis of a variability measure Var_k[E_τ Z_{k,τ}(s,a)], relating to an ensemble variability, for the plurality of state-action quantile functions evaluated for a state-action pair. The state-action pair may either correspond to a tentative decision, which is verified before execution, or to possible decisions by the agent to guide additional training.",
  "title": "MANAGING ALEATORIC AND EPISTEMIC UNCERTAINTY IN REINFORCEMENT LEARNING, WITH APPLICATIONS TO AUTONOMOUS VEHICLE CONTROL"
}