VEHICLE OPERATION WITH MACHINE LEARNING
DRIVE
February 5, 2026
A computer includes a processor and a memory, the memory including instructions executable by the processor to operate a system based on predictions output from a machine learning system, the predictions including predicted states, actions, rewards, and costs. The machine learning system includes a first transformer and a second transformer and is trained based on bisimulation offline reinforcement learning, wherein the first transformer and the second transformer are based on a Markov decision process that includes the states, the actions, the rewards, and the costs. The bisimulation offline reinforcement learning can include inputting a first sequence of training states, actions, rewards, and costs to the first transformer and a second sequence of training states, actions, rewards, and costs to the second transformer to determine bisimulation learning objectives based on latent variables output from the first transformer and the second transformer.
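The abstract does not state the form of the bisimulation learning objectives. As a rough illustration only, the sketch below assumes a form common in the bisimulation-metric literature: the distance between two latent variables is pushed toward the difference in rewards, plus the difference in costs, plus a discounted distance between the next latent variables. The function names, the L1 distance, the discount `gamma`, the added cost term, and the use of next-step latents in place of a distance between transition distributions are all assumptions for illustration, not details taken from the source.

```python
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two latent vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("latent vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def bisimulation_loss(
    z1: Sequence[float],       # latent from the first transformer (hypothetical)
    z2: Sequence[float],       # latent from the second transformer (hypothetical)
    r1: float, r2: float,      # rewards paired with each latent
    c1: float, c2: float,      # costs paired with each latent
    z1_next: Sequence[float],  # next-step latent, first sequence
    z2_next: Sequence[float],  # next-step latent, second sequence
    gamma: float = 0.99,       # assumed discount factor
) -> float:
    """Squared error between the latent distance and an assumed bisimulation target."""
    # Assumed target: reward gap + cost gap + discounted next-latent distance.
    target = abs(r1 - r2) + abs(c1 - c2) + gamma * l1_distance(z1_next, z2_next)
    return (l1_distance(z1, z2) - target) ** 2


if __name__ == "__main__":
    # Two latents that differ although their rewards, costs, and next latents
    # match: the loss is positive, so training would pull the latents together.
    print(bisimulation_loss([0.0, 0.0], [1.0, 1.0], 1.0, 1.0, 0.0, 0.0,
                            [0.5, 0.5], [0.5, 0.5]))
```

Under these assumptions, two training samples with matching rewards, costs, and next latent variables are driven toward the same latent variable, while samples whose rewards or costs differ are kept apart by at least that difference.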