SYSTEM AND METHOD FOR TRAINING A POLICY USING CLOSED-LOOP WEIGHTED EMPIRICAL RISK MINIMIZATION
DRIVE
March 21, 2024
Systems and methods for training a policy are disclosed. In one example, a system includes a processor and a memory storing instructions that cause the processor to train the policy on a training data set of training scenes to generate an identification policy, and to perform a closed-loop simulation of the identification policy to collect closed-loop metrics. Based on the closed-loop metrics, the instructions cause the processor to construct an error set of the training scenes and to construct an upsampled training set by upsampling the error set. The policy is then trained on the upsampled training set to generate a final policy.
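The steps in the abstract can be sketched in Python as follows. This is only an illustrative outline, not the disclosed implementation. The names `train`, `simulate`, `threshold`, and `factor` are hypothetical. The sketch also assumes details the abstract does not state: that each scene yields a single scalar closed-loop metric, that a scene joins the error set when its metric exceeds a threshold, that upsampling means repeating error scenes an integer number of times, and that the final policy is trained by calling the same training routine again.

```python
from typing import Callable, Dict, List, Sequence

Scene = str                        # hypothetical stand-in for a training scene
Policy = Callable[[Scene], float]  # hypothetical stand-in for a trained policy


def build_error_set(scenes: Sequence[Scene],
                    metrics: Dict[Scene, float],
                    threshold: float) -> List[Scene]:
    """Collect scenes whose closed-loop metric exceeds the threshold.

    Assumption: a higher metric means worse closed-loop behavior.
    """
    return [s for s in scenes if metrics[s] > threshold]


def upsample(scenes: Sequence[Scene],
             error_set: Sequence[Scene],
             factor: int) -> List[Scene]:
    """Repeat each error-set scene `factor` times; keep other scenes once."""
    errors = set(error_set)
    upsampled: List[Scene] = []
    for s in scenes:
        upsampled.extend([s] * (factor if s in errors else 1))
    return upsampled


def closed_loop_werm(scenes: Sequence[Scene],
                     train: Callable[[Sequence[Scene]], Policy],
                     simulate: Callable[[Policy, Scene], float],
                     threshold: float,
                     factor: int) -> Policy:
    """Run the stages described in the abstract, in order."""
    # 1. Train on the original training set to get the identification policy.
    identification_policy = train(scenes)
    # 2. Run closed-loop simulation per scene to collect closed-loop metrics.
    metrics = {s: simulate(identification_policy, s) for s in scenes}
    # 3. Construct the error set from those metrics.
    error_set = build_error_set(scenes, metrics, threshold)
    # 4. Construct the upsampled training set.
    upsampled = upsample(scenes, error_set, factor)
    # 5. Train on the upsampled set to produce the final policy.
    return train(upsampled)
```

In this sketch, `train` and `simulate` are supplied by the caller, so the same outline works with any learner and any simulator that match these hypothetical signatures.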