IMAGE-BASED GENERATION OF DESCRIPTIVE AND PERCEPTIVE MESSAGES OF AUTOMOTIVE SCENES
DRIVE
May 22, 2025
A system includes: a traffic object detection module that detects traffic objects in an environment; an attention map highlighting module that generates an attention map highlighting relevant ones of the traffic objects or the regions in which those traffic objects are located; an image encoder that, based on the attention map, encodes an image of the environment and generates an image embedding vector; a PLM module that iteratively selects and appends text to create a text message, selecting the text based on a score, the text message being a specific description of what is perceived in the environment; a text encoder that encodes the portion of the text message created thus far to generate a text embedding vector; and a module that, based on the image and text embedding vectors, scores the portion to generate the score, where the PLM module is configured to update the portion based on the score.
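The iterative, score-guided text selection described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the source: the toy vocabulary and its 3-dimensional vectors, the mean-of-word-vectors text encoder, the use of cosine similarity as the score, and the greedy choice of the best-scoring candidate. The function names (`encode_text`, `propose_candidates`, `generate_message`) are hypothetical, and the image embedding is supplied directly rather than produced by an attention-guided image encoder.

```python
import math

# Hypothetical toy vocabulary. Each word maps to a fixed 3-d vector whose
# axes loosely mean (vehicle, pedestrian, signal). Purely illustrative.
WORD_VECTORS = {
    "pedestrian": (0.0, 1.0, 0.0),
    "crossing": (0.0, 0.8, 0.2),
    "car": (1.0, 0.0, 0.0),
    "red": (0.0, 0.0, 1.0),
    "light": (0.0, 0.1, 0.9),
    "ahead": (0.3, 0.3, 0.3),
}


def encode_text(words):
    """Stand-in text encoder: the mean of the word vectors."""
    if not words:
        return (0.0, 0.0, 0.0)
    n = len(words)
    return tuple(sum(WORD_VECTORS[w][i] for w in words) / n for i in range(3))


def cosine(a, b):
    """Assumed scoring function: cosine similarity of two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propose_candidates(words):
    """Stand-in for the PLM module: propose every word not yet used."""
    return [w for w in WORD_VECTORS if w not in words]


def generate_message(image_embedding, max_words=3):
    """Greedily append the candidate whose extended text scores highest."""
    words = []
    for _ in range(max_words):
        candidates = propose_candidates(words)
        if not candidates:
            break
        # Score the message created thus far plus each candidate word.
        scored = [
            (cosine(image_embedding, encode_text(words + [c])), c)
            for c in candidates
        ]
        _, best_word = max(scored)
        words.append(best_word)
    return " ".join(words)
```

For example, with a hypothetical image embedding dominated by the pedestrian axis, `generate_message((0.0, 1.0, 0.1), max_words=2)` returns `"pedestrian crossing"`. A real system would replace the greedy step with whatever selection and update rule the PLM module actually uses.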