{
"$type": "site.standard.document",
"description": "An artificial intelligence framework is described that incorporates a number of neural networks and a number of transformers for converting a two-dimensional image into three-dimensional semantic information. Neural networks convert one or more images into a set of image feature maps, depth…",
"path": "/patents/1360735",
"publishedAt": "2024-03-14T00:00:00.000Z",
"site": "at://did:plc:oql6ds5vnff4ugar6rruliwd/site.standard.publication/3mn3ohu7oxx5w",
"tags": [
"G06T17/00",
"NVIDIA Corporation"
],
"textContent": "An artificial intelligence framework is described that incorporates a number of neural networks and a number of transformers for converting a two-dimensional image into three-dimensional semantic information. Neural networks convert one or more images into a set of image feature maps, depth information associated with the one or more images, and query proposals based on the depth information. A first transformer implements a cross-attention mechanism to process the set of image feature maps in accordance with the query proposals. The output of the first transformer is combined with a mask token to generate initial voxel features of the scene. A second transformer implements a self-attention mechanism to convert the initial voxel features into refined voxel features, which are up-sampled and processed by a lightweight neural network to generate the three-dimensional semantic information, which may be used by, e.g., an autonomous vehicle for various advanced driver assistance system (ADAS) functions.",
"title": "SPARSE VOXEL TRANSFORMER FOR CAMERA-BASED 3D SEMANTIC SCENE COMPLETION"
}