{
  "$type": "site.standard.document",
  "description": "An artificial intelligence framework is described that incorporates a number of neural networks and a number of transformers for converting a two-dimensional image into three-dimensional semantic information. Neural networks convert one or more images into a set of image feature maps, depth…",
  "path": "/patents/1360735",
  "publishedAt": "2024-03-14T00:00:00.000Z",
  "site": "at://did:plc:oql6ds5vnff4ugar6rruliwd/site.standard.publication/3mn3ohu7oxx5w",
  "tags": [
    "G06T17/00",
    "NVIDIA Corporation"
  ],
  "textContent": "An artificial intelligence framework is described that incorporates a number of neural networks and a number of transformers for converting a two-dimensional image into three-dimensional semantic information. Neural networks convert one or more images into a set of image feature maps, depth information associated with the one or more images, and query proposals based on the depth information. A first transformer implements a cross-attention mechanism to process the set of image feature maps in accordance with the query proposals. The output of the first transformer is combined with a mask token to generate initial voxel features of the scene. A second transformer implements a self-attention mechanism to convert the initial voxel features into refined voxel features, which are up-sampled and processed by a lightweight neural network to generate the three-dimensional semantic information, which may be used by, e.g., an autonomous vehicle for various advanced driver assistance system (ADAS) functions.",
  "title": "SPARSE VOXEL TRANSFORMER FOR CAMERA-BASED 3D SEMANTIC SCENE COMPLETION"
}