Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicvyb7e7elg2lydguwrgo7hy7zbgscu4yru7qr6b6xuczicgpxwdu",
    "uri": "at://did:plc:4rgrdigiftglskeax4wvmsev/app.bsky.feed.post/3mekn5hfbnuh2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiflo6xt7is6b2iafwghkjahlgggocme5jwjsbeuqqwcywuvjhmszm"
    },
    "mimeType": "image/png",
    "size": 24783
  },
  "path": "/abs/2602.10080v1",
  "publishedAt": "2026-02-11T01:00:00.000Z",
  "site": "https://arxiv.org",
  "tags": [
    "Zhengding Hu",
    "Jingwen Sun",
    "Le Jiang",
    "Yuhao Wang",
    "Junqing Lin",
    "Yi Zong",
    "Guangzhong Sun"
  ],
  "textContent": "**Authors:** Zhengding Hu, Jingwen Sun, Le Jiang, Yuhao Wang, Junqing Lin, Yi Zong, Guangzhong Sun\n\nAs one of the most fundamental problems in graph processing, the Single-Source Shortest Path (SSSP) problem plays a critical role in numerous application scenarios. However, existing GPU-based solutions remain inefficient, as they typically rely on a single, fixed queue design that incurs severe synchronization overhead, high memory latency, and poor adaptivity to diverse inputs. To address these inefficiencies, we propose MultiLevelMultiQueue (MLMQ), a novel data structure that distributes multiple queues across the GPU's multi-level parallelism and memory hierarchy. To realize MLMQ, we introduce a cache-like collaboration mechanism for efficient inter-queue coordination, and develop a modular queue design based on unified Read and Write primitives. Within this framework, we expand the optimization space by designing a set of GPU-friendly queues, composing them across multiple levels, and further providing an input-adaptive MLMQ configuration scheme. Our MLMQ design achieves average speedups of 1.87x to 17.13x over state-of-the-art implementations. Our code is open-sourced at https://github.com/Leo9660/MLMQ.git.",
  "title": "Beyond a Single Queue: Multi-Level-Multi-Queue as an Effective Design for SSSP problems on GPUs"
}