{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreie4dg3z2c2rdtw6nutqkc5kvnbz352bzvmq3dsmxrzdabf3gowuty",
    "uri": "at://did:plc:4rgrdigiftglskeax4wvmsev/app.bsky.feed.post/3mfj45fu7nnn2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiflo6xt7is6b2iafwghkjahlgggocme5jwjsbeuqqwcywuvjhmszm"
    },
    "mimeType": "image/png",
    "size": 24783
  },
  "path": "/abs/2602.18003v1",
  "publishedAt": "2026-02-23T01:00:00.000Z",
  "site": "https://arxiv.org",
  "tags": [
    "Jongmin Lee",
    "Ernest K. Ryu"
  ],
  "textContent": "**Authors:** Jongmin Lee, Ernest K. Ryu\n\nWhile there is an extensive body of research analyzing policy gradient methods for discounted cumulative-reward MDPs, prior work on policy gradient methods for average-reward MDPs has been limited, with most existing results restricted to ergodic or unichain settings. In this work, we first establish a policy gradient theorem for average-reward multichain MDPs based on the invariance of the classification of recurrent and transient states. Building on this foundation, we develop refined analyses and obtain a collection of convergence and sample-complexity results that advance the understanding of this setting. In particular, we show that the proposed $α$-clipped policy mirror ascent algorithm attains an $ε$-optimal policy with respect to positive policies.",
  "title": "Policy Gradient Algorithms in Average-Reward Multichain MDPs"
}