{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreie4dg3z2c2rdtw6nutqkc5kvnbz352bzvmq3dsmxrzdabf3gowuty",
"uri": "at://did:plc:4rgrdigiftglskeax4wvmsev/app.bsky.feed.post/3mfj45fu7nnn2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreiflo6xt7is6b2iafwghkjahlgggocme5jwjsbeuqqwcywuvjhmszm"
},
"mimeType": "image/png",
"size": 24783
},
"path": "/abs/2602.18003v1",
"publishedAt": "2026-02-23T01:00:00.000Z",
"site": "https://arxiv.org",
"tags": [
"Jongmin Lee",
"Ernest K. Ryu"
],
"textContent": "**Authors:** Jongmin Lee, Ernest K. Ryu\n\nWhile there is an extensive body of research analyzing policy gradient methods for discounted cumulative-reward MDPs, prior work on policy gradient methods for average-reward MDPs has been limited, with most existing results restricted to ergodic or unichain settings. In this work, we first establish a policy gradient theorem for average-reward multichain MDPs based on the invariance of the classification of recurrent and transient states. Building on this foundation, we develop refined analyses and obtain a collection of convergence and sample-complexity results that advance the understanding of this setting. In particular, we show that the proposed $α$-clipped policy mirror ascent algorithm attains an $ε$-optimal policy with respect to positive policies.",
"title": "Policy Gradient Algorithms in Average-Reward Multichain MDPs"
}