{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreifs5qn4wajxhpjxv4gpxsswuxzmcs77fearymnb7fah6fk4twy4qu",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mio6vwwhctu2"
},
"path": "/t/looking-for-arxiv-endorsement-cs-lg-rl-fine-tuning-for-vlms-grpo-mathvista/174948#post_1",
"publishedAt": "2026-04-04T10:18:01.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"https://github.com/kgaero/RL_GSPO_Qwen2.5VLM/blob/main/paper/staged_metric_gated_grpo.pdf"
],
"textContent": "Hi everyone,\n\nI am seeking an arXiv endorsement for **cs.LG (Machine Learning)** to submit my first paper on RL fine-tuning for vision-language models.\n\n**Background:**\nMS in AI (Purdue), working on RL + VLM training systems.\n\n**Paper:**\n_A Case Study of Staged Metric-Gated GRPO for Visual Numeric Reasoning_\nPDF:\nhttps://github.com/kgaero/RL_GSPO_Qwen2.5VLM/blob/main/paper/staged_metric_gated_grpo.pdf\n\n**Short summary:**\n\n * Staged RL fine-tuning pipeline for VLMs (GRPO-based)\n\n * Curriculum over MathVista subsets\n\n * Metric-gated reward adaptation (structure → correctness)\n\n * Checkpoint-aware continuation via alias-based selection\n\n\n\n\n**Main result:**\nExact-match improves **0.375 → 0.75** with stable structure under constrained compute.\n\nIf you’re eligible to endorse (cs.LG or related), I’d greatly appreciate it.\nHappy to share endorsement details via DM.\n\nThanks!",
"title": "Looking for arXiv endorsement (cs.LG) – RL fine-tuning for VLMs (GRPO, MathVista)"
}