Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifs5qn4wajxhpjxv4gpxsswuxzmcs77fearymnb7fah6fk4twy4qu",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mio6vwwhctu2"
  },
  "path": "/t/looking-for-arxiv-endorsement-cs-lg-rl-fine-tuning-for-vlms-grpo-mathvista/174948#post_1",
  "publishedAt": "2026-04-04T10:18:01.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "https://github.com/kgaero/RL_GSPO_Qwen2.5VLM/blob/main/paper/staged_metric_gated_grpo.pdf"
  ],
  "textContent": "Hi everyone,\n\nI am seeking an arXiv endorsement for **cs.LG (Machine Learning)** to submit my first paper, on RL fine-tuning for vision-language models (VLMs).\n\n**Background:**\nMS in AI (Purdue), working on RL + VLM training systems.\n\n**Paper:**\n_A Case Study of Staged Metric-Gated GRPO for Visual Numeric Reasoning_\nPDF: https://github.com/kgaero/RL_GSPO_Qwen2.5VLM/blob/main/paper/staged_metric_gated_grpo.pdf\n\n**Short summary:**\n\n  * Staged RL fine-tuning pipeline for VLMs (GRPO-based)\n  * Curriculum over MathVista subsets\n  * Metric-gated reward adaptation (structure → correctness)\n  * Checkpoint-aware continuation via alias-based selection\n\n**Main result:**\nExact-match score improves from **0.375 to 0.75**, with stable structure under constrained compute.\n\nIf you’re eligible to endorse for cs.LG (or a related category), I’d greatly appreciate it.\nHappy to share endorsement details via DM.\n\nThanks!",
  "title": "Looking for arXiv endorsement (cs.LG) – RL fine-tuning for VLMs (GRPO, MathVista)"
}