
Looking for arXiv endorsement (cs.LG) – RL fine-tuning for VLMs (GRPO, MathVista)

Hugging Face Forums [Unofficial] April 4, 2026

Hi everyone,

I am seeking an arXiv endorsement for cs.LG (Machine Learning) to submit my first paper on RL fine-tuning for vision-language models.

Background: MS in AI (Purdue), working on RL + VLM training systems.

Paper: A Case Study of Staged Metric-Gated GRPO for Visual Numeric Reasoning
PDF: https://github.com/kgaero/RL_GSPO_Qwen2.5VLM/blob/main/paper/staged_metric_gated_grpo.pdf

Short summary:

  • Staged RL fine-tuning pipeline for VLMs (GRPO-based)

  • Curriculum over MathVista subsets

  • Metric-gated reward adaptation (structure → correctness)

  • Checkpoint-aware continuation via alias-based selection
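For readers unfamiliar with GRPO, here is a minimal Python sketch of two ideas from the summary, written under stated assumptions. The advantage function follows the standard GRPO recipe: each sampled completion's reward is normalized by the mean and standard deviation of its own group. The metric gate is hypothetical. `gated_reward`, `structure_rate`, and the `gate=0.9` threshold are illustrative names and values, not the exact rule or thresholds used in the paper.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each completion's reward against
    the mean and standard deviation of its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def gated_reward(format_ok: bool, answer_ok: bool, structure_rate: float,
                 gate: float = 0.9) -> float:
    """Hypothetical metric gate (illustrative, not the paper's rule):
    reward well-formed output until the running structure rate reaches
    `gate`, then reward only answers that are both well-formed and correct."""
    if structure_rate < gate:
        return 1.0 if format_ok else 0.0
    return 1.0 if (format_ok and answer_ok) else 0.0


# One group of 4 sampled completions for the same image/question,
# each given as (format_ok, answer_ok).
group = [(True, True), (True, False), (False, False), (True, True)]

# Stage 1: structure rate is below the gate, so only format is rewarded.
early = [gated_reward(f, a, structure_rate=0.5) for f, a in group]
# Stage 2: the gate is passed, so only well-formed correct answers score.
late = [gated_reward(f, a, structure_rate=0.95) for f, a in group]

print(early)  # [1.0, 1.0, 0.0, 1.0]
print(late)   # [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(late))
```

The gate keeps early training focused on well-formed outputs, where the reward signal is dense, and only then switches to the sparser correctness signal. A group where every completion gets the same reward yields zero advantages, so it contributes no gradient; the paper's actual schedule and thresholds may differ.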

Main result: exact-match accuracy improves from 0.375 to 0.75 while output structure stays stable, under constrained compute.

If you’re eligible to endorse (cs.LG or related), I’d greatly appreciate it. Happy to share endorsement details via DM.

Thanks!
