
Endorsement request — first arXiv cs.CL submission on cross-lingual GRPO at sub-3B scale

Hugging Face Forums [Unofficial] May 7, 2026
Hi everyone,

I’m an independent researcher in Vietnam preparing to submit a paper to arXiv in cs.CL, and I’m looking for an endorsement to complete the submission.

The paper extends Open-RS RS2 (Knoveleng and Ngo, 2025) at sub-3B scale on a single-A100 plus LoRA budget. It compares three GRPO training arms that vary one axis at a time: training language (English vs Vietnamese-translated math) and reward function (with or without a fastText language-consistency reward).

The main finding is that the auxiliary reward, even when it fires a constant 1.0 on English training data, recovers 13.3 percentage points on AIME-2024 over the vanilla English-only run. This suggests it acts as an implicit regularizer via PPO clipping geometry rather than through content signal. The Vietnamese-translated arm shows the same regularization signature at smaller magnitude. The paper reports the LoRA gap directly (57.5 vs 80 percent on AMC23) and acknowledges the single-seed limitation.

Endorsement code: S7LYVM (to endorse, log in to arXiv and enter the code on the endorsement page)

For transparency, the paper is already public on Zenodo with DOI 10.5281/zenodo.20061328: "Beyond English-Only GRPO: Training Language and Auxiliary Reward as Implicit Regularizers in Sub-3B Math Reasoning".

Code, configs, LoRA adapters, evaluation JSONs, and per-step training logs are released on GitHub under nhockid235/xling-grpo-sub3b (Apache-2.0).

If you have 3+ cs.* submissions in the last 5 years, you’re eligible to endorse. I appreciate any help or guidance, and I’m happy to answer questions about the work or share raw evaluation outputs if needed.

Thank you for your time.
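
For context, a language-consistency reward of the kind described above can be sketched roughly as follows. This is an illustrative sketch, not the code from the repository: the function name `language_consistency_reward`, its signature, and the `toy_detector` stand-in are all hypothetical, and a real setup would plug in a fastText language-ID model where the toy detector is. It also assumes a binary 1.0/0.0 reward, which the post does not confirm beyond the "constant 1.0" remark.

```python
from typing import Callable, List


def language_consistency_reward(
    completions: List[str],
    target_lang: str,
    detect_lang: Callable[[str], str],
) -> List[float]:
    """Score each completion 1.0 if its detected language matches target_lang, else 0.0.

    detect_lang is any callable mapping text to a language code; in practice
    this would wrap a fastText language-ID model (assumption, not shown here).
    """
    return [1.0 if detect_lang(text) == target_lang else 0.0 for text in completions]


def toy_detector(text: str) -> str:
    # Hypothetical stand-in for a real language-ID model: any non-ASCII
    # character is treated as a sign of Vietnamese.
    return "en" if text.isascii() else "vi"


# On all-English completions with an English target, every reward is 1.0,
# which is the "fires constant 1.0" situation described in the post.
english_batch = ["The answer is 42.", "So x = 7."]
rewards = language_consistency_reward(english_batch, "en", toy_detector)
```

Passing the detector in as a callable keeps the sketch independent of any particular model file, so the same function can be checked with a stub and then run with a real detector.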
