
Endorsement request — first arXiv cs.CL submission on cross-lingual GRPO at sub-3B scale

Hugging Face Forums [Unofficial] May 7, 2026

Hi everyone,

I’m an independent researcher in Vietnam preparing my first arXiv submission in cs.CL, and I’m looking for an endorsement to complete the submission process.

The paper extends Open-RS RS2 (Knoveleng and Ngo, 2025) at sub-3B scale on a single-A100, LoRA-based budget. It compares three GRPO training arms that vary one axis at a time: training language (English vs. Vietnamese-translated math) and reward function (with or without a fastText language-consistency reward).

The main finding is that the auxiliary reward, even when it fires a constant 1.0 on English training data, recovers 13.3 percentage points on AIME-2024 over the vanilla English-only run. This suggests that it acts as an implicit regularizer via PPO clipping geometry rather than through content signal. The Vietnamese-translated arm shows the same regularization signature at a smaller magnitude. The paper reports the LoRA gap explicitly (57.5 vs. 80 percent on AMC23) and states the single-seed limitation.

Endorsement code: S7LYVM (to endorse, log in to arXiv)

For transparency, the paper is already public on Zenodo (DOI 10.5281/zenodo.20061328) under the title "Beyond English-Only GRPO: Training Language and Auxiliary Reward as Implicit Regularizers in Sub-3B Math Reasoning". Code, configs, LoRA adapters, evaluation JSONs, and per-step training logs are released on GitHub under nhockid235/xling-grpo-sub3b (Apache-2.0).

If you have 3+ cs.* submissions in the last 5 years, you’re eligible to endorse. I appreciate any help or guidance, and I’m happy to answer questions about the work or share raw evaluation outputs if needed.

Thank you for your time.
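For anyone unfamiliar with the language-consistency reward mentioned above, here is a minimal Python sketch of the idea. It is illustrative only and is not the code from the repository: the function name `language_consistency_rewards`, the `detect` callable, and the toy detector are all assumptions made for this example. In a real run, `detect` would presumably wrap a fastText language-identification model, and the binary 1.0/0.0 scoring is likewise an assumption.

```python
from typing import Callable, List

# Hypothetical detector type: maps a text to a language code such as "en" or "vi".
LanguageDetector = Callable[[str], str]


def language_consistency_rewards(
    completions: List[str],
    target_lang: str,
    detect: LanguageDetector,
) -> List[float]:
    """Return 1.0 for each completion whose detected language matches target_lang, else 0.0."""
    return [1.0 if detect(text) == target_lang else 0.0 for text in completions]


def toy_detect(text: str) -> str:
    """Stand-in detector for this example only: flags Vietnamese by a few diacritics."""
    vietnamese_marks = set(
        "ăâđêôơưạảấầẩẫậắằẳẵặẹẻẽếềểễệỉịọỏốồổỗộớờởỡợụủứừửữựỳỵỷỹ"
    )
    return "vi" if any(ch in vietnamese_marks for ch in text.lower()) else "en"


english_batch = ["The answer is 42.", "So x equals 7."]
mixed_batch = ["The answer is 42.", "Vậy đáp án là 42."]

# Every completion in an all-English batch matches the "en" target,
# so the reward is the same constant for the whole group.
print(language_consistency_rewards(english_batch, "en", toy_detect))

# With a Vietnamese target, only the Vietnamese completion is rewarded.
print(language_consistency_rewards(mixed_batch, "vi", toy_detect))
```

The first call illustrates the constant-1.0 case described in the post: when every completion is already in the target language, the reward carries no information that distinguishes one completion from another.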
