{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreigwjz3y67ojt72jz3dtpkpqeivlhpraqrqpr2amqwelqlaumvpqjm",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mlamfqgk2ps2"
  },
  "path": "/t/endorsement-request-first-arxiv-cs-cl-submission-on-cross-lingual-grpo-at-sub-3b-scale/175805#post_1",
  "publishedAt": "2026-05-07T06:39:53.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Log in to arXiv | arXiv e-print repository",
    "Beyond English-Only GRPO: Training Language and Auxiliary Reward as Implicit Regularizers in Sub-3B Math Reasoning"
  ],
  "textContent": "Hi everyone, I’m an independent researcher in Vietnam preparing to submit a paper to arXiv in cs.CL, and I’m currently looking for an endorsement to complete the submission process.\n\nThe paper extends Open-RS RS2 (Knoveleng and Ngo, 2025) at sub-3B scale on a single-A100 plus LoRA budget. It compares three GRPO training arms that vary one axis at a time: training language (English vs\nVietnamese-translated math) and reward function (with or without a fastText language-consistency reward). The main finding is that the auxiliary reward, even when it fires constant 1.0 on English training data,\nrecovers 13.3 percentage points on AIME-2024 over the vanilla English-only run, suggesting it acts as an implicit regularizer via PPO clipping geometry rather than through content signal. The\nVietnamese-translated arm shows the same regularization signature at smaller magnitude. The paper documents the LoRA gap honestly (57.5 vs 80 percent on AMC23) and acknowledges single-seed limitations openly.\n\nEndorsement code: S7LYVM\nEndorse here: Log in to arXiv | arXiv e-print repository\n\nFor transparency, the paper is already public on Zenodo with DOI 10.5281/zenodo.20061328: Beyond English-Only GRPO: Training Language and Auxiliary Reward as Implicit Regularizers in Sub-3B Math Reasoning\n\nCode, configs, LoRA adapters, evaluation JSONs, and per-step training logs are released on GitHub under nhockid235/xling-grpo-sub3b (Apache-2.0).\n\nIf you have 3+ cs.* submissions in the last 5 years, you’re eligible to endorse. I appreciate any help or guidance, and I’m happy to answer questions about the work or share raw evaluation outputs if needed.\nThank you for your time.",
  "title": "Endorsement request — first arXiv cs.CL submission on cross-lingual GRPO at sub-3B scale"
}