Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifc4lqowfi3zhqtqjopliogngmvxrv2kqbtocyun5ztzifnnpniqq",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mln2agl6fct2"
  },
  "path": "/t/refusals-v3-moderation-eval-failing-on-fine-tuning-jobs-again-internal-errors-across-gpt-4-1-nano-and-gpt-4o-mini/1380704#post_1",
  "publishedAt": "2026-05-12T04:42:57.000Z",
  "site": "https://community.openai.com",
  "textContent": "I’m hitting a consistent failure during the post-training moderation evaluation step on fine-tuning jobs. Training completes, checkpoints and the fine-tuned model are created, then the `refusals_v3` eval fails with `internal error` and exhausts all 3 retry attempts.\n\nReproduced 3 times so far: 2 runs on `gpt-4.1-nano` and 1 run on `gpt-4o-mini`. Same failure pattern every time, different base models.\n\nEvent log from the most recent run:\n\n\n    Retrying moderation eval refusals_v3 (attempt 3/3) due to an internal error.   00:37:07\n    Retrying moderation eval refusals_v3 (attempt 2/3) due to an internal error.   00:26:56\n    Evaluating model against our usage policies                                    00:26:56\n    New fine-tuned model created                                                   00:26:56\n    Checkpoint created at step 302                                                 00:26:56\n    Checkpoint created at step 151                                                 23:41:39\n    Fine-tuning job started                                                        23:41:37\n    Files validated, moving job to queued state                                    23:41:36\n    Validating training file: file-PrsA2qk3fi3ppPc3S1Lkgq                          23:41:36\n    Created fine-tuning job: ftjob-3S2R2CNYXZOiZUIIhd7x2Bqu\n\n\n**This looks like the same issue reported in February:**\n\n  * Fine-tuning job fails after 3 retries during moderation eval refusals_v3 (internal error, gpt-4.1-mini-2025-04-14) — Feb 12, 2026\n\n\n\nIn that thread, multiple users confirmed the failure across different base models and completely different (benign) datasets, and it was eventually resolved on the service side (“The refusals_v3 service seems to be up and running again”). The thread was closed without an official root-cause post. There are also a couple of related threads from January/February with the same `internal error` pattern.\n\nA few things I’d appreciate clarity on:\n\n  1. Is `refusals_v3` having issues again? Nothing on the status page as of right now, and I haven’t found a recent post about it.\n  2. The fine-tuned model is created _before_ the eval runs — is it actually usable, or does a failed moderation eval block deployment regardless?\n  3. When the eval itself fails for internal reasons (not a content issue), what’s the recommended action?\n\n\n\nJob ID and file ID are in the log above for anyone from the team who wants to dig in. Happy to share more details.\n\nThanks.",
  "title": "`refusals_v3` moderation eval failing on fine-tuning jobs again — internal errors across `gpt-4.1-nano` and `gpt-4o-mini`"
}