{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreifc4lqowfi3zhqtqjopliogngmvxrv2kqbtocyun5ztzifnnpniqq",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mln2agl6fct2"
},
"path": "/t/refusals-v3-moderation-eval-failing-on-fine-tuning-jobs-again-internal-errors-across-gpt-4-1-nano-and-gpt-4o-mini/1380704#post_1",
"publishedAt": "2026-05-12T04:42:57.000Z",
"site": "https://community.openai.com",
"textContent": "I’m hitting a consistent failure during the post-training moderation evaluation step on fine-tuning jobs. Training completes, checkpoints and the fine-tuned model are created, then the `refusals_v3` eval fails with `internal error` and exhausts all 3 retry attempts.\n\nReproduced 3 times so far: 2 runs on `gpt-4.1-nano` and 1 run on `gpt-4o-mini`. Same failure pattern every time, different base models.\n\nEvent log from the most recent run:\n\n\n Retrying moderation eval refusals_v3 (attempt 3/3) due to an internal error. 00:37:07\n Retrying moderation eval refusals_v3 (attempt 2/3) due to an internal error. 00:26:56\n Evaluating model against our usage policies 00:26:56\n New fine-tuned model created 00:26:56\n Checkpoint created at step 302 00:26:56\n Checkpoint created at step 151 23:41:39\n Fine-tuning job started 23:41:37\n Files validated, moving job to queued state 23:41:36\n Validating training file: file-PrsA2qk3fi3ppPc3S1Lkgq 23:41:36\n Created fine-tuning job: ftjob-3S2R2CNYXZOiZUIIhd7x2Bqu\n\n\n**This looks like the same issue reported in February:**\n\n * Fine-tuning job fails after 3 retries during moderation eval refusals_v3 (internal error, gpt-4.1-mini-2025-04-14) — Feb 12, 2026\n\n\n\nIn that thread, multiple users confirmed the failure across different base models and completely different (benign) datasets, and it was eventually resolved on the service side (“The refusals_v3 service seems to be up and running again”). The thread was closed without an official root-cause post. There are also a couple of related threads from January/February with the same `internal error` pattern.\n\nA few things I’d appreciate clarity on:\n\n 1. Is `refusals_v3` having issues again? Nothing on the status page as of right now, and I haven’t found a recent post about it.\n 2. The fine-tuned model is created _before_ the eval runs — is it actually usable, or does a failed moderation eval block deployment regardless?\n 3. When the eval itself fails for internal reasons (not a content issue), what’s the recommended action?\n\n\n\nJob ID and file ID are in the log above for anyone from the team who wants to dig in. Happy to share more details.\n\nThanks.",
"title": "`refusals_v3` moderation eval failing on fine-tuning jobs again — internal errors across `gpt-4.1-nano` and `gpt-4o-mini`"
}