`refusals_v3` moderation eval failing on fine-tuning jobs again — internal errors across `gpt-4.1-nano` and `gpt-4o-mini`
I’m hitting a consistent failure during the post-training moderation evaluation step on fine-tuning jobs. Training completes, the checkpoints and the fine-tuned model are created, and then the `refusals_v3` eval fails with an internal error and exhausts all 3 retry attempts.
Reproduced 3 times so far: 2 runs on gpt-4.1-nano and 1 run on gpt-4o-mini. Same failure pattern every time, different base models.
Event log from the most recent run (newest entries first):
Retrying moderation eval refusals_v3 (attempt 3/3) due to an internal error. 00:37:07
Retrying moderation eval refusals_v3 (attempt 2/3) due to an internal error. 00:26:56
Evaluating model against our usage policies 00:26:56
New fine-tuned model created 00:26:56
Checkpoint created at step 302 00:26:56
Checkpoint created at step 151 23:41:39
Fine-tuning job started 23:41:37
Files validated, moving job to queued state 23:41:36
Validating training file: file-PrsA2qk3fi3ppPc3S1Lkgq 23:41:36
Created fine-tuning job: ftjob-3S2R2CNYXZOiZUIIhd7x2Bqu
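For anyone who wants to check their own jobs for the same pattern, here is a rough sketch that scans event messages for the retry line and reports which evals ran out of attempts. The regex is based only on the message wording in the log above, and the function names (`summarize_retries`, `retries_exhausted`) are my own; this is not an official tool, and the message format may differ for other evals.

```python
import re

# Assumption: retry messages always follow the wording seen in the log above.
RETRY_RE = re.compile(
    r"Retrying moderation eval (?P<name>\S+) "
    r"\(attempt (?P<n>\d+)/(?P<total>\d+)\) due to an internal error"
)


def summarize_retries(messages):
    """Map each eval name to (highest attempt seen, max attempts)."""
    summary = {}
    for msg in messages:
        match = RETRY_RE.search(msg)
        if match is None:
            continue  # not a retry event
        name = match.group("name")
        attempt = int(match.group("n"))
        total = int(match.group("total"))
        previous_attempt, _ = summary.get(name, (0, total))
        summary[name] = (max(previous_attempt, attempt), total)
    return summary


def retries_exhausted(messages):
    """Return the sorted names of evals whose final attempt was reached."""
    return sorted(
        name
        for name, (attempt, total) in summarize_retries(messages).items()
        if attempt >= total
    )


# Event messages copied from the log above, timestamps stripped.
events = [
    "Retrying moderation eval refusals_v3 (attempt 3/3) due to an internal error.",
    "Retrying moderation eval refusals_v3 (attempt 2/3) due to an internal error.",
    "Evaluating model against our usage policies",
    "New fine-tuned model created",
]

print(summarize_retries(events))  # {'refusals_v3': (3, 3)}
print(retries_exhausted(events))  # ['refusals_v3']
```

The messages could come from the fine-tuning job events listing (in the Python SDK, `client.fine_tuning.jobs.list_events`), but I left the API call out so the snippet runs offline on pasted log lines.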
This looks like the same issue reported in February:
- Fine-tuning job fails after 3 retries during moderation eval refusals_v3 (internal error, gpt-4.1-mini-2025-04-14) — Feb 12, 2026
In that thread, multiple users confirmed the failure across different base models and completely different (benign) datasets, and it was eventually resolved on the service side (“The refusals_v3 service seems to be up and running again”). The thread was closed without an official root-cause post. There are also a couple of related threads from January/February with the same internal error pattern.
A few things I’d appreciate clarity on:
- Is `refusals_v3` having issues again? Nothing on the status page as of right now, and I haven’t found a recent post about it.
- The fine-tuned model is created before the eval runs. Is it actually usable, or does a failed moderation eval block deployment regardless?
- When the eval itself fails for internal reasons (not a content issue), what’s the recommended action?
Job ID and file ID are in the log above for anyone from the team who wants to dig in. Happy to share more details.
Thanks.