Made a Python failure dataset for DPO/RLHF — how do you source negative examples?

Hugging Face Forums [Unofficial] April 26, 2026

Hi everyone,

I’ve been quietly building a Python failure dataset for DPO / RLHF training over the past couple of weeks, with the pipeline running 24/7 on a single RTX 4060.

The basic idea: an autopilot pipeline generates Python code attempts for various CS domains (FFT, Monte Carlo, ZKP, etc.), runs each in a sandboxed pytest container, and keeps the genuine failures with error logs as rejected-side training data.
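For anyone who wants to picture the run-and-keep-failures step, here is a minimal sketch, not the actual pipeline. It assumes a plain subprocess running the attempt plus assert-style tests as a stand-in for the sandboxed pytest container, and the names `run_attempt`, `to_rejected_row`, and the row fields `prompt` / `rejected` / `error_log` are hypothetical, not the dataset's real schema.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_attempt(code: str, test_code: str, timeout_s: float = 10.0) -> dict:
    """Run one generated attempt plus its tests in a fresh interpreter.

    NOTE: a bare subprocess is NOT a sandbox. It only stands in for the
    containerized pytest run described in the post.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "attempt.py"
        script.write_text(code + "\n\n" + test_code + "\n", encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=workdir,
            )
            passed = proc.returncode == 0
            log = proc.stderr
        except subprocess.TimeoutExpired:
            # Hangs count as failures too; record why instead of a traceback.
            passed, log = False, f"timeout after {timeout_s}s"
    return {"passed": passed, "error_log": log}


def to_rejected_row(prompt: str, code: str, result: dict):
    """Keep only genuine failures; passing attempts return None."""
    if result["passed"]:
        return None
    # Hypothetical field names, chosen for illustration only.
    return {"prompt": prompt, "rejected": code, "error_log": result["error_log"]}


if __name__ == "__main__":
    prompt = "Write mean(xs) returning the arithmetic mean of a list."
    buggy = "def mean(xs):\n    return sum(xs) / (len(xs) - 1)"  # off-by-one bug
    tests = "assert mean([1, 2, 3]) == 2"
    row = to_rejected_row(prompt, buggy, run_attempt(buggy, tests))
    print(row["error_log"] if row else "attempt passed, nothing kept")
```

Storing the raw stderr next to the failing code is what lets a row serve as a rejected-side example with its error log attached; pairing it with a chosen-side answer is a separate step not shown here.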

Quick stats:

~2K failure rows shipped (v1, v2)
19 CS domains covered
146 downloads since launch

Two questions for DPO / RLHF practitioners here:

1. How are you currently sourcing negative examples for DPO? Do you have your own pipeline, or rely on synthetic data from larger models? Curious about the trade-offs you’ve found.

2. What domains do you most need failure data for? I can shift the autopilot’s domain priority within a few days, so concrete requests directly shape what gets generated next.

Free sample (100 rows):

namakoo/idfu-verified-code · Datasets at Hugging Face

Even one-line replies help calibrate the next release.

-- namakoo