Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiasghtolqmo6lovi2k7b2jw44f4eraulepjxquxqwenu43gk6paj4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkqc2lynvqm2"
  },
  "path": "/t/made-a-python-failure-dataset-for-dpo-rlhf-how-do-you-source-negative-examples/175567#post_2",
  "publishedAt": "2026-04-30T17:33:12.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Full Bundle",
    "@b.com"
  ],
  "textContent": "Working on a planner-executor. The planner outputs a code spec as JSON and the executor reads it. Kept running into the same failure: the model returns a valid JSON object but wraps it in json… backticks and prepends an explanation. The parser dies before it reaches the first field.\n\nPrompt constraints help: “Return only JSON, no fences, no preamble.” That works for a session, but with new inputs or slightly different phrasing the decoration comes back.\n\nEnded up building training data specifically for this. One row from DinoDS Lane 16 (Code JSON Spec Mode):\n\nJSONL Row (public proof):\n\n\n    {\"sample_id\": \"lane_16_spec_example_01\", \"mode\": \"quick\", \"tone\": \"professional\",\n     \"intent_family\": \"content_generation\", \"intent_subtype\": \"code_json_spec\",\n     \"representation_choice\": \"plain_text\",\n     \"user_message\": \"Give me a JSON spec for a function that validates email addresses.\",\n     \"assistant_response\": \"{\\\"task_type\\\":\\\"simple_function\\\",\\\"language\\\":\\\"python\\\",\\\"files\\\":[{\\\"name\\\":\\\"email_validator.py\\\",\\\"purpose\\\":\\\"Validate email strings\\\",\\\"exports\\\":[\\\"is_valid_email\\\"]}],\\\"constraints\\\":[\\\"No external dependencies\\\",\\\"Return boolean only\\\"],\\\"tests\\\":[\\\"valid: a@b.com\\\",\\\"invalid: a@@b.com\\\"]}\"}\n\n\n\nThe key is representation_choice: plain_text, which explicitly labels the output contract as raw structured data rather than a formatted response. Still running experiments on whether SFT coverage alone locks this in or whether DPO pairs are needed on the fence/no-fence contrast.\n\nIf you have failure logs in this area, I'm curious which inputs trigger the regression. If you need the Full Bundle, happy to discuss it.",
  "title": "Made a Python failure dataset for DPO/RLHF — how do you source negative examples?"
}
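
The failure described in textContent (a valid JSON object wrapped in fences with a preamble in front) can also be tolerated at parse time while the training-side fix is still being tested. What follows is a minimal sketch, not the post author's method: the function name `extract_json_object` and the sample string `decorated` are hypothetical, and it assumes the executor only needs the first JSON object that appears in the model output.

```python
import json


def extract_json_object(raw: str) -> dict:
    """Return the first JSON object embedded in raw model output.

    Tries every "{" as a candidate start, so a preamble or a code fence
    in front of the object does not stop the parse.
    """
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value and ignores trailing text,
            # such as a closing fence after the object.
            obj, _consumed = decoder.raw_decode(raw[start:])
            return obj
        except json.JSONDecodeError:
            # Not a JSON object at this brace; try the next one.
            start = raw.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


# Hypothetical decorated output: preamble plus a fenced JSON body.
FENCE = "`" * 3
decorated = (
    "Here is the spec you asked for:\n"
    + FENCE + "json\n"
    + '{"task_type": "simple_function", "language": "python"}\n'
    + FENCE
)

spec = extract_json_object(decorated)
print(spec["task_type"])
```

This only hides the symptom in the executor. It does nothing about the regression itself, and logging each time the fallback path is taken would be one way to collect the failure logs the post asks about.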