Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiaipzchfbowdfgucjsmdq2pztnwol6lx73o63slbkqnuq6hfj4qia",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mh64yezjomt2"
  },
  "path": "/t/best-practices-to-create-an-audio-dataset/174312#post_1",
  "publishedAt": "2026-03-16T09:13:22.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "Hello everyone,\n\nI have a question about how to split an audio dataset into training, validation, and test sets.\n\nIs there a guideline, such as using 10% for testing, 5% for validation, and 85% for training?\n\nAnd a bonus question: does it make a difference if the test and validation sets contain the same files? I mean the same MP3 files.\n\nThanks in advance",
  "title": "Best practices to create an audio dataset"
}