Best practices to create an audio dataset
Hugging Face Forums [Unofficial]
March 16, 2026
Hello everyone,
I have a question about how to split an audio dataset into training, validation, and test sets.
Is there a guideline, such as using 10% for testing, 5% for validation, and 85% for training?
And my bonus question: does it make a difference if the test and validation sets contain the same files? I mean literally the same MP3 files.
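To make the question concrete, here is a minimal sketch of the kind of split I mean. It uses only the Python standard library; the `split_files` helper, the fixed seed, and the `clip_000.mp3` style file names are made up for illustration and do not come from any library. It cuts one shuffled list into three slices, so no file can land in more than one set, which is the case I am comparing against in the bonus question.

```python
import random


def split_files(paths, train=0.85, val=0.05, seed=0):
    """Shuffle file paths and cut them into disjoint train/validation/test lists.

    Hypothetical helper: whatever is left after the train and validation
    slices becomes the test set (10% with the default fractions).
    """
    files = sorted(paths)                # stable starting order
    random.Random(seed).shuffle(files)   # reproducible shuffle
    n_train = round(len(files) * train)
    n_val = round(len(files) * val)
    return {
        "train": files[:n_train],
        "validation": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],
    }


# Hypothetical file names, only for illustration.
clips = [f"clip_{i:03d}.mp3" for i in range(100)]
splits = split_files(clips)
```

With a real dataset the list of paths would come from the audio folder instead of the generated names.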
Thanks in advance