
Best practices to create an audio dataset

Hugging Face Forums [Unofficial] March 16, 2026
Hello everyone, I have a question about how to divide an audio dataset into training, validation, and test sets. Is there a guideline for the percentages, for example 85% for training, 5% for validation (evaluation), and 10% for testing? And a bonus question: does it make a difference if the test and validation files are the same, i.e. the exact same MP3 files? Thanks in advance.
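Here is a minimal sketch of the 85/5/10 split mentioned above. It assumes a flat list of MP3 paths and treats every file as independent. The helper names `split_files` and `overlap` and the `clip_XXXX.mp3` file names are hypothetical, and the percentages come from the question itself, not from an official guideline. The sketch keeps the validation and test files disjoint. That is the usual practice, because files used to tune or pick a model no longer give an independent test score.

```python
import random


def split_files(files, train=0.85, val=0.05, test=0.10, seed=42):
    """Shuffle file paths and cut them into disjoint train/val/test lists.

    Hypothetical helper; the default fractions are the 85/5/10 example
    from the question, not a fixed rule.
    """
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    files = sorted(files)          # new list in a fixed order, caller's list is untouched
    rng = random.Random(seed)      # seeded so the split is reproducible
    rng.shuffle(files)
    n = len(files)
    n_test = round(n * test)
    n_val = round(n * val)
    test_files = files[:n_test]
    val_files = files[n_test:n_test + n_val]
    train_files = files[n_test + n_val:]   # everything left goes to training
    return train_files, val_files, test_files


def overlap(a, b):
    """Return the file paths that appear in both splits (should be empty)."""
    return set(a) & set(b)


if __name__ == "__main__":
    clips = [f"clip_{i:04d}.mp3" for i in range(1000)]  # hypothetical file names
    train, val, test = split_files(clips)
    print(len(train), len(val), len(test))  # 850 50 100
    print(overlap(val, test))               # set()
```

The split happens at the file level, so each MP3 ends up in exactly one split. If several files come from the same recording or speaker, they would need to be grouped before splitting. Otherwise near-duplicates can still leak between the sets.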
