How Do I Make STT Work for My AI VTuber on Discord VC Calls?

Hugging Face Forums [Unofficial] May 2, 2026

In this case, the most critical issue is the short duration of the audio, so I don’t think adjusting the options passed to Whisper will solve the problem on its own. You’ll likely need to make the WAV file itself longer.

Specifically, given the number of samples in that WAV file, the audio lasts only about one second even if the sampling rate is 16 kHz; at a higher sampling rate, it would be less than one second.
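To check this on your own file, the duration is just the number of frames divided by the sampling rate. Here is a minimal sketch using Python's standard `wave` module. The helper names and the 16,000-frame test clip are hypothetical, picked only to mirror the "about one second at 16 kHz" case; your file's actual frame count will differ.

```python
import io
import wave


def duration_seconds(num_frames: int, sample_rate: int) -> float:
    """Duration = frames per channel / frames per second."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_frames / sample_rate


def wav_duration_seconds(source) -> float:
    """Read a WAV header (path or binary file-like object) and return its length in seconds."""
    with wave.open(source, "rb") as wav:
        return duration_seconds(wav.getnframes(), wav.getframerate())


def make_silent_wav(num_frames: int, sample_rate: int) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * num_frames)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Hypothetical clip: 16,000 frames declared as 16 kHz.
    clip = make_silent_wav(16000, 16000)
    print(f"at 16 kHz: {wav_duration_seconds(clip):.2f} s")
    # The same frame count read as 48 kHz audio is much shorter.
    print(f"at 48 kHz: {duration_seconds(16000, 48000):.2f} s")
```

Passing the path of your recorded file to `wav_duration_seconds` should tell you quickly whether the clip is in the sub-second range described here.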

“Generating text from audio that’s less than a second long” is probably a bit outside the scope of the model’s design…
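One way to make the clip longer, as suggested above, might be to keep accumulating incoming PCM chunks and only hand the audio to Whisper once enough has built up. The sketch below is just an illustration under assumptions: `UtteranceBuffer` is a hypothetical helper, the audio is assumed to be 16 kHz mono 16-bit PCM already, and the 3-second default threshold is a guess rather than a documented Whisper requirement.

```python
from typing import Optional


class UtteranceBuffer:
    """Collect raw PCM chunks until enough audio has accumulated to transcribe."""

    def __init__(
        self,
        sample_rate: int = 16000,   # assumed input format: 16 kHz
        sample_width: int = 2,      # bytes per sample (16-bit PCM)
        channels: int = 1,          # mono
        min_seconds: float = 3.0,   # assumed threshold; tune for your setup
    ) -> None:
        self.bytes_per_second = sample_rate * sample_width * channels
        self.min_bytes = int(min_seconds * self.bytes_per_second)
        self._data = bytearray()

    def add(self, pcm: bytes) -> None:
        """Append one chunk of raw PCM bytes."""
        self._data.extend(pcm)

    @property
    def seconds(self) -> float:
        """Length of the buffered audio in seconds."""
        return len(self._data) / self.bytes_per_second

    def pop_if_ready(self) -> Optional[bytes]:
        """Return the buffered PCM and reset once it is long enough, else None."""
        if len(self._data) < self.min_bytes:
            return None
        data = bytes(self._data)
        self._data.clear()
        return data


if __name__ == "__main__":
    buf = UtteranceBuffer(min_seconds=2.0)
    one_second = b"\x00\x00" * 16000  # 1 s of silence at 16 kHz mono 16-bit
    buf.add(one_second)
    print(buf.pop_if_ready() is None)  # still too short to transcribe
    buf.add(one_second)
    ready = buf.pop_if_ready()
    print(len(ready) / buf.bytes_per_second, "seconds ready to write to a WAV")
```

The returned bytes can then be written to a WAV file with the standard `wave` module and passed to Whisper as before. In a real bot you would probably also want to flush the buffer when the speaker goes silent, so that short utterances are not held back indefinitely.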
