Whisper API keeps returning empty transcript for videos longer than 30 minutes — stuck in production
OpenAI Developer Community
May 1, 2026
Hey everyone,
Running into a consistent issue with Whisper API and longer recordings and not sure what the right fix is.
My setup right now:
* Pull the Zoom recording → convert to MP3 with FFmpeg → compress under 25MB → send to Whisper API
Works fine for anything under 20 minutes. Once a recording goes past 30 minutes, the transcript either comes back empty or cuts off mid-sentence, with no error message: the API returns HTTP 200 with partial or no content.
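For reference, here is a rough sketch of the size budget the "compress under 25MB" step is working against. It assumes the cap is 25 MiB and constant-bitrate mono audio. The helper names and the 5% safety margin are my own and not part of any API. The FFmpeg flags used (`-vn`, `-ac`, `-b:a`) are standard ones.

```python
import math

# Assumption: the 25MB upload cap is treated as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def max_bitrate_kbps(duration_seconds: float, safety_margin: float = 0.95) -> int:
    """Highest constant audio bitrate (kbit/s) that keeps the file under the cap.

    safety_margin leaves headroom for container overhead (hypothetical value).
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    usable_bits = MAX_UPLOAD_BYTES * 8 * safety_margin
    return math.floor(usable_bits / duration_seconds / 1000)


def ffmpeg_args(src: str, dst: str, kbps: int) -> list[str]:
    """Build (but do not run) an FFmpeg command for an audio-only mono encode."""
    # -vn drops the video stream, -ac 1 downmixes to mono, -b:a sets the audio bitrate
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-b:a", f"{kbps}k", dst]


if __name__ == "__main__":
    for minutes in (30, 60, 90):
        print(minutes, "min ->", max_bitrate_kbps(minutes * 60), "kbit/s")
```

By this arithmetic a 90-minute recording has to be encoded at roughly 36 kbit/s or less to fit in a single upload. In practice the value would still need rounding down to a bitrate the encoder supports.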
Already tried a few things:
* Splitting into chunks: works, but the speaker attribution gets completely lost between chunks and stitching the context back together is messy.
* Lowering the bitrate more: quality drops so much that Whisper starts misidentifying words, especially with any background noise or non-native accents.
* Switching to gpt-4o-transcribe: hit the 1500-second (25-minute) limit, which is actually worse than Whisper for longer calls.
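To make the chunking attempt concrete, here is a minimal sketch of overlapping windows plus timestamp-offset stitching. It assumes each chunk's transcript comes back as timestamped segments with start, end, and text relative to the chunk. The `Segment` type, the function names, and the default chunk and overlap lengths are all hypothetical. It only puts text back on one timeline and does nothing for speaker labels.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def plan_chunks(total_seconds: float, chunk_seconds: float = 1200.0,
                overlap_seconds: float = 15.0) -> list[tuple[float, float]]:
    """Return (start, end) windows covering the recording, each overlapping the last."""
    if not 0 <= overlap_seconds < chunk_seconds:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    windows = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break
        start = end - overlap_seconds  # step back so a sentence cut at the edge is heard twice
    return windows


def stitch(windows: list[tuple[float, float]],
           per_chunk_segments: list[list[Segment]]) -> list[Segment]:
    """Shift chunk-relative segments onto the global timeline and skip overlap repeats.

    Naive rule: a segment is dropped if it starts before the end of what earlier
    chunks already covered. A segment straddling that boundary is lost, so this
    is a starting point rather than a robust merge.
    """
    merged: list[Segment] = []
    covered_until = 0.0
    for (chunk_start, _), segments in zip(windows, per_chunk_segments):
        for seg in segments:
            global_start = seg.start + chunk_start
            if global_start < covered_until:
                continue  # already transcribed by the previous chunk
            merged.append(Segment(global_start, seg.end + chunk_start, seg.text))
        if merged:
            covered_until = merged[-1].end
    return merged
```

With the defaults, a 60-minute recording is planned as four windows, the last one short. The overlap is what makes the stitching messy: any text in the overlap is transcribed twice and has to be deduplicated somehow.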
The real frustration is that the entire pipeline assumes you have a small local file. For any real meeting or interview recording, that is just not realistic without seriously degrading the audio.
Has anyone figured out a solid approach for this? Ideally something that:
* Takes the recording URL directly without needing to download and re-encode
* Handles 60-90 minute recordings reliably
* Keeps speaker labels intact
Open to completely different approaches if Whisper just isn’t the right tool for this use case.