
Whisper API keeps returning empty transcript for videos longer than 30 minutes — stuck in production

OpenAI Developer Community May 1, 2026
Hey everyone,

Running into a consistent issue with the Whisper API and longer recordings, and not sure what the right fix is.

My setup right now:

* Pull the Zoom recording → convert to MP3 with FFmpeg → compress under 25MB → send to the Whisper API

Works fine for anything under 20 minutes. The moment I go past 30 minutes, the transcript either comes back empty or just cuts off mid-sentence with no error message. Whisper just returns 200 with partial or no content.

Already tried a few things:

* Splitting into chunks: works, but the speaker attribution gets completely lost between chunks, and stitching the context back together is messy.
* Lowering the bitrate more: quality drops so much that Whisper starts misidentifying words, especially with any background noise or non-native accents.
* Switching to gpt-4o-transcribe: hit the 1500 second limit, which is actually worse than Whisper for longer calls.

The real frustration is that the entire pipeline assumes you have a small local file. For any real meeting or interview recording, that is just not realistic without seriously degrading the audio.

Has anyone figured out a solid approach for this? Ideally something that:

* Takes the recording URL directly without needing to download and re-encode
* Handles 60-90 minute recordings reliably
* Keeps speaker labels intact

Open to completely different approaches if Whisper just isn't the right tool for this use case.
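For anyone who wants to reproduce the chunking workaround, here is a rough sketch of the bookkeeping side only: how long a file can be at a given bitrate before it hits the 25MB limit, how to plan overlapping chunk windows, and how to shift per-chunk timestamps back onto the full recording when stitching. The names `max_seconds_for_bitrate`, `plan_chunks`, and `stitch` are hypothetical helpers, and the segment shape (dicts with `start`, `end`, `text`) is an assumption about what the transcription step returns. It makes no API calls and does nothing to recover speaker labels.

```python
from dataclasses import dataclass

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # the 25MB upload limit from the post


@dataclass
class Chunk:
    index: int
    start: float  # seconds from the start of the full recording
    end: float


def max_seconds_for_bitrate(bitrate_kbps: float, max_bytes: int = MAX_UPLOAD_BYTES) -> float:
    """Longest constant-bitrate audio (in seconds) that fits under max_bytes."""
    return max_bytes * 8 / (bitrate_kbps * 1000)


def plan_chunks(duration: float, chunk_seconds: float, overlap_seconds: float = 5.0) -> list[Chunk]:
    """Split [0, duration] into windows that overlap by overlap_seconds."""
    if not 0 <= overlap_seconds < chunk_seconds:
        raise ValueError("overlap must be non-negative and shorter than a chunk")
    chunks: list[Chunk] = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        chunks.append(Chunk(len(chunks), start, end))
        if end >= duration:
            break
        start = end - overlap_seconds  # step back so a sentence is not cut at the seam
    return chunks


def stitch(chunks: list[Chunk], per_chunk_segments: list[list[dict]]) -> list[dict]:
    """Shift chunk-relative timestamps to recording time and drop overlap duplicates.

    Assumes each segment is a dict with 'start', 'end', 'text' (hypothetical shape).
    """
    merged: list[dict] = []
    last_end = 0.0
    for chunk, segments in zip(chunks, per_chunk_segments):
        for seg in segments:
            start = seg["start"] + chunk.start
            end = seg["end"] + chunk.start
            if end <= last_end:
                continue  # already covered by the previous chunk's overlap region
            merged.append({"start": start, "end": end, "text": seg["text"]})
            last_end = end
    return merged
```

With these numbers, a 64 kbps MP3 fits roughly 54 minutes into 25MB, so a 90 minute call needs either a lower bitrate or at least two chunks. The overlap only reduces mid-sentence cuts at the seams; speaker attribution across chunks would still need a separate step.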
