Whisper API keeps returning empty transcript for videos longer than 30 minutes — stuck in production

OpenAI Developer Community May 3, 2026

The 25MB limit on the OpenAI Whisper API is the main bottleneck here. Compressing a 90-minute recording to fit that size kills the audio quality (hence the hallucinations or empty transcripts).
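To put a rough number on that, here is a minimal sketch of the bitrate budget the limit leaves you. It assumes the 25MB cap is counted in decimal megabytes and ignores container overhead, so treat the results as ballpark figures. The function name is just illustrative.

```python
# Upload cap mentioned above; decimal megabytes are an assumption here.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def max_bitrate_kbps(duration_minutes: float, limit_bytes: int = MAX_UPLOAD_BYTES) -> float:
    """Highest average bitrate (kbit/s) that keeps a file of this length under the limit."""
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    duration_seconds = duration_minutes * 60
    # bytes -> bits, spread over the duration, then bits/s -> kbit/s
    return limit_bytes * 8 / duration_seconds / 1000


for minutes in (30, 60, 90):
    print(f"{minutes} min -> {max_bitrate_kbps(minutes):.1f} kbps")
```

For a 90-minute recording this works out to roughly 37 kbps for the entire file, which is the budget the compression has to squeeze into.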

Since you’re looking for a production-ready solution that handles 60+ minutes and speaker labels, here are three solid approaches:

Deepgram API: It’s often the go-to for long-form audio. It accepts URLs directly, doesn’t have that strict 25MB limit, and has excellent diarization (speaker labels) that stays consistent throughout the entire recording.
AssemblyAI: Similar to Deepgram, it’s built for long files and handles speaker diarization much better than ‘stitching’ Whisper chunks together.
Self-hosted Whisper (Faster-Whisper): If you have the infra (or use a GPU cloud like RunPod), you can run faster-whisper. You won’t have file size limits, and you can use pyannote-audio for speaker labeling, though it requires more setup.
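
For the self-hosted route, the "more setup" is mostly about joining two outputs: timestamped transcript segments from the transcriber and speaker turns from the diarizer. Below is a minimal sketch of one common way to do that, picking the speaker with the most time overlap for each segment. `Segment` and `SpeakerTurn` are hypothetical containers made up for this example, not types from faster-whisper or pyannote-audio, so you would need to map the libraries' real output into them.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment: start/end in seconds plus text."""
    start: float
    end: float
    text: str


@dataclass
class SpeakerTurn:
    """Hypothetical diarization turn: start/end in seconds plus a speaker label."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time ranges (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="UNKNOWN"):
    """Attach to each segment the speaker whose turns overlap it the most."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Fall back to a placeholder when no diarization turn covers the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled


segments = [Segment(0.0, 4.0, "Hello"), Segment(4.0, 9.0, "Hi there")]
turns = [SpeakerTurn(0.0, 4.5, "SPEAKER_00"), SpeakerTurn(4.5, 10.0, "SPEAKER_01")]
for speaker, text in label_segments(segments, turns):
    print(f"{speaker}: {text}")
```

Because the diarizer sees the whole recording in one pass here, the labels stay consistent from start to finish, which is the part that is hard to get when stitching separately processed chunks.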

Sticking with OpenAI’s Whisper for 90-minute files will always feel like a hack because of the chunking/compression trade-off.
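
If you do stay on the hosted API, the chunking side of that trade-off looks roughly like this. It is a sketch only: it assumes constant-bitrate audio, a 25MB cap counted in decimal megabytes, and an arbitrary 5-second overlap between chunks. `plan_chunks` is a made-up helper, and real files need some headroom for container overhead.

```python
import math


def plan_chunks(duration_s: float, bitrate_kbps: float,
                limit_bytes: int = 25_000_000, overlap_s: float = 5.0):
    """Return (start, end) offsets in seconds so each chunk stays under limit_bytes."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    max_chunk_s = math.floor(limit_bytes / bytes_per_second)
    if max_chunk_s <= overlap_s:
        raise ValueError("bitrate too high for this limit and overlap")
    step = max_chunk_s - overlap_s  # each new chunk re-covers the last overlap_s seconds
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        start += step
    return chunks


# A 90-minute recording kept at 128 kbps instead of being crushed to fit one upload.
for start, end in plan_chunks(90 * 60, 128):
    print(f"{start:.0f}s -> {end:.0f}s")
```

At 128 kbps a 90-minute file splits into four chunks. Every boundary is a spot where a sentence can get cut and where transcripts (and any speaker labels) have to be reconciled afterwards, which is why this approach feels like a hack.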
