{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreid3ezx2nj6ud4tuegoklsh5qhfbi5d4xvzce7xqpncqvahvvceluu",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mkwedl7tf2g2"
},
"path": "/t/whisper-api-keeps-returning-empty-transcript-for-videos-longer-than-30-minutes-stuck-in-production/1380129#post_2",
"publishedAt": "2026-05-03T03:09:51.000Z",
"site": "https://community.openai.com",
"textContent": "**The 25MB limit on the OpenAI Whisper API is the main bottleneck here. Compressing a 90-minute recording to fit that size kills the audio quality (hence the hallucinations or empty transcripts).**\n\n**Since you’re looking for a production-ready solution that handles 60+ minutes and speaker labels, here are three solid approaches:**\n\n 1. **Deepgram API:** It’s often the go-to for long-form audio. It accepts URLs directly, doesn’t have that strict 25MB limit, and has excellent **diarization (speaker labels)** that stays consistent throughout the entire recording.\n\n 2. **AssemblyAI:** Similar to Deepgram, it’s built for long files and handles speaker diarization much better than ‘stitching’ Whisper chunks together.\n\n 3. **Self-hosted Whisper (Faster-Whisper):** If you have the infra (or use a GPU cloud like RunPod), you can run `faster-whisper`. You won’t have file size limits, and you can use `pyannote-audio` for speaker labeling, though it requires more setup.\n\n**Sticking with OpenAI’s Whisper for 90-minute files will always feel like a hack because of the chunking/compression trade-off.**",
"title": "Whisper API keeps returning empty transcript for videos longer than 30 minutes — stuck in production"
}