{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreid3ezx2nj6ud4tuegoklsh5qhfbi5d4xvzce7xqpncqvahvvceluu",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mkwedl7tf2g2"
  },
  "path": "/t/whisper-api-keeps-returning-empty-transcript-for-videos-longer-than-30-minutes-stuck-in-production/1380129#post_2",
  "publishedAt": "2026-05-03T03:09:51.000Z",
  "site": "https://community.openai.com",
  "textContent": "**The 25MB limit on the OpenAI Whisper API is the main bottleneck here. Compressing a 90-minute recording to fit that size kills the audio quality (hence the hallucinations or empty transcripts).**\n\n**Since you’re looking for a production-ready solution that handles 60+ minutes and speaker labels, here are three solid approaches:**\n\n  1. **Deepgram API:** It’s often the go-to for long-form audio. It accepts URLs directly, doesn’t have that strict 25MB limit, and has excellent **diarization (speaker labels)** that stays consistent throughout the entire recording.\n\n  2. **AssemblyAI:** Similar to Deepgram, it’s built for long files and handles speaker diarization much better than ‘stitching’ Whisper chunks together.\n\n  3. **Self-hosted Whisper (Faster-Whisper):** If you have the infra (or use a GPU cloud like RunPod), you can run `faster-whisper`. You won’t have file size limits, and you can use `pyannote-audio` for speaker labeling, though it requires more setup.\n\n**Sticking with OpenAI’s Whisper for 90-minute files will always feel like a hack because of the chunking/compression trade-off.**",
  "title": "Whisper API keeps returning empty transcript for videos longer than 30 minutes — stuck in production"
}