{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreicec4hd3le4ykfuvq7em7iqswbnzdtagkaertcjvmxpfv56odyu2q",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mkrx3jgbib32"
},
"path": "/t/whisper-api-keeps-returning-empty-transcript-for-videos-longer-than-30-minutes-stuck-in-production/1380129#post_1",
"publishedAt": "2026-05-01T09:28:03.000Z",
"site": "https://community.openai.com",
"textContent": "Hey everyone,\n\nI’m running into a consistent issue with the Whisper API on longer recordings and I’m not sure what the right fix is.\n\nMy setup right now:\n\n * Pull the Zoom recording → convert to MP3 with FFmpeg → compress to under 25MB → send to the Whisper API\n\nThis works fine for anything under 20 minutes. Once a recording goes past 30 minutes, the transcript either comes back empty or cuts off mid-sentence with no error message. The API just returns HTTP 200 with partial or no content.\n\nI’ve already tried a few things:\n\nSplitting into chunks: this works, but speaker attribution gets completely lost between chunks, and stitching the context back together is messy.\n\nLowering the bitrate further: quality drops so much that Whisper starts misidentifying words, especially with background noise or non-native accents.\n\nSwitching to gpt-4o-transcribe: I hit the 1500-second limit, which is actually worse than Whisper for longer calls.\n\nThe real frustration is that the entire pipeline assumes you have a small local file. For any real meeting or interview recording, that is just not realistic without seriously degrading the audio.\n\nHas anyone figured out a solid approach for this? Ideally something that:\n\n * Takes the recording URL directly without needing to download and re-encode\n * Handles 60-90 minute recordings reliably\n * Keeps speaker labels intact\n\nI’m open to completely different approaches if Whisper just isn’t the right tool for this use case.",
"title": "Whisper API keeps returning empty transcript for videos longer than 30 minutes — stuck in production"
}