Speech to text with diarization
OpenAI Developer Community
April 9, 2026
Whisper doesn’t natively support speaker diarization. To get a diarized transcript, you’d have to use a diarization library such as pyannote to split the audio into segments by speaker, then pass each segment to Whisper for transcription.
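Here is a minimal sketch of that two-stage pipeline. It keeps the models out of the picture so it stays runnable: `diarize` and `transcribe` are hypothetical callables you supply. In practice `diarize` might wrap a pyannote pipeline and `transcribe` might wrap Whisper, but that wiring is an assumption. The helper names `Turn`, `merge_turns`, and `diarized_transcript`, and the 0.5 s merge gap, are also illustrative choices.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

SAMPLE_RATE = 16_000  # Whisper models work on 16 kHz mono audio


@dataclass
class Turn:
    start: float  # seconds
    end: float    # seconds
    speaker: str


# Hypothetical hooks: e.g. a pyannote-based diarizer and a Whisper-based transcriber.
Diarizer = Callable[[Sequence[float]], List[Turn]]
Transcriber = Callable[[Sequence[float]], str]


def merge_turns(turns: List[Turn], gap: float = 0.5) -> List[Turn]:
    """Join consecutive turns by the same speaker separated by <= `gap` seconds,
    so the transcriber sees fewer, longer chunks with more context."""
    merged: List[Turn] = []
    for t in sorted(turns, key=lambda t: t.start):
        last = merged[-1] if merged else None
        if last and last.speaker == t.speaker and t.start - last.end <= gap:
            merged[-1] = Turn(last.start, max(last.end, t.end), t.speaker)
        else:
            merged.append(t)
    return merged


def diarized_transcript(
    audio: Sequence[float],
    diarize: Diarizer,
    transcribe: Transcriber,
    sr: int = SAMPLE_RATE,
) -> List[Tuple[str, float, str]]:
    """Diarize first, then transcribe each speaker turn separately.
    Returns (speaker, start_seconds, text) tuples in time order."""
    lines = []
    for turn in merge_turns(diarize(audio)):
        chunk = audio[int(turn.start * sr):int(turn.end * sr)]  # slice by sample index
        text = transcribe(chunk).strip()
        if text:  # skip turns that produced no text
            lines.append((turn.speaker, turn.start, text))
    return lines


if __name__ == "__main__":
    # Toy demo with fake hooks and a tiny sample rate (10 Hz) to keep it readable.
    fake_audio = list(range(30))  # 3 seconds at sr=10
    fake_diarize = lambda a: [Turn(0.0, 1.0, "A"), Turn(1.2, 2.0, "A"), Turn(2.0, 3.0, "B")]
    fake_transcribe = lambda chunk: f"{len(chunk)} samples"
    print(diarized_transcript(fake_audio, fake_diarize, fake_transcribe, sr=10))
    # → [('A', 0.0, '20 samples'), ('B', 2.0, '10 samples')]
```

Note that any error in the diarization step is inherited by the transcript: if a turn boundary is wrong, the text lands under the wrong speaker. If you wire in the real libraries, check the pyannote and Whisper versions you install, since their APIs have changed between releases.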
Unfortunately, this approach can still produce mistakes. pyannote uses a neural model to infer who is speaking from the mixed audio, and it isn’t always accurate, especially when voices overlap or sound similar. I’d look for an API that captures a separate audio stream for each speaker instead. Because every stream belongs to exactly one person, speaker attribution is exact by construction, and that will be a faster way of solving this problem.
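To illustrate why separate streams sidestep the problem: once each speaker’s stream has been transcribed on its own, "diarization" reduces to merging the per-speaker segments by timestamp. This sketch assumes each stream has already been transcribed into `(start, end, text)` segments. For example, Whisper’s transcription result includes a list of segments with start and end times. The function name `interleave_streams` and the sample data are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start_s, end_s, text) from ONE speaker's stream


def interleave_streams(streams: Dict[str, List[Segment]]) -> List[Tuple[float, str, str]]:
    """Merge per-speaker transcripts into a single timeline ordered by start time.

    The speaker label comes from which stream a segment belongs to,
    so no diarization model is involved at all."""
    labeled = (
        [(start, speaker, text) for start, _end, text in sorted(segs)]
        for speaker, segs in streams.items()
    )
    # Each per-speaker list is already sorted, so heapq.merge yields a sorted timeline.
    return list(heapq.merge(*labeled))


if __name__ == "__main__":
    streams = {
        "alice": [(0.0, 1.5, "Hi, thanks for joining."), (4.0, 5.0, "Sounds good.")],
        "bob": [(1.6, 3.8, "Happy to be here.")],
    }
    for start, speaker, text in interleave_streams(streams):
        print(f"[{start:5.1f}s] {speaker}: {text}")
```

The trade-off is that you need a capture setup (or an API) that records each participant separately; if you only have a single mixed recording, you’re back to the pyannote-plus-Whisper approach.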