
How to sync dual-channel transcripts via OpenAI Whisper (VAD silence stripping destroys absolute timestamps)

OpenAI Developer Community May 28, 2026

Honestly, your split-channel approach is already pretty solid. The timestamp and speaker reconstruction logic looks mostly correct to me.

The bigger issue feels more like Whisper struggling with

A few things stand out from your output tho

So that usually looks more like ASR inference limitations than AGI/Asterisk sync problems.

One thing I’d seriously test:

Upsample the audio to 16 kHz before transcription (sox or ffmpeg).
Even though no new information is created, Whisper tends to behave noticeably better on resampled telephony audio.
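
A rough sketch of that resampling step, in case it helps. It shells out to ffmpeg from Python; the file names and the `build_resample_cmd` / `resample_leg` helpers are made up for illustration, and I'm assuming each leg is already its own mono file. Adjust for your setup.

```python
import subprocess

def build_resample_cmd(src, dst, rate=16000):
    """Build an ffmpeg command that resamples src to `rate` Hz mono 16-bit PCM."""
    return [
        "ffmpeg", "-y",        # overwrite dst if it already exists
        "-i", src,             # input leg (e.g. 8 kHz telephony audio)
        "-ar", str(rate),      # output sample rate
        "-ac", "1",            # one channel per leg
        "-c:a", "pcm_s16le",   # plain 16-bit PCM WAV
        dst,
    ]

def resample_leg(src, dst, rate=16000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_resample_cmd(src, dst, rate), check=True)

# Hypothetical usage, one call per channel:
# resample_leg("caller_8k.wav", "caller_16k.wav")
# resample_leg("callee_8k.wav", "callee_16k.wav")
```

Resample both legs with the same settings so their timelines stay comparable.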

Also maybe try:

adding a small silence padding at the start of both legs before transcription;
forcing shorter segments/VAD chunking;
aligning merged segments by midpoint timestamps instead of raw start times only.
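
For the midpoint idea, something like the sketch below is what I have in mind. It assumes each leg's segments are dicts with `start`, `end` and `text` keys on a shared timeline; the `merge_by_midpoint` name and the speaker labels are just placeholders, not anything from your code.

```python
def merge_by_midpoint(caller_segments, callee_segments):
    """Interleave two legs' segments, ordered by each segment's midpoint time."""
    tagged = [("caller", seg) for seg in caller_segments]
    tagged += [("callee", seg) for seg in callee_segments]

    def midpoint(item):
        seg = item[1]
        return (seg["start"] + seg["end"]) / 2.0

    # sorted() is stable, so ties keep the caller-before-callee order.
    return [
        {"speaker": speaker, **seg}
        for speaker, seg in sorted(tagged, key=midpoint)
    ]

# Made-up example: a long caller segment overlapping a short callee reply.
caller = [{"start": 0.0, "end": 6.0, "text": "long explanation"}]
callee = [{"start": 1.0, "end": 2.0, "text": "mm-hm"}]
merged = merge_by_midpoint(caller, callee)
```

Sorting by raw start time would always put the long segment first; with midpoints, the short interjection lands where it sits relative to the centre of the longer one. Whether that reads better depends on how much your speakers overlap.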

Your actual synchronization pipeline honestly seems cleaner than most PBX transcription setups I’ve seen.