
How to sync dual-channel transcripts via OpenAI Whisper (VAD silence stripping destroys absolute timestamps)

OpenAI Developer Community May 28, 2026

Honestly, your split-channel approach is already pretty solid, and the timestamp/speaker reconstruction logic looks mostly correct to me.

The bigger issue looks more like Whisper struggling with:

  • narrowband 8kHz telephony audio

  • overlapping speech/interruption handling

  • and context reconstruction across independently transcribed channels.

A few things stand out from your output, though:

  • “Policy Test” instead of “caller side test”

  • numbers normalized weirdly

  • sentence continuation split oddly across timestamps

  • interruption timing drift around the “blue” overlap

That pattern usually points to ASR inference limitations rather than AGI/Asterisk sync problems.

One thing I’d seriously test:

  • upsample audio to 16kHz before transcription (sox or ffmpeg)

  • even though no new information is created, Whisper tends to behave noticeably better on resampled telephony audio.
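For the upsampling step, here's a minimal sketch that shells out to ffmpeg. It assumes each call leg is already its own file and that ffmpeg is on your PATH; the file names (`caller.wav`, `callee.wav`) and the helper names are hypothetical. `-ar 16000` sets the output sample rate and `-ac 1` keeps the leg mono. The sox equivalent would be roughly `sox in.wav -r 16000 out.wav`.

```python
import shutil
import subprocess
from pathlib import Path


def build_upsample_cmd(src: str, dst: str, rate: int = 16000) -> list[str]:
    """Build an ffmpeg argv that resamples one call leg to `rate` Hz mono."""
    return [
        "ffmpeg",
        "-y",              # overwrite dst if it already exists
        "-i", src,         # input leg, e.g. an 8 kHz telephony WAV
        "-ar", str(rate),  # output sample rate
        "-ac", "1",        # keep a single channel per leg
        dst,
    ]


def upsample_leg(src: str, dst: str, rate: int = 16000) -> Path:
    """Run the ffmpeg command if ffmpeg is installed; raise otherwise."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_upsample_cmd(src, dst, rate), check=True, capture_output=True)
    return Path(dst)


if __name__ == "__main__":
    # Hypothetical file names for the two legs of one call; only prints the commands.
    for leg in ("caller", "callee"):
        print(" ".join(build_upsample_cmd(f"{leg}.wav", f"{leg}_16k.wav")))
```

Keeping the command builder separate from the `subprocess.run` call makes it easy to log exactly what was run for each leg.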

Also, maybe try:

  • adding a short silence pad at the start of both legs before transcription (and subtracting it from the returned timestamps afterwards)

  • forcing shorter segments/VAD chunking

  • aligning merged segments by midpoint timestamps instead of raw start times only.
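The padding, chunking and midpoint ideas can be combined when merging the legs. Below is a rough sketch, assuming you cut each leg into chunks yourself (so you know each chunk's start offset in the original recording) and prepend the same `pad_s` seconds of silence to every chunk. The segment dicts mimic the `start`/`end`/`text` fields of Whisper's segment output. `Segment`, `to_absolute`, `merge_by_midpoint` and `overlaps` are hypothetical helpers, and the sample values are purely illustrative. Each segment is shifted back to absolute call time (chunk offset plus segment time minus pad). Both legs are then interleaved by midpoint, and adjacent cross-speaker overlaps are flagged so interruptions like the "blue" one are easy to inspect.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds, absolute call time
    end: float
    text: str

    @property
    def mid(self) -> float:
        return (self.start + self.end) / 2


def to_absolute(raw_segments: list[dict], speaker: str,
                chunk_offset_s: float, pad_s: float = 0.0) -> list[Segment]:
    """Map chunk-relative segment times back to absolute call time.

    chunk_offset_s: where this chunk starts in the original leg recording.
    pad_s: seconds of silence prepended to the chunk before transcription.
    """
    out = []
    for seg in raw_segments:
        # Remove the pad (clamped at 0), then add the chunk's position in the call.
        start = max(0.0, seg["start"] - pad_s) + chunk_offset_s
        end = max(0.0, seg["end"] - pad_s) + chunk_offset_s
        out.append(Segment(speaker, start, end, seg["text"].strip()))
    return out


def merge_by_midpoint(*legs: list[Segment]) -> list[Segment]:
    """Interleave segments from all legs, ordered by midpoint rather than start."""
    merged = [seg for leg in legs for seg in leg]
    return sorted(merged, key=lambda s: (s.mid, s.start))


def overlaps(merged: list[Segment]):
    """Yield adjacent pairs from different speakers whose time ranges intersect."""
    for a, b in zip(merged, merged[1:]):
        if a.speaker != b.speaker and b.start < a.end:
            yield a, b


if __name__ == "__main__":
    # Illustrative input only, not taken from the original thread.
    caller = to_absolute([{"start": 0.7, "end": 2.9, "text": " Which one do you want?"}],
                         speaker="caller", chunk_offset_s=10.0, pad_s=0.5)
    callee = to_absolute([{"start": 0.5, "end": 1.3, "text": " Blue."}],
                         speaker="callee", chunk_offset_s=12.0, pad_s=0.5)
    merged = merge_by_midpoint(caller, callee)
    for seg in merged:
        print(f"[{seg.start:6.2f}-{seg.end:6.2f}] {seg.speaker}: {seg.text}")
    for a, b in overlaps(merged):
        print(f"overlap: {a.speaker} / {b.speaker} at {b.start:.2f}s")
```

Sorting by midpoint mainly changes the order when a short interjection lands inside a longer utterance. Whether that reads better than start-time order is worth checking against your actual calls.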

Honestly, your actual synchronization pipeline seems cleaner than most PBX transcription setups I’ve seen.
