How to sync dual-channel transcripts via OpenAI Whisper (VAD silence stripping destroys absolute timestamps)

OpenAI Developer Community May 28, 2026

I am building an automated call transcription pipeline for a PBX system. The goal is to generate a perfectly chronological, multi-speaker transcript (Caller vs. Callee) from standard 8kHz telephony audio.

My Attempted Solution

Because the OpenAI API downmixes stereo files to mono (which destroys speaker separation and causes heavy hallucination on 8kHz audio), I built a split-channel architecture:

  1. Asterisk: I use MixMonitor with the b,r(),t() flags to record the call legs into two separate, mathematically synchronized files (_caller.wav and _callee.wav).

  2. PHP Worker: A background script converts the files and fires two separate cURL requests to the Whisper API, requesting verbose_json to get exact timestamps.

  3. The Merge: The PHP script parses both JSON arrays, tags the speakers, merges the arrays, and sorts them chronologically by their start times to reconstruct the conversation.
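Step 3 can be sketched as follows. This is a minimal Python illustration of the merge logic, not the PHP worker itself. It assumes each response contains a `segments` array whose entries carry `start`, `end`, and `text`; the helper names (`tag_segments`, `merge_by_start`, `format_line`) and the sample payloads are hypothetical.

```python
import json


def tag_segments(verbose_json: str, speaker: str) -> list:
    """Parse one verbose_json response and tag every segment with a speaker."""
    data = json.loads(verbose_json)
    return [
        {
            "speaker": speaker,
            "start": float(seg["start"]),
            "end": float(seg["end"]),
            "text": seg["text"].strip(),
        }
        for seg in data.get("segments", [])
    ]


def merge_by_start(caller: list, callee: list) -> list:
    """Merge both channels and order them by segment start time.

    sorted() is stable, so on a tie the caller segment stays first.
    """
    return sorted(caller + callee, key=lambda seg: seg["start"])


def format_line(seg: dict) -> str:
    """Render a segment as '[MM:SS] Speaker: text'."""
    minutes, seconds = divmod(int(seg["start"]), 60)
    return f"[{minutes:02d}:{seconds:02d}] {seg['speaker']}: {seg['text']}"


# Illustrative payloads only, reduced to the fields the merge needs.
caller_json = json.dumps({"segments": [
    {"start": 0.0, "end": 15.0, "text": " Hello, this is a test, the"},
    {"start": 15.0, "end": 20.0, "text": " reference number is 473169."},
]})
callee_json = json.dumps({"segments": [
    {"start": 24.0, "end": 26.0, "text": " Yes, I can hear you clearly."},
]})

merged = merge_by_start(
    tag_segments(caller_json, "Caller"),
    tag_segments(callee_json, "Callee"),
)
lines = [format_line(seg) for seg in merged]
print("\n".join(lines))
```

Note that the sort key is the segment start only. A long segment is placed by its first word, so anything the other party says while that segment is still running sorts after it, and the merge is only as good as the timestamps on each channel.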

The Specific Issue I Am Facing

The merged transcription comes out jumbled. This is the transcription I am getting:

[00:00] Caller: Hello, this is a Policy Test, my name is John Miller, today is Wednesday, May 27th, the

[00:00] Callee: Hi, if you record your name and reason for calling, I’ll see if this person is available.

[00:15] Caller: reference number is 473169, can you hear me clearly?

[00:24] Callee: Yes, I can hear you clearly.

[00:26] Callee: This is the Kohli site test.

[00:28] Callee: My name is Sarah Johnson.

[00:30] Callee: The audio quality sounds good from my side.

[00:33] Callee: Please continue with the verification.

[00:35] Caller: I will now test timestamps and speaker changes, the amount is $125, the meeting is scheduled

[00:43] Caller: for 10.30am, please confirm the details.

[00:48] Callee: Confirmed.

[00:49] Callee: $125.

[00:51] Callee: Meeting at 10.30 AM.

[00:53] Callee: I am also testing punctuation, pauses, and pronunciation.

[00:58] Caller: Now testing short interruptions, can you just say the color blue while I continue speaking?

[01:05] Callee: Blue.

[01:07] Caller: Thank you, now testing phone numbers 9876543210, final verification test, this call recording

[01:17] Callee: Received.

[01:18] Callee: Now testing email pronunciation.

[01:20] Callee: john.miller at example dot com

[01:26] Caller: should contain timestamps, speaker labels and accurate English transcriptions, ending

[01:32] Caller: test now.

The actual script of the test call I made:

Caller

Hello, this is the caller side test.

My name is John Miller.

Today is Wednesday, May twenty seventh.

The reference number is four seven three one six nine.

Can you hear me clearly?

Callee

Yes, I can hear you clearly.

This is the callee side test.

My name is Sarah Johnson.

The audio quality sounds good from my side.

Please continue with the verification.

Caller

I will now test timestamps and speaker changes.

The amount is one hundred twenty five dollars.

The meeting is scheduled for ten thirty AM.

Callee

Confirmed.

One hundred twenty five dollars.

Meeting at ten thirty AM.

I am also testing punctuation, pauses, and pronunciation.

Caller

Now testing short interruptions.

Can you say the color blue while I continue speaking?

Callee (interrupt slightly)

Blue.

Caller

Thank you.

Now testing phone numbers.

Nine eight seven six five four three two one zero.

Callee

Received.

Now testing email pronunciation.

john dot miller at example dot com.

Caller

Final verification test.

This call recording should contain timestamps,

speaker labels, and accurate English transcription.

Ending test now.
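The drift named in the title can be illustrated with some arithmetic. If silence is removed from a channel before transcription, every returned timestamp is relative to the shortened audio, so each removed gap pulls later speech earlier. The sketch below is a minimal Python illustration under one loud assumption: that the kept speech intervals are known, for example from a local VAD pass, which is not part of my current pipeline. The function `to_absolute` and the interval values are hypothetical.

```python
from bisect import bisect_right


def to_absolute(t: float, kept: list) -> float:
    """Map time t on the silence-stripped timeline back to the original one.

    kept: sorted, non-overlapping (start, end) intervals of the ORIGINAL
    audio that survived silence removal (assumed known; hypothetical input).
    """
    # Position on the stripped timeline where each kept interval begins.
    offsets = []
    elapsed = 0.0
    for start, end in kept:
        offsets.append(elapsed)
        elapsed += end - start

    # Find the kept interval that contains t on the stripped timeline.
    i = max(bisect_right(offsets, t) - 1, 0)
    start, end = kept[i]
    # Clamp so a timestamp past the end never leaves the last interval.
    return min(start + (t - offsets[i]), end)


# Hypothetical caller channel: speech at 0-20 s, 35-48 s and 58-65 s.
kept = [(0.0, 20.0), (35.0, 48.0), (58.0, 65.0)]

print(to_absolute(10.0, kept))  # 10.0: before any removed gap, no shift
print(to_absolute(25.0, kept))  # 40.0: shifted by the 15 s gap
print(to_absolute(36.0, kept))  # 61.0: shifted by both gaps (25 s total)
```

In this made-up example, speech that really started at 00:58 would be reported at 00:33, which is the same kind of shift that would make one channel's lines sort ahead of the other party's replies.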

AGI and Asterisk experts, please help: is there any solution to this problem from the AGI side?
