Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibdrjebescgafoeq63xfyq5ei5dn33pktn6tbksik2zlzzyhmglom",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mmvzkk74xuj2"
  },
  "path": "/t/how-to-sync-dual-channel-transcripts-via-openai-whisper-vad-silence-stripping-destroys-absolute-timestamps/1381942#post_1",
  "publishedAt": "2026-05-28T10:31:50.000Z",
  "site": "https://community.openai.com",
  "textContent": "I am building an automated call transcription pipeline for a PBX system. The goal is to generate a perfectly chronological, multi-speaker transcript (Caller vs. Callee) from standard 8 kHz telephony audio.\n\n**(My Attempted Solution)** Because the OpenAI API downmixes stereo files to mono (which destroys speaker separation and causes heavy hallucination on 8 kHz audio), I built a split-channel architecture:\n\n  1. **Asterisk:** I use `MixMonitor` with the `b,r(),t()` flags to record the call legs into two separate, mathematically synchronized files (`_caller.wav` and `_callee.wav`).\n\n  2. **PHP Worker:** A background script converts the files and fires two separate `cURL` requests to the Whisper API, requesting `verbose_json` to get exact timestamps.\n\n  3. **The Merge:** The PHP script parses both JSON arrays, tags the speakers, merges the arrays, and sorts them chronologically by their start times to reconstruct the conversation.\n\nThe Specific Issue I am Facing\n\nI am getting a jumbled transcription. The transcription I am getting:\n\n[00:00] Caller: Hello, this is a Policy Test, my name is John Miller, today is Wednesday, May 27th, the\n\n[00:00] Callee: Hi, if you record your name and reason for calling, I’ll see if this person is available.\n\n[00:15] Caller: reference number is 473169, can you hear me clearly?\n\n[00:24] Callee: Yes, I can hear you clearly.\n\n[00:26] Callee: This is the Kohli site test.\n\n[00:28] Callee: My name is Sarah Johnson.\n\n[00:30] Callee: The audio quality sounds good from my side.\n\n[00:33] Callee: Please continue with the verification.\n\n[00:35] Caller: I will now test timestamps and speaker changes, the amount is $125, the meeting is scheduled\n\n[00:43] Caller: for 10.30am, please confirm the details.\n\n[00:48] Callee: Confirmed.\n\n[00:49] Callee: $125.\n\n[00:51] Callee: Meeting at 10.30 AM.\n\n[00:53] Callee: I am also testing punctuation, pauses, and pronunciation.\n\n[00:58] Caller: Now testing short 
interruptions, can you just say the color blue while I continue speaking?\n\n[01:05] Callee: Blue.\n\n[01:07] Caller: Thank you, now testing phone numbers 9876543210, final verification test, this call recording\n\n[01:17] Callee: Received.\n\n[01:18] Callee: Now testing email pronunciation.\n\n[01:20] Callee: john.miller at example dot com\n\n[01:26] Caller: should contain timestamps, speaker labels and accurate English transcriptions, ending\n\n[01:32] Caller: test now.\n\nThe actual script of the test call I made:\n\nCaller\n\nHello, this is the caller side test.\n\nMy name is John Miller.\n\nToday is Wednesday, May twenty seventh.\n\nThe reference number is four seven three one six nine.\n\nCan you hear me clearly?\n\nCallee\n\nYes, I can hear you clearly.\n\nThis is the callee side test.\n\nMy name is Sarah Johnson.\n\nThe audio quality sounds good from my side.\n\nPlease continue with the verification.\n\nCaller\n\nI will now test timestamps and speaker changes.\n\nThe amount is one hundred twenty five dollars.\n\nThe meeting is scheduled for ten thirty AM.\n\nCallee\n\nConfirmed.\n\nOne hundred twenty five dollars.\n\nMeeting at ten thirty AM.\n\nI am also testing punctuation, pauses, and pronunciation.\n\nCaller\n\nNow testing short interruptions.\n\nCan you say the color blue while I continue speaking?\n\nCallee (interrupt slightly)\n\nBlue.\n\nCaller\n\nThank you.\n\nNow testing phone numbers.\n\nNine eight seven six five four three two one zero.\n\nCallee\n\nReceived.\n\nNow testing email pronunciation.\n\njohn dot miller at example dot com.\n\nCaller\n\nFinal verification test.\n\nThis call recording should contain timestamps,\n\nspeaker labels,\n\nand accurate English transcription.\n\nEnding test now.\n\nAGI and Asterisk experts, please help: is any solution to this problem possible from the AGI side?",
  "title": "How to sync dual-channel transcripts via OpenAI Whisper (VAD silence stripping destroys absolute timestamps)"
}
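
The merge step described in the post (tag each channel's segments with a speaker, combine them, sort by start time) can be sketched roughly as follows. This is a minimal illustration in Python rather than the poster's PHP, and it is not the poster's actual code. It assumes each channel's `verbose_json` response has already been parsed into a list of segments with `start`, `end`, and `text` fields; the function names `merge_channels` and `format_line` and the sample segments are hypothetical.

```python
from typing import Dict, List

# One transcript segment per channel, assumed to look like an entry of the
# "segments" array in a verbose_json response: start/end in seconds plus text.
Segment = Dict[str, object]


def merge_channels(caller: List[Segment], callee: List[Segment]) -> List[Segment]:
    """Tag each segment with its speaker and sort all segments by start time."""
    tagged = [{"speaker": "Caller", **seg} for seg in caller]
    tagged += [{"speaker": "Callee", **seg} for seg in callee]
    # sorted() is stable, so on equal start times Caller segments stay first.
    return sorted(tagged, key=lambda seg: seg["start"])


def format_line(seg: Segment) -> str:
    """Render a segment as '[MM:SS] Speaker: text'."""
    minutes, seconds = divmod(int(seg["start"]), 60)
    return f"[{minutes:02d}:{seconds:02d}] {seg['speaker']}: {str(seg['text']).strip()}"


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real API response.
    caller_segments = [
        {"start": 0.0, "end": 14.0, "text": " Hello, this is the caller side test."},
        {"start": 35.0, "end": 43.0, "text": " I will now test timestamps."},
    ]
    callee_segments = [
        {"start": 24.0, "end": 26.0, "text": " Yes, I can hear you clearly."},
    ]
    for segment in merge_channels(caller_segments, callee_segments):
        print(format_line(segment))
```

Because this sketch orders whole segments by their start time only, a long segment on one channel stays in one piece even when the other party speaks in the middle of it. That appears consistent with the ordering in the posted output, where the Caller segment starting at 01:07 contains words spoken after the Callee's 01:17 reply.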