{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiflkjumwg64j4ydgzu3jg2snwibptxgrffvbzfqdu5pulmsmuswmu",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mnxu4omkbvh2"
  },
  "path": "/t/realtime-api-poor-portuguese-call-quality-with-gpt-realtime-mini-gpt-realtime/1381375#post_9",
  "publishedAt": "2026-06-10T22:27:11.000Z",
  "site": "https://community.openai.com",
  "textContent": "Thanks for the suggestions.\n\nI did some additional testing on the SIP/RTP side with Asterisk/PJSIP, and it looks like increasing the audio quality is not currently practical in this setup.\n\nThe OpenAI SIP endpoint accepted G.711 only:\n\n  * `PCMU/8000` → accepted, call completed\n\n  * `PCMA/8000` → accepted, call completed\n\n  * `G722/8000` → rejected with `400 Bad Request`\n\n  * `L16/16000` → rejected with `400 Bad Request`\n\n  * `L16/24000` → rejected with `400 Bad Request`\n\n\n\n\nFor example, this was rejected:\n\n\n    m=audio 15822 RTP/SAVP 123 101\n    a=rtpmap:123 L16/24000\n    a=rtpmap:101 telephone-event/8000\n    a=ptime:20\n    a=sendrecv\n\n\n\nThis was accepted:\n\n\n    m=audio 41544 RTP/SAVP 8 101\n    a=rtpmap:8 PCMA/8000\n    a=rtpmap:101 telephone-event/8000\n    a=ptime:20\n\n\n\nI also tried setting the `calls.accept` configuration to `audio/pcm` with `rate: 24000`, but when the SIP endpoint is configured with `ulaw` or `alaw`, the actual negotiated media remains G.711:\n\n\n    NativeFormats: (ulaw)\n    ReadFormat: ulaw\n    WriteFormat: ulaw\n\n\n\nSo it seems that `audio/pcm` in `calls.accept` does not make the SIP/RTP leg accept or negotiate `L16/24000`. At least in my tests, the SIP endpoint only works with G.711 (`PCMU`/`PCMA`) at 8 kHz.\n\nFor PSTN SIP trunks this is also a practical limitation, because all carriers I work with provide only 8 kHz codecs such as PCMU/PCMA. 
Even in the WhatsApp Calling SIP scenario, where Opus may be available on the Meta side, the audio would still be transcoded down to G.711 before reaching OpenAI if the OpenAI SIP leg only accepts PCMU/PCMA.\n\nSo, unless a specific SDP format is required for PCM over SIP, or the OpenAI SIP endpoint supports some other wideband codec, increasing the audio sample rate is not currently feasible with direct SIP integration.\n\nIt would be useful to clarify whether `audio/pcm` in `calls.accept` is expected to apply to SIP/RTP codec negotiation, or only to non-SIP Realtime media flows.\n\nThe recognition problem is not evenly distributed across all speech. General conversation is often understandable, but the errors concentrate in short, critical entities: names, numbers, addresses, and payment methods. Those are exactly the fields that need high accuracy in telephony workflows.\n\nIn Portuguese phone calls, the model can follow the overall intent, but it frequently mishears proper names or short payment-related terms, even when the user speaks naturally. That makes the workflow risky unless we add explicit confirmation steps.",
  "title": "Realtime API: Poor Portuguese call quality with gpt-realtime-mini / gpt-realtime"
}