{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiblvy7rikbtxhl3wehkfxvvxlcjyft2bd4muupce7qfrwrhhipase",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mmcpkwd3fbh2"
  },
  "path": "/t/realtime-api-poor-portuguese-call-quality-with-gpt-realtime-mini-gpt-realtime/1381375#post_1",
  "publishedAt": "2026-05-20T19:08:40.000Z",
  "site": "https://community.openai.com",
  "textContent": "Hi everyone,\n\nI’d like to share an issue I’m experiencing in a real production scenario using the OpenAI Realtime API for phone calls in Portuguese.\n\nI have tested different Realtime models, including `gpt-realtime-mini` and `gpt-realtime`, but the problem is very similar across them.\n\nIn Brazil, many users answer calls using speakerphone mode. This captures a lot of background noise from the environment and causes interference. As a result, the AI frequently starts talking by itself or responds when the user has not clearly finished speaking.\n\nAnother frequent issue is speech recognition accuracy when collecting data. For example, when the user provides an address, name, or other important information, the AI often understands completely different words. Sometimes the words are not even similar. For example, the person says “Apple”, but the AI understands “Oscar”.\n\nCurrently, I’m using Realtime directly via SIP, with transcription enabled, and I accept the call using these parameters:\n\n\n    ‘audio’ => [\n    ‘input’ => [\n    ‘format’ => [‘type’ => ‘audio/pcmu’],\n\n        'transcription' => [\n            'model' => 'gpt-4o-transcribe',\n            'language' => 'pt',\n            'prompt' => 'Transcribe with maximum fidelity. Proper names are critical. Do not correct names. If you’re unsure, keep exactly what you heard.'\n        ],\n\n        'turn_detection' => [\n            'type' => 'server_vad',\n            'threshold' => 0.7,\n            'prefix_padding_ms' => 300,\n            'silence_duration_ms' => 1100,\n            'idle_timeout_ms' => 12000,\n\n            'create_response' => true,\n            'interrupt_response' => true,\n        ],\n    ],\n\n    'output' => [\n        'format' => ['type' => 'audio/pcmu'],\n        'voice' => 'marin',\n    ],\n\n    ]\n\n\nHas anyone had a better experience with calls in noisy environments, especially with not english users on speakerphone?\n\nAny recommendations for improving transcription accuracy, turn detection, VAD configuration, or reducing cases where the AI starts speaking by itself would be very welcome.\n\nI already tested `gpt-realtime-1.5` and `gpt-realtime-2`, but they are not acceptable for my use case right now because they are too expensive. Also, in my tests, the same issues still happened with them.",
  "title": "Realtime API: Poor Portuguese call quality with gpt-realtime-mini / gpt-realtime"
}