{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibbgamslup4izkqhap3gmsqqpl7tsmzjna77pdceubhxftgwpwori",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mljbbkz22yh2"
},
"path": "/t/realtime-regression-in-non-english-production-voice-agents-gpt-realtime-mini-vs-gpt-realtime-mini-2025-10-06/1380643#post_1",
"publishedAt": "2026-05-10T16:59:14.000Z",
"site": "https://community.openai.com",
"textContent": "We operate a production AI voice platform built on the OpenAI Realtime API via WebSocket/SIP/Twilio.\n\nWe are currently deploying our solution across a few dozen locations nationwide for an enterprise client, and we have encountered a material regression when comparing the dated Realtime snapshot we validated in production against the listed replacement model.\n\nModel validated in production:\ngpt-realtime-mini-2025-10-06\n\nListed replacement:\ngpt-realtime-mini\n\nIssue:\nIn non-English voice-agent flows, with Romanian as our current production case, the replacement model shows noticeably worse language quality and worse faithfulness to supplied business data.\n\nThe most serious issue is not just spelling or phrasing. We have observed the newer model hallucinating nonexistent departments, services, and operational details that were not present in the database/context. The older snapshot, gpt-realtime-mini-2025-10-06, has been significantly more faithful to the provided information and less prone to confabulating unavailable services or internal departments.\n\nThis is important because the older snapshot was not selected casually. It was selected after thousands of hours of testing, R&D, and practical validation in Romanian-language voice-agent scenarios. Its reliability in staying faithful to provided business information is one of the reasons we currently depend on it for production deployments.\n\nImpact:\nThis affects an active enterprise rollout across a few dozen locations nationwide. 
The regression impacts:\n\n * live AI phone conversations;\n * appointment and call summaries;\n * CRM/customer records;\n * operational reporting;\n * client trust during rollout.\n\nWe are concerned that this may not be limited to Romanian and may reflect broader non-English quality/faithfulness differences between the dated snapshot and the current gpt-realtime-mini alias.\n\nEvidence:\nWe have transcript evidence and can provide side-by-side examples comparing the same or similar flows between gpt-realtime-mini-2025-10-06 and gpt-realtime-mini.\n\nQuestions:\n\n 1. Has anyone else observed worse non-English performance or worse faithfulness to supplied data on gpt-realtime-mini compared to dated Realtime snapshots?\n 2. Is OpenAI tracking language-specific regressions for Realtime models before snapshot deprecations?\n 3. Is there a path for production customers to request temporary extended access or a migration path when a listed replacement model is not behaviorally equivalent?\n\nWe are committed to building on OpenAI’s Realtime infrastructure, but we need a reliable migration path before moving production enterprise traffic away from the currently working snapshot.",
"title": "Realtime regression in non-English production voice agents: gpt-realtime-mini vs gpt-realtime-mini-2025-10-06"
}