New Realtime Voice Models in the API
Additional Documentation and Guides related to this release:
Realtime and audio: Updated overview for choosing between voice agents, realtime translation, realtime transcription, and request-based audio APIs. It explicitly routes low-latency voice agents to gpt-realtime-2.
Using realtime models: New/updated prompting guide for gpt-realtime-2, including reasoning effort, preambles, tool policies, unclear audio handling, exact entity capture, and long-session behavior.
Voice agents: Updated guide for building speech-to-speech agents with RealtimeAgent/RealtimeSession, WebRTC, tools, handoffs, and guardrails.
Realtime translation: Dedicated guide for gpt-realtime-translate, including /v1/realtime/translations, WebRTC/WebSocket patterns, listen-along translation, conversational translation, and production checklist.
Realtime transcription: Dedicated/refreshed guide for gpt-realtime-whisper, streaming transcript deltas, latency/accuracy tuning, vocabulary guidance, and production checklist.
Realtime with tools: Guide for function tools, remote MCP servers, and built-in connectors in Realtime sessions with gpt-realtime-2.
gpt-realtime-2 model page
Pricing info:
GPT-Realtime-2: $32 / 1M audio input tokens, $0.40 / 1M cached input tokens, $64 / 1M audio output tokens
GPT-Realtime-Translate: $0.034 / minute
GPT-Realtime-Whisper: $0.017 / minute
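To make the prices above easier to compare, here is a rough cost-estimation sketch. It only encodes the figures listed in this post. The function names and the example token and minute counts are hypothetical, and it assumes billing is a simple linear multiple of the listed rates; text-token prices are not listed here, so they are left out.

```python
# Rough cost estimator built only from the prices listed above (USD).
# Function names and example usage figures are illustrative assumptions.

# GPT-Realtime-2 is priced per 1M tokens, by token type.
REALTIME_2_PER_MILLION = {
    "audio_input": 32.00,
    "cached_input": 0.40,
    "audio_output": 64.00,
}

# The translate and whisper models are priced per minute of audio.
PER_MINUTE = {
    "gpt-realtime-translate": 0.034,
    "gpt-realtime-whisper": 0.017,
}


def realtime_2_cost(audio_input_tokens: int,
                    cached_input_tokens: int,
                    audio_output_tokens: int) -> float:
    """Estimate GPT-Realtime-2 cost, assuming linear per-token billing."""
    prices = REALTIME_2_PER_MILLION
    total = (
        audio_input_tokens * prices["audio_input"]
        + cached_input_tokens * prices["cached_input"]
        + audio_output_tokens * prices["audio_output"]
    )
    return total / 1_000_000


def per_minute_cost(model: str, minutes: float) -> float:
    """Estimate cost for the per-minute models, assuming linear billing."""
    return PER_MINUTE[model] * minutes


if __name__ == "__main__":
    # Hypothetical session: 100k fresh audio input tokens, 500k cached
    # input tokens, 50k audio output tokens.
    print(f"GPT-Realtime-2 session: ${realtime_2_cost(100_000, 500_000, 50_000):.2f}")
    # One hour of audio on each per-minute model.
    for model in PER_MINUTE:
        print(f"{model}, 60 min: ${per_minute_cost(model, 60):.2f}")
```

One thing the numbers make visible: at these rates, cached input tokens cost 1/80th of fresh audio input tokens, so long sessions that reuse context should be noticeably cheaper per turn than the headline input price suggests.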