New Realtime Voice Models in the API

OpenAI Developer Community May 7, 2026


Additional Documentation and Guides related to this release:

Realtime and audio: Updated overview for choosing between voice agents, realtime translation, realtime transcription, and request-based audio APIs. It explicitly routes low-latency voice agents to gpt-realtime-2.
Using realtime models: New/updated prompting guide for gpt-realtime-2, covering reasoning effort, preambles, tool policies, unclear audio handling, exact entity capture, and long-session behavior.
Voice agents: Updated guide for building speech-to-speech agents with RealtimeAgent / RealtimeSession, WebRTC, tools, handoffs, and guardrails.
Realtime translation: Dedicated guide for gpt-realtime-translate, covering the /v1/realtime/translations endpoint, WebRTC/WebSocket patterns, listen-along translation, conversational translation, and a production checklist.
Realtime transcription: Dedicated/refreshed guide for gpt-realtime-whisper, covering streaming transcript deltas, latency/accuracy tuning, vocabulary guidance, and a production checklist.
Realtime with tools: Guide for using function tools, remote MCP servers, and built-in connectors in Realtime sessions with gpt-realtime-2.
gpt-realtime-2 model page
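For a rough idea of what the "Realtime with tools" guide configures, here is a minimal sketch of a `session.update` client event that registers one function tool and one remote MCP server. It assumes the event shape of the generally available Realtime API (`session.type` set to `"realtime"`, `tools` entries of `type: "function"` or `type: "mcp"`) carries over unchanged to gpt-realtime-2. The tool name `lookup_order`, the server label, and the server URL are hypothetical placeholders, so check the guide for the authoritative fields before relying on this.

```python
import json


def build_session_update(model: str = "gpt-realtime-2") -> str:
    """Serialize a session.update event that registers two tools.

    Assumption: the GA Realtime API event shape applies to gpt-realtime-2.
    """
    # A plain function tool; name and schema are hypothetical examples.
    function_tool = {
        "type": "function",
        "name": "lookup_order",
        "description": "Look up an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }
    # A remote MCP server; label and URL are hypothetical placeholders.
    mcp_tool = {
        "type": "mcp",
        "server_label": "example_docs",
        "server_url": "https://mcp.example.com/mcp",
        "require_approval": "never",
    }
    event = {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "model": model,
            "instructions": (
                "You are a concise support agent. "
                "Call lookup_order when the caller gives an order ID."
            ),
            "tools": [function_tool, mcp_tool],
            "tool_choice": "auto",
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    # The resulting JSON string would be sent over an open Realtime
    # WebSocket or WebRTC data channel (connection setup not shown).
    print(build_session_update())
```

Keeping the event as a plain dictionary makes it transport-agnostic: the same payload can be sent over WebSocket or a WebRTC data channel.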

Pricing info:

GPT-Realtime-2: $32 / 1M audio input tokens, $0.40 / 1M cached input tokens, $64 / 1M audio output tokens
GPT-Realtime-Translate: $0.034 / minute
GPT-Realtime-Whisper: $0.017 / minute
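To turn these rates into a budget estimate, a small calculator helps. This is a minimal sketch that uses only the listed prices. It assumes `audio_in` counts only uncached input tokens (cached tokens are billed at the cached rate instead of the full audio input rate) and ignores any text-token or other charges not listed here; the function names are hypothetical.

```python
# Announced rates in USD, taken from the pricing list.
AUDIO_IN_PER_M = 32.00    # gpt-realtime-2, per 1M audio input tokens
CACHED_IN_PER_M = 0.40    # gpt-realtime-2, per 1M cached input tokens
AUDIO_OUT_PER_M = 64.00   # gpt-realtime-2, per 1M audio output tokens
TRANSLATE_PER_MIN = 0.034  # gpt-realtime-translate, per minute
WHISPER_PER_MIN = 0.017    # gpt-realtime-whisper, per minute


def realtime2_cost(audio_in: int, cached_in: int, audio_out: int) -> float:
    """Estimated USD cost of gpt-realtime-2 audio usage.

    Assumption: audio_in excludes cached tokens, which are passed
    separately as cached_in and billed at the cached rate.
    """
    return (
        audio_in * AUDIO_IN_PER_M
        + cached_in * CACHED_IN_PER_M
        + audio_out * AUDIO_OUT_PER_M
    ) / 1_000_000


def per_minute_cost(minutes: float, rate_per_min: float) -> float:
    """Estimated USD cost of a per-minute model for a given duration."""
    return minutes * rate_per_min


if __name__ == "__main__":
    print(f"{realtime2_cost(100_000, 400_000, 50_000):.2f}")  # 6.56
    print(f"{per_minute_cost(60, TRANSLATE_PER_MIN):.2f}")   # 2.04
    print(f"{per_minute_cost(60, WHISPER_PER_MIN):.2f}")     # 1.02
```

For example, 100K uncached audio input tokens, 400K cached input tokens, and 50K audio output tokens come to about $6.56, while an hour of translation or transcription costs about $2.04 or $1.02 respectively.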