New Realtime Voice Models in the API

OpenAI Developer Community, May 7, 2026

Additional Documentation and Guides related to this release:

  • Realtime and audio: Updated overview for choosing among voice agents, realtime translation, realtime transcription, and request-based audio APIs. It explicitly routes low-latency voice agents to gpt-realtime-2.

  • Using realtime models: New or updated prompting guide for gpt-realtime-2, covering reasoning effort, preambles, tool policies, unclear audio handling, exact entity capture, and long-session behavior.

  • Voice agents: Updated guide for building speech-to-speech agents with RealtimeAgent / RealtimeSession, WebRTC, tools, handoffs, and guardrails.

  • Realtime translation: Dedicated guide for gpt-realtime-translate, covering /v1/realtime/translations, WebRTC/WebSocket patterns, listen-along translation, conversational translation, and a production checklist.

  • Realtime transcription: Dedicated, refreshed guide for gpt-realtime-whisper, covering streaming transcript deltas, latency/accuracy tuning, vocabulary guidance, and a production checklist.

  • Realtime with tools: Guide for function tools, remote MCP servers, and built-in connectors in Realtime sessions with gpt-realtime-2.

  • gpt-realtime-2 model page

Pricing info:

  • GPT-Realtime-2: $32 / 1M audio input tokens, $0.40 / 1M cached input tokens, $64 / 1M audio output tokens
  • GPT-Realtime-Translate: $0.034 / minute
  • GPT-Realtime-Whisper: $0.017 / minute
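
To make the rates above concrete, here is a rough cost-estimation sketch in Python. The rates are copied from the list above; the function names and the example token counts are hypothetical. It also assumes that cached input tokens are billed instead of, not in addition to, the regular audio input rate, and it ignores any token types not listed here. Check the model page before relying on these numbers.

```python
from decimal import Decimal

# Rates in USD, copied from the pricing list above.
REALTIME_2_PER_MILLION = {
    "audio_input": Decimal("32"),
    "cached_input": Decimal("0.40"),
    "audio_output": Decimal("64"),
}
PER_MINUTE = {
    "gpt-realtime-translate": Decimal("0.034"),
    "gpt-realtime-whisper": Decimal("0.017"),
}
MILLION = Decimal(1_000_000)


def realtime_2_cost(audio_input_tokens, cached_input_tokens, audio_output_tokens):
    """Estimate GPT-Realtime-2 cost in USD from token counts.

    Assumption: the three token counts are disjoint, i.e. cached tokens
    are not also counted as regular audio input tokens.
    """
    rates = REALTIME_2_PER_MILLION
    return (
        Decimal(audio_input_tokens) * rates["audio_input"]
        + Decimal(cached_input_tokens) * rates["cached_input"]
        + Decimal(audio_output_tokens) * rates["audio_output"]
    ) / MILLION


def per_minute_cost(model, minutes):
    """Estimate cost in USD for the per-minute models."""
    return PER_MINUTE[model] * Decimal(str(minutes))


if __name__ == "__main__":
    # Hypothetical session: 100k audio input, 50k cached, 20k audio output tokens.
    print("realtime-2:", realtime_2_cost(100_000, 50_000, 20_000))
    print("translate, 60 min:", per_minute_cost("gpt-realtime-translate", 60))
    print("whisper, 60 min:", per_minute_cost("gpt-realtime-whisper", 60))
```

Under these assumptions, the hypothetical session works out to $3.20 + $0.02 + $1.28 = $4.50, while an hour of audio costs $2.04 with GPT-Realtime-Translate and $1.02 with GPT-Realtime-Whisper. Decimal is used rather than float so the small per-minute rates do not pick up rounding noise.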
