
gpt-realtime-2 GA API: What is the correct audio format for g711_ulaw (Twilio/telephony)?

OpenAI Developer Community May 12, 2026

We are migrating a Twilio-based voice agent from gpt-realtime-1.5 (beta) to gpt-realtime-2 (GA API). The GA API rejects the old flat input_audio_format / output_audio_format parameters and requires a nested session.audio.input.format object, but it is unclear which type value corresponds to G.711 μ-law (the encoding used by Twilio media streams).

What we’ve tried

Beta API (worked fine): { "type": "session.update", "session": { "input_audio_format": "g711_ulaw", "output_audio_format": "g711_ulaw" } }

GA API attempts (all rejected):

  1. "session.input_audio_format" → Unknown parameter: 'session.input_audio_format'
  2. format: { type: "g711_ulaw" } → rejected. Full payload:

{ "type": "session.update", "session": { "type": "realtime", "output_modalities": ["text", "audio"], "audio": { "input": { "format": { "type": "g711_ulaw" } }, "output": { "format": { "type": "g711_ulaw" }, "voice": "marin" } } } }
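
For anyone following along, here is a minimal sketch of the flat-to-nested migration described above. The nested layout (session.type, session.audio.input.format, session.audio.output.voice) is taken from the payload in this post. The MIME-style type names in ASSUMED_GA_FORMATS ("audio/pcm", "audio/pcmu", "audio/pcma") are an assumption, not something confirmed for gpt-realtime-2 in this thread, and the helper names migrate_session and build_session_update are hypothetical.

```python
import json
from typing import Optional

# ASSUMPTION: the GA schema names formats MIME-style rather than with the
# beta identifiers. These values are unverified for gpt-realtime-2; check
# them against the current API reference before relying on them.
ASSUMED_GA_FORMATS = {
    "pcm16": {"type": "audio/pcm", "rate": 24000},
    "g711_ulaw": {"type": "audio/pcmu"},
    "g711_alaw": {"type": "audio/pcma"},
}


def migrate_session(beta_session: dict, voice: Optional[str] = None) -> dict:
    """Convert flat beta audio-format fields into the nested GA layout."""
    ga = {"type": "realtime", "audio": {"input": {}, "output": {}}}
    for beta_key, side in (
        ("input_audio_format", "input"),
        ("output_audio_format", "output"),
    ):
        name = beta_session.get(beta_key)
        if name is None:
            continue  # field not set in the beta config; leave the default
        if name not in ASSUMED_GA_FORMATS:
            raise ValueError(f"no assumed GA mapping for beta format {name!r}")
        # Copy so callers cannot mutate the shared mapping table.
        ga["audio"][side]["format"] = dict(ASSUMED_GA_FORMATS[name])
    if voice is not None:
        ga["audio"]["output"]["voice"] = voice
    return ga


def build_session_update(beta_session: dict, voice: Optional[str] = None) -> str:
    """Serialize a session.update event carrying the migrated session."""
    event = {"type": "session.update", "session": migrate_session(beta_session, voice)}
    return json.dumps(event)


if __name__ == "__main__":
    beta = {"input_audio_format": "g711_ulaw", "output_audio_format": "g711_ulaw"}
    print(build_session_update(beta, voice="marin"))
```

If the assumed names turn out to be wrong, only the mapping table needs to change; the nesting logic stays the same.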

Questions

  1. What are the valid values for session.audio.input.format.type in the GA API?
  2. Is G.711 μ-law (8kHz) supported in gpt-realtime-2 over WebSocket, or only via SIP?
  3. Is there an official migration guide from the beta to the GA session config schema?

Environment

  • Model: gpt-realtime-2
  • Connection: WebSocket (wss://api.openai.com/v1/realtime)
  • Transport: Twilio media streams (G.711 μ-law, 8kHz)
  • No OpenAI-Beta header (GA API rejects it)
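
To make the environment concrete, below is a rough sketch of the bridging logic between the two sockets, without any network code. It assumes the model is selected with a ?model= query parameter, that Twilio "media" messages carry base64 μ-law audio in media.payload, and that the Realtime API accepts base64 audio through input_audio_buffer.append events. The function names are hypothetical and this is not a full bridge.

```python
import base64
import json
from typing import Optional, Tuple

REALTIME_URL = "wss://api.openai.com/v1/realtime"


def connection_params(api_key: str, model: str = "gpt-realtime-2") -> Tuple[str, dict]:
    """Return the WebSocket URL and headers for a GA connection.

    ASSUMPTION: the model is passed as a query parameter. The OpenAI-Beta
    header is deliberately left out, since the GA API rejects it.
    """
    url = f"{REALTIME_URL}?model={model}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers


def twilio_media_to_append(twilio_message: str) -> Optional[str]:
    """Turn a Twilio media message into an input_audio_buffer.append event.

    Returns None for non-media messages (for example "start" or "stop").
    The payload is forwarded as-is: it is already base64, so no transcoding
    happens here. That only works if the session's input format really is
    8 kHz mu-law.
    """
    msg = json.loads(twilio_message)
    if msg.get("event") != "media":
        return None
    payload = msg["media"]["payload"]
    return json.dumps({"type": "input_audio_buffer.append", "audio": payload})


def payload_duration_ms(payload_b64: str) -> float:
    """Duration of a G.711 payload: 8000 one-byte samples per second."""
    return len(base64.b64decode(payload_b64)) / 8.0


if __name__ == "__main__":
    frame = base64.b64encode(b"\xff" * 160).decode("ascii")
    message = json.dumps({"event": "media", "media": {"payload": frame}})
    print(twilio_media_to_append(message))
    print(payload_duration_ms(frame), "ms")
```

Forwarding the payload untouched is the reason the format question matters: if the session silently falls back to 24 kHz PCM, the same bytes would be interpreted as the wrong encoding.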

Any help is appreciated. The GA API docs don’t enumerate the valid audio format types for the nested object structure.
