gpt-realtime-2 splits acknowledgment + next-step into separate turns, causing 5-20s caller silence (rollback to gpt-realtime-1.5 confirmed as A/B fix)
I have just discovered that this appears to be a false-positive content_filter. From what I can find, the new model is highly sensitive here and handles languages other than English poorly. You can detect it via the response.done event, where status is 'incomplete' and status_details.reason is 'content_filter':

"""
response done: namespace(type='response.done', event_id='…', response=namespace(object='realtime.response', id='…', status='incomplete', status_details=namespace(type='incomplete', reason='content_filter'), output=[namespace(id='…', type='message', status='incomplete', role='assistant', content=[namespace(type='output_audio', transcript='Hej, dejligt at fortsætte efter vores korte introduktion. Sig endelig')], phase='final_answer')], conversation_id='…', output_modalities=['audio'], max_output_tokens='inf', audio=namespace(output=namespace(format=namespace(type='audio/pcm', rate=24000), voice='marin')), usage=namespace(total_tokens=1462, input_tokens=1312, output_tokens=150, input_token_details=namespace(text_tokens=1312, audio_tokens=0, image_tokens=0, cached_tokens=0, cached_tokens_details=namespace(text_tokens=0, audio_tokens=0, image_tokens=0)), output_token_details=namespace(text_tokens=75, audio_tokens=75)), metadata=None))
"""
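For anyone who wants to check for this in their own event handler, here is a minimal sketch. It assumes the response.done event has already been decoded from JSON into a plain dict; the field names are taken from the dump above, and the function name `content_filter_cutoff` is just a hypothetical helper, not part of any SDK.

```python
from typing import Any, Dict, Optional


def content_filter_cutoff(event: Dict[str, Any]) -> Optional[str]:
    """Return the partial transcript if this response.done event was cut off
    by the content filter, otherwise None.

    Hypothetical helper: assumes a JSON-decoded dict with the fields seen
    in the dump above (status, status_details.reason, output[].content[]).
    """
    if event.get("type") != "response.done":
        return None
    response = event.get("response") or {}
    details = response.get("status_details") or {}
    if response.get("status") != "incomplete":
        return None
    if details.get("reason") != "content_filter":
        return None
    # Collect whatever transcript was produced before the cutoff.
    parts = []
    for item in response.get("output") or []:
        for chunk in item.get("content") or []:
            transcript = chunk.get("transcript")
            if transcript:
                parts.append(transcript)
    return " ".join(parts)


# Trimmed-down version of the event from the dump above.
sample = {
    "type": "response.done",
    "response": {
        "status": "incomplete",
        "status_details": {"type": "incomplete", "reason": "content_filter"},
        "output": [
            {
                "type": "message",
                "status": "incomplete",
                "content": [
                    {
                        "type": "output_audio",
                        "transcript": "Hej, dejligt at fortsætte efter "
                        "vores korte introduktion. Sig endelig",
                    }
                ],
            }
        ],
    },
}

cutoff = content_filter_cutoff(sample)
if cutoff is not None:
    print("content_filter cut the response off after:", repr(cutoff))
```

Compare the result against `None` rather than testing truthiness, since the filter could in principle fire before any transcript was produced, in which case the helper returns an empty string. What to do once you detect it (log it, re-prompt, fall back to another model) is up to your application.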