Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreia3c6jj5phddhxa72jhtdj76bur4j54fwky6lou2d5u35be243ekm",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mehnzkbwkej2"
  },
  "path": "/t/realtime-api-feedback-we-built-a-star-trek-medical-computer-on-the-realtime-api-it-works-30-of-the-time/1373790#post_1",
  "publishedAt": "2026-02-09T22:31:51.000Z",
  "site": "https://community.openai.com",
  "textContent": "**TL;DR:** We’ve spent ~6 months building something that, when it fires on all cylinders, makes doctors stop mid-sentence and say “wait… it just _did_ that?” A passive AI medical scribe that listens, extracts, and structures clinical data in real time — no typing, no clicking, no dictation. The doctor just… talks to their patient; the AI is there to surface info and extract structured data.\n\n  * But the Realtime API’s instability is the wall between “incredible demo” and “deployed product.”\n\n\n\n* * *\n\n## What We Built\n\nA HIPAA-compliant passive AI assistant that sits in the background of a doctor-patient encounter. It listens. It extracts structured medical data through tool calls — orders, procedures, diagnoses, medications, clinical observations… whatever the doctor needs, as configured by them. It can answer questions about the patient’s chart mid-conversation. It can offload complex clinical reasoning to deeper models on the fly.\n\n  * `tool_choice: \"required\"`\n  * `output_modalities: [\"text\"]`\n  * No audio output.\n  * No chit-chat.\n  * Tools only.\n    * A `continue_waiting` tool handles cycles with nothing to extract.\n\n\n\nWhen it works — and I cannot stress this enough — _it is magic_. A cardiologist walks into a room, taps a button, has a 20-minute conversation about LVH and medication management, and walks out with structured, organized clinical data ready for the chart. We’ve had sessions where the AI caught medication discrepancies the doctor hadn’t noticed. We’ve had it surface relevant lab trends mid-conversation before the doctor even asked.\n\n> _**That’s the 30%. Here’s the other 70%.**_\n\n* * *\n\n## Issue 1: Silent Session Initialization Failures\n\n**~1 in 3-4 sessions.** Connection opens, green lights everywhere, but the API never processes audio. No errors, no disconnects — just silence. The only way to detect it is the _absence_ of expected events. 
We’ve built watchdog timers and automatic retry logic, but even with retries it sometimes just won’t start.\n\n> You can’t ask a doctor to “try again” while a patient is sitting in front of them, waiting for all these systems to initialize.\n\n* * *\n\n## Issue 2: Tool Selection Death Spirals\n\nOnce the AI starts calling `continue_waiting`, it frequently gets stuck — 15+ consecutive calls with zero extractions during active medical conversation with clear, extractable content. Corrective injections often make it worse; the AI over-indexes on the reminder rather than returning to its job.\n\n> We’ve iterated on this extensively. The tool selection behavior is fundamentally inconsistent.\n\n* * *\n\n## Issue 3: Text Responses Despite `tool_choice: \"required\"`\n\nThe AI periodically generates conversational text (“I understand, let me know…”) despite `tool_choice: \"required\"`. Wastes processing cycles and triggers cascading correction loops that feed into Issue 2.\n\nThis seems like a straightforward bug.\n\n* * *\n\n## Issue 4: Quality Degradation Over Session Length\n\nBeyond 10-15 minutes, extraction accuracy drops noticeably. More idle calls, missed data, less precise tool arguments. Medical encounters run 20-40 minutes routinely. This is a critical gap for any real-world clinical deployment.\n\n* * *\n\n# Why This Matters\n\n### We’re not building a toy. This is active, HIPAA-compliant clinical infrastructure being tested with real patients, real encounters, real cardiologists. The workflow is validated. The doctors who’ve experienced it working are asking us when they can have it every day.\n\nWe have the architecture. We have the clinical integration. We have the audio pipeline, the prompts, the data extraction, the chart integration, the reasoning offloads. Everything around the Realtime API works. 
**The Realtime API itself is the bottleneck.**\n\n## We’re this close to deploying something that fundamentally changes how clinical documentation works. We just need the engine to be as reliable as the machine we built around it.\n\n_Happy to share session logs or debug traces if useful to the engineering team._",
  "title": "[REALTIME API] - FEEDBACK - We Built a Star Trek Medical Computer on the Realtime API, It Works 30% of the Time"
}