{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreias5wsqg27aqljaimkrrseqxqxc3cyepcpehwmpvalvbcewa3gsl4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3ml3geaqppb62"
  },
  "path": "/t/how-do-you-reduce-latency-in-real-time-ai-applications/175624#post_3",
  "publishedAt": "2026-05-05T04:39:12.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "We are dealing with similar issues at my company. We need near-real-time audio processing: transcribe the audio, then analyze the text in various ways. A delay of 1 or 2 minutes is fine for us, but the pipeline we need to run is quite long.\n\nSince we are using Gemini, we are currently experimenting with provisioned throughput to see if it at least helps stabilize the response times.\n\nFor some parts of the pipeline, we are actually moving back to smaller local models, not necessarily generative ones.",
  "title": "How do you reduce latency in real-time AI applications?"
}