Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidvg6msrsb7off3x77ajl674daz3dhp5ow7xummijjae67k44ii7a",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mgdi5vr3wmr2"
  },
  "path": "/t/trace-to-fix-how-are-you-actually-improving-rag-agents-after-observability-flags-issues/174027#post_1",
  "publishedAt": "2026-03-05T17:11:50.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I’ve been looking at the agent/LLM observability space lately (Langfuse, LangSmith, Arize, Braintrust, Datadog LLM Observability, etc.). Traces are great at showing what failed and where it failed.\n\nWhat I’m still curious about is the step after that:\n\nHow do you go from “I see the failure in the trace” to “I found the fix” in a repeatable way?\n\nExamples of trace-level issues I mean:\n\n  * Retrieval returns low-quality context or misses key docs\n  * Citation enforcement fails or the model does not cite what it uses\n  * Tool calls have bad parameters or the agent picks the wrong tool\n  * Reranking or chunking choices look off in hindsight\n\nDo you:\n\n  * Write custom scripts to sweep params (chunk size, top-k, rerankers, prompts, tool policies)?\n  * Add failing traces to a dataset and run experiments?\n  * A/B prompts in production?\n  * Maintain a regression suite of traces?\n  * Something else?\n\nWould love to hear the practical workflow people are actually using.",
  "title": "Trace-to-Fix: how are you actually improving RAG/agents after observability flags issues?"
}