{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibq3s454pww6aeqcgyzfmv5xpfaagkc2n5nddepbrt3jtat7bktc4",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mjmwancoegr2"
  },
  "path": "/t/optimizing-agentic-architecture-strategies-for-reducing-high-token-costs-in-multi-intent-workflows/1379120#post_1",
  "publishedAt": "2026-04-16T15:52:39.000Z",
  "site": "https://community.openai.com",
  "textContent": "Hello everyone,\n\nI’m currently building an ERP assistant using the OpenAI Agentic SDK, and I’m trying to optimize cost as much as possible before scaling to many users.\n\n### Current Architecture\n\nI implemented a custom orchestration layer with:\n\n  * **Intent classification step** (using a lightweight model)\n\n  * **Dynamic model routing** based on intent\n\n  * **Dynamic tool loading** depending on the detected intent\n\n  * **Session management** using `OpenAIConversationsSession`\n\n\n\n\nFlow:\n\n  1. User message\n\n  2. -> Intent classification (`gpt-4o-mini`)\n\n  3. -> Route:\n\n     * model (`gpt-4o-mini`, `gpt-5.x-mini`, `gpt-4o`)\n\n     * tools (semantic search, SQL, actions, PDF, Google Drive, etc.)\n\n  4. -> Run agent with selected tools + prompt\n\n  5. -> Return structured ERP response\n\n\n\n\n### Optimization Strategies Already Implemented\n\n  * Using smaller models (`gpt-4o-mini`) for simple queries\n\n  * Restricting tool availability per intent\n\n  * Custom prompts per intent (to reduce unnecessary reasoning)\n\n  * Session reuse with overflow protection\n\n  * Strict scope enforcement (ERP-only assistant)\n\n  * Limiting max tokens in classification step\n\n\n\n\n### Problem\n\nDespite all of this, I’m still seeing relatively high cost:\n\n  * ~20 requests ≈ **$0.5 – $1**\n\n  * This feels too high for my use case, especially at scale\n\n\n\n\n### My Questions\n\n  1. **Is this expected with the Agent SDK?**\n\n     * Does the SDK internally add hidden token overhead (tool calls, system prompts, etc.)?\n  2. **Is intent classification doubling my cost unnecessarily?**\n\n     * Would it be better to:\n\n       * merge classification into the main agent?\n\n       * or use a rule-based / embedding-based router?\n\n  3. **Are tools increasing token usage significantly?**\n\n     * Even when not heavily used?\n  4. 
**Would switching to a single-model strategy be more efficient?**\n\n     * Instead of routing between multiple models\n  5. **Is there a better pattern than “agent per request”?**\n\n     * e.g., long-lived agents, cached context, or hybrid pipelines\n\n\n\n### Goal\n\nI want to reach something closer to:\n\n  * **$0.05 – $0.10 per 20 requests** (or similar efficiency)\n\n\n\nbefore scaling to production with many users.\n\nIf anyone has experience optimizing Agents SDK costs at scale, I’d really appreciate guidance, patterns, or even architecture feedback.\n\nThanks a lot.",
  "title": "Optimizing Agentic Architecture: Strategies for Reducing High Token Costs in Multi-Intent Workflows"
}