{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibq3s454pww6aeqcgyzfmv5xpfaagkc2n5nddepbrt3jtat7bktc4",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mjmwancoegr2"
},
"path": "/t/optimizing-agentic-architecture-strategies-for-reducing-high-token-costs-in-multi-intent-workflows/1379120#post_1",
"publishedAt": "2026-04-16T15:52:39.000Z",
"site": "https://community.openai.com",
"textContent": "Hello everyone,\n\nI’m currently building an ERP assistant using the OpenAI Agents SDK, and I’m trying to optimize cost as much as possible before scaling to many users.\n\n### Current Architecture\n\nI implemented a custom orchestration layer with:\n\n * **Intent classification step** (using a lightweight model)\n\n * **Dynamic model routing** based on intent\n\n * **Dynamic tool loading** depending on the detected intent\n\n * **Session management** using `OpenAIConversationsSession`\n\nFlow:\n\n 1. User message\n\n 2. -> Intent classification (`gpt-4o-mini`)\n\n 3. -> Route:\n\n * model (`gpt-4o-mini`, `gpt-5.x-mini`, `gpt-4o`)\n\n * tools (semantic search, SQL, actions, PDF, Google Drive, etc.)\n\n 4. -> Run agent with selected tools + prompt\n\n 5. -> Return structured ERP response\n\n### Optimization Strategies Already Implemented\n\n * Using smaller models (`gpt-4o-mini`) for simple queries\n\n * Restricting tool availability per intent\n\n * Custom prompts per intent (to reduce unnecessary reasoning)\n\n * Session reuse with overflow protection\n\n * Strict scope enforcement (ERP-only assistant)\n\n * Limiting max tokens in classification step\n\n### Problem\n\nDespite all of this, I’m still seeing relatively high cost:\n\n * ~20 requests ≈ **$0.50 – $1.00**\n\n * This feels too high for my use case, especially at scale\n\n### My Questions\n\n 1. **Is this expected with the Agents SDK?**\n\n * Does the SDK internally add hidden token overhead (tool calls, system prompts, etc.)?\n 2. **Is intent classification doubling my cost unnecessarily?**\n\n * Would it be better to:\n\n * merge classification into the main agent?\n\n * or use a rule-based / embedding-based router?\n\n 3. **Are tools increasing token usage significantly?**\n\n * Even when not heavily used?\n 4. **Would switching to a single-model strategy be more efficient?**\n\n * Instead of routing between multiple models\n 5. **Is there a better pattern than “agent per request”?**\n\n * e.g., long-lived agents, cached context, or hybrid pipelines\n\n### Goal\n\nI want to reach something closer to:\n\n * **$0.05 – $0.10 per 20 requests** (or similar efficiency)\n\nbefore scaling to production with many users.\n\nIf anyone has experience optimizing Agents SDK cost at scale, I’d really appreciate guidance, patterns, or even architecture feedback.\n\nThanks a lot",
"title": "Optimizing Agentic Architecture: Strategies for Reducing High Token Costs in Multi-Intent Workflows"
}