Optimizing Agentic Architecture: Strategies for Reducing High Token Costs in Multi-Intent Workflows
Hello everyone,
I’m currently building an ERP assistant using the OpenAI Agentic SDK, and I’m trying to optimize cost as much as possible before scaling to many users.
Current Architecture
I implemented a custom orchestration layer with:
Intent classification step (using a lightweight model)
Dynamic model routing based on intent
Dynamic tool loading depending on the detected intent
Session management using
OpenAIConversationsSession
Flow:
User message
-> Intent classification (
gpt-4o-mini)-> Route:
model (
gpt-4o-mini,gpt-5.x-mini,gpt-4o)tools (semantic search, SQL, actions, PDF, Google Drive, etc.)
-> Run agent with selected tools + prompt
-> Return structured ERP response
Optimization Strategies Already Implemented
Using smaller models (
gpt-4o-mini) for simple queriesRestricting tool availability per intent
Custom prompts per intent (to reduce unnecessary reasoning)
Session reuse with overflow protection
Strict scope enforcement (ERP-only assistant)
Limiting max tokens in classification step
Problem
Despite all of this, I’m still seeing relatively high cost:
~20 requests ≈ $0.5 – $1
This feels too high for my use case, especially at scale
My Questions
Is this expected with the Agent SDK?
- Does the SDK internally add hidden token overhead (tool calls, system prompts, etc.)?
Is intent classification doubling my cost unnecessarily?
Would it be better to:
merge classification into the main agent?
or use a rule-based / embedding-based router?
Are tools increasing token usage significantly?
- Even when not heavily used?
Would switching to a single-model strategy be more efficient?
- Instead of routing between multiple models
Is there a better pattern than “agent per request”?
- e.g., long-lived agents, cached context, or hybrid pipelines
Goal
I want to reach something closer to:
- $0.05 – $0.10 per 20 requests (or similar efficiency)
before scaling to production with many users.
If anyone has experience optimizing Agent SDK cost at scale, I’d really appreciate guidance, patterns, or even architecture feedback.
Thanks a lot
Discussion in the ATmosphere