External Publication

Optimizing Agentic Architecture: Strategies for Reducing High Token Costs in Multi-Intent Workflows

OpenAI Developer Community April 16, 2026

Hello everyone,

I’m currently building an ERP assistant using the OpenAI Agentic SDK, and I’m trying to optimize cost as much as possible before scaling to many users.

Current Architecture

I implemented a custom orchestration layer with:

Intent classification step (using a lightweight model)
Dynamic model routing based on intent
Dynamic tool loading depending on the detected intent
Session management using OpenAIConversationsSession

Flow:

User message
-> Intent classification (gpt-4o-mini)
-> Route:
- model (gpt-4o-mini, gpt-5.x-mini, gpt-4o)
- tools (semantic search, SQL, actions, PDF, Google Drive, etc.)
-> Run agent with selected tools + prompt
-> Return structured ERP response

Optimization Strategies Already Implemented

Using smaller models (gpt-4o-mini) for simple queries
Restricting tool availability per intent
Custom prompts per intent (to reduce unnecessary reasoning)
Session reuse with overflow protection
Strict scope enforcement (ERP-only assistant)
Limiting max tokens in classification step

Problem

Despite all of this, I’m still seeing relatively high cost:

~20 requests ≈ $0.5 – $1
This feels too high for my use case, especially at scale

My Questions

Is this expected with the Agent SDK?
- Does the SDK internally add hidden token overhead (tool calls, system prompts, etc.)?
Is intent classification doubling my cost unnecessarily?
- Would it be better to:
  - merge classification into the main agent?
  - or use a rule-based / embedding-based router?
Are tools increasing token usage significantly?
- Even when not heavily used?
Would switching to a single-model strategy be more efficient?
- Instead of routing between multiple models
Is there a better pattern than “agent per request”?
- e.g., long-lived agents, cached context, or hybrid pipelines

Goal

I want to reach something closer to:

$0.05 – $0.10 per 20 requests (or similar efficiency)

before scaling to production with many users.

If anyone has experience optimizing Agent SDK cost at scale, I’d really appreciate guidance, patterns, or even architecture feedback.

Thanks a lot

Current Architecture

Optimization Strategies Already Implemented

Problem

My Questions

Goal

Discussion in the ATmosphere