{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreickc4p2vlaqkgjgal5rqo3qzadvcli5lhvi5abnmrjls4ne3tsqwe",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mogqzwrnyh22"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreih7pa2xfhz46rfvz63yo43cc2mkjziof3ocsby3qbkog25xz5yfo4"
},
"mimeType": "image/webp",
"size": 59494
},
"path": "/qasim157/common-pitfalls-building-email-agents-and-fixes-29kg",
"publishedAt": "2026-06-16T21:18:04.000Z",
"site": "https://dev.to",
"tags": [
"ai",
"email",
"agents",
"bestpractices",
"dedup recipe",
"reply-handling recipe",
"rules"
],
"textContent": "A team ships their first email agent on a Thursday. Demo went great, handler's deployed, webhook's registered. Friday morning the on-call wakes up to an inbox where the agent has been enthusiastically replying to its own replies all night, a customer who received the same answer three times, and a thread in Gmail that's shattered into five separate conversations. None of it was an exotic failure — every one of these is a known pitfall with a known fix, documented in the Nylas Agent Accounts cookbook (the product's in beta; the mistakes are timeless).\n\nHere are the nine I'd check before any launch.\n\n## 1. The agent replies to itself\n\nThe `message.created` webhook fires for _outbound_ messages too — when your agent sends a reply via the API, that sent message triggers the same event as inbound mail. Skip this check and you've built a perpetual motion machine: reply, webhook, reply.\n\n**Fix:** filter the agent's own address at the very top of the handler, before any other logic.\n\n\n\n const sender = msg.from?.[0]?.email;\n if (sender === AGENT_EMAIL) return;\n\n\n## 2. No webhook deduplication\n\nDelivery is at-least-once. If your endpoint doesn't return `200` fast enough, or the network hiccups, the same `message.created` notification arrives again — and a naive handler replies twice. The dedup recipe calls this the most common source of duplicates.\n\n**Fix:** an atomic check-and-set on the message ID before processing — `INSERT ... ON CONFLICT DO NOTHING` in Postgres, `SET id 1 NX EX 86400` in Redis. Give records a TTL of 24–48 hours so late redeliveries still get caught without the table growing forever.\n\n## 3. Dedup without locking\n\nTwo concurrent workers (Lambda instances, ECS tasks) can race past the check-and-set in the same millisecond and both generate a reply. 
Dedup catches the same event delivered twice; it can't catch the same event _processed simultaneously_.\n\n**Fix:** a per-thread lock with a 30-second TTL so a crashed worker self-releases, plus a double-check inside the lock that inspects the thread's latest message and bails if the agent already replied. You need dedup _and_ locking; they cover different failure modes.\n\n## 4. Trusting the webhook payload for the message body\n\nThe webhook carries summary fields (`subject`, `from`, `snippet`), not the full body. Worse, if a body exceeds roughly 1 MB, the event type becomes `message.created.truncated` and the body is omitted entirely. Agents that parse the payload directly work in testing and fail on real-world mail.\n\n**Fix:** always fetch the full message from the API using the ID in the payload, as the reply-handling recipe does, and handle the truncated event type explicitly.\n\n## 5. Replies that don't thread\n\nSend a \"reply\" as a fresh message and it lands as a disconnected email in the recipient's client: no quoted context, no conversation grouping. Multiply by a few turns and the customer is hunting through five fragments of one discussion.\n\n**Fix:** pass `reply_to_message_id` (`replyToMessageId` in the SDK call shown here) on every reply. That makes the platform set the `In-Reply-To` and `References` headers so the message threads correctly in Gmail, Outlook, and the agent's own mailbox. Match incoming replies by `thread_id`, never by subject line: subjects get edited, and two different threads can share one.\n\n\n\n    await nylas.messages.send({\n      identifier: AGENT_GRANT_ID,\n      requestBody: {\n        replyToMessageId: msg.id,\n        to: [{ email: sender }],\n        body: replyBody,\n      },\n    });\n\n\n## 6. Replying instantly to every message\n\nHumans send corrections. 
A recipient fires off a reply, spots a mistake, and sends a follow-up fifteen seconds later — and your agent has already answered the first message, so now it answers the second too, and the conversation forks.\n\n**Fix:** a 30–60 second cooldown before responding in active threads, batching consecutive inbound messages into one considered reply.\n\n## 7. No outbound circuit breaker\n\nEven with dedup, locking, and self-filtering, a logic bug can still produce a reply storm — and an autonomous sender fails at machine speed. This is the safety net the dedup recipe says not to ship without.\n\n**Fix:** a per-thread send budget. If the agent has sent 3 or more messages on one thread within 5 minutes, stop sending and escalate to a human. A rate limit triggering is a page; a runaway agent is an apology tour.\n\n## 8. Letting junk wake the agent\n\nSpam, bounce-backs, and out-of-office auto-replies all fire `message.created`. If every one of them reaches your LLM, you're paying inference costs to reason about garbage — and risking the agent _answering_ it.\n\n**Fix:** push filtering below your application using rules. A `block` rule rejects known-bad senders at the SMTP level so your code never sees the message; `assign_to_folder` routes automated notifications away from the inbox so your handler can skip folders the agent shouldn't answer. Rules run in priority order (0–1000, lower first), so put specific matches before broad `contains` rules — the first matching `block` is terminal.\n\n## 9. Treating a blocked send as a retryable error\n\nIf your workspace has outbound rules, a send matching a `block` rule returns `403` — and no retry will ever deliver it, because the rule rejected it before the provider was involved. An agent with generic retry logic will hammer that send forever and report a flaky network.\n\n**Fix:** treat `403` on send as terminal. 
Log it, then query `GET /v3/grants/{grant_id}/rule-evaluations` to see exactly which rule matched and what data was evaluated — that endpoint is the fastest answer to \"why didn't this send?\"\n\nThere's one nuance worth encoding in your error handler. Rule evaluation fails closed: if a `block` rule can't be evaluated because of a transient infrastructure problem (say, a list lookup failure during `in_list` matching), the send is blocked anyway — but it comes back as a `503`, not a `403`, and the audit record carries `blocked_by_evaluation_error: true`. So the rule is simple: retry `503`, never retry `403`. Conflating the two is how agents either give up on deliverable mail or hammer undeliverable mail.\n\nThe pattern across all nine: email agents fail at the seams between at-least-once infrastructure and autonomous action, not in the LLM prompt. The fixes are boring — a filter, a lock, a cap, a rule — and that's the point. Boring is what you want standing between a language model and a real human's inbox.\n\nTurn this into a pre-launch checklist: nine items, and your load test should specifically exercise #2 and #3 by firing duplicate webhooks from concurrent connections. Which of these has bitten you in production — and was the fix on this list?",
"title": "Common Pitfalls Building Email Agents (and Fixes)"
}