Self-Improving Agents via Scheduled Reflection: Anthropic's Dreaming Architecture
On May 6, 2025, Anthropic shipped three new capabilities for Managed Agents: Dreaming (research preview), Outcomes (public beta), and multi-agent orchestration (public beta). This note focuses on the architectural implications of Dreaming and Outcomes as a coupled feedback mechanism, with attention to what distinguishes this approach from existing memory and evaluation patterns.
The Core Problem: Cross-Session Blind Spots
Standard agent architectures are stateless by design. Each session begins from a fixed system prompt plus whatever context is explicitly passed. The agent has no visibility into patterns that emerge across sessions unless that visibility is explicitly engineered.
Existing approaches to persistent agent memory fall into a few categories:
Explicit tool-based writes: the agent calls memory.write() when instructed
End-of-session summarization: a summary is generated and prepended to future sessions
RAG over interaction history: past sessions are embedded and retrieved at query time
All of these are reactive. The agent records what it’s told to record, or retrieves what’s explicitly queried. None of them surface emergent patterns across sessions without human-defined retrieval logic.
Dreaming addresses a different problem: proactive pattern extraction from session history without explicit instruction.
Dreaming: Architecture
What it is
Dreaming is a scheduled background process that runs between sessions. The agent reviews past conversation transcripts, identifies recurring patterns, and writes learnings into its memory store. The original session data is not modified.
Three pattern types are targeted:
Recurring errors of the same type
Approaches that consistently produced good outcomes
Edge cases the agent systematically missed
Why cross-session visibility matters
A single session cannot observe cross-session patterns. A support agent making the same classification error 12 times in a month has no mechanism to notice this — each session starts fresh. Dreaming surfaces exactly this class of signal.
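To make the cross-session point concrete, here is a minimal sketch of counting error categories across session records. It is not Anthropic's implementation: `SessionRecord`, `extract_recurring_errors`, the tag names, and the `min_sessions` threshold are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRecord:
    """Hypothetical summary of one finished session (not an Anthropic type)."""
    session_id: str
    error_tags: tuple  # error categories observed in that session


def extract_recurring_errors(sessions, min_sessions=3):
    """Return error tags seen in at least `min_sessions` distinct sessions.

    Only reads the records; the session history itself is never modified.
    """
    counts = Counter()
    for record in sessions:
        # Count each tag once per session so one noisy session cannot dominate.
        counts.update(set(record.error_tags))
    return {tag: n for tag, n in counts.items() if n >= min_sessions}


history = [
    SessionRecord("s1", ("misclassified_refund",)),
    SessionRecord("s2", ("misclassified_refund", "timeout")),
    SessionRecord("s3", ()),
    SessionRecord("s4", ("misclassified_refund",)),
]
learnings = extract_recurring_errors(history)
print(learnings)  # {'misclassified_refund': 3}
```

No single record above reveals the pattern; it only appears once the records are aggregated, which is the class of signal a between-session pass can see and an in-session agent cannot.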
This is structurally similar to the role of sleep-phase memory consolidation in biological systems: individual experiences are processed in isolation during acquisition, but pattern extraction and long-term storage happen in a separate, offline phase.
Autonomy modes
Automatic:
analysis → direct write to memory store
(no human in the loop)
Human Review:
analysis → proposed memory updates
→ human approval
→ apply on confirmation
The choice between modes is an architectural decision about acceptable autonomy level. Automatic is appropriate for well-bounded domains with predictable error patterns. Human Review is appropriate where unintended learning could have significant downstream consequences.
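The difference between the two modes can be sketched as a gate in front of the memory write. This is an illustrative model only: `Mode`, `apply_updates`, and the `approve` callback (standing in for a human reviewer) are hypothetical names, not part of any documented API.

```python
from enum import Enum


class Mode(Enum):
    AUTOMATIC = "automatic"
    HUMAN_REVIEW = "human_review"


def apply_updates(memory, proposed, mode, approve=None):
    """Return a new memory dict with proposed updates applied per mode.

    `approve` is a hypothetical (key, value) -> bool callback that plays
    the role of the human reviewer in HUMAN_REVIEW mode.
    """
    updated = dict(memory)  # copy, so the caller's store is left untouched
    for key, value in proposed.items():
        if mode is Mode.AUTOMATIC:
            updated[key] = value  # no human in the loop
        elif approve is not None and approve(key, value):
            updated[key] = value  # applied only on confirmation
    return updated


memory = {"greeting": "formal"}
proposed = {"refund_rule": "check order date first", "tone": "sarcastic"}

auto = apply_updates(memory, proposed, Mode.AUTOMATIC)
reviewed = apply_updates(
    memory, proposed, Mode.HUMAN_REVIEW,
    approve=lambda key, value: key != "tone",  # reviewer rejects the tone change
)
```

In this toy run, Automatic mode accepts both proposals, including the undesirable tone change, while Human Review keeps only the approved one.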
Memory as accumulated deployment context
An agent that has been running for three months and an agent deployed today with an identical prompt are different systems. The former has three months of self-curated experience in its memory store. This is not model fine-tuning — it’s dynamically updated context specific to a particular deployment instance.
The implication: the competitive advantage of an agent is no longer solely its prompt or its base model. It’s the history of what it has learned from its own operation.
Outcomes: Isolated Evaluation
The signal problem
Dreaming requires a quality signal. Without it, the agent cannot distinguish good outputs from bad when analyzing session history. Outcomes provides this signal through an isolated evaluator.
Architecture
Success rubric: defined by the developer. Can include objective criteria (file structure, required fields, format compliance) or subjective criteria (editorial voice, brand consistency, writing style).
Isolated evaluator: a separate Claude instance running in its own context window, isolated from the primary agent’s reasoning chain. This isolation is architecturally significant: the evaluator has no access to the agent’s chain-of-thought, preventing rationalization bias in evaluation.
Iteration loop:
agent generates output
↓
evaluator checks against rubric
↓
pass → done
fail → evaluator identifies what needs to change
↓
agent iterates
↓
repeat until pass or max iterations
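The loop above can be sketched as a small driver function. This is a sketch under stated assumptions, not the Outcomes API: `run_with_rubric`, `generate`, and `evaluate` are hypothetical, and the toy agent and rubric exist only to make the example runnable.

```python
def run_with_rubric(generate, evaluate, max_iterations=3):
    """Iterate until the evaluator passes the output or attempts run out.

    generate(feedback) -> output            (hypothetical agent call)
    evaluate(output) -> (passed, feedback)  (hypothetical grader call)
    Returns (output, attempts_used, passed).
    """
    feedback = None
    output = None
    for attempt in range(1, max_iterations + 1):
        output = generate(feedback)
        # The evaluator receives only the output, never the agent's reasoning.
        passed, feedback = evaluate(output)
        if passed:
            return output, attempt, True
    return output, max_iterations, False


REQUIRED_FIELDS = ("title", "summary")  # toy objective rubric


def toy_evaluate(output):
    """Pass when every required field is present; otherwise list what is missing."""
    missing = [field for field in REQUIRED_FIELDS if field not in output]
    return (not missing, missing)


def make_toy_agent():
    """Build a stand-in agent that fills in whichever fields the evaluator flagged."""
    draft = {"title": "Q3 report"}

    def generate(feedback):
        for field in feedback or []:
            draft[field] = f"<{field}>"
        return dict(draft)

    return generate


result = run_with_rubric(make_toy_agent(), toy_evaluate)
print(result)  # ({'title': 'Q3 report', 'summary': '<summary>'}, 2, True)
```

The iteration cap matters: with `max_iterations=1` the same toy agent returns its first draft marked as failed rather than looping indefinitely.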
Performance numbers
From Anthropic’s internal testing:
Task success rates: +10 percentage points over standard prompting
Structured file generation: .docx +8.4%, .pptx +10.1%
Applicable to subjective quality dimensions: editorial voice, style, brand consistency
These numbers are from Anthropic’s internal benchmarks. Results on specific tasks will depend heavily on rubric quality and task characteristics.
The Dreaming + Outcomes coupling
Outcomes → identifies failures (what didn't work)
Dreaming → remembers failure patterns (why it didn't work)
Together they close the feedback loop without human intervention at each cycle. Outcomes is the exam; Dreaming is the error notebook. The combination enables a self-improvement loop that operates autonomously between sessions.
Multi-Agent Orchestration: Topology
Structure
Coordinator agent (1 instance)
├── Subagent 1 (independent context window)
├── Subagent 2 (independent context window)
├── ...
└── Subagent N (up to 20, shared filesystem)
Key constraints
Orchestration depth: 1 level. Sub-subagents are not supported. This is a deliberate constraint that simplifies tracing and debugging.
Claude models only. Orchestration, Dreaming, and Outcomes grading all run on Claude. Cross-provider routing is not supported at this layer.
Shared filesystem as the coordination mechanism between subagents.
Full trace visibility in Claude Console.
Coordinator can send follow-up messages mid-workflow; subagents retain context between exchanges.
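A rough local analogue of this topology, assuming threads stand in for subagents and a temporary directory stands in for the shared filesystem. `coordinate`, `run_subagent`, and the task names are hypothetical; only the 20-subagent cap and the single level of delegation come from the constraints above.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_SUBAGENTS = 20  # limit stated in the text


def run_subagent(task, workspace):
    """Hypothetical subagent: writes its result into the shared workspace."""
    out = workspace / f"{task}.txt"
    out.write_text(f"analyzed {task}")
    return out


def coordinate(tasks, workspace):
    """Fan tasks out to one subagent each, then collect results from disk."""
    if len(tasks) > MAX_SUBAGENTS:
        raise ValueError(f"at most {MAX_SUBAGENTS} subagents, got {len(tasks)}")
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        # One level of delegation only: subagents never spawn further agents.
        list(pool.map(lambda task: run_subagent(task, workspace), tasks))
    # The coordinator reads results back from the shared directory.
    return {p.stem: p.read_text() for p in sorted(workspace.glob("*.txt"))}


with tempfile.TemporaryDirectory() as tmp:
    results = coordinate(["build-101", "build-102"], Path(tmp))

print(results)
```

Because subagents exchange results through files rather than through each other's context windows, the coordinator can inspect every intermediate artifact, which is part of what keeps a flat topology easy to trace.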
Infrastructure ownership
Anthropic handles process management, failure recovery, context synchronization, and timeout handling. The developer defines what each agent does; Anthropic manages how it runs.
Reported results
Harvey (legal AI): task completion rates increased approximately 6x
Wisedocs (document verification): review speed improved 50% while maintaining quality standards
Netflix: parallel batch analysis across hundreds of build logs
Spiral by Every: Haiku coordinator + Opus writing subagents + Outcomes grader scoring against editorial principles
The Complete Self-Improvement Loop
The three capabilities compose into a closed loop:
Task decomposition (orchestration)
↓
Execution
↓
Output evaluation (Outcomes)
↓
Cross-session pattern extraction (Dreaming)
↓
Applied to future sessions
This is the architectural shift from stateless tool to accumulating system. Each component addresses a distinct layer:
| Layer | Component | Function |
|---|---|---|
| Execution | Multi-agent orchestration | Parallel task decomposition and delegation |
| Evaluation | Outcomes | Isolated quality grading against developer rubrics |
| Reflection | Dreaming | Scheduled cross-session pattern extraction |
| Notification | Webhooks | Push notifications on task completion |
Limitations and Open Questions
Claude-only constraint
All components — orchestration, Dreaming, Outcomes grading — run exclusively on Claude models. Systems requiring model diversity for cost optimization, specialized capabilities, or latency requirements need to solve that routing layer separately.
Dreaming is research preview
Dreaming is not generally available (GA). Production integration planning should account for potential API changes.
Orchestration depth limit
Sub-subagents are not currently supported, so complex hierarchical task decomposition that requires more than one level of delegation is not possible within a single orchestration.
Autonomous memory update risks
In Automatic mode, the agent can learn in unintended directions. Human Review mode exists as a mitigation, but at scale, human review becomes a bottleneck.
Open questions not yet addressed in public documentation
Dreaming schedule frequency: configurable or fixed?
History window: how many past sessions are analyzed per cycle?
Memory conflict resolution: how are contradictions between new and existing memory entries handled?
Multi-tenant isolation: if one agent serves multiple clients, how is memory isolated per client?
These questions become critical at production scale.
Pricing
Standard Claude API token rates + $0.08 per active session hour. Idle time is free. Dreaming, Outcomes, and Webhooks carry no additional charges.
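As a worked example of the pricing formula, the sketch below adds the session-hour charge to a token bill. The $0.08 rate is the figure quoted above; the 50 active hours and the $12.40 token cost are made-up inputs, and `estimate_cost` is a hypothetical helper.

```python
SESSION_RATE_USD_PER_HOUR = 0.08  # rate quoted in the text


def estimate_cost(active_hours, token_cost_usd):
    """Total = token cost + $0.08 per active session hour (idle time is free)."""
    return round(token_cost_usd + SESSION_RATE_USD_PER_HOUR * active_hours, 2)


# Hypothetical month: 50 active session hours and $12.40 of token usage.
total = estimate_cost(active_hours=50, token_cost_usd=12.40)
print(total)  # 12.40 + 0.08 * 50 = 16.4
```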
Quick Reference
| Feature | Status | Function |
|---|---|---|
| Dreaming | Research preview | Scheduled review of past sessions, pattern extraction, memory update |
| Outcomes | Public beta | Automated output grading against developer-defined rubrics |
| Multi-agent orchestration | Public beta | Coordinator + up to 20 parallel subagents, shared filesystem |
| Webhooks | Public beta | Push notifications on agent task completion |
| Pricing | Live | $0.08/active session hour + standard token costs |
Sources:
Anthropic: New in Claude Managed Agents
Anthropic Engineering: Decoupling the Brain from the Hands
Author: Jessie — works on multi-model agent integration infrastructure at EvoLink.