Self-Improving Agents via Scheduled Reflection: Anthropic's Dreaming Architecture

Hugging Face Forums [Unofficial] May 7, 2026
On May 6, 2026, Anthropic shipped three new capabilities for Managed Agents: Dreaming (research preview), Outcomes (public beta), and multi-agent orchestration (public beta). This note focuses on the architectural implications of Dreaming and Outcomes as a coupled feedback mechanism, and on what distinguishes the approach from existing memory and evaluation patterns.


The Core Problem: Cross-Session Blind Spots

Standard agent architectures are stateless by design. Each session begins from a fixed system prompt plus whatever context is explicitly passed. The agent has no visibility into patterns that emerge across sessions unless that visibility is explicitly engineered.

Existing approaches to persistent agent memory fall into a few categories:

  • Explicit tool-based writes: the agent calls memory.write() when instructed

  • End-of-session summarization: a summary is generated and prepended to future sessions

  • RAG over interaction history: past sessions are embedded and retrieved at query time

All of these are reactive. The agent records what it’s told to record, or retrieves what’s explicitly queried. None of them surface emergent patterns across sessions without human-defined retrieval logic.

Dreaming addresses a different problem: proactive pattern extraction from session history without explicit instruction.


Dreaming: Architecture

What it is

Dreaming is a scheduled background process that runs between sessions. The agent reviews past conversation transcripts, identifies recurring patterns, and writes learnings into its memory store. The original session data is not modified.

Three pattern types are targeted:

  • Recurring errors of the same type

  • Approaches that consistently produced good outcomes

  • Edge cases the agent systematically missed

Why cross-session visibility matters

A single session cannot observe cross-session patterns. A support agent making the same classification error 12 times in a month has no mechanism to notice this — each session starts fresh. Dreaming surfaces exactly this class of signal.
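As a rough illustration of the kind of signal described above, the sketch below counts error types across a batch of session records and keeps only those that recur. The `SessionRecord` type, the `recurring_errors` function, the `min_count` threshold, and the error labels are all hypothetical names chosen for this example; the post does not describe how Dreaming actually detects patterns.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class SessionRecord:
    """Hypothetical per-session summary; not an Anthropic data structure."""
    session_id: str
    error_type: Optional[str]  # None means the session finished without an error


def recurring_errors(sessions: Iterable[SessionRecord], min_count: int = 3) -> Dict[str, int]:
    """Return error types seen at least `min_count` times across all sessions."""
    counts = Counter(s.error_type for s in sessions if s.error_type is not None)
    return {err: n for err, n in counts.items() if n >= min_count}


# Twelve sessions with the same classification error, plus one-off noise.
sessions = [SessionRecord(f"s{i}", "misclassified_refund") for i in range(12)]
sessions += [SessionRecord("s12", "timeout"), SessionRecord("s13", None)]

learnings = recurring_errors(sessions)
print(learnings)
```

No single session in this batch contains enough information to flag the refund error as systematic; the pattern only appears once the records are aggregated.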

This is structurally similar to the role of sleep-phase memory consolidation in biological systems: individual experiences are processed in isolation during acquisition, but pattern extraction and long-term storage happen in a separate, offline phase.

Autonomy modes

Automatic:
  analysis → direct write to memory store
  (no human in the loop)

Human Review:
  analysis → proposed memory updates
  → human approval
  → apply on confirmation

The choice between modes is an architectural decision about acceptable autonomy level. Automatic is appropriate for well-bounded domains with predictable error patterns. Human Review is appropriate where unintended learning could have significant downstream consequences.
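The two modes can be sketched as a single gate in front of the memory store. This is a minimal local model, assuming a list-backed store and an approval callback; `Mode`, `MemoryStore`, `apply_learnings`, and `approve` are hypothetical names, not part of any documented Managed Agents API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Mode(Enum):
    AUTOMATIC = "automatic"
    HUMAN_REVIEW = "human_review"


@dataclass
class MemoryStore:
    """Hypothetical in-memory stand-in for the agent's memory store."""
    entries: List[str] = field(default_factory=list)


def apply_learnings(
    store: MemoryStore,
    proposed: List[str],
    mode: Mode,
    approve: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Write proposed learnings to the store and return the ones that were applied."""
    if mode is Mode.AUTOMATIC:
        accepted = list(proposed)  # no human in the loop
    else:
        if approve is None:
            raise ValueError("Human Review mode needs an approval callback")
        accepted = [p for p in proposed if approve(p)]  # apply only on confirmation
    store.entries.extend(accepted)
    return accepted


proposed = ["Refund requests over $500 need a manager tag", "Always answer in French"]

auto_store = MemoryStore()
apply_learnings(auto_store, proposed, Mode.AUTOMATIC)

reviewed_store = MemoryStore()
apply_learnings(reviewed_store, proposed, Mode.HUMAN_REVIEW, approve=lambda p: "Refund" in p)
```

In the example, Automatic mode stores both proposals, including the questionable second one, while Human Review stores only the entry the reviewer confirms.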

Memory as accumulated deployment context

An agent that has been running for three months and an agent deployed today with an identical prompt are different systems. The former has three months of self-curated experience in its memory store. This is not model fine-tuning — it’s dynamically updated context specific to a particular deployment instance.

The implication: the competitive advantage of an agent is no longer solely its prompt or its base model. It’s the history of what it has learned from its own operation.
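One way to picture the difference between the two deployments is to treat the effective context as the prompt plus whatever the memory store holds. The sketch below assumes memory entries are simply appended to the system prompt as notes; the post does not say how memory is actually surfaced to the agent, and `build_context` is a hypothetical helper.

```python
from typing import List


def build_context(system_prompt: str, memory_entries: List[str]) -> str:
    """Combine a fixed prompt with deployment-specific memory (assumed format)."""
    if not memory_entries:
        return system_prompt
    notes = "\n".join(f"- {entry}" for entry in memory_entries)
    return f"{system_prompt}\n\nLearned notes from prior sessions:\n{notes}"


prompt = "You are a support agent for billing questions."

fresh_deployment = build_context(prompt, [])
seasoned_deployment = build_context(
    prompt,
    ["Refund requests are often mislabeled as disputes; check the order status first."],
)
```

Both deployments share the same prompt and base model, yet the context each one starts from differs, and nothing about the model weights has changed.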


Outcomes: Isolated Evaluation

The signal problem

Dreaming requires a quality signal. Without it, the agent cannot distinguish good outputs from bad when analyzing session history. Outcomes provides this signal through an isolated evaluator.

Architecture

Success rubric: defined by the developer. Can include objective criteria (file structure, required fields, format compliance) or subjective criteria (editorial voice, brand consistency, writing style).

Isolated evaluator: a separate Claude instance running in its own context window, isolated from the primary agent’s reasoning chain. This isolation is architecturally significant: the evaluator has no access to the agent’s chain-of-thought, preventing rationalization bias in evaluation.

Iteration loop:

agent generates output
    ↓
evaluator checks against rubric
    ↓
pass → done
fail → evaluator identifies what needs to change
    ↓
agent iterates
    ↓
repeat until pass or max iterations
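The loop above can be sketched in a few lines. This is a local illustration under stated assumptions, not the Outcomes API: `Verdict`, `run_with_outcomes`, and the toy generator and evaluator are hypothetical, and the rubric here is a single string check. The one property carried over from the description is that the evaluator receives only the output, never the agent's reasoning.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""  # what needs to change, filled in on failure


def run_with_outcomes(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Verdict],
    max_iterations: int = 3,
) -> Tuple[str, int, bool]:
    """Iterate until the evaluator passes the output or the budget runs out.

    Returns (last output, attempts used, whether it passed).
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback: Optional[str] = None
    output = ""
    for attempt in range(1, max_iterations + 1):
        output = generate(feedback)
        verdict = evaluate(output)  # evaluator sees the output only
        if verdict.passed:
            return output, attempt, True
        feedback = verdict.feedback
    return output, max_iterations, False


def toy_generate(feedback: Optional[str]) -> str:
    """Stand-in agent: adds the missing section once it receives feedback."""
    draft = "Quarterly report"
    if feedback:
        draft += "\n## Summary"
    return draft


def toy_evaluate(output: str) -> Verdict:
    """Stand-in rubric: the document must contain a summary section."""
    if "## Summary" in output:
        return Verdict(passed=True)
    return Verdict(passed=False, feedback="Add a '## Summary' section.")


result = run_with_outcomes(toy_generate, toy_evaluate)
```

Keeping `generate` and `evaluate` as separate callables mirrors the isolation described above: the only channel between them is the output going one way and the feedback string going the other.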

Performance numbers

From Anthropic’s internal testing:

  • Task success rates: +10 percentage points over standard prompting

  • Structured file generation: .docx +8.4%, .pptx +10.1%

  • Applicable to subjective quality dimensions: editorial voice, style, brand consistency

These numbers are from Anthropic’s internal benchmarks. Results on specific tasks will depend heavily on rubric quality and task characteristics.

The Dreaming + Outcomes coupling

Outcomes → identifies failures (what didn't work)
Dreaming → remembers failure patterns (why it didn't work)

Together they close the feedback loop without human intervention at each cycle. Outcomes is the exam; Dreaming is the error notebook. The combination enables a self-improvement loop that operates autonomously between sessions.


Multi-Agent Orchestration: Topology

Structure

Coordinator agent (1 instance)
    ├── Subagent 1 (independent context window)
    ├── Subagent 2 (independent context window)
    ├── ...
    └── Subagent N (up to 20, shared filesystem)

Key constraints

  • Orchestration depth: 1 level. Sub-subagents are not supported. This is a deliberate constraint that simplifies tracing and debugging.

  • Claude models only. Orchestration, Dreaming, and Outcomes grading all run on Claude. Cross-provider routing is not supported at this layer.

  • Shared filesystem as the coordination mechanism between subagents.

  • Full trace visibility in Claude Console.

  • Coordinator can send follow-up messages mid-workflow; subagents retain context between exchanges.
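The topology and its constraints can be mimicked locally with threads and a temporary directory. This is an analogy only: in Managed Agents the subagents are Claude instances run by Anthropic's infrastructure, whereas here `run_subagent` is a hypothetical stand-in function, and `coordinate` and `MAX_SUBAGENTS` are names invented for the sketch. It shows the single level of fan-out, the cap of 20, and the shared filesystem as the coordination channel.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List

MAX_SUBAGENTS = 20  # limit stated in the post


def run_subagent(index: int, task: str, shared_dir: Path) -> Path:
    """Stand-in for a subagent: does its work and writes the result to shared storage."""
    out = shared_dir / f"subagent_{index}.txt"
    out.write_text(f"processed: {task}", encoding="utf-8")
    return out


def coordinate(tasks: List[str], shared_dir: Path) -> List[str]:
    """Fan tasks out one level deep, then collect results from the shared directory."""
    if len(tasks) > MAX_SUBAGENTS:
        raise ValueError(f"at most {MAX_SUBAGENTS} subagents are supported")
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        # map() preserves task order, so results line up with the input list
        paths = list(pool.map(lambda pair: run_subagent(pair[0], pair[1], shared_dir),
                              enumerate(tasks)))
    return [p.read_text(encoding="utf-8") for p in paths]


with tempfile.TemporaryDirectory() as tmp:
    results = coordinate(["build-log-a", "build-log-b"], Path(tmp))
```

Because `run_subagent` never calls `coordinate`, the sketch cannot produce sub-subagents, which matches the one-level depth constraint.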

Infrastructure ownership

Anthropic handles process management, failure recovery, context synchronization, and timeout handling. The developer defines what each agent does; Anthropic manages how it runs.

Reported results

  • Harvey (legal AI): task completion rates increased approximately 6x

  • Wisedocs (document verification): review speed improved 50% while maintaining quality standards

  • Netflix: parallel batch analysis across hundreds of build logs

  • Spiral by Every: Haiku coordinator + Opus writing subagents + Outcomes grader scoring against editorial principles


The Complete Self-Improvement Loop

The three capabilities compose into a closed loop:

Task decomposition (orchestration)
    ↓
Execution
    ↓
Output evaluation (Outcomes)
    ↓
Cross-session pattern extraction (Dreaming)
    ↓
Applied to future sessions

This is the architectural shift from stateless tool to accumulating system. Each component addresses a distinct layer:

| Layer | Component | Function |
|---|---|---|
| Execution | Multi-agent orchestration | Parallel task decomposition and delegation |
| Evaluation | Outcomes | Isolated quality grading against developer rubrics |
| Reflection | Dreaming | Scheduled cross-session pattern extraction |
| Notification | Webhooks | Push notifications on task completion |

Limitations and Open Questions

Claude-only constraint

All components — orchestration, Dreaming, Outcomes grading — run exclusively on Claude models. Systems requiring model diversity for cost optimization, specialized capabilities, or latency requirements need to solve that routing layer separately.

Dreaming is research preview

Dreaming is not yet generally available (GA). Production integration planning should account for potential API changes.

Orchestration depth limit

Sub-subagents are not currently supported. Task decompositions that need more than one level of delegation must be restructured to fit a single coordinator layer.

Autonomous memory update risks

In Automatic mode, the agent can learn in unintended directions. Human Review mode exists as a mitigation, but at scale, human review becomes a bottleneck.

Open questions not yet addressed in public documentation

  1. Dreaming schedule frequency: configurable or fixed?

  2. History window: how many past sessions are analyzed per cycle?

  3. Memory conflict resolution: how are contradictions between new and existing memory entries handled?

  4. Multi-tenant isolation: if one agent serves multiple clients, how is memory isolated per client?

These questions become critical at production scale.


Pricing

Standard Claude API token rates + $0.08 per active session hour. Idle time is free. Dreaming, Outcomes, and Webhooks carry no additional charges.
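A back-of-the-envelope estimate follows directly from that formula. In the sketch below only the $0.08 per active session hour comes from the post; the per-million-token rates passed in the example are placeholders, not actual Claude prices, and `estimate_cost` is a hypothetical helper.

```python
SESSION_HOUR_RATE_USD = 0.08  # per active session hour, as stated in the post


def estimate_cost(
    active_hours: float,
    input_tokens: int,
    output_tokens: int,
    input_rate_per_mtok: float,
    output_rate_per_mtok: float,
) -> float:
    """Estimate total cost in USD: runtime charge plus token charges.

    Idle time is excluded by passing only active hours.
    """
    runtime = active_hours * SESSION_HOUR_RATE_USD
    tokens = (
        input_tokens / 1_000_000 * input_rate_per_mtok
        + output_tokens / 1_000_000 * output_rate_per_mtok
    )
    return round(runtime + tokens, 4)


# Placeholder token rates for illustration only.
cost = estimate_cost(
    active_hours=10,
    input_tokens=2_000_000,
    output_tokens=500_000,
    input_rate_per_mtok=3.0,
    output_rate_per_mtok=15.0,
)
```

With these placeholder rates the runtime charge is $0.80 against $13.50 in tokens, which suggests that token usage, not session hours, is likely to dominate the bill for most workloads.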


Quick Reference

| Feature | Status | Function |
|---|---|---|
| Dreaming | Research preview | Scheduled review of past sessions, pattern extraction, memory update |
| Outcomes | Public beta | Automated output grading against developer-defined rubrics |
| Multi-agent orchestration | Public beta | Coordinator + up to 20 parallel subagents, shared filesystem |
| Webhooks | Public beta | Push notifications on agent task completion |
| Pricing | Live | $0.08/active session hour + standard token costs |

Sources:

  • Anthropic: New in Claude Managed Agents

  • Anthropic Engineering: Decoupling the Brain from the Hands


Author: Jessie — works on multi-model agent integration infrastructure at EvoLink.
