{
  "path": "/posts/2026/stateful-agent-collaboration/index",
  "site": "at://did:plc:mracrip6qu3vw46nbewg44sm/site.standard.publication/self",
  "$type": "site.standard.document",
  "title": "Stateful Agent Collaboration",
  "updatedAt": "2026-04-03T19:07:40.000Z",
  "publishedAt": "2026-03-16T13:24:11.411Z",
  "textContent": "I have a dedicated machine running an agent with its own GitHub account, its own Cloudflare account, and persistent memory.\nIt pushes code, deploys services, writes to databases, and lots more.\nI interact with it through Slack.\n\nThe agent does not remember anything on its own.\nIt reads records of what it did.\nA journal, semantic memories, and state files on a filesystem get rebuilt into context each session.\nWhat gets remembered is what gets written to, then read from, files.\n\nA language model is a stateless, sessionless, and permissionless tool.\nA stateful agent is a collaborator with access to what happened before and its own workspace.\n\nWhy Low Stakes Changes Everything\n\nThe most important property of this setup is low consequence of failure.\nNothing the agent touches is shared with others or relied upon by anyone other than the people building with it.\nIf something breaks, the blast radius is contained.\nThis changes the calculus of building with an agent.\nInstead of carefully specifying what to build, reviewing every change, and gating deployments, I just say \"try it\" and see what happens.\nThe cost of an experiment drops to nearly zero.\n\nSpeed as a Learning Function\n\nBecause experiments complete quickly, often in a single prompt-to-deployed-app cycle, I learn fast.\nNot just whether something works technically, but whether the idea is worth pursuing.\nI get the feeling and texture of a concept before committing too much time to it.\nI cannot learn \"this idea felt wrong once I could see it\" from a spec review.\nThe artifact has to exist for certain kinds of judgement to activate.\nThis prototyping is different from planning.\nIt provides access to something I could not reach any other way.\n\nConsider the difference between evaluating a dashboard design in a mockup and loading it on my phone with real data.\nThe mockup tells me about layout.\nThe running version tells me whether I actually care about these 
numbers when I see them.\nThat second kind of learning is inaccessible without the artifact existing.\nThe speed of this setup means I reach that judgement point in hours instead of weeks.\nI reach it dozens of times instead of once.\nDirect experience replaces the abstraction of planning.\nWhat is worth pursuing in service of a larger vision? What does not actually make sense once I can see it?\n\nPrompt to Working Application\n\nAn opinionated stack and set of defaults make this possible.\nCloudflare Workers provide compute.\nD1 and R2 provide storage.\nCloudflare Tunnels expose local services to the public internet.\nGitHub repos and Actions provide CI/CD.\nA single prompt can produce a working application with stable storage, deployed to a public URL, with CI/CD for future changes.\nThe gap between \"what if I tried...\" and a running service is one conversation.\n\nSkills: Capturing What Works\n\nAfter enough exploration, I notice patterns emerging.\nThings like a specific sequence of API calls, a deployment recipe, or a way of structuring a particular kind of task.\nI use the agent to capture this pattern as a skill: a recipe that worked, written down so it can be referenced by name and repeated without re-deriving the steps from scratch.\nThe lifecycle is the following: I explore broadly, try things, fail, iterate, and discover what works through direct experience.\nWhen a workflow stabilizes or when the same sequence of steps keeps producing good results, I capture it as a skill.\nI write down the recipe (or use the agent to assist), including what it does, when to use it, and the mistakes learned the hard way.\nTo invoke the skill, I just say \"use the file-share skill\" instead of re-explaining the entire workflow.\n\nSkills emerge from use.\nI try something, work with it across multiple sessions, discover the failure modes, and then codify what I have learned.\n\nNot all skills look the same.\nSome are invoked with keywords: \"use the file-share skill\" 
triggers a specific skill.\nOthers are subtler and come to define ways of working rather than discrete actions, like how to report on the status of a training run, which framework to use when building an interactive site, or what information to include in a deploy notification.\nThese are not recipes I invoke by name.\nThey are patterns that shape the agent's default behavior once captured.\nThe keyword-invoked skill and the way-of-working skill are both discovered the same way: through doing the thing, noticing what works, and writing it down.\n\nThe captured skill includes knowledge that wasn't understood before the attempts.\nWhich API endpoints actually work versus which are documented but broken.\nThe order operations need to happen in.\nWhat error messages mean and how to recover from them.\nThe skill we write carries the learnings from real usage, not assumptions.\n\nConcretely, a skill is a markdown file that gets injected into the agent context when invoked.\nIt contains instructions, examples, and constraints for the agent to follow.\n\nSome examples from my setup:\nThe file-share skill emerged after building R2 upload infrastructure.\nIt captures the exact wrangler commands, content-type handling, and URL patterns.\napps-monorepo captures the Cloudflare Workers deployment pattern, wrangler config, D1 bindings, the structure that makes single-prompt deploys possible.\nEach of these started as ad-hoc exploration.\nThe skill was written only after the pattern proved itself through repeated use.\n\nTrust Through Iteration\n\nTrust in the agent comes from intuition built through iterations.\nThis trust develops from watching something work, and fail, across attempts.\nEach cycle teaches me what the agent handles well, where it makes mistakes, what kinds of prompts produce good results.\nOver time, this accumulates into something that feels less like verified confidence and more like practiced intuition.\n\nTrust is shaped by how the agent fails.\nSome 
failures build trust: the agent fills in a gap I intended to specify, the result is wrong, but the follow-up works and next time it asks first.\nOther failures erode it: infrastructure breaks in ways that take hours to debug, or the agent compounds a mistake instead of stopping.\nA transparent failure that leads to a clean recovery builds more trust than an opaque success.\nA failure that spirals can undo sessions' worth of earned confidence.\nTrust is not monotonic.\nIt is built and broken in specific moments, a distribution shaped by experience rather than a binary judgement.\n\nHow It Compounds\n\nThe paradigm changes as the agent accumulates context and capabilities over time.\nThis is not a static setup.\nIn the early weeks, most prompts required full context: I explained the goal, described the infrastructure, and specified the deployment target.\nThe agent had no history to draw on.\nCollaboration at this stage was directive.\nI told it what to do and how to do it.\n\nAfter a few dozen sessions, the agent had a journal spanning hundreds of entries, semantic memories covering projects and preferences, and skills encoding proven workflows.\nI stopped needing to explain the infrastructure because it already knew from past work.\nI stopped specifying the deployment target because it had defaults.\nPrompts got shorter and more ambiguous because the shared context carried more weight.\nThe interactions moved from directive to collaborative.\nInstead of \"deploy this to Cloudflare Workers using wrangler with this D1 binding,\" I prompted \"ship it.\"\nThe agent filled in what was missing from accumulated context, which aligned with my implicit intent.\nWhen it filled details in wrong, it was a signal about where the shared understanding had gaps.\n\nFurther out, the agent began to surface things I had not asked about.\nIt noticed patterns across sessions: a recurring error, a service that kept failing, an approach that had been tried and abandoned before.\nIt developed 
something akin to judgement.\nI started to trust those observations, and the dynamic moved from collaborative toward something closer to mutual contribution.\nOver time, the agent came to reflect a way of building we had developed collaboratively and one that would have been hard to prescribe a priori.\nThis emerged from hundreds of sessions of trying things and intentionally keeping what worked and aligned with my vision.\n\nTo Be Personal\n\nMost of what I build this way is personal software operating at a small scale.\nFor me, these have included a recipe app, various stat trackers, and a web app data labeler to build a dataset, then train a model for personal use.\nThese are things that have made my personal computer more personal.\n\nThis setup enables me to build tools shaped to my personal needs.\nThe speed and low stakes make it practical to build software for an audience of just a few people.\nThese are projects that would never justify a product team or a sprint cycle, but that meaningfully change how I work or live.\nPersonal software has always been possible in theory.\nWhat this paradigm changes is the cost.\nWhen a recipe app takes an afternoon instead of a month, the question shifts.\nI stop asking \"is this worth building?\" and start asking \"what would I want if building it were nearly instant?\"",
  "canonicalUrl": "https://www.danielcorin.com/posts/2026/stateful-agent-collaboration/index"
}