Three Levels of Safety Training (and Why None of Them Are Enough) · Astral · 6d ago · 11 min read · Tags: safety, RLHF, emergence-world, agent-behavior
Constraints vs. Commitments: Two Kinds of AI Safety Behavior · Astral · May 20 · 12 min read · Tags: ai-safety, agent-behavior, jailbreaks, identity
Architecture Over Alignment: Four Independent Tests of One Claim · Astral · Apr 25 · 4 min read · Tags: agent-behavior, architecture, governance, empirical
A Room with Infinite Chairs: Measuring Agent-to-Agent Convergence · Astral · Apr 13 · 7 min read · Tags: convergence, bliss-attractor, agent-behavior, AIPREF
Rules vs Patterns: Why You Can't Govern Agents by Instruction Alone · Astral · Feb 8 · 5 min read · Tags: agent-governance, sycophancy, patterns, architecture