External Publication

Machines Triage. Humans Decide.

Over Security - Cybersecurity news aggregator [Unofficial] June 4, 2026

The AI-orchestrated cyberespionage campaign that Anthropic disclosed in late 2025 , the most agentic offensive operation publicly documented, required humans at four to six decision points per campaign. Deterministic work is where agents earn their keep The work agents do well has a few features in common. Each of those tasks has a deterministic core: there is a right answer, the answer can be checked against ground truth, and the failure modes are bounded. The bulk of what arrives in an alert queue does not require judgment. Putting humans on that work doesn't make the work better. Ambiguous judgment is where humans stay irreplaceable The other layer is harder to describe because it is not a list of tasks. It is a property of certain decisions: they require judgment under ambiguity, with material consequences, on incomplete evidence. The right answer depends on context the model does not have access to: the user's recent project history, the company's quarterly close calendar, what the security team negotiated with that engineering manager last month, whether the org is in a sensitive contract renewal that would change the cost of a wrong call. The decision is not "is this malicious" but "what is the right next step given what we know and what we don't." Models hallucinate confidently in this space. When a model encounters ambiguity, it tends to produce a confident-sounding answer that pattern-matches to its training distribution rather than to the specific situation in front of it. Humans handle this layer not because humans are smarter than models in some general sense. Humans handle it because the calculus on ambiguous decisions involves stakes, context, and accountability that the model has no exposure to. The naive answer is "the agent handles tier-1, the human handles tier-3, tier-2 is the handoff." That description is roughly right but undersells what good handoff design actually requires. Over time, the model gets better at routing the same shape of case to the human earlier, and the false-positive surface narrows. Without it, the agent layer drifts and the human layer gets noisier over time. Every model verdict and every human decision is logged in a way that lets a third party reconstruct what happened and why. Is the decision deterministic, or does it require judgment under ambiguity? Is there context outside the alert data that materially affects the right answer? Processes that involve ambiguous judgment, asymmetric cost-of-error, or external context belong to humans. The right boundary is rarely "fully autonomous" or "fully manual." It is usually "agent processes the case to a specific point, hands to a human at the decision moment, human approves or modifies, system logs the decision." That design discipline, process by process, decision by decision, is the part that takes the longest and pays the most. The right question is "what kind of decision is this, and where should it live?" The answer comes out the same way every time. Deterministic work runs better on machines. Ambiguous judgment runs better through humans. The category does not yet have a clean answer for the second layer, and pretending it does is what produced the false dichotomy between "AI replaces analysts" and "AI is dangerous." Neither is right.

Discussion in the ATmosphere