External Publication

Gunnar Wolf: Heads we win, tails you lose — AI detectors in education

Planet Debian [Unofficial] April 29, 2026

This post is an unpublished review for Heads we win, tails you lose — AI detectors in education

Educators throughout the world are tasked with the difficult requirement of evaluating students’ works, making sure the grades meaningfully reflect the students’ understanding of the subject, and that a graded assignment maps to the relevant work invested in solving it. After the irruption of Large-Language Models in late 2023, this task became obviously much harder: if a widely available computer program is able to solve an assignment in a way that resembles a human-generated response, how can educators meaningfully grade their groups?

As it has been the case with different innovations over time (such as with the appearance of electronic calculators or the mass availability of digital encyclopedias), the first reactions were of prohibition and denial: students who use the new tool in question are to be disqualified or somehow punished. It is only some time after the innovation in question settles that teachers find a way to properly weigh, integrate and accept its use.

The authors of this position article present several arguments as to why it is impossible, unethical and unadvisable to use automated AI detection systems to process student assignments. The first argument is whether it is at all possible to reliably differentiate human-written essays from LLM-generated artifacts. The first criticism is that AI detectors are, themselves, LLMs trained on human-generated texts (negative) and LLM-generated texts (positive). However, the only way to assert the training material is not noisy is to use pre-2020 text as human-generated — but natural ways of writing are influenced by what people read, and the authors quote studies pointing out that human language, particularly in the scholarly fields, has incorporated terms and constructions that were used as LLM markers. Quoting the authors, «As exposure to AI-generated material becomes increasingly widespread, it is reasonable to expect that the linguistic patterns of human writing will shift, reflecting the influence of AI-assisted texts encountered across education, media, and everyday communication». Stylistic elements and other such markers are being adopted back into regular speech at a high rate.

Then, the aspect of ethics comes into play as well. While it is expected that teachers should demand intellectual integrity from students, and plagiarism detectors have been widely accepted into the workflow of academics, the accusation of presenting LLM output as own work is necessarily an uphill battle: the accused party is tasked with providing proof of innocence based on nebulous, probabilistic accusations. The authors argue, once an accusation of turning in a LLM-generated text is made on a student, the onus on proving innocence lies with the accused.

The authors review and argue against a series of techniques that have been presented in literature to aid teachers in detecting LLM abuse, such as linguistic markers, single or multiple AI detectors, the use of false references, hidden adversarial prompts, arguing in all cases the techniques fail to be trustable enough and highlighting the probability of both false positives and negatives. They also present AI detection as a false dichotomy: many works presented are not 100% human generated nor 100% LLM-generated, but some pertinent LLM-generated paragraphs are presented mixed with human-generated content, in a positive, critical AI use (“Students’ work is frequently created with, not by, generative AI”).

The article closes by reiterating the authors’ position: “AI detection in education is not merely flawed; it is conceptually unsound”. they call upon institutions to accept the use of generative LLMs cannot be “solved through surveillance and punishment”, but has to be tackled by an “assessment design that recognizes AI’s role in learning”.

This article’s position is very strong and well argued, and although it will surely meet with ample opposition, it surely poses an important, very current problematic. As a teacher, I found it a very enlightening read.

Discussion in the ATmosphere