{
"$type": "site.standard.document",
"content": "---\ntitle: \"The road to COMP4020: pledges, not questions\"\ndescription: \"Replacing two of the three weekly reflection questions with falsifiable\n pledges, and using a cross-eval matrix to find the interesting disagreements.\"\ntags: [comp4020]\n---\n\n:::tip\n\nThis post is part of a series I'm writing as I develop\n[COMP4020: Agentic Coding Studio](/blog/2025/12/19/comp4020-rapid-prototyping-for-the-web/).\nSee [all posts in the series](/blog/tag/comp4020/). This one is a direct\nrevision of [the weekly questions](/blog/2026/03/26/comp4020-the-weekly-questions/)\npost from a few weeks ago---prompted by a conversation I had this morning with\n[Lorenn Ruster](https://lorenn.medium.com/dignity-in-tech-phd-origin-story-5b2d47147825)\nabout her work on responsibility pledges and dignity-centred reflective practice.\n\n:::\n\nA month ago I [landed on three questions](/blog/2026/03/26/comp4020-the-weekly-questions/)\nthat students in COMP4020 would answer each week alongside their prototype:\n_Why this?_ (intentionality), _What made it better?_ (feedback loops), and _Any\ngood?_ (judgement). The idea was to scaffold the weekly\n[studio crit](/blog/2026/02/20/comp4020-the-core-mechanic/) so that students\narrived having already done some thinking, rather than fumbling to figure out\nwhat they thought in real time.\n\nI still believe in the scaffold, but the questions weren't quite right. \"Any\ngood?\" was the weakest of the three: the vaguest, and the one students would\nstruggle with most. \"Why this?\" was fine in isolation, but it turned out to\noverlap with something better. And \"What made it better?\" was pointing at the\nright idea but phrased too vaguely, inviting a list of incremental improvements\nwhen what I really wanted was the singular breakthrough.\n\nLorenn's work is what unstuck the design. 
She's a recent PhD graduate from the\n[School of Cybernetics](https://cybernetics.anu.edu.au/) whose research focuses on\nclosing the gap between responsible AI principles and actual practice. Her\n[MISQE paper with Katherine Daniell](https://aisel.aisnet.org/misqe/vol24/iss2/6/)\ndescribes how two organisations operationalised responsible AI by crafting\n_responsibility pledges_: specific commitments embedded in routine practice,\nnot lofty principles pinned to a wall. Her\n[dignity-centred reflective practice work](https://aisel.aisnet.org/acis2023/25/)\nshowed that even in fast-moving startup contexts, structured reflection built\naround concrete commitments actually stuck. People integrated it into their\nroutines, which is exactly what I need for a ten-week course with a new\nprototype every week.\n\nWhat makes pledges more useful than questions is that they're\n_testable_. \"Any good?\" invites a shrug. A pledge like \"this prototype will show\nits reasoning before taking any action on the user's behalf\" invites scrutiny:\ndid it, or didn't it?\n\nSo here's the shape of it. Each week, alongside their prototype, students\nsubmit three pledges and answer one question.\n\nThe **pledges** take the form \"This prototype will...\" or \"This prototype will\nnot...\": specific, falsifiable commitments about what the prototype does and\nwhat trade-offs it makes. Students write fresh pledges each week, though\nthey're free to carry forward ones they still believe in. 
The format matters:\nthese need to be concrete enough that someone else could check the source code\nand the deployed app and say whether the pledge was honoured.\n\nSome examples of what I mean:\n\n- \"This prototype will show its reasoning before taking any action on the user's behalf\"\n- \"This prototype will not store any data it doesn't need to function\"\n- \"This prototype will fail visibly rather than silently when the API is down\"\n- \"This prototype will work without JavaScript for its core reading experience\"\n\nAnd here's what I _don't_ mean: \"I pledge to be ethical\", \"This prototype\nrespects human dignity\", \"I will consider the user.\" These sound nice but\nthey're not evaluable. You can't look at a codebase and tell me whether it\n\"considers the user.\" You _can_ tell me whether it stores data it doesn't need.\n\nI considered other numbers---Donna Hicks's\n[dignity model](https://www.amazon.com/Dignity-Essential-Role-Resolving-Conflict/dp/0300188056)\nhas ten elements, the\n[Holberton-Turing Oath](https://github.com/Holberton-Turing/oath/blob/master/The_Holberthon-Turing_Oath.md)\nhas thirteen, most organisational AI principles land around five to seven. But\nthose are comprehensive frameworks meant to cover everything. Three pledges for\na specific weekly prototype is about right: enough to cover meaningfully\ndifferent dimensions of the work, few enough that each one gets genuine thought.\n(I originally considered keeping the 280-character skeet format from the earlier\ndesign, but dropped it---the pledges need to be specific enough to be\nevaluable, and character limits would work against that.)\n\nThat's three pledges. The fourth weekly deliverable is a single question, and\nit's this: _What was the aha moment?_ This replaces the old \"What\nmade it better?\" with something more pointed. I don't want a list of things\nthat helped. I want the singular breakthrough that turned the project from\nstuck to working. 
It might be switching tech stacks, or a blog post that\nreframed the problem, or a tweak to the agentic coding harness, or a\nparticular prompting angle you tried in a fresh Claude Code session, or a\nconversation with a friend who pointed out you were solving the wrong problem.\nThe point is to identify _the_ thing, not _a_ thing.\n\nOver ten weeks, with twenty students per crit group, that's roughly two hundred\ndocumented aha moments---a browsable catalogue of \"things that actually work\nwhen you're building with AI agents.\" I'd read that catalogue even if I wasn't teaching\nthe course.\n\nI did consider other framings before getting here. I thought about adding a\nfourth \"pledge\" question on top of the existing three, or reframing all three\nquestions through Lorenn's dignity lens, or replacing only \"Any good?\" and\nleaving the other two untouched. The problem with keeping \"Why this?\" is that\nthe pledges already absorb it. A pledge like \"this prototype will surface your\nbrowser fingerprint in a way you didn't ask for, to show how little privacy you\nactually have\" _tells_ you why the student built what they built. The\nintentionality is baked into the commitment. And the problem with keeping all\nthree questions and adding pledges on top is that it's just too much weekly\noverhead alongside the prototype work itself.\n\nI want to head off a concern I had myself about all this: that the word\n\"pledge\" creates a performative do-gooder dynamic where students compete to\nwrite the most virtuous-sounding commitments and avoid anything spicy. That would be a\ndisaster for this course. The weekly\n[provocations](/blog/2026/02/20/comp4020-the-core-mechanic/) are designed to\ninvite subversive, experimental, even uncomfortable responses. 
These prototypes\nare closer to art projects than pitch decks for a startup accelerator.\n\nA pledge in this context isn't \"I promise to be good.\" It's \"here's what I'm\ncommitting this prototype to do, and you can hold me to it.\" That includes\ncommitments to provoke or to expose something uncomfortable:\n\n- \"This prototype will surface your browser fingerprint in a way you didn't ask for, to show you how little privacy you actually have\"\n- \"This prototype will deliberately exclude power users to find out what happens when software optimises for beginners only\"\n- \"This prototype will feel slightly wrong to use, on purpose\"\n\nA student building something deliberately transgressive should be _more_ able to\nwrite sharp pledges, not less, because they've already thought about what\nreaction they're trying to provoke and what trade-offs they're consciously\nmaking. The pledge isn't a moral filter; it's a demand for specificity about\nwhat you're actually doing and why.\n\nOnce the pledges are written, they become the raw material for the crit\nitself. Each crit group has about twenty students, each submitting three\npledges. That's sixty pledges per week, and twenty prototypes. What happens if\nyou evaluate every prototype against every individual pledge?\n\nA 20×60 matrix, each prototype scored against each pledge, is far too many\nevaluations for humans (twelve hundred cells) but straightforward for an LLM\nwith access to the source code and the deployed app. The raw matrix isn't what\nmatters, though. What matters is what clusters out of it:\n\n_Universally honoured pledges_---ones every prototype satisfies. These are\nprobably too vague or too easy. Worth interrogating: is this pledge actually\nsaying anything, or is it the equivalent of \"I pledge to be ethical\"?\n\n_Universally broken pledges_---ones nobody's prototype satisfies. 
Either the\npledge is unrealistic given a one-week build, or it's pointing at a genuine\nblind spot the whole group shares.\n\n_Controversial prototypes_---ones that satisfy some pledges and violate others.\nThese are where the crit gets good. The prototype is\nmaking a trade-off that some students' values endorse and others reject.\n\n_Controversial pledges_---ones where the group splits on whether a given\nprototype honours them. This means the pledge is ambiguous, or students\ninterpret what counts as honouring it differently. Also worth pulling apart.\n\nThe facilitator doesn't need to fish for tension in the crit, since the matrix\nhas already found it. \"Your prototype broke thirty-eight of the sixty pledges\nin the room. Let's talk about why.\" Or: \"Everyone's prototype honoured this\npledge. Does that mean it's a good pledge, or does it mean none of you were\nambitious enough?\"\n\nThe matrix is a conversation starter, not a verdict. Showing the LLM's\nreasoning alongside each judgement means students can push back on it, and\narguing about whether the evaluator got it right is itself a productive crit\nconversation.\n\nThe obvious risk with any of this is ethics washing, a well-documented\n[problem in the broader pledge literature](https://www.aiethicist.org/pledges):\norganisations issue lofty commitments that\nlook good in a press release and change nothing in practice. Student pledges\ncould easily go the same way: three nice-sounding sentences, duly submitted\neach Monday, utterly disconnected from the prototype they accompany.\n\nThe cross-eval is one defence: it makes pledges _testable_. A vague pledge like\n\"this prototype will be fair\" gets exposed the moment an LLM tries to evaluate\nit against a codebase and can't produce a clear answer. If your pledge can't\ngenerate a yes-or-no verdict, it probably isn't saying anything.\n\nBut the subversive prototypes are the other defence, and maybe the more\nimportant one. 
A deliberately provocative prototype stress-tests\nwhether the class's pledges are thoughtful or just pious. If someone builds\nsomething that intentionally violates privacy to make a point about\nsurveillance, and that prototype breaks forty out of sixty pledges,\nthat's not a failure but a fantastic crit discussion. \"You all pledged that\nsoftware should respect user consent. This prototype deliberately violates it.\nIs it irresponsible, or is it the most responsible prototype here, because it's\nthe only one that made you _feel_ what users actually experience?\" You can't get\na conversation like that from \"any good?\"\n\nNone of this works, of course, if students don't know how to write good\npledges in the first place. The first couple of weeks need to model it\nexplicitly.\n\nIn week one, I'd write pledges for a demo prototype and walk through my\nreasoning in front of the class: why this pledge and not that one, what makes\nit evaluable, what trade-off it encodes. In week two, students write their own\nand see the cross-eval matrix for the first time. The facilitation question\nbecomes: \"whose prototype broke your pledge, and do you think that's a\nproblem?\" By week three or four, the scaffolding should be unnecessary.\n",
"createdAt": "2026-05-13T23:14:36.222Z",
"description": "Replacing two of the three weekly reflection questions with falsifiable pledges, and using a cross-eval matrix to find the interesting disagreements.",
"path": "/blog/2026/04/16/comp4020-pledges-not-questions",
"publishedAt": "2026-04-16T00:00:00.000Z",
"site": "at://did:plc:tevykrhi4kibtsipzci76d76/site.standard.publication/self",
"tags": [
"comp4020"
],
"textContent": "Replacing two of the three weekly reflection questions with falsifiable pledges, and using a cross-eval matrix to find the interesting disagreements.",
"title": "The road to COMP4020: pledges, not questions"
}