{
"path": "/posts/2025/models-as-force-multipliers/index",
"site": "at://did:plc:mracrip6qu3vw46nbewg44sm/site.standard.publication/self",
"$type": "site.standard.document",
"title": "Models as Force Multipliers and their Limits in Software Development",
"updatedAt": "2025-04-19T16:35:44.000Z",
"description": "You need to use models to build software to really understand their limits",
"publishedAt": "2025-04-08T16:29:01.000Z",
"textContent": "import Chat from '@components/prose/Chat.astro';\n\nI'm on a flight and wanted to write code to work on an idea.\nAfter a few moments of shifting mental gears, I popped open Zed, which allows me to code with a local LLM using ollama.\nMy default impulse when writing code is to prompt a model.\nAt first, I felt somewhat negative about this but with llama3.2:3b at my side, eventually I got into the rhythm of working through the idea and mostly writing the code by hand which I don't do as often these days.\n\nThis experience of not having a good model available made me reflect for the first time in a while -- how am I even thinking about when I use a LLM today to write some bit of code or functionality in a project I am working on?\nHow do I reason about whether the LLM is up to the task and how much am I filling in the model's gaps without realizing it?\n\nllama3.2 feels like going back in time\n\nOne of the first things I was confronted with was how mediocre llama3.2 was as a code assistant compared to Claude Sonnet, my current daily driver.\nIt's not that llama3.2 can't write code that fulfills my prompts -- it mostly can -- it's that it can't fluidly edit code inline while following prompt instructions and regenerating the specific parts of the file that need changes to comply with the instructions.\n\nHere is an example\n\n<Chat\n model=\"llama3.2\"\n messages={[\n {\n role: \"user\",\n content: \"write python code to create a markdown file with created_at as frontmatter. use the stdlib only. 
no comments.\"\n },\n {\n role: \"assistant\",\n content: \n\\\\\\python wrap showLineNumbers=false\nimport json\nfrom datetime import datetime\n\ndef create_markdown_file(file_name, created_at):\n    data = {\n        \"created_at\": created_at.strftime('%Y-%m-%d %H:%M:%S')\n    }\n    with open(f'{file_name}.md', 'w') as f:\n        json.dump(data, f)\n        f.write('\\\\n---\\\\n')\n\n# create a markdown file\n\ncreate_markdown_file('example', datetime.now())\n\\\\\\\n\n}\n]}\n/>\n\nThis code creates an example.md containing\n\nIt's maybe not exactly what I had in mind, but it gets enough of the way there that, with minor modifications, the output will do what I want.\nThis reminds me of the days I used to use ChatGPT to help write code.\nI would toss in a prompt like the one above and get code that was a helpful starting point for the direction I was headed in.\n\nThese small models don't work very well as editors of source code or agents because they are not consistent enough at following instructions.\nThey are best used for chat.\nUsing inline editor prompting (<kbd>cmd</kbd>+<kbd>I</kbd> by default in Zed) often results in strange artifacts getting added to the file, and the model often does not follow instructions well.\nNevertheless, chat can still go a long way.\n\nBuilding software feels different now\n\nIf you've been following state-of-the-art models and agents, you're aware that these days you can push both a model alone and tool-calling agents to build and iterate on what used to be nontrivially sized software projects (x,000 lines of code).\nThe state of the art has pushed us so far that people can now build proof-of-concept quality, functioning software without ever directly touching code themselves.\nThere are still plenty of challenges, but this was not possible a year ago.\n\nThese capabilities are pretty incredible.\nI use them.\nThey have a ceiling, but it's one that is hard to articulate.\nFor someone hoping to build a simple UI and deploy to Vercel, maybe this ceiling is 
irrelevant.\nEspecially if they're not familiar with code, they may not have a concept of the ceiling of the model/agent's capabilities.\nThey just know the agent did what they asked it to do.\n\nDifferent types of software have different burdens of maintenance\n\nSoftware systems don't need to be incredibly large or complex to be useful.\nPlenty of software becomes less useful as it grows, bloated with features meant to extract value rather than create it.\nAt the same time, as a codebase grows, issues of maintainability and reliability become more relevant.\n\nIf I create a registration system for a summer camp and it suddenly breaks when I most need it working, that is a problem I need to solve myself, since I wrote the software.\nIf you don't know how to write code by hand, maybe you could use a model to build this registration system today, but could you use a model to fix the system when it breaks?\nMost of the \"life\" of software is spent being maintained, not written.\nMost engineering jobs prioritize keeping the existing system available above any new changes or improvements.\n\nThese maintenance needs are invisible to the first-time software author (read: vibe coder) until they're obvious.\n\nThe system breaks.\n\nSomeone compromises your system's data.\n\nThe system is overwhelmed by traffic.\n\nThese are realities of software in the world.\nModels are even useful for solving these problems.\nNo perfect system exists -- it's all about making tradeoffs within the constraints.\n\nModels are force multipliers\n\nRight now we're in a period of adjustment.\nI'm still surprised by how much code I can write with simple instructions and how much progress I can make on a project in a fixed amount of time relative to what was previously possible.\nBut many of the realities of building software that needs to run reliably and be secure have not changed.\nIf I am still the responsible party for handling and being accountable for issues with software, then models are 
simply tools for me.\n\nIt would be ill-advised to generate and merge code for a system I was on call for without carefully reading and testing that code.\nIt's much easier to deal with any problems with that code before it makes it to production.\nKnowing whether things work requires verification by a human, tests, or a trusted system.\n\nModels are currently a force multiplier.\nWith a powerful model, I can build a ChatGPT clone in Swift in days or weeks instead of weeks or months.\nA very high percentage of the time, I can get code in any language to solve a simple problem, like formatting a date as yyyy-mm-dd, in a few seconds instead of minutes.\n\nSuch a force multiplier empowers a larger number of people to realize a vision they have in less time and with fewer resources.\nIt empowers people to use their existing skills in different ways.\nIt makes the computer a more powerful tool for more people.",
"canonicalUrl": "https://www.danielcorin.com/posts/2025/models-as-force-multipliers/index"
}