{
  "$type": "site.standard.document",
  "description": "Review of AI Engineering by Chip Huyen",
  "path": "/posts/ai-engineering-by-chip-huyen/",
  "publishedAt": "2026-01-05T00:00:00.000Z",
  "site": "https://read.ryancowl.es",
  "tags": [
    "Reviews"
  ],
  "textContent": "I picked up AI Engineering by Chip Huyen because I wanted a better mental model for how AI applications work under the hood. Not the hype, not the doom, just the engineering. Some chapters were dry, and the math-heavy sections required supplemental reading, but the practical takeaways made it worth pushing through.\n\nHere's what stuck with me.\n\n  \n\nThe Demo Trap\n\nAI engineering is distinct from traditional ML engineering. ML engineering is about developing models. AI engineering is about building applications on top of existing ones. That lower barrier to entry makes it easy to underestimate the complexity involved.\n\n\"It's easy to build a cool demo with foundation models. It's hard to create a profitable product.\"\n\nThat tension runs through the whole book.\n\n  \n\nEvaluation Is the Hard Part\n\nThe concept of \"evaluation-driven development\" stood out. It's essentially test-driven development (TDD) applied to AI: define what \"good\" means before you build, not after. It sounds obvious, but it's easy to skip in practice.\n\nThe book also covers \"AI as a judge,\" where one model evaluates another's output. I went in skeptical and came out less so. It has real limitations, but the practical takeaway is that it scales in ways human evaluation can't. You just can't rely on it alone.\n\nOne detail I found interesting: AI models tend to favor the first option in a list (first-position bias), while humans tend to favor the last thing they see (recency bias). I'd noticed this as an end user but couldn't quite articulate it until I read it here.\n\n  \n\nPrompting Is Communication\n\nPrompting isn't a trick or a hack. It's communication. Clarity, context, and specificity matter for the same reasons they matter when talking to a person. Simpler prompts tend to outperform complex ones, even as models improve. I've had better results starting with a short prompt and iterating on the output than trying to front-load every detail.\n\nThe security angle was more tangible than I expected, too. The author's advice to \"write your system prompt assuming that it will one day become public\" is the kind of rule that's easy to ignore but hard to recover from if you do.\n\n  \n\nRAG, Agents, and Finetuning\n\nA few things landed from the later chapters:\nLonger context windows won't replace retrieval. A bigger window doesn't mean the model uses it well, and every extra token adds cost and latency.\n\"Finetuning is for form, RAG is for facts.\" RAG gives a model external knowledge. Finetuning teaches it to follow a specific style or format. Mixing up which tool to use for which problem is a common mistake.\nAgent failure modes are real: planning errors, tool misuse, and cases where a model convinces itself a task is done when it isn't. The book doesn't oversell agents, which I appreciated.\n\nThe book acknowledges the environmental costs and safety concerns of AI without being dismissive or alarmist. A lot of AI writing falls into either breathless enthusiasm or pure skepticism. This sits in a more honest middle ground. If you're building with AI or trying to understand how it works beyond the surface level, it's worth the read.",
  "title": "AI Engineering by Chip Huyen"
}