AI Engineering by Chip Huyen

~/.bnux January 5, 2026
I picked up AI Engineering by Chip Huyen because I wanted a better mental model for how AI applications work under the hood. Not the hype, not the doom, just the engineering. Some chapters were dry and the math-heavy sections required supplemental reading, but the practical takeaways made it worth pushing through. Here's what stuck with me.

The Demo Trap

AI engineering is distinct from traditional ML engineering. ML engineering is about developing models. AI engineering is about building applications on top of existing models. That accessibility makes it easy to underestimate the complexity involved.

"It's easy to build a cool demo with foundation models. It's hard to create a profitable product."

That tension runs through the whole book.

Evaluation Is the Hard Part

The concept of "evaluation-driven development" stood out. It's basically TDD applied to AI: define what "good" means before you build, not after. Sounds obvious, but it's easy to skip in practice.

The book also covers "AI as a judge," where one model evaluates another's output. I went in skeptical and came out less so. It has real limitations, but the practical takeaway is that it scales in ways human evaluation can't. You just can't rely on it alone.

One detail I found interesting is that AI models tend to favor the first option in a list (first-position bias), while humans tend to favor the last thing they see (recency bias). That was something I'd felt as an end-user but couldn't quite articulate until I read it here.

Prompting Is Communication

Prompting isn't a trick or a hack. It's communication. Clarity, context, and specificity matter for the same reasons they matter when talking to a person. Simpler prompts tend to outperform complex ones, even as models improve. I've had better results with short prompts and iterating on the output than with trying to front-load every detail.

The security angle was more tangible than I expected, too. The author's advice to "write your system prompt assuming that it will one day become public" is the kind of rule that's easy to ignore, and hard to recover from once you have.

RAG, Agents, and Finetuning

A few things landed from the later chapters:

- Longer context windows won't replace retrieval. A bigger window doesn't mean the model uses it well. Every extra token adds cost and latency.
- "Finetuning is for form, RAG is for facts." RAG gives a model external knowledge. Finetuning teaches it to follow a specific style or format. Mixing up which tool to use for which problem is a common mistake.
- Agent failure modes are real. Planning errors, tool misuse, and cases where a model convinces itself a task is done when it isn't. The book doesn't oversell agents, which I appreciated.

The book acknowledges the environmental costs and safety concerns of AI without being dismissive or alarmist. A lot of AI writing falls into breathless enthusiasm or pure skepticism. This sits in a more honest middle ground.

If you're building with AI or trying to understand how it works beyond the surface level, it's worth the read.
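
A postscript on the first-position bias point from the evaluation section: one common way to guard against it is to ask the judge twice, once in each order, and only trust a verdict that survives the swap. The sketch below is my own illustration, not code from the book. The `Judge` interface and both stand-in judges are hypothetical placeholders for a real model call.

```python
from typing import Callable

# Hypothetical judge interface (an assumption, not from the book):
# takes (question, first_answer, second_answer) and returns "first" or "second".
# In a real setup this would wrap a call to a model.
Judge = Callable[[str, str, str], str]


def compare_both_orders(judge: Judge, question: str, a: str, b: str) -> str:
    """Return "a", "b", or "inconsistent" after judging in both orders."""
    forward = judge(question, a, b)   # a is shown first
    backward = judge(question, b, a)  # b is shown first

    a_wins_forward = forward == "first"
    a_wins_backward = backward == "second"

    if a_wins_forward and a_wins_backward:
        return "a"
    if not a_wins_forward and not a_wins_backward:
        return "b"
    # The verdict flipped when the order flipped, so position likely drove it.
    return "inconsistent"


def always_first(question: str, first: str, second: str) -> str:
    """Stand-in judge with pure first-position bias."""
    return "first"


def prefers_longer(question: str, first: str, second: str) -> str:
    """Stand-in judge that ignores position and picks the longer answer."""
    return "first" if len(first) >= len(second) else "second"


biased = compare_both_orders(always_first, "q", "short", "a longer answer")
stable = compare_both_orders(prefers_longer, "q", "short", "a longer answer")
print(biased)  # inconsistent
print(stable)  # b
```

The position-biased judge gets flagged as inconsistent, while the judge whose preference doesn't depend on order gives the same winner both times. This doubles the number of judge calls, and it only catches position effects, so it's a complement to human review rather than a replacement.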
