{
  "$type": "site.standard.document",
  "description": "Better practices for safely and reliably deploying and updating NixOS systems.",
  "path": "/blog/2025-02-07-stop_using_nixos-rebuild_switch/",
  "publishedAt": "2025-02-07T00:00:00.000Z",
  "site": "at://did:plc:zntngpowgd6rorjt3haywj36/site.standard.publication/3m36bacisus2x",
  "tags": [
    "nix",
    "cicd",
    "linux"
  ],
  "textContent": "🚨 Attention NixOS users! 🚨\n\nOkay, I know this will sound strange to many of you. But hear me out.\n\nPlease stop using nixos-rebuild switch.\n\nSeriously. Every time you deploy this way, you're likely causing more harm than you realize. And I promise you, there are better, safer, and more reliable ways to live on NixOS.\n\nLet’s talk about it.\n\n---\n\nBackground\n\nNix is a powerful package manager that lets you manage derivations—which include packages, configuration files, and more—in an atomic, declarative, and reproducible™ way.\n\nOn NixOS, everything—applications, configuration files, service units, etc.—lives in /nix/store, which acts as the system's read-only source of truth. Instead of modifying files in place like traditional package managers, Nix creates symlinks to the Nix store as needed. A particular collection of these symlinks is called a generation.\n\nUnlike most Linux distributions, upgrading NixOS doesn't overwrite or delete anything. A new generation is created (symlinked to /run/current-system), while older generations remain intact and available for rollback. This guarantees access to previous configurations and installed packages.\n\nThis approach has major benefits. It prevents rug-pulling—where an application's executables or libraries change mid-run, potentially causing crashes or instability. It also isolates packages and their dependencies to avoid conflicts. In many cases, you can even pull packages from nixos-unstable or the latest upstream master branch without worrying about catastrophic breakage.\n\nMost importantly, Nix makes rollbacks nearly effortless. If something goes wrong, you can boot into a previous generation—no panic or rescue USB required. Each generation is like a Git commit: you can always revert. Or so they say.\n\n---\n\nThe Problem with nixos-rebuild switch\n\nOkay, back to the issue at hand. 
Why should you avoid using nixos-rebuild switch?\n\nAt first glance, it seems like the obvious choice for applying a newly minted system generation. It immediately updates the running system and restarts services as needed, making it the most common method for upgrading or rolling back a NixOS system. However, despite its convenience, nixos-rebuild switch introduces several significant problems.\n\nIncomplete Updates and Inconsistent System State\n\nNixOS isolates system components to prevent unexpected rug-pulling. However, some components update immediately while others do not. For example, updates to low-level components (like the kernel and systemd) take effect only after a reboot, and many user applications (e.g. Firefox, GNOME, Hyprland, various display servers) do not restart automatically after an update. Although the updated files appear in /run/current-system, the running instances continue to use the older versions until manually restarted.\n\nThis behavior minimizes disruptions but also creates a mixed system state: some components run the latest software while others continue using outdated (and potentially less secure) versions. This mismatch can lead to unpredictable behavior and complicates troubleshooting.\n\nHidden Bugs and Boot Issues\n\nBecause the system remains in an inconsistent state after a switch, subtle bugs or boot failures may stay hidden until the next reboot. A system that appears stable right after a switch can still fail the next time it boots, which makes the cause harder to trace. If a system fails its reboot test, the configuration cannot be reliably reproduced.\n\nBoot Menu Clutter and Configuration Guesswork\n\nFrequent iterations with switch generate an overwhelming number of system generations in your boot menu. This clutter makes it hard to identify the last known-good generation, often requiring multiple rollbacks to restore stability. 
After all, a rollback isn't very useful if you're not sure which generation to revert to.\n\nIncompatible States\n\nEach configuration iteration carries some risk—it could break your system's ability to boot or function as expected. Even if not every change is catastrophic, it's important to recognize the dangers before proceeding. Blindly running nixos-rebuild switch is akin to pushing broken code to master—except it's worse because generations are not fully isolated environments.\n\nIt isn't just about a new version of Linux or Mesa with a show-stopping bug. The core issue is that NixOS is not, by itself, a truly stateless system, and flawless rollbacks are never guaranteed. Your Wayland compositor might refuse to launch with configuration files from a newer generation, or your browser profile might become corrupted by an older version of Firefox. There is a real potential for data loss or corruption if you don't take proper precautions.\n\nWhen it comes down to it, most software is designed to move forward, not backward.\n\n---\n\nThe Solution\n\nInstead of relying on blind faith in rollbacks for increasingly obfuscated generations, let's adopt a harm-reduction approach based on these guiding principles:\n\nTest Configurations Before Deployment  \n   Every generation should be tested before it's committed to the bootloader. Unverified changes should never be deployed directly.\n\nEnsure Bootloader Entries Only Point to Working Generations  \n   Broken configurations should not appear in your boot menu. A bootloader cluttered with failed attempts is as bad as a Git history filled with commits like “try fix #1,” “fix last fix,” or “oops.”\n\nKeep Generations Meaningful and Bisectable  \n   Each system generation should be useful for debugging. 
Just as you wouldn't push messy, unstructured commits to master, you shouldn't create a confusing boot history filled with arbitrary, untested changes.\n\nFollowing these principles helps maintain a clean system history, reduces unnecessary breakage, and makes troubleshooting more manageable.\n\nBetter Practices for Testing NixOS Configurations\n\nCheck Evaluation\n   Use nix flake check to catch syntax or logic errors.\nBuild the Configuration  \n   Run nix build to ensure the configuration compiles successfully.\nPreview System Changes  \n   Use nixos-rebuild dry-activate to preview modifications.\nTest in an Isolated Environment  \n   Run nixos-rebuild build-vm—especially useful for multi-host setups.\nApply Changes Ephemerally  \n   Run nixos-rebuild test to verify success before committing to a bootable generation.\n\nBetter Practices for Safe and Reliable Deployments\n\nFor most systems, manual updates should be avoided in favor of automated, controlled deployments. Several NixOS deployment tools exist[^fn1]. One straightforward option is the system.autoUpgrade module in nixpkgs, which can:\n\nBuild your system from a remote or local flake.\nAutomatically upgrade your host to the latest commit at a specified time.\nDetect when a reboot is necessary and schedule it accordingly.\n\nHere's an example configuration using system.autoUpgrade:\n\n\n\nThis module automatically builds your system from the remote master branch at 2 AM, activates the new generation immediately, and then schedules a reboot while you sleep. Downtime is minimized, and your system is always built from the most recent commit. Unlike other Nix deployment tools, it doesn't deploy in real time, so you may occasionally need to intervene manually. Nevertheless, it's an excellent starting point for harmonizing your fleet of NixOS hosts and minimizing configuration drift.\n\nKeep in mind, if your laptop is not connected to the internet at 2 AM (i.e. 
it is off or sleeping), system.autoUpgrade with this configuration will neither build nor activate the new generation. As an alternative, you can set system.autoUpgrade.persistent to true. This ensures that an upgrade missed while the machine was off or sleeping runs once the machine is turned on again.\n\nInitially, I had these machines set to use switch as well, but I found that this approach frequently introduced breakage—and it simply wasn't worth the risk. Instead, I now set system.autoUpgrade.operation to boot and system.autoUpgrade.persistent to true. Together, these options ensure that new generations are built and added to the bootloader but not activated until the next reboot.\n\nIdeally, there would be an easy way to notify the user when a new generation is pending—similar to how other distributions do it. Since such a feature isn't available in NixOS, users must manually reboot from time to time to activate the new generation. Being slightly behind other hosts in the fleet is usually acceptable—and it certainly beats having a broken and inconsistent system.\n\nTL;DR: Your seven ThinkPads and Beelink servers need a CI/CD pipeline.\n\n---\n\nFootnotes\n\n[^fn1]: I'm most familiar with cachix and colmena. These enable more robust push deployments and can be automated with your preferred Git CI/CD solution, but they aren't as simple as system.autoUpgrade.",
  "title": "Please Stop Using `nixos-rebuild switch`"
}