{
  "$type": "site.standard.document",
  "canonicalUrl": "https://justingarrison.com/blog/2022-11-04-immutable-declarative-automated",
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreihvwt6s6u7o4o3km6xzggcwxptmywiltvcqyj54rc7ahul5tjd5ku"
    },
    "mimeType": "image/jpeg",
    "size": 222673
  },
  "description": "The infrastructure lies we tell ourselves, and why they're useful.",
  "path": "/blog/2022-11-04-immutable-declarative-automated",
  "publishedAt": "2022-11-04T08:21:24.000Z",
  "site": "at://did:plc:p7uix7mresfq4nfzxp3klgfa/site.standard.publication/3mmdn7mg2qm2d",
  "textContent": "The goals of infrastructure since I started managing it has been to automate it, make it immutable, and make it declarative.\nA good sysadmin never had to do something manually twice.\nAn SRE doesn't let their infrastructure change.\nPlatform engineers are imperative allergic.\n\nAll of these idealized extremes are guaranteed to waste your time.\nThey are helpful mechanisms to employ as needed, but without understanding their usefulness can be dangerous traps.\n\n<!--more-->\n\nAutomated\nThe benefits of speed, consistency, and a deep understanding of the systems you're automating include having more time to automate more things and a fast track to a sysadmin level 2.\nAutomation can be leveraged thousands of times for thousands of hours or maybe only once a year for a few minutes.\n\nYou shouldn't have to manually do something more than twice, but the effort level of automating a system safely may be more than manually doing it 100 times.\nIt's up to you to determine if the automation is worth it.\n\nHow long does it take and how often is the task performed are the usual calculations to see if automation is worth it.\nThe unknown maintenance, feature additions, documentation, and training may take as much time as the initial efforts.\nNot to mention the eventual re-write to rust.\n\nThe downsides of automation include―but are not limited to―automation that exits half way and requires manual work to revert.\nAccelerating an outage by running automation that assumed a system was healthy.\n\nThe key to successful automation is to write the smallest amount of it as necessary.\nHave well defined scope to the task your automating, and validations to assure it's run under constrained circumstances.\n\nAutomation is great.\nThe mantra to \"Automate all the things!\" is scary.\n\nImmutable\nNothing should ever change in production.\nExcept for application deployments, logs, memory heaps, and a million other things you don't care about until something 
breaks.\n\nThe need for immutability came from a time when we had physical servers and very little automation.\nIt was hard to set up a server.\nDays of plugging things in, updating firmware, rebooting, installing the OS―usually via physical media―rebooting again, and changing 18 files with the lessons learned from previous outages.\n\nAfter setup, the easiest thing to do was to not touch it.\nUptime was a source of pride because it was hard to achieve.\nKeeping a server powered on for 365+ days was hard work and a badge of honor to show how good you were as a sysadmin.\n\nWhen we finally started to care about things like security patching, we would do everything to avoid rebooting our systems.\nSo we would mutate them.\nAnything short of a kernel update would get a yum update and a quick SIGHUP.\n\nMinimal automation and mutating files on disk didn't scale.\nMore updates required more mutations.\nOperating system, runtime, and configuration changes required more downtime, which was hard to plan for during maintenance windows.\n\nSo, the pendulum swung the other way and immutability was the new uptime.\nWe could automate more things with new tools so everything old was new again.\nDon't-call-them-golden-images were all the rage.\nWrap your mutable RPMs in a Dockerfile and even your apps could be golden.\n\n\"Infrastructure as cattle\" became the goal, and plenty of conference talks proclaimed the enlightenment they had achieved.\nThe combination of automation and immutability made it so you could take a little bit of Bash, Terraform, and some artifacts and create a complete environment that never changed.\n\nThe golden signal of a good sysadmin―now called SRE―was how fast you could create an environment from scratch.\n\"I can recreate everything automatically\" was the most common lie we told ourselves.\nThe belief that nothing changed without us knowing was never spoken out loud.\nSo we called it \"immutable\".\n\nThe truth is, a system that doesn't change isn't 
useful.\nIt's a snapshot, not a running system.\nLots of things change, but the things we carefully crafted should change less often or change with intention.\n\nThe application configuration doesn't change.\nUnless you want to use dynamic feature flags or graceful degradation.\nOperating systems never change unless you want fast patch times or need interactive debugging.\n\nThe images should be golden, but the complete system cannot be semver'd.\nEnvironments change too frequently.\nYou can't take a snapshot of everything and re-deploy it exactly how it was.\nData―the thing you care most about―is different.\nTables have changed, APIs have been updated, security keys have been rotated (or they should have been).\n\nThere's no such thing as complete immutability, but there is value in knowing which parts of your system should change and which parts shouldn't.\nWhen you evaluate the system you'll find a lot more things mutate than don't.\nHopefully, the pendulum has started to swing back to center and immutability is not the new uptime.\n\nImmutability still doesn't solve problems of deployment.\nThings mutate to create artifacts, get pushed to a storage system, and then a Jenkins-orchestrated, Rube Goldberg Bash script wraps curl, SSH, hopes, and dreams to push out a new version without downtime.\n\nWe thought we were creating reproducibility with Dockerfiles, but in reality, relying on :latest is just as mutable as yum update-ing.\nThere had to be a better way.\n\nDeclarative\nInstead of fixing the problems we created, we decided that imperative steps were the problem, so we started making things declarative.\nIf statements were unreliable so they were removed.\nLoops and logic could not be statically queried and often led to incorrect assumptions being made during outages.\nSo they were abandoned for declarative, WET code.\n\nWe wrote more HCL and YAML than we ever planned in our pursuit of fully declarative APIs in the same way we chased complete immutability.\nJust like 
immutability, fully declarative systems are a lie.\n\nThe parts of your system that are immutable should be declared, but declarative systems are rigid and reliability requires flexibility.\nHow many replicas does your deployment have right now?\nHopefully, that's not statically declared but dynamically scaled based on need―and limited based on budget.\nWhat are the DNS names of your service load balancers?\nIt doesn't matter; the Kubernetes service is dynamically provisioned and associated with targets, and they're available.\n\nWe're all tired of writing WET YAML so let's go back to DRY HCL.\nBetter still, we can use the power of general-purpose programming languages with Pulumi or cdk8s to create abstractions on top of the declarative APIs.\nThen we transpile them to the verbose, declarative text files we can't be bothered to artisanally handcraft on our Cherry MX switches.\n\nThe goal of declarative systems is to reduce error-prone, imperative automation.\nIf we can trust the declarative APIs then we can have some assurance that the state of our systems is―or eventually will be―reconciled.\n\nSimilar to immutable systems, we want to think we know what's going on.\nThe best way we can find and fix a problem is to start with the correct assumptions, and ask the running system questions it can answer―hello observability.\n\nThe goal of all of these mechanisms is reliability.\nReliability happens with careful planning and understanding when to use―and not use―the mechanisms available.\n\nThere will be a new goal, slogan, and unattainable holy grail.\nUnderstanding how and when to implement the tools available for your requirements is the only way to succeed.",
  "title": "Automated, immutable, and declarative"
}