Raw Record Source

{
  "$type": "site.standard.document",
  "canonicalUrl": "https://justingarrison.com/blog/2026-06-04-kubernetes-autoscaling",
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreifeotjaxqpptaskmbawr2smcn7fw4ifqxfd43zsgxa7smokraryym"
    },
    "mimeType": "image/png",
    "size": 49669
  },
  "description": "Workload and node autoscaling in 2026",
  "path": "/blog/2026-06-04-kubernetes-autoscaling",
  "publishedAt": "2026-06-04T00:00:00.000Z",
  "site": "at://did:plc:p7uix7mresfq4nfzxp3klgfa/site.standard.publication/3mmdn7mg2qm2d",
  "textContent": "> This post is sponsored by Cast AI, but comes from my real world experiences.\n\nAutomatically scaling nodes is a cloud parlor trick you don't get to opt out of.\nIf you run in a hyperscaler you have to play by their rules weather you need it or not.\n\nIs it useful?\nSometimes.\nIs it required?\nYes, to make the economics make sense.\nIs it cool?\nAbsolutely.\n\nCloud pricing makes static provisioning expensive, so you're forced to dynamically scale.\nBut autoscaling introduces problems you didn't have on-prem.\nThen you spend years turning knobs to make the problems managable.\n\nI've seen many engineers lose months tuning autoscaling when they should be contributing to business initiatives.\nThe company and engineer moved from a static, on-prem environment and the thing that used to be extremely difficult — adding and removing compute capacity — is now trivial and fun.\nSo they tune the knobs until they get it _just_ right.\nSix months later all of their assumptions were wrong and they either do it again, hand it off to someone else, or automate it.\n\nThe video to assist this blog post dives more into autoscaling, my history with it, and what a possible solution could be.\n\n{{< youtube 38r60gJsqLQ >}}\n\nStages of autoscaling\n\nKubernetes adds a workload native, portable API on top of cloud APIs.\nWhile clouds _can_ auto scale, not all clouds are able to scale equally.\n\nThis causes a problem for portability because any \"enterprise\" will use a bit of everything and they'll group similar functions to save money on their org chart.\nThis creates a centralized team of reliability and scalability to figure out how to handle multiple workloads in multiple clouds with different use cases and constraints.\n\nNaturally, the team will try to make a similar interface to do their job as consistently as possible and they'll use tools they're familiar with.\n15 years ago this meant boto3 or ansible.\nIn 2026 this means Kubernetes.\n\nTeams will start 
with the Kubernetes Cluster Autoscaler, maybe try Karpenter, and if the business or resume requires it, will end up with KEDA.\nThe addiction to dynamic scaling is never satiated.\n\nThe problems you didn't have on-prem\n\nIt feels good to pretend to be a hyperscaler.\nYou're offering your developers elastic compute, storage, and networking and your company only has to pay for what you use (and all the engineering effort to make sure you don't use too much).\n\nExcept it's not good enough, and unless your workloads are very mature, you're going to have problems.\n\nThe problems start with eager user-data scripts that assume apt repositories are always available.\nThey assume you can install packages at runtime and never worry about managing AMIs again.\nThis line of thinking makes your startup time too slow so you build AMIs to make your provisioning more reliable.\n\nThen you start hitting limits.\nNot engineering limits of hardware capacity, but artificial cloud limits you have to open requests to solve.\nIf you scale large enough, eventually your instances get iced, meaning you need to request a wider variety of instance types.\n\nAfter you solve those problems you'll end up with inconsistent performance characteristics and bugs that you're not able to reproduce.\nThis will push you further down the observability rabbit hole to identify problems sooner and understand them more holistically.\n\nA lot of this is to solve cloud-induced problems.\n\nAre there benefits to better application observability? Absolutely.\nDo you need 4 years of 5-second infrastructure granularity? 
Most certainly not (unless regulations require it).\n\nIt's surprising you never had these problems or needed this level of granularity when you were on-prem.\nYou didn't because you did most of the work up front with load testing and planning.\nA static environment is much easier to understand and troubleshoot.\nThis assumes you have some level of reproducibility and auditability of changes.\n\nThe autoscaling tax\n\nOne of the problems with the Kubernetes autoscaling options is they fill different needs.\nHorizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), Kubernetes Event-driven Autoscaling (KEDA) and others will scale workloads up, down, and sideways.\nCluster Autoscaler (CA) and Karpenter will manage the nodes.\n\nBut each of these autoscalers has its own idiosyncrasies that take a lot of experience to get right.\nAnd none of them are actually complete for what the business wants.\n\nSure you can scale things, but business leaders don't look at Grafana, they need a report.\nSomething in their email that makes them feel smart.\nThey're not going to get that from GitOps or YAML.\n\nYou can't rely on your cloud provider to be the sole source of cost savings because Kubernetes runs a lot more places than a single cloud provider.\nAnd some cloud providers have terrible cost reporting data.\n\nYou could build reporting with Kubecost Apptio (an IBM project), but this just turns into another source of questionable alerts.\nIt _can_ do some automation, but that's not really what it was built for.\nI'm sure it's perfectly fine (I've never used it).\n\nInstead you probably need to rely on OpenCost, but you're back to monitoring disconnected from optimizations.\nKarpenter will help make the right decisions when you scale, but Karpenter doesn't do reporting, doesn't work everywhere, and only scales nodes.\n\nYou can't opt out of this in the cloud. 
The economics won't let you treat infrastructure like a static data center, so you'll autoscale, and autoscaling creates demand for the cost reporting and optimization that autoscaling itself can't do. The question is whether you'll run three tools that don't talk to each other or one that does.\n\nThis is where Cast AI comes in.\nI've never used it in anger (a.k.a. in production) so I'm not an expert in all the ways it works or doesn't.\nI've used it just enough to be confident it does more than Karpenter and OpenCost.\n\nWhen you combine all this data and control, the sum is greater than the parts.\n\nCast AI originally asked me to review their platform 3 years ago.\nAt the time I was working at AWS on Karpenter and Cast AI claimed it worked better than Karpenter with no public data to corroborate the claim.\nI said no because I didn't think I could be an unbiased reviewer.\n\nThis time, when they reached out, they let me know they built integrations so it works with a cluster that already manages autoscaling with Karpenter.\nObviously, the long term goal would be to get rid of Karpenter or the Cluster Autoscaler and only use Cast AI's autoscaling capabilities.\nI assume that's where a lot of enterprises will end up.\n\nIn the cloud, autoscaling and cost reporting are the same job. The industry calls it FinOps. The pricing model is continuous, so the management is continuous.\n\nOn-prem has budgets, not FinOps. Different economics, different tools.\n\nEven so, on-prem Kubernetes clusters end up running cloud autoscaling tools.\nDoes it make sense?\nNo, the hardware is already a sunk cost.\nWill leadership ask why on-prem clusters don't need it?\nNo, asking that question admits they don't already know the answer.\nWill you install it anyway?\nYes.",
  "title": "Kubernetes Autoscaling"
}