Raw Record Source

{
  "$type": "site.standard.document",
  "canonicalUrl": "https://johnnyreilly.com/posts/azure-app-service-health-checks-and-zero-downtime-deployments",
  "description": "Azure App Service enables zero downtime deployments using health checks and deployment slots. Automated swapping slots ensure constant service.",
  "path": "/posts/azure-app-service-health-checks-and-zero-downtime-deployments",
  "publishedAt": "2021-02-11T00:00:00.000Z",
  "site": "at://did:plc:yy3apqjlms24kso7ahn7lbmb/site.standard.publication/3mova7c4nho2b",
  "tags": [
    "azure"
  ],
  "textContent": "I've been working recently on zero downtime deployments using Azure App Service. They're facilitated by a combination of Health checks and deployment slots. This post will talk about why this is important and how it works.\n\n\n\nWhy zero downtime deployments?\n\nHistorically (and for many applications, currently) deployment results in downtime. A period of time during the release where an application is not available to users whilst the new version is deployed. There are a number of downsides to releases with downtime:\n\n1. Your users cannot use your application. This will frustrate them and make them sad.\n2. Because you're a kind person and you want your users to be happy, you'll optimise to make their lives better. You'll release when the fewest users are accessing your application. It will likely mean you'll end up working late, early or at weekends.\n3. Again because you want to reduce impact on users, you'll release less often. This means that every release will bring with it a greater collection of changes. This is turn will often result in a large degree of focus on manually testing each release, to reduce the likelihood of bugs ending up in users hands. This is a noble aim, but it drags the teams focus away from shipping.\n\nPut simply: downtime in releases impacts customer happiness and leads to reduced pace for teams. It's a vicious circle.\n\nBut if we turn it around, what does it look like if releases have _no_ downtime at all?\n\n1. Your users can always use your application. This will please them.\n2. Your team is now safe to release at any time, day or night. They will likely release more often as a consequence.\n3. If your team has sufficient automated testing in place, they're now in a position where they can move to Continuous Deployment.\n4. Releases become boring. This is good. 
They \"just work™️\" and so the team can focus instead on building the cool features that are going to make users' lives even better.\n\nManual zero downtime releases with App Services\n\nApp Services have the ability to scale out. To quote the docs:\n\n> A scale out operation is the equivalent of creating multiple copies of your web site and adding a load balancer to distribute the demand between them. When you scale out ... there is no need to configure load balancing separately since this is already provided by the platform.\n\nAs you can see, scaling out works by having multiple instances of your app. Deployment slots are exactly this, but with an extra twist. If you add a deployment slot to your App Service, then you no longer deploy to production. Instead you deploy to your staging slot. Your staging slot is accessible in the same way your production slot is accessible. So whilst your users may go to https://my-glorious-app.io, your staging slot may live at https://my-glorious-app-stage.azurewebsites.net instead. Because this is accessible, this is testable. You are in a position to test the deployed application before making it generally available.\n\nOnce you're happy that everything looks good, you can \"swap slots\". What this means is that the version of the app living in the staging slot gets moved into the production slot. So that which lived at https://my-glorious-app-stage.azurewebsites.net moves to https://my-glorious-app.io. For more details on what that involves, read this. The significant take home is this: there is no downtime. Traffic stops being routed to the old instance and starts being routed to the new one. It's as simple as that.\n\nI should mention at this point that there are a number of zero downtime strategies out there and slots can help support a number of these. This includes canary deployments, where a subset of traffic is routed to the new version prior to it being opened out more widely. 
In our case, we're looking at rolling deployments, where we replace the currently running instances of our application with the new ones; but it's worth being aware that there are other strategies that slots can facilitate.\n\nSo what does it look like when slots swap? Well, to test that out, we swapped slots on our two App Service instances. We repeatedly CURLed our app's api/build endpoint, which exposes the build information, to see which version of our app we were routing traffic to. This is what we saw:\n\nThe first new version of our application showed up in a production slot at 11:51:54, and the last old version showed up at 11:52:12. So it took a total of 18 seconds to complete the transition from hitting only instances of the old application to hitting only instances of the new application. During those 18 seconds, either old or new versions of the application would be serving traffic. Significantly, there was always a version of the application returning responses.\n\nThis is _very_ exciting! We have zero downtime deployments!\n\nRollbacks for bonus points\n\nWe now have the new version of the app (buildNumber: 20210121.6) in the production slot, and the old version of the app (buildNumber: 20210121.5) in the staging slot.\n\nSlots have a tremendous rollback story. If it emerges that there was some uncaught issue in your release and you'd like to revert to the previous version, you can! Just as we swapped just now to move buildNumber: 20210121.6 from the staging slot to the production slot and buildNumber: 20210121.5 the other way, we can swap right back and revert our release like so:\n\nOnce again users going to https://my-glorious-app.io are hitting buildNumber: 20210121.5.\n\nThis is also _very_ exciting! We have zero downtime deployments _and_ rollbacks!\n\nAutomated zero downtime releases with Health checks\n\nThe final piece of the puzzle here is automation. 
You're a sophisticated team; you've put a great deal of energy into automating your tests. You don't want your release process to be manual for this very reason; you trust your test coverage. You want to move to Continuous Deployment.\n\nFortunately, automating swapping slots is a breeze with azure-pipelines.yml. Consider the following:\n\nThe first job here deploys our previously built webapp to the stage slot. The second job swaps the slot.\n\nWhen I first considered this, the question rattling around in the back of my mind was this: how does App Service know when it's safe to swap? What if we swap before our app has fully woken up and started serving responses?\n\nIt so happens that using Health checks, App Service caters for this beautifully. A health check endpoint is a URL in your application which, when hit, checks the dependencies of your application. \"Is the database accessible?\" \"Are the APIs I depend upon accessible?\" The diagram from the docs expresses it very well:\n\nThis approach is very similar to liveness, readiness and startup probes in Kubernetes. To make use of Health checks, in our ARM template for our App Service we have configured a healthCheckPath:\n\nThis tells App Service where to look to check the health. The health check endpoint itself is provided by MapHealthChecks in the Startup.cs of our .NET application:\n\nYou can read a full list of all the ways App Service uses Health checks here. Pertinent for zero downtime deployments is this:\n\n> when scaling up or out, App Service pings the Health check path to ensure new instances are ready.\n\nThis is the magic sauce. App Service doesn't route traffic to an instance until it's given the thumbs up that it's ready in the form of passing health checks. 
This is excellent; it is this that makes automated zero downtime releases a reality.\n\nProps to the various Azure teams that have made this possible; I'm very impressed by the way in which Health checks and slots can be combined to support some tremendous use cases.",
  "title": "Azure App Service, Health checks and zero downtime deployments"
}