Diagnosing a Frozen Apache Server
Khürt Williams
June 28, 2026
Saturday morning I noticed that my WordPress site had stopped responding. Not slow: gone fishing. Pages wouldn’t load. I couldn’t connect via SSH. Even DigitalOcean’s emergency console, the one that’s supposed to work when everything else fails, sat there doing nothing.

That last detail worried me more than the non-responsive web server. A frozen web server is annoying. A frozen console suggests the VPS itself has stopped responding at a level below the web server, below SSH, below anything I could reach remotely.

I checked DigitalOcean’s status page first, partly out of hope and partly out of habit. If there’s a regional outage, you want to know before you start troubleshooting your own configuration. There wasn’t one. Whatever had gone wrong was mine to solve.

The only option left was a hard power cycle, which DigitalOcean offers as a single button on the droplet’s control panel: the virtual equivalent of unplugging it and plugging it back in. Not a graceful reboot, which politely asks the operating system to wind things down. That depends on the OS being awake enough to listen, and clearly it wasn’t. A power cycle doesn’t ask permission. It just happens. There’s a real cost to that: anything mid-write to disk can be lost, anything in memory certainly is. But when the alternative is an indefinitely frozen virtual server, you take the risk.

It worked. The site came back, services reported healthy, and for about ten minutes, everything looked fine.

Then it happened again.

The Detective Work

This second time, I had something valuable: a live SSH terminal session into a virtual server that was still working, even as the website itself had gone unresponsive in the browser. That’s the moment to gather evidence before it potentially disappears.

The first checks were almost disappointingly normal. CPU usage was low. Memory had headroom. Swap was untouched.
The load average, a rough measure of how much work a system has queued up, sat comfortably low. None of the obvious signs of a resource crisis were present. Whatever was wrong, it wasn’t a machine straining under too much work.

So I checked the MySQL database. Sometimes a single slow, badly indexed query can quietly choke a WordPress site without raising the CPU much at all; it’s possible to be “busy waiting” rather than busy computing. Nothing. The list of currently running MySQL queries was essentially empty. PHP, the language running the website’s logic, was idle too: no backlog, no struggling workers.

This was the strange part. Every individual component looked healthy, and yet a simple request to load the homepage took thirteen seconds. Then, a little later, seventy-three seconds. For context, a normal page load on a reasonably configured personal WordPress server should take well under a second.

Finding the Bottleneck

The breakthrough came from looking at Apache, the web server software that receives requests before handing them off to anything else. Apache has a setting for the maximum number of worker processes it’s allowed to run simultaneously, each one handling a single visitor’s request at a time. Mine was set to ten.

Ten might sound like plenty for a personal blog, and most of the time, it probably was. But Apache’s own error log had quietly been recording a specific issue: it had reached that ceiling and was warning that the limit should probably be raised. It had logged this message within thirty seconds of the server starting up.

The website wasn’t broken so much as queued. Only ten requests could be served at once, and once those ten slots filled, everyone else simply waited their turn, including, unhelpfully, my own diagnostic tool (curl) trying to check what was wrong.

That explained the bottleneck, but not why it was filling up in the first place. Ten requests at once isn’t an unreasonable number for ordinary traffic.
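
The queueing effect described above can be sketched with a small simulation. This is only an illustrative model, not Apache's actual scheduler: the function name `queue_waits`, the first-come-first-served assumption, and the 70-second hold time are all hypothetical choices made for the example.

```python
import heapq


def queue_waits(arrivals, hold_seconds, workers):
    """Return how long each request waits for a free worker.

    arrivals:     arrival times in seconds, sorted ascending
    hold_seconds: how long each request occupies its worker once started
    workers:      size of the fixed worker pool (the ceiling)

    Assumes requests are served strictly in arrival order.
    """
    # Min-heap of the times at which each worker next becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)

    waits = []
    for arrival, hold in zip(arrivals, hold_seconds):
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)  # wait if no worker is free yet
        waits.append(start - arrival)
        heapq.heappush(free_at, start + hold)
    return waits


# Hypothetical scenario: ten slow connections grab every worker at t=0 and
# hold it for 70 seconds; one ordinary 0.2-second request arrives at t=1.
arrivals = [0.0] * 10 + [1.0]
holds = [70.0] * 10 + [0.2]

print(queue_waits(arrivals, holds, workers=10)[-1])  # 69.0 seconds in the queue
print(queue_waits(arrivals, holds, workers=20)[-1])  # 0.0 with spare workers
```

In this toy model the ordinary request is fast once it starts; almost all of its delay is time spent waiting for a slot, which matches the pattern of healthy components and a very slow page.
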
Something was holding those ten slots hostage rather than letting them cycle through quickly.

The answer was in another, related setting: how long Apache would wait for a single request to complete before giving up on it. That value was set to five minutes. Five minutes is an enormous amount of time in web terms, long enough that as few as ten slow, stalled, or simply unresponsive connections could occupy the entire worker pool for five minutes at a stretch (fifty worker-minutes between them), leaving nothing spare for genuine visitors.

Checking the live network connections confirmed it. Several connections, from a scattering of different addresses, had been sitting open for nearly five minutes each, technically still “reading” a request that never quite arrived. I genuinely can’t say with confidence whether they were malicious. But whether they were a deliberate slow-request attack, a wave of poorly written bots, or some other quirk of the modern internet’s background noise, the effect on the server was the same.

The Fix, and the Lesson

I raised the worker ceiling modestly, not recklessly, since more simultaneous workers means more memory consumed, and I’d been through a memory crisis on this same server before. I calculated the memory cost per worker, worked out a safe headroom, and picked a number with margin to spare rather than guessing.

Then I cut the waiting-time allowance from five minutes down to one. A single stalled connection now ties up a worker for sixty seconds at most, not three hundred. That one change, on its own, was probably doing most of the work.

The proof came afterwards. A page that had taken seventy-three seconds to load came back in under three. And rather than trust a synthetic test alone, I ran the workflow I actually use day to day: uploading photographs directly from Adobe Lightroom Classic to the site. It completed without a single error.

What bothers me is how quiet the actual issue was.
No alarming spike, no resource exhaustion, no obvious villain in a log file shouting that something was wrong. Just two unremarkable configuration values, set sensibly enough on their own, that happened to compound each other badly under a particular kind of pressure. The server wasn’t overwhelmed. It was simply waiting, patiently, for connections that were never going to finish, and there weren’t enough waiting rooms to also serve everyone else.

Self-hosting teaches you that resilience rarely comes from one dramatic safeguard. It comes from understanding how your own small decisions interact with each other, and being willing to sit with a terminal window for an hour, methodically, until the quiet thing reveals itself.

If you’re curious about running your own server, DigitalOcean is my hosting provider.
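
The arithmetic behind the fix (sizing the worker ceiling from memory, and seeing how the ceiling and the timeout compound) can be sketched roughly as follows. Every figure here is a hypothetical placeholder, not a measurement from this server: the 2048 MB total, the 900 MB reserved for everything else, the 45 MB per worker, the 80% margin, and the resulting ceiling of 20 are assumptions chosen only to make the example concrete. The saturation estimate is a rough steady-state one: connections arriving at a given rate and each holding a worker for the full timeout keep about rate times timeout workers busy.

```python
def worker_ceiling(total_mb, reserved_mb, per_worker_mb, margin=0.8):
    """Estimate how many workers fit in memory while leaving headroom.

    total_mb:      total RAM on the server
    reserved_mb:   RAM set aside for the OS, database, and other services
    per_worker_mb: observed memory cost of one web server worker
    margin:        fraction of the remaining RAM the workers may use
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = (total_mb - reserved_mb) * margin
    return max(0, int(usable_mb // per_worker_mb))


def saturating_rate_per_minute(workers, timeout_seconds):
    """Stalled connections per minute needed to keep every worker busy.

    Each stalled connection is assumed to hold a worker for the full timeout.
    """
    return workers * 60 / timeout_seconds


# Hypothetical sizing: 2 GB droplet, 900 MB reserved, 45 MB per worker.
print(worker_ceiling(2048, 900, 45))        # 20

# Before: 10 workers, 300 s timeout. After (hypothetical): 20 workers, 60 s.
print(saturating_rate_per_minute(10, 300))  # 2.0 stalled connections a minute
print(saturating_rate_per_minute(20, 60))   # 20.0 stalled connections a minute
```

Under these made-up numbers, two stalled connections a minute were enough to fill the original pool, while the adjusted settings would need ten times that. Most of the improvement comes from the shorter timeout rather than the larger pool, which is consistent with the observation above that the timeout change was probably doing most of the work.
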