Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibc7g64f6j4vqlkuh6g4jr4l2ukzhrvzkr2ywtalrvmf5lwsx6bqe",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjv4la2ui462"
  },
  "path": "/t/space-auto-restart/175337#post_5",
  "publishedAt": "2026-04-19T22:56:23.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "huggingface.co",
    "discuss.huggingface.co",
    "github.com"
  ],
  "textContent": "Aside from the general points mentioned below, I recall seeing a report on a forum stating that free CPU spaces may reboot within 24 to 48 hours even if they haven’t reached their processing load or RAM limits. (I don’t remember exactly.)\n\nWell, HF Free CPU Spaces isn’t really suited for practical backends that need to run continuously. It’s basically just for demos, after all…\n\n* * *\n\n## Bottom line\n\nYour Spaces are most likely restarting because they are running a **stateful, multi-process Docker stack** on a platform that is much more comfortable with **simpler, faster-starting, mostly stateless containers**. This is **not the normal free-tier sleep behavior**. A Space can restart without any new commit when the container exits, the app becomes unhealthy during startup, the runtime is recycled, or a platform-side control/runtime issue occurs. Hugging Face’s docs clearly distinguish lifecycle behavior, startup-health behavior, ephemeral local storage, and restart-triggering configuration changes. (huggingface.co, huggingface.co, huggingface.co, huggingface.co)\n\nSo the short answer is:\n\n  * **No, this does not look like ordinary sleep.**\n  * **Yes, it can happen without new commits.**\n  * **Yes, there are several reasons besides RAM.**\n  * **Yes, support can sometimes help, but your app/runtime design itself is also a strong cause.** (huggingface.co, discuss.huggingface.co)\n\n\n\n* * *\n\n## What Hugging Face Spaces are designed for\n\nA Hugging Face Space is not a normal VPS. It is a **managed container runtime** with a web-facing app endpoint.\n\nThat matters because the platform comes with its own rules:\n\n  * free hardware has lifecycle rules,\n  * local disk is ephemeral unless you use the proper persistent storage path,\n  * startup health matters,\n  * outbound networking is restricted,\n  * and complex stateful Docker apps need more care than simple demos. 
(huggingface.co, huggingface.co, huggingface.co)\n\n\n\nHugging Face’s own deployment guides for heavier Docker apps such as **Label Studio**, **Langfuse**, **ZenML**, and **Giskard** all emphasize persistence and runtime structure. That is a signal in itself: once you move beyond a simple app server, the platform gets less forgiving. (huggingface.co, huggingface.co, huggingface.co, huggingface.co)\n\n* * *\n\n## What your Spaces are doing instead\n\nFrom the code structure, your Spaces are not just serving one app. They are doing all of this:\n\n  * boot from a shell script,\n  * extract an encrypted web app archive,\n  * maybe run `npm install`,\n  * initialize or restore MariaDB,\n  * start Node,\n  * start Apache,\n  * and in `box2`, continuously create DB snapshots in the background.\n\n\n\nThat is much closer to a **small self-hosted stack** than to a normal Space app.\n\nThat design is workable on a full server you control. On a free managed Docker Space, it creates many more ways for the runtime to recycle or fail even when there was no commit and no obvious user-side traffic spike.\n\n* * *\n\n## Why a restart can happen with no new commit\n\nThis is the most important conceptual point.\n\nA new commit is only **one** reason a Space restarts. Other causes include:\n\n### 1) The main process exits\n\nIf the container’s main process (the one launched by the Docker entrypoint) exits, the container exits.\n\n### 2) Startup never becomes healthy\n\nHugging Face exposes `startup_duration_timeout` for a reason. The default is **30 minutes**, and a Space that has not finished starting within that window is flagged as unhealthy. (huggingface.co)\n\n### 3) Local state disappears or becomes inconsistent\n\nIf your app expects local DB/files to act like persistent state, a restart becomes much harder to recover from because the next boot has to do more work on ephemeral storage. Hugging Face’s storage docs explicitly say the disk is not persistent by default. 
(huggingface.co)\n\n### 4) Child services fail inside a multi-service container\n\nA Space can look “fine” from the outside while one internal service has already gone bad. That often leads to confusing behavior, partial failures, and later restarts.\n\n### 5) Platform-side runtime/control issues\n\nThere are public cases where restart/factory reboot itself broke, returned `503`, or the runtime seemed wedged even across recreated Spaces. That means some failures really are platform-side. (discuss.huggingface.co, discuss.huggingface.co)\n\nSo “no commit happened” does **not** mean “HF should never have restarted this.”\n\n* * *\n\n## Why your specific code is vulnerable\n\n### A) Too much startup work\n\nYour runtime is doing heavy tasks at startup:\n\n  * archive extraction,\n  * dependency install,\n  * database init/import/restore,\n  * process orchestration.\n\n\n\nThat is exactly the kind of startup path that becomes fragile on Spaces. Hugging Face’s config reference supports `startup_duration_timeout` and `preload_from_hub` because startup behavior is operationally important. (huggingface.co)\n\nFor a stable Space, startup should be as close as possible to:\n\n  * read config/secrets,\n  * start already-installed services,\n  * become healthy quickly.\n\n\n\nYour stack is much heavier than that.\n\n### B) You are using a local SQL database without persistent storage\n\nYour two Spaces have **no persistent storage**.\n\nThat means the database layer is sitting on storage Hugging Face describes as non-persistent. This is not just a convenience issue. It directly changes runtime behavior:\n\n  * every restart becomes more expensive,\n  * state recovery becomes more complex,\n  * boot becomes slower and less predictable,\n  * and the platform sees a heavier app each time it has to restart. 
(huggingface.co)\n\n\n\nThis is the single biggest architectural mismatch in your setup.\n\n### C) `box2` adds a costly background snapshot loop\n\nAmong your two Spaces, `box2` is riskier because it keeps generating DB snapshots in the background. That means even when user activity is “light,” the container is still doing nontrivial work.\n\nThat is exactly the sort of hidden workload that can make a free container less stable without obvious traffic pressure.\n\n### D) Your current logging model hides the true failure point\n\nIf important services mostly write to internal log files instead of stdout/stderr, the Space log view becomes much less useful. Hugging Face recently improved Spaces log access with programmatic log tools, but those tools are only as useful as the logs you emit. (github.com)\n\nSo part of the mystery may be that the most important evidence is not reaching the main log stream.\n\n### E) Your process model is harder to supervise cleanly\n\nA multi-process container where Apache, Node, and MariaDB are all started by a shell script is inherently more fragile than a single-process app. If one child process dies, the Space may become partially broken before the container eventually restarts.\n\nThat makes the runtime harder to reason about.\n\n* * *\n\n## Why “not RAM” is not enough\n\nThis part matters because you emphasized it.\n\nIt is completely possible for a Space to restart while average RAM looks normal, because the trigger can be:\n\n  * a child service exit,\n  * an unhealthy startup transition,\n  * a bad local state recovery,\n  * a temporary platform/runtime issue,\n  * or a short-lived resource spike that you never happened to see. (huggingface.co, huggingface.co)\n\n\n\nSo I would not use “RAM looked fine” as a strong argument that the platform restarted a perfectly healthy app. It may have. 
But your runtime gives several stronger explanations first.\n\n* * *\n\n## The hidden networking detail that affects architecture\n\nOne detail that many people miss:\n\nHugging Face documents that outbound requests from Spaces are restricted to **ports 80, 443, and 8080**. (huggingface.co)\n\nThat matters because it makes “just use a normal external MySQL/Postgres server” less simple than it sounds. A direct connection on a typical DB port like **3306** may not work from a Space.\n\nSo your choices are not as open as they would be on a VPS:\n\n  * local DB inside the container is fragile,\n  * but external DB is also constrained by outbound networking rules.\n\n\n\nThat is an important part of the background here.\n\n* * *\n\n## My actual diagnosis for your case\n\nIf I put everything together, this is my best reading:\n\n### Most likely\n\nYour Spaces are restarting because the runtime design is already too close to the edge:\n\n  * stateful local DB on ephemeral storage,\n  * startup-heavy Docker entrypoint,\n  * multi-process orchestration,\n  * expensive background work,\n  * weak log visibility.\n\n\n\n### Also possible\n\nThere may have been one or more Hugging Face runtime/control-plane problems on top of that, because public reports show that those do happen. But those reports do **not** erase the fact that your current setup is inherently restart-sensitive. (discuss.huggingface.co, discuss.huggingface.co)\n\nSo the best honest answer is:\n\n**The platform may have nudged it, but your code/runtime made it much easier for that nudge to become a visible restart.**\n\n* * *\n\n## Solutions, ordered by impact\n\n## 1) Remove runtime installs\n\nDo **not** do `npm install` at boot if you can avoid it.\n\nInstall dependencies in the image build. 
That reduces startup variability immediately.\n\nThis is one of the highest-value changes you can make.\n\n## 2) Make startup much smaller\n\nTry to move from:\n\n  * extract,\n  * install,\n  * initialize,\n  * restore,\n  * then launch\n\n\n\nto:\n\n  * verify,\n  * launch,\n  * become healthy quickly.\n\n\n\nIf archive extraction must remain, then at least remove all the other avoidable boot work.\n\n## 3) Stop treating local DB state as durable\n\nWith no persistent storage, local DB state is not trustworthy across restarts. Officially, local disk is ephemeral. (huggingface.co)\n\nThat means you should choose one of these paths:\n\n### Demo path\n\nMake the app boot from a small seed and accept that user changes are not durable.\n\n### Persistent path\n\nAdd real persistence and redesign around it.\n\nRight now you are between those two paths, which is the least stable place to be.\n\n## 4) Slow down or remove the snapshot loop\n\nFor `box2`, the snapshot loop should be dramatically less frequent, or removed until true persistence exists.\n\nA background safety loop that constantly dumps/compresses the DB can become a destabilizer on free managed infrastructure.\n\n## 5) Supervise all services properly\n\nThe shell script should either:\n\n  * remain PID 1 and supervise Apache/Node/MariaDB itself, or\n  * use a proper process supervisor\n\n\n\nThe key goal is simple:\n\n**if a critical child dies, the container should fail cleanly and visibly, not drift into a half-working state.**\n\n## 6) Push service logs to stdout/stderr\n\nThis is a big practical fix.\n\nMake Node, MariaDB, and Apache logs visible in the Space logs. Then the restart story becomes much easier to prove.\n\nThat also pairs well with the new Hugging Face log tooling. 
(github.com)\n\n## 7) Set Docker config explicitly\n\nAdd these to the YAML front matter at the top of `README.md`:\n\n\n    app_port: 7860\n    startup_duration_timeout: 1h\n\n\nThis does not solve everything, but it removes ambiguity and gives a startup-heavy container more room. (huggingface.co)\n\n## 8) Use Dev Mode or programmatic log collection when reproducing\n\nHugging Face provides Dev Mode and improved log access for Spaces. For a stack like yours, debugging only via the normal web UI is not enough. (huggingface.co, github.com)\n\n* * *\n\n## Final conclusion\n\nYour Spaces are not restarting for “no reason.”\n\nThe reasons are most likely a combination of:\n\n  * **managed container lifecycle**,\n  * **heavy startup/runtime design**,\n  * **multi-service fragility**,\n  * **local DB on ephemeral storage**,\n  * **costly background snapshotting**,\n  * and possibly **occasional Hugging Face runtime/control glitches**. (huggingface.co, huggingface.co, discuss.huggingface.co)\n\n\n\nSo my clearest advice is:\n\n  1. **remove runtime installs**,\n  2. **shrink startup**,\n  3. **reduce or remove the snapshot loop**,\n  4. **make logs visible**,\n  5. **supervise services cleanly**,\n  6. **choose either stateless demo mode or real persistence**.\n\n\n\nThat is the most evidence-based explanation and the most realistic fix path for your case.",
  "title": "Space Auto restart"
}