Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiarvqyfzrkkuzfph3tnnjuqj6qjn5dnpy2qakct4fkvz3h5yh7s44",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mhzmtyrukab2"
  },
  "path": "/t/spaces-persistent-storage-upgrade-not-accessible/171226#post_15",
  "publishedAt": "2026-03-27T04:00:51.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Hugging Face",
    "Argilla Docs",
    "Docker Documentation",
    "Vultr Docs",
    "GitHub"
  ],
  "textContent": "I’m not very familiar with Argilla, but it seems that exporting and importing data at the Argilla data unit level—rather than on a file-by-file basis—is important.\n\nI don’t know the background behind the sudden discontinuation of Persistent Storage, but it certainly seems to have been discontinued…\n\n* * *\n\nFor your situation, the best practice is to **treat the Space as disposable** and move durability somewhere else. The current Hugging Face docs say persistent storage is no longer available, while other Hugging Face docs and API references still describe the older storage flow. That means chasing the old `small` tier is not a reliable operating plan anymore. (Hugging Face)\n\n## The core rule\n\n**Do not let a Hugging Face Space be the only copy of Argilla state.**\nUse the Space for UI and compute. Keep durable data outside it. That matches Hugging Face’s newer storage direction, where repos are Git-based and buckets are mutable object storage. (Hugging Face)\n\n## Best practices for you\n\n### 1. Stop trying to recover the old storage tier as your main plan\n\nYou can still ask support, but I would treat that as a low-probability rescue path, not as the design. The docs currently say the persistent-storage setting is ignored, even though other docs still expose storage-management methods. That is a documentation and product-state mismatch, not a stable workflow. (Hugging Face)\n\n### 2. Back up at the **Argilla dataset layer** , not just the file-upload layer\n\nYour backup unit should be the **Argilla dataset** , not just “files pushed to the Hub.” Argilla’s docs say a complete dataset includes the configuration in `rg.Settings` plus the records, and their `to_hub` / `from_hub` and `to_disk` / `from_disk` flows are specifically for exporting and restoring that full dataset. The `to_disk` reference says the export contains the dataset model, settings, and records as JSON files. (Argilla Docs)\n\n### 3. 
Keep **two independent backups** at all times\n\nFor every important milestone, keep:\n\n  * one **Hub dataset repo** copy for versioned external backup,\n  * one **local disk** copy outside the Space or VPS.\n\nThis is the safest pattern because it protects you from both Space failure and single-host failure, and it uses the Argilla-supported export flows rather than ad hoc copying. (Argilla Docs)\n\n### 4. Use the right Hugging Face storage for the right job\n\nUse a **dataset repo** when you want versioned, inspectable snapshots of Argilla datasets. Use a **Storage Bucket** for mutable artifacts such as logs, uploaded files, exports in progress, checkpoints, or other large files that change often. Hugging Face’s docs explicitly distinguish repos from buckets this way. (Hugging Face)\n\n### 5. For any serious Argilla deployment, use a real stateful stack\n\nArgilla is not just one container with a folder. Its docs describe a relational database layer, a search layer, and related server configuration, and the official Compose example includes **Argilla, PostgreSQL, Elasticsearch, Redis, and named volumes**. That is the architecture to plan for when the data matters. (Argilla Docs)\n\n### 6. Separate **container persistence** from **host persistence**\n\nDocker’s own docs are explicit that volumes persist beyond the life of an individual container. That protects you from `docker compose down` / `up` cycles and container recreation, as long as you do not pass `-v` to `down`, which removes the volumes too. It does **not** protect you from losing the host itself, rebuilding the VM, or destroying the underlying disk. (Docker Documentation)\n\n### 7. On Vultr, add host-level backups intentionally\n\nThis is where your VPS attempt likely fell short. Vultr’s docs say automatic backups cover the compute instance’s active file system, but **do not include attached Block Storage volumes**. 
So a durable Vultr design needs both persistent storage for the live service and a separate backup plan for that storage. (Vultr Docs)\n\n### 8. Test restore, not just backup\n\nA backup is only real if restore works. Argilla’s docs describe restoring datasets from the Hub and from disk, so your standard operating procedure should include a restore drill into a fresh workspace or staging instance after major annotation milestones. (Argilla Docs)\n\n## What I would do in your place\n\n### Good enough for a short project\n\nUse the Space only as a temporary frontend. After each session or milestone:\n\n  1. export the dataset with Argilla’s export methods,\n  2. push to a Hub dataset repo,\n  3. also write a local disk export,\n  4. store large mutable artifacts outside the repo, ideally in a bucket. (Argilla Docs)\n\n### Better for an ongoing team workflow\n\nKeep Hugging Face for sharing and snapshots, but move the authoritative runtime off the Space. Run Argilla on a VPS or managed environment with Postgres, Elasticsearch or OpenSearch, Redis, Docker volumes, and host-level backups. That matches Argilla’s documented architecture much better than relying on Space-local state. (Argilla Docs)\n\n### Best for anything you would be upset to lose\n\nUse **three layers**:\n\n  * **live runtime**: self-hosted Argilla with persistent volumes or managed data services,\n  * **versioned dataset backup**: Argilla `to_hub` to a dataset repo,\n  * **offline or second-site backup**: Argilla `to_disk` copied off the VM. (Argilla Docs)\n\n## What not to do\n\nDo not rely on:\n\n  * a Space’s local disk as the source of truth,\n  * plain file uploads as a substitute for Argilla-native dataset export,\n  * Docker volumes alone as your whole disaster-recovery plan,\n  * Vultr backups without checking whether the actual data disk is included. 
(Hugging Face)\n\n## A practical operating checklist\n\nUse this as your default routine:\n\n  * **After creating or changing a dataset schema**: export with Argilla, not raw file copy. (Argilla Docs)\n  * **After each annotation milestone**: export to Hub and to disk. (Argilla Docs)\n  * **Before any deployment change or restart-risking change**: snapshot the dataset first. (Argilla Docs)\n  * **For VPS deployments**: confirm named volumes are present for Argilla, Postgres, and Elasticsearch. (GitHub)\n  * **For VPS durability**: enable snapshots/backups and verify which disks they cover. (Vultr Docs)\n  * **Once per month**: restore one backup into a clean environment and verify the dataset loads correctly. (Argilla Docs)\n\n## My recommendation, plainly\n\nFor you, the best practice is:\n\n**Space for convenience. External storage for truth. Self-hosted stack for durability.**\n\nThat means:\n\n  * do not spend more energy on reviving the old Space persistent-storage path,\n  * use Argilla-native export/import as your backup primitive,\n  * keep two backups,\n  * and move long-lived Argilla state to infrastructure you control. (Hugging Face)\n\n",
  "title": "Spaces Persistent Storage Upgrade Not Accessible"
}