Spaces Persistent Storage Upgrade Not Accessible
I’m not very familiar with Argilla, but it seems that exporting and importing data at the Argilla data unit level—rather than on a file-by-file basis—is important.
I don’t know the background behind the sudden discontinuation of Persistent Storage, but it certainly seems to have been discontinued…
For your situation, the best practice is to treat the Space as disposable and move durability somewhere else. The current Hugging Face docs say persistent storage is no longer available, while other Hugging Face docs and API references still describe the older storage flow. That means chasing the old small tier is not a reliable operating plan anymore. (Hugging Face)
The core rule
Do not let a Hugging Face Space be the only copy of Argilla state. Use the Space for UI and compute. Keep durable data outside it. That matches Hugging Face’s newer storage direction, where repos are Git-based and buckets are mutable object storage. (Hugging Face)
Best practices for you
1. Stop trying to recover the old storage tier as your main plan
You can still ask support, but I would treat that as a low-probability rescue path, not as the design. The docs currently say the persistent-storage setting is ignored, even though other docs still expose storage-management methods. That is a documentation and product-state mismatch, not a stable workflow. (Hugging Face)
2. Back up at the Argilla dataset layer , not just the file-upload layer
Your backup unit should be the Argilla dataset , not just “files pushed to the Hub.” Argilla’s docs say a complete dataset includes the configuration in rg.Settings plus the records, and their to_hub / from_hub and to_disk / from_disk flows are specifically for exporting and restoring that full dataset. The to_disk reference says the export contains the dataset model, settings, and records as JSON files. (Argilla Docs)
3. Keep two independent backups at all times
For every important milestone, keep:
- one Hub dataset repo copy for versioned external backup,
- one local disk copy outside the Space or VPS. This is the safest pattern because it protects you from both Space failure and single-host failure, and it uses the Argilla-supported export flows rather than ad hoc copying. (Argilla Docs)
4. Use the right Hugging Face storage for the right job
Use a dataset repo when you want versioned, inspectable snapshots of Argilla datasets. Use a Storage Bucket for mutable artifacts such as logs, uploaded files, exports in progress, checkpoints, or other large files that change often. Hugging Face’s docs explicitly distinguish repos from buckets this way. (Hugging Face)
5. For any serious Argilla deployment, use a real stateful stack
Argilla is not just one container with a folder. Its docs describe a relational database layer, a search layer, and related server configuration, and the official Compose example includes Argilla, PostgreSQL, Elasticsearch, Redis, and named volumes. That is the shape you should think in when the data matters. (Argilla Docs)
6. Separate container persistence from host persistence
Docker volumes persist beyond the life of an individual container. That protects you from docker compose down / up cycles and container recreation. It does not protect you from losing the host itself, rebuilding the VM, or destroying the underlying disk. Docker’s own docs are explicit that volumes persist beyond the container lifecycle. (Docker Documentation)
7. On Vultr, add host-level backups intentionally
This is where your VPS attempt likely fell short. Vultr’s docs say automatic backups cover the compute instance’s active file system, but do not include attached Block Storage volumes. So a durable Vultr design needs both persistent storage for the live service and a separate backup plan for that storage. (Vultr Docs)
8. Test restore, not just backup
A backup is only real if restore works. Argilla’s docs support restoring datasets from the Hub and from disk. So your standard operating procedure should include a restore drill into a fresh workspace or staging instance after major annotation milestones. (Argilla Docs)
What I would do in your place
Good enough for a short project
Use the Space only as a temporary frontend. After each session or milestone:
- export the dataset with Argilla’s export methods,
- push to a Hub dataset repo,
- also write a local disk export,
- store large mutable artifacts outside the repo, ideally in a bucket. (Argilla Docs)
Better for an ongoing team workflow
Keep Hugging Face for sharing and snapshots, but move the authoritative runtime off the Space. Run Argilla on a VPS or managed environment with Postgres, Elasticsearch or OpenSearch, Redis, Docker volumes, and host-level backups. That matches Argilla’s documented architecture much better than relying on Space-local state. (Argilla Docs)
Best for anything you would be upset to lose
Use three layers :
- live runtime : self-hosted Argilla with persistent volumes or managed data services,
- versioned dataset backup : Argilla
to_hubto a dataset repo, - offline or second-site backup : Argilla
to_diskcopied off the VM. (Argilla Docs)
What not to do
Do not rely on:
- a Space’s local disk as the source of truth,
- plain file uploads as a substitute for Argilla-native dataset export,
- Docker volumes alone as your whole disaster-recovery plan,
- Vultr backups without checking whether the actual data disk is included. (Hugging Face)
A practical operating checklist
Use this as your default routine:
- After creating or changing a dataset schema : export with Argilla, not raw file copy. (Argilla Docs)
- After each annotation milestone : export to Hub and to disk. (Argilla Docs)
- Before any deployment change or restart-risking change : snapshot the dataset first. (Argilla Docs)
- For VPS deployments : confirm named volumes are present for Argilla, Postgres, and Elasticsearch. (GitHub)
- For VPS durability : enable snapshots/backups and verify which disks they cover. (Vultr Docs)
- Once per month : restore one backup into a clean environment and verify the dataset loads correctly. (Argilla Docs)
My recommendation, plainly
For you, the best practice is:
Space for convenience. External storage for truth. Self-hosted stack for durability.
That means:
- do not spend more energy on reviving the old Space persistent-storage path,
- use Argilla-native export/import as your backup primitive,
- keep two backups,
- and move long-lived Argilla state to infrastructure you control. (Hugging Face)
Discussion in the ATmosphere