{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiecwythwqrmd5n5wsioowofemcer55ywevoyaqtigpcyys6545tr4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mj45m43zoik2"
  },
  "path": "/t/how-should-i-write-to-bucket-from-a-space/175123#post_2",
  "publishedAt": "2026-04-09T20:19:52.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "use Bucket via the HF PYTHON API whenever possible",
    "Hugging Face",
    "GitHub",
    "Hugging Face Forums"
  ],
  "textContent": "I’ve tried using Bucket once, but there are still some areas where the specifications might change, so I’m not entirely sure how it works…\n\nIn any case, especially when using Python, it’s probably safest to use Bucket via the HF PYTHON API whenever possible.\n\n* * *\n\nYou probably did **not** miss a bucket-creation step or a “write mode” checkbox. Current Hugging Face docs say Space volumes can mount models, datasets, or **storage buckets** , and that **only storage buckets support read-write mounts**. The docs even show a bucket mounted at `/data`, and the CLI/docs say buckets are **read-write by default** unless you explicitly mount them read-only. (Hugging Face)\n\nThe first thing to correct is the test command. This:\n\n\n    mkdir u\n\n\ndoes **not** test whether `/data` is writable. It tests whether your **current working directory** is writable. If you were in `/`, then it was trying to create `/u`, not `/data/u`. On a Space, creating arbitrary directories directly under `/` is normally not allowed for the app user. So your current evidence proves “my current directory is not writable,” but it does **not yet prove** “the mounted bucket at `/data` is not writable.”\n\n## What to test instead\n\nRun these exact commands:\n\n\n    pwd\n    id\n    ls -ld /data\n    touch /data/__write_test__\n    mkdir /data/__dir_test__\n    python - <<'PY'\n    from pathlib import Path\n    p = Path(\"/data/__py_write_test__.txt\")\n    p.write_text(\"ok\\n\")\n    print(p.read_text())\n    PY\n\n\nInterpretation:\n\n  * if `touch /data/__write_test__` and `mkdir /data/__dir_test__` succeed, then the bucket mount is writable and the earlier `mkdir u` test was just targeting the wrong place.\n  * if those fail with `Permission denied`, then the bucket is mounted but not writable **to your runtime user**.\n\n\n\n## Why `nobody:nogroup` matters\n\nIf `/data` really looks like this:\n\n\n    drwxr-xr-x 2 nobody nogroup ... 
/data\n\n\nthen only the **owner** has write permission. Group and others only have read/execute. So if your Gradio Space process runs as some uid other than `nobody`, Linux will treat `/data` as effectively read-only for your app. That would explain “can list and read, cannot create files.” This is standard Unix permission logic.\n\nWhat makes this suspicious is that it conflicts with Hugging Face’s documented contract for bucket mounts. Buckets are supposed to be the **writable** volume type in Spaces. So if `/data` is mounted as `nobody:nogroup` with mode `755` and your process is not that owner, the observed runtime behavior does **not** match the documented feature behavior. (Hugging Face)\n\n## Did you miss something in the setup?\n\nProbably not.\n\nFrom the public docs, the Space volume interface exposes the concepts `type`, `source`, `mount_path`, and `read_only`, and the examples for buckets are just `Volume(type=\"bucket\", source=\"...\", mount_path=\"/data\")`. I do **not** see documented uid/gid/permission remapping controls for Space volumes in the current interface. So in a Gradio Space, without Dockerfile-level control, there is no documented user-side knob to say “mount this bucket as my runtime uid.” That means if the mount arrives with incompatible ownership, there may be nothing you can fix from the app code itself. That is an inference from the current documented API surface, not an explicit Hugging Face statement. (Hugging Face)\n\n## Is this kind of permission problem seen elsewhere?\n\nYes. Not the exact same mount-ownership case every time, but the symptom family is real.\n\nA public TGI-on-Spaces issue shows `PermissionError: [Errno 13] Permission denied: '/data'` on Hugging Face Spaces, with the reporter saying the container keeps trying to write to `/data` and fails. That issue is not specifically about buckets, but it shows that `/data` permission failures on Spaces are a real public class of problem. 
(GitHub)\n\nThere are also older Hugging Face forum posts about Spaces failing to create directories because the runtime user lacked permission for the target path, especially in Docker Spaces or paths under `/`. Those cases are adjacent rather than identical, but they point in the same direction: a write succeeds only if the runtime user has write permission on the actual mounted path. (Hugging Face Forums)\n\n## Why this may be happening now\n\nThe Spaces volume-mount feature for models, datasets, and buckets is very recent. Hugging Face’s v1.9.0 release notes from **April 2, 2026** introduced “Spaces Volumes: Mount Models, Datasets, and Buckets Directly” and stated that this replaces the deprecated persistent-storage feature. That recent rollout makes a mount-permission bug plausible: bucket mounts in Spaces have not yet had years of field hardening, and there is no large backlog of solved public cases to draw on. (GitHub)\n\n## What to do next\n\n### 1. Verify the actual mount behavior with absolute paths\n\nUse:\n\n\n    touch /data/test.txt\n    mkdir /data/testdir\n\n\nIf these work, the problem was only the earlier shell test.\n\nIf these fail, continue below.\n\n### 2. Check what the Space thinks is mounted\n\nIf you can run Python with a token that can inspect the Space, check the runtime volume config:\n\n\n    import os\n    from huggingface_hub import HfApi\n\n    # Assumes HF_TOKEN is set in the environment, e.g. as a Space secret.\n    api = HfApi(token=os.environ[\"HF_TOKEN\"])\n    runtime = api.get_space_runtime(\"your-namespace/your-space\")\n    # Lists the volumes the Space runtime reports as mounted.\n    print(runtime.volumes)\n\n\nThat confirms whether the Space runtime still sees a `bucket` mounted at `/data`. The docs explicitly show `get_space_runtime(...).volumes` for this purpose. (Hugging Face)\n\n### 3. 
Try a mount reset\n\nBecause you do not control the mount user in a Gradio Space, the only Space-side fixes you can try are operational:\n\n  * detach the bucket\n  * restart or factory rebuild the Space\n  * reattach the bucket\n  * test again with `/data/...` absolute paths\n  * if possible, try a different mount path such as `/bucket` or `/output`\n\n\n\nThis is not guaranteed to work, but it is the only realistic remount/reset you can do from the Space side.\n\n### 4. If it still fails, treat it as a platform-side bug or support case\n\nAt that point the clean report is:\n\n  * bucket mounted as read-write\n  * Space can read/list `/data`\n  * `/data` shows `nobody:nogroup`\n  * `touch /data/test.txt` fails with `Permission denied`\n  * docs say bucket mounts are the writable mount type in Spaces\n\n\n\nThat is a direct mismatch between expected and observed behavior. (Hugging Face)\n\n## The practical workaround that avoids the mount entirely\n\nIf you need to ship now, the safest route is to **write to the bucket through the Hugging Face API**, not through the mounted filesystem.\n\nThis is fully supported. Hugging Face’s bucket docs say you can interact with buckets using the Python API or CLI, and `batch_bucket_files()` can upload from local paths or directly from raw bytes. The docs also note that `HfFileSystem` works for bucket paths, but recommend `HfApi` methods when possible for better performance and reliability. Authentication can be provided with `HF_TOKEN`, and the quickstart explicitly says `HF_TOKEN` is especially useful in a Space as a Space secret. 
(Hugging Face)\n\n### Minimal Python workaround\n\nSet `HF_TOKEN` as a Space secret, then do this:\n\n\n    import os\n    import json\n    import tempfile\n    from huggingface_hub import batch_bucket_files\n\n    BUCKET_ID = \"your-namespace/your-bucket\"\n\n    payload = {\n        \"status\": \"ok\",\n        \"message\": \"written from Space\"\n    }\n\n    with tempfile.NamedTemporaryFile(\"w\", suffix=\".json\", delete=False) as f:\n        json.dump(payload, f)\n        local_path = f.name\n\n    batch_bucket_files(\n        BUCKET_ID,\n        add=[(local_path, \"logs/result.json\")],\n        token=os.environ[\"HF_TOKEN\"],\n    )\n\n\nThat matches the documented bucket upload API pattern: local file path on the left, destination path inside the bucket on the right. (Hugging Face)\n\n### Direct bytes upload\n\nIf you do not even want a temp file:\n\n\n    import os\n    from huggingface_hub import batch_bucket_files\n\n    batch_bucket_files(\n        \"your-namespace/your-bucket\",\n        add=[(b'{\"status\":\"ok\"}', \"logs/result.json\")],\n        token=os.environ[\"HF_TOKEN\"],\n    )\n\n\nThe docs explicitly show raw-bytes uploads too. (Hugging Face)\n\n## The bottom line\n\nThe most likely outcomes are:\n\n  1. **Best case:** your test was pointed at the wrong directory.\n`mkdir u` was not testing `/data`.\n`mkdir /data/u` may work.\n\n  2. **More likely if your absolute-path test fails:** the bucket is attached, but the mounted directory ownership inside the Space runtime does not match the uid that your Gradio app runs under.\nIn that case, you probably did **not** misconfigure the bucket. The observed permissions simply do not match the documented promise that bucket mounts are writable. (Hugging Face)\n\n  3. **Fastest reliable path forward:** use `batch_bucket_files()` with `HF_TOKEN` from the Space instead of relying on `/data` until the mount behavior is fixed. (Hugging Face)\n\n\n",
  "title": "How should I write to bucket from a space"
}