{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihuo6dqssvvhtkrj2y4aioec2jbovcx75exzhgv2lmes42tkomkl4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkni3jmfkhs2"
  },
  "path": "/t/module-torchaudio-has-no-attribute-audiometadata/175647#post_4",
  "publishedAt": "2026-04-29T15:33:52.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "pyannote.audio 3.3.0 docs on PyPI",
    "pyannote/speaker-diarization-3.1 model card",
    "pyannote/speaker-diarization-community-1 model card",
    "pyannote.audio releases",
    "TorchAudio 2.8 list_audio_backends deprecation docs",
    "TorchAudio 2.8 deprecation overview",
    "TorchCodec compatibility table",
    "uv script locking docs",
    "pyannote.audio 3.3.0 docs",
    "pyannote/speaker-diarization-3.1",
    "pyannote/speaker-diarization-community-1",
    "pyannote.audio releases: Community-1 and exclusive diarization",
    "PyTorch Audio issue: TorchAudio future / TorchCodec transition",
    "TorchAudio installation compatibility notes",
    "uv scripts guide",
    "uv locking script dependencies",
    "Community-1 launch post",
    "TorchCodec README",
    "pyannote/segmentation-3.0",
    "Hugging Face access tokens docs"
  ],
  "textContent": "Maybe that new issue is likely a compatibility problem on the `pyannote` side.\nI don’t have much personal experience with `pyannote` myself, but I have used it while investigating migration issues. It’s a library very version sensitive where the usage itself tends to change significantly with each version update.\n\nThis isn’t limited to `pyannote`, but **when updating libraries that are close to the backend, it’s best to proceed on the assumption that you’ll need to rewrite whole the model configurations and related execution code** slightly within your scripts:\n\n* * *\n\n# New errors after pinning pyannote/TorchAudio: causes and fixes\n\n## Short version\n\nYou made progress.\n\nThe original problem was:\n\n\n    AttributeError: module 'torchaudio' has no attribute 'AudioMetaData'\n\n\nThat was the **TorchAudio 2.9+ compatibility problem**. Pinning back to the Torch 2.8 / TorchAudio 2.8 generation gets you past that layer.\n\nNow you have a different problem:\n\n\n    TypeError: Pipeline.from_pretrained() got an unexpected keyword argument 'token'\n\n\nThis is not the same error. This one is a **pyannote API mismatch**.\n\nYour dependency resolver says:\n\n\n    brouhaha==0.9.0 depends on pyannote-audio==3.3.0\n\n\nSo your environment is now effectively pinned to:\n\n\n    pyannote.audio==3.3.0\n\n\nBut your code is calling pyannote like this:\n\n\n    pipeline = Pipeline.from_pretrained(MODEL_ID, token=tokens[\"diarization\"])\n\n\nand it is loading:\n\n\n    pyannote/speaker-diarization-community-1\n\n\nThat is the newer pyannote 4.x / Community-1 style. 
It does not match the `pyannote.audio==3.3.0` API that `brouhaha` forces.\n\nThe immediate fix is:\n\n\n    MODEL_ID = \"pyannote/speaker-diarization-3.1\"\n\n    pipeline = Pipeline.from_pretrained(\n        MODEL_ID,\n        use_auth_token=tokens[\"diarization\"],\n    )\n\n\nDo **not** use `token=` with `pyannote.audio==3.3.0`.\n\nDo **not** use `speaker-diarization-community-1` while you are on the `brouhaha` / pyannote 3.3 recovery path.\n\nUseful references:\n\n  * pyannote.audio 3.3.0 docs on PyPI\n  * pyannote/speaker-diarization-3.1 model card\n  * pyannote/speaker-diarization-community-1 model card\n  * pyannote.audio releases\n  * TorchAudio 2.8 list_audio_backends deprecation docs\n  * TorchAudio 2.8 deprecation overview\n  * TorchCodec compatibility table\n  * uv script locking docs\n\n\n\n* * *\n\n# What caused the first new error?\n\nYou got this resolver error:\n\n\n    × No solution found when resolving script dependencies:\n    ╰─▶ Because only brouhaha==0.9.0 is available and brouhaha==0.9.0 depends on pyannote-audio==3.3.0,\n        we can conclude that all versions of brouhaha depend on pyannote-audio==3.3.0.\n        And because you require pyannote-audio==3.4.0 and brouhaha, we can conclude that your\n        requirements are unsatisfiable.\n\n\nThis means uv is doing the correct thing.\n\nYou asked for:\n\n\n    pyannote-audio==3.4.0\n\n\nbut your local `brouhaha` package requires:\n\n\n    pyannote-audio==3.3.0\n\n\nThose two cannot both be true.\n\nSo changing:\n\n\n    pyannote-audio==3.4.0\n\n\nto:\n\n\n    pyannote-audio==3.3.0\n\n\nwas a reasonable fix.\n\nBut that change has an important consequence:\n\n\n    You are now on the pyannote 3.3 API.\n\n\nThat means the rest of the code must also use the pyannote 3.3 call style.\n\n* * *\n\n# What caused the second new error?\n\nYou then got:\n\n\n    Loading diarization pipeline pyannote/speaker-diarization-community-1...\n    Traceback (most recent call last):\n      File 
\"/home/user/diarization/repos/scripts/diaritranscribe3.py\", line 621, in <module>\n        main()\n      File \"/home/user/diarization/repos/scripts/diaritranscribe3.py\", line 589, in main\n        diarization = diarize_audio(\n                      ^^^^^^^^^^^^^^\n      File \"/home/user/diarization/repos/scripts/diaritranscribe3.py\", line 208, in diarize_audio\n        pipeline = Pipeline.from_pretrained(MODEL_ID, token=tokens[\"diarization\"])\n                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n    TypeError: Pipeline.from_pretrained() got an unexpected keyword argument 'token'\n\n\nThe key line is:\n\n\n    Pipeline.from_pretrained(MODEL_ID, token=tokens[\"diarization\"])\n\n\nThe `token=` keyword is the newer call style. It appears in current Community-1 examples.\n\nBut `pyannote.audio==3.3.0` expects the older keyword:\n\n\n    use_auth_token=\n\n\nSo this:\n\n\n    pipeline = Pipeline.from_pretrained(\n        MODEL_ID,\n        token=tokens[\"diarization\"],\n    )\n\n\nshould become this:\n\n\n    pipeline = Pipeline.from_pretrained(\n        MODEL_ID,\n        use_auth_token=tokens[\"diarization\"],\n    )\n\n\nThat is the direct fix for the `unexpected keyword argument 'token'` error.\n\n* * *\n\n# The model ID is probably wrong for this recovery path too\n\nYour log says:\n\n\n    Loading diarization pipeline pyannote/speaker-diarization-community-1...\n\n\nThat is another mismatch.\n\nFor `pyannote.audio==3.3.0`, use:\n\n\n    MODEL_ID = \"pyannote/speaker-diarization-3.1\"\n\n\nnot:\n\n\n    MODEL_ID = \"pyannote/speaker-diarization-community-1\"\n\n\nThe `speaker-diarization-community-1` pipeline belongs to the newer pyannote 4.x era. It is documented with `token=...`, `output.speaker_diarization`, and `output.exclusive_speaker_diarization`.\n\nThe pyannote 3.3 path is different. 
It uses `speaker-diarization-3.1`, `use_auth_token=...`, and the returned object is usually iterated with:\n\n\n    for turn, _, speaker in diarization.itertracks(yield_label=True):\n        ...\n\n\nReferences:\n\n  * pyannote.audio 3.3.0 docs\n  * pyannote/speaker-diarization-3.1\n  * pyannote/speaker-diarization-community-1\n  * pyannote.audio releases: Community-1 and exclusive diarization\n\n\n\n* * *\n\n# The TorchAudio warning is expected\n\nThis warning:\n\n\n    /home/rodrigo/.cache/uv/environments-v2/diaritranscribe3-3f9949c47f20e532/lib/python3.12/site-packages/pyannote/audio/core/io.py:212: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.\n      torchaudio.list_audio_backends()\n\n\nis not the current crash.\n\nIt means:\n\n\n    pyannote.audio 3.3.0 is calling an old TorchAudio API.\n    TorchAudio 2.8 still has that API, but warns that it will disappear in 2.9.\n\n\nThat warning is exactly why you should **not** upgrade TorchAudio to 2.9 in this recovery path.\n\nKeep:\n\n\n    torch==2.8.0\n    torchaudio==2.8.0\n\n\nTorchAudio 2.8 warns. TorchAudio 2.9 removes. 
For old pyannote code, a warning is better than a missing attribute crash.\n\nRelevant references:\n\n  * TorchAudio 2.8 list_audio_backends deprecation docs\n  * TorchAudio 2.8 deprecation overview\n  * PyTorch Audio issue: TorchAudio future / TorchCodec transition\n\n\n\n* * *\n\n# Recommended current fix\n\n## Use the pyannote 3.3-compatible dependency set\n\nGiven your `brouhaha` constraint, use this dependency block:\n\n\n    #!/usr/bin/env -S uv run --script\n    # /// script\n    # requires-python = \">=3.10,<3.14\"\n    # dependencies = [\n    #   \"faster-whisper\",\n    #   \"numpy\",\n    #   \"pyannote.audio==3.3.0\",\n    #   \"scikit-learn\",\n    #   \"torch==2.8.0\",\n    #   \"torchaudio==2.8.0\",\n    #   \"torchcodec==0.7.*\",\n    #   \"omegaconf\",\n    #   \"brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad\",\n    # ]\n    # ///\n\n\nWhy:\n\nPackage | Reason\n---|---\n`pyannote.audio==3.3.0` | Required by your local `brouhaha==0.9.0` package.\n`torch==2.8.0` | Coherent with TorchAudio 2.8 and TorchCodec 0.7.\n`torchaudio==2.8.0` | Keeps deprecated APIs available instead of removed.\n`torchcodec==0.7.*` | TorchCodec’s compatibility table maps `0.7` to Torch `2.8`.\n`faster-whisper` | Keep it for transcription, but debug it separately from pyannote.\nNo manual `nvidia-*` packages | Avoid mixing CUDA generations while fixing pyannote import and model loading.\n\nUseful references:\n\n  * TorchAudio installation compatibility notes\n  * TorchCodec compatibility table\n  * uv scripts guide\n\n\n\n* * *\n\n# Recommended code patch\n\nFind your current code around line 208:\n\n\n    pipeline = Pipeline.from_pretrained(MODEL_ID, token=tokens[\"diarization\"])\n\n\nChange it to:\n\n\n    pipeline = Pipeline.from_pretrained(\n        MODEL_ID,\n        use_auth_token=tokens[\"diarization\"],\n    )\n\n\nAlso change the model ID.\n\nIf you currently have:\n\n\n    MODEL_ID = \"pyannote/speaker-diarization-community-1\"\n\n\nchange it 
to:\n\n\n    MODEL_ID = \"pyannote/speaker-diarization-3.1\"\n\n\nA compact pyannote 3.3-compatible function would look like:\n\n\n    from pyannote.audio import Pipeline\n    import torch\n\n    MODEL_ID = \"pyannote/speaker-diarization-3.1\"\n\n    def diarize_audio(audio_path, tokens):\n        print(f\"Loading diarization pipeline {MODEL_ID}...\")\n\n        pipeline = Pipeline.from_pretrained(\n            MODEL_ID,\n            use_auth_token=tokens[\"diarization\"],\n        )\n\n        if torch.cuda.is_available():\n            pipeline.to(torch.device(\"cuda\"))\n\n        diarization = pipeline(audio_path)\n\n        return diarization\n\n\nThen, when reading the result:\n\n\n    for turn, _, speaker in diarization.itertracks(yield_label=True):\n        print(f\"{turn.start:.2f} {turn.end:.2f} {speaker}\")\n\n\nThis matches the pyannote 3.x style.\n\n* * *\n\n# Why it still happens after “reverting” the script\n\nThere are a few likely reasons.\n\n## 1. You changed the environment, not just the file\n\nEven if you revert part of `diaritranscribe3.py`, your dependency environment still contains:\n\n\n    pyannote.audio==3.3.0\n\n\nbecause `brouhaha` requires it.\n\nSo `token=` will keep failing until the code matches pyannote 3.3.\n\nCheck the actual runtime version:\n\n\n    from importlib.metadata import version\n\n    print(\"pyannote.audio:\", version(\"pyannote.audio\"))\n\n\nExpected now:\n\n\n    pyannote.audio: 3.3.0\n\n\nIf that is the version, use:\n\n\n    use_auth_token=\n\n\nnot:\n\n\n    token=\n\n\n* * *\n\n## 2. Your `MODEL_ID` may still point to Community-1\n\nSearch your script:\n\n\n    grep -n \"speaker-diarization\" diaritranscribe3.py\n\n\nFor the recovery path, it should show:\n\n\n    pyannote/speaker-diarization-3.1\n\n\nnot:\n\n\n    pyannote/speaker-diarization-community-1\n\n\n* * *\n\n## 3. 
Your script may still contain `token=`\n\nSearch:\n\n\n    grep -n \"token=\" diaritranscribe3.py\n\n\nFor the pyannote call, change:\n\n\n    token=tokens[\"diarization\"]\n\n\nto:\n\n\n    use_auth_token=tokens[\"diarization\"]\n\n\nDo not necessarily change every `token=` in the whole script. Other libraries may still use a `token` keyword. The specific problem is the pyannote 3.3 call to `Pipeline.from_pretrained`.\n\n* * *\n\n## 4. uv may be reusing a cached script environment\n\nUse refresh while testing:\n\n\n    uv run --refresh --script diaritranscribe3.py\n\n\nThen inspect the dependency tree:\n\n\n    uv tree --script diaritranscribe3.py\n\n\nYou want to see something close to:\n\n\n    pyannote.audio==3.3.0\n    torch==2.8.0\n    torchaudio==2.8.0\n    torchcodec==0.7.x\n\n\nOnce it works, lock it:\n\n\n    uv lock --script diaritranscribe3.py\n\n\nReference:\n\n  * uv locking script dependencies\n\n\n\n* * *\n\n# Two coherent paths from here\n\n## Path A — recommended now: stay with `brouhaha` and pyannote 3.3\n\nChoose this if your priority is to get the current script working.\n\nUse:\n\n\n    pyannote.audio==3.3.0\n    torch==2.8.0\n    torchaudio==2.8.0\n    torchcodec==0.7.*\n\n\nUse model:\n\n\n    MODEL_ID = \"pyannote/speaker-diarization-3.1\"\n\n\nUse auth keyword:\n\n\n    use_auth_token=tokens[\"diarization\"]\n\n\nUse output iteration:\n\n\n    for turn, _, speaker in diarization.itertracks(yield_label=True):\n        ...\n\n\nThis is the low-risk recovery path because it respects the `brouhaha` dependency pin.\n\n* * *\n\n## Path B — later migration: use Community-1 and pyannote 4.x\n\nChoose this if you want the newer pyannote stack and are willing to deal with migration work.\n\nYou would need to remove or modify the `brouhaha` constraint first. Options:\n\n  1. Remove `brouhaha`.\n  2. Replace `brouhaha` with another VAD path.\n  3. Fork/edit your local `brouhaha` package so it does not require `pyannote-audio==3.3.0`.\n  4. 
Update `brouhaha`, if a newer compatible version exists in your local project.\n  5. Split the environment so `brouhaha` and modern pyannote are not forced into the same dependency graph.\n\n\n\nThen you can move toward:\n\n\n    pipeline = Pipeline.from_pretrained(\n        \"pyannote/speaker-diarization-community-1\",\n        token=tokens[\"diarization\"],\n    )\n\n\nand newer output handling:\n\n\n    output = pipeline(audio_path)\n\n    for turn, speaker in output.speaker_diarization:\n        print(turn.start, turn.end, speaker)\n\n    # If available and useful for transcript alignment:\n    for turn, speaker in output.exclusive_speaker_diarization:\n        print(turn.start, turn.end, speaker)\n\n\nBut treat this as a real migration. It may involve:\n\n  * TorchCodec;\n  * FFmpeg;\n  * newer pyannote output objects;\n  * new model access requirements;\n  * possibly higher VRAM use;\n  * different diarization output behavior;\n  * changes to transcript/speaker alignment code.\n\n\n\nUseful references:\n\n  * pyannote/speaker-diarization-community-1\n  * Community-1 launch post\n  * pyannote.audio releases\n  * TorchCodec README\n\n\n\n* * *\n\n# Immediate diagnostic checklist\n\nRun these in order.\n\n## 1. Confirm versions\n\nAdd this temporarily near the top of the script:\n\n\n    from importlib.metadata import version\n    import torch\n    import torchaudio\n\n    print(\"pyannote.audio:\", version(\"pyannote.audio\"))\n    print(\"torch:\", torch.__version__)\n    print(\"torchaudio:\", torchaudio.__version__)\n    print(\"torchcodec:\", version(\"torchcodec\"))\n    print(\"AudioMetaData exists:\", hasattr(torchaudio, \"AudioMetaData\"))\n\n\nExpected for the recovery path:\n\n\n    pyannote.audio: 3.3.0\n    torch: 2.8.0...\n    torchaudio: 2.8.0...\n    torchcodec: 0.7...\n    AudioMetaData exists: True\n\n\nIf `torchaudio` is `2.9.x`, you are back in the danger zone.\n\n* * *\n\n## 2. 
Confirm model ID\n\nFor Path A, use:\n\n\n    MODEL_ID = \"pyannote/speaker-diarization-3.1\"\n\n\nnot:\n\n\n    MODEL_ID = \"pyannote/speaker-diarization-community-1\"\n\n\n* * *\n\n## 3. Confirm auth keyword\n\nFor Path A, use:\n\n\n    pipeline = Pipeline.from_pretrained(\n        MODEL_ID,\n        use_auth_token=tokens[\"diarization\"],\n    )\n\n\nnot:\n\n\n    pipeline = Pipeline.from_pretrained(\n        MODEL_ID,\n        token=tokens[\"diarization\"],\n    )\n\n\n* * *\n\n## 4. Confirm access to gated models\n\nFor `speaker-diarization-3.1`, make sure the Hugging Face account behind your token has accepted the relevant model conditions.\n\nCommon symptoms of missing access are different from your current error. They look more like:\n\n\n    401 Unauthorized\n    403 Forbidden\n    Repository not found\n    Could not download pipeline\n\n\nUseful links:\n\n  * pyannote/speaker-diarization-3.1\n  * pyannote/segmentation-3.0\n  * Hugging Face access tokens docs\n\n\n\n* * *\n\n## 5. 
Refresh uv while testing\n\n\n    uv run --refresh --script diaritranscribe3.py\n\n\nThen inspect:\n\n\n    uv tree --script diaritranscribe3.py\n\n\nThen lock after success:\n\n\n    uv lock --script diaritranscribe3.py\n\n\n* * *\n\n# What not to do right now\n\nDo not upgrade TorchAudio to silence the warning.\n\nThis warning:\n\n\n    torchaudio._backend.list_audio_backends has been deprecated\n\n\ndoes not mean:\n\n\n    upgrade torchaudio\n\n\nIn this case it means:\n\n\n    you are using legacy pyannote code that still works on TorchAudio 2.8, but will break on TorchAudio 2.9\n\n\nSo for the recovery path, keep:\n\n\n    torchaudio==2.8.0\n\n\nDo not switch back to:\n\n\n    token=tokens[\"diarization\"]\n\n\nunless you migrate to a pyannote version that supports it.\n\nDo not use:\n\n\n    pyannote/speaker-diarization-community-1\n\n\nunless you deliberately move to the newer pyannote 4.x path.\n\nDo not reintroduce mixed CUDA packages while debugging this pyannote problem. CUDA can be debugged after pyannote loads.\n\n* * *\n\n# Final recommended state for your current script\n\nUse this dependency block:\n\n\n    #!/usr/bin/env -S uv run --script\n    # /// script\n    # requires-python = \">=3.10,<3.14\"\n    # dependencies = [\n    #   \"faster-whisper\",\n    #   \"numpy\",\n    #   \"pyannote.audio==3.3.0\",\n    #   \"scikit-learn\",\n    #   \"torch==2.8.0\",\n    #   \"torchaudio==2.8.0\",\n    #   \"torchcodec==0.7.*\",\n    #   \"omegaconf\",\n    #   \"brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad\",\n    # ]\n    # ///\n\n\nUse this model ID:\n\n\n    MODEL_ID = \"pyannote/speaker-diarization-3.1\"\n\n\nUse this load call:\n\n\n    pipeline = Pipeline.from_pretrained(\n        MODEL_ID,\n        use_auth_token=tokens[\"diarization\"],\n    )\n\n\nUse this output style:\n\n\n    for turn, _, speaker in diarization.itertracks(yield_label=True):\n        print(f\"{turn.start:.2f} {turn.end:.2f} {speaker}\")\n\n\nThen run:\n\n\n 
   uv run --refresh --script diaritranscribe3.py\n\n\nInspect:\n\n\n    uv tree --script diaritranscribe3.py\n\n\nLock:\n\n\n    uv lock --script diaritranscribe3.py\n\n\n* * *\n\n# Bottom line\n\nYou now have two separate compatibility boundaries:\n\n  1. **TorchAudio boundary**\nOld pyannote code needs TorchAudio 2.8, not 2.9+.\nThe deprecation warning is expected. Keep `torchaudio==2.8.0`.\n\n  2. **pyannote API boundary**\n`brouhaha` forces `pyannote.audio==3.3.0`.\nThat means use `speaker-diarization-3.1` and `use_auth_token=...`, not `speaker-diarization-community-1` and `token=...`.\n\n\n\n\nSo the practical fix is:\n\n\n    pyannote.audio==3.3.0\n    torch==2.8.0\n    torchaudio==2.8.0\n    torchcodec==0.7.*\n    MODEL_ID = \"pyannote/speaker-diarization-3.1\"\n    Pipeline.from_pretrained(..., use_auth_token=...)\n\n\nTreat `community-1` + `token=...` as a later migration path after you remove, update, or fork the `brouhaha` dependency that pins pyannote to 3.3.0.",
  "title": "Module 'torchaudio' has no attribute 'AudioMetaData'"
}