Module 'torchaudio' has no attribute 'AudioMetaData'
This new issue is most likely a compatibility problem on the pyannote side.
I don’t have much personal experience with pyannote myself, but I have used it while investigating migration issues. It’s a very version-sensitive library, and its usage tends to change significantly with each release.
This isn’t limited to pyannote. When updating libraries that sit close to the backend, it’s best to assume you’ll need to adjust the model configuration and the related execution code in your scripts, at least slightly:
New errors after pinning pyannote/TorchAudio: causes and fixes
Short version
You made progress.
The original problem was:
AttributeError: module 'torchaudio' has no attribute 'AudioMetaData'
That was the TorchAudio 2.9+ compatibility problem. Pinning back to the Torch 2.8 / TorchAudio 2.8 generation gets you past that layer.
Now you have a different problem:
TypeError: Pipeline.from_pretrained() got an unexpected keyword argument 'token'
This is not the same error. This one is a pyannote API mismatch.
Your dependency resolver says:
brouhaha==0.9.0 depends on pyannote-audio==3.3.0
So your environment is now effectively pinned to:
pyannote.audio==3.3.0
But your code is calling pyannote like this:
pipeline = Pipeline.from_pretrained(MODEL_ID, token=tokens["diarization"])
and it is loading:
pyannote/speaker-diarization-community-1
That is the newer pyannote 4.x / Community-1 style. It does not match the pyannote.audio==3.3.0 API that brouhaha forces.
The immediate fix is:
MODEL_ID = "pyannote/speaker-diarization-3.1"
pipeline = Pipeline.from_pretrained(
    MODEL_ID,
    use_auth_token=tokens["diarization"],
)
Do not use token= with pyannote.audio==3.3.0.
Do not use speaker-diarization-community-1 while you are on the brouhaha / pyannote 3.3 recovery path.
Useful references:
- pyannote.audio 3.3.0 docs on PyPI
- pyannote/speaker-diarization-3.1 model card
- pyannote/speaker-diarization-community-1 model card
- pyannote.audio releases
- TorchAudio 2.8 list_audio_backends deprecation docs
- TorchAudio 2.8 deprecation overview
- TorchCodec compatibility table
- uv script locking docs
What caused the first new error?
You got this resolver error:
× No solution found when resolving script dependencies:
╰─▶ Because only brouhaha==0.9.0 is available and brouhaha==0.9.0 depends on pyannote-audio==3.3.0,
we can conclude that all versions of brouhaha depend on pyannote-audio==3.3.0.
And because you require pyannote-audio==3.4.0 and brouhaha, we can conclude that your
requirements are unsatisfiable.
This means uv is doing the correct thing.
You asked for:
pyannote-audio==3.4.0
but your local brouhaha package requires:
pyannote-audio==3.3.0
Those two cannot both be true.
So changing:
pyannote-audio==3.4.0
to:
pyannote-audio==3.3.0
was a reasonable fix.
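The conflict comes down to one distribution pinned to two versions. Note that pyannote-audio and pyannote.audio are the same distribution after PEP 503 name normalization, which lowercases the name and collapses runs of -, _ and . into -. The sketch below is a toy check for conflicting exact pins. It is not how uv's resolver works, and `find_conflicting_pins` is a hypothetical helper that only understands plain `name==version` strings.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """PEP 503 name normalization: 'pyannote.audio' -> 'pyannote-audio'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_conflicting_pins(requirements: list[str]) -> dict[str, set[str]]:
    """Return distributions pinned (==) to more than one version.

    Only handles plain 'name==version' strings; anything else is ignored.
    """
    pins: dict[str, set[str]] = defaultdict(set)
    for req in requirements:
        if "==" not in req:
            continue
        name, _, version = req.partition("==")
        pins[normalize(name.strip())].add(version.strip())
    return {name: versions for name, versions in pins.items() if len(versions) > 1}


# Your original script pin plus the pin that brouhaha==0.9.0 carries:
print(find_conflicting_pins(["pyannote-audio==3.4.0", "pyannote.audio==3.3.0"]))
```

Changing the script pin to 3.3.0 leaves a single version for that name, so the conflict disappears. That is the same reason the uv change above works.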
But that change has an important consequence:
You are now on the pyannote 3.3 API.
That means the rest of the code must also use the pyannote 3.3 call style.
What caused the second new error?
You then got:
Loading diarization pipeline pyannote/speaker-diarization-community-1...
Traceback (most recent call last):
File "/home/user/diarization/repos/scripts/diaritranscribe3.py", line 621, in <module>
main()
File "/home/user/diarization/repos/scripts/diaritranscribe3.py", line 589, in main
diarization = diarize_audio(
^^^^^^^^^^^^^^
File "/home/user/diarization/repos/scripts/diaritranscribe3.py", line 208, in diarize_audio
pipeline = Pipeline.from_pretrained(MODEL_ID, token=tokens["diarization"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Pipeline.from_pretrained() got an unexpected keyword argument 'token'
The key line is:
Pipeline.from_pretrained(MODEL_ID, token=tokens["diarization"])
The token= keyword is the newer call style. It appears in current Community-1 examples.
But pyannote.audio==3.3.0 expects the older keyword:
use_auth_token=
So this:
pipeline = Pipeline.from_pretrained(
    MODEL_ID,
    token=tokens["diarization"],
)
should become this:
pipeline = Pipeline.from_pretrained(
    MODEL_ID,
    use_auth_token=tokens["diarization"],
)
That is the direct fix for the unexpected keyword argument 'token' error.
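If you expect to move between the 3.x and 4.x lines later, one option is to choose the keyword from the installed major version instead of hard-coding it. This is a minimal sketch that assumes what this thread describes: 3.x releases take use_auth_token= and 4.x releases take token=. `auth_kwargs` is a hypothetical helper, not part of pyannote.

```python
def auth_kwargs(pyannote_version: str, hf_token: str) -> dict[str, str]:
    """Pick the Hugging Face token keyword for Pipeline.from_pretrained.

    Assumption: pyannote.audio 3.x expects use_auth_token=, 4.x expects token=.
    """
    major = int(pyannote_version.split(".")[0])
    if major >= 4:
        return {"token": hf_token}
    return {"use_auth_token": hf_token}


# Usage inside your script (needs pyannote.audio installed; MODEL_ID and
# tokens are the names your script already uses):
#   from importlib.metadata import version
#   from pyannote.audio import Pipeline
#   pipeline = Pipeline.from_pretrained(
#       MODEL_ID, **auth_kwargs(version("pyannote.audio"), tokens["diarization"])
#   )
print(auth_kwargs("3.3.0", "hf_xxx"))  # {'use_auth_token': 'hf_xxx'}
```

This only fixes the keyword. The model ID and the output handling still have to match the installed pyannote version.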
The model ID is probably wrong for this recovery path too
Your log says:
Loading diarization pipeline pyannote/speaker-diarization-community-1...
That is another mismatch.
For pyannote.audio==3.3.0, use:
MODEL_ID = "pyannote/speaker-diarization-3.1"
not:
MODEL_ID = "pyannote/speaker-diarization-community-1"
The speaker-diarization-community-1 pipeline belongs to the newer pyannote 4.x era. It is documented with token=..., output.speaker_diarization, and output.exclusive_speaker_diarization.
The pyannote 3.3 path is different. It uses speaker-diarization-3.1, use_auth_token=..., and the returned object is usually iterated with:
for turn, _, speaker in diarization.itertracks(yield_label=True):
    ...
References:
- pyannote.audio 3.3.0 docs
- pyannote/speaker-diarization-3.1
- pyannote/speaker-diarization-community-1
- pyannote.audio releases: Community-1 and exclusive diarization
The TorchAudio warning is expected
This warning:
/home/rodrigo/.cache/uv/environments-v2/diaritranscribe3-3f9949c47f20e532/lib/python3.12/site-packages/pyannote/audio/core/io.py:212: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
torchaudio.list_audio_backends()
is not the current crash.
It means:
pyannote.audio 3.3.0 is calling an old TorchAudio API.
TorchAudio 2.8 still has that API, but warns that it will disappear in 2.9.
That warning is exactly why you should not upgrade TorchAudio to 2.9 in this recovery path.
Keep:
torch==2.8.0
torchaudio==2.8.0
TorchAudio 2.8 warns. TorchAudio 2.9 removes. For old pyannote code, a warning is better than a missing attribute crash.
Relevant references:
- TorchAudio 2.8 list_audio_backends deprecation docs
- TorchAudio 2.8 deprecation overview
- PyTorch Audio issue: TorchAudio future / TorchCodec transition
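If you want the script to fail fast with a clear message instead of crashing deep inside pyannote, you could add a version guard near the top. The sketch below compares versions numerically. Plain string comparison gets this wrong, because "2.10.0" < "2.9.0" is true for strings. The helper names and the 2.9 cutoff are assumptions based on the boundary described above.

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '2.8.0+cu128' or '2.9.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torchaudio_ok_for_pyannote_3(version: str) -> bool:
    """True if TorchAudio is below 2.9, where AudioMetaData is no longer available."""
    return major_minor(version) < (2, 9)


# Strings compare character by character, so this is misleading:
print("2.10.0" < "2.9.0")                           # True
print(torchaudio_ok_for_pyannote_3("2.10.0"))       # False
print(torchaudio_ok_for_pyannote_3("2.8.0+cu128"))  # True

# In the real script (needs torchaudio installed):
#   import torchaudio
#   if not torchaudio_ok_for_pyannote_3(torchaudio.__version__):
#       raise SystemExit(f"torchaudio {torchaudio.__version__} is too new for pyannote.audio 3.3.0")
```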
Recommended current fix
Use the pyannote 3.3-compatible dependency set
Given your brouhaha constraint, use this dependency block:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
# "faster-whisper",
# "numpy",
# "pyannote.audio==3.3.0",
# "scikit-learn",
# "torch==2.8.0",
# "torchaudio==2.8.0",
# "torchcodec==0.7.*",
# "omegaconf",
# "brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",
# ]
# ///
Why:
| Package | Reason |
|---|---|
| pyannote.audio==3.3.0 | Required by your local brouhaha==0.9.0 package. |
| torch==2.8.0 | Coherent with TorchAudio 2.8 and TorchCodec 0.7. |
| torchaudio==2.8.0 | Keeps deprecated APIs available instead of removed. |
| torchcodec==0.7.* | TorchCodec’s compatibility table maps 0.7 to Torch 2.8. |
| faster-whisper | Keep it for transcription, but debug it separately from pyannote. |
| No manual nvidia-* packages | Avoid mixing CUDA generations while fixing pyannote import and model loading. |
Useful references:
- TorchAudio installation compatibility notes
- TorchCodec compatibility table
- uv scripts guide
Recommended code patch
Find your current code around line 208:
pipeline = Pipeline.from_pretrained(MODEL_ID, token=tokens["diarization"])
Change it to:
pipeline = Pipeline.from_pretrained(
    MODEL_ID,
    use_auth_token=tokens["diarization"],
)
Also change the model ID.
If you currently have:
MODEL_ID = "pyannote/speaker-diarization-community-1"
change it to:
MODEL_ID = "pyannote/speaker-diarization-3.1"
A compact pyannote 3.3-compatible function would look like:
from pyannote.audio import Pipeline
import torch
MODEL_ID = "pyannote/speaker-diarization-3.1"
def diarize_audio(audio_path, tokens):
    print(f"Loading diarization pipeline {MODEL_ID}...")
    pipeline = Pipeline.from_pretrained(
        MODEL_ID,
        use_auth_token=tokens["diarization"],
    )
    if torch.cuda.is_available():
        pipeline.to(torch.device("cuda"))
    diarization = pipeline(audio_path)
    return diarization
Then, when reading the result:
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.2f} {turn.end:.2f} {speaker}")
This matches the pyannote 3.x style.
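For transcript alignment, it is often easier to convert the 3.x result into plain tuples once, rather than calling itertracks in several places. This is a minimal sketch. `to_segments` is a hypothetical helper, and `_FakeAnnotation` is only a stand-in so the example runs without pyannote or a model download. In the real script you would pass the object returned by `pipeline(audio_path)`.

```python
from dataclasses import dataclass


def to_segments(diarization) -> list[tuple[float, float, str]]:
    """Flatten a pyannote 3.x diarization result into (start, end, speaker) tuples."""
    return [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]


# --- Stand-ins for demonstration only (not pyannote classes) ---
@dataclass
class _Turn:
    start: float
    end: float


class _FakeAnnotation:
    """Mimics the (segment, track, label) triples from itertracks(yield_label=True)."""

    def __init__(self, rows):
        self._rows = rows

    def itertracks(self, yield_label=False):
        for start, end, track, label in self._rows:
            yield (_Turn(start, end), track, label) if yield_label else (_Turn(start, end), track)


demo = _FakeAnnotation([(0.0, 1.5, "A", "SPEAKER_00"), (1.5, 3.2, "B", "SPEAKER_01")])
for start, end, speaker in to_segments(demo):
    print(f"{start:.2f} {end:.2f} {speaker}")
# 0.00 1.50 SPEAKER_00
# 1.50 3.20 SPEAKER_01
```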
Why it still happens after “reverting” the script
There are a few likely reasons.
1. You changed the environment, not just the file
Even if you revert part of diaritranscribe3.py, your dependency environment still contains:
pyannote.audio==3.3.0
because brouhaha requires it.
So token= will keep failing until the code matches pyannote 3.3.
Check the actual runtime version:
from importlib.metadata import version
print("pyannote.audio:", version("pyannote.audio"))
Expected now:
pyannote.audio: 3.3.0
If that is the version, use:
use_auth_token=
not:
token=
2. Your MODEL_ID may still point to Community-1
Search your script:
grep -n "speaker-diarization" diaritranscribe3.py
For the recovery path, it should show:
pyannote/speaker-diarization-3.1
not:
pyannote/speaker-diarization-community-1
3. Your script may still contain token=
Search:
grep -n "token=" diaritranscribe3.py
For the pyannote call, change:
token=tokens["diarization"]
to:
use_auth_token=tokens["diarization"]
You do not necessarily need to change every token= in the script. Other libraries may legitimately use a token keyword. The specific problem is the pyannote 3.3 call to Pipeline.from_pretrained.
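If grep returns many unrelated token= hits, a more targeted check is to parse the script with Python's ast module. The sketch below reports only `Pipeline.from_pretrained(...)` calls that pass token=, plus any string literal naming the Community-1 pipeline. It assumes the call is spelled literally as `Pipeline.from_pretrained`, so it will miss aliases. `find_pyannote_3_issues` is a hypothetical helper.

```python
import ast

COMMUNITY_1 = "pyannote/speaker-diarization-community-1"


def find_pyannote_3_issues(source: str) -> list[tuple[int, str]]:
    """Report (line, message) pairs that would break under pyannote.audio 3.3."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # Pipeline.from_pretrained(..., token=...) needs use_auth_token= on 3.3.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "from_pretrained"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "Pipeline"
        ):
            for kw in node.keywords:
                if kw.arg == "token":
                    issues.append((node.lineno, "Pipeline.from_pretrained uses token="))
        # A hard-coded Community-1 model ID should be speaker-diarization-3.1 on 3.3.
        if isinstance(node, ast.Constant) and node.value == COMMUNITY_1:
            issues.append((node.lineno, "Community-1 model ID"))
    return sorted(issues)


EXAMPLE = '''
MODEL_ID = "pyannote/speaker-diarization-community-1"
pipeline = Pipeline.from_pretrained(MODEL_ID, token=tokens["diarization"])
model = AutoModel.from_pretrained("some/model", token=hf_token)  # not pyannote: ignored
'''

for line, message in find_pyannote_3_issues(EXAMPLE):
    print(line, message)
# 2 Community-1 model ID
# 3 Pipeline.from_pretrained uses token=
```

Because ast.parse never executes the code, you can point this at `Path("diaritranscribe3.py").read_text()` without loading any models.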
4. uv may be reusing a cached script environment
Use refresh while testing:
uv run --refresh --script diaritranscribe3.py
Then inspect the dependency tree:
uv tree --script diaritranscribe3.py
You want to see something close to:
pyannote.audio==3.3.0
torch==2.8.0
torchaudio==2.8.0
torchcodec==0.7.x
Once it works, lock it:
uv lock --script diaritranscribe3.py
Reference:
- uv locking script dependencies
Two coherent paths from here
Path A — recommended now: stay with brouhaha and pyannote 3.3
Choose this if your priority is to get the current script working.
Use:
pyannote.audio==3.3.0
torch==2.8.0
torchaudio==2.8.0
torchcodec==0.7.*
Use model:
MODEL_ID = "pyannote/speaker-diarization-3.1"
Use auth keyword:
use_auth_token=tokens["diarization"]
Use output iteration:
for turn, _, speaker in diarization.itertracks(yield_label=True):
    ...
This is the low-risk recovery path because it respects the brouhaha dependency pin.
Path B — later migration: use Community-1 and pyannote 4.x
Choose this if you want the newer pyannote stack and are willing to deal with migration work.
You would need to remove or modify the brouhaha constraint first. Options:
- Remove brouhaha.
- Replace brouhaha with another VAD path.
- Fork/edit your local brouhaha package so it does not require pyannote-audio==3.3.0.
- Update brouhaha, if a newer compatible version exists in your local project.
- Split the environment so brouhaha and modern pyannote are not forced into the same dependency graph.
Then you can move toward:
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1",
    token=tokens["diarization"],
)
and newer output handling:
output = pipeline(audio_path)
for turn, speaker in output.speaker_diarization:
    print(turn.start, turn.end, speaker)
# If available and useful for transcript alignment:
for turn, speaker in output.exclusive_speaker_diarization:
    print(turn.start, turn.end, speaker)
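If you want transcript-alignment code that survives the migration, one option is to unwrap the result before iterating. This sketch assumes, based on the Community-1 examples referenced here, that a 4.x pipeline returns an object with a `speaker_diarization` attribute holding an annotation, while 3.x returns the annotation directly. It also assumes both annotation objects support itertracks(yield_label=True). `iter_turns` and the stub class are hypothetical.

```python
from types import SimpleNamespace


def iter_turns(result):
    """Yield (start, end, speaker) from either a 3.x or a 4.x pipeline result.

    Assumption: 4.x results expose .speaker_diarization (an annotation);
    3.x results are the annotation itself. Both support itertracks().
    """
    annotation = getattr(result, "speaker_diarization", result)
    for turn, _, speaker in annotation.itertracks(yield_label=True):
        yield turn.start, turn.end, speaker


# --- Stand-in for demonstration only; implements just yield_label=True ---
class _StubAnnotation:
    def __init__(self, rows):
        self._rows = rows

    def itertracks(self, yield_label=True):
        for start, end, label in self._rows:
            yield SimpleNamespace(start=start, end=end), "A", label


rows = [(0.0, 2.0, "SPEAKER_00")]
old_style = _StubAnnotation(rows)                                       # 3.x: annotation returned directly
new_style = SimpleNamespace(speaker_diarization=_StubAnnotation(rows))  # 4.x: wrapped output

print(list(iter_turns(old_style)))  # [(0.0, 2.0, 'SPEAKER_00')]
print(list(iter_turns(new_style)))  # [(0.0, 2.0, 'SPEAKER_00')]
```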
But treat this as a real migration. It may involve:
- TorchCodec;
- FFmpeg;
- newer pyannote output objects;
- new model access requirements;
- possibly higher VRAM use;
- different diarization output behavior;
- changes to transcript/speaker alignment code.
Useful references:
- pyannote/speaker-diarization-community-1
- Community-1 launch post
- pyannote.audio releases
- TorchCodec README
Immediate diagnostic checklist
Run these in order.
1. Confirm versions
Add this temporarily near the top of the script:
from importlib.metadata import version
import torch
import torchaudio
print("pyannote.audio:", version("pyannote.audio"))
print("torch:", torch.__version__)
print("torchaudio:", torchaudio.__version__)
print("torchcodec:", version("torchcodec"))
print("AudioMetaData exists:", hasattr(torchaudio, "AudioMetaData"))
Expected for the recovery path:
pyannote.audio: 3.3.0
torch: 2.8.0...
torchaudio: 2.8.0...
torchcodec: 0.7...
AudioMetaData exists: True
If torchaudio is 2.9.x, you are back in the danger zone.
2. Confirm model ID
For Path A, use:
MODEL_ID = "pyannote/speaker-diarization-3.1"
not:
MODEL_ID = "pyannote/speaker-diarization-community-1"
3. Confirm auth keyword
For Path A, use:
pipeline = Pipeline.from_pretrained(
    MODEL_ID,
    use_auth_token=tokens["diarization"],
)
not:
pipeline = Pipeline.from_pretrained(
    MODEL_ID,
    token=tokens["diarization"],
)
4. Confirm access to gated models
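For speaker-diarization-3.1, make sure the Hugging Face account behind your token has accepted the user conditions on both pyannote/speaker-diarization-3.1 and pyannote/segmentation-3.0, since the pipeline depends on the segmentation model.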
Common symptoms of missing access are different from your current error. They look more like:
401 Unauthorized
403 Forbidden
Repository not found
Could not download pipeline
Useful links:
- pyannote/speaker-diarization-3.1
- pyannote/segmentation-3.0
- Hugging Face access tokens docs
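Each boundary in this thread produces a recognizable error message, so a rough triage helper can point to the likely fix. This is a heuristic sketch. The substrings come from the errors quoted in this thread and the access symptoms listed above, `triage` is a hypothetical helper, and real messages may be worded differently.

```python
ACCESS = "Hugging Face access: check the token and accepted model conditions"

# (substring to look for, suggested fix), checked in order.
RULES = [
    ("no attribute 'AudioMetaData'", "TorchAudio too new for pyannote 3.x: keep torchaudio==2.8.0"),
    ("unexpected keyword argument 'token'", "pyannote 3.x API: use use_auth_token= instead of token="),
    ("401 Unauthorized", ACCESS),
    ("403 Forbidden", ACCESS),
    ("Repository not found", ACCESS),
    ("Could not download pipeline", ACCESS),
]


def triage(message: str) -> str:
    """Map an error message to the most likely fix from this thread."""
    for needle, advice in RULES:
        if needle in message:
            return advice
    return "Unrecognized: confirm versions, model ID and auth keyword first"


print(triage("TypeError: Pipeline.from_pretrained() got an unexpected keyword argument 'token'"))
# pyannote 3.x API: use use_auth_token= instead of token=
```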
5. Refresh uv while testing
uv run --refresh --script diaritranscribe3.py
Then inspect:
uv tree --script diaritranscribe3.py
Then lock after success:
uv lock --script diaritranscribe3.py
What not to do right now
Do not upgrade TorchAudio to silence the warning.
This warning:
torchaudio._backend.list_audio_backends has been deprecated
does not mean:
upgrade torchaudio
In this case it means:
you are using legacy pyannote code that still works on TorchAudio 2.8, but will break on TorchAudio 2.9
So for the recovery path, keep:
torchaudio==2.8.0
Do not switch back to:
token=tokens["diarization"]
unless you migrate to a pyannote version that supports it.
Do not use:
pyannote/speaker-diarization-community-1
unless you deliberately move to the newer pyannote 4.x path.
Do not reintroduce mixed CUDA packages while debugging this pyannote problem. CUDA can be debugged after pyannote loads.
Final recommended state for your current script
Use this dependency block:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
# "faster-whisper",
# "numpy",
# "pyannote.audio==3.3.0",
# "scikit-learn",
# "torch==2.8.0",
# "torchaudio==2.8.0",
# "torchcodec==0.7.*",
# "omegaconf",
# "brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",
# ]
# ///
Use this model ID:
MODEL_ID = "pyannote/speaker-diarization-3.1"
Use this load call:
pipeline = Pipeline.from_pretrained(
    MODEL_ID,
    use_auth_token=tokens["diarization"],
)
Use this output style:
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.2f} {turn.end:.2f} {speaker}")
Then run:
uv run --refresh --script diaritranscribe3.py
Inspect:
uv tree --script diaritranscribe3.py
Lock:
uv lock --script diaritranscribe3.py
Bottom line
You now have two separate compatibility boundaries:
- TorchAudio boundary: old pyannote code needs TorchAudio 2.8, not 2.9+. The deprecation warning is expected. Keep torchaudio==2.8.0.
- pyannote API boundary: brouhaha forces pyannote.audio==3.3.0. That means using speaker-diarization-3.1 and use_auth_token=..., not speaker-diarization-community-1 and token=....
So the practical fix is:
pyannote.audio==3.3.0
torch==2.8.0
torchaudio==2.8.0
torchcodec==0.7.*
MODEL_ID = "pyannote/speaker-diarization-3.1"
Pipeline.from_pretrained(..., use_auth_token=...)
Treat community-1 + token=... as a later migration path after you remove, update, or fork the brouhaha dependency that pins pyannote to 3.3.0.