
Module 'torchaudio' has no attribute 'AudioMetaData'

Hugging Face Forums [Unofficial] April 29, 2026

This looks like a compatibility issue. If you just want to work around it, pinning the library versions should suffice; if you want to use the latest versions, you may need to modify the script:


Fixing AttributeError: module 'torchaudio' has no attribute 'AudioMetaData' in a pyannote + faster-whisper + uv script

Short diagnosis

This is very likely a dependency compatibility issue, not a bug in your diarization or transcription logic.

The important clue is that the crash happens during this import:

from pyannote.audio import Inference, Model, Pipeline

The script is not reaching the real diarization/transcription part yet. Python is still importing pyannote.audio, and the crash happens inside pyannote’s audio I/O module:

) -> torchaudio.AudioMetaData:
      ^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'torchaudio' has no attribute 'AudioMetaData'

That means pyannote code is referencing:

torchaudio.AudioMetaData

but the installed torchaudio package no longer exposes that object.

This fits the current TorchAudio transition: AudioMetaData and related audio I/O APIs were deprecated in TorchAudio 2.8 and removed in TorchAudio 2.9 as part of TorchAudio’s move into maintenance mode and the shift of media decoding/encoding functionality toward TorchCodec.

Useful references:

  • TorchAudio 2.8 torchaudio.info / AudioMetaData deprecation docs
  • TorchAudio 2.8 package warning: many APIs deprecated in 2.8 and removed in 2.9
  • TorchAudio 2.9 docs: APIs deprecated in 2.8 were removed in 2.9
  • PyTorch Audio issue: Update on TorchAudio’s future
  • Matching pyannote issue: AttributeError: module 'torchaudio' has no attribute 'AudioMetaData'

The likely version story is:

Your script asks uv for broad/latest-ish package versions
        ↓
uv resolves a newer TorchAudio, probably 2.9+
        ↓
the installed pyannote.audio code still references torchaudio.AudioMetaData
        ↓
import pyannote.audio fails before your own script logic runs
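
To see which side of the removal boundary an environment sits on, a small standard-library check may be enough. The following is only a sketch: the helper names `major_minor` and `audiometadata_removed` are hypothetical, and the 2.9 cutoff is taken from the deprecation notes above rather than probed from the installed package.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.8.0+cu126'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def audiometadata_removed(torchaudio_version: str) -> bool:
    """True when the version is at or past the assumed 2.9 removal boundary."""
    return major_minor(torchaudio_version) >= (2, 9)


if __name__ == "__main__":
    try:
        installed = version("torchaudio")
    except PackageNotFoundError:
        print("torchaudio is not installed in this environment")
    else:
        print("torchaudio", installed,
              "-> AudioMetaData removed:", audiometadata_removed(installed))
```

Run it inside the same environment uv builds for the script; otherwise it reports on a different installation.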

Why your current dependency block is fragile

Your current inline metadata has this shape:

#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "faster-whisper",
#   "nvidia-cublas-cu12",
#   "nvidia-cudnn-cu12",
#   "numpy",
#   "pyannote.audio>=3.1",
#   "nvidia-cublas",
#   "nvidia-cudnn-cu13",
#   "nvidia-npp",
#   "scikit-learn",
#   "torch",
#   "torchaudio",
#   "torchcodec",
#   "omegaconf",
#   "brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",
# ]
# ///

The main risk is here:

"pyannote.audio>=3.1",
"torch",
"torchaudio",
"torchcodec",

Those constraints are too broad for a fast-moving audio/ML stack.

They allow uv to pick a package family like:

pyannote.audio 3.x
torch 2.9.x
torchaudio 2.9.x
torchcodec 0.8.x or 0.9.x

That is exactly the kind of combination that can fail: pyannote 3.x-era code may still reference older TorchAudio APIs, while TorchAudio 2.9 removed APIs deprecated in 2.8.

This is not really uv’s fault. uv is resolving from the constraints you gave it. The problem is that the constraints are too loose for a stack where torch, torchaudio, torchcodec, CUDA libraries, FFmpeg, pyannote, and faster-whisper all interact.

Relevant docs:

  • uv running scripts guide
  • uv PyTorch integration guide
  • TorchAudio installation docs: PyTorch and TorchAudio versions must match
  • TorchCodec README compatibility table

Recommended fix: recover first with a pinned compatible stack

I would not start by rewriting the whole diarization/transcription pipeline. First, recover the existing script by pinning a compatible version family.

Use this dependency block first:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
#   "faster-whisper",
#   "numpy",
#   "pyannote.audio==3.4.0",
#   "scikit-learn",
#   "torch==2.8.0",
#   "torchaudio==2.8.0",
#   "torchcodec==0.7.*",
#   "omegaconf",
#   "brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",
# ]
# ///

Why these pins?

| Package | Reason |
|---|---|
| pyannote.audio==3.4.0 | Keeps you on the pyannote 3.x generation, which is likely closer to your current script. The pyannote 3.4.0 release was a maintenance release that pinned related pyannote.{core,database,metrics,pipeline} dependencies to avoid breakage in the 3.x branch. See the pyannote 3.4.0 release note. |
| torch==2.8.0 | Keeps PyTorch in the last generation before the TorchAudio 2.9 removal boundary. |
| torchaudio==2.8.0 | Keeps torchaudio.AudioMetaData available. TorchAudio 2.8 has it, though deprecated. See the TorchAudio 2.8 docs. |
| torchcodec==0.7.* | TorchCodec's own compatibility table maps TorchCodec 0.7 to Torch 2.8. See the TorchCodec README. |
| requires-python = ">=3.10,<3.14" | Your traceback shows Python 3.12. TorchCodec 0.7 supports Python >=3.9, <=3.13, so Python 3.12 is a reasonable target. |

The practical point is simple:

TorchAudio 2.9+ removed AudioMetaData
        ↓
pyannote import crashes
        ↓
pin TorchAudio to 2.8.0
        ↓
AudioMetaData exists again
        ↓
pyannote can import

Remove the manually listed NVIDIA packages for the first recovery attempt

I would remove these from the first recovery attempt:

"nvidia-cublas-cu12",
"nvidia-cudnn-cu12",
"nvidia-cublas",
"nvidia-cudnn-cu13",
"nvidia-npp",

Reasons:

  1. They are not the cause of the current error.
  2. The current error is a Python import-time attribute lookup, not a CUDA runtime error.
  3. The block mixes CUDA 12 and CUDA 13 package names.
  4. Manually mixing NVIDIA runtime packages can make the environment harder to reason about.
  5. PyTorch CUDA wheel selection should be handled coherently through the PyTorch wheel/index strategy, not by mixing low-level NVIDIA packages casually.

This does not mean CUDA never matters. It does mean CUDA should be debugged after pyannote imports.
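
As an illustration of reason 3, the sketch below scans a dependency list for `nvidia-*` packages and collects the CUDA major versions named by their `-cuNN` suffixes. The function name `cuda_majors` is hypothetical, and the suffix convention is inferred from the package names in the block above, not from any official naming rule.

```python
import re


def cuda_majors(dependencies: list[str]) -> set[str]:
    """Collect CUDA major versions named by '-cuNN' suffixes in nvidia-* requirements."""
    majors = set()
    for dep in dependencies:
        match = re.match(r"nvidia-[a-z0-9-]*?-cu(\d+)\b", dep.strip().lower())
        if match:
            majors.add(match.group(1))
    return majors


if __name__ == "__main__":
    deps = [
        "nvidia-cublas-cu12",
        "nvidia-cudnn-cu12",
        "nvidia-cublas",
        "nvidia-cudnn-cu13",
        "nvidia-npp",
    ]
    found = cuda_majors(deps)
    if len(found) > 1:
        print("mixed CUDA generations in one dependency block:", sorted(found))
```

More than one entry in the result suggests the block is asking for two CUDA generations at once.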

For faster-whisper GPU execution, you may later need CUDA/cuDNN-related fixes. The faster-whisper project documents CUDA and cuDNN expectations in its README:

  • faster-whisper README

But that is a second-stage issue. First fix:

from pyannote.audio import Inference, Model, Pipeline

Also fix the shebang and quotes

Use:

#!/usr/bin/env -S uv run --script

instead of:

#!/usr/bin/env -S uv run

The uv docs use uv run --script for scripts with inline metadata:

  • uv running scripts guide

Also make sure your actual file uses straight quotes, not curly quotes.

Bad if literally present in the file:

“torch”

Good:

"torch"

If the curly quotes only appeared because of formatting while pasting into a forum, ignore this. If they are actually in the file, the inline metadata is not valid TOML.
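
If you are unsure whether typographic quotes made it into the file, a quick scan can settle it. This is a minimal sketch; `curly_quote_lines` is a hypothetical helper and it checks only the four most common curly quote characters.

```python
# Left/right double and single typographic quotes.
CURLY_QUOTES = "\u201c\u201d\u2018\u2019"


def curly_quote_lines(source: str) -> list[int]:
    """Return 1-based line numbers that contain typographic quotes."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if any(ch in CURLY_QUOTES for ch in line)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python check_quotes.py diaritranscribe3.py
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(path, curly_quote_lines(handle.read()))
```

An empty list for the script means the quoting is not the problem.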


Step-by-step recovery procedure

Step 1: Replace the dependency block

Use this exact header first:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
#   "faster-whisper",
#   "numpy",
#   "pyannote.audio==3.4.0",
#   "scikit-learn",
#   "torch==2.8.0",
#   "torchaudio==2.8.0",
#   "torchcodec==0.7.*",
#   "omegaconf",
#   "brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",
# ]
# ///

Step 2: Test pyannote import in isolation

Before testing the full diarization/transcription script, create a small file called check_pyannote_import.py:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
#   "pyannote.audio==3.4.0",
#   "torch==2.8.0",
#   "torchaudio==2.8.0",
#   "torchcodec==0.7.*",
# ]
# ///

import sys
from importlib.metadata import version

import torch
import torchaudio
from pyannote.audio import Pipeline

print("python:", sys.version)
print("torch:", torch.__version__)
print("torchaudio:", torchaudio.__version__)
print("torchcodec:", version("torchcodec"))
print("AudioMetaData exists:", hasattr(torchaudio, "AudioMetaData"))
print("pyannote import OK")

Run it with a fresh resolution:

uv run --refresh --script check_pyannote_import.py

Expected output should include something like:

torch: 2.8.0...
torchaudio: 2.8.0...
torchcodec: 0.7...
AudioMetaData exists: True
pyannote import OK

The most important line is:

AudioMetaData exists: True

If that line is False, you are still not running with the TorchAudio version you think you are running.


Step 3: Inspect the resolved dependency tree

Run:

uv tree --script diaritranscribe3.py

Look for:

pyannote.audio==3.4.0
torch==2.8.0
torchaudio==2.8.0
torchcodec==0.7.x

For this recovery path, you do not want:

torchaudio==2.9.x
torch==2.9.x

TorchAudio and PyTorch should be matched. Do not use a mixed pair like:

torch 2.8 + torchaudio 2.9

or:

torch 2.9 + torchaudio 2.8

The safer recovery pair is:

torch 2.8.0 + torchaudio 2.8.0

Reference:

  • TorchAudio installation compatibility notes

Step 4: Run your real script with refresh

After the minimal import test works:

uv run --refresh --script diaritranscribe3.py

If the script is executable:

chmod +x diaritranscribe3.py
./diaritranscribe3.py

Step 5: Lock the script after it works

Once the import works and the script begins running normally, lock the dependency set:

uv lock --script diaritranscribe3.py

uv supports lockfiles for PEP 723 inline scripts. The lockfile is created next to the script, for example:

diaritranscribe3.py.lock

Reference:

  • uv docs: locking script dependencies

This is important because your current error is exactly the kind of failure that lockfiles prevent. Without a lockfile, the same script can work today and break later when a newer torchaudio, torchcodec, pyannote-core, pyannote-metrics, or other dependency becomes resolvable.


Why I recommend recovery before full migration

There are two possible paths:

| Path | Meaning | When to choose it |
|---|---|---|
| Recovery path | Keep your current pyannote 3.x-style script and pin compatible versions. | Best first move when the script fails at import time and you want minimal code changes. |
| Migration path | Move to current pyannote 4.x / community-1 / TorchCodec / FFmpeg assumptions. | Better long-term, but may require code changes and may expose new TorchCodec/FFmpeg issues. |

For your case, I would choose recovery first.

Reason: the traceback proves the import environment is broken. It does not prove that your diarization logic, faster-whisper logic, VAD logic, or speaker-label alignment logic is wrong.

The disciplined order is:

fix pyannote import
        ↓
test pyannote alone
        ↓
test faster-whisper alone
        ↓
test speaker/transcript alignment
        ↓
then consider migrating to newer pyannote conventions

What the forward-migration path would look like later

The current pyannote direction is more TorchCodec-centered. The current pyannote repository describes pyannote.audio as a PyTorch-based speaker diarization toolkit, and current pyannote usage increasingly assumes TorchCodec and FFmpeg for audio decoding.

References:

  • pyannote.audio GitHub repository
  • pyannote speaker-diarization-community-1 model card
  • TorchCodec README

A newer pyannote-style snippet can look like this:

import torch
from pyannote.audio import Pipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1",
    token="<HUGGINGFACE_ACCESS_TOKEN>",
)

pipeline.to(torch.device("cuda"))

with ProgressHook() as hook:
    output = pipeline("audio.wav", hook=hook)

for turn, speaker in output.speaker_diarization:
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")

Replace <HUGGINGFACE_ACCESS_TOKEN> with your own Hugging Face access token.

That newer path may be the right long-term direction, but it is a migration, not just a one-line dependency fix. It may change:

  • model name;
  • access/token handling;
  • audio decoding assumptions;
  • FFmpeg requirements;
  • TorchCodec version requirements;
  • output object shape;
  • how you iterate diarization results;
  • how you align diarization segments with transcript segments.

Older pyannote 3.x code commonly looks more like this:

for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")

So moving from pyannote 3.x to pyannote 4.x may require real script edits. That is why I would first recover your current script.


What to test after the import works

After this import stops crashing:

from pyannote.audio import Inference, Model, Pipeline

test each subsystem separately.

1. Test PyTorch and CUDA

import torch

print("torch:", torch.__version__)
print("torch cuda build:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("gpu:", torch.cuda.get_device_name(0))

If this prints:

cuda available: False

then the pyannote import may be fixed, but your PyTorch build is CPU-only or CUDA-incompatible. That is a separate problem.


2. Test pyannote alone on a small WAV

For the first test, avoid MP3/M4A/WEBM. Normalize to a small mono 16 kHz WAV:

ffmpeg -y -i input.mp3 -ac 1 -ar 16000 test.wav

Then test only diarization:

from pyannote.audio import Pipeline
import torch

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="<HUGGINGFACE_ACCESS_TOKEN>",
)

if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))

diarization = pipeline("test.wav")

for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(turn.start, turn.end, speaker)

Here too, <HUGGINGFACE_ACCESS_TOKEN> stands for your own Hugging Face access token.

Useful model page:

  • pyannote/speaker-diarization-3.1

3. Test faster-whisper alone

from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("test.wav", beam_size=5)

print("language:", info.language, info.language_probability)

for segment in segments:
    print(segment.start, segment.end, segment.text)

Important faster-whisper detail: segments is a generator, so transcription starts when you iterate over it or convert it to a list. The faster-whisper README documents this behavior.

Reference:

  • faster-whisper README

If faster-whisper fails with CUDA/cuDNN/cuBLAS errors, that is a different layer from the pyannote import failure.


4. Then combine diarization and transcription

Once both pyannote and faster-whisper work independently, then debug the speaker-attributed transcript logic.

The next hard problem is usually timestamp reconciliation:

diarization:
SPEAKER_00 from 10.0s to 14.8s

transcription:
"yeah that makes sense" from 13.9s to 16.2s

You need a policy for assigning transcript text to speakers.

Common policies:

| Policy | Meaning | Tradeoff |
|---|---|---|
| Midpoint assignment | Assign a transcript segment to the speaker active at the segment midpoint. | Simple, but weak for long segments that cross speaker changes. |
| Maximum overlap | Assign the transcript segment to the speaker with the largest time overlap. | Usually a good first implementation. |
| Split at speaker boundaries | Split transcript segments when diarization changes speakers. | More accurate, more code. |
| Word-level assignment | Use word timestamps and assign each word separately. | Best when word timestamps are reliable. |
| Exclusive diarization | Prefer non-overlapping diarization when available. | Easier to reconcile with transcript timestamps. |

For a first robust script, I would use maximum overlap at the segment level. Later, if transcript quality matters a lot, move to word-level assignment.
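
A minimal sketch of the maximum-overlap policy, using plain tuples rather than pyannote or faster-whisper objects, might look like this. The `Turn` type, the function names, and the `UNKNOWN` fallback label are all assumptions made for illustration; in a real script the turns would come from `itertracks(yield_label=True)` and the segment times from the faster-whisper segments.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    """One diarization turn (hypothetical container for this sketch)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(seg_start: float, seg_end: float, turns: list[Turn],
                   default: str = "UNKNOWN") -> str:
    """Pick the speaker whose turns overlap the transcript segment the most."""
    totals: dict[str, float] = {}
    for turn in turns:
        shared = overlap(seg_start, seg_end, turn.start, turn.end)
        if shared > 0:
            # Sum per speaker so several short turns can outweigh one long turn.
            totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
    if not totals:
        return default
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # SPEAKER_00 matches the example above; the SPEAKER_01 turn is invented.
    turns = [Turn(10.0, 14.8, "SPEAKER_00"), Turn(14.8, 20.0, "SPEAKER_01")]
    print(assign_speaker(13.9, 16.2, turns))
```

With the invented second turn, the segment from 13.9s to 16.2s shares about 0.9s with SPEAKER_00 and 1.4s with SPEAKER_01, so it goes to SPEAKER_01. Summing overlap per speaker, rather than taking the single best turn, is a design choice that keeps fragmented diarization output from skewing the result.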


Why not just monkey-patch pyannote?

Some workarounds patch the installed pyannote file and replace something like:

) -> torchaudio.AudioMetaData:

with:

) -> object:

or a quoted annotation.

That can work temporarily because the failing reference is often annotation-related. But I would not keep that as the real solution.

Reasons:

  • it modifies files inside site-packages;
  • uv can rebuild the environment and erase the patch;
  • it hides the real version mismatch;
  • another removed TorchAudio API may fail later;
  • it makes the environment non-reproducible;
  • it is harder to explain or maintain.

For this script, version pinning is cleaner.
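
For anyone curious why the annotation patch works at all, the mechanism can be reproduced without torchaudio. The sketch below uses a hypothetical stand-in object with no `AudioMetaData` attribute: an unquoted annotation forces Python to look the attribute up, while a quoted one is stored as a string and never evaluated. On Python 3.13 and earlier the lookup happens when the `def` statement runs; the explicit `__annotations__` access is there because Python 3.14 defers evaluation until the annotations are read.

```python
import types

# Stand-in for a torchaudio build that no longer has AudioMetaData.
# (Hypothetical object for illustration; the real module is not imported.)
fake_torchaudio = types.SimpleNamespace(__name__="torchaudio")


def unquoted_annotation_fails() -> bool:
    """Define a function whose return annotation references the missing attribute."""
    try:
        def get_info(path) -> fake_torchaudio.AudioMetaData:
            return None
        # Python 3.14+ evaluates annotations lazily, so force evaluation here.
        get_info.__annotations__
    except AttributeError:
        return True
    return False


def quoted_annotation_fails() -> bool:
    """Same definition, but the annotation is a plain string and is never evaluated."""
    try:
        def get_info(path) -> "fake_torchaudio.AudioMetaData":
            return None
        get_info.__annotations__
    except AttributeError:
        return True
    return False


if __name__ == "__main__":
    print("unquoted annotation fails:", unquoted_annotation_fails())
    print("quoted annotation fails:", quoted_annotation_fails())
```

This only explains the symptom. Any code path that actually calls a removed TorchAudio API would still fail, which is why the pin remains the cleaner fix.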


Likely next errors after this fix

After you fix AudioMetaData, you may hit another layer. That is normal.

Possible next error: Hugging Face model access

If you load pyannote models from Hugging Face, you may need to accept model conditions and use a token.

Possible symptoms:

401 Unauthorized
403 Forbidden
Repository not found
You are not in the authorized list

Useful links:

  • pyannote/speaker-diarization-3.1
  • pyannote/speaker-diarization-community-1
  • Hugging Face access tokens docs

Possible next error: TorchCodec / FFmpeg

If you migrate toward newer pyannote or use TorchCodec-backed decoding, you may see errors like:

Could not load libtorchcodec
FFmpeg is not properly installed

Useful links:

  • TorchCodec README
  • TorchAudio future / TorchCodec migration issue

Possible next error: CUDA unavailable

If:

torch.cuda.is_available()

returns:

False

then the import problem is fixed, but your PyTorch install is not seeing the GPU. That is a PyTorch wheel/index/CUDA issue.

Useful link:

  • uv PyTorch integration guide

Possible next error: faster-whisper CUDA/cuDNN/CTranslate2

Possible symptoms:

Library libcudnn_ops_infer.so not found
CUDA failed
unsupported compute type

Useful link:

  • faster-whisper README
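
While the GPU layer is being sorted out, it can help to let the script fall back to CPU instead of crashing. The sketch below just selects constructor arguments; `whisper_settings` is a hypothetical helper, and the `float16` on CUDA / `int8` on CPU pairing follows the examples in the faster-whisper README rather than anything measured on your machine.

```python
def whisper_settings(cuda_available: bool) -> dict[str, str]:
    """Choose device and compute_type keyword arguments for WhisperModel."""
    if cuda_available:
        return {"device": "cuda", "compute_type": "float16"}
    # CPU fallback: slower, but avoids CUDA/cuDNN library errors entirely.
    return {"device": "cpu", "compute_type": "int8"}


# Intended use inside the real script (not executed here):
#   import torch
#   from faster_whisper import WhisperModel
#   model = WhisperModel("small", **whisper_settings(torch.cuda.is_available()))
```

Note that `torch.cuda.is_available()` describes PyTorch's view of the GPU; faster-whisper runs on CTranslate2, which can still fail on missing cuDNN or cuBLAS libraries even when PyTorch sees the device.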

Final recommended script header

For your current script, I would start with this:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
#   "faster-whisper",
#   "numpy",
#   "pyannote.audio==3.4.0",
#   "scikit-learn",
#   "torch==2.8.0",
#   "torchaudio==2.8.0",
#   "torchcodec==0.7.*",
#   "omegaconf",
#   "brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",
# ]
# ///

Then run:

uv run --refresh --script diaritranscribe3.py

Then inspect:

uv tree --script diaritranscribe3.py

Then lock:

uv lock --script diaritranscribe3.py

Bottom line

The fix is not to install a random missing package. The fix is to choose a coherent version family.

For your current script, the most practical recovery family is:

pyannote.audio 3.4.0
torch 2.8.0
torchaudio 2.8.0
torchcodec 0.7.x
Python 3.12

That combination is aimed at the exact failure:

AttributeError: module 'torchaudio' has no attribute 'AudioMetaData'

After that works, debug CUDA, faster-whisper, model access, FFmpeg/TorchCodec, and speaker-transcript alignment as separate layers.

Short summary

  • The error is almost certainly a dependency compatibility issue.
  • torchaudio.AudioMetaData existed in TorchAudio 2.8 but was removed in TorchAudio 2.9.
  • Your dependency block lets uv resolve incompatible versions.
  • Pin pyannote.audio==3.4.0, torch==2.8.0, torchaudio==2.8.0, and torchcodec==0.7.*.
  • Remove the mixed manual nvidia-* packages for the first recovery attempt.
  • Test pyannote import by itself.
  • Use uv run --refresh --script ....
  • Inspect with uv tree --script ....
  • Lock with uv lock --script ....
  • Consider pyannote 4.x / community-1 later as a real migration, not the first fix.
