Module 'torchaudio' has no attribute 'AudioMetaData'

Hugging Face Forums [Unofficial] April 29, 2026

Looking ahead, updating the library is really the best course of action, but given your current setup, the migration process is quite complicated:


Path B — later migration: use Community-1 and pyannote.audio 4.x

Short version

Path B means intentionally leaving the old pyannote.audio==3.3.0 recovery stack and moving to the newer pyannote stack:

pyannote.audio 4.x
pyannote/speaker-diarization-community-1
Pipeline.from_pretrained(..., token=...)
output.speaker_diarization
output.exclusive_speaker_diarization
TorchCodec-backed audio decoding
FFmpeg installed

This is not just a one-line model change.

It is a real migration because your current brouhaha dependency pins:

pyannote-audio==3.3.0

while the newer Community-1 examples expect the newer pyannote API surface:

Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1",
    token="<HUGGINGFACE_ACCESS_TOKEN>",
)

The current pyannote README shows this community-1 + token=... style and says FFmpeg must be installed because TorchCodec handles audio decoding. Relevant references:

  • pyannote.audio GitHub README
  • pyannote/speaker-diarization-community-1 model card
  • pyannote.audio release notes
  • TorchCodec README and compatibility table
  • uv PyTorch guide
  • uv script locking docs

Why you should not do Path B casually

Your current stack has two separate constraints:

brouhaha==0.9.0
        ↓
requires pyannote-audio==3.3.0

and:

Community-1 / pyannote 4.x examples
        ↓
use token=...
use output.speaker_diarization
use output.exclusive_speaker_diarization
expect TorchCodec/FFmpeg audio decoding

Those are different worlds.

The pyannote 3.3 recovery world uses:

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="<HUGGINGFACE_ACCESS_TOKEN>",
)

diarization = pipeline("audio.wav")

for turn, _, speaker in diarization.itertracks(yield_label=True):
    ...

The pyannote 4 / Community-1 world uses:

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1",
    token="<HUGGINGFACE_ACCESS_TOKEN>",
)

output = pipeline("audio.wav")

for turn, speaker in output.speaker_diarization:
    ...

And, when available, the newer path also gives:

output.exclusive_speaker_diarization

That exclusive_speaker_diarization output is especially relevant for your transcription project because the Community-1 model card describes it as simplifying reconciliation between diarization timestamps and transcription timestamps.

Source links:

  • Community-1 model card: quick start, GPU, exclusive diarization, offline use
  • pyannote.audio release notes: use_auth_token renamed to token
  • pyannote README: Community-1 usage

What Path B is for

Choose Path B if you want one or more of these:

  • newer pyannote.audio API;
  • the open-source pyannote/speaker-diarization-community-1 pipeline;
  • better diarization quality than the old speaker-diarization-3.1 baseline;
  • easier reconciliation with transcripts using exclusive_speaker_diarization;
  • a forward-looking stack instead of living on TorchAudio 2.8 deprecation warnings;
  • a cleaner long-term project layout.

Do not choose Path B if your immediate goal is only:

make the old script run with the least changes

For the least-change recovery path, stay with:

pyannote.audio==3.3.0
pyannote/speaker-diarization-3.1
use_auth_token=...
torch==2.8.0
torchaudio==2.8.0
torchcodec==0.7.*

Path B is the better long-term migration, but the worse emergency fix.


The main blocker: brouhaha

The problem

Your resolver already told you:

brouhaha==0.9.0 depends on pyannote-audio==3.3.0

So this cannot work:

"pyannote.audio>=4,<5",
"brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",

unless you change something about brouhaha.

The resolver is correct. If brouhaha requires exactly:

pyannote-audio==3.3.0

then the environment cannot also contain:

pyannote.audio>=4
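
To make the conflict concrete, here is a minimal sketch that checks a candidate version against both constraints using only the standard library. The helper names are hypothetical and the parsing is deliberately simplified (plain X.Y.Z versions only); a real resolver such as uv implements the full version-specifier rules.

```python
from typing import Tuple

# Simplified illustration only: handles plain "X.Y.Z" versions,
# not the full version-specifier grammar a real resolver supports.


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.3.0' into (3, 3, 0)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_brouhaha(candidate: str) -> bool:
    """brouhaha 0.9.0 pins pyannote-audio==3.3.0 (per the resolver message)."""
    return parse_version(candidate) == (3, 3, 0)


def satisfies_path_b(candidate: str) -> bool:
    """Path B asks for pyannote.audio>=4,<5."""
    return (4,) <= parse_version(candidate) < (5,)


def both_satisfied(candidate: str) -> bool:
    return satisfies_brouhaha(candidate) and satisfies_path_b(candidate)


# Candidate versions chosen only to exercise the two checks.
for candidate in ["3.3.0", "4.0.0"]:
    print(candidate, both_satisfied(candidate))
```

Both candidates print False: a version that satisfies the exact pin can never fall inside the >=4,<5 range, which is all the resolver is telling you.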

Your options

You have five realistic choices.

| Option | What it means | Good if | Risk |
| --- | --- | --- | --- |
| Remove brouhaha | Delete it from dependencies and remove/replace its VAD calls. | You do not strictly need Brouhaha VAD. | You may lose the current VAD behavior. |
| Replace brouhaha | Use pyannote’s own diarization behavior, faster-whisper VAD, Silero VAD, or another VAD stage. | You only used Brouhaha as a helper. | May change segmentation and final transcript quality. |
| Fork/edit brouhaha | Change its dependency metadata from pyannote-audio==3.3.0 to a looser or newer version. | You control the local package and can test it. | Its code may actually depend on pyannote 3.3 internals. |
| Split environments | Run Brouhaha preprocessing in one script/env, then run pyannote 4 diarization in another script/env. | You need Brouhaha but also want Community-1. | More moving parts and file handoff. |
| Stay on Path A | Do not migrate now. Keep pyannote 3.3. | You want stability first. | You do not get Community-1 yet. |

My recommendation: do not start by blindly editing the brouhaha dependency metadata.

First inspect why it pins pyannote:

grep -R "pyannote" -n /home/user/diarization/repos/.venv/brouhaha-vad

Look for files like:

pyproject.toml
setup.py
setup.cfg
requirements.txt

Then inspect imports:

grep -R "from pyannote\|import pyannote" -n /home/user/diarization/repos/.venv/brouhaha-vad

If Brouhaha only uses public, stable APIs, loosening the pin might work. If it uses pyannote internals or pyannote 3.x-specific output structures, expect breakage.
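
If you would rather do the metadata inspection from Python than with grep, a rough equivalent could look like the sketch below. The function name is hypothetical, and it only scans the four metadata file names listed above; it does not analyze imports.

```python
from pathlib import Path
from typing import List, Tuple

# File names taken from the list above; everything else here is illustrative.
METADATA_FILES = ("pyproject.toml", "setup.py", "setup.cfg", "requirements.txt")


def find_pyannote_mentions(package_root: str) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for every
    metadata line that mentions pyannote."""
    root = Path(package_root)
    if not root.is_dir():
        return []

    hits = []
    for name in METADATA_FILES:
        for path in sorted(root.rglob(name)):
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if "pyannote" in line.lower():
                    hits.append((str(path.relative_to(root)), number, line.strip()))
    return hits
```

Call it with the Brouhaha checkout path from the grep commands and print each hit. A line such as "pyannote-audio==3.3.0" in the results is the pin the resolver is complaining about.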


Recommended migration strategy

Do not migrate the production script all at once.

Use a three-phase migration; the numbered stages further down break these phases into smaller steps.

Phase 1: build a tiny Community-1 proof-of-life script
Phase 2: port only diarization code
Phase 3: reintegrate transcription, VAD, and speaker-label alignment

This prevents one common failure mode:

changed model + changed pyannote version + changed TorchCodec + changed FFmpeg + changed CUDA + changed VAD + changed transcript alignment
        ↓
too many variables
        ↓
impossible to tell what broke

Stage 1 — prove Community-1 works by itself

Create a new test file, separate from diaritranscribe3.py.

For example:

check_pyannote4_community1.py

Use this as a minimal proof-of-life script:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
#   "pyannote.audio>=4,<5",
#   "torch",
#   "torchaudio",
#   "torchcodec",
# ]
# ///

import os
from importlib.metadata import version

import torch
from pyannote.audio import Pipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook

MODEL_ID = "pyannote/speaker-diarization-community-1"
AUDIO_PATH = "audio.wav"

token = os.environ.get("HF_TOKEN")
if not token:
    raise RuntimeError("Set HF_TOKEN before running this script.")

print("pyannote.audio:", version("pyannote.audio"))
print("torch:", torch.__version__)
print("torch cuda build:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("torchaudio:", version("torchaudio"))
print("torchcodec:", version("torchcodec"))

pipeline = Pipeline.from_pretrained(
    MODEL_ID,
    token=token,
)

if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))

with ProgressHook() as hook:
    output = pipeline(AUDIO_PATH, hook=hook)

print("\nRegular diarization:")
for turn, speaker in output.speaker_diarization:
    print(f"{turn.start:.3f}\t{turn.end:.3f}\t{speaker}")

print("\nExclusive diarization:")
if hasattr(output, "exclusive_speaker_diarization"):
    for turn, speaker in output.exclusive_speaker_diarization:
        print(f"{turn.start:.3f}\t{turn.end:.3f}\t{speaker}")
else:
    print("exclusive_speaker_diarization is not available on this output.")

Run it like:

export HF_TOKEN="<HUGGINGFACE_ACCESS_TOKEN>"
uv run --refresh --script check_pyannote4_community1.py

Before running it, make sure:

  1. you accepted the Community-1 user conditions;
  2. your token can access the model;
  3. FFmpeg is installed;
  4. the test file audio.wav exists.

Relevant setup docs:

  • Community-1 model card
  • pyannote.audio README
  • Hugging Face access tokens
  • TorchCodec README

Stage 2 — choose a coherent Torch/TorchCodec version family

The current pyannote project metadata says the modern branch requires:

Python >=3.10
torch >=2.8.0
torchaudio >=2.8.0
torchcodec >=0.7.0

Source:

  • pyannote.audio pyproject.toml

But “greater than or equal” does not mean every arbitrary combination is equally good.

TorchCodec publishes a compatibility table. Current table highlights include:

torchcodec 0.7  ↔ torch 2.8
torchcodec 0.8  ↔ torch 2.9
torchcodec 0.9  ↔ torch 2.9
torchcodec 0.10 ↔ torch 2.10
torchcodec 0.11 ↔ torch 2.11

Source:

  • TorchCodec README: compatibility table

So do not mix randomly.
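
As a sanity check, you could encode the pairs listed above and compare them with what is actually installed. This is only a sketch: the dictionary copies the highlights quoted in this post, not the full TorchCodec table, and the helper names are hypothetical.

```python
# Pairs copied from the highlights above. The TorchCodec README remains
# the authoritative source and may list more combinations.
TORCHCODEC_TO_TORCH = {
    "0.7": "2.8",
    "0.8": "2.9",
    "0.9": "2.9",
    "0.10": "2.10",
    "0.11": "2.11",
}


def major_minor(version_text: str) -> str:
    """Reduce '2.8.0+cu128' or '0.7.1' to 'major.minor'."""
    public = version_text.split("+", 1)[0]  # drop a local tag such as +cu128
    return ".".join(public.split(".")[:2])


def is_listed_pair(torch_version: str, torchcodec_version: str) -> bool:
    """True if the torch/torchcodec pair appears in the table above."""
    expected = TORCHCODEC_TO_TORCH.get(major_minor(torchcodec_version))
    return expected is not None and expected == major_minor(torch_version)
```

In a real environment you would pass torch.__version__ and version("torchcodec") from importlib.metadata. A False result does not prove the pair is broken, only that it is not one of the combinations quoted here.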

Conservative modern family

This is the least aggressive Community-1 migration target:

pyannote.audio>=4,<5
torch==2.8.0
torchaudio==2.8.0
torchcodec==0.7.*

Pros:

  • close to the minimum modern pyannote requirements;
  • avoids jumping all the way to newer Torch/TorchAudio generations;
  • TorchCodec 0.7 matches Torch 2.8;
  • likely easier if the rest of your audio stack was stabilized around Torch 2.8.

Cons:

  • still close to the old TorchAudio transition boundary;
  • may not represent the newest pyannote-tested stack.

Newer Torch family

A newer family might look like:

pyannote.audio>=4,<5
torch==2.9.*
torchaudio==2.9.*
torchcodec==0.9.*

or:

pyannote.audio>=4,<5
torch==2.10.*
torchaudio==2.10.*
torchcodec==0.10.*

Pros:

  • more aligned with the post-TorchAudio-2.9 world;
  • better long-term direction if your other dependencies support it.

Cons:

  • may expose TorchCodec/FFmpeg issues;
  • may conflict with faster-whisper/CTranslate2 expectations;
  • may require more careful PyTorch CUDA wheel/index selection.

Practical advice

For a migration branch, start with the conservative modern family:

"pyannote.audio>=4,<5",
"torch==2.8.0",
"torchaudio==2.8.0",
"torchcodec==0.7.*",

Then, after Community-1 works, decide whether to move Torch upward.

Do not solve every modernization problem at once.


Stage 3 — remove or isolate brouhaha

Because brouhaha pins pyannote 3.3, your Community-1 test script should not include Brouhaha.

For Path B, the dependency block should start without it:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
#   "pyannote.audio>=4,<5",
#   "torch==2.8.0",
#   "torchaudio==2.8.0",
#   "torchcodec==0.7.*",
# ]
# ///

Only after Community-1 works should you decide what to do with Brouhaha.

If you remove Brouhaha

Delete:

"brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",

and remove code like:

import brouhaha

or any function calls into Brouhaha.

Then rely on pyannote diarization directly, or use another VAD/preprocessing layer.

If you fork Brouhaha

Edit its dependency metadata.

For example, if its pyproject.toml contains:

dependencies = [
    "pyannote-audio==3.3.0",
]

you could test:

dependencies = [
    "pyannote-audio>=4,<5",
]

or, if Brouhaha does not actually need pyannote at runtime after your refactor:

dependencies = []

But do this only in a branch or copy.

Then run its own tests, or at least import it:

uv run --refresh --script check_brouhaha_import.py

where:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
#   "brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad",
#   "pyannote.audio>=4,<5",
# ]
# ///

import brouhaha
from importlib.metadata import version

print("brouhaha import OK")
print("pyannote.audio:", version("pyannote.audio"))

If this fails, Brouhaha is not pyannote-4-compatible yet.

If you split environments

Use two scripts.

First script:

vad_preprocess.py

uses Brouhaha and pyannote 3.3 if needed.

Second script:

diarize_community1.py

uses pyannote 4 and Community-1.

The handoff should be a file, JSON, RTTM, or plain timestamp list. This is clunkier, but it avoids forcing incompatible libraries into one dependency graph.
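
A minimal sketch of such a handoff, using only the standard library, might look like this. The function names, the JSON field names, and the idea of passing (start, end, label) tuples are all assumptions for illustration; nothing here calls Brouhaha or pyannote.

```python
import json
from pathlib import Path


def save_turns_json(turns, path):
    """Write [(start, end, label), ...] as a JSON list of objects."""
    payload = [
        {"start": float(start), "end": float(end), "label": str(label)}
        for start, end, label in turns
    ]
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_turns_json(path):
    """Read the file back into a list of (start, end, label) tuples."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return [(item["start"], item["end"], item["label"]) for item in payload]


def turn_to_rttm_line(file_id, start, end, label):
    """Format one turn as an RTTM SPEAKER line (onset and duration in seconds)."""
    duration = end - start
    return f"SPEAKER {file_id} 1 {start:.3f} {duration:.3f} <NA> <NA> {label} <NA> <NA>"
```

The first script would call save_turns_json at the end of its run and the second would call load_turns_json at the start, so neither environment needs to import the other's libraries.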


Stage 4 — update the pyannote call

Old Path A code:

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=tokens["diarization"],
)

diarization = pipeline(audio_path)

for turn, _, speaker in diarization.itertracks(yield_label=True):
    ...

New Path B code:

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1",
    token=tokens["diarization"],
)

output = pipeline(audio_path)

for turn, speaker in output.speaker_diarization:
    ...

And, for transcript alignment, prefer testing:

for turn, speaker in output.exclusive_speaker_diarization:
    ...

The current Community-1 model card says exclusive_speaker_diarization is provided on top of regular diarization and is meant to simplify reconciliation with transcription timestamps.

Source:

  • Community-1: exclusive speaker diarization

Stage 5 — rewrite speaker/transcript alignment around exclusive diarization

This is the most important practical benefit for your script.

Your final goal is not just diarization. Your goal is:

audio file
        ↓
transcript segments or words
        ↓
speaker labels
        ↓
speaker-attributed transcript

Old diarization can produce fine-grained, overlapping, or awkward speaker turns. That can be hard to align to Whisper/faster-whisper transcript segments.

Community-1 adds:

output.exclusive_speaker_diarization

Use that first for transcript alignment.

Basic maximum-overlap assignment

Use this when your ASR gives segment-level timestamps.

def overlap_seconds(a_start, a_end, b_start, b_end):
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker_to_segment(segment_start, segment_end, diarization_turns):
    best_speaker = None
    best_overlap = 0.0

    for turn_start, turn_end, speaker in diarization_turns:
        overlap = overlap_seconds(segment_start, segment_end, turn_start, turn_end)
        if overlap > best_overlap:
            best_overlap = overlap
            best_speaker = speaker

    return best_speaker or "UNKNOWN"


def diarization_to_turns(exclusive_speaker_diarization):
    turns = []
    for turn, speaker in exclusive_speaker_diarization:
        turns.append((float(turn.start), float(turn.end), str(speaker)))
    return turns

Then:

turns = diarization_to_turns(output.exclusive_speaker_diarization)

for segment in whisper_segments:
    speaker = assign_speaker_to_segment(segment.start, segment.end, turns)
    print(f"[{segment.start:.2f}-{segment.end:.2f}] {speaker}: {segment.text}")

Word-level assignment

If faster-whisper returns word timestamps, word-level assignment is usually better.

Conceptually:

for each word:
    find the speaker turn with max overlap
    assign that speaker to the word
then merge adjacent words with the same speaker

This handles speaker changes inside a long ASR segment better than assigning one speaker to the whole segment.
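
The conceptual steps above can be sketched in plain Python. This assumes you have already converted words and turns into simple tuples; the function names and tuple layouts are hypothetical, and the sample data is made up purely to show the behavior.

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals, never negative."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_words(words, turns, unknown="UNKNOWN"):
    """words: [(start, end, text)], turns: [(start, end, speaker)].

    Returns [(start, end, text, speaker)], picking the turn with the
    largest overlap for each word.
    """
    labeled = []
    for w_start, w_end, text in words:
        best_speaker, best_overlap = unknown, 0.0
        for t_start, t_end, speaker in turns:
            overlap = overlap_seconds(w_start, w_end, t_start, t_end)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((w_start, w_end, text, best_speaker))
    return labeled


def merge_words(labeled):
    """Merge adjacent same-speaker words into (start, end, speaker, text)."""
    merged = []
    for start, end, text, speaker in labeled:
        if merged and merged[-1][2] == speaker:
            prev_start, _, _, prev_text = merged[-1]
            merged[-1] = (prev_start, end, speaker, f"{prev_text} {text}")
        else:
            merged.append((start, end, speaker, text))
    return merged


# Made-up sample data, only to show the shape of the result.
sample_turns = [(0.0, 1.0, "SPEAKER_00"), (1.0, 2.0, "SPEAKER_01")]
sample_words = [(0.1, 0.4, "hello"), (0.5, 0.9, "there"), (1.1, 1.5, "hi")]
print(merge_words(assign_words(sample_words, sample_turns)))
```

With real data, turns would come from diarization_to_turns(output.exclusive_speaker_diarization), and the word tuples from your ASR's word timestamps with surrounding whitespace stripped from the text. Words that overlap no turn keep the "UNKNOWN" label, so you can decide separately how to handle them.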


Stage 6 — verify FFmpeg and TorchCodec

Community-1 uses TorchCodec-backed decoding. The pyannote README explicitly says FFmpeg must be installed because TorchCodec handles audio decoding.

Check FFmpeg:

ffmpeg -version

Check TorchCodec import:

import torchcodec
print("torchcodec import OK")

Check versions:

from importlib.metadata import version
import torch

print("torch:", torch.__version__)
print("torchcodec:", version("torchcodec"))

TorchCodec supports FFmpeg major versions in [4, 8], and on Windows it needs FFmpeg builds with separate shared libraries. The TorchCodec README also provides the TorchCodec/Torch/Python compatibility table.

Source:

  • TorchCodec README

If TorchCodec fails

Common error shapes:

RuntimeError: Could not load libtorchcodec
FFmpeg is not properly installed
No compatible FFmpeg found

Likely causes:

  • FFmpeg missing;
  • FFmpeg installed but not visible on PATH;
  • Windows FFmpeg build is not a shared build;
  • TorchCodec version does not match Torch version;
  • Python version is outside the wheel’s supported range;
  • unsupported architecture, especially Linux ARM64/aarch64.

Check the compatibility table before changing random packages.
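
To rule out the first two causes quickly, a small standard-library check like the one below may help. It is a sketch with hypothetical function names: it only looks at the ffmpeg executable on PATH and parses the first line of its version output, so it cannot tell you whether the shared libraries TorchCodec needs are present.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_major(version_output: str):
    """Extract the major version from the first line of `ffmpeg -version`.

    Returns None when the version is not a plain release number,
    which can happen with some nightly or git builds.
    """
    first_line = version_output.splitlines()[0] if version_output else ""
    match = re.match(r"ffmpeg version n?(\d+)\.", first_line)
    return int(match.group(1)) if match else None


def ffmpeg_major_on_path():
    """Return the major version of the ffmpeg found on PATH, or None."""
    executable = shutil.which("ffmpeg")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_major(result.stdout)


def in_supported_range(major, low=4, high=8):
    """The default [4, 8] range is the one quoted above from the TorchCodec README."""
    return major is not None and low <= major <= high
```

If ffmpeg_major_on_path() returns None, FFmpeg is either missing, not visible on PATH, or reports a version string this simple pattern does not recognize.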


Stage 7 — choose uv layout: inline script vs project

You can do Path B with inline script metadata, but a project layout is cleaner once you are juggling:

pyannote.audio
torch
torchaudio
torchcodec
faster-whisper
ctranslate2
ffmpeg
CUDA
tokens
local packages

Inline script version

Good for quick experiments:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
#   "pyannote.audio>=4,<5",
#   "torch==2.8.0",
#   "torchaudio==2.8.0",
#   "torchcodec==0.7.*",
# ]
# ///

from pyannote.audio import Pipeline

Lock after success:

uv lock --script check_pyannote4_community1.py

Source:

  • uv script locking docs

Project version

Better for the real app.

pyproject.toml:

[project]
name = "diaritranscribe"
version = "0.1.0"
requires-python = ">=3.10,<3.14"
dependencies = [
  "pyannote.audio>=4,<5",
  "faster-whisper",
  "numpy",
  "scikit-learn",
  "omegaconf",
  "torch==2.8.0",
  "torchaudio==2.8.0",
  "torchcodec==0.7.*",
]

[tool.uv]
required-version = ">=0.5.3"

Then:

uv lock
uv sync
uv run python scripts/diaritranscribe4.py

If you need explicit CUDA PyTorch indexes, use uv’s PyTorch guide:

  • Using uv with PyTorch

PyTorch packaging is unusual because CPU and CUDA builds may live on different indexes and use local version specifiers such as +cpu or +cu130.
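
If you want to see which kind of build actually got installed, you can split the local tag off the version string. The sketch below is illustrative, with hypothetical helper names; note that a version without a local tag does not by itself tell you whether the wheel is CPU-only.

```python
from typing import Optional, Tuple


def split_local_version(version_text: str) -> Tuple[str, Optional[str]]:
    """Split '2.8.0+cu128' into ('2.8.0', 'cu128'); no '+' gives (version, None)."""
    public, sep, local = version_text.partition("+")
    return public, (local if sep else None)


def describe_build(version_text: str) -> str:
    """Rough label for a torch version string."""
    _, local = split_local_version(version_text)
    if local is None:
        return "no local tag"
    if local == "cpu":
        return "CPU build"
    if local.startswith("cu"):
        return "CUDA build"
    return f"other build ({local})"
```

In practice you would call describe_build(torch.__version__) and cross-check it against torch.version.cuda and torch.cuda.is_available().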


Stage 8 — update token handling

Use environment variables rather than hardcoding tokens.

export HF_TOKEN="<HUGGINGFACE_ACCESS_TOKEN>"

Python:

import os

token = os.environ.get("HF_TOKEN")
if not token:
    raise RuntimeError("Set HF_TOKEN.")

Then:

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1",
    token=token,
)

Make sure the token’s Hugging Face account has accepted the model conditions:

  • pyannote/speaker-diarization-community-1
  • Hugging Face access tokens

Missing access usually gives errors like:

401 Unauthorized
403 Forbidden
Repository not found
gated repo

Those are access problems, not API-version problems, so they are different from the old unexpected keyword argument 'token' error.


Stage 9 — account for telemetry

Current pyannote docs mention optional telemetry. The README says it tracks privacy-preserving information such as pipeline origin, pipeline class, file duration, and speaker-count parameters, and documents ways to control it.

Disable for the current process if desired:

export PYANNOTE_METRICS_ENABLED=0

Or in Python:

from pyannote.audio.telemetry import set_telemetry_metrics

set_telemetry_metrics(False)

Source:

  • pyannote.audio README telemetry section

Stage 10 — test accuracy and runtime before deleting Path A

Do not delete the working pyannote 3.3 path until you compare:

  • same audio file;
  • same hardware;
  • same preprocessing;
  • same transcript segments;
  • same speaker-label assignment policy;
  • same output format.

Compare:

speaker count
number of turns
total diarization time
overlap behavior
transcript speaker-label quality
GPU memory use
runtime
failure rate on long files

A migration is successful only if the final speaker-attributed transcript improves or remains acceptable.
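
For the countable items in that list, a small helper can produce the same summary for both paths so you compare like with like. This is a sketch with a hypothetical function name; it assumes you have flattened each path's output into (start, end, speaker) tuples, and it treats the time metric as the summed duration of all turns, which is only one possible reading.

```python
def summarize_turns(turns):
    """turns: [(start, end, speaker)]. Returns simple comparison metrics.

    Overlapping turns are counted once per speaker, so the total can
    exceed the audio length when speakers overlap.
    """
    speakers = sorted({speaker for _, _, speaker in turns})
    per_speaker = {speaker: 0.0 for speaker in speakers}
    for start, end, speaker in turns:
        per_speaker[speaker] += end - start
    return {
        "speaker_count": len(speakers),
        "turn_count": len(turns),
        "total_speech_seconds": sum(per_speaker.values()),
        "seconds_per_speaker": per_speaker,
    }
```

Run it once on the pyannote 3.3 turns and once on the Community-1 turns for the same file, then look at the differences before judging transcript quality by ear.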


Suggested branch layout

Keep two scripts for a while:

diaritranscribe3.py       # recovery path, pyannote 3.3
diaritranscribe4.py       # migration path, pyannote 4 / Community-1

Keep two lockfiles if using inline scripts:

diaritranscribe3.py.lock
diaritranscribe4.py.lock

This prevents accidentally breaking the known-good path while testing the new one.


Minimal diaritranscribe4.py starting point

This is a clean starting point for just the diarization part.

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
#   "pyannote.audio>=4,<5",
#   "torch==2.8.0",
#   "torchaudio==2.8.0",
#   "torchcodec==0.7.*",
# ]
# ///

import argparse
import os
from importlib.metadata import version

import torch
from pyannote.audio import Pipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook

MODEL_ID = "pyannote/speaker-diarization-community-1"


def print_versions():
    print("pyannote.audio:", version("pyannote.audio"))
    print("torch:", torch.__version__)
    print("torch cuda build:", torch.version.cuda)
    print("cuda available:", torch.cuda.is_available())
    print("torchaudio:", version("torchaudio"))
    print("torchcodec:", version("torchcodec"))


def load_pipeline(token: str):
    pipeline = Pipeline.from_pretrained(
        MODEL_ID,
        token=token,
    )

    if torch.cuda.is_available():
        pipeline.to(torch.device("cuda"))

    return pipeline


def run_diarization(audio_path: str):
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("Set HF_TOKEN before running this script.")

    print_versions()
    print(f"Loading {MODEL_ID}...")

    pipeline = load_pipeline(token)

    with ProgressHook() as hook:
        output = pipeline(audio_path, hook=hook)

    return output


def print_diarization(output):
    print("\nRegular speaker diarization:")
    for turn, speaker in output.speaker_diarization:
        print(f"{turn.start:.3f}\t{turn.end:.3f}\t{speaker}")

    print("\nExclusive speaker diarization:")
    if hasattr(output, "exclusive_speaker_diarization"):
        for turn, speaker in output.exclusive_speaker_diarization:
            print(f"{turn.start:.3f}\t{turn.end:.3f}\t{speaker}")
    else:
        print("Not available.")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("audio_path")
    args = parser.parse_args()

    output = run_diarization(args.audio_path)
    print_diarization(output)


if __name__ == "__main__":
    main()

Run:

export HF_TOKEN="<HUGGINGFACE_ACCESS_TOKEN>"
uv run --refresh --script diaritranscribe4.py audio.wav

Lock after it works:

uv lock --script diaritranscribe4.py

Adding faster-whisper back later

After Community-1 works by itself, add faster-whisper back.

# /// script
# requires-python = ">=3.10,<3.14"
# dependencies = [
#   "pyannote.audio>=4,<5",
#   "torch==2.8.0",
#   "torchaudio==2.8.0",
#   "torchcodec==0.7.*",
#   "faster-whisper",
#   "numpy",
#   "scikit-learn",
#   "omegaconf",
# ]
# ///

Then test faster-whisper separately before combining:

from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", beam_size=5)

for segment in segments:
    print(segment.start, segment.end, segment.text)

If faster-whisper fails with CUDA/cuDNN/CTranslate2 errors, that is separate from pyannote.

Source:

  • faster-whisper README

Common Path B failure modes

Failure: No solution found

Usually means you still have a dependency pin like:

brouhaha -> pyannote-audio==3.3.0

Fix:

  • remove Brouhaha from the pyannote 4 environment;
  • fork/update Brouhaha;
  • split environments.

Failure: unexpected keyword argument 'token'

This means you are still on old pyannote.

Check:

from importlib.metadata import version
print(version("pyannote.audio"))

If it prints 3.3.0, you are not on Path B yet.

Failure: unexpected keyword argument 'use_auth_token'

This means you are probably on newer pyannote but still using old code.

Use:

token="<HUGGINGFACE_ACCESS_TOKEN>"

not:

use_auth_token="<HUGGINGFACE_ACCESS_TOKEN>"
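
While both scripts coexist, one way to avoid mixing up the two spellings is to choose the keyword from the installed version. This is a sketch with a hypothetical helper name, and it assumes the rename lines up with the 3.x to 4.x boundary described in this post.

```python
def auth_kwargs(pyannote_version: str, token: str) -> dict:
    """Pick the keyword name by major version.

    Assumption: 'token' for 4.x and later, 'use_auth_token' for 3.x,
    following the rename mentioned in the pyannote.audio release notes.
    """
    major = int(pyannote_version.split(".", 1)[0])
    name = "token" if major >= 4 else "use_auth_token"
    return {name: token}
```

You would then pass **auth_kwargs(version("pyannote.audio"), token) to Pipeline.from_pretrained, with version imported from importlib.metadata. Once Path A is retired, drop the helper and write token=... directly.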

Failure: Could not load libtorchcodec

Check:

  • TorchCodec/Torch version compatibility;
  • FFmpeg installation;
  • Python version;
  • platform wheel availability.

Source:

  • TorchCodec README

Failure: model access denied

Check that you accepted the model conditions and used a valid token:

  • Community-1 model card
  • Hugging Face access tokens

Failure: CUDA not available

Check PyTorch install:

import torch
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())

Use uv’s PyTorch guide for accelerator-specific builds:

  • uv PyTorch guide

Recommended Path B checklist

  1. Create diaritranscribe4.py.
  2. Remove brouhaha from that script.
  3. Use pyannote.audio>=4,<5.
  4. Start with a coherent Torch/TorchAudio/TorchCodec family.
  5. Install FFmpeg.
  6. Accept Community-1 model conditions.
  7. Set HF_TOKEN.
  8. Load with token=....
  9. Use output.speaker_diarization.
  10. Prefer output.exclusive_speaker_diarization for transcript alignment.
  11. Test pyannote alone.
  12. Add faster-whisper back only after pyannote works.
  13. Rebuild speaker assignment around maximum overlap or word-level timestamps.
  14. Lock the migrated script.
  15. Keep the pyannote 3.3 script until the new output is verified.

Bottom line

Path B is not:

change speaker-diarization-3.1 to speaker-diarization-community-1

and it is not:

change use_auth_token= to token=

It is:

remove or isolate the Brouhaha pyannote 3.3 pin
        ↓
move to pyannote.audio 4.x
        ↓
use Community-1
        ↓
install/verify TorchCodec and FFmpeg
        ↓
change the output parsing code
        ↓
use exclusive diarization for transcript alignment
        ↓
lock the new environment

For your project, the safest approach is to keep:

diaritranscribe3.py

as the recovery script and create:

diaritranscribe4.py

as the Community-1 migration script.

Do not merge them until Community-1 works alone, faster-whisper works alone, and the speaker-attributed transcript is at least as good as your pyannote 3.3 path.
