{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreicknpndra7vd4gqwmpupdellv273fsrit5zniqcyyqaokeym6dt2m",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkocwsd2vk22"
},
"path": "/t/module-torchaudio-has-no-attribute-audiometadata/175647#post_6",
"publishedAt": "2026-04-29T23:22:32.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"pyannote.audio GitHub README",
"pyannote/speaker-diarization-community-1 model card",
"pyannote.audio release notes",
"TorchCodec README and compatibility table",
"uv PyTorch guide",
"uv script locking docs",
"Community-1 model card: quick start, GPU, exclusive diarization, offline use",
"pyannote.audio release notes: use_auth_token renamed to token",
"pyannote README: Community-1 usage",
"Community-1 model card",
"pyannote.audio README",
"Hugging Face access tokens",
"TorchCodec README",
"pyannote.audio pyproject.toml",
"TorchCodec README: compatibility table",
"Community-1: exclusive speaker diarization",
"Using uv with PyTorch",
"pyannote/speaker-diarization-community-1",
"pyannote.audio README telemetry section",
"faster-whisper README"
],
"textContent": "Looking ahead, updating the library is really the best course of action, but given your current setup, the migration process is quite complicated:\n\n* * *\n\n# Path B — later migration: use `Community-1` and `pyannote.audio` 4.x\n\n## Short version\n\nPath B means intentionally leaving the old `pyannote.audio==3.3.0` recovery stack and moving to the newer pyannote stack:\n\n\n pyannote.audio 4.x\n pyannote/speaker-diarization-community-1\n Pipeline.from_pretrained(..., token=...)\n output.speaker_diarization\n output.exclusive_speaker_diarization\n TorchCodec-backed audio decoding\n FFmpeg installed\n\n\nThis is not just a one-line model change.\n\nIt is a real migration because your current `brouhaha` dependency pins:\n\n\n pyannote-audio==3.3.0\n\n\nwhile the newer `Community-1` examples expect the newer pyannote API surface:\n\n\n Pipeline.from_pretrained(\n \"pyannote/speaker-diarization-community-1\",\n token=\"<HUGGINGFACE_ACCESS_TOKEN>\",\n )\n\n\nThe current pyannote README shows this `community-1` + `token=...` style and says FFmpeg must be installed because TorchCodec handles audio decoding:\n\n * pyannote.audio GitHub README\n * pyannote/speaker-diarization-community-1 model card\n * pyannote.audio release notes\n * TorchCodec README and compatibility table\n * uv PyTorch guide\n * uv script locking docs\n\n\n\n* * *\n\n## Why you should not do Path B casually\n\nYour current stack has two separate constraints:\n\n\n brouhaha==0.9.0\n ↓\n requires pyannote-audio==3.3.0\n\n\nand:\n\n\n Community-1 / pyannote 4.x examples\n ↓\n use token=...\n use output.speaker_diarization\n use output.exclusive_speaker_diarization\n expect TorchCodec/FFmpeg audio decoding\n\n\nThose are different worlds.\n\nThe pyannote 3.3 recovery world uses:\n\n\n pipeline = Pipeline.from_pretrained(\n \"pyannote/speaker-diarization-3.1\",\n use_auth_token=\"<HUGGINGFACE_ACCESS_TOKEN>\",\n )\n\n diarization = pipeline(\"audio.wav\")\n\n for turn, _, speaker in 
diarization.itertracks(yield_label=True):\n ...\n\n\nThe pyannote 4 / Community-1 world uses:\n\n\n pipeline = Pipeline.from_pretrained(\n \"pyannote/speaker-diarization-community-1\",\n token=\"<HUGGINGFACE_ACCESS_TOKEN>\",\n )\n\n output = pipeline(\"audio.wav\")\n\n for turn, speaker in output.speaker_diarization:\n ...\n\n\nAnd, when available, the newer path also gives:\n\n\n output.exclusive_speaker_diarization\n\n\nThat `exclusive_speaker_diarization` output is especially relevant for your transcription project because the Community-1 model card describes it as simplifying reconciliation between diarization timestamps and transcription timestamps.\n\nSource links:\n\n * Community-1 model card: quick start, GPU, exclusive diarization, offline use\n * pyannote.audio release notes: use_auth_token renamed to token\n * pyannote README: Community-1 usage\n\n\n\n* * *\n\n## What Path B is for\n\nChoose Path B if you want one or more of these:\n\n * newer `pyannote.audio` API;\n * the open-source `pyannote/speaker-diarization-community-1` pipeline;\n * better diarization quality than the old `speaker-diarization-3.1` baseline;\n * easier reconciliation with transcripts using `exclusive_speaker_diarization`;\n * a forward-looking stack instead of living on TorchAudio 2.8 deprecation warnings;\n * a cleaner long-term project layout.\n\n\n\nDo **not** choose Path B if your immediate goal is only:\n\n\n make the old script run with the least changes\n\n\nFor the least-change recovery path, stay with:\n\n\n pyannote.audio==3.3.0\n pyannote/speaker-diarization-3.1\n use_auth_token=...\n torch==2.8.0\n torchaudio==2.8.0\n torchcodec==0.7.*\n\n\nPath B is the better long-term migration, but the worse emergency fix.\n\n* * *\n\n# The main blocker: `brouhaha`\n\n## The problem\n\nYour resolver already told you:\n\n\n brouhaha==0.9.0 depends on pyannote-audio==3.3.0\n\n\nSo this cannot work:\n\n\n \"pyannote.audio>=4,<5\",\n \"brouhaha @ 
file:///home/user/diarization/repos/.venv/brouhaha-vad\",\n\n\nunless you change something about `brouhaha`.\n\nThe resolver is correct. If `brouhaha` requires exactly:\n\n\n pyannote-audio==3.3.0\n\n\nthen the environment cannot also contain:\n\n\n pyannote.audio>=4\n\n\n## Your options\n\nYou have five realistic choices.\n\nOption | What it means | Good if | Risk\n---|---|---|---\nRemove `brouhaha` | Delete it from dependencies and remove/replace its VAD calls. | You do not strictly need Brouhaha VAD. | You may lose the current VAD behavior.\nReplace `brouhaha` | Use pyannote’s own diarization behavior, faster-whisper VAD, Silero VAD, or another VAD stage. | You only used Brouhaha as a helper. | May change segmentation and final transcript quality.\nFork/edit `brouhaha` | Change its dependency metadata from `pyannote-audio==3.3.0` to a looser or newer version. | You control the local package and can test it. | Its code may actually depend on pyannote 3.3 internals.\nSplit environments | Run Brouhaha preprocessing in one script/env, then run pyannote 4 diarization in another script/env. | You need Brouhaha but also want Community-1. | More moving parts and file handoff.\nStay on Path A | Do not migrate now. Keep pyannote 3.3. | You want stability first. | You do not get Community-1 yet.\n\nMy recommendation: **do not start by editing `brouhaha` dependency metadata blindly.**\n\nFirst inspect why it pins pyannote:\n\n\n grep -R \"pyannote\" -n /home/user/diarization/repos/.venv/brouhaha-vad\n\n\nLook for files like:\n\n\n pyproject.toml\n setup.py\n setup.cfg\n requirements.txt\n\n\nThen inspect imports:\n\n\n grep -R \"from pyannote\\|import pyannote\" -n /home/user/diarization/repos/.venv/brouhaha-vad\n\n\nIf Brouhaha only uses public, stable APIs, loosening the pin might work. 
If it uses pyannote internals or pyannote 3.x-specific output structures, expect breakage.\n\n* * *\n\n# Recommended migration strategy\n\nDo not migrate the production script all at once.\n\nUse a three-phase migration. The numbered stages later in this post break these phases into smaller steps.\n\n\n Phase 1: build a tiny Community-1 proof-of-life script\n Phase 2: port only diarization code\n Phase 3: reintegrate transcription, VAD, and speaker-label alignment\n\n\nThis prevents one common failure mode:\n\n\n changed model + changed pyannote version + changed TorchCodec + changed FFmpeg + changed CUDA + changed VAD + changed transcript alignment\n ↓\n too many variables\n ↓\n impossible to tell what broke\n\n\n* * *\n\n# Stage 1 — prove Community-1 works by itself\n\nCreate a new test file, separate from `diaritranscribe3.py`.\n\nFor example:\n\n\n check_pyannote4_community1.py\n\n\nUse this as a minimal proof-of-life script:\n\n\n #!/usr/bin/env -S uv run --script\n # /// script\n # requires-python = \">=3.10,<3.14\"\n # dependencies = [\n # \"pyannote.audio>=4,<5\",\n # \"torch\",\n # \"torchaudio\",\n # \"torchcodec\",\n # ]\n # ///\n\n import os\n from importlib.metadata import version\n\n import torch\n from pyannote.audio import Pipeline\n from pyannote.audio.pipelines.utils.hook import ProgressHook\n\n MODEL_ID = \"pyannote/speaker-diarization-community-1\"\n AUDIO_PATH = \"audio.wav\"\n\n token = os.environ.get(\"HF_TOKEN\")\n if not token:\n raise RuntimeError(\"Set HF_TOKEN before running this script.\")\n\n print(\"pyannote.audio:\", version(\"pyannote.audio\"))\n print(\"torch:\", torch.__version__)\n print(\"torch cuda build:\", torch.version.cuda)\n print(\"cuda available:\", torch.cuda.is_available())\n print(\"torchaudio:\", version(\"torchaudio\"))\n print(\"torchcodec:\", version(\"torchcodec\"))\n\n pipeline = Pipeline.from_pretrained(\n MODEL_ID,\n token=token,\n )\n\n if torch.cuda.is_available():\n pipeline.to(torch.device(\"cuda\"))\n\n with ProgressHook() as hook:\n output = pipeline(AUDIO_PATH, hook=hook)\n\n 
print(\"\\nRegular diarization:\")\n for turn, speaker in output.speaker_diarization:\n print(f\"{turn.start:.3f}\\t{turn.end:.3f}\\t{speaker}\")\n\n print(\"\\nExclusive diarization:\")\n if hasattr(output, \"exclusive_speaker_diarization\"):\n for turn, speaker in output.exclusive_speaker_diarization:\n print(f\"{turn.start:.3f}\\t{turn.end:.3f}\\t{speaker}\")\n else:\n print(\"exclusive_speaker_diarization is not available on this output.\")\n\n\nRun it like:\n\n\n export HF_TOKEN=\"<HUGGINGFACE_ACCESS_TOKEN>\"\n uv run --refresh --script check_pyannote4_community1.py\n\n\nBefore running it, make sure:\n\n 1. you accepted the Community-1 user conditions;\n 2. your token can access the model;\n 3. FFmpeg is installed;\n 4. the test file `audio.wav` exists.\n\n\n\nRelevant setup docs:\n\n * Community-1 model card\n * pyannote.audio README\n * Hugging Face access tokens\n * TorchCodec README\n\n\n\n* * *\n\n# Stage 2 — choose a coherent Torch/TorchCodec version family\n\nThe current pyannote project metadata says the modern branch requires:\n\n\n Python >=3.10\n torch >=2.8.0\n torchaudio >=2.8.0\n torchcodec >=0.7.0\n\n\nSource:\n\n * pyannote.audio pyproject.toml\n\n\n\nBut “greater than or equal” does not mean every arbitrary combination is equally good.\n\nTorchCodec publishes a compatibility table. 
Current table highlights include:\n\n\n torchcodec 0.7 ↔ torch 2.8\n torchcodec 0.8 ↔ torch 2.9\n torchcodec 0.9 ↔ torch 2.9\n torchcodec 0.10 ↔ torch 2.10\n torchcodec 0.11 ↔ torch 2.11\n\n\nSource:\n\n * TorchCodec README: compatibility table\n\n\n\nSo do not mix randomly.\n\n## Conservative modern family\n\nThis is the least aggressive Community-1 migration target:\n\n\n pyannote.audio>=4,<5\n torch==2.8.0\n torchaudio==2.8.0\n torchcodec==0.7.*\n\n\nPros:\n\n * close to the minimum modern pyannote requirements;\n * avoids jumping all the way to newer Torch/TorchAudio generations;\n * TorchCodec `0.7` matches Torch `2.8`;\n * likely easier if the rest of your audio stack was stabilized around Torch 2.8.\n\n\n\nCons:\n\n * still close to the old TorchAudio transition boundary;\n * may not represent the newest pyannote-tested stack.\n\n\n\n## Newer Torch family\n\nA newer family might look like:\n\n\n pyannote.audio>=4,<5\n torch==2.9.*\n torchaudio==2.9.*\n torchcodec==0.9.*\n\n\nor:\n\n\n pyannote.audio>=4,<5\n torch==2.10.*\n torchaudio==2.10.*\n torchcodec==0.10.*\n\n\nPros:\n\n * more aligned with the post-TorchAudio-2.9 world;\n * better long-term direction if your other dependencies support it.\n\n\n\nCons:\n\n * may expose TorchCodec/FFmpeg issues;\n * may conflict with faster-whisper/CTranslate2 expectations;\n * may require more careful PyTorch CUDA wheel/index selection.\n\n\n\n## Practical advice\n\nFor a migration branch, start with the conservative modern family:\n\n\n \"pyannote.audio>=4,<5\",\n \"torch==2.8.0\",\n \"torchaudio==2.8.0\",\n \"torchcodec==0.7.*\",\n\n\nThen, after Community-1 works, decide whether to move Torch upward.\n\nDo not solve every modernization problem at once.\n\n* * *\n\n# Stage 3 — remove or isolate `brouhaha`\n\nBecause `brouhaha` pins pyannote 3.3, your Community-1 test script should **not** include Brouhaha.\n\nFor Path B, the dependency block should start without it:\n\n\n #!/usr/bin/env -S uv run --script\n # /// 
script\n # requires-python = \">=3.10,<3.14\"\n # dependencies = [\n # \"pyannote.audio>=4,<5\",\n # \"torch==2.8.0\",\n # \"torchaudio==2.8.0\",\n # \"torchcodec==0.7.*\",\n # ]\n # ///\n\n\nOnly after Community-1 works should you decide what to do with Brouhaha.\n\n## If you remove Brouhaha\n\nDelete:\n\n\n \"brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad\",\n\n\nand remove code like:\n\n\n import brouhaha\n\n\nor any function calls into Brouhaha.\n\nThen rely on pyannote diarization directly, or use another VAD/preprocessing layer.\n\n## If you fork Brouhaha\n\nEdit its dependency metadata.\n\nFor example, if its `pyproject.toml` contains:\n\n\n dependencies = [\n \"pyannote-audio==3.3.0\",\n ]\n\n\nyou could test:\n\n\n dependencies = [\n \"pyannote-audio>=4,<5\",\n ]\n\n\nor, if Brouhaha does not actually need pyannote at runtime after your refactor:\n\n\n dependencies = []\n\n\nBut do this only in a branch or copy.\n\nThen run its own tests, or at least import it:\n\n\n uv run --refresh --script check_brouhaha_import.py\n\n\nwhere:\n\n\n #!/usr/bin/env -S uv run --script\n # /// script\n # requires-python = \">=3.10,<3.14\"\n # dependencies = [\n # \"brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad\",\n # \"pyannote.audio>=4,<5\",\n # ]\n # ///\n\n import brouhaha\n from importlib.metadata import version\n\n print(\"brouhaha import OK\")\n print(\"pyannote.audio:\", version(\"pyannote.audio\"))\n\n\nIf this fails, Brouhaha is not pyannote-4-compatible yet.\n\n## If you split environments\n\nUse two scripts.\n\nFirst script:\n\n\n vad_preprocess.py\n\n\nuses Brouhaha and pyannote 3.3 if needed.\n\nSecond script:\n\n\n diarize_community1.py\n\n\nuses pyannote 4 and Community-1.\n\nThe handoff should be a file, JSON, RTTM, or plain timestamp list. 
This is clunkier, but it avoids forcing incompatible libraries into one dependency graph.\n\n* * *\n\n# Stage 4 — update the pyannote call\n\nOld Path A code:\n\n\n pipeline = Pipeline.from_pretrained(\n \"pyannote/speaker-diarization-3.1\",\n use_auth_token=tokens[\"diarization\"],\n )\n\n diarization = pipeline(audio_path)\n\n for turn, _, speaker in diarization.itertracks(yield_label=True):\n ...\n\n\nNew Path B code:\n\n\n pipeline = Pipeline.from_pretrained(\n \"pyannote/speaker-diarization-community-1\",\n token=tokens[\"diarization\"],\n )\n\n output = pipeline(audio_path)\n\n for turn, speaker in output.speaker_diarization:\n ...\n\n\nAnd, for transcript alignment, prefer testing:\n\n\n for turn, speaker in output.exclusive_speaker_diarization:\n ...\n\n\nThe current Community-1 model card says `exclusive_speaker_diarization` is provided on top of regular diarization and is meant to simplify reconciliation with transcription timestamps.\n\nSource:\n\n * Community-1: exclusive speaker diarization\n\n\n\n* * *\n\n# Stage 5 — rewrite speaker/transcript alignment around exclusive diarization\n\nThis is the most important practical benefit for your script.\n\nYour final goal is not just diarization. Your goal is:\n\n\n audio file\n ↓\n transcript segments or words\n ↓\n speaker labels\n ↓\n speaker-attributed transcript\n\n\nOld diarization can produce fine-grained, overlapping, or awkward speaker turns. 
That can be hard to align to Whisper/faster-whisper transcript segments.\n\nCommunity-1 adds:\n\n\n output.exclusive_speaker_diarization\n\n\nUse that first for transcript alignment.\n\n## Basic maximum-overlap assignment\n\nUse this when your ASR gives segment-level timestamps.\n\n\n def overlap_seconds(a_start, a_end, b_start, b_end):\n return max(0.0, min(a_end, b_end) - max(a_start, b_start))\n\n\n def assign_speaker_to_segment(segment_start, segment_end, diarization_turns):\n best_speaker = None\n best_overlap = 0.0\n\n for turn_start, turn_end, speaker in diarization_turns:\n overlap = overlap_seconds(segment_start, segment_end, turn_start, turn_end)\n if overlap > best_overlap:\n best_overlap = overlap\n best_speaker = speaker\n\n return best_speaker or \"UNKNOWN\"\n\n\n def diarization_to_turns(exclusive_speaker_diarization):\n turns = []\n for turn, speaker in exclusive_speaker_diarization:\n turns.append((float(turn.start), float(turn.end), str(speaker)))\n return turns\n\n\nThen:\n\n\n turns = diarization_to_turns(output.exclusive_speaker_diarization)\n\n for segment in whisper_segments:\n speaker = assign_speaker_to_segment(segment.start, segment.end, turns)\n print(f\"[{segment.start:.2f}-{segment.end:.2f}] {speaker}: {segment.text}\")\n\n\n## Word-level assignment\n\nIf faster-whisper returns word timestamps, word-level assignment is usually better.\n\nConceptually:\n\n\n for each word:\n find the speaker turn with max overlap\n assign that speaker to the word\n then merge adjacent words with the same speaker\n\n\nThis handles speaker changes inside a long ASR segment better than assigning one speaker to the whole segment.\n\n* * *\n\n# Stage 6 — verify FFmpeg and TorchCodec\n\nCommunity-1 uses TorchCodec-backed decoding. 
The pyannote README explicitly says FFmpeg must be installed because TorchCodec handles audio decoding.\n\nCheck FFmpeg:\n\n\n ffmpeg -version\n\n\nCheck TorchCodec import:\n\n\n import torchcodec\n print(\"torchcodec import OK\")\n\n\nCheck versions:\n\n\n from importlib.metadata import version\n import torch\n\n print(\"torch:\", torch.__version__)\n print(\"torchcodec:\", version(\"torchcodec\"))\n\n\nTorchCodec supports FFmpeg major versions in `[4, 8]`, and on Windows it needs FFmpeg builds with separate shared libraries. The TorchCodec README also provides the TorchCodec/Torch/Python compatibility table.\n\nSource:\n\n * TorchCodec README\n\n\n\n## If TorchCodec fails\n\nCommon error shapes:\n\n\n RuntimeError: Could not load libtorchcodec\n\n\n\n FFmpeg is not properly installed\n\n\n\n No compatible FFmpeg found\n\n\nLikely causes:\n\n * FFmpeg missing;\n * FFmpeg installed but not visible on `PATH`;\n * Windows FFmpeg build is not a shared build;\n * TorchCodec version does not match Torch version;\n * Python version is outside the wheel’s supported range;\n * unsupported architecture, especially Linux ARM64/aarch64.\n\n\n\nCheck the compatibility table before changing random packages.\n\n* * *\n\n# Stage 7 — choose uv layout: inline script vs project\n\nYou can do Path B with inline script metadata, but a project layout is cleaner once you are juggling:\n\n\n pyannote.audio\n torch\n torchaudio\n torchcodec\n faster-whisper\n ctranslate2\n ffmpeg\n CUDA\n tokens\n local packages\n\n\n## Inline script version\n\nGood for quick experiments:\n\n\n #!/usr/bin/env -S uv run --script\n # /// script\n # requires-python = \">=3.10,<3.14\"\n # dependencies = [\n # \"pyannote.audio>=4,<5\",\n # \"torch==2.8.0\",\n # \"torchaudio==2.8.0\",\n # \"torchcodec==0.7.*\",\n # ]\n # ///\n\n from pyannote.audio import Pipeline\n\n\nLock after success:\n\n\n uv lock --script check_pyannote4_community1.py\n\n\nSource:\n\n * uv script locking docs\n\n\n\n## Project 
version\n\nBetter for the real app.\n\n`pyproject.toml`:\n\n\n [project]\n name = \"diaritranscribe\"\n version = \"0.1.0\"\n requires-python = \">=3.10,<3.14\"\n dependencies = [\n \"pyannote.audio>=4,<5\",\n \"faster-whisper\",\n \"numpy\",\n \"scikit-learn\",\n \"omegaconf\",\n \"torch==2.8.0\",\n \"torchaudio==2.8.0\",\n \"torchcodec==0.7.*\",\n ]\n\n [tool.uv]\n required-version = \">=0.5.3\"\n\n\nThen:\n\n\n uv lock\n uv sync\n uv run python scripts/diaritranscribe4.py\n\n\nIf you need explicit CUDA PyTorch indexes, use uv’s PyTorch guide:\n\n * Using uv with PyTorch\n\n\n\nPyTorch packaging is unusual because CPU and CUDA builds may live on different indexes and use local version specifiers such as `+cpu` or `+cu130`.\n\n* * *\n\n# Stage 8 — update token handling\n\nUse environment variables rather than hardcoding tokens.\n\n\n export HF_TOKEN=\"<HUGGINGFACE_ACCESS_TOKEN>\"\n\n\nPython:\n\n\n import os\n\n token = os.environ.get(\"HF_TOKEN\")\n if not token:\n raise RuntimeError(\"Set HF_TOKEN.\")\n\n\nThen:\n\n\n pipeline = Pipeline.from_pretrained(\n \"pyannote/speaker-diarization-community-1\",\n token=token,\n )\n\n\nMake sure the token’s Hugging Face account has accepted the model conditions:\n\n * pyannote/speaker-diarization-community-1\n * Hugging Face access tokens\n\n\n\nMissing access usually gives errors like:\n\n\n 401 Unauthorized\n 403 Forbidden\n Repository not found\n gated repo\n\n\nThose are different from the old `unexpected keyword argument 'token'` error.\n\n* * *\n\n# Stage 9 — account for telemetry\n\nCurrent pyannote docs mention optional telemetry. 
The README says it tracks privacy-preserving information such as pipeline origin, pipeline class, file duration, and speaker-count parameters, and documents ways to control it.\n\nDisable for the current process if desired:\n\n\n export PYANNOTE_METRICS_ENABLED=0\n\n\nOr in Python:\n\n\n from pyannote.audio.telemetry import set_telemetry_metrics\n\n set_telemetry_metrics(False)\n\n\nSource:\n\n * pyannote.audio README telemetry section\n\n\n\n* * *\n\n# Stage 10 — test accuracy and runtime before deleting Path A\n\nDo not delete the working pyannote 3.3 path until you compare:\n\n * same audio file;\n * same hardware;\n * same preprocessing;\n * same transcript segments;\n * same speaker-label assignment policy;\n * same output format.\n\n\n\nCompare:\n\n\n speaker count\n number of turns\n total diarization time\n overlap behavior\n transcript speaker-label quality\n GPU memory use\n runtime\n failure rate on long files\n\n\nA migration is successful only if the final speaker-attributed transcript improves or remains acceptable.\n\n* * *\n\n# Suggested branch layout\n\nKeep two scripts for a while:\n\n\n diaritranscribe3.py # recovery path, pyannote 3.3\n diaritranscribe4.py # migration path, pyannote 4 / Community-1\n\n\nKeep two lockfiles if using inline scripts:\n\n\n diaritranscribe3.py.lock\n diaritranscribe4.py.lock\n\n\nThis prevents accidentally breaking the known-good path while testing the new one.\n\n* * *\n\n# Minimal `diaritranscribe4.py` starting point\n\nThis is a clean starting point for just the diarization part.\n\n\n #!/usr/bin/env -S uv run --script\n # /// script\n # requires-python = \">=3.10,<3.14\"\n # dependencies = [\n # \"pyannote.audio>=4,<5\",\n # \"torch==2.8.0\",\n # \"torchaudio==2.8.0\",\n # \"torchcodec==0.7.*\",\n # ]\n # ///\n\n import argparse\n import os\n from importlib.metadata import version\n\n import torch\n from pyannote.audio import Pipeline\n from pyannote.audio.pipelines.utils.hook import ProgressHook\n\n MODEL_ID = 
\"pyannote/speaker-diarization-community-1\"\n\n\n def print_versions():\n print(\"pyannote.audio:\", version(\"pyannote.audio\"))\n print(\"torch:\", torch.__version__)\n print(\"torch cuda build:\", torch.version.cuda)\n print(\"cuda available:\", torch.cuda.is_available())\n print(\"torchaudio:\", version(\"torchaudio\"))\n print(\"torchcodec:\", version(\"torchcodec\"))\n\n\n def load_pipeline(token: str):\n pipeline = Pipeline.from_pretrained(\n MODEL_ID,\n token=token,\n )\n\n if torch.cuda.is_available():\n pipeline.to(torch.device(\"cuda\"))\n\n return pipeline\n\n\n def run_diarization(audio_path: str):\n token = os.environ.get(\"HF_TOKEN\")\n if not token:\n raise RuntimeError(\"Set HF_TOKEN before running this script.\")\n\n print_versions()\n print(f\"Loading {MODEL_ID}...\")\n\n pipeline = load_pipeline(token)\n\n with ProgressHook() as hook:\n output = pipeline(audio_path, hook=hook)\n\n return output\n\n\n def print_diarization(output):\n print(\"\\nRegular speaker diarization:\")\n for turn, speaker in output.speaker_diarization:\n print(f\"{turn.start:.3f}\\t{turn.end:.3f}\\t{speaker}\")\n\n print(\"\\nExclusive speaker diarization:\")\n if hasattr(output, \"exclusive_speaker_diarization\"):\n for turn, speaker in output.exclusive_speaker_diarization:\n print(f\"{turn.start:.3f}\\t{turn.end:.3f}\\t{speaker}\")\n else:\n print(\"Not available.\")\n\n\n def main():\n parser = argparse.ArgumentParser()\n parser.add_argument(\"audio_path\")\n args = parser.parse_args()\n\n output = run_diarization(args.audio_path)\n print_diarization(output)\n\n\n if __name__ == \"__main__\":\n main()\n\n\nRun:\n\n\n export HF_TOKEN=\"<HUGGINGFACE_ACCESS_TOKEN>\"\n uv run --refresh --script diaritranscribe4.py audio.wav\n\n\nLock after it works:\n\n\n uv lock --script diaritranscribe4.py\n\n\n* * *\n\n# Adding faster-whisper back later\n\nAfter Community-1 works by itself, add faster-whisper back.\n\n\n # /// script\n # requires-python = \">=3.10,<3.14\"\n # 
dependencies = [\n # \"pyannote.audio>=4,<5\",\n # \"torch==2.8.0\",\n # \"torchaudio==2.8.0\",\n # \"torchcodec==0.7.*\",\n # \"faster-whisper\",\n # \"numpy\",\n # \"scikit-learn\",\n # \"omegaconf\",\n # ]\n # ///\n\n\nThen test faster-whisper separately before combining:\n\n\n from faster_whisper import WhisperModel\n\n model = WhisperModel(\"small\", device=\"cuda\", compute_type=\"float16\")\n segments, info = model.transcribe(\"audio.wav\", beam_size=5)\n\n for segment in segments:\n print(segment.start, segment.end, segment.text)\n\n\nIf faster-whisper fails with CUDA/cuDNN/CTranslate2 errors, that is separate from pyannote.\n\nSource:\n\n * faster-whisper README\n\n\n\n* * *\n\n# Common Path B failure modes\n\n## Failure: `No solution found`\n\nUsually means you still have a dependency pin like:\n\n\n brouhaha -> pyannote-audio==3.3.0\n\n\nFix:\n\n * remove Brouhaha from the pyannote 4 environment;\n * fork/update Brouhaha;\n * split environments.\n\n\n\n## Failure: `unexpected keyword argument 'token'`\n\nThis means you are still on old pyannote.\n\nCheck:\n\n\n from importlib.metadata import version\n print(version(\"pyannote.audio\"))\n\n\nIf it prints `3.3.0`, you are not on Path B yet.\n\n## Failure: `unexpected keyword argument 'use_auth_token'`\n\nThis means you are probably on newer pyannote but still using old code.\n\nUse:\n\n\n token=\"<HUGGINGFACE_ACCESS_TOKEN>\"\n\n\nnot:\n\n\n use_auth_token=\"<HUGGINGFACE_ACCESS_TOKEN>\"\n\n\n## Failure: `Could not load libtorchcodec`\n\nCheck:\n\n * TorchCodec/Torch version compatibility;\n * FFmpeg installation;\n * Python version;\n * platform wheel availability.\n\n\n\nSource:\n\n * TorchCodec README\n\n\n\n## Failure: model access denied\n\nCheck that you accepted the model conditions and used a valid token:\n\n * Community-1 model card\n * Hugging Face access tokens\n\n\n\n## Failure: CUDA not available\n\nCheck PyTorch install:\n\n\n import torch\n print(torch.__version__)\n 
print(torch.version.cuda)\n print(torch.cuda.is_available())\n\n\nUse uv’s PyTorch guide for accelerator-specific builds:\n\n * uv PyTorch guide\n\n\n\n* * *\n\n# Recommended Path B checklist\n\n 1. Create `diaritranscribe4.py`.\n 2. Remove `brouhaha` from that script.\n 3. Use `pyannote.audio>=4,<5`.\n 4. Start with a coherent Torch/TorchAudio/TorchCodec family.\n 5. Install FFmpeg.\n 6. Accept Community-1 model conditions.\n 7. Set `HF_TOKEN`.\n 8. Load with `token=...`.\n 9. Use `output.speaker_diarization`.\n 10. Prefer `output.exclusive_speaker_diarization` for transcript alignment.\n 11. Test pyannote alone.\n 12. Add faster-whisper back only after pyannote works.\n 13. Rebuild speaker assignment around maximum overlap or word-level timestamps.\n 14. Lock the migrated script.\n 15. Keep the pyannote 3.3 script until the new output is verified.\n\n\n\n* * *\n\n# Bottom line\n\nPath B is not:\n\n\n change speaker-diarization-3.1 to speaker-diarization-community-1\n\n\nand it is not:\n\n\n change use_auth_token= to token=\n\n\nIt is:\n\n\n remove or isolate the Brouhaha pyannote 3.3 pin\n ↓\n move to pyannote.audio 4.x\n ↓\n use Community-1\n ↓\n install/verify TorchCodec and FFmpeg\n ↓\n change the output parsing code\n ↓\n use exclusive diarization for transcript alignment\n ↓\n lock the new environment\n\n\nFor your project, the safest approach is to keep:\n\n\n diaritranscribe3.py\n\n\nas the recovery script and create:\n\n\n diaritranscribe4.py\n\n\nas the Community-1 migration script.\n\nDo not merge them until Community-1 works alone, faster-whisper works alone, and the speaker-attributed transcript is at least as good as your pyannote 3.3 path.",
"title": "Module 'torchaudio' has no attribute 'AudioMetaData'"
}