{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicknpndra7vd4gqwmpupdellv273fsrit5zniqcyyqaokeym6dt2m",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkocwsd2vk22"
  },
  "path": "/t/module-torchaudio-has-no-attribute-audiometadata/175647#post_6",
  "publishedAt": "2026-04-29T23:22:32.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "pyannote.audio GitHub README",
    "pyannote/speaker-diarization-community-1 model card",
    "pyannote.audio release notes",
    "TorchCodec README and compatibility table",
    "uv PyTorch guide",
    "uv script locking docs",
    "Community-1 model card: quick start, GPU, exclusive diarization, offline use",
    "pyannote.audio release notes: use_auth_token renamed to token",
    "pyannote README: Community-1 usage",
    "Community-1 model card",
    "pyannote.audio README",
    "Hugging Face access tokens",
    "TorchCodec README",
    "pyannote.audio pyproject.toml",
    "TorchCodec README: compatibility table",
    "Community-1: exclusive speaker diarization",
    "Using uv with PyTorch",
    "pyannote/speaker-diarization-community-1",
    "pyannote.audio README telemetry section",
    "faster-whisper README"
  ],
  "textContent": "Looking ahead, updating the library is really the best course of action, but given your current setup, the migration process is quite complicated:\n\n* * *\n\n# Path B — later migration: use `Community-1` and `pyannote.audio` 4.x\n\n## Short version\n\nPath B means intentionally leaving the old `pyannote.audio==3.3.0` recovery stack and moving to the newer pyannote stack:\n\n\n    pyannote.audio 4.x\n    pyannote/speaker-diarization-community-1\n    Pipeline.from_pretrained(..., token=...)\n    output.speaker_diarization\n    output.exclusive_speaker_diarization\n    TorchCodec-backed audio decoding\n    FFmpeg installed\n\n\nThis is not just a one-line model change.\n\nIt is a real migration because your current `brouhaha` dependency pins:\n\n\n    pyannote-audio==3.3.0\n\n\nwhile the newer `Community-1` examples expect the newer pyannote API surface:\n\n\n    Pipeline.from_pretrained(\n        \"pyannote/speaker-diarization-community-1\",\n        token=\"<HUGGINGFACE_ACCESS_TOKEN>\",\n    )\n\n\nThe current pyannote README shows this `community-1` + `token=...` style and says FFmpeg must be installed because TorchCodec handles audio decoding:\n\n  * pyannote.audio GitHub README\n  * pyannote/speaker-diarization-community-1 model card\n  * pyannote.audio release notes\n  * TorchCodec README and compatibility table\n  * uv PyTorch guide\n  * uv script locking docs\n\n\n\n* * *\n\n## Why you should not do Path B casually\n\nYour current stack has two separate constraints:\n\n\n    brouhaha==0.9.0\n            ↓\n    requires pyannote-audio==3.3.0\n\n\nand:\n\n\n    Community-1 / pyannote 4.x examples\n            ↓\n    use token=...\n    use output.speaker_diarization\n    use output.exclusive_speaker_diarization\n    expect TorchCodec/FFmpeg audio decoding\n\n\nThose are different worlds.\n\nThe pyannote 3.3 recovery world uses:\n\n\n    pipeline = Pipeline.from_pretrained(\n        \"pyannote/speaker-diarization-3.1\",\n        
use_auth_token=\"<HUGGINGFACE_ACCESS_TOKEN>\",\n    )\n\n    diarization = pipeline(\"audio.wav\")\n\n    for turn, _, speaker in diarization.itertracks(yield_label=True):\n        ...\n\n\nThe pyannote 4 / Community-1 world uses:\n\n\n    pipeline = Pipeline.from_pretrained(\n        \"pyannote/speaker-diarization-community-1\",\n        token=\"<HUGGINGFACE_ACCESS_TOKEN>\",\n    )\n\n    output = pipeline(\"audio.wav\")\n\n    for turn, speaker in output.speaker_diarization:\n        ...\n\n\nAnd, when available, the newer path also gives:\n\n\n    output.exclusive_speaker_diarization\n\n\nThat `exclusive_speaker_diarization` output is especially relevant for your transcription project because the Community-1 model card describes it as simplifying reconciliation between diarization timestamps and transcription timestamps.\n\nSource links:\n\n  * Community-1 model card: quick start, GPU, exclusive diarization, offline use\n  * pyannote.audio release notes: use_auth_token renamed to token\n  * pyannote README: Community-1 usage\n\n\n\n* * *\n\n## What Path B is for\n\nChoose Path B if you want one or more of these:\n\n  * newer `pyannote.audio` API;\n  * the open-source `pyannote/speaker-diarization-community-1` pipeline;\n  * better diarization quality than the old `speaker-diarization-3.1` baseline;\n  * easier reconciliation with transcripts using `exclusive_speaker_diarization`;\n  * a forward-looking stack instead of living on TorchAudio 2.8 deprecation warnings;\n  * a cleaner long-term project layout.\n\n\n\nDo **not** choose Path B if your immediate goal is only:\n\n\n    make the old script run with the least changes\n\n\nFor the least-change recovery path, stay with:\n\n\n    pyannote.audio==3.3.0\n    pyannote/speaker-diarization-3.1\n    use_auth_token=...\n    torch==2.8.0\n    torchaudio==2.8.0\n    torchcodec==0.7.*\n\n\nPath B is the better long-term migration, but the worse emergency fix.\n\n* * *\n\n# The main blocker: `brouhaha`\n\n## The 
problem\n\nYour resolver already told you:\n\n\n    brouhaha==0.9.0 depends on pyannote-audio==3.3.0\n\n\nSo this cannot work:\n\n\n    \"pyannote.audio>=4,<5\",\n    \"brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad\",\n\n\nunless you change something about `brouhaha`.\n\nThe resolver is correct. If `brouhaha` requires exactly:\n\n\n    pyannote-audio==3.3.0\n\n\nthen the environment cannot also contain:\n\n\n    pyannote.audio>=4\n\n\n## Your options\n\nYou have five realistic choices.\n\nOption | What it means | Good if | Risk\n---|---|---|---\nRemove `brouhaha` | Delete it from dependencies and remove/replace its VAD calls. | You do not strictly need Brouhaha VAD. | You may lose the current VAD behavior.\nReplace `brouhaha` | Use pyannote’s own diarization behavior, faster-whisper VAD, Silero VAD, or another VAD stage. | You only used Brouhaha as a helper. | May change segmentation and final transcript quality.\nFork/edit `brouhaha` | Change its dependency metadata from `pyannote-audio==3.3.0` to a looser or newer version. | You control the local package and can test it. | Its code may actually depend on pyannote 3.3 internals.\nSplit environments | Run Brouhaha preprocessing in one script/env, then run pyannote 4 diarization in another script/env. | You need Brouhaha but also want Community-1. | More moving parts and file handoff.\nStay on Path A | Do not migrate now. Keep pyannote 3.3. | You want stability first. 
| You do not get Community-1 yet.\n\nMy recommendation: **do not start by editing `brouhaha` dependency metadata blindly.**\n\nFirst inspect why it pins pyannote:\n\n\n    grep -R \"pyannote\" -n /home/user/diarization/repos/.venv/brouhaha-vad\n\n\nLook for files like:\n\n\n    pyproject.toml\n    setup.py\n    setup.cfg\n    requirements.txt\n\n\nThen inspect imports:\n\n\n    grep -R \"from pyannote\\|import pyannote\" -n /home/user/diarization/repos/.venv/brouhaha-vad\n\n\nIf Brouhaha only uses public, stable APIs, loosening the pin might work. If it uses pyannote internals or pyannote 3.x-specific output structures, expect breakage.\n\n* * *\n\n# Recommended migration strategy\n\nDo not migrate the production script all at once.\n\nAt a high level, the work falls into three phases, which the numbered stages below break down further:\n\n\n    Phase 1: build a tiny Community-1 proof-of-life script\n    Phase 2: port only diarization code\n    Phase 3: reintegrate transcription, VAD, and speaker-label alignment\n\n\nThis prevents one common failure mode:\n\n\n    changed model + changed pyannote version + changed TorchCodec + changed FFmpeg + changed CUDA + changed VAD + changed transcript alignment\n            ↓\n    too many variables\n            ↓\n    impossible to tell what broke\n\n\n* * *\n\n# Stage 1 — prove Community-1 works by itself\n\nCreate a new test file, separate from `diaritranscribe3.py`.\n\nFor example:\n\n\n    check_pyannote4_community1.py\n\n\nUse this as a minimal proof-of-life script:\n\n\n    #!/usr/bin/env -S uv run --script\n    # /// script\n    # requires-python = \">=3.10,<3.14\"\n    # dependencies = [\n    #   \"pyannote.audio>=4,<5\",\n    #   \"torch\",\n    #   \"torchaudio\",\n    #   \"torchcodec\",\n    # ]\n    # ///\n\n    import os\n    from importlib.metadata import version\n\n    import torch\n    from pyannote.audio import Pipeline\n    from pyannote.audio.pipelines.utils.hook import ProgressHook\n\n    MODEL_ID = \"pyannote/speaker-diarization-community-1\"\n    AUDIO_PATH = 
\"audio.wav\"\n\n    token = os.environ.get(\"HF_TOKEN\")\n    if not token:\n        raise RuntimeError(\"Set HF_TOKEN before running this script.\")\n\n    print(\"pyannote.audio:\", version(\"pyannote.audio\"))\n    print(\"torch:\", torch.__version__)\n    print(\"torch cuda build:\", torch.version.cuda)\n    print(\"cuda available:\", torch.cuda.is_available())\n    print(\"torchaudio:\", version(\"torchaudio\"))\n    print(\"torchcodec:\", version(\"torchcodec\"))\n\n    pipeline = Pipeline.from_pretrained(\n        MODEL_ID,\n        token=token,\n    )\n\n    if torch.cuda.is_available():\n        pipeline.to(torch.device(\"cuda\"))\n\n    with ProgressHook() as hook:\n        output = pipeline(AUDIO_PATH, hook=hook)\n\n    print(\"\\nRegular diarization:\")\n    for turn, speaker in output.speaker_diarization:\n        print(f\"{turn.start:.3f}\\t{turn.end:.3f}\\t{speaker}\")\n\n    print(\"\\nExclusive diarization:\")\n    if hasattr(output, \"exclusive_speaker_diarization\"):\n        for turn, speaker in output.exclusive_speaker_diarization:\n            print(f\"{turn.start:.3f}\\t{turn.end:.3f}\\t{speaker}\")\n    else:\n        print(\"exclusive_speaker_diarization is not available on this output.\")\n\n\nRun it like:\n\n\n    export HF_TOKEN=\"<HUGGINGFACE_ACCESS_TOKEN>\"\n    uv run --refresh --script check_pyannote4_community1.py\n\n\nBefore running it, make sure:\n\n  1. you accepted the Community-1 user conditions;\n  2. your token can access the model;\n  3. FFmpeg is installed;\n  4. 
the test file `audio.wav` exists.\n\n\n\nRelevant setup docs:\n\n  * Community-1 model card\n  * pyannote.audio README\n  * Hugging Face access tokens\n  * TorchCodec README\n\n\n\n* * *\n\n# Stage 2 — choose a coherent Torch/TorchCodec version family\n\nThe current pyannote project metadata says the modern branch requires:\n\n\n    Python >=3.10\n    torch >=2.8.0\n    torchaudio >=2.8.0\n    torchcodec >=0.7.0\n\n\nSource:\n\n  * pyannote.audio pyproject.toml\n\n\n\nBut “greater than or equal” does not mean every arbitrary combination is equally good.\n\nTorchCodec publishes a compatibility table. Current table highlights include:\n\n\n    torchcodec 0.7  ↔ torch 2.8\n    torchcodec 0.8  ↔ torch 2.9\n    torchcodec 0.9  ↔ torch 2.9\n    torchcodec 0.10 ↔ torch 2.10\n    torchcodec 0.11 ↔ torch 2.11\n\n\nSource:\n\n  * TorchCodec README: compatibility table\n\n\n\nSo do not mix randomly.\n\n## Conservative modern family\n\nThis is the least aggressive Community-1 migration target:\n\n\n    pyannote.audio>=4,<5\n    torch==2.8.0\n    torchaudio==2.8.0\n    torchcodec==0.7.*\n\n\nPros:\n\n  * close to the minimum modern pyannote requirements;\n  * avoids jumping all the way to newer Torch/TorchAudio generations;\n  * TorchCodec `0.7` matches Torch `2.8`;\n  * likely easier if the rest of your audio stack was stabilized around Torch 2.8.\n\n\n\nCons:\n\n  * still close to the old TorchAudio transition boundary;\n  * may not represent the newest pyannote-tested stack.\n\n\n\n## Newer Torch family\n\nA newer family might look like:\n\n\n    pyannote.audio>=4,<5\n    torch==2.9.*\n    torchaudio==2.9.*\n    torchcodec==0.9.*\n\n\nor:\n\n\n    pyannote.audio>=4,<5\n    torch==2.10.*\n    torchaudio==2.10.*\n    torchcodec==0.10.*\n\n\nPros:\n\n  * more aligned with the post-TorchAudio-2.9 world;\n  * better long-term direction if your other dependencies support it.\n\n\n\nCons:\n\n  * may expose TorchCodec/FFmpeg issues;\n  * may conflict with faster-whisper/CTranslate2 
expectations;\n  * may require more careful PyTorch CUDA wheel/index selection.\n\n\n\n## Practical advice\n\nFor a migration branch, start with the conservative modern family:\n\n\n    \"pyannote.audio>=4,<5\",\n    \"torch==2.8.0\",\n    \"torchaudio==2.8.0\",\n    \"torchcodec==0.7.*\",\n\n\nThen, after Community-1 works, decide whether to move Torch upward.\n\nDo not solve every modernization problem at once.\n\n* * *\n\n# Stage 3 — remove or isolate `brouhaha`\n\nBecause `brouhaha` pins pyannote 3.3, your Community-1 test script should **not** include Brouhaha.\n\nFor Path B, the dependency block should start without it:\n\n\n    #!/usr/bin/env -S uv run --script\n    # /// script\n    # requires-python = \">=3.10,<3.14\"\n    # dependencies = [\n    #   \"pyannote.audio>=4,<5\",\n    #   \"torch==2.8.0\",\n    #   \"torchaudio==2.8.0\",\n    #   \"torchcodec==0.7.*\",\n    # ]\n    # ///\n\n\nOnly after Community-1 works should you decide what to do with Brouhaha.\n\n## If you remove Brouhaha\n\nDelete:\n\n\n    \"brouhaha @ file:///home/user/diarization/repos/.venv/brouhaha-vad\",\n\n\nand remove code like:\n\n\n    import brouhaha\n\n\nor any function calls into Brouhaha.\n\nThen rely on pyannote diarization directly, or use another VAD/preprocessing layer.\n\n## If you fork Brouhaha\n\nEdit its dependency metadata.\n\nFor example, if its `pyproject.toml` contains:\n\n\n    dependencies = [\n        \"pyannote-audio==3.3.0\",\n    ]\n\n\nyou could test:\n\n\n    dependencies = [\n        \"pyannote-audio>=4,<5\",\n    ]\n\n\nor, if Brouhaha does not actually need pyannote at runtime after your refactor:\n\n\n    dependencies = []\n\n\nBut do this only in a branch or copy.\n\nThen run its own tests, or at least import it:\n\n\n    uv run --refresh --script check_brouhaha_import.py\n\n\nwhere:\n\n\n    #!/usr/bin/env -S uv run --script\n    # /// script\n    # requires-python = \">=3.10,<3.14\"\n    # dependencies = [\n    #   \"brouhaha @ 
file:///home/user/diarization/repos/.venv/brouhaha-vad\",\n    #   \"pyannote.audio>=4,<5\",\n    # ]\n    # ///\n\n    import brouhaha\n    from importlib.metadata import version\n\n    print(\"brouhaha import OK\")\n    print(\"pyannote.audio:\", version(\"pyannote.audio\"))\n\n\nIf this fails, Brouhaha is not pyannote-4-compatible yet.\n\n## If you split environments\n\nUse two scripts.\n\nFirst script:\n\n\n    vad_preprocess.py\n\n\nuses Brouhaha and pyannote 3.3 if needed.\n\nSecond script:\n\n\n    diarize_community1.py\n\n\nuses pyannote 4 and Community-1.\n\nThe handoff should be a file, JSON, RTTM, or plain timestamp list. This is clunkier, but it avoids forcing incompatible libraries into one dependency graph.\n\n* * *\n\n# Stage 4 — update the pyannote call\n\nOld Path A code:\n\n\n    pipeline = Pipeline.from_pretrained(\n        \"pyannote/speaker-diarization-3.1\",\n        use_auth_token=tokens[\"diarization\"],\n    )\n\n    diarization = pipeline(audio_path)\n\n    for turn, _, speaker in diarization.itertracks(yield_label=True):\n        ...\n\n\nNew Path B code:\n\n\n    pipeline = Pipeline.from_pretrained(\n        \"pyannote/speaker-diarization-community-1\",\n        token=tokens[\"diarization\"],\n    )\n\n    output = pipeline(audio_path)\n\n    for turn, speaker in output.speaker_diarization:\n        ...\n\n\nAnd, for transcript alignment, prefer testing:\n\n\n    for turn, speaker in output.exclusive_speaker_diarization:\n        ...\n\n\nThe current Community-1 model card says `exclusive_speaker_diarization` is provided on top of regular diarization and is meant to simplify reconciliation with transcription timestamps.\n\nSource:\n\n  * Community-1: exclusive speaker diarization\n\n\n\n* * *\n\n# Stage 5 — rewrite speaker/transcript alignment around exclusive diarization\n\nThis is the most important practical benefit for your script.\n\nYour final goal is not just diarization. 
Your goal is:\n\n\n    audio file\n            ↓\n    transcript segments or words\n            ↓\n    speaker labels\n            ↓\n    speaker-attributed transcript\n\n\nOld diarization can produce fine-grained, overlapping, or awkward speaker turns. That can be hard to align to Whisper/faster-whisper transcript segments.\n\nCommunity-1 adds:\n\n\n    output.exclusive_speaker_diarization\n\n\nUse that first for transcript alignment.\n\n## Basic maximum-overlap assignment\n\nUse this when your ASR gives segment-level timestamps.\n\n\n    def overlap_seconds(a_start, a_end, b_start, b_end):\n        return max(0.0, min(a_end, b_end) - max(a_start, b_start))\n\n\n    def assign_speaker_to_segment(segment_start, segment_end, diarization_turns):\n        best_speaker = None\n        best_overlap = 0.0\n\n        for turn_start, turn_end, speaker in diarization_turns:\n            overlap = overlap_seconds(segment_start, segment_end, turn_start, turn_end)\n            if overlap > best_overlap:\n                best_overlap = overlap\n                best_speaker = speaker\n\n        return best_speaker or \"UNKNOWN\"\n\n\n    def diarization_to_turns(exclusive_speaker_diarization):\n        turns = []\n        for turn, speaker in exclusive_speaker_diarization:\n            turns.append((float(turn.start), float(turn.end), str(speaker)))\n        return turns\n\n\nThen:\n\n\n    turns = diarization_to_turns(output.exclusive_speaker_diarization)\n\n    for segment in whisper_segments:\n        speaker = assign_speaker_to_segment(segment.start, segment.end, turns)\n        print(f\"[{segment.start:.2f}-{segment.end:.2f}] {speaker}: {segment.text}\")\n\n\n## Word-level assignment\n\nIf faster-whisper returns word timestamps, word-level assignment is usually better.\n\nConceptually:\n\n\n    for each word:\n        find the speaker turn with max overlap\n        assign that speaker to the word\n    then merge adjacent words with the same speaker\n\n\nThis handles speaker 
changes inside a long ASR segment better than assigning one speaker to the whole segment.\n\n* * *\n\n# Stage 6 — verify FFmpeg and TorchCodec\n\nCommunity-1 uses TorchCodec-backed decoding. The pyannote README explicitly says FFmpeg must be installed because TorchCodec handles audio decoding.\n\nCheck FFmpeg:\n\n\n    ffmpeg -version\n\n\nCheck TorchCodec import:\n\n\n    import torchcodec\n    print(\"torchcodec import OK\")\n\n\nCheck versions:\n\n\n    from importlib.metadata import version\n    import torch\n\n    print(\"torch:\", torch.__version__)\n    print(\"torchcodec:\", version(\"torchcodec\"))\n\n\nTorchCodec supports FFmpeg major versions in `[4, 8]`, and on Windows it needs FFmpeg builds with separate shared libraries. The TorchCodec README also provides the TorchCodec/Torch/Python compatibility table.\n\nSource:\n\n  * TorchCodec README\n\n\n\n## If TorchCodec fails\n\nCommon error shapes:\n\n\n    RuntimeError: Could not load libtorchcodec\n\n\n\n    FFmpeg is not properly installed\n\n\n\n    No compatible FFmpeg found\n\n\nLikely causes:\n\n  * FFmpeg missing;\n  * FFmpeg installed but not visible on `PATH`;\n  * Windows FFmpeg build is not a shared build;\n  * TorchCodec version does not match Torch version;\n  * Python version is outside the wheel’s supported range;\n  * unsupported architecture, especially Linux ARM64/aarch64.\n\n\n\nCheck the compatibility table before changing random packages.\n\n* * *\n\n# Stage 7 — choose uv layout: inline script vs project\n\nYou can do Path B with inline script metadata, but a project layout is cleaner once you are juggling:\n\n\n    pyannote.audio\n    torch\n    torchaudio\n    torchcodec\n    faster-whisper\n    ctranslate2\n    ffmpeg\n    CUDA\n    tokens\n    local packages\n\n\n## Inline script version\n\nGood for quick experiments:\n\n\n    #!/usr/bin/env -S uv run --script\n    # /// script\n    # requires-python = \">=3.10,<3.14\"\n    # dependencies = [\n    #   \"pyannote.audio>=4,<5\",\n    
#   \"torch==2.8.0\",\n    #   \"torchaudio==2.8.0\",\n    #   \"torchcodec==0.7.*\",\n    # ]\n    # ///\n\n    from pyannote.audio import Pipeline\n\n\nLock after success:\n\n\n    uv lock --script check_pyannote4_community1.py\n\n\nSource:\n\n  * uv script locking docs\n\n\n\n## Project version\n\nBetter for the real app.\n\n`pyproject.toml`:\n\n\n    [project]\n    name = \"diaritranscribe\"\n    version = \"0.1.0\"\n    requires-python = \">=3.10,<3.14\"\n    dependencies = [\n      \"pyannote.audio>=4,<5\",\n      \"faster-whisper\",\n      \"numpy\",\n      \"scikit-learn\",\n      \"omegaconf\",\n      \"torch==2.8.0\",\n      \"torchaudio==2.8.0\",\n      \"torchcodec==0.7.*\",\n    ]\n\n    [tool.uv]\n    required-version = \">=0.5.3\"\n\n\nThen:\n\n\n    uv lock\n    uv sync\n    uv run python scripts/diaritranscribe4.py\n\n\nIf you need explicit CUDA PyTorch indexes, use uv’s PyTorch guide:\n\n  * Using uv with PyTorch\n\n\n\nPyTorch packaging is unusual because CPU and CUDA builds may live on different indexes and use local version specifiers such as `+cpu` or `+cu130`.\n\n* * *\n\n# Stage 8 — update token handling\n\nUse environment variables rather than hardcoding tokens.\n\n\n    export HF_TOKEN=\"<HUGGINGFACE_ACCESS_TOKEN>\"\n\n\nPython:\n\n\n    import os\n\n    token = os.environ.get(\"HF_TOKEN\")\n    if not token:\n        raise RuntimeError(\"Set HF_TOKEN.\")\n\n\nThen:\n\n\n    pipeline = Pipeline.from_pretrained(\n        \"pyannote/speaker-diarization-community-1\",\n        token=token,\n    )\n\n\nMake sure the token’s Hugging Face account has accepted the model conditions:\n\n  * pyannote/speaker-diarization-community-1\n  * Hugging Face access tokens\n\n\n\nMissing access usually gives errors like:\n\n\n    401 Unauthorized\n    403 Forbidden\n    Repository not found\n    gated repo\n\n\nThose are different from the old `unexpected keyword argument 'token'` 
error.\n\n* * *\n\n# Stage 9 — account for telemetry\n\nCurrent pyannote docs mention optional telemetry. The README says it tracks privacy-preserving information such as pipeline origin, pipeline class, file duration, and speaker-count parameters, and documents ways to control it.\n\nDisable for the current process if desired:\n\n\n    export PYANNOTE_METRICS_ENABLED=0\n\n\nOr in Python:\n\n\n    from pyannote.audio.telemetry import set_telemetry_metrics\n\n    set_telemetry_metrics(False)\n\n\nSource:\n\n  * pyannote.audio README telemetry section\n\n\n\n* * *\n\n# Stage 10 — test accuracy and runtime before deleting Path A\n\nDo not delete the working pyannote 3.3 path until you compare:\n\n  * same audio file;\n  * same hardware;\n  * same preprocessing;\n  * same transcript segments;\n  * same speaker-label assignment policy;\n  * same output format.\n\n\n\nCompare:\n\n\n    speaker count\n    number of turns\n    total diarization time\n    overlap behavior\n    transcript speaker-label quality\n    GPU memory use\n    runtime\n    failure rate on long files\n\n\nA migration is successful only if the final speaker-attributed transcript improves or remains acceptable.\n\n* * *\n\n# Suggested branch layout\n\nKeep two scripts for a while:\n\n\n    diaritranscribe3.py       # recovery path, pyannote 3.3\n    diaritranscribe4.py       # migration path, pyannote 4 / Community-1\n\n\nKeep two lockfiles if using inline scripts:\n\n\n    diaritranscribe3.py.lock\n    diaritranscribe4.py.lock\n\n\nThis prevents accidentally breaking the known-good path while testing the new one.\n\n* * *\n\n# Minimal `diaritranscribe4.py` starting point\n\nThis is a clean starting point for just the diarization part.\n\n\n    #!/usr/bin/env -S uv run --script\n    # /// script\n    # requires-python = \">=3.10,<3.14\"\n    # dependencies = [\n    #   \"pyannote.audio>=4,<5\",\n    #   \"torch==2.8.0\",\n    #   \"torchaudio==2.8.0\",\n    #   \"torchcodec==0.7.*\",\n    # ]\n    # 
///\n\n    import argparse\n    import os\n    from importlib.metadata import version\n\n    import torch\n    from pyannote.audio import Pipeline\n    from pyannote.audio.pipelines.utils.hook import ProgressHook\n\n    MODEL_ID = \"pyannote/speaker-diarization-community-1\"\n\n\n    def print_versions():\n        print(\"pyannote.audio:\", version(\"pyannote.audio\"))\n        print(\"torch:\", torch.__version__)\n        print(\"torch cuda build:\", torch.version.cuda)\n        print(\"cuda available:\", torch.cuda.is_available())\n        print(\"torchaudio:\", version(\"torchaudio\"))\n        print(\"torchcodec:\", version(\"torchcodec\"))\n\n\n    def load_pipeline(token: str):\n        pipeline = Pipeline.from_pretrained(\n            MODEL_ID,\n            token=token,\n        )\n\n        if torch.cuda.is_available():\n            pipeline.to(torch.device(\"cuda\"))\n\n        return pipeline\n\n\n    def run_diarization(audio_path: str):\n        token = os.environ.get(\"HF_TOKEN\")\n        if not token:\n            raise RuntimeError(\"Set HF_TOKEN before running this script.\")\n\n        print_versions()\n        print(f\"Loading {MODEL_ID}...\")\n\n        pipeline = load_pipeline(token)\n\n        with ProgressHook() as hook:\n            output = pipeline(audio_path, hook=hook)\n\n        return output\n\n\n    def print_diarization(output):\n        print(\"\\nRegular speaker diarization:\")\n        for turn, speaker in output.speaker_diarization:\n            print(f\"{turn.start:.3f}\\t{turn.end:.3f}\\t{speaker}\")\n\n        print(\"\\nExclusive speaker diarization:\")\n        if hasattr(output, \"exclusive_speaker_diarization\"):\n            for turn, speaker in output.exclusive_speaker_diarization:\n                print(f\"{turn.start:.3f}\\t{turn.end:.3f}\\t{speaker}\")\n        else:\n            print(\"Not available.\")\n\n\n    def main():\n        parser = argparse.ArgumentParser()\n        parser.add_argument(\"audio_path\")\n    
    args = parser.parse_args()\n\n        output = run_diarization(args.audio_path)\n        print_diarization(output)\n\n\n    if __name__ == \"__main__\":\n        main()\n\n\nRun:\n\n\n    export HF_TOKEN=\"<HUGGINGFACE_ACCESS_TOKEN>\"\n    uv run --refresh --script diaritranscribe4.py audio.wav\n\n\nLock after it works:\n\n\n    uv lock --script diaritranscribe4.py\n\n\n* * *\n\n# Adding faster-whisper back later\n\nAfter Community-1 works by itself, add faster-whisper back.\n\n\n    # /// script\n    # requires-python = \">=3.10,<3.14\"\n    # dependencies = [\n    #   \"pyannote.audio>=4,<5\",\n    #   \"torch==2.8.0\",\n    #   \"torchaudio==2.8.0\",\n    #   \"torchcodec==0.7.*\",\n    #   \"faster-whisper\",\n    #   \"numpy\",\n    #   \"scikit-learn\",\n    #   \"omegaconf\",\n    # ]\n    # ///\n\n\nThen test faster-whisper separately before combining:\n\n\n    from faster_whisper import WhisperModel\n\n    model = WhisperModel(\"small\", device=\"cuda\", compute_type=\"float16\")\n    segments, info = model.transcribe(\"audio.wav\", beam_size=5)\n\n    for segment in segments:\n        print(segment.start, segment.end, segment.text)\n\n\nIf faster-whisper fails with CUDA/cuDNN/CTranslate2 errors, that is separate from pyannote.\n\nSource:\n\n  * faster-whisper README\n\n\n\n* * *\n\n# Common Path B failure modes\n\n## Failure: `No solution found`\n\nUsually means you still have a dependency pin like:\n\n\n    brouhaha -> pyannote-audio==3.3.0\n\n\nFix:\n\n  * remove Brouhaha from the pyannote 4 environment;\n  * fork/update Brouhaha;\n  * split environments.\n\n\n\n## Failure: `unexpected keyword argument 'token'`\n\nThis means you are still on old pyannote.\n\nCheck:\n\n\n    from importlib.metadata import version\n    print(version(\"pyannote.audio\"))\n\n\nIf it prints `3.3.0`, you are not on Path B yet.\n\n## Failure: `unexpected keyword argument 'use_auth_token'`\n\nThis means you are probably on newer pyannote but still using old 
code.\n\nUse:\n\n\n    token=\"<HUGGINGFACE_ACCESS_TOKEN>\"\n\n\nnot:\n\n\n    use_auth_token=\"<HUGGINGFACE_ACCESS_TOKEN>\"\n\n\n## Failure: `Could not load libtorchcodec`\n\nCheck:\n\n  * TorchCodec/Torch version compatibility;\n  * FFmpeg installation;\n  * Python version;\n  * platform wheel availability.\n\n\n\nSource:\n\n  * TorchCodec README\n\n\n\n## Failure: model access denied\n\nCheck that you accepted the model conditions and used a valid token:\n\n  * Community-1 model card\n  * Hugging Face access tokens\n\n\n\n## Failure: CUDA not available\n\nCheck PyTorch install:\n\n\n    import torch\n    print(torch.__version__)\n    print(torch.version.cuda)\n    print(torch.cuda.is_available())\n\n\nUse uv’s PyTorch guide for accelerator-specific builds:\n\n  * uv PyTorch guide\n\n\n\n* * *\n\n# Recommended Path B checklist\n\n  1. Create `diaritranscribe4.py`.\n  2. Remove `brouhaha` from that script.\n  3. Use `pyannote.audio>=4,<5`.\n  4. Start with a coherent Torch/TorchAudio/TorchCodec family.\n  5. Install FFmpeg.\n  6. Accept Community-1 model conditions.\n  7. Set `HF_TOKEN`.\n  8. Load with `token=...`.\n  9. Use `output.speaker_diarization`.\n  10. Prefer `output.exclusive_speaker_diarization` for transcript alignment.\n  11. Test pyannote alone.\n  12. Add faster-whisper back only after pyannote works.\n  13. Rebuild speaker assignment around maximum overlap or word-level timestamps.\n  14. Lock the migrated script.\n  15. 
Keep the pyannote 3.3 script until the new output is verified.\n\n\n\n* * *\n\n# Bottom line\n\nPath B is not:\n\n\n    change speaker-diarization-3.1 to speaker-diarization-community-1\n\n\nand it is not:\n\n\n    change use_auth_token= to token=\n\n\nIt is:\n\n\n    remove or isolate the Brouhaha pyannote 3.3 pin\n            ↓\n    move to pyannote.audio 4.x\n            ↓\n    use Community-1\n            ↓\n    install/verify TorchCodec and FFmpeg\n            ↓\n    change the output parsing code\n            ↓\n    use exclusive diarization for transcript alignment\n            ↓\n    lock the new environment\n\n\nFor your project, the safest approach is to keep:\n\n\n    diaritranscribe3.py\n\n\nas the recovery script and create:\n\n\n    diaritranscribe4.py\n\n\nas the Community-1 migration script.\n\nDo not merge them until Community-1 works alone, faster-whisper works alone, and the speaker-attributed transcript is at least as good as your pyannote 3.3 path.",
  "title": "Module 'torchaudio' has no attribute 'AudioMetaData'"
}