Module 'torchaudio' has no attribute 'AudioMetaData'

Hugging Face Forums [Unofficial] April 30, 2026

This technically solved my problem: rewriting the script around the starting point you made worked. For posterity, the changes I made were:

Adding a small block of code at the start to work around a couple of errors:

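import torch
# Assumption: StatsPool is pyannote.audio's statistics pooling layer; adjust the import if yours lives elsewhere.
from pyannote.audio.models.blocks.pooling import StatsPool

def patched_forward(self, sequences, weights=None):
    # Pool over the last (frame) dimension; weights are ignored in this patch.
    mean = sequences.mean(dim=-1)
    if sequences.size(-1) > 1:
        std = sequences.std(dim=-1, correction=1)
    else:
        # A single frame has no sample standard deviation (it would be NaN), so use zeros.
        std = torch.zeros_like(mean)
    return torch.cat([mean, std], dim=-1)

# Replace the library method, then disable TF32 for matmul and cuDNN.
StatsPool.forward = patched_forward
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False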
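
For anyone wondering why the single-frame guard is needed: the sample standard deviation (correction=1) divides by n - 1, which is zero when there is only one frame, and PyTorch returns NaN in that case. Below is a torch-free sketch of the same guard using only the standard library. The name pooled_stats is hypothetical and exists just for this illustration; it is not part of any library.

```python
import statistics

def pooled_stats(frames):
    """Return (mean, std) for one channel's frames, with the same guard as the patch."""
    mean = statistics.fmean(frames)
    # statistics.stdev is the sample standard deviation and raises
    # StatisticsError for fewer than two values, so fall back to 0.0.
    std = statistics.stdev(frames) if len(frames) > 1 else 0.0
    return mean, std

print(pooled_stats([0.5]))       # single frame: std falls back to 0.0
print(pooled_stats([1.0, 3.0]))  # two frames: ordinary sample std
```

The fallback to zero is a choice, not the only option; it simply keeps NaN out of the pooled embedding for very short inputs.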

A small change to the assign_speaker_to_segment function so that it sums the overlap across multiple turns by the same speaker:

def assign_speaker_to_segment(segment_start, segment_end, diarization_turns):
    best_speaker = None
    best_overlap = 0.0
    # Running total of overlap per speaker, so several short turns add up.
    overlap_by_speaker = {}
    for turn_start, turn_end, speaker in diarization_turns:
        total = overlap_by_speaker.get(speaker, 0.0) + overlap_seconds(
            segment_start, segment_end, turn_start, turn_end
        )
        overlap_by_speaker[speaker] = total
        if total > best_overlap:
            best_overlap = total
            best_speaker = speaker

    # No turn overlapped the segment at all.
    return best_speaker or "UNKNOWN"
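
To show what the accumulation changes, here is a small self-contained sketch. The overlap_seconds helper is not shown in this post, so the version below is my assumption of what it does (length of the intersection of two intervals, never negative), and the turn list and speaker labels are made-up sample data.

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    # Assumed helper: length of the intersection of two intervals, never negative.
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speaker_to_segment(segment_start, segment_end, diarization_turns):
    best_speaker, best_overlap = None, 0.0
    totals = {}
    for turn_start, turn_end, speaker in diarization_turns:
        totals[speaker] = totals.get(speaker, 0.0) + overlap_seconds(
            segment_start, segment_end, turn_start, turn_end
        )
        if totals[speaker] > best_overlap:
            best_overlap, best_speaker = totals[speaker], speaker
    return best_speaker or "UNKNOWN"

# Made-up turns: SPEAKER_00 talks twice (2 s + 2 s), SPEAKER_01 once (3 s).
turns = [
    (0.0, 2.0, "SPEAKER_00"),
    (2.0, 5.0, "SPEAKER_01"),
    (5.0, 7.0, "SPEAKER_00"),
]

print(assign_speaker_to_segment(0.0, 7.0, turns))    # SPEAKER_00 (4 s total beats 3 s)
print(assign_speaker_to_segment(10.0, 12.0, turns))  # UNKNOWN (no overlap)
```

Comparing turns one at a time would pick SPEAKER_01 for the first segment, since its single 3-second turn is longer than either of SPEAKER_00's 2-second turns. Summing per speaker is what gives SPEAKER_00 the segment.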

And a small change to the token function.

Unfortunately, this script is just a cleaner version of the previous iteration of my script, and the current iteration was meant to solve a problem with the diarization errors themselves. For now, thank you. I will eventually open a topic for the next step once I figure out how to formulate the problem.