Module 'torchaudio' has no attribute 'AudioMetaData'
Hugging Face Forums [Unofficial]
April 30, 2026
This technically solved my problem, since rewriting the script around the starting point you made worked. For posterity, the changes I made were:
adding a small block of code at the start to bypass a couple of errors:
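# Assumes torch and pyannote.audio's StatsPool are already imported earlier in the script.
def patched_forward(self, sequences, weights=None):
    # Note: weights is accepted for signature compatibility but not used here.
    mean = sequences.mean(dim=-1)
    if sequences.size(-1) > 1:
        std = sequences.std(dim=-1, correction=1)
    else:
        # A single frame has no unbiased std; use zeros instead.
        std = torch.zeros_like(mean)
    return torch.cat([mean, std], dim=-1)

StatsPool.forward = patched_forward

# Disable TF32 for CUDA matmul and cuDNN.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False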
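For anyone wondering why the single-frame branch is there: as far as I can tell, the unbiased standard deviation divides by (n - 1), so with one frame the denominator is zero and torch gives back NaN. Here is a minimal plain-Python sketch of the same guard. It is not the torch code itself, and the name pooled_stats is just something I made up for illustration.

```python
import math

def pooled_stats(frames, correction=1):
    """Return (mean, std) for a list of floats.

    Hypothetical stand-in for the patched pooling step: std falls back
    to 0.0 when there are too few frames for the corrected estimate.
    """
    n = len(frames)
    mean = sum(frames) / n
    if n - correction <= 0:
        # The corrected variance would divide by zero here,
        # so return 0.0 instead, mirroring the zeros fallback in the patch.
        return mean, 0.0
    variance = sum((x - mean) ** 2 for x in frames) / (n - correction)
    return mean, math.sqrt(variance)

single_mean, single_std = pooled_stats([2.0])
pair_mean, pair_std = pooled_stats([1.0, 3.0])
```

With one frame the result is the frame itself and a std of zero; with two or more frames it is the usual sample standard deviation.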
a small change to the assign_speaker_to_segment function so that overlap is summed across multiple turns of the same speaker:
def assign_speaker_to_segment(segment_start, segment_end, diarization_turns):
    best_speaker = None
    best_overlap = 0.0
    speakerdict = {}
    for speaker in diarization_turns:
        speakerdict[speaker[2]] = 0.0
    for turn_start, turn_end, speaker in diarization_turns:
        speakerdict[speaker] += overlap_seconds(segment_start, segment_end, turn_start, turn_end)
        overlap = speakerdict[speaker]
        if overlap > best_overlap:
            best_overlap = overlap
            best_speaker = speaker
    return best_speaker or "UNKNOWN"
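To show what summing per speaker changes, here is a self-contained sketch of the same idea. It is a variant, not the exact function above: the name assign_speaker_by_total_overlap, the overlap_seconds body, and the sample turns are all assumptions I made up so it runs on its own, and ties may resolve differently than in my version.

```python
from collections import defaultdict

def overlap_seconds(a_start, a_end, b_start, b_end):
    # Hypothetical helper: length of the intersection of two intervals, never negative.
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speaker_by_total_overlap(segment_start, segment_end, diarization_turns):
    # Sum the overlap of every turn per speaker, then pick the largest total.
    totals = defaultdict(float)
    for turn_start, turn_end, speaker in diarization_turns:
        totals[speaker] += overlap_seconds(segment_start, segment_end, turn_start, turn_end)
    if not totals:
        return "UNKNOWN"
    best_speaker, best_overlap = max(totals.items(), key=lambda item: item[1])
    return best_speaker if best_overlap > 0.0 else "UNKNOWN"

# Made-up turns as (start, end, speaker); SPEAKER_00 speaks twice.
turns = [
    (0.0, 2.0, "SPEAKER_00"),
    (2.0, 5.0, "SPEAKER_01"),
    (5.0, 8.0, "SPEAKER_00"),
]
result = assign_speaker_by_total_overlap(0.5, 7.0, turns)
```

For the segment from 0.5 to 7.0, SPEAKER_00 overlaps 1.5 s and 2.0 s (3.5 s total) while SPEAKER_01 overlaps 3.0 s. Picking the single best turn would choose SPEAKER_01, but the summed version picks SPEAKER_00.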
And a small change to the token function.
Unfortunately, this script is just a cleaner version of the previous iteration of my script, and the current iteration was meant to solve a problem with the diarization errors themselves. For now, thank you. I will eventually open a topic for the next step once I figure out how to formulate the problem.