{
"$type": "site.standard.document",
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreifs3g7n3moiuuozulr75vkmf77shachbdfpxcdos4oneot7snw5ra"
},
"mimeType": "image/webp",
"size": 6248
},
"description": "Local voice transcription.\n\nThe way that the Whisper model works seems to be that real-time transcriptions are inæffective, so I shouldn’t bother looking for it. To achieve transcriptions practical for real-time communication, voice activity detection should be used to identify speech from silence.\n\nWhisper tends to hallucinate phrases when given i...",
"path": "/Whisper",
"publishedAt": "2026-06-06T18:27:23.000Z",
"site": "at://did:plc:rfescy2ghdk6ma2wwwhr3bu2/site.standard.publication/3mktkmfk37k2g",
"textContent": "\nLocal voice transcription.\n\nThe way that the *Whisper* model works seems to be that real-time transcriptions are inæffective, so I shouldn’t bother looking for it. To achieve transcriptions practical for real-time communication, voice activity detection should be used to identify speech from silence.\n\n*Whisper* tends to hallucinate phrases when given insufficient data, silence or short phrases. This includes:\n\n- “Thank you” or “Thanks for watching”.\n- “Sorry”.\n- Subtitle attribution, usually includes a domain name.\n\nPost-processing tends to be necessary for short speech.\n\n## Stream avatar\n\nI have ideas of using it for a stream/live camera avatar of sorts “*[[Sheep Zhing]]*.”. Speech-to-text-to-speech, essentially. This had rather humorous results but is terrible for clear communication.\n\n- [huwprosser/web-whisper](https://github.com/huwprosser/web-whisper) - *Python* backend.\n - Model runs hot and occasionally locks up.\n - Long payloads get rejected by the server.\n- Considering creating a separate *Node.js*/*Express* implementation that invokes a *Whisper* CLI tool instead.\n For [*whisper.cpp*](https://github.com/ggml-org/whisper.cpp), this command might be sufficient.\n `whisper-cli.exe -m ./models/tiny.en.bin -np -nt speech.wav -sns`",
"title": "Whisper"
}