
How do I make STT work for my AI VTuber on Discord VC calls?

Hugging Face Forums [Unofficial] May 2, 2026

Transcribed text: ‘’
You:
Sending to Ollama: ‘…’
Ollama response status: 200
Ollama response data: {‘model’: ‘drivedenpadev/deepseek-v3.2’, ‘created_at’: ‘2026-05-02T00:29:13.5354636Z’, ‘response’: “What’s good, chat? Ready to get this conversation started!”, ‘done’: True, ‘done_reason’: ‘stop’, ‘context’: [128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 271, 2675, 527, 264, 11919, 18328, 13, 128009, 128006, 882, 128007, 1432, 2675, 527, 264, 15526, 34051, 30970, 13, 13969, 31737, 1234, 220, 975, 4339, 13, 2360, 100166, 13, 3298, 3823, 1432, 1502, 25, 720, 15836, 25, 128009, 128006, 78191, 128007, 271, 3923, 596, 1695, 11, 6369, 30, 32082, 311, 636, 420, 10652, 3940, 0], ‘total_duration’: 763554400, ‘load_duration’: 115772300, ‘prompt_eval_count’: 56, ‘prompt_eval_duration’: 45008000, ‘eval_count’: 14, ‘eval_duration’: 592494200}
Extracted response: ‘What’s good, chat? Ready to get this conversation started!’
AI: What’s good, chat? Ready to get this conversation started!
Sanitized text: ‘What’s good, chat? Ready to get this conversation started!’
Generating audio…
2026-05-01 17:29:14,307 - WARNING - CFG, min_p and exaggeration are not supported by Turbo version and will be ignored.
9%|██████▉ | 86/1000 [00:04<00:46, 19.59it/s]
S3 Token → Mel Inference…
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.69it/s]
TTS result: sr=24000, audio_shape=(86400,)
TTS generated successfully: C:\Users\…\AppData\Local\Temp\tmpqxnxny4k.wav
[CONNECT] (‘127.0.0.1’, 60151)
Transcribing…
[SEARCH] Audio analysis - size: 5760, max_amplitude: 1.304917
[MIC] Incoming audio | amp=1.304917 | samples=5760
[PROCESS] Processing audio: 5760 samples, 1.304917 max amplitude
[STT] Transcribing with improved local STT…
Incoming audio | amp=1.304917 | samples=5760
Processing audio: 5760 samples, 1.304917 max amplitude
Transcribing with simple Whisper STT…
Transcription error: name ‘transcribe_audio’ is not defined
Traceback (most recent call last):
  File “C:\Users\…\Downloads\AliTurbo\vtuber_core_fixed.py”, line 130, in safe_transcribe
NameError: name ‘transcribe_audio’ is not defined
Transcribed text: ‘’
You:
Sending to Ollama: ‘…’
Ollama response status: 200
Ollama response data: {‘model’: ‘drivedenpadev/deepseek-v3.2’, ‘created_at’: ‘2026-05-02T00:29:28.6229976Z’, ‘response’: “What’s up, newbie? Ready to get this chat started?”, ‘done’: True, ‘done_reason’: ‘stop’, ‘context’: [128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 271, 2675, 527, 264, 11919, 18328, 13, 128009, 128006, 882, 128007, 1432, 2675, 527, 264, 15526, 34051, 30970, 13, 13969, 31737, 1234, 220, 975, 4339, 13, 2360, 100166, 13, 3298, 3823, 1432, 1502, 25, 720, 15836, 25, 128009, 128006, 78191, 128007, 271, 3923, 596, 709, 11, 95678, 30, 32082, 311, 636, 420, 6369, 3940, 30], ‘total_duration’: 712363500, ‘load_duration’: 85605400, ‘prompt_eval_count’: 56, ‘prompt_eval_duration’: 45423700, ‘eval_count’: 14, ‘eval_duration’: 569795200}
Extracted response: ‘What’s up, newbie? Ready to get this chat started?’
AI: What’s up, newbie? Ready to get this chat started?
Sanitized text: ‘What’s up, newbie? Ready to get this chat started?’
Generating audio…
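
The log shows three things: safe_transcribe raises a NameError because transcribe_audio is never defined or imported, the resulting empty transcript is still sent to Ollama, and the incoming peak amplitude (1.304917) is above the usual float range of [-1, 1]. Below is a minimal sketch of how those cases could be guarded against. It is not the code from vtuber_core_fixed.py. The names Transcriber, normalize_peak, handle_utterance, and send_to_llm are hypothetical, and safe_transcribe here takes the STT function as an argument, which may differ from the original signature.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type: any function that turns float samples into text.
Transcriber = Callable[[Sequence[float]], str]


def normalize_peak(samples: Sequence[float], limit: float = 1.0) -> List[float]:
    """Scale samples down so the peak magnitude does not exceed `limit`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= limit:
        return list(samples)  # already in range (or empty / silent)
    scale = limit / peak
    return [s * scale for s in samples]


def safe_transcribe(samples: Sequence[float],
                    transcriber: Optional[Transcriber]) -> str:
    """Return the transcript, or "" if STT is missing or fails."""
    if transcriber is None:
        print("[STT] No transcriber configured; skipping")
        return ""
    try:
        return transcriber(normalize_peak(samples)).strip()
    except Exception as exc:  # e.g. NameError from an undefined helper
        print(f"[STT] Transcription error: {exc}")
        return ""


def handle_utterance(samples: Sequence[float],
                     transcriber: Optional[Transcriber],
                     send_to_llm: Callable[[str], str]) -> Optional[str]:
    """Call the LLM only when STT produced non-empty text."""
    text = safe_transcribe(samples, transcriber)
    if not text:
        return None  # nothing was heard, so do not prompt the model
    return send_to_llm(text)
```

The guard only hides the symptom. The NameError itself goes away once transcribe_audio is actually defined in, or imported into, the module that calls it. If the openai-whisper package is the backend, the transcriber passed in could wrap model.transcribe(audio)["text"]. Whisper expects 16 kHz mono float32 audio, so Discord voice audio (typically 48 kHz stereo) would likely need downmixing and resampling first, and a 5760-sample chunk is probably too short to transcribe on its own.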

still looking at it erm
