External Publication

AI VTuber For Beginners/non-programmers Easy To setup

Hugging Face Forums [Unofficial] June 25, 2026

Bro77XP/Beginner-Friendly-Ai-Vtuber: THE EASIEST AI VTUBER TO SETUP KNOWN TO MAN.

A 100% local Ai Vuber for Beginners and Non-programmer setup That is 100% free to Run With instant zero‑shot voice cloning That Uses Vtube studio’s api To make the mouth open and close and play animations after setting it up

----- Readme -----

AI VTuber For Begginers/non programmers Easy To setup

An AI VTuber that uses Whisper for speech recognition, Ollama for LLM inference, and Chatterbox TTS in a continuous listening loop.

This Was Also Made On a AMD gpu But the code is mainly supported For cpu users So it can be used without amd or nvdia gpus

This uses Python 3.10.11 if you don’t have it as your main Version do: py -3.10 -m venv venv

(you can check the version with python -V)

Features

Whisper (base.en model) - Real-time speech-to-text in English
Ollama (llama3.2) - AI model for generating VTuber responses
Chatterbox TTS - Text-to-speech to speak responses
Automatic silence detection - Only records when speech is detected
Continuous listening loop - Runs forever until Ctrl+C
VTube Studio integration - Controls mouth expressions via VTube Studio Api

Dependencies

(IMPORTANT!!!) MAKE A VENV FIRST AND MAKE SURE YOU ARE INSIDE THE PROJECT FOLDER for example

C:\Users(Yourusername)\Downloads\Begginerfriendlyai

And right click and do “open in terminal”

or do cd C:\Users(Yourusername)\Downloads\Begginerfriendlyai

THEN DO

python -m venv venv

then

venv\Scripts\Activate

Now Once your inside your virtual Environment (venv) do this

pip install -r requirements.txt

Required external tools:

Ollama - Install and run: ollama serve
[VTube Studio] - For character animation control

Core Dependencies (Required - Works on Windows)

pip install openai-whisper ollama chatterbox-tts pyaudio numpy torch sounddevice soundfile websocket-client rich

Optional RVC Voice Cloning (Advanced - Windows Build Required)

# Uncomment in requirements.txt or install manually (requires C++ build tools)
# pip install torchaudio librosa onnxruntime onnx fairseq pyworld praat-parselmouth TTS edge-tts

Note: RVC voice cloning is optional. The VTuber works perfectly with just the core dependencies using Chatterbox TTS. RVC voice cloning requires C++ build tools (Visual Studio Build Tools) and can be challenging to install on Windows.

Quick Start

MAKE A VENV FIRST AND MAKE SURE YOU ARE INSIDE THE PROJECT FOLDER for example

C:\Users(Yourusername)\Downloads\Begginerfriendlyai

And right click and do “open in terminal”

or do cd C:\Users(Yourusername)\Downloads\Begginerfriendlyai

THEN DO

python -m venv venv

then

venv\Scripts\Activate

Now Once your inside your virtual Environment (venv) do this

Install dependencies:

pip install -r requirements.txt

Pull models:

ollama pull llama3.2
python -m pip install openai-whisper

Start Ollama:

ollama serve

Run the VTuber:

python Aivtuber.py

Configuration

The AI VTuber integrates with VTube Studio to control character animations:

VTube Studio Integration

The script automatically connects to VTube Studio (port 8001) to control:

Mouth expressions : Real-time mouth movement synchronized with speech
Emotion expressions : Triggers pre-configured emotion hotkeys (happy, sad, angry, thinking, neutral)

Setup instructions:

Install VTube Studio and start it
Open the plugin “Local AI VTuber” from the VTube Studio plugins menu
The plugin will automatically generate an authentication token if one doesn’t exist
Restart the AI VTuber script after initial setup
Configure VTube Studio mouth parameter:
- Input: MouthOpen
- Output: ParamMouthOpenY

Configuring Animation Hotkeys

Important: You must configure the actual hotkey IDs in VTube Studio for animations to work:

In VTube Studio, create animations for each emotion:
- Happy animation (e.g., “happy”, “joyful”, “smile”)
- Sad animation (e.g., “sad”, “cry”, “depressed”)
- Angry animation (e.g., “angry”, “mad”, “upset”)
- Thinking animation (e.g., “think”, “hmm”, “idea”)
- Neutral animation (e.g., “neutral”, “calm”, “default”)
Set hotkey IDs for these animations in the VTube Studio plugin settings
Edit Aivtuber.py and update EMOTION_HOTKEYS with the actual hotkey IDs:

EMOTION_HOTKEYS = {
    "happy": "your_happy_hotkey_id",    # Replace with actual hotkey ID
    "sad": "your_sad_hotkey_id",        # Replace with actual hotkey ID
    "angry": "your_angry_hotkey_id",    # Replace with actual hotkey ID
    "thinking": "your_thinking_hotkey_id", # Replace with actual hotkey ID
    "neutral": "your_neutral_hotkey_id", # Replace with actual hotkey ID
}

Auto-configuration option: The script can auto-detect hotkeys based on name patterns. Leave empty to use auto-detection.

Model Configuration

Whisper : Uses “base.en” model for faster English speech recognition
Ollama : Uses “llama3.2” model for AI responses
Chatterbox : Automatically loads on startup

Voice Configuration

Change the voice used by the VTuber:

# Use pre-trained Chatterbox voices
python Aivtuber.py --voice af_heart      # Female heart voice
python Aivtuber.py --voice am_sleepy     # Male sleepy voice
python Aivtuber.py --voice af_smiling    # Female smiling voice
python Aivtuber.py  # Uses default voice

# Use custom RVC voice model (RECOMMENDED for voice cloning)
python Aivtuber.py --voice ./my_rvc_model
python Aivtuber.py --voice /home/user/my_rvc_model
python Aivtuber.py --voice "C:\\Users\\l-ota\\Downloads\\Recording159.wav"
python Aivtuber.py --voice "C:\\Users\\l-ota\\OneDrive\\Documents\\Sound recordings\\Recording159.wav"

Voice Options

Pre-trained Voices (Chatterbox)

af_heart - Female heart voice
am_sleepy - Male sleepy voice
af_smiling - Female smiling voice
(More voices available in Chatterbox)

Custom RVC Voice Models

Provide a directory path to an RVC voice model
Directory must contain infer.py and model files
Supports both Windows and Unix paths

RVC Voice Setup Instructions

Download an RVC voice model from https://github.com/RVC-SFT/SVS
Extract the model to a directory
The directory should contain:
- infer.py - RVC inference script
- Model files (e.g., model.pth, config.json)
- Other required files
Run the VTuber with the directory path

Voice Parameter Behavior

If a file path (with .wav, .mp3 extension or / or \ in path): Loads as custom voice
If a directory path without extension: Tries to load as RVC model
If empty or not provided: Uses default voice (Chatterbox)
Invalid paths fall back to default voice with warning

Advanced Features

Automatic emotion detection : The AI analyzes response text and detects emotions to trigger appropriate VTS hotkeys
Response formatting : The AI is prompted to be a cute anime VTuber with expressive responses
Robust recording : Advanced silence detection prevents unnecessary recording

Usage

The AI VTuber runs in a continuous loop:

Listening phase : Waits for speech with automatic silence detection
Speech detection : Only starts recording after minimum speech duration is confirmed
Transcribing : Uses Whisper to convert speech to text
AI response : Ollama generates a VTuber-appropriate response
Speaking : Chatterbox TTS speaks the response aloud
Mouth control : VTube Studio controls mouth expressions in sync with speech
Repeat : Returns to listening mode

Notes

Press Ctrl+C to stop the VTuber at any time
Ensure proper audio device permissions for microphone access
For GPU acceleration, install PyTorch CUDA versions
Adjust silence_threshold, silence_duration, and min_speech_duration in the code for different environments

Troubleshooting

Common Issues

“Ollama not running” error :
- Make sure Ollama is installed and running with ollama serve
- Verify the model “llama3.2” is pulled
VTube Studio connection failed :
- Ensure VTube Studio is running
- Check that VTS_PORT (default: 8001) is correct
- Make sure VTube Studio plugins are enabled
Audio permissions :
- Grant microphone permissions to this application
- On Linux: pip install pyaudio might require additional system packages
Model loading issues :
- Whisper uses “base.en” for faster performance
- Ensure all dependencies are installed from requirements.txt

Customization

Adjusting Silence Detection

Edit Aivtuber.py and modify these constants:

silence_threshold = 0.01    # Lower = more sensitive, Higher = less sensitive
silence_duration = 1.5      # Seconds of silence before stopping recording
min_speech_duration = 0.5   # Minimum speech duration to trigger recording

Changing Models

Whisper : Change in line 24: self.whisper_model = whisper.load_model("base")
Ollama : Change in line 109: model="llama3.2"

Adding Emotions

Edit the EMOTION_HOTKEYS dictionary in the code and add hotkeys to VTube Studio:

EMOTION_HOTKEYS = {
    "happy": "your_happy_hotkey_id",
    "sad": "your_sad_hotkey_id",
    "angry": "your_angry_hotkey_id",
    "thinking": "your_thinking_hotkey_id",
    "neutral": "your_neutral_hotkey_id",
}

Future Enhancements

Potential future improvements:

Local LLM alternatives : Support for other Ollama models or local LLM implementations
Multi-language support : Whisper language switching and response localization
Context memory : Maintain conversation history for more coherent interactions
Advanced emotion system : More nuanced emotion detection and expression control
Stream processing : WebSocket streaming for lower latency
Plugin architecture : Easy addition of new features and integrations

##questions

Is this Ali?

No this is Not ali In Fact Ali is A WAY more complicated program than this.

This Also Doesn’t use Any of ali’s og code aside from How the mouth api works And Some recreated stuff Like the api being used so you can play music without having issues

Does This contain Any preMade vtuber models i can Download?

No But i do Have Older vtuber models you can use for this for example:

https://drive.google.com/file/d/1WGdSQxnzKeirUBSKeif4En2-b8AW7yTN/view?usp=sharing (IF you Don’t want to use Hyori’s model and test a different one)

Is the ai Sentient And planning arson

-Proby not Unless You Replaced Ollama with something Else

License

This project is open source. Feel free to modify and distribute as long as you give appropriate credit since that’s really important to get a habit out of.

Features

Dependencies

And right click and do “open in terminal”

Core Dependencies (Required - Works on Windows)

Optional RVC Voice Cloning (Advanced - Windows Build Required)

Quick Start

And right click and do “open in terminal”

Configuration

VTube Studio Integration

Configuring Animation Hotkeys

Model Configuration

Voice Configuration

Voice Options

RVC Voice Setup Instructions

Voice Parameter Behavior

Advanced Features

Usage

Notes

Troubleshooting

Common Issues

Customization

Adjusting Silence Detection

Changing Models

Adding Emotions

Future Enhancements

License

Discussion in the ATmosphere