Training a LoRA for LTX2.3 voice / sound only
Hello guys,
I'm kind of stuck at the moment. I'm trying to train a voice-only LoRA with Ostris AI Toolkit on a VPS (Vast, RTX 5090). The plan is to train a voice LoRA for my character separately first, and once that works, train a video LoRA with character + voice. But as I said, I'm stuck: I keep getting multiple errors from AI Toolkit. I have 27 clips of 6-10 seconds each, all well captioned. This is the error that appears most often, among others:
RuntimeError: Internal error: Internal Writer Error: Background writer channel closed
I'm not even sure my LoRA training settings are correct.
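Two things that are cheap to rule out before digging into the trainer itself are the clip/caption pairing and the free disk space on the VPS (the config uses `cache_latents_to_disk: true`). The sketch below is only a sanity check under stated assumptions, not a diagnosis of the error: the list of media extensions and the helper names `find_caption_problems` and `free_gib` are made up for illustration, while the folder path and the `txt` caption extension come from the config.

```python
from pathlib import Path
import shutil

# Assumption: the clips use one of these extensions; adjust to match the dataset.
MEDIA_EXTS = {".mp4", ".mov", ".mkv", ".webm", ".wav", ".mp3", ".flac"}


def find_caption_problems(folder, caption_ext="txt"):
    """Return (missing, empty): clips with no caption file, and clips whose caption is blank."""
    missing, empty = [], []
    for clip in sorted(Path(folder).iterdir()):
        if clip.suffix.lower() not in MEDIA_EXTS:
            continue  # skip caption files and anything that is not a clip
        caption = clip.with_suffix("." + caption_ext)
        if not caption.is_file():
            missing.append(clip.name)
        elif not caption.read_text(encoding="utf-8").strip():
            empty.append(clip.name)
    return missing, empty


def free_gib(path):
    """Free space in GiB on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    dataset = Path("/workspace/ai-toolkit/datasets/ema_voice")  # folder_path from the config
    if dataset.is_dir():
        missing, empty = find_caption_problems(dataset)
        print("clips without a caption:", missing)
        print("clips with an empty caption:", empty)
        print(f"free space: {free_gib(dataset):.1f} GiB")
```

If both lists come back empty and there is plenty of free space, the dataset layout and disk are probably not the problem and the full traceback is the next thing to look at.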
Thanks in advance for any answers, if some appear lol
job: "extension"
config:
  process:
    - type: "diffusion_trainer"
      training_folder: "/workspace/ai-toolkit/output"
      sqlite_db_path: "./aitk_db.db"
      device: "cuda"
      trigger_word: ""
      performance_log_every: 10
      network:
        type: "lora"
        linear: 32
        linear_alpha: 32
        conv: 16
        conv_alpha: 16
        lokr_full_rank: true
        lokr_factor: -1
        network_kwargs:
          ignore_if_contains:
      save:
        dtype: "bf16"
        save_every: 500
        max_step_saves_to_keep: 4
        save_format: "diffusers"
        push_to_hub: false
      datasets:
        - folder_path: "/workspace/ai-toolkit/datasets/ema_voice"
          mask_path: null
          mask_min_value: 0.1
          default_caption: ""
          caption_ext: "txt"
          caption_dropout_rate: 0.05
          cache_latents_to_disk: true
          is_reg: false
          network_weight: 1
          resolution:
            - 512
          controls:
          shrink_video_to_frames: true
          num_frames: 1
          flip_x: false
          flip_y: false
          num_repeats: 1
          do_i2v: false
          do_audio: true
          fps: 24
          auto_frame_count: true
      train:
        batch_size: 1
        bypass_guidance_embedding: false
        steps: 5000
        gradient_accumulation: 1
        train_unet: true
        train_text_encoder: false
        gradient_checkpointing: true
        noise_scheduler: "flowmatch"
        optimizer: "adamw8bit"
        timestep_type: "weighted"
        content_or_style: "balanced"
        optimizer_params:
          weight_decay: 0.0001
        unload_text_encoder: false
        cache_text_embeddings: false
        lr: 0.0001
        ema_config:
          use_ema: false
          ema_decay: 0.99
        skip_first_sample: false
        force_first_sample: false
        disable_sampling: false
        dtype: "bf16"
        diff_output_preservation: false
        diff_output_preservation_multiplier: 1
        diff_output_preservation_class: "person"
        switch_boundary_every: 1
        loss_type: "mse"
        audio_loss_multiplier: 1
      logging:
        log_every: 1
        use_ui_logger: true
      model:
        name_or_path: "Lightricks/LTX-2.3/ltx-2.3-22b-dev.safetensors"
        quantize: true
        qtype: "qfloat8"
        quantize_te: true
        qtype_te: "qfloat8"
        arch: "ltx2.3"
        low_vram: true
        model_kwargs: {}
        layer_offloading: false
        layer_offloading_text_encoder_percent: 1
        layer_offloading_transformer_percent: 1
      sample:
        sampler: "flowmatch"
        sample_every: 500
        width: 768
        height: 768
        samples:
        neg: ""
        seed: 42
        walk_seed: true
        guidance_scale: 4
        sample_steps: 30
        num_frames: 121
        fps: 24
meta:
  name: "[name]"
  version: "1.0"
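To put the numbers in this config in perspective, here is a rough back-of-the-envelope sketch of what `steps: 5000` means for a 27-clip dataset. It assumes each step consumes `batch_size * gradient_accumulation` clips, that every clip counts as one dataset item, and that checkpoints are written at multiples of `save_every` with only the newest `max_step_saves_to_keep` retained. Those are assumptions about how the trainer behaves, not something confirmed here, and the function name `schedule_summary` is made up for illustration.

```python
def schedule_summary(steps, batch_size, grad_accum, num_clips, num_repeats,
                     save_every, keep_last):
    """Estimate passes over the dataset and which checkpoint steps survive."""
    # Assumption: one optimizer step sees batch_size * grad_accum clips.
    clips_seen = steps * batch_size * grad_accum
    passes = clips_seen / (num_clips * num_repeats)
    # Assumption: a checkpoint is written at every multiple of save_every.
    save_steps = list(range(save_every, steps + 1, save_every))
    return passes, save_steps[-keep_last:]


if __name__ == "__main__":
    # Values taken from the post and the config above.
    passes, kept = schedule_summary(
        steps=5000, batch_size=1, grad_accum=1,
        num_clips=27, num_repeats=1,
        save_every=500, keep_last=4,
    )
    print(f"approx. passes over the dataset: {passes:.1f}")  # 185.2
    print("checkpoints still on disk at the end:", kept)     # [3500, 4000, 4500, 5000]
```

Under these assumptions every clip is seen roughly 185 times, and only the last four checkpoints remain at the end of the run, so earlier checkpoints cannot be compared afterwards unless `max_step_saves_to_keep` is raised.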