
HoLo/HSL: a ~100M-parameter, change-rate-based multimodal toy model on a single RTX 4070

Hugging Face Forums [Unofficial] June 7, 2026

Hi everyone,

I’d like to share a small personal research POC I built: HoLo/HSL.

HoLo is a ~100M-parameter, toy-scale multimodal model trained on a single consumer GPU (RTX 4070). It is not meant to compete with existing models, and the generation quality is still very rough. At this stage, the goal is simply to test whether the core architecture can work.

The main idea is to explore whether change-rate, rather than tokens, can serve as a shared signal substrate across modalities such as text, image, audio, and video.
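To make the idea a bit more concrete, here is a minimal sketch of one possible reading of "change-rate": the signed difference between consecutive byte values. This is an illustrative assumption, not HoLo's actual encoding, and the function names `to_change_rate` and `from_change_rate` are hypothetical. The point is only that the same transform applies to any byte stream, whether it holds text or pixel values.

```python
def to_change_rate(data: bytes) -> list[int]:
    """Encode bytes as a starting level followed by signed byte-to-byte deltas.

    Hypothetical illustration only; HoLo's real signal definition may differ.
    """
    if not data:
        return []
    values = list(data)
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def from_change_rate(signal: list[int]) -> bytes:
    """Invert to_change_rate by cumulative summation."""
    out = []
    level = 0
    for delta in signal:
        level += delta
        out.append(level)
    return bytes(out)


# The same transform applied to two different modalities (both are just bytes).
text_signal = to_change_rate("hello".encode("utf-8"))
pixel_signal = to_change_rate(bytes([10, 12, 15, 15, 9]))

# The encoding is lossless: summing the deltas recovers the original bytes.
assert from_change_rate(text_signal) == b"hello"
```

A first-difference encoding like this is lossless and modality-agnostic, which is why it is a convenient stand-in here. Whether it resembles the substrate HoLo actually learns on is something only the paper and code can confirm.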

What currently works:

byte-native input/output pipeline
text, image-frame, audio, and early video generation plumbing
closure signal for distinguishing end-of-content from end-of-window
multimodal signal-space visualization
phase/affect channel experiments
external memory/offload toy experiments
synthetic sensor-fusion toy experiments
a small public demo showing the generated signal trajectory
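As a rough illustration of the closure-signal item in the list above, the sketch below splits a byte stream into fixed-size windows and attaches a flag that says whether the content actually ended in that window or the window simply filled up. Everything here is an assumption made for illustration: the `Window` class, the `closed` flag, zero-padding, and the fixed window size are hypothetical and are not taken from HoLo's code.

```python
from dataclasses import dataclass


@dataclass
class Window:
    payload: bytes  # chunk of the stream, zero-padded to the window size
    length: int     # number of real (unpadded) bytes in payload
    closed: bool    # True: content ended here; False: the window merely filled up


def split_into_windows(data: bytes, window_size: int) -> list[Window]:
    """Chunk a byte stream and mark which window carries end-of-content."""
    windows = []
    for start in range(0, len(data), window_size):
        chunk = data[start:start + window_size]
        is_last = start + window_size >= len(data)
        windows.append(
            Window(chunk.ljust(window_size, b"\x00"), len(chunk), is_last)
        )
    if not windows:
        # Empty content still needs one window that signals closure.
        windows.append(Window(b"\x00" * window_size, 0, True))
    return windows


windows = split_into_windows(b"abcdefghij", window_size=4)
for w in windows:
    print(w.length, w.closed)
```

With a flag like this, a decoder can tell "keep generating in the next window" apart from "the content is finished", even when the final chunk happens to fill its window exactly.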

Important caveats:

This is a feasibility study, not a benchmark claim.
The model is very small.
Generation quality is currently poor.
Some experiments are toy-scale.
I am not claiming superiority over existing systems.

Still, I wanted to share it because the full pipeline now runs end-to-end, and I think the change-rate / signal-substrate direction may be interesting for multimodal research.

Space: ggunio/holo-demo-space

Paper: Zenodo link

Demo: https://holo-demo-p5txmh4dda-as.a.run.app

GitHub: GitHub link

I’d be happy to hear feedback, especially on the architecture, evaluation design, and possible next experiments.
