HoLo/HSL: a 100M change-rate-based multimodal toy model on a single RTX 4070
Hi everyone,
I’d like to share a small personal research POC I built: HoLo/HSL.
HoLo is a ~100M parameter toy-scale multimodal model trained on a single consumer GPU (RTX 4070). It is not meant to compete with existing models, and the generation quality is still very rough. At this stage, the goal is simply to test whether the core architecture can work.
The main idea is to explore whether change-rate, rather than tokens, can serve as a shared signal substrate across modalities such as text, image, audio, and video.
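To make the idea concrete, here is a minimal sketch of one possible reading of "change-rate": the first-order difference between consecutive bytes. This is an assumption for illustration only, not HoLo's actual encoding, and the function names `to_change_rate` and `from_change_rate` are hypothetical. The point is that the same transform applies to any byte stream, whether it holds text, pixels, or audio samples.

```python
def to_change_rate(data: bytes) -> list[int]:
    """Encode a byte stream as signed differences between consecutive bytes.

    Assumption: "change-rate" is read here as a plain first-order delta.
    The first delta is taken against 0, so it carries the absolute start value.
    """
    deltas = []
    previous = 0
    for value in data:
        deltas.append(value - previous)
        previous = value
    return deltas


def from_change_rate(deltas: list[int]) -> bytes:
    """Invert to_change_rate by accumulating the deltas back into bytes."""
    out = bytearray()
    current = 0
    for delta in deltas:
        current += delta
        out.append(current)
    return bytes(out)


# The same functions work regardless of what the bytes represent.
text_signal = to_change_rate("hello".encode("utf-8"))
pixel_signal = to_change_rate(bytes([10, 10, 12, 200, 200]))
```

In this toy version a flat region (repeated bytes) becomes a run of zeros and an edge becomes a single large value, which is the kind of modality-independent structure a shared signal space could exploit.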
What currently works:
- byte-native input/output pipeline
- text, image-frame, audio, and early video generation plumbing
- closure signal for distinguishing end-of-content vs end-of-window
- multimodal signal-space visualization
- phase/affect channel experiments
- external memory/offload toy experiments
- synthetic sensor-fusion toy experiments
- a small public demo showing the generated signal trajectory
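As an illustration of the closure-signal item above, here is a minimal sketch of how a per-window flag could separate "the window is full but the content continues" from "the content is finished". It assumes fixed-size byte windows; the `Window` type, the flag values, and both function names are hypothetical and are not taken from the HoLo code.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical flag values, chosen only for this sketch.
END_OF_WINDOW = 0   # window boundary reached, more content follows
END_OF_CONTENT = 1  # the content itself ends inside this window


@dataclass
class Window:
    payload: bytes
    closure: int


def split_into_windows(content: bytes, window_size: int) -> Iterator[Window]:
    """Cut content into fixed-size windows, tagging each with a closure flag."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not content:
        yield Window(b"", END_OF_CONTENT)
        return
    for start in range(0, len(content), window_size):
        chunk = content[start:start + window_size]
        is_last = start + window_size >= len(content)
        yield Window(chunk, END_OF_CONTENT if is_last else END_OF_WINDOW)


def reassemble(windows: Iterable[Window]) -> bytes:
    """Concatenate payloads and stop at the first END_OF_CONTENT window."""
    out = bytearray()
    for window in windows:
        out.extend(window.payload)
        if window.closure == END_OF_CONTENT:
            break
    return bytes(out)
```

Without such a flag, content that exactly fills its last window is indistinguishable from content that was cut off at a window boundary. With it, a decoder knows whether to stop or to request the next window.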
Important caveats:
- This is a feasibility study, not a benchmark claim.
- The model is very small.
- Generation quality is currently poor.
- Some experiments are toy-scale.
- I am not claiming superiority over existing systems.
Still, I wanted to share it because the full pipeline now runs end-to-end, and I think the change-rate / signal-substrate direction may be interesting for multimodal research.
Space: ggunio/holo-demo-space
Paper:
Demo: https://holo-demo-p5txmh4dda-as.a.run.app
GitHub:
I’d be happy to hear feedback, especially on the architecture, evaluation design, and possible next experiments.