I'm not an engineer. I just wanted to see if a 3D cube of cells could learn to talk

Hugging Face Forums [Unofficial] May 26, 2026

Hi everyone,

I want to share a project I’ve been working on for the past week. I’m not a machine learning engineer, I don’t have a CS degree, and I had no idea if this would work. I just had a question: what if instead of Transformers, we used a 3D grid of simple cells that only talk to their neighbors?

Like a brain made of tiny cells, where information travels as waves. No attention, no layers — just local communication.

It kind of worked. And along the way, I found things I didn’t expect.

The idea

I built a Neural Cellular Automaton in 3D — a 16×16×16 cube (4,096 cells) where each cell can only see its 26 immediate neighbors. Information enters one face of the cube, propagates as waves through the interior, and the prediction is read from the opposite face.
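Here is a minimal sketch of what "only the 26 immediate neighbors" means for how fast information can travel. It assumes a toy rule (a cell becomes active as soon as any neighbor is active) in place of the trained update, and the names `step` and `steps_to_reach_output` are made up for illustration. It only tracks where information can be, not what the real model computes.

```python
from itertools import product

N = 16  # grid side, as in the post (16 x 16 x 16 = 4,096 cells)

# Offsets to the 26 immediate neighbours (the 3x3x3 block minus the centre).
NEIGHBOURS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def step(active):
    """One local update: every neighbour of an active cell becomes active.

    `active` is a set of (x, y, z) coordinates. A trained NCA would apply
    learned weights here; this toy rule only tracks possible reach.
    """
    nxt = set(active)
    for (x, y, z) in active:
        for dx, dy, dz in NEIGHBOURS:
            nx, ny, nz = x + dx, y + dy, z + dz
            if 0 <= nx < N and 0 <= ny < N and 0 <= nz < N:
                nxt.add((nx, ny, nz))
    return nxt


def steps_to_reach_output(start):
    """Count updates until any cell on the output face z = N - 1 is active."""
    active, t = set(start), 0
    while not any(z == N - 1 for (_, _, z) in active):
        active = step(active)
        t += 1
    return t


# Drop a single "pebble" in the middle of the input face z = 0.
pebble = {(N // 2, N // 2, 0)}
print(len(NEIGHBOURS))                # 26
print(steps_to_reach_output(pebble))  # 15
```

With a plain nearest-neighbor rule the front moves one layer per update, so crossing a 16-cell cube takes 15 updates.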

Think of it like dropping a pebble in a pond — the ripples carry the information.

Phase 1: Can it do math?

I started simple: arithmetic. Addition, subtraction, multiplication, division.

With just 499K parameters (far fewer than I’d expect a typical Transformer to need), the model reached 98.4% accuracy on numbers it had never seen during training. That points to generalization rather than memorization: it appears to have learned the rules of arithmetic.
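To make the memorization-versus-generalization distinction concrete, here is a small sketch of a held-out evaluation. The post does not describe how the split was built, so the operand range, the 20% test fraction, the use of addition only, and the function names are all assumptions.

```python
import random


def held_out_split(max_operand=99, test_fraction=0.2, seed=0):
    """Split all (a, b) operand pairs so test pairs never appear in training."""
    rng = random.Random(seed)
    pairs = [(a, b) for a in range(max_operand + 1) for b in range(max_operand + 1)]
    rng.shuffle(pairs)
    cut = int(len(pairs) * test_fraction)
    return pairs[cut:], pairs[:cut]  # (train, test)


def accuracy(predict, pairs):
    """Fraction of pairs where predict(a, b) equals the true sum."""
    return sum(predict(a, b) == a + b for a, b in pairs) / len(pairs)


train_pairs, test_pairs = held_out_split()
assert not set(train_pairs) & set(test_pairs)  # no leakage between splits

# A lookup table built from the training pairs scores 0 on unseen pairs,
# while a rule that actually adds scores 1.
memorized = {(a, b): a + b for a, b in train_pairs}
print(accuracy(lambda a, b: memorized.get((a, b)), test_pairs))  # 0.0
print(accuracy(lambda a, b: a + b, test_pairs))                  # 1.0
```

A model that scores high on the held-out pairs cannot be relying on a lookup of what it saw in training, which is the sense of "generalization" used here.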

That gave me confidence. If a cube of cells can learn math, maybe it can learn something harder.

Phase 2: Does it understand relationships?

I taught it semantic relations: “dog is_a animal”, “Paris capital_of France”, “king opposite_of queen”. 100 relations, thousands of pairs.

73.4% test accuracy. 87.5% generalization to novel combinations.

Then grammar + semantics together (184 relations): 93.5% overall. The Conv3d weights that learned math could also learn world knowledge. Same brain, different skills.

Phase 3: Can it reason?

I tested transitive reasoning without training for it. If it knows “wolf is_a mammal” and “mammal produces milk”, can it infer “wolf → milk”?

83.3% on novel chains it had never seen. wolf->mammal->milk, shark->fish->water, penguin->bird->fly. Reasoning emerged from the structure.
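For reference, this is the symbolic version of the two-hop inference being tested, using the wolf example from the post. The dictionary layout and the `two_hop` helper are illustrative assumptions. The model is not doing a lookup like this; the sketch only shows how the expected answer for a chain can be derived from two stored facts.

```python
# Facts as (subject, relation) -> object. Only the wolf chain's relation
# names come from the post; the data structure itself is hypothetical.
facts = {
    ("wolf", "is_a"): "mammal",
    ("mammal", "produces"): "milk",
}


def two_hop(facts, subject, first_rel, second_rel):
    """Follow two relations in sequence; return None if either hop is missing."""
    middle = facts.get((subject, first_rel))
    if middle is None:
        return None
    return facts.get((middle, second_rel))


print(two_hop(facts, "wolf", "is_a", "produces"))  # milk
print(two_hop(facts, "wolf", "is_a", "eats"))      # None
```

A chain counts as novel when the direct pair (here "wolf" and "milk") never appears together in training, only its two links.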

It also learned to answer questions: “capital of France?” → “Paris”. 85% accuracy on direct questions, 75% on novel combinations.

Phase 4: Language (the hard part)

This is where it got interesting — and where I failed many times.

Nine versions of text generation failed. Every single one collapsed to “the the the” or “the of in a”, with the most common English words dominating everything.

The breakthrough came with three changes:

  1. Dilated convolutions: the dilation rate cycles through [1, 2, 4, 8], so a cell’s reach grows to 1 + 2 + 4 + 8 = 15 cells and it can “see” across the entire 16-cell grid in 4 steps
  2. Word embeddings — switching from characters to a 30K word vocabulary
  3. Synaptic fatigue — cells that fire too much get tired, preventing repetition
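Two of these changes are easy to sketch in isolation. The first function checks the reach of a width-3 kernel applied once per dilation rate, assuming one rate per step. The second is a hypothetical fatigue rule: the post does not say how fatigue is implemented, so the formula, the `gain` and `recovery` parameters, and the function names are assumptions chosen only to show the idea of a cell responding less after firing a lot.

```python
def reachable_offsets(dilations):
    """Offsets along one axis that a width-3 kernel can reach, one dilation per step."""
    offsets = {0}
    for d in dilations:
        offsets = {o + k * d for o in offsets for k in (-1, 0, 1)}
    return offsets


def fatigue_step(drive, fatigue, gain=0.5, recovery=0.9):
    """Hypothetical fatigue rule: damp the drive by accumulated fatigue, then update it."""
    output = drive / (1.0 + fatigue)
    fatigue = recovery * fatigue + gain * abs(output)
    return output, fatigue


# Change 1: the cycle [1, 2, 4, 8] reaches every offset up to 15 in 4 steps.
offsets = reachable_offsets([1, 2, 4, 8])
print(max(offsets), offsets == set(range(-15, 16)))  # 15 True

# Change 3: a cell driven by the same input every step responds less each time.
fatigue, outputs = 0.0, []
for _ in range(4):
    out, fatigue = fatigue_step(1.0, fatigue)
    outputs.append(round(out, 3))
print(outputs)
```

The reach figure is geometric: it says where information can arrive after 4 steps, not when a trained model has a usable answer there.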

The current model (v5) generates coherent phrases:

“she started to play together again”
“the little girl wanted to play with her parents”
“he said that he was very happy”
“in the morning she went to the garden”

10.7% eval accuracy on 30K vocabulary. That’s not impressive by Transformer standards, but for a cellular automaton with 35M parameters that processes everything through local 3D wave propagation? I think it’s something.
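To put the 10.7% in context, here is a quick calculation against two simple baselines. It assumes "eval accuracy" means top-1 next-word accuracy, which the post does not state, and the tiny token list is made up purely to show the majority-word baseline.

```python
def top1_accuracy(predicted, target):
    """Fraction of positions where the predicted word equals the target word."""
    hits = sum(p == t for p, t in zip(predicted, target))
    return hits / len(target)


VOCAB_SIZE = 30_000
reported = 0.107

# Baseline 1: guessing uniformly at random over the vocabulary.
uniform_chance = 1 / VOCAB_SIZE
print(f"{uniform_chance:.6%}")            # 0.003333%
print(round(reported / uniform_chance))   # 3210

# Baseline 2: always predicting the most frequent word (toy data, not real text).
target = ["the", "cat", "the", "dog"]
always_the = ["the"] * len(target)
print(top1_accuracy(always_the, target))  # 0.5
```

Uniform guessing is a very weak baseline. The always-predict-"the" baseline is the fairer comparison, since that is exactly the failure mode of the earlier versions, but its value depends on the evaluation text and is not given in the post.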

What surprised me (emergent phenomena)

This is the part that really blew my mind. I didn’t program any of this — it emerged from training:

  1. The brain developed hemispheres. Region x=12 produces good language. Region x=6 produces garbage. Just like biological brains have lateralization — but nobody told the model to do this.

  2. Three phases of thinking. Steps 1-5: chaos (activations are noisy). Steps 6-7: “eureka” (the model suddenly organizes). Steps 8-15: decision (converges to the answer). The eureka moment coincides with the dilated convolution cycle reaching global coverage.

  3. Grammar and semantics separated spatially. Grammar channels concentrate in the center of the grid, semantic channels in the periphery. Like Broca’s area (syntax) and Wernicke’s area (meaning) in the human brain. The model spontaneously organized this way.

  4. Semantic clustering. Animals, family members, nature words, and objects each form distinct spatial clusters in the grid. The cube organized its own “brain regions” by category.

  5. Emotions activate a specific highway. Emotional words light up depth layer z=12 more than neutral words. The model created an “emotion highway” through the cube.

  6. The wave is visible. You can literally watch information travel from z=0 (input) to z=15 (output) step by step. The answer arrives as a wave at step 7 — the earliest step where the signal reaches the output face.
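Observations like "the wave arrives at step N" or "layer z=12 lights up more" can be turned into numbers by recording the grid at every step and summarizing each depth layer. The sketch below assumes a recording shaped as `history[step][z]` with a flat list of cell values per layer. That layout, the threshold, and the function names are hypothetical, and the data is synthetic rather than taken from the model.

```python
def layer_energy(state):
    """Mean absolute activation of each z-layer; `state[z]` is a flat list of cell values."""
    return [sum(abs(v) for v in layer) / len(layer) for layer in state]


def arrival_step(history, z_out, threshold=0.1):
    """First 1-based step whose output layer exceeds `threshold`, or None."""
    for t, state in enumerate(history, start=1):
        if layer_energy(state)[z_out] > threshold:
            return t
    return None


# Synthetic recording (not model output): a front that advances one layer
# per step through a 4-layer toy grid with 2 cells per layer.
DEPTH = 4
history = [
    [[1.0, 1.0] if z <= t else [0.0, 0.0] for z in range(DEPTH)]
    for t in range(1, DEPTH + 1)
]
print(arrival_step(history, z_out=DEPTH - 1))  # 3
```

The same `layer_energy` profile, averaged over emotional words and over neutral words separately, would give a direct comparison for a single layer such as z=12.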

88 documented discoveries

Over the course of this project, I documented 88 experimental findings. Some of the key ones:

  • Cross-entropy loss works better than knowledge distillation (7.4% vs 4.2%)
  • The model thinks in waves — visualized and confirmed
  • Arithmetic knowledge gets overwritten when you teach language (the Conv3d transforms completely)
  • With 10 inference techniques combined, the model produced “you are having fun” — a grammatically perfect sentence — without any retraining, just by manipulating the grid’s activity
  • The init_state (the brain’s “DNA”) already contains the seeds of specialization before any training
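For readers unfamiliar with the two training objectives compared in the first finding, here is a minimal sketch of each in plain Python. The post does not say which distillation formulation was used, so the temperature-softened KL divergence below is one common choice and an assumption, not the author's exact loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """Loss against a single correct word: -log p(target)."""
    return -math.log(softmax(student_logits)[target_index])


def distillation_kl(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


student = [2.0, 0.5, -1.0]
teacher = [1.5, 1.0, -0.5]
print(cross_entropy(student, target_index=0))
print(distillation_kl(student, teacher))
print(distillation_kl(student, student))  # 0.0
```

Cross-entropy pushes the student toward the one correct word, while distillation pushes it toward the teacher's whole probability distribution, including the teacher's mistakes and uncertainty.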

What this is NOT

I want to be clear about what this project is not:

  • It’s not a competitor to Transformers. GPT-2 Small (124M params) would destroy this model on every benchmark.
  • It’s not a practical language model. You can’t use it for anything useful.
  • It’s not polished research. I’m one person experimenting, not a lab with peer review.

What I think it IS

  • Proof that a fundamentally different architecture can learn language structure. Not well, but it can.
  • Evidence that spatial organization matters. The brain developed regions, hemispheres, and highways that weren’t programmed.
  • An exploration of what “thinking” looks like when computation happens through waves in 3D space instead of matrix multiplications in 1D.
  • A fun project by someone who just wanted to try something different.

The model

I uploaded the v5 model (the best one) to HuggingFace:

killking69/nca3d-brain-v5 on Hugging Face (huggingface.co)

  • 35.4M parameters, 68 MB
  • 30K word vocabulary
  • Includes model code, inference script, dictionary, and brain visualizations
  • Runs on CPU, no GPU needed
  • MIT license
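The parameter count and file size can be cross-checked with a little arithmetic. The sketch assumes the 68 MB figure refers to the weight file and is measured in binary megabytes (MiB). Neither is stated in the post.

```python
PARAMS = 35_400_000  # 35.4M parameters, from the list above


def size_mib(n_params, bytes_per_param):
    """Storage needed for the raw weights, in MiB (2**20 bytes)."""
    return n_params * bytes_per_param / 2**20


print(round(size_mib(PARAMS, 2), 1))  # 67.5  (16-bit weights)
print(round(size_mib(PARAMS, 4), 1))  # 135.0 (32-bit weights)
```

Under those assumptions, 68 MB would be consistent with weights stored at 2 bytes per parameter, plus a small amount of file overhead.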

What’s next?

Honestly, I’m not sure. I’ve been at this for about a week and I’m a bit burned out. v6 (knowledge distillation from GPT-2) showed promise but needs much more training than I can afford. I’d love to see what happens with:

  • More training data and compute (v6.2 is ready but needs ~20h on a B200)
  • A Gradio Space where people can see the waves propagate in real-time
  • Someone with more ML experience taking a look at the architecture

If any of this is interesting to you, the code and all 88 findings are in the repo. I’d love to hear what you think.

Thanks for reading.

-- Cristian
