Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreie3e4io473orkz7z23lfwkllprsicuy54nl2xttpe5ievayxdr26a",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmrrmclqwe32"
  },
  "path": "/t/im-not-an-engineer-i-just-wanted-to-see-if-a-3d-cube-of-cells-could-learn-to-talk/176242#post_1",
  "publishedAt": "2026-05-26T19:45:21.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "huggingface.co",
    "killking69/nca3d-brain-v5 · Hugging Face"
  ],
  "textContent": "I’m not an engineer. I just wanted to see if a 3D cube of cells could learn to talk.\n\nHi everyone,\n\nI want to share a project I’ve been working on for the past week. I’m not a machine learning engineer, I don’t have a\nCS degree, and I had no idea if this would work. I just had a question: what if instead of Transformers, we used a 3D\ngrid of simple cells that only talk to their neighbors?\n\nLike a brain made of tiny cells, where information travels as waves. No attention, no layers — just local\ncommunication.\n\nIt kind of worked. And along the way, I found things I didn’t expect.\n\nThe idea\n\nI built a Neural Cellular Automaton in 3D — a 16×16×16 cube (4,096 cells) where each cell can only see its 26\nimmediate neighbors. Information enters one face of the cube, propagates as waves through the interior, and the\nprediction is read from the opposite face.\n\nThink of it like dropping a pebble in a pond — the ripples carry the information.\n\nPhase 1: Can it do math?\n\nI started simple: arithmetic. Addition, subtraction, multiplication, division.\n\nWith just 499K parameters (a Transformer would need millions), the model reached 98.4% accuracy on numbers it had\nnever seen during training. Not memorization — actual generalization. It learned the rules of arithmetic.\n\nThat gave me confidence. If a cube of cells can learn math, maybe it can learn something harder.\n\nPhase 2: Does it understand relationships?\n\nI taught it semantic relations: “dog is_a animal”, “Paris capital_of France”, “king opposite_of queen”. 100 relations,\nthousands of pairs.\n\n73.4% test accuracy. 87.5% generalization to novel combinations.\n\nThen grammar + semantics together (184 relations): 93.5% overall. The Conv3d weights that learned math could also\nlearn world knowledge. Same brain, different skills.\n\nPhase 3: Can it reason?\n\nI tested transitive reasoning without training for it. 
If it knows “wolf is_a mammal” and “mammal produces milk”, can\nit infer “wolf → milk”?\n\n83.3% on novel chains it had never seen. wolf->mammal->milk, shark->fish->water, penguin->bird->fly. Reasoning emerged from\nthe structure.\n\nIt also learned to answer questions: “capital of France?” → “Paris”. 85% accuracy on direct questions, 75% on novel\ncombinations.\n\nPhase 4: Language (the hard part)\n\nThis is where it got interesting — and where I failed many times.\n\n9 versions of text generation failed. Every single one collapsed to “the the the” or “the of in a”. The most common\nEnglish words dominated everything.\n\nThe breakthrough came with three changes:\n\n  1. Dilated convolutions — cycle [1, 2, 4, 8] so each cell can “see” the entire grid in 4 steps\n  2. Word embeddings — switching from characters to a 30K word vocabulary\n  3. Synaptic fatigue — cells that fire too much get tired, preventing repetition\n\n\n\nThe current model (v5) generates coherent phrases:\n\n“she started to play together again”\n“the little girl wanted to play with her parents”\n“he said that he was very happy”\n“in the morning she went to the garden”\n\n10.7% eval accuracy on 30K vocabulary. That’s not impressive by Transformer standards, but for a cellular automaton\nwith 35M parameters that processes everything through local 3D wave propagation? I think it’s something.\n\nWhat surprised me (emergent phenomena)\n\nThis is the part that really blew my mind. I didn’t program any of this — it emerged from training:\n\n  1. The brain developed hemispheres. Region x=12 produces good language. Region x=6 produces garbage. Just like\nbiological brains have lateralization — but nobody told the model to do this.\n\n  2. Three phases of thinking. Steps 1-5: chaos (activations are noisy). Steps 6-7: “eureka” (the model suddenly\norganizes). Steps 8-15: decision (converges to the answer). The eureka moment coincides with the dilated convolution\ncycle reaching global coverage.\n\n  3. 
Grammar and semantics separated spatially. Grammar channels concentrate in the center of the grid, semantic\nchannels in the periphery. Like Broca’s area (syntax) and Wernicke’s area (meaning) in the human brain. The model\nspontaneously organized this way.\n\n  4. Semantic clustering. Animals, family members, nature words, and objects each form distinct spatial clusters in the\ngrid. The cube organized its own “brain regions” by category.\n\n  5. Emotions activate a specific highway. Emotional words light up depth layer z=12 more than neutral words. The model\ncreated an “emotion highway” through the cube.\n\n  6. The wave is visible. You can literally watch information travel from z=0 (input) to z=15 (output) step by step. The\nanswer arrives as a wave at step 7 — the earliest step where the signal reaches the output face.\n\n\n\n\n88 documented discoveries\n\nOver the course of this project, I documented 88 experimental findings. Some of the key ones:\n\n  * Cross-entropy loss works better than knowledge distillation (7.4% vs 4.2%)\n  * The model thinks in waves — visualized and confirmed\n  * Arithmetic knowledge gets overwritten when you teach language (the Conv3d transforms completely)\n  * With 10 inference techniques combined, the model produced “you are having fun” — a grammatically perfect sentence —\nwithout any retraining, just by manipulating the grid’s activity\n  * The init_state (the brain’s “DNA”) already contains the seeds of specialization before any training\n\n\n\nWhat this is NOT\n\nI want to be clear about what this project is:\n\n  * It’s not a competitor to Transformers. GPT-2 Small (124M params) would destroy this model on every benchmark.\n  * It’s not a practical language model. You can’t use it for anything useful.\n  * It’s not polished research. I’m one person experimenting, not a lab with peer review.\n\n\n\nWhat I think it IS\n\n  * Proof that a fundamentally different architecture can learn language structure. 
Not well, but it can.\n  * Evidence that spatial organization matters. The brain developed regions, hemispheres, and highways that weren’t\nprogrammed.\n  * An exploration of what “thinking” looks like when computation happens through waves in 3D space instead of matrix\nmultiplications in 1D.\n  * A fun project by someone who just wanted to try something different.\n\n\n\nThe model\n\nI uploaded the v5 model (the best one) to Hugging Face:\n\nkillking69/nca3d-brain-v5 · Hugging Face (huggingface.co)\n\n  * 35.4M parameters, 68 MB\n  * 30K word vocabulary\n  * Includes model code, inference script, dictionary, and brain visualizations\n  * Runs on CPU, no GPU needed\n  * MIT license\n\n\n\nWhat’s next?\n\nHonestly, I’m not sure. I’ve been at this for about a week and I’m a bit burned out. v6 (knowledge distillation from\nGPT-2) showed promise but needs much more training than I can afford. I’d love to see what happens with:\n\n  * More training data and compute (v6.2 is ready but needs ~20h on a B200)\n  * A Gradio Space where people can see the waves propagate in real time\n  * Someone with more ML experience taking a look at the architecture\n\n\n\nIf any of this is interesting to you, the code and all 88 findings are in the repo. I’d love to hear what you think.\n\nThanks for reading.\n\n-- Cristian",
  "title": "I'm not an engineer. I just wanted to see if a 3D cube of cells could learn to talk"
}