Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifold74xlzqnjiulhxqvnzznlbmrd2wqdsulliktvvolaul4njkwa",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mhnaolfzufs2"
  },
  "path": "/t/wave-coherence-scoring-phase-aware-alternative-to-cosine-similarity/173375#post_11",
  "publishedAt": "2026-03-22T08:19:42.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "wave-engine",
    "wave-server"
  ],
  "textContent": "**Wave-Engine: New Architecture, Honest Numbers\n**\nThe kerr-engine and kerr-server are now parked. Two new repos replace them:\n\n  * wave-engine — training (Apache 2.0)\n\n  * wave-server — inference, OpenAI-compatible API with KV-cache (Apache 2.0)\n\n\n\n\nThe kerr repos stay public as historical reference. Their READMEs point to the new repos. Everything validated in kerr-engine carries forward — maestro dim=16, curriculum training, stochastic resonance, all of it.\n\n**What changed and why**\n\nThe kerr-engine proved the core concept (98.1% of MLP at 44% params). But three architectural limits showed up during scaling:\n\n  1. Sequential blocks — attention had to finish before FFN could start. Wave-engine uses GPT-J parallel blocks: attention and FFN read the same normalised input, run simultaneously, outputs sum into the residual stream.\n\n  2. Trained attention — standard dot-product attention consumed parameters and compute. Wave-engine uses frozen harmonic coherence attention — phase-based scoring from a mathematical structure, zero attention parameters trained. The attention pattern is determined entirely by harmonic embedding geometry.\n\n  3. RK4-16 ODE — 16 sequential integration steps per layer, not parallelisable. Wave-engine uses a perturbative ODE inspired by techniques from telecom fiber optics DSP. Single-pass analytical computation. 14x fewer arithmetic operations, MSE 0.000005 vs RK4-16. Trains better because the true gradient flows through the perturbative computation, not an identity backward approximation.\n\n\n\n\n**The numbers**\n\nTraining tiers from a single Rust binary, single `cargo run` command:\n\nTier | Flag | Loss @ 199 | Speed | Params | Hardware\n---|---|---|---|---|---\nCPU | _(none)_ | **2.52** | 520ms/iter | 2.63M | Any computer\nwgpu GPU | `--gpu` | **2.52** | 520ms/iter | 2.63M | Any GPU (Vulkan/Metal/DX12)\nCandle CUDA | `--candle` | **2.81** | 213ms/iter | 657K | NVIDIA only\n\n_Measured March 22 2026: 4 layers, seq=64, batch=4, 200 iters, Shakespeare, no curriculum, RTX 4070 Ti._\n\nCPU and wgpu produce identical loss at every single iteration — same init, same math. Candle is 2.4x faster with block-diagonal output projection (6 groups of 128×128) and perturbative ODE — 4x fewer FFN parameters, faster convergence, slightly higher loss from cosine LR warmup. VRAM rock-solid at 1329MB.\n\nAll three tiers produce compatible checkpoints. A model trained on CPU can be served from GPU, or vice versa. The wgpu tier runs on AMD, Intel, Apple Silicon — no CUDA required.\n\n**One honest null**\n\nWe built a hybrid converter — take Qwen 2.5 0.5B, keep the attention layers, replace the MLP with our ODE layers, distill. The idea: maybe trained MLP weights contain hidden wave structure we can tap into.\n\nRan SVD and DFT analysis on all 72 weight matrices across 24 layers. The answer: no. Full effective rank 896/896 everywhere. Flat frequency spectrum (33/33/33% low/mid/high). No near-identity layers. No structured frequency content to exploit.\n\nThe “translate existing model to waves” path doesn’t exist. Wave-engine models need to be trained from scratch. The efficiency gains come from learning a _different_ representation, not compressing an existing one. Finding archived. Hybrid experiment parked.\n\n**What’s next**\n\nTraining a wave-engine model on diverse real English with BPE tokenization and education curriculum (grammar → children’s literature → general English → domain-specific). The architecture is validated. The infrastructure is built. The next question is whether it generates coherent text at 24 layers on real data.\n\n80 defensive patterns now published in the research repo under MIT. Everything that makes the engine work is documented and open.\n\n**The stack**\n\nRepo | What | License\n---|---|---\nWave Coherence | Research framework, 80 defensive patterns | MIT\nwave-engine | Training, 3 tiers, perturbative ODE, block-diagonal | Apache 2.0\nwave-server | Inference, OpenAI-compatible API, KV-cache, wave memory | Apache 2.0\nkerr-memory | Wave memory library, works with both engines | Apache 2.0\nkerr-engine | Original prototype (parked, historical) | Apache 2.0\nkerr-server | Original server (parked, historical) | Apache 2.0\n\nNo Python. No pip. No CUDA toolkit required. One `cargo build --release`, one `cargo run`.",
  "title": "Wave-Coherence Scoring: Phase-Aware Alternative to Cosine Similarity"
}