External Publication

Wave-Coherence Scoring: Phase-Aware Alternative to Cosine Similarity

Hugging Face Forums [Unofficial] March 22, 2026

**Wave-Engine: New Architecture, Honest Numbers ** The kerr-engine and kerr-server are now parked. Two new repos replace them:

wave-engine — training (Apache 2.0)
wave-server — inference, OpenAI-compatible API with KV-cache (Apache 2.0)

The kerr repos stay public as historical reference. Their READMEs point to the new repos. Everything validated in kerr-engine carries forward — maestro dim=16, curriculum training, stochastic resonance, all of it.

What changed and why

The kerr-engine proved the core concept (98.1% of MLP at 44% params). But three architectural limits showed up during scaling:

Sequential blocks — attention had to finish before FFN could start. Wave-engine uses GPT-J parallel blocks: attention and FFN read the same normalised input, run simultaneously, outputs sum into the residual stream.
Trained attention — standard dot-product attention consumed parameters and compute. Wave-engine uses frozen harmonic coherence attention — phase-based scoring from a mathematical structure, zero attention parameters trained. The attention pattern is determined entirely by harmonic embedding geometry.
RK4-16 ODE — 16 sequential integration steps per layer, not parallelisable. Wave-engine uses a perturbative ODE inspired by techniques from telecom fiber optics DSP. Single-pass analytical computation. 14x fewer arithmetic operations, MSE 0.000005 vs RK4-16. Trains better because the true gradient flows through the perturbative computation, not an identity backward approximation.

The numbers

Training tiers from a single Rust binary, single cargo run command:

Tier	Flag	Loss @ 199	Speed	Params	Hardware
CPU	(none)	2.52	520ms/iter	2.63M	Any computer
wgpu GPU	`--gpu`	2.52	520ms/iter	2.63M	Any GPU (Vulkan/Metal/DX12)
Candle CUDA	`--candle`	2.81	213ms/iter	657K	NVIDIA only

Measured March 22 2026: 4 layers, seq=64, batch=4, 200 iters, Shakespeare, no curriculum, RTX 4070 Ti.

CPU and wgpu produce identical loss at every single iteration — same init, same math. Candle is 2.4x faster with block-diagonal output projection (6 groups of 128×128) and perturbative ODE — 4x fewer FFN parameters, faster convergence, slightly higher loss from cosine LR warmup. VRAM rock-solid at 1329MB.

All three tiers produce compatible checkpoints. A model trained on CPU can be served from GPU, or vice versa. The wgpu tier runs on AMD, Intel, Apple Silicon — no CUDA required.

One honest null

We built a hybrid converter — take Qwen 2.5 0.5B, keep the attention layers, replace the MLP with our ODE layers, distill. The idea: maybe trained MLP weights contain hidden wave structure we can tap into.

Ran SVD and DFT analysis on all 72 weight matrices across 24 layers. The answer: no. Full effective rank 896/896 everywhere. Flat frequency spectrum (33/33/33% low/mid/high). No near-identity layers. No structured frequency content to exploit.

The “translate existing model to waves” path doesn’t exist. Wave-engine models need to be trained from scratch. The efficiency gains come from learning a different representation, not compressing an existing one. Finding archived. Hybrid experiment parked.

What’s next

Training a wave-engine model on diverse real English with BPE tokenization and education curriculum (grammar → children’s literature → general English → domain-specific). The architecture is validated. The infrastructure is built. The next question is whether it generates coherent text at 24 layers on real data.

80 defensive patterns now published in the research repo under MIT. Everything that makes the engine work is documented and open.

The stack

Repo	What	License
Wave Coherence	Research framework, 80 defensive patterns	MIT
wave-engine	Training, 3 tiers, perturbative ODE, block-diagonal	Apache 2.0
wave-server	Inference, OpenAI-compatible API, KV-cache, wave memory	Apache 2.0
kerr-memory	Wave memory library, works with both engines	Apache 2.0
kerr-engine	Original prototype (parked, historical)	Apache 2.0
kerr-server	Original server (parked, historical)	Apache 2.0

No Python. No pip. No CUDA toolkit required. One cargo build --release, one cargo run.

Discussion in the ATmosphere