{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreig4z6fjg57jucjbwt2ethentepxfgccx5ffiqea4c6bnid3h2flmi",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mhgwyy73z6y2"
},
"path": "/t/wave-field-llm-o-n-log-n-attention-via-wave-equation-dynamics-within-5-of-standard-transformer/173625#post_5",
"publishedAt": "2026-03-19T13:53:13.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "**[Update] Just built fused Triton kernels for Wave Field LLM v5.**\n\nWhen you build an architecture from scratch, you end up building everything from scratch.\n\n* Custom attention mechanism (O(n log n) via FFT wave convolution)\n* Custom optimizer (Wave optimization)\n* Custom KV cache compression (WaveKV filtering)\n* Custom Triton kernels (fused scatter-FFT-gather for H100)\n* Custom positional encoding (Wave Field pipeline)\n\nNone of the existing tools work when your math is fundamentally different.\n\nStandard transformers use Q·K^T dot products. We use damped wave propagation through a continuous field. Flash Attention can’t help us: it optimizes matrix multiplies we don’t do.\n\nSo we write our own.\n\nThe result: 20x faster than standard attention at 32K context. Runs at 128K context where other approaches run out of memory (OOM). Uses 5x less memory.\n\nBuilding the full stack isn’t a choice; it’s a requirement when you’re doing something new.\n\n**#WaveFieldLLM** **#AI** **#DeepLearning** **#Triton** **#CUDA** **#Optimization**",
"title": "Wave Field LLM — O(n log n) attention via wave equation dynamics, within 5% of standard transformer"
}