Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreig4z6fjg57jucjbwt2ethentepxfgccx5ffiqea4c6bnid3h2flmi",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mhgwyy73z6y2"
  },
  "path": "/t/wave-field-llm-o-n-log-n-attention-via-wave-equation-dynamics-within-5-of-standard-transformer/173625#post_5",
  "publishedAt": "2026-03-19T13:53:13.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "**[ Update ] Just built fused Triton kernels for Wave Field LLM v5.**\n\nWhen you build an architecture from scratch, you end up building\n\neverything from scratch.\n\n  * Custom attention mechanism (O(n log n) via FFT wave convolution)\n\n  * Custom optimizer (Wave optimization)\n\n  * Custom KV cache compression (WaveKV filtering)\n\n  * Custom Triton kernels (fused scatter-FFT-gather for H100)\n\n  * Custom positional encoding (Wave Field pipeline)\n\n\n\n\nNone of the existing tools work when your math is fundamentally\n\ndifferent.\n\nStandard transformers use Q·K^T dot products. We use damped wave\n\npropagation through a continuous field. Flash Attention can’t help us\n\n: it optimizes matrix multiplies we don’t do.\n\nSo we write our own.\n\nThe result: 20x faster than standard attention at 32K context. Runs at 128K where others OOM. 5x less memory.\n\nBuilding the full stack isn’t a choice — it’s a requirement when\n\nyou’re doing something new.\n\n**#WaveFieldLLM** **#AI** **#DeepLearning** **#Triton** **#CUDA** **#Optimization**",
  "title": "Wave Field LLM — O(n log n) attention via wave equation dynamics, within 5% of standard transformer"
}
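
The post describes attention as "O(n log n) via FFT wave convolution" with "damped wave propagation" but does not show the mechanism. As a rough illustration only, the sketch below mixes tokens with a causal convolution computed through a zero-padded FFT. Everything in it is an assumption: the kernel shape (a decaying cosine), the names `damped_wave_kernel` and `fft_causal_conv`, and the `freq` and `damping` parameters are hypothetical and not taken from Wave Field LLM, and the fused Triton kernels mentioned in the post are not reproduced here.

```python
import numpy as np


def damped_wave_kernel(n, freq=0.05, damping=0.01):
    """Hypothetical kernel: a decaying cosine, i.e. a damped oscillation.

    This shape is an assumption for illustration, not the post's actual kernel.
    """
    t = np.arange(n)
    return np.exp(-damping * t) * np.cos(2.0 * np.pi * freq * t)


def fft_causal_conv(x, kernel):
    """Causal convolution along the sequence axis in O(n log n).

    x:      array of shape (n, d), one d-dimensional vector per position
    kernel: array of shape (n,), weight applied to a token `lag` steps back
    """
    n = x.shape[0]
    size = 2 * n  # zero-pad so the circular FFT product equals a linear convolution
    x_f = np.fft.rfft(x, n=size, axis=0)
    k_f = np.fft.rfft(kernel, n=size)
    y = np.fft.irfft(x_f * k_f[:, None], n=size, axis=0)
    return y[:n]  # keep the first n outputs; each depends only on earlier inputs


def direct_causal_conv(x, kernel):
    """O(n^2) reference used only to check the FFT version."""
    n = x.shape[0]
    out = np.zeros_like(x)
    for t in range(n):
        for s in range(t + 1):
            out[t] += kernel[t - s] * x[s]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8))
k = damped_wave_kernel(64)
y = fft_causal_conv(x, k)
print(y.shape, np.allclose(y, direct_causal_conv(x, k)))
```

The zero-padding to `2 * n` is what keeps the result causal: without it the FFT product would wrap around and let later positions leak into earlier ones. The cost is dominated by the three FFTs, which is where an O(n log n) bound of the kind the post cites would come from in a scheme like this.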