{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreia2qzapj2nzlymeqlk2c6oj3enhuwgysnuatproes6giif7zoy5qe",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mgrbayou72a2"
},
"path": "/t/technical-blog-post-streaming-algorithms-and-numerical-stability-in-ml-systems/174162#post_1",
"publishedAt": "2026-03-11T01:42:02.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"huggingface.co",
"FlashAttention, Streaming Algorithms, and Numerical Stability in Modern ML..."
],
"textContent": "Hey everyone — I published a technical blog post today on streaming algorithms and numerical stability in ML systems, using FlashAttention as the main example:\n\n**What it covers:**\nFlashAttention’s tiled computation avoids materializing the full attention matrix — but doing so requires maintaining numerically stable running statistics (running max, normalization constant, output accumulator). This turns out to be the same design constraint behind stable softmax, log-sum-exp, and Welford’s variance algorithm.\n\nThe post traces that common pattern and includes two small experiments:\n\n * Variance: four mathematically equivalent formulas that produce wildly different results under float32 (including one that returns -65,542 when the correct answer is ~1)\n * Softmax: naive vs. subtract-max, showing overflow propagation to NaN\n\n\n\n**Why I wrote it:**\nA lot of the “implementation details” in ML infrastructure aren’t really details — they’re load-bearing. I wanted to write something that made that concrete rather than just asserting it.\n\nWould love feedback, especially:\n\n * Are there other examples of this pattern I should have included?\n * Anything in the numerical stability section that could be sharper?\n\n\n\nto post:\n\nhuggingface.co\n\n### FlashAttention, Streaming Algorithms, and Numerical Stability in Modern ML...\n\nA Blog post by Jen Wei on Hugging Face\n\n, Jen",
"title": "Technical blog post -- streaming algorithms and numerical stability in ML systems"
}