{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreif2r2lj5ybxhfvej7bjyuwllom4diipft4aextmrx4qxr2youz4uu",
    "uri": "at://did:plc:pi6woz4d47bkuws673w2il2r/app.bsky.feed.post/3midwwjdkpr72"
  },
  "path": "/t/ray-tracing-in-one-weekend/10078#post_9",
  "publishedAt": "2026-03-30T22:52:23.000Z",
  "site": "https://discourse.haskell.org",
  "tags": [
    "https://dpwiz.gitlab.io/rtow/"
  ],
  "textContent": "> Use SIMD, do ssssttttuuuuppppiiiidddd tttthhhhiiiinnnnggggssss faster, with more energy efficiency\n\nA long-delayed post-Zurihac update. The only primops used so far are the basics that the NCG got in 9.12. No special layouts and fancy primops as they will only appear in 9.16 (and then I’ll need some more). Just do the regular thing four lanes at once. It’s a good start… oh wai~\n\nIt got better since then. The models took a repo with half-staged commits and made it running without segfaults. They also told me that I was absolutely right, but my BVH code is stupid and could be much simpler and faster. And also cooked benchmark harness which I was procrastinated since the very beginning. I love getting faster, but hate writing benchmarks\n\nBut one scene was quite resistant to the improvements, the `finalSceneHigh` from the 2nd book.\nI wasn’t sure why and had to guess until I implemented a tiled scheduler. The abysmally slow tile was not where I expected it to be, but it persisted through most of the run at just 1/40 of the mean pixel rate. And also broke the tiled scheduler by leaving the last job working full steam, while the remaining 15 cores were idle.\n\nI took a low-SPP run to measure the tiles and put the slowest first. Then subdivide the slowest half. Then subdivide the slowest quarter again. This mostly solved the idling cores, but the solution wasn’t satisfactory.\nAnd the code was a mess. While refactoring it around I thought that I can skip the tile/subtile distinction and the grid itself and just work with arbitrary rectangles. That also served one of the long-standing project’s goals - binding the renderer to UI where I can select regions and get them rendered for debugging.\nI still have no UI (or a job server to distribute the load over all of the household appliances around my house and my parents’ too), and no debugging. 
But after a few iterations the round limit and most of the manual size tweaking were gone, as the system was optimizing itself by measuring and subdividing until the tiles were fast enough or too small.\nThe slow bunch is still slow, but the system now crushes it first and fills the scheduling gap at the end with the easier tiles.\n\nI even went back and recorded the historical renders to make a timeline mini-site: https://dpwiz.gitlab.io/rtow/\n\nThis is far from over, but I’m not dead-inside wrt this codebase anymore.",
  "title": "Ray Tracing in One Weekend"
}