{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreicqwopifytixle3dzvll2zb4darnajks2hb2yb5r5nw6wo5m6fxly",
"uri": "at://did:plc:pi6woz4d47bkuws673w2il2r/app.bsky.feed.post/3mkgjyhignpz2"
},
"path": "/t/help-with-optimization-profiling/13982#post_13",
"publishedAt": "2026-04-26T20:51:51.000Z",
"site": "https://discourse.haskell.org",
"textContent": "No worries, I did see that mentioned in the initial post. I do like that we have a slightly cleaner baseline now.\n\nI’ve now found a way to speed up `imageBytes` a bit. A problem with the current formulation is your use of a function argument. Your `imageBytes` (and some of its parent functions) takes a function as an argument which tells it how to normalize floats to pixel values. This function gets called for every pixel, but it only takes in a `Float` and produces a `Pixel8`. GHC cannot unbox values across a call to an unknown function, so both values are passed boxed, and allocating those boxes for every pixel carries quite some overhead. Unboxing them would be beneficial. You can do that manually like this:\n\n\n    {-# NOINLINE imageBytes #-}\n    imageBytes :: SizedImage -> (Float# -> Pixel8#) -> BSL.ByteString\n    imageBytes img convertPixel# = do\n      let convertPixel (F# x) = W8# (convertPixel# x)\n          image =\n            generateImage\n              (\\x y -> convertPixel $ img.entries SV.! ((img.height - y - 1) * img.width + x))\n              img.width\n              img.height\n      case encodePalettedPng viridisPalette image of\n        Right v -> v\n        Left e -> error e\n\n\nI also modified some of the functions that call `imageBytes`, but only in a quite straightforward way. 
This cuts down total allocation quite a bit, and the running time drops by about 100ms:\n\n\n % cabal -O2 run exes -- +RTS -s\n 601,347,672 bytes allocated in the heap\n 607,824 bytes copied during GC\n 100,037,912 bytes maximum residency (4 sample(s))\n 670,440 bytes maximum slop\n 237 MiB total memory in use (1 MiB lost due to fragmentation)\n\n Tot time (elapsed) Avg pause Max pause\n Gen 0 44 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s\n Gen 1 4 colls, 0 par 0.001s 0.004s 0.0009s 0.0022s\n\n INIT time 0.001s ( 0.001s elapsed)\n MUT time 0.705s ( 0.703s elapsed)\n GC time 0.001s ( 0.004s elapsed)\n EXIT time 0.001s ( 0.009s elapsed)\n Total time 0.709s ( 0.718s elapsed)\n\n %GC time 0.0% (0.0% elapsed)\n\n Alloc rate 852,594,474 bytes per MUT second\n\n Productivity 99.5% of total user, 98.0% of total elapsed\n\n\nAnother way to do this is to inline all the functions that take the function argument, up to the point where you actually provide it, so that GHC can see the concrete function. That is what JuicyPixels does; `generateImage` and related functions all have `INLINE` pragmas. In your case that means you would need to add `INLINE` pragmas to `imageBytes` and `generateImageSvg`. But the latter is quite a large function, so it is a bit silly to inline it just for this purpose.",
"title": "Help with optimization/profiling"
}