Help with optimization/profiling

Haskell Community [Unofficial] April 26, 2026

No worries, I did see that mentioned in the initial post. I do like that we have a bit cleaner baseline now.

I’ve now found a way to speed up imageBytes a bit. The problem with the current formulation is the function argument: imageBytes (and some of its parent functions) takes a function that tells it how to normalize floats to pixel values. That function is called once per pixel, yet all it does is take a Float and produce a Pixel8. Since the function is not known inside imageBytes, both values are passed around boxed, and allocating those boxes for every pixel carries noticeable overhead, so unboxing them is beneficial. You can do that manually like this:

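-- Needs the MagicHash extension; F# and Float# come from GHC.Exts, W8# from GHC.Word.
-- Pixel8# is presumably a local synonym for Word8# (type Pixel8# = Word8#).
{-# NOINLINE imageBytes #-}
imageBytes :: SizedImage -> (Float# -> Pixel8#) -> BSL.ByteString
imageBytes img convertPixel# = do
    -- Box only at the boundary: unwrap the Float, convert, rewrap the result as a Word8.
    let convertPixel (F# x) = W8# (convertPixel# x)
        image =
            generateImage
                -- Image row y reads from row (height - y - 1) of the entries, i.e. a vertical flip.
                (\x y -> convertPixel $ img.entries SV.! ((img.height - y - 1) * img.width + x))
                img.width
                img.height
    case encodePalettedPng viridisPalette image of
        Right v -> v
        Left e -> error e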

I also modified some of the functions that call imageBytes, but only in a straightforward way. This cuts the total allocation considerably and reduces the running time by about 100ms:

% cabal -O2 run exes -- +RTS -s
     601,347,672 bytes allocated in the heap
         607,824 bytes copied during GC
     100,037,912 bytes maximum residency (4 sample(s))
         670,440 bytes maximum slop
             237 MiB total memory in use (1 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        44 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         4 colls,     0 par    0.001s   0.004s     0.0009s    0.0022s

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    0.705s  (  0.703s elapsed)
  GC      time    0.001s  (  0.004s elapsed)
  EXIT    time    0.001s  (  0.009s elapsed)
  Total   time    0.709s  (  0.718s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    852,594,474 bytes per MUT second

  Productivity  99.5% of total user, 98.0% of total elapsed
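
For reference, the caller-side change could look roughly like the sketch below. This is not the code from the actual project: normalize, normalize# and applyAll are made-up names, applyAll is only a stand-in for a consumer like imageBytes, and I'm assuming GHC 9.2 or later, where W8# wraps a Word8#. The idea is just that the boxed normalizer gets a thin unboxed adapter, which is what you then pass down.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Float (F#), Float#, Word8#)
import GHC.Word (Word8 (W8#))

-- Hypothetical boxed normalizer: maps [lo, hi] linearly onto 0..255,
-- clamping values that fall outside the range.
normalize :: Float -> Float -> Float -> Word8
normalize lo hi x = round (255 * clamp ((x - lo) / (hi - lo)))
  where
    clamp = max 0 . min 1

-- Unboxed adapter: rebox the argument, run the ordinary function, and
-- unwrap the result. With optimisation on, GHC can usually remove the
-- intermediate boxes inside this small function.
normalize# :: Float -> Float -> Float# -> Word8#
normalize# lo hi x# = case normalize lo hi (F# x#) of
    W8# w# -> w#

-- Hypothetical consumer standing in for imageBytes: it only ever sees
-- the unboxed function and boxes results where it has to store them.
applyAll :: (Float# -> Word8#) -> [Float] -> [Word8]
applyAll convert# = map (\(F# x#) -> W8# (convert# x#))
```

A call site would then pass the partially applied adapter, e.g. applyAll (normalize# 0 1) values.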

Another way to do this is to inline every function that takes the function argument, all the way up to the point where the argument is actually supplied. That is what JuicyPixels does: generateImage and related functions all have INLINE pragmas. In your case that means you would need to add INLINE pragmas to imageBytes and generateImageSvg. But the latter is quite a large function, so it is a bit silly to inline it just for this purpose.
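
To make the inlining variant concrete, here is a minimal sketch with made-up names (renderWith, toPixel and renderUnit are not from your code or from JuicyPixels). The higher-order function is marked INLINE, so at a call site where the concrete function is known, GHC can substitute it in and optimise the per-element call directly. One detail worth knowing: GHC only inlines an INLINE function when it is applied to at least as many arguments as appear on the left-hand side of its definition, so the call below is written fully applied.

```haskell
import Data.Word (Word8)

-- Hypothetical stand-in for a function like imageBytes that takes the
-- per-pixel conversion as an argument.
{-# INLINE renderWith #-}
renderWith :: (Float -> Word8) -> [Float] -> [Word8]
renderWith convert = map convert

-- Hypothetical concrete conversion: clamp to [0, 1], then scale to 0..255.
toPixel :: Float -> Word8
toPixel x = round (255 * max 0 (min 1 x))

-- The function argument is supplied here, so this is where inlining
-- renderWith lets GHC see toPixel and specialise the loop to it.
renderUnit :: [Float] -> [Word8]
renderUnit xs = renderWith toPixel xs
```

The cost is that the body of the inlined function is duplicated at every call site, which is why this is less attractive for something as large as generateImageSvg.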
