Help with optimization/profiling
No worries, I did see that mentioned in the initial post. I do like that we have a somewhat cleaner baseline now.
I’ve now found a way to speed up imageBytes a bit. The problem with the current formulation is the function argument: your imageBytes (and some of its callers) takes a function that tells it how to normalize floats to pixel values. This function gets called for every pixel, but all it does is take a Float and produce a Pixel8. Because the function is not known where it is called, each of these values has to be allocated as a box, which adds quite a bit of overhead, so unboxing them is beneficial. You can do that manually like this:
{-# NOINLINE imageBytes #-}
imageBytes :: SizedImage -> (Float# -> Pixel8#) -> BSL.ByteString
imageBytes img convertPixel# = do
  let convertPixel (F# x) = W8# (convertPixel# x)
      image =
        generateImage
          (\x y -> convertPixel $ img.entries SV.! ((img.height - y - 1) * img.width + x))
          img.width
          img.height
  case encodePalettedPng viridisPalette image of
    Right v -> v
    Left e -> error e
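For illustration, here is a minimal self-contained sketch of the same pattern without JuicyPixels. It is not the code from the project: `sumPixels` is a hypothetical stand-in for imageBytes, `normalize#` is a made-up normalizer, and `Pixel8#` is assumed to be an alias for `Word8#` (the post does not show its definition). It also assumes GHC 9.2 or later, where `W8#` wraps a `Word8#` and `wordToWord8#` is available.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts
  ( Float (F#), Float#, Word8#
  , float2Int#, gtFloat#, int2Word#, isTrue#, ltFloat#, timesFloat#, wordToWord8#
  )
import GHC.Word (Word8 (W8#))

-- Assumption: Pixel8# is simply an unboxed byte.
type Pixel8# = Word8#

-- Hypothetical normalizer: maps [0, 1] to [0, 255] and clamps anything outside.
normalize# :: Float# -> Pixel8#
normalize# x
  | isTrue# (x `ltFloat#` 0.0#) = wordToWord8# 0##
  | isTrue# (x `gtFloat#` 1.0#) = wordToWord8# 255##
  | otherwise = wordToWord8# (int2Word# (float2Int# (x `timesFloat#` 255.0#)))

-- Stand-in for imageBytes: the function argument works on unboxed values,
-- and the boxed wrapper is built locally, just like convertPixel in the post.
{-# NOINLINE sumPixels #-}
sumPixels :: (Float# -> Pixel8#) -> Int -> Int
sumPixels convertPixel# n = go 0 0
  where
    convertPixel (F# x) = W8# (convertPixel# x)

    go :: Int -> Int -> Int
    go !acc i
      | i >= n = acc
      | otherwise =
          let v = fromIntegral i / fromIntegral n :: Float
           in go (acc + fromIntegral (convertPixel v)) (i + 1)

main :: IO ()
main = print (sumPixels normalize# 4) -- prints 381 (0 + 63 + 127 + 191)
```

Because the wrapper is local to `sumPixels`, GHC can see both the `F#` match and the `W8#` construction next to the call and has a chance to remove them, even though `convertPixel#` itself stays an unknown function.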
I also modified some of the functions that call imageBytes, but only in a quite straightforward way. This cuts total allocation quite a bit and reduces the running time by about 100 ms:
% cabal -O2 run exes -- +RTS -s
     601,347,672 bytes allocated in the heap
         607,824 bytes copied during GC
     100,037,912 bytes maximum residency (4 sample(s))
         670,440 bytes maximum slop
             237 MiB total memory in use (1 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        44 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         4 colls,     0 par    0.001s   0.004s     0.0009s    0.0022s

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    0.705s  (  0.703s elapsed)
  GC      time    0.001s  (  0.004s elapsed)
  EXIT    time    0.001s  (  0.009s elapsed)
  Total   time    0.709s  (  0.718s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    852,594,474 bytes per MUT second

  Productivity  99.5% of total user, 98.0% of total elapsed
Another way to do this is to inline every function that takes the function argument, all the way up to the point where you supply the concrete function. That is what JuicyPixels does; generateImage and related functions all have INLINE pragmas. In your case that means you would need to add INLINE pragmas to imageBytes and generateImageSvg. But the latter is quite a large function, so it is a bit silly to inline it just for this purpose.
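As a rough sketch of the inlining approach, again with a hypothetical `sumPixels` standing in for imageBytes and a made-up `normalize` function rather than anything from the project:

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Word (Word8)

-- Stand-in for imageBytes. INLINE asks GHC to copy the body to each call
-- site, where the concrete convertPixel is known and can be optimized with it.
{-# INLINE sumPixels #-}
sumPixels :: (Float -> Word8) -> Int -> Int
sumPixels convertPixel n = go 0 0
  where
    go !acc i
      | i >= n = acc
      | otherwise =
          let v = fromIntegral i / fromIntegral n :: Float
           in go (acc + fromIntegral (convertPixel v)) (i + 1)

-- Hypothetical normalizer on ordinary boxed types: clamp to [0, 1], scale to a byte.
normalize :: Float -> Word8
normalize x = truncate (255 * max 0 (min 1 x))

main :: IO ()
main = print (sumPixels normalize 4) -- prints 381 (0 + 63 + 127 + 191)
```

Here the source keeps the ordinary Float and Word8 types and relies on the optimizer to remove the boxes once `normalize` is visible inside the loop. Whether that actually happens in a given build is best checked by looking at the Core output, for example with -ddump-simpl.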