Other Applications • Re: Dactyloidae Browser (Basilisk fork)
I rewrote the memory management for JavaScript using SSE2. I don't know if I can push this upstream, though, considering Pale Moon is AVX only.
Keep in mind that I am using the AVX2 build of Pale Moon, and Dactyloidae 13.0pre1 smokes it at loading YT despite being SSE2 only.
I should probably clarify something for anyone listening who doesn't know a lot about instruction sets...
Partial Supersets, Not Complete Replacements:
SSE2 is a superset of MMX for integers: SSE2 added 128-bit integer instructions, making MMX redundant for integer workloads (and avoiding MMX's aliasing of the x87 floating-point registers). Modern compilers rarely generate MMX code.
AVX is a superset of SSE: AVX uses the VEX prefix to extend the SSE floating-point instructions to 256 bits (e.g., vaddps ymm0, ymm1, ymm2 vs. SSE’s addps xmm0, xmm1). SSE instructions still work on AVX-enabled CPUs (the XMM registers are the lower 128 bits of the YMM registers).
AVX2 is a superset of AVX: adds 256-bit integer instructions and gather loads to AVX (scatter stores only arrived with AVX-512).
AVX-512 is a superset (with caveats): extends AVX2 to 512 bits, but it is split into several feature subsets and requires hardware support (not all CPUs have it).
So, some people might misread that and think having AVX means you can't have SSE... that's not actually the case. AVX just opens the door to more optimizations. So we could have all the optimizations he has and also some he doesn't; we wouldn't have to drop back down to SSE2 just to benefit.
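To make the "superset" point concrete, here is a rough sketch of how a program can check what the CPU supports and pick a code path. The FEAT_* bits, the simd_path enum and the function names are all made up for illustration; none of this is taken from Pale Moon or Dactyloidae. The only real API used is the GCC/Clang __builtin_cpu_init / __builtin_cpu_supports pair, and the sketch assumes one of those compilers on x86 (anywhere else it simply reports no features).

```c
/* Hypothetical feature bits, invented for this sketch (not a real API). */
enum {
    FEAT_SSE2 = 1,
    FEAT_AVX  = 2,
    FEAT_AVX2 = 4
};

typedef enum { PATH_SCALAR, PATH_SSE2, PATH_AVX, PATH_AVX2 } simd_path;

/* Ask the CPU what it supports. Uses the GCC/Clang x86 builtins;
   on other compilers or architectures it reports nothing. */
unsigned detect_features(void)
{
    unsigned f = 0;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) f |= FEAT_SSE2;
    if (__builtin_cpu_supports("avx"))  f |= FEAT_AVX;
    if (__builtin_cpu_supports("avx2")) f |= FEAT_AVX2;
#endif
    return f;
}

/* Pick the widest path available. The narrower paths stay valid:
   choosing AVX2 here does not make SSE2 code stop working. */
simd_path pick_path(unsigned features)
{
    if (features & FEAT_AVX2) return PATH_AVX2;
    if (features & FEAT_AVX)  return PATH_AVX;
    if (features & FEAT_SSE2) return PATH_SSE2;
    return PATH_SCALAR;
}

/* An SSE2 routine is usable whenever the SSE2 bit is set,
   regardless of which wider extensions are also present. */
int can_run_sse2_code(unsigned features)
{
    return (features & FEAT_SSE2) != 0;
}
```

On a machine that reports all three bits, pick_path(detect_features()) would choose PATH_AVX2, and can_run_sse2_code() would still return true for the same feature set. That is the whole point: an AVX or AVX2 build can keep an SSE2-optimized routine and add wider ones next to it.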
It's impressive because he got the speed improvement without using any AVX optimization at all, working entirely within the limits of what SSE2 can do. It's like someone getting an application to run faster on a Pentium III than it did on a Pentium D. It wouldn't mean the Pentium III itself was the better CPU; it would mean the code was so well optimized that it beat the old code despite the handicap of a less capable instruction set or older hardware.