High-performance zero-copy tensor serialization for inference
We’re too comfortable with serialization that treats high-end silicon like a text parser. Tenso eliminates the invisible tax where formats like SafeTensors and Pickle burn 40% of your CPU just to move data.
The update introduces a Direct Pinned Memory Reader. It reads tensors into page-locked (pinned) host memory so that PyTorch and JAX can issue async DMA transfers straight to VRAM, skipping the extra staging copy and keeping CPU usage at 0.8%.
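A minimal sketch of the read-into-a-preallocated-buffer idea, using only the Python standard library. It is not Tenso's reader: the `read_into_staging` helper is hypothetical, and the `bytearray` is an ordinary pageable buffer standing in for a pinned one. Real page-locked memory needs a CUDA-aware allocator (in PyTorch, for example, `Tensor.pin_memory()` followed by `.to("cuda", non_blocking=True)`). What the sketch does show is the host-side half: bytes land once in a reusable buffer and are then viewed as floats without another copy.

```python
import io
import struct


def read_into_staging(stream, staging, nbytes):
    """Fill the first nbytes of a preallocated buffer directly from a stream.

    Hypothetical helper: `staging` stands in for a pinned host buffer.
    Returns a memoryview over the filled region; no bytes are copied out.
    """
    if nbytes > len(staging):
        raise ValueError("payload larger than staging buffer")
    view = memoryview(staging)[:nbytes]
    filled = 0
    while filled < nbytes:
        n = stream.readinto(view[filled:])  # writes straight into staging
        if not n:
            raise EOFError("stream ended before payload was complete")
        filled += n
    return view


# Four float32 values in native byte order, as if they arrived over a socket.
payload = struct.pack("4f", 1.0, 2.0, 3.0, 4.0)

staging = bytearray(1 << 16)  # allocated once, reused for every message
view = read_into_staging(io.BytesIO(payload), staging, len(payload))

# Reinterpret the bytes as float32 in place; this is a view, not a copy.
floats = view.cast("f")
values = floats.tolist()

# Writing through the view changes the staging buffer itself.
floats[0] = 9.0
first_after_write = struct.unpack_from("f", staging, 0)[0]
```

Because `floats` aliases `staging`, the write through the view is visible in the buffer, which is the property a DMA engine relies on when it reads from a pinned region.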
I’ve also hardened the protocol with strict validation guards and optional XXH3 checksums. Bluntly, enabling checksums kills the zero-copy speed, because hashing has to touch every byte of the payload, but safety is now a configurable trade-off. With native async support for FastAPI and gRPC, Tenso is finally a transport layer that respects the hardware.
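To make the trade-off concrete, here is a small sketch of header validation plus an opt-in checksum. Everything in it is an assumption for illustration: the frame layout, the `DEMO` magic, the dtype codes, and the `encode`/`decode` names are hypothetical and are not Tenso's wire format. It also swaps in `zlib.crc32` from the standard library where Tenso uses XXH3, so the example runs without extra packages.

```python
import struct
import zlib

MAGIC = b"DEMO"                        # hypothetical magic bytes
HEADER = struct.Struct("<4sBBHI")      # magic, flags, dtype code, ndim, crc32
FLAG_CHECKSUM = 0x01
ITEMSIZE = {0: 4, 1: 8}                # hypothetical codes: 0=float32, 1=float64
MAX_NDIM = 8


def encode(shape, dtype_code, payload, checksum=False):
    """Build a frame: fixed header, shape as uint32 values, raw payload."""
    flags = FLAG_CHECKSUM if checksum else 0
    crc = zlib.crc32(payload) if checksum else 0
    head = HEADER.pack(MAGIC, flags, dtype_code, len(shape), crc)
    dims = struct.pack(f"<{len(shape)}I", *shape)
    return head + dims + bytes(payload)


def decode(frame):
    """Validate a frame and return (shape, dtype_code, payload_view)."""
    buf = memoryview(frame)
    if len(buf) < HEADER.size:
        raise ValueError("frame shorter than header")
    magic, flags, dtype_code, ndim, crc = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if dtype_code not in ITEMSIZE:
        raise ValueError("unknown dtype code")
    if ndim > MAX_NDIM:
        raise ValueError("too many dimensions")
    dims_end = HEADER.size + 4 * ndim
    if len(buf) < dims_end:
        raise ValueError("truncated shape")
    shape = struct.unpack_from(f"<{ndim}I", buf, HEADER.size)
    expected = ITEMSIZE[dtype_code]
    for dim in shape:
        expected *= dim
    payload = buf[dims_end:]           # a view into the frame, not a copy
    if len(payload) != expected:
        raise ValueError("payload size does not match shape")
    # The guards above only look at the header. The checksum is the one
    # step that has to read every payload byte.
    if flags & FLAG_CHECKSUM and zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return shape, dtype_code, payload


data = struct.pack("<6f", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
frame = encode((2, 3), 0, data, checksum=True)
shape, dtype_code, payload = decode(frame)
```

The header guards cost the same regardless of tensor size, while the checksum scales with the payload. That is why it makes sense as a per-message flag rather than an always-on check.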
GitHub: github.com/Khushiyant/tenso