Take a look at the neural network framework I wrote, which is implemented in Rust + CUDA

Hugging Face Forums [Unofficial] May 2, 2026

Source

github.com

Lumen is a lightweight, high-performance deep learning training and inference framework written in Rust + CUDA.

Highlights

Rust-first, not Rust-only implementation
- Rust owns the framework structure and most high-level logic.
- CUDA C++ is used for optional GPU acceleration.
- CPU-only builds remain available without the cuda feature.
Dynamic autograd built around tensor graph construction
Module-style abstraction for model components
Separated layers / ops / models for easier experimentation
Flexible precision system
- parameter dtype
- runtime dtype
- activation dtype
- KV-cache dtype
Quantization-aware loading
- load float weights normally
- quantize on load to i8
- generate offline quantized safetensors
CPU and CUDA execution paths with explicit kernel/backend work
Hugging Face tokenizers integration
Safetensors support with memory-mapped and streamed loading modes
Release profile tuned with lto, panic = "abort", and strip
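
To make the "quantize on load to i8" item under Quantization-aware loading more concrete, here is a minimal sketch of symmetric per-tensor int8 quantization. It is not taken from Lumen's source: the names `QuantizedTensor`, `quantize_i8`, and `dequantize` are hypothetical, and the choice of one scale per tensor (rather than per channel or per block) is an assumption made for illustration.

```rust
/// Hypothetical container for an i8-quantized tensor.
/// Illustrative only; not Lumen's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct QuantizedTensor {
    /// Quantized values in the range [-127, 127].
    pub data: Vec<i8>,
    /// Single per-tensor scale: real_value ~= data[i] as f32 * scale.
    pub scale: f32,
}

/// Symmetric per-tensor quantization (assumed scheme):
/// scale = max|w| / 127, q = round(w / scale) clamped to [-127, 127].
pub fn quantize_i8(weights: &[f32]) -> QuantizedTensor {
    // Largest absolute value in the tensor; 0.0 for an empty or all-zero tensor.
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Fall back to a scale of 1.0 so an all-zero tensor does not divide by zero.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let data = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedTensor { data, scale }
}

/// Map quantized values back to approximate f32 weights.
pub fn dequantize(q: &QuantizedTensor) -> Vec<f32> {
    q.data.iter().map(|&v| v as f32 * q.scale).collect()
}
```

Under these assumptions, a loader would call something like `quantize_i8` on each float tensor as it is read, keep only the `i8` data and the scale in memory, and call `dequantize` (or fold the scale into the matmul kernel) at compute time. The round-trip error per element is bounded by roughly half the scale.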