{
"path": "/niche-int-types-in-rust.html",
"site": "at://did:plc:x67qh7v3fd7znbdhauc45ng3/site.standard.publication/3mjcd2t6afe25",
"$type": "site.standard.document",
"title": "Niches for integer types in Rust",
"updatedAt": "2026-05-04T00:00:00.000Z",
"bskyPostRef": {
"cid": "bafyreigz572nhuadza4bpy33nh5uzvpjyxlgzo3bz2c5ybcn7fv5g5nddy",
"uri": "at://did:plc:x67qh7v3fd7znbdhauc45ng3/app.bsky.feed.post/3mkzwpegfyu2k"
},
"publishedAt": "2026-05-04T00:00:00.000Z",
"textContent": "While working on [seqair]\n(see [my post here][seqair-post]),\nI also wrote a bunch of wrapper types\nfor the domain we work in.\nOne example is Base (a DNA base),\nwhich is a fairly straightforward enum.\nAnother is Pos, a position in a DNA sequence,\nwhich I'd like to talk about here.\n\nThe value range for a position is effectively u31.\nWe only support the positions that the BAM file format supports,\nwhich store positions as i32.\nBut positions are always positive!\nIn C, one would use -1 to indicate \"no position\", \"invalid position\"\n(or a whole list of error cases)\nbut in Rust, I'd prefer not to do that.\nCan we make use of this in some other way?\n\n[seqair]: https://github.com/Softleif/seqair \"Pure-Rust BAM/SAM/CRAM/FASTA reader with a pileup engine and BCF/BAM writing.\"\n[seqair-post]: https://deterministic.space/seqair.html \"Seqair, a custom htslib reimplementation\"\n\nSimple position type\n\nSo here is what we have as a start.\nWe also track if it is zero- or one-based using a type parameter\nbecause some file formats count from 1 for human convenience.\n\nwhich we can use like this:\n\nType niches\n\nLet's look at Option<T>.\nIt is typically two things in memory:\na discriminant (is this Some or None?)\nand the payload.\nFor Option<u32>, that means 8 bytes:\n4 for the tag[^alignment] and 4 for the u32 value.\nBut the compiler is smarter than that\nwhen the inner type has _invalid_ bit patterns.\nA reference &T can never be null,\nso Option<&T> uses the null pointer pattern to represent None\nand stays pointer-sized[^null].\n\nThe standard library's [NonZero<u32>] works the same way:\nthe value 0 is invalid,\nso Option<NonZero<u32>> fits in 4 bytes.\nThese invalid bit patterns are called \"niches\".\n\n[^alignment]: Because of alignment to 32 bits.\n[^null]: That means it is as efficient to use Option in Rust\n as it is to use a null pointer in C.\n\nYou can verify this with size_of:\n\nOur Pos type wraps a plain u32, so 
Option<Pos<Zero>> is 8 bytes.\nThat's wasteful:\nwe know positions only go up to i32::MAX,\nwhich means half the u32 range is invalid.\nThat's billions of niches, and we can't use a single one!\n\nOn stable: the NonZero bias trick\n\n[NonZero<T>] is the only niche-bearing integer type\navailable on stable Rust (1.95.0).\nWe can use it by storing value + 1 internally:\nthe position 0 maps to NonZero(1),\nand i32::MAX maps to NonZero(0x8000_0000).\nThe 0 bit pattern is never used, giving us our niche.\n\nThis works, and Option<Pos<Zero>> is now 4 bytes.\nBut every new adds 1 and every get subtracts 1.\nIt's a single ALU instruction each time,\nso the cost is probably negligible in most code paths.\nStill, it's conceptually unsatisfying:\nwe're contorting the representation\nto fit a niche that doesn't match our actual invariant.\n\n[NonZero<u32>]: https://doc.rust-lang.org/1.95.0/std/num/struct.NonZero.html \"std::num::NonZero\"\n[NonZero<T>]: https://doc.rust-lang.org/1.95.0/std/num/struct.NonZero.html \"std::num::NonZero\"\n\nOn nightly: declaring the valid range directly\n\nAs of Rust 1.95 (May 2026),\nthere is no stable way to tell the compiler\n\"this u32 only holds values 0..=0x7FFF_FFFF.\"\n\n~~But internally, the standard library does exactly that for its own types\nusing the attributes\nrustc_layout_scalar_valid_range_start and rustc_layout_scalar_valid_range_end.~~\n\nOh wait -- while I was writing this post,\nthis exact feature got replaced!\n\nOn nightly: Pattern types\n\nWhile researching this,\nI came across [this issue][rust-135996]\nwhere [Oli] proposes using a \"pattern types\" feature.\nSo it seems there is a new way of doing this!\nWhile reading through the [tracking issue][rust-123646] and [Zulip channel],\nI found this [pre-RFC document][pre-rfc]\n(last updated in 2024 but discussed further in 2025).\nWhat is in the standard library right now\non nightly is a [pattern_type!] 
macro.\n[Rust PR 136006] has some usage of this so I could put this together:\n\n[Oli]: https://github.com/oli-obk\n[rust-135996]: https://github.com/rust-lang/rust/issues/135996 \"Replace rustc_layout_scalar_valid_range_start attribute with pattern types\"\n[rust-123646]: https://github.com/rust-lang/rust/issues/123646\n[pre-rfc]: https://gist.github.com/joboet/0cecbce925ee2ad1ee3e5520cec81e30\n[pattern_type!]: https://doc.rust-lang.org/1.95.0/core/macro.pattern_type.html \"core::pattern_type!\"\n[Rust PR 136006]: https://github.com/rust-lang/rust/pull/136006\n[Zulip channel]: https://rust-lang.zulipchat.com/#narrow/channel/481660-t-lang.2Fpattern-types\n\n[You can play with the code here.][play2]\n\n[play2]: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024&gist=d4c7bf6936d2bcef89455e7e271fba76\n\nFor now, transmute from the underlying type is the only way\nto construct the pattern type.\nOn Zulip, Oli also recommended using inclusive ranges.\n\nConclusion\n\nI'm not using any of these nightly features in the real code yet,\nbut I'm glad to see momentum in this space.\nIt's a feature I've wanted in a few places already.\nOther use cases for pattern types are an \"inline length\" type,\nremoving a workaround like the one SmolStr uses [here][smol_str len],\nor types that have sentinels by specification,\nlike INT8 in BAM files which actually is -120..=127.\n\n[smol_str len]: https://github.com/rust-lang/rust-analyzer/blob/4a244d4c6bf18bae57626dcaf81bf6442ad59380/lib/smol_str/src/lib.rs#L541-L569",
"canonicalUrl": "https://deterministic.space/niche-int-types-in-rust.html"
}