{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibwaqsqc7yb6uguze4mr3wstymagqmy64kqlbyxtpyepxk76rfjti",
"uri": "at://did:plc:3okajqf5ifpzwzkuzluxd66m/app.bsky.feed.post/3ml2b7xjppyo2"
},
"path": "/niche-int-types-in-rust.html",
"publishedAt": "2026-05-03T22:00:00.000Z",
"site": "https://deterministic.space",
"tags": [
"seqair",
"my post here",
"NonZero<u32>",
"NonZero<T>",
"this issue",
"Oli",
"tracking issue",
"Zulip channel",
"pre-RFC document",
"pattern_type!",
"Rust PR 136006",
"You can play with the code here.",
"here"
],
"textContent": "While working on seqair (see my post here ), I also wrote a bunch of wrapper types for the domain we work in. One example is `Base` (a DNA base), which is a fairly straightforward enum. Another is `Pos`, a position in a DNA sequence, which I’d like to talk about here.\n\nThe value range for a position is effectively `u31`. We only support the positions that the BAM file format supports, which store positions as `i32`. But positions are always positive! In C, one would use `-1` to indicate “no position”, “invalid position” (or a whole list of error cases) but in Rust, I’d prefer not to do that. Can we make use of this in some other way?\n\n## Simple position type\n\nSo here is what we have as a start. We also track if it is zero- or one-based using a type parameter because some file formats count from `1` for human convenience.\n\n\n pub struct Pos<S> {\n value: u32,\n _system: PhantomData<S>,\n }\n\n pub struct Zero;\n pub struct One;\n\n impl TryFrom<i32> for Pos<Zero> {\n type Error = InvalidPosition;\n\n fn try_from(value: i32) -> Result<Self, Self::Error> {\n if value < 0 { return Err(InvalidPosition); }\n Ok(Self { value: value as u32, _system: PhantomData })\n }\n }\n\n #[derive(Debug, thiserror::Error)]\n #[error(\"Invalid position\")]\n struct InvalidPosition;\n\n\nwhich we can use like this:\n\n\n let record = get_raw_bam_record();\n let position: Pos<Zero> = record.pos.try_into()?;\n\n\n## Type niches\n\nLet’s look at `Option<T>`. It is typically two things in memory: a discriminant (is this `Some` or `None`?) and the payload. For `Option<u32>`, that means 8 bytes: 4 for the tag1 and 4 for the `u32` value. But the compiler is smarter than that when the inner type has _invalid_ bit patterns. 
A reference `&T` can never be null, so `Option<&T>` uses the null pointer pattern to represent `None` and stays pointer-sized[2].\n\nThe standard library’s `NonZero<u32>` works the same way: the value `0` is invalid, so `Option<NonZero<u32>>` fits in 4 bytes. These invalid bit patterns are called “niches”.\n\nYou can verify this with `size_of`:\n\n\n    use std::num::NonZero;\n    assert_eq!(size_of::<Option<u32>>(), 8); // no niche\n    assert_eq!(size_of::<Option<NonZero<u32>>>(), 4); // niche\n\n\nOur `Pos` type wraps a plain `u32`, so `Option<Pos<Zero>>` is 8 bytes. That’s wasteful: we know positions only go up to `i32::MAX`, which means half the `u32` range is invalid. That’s billions of niches, and we can’t use a single one!\n\n## On stable: the `NonZero` bias trick\n\n`NonZero<T>` is the only niche-bearing integer type available on stable Rust (1.95.0). We can use it by storing `value + 1` internally: the position `0` maps to `NonZero(1)`, and `i32::MAX` maps to `NonZero(0x8000_0000)`. The `0` bit pattern is never used, giving us our niche.\n\n\n    use std::marker::PhantomData;\n    use std::num::NonZeroU32;\n\n    #[repr(transparent)]\n    pub struct Pos<S> {\n        value: NonZeroU32, // stores actual_value + 1\n        _system: PhantomData<S>,\n    }\n\n    impl TryFrom<i32> for Pos<Zero> {\n        type Error = ();\n\n        fn try_from(value: i32) -> Result<Self, Self::Error> {\n            if value < 0 { return Err(()); }\n            let new_val = value as u32 + 1;\n            Ok(Self {\n                // SAFETY: value >= 0, so new_val is in 1..=0x8000_0000:\n                // it is never zero and the addition cannot overflow u32\n                value: unsafe { NonZeroU32::new_unchecked(new_val) },\n                _system: PhantomData,\n            })\n        }\n    }\n\n    impl Pos<Zero> {\n        pub const fn get(self) -> i32 {\n            // Undo the +1 bias applied on construction\n            (self.value.get() - 1) as i32\n        }\n    }\n\n\nThis works, and `Option<Pos<Zero>>` is now 4 bytes. But every construction adds 1 and every `get` subtracts 1. It’s a single ALU instruction each time, so the cost is probably negligible in most code paths. 
Still, it’s conceptually unsatisfying: we’re contorting the representation to fit a niche that doesn’t match our actual invariant.\n\n## On nightly: declaring the valid range directly\n\nAs of Rust 1.95 (May 2026), there is no stable way to tell the compiler “this `u32` only holds values `0..=0x7FFF_FFFF`.”\n\n~~But internally, the standard library does exactly that for its own types using the attributes `rustc_layout_scalar_valid_range_start` and `rustc_layout_scalar_valid_range_end`.~~\n\n**Oh wait – while I was writing this post, this exact feature got replaced!**\n\n## On nightly: Pattern types\n\nWhile researching this, I came across this issue where Oli proposes using a “pattern types” feature. So it seems there is a new way of doing this! While reading through the tracking issue and Zulip channel, I found this pre-RFC document (last updated in 2024 but discussed further in 2025). What is in the standard library right now on nightly is a `pattern_type!` macro. Rust PR 136006 shows some usage of it, so I could put this together:\n\n\n    #![feature(pattern_types)]\n    #![feature(pattern_type_macro)]\n\n    use std::pat::pattern_type;\n\n    pub struct Pos(pattern_type!(i32 is 0..=i32::MAX));\n\n    impl Pos {\n        pub const fn new(value: i32) -> Option<Self> {\n            if value < 0 {\n                None\n            } else {\n                // SAFETY: value is in 0..=i32::MAX, which matches the pattern\n                Some(Self(unsafe { std::mem::transmute(value) }))\n            }\n        }\n\n        pub const fn get(self) -> i32 {\n            // SAFETY: every value of the pattern type is a valid i32\n            unsafe { std::mem::transmute(self.0) }\n        }\n    }\n\n\nYou can play with the code here.\n\nFor now, `transmute` from the underlying type is the only way to construct the pattern type. On Zulip, Oli also recommended using inclusive ranges.\n\n## Conclusion\n\nI’m not using any of these nightly features in the real code yet, but I’m glad to see momentum in this space. It’s a feature I’ve wanted in a few places already. 
Other use cases for pattern types are an “inline length” type, which would remove a workaround like the one `SmolStr` uses here, or types that have sentinels by specification, like `INT8` in BAM files, which is actually `-120..=127`.\n\n* * *\n\n 1. Because of alignment to 32 bits.\n\n 2. That means it is as efficient to use `Option` in Rust as it is to use a null pointer in C.\n\n\n",
"title": "Niches for integer types in Rust"
}