{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibwaqsqc7yb6uguze4mr3wstymagqmy64kqlbyxtpyepxk76rfjti",
    "uri": "at://did:plc:3okajqf5ifpzwzkuzluxd66m/app.bsky.feed.post/3ml2b7xjppyo2"
  },
  "path": "/niche-int-types-in-rust.html",
  "publishedAt": "2026-05-03T22:00:00.000Z",
  "site": "https://deterministic.space",
  "tags": [
    "seqair",
    "my post here",
    "NonZero<u32>",
    "NonZero<T>",
    "this issue",
    "Oli",
    "tracking issue",
    "Zulip channel",
    "pre-RFC document",
    "pattern_type!",
    "Rust PR 136006",
    "You can play with the code here.",
    "here"
  ],
  "textContent": "While working on seqair (see my post here ), I also wrote a bunch of wrapper types for the domain we work in. One example is `Base` (a DNA base), which is a fairly straightforward enum. Another is `Pos`, a position in a DNA sequence, which I’d like to talk about here.\n\nThe value range for a position is effectively `u31`. We only support the positions that the BAM file format supports, which store positions as `i32`. But positions are always positive! In C, one would use `-1` to indicate “no position”, “invalid position” (or a whole list of error cases) but in Rust, I’d prefer not to do that. Can we make use of this in some other way?\n\n## Simple position type\n\nSo here is what we have as a start. We also track if it is zero- or one-based using a type parameter because some file formats count from `1` for human convenience.\n\n\n    pub struct Pos<S> {\n        value: u32,\n        _system: PhantomData<S>,\n    }\n\n    pub struct Zero;\n    pub struct One;\n\n    impl TryFrom<i32> for Pos<Zero> {\n        type Error = InvalidPosition;\n\n        fn try_from(value: i32) -> Result<Self, Self::Error> {\n            if value < 0 { return Err(InvalidPosition); }\n            Ok(Self { value: value as u32, _system: PhantomData })\n        }\n    }\n\n    #[derive(Debug, thiserror::Error)]\n    #[error(\"Invalid position\")]\n    struct InvalidPosition;\n\n\nwhich we can use like this:\n\n\n    let record = get_raw_bam_record();\n    let position: Pos<Zero> = record.pos.try_into()?;\n\n\n## Type niches\n\nLet’s look at `Option<T>`. It is typically two things in memory: a discriminant (is this `Some` or `None`?) and the payload. For `Option<u32>`, that means 8 bytes: 4 for the tag1 and 4 for the `u32` value. But the compiler is smarter than that when the inner type has _invalid_ bit patterns. 
A reference `&T` can never be null, so `Option<&T>` uses the null pointer pattern to represent `None` and stays pointer-sized [2].\n\nThe standard library’s NonZero<u32> works the same way: the value `0` is invalid, so `Option<NonZero<u32>>` fits in 4 bytes. These invalid bit patterns are called “niches”.\n\nYou can verify this with `size_of`:\n\n\n    assert_eq!(size_of::<Option<u32>>(), 8); // no niche\n    assert_eq!(size_of::<Option<NonZero<u32>>>(), 4); // niche\n\n\nOur `Pos` type wraps a plain `u32`, so `Option<Pos<Zero>>` is 8 bytes. That’s wasteful: we know positions only go up to `i32::MAX`, which means half the `u32` range is invalid. That’s over two billion niches, and we can’t use a single one!\n\n## On stable: the `NonZero` bias trick\n\nNonZero<T> is the only niche-bearing integer type available on stable Rust (1.95.0). We can use it by storing `value + 1` internally: the position `0` maps to `NonZero(1)`, and `i32::MAX` maps to `NonZero(0x8000_0000)`. The `0` bit pattern is never used, giving us our niche.\n\n\n    use std::{marker::PhantomData, num::NonZeroU32};\n\n    #[repr(transparent)]\n    pub struct Pos<S> {\n        value: NonZeroU32, // stores actual_value + 1\n        _system: PhantomData<S>,\n    }\n\n    impl TryFrom<i32> for Pos<Zero> {\n        type Error = ();\n\n        fn try_from(value: i32) -> Result<Self, Self::Error> {\n            if value < 0 { return Err(()); }\n            let new_val = value as u32 + 1;\n            Ok(Self {\n                // SAFETY: value >= 0, so new_val is in 1..=0x8000_0000:\n                // it fits into u32 and is never zero.\n                value: unsafe { NonZeroU32::new_unchecked(new_val) },\n                _system: PhantomData,\n            })\n        }\n    }\n\n    impl Pos<Zero> {\n        pub const fn get(self) -> i32 {\n            // Undo the +1 bias applied in `try_from`.\n            (self.value.get() - 1) as i32\n        }\n    }\n\n\nThis works, and `Option<Pos<Zero>>` is now 4 bytes. But every construction adds 1 and every `get` subtracts 1. It’s a single ALU instruction each time, so the cost is probably negligible in most code paths. 
Still, it’s conceptually unsatisfying: we’re contorting the representation to fit a niche that doesn’t match our actual invariant.\n\n## On nightly: declaring the valid range directly\n\nAs of Rust 1.95 (May 2026), there is no stable way to tell the compiler “this `u32` only holds values `0..=0x7FFF_FFFF`.”\n\n~~But internally, the standard library does exactly that for its own types using the attributes `rustc_layout_scalar_valid_range_start` and `rustc_layout_scalar_valid_range_end`.~~\n\n**Oh wait – while I was writing this post, this exact feature got replaced!**\n\n## On nightly: Pattern types\n\nWhile researching this, I came across this issue where Oli proposes using a “pattern types” feature. So it seems there is a new way of doing this! While reading through the tracking issue and Zulip channel, I found this pre-RFC document (last updated in 2024 but discussed further in 2025). What is in the standard library right now on nightly is a pattern_type! macro. Rust PR 136006 has some usage of this, so I could put this together:\n\n\n    #![feature(pattern_types)]\n    #![feature(pattern_type_macro)]\n\n    pub struct Pos(pattern_type!(i32 is 0..=i32::MAX));\n\n    impl Pos {\n        pub const fn new(value: i32) -> Option<Self> {\n            if value < 0 {\n                None\n            } else {\n                // SAFETY: values >=0 fit in pattern\n                Some(Self(unsafe { std::mem::transmute(value) }))\n            }\n        }\n\n        pub const fn get(self) -> i32 {\n            // SAFETY: self.0 is subset of i32\n            unsafe { std::mem::transmute(self.0) }\n        }\n    }\n\n\nYou can play with the code here.\n\nFor now, `transmute` from the underlying type is the only way to construct the pattern type. On Zulip, Oli also recommended using inclusive ranges.\n\n## Conclusion\n\nI’m not using any of these nightly features in the real code yet, but I’m glad to see momentum in this space. 
It’s a feature I’ve wanted in a few places already. Other use cases for pattern types are an “inline length” type, removing a workaround like the one `SmolStr` uses here, or types that have sentinels by specification, like `INT8` in BAM files, whose valid range actually is `-120..=127`.\n\n* * *\n\n  1. Because of alignment to 32 bits.\n\n  2. That means it is as efficient to use `Option` in Rust as it is to use a null pointer in C.\n\n\n",
  "title": "Niches for integer types in Rust"
}