{
"path": "/high-performance-rust.html",
"site": "at://did:plc:x67qh7v3fd7znbdhauc45ng3/site.standard.publication/3mjcd2t6afe25",
"$type": "site.standard.document",
"title": "Cheap tricks for high-performance Rust",
"updatedAt": "2020-03-04T00:00:00.000Z",
"publishedAt": "2020-03-04T00:00:00.000Z",
"textContent": "So you're writing Rust but it's not fast enough?\nEven though you're using cargo build --release?\nHere's some small things you can do to increase the runtime speed of a Rust project\n– practically without changing any code!\n\nPlease remember that the following suggestions do not replace actual profiling and optimizations!\nI also think it goes without saying that the only way to detect if any of this helps\nis having benchmarks that represent how your application behaves under real usage.\n\nIf you'd like to read about performance optimizations that take a bit longer\nbut actually are about improving your code,\nhave a look at this small online book\nby Nicholas Nethercote.\n\nTweaking our release profile\n\nLet's first of all enable some more optimizations\nfor when we do cargo build --release.\nThe deal is pretty simple:\nWe enable some features that make building release builds even slower\nbut get more thorough optimizations as a reward.\n\nWe add the flags described below to our main Cargo.toml file,\ni.e., the top most manifest file in case you are using a [Cargo workspace].\nIf you don't already have a section called profile.release, add it:\n\nLink-time optimization\n\nThe first thing we'll do is enable [link-time optimization] (LTO).\nIt's a kind of whole-program or inter-module optimization as it runs as the very last step\nwhen linking the different parts of your binary together.\nYou can think of it as allowing\nbetter inlining across dependency boundaries\n(but it's of course more complicated that that).\n\nRust can use multiple linker flavors,\nand the one we want is \"optimize across all crates\", which is called \"fat\".\nTo set this, add the [lto] flag to your profile:\n\nCode generation units\n\nNext up is a similar topic.\nTo speed up compile times, Rust tries to split your crates into small chunks\nand compile as many in parallel as possible.\nThe downside is that there's less opportunities for the compiler\nto optimize code 
across these chunks.\nSo, let's [tell it][codegen-units] to do one chunk per crate:\n\nSetting a specific target CPU\n\nBy default, Rust wants to build a binary that works on as many machines\nof the target architecture as possible.\nHowever, you might actually have a pretty new CPU with cool new features!\nTo [enable][target-cpu] those, we add\n\nas a \"Rust flag\",\ni.e., the environment variable RUSTFLAGS\nor the target's rustflags field in your [.cargo/config].\n\nAborting\n\nNow we get into some of the more unsafe options.\nRemember how Rust by default uses [stack unwinding]\n(on the most common platforms)?\nThat costs performance!\nLet's skip stack traces and the ability to catch panics\nfor reduced code size and better cache usage:\n\nPlease note that some libraries might depend on unwinding\nand will explode horribly if you enable this!\n\nUsing a different allocator\n\nOne thing many Rust programs do is allocate memory.\nAnd they don't just do this themselves but actually use an (external) library for that:\nan allocator.\nCurrent Rust binaries use the system allocator by default;\npreviously they included their own with the standard library.\n(This change has led to smaller binaries and better debuggability,\nwhich made some people quite happy.)\n\nSometimes your system's allocator is not the best pick, though.\nNot to worry, we can change it!\nI suggest giving both [jemalloc] and [mimalloc] a try.\n\njemalloc\n\n[jemalloc] is the allocator that Rust previously shipped with\nand that the Rust compiler still uses itself.\nIts focus is to reduce memory fragmentation and support high concurrency.\nIt's also the default allocator on FreeBSD.\nIf this sounds interesting to you, let's give it a try!\n\nFirst off, add the [jemallocator] crate as a dependency:\n\nThen in your application's entry point (main.rs),\nset it as the global allocator like this:\n\nPlease note that jemalloc doesn't support all platforms.\n\nmimalloc\n\nAnother interesting 
alternative allocator is [mimalloc].\nIt was developed by Microsoft, has quite a small footprint,\nand brings some innovative ideas for free lists.\n\nIt also features configurable security features\n(have a look at [its Cargo.toml][mimalloc features]).\nWhich means we can turn them off for more performance!\nAdd the [mimalloc crate] as a dependency like this:\n\nand, same as above, add this to your entry point file:\n\nProfile Guided Optimization\n\nThis is a neat feature of LLVM\nbut I've never used it.\nPlease read [the docs][pgo].\n\nActual profiling and optimizing your code\n\nNow this is where you need to actually adjust your code\nand fix all those clone() calls.\nSadly, this is a topic for another post!\n(While you wait another year for me to write it, you can read about [cows]!)\n\nEdit: People keep asking for those actual tips on how to optimize Rust code.\nAnd luckily ~~I tricked them~~ they had some good material for me to link to:\n\n- The very convenient cargo flamegraph (also works as a standalone tool)\n- Christopher Sebastian recently published How To Write Fast Rust Code\n- Jack Fransham's Fastware Workshop from RustFest 2018\n\n[Cargo workspace]: https://doc.rust-lang.org/1.41.1/book/ch14-03-cargo-workspaces.html\n[link-time optimization]: https://llvm.org/docs/LinkTimeOptimization.html\n[lto]: https://doc.rust-lang.org/1.41.1/rustc/codegen-options/index.html#lto\n[codegen-units]: https://doc.rust-lang.org/1.41.1/rustc/codegen-options/index.html#codegen-units\n[target-cpu]: https://doc.rust-lang.org/1.41.1/rustc/codegen-options/index.html#target-cpu\n[panic flag]: https://doc.rust-lang.org/1.41.1/rustc/codegen-options/index.html#panic\n[opt-level]: https://doc.rust-lang.org/1.41.1/rustc/codegen-options/index.html#opt-level\n[jemalloc]: https://github.com/jemalloc/jemalloc\n[jemallocator]: https://docs.rs/jemallocator\n[mimalloc]: https://github.com/microsoft/mimalloc\n[mimalloc crate]: https://docs.rs/mimalloc\n[mimalloc features]: 
https://github.com/purpleprotocol/mimalloc_rust/blob/c6bf4578d3258a0b6a28696196ede6d50e5ee8c2/Cargo.toml#L25-L28\n[stack unwinding]: https://doc.rust-lang.org/1.41.1/nomicon/unwinding.html\n[pgo]: https://doc.rust-lang.org/1.41.1/rustc/profile-guided-optimization.html\n[cows]: /secret-life-of-cows.html\n[.cargo/config]: https://doc.rust-lang.org/1.41.1/cargo/reference/config.html",
"canonicalUrl": "https://deterministic.space//high-performance-rust.html"
}