[Pre-RFC] `cargo package` should include fewer files by default
SwishSwushPow:
A bit of context: At my company, it has become part of our routine to review every Rust dependency that is updated in our code base. This means that we have looked at hundreds of project structures and diffs over the months, and from time to time we find crates that include files that aren’t really required, be it GIFs, PNGs, PDFs, or binary test data. For some of them, it is worth discussing whether they should actually be shipped with the crate, e.g. to locally verify that the tests are running, and that is ok. But some files are truly not required and balloon the size of the crate without any benefit.
Whenever we encounter such cases, we try to open PRs to exclude these files from the shipped crate. Here are some examples of the three typical scenarios:
I am sympathetic to your plight. As package maintainers for Rust crates in Fedora Linux, we often encounter the same issues while reviewing crate contents, and we also file pull requests to exclude problematic files from published crates.
Big however: In my experience, the inverse problem happens much more frequently, i.e. not enough files are included in published crates for what we need - and I fear that including fewer files by default would exacerbate this issue.
The most frequent problem is that license texts - especially for licenses where including the license text is mandatory - are missing, which is a hard blocker for packaging purposes. It appears that there are multiple open RFCs for cargo that could improve this situation with better Cargo.toml metadata or cargo behaviour (cargo#5933, cargo#16666, cargo#16893, cargo#12053, cargo#9972, etc.), some of which are three Editions old.
The second most frequent issue is that published crates don't contain input files for tests. We run cargo test during packaging (where possible) to make sure we don't ship software that is completely broken. I understand that most developers exclude test input files larger than a few kilobytes from published crates - this is a reasonable decision. However, it currently appears to be "best practice" (or at the very least, "usual practice") to extensively use the include_bytes! or include_str! macros in test code to embed that test data, which means that cargo test fails to compile when the input files are missing, instead of the affected tests failing at runtime. Tests that harmlessly fail at runtime are easy to deal with (cargo test -- --skip test::name), but if cargo test fails fast because the tests don't compile, that's much more annoying to deal with.
In both cases, I fear that the proposal here (include fewer files by default) would make things worse, while only having a small positive effect in most other circumstances.