[Pre-RFC] `cargo package` should include fewer files by default
It's good to see the discussions flourishing here.
In the meantime, I have read through what I hope is most of the issues and comments linked to the license problem mentioned by @epage. From what I gather, it mostly boils down to workspace crates "losing" their license files during packaging when those files exist only in the workspace root. As far as I can tell, the comment about deciding against glob patterns for matching licenses concerns copying files from the workspace root into the individual workspace crates to circumvent that issue, specifically when running `cargo new` (Symlink-or-copy LICENSE / LICENSE-* in workspace root when creating new member · Issue #13328 · rust-lang/cargo · GitHub).
I agree that this is not a good way to go about it, because it makes assumptions about which license belongs to which workspace crate and which license file corresponds to which SPDX entry, while ignoring that there may be a custom license in the mix as well. So to solve this issue, I think the discussed approaches for linking SPDX identifiers to specific files are reasonable: either a table that maps each SPDX identifier to a license file path or, as was also mentioned, an adjustment of the SPDX standard to include the file path directly in the SPDX entry.
Yet I believe that the glob-based matching of license files that I have described above (like pip does it) is not related to the solutions discussed as part of the workspace-license issue.
- In this Pre-RFC we try to limit the number of files included by default in a crate during packaging.
- As part of this, we are only dealing with files that are already there, either as part of a single crate or as part of workspace crates.
- During packaging, I've proposed that we only take a sensible selection of files instead of all of them, to reduce crate size and ease supply chain reviews. What that sensible selection actually is should be figured out in this Pre-RFC.
What we are not doing with the approach described in this Pre-RFC:
- We are not adding any files from outside of a (workspace) crate. We are only reducing the number of included files to a subset of the files that are already there.
- We are not making any assumptions about which license belongs to which crate or which SPDX identifier. We are not categorizing licenses into SPDX licenses or custom licenses. License files that are in a crate root have been put there by the crate authors deliberately. License files in a workspace crate have been put there deliberately as well, to work around the current limitations of Cargo's license handling.
So in my opinion the issue of workspace licenses can be handled independently of the topic described in this Pre-RFC and is not a prerequisite for it. Adding glob-based license file matching to make a sensible selection of license files to include in a crate is not meant as a short-term, half-hearted fix for the workspace license issue, and it does not influence any decisions in that process. As soon as a solution is found for the workspace license issue, the glob matching of license files can be (and has to be) replaced by pulling in clear information about which license files need to go where, and I think that should be easily possible. During that step, crate authors have to adjust their file structures anyway. But since most (if not all) workspaces currently have to rely on workarounds to ship license files correctly, glob-based matching should be a fitting approach in the meantime.
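To make the glob-based idea a bit more concrete, here is a minimal sketch of how such a selection could work. Everything in it is an assumption for illustration: the prefix list, the function names `is_license_file` and `select_license_files`, and the case-insensitive prefix match standing in for a real glob like `LICENSE*`. None of it is existing Cargo behavior. Note that it only filters files that already sit in the crate root and never pulls anything in from the workspace root.

```rust
use std::path::Path;

/// Hypothetical default prefixes, roughly equivalent to globs such as
/// `LICENSE*` or `COPYING*`. The actual list is up for discussion.
const LICENSE_PREFIXES: [&str; 4] = ["LICENSE", "LICENCE", "COPYING", "NOTICE"];

/// Returns true if the file name of `path` starts with one of the
/// prefixes above, ignoring ASCII case.
fn is_license_file(path: &Path) -> bool {
    let Some(name) = path.file_name().and_then(|n| n.to_str()) else {
        return false;
    };
    let upper = name.to_ascii_uppercase();
    LICENSE_PREFIXES.iter().any(|prefix| upper.starts_with(prefix))
}

/// Keeps only license-like files that sit directly in the crate root.
/// Paths are relative to the crate root; files in subdirectories are
/// skipped by this particular filter.
fn select_license_files<'a>(crate_files: &[&'a str]) -> Vec<&'a str> {
    crate_files
        .iter()
        .copied()
        .filter(|file| {
            let path = Path::new(file);
            // A path with a single component has an empty parent.
            let in_root = path
                .parent()
                .map_or(true, |parent| parent.as_os_str().is_empty());
            in_root && is_license_file(path)
        })
        .collect()
}
```

Given a crate containing `Cargo.toml`, `LICENSE-APACHE`, `LICENSE-MIT` and `src/lib.rs`, this filter would keep only the two `LICENSE-*` files, without knowing or caring which SPDX identifier they belong to.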
As mentioned, the workspace-license discussion has been going on since 2018, so there is a good chance that I have not read about every aspect of that topic. Please let me know if I've missed a crucial part that affects what I have written above. I hope it is not taken as carelessness on my side.
It is also very interesting to read about the use cases for tests that are shipped with crates. Personally, I've never run a test from a dependency in all my years of programming Rust at my day job or for personal projects, so it's good to learn what others are doing with them. Since I mostly work on macOS, I am not familiar with Linux package managers, and crater, on the other hand, feels like a somewhat mysterious thing looming in the background. So it's great to get a bit more insight into these topics.