{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreig7gl6nnfqqkprzhfbt6mb4gcc5thwzehqnstomursu6gdoa6kapq",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moigqrabfco2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreiea74xqhoju7rksdj4mzrgec44hlf4ltdaa6i2nkctoiiu67uzy5e"
},
"mimeType": "image/webp",
"size": 100646
},
"path": "/_06a3df6b50aec966668fb/fdupes-is-great-until-you-cant-install-it-i-built-a-zero-install-duplicate-finder-3313",
"publishedAt": "2026-06-17T13:28:52.000Z",
"site": "https://dev.to",
"tags": [
"showdev",
"cli",
"opensource",
"productivity",
"https://www.npmjs.com/package/duphunt",
"https://pypi.org/project/duphunt/",
"https://github.com/jjdoor/duphunt"
],
"textContent": "`fdupes`, `jdupes`, `rdfind`, `fclones` — the duplicate-file finders are all excellent. They're also all native binaries you have to install first. Which is exactly what you can't do on the box that actually has the duplicate-file problem: the locked-down work laptop, the client's server, the CI runner, the throwaway container, the colleague's machine you're helping debug.\n\nSo I built **duphunt** : a duplicate finder that runs the second you have Node or Python, with nothing to install and no dependencies of its own.\n\n\n\n $ npx duphunt ~/Downloads\n\n 2 duplicate group(s), 5 files, 8.1 MB reclaimable\n\n 4.1 MB × 2 4.1 MB reclaimable\n /Users/me/Downloads/invoice.pdf\n /Users/me/Downloads/invoice (1).pdf\n\n 2.0 MB × 3 4.0 MB reclaimable\n /Users/me/Downloads/clip.mp4\n /Users/me/Downloads/clip-copy.mp4\n /Users/me/Downloads/old/clip.mp4\n\n\nGroups are sorted biggest-waste-first, so the files worth deleting are right at the top.\n\n## How it works\n\n 1. **Group by size.** Two files of different sizes can't be byte-identical, so files with a unique size are never even opened.\n 2. **Hash the collisions.** Within each size group, each file gets a streamed SHA-256 (64 KB chunks — multi-GB files won't blow up memory).\n 3. **Report identical content.** Same hash ⇒ true byte-for-byte duplicate. Grouped and ranked by reclaimable space.\n\n\n\nIt **reports — it never deletes.** You decide what goes.\n\n## Install\n\n\n npx duphunt . # Node — nothing to install\n pip install duphunt # Python — same tool, same results\n\n\nTwo builds (Node + Python) that hash with SHA-256 and produce identical output, so it slots into whatever a given machine already has.\n\n## Use it in CI\n\n\n duphunt assets/ --exit-code # fail the build if duplicate assets sneak in\n duphunt . --json # or pipe the groups into your own tooling\n\n\n## A few honest details\n\n * **Zero dependencies, both builds.** stdlib only. A \"find my duplicates\" tool that pulled in a dependency tree of its own would be a bit much.\n * **Each physical file is counted once.** Repeated or overlapping roots (`duphunt ~/a ~/a/b`) and symlink aliases are de-duplicated by real path, so they never inflate the numbers — while genuine hard links still surface. (This one took a couple of rounds to get right.)\n * **Empty files and symlinks are skipped by default** (`--min-size 0` and `--follow` if you want them).\n\n\n\n## Links\n\n * **npm:** https://www.npmjs.com/package/duphunt\n * **PyPI:** https://pypi.org/project/duphunt/\n * **Source:** https://github.com/jjdoor/duphunt\n\n\n\nWhat do you reach for to find duplicate files today — and is \"I can't install anything on this box\" a problem you've hit too? Curious whether anyone would actually gate CI on a duplicate check.",
"title": "fdupes is great — until you can't install it. I built a zero-install duplicate finder."
}