{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreig7gl6nnfqqkprzhfbt6mb4gcc5thwzehqnstomursu6gdoa6kapq",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moigqrabfco2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiea74xqhoju7rksdj4mzrgec44hlf4ltdaa6i2nkctoiiu67uzy5e"
    },
    "mimeType": "image/webp",
    "size": 100646
  },
  "path": "/_06a3df6b50aec966668fb/fdupes-is-great-until-you-cant-install-it-i-built-a-zero-install-duplicate-finder-3313",
  "publishedAt": "2026-06-17T13:28:52.000Z",
  "site": "https://dev.to",
  "tags": [
    "showdev",
    "cli",
    "opensource",
    "productivity",
    "https://www.npmjs.com/package/duphunt",
    "https://pypi.org/project/duphunt/",
    "https://github.com/jjdoor/duphunt"
  ],
  "textContent": "`fdupes`, `jdupes`, `rdfind`, `fclones` — the duplicate-file finders are all excellent. They're also all native binaries you have to install first. Which is exactly what you can't do on the box that actually has the duplicate-file problem: the locked-down work laptop, the client's server, the CI runner, the throwaway container, the colleague's machine you're helping debug.\n\nSo I built **duphunt** : a duplicate finder that runs the second you have Node or Python, with nothing to install and no dependencies of its own.\n\n\n\n    $ npx duphunt ~/Downloads\n\n    2 duplicate group(s), 5 files, 8.1 MB reclaimable\n\n      4.1 MB × 2   4.1 MB reclaimable\n        /Users/me/Downloads/invoice.pdf\n        /Users/me/Downloads/invoice (1).pdf\n\n      2.0 MB × 3   4.0 MB reclaimable\n        /Users/me/Downloads/clip.mp4\n        /Users/me/Downloads/clip-copy.mp4\n        /Users/me/Downloads/old/clip.mp4\n\n\nGroups are sorted biggest-waste-first, so the files worth deleting are right at the top.\n\n##  How it works\n\n  1. **Group by size.** Two files of different sizes can't be byte-identical, so files with a unique size are never even opened.\n  2. **Hash the collisions.** Within each size group, each file gets a streamed SHA-256 (64 KB chunks — multi-GB files won't blow up memory).\n  3. **Report identical content.** Same hash ⇒ true byte-for-byte duplicate. Grouped and ranked by reclaimable space.\n\n\n\nIt **reports — it never deletes.** You decide what goes.\n\n##  Install\n\n\n    npx duphunt .          # Node — nothing to install\n    pip install duphunt    # Python — same tool, same results\n\n\nTwo builds (Node + Python) that hash with SHA-256 and produce identical output, so it slots into whatever a given machine already has.\n\n##  Use it in CI\n\n\n    duphunt assets/ --exit-code   # fail the build if duplicate assets sneak in\n    duphunt . --json              # or pipe the groups into your own tooling\n\n\n##  A few honest details\n\n  * **Zero dependencies, both builds.** stdlib only. A \"find my duplicates\" tool that pulled in a dependency tree of its own would be a bit much.\n  * **Each physical file is counted once.** Repeated or overlapping roots (`duphunt ~/a ~/a/b`) and symlink aliases are de-duplicated by real path, so they never inflate the numbers — while genuine hard links still surface. (This one took a couple of rounds to get right.)\n  * **Empty files and symlinks are skipped by default** (`--min-size 0` and `--follow` if you want them).\n\n\n\n##  Links\n\n  * **npm:** https://www.npmjs.com/package/duphunt\n  * **PyPI:** https://pypi.org/project/duphunt/\n  * **Source:** https://github.com/jjdoor/duphunt\n\n\n\nWhat do you reach for to find duplicate files today — and is \"I can't install anything on this box\" a problem you've hit too? Curious whether anyone would actually gate CI on a duplicate check.",
  "title": "fdupes is great — until you can't install it. I built a zero-install duplicate finder."
}