fdupes is great — until you can't install it. I built a zero-install duplicate finder.

DEV Community [Unofficial] June 17, 2026
fdupes, jdupes, rdfind, fclones — the duplicate-file finders are all excellent. They're also all native binaries you have to install first. Which is exactly what you can't do on the box that actually has the duplicate-file problem: the locked-down work laptop, the client's server, the CI runner, the throwaway container, the colleague's machine you're helping debug.

So I built duphunt: a duplicate finder that runs the second you have Node or Python, with nothing to install and no dependencies of its own.

$ npx duphunt ~/Downloads

2 duplicate group(s), 5 files, 8.1 MB reclaimable

  4.1 MB × 2   4.1 MB reclaimable
    /Users/me/Downloads/invoice.pdf
    /Users/me/Downloads/invoice (1).pdf

  2.0 MB × 3   4.0 MB reclaimable
    /Users/me/Downloads/clip.mp4
    /Users/me/Downloads/clip-copy.mp4
    /Users/me/Downloads/old/clip.mp4

Groups are sorted biggest-waste-first, so the files worth deleting are right at the top.

How it works

  1. Group by size. Two files of different sizes can't be byte-identical, so files with a unique size are never even opened.
  2. Hash the collisions. Within each size group, each file gets a streamed SHA-256 (64 KB chunks — multi-GB files won't blow up memory).
  3. Report identical content. Same hash ⇒ true byte-for-byte duplicate. Grouped and ranked by reclaimable space.

It reports — it never deletes. You decide what goes.
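
The three steps above can be sketched in a few lines of stdlib Python. This is a minimal illustration, not duphunt's actual source: the names `sha256_of` and `find_duplicates` and the `min_size` parameter are hypothetical, and the sketch assumes a single root directory.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 64 * 1024  # 64 KB reads, the chunk size mentioned above


def sha256_of(path):
    """Stream a file through SHA-256 so large files never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, min_size=1):
    """Return (size, paths) groups of identical files, biggest waste first."""
    # Step 1: group by size. Nothing is opened yet.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # assumption: symlinks are ignored in this sketch
            size = os.path.getsize(path)
            if size >= min_size:
                by_size[size].append(path)

    groups = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # unique size: cannot be a duplicate, never hashed
        # Step 2: hash only the files that share a size.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        # Step 3: files with the same digest form a duplicate group.
        for same in by_hash.values():
            if len(same) > 1:
                groups.append((size, sorted(same)))

    # Reclaimable space is the size times the number of extra copies.
    groups.sort(key=lambda g: g[0] * (len(g[1]) - 1), reverse=True)
    return groups
```

The sort key mirrors the sample output earlier: a 4.1 MB file with 2 copies reclaims 4.1 MB, a 2.0 MB file with 3 copies reclaims 4.0 MB, and together that gives the 8.1 MB total.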

Install

npx duphunt .          # Node — nothing to install
pip install duphunt    # Python — same tool, same results

There are two builds, Node and Python. Both hash with SHA-256 and produce identical output, so duphunt slots into whatever a given machine already has.

Use it in CI

duphunt assets/ --exit-code   # fail the build if duplicate assets sneak in
duphunt . --json              # or pipe the groups into your own tooling
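
For the "pipe it into your own tooling" case, a small Python wrapper might look like the sketch below. The helper name `run_json` is hypothetical, and the shape of the JSON that duphunt emits is not described in this post, so the sketch only parses stdout and hands the value back without assuming any field names.

```python
import json
import subprocess


def run_json(cmd):
    """Run a command that prints JSON on stdout; return (exit code, parsed value)."""
    done = subprocess.run(cmd, capture_output=True, text=True)
    return done.returncode, json.loads(done.stdout)


# Hypothetical usage; inspect the parsed value before relying on its structure:
# code, groups = run_json(["npx", "duphunt", ".", "--json"])
```

Returning the exit code alongside the parsed value leaves the decision about failing the build to the calling script.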

A few honest details

  • Zero dependencies, both builds. stdlib only. A "find my duplicates" tool that pulled in a dependency tree of its own would be a bit much.
  • Each physical file is counted once. Repeated or overlapping roots (duphunt ~/a ~/a/b) and symlink aliases are de-duplicated by real path, so they never inflate the numbers — while genuine hard links still surface. (This one took a couple of rounds to get right.)
  • Empty files and symlinks are skipped by default (--min-size 0 and --follow if you want them).
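
The "counted once" behaviour can be approximated by resolving every path before recording it. This is a rough sketch under assumptions, not the tool's real logic: `unique_files` is a hypothetical name, and symlinks are simply skipped here, as in the default described above.

```python
import os


def unique_files(roots):
    """Walk every root, yielding each resolved file path at most once."""
    seen = set()
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue  # symlink aliases are skipped, not followed
                real = os.path.realpath(path)
                if real not in seen:
                    seen.add(real)
                    yield real
```

Overlapping roots such as `~/a` and `~/a/b` resolve to the same real paths, so each file appears once. Hard links are separate directory entries with different real paths, so both names are still yielded and can show up as duplicates of each other.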

What do you reach for to find duplicate files today — and is "I can't install anything on this box" a problem you've hit too? Curious whether anyone would actually gate CI on a duplicate check.