{
  "path": "/dripline/similarity-at-scale-perceptual-hashing",
  "site": "at://did:plc:rxduhzsfgfpl2glle7vagcwl/site.standard.publication/3mdw2qys2v42z",
  "$type": "site.standard.document",
  "title": "Similarity at scale",
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreibzlkkqkq4hyqjlyllflgjzyuwap22bjiphgxvdfjmszrwpffjkuu"
    },
    "mimeType": "image/webp",
    "size": 20438
  },
  "description": "An overview of perceptual hashing",
  "publishedAt": "2026-02-19T00:00:00.000Z",
  "textContent": "Recent Dripline posts have covered what can be built on top of cryptographic hash functions: important concepts like data integrity and content provenance. Having the ability to uniquely refer to and verify an exact piece of data is crucial to building reliable digital content systems. But this can only take us so far. How can we keep track of an image when it's been converted to a different file format? Or compressed? How can we match a song played in a noisy room to its original studio recording? Perceptual hashing answers these questions, and allows provenance information to travel far beyond the exact bits it was originally attached to.\n\nPerceptual hash functions are designed differently than cryptographic hash functions, with algorithms that extract features from content and reduce them, outputting similar hashes for similar inputs. This means the compressed version of an image will have a hash that's nearly the same as the original uncompressed image. These hashes can then be stored and compared to find matching images. This concept allows for powerful tools, from Shazam to Google's reverse image search. At Hypha we want to take these tools further, to keep track of provenance even when media is altered.\n\nComparing basic, cryptographic, and perceptual hash functions\n\nLet's quickly revisit some of the basics. A hash function takes input data of any size (like someone's name) and returns a fixed size output (like a number from 0-100). Reducing the size and type of the data can make referring to it easier. You can learn more about hash functions in this excellent interactive article by Sam Rose.\n\nA cryptographic hash function adds certain constraints, for example that the function can't be reversed easily and that collisions are difficult to find. 
This is what makes them so useful – all these properties mean the 32 bytes of output of a cryptographic hash function like SHA-256 can uniquely represent the input data, even if that data is terabytes in size.\n\nOne key property of cryptographic hash functions is that their output is randomly distributed, with no identifiable patterns or mapping. This results in the \"avalanche effect.\" Even a very small change in the input will cause a large change in the output. Similar inputs do not result in similar outputs. This is a desirable property for most scenarios, because it prevents attackers from being able to learn anything about the input data.\n\n<figure class=\"pb4\">\n    <div class='flex items-center justify-center' style=\"width: 100%;\">\n        <img class=\"w-100\" src=\"{{ 'assets/images/posts/2026-02-19-perceptual-hashing-image3.png' | relative_url }}\" alt=\"The SHA1 hash function exhibits good avalanche effect. When a single bit is changed the hash sum becomes totally different. The hash sums in the diagram are the SHA1 sums of the strings &#34;000&#34;, &#34;001&#34; and &#34;010&#34; encoded in standard ASCII and no trailing new line etc. The difference between 000 and 001 is only one bit even though it is ASCII encoded bytes. Original illustration by David Göthberg, Sweden. Released by David as public domain.\"/>\n    </div>\n    <figcaption>\n        <a href=\"https://en.wikipedia.org/wiki/Avalanche_effect\">Wikipedia article</a> or <a href=\"https://commons.wikimedia.org/wiki/File:Avalanche_effect.svg\">Commons page</a><br>\n        David Göthberg (public domain)\n    </figcaption>\n</figure>\n\nPerceptual hashing has the opposite goal: similar inputs should result in similar hash outputs. This property is the entire utility of perceptual hash functions. 
For example, a good perceptual hash function will return similar outputs for an original image and a compressed version of that image, because the two images look very similar.\n\n<figure class=\"pb4\">\n    <div class='flex items-center justify-center' style=\"width: 100%;\">\n        <img class=\"w-100\" src=\"{{ 'assets/images/posts/2026-02-19-perceptual-hashing-image1.png' | relative_url }}\" alt=\"Side-by-side comparison showing how small image changes affect perceptual hash similarity. Six columns display: (1) original image, (2) JPEG compressed to 20% file size, (3) resized to 75% smaller, (4) noise added to 40% of pixels, (5) cropped by 10 pixels, and (6) a completely different image structure. Each column includes the modified image, a small grid visualization of hash differences (with colored squares marking changed bits), and the resulting hash string with changed characters highlighted. Captions below indicate similarity: JPEG compression shows 2 bits different (99% similarity), resize 10 bits (96%), noise 14 bits (95%), crop 18 bits (93%), and different image structure 120 bits (55% similarity).\"/>\n    </div>\n    <figcaption>\n        Similar images have very similar hashes (PDQ hash algorithm)<br>\n        <a href=\"https://github.com/darwinium-com/pdqhash\">pdqhash</a> (Apache 2.0)\n    </figcaption>\n</figure>\n\nWhile cryptographic hash functions operate on raw bytes, perceptual hash functions are specific to a type of media. There are different perceptual hash functions for images, video, audio, and other media. This is because a perceptual hash \"is a fingerprint of a multimedia file derived from various features from its content\" (phash.org), and the kinds of features available in an image versus a video are different. If you've heard the term \"audio fingerprint\" before, that's referring to an audio perceptual hash algorithm.\n\nAll these hash functions exist on a spectrum from shallow analysis to deep analysis. 
Cryptographic hash functions only compare exact bits; perceptual hash functions roughly compare media content (pixels, musical notes, etc.); and at the most intensive end of the scale, machine learning algorithms compare concepts like \"a forest\". This article focuses on perceptual hashes because they nicely balance utility and efficiency: they can match content without being too computationally expensive to compute or compare.\n\n<figure class=\"pb4\">\n    <div class='flex items-center justify-center' style=\"width: 100%;\">\n        <img class=\"w-100\" src=\"{{ 'assets/images/posts/2026-02-19-perceptual-hashing-image4.png' | relative_url }}\" alt=\"A horizontal diagram showing a spectrum of image comparison techniques from shallow to deep analysis. At the top, arrows indicate increasing depth of image analysis, robustness, recall, and CPU cost toward the right, and shallower analysis, stricter matching, lower cost, and faster processing toward the left. Four labeled sections appear from left to right: &#34;Exact same bits: MD5, SHA-256&#34; &#34;Syntactic: PhotoDNA, dHash/aHash/pHash, PDQ&#34; &#34;Deeper syntactic: GIST, SIFT&#34; &#34;Semantic: Machine-learning algorithms&#34; The rightmost &#34;Semantic&#34; section is darker blue, visually emphasizing higher computational cost and deeper analysis.\"/>\n    </div>\n    <figcaption>\n        <a href=\"https://github.com/facebook/ThreatExchange/blob/main/hashing/hashing.pdf\">Facebook, hashing.pdf</a> page 5 (BSD)\n    </figcaption>\n</figure>\n\nHow perceptual hash functions work\n\nThe final output of a perceptual hash function will be binary data of some length, such as 32 bytes or 256 KiB. The challenge in designing the algorithm is extracting the important parts from the media file and reducing those \"features\" down to just the small output size. 
For example, Chromaprint, a perceptual hash algorithm aimed at music identification, extracts which of the 12 musical notes are being played in a provided song over time, filters that data to reduce it further, and then stores it as a set of integers that can be easily compared to find similar songs.\n\nFor image hashing, there are a variety of methods, but they all boil down to repeated reductions of the image data. First resize the image to some very small size, then convert to grayscale. Process the grayscale pixels in some relevant way, such as averaging the pixel values or applying the discrete cosine transform. Binarize the output from that process, for example by thresholding each value against the mean or median, and then take those bits as your hash output.\n\n<figure class=\"pb4\">\n    <div class='flex items-center justify-center' style=\"width: 100%;\">\n        <img class=\"w-100\" src=\"{{ 'assets/images/posts/2026-02-19-perceptual-hashing-image2.png' | relative_url }}\" alt=\"Step-by-step visual diagram explaining how a perceptual image hash (PDQ/pHash-style) is generated from an image. The process is shown in numbered stages: Original image: A colorful looping ribbon shape on a black background. Resize to max 512×512. Take luminance values (convert to grayscale). Blur using Jarosz filters. Downsample to 64×64. Apply Discrete Cosine Transform (DCT), shown as a matrix of frequency coefficients. Convert DCT values to binary by thresholding against the median, shown as a grid of 0s and 1s. Read the binary values in a specified order (bottom right to top left). 
Convert the binary string to hexadecimal, producing a final hash value labeled as the &#34;PDQ Hash.&#34; Each stage includes a small visual representation of the intermediate image or matrix transformation.\"/>\n    </div>\n    <figcaption>\n        Visualization of the PDQ image perceptual hashing algorithm<br>\n        <a href=\"https://github.com/darwinium-com/pdqhash\">pdqhash</a> (Apache 2.0)\n    </figcaption>\n</figure>\n\nAs you can see, each perceptual hash function is quite different, especially across different input mediums. Understanding or creating the algorithm will often require some specialized knowledge in the domain (such as digital audio), as well as knowledge of signal processing and even machine learning.\n\nUse cases for a perceptual hash function\n\nPerceptual hashing is useful for any scenario that involves automatically finding similar media, especially at scale. An example is the Shazam app, which works by comparing a perceptual hash generated by your phone recording against their massive database of perceptual hashes for songs. Using a perceptual hash means songs can be found even in noisy environments–the underlying musical features can",
  "canonicalUrl": "https://hypha.coop/dripline/similarity-at-scale-perceptual-hashing"
}