Raw Record Source

{
  "path": "/dripline/who-took-this-photo",
  "site": "at://did:plc:rxduhzsfgfpl2glle7vagcwl/site.standard.publication/3mdw2qys2v42z",
  "$type": "site.standard.document",
  "title": "Who took this photo?",
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreia323fjc7yduwohkrjh2ny53doh5w33b7nzuc67rs6c2sp7vacvlq"
    },
    "mimeType": "image/webp",
    "size": 22614
  },
  "description": "An explainer for publishers, platforms, and their audiences, on the technology that traces the journey of digital information.",
  "publishedAt": "2026-02-03T00:00:00.000Z",
  "textContent": "\"Content provenance\" is an emerging set of techniques that Hypha has been working with, often in the context of mis/disinformation. Instead of asking \"Is this content fake?\" content provenance systems ask a narrower, more technically precise question: \"Where did this come from, and what happened to it?\" By cryptographically binding media to devices, creators, and workflows, provenance turns content into an auditable object.\n\nOrigins of provenance\n\nIn the period after World War II, there was an explosion of published writing, particularly in science and engineering. People were worried that there would be so much information that we would be too overwhelmed to make sense of it. (Perhaps similar to our fears surrounding the deluge of AI-driven content.)\n\nIn 1953, Hans Peter Luhn, a printer and textile expert hired at IBM as an \"inventor\", proposed a technique that we now understand as one of the first practical hash functions. He was trying to solve the specific problem of quickly looking up a phone number in a large phone book. Instead of searching through the list of phone numbers sequentially, item by item, he wondered if there was a calculation you could perform to know instantly where the item would be.\n\nLuhn suggested putting the phone numbers into \"buckets.\" Given a phone number, the computer would perform a quick calculation to determine which bucket it belonged in, then search only that bucket. This was a very early hash function: a mathematical process that transforms data (in Luhn's case, a phone number) into a short fingerprint (the bucket).\n\nRun the same data through a hash function and you always get the same result. With the cryptographic hash functions used today, changing even one bit produces a completely different result. Today's content provenance systems use these properties of hashing to uniquely identify content and detect tampering. If a file's hash changes, the file must have been changed. 
That is how we can verify the integrity of the file.\n\nHash functions can help prove that a file's contents haven't changed, but how do we prove who created the file in the first place? This is the problem of authentication, and it's solved using digital signatures, an application of public key cryptography.\n\nPublic key cryptography uses a pair of keys. A creator uses a \"private key\" (think of it as a secret stamp) to sign a file. Anyone can then use the matching \"public key\" to verify that the stamp is authentic. The principle is straightforward: create something that's computationally expensive to forge but cheap to verify.\n\nWe've always relied on the cost of forgery to make authentication work. The Ottoman Sultans used the Tughra, an intricate calligraphic monogram, to authenticate imperial decrees. It worked because forging it required specialized craftsmanship, years of training, and access to specific materials. Creating a convincing fake was cost-prohibitive. Today, we use digital signatures to do the same thing, just with cryptography instead of calligraphy. In principle, a powerful enough computer could forge a digital signature, but it would need to be tremendously more powerful than any computer humans have ever built.\n\n<figure class=\"pb4\">\n    <div class='flex items-center justify-center' style=\"width: 100%;\">\n        <img class=\"w-100\" src=\"{{ 'assets/images/posts/2026-02-03-why-provenance-matters-tughra.jpg' | relative_url }}\" alt=\"The Ottoman Tughra, an intricate calligraphic monogram used to authenticate imperial decrees\"/>\n    </div>\n    <figcaption>\n        The Tughra: from calligraphy to cryptography\n    </figcaption>\n</figure>\n\nSo: hashes prove content integrity. Digital signatures prove authorship. 
Content provenance combines these to prove chain of custody, authenticating every entity that touched the content and ensuring its integrity across the chain.\n\nProvenance in practice\n\nHypha members at Starling Lab worked with Reuters and the camera manufacturer Canon to demonstrate the provenance workflow in the field. This end-to-end system was field-tested by Reuters photojournalist Violeta Santos Moura in Ukraine during March and April of 2023.\n\n<figure class=\"pb4\">\n    <div class='flex items-center justify-center' style=\"width: 100%;\">\n        <img class=\"w-100\" src=\"{{ 'assets/images/posts/2026-02-03-why-provenance-matters-c2pa-workflow.png' | relative_url }}\" alt=\"C2PA provenance workflow diagram showing capture, signing, and verification of media\"/>\n    </div>\n    <figcaption>\n        Reuters and Canon field-tested the C2PA provenance workflow in Ukraine\n    </figcaption>\n</figure>\n\nThe most prominent framework trying to standardize the provenance process is called C2PA (the Coalition for Content Provenance and Authenticity). As of 2026, we're seeing real adoption, with cameras (Sony, Canon, Leica), software (Photoshop, Premiere), platforms (Google, Meta, Microsoft, OpenAI), and news organizations (BBC, Bloomberg) engaging with the standard.\n\nThat matters, because provenance only works if it survives across tools, workflows, and platforms. Without shared standards, provenance data would be stripped, ignored, or inconsistently applied. With them, it has a chance to become infrastructure rather than a niche feature.\n\nA key feature of C2PA is that it allows provenance information to travel with the content itself in the form of signed metadata. It doesn't require a separate database to store provenance information; instead, the content comes bundled with it. 
This is particularly valuable when publishers do not know in advance how or where their content will be viewed.\n\nLimitations\n\nContent provenance raises the cost of certain deceptions. It makes it harder to claim videos are AI-generated when they're not. It creates audit trails journalists can follow. However, provenance does not guarantee that the content is accurate or truthful. We think of it purely as infrastructure.\n\nAnd the infrastructure itself could have vulnerabilities. Who decides which public keys are trustworthy? We rely on Certificate Authorities (like DigiCert or Let's Encrypt) that vouch for identity. If one is compromised or coerced, the chain of trust that depends on it collapses. In 2011, hackers breached DigiNotar and obtained fraudulent certificates to impersonate Google, enabling surveillance of Iranian citizens.\n\nThen there's the privacy paradox. If you're a journalist in Belarus documenting protests and your phone signs every photo, that signature becomes evidence against you. There are techniques (like zero-knowledge proofs) for proving \"this photo is authentic\" without disclosing \"I took this at these coordinates\", but implementing them is complex and many tools don't support them yet.\n\nOur greatest concern around provenance is that we could move from a world of \"fake content\" to a world of \"signed (mis)information\". The technical authenticity of media can be a shield for dishonest narratives.\n\nHere are some scenarios to consider:\n\nA politician gives a speech. It's recorded with a signed camera, full C2PA credentials. In context, they're presenting an opponent's argument before refuting it. But a verified account shares a 15-second clip (cryptographically authentic, completely signed) that only includes the inflammatory quote. The provenance is perfect. The story is a lie.\n\nOr: a protest organizer gets photographed with a professional camera with full Content Credentials. 
That genuine image then gets shared with a caption claiming the person is a paid agitator. The photo is real. The accusation is false.\n\nPerhaps the most insidious form of misinformation is reality stripped of context or paired with false narratives. Technology can solve the problem of revealing provenance. It doesn't solve the interpretation problem, the context problem, or the intent problem.\n\nThis is not to say that provenance technology is useless. It is, like any other technology, limited in scope.\n\nWho is provenance for?\n\nAt Hypha, we believe that our informational landscape is going through a period of massive change, leading to an epistemic crisis. There are a host of intersecting factors here: the relationship between reader and publisher being severed, increasingly mediated by big tech platforms (first social media and now big AI); user-generated content providing faster information and surfacing underreported views from the margins; outmoded media business models, kept alive by private benefactors or through state intervention; and now generative AI exacerbating the liar's dividend, where the mere existence of deepfakes allows dishonest actors to dismiss genuine, damaging evidence as \"AI-generated.\"\n\nThis problem space is complex and requires social and political interventions, but it also needs a rearchitecture of our technical systems. At Hypha, we're researching three areas: using open web protocols to rebuild connection with users; thoughtful and values-oriented ways of using AI for knowledge management and synthesis; and content provenance, the topic we're covering here. We'll be writing more on the first two in future posts in our digital trust series.\n\nHere's how different actors should think about content provenance today:\n\nPublishers\n\n Your content isn't being surfaced just on your channel anymore. It's being pulled up by LLMs, summarized by AI systems, shared and remixed across platforms. 
Laying down provenance infrastructure now means you can trace where your content goes. This could enable attribution and potentially monetization as the business models around AI-era content licensing develop.\n Your authentic content is being misused right now. Photos taken out of context, videos clipped to misrepresent events, genuine footage dismissed as AI-generated. Provenance gives you technical proof: capture device, timestamp, chain of custody. This can be useful evidence in court and for takedown requests when others misuse your content.\n Content provenance technology is still nascent. Currently most content on the web is unsigned. Early adopters can start \"showing their",
  "canonicalUrl": "https://hypha.coop/dripline/who-took-this-photo"
}