{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiflrwdd3qjfvpqcjvt5xwxqszf6mafvaifhezelufgexuhco42le4",
    "uri": "at://did:plc:jo3wjj2gx46alocis4wubmwr/app.bsky.feed.post/3mlkzicwlpah2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreih43vxyxzayuhsavdrpm2bphfvukad5ftur5bvos5wrntfmcxdb2e"
    },
    "mimeType": "image/jpeg",
    "size": 233078
  },
  "path": "/2026/05/11/technical-tinkering-for-commonsdb-at-the-wikimedia-hackathon/",
  "publishedAt": "2026-05-11T07:00:00.000Z",
  "site": "https://diff.wikimedia.org",
  "tags": [
    "the first blog post in the series",
    "the Northwestern hackathon in Arnhem",
    "Wikimedia Hackathon",
    "Paulina",
    "the dataset on Wikimedia Commons",
    "CC BY-SA 4.0",
    "Full credits on Commons",
    "contact us",
    "Oulu Löyly",
    "hackathon before Wikimania in Paris",
    "Margott",
    "CC BY-SA 4.0",
    "Attribution API",
    "uses",
    "David Lynch",
    "suggestion mode",
    "patch to implement that",
    "my own unconference session",
    "TheDJ",
    "feature request",
    "User:Arian Bozorg (WMDE)"
  ],
  "textContent": "_This blog post is the fourth in the series about CommonsDB. If you don’t know about the project at all, I recommend checking out the first blog post in the series, which has a video clearly introducing the project._\n\n> Imagine you come across an image on social media and you realize it would fit very well on Wikimedia Commons. But it has been shared so much that all metadata is lost and you don’t know who created it or what its copyright status is. But, in this imaginary timeline, you also know that Wikimedia Commons has a connection to CommonsDB which might help you. So you start your upload in the UploadWizard as usual and, lo and behold, you get notified that Europeana has this image with a Creative Commons Attribution ShareAlike license, so you can go ahead with the upload.\n\nOnly a few weeks after the Northwestern hackathon in Arnhem, we headed to the Wikimedia Hackathon in Milan. At the time of the last report, we were very close to having declared one million images from Wikimedia Commons in the CommonsDB registry, and now we have passed 1.5 million images.\n\nOne thing we had pondered internally was where and how the functionality that helps a user uploading an image should be integrated. After we had considered user scripts, gadgets, or a bespoke extension, the discussions with various people at the hackathon made it clear that having it in the actual UploadWizard makes a lot of sense. It aligns with other existing image checks, will likely make the actual integration more straightforward, and seems beneficial for general maintainability.\n\nThe hackathon also provided an opportunity to talk with people familiar with EventStreams. We had already identified it as a possible way to find out when a page gets deleted. If it is, we should, of course, remove it from the CommonsDB registry. 
The opportunity to sit down and discuss a particular use case, and then get assurance that this is the right tool along with tips on how to use it properly, is immensely valuable and a reason that hackathons are needed in our community.\n\nIn the last report we also mentioned how the Wikidata community is modeling reasons why images are in the public domain; these are already in use by the tool Paulina. I got the chance to talk to the maintainer of the tool, verifying that we are aligned on the same goal and hopefully can make this _Public Domain Rationale_ a standardized way to store this kind of data, useful for many more than just our two tools.\n\n## The technical tinkering\n\nComing out of the Northwestern Europe Hackathon, the prototype script could identify similar images in the registry with the CommonsDB search API and retrieve the canonical license URL. This was already a good step, but it still didn’t help the user know which template to use during the upload if the license was anything other than the latest versions of the Creative Commons licenses, which have selection buttons in the UploadWizard. My goal for the hackathon was to figure out which template corresponded to this canonical URL. I figured out that we can find this by first asking the Wikidata API which item has this URL as a value; that is the item for the license. From there we can find the item for the template for that license with another API call. A third API call can find the Wikimedia Commons sitelink, from which we can construct the wikitext to paste. This could be done with fewer calls using the Wikidata Query Service, but discussing the whole querying process with people at the hackathon made me realize that these templates are probably very stable, so it may be quicker to do all this querying once and then just embed a lookup table in the script. It was somewhat tedious work and required some minor data cleanup on Wikidata, but it gave a good result. 
In case anyone else needs to do a similar lookup, I thought I could save them some time, so I published the dataset on Wikimedia Commons.\n\nScreenshot from the published dataset. CC0.\n\nThis made the lookup lightning quick, and in the video below, which is a narrated version of the one I presented at the closing hackathon showcase, we can see it in use. The user uploads a file that is obviously modified. The CommonsDB registry still finds a match and a license, and the user can click through to the source to verify. The correct template has been identified, and the user can copy it with one click, paste it in the next step of the upload, and preview that it is working as expected.\n\nA demo of how the upload process could be supported by CommonsDB. CC BY-SA 4.0. Full credits on Commons.\n\n## What’s happening next in the project?\n\nWe are still declaring as many images from Wikimedia Commons as we can in the CommonsDB registry. We are also looking for more media providers. The more media in there, the more useful it will be for everyone using it. If you have a repository of public domain or freely licensed images, or if you have contacts with anyone who does and who might be willing to participate, please contact us.\n\nWe’ll also keep hacking on the prototype. For example, the upload process can be even smoother for the user, perhaps by automating the copying and pasting (while still allowing a manual override for corner cases). We also need to think about a good workflow for uploads with multiple images. We will also be traveling to Oulu Löyly and the hackathon before Wikimania in Paris. Please talk to us if you are there and are curious or have ideas.\n\nJan gesturing during the showcase. Photo: Margott, CC BY-SA 4.0.\n\n## Sharing knowledge\n\nI already highlighted it in the last hackathon report, but I think it is worth reiterating that much of the value of a hackathon is getting the chance to talk to other people. 
In my book, that also includes making yourself useful to other participants, and here are a few things that being at the hackathon serendipitously enabled.\n\nWhile working on licensing questions, I took some time to check out the new Attribution API, which could possibly be useful for us when making the declarations. By chance, I noticed that the URLs for the licenses were not the same URLs that Creative Commons itself uses. It’s a tiny difference: our URLs are missing a trailing slash. Luckily, some of the people working on this were there and I could show them the mismatch in practice.\n\nI happened to hear David Lynch talking about the suggestion mode during the pitches, and I thought it might be useful to let communities define their own messages for maintenance templates. I must have sold the idea to him well over lunch, because later that day David had a patch to implement that.\n\nWhen preparing my own unconference session, I was wondering if it was possible to adjust the video playback speed. This seemingly inspired TheDJ to tackle an almost decade-old feature request and submit a patch to enable this with keyboard shortcuts.\n\nUser:Arian Bozorg (WMDE) grabbed me for a quick but structured interview about mobile editing on Wikidata. It was quite fun to see how much better it has become, and we also discovered a few odd bugs whilst doing the interview.",
  "title": "Technical tinkering for CommonsDB at the Wikimedia Hackathon"
}