{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreifdq6qm73sbyeidtpqbpa44g3e5ek66xgdb4vemyryludnbqvduwe",
"uri": "at://did:plc:jo3wjj2gx46alocis4wubmwr/app.bsky.feed.post/3mhatsblvtwm2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreih3aio7b7qehe2s2g2kmdwnvccc2rgyst6xwfrkiw2s7migqudqdm"
},
"mimeType": "image/png",
"size": 197370
},
"path": "/2026/03/17/building-the-dagbanli-dictionarys-audio-pipeline-ogg-ios-and-transcoding/",
"publishedAt": "2026-03-17T07:00:00.000Z",
"site": "https://diff.wikimedia.org",
"tags": [
"previous post",
"https://upload.wikimedia.org/wikipedia/commons/5/5e/Dag-Kuli.ogg",
"https://dagbanli-harvest-worker.workers.dev/audio-proxy?url=https://upload.wikimedia.org/wikipedia/commons/5/5e/Dag-Kuli.ogg"
],
"textContent": "\n\n_A dictionary without pronunciation is incomplete, especially for a tonal language like Dagbanli. Here’s how we built a pipeline from Wikimedia Commons recordings to in-browser playback, including a workaround for iOS Safari._\n\n## Introduction\n\nDagbanli is a tonal language where pronunciation carries meaning. The word _wahu_ with high tones means “horse”, but with low tones means “snake”. A text-only dictionary that simply prints the spelling misses this crucial dimension: readers cannot tell which word is meant unless they hear it.\n\nIn our previous post, we covered how we structured Dagbanli Lexemes on Wikidata using Senses, Forms, and special handling for digraphs. Now we turn to the audio pipeline that brings those words to life.\n\nWikidata’s lexicographical model includes the **P443** property (pronunciation audio), which links a specific Form of a Lexeme to an audio file stored on Wikimedia Commons. These recordings are crowdsourced contributions from native Dagbanli speakers, making them an invaluable resource for preserving authentic pronunciation.\n\nThe challenge is making these recordings play reliably on every device, operating system, and browser, including those that refuse to play the open OGG format. This post explains how we built an audio pipeline that keeps Wikimedia Commons as the source of truth while ensuring universal playback.\n\n## 1. P443: Pronunciation Audio on Wikidata\n\n**How Audio Is Stored on Wikidata**\n\nWhen a contributor records a pronunciation for a Dagbanli word, they upload an audio file (typically `.ogg` format) to Wikimedia Commons. 
Then, on the corresponding Lexeme Form, they add a statement with property P443 pointing to that file.\n\nFor example, the singular Form of “kuli” might have:\n\n\n kuli (L307875-F1)\n P443 --> File: Dag-Kuli.ogg\n\nThe file itself lives at a URL like:\n\n\n https://upload.wikimedia.org/wikipedia/commons/5/5e/Dag-Kuli.ogg\n\n**Extracting Audio During Harvest**\n\nOur harvest script (running every six hours via cron) fetches all Dagbanli Lexemes from Wikidata. When processing Forms, it looks for P443 claims and extracts the filename. Rather than hard-coding the upload path, it builds a `Special:FilePath` URL on Wikimedia Commons, which redirects to the current version of the file:\n\n\n javascript\n\n // P443 is a Commons media property, so its value is normally a plain filename string.\n // Only the first P443 claim on the Form is used.\n const audioClaimValue = form.claims?.P443?.[0]?.mainsnak?.datavalue?.value;\n const audioFilename = typeof audioClaimValue === 'string'\n ? audioClaimValue\n : audioClaimValue?.value || audioClaimValue?.text; // defensive fallback for object-shaped values\n\n // MediaWiki URLs use underscores in place of spaces.\n const audioUrl = audioFilename\n ? `https://commons.wikimedia.org/wiki/Special:FilePath/${encodeURIComponent(audioFilename.replace(/ /g, '_'))}`\n : undefined;\n\nThe resulting `audioUrl` is stored in the final JSON alongside the Form data, ready to be served to the frontend.\n\n**The Audio Index**\n\nScanning every Lexeme’s Forms on each search or filter would repeat the same work on every interaction. Instead, we build an **audio index**: a lightweight set of Lexeme IDs that have at least one Form with a P443 recording.\n\n\n javascript\n\n const audioIds = new Set();\n for (const lexeme of lexemes) {\n // Keep the Lexeme if any of its Forms has a pronunciation URL.\n if (lexeme.forms?.some(form => form.audioUrl)) {\n audioIds.add(lexeme.wikidataId);\n }\n }\n\nThis index powers the “Has Wikidata Form Pronunciation Audio” filter in the Gballi browser, enabling instant filtering across 11,000+ words without scanning the entire dataset.\n\n## 2. The OGG Problem on iOS\n\n**Why OGG?**\n\nPronunciation recordings on Wikimedia Commons are typically stored in the **OGG Vorbis** format. Vorbis is an open, patent-free codec aligned with the mission of free knowledge. 
No proprietary licensing and no restrictions: exactly what an open platform should use.\n\nUnfortunately, browser support for OGG is inconsistent:\n\n**Browser** | **OGG Support**\n---|---\nChrome | Full support\nFirefox | Full support\nEdge | Full support\nSafari (macOS) | Partial (may require configuration)\nSafari (iOS) | No support at all\n\niOS Safari simply refuses to play OGG files. No fallback, no codec download, nothing.\n\n**Our Solution: On‑the‑Fly Transcoding**\n\nWe could not just store MP3 copies alongside the OGG files, because Wikimedia Commons is the source of truth. Duplicating files would break the principle of a single authoritative source. Instead, we built a **transcoding proxy** into our Cloudflare Worker.\n\nWhen the frontend requests an audio file, it first checks whether the browser runs on iOS or is Safari. Browsers on iOS generally use WebKit, so Chrome and Firefox there share Safari’s limitation:\n\n\n javascript\n\n // iPhone, iPod and older iPads identify themselves in the user agent.\n const isIOS = /iPad|iPhone|iPod/.test(navigator.userAgent) && !window.MSStream;\n // iPadOS 13+ reports a desktop Mac user agent; a touch-capable Mac is an iPad.\n const isIPadOS = navigator.platform === 'MacIntel' && navigator.maxTouchPoints > 1;\n // Safari on any platform: Safari appears in the user agent without Chrome or Android before it.\n const isSafari = /^((?!chrome|android).)*safari/i.test(navigator.userAgent);\n const needsTranscode = isIOS || isIPadOS || isSafari;\n\n\nIf transcoding is needed, the app requests the audio through our worker’s `/audio-proxy` endpoint:\n\n\n https://dagbanli-harvest-worker.workers.dev/audio-proxy?url=https://upload.wikimedia.org/wikipedia/commons/5/5e/Dag-Kuli.ogg\n\nThe worker then:\n\n 1. Fetches the original OGG file from Wikimedia Commons.\n 2. Transcodes it to MP3 using FFmpeg (via WebAssembly or a dedicated transcoding service).\n 3. Caches the result in R2 for future requests.\n 4. Returns the MP3 with appropriate headers.\n\n\n\nThis approach keeps Wikimedia Commons as the single source of truth while providing seamless playback on all devices.\n\n**Why Not Just Store MP3s?**\n\nWe considered running a one‑time script to download all OGG files, convert them to MP3, and upload them to R2. This would simplify the architecture significantly. 
However, it would break the connection to Wikimedia Commons:\n\n * If a new recording is added, our dictionary would not see it until we re‑ran the conversion.\n * If an existing recording is improved or corrected, we would have a stale copy.\n * We would be responsible for storing and serving audio files indefinitely, increasing our storage costs and maintenance burden.\n\n\n\nBy keeping Wikimedia Commons as the source of truth and transcoding on demand, we stay aligned with the open data ecosystem. The `Special:FilePath` URL always points to the latest version, and our cache ensures that popular files are served quickly after the first request. One caveat applies: transcoded MP3s are cached under a key derived from the source URL. After a recording is corrected, transcoded listeners keep hearing the old version until that cache entry is cleared, so the cache needs an expiry or purge step.\n\n## 3. Audio Playback UX\n\n**One‑Tap Playback**\n\nIn the word detail card, each Form with an associated audio file displays a speaker icon. Tapping the icon starts playback immediately, with no page reload or navigation.\n\n\n jsx\n\n <button onClick={() => playAudio(form.audioUrl)}>\n <SpeakerIcon />\n <span>{form.representation}</span>\n </button>\n\n**Visual Feedback**\n\nWhile the audio is loading, the icon shows a spinner. While playing, it changes to a stop icon, allowing users to interrupt playback. All audio is handled through a central `AudioPlayer` service that ensures only one file plays at a time.\n\n**Offline Caching for Favorites**\n\nWhen a user favorites a word, we proactively cache its audio files in IndexedDB. This keeps favorite words fully usable offline, a critical feature for users in areas with unreliable internet.\n\n\n javascript\n\n // db is the app's IndexedDB wrapper (not shown).\n // When transcoding is needed, pass the /audio-proxy URL so the cached copy is a playable MP3.\n async function cacheAudioForFavorite(wordId, audioUrl) {\n const response = await fetch(audioUrl);\n if (!response.ok) {\n throw new Error(`Audio fetch failed with status ${response.status}`);\n }\n const blob = await response.blob();\n await db.favoriteAudio.put({ wordId, blob, timestamp: Date.now() });\n }\n\n**Graceful Degradation**\n\nIf an audio file fails to load because of a network issue, a missing file, or a transcoding failure, the icon remains visible but shows a tooltip explaining the problem. 
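One way to choose that tooltip text is sketched here. It is a hypothetical helper, not the dictionary’s actual code. It assumes the player listens for the media element’s `error` event and catches rejections from `audio.play()`. The name `describeAudioFailure` and the message strings are illustrative.
```javascript
// Hypothetical helper (not the dictionary's code): maps a playback failure to tooltip text.
// The numeric values match the HTML MediaError constants.
const MEDIA_ERR_ABORTED = 1;
const MEDIA_ERR_NETWORK = 2;
const MEDIA_ERR_DECODE = 3;
const MEDIA_ERR_SRC_NOT_SUPPORTED = 4;

function describeAudioFailure(failure) {
  // Rejections from audio.play() are DOMExceptions, identified by name.
  const name = failure ? failure.name : undefined;
  if (name === 'NotAllowedError') {
    return 'Playback was blocked by the browser. Tap the speaker to try again.';
  }
  if (name === 'NotSupportedError') {
    return 'This recording is unavailable on this device.';
  }
  // Errors from the element's error event carry a MediaError with a numeric code.
  switch (failure ? failure.code : undefined) {
    case MEDIA_ERR_ABORTED:
      return 'Loading was interrupted.';
    case MEDIA_ERR_NETWORK:
      return 'The audio could not be downloaded. Check your connection.';
    case MEDIA_ERR_DECODE:
      return 'This recording could not be decoded.';
    case MEDIA_ERR_SRC_NOT_SUPPORTED:
      // Browsers typically report a missing file, a failed transcode and an
      // unsupported format the same way, so one message covers all three.
      return 'This recording is unavailable on this device.';
    default:
      return 'Audio is not available right now.';
  }
}
```
In the player, the helper would receive `audio.error` inside the element’s `error` handler. It would receive the rejection value inside a `.catch()` on `audio.play()`.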
The dictionary remains usable even when audio is not available.\n\n## 4. The Audio Index: Fast “Has Audio” Filtering\n\n**The Performance Problem**\n\nThe Gballi browser includes a filter toggle: **“Has Wikidata Form Pronunciation Audio”**. Checking this box should instantly show only words that have at least one recorded pronunciation.\n\nA naive implementation would scan every Lexeme’s Forms on each filter toggle, iterating over 11,000 words and their associated Forms and checking for `audioUrl` properties. Repeating that work on every toggle can make the filter feel sluggish, especially on mobile devices.\n\n**The Solution: Pre‑built Index**\n\nDuring the harvest process, we build an **audio index**: the set of Lexeme IDs that have at least one Form with a P443 recording. Because JSON has no set type, it is written to `audio-index.json` as a plain array:\n\n\n json\n\n [ \"L307875\", \"L308234\", \"L309871\", ... ]\n\nThe app loads this file once at sync time and rebuilds a `Set` from it in memory. When the user toggles the filter, we simply check whether each Lexeme’s ID is in this set, an average `O(1)` lookup.\n\n**Filter Logic**\n\n\n javascript\n\n // audioIndex is the Set rebuilt from audio-index.json at sync time.\n // In the app, this flag comes from the filter toggle.\n const hasAudioFilterEnabled = true;\n const filteredLexemes = allLexemes.filter(lexeme =>\n !hasAudioFilterEnabled || audioIndex.has(lexeme.wikidataId)\n );\n\nThis pattern appears throughout the dictionary: pre‑compute expensive lookups during the harvest, then use simple set membership checks at runtime.\n\n## 5. 
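Implementation Details\n\n**Worker Route for Audio Proxy**\n\nHere is the simplified worker route that handles audio transcoding:\n\n\n javascript\n\n if (request.method === 'GET' && url.pathname === '/audio-proxy') {\n const originalUrl = url.searchParams.get('url');\n if (!originalUrl) return new Response('Missing url', { status: 400 });\n\n // Only proxy Wikimedia media so the worker cannot be used as an open proxy\n let target;\n try {\n target = new URL(originalUrl);\n } catch {\n return new Response('Invalid url', { status: 400 });\n }\n if (!['upload.wikimedia.org', 'commons.wikimedia.org'].includes(target.hostname)) {\n return new Response('Host not allowed', { status: 403 });\n }\n\n // Check R2 cache first; hash() (not shown) turns the URL into a stable key\n const cacheKey = `audio-cache/${await hash(originalUrl)}.mp3`;\n const cached = await env.dict.get(cacheKey);\n if (cached) {\n return new Response(cached.body, {\n headers: { 'Content-Type': 'audio/mpeg' }\n });\n }\n\n // Fetch original OGG (fetch follows Special:FilePath redirects by default)\n const oggResponse = await fetch(originalUrl);\n if (!oggResponse.ok) {\n return new Response('Could not fetch audio', { status: oggResponse.status === 404 ? 404 : 502 });\n }\n\n // Transcode OGG to MP3; transcodeOggToMp3() (not shown) wraps FFmpeg WASM or an external service\n let mp3Buffer;\n try {\n mp3Buffer = await transcodeOggToMp3(await oggResponse.arrayBuffer());\n } catch {\n return new Response('Transcoding failed', { status: 502 });\n }\n\n // Cache in R2\n await env.dict.put(cacheKey, mp3Buffer, {\n httpMetadata: { contentType: 'audio/mpeg' }\n });\n\n // Return MP3\n return new Response(mp3Buffer, {\n headers: { 'Content-Type': 'audio/mpeg' }\n });\n }\n\n**Audio Service on the Frontend**\n\nThe frontend audio service ensures only one file plays at a time and decides whether to route audio through the transcoding proxy:\n\n\n javascript\n\n class AudioPlayer {\n constructor() {\n this.current = null;\n // Transcode on iOS, on iPadOS (which reports a desktop Mac user agent) and in Safari\n this.needsTranscode = /iPad|iPhone|iPod/.test(navigator.userAgent) ||\n (navigator.platform === 'MacIntel' && navigator.maxTouchPoints > 1) ||\n /^((?!chrome|android).)*safari/i.test(navigator.userAgent);\n }\n\n async play(url) {\n this.stop();\n\n const finalUrl = this.needsTranscode\n ? `https://dagbanli-harvest-worker.workers.dev/audio-proxy?url=${encodeURIComponent(url)}`\n : url;\n\n const audio = new Audio(finalUrl);\n this.current = audio;\n // Clear the reference only if this clip is still the active one\n audio.onended = () => {\n if (this.current === audio) this.current = null;\n };\n // play() returns a promise that rejects if playback is blocked or fails;\n // awaiting it lets the caller show the error tooltip\n await audio.play();\n }\n\n stop() {\n if (this.current) {\n this.current.pause();\n this.current = null;\n }\n }\n }\n\n## Conclusion\n\nAudio transforms the dictionary from a reference tool into an oral history archive. 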
Every recording on Wikimedia Commons preserves a native speaker’s pronunciation for future generations. By building a pipeline that respects Wikimedia Commons as the source of truth while transcoding on demand for incompatible browsers, we ensure that these recordings reach the widest possible audience.\n\nThe audio index demonstrates a pattern we have repeated throughout the dictionary: pre‑compute expensive operations during harvest, store the results in lightweight lookup tables, and use them for instant filtering and search at runtime.\n\nIn the next post, we will dive into how we made the entire dictionary work offline, syncing the full dataset to IndexedDB, handling version updates, and building a resilient offline‑first experience.",
"title": "Building the Dagbanli Dictionary’s Audio Pipeline: OGG, iOS, and Transcoding"
}