External Publication

Introducing OpenSpeaks Tools: Subtitler, Tome, and Bento

en.planet.wikimedia.org [Unofficial] May 29, 2026

Community language documenters and archivists have long recorded and preserved their community’s knowledge in audio and video. However, the tools for subtitling, describing, and managing media often require expensive infrastructure. Since 2024, we at OpenSpeaks have been identifying such technological barriers and, based on what we learned, have developed a set of tools.

We invite you to use them: Subtitler, Tome, and Bento.

Currently in alpha/beta, these tools are open source, work across platforms as browser-based applications, and are offline‑first where possible. Subtitler is an audio and video caption/subtitle editor, Tome is a metadata editor, and Bento is a media file and folder utility. They are specifically designed to work offline or on slow internet connections so that field documenters can use them on the go. They are also self-contained, with user documentation built into each app, so users can get help in one place.

OpenSpeaks Subtitler: caption and subtitle editor/translator

OpenSpeaks Subtitler (v.0.10.1 Alpha) showing active subtitling process

Captions/subtitles are critical to web accessibility because deaf people rely on on-screen text of spoken words. For this reason, we included accessibility as a section in OpenSpeaks, our flagship open educational resource (OER), and worked in 2021 to enhance its practical guides for language documenters and archivists. To decide whether we needed to develop a subtitle editor in the first place, we tested several open-source and proprietary subtitle editors. Amara was our best choice for its clean, intuitive interface, multilingual support, and, above all, its open-source status. However, requirements such as offline and local loading of private or sensitive media and compatibility with Wikimedia Commons, among many others, called for a new subtitle editor.

OpenSpeaks Subtitler is optimised for subtitling during field documentation. You can load an audio or video file together with a related subtitle file from your device, or TimedText from Wikimedia Commons, and edit the subtitles. You can also generate dummy subtitles using silence detection and other parameters, or create subtitles manually in the waveform view. We plan to integrate machine translation to assist users with translation. We also plan to use automatic speech recognition (ASR) to create a rough draft that users can review and edit, though ASR is absent or faulty for most Indigenous and other local languages.
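To illustrate what silence-based dummy subtitles can look like, here is a minimal Python sketch. It is not Subtitler's actual implementation: the function names, the amplitude threshold, and the minimum silence and speech lengths are all assumptions chosen for illustration, and the input is assumed to be a list of mono samples normalised to the range -1.0 to 1.0. The output uses the common SubRip (SRT) layout with placeholder text.

```python
from typing import List, Sequence, Tuple


def detect_speech_segments(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 0.02,    # assumed amplitude below which audio counts as silence
    min_silence: float = 0.3,   # assumed seconds of silence that close a segment
    min_speech: float = 0.2,    # assumed minimum segment length worth keeping
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for runs of non-silent audio."""
    gap = round(min_silence * sample_rate)
    runs = []
    start = None       # sample index where the current loud run began
    last_loud = None   # index of the most recent loud sample
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= gap:
            runs.append((start, last_loud + 1))
            start = None
    if start is not None:
        runs.append((start, last_loud + 1))
    segments = [(a / sample_rate, b / sample_rate) for a, b in runs]
    return [(a, b) for a, b in segments if b - a >= min_speech]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float]]) -> str:
    """Build numbered SRT cues with placeholder text for each segment."""
    cues = []
    for number, (start, end) in enumerate(segments, 1):
        cues.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"[subtitle {number}]\n"
        )
    return "\n".join(cues)


# Toy signal at 10 samples per second: silence, speech, silence, speech, silence.
samples = [0.0] * 10 + [0.5] * 10 + [0.0] * 10 + [0.5] * 5 + [0.0] * 5
print(to_srt(detect_speech_segments(samples, sample_rate=10)))
```

A transcriber would then replace each placeholder with the actual words and adjust the cue boundaries by hand, for example in a waveform view.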

Check out OpenSpeaks Subtitler and share feedback.

OpenSpeaks Tome: language documentation and oral knowledge metadata editor

OpenSpeaks Tome (v.0.1.3 Beta) showing metadata import page

Language archives often collect a wide range of information related to a recording. This includes date and place of recording, interviewer, recordist, videographer, translator and transcriber’s names, interviewee’s name, occupation, consent and other details, review/transcription/subtitle process and details. This information is called metadata. Metadata is published alongside the audio/video files so others can verify what is said in those files.

Wikimedia Commons asks for only a few metadata fields for audio, video and image files to make it easier for new contributors. However, oral knowledge is still far from being treated as equal to written knowledge. Our goal is never just to help communities create and upload content, but to help them understand how to fight back against epistemic inequality. One way for community oral knowledge to stand on an equal footing with dominant written knowledge is to clearly document the recording and preservation processes. This includes technological details (recording, media processing, captioning/transcribing), personal details (interviewers, interviewees and other involved individuals), and peer-production details (how one person’s recorded knowledge was reviewed by others, how the reviewers’ comments were captured, and how any disagreements about a recorded topic were handled). We capture these using the Oral Knowledge Framework (formerly called the Oral History Framework), a set of principles that adhere to the FAIR–CARE principles and specifically cover community-based language recording and preservation.

OpenSpeaks Tome helps capture metadata largely required by language archives such as the Endangered Languages Archive (ELAR) and the Language Archive Cologne (LAC), and is also framed within the Oral Knowledge Framework. Since resources available to each community language documenter and archivist are different, but limited, Tome only recommends what is useful, keeping most input fields optional. It also allows importing and exporting metadata files in a wide range of formats (e.g., Wikitext, plaintext, Markdown, Word, HTML, JSON, XML) and metadata vocabularies (e.g., OPEX/IMDI for ELAR, BLAM/CMDI for LAC, and Dublin Core and the OpenSpeaks JSON). It can export Wikimedia Commons templates that can be copied directly into a file’s Wikitext. See this example.
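As a rough illustration of exporting one metadata record into two of the formats mentioned above, the Python sketch below serialises a record as JSON and as a simple Dublin Core XML fragment. It does not reproduce Tome's own schema or export code: the record contents, the `metadata` wrapper element, and the function names are assumptions made for this example. Only the Dublin Core element names and namespace URI come from the published Dublin Core element set.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def to_dublin_core(record: dict) -> str:
    """Serialise a record whose keys are Dublin Core element names."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # assumed wrapper element, not a standard
    for field, value in record.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            if item in (None, ""):
                continue  # optional fields left empty are simply skipped
            ET.SubElement(root, f"{{{DC_NS}}}{field}").text = str(item)
    return ET.tostring(root, encoding="unicode")


def to_json(record: dict) -> str:
    """Serialise the same record as JSON, keeping non-Latin scripts readable."""
    return json.dumps(record, ensure_ascii=False, indent=2)


# Hypothetical record for illustration only.
record = {
    "title": "Interview about weaving",
    "creator": "Example Interviewer",
    "contributor": ["Example Interviewee", "Example Transcriber"],
    "date": "2026-05-29",
    "language": "ori",
    "rights": "",  # left empty on purpose to show optional fields
}

print(to_dublin_core(record))
print(to_json(record))
```

Keeping every field optional in the serialiser mirrors the idea that documenters fill in only what their resources allow.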

Check out OpenSpeaks Tome and share feedback.

OpenSpeaks Bento: a small toolbox for media reality

OpenSpeaks Bento (v.0.0.2 Alpha) showing file renaming tab

OpenSpeaks Bento combines three existing OpenSpeaks utilities: Media Metadata Viewer & Compress Helper, Media Duration Calculator, and Multimedia Organization Tool. Together, they address three very ordinary but very time‑consuming problems in language documentation workflows:

You need to quickly see what is inside a media file and decide whether you can send it over the connection you have.
You need to know how much audio and video your team has recorded to plan subtitles, translations, or budgets.
You need to name files in a structured manner, so multiple individuals can collaborate.

The Organise tab (formerly, Multimedia Organization Tool) helps organise, tag, and batch‑rename media files in a folder using structured naming conventions that reflect how a project actually works. The Duration tab (formerly, Media Duration Calculator) opens audio and video files inside a folder and returns their total and individual durations. It can also export that information in CSV or text format to support project planning, billing, and estimating the effort for subtitling and translation. The Compress tab (formerly, Media Metadata Viewer & Compress Helper) reads key technical properties of a file: duration, resolution, frame rate, audio sample rate, bitrate, and codec. It helps users choose compression settings for sharing with an audio/video editor, a transcriber, or even the interviewee for review before publishing.
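The renaming and duration-export ideas can be sketched in a few lines of Python. This is only an illustration, not Bento's code: the naming pattern (project, language code, date, sequence number), the CSV columns, and the function names are assumptions, and the durations are passed in as plain numbers rather than read from real media files.

```python
import csv
import io
from pathlib import PurePath
from typing import List, Tuple


def structured_name(original: str, project: str, language: str,
                    date: str, sequence: int) -> str:
    """Build a name like project_lang_date_001.ext (hypothetical pattern)."""
    extension = PurePath(original).suffix.lower()
    return f"{project}_{language}_{date}_{sequence:03d}{extension}"


def format_duration(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS, rounded to whole seconds."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def durations_csv(durations: List[Tuple[str, float]]) -> str:
    """Return CSV text with one row per file and a final TOTAL row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "seconds", "hh:mm:ss"])
    for name, seconds in durations:
        writer.writerow([name, f"{seconds:.1f}", format_duration(seconds)])
    total = sum(seconds for _, seconds in durations)
    writer.writerow(["TOTAL", f"{total:.1f}", format_duration(total)])
    return buffer.getvalue()


# Hypothetical files and durations for illustration only.
print(structured_name("Interview 1.WAV", "weaving", "ori", "2026-05-29", 1))
print(durations_csv([("a.wav", 90.0), ("b.mp4", 3600.4)]))
```

A consistent pattern of this kind lets several collaborators sort and find files without opening them, and the TOTAL row gives a quick basis for estimating subtitling or translation effort.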

Bento is meant to help project coordinators and archivists with basic operations without scripting, installing heavy software, or copying data between tools. It is intentionally narrow: it does not try to be a full digital asset management system but a set of utilities tuned to the realities of small teams and unstable networks.

Check out OpenSpeaks Bento and share feedback.

What we need from the Wikimedia community

All three tools are in active development. There are open questions about how far they should go, especially around automation, integration, and long‑term maintenance.

We are looking for:

Testers who can try these tools with real recordings and report failures, missing use cases, and other challenges.
Wikimedians who handle audio, video, or oral history on Commons and can tell us where these tools fit—or do not fit—into existing workflows.
Developers interested in helping maintain, extend, or integrate these tools with other Wikimedia infrastructure.

Links and documentation:

OpenSpeaks Subtitler: tool documentation on Meta-Wiki and live editor on Toolforge.
OpenSpeaks Tome: tool documentation on Meta-Wiki and live editor on Toolforge.
OpenSpeaks Bento: tool documentation on Meta-Wiki and live editor on Toolforge.

