DGW.ltd

Markdown my words

dgw.ltd May 5, 2026

A Markdown-for-agents plugin shipped on WordPress.org. The Chancery Lane Project’s access logs show AI agents pulling climate clauses on a schedule; dgw.ltd shows the same agents barely turning up. The mechanism works, but it only matters if you’ve got something worth pulling.

I’ve been writing in Markdown since I started building 11ty sites in 2021, and using Bear for notes the whole time. Markdown is the quiet default of the modern web: READMEs, docs, static sites, notes apps, and now the format LLMs are most fluent in. WordPress is the day job. Getting it to speak Markdown felt overdue.

Markdown launched in 2004. John Gruber’s pitch: “lets you use the regular characters on your keyboard which you already use while typing out things like emails, to make fancy formatting of text for the web.” For example:

# Hello world

This is some bold and italic text and a link.

A typical web page weighs a few hundred kilobytes, and in some cases a megabyte or more. The same content as Markdown is only a few kilobytes. An 80-90% reduction.

My first crack at pulling Markdown out of WordPress was a hacky Inquirer.js script that exported posts to numbered files (post-1.md, post-2.md). It worked. It was also one-shot, off-server, and required me to remember it existed. Several iterations later it became a wp-cli command. Useful, still very much for me.

When “for me” turned into “for agents”

The brief shifted because the traffic shifted. I work with The Chancery Lane Project (TCLP), a charity reducing emissions through legal clauses. Last year their stats made it obvious AI was already arriving: bot traffic spiking, LLMs citing the site as an authority on climate clauses, no real visibility into either. Generative Engine Optimisation wasn’t a buzzword I picked up from a blog; the traffic patterns made it land.

So less “a Markdown export utility” and more “what would a tool that helps an LLM read this site cleanly actually do?”

Markdown saves bytes at both ends of the request.
Fewer tokens in means cheaper, faster inference. Fewer bytes out means less server load. For a charity working on climate, that’s not incidental.

The Cloudflare moment

Cloudflare’s Markdown for Agents post is what made the design click. Content negotiation. Accept: text/markdown. Same URL, different representation depending on who’s asking. A 1990s HTTP feature with a 2026 use case: agents can pull token-light Markdown without a parallel /api/ URL structure or a sitemap of .md files.

That’s the bit worth labouring. The plugin doesn’t replace your HTML. It serves Markdown to clients that ask for it and ignores the rest. A tag in the head lets agents discover it. Browsers carry on as before.

What’s actually in the box

The plugin landed on WordPress.org with three jobs: generation, frontmatter, stats.

Generation. Files write themselves on publish. There’s a bulk generate button for catching up on existing content, and a wp markdown-agents command for bigger sites or anyone allergic to AJAX. Re-runs can be incremental (only re-write the posts that actually changed), and there’s a changes.json delta file so any downstream RAG pipeline can see what’s new without reprocessing the world.

Frontmatter. YAML, configurable per post type. ACF fields drop in via dot notation: clause_fields.clause_summary walks an ACF group cleanly. Optional bits for hierarchical types: parent, ancestors, children IDs, author display name, root-relative featured image paths so a domain migration doesn’t break anything.

Stats. Every request to a .md file is logged: which agent, when, how often. The admin screen breaks it down by post and by bot. This is the bit that turns “are LLMs actually reading us?” from a guess into a chart.

What the stats actually show

TCLP gets pulled by AI agents that have never once shown up on dgw.ltd. The major models are all there as regulars.
The titles they’re hitting read like an LLM in the middle of a job: climate due diligence questionnaires, green loan frameworks, net zero construction clauses. Someone, somewhere, asked their AI to draft a green lease clause. The AI went there to look up what one’s supposed to look like.

The contrast with dgw.ltd is instructive. Here I get steady but modest AI attention, mostly indexing crawlers working through the back catalogue. Specialist, high-trust content attracts more AI than a generalist dev blog.

The other interesting signal: somebody is using Accept: text/markdown properly on TCLP. Most AI crawlers get served Markdown because the plugin recognises their user-agent. They don’t ask, they just get it. A client sending Accept: text/markdown is doing something more deliberate: it has read the tag in the head, understood it, and come back to explicitly request Markdown by name. On TCLP those requests arrive in pairs, each followed by a second request from the same client moments later. That’s the signature of an automated tool syncing fresh content on a schedule. On dgw.ltd that count is effectively zero. The mechanism works; it only matters if you’ve got something worth pulling on a schedule.

Bucket list, ticked

This is the first plugin I’ve contributed to the .org repo. The credit for it landing belongs to TCLP. Their Head of Digital, Felix Cohen, drove it: shaped the brief, set the priorities, and pushed the work all the way from “a nice Markdown export” to a published WordPress.org plugin. The work was funded as part of TCLP’s grant from the Patrick J. McGovern Foundation, which supports the charity’s AI work. I got to ship something with a genuine use case; the charity gets a feedback loop on how AI is consuming their content.

The plugin is free, open source, and works on any WordPress site. If you’ve got content worth reading, it’s worth making readable.

Want to see it in action?
Click ‘View as Markdown’ in the menu, or download the plugin from WordPress.org.

Under the hood

Implementation detail for the curious. Skip if you’re not.

Render on demand, or write a file?

We picked static. AI crawlers don’t politely throttle, and a render-per-request model is a thinly veiled DDoS amplifier waiting to happen.

The cost matters. Converting on the fly means every hit spins up a converter, parses the page into a DOMDocument, walks the tree, runs the regex passes, and serialises out the other side. That ties up a PHP-FPM worker on string-mangling for every request, and a busy crawler will chew through your whole worker pool. Doing the work once on save_post and serving the file from disk is the obvious trade.

The expensive part of a WordPress page is rarely the query. It’s template loading, block rendering, and the long tail of the_content filters. Skipping that is where the real win is.

Caveat: PHP isn’t off the hook entirely. WordPress still boots, loads its plugins, and runs the main query before we get to look at the Accept header, because the plugin hooks in at template_redirect. What we duck is everything after that: the template hierarchy, the block-rendering pass, the_content filters, footer scripts. The plugin reads the .md from disk and exits. Cheap, not free.

Files land in wp-content/uploads/{export_dir}/{post-type}/{slug}.md, with taxonomy archives under taxonomy/{taxonomy}/{term-slug}.md. Auto-regenerate on post save, manual bulk via the settings page, or wp markdown-agents generate for big sites where AJAX would time out.

HTML to Markdown: the package vs core decision

We went with league/html-to-markdown over WordPress’s own WP_HTML_Processor. v1 needed a converter, not a parser.

WP_HTML_Tag_Processor and WP_HTML_Processor shipped in core over the last few years and they’re properly good: spec-compliant, no Composer dep, first-class server-side HTML parsing. So why the third-party package?
| | league/html-to-markdown | WP_HTML_Processor |
| --- | --- | --- |
| Purpose | HTML → Markdown converter | HTML parser/walker |
| Maturity | Around since 2014, well-trodden | Stable but newer, surface area still expanding |
| Out of the box | (new HtmlConverter)->convert($html) | You write the tree walk yourself |

league/html-to-markdown is a converter. WP_HTML_Processor is a parser you can build a converter on top of. For a v1 we wanted “give me Markdown, please” with sensible defaults around code fences, tables, and unwrapping div soup. The League package does that.

The next iteration probably swaps. We already use WP_HTML_Tag_Processor elsewhere for class injection on core blocks, the dependency footprint shrinks, and we’d own the conversion rules end to end.
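
For a feel of what “you write the tree walk yourself” means in practice, here is a minimal sketch. It is Python rather than the plugin’s PHP, it uses the standard library’s html.parser instead of WP_HTML_Processor, and the class and function names (TinyMarkdownConverter, html_to_markdown) are made up for illustration. It is not the plugin’s code and it covers only headings, paragraphs, bold, italic, inline code and links; everything else is unwrapped.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Hypothetical, deliberately tiny HTML-to-Markdown walker (illustration only)."""

    # Inline tags map to the same marker on open and close.
    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*", "code": "`"}
    # h1..h6 map to "# " .. "###### ".
    HEADINGS = {f"h{n}": "#" * n + " " for n in range(1, 7)}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # Markdown fragments, joined at the end
        self.hrefs = []  # stack of link targets, popped when </a> closes

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.out.append(self.HEADINGS[tag])
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")
        # Any other tag (div, span, section, p...) emits nothing on open,
        # which is what unwraps nested wrapper elements.

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self.out.append("\n\n")  # block elements end with a blank line
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a" and self.hrefs:
            self.out.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        self.out.append(data)

    def markdown(self):
        return "".join(self.out).strip()


def html_to_markdown(html):
    """Convert an HTML fragment to Markdown using the tiny walker above."""
    parser = TinyMarkdownConverter()
    parser.feed(html)
    parser.close()
    return parser.markdown()
```

Even this toy version has to decide what a paragraph break is, how links nest, and what to do with tags it doesn’t recognise. Code fences, tables, lists and escaping are all still missing, which is roughly the work a mature converter package has already done.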
