{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreig62wnti4ifoophm3hxzi77bekrx3ftg2f2tpleo6x7ogjuadytni",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3molepgctdqd2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreiazyq2jaqmigp6xpcwarxxouubcl5squexsevfx4fv6x7my4l5kfe"
},
"mimeType": "image/webp",
"size": 36742
},
"path": "/schiff_heimlich/batch-converting-documents-to-markdown-with-microsofts-markitdown-kbi",
"publishedAt": "2026-06-18T17:01:26.000Z",
"site": "https://dev.to",
"tags": [
"cli",
"microsoft",
"python",
"tooling",
"https://github.com/microsoft/markitdown"
],
"textContent": "Here's a quick tool that landed in my queue recently: **microsoft/markitdown**\n\nIt's a Python CLI that converts PDFs, Word docs, PowerPoint, and Excel files to Markdown. Not groundbreaking, but if you've ever had to process a folder of legacy documentation for a static site, you know the value of not doing it manually.\n\nTwo things I found useful:\n\n**Batch conversion with piping**\n\n\n\n markitdown --input document.docx --output converted/\n\n\nYou can point it at a directory and it processes everything in one shot. Combine with standard Unix tools:\n\n\n\n find ./legacy-docs -name '*.docx' | xargs -I{} sh -c 'markitdown --input {} --output ./md/'\n\n\n**stdout output for scripting**\n\n\n\n markitdown document.pdf\n\n\nDumps the markdown to stdout, which makes it easy to pipe into other text processing or redirect to specific filenames based on the input.\n\nIt's on PyPI (`pip install markitdown`), so it'll drop into a CI pipeline without much friction. If you've got a documentation migration on your plate and you're tired of manual conversions, it's worth a look.\n\nhttps://github.com/microsoft/markitdown",
"title": "Batch-converting documents to markdown with Microsoft's markitdown"
}