Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreig62wnti4ifoophm3hxzi77bekrx3ftg2f2tpleo6x7ogjuadytni",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3molepgctdqd2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiazyq2jaqmigp6xpcwarxxouubcl5squexsevfx4fv6x7my4l5kfe"
    },
    "mimeType": "image/webp",
    "size": 36742
  },
  "path": "/schiff_heimlich/batch-converting-documents-to-markdown-with-microsofts-markitdown-kbi",
  "publishedAt": "2026-06-18T17:01:26.000Z",
  "site": "https://dev.to",
  "tags": [
    "cli",
    "microsoft",
    "python",
    "tooling",
    "https://github.com/microsoft/markitdown"
  ],
  "textContent": "Here's a quick tool that landed in my queue recently: **microsoft/markitdown**\n\nIt's a Python CLI that converts PDFs, Word docs, PowerPoint, and Excel files to Markdown. Not groundbreaking, but if you've ever had to process a folder of legacy documentation for a static site, you know the value of not doing it manually.\n\nTwo things I found useful:\n\n**Batch conversion with piping**\n\n    markitdown document.docx -o document.md\n\nThe CLI converts one file per invocation, and `-o` takes an output file rather than a directory. To process a whole folder, combine it with standard Unix tools:\n\n    mkdir -p md\n    find ./legacy-docs -name '*.docx' -exec sh -c 'markitdown \"$1\" -o \"md/$(basename \"$1\" .docx).md\"' sh {} \\;\n\nOutput files are named after each input's basename, so same-named files in different subfolders will overwrite each other.\n\n**stdout output for scripting**\n\n    markitdown document.pdf\n\nWith no `-o`, it dumps the Markdown to stdout, which makes it easy to pipe into other text processing or redirect to a filename derived from the input (`markitdown document.pdf > document.md`).\n\nIt's on PyPI (`pip install 'markitdown[all]'` pulls in the optional dependencies for PDF and Office formats), so it'll drop into a CI pipeline without much friction. If you've got a documentation migration on your plate and you're tired of manual conversions, it's worth a look.\n\nhttps://github.com/microsoft/markitdown",
  "title": "Batch-converting documents to markdown with Microsoft's markitdown"
}