Raw Record Source

{
  "$type": "site.standard.document",
  "canonicalUrl": "https://johnnyreilly.com/posts/definitive-guide-to-migrating-from-blogger-to-docusaurus",
  "description": "Learn how to transfer a Blogger website to Docusaurus without losing content. Use a TypeScript console app to convert HTML to Markdown.",
  "path": "/posts/definitive-guide-to-migrating-from-blogger-to-docusaurus",
  "publishedAt": "2021-03-15T00:00:00.000Z",
  "site": "at://did:plc:yy3apqjlms24kso7ahn7lbmb/site.standard.publication/3mova7c4nho2b",
  "tags": [
    "docusaurus",
    "typescript"
  ],
  "textContent": "This post documents how to migrate a blog from Blogger to Docusaurus.\n\n\n\nUpdated 5th November 2022\n\nThis post started out as an investigation into migrating from Blogger to Docusaurus. In the end I very much made the leap, and would recommend doing so to others. I've transformed this post into a \"definitive guide\" on how to migrate. I intend to maintain this on an ongoing basis for the benefit of the community.\n\nBecause I rather like what I originally wrote when I was in \"investigation mode\", I have largely left it in place. However, there are new sections which have been added in to augment what's there.\n\nIntroduction\n\nDocusaurus is, amongst other things, a Markdown powered blogging platform. My blog has lived happily on Blogger for the past decade. I'm considering moving, but losing my historic content as part of the move was never an option. This post goes through what it would look like to move from Blogger to Docusaurus _without_ losing your content.\n\nIt is imperative that the world never forgets what I was doing with jQuery in 2012.\n\nBlog as code\n\nEverything is better when it's code. Infrastructure as code. Awesome right? So naturally \"blog as code\" must be better than just a blog. More seriously, Markdown is a tremendous documentation format. Simple, straightforward and, like Goldilocks, \"just right\". For a long time I've written everything as Markdown. My years of toil down the Open Source mines have preconditioned me to be very MD-disposed.\n\nI started out writing this blog a long time ago as pure HTML. Not the smoothest of writing formats. At some point I got into the habit of spinning up a new repo in GitHub for a new blogpost, writing it in Markdown and piping it through a variety of tools to convert it into HTML for publication on Blogger. As time passed I felt I'd be a lot happier if I wasn't creating a repo each time. 
What if I did all my blogging in a single repo and used that as the code that represented my blog?\n\nJust having that thought laid the seeds for what was to follow:\n\n1. An investigation into importing my content from Blogger into a GitHub repo\n2. An experimental port to Docusaurus\n\nWe're going to do this now. First, let's create ourselves a Docusaurus site for our blog:\n\nThis creates a standard Docusaurus site in the blog-website directory. In there we'll find a docusaurus.config.js file. There's much that can be configured here. It's worth remembering that Docusaurus is a tool for building documentation sites that also happens to feature a blog component. We're going to use it as a blog only. So we'll deactivate the docs component and configure the blog component to be the home page of our site, following the Docusaurus documentation:\n\nDownloading your Blogger content\n\nIn order that we can migrate, we must obtain the blog content. This is a mass of HTML that lived inside Blogger's database. (One assumes they have a database; I haven't actually checked.) There's a Back up content option inside Blogger's settings to allow this:\n\nIt provides you with an XML file with a dispiritingly small size. Ten years blogging? You'll get change out of 4MB it turns out.\n\nFrom HTML in XML to Markdown\n\nWe now want to take that XML and:\n\n- Extract each blog post (and its associated metadata; title / tags and whatnot)\n- Convert the HTML content of each blog post from HTML to Markdown, and save it as a Markdown file\n- Download the images used in the blogpost so they can be stored in the repo as well\n\nTo do this we're going to whip up a smallish TypeScript console app. 
Let's initialise it with the packages we're going to need:\n\nWe're using:\n\n- fast-xml-parser to parse XML\n- he, jsdom and showdown to convert HTML to Markdown\n- axios to download images\n- typescript to code in and ts-node to make our TypeScript Node.js console app.\n\nNow we have all the packages we need, it's time to write our script.\n\nTo summarise what the script does, it:\n\n- deletes the default blog posts\n- creates a new authors.yml file with my details in\n- parses the blog XML into an array of Posts\n- each post is then converted from HTML into Markdown, a Docusaurus header is created and prepended, then the index.md file is saved to the blog-website/blog/{POST_NAME} directory\n- the images of each post are downloaded with Axios and saved to the blog-website/blog/{POST_NAME} directory\n\nTo see the full code, you can find it on the GitHub repository that now represents the blog.\n\nIf you're trying to do this yourself, you'll want to change some of the variable values in the script; such as the author details.\n\nBringing it all together\n\nTo run the script, we add the following script to the package.json:\n\nAnd have ourselves a merry little yarn start to kick off the process. In a very short period of time, if you crack open the blog directory of your Docusaurus site you'll see a collection of folders, Markdown files and images. These represent your blog and are ready to power Docusaurus:\n\nI have slightly papered over some details here. For my own case I discovered that I hadn't always written perfect HTML when blogging. I had to go in and fix the HTML in a number of historic blogs and re-download, to get cleanish Markdown.\n\nI also learned that a number of my blog's images had vanished from Blogger at some point. This makes me all the more convinced that storing your blog in a repo is a good idea. Things should not \"go missing\".\n\nIf we now run yarn start in the blog-website directory we can see the blog in action:\n\nCongratulations! 
We're now the proud owners of a Docusaurus blog site based upon our Blogger content.\n\nIf you've got some curiously named image files you might encounter some minor issues that need fixing up. This should get you 95% of the way there though. Docusaurus does a great job of telling you when there are issues.\n\nRedirecting from Blogger URLs to Docusaurus URLs\n\nThe final step is to redirect from the old Blogger URLs to the new Docusaurus URLs. Blogger URLs look like this: /2019/10/definitely-typed-movie.html. On the other hand, Docusaurus URLs look like this: /2019/10/08/definitely-typed-movie.\n\nI'll want to redirect from the former to the latter. I'll use the @docusaurus/plugin-client-redirects plugin to do this. Inside the docusaurus.config.js file, I'll add the following to the plugins section:\n\nThe function above will be run during the build process for each URL. And consequently a client side redirect will be created to go from the landing URL to the Docusaurus URL. The console.log is there to help me see what's going on. I don't actually need it.\n\nHaving this in place should protect my SEO when the domain switches from Blogger to Docusaurus. Long term I shouldn't need this approach in place.\n\nComments\n\nI'd always had comments on my blog. First with Blogger's in-built functionality and then with Disqus. One thing that Docusaurus doesn't support by default is comments for blog posts. There's a feature request for it here. However, it doesn't exist right now.\n\nFor a while I considered this a dealbreaker, and wasn't planning to complete the migration. But then I had a discussion with Josh Goldberg as to the value of comments. Essentially that they are nice, but not essential.\n\nI rather came to agree with the notion that comments were only slightly interesting as I looked back at the comments I'd received on my blog over the years. So I decided to go ahead _without_ comments. 
I remain happy with that choice, so thanks Josh!\n\nHowever, if it's important to you, there are ways to support comments. One example is using Giscus; here is a guide on how to integrate it.\n\nDNS and RSS\n\nAt this point I had a repository that represented my blog. I had a Docusaurus site that represented my blog. When I ran yarn build I got a Docusaurus site that looked like my blog. I had a redirect mechanism in place to protect my SEO.\n\nI was ready to make the switch.\n\nHosting is a choice. When I initially migrated, I made use of GitHub Pages. I also experimented with Netlify. Finally I moved to using Azure Static Web Apps to make use of preview environments. There are many choices out there - you can pick the one that works best for you.\n\nOnce your site is up, the last stage of the migration is updating your DNS to point to the Docusaurus site. I use Cloudflare to manage my domain names and so that's where I made the switch.\n\nRSS / Atom feeds\n\nIf you're like me, you'll want to keep your RSS feed. I didn't want to disrupt people who consumed my RSS feed as I migrated.\n\nHappily, Docusaurus ships with RSS / Atom in the box. Even happier still, most of the feed URLs in Blogger match the same URLs in Docusaurus. There was one exception in the form of the /feeds/posts/default feed which is an Atom feed. Docusaurus has an atom.xml feed but it's not in the same place.\n\nThis isn't a significant issue as I can create a page rule in Cloudflare to redirect from the old URL (https://johnnyreilly.com/feeds/posts/default) to the new URL (https://johnnyreilly.com/atom.xml):\n\nConclusion\n\nI've migrated to Docusaurus and have been happily running there for a while now. I'm very happy with the result.\n\nThis post is intended to be a community resource that helps folk migrate from Blogger to Docusaurus. If you should find issues with the migration, please do let me know and help make this resource even better.",
  "title": "The definitive guide to migrating from Blogger to Docusaurus"
}