{
  "$type": "site.standard.document",
  "canonicalUrl": "https://johnnyreilly.com/posts/adding-lastmod-to-sitemap-git-commit-date",
  "description": "This post demonstrates enriching an XML sitemap with `lastmod` timestamps based on git commits.",
  "path": "/posts/adding-lastmod-to-sitemap-git-commit-date",
  "publishedAt": "2022-11-25T00:00:00.000Z",
  "site": "at://did:plc:yy3apqjlms24kso7ahn7lbmb/site.standard.publication/3mova7c4nho2b",
  "tags": [
    "node.js",
    "docusaurus"
  ],
  "textContent": "This post demonstrates enriching an XML sitemap with lastmod timestamps based on git commits. The sitemap being enriched in this post was generated automatically by Docusaurus. The technique relies on the way Docusaurus works, in that it is file based. You could easily use it for another file based website solution, but you would need tweaks to target the relevant files that drive your lastmod.\n\nIf you're interested in applying the same technique to your RSS / Atom / JSON feeds in Docusaurus, you may find this post interesting.\n\nUpdated 30/03/2024 - this is built into Docusaurus 3.2\n\nI'm delighted to say that Docusaurus 3.2 has this functionality built in. So you don't need this anymore!\n\nReading git log in Node.js\n\nIn the last post I showed how to manipulate XML in Node.js, and filter our sitemap. In this post we'll build upon what we did last time, read the git log in Node.js and use that to power a lastmod property.\n\nThe lastmod property (documented here) is optional and, if supplied, should be the date of last modification of a page in W3C Datetime format. (This allows YYYY-MM-DD.)\n\nTo read the git log in Node.js we'll use the simple-git package. It's a great package that makes it easy to read the git log. Other stuff too - but that's what we care about today.\n\nTo work with simple-git we need to create a Git instance. We can do that like so:\n\nFrom sitemap to git log\n\nIt's worth pausing to consider what our sitemap looks like:\n\nIf you look at the URL (loc) you can see that it's fairly easy to determine the path to the original markdown file. 
If we take the URL https://johnnyreilly.com/2012/01/07/standing-on-shoulders-of-giants, we can see that the path to the markdown file is blog-website/blog/2012-01-07-standing-on-shoulders-of-giants/index.md.\n\nAs long as we don't have a custom slug in play (and I rarely do), we have a reliable way to get from blog post URL (loc) to markdown file. With that we can use simple-git to get the git log for that file. We can then use that to populate the lastmod property.\n\nAbove we're using a regular expression to extract the date and slug from the URL, and using those to construct the path to the markdown file. We then use simple-git to get the git log for that file, take the latest commit date to populate the lastmod property, and push that onto the urls array.\n\nFinally we return the urls array and use it to update the sitemap before we write it out:\n\nOur new sitemap looks like this:\n\nYou see the lastmod property has been populated for URLs based upon the most recent commit for that file. Yay!\n\nGitHub Actions - fetch-depth\n\nYou might think we were done (I thought we were done), but we're not. We're not done because we're using GitHub Actions to build the site.\n\nWhen I tested this locally, it worked fine. However, when I pushed it to GitHub Actions, it surfaced a latest.date which wasn't populated with the value you'd hope. The reason was that the fetch-depth of the checkout step was set to 1 (the default), which produces a shallow clone containing only the most recent commit. This meant that the git log didn't contain the per-file history we needed. Changing the fetch-depth to 0, which fetches the full history, resolves the situation.\n\nUpdated 12th November 2023: Google's view on lastmod, changefreq and priority\n\nGoogle have announced that they use lastmod as a specific signal for triggering recrawling. 
The announcement goes on to say that Google doesn't use the changefreq or priority elements to trigger recrawling of URLs.\n\nSo if you want a sitemap that triggers recrawling reliably, having an accurate lastmod will help.\n\nConclusion\n\nThis post demonstrates how you can enrich a sitemap that lacks lastmod so that it has one driven by git commit date. I hope it helps!",
  "title": "Adding lastmod to sitemap based on git commits"
}