Azure Open AI: generate article metadata with TypeScript
This post grew out of my desire to improve the metadata for my blog posts. I have been blogging for more than ten years, and the majority of my posts lack descriptions. A description is a meta tag that sits in a page and describes the contents of the page. This is what this post's description meta tag looks like in HTML:
Descriptions are important for search engine optimisation (SEO) and for accessibility. You can read up more on the topic here. I wanted to have descriptions for all my blog posts. But writing around 230 descriptions for my existing posts was not something I wanted to do manually. I wanted to automate it.
TypeScript Azure Open AI SDK
I've been using Azure Open AI for a while now, and I've been using the TypeScript SDK in the @azure/openai package to interact with it. What I wanted to do was use the SDK to generate descriptions for my blog posts based on their content. Since my blog is powered by Docusaurus and each post is a Markdown file, I had easy access to individual files that could be summarised.
What I wanted to do
The plan was to build a script to do the following:
- read all of my blog posts without descriptions
- for each one, generate a description using Azure Open AI
- write the description to the Markdown file
I wanted to use Bun for this as it supports TypeScript by default. Using Node.js would have been equally possible, but using TypeScript there wouldn't have been as easy.
Reading the blog posts
I started off by creating a new Bun project:
I then added the various packages I needed, including the Azure Open AI one:
I then created an index.ts file and added the following code:
The code above does the following:
- reads all of the blog posts in my blog; they're all directories in the blog directory with an index.md underneath so it's pretty easy
- for each post, it checks to see if there is a description in the front matter (front matter is a metadata section at the top of the Markdown file)
- if there is no description, it adds the post to a list of posts without descriptions
- it then loops through the posts without descriptions and generates a description for each one using the produceSummary function - more on that in a moment
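The front matter check described above can be sketched roughly as follows. This is a minimal illustration rather than the script itself: the names extractFrontMatter, hasDescription, postsWithoutDescriptions and BlogPost are hypothetical, file loading is left out (each post is assumed to have been read into a string already), and treating an empty description as missing is my assumption.

```typescript
// A post that has already been read from disk (loading is omitted here).
interface BlogPost {
  path: string;
  content: string;
}

// Returns the raw text between the leading `---` fences,
// or undefined when the file does not start with front matter.
function extractFrontMatter(markdown: string): string | undefined {
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(markdown);
  return match ? match[1] : undefined;
}

// True when the front matter has a non-empty `description:` entry.
// Assumption: an empty `description:` line counts as missing.
function hasDescription(markdown: string): boolean {
  const frontMatter = extractFrontMatter(markdown);
  if (frontMatter === undefined) return false;
  return /^description:[ \t]*\S/m.test(frontMatter);
}

// Keeps only the posts that still need a description generated.
function postsWithoutDescriptions(posts: BlogPost[]): BlogPost[] {
  return posts.filter((post) => !hasDescription(post.content));
}
```

Only the front matter is inspected, so a line starting with description: in the body of a post does not count.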
Generating the descriptions
So far, so text wrangling. Let's look at the produceSummary function in the summarizer.ts file:
There's a lot going on here, so let's break it down.
Authentication
Oftentimes the fiddliest part of using Azure Open AI is authentication. In this case, I'm using the Azure CLI credential. So to run this you need to authenticate with the Azure CLI using az login. (Remember to make sure you have the correct subscription selected. You can check this by running az account show, which reports the currently active subscription: the one with the isDefault property set to true.)
Once you're logged in, this code will use the currently logged in user to authenticate with Azure Open AI.
You need to set the endpoint variable to the endpoint of your Azure Open AI resource. You can find this in the Azure Portal by going to your Azure Open AI resource. Look for something like https://<your-resource-name>.openai.azure.com. You'll also need to get the name of your deployment. In my case this is OpenAi-gpt-35-turbo.
Producing the summary
Once you've authenticated and got the client you can start to summarise. The first thing to do is provide a system message to prime the model with context on what we're trying to do. As part of writing a good description, there's a sweet spot to hit in terms of length; too short and it's not useful, too long and it gets truncated. So we're going to aim for between 120 and 156 characters. We're also going to encourage the AI to avoid certain wording constructs, and to avoid the ' character as it upsets the front matter.
Once primed, we hand over the blog content to the AI and ask it to produce a summary.
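As a rough illustration of that priming step, the messages might be assembled like this. The prompt wording, the example phrases to avoid, and the names ChatMessage and buildMessages are all hypothetical; only the 120 to 156 character target and the ban on the ' character come from the approach described above.

```typescript
// Minimal role/content message shape, as used by chat-style completion APIs.
type ChatMessage = { role: "system" | "user"; content: string };

const MIN_LENGTH = 120;
const MAX_LENGTH = 156;

// Builds the system message (the brief) and the user message (the post itself).
// Hypothetical wording: treat it as a starting point, not the prompt actually used.
function buildMessages(articleContent: string): ChatMessage[] {
  const systemPrompt = [
    "You summarise blog posts into a description for an HTML meta tag.",
    `The description must be between ${MIN_LENGTH} and ${MAX_LENGTH} characters long.`,
    "Do not use the ' character.",
    'Do not open with phrases such as "This article" or "In this post".',
  ].join(" ");

  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: `Summarise the following blog post:\n\n${articleContent}` },
  ];
}
```

Keeping the length limits in named constants means the same values can be reused when checking what the model returns.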
We interact with Azure Open AI through the getChatCompletions method, which is effectively a strongly typed wrapper for the chat-completions endpoint in Azure.
What's quite interesting is that you really can't rely on the AI to do what you ask it to do. It may create a description of an appropriate length. It may not. So we need to check what it gives us, and if it doesn't satisfy our needs, we ask it to try again. We'll give it a maximum of 10 attempts per post because, surprisingly, every now and then it struggles to meet the brief, and infinite loops are to be avoided.
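The check-and-retry idea can be sketched like this. To keep the example self-contained, the model call is passed in as a generate callback; in a real script that callback would wrap the getChatCompletions call, and that wiring is not shown. The names Generate, isAcceptableLength and summariseWithRetries are hypothetical.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical stand-in for the model call: takes messages, resolves to the reply text.
type Generate = (messages: ChatMessage[]) => Promise<string>;

const MIN_LENGTH = 120;
const MAX_LENGTH = 156;
const MAX_ATTEMPTS = 10;

// A description is accepted only if it falls inside the target length range.
function isAcceptableLength(description: string): boolean {
  return description.length >= MIN_LENGTH && description.length <= MAX_LENGTH;
}

// Asks for a description up to maxAttempts times and returns the first acceptable one.
// Resolves to undefined when every attempt misses the brief, so the caller can skip the post.
async function summariseWithRetries(
  generate: Generate,
  messages: ChatMessage[],
  maxAttempts: number = MAX_ATTEMPTS,
): Promise<string | undefined> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = (await generate(messages)).trim();
    if (isAcceptableLength(candidate)) return candidate;
  }
  return undefined;
}
```

Returning undefined rather than throwing lets the script carry on with the remaining posts when one of them never produces a usable description.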
Running the script
When the script ran (after I'd az login-ed) it produced descriptions for all my blog posts. I reviewed each summary and tweaked them where necessary. If I really didn't like a description, I'd delete it and run the script again. In the end I had descriptions for all my blog posts that I was pretty happy with. If you take a look at this giant PR you can see them all landing.
Hopefully this post provides a useful example of how to use the TypeScript Azure Open AI SDK to generate article metadata. You can see the raw code here.