Raw Record Source

{
  "$type": "site.standard.document",
  "canonicalUrl": "https://johnnyreilly.com/posts/using-kernel-memory-to-chunk-documents-into-azure-ai-search",
  "description": "To build RAG (Retrieval Augmented Generation) experiences, where LLMs can query documents, you need a strategy to chunk those documents. Kernel Memory supports this.",
  "path": "/posts/using-kernel-memory-to-chunk-documents-into-azure-ai-search",
  "publishedAt": "2024-04-21T00:00:00.000Z",
  "site": "at://did:plc:yy3apqjlms24kso7ahn7lbmb/site.standard.publication/3mova7c4nho2b",
  "tags": [
    "azure",
    "c#",
    "asp.net",
    "ai"
  ],
  "textContent": "I've recently been working on building retrieval augmented generation (RAG) experiences into applications; building systems where large language models (LLMs) can query documents. To achieve this, we first need a strategy to chunk those documents and make them LLM-friendly. Kernel Memory, a sister project of Semantic Kernel, supports this.\n\nIf you haven't heard of Kernel Memory before, it's a library that, amongst other things, provides a way to chunk documents into smaller pieces. To quote the Kernel Memory GitHub repository:\n\n> Kernel Memory (KM) is a service built on the feedback received and lessons learned from developing Semantic Kernel (SK) and Semantic Memory (SM). It provides several features that would otherwise have to be developed manually, such as storing files, extracting text from files, providing a framework to secure users' data, etc. The KM codebase is entirely in .NET, which eliminates the need to write and maintain features in multiple languages. As a service, KM can be used from any language, tool, or platform, e.g. browser extensions and ChatGPT assistants.\n\nIn this post, I'll show you how to use Kernel Memory to chunk documents in the background of an ASP.NET application.\n\nKernel Memory: Serverless vs Service\n\nThere are two ways that we can run Kernel Memory: \"Serverless\" and \"Service\".\n\nRunning the full service is more powerful, but effectively requires running a separate application. We would then need to integrate our main app with that. Given that I'm building with ASP.NET, I'll be using the serverless approach, which allows us to run Kernel Memory within the context of a single application (which will contain our main app code as well). We can then manage our integrations with Kernel Memory as simple method calls.\n\nThis is simpler and more cost-effective, but it does have some limitations. The service approach offers more features, including persistent retry logic. 
The documentation states that if we want to scale then we'll likely want to consider the service approach. But my own experience has been that serverless works very well for small to medium-sized applications.\n\nPerhaps surprisingly, using serverless we can still have the experience of running Kernel Memory as a non-blocking separate service within the context of our ASP.NET application. This is achieved by running Kernel Memory as a hosted service - this is the standard ASP.NET mechanism for running background tasks. That's what we're going to use.\n\nThere are four parts to bringing this to life:\n\n1. Our Kernel Memory serverless instance - this is where the integration between Kernel Memory, Azure Open AI, Azure AI Search and the actual chunking takes place\n2. A queue which we'll use to provide documents for chunking with Kernel Memory\n3. Our hosted service which will bring together the queue and the Kernel Memory integration to manage our background document processing\n4. An endpoint in our ASP.NET application to add documents to the queue\n\n1. Setting up Kernel Memory serverless\n\nThere are a number of dependencies that we need to add to our project to get Kernel Memory working. These are:\n\nWith this in place we'll start to integrate with Kernel Memory. We will first construct ourselves an IKernelMemory like so:\n\nWhat we're doing here is creating an IKernelMemory instance and making it aware of all our deployed Azure resources. Going through how to deploy those is out of the scope of this post, but it's probably worth highlighting that we're using AzureIdentity for auth as it's particularly secure. If you would like to use other options, you certainly can.\n\nIt's also worth noting that we're using the text-embedding-ada-002 model for text embedding and the gpt-3.5-turbo-16k model for text generation. These are the models that I've found to be most effective for my use cases. 
Of these, the text embedding model is the most important - it's the one that will be used to chunk documents.\n\nYou'll also note we're using Azure AI Document Intelligence; this is optional and just tackles a few more document chunking scenarios. It's not mandatory.\n\nChunking with Kernel Memory serverless\n\nWith our IKernelMemory ready to go, we now need a way to chunk documents. Deep down, this is achieved by acquiring the document we want to chunk from blob storage and passing it to _memory.ImportDocumentAsync with the name of the index we want to process into. You can see examples of this usage in the Kernel Memory docs. You can also see how it works in the Kernel Memory repository itself.\n\nHowever, it's often helpful to have a number of other things in place to manage:\n\n1. Applying tags to documents (this gives us more power when querying later)\n2. Creating acceptable names / ids for the Azure AI Search Service\n3. Handling rate limiting - more on that in a moment\n\nTo that end, I tend to end up implementing a Process method that looks something like this:\n\nMuch of the code above concerns rate limiting / 429s. It's not uncommon when chunking to be hit by 429s - \"Too many requests\". Chunking documents requires use of Azure Open AI resources, and the level of access we have is typically restricted and controlled via quotas. There's an element of this that we can avoid by controlling the quota available on our Azure Open AI deployments (you can read more about this here), and we can also implement a certain amount of retry logic.\n\nThe code above tries to handle a number of re-attempts as wisely as it can, using the information that Azure APIs surface around when re-attempting is allowed. Interestingly, you'll see a variety of strategies employed here around retry times, as the way information is surfaced to support this keeps changing! 
We can likely have less code in future when a final standard is committed to.\n\nBringing it together\n\nWe're going to put this all together in a single class called RagGestionService.\n\nYou might be puzzled by the name \"RagGestion\" - this is a term my good friend George Karsas coined to describe the process of preparing documents for Retrieval Augmented Generation. It's a great term, and I've adopted it!\n\nThe RagGestionService will look like this:\n\nBy the way, I don't advise hard-coding the Azure resources as I have here, but rather passing them in as configuration. Incidentally, we could also use dependency injection to inject a prepared IKernelMemory instance into the service, but again, I'm keeping it simple here for clarity.\n\n2. Our document processor queue\n\nIn order that we have a way to provide documents for chunking, we need a queue. This is a simple queue that we can add documents to, and then process them in the background. We're going to use a ConcurrentQueue for this, with a little wrapper around it so we can encapsulate the queue for sharing between our UI and our background task, and also to do some logging.\n\nThe EnqueueDocumentUri method above will be called from the context of our UI - from an ASP.NET controller. This will be invoked when someone uploads a file and will also be responsible for adding the file to a BlobService for storage prior to processing.\n\nBy contrast, the DequeueDocumentUri method will be called from the context of our background service; it will call this method to pick up a file for processing.\n\n3. Our background service\n\nNext, we need a background service to bring together our DocumentProcessorQueue and our RagGestionService. This is a standard ASP.NET hosted service. It will look like this:\n\nThis service will run in the background of the ASP.NET application, will pick up documents from the queue (if there are any) and pass them to the RagGestionService for processing. 
It will trigger every 5 seconds, running for the lifetime of the application.\n\nYou'll see we're doing some timing here - this is because it's useful to know how long the process takes. If we're processing a lot of documents, we'll want to know how long it's taking to process each one.\n\n4. Adding documents to the queue\n\nTo add documents to the queue, we'll need to create an endpoint in our ASP.NET application. This endpoint will accept files and add them to the queue. Here's an example of how we might do that:\n\nAs we can see, this endpoint:\n\n1. Accepts files from a POST request with an index name in the querystring\n2. Uploads them to Blob Storage (matching the container name to the index they will be processed into in future)\n3. Adds them to the queue with _documentProcessorQueue.EnqueueDocumentUri. This will then be picked up by the background service and processed.\n\nRegistering our services\n\nFinally, we'll need to register our services in the Program.cs file. We'll want to add the following:\n\nWith this in place we have an application that can upload documents and chunk them in the background.\n\nConclusion\n\nAnd that's it! This is an ASP.NET application that can chunk documents (or RagGest 😉) in the background using Kernel Memory running in serverless mode. I haven't yet had the need to upgrade to the full Kernel Memory service. Perhaps the day will come, but the mileage we can get with this approach is considerable.\n\nMany thanks to David Rosevear and George Karsas for their help working on this mechanism. And George for \"RagGestion\" - I love it!",
  "title": "Using Kernel Memory to Chunk Documents into Azure AI Search"
}