Log context propagation in Python ASGI apps

Redowan Delowar August 6, 2024
Source

Let's say you have a web app that emits log messages from different layers. Your log shipper collects and sends these messages to a destination like Datadog where you can query them. One common requirement is to tag the log messages with some common attributes, which you can use later to query them.

In distributed tracing, this tagging is usually known as context propagation, where you're attaching some contextual information to your log messages that you can use later for query purposes. However, if you have to collect the context at each layer of your application and pass it manually to the downstream ones, that'd make the whole process quite painful.

Suppose you have a web view for an endpoint that calls another function to do something:

I'm using starlette syntax for the above pseudocode, but this is valid for any generic ASGI web app. The view procedure collects contextual information like user_id and platform from the request headers. Then it tags the log statements before and after calling the work function using the extra fields in the logger calls. This way, the log messages have contextual info attached to them.

However, the work procedure also generates a log message, and that won't get tagged here. We may be tempted to pass the contextual information to the work subroutine and use them to tag the logs, but that'll quickly get repetitive and cumbersome. Passing a bunch of arguments to a function just so it can tag some log messages also makes things unnecessarily verbose. Plus, it's quite easy to forget to do so, which will leave you with orphan logs with no way to query them.

It turns out we can write a simple middleware to tag log statements in a way where we won't need to manually propagate the contextual information throughout the call chain. To demonstrate that, here's a simple get endpoint server written in Starlette that'll just return a canned response after logging a few events. The app structure looks as follows:

Configure the logger

The first step is to configure the application logger so that it emits structured log statements in JSON where each message will look as follows:

Here's the log configuration logic:

The contextvars module manages context information across asynchronous tasks, preventing context leakage between requests. We use a log_context_var context variable to store user ID and platform information, ensuring each log entry includes relevant context for the request.

The ContextAwareJsonFormatter formats log statements to include the message, timestamp in milliseconds, and context tags. The context is retrieved using log_context_var.get(), ensuring concurrency-safe access. The context variable is set in the middleware, so log_context_var.get() always returns the current context for each request.

Next, we set up a StreamHandler, attach the ContextAwareJsonFormatter to it, and add the handler to the root logger.

Write a middleware that tags the log statements automatically

With log formatting out of the way, here's how to write the middleware to update the logger so that all the log messages within a request-response cycle get automatically tagged:

The LogContextMiddleware class inherits from starlette.BaseHTTPMiddleware and gets initialized with the application.

The dispatch method is called automatically for each request. It extracts user_id and platform from the request headers and sets these values in the log_context_var to tag log messages. Then it logs the incoming request, processes it, logs the outgoing response, and then clears the context so that we don't leak the context information across requests. This way, our view function won't need to be peppered with repetitive log statements.

Write the simplified view

Setting up the logger and middleware drastically simplifies our endpoint view since we won't need to tag the logs explicitly or add request-response logs in each view. It looks like this now:

Notice there's no repetitive request-response log statements in the view function, and we're not passing the log context anywhere explicitly. The middleware will ensure that the request and response logs are always emitted and all the logs, including the one coming out of the work function, are tagged with the contextual information.

Wire everything together

The logging configuration and middleware can be wired up like this:

To instantiate the logger config, we import log.py in the init.py module:

Now the application can be started with:

And then we can make a request to the server:

On the server, the request will emit the following log messages:

And we're done. You can find the fully working code in this GitHub gist.

Note: The previous version of this example wasn't concurrency safe and used a shared logger filter, leaking context information during concurrent requests. This was pointed out in this GitHub comment.

Discussion in the ATmosphere

Loading comments...