Raw Record Source

{
  "$type": "site.standard.document",
  "canonicalUrl": "https://rednafi.com/python/log-context-propagation/",
  "description": "Automatically tag Python logs with request context using middleware and contextvars for distributed tracing in ASGI web applications.",
  "path": "/python/log-context-propagation/",
  "publishedAt": "2024-08-06T00:00:00.000Z",
  "site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
  "tags": [
    "Python",
    "API",
    "Logging"
  ],
  "textContent": "Let's say you have a web app that emits log messages from different layers. Your log shipper\ncollects and sends these messages to a destination like Datadog where you can query them.\nOne common requirement is to tag the log messages with some common attributes, which you can\nuse later to query them.\n\nIn distributed tracing, this tagging is usually known as [context propagation], where you're\nattaching some contextual information to your log messages that you can use later for query\npurposes. However, if you have to collect the context at each layer of your application and\npass it manually to the downstream ones, that'd make the whole process quite painful.\n\nSuppose you have a web view for an endpoint that calls another function to do something:\n\nI'm using [starlette] syntax for the above pseudocode, but this is valid for any generic\nASGI web app. The view procedure collects contextual information like user_id and\nplatform from the request headers. Then it tags the log statements before and after\ncalling the work function using the extra fields in the logger calls. This way, the log\nmessages have contextual info attached to them.\n\nHowever, the work procedure also generates a log message, and that won't get tagged here.\nWe may be tempted to pass the contextual information to the work subroutine and use them\nto tag the logs, but that'll quickly get repetitive and cumbersome. Passing a bunch of\narguments to a function just so it can tag some log messages also makes things unnecessarily\nverbose. Plus, it's quite easy to forget to do so, which will leave you with orphan logs\nwith no way to query them.\n\nIt turns out we can write a simple middleware to tag log statements in a way where we won't\nneed to manually propagate the contextual information throughout the call chain. To\ndemonstrate that, here's a simple get endpoint server written in Starlette that'll just\nreturn a canned response after logging a few events. The app structure looks as follows:\n\nConfigure the logger\n\nThe first step is to configure the application logger so that it emits structured log\nstatements in JSON where each message will look as follows:\n\nHere's the log configuration logic:\n\nThe contextvars module manages context information across asynchronous tasks, preventing\ncontext leakage between requests. We use a log_context_var context variable to store user\nID and platform information, ensuring each log entry includes relevant context for the\nrequest.\n\nThe ContextAwareJsonFormatter formats log statements to include the message, timestamp in\nmilliseconds, and context tags. The context is retrieved using log_context_var.get(),\nensuring concurrency-safe access. The context variable is set in the middleware, so\nlog_context_var.get() always returns the current context for each request.\n\nNext, we set up a StreamHandler, attach the ContextAwareJsonFormatter to it, and add the\nhandler to the root logger.\n\nWrite a middleware that tags the log statements automatically\n\nWith log formatting out of the way, here's how to write the middleware to update the logger\nso that all the log messages within a request-response cycle get automatically tagged:\n\nThe LogContextMiddleware class inherits from starlette.BaseHTTPMiddleware and gets\ninitialized with the application.\n\nThe dispatch method is called automatically for each request. It extracts user_id and\nplatform from the request headers and sets these values in the log_context_var to tag\nlog messages. Then it logs the incoming request, processes it, logs the outgoing response,\nand then clears the context so that we don't leak the context information across requests.\nThis way, our view function won't need to be peppered with repetitive log statements.\n\nWrite the simplified view\n\nSetting up the logger and middleware drastically simplifies our endpoint view since we won't\nneed to tag the logs explicitly or add request-response logs in each view. It looks like\nthis now:\n\nNotice there's no repetitive request-response log statements in the view function, and\nwe're not passing the log context anywhere explicitly. The middleware will ensure that the\nrequest and response logs are always emitted and all the logs, including the one coming out\nof the work function, are tagged with the contextual information.\n\nWire everything together\n\nThe logging configuration and middleware can be wired up like this:\n\nTo instantiate the logger config, we import log.py in the __init__.py module:\n\nNow the application can be started with:\n\nAnd then we can make a request to the server:\n\nOn the server, the request will emit the following log messages:\n\nAnd we're done. You can find the fully working code in this [GitHub gist].\n\n_Note: The [previous version] of this example wasn't concurrency safe and used a shared\nlogger filter, leaking context information during concurrent requests. This was pointed out\nin this [GitHub comment]._\n\n\n\n\n[context propagation]:\n    https://opentelemetry.io/docs/concepts/context-propagation/\n\n[starlette]:\n    https://www.starlette.io/\n\n[github gist]:\n    https://gist.github.com/rednafi/dc2016a8ea0e2405b943f023bfb18142\n\n[previous version]:\n    https://web.archive.org/web/20240806220817/https://rednafi.com/python/log_context_propagation/\n\n[github comment]:\n    https://gist.github.com/rednafi/dc2016a8ea0e2405b943f023bfb18142?permalink_comment_id=5148207#gistcomment-5148207",
  "title": "Log context propagation in Python ASGI apps"
}