{
  "path": "/parsing-apache2-access-logs-with-the-opentelemetry-collector",
  "site": "at://did:plc:gttrfs4hfmrclyxvwkwcgpj7/site.standard.publication/self",
  "tags": [
    "opentelemetry",
    "tutorial"
  ],
  "$type": "site.standard.document",
  "title": "Parsing Apache2 access logs with the OpenTelemetry Collector",
  "content": {
    "text": "I couldn't find a ton of resources on this, but FYI -- the OpenTelemetry Collector's `filelog` receiver has a pretty robust regex parser built into it. Want to get your access.log files from Apache? Here's the config.\n\n```yaml\n  filelog/access:\n    include: [ /var/log/apache2/access.log ]\n    operators:\n      - type: regex_parser\n        regex: '(?P<ip>\\d{1,3}(?:\\.\\d{1,3}){3}) - - \\[(?P<datetime>[^\\]]+)] \"(?P<method>\\S+) (?P<path>\\S+) (?P<protocol>\\S+)\" (?P<status>\\d{3}) (?P<size>\\d+) \"(?P<referrer>[^\"]*)\" \"(?P<user_agent>[^\"]*)'\n        timestamp:\n          parse_from: attributes[\"datetime\"]\n          layout: '%d/%b/%Y:%H:%M:%S %z'\n        severity:\n          parse_from: attributes[\"status\"]\n```\n\nThe documentation for a lot of this stuff is stuck inside the GitHub repositories for the receiver modules, so be sure to check that out if you're looking for a quick reference.\n\nWhat if we want to go further and turn our attributes into their appropriate semantic conventions? 
While there are no explicit log conventions for HTTP servers, the Span ones should work for our purposes.\n\n```yaml\n  transform:\n    error_mode: ignore\n    log_statements:\n      - context: log\n        statements:\n          - replace_all_patterns(attributes, \"key\", \"method\", \"http.request.method\")\n          - replace_all_patterns(attributes, \"key\", \"status\", \"http.response.status_code\")\n          - replace_all_patterns(attributes, \"key\", \"user_agent\", \"user_agent.original\")\n          - replace_all_patterns(attributes, \"key\", \"ip\", \"client.address\")\n          - replace_all_patterns(attributes, \"key\", \"path\", \"url.path\")\n          - delete_key(attributes, \"datetime\")\n          - delete_key(attributes, \"size\")\n```\n\nOne gotcha: `replace_all_patterns` treats its third argument as a regex, so a short pattern like `ip` will also rewrite part of any other key that happens to contain it.\n\nThis should be enough to get started, at least, although there's more you might want to do:\n\n- Add resource attributes for the logical service name (apache, reverse-proxy, etc.)\n\n- Change up your Apache [log format](https://httpd.apache.org/docs/2.4/mod/mod_log_config.html#formats) to get more information, like the scheme or the time spent serving the request.\n",
    "$type": "site.standard.content.markdown",
    "version": "1.0"
  },
  "description": "I couldn't find a ton of resources on this, but FYI -- the OpenTelemetry Collector's filelog receiver has a pretty robust regex parser built into it. Want to get your access.log files from Apache? Here's the config.",
  "publishedAt": "2024-01-06T00:00:00Z",
  "textContent": "I couldn't find a ton of resources on this, but FYI -- the OpenTelemetry Collector's filelog receiver has a pretty robust regex parser built into it. Want to get your access.log files from Apache? Here's the config. The documentation for a lot of this stuff is stuck inside the GitHub repositories for the receiver modules, so be sure to check that out if you're looking for a quick reference. What if we want to go further and turn our attributes into their appropriate semantic conventions? While there are no explicit log conventions for HTTP servers, the Span ones should work for our purposes. This should be enough to get started, at least, although there's more you might want to do: Add resource attributes for the logical service name (apache, reverse-proxy, etc.) Change up your Apache log format to get more information, like the scheme or the time spent serving the request."
}