{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/python/implement-traceroute/",
"description": "Build a traceroute clone in Python using UDP and ICMP sockets to trace network packet routes and measure hop latency with TTL manipulation.",
"path": "/python/implement-traceroute/",
"publishedAt": "2023-06-01T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Python",
"Networking",
"Shell",
"Unix"
],
"textContent": "I was watching [Storytelling with traceroute] lightning talk by Karla Burnett and wanted to\nunderstand how traceroute works in Unix. Traceroute is a tool that shows the route of a\nnetwork packet from your computer to another computer on the internet. It also tells you how\nlong it takes for the packet to reach each stop along the way.\n\nIt's useful when you want to know more about how your computer connects to other computers\non the internet. For example, if you want to visit a website, your computer sends a request\nto the website's server, which is another computer that hosts the website. But the request\ndoesn't go directly from your computer to the server. It has to pass through several other\ndevices, such as routers, that help direct the traffic on the internet. These devices are\ncalled hops. Traceroute shows you the list of hops that your request goes through, and how\nlong it takes for each hop to respond. This can help you troubleshoot network problems, such\nas slow connections or unreachable websites.\n\nThis is how you usually use traceroute:\n\nThis returns:\n\nThis traceroute output draws the path of a network packet from my computer to\nexample.com's server, which has an IP address of 93.184.216.34. It shows that the packet\ngoes through 11 hops before reaching the destination. The first hop is my router\n(192.168.1.1), the second hop is my ISP's router (142.254.158.201), and so on. The last\ncolumn shows the time it takes for each hop to respond in milliseconds (ms). The lower the\ntime, the faster the connection.\n\nSome hops have multiple lines with different names or IP addresses. This means that there\nare multiple routers at that hop that can handle the traffic, and traceroute randomly\npicks one of them for each packet. For example, hop 7 has three routers with names starting\nwith lag-11, lag-21, and lag-31. 
These are probably load-balancing routers that\ndistribute the traffic among them.\n\nThe last hop (93.184.216.34) appears twice in the output. This is because traceroute sends\nthree packets to each hop by default, and sometimes the last hop responds to all three\npackets instead of discarding them. This is not a problem and does not affect the accuracy\nof the traceroute.\n\nThis is all good and dandy but I wanted to understand how traceroute can find out what\nroute a packet takes and how long it takes between each hop. So I started reading posts like\n[How traceroute works], which does an awesome job of explaining what's going on behind the\nscenes. The gist of it goes as follows.\n\nUnderneath traceroute\n\nTraceroute works by sending a series of probe packets to the target IP address or hostname\nthat you want to reach. On Unix, the probes are UDP datagrams by default, while Windows'\ntracert sends ICMP (Internet Control Message Protocol) echo requests, which are also known\nas pings. Each packet has an associated time-to-live (TTL) value, which is a number\nthat indicates how many hops (or intermediate devices) the packet can pass through before it\nexpires and is discarded by a router. Yeah, strangely, TTL doesn't denote any time duration\nhere.\n\nTraceroute starts by sending a packet with a low TTL value, usually 1. This means that the\npacket can only make one hop before it expires. When a router receives this packet, it\ndecreases its TTL value by 1 and checks if it is 0. If it is 0, the router discards the\npacket and sends back an ICMP time exceeded message to the source of the packet. This\nmessage carries the IP address of the router that discarded the packet as its source\naddress. This is how the sender knows the IP address of the first hop (router, computer, or\nwhatever).\n\nTraceroute records the IP address and round-trip time (RTT) of each ICMP time exceeded\nmessage it receives. The RTT is the time it takes for a probe to travel from the source to\nthe hop and for the reply to come back. 
It reflects the latency (or delay) between the source and that hop.\n\nTraceroute then increases the TTL value by 1 and sends another packet. This packet can make\n2 hops before it expires. The process repeats until traceroute reaches the destination or a\nmaximum TTL value, usually 30. When the returned IP is the same as the initial destination\nIP, traceroute knows that the packet has completed the whole journey. By doing this,\ntraceroute can trace the route that your packets take to reach the target IP address or\nhostname and measure the latency to each hop. The tool prints out the associated IPs and\nlatencies as it jumps through different hops.\n\nI snagged this photo from an [SFU traceroute machinery slide] that I think explains the\nmachinery of traceroute quite well:\n\n![Diagram showing traceroute mechanism with TTL incrementing at each router hop][image_1]\n\nWriting a crappier version of traceroute in Python\n\nAfter getting a rough idea of what's going on underneath, I wanted to write a simpler and\ncrappier version of traceroute in Python. This version would roughly perform the following\nsteps:\n\n1. Create a UDP socket that'd be used to send empty packets toward the destination.\n2. Create an ICMP socket that'd receive _ICMP time exceeded_ messages.\n3. Start a loop and use the UDP socket to send an empty payload with a TTL of 1, so that it\n expires at the first hop.\n4. The TTL value of the packet would be decremented by 1 at the first hop. Once the TTL\n reaches 0, the packet would be discarded, and an ICMP time exceeded message would be\n returned to the sender through the ICMP socket. The sender would also receive the address\n of the first hop.\n5. Calculate the time delta between sending a packet and receiving the ICMP time exceeded\n message. Also, capture the address of the first hop and log the time delta and address to\n the console.\n6. In the subsequent iterations, the TTL value will be incremented by 1 (2, 3, 4, ...) 
and\n the steps from 1 through 5 will be repeated until it reaches the max_hops value, which\n is set at 64.\n\nHere's the complete self-contained implementation. I tested it on Python 3.11:\n\nRunning the script will give you the following nicely formatted output:\n\n\n\n\n[storytelling with traceroute]:\n https://www.youtube.com/watch?v=xW_ALxfop7Y\n\n[how traceroute works]:\n https://www.slashroot.in/how-does-traceroute-work-and-examples-using-traceroute-command\n\n[sfu traceroute machinery slide]:\n http://www.sfu.ca/~ljilja/cnl/presentations/arman/nafips2001/sld006.htm\n\n[image_1]:\n https://blob.rednafi.com/static/images/implement_traceroute/img_1.png",
"title": "Implementing a simple traceroute clone in Python"
}