{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreidu456az6bfzq5klw5ogjgsfqlkzn5ziuuioh3unrpipjg2o7qrda",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mom7k4dh2ac2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreifpfj3hokld6fycidsifnvvb6xg2aove77gamdcs52colbp43avpi"
},
"mimeType": "image/webp",
"size": 453280
},
"path": "/vesviet/system-design-graphhopper-distance-matrix-self-host-osrm-vs-haversine-for-route-optimization-2cpg",
"publishedAt": "2026-06-19T01:02:51.000Z",
"site": "https://dev.to",
"tags": [
"systemdesign",
"architecture",
"graphhopper",
"routing",
"E-commerce Order Allocation",
"Part 6",
"Order Allocation Algorithms",
"Uber Engineering: H3 Hexagonal Hierarchical Spatial Index",
"Google Maps Platform Pricing: Distance Matrix API",
"Open Source Routing Machine (OSRM)",
"GraphHopper Routing Engine Documentation",
"GraphHopper Matrix API Reference",
"OpenStreetMap Vietnam Data (Geofabrik)",
"Part 6 — Building a Mini Allocation Engine",
"Series Executive Summary",
"GraphHopper Distance Matrix: Self-Host OSRM vs Haversine for Route Optimization",
"LinkedIn",
"tanhdev.com/hire"
],
"textContent": "> **Series context:** This is Part 7 of the E-commerce Order Allocation series. The distance matrix built here feeds directly into the OR-Tools VRP solver in Part 6.\n\n## The Invisible yet Most Expensive Bottleneck in E-commerce Routing\n\n**Answer-first:** For 1 warehouse + 100 delivery stops, you need 10,201 pairwise distances. Self-hosting **GraphHopper** or **OSRM** on a $20/month VPS delivers this in under 50ms per batch — completely free. Using Google Maps Distance Matrix API for the same workload costs **$510/day**. This guide shows you exactly when to use which tool, with Docker setup and production Python code.\n\nFor any VRP (Vehicle Routing Problem) solver to find the optimal delivery route, it needs to know the exact cost between every pair of stops — this is the **distance matrix**. For 1 warehouse + 100 orders (101 points), that is `101 × 101 = 10,201` pairs. Choosing the wrong tool for this step can cost **$510/day** in API fees or cause multi-second latency spikes under load.\n\n## 1. As the Crow Flies: The Haversine Formula\n\nThis calculates the straight-line distance between two points on a sphere (the Earth) using their Latitude and Longitude.\n\n### Pros\n\n * **Extremely fast:** Calculating 10,000 pairs takes milliseconds on a CPU.\n * **No external data needed:** Pure mathematics; zero reliance on APIs or map data.\n\n\n\n### Cons\n\n * **Inaccurate:** It ignores roads, rivers, tunnels, and one-way streets. In urban reality, actual driving distance is usually 1.2x to 1.5x the Haversine distance.\n\n\n\n### Python Example\n\n\n import math\n\n def haversine(lat1, lon1, lat2, lon2):\n R = 6371.0 # Earth radius in km\n\n dlat = math.radians(lat2 - lat1)\n dlon = math.radians(lon2 - lon1)\n\n a = (math.sin(dlat / 2)**2 +\n math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) *\n math.sin(dlon / 2)**2)\n\n c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))\n return R * c # Returns km\n\n # Building the Distance Matrix:\n def build_haversine_matrix(locations):\n n = len(locations)\n matrix = [[0] * n for _ in range(n)]\n for i in range(n):\n for j in range(n):\n if i != j:\n matrix[i][j] = haversine(\n locations[i].lat, locations[i].lng,\n locations[j].lat, locations[j].lng\n )\n return matrix\n\n\n**Practical usage:** Amazon and Grab often use Haversine as a \"Candidate Filter\" to eliminate points that are obviously too far away before calling more expensive algorithms.\n\n## 2. The Ideal Solution for the Base Problem: Routing Engine (OSRM / GraphHopper)\n\nIf your problem **only requires delivering from a fixed warehouse to customers** , **without caring about real-time traffic** or **rush hour histories** , and solely needs distance based on the **actual existing road network** — then this is the perfect, most cost-effective solution.\n\n> **Prerequisite:** This section assumes familiarity with Order Allocation Algorithms and how a VRP solver consumes a pre-built distance matrix.\n\nYou need a **Routing Engine** to load road network graph data. This data is usually downloaded for free from **OpenStreetMap (OSM)** — which the community constantly updates whenever a new road is built, an alley is removed, or weight restrictions change. When the roads change, you simply re-download the map file (`.osm.pbf`) and restart the engine.\n\n### Why not standard Dijkstra or A*?\n\nIf you run a standard **Dijkstra** or **A** * algorithm to find paths between 10,000 pairs on a city-wide map (which has millions of nodes), your server will freeze because the graph is too large to traverse per query.\n\nTherefore, modern Routing Engines use \"pre-processing\" techniques:\n\n 1. **Contraction Hierarchies (CH):** Spends a few hours pre-analyzing the map to create \"shortcuts\" (virtual highways) between areas. At query time, the algorithm jumps across these shortcuts rather than traversing every alley → Query time drops to **microseconds**. The downside is that if a road closes, you must rebuild the entire map.\n 2. **Multi-Level Dijkstra (MLD) / Customizable Route Planning (CRP):** Divides the map into small cells. Updating the cost of a road only requires updating that specific cell (taking seconds) rather than the whole map. Great for real-time traffic injection.\n\n\n\n### The Open-Source Giants: OSRM and GraphHopper Distance Matrix\n\nIn custom logistics systems, the two most famous open-source engines are **OSRM** (C++) and **GraphHopper** (Java).\n\nFeature | OSRM (Open Source Routing Machine) | GraphHopper\n---|---|---\n**Language** | C++ | Java\n**Main Algorithms** | MLD and CH | CH, A*, Landmark\n**Routing Profiles** | Written in Lua (driving, cycling, walking) | Written in Java / YAML\n**Matrix API** | Unbelievably fast (C++ optimized) | Good, but OSRM edges out on massive matrices\n**Flexibility** | Quite rigid, hard to change rules at runtime | Highly flexible (Custom Models) allows runtime rule changes\n**Community** | Backed by Mapbox, widely used for static routing | Easy to embed in Java apps, very flexible\n\n**Routing Profiles:**\nThe route for a motorcycle navigating tight alleys is completely different from a 10-ton truck (which is banned from small roads). Both OSRM and GraphHopper allow you to define **Profiles**. In OSRM, you write Lua files: `car.lua`, `truck.lua`, `bike.lua` to tell the engine which roads are legal.\n\n### OSRM Table API: Lightning Fast Matrices\n\nAssuming you self-host an OSRM server (very easy via Docker), it provides a `table` API that generates distance matrices optimally:\n\n\n\n # Request to local OSRM to calculate a 3x3 matrix\n curl \"http://localhost:5000/table/v1/driving/106.70,10.77;106.71,10.78;106.72,10.79?annotations=distance,duration\"\n\n # Returns both durations (seconds) and distances (meters)\n {\n \"durations\": [\n [0, 150, 320],\n [155, 0, 180],\n [330, 175, 0]\n ],\n \"distances\": [\n [0, 1200, 2400],\n [1250, 0, 1100],\n [2500, 1150, 0]\n ]\n }\n\n\n_Note: OSRM Table API can return a 100x100 matrix (10,000 elements) in just a few dozen milliseconds!_\n\n**Pros of self-hosting:** Free, insanely fast, zero rate limits.\n**Cons:** RAM heavy (especially if loading an entire country's map).\n\n## How to Calculate a Distance Matrix with GraphHopper\n\n**GraphHopper** is the most developer-friendly open-source routing engine for building a distance matrix in Java or via a self-hosted HTTP API. Unlike OSRM, GraphHopper supports runtime routing rule changes via **Custom Models** — making it the preferred choice when your delivery fleet has vehicle-specific constraints (weight limits, road class restrictions).\n\n### Option A: GraphHopper Matrix API (Self-hosted)\n\nSelf-host the GraphHopper server with Docker and call the `/matrix` endpoint:\n\n\n\n # Start GraphHopper server with a local OSM map file\n docker run -d -p 8989:8989 \\\n -v $(pwd)/data:/data \\\n israelhikingmap/graphhopper \\\n --url https://download.geofabrik.de/asia/vietnam-latest.osm.pbf \\\n --host 0.0.0.0\n\n\n\n import requests\n import json\n\n def build_graphhopper_distance_matrix(locations: list[dict]) -> dict:\n \"\"\"\n Calculate a full GraphHopper distance matrix for a list of lat/lng points.\n Returns duration (seconds) and distance (meters) for every pair.\n Args:\n locations: List of dicts with 'lat' and 'lng' keys.\n Returns:\n dict with 'durations' and 'distances' 2D arrays.\n \"\"\"\n url = \"http://localhost:8989/matrix\"\n\n # GraphHopper Matrix API requires [lng, lat] order (GeoJSON convention)\n points = [[loc[\"lng\"], loc[\"lat\"]] for loc in locations]\n\n payload = {\n \"points\": points,\n \"profile\": \"car\", # Routing profile: car, bike, foot\n \"out_arrays\": [\"times\", \"distances\"], # Request both duration and distance\n \"fail_fast\": False # Return partial results if a pair is unreachable\n }\n\n response = requests.post(url, json=payload, timeout=30)\n response.raise_for_status()\n data = response.json()\n\n return {\n \"durations\": data[\"times\"], # N×N matrix in seconds\n \"distances\": data[\"distances\"] # N×N matrix in meters\n }\n\n # Example: 3 warehouses / delivery points in Ho Chi Minh City\n locations = [\n {\"lat\": 10.7712, \"lng\": 106.7011}, # Warehouse\n {\"lat\": 10.7780, \"lng\": 106.7100}, # Customer A\n {\"lat\": 10.7650, \"lng\": 106.6980}, # Customer B\n ]\n\n matrix = build_graphhopper_distance_matrix(locations)\n print(f\"Duration matrix (seconds): {matrix['durations']}\")\n print(f\"Distance matrix (meters): {matrix['distances']}\")\n # Output:\n # Duration matrix (seconds): [[0, 320, 185], [315, 0, 410], [180, 405, 0]]\n # Distance matrix (meters): [[0, 2100, 1350], [2050, 0, 2900], [1300, 2880, 0]]\n\n\n### Option B: GraphHopper Java SDK (Embedded)\n\nFor Java-based logistics backends, embed GraphHopper directly without a network hop:\n\n\n\n // Embedded GraphHopper distance matrix calculation\n // Purpose: Build an NxN distance matrix without requiring a running HTTP server\n import com.graphhopper.GraphHopper;\n import com.graphhopper.config.CHProfile;\n import com.graphhopper.config.Profile;\n import com.graphhopper.routing.util.EncodingManager;\n\n GraphHopper hopper = new GraphHopper();\n hopper.setOSMFile(\"/data/vietnam-latest.osm.pbf\");\n hopper.setGraphHopperLocation(\"/data/graph-cache\");\n hopper.setProfiles(new Profile(\"car\").setVehicle(\"car\").setWeighting(\"fastest\"));\n hopper.getCHPreparationHandler().setCHProfiles(new CHProfile(\"car\"));\n hopper.importOrLoad();\n\n // Use GHMatrixAPI to compute full NxN matrix\n // See: https://github.com/graphhopper/graphhopper/tree/master/web-api\n\n\n### GraphHopper vs. OSRM: Which to Choose for Distance Matrix?\n\nCriterion | GraphHopper Distance Matrix | OSRM Table API\n---|---|---\n**Speed (large matrices)** | Fast (CH/MLD) | Slightly faster (C++ optimized)\n**Custom vehicle rules** | ✅ Runtime Custom Models | ❌ Requires recompile + Lua\n**Java ecosystem** | ✅ Native SDK | ❌ HTTP only\n**Docker ease** | ✅ Official image | ✅ Official image\n**OSM data format** | `.osm.pbf` | `.osm.pbf`\n**Best for** | Flexible profiles, Java backends | Max throughput, static profiles\n\n**Rule of thumb:** Use **OSRM** if you have a fixed vehicle type and need maximum raw speed. Use **GraphHopper** if you need to change routing rules at runtime (truck weight limits, toll avoidance, time-dependent costs).\n\n## 3. Expensive Overkill: Commercial APIs (Google Maps / Mapbox)\n\nIf you are building a Ride-Hailing app that requires minute-perfect Estimated Time of Arrival (ETA) so customers don't cancel, you need real-time traffic data and historical rush hour patterns. In that case, you must use commercial APIs (like Google Maps).\n\n**However, for static delivery routing from a fixed warehouse, this is entirely overkill and extremely wasteful.**\n\n\n\n # Calling Google Maps Distance Matrix API (Very Expensive)\n import requests\n\n url = \"https://maps.googleapis.com/maps/api/distancematrix/json\"\n params = {\n \"origins\": \"10.77,106.70|10.78,106.71\",\n \"destinations\": \"10.77,106.70|10.78,106.71\",\n \"departure_time\": \"now\", # Current traffic\n \"key\": \"YOUR_API_KEY\"\n }\n\n\n### The Exorbitant Cost\n\nGoogle Maps charges about **$0.005** per Element (Pair).\nReturning to our 1 warehouse + 100 orders = 10,201 pairs.\nEvery time you run the allocation algorithm, you pay: `10,201 × $0.005 = $51`.\nIf your warehouse dispatches 10 batches a day, you lose **$510/day** just to calculate distances!\n\nThis is exactly why domestic delivery companies self-host **OSRM** or **GraphHopper** instead of throwing money at Google Maps for warehouse routing. The OpenStreetMap network is more than adequate for this need.\n\n## 4. System Design Strategies to Save Compute\n\nTo maintain high accuracy without overloading servers, large systems use these tricks:\n\n### Technique 1: Clustering\n\nIf 5 orders are going to the same apartment building, cluster them into a single coordinate. A 101x101 matrix drops to 97x97.\n\n### Technique 2: Hybrid Matrix (Haversine + OSRM)\n\nOnly calculate precise road distances for points that could realistically follow one another.\n\n * If Point A and Point B are > 15km apart (via Haversine), assign an infinite cost. The VRP solver will naturally never route a vehicle from A to B.\n * Only call OSRM for nearby pairs (< 5km).\n\n\n\n### Technique 3: H3 Hexagon Caching (How Uber/Grab does it)\n\nRecalculating the distance between two nearby alleys day after day is a waste of resources. Logistics giants use Uber's **H3 (Hexagonal Hierarchical Spatial Index)** to cache distance results.\n\n#### Why Hexagons and not Geohash (Squares)?\n\nIn a square grid, the distance from the center to the 4 straight edges differs from the distance to the 4 corners. In a hexagon grid, the distance from the center to all neighboring cells is **exactly equal**. This is critical in routing because distance error margins are uniformly controlled in all directions.\n\n#### How it works:\n\n**1. Pick a Resolution:**\n**H3 Resolution 9** is typically used (each hexagon is about 174 meters wide). This perfectly encapsulates a small residential block or a street segment.\n\n**2. Convert Coordinates (Lat/Lng) to H3 Index:**\n\n\n\n import h3\n\n # Convert Point A and Point B\n h3_A = h3.geo_to_h3(10.7712, 106.7011, 9) # Result: '8965a6a0033ffff'\n h3_B = h3.geo_to_h3(10.7780, 106.7100, 9) # Result: '8965a6a0027ffff'\n\n\n**3. Check Cache (Redis) before invoking the Engine:**\n\n\n\n redis_key = f\"route_cost:{h3_A}:{h3_B}\"\n\n cache_result = redis.get(redis_key)\n if cache_result:\n return json.loads(cache_result) # Cache Hit! Zero computation.\n else:\n # Cache Miss! Call OSRM\n result = call_osrm_api(lat1, lon1, lat2, lon2)\n\n # Save to Redis for next time (TTL = 30 days)\n redis.set(redis_key, json.dumps({\n \"distance_m\": result.distance,\n \"duration_s\": result.duration\n }), ex=2592000)\n\n return result\n\n\n#### Pre-warming the Cache\n\nInstead of waiting for a customer to order, the system can run a batch job at night:\n\n 1. Fetch all H3 cells that contain residential areas in the city.\n 2. Use OSRM (since it's free and local) to pre-calculate the distance between all H3 pairs (within 10km of each other).\n 3. Pump this data into Redis.\n\n\n\nWhen the OR-Tools system runs the next day, it reads purely from Redis RAM at the speed of light without needing to traverse any graphs. Because road networks rarely change (only when there's long-term construction), the Cache Hit ratio can exceed **95%** , giving you the speed of Haversine but the accuracy of a Routing Engine!\n\n> _Summary of the Series: Order Fulfillment is not just a CRUD application tracking statuses. It is a symphony of event-driven microservices, real-time inventory algorithms, OR-Tools optimization, and ultra-fast Routing Engines processing spatial data._\n\n## References & Further Reading\n\n * Uber Engineering: H3 Hexagonal Hierarchical Spatial Index\n * Google Maps Platform Pricing: Distance Matrix API\n * Open Source Routing Machine (OSRM)\n * GraphHopper Routing Engine Documentation\n * GraphHopper Matrix API Reference\n * OpenStreetMap Vietnam Data (Geofabrik)\n\n\n\n🔗 **Next Step:** Now that you understand how to build an accurate distance matrix with GraphHopper, see how this feeds into Part 6 — Building a Mini Allocation Engine and the complete Series Executive Summary.\n\n_This post was originally published on my blog at GraphHopper Distance Matrix: Self-Host OSRM vs Haversine for Route Optimization._\n\n**Hi, I'm Lê Tuấn Anh (vesviet) 👋**\n_I am a Senior Go Backend Architect & Distributed Systems Engineer with 17+ years of experience building high-traffic platforms (25M+ requests/month)._\n_If you enjoyed this deep-dive, let's connect on LinkedIn or explore my consulting services at tanhdev.com/hire._",
"title": "[System Design] GraphHopper Distance Matrix: Self-Host OSRM vs Haversine for Route Optimization"
}