{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/go/topological-sort/",
"description": "Implement topological sorting in Go to order tasks by dependencies. Process directed acyclic graphs for build systems and scheduling.",
"path": "/go/topological-sort/",
"publishedAt": "2024-10-13T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Go",
"Data Structures"
],
"textContent": "I was fiddling with graphlib in the Python stdlib and found it quite nifty. It processes a\nDirected Acyclic Graph (DAG), where tasks (nodes) are connected by directed edges\n(dependencies), and returns the correct execution order. The \"acyclic\" part ensures no\ncircular dependencies.\n\nTopological sorting is useful for arranging tasks so that each one follows its dependencies.\nIt's widely used in scheduling, build systems, dependency resolution, and database\nmigrations.\n\nFor example, consider these tasks:\n\n- Task A must be completed before Tasks B and C.\n- Tasks B and C must be completed before Task D.\n\nThis can be represented as:\n\n\n\n{{< mermaid >}}\ngraph TD\n A --> B\n A --> C\n B --> D\n C --> D\n{{</ mermaid >}}\n\n\n\nHere, A can start right away, B and C follow after A, and D is last,\ndepending on both B and C.\n\nThe task order can be determined as:\n\n1. A\n2. B and C (in parallel, since both depend only on A)\n3. D (which depends on both B and C)\n\nThis method ensures tasks are executed in the right sequence while respecting all\ndependencies.\n\nTo resolve the above-mentioned case with graphlib, you'd do the following:\n\nRunning this will print the following:\n\nSince Python's stdlib already has graphlib, I thought I'd write a sloppy one in Go to\nlearn the mechanics of how it works.\n\nWriting a topological sorter in Go\n\nThe API will be similar to what we've seen in the graphlib example.\n\nDefining the graph structure\n\nFirst, we need a graph structure to hold the tasks and their dependencies. We'll use an\nadjacency list to represent the graph, and a map to track the in-degree of each node (how\nmany tasks it depends on).\n\nHere:\n\n- vertices: a list of tasks that each node points to (i.e., its dependents).\n- inDegree: how many tasks must finish before each task can be processed.\n- queue: tasks that can be processed because they have no unmet dependencies.\n- active: how many tasks are currently ready for processing.\n\nAdding dependencies\n\nNext, we'll define how one task depends on another. The AddEdge function sets up this\nrelationship, ensuring the source task knows it must finish before the destination\ntask can proceed.\n\n- The destination task is added to the list of tasks that the source task points to,\n marking the dependency.\n- The in-degree of the destination task is increased by 1 because it depends on the\n source task.\n- If the source task is new, we initialize its in-degree to 0.\n\nInitializing and processing tasks in batches\n\nNow we'll initialize the graph by identifying tasks that can be processed immediately -\nthose with an in-degree of 0 (i.e., they have no dependencies). We then process tasks batch\nby batch.\n\n- This function finds tasks with an in-degree of 0 (no dependencies) and adds them to the\n processing queue.\n- The active count keeps track of how many tasks are ready to run.\n\nProcessing each batch of tasks\n\nWe use GetReady to retrieve the next batch of tasks that are ready for processing. These\nare tasks with no unmet dependencies.\n\n- GetReady pulls the current batch of tasks from the queue and clears it for the next\n batch.\n- Tasks are returned in the order they are ready to be processed.\n\nMarking the processed tasks as done\n\nOnce a batch of tasks is completed, we mark them as done and reduce the in-degree of any\ntasks that depend on them.\n\n- For each completed task, we reduce the in-degree of any dependent tasks.\n- If a dependent task's in-degree reaches 0, it's added to the queue and is now ready to be\n processed in the next batch.\n\nRunning the full topological sort\n\nFinally, we'll implement the TopologicalSortBatch function, which processes all tasks in\nbatches until none are left.\n\n- Prepare loads the first set of tasks that can be processed.\n- IsActive checks if there are any tasks left to process.\n- GetReady retrieves the next batch of tasks to process.\n- Done marks tasks as finished, allowing dependent tasks to be processed next.\n\nUsing the sorter\n\nYou can use the API as follows:\n\nThis will return:\n\nHere, A needs to run first. B and C can run in parallel after A finishes, and only then can\nD run.\n\nComplete example\n\nHere's the full implementation, heavily annotated for clarity:\n\nYou can use this to make [custom task orchestrators].\n\nFin!\n\n\n\n\n\n[custom task orchestrators]:\n https://philuvarov.io/python-top-sort/",
"title": "Topological sort"
}