{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/python/unix-style-pipeline-with-subprocess/",
"description": "Build Unix-style command pipelines in Python using subprocess.run with stdout piping for efficient process chaining and output capture.",
"path": "/python/unix-style-pipeline-with-subprocess/",
"publishedAt": "2023-07-14T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Python",
"TIL",
"Shell",
"Unix"
],
"textContent": "Python offers a ton of ways like os.system or os.spawn* to create new processes and run\narbitrary commands in your system. However, the documentation usually encourages you to use\nthe [subprocess] module for creating and managing child processes. The subprocess module\nexposes a high-level run() function that provides a simple interface for running a\nsubprocess and waiting for it to complete. It accepts the command to run as a list of\nstrings, starts the subprocess, waits for it to finish, and then returns a\nCompletedProcess object with information about the result. For example:\n\nThis prints:\n\nThis works great when you're carrying out simple and synchronous workflows, but it doesn't\noffer enough flexibility when you need to fork multiple processes and want the processes to\nrun in parallel. I was working on a project where I wanted to glue a bunch of programs\ntogether with Python and needed a way to run composite shell commands with pipes, e.g.\necho 'foo\\nbar' | grep 'foo'. So I got curious to see how I could emulate that in Python.\n\nTurns out you can do that easily with subprocess.Popen. This function allows for more\ncontrol over the subprocess. It starts the process and returns a Popen object immediately,\nwithout waiting for the command to complete. This allows you to continue executing code\nwhile the subprocess runs in parallel. Popen has methods like poll() to check if the\nprocess has finished, wait() to wait for completion, and communicate() for interacting\nwith stdin/stdout/stderr. For example:\n\nThe above example shows how you can fire off subprocess tasks to run in parallel, let them\nchug along in the background, do other stuff, and then collect the results at the end when\nyou need them. The goal here is to ping a couple of IP addresses in parallel using the\nsubprocess module. First, it creates an empty list to store the processes. 
Then it loops\nthrough the IPs, printing a message and kicking off a ping for each one using Popen() so\nthey run asynchronously in the background. The Popen objects get appended to the procs\nlist.\n\nAfter starting the pings, it simulates doing other work by sleeping for a second. Then it\nloops through the processes again, waits for each one to finish with communicate(), and\nprints out the process ID and return code for each ping. Running the script will give you\nthe following result (truncated for brevity):\n\nNow that we can run processes asynchronously and gather results, I'll demonstrate how I\nemulated a composite UNIX command using that technique.\n\nEmulating UNIX pipes\n\nSay you want to emulate the following shell command:\n\nI'm running macOS, so this returns:\n\nThe ps -ef command outputs a full list of running processes, then the pipe symbol sends\nthat output as input to the head -5 command, which reads the first 5 lines from that input\nand prints just those, essentially slicing off the top 5 processes. We can emulate this in\nPython as follows:\n\nThis snippet uses subprocess.Popen to run shell commands and pipe the outputs between\nthem. First, ps_cmd executes ps -ef and sends the full output to the subprocess.PIPE\nbuffer. Next, head_cmd runs head -n 5. The stdin of head_cmd is set to the stdout of\nps_cmd. This pipes the stdout from ps_cmd as input to head_cmd. Finally,\nhead_cmd.communicate() reads the output of head_cmd and waits for it to exit. Both\nprocesses were already started by their Popen calls, so communicate() only collects the\nresult. The final output of this snippet is the same as the ps -ef | head -5 command.\n\nHere's another example where we'll emulate the sha256sum < <(echo 'foo') command. On the\nleft side, sha256sum computes the SHA-256 cryptographic hash of an input. The construct\n<(echo 'foo') creates a temporary file descriptor containing the output 'foo' from echo,\nwhich is then redirected via < as standard input to sha256sum. 
Together this computes\nand prints the SHA-256 hash of the input string without needing an actual file. In this\nparticular case, we want to compute the hash of 3 different inputs in parallel by spawning\nthree separate processes.\n\nRunning this snippet will display the 3 hashes:\n\nFirst, we define a function called calculate_hash that accepts a bytes plaintext input and\nreturns a subprocess.Popen object. This function will spawn a new child process running\nthe sha256sum command. The stdin and stdout of the child process are configured as\nsubprocess.PIPE using the Popen constructor. This enables data to be piped between the\nparent and child processes. Inside calculate_hash, the plaintext input is written to the\nstdin pipe of the child process using proc.stdin.write(). This pipes the data into the\nchild's standard input stream. Next, proc.stdin.flush() is called to ensure the\nchild process actually receives the input.\n\nThe main logic begins by initializing an empty list called procs. Then a loop runs 3\ntimes, each time generating a random 10-byte string using os.urandom. This string is\npassed to calculate_hash, which spawns a new sha256sum child process, pipes the random\ndata to it, and returns the Popen object representing the child. Each Popen is appended\nto the procs list, so now there are 3 child processes running in parallel.\n\nFinally, the procs list is iterated through and proc.communicate() is called on each\nPopen instance to read back the stdout pipe from the child. This contains the output of\nsha256sum, which is the hash of the random input string. The hash is then decoded,\nstripped, and printed to the console.\n\nFurther reading\n\n- [Effective Python - Item 52]\n\n\n\n\n[subprocess]:\n https://docs.python.org/3/library/subprocess\n\n[effective python - item 52]:\n https://effectivepython.com/",
"title": "Unix-style pipelining with Python's subprocess module"
}