Controlling the Ctrl-C
The REPL UI issue
As a local AI assistant, Tiles embeds the Pi agent as its agent harness. Since the REPL is written in Rust and Pi in TypeScript, we embed the Pi Bun binary and drive it through Pi's RPC mode: we spawn a headless Pi binary as a child process and communicate with it over the standard streams (stdin/stdout/stderr) in JSON format. User input to the model and the streamed output from the model both pass through Pi, which sits between the REPL and the inference system.
As with any LLM inference interface, the Tiles REPL must stop streaming the model's output as soon as the user presses Ctrl-C and return to the prompt state, ideally like this.
In versions before v0.4.11, however, although the streaming ended and the REPL returned to the prompt state, the next user input was greeted with a broken pipe error like Err value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }. Because of this, users had to restart the REPL to continue. More details are in this issue.
The broken pipes
As mentioned, we spawn the Pi process as a child process and communicate through its stdin and stdout. For this reason the streams are connected via pipes rather than the child inheriting the parent's streams (which is the default behavior). If the child inherited the parent's streams, communication would be messy and processes would have to filter what they need from the shared stream, which would introduce race conditions.
When piped, the parent can write to the child's stdin; the child writes to its own stdout, which the parent can then read, giving a clear separation of concerns. This is how we explicitly set the streams to be piped in Rust.
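A minimal sketch of that setup, using only the standard library. The helper names spawn_piped and round_trip are hypothetical, and the program being spawned is left as a parameter rather than assuming how Tiles locates the Pi binary.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::process::{Child, Command, Stdio};

/// Spawn `program` with all three standard streams piped to the parent
/// instead of inherited from it.
fn spawn_piped(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
}

/// Write one line to the child's stdin and read one line back from its
/// stdout. A long-lived session would keep a single BufReader around,
/// because a fresh one per call can discard bytes it buffered past the
/// first newline.
fn round_trip(child: &mut Child, line: &str) -> io::Result<String> {
    let stdin = child.stdin.as_mut().expect("stdin was piped");
    writeln!(stdin, "{line}")?;
    stdin.flush()?;

    let stdout = child.stdout.as_mut().expect("stdout was piped");
    let mut reply = String::new();
    BufReader::new(stdout).read_line(&mut reply)?;
    Ok(reply.trim_end().to_string())
}
```

With a child such as cat, which echoes its input, round_trip returns the same line that was written, which is a quick way to check that both pipes are wired up.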
So when we press Ctrl-C, the terminal sends SIGINT (signal interrupt) to every process in the foreground process group, which includes the child processes, and if they don't handle it then the default behavior is to exit. To monitor this, we can use the following commands on Unix systems.
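The same three IDs can also be read from inside a Rust program by shelling out to ps. This is only a sketch: ps_ids is a hypothetical helper, and it assumes a ps that accepts the POSIX -o and -p options.

```rust
use std::io;
use std::process::Command;

/// Ask `ps` for the PID, parent PID, and process group ID of `pid`.
/// A trailing `=` after a column name suppresses that column's header,
/// so the output is a single line of three numbers.
fn ps_ids(pid: u32) -> io::Result<(u32, u32, u32)> {
    let pid_arg = pid.to_string();
    let out = Command::new("ps")
        .args(["-o", "pid=", "-o", "ppid=", "-o", "pgid=", "-p", pid_arg.as_str()])
        .output()?;
    let text = String::from_utf8_lossy(&out.stdout);
    let mut fields = text.split_whitespace().map(|f| f.parse::<u32>());
    match (fields.next(), fields.next(), fields.next()) {
        (Some(Ok(pid)), Some(Ok(ppid)), Some(Ok(pgid))) => Ok((pid, ppid, pgid)),
        _ => Err(io::Error::new(
            io::ErrorKind::Other,
            format!("unexpected ps output: {text:?}"),
        )),
    }
}
```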
Here the processId is 7011, the parent's processId is 65985 (system root is parent), and the process's groupId is 7011. To inspect the children, we can use:
We can see the Pi process is running as a child of Tiles with PID 88099.
We can use lsof to further monitor their relationship as follows:
Here we can see Pi's stdin (FD=0) is connected to Tiles stdout (FD=16) and vice versa.
So when we press Ctrl-C to stop a streaming response, the signal propagates to the Pi process and Pi exits. On the next user prompt, the REPL, unaware that the process no longer exists, still tries to write to Pi's stdin. Nothing is left to read from the other end of that pipe, so the write fails with a broken pipe, aka a broken connection.
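The failure can be reproduced in a few lines. In this sketch cat stands in for Pi and kill() (SIGKILL) stands in for the unhandled SIGINT; the function name is hypothetical. The write returns an error rather than killing the parent because the Rust runtime ignores SIGPIPE in its own process by default.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Spawn a child, let it die, then try to write to its stdin anyway.
/// Returns the error produced by the write, if any.
fn write_to_dead_child() -> io::Result<Option<io::Error>> {
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::null())
        .spawn()?;
    let mut stdin = child.stdin.take().expect("stdin was piped");

    child.kill()?; // stand-in for the child exiting on SIGINT
    child.wait()?; // reap it; the read end of the pipe is now closed

    // The parent still holds the write end, but nobody is reading.
    Ok(stdin.write_all(b"next prompt\n").err())
}
```

On Linux and macOS the returned error has kind BrokenPipe and OS error code 32 (EPIPE), matching the message quoted at the start of the post.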
Letting them go
Since by default a child process is in the same process group (pgid) as the parent, the SIGINT event is delivered to all the processes in the group. One way around this is to move the child out of the parent's process group. They still have a parent-child relationship, but signals aimed at the parent's group no longer reach the child. On Unix we can use setsid for this, which creates a new session and makes the calling process the leader of it (and of a new process group).
But doing that in Rust is an unsafe operation (where we lose safety assurance from the compiler): the standard library's setsid support is only available in the nightly version as of now, so we need to call it via C FFI. So setsid is achieved via other libraries such as libc, nix, etc., of course colored by the unsafe keyword.
This resolves our broken pipe issue, because SIGINT never reaches the Pi process. But now we have another issue on hand: the model streaming can no longer be stopped via Ctrl-C, and the REPL is completely unresponsive for the entire time we are streaming the output from Pi's stdout. It no longer respects SIGINT.
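For comparison, here is a std-only sketch of a related but weaker technique, not the setsid call described above: CommandExt::process_group(0), stable since Rust 1.64, puts the child in a new process group whose ID equals its own PID without any unsafe code. Unlike setsid it does not create a new session, so the child keeps the same controlling terminal. The helper name is hypothetical.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command, Stdio};

/// Spawn `program` with piped stdin/stdout in its own process group, so
/// a SIGINT sent to the parent's foreground group is not delivered to it.
fn spawn_in_own_group(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .process_group(0) // 0 means: use the child's own PID as its group ID
        .spawn()
}
```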
Controlling the Control-C
Thanks to a serendipitous moment while surfing the repo of rustyline, the library we use for building a nice UX for our REPL, we found that the version we were using (v17) had a bug which masks the SIGINT event. So we upgraded to the latest version. Now we do receive SIGINT while the model is streaming, but it exits the program altogether instead of stopping the stream and returning to the user prompt.
This is in fact expected, as normally programs should handle SIGINT themselves if they want to do cleanups, graceful shutdowns, etc. So we use the ctrlc library to handle SIGINT, which uses a dedicated thread for handling the event.
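A common shape for this kind of handler is a shared flag that the handler sets and the streaming loop checks. The sketch below shows only that pattern, with hypothetical names; with the ctrlc crate, the closure returned by make_handler is what would be passed to ctrlc::set_handler. It is not the actual Tiles code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Build the closure a SIGINT handler would run: it only records that
/// an interrupt happened and leaves the reaction to the main loop.
fn make_handler(flag: Arc<AtomicBool>) -> impl FnMut() + Send + 'static {
    move || flag.store(true, Ordering::SeqCst)
}

/// Print chunks until the source ends or an interrupt has been recorded.
/// The flag is cleared when it is observed, so the next prompt starts
/// fresh. Returns how many chunks were printed.
fn stream_until_interrupted<I>(chunks: I, interrupted: &AtomicBool) -> usize
where
    I: IntoIterator<Item = String>,
{
    let mut emitted = 0;
    for chunk in chunks {
        if interrupted.swap(false, Ordering::SeqCst) {
            break;
        }
        print!("{chunk}");
        emitted += 1;
    }
    emitted
}
```

The check only runs between chunks, so this pattern helps only if fetching the next chunk returns promptly.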
But again, for some reason we are back to square one where the REPL is non-responsive to Ctrl-C when it's streaming the output.
Turns out the way we read from Pi's stdout is a synchronous, blocking operation. So we tried converting all the functions related to this to async using the corresponding async functions provided by tokio (an async runtime library for Rust). For example, the core operation here is using a buffered reader to read from Pi's stdout efficiently, so we replace the BufReader from the std library with the async BufReader provided by the Tokio runtime.
Tokio uses cooperative scheduling to switch between its tasks: a task hands control back to the runtime at each await point. So an async read that has no data ready yields instead of blocking the thread until Pi writes something.
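To illustrate why the blocking read was the problem, here is a std-only sketch of a different technique from the Tokio refactor described above: the blocking read moves to its own thread, and the main loop polls a channel with a timeout so it can notice an interrupt flag between lines. All names are hypothetical.

```rust
use std::io::{BufRead, BufReader, Read};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// How one streaming turn ended.
#[derive(Debug, PartialEq)]
enum StreamEnd {
    Finished,
    Interrupted,
}

/// Read lines from `source` on a dedicated thread and forward them over
/// a channel, so the caller never blocks on the read itself.
fn spawn_line_reader<R: Read + Send + 'static>(source: R) -> Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for line in BufReader::new(source).lines().map_while(Result::ok) {
            if tx.send(line).is_err() {
                break; // receiver dropped, nobody is listening any more
            }
        }
    });
    rx
}

/// Hand lines to `on_line` until the reader finishes or the interrupt
/// flag is observed. The flag is checked at least every 50 ms.
fn drain_until_interrupted(
    rx: &Receiver<String>,
    interrupted: &AtomicBool,
    mut on_line: impl FnMut(String),
) -> StreamEnd {
    loop {
        if interrupted.swap(false, Ordering::SeqCst) {
            return StreamEnd::Interrupted;
        }
        match rx.recv_timeout(Duration::from_millis(50)) {
            Ok(line) => on_line(line),
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => return StreamEnd::Finished,
        }
    }
}
```

An async reader gets the same responsiveness without the extra thread or the polling interval, which is one reason to prefer the Tokio route.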
Once we refactored the codebase to be async, we started getting SIGINT events in the handler we set using the ctrlc library before, and the program no longer exits either. Now all we have to do is abort the Pi session by sending an abort event to Pi and do the cleanup from our side.
For more details on the sync-async conversion, see the PR diff.
Unexpected entry of SIGPIPE
The interesting thing now is that when we exit the main REPL program, the Pi process also exits, which shouldn't be the case as both are now in different process groups, right? Could this be related to the pipes getting closed on one end?
This is fine for us, as we don't want the Pi process to become a background daemon and go rogue. Still, it's important to understand what's happening under the hood, because we also have a Tiles daemon process (a background headless Tiles HTTP server) that was spawned in a different process group in the same way and stays alive after the main REPL program closes, as it's supposed to.
PID=88098 is our daemon.
Why the dual behavior for the same action? To find out, we can live-debug the Pi program using lldb (the LLVM debugger) and see what happens when the parent exits. We attach lldb to the Pi process, add a breakpoint for SIGPIPE, then step through to see whether Pi handles SIGPIPE or not. The actions we take are annotated with numbered comments.
As seen in the lldb logs, the program exits as soon as it receives SIGPIPE. This indicates that Pi has no handler for SIGPIPE, so the default action applies, which is to terminate the process.
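The default SIGPIPE behavior can be observed without a debugger. In this sketch the yes utility stands in for a child that keeps writing and installs no SIGPIPE handler. It says nothing about Pi's internals, and the function name is hypothetical.

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, Stdio};

/// Spawn `yes` with a piped stdout, close the parent's read end, and
/// report which signal (if any) terminated the child.
fn signal_after_reader_closes() -> io::Result<Option<i32>> {
    let mut child = Command::new("yes").stdout(Stdio::piped()).spawn()?;

    // Dropping the handle closes the read end, much like the parent
    // exiting would. The child's next write then raises SIGPIPE.
    drop(child.stdout.take());

    Ok(child.wait()?.signal())
}
```

On Linux and macOS this reports signal 13, which is SIGPIPE: the child was terminated by the signal's default action rather than exiting on its own.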
Conclusion
Debugging a seemingly trivial terminal UI issue led us into a rabbit hole of standard streams, pipes, Unix processes, and their dynamic behavior on system signals with respect to their parent, and finally to the problems caused by blocking I/O in a UI and how async Rust can fix it.