Observe Process while it is running

mpilman · March 26, 2023, 3:08pm

Hi all,

I am a relatively new Swift user (I am more experienced in C++ but giving this a go).

I am working on a test framework that is supposed to do the following:

Setup a empty PostgreSQL instance
Connect to it and, populate data, and run tests
Destroy the database

Now for reasons not related to Swift (the TL;DR is that postgres can't bind to port 0), I need to start postgres in a loop and check the output for errors and success messages in order to find a free port (I don't want to assume that the system has port 5432 or any other port available). This loop looks as follows:

        var _: Process? // to keep the reference count up
        try await withCheckedThrowingContinuation {
            continuation in
            do {
                _ = try Process.run(URL(filePath: "\(initDB)"),
                                    arguments: ["-D", "\(dataDir)"],
                                    terminationHandler: { process in
                                        assert(!process.isRunning)
                                        assert(process.terminationStatus == 0)
                                        continuation.resume()
                                    })
            } catch {
                continuation.resume(throwing: error)
            }
        }
        // find a network address we can use on the loopback device
        server = Process()
        var stdoutPipe = Pipe()
        var stderrPipe = Pipe()
        var port = 5432
        let startServer = {
            self.server.executableURL = URL(filePath: "\(postgres)")
            self.server.arguments = ["-D", "\(dataDir)", "-p", "\(port)"]
            self.server.standardOutput = stdoutPipe
            self.server.standardError = stderrPipe
            try self.server.run()
        }
        // we need to try out ports as multiple tests could run in parallel
        while true {
            try startServer()
            if try await PostgresTestInstance.checkLaunchOutput(stdout: stdoutPipe, stderr: stderrPipe) {
                break
            }
            server.terminate()
            server = Process()
            stdoutPipe = Pipe()
            stderrPipe = Pipe()
            port += 1
        }

The way I am attempting to check for errors is by using this async function:

    private static func checkLaunchErrOrSuccess(handle: FileHandle, prefix: String) async throws -> Bool {
        for try await line in handle.bytes.lines {
            print("\(prefix): \(line)")
            if line.ends(with: "FATAL:  could not create any TCP/IP sockets") {
                fflush(stdout)
                return false
            } else if line.ends(with: "LOG:  database system is ready to accept connections") {
                fflush(stdout)
                return true
            }
        }
        fflush(stdout)
        throw TestInstanceError.postgresError
    }

    private static func checkLaunchOutput(stdout: Pipe, stderr: Pipe) async throws -> Bool {
        return try await withThrowingTaskGroup(of: Bool.self) { group in
            group.addTask {
                try await checkLaunchErrOrSuccess(handle: stdout.fileHandleForReading, prefix: "Postgres (out)")
            }
            group.addTask {
                try await checkLaunchErrOrSuccess(handle: stderr.fileHandleForReading, prefix: "Postgres (err)")
            }
            let first = try await group.next()
            group.cancelAll()
            return first!
        }
    }

Now this doesn't work. It seems there's more than one bug in here (and the test behaves significantly different in XCode than if I run swift test on the command line). But my biggest problem, and the one I am observing consistently, is this:

It seems that the pipe only becomes readable AFTER the process terminates. This obviously makes all of the above useless, but it also makes it very hard to debug anything. Is this expected? And if yes, what would be the correct way of running a child process and asynchronously read its output?

eskimo · March 27, 2023, 8:07am

It seems that the pipe only becomes readable AFTER the process
terminates.

This is a pretty common phenomenon when dealing with pipes. If the child process does client-side buffering, data won’t show up on the parent side of the pipe until the client flushes its buffer.

As the parent, you don’t have direct control over the child’s buffering policy. Some programs have a command-line option to change their behaviour in this regard, although that’s pretty rare. Most programs use the default buffering policy described in the setvbuf man page. If that’s the case then you can implicitly switch the policy by using pseudo-terminal rather than a pipe. See the pty man page for more. It’s not much fun.

As a first step I recommend that you test the program from Terminal using cat. Imagine a tool that prints data to stdout periodically:

% cat main.swift 
import Foundation

func main() {
    while true {
        print(Date.now)
        sleep(1)
    }
}
main()
% swiftc main.swift

If you run it directly from your shell, stdout is connected to your terminal and you see lines of output each second:

% ./main 
2023-03-27 07:55:38 +0000
2023-03-27 07:55:39 +0000
2023-03-27 07:55:40 +0000
^C

Now pipe the output to cat:

% ./main | cat
^C

Here stdout is connect to a pipe and the tool now buffers. If you leave it long enough, you’ll eventually see one giant blob of output.

So, run your tool in this way and see if its log messages show up promptly.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

mpilman · March 27, 2023, 3:54pm

Thanks for your answer! I fear you might be right.

I did have some bugs which caused the behavior to be weird in a way I didn't understand. After a lot of debugging from my side the behavior now makes sense (also one weird issue is that Xcode sets the localization environment variables differently than the shell, but I think in the test driver I should set this explicitly anyways to make the tests behave more deterministically).

I was hoping that PostgreSQL would flush more often (as this would just make sense for a DB to do since "I want to pipe the logs" doesn't seem like an outrages use case ).

Anyways, instead of trying to bother with tty stuff (which I did like once in my life and I still have PTSD from that experience) I will next try to configure postgres to log into a file and then tail this file. This has the added benefit that I can tail json data which is a bit easier to work with.