[Review] SF-0007: Introducing Swift Subprocess

icharleshu · March 1, 2024, 9:34pm

I fully agree, which is why I'm not sure how a combined stream should behave, and I don't believe Subprocess should define one (at least for now). I understand that naive use of the closure (i.e. only reading the output, or reading output and then error in sequence) could lead to deadlock, but that's why the closure case is the "advanced use case": you are given the raw stream for each I/O channel and decide what to do depending on your use case. One of the major reasons I chose AsyncSequence to represent the output here, besides its being async, is that it's composable. You can build your own stream however you want.
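As a rough illustration of that composability, here is a minimal sketch, assuming only the standard library's concurrency types, that merges two line sequences into one tagged stream and consumes both concurrently. The names merge and Source are hypothetical and not part of the proposal; any Sendable AsyncSequence of String works as input, so plain AsyncStream values can stand in for a subprocess's output and error sequences.

```swift
/// Tags each element with the channel it came from.
enum Source: Equatable, Sendable { case output, error }

/// Hypothetical helper: merges two line sequences into one stream, reading both
/// concurrently so a producer stuck on one channel cannot starve the other.
func merge<Out: AsyncSequence & Sendable, Err: AsyncSequence & Sendable>(
    _ outputLines: Out, _ errorLines: Err
) -> AsyncThrowingStream<(Source, String), Error>
where Out.Element == String, Err.Element == String {
    AsyncThrowingStream { continuation in
        let task = Task {
            do {
                try await withThrowingTaskGroup(of: Void.self) { group in
                    // One child task per channel; each preserves its own ordering.
                    group.addTask {
                        for try await line in outputLines { continuation.yield((.output, line)) }
                    }
                    group.addTask {
                        for try await line in errorLines { continuation.yield((.error, line)) }
                    }
                    try await group.waitForAll()
                }
                continuation.finish()
            } catch {
                continuation.finish(throwing: error)
            }
        }
        // Stop reading both channels if the consumer goes away.
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

Because each channel is drained in its own child task, a child that writes heavily to stderr keeps making progress even while the consumer is busy with stdout lines (at the cost of unbounded buffering in this sketch), which is the property a naive sequential loop lacks.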

icharleshu · March 1, 2024, 9:43pm

Could you elaborate on a specific advanced use case that this design might not support (specifically, something that this design would prevent us from supporting in the future)?

It really sounds like you want a daemon with some sort of IPC. Neither Subprocess nor NSTask/Process was designed to support long-running daemons, and I would not suggest using either to launch a daemon.

Granted, we don't have a good API for launching daemons in Swift today, but that's well beyond the scope of Subprocess and this proposal.

tera · March 1, 2024, 11:16pm

wadetregaskis:

To be clear, the problem I see - and it's a problem with Process and all 3rd party equivalents today, in Swift - is that the following code looks at a glance like it's fine but is fundamentally wrong:
let subprocess = Process(…) // Or equivalent.

for line in subprocess.standardOutput.lines {
    …
}
Until the subprocess writes something to stdout (or closes stdout) that will hang indefinitely. So if the subprocess instead writes to stderr (and doesn't exit, or the stderr pipe buffer fills up), not only will the program hang permanently but you will never see that error output (making it hard to debug the problem).

What is the root cause of this problem? Is it using the AsyncBytes API instead of, say, InputStream? AFAIR, with the latter you never block on read: read returns the bytes that are immediately available, there's a call to check whether there are bytes available to read, and there's also a delegate as an alternative mechanism. It's an older API that isn't async/await-aware, but at least it doesn't cause the problem you're describing, or does it?
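Whichever reading API is used, the hazard comes from waiting on one pipe while the child is blocked writing to the other, so the usual workaround with today's Process is to drain both pipes at the same time. A minimal sketch, assuming Foundation's Process and Pipe; runCapturingBoth and DataBox are hypothetical helper names, and a dedicated Thread is used so the two blocking reads never depend on a shared thread pool.

```swift
import Foundation

/// Hypothetical holder for data produced on another thread.
/// Access is synchronized by the semaphore in runCapturingBoth.
final class DataBox: @unchecked Sendable { var data = Data() }

/// Runs a command and captures stdout and stderr, draining both pipes at once so
/// a child that fills one pipe buffer cannot block while we wait on the other.
func runCapturingBoth(_ executable: String, _ arguments: [String]) throws
    -> (status: Int32, output: Data, error: Data)
{
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments
    let outPipe = Pipe()
    let errPipe = Pipe()
    process.standardOutput = outPipe
    process.standardError = errPipe
    try process.run()

    // Drain stderr on a dedicated thread while this thread drains stdout.
    let errBox = DataBox()
    let errDone = DispatchSemaphore(value: 0)
    let reader = Thread {
        errBox.data = errPipe.fileHandleForReading.readDataToEndOfFile()
        errDone.signal()
    }
    reader.start()

    let output = outPipe.fileHandleForReading.readDataToEndOfFile()
    errDone.wait()              // stderr is fully read once this returns
    process.waitUntilExit()     // only wait for exit after both pipes hit EOF
    return (process.terminationStatus, output, errBox.data)
}
```

For example, a child that writes a few hundred kilobytes to stderr before printing anything to stdout completes normally here, whereas reading stdout to the end first would hang once the stderr pipe buffer filled.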

icharleshu · March 1, 2024, 11:23pm

This is a great point, and thank you for bringing it up. IMO POSIX_SPAWN_CLOEXEC_DEFAULT is essential enough (I believe NSTask uses it by default) that we should emulate it on Linux. Now the question is: should this emulation be the default? IMO on Darwin having POSIX_SPAWN_CLOEXEC_DEFAULT is the right choice, but if we were to emulate it on Linux we'd most likely have to use fork/exec directly to avoid the memory racing problem that you mentioned. This unfortunately means we couldn't use posix_spawn on Linux anymore, which is not ideal because posix_spawn is probably more performant than plain fork/exec.

What are everyone's thoughts on this? Should we aim to emulate POSIX_SPAWN_CLOEXEC_DEFAULT on Linux at the expense of being able to use posix_spawn? I suppose we could add an option and fall back to posix_spawn when POSIX_SPAWN_CLOEXEC_DEFAULT is not needed...
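For contrast, the per-descriptor mechanism available on every POSIX platform is the FD_CLOEXEC flag. A minimal sketch (setCloseOnExec is a hypothetical helper name): it only protects descriptors the caller knows about, which is why it is not a substitute for POSIX_SPAWN_CLOEXEC_DEFAULT. Any descriptor opened elsewhere without the flag, for example by another thread or a library, is still inherited by the child.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Marks one descriptor close-on-exec so a spawned child does not inherit it.
/// Returns false if either fcntl call fails.
func setCloseOnExec(_ fd: Int32) -> Bool {
    let flags = fcntl(fd, F_GETFD)
    guard flags != -1 else { return false }
    // Preserve any existing descriptor flags and add FD_CLOEXEC.
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1
}
```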

icharleshu · March 1, 2024, 11:25pm

Subprocess is definitely planned to support Darwin, Linux, and Windows (it might support more platforms in the future too! Please let us know your favorites). However, the implementation will most likely not be ready "out of the box" on Windows just yet, but we are working on it.

icharleshu · March 1, 2024, 11:30pm

Our goal is to eventually replace the existing Process, which is why this API needs to exist in SwiftFoundation. I don't think we want to maintain a separate package for now, simply because that seems too granular. SwiftFoundation itself is designed to be modular, and Subprocess will go into the slim FoundationEssentials module (as opposed to the not-so-slim FoundationInternationalization).

jim · March 1, 2024, 11:46pm

As long as there's no repeat of FoundationNetworking, I'm happy. Thanks!

icharleshu · March 1, 2024, 11:48pm

This is a great point! I'm torn between using executable as the label and not having one at all. What are everyone's thoughts?

icharleshu · March 1, 2024, 11:55pm

Thinking more on this, I don't think Subprocess will support this use case. IMO launching eternal processes (daemons) is a different enough use case than launching a Subprocess that it deserves its own API instead of us trying to retrofit this type to support both.

rauhul · March 2, 2024, 12:05am

I don't think these are safe to use with the Swift runtime in the mix, though I don't recall exactly why. Maybe @Mike_Ash remembers?

icharleshu · March 2, 2024, 12:29am

This is a great point. Borrow/consume didn't come up in the proposal because FileDescriptor is not ~Copyable yet. But if it becomes ~Copyable, I agree that we can no longer have an API like this that "decides whether it should borrow or consume" at runtime. In that case we'll probably have to optimize for the happy path, which is to always consume.
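To make the "always consume" option concrete, here is a minimal sketch, assuming Swift 5.9 or later, of a noncopyable descriptor wrapper. OwnedDescriptor and launch are hypothetical stand-ins, not System's FileDescriptor or the proposal's API; the point is that a consuming parameter turns ownership transfer into a compile-time fact rather than something decided at runtime.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Calls the C close(2); a free function so it doesn't clash with the method below.
private func systemClose(_ fd: Int32) -> Int32 {
#if canImport(Darwin)
    return Darwin.close(fd)
#else
    return Glibc.close(fd)
#endif
}

/// Hypothetical owned descriptor: exactly one owner, closed exactly once.
struct OwnedDescriptor: ~Copyable {
    let rawValue: Int32

    init(rawValue: Int32) { self.rawValue = rawValue }

    /// Closes explicitly; `discard self` skips deinit so we don't close twice.
    consuming func close() {
        _ = systemClose(rawValue)
        discard self
    }

    /// Closes automatically if the owner never called close().
    deinit { _ = systemClose(rawValue) }
}

/// Hypothetical launch function that always takes ownership ("always consume").
func launch(standardOutput: consuming OwnedDescriptor) {
    // A real implementation would hand the descriptor to the child here;
    // this sketch just closes the parent's copy.
    standardOutput.close()
}
```

After launch(standardOutput:) returns, the caller can no longer use the value it passed in; the compiler rejects any later use, which is exactly the guarantee a runtime "borrow or consume" switch cannot express.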

Jean-Daniel · March 2, 2024, 12:33am

I think the fork/exec issue is more a macOS/iOS issue than a Swift runtime one. Most, if not all, macOS frameworks do not support being used in a child process after fork() and before exec().

That said, I think that on Linux, common use cases can be solved by using posix_spawn_file_actions_addclosefrom_np() when available.

tera · March 2, 2024, 2:02am

I'm trying to model the issue Wade is talking about to better understand it:

import Foundation
func main() {
    if CommandLine.arguments.count > 1 && CommandLine.arguments[1] == "child" {
        print("child process")
        let process = Process()
        let file = process.standardOutput as! FileHandle
        let data = "Hello\nWorld\n".data(using: .utf8)!
        file.write(data)
        try! file.synchronize()
    } else {
        print("parent process")
        Task {
            let url = Bundle.main.executableURL!
            let process = try! Process.run(url, arguments: ["child"])
            let bytes = (process.standardInput! as! FileHandle).bytes
            for try await v in bytes {
                print("got byte \(v)")
            }
        }
    }
}
main()
RunLoop.current.run(until: .distantFuture)

What am I doing wrong? I see the child process's output in the console, but I don't get anything in the "for try await v in bytes" loop.

wadetregaskis · March 2, 2024, 5:54am

Process's standardInput is the current process's stdin by default. So your 'parent process' is waiting for you to type something.

This highlights a hazard of Process - that it defaults to using the current process's standard I/Os, which I find very unintuitive and likely to cause exactly these sorts of errors.

You need to explicitly create a Pipe instance and assign it to the relevant channels on Process (before invoking run) for any channel through which you want to communicate with the subprocess. You'll then want to read from the standardOutput pipe inside the parent, not standardInput.
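Applied to tera's program, that advice might look like the following minimal sketch. To stay self-contained it launches /bin/sh instead of re-running the current executable in "child" mode, and it reads synchronously with readDataToEndOfFile(); on Apple platforms the same pipe's fileHandleForReading.bytes could be iterated with for try await instead. On the child side, plain print(...) already writes to the pipe, so creating a Process there is unnecessary. readChildOutput is a hypothetical name.

```swift
import Foundation

/// Parent side of tera's example, fixed: create a Pipe, attach it to the
/// child's standardOutput before run(), then read from the pipe's read end.
func readChildOutput() throws -> String {
    let process = Process()
    // Stand-in for re-launching this executable with ["child"]:
    // a shell that prints the same two lines.
    process.executableURL = URL(fileURLWithPath: "/bin/sh")
    process.arguments = ["-c", "printf 'Hello\\nWorld\\n'"]

    let stdoutPipe = Pipe()
    process.standardOutput = stdoutPipe            // must be set before run()
    process.standardInput = FileHandle.nullDevice  // don't share the parent's stdin

    try process.run()
    // Read to EOF before waiting, so a chatty child can't fill the pipe and stall.
    let data = stdoutPipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return String(decoding: data, as: UTF8.self)
}
```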

It might be worth considering more helpful names for these in their Subprocess incarnation. childInput / childOutput / childError, perhaps?

wadetregaskis · March 2, 2024, 6:13am

Well, take a lot of the things I've mentioned, like the restrictive closure: while it doesn't technically preclude any particular design, it is not friendly to use-cases that inherently need to interact with the subprocess from many places. It imposes the burden of managing a long-lived closure and setting up yet more communication channels to talk to it. That sort of thing is easier (IMO) with e.g. an actor than with a closure.
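For comparison, here is a minimal sketch of the actor-based shape described above, assuming Foundation's Process and a child that speaks a newline-delimited protocol (for example /bin/cat). LineSession and its methods are hypothetical names. The blocking read is a simplification: it also blocks the actor, and a real implementation would read off the actor.

```swift
import Foundation

/// Hypothetical actor owning a long-lived, line-oriented subprocess, so any part
/// of the program can talk to it with `await session.send(...)`.
actor LineSession {
    private let process: Process
    private let toChild: Pipe
    private let fromChild: Pipe
    private var buffer = Data()

    init(executable: String, arguments: [String] = []) throws {
        // Configure everything locally, then store it once the child is running.
        let process = Process()
        let toChild = Pipe()
        let fromChild = Pipe()
        process.executableURL = URL(fileURLWithPath: executable)
        process.arguments = arguments
        process.standardInput = toChild
        process.standardOutput = fromChild
        try process.run()
        self.process = process
        self.toChild = toChild
        self.fromChild = fromChild
    }

    /// Writes one line to the child's stdin.
    func send(_ line: String) {
        toChild.fileHandleForWriting.write(Data((line + "\n").utf8))
    }

    /// Returns the next line from the child's stdout, or nil at EOF.
    func receiveLine() -> String? {
        while true {
            if let newline = buffer.firstIndex(of: UInt8(ascii: "\n")) {
                let line = buffer[buffer.startIndex..<newline]
                buffer.removeSubrange(buffer.startIndex...newline)
                return String(decoding: line, as: UTF8.self)
            }
            // availableData blocks until some bytes arrive or EOF (empty Data).
            let chunk = fromChild.fileHandleForReading.availableData
            if chunk.isEmpty { return nil }
            buffer.append(chunk)
        }
    }

    /// Closes the child's stdin and waits for it to exit.
    func finish() {
        try? toChild.fileHandleForWriting.close()
        process.waitUntilExit()
    }
}
```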

In fact, I anticipate using XPC (or similar) in order to better lock down the subprocess in this case - because Process and all similar APIs don't provide that facility - but that doesn't help with anything other than security, while adding a lot more complexity and labour. Ultimately I still have to talk to a subprocess via Process (or its equivalent).

It sounds like what you're implying is that all non-trivial use-cases should be satisfied by libraries. That'd be great, but it's just not reality. For example, there's really only one viable tool out there for image metadata, exiftool, and as far as I'm aware there's no way to interact with it (as a Perl program) other than as a subprocess like this. I did in fact consider, quite seriously, trying to write a native Swift version of exiftool, but that's a task on the scale of a life's work. Just ask Phil Harvey.

The problem is fundamentally that you have two channels you must observe simultaneously. It doesn't really matter if AsyncBytes provides a way to tell if there's data available, because that would just mean now you have to figure out a way to turn that polling API into a clean & efficient async API (which I don't think you can, short of extracting the file descriptors out of the API in order to bypass it entirely and actually calling select on them, or similar).
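To make the "call select on them" route concrete, here is a minimal sketch using poll(2) on two raw descriptors (for example the fileDescriptor of each pipe's fileHandleForReading). drainBoth is a hypothetical name. It is synchronous code that would still need to be wrapped in a thread or task to integrate with async code, which is exactly the extra work described above.

```swift
import Foundation
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical helper: reads two descriptors to EOF at the same time using
/// poll(2), servicing whichever one has data (or has hit EOF) on each pass.
func drainBoth(_ first: Int32, _ second: Int32) -> (Data, Data) {
    var results = [Data(), Data()]
    var fds = [pollfd(fd: first, events: Int16(POLLIN), revents: 0),
               pollfd(fd: second, events: Int16(POLLIN), revents: 0)]
    let nfds = nfds_t(fds.count)
    let chunkSize = 4096
    var buffer = [UInt8](repeating: 0, count: chunkSize)

    while fds.contains(where: { $0.fd >= 0 }) {
        if poll(&fds, nfds, -1) < 0 {
            if errno == EINTR { continue }
            break  // unexpected poll failure: give up with what we have
        }
        for i in fds.indices where fds[i].fd >= 0 && fds[i].revents != 0 {
            let n = read(fds[i].fd, &buffer, chunkSize)
            if n > 0 {
                results[i].append(contentsOf: buffer[0..<n])
            } else if n == 0 || errno != EINTR {
                fds[i].fd = -1  // EOF or error: poll ignores negative descriptors
            }
        }
    }
    return (results[0], results[1])
}
```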

APIs that invoke a closure on data availability do give you an avenue, but it's not ideal - I'd prefer to write async code. And I don't see a sensible way to do that, in any case, on Process or this Subprocess proposal.

If buffers were infinite then this would only afflict interactive subprocesses, because non-interactive ones would just write all their output and exit (so even if they don't write to stdout, the parent awaiting on it will still move on because the async sequence will terminate as complete albeit empty). Unfortunately, pipe buffers are never infinite. So even a trivial, non-interactive program can deadlock with the naive code I shared.
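One way to see that limit directly is the following minimal sketch (measurePipeCapacity is a hypothetical name), which fills a pipe that nobody reads from. The write end is made non-blocking so a full buffer shows up as a failed write instead of a hang. The number is platform-dependent (typically 64 KiB on Linux).

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Measures how many bytes a fresh pipe accepts before a writer would block.
func measurePipeCapacity() -> Int? {
    var fds: [Int32] = [0, 0]
    guard pipe(&fds) == 0 else { return nil }
    defer {
        close(fds[0])
        close(fds[1])
    }
    // Non-blocking write end: a full buffer makes write() fail (EAGAIN) instead of waiting.
    let flags = fcntl(fds[1], F_GETFL)
    guard flags != -1, fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) != -1 else { return nil }

    let chunk = [UInt8](repeating: 0, count: 1024)
    var total = 0
    while true {
        let written = chunk.withUnsafeBytes { write(fds[1], $0.baseAddress, $0.count) }
        if written <= 0 { break }  // typically -1 with EAGAIN: the buffer is full
        total += written
    }
    return total
}
```

A child that produces more than this on a channel its parent isn't reading will block in write(), which is the deadlock described above.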

wadetregaskis · March 2, 2024, 6:46am

Remember that this isn't just 'daemons' (meaning processes that live indefinitely). Having the child process outlive the parent is also a common occurrence in helper tools, pipelines, and task systems.

e.g. you might have a frontend CLI tool that sets up the environment or otherwise prepares the way for the "real" tool, launches that, and then has no further purpose so simply exits. You can have it hang around indefinitely waiting for the child to exit, but that's wasteful.

And in some cases, might it be necessary for it to exit, or at least for the child to be reparented, because there may be a grandparent process that's supposed to supervise its descendants? (I haven't dealt with that particular aspect in a long while, but I vaguely recall that some things can only be monitored on immediate children, not grandchildren.)

Sometimes needs can be satisfied with exec & variants, in principle… I haven't had to do that myself in a long while, but from what I recall that's hard to get right w.r.t. security (lingering file descriptors etc). And it may be undesirable for other, more mundane reasons, like the fact that it abruptly terminates the parent whereas you may prefer to have a graceful shutdown (cleanly close sockets, clean up temporary files, etc). Or you might simply need to do things after the new process is launched (e.g. write out a final report).

And since there isn't any native Swift way to do any of that (of which I'm aware…?), maybe it would be good to just support those use-cases in Subprocess. It doesn't seem like a big deal to have configurable behaviour regarding parental attachment.

I do want to note - because I know all this debate might appear to suggest otherwise - that I'm very pleased to see this proposal and the tremendous work behind it. As I've alluded, I'm no big fan of the existing Process API and all its foibles. This new API does fix quite a lot of things about that, already. I'd just like to see it be all it can be; I don't want to have to keep using Process, or worse reinvent this wheel entirely.

Jean-Daniel · March 2, 2024, 7:40am

For such cases, posix_spawn works quite well. I had such a use case not so long ago, and after fighting with the Process API and other high-level APIs, I finally switched to using posix_spawn. It greatly simplified my code.

I admit that having a Swift posix_spawn wrapper would be very helpful (the C API is not very nice to use from Swift), but I don't see it as competing with the Subprocess API. Maybe what you need is a low-level API to launch processes, which could be used to implement Subprocess and other high-level APIs.
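As a rough idea of how small such a wrapper can start, here is a minimal sketch (spawnAndWait is a hypothetical name) that launches a program with posix_spawn, passes an explicit environment, and decodes the wait status by hand because the WIFEXITED/WEXITSTATUS macros may not be imported into Swift. It deliberately omits file actions and spawn attributes, which is where descriptor plumbing, close-on-exec handling, and process groups would go.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical minimal wrapper: spawns `path` with `arguments` and an explicit
/// environment, waits for it, and returns its exit code (nil on failure or signal).
func spawnAndWait(_ path: String, _ arguments: [String], environment: [String] = []) -> Int32? {
    // posix_spawn expects NULL-terminated arrays of C strings; strdup makes owned copies.
    var argv: [UnsafeMutablePointer<CChar>?] = ([path] + arguments).map { strdup($0) }
    argv.append(nil)
    var envp: [UnsafeMutablePointer<CChar>?] = environment.map { strdup($0) }
    envp.append(nil)
    defer {
        argv.forEach { free($0) }
        envp.forEach { free($0) }
    }

    var pid = pid_t()
    // nil file actions and attributes: the child inherits every descriptor
    // that is not marked close-on-exec.
    guard posix_spawn(&pid, path, nil, nil, argv, envp) == 0 else { return nil }

    var status: Int32 = 0
    while waitpid(pid, &status, 0) == -1 {
        if errno != EINTR { return nil }
    }
    // Decode the traditional wait-status layout (what WIFEXITED/WEXITSTATUS do).
    guard status & 0x7f == 0 else { return nil }  // low 7 bits non-zero: killed by a signal
    return (status >> 8) & 0xff
}
```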

tera · March 2, 2024, 1:07pm

I don't quite understand how... Process only gives me two inits: init() "An initialized process object with the environment of the current process." and class func run(url... "Creates and runs a task with a specified executable and arguments". Ditto for NSTask, the Obj-C counterpart of Process. Do I create a copy of the current process with init() and override all the fields, or how else could I do something before run?

If you have to jump through hoops with async code it's worth considering the alternatives. However even with a closure that's invoked on data arrival there are questions to answer:

- What happens if you are still in that closure when new data arrives?
- What happens on the sending side? Is it also closure-based? Same question there: what happens when the system asks you for new data but you are still in the closure providing the "previous" data?

BTW, why can't they be infinite? Not literally, but limited only by available memory. We already accept that creating a class instance could terminate the whole process with an out-of-memory error; the same could happen if memory overflows in read/write buffers. On second thought, not a good idea.

hassila · March 2, 2024, 3:17pm

I’d probably not have one at all as the verb run preceding it contextually makes it quite clear at the point of usage. Perhaps also platformOptions -> options is as clear, but more concise?

wadetregaskis · March 2, 2024, 5:16pm

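let process = Process()
process.executableURL = URL(fileURLWithPath: "/bin/ls")  // example executable; configure everything before launching
let stdout = Pipe()
process.standardOutput = stdout
try process.run()  // run() throws

// Now read from stdout.fileHandleForReading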

This builder-like pattern is pretty common in Apple's pre-Swift frameworks, because Objective-C didn't support default arguments and so you couldn't just use an initialiser as you can in Swift. (plus it was also the subjective style of the time)