[Review] SF-0007: Introducing Swift Subprocess

Yes it is; you can run any of the examples using the code from the PR. And the math from the post seems to work out.

The collecting run is implemented here:

return (
  processIdentifier: subprocess.processIdentifier,
  standardOutput: try subprocess.captureStandardOutput(),
  standardError: try subprocess.captureStandardError()
)

Those methods are implemented here:

internal func captureStandardOutput() throws -> Data? {
  guard case .collected(let limit, let readFd, _) = self.executionOutput else {
    return nil
  }
  let captured = try readFd.read(upToLength: limit)
  return Data(captured)
}

internal func captureStandardError() throws -> Data? {
  guard case .collected(let limit, let readFd, _) = self.executionError else {
    return nil
  }
  let captured = try readFd.read(upToLength: limit)
  return Data(captured)
}

So after reading at most limit bytes from stdout we jump to stderr. Meanwhile the child process is still writing to stdout, fills the pipe buffer, and deadlocks.

That said, you can still cat "Pride and Prejudice.txt", but you have to estimate the limit:

  • if you overestimate and say 1 GB, then the PR will actually try to allocate that much memory, even if the stdout is much smaller
  • if you underestimate by more than the pipe buffer size, then it deadlocks
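For reference, the deadlock is straightforward to reproduce and work around with today's Foundation Process (not the proposed Subprocess API): the fix is simply to drain both pipes concurrently instead of reading all of stdout, then all of stderr. A rough sketch:

```swift
import Foundation

// Sketch using the existing Foundation `Process`: drain stdout and stderr
// concurrently so that neither pipe buffer can fill while we're blocked
// reading the other one.
let process = Process()
process.executableURL = URL(fileURLWithPath: "/bin/sh")
// Write more than a typical 64 KiB pipe buffer to stdout, plus some stderr.
process.arguments = ["-c", "head -c 200000 /dev/zero; echo oops 1>&2"]

let outPipe = Pipe(), errPipe = Pipe()
process.standardOutput = outPipe
process.standardError = errPipe
try process.run()

// Reading the pipes sequentially (all of stdout, then all of stderr) is
// exactly the ordering that deadlocks; reading them on separate queues
// does not.
let group = DispatchGroup()
var outData = Data(), errData = Data()
DispatchQueue.global().async(group: group) {
    outData = outPipe.fileHandleForReading.readDataToEndOfFile()
}
DispatchQueue.global().async(group: group) {
    errData = errPipe.fileHandleForReading.readDataToEndOfFile()
}
group.wait()
process.waitUntilExit()
print(outData.count, errData.count)  // 200000 5
```

Note that this sketch also has no limit parameter at all: it reads to EOF, so there is nothing to estimate.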

Oof, this seems like a big hole. I was assuming that this proposal used concurrency-friendly APIs like FileHandle.AsyncBytes. When I was working on Shwift, I used SwiftNIO's NonBlockingFileIO since those APIs weren't available at the time. Is there something missing from the core async IO support that's needed to make this work properly?


Is the right solution perhaps to default subprocesses to sharing the parent’s stdin/stdout/stderr, like shell scripts do? Not only does this make the simplest API more closely match the simplest motivating use case, it also makes the problem easier to understand and fix.

So, unless I'm missing something about UnsafeMutableBufferPointer's semantics re. memory initialisation, all it's doing is mmaping (essentially) that much anonymous memory, not actually touching it (until and only insofar as it's used). So while it's not free - it's potentially quite a lot of unnecessary VM work - it's probably not that expensive.

Unless the platform in question doesn't support page faulting, which I'm pretty sure every Swift-supported OS does.

Still, it's a curious choice - I would have just used ContiguousArray<UInt8>.


Process does this and it's bitten me more than once. Though possibly that's because of Process's API, whereby there are FileHandles there by default (for its standardOutput et al. properties) which are actually the parent process's, and that can lead to you closing your own stdout & stderr. Which is (unsurprisingly) a bit tricky to debug, because now all your logging is lost. :confused:

It's also unclear what that means for stdin - e.g. what happens if both the parent and child read from stdin simultaneously? I don't think that's defined behaviour, beyond (maybe) that each piece of input will go to exactly one of them (nondeterministically), as the only logical possibility.

It is useful to have that behaviour sometimes, for stdout and stderr, as you noted. In "shell scripting" situations etc. I'm just not sure it's a wise default.

It might be that there isn't a good default. Maybe the user should always have to choose the desired behaviour, at least in lieu of clearly use-case-specific convenience initialisers which more clearly imply their default behaviour.

P.S. It's worth noting that default arguments can always be added after the fact, in a source-compatible and binary-compatible way. So it might be wise to start without them and only add them in later if and when experience suggests they're safe [enough].
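A toy illustration of that P.S., with hypothetical names (nothing here is proposed API): a parameter added with a default in a later release leaves existing call sites compiling unchanged.

```swift
// Sketch: `Runner` v1 shipped only run(_:); v2 adds a defaulted
// `timeout:` parameter. Calls written against v1 still compile.
struct Runner {
    static func run(_ executable: String, timeout: Int = 30) -> String {
        "\(executable) (timeout: \(timeout)s)"
    }
}

let v1StyleCall = Runner.run("/bin/echo")              // pre-existing call site
let v2StyleCall = Runner.run("/bin/echo", timeout: 5)  // opts into the new knob
print(v1StyleCall)  // /bin/echo (timeout: 30s)
```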


That’s sadly not how Windows works. Windows commits memory as soon as you ask for it.

Are you sure about that? Everything I've read suggests otherwise, such as Microsoft's documentation.

Plenty of programs and some algorithms rely on being able to allocate a huge address range without actually using it all.

VirtualAlloc can do either, but abstractions built on VirtualAlloc (like malloc) will usually commit. Otherwise you’d have to call another API (or VirtualAlloc directly) to actually commit the reserved pages before you can use them.

…yes, but only in the same way as *nix, right? UnsafeMutableBufferPointer probably calls malloc specifically (not mmap), but for non-trivial allocation sizes malloc usually turns that into an mmap (or the equivalent on Windows, presumably VirtualAlloc(0, size, MEM_COMMIT, …) or somesuch).

(that still leaves 'trivial' allocation sizes - usually a page or less - to pay the cost of memset(ptr, 0, size), though at least that overhead is essentially bounded at a relatively small cost)
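For what it's worth, the *nix behaviour being described is easy to demonstrate directly with mmap (a sketch; the anonymous-mapping flag name differs between Glibc and Darwin):

```swift
#if canImport(Glibc)
import Glibc
let anonFlag = MAP_ANONYMOUS
#else
import Darwin
let anonFlag = MAP_ANON
#endif

// Reserving 1 GiB of anonymous address space succeeds immediately;
// physical pages are only faulted in (zero-filled) as they are touched.
let size = 1 << 30  // 1 GiB of address space
guard let base = mmap(nil, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | anonFlag, -1, 0),
      base != UnsafeMutableRawPointer(bitPattern: -1) else {
    fatalError("mmap failed")
}
let bytes = base.assumingMemoryBound(to: UInt8.self)
bytes[0] = 42                 // faults in exactly one page
let first = bytes[0]
let untouched = bytes[4096]   // a page we never wrote: reads as zero
munmap(base, size)
print(first, untouched)  // 42 0
```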

The docs for MEM_COMMIT specifically say that memory is (in short) zero-faulted:

The function also guarantees that when the caller later initially accesses the memory, the contents will be zero. Actual physical pages are not allocated unless/until the virtual addresses are actually accessed.

By default, Linux will return success for any reasonable memory allocation, and only try to find that memory when it is faulted in. This is called “overcommit” and is the reason Linux has an OOM killer. Windows instead returns failure for allocations that would exceed the commit limit.

I think I was recalling this blog post which says that for MEM_LARGE_PAGES allocations, commit wires memory eagerly.

Ah, yes, that does seem to be an important edge case. Per that post:

Because very large pages are not pageable.

I think this is true of huge pages (transparent or not) on Linux, too. Presumably for the same reasons that Raymond provides - that paging a GiB page is still a bit pricey on current hardware.

But that shouldn't apply here because huge pages should always be explicitly opted-into by programs; Subprocess should not be using them in any case.

Just to add to @FranzBusch's point: I absolutely agree with you that the closure approach isn't as "nice" to use as an actor or similar thing. In fact, that's how I started the design. However, I believe one of the biggest principles of Swift is that it values safety over many things, and I'd argue Swift should absolutely value safety over ergonomics. This is why you see a lot of closure-based APIs in Swift today -- withUnsafeBytes, withCString, etc. All these APIs need some sort of cleanup, so instead of returning a pointer to you and expecting you to free it yourself after use, we provide a closure so we can perform the cleanup for you after the closure finishes but before the method returns. This is exactly the same design pattern you see here -- as @hassila mentioned, the spawned subprocess can conceptually be considered a resource allocated by the run method, and we want to make sure we clean up after ourselves.
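For readers following along, this is the usual with-style scoped-resource pattern. A generic sketch (not the proposal's actual API; the function and its names are invented for illustration):

```swift
import Foundation

// The resource is created, lent to the body, and reliably released
// afterwards -- even if the body throws -- before the function returns.
func withTemporaryFileHandle<R>(_ body: (FileHandle) throws -> R) throws -> R {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
    _ = FileManager.default.createFile(atPath: url.path, contents: nil)
    let handle = try FileHandle(forWritingTo: url)
    defer {
        // Cleanup runs on every exit path; the caller never holds the
        // raw resource outside the closure's scope.
        try? handle.close()
        try? FileManager.default.removeItem(at: url)
    }
    return try body(handle)
}

let written = try withTemporaryFileHandle { handle in
    handle.write(Data("hello".utf8))
    return "hello".count
}
print(written)  // 5
```

The trade-off is exactly the one discussed in this thread: the resource's lifetime is pinned to the closure, so cleanup is guaranteed (and can throw), at the cost of some ergonomics.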

As you noted earlier, the "return an actor to you" approach can't really facilitate automatic resource cleanup (even if we use deinit, we'd still need some way to throw any cleanup errors), which is why I decided on this design even though it's "less nice to use".

On a different note: part of our job at Foundation and Swift Standard Library is exactly to push the Swift language forward and to adopt new Swift language features -- this is why the SwiftFoundation package requires the latest Swift. If we stick with old and familiar approaches then Swift itself will never get better.


Just to clarify: when I said Subprocess won't support his use case, I meant specifically daemons. The reason is that for the majority of use cases you probably want to interact with your daemon somehow (potentially even from many different processes), which implies some sort of IPC, whether via pipe or something else. All these requirements conflict with Subprocess's intentional design (for example, a daemon isn't really a "sub"process -- how would a second process talk to the daemon you just launched?)

But like @wadetregaskis pointed out, there are other uses of a child process outliving the parent, like helper tools, or simply wanting to launch a long-running process. For those use cases I can envision a "barebones" API on Subprocess that's almost just a wrapper for posix_spawn -- it doesn't process any IO (maybe it simply takes a file descriptor); it simply spawns a process and gives you the pid (this way there's no cleanup needed, and the spawn method doesn't have to be asynchronous).
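For illustration, such a barebones wrapper might look roughly like this (a sketch only; the function name and shape are hypothetical, not proposed API):

```swift
import Foundation
#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

// A thin posix_spawn wrapper: no IO handling at all, no cleanup needed,
// and nothing asynchronous -- it just hands back the pid.
func spawnBarebones(executablePath: String,
                    arguments: [String] = []) throws -> pid_t {
    var pid: pid_t = 0
    // argv is the executable path followed by the arguments, NULL-terminated.
    var argv: [UnsafeMutablePointer<CChar>?] =
        ([executablePath] + arguments).map { strdup($0) }
    argv.append(nil)
    defer { argv.forEach { free($0) } }
    let envp: [UnsafeMutablePointer<CChar>?] = [nil]  // empty environment
    let rc = posix_spawn(&pid, executablePath, nil, nil, argv, envp)
    guard rc == 0 else {
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(rc), userInfo: nil)
    }
    return pid
}

// Spawn and then wait (a true daemon or helper tool would skip the waitpid).
let pid = try spawnBarebones(executablePath: "/bin/sleep", arguments: ["0"])
var status: Int32 = 0
waitpid(pid, &status, 0)
print(pid > 0)  // true
```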

With all that said, I'm not convinced this spawn API is required for the initial version, since there's nothing in the design that prohibits us from adding it in the future.

My goal is to keep the API surface for Subprocess as minimal as possible to allow us space to improve -- if we introduce something now and don't like it later, it'll be very hard to deprecate.


Thanks for the detailed analysis. I just want to point out that the implementation itself is under rapid development and much of what you experimented with is no longer true (all the blocking reads and writes have been completely removed; they were there so that while developing I didn't have to jump through an async context).

In this post we are looking more for feedback on the API itself rather than the implementation. Here are the API changes that I picked up from your description (please let me know if I missed something):

We have to do this every single time we use the body closure overload to avoid SIGPIPE or deadlock. Are we sure that our users will be able to write it? It looks quite complicated.

The design intention is that the "simple" case is the one without the closure. The closure case is intended to give you full control. All the workarounds that I've come up with so far require us to make some assumptions about how this closure is being used. For example, how would you generalize

try await writer.finish()

try await withThrowingTaskGroup(of: Void.self) { group in
  group.addTask { try await p.standardOutput!.reduce(()) { (_, _) in () } }
  group.addTask { try await p.standardError!.reduce(()) { (_, _) in () } }
  try await group.waitForAll()
}

without knowing whether the write should finish before the read, or whether there's an order in which output and error are supposed to be read?

  • do not require users to write a parallel read from stdout/stderr; give them a method to do it with one call

I'm all ears on this one. Please let me know if you have any good suggestions (I disagree that a combined stream is the solution here).

It might well be the case that if you do consider and explore that side of the API, it could influence the overall API (including the part you are proposing now). Hard to tell upfront.

With this one I can't agree more:


In case you wondered, these are the APIs:

  1. fork (with or without exec)
  2. posix_spawn
  3. Process (aka NSTask)
  4. NSWorkspace

Plus these which I forgot initially:

  1. AppleScript
  2. the ancient LaunchApplication of Carbon (still works!)

I fully agree, which is why I don't want to rush to get this in for the first iteration. I'll need time to explore what's the best approach. Right now I'm making sure there's nothing in the design that prohibits us from adding support later, and I don't think there is.

Also, just to be clear: even though our plan is to hopefully deprecate Process one day, it certainly won't happen overnight and it'll take a long time. So in the meantime the old Process API is still available for use.

Thanks for the meme! #new-team-t-shirt

All these methods are going away (which is why they are internal and an implementation detail).

But awkward or limited APIs can coerce people into doing unsafe things.

One of the biggest benefits of Foundation (IMO) has always been that it provides pretty reliable and widely applicable ways to do things. NSString isn't perfect, but it's actually pretty darn capable, enough so that it actually gets used for almost all relevant use-cases, which is why you don't generally have to deal with 3rd party frameworks not being able to talk to each other on such a fundamental level.

It's prevented a lot of not just wheel reinvention, but dodgy square wheels.

(see also @tera's humorous but apt xkcd adaptation)

Granted, that's less of a concern for this particular problem domain of spawning subprocesses, but it's not entirely a non-issue. If Foundation's solution continues to be limited, as Process is, we'll continue to see a large number of 3rd party attempts to replace it, each representing a lot of redundant, wasted work and bringing with it its own problems.

Plus, to solve the more advanced cases, such 3rd party libraries would very likely have to cover all the basic cases anyway, so they wouldn't even benefit from being complementary to a more basic option from Foundation.

(Foundation is of course showing its age - Process is a product of its Objective-C time, as is a lot of Foundation - but I feel there's continued value in striving for robust solutions, within Foundation.)

Can you clarify or elaborate on that problem?

I totally get what you mean w.r.t. deinit being in some ways not the ideal time to actually do anything (beyond freeing memory) because you have no good way to communicate failures, but by the same token this is a trade-off that Task makes, for example, and I think it's reasonable there - even though I have accidentally swallowed errors by discarding the Task object. Is the balance of things tipped the other way for running subprocesses?

And I agree that's a wise approach to these things.

It helps to have a roadmap though, or a vision documented. Otherwise, I worry that deferring these use-cases might defer currently unseen problems that make them not so trivial to add later in a compatible way.

But, I recognise that it's hard to draw that line between having to do everything up front and actually deferring stuff [appropriately].


This is true no matter when you introduce an API; it's inherent in working on an ABI-stable framework. It's not an argument for or against anything. This isn't a fundamentally new type of API -- it's well-trodden ground.

1 Like