[Pitch] Subprocess 1.0

Update: we actually have to make it Sendable and Copyable in Subprocess' use case because in the body closure, you are supposed to perform read and write concurrently.

return try await withThrowingTaskGroup { group in
    group.addTask {
        try await standardInputWriter.write("echo hello stdout\n")
        try await standardInputWriter.finish()
    }
    group.addTask {
        for try await text in standardOutput.lines() { ... }
    }
}

If you don't perform them in parallel, the child process might hang, waiting to write more output or read more input. StandardInputWriter therefore needs to be Sendable and Copyable.

I have some feedback I'd like to provide privately. Please ping me via the usual channels™®©!

I haven't raised this in my last comment, but I am personally not super happy about Subprocess providing a global run method. Looking at the broader ecosystem and standard libraries, we have rarely used global methods. Their discoverability is often really hard, and they "litter" the global namespace. I would love if we could reconsider moving the run methods into Subprocess. Especially with your suggested migration path to stream, which would be another global method. Both run and stream are very common names.

The reason I brought this up is that I saw the nonisolated(unsafe) usage in the API which is something we could handle once in a wrapper type similar to what FileDescriptor is doing.

This is the last chance for us to re-evaluate the core design of this package before we lock it in for the foreseeable future. I have brought up this similar concern during the initial reviews, and I continue to think that these protocols are better served with a static struct with static vars and methods. I would really encourage you and the reviewers to explore this alternative.

Correct, for the trivial types they can conform, but as you know from the implementation, the "special" conformances such as FileDescriptor-based ones have special case handling internal to the library. With a struct + static vars/methods, users can write extensions on this struct to allow the same kind of syntax that can be achieved with the protocol-based design, e.g.:

extension SubprocessInput {
  static func myType(_ myType: MyType) -> Self {
    self.data(myType.data) // Use the existing Data based method
  } 

Correct, you need to be able to concurrently read and write, but you should not be able to concurrently write/read from multiple tasks. Your example is showing concurrent read and writes, but we need to disallow this:

try await withThrowingTaskGroup { group in
    group.addTask {
        try await standardInputWriter.write("echo hello stdout\n")
    }
    group.addTask {
        try await standardInputWriter.write("echo hello stdout\n")
    }
}

Your example is achievable with a ~Copyable but Sendable writer. It does slightly regress the ergonomics since Swift doesn't have call once-closures yet, but it can be worked around by sticking the writer into an Optional and taking it out. Unfortunately, AsyncSequence doesn't allow ~Copyable conformances, such that for that case we have two options:

  1. Make the AsyncSequence ~Sendable and pass it as sending to the closure
  2. Make the AsyncSequence Sendable and add runtime checks that only a single iterator is ever created.

For the new AsyncStreaming interfaces we are following a similar patterns where we expect that most readers and writers are ~Copyable since they model a single resources such as a file descriptor or socket. We do expect them to be Sendable though since all their methods are mutating so you cannot mutate them concurrently anyways.

4 Likes

I was skeptical about Input and OutputProtocol originally because they felt like marker protocols for me. I also agree that this design forces the implementation to cast and force unwrap the type, which I don't quite feel comfortable with.

But then I found a use case where I actually created my custom type that conforms to the protocol. I work with a lot of JSON objects, so I created this,

struct MyJSONOutput : OutputProtocol {
    struct MyType: Codable {
        let title: String
    }
    func output(from span: RawSpan) throws -> MyType {
        return try span.withUnsafeBytes { buffer in
            let d = Data(bytes: buffer.baseAddress!, count: buffer.count)
            let t = try JSONDecoder().decode(MyType.self, from: d)
            return t
        }
    }
}

Now I use it everywhere without needing to duplicate the Data-JSONDecoder dance

        let ghResult = try await Subprocess.run(
            .name("gh"),
            arguments: [
                "pr", "view",
                "\(prNumber)",
                "--repo", hostName,
                "--json", "title"
            ],
            output: MyJSONOutput(),
            error: .string(limit: .max)
        ) 
        let titleInfo = ghResult.standardOutput.title // `titleInfo` is a string

While it's still possible to do it if OutputProtocol is replaced with a concrete type, this changes how I viewed this problem. Between the two patterns I don't see a clear winner since I see both often:

(1) library provides a concrete type; users extend this with their custom static var/func
(2) library provides a protocol; users extend this by having a custom type that conforms to this protocol

1 Like

How practical would it be to make this implementation work with streaming output that requires more than one buffer? Does the simplicity of the protocol get in the way?

I understand there was some controversy in the original pitch about these topics. However, this is exactly why we decided to have a beta period so people can try it out and let us know if some of these design decisions were actually an issue in practice. The subprocess package was released exactly a year ago, and now we have around 90k (~10k unique) clones per 14-day period. We haven't received any issues/comments about the free-standing run() function being ambiguous or the design of InputProtocol and OutputProtocol. This gives us a good signal that these design decisions work in practice.

In addition, InputProtocol and OutputProtocol being protocols also allows us to make ExecutionRecord generic. When you want .string as output, the .standardOutput property on ExecutionRecord is also a String. How do you achieve this with static variables? What benefit does a concrete type really bring us over the more flexible protocols?

Correct, this is why I don't think it's appropriate to use ~Copyable here. I experimented with it and this essentially says developers need to write "hacks" in their code even for the normal "happy" path. I don't think disallowing concurrent writes is the top priority here, and this is something that can be documented.

I'd like to add my voice in favor of the global run function, after having used the library quite a bit. And I say that despite almost always spelling it Subprocess.run (soon I guess I can change that to Subprocess::run). What I like is that I have that option—when I use it in a more complex component somewhere, I'm likely to qualify it, but I also see it as an important piece for scripting use cases where being able to just write run is quite nice with much less ceremony, and that latter bit is what sways me toward giving it a simpler spelling.

2 Likes

Thanks for sharing this I think this is an interesting example.

I just want to clarify that I think non of these two things are blocking. This feedback here is merely representing my personal opinion on this. I haven't raised a Github issue around either of those two since I did raise it during the previous review rounds already.

This isn't a signal, at all.

First, unique clones are just that, clones. They don't say anything about who is using the library, or how. For all you know there's a semi-popular library that uses Subprocess as a dependency, so users may not be using run directly at all. Or it could just be AI bots using the data for training.

Second, Swift's global namespace is already so polluted, especially once you import Foundation, and especially on Apple platforms, that two new functions are hard to notice. This doesn't mean it's okay to offer them, just that they don't make a bad situation noticeably worse for most users.

Third, users very, very rarely file GitHub issues. Issues that are filed are almost always bugs of some kind, not API critique, so you shouldn't expect this sort of feedback through GitHub in the first place. And most devs don't think about this sort of thing day to day, which is why the review process is so important. Subprocess has been a package for all of seven months or so, and there have been 99 issues opened, many from before it was a released package. I've been a developer on Alamofire for over 10 years now, which still has 40 - 50k unique clones every 14 days, and I can count on one hand the number of issues related to API style I've ever received.

So while I don't care about this API, and only slightly about the use of global functions, please don't use the lack of feedback from internet randos as evidence for anything.

8 Likes

Apple is known to ignore feedback, so most people learned to not give them.

Lately even PRs are ignored for months.

1 Like

@FranzBusch I want to give you (and others on the thread) more concrete reasoning behind our decision to use protocols.

We initially explored the idea of concrete IO types way back in version 5 (out of 17 at this point). You can find the code here. As I mentioned, we decided against it because we want ExecutionRecord.standard{Output|Error} to be generic over the output type such that if you choose .string, you will get a String back.

We explored this idea again with a new PR yesterday since you commented, and this time we made it generic to resolve the above problem. However, the "code smell" you disliked from the protocol-based design also appears here:

internal static var _customWrite: InputMethod { 
    return InputMethod( 
        createPipe: { try CreatedPipe(closeWhenDone: true, purpose: .input) }, 
        Write: {_ in } 
    ) 
}   

Essentially, the fatalError (which will be changed to a no-op) that was in the code was not caused by the protocol design but because NoInput, FileDescriptorInput, DiscardedOutput, and FileDescriptorOutput are fundamentally different: instead of dealing only with write() and output() methods, they need to pass special file descriptors to the child process. One could argue that these types should not conform to {Input|Output}Protocol, but that would cause a combinatory explosion of run() overloads. We eventually settled on the protocol design, which also allows it to be extended.

1 Like

Thanks for giving this another try. I think we could come up with a slightly different solution to combat the mentioned code smell by having something akin to Rust's RawFd and RawHandle types that would allow us to understand if an input or output is a file descriptor or handle. However, the ExecutionRecord won't work with that easily.

Consider the following "indirect" design for the protocols:

enum InputMethod {
    case pipe((StandardInputWriter) async throws -> Void)
    case fileDescriptor(FileDescriptor, closeAfterSpawning: Bool)
    case devNull
}

public protocol InputProtocol: Sendable {
    var inputMethod: InputMethod { get }
}
enum OutputMethod<T: Sendable> {
    case collect(maxSize: Int, (RawSpan) throws -> T)
    case fileDescriptor(FileDescriptor, closeAfterSpawning: Bool)
    case stream
    case discard
}

public protocol OutputProtocol: Sendable {
    associatedtype OutputType: Sendable
    var outputMethod: OutputMethod<OutputType> { get }
}
The conformances
struct NoInput: InputProtocol {
    var inputMethod: InputMethod { .devNull }
}
struct FileDescriptorInput: InputProtocol {
    let fd: FileDescriptor
    let closeAfterSpawning: Bool
    var inputMethod: InputMethod {
        .fileDescriptor(fd, closeAfterSpawning: closeAfterSpawning)
    }
}
struct StringInput: InputProtocol {
    let string: String
    var inputMethod: InputMethod {
        .pipe { writer in
            try await writer.write(string.utf8)
            try await writer.finish()
        }
    }
}
struct ArrayInput: InputProtocol {
    let array: [UInt8]
    var inputMethod: InputMethod {
        .pipe { writer in
            try await writer.write(array)
            try await writer.finish()
        }
    }
}
struct DiscardedOutput: OutputProtocol {
    typealias OutputType = Void
    var outputMethod: OutputMethod<Void> { .discard }
}
struct FileDescriptorOutput: OutputProtocol {
    let fd: FileDescriptor
    let closeAfterSpawning: Bool
    typealias OutputType = Void
    var outputMethod: OutputMethod<Void> {
        .fileDescriptor(fd, closeAfterSpawning: closeAfterSpawning)
    }
}
struct StringOutput: OutputProtocol {
    let limit: Int
    let encoding: any Encoding.Type
    typealias OutputType = String
    var outputMethod: OutputMethod<String> {
        .collect(maxSize: limit) { String(decoding: $0, as: encoding) }
    }
}
struct BytesOutput: OutputProtocol {
    let limit: Int
    typealias OutputType = [UInt8]
    var outputMethod: OutputMethod<[UInt8]> {
        .collect(maxSize: limit) { Array($0) }
    }
}
struct SequenceOutput: OutputProtocol {
    typealias OutputType = Void
    var outputMethod: OutputMethod<Void> { .stream }
}
Changes to run method
switch input.inputMethod {
case .devNull:
    // redirect to /dev/null
case .fileDescriptor(let fd, let close):
    // use fd directly
case .pipe(let writeFn):
    // create pipe, call writeFn(writer)
}

The review is opened here [Review] SF-0037: Subprocess 1.0

I think 1.0 should have the SubprocessFoundation trait turned off by default and then maybe add stubs in debug, so these extensions are more discoverable:

#if DEBUG && !FOOBAR
@available(*, unavailable, message: "Enable FOOBAR trait to use this")
func foobar() {}
#endif

The main problem with default on traits is that if a library author either already depends on the package with default traits when the trait is added or if they just don't look into the traits when adding the dependency then later specifying a smaller subset of traits in that library is a breaking change. (Depending packages can just import Subprocess without also specifying it in Package.swift)

1 Like

I've ran into an error with 0.4.0:

Failed to spawn the new process. Underlying error: No file descriptors available

I'm running a short-lived process every couple seconds. Ubuntu server 24.04 host, docker container as non-root user with Alpine base image. Restarting the container fixes it (at least for a while).

Failed to spawn the new process. Underlying error: No file descriptors available

I'm running a short-lived process every couple seconds

Uh oh, that suggests a file descriptor leak. Would you mind filing an issue of swift-subprocess, ideally with a repro and the output of sudo lsof -Pn & ps auxww after it has been running for a while.

2 Likes
1 Like