[Pitch] Progress Reporting in Swift Concurrency

As a concrete proposal of what I'm thinking… could an API like the following be safely used?

class Progress {
  init(totalCount: Int)
  
  var completedCount: Int
  var percentCompleted: Double { get }

  func makePercentCompletedStream() -> AsyncStream<Double>
  func assignCount(_ unitCount: Int, to stream: AsyncStream<Double>)
}

Whenever percentCompleted changes, it would publish the new value to all the AsyncStreams it had previously vended[1].

progress.assignCount(3, to: otherProgress.makePercentCompletedStream()) would behave somewhat like the current progress.addChild(_:withPendingUnitCount:), but instead of adding the progress as a child, it adds the stream as a child, and each progress can vend multiple streams.
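To make the vending half of that idea concrete, here is a minimal sketch, assuming a lock-protected class that keeps one continuation per vended stream. The name ProgressSketch, the finish() method, and the 0 to 100 percentage scale are assumptions for illustration and not part of the pitched API; assignCount is left out.

```swift
import Foundation

// Hypothetical sketch of the pitched class. It is named ProgressSketch to
// avoid clashing with Foundation.Progress; finish() is an added assumption.
final class ProgressSketch: @unchecked Sendable {
    private let lock = NSLock()
    private let totalCount: Int
    private var completed = 0
    private var continuations: [AsyncStream<Double>.Continuation] = []

    init(totalCount: Int) {
        self.totalCount = totalCount
    }

    var percentCompleted: Double {
        lock.lock()
        defer { lock.unlock() }
        return Self.percent(completed, of: totalCount)
    }

    var completedCount: Int {
        get {
            lock.lock()
            defer { lock.unlock() }
            return completed
        }
        set {
            // Update the state and snapshot the subscribers under the lock,
            // then publish outside it.
            lock.lock()
            completed = newValue
            let percent = Self.percent(newValue, of: totalCount)
            let subscribers = continuations
            lock.unlock()
            for continuation in subscribers {
                continuation.yield(percent)
            }
        }
    }

    // Each call vends a fresh stream, standing in for a multicast stream.
    func makePercentCompletedStream() -> AsyncStream<Double> {
        let (stream, continuation) = AsyncStream<Double>.makeStream()
        lock.lock()
        continuations.append(continuation)
        lock.unlock()
        return stream
    }

    // Ends every vended stream so that consumers' loops terminate.
    func finish() {
        lock.lock()
        let subscribers = continuations
        continuations.removeAll()
        lock.unlock()
        for continuation in subscribers {
            continuation.finish()
        }
    }

    private static func percent(_ completed: Int, of total: Int) -> Double {
        total > 0 ? Double(completed) / Double(total) * 100 : 0
    }
}
```

A consumer would iterate a vended stream with for await; because AsyncStream buffers without bound by default, values published before the consumer starts iterating are not lost.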


  1. If we had multicast streams, this would be simpler, but I'm assuming we don't for now.

As I understand it, a progress reporter frees the progress information client from knowing too much about the underlying task and frees the information provider from knowing about clients.

I agree that progress reporting shouldn't be used to control task cancellation, but how is cancellation actually reported, and how is that information consumed by clients?

It would be confusing to conflate cancellation with any progress value. (Incomplete is correct but implies further completion; complete is incorrect.) It's essential to show that there will be no more progress due to cancellation.

This is especially important in the structured concurrency context where cancellation is cooperative. The progress client should be able to show when a subtask has been cancelled but other subtasks are continuing, i.e., not cooperating with cancellation, which in turn blocks parent task completion. This is a very high-traffic fault model many users will encounter.

I believe the same is true of errors halting any further progress, and may be true when the process is blocked (that's harder, since progress may resume).

To preserve the ProgressReporter + Progress + Properties dance, consider replacing the Int count with an enum with different states, one of which is something like running(count: Int). The enum could even have an Int initializer that produces the default running state, and running could perhaps support the maximum expected value.

I imagine there would be debate over the canonical states, but running, cancelled, and error would track endogenous task status and correlate with client use-cases. Blocked would be nice to have and tracks high-traffic threading issues that would be great to surface.

Unlike dynamic Properties, these are states common to all clients, and they should be in the API to keep progress reporters oblivious to internal details of progress report clients and vice-versa.
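As a rough illustration of the enum idea, here is a minimal sketch. The type name ProgressState, the payload carried by each case, and the isTerminal helper are all assumptions; the thread only names running, cancelled, error, and blocked as candidate states.

```swift
// Hypothetical state type; the case names and payloads are assumptions.
enum ProgressState: Equatable, Sendable, ExpressibleByIntegerLiteral {
    case running(count: Int, expectedTotal: Int?)
    case blocked(count: Int)
    case cancelled(count: Int)
    case failed(count: Int)

    // The Int initializer produces the default running state.
    init(_ count: Int, expectedTotal: Int? = nil) {
        self = .running(count: count, expectedTotal: expectedTotal)
    }

    init(integerLiteral value: Int) {
        self.init(value)
    }

    // The last known count, whatever the state.
    var count: Int {
        switch self {
        case .running(let count, _), .blocked(let count),
             .cancelled(let count), .failed(let count):
            return count
        }
    }

    // True when no further progress will ever be reported.
    var isTerminal: Bool {
        switch self {
        case .cancelled, .failed: return true
        case .running, .blocked: return false
        }
    }
}
```

With this shape a plain integer literal still works where a count is expected, while a client can distinguish "incomplete and still going" from "incomplete and never going to finish".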

Being able to reliably and regularly notice and diagnose these kinds of stalls would go a long way towards making concurrency easy to use in Swift. I think both API and end users would really appreciate it!

The revision is undoubtedly better, thank you.

This API signature seems wrong:

public func withProperties<T>(
    _ closure: @Sendable (inout Values) throws -> T
) rethrows -> T

As a minimal fix, T would have to be Sendable, and it should adopt typed throws, but it should probably match Mutex exactly?

public func withProperties<T, E: Error>(
    _ closure: (inout sending Values) throws(E) -> sending T
) throws(E) -> sending T

However, many of my previous problems (such as the integer-centric nature, the necessity of partitioning parent progress up-front, the vague boundaries and type confusion between indeterminate and determinate progress, and the story for mixing directly reported properties with value reduction) remain unsolved.

And all that said, I'm well on board with @davedelong's way of thinking — this is perpetuating the problems of NSProgress, whilst also being very complex to learn and use. It feels like there's a much more general solution, involving AsyncStream and map/reduce operations, that would also be vastly simpler.


How do these states, presumably represented as some kind of enum in Progress, integrate with the task concept of cancellation, or the language concept of throwing (errors)?

The proposed API gets out of the business of cancellation precisely because it seems like a mistake to conflate the two. The original NSProgress API certainly tied them together, but that was mostly because Cocoa has a standard error for cancellation, a concept that did not port into idiomatic Swift error handling patterns.

Since the reporter API is just plain calls without some with{} scope:

let choppingReporter = progress.reporter(totalCount: fruits.count)
await fruit.chop()
choppingReporter.complete(count: 1)

one would not be able to automatically handle errors or cancellation to mark the reporter as "cancelled" with these APIs. We'd have to make it:

try await progress.withReporter(totalCount: fruits.count) { choppingReporter in  
  try await fruit.chop() // THROWS
  choppingReporter.complete(count: 1)
} // if throws, mark the reporter "part" as failed...

The same could be done with cancellation, but it would require everything to be done in with-style blocks.
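For reference, a with-style scope along those lines could look like the following minimal sketch. ReporterSketch, its Outcome enum, and this free-function form of withReporter are hypothetical names chosen for illustration; they are not the pitched ProgressReporter API.

```swift
import Foundation

// Hypothetical reporter; not the pitched ProgressReporter type.
final class ReporterSketch: @unchecked Sendable {
    enum Outcome: Equatable, Sendable { case running, finished, failed, cancelled }

    private let lock = NSLock()
    private var completed = 0
    private var state = Outcome.running
    let totalCount: Int

    init(totalCount: Int) {
        self.totalCount = totalCount
    }

    var completedCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return completed
    }

    var outcome: Outcome {
        lock.lock()
        defer { lock.unlock() }
        return state
    }

    func complete(count: Int) {
        lock.lock()
        defer { lock.unlock() }
        completed = min(totalCount, completed + count)
    }

    func close(as outcome: Outcome) {
        lock.lock()
        defer { lock.unlock() }
        state = outcome
    }
}

// Runs `body` and records how it ended before passing the result or error on.
func withReporter<T: Sendable>(
    _ reporter: ReporterSketch,
    _ body: @Sendable (ReporterSketch) async throws -> T
) async throws -> T {
    do {
        let result = try await body(reporter)
        reporter.close(as: .finished)
        return result
    } catch is CancellationError {
        reporter.close(as: .cancelled)
        throw CancellationError()
    } catch {
        reporter.close(as: .failed)
        throw error
    }
}
```

The scope only observes how the body ended; it never cancels anything itself, which keeps reporting separate from control.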

--

Now, is that worth it or not? I'm not sure it would be.

After all, cancellation and errors already require handling, and in a way a failed part is a "completed" part, isn't it? If the failure was terminal, you should throw errors and handle those errors rather than not throw and just let the progress know you've failed to process, right?

I'm not sure this non-local "let's let consumers of the progress know one of the parts of the progress has failed" is actionable by them in any way...? Would it be useful to show in a UI "some parts of the progress have failed?" and get this information from Progress? I don't know, maybe?

It does seem like this would complicate the picture and muddy the purpose of this API a bit, but I'm not very familiar with how UIs surface progress information from Progress.

Hand-wavy counterproposal:

Reuse AsyncStream.Continuation? for subtasks to communicate progress to their callers.

public func myLibraryAPI(
    reportingProgressVia progress: AsyncStream<MyLibraryAPIProgress>.Continuation? = nil
) async {
    while ... {
        // A continuation is not callable; values are published with yield(_:).
        progress?.yield(MyLibraryAPIProgress(...))
    }
}

The stream element type is completely open for the library to report whatever makes sense to it (but it will need to be Sendable).
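A runnable version of that pattern might look like the sketch below. DownloadProgress, download(chunks:reportingProgressVia:), and collectDownloadProgress(chunks:) are hypothetical names, and the rule that the caller creates and finishes the continuation is an assumption, since the counterproposal does not say who owns it.

```swift
// Hypothetical element type and function, for illustration only.
struct DownloadProgress: Sendable, Equatable {
    var completed: Int
    var total: Int
}

func download(
    chunks: Int,
    reportingProgressVia progress: AsyncStream<DownloadProgress>.Continuation? = nil
) async {
    for index in 0..<max(0, chunks) {
        await Task.yield() // stand-in for real work
        // With a nil continuation, optional chaining makes this a no-op.
        progress?.yield(DownloadProgress(completed: index + 1, total: chunks))
    }
}

// The caller owns the stream: it creates the continuation, passes it down,
// and finishes it once the call returns.
func collectDownloadProgress(chunks: Int) async -> [DownloadProgress] {
    let (stream, continuation) = AsyncStream<DownloadProgress>.makeStream()
    await download(chunks: chunks, reportingProgressVia: continuation)
    continuation.finish()

    var updates: [DownloadProgress] = []
    for await update in stream {
        updates.append(update)
    }
    return updates
}
```

In a real client the stream would normally be consumed concurrently with the work rather than afterwards; collecting at the end works here only because the default buffering policy is unbounded.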

Invent a new AsyncMapReduceStream, acting like a heterogeneous CombineLatest, perhaps something like:

struct AsyncMapReduceStream<Key: Hashable, Intermediate: Sendable, Element>: AsyncSequence {
    init(reduce: @escaping @Sendable ([Intermediate]) -> Element) { ... }
    mutating func addUpstreamSequence<S: AsyncSequence>(
        key: Key,
        sequence: S,
        map: @escaping (S.Element) -> Intermediate
    ) { ... }
    mutating func removeUpstreamSequence(key: Key) { ... }
}

Each time any upstream emits an element, it's mapped to the Intermediate, then combined with the latest value from each upstream sequence using reduce, to emit a new element.

It could offer "QOL" overloads to reduce boilerplate where Intermediate == Element, where S: Identifiable, etc.

This can be used to aggregate progress from heterogeneous children.

I've suggested above that it allow elements to be added and removed dynamically, against my better judgement, because that seems like a requirement of the "progress" use-case. If dynamic removal isn't needed, we could get rid of Key, which'd simplify it.
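The "latest value per upstream, then reduce" step can be shown without any of the async plumbing. The following is a minimal synchronous sketch of that core only; LatestValueReducer and its method names are hypothetical, and it is not an AsyncSequence.

```swift
// Hypothetical, synchronous core of the combine-latest-and-reduce idea.
struct LatestValueReducer<Key: Hashable, Intermediate, Element> {
    private var latest: [Key: Intermediate] = [:]
    private let reduce: ([Intermediate]) -> Element

    // `reduce` must not depend on ordering: dictionary order is unspecified.
    init(reduce: @escaping ([Intermediate]) -> Element) {
        self.reduce = reduce
    }

    // Records the newest value for `key` and returns the re-reduced element.
    mutating func update(key: Key, with value: Intermediate) -> Element {
        latest[key] = value
        return reduce(Array(latest.values))
    }

    // Drops an upstream and returns the re-reduced element.
    mutating func remove(key: Key) -> Element {
        latest[key] = nil
        return reduce(Array(latest.values))
    }
}
```

An async wrapper would call update(key:with:) each time an upstream emits and forward the result downstream; removing an upstream changes the aggregate, which is part of why dynamic removal complicates the design.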

Invent a generic ObservedAsyncSequence to meet SwiftUI's needs, perhaps something like:

final class ObservedAsyncSequence<S: AsyncSequence>: Observable
where S.Failure == Never {
    var value: S.Element? { get }
}

My earlier example, under this scheme:

// DirectoryTraversal library
struct DirectoryTraversalProgress { ... }
enum DirectoryTraversalDecision { ... }
func traverse(
    _ root: FilePath,
    glob: String,
    progress: AsyncStream<DirectoryTraversalProgress>.Continuation? = nil,
    eachFile: @Sendable (FilePath) async throws -> DirectoryTraversalDecision
) async throws { ... }

// ImageTranscoder library
struct ImageTranscoderProgress { ... }
enum ImageFormat { ... }
func transcode(
    _ file: FilePath,
    toFormat: ImageFormat,
    dest: FilePath,
    progress: AsyncStream<ImageTranscoderProgress>.Continuation? = nil
) async throws { ... }

// Me:
enum MyProgressIntermediate {
    case traversal(...)
    case image(...)
}
struct MyProgress { ... }
enum MyProgressKey: Hashable {
    case traversal
    case transcode(FilePath)
}
let (traversalStream, traversalProgress) = AsyncStream<DirectoryTraversalProgress>.makeStream()
var aggregate = AsyncMapReduceStream<MyProgressKey, MyProgressIntermediate, MyProgress> {
    ...
    for intermediate in $0 {
        ...
    }
    return MyProgress(...)
}
aggregate.addUpstreamSequence(key: .traversal, sequence: traversalStream) {
    .traversal($0....)
}
try await traverse(
    "path",
    glob: "*.png",
    progress: traversalProgress
) { path in
    let (transcodeStream, transcodeProgress) = AsyncStream<ImageTranscoderProgress>.makeStream()
    aggregate.addUpstreamSequence(key: .transcode(path), sequence: transcodeStream) {
        .image($0....)
    }
    try await transcode(
        path,
        toFormat: .jpeg,
        dest: path.replacing(".png", with: ".jpeg"),
        progress: transcodeProgress
    )
}

// Finally, for SwiftUI:
@State var progress = ObservedAsyncSequence(aggregate)

var body: some View {
    Progress(progress.value.current, total: progress.value.total)
}

Yes, I had to specify in detail how to merge the heterogeneous progress kinds, and decide for myself how all the data should be merged together. I consider that a good thing — it's clear here that there's no "one size fits all".

If there is a "one size fits most", it might be:

struct DeterministicProgress {
    var completed: Int
    var total: Int
}

This could also be provided, along with simplifying type aliases for AsyncStream, constructors for AsyncMapReduceStream that sum them in the obvious way, constructors for SwiftUI.Progress that accept them, etc.
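Summing "in the obvious way" could be as small as the sketch below. The fractionCompleted property, the + operator, and summed() are assumptions added to the two-field struct above, and treating an empty total as zero progress is a choice rather than something the thread specifies.

```swift
struct DeterministicProgress: Equatable, Sendable {
    var completed: Int
    var total: Int

    // Fraction in 0...1; an empty total is reported as 0 (an assumption).
    var fractionCompleted: Double {
        total > 0 ? min(1, Double(completed) / Double(total)) : 0
    }

    // Component-wise sum of two progress values.
    static func + (lhs: Self, rhs: Self) -> Self {
        Self(completed: lhs.completed + rhs.completed, total: lhs.total + rhs.total)
    }
}

extension Sequence where Element == DeterministicProgress {
    // Adds up completed and total across all children.
    func summed() -> DeterministicProgress {
        reduce(DeterministicProgress(completed: 0, total: 0), +)
    }
}
```

Note that a plain sum weights every child by its own total, so a child counting bytes would swamp a child counting files; that is one reason a custom reduce remains useful.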


Thank you for your replies.

Sorry, by "endogenous" I meant they're entirely derivative (i.e., task cancellation, not user events). A good reporter of a task would report when the task is cancelled or throws an error in addition to reporting positive progress. They're not intended to report the error value or other kinds of cancellation, but to reflect the processing state as incomplete but terminated in relevant flavors.

Progress reporting is a notification scheme for the underlying processing status; it can neither be divorced nor conflated. Progress reporting is not responsible for task cancellation, but the progress state reported has to reflect a cancelled or erroneous state to be helpful in diagnosing fault models in task processing and to avoid misleading clients. (And yes, the progress reported is only as good as the reporter is diligent, which can be hard when handling cancellation or errors.)

I agree that reporting this can be a real pain, and I would understand if some developers decide not to do so in some situations. (Better integration with task APIs would be nice and might be called for.) But though most code doesn't follow best practice, we still try to have one where we can.

But for those developers who do want to report state accurately I'd like to give them a common approach and means to do so, to make it easier for them. Building it into the API is also the only way they can inter-operate with such reporting from other tasks written by other developers and especially libraries. (Concurrency, too, takes a village...)

It's not actionable in the sense of fixing that process (and we're expressly not trying to integrate process handling). But it will help with diagnosing and working around a broken process. It's essential to distinguish a hung process (where one task is ignoring another's cancellation) from one that is blocked (on some resource), or to avoid retrying a process that's throwing errors, because those merit 3 different responses to incompleteness.

Progress that hums along normally is not really important; clients are most concerned about progress that stalls. To me the difficulty of concurrency boils over precisely in lacking insight into incomplete states. People become concurrency-averse if they get themselves in situations they don't understand. And it could save a lot in support costs if such situations can be logged; it's precisely the non-local aggregators that should be able to do this.

Would such states be easier to stomach if a (reference) enum with associated values is not required?

If the state must be an Int, a unixy solution would be to nominate some magic negative "count" values as terminal cancellation, error (and possibly blocked, et al), with API to interpret the state as such for backwards compatibility if the states represented might change.

Or a similar Swift solution with better backwards and forwards compatibility would wrap the Int in a single (binary-compatible?) struct with appropriate semantic guarantees. (One could also imagine decomposing it, e.g., an Int split into completed and total values, but those are other concerns.)
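A minimal sketch of the struct-wrapped variant follows. The type name ProgressCount and the specific negative sentinel values are assumptions; the point is only that the magic values stay private while the storage remains a single Int.

```swift
// Hypothetical wrapper; the sentinel values are private implementation details.
struct ProgressCount: Equatable, Sendable {
    private let rawValue: Int

    static let cancelled = ProgressCount(rawValue: -1)
    static let failed = ProgressCount(rawValue: -2)
    static let blocked = ProgressCount(rawValue: -3)

    // Ordinary progress: a non-negative completed count.
    init(completed: Int) {
        precondition(completed >= 0, "completed count must be non-negative")
        rawValue = completed
    }

    private init(rawValue: Int) {
        self.rawValue = rawValue
    }

    // The completed count, or nil when the value encodes a special state.
    var completed: Int? {
        rawValue >= 0 ? rawValue : nil
    }

    // True when no further progress will be reported.
    var isTerminal: Bool {
        self == .cancelled || self == .failed
    }
}
```

Because clients only see completed and isTerminal, new states could be added later without changing how existing values are read.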

Thanks for your patience. I think users will appreciate it.


First of all I think it's great to see this being worked on. I've used NSProgress pretty extensively in the past to report some complex asset download progress and it's something which I'd love to see a modern take on. In particular my use case was not too dissimilar to what @davedelong has illustrated where a manifest was downloaded which contained a bunch of assets (text, images, videos etc.) and then progress was reported across all of them into a single parent progress which was then exposed to the UI layer. There was also caching involved where certain assets were already downloaded and the progress was adjusted to suit as well as shared assets among manifests which shared progress.

Unfortunately I feel, like others in the thread, that this proposed solution isn't it. I think @davedelong and others cover a lot of my feelings pretty well so I won't repeat them, but in general this is a missed opportunity to do something which fits into the Swift language and structured concurrency story more seamlessly and improve the story of progress reporting rather than being hamstrung by trying to be backwards compatible with NSProgress.

I like the ideas floating around proposing the use of AsyncStream and dislike the reasoning in the proposal around task locals. Task locals in particular feel like they can provide more succinct ergonomics when you don't require fine-grained control over everything, but reporting progress without them should still be possible.

I haven't spent/got the time to think about this more in depth at the moment, but I just wanted to voice my concerns about the current pitch. I think it would be unfortunate to go ahead in its current form and I think it requires a step back and re-assessment starting from the ideal situation/requirements and working back from there. The original pitch for Observable is a great illustration of where it started with great goals but was a little rough to begin with, and eventually morphed into something pretty great. I hope this pitch can follow the same journey.


If I understand this correctly, it boils down to two things:

  1. You prefer that every function defines its own progress "kind" type
  2. You prefer to send progress via async streams

Re: 1, that was actually closer to the original version of this proposal - which made ProgressReporter generic on the kind of progress it reports. The 2nd revision removed this, because (a) it added quite a bit of complexity to the surface, plus a performance penalty for dealing with merging heterogeneous children, (b) in reality most progress is only represented as a fraction completed anyway, and (c) it forced intermediate layers of progress reporting to lose metadata that a higher layer of the tree may be interested in reading. Other properties, like completed file counts, throughput, etc, are sometimes added on just as metadata to display up to the user.

Re: 2, I think this approach allows for some of the same confusion that makes NSProgress more difficult to understand than we like. Who is responsible for creating that continuation, and can it be used more than once? What is possible to do with the continuation besides just yielding a value?

Combining those two points, you get to the realization that most code would simply yield an update to the fraction completed. Why make everyone write out a call to a closure instead of just setting the property value or calling a function like progress.complete(count: 3)? Why make everyone define the same DeterministicProgress instead of building it into the base API?

So, like any API, this proposal does draw a line in the sand about what it considers the most important aspects of progress reporting.

  • Most cases do not need children reporting to multiple parents.
  • Most cases do need a completed and total.
  • It leaves itself open to expansion (in a way that NSProgress could not). This is the purpose of the property protocol, with a reduce function for custom values to propagate themselves up the tree.
  • It encourages self-documentation on functions which report progress, because they have a ProgressReporter.Progress property as an argument. The reliance on thread-local state in NSProgress was a bug from the start. @KeithBauerANZ - that bug you filed (14462333) is at least part of the reason for the existence of the family of discrete API on that class.

With respect to reporting sub-progress of a larger operation - that is completely doable with the proposed API. Just observe one of the intermediate ProgressReporter objects in the tree. It's up to the developer to get a reference to that instance to their UI, but it should be fairly straightforward.

I do not want to dismiss the possibility of someone wanting to have the complicated graph diagrammed above. I just don't think this API has to solve that kind of need to be useful.

I think that this proposal does the right thing by: 1. minimizing the API surface to be a purpose-built tree for reducing values, and 2. focusing the API surface to handle the common cases above.
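To illustrate what "a tree for reducing values" means for the common completed/total case, here is a minimal sketch. ProgressNode and its members are hypothetical and are not the proposed ProgressReporter API; the sketch only shows how a parent's fraction can be derived from its own count plus each child's fraction scaled by the count assigned to that child.

```swift
// Hypothetical tree node; single-threaded for brevity.
final class ProgressNode {
    let totalCount: Int
    var completedCount = 0
    private var children: [(node: ProgressNode, assignedCount: Int)] = []

    init(totalCount: Int) {
        self.totalCount = totalCount
    }

    // Gives `child` responsibility for `assignedCount` units of this node's total.
    func addChild(_ child: ProgressNode, assignedCount: Int) {
        children.append((node: child, assignedCount: assignedCount))
    }

    // Own completed units plus each child's fraction of its assigned units.
    var fractionCompleted: Double {
        guard totalCount > 0 else { return 0 }
        let fromChildren = children.reduce(0.0) { partial, child in
            partial + Double(child.assignedCount) * child.node.fractionCompleted
        }
        return min(1, (Double(completedCount) + fromChildren) / Double(totalCount))
    }
}
```

Observing an intermediate node's fractionCompleted is the sketch's equivalent of showing sub-progress for part of a larger operation. Each child here has exactly one parent, which is the constraint the multiple-parents discussion pushes against.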

I've seen it said a number of times that we should aspire to make simple things easy, and make hard things possible. I fully acknowledge that having a complex graph as shown above falls into the "hard things" category.

The concern I have is that the proposed design makes that hard thing nearly impossible. The closest I've found for making it possible is this: for every progress that may want to report to multiple parents, create a "shadow copy" of the progress, storing that in the original progress's userInfoDictionary. The shadow observes the fractionCompleted of the original (using KVO) and updates its own progress to mirror that fractionCompleted, which can then be attached to a parent as desired.

Some sample code:

class ProgressMirror: NSObject {
    private let totalUnitCount: Int64 = 100
    private weak var source: Progress?
    private var targets = [Progress]()
    
    init(source: Progress) {
        self.source = source
        super.init()
        startObserving(source: source)
    }
    
    func startMirroring(to progress: Progress) {
        guard let source else {
            return
        }
        
        progress.totalUnitCount = totalUnitCount
        targets.append(progress)
        synchronize(source: source, target: progress)
    }
    
    override public func observeValue(forKeyPath keyPath: String?, of object: Any?, change: [NSKeyValueChangeKey: Any]?, context: UnsafeMutableRawPointer?) {
        guard let source = object as? Progress else {
            return
        }
        
        for target in targets {
            synchronize(source: source, target: target)
        }
        
        if source.isFinished || source.isCancelled {
            collect()
        }
    }
}


// MARK: - Private API
//
private extension ProgressMirror {

    /// Replicates the state of a source Progress into a target Progress
    ///
    func synchronize(source: Progress, target: Progress) {
        
        /// Rather than just carrying over the `completedUnitCount`, we rely on the `fractionCompleted`
        target.completedUnitCount = Int64(source.fractionCompleted * Double(totalUnitCount))
        
        if source.isPaused {
            target.pause()
        }
        
        if source.isFinished {
            target.markAsComplete()
        }
        
        if source.isCancelled {
            target.cancel()
        }
    }
    
    /// Initializes the receiver as the KVO Observer
    ///
    func startObserving(source: Progress) {
        source.addObserver(self, forKeyPath: #keyPath(Progress.totalUnitCount), context: nil)
        source.addObserver(self, forKeyPath: #keyPath(Progress.fractionCompleted), context: nil)
        source.addObserver(self, forKeyPath: #keyPath(Progress.completedUnitCount), context: nil)
        source.addObserver(self, forKeyPath: #keyPath(Progress.isCancelled), context: nil)
        source.addObserver(self, forKeyPath: #keyPath(Progress.isPaused), context: nil)
    }
    
    /// Removes KVO Observers and cleans up our self reference
    ///
    func collect() {
        guard let source else {
            return
        }
        
        source.removeObserver(self, forKeyPath: #keyPath(Progress.totalUnitCount))
        source.removeObserver(self, forKeyPath: #keyPath(Progress.fractionCompleted))
        source.removeObserver(self, forKeyPath: #keyPath(Progress.completedUnitCount))
        source.removeObserver(self, forKeyPath: #keyPath(Progress.isCancelled))
        source.removeObserver(self, forKeyPath: #keyPath(Progress.isPaused))
        source.setUserInfoObject(nil, forKey: .mirror)
    }
}

This is doable, but it's a serious pain, and that extra observation has a performance impact. We've had three people just on this single thread report that they have use cases for progress reporting to multiple parents, so this doesn't seem like an extremely rare use case. Surely there's a better way to do this than mirroring progress objects via KVO!


Yes, this sort of "mirror" is exactly what we've had to resort to as well. It works, but it's clunky.


For anyone following along, I just stumbled across the formal review thread for this. It looks basically identical to the original pitch: [Review] SF-0023: Progress Reporting in Swift Concurrency
