[Pitch] Progress Reporting in Swift Concurrency

I think it's even worse than the non-atomicity suggests, because reduce is modifying these properties asynchronously in response to progress updates from children. So even if my uses of my properties are single-threaded (which the API in no way guarantees), I can still race against my children, whom in general I have no control over.

Another problem with NSProgress that this proposal also ignores is that of "update frequency". With NSProgress you were supposed to KVO it to update your UI, but its value could be updated arbitrarily quickly, leading to performance problems. With ProgressReporter, the problem is punted to the observer. If the observer is always SwiftUI, that might almost be OK; I presume it already throttles updates to the refresh rate. But if someone's trying to integrate this into AppKit/UIKit/whatever on another platform, they will have to add custom logic to throttle their updates from the ProgressReporter.

I also think there's an internal problem with frequent updates: basically, the way the API is constructed, whenever a child updates its properties:

  • the parent constructs a new aggregate Properties instance
  • for each of its children,
    • the parent reads the child's properties
    • the parent calls reduce to fold those properties into the new aggregate
  • the parent replaces its own properties with the newly-reduced version, which then recursively applies this procedure in its parent (throwing away any manual changes that were made to its own properties; that might be another issue?).

If children are spamming updates into a large tree, the overhead of this recomputation could easily overwhelm the actual operations being performed.

I see, thanks for the clarifications! I tried it out and it is true that when two threads try to mutate the same property of a ProgressReporter, such as totalCount, in place at the same time, the results become inconsistent. The end result sometimes reflects mutations from both threads (good), but sometimes reflects a mutation from only one thread (really bad).

In this case, would the best way to approach this be to expose the Mutex backing totalCount and properties directly so that developers always have to do the mutations within withLock closures?

Or would it be better to not expose a setter via the properties and instead introduce a method that will set the property, similar to how completedCount exposes a getter but will only be mutated through calls to the method complete(count:)?

Yep, you're right about the constraints here. I think I could've been clearer with what I meant, let me clarify this in code, and feel free to let me know what you think about this!

Taking your example, the call site I was envisioning would look something like this:

// Me:
struct MyProperties: ProgressProperties { ... }
let myReporter = ProgressReporter<MyProperties>(totalCount: 200)

// to hold reference to traverse reporter
let traverseIntermediary = myReporter.assign(totalCount: 100, kind: DirectoryTraversalProperties.self) 
let traverseReporter =  traverseIntermediary.reporter(totalCount: 100) 

// to hold reference to transcode reporter
let transcodeIntermediary = myReporter.assign(totalCount: 100, ImageTranscoderProperties.self) 
let transcodeReporter =  transcodeIntermediary.reporter(totalCount: 100) 

// assign from traverse reporter & transcode reporter 
try await traverse(
    "path",
    glob: "*.png",
    progress: traverseReporter.assign(count: 100)
) { path in
    try await transcode(
        path,
        toFormat: .jpeg,
        dest: path.replacing(".png", with: ".jpeg"),
        progress: transcodeReporter.assign(count: 100)
    )
}

This way, we'll be able to access properties of from traversing the directory and transcoding images from the references that we hold to the traverseReporter and transcodeReporter.

1 Like

One note for the potential users of this API as pitched; it would interoperate with the Observed AsyncSequence when accessing the ProgressReporter - so any calculation for that done in the closure passed to Observed would emit a sequence of values composed from said progress.

3 Likes

Progress reporting is similar to metrics. What we are doing in swift-metrics is only expose compound actions e.g. our counter has a method called increment. The underlying implementation can then decide how to safely implement the compound action. This can either be done by doing the get and set in a single withLock "transaction" or in some cases backed by atomic operations.

1 Like

Sounds good, thank you! I'll make sure to resolve the issues of backing totalCount and properties with a lock in this API.

In my mind, this makes the proposed design a non-starter. I need progress values to be part of multiple hierarchies, and it's a total PITA to get Foundation.Progress to do this.

Here's a very very simple example:

  • in my app, there's a central manager that downloads things
  • downloads are uniquely identified with stable identifiers
  • the download manager supports downloading packages of things

The problem:

  • Multiple packages may request the same thing to be downloaded
  • I don't want to download the same thing multiple times
  • Each package should be reporting the progress of the single, unique download

Currently I've got a helper on Foundation.Progress that creates a new progress that basically KVO's everything on the original and copies over the values to the new progress, because Foundation.Progress throws an exception if you try to add the same progress as the child of multiple parents.

When I started reading this, I was really hoping to see an observation-based approach, and not a parent-child approach. It was quite disappointing to not see this addressed, since IMO this is one of the fatal flaws of Foundation.Progress and makes this entire API wholly undesirable.

I've also been finding immense value in Foundation.Progress's resumingHandler, cancellationHandler, and pausingHandler. They've basically allowed me to use a Progress almost like a future; I can return it out of the "async" method and then allow clients to pause and resume and cancel and can easily react to that.

I take it then that the expectation of this API is that I can't use it as-is and will always be expected to wrap it up in some sort of action-providing thing that implements pause(), resume(), and cancel()?

Hi Dave! Yes, as mentioned in the alternatives considered section of the proposal, we decided that this API will not support cancellation, pausing and resuming the way Foundation.Progress did due to two main reasons:

  1. ProgressReporter's goal is to report progress, and it does not assume ownership over the task that is reporting on. The consequence of this is that ProgressReporter will not be responsible for acting upon the task that it is reporting on.

  2. Task cancellation is a feature that is available in Swift concurrency, and we encourage developers wishing to cancel a task to utilize that feature. This is to more clearly separate the responsibilities of a Task and a ProgressReporter.

1 Like

If all the API is doing is conveying some information downstream to people listening for changes... then maybe progress reporting might be better modeled as an AsyncSequence of progress values with some special combineLatest() operators to handle merging "child" progress values into a single, overall progress value.

4 Likes

Hi Dave, would you mind clarifying further what this progress tree would look like in this case?

Is it that there's an overall ProgressReporter, and that it has children ProgressReporter which would each address downloads happening in a single package?

Imagine you're building an app that supports downloading packages. Each package consists of one or more assets that must be downloaded. You establish some requirements:

  • Each asset should only be downloaded once, regardless of how many packages it "belongs" to
  • Each asset download should be able to report its individual progress
  • Each package should be able to report its progress of downloading its referenced assets
  • It would be nice to get a total progress of all asset downloads
  • It would be nice to get a total progress of all packages

Drawing this out, you might end up with a progress reporting hierarchy like the following (A through F are the individual asset downloads; Pkg 1 through Pkg 3 are three packages of assets, which each reference 3 assets that must be downloaded. Arrows represent progress reporting for that thing).

As you can see, it is trivial to imagine scenarios where a single progress reporter would want to belong to multiple hierarchies. For example download A technically would have direct "parents" of Pkg 1, Pkg 2, and Total Download Progress.

This is why the single parent model is insufficient.

I think the continued usage of "parent" and "children" and "tree" is really complicating this pitch and design.

A thing produces progress and zero or more things want to consume that progress. I think we should be looking at producer/consumer or pub/sub styles of progress reporting, not strict and inflexible parent/child hierarchies.

9 Likes