Introducing `Kubrick`, a director for resilient, long-running, asynchronous jobs

Kevin_Wooten · October 31, 2023, 5:10pm

We've built a new Job system named Kubrick. It is a director for resilient, long-running, asynchronous Jobs. Kubrick ensures job completion, even across process restarts and, possibly, transfers between processes (e.g. from an app extension to the app itself).

How does it differ from current solutions like OperationQueue?

Embraces Swift Concurrency

Jobs are, at their heart, async functions with the ability to link dependencies to their inputs. Integration with asynchronous code or frameworks is easy and natural. For example, implementing a "delay" is as easy as calling Task.sleep.
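To illustrate the idea, here is a minimal sketch in plain Swift Concurrency. It is not Kubrick's actual API; `delay`, `loadGreeting`, `loadName`, and `greetingJob` are made-up names, and the "dependencies" are simply other async functions awaited as inputs.

```swift
// Hypothetical names throughout; plain Swift Concurrency, not Kubrick's API.

// A "delay" step is just a suspension via Task.sleep.
func delay(milliseconds: UInt64) async throws {
    try await Task.sleep(nanoseconds: milliseconds * 1_000_000)
}

// Two independent input steps.
func loadGreeting() async throws -> String {
    try await delay(milliseconds: 10)
    return "Hello"
}

func loadName() async throws -> String {
    try await delay(milliseconds: 20)
    return "Kubrick"
}

// The job starts both inputs concurrently and awaits them before
// running its own body.
func greetingJob() async throws -> String {
    async let greeting = loadGreeting()
    async let name = loadName()
    let (g, n) = try await (greeting, name)
    return "\(g), \(n)!"
}
```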

Resilient & Idempotent

Jobs are stored and resurrected when necessary to ensure they complete (success or failure), and Kubrick ensures that each Job runs to completion only once. If a tree of Jobs is resurrected, only the Jobs that have not finished are executed.
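The "run to completion only once" idea can be sketched roughly as follows. This is an illustration only, not how Kubrick is implemented: `JobLedger` and `runOnce` are hypothetical names, and the ledger lives in memory, whereas a real system would persist it so that it survives a process restart.

```swift
// Hypothetical sketch; names are illustrative and not Kubrick's API.
// Assumption: results are kept in memory here. Surviving a restart would
// require persisting the ledger to disk.
actor JobLedger {
    // One task per job ID; its outcome (success or failure) is reused.
    private var tasks: [String: Task<Int, Error>] = [:]
    // Counts how many times work was actually started.
    private(set) var executions = 0

    // Starts `work` only if nothing is recorded for `id`; otherwise the
    // recorded outcome is returned without running `work` again.
    func runOnce(
        id: String,
        work: @escaping @Sendable () async throws -> Int
    ) async throws -> Int {
        if let existing = tasks[id] {
            return try await existing.value
        }
        executions += 1
        let task = Task { try await work() }
        // Stored before any suspension, so a concurrent call with the
        // same ID sees this task instead of starting a second one.
        tasks[id] = task
        return try await task.value
    }
}
```

Submitting the same ID twice returns the first outcome and leaves `executions` at 1, which is the behavior a resurrected tree relies on: finished Jobs are skipped, unfinished ones run.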

System Integration

Kubrick integrates with persistent, long-running system services, allowing Jobs to be written that use these services and, if the process is stopped or backgrounded, pick up where they left off. Currently background URLSessions and UserNotifications are implemented.

Multi-process Coordination

Kubrick runs as a "principal" director in a main process (e.g. an Application) and "assistant" directors in cooperating processes (e.g. an Extension). Jobs started in an assistant are automatically transferred to the principal if the assistant's process stops or faults before the Jobs are finished. If the principal's process is not running, Jobs will be resumed at its next execution.

You can check out the documentation here. It is currently unfinished, but it shares the ideas and has a nice Getting Started guide that walks you through setting up Director(s) and creating Jobs.

Have a look and let me know what you think!

Sajjon · October 31, 2023, 6:16pm

Skimmed through the Getting Started documentation, but I guess I never really got an answer to my primary question: what does this solve? I’m certain it solves a lot of things, and the documentation was quite good, but I couldn’t really focus because my burning need for a juxtaposition example vs. TaskGroup or other low-level core async building blocks never got satisfied. Maybe it is just how my brain works, but when I find a new Swift package I want to see at the very start:
“Without this you have to do these 48 lines, with this package you only need these 15 lines”

Or “without this package you get good features X,Y with 30 lines, with this package you get good features X,Y AND Z with 25 lines” etc

Kevin_Wooten · October 31, 2023, 6:45pm

@Sajjon Good point, that should probably be made clear...

Kubrick is about long-running jobs (minutes, maybe hours, or even days). It is probably overkill for anything that is expected to be completed in short order.

As an example... you have a number of large files you must download and, after they are all downloaded, you need to post-process them as a unit. Apple OSs require this to be done using a background URLSession because the OS may background your app, which can include killing your app. Using a background URLSession will cause the OS to launch the app when your downloads are finished, but it's up to the developer to "pick up where you left off". Kubrick solves this problem by allowing you to simply write async functions and taking care of the rest.
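For contrast, here is a minimal in-process sketch of "download everything, then post-process as a unit" using a task group. It is plain Swift Concurrency rather than Kubrick, `download` is a hypothetical stand-in for a real transfer, and summing the sizes stands in for post-processing. All progress lives in memory, so nothing here survives the app being killed partway through, which is exactly the gap described above.

```swift
// Hypothetical stand-in for a real download; returns the byte count fetched.
func download(_ name: String) async throws -> Int {
    try await Task.sleep(nanoseconds: 5_000_000)
    return name.utf8.count
}

// Post-processing (here: summing the sizes) runs only after every
// download in the group has finished.
func downloadAllThenProcess(_ names: [String]) async throws -> Int {
    let sizes = try await withThrowingTaskGroup(of: Int.self) { group -> [Int] in
        for name in names {
            group.addTask { try await download(name) }
        }
        var results: [Int] = []
        for try await size in group {
            results.append(size)
        }
        return results
    }
    return sizes.reduce(0, +)
}
```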

There are a lot of situations like this in the iOS (and related) world. Another example is showing a UserNotification and then responding to the user's action, which may happen days later when your app is no longer running. Again, you are faced with the "pick up where you left off" problem. Kubrick Jobs solve this easily.

Kubrick integrates with the URLSession and UserNotification services because they are ubiquitous, but any external persistent service can have this problem, and on iOS/watchOS/tvOS you have to prepare for the eventuality of the app being backgrounded and/or killed by the OS.

Additionally, Kubrick's resilience features help with any task, even a fairly short-lived one, that must be completed in the face of abnormal termination. Submitting a task as a Job ensures that, even if the user kills the app or it crashes due to an internal error, the task will be restarted/resumed when the app next starts.

Thanks for the feedback and I hope that sufficiently demonstrates its uses for you.

Kevin_Wooten · October 31, 2023, 7:02pm

BTW, both of the examples I gave above get infinitely harder when you add in app extensions. Starting a background URLSession download in an app extension (e.g. a share extension) means it may (but not always) have its completion event sent to the main application. A similar situation exists for UserNotifications presented from an extension.

These scenarios require developers to write a large amount of code to coordinate work between the multiple processes. Kubrick has specific facilities to make this coordination automatic and hidden, all while ensuring that Jobs always execute to completion.