New ServiceLifecycle APIs

A couple of weeks ago we tagged a new 2.0.0 alpha version of Swift Service Lifecycle, which fully embraces Swift Structured Concurrency. Service Lifecycle's goal is to provide a library that helps applications manage their internal subsystems in a structured way.

Structured Concurrency

Structured Concurrency was introduced with SE-0304 and enables developers to organise their code into high-level tasks and their child tasks. Structured Concurrency allows information such as task priority, task cancellation or user-provided values to flow up and down the task tree. Furthermore, Structured Concurrency brings compile-time and runtime guarantees that every child task has finished by the end of its scope. These features make Structured Concurrency ideal for building complex systems such as server applications while keeping code readable and maintainable.
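As a small illustration of these guarantees (a sketch using only the Swift standard library; `runChildren` is an illustrative name), cancelling a parent task cancels every child in its task group, and the group does not return until all of its children have finished:

```swift
/// Starts `count` child tasks that each sleep for a minute unless cancelled,
/// and returns how many of them observed cancellation.
func runChildren(count: Int) async -> Int {
    return await withTaskGroup(of: Bool.self, returning: Int.self) { group in
        for _ in 0..<count {
            group.addTask {
                do {
                    try await Task.sleep(nanoseconds: 60_000_000_000)
                    return false // slept the full minute, never cancelled
                } catch {
                    // Task.sleep throws CancellationError once the task is cancelled.
                    return Task.isCancelled
                }
            }
        }
        // withTaskGroup does not return until every child has finished.
        var cancelledChildren = 0
        for await wasCancelled in group where wasCancelled {
            cancelledChildren += 1
        }
        return cancelledChildren
    }
}

// Cancelling the parent task propagates down to all of its children.
let parent = Task { await runChildren(count: 3) }
parent.cancel()
let observed = await parent.value
print(observed) // 3
```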

New Lifecycle APIs

Most applications are composed of multiple internal systems that make up their business logic. Before Swift Structured Concurrency was introduced, ServiceLifecycle provided APIs to manage the startup and shutdown sequences of these internal systems; those APIs focused on running startup and shutdown work in a fixed order. With Structured Concurrency, this can now be expressed in plain Swift code, so we worked on a new version of ServiceLifecycle that leverages Structured Concurrency to orchestrate a service's internal systems. At its core, the new API consists of a protocol called Service and an actor called ServiceGroup. The former defines a common API to represent a subsystem. The latter takes a group of services and manages their lifecycle in a structured way; in practice, it runs each of them in a separate child task using a TaskGroup.
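Conceptually, this TaskGroup-based orchestration can be modelled in a few lines of plain Swift. This is a simplified sketch, not ServiceLifecycle's actual implementation: `SimpleService`, `ServiceFinishedEarly` and `runServiceGroup` are hypothetical names, and the sketch assumes the behaviour described later in this thread, namely that a service returning early brings the whole group down.

```swift
/// A hypothetical, stripped-down stand-in for ServiceLifecycle's Service protocol.
protocol SimpleService: Sendable {
    var name: String { get }
    func run() async throws
}

/// Thrown when a service returns from run() before it was asked to stop.
struct ServiceFinishedEarly: Error {
    let name: String
}

/// Runs every service in its own child task. As soon as one service
/// returns or throws, the others are cancelled and the group unwinds.
func runServiceGroup(_ services: [any SimpleService]) async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        for service in services {
            group.addTask {
                try await service.run()
                // A service returning on its own counts as a failure.
                if !Task.isCancelled {
                    throw ServiceFinishedEarly(name: service.name)
                }
            }
        }
        // Cancel the remaining services however the first one ends.
        defer { group.cancelAll() }
        // Rethrows the first child's error, if it threw one.
        _ = try await group.next()
    }
}

struct SleepingService: SimpleService {
    let name: String
    let nanoseconds: UInt64

    func run() async throws {
        try await Task.sleep(nanoseconds: nanoseconds)
    }
}

var finishedEarly: String?
do {
    try await runServiceGroup([
        SleepingService(name: "long", nanoseconds: 60_000_000_000),
        SleepingService(name: "short", nanoseconds: 10_000_000),
    ])
} catch let error as ServiceFinishedEarly {
    finishedEarly = error.name
} catch {
    print("unexpected error: \(error)")
}
print(finishedEarly ?? "none") // short
```

ServiceGroup itself does considerably more (logging, signal handling, graceful shutdown), so treat this only as a mental model.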

Example

import Logging
import ServiceLifecycle

actor FooService: Service {
    func run() async throws {
        print("FooService starting")
        try await Task.sleep(for: .seconds(10))
        print("FooService done")
    }
}

actor BarService: Service {
    func run() async throws {
        print("BarService starting")
        try await Task.sleep(for: .seconds(10))
        print("BarService done")
    }
}

@main
struct Application {
    static func main() async throws {
        let logger = Logger(label: "Application")
        let fooService = FooService()
        let barService = BarService()

        let serviceGroup = ServiceGroup(
            services: [fooService, barService],
            configuration: .init(gracefulShutdownSignals: []),
            logger: logger
        )

        // This spawns a new child task for each service and calls the respective run method
        try await serviceGroup.run()
    }
}

Graceful shutdown

In addition to orchestrating Services, ServiceLifecycle introduces a new concept called graceful shutdown. Graceful shutdown is closely related to task cancellation; however, while handling task cancellation is required, handling graceful shutdown is opt-in. A common pattern with networked applications like web services is to gracefully shed load when handling a shutdown signal (most commonly SIGTERM), then exit the process once all outstanding requests have been handled. The graceful shutdown API is designed to help with this scenario. Its APIs are very similar to the task cancellation APIs, and graceful shutdown propagates down the task tree in the same way task cancellation does. Additionally, ServiceLifecycle provides convenience APIs to trigger task cancellation on graceful shutdown.

public func withGracefulShutdownHandler<T>(
    operation: () async throws -> T,
    onGracefulShutdown handler: @Sendable @escaping () -> Void
) async rethrows -> T

Task.isShuttingDownGracefully

public func cancelOnGracefulShutdown<T>(
    _ operation: @Sendable @escaping () async throws -> T
) async rethrows -> T?

extension AsyncSequence where Self: Sendable, Element: Sendable {
    public func cancelOnGracefulShutdown() -> AsyncCancelOnGracefulShutdownSequence<Self>
}
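The difference between the two signals can be modelled without the library. In this sketch (not ServiceLifecycle API), the hypothetical `serve` function treats the end of its input stream as graceful shutdown, draining requests that are already queued, while cancelling its task stops it immediately:

```swift
/// Hypothetical request handler: handles requests until its input stream ends.
/// Finishing the stream plays the role of graceful shutdown (stop accepting,
/// drain what is queued); cancelling the task plays the role of a forceful stop.
func serve(_ requests: AsyncStream<Int>) async -> [Int] {
    var handled: [Int] = []
    for await request in requests {
        handled.append(request)
    }
    return handled
}

// Graceful: no new requests are accepted, but queued ones are still handled.
let (requests, continuation) = AsyncStream.makeStream(of: Int.self)
for id in 1...3 { continuation.yield(id) }
continuation.finish()
let drained = await serve(requests)
print(drained) // [1, 2, 3]

// Forceful: cancellation ends iteration without waiting for the stream to finish.
let (pending, pendingContinuation) = AsyncStream.makeStream(of: Int.self)
let forceful = Task { await serve(pending) }
forceful.cancel()
let forced = await forceful.value
pendingContinuation.finish()
print(forced) // []
```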

Example

import Logging
import ServiceLifecycle

actor CancellingService: Service {
    func run() async throws {
        try await cancelOnGracefulShutdown {
            try await Task.sleep(for: .seconds(10))
        }
    }
}

@main
struct Application {
    static func main() async throws {
        let logger = Logger(label: "Application")
        let cancellingService = CancellingService()

        let serviceGroup = ServiceGroup(
            services: [cancellingService],
            configuration: .init(gracefulShutdownSignals: [.sigterm]),
            logger: logger
        )

        try await serviceGroup.run()
    }
}

Feedback wanted

We are excited to see how everyone is adopting these new APIs and would love to hear your feedback!


We have added service-lifecycle to a number of internal libraries and we use it in applications that are running in production. Generally we really like the new APIs and we want to +1 them!

What we did before

Some of our components already followed a pattern of having a single run function which stops when it receives the task cancellation signal. We had further helper code for listening to Unix termination signals and cancelling the Task when certain signals were received. We were able to replace this with graceful shutdown listeners. We implement the Service protocol for all the components that we offer and expect our adopters to use ServiceLifecycle now.

Previously, we often found ourselves writing helper functions following the pattern withHTTPClient { httpClient in ... }, which would handle the lifecycle for us. The with function would start the service, call the closure, then stop the service, ensuring the service is always shut down, even if the closure throws an error. However, this resulted in heavily nested code when calling many with functions. It is also easy for users of our libraries to forget to use these helper functions and instead instantiate services directly. In that scenario, they may forget to shut them down correctly.
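A minimal sketch of such a with-style helper, assuming a hypothetical `Resource` type with an async `shutdown()` in place of a real client such as HTTPClient. It shows both the benefit (shutdown happens even when the closure throws) and the cost (each dependency adds a level of nesting):

```swift
/// Records the order in which resources are shut down.
actor ShutdownLog {
    private(set) var names: [String] = []
    func record(_ name: String) { names.append(name) }
}

/// Hypothetical resource with an async shutdown.
struct Resource: Sendable {
    let name: String
    let log: ShutdownLog

    func shutdown() async { await log.record(name) }
}

/// The with-pattern: create the resource, run the closure, and always shut the
/// resource down, even when the closure throws. Both paths shut down explicitly.
func withResource<T>(
    named name: String,
    log: ShutdownLog,
    _ body: (Resource) async throws -> T
) async throws -> T {
    let resource = Resource(name: name, log: log)
    do {
        let result = try await body(resource)
        await resource.shutdown()
        return result
    } catch {
        await resource.shutdown()
        throw error
    }
}

struct BodyFailed: Error {}

let shutdownLog = ShutdownLog()
do {
    // Every additional dependency adds another level of nesting.
    try await withResource(named: "http", log: shutdownLog) { _ in
        try await withResource(named: "db", log: shutdownLog) { _ in
            throw BodyFailed()
        }
    }
} catch {
    // Both resources were shut down, innermost first.
    let order = await shutdownLog.names
    print(order) // ["db", "http"]
}
```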

Why service-lifecycle is better

For libraries, adopting service-lifecycle has made our code much simpler to write and to understand, as service-lifecycle does a lot of the heavy lifting for us. We were able to remove our own Unix signal handling, which our adopters used, since this functionality is now provided by ServiceLifecycle.

For applications, developers no longer need to ensure they shut down every service cleanly in every scenario (e.g. error cases). The with helper functions make this easier, but result in heavily nested, hard-to-read code. With service-lifecycle, it becomes very clear what is running and in what order the dependencies are.

Our experience

Conforming libraries to service-lifecycle

In the case of components which already have a single run function, adopting service-lifecycle is trivial and has made our code much simpler to write and to understand. Furthermore, we now have a concept of graceful shutdown, which allows us to implement more elaborate shutdown logic. For example, an HTTP server can wait for in-flight requests to finish when asked to shut down gracefully, but stop them forcefully when the Task is cancelled.

Adapting legacy libraries

For components which are not based on the pattern of a single run function, adapting them is not too difficult. We found two common patterns.
The first is components which have a start function, a stop function, and some way to wait for the shutdown to happen. They can implement a run function as follows:

public func run() async throws {
    // start(), shutdown() and shutdownFuture are the component's existing API.
    try await cancelOnGracefulShutdown {
        try await withTaskCancellationHandler {
            try await self.start()
            try await self.shutdownFuture.get()
        } onCancel: {
            self.shutdown()
        }
    }
}
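The same start/stop shape can be exercised without any library. `LegacyServer` below is a hypothetical component, and this sketch covers task cancellation only; in the snippet above, `cancelOnGracefulShutdown` additionally turns graceful shutdown into cancellation.

```swift
/// Hypothetical legacy component: explicit start/stop calls plus a way to wait
/// until it has stopped (the role shutdownFuture plays in the snippet above).
actor LegacyServer {
    private(set) var isRunning = false
    private var stopWaiters: [CheckedContinuation<Void, Never>] = []

    func start() { isRunning = true }

    func stop() {
        isRunning = false
        for waiter in stopWaiters { waiter.resume() }
        stopWaiters.removeAll()
    }

    func waitUntilStopped() async {
        guard isRunning else { return }
        await withCheckedContinuation { stopWaiters.append($0) }
    }

    /// Single-function style run(): start, then suspend until stopped.
    /// Cancelling the surrounding task triggers stop().
    func run() async {
        await withTaskCancellationHandler {
            start()
            // Cancellation may have arrived before start(); stop right away then.
            if Task.isCancelled { stop() }
            await waitUntilStopped()
        } onCancel: {
            // onCancel is synchronous and may run on any thread, so hop back
            // onto the actor with an unstructured task.
            Task { await self.stop() }
        }
    }
}

let server = LegacyServer()
let runTask = Task { await server.run() }
try await Task.sleep(nanoseconds: 50_000_000) // let run() start the server
runTask.cancel()
await runTask.value // run() returns because cancellation stopped the server
let stillRunning = await server.isRunning
print(stillRunning) // false
```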

The second type is components which have no way of waiting until shutdown. They need some way to pause until shutdown is requested.

For example, AsyncHTTPClient does not have any equivalent to the run function but instead requires shutting down once it is no longer needed. We want to use AHC in projects which otherwise use service-lifecycle. We achieved this by implementing the conformance ourselves in two steps:

  1. Do nothing until shutdown signal is received
  2. Shutdown

This requires blocking execution until the shutdown signal is received. We implemented an actor to keep track of this state, as follows:

actor CancellationWaiter {
    private var taskContinuation: CheckedContinuation<Void, Never>?
    // Remembers that stop() already ran, so a stop that races ahead of wait() is not lost.
    private var isStopped = false

    init() {}

    func wait() async {
        await withTaskCancellationHandler {
            await withGracefulShutdownHandler {
                await withCheckedContinuation { continuation in
                    if self.isStopped {
                        continuation.resume()
                    } else {
                        self.taskContinuation = continuation
                    }
                }
            } onGracefulShutdown: {
                Task {
                    await self.stop()
                }
            }
        } onCancel: {
            Task {
                await self.stop()
            }
        }
    }

    private func stop() {
        self.isStopped = true
        self.taskContinuation?.resume()
        self.taskContinuation = nil
    }
}

Then the run function can be implemented as:

extension HTTPClient: Service {
    public func run() async throws {
        await CancellationWaiter().wait()
        try await self.shutdown()
    }
}

This is quite verbose, but is a small price to pay. In exchange, our adopters can specify their dependencies and have the lifecycle managed for them.

In any case, care is needed to ensure that the run function does not terminate prematurely, as this is considered an error by service-lifecycle, and would trigger a full shutdown of all services.
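For the waiting step itself, a shorter alternative to the actor above is to iterate an `AsyncStream` that never yields: `AsyncStream` ends iteration when the surrounding task is cancelled, so the function below only returns on cancellation and cannot finish prematurely. `waitUntilCancelled` is a hypothetical name, and the sketch handles task cancellation only.

```swift
/// Suspends until the current task is cancelled. The stream never yields and
/// is never finished, so cancellation is the only way out of the loop.
func waitUntilCancelled() async {
    let (stream, continuation) = AsyncStream.makeStream(of: Never.self)
    for await _ in stream {}
    // Keep the continuation referenced for the whole wait.
    withExtendedLifetime(continuation) {}
}

let waiter = Task { () async -> String in
    await waitUntilCancelled()
    return "stopped"
}
try await Task.sleep(nanoseconds: 50_000_000)
waiter.cancel()
let outcome = await waiter.value
print(outcome) // stopped
```

Inside a Service, wrapping the call as `try await cancelOnGracefulShutdown { await waitUntilCancelled() }` should make graceful shutdown end the wait as well, since `cancelOnGracefulShutdown` cancels its operation when graceful shutdown is triggered.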

Adopting service-lifecycle in applications

Our applications which use service-lifecycle typically need to instantiate the services one by one, often in a particular order if one service needs a reference to another. These services then need to be added to the ServiceGroup and run. The API is generally simple and easy to use.
This makes handling shutdown very easy, as it is done for us, and in order. However, legacy libraries either need to be adapted using the pattern above or managed separately outside of the ServiceGroup.
It is easy to forget to add a service to the service group; a forgotten service never runs, which can cause the application to misbehave.

This is a big improvement over the patterns we were previously using. Adopters don’t need to worry about ensuring every service is shut down correctly or about handling signals.

Testing a library with service-lifecycle

The run function of a Service is a normal Swift function and so can be tested trivially. To check that shutdown is handled correctly, ServiceLifecycleTestKit provides a testGracefulShutdown { shutdownManager in ... } helper function (called with try await). We simply need to start our service, call shutdownManager.triggerGracefulShutdown(), and then assert that the service has shut down. Depending on the service, we might want to make further assertions that things were shut down cleanly and correctly.
This is really clean and simple!

Nesting services

For advanced use cases, we found the API to be flexible enough to allow us to nest services. For example, our run function can run a TaskGroup, which in turn runs multiple services underneath. This allows us to expose a group of Services as a single Service to our adopters.

Shortcomings

One potential shortcoming of the current API is that there is no way to wait for a service to finish starting up. Services are started one after the other. However, it is possible to work around this. For example, we can wrap service B in a LazyService (an implementation of Service which waits for a continuation from service A before running the underlying service B). This way, we ensure B is not started before A has reached a certain point in its startup.
Service-lifecycle should be able to add helper functions for these use cases in the future if needed, without breaking API.
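One way the LazyService workaround might look, sketched with hypothetical `StartupGate`, `LazyService` and `EventLog` types rather than ServiceLifecycle API: service A opens a one-shot gate once its start-up work is done, and the wrapper around service B waits on that gate before running B's body.

```swift
/// Hypothetical one-shot gate: closed until open() is called.
actor StartupGate {
    private var isOpen = false
    private var waiters: [CheckedContinuation<Void, Never>] = []

    func open() {
        isOpen = true
        for waiter in waiters { waiter.resume() }
        waiters.removeAll()
    }

    func wait() async {
        if isOpen { return }
        await withCheckedContinuation { waiters.append($0) }
    }
}

/// Hypothetical wrapper in the spirit of the LazyService described above:
/// it waits for the gate before running the wrapped service body.
struct LazyService: Sendable {
    let gate: StartupGate
    let body: @Sendable () async throws -> Void

    func run() async throws {
        await gate.wait()
        try await body()
    }
}

/// Collects events so the start-up order can be checked.
actor EventLog {
    private(set) var entries: [String] = []
    func append(_ entry: String) { entries.append(entry) }
}

func demoStartupOrder() async throws -> [String] {
    let events = EventLog()
    let gate = StartupGate()
    let serviceB = LazyService(gate: gate) { await events.append("B running") }

    try await withThrowingTaskGroup(of: Void.self) { group in
        group.addTask { try await serviceB.run() } // waits on the gate
        group.addTask {
            // Service A: finish some start-up work, then open the gate for B.
            try await Task.sleep(nanoseconds: 50_000_000)
            await events.append("A ready")
            await gate.open()
        }
        try await group.waitForAll()
    }
    return await events.entries
}

let order = try await demoStartupOrder()
print(order) // ["A ready", "B running"]
```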


Thanks for the great and detailed feedback!

This was an intentional design decision because I did not want to require services to expose their current state. What I recommend is that services offer an asynchronous sequence of their current state. You can then inject the service into other services that require it and they can wait until the right state transition has happened.

actor ServiceA {
    // Hypothetical `some` usage. Back this with a custom async sequence that broadcasts state changes.
    var state: some AsyncSequence<State>

    func run() async throws { ... }
}

actor ServiceB: Service {
    private let serviceA: ServiceA

    init(serviceA: ServiceA) {
        self.serviceA = serviceA
    }

    func run() async throws {
        _ = try await serviceA.state.first { $0 == .running }
        // Start to do your own logic here now
    }
}
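The comment in ServiceA leaves the broadcasting sequence unspecified. One possible shape, assuming a fresh `AsyncStream` per subscriber is acceptable, is an actor that remembers the latest state and yields every change to all subscribers. `ServiceState`, `StateBroadcaster` and `waitFor(_:on:)` are hypothetical names; replaying the current state on subscription avoids missing a transition that happened before the dependent service subscribed.

```swift
/// Hypothetical service states, playing the role of State above.
enum ServiceState: Sendable, Equatable {
    case starting, running, stopped
}

/// Remembers the latest state and fans every change out to all subscribers.
actor StateBroadcaster {
    private var current: ServiceState = .starting
    private var subscribers: [Int: AsyncStream<ServiceState>.Continuation] = [:]
    private var nextID = 0

    /// Returns a stream that yields the current state first, then every update.
    func subscribe() -> AsyncStream<ServiceState> {
        let (stream, continuation) = AsyncStream.makeStream(of: ServiceState.self)
        let id = nextID
        nextID += 1
        subscribers[id] = continuation
        continuation.onTermination = { [weak self] _ in
            // Drop the continuation once this subscriber's stream terminates.
            guard let self = self else { return }
            Task { await self.unsubscribe(id) }
        }
        continuation.yield(current)
        return stream
    }

    func update(_ state: ServiceState) {
        current = state
        for continuation in subscribers.values {
            continuation.yield(state)
        }
    }

    private func unsubscribe(_ id: Int) {
        subscribers[id] = nil
    }
}

/// Suspends until `broadcaster` reports `target`; returns false if the stream ends first.
func waitFor(_ target: ServiceState, on broadcaster: StateBroadcaster) async -> Bool {
    let states = await broadcaster.subscribe()
    for await state in states where state == target {
        return true
    }
    return false
}

func demoWaitForRunning() async -> Bool {
    let broadcaster = StateBroadcaster()
    // Whether the subscription happens before or after the update, the
    // replayed current state means the waiter still sees .running.
    async let sawRunning = waitFor(.running, on: broadcaster)
    await broadcaster.update(.running)
    return await sawRunning
}

let sawRunning = await demoWaitForRunning()
print(sawRunning) // true
```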

I will add this as documentation to our DocC articles!

I'd like to offer an alternative solution/mental model to tackle this. It may not work for all cases, but I am using it a ton and it makes great sense in my applications:

I tend to only treat components as a Service if they are actually listening for requests in some form or another (i.e., HTTP servers, subscriptions to message queues, task queues, whatever you have...).

All other components that are needed to deal with the requests are initialized and connected before the serviceGroup.run. This could be connections to databases, message brokers, redis, whatever you have. These components generally do not have to "run", just be there and be alive once the gates are opened.

I'm using a separate lifecycle/extensions container, much like Vapor or Hummingbird (but with async/await shutdown), to handle this generically.

This way an application lifecycle looks like this (pseudocode):

let extensions = try await initializeAllMyExtensions() // things are up and running
try await serviceGroup.run(myServices) // open the floodgates for requests
try await extensions.shutThemAllDown() // disconnect things in reverse order

In other words, if you extract the "we need to at least be here before other services are safe" bit out of the run and call it before, things get a lot simpler.
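The container approach can be sketched as follows. `Extension`, `ExtensionContainer` and `Connection` are hypothetical stand-ins for the container described above; the point is that dependencies are registered in start-up order and shut down in reverse once the service group has returned.

```swift
/// Hypothetical dependency that is connected before the services start and
/// does not need to "run" itself (a database connection, a Redis client, ...).
protocol Extension: Sendable {
    var name: String { get }
    func shutdown() async throws
}

/// Keeps extensions in initialisation order and shuts them down in reverse.
actor ExtensionContainer {
    private var extensions: [any Extension] = []

    func add(_ newExtension: any Extension) {
        extensions.append(newExtension)
    }

    /// Shuts everything down, most recently added first, and returns the
    /// names in shutdown order. A failing shutdown does not stop the others.
    func shutdownAll() async -> [String] {
        var order: [String] = []
        for ext in extensions.reversed() {
            try? await ext.shutdown()
            order.append(ext.name)
        }
        extensions.removeAll()
        return order
    }
}

struct Connection: Extension {
    let name: String
    func shutdown() async throws {}
}

let container = ExtensionContainer()
await container.add(Connection(name: "database"))
await container.add(Connection(name: "redis"))
// ... `try await serviceGroup.run()` would go here ...
let shutdownOrder = await container.shutdownAll()
print(shutdownOrder) // ["redis", "database"]
```

Swallowing shutdown errors with `try?` is a deliberate simplification so that one failing dependency does not prevent the others from being closed; a real container would probably log them.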