Hello everyone,
now that we've released our work-in-progress distributed actor cluster, I have a bit more time to reply here
To be clear though, "deadlines are a point in time" is not really because of distributed actors per se, but because any such "multiple threads / tasks / processes / computers" system (even a bunch of microservices) requires such a representation. Hopefully this post can clarify why.
@Dante-Broggi's example is quite good -- that's one of the issues. That's what we mean by "timeouts don't compose". To give another typical and very visual example of how "timeouts" are just plain wrong when transmitted to other peers:
- Client wants to express: "give me a response, within 5 seconds" (I'll use seconds for simple math).
- they do so at time T0
Consider sending the request with a timeout:
- set timeout 5 seconds
- locally schedule a timer to fail / stop waiting after 5 seconds (so far so good)
- also, transmit to the server "If it takes you longer than 5 seconds to produce a result, I'll already have cancelled my waiting and won't use the value; so don't bother producing the result anymore"
a) If you encode this as "5 seconds" in the request, this is what happens:
"client: ok, timeout 5 seconds"
T0 Tcd Tx
|----\[serialize][send] ~ network latency ~ [server receive][server replies]|
\---------\ <client timeout> |
\----\ |
\~~~~~~~~~~~~~~~~~~~\ |
\--------------\ |
\--------------\|
^
^
^
"oh, client wants 5 seconds... ok, set timeout 5 seconds..." --------------->
The server just taking the "oh, 5 seconds, ok" at face value is not what the client expressed, and that timeout is pretty useless: it fails to account for any async/serialization/network latency involved in the message transfer.
b) The same request expressed as a "point in time" deadline works correctly, assuming the clock drift between the two machines isn't too terribly out of whack. I'd suggest using generous deadlines here, not super tight ones -- this is about avoiding "clearly unnecessary work", not about guaranteeing "ok, it'll definitely complete in 20ms"...
"client: ok, deadline = now() + 5 seconds"
T0 Tcd
|----\[serialize][send] ~ network latency ~ [server receive][server replies]
\---------\ <client deadline exceeded>
\----\
\~~~~~~~~~~~~~~~~~~~\
\--------------\
\ cancelled=true
^
^
^
"ok, deadline min(now + maxAllowedTimeout, Tcd) == Tcd" --> xxxxxxxxxxxxxxxxxxxxxxx
Notice that the server does not have to JUST blindly trust the point in time given by the client.
The client's clock could be complete nonsense, and we guard against that in two ways:
- if the Tcd point in time we got is already in the past -> we can immediately cancel and not even begin work on this request; the client's clock is out of whack (we can allow some tolerance threshold here etc...)
  - again... those deadlines are best-effort and generous -- not "100ns"
- if Tcd is way too far in the future, we bound it with whatever the server considers a max timeout for a request.
  - This makes sense because services usually have some SLO (service level objective) for how fast they should be able to reply to requests. So if the SLO is 400ms, we can set this to "well, if this is taking longer than our SLO, definitely cancel the work" (and alert that we're missing our objectives).
So... Does anyone do this? Yeah, it's common in Go services:
The same applies equally to plain asynchronous tasks locally -- you want to set a specific deadline by which tasks should be completed, not just "5 seconds", which doesn't mean anything, as shown by @Dante-Broggi's example. E.g. every time you enter a function with "timeout 5 seconds", it starts a fresh 5 seconds... even if it's already many seconds past "5 seconds from the beginning of the first task".
Also, in Swift Concurrency we'll want to do the same thing:
```swift
// PSEUDO CODE, NOT A PROPOSAL
task.deadline = .now() + .seconds(5)
// if the deadline was already, say, .now() + .seconds(1) -> DON'T UPDATE THE DEADLINE
```
If this task, or the parent task, already had a deadline set and it is earlier than this new one -- we'd NOT extend the deadline. And that's a feature.
As for the questions on "can we trust clocks"...
You have this reversed: datacenters are the ones with very well synchronized clocks, vastly better network hardware, and (more) predictable latencies, especially compared to devices on random flaky networks.
Since we're nerding out here a bit... here's some more fun reading about advanced clock synchronization systems. They're only used by specialized applications, but just FYI that NTP isn't the "endgame" for these things:
Clock synchronization techniques have had an exciting renaissance recently, ever since Google's TrueTime. Check out Spanner's TrueTime [1], Amazon TSS [2], and Sundial [3]. Those can get clocks synchronized down to hundreds of nanoseconds.
But anyway, not all systems need such tightly synchronized clocks at all (Spanner needs them because it uses them to commit transactions). NTP gets you to around ~100ms synchronization AFAIR (again, don't trust client devices though), so for "plain old" service stuff it's honestly quite enough...
Fun reading... none of this is a requirement for "normal boring 1 second resolution" deadlines for best effort request cancellation, but it's a very nice read: