SE-0329: Clock, Instant, Date, and Duration

jrose · November 8, 2021, 3:28pm

Well for one thing, 10^-9 / 2^-32 is not an integer, so you wouldn’t be able to represent 1 nanosecond anymore. That seems more useful than the extra two bits or so of precision.
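
The arithmetic behind this can be checked with plain integers. The following is only an illustrative sketch; the constant names are made up here and are not part of the proposal.

```swift
// Number of 2^-32-second ticks in one second, and nanoseconds in one second.
let ticksPerSecond: UInt64 = 1 << 32
let nanosecondsPerSecond: UInt64 = 1_000_000_000

// One nanosecond would need ticksPerSecond / nanosecondsPerSecond ticks.
// The division leaves a remainder, so no whole number of ticks equals 1 ns.
let wholeTicks = ticksPerSecond / nanosecondsPerSecond
let leftover = ticksPerSecond % nanosecondsPerSecond
```

Since `leftover` is non-zero, a 2^-32-second tick cannot represent one nanosecond exactly.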

Tino · November 8, 2021, 6:42pm

Really? NTP uses a resolution of 2^-32 seconds (or even 2^-64 seconds), so it's hardly possible that this is a fundamental problem.
I don't think exact nanosecond-precision is relevant for scheduling — and if it was, we probably want to use picoseconds to be on the safe side.

However, it would be strange if differences of commonly used Instants could not be expressed as Duration, and instants have to come from somewhere…
I have no idea if there will be significant progress in the near future, but right now, I think struct timespec will be used to retrieve time on most systems relevant for Swift (some facts about that would be really helpful…).
Seen in isolation, I don't think the chosen representation is the best option — but it's the one that fits best to the legacy we have to build upon.

Philippe_Hausler · November 8, 2021, 8:11pm

Per my current implementation, the Duration is normalized so that its fractional part is always less than one second (at most 1 nanosecond shy of 1 second). That said, I am still experimenting with the storage so that it strides efficiently and maintains proper accuracy when the distribution is small (i.e. when measuring things that are VERY fast and dividing by a large number of events; specifically, this is to make sure it is suitable for benchmarking the swift-collections project and other such high-performance scenarios).
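
As a rough sketch of what such a normalization could look like (this is not the actual implementation; `SketchDuration` and its field layout are assumptions made for illustration):

```swift
// Hypothetical seconds + nanoseconds storage, normalized so that
// `nanoseconds` always lies in 0 ..< 1_000_000_000.
struct SketchDuration: Equatable {
    var seconds: Int64
    var nanoseconds: Int32

    init(seconds: Int64, nanoseconds: Int64) {
        let perSecond: Int64 = 1_000_000_000
        // Swift's / and % truncate toward zero, so a negative remainder
        // has to be shifted back into range by borrowing one second.
        var carry = nanoseconds / perSecond
        var remainder = nanoseconds % perSecond
        if remainder < 0 {
            remainder += perSecond
            carry -= 1
        }
        self.seconds = seconds + carry
        self.nanoseconds = Int32(remainder)
    }
}
```

With this layout, 1 s plus 1_500_000_000 ns is stored as 2 s and 500_000_000 ns, and 0 s minus 1 ns is stored as -1 s and 999_999_999 ns.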

Philippe_Hausler · November 8, 2021, 8:20pm

Per Date's encoding, we cannot break existing file formats - it would be an incredible shame if any change would make a file a user has saved somehow stop working because we change things; likewise we need to make sure that the stored values that are encoded are able to be deserialized by versions that exist in the wild today. That being said, any additional new encoding forms, such as distributed actor based encoding or any potential future additions to the encoders/decoders, can do more.

In my view breaking file formats is worse than breaking ABI.

Part of the strategy is slightly split here: no functionality of Date will be lost, except perhaps cases like Date(timeIntervalSinceReferenceDate: .nan) which is nonsensical. Parts of the functionality will remain in Foundation, and parts of its functionality will be moved down. That split comes at the interoperation with bridging, localization, and calendrical interaction. The rest is in my opinion a core part of the behavior of Date itself.

ktoso · November 9, 2021, 11:33am

Philippe_Hausler:

mayoff:

If Date is entering the standard library anyway, perhaps we could teach the codable containers about it. For example, add an encode(_ value: Date) requirement to SingleValueEncodingContainer with a default implementation that, for backward compatibility, uses the lossy Double encoding.

Per Date's encoding, we cannot break existing file formats - it would be an incredible shame if any change would make a file a user has saved somehow stop working because we change things; likewise we need to make sure that the stored values that are encoded are able to be deserialized by versions that exist in the wild today. That being said, any additional new encoding forms, such as distributed actor based encoding or any potential future additions to the encoders/decoders, can do more.

In my view breaking file formats is worse than breaking ABI.

Agreed with Philippe here, we can't break the serialization format -- as much as we'd like to. Or rather, I don't see much to be gained by breaking it at this point in time. Keeping the same format is a feature.

I would like to say though that "improving the serialization story" and customizability of Codable is somewhat of a different topic, and we're definitely interested in exploring this in general. That would address the actual issue with formatting dates -- sometimes you need a different formatting after all. So that's something we should not let sneak into this proposal, and have some separate work happen towards improving the customizability of formatting/serialization (be it via property wrappers, or per encoder configuration, either way really).
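
For readers following along, mayoff's suggestion quoted above (a new container requirement with a lossy default) could be sketched roughly as follows. The real `SingleValueEncodingContainer` cannot be extended this way from user code, so the sketch uses a stand-in protocol; every name here is hypothetical.

```swift
// Hypothetical stand-in for a date value with integer storage.
struct SketchInstant {
    var seconds: Int64
    var nanoseconds: Int32
}

// Hypothetical stand-in for SingleValueEncodingContainer.
protocol SketchSingleValueContainer {
    mutating func encode(_ value: Double)
    mutating func encode(_ value: SketchInstant)
}

extension SketchSingleValueContainer {
    // Default: fall back to the lossy Double form, so containers written
    // before the requirement existed keep producing the old format.
    mutating func encode(_ value: SketchInstant) {
        encode(Double(value.seconds) + Double(value.nanoseconds) / 1_000_000_000)
    }
}

// An "old" container that only knows about Double.
struct LegacyContainer: SketchSingleValueContainer {
    var output = ""
    mutating func encode(_ value: Double) { output = "double(\(value))" }
}

// A "new" container that opts in to the full-precision form.
struct PreciseContainer: SketchSingleValueContainer {
    var output = ""
    mutating func encode(_ value: Double) { output = "double(\(value))" }
    mutating func encode(_ value: SketchInstant) {
        output = "instant(\(value.seconds)s + \(value.nanoseconds)ns)"
    }
}
```

Under these assumptions, existing containers keep their current output without source changes, while a container that implements the new requirement can keep every nanosecond.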

Karl · November 9, 2021, 12:38pm

Isn't breaking ABI undefined behaviour? I'm not sure anything is worse than that.

When it comes to serialisation, it turns out that nothing in Date is actually marked @inlinable (neither in corelibs-foundation, nor in Darwin Foundation from my tests). That means users would not need to change their Apps, and support for reading the old format could be contained within Date's implementation of init(from: Decoder).
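
A minimal sketch of that idea, assuming a made-up lossless format (a keyed container with `seconds` and `nanoseconds`) alongside the legacy Double; neither the type nor the keys come from the proposal:

```swift
import Foundation

// Hypothetical type: encodes a lossless keyed form, but still accepts
// the legacy single Double when decoding.
struct SketchInstant: Codable, Equatable {
    var seconds: Int64
    var nanoseconds: Int32

    enum CodingKeys: String, CodingKey {
        case seconds, nanoseconds
    }

    init(seconds: Int64, nanoseconds: Int32) {
        self.seconds = seconds
        self.nanoseconds = nanoseconds
    }

    init(from decoder: Decoder) throws {
        if let keyed = try? decoder.container(keyedBy: CodingKeys.self) {
            // New format: both integer fields are stored exactly.
            seconds = try keyed.decode(Int64.self, forKey: .seconds)
            nanoseconds = try keyed.decode(Int32.self, forKey: .nanoseconds)
        } else {
            // Legacy format: one Double, split into whole and fractional parts.
            let legacy = try decoder.singleValueContainer().decode(Double.self)
            let whole = legacy.rounded(.down)
            var s = Int64(whole)
            var n = Int32(((legacy - whole) * 1_000_000_000).rounded())
            if n == 1_000_000_000 { s += 1; n = 0 }  // rounding carried over
            seconds = s
            nanoseconds = n
        }
    }
    // encode(to:) is synthesized and writes the keyed (lossless) form.
}
```

This covers the "existing data" direction only; data written in the new form would still be unreadable by an unmodified older decoder, which is the narrower case discussed next.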

The only issue would be if you serialised a Date using the new version, but loaded it with an older operating system.

That's still an issue, of course, but it's narrower. For an updated App running on an older OS, it may be possible to patch something in via @_alwaysEmitIntoClient (again, the current implementation wouldn't have been inlined). That narrows the problem case yet further - to an unpatched App, running on an older OS, loading data encoded by a newer App.

(EDIT: Oh, and doesn't Apple have the whole bitcode recompilation thing? Would it not be possible to automatically include a shim in App downloads which overrides the implementation in Foundation, so even unpatched Apps get the new implementation? Not saying it's worth the engineering effort to do so, but purely theoretically, could it be an option?)

It's also worth noting that this almost certainly wouldn't impact JSON, as dates in JSON tend to be encoded with ISO8601 or some other string date format rather than Double. According to research by the datetime-rs team back in 2014, Foundation is a bit unconventional in using a floating-point type for dates.

So what this comes down to is an unpatched App, running on an older OS, loading data encoded by a newer App, and which encodes dates as Doubles.

I don't know if Apple has any telemetry on how often Dates actually get encoded as Doubles, and in which contexts. I've been collaborating with the WHATWG and am constantly jealous about their data-gathering capabilities, and how it can be used to inform decisions like this.

For example, one issue: Prevent requests to HTTP(S) URLs containing raw \n and <. The Chrome team are able to provide metrics such as:

From Chrome's beta channel, we see the following numbers over the last week:

0.4708% of page views parse a URL containing \n .

0.2749% actually fetch a url containing \n .

0.0189% of page views parse a URL containing both \n and < .

Having those kinds of numbers available would make this a lot easier. The cost of preserving Date's serialisation format is high - encoding and decoding a Date may return a different (!=) Date, with a different hash value, etc. That kind of unintuitive behaviour might have just as much potential to break Apps.

Is it worth it? Who knows? We don't have any data.
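
To make the `!=` concern above concrete, here is a small sketch of a seconds + nanoseconds value going through a Double and back. `SketchInstant` and `lossyRoundTrip` are illustrative names, not proposed API.

```swift
// Hypothetical integer-based storage for a point in time.
struct SketchInstant: Equatable {
    var seconds: Int64
    var nanoseconds: Int32
}

// Simulates encoding as a single Double and decoding it again.
func lossyRoundTrip(_ instant: SketchInstant) -> SketchInstant {
    let asDouble = Double(instant.seconds) + Double(instant.nanoseconds) / 1_000_000_000
    let whole = asDouble.rounded(.down)
    var seconds = Int64(whole)
    var nanos = Int32(((asDouble - whole) * 1_000_000_000).rounded())
    if nanos == 1_000_000_000 { seconds += 1; nanos = 0 }  // rounding carried over
    return SketchInstant(seconds: seconds, nanoseconds: nanos)
}

// Roughly late 2021 when counted in seconds from the 2001 reference date.
let original = SketchInstant(seconds: 658_000_000, nanoseconds: 1)
let decoded = lossyRoundTrip(original)
```

At this magnitude a Double's spacing is about 1.2e-7 seconds, so the single nanosecond is rounded away and `decoded` no longer compares equal to `original`.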

michelf · November 9, 2021, 2:15pm

Why is the actual proposal buried inside Detailed Design > Prior Art > Swift?

What is your evaluation of the proposal?

I don't like implicit conversions that lose precision or break equality comparisons. Nowhere does it say "implicit conversion" in the proposal, but it feels to me every time your Swift.Date interacts with a Foundation.Date in a library or a framework, it will incur a conversion to a Double. Is this the case?

The new integer-based storage format is better, but in usage it might be worse if you happen to unknowingly rely on implicit conversions. I think it'd be better if there was no implicit conversion so that losses of precision are explicit and equality is preserved when the value is passed around.

(I recently had a bug caused by a loss of precision after encoding a date to JSON that caused equality and ordering comparison to fail later. With implicit conversions this kind of bug will happen without the need of a coding roundtrip.)

Which brings me to the second point: Date is not a good name for a point in time having a higher precision than a day. It seems to have been chosen to make those implicit conversions with Foundation.Date seamless, but I don't think being seamless should be a goal (per my opinion on implicit conversions that lose information), so that should open the design to picking a less confusing name.

Is the problem being addressed significant enough to warrant a change to Swift?

Perhaps it is, but I'm not sure this proposal addresses the problem well. I'm also under the impression those implicit conversions would create other problems, by making the new Date type unreliable to store dates with precision (since the precise bits can be altered implicitly by various APIs using the old Foundation.Date).

Does this proposal fit well with the feel and direction of Swift?

An implicit conversion that changes the value is not a good fit. While it was recently allowed for CGFloat, the CGFloat conversion is lossless (in most cases) and more limited in domain (graphics and UI on Apple platforms), whereas points in time are used much more broadly with various precision and preservation requirements.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I've used various time APIs in other languages, but I don't feel like I really master the differences to do a meaningful comparison. I did make my own time abstraction layer once because the existing types to differentiate time points, time durations, dates (as in a specific day), date durations, and time of day were too error prone.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Followed the pitch and read the pitch and review threads.

Douglas_Gregor · November 12, 2021, 8:38pm

Torust:

I'll save commenting on the wider proposal until I've had a chance to read through in more depth, but on the specific point of the naming of Date: I know it was a point of contention in the pitch thread and I happen to agree it's not a mistake that should be carried forward. Is there any technical reason it could not be given a new name, have a typealias of:
@available(*, deprecated) // with the correct stdlib version
public typealias Date = NewName
and change the Clang importer to map NSDate to the new name? It would remain bridgeable with NSDate under this scenario. Aside from a bunch of deprecation warnings in Swift code that uses Date currently (easily addressed by a find & replace), it seems like this would provide a fairly clean road forward.

Short answer: yes, with a tiny bit of compiler support, we could take the type Foundation.Date and make it NewName in the standard library, with Foundation.Date becoming a typealias for NewName. NSDate in Objective-C code would start importing as NewName, and we could maintain both ABI and source compatibility modulo the deprecation warnings you mention.
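
In plain Swift, the source-level half of this (without the importer and ABI work) looks roughly like the sketch below. `NewName` is the thread's placeholder; `OldName` stands in for `Date` here only to avoid clashing with Foundation, and the stored property is made up.

```swift
// The type under its new name (placeholder, as in the thread).
struct NewName: Equatable {
    var secondsSinceReferenceDate: Double
}

// The old spelling survives as a deprecated alias for the very same type.
@available(*, deprecated, renamed: "NewName")
typealias OldName = NewName

// API written against the new name...
func describe(_ value: NewName) -> String {
    "t = \(value.secondsSinceReferenceDate)"
}

// ...still accepts values spelled with the old name, with only a warning.
let legacyValue: OldName = OldName(secondsSinceReferenceDate: 1)
let text = describe(legacyValue)
```

Because the alias and the new name denote one type, no conversion is involved; code using the old spelling compiles with a deprecation warning and a fix-it to the new name.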

The feedback on this review thread, and really also during the pitch, is that we should not use the name Date for this type in the standard library. Much of the discussion has been on either extreme---either we invent a new type with a new name, or we keep the same Date type---but you're suggesting a middle ground that changes the name without introducing a new type. The pros/cons of this approach fall somewhere between the other two:

Pros:

As with introducing a new type, you get the better name going forward.
Because Date and NewName are the same type, we can maintain source compatibility and migration to the new name is straightforward and mechanical. It's easy to support both names "forever".
As with keeping the same Date type, this change has neither ABI impact nor back-deployment concerns.

Cons:

Introducing a new name for the same concept creates a schism between "old" and "new" Swift code. This is unavoidable when changing the name or introducing a new type, but the cost is real.
Since we are using the same type (albeit with a new name), we don't get a chance to fix the serialization behavior.
We're still arguing against the name Date, but we aren't arguing for any specific replacement name. I've seen Timestamp (and arguments against it), WallClock.Instant (which is pretty verbose), and others. There's the potential that we don't actually have a significantly better name to agree on.

On balance, this approach seems promising to me. However, the serialization issue bothers me.

The serialization issue

Extending the representation of Date from 64 bits to 96 bits means that the existing serialization mechanisms, which default to using a Double, are lossy. This is certainly a problem, and the only way to comprehensively fix it is to introduce a new type. I agree with @Philippe_Hausler's view that "breaking file formats is worse than breaking ABI", so we can't just change the default. However, we do need some path forward.

@mayoff makes some interesting suggestions about trying to preserve additional precision with the JSON floating point encoding. Such a scheme has the advantage of working within most existing infrastructure, although it's certainly not perfect: it would still not round-trip through many other JSON-based utilities.

I accept that the default will be lossy in some situations, because that's the cost of working with existing data and other programs in the ecosystem. However, I'd like to see the proposal introduce some way of losslessly encoding the 96-bit values. Perhaps that means extending JSONEncoder and JSONDecoder with new DateEncodingStrategy and DateDecodingStrategy cases that maintain full precision, or adding some kind of codable-specific property wrapper that enables specific uses of this type to encode losslessly:

struct MyType {
  @FullPrecisionNewName var creationTime: NewName
}

Doug

Jumhyn · November 12, 2021, 9:05pm

It's mainly the "other programs" that are the concern here, right? As @Karl notes, if it were just "existing data" we were concerned about, it would be a viable path forward to have the new Date encode in a lossless format but continue to accept the (old) lossy format when decoding.

Douglas_Gregor:

However, I'd like to see the proposal introduce some way of losslessly encoding the 96-bit values. Perhaps that means extending JSONEncoder and JSONDecoder with new DateEncodingStrategy and DateDecodingStrategy cases that maintain full precision, or adding some kind of codable-specific property wrapper that enables specific uses of this type to encode losslessly:
struct MyType {
  @FullPrecisionNewName var creationTime: NewName
}

If the problem can be narrowed just to 'new' programs producing serialized Dates which need to be read by old programs, is that narrow enough that we could invert this burden, and ask that authors of such 'new' programs do the annotation?

struct MyType {
  @OldLossyCoding var creationTime: NewName
}

Also, in either case, it would be great if such annotations didn't have to happen on a per-use basis. Would it be feasible to have this be settable via a static var on new Date, or a compilation flag or something (or would that be a ~dialect~? )
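
For reference, a Codable property wrapper of the kind being discussed could be sketched like this. It is only an illustration of the mechanism: `SketchInstant`, `FullPrecision`, and the two-element array format are assumptions, not anything the proposal specifies.

```swift
import Foundation

// Hypothetical integer-based storage for a point in time.
struct SketchInstant: Equatable {
    var seconds: Int64
    var nanoseconds: Int32
}

// Wrapper that owns the coding format for one property:
// it writes [seconds, nanoseconds] instead of a single Double.
@propertyWrapper
struct FullPrecision: Codable, Equatable {
    var wrappedValue: SketchInstant

    init(wrappedValue: SketchInstant) {
        self.wrappedValue = wrappedValue
    }

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        let seconds = try container.decode(Int64.self)
        let nanoseconds = try container.decode(Int32.self)
        wrappedValue = SketchInstant(seconds: seconds, nanoseconds: nanoseconds)
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        try container.encode(wrappedValue.seconds)
        try container.encode(wrappedValue.nanoseconds)
    }
}

struct MyType: Codable, Equatable {
    @FullPrecision var creationTime: SketchInstant
}
```

An inverted `@OldLossyCoding` wrapper would have the same shape, with the Double conversion living in the wrapper's `encode(to:)` and `init(from:)` instead.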

Philippe_Hausler · November 12, 2021, 10:07pm

Timestamp is perhaps slightly better, but looking at the potential impact to APIs it wasn't so demonstrably better that it merited the churn factor. And this becomes even more of an issue if you consider the knock-on effects on things like documentation or tutorials. Using a different name will instantly make those obsolete and not optimized for searching. So the bar for a new name for this existing behavior seems very high to me.

I think it would be fair to say that this is definitely something that would be an item we would likely want to investigate in Foundation. I would even be willing to go to bat for arguing that we should accept that as a conditional responsibility for this proposal. Obviously I have an obligation to a different process for those changes but it seems like a reasonable ask for accommodation.

In the proposal (perhaps with too much hand-waving) that was an inference I made with regards to distributed actor transport serialization. We don't have a concrete spelling of that yet but I don't think sending a lossy type over the wire is the best move.

Karl · November 12, 2021, 10:36pm

+1 to inverting the burden.

I'd like to stress that if encoding a Date becomes lossy, that is also an invisible behaviour change, and has just as much potential to break existing applications as changing the serialisation format does.

Going with a lossy encoding is not an easier, safer, or more compatible option; it just prioritises one very, very specific scenario (an unpatched App, running on an older OS, loading data encoded by a newer version of the same App, and which encodes dates as Double s) at the cost of unintuitive behaviour which will burden all future Swift applications, and all future Swift developers, "forever".

tera · November 12, 2021, 10:52pm

100% my feeling, and i'll make it even stronger - it should be bit preserving. this on my scale is more important than nanosecond precision hundreds of years from now. if bit preservation could be achieved only if we introduce a new type - then we should introduce a new type and live with back & forth conversion between the old and the new type.

yeah, i hate those...

Philippe_Hausler · November 12, 2021, 10:56pm

I think any codable type should always round trip on the same machine just fine. The issue at hand that has been perhaps misunderstood is the case of opening old files and saving files that will be read by older implementations.

Jumhyn · November 12, 2021, 11:21pm

Huh, could you elaborate? AFAICT, this is as specific as the proposal gets:

Readers may have noticed that Date remains Codable at the standard library layer but gains a new storage mechanism. The coding format will remain the same, since that represents a serialization mechanism that is written to disk and is therefore permanent for document formats. We do not intend for Date to break existing document formats and all current serialization will both emit and decode as it would for double values relative to Jan 1 2001 UTC as well as the DateEncodingStrategy for JSONSerialization. This does mean that when encoding and decoding Date values it may lose small portions of precision, however this is acceptable losses since any format stored as such inherently takes some amount of time to either transmit or write to disk; any sub-second (near nanosecond) precision that may be lost will be vastly outweighed by the write and read times.

I interpreted this as, basically, "new Date will encode/decode itself as though it were the Double closest in value to that represented by the new second/nanosecond storage mechanism."

What is the actual serialization behavior being proposed that would maintain precision when round-tripping on the same machine while continuing to work with old formats? If that was spelled out earlier somewhere, I missed it.

Philippe_Hausler · November 12, 2021, 11:36pm

So the serialization options of Date in, for example, JSON are:

deferredToDate
secondsSince1970
millisecondsSince1970
iso8601
formatted
custom

The custom case obviously cannot take advantage automatically so I am omitting that case (it is left as a task for the implementors that use that).

The other cases, however, all have intrinsic knowledge of the type being encoded: we know at the point of serialization that the value is a Date. This means that we can read the seconds and nanoseconds from it via accessors that do not go through a Double. On the decoding side (where I am still working on an implementation that ferries the appropriate level of information through to the decode point), we can have the correct type info to parse the fractional part of the value as nanoseconds and the whole part as seconds.

Now, obviously these parts are not included in the proposal, so they are solely implementation details of how Foundation's JSONEncoder/Decoder works. We won't be changing how dates are encoded with regard to storing them as a pair of integers versus a JSON double value; we will obviously still respect the strategies (since those are distinct format requirements). However, how the JSON parser parses or emits them is something we can investigate changing, ensuring proper backwards compatibility but also preserving round trips of values.
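
One way to picture the idea of emitting and parsing the number without going through a Double is the sketch below. It is not Foundation's implementation; the function names are invented, and it assumes non-negative seconds with nanoseconds in 0 ..< 1_000_000_000.

```swift
// Writes seconds and nanoseconds as a decimal literal, e.g. "12.000000034".
// Assumes seconds >= 0 and 0 <= nanoseconds < 1_000_000_000.
func encodeDecimal(seconds: Int64, nanoseconds: Int32) -> String {
    let digits = String(nanoseconds)
    let padded = String(repeating: "0", count: 9 - digits.count) + digits
    return "\(seconds).\(padded)"
}

// Parses the whole part as seconds and the fractional digits as nanoseconds,
// never converting the text to a floating-point value.
func decodeDecimal(_ text: String) -> (seconds: Int64, nanoseconds: Int32)? {
    let parts = text.split(separator: ".", maxSplits: 1, omittingEmptySubsequences: false)
    guard let seconds = Int64(parts[0]), seconds >= 0 else { return nil }
    guard parts.count == 2 else { return (seconds, 0) }

    // Keep at most nine fractional digits, then right-pad to exactly nine.
    var fraction = String(parts[1].prefix(9))
    guard fraction.allSatisfy({ ("0"..."9").contains($0) }) else { return nil }
    fraction += String(repeating: "0", count: 9 - fraction.count)
    guard let nanoseconds = Int32(fraction) else { return nil }
    return (seconds, nanoseconds)
}
```

A legacy value such as `658000000.5` still parses (as 500_000_000 ns), while a value written by `encodeDecimal` keeps all nine fractional digits. An older decoder reading the longer literal as a Double would simply round it, which matches the "minor drift" described below.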

That portion is distinct and separate from this proposal so I guess I erred on the side of ambiguity to ensure we have our bases covered.

The basic rules stand; we can't break existing serialization, minor drift when dealing with old files or sharing to old machines still can occur, but the encoding should be made as best effort (with a mind for performance, since that is something we are always keenly aware of).

Beyond that I don't have specifics for those higher level details.

Jumhyn · November 12, 2021, 11:48pm

Awesome, thank you for writing that up. So, just to make sure I'm understanding properly, which of the following should hold for the new Date's serialization behavior:

1. New Date's Encodable behavior will encode enough precision such that Decodable will be able to reconstruct the exact same new Date value.
2. Decoding a new Date from serialized old Date data will construct the (a) new Date that is closest in value to the old Date that was serialized.
3. Decoding an old Date from serialized new Date data will construct the (an) old Date that is closest in value to the new Date that was serialized.

It seems like you're saying that (1) should hold but that (2) and (3) may not. Is that right?

Zollerboy1 · November 13, 2021, 12:19am

I have two additional suggestions to make: firstly we could take the name from C++: TimePoint. At least IMHO that would be a significantly better name than either Date or Timestamp.
But I think there would be a much better name which unfortunately is obstructed as of the last revision of this proposal: I would like to call this type Clock.Instant. That would mean that the WallClock (which was another repeatedly criticized type name) just becomes Clock, which could make sense IMHO, and we would have ClockProtocol back. I think that this would be worth it, because Clock.Instant just makes sense.

hassila · November 15, 2021, 7:42am

Well, the pushback was so strong that the question was posed whether the authors agreed that Date is misleading at all, and whether alternative spellings were technically possible, because without agreement on even that basic point, discussing alternatives seemed fruitless (and there was no answer to that question, which made me assume there was little point in discussing options).

So there are several suggestions in this thread that all would be better than Date which actively obstructs understanding to new users in my opinion (e.g. Clock.Instant, TimePoint, TimeInstant and one could add MomentInTime and more, personally I think TimePoint or Clock.Instant would be a lot clearer, either works for me). So there is some advocacy for alternatives not just against the status quo.

As mentioned I think the overall direction of the pitch is great and very happy to see movement in this direction, but echoing others this seems to be the only point in time this can be corrected (or should I have written it’s the only Date it can be addressed? ;-).

Karl · November 15, 2021, 9:04am

Are you talking about Codable or JSON? It may indeed be possible for a JSON implementation to encode the time as an arbitrary-precision floating-point string and be lossless (depending on what JSON allows), but this is not true for encoders or decoders in general.

Also, these kinds of type-specific hacks do not always work in generic contexts. For example, Foundation's URL has a similar issue: URL fails to decode when it is a generic argument and `GenericArgument(from: decoder)` is used

So even if JSON allows it, it's a fragile solution.

Karl · November 15, 2021, 9:26am

Not only that, but the author made extremely tenuous claims that this was impossible unless we were willing to break ABI, that it might have to wait until Swift 6, and that even then it probably wouldn't meet the bar for breaking changes. I immediately called that out as being inaccurate, and that has now been confirmed.

Personally, I favour the name Timestamp. It is claimed that there are arguments against it, but the only argument I can find from the pitch thread is this:

Which I find to be just as flimsy as the ABI claims. That wikipedia article makes clear that there are many forms of timestamps, including literal rubber stamps, and its example section includes precisely the kind of epoch-relative timestamps which Date represents. The article's section on "Digital timestamps" refers to the article "Timestamping (computing)", which it defines as "the use of an electronic timestamp to provide a temporal order among a set of events." Which is a perfectly fine definition.

The entire idea of changing the name has been met with nothing but hostility from the Foundation team.
It is rather disappointing. I'm sure we'd all rather not fight over this, but at the same time, the rest of the community seems to unanimously consider the name Date to be unacceptable.