[Pitch] Clock, Instant, Date, and Duration

Indirection means Date would no longer be POD and would instead require reference counting. That's... a bit much :grimacing: I'd rather have a slightly larger inline value.

8 Likes

or just allocate more bits for seconds and fewer bits for precision as you move further and further from today. although that would be very similar to double... hopefully it's possible to make it reasonable enough (e.g. drift free, etc). could you provide a use case or two that illustrate the issue? i'm not inherently opposing, just following the 5 whys here, in pursuit of the most reasonable solution.

i'd say that would be far worse... e.g. it would prohibit its use in realtime threads that can't call anything that might block.

e.g. something as simple as 30 bits for the fraction (nanoseconds) + 33 bits for seconds for dates reasonably "near" 2000 (+/- 135 years), and 10 bits for the fraction (milliseconds) + 53 bits for seconds for far dates (up to +/- 142M years). (we'd also need to spend one bit as a selector between the two; the numbers above already account for that.)
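
for illustration, a rough sketch of what such a packing could look like (hypothetical type and names, nothing from the pitch; assumes nanoseconds is already normalized to 0 ..< 1_000_000_000 and that "far" seconds fit in 53 bits):

// 1 selector bit + 33-bit seconds + 30-bit ns fraction ("near" mode),
// or 1 selector bit + 53-bit seconds + 10-bit ms fraction ("far" mode).
struct PackedDate {
    private let bits: UInt64

    init(seconds: Int64, nanoseconds: UInt32) {
        if seconds >= -(Int64(1) << 32) && seconds < (Int64(1) << 32) {
            // near mode: selector bit (63) = 0
            let s = UInt64(bitPattern: seconds) & 0x1_FFFF_FFFF          // low 33 bits
            bits = (s << 30) | UInt64(nanoseconds)
        } else {
            // far mode: selector bit (63) = 1, fraction kept as milliseconds
            let s = UInt64(bitPattern: seconds) & 0x1F_FFFF_FFFF_FFFF    // low 53 bits
            bits = (UInt64(1) << 63) | (s << 10) | UInt64(nanoseconds / 1_000_000)
        }
    }

    var seconds: Int64 {
        if bits >> 63 == 0 {
            return Int64(bitPattern: (bits >> 30) << 31) >> 31           // sign-extend the 33-bit field
        } else {
            return Int64(bitPattern: (bits >> 10) << 11) >> 11           // sign-extend the 53-bit field
        }
    }

    var nanoseconds: UInt32 {
        bits >> 63 == 0
            ? UInt32(bits & 0x3FFF_FFFF)                                 // 30-bit nanosecond field
            : UInt32(bits & 0x3FF) * 1_000_000                           // 10-bit millisecond field
    }
}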

This feels like overengineering. What is the harm in using an additional 32 bits that merits this sort of complication to avoid it? What is the point of actively thwarting representations with nanosecond accuracy around 4 BC? Not only does this implementation detail not need Swift Evolution discussion (so long as there are sufficient bits not to impact round-trip compatibility with existing types), I can hardly see the point of spending anyone's time optimizing it.

16 Likes

round-trip compatibility would be a problem for some dates in both solutions, especially if the trip starts with the newer type, or even if it starts with the older Date type and involves dates near zero (2000), where double works best because it can allocate most of its mantissa bits to the fraction. true "both ways" round-trip compatibility would only be possible with a matching number of bits in the old and new types, and would be hard or impossible to achieve unless the new type uses double as well. if both-ways round-trip compatibility is a must, it's worth returning to the question "what's wrong with the double representation of time / time interval". some examples would be essential here.

4BC? the proposed approach can express nanosecond resolution all the way down to hundreds of billions of years prior to the big bang - some 280 billion years before time was invented - and keeps its uniform nanosecond resolution all the way up to some 285 billion years past the death of our galaxy. the ability to represent such dates and nanoseconds would be just a peculiarity if it weren't a waste: the date's bits will end up in RAM and cache, and might also end up being stored on disk and sent across the network - none of those is free and some are quite expensive, be it on mobile or on the server side when used at scale. and as you well know, space and time go hand in hand here: when things stop fitting into L1/L2/L3 cache everything becomes much slower, plus the relevant O(number of bits) operations per se would be slower for things like hash, equality, or memmove. i suspect this would be actively harmful.
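
for a back-of-the-envelope check of those figures (assuming the pitched Int64 seconds field and rough Julian years):

// span of a signed 64-bit count of seconds, expressed in years
let secondsPerYear = 365.25 * 24 * 60 * 60     // ≈ 3.16e7
print(Double(Int64.max) / secondsPerYear)      // ≈ 2.9e11, i.e. roughly 290 billion years each way from the epoch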

Adding onto @xwu a bit. It also sounds like 10+ operations before you can even begin to do any arithmetic, and you'd probably end up w/ that 96-bit scheme during the calculations anyway.

3 Likes

it's time vs space... if it comes down to 10 more CPU operations vs a cache miss, the former is still faster...

the 30+33 / 10+53 approach gives nanosecond resolution for the nearest +/- 135 years and millisecond resolution for some +/- 142M years beyond that. the original proposed approach has nanosecond resolution for many billions of years beyond the galaxy's lifetime.

1 Like

That's a pretty big "if" and will likely need solid data in either direction. That is, if the difference is not negligible to begin with.

Ok, that's what I missed.


Plus, it looks more like it's because they're trying not to over-engineer it that we end up w/ 280-billion-year super resolution.

Just like how we can give each star in the observable universe a quadrillion IPv6 addresses. The IETF most likely didn't plan for star wars back in the 90s (or did they? cue Vsauce music).

3 Likes

just remember the careful space and time tuning that went into ref count optimization (small - inline, large - out of line), or into short vs long string storage optimization, or into tagged objects... "let's just throw another 32 bits in here" would go against all of those efforts. and it's not just space wastage and the indirect slowdowns from cache misses and their associated costs when deployed at, say, "google" scale - it's also direct CPU wastage due to slower O(number of bits) operations, e.g. for hash calculations when you put the new date values into dictionaries, or when you memmove things on array inserts.

it would be super useful to see some real-world examples where Date's use of double is inappropriate. @Philippe_Hausler

Yes, I remember. Thanks for reminding me of that. Lest we forget String's ABI. So at the very least, there's someone inside secretly packing these bits up.

Again, that is a pretty bold statement, and it doesn't at all imply that their impacts are on the same scale or occur with the same frequency.

Yes, I heard you the first time, but to me that's just a normal engineering tradeoff depending on the intended usage and scale of deployment. Especially since your proposed solution isn't better in every way, even disregarding the difference in resolution.

Now, it's quite possible that the design doesn't quite match real-world usage, which is also why we're having this thread. However, unless presented with evidence that the current approach would adversely affect someone's use cases, I doubt there's much point in continuing this discussion (so refrain I shall). Conversely, the same would go if the bit-packed approach were the original design.

5 Likes

I think @Philippe_Hausler has been clear about what motivates this change:

And what the design constraints are:

And the sort of impact it is expected to have (not much):

Yes, it's good practice to keep data types as small as they can be, and we shouldn't make data types larger for no reason -- but these are just general, philosophical points.

If you want to challenge the proposed change, (IMO) we need more than just abstract points about how larger data types might be wasteful, or possibly result in cache misses, or how they might not be appropriate if you scale them to the moon. Those arguments don't convince me that the change in storage representation is more harmful than having an inaccurate Date type.

Systems at the very low-/very high-end often need specialised data types (or even specialised Swift distributions! e.g. Swift for embedded platforms which omit Unicode data tables for String, or omit core libraries such as Foundation). If they need a more compact Date, at the cost of accuracy, they can have that.

6 Likes

yep, i'm looking for real-world examples of that. e.g. "i used an NSCalendar-based calculation and got inaccurate results around 4BC when doing such and such", or "when i was scheduling a timer 10 years in the future its accuracy was only milliseconds if i did this and that". some concrete examples of accuracy issues that actually occur.

if there's enough interest here (indicate your interest with a like) i can play with a struct like this:

struct NewDate: Equatable, Hashable {
    let seconds: Int64       // whole seconds from the epoch
    let nanoseconds: Int32   // sub-second part, expected 0 ..< 1_000_000_000
}

struct Person: Equatable, Hashable {
    let id: Int
    let firstName: String
    let lastName: String
    let dob: NewDate
    let dateLastActive: NewDate
}

which is 56 bytes with old dates and 68 bytes with new dates, and perhaps draw some performance charts showing how the two compare for searches / inserts depending on the number of persons, for arrays vs dicts vs sets, if people in this thread are interested in such a comparison.
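
(for reference, the sizes above can be double-checked with MemoryLayout; the numbers below are what a 64-bit platform gives today, the exact layout isn't formally guaranteed:)

print(MemoryLayout<NewDate>.size, MemoryLayout<NewDate>.stride)   // 12, 16
print(MemoryLayout<Person>.size)   // 68, vs 56 if both date fields were the current 8-byte Date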

1 Like

An odd one, but when I was working on NSUserDefaults, at one point I ran into a bug where an application was storing a date in the year 1AD in its preferences, and that was getting lost as it was sent over XPC due to differing representations. I suspect having a date that far back was unintentional, but it's hard to say for sure.

4 Likes

the moon is at least reachable... dates some 280 billion years before the big bang just don't exist. i'll return your statement and claim that anything requiring more than, say, +/- 100 years of nanosecond accuracy (even that's overkill!) or more than +/- 10000 years of second accuracy can be treated as a special case and addressed by special custom means. adding 32 bits to a general-purpose time type would mean we all have to pay this new "tax", even if it's really only meaningful in 0.00001% of cases (the number here is speculation, which highlights the claim that we need those real-world use cases and their frequencies spelled out!)

I really wish Date could be left out here and get a real (compatibility breaking) overhaul in a different context:
For me, a date is something like the first of April, not a "reference point in time ± nanoseconds". Actually, you could even argue not only that sub-second precision is way over the top, but also that a date does not have to be tied to a certain year, and even when you include the year, you still haven't taken care of timezones… so in short: dates are really complicated (no pun intended ;-), and a big discussion on their own.

However, this proposal does not seem to be concerned with questions like "when will Easter happen in the year 2525?", but with topics that are much more relevant for computers than for humans. Therefore, I think we should consider representations that are less handy for us but better suited for machines:
Having seconds and nanoseconds is easy to understand for people used to calculating in those units, but for a computer, a single number type with a fixed precision is way better: it doesn't waste any memory, and it makes calculations less complicated.

Update: there are always those restrictions imposed by actual hardware ;-). Calculations with an 80-bit integer would probably be slower than two numbers that need some additional processing but are natively supported by the CPU...
However, I still don't think there is much benefit in having a direct translation to nanoseconds (vs. using all 32 bits of the smaller part to represent a full second).
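
A sketch of that "use all 32 fraction bits" idea, with the low word counting 1/2^32-second ticks instead of nanoseconds (hypothetical type, not something proposed here; the fraction is essentially the same format NTP timestamps use):

struct FixedPointDate {
    var seconds: Int64
    var fraction: UInt32   // units of 2^-32 s, ≈ 0.23 ns resolution

    init(seconds: Int64, nanoseconds: UInt32) {
        self.seconds = seconds
        // scale 0 ..< 1_000_000_000 ns onto the full 0 ..< 2^32 range
        self.fraction = UInt32((UInt64(nanoseconds) << 32) / 1_000_000_000)
    }

    var nanoseconds: UInt32 {
        UInt32((UInt64(fraction) * 1_000_000_000) >> 32)
    }
}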

4 Likes

It is perhaps worth mentioning that the representation proposed for Date is not novel or exotic. The other name for it is a timespec, and it's part of C11, POSIX, and C++17, and the same representation is used by Rust, Java, and many other languages. It is the way most operating systems provide nanosecond-resolution time. So I'm not sure that there really is any meaningful "tax" to using this representation.
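
For reference, that C type is directly usable from Swift today; a minimal example (Darwin on Apple platforms, Glibc on Linux):

#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

// Same shape as the proposed representation: whole seconds plus 0 ..< 1e9 nanoseconds.
var ts = timespec()
clock_gettime(CLOCK_REALTIME, &ts)   // nanosecond-resolution wall-clock time
print(ts.tv_sec, ts.tv_nsec)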

The problem with 64-bit floats is that their absolute error grows with distance from the epoch. This seems to be a widely accepted problem:

Note that IEEE 64-bit floats are not a sufficient representation for timespecs because their nanosecond-precision range is confined to a period of 208 days centered on the epoch.

Scheme SRFI-174

The [ISO C 89] standard theoretically allows a floating point type for time_t, but even a 64-bit FP type cannot represent the resolution of modern CPU bus-cycle counters. In addition, the absolute error of floating point types grows with distance from the epoch, which can be problematic in many applications.

Modernized <time.h> for ISO C
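
The 208-day figure is easy to reproduce: a Double has 53 significant bits, so it can only count whole nanoseconds exactly up to 2^53 of them, and as a count of seconds its step size grows with distance from the epoch. A rough check:

// integers are exactly representable in a Double up to 2^53,
// so a Double counting nanoseconds stays exact for about:
print(Double(1 << 53) / 1e9 / 86_400)   // ≈ 104 days each way, ~208 days total

// a Double counting seconds (like today's Date) has this step size ~20 years out:
let t = 20.0 * 365.25 * 86_400
print(t.ulp)                            // ≈ 1.2e-7 s, i.e. ~120 ns between adjacent values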

8 Likes

I've always been bugged by this too. In some code of mine I use the name TimePoint for this concept.

Although if the intent is to replace Foundation.Date, keeping the same name should make the transition smoother... maybe. It might not if we get errors of the kind "type Date expected but was passed a Date instead" because one of the two is Foundation.Date.

1 Like

Thanks a lot @Philippe_Hausler for refining the proposal with the recent feedback!

To add to the recent discussion, I'm not sure I fully grasp the conceptual difference (not talking about implementation details here, but more about vocabulary and definitions) between Date and Instant.

To me, the concept of "a specific and precise moment in time" (as in a moment in time that is happening at the same time for everyone, wherever they are, and independently of any human conception of calendars and timezones) is what I usually call an Instant, and what I think the NSDate/Date types from Foundation should always have been called.

That would have avoided a lot of confusion for many developers (beginners or not) who understandably think at first that NSDate represents a (human concept of a) date, as in "9am" or "Oct 1, 2021", while that is a job for a totally different class of types (Calendar, DateComponents and friends), unrelated to Date and Instant and to what we're trying to define in this proposal.

And I'd love for this proposal to be the occasion to rectify that legacy naming mistake (even if not in Foundation, as that might be outside SE's scope, then at least by not reproducing it in the stdlib).

Could anyone clarify why Instant and Date seem to be different types in the above discussions (and are listed as 2 entries in the proposal's "definitions" intro section, where I don't understand the difference between their respective definitions)?

Couldn't Instant just be what we call Date above (i.e. the 64-bit POD type just holding the number of nanoseconds since an epoch, or whatever implementation it ends up being), with Clock.now returning such an Instant (aka "the number of nanoseconds since a fixed epoch")?

5 Likes

According to that C/C++ reference, timespec doesn't allow negative values.

The proposed Date(s: Int64, ns: UInt32) does support negative values, but it can't represent instants less than one second before the epoch, since that would require a negative zero in the seconds field.


Int64 could be used instead, by reserving one bit to select the precision, leaving the sign bit plus 62 bits for the magnitude:

  • ±146 years, stored in nanoseconds.
  • ±146 thousand years, stored in microseconds.
1 Like

One thing I do like about the Int64+UInt32 scheme is that it's straightforward to truncate it to the second. If you only need to store with a precision of seconds (like presumably in a dateLastActive field), then you can just store the Int64 part: the round trip (after truncation) is guaranteed to always produce the exact same date.

Granted, you'll then have to do conversions. It might be annoying to have to convert between Date64 and Date96, but we manage that with integers, so we can probably do the same with dates too.
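
For example, with the NewDate struct from earlier in the thread:

// Truncation to whole seconds is lossless: store only the Int64,
// reconstruct with nanoseconds = 0, and you get back exactly what you stored.
let lastActive = NewDate(seconds: 1_700_000_000, nanoseconds: 123_456_789)
let stored = lastActive.seconds                      // persist just the seconds
let restored = NewDate(seconds: stored, nanoseconds: 0)
assert(restored.seconds == lastActive.seconds)       // exact round trip, no rounding involved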


And sorry to be a bit pedantic, but I think the dob (date of birth) field probably should be stored as a calendar date of some kind, not as a "Date". In most situations at least.

2 Likes