[Pitch] Improved Compiler Support for Large Homogenous Tuples

masters3d · May 30, 2021, 6:03pm

+1 this is excellent. I love that is could be back-ported and that this allows us to have extensions on a common protocol. It seems to me that the syntactic sugar it’s not essential to the proposal. If so, I believe you should separate the multiple type declaration syntax into a different proposal that also expands other types (not only applies to tuples).

michelf · May 30, 2021, 8:40pm

Regarding the ABI, could we give these tuples a separate ABI from normal tuples? The ABI for (Int, Int, Int) would be different from (3 x Int) which means they'd be two different types, but semantically in the language they would be interchangeable (via implicit conversions) and the collection APIs would be available on both.

benrimmington · May 30, 2021, 11:41pm

If homogeneous tuples conformed directly to the collection protocols, would there be an issue of unintended copying of large tuples, when using wrappers such as Slice, LazySequence, and Zip2Sequence?

For stdlib/system programmers, it might be enough to provide:

@_marker public protocol Tuple {}
@_marker public protocol HomogeneousTuple: Tuple {}

@_alwaysEmitIntoClient
public func withUnsafeBufferPointer<HT, E, R>(
  to homogeneousTuple: HT,
  of elementType: E.Type = E.self,
  do body: (UnsafeBufferPointer<E>) throws -> R
) rethrows -> R where HT: HomogeneousTuple

@_alwaysEmitIntoClient
public func withUnsafeMutableBufferPointer<HT, E, R>(
  to homogeneousTuple: inout HT,
  of elementType: E.Type = E.self,
  do body: (UnsafeMutableBufferPointer<E>) throws -> R
) rethrows -> R where HT: HomogeneousTuple

UPDATE: And perhaps an elementType(of:) function?

Karl · May 31, 2021, 12:55am

Yes, that is a danger, but it isn’t specific to tuples. It would be just as much of an issue for any other spelling of a fixed-size array that stored its elements directly.

The reason Array/String don’t have this precise issue (e.g. for lazy wrappers) is because the storage is heap allocated with an independent lifetime that can extended by retaining it. That said, they do have a sort-of similar issue when it comes to slices - in that even a small slice retains the entire parent Array/String (see Strings in Swift 4 section “Substrings”).

taylorswift · May 31, 2021, 2:37am

sighoya:

With value generics you can do that:
struct Vector<T,n:Int> {...}
extension Vector<Int,2>:CrossProduct {...}
extension Vector<Int,3>:CrossProduct {...}
Doesn't it suffice to solve your problem? I'm sorry, but I'm too inexperienced to fully understand your example.

these are just conditional conformances written another way

xAlien95 · May 31, 2021, 11:47am

Off topic reply

The problem with multiple conditional conformances is that they may overlap. However, with equatable compile-time-evaluable values as generic types, you can state mutual-exclusivity (see point 1. in the downsides listed at the end of the linked section), thus removing the overlaps. @sighoya's example (without the sugar syntax) would then work if I'm not mistaken.

xedin · May 31, 2021, 6:51pm

+1 on this proposal. It seems reasonable to me to avoid having a special type.

The question I have is whether it would be possible to use this protocol as an existential and/or generic requirement or it would be purely a type-check convenience? E.g. it's possible to declare a function like func test<T: ExpressibleBy*Literal>(_: T) or extension X where Y: ExpressibleByIntegerLiteral today.

grynspan · May 31, 2021, 9:44pm

FWIW I don't feel strongly here, I was just spitballing.

grynspan · May 31, 2021, 9:48pm

Obviously the T[N] syntax I mentioned is stolen blatantly from other languages like Java (or, indirectly, C.) Which is not a point in its favour, I suppose, but also has the benefit of the prior art bit being set. And it would make the relationship between these values and C arrays more obvious, which is valuable in the context of types-imported-from-C.

We could extend it further… we could go completely off-the-wall and let the declarer specify the range of indices!

var wat: Int[9 ..< 50]

(Kidding.)

glessard · June 1, 2021, 7:02am

That would be a slice that does not share its index representation with its base collection. I believe that would not fulfill the Collection contract. “Slices Share Indices”, Collection

ensan-hcl · June 1, 2021, 7:54am

I reconsidered what I said here, and found that tuple is the root of problem. Due to (T) == T, all types including heterogeneous tuple are homogeneous.

// 2 x heterogeneous
(Int, String) 
// 1 x homogeneous
(1 x (Int, String))
// however these two types are equal
(Int, String) == (1 x (Int, String))

However, we also want to treat all length of homogeneous tuples in the same generic context and don't want to prepare special function for zero and one length homogeneous tuple. We must create NewFixedSizeArray that enables NewFixedSizeArray<T> != T.

Unfortunately, having NewFixedSizeArray as a mere wrapper of tuple is not enough because it still has ambiguity, since tuple is the problem.

NewFixedSizeArray<(Int, Int)>     // 2 x Homogeneous?  1 x Homogeneous?

To avoid it, a type like this is great.

NewFixedSizeArray<Int, Int>       // 2 x 
NewFixedSizeArray<Int>            // 1 x 
NewFixedSizeArray<(Int, String)>  // 1 x 
NewFixedSizeArray<(Int, Int)>     // 1 x 
NewFixedSizeArray<Int, String>  // Error

Some plans

Add new special type like HomogeneousTuple that is not a struct, add protocol conformances, enable it only with (n x T) (or similar) syntax, and ensure T != (1 x T).

let a: (2 x Int) = (0, 1)
let b: (Int, Int) = (0, 1)
print(a[0])    // 0
print(b[0])    // Error

let c: (1 x Int) = 1
let d: Int = 1
print(c[0])    // 1
print(d[0])    // Error

Add variadic generics, create generic strcut type NewFixedSizeArray<...T>, add protocol conformances, and add [n x T] as syntax sugar of it.

struct NewFixedSizeArray<...T>: RandomAccessCollection, MutableCollection { /* ... */ }

// (T, T) != [2 x T] == NewFixedSizeArray<T, T>
let a: [2 x Int] = (0, 1)
print(a[0]) // 0

// T != [1 x T] == NewFixedSizeArray<T>
let b: [1 x Int] = (1
print(b[0]) // 1

In both way, we must use some magical ways to achieve subscript of tuple, because

Adding new homogeneous tuple type of course requires it.
Reading previous discussions, variadic generics doesn't necessarily has index-based access for variadic values.

At least, we should avoid to add special behavior into current homogeneous tuple, simply because it has inevitable ambiguity and therefore has difficulty in usage.

Geordie_J · June 1, 2021, 8:37am

I have read and mostly (hopefully) understood the proposal and the resulting thread.

I think it’s definitely something worth doing and IMHO it fits the direction of Swift. Leaving the new types as Tuples makes sense in terms of my mental model of what a Swift Array is (compared to a C array). I also agree with making use of the path of least resistance here in this way.

Two things I’m unsure about are:

The ABI stability concerns of changing the Swift calling convention for these new types. It sounds good but I don’t have a clear understanding of it’s implications for backwards compatibility. It seems like a lot of experienced eyes have reviewed the proposal and no one seems to take issue with the plan so that’s good enough for me.
I somewhat agree with others’ concerns regarding the (N x Type) syntax. It does seem strange to me but I understand the motivation of wanting to avoid the * operator, presumably to simplify parsing and to avoid even more overloads there. I’m not sure either of those concerns is strong enough to merit introducing a totally new syntax though (x). To me it reads fine but it is pretty surprising and doesn’t map to anything that already exists, which does seem like a big step for what should be a relatively minor stop-gap feature.

All in all: +1!

Joe_Groff · June 1, 2021, 3:16pm

I don't think this is possible, because the size of a subsequence operation can be dynamic, and without fancier type system features, there's no way to capture that dynamic size into a type. We probably need a Slice wrapper type for homogeneous tuples that can capture the extents of the slice with the fixed-size array. If we want to take a stronger bet on future language directions, maybe we could make it so the slice type is ArraySlice—today, that would require boxing the fixed-size array in a buffer that can be referenced as an ArraySlice, but in the future, we could hopefully make it so that move-only borrowed ArraySlices can reference unowned memory that is guaranteed to live the lifetime of a call.

I don't think it should be. Single-element tuples don't, and shouldn't exist. (1 x Int) is the same type as Int, and it would be gnarly for every type to be effectively a one-element collection of itself. Without variadics, or type-level integers, I think it is an acceptable limitation to disallow (0 x Int) and (1 x Int) for now. When we add those type system features, I think there are things we can do to mitigate the potential problems with collection conformance on those cases, by making the conformances only visible over variadic abstractions that may have 0, 1, or N elements, but not on the concrete types themselves when we know concretely we have a () or (T) scalar.

If we separate the types, then I fear that will break the compatibility benefits of this approach. The problems Michael laid out with our current ABI are a scalability problem for potentially all code, so I think it's worth trying to fix the ABI unilaterally here. We could alternatively add attributes to let you specify whether you want the tuple-splitting ABI or not for arguments, in order to maintain compatibility with existing libraries.

Jumhyn · June 1, 2021, 4:00pm

Some sort of shorthand like this seems useful and well-motivated by the proposal! Would be nice to see those long homogenous lists get compressed, especially if it can be done in a way that is forward-compatible with things like variadic generics, etc.

Bikeshedding the syntax

I agree with others who have mentioned that x seems like the wrong choice of spelling for this 'operator'—among the other reasons already mentioned, not every monospace font has a glyph for x which appears visually similar to the unicode times symbol (U+00D7). E.g., the Courier family (which is used on the LLVM doc pages, even) applies a serif to the x glyph which IMO makes the syntax look kind of silly compared to another option like *.

Screen Shot 2021-06-01 at 11.43.41 AM

David_Catmull · June 1, 2021, 4:16pm

Too bad × isn't easier to type. Maybe it should auto-complete when you type a number after (

Nobody1707 · June 1, 2021, 7:07pm

I'm starting to understand why Rust uses [T; N] for this. Especially since having the dimensions after the size makes it easy to extend the syntax to multi-dimensional arrays: [T; N, M]

Ben_Cohen · June 1, 2021, 7:28pm

my (possibly unachievable) dream is that this not be something specific to ArraySlice... that any MutableCollection that uses the default Slice ( as Deque does) gets the ability to do COW-avoiding mutating divide and conquer. So in that sense I'd rather not do something funky like make SubSequence = ArraySlice for this particular type. I can see how it might be more pragmatic though, even if its implicitly admitting defeat on the wider goal.

Michael_Gottesman · June 1, 2021, 7:41pm

I agree with you BenC here. I think we should go with the simple default.

Michael_Gottesman · June 3, 2021, 9:40pm

FYI, I am going to take in all of the feedback here and incorporate it into a 2nd pitch of a similar form. I am also going to talk more about the ABI break options.

Michael_Gottesman · June 3, 2021, 10:11pm

When I do that I am going to ask the moderators to close this thread and to post a link to the second pitch thread.