[Pitch] Improved Compiler Support for Large Homogenous Tuples

Michael_Gottesman · May 29, 2021, 1:09am

Hey everyone. Below is a pitch for improved language support for Large Homogenous Tuples. This will allow for us to add syntax sugar for homogneous tuples, a special homogenous tuple protocol, improve the tuple ABI for such large tuples, make it easier to initialize/work with such tuples, and use this technique to improve the experience of working with homogenous tuples resulting from the clang importer importing fixed size arrays from C.

Looking forward to everyone's feedback. Here is a permanent link to the pitch: swift-evolution/0000-improved-homogeneous-tuple-language-support.md at homogenous-tuples · gottesmm/swift-evolution · GitHub. I also reproduced the proposal below:

Michael_Gottesman · May 29, 2021, 1:10am

Sorry for the fat finger:

Improved Language Support for Homogenous Tuples

Proposal: SE-NNNN
- Authors: Michael Gottesman
Review Manager: TBD
Status: Awaiting implementation
Reference: SE-0283

Introduction

Swift programmers frequently encounter homogenous tuples (e.x.: (T, T, ..., T)) in contexts such as:

Writing code that uses an imported C fixed size array in Swift.
Defining a Swift nominal type that uses a homogenous tuples to represent a
field with a fixed layout of bytes (e.x.: SmallString)

In this proposal, we attempt to improve language support for such tuples in a
manner that makes homogenous tuples easier to write and compose better with the
rest of the language. The specific list of proposed changes are:

The addition of sugar for declaring large homogenous tuples.
Introducing a new HomogenousTuple protocol. This protocol will
extend RandomAccessCollection and MutableCollection allowing for
homogenous tuples to be used as collections and access contiguous storage. It
will also provide a place for us to declare new helper init methods for
initializing homogenous tuple memory, allow us to expose the count of a tuple
type, and also allow us to implement using default protocol implementations.
We will only allow for homogenous tuples to conform to this, not user types.
Changing the Clang Importer to import fixed size arrays as sugared homogenous
tuples and remove the arbitrary limitation on the number of elements (4096
elements) that an imported fixed size array can have now that type checking
homogenous tuples is fast.
Eliminating the need to use unsafe type pointer punning to pass imported C
fixed size arrays to related imported C APIs.
Changing the Swift calling convention ABI of sufficiently large tuples to
allow for them to be passed as an aggregate rather than aggressively
destructuring the tuple into individual arguments.

NOTE: This proposal is specifically not attempting to implement a fixed size
"Swifty" array for all Swift programmers. Instead, we are attempting to extend
Swift in a minimal, composable way that helps system/stdlib programmers get
their job done today in the context where these tuples are already used today.

Swift-evolution thread: Discussion thread topic for that proposal

Motivation

Today in Swift, system programmers use homogenous tuples when working with
imported fixed size C-arrays and when defining a fixed buffer of bytes of a
certain type. As a quick example, consider the following C struct,

typedef struct {
  int versionDigits[3];
} MyVersionInfo_t;

This would be imported as the following struct in Swift,

struct MyVersionInfo_t {
  var versionDigits: (Int, Int, Int)
}

This works in the small for such small fixed size arrays but as the number of
tuple elements increase, using homogenous tuples in this manner does not scale
from a usability and compile time perspective. We explore these difficulties
below:

Problem 1: Basic operations on large homogenous tuples require use of a source generator or unsafe code

To explore the implications of the current homogenous tuple model, imagine that
we are defining a System API that wants to maintain a cache of 128 pointers to
improve performance. Since we need to have enough storage for 128 pointers, we
use a homogenous tuple of UnsafeMutablePointer that point at objects of type
T:

@frozen
struct PointerCache<T> {
  typealias PointerType = UnsafeMutablePointer<T>

  /// 128 pointers that we use to cache the first pointers that are passed
  /// to our API.
  var _storage: (PointerType?, /*126 more pointers*/, PointerType?)
}

If we want to define an initializer that initializes this tuple to nil, we
either have to use a Swift source generator or use unsafe code:

extension PointerCache {
  init(repeating value: PointerType) {
    _storage = (value, /* value 125 times */, value)
  }

  init(unsafe: ()) {
    withUnsafeMutableBytes(of: &_storage) { (buffer: UnsafeMutableRawBufferPointer) in
      for i in 0..<128 {
        buffer.storeBytes(of: nil, toByteOffset: i*MemoryLayout<PointerType?>.stride, as: PointerType?.self)
      }
    }
  }

  init(unsafe2: ()) {
    withUnsafeMutableBytes(of: &_storage) { (buffer: UnsafeMutableRawBufferPointer) in
      memset(buffer.baseAddress!, 0, 128*MemoryLayout<PointerType?>.stride)
    }
  }
}

If we want to define a Collection conformance for our type, we must wrap our
tuple in a nominal type and are forced to again use either a Swift source code
generator or use unsafe code,

extension PointerCache {
  subscript(index: Int) -> PointerType? {
    switch index {
    case 0:
      return _storage.0
    case 1:
      return _storage.1
    /* ... 125 more cases ... */
    case 127:
      return _storage.127
    default:
      fatalError("...")
  }
}

extension PointerCache {
  subscript(unsafe index: Int) -> PointerType? {
    withUnsafeBytes(of: x) { (buffer: UnsafeRawBufferPointer) -> PointerType? in
        buffer.load(fromByteOffset: index*MemoryLayout<PointerType?>.stride, as: PointerType?)
    }
  }
}

Even with this, we still are forced to avoid the natural manner of iterating in
Swift, the for-in loop and instead must iterate by using an index range and
subscript into the type:

func printCache(_ p: PointerCache) {
  for i in 0..<1024 {
    print(p[i])
  }
}
// Instead of
func printCache(_ p : PointerCache) {
  for elt in p {
    print(elt)
  }
}

In all of these cases, the lack of language support add unnecessary complexity
to the program for what should be simple primitive operations that should scale
naturally to larger types without needing us to use unsafe code. Even if we say
that the unsafe code example is ok, we would be relying on the optimizer to
eliminate overhead. Relying on the optimizer is ok in the large, but when system
programming praying is insufficient since the programmer must at compile time be
able to guarantee certain performance constraints. This generally forces the
programmer to look at the assembly to guarantee that the optimizer optimized the
unsafe code as the programmer expected, an unfortunate outcome.

Problem 2: Imported C fixed size arrays do not compose well with other Swift language constructs

The Clang Importer today imports C fixed size arrays into Swift as homogenous
tuples. For instance, the following C:

typedef struct {
  float dataBuffer[1024];
} MyData_t;
void printFloatData(const float *buffer);

would be imported as:

struct MyData_t {
  dataBuffer: (Float, ... /* 1022 Floats */, ..., Float)
}
void printFloatData(UnsafePointer<Float>);

The lack of language support for tuples results in these imported arrays not
being as easy to work with as they should be:

The ClangImporter due to the bad type checker performance of large homogenous
tuples will not import a fixed size array if the array would be imported as a
tuple with more than 4096 elements (see
ImportType.cpp). At
a high level, the type checker is running into the same problem of the
programmer: we have made the problem more difficult than it need to be by
forcing the expression of redundant information in the language.
Imported fixed size arrays can not be concisely iterated over due to
homogenous tuples lacking a Collection conformance. This makes an operation
that is easy to write in C much harder to perform in Swift since one must
define a nominal type wrapper and use one of the techniques above from our
PointerCache example. As a quick reminder using our imported MyData_t, this
is how we could use unsafe code to write our print method:
```
extension MyData_t {
  func print() {
    withUnsafeBytes(of: dataBuffer) { (buffer: UnsafeRawBufferPointer) -> PointerType? in
      for i in 0..<1024 {
         let f = buffer.load(fromByteOffset: i*MemoryLayout<Float>.stride, as: Float)
         print(f)
      }
    }
  }
}
```

Imported fixed size arrays from C no longer compose with associated C apis
when working with C code in Swift. As an example, if we wanted to use the
imported C API printFloatData with our C imported type, we would be unable
to write the natural code due to the types not lining up:

   extension MyData_t {
     mutating func cPrint1() {
       // We get a type error here since printFloatData accepts an
       // UnsafePointer<Float> as its first argument and
       // '&dataBuffer' is an UnsafePointer<(Float, ..., Float)>.
       printFloatData(&dataBuffer) // Error! Types don't line up!
     }
   }

but instead must write the following verbose code to satisfy the type checker:

  extension MyData_t {
    func cPrint() {
      withUnsafeBytes(of: x?.dataBuffer) {
        printFloatData($0.baseAddress!.assumingMemoryBound(to: Float.self))
      }
    }
  }

Problem 3: Large homogenous tuples result in linear type checking behavior

SE-0283 while enabling
tuples to become equatable, hashable and comparable also introduced new code
paths that result in linear work by the type checker since the type checker
needs to prove that each individual tuple element obeys that protocol. This is
exascerbated by the length of homogenous tuples. Luckily, We can eliminate that
issue for homogenous tuples by taking advantage of the extra semantic
information we have from the syntax to just use the type of the first tuple
element to perform the type checking.

Problem 4: Passing/returning large homogenous tuples is inefficient

Today Swift's ABI possesses a rule that all tuples are eagerly destructured at
all costs. This means that if one were to pass a 1024 element tuple as an
argument, the resulting function would have 1024 arguments resulting in poor
performance. This ABI is an artifact of Swift's early evolution where there was
an attempt to enable tuples to be passed as a function argument list. In such a
case, destructuring the tuple into arguments makes sense. Moving forward in
time, that functionality was eliminated from the language so now we are just
left with an ABI rule that is actively harmful to performance. This performance
cost is significant enough that users generally avoid large tuples.

Proposed solution

Syntax Change: Homogenous Tuple Type Sugar

We propose a new sugar syntax for a "homogenous tuple span" element. The
grammar of tuple elements in Swift would be extended as follows:

(5 x Int) -> (Int, Int, Int, Int, Int)

This sugar is expanded before type checking, so beyond the homogenous bit that
we set in the tuple type itself, the rest of the compiler just sees a normal
tuple and thus will be minimally effected. We would restrict ones ability to
only use homogenous tuple spans in tuples that are otherwise single element
lists.

Standard Library/Runtime Improvements:

We propose the addition of a new protocol called HomogenousTuple that all
homogenous tuples implicitly conform to:

protocol HomogenousTuple : RandomAccessCollection, MutableCollection {
  /// Used to initialize a tuple with a repeating element. All elements of the tuple will be initialized.
  ///
  /// Example:
  ///
  /// let x = (128 x Int)(repeating: 0)
  /// let y = (128 x UnsafePointer<MyDataType>)(repeating: sentinelValue)
  init(repeating: repeatedValue: Element)

  /// Initialize all elements of a tuple using a callback initializer. WARNING:
  /// The closure is expected to initialize all elements of the buffer. Accessing
  /// ubinitialized memory elements is undefined behavior.
  ///
  /// Example:
  ///
  /// // Fill tuple with integral data.
  /// let x = (1024 x Int) { (buffer: UnsafeMutableBufferPointer<Int>) in
  ///   for i in 0..<1024 { x[i] = i }
  /// }
  /// // Or more succintly:
  /// let x = (1024 x Int) { for i in 0..<1024 { $0[i] = i } }
  /// 
  /// // memcpy data from a data buffer into a tuple after being called via a callback from C.
  /// var _cache: (1024 x Int) = ...
  /// func callBackForC(_ src: UnsafeMutableBufferPointer<Int>) {
  ///   precondition(src.size == 1024)
  ///   _cache = (1024 x Int) { dst: UnsafeMutableBufferPointer<Int> in
  ///     memcpy(dst.baseAddress!, src.baseAddress!, src.size * MemoryLayout<Int>.stride)
  ///   }
  /// }
  init(initializingWith initializer: (UnsafeMutableBufferPointer<Element>) throws -> Void) rethrows

  /// Initialize all elements of a tuple with pre-known values, placing the
  /// number of elements actually written to in count. After returning, the
  /// routine will fill the remaining uninitialized memory with either a zero
  /// fill or a bad pointer pattern and when asan is enabled will poison
  /// the memory. We will follow the same condition's of [Array's version of this method](https://developer.apple.com/documentation/swift/array/3200717-init)
  /// around the behavior of the initializedCount parameter.
  ///
  /// Example:
  ///
  ///   // Fill tuple with integral data.
  ///   let x = (1024 x Int) { (buffer: UnsafeMutableBufferPointer<Int>, initializedCount: inout Int) in
  ///     for i in 0..<1000 { x[i] = i }
  ///     initializedCount = 1000
  ///   }
  ///   precondition(x[1001] == 0, "Zero init!") // For arguments sake
  init(unsafeUninitializedCapacity: Int, initializingWith initializer: (inout UnsafeMutableBufferPointer<Element>, inout Int) throws -> Void) rethrows

}

Clang Importer Changes:

We propose changing the Clang Importer so that it sets the homogenous bit on all
fixed size arrays that it imports as homogenous tuples. This will allow for all
of the benefits from the work above to apply to these types when used in Swift.

Detailed design

Grammar Change: Add support for homogenous tuple span

In order to prase homogenous tuple spans, we modify the swift grammar as follows:

  set-product: 'x'
  type-tuple:
    '(' type-tuple-body? ')'
  type-tuple-body:
    type-tuple-element (',' type-tuple-element)* '...'?
  type-tuple-element:
    identifier? identifier ':' type
    type
    type-homogenous-tuple-span
  type-homogenous-tuple-span:
    integer_literal set-product type
  type-pure-homogenous-tuple:
    '(' type-homogenous-tuple-span ')'

When we parse a tuple that consists of a single homogenous tuple span element,
we set in the tuple type a homogenous tuple bit using spare bits already
available in the tuple type.

Type Checker: Reduce number of Type variables for Homogenous Tuples

The type checker today creates a type variable for all tuple elements when
solving constraints since any element of the tuple could be different. But if
the user provides us with the information ahead of time that they explicitly
have a homogenous tuple, we can introduce a join constraint on all of the tuple
elements allowing the constraint solver to be more efficient via the use of its
merging heuristic:

// We know that all of these must be the same type, so we can provide the type checker with that constraint info.
let x: (3 x Int) = (1, 2, 3)

Type Checker: Use homogenous tuple bit to decrease type checking when comparing, equating, and hashing homogenous tuples

Today the type checker has to perform a linear amount of work when type checking
homogenous tuples. This is exascerbated by the comparable/equatable/hashable
conformances added in
SE-0283. We can eliminate
this for large homogenous tuples since when we typecheck we will be able to
infer that all elements of a tuple that has the homogenous tuple bit set is the
same, allowing the type checker can just type check the first element of the
tuple type and use only that for type checking purposes.

HomogenousTuple protocol

This will be implemented following along the lines of the
ExpressibleBy*Literal protocols except that we will not allow for user
defined types to conform to this protocol. The only conformances are for types
with the homogenous tuple bit set preventing bad type checker performance. This
will additionally let us implement the protocol's functionality by using a
default implementation and using that default implementation for all
tuples. This default implementation in practice will just be a trampoline into
the c++ runtime.

Clang Importer Changes

The Clang Importer will be modified so that we set the homogenous tuple bit on
imported fixed size arrays. This will ensure that we print out the imported
tuples in homogenous form and will allow us to lift the 4096 element limit since
we will no longer have type checker slow down issues.

Improved Tuple ABI of Swift's Calling Convention

We propose that we change Swift's calling convention so that a function that
takes a homogenous tuple will have the new ABI. Easily, if we pass the
homogenous tuple to a function with the normal ABI we can just destructure it
and can provide a thunk for the old tuple ABI if we ever pass a non-homogenous
tuple as an argument and or result of a homogenous tuple API.

Source compatibility

This is purely additive from a source perspective. The only possible concern is
from the ClangImporter changing how it prints such headers. This is not actually
a concern since this is not used by any Swift programs for compilation as an
artifact. Rather the ClangImporter in this case is instead only intended to show
programmers what the ClangImporter imported.

Effect on ABI stability

The main effect on ABI stability would be additive since the change in tuple ABI
would only affect functions that have explicit homogenous tuples as arguments,
results and we can provide thunks when needed to convert in between the
representations.

Effect on API resilience

This is additive at the source level so should not effect API resilience.

Alternatives considered

The main alternative approach considered is the introduction of a new fixed size
array type rather than extending the tuple type grammar. For the purpose of this
proposal, we call such a type a NewFixedSizeArray and use the following syntax
as a straw man: [5 x Int]. The main advantage of using NewFixedSizeArray is
that it allows us to fix some ABI issues around tail padding (see footnote 1
below). That being said, we trade this ABI improvement for the following
problems:

The language surface area is being expanded to add a new type/concept rather
than building on something that already exists in the language. This is a
real concern given that the language is already relatively large and this is
another concept for Swift programmers to learn.
Since we are introducing a whole new type into the language and into the
compiler, we will need to update a far larger part of the compiler to add
this type and additionally will need to do work all over the optimizer to
ensure that we can handle these new types. That being said, we /may/ be able
to build on top of alloc_box/alloc_stack and thus avoid needing to touch
the optimizer.
Any code written assuming that the ClangImporter imports fixed size arrays as
homogenous tuples will either need to be modified (a source compatibility
break) or we will have to ensure that where-ever one could use one of those
tuples we can pass a NewFixedSizeArray. In practice this means adding a new
implicit conversion and adding support in the type checker for accessing
elements of NewFixedSizeArray like a tuple using x.0, x.1 syntax.

Footnote 1: Tail Padding Issues. An additional ABI issue that a new fixed size
array type would fix is around tail padding. Specifically if one has a type with
tail padding due to alignment, then Swift considers its size (the number of
bytes used to store the data for the value) to be smaller than its stride (the
space in between consecutive elements in an array. This contrasts with the C ABI
where it is always assumed that size and stride are the same.

As an example, MyState in the following has a size of 9 but a stride of 16 b/c
of the 8-byte alignment on an Int64:

// Size of 9, but stride of 16
struct MyState {
  var a: Int64 // 8 byte and 8 byte aligned
  var b: Int8  // 1 byte
  // var padding: Int56 (7 bytes of padding)
}

This means that (2 x MyState) would have a size of 16 + 9 = 25 instead of
16 + 16 = 32 like one may expect if one were considering a fixed size
array from C. As another example of this, consider if we made a MyPair that
contains a MyState:

struct MyPair {
  var state: MyState
  var b: Int8
}

By the rules of C, we would have that MyPair has a size of 24 since
MyState has an alignment of 8 and a stride of 16 meaning that b causes us
to need a whole additional byte to satisfy the alignment. In contrast in Swift,
MyPair will have a size of 10 and a stride of 16.

Now lets take a look at how this plays with homogenous tuples stored in structs:

struct MyHomogenousTuplePair {
  var states: (2 x MyState)
  var b: Int8
}

If we wanted to load from states.1, we would need to load bytes (16,32] and
then mask out bytes (25,32].

Luckily for us we import all C types (including tuples) as having the same
size/stride. So if (2 x MyState) was a fixed size array imported from C, we
would not have these issues. The issue is only for Swift types.

Acknowledgments

Thanks to Joe Groff for suggesting going down this path! Also thanks to Andrew
Trick, Michael Ilseman, David Smith, Tim Kientzle, John McCall, Doug Gregor, and
many others for feedback on this!

xwu · May 29, 2021, 2:14am

This is a pragmatic solution that makes sense for where we are. I really appreciate the discussion contrasting it with fixed-size arrays and why this solution is appropriate for Swift as it is, and I’m convinced.

A minor note: The word you’re looking for is (traditionally) homogeneous, not homogenous. They are often confused to the point that some dictionaries now define homogenous as (among other definitions) homogeneous, in the grand tradition where if you hold it wrong for long enough it becomes right. Nonetheless…

ensan-hcl · May 29, 2021, 2:20am

I have a few questions.

What is SubSequence of a tuple? Having conformance to RandomAccessCollection requires it.
Does labeled tuple also conform to HomogeneousTuple?
Is (1 x Int) allowed?

This will be implemented following along the lines of the ExpressibleBy*Literal protocols except that we will not allow for user defined types to conform to this protocol. The only conformances are for types with the homogenous tuple bit set preventing bad type checker performance.

Is creating extension for HomogeneousTuple allowed?

Personally (not pragmatically), I think it's better to have it as NewFixedSizeArray in that it isn't affected by the limitation of tuple and creates no special protocol. Maybe implementation will be harder though...

xwu · May 29, 2021, 2:26am

Follow-up question. Based on this naming choice—

—are you hinting at the possibility of supporting not-purely-homogeneous tuples? Certainly, by the way in which you phrase this feature as “sugar expanded before type checking,” one is given to wondering if you intend to support, for example:

typealias T = (flags: Int, 256 x UInt8)

John_McCall · May 29, 2021, 2:44am

This seems like it prioritizes avoiding one specific form of complexity — not changing the type that the importer uses for C arrays — over essentially all other goals.

fclout · May 29, 2021, 2:46am

Oh my goodness–is it finally happening?

I thought that there were several hard parts to this. I'm very much in favor of it and I have a few follow-up questions. It seems like the current IteratorProtocol design would require a copy of the homogeneous tuple to work. Is that the case? Or have we reached the point Swift is smart enough to inline everything iteration-related and avoid a copy? Also, it used to be that non-nominal types couldn't have protocol conformances, yet homogeneous tuples conform to HomogeneousTuple. To what extent is this restriction lifted? Is HomogeneousTuple and its conformance hard-coded magic?

BigSur · May 29, 2021, 3:17am

Does (0 x T) also mean ()/Void tuple?

Why not use * - (3 * Int) or (Int, 3)?

Is 'x/X' equivalent in this expression ( 2 x Int)==(2 X Int)?

Tino · May 29, 2021, 4:52am

I feel with those who have to deal with C code and are forced to work with how fixed size arrays are imported. But as I personally rarely encounter such pain, my focus is more on the language as a whole. In this regard, imho the pitch is not a clean solution but rather a workaround — which might even block the "clean solution":
I'm still hoping for an extension of generics with values, so that we can write foo: Array<type: Int, size: 5>.

Supporting value parameters is probably a bigger change in the compiler, but it has value besides a singular problem, and I also think it is easier to comprehend for users.
Generics are well established, but this new syntax is just "magic" (is it really using the letter "x"? Afaics, this is without precedent, and I find it really disturbing to use a consonant as an operator).

allevato · May 29, 2021, 4:55am

This feature is a long time coming; but does the sugar as described here go far enough to meet the performance goals?

Does this mean that if I declare let x: (5 x Int), the sugar isn't carried over into SIL/IR and it'll be equivalent to (Int, Int, Int, Int, Int)? Likewise, will an imported C type int[5] have the same IR representation as it does today?

A situation a colleague ran into a while back involved importing a C module with a struct that had a fairly large two-dimensional array, where I think each of its dimensions happened to fall right under the ClangImporter limit:

struct foo {
  int bar[4096][4096];
};

We found that when this struct was used from Swift, compile time slowed to a crawl, but type checking wasn't the problem in this case—it was the SIL -> IR and IR -> machine code lowering.

The problem is that Swift's lowering of the type doesn't take any advantage of the multiplicity. When that struct is used from C, Clang emits it compactly using LLVM array types:

%struct.foo = type { [4096 x [4096 x i32]] }

Swift, importing the same C type, emits something I can't repeat on this forum because it lists every single element separately:

%Ts5Int32V = type <{ i32 }>
%TSo3foo = %type <{ <{
    <{ %Ts5Int32V, %Ts5Int32V, ... column repeated 4096 times ... }>,
    <{ %Ts5Int32V, %Ts5Int32V, ... column repeated 4096 times ... }>,
    ... row repeated 4096 times ...
}> }>

I actually gave up waiting for the 4096x4096 case to finish compiling and lowered it to 1024x1024. A Swift program that did nothing but zero-initializes that struct, compiled without optimizations, generates 10.5MB of x86_64 machine code after about 15 seconds on my machine. Fortunately compiling with -O turns that into a single bzero call, but we shouldn't have to rely on the optimizer here.

Swift tuples suffer from the same issue today, but most folks wouldn't have the patience to write one long enough. However, this syntax would give users the power to do so easily:

struct Foo {
  var bar: (4096 x (4096 x Int))
}

So a solution to this problem needs to make sure that it plumbs the multiplicity down far enough that it lowers to LLVM arrays; it shouldn't be possible for a simple code snippet like the one above to make the compiler appear to hang.

grynspan · May 29, 2021, 11:04am

Weren't we just talking about this in my temporary buffers pitch thread?

@Michael_Gottesman: Improvements to C array interop are long-overdue, and being able to access homogeneous tuples in a more sequence-like way is something I know I'd use.

I don't love the proposed shorthand syntax though. The use of x, which in any other context is an identifier, doesn't sit well with me. Off the top of my head, a few syntaxes that might express the idea of a fixed-size array better:

var option1: [Int, 10]
var option2: [Int * 10]
var option3: [Int ..< 10]
var option4: Int[10]

I understand this is all syntactic sugar over tuples but I view that as an implementation detail: folks want fixed-size arrays, so let folks work with these values as if they were arrays. Hence, square brackets. If we really want to use round brackets as we do with tuples, fiiiiine... but please give a passing consideration to alternative shorthand spellings for them too.

DevAndArtist · May 29, 2021, 12:40pm

Hi, daily pure Swift user here. This pitch looks interesting but I personally see it as a quick workaround to a general problem.

Some of the steps I have in my imagination world might look similar to these (not claiming that everything is achievable in Swift):

get variadic generics done first
create a struct Tuple<...>: ExpressibleByTupleLiteral while ... is the variadic part which was left out here
get some statically safe type range feature rolling (solves issues where we want to be able constraint numerical and collection types and also introduce non-emptiness)
extend collection types, numerical types and tuple with these ranges as constraints
Tuple<Int..., ...10> homogenous tuple of 10 elements.

This has a lot of imaginary features, but those feature would cover many more holes in the language than a short-term (Int * 10) syntax (yes I would personally be against x being the times operator in for the pitched feature). That said, regardless how painful the workarounds for current problems might be, the long-term solutions "seem" to be more promising to me than polluting the language and the cognitive load with other sugar syntaxes. I don't want to sound rude, but I'm also against the introduction the HomogenousTuple protocol. Without non-nominal types being able to conform to protocols this is a no no for me.

ryansobol · May 29, 2021, 2:16pm

Kudos on having drafted a well-written proposal. However, right from the get go, two questions kept repeating in my mind while reading it over:

Why not pitch fixed-sized Swift Arrays?
Why not pitch a variadic generic syntax?

Joe_Groff · May 29, 2021, 2:38pm

I think there are other significant forms of complexity this avoids that make this an interesting direction to explore. Beyond just not needing to change the C importer, we also avoid introducing a new kind of type into the language. That would have backward deployment and compatibility constraints, whether that be a new, specific, structural fixed-size array type, or some larger type system expansion project like type-level integers or variadics. Those type system features may be desirable, but they're large projects, and they would also undoubtedly come with minimum deployment targets because of the necessary runtime support. (And even if you could define struct FixedArray<n: Int, T>, what bottom-turtle type do you use to represent its storage?) If we use homogeneous tuples, we have a runtime type representation that can map, albeit inefficiently, to what old runtimes support, and we have room to improve the representation efficiency in new compilers. It also wouldn't preclude us from integrating with general type system features in the future. Rust's history also suggests that merely having fixed-sized arrays, without the fancy type system features to generalize over them, is enough to get by; it took more than half a decade after Rust 1.0 came out for type-level integers to get added to Rust.

grynspan:

I don't love the proposed shorthand syntax though. The use of x , which in any other context is an identifier, doesn't sit well with me. Off the top of my head, a few syntaxes that might express the idea of a fixed-size array better:
var option1: [Int, 10]
var option2: [Int * 10]
var option3: [Int ..< 10]
var option4: Int[10]

I would strongly encourage whatever syntax we come up with to put the count before the type, so that nested dimensions compose in the same reading order in types and in expressions, without needing parser hacks like C's declaration syntax for arrays. You index a (N x (M x Int)) as array[n][m].

ryansobol · May 29, 2021, 3:23pm

These points are very well made, and I hear what you’re saying. The direction being proposed is definitely worth exploring. Though I’m sure you would agree that the “fancy type system features” mentioned in this thread are also worth exploring simultaneously. Especially since, by the authors own admission, the current pitch proposal is mostly syntactic sugar and convenience additions, along with short-term, albeit highly desired, type safety gains.

I appreciate you pointing to Rust’s history, but we don’t even need to look beyond our own; it’s taken nearly a half a decade for the ideas in the Swift Concurrency Manifesto to bare fruit. If we’re willing to spend that much time and energy getting async/await right, are we not willing to do the same getting other fancy features right as an alternative to this proposal? Though, I’m optimistic that it won’t take quite as long.

Karl · May 29, 2021, 3:30pm

(100 x PlusOne)

Fixed-size arrays are one of the only things I needed for WebURL where I had to bite the bullet and drop down to unsafe code - and it's the worst kind of unsafe code, involving pointer casts and such to turn a tuple in to an unsafe buffer.

I've also found that performance can be very delicate - the tuple will often get copied if you capture it in a closure, even if that closure is non-escaping. I don't know if this is due to ABI limitations or if our IR representation causes LLVM to miss some optimisations, but it makes reasoning about performance very difficult. You can easily end up with lots of byte-shuffling that shouldn't really be necessary.

(Also, it's a shame that @compilerEvaluable seems to have been abandoned, as a lot of C APIs define their lengths using constants - e.g PATH_MAX or INET6_ADDRSTRLEN. I suppose we wouldn't be able to support any length that isn't an integer literal)

Ben_Cohen · May 29, 2021, 3:34pm

By default, collections have a SubSequence of Slice, which wraps the base collection and serves up elements within a range of indices. That default is probably appropriate here.

the interesting thing about slices is they turn a fixed-size thing into a variable size thing

let fixed: (4 x String) = ("a","b","c","d")
var slice = fixed[...]
while let _ = slice.popFirst() { print(slice.count, terminator: ",") }
// 3,2,1,0

Joe_Groff · May 29, 2021, 3:40pm

By all means, the community can do more than one thing at the same time. It's worth exploring this direction, and on the other hand, as we do, we should also consider how it will fit into future language directions. If we make homogeneous tuples the canonical representation of inline fixed-size array storage, that does imply that we'd need some way to make variadics and type-level integers interact well in the future too.

Karl · May 29, 2021, 3:40pm

We already have an accepted proposal for tuples to conform to Equatable, Hashable and Comparable, even if it hasn't shipped yet, so it seems to me like this has already been settled.

Joe_Groff · May 29, 2021, 3:42pm

Yeah, rather than get hung up on the difference between "nominal" and "structural" types, I think it's better to work toward a direction where users don't need to care what a nominal type is. There's no fundamental reason tuples, functions, metatypes, and existentials can't have protocol conformances, and in time, we should allow for user-defined conformances on all types.