Vector manifesto

CTMacUser · February 17, 2019, 11:45pm

I've discussed fixed-size arrays here in the past. Here's what I think the design should be: a competing idea, offered before any code hackers get too invested in the design proposed here.

@taylorswift and I both agree that SIMD vector types and fixed-size arrays should have similar syntax. Their difference is more in the implementations than the interfaces.

But @taylorswift's core design is very different from mine. S/he jams them in as quirky structs and tuples, while I use a brand new syntax.

Fixed-size arrays are compound types. Their declaration is like Rust, but the order is swapped:

let x: [10; Int]

is an array of ten integers. Dereference works like our Array, with the same syntax used in C and the many languages it inspired (C++, Go, Rust, Java, etc.):

x[3]

The order is swapped from Rust so that, when axes are combined via nesting, going from the outermost span to the innermost span reads left-to-right in both declaration and dereference. (The extent declaration and dereference orders also align in C and Go. I'm not sure why Rust switched it up.) Brackets are used since they are used for all array stuff in the C family of languages; I find stuffing fixed-size arrays in as quirky tuples surprising, especially since we already have an array declaration syntax. The separator between the extent and element type is a semicolon because (besides matching Rust) it's already a punctuator-keyword, as opposed to the asterisk, which is currently only a punctuator-identifier.

When I referenced combining axes and the extent/element separator, I put in little qualifiers. That's because my design includes multi-dimensional support.

let y: [2, 5; Double]

Pre-C array designs, at least in Fortran and COBOL, were multi-dimensional. The designers of C ripped that out, probably because C was meant as a to-the-metal portable assembly language. We are not bound to that, just as Swift already eschews keeping copied-over C features primitive in many other ways (big example: enumeration types). I'm putting it back in because the partition between extents is user-valuable design information. I'm not sure how multi-dimensionality is supposed to work in @taylorswift's design, besides dumping it and going with simple nesting, like C and every other C-inspired language does.

Dereference for a multi-dimensional type just comma-separates the coordinates:

y[1, 4]

There is also a static dereference mode, which is a single-number offset:

x.1  // x[1]
y.7  // y[1,2]
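
To make the flat-offset rule concrete, here is a minimal sketch in today's Swift. The `Grid2x5` struct and its `flat:` subscript label are hypothetical stand-ins for the proposed `[2, 5; Double]` type and the `y.7` spelling, and the sketch assumes row-major storage.

```swift
// Hypothetical stand-in for the proposed [2, 5; Double]; not part of the design.
struct Grid2x5 {
    static let rows = 2
    static let columns = 5

    // Row-major backing store: row 0 occupies offsets 0...4, row 1 occupies 5...9.
    var storage = [Double](repeating: 0, count: Grid2x5.rows * Grid2x5.columns)

    // Stand-in for the static dereference `y.7`.
    subscript(flat offset: Int) -> Double {
        get { storage[offset] }
        set { storage[offset] = newValue }
    }

    // Stand-in for the coordinate dereference `y[1, 2]`.
    subscript(row: Int, column: Int) -> Double {
        get { storage[row * Grid2x5.columns + column] }
        set { storage[row * Grid2x5.columns + column] = newValue }
    }
}

var y = Grid2x5()
y[1, 2] = 3.5
print(y[flat: 7])  // same element as y[1, 2], since 1 * 5 + 2 == 7
```

The point is only that the single-number offset and the coordinate list name the same storage slot. In the real design the offset would be a compile-time constant, not a runtime argument.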

I came up with this to satisfy deterministic initialization until we get variadic generic function support ready enough to write the fixed-size array support functions. (The lack of a VGF story currently is one of the reasons why I never gave an update until the OP forced my hand.)

Oh yeah, zero-dimensional arrays are supported, which have exactly one element (and would probably need the zeroExample.0 = whatever syntax to use). Zero-valued extents should be supported; they would take up zero space when they're not a top-level object and minimal space when they are.
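
The element counts for these edge cases fall out of treating the total as the product of the extents. A small sketch, where the `elementCount(ofExtents:)` helper is a hypothetical name used only for illustration:

```swift
// Hypothetical helper: total element count for a list of extents.
func elementCount(ofExtents extents: [Int]) -> Int {
    // The empty product is 1, which is why a zero-dimensional array holds one element.
    return extents.reduce(1, *)
}

print(elementCount(ofExtents: []))         // zero-dimensional array: 1
print(elementCount(ofExtents: [2, 5]))     // the earlier y: 10
print(elementCount(ofExtents: [3, 0, 4]))  // any zero-valued extent: 0
```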

For vector support, I'm currently thinking of decorating the targeted extent:

let xx: [@vector 4, 16; Int]  // sixteen 4-way-SIMD-Int
let x3: [4, @vector 16; Int]  // four 16-way-SIMD-Int
let x4: [4, 16; @vector Int]  // compiler decides, including other combinations, like a single 64-way-SIMD-Int, or two 32-way-SIMD-Int or thirty-two 2-way-SIMD-Int

Maybe we need to support a [whatever ; @vector(directions) Int] where the "directions" are specific instructions, so we can force an option, like 4x16, 16x4, or 1x64. Storage is generally row-major, but the compiler can switch it up behind the scenes when vectorization is added. If a vector shape, element type, or combination is not supported, then the compiler can flag an error.
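
As a rough illustration of the first decorated form, here is how `[@vector 4, 16; Int]` might be laid out using the SIMD types Swift already ships. The `VectorFirst4x16` struct is hypothetical, and swapping `Int` for `Int32` is my assumption to keep the lanes small. It shows storage departing from row-major while the coordinates the user writes stay the same.

```swift
// Hypothetical stand-in for [@vector 4, 16; Int32]: sixteen 4-lane vectors.
struct VectorFirst4x16 {
    var vectors = [SIMD4<Int32>](repeating: SIMD4<Int32>(repeating: 0), count: 16)

    // Coordinates stay (i, j) with i in 0..<4 and j in 0..<16, even though
    // the vectorized extent i is now the innermost one in memory.
    subscript(i: Int, j: Int) -> Int32 {
        get { vectors[j][i] }
        set { vectors[j][i] = newValue }
    }
}

var xx = VectorFirst4x16()
xx[3, 10] = 7
// One SIMD operation touches all four i-values for a fixed j.
xx.vectors[10] = xx.vectors[10] &+ SIMD4<Int32>(repeating: 1)
print(xx[3, 10], xx[0, 10])
```

The `[4, @vector 16; Int]` form would instead hold four 16-lane vectors, with the two subscript indices swapping roles.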

Existential arrays are also supported. I'm guessing that they'll be a bundle of a pointer to the array's data block and a reference to the object's witness table. Obviously, I'm imagining a new type of witness table entry. It would have a reference to the element type's witness table, a cached copy of the total number of elements, a description of the SIMD status (if any), the number of extents, then a variadic list of each extent count. (The total number of elements is cached so we don't have to do the multiplication each time.)
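
A sketch of what that entry could carry, written as ordinary Swift structs. Every name here (`ArrayWitnessEntry`, `VectorStatus`, `ArrayExistential`) is hypothetical, and `Any.Type` is only a stand-in for a real reference to the element type's witness table. This is not how the runtime lays anything out.

```swift
// All names are hypothetical; this only mirrors the fields listed above.
enum VectorStatus {
    case none            // no SIMD decoration
    case onExtent(Int)   // @vector attached to the extent at this index
    case compilerChosen  // @vector attached to the element type
}

struct ArrayWitnessEntry {
    let elementType: Any.Type  // stand-in for the element type's witness table
    let totalCount: Int        // cached product of all extents
    let vectorStatus: VectorStatus
    let extents: [Int]         // extents.count is the number of extents

    init(elementType: Any.Type, extents: [Int], vectorStatus: VectorStatus = .none) {
        self.elementType = elementType
        self.extents = extents
        self.vectorStatus = vectorStatus
        self.totalCount = extents.reduce(1, *)  // multiply once, at creation
    }
}

// Stand-in for the existential itself: data pointer plus table reference.
struct ArrayExistential {
    let data: UnsafeRawPointer
    let witness: ArrayWitnessEntry
}

let entry = ArrayWitnessEntry(elementType: Double.self, extents: [2, 5])
print(entry.totalCount, entry.extents.count)
```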

The existential-ness can be on either the extent side or the element side. Casts can be done, as always, with "as", "as?", or "as!", where the punctuator-less "as" can't be used when the cast is, or might be, a downcast.

An array of Derived can be upcast to an array of Base with the same shape or an existentially-compatible one, where Derived and Base are reference types.
An array of T can be upcast to an array of MyProtocol.singular with the same shape or an existentially-compatible one, where T conforms to MyProtocol. The "singular" suffix can be bikeshedded later. Note the original array has elements all of the same type; a [Shape ; Numeric.singular] can be assigned a [Shape ; Int] or a [Shape ; Double], but not a mix. (Said mix of integers and doubles would be a regular [Shape ; Numeric], with no "singular" suffix. Remember that protocols don't conform to themselves, so [Shape ; P] doesn't conform to [Shape ; P.singular] unless P is either Error or Any.) Elements dereferenced through the existential would be of type MyProtocol. (Hmm, maybe we should let [Shape ; P] conform to [Shape ; P.singular].)
An array whose shape consists entirely of numbers can be upcast to one where one or more extent numbers are replaced with a placeholder ("_"), with the same element type or an existentially-compatible one. Such existential array types have a dimension-specified existential shape. An array or array existential where all the extents are numbers has a non-existential shape. (Zero-dimensional arrays always have a non-existential shape.)
An array with a dimension-specified existential shape can be upcast to one with more of the definitive extents replaced with placeholder(s) with the same element type or an existentially-compatible one.
An array with a non-existential or dimension-specified existential shape can be upcast to the dimension-agnostic existential shape, with the same element type or an existentially-compatible one. This shape is specified by using the ellipsis ("...") as the extent. (This is the only existential shape a zero-dimension array can transition to.)
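
The shape side of those upcast rules can be sketched as a small predicate. The `Shape` enum and `canUpcastShape(from:to:)` are hypothetical names, `nil` stands in for the "_" placeholder, and `.agnostic` stands in for "...". The sketch ignores element-type compatibility entirely and treats an identical shape as a trivially allowed conversion, which is my assumption.

```swift
// Hypothetical model of the three kinds of shape described above.
enum Shape: Equatable {
    case extents([Int?])  // nil is the "_" placeholder; all non-nil means non-existential
    case agnostic         // the "..." dimension-agnostic shape
}

func canUpcastShape(from source: Shape, to target: Shape) -> Bool {
    switch (source, target) {
    case (_, .agnostic):
        // Anything may move to the dimension-agnostic shape.
        return true
    case (.agnostic, .extents):
        // Recovering dimensions would be a downcast.
        return false
    case let (.extents(from), .extents(to)):
        // The number of extents must match; each target extent is either
        // a placeholder or the same number as the source extent.
        guard from.count == to.count else { return false }
        return zip(from, to).allSatisfy { src, dst in dst == nil || src == dst }
    }
}

print(canUpcastShape(from: .extents([2, 5]), to: .extents([2, nil])))  // number to placeholder
print(canUpcastShape(from: .extents([2, nil]), to: .extents([2, 5])))  // placeholder to number
print(canUpcastShape(from: .extents([]), to: .agnostic))               // zero-dimensional case
```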

Note that these rules mean that [... ; Any.singular] is the top fixed-size array type.

Since extent counts besides 1 are supported, automatic Sequence support would be awkward. But I was about to propose a reworking of the Sequence/Collection hierarchy, and fixed-size arrays could be made to conform to those, where single-dimension arrays could conform to MutableCollection too.