[Second review] SE-0453: Vector, a fixed size array

dimi · December 16, 2024, 1:15pm

And to be clear, a) I did not bring this up in the first review because it was too close to the naming arguments from the pitch thread and I wanted to respect the intent of not derailing that, and b) I did outline how we could go forward with this current state of the world, by going with Tuple and making a few modifications to the proposed type to allow for eventually merging these ideas in the language itself:

michelf · December 16, 2024, 1:33pm

Or maybe FixArray<5, Int>?

I've seen that term before in the Message Pack serialization format, meaning an array where the count is mixed in the type identifier byte.

I suppose the "fix" prefix will make some people think of fixed-point arithmetics, but I doubt it'd confuse anyone for long.

technogen · December 16, 2024, 2:45pm

As pointed out by several people before, I also disagree that resizability is a defining aspect of an array. If InlineArray is already in the works, then I think we just need another member of the Array family of types:

Array: Can grow and shrink without limit, auto-bridges to ObjC, heap-allocated with CoW.
ContiguousArray: Can grow and shrink without limit, guarantees direct access to memory, heap-allocated with CoW.
InlineArray (upcoming type): Can grow and shrink within the limit of compile-time capacity, guarantees direct access to memory, stack-allocated.
StableInlineArray (proposed type): Has fixed size, guarantees direct access to memory, stack-allocated.

I have an unusually strong feeling about the name StableInlineArray because it feels simultaneously extremely expressive, descriptive and follows established nomenclature:

Array: It offers homogeneous storage for values laid out contiguously in memory.
Inline: Its memory entirely exists in line with wherever it is located (relevant naming precedent: inline function being fully baked into the point of use).
Stable: Due to its fixed size, its indices are guaranteed to always be stable (relevant naming precedent: stable sort leaving all existing indices valid).

It also looks very similar to the other array type (InlineArray), except for the added limitation/guarantee, which I think expresses their relationship very elegantly. After all, InlineArray can be though of as a StableInlineArray where the compile-time count is turned into a compile-time capacity.

DandyLyons · December 16, 2024, 3:20pm

Naming the type Tuple sounds like a bad idea to me (though I'm open to the idea of prefixing Tuple like "______Tuple").

Swift already has tuples, which are different from this proposed type in at least a few meaningful ways:

tuples can hold heterogenous values. This proposed type cannot.
tuples are structurally typed. Values can be looked up by index or by "property" name labels. This type is nominally typed.
tuples cannot conform to protocols. This proposed type can.

By naming the type "Tuple" we unnecessarily force the user to understand the difference between "tuple" and "Tuple" in Swift.

Jon_Shier · December 16, 2024, 3:23pm

Well reasoned. I'd also submit FixedInlineArray as an alternative.

davedelong · December 16, 2024, 3:32pm

That's not what I said at all. Please re-read my post.

I said:

it is highly unlikely a package would expose this as a currency type in its API
other packages are likely to want to use the name "Vector"
we have established precedence of name-squatting things that we regret.

My only point was that this should not be named "Vector". I did not say that only public APIs should have nice names. I did not say that this shouldn't have an easily-spelled nor easily-pronounced name. Please don't put words into my mouth.

Andropov · December 16, 2024, 3:47pm

FWIW, I went to look up which terms the book Computer Architecture: A Quantitative Approach generally uses to talk about vector architectures (which, I think, is commonly regarded as the book on Computer Architecture, so should give a reasonable idea of what people from this area of CS think when they hear "vector"). From the Data Level Parallelism in Vector Architectures section, the first example of vector processing is:

How Vector Processors Work: An Example
We can best understand a vector processor by looking at a vector loop for RV64V.

Let’s take a typical vector problem, which we use throughout this section:

Y = a × X + Y

X and Y are vectors, initially resident in memory, and a is a scalar. This problem is the so-called SAXPY or DAXPY loop that forms the inner loop of the Linpack benchmark [...].

For now, let's assume the number of elements, or length, of a vector register (32) matches the length of the vector operation we're interested in. (This restriction will be lifted shortly.)

I can't overstate just how nice it'd be to call X and Y Swift Vectors here. This typical vector problem deals with mathematical vectors that will be stored in vector registers, represented in Swift code by the type... Vector!

The equation above only works if X and Y have matching sizes, something that would be best expressed in Swift at the type level via the integer generic parameter of Vector.

LAPACK calls these arguments vectors too, even though the argument types would be C arrays typically. (It's not lost on me that, despite all the consideration for C++ interoperability, if one were to write a Swift wrapper around CLAPACK, using the "bare" Vector<N, Double> type proposed here would be both reasonable and immensely superior as a name than FixedArray as the argument is, indeed, a vector).

The book later goes into the topic of handling vector operations of length not known at compile time, giving the example:

Moreover, in a real program the length of a particular vector operation is often unknown at compile time. In fact, a single piece of code may require different vector lengths. For example, consider this code:
for (i=0; i <n; i=i+1)
    Y[i] = a ∗ X[i] + Y[i];
The size of all the vector operations depends on n, which may not even be known until run time!

And then goes on to describe how strip mining is used to deal with vectorization of loops of length unknown at compile time. This actually made me think of another way this type relates to a "vector": it seems to me like that, if one were to write this in Swift, in the use cases where N is known at compile time, it would be possible to use generics, with one key difference between the Array version:

func naiveSAXPY<N, T: Numeric>(a: T, x: [T], y: inout [T]) {
    for i in x.indices {
        y[i] = a * x[i] + y[i]
    }
}

And the Vector version:

func naiveSAXPY<N, T: Numeric>(
    a: T,
    x: Vector<N, T>,
    y: inout Vector<N, T>
) {
    for i in x.indices {
        y[i] = a * x[i] + y[i]
    }
}

Being that the latter can expose (through function specialization) the actual value of N to the compiler, while the former can't. If I understood correctly, this is how Vector can improve performance through the removal of unnecessary bound checks. But (again, if my understanding is correct) it would also open the door for the compiler to more reliably vectorize code (not only linear algebra code, but other kinds of code, like vectorizing the comparison of two Vector sequences).

If this is accurate (getting a bit out of my depth here, though at least Godbolt seems to agree in a few quick experiments that Array does indeed fail to optimize to vector instructions in this case when it can't infer the length of the array) I think it's notable that a type that can be used to build semantically-accurate vector types for linear algebra and helps vectorize code (two different "spins" on the meaning of vector) is not called Vector.

technogen · December 16, 2024, 4:28pm

I forgot to mention that well-defined adjectives like Inline are not only providing a descriptive name fragment for a particular type, but are also establishing a standardized naming pattern.

One aspect of the adjective Stable that stood out to me was the fact that it seems to be just the right level of vague, where it still conveys a critical piece of meaning, but leaves room for wider application beyond a specific use case.

The way I'd define this adjective in the context of type naming is: a type that doesn't change its shape despite a reasonable expectation and precedent otherwise.

For instance, a low-level serialization library could offer these types for holding raw binary data:

ByteBuffer: Provides contiguous storage for raw binary data, can grow or shrink without limitation, heap-allocated with CoW, uses ContiguousArray as implementation.
InlineByteBuffer: Provides contiguous storage for raw binary data, can grow or shrink within the limit of a compile-time capacity, stack-allocated, uses InlineArray as implementation.
StableInlineByteBuffer: Provides contiguous storage for raw binary data, has compile-time size, stack-allocated, uses StableInlineArray as implementation.

Or a graphics library that could offer a family of raster image types:

RasterImage: Provides storage for a 2D grid of pixels, can grow or shrink in both dimensions without limit, heap-allocated with CoW, uses ContiguousArray as implementation.
InlineRasterImage: Provides storage for a 2D grid of pixels, can grow or shrink in both dimensions within the limit of a compile-time pixel capacity, stack-allocated, uses InlineArray as implementation.
StableInlineRasterImage: Provides storage for a 2D grid of pixels, has compile-time dimensions, stack-allocated, uses StableInlineArray as implementation.

In all of these examples, both Inline (meaning roughly is all in one place right here) and Stable (meaning roughly can never change its shape) adequately describe the relevant aspect of each type.

xwu · December 16, 2024, 4:34pm

Indeed, I did not say that you said any of those things: I think you'll find that I wrote explicitly that those things were not said by you.

Often, when making arguments, we speak in enthymemes: that is, we leave premises in a syllogism unstated, to be filled in by the listener based on the stated premises and the conclusion.

It is often useful to puzzle out those unstated premises because they can reveal what is assumed to be obviously true but may in fact be arguable. Here's how I arrived at the implied premise I was speaking to based on reading your opening statement:

Only types exposed in public APIs need "nice" names (major premise 1, unstated)
This type will not be exposed in public APIs (minor premise 1, stated)
Therefore, this type does not need a "nice" name (conclusion 1 = major premise 2)
Vector is a "nice" name (minor premise 2, ~~unstated~~ apologies—stated later in the post, insofar as "nice" ≈ desirable for others to use)
Therefore, this type does not need to be named Vector (conclusion 2, stated)

Thank you for clarifying that your reasoning was different from this; I responded to some of those other points above.

ksluder · December 16, 2024, 4:39pm

Except you cannot use Vector<n, Float> to efficiently store values that will be used by vector operations. You must use SIMDn<Float> to ensure that you are using a supported vector width and the value is properly aligned in memory to be used as an operand.

scanon · December 16, 2024, 6:25pm

To be precise, both Vector and SIMD are unsuitable for general LAPACK/BLAS vector operations like SAXPY. They fundamentally want to operate in terms of a non-owning view of memory, like Span (it's not quite Span even, because both LAPACK and BLAS support strided vectors, where only every nth element is operated upon, but we can squint and pretend that's Span-like, at least).

These API must operate on data where it lies in memory, so SIMD is unsuitable because of the additional alignment requirements it imposes, and Vector is likely also unsuitable because you would tend to create lots of copies¹ in real use of these operations (and certainly Vector is unsuitable today because of the inability to do arithmetic on integer generics, which would be necessary for the actual use of SAXPY within LAPACK).

¹ Within LAPACK, low-level vector operations like SAXPY are generally performed on a vector that's actually a piece of a row or column of a larger matrix, and copying those elements out into an owning vector would be unacceptable for performance reasons; you want a span-like view instead.

ksluder · December 16, 2024, 6:30pm

Do you think Vector<n, Float> eliminates the need for a Swift projection of simd_packed_floatN?

scanon · December 16, 2024, 6:31pm

Yes, I think so. (At least, paired with some other API)

wes1 · December 16, 2024, 6:36pm

I too think of this as an array and arrays as possibly fixed-size==count, and am attracted to naming the family of types as such with sensible qualifying aspects -- if they're orthogonal and their combinations are clear.

So, Rigid?

Stable seems pretty broad, and to imply instability (volatile?) in the alternative, which is a feature of values that shouldn't be tangled here.

Counted might reinforce the constant nature of the count

But Fixed seems more common than Rigid in the coding context when size is a primary feature, and here the "shape" is really just the size since the element type shape/size is orthogonal to the vector count/size (as implied by the distinct generic parameter).

As for other aspects: I don't like Inline because array reference types are also contiguous. I.e., Inline is being used to mean nested or embedded-within, but the term tangles with the contiguous-memory aspect (that distinguishes array from other sequences). So I'd exclude Inline, again following the principle that an aspect should not have alternate meanings overlapping other aspects.

For the inline/embedded aspect I prefer Value, and recommended Values<N, T> as the type name because the key feature of this type is being a value type (and value types are a distinguishing feature of Swift). The key semantic is that value types are shared only by copying or by adopting (consuming) if ~Copyable. (Describing it as (best-effort) stack-based (or register or heap) is leaking the abstraction.)

Andropov · December 16, 2024, 7:36pm

On the example code (from the book, the C version) the loop is just iterating over a C array, with the same considerations about alignment a Swift Vector would have.

for (i=0; i <n; i=i+1)
    Y[i] = a ∗ X[i] + Y[i];

I was just making an observation about how the loop above is operating on vectors semantically, and how any pieces of these vectors would be moved into vector registers (because the assumption is that the compiler will vectorize the code) but the actual type of X and Y is called array in C.

(To clear any misunderstanding)

I didn't want to imply that X or Y would each be the operands of a vector instruction or anything like that. The X, Y vectors can be much larger than the operand of any vector instruction on any architecture (ie, imagine you want to perform the operation for N = 1763).

What I meant was that the compiler will break up X and Y into multiple pieces, operating on only a few of those pieces at a time, and that those pieces would be stored in vector registers. Only if X and Y happen to match the natural vector length of the architecture (and have the correct alignment) would there be a 1:1 mapping between a Vector in Swift and a vector register in the µarch, definitely not in the general case.

In any case, my point was that if you were to write a function that does Y = a * X + Y in Swift naively, a function that takes [Double] as arguments would possibly be much slower than the equivalent version using Vector<N, Double>, if the compiler is able to emit code that uses vector instructions for the latter but not the former.

Basically, that if changing code like this:

func foo(a: [Double], b: [Double]) -> [Double] { ... }

To this:

func foo<let N: Int>(a: Vector<N, Double>, b: Vector<N, Double>) -> Vector<N, Double> { ... }

Has the effect of making the compiler more likely to be able to vectorize the code inside the function, that's... significant for the name of a type that mentions performance in its motivation section and has Vector as the proposed name. It definitely seems to give the right vibe here too.

Oh, your code appears to be slow because the compiler is doing lots of bounds-checking on that array that prevent vectorization. Have you tried using Vector instead?

Wouldn't have the same ring to it with FixedArcaneArray.

It was just a minor note anyway, I think there are other (better) reasons why the name should be Vector, I just wanted to share this one as I hadn't seen any discussion on that direction and found it interesting.

Matt_McLaughlin · December 16, 2024, 7:44pm

The Swift Book has this to say:

Swift provides three primary collection types , known as arrays, sets, and dictionaries, for storing collections of values. Arrays are ordered collections of values.

And

An array stores values of the same type in an ordered list. The same value can appear in an array multiple times at different positions.

I’m firmly in the camp that this is an array based on what I consider the essential nature of an array to be. I think naming it SomethingArray is the right thing to do. I don’t have a great sense of what that Something should be but I’ll note that there are two characteristics here that we might want to call out separately :

Fixed size
Fully initialized

FixedSizeFullyInitializedArray is a mouthful, but this isn’t a currency type so I’d love to have more concise ways of describing those two concepts, but in the absence of consensus terse names I’m fine with the verbose one.

Pampel · December 16, 2024, 8:53pm

Just adding some more possibly poorly informed fuel to the fire, I was wondering about what properties a fixed array type would have, and how the names suggested so far fit...

ordered - rules out names like Bunch, Batch, Bag, Clump, Cluster, Lot etc
fixed size - rules out names like List that suggest something that supports adding or removing items, or Elements, Values and many collective nouns that suggest their size is ephemeral
homogenous - rules out Tuple, Row
describes a collection with no particular use case - rules out Batch, Buffer
describes a collection with no particular element type - possibly rules out Vector, which is generally thought of as having numeric elements
describes a collection of potentially unrelated things - rules out *Vector, which often describes a single thing, or Series, Group, Grouping, Cluster and possibly Run which all suggest a relationship between the members
concrete type - rules out Elements, Values which feel more like protocols
eager, not lazy - not sure this rules anything out

Andropov · December 16, 2024, 8:59pm

For vectors that are part of a larger matrix, am I correct in thinking that the problem with using inout to prevent copying is that it would also prevent concurrent accesses to the larger matrix? Also, if the vector is not part of a larger matrix (or concurrent accesses are not needed), is there a performance problem of using inout to avoid copying a Vector in this scenario?

Pippin · December 16, 2024, 9:58pm

Heap, Deque, and tuples are ordered collections (or is their main API) but contain theoretical and technical characterizations that merit other names. I hardly think there would be a discussion that these should be _Array variants purely due to documentation.

Prior to this proposal, I don't think much thought has gone into "What really is a Swift Array". This has been thoroughly explored and explained by @lorentey in the threads. Many distinctions have been brought up between the two structures that I would even find the title of the proposal "Vector, a fixed size array" self contradictory.

Edit: A point against _Array variants: the language vends CollectionOfOne and EmptyCollection, which can be used with Array concatenation and such. It has already been shown that ExpressibleByArrayLiteral is not conclusive that conformers are Arrays (nor should they, that's a language feature).

If we really wanted to follow language precedents, InlineCollection would make more sense then InlineArray (imo).

HsilopNoitaton · December 16, 2024, 10:43pm

I'm sorry to bring this point in so late, but I only joined the forum a couple of days ago.

It is reasonable to want a two-dimensional array which is a flexible array of fixed arrays.

It is perhaps reasonable to want a two-dimensional array which is a fixed array of flexible arrays.

Shouldn't the property of flexible versus fixed be a property of each dimension rather than a property of the array as a whole?

Shouldn't we leave the name "Array" as it is, and enhance the syntax of the dimensions?

As a suggestion,

var a = [1, 2, 3] // flexible

var b = [1, 2, 3 .] // fixed

var fixedArray = Array(repeating: value, count: number, fixed:true)

var c = [[1, 2 .], [3, 4 .]] // flexible array of fixed arrays

var d = [[1, 2], [3, 4] .] // fixed array of flexible arrays

var c = [[1, 2 .], [3, 4]] // syntax error, you can't have an array of different types of array