And to be clear: a) I did not bring this up in the first review because it was too close to the naming arguments from the pitch thread and I wanted to respect the intent of not derailing that, and b) I did outline how we could go forward with the current state of the world by going with Tuple and making a few modifications to the proposed type to allow for eventually merging these ideas into the language itself:
Or maybe FixArray<5, Int>?
I've seen that term before in the MessagePack serialization format, meaning an array whose count is mixed into the type identifier byte.
I suppose the "fix" prefix will make some people think of fixed-point arithmetic, but I doubt it'd confuse anyone for long.
As pointed out by several people before, I also disagree that resizability is a defining aspect of an array. If InlineArray is already in the works, then I think we just need another member of the Array family of types:
- Array: Can grow and shrink without limit, auto-bridges to ObjC, heap-allocated with CoW.
- ContiguousArray: Can grow and shrink without limit, guarantees direct access to memory, heap-allocated with CoW.
- InlineArray (upcoming type): Can grow and shrink within the limit of a compile-time capacity, guarantees direct access to memory, stack-allocated.
- StableInlineArray (proposed type): Has a fixed size, guarantees direct access to memory, stack-allocated.
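A rough sketch of how the family might read at use sites, for comparison (the initializers shown here are assumptions for illustration, and StableInlineArray is of course the hypothetical name being argued for):
var resizable: Array<Int> = [1, 2, 3]                // grows and shrinks freely, heap + CoW
var contiguous: ContiguousArray<Int> = [1, 2, 3]     // same, with guaranteed contiguous storage
var bounded = InlineArray<8, Int>(repeating: 0)      // compile-time capacity, stack-allocated
var fixed = StableInlineArray<8, Int>(repeating: 0)  // compile-time size, stack-allocated (hypothetical)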
I have an unusually strong feeling about the name StableInlineArray because it feels simultaneously extremely expressive and descriptive, and it follows established nomenclature:
- Array: It offers homogeneous storage for values laid out contiguously in memory.
- Inline: Its memory entirely exists in line with wherever it is located (relevant naming precedent: inline function being fully baked into the point of use).
- Stable: Due to its fixed size, its indices are guaranteed to always be stable (relevant naming precedent: stable sort leaving all existing indices valid).
It also looks very similar to the other array type (InlineArray), except for the added limitation/guarantee, which I think expresses their relationship very elegantly. After all, InlineArray can be thought of as a StableInlineArray where the compile-time count is turned into a compile-time capacity.
Naming the type Tuple sounds like a bad idea to me (though I'm open to the idea of prefixing Tuple like "______Tuple").
Swift already has tuples, which are different from this proposed type in at least a few meaningful ways:
- tuples can hold heterogeneous values. This proposed type cannot.
- tuples are structurally typed. Values can be looked up by index or by "property" name labels. This type is nominally typed.
- tuples cannot conform to protocols. This proposed type can.
By naming the type "Tuple" we unnecessarily force the user to understand the difference between "tuple" and "Tuple" in Swift.
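A quick illustration of those differences (FixedTriple here is just a hypothetical stand-in for the proposed homogeneous, nominally typed fixed-size type):
let tuple: (x: Int, label: String) = (1, "one")   // heterogeneous, structural, accessed by label
print(tuple.label)

struct FixedTriple { var a, b, c: Int }           // homogeneous, nominal
extension FixedTriple: Equatable {}               // nominal types can conform to protocols; tuples cannot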
Well reasoned. I'd also submit FixedInlineArray as an alternative.
That's not what I said at all. Please re-read my post.
I said:
- it is highly unlikely a package would expose this as a currency type in its API
- other packages are likely to want to use the name "Vector"
- we have an established precedent of name-squatting things that we regret.
My only point was that this should not be named "Vector". I did not say that only public APIs should have nice names. I did not say that this shouldn't have an easily-spelled nor easily-pronounced name. Please don't put words into my mouth.
FWIW, I went to look up which terms the book Computer Architecture: A Quantitative Approach generally uses to talk about vector architectures (which, I think, is commonly regarded as the book on Computer Architecture, so should give a reasonable idea of what people from this area of CS think when they hear "vector"). From the Data Level Parallelism in Vector Architectures section, the first example of vector processing is:
How Vector Processors Work: An Example
We can best understand a vector processor by looking at a vector loop for RV64V. Let's take a typical vector problem, which we use throughout this section:
Y = a × X + Y
X and Y are vectors, initially resident in memory, and a is a scalar. This problem is the so-called SAXPY or DAXPY loop that forms the inner loop of the Linpack benchmark [...]. For now, let's assume the number of elements, or length, of a vector register (32) matches the length of the vector operation we're interested in. (This restriction will be lifted shortly.)
I can't overstate just how nice it'd be to call X and Y Swift Vectors here. This typical vector problem deals with mathematical vectors that will be stored in vector registers, represented in Swift code by the type... Vector!
The equation above only works if X and Y have matching sizes, something that would be best expressed in Swift at the type level via the integer generic parameter of Vector.
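A minimal sketch of that point, assuming the proposed Vector spelling and the `let N: Int` integer-generics syntax (illustrative, not the proposal's actual API): the shared generic parameter makes a length mismatch a compile-time error.
func add<let N: Int>(_ x: Vector<N, Double>, _ y: Vector<N, Double>) -> Vector<N, Double> {
    var result = y
    for i in 0..<N {
        result[i] += x[i]
    }
    return result
}
// add(xs, ys) only type-checks when xs and ys share the same N.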
LAPACK calls these arguments vectors too, even though the argument types would typically be C arrays. (It's not lost on me that, despite all the consideration for C++ interoperability, if one were to write a Swift wrapper around CLAPACK, using the "bare" Vector<N, Double> type proposed here would be both reasonable and an immensely superior name to FixedArray, as the argument is, indeed, a vector.)
The book later goes into the topic of handling vector operations of length not known at compile time, giving the example:
Moreover, in a real program the length of a particular vector operation is often unknown at compile time. In fact, a single piece of code may require different vector lengths. For example, consider this code:
for (i = 0; i < n; i = i + 1) Y[i] = a * X[i] + Y[i];
The size of all the vector operations depends on n, which may not even be known until run time!
And then it goes on to describe how strip mining is used to deal with vectorization of loops whose length is unknown at compile time. This actually made me think of another way this type relates to a "vector": it seems to me that, if one were to write this in Swift, in the use cases where N is known at compile time it would be possible to use generics, with one key difference between the Array version:
func naiveSAXPY<T: Numeric>(a: T, x: [T], y: inout [T]) {
    for i in x.indices {
        y[i] = a * x[i] + y[i]
    }
}
And the Vector version:
func naiveSAXPY<let N: Int, T: Numeric>(
    a: T,
    x: Vector<N, T>,
    y: inout Vector<N, T>
) {
    for i in x.indices {
        y[i] = a * x[i] + y[i]
    }
}
The key difference being that the latter can expose (through function specialization) the actual value of N to the compiler, while the former can't. If I understood correctly, this is how Vector can improve performance through the removal of unnecessary bounds checks. But (again, if my understanding is correct) it would also open the door for the compiler to more reliably vectorize code (not only linear algebra code, but other kinds of code, like vectorizing the comparison of two Vector sequences).
If this is accurate (getting a bit out of my depth here, though at least Godbolt seems to agree in a few quick experiments that Array does indeed fail to optimize to vector instructions in this case, when it can't infer the length of the array), I think it's notable that a type that can be used to build semantically accurate vector types for linear algebra and that helps vectorize code (two different "spins" on the meaning of vector) is not called Vector.
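For reference, this is roughly the strip-mining idea the book describes, written as a hedged sketch: the loop of unknown length is processed in fixed-size strips whose length is known at compile time. SIMD4<Double> stands in for one strip here only because it exists today; a Vector-based version would play the same role.
func saxpyStripMined(_ a: Double, _ x: [Double], _ y: inout [Double]) {
    precondition(x.count == y.count)
    let stripWidth = 4
    var i = 0
    // Full strips: each iteration has a trip count fixed at compile time.
    while i + stripWidth <= x.count {
        let xs = SIMD4<Double>(x[i], x[i + 1], x[i + 2], x[i + 3])
        let ys = SIMD4<Double>(y[i], y[i + 1], y[i + 2], y[i + 3])
        let r = a * xs + ys
        for lane in 0..<stripWidth { y[i + lane] = r[lane] }
        i += stripWidth
    }
    // Remainder strip of length x.count % stripWidth.
    while i < x.count {
        y[i] = a * x[i] + y[i]
        i += 1
    }
}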
I forgot to mention that well-defined adjectives like Inline not only provide a descriptive name fragment for a particular type, but also establish a standardized naming pattern.
One aspect of the adjective Stable that stood out to me is that it seems to be just the right level of vague: it still conveys a critical piece of meaning, but leaves room for wider application beyond a specific use case.
The way I'd define this adjective in the context of type naming is: a type that doesn't change its shape despite a reasonable expectation and precedent otherwise.
For instance, a low-level serialization library could offer these types for holding raw binary data:
- ByteBuffer: Provides contiguous storage for raw binary data, can grow or shrink without limitation, heap-allocated with CoW, uses ContiguousArray as its implementation.
- InlineByteBuffer: Provides contiguous storage for raw binary data, can grow or shrink within the limit of a compile-time capacity, stack-allocated, uses InlineArray as its implementation.
- StableInlineByteBuffer: Provides contiguous storage for raw binary data, has a compile-time size, stack-allocated, uses StableInlineArray as its implementation.
Or a graphics library could offer a family of raster image types:
- RasterImage: Provides storage for a 2D grid of pixels, can grow or shrink in both dimensions without limit, heap-allocated with CoW, uses ContiguousArray as its implementation.
- InlineRasterImage: Provides storage for a 2D grid of pixels, can grow or shrink in both dimensions within the limit of a compile-time pixel capacity, stack-allocated, uses InlineArray as its implementation.
- StableInlineRasterImage: Provides storage for a 2D grid of pixels, has compile-time dimensions, stack-allocated, uses StableInlineArray as its implementation.
In all of these examples, both Inline (meaning, roughly, "it's all in one place, right here") and Stable (meaning, roughly, "it can never change its shape") adequately describe the relevant aspect of each type.
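A minimal sketch of how the first of those wrappers might look, purely to show the naming pattern in code; every type and initializer here is hypothetical:
struct StableInlineByteBuffer<let Size: Int> {
    // Hypothetical: backed by the fixed-size type under discussion.
    private var storage: StableInlineArray<Size, UInt8>

    init(filledWith byte: UInt8 = 0) {
        storage = StableInlineArray(repeating: byte)
    }

    subscript(index: Int) -> UInt8 {
        get { storage[index] }
        set { storage[index] = newValue }
    }
}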
Indeed, I did not say that you said any of those things: I think you'll find that I wrote explicitly that those things were not said by you.
Often, when making arguments, we speak in enthymemes: that is, we leave premises in a syllogism unstated, to be filled in by the listener based on the stated premises and the conclusion.
It is often useful to puzzle out those unstated premises because they can reveal what is assumed to be obviously true but may in fact be arguable. Here's how I arrived at the implied premise I was speaking to based on reading your opening statement:
- Only types exposed in public APIs need "nice" names (major premise 1, unstated)
- This type will not be exposed in public APIs (minor premise 1, stated)
- Therefore, this type does not need a "nice" name (conclusion 1 = major premise 2)
- Vector is a "nice" name (minor premise 2, unstated; apologies, it was in fact stated later in the post, insofar as "nice" ≈ desirable for others to use)
- Therefore, this type does not need to be named Vector (conclusion 2, stated)
Thank you for clarifying that your reasoning was different from this; I responded to some of those other points above.
Except you cannot use Vector<n, Float> to efficiently store values that will be used by vector operations. You must use SIMDn<Float> to ensure that you are using a supported vector width and that the value is properly aligned in memory to be used as an operand.
To be precise, both Vector and SIMD are unsuitable for general LAPACK/BLAS vector operations like SAXPY. They fundamentally want to operate in terms of a non-owning view of memory, like Span (it's not quite Span even, because both LAPACK and BLAS support strided vectors, where only every nth element is operated upon, but we can squint and pretend that's Span-like, at least).
These APIs must operate on data where it lies in memory, so SIMD is unsuitable because of the additional alignment requirements it imposes, and Vector is likely also unsuitable because you would tend to create lots of copies¹ in real use of these operations (and certainly Vector is unsuitable today because of the inability to do arithmetic on integer generics, which would be necessary for the actual use of SAXPY within LAPACK).
¹ Within LAPACK, low-level vector operations like SAXPY are generally performed on a vector that's actually a piece of a row or column of a larger matrix, and copying those elements out into an owning vector would be unacceptable for performance reasons; you want a span-like view instead.
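To make the "strided, non-owning view" point concrete, here's a sketch modeled loosely on the shape of BLAS's daxpy(n, a, x, incx, y, incy); this is illustrative pointer code, not a binding to any real library:
func daxpy(n: Int, a: Double,
           x: UnsafePointer<Double>, strideX: Int,
           y: UnsafeMutablePointer<Double>, strideY: Int) {
    // Operates on every strideX-th / strideY-th element of storage it does not own.
    for i in 0..<n {
        y[i * strideY] = a * x[i * strideX] + y[i * strideY]
    }
}

// Updating column 2 of a 4x4 row-major matrix in place, without copying it out:
var matrix = [Double](repeating: 1, count: 16)
matrix.withUnsafeMutableBufferPointer { buffer in
    let base = buffer.baseAddress!
    daxpy(n: 4, a: 2.0, x: base + 1, strideX: 4, y: base + 2, strideY: 4)
}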
Do you think Vector<n, Float> eliminates the need for a Swift projection of simd_packed_floatN?
Yes, I think so. (At least, paired with some other API)
I too think of this as an array, and of arrays as possibly fixed (size == count), and I'm attracted to naming the family of types as such, with sensible qualifying aspects -- if they're orthogonal and their combinations are clear.
So, Rigid? Stable seems pretty broad, and to imply instability (volatile?) in the alternative, which is a feature of values that shouldn't be tangled here. Counted might reinforce the constant nature of the count. But Fixed seems more common than Rigid in the coding context when size is a primary feature, and here the "shape" is really just the size, since the element type's shape/size is orthogonal to the vector count/size (as implied by the distinct generic parameter).
As for other aspects: I don't like Inline because array reference types are also contiguous. I.e., Inline is being used to mean nested or embedded-within, but the term tangles with the contiguous-memory aspect (the one that distinguishes arrays from other sequences). So I'd exclude Inline, again following the principle that an aspect should not have alternate meanings overlapping other aspects.
For the inline/embedded aspect I prefer Value, and I recommended Values<N, T> as the type name because the key feature of this type is being a value type (and value types are a distinguishing feature of Swift). The key semantic is that value types are shared only by copying, or by adopting (consuming) if ~Copyable. (Describing it as (best-effort) stack-based (or register, or heap) is leaking the abstraction.)
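A tiny sketch of that "shared only by copying or by consuming" point (the buffer types below are arbitrary stand-ins, not proposed names):
struct CopyableBuffer { var bytes: [UInt8] }
struct UniqueBuffer: ~Copyable { var bytes: [UInt8] }

func takeCopy(_ b: CopyableBuffer) {}
func takeOwnership(_ b: consuming UniqueBuffer) {}

let shared = CopyableBuffer(bytes: [1, 2, 3])
takeCopy(shared)               // shared by copying; `shared` remains usable here
let unique = UniqueBuffer(bytes: [1, 2, 3])
takeOwnership(consume unique)  // shared by consuming; `unique` may not be used afterwards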
On the example code (from the book, the C version), the loop is just iterating over a C array, with the same considerations about alignment that a Swift Vector would have.
for (i = 0; i < n; i = i + 1)
    Y[i] = a * X[i] + Y[i];
I was just making an observation about how the loop above is operating on vectors semantically, and how any pieces of these vectors would be moved into vector registers (because the assumption is that the compiler will vectorize the code), but the actual type of X and Y is called array in C.
(To clear up any misunderstanding:)
I didn't want to imply that X or Y would each be the operand of a vector instruction, or anything like that. The X and Y vectors can be much larger than the operand of any vector instruction on any architecture (e.g., imagine you want to perform the operation for N = 1763).
What I meant was that the compiler will break up X and Y into multiple pieces, operating on only a few of those pieces at a time, and that those pieces would be stored in vector registers. Only if X and Y happen to match the natural vector length of the architecture (and have the correct alignment) would there be a 1:1 mapping between a Vector in Swift and a vector register in the µarch; definitely not in the general case.
In any case, my point was that if you were to write a function that does Y = a * X + Y in Swift naively, a function that takes [Double] as arguments would possibly be much slower than the equivalent version using Vector<N, Double>, if the compiler is able to emit code that uses vector instructions for the latter but not the former.
Basically, that if changing code like this:
func foo(a: [Double], b: [Double]) -> [Double] { ... }
To this:
func foo<let N: Int>(a: Vector<N, Double>, b: Vector<N, Double>) -> Vector<N, Double> { ... }
has the effect of making the compiler more likely to be able to vectorize the code inside the function, that's... significant for the name of a type that mentions performance in its motivation section and has Vector as the proposed name. It definitely seems to give the right vibe here too.
Oh, your code appears to be slow because the compiler is doing lots of bounds-checking on that array, which prevents vectorization. Have you tried using Vector instead?
Wouldn't have the same ring to it with FixedArcaneArray.
It was just a minor note anyway; I think there are other (better) reasons why the name should be Vector. I just wanted to share this one, as I hadn't seen any discussion in that direction and found it interesting.
The Swift Book has this to say:
Swift provides three primary collection types, known as arrays, sets, and dictionaries, for storing collections of values. Arrays are ordered collections of values.
And
An array stores values of the same type in an ordered list. The same value can appear in an array multiple times at different positions.
I'm firmly in the camp that this is an array, based on what I consider the essential nature of an array to be. I think naming it SomethingArray is the right thing to do. I don't have a great sense of what that Something should be, but I'll note that there are two characteristics here that we might want to call out separately:
- Fixed size
- Fully initialized
FixedSizeFullyInitializedArray is a mouthful, but this isn't a currency type. I'd love to have more concise ways of describing those two concepts, but in the absence of consensus on terser names I'm fine with the verbose one.
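For what it's worth, here is a small sketch of what those two characteristics mean in code, using a hand-rolled stand-in rather than any proposed API:
struct FixedFour<Element> {
    // Fixed size: always exactly four elements, never more or fewer.
    private var e0, e1, e2, e3: Element

    // Fully initialized: there is no way to construct a value with "holes".
    init(repeating value: Element) {
        (e0, e1, e2, e3) = (value, value, value, value)
    }

    subscript(i: Int) -> Element {
        switch i {
        case 0: return e0
        case 1: return e1
        case 2: return e2
        case 3: return e3
        default: preconditionFailure("index out of range")
        }
    }
}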
Just adding some more possibly poorly informed fuel to the fire: I was wondering about what properties a fixed array type would have, and how the names suggested so far fit...
- ordered - rules out names like Bunch, Batch, Bag, Clump, Cluster, Lot, etc.
- fixed size - rules out names like List that suggest something that supports adding or removing items, or Elements, Values, and many collective nouns that suggest their size is ephemeral
- homogeneous - rules out Tuple, Row
- describes a collection with no particular use case - rules out Batch, Buffer
- describes a collection with no particular element type - possibly rules out Vector, which is generally thought of as having numeric elements
- describes a collection of potentially unrelated things - rules out Vector, which often describes a single thing, or Series, Group, Grouping, Cluster, and possibly Run, which all suggest a relationship between the members
- concrete type - rules out Elements, Values, which feel more like protocols
- eager, not lazy - not sure this rules anything out
For vectors that are part of a larger matrix, am I correct in thinking that the problem with using inout to prevent copying is that it would also prevent concurrent accesses to the larger matrix? Also, if the vector is not part of a larger matrix (or concurrent accesses are not needed), is there a performance problem with using inout to avoid copying a Vector in this scenario?
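For context, this is the exclusivity angle the first question is getting at, sketched with plain arrays and slices rather than any proposed type: passing part of a larger value inout requires exclusive access to that storage for the whole call.
struct TinyMatrix {
    var storage: [Double] = .init(repeating: 0, count: 16)  // 4x4, row-major
}

func scaleRow(_ row: inout ArraySlice<Double>, by a: Double) {
    for i in row.indices { row[i] *= a }
}

var m = TinyMatrix()
// Exclusive (write) access to m.storage lasts for the duration of the call;
// overlapping accesses to it from elsewhere would violate exclusivity.
scaleRow(&m.storage[4..<8], by: 2)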
Heap, Deque, and tuples are ordered collections (or that is their main API), but they have theoretical and technical characterizations that merit other names. I hardly think there would be a discussion that these should be _Array variants purely due to documentation.
Prior to this proposal, I don't think much thought had gone into "what really is a Swift Array?". This has been thoroughly explored and explained by @lorentey in the threads. Many distinctions have been brought up between the two structures, so many that I would even find the title of the proposal, "Vector, a fixed size array", self-contradictory.
Edit: A point against _Array variants: the language vends CollectionOfOne and EmptyCollection, which can be used with Array concatenation and such. It has already been shown that ExpressibleByArrayLiteral is not conclusive evidence that conformers are Arrays (nor should it be; that's a language feature).
If we really wanted to follow language precedents, InlineCollection would make more sense than InlineArray (imo).
I'm sorry to bring this point in so late, but I only joined the forum a couple of days ago.
It is reasonable to want a two-dimensional array which is a flexible array of fixed arrays.
It is perhaps reasonable to want a two-dimensional array which is a fixed array of flexible arrays.
Shouldn't the property of flexible versus fixed be a property of each dimension rather than a property of the array as a whole?
Shouldn't we leave the name "Array" as it is, and enhance the syntax of the dimensions?
As a suggestion,
var a = [1, 2, 3]               // flexible
var b = [1, 2, 3 .]             // fixed
var fixedArray = Array(repeating: value, count: number, fixed: true)
var c = [[1, 2 .], [3, 4 .]]    // flexible array of fixed arrays
var d = [[1, 2], [3, 4] .]      // fixed array of flexible arrays
var e = [[1, 2 .], [3, 4]]      // syntax error: you can't have an array of different types of array