[Second review] SE-0453: Vector, a fixed size array

ibex10 · December 12, 2024, 10:21am

Maybe it is worth exploring a nameless type instead, if there is too much opposition to the name Vector?

let u: [8 x Int] = [2, 3, 5, 7, 11, 13, 17, 19]

var v: [3 x [3 x Double]] = [...]

I don't see much of a difference between these:

var v: [3 x [3 x Double]] = [...]

var v: Vector <3, Vector <3, Double>> = [...]

The nameless type would essentially be this:

[Int x T]

'x' could be replaced by ',' but 'x' is easier to see.

@Slava_Pestov, have I gone off the rails by suggesting this?

dimi · December 12, 2024, 1:58pm

I'm a big fan of the goals of this proposal, but as I've voiced in the pitch thread already, I don't think Vector is the right name at all. My primary reasoning is:

Vector is not a friendly term to someone new to programming or swift, as they will likely be thinking of geometric vectors instead, the much more common term despite the mathematical broadness. Given the feedback Swift gets as becoming ever more complex, especially by veterans in the apple ecosystem, I don't think this should be taken lightly just because this is a "power user" type.
C++ using it already is bound to a) make the (brand new!) interop with C++ painful, and b) not a great reason why we should use it too. I long relished in Apple's APIs having better names than the competition out there, and still feel like we made a mistake choosing something so cryptic to the average coder as Mutex over the more common term of Lock.

If we were to do nothing but a rename, I would push for Tuple or HomogenousTuple as the most accurate representation of what this is, or Inline as an adequate short description of the intent.

Some have brought up the issue with Inline as an adjective, but I find this to be a silly reason to reject it as it feels to my limited experience with other languages that Swift more or less pioneered the use of Optional from one such adjective into now a common noun. Not to mention, when referring to an Inline, I would use it like an adjective: "This function returns an optional integer" … "This function returns inline integers" — in both these cases, optional and inline are adjectives modifying the principle type being returned: an ... integer/integers.

All of this aside, I really do think the actual type here is a specific subset of a tuple, and I think that instead of shoring up a named type, there is much more value in extending and modernizing tuples themselves to gain these capabilities directly. Namely:

A shorthand for defining them: (16 * UInt8) rather than (UInt8, UInt8, UInt8, ...) (I would also push for a compiler error to always force the numeral to show up first here as an enforced style).
Allowing extensions and protocol conformances on tuple definitions (both in terms of repeated types and parameter packs) to leverage generics for shared implementations.
Introduce a Tuple named variant to represent all existing tuples, allowing a current Tuple<N, T> form along with a hypothetical future Tuple<A, B, C, ...> form (perhaps making use of parameter packs? new named generics? the sky's the limit!), to make more complex initializers available.

This has many benefits:

Uplifts the rest of the language rather than the narrow slice that would make use of an inline array type for optimization reasons.
Maintains C-interop without additional code, as the new syntax would be an alias to the old one and functionality and existing code can remain unchanged.
Enables the goals of having a tuple-like structure that can live on the stack, but also conforms to collection like types with a full library of accessors and mutations on them.

As a shortcut to doing all of this now, I propose we do go with Tuple as the "now" solution, with explicit action items to generalize tuples such that this interim Tuple type transparently disappears when the more general tuple catches up. To make this happen, I would imagine it's memory layout would need to be identical to that of a tuple for upcoming ABI compatibility, and I would change the ExpressibleByArrayLiteral initialization pattern to use that of a tuple directly.

technogen · December 12, 2024, 2:14pm

I posted something like this elsewhere, but I think it's worth sharing it here:

I do think that having the word Array in there is a good idea, because (as far as I know, please correct me if i'm wrong), it's functionally is identical to the Array except for its size being compile-time known and non-resizable. I think that the conformance of Array to the Collection protocol is not its defining characteristic and we shouldn't use it as a distinguishing factor for these two types. It's entirely not inconceivable that someone will come up with a different protocol to abstract away both Array and this type for the purpose of generic code.

I would think in a direction of "why did we have to introduce this type"? Unless I'm mistaken, there are two reasons:

Because the dynamic allocation overhead of Array is not acceptable for certain use cases.
In order to gracefully work with C arrays, which in turn exist for the aforementioned reason, so we can disregard this reason for the purpose of this thought experiment.

Side-note on using the word Tuple: I think it's the heterogeneous nature of a tuple that makes it a tuple and not something else, and calling something HomogeneousTuple feels categorically contradictory.

I think it's safe to say that the best and most descriptive name for a homogeneous sequence of elements laid out in a continuous block of memory is Array.

In light of all this, I'd recommend naming it something like InlineArray, because the word inline (referring to a function) already means "take what normally resides elsewhere and hard-code it in the usage site for the sake of performance", which is exactly what the purpose of this type is.

I'm not married to the word Inline, but I do think that analyzing the purpose of the type carefully and considering the nuances of it's characteristics is the best way to come up with the most appropriate name.

So far, I haven't seen a name that more accurately describes why this type actually exists.

P.S.

This name also neatly falls in line (no pun intended) of the existing progression of more restrictive and more performant implementation of an array:

Array: Most flexible and compatible, least performant.
ContiguousArray: Less flexible and compatible, more performant.
InlineArray: Least flexible and compatible, most performant.

Lua · December 12, 2024, 5:08pm

InlineArray is also a good name because it could introduce a widely applicable naming convention.

I have an Image struct with a dynamic width and height, and with such a convention introduced by an InlineArray I could add a new compile time sized struct named InlineImage, and the difference between these image structs would be immediately clear to anyone who understands Array vs InlineArray.

Libraries will need to come up with such names, and a nice convention in the standard library would help everyone choose more consistent naming for their own types

technogen · December 12, 2024, 5:11pm

That's a great observation! This is exactly what I hoped to achieve: a naming convention that can be extrapolated to similar use cases!

Nobody1707 · December 12, 2024, 5:54pm

Span won't own any memory, it's literally just a view into already allocated memory.

Dmitriy_Ignatyev · December 12, 2024, 6:16pm

+1 for Vector name. Vector is from math, it is felt right.

I'm against the idea of naming it as in C++ because it is effectively to say: 'lets do it wrong again because previously it was done wrong, so forever we have no chance to make it right'.
There is a great opportunity to make things better in Swift.

Another reason is that we can not do as in C++, because we already have an Array and we can't rename it. So Swift Array is a C++ Vector, and what is called Array in C++ has no better name than Vector. Other names will be unbalanced with C++ anyway.

One more reason is that when anybody begin to learn new language, it is needed to learn tons of nuances. What is called Dictionary in Swift has naming like Map / HashMap / Hashtable in other languages. So it is need to be learned and remembered anyway.
But at least newcomers can feel a happy moment that fundamental data types like Array and Vector are named correctly in Swift.

As for InlineArray / SmallArray and other kinds of Array – I would prefer to use this names later, for array-like types with statically known limited capacity where elements are not required for initialization.

michelf · December 12, 2024, 6:48pm

I like the name InlineArray because I think it's important to make clear the storage is inline more than anything else about this type.

To me, the word "array" doesn't mean the container size is variable at runtime. An array is just a list of elements placed adjacent in memory. That Swift.Array can grow or shrink doesn't imply all arrays can grow or shrink.

I see Vector as an invitation for every library to add their own operations to the type when they should really just create their own domain-specific wrapper. This type shouldn't be named in a way that implies it can do more than just store values.

As an aside, I'm still wondering if we could still use tuple-like syntax.

Perhaps like this:

let u: (3 Int) = (2, 3, 5)

This would make a tuple of 3 Ints, but it'd be a distinct type from the normal tuple (Int, Int, Int) in order to get the correct padding at the end. We could call this kind of type a homogeneous tuple, or maybe an array tuple. And we probably could enable implicit conversions between the two in many situations.

Like a normal tuple, a homogenous/array tuple wouldn't be a named type so you can't write extensions for it. Also like a normal tuple, built-in conformances to certain standard library protocols could be part of the type. And like a tuple, the storage is inline and fixed in size and the values are always initialized, so those expectations are already baked in.

But there's still plenty of fuzzy details I haven't though about so maybe it's not a good idea.

Cgaafary · December 12, 2024, 7:25pm

I agree with this sentiment. I would propose that naming be optimized to reduce confusion for existing swift programmers rather than to reduce confusion for C++ programmers.

I would expect that a polyglot programmer from another language such as C++ would have the capability to adapt to the syntax of a different language.

Is the concern that the name Vector will lead to an excess of errors when interfacing between Swift and C++?

Dmitriy_Ignatyev · December 12, 2024, 7:28pm

I would argue that ability to store elements inline is a consequence of the fact that all elements of Vector are known at its initialization.
You can then wrap Vector in a class-Ref like it is done in Rust to make it heap allocated instead of stack allocation, or make some kind of indirection to make it not-in-place-allocated being member of a class if you need it.
So core semantics as I see it is that Vector:

fixed sized
all elements are known and initialized when Vector is initialized
Inline storage is the ability based on listed above contract.

Dmitriy_Ignatyev · December 12, 2024, 7:39pm

+1
It is unusual but I like it, because it is a specific term with very concrete characteristics.

What I want to understand better is how it differs from Vector and what are the arguments for this name is better than well and broad known Vector.

You also provide terms FixedVector / StableVector which implies there are not fixed / unstable variants, which seems to be current Swift.Array.

Lua · December 12, 2024, 8:01pm

The fixed size is a consequence of the array being inline, since you can't grow what isn't allocated, not the other way around. You could make a fixed size allocated array, but this one can't be made resizable.

I think inline storage is the defining trait of this type. It is much more suitable for implementing lower level data structures than general use as a fixed size array. Trying to treat it as one would have you quickly run out of stack memory.

Dmitriy_Ignatyev · December 12, 2024, 8:06pm

I can easily imagine a sort of xxxArray, that is fix-sized stack allocated (and inline), but empty after initialization. Elements can be added / removed, but it is impossible to add more elements than capacity of such an array, because its capacity is fixed.

Both of them are and can be inline because their capacity is 1) known 2) fixed. Second one (xxxArray) is much more similar to current Swift.Array than what is proposed as Vector – it can be empty, elements can be added / removed.

The ability of being stored inline is only an option. Such data structures are stored inline by default, but it is possible to opt out.

lorentey · December 12, 2024, 8:20pm

The defining characteristic of Swift's Array types is that they provide operations for changing their size: insert, append, remove`, concatenation, etc.

Span does not provide any of these things: it is not an array type in Swift, and it should not be called an Array.

UnsafeBufferPointer does not provide any of these things: it is not an array type in Swift, and it should not be called an Array.

Vector does not provide any of these things: it is not an array type in Swift, and it should not be called an Array.

I agree! InlineArray is a good name, which is, as I have repeatedly explained, precisely why we intend to use it to name the actual array type that happens to have inline storage:

struct InlineArray<let capacity: Int, Element: ~Copyable>: ~Copyable {
  init()
  var count: Int
  var isEmpty: Bool { get }
  var isFull: Bool { get }
  subscript(position: Int) -> Element { borrow mutate }
  mutating func append(_ newElement: consuming Element)
  mutating func removeLast() -> Element
  // etc.
}

To reiterate, InlineArray is a thing we are planning to add. If it comes to fruition, is going to be a true variant of Array, just like ContiguousArray and BitArray are. The prefix Inline* in the name InlineArray establishes a clear naming convention that applies to other basic data structures: InlineDeque, InlineSet, InlineDictionary, InlineString are all great names that will immediately convey the nature of these types.

Vector is not InlineArray. Vector is not a variant of Array: it is its own, unique construct. As a core primitive, it deserves a dedicated name; it does not need to establish a general naming pattern.

Vector is not a variant of Array.

There is such a thing as a multi-dimensional "vector"; the appropriate name for such a type is, obviously, Matrix.

Can you expand why you'd want to encode the dimensions of an image into its type? Image types with inline storage feel like a curious idea, in general. What is the use case you have in mind?

technogen · December 12, 2024, 8:23pm

Okay, this is news to me! I'm getting confused... Can you please explain why does this type deserve to exist if the planned InlineArray is essentially the same thing, but with a more polished API?

Lua · December 12, 2024, 8:27pm

As an example, I am writing a Minecraft based game, and there is one thing Swift can't currently express, that is the fact that a Chunk contains inline a Vector<16, Vector<256, Vector<16, Block>>>

though I would just make it one array and write a subscript that treats it as 3d.

It does not make sense for this to be resizable, a chunk always contains all the blocks

You can obviously just ignore the resizability, but that's storing more bytes for no reason, so it isn't exactly equivalent

ksluder · December 12, 2024, 8:29pm

I am not a mathematician, but I believe you are describing a tensor, not a matrix.

walkerbrown · December 12, 2024, 8:36pm

I think this note regarding multidimensional indexing and syntax sugar is really insightful. I hope such index reordering is considered.

For both new programmers and experienced ones coming from C, C++, or Rust, I think FixedArray would be the most obvious and least surprising name choice.

Breaking with established convention to name Swift's Array was the right call. Inverting the established convention by using Vector for a fixed-size type seems set to invite confusion and to work against Swift's goals of C++ interoperability.

Lastly, with a type named Vector, I'd expect to be able to point it in the opposite direction by multiplying (each component) by -1. However, the current proposal would offer Vector<3, UInt>.

Lua · December 12, 2024, 8:40pm

Really only the fact that it doesn't allocate, so it could be written as a global variable/constant and used with the same API as Image they both inherit from a Drawable protocol.

/// A namespace for built in images.
public enum Images {
    public enum Interface {
        /// The pico-8 style cursor embedded for debugging purposes.
        public static let cursor: Image = [
            [.clear, .black, .clear, .clear, .clear, .clear],
            [.black, .white, .black, .clear, .clear, .clear],
            [.black, .white, .white, .black, .clear, .clear],
            [.black, .white, .white, .white, .black, .clear],
            [.black, .white, .white, .white, .white, .black],
            [.black, .white, .white, .black, .black, .clear],
            [.clear, .black, .black, .white, .black, .clear]
        ]
    }
}

I have some simple images inlined in code so I can use them for debugging, in this case I wanted to test drawing a cursor to the screen in wasm before implementing a system to embed actual image files in the wasm binary. This could be an InlineImage instead.

Or since this works in Embedded Swift, maybe it would be useful there. I can think of some games I could implement to run in wasm with an InlineImage as a fixed size drawing target that wouldn't require linking in the memory allocator, making the binary even smaller

Like this embedded swift pong, the size of the "screen" is known at compile time

davedelong · December 12, 2024, 8:42pm

This doesn't make sense to me, because:

// is this not an "Array" because I can't insert, append, remove, nor concat?
let immutable = [1, 2, 3]

// but this is an array because I can insert, append, remove, or concat?
var mutable = [1, 2, 3]

I have never heard that mutation is the core definition of array-ness. I've always heard arrays described as ordered sequences of values that can be retrieved-by-index in constant time and, if you choose to attempt to mutate them, have varying performance characteristics depending on the operation.

In other words, the defining characteristics were ordering and O(1) indexing.