Vector, a fixed-size array

rauhul · October 12, 2024, 9:22pm

I can't say I love the Rigid prefix, but putting that aside. Is it expected that Vector would form the backing inline storage for these types? Wouldn't that necessitate a static func uninitialized() unsafe -> Self -esk factory function on Vector?

glessard · October 13, 2024, 6:59am

No. Rather their inline storage would be built with the same building blocks used for Vector. The fully initialized state is fundamental to Vector.

Andropov · October 13, 2024, 11:00am

Genuinely excited for this proposal to come to Swift. Fully agree with every single point raised in the proposal. It's one of those things that seems so obvious in retrospect, and yet are extremely hard to pull off.

I like Vector as a name choice. I opened this thread prepared to say well, this isn't actually a vector... but I ended up being convinced by the arguments in the proposal.

First, this makes "true" mathematical vectors trivial to write: one just needs to define a group operation (typically addition) and scalar multiplication. This can be easily done by extending the type. I think this alone already is a strong basis for naming this data structure Vector. Other programming languages already use vector as a name for some stuff that is even more detached from the concept of a mathematical vector (C/C++/Rust).

It's true that for certain types it wouldn't be possible to define a group operation and a field over which Vector<N,T> becomes a true mathematical vector, but I think that's largely irrelevant. When T is not numeric, the mathematical definition of vector is likely not a consideration at all and developers will understand that Vector<N,T> is something shaped like the analogous (mathematical) vectors but with none of the properties.

While vectors of any length can fulfill the mathematical definition of a vector, that's not very useful in a vacuum. Addition and scalar multiplication define a vector space, and this vector space is a set of vectors of a fixed size. One rarely operates with vectors from different vector spaces at the same time.

So Vector<N,T> would be an element from the vector space defined by vectors of length N and type T.

If T is numerical, then regardless how the operations are defined, all the potential generated vector spaces are isomorphic, and equal to the set of all allowed values of Vector<N,T> (for a fixed N). So one could think of Vector<N,T> as a vector from a vector space for which we don't know the defining operations.

xtremekforever · October 13, 2024, 12:31pm

I wonder if this would be great for embedded? I noticed that Array was a heap allocating type so it couldn't be used with -no-allocations. So would Vector then be a non-heap allocating type and maybe live somewhere on the stack or get pulled out of .text?

jacubit · October 13, 2024, 12:40pm

I'm pretty excited about this pitch too! Like a lot of people here, the general concept is so much needed and welcome.

The name Vector for the proposed type is very unfortunate and IMO needs to be revisited. I favour names like Buffer or even FixedSizeArray. Just because other languages have made the mistake doesn't mean we have to go down that road.

IMO, Swift got the name Array right for a dynamic container! An array is an ordered container for any type. There are no restrictions on the type as long as the container has the same type of elements (It doesn't have any invariants pertaining to mathematical meaning). As far as I know, the proposed type is a fixed-size array and has nothing to do with the mathematical term vector (more on this below). The proposed type can hold any non-copyable type, so it is not intended to be used only for numeric types.

Indeed, the std::vector type goes against the definition of vector by being a
growable container, having a non-fixed magnitude.
We fully acknowledge that the Swift types, Swift.Array and Swift.Vector, are
complete opposites of the C++ ones, std::vector and std::array. While it may
be confusing at first, ultimately we feel that our names are more in line with
the mathematical term of art.
If there was any type we could add to the standard library whose name could be
Vector, it must be this one.

No, this is not the mathematical term. A vector is an object that may be added together and multiply by scalars. It is used extensively in computer graphics and machine learning, to name but a few, and to point out that it is not an unrelated term in programming. Sure, the proposed type can be used as a primitive to support those operations, but that is not what this proposal is about, nor the intention for the general use of this container. The proposed name doesn't help to disambiguate meanings, and this indirectly affects productivity.

I am concerned that naming this type Vector will start us off on the wrong foot when we try to introduce a LinearAlgebra library into the ecosystem. This is what happened in C++, and it took them so many years to come up with a proposal for this (I hope it doesn't take Swift a decade to get this into the standard library). In fact, I'd like to share the reference for the C++26 proposal of the addition of a Linear Algebra library, which may help to understand that Vector is just not the right name in the long term:

many C++ programmers incorrectly use the term vector as a synonym for
“dynamically re-sizable array.
” This bad habit is reinforced by the unfortunate naming of
std::vector.

Andropov · October 13, 2024, 4:05pm

I'm surprised by the machine learning example as I'd have argued that the term "vector" in machine learning is actually more aligned with this proposal's definition of vector than with the actual mathematical term. A vector does not only require addition and scalar multiplication (technically a group operation and a field), but also that the results of those operations remain in the same vector space. Yet there are multiple examples in machine learning where this is not true.

For example the input of a machine learning model, sometimes called the feature vector, does not fulfill the mathematical definition of a vector, because the feature space is not a vector space. Adding two feature vectors together (or multiplying one of them by an scalar) does not necessarily result in a valid feature vector (the result may not be contained in the feature space), which would be a must have requirement to properly consider it a vector. So it's more like a box of numbers that supports element-wise addition and multiplication.

jacubit · October 13, 2024, 5:54pm

Linear algebra studies vector spaces as you have defined them, and there is basic linear algebra in ML, but of course it is not just linear algebra. It's certainly more than that. Another common misconception is to confuse vectors with "points". In some scenarios, there exist a mapping that can take those "points" into a vector space (think of a plane that does not cross the origin). AFAIK this happens in so-called "Kernel Methods", and the feature space is not just a vector space, it has even an inner product.

I do not want to hijack the thread as this is not a mathematical discussion, so I will just leave it with this comment. Besides, I am far from being an expert in machine learning.

I am happy about the prospect of the addition of this new type to the standard library!

glessard · October 13, 2024, 11:32pm

Yes, but we'd like to do it with a safer, non-escapable type such as OutputSpan (prototyped in swift-foundation .) While this doesn't address the out-of-order initialization need, OutputSpan and other types in the Span family will themselves provide unsafe access to Vector's underlying storage. Hopefully, we won't have to add an withUnsafeBufferPointer() methods to types such as Vector that can vend a Span.

benrimmington · October 14, 2024, 12:09am

Can the Count and/or Element be inferred?

let cArray: Vector<2, _> = [x, y]

Should string literals also be supported?

let cString: Vector = "\r\n"

// cString.count == 3
// cString[0] == UInt8(ascii: "\r")
// cString[1] == UInt8(ascii: "\n")
// cString[2] == UInt8(ascii: "\0")

Alejandro · October 14, 2024, 12:18am

Yes! Both of these are able to be inferred either by using the placeholder _ or by omitting the generics entirely, Vector.

benrimmington:

Should string literals also be supported?

let cString: Vector = "\r\n"

// cString.count == 3
// cString[0] == UInt8(ascii: "\r")
// cString[1] == UInt8(ascii: "\n")
// cString[2] == UInt8(ascii: "\0")

Maybe with some more compiler hackery it could, but generally speaking Vector could not conform to ExpressibleByStringLiteral because the supplied string instance may have more or less bytes/scalars/characters than the initialized Vector's Count.

Alejandro · October 14, 2024, 12:36am

It could, but if I remember correctly @glessard has been looking into reducing this implicit conversion. I think we want to go into a direction of being more explicit around these APIs because it's very easy to see a vector being passed a regular parameter but it's actually being converted to a pointer and passed to a C function which may or may not hold onto that pointer longer than it should.

Yes, this should be possible sometime in the future. It would be pretty trivial with a couple more language features constraining parameter pack types:

extension Vector where Element: ~Copyable {
  public init<each T>(
    _ tuple: (repeat each T)
  ) where count(each T) == Count, T == Element {
    ...
  }
}

Going to a tuple seems a bit more trickier

This is untrue! I described in the source compatibility section that the standard library types are already shadowed from user modules. Any modules currently defining a name Vector will continue to work as normal as well as clients importing said modules!

Yes! However, we'll save that for a potential future proposal

Oops, yes you are correct. I will correct the proposal.

You are correct, this doesn't make any sense. I will remove the result value.

Right. I included the swapAt that takes two indices because that appears on MutableCollection and thought there should be an exchange equivalent, but if we can not include that and just go through the global exchange that'd be even better.

Yes, that is a major motivation for this type as well! I expect we will need to teach the compiler to lower these things into constant memory, but the result should be the same as if you were using a tuple. There wouldn't be any initialization necessary unless the element type isn't static/constant.

Alejandro · October 14, 2024, 1:06am

There's currently no feature flag yet for this as this hasn't landed yet (at least experimentally).

Yeah, this is a pretty big performance difference, but the behavior of the types should feel the same to Swift developers because they are both value types. Mutating an Array only mutates your copy of it, mutating a Vector will only mutate your unique instance/copy. It is probably worth calling out in the proposal however about the performance differences however because you're right, copying a vector and assigning it to a new variable, etc. will incur an immediate full copy.

Fair! I will update the proposal with usages of all the proposed API on it (excluding the generalized Collection APIs because those should feel very familiar).

I feel it'd be weirder to not support the array literal syntax for this type no? I mentioned in a previous comment that we could write a parameter pack taking vector init, but that would require some new constraints on parameter pack generics.

There could be a Sequence taking initializer that throws an error when the count of the sequence is less than the count of the vector. What do we do with the leftover elements? I assume we just ignore those and initialize the first Count number of elements within a sequence to the vector?

beccadax:

I'm imagining a call site for this initializer and it seems…strange.
let evens = Vector<8, Int> { i in i * 2 }
Would it be possible for this to have a "size-pinning parameter" in the same way that many type conversion APIs have a "type-pinning parameter"? If it is possible, do we think that would be a good idea?
// Feels a little more comfy if we can pull it off
let evens = Vector(count: 8) { i in i * 2 }

Not currently with integer generics, no. The generic parameter Count must either be an integer literal, inferred, or some other integer generic. This functionality may included when we get existential integer generics however

Yep, this result will be removed, thanks!

I'm unsure if we're lacking features, more so just that we need to generalize these protocols. Equatable and Hashable should be relatively easy to do. CustomStringConvertible should also be really easy, but I know @lorentey might've had some thoughts about whether the shape of this protocol is really the right thing for noncopyable types.

I think the current plan is to generalize Optional to support Equatable and Hashable for noncopyable wrapped types as soon as those are generalized as well. We could pretty easily do the same for Vector I believe.

Array is well served with ExpressibleByArrayLiteral, but there may be other types who do want a variable-sized initialization that doesn't allocate an array but instead uses a stack allocated local to do so. There is a desire to both allow types to control the size of the array literal being passed while also not allocating an array for the variable sized case as well.

xwu · October 14, 2024, 1:21am

Will incur, or will the compiler try to elide at least clearly unnecessary copies in optimization passes?

I'd imagine this has been considered, but if this is worth calling out, has there been thought to this or a parallel type being unconditionally noncopyable, so users have to explicitly invoke an API in order to copy all or a subset of elements?

Alejandro · October 14, 2024, 1:48am

The latter. The compiler will treat this type as if it were any other copyable type.

We initially wanted this type to be unconditionally noncopyable, but it was brought up that this might be a bit overly ceremonial for vectors of trivial types. It would require enclosing types to be noncopyable and parameters to declare ownership (which is not a bad thing! just maybe a bit too much for a couple of integers) among other things. The desire for an explicit copy is there, but I wonder if instead of a separate type we were to be explicit about all the ways the compiler will implicitly copy your type as well as ways it can be explicitly copied through source

xwu · October 14, 2024, 3:00am

RIght, part of what I'm wondering is if there are sufficient differences between, say, Vector<1024, URL> and "a couple of integers" that it's worth making the latter something like SmallVector<2, Int>; where smallness is in the control of the user and means "small enough that you want to copy it implicitly without the benefit of CoW." (This isn't new of course, even to Swift: we already have "smol"-ness as an internal concept for String.)

Naively, I feel that a user who is advanced enough to opt into the 'less flexible' Vector over the currency type Array will many times actively care about being explicit about whether their aggregate is something that's implicitly copied around or not.

Nevin · October 14, 2024, 3:47am

Perhaps “Vector” should be as pitched, and “FixedSizeArray” should be a COW wrapper around it…

Philippe_Hausler · October 14, 2024, 4:35am

Yes absolutely, one of the factors for pitching this was the exploration we did for embedded. There are many places where a non-heap allocating fixed sized collection is quite useful for building firmwares or kernels etc.

Per your question about where it lives: I believe the behavior will end up being that if the value is global and a let it will be able to be emitted directly to the read-only memory section. When writing boot loaders specifically it has a number of uses that can help avoid writing assembly by hand (specifically the interrupt/exception vector... and that nod of the name as prior art imho lends to the naming chosen quite nicely).

sveinhal · October 14, 2024, 12:13pm

I agree that these are well-established and true. But to me, the problem arises with any binary operation on this type: There is obviously an equally well-established expectation that vectors conform to—well—vector arithmetic.

The pitched type would be super useful to represent all kinds of vectors from mathematical and physical domains, such as euclidean vectors, complex numbers, matrices, polynomials, (polar) coordinates, and more.

And for each of these, you'd expect them to be non-growable and indexable by subscript, as you suggest. And, they are all well-known as vectors!

But, crucially, you'd also expect + to not be concatenation, but the appropriate vector space addition operation.

I realize that there are also a huge number of people who do expect the type described in this pitch to support concatenation, but the fact that these expectations are as incompatible are they are polarizing, suggests that the name probably isn't that great after all.

grynspan · October 14, 2024, 1:21pm

Do we need a fixed-size buffer type? Yes, I think so, especially for C interop. Should they be named Vector? Sorry, but that seems to me to be the wrong choice:

There is existing usage of the term to mean resizable buffers in other high-profile related languages; this existing usage is sufficient to cause confusion for developers moving to Swift from those languages.
There is existing usage of the term for types that represent mathematical vectors with vector arithmetic support (vec3 et al.) and these types have widespread adoption.

FixedArray, FixedSizeArray, SafeBuffer, or even just Buffer might be better names for this type.

(I don't think anything I've said here is novel and other comments have covered all of it—just adding my two cents.)

supervillain · October 14, 2024, 2:23pm

It sounds like what you are actually defining is a tuple, which is a superset of a vector.

The intended use cases might be to use this as a vector, but it could also be used as a quaternion, or a matrix, or even some other multi-dimensional list of things which are not numbers (which makes this definitely not a true vector).

From a programmer's point of view, I might be thinking of a Tuple<4> as a quaternion, while at the same time I have a Tuple<4> which refers to a 3d homogenous coordinate, and have yet another Tuple<4> which defines the coordinates of an axis aligned bounding box. Only the 3d coordinate is conceivably vector in the technical sense, providing direction and magnitude.

Perhaps it would be sensible to extend existing Tuples where all of the elements are of the same type, or provide a new homogenous tuple which provides the proposed capabilities. The fact that C interop already supports tuples hints that what we need is an extended tuple instead of a more limited array?