Vector, a fixed-size array

scanon · October 14, 2024, 2:28pm

Tangent, but very much no! Quaternions are vectors in the technical mathematical sense: they are a commutative group under addition, and have a scalar product that distributes over the addition, and that's all that a mathematical vector space is.

scanon · October 14, 2024, 2:39pm

This has been proposed and discussed extensively in the past few years. In practice it runs up against some pretty hard limits, some of which have already been discussed up-thread:

Tuples are non-nominal types, so there's no way to conform them to, say Collection.
Tuple conformances can only express the special case of a tuple that conforms to a protocol because all of its elements do.
The ABI mangling for tuples is pretty awful when it comes to using them as a fixed-size container.
We have no good mechanism for talking about large homogeneous tuples in the language.
Plumbing initialization through tuples is kind of a mess.
Also there's a layout issue about tail padding.
and some other stuff

All of these are probably solvable, but by the time you do so, you're really talking about a separate non-tuple type. So it makes more sense to just build that type directly.

That is exactly what this proposal is doing.

ksluder · October 14, 2024, 2:55pm

It seems to have gotten little attention that this type is non-copyable and stack-allocated. That makes this type very niche for such a general term as “Vector”.

scanon · October 14, 2024, 2:59pm

Well, that's because it isn't.

The proposed type is conditionally copyable:

extension Vector: Copyable where Element: Copyable {}

"stack allocated" isn't really a meaningful concept. It's stored inline in the context in which it appears, which sort of mostly correlates to what people probably have in mind when they say "stack allocated".

ksluder · October 14, 2024, 3:04pm

Ah, my mistake. But I think this highlights a downside of the chosen ~Copyable syntax, since so many container types’ main declarations are ~Copyable, including ones that are not conditionally copyable.

lorentey · October 14, 2024, 11:15pm

There is no need to spitball anything. Perhaps I wasn't clear enough, but my post already lists the members of the vector family: they're Vector, Span, UnsafeBufferPointer and UnsafeMutableBufferPointer.

We're planning to soon add MutableSpan; it'll be extending this list. I don't believe there are any other vector-like constructs in consideration.

As @glessard explained, no. Neither Span nor Vector are appropriate choices to represent the raw backing store of a resizable data structure, because both of these constructs require their contents to be fully initialized. (Plus Span is non-escapable.) We do not expect to ever add operations that punch uninitialized holes inside Vector, just like we're very unlikely to add an operation that deinitializes a single member of a tuple.

High-performance low-level data structure implementations need to be able to manage initialization of their own storage in whatever way they see fit. An array type wants to compress its initialized items at the start of its storage, so bookkeeping just requires a count; a ring buffer type like Deque needs to add a start offset; a hash table with open addressing requires a full bitmap; etc. etc. The construct that is the greatest common divisor of all these is UnsafeMutableBufferPointer.

We can choose to implement Vector and Span in terms of operations on UnsafeMutableBufferPointer. Whether we end up doing this is an engineering decision for the stdlib, mostly outside of the Swift Evolution process. (Span in particular may end up rolling its own builtins, as it has unique exclusivity guarantees that can enable future compiler optimizations that wouldn't be possible to do with regular pointers.)

Yes; embedded use cases are indeed the primary driving force behind this work. (Like most work on the Swift performance predictability roadmap.)

Vector has near-universal appeal though. It adds a new core operation to Swift's type algebra, exponentiation.

Vector<5, Int> is literally 5 integers glued together and treated as a unit. This has the same* shape as (Int, Int, Int, Int, Int). But it's so much more! Treating it as a distinct, named construct gives us extra power: we can give it custom member operations; we can conform it to arbitrary protocols; we have the full flexibility of Swift to make it integrate well into our library idioms.

(* In spirit. It turns out that homogeneous tuples do not have precisely the same memory layout as what we expect of a vector type -- AIUI the final item isn't fully padded. Vector resolves this vexing issue.)

I expect to see Vector types frequently pop up as components of custom types, and as input/result types of custom functions -- not just in novel use cases like embedded platforms, but everywhere, throughout all Swift.

(Vector is necessary for embedded platforms to embrace Swift, but it's probably not sufficient. To progress things beyond the 1970s, we'll eventually also want to add an actual array type with inline storage. In our drafts we're calling that separate type InlineArray<N: Int, Element>. It has a fixed capacity of N items of type Element, and it works like a true array, with insert/remove operations.

Unlike Vector (which has near-universal appeal) InlineArray would be specifically targeting "embedded"-style and/or extraordinarily high-performance use cases -- cases that need/want to avoid dynamic allocations at all cost. It isn't clear yet if we'd want to add InlineArray to the core stdlib; given its far narrower role, it may be better to propose it in a module that needs to be explicitly imported.)

Courses on linear algebra are core parts of the curriculum for most computer science studies. I am well aware of what vectors are in computer graphics and machine learning.

The Vector name is so apt precisely because this type would be directly usable as a vector representation in a Swift linear algebra package.

I say "a vector representation" rather than "the vector representation", because I expect a production library will generally also need to offer other vector forms. (Sparse vectors, strided vectors, vectors with heap-allocated storage, etc., etc.) This does not provide grounds for objection, though: the vector representation we're proposing is the most natural one, and it deserves to carry the unqualified name.

We are not proposing vector space operations at this time. I already detailed the reasons behind this:

I look forward to seeing how far our package ecosystem can go towards resolving or working around these issues.

Unlike C++'s broken std::vector, our proposed Vector type is fully capable of becoming the core representation of vectors in a Swift linear algebra module. This is part of why I think this name is so apt.

The quoted passage simply reiterates Stepanov's opinion that using the term "vector" to mean a "dynamically re-sizable array" was incorrect. We wholeheartedly agree with this position; that is why we quoted it in the pitch document.

That very same document defines "vector" as follows:

A matrix is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns. [...]

A vector is a matrix with only one row or one column. Although the vector is traditionally introduced as a tuple of scalars, and a matrix as a tuple of vectors, as far as theorems of linear algebra are concerned there is no difference between a single column matrix and a column vector, nor a single row matrix and a row vector.

Our proposed type fully embraces this definition. It abstracts it in precisely one way: Vector allows its elements to be of arbitrary types. I don't see how this would be a problem for linear algebra packages -- the operations they provide will simply constrain the element type to match scalars in whatever approximation of a vector space they are using.

This abstraction is the only practical choice -- I don't expect the Standard Library would want to mess with trying to define a hand-wavy concept like "numbers, symbols, or expressions". And of course it enables the use of the Vector type to escape the strict confines of specialized linear algebra libraries. Applying existing words to new contexts is a common form of evolution in all human languages. (In our case, this evolution has already happened, decades ago.)

People's expectations are sometimes far removed from reality. I do not believe we will ever need/want to add a concatenation operation to Vector, under any name. I cannot prove this belief; but I can point out that (as far as I know) the existing vector types in C and C++ (T[] and std::array<T, N>) do not provide such an operation, either. Not just one spelled with +, but vector concatenation in any form.

Given that + over vectors can be technically interpreted in two different ways, I think the stdlib is unlikely to define such an operator -- but this should not by no means prevent other modules from providing it!

That said, Swift's binary operators are not in a great place -- they have unresolved, major design issues that generally prevent public library code from exposing custom operator definitions.

I'd very much like to see libraries define vector addition, scalar multiplication etc. over Vector, because I'm hoping one implementation would become definitive enough to include in the stdlib. But given Swift's issues with operators, I expect successful libraries would rather represent these operations with named member methods, not infix operators:

extension Vector where Element: AdditiveArithmetic {
  func adding(_ other: Self) -> Self
}
extension Vector where Element: Numeric {
  func scaled(by factor: Element) -> Self
}

The real challenge here is to make sure expressions like a.adding(b.scaled(by: 3)) will get compiled into efficient enough machine code -- our experience with the stdlib's SIMD vector types strongly indicates that we cannot rely on optimizer heuristics for reliable vectorization. (Aside: oh hey, there is that word again!)

(One interesting research area would be to use macros to implement custom expression-level vectorization; perhaps that'll also enable the (re)introduction of infix operator syntax.)

This work is independent of the core definition of the Vector type itself. The proposed type will be the canonical representation of a vector in Swift, with deep compiler integration which will be impossible to replicate outside the stdlib.

lorentey · October 15, 2024, 2:55am

If we were designing Swift from scratch today, Tuple would indeed be a good name for this struct. However, we would need to find another name for the general type product, i.e., what Swift has been calling a "tuple":

Tuple Type

A tuple type is a comma-separated list of types, enclosed in parentheses.

You can use a tuple type as the return type of a function to enable the function to return a single tuple containing multiple values. You can also name the elements of a tuple type and use those names to refer to the values of the individual elements. An element name consists of an identifier followed immediately by a colon (:). For an example that demonstrates both of these features, see Functions with Multiple Return Values.

Having a struct Tuple<N: Int, Element> in the stdlib while also keeping the tuple terminology for the type product would cause constant confusion: perhaps less so in written prose (because we can usually use code voice to distinguish between Tuple and tuple), but it would most definitely be an issue in speech.

When I hear someone say "that function needs to return a tuple", I need to know precisely what they mean:

extension FileDescription {
  static func pipe() throws -> (readEnd: FileDescriptor, writeEnd: FileDescriptor)
  static func pipe() throws -> Tuple<2, FileDescriptor>
}

"Oh, not that sort of tuple -- I meant it in the vector sense."

Using the name Tuple instead of Vector would be an obstacle to clear communication -- we should not fall into that trap.

Nick_Gabba · October 15, 2024, 4:33am

I am unsure how this should be different performance-wise from the SIMD primitives - for short Vectors - or an ordinary Array for large Vectors. This feels like a problem better-solved at the optimizer level, so that it would benefit all code that operates on local “collections of things”.

ksluder · October 15, 2024, 5:02am

Perhaps, in the fullness of time, the feature proposed here can be an instance of a more generalized variadic Tuple<let each n, each T>, where each n and each T must have the same arity.

dimi · October 15, 2024, 7:59am

I wholehartedly agree here, which is which I proposed HomogenousTuple, as a more constrained definition with more capabilities as a result, possibly with automatic bridging with regular tuples (obviating the need for a special language mode for C-imports):

dimi:

One final benefit of this line of thinking, HomogenousTuple could also be a regular tuple, so long as all the elements match. For instance, the following could become allowable:
// In some existing library
func lookup(id: (Int, Int, Int, Int)) {...}

...

let itemID: HomogenousTuple = (1, 2, 3, 4) // let id: HomogenousTuple<4, Int>
lookup(id: itemID)
This could simplify the compatibility story. uuid_t could be re-defined as HomogenousTuple<16, UInt8> in a potentially non-breaking way, along with the rest of the C APIs, potentially simplifying the requirement of waiting for a future swift language mode.

Ultimately, this ambiguity is why I don’t prefer Vector as a name for this type. All the reasoning for this name are sound, don’t get me wrong, but the ambiguity with other languages and with other subjects is setting it up poorly for success. I’m not saying the situation is identical, but as someone forced into TypeScript for work reasons, I am constantly getting let/var, length/count, and ;/ mixed up between the two, and this is with decades of Apple platform and web development experience. Adding yet another ambiguous name is likely to hurt those who work in C++ and Swift daily the most (ie. those doing interop work) — sure, they likely know the answer, but it’s extra cognitive burden.

Similarly, newcomers to the language or programming in general (many of which get into coding because they want to make a game, as I’ve seen from my own students) may see Vector in autocomplete and think it’s primarily there for doing math and geometry, and be confused when they eventually get around to reading documentation that it isn’t primarily for that purpose. This is opposed to seasoned programmers coming to Swift (or any swift developer not reading this thread), seeing Vector in a completion… and likely thinking the same thing!

Worse would be being introduced to the type reading existing code, and being confused as to why a Vector is being used in a situation likely divorced entirely from math. The best outcome here are folks thinking “well that’s weird” and moving on, but it makes for a poor impression of the language in most other cases (which, ultimately hurts my prospects of convincing my work to switch to server-side swift, so I do have some skin in the game ). An aside, but this was also my reaction when Deque was introduced — even after using them a ton, and this may be old age creeping in, but I sincerely am reminiscing about the “gold old days when NSArray would just have this behavior automatically”…

If the core team sincerely believes the average Swift developer will have no cognitive burden seeing Vector in use, both when it is introduced and after a few years in use, then by all means use it, but otherwise I do think alternatives may need to be more wholeheartedly considered.

(I also acknowledge that developers just starting out will have little reason to reach for this proposed type over an Array in general, but at the same time I hope we all realize why that’s a poor excuse for falling back to deeply technical requirements to make it accessible at all).

mgriebling · October 15, 2024, 9:28am

A key difference would be the ability to deal with large data sets (e.g., arbitrary precision math algorithms). I also agree that SIMD optimization is severely lacking when compared to ordinary C coding of algorithms.

sveinhal · October 15, 2024, 1:37pm

I don't think the expectation is "far removed from reality". The proposed type is presented as a collection type, and most non-mathematically minded users will first and foremost have a mental model of this as a "collection of fixed size", not an "element in a vector space".

Whether this will translate into a demand for a concatenation operator, I'm not sure. I hope not, but the mere existence of the duality in semantic meanings, are problematic to me.

I fear that modules will.

allevato · October 15, 2024, 1:57pm

If some other module wishes to implement + for whichever meaning of + it wants to use, I suppose it has two options:

If it wants v + w to mean the arithmetic sense of +, then ideally it should conform to AdditiveArithmetic. That would be a @retroactive conformance, which should be enough of a guard rail to make them think twice about doing it. The right answer here would be for that library to not use Vector directly, but to wrap it in its own type that provides the arithmetic conformances and operations.
If it wants v + w to mean concatenation, there is no protocol that provides just that operation without a slew of others that don't apply (e.g., RangeReplaceableCollection provides implementations of +, but Vector can't conform to that). So they would have to implement func + directly without an associated conformance. But given the global nature of operators in Swift, implementing operators you don't own on types you don't own may have some of the same dangers as retroactive conformances. (Some of the risks aren't there, like dynamic casts/dispatch behaving differently if the owner introduces the conformance, but there could be issues with ambiguity or overload selection later on?)

This makes me wonder if we should also require @retroactive on operator function implementations if you don't own the operator or any of its argument types.

grynspan · October 15, 2024, 2:23pm

To navel-gaze a bit, I can absolutely see a concatenating + operator on this type when the element type is CChar because it would be a zero-allocation equivalent of strlcat() from C:

let vector1: Vector<_, CChar> = "Hello "
let vector2: Vector<_, CChar> = "world!"
let vector3 = vector1 + vector2 // contains "Hello world!"

(I'm assuming here that the count could be inferred as it can be with true C arrays, but that's not the point.)

This sort of operation would be useful in Embedded Swift and other memory-constrained environments, I think.

But to navel-gaze in the opposite direction, because the prospective name of this type is Vector, I would absolutely expect all of its operators to be vector mathematics operators including +. (How would we express the dot product without needing a Unicode punctuation character like •, I wonder…?)

scanon · October 15, 2024, 2:42pm

From a mathematical perspective, using + for concatenation is deeply flawed, though widespread (+ by convention is used for commutative groups, while concatenation forms a non-commutative monoid). Julia did this right (if you're going to provide an operator for string concatenation, it ought to be *, but probably it makes the most sense to just spell it out a.append(contentsOf: b) or concatenate(a, b), or cat(a, b) if you need it to be concise).

I don't think any language has tried to use ⋅ for the dot product, even linear-algebra-focused languages, for two good reasons: first, it's a pain in the butt to type, and second it doesn't leave you any good spelling to differentiate the conjugated- and unconjugated-complex dot products.¹ Fortran calls it dot_product(a, b) (or if you're using the BLAS bindings, dot(a, b)). Numpy, Julia, and Matlab use dot(a,b), and Julia and Matlab also expose it as a' * b.

¹ The conventional ("unconjugated") dot product definition (∑aᵢbᵢ) used for real vectors is missing some of the usual properties when a and b are complex vectors. Namely you can have dot(a, a) == 0 with a non-zero, and dot(a, a) won't be a real number, which means that the dot product no longer gives you a norm on the vector space. This is usually addressed by defining the complex dot product as ∑aᵢ conj(bᵢ), which recovers the norm properties at the cost of symmetry in a and b. It's useful to be able to talk about both "dot products", however, so languages often define something like dotu(a, b) and dotc(a, b).

Alejandro · October 15, 2024, 4:40pm

I don't believe either of these proposals are doing it wrong, but it is worth mentioning why this can conform to Collection and why Span cannot. Span is a ~Escapable type which means that we'd need to generalize Collection to support non escapable types, but that isn't as simple as just slapping ~Escapable on the protocol. It would require changes to the SubSequence and Indices associated types which gets really hairy really fast. We're hoping that whatever new collection/container protocols we come up with will be fully generalized allowing any kind of type to conform to it.

jacubit · October 15, 2024, 7:59pm

Thank you for your response. I didn’t mean to question anyone’s background or knowledge. My comment was simply about how the pitch is framed in relation to the suggested naming for the type. After reading through the earlier comments, I thought it might be useful to revisit the mathematical definition of a vector to help clarify why the term matters in programming. I’ve noticed some concerns about aligning with mathematical terminology, but my intention was to highlight a part of the pitch that I believe could be confusing, even for mathematically adept people.

I can see more of your motivations now. If I may make a suggestion on this part of the pitch, I would add that the name is inspired by the common representation of a vector. As you emphasised in your reply, "representation" is the key word here. This is clearer than what was mentioned about magnitude.

GreatApe · October 17, 2024, 2:51pm

Why would it be confusing for geometers? It will be a perfect type for representing mathematical vectors.

GreatApe · October 17, 2024, 2:53pm

Strictly speaking I don't think you even need a vector space in order to talk about vectors?

GreatApe · October 17, 2024, 2:57pm

Surely that's possible for real vector spaces?