Vector, a fixed-size array

roopekv · October 29, 2024, 2:20pm

Maybe in the broadest sense, but when talking about mathematical vectors people usually mean something more along the lines of the Swift Standard Library's SIMD Vector Types, which I would have personally liked to have been named something like Vector<Width, Scalar> instead of SIMD2/3/4/8/16/32/64<Scalar>.

https://developer.apple.com/documentation/swift/simd-vector-types

I'm not a member of the core team, but we do already have these SIMD Vector Types in the Swift Standard Library, though no matrices or quaternions as of yet.

https://github.com/swiftlang/swift-evolution/blob/main/proposals/0229-simd.md

https://github.com/swiftlang/swift-evolution/blob/main/proposals/0251-simd-additions.md

GreatApe · October 29, 2024, 2:33pm

A vector itself is just a piece of information, formatted in a certain way. Operations on vectors are not part of the vector itself. So a vector space is a set (of vectors) along with some operations (addition, scalar multiplication etc).

So it makes total mathematical sense to define a generic vector type that can’t necessarily always be used in a vector space. For example, one may conditionally make it Addable when its scalar type is.
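To make the conditional-conformance idea concrete, here is a minimal sketch. It assumes the standard library's AdditiveArithmetic protocol as a stand-in for "Addable", and uses a hypothetical three-element Triple type rather than the proposed Vector, so it compiles without integer generic parameters.

```swift
// Hypothetical fixed-size container; not the proposed Vector type.
struct Triple<Scalar> {
  var x: Scalar, y: Scalar, z: Scalar
}

// Equatable is required by AdditiveArithmetic, so it is declared first.
extension Triple: Equatable where Scalar: Equatable {}

// Addition is only available when the scalar type supports it.
extension Triple: AdditiveArithmetic where Scalar: AdditiveArithmetic {
  static var zero: Triple { Triple(x: .zero, y: .zero, z: .zero) }

  static func + (a: Triple, b: Triple) -> Triple {
    Triple(x: a.x + b.x, y: a.y + b.y, z: a.z + b.z)
  }

  static func - (a: Triple, b: Triple) -> Triple {
    Triple(x: a.x - b.x, y: a.y - b.y, z: a.z - b.z)
  }
}

let sum = Triple(x: 1, y: 2, z: 3) + Triple(x: 10, y: 20, z: 30)
```

A Triple<String> is still a valid value under this sketch; it simply has no + operator, because String does not conform to AdditiveArithmetic.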

So I can’t see any mathematical objections at all. As to how other languages chose to name it in the 70s, I don’t think that should be taken into consideration at all. I am sure that C++ developers will manage, they are smart people!

ksluder · October 29, 2024, 2:45pm

C++ itself doesn’t date back to the 70s, and the STL is even younger. And as @Andropov pointed out, Rust uses Vec for its growable container too.

The mathematical arguments have already been thoroughly debated. We don’t need to rehash them.

GreatApe · October 29, 2024, 2:52pm

Ok but still, using vec or vector for something that can change length is very dubious, I don’t think we should perpetuate that mistake. The whole idea with Swift was to have a fresh start, wasn’t it?

hassila · October 29, 2024, 2:55pm

I don't think anyone has argued in favour of that...

ksluder · October 29, 2024, 2:55pm

Nobody is suggesting to rename Array (Swift’s growable container) to Vec. Swift does, however, have native interop with C++, which uses std::vector frequently at API boundaries.

rauhul · October 29, 2024, 2:55pm

Chiming in on this, I've been experimentally adopting «this new type» in an embedded Swift code base to avoid Arrays. My conclusion is that the name probably won't matter much because every single one of these has been wrapped in another struct, e.g.

struct StrongType: ... {
  typealias Storage = Vector<54, UInt8>
  var storage: Storage

  ...
}

I am almost exclusively referring to StrongType or StrongType.Storage and only generic code is operating on Vector.

In this example I think both Vector and Inline are totally fine names.

I think the fact that some API will need to traffic in Vector itself makes it a better name. Specifically I think func x(_: Vector<...>) -> Vector<...> reads better than func x(_: Inline<...>) -> Inline<...>.

GreatApe · October 29, 2024, 3:01pm

Sorry, I meant indirectly, by not naming this clear vector type "vector", based on the fact that some other languages misuse that name…

ksluder · October 29, 2024, 3:02pm

My counterpoint is that it shouldn’t read better because you probably shouldn’t be doing this without a very good reason. Returning a Vector<N, T> almost certainly involves spilling nearly N instances of T to the stack. If your program or library cares enough to adopt Vector, it will almost always be a better idea for such a function to take an uninitialized UnsafeMutableBufferPointer<T> (or Span<T> in the future?) and avoid the temporary copy to the stack.
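A rough sketch of the pattern described above, using only APIs that exist today: the producer writes into caller-provided uninitialized memory instead of returning a large value. The function names writeSquares and makeSquares are hypothetical, and a heap-allocated buffer stands in for whatever storage the caller actually owns.

```swift
// Hypothetical producer: initializes every element of caller-provided memory
// instead of building and returning a large fixed-size value.
func writeSquares(into buffer: UnsafeMutableBufferPointer<Int>) {
  guard let base = buffer.baseAddress else { return }
  for i in 0..<buffer.count {
    (base + i).initialize(to: i * i)
  }
}

// Hypothetical caller: owns the storage and hands out a view of it.
func makeSquares(count: Int) -> [Int] {
  let buffer = UnsafeMutableBufferPointer<Int>.allocate(capacity: count)
  defer { buffer.deallocate() }
  writeSquares(into: buffer)
  return Array(buffer)
}

let squares = makeSquares(count: 4)
```

Copying into an Array at the end is only there to make the result easy to inspect; a real caller would keep working with its own storage.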

scanon · October 29, 2024, 4:35pm

SIMD arithmetic has all sorts of operations that do not make sense on general vector spaces (e.g. elementwise multiplication, or shifts, or division, or horizontal reductions, or saturating rounding doubling multiply-add). We very deliberately did not name the SIMD types Vector for exactly this reason.

(If we had had integer generic parameters when we were designing SIMD, I would have used them, so we would have SIMD<2, Int32>, but we would very definitely not have called them Vector.)
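For readers less familiar with the SIMD types, a few of the operations mentioned above look like this with today's standard library. This is only an illustrative sketch; the variable names are made up.

```swift
let a = SIMD4<Float>(1, 2, 3, 4)
let b = SIMD4<Float>(2, 2, 2, 2)

// Elementwise multiplication: not a vector-space operation.
let product = a * b

// Horizontal reduction: adds the lanes of a single value together.
let total = a.sum()

let i = SIMD4<Int32>(1, 2, 3, 4)

// Per-lane shift, only meaningful for integer lanes.
let shifted = i &<< SIMD4<Int32>(repeating: 1)

// Wrapping horizontal reduction for integer lanes.
let wrappedTotal = i.wrappedSum()
```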

scanon · October 29, 2024, 4:43pm

Vectors would be modeled by a VectorSpace protocol, rather than a concrete or generic type, if we wanted to make them "conceptually correct" in the stdlib. A vector space is a set together with an addition and multiplication that satisfy certain axioms, not a specific representation, something like:

protocol VectorSpace {
  associatedtype Scalar: ScalarField
  static var zero: Self { get }
  static func +(a: Self, b: Self) -> Self
  static func *(a: Scalar, b: Self) -> Self
  // everything else can be defaulted in terms of the above
}

as such, I do not believe that any of the names proposed here create a problem for progress in such a direction.
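As a sketch of how such a protocol could be adopted, the following restates it in compilable form and conforms a hypothetical Displacement type. The ScalarField constraint is dropped here because no such protocol exists in the standard library; Displacement and linearCombination are invented for illustration.

```swift
// Simplified restatement of the protocol; Scalar is left unconstrained.
protocol VectorSpace {
  associatedtype Scalar
  static var zero: Self { get }
  static func + (a: Self, b: Self) -> Self
  static func * (a: Scalar, b: Self) -> Self
}

// Hypothetical conforming type: a 2D displacement over Double.
struct Displacement: VectorSpace, Equatable {
  var dx: Double
  var dy: Double

  static var zero: Displacement { Displacement(dx: 0, dy: 0) }

  static func + (a: Displacement, b: Displacement) -> Displacement {
    Displacement(dx: a.dx + b.dx, dy: a.dy + b.dy)
  }

  static func * (a: Double, b: Displacement) -> Displacement {
    Displacement(dx: a * b.dx, dy: a * b.dy)
  }
}

// Generic code only needs the protocol, not any particular representation.
func linearCombination<V: VectorSpace>(_ terms: [(V.Scalar, V)]) -> V {
  var result = V.zero
  for (scalar, vector) in terms {
    result = result + scalar * vector
  }
  return result
}

let combined = linearCombination([
  (2.0, Displacement(dx: 1, dy: 0)),
  (3.0, Displacement(dx: 0, dy: 1)),
])
```

Nothing in this sketch needs a concrete type called Vector, which is the point being made about the naming.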

jeremyabannister · October 29, 2024, 4:51pm

Ok, well thank you for addressing my question. It sounds like you're mostly certain as opposed to entirely certain that the name Vector wouldn't be needed in such code. To me, the fact that it does sound like a real possibility that we will eventually model such things in the standard library makes me continue to feel uneasy about using Vector here, but now that I know that my concern has been registered I feel at least a little better since I do trust you to be prudent about these things. I'll rest my case.

roopekv · October 29, 2024, 5:02pm

Although you did get quite close to doing so:

https://github.com/swiftlang/swift-evolution/blob/93575ecac1d292970a389a144d8fc9187d94259c/proposals/0229-simd.md?plain=1#L605-L643

6. Updates based on discussion on Swift-Evolution and with core team: I have eliminated
`_`-prefixed protocol requirements, to make it somewhat more obvious how user types
can be made to conform. This required some renaming to make the function of these
associatedtypes more clear. I have also systematically removed `Vector` from the naming
scheme, in order to remove confusion with C++-style vectors or "mathematical" vector
structures that might be added at some future point. The resulting names are generally
shorter, which is an nice benefit, and they now have a uniform `SIMD` prefix.

Protocols renamed:

| Old name | New name |
| --- | --- |
| `SIMDVector` | `SIMD` |
| `SIMDVectorizable` | `SIMDScalar` |
| `SIMDVectorStorage` | `SIMDStorage` |
| `SIMDMaskVector` | N/A* |

Associated types renamed:

This file has been truncated.

| Old name | New name |
| --- | --- |
| `Vector2<Scalar>` | `SIMD2<Scalar>` |
| `Vector3<Scalar>` | `SIMD3<Scalar>` |
| `Vector4<Scalar>` | `SIMD4<Scalar>` |
| `Vector8<Scalar>` | `SIMD8<Scalar>` |
| `Vector16<Scalar>` | `SIMD16<Scalar>` |
| `Vector32<Scalar>` | `SIMD32<Scalar>` |
| `Vector64<Scalar>` | `SIMD64<Scalar>` |

lorentey · October 30, 2024, 6:05pm

There has been an important concern raised about Vector's conditional Sequence and Collection conformances: its iterator and subsequence types will need to make a full copy of the vector every time they are created:

let items: Vector = [1, 2, 3, ..., 1_000_000]

func foo() {
  for item in items { // Implicit copy of a million integers
    ...
  }
}

func bar() {
  var slice = items[...] // copy of a million integers
  while let value = slice.first {
    ...
    slice = slice.dropFirst() // copy of a million integers
  }
}

For large vectors, this has the potential to become a major performance footgun.

Collection requires its slicing subscript to have O(1) complexity; Vector's implementation does technically satisfy this -- it copies a constant number of items! -- however, this observation merely highlights the limits of big-O notation; it will provide precious little solace to folks who have loops like in bar above.

If large vectors are going to be pervasive, then the only remedy I can see is to remove Vector's conditional Sequence and Collection conformances, losing our ability to pass vectors to functions generic over these protocols, and the ability to use classic for-in loops to iterate over them.

Borrowable container protocols will eventually give us borrowing for-in loops, but we aren't ready to propose those at this time.

I'm a bit disappointed and frustrated by this, as I expected Vector to become the ideal type for storing bulk data in static storage, like the static let in the example above -- but its copyability makes that use case a clear performance trap. The only way out would be to relegate Vector to the tiny cases, giving up on its use for bulk storage.

ksluder · October 30, 2024, 6:09pm

I kind of assumed that Span was going to be the recommended way to pass references to a Vector, but it’s been difficult to understand that since the Span and Vector proposals were split.

This is a bit of a knee-jerk reaction, but perhaps this is evidence that a type like Vector is not the right solution to the kinds of problems it’s trying to solve? Perhaps what we really want is language support for tail allocations and a way to vend Spans of those.

Karl · October 30, 2024, 6:39pm

Alternatively, once we have borrow variables, you should be able to pass a borrowed vector around. But you'd still have to be vigilant; it would be easy to accidentally make a copy.

let UnicodeDB: Vector<18_549, UnicodeData> = [...]

func useDB(scalar: Unicode.Scalar) {

  let offset = lookupOffset(scalar)
  let db     = UnicodeDB // oops, makes a copy!
  let data   = db[offset]

  let db     = borrow UnicodeDB // ok - no copy :)
  let data   = db[offset]
}

func dbUtility<let N: Int, Data>(_ db: borrowing Vector<N, Data>) {
  // Can't accidentally copy db

  let db2 = db // Error - implicit copies disabled, use the 'copy' operator.
}

ksluder · October 30, 2024, 6:43pm

Strictly speaking, per @Joe_Groff, “borrowing is orthogonal to whether the value is physically copied or passed by reference at the machine calling convention level.” Which would be bad for a Vector<n, Lock>! We might also need extension Vector: RawLayout where Element : RawLayout.

Joe_Groff · October 30, 2024, 6:46pm

As the comment you linked said:

Values of types that either do have a significant address (such as C++ types, or weak references), or are larger than a given threshold (four pointers's size), or are of unknown size (such as unspecialized generics) are passed by address.

which should cover either huge vectors or vectors containing address-significant types like Lock.

ksluder · October 30, 2024, 6:47pm

How does the compiler know that a Vector<n, SomeRawLayoutType> is itself @_rawLayout?

Joe_Groff · October 30, 2024, 6:47pm

The baseline assumption for a generic type we don't know anything more specific about is that it is potentially raw-layout, and so passed by address.
