Vector, a fixed-size array

Hi Evolution, with the recent pitch for Integer Generics, we finally have a way to express counts, capacities, and more for types. The most natural extension from that pitch in my opinion is finally letting Swift have fixed-size arrays. Here is my pitch for such a construct for the standard library. Please let me know what you think!

Vector, a fixed-size array

Introduction

This proposal introduces a new type to the standard library, Vector, which is
a fixed-size array. This is analogous to the
classical C arrays T[N],
C++'s std::array<T, N>,
and Rust's arrays [T; N].

Motivation

Arrays in Swift have served as the go to choice when needing to put items in an
ordered list. They are a great data structure ranging from a variety of
different use cases from teaching new developers all the way up to sophisticated
implementation details of something like a cache.

However, using Array all the time doesn't really make sense in some scenarios.
It's important to understand that Array is a heap allocated growable data
structure which can be expensive and unnecessary in some situations. The next
best thing is to force a known quantity of elements onto the stack, probably by
using tuples.

func complexAlgorithm() {
  let elements = (first, second, third, fourth)
}

Unfortunately, using tuples in this way is very limited. They don't allow for
dynamic indexing or iteration:

func complexAlgorithm() {
  let elements = (first, second, third, fourth)

  // error: for-in loop requires '(Int, Int, Int, Int)' to conform to 'Sequence'
  for element in elements {
    compute(element)
  }

  var i = 0

  while i != elements.indices {
    // error: cannot access element using subscript for tuple type
    //        '(Int, Int, Int, Int)'; use '.' notation instead
    let element = elements[i]

    compute(element)
  }
}

It wasn't until SE-0322 Temporary uninitialized buffers, which proposed the withUnsafeTemporaryAllocation
facilities, that made this situation a little easier to work with by giving us a
direct UnsafeMutableBufferPointer pointing either somewhere on the stack or to
a heap allocation. This API allows us to get the indexing and iteration we want,
but it drops down to an unsafe layer which is unfortunate because there should
be much safer ways to achieve the same while not exposing unsafety to
developers.

While we aren't getting rid of Array anytime soon, more and more folks are
looking towards Swift to build safer and performant code and having Array be
our only solution to an ordered list of things is less than ideal. Array is a
very general purpose array collection that can suit almost any need, but it is
always heap allocated, automatically resizable, and introduces retain/release
traffic. These implicit allocations are becoming more and more of a bottleneck,
especially in embedded domains where there might not be a lot of memory for many
allocations or even heap allocations at all. Swift should be able to provide
developers a safe API to have an ordered list of homogeneous items on the stack,
allowing for things like indexing, iteration, and many other collection
utilities.

Proposed solution

We introduce a new top level type, Vector, to the standard library which is a
fixed-size contiguously inline allocated array. We're defining "inline" as using
the most natural allocation pattern depending on the context of where this is
used. It will be stack allocated most of the time, but as a class property
member it will be inline allocated on the heap with the rest of the properties.
Vector will never introduce an implicit heap allocation just for its storage
alone.

func complexAlgorithm() -> Int {
  // This is a stack allocation, no 'malloc's or reference counting here!
  let elements: Vector<4, Int> = [1, 2, 3, 4]

  // OK
  for element in elements {
    compute(element)
  }

  var i = 0

  while i != elements.indices {
    let element = elements[i] // OK

    compute(element)
  }
}

This type also serves as a safer replacement to withUnsafeTemporaryAllocation
when the amount of things to reserve is known statically:

// Previously
withUnsafeTemporaryAllocation(of: Int.self, capacity: 16) {
  $0.initialize(repeating: 0)

  for i in $0.indices {
    $0[i] = compute()
  }
}

// Now
var ints = Vector<16, Int>(repeating: 0)

for i in ints.indices {
  ints[i] = compute()
}

Detailed design

Vector will be a simple noncopyable struct capable of storing other potentially
noncopyable elements. It will be conditionally copyable only when its elements
are.

public struct Vector<let Count: Int, Element: ~Copyable>: ~Copyable {}

extension Vector: Copyable where Element: Copyable {}
extension Vector: BitwiseCopyable where Element: BitwiseCopyable {}
extension Vector: Sendable where Element: Sendable {}

MemoryLayout

The memory layout of a Vector is defined by taking its Element's stride and
multiplying that by its Count for its size and stride. Its alignment is equal
to that of its Element:

MemoryLayout<UInt8>.stride == 1
MemoryLayout<UInt8>.alignment == 1

MemoryLayout<Vector<4, UInt8>>.size == 4
MemoryLayout<Vector<4, UInt8>>.stride == 4
MemoryLayout<Vector<4, UInt8>>.alignment == 1

struct Uneven {
  let x: UInt32
  let y: Bool
}

MemoryLayout<Uneven>.stride == 8
MemoryLayout<Uneven>.alignment == 4

MemoryLayout<Vector<4, Uneven>>.size == 32
MemoryLayout<Vector<4, Uneven>>.stride == 32
MemoryLayout<Vector<4, Uneven>>.alignment == 4

struct ACoupleOfUInt8s {
  let x: Vector<2, UInt8>
}

MemoryLayout<ACoupleOfUInt8s>.stride == 2
MemoryLayout<ACoupleOfUInt8s>.alignment == 1

MemoryLayout<Vector<2, ACoupleOfUInt8s>>.size == 4
MemoryLayout<Vector<2, ACoupleOfUInt8s>>.stride == 4
MemoryLayout<Vector<2, ACoupleOfUInt8s>>.alignment == 1

Literal Initialization

Before discussing any of the API, we need to discuss how the array literal
syntax will be used to initialize a value of Vector. While naively we could
conform to ExpressibleByArrayLiteral, the shape of the initializer always
takes an actual Array value. This could be optimized away in the simple cases,
but fundamentally it doesn't make sense to have to do an array allocation to
initialize a stack allocated Vector. Therefore, the array literal
initialization for Vector will be a special case, at least to start out with.
A stack allocated vector using a vector literal will do in place initialization
of each element at its stack slot. The two below are roughly equivalent:

let numbers: Vector<3, Int> = [1, 2, 3]

// Roughly gets compiled as:

// This is not a real 'Vector' initializer!
let numbers: Vector<3, Int> = Vector()
numbers[0] = 1
numbers[1] = 2
numbers[2] = 3

There shouldn't be any intermediary values being copied or moved into the vector.

Note that the array literal syntax will only create a Vector value when the
compiler knows concretely that it is a Vector value. We don't want to break
source whatsoever, so whatever current rules the compiler has will still be
intact. Consider the following uses of the array literal syntax and where each
call site creates either a Swift.Array or a Swift.Vector.

let a = [1, 2, 3] // Swift.Array
let b: Vector<3, Int> = [1, 2, 3] // Swift.Vector

func generic<T>(_: T) {}

generic([1, 2, 3]) // passes a Swift.Array
generic([1, 2, 3] as Vector<3, Int>) // passes a Swift.Vector

func test<T: ExpressibleByArrayLiteral>(_: T) {}

test([1, 2, 3]) // passes a Swift.Array
test([1, 2, 3] as Vector<3, Int>) // error: 'Vector<3, Int>' does not conform to 'ExpressibleByArrayLiteral'

func array<T>(_: [T]) {}

array([1, 2, 3]) // passes a Swift.Array
array([1, 2, 3] as Vector<3, Int>) // error: 'Vector<3, Int>' is not convertible to 'Array<Int>'

func vector<T>(_: Vector<3, T>) {}

vector([1, 2, 3]) // passes a Swift.Vector
vector([1, 2, 3] as [Int]) // error: 'Array<Int>' is not convertible to 'Vector<3, Int>'

I discuss later about a hypothetical ExpressibleByVectorLiteral and the design
challenges there in Future Directions.

Initialization

In addition to literal initialization, Vector offers two other forms of
initialization; a closure based one for all element types and a repeating based
one for copyable elements.

extension Vector where Element: ~Copyable {
  /// Initializes every element in this vector running the given closure value
  /// that returns the element to emplace at the given index.
  ///
  /// This will call the closure `Count` times, where `Count` is the static
  /// count of the vector, to initialize every element by passing the closure
  /// the index of the current element being initialized. The closure is allowed
  /// to throw an error at any point during initialization at which point the
  /// vector will stop initialization, deinitialize every currently initialized
  /// element, and throw the given error back out to the caller.
  ///
  /// - Parameter body: A closure that returns an owned `Element` to emplace at
  ///                   the passed in index.
  public init<E: Error>(_ body: (Int) throws(E) -> Element) throws(E)
}

extension Vector where Element: Copyable {
  /// Initializes every element in this vector to a copy of the given value.
  ///
  /// - Parameter value: The instance to initialize this vector with.
  public init(repeating: Element)
}

Deinitialization and consumption

Once a vector is no longer used, the compiler will implicitly destroy its value.
This means that it will do an element by element deinitialization, releasing any
class references or calling any deinits on noncopyable types. Doing
deinitialization this way will lose all access to every element. If one is done
with the vector value but would like retain one or more of its elements, then a
call to the consume method will grant access to the owned element value for
retention.

extension Vector where Element: ~Copyable {
  /// Consumes the vector by running the given closure value over each element
  /// passing in the element's index as well as the owned element value.
  ///
  /// This will call the closure `Count` times, where `Count` is the static
  /// count of the vector, to deinitialize every element by passing the closure
  /// the index of the current element being deinitialized and the owned element
  /// value. The closure is allowed to throw an error at any point during
  /// deinitialization at which point the vector will eagerly deinitialize the
  /// remaining elements and throw the given error back out to the caller.
  ///
  /// - Parameter body: A closure given the current deinitialized element's
  ///                   index and the owned element value.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  public consuming func consume<Result: ~Copyable, E: Error>(
    _ body: (Int, consuming Element) throws(E) -> Result
  ) throws(E) -> Result
}

Generalized Collection APIs

extension Vector where Element: ~Copyable {
  public typealias Element = Element
  public typealias Index = Int
  public typealias Indices = Range<Int>

  public var count: Int { Count }
  public var indices: Range<Int> { 0 ..< Count }
  public var isEmpty: Bool { Count == 0 }
  public var startIndex: Int { 0 }
  public var endIndex: Int { Count }

  public borrowing func index(after i: Int) -> Int
  public borrowing func index(before i: Int) -> Int

  public subscript(_ index: Int) -> Element
}

Collection conformances for copyable elements

Here, we define our basic collection conformances to RandomAccessCollection,
MutableCollection, and BidirectionalCollection but only when the element is
known to be copyable. This is an unfortunate limitation, but the current
collection protocols make a deep assumption about element copyability as well
as the copyability of the collection itself. I discuss about this more in depth
in Future Directions.

extension Vector: RandomAccessCollection where Element: Copyable {}
extension Vector: MutableCollection where Element: Copyable {}
extension Vector: BidirectionalCollection where Element: Copyable {}

These conformances allow us to iterate vectors, but only when the element is
copyable right now. Noncopyable elements need to fallback to index iteration:

let numbers: Vector<3, Int> = [1, 2, 3]

for number in numbers {
  print(number)
}

let atomicInts: Vector<3, Atomic<Int>> = [Atomic(1), Atomic(2), Atomic(3)]

for i in atomicInts.indices {
  print(atomicInts[i].load(ordering: .relaxed))
}

Swapping elements

extension Vector where Element: ~Copyable {
  /// Replaces an existing vector entry with a new value, returning the value
  /// that was replaced.
  ///
  /// - Parameters:
  ///   - i: The index of the element to be swapped with.
  ///   - newValue: The new value to swap with the current element.
  /// - Returns: The old swapped value.
  public borrowing func swapAt(
    _ i: Int,
    with newValue: consuming Element
  ) -> Element

  /// Exchanges the values at the specified indices of the vector.
  ///
  /// Both parameters must be valid indices of the vector that are not
  /// equal to `endIndex`. Calling `swapAt(_:_:)` with the same index as both
  /// `i` and `j` has no effect.
  ///
  /// - Parameters:
  ///   - i: The index of the first value to swap.
  ///   - j: The index of the second value to swap.
  public borrowing func swapAt(
    _ i: Int,
    _ j: Int
  )
}

C Interop changes

With the introduction of Vector, we have a unique opportunity to fix another
pain point within the language with regards to C interop. Currently, the Swift
compiler imports a C array of type T[24] as a tuple of T with 24 elements.
Previously, this was really the only representation that the compiler could pick
to allow interfacing with C arrays. It was a real challenge working with these
fields from C in Swift. Consider the following C struct:

struct section_64 {
  char sectname[16];
  char segname[16];
  uint64_t addr;
  uint64_t size;
  uint32_t offset;
  uint32_t align;
  ...
};

Today, this gets imported as the following Swift struct:

struct section_64 {
  let sectname: (CChar, CChar, CChar, CChar, CChar, CChar, ... 10 more times)
  let segname: (CChar, CChar, CChar, CChar, CChar, CChar, ... 10 more times)
  let addr: UInt64
  let size: UInt64
  let offset: UInt32
  let align: UInt32
  ...
}

Using an instance of section_64 in Swift for the most part is really easy.
Accessing things like addr or size are simple and easy to use, but using the
sectname property introduces a level of complexity that isn't so fun to use.

func getSectionName(_ section: section_64) -> String {
  withUnsafePointer(to: section.sectname) {
    // This is unsafe! 'sectname' isn't guaranteed to have a null byte
    // indicating the end of the C string!
    String(cString: $0)
  }
}

func iterateSectionNameBytes(_ section: section_64) {
  withUnsafeBytes(to: section.sectname) {
    for byte in $0 {
      ...
    }
  }
}

Having to resort to using very unsafe API to do anything useful with imported C
arrays is not something a memory safe language like Swift should be in the
business of. Vector allows us to clean up this sore spot in a very elegant and
much more importantly, safe way.

We plan to introduce a new upcoming feature that folks can enable whenever
they'd like to via -enable-upcoming-feature CArrayVector. This will change the
current behavior of importing C arrays as tuples, to instead import them as
Vector. For the previous example, we'd instead generate the following:

struct section_64 {
  let sectname: Vector<16, CChar>
  let segname: Vector<16, CChar>
  let addr: UInt64
  let size: UInt64
  let offset: UInt32
  let align: UInt32
  ...
}

By introducing an upcoming feature, it lets folks who are ready to migrate their
codebase using a C interface to use Vector entirely, while still allowing
dependencies or dependents to use the old behavior with the same C interface.

In a future Swift language mode, we plan to make this behavior the default and
drop importing C array fields as tuples and only as Vector. This approach
allows developers to get the new behavior early if they'd like to use it, which
will make the language mode transition much smoother because there would be no
source break with regards to this feature.

Source compatibility

Vector is a brand new type in the standard library, so source should still be
compatible.

Given the name of this type however, we foresee this clashing with existing user
defined types named Vector. This isn't a particular issue though because the
standard library has special shadowing rules which prefer user defined types by
default. Which means in user code with a custom Vector type, that type will
always be preferred over the standard library's Swift.Vector. By always I
truly mean always.

Given the following two scenarios:

// MyLib
public struct Vector<T> {

}

print(Vector<Int>.self)

// error: generic type 'Vector' specialized with too many type parameters
//        (got 2, but expected 1)
print(Vector<3, Int>.self)

Here, we're exercising the fact that this MyLib.Vector has a different generic
signature than Swift.Vector, but regardless of that we will prefer MyLib's
version even if we supply more generic arguments than it supports.

// MyLib
public struct Vector<T> {

}

// MyExecutable main.swift
import MyLib

print(Vector<Int>.self) // OK

// error: generic type 'Vector' specialized with too many type parameters
//        (got 2, but expected 1)
print(Vector<3, Int>.self)

// MyExecutable test.swift

// error: generic type 'Vector' specialized with too few type parameters
//        (got 1, but expected 2)
print(Vector<Int>.self)

And here, we exercise that a module with its own Vector, like MyLib, will
always prefer its own definition within the module, but even for dependents
who import MyLib it will prefer MyLib.Vector. For files that don't
explicitly MyLib, it will prefer Swift.Vector.

ABI compatibility

Vector is a brand new type in the standard library, so ABI should still be
compatible.

Implications on adoption

This is a brand new type which means there will be deployment version
requirement to be able to use this type, especially considering it is using new
runtime features from integer generics.

Future directions

Equatable, Hashable, CustomStringConvertible, and other protocols.

There are a wide class of protocols that this type has the ability to conform to,
but the issue is that it can only conform to them when the element conforms to
them (this is untrue for CustomStringConvertible, but it still requires
copyability). We could introduce these conformances but have them be conditional
right now and generalize it later when we generalize these protocols, but if we
were to ship say Swift X.Y with:

@available(SwiftStdlib X.Y)
extension Vector: Equatable where Element: Equatable // & Element: Copyable

and later down the road in Swift X.(Y + 1):

@available(SwiftStdlib X.Y)
extension Vector: Equatable where Element: ~Copyable & Equatable

Suddenly, this availability isn't quite right because the conformance that
shipped in Swift X.Y doesn't support noncopyable elements. To prevent the
headache of this and any potential new availability feature, we're holding off on
these conformances until they are fully generalized.

Noncopyable Sequence and Collection protocols

Similarly, while we aren't conforming to a lot of protocols that we envision
we'll generalize one day, we are conforming to Sequence, Collection,
RandomAccessCollection, and MutableCollection. We don't foresee that we'll
ever generalize these protocols because they have a deep design for implicitly
copying their elements, but even the collection itself.
SE-0437 Noncopyable Standard Library Primitives
goes into more depth about this rationale and mentions that creating new
protocols to support noncopyable containers with potentially noncopyable
elements are all marked as future work.

Much of the Collection API that we're generalizing here for this type are all
API we feel confident will be included in any future container protocol. Even if
we find that to not be the case, they are still useful API outside of generic
collection contexts in their own right.

Span APIs

With the recent proposal
SE-0447 Span: Safe Access to Contiguous Storage
who defines a safe abstraction over viewing contiguous storage, it would make
sense to define API on Vector to be able to get one of these Spans. However,
the proposal states that:

We could provide withSpan() and withBytes() closure-taking functions as
safe replacements for the existing withUnsafeBufferPointer() and
withUnsafeBytes() functions. We could also also provide lifetime-dependent
span or bytes properties.
...
Of these, the closure-taking functions can be implemented now, but it is
unclear whether they are desirable. The lifetime-dependent computed properties
require lifetime annotations, as initializers do. We are deferring proposing
these extensions until the lifetime annotations are proposed.

All of which is exactly true for the current Vector type. We could propose a
withSpan style API now, but it's unclear if that's what we truly want vs. a
computed property that returns the span which requires lifetime annotation
features. For now, we're deferring such API until a lifetime proposal is
proposed and accepted.

ExpressibleByVectorLiteral

While the proposal does propose a literal initialization for Vector that
doesn't use ExpressibleByArrayLiteral, we are intentionally not exposing some
ExpressibleByVectorLiteral or similar. It's unclear what this protocol would
look like because each design has a different semantic guarantee:

public protocol ExpressibleByVectorLiteral: ~Copyable {
  associatedtype Element: ~Copyable

  init<let N: Int>(vectorLiteral: consuming Vector<N, Element>)
}

This naive approach would satisfy a lot of types like Array, Set,
some hypothetical future noncopyable array, etc. These types actually want a
generic count and can allocate just enough space to hold all of those elements.

However, this shape doesn't quite work for Vector itself because initializing
a Vector<4, Int> should require that the literal has exactly 4 elements. Note
that we wouldn't be able to impose a new constraint just for the conformer, so
Vector couldn't require that N == Count and still have this witness the
requirement. Similarly, a Pair type could be vector initialized, but only if
the vector has exactly 2 elements. If we had the ability to define
associatedvalue, then this makes the conformance pretty trivial for both of
these types:

public protocol ExpressibleByVectorLiteral: ~Copyable {
  associatedtype Element: ~Copyable
  associatedvalue Count: Int

  init(vectorLiteral: consuming Vector<Count, Element>)
}

extension Vector: ExpressibleByVectorLiteral {
  init(vectorLiteral: consuming Vector<Count, Element>) { ... }
}

extension Pair: ExpressibleByVectorLiteral {
  init(vectorLiteral: consuming Vector<2, Element>) { ... }
}

But even with this design it's unsuitable for Array itself because it doesn't
want a static count for the literal, it still wants it to be generic.

It would be nice to define something like this either on top of Vector,
parameter packs, or something else that would let us define statically the
number of elements we need for literal initialization or be dynamic if we opt to.

InlineArray and SmallArray

In the same vein as this type, it may make sense to introduce some InlineArray
type which would support appending and removing elements given a fixed-capacity.

var numbers: InlineArray<4, Int> = [1, 2]
print(numbers.capacity) // 4
print(numbers.count) // 2
numbers.append(3)
print(numbers.count) // 3
numbers.append(4)
print(numbers.count) // 4
numbers.append(5) // error: not enough space

This type is significantly different than the type we're proposing because
Vector defines a fixed-size meaning you cannot append or remove from it, but
it also requires that every single element is initialized. There must never be
an uninitialized element within a Vector, however for InlineArray this is
not true. It would act as a regular array with an initialized prefix and an
uninitialized suffix, it would be inline allocated (stack allocated for locals,
heap allocated if it's a class member, etc.), and it would not be growable.

The difficulty in proposing such a type right now is that we have no way of
informing the compiler what parts of InlineArray are initialized and what
parts are not. This is critical for copy operations, move operations, and
destroy operations. Assuming that an uninitialized element is initialized and
attempting to perform any of these operations on it may lead to runtime crashes
which is definitely undesirable.

Once we have InlineArray and some hypothetical noncopyable heap allocated
array type (which SE-0437 Noncopyable Standard Library Primitives
dons as HypoArray as a placeholder), it should be very trivial to define a
SmallArray type similar to the one found in LLVM APIs llvm::SmallVector.

public enum SmallArray<let Capacity: Int, Element: ~Copyable>: ~Copyable {
  case small(InlineArray<Capacity, Element>)
  case large(HypoArray<Element>)
}

which would act as an inline allocated array until one out grew the inline
capacity and would fall back to a dynamic heap allocation.

Syntax sugar

We feel that this type will become as fundamental as Array and Dictionary
both of which have syntactic sugar for declaring a type of them, [T] for
Array and [K: V] for Dictionary. It may make sense to define something
similar for Vector, however we leave that as a future direction as the
spelling for such syntax is not critical to landing this type.

It should be fairly trivial to propose such a syntax in the future either via a
new proposal, or as an amendment to this one. Such a change should only require
a newer compiler that supports the syntax and nothing more.

Some syntax suggestions:

  • [N x T] or [T x N]
  • [N * T] or [T * N]
  • T[N] (from C)
  • [T; N] (from Rust)

Note that it may make more sense to have the length appear before the type. I
discuss this more in depth in Reorder the generic arguments.

Alternatives considered

A name other than Vector

For obvious reasons, we cannot name this type Swift.Array to match the
"term of art" that other languages like C, C++, and Rust are using for this
exact type. However, while this name is the de facto for other languages, it
actually mischaracterizes the properties and behaviors of this type considering
existing terminology in mathematics. A. Stepanov mentions in his book, "From
Mathematics to Generic Programming", that using the name std::vector for their
dynamically allocated growable array type was perhaps a mistake for this same
reason:

If we are coming up with a name for something, or overloading an existing name,
we should follow these three guidelines:

  1. If there is an established term, use it.
  2. Do not use an established term inconsistently with its accepted meaning. In
    particular, overload an operator or function name only when you will be
    preserving its existing semantics.
  3. If there are conflicting usages, the much more established one wins.

The name vector in STL was taken from the earlier programming languages
Scheme and Common Lisp. Unfortunately, this was inconsistent with the much
older meaning of the term in mathematics and violates Rule 3; this data structure
should have been called array. Sadly, if you make a mistake and violate these
principles, the result might stay around for a long time.

- Stepanov, A. A., Rose, D. E. (2014). From Mathematics to Generic Programming. United Kingdom: Addison-Wesley.

Indeed, the std::vector type goes against the definition of vector by being a
growable container, having a non-fixed magnitude.

We fully acknowledge that the Swift types, Swift.Array and Swift.Vector, are
complete opposites of the C++ ones, std::vector and std::array. While it may
be confusing at first, ultimately we feel that our names are more in line with
the mathematical term of art.

If there was any type we could add to the standard library whose name could be
Vector, it must be this one.

Reorder the generic arguments (Vector<T, N> instead of Vector<N, T>)

If we directly followed existing APIs from C++, then obviously the length should
follow the element type. However we realized that when reading this type aloud,
it's "a vector of 3 integers" for example instead of "a vector of integers of
size 3". It gets more interesting the more dimensions you add.
Consider an MxN matrix. In C, you'd write this as T[N][M] but index it as
[m][n]. We don't want to introduce that sort of confusion (which is a good
argument against T[N] as a potential syntax sugar for this type), so the
length being before the underlying element makes the most sense at least for any
potential sugared form. [M * [N * T]] would be indexed directly as it is spelt
out in the sugared form, [m][n]. In light of that, we wouldn't want the sugar
form to have a different ordering than the generic type itself leading us to
believe that the length must be before the element type.

Acknowledgments

I would like the thank the following people for helping in the design process
of this type:

  • Karoy Lorentey
  • Guillaume Lessard
  • Joe Groff
  • Kuba Mracek
  • Andrew Trick
  • Erik Eckstein
  • Philippe Hausler
  • Tim Kientzle
70 Likes

Great proposal, I am happy to see it.

especially considering it is using new runtime features from integer generics.

Can you elaborate on this part? What exactly does the implementation use from the runtime? My main interest is in the possibility of reimplementing the type in a separate module.

If that's possible I'd be also interested in having some way to specify a type to use to import a C-array.

Some uses cases I'm interested in compute the count based on statically available information e.g. fit as many elements in 1 KB through something like MemoryLayout<MyType>.stride / 1024. Will that be eventually possible as well? This would be especially useful for the InlineArray type mentioned as a future direction.

Not bad, but what if we put Agda and Swift on the left and “C++ and Rust” on the right instead.

23 Likes

I feel like if we have to clarify what a Vector is in the title of this pitch, maybe it should just be called FixedSizeArray.

Nice pitch otherwise though, excited to see this in the standard library!

17 Likes

I read through the complete proposal and just want to say that the name Vector makes sense to me and is the logical choice.

Just because other languages got it wrong doesn’t mean Swift has too, and it’s easy to say too. Anyone who does work with vectors will immediately recognize this.

13 Likes

Ok I actually read it this time (and I see you have already addressed the name, lol). Here is some real feedback.

You seem very focused on the exact layout and allocation details of this type. I understand that a lot of people who care about such a type would be interested in such a thing. Do we actually truly intend to guarantee all of these things? Can I take the address of such an object directly and muck around with it? I don't think we guarantee this for pretty much any other type except for tuples, and even then I think we use wishy-washy language like "in instances where it would be visible from a C ABI". Is this type never going to be copy-on-write? Will we never add metadata to it? (I understand that these will get decided now and baked into the ABI, so the time period I am referring to is "in between this proposal and when someone goes to implement it".) Just wanted to clarify that this is intended.

This seems like a reasonable extension but actually has subtle nuance. An imported fixed-size (such as one similar the very one you mention) can be partially initialized. This type cannot be as far as I can tell. That is fine if you only do member-wise accesses to the data that is initialized, but the new type lets people iterate over the collection and read uninitialized data. I would feel better if you explicitly addressed the rules around this. One example you may wish to pair with that is the actual code to read out sectname and segname using this new interface ;)

Partially addressed above but want to repeat in this context: is this type ABI compatible with a tuple? Is there a potential migration path here for people who are already using tuples?

Just off the top of my head random bits and bobs that might be interesting. Do we want any special integration with EmptyCollection or CollectionOfOne? Maybe compile-time slices, which might be possible to do without generic parameter arithmetic?

10 Likes

Is there a specific reason that the size is defined as count * stride rather than aligning with the specific semantics of size vs stride and being max(0, count - 1) * stride + size? It seems like the more consistent behavior is:

MemoryLayout<Vector<4, Uneven>>.size == 29 // 3*8 + 5
MemoryLayout<Vector<4, Uneven>>.stride == 32

to preserve the space at the end of the type.


I don't have a strong opinion about the syntax except that I think it shouldn't be like C, I think the type should be nested inside it so that composition is clearer to read. But I see you're already making that argument as well:

3 Likes

Given this guarantee, what is the expected behavior of this code?

func getClosure() -> () -> Void {
    var vector: Vector<5, Int> = [1, 2, 3, 4, 5]
    return { vector[0] = 0 }
}

I assume it would be a compilation error given that it would otherwise need an implicit heap allocation, is that correct?

One reason to pad out the final element is that the existing runtime operations for bulk moving/copying/destroying multiple values don't try to preserve tail padding when writing the final element.

3 Likes

You’d need that heap allocation even if the variable were just an Int, so it’s not Vector introducing the allocation.

3 Likes

Makes sense, I think that's the most consistent behavior.

Maybe the proposal should be modified to mention that there are cases other than class members where a Vector will be on the heap. It could say something like "The Vector type does not introduce any heap allocations itself, but - just like any value - it may be stored on the heap in certain situations (like when it is a class property member)".

3 Likes

I'm very excited at the prospect of having this type available. So excited I just made an account! :)

It's relatively rare that one needs this - allocation is pretty fast all things considered. And when you do need this kind of type, its importance quickly becomes very high. There's no in-between.

However, I think naming this Vector is going to be very confusing. Forever.

C++ and Rust naming disagree on a number of high-visibility items:

  • Containers and Collections
  • unordered_map and HashMap
  • map and BTreeMap
  • priority_queue and BinaryHeap
  • std::transform and .map

The one thing here they get close on is vector and Vec. If even they agree on the naming, there's something going on. By making Swift the odd one out here, every single tutorial online and in a book is going to have to include a section where they explain that Swift and C++ use opposite terms here.

Everyone I work with using Swift has a background in Objective-C, C++, or Rust. I'm going to have to explain to them that these terms are flipped when we start using this. And future Swift programmers are going to have to learn that if they transition from Swift into C/C++/Rust, that the terms are flipped. This proposal was posted today and there's already memes making fun of this.

Using the name Vector to mean a fixed-size array violates Stepanov's rule #2, "Do not use an established term inconsistently with its accepted meaning." While Array varies between languages on whether it includes resizeable types or not (as not all languages have a strong stack vs heap distinction), Vector is pretty consistent within software. It means resizable.

This naming adds a lot of friction for... what, exactly?

ultimately we feel that our names are more in line with the mathematical term of art.

Why does being in line with mathematical terms matter? We're programmers, not mathematicians. I'd wager that a super majority of Swift programmers do not have a significant math background, but that most do have experience with C, C++, Rust, or Objective-C. The term of art in software engineering is that Vector is a dynamicly sized buffer. Adding the opposite under the name muddies water that is not muddy.

The proposal already uses what I think is the best language for this: "fixed-size array". Therefore, I believe variations on FixedArray, FixedSizeArray, etc are perfectly fine names for this type that we should use instead.

16 Likes

As a general concept, and the design specifics, I'm all into: YES, PLEASE, YESTERDAY!

That said 'Vector' has FAR too many overrides with other concepts, most notably the mathematical one, as has already been highlighted in this thread. I don't have a brilliant name, but honestly something as functional as FixedSizeArray seems perfectly fine to me. Obviously more verbose, yes - but it avoids splashing over a a type that I constantly see in active use in quite a few codebases (including my own). Like - nearly anything that does physics, or references 3D graphics.

20 Likes

I’m also in favor of FixedSizeArray over Vector, not only because years of working in graphics and UI frameworks have primed me to think of vectors as 2-3 dimensional numeric types, but also explicitly because there would now be ambiguity with C++’s vector vs array (of which previously there was none).

If we wanted to have fun and still wanted a one word name, I would suggest Shelf or Case — it ain’t growing in size, but it holds a fixed number of items. That said, +1 to FixedSizeArray for being unambiguously exactly what this is, even with the limitations that we can’t reasonably “append” to it.

On that topic, perhaps we could offer append and remove operations when the type is Optional<T>? That covers describing the uninitialized case, and the caller accepts at that point the extra memory requirements of describing the optional for the API benefits.

8 Likes

A fixed size array type would be a very valuable addition to Swift, so absolutely support that.

About the naming, I'm not a fan of Vector. As per wikipedia, vectors (in mathematics)

have to be expressed by both magnitude and direction.

This trips me up in other languages where they use the term vector to describe some form of collection (i.e. C++) as well. In Swift, we now get the chance to get this right at the start.

Of the various suggested alternatives FixedSizeArray sounds like the best one: it is what it says on the tin (and even your pitches title :wink:)

8 Likes

as for proposal - big +1. Regarding naming - for me, Vector is associated with a dynamically sized container akka std::vector. I'm not sure whether FixedSizeArray is the best name, but it sounds better than Vector to me. Other alternatives, as already suggested FixedArray or maybe just ArrayN ( where N denotes the size, like Array10)

2 Likes

I could only skim the proposal for now (may have more tim later), but this seems like an obvious +1 for me.

Regarding the name, I don't want to take a stance, but for the side that is against Vector because it clashes with C++ and Rust, please consider this: Just as the flip might be confusing for people coming to Swift with a C++ background (or Rust), it has always been confusing for people to come to C++ from other languages (e.g. JavaScript), for two reasons:

  1. Arrays all of a sudden cannot grow, this often comes as a surprise (amongst other things, like them decaying into pointers).
  2. Instead, they learn to use something called "vector", a name which they probably know from school and conceptualize as a "thing that holds a fixed amount of numbers".

I am mostly thinking of younger people (as I mentor trainees and students) and have personally seen this happen. In other words: C++ and Rust got it "wrong" (IMO), at least when naming growable arrays "vectors".

Obviously that doesn't mean we should or should not name it in the same way, but "confusion for learners depending on their background" goes both ways, so I think we should consider other arguments for the name. Since C++ and Rust (and probably other languages) already "started" the confusion and there is a clash to another field (maths, as in the HS subject, vectors are not only introduced in college, right?) we run into that either way.

Personally I like Vector as I like the "math-ness" of it, but I realize that's personal preference and no objective rgument.

24 Likes

I, too, dislike the name of this type.

  • First of all, I disagree with this proposal that this type actually represents a mathematical vector. To me, a vector is a mathematical construct that represents a sequence of mathematical values. However, the proposed type allows you to store any type in it. To me, the concept of a vector of tasks, threads, objects, mutexes, or distributed actors sounds ridiculous, but they would all be allowed under this proposal.
  • Another reason why the proposed type doesn’t actually represent a vector is because it doesn’t provide the operations you’d expect a vector type to have. There’s no addition, no scalar multiplication, no dot product, and no cross product. It doesn’t conform to AdditiveArithmetic (even conditionally), even though an actual vector type would.
  • The proposal argues that C++’s std::vector isn’t an actual vector type because it’s “a
    growable container, having a non-fixed magnitude.” But I think that’s being a bit selective. If we’re going by the exact mathematical definition of a vector, then shouldn’t we also disallow mutating the Vector instance’s elements? After all, a vector is a mathematical construct, and as such, it’s immutable. (Plus, practically speaking, there are cases where representing a vector in code would require you to use a dynamically-sized container, e.g. reading an arbitrarily-dimensional vector from a file or creating a vector from user input.)
  • If the standard library were to define a type named Vector, then that would cause a source break in any module that uses another module’s Vector type. Additionally, it would discourage library authors in the future from creating their own types named Vector, even when that’s the best name for the type. I know that Swift would allow you to disambiguate these types, but it’s a poor experience to have to qualify a type you’re used to using unqualified and library authors know that.
  • C++ is very popular in the programming world and Rust is also gaining momentum. Nearly every programmer who has used a “vector” in another language would expect it to be a dynamically sized container, and I think it’s more important to conform to that expectation over trying to fit the mathematical term of art. (Especially since work on Swift Numerics and differential programming appears stagnant.)

Other thoughts:

  • I am glad we’re getting this type; it’ll be very useful.
  • While I thought having the count before the element type in the generic parameters was strange at first, I think the reasoning behind the decision is solid.
  • I think the swapAt(_:with:) method should be renamed to exchangeAt(_:with:) because its functionality is closer to the standard library’s exchange(_:with:) function than the swap(_:_:) function.
  • I think we should also consider having a heap-allocated fixed-size array type as well. While having an inline fixed array type is useful, it does come with downsides like O(n) copying and the risk of stack overflow for large arrays.
  • An alternative name for this type could be FixedArray. If we want a more terse name, we could instead call it Fixed. I also like dimi’s idea of Shelf.
3 Likes