I agree that Dictionary is not the best name for this type, and I'd much rather have it named more practically. I suspect that the choice of the name was entirely dictated by Objective-C, seeing as Swift was designed to be a replacement for Objective-C and one of its goals was to make the transition as smooth as possible.
And when they need to copy their data into a Vector, they'll find that the expected Vector<let N: Int, Element>.init(_: some Sequence<Element>) does not exist.
And when they get a Vector out of a function and need to copy it into an Array or something, they'll find that the expected RangeReplaceableCollection.init<let N: Int>(_: Vector<N, Element>) also does not exist.
So we definitely don't make it seamless for people to write functions taking/returning vectors.
The only way for an Array not to be homogeneous is to store dynamically typed boxes, like an existential or AnyHashable. Vector can pull off that trick just as well, so I don't see a difference here. Now, if we were talking about the difference between a Vector and a tuple, then this would make sense.
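To illustrate (a minimal sketch; the bindings and values are made up): an Array only appears heterogeneous when its Element is itself a dynamic box, and a Vector could hold the same boxes.

```swift
// The container stays homogeneous: every element is statically typed as
// the box (Any / AnyHashable); the variety lives inside the boxes, not
// in the container itself.
let mixed: [Any] = [1, "two", 3.0]
let keys: [AnyHashable] = [1, "one"]

print(type(of: mixed)) // Array<Any>
```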
Looking at this whole issue from a slightly different angle: I know that this (overloading the type name, like below) is not supported currently. But why not?
struct Array<Element> { ... }
struct Array<Element, capacity: Int> { ... }
// usage:
let a1: Array<Int> = ...
let a2: [Int] = ...
let b1: Array<Int, 10> = ...
let b2: [Int, 10] = ...
Could have been an elegant solution to this big naming debate.
If we had that shorthand, I think we'd be much more likely to use it for something like an array with a fixed-capacity inline representation (so up to N elements would be stored inline, and beyond that point it would flip over to being a normal array; Karoy calls this SmallArray<inlineCapacity, Element>, following the precedent of LLVM's "SmallXxxx" collections).
Today, you can also write let a1: Array = ... and it will infer the generic type arguments (this is true for any generic type). Introducing these overloads (if the language allowed it) would make existing code ambiguous, because something like let a1: Array = [1, 2, 3] could mean either Array<Int> or Array<Int, 3> in your example.
One could try to carve out exceptions, like disallowing inference of integer type arguments or preferring overloads that don't have them, but I think we do want the ability to infer it, because it's a lot nicer to write let v: WhateverWeNameThisThing = [1, 2, 3] than let v: WhateverWeNameThisThing<3, Int> = [1, 2, 3].
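Concretely, here is today's inference behavior (a small sketch; the bindings are made up):

```swift
// Writing a bare generic type name in an annotation lets the compiler
// infer the generic arguments from the initializer expression.
let a1: Array = [1, 2, 3]   // inferred as Array<Int>
let s1: Set = ["x", "y"]    // inferred as Set<String>

print(type(of: a1)) // Array<Int>
```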
the lesson we should take from the mythical String.contains(_:) (née (extension in _StringProcessing).StringProtocol.contains(_:)) is that this style of API works, and some even find it elegant, but it just doesn't have great discoverability.
users are accustomed to packages vending types. the types carry API. this is "simple" and matches peoples' expectations for where to look for things. vending Foundation-style API "layers" just leads to confusion and inhibits API uptake. the people who know which bricks in the wall to push (usually, the people who wrote the API) will thrive but others will struggle.
You could write this, but I wonder when you would want to write this:
let a1: Array = [1, 2, 3]
(I would've understood if you had picked something that isn't inferred as Array:
let a1: Set = [1, 2, 3])
If such generic type overloading were allowed, I'd (conceptually) treat Array as if it were defined as:
struct Array<Element = infer, capacity: Int = 0> { ... }
where the corresponding generic parameters have default values (with capacity == 0 meaning the current "dynamic array" and capacity > 0 meaning the newly discussed type; "StaticArray"?).
Continuing to peel that onion will lead to further roadblocks. For example, if someone writes:
extension Array { ... }
does that now extend both forms of Array? Would we need to introduce more magic syntax to differentiate the two while preserving the existing meaning of the former?
Even if all of this were solvable, from a readability and education point of view, the idea that two types both named Array would expose vastly different APIs solely on the basis of the number of generic parameters they were instantiated with would be a gigantic pill to swallow.
These days you really should not use Vector<32, ImageID>. Rather, you want to represent your vectors/tensors in something that describes their shape, build up the graph of computations connecting these shapes, do some sort of optimization on that graph, and compile it to run on some backend hardware that probably isn't your CPU.
Granted, sooner or later you'll need the input/output of the ML model to be represented in your process, and you'd probably want to pre-allocate some memory that can be reused for multiple batches of training (in short, Span).
I appreciate the suggestions here so far, and I'd like to suggest something that I think can clarify the use of non-copyable types for things like DSL use and for developers coming from C++.
Let's use the existing API for macros and property wrappers and just add a specific case for this fixed-size stack-allocated array. It'll be underscored so that it isn't accidentally used by developers who don't know what the type is doing. This will add on to the Array type, so we have similar semantics but without things like count or append.
Here's my suggestion:
// To declare a variable with this new type
@__FixedSizedStackAllocatedArrayType_
let stackAllocated: Array<UInt8, @__StackAllocated_ Copyable T> = @__StackAllocatedRepresentable_ [
// ...
]
The @__FixedSizedStackAllocatedArrayType_ attribute will indicate that the array (a new keyword) will be stack-allocated, and will remove the functions and variables associated with a dynamic heap-allocated array. The second part is the @__StackAllocated_ keyword, which will indicate that the parameter of this type needs to be copied to the new stack-allocated array. The @__StackAllocatedRepresentable_ keyword also helps devs know that the following array representable is going to be copied into the binary's data section and will be stack-allocated!
Let me know if you have any suggestions!
Thank you for expressing the crux of the matter elegantly in plain English.
Let's name this type Vector and get on with it.
Note: there's a roughly even split of opinion regarding the name: about half of this thread's responders (and those who liked their posts) are "definitely Vector", while the other half are "anything but Vector".
A wild (?) idea:
The preface:
- array/dictionary/set variables are often named plurally: items, elements, names, etc.
- There's a long-established tradition (including in other languages) of naming variables after their types: C c;
How about combining these two together and using Elements?
struct Elements<Element, capacity: Int> { ... }
let elements: Elements = [1, 2, 3]
let elements = Elements([1, 2, 3])
Yes, precisely so! Vector will serve as a more usable alternative to homogeneous tuples; Vector<3, Int> is shaped like (Int, Int, Int). (And in fact, in actual use, Vector will typically get used in places where we are currently forced to use homogeneous tuples instead.)
(Vector is also shaped like int3 or vec<3, int>. Shader languages are already universally calling these types "vectors", disproving any suggestion that "vectors" must model actual vector spaces, even in contexts that care deeply about linear algebra.)
(A third way to describe Vector is to approach it from its core API surface, by saying it is like a "fixed-size array": it uses zero-based integer indices, and it has an indexing subscript. Which of these three approaches "clicks" appears to vary from person to person. I find the tuple and shader-vector analogues the most apt, as they both directly imply that Vector has inline storage -- I think that is its most notable feature.)
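For contrast, here is what working with a homogeneous tuple looks like today (a sketch with made-up values; it relies on homogeneous tuples being laid out contiguously, which holds in practice but is an assumption, not a documented guarantee):

```swift
// A tuple has inline storage but no indexing subscript and no iteration:
// reading its elements generically means dropping down to raw bytes.
let rgb: (UInt8, UInt8, UInt8) = (255, 128, 0)
let sum = withUnsafeBytes(of: rgb) { raw in
    // Reinterpret the tuple's bytes as a buffer of UInt8 and sum them.
    raw.bindMemory(to: UInt8.self).reduce(0) { $0 + Int($1) }
}
print(sum) // 383
```

A Vector<3, UInt8> would keep the same inline storage while offering an ordinary subscript instead of this detour.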
One major benefit of the Vector type over tuples is that it is a named type, so it can carry extensions and it can conform to protocols. (Early on, we decided against conforming copyable vectors to the current Sequence/Collection, because operations like makeIterator() or the slicing subscript would need to inherently copy the entire vector. But Vector will most definitely conform to the upcoming noncopyable container protocols, which are designed to avoid such gotchas. So in that sense, Vector will eventually become a "collection-like" type.)
I'm starting to get ever so slightly frustrated by this line of reasoning. Whether or not Vector is a "currency type" has no relation to whether or not we need to be able to spell or say its name. A type doesn't need to appear in public function signatures for us to need to do that -- it is enough for the type to be mentioned in function implementations. Vector will obviously find quite a lot of use in such.
As it happens, I also do strongly believe that Vector will become a bona fide currency type. Not a widespread one, mind you -- it will not appear in public API surfaces nearly as frequently as, say, Int does -- but it will nevertheless see a nontrivial amount of use.
For one thing, we intend to (eventually, gradually) switch to importing C arrays into Swift as Vector, and this fact alone makes Vector a currency type, by definition. Vector types will show up in Swift as members of public C structs. They will therefore inherently appear as currency types on module boundaries.
But more generally, Vector provides an (often far better) alternative to native Swift homogeneous tuple types like (Int, Int, Int). Such tuple types do frequently get used in pure Swift signatures, and Vector will naturally replace at least some such use. We will most definitely need to be able to talk about a function taking or returning Vectors, or the need for a function implementation to introduce a local Vector variable, or for a struct type to include an internal vector as a stored property. Some of this use will most certainly cross module boundaries, which also makes Vector a currency type.
Here is a random sampling of homogeneous tuple types that are used within a couple of core Swift repositories; I very much expect that we'll almost always prefer to use vectors in similar cases in the future. I promise you we'll need to mention this type, both in written code reviews and everyday oral conversations.
@usableFromInline typealias Buffer = (UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8) //len //enum
public typealias Mantissa = (UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16)
public typealias uuid_t = (UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8)
public typealias uuid_string_t = (Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8)
internal typealias RawBitPattern = (UInt64, UInt64)
typealias Elf_Magic = (UInt8, UInt8, UInt8, UInt8)
typealias Elf_Ident = (
UInt8, UInt8, UInt8, UInt8,
UInt8, UInt8, UInt8, UInt8,
UInt8, UInt8, UInt8, UInt8,
UInt8, UInt8, UInt8, UInt8
)
internal init(_rawSeed: (UInt64, UInt64)) {...}
internal static var _executionSeed: (UInt64, UInt64) {...}
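As a taste of what handling these tuple typealiases involves today (a sketch; the magic bytes are the standard ELF signature, and the conversion relies on the tuple's contiguous layout, which is an assumption):

```swift
typealias Elf_Magic = (UInt8, UInt8, UInt8, UInt8)
let magic: Elf_Magic = (0x7f, 0x45, 0x4c, 0x46) // 0x7f 'E' 'L' 'F'

// The tuple is not a Sequence, so getting an Array out of it takes a
// raw-bytes detour; with a Vector<4, UInt8> this would be direct.
let bytes = withUnsafeBytes(of: magic) { Array($0) }
print(bytes) // [127, 69, 76, 70]
```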
On the contrary -- of course we can choose to use proper terms! Swift exists to fix the mistakes of its precursors, not to uncritically repeat them.
We haven't at all been shy about ripping up the sad legacy of C/C++ before. It does feel like Swift can be quite reasonably described as doing everything in the most un-C++ way we could imagine.
- Swift has chosen to burn down C's regrettable declaration syntax
- It has bravely gotten rid of headlining C-isms like ++/-- and for(;;) loops
- It has had no qualms about eliminating semicolons, parens, and explicit returns, or about inventing its own closure syntax
- Even our function calls look different!
- We never had any second thoughts about overruling STL's terrible precedents when naming types in our standard library:
  - Our Array is not like C arrays or std::array
  - Our Set is not like std::set
  - Our String is not like std::string
  - etc etc etc, ad nauseam. The list never ends.
How and why would the name Vector be the place where we stop and draw a line? It's a tiny, tiny drop in a vast ocean of very deliberate, unrepentant divergence.
Am I missing something obvious? Did I miss a memo?
One of the primary goals when naming things in any language is to create a useful and coherent taxonomy. We must not name things the same unless they are actually alike. Violations of this rule breed actual, widespread confusion.
In Swift's established taxonomy, the proposed Vector is emphatically not an array type. Therefore, we must not call it an array. I beg: please let us not fall into this trap.
I think we should call this type Florglequat.
- Unencumbered by any usage in any existing programming language that could cause ambiguity
- Unencumbered by any usage in any existing human natural language that could cause ambiguity
- Will put an end to naming wars and restore harmony, as the entire community will unite over the fact that it is terrible
- Honestly, any name is better than (Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8...)
For the record, this was not at all what I suggested (as you wrote this in reply to me specifically), but rather that the expected use of a type may have an important influence on what the name should be, to guide people correctly. There is a distinction.
I do strongly care about naming and think it is a crucial point of design; obviously, the fact that we try to engage in that discussion, even against such a strongly held conviction that the proposed naming is right, is because we care. We might not agree, but that doesn't mean we don't care. Rather the opposite.
Regardless, it is what it is, I have nothing more to add and will shut up.
Huge +1 overall on the type itself, very much looking forward.
The name Vector will be embraced once the realisation occurs that a Vector is just a homogeneous Tuple.
Or the Latin word Ordinare, but I like the sound of Vector better.
In my example, Vector<32, ImageID> was not meant to be the input to (nor a layer of) the ML model. Internally, the API would fetch the images based on the IDs stored in the Vector<32, ImageID> and use those as the input of the ML model. Since ImageID is just an ID, the vector is tiny; no need to reach for Span.
I was specifically thinking of an API wrapping CreateML's update(_:with:eventHandler:) (to fetch the images from a database based on ID) and forcing a known length for the InputSequence. Used to refine a model on-device.
I don't want to get too hung up on the ML example though, it's a bit outside my wheelhouse (I just thought of that example because I had recently used CreateML). It was just an example of an API requiring input elements being a batch of a known size. I think that's going to be relatively common.