I agree that Dictionary is not the best name for this type, and I'd much rather have it named more practically. I suspect that the choice of the name was entirely dictated by Objective-C, seeing as Swift was designed to be a replacement for Objective-C and one of its goals was to make the transition as smooth as possible.
And when they need to copy their data into a Vector, they'll find that the expected Vector<let N: Int, Element>.init(_: some Sequence<Element>) does not exist.
And when they get a Vector out of a function and need to copy it into an Array or something, they'll find that the expected RangeReplaceableCollection.init<let N: Int>(_: Vector<N, Element>) also does not exist.
So we definitely don't make it seamless for people to write functions taking/returning vectors.
The only way for an Array not to be homogeneous is to store dynamically typed boxes, like an existential or AnyHashable. Vector can pull off that trick just as well, so I don't see a difference here. Now, if we were talking about the difference between a Vector and a tuple, then this would make sense.
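To illustrate (a minimal sketch; the bindings and values are made up): an Array only appears heterogeneous when its Element is itself a dynamic box, and a Vector could hold the same boxes.

```swift
// The container stays homogeneous: every element is statically typed as
// the box (Any / AnyHashable); the variety lives inside the boxes, not
// in the container itself.
let mixed: [Any] = [1, "two", 3.0]
let keys: [AnyHashable] = [1, "one"]

print(type(of: mixed)) // Array<Any>
```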
Looking at this whole issue from a slightly different angle: I know that this (overloading the type name, like below) is not supported currently. But why not?
struct Array<Element> { ... }
struct Array<Element, capacity: Int> { ... }
// usage:
let a1: Array<Int> = ...
let a2: [Int] = ...
let b1: Array<Int, 10> = ...
let b2: [Int, 10] = ...
Could have been an elegant solution to this big naming debate.
If we had that shorthand, I think we'd be much more likely to use it for something like an array with a fixed-capacity inline representation (so up to N elements would be stored inline, and beyond that point it would flip over to being a normal array; Karoy calls this SmallArray<inlineCapacity, Element>, following the precedent of LLVM's "SmallXxxx" collections).
Today, you can also write let a1: Array = ... and it will infer the generic type arguments (this is true for any generic type). Introducing these overloads (if the language allowed it) would make existing code ambiguous, because something like let a1: Array = [1, 2, 3] could mean either Array<Int> or Array<Int, 3> in your example.
One could try to carve out exceptions, like disallowing inference of integer type arguments or preferring overloads that don't have them, but I think we do want the ability to infer it, because it's a lot nicer to write let v: WhateverWeNameThisThing = [1, 2, 3] than let v: WhateverWeNameThisThing<3, Int> = [1, 2, 3].
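Concretely, here is today's inference behavior (a small sketch; the bindings are made up):

```swift
// Writing a bare generic type name in an annotation lets the compiler
// infer the generic arguments from the initializer expression.
let a1: Array = [1, 2, 3]   // inferred as Array<Int>
let s1: Set = ["x", "y"]    // inferred as Set<String>

print(type(of: a1)) // Array<Int>
```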
the lesson we should take from the mythical String.contains(_:) (née (extension in _StringProcessing).StringProtocol.contains(_:)) is that this style of API works, and some even find it elegant, but it just doesn't have great discoverability.
users are accustomed to packages vending types. the types carry API. this is "simple" and matches peoples' expectations for where to look for things. vending Foundation-style API "layers" just leads to confusion and inhibits API uptake. the people who know which bricks in the wall to push (usually, the people who wrote the API) will thrive but others will struggle.
You could write this, but I wonder when you would want to write this:
let a1: Array = [1, 2, 3]
(I would've understood if you had picked something that isn't inferred as Array:
let a1: Set = [1, 2, 3])
If such generic type overloading were allowed, I'd (conceptually) treat Array as if it were defined as:
struct Array<Element = infer, capacity: Int = 0> { ... }
where the corresponding generic parameters have default values (with capacity == 0 meaning the current "dynamic array" and capacity > 0 meaning the newly discussed type; "StaticArray"?).
Continuing to peel that onion will lead to further roadblocks. For example, if someone writes:
extension Array { ... }
does that now extend both forms of Array? Would we need to introduce more magic syntax to differentiate the two while preserving the existing meaning of the former?
Even if all of this were solvable, from a readability and education point of view, the idea that two types both named Array would expose vastly different APIs solely on the basis of the number of generic parameters they were instantiated with would be a gigantic pill to swallow.
These days you really should not use Vector<32, ImageID>. Rather, you want to represent your vectors/tensors in something that describes their shape, build up the graph of computations connecting these shapes, do some sort of optimization on that graph, and compile it to run on some backend hardware that probably isn't your CPU.
Granted, sooner or later you'll need the input/output of the ML model to be represented in your process, and you'd probably want to pre-allocate some memory that can be reused for multiple batches of training (in short, Span).
I appreciate the suggestions here so far, and I'd like to suggest something that I think can clarify the use of non-copyable types for things like DSL use and for developers coming from C++.
Let's use the existing API for macros and property wrappers and just add a specific case for this fixed-size stack-allocated array. It'll be underscored so that it isn't accidentally used by developers who don't know what the type is doing. This will add on to the Array type, so we have similar semantics but without things like count or append.
Here's my suggestion:
// To declare a variable with this new type
@__FixedSizedStackAllocatedArrayType_
let stackAllocated: Array<UInt8, @__StackAllocated_ Copyable T> = @__StackAllocatedRepresentable_ [
// ...
]
The @__FixedSizedStackAllocatedArrayType_ attribute will indicate that the array (a new keyword) will be stack-allocated, and will remove the functions and variables associated with a dynamic heap-allocated array. The second part is the @__StackAllocated_ keyword, which will indicate that the parameter of this type needs to be copied to the new stack-allocated array. The @__StackAllocatedRepresentable_ keyword also helps devs know that the following array representable is going to be copied into the binary's data section and will be stack-allocated!
Let me know if you have any suggestions!
Thank you for expressing the crux of the matter elegantly in plain English.
Let's name this type Vector and get on with it.
Note: there's a roughly even split of opinion regarding the name: about half of this thread's responders (and those who liked their posts) are "definitely Vector", while the other half are "anything but Vector".
A wild (?) idea:
The preface:
- array/dictionary/set variables are often named plurally: items, elements, names, etc.
- There's a long-established tradition (including in other languages) of naming variables after their types: C c;
How about combining these two together and using Elements?
struct Elements<Element, capacity: Int> { ... }
let elements: Elements = [1, 2, 3]
let elements = Elements([1, 2, 3])
Yes, precisely so! Vector will serve as a more usable alternative to homogeneous tuples; Vector<3, Int> is shaped like (Int, Int, Int). (And in fact, in actual use, Vector will typically get used in places where we are currently forced to use homogeneous tuples instead.)
(Vector is also shaped like int3 or vec<3, int>. Shader languages are already universally calling these types "vectors", disproving any suggestion that "vectors" must model actual vector spaces, even in contexts that care deeply about linear algebra.)
(A third way to describe Vector is to approach it from its core API surface, by saying it is like a "fixed-size array": it uses zero-based integer indices, and it has an indexing subscript. Which of these three approaches "clicks" appears to vary from person to person. I find the tuple and shader-vector analogues the most apt, as they both directly imply that Vector has inline storage -- I think that is its most notable feature.)
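For contrast, here is what working with a homogeneous tuple looks like today (a sketch with made-up values; it relies on homogeneous tuples being laid out contiguously, which holds in practice but is an assumption, not a documented guarantee):

```swift
// A tuple has inline storage but no indexing subscript and no iteration:
// reading its elements generically means dropping down to raw bytes.
let rgb: (UInt8, UInt8, UInt8) = (255, 128, 0)
let sum = withUnsafeBytes(of: rgb) { raw in
    // Reinterpret the tuple's bytes as a buffer of UInt8 and sum them.
    raw.bindMemory(to: UInt8.self).reduce(0) { $0 + Int($1) }
}
print(sum) // 383
```

A Vector<3, UInt8> would keep the same inline storage while offering an ordinary subscript instead of this detour.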
One major benefit of the Vector type over tuples is that it is a named type, so it can carry extensions and it can conform to protocols. (Early on, we decided against conforming copyable vectors to the current Sequence/Collection, because operations like makeIterator() or the slicing subscript would need to inherently copy the entire vector. But Vector will most definitely conform to the upcoming noncopyable container protocols, which are designed to avoid such gotchas. So in that sense, Vector will eventually become a "collection-like" type.)
I'm starting to get ever so slightly frustrated by this line of reasoning. Whether or not Vector is a "currency type" has no relation to whether or not we need to be able to spell or say its name. A type doesn't need to appear in public function signatures for us to need to do that -- it is enough for the type to be mentioned in function implementations. Vector will obviously find quite a lot of use in such.
As it happens, I also do strongly believe that Vector will become a bona fide currency type. Not a widespread one, mind you -- it will not appear in public API surfaces nearly as frequently as, say, Int does -- but it will nevertheless see a nontrivial amount of use.
For one thing, we intend to (eventually, gradually) switch to importing C arrays into Swift as Vector, and this fact alone makes Vector a currency type, by definition. Vector types will show up in Swift as members of public C structs. They will therefore inherently appear as currency types on module boundaries.
But more generally, Vector provides an (often far better) alternative to native Swift homogeneous tuple types like (Int, Int, Int). Such tuple types do frequently get used in pure Swift signatures, and Vector will naturally replace at least some such use. We will most definitely need to be able to talk about a function taking or returning Vectors, or the need for a function implementation to introduce a local Vector variable, or for a struct type to include an internal vector as a stored property. Some of this use will most certainly cross module boundaries, which also makes Vector a currency type.
Here is a random sampling of homogeneous tuple types that are used within a couple of core Swift repositories; I very much expect that we'll almost always prefer to use vectors in similar cases in the future. I promise you we'll need to mention this type, both in written code reviews and everyday oral conversations.
@usableFromInline typealias Buffer = (UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8) //len //enum
public typealias Mantissa = (UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16)
public typealias uuid_t = (UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8)
public typealias uuid_string_t = (Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8)
internal typealias RawBitPattern = (UInt64, UInt64)
typealias Elf_Magic = (UInt8, UInt8, UInt8, UInt8)
typealias Elf_Ident = (
UInt8, UInt8, UInt8, UInt8,
UInt8, UInt8, UInt8, UInt8,
UInt8, UInt8, UInt8, UInt8,
UInt8, UInt8, UInt8, UInt8
)
internal init(_rawSeed: (UInt64, UInt64)) {...}
internal static var _executionSeed: (UInt64, UInt64) {...}
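As a taste of what handling these tuple typealiases involves today (a sketch; the magic bytes are the standard ELF signature, and the conversion relies on the tuple's contiguous layout, which is an assumption):

```swift
typealias Elf_Magic = (UInt8, UInt8, UInt8, UInt8)
let magic: Elf_Magic = (0x7f, 0x45, 0x4c, 0x46) // 0x7f 'E' 'L' 'F'

// The tuple is not a Sequence, so getting an Array out of it takes a
// raw-bytes detour; with a Vector<4, UInt8> this would be direct.
let bytes = withUnsafeBytes(of: magic) { Array($0) }
print(bytes) // [127, 69, 76, 70]
```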
On the contrary -- of course we can choose to use proper terms! Swift exists to fix the mistakes of its precursors, not to uncritically repeat them.
We haven't at all been shy about ripping up the sad legacy of C/C++ before. It does feel like Swift can be quite reasonably described as doing everything in the most un-C++ way we could imagine.
- Swift has chosen to burn down C's regrettable declaration syntax
- It has bravely gotten rid of headlining C-isms like ++/-- and for(;;) loops
- It has had no qualms about eliminating semicolons, parens, and explicit returns, or about inventing its own closure syntax
- Even our function calls look different!
- We never had any second thoughts about overruling STL's terrible precedents when naming types in our standard library:
  - Our Array is not like C arrays or std::array
  - Our Set is not like std::set
  - Our String is not like std::string
  - etc etc etc, ad nauseam. The list never ends.
How and why would the name Vector be the place where we stop and draw a line? It's a tiny, tiny drop in a vast ocean of very deliberate, unrepentant divergence.
Am I missing something obvious? Did I miss a memo?
One of the primary goals when naming things in any language is to create a useful and coherent taxonomy. We must not name things the same unless they are actually alike. Violations of this rule breed actual, widespread confusion.
In Swift's established taxonomy, the proposed Vector is emphatically not an array type. Therefore, we must not call it an array. I beg: please let us not fall into this trap.
I think we should call this type Florglequat.
- Unencumbered by any usage in any existing programming language that could cause ambiguity
- Unencumbered by any usage in any existing human natural language that could cause ambiguity
- Will put an end to naming wars and restore harmony, as the entire community will unite over the fact that it is terrible
- Honestly, any name is better than (Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8, Int8...)
For the record, this was not at all what I suggested (as you wrote this in reply to me specifically), but rather that the expected use of a type may have an important influence on what the name should be, to guide people correctly. There is a distinction.
I do strongly care about naming and think it is a crucial point of design; obviously, the fact that we try to engage in that discussion, even against such a strongly held conviction that the proposed naming is right, is because we care. We might not agree, but that doesn't mean we don't care. Rather the opposite.
Regardless, it is what it is, I have nothing more to add and will shut up.
Huge +1 overall on the type itself, very much looking forward.
The name Vector will be embraced once the realisation occurs that a Vector is just a homogeneous Tuple.
Or the Latin word Ordinare, but I like the sound of Vector better.
In my example, Vector<32, ImageID> was not meant to be the input to (nor a layer of) the ML model. Internally, the API would fetch the images based on the IDs stored in the Vector<32, ImageID> and use those as the input of the ML model. Since ImageID is just an ID, the vector is tiny; no need to reach for Span.
I was specifically thinking of an API wrapping CreateML's update(_:with:eventHandler:) (to fetch the images from a database based on ID) and forcing a known length for the InputSequence. Used to refine a model on-device.
I don't want to get too hung up on the ML example though, it's a bit outside my wheelhouse (I just thought of that example because I had recently used CreateML). It was just an example of an API requiring input elements being a batch of a known size. I think that's going to be relatively common.