Vector manifesto

taylorswift · February 14, 2019, 9:14pm

Hi all, there’s been a lot of talk for a while about getting vector types and fixed-size arrays into the language, so I’ve written up a document outlining what I think these features should look like, and how they’re related to upcoming improvements in the type system.

ladislas · February 14, 2019, 10:24pm

This is remarkable and impressive work and I can't wait to have those features added to the language so that we can start experimenting!

Regarding the literal syntax, I like the C/C++ syntaxt {1, 2, 3, 4, 5} using curly braces. Any reason it is not considered in the manifesto?

Torust · February 14, 2019, 10:31pm

I like the general direction. One specific question: you mention that fixed size arrays cannot conform to RandomAccessCollection. It’s not immediately obvious to me why that’s the case; a fixed size array can easily produce a slice (albeit one without a fixed size) that provides access to its contents. Could you elaborate on your reasoning here?

taylorswift · February 14, 2019, 11:45pm

lexically, it could work if we special-case the 0-length fixed-size array, since {} is a () -> () function. The syntax {,} might work for this case. I’m not so sure this is a good idea though, since we already have a tuple syntax, and so far, swift has told a pretty consistent lexical story where “curly braces mean block of code”

taylorswift · February 15, 2019, 12:00am

It certainly could, but it would be very inefficient because a slice would copy the entire fixed-size array into its self.base property. makeIterator() on Sequence has the same problem though, which violates this requirement

Expected Performance

A sequence should provide its iterator in O(1). The Sequence protocol makes no other requirements about element access, so routines that traverse a sequence should be considered O( n ) unless documented otherwise.

In the end it might not even be appropriate to give Vector<N:Int, T> Sequence conformance at all, and make self.withUnsafeBufferPointer the canonical way to do “sequence”/“collection” things to a fixed-sized array. But that would mean giving up the ability to use them with for _ in loops at all.

xwu · February 15, 2019, 12:06am

The earliest discussion of fixed-sized arrays, not linked to or mentioned in your document, is this one:

It's a lengthy discussion that is worth reviewing to incorporate its insights.

Torust · February 15, 2019, 12:09am

This all stems from the requirement that fixed size arrays be stack-allocated. I'm not sure that's the right way to go for large buffers, and for small buffers copying isn't an issue.

If they were implemented as a fixed-size heap allocated buffer which can be stack-promoted under the right circumstances in the same way that any Swift type can be. Then, a slice would just be a reference to the buffer, and all the usual Swift guarantees would apply.

To avoid the reference counting overhead, move-only types could be used, with slices as a scoped borrow. Having a way to enforce that a variable will be stack-allocated is another useful, orthogonal feature (e.g. stack var with a compiler error if it can't guarantee stack-promotion).

fswarbrick · February 15, 2019, 12:16am

I might have missed it, but I did not see any mention of how to declare a multi-dimensional vector. Given what I read I can only see it being this:

let fa2: Vector<8, Vector<16, Int>> // 8 elements ("rows") of 16 integers

Seems a little "heavy" to me. How about something like this?

let fa2: Vector<(8,16), Int>> // 8 elements ("rows") of 16 integers

fswarbrick · February 15, 2019, 12:19am

I'd like some consideration to be given to allowing the following:

let fstr = Vector<8, UInt8> = "RANDOM"

I think the result is "obvious". Not sure how this fits in with Unicode.

taylorswift · February 15, 2019, 12:26am

that thread being brought back from the dead a few days ago is why i decided to revisit this topic lol

updated the document with links to that thread, and the compile time parameters proposal inside it

That is what Array<T> is for. However, the range of “sensible fixed size arrays” that you’d want on the stack is usually about 2 ... 1024 elements, which is still far larger than any other type in the standard library, so it remains an issue

The way stack promotion works is the type itself is made the smallest size that fits the largest size we want to stack-promote, and then add a flag that indicates if the body of the type stores the contents inline or stores a reference to a heap buffer. So if you want to promote MemoryLayout<T>.size <= 256B, then your type has to be at least 257B big on the stack, even if your vector is only using 4 out of 256 bytes available. String, Character, and some protocol types are examples of this in the standard library

How would you define such a generic type?

I don’t think the result is obvious, once you take encodings and grapheme breaking into account

fswarbrick · February 15, 2019, 12:32am

First, would you agree that the first example I gave is "correct" for what you are pitching? If so, wouldn't my suggested shorthand just be sugar for that?

Second, I don't even understand the question, so I can't answer.

fswarbrick · February 15, 2019, 12:33am

taylorswift:

I'd like some consideration to be given to allowing the following:
let fstr = Vector<8, UInt8> = "RANDOM" 
I think the result is "obvious". Not sure how this fits in with Unicode.
I don’t think the result is obvious, once you take encodings and grapheme breaking into account

I guess more properly I should have said that the "use case" is obvious. Certainly there are further considerations, given Unicode.

fswarbrick · February 15, 2019, 12:36am

Since I have already made a fool of myself, why not take it one step further. Can/should n-based subscripting be supported?

let fa2 = Vector<1...4, Character> = ("A", "B", "C", "D")
// fa2[1] = Character("A")
// fa2[2] = Character("B")
// fa2[3] = Character("C")
// fa2[4] = Character("D")

taylorswift · February 15, 2019, 12:41am

No. what you are pitching would require a generics system way more flexible than anything we have on the drawing board right now.

fswarbrick:

I'd like some consideration to be given to allowing the following:
let fstr = Vector<8, UInt8> = "RANDOM" 
I think the result is "obvious". Not sure how this fits in with Unicode.
I don’t think the result is obvious, once you take encodings and grapheme breaking into account
I guess more properly I should have said that the "use case" is obvious. Certainly there are further considerations, given Unicode.

you could do this by making Vector<N:Int, T> conform to ExpressibleByStringLiteral and take the first N characters (or code units) of the string at run time.

by the way,, your fixed-size array is 8 code units long, but your string only has 6 ASCII characters. what goes in the last 2 array cells?

taylorswift · February 15, 2019, 12:42am

fswarbrick:

Since I have already made a fool of myself, why not take it one step further. Can/should n-based subscripting be supported?
let fa2 = Vector<1...4, Character> = ("A", "B", "C", "D")
// fa2[1] = Character("A")
// fa2[2] = Character("B")
// fa2[3] = Character("C")
// fa2[4] = Character("D")

Can it? yes, though I would spell it as Vector<1, 4, Character> to keep the implementation simple. Should it? i don’t think a lot of people would find value in this

fswarbrick · February 15, 2019, 12:44am

A lot of technical requirements in my line of business document using 1-based subscripting. Allowing for it in Swift would make it easier to translate the business rules to an implementation.

Torust · February 15, 2019, 12:51am

I'm not sure that's completely correct, although I'll certainly believe that's how it works in generic contexts. As I understand it, any UnsafeMutablePointer.allocate() and UnsafeMutablePointer.deallocate() pair may be converted to a stack allocation provided the compiler can prove it's safe. I can't find the reference for that at the moment, though.

Effectively, what we need for Collection conformance is a guarantee that the slice will not outlive its source, regardless of whether the source is stack or heap allocated. If that is known, then it's safe to form a pointer to the source (provided it's in memory and not just in registers). I'm not sure exactly how that would be implemented (apart from sharing a lot of machinery with moveonly types), but it seems necessary; I don't think FixedSizeArrays without Collection conformance are a worthwhile addition to the language.

taylorswift · February 15, 2019, 12:54am

i’ve never heard of this , @Andrew_Trick might know

Andrew_Trick · February 15, 2019, 1:15am

No, if you manually allocate a buffer, you can assume it will give you a pointer to the heap, and the memory can be freed by free on Linux/Darwin or _aligned_free on Windows. This is never going to be the fast allocation path. It's more of a least-common-denominator for interop.

Regular Swift types can be stack allocated--even classes--and you can get an UnsafeBufferPointer into that storage.

Torust · February 15, 2019, 1:31am

Given that, the potential uses of FixedSizeArrays as I see it are:

guarantee that the stack allocation occurs
guarantee that a FixedSizeArray var in a type (e.g. as a struct member) is stored inline.

I can see these two features being useful outside of FixedSizeArrays, and if possible I'd prefer they be implemented as more general features (as aware as I am of the problems that calling for a more general solution often comes with).

In terms of the in-memory representation: if a FixedSizeArray were stored inline, we could ensure that its size is only the size of its members. If it's not stored inline, then just treating it like any other heap buffer makes sense to me.