SE-0229 — SIMD Vectors


(Steve Canon) #44

Hi 00buggy00 -- can you clarify what you mean by "Swift feel"? You use that term in a few places, but other than Vector4<Float>, you don't really offer any alternatives of what you think would have more "Swift feel", so it's a little bit hard to see what you're getting at.

To me, Vector4<Float> vs Float.Vector4 is basically "colour" vs "color". Neither feels especially more "Swift" to me--one is the free spelling and the other the member spelling. "Swift" for me is more about what the language lets you do and the safety that it gives you. simd_float4 is not-very-swift, because it doesn't follow the conventions and doesn't enable generic code.

The reason I personally prefer Float.Vector4 is that it's not fighting the language; I don't mind the extra boilerplate, but I think that we should encourage styles that minimize the boilerplate for users who copy the style of the standard library in their own code. With the language as it stands today, that makes me lean towards Float.Vector4. If you all feel really strongly about Vector4<Float>, I'm OK with that, but I hope that you'll push just as hard for the language features that would actually make that the natural spelling.

As mentioned before, there is no information that Vector4<Float> has but Float.Vector4 does not. Both types know that they have precisely 4 elements. These are essentially equivalent formulations. It is OK to prefer one or the other on style grounds, but many of the arguments that I've seen advanced claim that one lets us do something that the other does not. I am yet to see any example where this is actually the case.

In this case, the count property is still useful in both instances to enable writing code that is generic over vector length. It exists because over the course of a few months writing code against prototypes of this proposal, I found that it was necessary.

Ditto subscripting. It is not a vestigial limb of Collection, but rather a key escape valve for writing real code. Absent Collection conformance, count and subscript are the tools we have to iterate over a vector. To some extent this can be addressed by adding variants of map and reduce for these types, but these two operations would still be necessary.
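As a rough illustration of the length-generic code described here, the sketch below uses the SIMD protocol that later shipped in the Swift standard library, where the element count is spelled scalarCount rather than count. The function name horizontalSum is made up for this example; it is not part of the proposal.

```swift
// Hypothetical helper: sums the lanes of any SIMD vector, whatever its length.
// It relies only on the element count and the integer subscript.
func horizontalSum<V: SIMD>(_ v: V) -> V.Scalar where V.Scalar: FloatingPoint {
    var total: V.Scalar = 0
    for i in 0..<v.scalarCount {
        total += v[i]
    }
    return total
}

let sum4 = horizontalSum(SIMD4<Float>(1, 2, 3, 4))   // 10.0
let sum2 = horizontalSum(SIMD2<Double>(0.5, 0.25))   // 0.75
```

The same function body works for 2, 4, or 16 lanes, which is the case the count property and subscript are meant to serve.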

This is, AFAIK, the first concrete example of something that is actually simpler using the Vector4<Float> model that anyone has put forward, and is well worth consideration. Thank you.


(Steve Canon) #45

Because there isn't one Vector4<VectorBool> type. Comparisons on Vector4<Int8> (on x86_64 or arm today) produce a mask where each lane is represented by eight bits. Comparisons on Vector4<Float> produce a mask where each lane is represented by thirty-two bits. It's vitally important for performance in SIMD code to keep lane width fixed as much as possible; you do not want to be going through narrowing / widening operations all over the place. We could collapse these types, but the optimizer today cannot eliminate all of the narrowing and widening operations that would entail, and long experience with the LLVM vector backend suggests that it would be folly to depend on that optimization.
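To make the lane-width point concrete, here is a small sketch using the SIMD4 and SIMDMask types that later shipped in the standard library (the thread's Vector4 spelling is not used there). It only inspects the storage size of the two masks; the variable names are illustrative.

```swift
let floats = SIMD4<Float>(1, 2, 3, 4)
let floatMask = floats .< SIMD4<Float>(repeating: 3)   // SIMDMask<SIMD4<Int32>>

let bytes = SIMD4<Int8>(1, 2, 3, 4)
let byteMask = bytes .< SIMD4<Int8>(repeating: 3)      // SIMDMask<SIMD4<Int8>>

// Each mask keeps the lane width of the vectors that were compared.
let floatMaskSize = MemoryLayout.size(ofValue: floatMask)  // 16 bytes: four 32-bit lanes
let byteMaskSize = MemoryLayout.size(ofValue: byteMask)    // 4 bytes: four 8-bit lanes
```

Because the two masks are different types, using one where the other is expected needs an explicit conversion, which is where the narrowing or widening cost would show up.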


(Steve Canon) #46

The biggest issue I see here is coming up with names that generalize to long vectors. xyzw works well for Vector4. 0123456789abcdef gets us to Vector16. Beyond that, I'm stuck.

Generic tuple subscripting that allows extracting multiple members would be fun: result.0112358d. In the meantime, the gathering: init gives the same functionality in slightly more verbose form.
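For readers who want to see what a gather looks like in practice, the following is a minimal sketch written against the SIMD protocol that later shipped in the standard library. The free function gather(_:at:) is hypothetical and stands in for the proposal's gathering initializer, whose exact signature is not shown in this thread.

```swift
// Hypothetical helper: builds a 4-lane vector by picking lanes of `source`
// at the given positions, the long-hand equivalent of a ".xyxy" swizzle.
func gather<V: SIMD>(_ source: V, at indices: SIMD4<Int>) -> SIMD4<V.Scalar>
    where V.Scalar: SIMDScalar
{
    return SIMD4(
        source[indices[0]],
        source[indices[1]],
        source[indices[2]],
        source[indices[3]]
    )
}

let v = SIMD4<Float>(10, 20, 30, 40)
let xyxy = gather(v, at: SIMD4<Int>(0, 1, 0, 1))   // (10, 20, 10, 20)
```

Indices scale to any vector length, which sidesteps the problem of running out of letter names past sixteen lanes.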


(Tino) #47

Imho this is one of many topics where a generous evaluation phase would make a lot of sense:
I bet the discussion could go on for months, and the final result most likely would still have significant flaws.
At the same time, there is an implementation that could be deployed right away, enabling people to gain hands-on experience...
It would even be possible to include several spellings - I guess simply adding another implementation would take less time than arguing about its theoretical benefits.


(Brent Royal-Gordon) #48

One of the standard library's jobs is to handle and hide the awkwardness of builtin types. This is why GYB and the Builtin module exist in the standard library but not elsewhere. Users need to be judicious about which parts of the standard library they copy.

In other words, we should favor an approach which makes the public interfaces as simple and clean as possible, rather than one which reduces boilerplate in the standard library but exposes the shape of the underlying implementation.

In other other words, Vector3<Float>, not Float.Vector3.

If we're worried users will take the wrong lessons from the standard library's design, we could have separate VectorizableN and _BuiltinVectorizableN protocols to separate the part users should emulate from the part they shouldn't:

// Protocol anyone can conform to in order to allow vector operations on a type.
// CGFloat could conform to this, for instance, and specify `Float`
// or `Double` for its `VectorizableType` on a platform-specific basis.
public protocol Vectorizable4 {
  associatedtype VectorizableType: _BuiltinVectorizable4
  init(vectorizable: VectorizableType)
  var vectorizable: VectorizableType { get }
}
public protocol _BuiltinVectorizable4 {
  // Wrappers around primitive operations
}

This is an excellent reason (and one I didn't have the background to understand on my own). Thanks for explaining.

Is there a reason why, for instance, Vector4<Int32> and Vector4<Float> should have the same mask type? I understand that they can because they have the same bit width and lane width, but are there reasonable cases where you would want to compute the mask for a Float vector from an Int32 vector or vice versa? If not, we might consider making each VectorN<T> have a distinct VectorN<T>.Mask type. That would prevent accidental mixing and further reduce the public interface by changing a typealias + several possible types to use for it into a concrete nested type.


(Steve Canon) #49

I would say that cases where you want to intermix these are rare, but they do exist. You would be able to work around it via an explicit conversion, of course, which would compile down to a no-op.

I would be more or less fine with doing it either way; it would reduce the set of top-level types, but increase the total set of exposed types, and complicate the implementation of some of the functions on the float and integer types ever so slightly.
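A sketch of one such mixed case, using the types that later shipped in the standard library, where SIMD4<Int32> and SIMD4<Float> do share a mask type: a mask computed from integer lanes is applied directly to float lanes. The variable names are illustrative.

```swift
let indices = SIMD4<Int32>(-1, 0, 1, 2)
let values = SIMD4<Float>(10, 20, 30, 40)

// Mask computed from the Int32 vector...
let isNegative = indices .< SIMD4<Int32>(repeating: 0)

// ...used to select lanes of the Float vector, with no conversion step.
let cleaned = values.replacing(with: SIMD4<Float>(repeating: 0), where: isNegative)
```

With a distinct nested Mask type per vector type, the last line would need an explicit conversion of isNegative first.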


(^) #50

The subscript is very useful (you can do stuff like selecting basis vectors with way fewer if statements) but I don’t think it means SIMD vectors are Collection-like or that it should be removed to ward off any suspicions. the array initializer should really be thought of as a way to initialize a vector from raw data; maybe an Array.load(at:) method that returns a vector might be more appropriate.


(^) #51

think of the count property as being analogous to the bitWidth property on scalar types. also keep in mind at one point we were actually planning on adding subscripts to homogeneous tuples so having a [] doesn’t mean Collection.

If the previously mentioned naming convention were used, you wouldn't need a count method as you already know how many values are in the vector.

Saying you don’t need the count property because the length of the vector is in the name is like saying Int32 doesn’t need the bitWidth property because the number of bits is in the name. it’s useful for generic programming.

is this a bug or a feature?


(Karl) #52

Hah, well I suppose it’s better to “measure twice, cut once”. There’s no rush; Swift 5 won’t be released for a while yet.

It’s a really good start (not taking anything away from it - I think Steve has done a great job and obviously has vast expertise in this area), but I think it’s clear from the comments so far that the design as proposed is not ready to become part of the standard library. So that’s my answer to the actual review part of this thread.


(Ben Cohen) #53

This is not really accurate. The final branch date for Swift 5 is only about a month off.


(^) #54

and here we have an example of a procrastinator, and an anticipator


(Chris Lattner) #55

Great point.


(Chris Lattner) #56

.xyxy syntax is definitely useful for Vec2/3/4, perhaps useful for vec8. By the time I get to something like vec.bc4e I can't keep track of what is going on, so it serves no value. I don't think we "have to" support this for large vectors of 8 or more, and I see little value in doing so, so let's not!

What do you think? Have you found reasonable use cases for this syntax of vec8+ types in your code?

-Chris


(Chris Lattner) #57

Just to probe on this a little bit - given that we don't want vectors to be used as collections, and given that .count on arithmetic types can be confusing, wouldn't it make more sense to name this property .elementCount?

-Chris


(^) #58

probably, yeah


(Steve Canon) #59

I would push back lightly on that and say that even though they aren't collections, when you are using count, you are using them in ways very similar to how you would use Collection; it's the established name in Swift for "the number of elements in a thing", and we should stick to that, rather than making up a new term for each domain. If we defined this number on Tuples (I don't see any reason to, but ...) I expect we would call it count as well.


(Myles Schultz) #60

When I look at Float.Vector4 the first thing that comes to mind is a nested type, and when I try to reason about what that means, I feel it is misleading and thus not very Swift. To me, when I see Float, I think of a single value. Then when I look at that Float's nested type, Vector4, I see something that is plural. Unlike Array or Dictionary, where I expect a singular thing that holds many things, I think of a Float as one number, not multiple numbers. Vector4<Float> reads in a more straightforward way, in my opinion, because I expect a vector to be one thing that is made of multiple values. A Vector4 contains four values, and <Float> tells me that those values are each a Float.

To me, it is the difference between a "bushel of apples" and an "apple of bushels".

I completely agree with you.

In the discussion thread, it was conceded that the difficulty of using Vector4<Float> as the natural spelling was likely a sign of a language deficiency, and one that should be amended. It would be better to get this truly working now than to look back later and say, "why didn't we do that", and "it's too messy to fix it now"--should we decide to go with Vector4<Float>, which I personally hope that we do.

It would be great if you could go into greater detail on which language features would be needed.

To both you and @taylorswift, I see what you're saying. I don't think thinking of count as like a bitWidth is quite the right way to put it though. My thinking was that count feels more synonymous with a collection of elements, but I absolutely see your point and agree with both of you.

That's fine, but again, there is much talk about not thinking about vectors as collections. I think this is another case where the proposal could use more clarity on what is being implemented and why.


(Myles Schultz) #61

That's a good point, but it also seems to indicate that, despite no one wanting to think of vectors as collections, they still are when it comes right down to it. Not that I am voting to provide them with all of the associated conformances though. Just an observation, haha.


(Steve Canon) #62

We're getting off into the weeds here, but there's a lot more to Collection conformance than "contains a finite number of elements".


(Jens Persson) #63

I just feel the need to point out that this analogy might be a bit misleading in that the number of elements multiplied by the number of bits per element is not always the number of bits used to represent a SIMD vector.

For example, a (SIMD) vector of 4 Floats and a (SIMD) vector of 3 Floats are both represented by 4 * 4 * 8 == 128 bits.

import simd

typealias ML3f = MemoryLayout<simd_float3>
typealias ML4f = MemoryLayout<simd_float4>
print(
    ML3f.size       == ML4f.size        &&
    ML3f.stride     == ML4f.stride      &&
    ML3f.alignment  == ML4f.alignment
)
// prints true
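The same observation can be sketched without the Apple-only simd module, using the SIMD3 and SIMD4 types that later shipped in the standard library. The constant names below are illustrative; the point is that element count times element width understates the storage of a 3-lane vector.

```swift
let v3 = SIMD3<Float>(1, 2, 3)

// Bits implied by "count * bits per element": 3 * 32.
let payloadBits = v3.scalarCount * MemoryLayout<Float>.size * 8

// Bits actually used to store the vector, padded to four lanes.
let storageBits = MemoryLayout<SIMD3<Float>>.size * 8

let sameSizeAsFour = MemoryLayout<SIMD3<Float>>.size == MemoryLayout<SIMD4<Float>>.size
```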