SE-0251: SIMD Additions

I'm pretty unhappy with the swizzle API silently accepting out-of-range indexes. Nothing else in Swift works that way except &+, and it's not like it's a hugely expensive check (one comparison against the original value + one branch in the happy path). If it's going to do that, maybe the subscript should have an argument label.

(Also, you've got a typo: xyz in the code example should be zyx.)

Everything else looks pretty good.


It does not silently accept out-of-range indices: the index is defined to wrap, so there is no such thing as "out-of-range". This is an important improvement in functionality; it is what lets you build larger table lookups out of multiple swizzles:

let table0 = SIMD16<UInt8>(/* values for 0...15 */)
let table1 = SIMD16<UInt8>(/* values for 16...31 */)
var result = table0[index]
result.replace(with: table1[index], where: index .>= 16)

Wrapping isn't the only semantic that works for this--all we need is that no input is invalid, so we can do the operation first and then fix it up later. This principle doesn't only apply to swizzles; it's a very common pattern in effective SIMD code to compute both paths of an operation and pick the one with the right result at the end, because branching is often prohibitively expensive.
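As a concrete illustration of the compute-both-then-select pattern, here is a hedged sketch using the stdlib's existing replace(with:where:) and pointwise comparison operators (the values are arbitrary):

```swift
// Branch-free lanewise absolute value: compute both candidates
// (x and -x), then keep the right lane from each using a mask.
let x = SIMD4<Int32>(-3, 7, -1, 2)
var absX = x                                // candidate 1: x itself
absX.replace(with: 0 &- x, where: x .< 0)   // candidate 2: -x, in negative lanes
// absX is now (3, 7, 1, 2), with no per-lane branching
```

Both candidates are computed unconditionally; the mask from the pointwise comparison picks the valid result per lane at the end.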

Wrapping is an explainable choice, one that is already precedented by &+ (and especially &>> and &<< which are almost precisely analogous), and one that is efficiently implementable for all SIMD architectures.

If the consensus is that people are strongly opposed to this behavior, the right resolution is to add a wrapping: label on the subscript. Making it trap is both unnecessarily expensive and makes the operation less useful.
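For concreteness, here is a minimal sketch of what a labeled wrapping subscript could look like; the wrapping: subscript below is hypothetical, not part of the proposal text or the shipping stdlib:

```swift
// Hypothetical wrapping swizzle for SIMD4: indices wrap modulo the
// lane count (a power of two), so no input index is invalid.
extension SIMD4 {
    subscript(wrapping index: SIMD4<Int>) -> SIMD4<Scalar> {
        var result = SIMD4<Scalar>()
        for i in result.indices {
            result[i] = self[index[i] & 3]   // wrap: index mod 4
        }
        return result
    }
}

let v = SIMD4<Float>(10, 20, 30, 40)
let w = v[wrapping: SIMD4(1, 0, 5, -1)]   // 5 wraps to 1, -1 wraps to 3
// w == (20, 10, 20, 40)
```

The scalar loop stands in for the single hardware shuffle an optimized implementation would emit; the point is only the wrapping index semantics.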


IMO writing the & (or -) explicitly in that code would make it clearer what was going on and would still ensure that the check is optimized away. wrapping: works too, but it's not the common case anyway.

How do we justify forcing people to write arithmetic operations that are always optimized away? We have &>> so people don't have to write x >> (count % x.bitWidth) every time.
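For reference, the masking-shift analogy works like this (a quick sketch; the modulo reduction of the shift amount is the documented semantics of &>>):

```swift
// &>> wraps the shift amount modulo the operand's bit width, so a
// shift by 9 on a UInt8 behaves like a shift by 9 % 8 == 1.
let x: UInt8 = 0b1000_0000
let masked = x &>> 9                  // 0b0100_0000
let manual = x >> (9 % x.bitWidth)    // same result, written out by hand
```

The manual form always optimizes to the same instruction; &>> exists so you don't have to write it.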

People do not (only) write source code for the benefit of the compiler.


Right. The compiler doesn’t care about excess noise arithmetic that can simply optimize away. The people writing and reading the code do care, however.

“Compute both sides of a branch and select the valid result” is an important programming model for SIMD. This often gets people into trouble as C compilers get smarter and exploit UB, but rather than taking that pattern off the table, we can fix this without sacrificing performance by defining those edge cases. Wrapping table indices is one such opportunity.

Put another way: if we are unwilling to embrace any SIMD programming idioms, why are we even providing SIMD types or operations? If we want SIMD code to look the same as scalar code, then we should just tell people to write scalar code and improve the autovectorizer.


if we are unwilling to embrace any SIMD programming idioms, why are we even providing SIMD types or operations?

This is hyperbole. Not embracing one part of one idiom is not the same as being unwilling to embrace any idioms. We think automatic truncation is an important pitfall elsewhere in the language; why is this any different?


We think it's an important pitfall, but we have escape valves for it in cases where it's idiomatic and necessary for performance, like &+ and &>>. This is such a case.

In particular, for dynamic indices, checking that they are in-range before doing the shuffle is a significant performance hazard. For vectors that fit in register and a dynamic index vector, it takes us from an and + shuffle to, best case, something like add + cmp + movmsk + test + jne + shuffle (x86) (or maybe conjure mask + ptest + jne + shuffle depending on baseline) or cmp + minv + fmov + cmp + bne + tbl (arm).


I don't have a strong opinion on most of the mini proposals here, but I don't like polluting the global namespace with any and all.

  • Static scalarCount: +1

  • Extending vectors: Ok, but why not forms where the scalar comes before the vector? Why only afterward? I agree that the label adds no value.

  • Swizzle design: -1 as written. This is a complicated topic with many possible design directions. Given the weirdness with vec3 in particular, I can't get excited about this design. Did you consider having the subscript take an enum, which would allow expressions like vec[.x, .x, .y, .z]? It isn't as general as what you propose, but is more ergonomic for the common case and doesn't have the vec3 weirdness.

  • Reductions: I get why you want any/all to be global functions. I was originally opposed to this but have since come around. That said, I don't understand why you want wrappedSum and min to be properties - this seems inconsistent and strange to me, and the alternatives section doesn't explain it.

Why not make min and sum be global functions? If this causes semantic problems (seems like it might, because min(a,b) should be defined on vectors too, and we don't want min(x) to be different) then maybe it should be named reduceMin or something else to make it distinct that it is a horizontal operation. If there is a good compound name for this, then perhaps that compound name should be applied to any/all to make them more distinctly obvious as horizontal vector operations.

  • clamp/clamped both make sense as members to me, +1.

  • Why aren't lanewise min and max operators overloads of the existing global min and max? I don't understand the "These could be free functions, but that introduces an ambiguity if someone retroactively conforms SIMD types to Comparable because they want lexicographic ordering." point, that doesn't seem like something we cater to unless it is super common and makes sense in practice.

  • one: +1
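The enum-based static swizzle floated above could look something like the following; Lane and this subscript are a hypothetical sketch, not part of the proposal:

```swift
// Hypothetical enum-indexed swizzle, giving vec[.x, .x, .y, .z] syntax.
enum Lane: Int { case x = 0, y, z, w }

extension SIMD4 {
    subscript(_ a: Lane, _ b: Lane, _ c: Lane, _ d: Lane) -> SIMD4<Scalar> {
        SIMD4(self[a.rawValue], self[b.rawValue],
              self[c.rawValue], self[d.rawValue])
    }
}

let vec = SIMD4<Float>(1, 2, 3, 4)
let s = vec[.x, .x, .y, .z]   // (1, 1, 2, 3)
```

As noted, this only covers static swizzles with compile-time-known lanes; it has no story for dynamic index vectors.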

Yes, this all seems fine to me, happy to see SIMD getting fleshed out.



i’ve never found x | yz to be all that useful. half the time i’m extending vectors it’s xy | 1 to homogenize something 2D or 3D to something 3D or 4D, and the other half i’m just reassembling a 3D quantity where the z coordinate needed some special processing.


wasn’t this rejected for scalar floats a while back? not that i am opposed to it

Why cause a problem and then work around it when there's little to motivate causing it in the first place?

Having min and max be properties of SIMD is consistent with the same pattern on Sequence. They are for selecting a single scalar (element) from the SIMD (sequence). There is a clear self – the SIMD or sequence being reduced – in these cases.

This leaves the min free function for when you're comparing two peers, neither of which should be self. I think the same logic would apply to seq.sum vs sum(x, y, z) if we had them.
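A hedged sketch of the distinction being drawn, using the method spellings in today's stdlib (SE-0251 as accepted settled on methods rather than properties; check the current stdlib for exact names):

```swift
// Horizontal reduction: there is a clear `self`, the vector being reduced.
let v = SIMD4<Int32>(3, 1, 4, 1)
let smallest = v.min()        // 1: the smallest lane of v

// Free function: comparing two peers, neither of which should be `self`.
let m = min(3, 7)             // 3
```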

any and all are an unusual case because they aren't operating directly on a SIMD like min / max are. They're operating on the result of a SIMD comparison. It seems wrong to make that mask self in the expression. That's why they should be free functions IMO. This is a different reason from the one in SE-0246, where free functions are preferred because that's the convention with sin, log, etc. That justification doesn't seem strong enough for SIMD operations.


This requires blowing up the index vector for dynamic permutes (table lookups). For some vector code (e.g. a lot of string operations) that's more common than static swizzles, and this would make it much less ergonomic. The SIMD3<T> weirdness is mildly annoying to define, but I don't think it will be an issue at all in practice--one simply doesn't use 3-element dictionaries for permutes. We could alternatively simply define the index to be modulo dictionary size; that would make the 3-element dictionary case slower, but as I said, I don't think that case really matters--we just need to pick something and define it.

There's also a crowd that wants these to be super-terse and as close to the clang-style v.xxyz as possible. What you sketched out doesn't allow for a terser style, while the SIMD subscript lets you do:

let xxyz = SIMD4(0,0,1,2)

and then use that repeatedly as v[xxyz], which is really pretty nice. I don't think we need to put the whims of the "terseness above all" crowd first, but it's nice that this form lets you be more or less expressive as needed.

We could also wrap that in some compilerEvaluable + dynamicMemberLookup at some future point to actually allow v.xxyz if we wanted to.

Right. x | yz could be defined, but xy | z is definitely the 99% case.

ok, fair enough. I have effectively zero experience with this sort of thing, so I'm very happy to just trust your experience. It can always be added later if there is user demand.

Yes, I agree that dynamic indexing would be unpleasant if only a static subscript were available. That said, why not have both?

I don't find your let xxyz = SIMD4(0,0,1,2) example compelling as a tersifier - it requires you to write a temporary for every swizzle you want to do, or are you proposing that these become globals?

If you went with both a static and dynamic subscript approach, then I think you'd only add dynamic swizzles for power of two vectors. Is there a usecase to have them on vec3 at all?

I'm saying that if you use specific static shuffles frequently, you can define them like that to achieve the terse syntax. If you don't, then it doesn't matter much. Most code that makes heavy use of swizzles uses a few specific swizzles, not a random sampling of the possible space =)

The other thing I like about this is that it gives a nicer way for people to use domain-specific names. Maybe for my problem it's not really xxyz but rather rrgb or aabc or whatever.

The main reason I avoided this is that it prevents defining the operations on the SIMD protocol and instead moves them to only be on the concrete types. That seems worse. It also requires O(n^2) definitions once you move them to the concrete types, which is admittedly only a minor inconvenience but also a sign that it's not the nicest design.


SE-0251 has been accepted with modifications; please take any further comments to that thread.