SIMD additions

scanon · March 12, 2019, 5:44am

Hi all --

There are a few features that we pushed out of the previous SIMD proposal in the interests of time, as well as a few features that have been requested by internal Apple adopters of Swift as they've started using the new SIMD APIs.

A first draft of the pitch is here: https://github.com/stephentyrone/swift-evolution/blob/simd-additions/proposals/0000-simd-additions.md. Note that there are (as of now) no ABI stability, etc. sections in the pitch. These are all purely additive changes to the library.

Looking forward to everyone's thoughts.

Torust · March 12, 2019, 7:55am

SIMD3.init(_ xy: SIMD2<Scalar>, _ z: Scalar)
SIMD4.init(_ xyz: SIMD3<Scalar>, _ w: Scalar)

I already provide extensions for this myself; definitely +1 to having them in the library.
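
A minimal sketch of what such user-side extensions might look like. The `extending:` argument label is an assumption made here so the sketch cannot clash with any initializer the library itself provides; the pitch spells these as unlabeled initializers.

```swift
extension SIMD3 {
    /// The vector (xy.x, xy.y, z). The `extending:` label is hypothetical.
    init(extending xy: SIMD2<Scalar>, z: Scalar) {
        self.init(xy.x, xy.y, z)
    }
}

extension SIMD4 {
    /// The vector (xyz.x, xyz.y, xyz.z, w). The `extending:` label is hypothetical.
    init(extending xyz: SIMD3<Scalar>, w: Scalar) {
        self.init(xyz.x, xyz.y, xyz.z, w)
    }
}

let xy = SIMD2<Float>(1, 2)
let xyz = SIMD3<Float>(extending: xy, z: 3)
let xyzw = SIMD4<Float>(extending: xyz, w: 4)
```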

SIMD.min(), SIMD.max(), SIMD.sum(), any(), all()

Again, I already provide these. +1.
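
For illustration, hand-rolled reductions of this kind can be written over `indices`. The `sketch`-prefixed names are hypothetical and chosen to avoid colliding with anything in the library; this is a plain scalar loop, not a claim about how the proposal implements them.

```swift
extension SIMD where Scalar: Comparable {
    /// Smallest lane value (hypothetical helper name).
    func sketchMin() -> Scalar {
        indices.dropFirst().reduce(self[0]) { Swift.min($0, self[$1]) }
    }

    /// Largest lane value (hypothetical helper name).
    func sketchMax() -> Scalar {
        indices.dropFirst().reduce(self[0]) { Swift.max($0, self[$1]) }
    }
}

extension SIMD where Scalar: FloatingPoint {
    /// Sum of all lanes, accumulated left to right (hypothetical helper name).
    func sketchSum() -> Scalar {
        indices.reduce(Scalar.zero) { $0 + self[$1] }
    }
}

let lanes = SIMD4<Float>(3, -1, 7, 2)
let smallest = lanes.sketchMin()
let largest = lanes.sketchMax()
let total = lanes.sketchSum()
```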

For loading and storing from collections: I feel like this is better written as init(collection[start...]) rather than having start as a separate argument. This could maybe go with SIMD.replacePrefix(of collection: inout C), where C can also be sliced.

indexOfMinValue, indexOfMaxValue:

minComponentIndex or minScalarIndex maybe?

Swizzles: at the moment, I have this implemented through a giant auto-generated file that provides computed getters/setters. Having a nicer solution would be great.

Chris_Lattner3 · March 17, 2019, 4:05am

Hi Steve,

Here are my thoughts for whatever they are worth. I am not a numerics expert. Also + @rxwei for visibility:

I'm generally very +1 on this. The SIMD proposal was cut down to make the Swift 5 schedule, so I'm thrilled you're coming back around to improve some of the stuff that was cut just for schedule reasons.
The rationale for any/all being free functions is inconsistent. I agree with your point that "These two are defined as free functions, because at use sites they read significantly more clearly", but that seems like it applies just as well to the other reductions. Why not make sum/min/max be global functions as well (at least as far as users see them, I understand there will be backing protocol requirements and members for the impl) for consistency?
The indexOf*Value proposals are a bit weird and we don't have a lot of precedent for such operations AFAIK. If you have no obviously great name for them, it might make sense to split them out to a separate discussion, and consider the intersection between their use cases and similar use cases on other collection'y things.
Similarly to the reduction operations, I don't see why Float4.max(a, b) is better than max(a, b). Is that required? Actually, is the actual issue that these are elementwise operations? If so, it seems really weird to me that Float4.max is elementwise without calling that out. We have the dot-prefixed operators (such as .<) to specify elementwise/pointwise operations, but it seems like a prefix word of some sort should be used on named methods to make this explicit.
If you add a one member to SIMD (something I'm generally +1 on) it is worth considering whether we should add them to the scalar types for consistency.
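
To make the elementwise point concrete, here is a sketch of a lane-by-lane maximum with the behaviour spelled out in the name. `pointwiseMax` is a hypothetical name, not something from the pitch; it only uses the existing `.<` comparison and `replacing(with:where:)`.

```swift
/// Lane-by-lane maximum: lane i of the result is max(a[i], b[i]).
/// `pointwiseMax` is a hypothetical name used only for this illustration.
func pointwiseMax<V: SIMD>(_ a: V, _ b: V) -> V where V.Scalar: Comparable {
    // Wherever a's lane is smaller, take b's lane instead.
    a.replacing(with: b, where: a .< b)
}

let lhs = SIMD4<Float>(1, 5, 3, 0)
let rhs = SIMD4<Float>(2, 4, 3, -1)
let lanewise = pointwiseMax(lhs, rhs)
```

The result is itself a four-lane vector, which is what distinguishes it from a reduction that returns a single scalar.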

Thanks!

-Chris

taylorswift · March 17, 2019, 5:59am

Extending vectors

Add the following initializers:

extension SIMD3 {
  /// The vector (xy.x, xy.y, z)
  public init(_ xy: SIMD2<Scalar>, _ z: Scalar)
}

extension SIMD4 {
  /// The vector (xyz.x, xyz.y, xyz.z, w)
  public init(_ xyz: SIMD3<Scalar>, _ w: Scalar)
}

I’m worried that these SIMD types are starting to reach into Vector${N} territory, as it does sound like a lot of your Apple-internal users are using them in place of proper vector wrappers. I have these same functions in my vectors.swift.gyb file (except as static functions called extend and homogenize), but it seems like you’re setting us up for a lot of API overlap once we add true vector types to the language.

Horizontal operations

Where is horizontal volume? I don’t find horizontal add of much use except for implementing dot and cross product on the higher-level vector type. Horizontal multiply, on the other hand, is a lot more useful for stuff like computing the number of pixels in an image size vector.

Min, max, clamp

YES. (I would also very much like to see clamp on scalar floats, but that already got rejected.) My same concerns about API overlap apply here too, though.

scanon · March 17, 2019, 5:15pm

We've been pushing quite hard to keep the two things separate, actually. The particular extending initializers here are closer to the boundary, I agree, but still worth having on the SIMD side of things.

Literally none of our internal clients have asked for horizontal multiply, I've never wanted it in the course of writing ~10M lines of SIMD code in other languages, and there's no support for efficient implementation in hardware (so there's no advantage to having the abstraction instead of just using indices.reduce(into: 1)).
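
As a sketch of what "just using indices" could look like in user code: the closure body and the `horizontalProduct` name below are assumptions filled in for illustration, since the reduce above is only written in shorthand.

```swift
/// Product of all lanes, computed with a plain scalar reduction.
/// `horizontalProduct` is a hypothetical helper, not a proposed API.
func horizontalProduct<V: SIMD>(_ v: V) -> V.Scalar where V.Scalar: Numeric {
    v.indices.reduce(into: 1) { $0 *= v[$1] }
}

let imageSize = SIMD2<Int>(640, 480)
let pixelCount = horizontalProduct(imageSize)
```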

As a side note, representing an image size as a SIMD vector seems at first glance like an extremely odd use case--what is the SIMD representation buying you over a tuple or array? Does it make sense to perform an elementwise sum of image sizes? Are height and width even the same thing? This seems like a recipe for subtle errors.

taylorswift · March 17, 2019, 9:43pm

It’s useful for computing offsets/flattened positions, mainly through vector addition and subtraction. SIMD bitwise right shift is also useful with text shaping APIs, as a lot of them will give 2D coordinates and glyph image dimensions in 1/64s.
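
A small sketch of that shift, assuming the coordinates arrive as `Int32` values counted in units of 1/64 (the concrete values and names are made up for illustration):

```swift
// A 2D position in units of 1/64 (assumed layout).
let positionIn64ths = SIMD2<Int32>(128, 192)

// Shifting every lane right by 6 divides by 64, rounding toward negative infinity.
let wholeUnits = positionIn64ths &>> SIMD2<Int32>(repeating: 6)

// Offsets then compose with ordinary elementwise addition.
let origin = SIMD2<Int32>(10, 20)
let placed = origin &+ wholeUnits
```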

jrose · March 18, 2019, 6:51pm

I'm confused about the collection initializer. Don't we have that already using Slices? Or is it really important to convert the prefix of a Collection to a SIMD vector? That seems bizarre to me.

indexOfMinValue and indexOfMaxValue don't seem like they belong on SIMD vectors; they're general Collection operations. Do they have significantly faster SIMD implementations?

You could handle sum by partitioning on negative/positive and then adding pairs first, but I admit that seems like a lot of work and something that won't compile down to a single instruction.

scanon · March 18, 2019, 7:17pm

The primary use case for this API is "I have an Array<Float> that I got from some API, and I want to iterate over groups of three elements, process them as SIMD3<Float> and then do something else with them." Yes, this can be done in two steps via slices. That may be an adequate solution, but at least some of our users would like something more direct.

SIMD doesn't conform to Collection, so even if they're defined on Collection we'd still need some solution for SIMD vectors.

It actually might on some targets, but yeah, users shouldn't need to do that.

Ben_Cohen · March 18, 2019, 9:39pm

The min and max reductions must be methods on SIMD types to be consistent with the same pattern on Sequence. They are for selecting a single scalar (element) from the SIMD (sequence). It's clear that the SIMD being reduced should be self in these cases. This leaves the min free function for when you're comparing two peers, neither of which should be self. I think the same logic would apply to seq.sum vs sum(x, y) if we had it.

any and all are an unusual case because they aren't operating on a SIMD like min/max are. They're operating on the result of a SIMD comparison. It seems wrong to make that mask "self" in the expression. The member equivalent would need to be something like if (x .< 0).anyIsTrue(), which is ungainly to type, difficult to read, and has discoverability challenges.
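
To illustrate the free-function shape, here is a sketch of mask reductions written over the lanes of a `SIMDMask`. The names `anyTrue` and `allTrue` are hypothetical stand-ins, used so the sketch does not depend on what the library ends up calling them.

```swift
/// True if at least one lane of the mask is set (hypothetical name).
func anyTrue<Storage>(_ mask: SIMDMask<Storage>) -> Bool
where Storage: SIMD, Storage.Scalar: FixedWidthInteger & SignedInteger {
    mask.indices.contains { mask[$0] }
}

/// True if every lane of the mask is set (hypothetical name).
func allTrue<Storage>(_ mask: SIMDMask<Storage>) -> Bool
where Storage: SIMD, Storage.Scalar: FixedWidthInteger & SignedInteger {
    mask.indices.allSatisfy { mask[$0] }
}

let x = SIMD4<Float>(1, -2, 3, 4)
let hasNegative = anyTrue(x .< 0)
let allNegative = allTrue(x .< 0)
```

At the use site the comparison reads as the argument, rather than the mask becoming the receiver.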

I think I agree. My other worry is there's a reasonable case to be made that SIMD should be conditionally Comparable (lexicographically – this is often requested of Array) and if that happens (or if a user does it themselves retroactively) we'll get ambiguity with the other max.

jrose · March 18, 2019, 10:13pm

Let's see…

for i in stride(from: fullArray.startIndex, to: fullArray.endIndex, by: 3) {
  let next = SIMD3(fullArray[i..<(i+3)])
  doSomethingWith(next)
}

Or with offset subscripts:

var remaining = fullArray[...]
while !remaining.isEmpty {
  let next = SIMD3(remaining[offset: ..<3])
  doSomethingWith(next)
  remaining = remaining[offset: 3...]
}

I'm not sure the init(_:start:) variant adds enough value.
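
For comparison, a direct initializer of that kind could be layered on top of slicing in a few lines. This is only a sketch: the `loadingFrom:` label is hypothetical, and the exact signature in the pitch may differ.

```swift
extension SIMD3 {
    /// Reads three consecutive scalars from `source`, beginning at `start`.
    /// Hypothetical label; traps if fewer than three elements remain.
    init<C: Collection>(loadingFrom source: C, start: C.Index)
    where C.Element == Scalar {
        self.init(source[start...].prefix(3))
    }
}

let fullArray: [Float] = [1, 2, 3, 4, 5, 6]
var groups: [SIMD3<Float>] = []
for i in stride(from: fullArray.startIndex, to: fullArray.endIndex, by: 3) {
    groups.append(SIMD3(loadingFrom: fullArray, start: i))
}
```

The body is a one-line wrapper over the slice-based spelling, which is consistent with the point that the dedicated variant saves little.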