Specialize protocol witness for SIMDStorage

I have the following hierarchy of protocols:

public protocol ColorSpaceBase<Components> {
  associatedtype Components: SIMD<Double>
}

public protocol ColorSpace: ColorSpaceBase {
  // ...
}

public struct Oklab: ColorSpace {
  public typealias Components = SIMD4<Double>
  // ...
}

The mix(with:t:) method on the Color type is called millions of times in a loop:

public struct Color<ColorSpace: UIKitUtils.ColorSpace> {
  public typealias Components = ColorSpace.Components
  public let colorSpace: ColorSpace
  public let components: Components
  
  @_specialize(exported: true, where ColorSpace == Oklab)
  @inlinable
  public init(_ colorSpace: ColorSpace, _ components: Components) {
    self.colorSpace = colorSpace
    self.components = components
  }
}

extension Color: InterpolatableData {
  @_specialize(exported: true, where ColorSpace == Oklab)
  @inlinable
  public func mix(with other: __shared Self, t: __shared Double) -> Self {
    if t == 0.0 { return self }
    if t == 1.0 { return other }
    return .init(colorSpace, components + t * (other.components - components))
  }
}

In the Xcode Instruments profiler I got this:

Any idea how to specialize the protocol witness for SIMDStorage?

The getter is marked @_alwaysEmitIntoClient, so I believe the compiler should already be able to devirtualize the call. I think you should file a bug.

I think this one

Those are marked @_transparent and should never appear in a backtrace. Also, if the compiler picked that getter to implement multiplication, it would effectively be undoing the SIMD entirely by multiplying each element individually.

The compiler can fail to specialize if the call above is not able to specialize either, and that is dependent on how exactly mix is called. If the call happens through an unspecialized generic or an existential, then we'll hit this fully-unspecialized path.

This is a consequence of the fact that protocol witness tables are never specialized in Swift (or, more accurately, they're specialized as far as necessary to ensure that they work with all possible values of their type parameters). If you have to call through that witness table, you'll hit the unspecialized path.

This is a source of some pain in SwiftNIO, which calls through existentials on a regular basis.
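
As a minimal sketch of the difference (the names Scaler, Doubler, and total are hypothetical and not taken from the code above), the same generic function can be reached with a statically known type, which the optimizer is free to specialize, or with an any Scaler, where the value is opened at run time and calls generally go through the witness table. Both calls compute the same result; only the dispatch differs. This assumes Swift 5.7 or later, which can pass an existential to a generic parameter.

```swift
// Hypothetical protocol and conforming type, for illustration only.
protocol Scaler {
  func scale(_ x: Double) -> Double
}

struct Doubler: Scaler {
  func scale(_ x: Double) -> Double { x * 2 }
}

// Generic hot loop: one call to `scale` per iteration.
func total<S: Scaler>(_ scaler: S, count: Int) -> Double {
  var sum = 0.0
  for i in 0..<count {
    sum += scaler.scale(Double(i))
  }
  return sum
}

// S is statically known to be Doubler at this call site, so the
// optimizer can specialize `total` and inline `scale`.
let concreteResult = total(Doubler(), count: 4)

// Here the static type is only `any Scaler`. The existential is opened
// at the call, and in general the unspecialized `total` is used, with
// `scale` reached through the witness table.
let erased: any Scaler = Doubler()
let erasedResult = total(erased, count: 4)
```

In a toy example like this the optimizer may still see through the existential because the concrete type is visible a line earlier; in real code, where the existential comes from configuration or another module, it usually cannot.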

Can you show the context of the call site to Color.mix? Are you ever calling this through a generic over InterpolatableData or through an any InterpolatableData?

This way

public protocol InterpolatableData { 
  func mix(with other: Self, t: Double) -> Self
}

private struct Stops<ColorSpace: UIKitUtils.RGBColorSpace, MixColorSpace: UIKitUtils.ColorSpace>: Sequence {
  internal typealias Element = (distanceRange: ClosedRange<Double>, fromColor: Color<MixColorSpace>, toColor: Color<MixColorSpace>)
}

@_specialize(where ColorSpace == ColorSpaces.SRGB, MixColorSpace == ColorSpaces.SRGBLinear)
@_specialize(where ColorSpace == ColorSpaces.SRGB, MixColorSpace == ColorSpaces.Oklab)
@_specialize(where ColorSpace == ColorSpaces.DisplayP3, MixColorSpace == ColorSpaces.DisplayP3Linear)
@_specialize(where ColorSpace == ColorSpaces.DisplayP3, MixColorSpace == ColorSpaces.Oklab)
private func _prepareImage<ColorSpace: UIKitUtils.RGBColorSpace, MixColorSpace: UIKitUtils.ColorSpace>(
  _ width: __shared Int,
  _ height: __shared Int,
  _ size: __shared Int,
  _ overallDistance: __shared Double,
  _ distances: __shared UnsafeBufferPointer<Double>,
  _ colorSpace: __shared ColorSpace,
  _ cgColorSpace: __shared CGColorSpace,
  _ mixColorSpace: __shared MixColorSpace,
  _ mixMethod: __shared Configuration.MixMethod.RawMethod,
  _ stops: __shared [Configuration.Stop]
) -> CGImage {
  for stop in Stops(colorSpace, cgColorSpace, mixColorSpace, overallDistance, stops) {
    for k in 0..<size {
      let mixedColor = stop.fromColor.mix(with: stop.toColor, t: t)
    }
  }
}

We have to keep going up. What's the call site of _prepareImage?

@_specialize(where R == CGImage)
private func _withCalculateDistances<R>(
  _ rect: __shared CGRect,
  _ configuration: __shared Configuration,
  _ body: (_ width: __shared Int, _ height: __shared Int, _ size: __shared Int, _ overallDistance: __shared Double, _ distances: __shared UnsafeBufferPointer<Double>) -> R
) -> R {
  return body(...)
}

override func draw(_ rect: CGRect) {
  let image = _withCalculateDistances(rect, currentConfiguration) { width, height, size, overallDistance, distances in
    let rawColorSpace: any RGBColorSpace = currentConfiguration.colorSpace.rawColorSpace
    let rawMixColorSpace: any ColorSpace = currentConfiguration.mixColorSpace.rawMixColorSpace(for: currentConfiguration.colorSpace)
    return _prepareImage(width, height, size, overallDistance, distances, rawColorSpace, cgColorSpace, rawMixColorSpace, rawMixMethod, stops)
  }
}

These existentials look like the culprit. _prepareImage will be called with the witness table obtained by opening the any ColorSpace, which will have a pointer to the fully-generic implementation of mix.

Yes, this analysis is correct.

To get optimisation, you need your code to be generic all the way up to a level where the concrete type is known. In your case here, you could try replacing the existentials with enums, then switch over the enum and call the function with the appropriate types. That'll allow the compiler to emit specialised calls.
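
A rough sketch of the enum approach might look like this. All names here (Encoding, Linear, Squared, EncodingKind, prepare, dispatchPrepare) are hypothetical stand-ins for your color space types and _prepareImage, so treat it as an illustration of the shape rather than a drop-in replacement.

```swift
// Hypothetical stand-ins for the color space protocol and its types.
protocol Encoding {
  func encode(_ x: Double) -> Double
}

struct Linear: Encoding {
  func encode(_ x: Double) -> Double { x }
}

struct Squared: Encoding {
  func encode(_ x: Double) -> Double { x * x }
}

// A closed set of cases stored in the configuration instead of an
// `any Encoding` value.
enum EncodingKind {
  case linear
  case squared
}

// Generic worker, playing the role of _prepareImage.
func prepare<E: Encoding>(_ encoding: E, count: Int) -> [Double] {
  (0..<count).map { encoding.encode(Double($0)) }
}

// Each branch names a concrete type, so the compiler knows the generic
// argument at every call to `prepare` and can emit a specialized call.
func dispatchPrepare(kind: EncodingKind, count: Int) -> [Double] {
  switch kind {
  case .linear:
    return prepare(Linear(), count: count)
  case .squared:
    return prepare(Squared(), count: count)
  }
}

let linearValues = dispatchPrepare(kind: .linear, count: 3)
let squaredValues = dispatchPrepare(kind: .squared, count: 3)
```

With two independent type parameters, as in _prepareImage, the switch needs one branch for each combination of types that is actually used.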

While we're here, I recommend dropping the @_specialize annotations. They aren't doing what you want done: they force the compiler to generate specialisations, but they don't force the compiler to call them. For your use-case they don't matter: the compiler will specialise when it can.

@ksluder @lukasa Thanks! Now it works as expected.