Numeric type conversion: marking functions as equivalent for generic specialisation

Swift currently has some significant performance shortcomings when dealing with conversions between integers and floating point types in generic contexts. As an example, I've been looking at optimising the following method:

@inlinable
func floatToUnormGenericA<I: BinaryInteger & FixedWidthInteger & UnsignedInteger>(_ c: Float, type: I.Type) -> I {
    if c.isNaN {
        return 0
    }
    let c = min(1.0, max(c, 0.0))
    let scale = Float(I.max)
    let rescaled = c * scale
    return I(rescaled.rounded(.toNearestOrAwayFromZero))
}

public func convertA(_ x: Float) -> UInt8 {
    return floatToUnormGenericA(x, type: UInt8.self)
}

The behaviour I want is for the generated assembly to be equivalent to the following, where the type I is made concrete:

@inlinable
func floatToUnormConcrete(_ c: Float, type: UInt8.Type) -> UInt8 {
    if c.isNaN {
        return 0
    }
    let c = min(1.0, max(c, 0.0))
    let scale = Float(UInt8.max)
    let rescaled = c * scale
    return UInt8(rescaled.rounded(.toNearestOrAwayFromZero))
}

public func convertConcrete(_ x: Float) -> UInt8 {
    return floatToUnormConcrete(x, type: UInt8.self)
}

Even with all optimisations enabled and full inlining, Swift is not able to (and not allowed to) specialise convertA to be equivalent to convertConcrete, because in the generic context, calls such as I(_: Float) are resolved to FixedWidthInteger.init<T: BinaryFloatingPoint>(_ source: T) rather than to the concrete equivalents (e.g. UInt8.init(_ source: Float)). What this means is that currently, to get good performance, we're required to manually specialise like this:

@inlinable
func floatToUnormGenericB<I: BinaryInteger & FixedWidthInteger & UnsignedInteger>(_ c: Float, type: I.Type) -> I {
    if c.isNaN {
        return 0
    }
    let c = min(1.0, max(c, 0.0))
    let scale: Float
    if I.self == UInt8.self {
        scale = Float(UInt8.max)
    } else if I.self == UInt16.self {
        scale = Float(UInt16.max)
    } else {
        scale = Float(I.max)
    }
    let rescaled = c * scale
    let rounded = rescaled.rounded(.toNearestOrAwayFromZero)
    if I.self == UInt8.self {
        return UInt8(rounded) as! I
    } else if I.self == UInt16.self {
        return UInt16(rounded) as! I
    } else {
        return I(rounded)
    }
}

When fully specialised, this produces equivalent assembly to floatToUnormConcrete, which is what we want. However, no Swift author should ever have to write this.

Semantically, what we want is the ability to mark UInt8.init(_ source: Float) as an equivalent, more-specific overload of FixedWidthInteger.init<T: BinaryFloatingPoint>(_ source: T); to my knowledge, however, there's currently no way to express this in Swift. I'm not proposing any specific syntax here; rather, I'm proposing that there be a supported way to mark methods as equivalent so that the compiler is allowed to replace one with the other when specialising in a generic context.

Beyond the example I've outlined here, this is actually a very important issue in other contexts. The performance profile of SwiftUI on macOS is dominated by calls to BinaryFloatingPoint._convert when converting between CGFloat and Float/Double; I obviously don't have access to SwiftUI's source, but I'm reasonably sure it's caused by the same problem (see the Instruments screenshot attached to this post).

The assembly for all three of the snippets above can be viewed here on Godbolt.

You can express it using type tests that dispatch out to the concrete implementations, like you had in your floatToUnormGenericB example.

Right; what I should have said is that there's no way to mark it at the per-function level rather than dispatching within the function. To put it another way: if a non-stdlib module defined its own integer type MyInt: FixedWidthInteger and a MyInt.init(_ source: Float) method, there's no way for it to declare that MyInt.init(_ source: Float) may be called instead of the generic FixedWidthInteger.init<T: BinaryFloatingPoint>(_ source: T) within the stdlib; every floatToUnorm-like method would need to manually add it as a known type.

(Yes, there are workarounds if FixedWidthInteger.init<T: BinaryFloatingPoint>(_ source: T) is a protocol requirement, where MyInt.init(_ source: T) could manually dispatch; my point is more that I don't think writing out lists of manual specialisations should be necessary).
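
For concreteness, here's a minimal sketch of that scenario – MyInt and its stored representation are invented purely for illustration, and the full conformance is elided:

// Hypothetical third-party type; a real version would implement the full set of
// FixedWidthInteger/UnsignedInteger requirements, which are elided here.
struct MyInt /* : FixedWidthInteger, UnsignedInteger */ {
    var bits: UInt8

    // The concrete, cheap conversion that we'd like the optimiser to be allowed
    // to use in place of FixedWidthInteger.init<T: BinaryFloatingPoint>(_:) when
    // specialising with I == MyInt. (Traps on NaN/out-of-range, like UInt8.init.)
    init(_ source: Float) {
        self.bits = UInt8(source)
    }
}

// Today, the only way a floatToUnorm-style function can reach this initialiser
// is yet another manual type test, e.g.:
//     if I.self == MyInt.self { return MyInt(rounded) as! I }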

I think that's a feature, not a bug. In cases where the other method is not in fact equivalent to the generic implementation, it can be really hard to track down the reason for the behavior difference if the choice between the concrete and generic implementation is implicit. Furthermore, if external libraries could change the behavior of FixedWidthInteger.init, then that means we could never reliably statically specialize the generic implementation, since there may be overriding implementations we can't see from other modules.

It’d be pretty unfortunate, and borderline ironic, if you couldn’t inline FixedWidthInteger.init<T: BinaryFloatingPoint>(_ source: T) because someone, somewhere, might want to implement a faster version of .init.

Since we’d want inlining more often than overriding, I’d say it’s a pretty decent trade-off.

I think if this feature were added, it would have to be as an option for the optimiser – the optimiser wouldn't be required to pick the most-specific overload, it would just be allowed to use it where it's visible. That doesn't resolve the subtle-behaviour-differences issue, however, so I guess in that case it's a question of tradeoffs.

To me, it seems that for certain standard library types at least, like Float and UInt8, there should be a reasonable guarantee that the methods do in fact behave equivalently and that therefore the more-specific overloads can be substituted. Whether that belongs as an underscored attribute, a public-facing attribute, or just a special case in the compiler, I'm not sure; however, I don't think requiring code like floatToUnormGenericB is the right solution – it's too easy to miss, as evidenced by the SwiftUI performance problems.

Out of curiosity (and a bit of a digression), what problem are you referring to?

I've been experiencing performance problems in a SwiftUI app I'm working on – looking at the Instruments trace shows that most of the time is spent in layout (which is something I'm able to address), but also that layout is spending most of its time in BinaryFloatingPoint._convert – see the image in the first post. Looking at the assembly around that area in Instruments, it seems like the problem is that SwiftUI uses CGFloats in places internally, needs to convert them to/from Doubles, and ends up calling the generic conversion rather than the concrete conversions since it's taking place in a generic context. This is obviously all just speculation since I don't have access to the source; however, I don't see another reason why BinaryFloatingPoint._convert would be in the trace.

This is a straightforwardly solvable problem. Either CGFloat could provide its own implementation of the generic requirement or SwiftUI could be reworked not to call the generic function, or both.
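
Roughly, a hedged sketch of what that first option could look like (purely illustrative, not the actual CoreGraphics or SwiftUI source; the converting: label is invented so the sketch doesn't collide with the existing initialisers – the real fix would need to live alongside CGFloat's BinaryFloatingPoint conformance so that generic code actually reaches it):

import CoreGraphics

extension CGFloat {
    @inlinable
    public init<Source: BinaryFloatingPoint>(converting value: Source) {
        // Fast paths for the pairs SwiftUI actually hits; the dynamic casts
        // compile away once Source is known concretely.
        if let d = value as? Double {
            self.init(d)
        } else if let f = value as? Float {
            self.init(f)
        } else {
            // Generic fallback for any other source type – this is the path
            // that goes through BinaryFloatingPoint._convert today.
            self.init(value)
        }
    }
}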

Just to add another example:

public func process(_ x: SIMD4<Double>) -> SIMD4<Double> {
    return SIMD4<Double>(x)
}

produces awful assembly due to the same issue. It may be that all of this is solvable by adjustments within the standard library – for example, manually specialising as in floatToUnormGenericB for every generic conversion method involving stdlib number types, of which there are quite a few – but I do want to emphasise that this shows up in a lot of places as-is.

Well, it’s not like we have SIMD4.init(_: Self); I don’t think that counts as a problem caused by this. Though I do wonder which one would be selected as more specific if we did have both inits :thinking:.

I don't want to get too bogged down in this, but it is actually the exact same problem as in the let scale = Float(I.max) line of floatToUnormGenericA (well, excepting that the argument here is BinaryFloatingPoint rather than FixedWidthInteger). If you look at the implementation for SIMD4.init<Other>(_: SIMD4<Other>) where Other: BinaryFloatingPoint, it calls through to the generic BinaryFloatingPoint conversion since it's not allowed to substitute in the concrete Double.init(_: Double) initialiser.
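
As a hedged illustration, the same manual workaround applied to the SIMD case would look something like this (convertLanes is an invented helper, not stdlib API):

@inlinable
public func convertLanes<Other>(_ x: SIMD4<Other>) -> SIMD4<Double>
    where Other: SIMDScalar & BinaryFloatingPoint
{
    // The dynamic cast compiles away once Other is known concretely, so the
    // Double-to-Double case becomes a no-op rather than a per-lane generic convert.
    if let same = x as? SIMD4<Double> {
        return same
    }
    // Generic lane-by-lane conversion for any other element type.
    return SIMD4<Double>(x)
}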

Is there a concrete Double.init(_: Double) initializer? I didn’t think so; there’s only the float literal initializer which is an artifact of literals being modeled using protocols. The point here is that manually marking these as specializations presupposes that the concrete implementations exist, and the issue is that in the general case they don’t.

There is nothing stopping each numeric type from implementing their own generic conversion initializers with fast paths for known conversion pairs. They can then fall back to the generic conversion routine for unsupported pairs (it was implemented by me as a standalone static function for a reason). That they don’t leaves performance on the floor for the same reason that there are other suboptimal aspects of the current implementation of numeric types: no one has gotten to it yet. It’s not a fundamental limitation of the language.

There is. Still, nothing stops the generic version from adding its own fast paths.

Neat. I forgot that we didn’t take those out.

I’ve been convinced to have another look at what’s implemented currently; there are a few fast paths that can be added easily, I’d wager.

This thread now maybe belongs more in the Development category than in Evolution, but it seems like we have a few long-term options:

  • Add explicit checks only for the most common pairs (or only where concrete-type initialisers are a protocol requirement), slightly impacting code size and affecting unspecialised performance in an as-yet unknown way. With this option, there will always be cases where the conversion takes the generic path in specialised code because no explicit check for that pair was implemented.

  • Add explicit checks for all three (per platform) BinaryFloatingPoint types to each other, all ten BinaryInteger types to each other, and between each BinaryInteger and BinaryFloatingPoint pair, for all conversion methods (init, init(exactly:), init(truncatingIfNeeded:), etc.). This might impact code size significantly, could slow down unspecialised code noticeably, and also seems like a bit of a maintenance nightmare, although I guess gyb would help.

  • Allow the optimiser to substitute in the concrete overloads in specialised contexts via an underscored attribute or compiler magic. The downside here is the potential for behaviour to diverge between unspecialised and specialised code in the case of bugs; the upsides are in maintenance, code size, and consistent performance for all conversions.

  • Teach the optimiser to recognise the generic conversion and change it back into a concrete conversion, possibly at the LLVM level.

I’d be happy with any option but the first - leaving performance on the table forever seems undesirable, particularly for code that will often occur in inner loops. The third option seems most pragmatic to me, which is why I originally suggested it. Beyond that, I don’t have any stake in how this gets solved long-term, only that it does get solved.

Both FixedWidthInteger and UnsignedInteger directly refine BinaryInteger, so I think you can drop BinaryInteger from the generic parameter's constraints. (I'm not completely sure; there are some cases when conditionally conforming to a second-level derived protocol where you must include its base protocols.)
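
For instance, this appears to compile with the same behaviour (just a sketch, with the body abbreviated from floatToUnormGenericA above):

@inlinable
func floatToUnorm<I: FixedWidthInteger & UnsignedInteger>(_ c: Float, type: I.Type) -> I {
    // The BinaryInteger constraint is implied by FixedWidthInteger (and by
    // UnsignedInteger), so it can be dropped from the signature.
    if c.isNaN {
        return 0
    }
    let clamped = min(1.0, max(c, 0.0))
    return I((clamped * Float(I.max)).rounded(.toNearestOrAwayFromZero))
}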

I've put up a PR with a GYB-based solution for this at https://github.com/apple/swift/pull/33799 (cc @xwu – hopefully I've saved you some time).

Hmm, I wonder how the numbers come out. Dynamic casting isn't exactly cheap either :thinking:.