Performance of generic protocol extension (30x slower, Swift 4.2)

rsebbe · December 5, 2018, 9:41am

Hi.

I need a number of 2-dimensional data types that can be iterated on. For those, I thought of having a 2-dimensional "QuadraticSequence" protocol, joining 2 1-dimensional "Sequence" protocol:

public protocol QuadraticSequence: Sequence {
    associatedtype LinearSequence: Sequence
    var sequences: (LinearSequence, LinearSequence) { get }
    static func combineDimensions(_: LinearSequence.Element, _: LinearSequence.Element) -> Element
}

then implementing makeIterator() as a protocol extension of QuadraticSequence,
then add compliance to QuadraticSequence to 2-dimensional types, such as this 2-d index range:

public struct NQuadraticIndexRange{
    public let rows, columns: Int
}
extension NQuadraticIndexRange: QuadraticSequence {
    public typealias Element = (Int, Int)
    public typealias LinearSequence = Range<Int>
    public typealias Iterator = QuadraticIterator<NQuadraticIndexRange>
    public var sequences: (Range<Int>, Range<Int>) { return (0..<rows, 0..<columns) }
}

It all goes well and I can add quadratic iteration to all my quadratic types very easily.

Howerer, when I tested the execution speed of this versus making specific iterators, I see that it is 30x slower with the generic protocol implementation (optimized compiler settings). I suspect dynamic dispatch of some sort in the inner loop of the iterator. It seems strange, as I would have thought that the compiler has all necessary data to inline everything and be as fast as direct code. (added iterator below for reference)

At this time I'm reverting to direct implementation of Sequence for the quadratic types, but I'd be interested to know what is making it that slow, and if there are solutions to have an elegant, single generic implementation instead of multiple ones.

Thank you.

public struct QuadraticIterator<Sequence: QuadraticSequence>: IteratorProtocol {
    let sequence: Sequence
    var iterators: (Sequence.LinearSequence.Iterator, Sequence.LinearSequence.Iterator)
    var current: (Sequence.LinearSequence.Element, Sequence.LinearSequence.Element)
    var done: Bool

    init(sequence s: Sequence) {
            sequence = s
            iterators = (sequence.sequences.0.makeIterator(), sequence.sequences.1.makeIterator())
	
            if let i0 = iterators.0.next(), let i1 = iterators.1.next() {
                    done = false
                    current.0 = i0
                    current.1 = i1
            } else {
                    done = true
                    current.0 = Sequence._defaultLinearSequenceElement()
                    current.1 = Sequence._defaultLinearSequenceElement()
            }
    }

    public mutating func next() -> Sequence.Element? {
        guard !done else { return nil }
            let loc = Sequence.combineDimensions(current.0, current.1)
            if let i1 = iterators.1.next() {
		current.1 = i1
            } else {
		if let i0 = iterators.0.next() {
			current.0 = i0
			iterators.1 = sequence.sequences.1.makeIterator()
			current.1 = iterators.1.next()!
		} else {
			done = true
		}
            }
	
            return loc
    }

SDGGiesbrecht · December 5, 2018, 11:52am

Generics are fast when the compiler can specialize them, but suffer from speed issues when the compiler cannot see what it needs in order to specialize them and thus has to pass around protocol witness tables instead.

There are two things which could be unnecessarily preventing the compiler from specializing your methods:

If the protocol is in a separate module from the type that conforms to it, then when the type is compiled, it does not have access to the implementation to specialize it and has to call it generically. Fix this by adding @inlinable to the protocol extension methods to make their implementations visible to other modules. (@inlinable has repercussions for module stability, but I doubt those apply here.) You can read about it in the attribute list of The Swift Programming Language and find more detail in its evolution proposal.
If the protocol is in the same module but a separate file from the type that conforms to it, make sure that whole module optimization has not been disabled. Here is an older blog post about it.

rsebbe · December 5, 2018, 12:01pm

All that code resides in the same Swift file.

Also, when you mention this as "generics" optimizations, does that also apply to protocol with associated types or just standard generics? This code uses both.

nuclearace · December 5, 2018, 12:10pm

Both of them can suffer performance wise. But generally code that is using generics has a higher probability of being aggressively optimized. Using protocols as existentials generally means the compiler can't do as much aggressive specialization/optimization, and so has to go through the witness tables (i.e. indirection.)

SDGGiesbrecht · December 5, 2018, 12:46pm

Then none of this has anything to do with the problem. Have you ran it through instruments to find the bottleneck?

I will still answer your second question though:

All of your posted functions/methods are generic. When I say “generics”, I mean anything that can be used in a separate specialized form (the compiler makes a special version of it for each particular set of parameter types used and prefers to call these separately optimized versions instead). Generic declarations can be identified easily because they have or are related to something with generic parameters (<...>) , a where clause, or associated types. Here are several examples:

func genericFunction<T>(_ parameter: T) {}

struct GenericStruct<T> {
    func simpleFunction() {} // Still generic, since `Self` already involves generics.
}

extension Protocol {
    func extensionMethod() {} // This behaves as generic (specialized where possible), because it is essentially an abbreviated form of the following:
}
func refactoredExtensionMethod<T>(_ instance: T) where T : Protocol {}

Existentials use (i.e. let x: MyProtocol = SomeConformingType()) is never specialized; it would be impossible. But this plays no role here, because the posted protocol cannot be used existentially anyway. The associated values force it to only be used generically.

nuclearace · December 5, 2018, 12:54pm

Right, I was speaking broadly about

func someFunc(_ x: SomeProtocol) { }

vs

func someFunc<T: SomeProtocol>(_ x: T) { }

Another question I have though, how exactly are testing this slow down? Sometimes the way you write your test code can affect the results. One common pitfall seems to involve writing performance testing code at the top-level main.swift file. Instances at a global scope are harder for the compiler to reason about access to.

Slava_Pestov · December 5, 2018, 6:29pm

Looking at the output of swiftc myfile.swift -emit-sil -O might shed some light into where the compiler is not able to optimize away runtime generics or dynamic dispatch (it won't tell you why, though). An overview of how to read SIL output is beyond the scope of this email, but if you file a bug one of us can take a look later.

rsebbe · December 5, 2018, 7:33pm

I use unit test compiled in Release for speed measurement:

	func testPerformance1Loop() {
      self.measure {
		// Put the code you want to measure the time of here.
		let indices = NQuadraticIndexRange(rows: 1000, columns: 1000)
		var sum = 0
		for i in indices {
			sum += i.0+i.1
		}
		print("sum= \(sum)")
	}
}

When using the protocol based iterator, I get a mean running time of 0.3s, when using a direct implementation, I get 0.01s. Direct range loop with 2 indices (i,j) in Swift is a bit better, like 0.006s, and C code give the best, at 0.002s.

rsebbe · December 5, 2018, 8:05pm

I extracted the code and ran it through the compiler with the flags. Do you mean I should file a radar with Apple's bug reporter / Devtools? Thx.

Slava_Pestov · December 5, 2018, 8:07pm

Filing a bug at bugs.swift.org would be best.

rsebbe · December 5, 2018, 8:24pm

Just did it: [SR-9416] Performance problem with generic protocol · Issue #51881 · apple/swift · GitHub

Thank you.