I'm working on a feature for a resilient library that looks something like this:
```swift
@frozen
public struct Stream<Source: IteratorProtocol<Int>>: IteratorProtocol {
  public var source: Source

  @usableFromInline
  internal var internalState: InternalState

  @inlinable
  public mutating func next() -> Int? {
    internalState.next { source.next() }
  }
}

@usableFromInline
internal struct InternalState {
  @usableFromInline
  internal mutating func next(consumeFromSource: () -> Int?) -> Int? {
    // ... Algorithm not available for inlining ...
  }
}
```
In words: we have a public generic outer type, which is completely specialisable by client modules, and a non-generic internal core, which is our ABI barrier. The implementation details of the core are not exposed at all to clients.
Clients are supposed to pass data into the core via the non-escaping closure. The idea is that, because that part is `@inlinable`, they can specialise the closure and hopefully avoid runtime metadata lookups.
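Concretely, the client-side code I have in mind looks something like this (the module name and the `Stream` initialiser are hypothetical; they aren't part of the sketch above):

```swift
import ResilientLibrary  // hypothetical module name

func sumAll(_ values: [Int]) -> Int {
  // Source == IndexingIterator<[Int]> here, so the @inlinable next()
  // should specialise in this module, and ideally the { source.next() }
  // closure should specialise along with it.
  var stream = Stream(source: values.makeIterator())  // hypothetical init
  var total = 0
  while let value = stream.next() {
    total += value
  }
  return total
}
```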
Unfortunately, I don't think that's what I'm seeing:

[Instruments screenshot]
BTW: the resilient library I mentioned before is the standard library, `Stream` is `Unicode.NFC<S>.Iterator`, and `InternalState` is `_NFCNormalizerState` and `_NFDNormalizerState`. I don't think that's relevant, but it helps explain what the picture is showing: the highlighted frame is `Stream<S>.next`.
What I think the Instruments output is telling me is that, although the `Stream.next` function can be inlined (I did have to force it using `@inline(__always)`), that doesn't seem to include `closure #1 in Stream.next()`, which doesn't appear to be inlined or specialised at all. As a result, the closure contains calls to `swift_checkMetadataState` taking up 44% of each call to `next()`, together with smaller functions such as `swift_getAssociatedTypeWitness` and `swift_getAssociatedConformanceWitness`, which I believe indicate a lack of specialisation.
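For what it's worth, my mental model of what's actually running is something like the following fully unspecialised shape (my interpretation; I haven't verified this against the emitted SIL):

```swift
// The kind of unspecialised generic code I believe I'm paying for: the
// call goes through Source's witness table, and (as I understand it)
// resolving the associated requirements at runtime is what shows up as
// swift_getAssociatedTypeWitness / swift_getAssociatedConformanceWitness
// in the trace.
func unspecialised<Source: IteratorProtocol<Int>>(_ source: inout Source) -> Int? {
  source.next()  // dynamic dispatch through the witness table
}
```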
So my questions are:
- Am I reading the Instruments output ~correctly? Would it be possible to eliminate the calls to `swift_checkMetadataState` if the closure were specialised?
- Is the pattern I'm trying to implement sound, conceptually? Is there something I'm overlooking?
- If I understand it correctly and the pattern is sound, is it possible for me to do anything to get the performance I'm looking for? Or is this maybe an optimiser issue? (One restructuring I've been considering is sketched below.)
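On the third question: the restructuring I've been considering would eliminate the closure entirely by inverting the control flow, so the client-specialised `@inlinable` body pulls from the generic source directly and feeds elements across the ABI barrier through plain non-generic calls. A minimal sketch, where `takeOutput`/`feed`/`flush` are hypothetical names and the passthrough buffer is only a stand-in for the real, non-inlinable algorithm:

```swift
@usableFromInline
internal struct InternalState {
  internal var buffer: [Int] = []  // passthrough stand-in for the real algorithm

  @usableFromInline
  internal mutating func takeOutput() -> Int? {
    buffer.isEmpty ? nil : buffer.removeFirst()
  }

  @usableFromInline
  internal mutating func feed(_ input: Int) {
    buffer.append(input)
  }

  @usableFromInline
  internal mutating func flush() -> Int? {
    takeOutput()
  }
}

@frozen
public struct Stream<Source: IteratorProtocol<Int>>: IteratorProtocol {
  public var source: Source

  @usableFromInline
  internal var internalState: InternalState

  @inlinable
  public mutating func next() -> Int? {
    while true {
      // Drain any output the core has already produced (non-generic call).
      if let output = internalState.takeOutput() { return output }
      // The only generic call, source.next(), now happens directly in the
      // client-specialised body and never crosses the ABI barrier.
      guard let input = source.next() else { return internalState.flush() }
      internalState.feed(input)
    }
  }
}
```

The trade-off is a couple of non-generic calls per element instead of one closure invocation, so this only wins if the closure really is what's blocking specialisation; I haven't measured it yet.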