Specialised closures in an ABI-stable library

I'm working on a feature for a resilient library that looks something like this:

@frozen
public struct Stream<Source: IteratorProtocol<Int>>: IteratorProtocol {

  public var source: Source

  @usableFromInline
  internal var internalState: InternalState

  @inlinable
  public mutating func next() -> Int? {
    internalState.next { source.next() }
  }
}

@usableFromInline
internal struct InternalState {

  @usableFromInline
  internal func next(consumeFromSource: () -> Int?) -> Int? {
    //... Algorithm not available for inlining ...
  }
}
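To make the shape above concrete, here is a minimal sketch that compiles as a single file. It assumes a hypothetical initializer `init(_:)`, and it replaces the hidden algorithm with a hypothetical stand-in that drops consecutive duplicate values. `@frozen` is left off because it only has an effect when building with library evolution enabled. A single-file build can't reproduce the real ABI boundary. In a resilient library, the body of `InternalState.next` would stay inside the library.

```swift
// Hypothetical core: the real algorithm is private to the library, so this
// stand-in simply drops consecutive duplicate values.
@usableFromInline
internal struct InternalState {
  // Last value handed back to the caller (never touched by inlinable code).
  internal var lastEmitted: Int? = nil

  @usableFromInline
  internal init() {}

  // Deliberately not @inlinable: in a resilient library this body stays
  // behind the ABI boundary, and clients only see the signature.
  @usableFromInline
  internal mutating func next(consumeFromSource: () -> Int?) -> Int? {
    // Pull from the caller's source on demand via the non-escaping closure.
    while let value = consumeFromSource() {
      if value != lastEmitted {
        lastEmitted = value
        return value
      }
    }
    return nil
  }
}

// @frozen omitted: it has no effect without library evolution.
public struct Stream<Source: IteratorProtocol<Int>>: IteratorProtocol {
  public var source: Source

  @usableFromInline
  internal var internalState: InternalState

  // Hypothetical initializer; the excerpt in the question does not show one.
  @inlinable
  public init(_ source: Source) {
    self.source = source
    self.internalState = InternalState()
  }

  // Inlinable, so a client with a concrete Source lets the optimiser
  // specialise this method and the closure it passes to the core.
  @inlinable
  public mutating func next() -> Int? {
    internalState.next { source.next() }
  }
}

// Client side: Source is concrete here (IndexingIterator<[Int]>).
var stream = Stream([1, 1, 2, 2, 2, 3, 1].makeIterator())
var output: [Int] = []
while let value = stream.next() {
  output.append(value)
}
print(output)  // [1, 2, 3, 1]
```

The closure only touches `source`, and the call only touches `internalState`. These are disjoint stored properties of `self`, so the mutation inside the non-escaping closure does not conflict with the call on the core.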

In words: we have a public generic outer type, which is completely specialisable by client modules, and a non-generic internal core, which is our ABI barrier. The implementation details of the core are not exposed at all to clients.

Clients are supposed to pass data into the core via the non-escaping closure. The idea is that, because Stream.next is @inlinable, clients can specialise the closure and hopefully avoid runtime metadata lookups.

Unfortunately, I don't think that's what I'm seeing.

BTW: the resilient library I mentioned is the standard library. Stream is Unicode.NFC<S>.Iterator, and InternalState corresponds to _NFCNormalizerState and _NFDNormalizerState. I don't think that's relevant, but it helps explain what the picture is showing: the highlighted frame is Stream<S>.next.

My reading of the Instruments output is this. Stream.next can be inlined, although I did have to force it with @inline(__always). However, that doesn't extend to closure #1 in Stream.next, which doesn't appear to be inlined or specialised at all.

As a result, the closure contains calls to swift_checkMetadataState that take up 44% of each call to next(). There are also smaller runtime functions such as swift_getAssociatedTypeWitness and swift_getAssociatedConformanceWitness, which I believe indicate a lack of specialisation.

So my questions are:

  1. Am I reading the Instruments output roughly correctly? Would it be possible to eliminate the calls to swift_checkMetadataState if the closure were specialised?

  2. Is the pattern I'm trying to implement sound, conceptually? Is there something I'm overlooking?

  3. If I understand it correctly and the pattern is sound, is there anything I can do to get the performance I'm looking for? Or is this maybe an optimiser issue?


OK, I solved it.

It turns out that one of the calling functions in the client library was generic but only @usableFromInline, not @inlinable. In other words, Source itself was an unspecialised generic:

// This was a stack of functions, but basically amounts to this:

@usableFromInline
internal func process(_ source: some IteratorProtocol<Int>) {
  var stream = Stream(source)
  while let value = stream.next() {
    // ...
  }
}

Once that function was made @inlinable, Stream<Source> could be fully specialised (as intended). It did exactly what I wanted it to do, and the performance is everything I hoped it would be. So this pattern does work 🙂
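The fix can be sketched in one file as below. Stream and its core are reduced to hypothetical pass-through stand-ins. The summing loop in `process` is hypothetical work standing in for the original `// ...`. The printed result does not show specialisation itself. That has to be confirmed in a profiler such as Instruments or by inspecting the optimised code.

```swift
// Minimal stand-ins so the sketch compiles on its own. The pass-through core
// is hypothetical; only the overall shape follows the question.
@usableFromInline
internal struct InternalState {
  @usableFromInline
  internal init() {}

  @usableFromInline
  internal func next(consumeFromSource: () -> Int?) -> Int? {
    consumeFromSource()
  }
}

public struct Stream<Source: IteratorProtocol<Int>>: IteratorProtocol {
  public var source: Source

  @usableFromInline
  internal var internalState: InternalState

  @inlinable
  public init(_ source: Source) {
    self.source = source
    self.internalState = InternalState()
  }

  @inlinable
  public mutating func next() -> Int? {
    internalState.next { source.next() }
  }
}

// Before: `@usableFromInline internal func process(...)`. Callers in other
// modules can only reach its generic entry point, so Stream<Source> is used
// with an unknown Source and needs runtime metadata.
//
// After: @inlinable lets the body be emitted into callers, where the
// argument's concrete type becomes Source and the closure can be specialised.
@inlinable
internal func process(_ source: some IteratorProtocol<Int>) -> Int {
  var stream = Stream(source)
  var total = 0  // Hypothetical work in place of the original `// ...`.
  while let value = stream.next() {
    total += value
  }
  return total
}

let total = process([1, 2, 3].makeIterator())
print(total)  // 6
```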


This catches me out pretty regularly. I wish there were a linter for this (or maybe I just haven't found one).