The Legend of the Toll-Free Swift -> ObjC Collection Bridge


(Alexis) #1

I’ve been looking a lot into how Swift and Objective-C collections inter-convert, because a bunch of necessary cleanup for ABI stability interacts with it. Unfortunately, I’ve run into some conflicting information about how things should work versus how they actually seem to work. Hoping y'all can help me clear this up.

Here’s the literal state of the Swift codebase, as I currently understand it:

* String, Array, Dictionary, and Set are all effectively tagged unions of “Native || ObjC”.
* The ObjC case is the result of lazy bridging, and is basically just storing an NSWhatever pointer.
* If any of these collections are in the ObjC state, then bridging to ObjC is obvious and trivial. (yay!)
* If any of these collections are in the Native state:

Array: If the storage is verbatim bridgeable to ObjC (~it’s a class or objc existential), just return the buffer (toll free). Otherwise, wrap the storage in a _SwiftDeferredNSArray (not toll free). The first time someone tries to access the contents of the _SwiftDeferredNSArray, a CAS-loop race occurs to create a new buffer containing all the bridged values. The only alternative to this would be bridging each element as it’s requested, but that’s ultimately just a trade-off (and has issues if people are relying on pointer equality). There has to be some toll here.

However the construction of the _SwiftDeferredNSArray is hypothetically unnecessary, as alluded to in the comments on _SDNSArray. The class and its CAS pointer could be embedded in the native array buffer. This would presumably incur some bloat on all native Arrays, but it might not be too bad (and platforms which don’t support _runtime(ObjC) presumably can omit it)? That said, I’m not 100% clear if we have the machinery to accomplish this yet or not. If anyone knows this, it would help a lot!

(there’s also some special case singleton for empty arrays, that all seems fine)
  
See:
  ContiguousArrayBuffer.swift: _asCocoaArray
  ContiguousArrayBuffer.swift: _getNonVerbatimBridgedHeapBuffer
  SwiftNativeNSArray: _SwiftDeferredNSArray (the class that wraps the storage)
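
To make the non-verbatim Array case concrete, here is a minimal sketch of the deferred-wrapper idea. It is not the standard library's code: `BoxedValue` and `DeferredBridgedArray` are hypothetical names, and an `NSLock` stands in for the atomic compare-and-swap that the real wrapper is described as using. The point is only the shape of the protocol: build a candidate buffer of bridged objects, try to publish it, and have every caller use whichever buffer won.

```swift
import Foundation

// Hypothetical stand-in for a bridged element (an object with identity).
final class BoxedValue {
    let value: Int
    init(_ value: Int) { self.value = value }
}

// Hypothetical model of a deferred wrapper: it holds the native storage and
// publishes a buffer of bridged objects at most once, on first access.
final class DeferredBridgedArray {
    private let native: [Int]
    private var bridged: [BoxedValue]?   // nil until the first element access
    private let lock = NSLock()          // assumption: emulates the atomic CAS

    init(_ native: [Int]) { self.native = native }

    private func currentBuffer() -> [BoxedValue]? {
        lock.lock()
        defer { lock.unlock() }
        return bridged
    }

    // Installs `candidate` only if nothing was published yet and returns
    // the buffer that ended up published (the "winner" of the race).
    private func publishIfEmpty(_ candidate: [BoxedValue]) -> [BoxedValue] {
        lock.lock()
        defer { lock.unlock() }
        if let existing = bridged { return existing }  // lost the race
        bridged = candidate
        return candidate
    }

    func object(at index: Int) -> BoxedValue {
        if let existing = currentBuffer() { return existing[index] }
        // Bridge every element up front, outside the lock, as a racing
        // thread would, then try to publish the result.
        let candidate = native.map(BoxedValue.init)
        return publishIfEmpty(candidate)[index]
    }
}
```

Because all elements are bridged once and the published buffer is reused, asking for the same index twice returns the same object, which is the pointer-equality property that per-request bridging would lose.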
    
Dictionary/Set: Looks to be pretty much the same thing as Array, except that the old indexing model led to a double-indirection being baked into the design, so a wrapper class doesn’t need to be constructed in the non-verbatim case.
(_NativeDictionaryStorageOwner contains the CAS-pointer). (toll free as can be!)

But this means that cleaning up all the gunk from the old indexing model and removing this indirection will lead to a regression in bridging performance *unless* Dictionary/Set gets the kind of optimizations discussed for Array. The class inlining optimization also seems more acceptable for Dictionary, since it’s necessarily a more bloated allocation than Array (less overhead in percentage terms).

See:
  HashedCollections.swift.gyb: _bridgeToObjectiveCImpl
  HashedCollections.swift.gyb: _Native${Self}StorageOwner (the “outer” class)

String: Unconditionally construct a class that wraps the String’s storage. (not toll free)
  
This just seems bad, and as far as I can tell isn’t expected. It seems to be the result of _StringBuffer (the lowest-level type in the String abstraction stack that still actually knows it’s a string) being a struct, and not a class that inherits from _HeapBuffer due to some problems with deriving from generic classes. I’m not 100% sure what the “fix” for this is supposed to be.

I think any fix will necessarily lead to String becoming pointer-sized, which appears to be a desirable ABI feature anyway.
However this has tricky consequences for Strings which are actually sub-slices of other Strings. At the limit, this will definitely require some slices which don’t allocate (because they just create a new String pointing at the old buffer with different start/length values) to start requiring an allocation (because those fields will be in a class, and not a struct). Maybe stack promotion and careful pointer-tagging can eliminate most allocations in practice.

See:
  StringBridge.swift: _stdlib_binary_bridgeToObjectiveCImpl
  StringBridge.swift: _NSContiguousString (the class that wraps the storage)
  StringBuffer.swift: _StringBuffer (the type that wants to subclass _HeapBuffer)

So that’s the situation as I understand it. Did I get anything wrong? Are there any details I’m missing?


(Dave Abrahams) #2

I’ve been looking a lot into how Swift and Objective-C collections
inter-convert, because a bunch of necessary cleanup for ABI stability
interacts with it. Unfortunately, I’ve run into some conflicting
information about how things should work versus how they actually seem
to work. Hoping y'all can help me clear this up.

Here’s the literal state of the Swift codebase, as I currently understand it:

* String, Array, Dictionary, and Set are all effectively tagged unions of “Native || ObjC”.
* The ObjC case is the result of lazy bridging, and is basically just storing an NSWhatever pointer.
* If any of these collections are in the ObjC state, then bridging to ObjC is obvious and
trivial. (yay!)
* If any of these collections are in the Native state:

Array: If the storage is verbatim bridgeable to ObjC (~it’s a class or
objc existential), just return the buffer (toll free). Otherwise, wrap
the storage in a _SwiftDeferredNSArray (not toll free). The first time
someone tries to access the contents of the _SwiftDeferredNSArray, a
CAS-loop race occurs to create a new buffer containing all the bridged
values.

Correct, except to be clear it's not a data race because of the atomics.

The only alternative to this would be bridging each element as
it’s requested, but that’s ultimately just a trade-off (and has issues
if people are relying on pointer equality). There has to be some toll
here.

However the construction of the _SwiftDeferredNSArray is
hypothetically unnecessary, as alluded to in the comments on
_SDNSArray. The class and its CAS pointer could be embedded in the
native array buffer.

That doesn't make creating it unnecessary; it only makes the separate
allocation unnecessary. Whether or not it's actually needed is a
separate question. When we first tested Swift without it there were
some bugs reported due to object identity of things bridged from value
types not being preserved. We could try again to discover who's relying
on it and see whether we've reached the point where that code can be
changed not to rely on it.

This would presumably incur some bloat on all native Arrays, but it
might not be too bad (and platforms which don’t support _runtime(ObjC)
presumably can omit it)? That said, I’m not 100% clear if we have the
machinery to accomplish this yet or not. If anyone knows this, it
would help a lot!

We do.

(there’s also some special case singleton for empty arrays, that all seems fine)

See:
  ContiguousArrayBuffer.swift: _asCocoaArray
  ContiguousArrayBuffer.swift: _getNonVerbatimBridgedHeapBuffer
  SwiftNativeNSArray: _SwiftDeferredNSArray (the class that wraps the storage)

Dictionary/Set: Looks to be pretty much the same thing as Array,
except that the old indexing model led to a double-indirection being
baked into the design, so a wrapper class doesn’t need to be
constructed in the non-verbatim case.
(_NativeDictionaryStorageOwner contains the CAS-pointer). (toll free
as can be!)

I didn't know that, but it sounds plausible.

But this means that cleaning up all the gunk from the old indexing
model and removing this indirection will lead to a regression in
bridging performance *unless* Dictionary/Set gets the kind of
optimizations discussed for Array.

I wouldn't worry about this particular cost. Bridging values into
Objective-C is already very expensive unless none of the values are
actually used, which is rare to say the least.

The class inlining optimization also seems more acceptable for
Dictionary, since it’s necessarily a more bloated allocation than
Array (less overhead in percentage terms).

Yes. And given the scale of expense incurred the first time an element
is requested from ObjC, it probably doesn't make sense for Dictionary
either. Embedding the NS class in the Swift data structure's buffer
can also keep some memory alive longer than necessary. I don't think we
should do this optimization.

See:
  HashedCollections.swift.gyb: _bridgeToObjectiveCImpl
  HashedCollections.swift.gyb: _Native${Self}StorageOwner (the “outer” class)

String: Unconditionally construct a class that wraps the String’s storage. (not toll free)

This just seems bad, and as far as I can tell isn’t expected.

As I told you, it isn't.

It seems to be the result of _StringBuffer (the lowest-level type in
the String abstraction stack that still actually knows it’s a string)
being a struct, and not a class that inherits from _HeapBuffer

? _HeapBuffer is a struct; you can't inherit from it. You must mean
_HeapBufferStorage. But there's an underlying _HeapBufferStorage in
_StringBuffer, and this class can be made a subclass of NSString.

due to some problems with deriving from generic classes. I’m not 100%
sure what the “fix” for this is supposed to be.

I think any fix will necessarily lead to String becoming
pointer-sized, which appears to be a desirable ABI feature anyway.

No, you don't have to do that, but yes, we do want that feature.

The only Strings that can bridge “toll-free” are the ones currently
backed by NSString and the ones that occupy *all* of their backing
native buffer (rather than being sliced from that buffer).
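
A small sketch of that rule, using a toy model rather than the real String internals. All names here (`NativeBuffer`, `CocoaString`, `StringStorage`) are hypothetical, and the model assumes the native buffer object could itself be handed to ObjC, which is the premise of making the underlying storage class an NSString subclass.

```swift
// Hypothetical model; none of these are the standard library's real types.
final class NativeBuffer {
    let codeUnits: [UInt16]
    init(_ text: String) { codeUnits = Array(text.utf16) }
}

// Stand-in for a reference to an existing NSString.
final class CocoaString {
    let contents: String
    init(_ contents: String) { self.contents = contents }
}

// The tagged union: either an ObjC-side object, or a view onto a native buffer.
enum StringStorage {
    case cocoa(CocoaString)
    case native(buffer: NativeBuffer, range: Range<Int>)

    // True when bridging could return an existing object instead of
    // allocating a wrapper.
    var canBridgeWithoutAllocating: Bool {
        switch self {
        case .cocoa:
            // Already backed by an ObjC object: hand it back.
            return true
        case .native(let buffer, let range):
            // Only a string covering its whole buffer can reuse the buffer
            // object; a slice needs something to carry start and length.
            return range == 0..<buffer.codeUnits.count
        }
    }
}

let buffer = NativeBuffer("hello world")
let whole = StringStorage.native(buffer: buffer, range: 0..<buffer.codeUnits.count)
let slice = StringStorage.native(buffer: buffer, range: 0..<5)
let bridgedIn = StringStorage.cocoa(CocoaString("from ObjC"))
```

In this model `whole` and `bridgedIn` qualify and `slice` does not, even though `slice` shares the same buffer.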

However this has tricky consequences for Strings which are actually
sub-slices of other Strings. At the limit, this will definitely
require some slices which don’t allocate (because they just create a
new String pointing at the old buffer with different start/length
values) to start requiring an allocation (because those fields will be
in a class, and not a struct).

Yes, that's expected.

Maybe stack promotion and careful pointer-tagging can eliminate most
allocations in practice.

See:
  StringBridge.swift: _stdlib_binary_bridgeToObjectiveCImpl
  StringBridge.swift: _NSContiguousString (the class that wraps the storage)
  StringBuffer.swift: _StringBuffer (the type that wants to subclass _HeapBuffer)

So that’s the situation as I understand it. Did I get anything wrong? Are there any details I’m
missing?

You can give me a call to discuss details if you like :)

on Fri Oct 07 2016, Alexis <swift-dev-AT-swift.org> wrote:

--
-Dave