Realtime threads with Swift

swift_retain and swift_release should generally be realtime-safe as long as

  • the final release isn't done in a realtime context (since obviously it calls free)
  • no weak references are involved (they allocate on initial use)
  • the object is not being retained and released from multiple threads (contending on the cache line containing the refcount can be surprisingly slow)
  • the reference count doesn't overflow (though you probably have other serious problems if that's happening)

swift_retain is just:

  SWIFT_ALWAYS_INLINE
  HeapObject *increment(uint32_t inc = 1) {
    auto oldbits = refCounts.load(SWIFT_MEMORY_ORDER_CONSUME);

    RefCountBits newbits;
    do {
      newbits = oldbits;
      bool fast = newbits.incrementStrongExtraRefCount(inc); // just an addition
      if (SWIFT_UNLIKELY(!fast)) {
        // handling for unusual cases elided
      }
    } while (!refCounts.compare_exchange_weak(oldbits, newbits,
                                              std::memory_order_relaxed));
    return getHeapObject(); // just a subtraction
  }

so just a consume-ordered load and a relaxed-ordered compare-and-swap, which will only loop if there's contention.

swift_release does a little more, but it's still just:

[elided]
    auto oldbits = refCounts.load(SWIFT_MEMORY_ORDER_CONSUME);
[elided]
    while (!refCounts.compare_exchange_weak(oldbits, newbits,
                                            std::memory_order_release,
                                            std::memory_order_relaxed));

This is the use case in the mentioned source: the `fillCount: ManagedAtomic` variable is being retained/released on two different threads.


Good to know!

Hmm, in numerous other examples I've seen a stronger ordering used in this position; is relaxed OK here?

I can also see `SWIFT_RT_TRACK_INVOCATION`, which raises a question: is what you are saying also true under Xcode debug builds, with or without various diagnostics enabled?

What's the "side table" business, btw; is it still used? I remember that in the past the retain count was a byte quantity with the remainder stored in a side table, but I thought that's no longer the case.

Retain/release aside, I can see quite a few things leading to malloc or a lock in `fillCount.load` / `fillCount.store` when `fillCount` is typed as `ManagedAtomic` (the stack traces below were OCR'd from screenshots):

swift::MetadataAllocator::Allocate(unsigned long, unsigned long)
0 swift::_getWitnessTable(swift::TargetProtocolConformanceDescriptor<swi...
1 lazy protocol witness table accessor for type Int32 and conformance Int32
2 RingBuffer.push(_:)
3 test()
4 main

bl 0x1bfe8d4a8 ; symbol stub for: malloc
0 swift::Demangle::__runtime::Node::addChild(swift::De...
1 swift::Demangle::__runtime::Demangler::demangleProt...
2 swift::Demangle::__runtime::Demangler::demangleSpe...
3 swift::Demangle::__runtime::Demangler::demangleType...
4 swift_getTypeByMangledNameImpl(swift::MetadataRe...
5 swift_getTypeByMangledName
6 swift_getTypeByMangledNameInContext
7 _swift_instantiateConcreteTypeFromMangledName
8 _getUnsafePointerToStoredProperties(_:) [inlined]
9 ManagedAtomic._ptr.getter
10 specialized RingBuffer.push(_:)
11 test()
12 main

libswiftCore.dylib`swift::Demangle::__runtime::Demangler::demangleProtocolList:
bl malloc again
0 swift::Demangle::__runtime::Demangler::demangleProt...
1 swift::Demangle::__runtime::Demangler::demangleSpec...
2 swift::Demangle::__runtime::Demangler::demangleType...
3 swift_getTypeByMangledNameImpl(swift::MetadataRe...
4 swift_getTypeByMangledName
5 swift_getTypeByMangledNameInContext
6 _swift_instantiateConcreteTypeFromMangledName
7 _getUnsafePointerToStoredProperties(_:) [inlined]
8 ManagedAtomic._ptr.getter
9 specialized RingBuffer.push(_:)
10 test()
11 main

malloc
0 swift_getForeignTypeMetadata
1 type metadata accessor for _sa_DoubleWord
2 _getUnsafePointerToStoredProperties(_:) [inlined]
3 ManagedAtomic._ptr.getter
4 specialized RingBuffer.push(_:)
5 test()
6 main

swift::...MutexPlatformHelper::lock(os_unfair_lock_s&)
0 swift_getForeignTypeMetadata
1 type metadata accessor for _sa_DoubleWord
2 _getUnsafePointerToStoredProperties(_:) [inlined]
3 ManagedAtomic._ptr.getter
4 specialized RingBuffer.push(_:)
5 test()
6 main

OTOH, when I type fillCount as UnsafeAtomic the asm is much shorter / cleaner and nicer :slight_smile:

Hmm. I think there could in fact be a dependency that requires a stronger ordering.

Thread A holds the sole strong reference to object O, which has no weak references. Thread A passes its reference to Thread B via Unmanaged<T>.passUnretained(_:), and then blocks on a semaphore. Thread B calls takeRetainedValue() on the Unmanaged<T> it got from Thread A, then signals the semaphore. Then Thread A releases its reference.

Thread A’s view of the object’s retain count has a dependency on Thread B’s increment of the retain count. But it doesn’t have a dependency on the specific load that Thread A performed as part of the compare_exchange_weak(). So I’m a little concerned that if Threads A and B are running on different cores, they might actually see inconsistent views of the refcount (e.g. both cores load the refcount from non-shared L1 cache).

@Andrew_Trick, is this actually a concern? If so, the two-ordering version of compare_exchange_weak could be used, specifying release for the successful ordering and relaxed for the failure ordering.

Hard to say if this would be an issue. The case I've seen refcount contention matter a lot in, there were 10 or so threads contending on one global.

Instruments will replace the implementation of refcounting functions when it's tracking retain/release history. Normal debugging doesn't do this though.

Side tables are used for weak references, and if the inline refcount overflows.

Signaling a semaphore synchronizes with any waiting thread. And you obviously can't do what you're describing without a semaphore, because otherwise you've just got a retain racing with the final release.

(Actually you can't do what you describe at all because passUnretained and takeRetainedValue are not a balanced pair; that is a net over-release. My analysis above assumes you meant takeUnretainedValue.)


When does it overflow? At 2^8? 2^16? 2^32? 2^64?

Edit: found this old post, is this layout used now? So it is enough to retain an object 256 times when ref count overflows?

Fair. Imagine instead the application uses a hand-rolled spinlock. There’s no requirement that the spinlock emit a full memory barrier to correctly synchronize the work its threads perform. Doesn’t the Swift runtime need to use release semantics to guard against that?

No. If the retain doesn't happen before the release, then they can race, and the retain can lose. It's just fundamentally not a salvageable situation.

Also, spin locks are required to synchronize — to establish well-ordering under happens before — or else they're not really locks in any sense and cannot be used in lock-like patterns where they protect access to other memory. They don't have to barrier more strongly than that (on architectures with stronger relations than happens before), but they do have to do that. Just because something spins doesn't make it a spin lock; most lock-free algorithms involve some potential for spinning.


I think we’re referring to two separate problems. This sounds like a description of resurrection, which can happen even if all atomic accesses are well-ordered. I’m asking about decoherence, where two threads disagree on the temporal ordering of changes to a memory location despite exclusively accessing that location through atomic operations.

I can see where you’re coming from, but if the program correctly orders all of its own atomic memory accesses and still manages to corrupt an object’s refcount because a runtime it has no insight into chose to use a relaxed memory ordering, isn’t that the runtime’s fault?

I have never agreed with the philosophy that libraries have a responsibility to “just work” for any arbitrary sequence of operations. To do useful work performantly, libraries must be allowed to have preconditions on their correct use. This is surely even more true of low-level libraries such as language runtimes which expect their primary clients to be something like compiler output which can be trusted to follow high-level rules. Otherwise you end up with libraries that do tons of repetitive internal locking just to protect against use patterns that don’t really make sense to use in practice.

“No resurrection” is a precondition, just one without a precise statement. If I formalize “no resurrection” in a concurrent environment, I must end up with something like “the end of all retains must happen before the start of the final release.” It is easy for the compiler to ensure that in generated code, and it is reasonable to expect unsafe code to also live up to it. That is especially true because it is hard to come up with any example where it can be violated without introducing deeper problems.


Refcount retains are unordered. Refcount releases nearly are too: a refcount release has a "release" barrier because it synchronizes with the deinitialization path, which has an "acquire" barrier.

In your example

  TA:
    retain ref
    semaphore block
    semaphore resume (acquire barrier)
    release ref

  TB:
    retain ref
    semaphore signal (release barrier)

Operations to any memory location on the same thread happen before a release barrier and after an acquire barrier. So, if the OS orders signal->resume, then all the operations are sequential.

The thing I can’t get past is that we’re talking about a well-behaved program that does guarantee that the call to swift_retain happens-before the call to swift_release. The source of the problem is that swift_retain doesn’t guarantee that its compare/exchange happens-before the load in swift_release, because it uses relaxed semantics instead of acq_rel and release.

Relaxed memory accesses are still transitively ordered by other synchronization. (As Andy says just above.)

Thanks; it seems I was confusing consume semantics with acquire/release.

No, that's an ObjC isa pointer, not a Swift refcount.

  static const size_t StrongExtraRefCountBitCount = 30;

So 2^30


Good to know. So when someone calls an unbalanced retain one billion times, at that point an entry in the side table is created. (I wonder whether retain-count overflows happen in practice at all? But I guess yes; otherwise we wouldn't have this code path in the library in the first place, right?)

And when the RC drops back below one billion, is that side-table entry deleted, or does it stay in place?
And for @objc classes it's different, right?

This raises a question: are there any problems with two or more threads racing to retain the same object when its RC is near one billion and the transition from inline to side table is about to happen? Just checking.

Not handling it correctly is potentially a security issue, if an attacker can cause an unbounded number of retains. Another possible solution is to use saturating arithmetic for refcounting.

It's a one way transition. The refcount word in its entirety is replaced with a pointer to the side table.

Anything* inheriting from NSObject will use its retain/release implementation, yeah.

*with the exception of the backing classes for String/Array/etc… which are special and jump through hoops to use Swift refcounting while being subclasses of Foundation types at runtime.


Well, hopefully the implementation detects overflow from another thread and compensates appropriately--given that it's a one-way transition. But I'm not the runtime engineer you're looking for :slight_smile:

We may someday do this for normal Swift classes that inherit from NSObject as well. We added a root class for it as part of the Concurrency project in order to allow actors to inherit from NSObject, and then we haven't done much with that.


I think @tera might be concerned that if the two threads are using unordered atomics, they could both independently think they won the race to create the side table.