This is the use case in the source mentioned above: the "fillCount: ManagedAtomic" variable is getting retained/released on two different threads.
Good to know!
Hmm, in numerous other examples I've seen a stronger ordering used in this position; is relaxed OK here?
I can also see "SWIFT_RT_TRACK_INVOCATION", which raises a question: is what you are saying also true under Xcode debug builds, with or without various diagnostics enabled?
What's the "side table" business, by the way; is it still used? I remember in the past the retain count was a byte quantity with the remainder stored in a side table, but I thought that's no longer the case.
Retain/release aside, I can see quite a few paths leading to malloc or a lock in fillCount.load / fillCount.store when fillCount is typed as a ManagedAtomic:
stack traces below (transcribed from screenshots):

swift::MetadataAllocator::Allocate(unsigned long, unsigned long)
0 swift::_getWitnessTable(swift::TargetProtocolConformanceDescriptor<swi...
1 lazy protocol witness table accessor for type Int32 and conformance Int32
2 RingBuffer.push(_:)
3 test()
4 main

bl 0x1bfe8d4a8 ; symbol stub for: malloc
0 swift::Demangle::__runtime::Node::addChild(swift::De...
1 swift::Demangle::__runtime::Demangler::demangleProt...
2 swift::Demangle::__runtime::Demangler::demangleSpe...
3 swift::Demangle::__runtime::Demangler::demangleType...
4 swift_getTypeByMangledNameImpl(swift::MetadataRe...
5 swift_getTypeByMangledName
6 swift_getTypeByMangledNameInContext
7 _swift_instantiateConcreteTypeFromMangledName
8 _getUnsafePointerToStoredProperties(_:) [inlined]
9 ManagedAtomic._ptr.getter
10 specialized RingBuffer.push(_:)
11 test()
12 main

libswiftCore.dylib`swift::Demangle::__runtime::Demangler::demangleProtocolList:
bl malloc again
0 swift::Demangle::__runtime::Demangler::demangleProt...
1 swift::Demangle::__runtime::Demangler::demangleSpec...
2 swift::Demangle::__runtime::Demangler::demangleType...
3 swift_getTypeByMangledNameImpl(swift::MetadataRe...
4 swift_getTypeByMangledName
5 swift_getTypeByMangledNameInContext
6 _swift_instantiateConcreteTypeFromMangledName
7 _getUnsafePointerToStoredProperties(_:) [inlined]
8 ManagedAtomic._ptr.getter
9 specialized RingBuffer.push(_:)
10 test()
11 main

malloc
0 swift_getForeignTypeMetadata
1 type metadata accessor for _sa_DoubleWord
2 _getUnsafePointerToStoredProperties(_:) [inlined]
3 ManagedAtomic._ptr.getter
4 specialized RingBuffer.push(_:)
5 test()
6 main

swift::MutexPlatformHelper::lock(os_unfair_lock_s&)
0 swift_getForeignTypeMetadata
1 type metadata accessor for _sa_DoubleWord
2 _getUnsafePointerToStoredProperties(_:) [inlined]
3 ManagedAtomic._ptr.getter
4 specialized RingBuffer.push(_:)
5 test()
6 main
OTOH, when I type fillCount as UnsafeAtomic, the generated assembly is much shorter and cleaner.
Hmm. I think there could in fact be a dependency that requires a stronger ordering.
Thread A holds the sole strong reference to object O, which has no weak references. Thread A passes its reference to Thread B via Unmanaged<T>.passUnretained(_:), and then blocks on a semaphore. Thread B calls takeRetainedValue() on the Unmanaged<T> it received from Thread A, then signals the semaphore. Then Thread A releases its reference.
Thread A's view of the object's retain count has a dependency on Thread B's increment of the retain count. But it doesn't have a dependency on the specific load that Thread A performed as part of the compare_exchange_weak(). So I'm a little concerned that if Threads A and B are running on different cores, they might actually see inconsistent views of the refcount (e.g. both cores load the refcount from non-shared L1 cache).
@Andrew_Trick, is this actually a concern? If so, the two-ordering version of compare_exchange_weak could be used, specifying release for the successful ordering and relaxed for the failure ordering.
Hard to say if this would be an issue. The case I've seen refcount contention matter a lot in, there were 10 or so threads contending on one global.
Instruments will replace the implementation of refcounting functions when it's tracking retain/release history. Normal debugging doesn't do this though.
Side tables are used for weak references, and if the inline refcount overflows.
Signaling a semaphore synchronizes with any waiting thread. And you obviously can't do what you're describing without a semaphore, because otherwise you've just got a retain racing with the final release.
(Actually you can't do what you describe at all because passUnretained and takeRetainedValue are not a balanced pair; that is a net over-release. My analysis above assumes you meant takeUnretainedValue.)
Fair. Imagine instead the application uses a hand-rolled spinlock. There's no requirement that the spinlock emit a full memory barrier to correctly synchronize the work its threads perform. Doesn't the Swift runtime need to use release semantics to guard against that?
No. If the retain doesn't happen before the release, then they can race, and the retain can lose. It's just fundamentally not a salvageable situation.
Also, spin locks are required to synchronize (to establish well-ordering under happens-before) or else they're not really locks in any sense and cannot be used in lock-like patterns where they protect access to other memory. They don't have to barrier more strongly than that (on architectures with stronger relations than happens-before), but they do have to do that. Just because something spins doesn't make it a spin lock; most lock-free algorithms involve some potential for spinning.
I think we're referring to two separate problems. This sounds like a description of resurrection, which can happen even if all atomic accesses are well-ordered. I'm asking about decoherence, where two threads disagree on the temporal ordering of changes to a memory location despite exclusively accessing that location through atomic operations.
I can see where you're coming from, but if the program correctly orders all of its own atomic memory accesses and still manages to corrupt an object's refcount because a runtime it has no insight into chose to use a relaxed memory ordering, isn't that the runtime's fault?
I have never agreed with the philosophy that libraries have a responsibility to "just work" for any arbitrary sequence of operations. To do useful work performantly, libraries must be allowed to have preconditions on their correct use. This is surely even more true of low-level libraries such as language runtimes which expect their primary clients to be something like compiler output which can be trusted to follow high-level rules. Otherwise you end up with libraries that do tons of repetitive internal locking just to protect against use patterns that don't really make sense to use in practice.
"No resurrection" is a precondition, just one without a precise statement. If I formalize "no resurrection" in a concurrent environment, I must end up with something like "the end of all retains must happen before the start of the final release." It is easy for the compiler to ensure that in generated code, and it is reasonable to expect unsafe code to also live up to it. That is especially true because it is hard to come up with any example where it can be violated without introducing deeper problems.
Refcount retains are unordered. So are refcount releases... a refcount release has a "release" barrier because it synchronizes with the deinitialization path, which has an "acquire" barrier.
Operations to any memory location on the same thread happen before a release barrier and after an acquire barrier. So, if the OS orders signal->resume, then all the operations are sequential.
The thing I can't get past is that we're talking about a well-behaved program that does guarantee that the call to swift_retain happens-before the call to swift_release. The source of the problem is that swift_retain doesn't guarantee that its compare/exchange happens-before the load in swift_release, because it uses relaxed semantics instead of acq_rel and release.
Good to know. So when someone calls an unbalanced retain one billion times, at that point an entry in the side table is created (I wonder if retain count overflows happen in practice at all? But I guess yes, otherwise we wouldn't have this code path in the library in the first place, right?).
And when the RC drops back below one billion, is that side-table entry deleted, or does it stay in place?
And for @objc classes it's different, right?
This raises a question: are there no problems with two or more threads competing with each other, doing retains on the same object, when its RC is near one billion and the transition from inline to side table is about to happen? Just checking.
Not handling it correctly is potentially a security issue, if an attacker can cause an unbounded number of retains. Another possible solution is to use saturating arithmetic for refcounting.
It's a one way transition. The refcount word in its entirety is replaced with a pointer to the side table.
Anything* inheriting from NSObject will use its retain/release implementation, yeah.
*with the exception of the backing classes for String/Array/etc., which are special and jump through hoops to use Swift refcounting while being subclasses of Foundation types at runtime.
Well, hopefully the implementation detects overflow from another thread and compensates appropriately, given that it's a one-way transition. But I'm not the runtime engineer you're looking for.
We may someday do this for normal Swift classes that inherit from NSObject as well. We added a root class for it as part of the Concurrency project in order to allow actors to inherit from NSObject, and then we haven't done much with that.
I think @tera might be concerned that if the two threads are using unordered atomics, they could both independently think they won the race to create the side table.