Hi, I'm trying to serialize some work on queues and I'm getting a crash with this error. I'm not sure what it means, and I'm not sure where the crash site actually is.
Can someone help me understand what this is? Thanks!
2023-05-07 12:40:35.4180 - 10752148672 - AppleCameraDataDelegate: DEINIT
Object 0x104f2f3a0 deallocated with retain count 2, reference may have escaped from deinit.
2023-05-07 12:40:35.418718-0400 booth[29128:2090318] Object 0x104f2f3a0 deallocated with retain count 2, reference may have escaped from deinit.
This error was added in this PR, which provides a bit more explanation. The issue is that in deinit we want to be able to call methods on self, but it's also possible for methods on self to form new strong references to self which outlive the duration of the deinit. This is what's being referred to as a reference having 'escaped.'
Such an escape is illegal because once deinit finishes the memory for the class instance is deallocated, and prior to the linked PR it would be possible for that memory to be accessed through the persisting strong reference, which is a memory safety issue. The current behavior will double check that before we deallocate a class instance, no additional references to the object we are deallocating still exist. Since deinit will only be called when the last reference to an object is released, in such a circumstance we know that something in the deinit has caused self to escape.
As for where the crash is happening, I can't be sure, but it looks like it's happening as a result of the AppleCameraDataDelegate deinit. I would take a look at what you're doing in that deinit to see where a reference could be kept alive after deinit returns.
Thank you so much for the quick & detailed response! I never would have figured that out. With that info it was easy to find what was going on: deinit was calling into an async dispatch that referenced self. D'oh. Fixed!
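For anyone landing here later, here is a minimal sketch of that kind of bug and one way around it. The class, its properties, and the cleanup callback are all hypothetical (the original code was not posted). The idea is simply that anything deinit hands to an async block should be copied into locals first, so the block never captures self.

```swift
import Dispatch

// Hypothetical stand-in for a delegate that schedules cleanup work when it goes away.
final class CameraDataDelegate {
    private let sessionID: Int
    private let cleanupQueue: DispatchQueue
    private let onCleanup: @Sendable (Int) -> Void

    init(sessionID: Int,
         cleanupQueue: DispatchQueue,
         onCleanup: @escaping @Sendable (Int) -> Void) {
        self.sessionID = sessionID
        self.cleanupQueue = cleanupQueue
        self.onCleanup = onCleanup
    }

    deinit {
        // Problematic pattern (left commented out): the escaping closure captures
        // `self` strongly, so a reference to the object outlives deinit.
        // cleanupQueue.async { self.onCleanup(self.sessionID) }

        // Safer pattern: copy what the block needs into locals, so the
        // closure captures only those values and never `self`.
        let id = sessionID
        let callback = onCleanup
        cleanupQueue.async { callback(id) }
    }
}
```

The same applies to anything else called from deinit that might store or capture self, such as notification handlers or timers.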
Hi! Thanks a lot for bringing up this topic. I wonder what other root causes this issue could have, assuming the class of the crashing object doesn't even implement deinit. Could it primarily be other threads capturing the object instance while it is being deallocated (via the usual weak->strong capture), or are there other cases you would suggest checking?
@renanyoy @LeonidKokhnovych have either of you managed to track down this issue yet? I have the same scenario you described: no deinit implemented in the class where the crash occurs. In my case it's Object 0x109b64480 of class _DictionaryStorage deallocated with non-zero retain count 2.
I've run into these crashes. I will need some time to recall the concrete details and inspect the old code in the repo, but for now I can highlight the following:
these crashes appeared after release of iOS 17
they arose under a special combination of conditions:
there was a logical mistake in how two pthread_rwlock_t locks were used concurrently
in rare circumstances this mistake caused a deadlock
luckily, at that time auth token refresh began to occur an order of magnitude more often
the token refresh logic also contained a mistake in one edge case
finally, there were mistakes in the implementation of the ReadWriteLock class, which used pthread_rwlock_t under the hood. In particular, there was closure-based initialisation:
private var rwlock: pthread_rwlock_t = {
    var rwlock = pthread_rwlock_t()
    pthread_rwlock_init(&rwlock, nil)
    return rwlock
}()
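This is incorrect: the lock is initialised in a local variable and then copied out by value, and POSIX leaves the result of using a copy of an initialised lock undefined. The value was then passed as an argument in the following way: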
public func writeLock() {
    pthread_rwlock_wrlock(&rwlock)
}
This is also incorrect. The pthread_rwlock_wrlock function does not take an inout parameter; it takes an UnsafeMutablePointer<pthread_rwlock_t> argument. A pointer produced by & in this position is only guaranteed to be valid for the duration of that one call, so the lock is not guaranteed to see the same address each time. Unfortunately, passing a pthread_rwlock_t variable with & to an UnsafeMutablePointer parameter compiles successfully, with no warnings or other compiler diagnostics.
The second (and most unexpected) thing is that this was not a problem before iOS 17. For some reason we couldn't run our tests on iOS 17 during that period. There were no failures on iOS 15 / 16, but once we ran the tests on iOS 17, they failed.
Finally, we ended up with this implementation of ReadWriteLock:
public final class ReadWriteLock {
    private let rwlock: UnsafeMutablePointer<pthread_rwlock_t>

    public func writeLock() {
        pthread_rwlock_wrlock(rwlock)
    }

    public func readLock() {
        pthread_rwlock_rdlock(rwlock)
    }

    public func unlock() {
        pthread_rwlock_unlock(rwlock)
    }

    public init() {
        rwlock = UnsafeMutablePointer<pthread_rwlock_t>.allocate(capacity: 1)
        rwlock.initialize(to: pthread_rwlock_t())
        pthread_rwlock_init(rwlock, nil)
    }

    deinit {
        pthread_rwlock_destroy(rwlock)
        rwlock.deinitialize(count: 1)
        rwlock.deallocate()
    }
}
Wrappers for os_unfair_lock_t and pthread_mutex_t were refactored in the same manner.
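As a rough illustration of what "the same manner" could look like for pthread_mutex_t, here is a minimal sketch. The class name MutexLock and the withLock helper are assumptions made for this example, not the code from the project described above. The mutex lives in heap memory at a stable address and is only ever touched through that pointer.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

// Hypothetical wrapper: same allocate / initialize / destroy pattern as ReadWriteLock.
public final class MutexLock {
    // The mutex is never copied by value; all calls go through this pointer.
    private let mutex: UnsafeMutablePointer<pthread_mutex_t>

    public init() {
        mutex = UnsafeMutablePointer<pthread_mutex_t>.allocate(capacity: 1)
        mutex.initialize(to: pthread_mutex_t())
        pthread_mutex_init(mutex, nil)
    }

    deinit {
        pthread_mutex_destroy(mutex)
        mutex.deinitialize(count: 1)
        mutex.deallocate()
    }

    public func lock() {
        pthread_mutex_lock(mutex)
    }

    public func unlock() {
        pthread_mutex_unlock(mutex)
    }

    // Convenience helper (an addition for this sketch): runs `body` while
    // holding the lock and always unlocks afterwards, even if `body` throws.
    public func withLock<T>(_ body: () throws -> T) rethrows -> T {
        lock()
        defer { unlock() }
        return try body()
    }
}
```

A call site would then look like `let value = mutexLock.withLock { sharedState.count }`, where sharedState stands for whatever the lock protects.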
I don't remember exactly when we began to see fewer of these crashes, as we fixed the defects one by one, but after addressing all of them the crashes disappeared completely.
Maybe the details of our concrete case will help you investigate the problem.
Object 0x283ff8840 of class HttpStatusError deallocated with non-zero retain count 3. This object's deinit, or something called from it, may have created a strong reference to self which outlived deinit, resulting in a dangling reference.
– The class is immutable (all properties are declared as let), there is no deinit and there are no captures of self inside the class implementation.
In my case, all the crashes occur around mutations of a class's dictionary and of another custom type which is implemented as a struct. I refactored all the mutations so that each goes through a single function: updateDictValue(with key: ID, to value: Value) and updateValue(to newValue)
Based on my understanding of Swift Concurrency, my first attempt at fixing these crashes was to annotate these functions with a custom global actor. I was expecting that this would synchronise access to the functions and hence the mutations.
However, after spending more time debugging this, I managed to make it crash by adding a DispatchQueue.concurrentPerform that called these functions. The resulting crash is identical to the ones in Firebase.
I fell back to wrapping those calls in dispatchQueue.sync closures and removing the global actor annotation entirely, which stopped the crashes even with the artificially introduced race condition. I still do not completely understand why the global actor solution did not work.
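For reference, a minimal sketch of the dispatchQueue.sync approach described above might look like the following. The ValueStore class, its queue label, and the read accessors are hypothetical; only the updateDictValue(with:to:) name comes from the post. Every read and write of the dictionary goes through one serial queue, so a concurrentPerform stress loop cannot mutate it from two threads at once.

```swift
import Dispatch

// Hypothetical container: all access to `dict` is funnelled through one serial queue.
final class ValueStore<ID: Hashable, Value> {
    private var dict: [ID: Value] = [:]
    private let queue = DispatchQueue(label: "example.valuestore.serial")

    // Writes block the caller until the mutation has been applied.
    func updateDictValue(with key: ID, to value: Value) {
        queue.sync { dict[key] = value }
    }

    // Reads use the same queue, so they never overlap with a write.
    func value(for key: ID) -> Value? {
        queue.sync { dict[key] }
    }

    var count: Int {
        queue.sync { dict.count }
    }
}

// Artificial stress test, similar in spirit to the one described above:
// many threads call the mutating function at the same time.
let store = ValueStore<Int, String>()
DispatchQueue.concurrentPerform(iterations: 1_000) { index in
    store.updateDictValue(with: index, to: "value \(index)")
}
```

One caveat with this design: calling any of these methods from a block already running on the same serial queue would deadlock, so the queue is kept private to the class.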