Deep recursion in deinit should not happen

When removing the only reference to an item triggers the removal of the only reference to another item, and so on, the chained deinit calls (which may just be the implicit deinits the compiler creates for reference counting) overflow the stack once the chain is long enough (depending on how "thick" each deinit is; maybe forum entry 54206 would help with that "thickness" of each item at least). See my code down below.

See there what people are trying to do to circumvent this problem, but I really think this should not occur at all. My gut feeling is that these nested deinit calls should be unnecessary (think of tail recursion as an analogy); there should be a more clever way to manage the reference counting.

Any opinions about this? I would actually like to file a Swift bug for it, but I thought I should get some feedback here first.

(For me, this issue came up via my forum entry 54506.)

class Chained {
    var next: Chained? = nil
    init() {}
}

print("building...")
var first: Chained? = Chained()
var last = first
//for _ in 1...1_000_000 { // for release mode
for _ in 1...100_000 { // for debug mode
    let next = Chained()
    last?.next = next
    last = next
}
last = nil

print("forgetting references...")
first = nil // Thread 1: EXC_BAD_ACCESS (code=2, address=0x16f603ff0)

print("done")

Stack trace (sorry, did not know how to copy it):

10 Likes

This would be best resolved by changing the compiler to emit more clever code for deinitializers, but for this simple class case we can use a source-level workaround:

class Chained {
  var next: Chained? = nil
  init() {}

  deinit {
    // Prevent stack overflow when destroying a long list
    var node = self
    while isKnownUniquelyReferenced(&node.next) {
      (node, node.next) = (node.next!, nil)
    }
  }
}

There is no suitable source-level workaround for enums with indirect cases, as they have no equivalent for isKnownUniquelyReferenced. (Which is another thing that we should fix, eventually.)

(Note: I have only convinced myself that the code above is correct; I haven't tried actually running it.)
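
For illustration, here is the indirect-enum shape that has no such workaround (a hedged sketch, not tested here; a long enough list should overflow the stack in the compiler-generated destruction code just like the class version):

indirect enum List {
    case empty
    case node(Int, List)
}

var list = List.empty
for i in 1...1_000_000 {        // large enough to exhaust the stack
    list = .node(i, list)
}
// Dropping the head releases the boxed tail, whose release releases the next
// box, and so on recursively. There is no deinit to customize and no
// isKnownUniquelyReferenced for the indirect boxes, so this cannot be
// flattened from source code.
list = .empty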

8 Likes

I'd say this limitation is quite serious; it could be a showstopper for server-side use. I hope it is not an inherent ARC limitation but just a limitation of the current implementation, and fixable.

As a fix, could this work: the compiler does deinit in two phases, first accumulating the "to be deinited" objects into a linked list, then going through that list and doing the actual deinits.
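
A rough source-level sketch of that two-phase idea (all names here are made up for illustration; in the proposal the compiler would emit both phases, and this toy version is neither thread-safe nor a real workaround):

final class Node {
    var next: Node? = nil

    deinit {
        // Phase 1: do not release the child recursively; just record it.
        DeinitWorklist.record(next)
        next = nil
    }
}

enum DeinitWorklist {
    static var pending: [Node] = []     // not thread-safe; illustration only

    static func record(_ node: Node?) {
        if let node = node { pending.append(node) }
    }

    // Phase 2: release iteratively, with constant stack depth. Dropping each
    // popped node runs its deinit, which appends its own child to `pending`.
    static func drain() {
        while let node = pending.popLast() {
            _ = node
        }
    }
}

var head: Node? = Node()
var tail = head
for _ in 1...1_000_000 {
    let n = Node()
    tail?.next = n
    tail = n
}
tail = nil
head = nil                // phase 1 only: the first node records its child
DeinitWorklist.drain()    // phase 2: the whole chain is torn down iteratively

Here the drain call is written by hand at the release site; deciding where the compiler would insert it automatically is the tricky part.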

ouch.

When would the compiler insert the call to “go through that list”? After every statement? Perhaps even more granular, which would require introducing the equivalent of a C language “sequence point”?

You’re proposing a highly visible change in observable behavior, which would probably make object resurrection possible in situations where it currently is not.

1 Like

Seems to be a serious thing. I created SR-15803 on bugs.swift.org with a backlink to this discussion.

For what it is worth, Rust has a similar issue open here (migrated from here).

1 Like

Even worse, from there:

This is due to the way Rust drops things that are no longer used: since there is a big chain of Arc links, the recursive calls can stack overflow. See rust-lang/rfcs#686.

Unfortunately there is not much we can do.

I really do not believe that this is true (for either Rust or Swift). It might be true for a kind of reference counting that one would implement manually in deinitialisers and the like (and I do not even believe it is that simple in Swift, since from what I have heard there are optimizations applied to reference counting); maybe it is just something that has to be taken seriously enough by the key developers of the language. So perhaps it is not a simple problem, but neither was ABI stability, which has not been achieved in Rust but has been in Swift, because the "responsive" Swift people felt it was really important. Any opinion from the Swift compiler people on this? Thanks.

It should be possible to recursively destroy Swift object graphs without using up stack space or changing observable behavior, by reclaiming the freed space from objects as context for tracking the rest of the object graph to release. There was a paper describing this technique recently:

We have some ideas for how to apply this technique to Swift, but we haven't gotten around to implementing them yet.

29 Likes

The authors admit that their technique is not applicable in the presence of Rust’s bit-packing, which Swift also emulates:

We did not find a formal description of Rust’s optimised memory layout, but it is likely that the representation of some types must be reconsidered if one wants to support efficient destructors.

I also wonder whether their technique relies on total visibility into the types that make up the recursive structure. If so, their technique is possibly not applicable if mutually-recursive types are separated by a resilience boundary.

In Swift's case, we have the benefit that all refcounted objects have a consistent header layout, which gives us at least two guaranteed words of extra space we can cram tracking information into.

13 Likes
Same issue in Objective-C
#import <Foundation/Foundation.h>

@interface Chained: NSObject
@property Chained* next;
@end

@implementation Chained
- (void)dealloc {
    static int n = 0;
    if ((n % 1000) == 0) {
        NSLog(@"dealloc %d", n);
    }
    n += 1;
}
@end

void test(void) {
    NSLog(@"building...");
    Chained* first = [Chained new];
    Chained* last = first;
    int n = 100000; // 1000000 for release mode
    for (int i = 1; i < n; i++) {
        Chained* next = [Chained new];
        last.next = next;
        last = next;
    }
    last = nil;
    
    NSLog(@"forgetting references...");
    first = nil;
    
    /*
     Thread sanitizer, running outside Xcode
    2022-02-02 00:20:23.496 StackOverflow[15381:446319] dealloc 30000
    2022-02-02 00:20:24.229 StackOverflow[15381:446319] dealloc 31000
    2022-02-02 00:20:24.980 StackOverflow[15381:446319] dealloc 32000
    ThreadSanitizer:DEADLYSIGNAL
    ==15381==ERROR: ThreadSanitizer: stack-overflow on address 0x7ffee22edff8 (pc 0x00010d13a21c bp 0x7ffee22ee020 sp 0x7ffee22ee000 T446319)
    */
    
    /*
     No thread sanitizer:
     2022-02-02 00:21:45.604 StackOverflow[15425:447543] dealloc 38000
     2022-02-02 00:21:45.605 StackOverflow[15425:447543] dealloc 39000
     2022-02-02 00:21:45.605 StackOverflow[15425:447543] dealloc 40000
     Segmentation fault: 11
     */
    
    /*
     release mode:
     2022-02-02 00:22:29.130 StackOverflow[15478:448365] dealloc 45000
     2022-02-02 00:22:29.130 StackOverflow[15478:448365] dealloc 46000
     2022-02-02 00:22:29.130 StackOverflow[15478:448365] dealloc 47000
     Segmentation fault: 11
     */
    NSLog(@"done");
}

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        test();
    }
    return 0;
}
Ditto for the manually reference-counted version (different number of deallocations until the crash); specify -fno-objc-arc for this file in Build Phases.

#import <Foundation/Foundation.h>

@interface Chained: NSObject {
    @public
    Chained* next;
}
@end

@implementation Chained
- (void)dealloc {
    static int n = 0;
    if ((n % 1000) == 0) {
        NSLog(@"dealloc %d", n);
    }
    n += 1;
    Chained* next = self->next;
    [super dealloc];
    [next release];
}
@end

void test(void) {
    NSLog(@"building...");
    Chained* first = [Chained new];
    Chained* last = first;
    int n = 1000000;
    for (int i = 1; i < n; i++) {
        Chained* next = [Chained new];
        last->next = next;
        last = next;
    }
    last = nil;
    
    NSLog(@"forgetting references...");
    [first release];
    
    /*
     Thread sanitizer, running outside Xcode
     2022-02-02 00:38:59.498 StackOverflow[15633:461353] dealloc 67000
     2022-02-02 00:38:59.922 StackOverflow[15633:461353] dealloc 68000
     2022-02-02 00:39:00.358 StackOverflow[15633:461353] dealloc 69000
     2022-02-02 00:39:00.808 StackOverflow[15633:461353] dealloc 70000
     ThreadSanitizer:DEADLYSIGNAL
     ThreadSanitizer:DEADLYSIGNAL
     ThreadSanitizer: nested bug in the same thread, aborting.
    */
    
    /*
     No thread sanitizer:
     2022-02-02 00:40:15.630 StackOverflow[15722:463283] dealloc 126000
     2022-02-02 00:40:15.630 StackOverflow[15722:463283] dealloc 127000
     2022-02-02 00:40:15.630 StackOverflow[15722:463283] dealloc 128000
     2022-02-02 00:40:15.630 StackOverflow[15722:463283] dealloc 129000
     2022-02-02 00:40:15.631 StackOverflow[15722:463283] dealloc 130000
     Segmentation fault: 11
     */
    
    /*
     release mode:
     2022-02-02 00:40:47.407 StackOverflow[15765:463962] dealloc 171000
     2022-02-02 00:40:47.407 StackOverflow[15765:463962] dealloc 172000
     2022-02-02 00:40:47.408 StackOverflow[15765:463962] dealloc 173000
     2022-02-02 00:40:47.408 StackOverflow[15765:463962] dealloc 174000
     Segmentation fault: 11
     */
    NSLog(@"done");
}

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        test();
    }
    return 0;
}

I wonder how we did not notice this before... Has no Objective-C user used linked lists with 30-100K elements in the last 38 years? :thinking:

4 Likes

A similar example in C++ (excuse my C++17):

#include <iostream>
#include <memory>
#include <optional>

struct Node {
    std::optional<std::unique_ptr<Node>> next;
};

int main(int argc, char *argv[]) {
    {
        std::optional<std::unique_ptr<Node>> root;
        for (auto i = 0; i < 1000000; ++i) {
            auto new_node = std::make_unique<Node>();
            new_node->next = std::move(root);
            root = std::optional{std::move(new_node)};
        }
    } // => Segmentation fault before ever hitting "Oops".
    
    std::cerr << "Oops" << std::endl;
}

This is a latent issue in most (all?) refcounting systems where manual care isn't taken to prevent long object chains.


Honestly, I myself am a little surprised at the general surprise in this thread. In practice, it's pretty rare to have such a deep, acyclic object graph whose sole reference is a single root item, such that deallocating the graph blows up the stack.

Non-intrusive linked lists are rare in practice, and once your object graph grows large enough, it's very common to:

  1. Have links across objects in the graph, such that it's unlikely that taking down a single object will deallocate the whole graph at once; and/or the root of the graph is effectively immortal for the lifetime of the program, so destruction occurs only at the end of execution (e.g. many apps)
  2. Have some form of entity management system which scales much better than direct object access (and in many cases, a side effect of such a system is a flattening of the object hierarchy, which prevents this in the first place)

At the risk of over-extending myself, I'd say that in practice this is largely a non-issue for the vast majority of code out there. Obviously, it's not impossible to end up in a situation where manually writing a deinit is necessary, but it seems unlikely that this is a scenario worth optimizing for as a primary use case.

(I'd love to see the ideas that @Joe_Groff has linked to explored and put into practice, but I don't feel this is an immediate, urgent issue.)

8 Likes

I have such a use case: I need linked lists of many objects (linked lists because individual items might get removed), and at some point the whole structure might no longer be needed. It may not be a common use case, but it certainly is a sensible one, and I do not see any replacement for it.

I will perhaps not argue about the urgency of this issue (just one comment on the timing at the end of this post), but the issue is certainly an important one. Let me justify that:

  • The promise of the reference counting used by Swift is that it is a good replacement for the garbage collection implemented in Java and other languages, provided you take care of possible cyclic references.
  • The use case described (at least with the large number of items) might not be a very common one, but it is still a sensible, simple use case without an apparent alternative.
  • It is not obvious in the described use case that a stack overflow could occur; you need knowledge of how reference counting is implemented in Swift to guess that something like this could happen. From the practitioner's view it is unexpected behavior.
  • The two comments by @Joe_Groff above suggest that this problem is not inevitable. It seems that it can be resolved.

As a (maybe a little bold, and certainly very personal) general statement: I think that in a language launched with such a large promise, something like the described problem should not occur. This does not diminish how great this language is or how much very good and successful work has been put into it, but I really think it is just not OK for Swift to have this issue.

One comment about the urgency (or the timing, if you like): according to this forum entry, ARC improvements are on the way anyway, so maybe now is just the right time?

1 Like

Apologies; I think my post came across as unnecessarily abrupt. To clarify a little bit:

Absolutely! There are sensible use-cases that might hit this, and I'd be interested to hear more about your usage of linked lists in particular. (e.g., what access patterns for your data look like such that linked lists are the optimal choice for storage)

The Rust issue linked up-thread, for instance, doesn't add much backing information beyond the toy example, so real-world details would add necessary nuance to any solution here.

I don't disagree with this statement at all. There was some commentary in this thread that made it sound (to me) as if this problem were both new and possibly serious enough to need a fix ASAP; my choice of words was aimed at that impression, but maybe I misinterpreted others' wording.

In any case, I think there are definitely interesting developments in this area (e.g. the paper that Joe linked to is only from 2019, so it's not like this is a decades-old solved problem) that would be great to learn from (indeed, especially now if we're looking at other improvements to ARC).

4 Likes

I really hope this is not an inherent ARC limitation... Otherwise it could be a showstopper for ARC.

IMHO this is very serious. Imagine a reference-counting (¹) based web service that works with linked lists and allows hundreds of thousands of nodes. It is now obvious how malicious users could easily crash my service by merely creating and dropping a linked list of 50K elements. That could be very bad for Swift publicity ("gee, our Java / Go / C# service easily handles 1M nodes and the Swift service crashes after 50K!")

(¹) Are there any precedents of refcount-based languages that do not have this problem?

I agree in general that 30k-element linked lists typically aren't the ideal data structure, and there are almost always better options. But for the sake of a somewhat "real-world" example: I first came across this issue when I was naively parsing Lisp in Rust. Everything seemed to work, until I came across a massive file that looked like (in pseudo-Lisp)

(let x = 1 in (let y = 2 in ( let z = 3 in ....)))

Yeah, there are ways to get around this issue, but they make the ADT more complicated and less representative of the AST that the Lisp represents.

I've dealt with this issue in ObjC (NSXMLDocument's internals) and C++ (WebKit's internals) before. My favorite suggestion in the ObjC case came from a former exec who pointed out that if I autoreleased, instead of released, in dealloc, it would "smear" the stack usage across the next N turns of the runloop, avoiding stack overflow. I didn't end up doing that, but it was a fun idea.
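
A rough Swift analogue of that smearing idea, for illustration only (it uses DispatchQueue rather than autorelease, so it changes destruction timing, assumes a running main run loop, and is not something I'd recommend as a real fix):

import Dispatch

final class Chained {
    var next: Chained? = nil

    deinit {
        if let child = next {
            next = nil
            // Instead of releasing the child here (which would recurse),
            // hand the last reference to the main queue. The child is
            // released when the enqueued closure is destroyed on a later
            // turn, so each node is torn down with a fresh, shallow stack.
            DispatchQueue.main.async { _ = child }
        }
    }
}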

8 Likes

It has been noticed; in particular, folks do run into it with NSOperationQueue from time to time when they fall into the pattern of very long, linear chains of dependencies (when they should just be using a dedicated maxConcurrentOperationCount = 1 queue, with cross-queue dependencies to manage the non-linear parts of the dependency graph).

By and large it is not common, however, since most uses of things like linked lists are better served by arrays.
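
A hedged sketch of the two shapes being contrasted (the work items are placeholders):

import Foundation

// Anti-pattern: a long linear chain of dependencies. Each operation retains
// the operation it depends on, so once the chain is released it is torn down
// recursively, just like the linked lists above.
let queue = OperationQueue()
var previous: Operation? = nil
for i in 0..<100_000 {
    let op = BlockOperation { _ = i }   // placeholder work item
    if let previous = previous { op.addDependency(previous) }
    queue.addOperation(op)
    previous = op
}

// Recommended shape for strictly ordered work: a serial queue, no dependencies.
let serialQueue = OperationQueue()
serialQueue.maxConcurrentOperationCount = 1
for i in 0..<100_000 {
    serialQueue.addOperation { _ = i }  // placeholder work item
}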

3 Likes

An easy example of when I would want to use a linked list instead of an array: let's say I'm implementing object reuse, and instead of deiniting / initialising objects from scratch I put deleted objects into a linked list and, when I need a new object, take one from there. In this case a linked list is better than an array (unless of course we are fighting the issue being discussed). Obviously the list can grow massively. Now I delete this list because, say, I am changing scenes in the app... oops.
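
For what it's worth, a minimal sketch of such a reuse pool (hypothetical names; the point is only to show where the deep deinit chain appears):

final class PooledObject {
    var nextFree: PooledObject? = nil   // intrusive free-list link
    // ... payload fields that get reset on reuse ...
}

final class ObjectPool {
    private var head: PooledObject? = nil

    func recycle(_ object: PooledObject) {
        object.nextFree = head
        head = object
    }

    func obtain() -> PooledObject {
        if let object = head {
            head = object.nextFree
            object.nextFree = nil
            return object
        }
        return PooledObject()
    }
}

// If the pool has grown to hundreds of thousands of entries and is dropped
// (say, on a scene change), releasing `head` tears down the whole chain
// recursively through nextFree, which is exactly the stack overflow above.
var pool: ObjectPool? = ObjectPool()
// ... heavy use via obtain() / recycle() ...
pool = nil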

In that case perhaps a Set<SomeSortOfHashableBoxByObjectIdentifier<T>> might be easier to use instead of a linked list.

I am not saying that it is a one-size-fits-all answer; I just often find that reaching for a singly or even doubly linked list adds perhaps unneeded complexity.

4 Likes