If a garbage collector's work requires combing through all of RAM to determine what can be freed, and reference-counted systems avoid that processing, it stands to reason there is some power overhead beyond the RAM consumed by either the pages awaiting collection or the collector's bookkeeping. Do we have a source on the relative power cost of garbage collection? We saw a post suggesting the other mobile device required a 25% larger battery for parity; how much of that is the tech in the CPU, and how much is the garbage collector burning cycles trying to free RAM? Do we have data?
Note that the work Objective-C has done to optimize reference counting is largely orthogonal to whether the compiler implicitly emits retains and releases, other than becoming more important because the compiler tends to be much more conservative about refcounting than a human would be.
Objective-C is close to unique among traditional object-oriented languages (not counting languages like C++ with radically different models) in that it does not have a common object management system for all objects. I suspect it would have one if ref-counting were less of an afterthought in terms of the language's evolution: for many years, Objective-C only had unique references, and you were just expected to copy objects instead of sharing them.
Among the modern mainstream languages, PHP has a combination of ARC and garbage collection, the latter being optional: it can be turned off. The GC is meant to detect circular references, and in fact I think it can be safely turned off if you use PHP's FastCGI model, i.e. when each script is run in response to one network request and dies with it; the script is so short-lived that circular references can't do any significant harm.
Keep in mind these are mobile devices, where memory wattage could make some difference, compared to servers with something like a 120 W Xeon energy hog. So I believe you can't really extrapolate these results.
Just realised there was this article
Not sure how accurate it is, but Java is more power efficient than Swift there. Article is quite old though.
It's from 2017, i.e. it was either Swift 3 or 4.
Swift did win in the fannkuch-redux test: it came in at No. 4, after C, C++ and Rust, whereas Java came in at No. 9 with a significantly greater memory footprint (5x that of Swift).
As for the fasta test: I have no idea what it does, but judging from the fact that Fortran took No. 2 here, it's likely a heavily computational test. Here, Java and Swift come next to each other with almost identical results at No. 5 and No. 6.
This says nothing about the real-world performance of UI or server apps doing things that are a bit more useful than reshuffling int arrays and counting the number of flips.
There are normalised results and an energy table as well. Even if Swift beats Java now, it doesn't mean GC will lead to significantly worse results. My point still stands though, and it is in line with your scepticism about reshuffling arrays: you can't really extrapolate these results.
Reshuffling int arrays may not involve memory allocations within loops, though it is possible that Swift does a lot of unnecessary refcounting (while passing arrays around) where GC-based languages would do nothing; that's true, and it's probably what brings them so close to each other in these tests.
Java itself is not slow, it's a statically typed language after all, and well engineered too. The 25-30% difference in energy consumption for GUI apps is not that significant if you think about the higher-level VM-based languages like Python, Ruby etc., where the difference would be in multiples of that.
To simplify the memory-management aspect of the language. At least at the base level, you don't need to think about lifetimes, dangling pointers, cycles, etc. That's an abstraction level, so that you can put memory aside until it becomes important. And they are good at this. The major issue they introduce, IMO, is that by largely hiding memory management from developers, many won't start thinking about it ever, while with ARC you are forced by the language to understand from the beginning what's happening with the memory.
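As a minimal sketch of the kind of thing ARC makes you confront early (the `Node` type here is purely illustrative): a reference cycle that a tracing GC would collect for you, but that ARC requires you to break explicitly.

```swift
// A hypothetical parent/child structure. Without `weak`, parent and child
// would retain each other, and the whole subtree would leak once the last
// external reference goes away.
final class Node {
    var children: [Node] = []
    weak var parent: Node?

    func add(_ child: Node) {
        children.append(child)
        child.parent = self
    }
}
```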
It is odd they didn't have binary-trees test results for Swift.
Ah I missed this reply.
I'd like more explanation if possible on why you would lose the benefits of ARC in a multi-threaded situation.
You wouldn't - at least, not entirely.
Where you would lose some predictability is that if you share data with other threads/suspended tasks, they will keep that data alive, and there is no firm guarantee about when they will be scheduled so they can do their work and eventually relinquish that data. That means that things like your application-wide peak memory usage are more difficult to predict.
The point I was making in reply is that even though at global scale it is more difficult to predict the lifetimes of data, ARC (especially with some of the new language features) still allows you to reason about lifetimes at a smaller scale.
It's also worth noting, re ARC and multithreading, that concurrent access to refcounts is drastically slower than single-threaded (due to cores evicting the cache line containing the refcount from each other's caches on each atomic write).
So far this has only been a significant issue in a few very unusual cases[1], but high-core-count server environments may want to approach architecting for ARC as though it was a distributed system (i.e. shared nothing + message passing, rather than shared state protected by locks).
Conveniently this sort of architecture is also generally optimal for high core count scalability anyway, so I wouldn't necessarily count it as a significant downside for ARC vs generational GCs[2], but it is something to be aware of.
[1] For example, I made the empty collection singletons "immortal" specifically to avoid contending on their refcounts when lots of threads are using Arrays.
[2] It's been a long time since I studied garbage collectors, but my recollection is that they avoid this particular issue because the relevant memory writes are to each thread's stack, rather than to the shared thing they're all pointing to.
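To make the shared-nothing + message-passing idea above concrete, here is a hedged sketch in today's Swift (the `Shard` actor and the sharding scheme are illustrative, not a prescribed architecture, and actors don't eliminate refcount traffic entirely; the point is that workers don't all hammer one shared object):

```swift
// Each actor owns its state outright; other tasks interact with it via
// async messages instead of sharing a class instance whose refcount
// every core keeps touching.
actor Shard {
    private var counts: [String: Int] = [:]

    func record(_ key: String) {
        counts[key, default: 0] += 1
    }

    func snapshot() -> [String: Int] {
        counts
    }
}

func runWorkers(shards: [Shard], keys: [String]) async {
    await withTaskGroup(of: Void.self) { group in
        for (index, key) in keys.enumerated() {
            // Route each piece of work to "its" shard rather than having
            // all workers contend on a single shared object.
            let shard = shards[index % shards.count]
            group.addTask { await shard.record(key) }
        }
    }
}
```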
FWIW, since day one in SwiftNIO's development, it was pretty clear that deinit-based (non-memory) resource management is not acceptable and wasn't used. There was a very short time where regrettably some defaults for NIOAsyncChannel were managed deinit-based which has since been addressed. In my opinion not doing deinit-based resource management is one of the core reasons SwiftNIO actually works and can safely be exposed to a network. If you ever rely on scarce resources like file descriptors or network connections to be managed with deinit, it's well possible that a network-bound attacker knows more about your file descriptor tables than your own program does. If so, that's game over and they can probably denial-of-service you easily. Not to speak of the countless hours of debugging you'd waste.
I'm surprised to hear this. Allow me to share my personal opinions about this topic too.
First of all, even with ~Copyable we still need ~Escapable to actually get to full RAII. But even when we get there, I think it's of questionable use for Swift. One of the things that keeps getting mentioned is C++'s std::lock_guard and Rust's MutexGuard. IMHO both are suboptimal: they encourage accidental callouts to unrelated code under a lock, and they actively make it hard for non-experts to figure out where the lock is actually being released (all hidden inside the little `}`). C++ may be forgiven because the whole closure story there is a bit harder. In Swift we use `withLock { ... }`, which fixes those issues. FWIW, once we have ~Escapable & ~Copyable, MutexGuard would still be incorrect in Swift as it may straddle an await, which would be very, very bad.
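For contrast, a minimal sketch of the with-style locking pattern being referred to (the `withLockedScope` helper name is illustrative; Foundation and the standard library ship their own variants):

```swift
import Foundation

extension NSLock {
    // With-style locking: the critical section is exactly the closure body,
    // and the unlock is guaranteed by `defer` even if the body throws. The
    // scope of the lock is visible at the call site instead of hiding
    // behind a `}` somewhere below.
    func withLockedScope<T>(_ body: () throws -> T) rethrows -> T {
        lock()
        defer { unlock() }
        return try body()
    }
}

// Usage: the lock is held exactly for the duration of the closure.
let lock = NSLock()
var balance = 0
lock.withLockedScope { balance += 100 }
```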
Okay, but now back to what's IMHO regrettably more commonly done in Swift: (Non-memory) Resource management with deinit in classes/actors. Let's first discuss what resources can safely & correctly be managed with this pattern:
The resources must be:
- Releasable from any thread (because we can't predict the thread that will run `deinit`)
- Releasable synchronously (because we can't run asynchronous code from `deinit`)
- The release process may only trigger deinit-safe code (note: not all Swift code is deinit-safe; deinit-safe code mustn't increase `self`'s ref count beyond the scope of `deinit`, or else we crash, as that would resurrect `self` after we just declared it dead and called `deinit`)
- Abundant enough that not releasing them, or releasing them a little late, is okay
- The exact place & order of the `deinit` call doesn't matter for correctness (where a `deinit` is placed depends on a lot of things, including optimisation level)
- The release should be infallible, as there's really not much we can do without being able to return a value or throw from `deinit`
I've thought about the question of what resources fit this bill, and so far my list has one entry: memory. If there are other resources that fit this bill, I'm keen to learn.
This brings us to what deinit is great at: managing memory for otherwise unmanaged memory, for example if you need to manually `.deallocate` a pointer for some C library. (I also like deinit to assert that a bunch of state is how I expect it to be.)
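A minimal sketch of that one legitimate use (the `ScratchBuffer` type is hypothetical; the raw allocation stands in for memory handed to or from some C library):

```swift
// deinit managing otherwise unmanaged memory.
final class ScratchBuffer {
    let baseAddress: UnsafeMutableRawPointer
    let byteCount: Int

    init(byteCount: Int) {
        self.byteCount = byteCount
        self.baseAddress = .allocate(byteCount: byteCount, alignment: 16)
    }

    deinit {
        // Memory satisfies every requirement listed above: it can be freed
        // from any thread, synchronously, infallibly, and a slightly delayed
        // release is harmless.
        baseAddress.deallocate()
    }
}
```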
As to why file descriptors / network connections and most other resources don't fit this bill at all:
- In `io_uring`, most kevent/kqueue code, and I/O libraries (like NIO, libuv, ...), disposing of a file descriptor is asynchronous and pretty much has to be
- Just using some synchronous `invalidate()` function from `deinit`, which then disposes of the file descriptor in the background:
  - Loses precise resource control (my program can no longer know or limit its use of file descriptors)
  - Is incompatible with structured concurrency. Structured concurrency mandates that I'm done with whatever background work I triggered once my `async` function returns. If I buy into `deinit`-based resource management I cannot uphold this guarantee
  - Even if those are acceptable, this pretty much necessitates some top-level singleton object which is the always-alive thing that continuously makes progress disposing of the resources when their owners have vanished
- File descriptors are in very limited supply, so many seemingly valid programs will start to see errors if the out-of-band resource destruction is slower than the creation of new ones
- (Sadly) There are network protocols where the EOF from a connection is meaningful for framing, so it's important to send it at the right time
Furthermore, buying into deinit-based resource management often forces the user into using weak which makes the overhead that reference counting adds (lots of -1/+1 ref count writes) even larger as there's now another level of indirection to access the reference count.
Being able to use tooling is important. Here's how SwiftNIO solves this: The newer NIOAsyncChannel APIs have with-style APIs which define the problem away, work great with cancellation etc.
The older pre-concurrency APIs make sure that the objects like Channels that hold file descriptors only go away if they have been disposed of correctly. So in case you do have a leak you can use the standard tools like heap, leaks and Xcode's Memory Graph Debugger to find what you leaked and who is holding onto that. Then you add your resource management and you're good.
Typically yes
They are not at all the same class of bugs. Use-after-free bugs are security vulnerabilities; use-after-close bugs are things that may throw an error (NIO will give you `ChannelError.ioOnClosedChannel` if you write to a channel that you or the remote end has closed before).
Even better, in structured concurrency it's mostly `try await withResource { resource in try await resource.use() }` anyway, which makes this class of problem impossible, as we leverage the code's structure to maintain the resource's lifetime.
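A hedged sketch of what such a with-style helper can look like (the names and the `AsyncClosable` protocol are illustrative, not the NIOAsyncChannel API): acquire the resource, hand it to the body, and always release it, whether the body returns or throws.

```swift
// An illustrative with-style helper: the resource's lifetime is tied to the
// closure's scope.
protocol AsyncClosable {
    func close() async throws
}

func withResource<Resource: AsyncClosable, Result>(
    acquire: () async throws -> Resource,
    _ body: (Resource) async throws -> Result
) async throws -> Result {
    let resource = try await acquire()
    do {
        let result = try await body(resource)
        try await resource.close()
        return result
    } catch {
        // Best-effort close on the failure path; the original error wins.
        try? await resource.close()
        throw error
    }
}
```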
Yes, for memory, deinit works really well.
Is this xpc_transaction_{begin,end}? If so, I don't think I know enough about this API, but at first glance it seems like a structured API would solve these issues too in a very understandable and clear way, no?
```swift
try await withXPCTransaction { transaction in  // _begin
    ...
}  // _end
```
But look, I'm not saying that there is no resource apart from memory that can be managed successfully and correctly with deinit, I just haven't come across one that wasn't very niche. Maybe os_transaction is that niche, maybe it would better be withTransaction { ... }, I honestly don't know.
So if you're sure you have resources that fit the above requirements (any-thread, synchronous, order-independent, infallible, delayable destruction), then sure, deinit can be used for that.
But the messaging is IMHO important: If we say "Yay, use deinit for resource management, it's great (maybe there are places where it's not great)" then we're IMHO setting up developers for failure, especially under structured concurrency. On the other hand, saying "deinit is for manual memory management (and some limited other uses)" will lead to a much better place.
Somewhat comically, the old Obj-C docs for dealloc did have some good information, but we lost it, presumably because deinit just doesn't have a docs entry...
> You should try to avoid managing the lifetime of limited resources such as file descriptors using `dealloc`.
Wrappers around various handle-based PThreads objects come to mind. In C++ you'd use the destructor for e.g. pthread_mutex_destroy. But because you can't have a deinit in Swift structs, your wrapper would have to be a class, probably making it a bit impractical. (Now all this makes me wonder how the newly proposed Mutex is implemented, need to check.)
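A rough sketch of that class-based wrapper (simplified; real implementations such as NIO's lock types handle attributes and error codes more carefully):

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

// The pthread mutex lives behind a stable heap allocation owned by the class,
// and deinit tears it down.
final class PThreadMutex {
    private let mutexPointer: UnsafeMutablePointer<pthread_mutex_t>

    init() {
        mutexPointer = .allocate(capacity: 1)
        pthread_mutex_init(mutexPointer, nil)
    }

    func lock() { pthread_mutex_lock(mutexPointer) }
    func unlock() { pthread_mutex_unlock(mutexPointer) }

    deinit {
        pthread_mutex_destroy(mutexPointer)
        mutexPointer.deallocate()
    }
}
```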
I think it is implemented as a noncopyable type with a deinit function.
Interesting, so now deinit for non-copyables begins to make sense!
Mutex is ~Copyable so it can hold values that are themselves ~Copyable, but deinit is not used for locking. Instead, Mutex offers withLock APIs that take a closure. See:
@available(*, noasync)
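A small usage sketch, assuming the Synchronization module's Mutex from SE-0433 (the `Counter` type here is illustrative):

```swift
import Synchronization

// The protected state is only reachable inside withLock, and the lock is
// released when the closure returns rather than in a deinit.
final class Counter: Sendable {
    private let value = Mutex(0)

    func increment() {
        value.withLock { count in
            count += 1
        }
    }

    func current() -> Int {
        value.withLock { $0 }
    }
}
```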
Perhaps this is already what you're getting at, but it would be neat if we could have a regime where applying that annotation at the right spot meant that the type was still usable from an async context, but that its lifetime was guaranteed to lie entirely within a single job and to die before a suspension point is reached.
In its current form that annotation isn't enough, since you can trivially circumvent it by calling the method from a synchronous inline closure.
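A minimal sketch of that loophole (the function names are hypothetical): the direct call is rejected by the compiler, but routing it through a synchronous closure slips past the check.

```swift
@available(*, noasync)
func touchTaskLocalState() { /* ... */ }

func doWork() async {
    // touchTaskLocalState()   // error: unavailable from asynchronous contexts

    // Wrapping the call in a synchronous closure compiles fine, because the
    // noasync check is not applied transitively through synchronous code.
    let sneak = { touchTaskLocalState() }
    sneak()
}
```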