A roadmap for improving Swift performance predictability: ARC improvements and ownership control

I agree with this, but frankly I don't think it matters what the rule is, so long as we agree on a rule. I brought this up a while ago: [SR-14419] Clarify guarantees around the lifetime of `self` in member functions · Issue #56775 · apple/swift · GitHub. If we have a clear rule, and a way to enable the alternative behaviour, then I think we've covered our bases.

While I'm here, I think these discussions have also made clear that another major problem we face is that the Swift compiler is deeply, deeply opaque. There is no way to see my code the way the Swift compiler sees it, in terms of ARC operations. I'd love a way to observe where the compiler thinks ARC operations should happen, both in terms of the logical model and then in terms of where they actually end up. I do this manually using SIL or ASM, but this is not really a terribly scalable programming model, and I guarantee I'm making mistakes.

That’s being worked on in the form of @_assemblyVision, which is currently unstable (not that you’d commit it to source control anyway).

I think that @_assemblyVision is the building block of such a tool, not the tool itself. It's definitely better than nothing, but using it across a project at scale is extremely tiresome. Its existence is a good sign for our ability to build the kind of tool I want, though.

At what point does optimization at that level lapse into relying on undefined compiler behavior for performant code, anyway?

Ideally, not at all. I don't need the optimiser to make these choices for me: I'd actually be fine if the optimiser did very little, as long as I can a) easily introspect the choices it did make, and b) instruct the compiler in what I actually want it to do.

As a hypothetical in a different area, consider autovectorisation. Here's a simple loop:

func addOneToAll(_ arr: inout [UInt]) {
    var index = arr.startIndex
    while index < arr.endIndex {
        arr[index] &+= 1
        arr.formIndex(after: &index)
    }
}

This is a straightforward candidate for autovectorization. Sadly, we don't get it, because of our good friend bounds checking (Compiler Explorer). But I can rewrite the function to get it (Compiler Explorer):

func addOneToAll(_ arr: inout [UInt]) {
    arr.withUnsafeMutableBufferPointer { ptr in
        var index = ptr.startIndex
        while index < ptr.endIndex {
            ptr[index] &+= 1
            ptr.formIndex(after: &index)
        }
    }
}

I would describe this as "relying on undefined compiler behaviour for performant code". What I want to do (in this example) is to be able to say "I expect the autovectorization pass to be able to transform this loop into a series of vector operations", and have the compiler be able to tell me "couldn't do that boss, sorry". This at least makes the compiler's behaviour visible to me.

Now, autovec is a bad example for this because it's actually really hard to add this kind of diagnostic. However, with more semantic annotations and more tooling it becomes possible to get a lot closer to being able to solve for this without relying on undefined compiler behaviour.

Another win for UnsafeBoundsCheckedBufferPointer, as it happens ;) - Compiler Explorer
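The post doesn't show what that type looks like. As a rough, hypothetical sketch (the names `UnsafeBoundsCheckedMutableBufferPointer` and `addOneToAllChecked` are invented here and are not the type being referenced), the idea is a thin wrapper over `UnsafeMutableBufferPointer` that keeps a `precondition` bounds check in every build, since the plain buffer pointer only checks bounds in debug builds, while still working on the raw buffer rather than through Array's subscript. Whether the checks then get hoisted and the loop vectorizes is something to confirm in Compiler Explorer, not something this sketch guarantees.

```swift
/// Hypothetical wrapper: bounds are checked with `precondition`, so the
/// checks stay in -O builds (UnsafeMutableBufferPointer only checks in debug).
struct UnsafeBoundsCheckedMutableBufferPointer<Element> {
    let base: UnsafeMutableBufferPointer<Element>

    var startIndex: Int { 0 }
    var endIndex: Int { base.count }

    subscript(position: Int) -> Element {
        get {
            precondition(position >= 0 && position < base.count, "index out of bounds")
            return base[position]
        }
        // Writing through the pointer does not mutate the wrapper itself.
        nonmutating set {
            precondition(position >= 0 && position < base.count, "index out of bounds")
            base[position] = newValue
        }
    }
}

/// The loop from the post, rewritten against the checked wrapper.
func addOneToAllChecked(_ arr: inout [UInt]) {
    arr.withUnsafeMutableBufferPointer { buffer in
        let checked = UnsafeBoundsCheckedMutableBufferPointer(base: buffer)
        var index = checked.startIndex
        while index < checked.endIndex {
            checked[index] &+= 1  // wrapping add, as in the original
            index += 1
        }
    }
}
```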

I wonder if we can somehow merge the two worlds and keep the readability of the former fragment and the performance of the latter. For example:

@performance(opt_out_of_bounds_checking)
func addOneToAll(_ arr: inout [UInt]) {
    // here follows original nice looking code
}

or 

func addOneToAll(_ arr: inout [UInt]) {
    i_solemnly_swear_this_correct_and_opting_out_of_bounds_checking {
        // here follows original nice looking code
    }
}

Ideally, the way to do that is functional programming: a function that takes a mutable collection and performs a provided mutating operation on each element could be designed such that the compiler does not need to bounds-check each element access.

/// Precondition: The collection must support an internal representation in a form of mutable contiguous storage.
func mutateAll<C: MutableCollection>(in elements: inout C, using operation: (inout C.Element) -> Void) {
  let result: Void? = elements.withContiguousMutableStorageIfAvailable { ptr in
    var index = ptr.startIndex
    while index < ptr.endIndex {
      operation(&ptr[index])
      ptr.formIndex(after: &index)
    }
  }
  precondition(result != nil) // Couldn’t figure out a better way to do this
}

Then you can use that repeatedly without risking some form of mistake.

/// Precondition: All values in `arr` must be less than or equal to `UInt.max - 1`.
func addOneToAll(_ arr: inout [UInt]) {
  mutateAll(in: &arr) {
    precondition($0 <= .max &- 1)
    $0 &+= 1
  }
}

You could make unsafe variants that don’t do precondition checks, of course. You could also eliminate irrelevant requirements with ease.

/// Precondition: All values in the collection must be less than or equal to `C.Element.max - 1`.
/// Precondition: `C` must support an internal representation in a form of mutable contiguous storage.
func unsafeAddOneToAll<C: MutableCollection>(_ arr: inout C) where C.Element: FixedWidthInteger {
  mutateAll(in: &arr) {
    assert($0 <= .max &- 1)
    $0 &+= 1
  }
}
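On the `precondition(result != nil)` line: one alternative, sketched here as a hypothetical `mutateAllWithFallback`, is to fall back to ordinary index-based iteration when `withContiguousMutableStorageIfAvailable` returns nil. Non-contiguous collections such as `Dictionary.Values` then still work, just through the regular (checked) collection APIs instead of trapping.

```swift
/// Hypothetical variant of `mutateAll`: uses contiguous storage when the
/// collection provides it, otherwise falls back to generic indices.
func mutateAllWithFallback<C: MutableCollection>(
    in elements: inout C,
    using operation: (inout C.Element) -> Void
) {
    // Fast path: the closure only runs if contiguous storage exists;
    // otherwise the call returns nil without running it.
    let handled: Void? = elements.withContiguousMutableStorageIfAvailable { buffer in
        var index = buffer.startIndex
        while index < buffer.endIndex {
            operation(&buffer[index])
            buffer.formIndex(after: &index)
        }
    }
    if handled != nil { return }

    // Slow path: ordinary collection indices, with their usual checks.
    var index = elements.startIndex
    while index < elements.endIndex {
        operation(&elements[index])
        elements.formIndex(after: &index)
    }
}
```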

Maybe I shouldn't have used the term “semantic ARC,” because I'm not exactly sure what you're hearing when I say that. What I really mean is that there are still lots of places where extra traffic persists, @Karl's example being one. It seems to me that these extra retains and releases should never make it into unoptimized code in the first place, and the part of codegen that is responsible for them needs to be rethought. I've always assumed that was a part of the “semantic ARC” work, but if it isn't, IMO the problem needs to be attacked at a much more fundamental level than “how do we get rid of these extra things we inserted?”

and seeing the extent of breakage and instability that fell out from aggressively shortening lifetimes.

What I think I hear you saying is that there's a lot of code out there that's incorrect under the current rules but happened to work with the current implementation, and you want to change the rules to make that code correct. Essentially, Hyrum's law has played out?

If the problem is only source compatibility with that broken code, I'd say make the authors of unsafe code explicitly migrate and check their unsafe code. If the problem is binary compatibility with that code, you have my sympathy, and I have no way to judge the seriousness of the problem from Apple's perspective… but the vagueness and complexity of what's being proposed here still leaves me concerned for Swift's future. “A variable's lifetime ends immediately after its last use” is relatively easy to understand and reason about, but “releases are anchored to the end of the variable's scope, and that operations such as accessing a weak reference, using pointers, or calling into external functions, act as deinitialization barriers that limit the optimizer's ability to shorten variable lifetimes…[we] will go into more detail about what exactly anchoring means, and what constitutes a barrier…” is not.

To be clear, the goal is not to be exactly like C++, but to provide rules that establish an actual boundary to how much we can shorten lifetimes, while still providing some flexibility to optimize.

I'm on board with those goals!

I think there will always be value to having explicit annotations for people who want to establish performance guarantees for moves, without having to think like a compiler to get them.

I'm not arguing that no annotations are needed. I'm arguing that we only need one concept, call it whatever you like: “escaping,” “pass-by-move,” “owned,” “consuming,” or “Fred;” there's just one thing here. We already have @escaping, and if you want to respell that I'm fine with it, but there's no need to add another one.

Those are good approaches.

consuming and nonconsuming become unavoidable concepts once we have move-only types, because they specify whether a move-only value even exists after getting used by a call.

Yes, but they are not unavoidably distinct from escaping/non-escaping. That's my point.

just because you may escape a value doesn't necessarily mean you want to consume your argument, since you may be copying it out of its existing location.

If you are going to copy it out of its existing location anyway, the optimal thing to do is to push that copy up to the caller and make it escaping/owned/consuming, in case they might be passing a value whose lifetime is going to end—in which case the copy can be avoided.
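Swift has since gained explicit spellings for this convention: SE-0377 added `borrowing` and `consuming` parameter modifiers (Swift 5.9). A minimal sketch of a callee that always escapes its argument and therefore asks for it as `consuming`; `Payload` and `Store` are illustrative names only. Any copy that is still needed then happens on the caller's side, where a caller passing a temporary or a value at its last use can, in principle, hand it over without an extra retain.

```swift
final class Payload {
    let bytes: [UInt8]
    init(bytes: [UInt8]) { self.bytes = bytes }
}

struct Store {
    var items: [Payload] = []

    // The callee always keeps its argument, so it asks the caller to hand
    // over ownership instead of borrowing it and copying internally.
    mutating func append(_ item: consuming Payload) {
        items.append(item)  // last use of `item`: ownership moves into the array
    }
}

var store = Store()
let shared = Payload(bytes: [1, 2, 3])
store.append(shared)               // `shared` stays alive, so the caller must retain it to pass it
store.append(Payload(bytes: [4]))  // a temporary can be handed over as-is
```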

IIUC, the only case this doesn't cover is that you might want to pass a parameter that is only conditionally escaped: it may or may not be copied out of its existing location, based on some dynamic value. The horrible rvalue reference system of C++ (at least partly my fault!) essentially exists to support that scenario, and it's one of my great regrets, because at least IME we almost never take advantage of that capability in real code. I'd rather tell people that in those rare cases, they may pay for a copy at the call site that is later discarded, than add an “unowned-but-escaping” concept before it's been proven necessary. And this is especially true for Swift where a discarded copy is O(1). You can even justify this choice in the name of performance predictability.
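The conditional-escape case can be sketched with the same SE-0377 modifiers. `offerBorrowed` is the "unowned-but-escaping" shape (borrow the argument, copy it with the `copy` operator only on the path that keeps it); `offerConsumed` is the single convention argued for above, where a rejected value is simply destroyed at the end of the call. `Node` and `Cache` are hypothetical.

```swift
final class Node {
    let id: Int
    init(id: Int) { self.id = id }
}

struct Cache {
    var kept: [Node] = []
    let limit: Int

    init(limit: Int) { self.limit = limit }

    // Borrowed argument: the callee pays for a copy only when it keeps the value.
    mutating func offerBorrowed(_ node: borrowing Node) -> Bool {
        guard kept.count < limit else { return false }
        kept.append(copy node)
        return true
    }

    // Consumed argument: ownership is handed over up front. If the cache is
    // full, the handed-over reference is released when the call ends; for a
    // class reference that wasted copy is an O(1) retain/release pair.
    mutating func offerConsumed(_ node: consuming Node) -> Bool {
        guard kept.count < limit else { return false }
        kept.append(node)
        return true
    }
}
```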

And my apologies for leaving some things under-described; for some things, it's hard to find a balance between providing a synopsis of what it is that doesn't end up overwhelming the entire document.

My concern here is that if a full description would overwhelm the entire document, it's probably too much complexity for the user model. With no apologies to the Swift API Guidelines:

If you are having trouble describing your language's functionality in simple terms, you may have designed the wrong language.

How do you propose doing this if the caller and callee are compiled separately, and the callee is resilient?

The callee has an @escaping annotation (or whatever) on the parameter declaration, which affects the calling convention.

I must have misunderstood what you were describing. I thought you were talking about changing the ARC behavior in a revision of the callee, to optimize a different implementation. I guess you’re referring to how it should be spelled in the first place?

Sorry @ksluder, I'm afraid I don't understand your question. Care to rephrase?

Well, it’s not really a question, but the misunderstanding I had is actually relevant to the larger feature.

The vast majority of the deployed Swift ecosystem is on Darwin, in which the system frameworks are resilient and built separately from client applications. If, as a system framework author, I identify a way to rewrite a method implementation that could avoid ARC traffic and improve performance, how do I do that without breaking ABI?

ObjC has the takeAutoreleasedReturnValue family of runtime functions that peek at the stack to determine whether they can safely avoid touching the retain count. Should we contemplate the same for Swift?

I feel it is a mistake to disregard Swift packages, which are used extensively in pretty much every part of the ecosystem (their use is actively encouraged for iOS/macOS development, for instance) and are built alongside client applications.

Just because something breaks ABI doesn’t mean it should be impossible even outside of library evolution.

I’m not disregarding Swift packages at all. But a) binary Swift packages with resilient ABIs are a thing, and b) several hot-path frameworks like the Standard Library and AppKit/UIKit are not delivered as Swift packages, but rather are distributed with the host platform. We cannot disregard those either.

I agree. I just feel it is worth considering that many optimizations couldn’t apply to such frameworks, but may still have value in the language.

By the way, I find it very bizarre that binary targets in Swift packages remain unusable outside Darwin. I feel that should be addressed as soon as possible, to avoid further impeding Swift’s adoption.

I’m curious how many optimizations could exist for such frameworks… perhaps at the cost of a branch, which may be better than a cache invalidation (due to atomically incrementing a refcount) on highly parallel systems.

To be more specific, our plan for the previous release was to make the copy forwarding optimization mandatory, and perform it in all builds, since the semantic ARC infrastructure is now robust enough to do so. The major problem we ran into is that it broke many people's debug workflows, and also broke a lot of code that assumed longer lifetimes than we actually guaranteed. As @lukasa, @Andrew_Trick, and others noted, we didn't really have any guarantees, and it wasn't just "unsafe" code, or code that could reasonably be declared wrong, that was affected. We've worked out a proposed set of rules that looks like it fits existing practice, and it's fair to be concerned that it's too complex and/or still too conservative. We do have to define some rule in order to establish guardrails as the restraints on the optimizer get lifted by better SIL infrastructure.

Another situation where you may need to copy something out of a value without consuming it is if you have a mutating method on the value that caches some part of the value going in; you want to copy those parts out into the cache, but you wouldn't be able to avoid leaving the original in the value being modified. If an ABI-resilient API has already been published, and its implementation changes to either copy out a value or stop copying out a value, then you have to keep the original ABI as well. Although it's true that nonescaping is inherently nonconsuming (it's effectively nonconsuming + no-implicit-copy + no-explicit-copy), it is useful to be able to escape copiable nonconsuming values too. So yeah, there is only consuming/nonconsuming for move-only types, but for copiable types, I think nonescaping/nonconsuming/consuming are all interesting conditions.
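A small, hypothetical illustration of the copy-out case: a mutating method that fills a cache from part of `self` while the original stays in place. `Document`, `titleCache`, and `cacheTitle()` are invented for the example.

```swift
struct Document {
    var text: String
    var titleCache: String? = nil

    // `self` is modified (the cache gets filled), but `text` must remain intact,
    // so the cached title is a copy of part of `text`, not a move out of it.
    mutating func cacheTitle() -> String {
        if let cached = titleCache { return cached }
        let title = String(text.prefix(while: { $0 != "\n" }))
        titleCache = title
        return title
    }
}
```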

Although you can't break ABI, you can in principle extend the ABI. If a resilient framework author identifies an opportunity to provide a better convention, we could possibly allow you to export that as a new entry point for new code to compile against, while keeping the old convention around as a thunk.
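A rough sketch of what "extend the ABI" could look like at the source level, with invented names (`Buffer`, `Registry`, `register(consumingBuffer:)`): the already-published entry point keeps its default borrowed convention and becomes a thunk over a new entry point whose parameter is `consuming`. A distinct argument label keeps the two apart here; how a real resilient framework would version and annotate this (availability, deprecation) is left out.

```swift
public final class Buffer {
    public init() {}
}

public final class Registry {
    public private(set) var buffers: [Buffer] = []
    public init() {}

    // Already-published entry point: `buffer` is borrowed (the default), so
    // keeping it costs a copy inside the callee. Kept as a thunk for old clients.
    public func register(_ buffer: Buffer) {
        register(consumingBuffer: buffer)  // implicit copy of the borrowed argument
    }

    // Hypothetical new entry point with an owned (`consuming`) convention,
    // for code compiled against the newer version of the library.
    public func register(consumingBuffer buffer: consuming Buffer) {
        buffers.append(buffer)
    }
}
```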

If the lifetime ends after the last "use", then I think we need a rule for what constitutes a formal use, and that rule should only require local reasoning.

let d = Delegate()
let c = Class(weakDelegate: d)
c.useDelegate() // does d exist?
doNothing(d)

The reference to d on the last line may be "dead" after optimization, but I think it should extend the lifetime of d if there may be any uses of a weak reference to d.
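For the `weakDelegate` snippet, the standard library already offers a way to state the intended lifetime explicitly: `withExtendedLifetime(_:_:)` keeps its argument alive until the closure returns. `Delegate`, `Client` (standing in for `Class`), and `handle()` are hypothetical stand-ins for the types in the snippet.

```swift
final class Delegate {
    func handle() -> String { "handled" }
}

final class Client {
    weak var delegate: Delegate?
    init(weakDelegate: Delegate) { self.delegate = weakDelegate }

    // Loads the weak reference; yields nil if the delegate is already gone.
    func useDelegate() -> String? { delegate?.handle() }
}

let d = Delegate()
let c = Client(weakDelegate: d)
// `d` is guaranteed to be alive until the closure returns, so the weak load
// inside useDelegate() cannot observe a released delegate.
let result = withExtendedLifetime(d) {
    c.useDelegate()
}
```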

I find lexical lifetimes more intuitive than the above, but either model would work for me. I'm concerned that the current model does not allow local reasoning about lifetimes, and any mental shortcuts that I develop are likely to be broken by a better optimizer.