A roadmap for improving Swift performance predictability: ARC improvements and ownership control

Saklad5 · December 31, 2021, 10:20pm

  @inlinable
  internal func offsetForward(_ i: Index, by distance: Int) -> Index {
    guard let index = offsetForward(i, by: distance, limitedBy: endIndex)
      else { fatalError("Index is out of bounds") }
    return index
  }

That function isn’t used anywhere outside the file, and the fatal error will never be triggered. It could be the following:

  private func offsetForward(_ i: Index, by distance: Int) -> Index {
    offsetForward(i, by: distance, limitedBy: endIndex).unsafelyUnwrapped
  }

I say “could” because you then start to realize the entire function has no reason to exist, because the only way any of this could fail is a catastrophic disregard for protocol requirements, and if you don’t trust that none of this has any meaning.

Jon_Shier · December 31, 2021, 10:26pm

I still don't see your point. The fatalError serves as a check against local changes being made in the future, even if they aren't visible outside the file. At worst it does the same thing more verbosely, which is offset by the additional readability and / or the runtime error message.

So, aside from your opinion on code structure, is there actually a performance impact here? Is the expense of the else branch you're worried about? Or just the unwrapping in general? That seems more a case for using unsafelyUnwrapped instead of traditional unwrapping in performance sensitive code, not a case against using fatalError, which seemed to be your original point.

Saklad5 · December 31, 2021, 10:43pm

If I compile that in -Ounchecked, it will retain a demonstrably unnecessary check. This is no better than avoidable ARC traffic.

Looking at it further, the actual reason those checks exist is that it is already allowing precondition violations: distance is allowed to be negative even if it isn’t a BidirectionalCollection, despite the requirements in the inherited documentation. As a result, there are many checks where there could have been one, and they are kept when they don’t need to be.

Current implementation (with inherited documentation)

/// Returns an index that is the specified distance from the given index.
/// […]
/// The value passed as `distance` must not offset `i` beyond the bounds of
/// the collection.
///
/// - Parameters:
///   - i: A valid index of the collection.
///   - distance: The distance to offset `i`. `distance` must not be negative
///     unless the collection conforms to the `BidirectionalCollection`
///     protocol.
/// - Returns: An index offset by `distance` from the index `i`. If
///   `distance` is positive, this is the same value as the result of
///   `distance` calls to `index(after:)`. If `distance` is negative, this
///   is the same value as the result of `abs(distance)` calls to
///   `index(before:)`.
///
/// - Complexity: O(1) if the collection conforms to
///   `RandomAccessCollection`; otherwise, O(*k*), where *k* is the absolute
///   value of `distance`.
@inlinable
public func index(_ i: Index, offsetBy distance: Int) -> Index {
  guard distance != 0 else { return i }
  
  return distance > 0
    ? offsetForward(i, by: distance)
    : offsetBackward(i, by: -distance)
}

Performant code needs to be diligent about ruling out invalid state immediately, so it can safely skip redundant checks after that point.

Don’t get me wrong, fatalError should be used in certain cases. But it is often used where it shouldn’t be. If you can’t identify why it would be reached in correct code, that’s a bad sign.

Also, as was discussed earlier: all Swift code should be performant. This principle should be followed everywhere. Problem areas may justify unsafe-prefixed alternatives that use assertions instead of preconditions (effectively shifting responsibility for safety to the caller), but other than that there should be no distinction.

Lantua · January 1, 2022, 1:50am

I can't say I agree w/ much of these, but I also think we steered off topic, which is about predictable & fine-tunable ARC.

Karl · January 1, 2022, 10:19pm

When looking at the retains and releases in my code, I've noticed one particular pattern coming up fairly frequently. Consider the following:

func test(_ array: Array<Int>) {
  someFunction(array.someSlice)
}

func someFunction(_ c: ArraySlice<Int>) {
  blackHole(c)
}

extension Collection {
  @inline(never) // Simulate a function the compiler decides not to inline.
  var someSlice: SubSequence {
    self[startIndex..<index(startIndex, offsetBy: 5)]
  }
}

@_silgen_name("blackHole")
func blackHole<T>(_: T)

What I'm seeing is that the compiler creates the slice, retains it, calls blackHole, and releases the slice afterwards (this happens both in a standard -O build, and with OSSA modules enabled). The retain/release disappears if I use the entire array directly, or if the compiler happens to inline the someSlice getter. (Godbolt).

I'm not entirely sure why these refcounting operations are even happening - as I understand it, both someFunction and blackHole should accept their arguments at +0. test already has a +0 value provided to it, so the lifetime math trivially checks out.

I'm also not sure how to resolve it. I've tried adding a _read accessor to someSlice, making it a __consuming func, and marking parameters as __shared (i.e. nonconsuming), but none of those things helped. Is the idea that the someSlice property should return a ref SubSequence? And if so, wouldn't that prohibit anybody from ever extending the slice's lifetime?

I don't want that - it's not what this code is supposed to express. Within test, the slice's lifetime is guaranteed by the array's lifetime, so when calling someFunction/blackHole, I don't want any retains or releases added since the lifetime is not extended. At the same time, the slice doesn't point to a stack buffer or whatever, so any callees are free to escape the value if they so choose (provided they retain it first).

The only thing I can think of is that the retains are caused by the ArraySlice initializer, which of course needs to store its base collection. But slices in Swift are just facades - they adjust the startIndex and endIndex properties used for the Collection conformance, but otherwise just forward everything to the base collection. If the slice is causing the extra ARC traffic and there's no way in sight to remove that, it means that structs in Swift are very far from "zero-cost" abstractions since they do not carry lifetime information.

Saklad5 · January 1, 2022, 10:25pm

Have you tried @_assemblyVision?

Karl · January 1, 2022, 10:31pm

Yes, and you can try it for yourself on the Godbolt link. It doesn't have much to say, except it's retaining and releasing a slice:

remark: retain of type 'ArraySlice<Int>'
remark: release of type 'ArraySlice<Int>'

Michael_Gottesman · January 2, 2022, 1:06am

@Karl suggestion. For these types of performance issues, can you make a post to a separate thread in compiler-dev rather than in this thread? I am taking a look at your example and I will tell you what is happening there in a post in using swift.

Just give me a little bit.

Joe_Groff · January 4, 2022, 6:42pm

[quote="dabrahams, post:75, topic:54206"]

dabrahams:

IMO guaranteeing the lifetime of locals until end-of-scope is one of the worst design tradeoffs in C++. The “RAII trick” of attaching cleanups to the lifetimes of locals is not inherently better than defer —to the contrary, it's only an indirect expression of intent—and requiring explicit move() on last use (or penalizing performance when it is omitted) for the convenience of low-level code that exposes pointers or interoperates with C/C++ seems like an inversion of design priorities. Do we really believe this change is consistent with the spirit of Swift?

It seems a bit premature to make other ARC changes in the name of performance predictability while semantic ARC is (IIUC) as yet not-fully-realized. Swift's performance and performance predictability has always been hobbled by an ARC implementation that doesn't work at -O0 and tries to do its job based on a fraction of the information available in the source. Shouldn't that work be carried through to its conclusion so we can see what the actual remaining problems are? /cc @Michael_Gottesman

The inspiration for what we're calling "lexical lifetimes" came directly from trying to fully realize semantic ARC, and seeing the extent of breakage and instability that fell out from aggressively shortening lifetimes. To be clear, the goal is not to be exactly like C++, but to provide rules that establish an actual boundary to how much we can shorten lifetimes, while still providing some flexibility to optimize. As Andy noted, we can introduce new concepts that allow for fully aggressive lifetime shortening on value types and other objects where we know there are no interesting deinit side effects too. Regardless of that, though, I think there will always be value to having explicit annotations for people who want to establish performance guarantees for moves, without having to think like a compiler to get them. I see it as being analogous to tail call optimization in code: it's desirable to the point of being required for lots of code, but hard to reason about exactly when it happens, so many languages provide a means to make it explicit where it's required.

The combination of read coroutines with borrow bindings leads us to an alternative to that pattern, since you can write the setup/teardown of the resource as the before and after of the yield in the coroutine body, yield the resource as a ref to the caller, and then clean it up when the ref is no longer needed in the caller. We could also conceivably introduce a with x = resource1, y = resource2 { } sort of block statement to make the acquisition of resources for a lexical scope more explicit.

consuming and nonconsuming become unavoidable concepts once we have move-only types, because they specify whether a move-only value even exists after getting used by a call. They also affect ABI. And escaping a value is not the only case where you want to consume it; anything that makes sense as the final use of a value, or which tears down a resource, is also a good candidate for using a consuming convention. Conversely, just because you may escape a value doesn't necessarily mean you want to consume your argument, since you may be copying it out of its existing location. We do have optimizer flexibility to infer from the usage of a value whether we should change the convention for a function, in cases where ABI is not a concern, but because it affects ABI, we can't infer it for anything public.

To be clear, this document is intended to be a roadmap, previewing what we're working on, and acting as a reference point to tie together all of these related proposals, so that as the proposals themselves come out, they can focus on describing their specific feature and refer to this roadmap for context where necessary. None of the names or syntaxes I picked are intended to be the final word. And my apologies for leaving some things under-described; for some things, it's hard to find a balance between providing a synopsis of what it is that doesn't end up overwhelming the entire document.

Lantua · January 4, 2022, 7:47pm

One thing I haven't seen mentioned is how the lexical lifetime is resistant to refactoring compared to the usage lifetime (until last use). For example, things as ordinary as swapping two seemingly unrelated statements could rearrange the deinit.

{
  let a = ...
  // statement 1
  someFunction(a)
  // deinit a
}
vs
{
  let a = ...
  someFunction(a)
  // deinit a
  // statement 1
}

That'd be very bad if statement 1 depends on the a existence, while doesn't directly use a. Now, one can say that we should use withExtendedLifetime to extend a appropriately. The thing is, the first code was correct that a outlives statement1. It's only become a problem when the carefully-crafted code got forgotten and refactored (swapping the two seemingly unrelated lines). This would be less problematic with the lexical lifetime since they don't move as much compared to the last statement.

Refactoring resistance is something that I've grown to appreciate over the years. You write once, read often, refactor ad infinitum.

Andrew_Trick · January 4, 2022, 10:36pm

The unpredictable nature of inlining is an extremely common problem for understanding and reproducing ARC and deinitialization behavior. It is one of the main reasons that implicit assumptions almost always go unnoticed. And when issues are noticed, it's almost always surprising and not linked to an obviously related change. It's also a reason that we can't generally diagnose such assumptions.

(The other main reason that issues are typically never found is that the compiler never directly implemented, tested, or enforced the optimistic formal rules).

John_McCall · January 5, 2022, 7:39am

How many of these lifetime issues that block the implementation of semantic ARC would be addressed if class methods guaranteed self for their duration? I think programmers would find that intuitive: they know that it's dangerous to pull a dependent value (e.g. an unsafe pointer) into an independent variable, but they don't think of that as something that can happen within a method, which is part of why so many of these examples of miscompiles seem surprising. Tricky code using dependent values or weak references is usually encapsulated into a method anyway.

This would also bring optimized code much more closely in line with unoptimized code. Without optimization, we can never interleave destruction with a method call on that object. Class methods in particular require fairly substantial optimization before the lifetime problems you're talking about are exposed, since the method call has to be both devirtualized and inlined. Depending on the method, that might really require the stars to align, which is a serious problem: even if we liked interleaving in principle, the fact that it doesn't happen reliably makes the problems much more latent. This rule would turn class methods into a sort of critical section that the destruction of self must be strictly ordered after, no matter how the call was optimized. That means inlining a class method call would at best change the lifetime of the other values flowing in and out of the method, and I think that's both more intuitively understandable and less likely to actually cause bugs.

This would effectively make all class method calls pin the lifetime of self, but I don't think that's a particularly surprising or harmful outcome. It would only inhibit optimization we could otherwise do if the method was both inlinable and didn't touch self at all, and that's a rare thing.

The most important thing is that this would preserve most of the power of the optimizer to shorten lifetimes by default. I would expect that we wouldn't need an analogous rule for value types, or for protocol methods; the critical sections that it's important not to disrupt will overwhelmingly be in concrete code on class types. (I do think we'd need this rule again for value types with explicit deinits, though. And while it's a minor consideration, it's also interesting that this rule naturally matches how we'll treat C++ methods, since their bodies are similarly atomic from the lifetime optimizer's perspective.)

Saklad5 · January 5, 2022, 4:03pm

I agree that’s an extremely easy thing to overlook. Implicit self means instance methods appear to be part of the instance, which would imply self remains loaded for their duration.

That being said, I’d rather Swift code made every runtime optimization it could get away with, or at least didn’t rule any out. Would it be possible to instead require some sort of explicit acknowledgement that unloading could happen during a method’s lifetime? That way, we could prevent mistakes without also preventing potential improvements.

John_McCall · January 5, 2022, 4:14pm

I understand, but I think a compelling case has been made that we can’t get away with such a rule. Having no guardrails on the destruction of objects simply makes it far too difficult to write correct code. You should make your peace with that.

Saklad5 · January 5, 2022, 4:16pm

I’m mainly concerned about what people might end up doing to work around such a guardrail in the name of optimization.

On balance, however, I think you’re probably right.

Flipping the issue around, what if there was a way to opt into shortening the lifetime of self further? Maybe an attribute at the function declaration that doesn’t affect the interface at all?

For that matter, could you shorten the lifetime by doing this?

let _ = move(self)

If so, I’d be much less worried about people making convoluted workarounds, as they could simply do that.

John_McCall · January 5, 2022, 4:38pm

Yes, something like that would be possible if it proves useful. And perhaps the rule just generally shouldn’t apply in consuming methods, although those are quite rare.

Saklad5 · January 5, 2022, 5:26pm

By the way, when you say “class methods” do you mean “instance methods of any type with reference semantics”? Or do you mean “instance methods of classes” or “type methods of classes that can be overridden”?

I’ve been assuming the former, if it wasn’t clear.

John_McCall · January 5, 2022, 5:52pm

The former, yes.

lukasa · January 6, 2022, 12:34pm

I agree with this, but frankly I don't think it matters what the rule is, so long as we agree on a rule. I brought this up a while ago: [SR-14419] Clarify guarantees around the lifetime of `self` in member functions · Issue #56775 · apple/swift · GitHub. If we have a clear rule, and a way to enable the alternative behaviour, then I think we've covered our bases.

While I'm here, I think these discussions have also made clear that another major problem we face is that the Swift compiler is deeply, deeply opaque. There is no way to see my code the way the Swift compiler sees it, in terms of ARC operations. I'd love a way to observe where the compiler thinks ARC operations should happen, both in terms of the logical model and then in terms of where they actually end up. I do this manually using SIL or ASM, but this is not really a terribly scalable programming model, and I guarantee I'm making mistakes.

Saklad5 · January 6, 2022, 12:37pm

That’s being worked on in the form of the currently unstable (not that you’d commit it into to source control anyway) @_assemblyVision.