Adding a "unique" flag to alloc_ref

I'd like to propose adding a flag to alloc_ref (and alloc_ref_dynamic). The flag, "unique", would mark that this is the only reference to an object, meaning this is the only place where the referenced object can be read or written. This would allow a few optimizations, the most important being the ability to use static access enforcement for reference types.

Scope

While the scope may grow over time, I think (at least for right now) keeping this small is key. The primary case I'm looking to improve is the following:

var c = class()
c.x = 42
return c.x

Because of the dynamic access enforcement around both the read and the write of the member "x", many optimizations cannot be applied to the above snippet.

Performance

I think it's reasonable to expect that, with very simple optimization passes, this change could provide performance improvements similar to those seen in #31423, with little to no regression and, in very select cases, almost a 100% performance improvement with -Osize.

Design

I've created a PR (#32844) implementing this change. I'm 100% open to feedback, both about the high-level design and the specific implementation.

Edit: "code-size improvement" vs. "performance improvement with -Osize."

A few thoughts:

  1. I don't think we can do this without a much bigger story here, and without move-only types (or something around uniqueness at the language level) I don't think we can implement this in a "nice way". So I am not sure this is the right time or way to do this sort of thing. I think we would need a higher-level, more holistic approach to this problem.

  2. I am against adding another flag to alloc_ref: I would rather have a different allocation instruction. My reasoning is that adding flags that create significantly different semantics/requirements to an instruction is an anti-pattern in the code base. Generally, as a principle, when semantics differ so significantly that we have to bifurcate logic in a bunch of different places, we should use a different instruction so that we can use exhaustive switches. If abstraction is needed over those instructions, we can always create a composition type like FullApplySite that lets us define specialized behavior while still allowing exhaustive switches (see the sketch just below). As an example, we have already added a flag like this to alloc_ref (the [stack] modifier). In the vast majority of cases where we deal with alloc_ref today, we have to check for that stack modifier, since its semantics around ownership/etc. are very different. We have had a ton of bugs over time related to not handling alloc_ref [stack] appropriately, since it is just hard to know all of the places that one needs to update.
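
A minimal sketch of that pattern, using hypothetical stand-in types rather than the compiler's real SILInstruction classes: keeping the two allocation kinds as separate cases means every switch is exhaustive, so the compiler flags any code path that forgets to handle the new kind.

// Hypothetical stand-ins; not the compiler's actual instruction classes.
struct AllocRefInst {}
struct AllocUniqueRefInst {}

// A composition type in the spirit of FullApplySite: shared abstraction on
// top, exhaustive switches underneath.
enum AnyAllocRef {
  case shared(AllocRefInst)
  case unique(AllocUniqueRefInst)

  var producesUniqueReference: Bool {
    switch self {               // adding a new case breaks this switch loudly
    case .shared: return false
    case .unique: return true
    }
  }
}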

Since SIL already has reasonable conservative escape analysis—your example will be optimized if the bodies of class.init and c.x are visible to the compiler—I feel like the important part of this work would be figuring out how to enforce uniqueness. Should we be able to mark a constructor / method as "non-self-escaping"? Will the compiler then enforce that, as it does for non-escaping closure arguments?

As Michael says, I suspect the planned checking for move-only types would extend pretty well to this use case. It's a little unfortunate to tie today's reasonable pitches to a big planned feature, but I also wouldn't want to have several small features that all did similar things with slightly different rules and implementations.

Since SIL already has reasonable conservative escape analysis—your example will be optimized if the bodies of class.init and c.x are visible to the compiler

Unfortunately, this is not the case. Also, I should clarify that EscapeAnalysis is not what we need here. We cannot mark a value "unique" if it is passed as two different parameters to another function, even if it does not escape.
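
As a minimal illustration (MyClass and update are made-up names, not from the original post): the reference below never escapes, yet inside the callee the two parameters may alias, so the allocation cannot be marked unique.

final class MyClass {
  var x = 0
}

func update(_ a: MyClass, _ b: MyClass) {
  a.x += 1
  b.x += 1          // may be the very same object as `a`
}

func caller() {
  let c = MyClass() // local and non-escaping...
  update(c, c)      // ...but passed as two different parameters, so `a` and `b` alias
}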

As Michael says, I suspect the planned checking for move-only types would extend pretty well to this use case. It's a little unfortunate to tie today's reasonable pitches to a big planned feature, but I also wouldn't want to have several small features that all did similar things with slightly different rules and implementations.

This is fair. I think we should be able to perform some sort of optimization on the above example both for move-only classes and normal classes, though.

That might be a nice add-on later, but that is far out of scope for what I want to do here. A user should never know about this change. It is purely to allow the optimizer to have extra information which it can use to optimize reference types.

That's a good point, but it's still something SIL already has the information to do. It already promoted the allocation to the stack, which means it knows it doesn't escape. That's enough to prove that the "instantaneous" access checks (beginAccess with no endAccess) are trivially correct.

I'm not sure it does. Being allocated on the stack doesn't mean it gets the semantics of a local variable (as discussed in #31423).

That's enough to prove that the "instantaneous" access checks (beginAccess with no endAccess) are trivially correct.

What do you mean? Where do you see a begin_access without an end_access? Also, what do you mean by "instantaneous" access checks?

Taking that in opposite order: I suggest checking out the original proposal for exclusive access enforcement, SE-0176, and then taking a look at the generated IR for your example, not just the SIL: Compiler Explorer. (Some of the access optimizations run post-emit-sil.)

But the goal isn't (necessarily) to get these properties to have the semantics of a local variable; it's to prove that the dynamic access checks are unnecessary and thus avoid emitting them. #31423's "mark them as on-stack" was one way to do that, if one that turned out to be flawed, but it's not your goal, just a mechanism.

Thanks. Now I see what you mean. That would fix the problem :)

Yes, you're right. The static access can be optimized away. But not optimizing these early (or at least earlier; that might be a different issue, though) makes it harder to optimize other things. Anyway, as you said earlier, this will be fixed by "instantaneous" access checks.

Yeah, that's fair. I think we're in agreement.

First, we do need some clear motivating examples to discuss. Some test cases are here:

But it isn't clear which ones are solved by the current PR:
https://github.com/apple/swift/pull/32844

And it isn't clear which optimization that PR is improving.

Ideally, I think AccessEnforcementSelection would completely eliminate dynamic checks for locally allocated references based on its own analysis without needing your flag. But it currently doesn't work on references at all, and still only runs very early in the pipeline. In the meantime, some of the other AccessEnforcementOpts will reduce the number of checks with your flag, but probably won't eliminate them completely.

I suspect there are some decent motivating examples, so assuming you can uncover those, I'll proceed to give a lengthy argument for adding the flag...

The alloc_ref [uniq] flag says that a reference is uniquely identified, which primarily implies lack of aliasing. It also happens to imply (CoW) uniqueness at all use points, no additional analysis needed, but that isn't the primary intention.

Aliasing and escaping are different things. Trivially:

@inline(never)
func bar(_ o: MyClass) -> MyClass {
  return o
}
public func foo() {
  let o1 = MyClass()
  o1.x = 1
  let o2 = bar(o1)
  o2.x = 2
}

o1 is a local nonescaping reference, but we cannot treat it as a distinct object from o2.

The immediate goal of the flag is to recognize the base of a memory access as fundamentally "uniquely identified". See AccessedStorage::isUniquelyIdentified. This is important because it allows a storage location to be uniqued by a hash key. This does not preclude any further alias analysis, but it is useful for some well-defined subset of local allocations, and will likely also be helpful for -Onone diagnostics.
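
As a rough, purely illustrative sketch (StorageKey and Node are hypothetical stand-ins, not AccessedStorage): a uniquely identified base lets accesses be grouped under a hashable key, so a map lookup can stand in for pairwise may-alias queries.

// Hypothetical stand-in for an identified storage location.
struct StorageKey: Hashable {
  let base: ObjectIdentifier   // the uniquely identified allocation
  let fieldIndex: Int          // which stored property is accessed
}

final class Node { var value = 0 }

let a = Node()
let b = Node()
var accessCount: [StorageKey: Int] = [:]

// Distinct uniquely identified bases never produce the same key, so bucketing
// by key separates their accesses without any pairwise alias query.
accessCount[StorageKey(base: ObjectIdentifier(a), fieldIndex: 0), default: 0] += 1
accessCount[StorageKey(base: ObjectIdentifier(b), fieldIndex: 0), default: 0] += 1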

Adding an instruction flag goes against my intuition. I almost never want to use the IR as an analysis cache. But let's seriously evaluate this case.

SIL should primarily function as a high-level IR, meaning that important Swift-specific properties should be easily discoverable from the SIL structure whenever possible. If it's easy to discover that some well-defined set of local references are "uniquely identified", then that should be a property of the SIL. It should not require interprocedural global dataflow, as would be expected with a mid-level IR. There is no question that we still need additional analyses to propagate reference uniqueness, but giving alloc_ref a property that supersedes the analysis nicely short-circuits it.

I expect checking if a pointer source is uniquely identified to be increasingly pervasive throughout the compiler. It is checked within utilities that don't have access to analyses. And, whenever this property is true, EscapeAnalysis queries will be cheaper. It's also important that we limit the interprocedural AliasAnalysis and EscapeAnalysis to a few key points in the pipeline.

In other words, just because StackPromotion was able to optimize something doesn't make it a "property of SIL".

Structural SIL properties should be computed on the fly if possible, as long as that doesn't increase algorithmic complexity. For example, if discovering a property only requires walking the use-def chain without handling phis, then that should always be done on the fly. The proposed "unique" property requires analyzing all uses of the alloc_ref. That certainly could be done on the fly, but I have to admit, caching this in a flag will be cheaper, and I do want identifying AccessedStorage to be as cheap as possible--it's already used within quadratic algorithms.

A viable alternative would be to have every pass that uses AccessedStorage, or that otherwise checks if a reference is "uniquely identified", separately compute and cache this property. That's a fair amount of design complexity, and realistically we'll end up missing out on it for some passes.

To meet the bar for adding an instruction flag, we need it to be high-value, low-risk, low-maintenance.

The flag is high value. Identifying unique storage locations enables powerful optimization. I'm expecting @zoecarver to show examples where dynamic exclusivity checks "automatically" disappear for local objects. The only uncertainty I have is how common it really is to have locally allocated references that are never passed as arguments or captured. Note, though, that capturing an inout property of the object does not count as capturing the reference.
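
For instance (a small made-up example, not from the post): only the property's address is passed inout; the reference itself is never passed or captured, so the allocation would still qualify.

final class Counter { var value = 0 }

func bump(_ i: inout Int) { i += 1 }

func makeAndBump() -> Int {
  let c = Counter()  // never passed as an argument, never captured
  bump(&c.value)     // exposes the property inout, not the reference itself
  return c.value
}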

The flag is low-risk. Once set, nothing should invalidate it, and it is easily checked in the SIL verifier with the same logic that sets the flag.

QUESTION: Can anyone think of a SIL transformation that would violate the uniqueness of a local reference?

The flag is low-maintenance. The only cost of this proposal is the implementation of the flag itself: [SIL] Add a "unique" flag to alloc_ref instructions. by zoecarver · Pull Request #32844 · apple/swift · GitHub
It is easily verified with the same logic that originally determines uniqueness.

I'm open to other alternatives, but this flag is potentially a valuable addition to the current SIL infrastructure. Worst case, we get some nice improvements for a while and decide to remove the flag later when every optimization that matters has its own more powerful "unique reference analysis".


This is somewhat moot, but for those who are interested, the EscapeAnalysis public API (canEscapeTo) does not currently determine local reference uniqueness in the situation we want:

class MyClass {
  var x = 0
}
@inline(never)
func bar(_ i: inout Int) {
  i = 2
}
public func foo() {
  let c1 = MyClass()
  let c2 = MyClass()
  bar(&c1.x)
  c2.x = 3
}

Here, @zoecarver's logic easily identifies the locally allocated references as unique. But according to our current AliasAnalysis, which uses EscapeAnalysis, the accesses to c1.x and c2.x appear to alias, and the releases of c1 and c2 both appear to escape both c1.x and c2.x.

Under the hood, there is enough information in the EscapeAnalysis connection graph to determine that only the property escapes, not the object itself. But the public API is not designed to make use of that information.

StackPromotion does work in these cases because it actually peeks under the hood to look at the EscapeAnalysis connection graph (the pass and analysis are tightly coupled).

@Andrew_Trick that was a fantastic reply. Thank you.

First, we do need some clear motivating examples to discuss.

Good idea.

But it isn't clear which ones are solved by the current PR:

Well... none of them. This only implements the flag; it doesn't add a pass to mark references unique or update any existing passes to use that information.

To clear up which PRs/commits do what:

This is an old (and closed) PR that promoted stack-allocated references to have static access enforcement and the "stack" memory kind. But it shows some possible performance improvements that would be nice to have if we can achieve them in other ways.

This PR adds the "unique" flag to alloc_ref. That's it.

The next PR just shows a benchmark of what happens when we remove access markers in the middle of the pass pipeline. It has the same improvements as the first PR but with some added regressions... curious. I think DSE/RLE is the reason for the improvement when accesses are removed. I'm not sure why the first PR doesn't have the regressions.

And last, here's the most recent PR I created. This PR will give us a benchmark of some baseline improvements we will (hopefully) get by incorporating the "unique" information into various access optimization passes. The goal of the PR is to get a benchmark, not to commit anything. So there's a lot going on.

I suspect there are some decent motivating examples, so assuming you can uncover those

Yes, I can work to uncover these. The above PR will hopefully show the performance improvements from updating AccessedStorage::isUniquelyIdentified. I suspect we could get even better results with a custom pass, though. As you said, there are also other improvements here beyond performance, i.e., static access errors instead of runtime errors.

Structural SIL properties should be computed on the fly if possible, as long as that doesn't increase algorithmic complexity. For example, if discovering a property only requires walking the use-def chain without handling phis, then that should always be done on the fly. The proposed "unique" property requires analyzing all uses of the alloc_ref. That certainly could be done on the fly, but I have to admit, caching this in a flag will be cheaper, and I do want identifying AccessedStorage to be as cheap as possible--it's already used within quadratic algorithms.

To be sure a reference is actually unique, I think we have to use a quadratic algorithm: we have to evaluate every use, every use of that use, and so on and so forth. So I think the performance issue is actually quite an important one to keep in mind.
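
A minimal sketch of the kind of transitive use walk being described, with a made-up Use type rather than real SIL APIs: every use is classified, derived values are visited recursively, and anything that could copy or escape the reference defeats uniqueness.

// Hypothetical model of an allocation's uses; not the compiler's real types.
enum Use {
  case read, write        // plain accesses: fine
  case copy, escape       // the reference may now alias: not unique
  case derived([Use])     // a derived value (e.g. a projection) whose own uses must be checked
}

func isProvablyUnique(_ uses: [Use]) -> Bool {
  for use in uses {
    switch use {
    case .read, .write:
      continue
    case .copy, .escape:
      return false
    case .derived(let nested):
      if !isProvablyUnique(nested) { return false }  // every use of that use, and so on
    }
  }
  return true
}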

Another thing that is really neat about this implementation is that, because it doesn't rely on any analysis utils, it can run in the SILVerifier. This means it can run after every pass, which makes debugging extremely easy.

The flag is low-maintenance.

Additionally, if we wanted to remove it for some reason, I don't think there would be much cost to simply "turning it off."


After thinking about this more, there's another place that could have huge performance benefits from this feature. As you mentioned, the CoW "problem" discussed in:

could potentially be resolved using this flag.

Yesterday and today I spent a bit of time playing around with the example in that post to try to figure out if I could use this flag to resolve the issue. I was actually able to get rid of all the copies with:

and manual inlining. But I suspect more complicated examples of the problem that can't be optimized away could be solved by using this flag to essentially say, "if the source argument is unique, don't make a copy here." We might have to have a way to expose this at the language level, though, and that may not be the best, especially considering that ownership changes (move-only types, etc.) are (hopefully) just around the corner.
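
To make the idea concrete, here's a minimal sketch (CoWBox and Storage are illustrative types, not standard-library code): the dynamic isKnownUniquelyReferenced check and the copy behind it are the kind of work a static uniqueness guarantee could let the optimizer skip.

final class Storage {
  var values: [Int]
  init(_ values: [Int]) { self.values = values }
}

struct CoWBox {
  private var storage: Storage
  init(_ values: [Int]) { storage = Storage(values) }

  mutating func set(_ value: Int, at index: Int) {
    // If the source buffer were statically known to be unique here, this
    // whole check-and-copy could be folded away.
    if !isKnownUniquelyReferenced(&storage) {
      storage = Storage(storage.values)
    }
    storage.values[index] = value
  }
}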

Thanks for looking into this, @zoecarver!

In this post, you'll find a proof that the mutating slice CoW problem can't be solved purely with information that is tracked at compile-time. You need one bit of information that is tracked at runtime. So while adding a parameter to alloc_ref instructions might mesh nicely with a solution to that problem, it's neither necessary nor sufficient to solve the problem on its own.

I wonder if the property of uniqueness is really the most useful thing to represent, because it's so easily destroyed: all it takes is a copy. CoW buffers have the property that they're immutable unless uniquely-referenced, a durable property that can be encoded into the type system and used for many of the same optimizations. Oh I see you're mostly concerned with optimizations for types with reference semantics. Not my department, I suppose :wink:

Hmm. I think you're right. However, that all hangs on the fact that the mutation may not be visible:

because the ultimate mutation being applied to the slice may not be visible to the compiler so the extra information can't reach the site of the mutation statically

When the mutation is visible, I think we can still fix it statically. I know that doesn't completely solve the problem, but it's at least a step in the right direction and might be a good mitigation in the meantime.

The hard part of this problem is the phase ordering aspect. DSE, RLE, and AccessEnforcementOpts are costly passes. The requirement for adding instances of costly passes to the pipeline is an understanding of the pass dependencies and justification for running the pass multiple times. No one wants to be in the business of reengineering the pass pipeline, but that's part of the job here. Realistically, the only way to keep the pass pipeline under control is having a high bar for expanding it. So I'm afraid the burden is on us to explain why it's necessary to augment the pipeline rather than simply pointing to the benchmark numbers.

I'm all for steps in the right direction and mitigation of the problem. That said, if I understand what you've described, it depends on having clear visibility all the way from the point of the allocation of the CoW buffer through the whole algorithm doing the mutations. I'm pretty sure that only happens in a small fraction of cases even in optimized, non-resilient code.

One of the first optimizations I asked the performance team for when I was designing our CoW approach was the ability to hoist the check out of a mutating operation. The first write in any mutating algorithm establishes uniqueness and that uniqueness is carried through the whole thing with no need to ever check dynamically again. And again, when the operation doesn't start with uniqueness, the buffer is immutable anyway. So I have a really hard time seeing how the flag you've described is capturing the information needed for that particular problem.
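
A rough sketch of that hoisting idea, with an illustrative CoWInts type that is not real standard-library code: the uniqueness check runs once up front, and every write in the loop that follows can assume a unique buffer.

final class IntBuffer {
  var elements: [Int]
  init(_ elements: [Int]) { self.elements = elements }
}

struct CoWInts {
  private var buffer: IntBuffer
  init(_ elements: [Int]) { buffer = IntBuffer(elements) }

  mutating func incrementAll() {
    // Hoisted check: after this point the buffer is known to be unique, so no
    // write in the loop below needs its own dynamic uniqueness check.
    if !isKnownUniquelyReferenced(&buffer) {
      buffer = IntBuffer(buffer.elements)
    }
    for i in buffer.elements.indices {
      buffer.elements[i] += 1
    }
  }
}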

Another idea is to add another parameter convention. I'm not sure that's a good idea, and I'm sure it would have its own set of problems. But it would probably be an ABI-stable way to fix the issue¹. Anyway, now I've gotten off the original topic. I'll keep thinking about the problem, though :)


¹ It would be ABI-stable if we could guarantee two overloads of the mutating function: one that takes a unique reference, the other being the current function. Depending on whether the reference was unique, we'd choose to call either the function with the unique argument or the current function. To maintain ABI stability, we could simply always call the "old" function. I think this could all be done in SILGen/SILOptimizer, so the user wouldn't have to know the difference.

Fair enough. Then this probably wouldn't be that helpful.

This is true, but I think the problem is more that DSE and RLE have bad (if any) support for access markers. While moving the pass pipeline around might work (we could make it so that we strip access markers before running those passes), I think it would be better to just update the passes that don't support access markers (as a general rule of thumb).

It makes sense that access markers are interfering with those passes. That will be a much bigger problem when we have persistent static access markers, which is an active/ongoing goal. Stripping the static access markers is a problematic hack that I needed for bootstrapping.

You might check if this PR changed the behavior

When RLE/DSE was still in PR form I strongly objected to the way it represented memory access, but was told not to worry because it was just a prototype and would be fixed later :) I've said that myself many times so should know better. I'm in the middle of generalizing and cleaning up MemAccessUtils so it can be used by passes like RLE/DSE to reduce the time and space complexity. When that happens, those passes would automatically benefit from your alloc_ref uniqueness.

I'm still personally undecided whether it's best for alloc-uniqueness to be an analysis or a flag, but haven't heard any practical argument against the flag.

@zoecarver, sorry I conflated two concepts in my big wordy post above. Let me define these terms:

uniquely identified: two pointers with distinct uniquely identified bases cannot alias.

uniquely referenced: cannot alias with any other reference in the same function

unique SSA reference: a unique OSSA reference cannot alias with another reference within its OSSA lifetime. But it may be derived from a non-unique reference. We can also make use of this in non-OSSA, but there's no reference lifetime... instead the value is just unique up to any operation that's not known copy-safe.

%r1 = alloc_ref       // %r1 is uniquely identified and uniquely referenced
ref_element_addr %r1  // normal uses
destroy_value %r1     // lifetime ends

%r2 = alloc_ref       // %r2 is uniquely identified, not uniquely referenced
%r3 = copy_value %r2  // %r3 is neither uniquely identified nor uniquely referenced

%r4 = end_cow_mutation // unique SSA reference
// can't alias with another reference before it's destroyed
destroy_value %r4

/cc @Erik_Eckstein since I haven't gotten around to reviewing all the CoW PRs yet. Just assuming we're on the same page.


Now, I looked into your issue with isUniquelyIdentified() and think it's much easier to fix your immediate problem; sorry if I misled you.

AccessedStorage::isUniquelyIdentified() can return true whenever isa<AllocRefInst>(getObject()).

This has nothing to do with escaping or stack allocation.

We could still add an alloc_ref flag and add an AccessedStorage::isUniquelyReferenced() API that checks it. Then AccessedStorage::isDistinct would be true for:

getKind() == Class && other.getKind() == Class
  && (isUniquelyReferenced() || other.isUniquelyReferenced())
  && !hasIdenticalBase(other)

I'm not sure you're hitting that case in practice though.

A more powerful API would be an isUniqueReference() that is specific to an SSA def-use chain, possibly combined with a flow-sensitive analysis.

%r4 = end_cow_mutation  // unique SSA reference
apply (@guaranteed %r4) // known not to copy its argument
// %r4 is still unique here
destroy_value %r4
// It no longer "matters" whether %r4 aliases anything

But that API doesn't belong in AccessedStorage, because AccessedStorage properties should hold true at least throughout the current function.

At that point, you don't really need the alloc_ref flag anymore, since you might as well fold its discovery into the same analysis.

I could potentially see adding both an alloc_ref uniquely identified flag and a "uniqueness preserving" flag on SILFunctionArgument, since that would be much more powerful than the alloc_ref flag alone, and would still be easy to check in the SILVerifier. But I can't justify that unless there ends up being a widespread need to call "isUniquelyReferenced" and we don't want to recompute an interprocedural analysis each time.