Adding a "unique" flag to alloc_ref

I'm not sure it does. Being allocated on the stack doesn't mean it gets the semantics of a local variable (as discussed in #31423).

That's enough to prove that the "instantaneous" access checks ( beginAccess with no endAccess ) are trivially correct.

What do you mean? Where do you see a begin_access without an end_access? Also, what do you mean by "instantaneous" access checks?

Taking that in opposite order: I suggest checking out the original proposal for exclusive access enforcement, SE-0176, and then taking a look at the generated IR for your example, not just the SIL: Compiler Explorer. (Some of the access optimizations run post-emit-sil.)

But the goal isn't (necessarily) to get these properties to have the semantics of a local variable; it's to prove that the dynamic access checks are unnecessary and thus avoid emitting them. #31423's "mark them as on-stack" was one way to do that, if one that turned out to be flawed, but it's not your goal, just a mechanism.

Thanks. Now I see what you mean. That would fix the problem :)

Yes, you're right. The static access is able to be optimized away. But, not optimizing these early (or at least earlier, might be a different issue, though) makes it harder to optimize other things. Anyway, as you said earlier, this will be fixed by "instantaneous" access checks.

Yeah, that's fair. I think we're in agreement.

1 Like

First, we do need some clear motivating examples to discuss. Some test cases are here:

But it isn't clear which ones are solved by the current PR:
https://github.com/apple/swift/pull/32844

And it isn't clear which optimization that PR is improving.

Ideally, I think AccessEnforcementSelection would completely eliminate dynamic checks for locally allocated references based on its own analysis without needing your flag. But it currently doesn't work on references at all, and still only runs very early in the pipeline. In the meantime, some of the other AccessEnforcementOpts will reduce the number of checks with your flag, but probably won't eliminate them completely.

I suspect there are some decent motivating examples, so assuming you can uncover those, I'll proceed to give a lengthy argument for adding the flag...

The alloc_ref [uniq] flag says that a reference is uniquely identified, which primarily implies lack of aliasing. It also happens to imply (CoW) uniqueness at all use points, no additional analysis needed, but that isn't the primary intention.

Aliasing and escaping are different things. Trivially:

@inline(never)
func bar(_ o: MyClass) -> MyClass {
  return o
}
public func foo() {
  let o1 = MyClass()
  o1.x = 1
  let o2 = bar(o1)
  o2.x = 2
}

o1 is a local nonescaping reference, but we cannot treat it as a distinct object from o2.

The immediate goal of the flag is to recognize the base of a memory access as fundamentally "uniquely identified". See AccessedStorage::isUniquelyIdentified. This is important because it allows a storage location to be uniqued by a hash key. This does not preclude any further alias analysis, but it is useful for some well-defined subset of local allocations, and will likely also be helpful for -Onone diagnostics.

Adding an instruction flag goes against my intuition. I almost never want to use the IR as an analysis cache. But let's seriously evaluate this case.

SIL should primarily function as a high-level IR, meaning that important Swift-specific properties should be easily discoverable from the SIL structure whenever possible. If it's easy to discover that some well-defined set of local references are "uniquely identified", then that should be a property of the SIL. It should not require interprocedural global dataflow, as would be expected with a mid-level IR. There is no question that we still need additional analyses to propagate reference uniqueness, but giving alloc_ref a property that supersedes the analysis nicely short-circuits it.

I expect checking if a pointer source is uniquely identified to be increasingly pervasive throughout the compiler. It is checked within utilities that don't have access to analyses. And, whenever this property is true, EscapeAnalysis queries will be cheaper. It's also important that we limit the interprocedural AliasAnalysis and EscapeAnalysis to a few key points in the pipeline.

In other words, just because StackPromotion was able to optimize something doesn't make it a "property of SIL".

Structural SIL properties should be computed on-the-fly if possible as long as that doesn't increase algorithmic complexity. For example, if discovering a property only requires walking the use-def chain without handling phis, then that should always be done on-the-fly. The proposed "unique" property requires analyzing all uses if the alloc_ref. That certainly could be done on the fly, but I have to admit, caching this in a flag will be cheaper, and I do want identifying AccessStorage to be as cheap as possible--it's already used within quadratic algorithms.

A viable alternative would be to have every pass that uses AccessedStorage or that otherwise checks if a reference is "uniquely identified" separately compute and cache this property. That's a fair amount of design complexity and realistically we'll end up missing out on it for a some passes.

To meet the bar for adding an instruction flag, we need it to be high-value, low-risk, low-maintenance.

The flag is high value. Identifying unique storage locations enables powerful optimization. I'm expecting @zoecarver to show examples where dynamic exclusivity checks to "automatically" disappear for local objects. The only uncertainty I have is how common it really is to have locally allocated references that are never passed as arguments or captured. Note, though that capturing an inout property of the object does not count as capturing the reference.

The flag is low-risk. Once set, nothing should invalidate it, and it is easily checked in the SIL verifier with the same logic that sets the flag.

QUESTION: Can anyone think of a SIL transformation that would violate the uniqueness of a local reference?

The flag is low-maintenance. The only cost of this proposal is the implementation of the flag itself: [SIL] Add a "unique" flag to alloc_ref instructions. by zoecarver · Pull Request #32844 · apple/swift · GitHub
It it easily verified with the same logic that originally determines uniqueness.

I'm open to other alternatives, but this flag is potentially a valuable addition to the current SIL infrastructure. Worst case, we get some nice improvements for a while and decide to remove the flag later when every optimization that matters has its own more powerful "unique reference analysis".


This is somewhat moot, but for those who are interested, the EscapeAnalysis public API (canEscapeTo) does not currently determine local reference uniqueness in the situation we want:

class MyClass {
  var x = 0
}
@inline(never)
func bar(_ i: inout Int) {
  i = 2
}
public func foo() {
  let c1 = MyClass()
  let c2 = MyClass()
  bar(&c1.x)
  c2.x = 3
}

Here, @zoecarver 's logic easily identifies the locally allocated references as unique. But according to our current AliasAnalysis, which uses EscapeAnalysis, the accesses to c1.x and c2.x appear to alias and the release of c1 and c2 both appear to escape both c1.x and c2.x.

Under the hood, there is enough information in EscapeAnalysis connection graph to determine that only the property escapes, not the object itself. But the public API is not designed to make use of that information.

StackPromotion does work in these cases because it actually peeks under the hood to look at the EscapeAnalysis connection graph (the pass and analysis are tightly coupled).

3 Likes

@Andrew_Trick that was a fantastic reply. Thank you.

First, we do need some clear motivating examples to discuss.

Good idea.

But it isn't clear which ones are solved by the current PR:

Well... none of them. This only implements the flag, it doesn't add a pass to mark references unique, or update any existing passes to use that information.

To clear up which PRs/commits do what:

This is an old (and closed) PR that promoted stack-allocated references to have static access enforcement and "stack" memory kind. But, it shows some possible performance improvements that would be nice to have if we can achieve them in other ways.

This PR adds the "unique" flag to alloc_ref. That's it.

The next PR just shows a benchmark of what happens when we remove access markers in the middle of the pass pipeline. It has the same improvements as the first PR but with some added regressions... curious. I think DSE/RLE is the reason for the improvement when accesses are removed. I'm not sure why the first PR doesn't have the regressions.

And last, here's the most recent PR I created. This PR will give us a benchmark of some baseline improvements we will (hopefully) get by incorporating the "unique" information into various access optimization passes. The goal of the PR is to get a benchmark, not to commit anything. So there's a lot going on.

I suspect there are some decent motivating examples, so assuming you can uncover those

Yes, I can work to uncover these. The above PR will hopefully show the performance improvements from updating AccessedStorage::isUniquelyIdentified. I suspect we could get even better results with a custom pass, though. As you said, there are also other improvements here beyond performance. I.e., static access errors instead or runtime errors.

Structural SIL properties should be computed on-the-fly if possible as long as that doesn't increase algorithmic complexity. For example, if discovering a property only requires walking the use-def chain without handling phis, then that should always be done on-the-fly. The proposed "unique" property requires analyzing all uses if the alloc_ref. That certainly could be done on the fly, but I have to admit, caching this in a flag will be cheaper, and I do want identifying AccessStorage to be as cheap as possible--it's already used within quadratic algorithms.

To be sure a reference is actually unique, I think we have to use a quadratic algorithm, we have to evaluate every use and every use of that use and so on and so forth. So, I think the performance issue is actually quite an important one to keep in mind.

Another thing that is really neat about this implementation is that because it doesn't rely on any analysis utils, it can run in the SILVerifier. This means it can run after every pass which makes debugging extremely easy.

The flag is low-maintenance.

Additionally, if we wanted to remove it for some reason, I don't think there would be much cost to simply "turning it off."


After thinking about this more, there's another place that could have huge performance benefits from this feature. As you mentioned, the CoW "problem" discussed in:

could potentially be resolved using this flag.

Yesterday and today I spent a bit of time playing around with the example in that post to try to figure out if I could use this flag to resolve the issue. I was actually able to get rid of all the copies with:

and manual inlining. But, I suspect more complicated examples of the problem that aren't able to be optimized away could be solved by using this flag to essentially say, "if the source argument is unique, don't make a copy here." We might have to have a way to expose this on the language level, though, and that may not be the best. Especially considering that ownership changes (move only types, etc.) are (hopefully) just around the corner.

Thanks for looking into this, @zoecarver!

In this post, you'll find a proof that the mutating slice CoW problem can't be solved purely with information that is tracked at compile-time. You need one bit of information that is tracked at runtime. So while adding a parameter to alloc_ref instructions might mesh nicely with a solution to that problem, it's neither necessary nor sufficient to solve the problem on its own.

I wonder if the property of uniqueness is really the most useful thing to represent, because it's so easily destroyed: all it takes is a copy. CoW buffers have the property that they're immutable unless uniquely-referenced, a durable property that can be encoded into the type system and used for many of the same optimizations. Oh I see you're mostly concerned with optimizations for types with reference semantics. Not my department, I suppose :wink:

Hmm. I think you're right. However, that all hangs on the fact that the mutation may not be visible:

because the ultimate mutation being applied to the slice may not be visible to the compiler so the extra information can't reach the site of the mutation statically

When the mutation is visible, I think we can still fix it statically. I know that doesn't completely solve the problem but it's at least a step in the right direction and might be a good mitigation in the mean time.

The hard part of this problem is the phase ordering aspect. DSE, RLE, and AccessEnforcementOpts are costly passes. The requirement for adding instances of costly passes to the pipeline is an understanding of the pass dependencies and justification for running the pass multiple times. No one wants to be in the business of reengineering the pass pipeline, but that's part of the job here. Realistically, the only way to keep the pass pipeline under control is having a high bar for expanding it. So I'm afraid the burden is on us to explain why it's necessary to augment the pipeline rather than simply pointing to the benchmark numbers.

I'm all for steps in the right direction and mitigation of the problem. That said, if I understand what you've described, it depends on having clear visibility all the way from the point of the allocation of the CoW buffer through the whole algorithm doing the mutations. I'm pretty sure that only happens in a small fraction of cases even in optimized, non-resilient code.

One of the first optimizations I asked the performance team for when I was designing our CoW approach was the ability to hoist the check out of a mutating operation. The first write in any mutating algorithm establishes uniqueness and that uniqueness is carried through the whole thing with no need to ever check dynamically again. And again, when the operation doesn't start with uniqueness, the buffer is immutable anyway. So I have a really hard time seeing how the flag you've described is capturing the information needed for that particular problem.

Another idea is to add another parameter convention. I'm not sure that's a good idea, and I'm sure it would have its own set of problems. But it would probably be an ABI stable way to fix the issue ¹. Anyway, now I've gotten off the original topic. I'll keep thinking about the problem, though :)


¹ It would be ABI stable if we could guarantee two overloads of the mutating function. One that takes a unique reference and the other would be the current function. Depending on whether the reference was unique, we'd either choose to call the function with the unique argument or the current function. To maintain ABI stability, we could simply always call the "old" function. I think this could all be done in SILGen/SILOptimizer so, the user wouldn't have to know the difference.

Fair enough. Then this probably wouldn't be that helpful.

This is true, but I think the problem is more that DSE and RLE have bad (if any) support for access markers. While moving the pass pipeline around might work because we could make it so that we strip access markers before running those passes, I think it would be better to just update the passes that don't support accesses markers (as a general rule of thumb).

It makes sense that access markers are interfering with those passes. That will be a much bigger problem when we have persistent static access markers, which is an active/ongoing goal. Stripping the static access markers is a problematic hack that I needed for bootstrapping.

You might check if this PR changed the behavior

When RLE/DSE was still in PR form I strongly objected to the way it represented memory access, but was told not to worry because it was just a prototype and would be fixed later :) I've said that myself many times so should know better. I'm in the middle of generalizing and cleaning up MemAccessUtils so it can be used by passes like RLE/DSE to reduce the time and space complexity. When that happens, those passes would automatically benefit from your alloc_ref uniqueness.

I'm still personally undecided whether it's best for alloc-uniqueness to be an analysis or a flag, but haven't heard any practical argument against the flag.

@zoecarver, sorry I conflated two concepts in my big wordy post above. Let me define these terms:

uniquely identified: two pointers with distinct uniquely identified bases cannot alias.

uniquely referenced: cannot alias with any other reference in the same function

unique SSA reference: a unique OSSA reference cannot alias with another reference within its OSSA lifetime. But it may be derived from a non-unique reference. We can also make use of this in non-OSSA, but there's no reference lifetime... instead the value is just unique up to any operation that's not known copy-safe.

%r1 = alloc_ref       // %r1 is uniquely identified and uniquely referenced
ref_element_addr %r1  // normal uses
destroy_value %r1     // lifetime ends

%r2 = alloc_ref       // %r2 is uniquely identified, not uniquely referenced
%r3 = copy_value %r1  // %r3 is neither uniquely identified nor uniquely referenced

%r4 = end_cow_mutation // unique SSA reference
// can't alias with another reference before it's destroyed
destroy_value %4

/cc @Erik_Eckstein since I haven't gotten around to reviewing all the CoW PRs yet. Just assuming we're on the same page.


Now, I looked into your issue with isUniquelyIdentified() and think it's much easier to fix your immediate problem; sorry if I misled you.

AccessedStorage::isUniquelyIdentified() can return true whenever isa<AllocRefInst>(getObject()).

This has nothing to do with escaping or stack allocation.

We could still add an alloc_ref flag and add an AccessedStorage::isUniquelyReferenced() API that checks it. Then AccessedStorage::isDistinct would be true for:

getKind() == Class && other.getKind() == Class
&& this->isUniquelyReferenced() || other->isUniquelyReferenced()
&& !this->hasIdenticalBase(other)

I'm not sure you're hitting that case in practice though.

A more powerful API would be an isUniqueReference() that is specific to an SSA def-use chain, possibly combined with a flow-sensitive analysis.

%r4 = end_cow_mutation // unique SSA reference
apply (@guaranteed %4) // known not to copy its argument
// %4 is still unique here
destroy_value %4
// It no longer "matters" of %4 aliases anything

But that API doesn't belong in AccessedStorage, because AccessedStorage properties should hold true at least throughout the current function.

At that point, you don't really need the alloc_ref flag anymore, since you might as well fold it's discovery into the same analysis.

I could potentially see adding both an alloc_ref uniquely identified flag and a "uniqueness preserving" flag on SILFunctionArgument since that would be much more powerful that the alloc_ref flag alone, and would still be easy to check in SILVerifier. But I can't justify that unless there ends up being a widespread need to call "isUniquelyReferenced" and we don't want to recompute interprocedural analysis each time.

@Andrew_Trick before we make more progress on this, I have one remaining concern: I want to make sure this doesn't impact anything about what @John_McCall is exploring. If this will fit in with what he wants to do, then I am fine with taking this. I just don't want to put this in and have it not gel with that.

@Michael_Gottesman @John_McCall I'll address that, as briefly as I can ;)

We may want to create a user model (for various reasons) that allows us to constrain a particular reference to be unique (in the "unique SSA reference" sense that I mentioned above). In other words, the reference is guaranteed to be unique at all of its use points, but otherwise may alias. None of the "flags" we're talking about so far in this thread constitute a unique reference constraint.

First consider function arguments. We actually need two properties:

  1. A unique constraint as part of the function type. Only unique references can be passed to that argument.

  2. A uniqueness-preserving property that results from some -Onone level interprocedural analysis. Unique references could potentially be passed to these even if they don't have a unique constraint. Think support for no-escape closures.

In this thread, we've only been discussing problem #2. And the only reason we could consider representing the uniqueness-preserving property as a SILFunctionArgument flag is that it's easy to verify and nothing should invalidate it, but that's really just an implementation detail.

An interesting question is whether the unique constraint #1 should also imply #2. I'm not sure that's possible, because we may need a way to opt-out of the constraint with unsafe code.

In addition to constraining function arguments, I expect the type system to tell us some day that allocating certain classes requires uniqueness, that loading certain properties requires uniqueness, or that a particular variable declaration was annotated "unique".

Additionally, we need to be able to dynamically impose a uniqueness constraint after performing the runtime check in cases where we can't statically prove the incoming reference is unique.

This is different from what end_cow_mutation does today, which simply asserts uniqueness, but still allows the reference to be copied later. Let's generalize the end_cow_mutation semantics and call it an assert_unique operation.

Let's call an operation that imposes the uniqueness constraint constrain_unique. For the sake of discussion we can reduce the problem of implementing all the constraints I mentioned above to emitting constrain_unique instructions.

To diagnose constraint_unique, we need the same basic analysis that we're talking about in this thread, although we would only do it in OSSA which makes the analysis simpler and much more powerful. @zoecarver originally wanted this in non-OSSA, but I think we've found other ways to workaround the issues in the short term. So if we add any sort of uniqueness-preserving analysis, I now think we should just focus on OSSA.

What information would constrain_unique provide to optimization passes?

Uniquely identified: No. We can't really say anything about the origin of the reference.

Uniquely referenced: No. It can be derived from a non-unique reference.

Unique SSA reference: Yes, constrain_unique implies assert_unique, so we can prove that the reference is still unique at its use points, and for example we can remove any redundant uniqueness checks. At least this would work in OSSA as long as any unsafe opt-out is a consuming operation.

1 Like

Agreed, I'm not sure this would be necessary.

At that point, you don't really need the alloc_ref flag anymore, since you might as well fold it's discovery into the same analysis.

I'm thinking this flag is needed less and less but...

I could potentially see adding both an alloc_ref uniquely identified flag and a "uniqueness preserving" flag on SILFunctionArgument since that would be much more powerful that the alloc_ref flag alone, and would still be easy to check in SILVerifier. But I can't justify that unless there ends up being a widespread need to call "isUniquelyReferenced" and we don't want to recompute interprocedural analysis each time.

I think this would be extremely powerful and would probably fix @dabrahams CoW issue, maybe even without breaking ABI. Right now, this seems like the best argument for keeping this flag (and adding another flag for SILFunctionArguments).

  1. A unique constraint as part of the function type. Only unique references can be passed to that argument.

We could even have two overloads of this function so that if only one was present (i.e before a certain version of swift) then we could use that one. This is how we would fix the problem without breaking ABI.

An interesting question is whether the unique constraint #1 should also imply #2. I'm not sure that's possible, because we may need a way to opt-out of the constraint with unsafe code.

I agree. I don't think this is possible, especially without exposing this at a language level.

Additionally, we need to be able to dynamically impose a uniqueness constraint after performing the runtime check in cases where we can't statically prove the incoming reference is unique.

Hmm. I think we should aim not to have to do this. Maybe just for debugging. But, in theory, if the argument says that the incoming reference has uniqueness constrained, we should take that at face value. It is then the responsibility of the caller to enforce this, which they should be able to do statically. Even if these functions don't have visibility into each other, this should work. And it should all be statically verifiable.

Let me outline what I'm thinking with "unique SSA reference arguments." Still just brainstorming. But, if there's interest in further investigation, I'm happy to put together a sample implementation.

// Module 1
sil @mutator_for_unique(@unique %ref) {
  // Mutate ref in place.
}

sil @mutator_old(%ref) {
  // Copy then mutate.
}

// Module 2
sil @test1() {
  %r = end_cow_mutation
  // When both are known to exist:
  apply @mutator_for_unique(%r)
  // When module 1 was built before this feature existed:
  apply @mutator_old(%r)
}

sil @test2() {
  %r = unkown
  // Either way:
  apply @mutator_old(%r)
}

I'm thinking this would be automatically generated by the compiler. So the user would just write something like:

var r = Class()
mutator(r)

Now, after the apply, those references are no longer unique. That could be fixed with another argument flag, though.

How would the necessary dynamic bit be communicated without any change to the ABI?

That's what I mean. There wouldn't be a dynamic bit. This would all be static. It would only benefit code built after this feature lands but, it wouldn't need to have visibility through the various functions.

If a reference could be proved to be a "unique SSA reference." Then it would call out to the "new" function overload that takes a unique SSA reference. It doesn't matter if it had visibility into that function or not.