Representing "address-only" values in SIL

Andy Trick and I had this conversation Thursday, and I thought I'd capture it here.

The Problem

It's a longstanding complaint that SIL uses drastically different code patterns for the same sequence of operations based on whether the types of the values involved are loadable or "address-only". For example, suppose I have Swift code like this:
  let x, y : T
  let z = x + y

If T is a loadable type, this will generate SIL somewhat like this:
  // %x and %y are values of type $T
  %lhs = copy_value %x
  %rhs = copy_value %y
  %operator = function_ref T.+
  %result = apply %operator(%lhs, %rhs)
  %z = %result

(copy_value doesn't currently exist, but it's easier to understand, and as it happens we're thinking of adding it back.)

If T is an address-only type, this will generate SIL somewhat like this:
  // %x and %y are values of type $*T
  %z = alloc_stack $T
  %lhs = alloc_stack $T
  copy_addr %x to [initialization] %lhs
  %rhs = alloc_stack $T
  copy_addr %y to [initialization] %rhs
  %operator = function_ref T.+
  apply %operator(%z, %lhs, %rhs)
  dealloc_stack %rhs
  dealloc_stack %lhs

Notably, we're explicitly modeling the memory in which values are stored, which is very verbose and — more importantly — loses any interesting SSA properties for tracking the actual values. And this has a bunch of secondary effects where various high-level operations like dynamic casts end up needing two different instructions based on whether the value is stored in memory. This is pretty dumb, and it's a major contributor to the reality that generic code is currently very poorly optimized.
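
For instance, a dynamic cast today is spelled with one instruction when the operand is a loadable SSA value and with a different, address-based instruction when it lives in memory. A rough sketch, with made-up types and only approximate spellings:

  // loadable: the cast operates directly on SSA values
  %derived = unconditional_checked_cast %base : $Base to $Derived

  // address-only: a separate instruction reads the source buffer and
  // initializes the destination buffer
  unconditional_checked_cast_addr %src : $*Base to %dest : $*Derived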

It does, however, have some significant advantages: since the memory allocation is explicit, it's quite natural to express optimizations that e.g. hoist or sink those allocations, and the next level of lowering (IRGen) can be very simple.

Addresses and address-only types

Swift is an imperative language, and imperative languages make formal memory locations part of the high-level semantics. DI and SSA formation allow us to eliminate formal memory locations for most local variables, but class properties, global variables, escaped local variables, and pointers are all fundamentally un-SSA-able. Therefore we will always have some concept of an address.
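
For example (a hypothetical function for illustration), a local variable that escapes into a closure has to keep a formal memory location no matter how good SSA formation is:

  func makeCounter() -> () -> Int {
    var count = 0
    // 'count' is captured by reference and outlives this call, so it needs
    // a real memory location (a box); it can't be reduced to an SSA value.
    return { count += 1; return count }
  }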

But most values of address-only type aren't really being stored in that sort of formal memory location; we're just representing them that way in SIL. Why do we do this? What is it about a type that makes it address-only?

Well, some types are inherently "memory-pinned": something about their representation only makes sense, or is only implementable, if the value is always kept in memory:
  - The representation of the value may involve interior pointers, as with LLVM's SmallVector. This isn't currently a thing in Swift, but it's a possibility someday with the import of non-POD C++ types.
  - The address of the value may need to be registered elsewhere, as with weak references.
  - The value may allow internal mutation even from a shared reference, like a C++ class with a mutable field or a Rust atomic type; you can see weak references as analogous to this.
Such types are necessarily address-only at a low level.
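
As a concrete illustration of the weak-reference case above (hypothetical types): a struct with a weak stored property has to stay addressable so the runtime can find, and eventually clear, the reference it registered:

  class Listener {}

  struct EventSource {
    // The runtime's bookkeeping for this reference is tied to the field's
    // address, so values of EventSource have to live in memory rather than
    // be shuffled around freely; the type is address-only.
    weak var listener: Listener?
  }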

Other types are address-only by abstraction: their representation isn't (fully) known, and so they have to be kept in memory (1) in case that representation includes another address-only value and (2) because there's no way to store an unbounded amount of data except in memory anyway.
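
A generic function is the simplest example of the abstraction case (hypothetical code): within its body, T's size and layout are unknown, so today the parameter and result are handled indirectly, through memory and the value witness table:

  // T is address-only by abstraction here: the caller might pass an Int or
  // a huge struct, so SILGen currently receives 'value' and delivers the
  // result via addresses.
  func identity<T>(_ value: T) -> T {
    return value
  }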

But this sense of address-only types that we're describing is really a property of the final generated code. It's necessary for LLVM IR generation to know about it, but it's not obviously necessary for SIL to know about it beyond the implications of the inherently memory-pinned cases above:
  - it is not generally safe to assume that the stored properties of a non-POD C++ type remain invariant across moves
  - weak references *do* represent the same value across moves, but of course that value can change dynamically at any time anyway, per the rules of weak references
  - mutable fields can be modified even by a shared borrowed reference, and so (if they are modeled in SIL at all, rather than just leaving the type opaque) there must be some way to project a mutable address from a shared borrow, and so on.

Our current representation of address-only types arises directly from the low-level operations that are required, which does admit some interesting optimizations on its own, but the disadvantages of having to support wildly divergent code paths and completely give up SSA use/def chains are crippling. What would SIL look like if we abandoned this entirely?

All types as SIL scalars

The fundamental issue here is that IR-level lowering does need to place memory-pinned values into memory. Suppose we take a value, use it as an enum case payload, inject that enum into an Any, and pass that to a function. In a future world where we consistently SSA all local values, this ideally looks something like this:

  // %value is a $T
  %enum = enum #MyEnum.foo, %value : $T
  %any = existential $Any, %enum
  %fn = function_ref @bar
  apply %fn(%any)

If all of these types were loadable, lowering this to IR wouldn't impose any abstraction costs, because we could just manipulate the values as bit-patterns in memory, which LLVM (really, any reasonable compiler infrastructure) should be quite good at analyzing and optimizing. But if all of these types are address-only, that's not obviously true. Let's look at the details of lowering to memory.

Because all types are movable — even in a world with Rust-style ownership — there's a fairly straightforward lowering that works for all possible functions: every address-only SIL value gets its own allocation which is deallocated along the dominance frontier of the value. Indirect arguments use the passed-in buffer. Basic block arguments are copied from the allocation for the branch argument value to the allocation for the corresponding destination block parameter. Indirect return values are copied from the allocation for the return value to the pointer passed in.
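
Concretely, a naive, copy-based version of this lowering would turn the enum/existential example above into something like the following (the buffer names are made up, and the instruction spellings are only approximate):

  // one stack allocation per address-only SSA value
  %value_buf = alloc_stack $T          // %value is produced into this buffer
  %enum_buf = alloc_stack $MyEnum
  %any_buf = alloc_stack $Any

  // build the enum in its own buffer, copying the payload in
  %payload_addr = init_enum_data_addr %enum_buf : $*MyEnum, #MyEnum.foo
  copy_addr %value_buf to [initialization] %payload_addr
  inject_enum_addr %enum_buf : $*MyEnum, #MyEnum.foo

  // then copy the whole enum into the existential's buffer
  %opened = init_existential_addr %any_buf : $*Any, $MyEnum
  copy_addr %enum_buf to [initialization] %opened

  %fn = function_ref @bar
  apply %fn(%any_buf)

  // clean up the now-redundant intermediate copies
  dealloc_stack %any_buf
  destroy_addr %enum_buf
  dealloc_stack %enum_buf
  destroy_addr %value_buf
  dealloc_stack %value_buf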

This admits some obvious optimizations. If "owned" SIL values are pseudo-linear — i.e. along every path from their definition point, it is statically evident that they are (optionally) borrowed multiple times, consumed exactly once, and then never used again — then the copies can instead be moves. Standard register-allocation algorithms can recognize when basic block parameters can use the same location as their arguments. SIL only permits a single return instruction, so there's no reason not to allocate returned values directly into the return slot. These optimizations will tend to eliminate a lot of moves.
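
To spell out what "pseudo-linear" means for a single owned value, here is a sketch in a hypothetical borrow/consume notation:

  // %x is an owned value of type $T
  %borrow = begin_borrow %x
  apply %inspect(%borrow)      // non-consuming uses go through borrows
  end_borrow %borrow
  apply %consume(%x)           // exactly one consuming use ends the lifetime;
                               // no use of %x may follow it on any path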

However, there's another problem, which is injections. Consider the example above. %any is an opaque existential, and initializing it formally involves allocating a buffer within the existential and moving the argument into that buffer. Ideally, we would allocate that buffer first and then simply build %enum in place in that buffer. In this simple example, that's easy. But if the control flow is more complex, detecting that this is possible becomes significantly more challenging, as does ensuring that the buffer is properly cleaned up along all paths. For example, suppose that %value were computed by calling a throwing function:

  try_apply %someFunction() normal %cont, unwind %handler
cont(%value: $T):
  %enum = enum #MyEnum.foo, %value : $T
  %any = existential $Any, %enum
  %fn = function_ref @bar
  apply %fn(%any)
handler(%error: $Error):
  throw %error

Naive allocation here is going to introduce a lot of moves. Optimally, we would receive the return value from %someFunction directly in the payload of %enum, which we want to build directly into the allocated existential buffer of %any. But to do this, we actually need to allocate that existential buffer before executing the try_apply; and if the try_apply throws, we need to deallocate that existential buffer in the handler block. The need to retroactively insert this kind of clean-up code adds a lot of complexity to this allocation approach. Moreover, it's quite possible that complex intermediate control flow — for example, a loop somewhere between the definition of a value and its consuming use — will tend to block this kind of analysis and cause more unnecessary moves.
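
To make the clean-up problem concrete, the "good" allocation for this example has to look roughly like the following, with the buffer allocated before the call and a retroactively inserted deallocation on the error path (names are made up; spellings approximate):

  %any_buf = alloc_stack $Any
  %enum_addr = init_existential_addr %any_buf : $*Any, $MyEnum
  %payload_addr = init_enum_data_addr %enum_addr : $*MyEnum, #MyEnum.foo
  // the callee writes its indirect result straight into the enum payload
  try_apply %someFunction(%payload_addr) normal %cont, unwind %handler
cont:
  inject_enum_addr %enum_addr : $*MyEnum, #MyEnum.foo
  %fn = function_ref @bar
  apply %fn(%any_buf)
  dealloc_stack %any_buf
handler(%error: $Error):
  // clean-up inserted for the early exit: the buffer exists but was never
  // fully initialized, so tear down the existential and free the memory
  deinit_existential_addr %any_buf : $*Any
  dealloc_stack %any_buf
  throw %error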

In contrast, the current SIL-generation scheme tends to be aware of at least the immediate local uses of values and therefore emits a lot of these kinds of initialization "in-place" already. But that said, it does proceed from a simple recursive examination of the AST, which means it will miss examples that are just slightly more opaque, like binding a return value as a let and only then returning it. That kind of thing is currently left to the optimizer for no real good reason.
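
For example (a hypothetical function), the let binding below is exactly the kind of slightly-more-opaque case that defeats the current scheme:

  func wrapped<T>(_ make: () -> T) -> T {
    // The call result is bound to a 'let' and only then returned, so SILGen
    // emits it into its own temporary instead of initializing the caller's
    // return buffer directly; cleaning that up is left to the optimizer.
    let result = make()
    return result
  }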

Summary

Overall, I think representing local address-only values as SIL scalars with full SSA use/def chains is really promising, and I do think we can write an allocation pass that does an acceptable job eliminating unnecessary moves. In order to actually do this, though, I think we need two things:
  - We need SIL to be "pseudo-linear" as discussed above. We really don't want the allocation pass to have to worry about keeping values alive past consumptions, and thus potentially having to insert copies instead of moves.
  - We need the allocation pass to handle exits before initialization (like with try_apply) and other sorts of interfering control flow. It will not be acceptable for this to be a block-local analysis with only a trivial amount of cross-block argument merging.

John.

I feel like moving in this direction is the right thing to do. Some random comments below:

But this sense of address-only types that we're describing is really a property of the final generated code. It's necessary for LLVM IR generation to know about it, but it's not obviously necessary for SIL to know about it beyond the implications of the inherently memory-pinned cases above:
  - it is not generally safe to assume that the stored properties of a non-POD C++ type remain invariant across moves

Even if we want to accommodate ill-behaved C++ value types in the future, it seems to me that we could avoid penalizing the common case by making "WellBehavedValueSemantics" another opt-out type constraint, in the vein of "Copyable" in the move-only types model. If we do want to accommodate C++ types with semantically loaded move or copy operations in Swift, we'll probably want to make more C++-like guarantees about exactly when move, copy, and assign operations are performed on values of those types, but I don't think we want generic code to be subject to those constraints by default. If an address-only SIL representation ends up being fundamentally necessary to semantically model ill-behaved C++ types, we'd also want to avoid having to emit the SIL for all generics that way.

  - weak references *do* represent the same value across moves, but of course that value can change dynamically at any time anyway, per the rules of weak references
  - mutable fields can be modified even by a shared borrowed reference, and so (if they are modeled in SIL at all, rather than just leaving the type opaque) there must be some way to project a mutable address from a shared borrow, and so on.

We had briefly discussed the possibility of giving "owned" and "borrowed" variants of a type different low-level representations. There are obvious complexities involved in making this a user-facing feature, but it might suffice to have an abstraction model that accommodates only two possibilities—either the owned and borrowed representations are the same, as for most POD or refcounted types, or the borrowed representation is a pointer to the owned representation for types with sharably mutable fields, such as weak references, atomics, mutexes, and so on. We could distinguish owned/borrowed in the SIL type system to make this possible.


I think we can do a reasonable job of avoiding unnecessary moves in most injection cases. Perhaps we could keep the address-oriented SIL instructions around so that this late allocation pass can still be done in SIL (and thereby be subject to the correctness analyses we want to implement) instead of needing to be completely done on-the-fly in IRGen.

-Joe

When I first started reading this proposal, my primary objection was going to be that SSA doesn’t seem to really jibe well with the idea of values becoming (in)valid at some point in the future (due to moves). You can see this in the definite-init pass, which works entirely with addresses to handle the idea of a value which becomes assigned “eventually”. But if SIL’s notion of SSA can be extended to handle these problems, this sounds great!

It’s not clear to me how this would work though. How does an optimization pass reason about an SSA register becoming invalid because it was moved out of? Or rather: in what ways do the interesting properties of SSA survive passes needing to handle this? Is this a standard extension that’s been researched/implemented before?

Even if we want to accommodate ill-behaved C++ value types in the future, it seems to me that we could avoid penalizing the common case by making "WellBehavedValueSemantics" another opt-out type constraint, in the vein of "Copyable" in the move-only types model. If we do want to accommodate C++ types with semantically loaded move or copy operations in Swift, we'll probably want to make more C++-like guarantees about exactly when move, copy, and assign operations are performed on values of those types, but I don't think we want generic code to be subject to those constraints by default. If an address-only SIL representation ends up being fundamentally necessary to semantically model ill-behaved C++ types, we'd also want to avoid having to emit the SIL for all generics that way.

I completely agree that we'll need to ask non-value-semantics C++ types to opt out explicitly in some way. That wasn't what I had in mind here, though.

My concern is about types that provide a consistent view of their abstract "value" but implement that in ways that change storage; as one example, a pre-C++11 type for which moves are actually implemented as copies. We would not want to e.g. assume that we can read a property, move the value, and then rely on the old load still being valid.

Generic code should be fine regardless, since the optimizer can't make assumptions about abstractly-implemented properties.

We had briefly discussed the possibility of giving "owned" and "borrowed" variants of a type different low-level representations. There are obvious complexities involved in making this a user-facing feature, but it might suffice to have an abstraction model that accommodates only two possibilities—either the owned and borrowed representations are the same, as for most POD or refcounted types, or the borrowed representation is a pointer to the owned representation for types with sharably mutable fields, such as weak references, atomics, mutexes, and so on. We could distinguish owned/borrowed in the SIL type system to make this possible.

Yeah, this is something I'm still tossing around in my mind. The problem is tuples; is a borrowed tuple a tuple of borrows or just that same rule applied to the aggregate? The latter doesn't handle any sort of translation we might want to do.

I think we can do a reasonable job of avoiding unnecessary moves in most injection cases. Perhaps we could keep the address-oriented SIL instructions around so that this late allocation pass can still be done in SIL (and thereby be subject to the correctness analyses we want to implement) instead of needing to be completely done on-the-fly in IRGen.

Yeah, this is probably the right thing to do. We would probably want to introduce a new phase so that ordinary optimizations can assume away the address-oriented SIL instructions; it's not clear to me that they work well with a stricter, higher-level SIL, since otherwise I think we can mostly define away uninitialized memory.

John.

When I first started reading this proposal, my primary objection was going to be that SSA doesn’t seem to really jibe well with the idea of values becoming (in)valid at some point in the future (due to moves). You can see this in the definite-init pass, which works entirely with addresses to handle the idea of a value which becomes assigned “eventually”. But if SIL’s notion of SSA can be extended to handle these problems, this sounds great!

It’s not clear to me how this would work though. How does an optimization pass reason about an SSA register becoming invalid because it was moved out of? Or rather: in what ways do the interesting properties of SSA survive passes needing to handle this? Is this a standard extension that’s been researched/implemented before?

I think that’s why John claimed that we need to enforce “pseudo-linear” SIL value types. Moving out of an SSA value must be the last use of that value. SIL will enforce this single-consumer property throughout.

-Andy

···

On Oct 3, 2016, at 12:10 PM, Alexis via swift-dev <swift-dev@swift.org> wrote:

On Oct 1, 2016, at 4:32 AM, John McCall via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

Andy Trick and I had this conversation Thursday, and I thought I'd capture it here.

The Problem

It's a longstanding complaint that SIL uses drastically different code patterns for the same sequence of operations based on whether the types of the values involved are loadable or "address-only". For example, suppose I have Swift code like this:
  let x, y : T
  let z = x + y

If T is a loadable type, this will generate SIL somewhat like this:
// %x and %y are values of type $T
%lhs = copy_value %x
%rhs = copy_value %y
%operator = function_ref T.+
%result = apply %operator(%lhs, %rhs)
%z = %result

(copy_value doesn't currently exist, but it's easier to understand, and as it happens we're thinking of adding it back.)

If T is an address-only type, this will generate SIL somewhat like this:
  // %x and %y are values of type $*T
  %z = alloc_stack $T
  %lhs = alloc_stack $T
  copy_addr %x to [initialization] %lhs
  %rhs = alloc_stack $T
  copy_addr %y to [initialization] %rhs
  %operator = function_ref T.+
  apply %operator(%z, %lhs, %rhs)
  dealloc_stack %rhs
  dealloc_stack %lhs

Notably, we're explicitly modeling the memory in which values are stored, which is both very verbose and — more importantly — loses any interesting SSA properties for tracking actual values around. And this has a bunch of secondary effects where various high-level operations like dynamic casts end up needing two different instructions based on whether the value is stored in memory. This is pretty dumb, and it's a major contributor to the reality that generic code is currently very poorly optimized.

It does, however, have some significant advantages: since the memory allocation is explicit, it's quite natural to express optimizations that e.g. hoist or sink those allocations, and the next level of lowering (IRGen) can be very simple.

Addresses and address-only types

Swift is an imperative language, and imperative languages make formal memory locations part of the high-level semantics. DI and SSA formation allow us to eliminate formal memory locations for most local variables, but class properties, global variables, escaped local variables, and pointers are all fundamentally un-SSA-able. Therefore we will always have some concept of an address.

But most values of address-only type aren't really being stored in that sort of formal memory location; we're just representing them that way in SIL. Why do we do this? What is it about a type that makes it address-only?

Well, some types are inherently "memory-pinned": something about their representation only makes sense, or is only implementable, if the value is always kept in memory:
  - The representation of the value may involve interior pointers, as with LLVM's SmallVector. This isn't currently a thing in Swift, but it's a possibility someday with the import of non-POD C++ types.
  - The address of the value may need to be registered elsewhere, as with weak references.
  - The value may allow internal mutation even from a shared reference, like a C++ class with a mutable field or a Rust atomic type; you can see weak references as analogous to this.
Such types are necessarily address-only at a low level.

Other types are address-only by abstraction: their representation isn't (fully) known, and so they have to be kept in memory (1) in case that representation includes another address-only value and (2) because there's no way to store a unbounded amount of data except in memory anyway.

But this sense of address-only types that we're describing is really a property of the final generated code. It's necessary for LLVM IR generation to know about it, but it's not obviously necessary for SIL to know about it beyond the implications of the inherently memory-pinned cases above:
  - it is not generally safe to assume that the stored properties of a non-POD C++ type remain invariant across moves
  - weak references *do* represent the same value across moves, but of course that value can change dynamically at any time anyway, per the rules of weak references
  - mutable fields can be modified even by a shared borrowed reference, and so (if they are modeled at all in SIL at all, rather than just leaving the type opaque) there must be some way to project a mutable address from a shared borrow and so on.

Our current representation of address-only types arises directly from the low-level operations that are required, which does admit some interesting optimizations on its own, but the disadvantages of having to support wildly divergent code paths and completely give up SSA use/def chains are crippling. What would SIL look like if we abandoned this entirely?

All types as SIL scalars

The fundamental issue here is that IR-level lowering does need to place memory-pinned values into memory. Suppose we take a value, use it as an enum case payload, inject that enum into an Any, and pass that to a function. In a future world where we consistently SSA all local values, this ideally looks something like this:

// %value is a $T
%enum = enum #MyEnum.foo, %value : $T
%any = existential $Any, %enum
%fn = function_ref @bar
  apply %fn(%any)

If all of these types were loadable, lowering this to IR doesn't impose any abstraction costs, because we can just manipulate it as a bit-pattern in memory, which LLVM (really, any reasonable compiler infrastructure) should be quite good at analyzing and optimizing. But if all of these types are address-only, that's not obviously true. Let's look at the details of lowering to memory.

Because all types are movable — even in a world with Rust-style ownership — there's a fairly straightforward lowering that works for all possible functions: every address-only SIL value gets its own allocation which is deallocated along the dominance frontier of the value. Indirect arguments use the passed-in buffer. Basic block arguments are copied from the allocation for the branch argument value to the allocation for the corresponding destination block parameter. Indirect return values are copied from the allocation for the return value to the pointer passed in.

This admits some obvious optimizations. If "owned" SIL values are pseudo-linear — i.e. along every path from their definition point, it is statically evident that they are (optionally) borrowed multiple times, consumed exactly once, and then never used again — then the copies can instead be moves. Standard register-allocation algorithms can recognize when basic block parameters can use the same location as their arguments. SIL only permits a single return instruction, so there's no reason not to allocate returned values directly into the return slot. These optimizations will tend to eliminate a lot of moves.

However, there's another problem, which is injections. Consider the example above. %any is an opaque existential, and initializing it formally involves allocating a buffer within the existential and moving the argument into that buffer. Ideally, we would allocate that buffer first and then simply allocate %enum in-place into that buffer. In this simple example, that's easy. But if the control flow were more complex, detecting that this is possible becomes significantly more challenging, as does ensuring that the buffer is properly cleaned up along all paths. For example, suppose that %value were computed by calling a throwing function:

  try_apply %someFunction() normal %cont, unwind %handler
cont(%value: $T):
  %enum = enum #MyEnum.foo, %value : $T
  %any = existential $Any, %enum
  %fn = function_ref @bar
  apply %fn(%any)
handler(%error: $Error):
  throw $error

Naive allocation here is going to introduce a lot of moves. Optimally, we would receive the return value from %someFunction directly in the payload of %enum, which we want to build directly into the allocated existential buffer of %any. But to do this, we actually need to allocate that existential buffer before executing the try_apply; and if the try_apply throws, we need to deallocate that existential buffer in the handler block. The need to retroactively insert this kind of clean-up code adds a lot of complexity to this allocation approach. Moreover, it's quite possible that complex intermediate control — for example, if there's a loop somewhere between the definition of a value and its consuming use — will tend to block this kind of analysis and cause more unnecessary moves.

In contrast, the current SIL-generation scheme tends to be aware of at least the immediate local uses of values and therefore emits a lot of these kinds of initialization "in-place" already. But that said, it does proceed from a simple recursive examination of the AST, which means it will miss examples that are just slightly more opaque, like binding a return value as a let and only then returning it. That kind of thing is currently left to the optimizer for no real good reason.

Summary

Overall, I think representing local address-only values as SIL scalars with full SSA use/def chains is really promising, and I do think we can write an allocation pass that does an acceptable job eliminating unnecessary moves. In order to actually do this, though, I think we need two things:
  - We need SIL to be "pseudo-linear" as discussed above. We really don't want the allocation pass to have to worry about keeping values alive past consumptions, and thus potentially having to insert copies instead of moves.
  - We need the allocation pass to handle exits before initialization (like with try_apply) and other sorts of interfering control flow. It will not be acceptable for this to be a block-local analysis with only a trivial amount of cross-block argument merging.

John.
_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

When I first started reading this proposal, my primary objection was going to be that SSA doesn’t seem to really jive well with the idea of values becoming (in)valid at some point in the future (due to moves). You can see this in the definite-init pass, which it works entirely with addresses to handle the idea of a value which becomes assigned “eventually”. But if SIL’s notion of SSA can be extended to handle these problems, this sounds great!

It’s not clear to me how this would work though. How does an optimization pass reason about an SSA register becoming invalid because it was moved out of? Or rather: in what ways do the interesting properties of SSA survive passes needing to handle this? Is this a standard extension that’s been researched/implemented before?

On Oct 3, 2016, at 1:37 PM, Andrew Trick <atrick@apple.com> wrote:

I think that’s why John claimed that we need to enforce “pseudo-linear” SIL values. Moving out of an SSA value must be the last use of that value. SIL will enforce this single-consumer property throughout.

-Andy

Precisely. SSA is already subject to a number of non-trivial formation rules, such as the rule that you can only refer to dominating values. And SIL already adds a number of rules to that, such as the rule that alloc_stacks have to be statically nested. I don't think pseudo-linearity is a problematic extension to that set of rules.

It is, of course, an expressivity limitation on SIL — there will always be cases where you could *prove* a program correct even though it isn't *structurally valid*. But that's true of the SSA dominance restriction as well.
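As a deliberately contrived illustration of that gap, consider a value whose consumption is guarded twice by the same unchanging flag; the names below are placeholders in the same casual pseudo-SIL as before. Dynamically, exactly one of the two consuming calls runs, but the CFG contains paths with zero consumptions and paths with two, so a purely structural pseudo-linearity rule has to reject it:

bb0(%x : $T, %flag : $Bool):
  cond_br %flag, consumeEarly, skipEarly
consumeEarly:
  apply %sink(%x)                    // consumes %x when %flag is true
  br join
skipEarly:
  br join
join:
  cond_br %flag, done, consumeLate   // %flag hasn't changed...
consumeLate:
  apply %sink(%x)                    // ...so this runs exactly when the first call didn't
  br done
done:
  ...

Every actual execution consumes %x exactly once, but the verifier only sees paths, and two of those paths are ill-formed; just as with the dominance rule, code like this simply has to be written differently, e.g. by consuming %x in both arms of a single branch.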

John.
