[semantic-arc][proposal] High Level ARC Memory Operations

The document attached below contains the first "Semantic ARC" mini proposal: the High Level ARC Memory Operations Proposal.

An html rendered version of this markdown document is available at the following URL:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

···

----

# Summary

This document proposes:

1. adding the `load_strong`, `store_strong` instructions to SIL. These can only
   be used with memory locations of `non-trivial` type.
2. banning the use of `load`, `store` on values of `non-trivial` type.

This will allow for:

1. eliminating optimizer miscompiles that occur due to releases being moved into
   the region in between a `load`/`retain`, `load`/`release`,
   `store`/`release`. (For a specific example, see the appendix).
2. modeling `load`/`store` as having `unsafe unowned` ownership semantics. This
   will be enforced via the verifier.
3. more aggressive ARC code motion.

# Definitions

## load_strong

We propose three different forms of load_strong differentiated via flags. First
define `load_strong` as follows:

    %x = load_strong %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C
    retain_value %x : $C

Then define `load_strong [take]` as:

    %x = load_strong [take] %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject

**NOTE** `load_strong [take]` implies that the loaded from memory location no
longer owns the result object (i.e. a take is a move). Loading from the memory
location again without reinitialization is illegal.

Next we provide `load_strong [guaranteed]`:

    %x = load_strong [guaranteed] %x_ptr : $*Builtin.NativeObject
    ...
    fixLifetime(%x)

      =>

    %x = load %x_ptr : $*Builtin.NativeObject
    ...
    fixLifetime(%x)

`load_strong [guaranteed]` implies that in the region before the fixLifetime,
the loaded object is guaranteed semantically to remain alive. The fixLifetime
communicates to the optimizer the location up to which the value's lifetime is
guaranteed to live. An example of where this construct is useful is when one has
a let binding to a class instance `c` that contains a let field `f`. In that
case `c`'s lifetime guarantees `f`'s lifetime.

## store_strong

Define a store_strong as follows:

    store_strong %x to %x_ptr : $*C

       =>

    %old_x = load %x_ptr : $*C
    store %new_x to %x_ptr : $*C
    release_value %old_x : $C

*NOTE* store_strong is defined as a consuming operation. We also provide
`store_strong [init]` in the case where we know statically that there is no
previous value in the memory location:

    store_strong %x to [init] %x_ptr : $*C

       =>

    store %new_x to %x_ptr : $*C

# Implementation

## Goals

Our implementation strategy goals are:

1. zero impact on other compiler developers until the feature is fully
   developed. This implies all work will be done behind a flag.
2. separation of feature implementation from updating passes.

Goal 2 will be implemented via a pass that blows up `load_strong`/`store_strong`
right after SILGen.

## Plan

We begin by adding initial infrastructure for our development. This means:

1. Adding to SILOptions a disabled by default flag called
"EnableSILOwnershipModel". This flag will be set by a false by default frontend
option called "-enable-sil-ownership-mode".

2. Bots will be brought up to test the compiler with
   "-enable-sil-ownership-model" set to true. The specific bots are:

   * RA-OSX+simulators
   * RA-Device
   * RA-Linux.

   The bots will run once a day until the feature is close to completion. Then a
   polling model will be followed.

Now that change isolation is guaranteed, we develop building blocks for the
optimization:

1. load_strong, store_strong will be added to SIL and IRGen, serialization,
printing, SIL parsing support will be implemented. SILGen will not be modified
at this stage.

2. A pass called the "OwnershipModelEliminator" will be implemented. It will
(initially) blow up load_strong/store_strong instructions into their constituent
operations.

3. An option called "EnforceSILOwnershipMode" will be added to the verifier. If
the option is set, the verifier will assert if unsafe unowned loads, stores are
used to load from non-trivial memory locations.

Finally, we wire up the building blocks:

1. If SILOption.EnableSILOwnershipModel is true, then the after SILGen SIL
   verification will be performed with EnforceSILOwnershipModel set to true.
2. If SILOption.EnableSILOwnershipModel is true, then the pass manager will run
   the OwnershipModelEliminator pass right after SILGen before the normal pass
   pipeline starts.
3. SILGen will be changed to emit load_strong, store_strong instructions when
   the EnableSILOwnershipModel flag is set. We will use the verifier throwing to
   guarantee that we are not missing any specific cases.

Then once all fo the bots are green, we change SILOption.EnableSILOwnershipModel
to be true by default. After a cooling off period, we move all of the code
behind the SILOwnershipModel flag in front of the flag. We do this so we can
reuse that flag for further SILOwnershipModel changes.

## Optimizer Changes

Since the SILOwnershipModel eliminator will eliminate the load_strong,
store_strong instructions right after ownership verification, there will be no
immediate affects on the optimizer and thus the optimizer changes can be done in
parallel with the rest of the ARC optimization work.

But, in the long run, we need IRGen to eliminate the load_strong, store_strong
instructions, not the SILOwnershipModel eliminator, so that we can enforce
Ownership invariants all through the SIL pipeline. Thus we will need to update
passes to handle these new instructions. The main optimizer changes can be
separated into the following areas: memory forwarding, dead stores, ARC
optimization. In all of these cases, the necessary changes are relatively
trivial to respond to. We give a quick taste of two of them: store->load
forwarding and ARC Code Motion.

### store->load forwarding

Currently we perform store->load forwarding as follows:

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load %x_ptr : $C
    use(%y)

      =>

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    use(%x)

In a world, where we are using load_strong, store_strong, we have to also
consider the ownership implications. *NOTE* Since we are not modifying the
store_strong, `store_strong` and `store_strong [init]` are treated the
same. Thus without any loss of generality, lets consider solely `store_strong`.

    store_strong %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load_strong %x_ptr : $C
    use(%y)

      =>

    store_strong %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    strong_retain %x
    use(%x)

### ARC Code Motion

If ARC Code Motion wishes to move `load_strong`, `store_strong` instructions, it
must now consider read/write effects. On the other hand, it will be able to now
not consider the side-effects of destructors when moving retain/release
operations.

### Normal Code Motion

Normal code motion will lose some effectiveness since many of the load/store
operations that it used to be able to move now must consider ARC information. We
may need to consider running ARC code motion earlier in the pipeline where we
normally run Normal Code Motion to ensure that we are able to handle these
cases.

### ARC Optimization

The main implication for ARC optimization is that instead of eliminating just
retains, releases, it must be able to recognize `load_strong`, `store_strong`
and set their flags as appropriate.

### Function Signature Optimization

Semantic ARC affects function signature optimization in the context of the owned
to guaranteed optimization. Specifically:

1. A `store_strong` must be recognized as a release of the old value that is
   being overridden. In such a case, we can move the `release` of the old value
   into the caller and change the `store_strong` into a `store_strong
   [init]`.
2. A `load_strong` must be recognized as a retain in the callee. Then function
   signature optimization will transform the `load_strong` into a `load_strong
   [guaranteed]`. This would require the addition of a new `@guaranteed` return
   value convention.

# Appendix

## Partial Initialization of Loadable References in SIL

In SIL, a value of non-trivial loadable type is loaded from a memory location as
follows:

    %x = load %x_ptr : $*S
    ...
    retain_value %x_ptr : $S

At first glance, this looks reasonable, but in truth there is a hidden drawback:
the partially initialized zone in between the load and the retain
operation. This zone creates a period of time when an "evil optimizer" could
insert an instruction that causes x to be deallocated before the copy is
finished being initialized. Similar issues come up when trying to perform a
store of a non-trival value into a memory location.

Since this sort of partial initialization is allowed in SIL, the optimizer is
forced to be overly conservative when attempting to move releases passed retains
lest the release triggers a deinit that destroys a value like `%x`. Lets look at
two concrete examples that show how semantically providing load_strong,
store_strong instructions eliminate this problem.

**NOTE** Without any loss of generality, we will speak of values with reference
semantics instead of non-trivial values.

## Case Study: Partial Initialization and load_strong

### The Problem

Consider the following swift program:

    func opaque_call()

    final class C {
      var int: Int = 0
      deinit {
        opaque_call()
      }
    }

    final class D {
      var int: Int = 0
    }

    var GLOBAL_C : C? = nil
    var GLOBAL_D : D? = nil

    func useC(_ c: C)
    func useD(_ d: D)

    func run() {
        let c = C()
        GLOBAL_C = c
        let d = D()
        GLOBAL_D = d
        useC(c)
        useD(d)
    }

Notice that both `C` and `D` have fixed layouts and separate class hierarchies,
but `C`'s deinit has a call to the function `opaque_call` which may write to
`GLOBAL_D` or `GLOBAL_C`. Additionally assume that both `useC` and `useD` are
known to the compiler to not have any affects on instances of type `D`, `C`
respectively and useC assigns `nil` to `GLOBAL_C`. Now consider the following
valid SIL lowering for `run`:

    sil_global GLOBAL_D : $D
    sil_global GLOBAL_C : $C

    final class C {
      var x: Int
      deinit
    }

    final class D {
      var x: Int
    }

    sil @useC : $@convention(thin) () -> ()
    sil @useD : $@convention(thin) () -> ()

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load %global_c : $*C (3)
      strong_retain %c2 : $C (4)
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Lets optimize this function! First we perform the following operations:

1. Since `(2)` is storing to an identified object that can not be `GLOBAL_C`, we
   can store to load forward `(1)` to `(3)`.
2. Since a retain does not block store to load forwarding, we can forward `(2)`
   to `(5)`. But lets for the sake of argument, assume that the optimizer keeps
   such information as an analysis and does not perform the actual load->store
   forwarding.
3. Even though we do not foward `(2)` to `(5)`, we can still move `(4)` over
   `(6)` so that `(4)` is right before `(7)`.

This yields (using the ' marker to designate that a register has had load-store
forwarding applied to it):

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      strong_retain %c : $C (4')
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Then by assumption, we know that `%useC` does not perform any releases of any
instances of class `D`. Thus `(6)` can be moved past `(7')` and we can then pair
and eliminate `(6)` and `(9)` via the rules of ARC optimization using the
analysis information that `%d2` and `%d` are th same due to the possibility of
performing store->load forwarding. After performing such transformations, `run`
looks as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      %d2 = load %global_d : $*D (5)
      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Now by assumption, we know that `%useD_func` does not touch any instances of
class `C` and `%c` does not contain any ivars of type `D` and is final so none
can be added. At first glance, this seems to suggest that we can move `(10)`
before `(8')` and then pair/eliminate `(4')` and `(10)`. But is this a safe
optimization perform? Absolutely Not! Why? Remember that since `useC_func`
assigns `nil` to `GLOBAL_C`, after `(7')`, `%c` could have a reference count
of 1. Thus `(10)` _may_ invoke the destructor of `C`. Since this destructor
calls an opaque function that _could_ potentially write to `GLOBAL_D`, we may be
be passing `%d2`, an already deallocated object to `%useD_func`, an illegal
optimization!

Lets think a bit more about this example and consider this example at the
language level. Remember that while Swift's deinit are not asychronous, we do
not allow for user level code to create dependencies from the body of the
destructor into the normal control flow that has called it. This means that
there are two valid results of this code:

- Operation Sequence 1: No optimization is performed and `%d2` is passed to
  `%useD_func`.
- Operation Sequence 2: We shorten the lifetime of `%c` before `%useD_func` and
   a different instance of `$D` is passed into `%useD_func`.

The fact that 1 occurs without optimization is just as a result of an
implementation detail of SILGen. 2 is also a valid sequence of operations.

Given that:

1. As a principle, the optimizer does not consider such dependencies to avoid
   being overly conservative.
2. We provide constructs to ensure appropriate lifetimes via the usage of
   constructs such as fix_lifetime.

We need to figure out how to express our optimization such that 2
happens. Remember that one of the optimizations that we performed at the
beginning was to move `(6)` over `(7')`, i.e., transform this:

      %d = alloc_ref $D
      %global_d_addr = global_addr GLOBAL_D : $D
      %d = load %global_d_addr : $*D (5)
      strong_retain %d : $D (6)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

into:

      %global_d_addr = global_addr GLOBAL_D : $D
      %d2 = load %global_d_addr : $*D (5)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      strong_retain %d2 : $D (6)

This transformation in Swift corresponds to transforming:

      let d = GLOBAL_D
      useC(c)

to:

      let d_raw = load_d_value(GLOBAL_D)
      useC(c)
      let d = take_ownership_of_d(d_raw)

This is clearly an instance where we have moved a side-effect in between the
loading of the data and the taking ownership of such data, that is before the
`let` is fully initialized. What if instead of just moving the retain, we moved
the entire let statement? This would then result in the following swift code:

      useC(c)
      let d = GLOBAL_D

and would correspond to the following SIL snippet:

      %global_d_addr = global_addr GLOBAL_D : $D

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      %d2 = load %global_d_addr : $*D (5)
      strong_retain %d2 : $D (6)

Moving the load with the strong_retain to ensure that the full initialization is
performed even after code motion causes our SIL to look as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %d2 = load %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Giving us the exact result that we want: Operation Sequence 2!

### Defining load_strong

Given that we wish the load, store to be tightly coupled together, it is natural
to express this operation as a `load_strong` instruction. Lets define the
`load_strong` instruction as follows:

    %1 = load_strong %0 : $*C

      =>

    %1 = load %0 : $*C
    retain_value %1 : $C

Now lets transform our initial example to use this instruction:

Notice how now if we move `(7)` over `(3)` and `(6)` now, we get the following SIL:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong %global_c : $*C (3)
      %d2 = load_strong %global_d : $*D (5)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

We then perform the previous code motion:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)
      strong_release %d : $D (9)

      %d2 = load_strong %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
      strong_release %c : $C (10)
    }

We then would like to eliminate `(9)` and `(10)` by pairing them with `(3)` and
`(8)`. Can we still do so? One way we could do this is by introducing the
`[take]` flag. The `[take]` flag on a load_strong says that one is semantically
loading a value from a memory location and are taking ownership of the value
thus eliding the retain. In terms of SIL this flag is defined as:

    %x = load_strong [take] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C

Why do we care about having such a `load_strong [take]` instruction when we
could just use a `load`? The reason why is that a normal `load` has unsafe
unowned ownership (i.e. it has no implications on ownership). We would like for
memory that has non-trivial type to only be able to be loaded via instructions
that maintain said ownership. We will allow for casting to trivial types as
usual to provide such access if it is required.

Thus we have achieved the desired result:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong [take] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %d2 = load_strong [take] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
    }

The document attached below contains the first "Semantic ARC" mini proposal: the High Level ARC Memory Operations Proposal.

An html rendered version of this markdown document is available at the following URL:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

----

# Summary

This document proposes:

1. adding the `load_strong`, `store_strong` instructions to SIL. These can only
   be used with memory locations of `non-trivial` type.

I would really like to avoid using the word "strong" here. Under the current proposal, these instructions will be usable with arbitrary non-trivial types, not just primitive class references. Even if you think of an aggregate that happens to contain one or more strong references as some sort of aggregate strong reference (which is questionable but not completely absurd), we already have loadable non-strong class references that this operation would be usable with, like native unowned references. "load_strong %0 : $*@sil_unowned T" as an operation yielding a scalar "@sil_unowned T" is ridiculous, and it will only get more ridiculous when we eventually allow this operation to work with types that are currently address-only, like weak references.

Brainstorming:

Something like load_copy and store_copy would be a bit unfortunate, since store_copy doesn't actually copy the source operand and we want to have a load_copy [take].

load_value and store_value seem excessively generic. It's not like non-trivial types aren't values.

One question that comes to mind: do we actually need new instructions here other than for staging purposes? We don't actually need new instructions for pseudo-linear SIL to work; we just need to say that we only enforce pseudo-linearity for non-trivial types.

If we just want the instruction to be explicit about ownership so that we can easily distinguish these cases, we can make the rule always explicit, e.g.:
  load [take] %0 : $*MyClass
  load [copy] %0 : $*MyClass
  load [trivial] %0 : $*Int

  store %0 to [initialization] %1 : $*MyClass
  store %0 to [assignment] %1 : $*MyClass
  store %0 to [trivial] %1 : $*Int

John.

···

On Sep 30, 2016, at 11:54 PM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

2. banning the use of `load`, `store` on values of `non-trivial` type.

This will allow for:

1. eliminating optimizer miscompiles that occur due to releases being moved into
   the region in between a `load`/`retain`, `load`/`release`,
   `store`/`release`. (For a specific example, see the appendix).
2. modeling `load`/`store` as having `unsafe unowned` ownership semantics. This
   will be enforced via the verifier.
3. more aggressive ARC code motion.

# Definitions

## load_strong

We propose three different forms of load_strong differentiated via flags. First
define `load_strong` as follows:

    %x = load_strong %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C
    retain_value %x : $C

Then define `load_strong [take]` as:

    %x = load_strong [take] %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject

**NOTE** `load_strong [take]` implies that the loaded from memory location no
longer owns the result object (i.e. a take is a move). Loading from the memory
location again without reinitialization is illegal.

Next we provide `load_strong [guaranteed]`:

    %x = load_strong [guaranteed] %x_ptr : $*Builtin.NativeObject
    ...
    fixLifetime(%x)

      =>

    %x = load %x_ptr : $*Builtin.NativeObject
    ...
    fixLifetime(%x)

`load_strong [guaranteed]` implies that in the region before the fixLifetime,
the loaded object is guaranteed semantically to remain alive. The fixLifetime
communicates to the optimizer the location up to which the value's lifetime is
guaranteed to live. An example of where this construct is useful is when one has
a let binding to a class instance `c` that contains a let field `f`. In that
case `c`'s lifetime guarantees `f`'s lifetime.

## store_strong

Define a store_strong as follows:

    store_strong %x to %x_ptr : $*C

       =>

    %old_x = load %x_ptr : $*C
    store %new_x to %x_ptr : $*C
    release_value %old_x : $C

*NOTE* store_strong is defined as a consuming operation. We also provide
`store_strong [init]` in the case where we know statically that there is no
previous value in the memory location:

    store_strong %x to [init] %x_ptr : $*C

       =>

    store %new_x to %x_ptr : $*C

# Implementation

## Goals

Our implementation strategy goals are:

1. zero impact on other compiler developers until the feature is fully
   developed. This implies all work will be done behind a flag.
2. separation of feature implementation from updating passes.

Goal 2 will be implemented via a pass that blows up `load_strong`/`store_strong`
right after SILGen.

## Plan

We begin by adding initial infrastructure for our development. This means:

1. Adding to SILOptions a disabled by default flag called
"EnableSILOwnershipModel". This flag will be set by a false by default frontend
option called "-enable-sil-ownership-mode".

2. Bots will be brought up to test the compiler with
   "-enable-sil-ownership-model" set to true. The specific bots are:

   * RA-OSX+simulators
   * RA-Device
   * RA-Linux.

   The bots will run once a day until the feature is close to completion. Then a
   polling model will be followed.

Now that change isolation is guaranteed, we develop building blocks for the
optimization:

1. load_strong, store_strong will be added to SIL and IRGen, serialization,
printing, SIL parsing support will be implemented. SILGen will not be modified
at this stage.

2. A pass called the "OwnershipModelEliminator" will be implemented. It will
(initially) blow up load_strong/store_strong instructions into their constituent
operations.

3. An option called "EnforceSILOwnershipMode" will be added to the verifier. If
the option is set, the verifier will assert if unsafe unowned loads, stores are
used to load from non-trivial memory locations.

Finally, we wire up the building blocks:

1. If SILOption.EnableSILOwnershipModel is true, then the after SILGen SIL
   verification will be performed with EnforceSILOwnershipModel set to true.
2. If SILOption.EnableSILOwnershipModel is true, then the pass manager will run
   the OwnershipModelEliminator pass right after SILGen before the normal pass
   pipeline starts.
3. SILGen will be changed to emit load_strong, store_strong instructions when
   the EnableSILOwnershipModel flag is set. We will use the verifier throwing to
   guarantee that we are not missing any specific cases.

Then once all fo the bots are green, we change SILOption.EnableSILOwnershipModel
to be true by default. After a cooling off period, we move all of the code
behind the SILOwnershipModel flag in front of the flag. We do this so we can
reuse that flag for further SILOwnershipModel changes.

## Optimizer Changes

Since the SILOwnershipModel eliminator will eliminate the load_strong,
store_strong instructions right after ownership verification, there will be no
immediate affects on the optimizer and thus the optimizer changes can be done in
parallel with the rest of the ARC optimization work.

But, in the long run, we need IRGen to eliminate the load_strong, store_strong
instructions, not the SILOwnershipModel eliminator, so that we can enforce
Ownership invariants all through the SIL pipeline. Thus we will need to update
passes to handle these new instructions. The main optimizer changes can be
separated into the following areas: memory forwarding, dead stores, ARC
optimization. In all of these cases, the necessary changes are relatively
trivial to respond to. We give a quick taste of two of them: store->load
forwarding and ARC Code Motion.

### store->load forwarding

Currently we perform store->load forwarding as follows:

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load %x_ptr : $C
    use(%y)

      =>

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    use(%x)

In a world, where we are using load_strong, store_strong, we have to also
consider the ownership implications. *NOTE* Since we are not modifying the
store_strong, `store_strong` and `store_strong [init]` are treated the
same. Thus without any loss of generality, lets consider solely `store_strong`.

    store_strong %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load_strong %x_ptr : $C
    use(%y)

      =>

    store_strong %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    strong_retain %x
    use(%x)

### ARC Code Motion

If ARC Code Motion wishes to move `load_strong`, `store_strong` instructions, it
must now consider read/write effects. On the other hand, it will be able to now
not consider the side-effects of destructors when moving retain/release
operations.

### Normal Code Motion

Normal code motion will lose some effectiveness since many of the load/store
operations that it used to be able to move now must consider ARC information. We
may need to consider running ARC code motion earlier in the pipeline where we
normally run Normal Code Motion to ensure that we are able to handle these
cases.

### ARC Optimization

The main implication for ARC optimization is that instead of eliminating just
retains, releases, it must be able to recognize `load_strong`, `store_strong`
and set their flags as appropriate.

### Function Signature Optimization

Semantic ARC affects function signature optimization in the context of the owned
to guaranteed optimization. Specifically:

1. A `store_strong` must be recognized as a release of the old value that is
   being overridden. In such a case, we can move the `release` of the old value
   into the caller and change the `store_strong` into a `store_strong
   [init]`.
2. A `load_strong` must be recognized as a retain in the callee. Then function
   signature optimization will transform the `load_strong` into a `load_strong
   [guaranteed]`. This would require the addition of a new `@guaranteed` return
   value convention.

# Appendix

## Partial Initialization of Loadable References in SIL

In SIL, a value of non-trivial loadable type is loaded from a memory location as
follows:

    %x = load %x_ptr : $*S
    ...
    retain_value %x_ptr : $S

At first glance, this looks reasonable, but in truth there is a hidden drawback:
the partially initialized zone in between the load and the retain
operation. This zone creates a period of time when an "evil optimizer" could
insert an instruction that causes x to be deallocated before the copy is
finished being initialized. Similar issues come up when trying to perform a
store of a non-trival value into a memory location.

Since this sort of partial initialization is allowed in SIL, the optimizer is
forced to be overly conservative when attempting to move releases passed retains
lest the release triggers a deinit that destroys a value like `%x`. Lets look at
two concrete examples that show how semantically providing load_strong,
store_strong instructions eliminate this problem.

**NOTE** Without any loss of generality, we will speak of values with reference
semantics instead of non-trivial values.

## Case Study: Partial Initialization and load_strong

### The Problem

Consider the following swift program:

    func opaque_call()

    final class C {
      var int: Int = 0
      deinit {
        opaque_call()
      }
    }

    final class D {
      var int: Int = 0
    }

    var GLOBAL_C : C? = nil
    var GLOBAL_D : D? = nil

    func useC(_ c: C)
    func useD(_ d: D)

    func run() {
        let c = C()
        GLOBAL_C = c
        let d = D()
        GLOBAL_D = d
        useC(c)
        useD(d)
    }

Notice that both `C` and `D` have fixed layouts and separate class hierarchies,
but `C`'s deinit has a call to the function `opaque_call` which may write to
`GLOBAL_D` or `GLOBAL_C`. Additionally assume that both `useC` and `useD` are
known to the compiler to not have any affects on instances of type `D`, `C`
respectively and useC assigns `nil` to `GLOBAL_C`. Now consider the following
valid SIL lowering for `run`:

    sil_global GLOBAL_D : $D
    sil_global GLOBAL_C : $C

    final class C {
      var x: Int
      deinit
    }

    final class D {
      var x: Int
    }

    sil @useC : $@convention(thin) () -> ()
    sil @useD : $@convention(thin) () -> ()

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load %global_c : $*C (3)
      strong_retain %c2 : $C (4)
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Lets optimize this function! First we perform the following operations:

1. Since `(2)` is storing to an identified object that can not be `GLOBAL_C`, we
   can store to load forward `(1)` to `(3)`.
2. Since a retain does not block store to load forwarding, we can forward `(2)`
   to `(5)`. But lets for the sake of argument, assume that the optimizer keeps
   such information as an analysis and does not perform the actual load->store
   forwarding.
3. Even though we do not foward `(2)` to `(5)`, we can still move `(4)` over
   `(6)` so that `(4)` is right before `(7)`.

This yields (using the ' marker to designate that a register has had load-store
forwarding applied to it):

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      strong_retain %c : $C (4')
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Then by assumption, we know that `%useC` does not perform any releases of any
instances of class `D`. Thus `(6)` can be moved past `(7')` and we can then pair
and eliminate `(6)` and `(9)` via the rules of ARC optimization using the
analysis information that `%d2` and `%d` are th same due to the possibility of
performing store->load forwarding. After performing such transformations, `run`
looks as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      %d2 = load %global_d : $*D (5)
      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Now by assumption, we know that `%useD_func` does not touch any instances of
class `C` and `%c` does not contain any ivars of type `D` and is final so none
can be added. At first glance, this seems to suggest that we can move `(10)`
before `(8')` and then pair/eliminate `(4')` and `(10)`. But is this a safe
optimization perform? Absolutely Not! Why? Remember that since `useC_func`
assigns `nil` to `GLOBAL_C`, after `(7')`, `%c` could have a reference count
of 1. Thus `(10)` _may_ invoke the destructor of `C`. Since this destructor
calls an opaque function that _could_ potentially write to `GLOBAL_D`, we may be
be passing `%d2`, an already deallocated object to `%useD_func`, an illegal
optimization!

Lets think a bit more about this example and consider this example at the
language level. Remember that while Swift's deinit are not asychronous, we do
not allow for user level code to create dependencies from the body of the
destructor into the normal control flow that has called it. This means that
there are two valid results of this code:

- Operation Sequence 1: No optimization is performed and `%d2` is passed to
  `%useD_func`.
- Operation Sequence 2: We shorten the lifetime of `%c` before `%useD_func` and
   a different instance of `$D` is passed into `%useD_func`.

The fact that 1 occurs without optimization is just as a result of an
implementation detail of SILGen. 2 is also a valid sequence of operations.

Given that:

1. As a principle, the optimizer does not consider such dependencies to avoid
   being overly conservative.
2. We provide constructs to ensure appropriate lifetimes via the usage of
   constructs such as fix_lifetime.

We need to figure out how to express our optimization such that 2
happens. Remember that one of the optimizations that we performed at the
beginning was to move `(6)` over `(7')`, i.e., transform this:

      %d = alloc_ref $D
      %global_d_addr = global_addr GLOBAL_D : $D
      %d = load %global_d_addr : $*D (5)
      strong_retain %d : $D (6)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

into:

      %global_d_addr = global_addr GLOBAL_D : $D
      %d2 = load %global_d_addr : $*D (5)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      strong_retain %d2 : $D (6)

This transformation in Swift corresponds to transforming:

      let d = GLOBAL_D
      useC(c)

to:

      let d_raw = load_d_value(GLOBAL_D)
      useC(c)
      let d = take_ownership_of_d(d_raw)

This is clearly an instance where we have moved a side-effect in between the
loading of the data and the taking ownership of such data, that is before the
`let` is fully initialized. What if instead of just moving the retain, we moved
the entire let statement? This would then result in the following swift code:

      useC(c)
      let d = GLOBAL_D

and would correspond to the following SIL snippet:

      %global_d_addr = global_addr GLOBAL_D : $D

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      %d2 = load %global_d_addr : $*D (5)
      strong_retain %d2 : $D (6)

Moving the load with the strong_retain to ensure that the full initialization is
performed even after code motion causes our SIL to look as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %d2 = load %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Giving us the exact result that we want: Operation Sequence 2!

### Defining load_strong

Given that we wish the load, store to be tightly coupled together, it is natural
to express this operation as a `load_strong` instruction. Lets define the
`load_strong` instruction as follows:

    %1 = load_strong %0 : $*C

      =>

    %1 = load %0 : $*C
    retain_value %1 : $C

Now lets transform our initial example to use this instruction:

Notice how now if we move `(7)` over `(3)` and `(6)` now, we get the following SIL:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong %global_c : $*C (3)
      %d2 = load_strong %global_d : $*D (5)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

We then perform the previous code motion:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)
      strong_release %d : $D (9)

      %d2 = load_strong %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
      strong_release %c : $C (10)
    }

We then would like to eliminate `(9)` and `(10)` by pairing them with `(3)` and
`(8)`. Can we still do so? One way we could do this is by introducing the
`[take]` flag. The `[take]` flag on a load_strong says that one is semantically
loading a value from a memory location and are taking ownership of the value
thus eliding the retain. In terms of SIL this flag is defined as:

    %x = load_strong [take] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C

Why do we care about having such a `load_strong [take]` instruction when we
could just use a `load`? The reason why is that a normal `load` has unsafe
unowned ownership (i.e. it has no implications on ownership). We would like for
memory that has non-trivial type to only be able to be loaded via instructions
that maintain said ownership. We will allow for casting to trivial types as
usual to provide such access if it is required.

Thus we have achieved the desired result:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong [take] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %d2 = load_strong [take] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
    }

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

The document attached below contains the first "Semantic ARC" mini proposal: the High Level ARC Memory Operations Proposal.

An html rendered version of this markdown document is available at the following URL:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

----

# Summary

This document proposes:

1. adding the `load_strong`, `store_strong` instructions to SIL. These can only
   be used with memory locations of `non-trivial` type.

I would really like to avoid using the word "strong" here. Under the current proposal, these instructions will be usable with arbitrary non-trivial types, not just primitive class references. Even if you think of an aggregate that happens to contain one or more strong references as some sort of aggregate strong reference (which is questionable but not completely absurd), we already have loadable non-strong class references that this operation would be usable with, like native unowned references. "load_strong %0 : $*@sil_unowned T" as an operation yielding a scalar "@sil_unowned T" is ridiculous, and it will only get more ridiculous when we eventually allow this operation to work with types that are currently address-only, like weak references.

Brainstorming:

Something like load_copy and store_copy would be a bit unfortunate, since store_copy doesn't actually copy the source operand and we want to have a load_copy [take].

load_value and store_value seem excessively generic. It's not like non-trivial types aren't values.

One question that comes to mind: do we actually need new instructions here other than for staging purposes? We don't actually need new instructions for pseudo-linear SIL to work; we just need to say that we only enforce pseudo-linearity for non-trivial types.

If we just want the instruction to be explicit about ownership so that we can easily distinguish these cases, we can make the rule always explicit, e.g.:
  load [take] %0 : $*MyClass
  load [copy] %0 : $*MyClass
  load [trivial] %0 : $*Int

  store %0 to [initialization] %1 : $*MyClass
  store %0 to [assignment] %1 : $*MyClass
  store %0 to [trivial] %1 : $*Int

John.

The reason why I originally suggested to go the load_strong route is that we already have load_weak, load_unowned instructions. If I could add a load_strong instruction, then it would make sense to assign an engineer to do a pass over all 3 of these instructions and combine them into 1 load instruction. That is, first transform into a form amenable for canonicalization and then canonicalize all at once.

As you pointed out, both load_unowned and load_weak involve representation changes in type (for instance the change of weak pointers to Optional<T>). Such a change would be against the "spirit" of a load instruction to perform such representation changes versus ownership changes.

In terms of the properties that we actually want here, what is important is that we can verify that no non-trivially typed values are loaded in an unsafe unowned manner. That can be done also with ownership flags on load/store.

Does this sound reasonable:

1. We introduce two enums that define memory ownership changes, one for load and one for store. Both of these enums will contain a [trivial] ownership.
2. We enforce in the verifier that non-trivial types must have a non-trivial ownership modifier on any memory operations that they are involved in.

Michael

···

On Oct 4, 2016, at 1:04 PM, John McCall <rjmccall@apple.com> wrote:

On Sep 30, 2016, at 11:54 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

2. banning the use of `load`, `store` on values of `non-trivial` type.

This will allow for:

1. eliminating optimizer miscompiles that occur due to releases being moved into
   the region in between a `load`/`retain`, `load`/`release`,
   `store`/`release`. (For a specific example, see the appendix).
2. modeling `load`/`store` as having `unsafe unowned` ownership semantics. This
   will be enforced via the verifier.
3. more aggressive ARC code motion.

# Definitions

## load_strong

We propose three different forms of load_strong differentiated via flags. First
define `load_strong` as follows:

    %x = load_strong %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C
    retain_value %x : $C

Then define `load_strong [take]` as:

    %x = load_strong [take] %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject

**NOTE** `load_strong [take]` implies that the loaded from memory location no
longer owns the result object (i.e. a take is a move). Loading from the memory
location again without reinitialization is illegal.

Next we provide `load_strong [guaranteed]`:

    %x = load_strong [guaranteed] %x_ptr : $*Builtin.NativeObject
    ...
    fixLifetime(%x)

      =>

    %x = load %x_ptr : $*Builtin.NativeObject
    ...
    fixLifetime(%x)

`load_strong [guaranteed]` implies that in the region before the fixLifetime,
the loaded object is guaranteed semantically to remain alive. The fixLifetime
communicates to the optimizer the location up to which the value's lifetime is
guaranteed to live. An example of where this construct is useful is when one has
a let binding to a class instance `c` that contains a let field `f`. In that
case `c`'s lifetime guarantees `f`'s lifetime.

## store_strong

Define a store_strong as follows:

    store_strong %x to %x_ptr : $*C

       =>

    %old_x = load %x_ptr : $*C
    store %new_x to %x_ptr : $*C
    release_value %old_x : $C

*NOTE* store_strong is defined as a consuming operation. We also provide
`store_strong [init]` in the case where we know statically that there is no
previous value in the memory location:

    store_strong %x to [init] %x_ptr : $*C

       =>

    store %new_x to %x_ptr : $*C

# Implementation

## Goals

Our implementation strategy goals are:

1. zero impact on other compiler developers until the feature is fully
   developed. This implies all work will be done behind a flag.
2. separation of feature implementation from updating passes.

Goal 2 will be implemented via a pass that blows up `load_strong`/`store_strong`
right after SILGen.

## Plan

We begin by adding initial infrastructure for our development. This means:

1. Adding to SILOptions a disabled by default flag called
"EnableSILOwnershipModel". This flag will be set by a false by default frontend
option called "-enable-sil-ownership-mode".

2. Bots will be brought up to test the compiler with
   "-enable-sil-ownership-model" set to true. The specific bots are:

   * RA-OSX+simulators
   * RA-Device
   * RA-Linux.

   The bots will run once a day until the feature is close to completion. Then a
   polling model will be followed.

Now that change isolation is guaranteed, we develop building blocks for the
optimization:

1. load_strong, store_strong will be added to SIL and IRGen, serialization,
printing, SIL parsing support will be implemented. SILGen will not be modified
at this stage.

2. A pass called the "OwnershipModelEliminator" will be implemented. It will
(initially) blow up load_strong/store_strong instructions into their constituent
operations.

3. An option called "EnforceSILOwnershipMode" will be added to the verifier. If
the option is set, the verifier will assert if unsafe unowned loads, stores are
used to load from non-trivial memory locations.

Finally, we wire up the building blocks:

1. If SILOption.EnableSILOwnershipModel is true, then the after SILGen SIL
   verification will be performed with EnforceSILOwnershipModel set to true.
2. If SILOption.EnableSILOwnershipModel is true, then the pass manager will run
   the OwnershipModelEliminator pass right after SILGen before the normal pass
   pipeline starts.
3. SILGen will be changed to emit load_strong, store_strong instructions when
   the EnableSILOwnershipModel flag is set. We will use the verifier throwing to
   guarantee that we are not missing any specific cases.

Then once all fo the bots are green, we change SILOption.EnableSILOwnershipModel
to be true by default. After a cooling off period, we move all of the code
behind the SILOwnershipModel flag in front of the flag. We do this so we can
reuse that flag for further SILOwnershipModel changes.

## Optimizer Changes

Since the SILOwnershipModel eliminator will eliminate the load_strong,
store_strong instructions right after ownership verification, there will be no
immediate affects on the optimizer and thus the optimizer changes can be done in
parallel with the rest of the ARC optimization work.

But, in the long run, we need IRGen to eliminate the load_strong, store_strong
instructions, not the SILOwnershipModel eliminator, so that we can enforce
Ownership invariants all through the SIL pipeline. Thus we will need to update
passes to handle these new instructions. The main optimizer changes can be
separated into the following areas: memory forwarding, dead stores, ARC
optimization. In all of these cases, the necessary changes are relatively
trivial to respond to. We give a quick taste of two of them: store->load
forwarding and ARC Code Motion.

### store->load forwarding

Currently we perform store->load forwarding as follows:

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load %x_ptr : $C
    use(%y)

      =>

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    use(%x)

In a world, where we are using load_strong, store_strong, we have to also
consider the ownership implications. *NOTE* Since we are not modifying the
store_strong, `store_strong` and `store_strong [init]` are treated the
same. Thus without any loss of generality, lets consider solely `store_strong`.

    store_strong %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load_strong %x_ptr : $C
    use(%y)

      =>

    store_strong %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    strong_retain %x
    use(%x)

### ARC Code Motion

If ARC Code Motion wishes to move `load_strong`, `store_strong` instructions, it
must now consider read/write effects. On the other hand, it will be able to now
not consider the side-effects of destructors when moving retain/release
operations.

### Normal Code Motion

Normal code motion will lose some effectiveness since many of the load/store
operations that it used to be able to move now must consider ARC information. We
may need to consider running ARC code motion earlier in the pipeline where we
normally run Normal Code Motion to ensure that we are able to handle these
cases.

### ARC Optimization

The main implication for ARC optimization is that instead of eliminating just
retains, releases, it must be able to recognize `load_strong`, `store_strong`
and set their flags as appropriate.

### Function Signature Optimization

Semantic ARC affects function signature optimization in the context of the owned
to guaranteed optimization. Specifically:

1. A `store_strong` must be recognized as a release of the old value that is
   being overridden. In such a case, we can move the `release` of the old value
   into the caller and change the `store_strong` into a `store_strong
   [init]`.
2. A `load_strong` must be recognized as a retain in the callee. Then function
   signature optimization will transform the `load_strong` into a `load_strong
   [guaranteed]`. This would require the addition of a new `@guaranteed` return
   value convention.

# Appendix

## Partial Initialization of Loadable References in SIL

In SIL, a value of non-trivial loadable type is loaded from a memory location as
follows:

    %x = load %x_ptr : $*S
    ...
    retain_value %x_ptr : $S

At first glance, this looks reasonable, but in truth there is a hidden drawback:
the partially initialized zone in between the load and the retain
operation. This zone creates a period of time when an "evil optimizer" could
insert an instruction that causes x to be deallocated before the copy is
finished being initialized. Similar issues come up when trying to perform a
store of a non-trival value into a memory location.

Since this sort of partial initialization is allowed in SIL, the optimizer is
forced to be overly conservative when attempting to move releases passed retains
lest the release triggers a deinit that destroys a value like `%x`. Lets look at
two concrete examples that show how semantically providing load_strong,
store_strong instructions eliminate this problem.

**NOTE** Without any loss of generality, we will speak of values with reference
semantics instead of non-trivial values.

## Case Study: Partial Initialization and load_strong

### The Problem

Consider the following swift program:

    func opaque_call()

    final class C {
      var int: Int = 0
      deinit {
        opaque_call()
      }
    }

    final class D {
      var int: Int = 0
    }

    var GLOBAL_C : C? = nil
    var GLOBAL_D : D? = nil

    func useC(_ c: C)
    func useD(_ d: D)

    func run() {
        let c = C()
        GLOBAL_C = c
        let d = D()
        GLOBAL_D = d
        useC(c)
        useD(d)
    }

Notice that both `C` and `D` have fixed layouts and separate class hierarchies,
but `C`'s deinit has a call to the function `opaque_call` which may write to
`GLOBAL_D` or `GLOBAL_C`. Additionally assume that both `useC` and `useD` are
known to the compiler to not have any affects on instances of type `D`, `C`
respectively and useC assigns `nil` to `GLOBAL_C`. Now consider the following
valid SIL lowering for `run`:

    sil_global GLOBAL_D : $D
    sil_global GLOBAL_C : $C

    final class C {
      var x: Int
      deinit
    }

    final class D {
      var x: Int
    }

    sil @useC : $@convention(thin) () -> ()
    sil @useD : $@convention(thin) () -> ()

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load %global_c : $*C (3)
      strong_retain %c2 : $C (4)
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Lets optimize this function! First we perform the following operations:

1. Since `(2)` is storing to an identified object that can not be `GLOBAL_C`, we
   can store to load forward `(1)` to `(3)`.
2. Since a retain does not block store to load forwarding, we can forward `(2)`
   to `(5)`. But lets for the sake of argument, assume that the optimizer keeps
   such information as an analysis and does not perform the actual load->store
   forwarding.
3. Even though we do not foward `(2)` to `(5)`, we can still move `(4)` over
   `(6)` so that `(4)` is right before `(7)`.

This yields (using the ' marker to designate that a register has had load-store
forwarding applied to it):

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      strong_retain %c : $C (4')
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Then by assumption, we know that `%useC` does not perform any releases of any
instances of class `D`. Thus `(6)` can be moved past `(7')` and we can then pair
and eliminate `(6)` and `(9)` via the rules of ARC optimization using the
analysis information that `%d2` and `%d` are th same due to the possibility of
performing store->load forwarding. After performing such transformations, `run`
looks as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      %d2 = load %global_d : $*D (5)
      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Now by assumption, we know that `%useD_func` does not touch any instances of
class `C` and `%c` does not contain any ivars of type `D` and is final so none
can be added. At first glance, this seems to suggest that we can move `(10)`
before `(8')` and then pair/eliminate `(4')` and `(10)`. But is this a safe
optimization perform? Absolutely Not! Why? Remember that since `useC_func`
assigns `nil` to `GLOBAL_C`, after `(7')`, `%c` could have a reference count
of 1. Thus `(10)` _may_ invoke the destructor of `C`. Since this destructor
calls an opaque function that _could_ potentially write to `GLOBAL_D`, we may be
be passing `%d2`, an already deallocated object to `%useD_func`, an illegal
optimization!

Lets think a bit more about this example and consider this example at the
language level. Remember that while Swift's deinit are not asychronous, we do
not allow for user level code to create dependencies from the body of the
destructor into the normal control flow that has called it. This means that
there are two valid results of this code:

- Operation Sequence 1: No optimization is performed and `%d2` is passed to
  `%useD_func`.
- Operation Sequence 2: We shorten the lifetime of `%c` before `%useD_func` and
   a different instance of `$D` is passed into `%useD_func`.

The fact that 1 occurs without optimization is just as a result of an
implementation detail of SILGen. 2 is also a valid sequence of operations.

Given that:

1. As a principle, the optimizer does not consider such dependencies to avoid
   being overly conservative.
2. We provide constructs to ensure appropriate lifetimes via the usage of
   constructs such as fix_lifetime.

We need to figure out how to express our optimization such that 2
happens. Remember that one of the optimizations that we performed at the
beginning was to move `(6)` over `(7')`, i.e., transform this:

      %d = alloc_ref $D
      %global_d_addr = global_addr GLOBAL_D : $D
      %d = load %global_d_addr : $*D (5)
      strong_retain %d : $D (6)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

into:

      %global_d_addr = global_addr GLOBAL_D : $D
      %d2 = load %global_d_addr : $*D (5)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      strong_retain %d2 : $D (6)

This transformation in Swift corresponds to transforming:

      let d = GLOBAL_D
      useC(c)

to:

      let d_raw = load_d_value(GLOBAL_D)
      useC(c)
      let d = take_ownership_of_d(d_raw)

This is clearly an instance where we have moved a side-effect in between the
loading of the data and the taking ownership of such data, that is before the
`let` is fully initialized. What if instead of just moving the retain, we moved
the entire let statement? This would then result in the following swift code:

      useC(c)
      let d = GLOBAL_D

and would correspond to the following SIL snippet:

      %global_d_addr = global_addr GLOBAL_D : $D

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      %d2 = load %global_d_addr : $*D (5)
      strong_retain %d2 : $D (6)

Moving the load with the strong_retain to ensure that the full initialization is
performed even after code motion causes our SIL to look as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %d2 = load %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Giving us the exact result that we want: Operation Sequence 2!

### Defining load_strong

Given that we wish the load, store to be tightly coupled together, it is natural
to express this operation as a `load_strong` instruction. Lets define the
`load_strong` instruction as follows:

    %1 = load_strong %0 : $*C

      =>

    %1 = load %0 : $*C
    retain_value %1 : $C

Now lets transform our initial example to use this instruction:

Notice how now if we move `(7)` over `(3)` and `(6)` now, we get the following SIL:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong %global_c : $*C (3)
      %d2 = load_strong %global_d : $*D (5)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

We then perform the previous code motion:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)
      strong_release %d : $D (9)

      %d2 = load_strong %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
      strong_release %c : $C (10)
    }

We then would like to eliminate `(9)` and `(10)` by pairing them with `(3)` and
`(8)`. Can we still do so? One way we could do this is by introducing the
`[take]` flag. The `[take]` flag on a load_strong says that one is semantically
loading a value from a memory location and are taking ownership of the value
thus eliding the retain. In terms of SIL this flag is defined as:

    %x = load_strong [take] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C

Why do we care about having such a `load_strong [take]` instruction when we
could just use a `load`? The reason why is that a normal `load` has unsafe
unowned ownership (i.e. it has no implications on ownership). We would like for
memory that has non-trivial type to only be able to be loaded via instructions
that maintain said ownership. We will allow for casting to trivial types as
usual to provide such access if it is required.

Thus we have achieved the desired result:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong [take] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %d2 = load_strong [take] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
    }

_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

The document attached below contains the first "Semantic ARC" mini proposal: the High Level ARC Memory Operations Proposal.

An html rendered version of this markdown document is available at the following URL:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

----

# Summary

This document proposes:

1. adding the `load_strong`, `store_strong` instructions to SIL. These can only
   be used with memory locations of `non-trivial` type.

I would really like to avoid using the word "strong" here. Under the current proposal, these instructions will be usable with arbitrary non-trivial types, not just primitive class references. Even if you think of an aggregate that happens to contain one or more strong references as some sort of aggregate strong reference (which is questionable but not completely absurd), we already have loadable non-strong class references that this operation would be usable with, like native unowned references. "load_strong %0 : $*@sil_unowned T" as an operation yielding a scalar "@sil_unowned T" is ridiculous, and it will only get more ridiculous when we eventually allow this operation to work with types that are currently address-only, like weak references.

Brainstorming:

Something like load_copy and store_copy would be a bit unfortunate, since store_copy doesn't actually copy the source operand and we want to have a load_copy [take].

load_value and store_value seem excessively generic. It's not like non-trivial types aren't values.

One question that comes to mind: do we actually need new instructions here other than for staging purposes? We don't actually need new instructions for pseudo-linear SIL to work; we just need to say that we only enforce pseudo-linearity for non-trivial types.

If we just want the instruction to be explicit about ownership so that we can easily distinguish these cases, we can make the rule always explicit, e.g.:
  load [take] %0 : $*MyClass
  load [copy] %0 : $*MyClass
  load [trivial] %0 : $*Int

  store %0 to [initialization] %1 : $*MyClass
  store %0 to [assignment] %1 : $*MyClass
  store %0 to [trivial] %1 : $*Int

John.

The reason why I originally suggested to go the load_strong route is that we already have load_weak, load_unowned instructions. If I could add a load_strong instruction, then it would make sense to assign an engineer to do a pass over all 3 of these instructions and combine them into 1 load instruction. That is, first transform into a form amenable for canonicalization and then canonicalize all at once.

As you pointed out, both load_unowned and load_weak involve representation changes in type (for instance the change of weak pointers to Optional<T>). Such a change would be against the "spirit" of a load instruction to perform such representation changes versus ownership changes.

In terms of the properties that we actually want here, what is important is that we can verify that no non-trivially typed values are loaded in an unsafe unowned manner. That can be done also with ownership flags on load/store.

Does this sound reasonable:

1. We introduce two enums that define memory ownership changes, one for load and one for store. Both of these enums will contain a [trivial] ownership.
2. We enforce in the verifier that non-trivial types must have a non-trivial ownership modifier on any memory operations that they are involved in.

Sorry for not being explicit. I will not add new instructions, just modifiers. Assuming that this is agreeable to you, I am going to prepare a quick additional version of the proposal document.

···

On Oct 5, 2016, at 4:40 PM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

On Oct 4, 2016, at 1:04 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Sep 30, 2016, at 11:54 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

Michael

2. banning the use of `load`, `store` on values of `non-trivial` type.

This will allow for:

1. eliminating optimizer miscompiles that occur due to releases being moved into
   the region in between a `load`/`retain`, `load`/`release`,
   `store`/`release`. (For a specific example, see the appendix).
2. modeling `load`/`store` as having `unsafe unowned` ownership semantics. This
   will be enforced via the verifier.
3. more aggressive ARC code motion.

# Definitions

## load_strong

We propose three different forms of load_strong differentiated via flags. First
define `load_strong` as follows:

    %x = load_strong %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C
    retain_value %x : $C

Then define `load_strong [take]` as:

    %x = load_strong [take] %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject

**NOTE** `load_strong [take]` implies that the loaded from memory location no
longer owns the result object (i.e. a take is a move). Loading from the memory
location again without reinitialization is illegal.

Next we provide `load_strong [guaranteed]`:

    %x = load_strong [guaranteed] %x_ptr : $*Builtin.NativeObject
    ...
    fixLifetime(%x)

      =>

    %x = load %x_ptr : $*Builtin.NativeObject
    ...
    fixLifetime(%x)

`load_strong [guaranteed]` implies that in the region before the fixLifetime,
the loaded object is guaranteed semantically to remain alive. The fixLifetime
communicates to the optimizer the location up to which the value's lifetime is
guaranteed to live. An example of where this construct is useful is when one has
a let binding to a class instance `c` that contains a let field `f`. In that
case `c`'s lifetime guarantees `f`'s lifetime.

## store_strong

Define a store_strong as follows:

    store_strong %x to %x_ptr : $*C

       =>

    %old_x = load %x_ptr : $*C
    store %new_x to %x_ptr : $*C
    release_value %old_x : $C

*NOTE* store_strong is defined as a consuming operation. We also provide
`store_strong [init]` in the case where we know statically that there is no
previous value in the memory location:

    store_strong %x to [init] %x_ptr : $*C

       =>

    store %new_x to %x_ptr : $*C

# Implementation

## Goals

Our implementation strategy goals are:

1. zero impact on other compiler developers until the feature is fully
   developed. This implies all work will be done behind a flag.
2. separation of feature implementation from updating passes.

Goal 2 will be implemented via a pass that blows up `load_strong`/`store_strong`
right after SILGen.

## Plan

We begin by adding initial infrastructure for our development. This means:

1. Adding to SILOptions a disabled by default flag called
"EnableSILOwnershipModel". This flag will be set by a false by default frontend
option called "-enable-sil-ownership-mode".

2. Bots will be brought up to test the compiler with
   "-enable-sil-ownership-model" set to true. The specific bots are:

   * RA-OSX+simulators
   * RA-Device
   * RA-Linux.

   The bots will run once a day until the feature is close to completion. Then a
   polling model will be followed.

Now that change isolation is guaranteed, we develop building blocks for the
optimization:

1. load_strong, store_strong will be added to SIL and IRGen, serialization,
printing, SIL parsing support will be implemented. SILGen will not be modified
at this stage.

2. A pass called the "OwnershipModelEliminator" will be implemented. It will
(initially) blow up load_strong/store_strong instructions into their constituent
operations.

3. An option called "EnforceSILOwnershipMode" will be added to the verifier. If
the option is set, the verifier will assert if unsafe unowned loads, stores are
used to load from non-trivial memory locations.

Finally, we wire up the building blocks:

1. If SILOption.EnableSILOwnershipModel is true, then the after SILGen SIL
   verification will be performed with EnforceSILOwnershipModel set to true.
2. If SILOption.EnableSILOwnershipModel is true, then the pass manager will run
   the OwnershipModelEliminator pass right after SILGen before the normal pass
   pipeline starts.
3. SILGen will be changed to emit load_strong, store_strong instructions when
   the EnableSILOwnershipModel flag is set. We will use the verifier throwing to
   guarantee that we are not missing any specific cases.

Then once all fo the bots are green, we change SILOption.EnableSILOwnershipModel
to be true by default. After a cooling off period, we move all of the code
behind the SILOwnershipModel flag in front of the flag. We do this so we can
reuse that flag for further SILOwnershipModel changes.

## Optimizer Changes

Since the SILOwnershipModel eliminator will eliminate the load_strong,
store_strong instructions right after ownership verification, there will be no
immediate affects on the optimizer and thus the optimizer changes can be done in
parallel with the rest of the ARC optimization work.

But, in the long run, we need IRGen to eliminate the load_strong, store_strong
instructions, not the SILOwnershipModel eliminator, so that we can enforce
Ownership invariants all through the SIL pipeline. Thus we will need to update
passes to handle these new instructions. The main optimizer changes can be
separated into the following areas: memory forwarding, dead stores, ARC
optimization. In all of these cases, the necessary changes are relatively
trivial to respond to. We give a quick taste of two of them: store->load
forwarding and ARC Code Motion.

### store->load forwarding

Currently we perform store->load forwarding as follows:

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load %x_ptr : $C
    use(%y)

      =>

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    use(%x)

In a world, where we are using load_strong, store_strong, we have to also
consider the ownership implications. *NOTE* Since we are not modifying the
store_strong, `store_strong` and `store_strong [init]` are treated the
same. Thus without any loss of generality, lets consider solely `store_strong`.

    store_strong %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load_strong %x_ptr : $C
    use(%y)

      =>

    store_strong %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    strong_retain %x
    use(%x)

### ARC Code Motion

If ARC Code Motion wishes to move `load_strong`, `store_strong` instructions, it
must now consider read/write effects. On the other hand, it will be able to now
not consider the side-effects of destructors when moving retain/release
operations.

### Normal Code Motion

Normal code motion will lose some effectiveness since many of the load/store
operations that it used to be able to move now must consider ARC information. We
may need to consider running ARC code motion earlier in the pipeline where we
normally run Normal Code Motion to ensure that we are able to handle these
cases.

### ARC Optimization

The main implication for ARC optimization is that instead of eliminating just
retains, releases, it must be able to recognize `load_strong`, `store_strong`
and set their flags as appropriate.

### Function Signature Optimization

Semantic ARC affects function signature optimization in the context of the owned
to guaranteed optimization. Specifically:

1. A `store_strong` must be recognized as a release of the old value that is
   being overridden. In such a case, we can move the `release` of the old value
   into the caller and change the `store_strong` into a `store_strong
   [init]`.
2. A `load_strong` must be recognized as a retain in the callee. Then function
   signature optimization will transform the `load_strong` into a `load_strong
   [guaranteed]`. This would require the addition of a new `@guaranteed` return
   value convention.

# Appendix

## Partial Initialization of Loadable References in SIL

In SIL, a value of non-trivial loadable type is loaded from a memory location as
follows:

    %x = load %x_ptr : $*S
    ...
    retain_value %x_ptr : $S

At first glance, this looks reasonable, but in truth there is a hidden drawback:
the partially initialized zone in between the load and the retain
operation. This zone creates a period of time when an "evil optimizer" could
insert an instruction that causes x to be deallocated before the copy is
finished being initialized. Similar issues come up when trying to perform a
store of a non-trival value into a memory location.

Since this sort of partial initialization is allowed in SIL, the optimizer is
forced to be overly conservative when attempting to move releases passed retains
lest the release triggers a deinit that destroys a value like `%x`. Lets look at
two concrete examples that show how semantically providing load_strong,
store_strong instructions eliminate this problem.

**NOTE** Without any loss of generality, we will speak of values with reference
semantics instead of non-trivial values.

## Case Study: Partial Initialization and load_strong

### The Problem

Consider the following swift program:

    func opaque_call()

    final class C {
      var int: Int = 0
      deinit {
        opaque_call()
      }
    }

    final class D {
      var int: Int = 0
    }

    var GLOBAL_C : C? = nil
    var GLOBAL_D : D? = nil

    func useC(_ c: C)
    func useD(_ d: D)

    func run() {
        let c = C()
        GLOBAL_C = c
        let d = D()
        GLOBAL_D = d
        useC(c)
        useD(d)
    }

Notice that both `C` and `D` have fixed layouts and separate class hierarchies,
but `C`'s deinit has a call to the function `opaque_call` which may write to
`GLOBAL_D` or `GLOBAL_C`. Additionally assume that both `useC` and `useD` are
known to the compiler to not have any affects on instances of type `D`, `C`
respectively and useC assigns `nil` to `GLOBAL_C`. Now consider the following
valid SIL lowering for `run`:

    sil_global GLOBAL_D : $D
    sil_global GLOBAL_C : $C

    final class C {
      var x: Int
      deinit
    }

    final class D {
      var x: Int
    }

    sil @useC : $@convention(thin) () -> ()
    sil @useD : $@convention(thin) () -> ()

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load %global_c : $*C (3)
      strong_retain %c2 : $C (4)
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Lets optimize this function! First we perform the following operations:

1. Since `(2)` is storing to an identified object that can not be `GLOBAL_C`, we
   can store to load forward `(1)` to `(3)`.
2. Since a retain does not block store to load forwarding, we can forward `(2)`
   to `(5)`. But lets for the sake of argument, assume that the optimizer keeps
   such information as an analysis and does not perform the actual load->store
   forwarding.
3. Even though we do not foward `(2)` to `(5)`, we can still move `(4)` over
   `(6)` so that `(4)` is right before `(7)`.

This yields (using the ' marker to designate that a register has had load-store
forwarding applied to it):

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      strong_retain %c : $C (4')
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Then by assumption, we know that `%useC` does not perform any releases of any
instances of class `D`. Thus `(6)` can be moved past `(7')` and we can then pair
and eliminate `(6)` and `(9)` via the rules of ARC optimization using the
analysis information that `%d2` and `%d` are th same due to the possibility of
performing store->load forwarding. After performing such transformations, `run`
looks as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      %d2 = load %global_d : $*D (5)
      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Now by assumption, we know that `%useD_func` does not touch any instances of
class `C` and `%c` does not contain any ivars of type `D` and is final so none
can be added. At first glance, this seems to suggest that we can move `(10)`
before `(8')` and then pair/eliminate `(4')` and `(10)`. But is this a safe
optimization perform? Absolutely Not! Why? Remember that since `useC_func`
assigns `nil` to `GLOBAL_C`, after `(7')`, `%c` could have a reference count
of 1. Thus `(10)` _may_ invoke the destructor of `C`. Since this destructor
calls an opaque function that _could_ potentially write to `GLOBAL_D`, we may be
be passing `%d2`, an already deallocated object to `%useD_func`, an illegal
optimization!

Lets think a bit more about this example and consider this example at the
language level. Remember that while Swift's deinit are not asychronous, we do
not allow for user level code to create dependencies from the body of the
destructor into the normal control flow that has called it. This means that
there are two valid results of this code:

- Operation Sequence 1: No optimization is performed and `%d2` is passed to
  `%useD_func`.
- Operation Sequence 2: We shorten the lifetime of `%c` before `%useD_func` and
   a different instance of `$D` is passed into `%useD_func`.

The fact that 1 occurs without optimization is just as a result of an
implementation detail of SILGen. 2 is also a valid sequence of operations.

Given that:

1. As a principle, the optimizer does not consider such dependencies to avoid
   being overly conservative.
2. We provide constructs to ensure appropriate lifetimes via the usage of
   constructs such as fix_lifetime.

We need to figure out how to express our optimization such that 2
happens. Remember that one of the optimizations that we performed at the
beginning was to move `(6)` over `(7')`, i.e., transform this:

      %d = alloc_ref $D
      %global_d_addr = global_addr GLOBAL_D : $D
      %d = load %global_d_addr : $*D (5)
      strong_retain %d : $D (6)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

into:

      %global_d_addr = global_addr GLOBAL_D : $D
      %d2 = load %global_d_addr : $*D (5)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      strong_retain %d2 : $D (6)

This transformation in Swift corresponds to transforming:

      let d = GLOBAL_D
      useC(c)

to:

      let d_raw = load_d_value(GLOBAL_D)
      useC(c)
      let d = take_ownership_of_d(d_raw)

This is clearly an instance where we have moved a side-effect in between the
loading of the data and the taking ownership of such data, that is before the
`let` is fully initialized. What if instead of just moving the retain, we moved
the entire let statement? This would then result in the following swift code:

      useC(c)
      let d = GLOBAL_D

and would correspond to the following SIL snippet:

      %global_d_addr = global_addr GLOBAL_D : $D

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      %d2 = load %global_d_addr : $*D (5)
      strong_retain %d2 : $D (6)

Moving the load with the strong_retain to ensure that the full initialization is
performed even after code motion causes our SIL to look as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %d2 = load %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Giving us the exact result that we want: Operation Sequence 2!

### Defining load_strong

Given that we wish the load, store to be tightly coupled together, it is natural
to express this operation as a `load_strong` instruction. Lets define the
`load_strong` instruction as follows:

    %1 = load_strong %0 : $*C

      =>

    %1 = load %0 : $*C
    retain_value %1 : $C

Now lets transform our initial example to use this instruction:

Notice how now if we move `(7)` over `(3)` and `(6)` now, we get the following SIL:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong %global_c : $*C (3)
      %d2 = load_strong %global_d : $*D (5)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

We then perform the previous code motion:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)
      strong_release %d : $D (9)

      %d2 = load_strong %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
      strong_release %c : $C (10)
    }

We then would like to eliminate `(9)` and `(10)` by pairing them with `(3)` and
`(8)`. Can we still do so? One way we could do this is by introducing the
`[take]` flag. The `[take]` flag on a load_strong says that one is semantically
loading a value from a memory location and are taking ownership of the value
thus eliding the retain. In terms of SIL this flag is defined as:

    %x = load_strong [take] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C

Why do we care about having such a `load_strong [take]` instruction when we
could just use a `load`? The reason why is that a normal `load` has unsafe
unowned ownership (i.e. it has no implications on ownership). We would like for
memory that has non-trivial type to only be able to be loaded via instructions
that maintain said ownership. We will allow for casting to trivial types as
usual to provide such access if it is required.

Thus we have achieved the desired result:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load_strong [take] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %d2 = load_strong [take] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
    }

_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

That sounds great, thanks.

John.

···

On Oct 5, 2016, at 4:48 PM, Michael Gottesman <mgottesman@apple.com> wrote:

On Oct 5, 2016, at 4:40 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

On Oct 4, 2016, at 1:04 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Sep 30, 2016, at 11:54 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

The document attached below contains the first "Semantic ARC" mini proposal: the High Level ARC Memory Operations Proposal.

An html rendered version of this markdown document is available at the following URL:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

----

# Summary

This document proposes:

1. adding the `load_strong`, `store_strong` instructions to SIL. These can only
   be used with memory locations of `non-trivial` type.

I would really like to avoid using the word "strong" here. Under the current proposal, these instructions will be usable with arbitrary non-trivial types, not just primitive class references. Even if you think of an aggregate that happens to contain one or more strong references as some sort of aggregate strong reference (which is questionable but not completely absurd), we already have loadable non-strong class references that this operation would be usable with, like native unowned references. "load_strong %0 : $*@sil_unowned T" as an operation yielding a scalar "@sil_unowned T" is ridiculous, and it will only get more ridiculous when we eventually allow this operation to work with types that are currently address-only, like weak references.

Brainstorming:

Something like load_copy and store_copy would be a bit unfortunate, since store_copy doesn't actually copy the source operand and we want to have a load_copy [take].

load_value and store_value seem excessively generic. It's not like non-trivial types aren't values.

One question that comes to mind: do we actually need new instructions here other than for staging purposes? We don't actually need new instructions for pseudo-linear SIL to work; we just need to say that we only enforce pseudo-linearity for non-trivial types.

If we just want the instruction to be explicit about ownership so that we can easily distinguish these cases, we can make the rule always explicit, e.g.:
  load [take] %0 : $*MyClass
  load [copy] %0 : $*MyClass
  load [trivial] %0 : $*Int

  store %0 to [initialization] %1 : $*MyClass
  store %0 to [assignment] %1 : $*MyClass
  store %0 to [trivial] %1 : $*Int

John.

The reason why I originally suggested to go the load_strong route is that we already have load_weak, load_unowned instructions. If I could add a load_strong instruction, then it would make sense to assign an engineer to do a pass over all 3 of these instructions and combine them into 1 load instruction. That is, first transform into a form amenable for canonicalization and then canonicalize all at once.

As you pointed out, both load_unowned and load_weak involve representation changes in type (for instance the change of weak pointers to Optional<T>). Such a change would be against the "spirit" of a load instruction to perform such representation changes versus ownership changes.

In terms of the properties that we actually want here, what is important is that we can verify that no non-trivially typed values are loaded in an unsafe unowned manner. That can be done also with ownership flags on load/store.

Does this sound reasonable:

1. We introduce two enums that define memory ownership changes, one for load and one for store. Both of these enums will contain a [trivial] ownership.
2. We enforce in the verifier that non-trivial types must have a non-trivial ownership modifier on any memory operations that they are involved in.

Sorry for not being explicit. I will not add new instructions, just modifiers. Assuming that this is agreeable to you, I am going to prepare a quick additional version of the proposal document.

Attached below is an updated version of the proposal. Again a rendered version is located at:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

Michael

···

----

# Summary

This document proposes:

1. adding the following ownership qualifiers to `load`: `[take]`, `[copy]`,
   `[borrow]`, `[trivial]`.
2. adding the following ownership qualifiers to `store`: `[init]`, `[assign]`,
   `[trivial]`.
3. requiring all `load` and `store` operations to have ownership qualifiers.
4. banning the use of `load [trivial]`, `store [trivial]` on memory locations of
   `non-trivial` type.

This will allow for:

1. eliminating optimizer miscompiles that occur due to releases being moved into
   the region in between a `load`/`retain`, `load`/`release`,
   `store`/`release`. (For a specific example, see the appendix).
2. explicitly modeling `load [trivial]`/`store [trivial]` as having `unsafe
   unowned` ownership semantics. This will be enforced via the verifier.
3. more aggressive ARC code motion.

# Definitions

## ownership qualified load

We propose four different ownership qualifiers for load. Define `load [trivial]`
as:

    %x = load [trivial] %x_ptr : $*Int

      =>

    %x = load %x_ptr : $*Int

A `load [trivial]` can not be used to load values of non-trivial type. Define
`load [copy]` as:

    %x = load [copy] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C
    retain_value %x : $C

Then define `load [take]` as:

    %x = load [take] %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject

**NOTE** `load [take]` implies that the loaded from memory location no longer
owns the result object (i.e. a take is a move). Loading from the memory location
again without reinitialization is illegal.

Next we provide `load [borrow]`:

    %x = load [borrow] %x_ptr : $*Builtin.NativeObject
    ...
    endBorrow(%x, %x_ptr)

      =>

    %x = load %x_ptr : $*Builtin.NativeObject
    ...
    endBorrow(%x, %x_ptr)

`load [borrow]` implies that in the region between the `load` and the
`endBorrow`, the loaded object must semantically remain alive. The `endBorrow`
communicates to the optimizer:

1. That the value in `%x_ptr` should not be destroyed before endBorrow.
2. Uses of `%x` should not be sunk past endBorrow since `%x` is only a shallow
   copy of the value in `%x_ptr` and past that point `%x_ptr` may not remain
   alive.

An example of where this construct is useful is when one has a let binding to a
class instance `c` that contains a let field `f`. In that case `c`'s lifetime
guarantees `f`'s lifetime meaning that returning `f` over the function call
boundary is safe.

## ownership qualified store

First define a `store [trivial]` as:

    store %x to [trivial] %x_ptr : $*Int

      =>

    store %x to %x_ptr : $*Int

The verifier will prevent this instruction from being used on types with
non-trivial ownership. Define a `store [assign]` as follows:

    store %x to [assign] %x_ptr : $*C

       =>

    %old_x = load %x_ptr : $*C
    store %new_x to %x_ptr : $*C
    release_value %old_x : $C

*NOTE* `store` is defined as a consuming operation. We also provide
`store [init]` in the case where we know statically that there is no
previous value in the memory location:

    store %x to [init] %x_ptr : $*C

       =>

    store %new_x to %x_ptr : $*C

# Implementation

## Goals

Our implementation strategy goals are:

1. zero impact on other compiler developers until the feature is fully
   developed. This implies all work will be done behind a flag.
2. separation of feature implementation from updating passes.

Goal 2 will be implemented via a pass that transforms ownership qualified
`load`/`store` instructions into unqualified `load`/`store` right after SILGen.

## Plan

We begin by adding initial infrastructure for our development. This means:

1. Adding to SILOptions a disabled by default flag called
"EnableSILOwnershipModel". This flag will be set by a false by default frontend
option called "-enable-sil-ownership-mode".

2. Bots will be brought up to test the compiler with
   "-enable-sil-ownership-model" set to true. The specific bots are:

   * RA-OSX+simulators
   * RA-Device
   * RA-Linux.

   The bots will run once a day until the feature is close to completion. Then a
   polling model will be followed.

Now that change isolation is borrow, we develop building blocks for the
optimization:

1. Two enums will be defined: `LoadInstOwnershipQualifier`,
   `StoreInstOwnershipQualifier`. The exact definition of these enums are as
   follows:

       enum class LoadOwnershipQualifier {
         Unqualified, Take, Copy, Borrow, Trivial
       };
       enum class StoreOwnershipQualifier {
         Unqualified, Init, Assign, Trivial
       };

    *NOTE* `LoadOwnershipQualifier::Unqualified` and
    `StoreOwnershipQualifier::Unqualified` are only needed for staging purposes.

2. Creating a `LoadInst`, `StoreInst` will be changed to require an ownership
qualifier. At this stage, this argument will default to `Unqualified`. "Bare"
`load`, `store` when parsed via textual SIL will be considered to be
unqualified. This implies that the rest of the compiler will not have to be
changed as a result of this step.

3. Support will be added to SIL, IRGen, Serialization, SILPrinting, and SIL
Parsing for the rest of the qualifiers. SILGen will not be modified at this
stage.

4. A pass called the "OwnershipModelEliminator" will be implemented. It will
   blow up all `load`, `store` instructions with non `*::Unqualified` ownership
   into their constituant ARC operations and `*::Unqualified` `load`, `store`
   insts.

3. An option called "EnforceSILOwnershipMode" will be added to the verifier. If
the option is set, the verifier will assert if:

   a. `load`, `store` operations with trivial ownership are applied to memory
      locations with non-trivial type.

   b. `load`, `store` operation with unqualified ownership type are present in
   the IR.

Finally, we wire up the building blocks:

1. If SILOption.EnableSILOwnershipModel is true, then the after SILGen SIL
   verification will be performed with EnforceSILOwnershipModel set to true.
2. If SILOption.EnableSILOwnershipModel is true, then the pass manager will run
   the OwnershipModelEliminator pass right after SILGen before the normal pass
   pipeline starts.
3. SILGen will be changed to emit non-unqualified ownership qualifiers on load,
   store instructions when the EnableSILOwnershipModel flag is set. We will use
   the verifier throwing to guarantee that we are not missing any specific
   cases.

Then once all of the bots are green, we change SILOption.EnableSILOwnershipModel
to be true by default. After a cooling off period, we move all of the code
behind the SILOwnershipModel flag in front of the flag. We do this so we can
reuse that flag for further SILOwnershipModel changes.

## Optimizer Changes

Since the SILOwnershipModel eliminator will eliminate the ownership qualifiers
on load, store instructions right after ownership verification, there will be no
immediate affects on the optimizer and thus the optimizer changes can be done in
parallel with the rest of the ARC optimization work.

But, in the long run, we want to enforce these ownership invariants all
throughout the SIL pipeline implying these ownership qualified `load`, `store`
instructions must be processed by IRGen, not eliminated by the SILOwnershipModel
eliminator. Thus we will need to update passes to handle these new
instructions. The main optimizer changes can be separated into the following
areas: memory forwarding, dead stores, ARC optimization. In all of these cases,
the necessary changes are relatively trivial to respond to. We give a quick
taste of two of them: store->load forwarding and ARC Code Motion.

### store->load forwarding

Currently we perform store->load forwarding as follows:

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load %x_ptr : $C
    use(%y)

      =>

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    use(%x)

In a world, where we are using ownership qualified load, store, we have to also
consider the ownership implications. *NOTE* Since we are not modifying the
store, `store [assign]` and `store [init]` are treated the same. Thus without
any loss of generality, lets consider solely `store`.

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load [copy] %x_ptr : $C
    use(%y)

      =>

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    strong_retain %x
    use(%x)

### ARC Code Motion

If ARC Code Motion wishes to move the ARC semantics of ownership qualified
`load`, `store` instructions, it must now consider read/write effects. On the
other hand, it will be able to now not consider the side-effects of destructors
when moving retain/release operations.

### Normal Code Motion

Normal code motion will lose some effectiveness since many of the load/store
operations that it used to be able to move now must consider ARC information. We
may need to consider running ARC code motion earlier in the pipeline where we
normally run Normal Code Motion to ensure that we are able to handle these
cases.

### ARC Optimization

The main implication for ARC optimization is that instead of eliminating just
retains, releases, it must be able to recognize ownership qualified `load`,
`store` and set their flags as appropriate.

### Function Signature Optimization

Semantic ARC affects function signature optimization in the context of the owned
to borrow optimization. Specifically:

1. A `store [assign]` must be recognized as a release of the old value that is
   being overridden. In such a case, we can move the `release` of the old value
   into the caller and change the `store [assign]` into a `store [init]`.
2. A `load [copy]` must be recognized as a retain in the callee. Then function
   signature optimization will transform the `load [copy]` into a `load
   [borrow]`. This would require the addition of a new `@borrow` return
   value convention.

# Appendix

## Partial Initialization of Loadable References in SIL

In SIL, a value of non-trivial loadable type is loaded from a memory location as
follows:

    %x = load %x_ptr : $*S
    ...
    retain_value %x_ptr : $S

At first glance, this looks reasonable, but in truth there is a hidden drawback:
the partially initialized zone in between the load and the retain
operation. This zone creates a period of time when an "evil optimizer" could
insert an instruction that causes x to be deallocated before the copy is
finished being initialized. Similar issues come up when trying to perform a
store of a non-trival value into a memory location.

Since this sort of partial initialization is allowed in SIL, the optimizer is
forced to be overly conservative when attempting to move releases passed retains
lest the release triggers a deinit that destroys a value like `%x`. Lets look at
two concrete examples that show how semantically providing ownership qualified
load, store instructions eliminate this problem.

**NOTE** Without any loss of generality, we will speak of values with reference
semantics instead of non-trivial values.

## Case Study: Partial Initialization and load [copy]

### The Problem

Consider the following swift program:

    func opaque_call()

    final class C {
      var int: Int = 0
      deinit {
        opaque_call()
      }
    }

    final class D {
      var int: Int = 0
    }

    var GLOBAL_C : C? = nil
    var GLOBAL_D : D? = nil

    func useC(_ c: C)
    func useD(_ d: D)

    func run() {
        let c = C()
        GLOBAL_C = c
        let d = D()
        GLOBAL_D = d
        useC(c)
        useD(d)
    }

Notice that both `C` and `D` have fixed layouts and separate class hierarchies,
but `C`'s deinit has a call to the function `opaque_call` which may write to
`GLOBAL_D` or `GLOBAL_C`. Additionally assume that both `useC` and `useD` are
known to the compiler to not have any affects on instances of type `D`, `C`
respectively and useC assigns `nil` to `GLOBAL_C`. Now consider the following
valid SIL lowering for `run`:

    sil_global GLOBAL_D : $D
    sil_global GLOBAL_C : $C

    final class C {
      var x: Int
      deinit
    }

    final class D {
      var x: Int
    }

    sil @useC : $@convention(thin) () -> ()
    sil @useD : $@convention(thin) () -> ()

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load %global_c : $*C (3)
      strong_retain %c2 : $C (4)
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Lets optimize this function! First we perform the following operations:

1. Since `(2)` is storing to an identified object that can not be `GLOBAL_C`, we
   can store to load forward `(1)` to `(3)`.
2. Since a retain does not block store to load forwarding, we can forward `(2)`
   to `(5)`. But lets for the sake of argument, assume that the optimizer keeps
   such information as an analysis and does not perform the actual load->store
   forwarding.
3. Even though we do not foward `(2)` to `(5)`, we can still move `(4)` over
   `(6)` so that `(4)` is right before `(7)`.

This yields (using the ' marker to designate that a register has had load-store
forwarding applied to it):

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      strong_retain %c : $C (4')
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Then by assumption, we know that `%useC` does not perform any releases of any
instances of class `D`. Thus `(6)` can be moved past `(7')` and we can then pair
and eliminate `(6)` and `(9)` via the rules of ARC optimization using the
analysis information that `%d2` and `%d` are th same due to the possibility of
performing store->load forwarding. After performing such transformations, `run`
looks as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      %d2 = load %global_d : $*D (5)
      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Now by assumption, we know that `%useD_func` does not touch any instances of
class `C` and `%c` does not contain any ivars of type `D` and is final so none
can be added. At first glance, this seems to suggest that we can move `(10)`
before `(8')` and then pair/eliminate `(4')` and `(10)`. But is this a safe
optimization perform? Absolutely Not! Why? Remember that since `useC_func`
assigns `nil` to `GLOBAL_C`, after `(7')`, `%c` could have a reference count
of 1. Thus `(10)` _may_ invoke the destructor of `C`. Since this destructor
calls an opaque function that _could_ potentially write to `GLOBAL_D`, we may be
be passing `%d2`, an already deallocated object to `%useD_func`, an illegal
optimization!

Lets think a bit more about this example and consider this example at the
language level. Remember that while Swift's deinit are not asychronous, we do
not allow for user level code to create dependencies from the body of the
destructor into the normal control flow that has called it. This means that
there are two valid results of this code:

- Operation Sequence 1: No optimization is performed and `%d2` is passed to
  `%useD_func`.
- Operation Sequence 2: We shorten the lifetime of `%c` before `%useD_func` and
   a different instance of `$D` is passed into `%useD_func`.

The fact that 1 occurs without optimization is just as a result of an
implementation detail of SILGen. 2 is also a valid sequence of operations.

Given that:

1. As a principle, the optimizer does not consider such dependencies to avoid
   being overly conservative.
2. We provide constructs to ensure appropriate lifetimes via the usage of
   constructs such as fix_lifetime.

We need to figure out how to express our optimization such that 2
happens. Remember that one of the optimizations that we performed at the
beginning was to move `(6)` over `(7')`, i.e., transform this:

      %d = alloc_ref $D
      %global_d_addr = global_addr GLOBAL_D : $D
      %d = load %global_d_addr : $*D (5)
      strong_retain %d : $D (6)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

into:

      %global_d_addr = global_addr GLOBAL_D : $D
      %d2 = load %global_d_addr : $*D (5)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      strong_retain %d2 : $D (6)

This transformation in Swift corresponds to transforming:

      let d = GLOBAL_D
      useC(c)

to:

      let d_raw = load_d_value(GLOBAL_D)
      useC(c)
      let d = take_ownership_of_d(d_raw)

This is clearly an instance where we have moved a side-effect in between the
loading of the data and the taking ownership of such data, that is before the
`let` is fully initialized. What if instead of just moving the retain, we moved
the entire let statement? This would then result in the following swift code:

      useC(c)
      let d = GLOBAL_D

and would correspond to the following SIL snippet:

      %global_d_addr = global_addr GLOBAL_D : $D

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      %d2 = load %global_d_addr : $*D (5)
      strong_retain %d2 : $D (6)

Moving the load with the strong_retain to ensure that the full initialization is
performed even after code motion causes our SIL to look as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %d2 = load %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Giving us the exact result that we want: Operation Sequence 2!

### Defining load [copy]

Given that we wish the load, store to be tightly coupled together, it is natural
to express this operation as a `load [copy]` instruction. Lets define the `load
[copy]` instruction as follows:

    %1 = load [copy] %0 : $*C

      =>

    %1 = load %0 : $*C
    retain_value %1 : $C

Now lets transform our initial example to use this instruction:

Notice how now if we move `(7)` over `(3)` and `(6)` now, we get the following SIL:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %d2 = load [copy] %global_d : $*D (5)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

We then perform the previous code motion:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)
      strong_release %d : $D (9)

      %d2 = load [copy] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
      strong_release %c : $C (10)
    }

We then would like to eliminate `(9)` and `(10)` by pairing them with `(3)` and
`(8)`. Can we still do so? One way we could do this is by introducing the
`[take]` flag. The `[take]` flag on a `load [take]` says that one is
semantically loading a value from a memory location and are taking ownership of
the value thus eliding the retain. In terms of SIL this flag is defined as:

    %x = load [take] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C

Why do we care about having such a `load [take]` instruction when we could just
use a `load`? The reason why is that a normal `load` has unsafe unowned
ownership (i.e. it has no implications on ownership). We would like for memory
that has non-trivial type to only be able to be loaded via instructions that
maintain said ownership. We will allow for casting to trivial types as usual to
provide such access if it is required.

Thus we have achieved the desired result:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [take] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %d2 = load [take] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
    }

On Oct 6, 2016, at 3:03 PM, John McCall <rjmccall@apple.com> wrote:

On Oct 5, 2016, at 4:48 PM, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

On Oct 5, 2016, at 4:40 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

On Oct 4, 2016, at 1:04 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Sep 30, 2016, at 11:54 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

The document attached below contains the first "Semantic ARC" mini proposal: the High Level ARC Memory Operations Proposal.

An html rendered version of this markdown document is available at the following URL:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

----

# Summary

This document proposes:

1. adding the `load_strong`, `store_strong` instructions to SIL. These can only
   be used with memory locations of `non-trivial` type.

I would really like to avoid using the word "strong" here. Under the current proposal, these instructions will be usable with arbitrary non-trivial types, not just primitive class references. Even if you think of an aggregate that happens to contain one or more strong references as some sort of aggregate strong reference (which is questionable but not completely absurd), we already have loadable non-strong class references that this operation would be usable with, like native unowned references. "load_strong %0 : $*@sil_unowned T" as an operation yielding a scalar "@sil_unowned T" is ridiculous, and it will only get more ridiculous when we eventually allow this operation to work with types that are currently address-only, like weak references.

Brainstorming:

Something like load_copy and store_copy would be a bit unfortunate, since store_copy doesn't actually copy the source operand and we want to have a load_copy [take].

load_value and store_value seem excessively generic. It's not like non-trivial types aren't values.

One question that comes to mind: do we actually need new instructions here other than for staging purposes? We don't actually need new instructions for pseudo-linear SIL to work; we just need to say that we only enforce pseudo-linearity for non-trivial types.

If we just want the instruction to be explicit about ownership so that we can easily distinguish these cases, we can make the rule always explicit, e.g.:
  load [take] %0 : $*MyClass
  load [copy] %0 : $*MyClass
  load [trivial] %0 : $*Int

  store %0 to [initialization] %1 : $*MyClass
  store %0 to [assignment] %1 : $*MyClass
  store %0 to [trivial] %1 : $*Int

John.

The reason why I originally suggested to go the load_strong route is that we already have load_weak, load_unowned instructions. If I could add a load_strong instruction, then it would make sense to assign an engineer to do a pass over all 3 of these instructions and combine them into 1 load instruction. That is, first transform into a form amenable for canonicalization and then canonicalize all at once.

As you pointed out, both load_unowned and load_weak involve representation changes in type (for instance the change of weak pointers to Optional<T>). Such a change would be against the "spirit" of a load instruction to perform such representation changes versus ownership changes.

In terms of the properties that we actually want here, what is important is that we can verify that no non-trivially typed values are loaded in an unsafe unowned manner. That can be done also with ownership flags on load/store.

Does this sound reasonable:

1. We introduce two enums that define memory ownership changes, one for load and one for store. Both of these enums will contain a [trivial] ownership.
2. We enforce in the verifier that non-trivial types must have a non-trivial ownership modifier on any memory operations that they are involved in.

Sorry for not being explicit. I will not add new instructions, just modifiers. Assuming that this is agreeable to you, I am going to prepare a quick additional version of the proposal document.

That sounds great, thanks.

John.

Attached below is an updated version of the proposal. Again a rendered version is located at:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

Michael

----

# Summary

This document proposes:

1. adding the following ownership qualifiers to `load`: `[take]`, `[copy]`,
   `[borrow]`, `[trivial]`.
2. adding the following ownership qualifiers to `store`: `[init]`, `[assign]`,
   `[trivial]`.
3. requiring all `load` and `store` operations to have ownership qualifiers.
4. banning the use of `load [trivial]`, `store [trivial]` on memory locations of
   `non-trivial` type.

This will allow for:

1. eliminating optimizer miscompiles that occur due to releases being moved into
   the region in between a `load`/`retain`, `load`/`release`,
   `store`/`release`. (For a specific example, see the appendix).
2. explicitly modeling `load [trivial]`/`store [trivial]` as having `unsafe
   unowned` ownership semantics. This will be enforced via the verifier.
3. more aggressive ARC code motion.

# Definitions

## ownership qualified load

We propose four different ownership qualifiers for load. Define `load [trivial]`
as:

    %x = load [trivial] %x_ptr : $*Int

      =>

    %x = load %x_ptr : $*Int

A `load [trivial]` can not be used to load values of non-trivial type.

Should we mandate the reverse as well, that e.g. load [copy] cannot be used to
load values of trivial type? That's a little more work for substituting cloners, but it
keeps everything nice and canonical.

Define
`load [copy]` as:

    %x = load [copy] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C
    retain_value %x : $C

Then define `load [take]` as:

    %x = load [take] %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject

**NOTE** `load [take]` implies that the loaded from memory location no longer
owns the result object (i.e. a take is a move). Loading from the memory location
again without reinitialization is illegal.

Next we provide `load [borrow]`:

    %x = load [borrow] %x_ptr : $*Builtin.NativeObject
    ...
    endBorrow(%x, %x_ptr)

      =>

    %x = load %x_ptr : $*Builtin.NativeObject
    ...
    endBorrow(%x, %x_ptr)

`load [borrow]` implies that in the region between the `load` and the
`endBorrow`, the loaded object must semantically remain alive.

For consistency with other multi-word SIL instructions, this should be end_borrow.

I wonder whether it might make more sense for load [borrow] to be a different instruction.
There's a couple reasons for that first. The first is that it's the only load which introduces
a scope, which is a really big difference structurally. The second is that it's the only load
which returns a non-owned value, which will be a typing difference when we record
ownership in the type system.

Anyway, I really like that load [borrow] is scoped.. Are you planning to include the formation
restrictions on scopes instructions immediately, or will you do that in a later proposal?

The requirements we need from scopes are:
  - there has to be a well-defined set of IPs that lie within the scope and
  - there can't be a non-explicit transition into or out of the scope.

One way to get the first condition is to say that there has to be a unique scope-ender that
post-dominates the scope-beginner, but that's a pretty hard restriction for SILGen to satisfy
(as well as the optimizer, I imagine), and it doesn't handle "unreachable" or infinite loops.
We need to allow multiple scope-enders, and we need to allow scope-enders to be missing
in some cases. Here's the right formalism, I think:

For all walks W beginning from the entry point of the function:
  For each node B in the CFG which is a scope-beginner:
    Let E be the set of scope-enders for B, and consider just the sub-sequence S of nodes
    of W where the node is a member of {B} U E. Then the elements of S at even
    positions (starting from 0) must be B, and the elements at odd positions must be
    members of E. Furthermore, if the walk ends in a return or throw instruction, then
    S must have even length.

Note that for this to be true, all the scope-enders must be dominated by the scope-beginner.

It is sufficient to just consider the set of "beeline" paths, i.e. acyclic paths ending in either a true
terminator (a return, throw, or unreachable) or an edge back to a node already in the path.
No such path may include multiple scope-enders for the same scope-beginner. If the path ends
in a return or throw, it must include a matching scope-ender after every scope-beginner. If
it ends in a loop back, then for every scope-beginner in the path, if the path contains a scope-ender
then the loop destination must either precede the scope-beginner or follow the scope-ender;
otherwise, the loop destination must follow the scope-beginner.

Or, as a decision algorithm in Swift for a single scope-beginner:

  var blockEntryIsInScope = [Block: Bool]()
  func scan(startingFrom inst: Instruction, isInScope: Bool) {
    if inst is ReturnInst || inst is ThrowInst {
      guard !isInScope else { invalid("ended function while in scope") }
      return
    }

    if let term = inst as? TerminatorInst {
      for successor in term.successors {
        guard begin.dominates(successor) else {
          guard !isInScope else { invalid("branch out of scope while in scope") }
          continue
        }
        if let cachedValue = blockEntryIsInScope[successor] {
          if cachedValue != isInScope {
            invalid(isInScope ? "branch out of scope while in scope" : "branch into scope after exiting scope")
          }
        } else {
          blockEntryIsInScope[successor] = isInScope
          scan(startingFrom: successor.begin, isInScope: isInScope)
        }
      }
      return
    }

    if inst.endsScopeOf(begin) {
      guard isInScope else { invalid("ending scope that was already ended") }
      scan(startingFrom: inst.next, isInScope: false)
    } else {
      scan(startingFrom: inst.next, isInScope: isInScope)
    }
  }
  scan(startingFrom: begin, isInScope: true)

John.

···

On Oct 7, 2016, at 2:38 PM, Michael Gottesman <mgottesman@apple.com> wrote:

The `endBorrow` communicates to the optimizer:

1. That the value in `%x_ptr` should not be destroyed before endBorrow.
2. Uses of `%x` should not be sunk past endBorrow since `%x` is only a shallow
   copy of the value in `%x_ptr` and past that point `%x_ptr` may not remain
   alive.

An example of where this construct is useful is when one has a let binding to a
class instance `c` that contains a let field `f`. In that case `c`'s lifetime
guarantees `f`'s lifetime meaning that returning `f` over the function call
boundary is safe.

## ownership qualified store

First define a `store [trivial]` as:

    store %x to [trivial] %x_ptr : $*Int

      =>

    store %x to %x_ptr : $*Int

The verifier will prevent this instruction from being used on types with
non-trivial ownership. Define a `store [assign]` as follows:

    store %x to [assign] %x_ptr : $*C

       =>

    %old_x = load %x_ptr : $*C
    store %new_x to %x_ptr : $*C
    release_value %old_x : $C

*NOTE* `store` is defined as a consuming operation. We also provide
`store [init]` in the case where we know statically that there is no
previous value in the memory location:

    store %x to [init] %x_ptr : $*C

       =>

    store %new_x to %x_ptr : $*C

# Implementation

## Goals

Our implementation strategy goals are:

1. zero impact on other compiler developers until the feature is fully
   developed. This implies all work will be done behind a flag.
2. separation of feature implementation from updating passes.

Goal 2 will be implemented via a pass that transforms ownership qualified
`load`/`store` instructions into unqualified `load`/`store` right after SILGen.

## Plan

We begin by adding initial infrastructure for our development. This means:

1. Adding to SILOptions a disabled by default flag called
"EnableSILOwnershipModel". This flag will be set by a false by default frontend
option called "-enable-sil-ownership-mode".

2. Bots will be brought up to test the compiler with
   "-enable-sil-ownership-model" set to true. The specific bots are:

   * RA-OSX+simulators
   * RA-Device
   * RA-Linux.

   The bots will run once a day until the feature is close to completion. Then a
   polling model will be followed.

Now that change isolation is borrow, we develop building blocks for the
optimization:

1. Two enums will be defined: `LoadInstOwnershipQualifier`,
   `StoreInstOwnershipQualifier`. The exact definition of these enums are as
   follows:

       enum class LoadOwnershipQualifier {
         Unqualified, Take, Copy, Borrow, Trivial
       };
       enum class StoreOwnershipQualifier {
         Unqualified, Init, Assign, Trivial
       };

    *NOTE* `LoadOwnershipQualifier::Unqualified` and
    `StoreOwnershipQualifier::Unqualified` are only needed for staging purposes.

2. Creating a `LoadInst`, `StoreInst` will be changed to require an ownership
qualifier. At this stage, this argument will default to `Unqualified`. "Bare"
`load`, `store` when parsed via textual SIL will be considered to be
unqualified. This implies that the rest of the compiler will not have to be
changed as a result of this step.

3. Support will be added to SIL, IRGen, Serialization, SILPrinting, and SIL
Parsing for the rest of the qualifiers. SILGen will not be modified at this
stage.

4. A pass called the "OwnershipModelEliminator" will be implemented. It will
   blow up all `load`, `store` instructions with non `*::Unqualified` ownership
   into their constituant ARC operations and `*::Unqualified` `load`, `store`
   insts.

3. An option called "EnforceSILOwnershipMode" will be added to the verifier. If
the option is set, the verifier will assert if:

   a. `load`, `store` operations with trivial ownership are applied to memory
      locations with non-trivial type.

   b. `load`, `store` operation with unqualified ownership type are present in
   the IR.

Finally, we wire up the building blocks:

1. If SILOption.EnableSILOwnershipModel is true, then the after SILGen SIL
   verification will be performed with EnforceSILOwnershipModel set to true.
2. If SILOption.EnableSILOwnershipModel is true, then the pass manager will run
   the OwnershipModelEliminator pass right after SILGen before the normal pass
   pipeline starts.
3. SILGen will be changed to emit non-unqualified ownership qualifiers on load,
   store instructions when the EnableSILOwnershipModel flag is set. We will use
   the verifier throwing to guarantee that we are not missing any specific
   cases.

Then once all of the bots are green, we change SILOption.EnableSILOwnershipModel
to be true by default. After a cooling off period, we move all of the code
behind the SILOwnershipModel flag in front of the flag. We do this so we can
reuse that flag for further SILOwnershipModel changes.

## Optimizer Changes

Since the SILOwnershipModel eliminator will eliminate the ownership qualifiers
on load, store instructions right after ownership verification, there will be no
immediate affects on the optimizer and thus the optimizer changes can be done in
parallel with the rest of the ARC optimization work.

But, in the long run, we want to enforce these ownership invariants all
throughout the SIL pipeline implying these ownership qualified `load`, `store`
instructions must be processed by IRGen, not eliminated by the SILOwnershipModel
eliminator. Thus we will need to update passes to handle these new
instructions. The main optimizer changes can be separated into the following
areas: memory forwarding, dead stores, ARC optimization. In all of these cases,
the necessary changes are relatively trivial to respond to. We give a quick
taste of two of them: store->load forwarding and ARC Code Motion.

### store->load forwarding

Currently we perform store->load forwarding as follows:

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load %x_ptr : $C
    use(%y)

      =>

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    use(%x)

In a world, where we are using ownership qualified load, store, we have to also
consider the ownership implications. *NOTE* Since we are not modifying the
store, `store [assign]` and `store [init]` are treated the same. Thus without
any loss of generality, lets consider solely `store`.

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load [copy] %x_ptr : $C
    use(%y)

      =>

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    strong_retain %x
    use(%x)

### ARC Code Motion

If ARC Code Motion wishes to move the ARC semantics of ownership qualified
`load`, `store` instructions, it must now consider read/write effects. On the
other hand, it will be able to now not consider the side-effects of destructors
when moving retain/release operations.

### Normal Code Motion

Normal code motion will lose some effectiveness since many of the load/store
operations that it used to be able to move now must consider ARC information. We
may need to consider running ARC code motion earlier in the pipeline where we
normally run Normal Code Motion to ensure that we are able to handle these
cases.

### ARC Optimization

The main implication for ARC optimization is that instead of eliminating just
retains, releases, it must be able to recognize ownership qualified `load`,
`store` and set their flags as appropriate.

### Function Signature Optimization

Semantic ARC affects function signature optimization in the context of the owned
to borrow optimization. Specifically:

1. A `store [assign]` must be recognized as a release of the old value that is
   being overridden. In such a case, we can move the `release` of the old value
   into the caller and change the `store [assign]` into a `store [init]`.
2. A `load [copy]` must be recognized as a retain in the callee. Then function
   signature optimization will transform the `load [copy]` into a `load
   [borrow]`. This would require the addition of a new `@borrow` return
   value convention.

# Appendix

## Partial Initialization of Loadable References in SIL

In SIL, a value of non-trivial loadable type is loaded from a memory location as
follows:

    %x = load %x_ptr : $*S
    ...
    retain_value %x_ptr : $S

At first glance, this looks reasonable, but in truth there is a hidden drawback:
the partially initialized zone in between the load and the retain
operation. This zone creates a period of time when an "evil optimizer" could
insert an instruction that causes x to be deallocated before the copy is
finished being initialized. Similar issues come up when trying to perform a
store of a non-trival value into a memory location.

Since this sort of partial initialization is allowed in SIL, the optimizer is
forced to be overly conservative when attempting to move releases passed retains
lest the release triggers a deinit that destroys a value like `%x`. Lets look at
two concrete examples that show how semantically providing ownership qualified
load, store instructions eliminate this problem.

**NOTE** Without any loss of generality, we will speak of values with reference
semantics instead of non-trivial values.

## Case Study: Partial Initialization and load [copy]

### The Problem

Consider the following swift program:

    func opaque_call()

    final class C {
      var int: Int = 0
      deinit {
        opaque_call()
      }
    }

    final class D {
      var int: Int = 0
    }

    var GLOBAL_C : C? = nil
    var GLOBAL_D : D? = nil

    func useC(_ c: C)
    func useD(_ d: D)

    func run() {
        let c = C()
        GLOBAL_C = c
        let d = D()
        GLOBAL_D = d
        useC(c)
        useD(d)
    }

Notice that both `C` and `D` have fixed layouts and separate class hierarchies,
but `C`'s deinit has a call to the function `opaque_call` which may write to
`GLOBAL_D` or `GLOBAL_C`. Additionally assume that both `useC` and `useD` are
known to the compiler to not have any affects on instances of type `D`, `C`
respectively and useC assigns `nil` to `GLOBAL_C`. Now consider the following
valid SIL lowering for `run`:

    sil_global GLOBAL_D : $D
    sil_global GLOBAL_C : $C

    final class C {
      var x: Int
      deinit
    }

    final class D {
      var x: Int
    }

    sil @useC : $@convention(thin) () -> ()
    sil @useD : $@convention(thin) () -> ()

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load %global_c : $*C (3)
      strong_retain %c2 : $C (4)
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Lets optimize this function! First we perform the following operations:

1. Since `(2)` is storing to an identified object that can not be `GLOBAL_C`, we
   can store to load forward `(1)` to `(3)`.
2. Since a retain does not block store to load forwarding, we can forward `(2)`
   to `(5)`. But lets for the sake of argument, assume that the optimizer keeps
   such information as an analysis and does not perform the actual load->store
   forwarding.
3. Even though we do not foward `(2)` to `(5)`, we can still move `(4)` over
   `(6)` so that `(4)` is right before `(7)`.

This yields (using the ' marker to designate that a register has had load-store
forwarding applied to it):

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      strong_retain %c : $C (4')
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Then by assumption, we know that `%useC` does not perform any releases of any
instances of class `D`. Thus `(6)` can be moved past `(7')` and we can then pair
and eliminate `(6)` and `(9)` via the rules of ARC optimization using the
analysis information that `%d2` and `%d` are th same due to the possibility of
performing store->load forwarding. After performing such transformations, `run`
looks as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      %d2 = load %global_d : $*D (5)
      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Now by assumption, we know that `%useD_func` does not touch any instances of
class `C` and `%c` does not contain any ivars of type `D` and is final so none
can be added. At first glance, this seems to suggest that we can move `(10)`
before `(8')` and then pair/eliminate `(4')` and `(10)`. But is this a safe
optimization perform? Absolutely Not! Why? Remember that since `useC_func`
assigns `nil` to `GLOBAL_C`, after `(7')`, `%c` could have a reference count
of 1. Thus `(10)` _may_ invoke the destructor of `C`. Since this destructor
calls an opaque function that _could_ potentially write to `GLOBAL_D`, we may be
be passing `%d2`, an already deallocated object to `%useD_func`, an illegal
optimization!

Lets think a bit more about this example and consider this example at the
language level. Remember that while Swift's deinit are not asychronous, we do
not allow for user level code to create dependencies from the body of the
destructor into the normal control flow that has called it. This means that
there are two valid results of this code:

- Operation Sequence 1: No optimization is performed and `%d2` is passed to
  `%useD_func`.
- Operation Sequence 2: We shorten the lifetime of `%c` before `%useD_func` and
   a different instance of `$D` is passed into `%useD_func`.

The fact that 1 occurs without optimization is just as a result of an
implementation detail of SILGen. 2 is also a valid sequence of operations.

Given that:

1. As a principle, the optimizer does not consider such dependencies to avoid
   being overly conservative.
2. We provide constructs to ensure appropriate lifetimes via the usage of
   constructs such as fix_lifetime.

We need to figure out how to express our optimization such that 2
happens. Remember that one of the optimizations that we performed at the
beginning was to move `(6)` over `(7')`, i.e., transform this:

      %d = alloc_ref $D
      %global_d_addr = global_addr GLOBAL_D : $D
      %d = load %global_d_addr : $*D (5)
      strong_retain %d : $D (6)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

into:

      %global_d_addr = global_addr GLOBAL_D : $D
      %d2 = load %global_d_addr : $*D (5)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      strong_retain %d2 : $D (6)

This transformation in Swift corresponds to transforming:

      let d = GLOBAL_D
      useC(c)

to:

      let d_raw = load_d_value(GLOBAL_D)
      useC(c)
      let d = take_ownership_of_d(d_raw)

This is clearly an instance where we have moved a side-effect in between the
loading of the data and the taking ownership of such data, that is before the
`let` is fully initialized. What if instead of just moving the retain, we moved
the entire let statement? This would then result in the following swift code:

      useC(c)
      let d = GLOBAL_D

and would correspond to the following SIL snippet:

      %global_d_addr = global_addr GLOBAL_D : $D

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      %d2 = load %global_d_addr : $*D (5)
      strong_retain %d2 : $D (6)

Moving the load with the strong_retain to ensure that the full initialization is
performed even after code motion causes our SIL to look as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %d2 = load %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Giving us the exact result that we want: Operation Sequence 2!

### Defining load [copy]

Given that we wish the load, store to be tightly coupled together, it is natural
to express this operation as a `load [copy]` instruction. Lets define the `load
[copy]` instruction as follows:

    %1 = load [copy] %0 : $*C

      =>

    %1 = load %0 : $*C
    retain_value %1 : $C

Now lets transform our initial example to use this instruction:

Notice how now if we move `(7)` over `(3)` and `(6)` now, we get the following SIL:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %d2 = load [copy] %global_d : $*D (5)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

We then perform the previous code motion:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)
      strong_release %d : $D (9)

      %d2 = load [copy] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
      strong_release %c : $C (10)
    }

We then would like to eliminate `(9)` and `(10)` by pairing them with `(3)` and
`(8)`. Can we still do so? One way we could do this is by introducing the
`[take]` flag. The `[take]` flag on a `load [take]` says that one is
semantically loading a value from a memory location and are taking ownership of
the value thus eliding the retain. In terms of SIL this flag is defined as:

    %x = load [take] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C

Why do we care about having such a `load [take]` instruction when we could just
use a `load`? The reason why is that a normal `load` has unsafe unowned
ownership (i.e. it has no implications on ownership). We would like for memory
that has non-trivial type to only be able to be loaded via instructions that
maintain said ownership. We will allow for casting to trivial types as usual to
provide such access if it is required.

Thus we have achieved the desired result:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [take] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %d2 = load [take] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
    }

On Oct 6, 2016, at 3:03 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Oct 5, 2016, at 4:48 PM, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

On Oct 5, 2016, at 4:40 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

On Oct 4, 2016, at 1:04 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Sep 30, 2016, at 11:54 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

The document attached below contains the first "Semantic ARC" mini proposal: the High Level ARC Memory Operations Proposal.

An html rendered version of this markdown document is available at the following URL:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

----

# Summary

This document proposes:

1. adding the `load_strong`, `store_strong` instructions to SIL. These can only
   be used with memory locations of `non-trivial` type.

I would really like to avoid using the word "strong" here. Under the current proposal, these instructions will be usable with arbitrary non-trivial types, not just primitive class references. Even if you think of an aggregate that happens to contain one or more strong references as some sort of aggregate strong reference (which is questionable but not completely absurd), we already have loadable non-strong class references that this operation would be usable with, like native unowned references. "load_strong %0 : $*@sil_unowned T" as an operation yielding a scalar "@sil_unowned T" is ridiculous, and it will only get more ridiculous when we eventually allow this operation to work with types that are currently address-only, like weak references.

Brainstorming:

Something like load_copy and store_copy would be a bit unfortunate, since store_copy doesn't actually copy the source operand and we want to have a load_copy [take].

load_value and store_value seem excessively generic. It's not like non-trivial types aren't values.

One question that comes to mind: do we actually need new instructions here other than for staging purposes? We don't actually need new instructions for pseudo-linear SIL to work; we just need to say that we only enforce pseudo-linearity for non-trivial types.

If we just want the instruction to be explicit about ownership so that we can easily distinguish these cases, we can make the rule always explicit, e.g.:
  load [take] %0 : $*MyClass
  load [copy] %0 : $*MyClass
  load [trivial] %0 : $*Int

  store %0 to [initialization] %1 : $*MyClass
  store %0 to [assignment] %1 : $*MyClass
  store %0 to [trivial] %1 : $*Int

John.

The reason why I originally suggested to go the load_strong route is that we already have load_weak, load_unowned instructions. If I could add a load_strong instruction, then it would make sense to assign an engineer to do a pass over all 3 of these instructions and combine them into 1 load instruction. That is, first transform into a form amenable for canonicalization and then canonicalize all at once.

As you pointed out, both load_unowned and load_weak involve representation changes in type (for instance the change of weak pointers to Optional<T>). Such a change would be against the "spirit" of a load instruction to perform such representation changes versus ownership changes.

In terms of the properties that we actually want here, what is important is that we can verify that no non-trivially typed values are loaded in an unsafe unowned manner. That can be done also with ownership flags on load/store.

Does this sound reasonable:

1. We introduce two enums that define memory ownership changes, one for load and one for store. Both of these enums will contain a [trivial] ownership.
2. We enforce in the verifier that non-trivial types must have a non-trivial ownership modifier on any memory operations that they are involved in.

Sorry for not being explicit. I will not add new instructions, just modifiers. Assuming that this is agreeable to you, I am going to prepare a quick additional version of the proposal document.

That sounds great, thanks.

John.

Attached below is an updated version of the proposal. Again a rendered version is located at:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

Michael

This looks great Michael, a couple of comments/questions:

typo: affects -> effects.

If ARC Code Motion wishes to move the ARC semantics of ownership qualified load, store instructions, it must now consider read/write effects. On the other hand, it will be able to now not consider the side-effects of destructors when moving retain/release operations.

Can you elaborate on what you mean by this? Does this mean the ARC optimizer has additional freedom somehow to ignore side effects in deinits? What grants that ability?

-Chris

···

On Oct 7, 2016, at 2:38 PM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

Attached below is a final version of the proposal. I am going to commit it to the repo if there are no further questions/changes/etc.

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

Michael

···

----

# Summary

This document proposes:

1. adding the following ownership qualifiers to `load`: `[take]`, `[copy]`,
   `[trivial]`.
2. adding the following ownership qualifiers to `store`: `[init]`, `[assign]`,
   `[trivial]`.
3. adding the `load_borrow` instruction and the `end_borrow` instruction.
3. requiring all `load` and `store` operations to have ownership qualifiers.
4. banning the use of `load [trivial]`, `store [trivial]` on memory locations of
   `non-trivial` type.

This will allow for:

1. eliminating optimizer miscompiles that occur due to releases being moved into
   the region in between a `load`/`retain`, `load`/`release`,
   `store`/`release`. (For a specific example, see the appendix).
2. explicitly modeling `load [trivial]`/`store [trivial]` as having `unsafe
   unowned` ownership semantics. This will be enforced via the verifier.
3. more aggressive ARC code motion.

# Definitions

## ownership qualified load

We propose three different ownership qualifiers for load. Define `load [trivial]`
as:

    %x = load [trivial] %x_ptr : $*Int

      =>

    %x = load %x_ptr : $*Int

A `load [trivial]` can not be used to load values of non-trivial type. Define
`load [copy]` as:

    %x = load [copy] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C
    retain_value %x : $C

Then define `load [take]` as:

    %x = load [take] %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject

**NOTE** `load [take]` implies that the loaded from memory location no longer
owns the result object (i.e. a take is a move). Loading from the memory location
again without reinitialization is illegal.

## load_borrow and end_borrow

Next we provide `load_borrow` and `end_borrow`:

    %x = load_borrow %x_ptr : $*Builtin.NativeObject
    ...
    end_borrow %x, %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject
    ...
    endLifetime %x : $Builtin.NativeObject
    fixLifetime %x_ptr : $*Builtin.NativeObject

`load [borrow]` implies that in the region between the `load` and the
`end_borrow`, the loaded object must semantically remain alive. The `end_borrow`
communicates to the optimizer:

1. that the value in `%x_ptr` should not be destroyed before endBorrow.
2. uses of `%x` should not be sunk past endBorrow since `%x` is only a shallow
   copy of the value in `%x_ptr` and past that point `%x_ptr` may not remain
   alive.

An example of where this construct is useful is when one has a let binding to a
class instance `c` that contains a let field `f`. In that case `c`'s lifetime
guarantees `f`'s lifetime meaning that returning `f` over the function call
boundary is safe.

*NOTE* since the SILOwnershipModelEliminator will not process these
instructions, endLifetime is just a strawman instruction that will not be
implemented. In practice though, IRGen will need to create a suitable barrier to
ensure that LLVM does not move any uses of `%x` past the `fixLifetime`
instruction of `%x_ptr` once we begin creating such instructions as a result of
ARC optimization.

## ownership qualified store

First define a `store [trivial]` as:

    store %x to [trivial] %x_ptr : $*Int

      =>

    store %x to %x_ptr : $*Int

The verifier will prevent this instruction from being used on types with
non-trivial ownership. Define a `store [assign]` as follows:

    store %x to [assign] %x_ptr : $*C

       =>

    %old_x = load %x_ptr : $*C
    store %new_x to %x_ptr : $*C
    release_value %old_x : $C

*NOTE* `store` is defined as a consuming operation. We also provide
`store [init]` in the case where we know statically that there is no
previous value in the memory location:

    store %x to [init] %x_ptr : $*C

       =>

    store %new_x to %x_ptr : $*C

# Implementation

## Goals

Our implementation strategy goals are:

1. zero impact on other compiler developers until the feature is fully
   developed. This implies all work will be done behind a flag.
2. separation of feature implementation from updating passes.

Goal 2 will be implemented via a pass that transforms ownership qualified
`load`/`store` instructions into unqualified `load`/`store` right after SILGen.

## Plan

We begin by adding initial infrastructure for our development. This means:

1. Adding to SILOptions a disabled by default flag called
"EnableSILOwnershipModel". This flag will be set by a false by default frontend
option called "-enable-sil-ownership-mode".

2. Bots will be brought up to test the compiler with
   "-enable-sil-ownership-model" set to true. The specific bots are:

   * RA-OSX+simulators
   * RA-Device
   * RA-Linux.

   The bots will run once a day until the feature is close to completion. Then a
   polling model will be followed.

Now that change isolation is borrow, we develop building blocks for the
optimization:

1. Two enums will be defined: `LoadInstOwnershipQualifier`,
   `StoreInstOwnershipQualifier`. The exact definition of these enums are as
   follows:

       enum class LoadOwnershipQualifier {
         Unqualified, Take, Copy, Trivial
       };
       enum class StoreOwnershipQualifier {
         Unqualified, Init, Assign, Trivial
       };

    *NOTE* `LoadOwnershipQualifier::Unqualified` and
    `StoreOwnershipQualifier::Unqualified` are only needed for staging purposes.

2. Creating a `LoadInst`, `StoreInst` will be changed to require an ownership
qualifier. At this stage, this argument will default to `Unqualified`. "Bare"
`load`, `store` when parsed via textual SIL will be considered to be
unqualified. This implies that the rest of the compiler will not have to be
changed as a result of this step.

3. Support will be added to SIL, IRGen, Serialization, SIL Printing, and SIL
Parsing for the rest of the qualifiers. SILGen will not be modified at this
stage.

4. The `load_borrow` and `end_borrow` instructions will be implemented in SIL,
   IRGen, Serialization, SIL Printing, and SIL Parsing. They will not be used
   immediately.

4. A pass called the "OwnershipModelEliminator" will be implemented. It will
   blow up all `load`, `store` instructions with non `*::Unqualified` ownership
   into their constituant ARC operations and `*::Unqualified` `load`, `store`
   insts. It will not process `load_borrow` and `end_borrow` since currently it
   is not expected for SILGen to emit such instructions.

5. An option called "EnforceSILOwnershipMode" will be added to the verifier. If
the option is set, the verifier will assert if:

   a. `load`, `store` operations with trivial ownership are applied to memory
      locations with non-trivial type.

   b. `load`, `store` operation with unqualified ownership type are present in
   the IR.

   c. `load_borrow` or `end_borrow` are present in the IR. This is because
   currently we do not support SIL containing such instructions in SIL
   Ownership Mode. Once we have the ability to verify borrowing scopes, this
   will no longer be the case, but this is a different proposal.

Finally, we wire up the building blocks:

1. If SILOption.EnableSILOwnershipModel is true, then the after SILGen SIL
   verification will be performed with EnforceSILOwnershipModel set to true.
2. If SILOption.EnableSILOwnershipModel is true, then the pass manager will run
   the OwnershipModelEliminator pass right after SILGen before the normal pass
   pipeline starts.
3. SILGen will be changed to emit non-unqualified ownership qualifiers on load,
   store instructions when the EnableSILOwnershipModel flag is set. We will use
   the verifier throwing to guarantee that we are not missing any specific
   cases.

Then once all of the bots are green, we change SILOption.EnableSILOwnershipModel
to be true by default. After a cooling off period, we move all of the code
behind the SILOwnershipModel flag in front of the flag. We do this so we can
reuse that flag for further SILOwnershipModel changes.

## Optimizer Changes

Since the SILOwnershipModel eliminator will eliminate the ownership qualifiers
on load, store instructions right after ownership verification, there will be no
immediate effects on the optimizer and thus the optimizer changes can be done in
parallel with the rest of the ARC optimization work.

But, in the long run, we want to enforce these ownership invariants all
throughout the SIL pipeline implying these ownership qualified `load`, `store`
instructions must be processed by IRGen, not eliminated by the SILOwnershipModel
eliminator. Thus we will need to update passes to handle these new instructions
and also will need to implement the `load_borrow`, `end_borrow` instruction.

The main optimizer changes can be separated into the following areas: memory
forwarding, dead stores, ARC optimization. In all of these cases, the necessary
changes are relatively trivial to respond to. We give a quick taste of two of
them: store->load forwarding and ARC Code Motion.

### store->load forwarding

Currently we perform store->load forwarding as follows:

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load %x_ptr : $C
    use(%y)

      =>

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    use(%x)

In a world, where we are using ownership qualified load, store, we have to also
consider the ownership implications. *NOTE* Since we are not modifying the
store, `store [assign]` and `store [init]` are treated the same. Thus without
any loss of generality, lets consider solely `store`.

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load [copy] %x_ptr : $C
    use(%y)

      =>

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    strong_retain %x
    use(%x)

### ARC Code Motion

If ARC Code Motion wishes to move the ARC semantics of ownership qualified
`load`, `store` instructions, it must now consider read/write effects. On the
other hand, we can perform more aggressive ARC code motion of ownership
qualified loads and stores in the face of deinits. This is because we no longer
need to worry about our code motion causing a deinit to fire in between (without
any loss of generality) the load/retain.

### Normal Code Motion

Normal code motion will lose some effectiveness since many of the load/store
operations that it used to be able to move now must consider ARC information. We
may need to consider running ARC code motion earlier in the pipeline where we
normally run Normal Code Motion to ensure that we are able to handle these
cases.

### ARC Optimization

The main implication for ARC optimization is that instead of eliminating just
retains, releases, it must be able to recognize ownership qualified `load`,
`store` and set their flags as appropriate. Also in general ARC optimization and
memory behavior will need to recognize the `end_borrow` instruction as a code
motion barrier.

### Function Signature Optimization

Semantic ARC affects function signature optimization in the context of the owned
to borrow optimization. Specifically:

1. A `store [assign]` must be recognized as a release of the old value that is
   being overridden. In such a case, we can move the `release` of the old value
   into the caller and change the `store [assign]` into a `store [init]`.
2. A `load [copy]` must be recognized as a retain in the callee. Then function
   signature optimization will transform the `load [copy]` into a
   `load_borrow`. This would require the addition of a new `@borrow` return
   value convention.

# Appendix

## Partial Initialization of Loadable References in SIL

In SIL, a value of non-trivial loadable type is loaded from a memory location as
follows:

    %x = load %x_ptr : $*S
    ...
    retain_value %x_ptr : $S

At first glance, this looks reasonable, but in truth there is a hidden drawback:
the partially initialized zone in between the load and the retain
operation. This zone creates a period of time when an "evil optimizer" could
insert an instruction that causes x to be deallocated before the copy is
finished being initialized. Similar issues come up when trying to perform a
store of a non-trival value into a memory location.

Since this sort of partial initialization is allowed in SIL, the optimizer is
forced to be overly conservative when attempting to move releases passed retains
lest the release triggers a deinit that destroys a value like `%x`. Lets look at
two concrete examples that show how semantically providing ownership qualified
load, store instructions eliminate this problem.

**NOTE** Without any loss of generality, we will speak of values with reference
semantics instead of non-trivial values.

## Case Study: Partial Initialization and load [copy]

### The Problem

Consider the following swift program:

    func opaque_call()

    final class C {
      var int: Int = 0
      deinit {
        opaque_call()
      }
    }

    final class D {
      var int: Int = 0
    }

    var GLOBAL_C : C? = nil
    var GLOBAL_D : D? = nil

    func useC(_ c: C)
    func useD(_ d: D)

    func run() {
        let c = C()
        GLOBAL_C = c
        let d = D()
        GLOBAL_D = d
        useC(c)
        useD(d)
    }

Notice that both `C` and `D` have fixed layouts and separate class hierarchies,
but `C`'s deinit has a call to the function `opaque_call` which may write to
`GLOBAL_D` or `GLOBAL_C`. Additionally assume that both `useC` and `useD` are
known to the compiler to not have any affects on instances of type `D`, `C`
respectively and useC assigns `nil` to `GLOBAL_C`. Now consider the following
valid SIL lowering for `run`:

    sil_global GLOBAL_D : $D
    sil_global GLOBAL_C : $C

    final class C {
      var x: Int
      deinit
    }

    final class D {
      var x: Int
    }

    sil @useC : $@convention(thin) () -> ()
    sil @useD : $@convention(thin) () -> ()

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load %global_c : $*C (3)
      strong_retain %c2 : $C (4)
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Lets optimize this function! First we perform the following operations:

1. Since `(2)` is storing to an identified object that can not be `GLOBAL_C`, we
   can store to load forward `(1)` to `(3)`.
2. Since a retain does not block store to load forwarding, we can forward `(2)`
   to `(5)`. But lets for the sake of argument, assume that the optimizer keeps
   such information as an analysis and does not perform the actual load->store
   forwarding.
3. Even though we do not foward `(2)` to `(5)`, we can still move `(4)` over
   `(6)` so that `(4)` is right before `(7)`.

This yields (using the ' marker to designate that a register has had load-store
forwarding applied to it):

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      strong_retain %c : $C (4')
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Then by assumption, we know that `%useC` does not perform any releases of any
instances of class `D`. Thus `(6)` can be moved past `(7')` and we can then pair
and eliminate `(6)` and `(9)` via the rules of ARC optimization using the
analysis information that `%d2` and `%d` are th same due to the possibility of
performing store->load forwarding. After performing such transformations, `run`
looks as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      %d2 = load %global_d : $*D (5)
      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Now by assumption, we know that `%useD_func` does not touch any instances of
class `C` and `%c` does not contain any ivars of type `D` and is final so none
can be added. At first glance, this seems to suggest that we can move `(10)`
before `(8')` and then pair/eliminate `(4')` and `(10)`. But is this a safe
optimization perform? Absolutely Not! Why? Remember that since `useC_func`
assigns `nil` to `GLOBAL_C`, after `(7')`, `%c` could have a reference count
of 1. Thus `(10)` _may_ invoke the destructor of `C`. Since this destructor
calls an opaque function that _could_ potentially write to `GLOBAL_D`, we may be
be passing `%d2`, an already deallocated object to `%useD_func`, an illegal
optimization!

Lets think a bit more about this example and consider this example at the
language level. Remember that while Swift's deinit are not asychronous, we do
not allow for user level code to create dependencies from the body of the
destructor into the normal control flow that has called it. This means that
there are two valid results of this code:

- Operation Sequence 1: No optimization is performed and `%d2` is passed to
  `%useD_func`.
- Operation Sequence 2: We shorten the lifetime of `%c` before `%useD_func` and
   a different instance of `$D` is passed into `%useD_func`.

The fact that 1 occurs without optimization is just as a result of an
implementation detail of SILGen. 2 is also a valid sequence of operations.

Given that:

1. As a principle, the optimizer does not consider such dependencies to avoid
   being overly conservative.
2. We provide constructs to ensure appropriate lifetimes via the usage of
   constructs such as fix_lifetime.

We need to figure out how to express our optimization such that 2
happens. Remember that one of the optimizations that we performed at the
beginning was to move `(6)` over `(7')`, i.e., transform this:

      %d = alloc_ref $D
      %global_d_addr = global_addr GLOBAL_D : $D
      %d = load %global_d_addr : $*D (5)
      strong_retain %d : $D (6)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

into:

      %global_d_addr = global_addr GLOBAL_D : $D
      %d2 = load %global_d_addr : $*D (5)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      strong_retain %d2 : $D (6)

This transformation in Swift corresponds to transforming:

      let d = GLOBAL_D
      useC(c)

to:

      let d_raw = load_d_value(GLOBAL_D)
      useC(c)
      let d = take_ownership_of_d(d_raw)

This is clearly an instance where we have moved a side-effect in between the
loading of the data and the taking ownership of such data, that is before the
`let` is fully initialized. What if instead of just moving the retain, we moved
the entire let statement? This would then result in the following swift code:

      useC(c)
      let d = GLOBAL_D

and would correspond to the following SIL snippet:

      %global_d_addr = global_addr GLOBAL_D : $D

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      %d2 = load %global_d_addr : $*D (5)
      strong_retain %d2 : $D (6)

Moving the load with the strong_retain to ensure that the full initialization is
performed even after code motion causes our SIL to look as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %d2 = load %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Giving us the exact result that we want: Operation Sequence 2!

### Defining load [copy]

Given that we wish the load, store to be tightly coupled together, it is natural
to express this operation as a `load [copy]` instruction. Lets define the `load
[copy]` instruction as follows:

    %1 = load [copy] %0 : $*C

      =>

    %1 = load %0 : $*C
    retain_value %1 : $C

Now lets transform our initial example to use this instruction:

Notice how now if we move `(7)` over `(3)` and `(6)` now, we get the following SIL:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %d2 = load [copy] %global_d : $*D (5)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

We then perform the previous code motion:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)
      strong_release %d : $D (9)

      %d2 = load [copy] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
      strong_release %c : $C (10)
    }

We then would like to eliminate `(9)` and `(10)` by pairing them with `(3)` and
`(8)`. Can we still do so? One way we could do this is by introducing the
`[take]` flag. The `[take]` flag on a `load [take]` says that one is
semantically loading a value from a memory location and are taking ownership of
the value thus eliding the retain. In terms of SIL this flag is defined as:

    %x = load [take] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C

Why do we care about having such a `load [take]` instruction when we could just
use a `load`? The reason why is that a normal `load` has unsafe unowned
ownership (i.e. it has no implications on ownership). We would like for memory
that has non-trivial type to only be able to be loaded via instructions that
maintain said ownership. We will allow for casting to trivial types as usual to
provide such access if it is required.

Thus we have achieved the desired result:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [take] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %d2 = load [take] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
    }

----

Attached below is an updated version of the proposal. Again a rendered version is located at:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

Michael

----

# Summary

This document proposes:

1. adding the following ownership qualifiers to `load`: `[take]`, `[copy]`,
   `[borrow]`, `[trivial]`.
2. adding the following ownership qualifiers to `store`: `[init]`, `[assign]`,
   `[trivial]`.
3. requiring all `load` and `store` operations to have ownership qualifiers.
4. banning the use of `load [trivial]`, `store [trivial]` on memory locations of
   `non-trivial` type.

This will allow for:

1. eliminating optimizer miscompiles that occur due to releases being moved into
   the region in between a `load`/`retain`, `load`/`release`,
   `store`/`release`. (For a specific example, see the appendix).
2. explicitly modeling `load [trivial]`/`store [trivial]` as having `unsafe
   unowned` ownership semantics. This will be enforced via the verifier.
3. more aggressive ARC code motion.

# Definitions

## ownership qualified load

We propose four different ownership qualifiers for load. Define `load [trivial]`
as:

    %x = load [trivial] %x_ptr : $*Int

      =>

    %x = load %x_ptr : $*Int

A `load [trivial]` can not be used to load values of non-trivial type.

Should we mandate the reverse as well, that e.g. load [copy] cannot be used to
load values of trivial type? That's a little more work for substituting cloners, but it
keeps everything nice and canonical.

No. I think that in the trivial case, load [copy] optimizes to load [trivial] as a canonicalization. This is just like applying a retain_value to a trivial type.

Define
`load [copy]` as:

    %x = load [copy] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C
    retain_value %x : $C

Then define `load [take]` as:

    %x = load [take] %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject

**NOTE** `load [take]` implies that the loaded from memory location no longer
owns the result object (i.e. a take is a move). Loading from the memory location
again without reinitialization is illegal.

Next we provide `load [borrow]`:

    %x = load [borrow] %x_ptr : $*Builtin.NativeObject
    ...
    endBorrow(%x, %x_ptr)

      =>

    %x = load %x_ptr : $*Builtin.NativeObject
    ...
    endBorrow(%x, %x_ptr)

`load [borrow]` implies that in the region between the `load` and the
`endBorrow`, the loaded object must semantically remain alive.

For consistency with other multi-word SIL instructions, this should be end_borrow.

Sure.

I wonder whether it might make more sense for load [borrow] to be a different instruction.
There's a couple reasons for that first. The first is that it's the only load which introduces
a scope, which is a really big difference structurally. The second is that it's the only load
which returns a non-owned value, which will be a typing difference when we record
ownership in the type system.

I am fine with a load_borrow. If this is the only change left that you want can I just send out a proposal with that small change and start implementing. I am nervous about perfection being the enemy of the good (and I want to start implementing this weekend if possible *evil smile*).

Anyway, I really like that load [borrow] is scoped.. Are you planning to include the formation
restrictions on scopes instructions immediately, or will you do that in a later proposal?

It will be done in a later proposal. We are just trying to set the stage for verification.

The requirements we need from scopes are:
  - there has to be a well-defined set of IPs that lie within the scope and
  - there can't be a non-explicit transition into or out of the scope.

One way to get the first condition is to say that there has to be a unique scope-ender that
post-dominates the scope-beginner, but that's a pretty hard restriction for SILGen to satisfy
(as well as the optimizer, I imagine), and it doesn't handle "unreachable" or infinite loops.
We need to allow multiple scope-enders, and we need to allow scope-enders to be missing
in some cases.

I agree with you here. We definitely want to be able to support multiple scope-enders.

Here's the right formalism, I think:

For all walks W beginning from the entry point of the function:
  For each node B in the CFG which is a scope-beginner:
    Let E be the set of scope-enders for B, and consider just the sub-sequence S of nodes
    of W where the node is a member of {B} U E. Then the elements of S at even
    positions (starting from 0) must be B, and the elements at odd positions must be
    members of E. Furthermore, if the walk ends in a return or throw instruction, then
    S must have even length.

Note that for this to be true, all the scope-enders must be dominated by the scope-beginner.

It is sufficient to just consider the set of "beeline" paths, i.e. acyclic paths ending in either a true
terminator (a return, throw, or unreachable) or an edge back to a node already in the path.
No such path may include multiple scope-enders for the same scope-beginner. If the path ends
in a return or throw, it must include a matching scope-ender after every scope-beginner. If
it ends in a loop back, then for every scope-beginner in the path, if the path contains a scope-ender
then the loop destination must either precede the scope-beginner or follow the scope-ender;
otherwise, the loop destination must follow the scope-beginner.

Or, as a decision algorithm in Swift for a single scope-beginner:

  var blockEntryIsInScope = [Block: Bool]()
  func scan(startingFrom inst: Instruction, isInScope: Bool) {
    if inst is ReturnInst || inst is ThrowInst {
      guard !isInScope else { invalid("ended function while in scope") }
      return
    }

    if let term = inst as? TerminatorInst {
      for successor in term.successors {
        guard begin.dominates(successor) else {
          guard !isInScope else { invalid("branch out of scope while in scope") }
          continue
        }
        if let cachedValue = blockEntryIsInScope[successor] {
          if cachedValue != isInScope {
            invalid(isInScope ? "branch out of scope while in scope" : "branch into scope after exiting scope")
          }
        } else {
          blockEntryIsInScope[successor] = isInScope
          scan(startingFrom: successor.begin, isInScope: isInScope)
        }
      }
      return
    }

    if inst.endsScopeOf(begin) {
      guard isInScope else { invalid("ending scope that was already ended") }
      scan(startingFrom: inst.next, isInScope: false)
    } else {
      scan(startingFrom: inst.next, isInScope: isInScope)
    }
  }
  scan(startingFrom: begin, isInScope: true)

Since this is tangential to the current proposal, can we introduce a side thread?

···

On Oct 7, 2016, at 5:09 PM, John McCall <rjmccall@apple.com> wrote:

On Oct 7, 2016, at 2:38 PM, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

John.

The `endBorrow` communicates to the optimizer:

1. That the value in `%x_ptr` should not be destroyed before endBorrow.
2. Uses of `%x` should not be sunk past endBorrow since `%x` is only a shallow
   copy of the value in `%x_ptr` and past that point `%x_ptr` may not remain
   alive.

An example of where this construct is useful is when one has a let binding to a
class instance `c` that contains a let field `f`. In that case `c`'s lifetime
guarantees `f`'s lifetime meaning that returning `f` over the function call
boundary is safe.

## ownership qualified store

First define a `store [trivial]` as:

    store %x to [trivial] %x_ptr : $*Int

      =>

    store %x to %x_ptr : $*Int

The verifier will prevent this instruction from being used on types with
non-trivial ownership. Define a `store [assign]` as follows:

    store %x to [assign] %x_ptr : $*C

       =>

    %old_x = load %x_ptr : $*C
    store %new_x to %x_ptr : $*C
    release_value %old_x : $C

*NOTE* `store` is defined as a consuming operation. We also provide
`store [init]` in the case where we know statically that there is no
previous value in the memory location:

    store %x to [init] %x_ptr : $*C

       =>

    store %new_x to %x_ptr : $*C

# Implementation

## Goals

Our implementation strategy goals are:

1. zero impact on other compiler developers until the feature is fully
   developed. This implies all work will be done behind a flag.
2. separation of feature implementation from updating passes.

Goal 2 will be implemented via a pass that transforms ownership qualified
`load`/`store` instructions into unqualified `load`/`store` right after SILGen.

## Plan

We begin by adding initial infrastructure for our development. This means:

1. Adding to SILOptions a disabled by default flag called
"EnableSILOwnershipModel". This flag will be set by a false by default frontend
option called "-enable-sil-ownership-mode".

2. Bots will be brought up to test the compiler with
   "-enable-sil-ownership-model" set to true. The specific bots are:

   * RA-OSX+simulators
   * RA-Device
   * RA-Linux.

   The bots will run once a day until the feature is close to completion. Then a
   polling model will be followed.

Now that change isolation is borrow, we develop building blocks for the
optimization:

1. Two enums will be defined: `LoadInstOwnershipQualifier`,
   `StoreInstOwnershipQualifier`. The exact definition of these enums are as
   follows:

       enum class LoadOwnershipQualifier {
         Unqualified, Take, Copy, Borrow, Trivial
       };
       enum class StoreOwnershipQualifier {
         Unqualified, Init, Assign, Trivial
       };

    *NOTE* `LoadOwnershipQualifier::Unqualified` and
    `StoreOwnershipQualifier::Unqualified` are only needed for staging purposes.

2. Creating a `LoadInst`, `StoreInst` will be changed to require an ownership
qualifier. At this stage, this argument will default to `Unqualified`. "Bare"
`load`, `store` when parsed via textual SIL will be considered to be
unqualified. This implies that the rest of the compiler will not have to be
changed as a result of this step.

3. Support will be added to SIL, IRGen, Serialization, SILPrinting, and SIL
Parsing for the rest of the qualifiers. SILGen will not be modified at this
stage.

4. A pass called the "OwnershipModelEliminator" will be implemented. It will
   blow up all `load`, `store` instructions with non `*::Unqualified` ownership
   into their constituant ARC operations and `*::Unqualified` `load`, `store`
   insts.

3. An option called "EnforceSILOwnershipMode" will be added to the verifier. If
the option is set, the verifier will assert if:

   a. `load`, `store` operations with trivial ownership are applied to memory
      locations with non-trivial type.

   b. `load`, `store` operation with unqualified ownership type are present in
   the IR.

Finally, we wire up the building blocks:

1. If SILOption.EnableSILOwnershipModel is true, then the after SILGen SIL
   verification will be performed with EnforceSILOwnershipModel set to true.
2. If SILOption.EnableSILOwnershipModel is true, then the pass manager will run
   the OwnershipModelEliminator pass right after SILGen before the normal pass
   pipeline starts.
3. SILGen will be changed to emit non-unqualified ownership qualifiers on load,
   store instructions when the EnableSILOwnershipModel flag is set. We will use
   the verifier throwing to guarantee that we are not missing any specific
   cases.

Then once all of the bots are green, we change SILOption.EnableSILOwnershipModel
to be true by default. After a cooling off period, we move all of the code
behind the SILOwnershipModel flag in front of the flag. We do this so we can
reuse that flag for further SILOwnershipModel changes.

## Optimizer Changes

Since the SILOwnershipModel eliminator will eliminate the ownership qualifiers
on load, store instructions right after ownership verification, there will be no
immediate affects on the optimizer and thus the optimizer changes can be done in
parallel with the rest of the ARC optimization work.

But, in the long run, we want to enforce these ownership invariants all
throughout the SIL pipeline implying these ownership qualified `load`, `store`
instructions must be processed by IRGen, not eliminated by the SILOwnershipModel
eliminator. Thus we will need to update passes to handle these new
instructions. The main optimizer changes can be separated into the following
areas: memory forwarding, dead stores, ARC optimization. In all of these cases,
the necessary changes are relatively trivial to respond to. We give a quick
taste of two of them: store->load forwarding and ARC Code Motion.

### store->load forwarding

Currently we perform store->load forwarding as follows:

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load %x_ptr : $C
    use(%y)

      =>

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    use(%x)

In a world, where we are using ownership qualified load, store, we have to also
consider the ownership implications. *NOTE* Since we are not modifying the
store, `store [assign]` and `store [init]` are treated the same. Thus without
any loss of generality, lets consider solely `store`.

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load [copy] %x_ptr : $C
    use(%y)

      =>

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    strong_retain %x
    use(%x)

### ARC Code Motion

If ARC Code Motion wishes to move the ARC semantics of ownership qualified
`load`, `store` instructions, it must now consider read/write effects. On the
other hand, it will be able to now not consider the side-effects of destructors
when moving retain/release operations.

### Normal Code Motion

Normal code motion will lose some effectiveness since many of the load/store
operations that it used to be able to move now must consider ARC information. We
may need to consider running ARC code motion earlier in the pipeline where we
normally run Normal Code Motion to ensure that we are able to handle these
cases.

### ARC Optimization

The main implication for ARC optimization is that instead of eliminating just
retains, releases, it must be able to recognize ownership qualified `load`,
`store` and set their flags as appropriate.

### Function Signature Optimization

Semantic ARC affects function signature optimization in the context of the owned
to borrow optimization. Specifically:

1. A `store [assign]` must be recognized as a release of the old value that is
   being overridden. In such a case, we can move the `release` of the old value
   into the caller and change the `store [assign]` into a `store [init]`.
2. A `load [copy]` must be recognized as a retain in the callee. Then function
   signature optimization will transform the `load [copy]` into a `load
   [borrow]`. This would require the addition of a new `@borrow` return
   value convention.

# Appendix

## Partial Initialization of Loadable References in SIL

In SIL, a value of non-trivial loadable type is loaded from a memory location as
follows:

    %x = load %x_ptr : $*S
    ...
    retain_value %x_ptr : $S

At first glance, this looks reasonable, but in truth there is a hidden drawback:
the partially initialized zone in between the load and the retain
operation. This zone creates a period of time when an "evil optimizer" could
insert an instruction that causes x to be deallocated before the copy is
finished being initialized. Similar issues come up when trying to perform a
store of a non-trival value into a memory location.

Since this sort of partial initialization is allowed in SIL, the optimizer is
forced to be overly conservative when attempting to move releases passed retains
lest the release triggers a deinit that destroys a value like `%x`. Lets look at
two concrete examples that show how semantically providing ownership qualified
load, store instructions eliminate this problem.

**NOTE** Without any loss of generality, we will speak of values with reference
semantics instead of non-trivial values.

## Case Study: Partial Initialization and load [copy]

### The Problem

Consider the following swift program:

    func opaque_call()

    final class C {
      var int: Int = 0
      deinit {
        opaque_call()
      }
    }

    final class D {
      var int: Int = 0
    }

    var GLOBAL_C : C? = nil
    var GLOBAL_D : D? = nil

    func useC(_ c: C)
    func useD(_ d: D)

    func run() {
        let c = C()
        GLOBAL_C = c
        let d = D()
        GLOBAL_D = d
        useC(c)
        useD(d)
    }

Notice that both `C` and `D` have fixed layouts and separate class hierarchies,
but `C`'s deinit has a call to the function `opaque_call` which may write to
`GLOBAL_D` or `GLOBAL_C`. Additionally assume that both `useC` and `useD` are
known to the compiler to not have any affects on instances of type `D`, `C`
respectively and useC assigns `nil` to `GLOBAL_C`. Now consider the following
valid SIL lowering for `run`:

    sil_global GLOBAL_D : $D
    sil_global GLOBAL_C : $C

    final class C {
      var x: Int
      deinit
    }

    final class D {
      var x: Int
    }

    sil @useC : $@convention(thin) () -> ()
    sil @useD : $@convention(thin) () -> ()

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load %global_c : $*C (3)
      strong_retain %c2 : $C (4)
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Lets optimize this function! First we perform the following operations:

1. Since `(2)` is storing to an identified object that can not be `GLOBAL_C`, we
   can store to load forward `(1)` to `(3)`.
2. Since a retain does not block store to load forwarding, we can forward `(2)`
   to `(5)`. But lets for the sake of argument, assume that the optimizer keeps
   such information as an analysis and does not perform the actual load->store
   forwarding.
3. Even though we do not foward `(2)` to `(5)`, we can still move `(4)` over
   `(6)` so that `(4)` is right before `(7)`.

This yields (using the ' marker to designate that a register has had load-store
forwarding applied to it):

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      strong_retain %c : $C (4')
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Then by assumption, we know that `%useC` does not perform any releases of any
instances of class `D`. Thus `(6)` can be moved past `(7')` and we can then pair
and eliminate `(6)` and `(9)` via the rules of ARC optimization using the
analysis information that `%d2` and `%d` are th same due to the possibility of
performing store->load forwarding. After performing such transformations, `run`
looks as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      %d2 = load %global_d : $*D (5)
      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Now by assumption, we know that `%useD_func` does not touch any instances of
class `C` and `%c` does not contain any ivars of type `D` and is final so none
can be added. At first glance, this seems to suggest that we can move `(10)`
before `(8')` and then pair/eliminate `(4')` and `(10)`. But is this a safe
optimization perform? Absolutely Not! Why? Remember that since `useC_func`
assigns `nil` to `GLOBAL_C`, after `(7')`, `%c` could have a reference count
of 1. Thus `(10)` _may_ invoke the destructor of `C`. Since this destructor
calls an opaque function that _could_ potentially write to `GLOBAL_D`, we may be
be passing `%d2`, an already deallocated object to `%useD_func`, an illegal
optimization!

Lets think a bit more about this example and consider this example at the
language level. Remember that while Swift's deinit are not asychronous, we do
not allow for user level code to create dependencies from the body of the
destructor into the normal control flow that has called it. This means that
there are two valid results of this code:

- Operation Sequence 1: No optimization is performed and `%d2` is passed to
  `%useD_func`.
- Operation Sequence 2: We shorten the lifetime of `%c` before `%useD_func` and
   a different instance of `$D` is passed into `%useD_func`.

The fact that 1 occurs without optimization is just as a result of an
implementation detail of SILGen. 2 is also a valid sequence of operations.

Given that:

1. As a principle, the optimizer does not consider such dependencies to avoid
   being overly conservative.
2. We provide constructs to ensure appropriate lifetimes via the usage of
   constructs such as fix_lifetime.

We need to figure out how to express our optimization such that 2
happens. Remember that one of the optimizations that we performed at the
beginning was to move `(6)` over `(7')`, i.e., transform this:

      %d = alloc_ref $D
      %global_d_addr = global_addr GLOBAL_D : $D
      %d = load %global_d_addr : $*D (5)
      strong_retain %d : $D (6)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

into:

      %global_d_addr = global_addr GLOBAL_D : $D
      %d2 = load %global_d_addr : $*D (5)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      strong_retain %d2 : $D (6)

This transformation in Swift corresponds to transforming:

      let d = GLOBAL_D
      useC(c)

to:

      let d_raw = load_d_value(GLOBAL_D)
      useC(c)
      let d = take_ownership_of_d(d_raw)

This is clearly an instance where we have moved a side-effect in between the
loading of the data and the taking ownership of such data, that is before the
`let` is fully initialized. What if instead of just moving the retain, we moved
the entire let statement? This would then result in the following swift code:

      useC(c)
      let d = GLOBAL_D

and would correspond to the following SIL snippet:

      %global_d_addr = global_addr GLOBAL_D : $D

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      %d2 = load %global_d_addr : $*D (5)
      strong_retain %d2 : $D (6)

Moving the load with the strong_retain to ensure that the full initialization is
performed even after code motion causes our SIL to look as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %d2 = load %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Giving us the exact result that we want: Operation Sequence 2!

### Defining load [copy]

Given that we wish the load, store to be tightly coupled together, it is natural
to express this operation as a `load [copy]` instruction. Lets define the `load
[copy]` instruction as follows:

    %1 = load [copy] %0 : $*C

      =>

    %1 = load %0 : $*C
    retain_value %1 : $C

Now lets transform our initial example to use this instruction:

Notice how now if we move `(7)` over `(3)` and `(6)` now, we get the following SIL:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %d2 = load [copy] %global_d : $*D (5)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

We then perform the previous code motion:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)
      strong_release %d : $D (9)

      %d2 = load [copy] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
      strong_release %c : $C (10)
    }

We then would like to eliminate `(9)` and `(10)` by pairing them with `(3)` and
`(8)`. Can we still do so? One way we could do this is by introducing the
`[take]` flag. The `[take]` flag on a `load [take]` says that one is
semantically loading a value from a memory location and are taking ownership of
the value thus eliding the retain. In terms of SIL this flag is defined as:

    %x = load [take] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C

Why do we care about having such a `load [take]` instruction when we could just
use a `load`? The reason why is that a normal `load` has unsafe unowned
ownership (i.e. it has no implications on ownership). We would like for memory
that has non-trivial type to only be able to be loaded via instructions that
maintain said ownership. We will allow for casting to trivial types as usual to
provide such access if it is required.

Thus we have achieved the desired result:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [take] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %d2 = load [take] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
    }

On Oct 6, 2016, at 3:03 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Oct 5, 2016, at 4:48 PM, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

On Oct 5, 2016, at 4:40 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

On Oct 4, 2016, at 1:04 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Sep 30, 2016, at 11:54 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

The document attached below contains the first "Semantic ARC" mini proposal: the High Level ARC Memory Operations Proposal.

An html rendered version of this markdown document is available at the following URL:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

----

# Summary

This document proposes:

1. adding the `load_strong`, `store_strong` instructions to SIL. These can only
   be used with memory locations of `non-trivial` type.

I would really like to avoid using the word "strong" here. Under the current proposal, these instructions will be usable with arbitrary non-trivial types, not just primitive class references. Even if you think of an aggregate that happens to contain one or more strong references as some sort of aggregate strong reference (which is questionable but not completely absurd), we already have loadable non-strong class references that this operation would be usable with, like native unowned references. "load_strong %0 : $*@sil_unowned T" as an operation yielding a scalar "@sil_unowned T" is ridiculous, and it will only get more ridiculous when we eventually allow this operation to work with types that are currently address-only, like weak references.

Brainstorming:

Something like load_copy and store_copy would be a bit unfortunate, since store_copy doesn't actually copy the source operand and we want to have a load_copy [take].

load_value and store_value seem excessively generic. It's not like non-trivial types aren't values.

One question that comes to mind: do we actually need new instructions here other than for staging purposes? We don't actually need new instructions for pseudo-linear SIL to work; we just need to say that we only enforce pseudo-linearity for non-trivial types.

If we just want the instruction to be explicit about ownership so that we can easily distinguish these cases, we can make the rule always explicit, e.g.:
  load [take] %0 : $*MyClass
  load [copy] %0 : $*MyClass
  load [trivial] %0 : $*Int

  store %0 to [initialization] %1 : $*MyClass
  store %0 to [assignment] %1 : $*MyClass
  store %0 to [trivial] %1 : $*Int

John.

The reason why I originally suggested to go the load_strong route is that we already have load_weak, load_unowned instructions. If I could add a load_strong instruction, then it would make sense to assign an engineer to do a pass over all 3 of these instructions and combine them into 1 load instruction. That is, first transform into a form amenable for canonicalization and then canonicalize all at once.

As you pointed out, both load_unowned and load_weak involve representation changes in type (for instance the change of weak pointers to Optional<T>). Such a change would be against the "spirit" of a load instruction to perform such representation changes versus ownership changes.

In terms of the properties that we actually want here, what is important is that we can verify that no non-trivially typed values are loaded in an unsafe unowned manner. That can be done also with ownership flags on load/store.

Does this sound reasonable:

1. We introduce two enums that define memory ownership changes, one for load and one for store. Both of these enums will contain a [trivial] ownership.
2. We enforce in the verifier that non-trivial types must have a non-trivial ownership modifier on any memory operations that they are involved in.

Sorry for not being explicit. I will not add new instructions, just modifiers. Assuming that this is agreeable to you, I am going to prepare a quick additional version of the proposal document.

That sounds great, thanks.

John.

Sorry for not being clear.

deinits in ARC are not sequenced with respect to memory operations in normal control flow, but if ARC performs any code motion, we must ensure that memory safety is respected. Such semantics are not new.

The main change here is that previously there were situations where we would split up the load/retain in a load [copy] operation. Then if the right things occured, we could cause the object to be deallocated before we use the value or took full ownership of the value.

Consider the following example:

···

On Oct 10, 2016, at 8:05 PM, Chris Lattner <clattner@apple.com> wrote:

On Oct 7, 2016, at 2:38 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

Attached below is an updated version of the proposal. Again a rendered version is located at:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

Michael

This looks great Michael, a couple of comments/questions:

typo: affects -> effects.

>If ARC Code Motion wishes to move the ARC semantics of ownership qualified load, store instructions, it must now consider read/write effects. On the other hand, it will be able to now not consider the side-effects of destructors when moving retain/release operations.

Can you elaborate on what you mean by this? Does this mean the ARC optimizer has additional freedom somehow to ignore side effects in deinits? What grants that ability?

----
class D {}
class D1 : D {}
class D2 : D {}

var GLOBAL_D : D = D1()

class C { deinit { GLOBAL_D = D2 } }

func main() {
  let c = C()
  let d = GLOBAL_D
  useC(c)
  useD(d)
}

main()
----

Assume that useC does not directly in any way touch an instance of class D except via the destructor .

Since memory operations are not sequenced with respect to deinits, there are two correct programs here that the optimizer can produce: the original and the one where useC(c) and GLOBAL_D are swapped, i.e.:

----
func main() {
  let c = C()
  useC(c)
  let d = GLOBAL_D
  useD(d)
}
----

In the first program, d would be an instance of class D1. In the second, it would be an instance of class D2. Notice how in both programs though, no deinitialized object is accessed. On the other hand, imagine if we had split main like so:

----
func main() {
  let c = C()
  let d = unsafe_unowned_load(GLOBAL_D)
  useC(c)
  let owned_d = retain(d)
  useD(owned_d)
}
----

In this case, we would be passing off to useD a deallocated instance of class D1 which would be undefined behavior.

So as long as for these load [copy] operations, we move the load/retain fused together, we can guarantee that an object is produced instantaneously in a safe way without any worry.

Was this helpful?
Michael

-Chris

This seems fine to me… at a high level!
-Andy

···

On Oct 14, 2016, at 2:44 PM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

Attached below is a final version of the proposal. I am going to commit it to the repo if there are no further questions/changes/etc.

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

Michael

----

# Summary

This document proposes:

1. adding the following ownership qualifiers to `load`: `[take]`, `[copy]`,
   `[trivial]`.
2. adding the following ownership qualifiers to `store`: `[init]`, `[assign]`,
   `[trivial]`.
3. adding the `load_borrow` instruction and the `end_borrow` instruction.
3. requiring all `load` and `store` operations to have ownership qualifiers.
4. banning the use of `load [trivial]`, `store [trivial]` on memory locations of
   `non-trivial` type.

This will allow for:

1. eliminating optimizer miscompiles that occur due to releases being moved into
   the region in between a `load`/`retain`, `load`/`release`,
   `store`/`release`. (For a specific example, see the appendix).
2. explicitly modeling `load [trivial]`/`store [trivial]` as having `unsafe
   unowned` ownership semantics. This will be enforced via the verifier.
3. more aggressive ARC code motion.

# Definitions

## ownership qualified load

We propose three different ownership qualifiers for load. Define `load [trivial]`
as:

    %x = load [trivial] %x_ptr : $*Int

      =>

    %x = load %x_ptr : $*Int

A `load [trivial]` can not be used to load values of non-trivial type. Define
`load [copy]` as:

    %x = load [copy] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C
    retain_value %x : $C

Then define `load [take]` as:

    %x = load [take] %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject

**NOTE** `load [take]` implies that the loaded from memory location no longer
owns the result object (i.e. a take is a move). Loading from the memory location
again without reinitialization is illegal.

## load_borrow and end_borrow

Next we provide `load_borrow` and `end_borrow`:

    %x = load_borrow %x_ptr : $*Builtin.NativeObject
    ...
    end_borrow %x, %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject
    ...
    endLifetime %x : $Builtin.NativeObject
    fixLifetime %x_ptr : $*Builtin.NativeObject

`load [borrow]` implies that in the region between the `load` and the
`end_borrow`, the loaded object must semantically remain alive. The `end_borrow`
communicates to the optimizer:

1. that the value in `%x_ptr` should not be destroyed before endBorrow.
2. uses of `%x` should not be sunk past endBorrow since `%x` is only a shallow
   copy of the value in `%x_ptr` and past that point `%x_ptr` may not remain
   alive.

An example of where this construct is useful is when one has a let binding to a
class instance `c` that contains a let field `f`. In that case `c`'s lifetime
guarantees `f`'s lifetime meaning that returning `f` over the function call
boundary is safe.

*NOTE* since the SILOwnershipModelEliminator will not process these
instructions, endLifetime is just a strawman instruction that will not be
implemented. In practice though, IRGen will need to create a suitable barrier to
ensure that LLVM does not move any uses of `%x` past the `fixLifetime`
instruction of `%x_ptr` once we begin creating such instructions as a result of
ARC optimization.

## ownership qualified store

First define a `store [trivial]` as:

    store %x to [trivial] %x_ptr : $*Int

      =>

    store %x to %x_ptr : $*Int

The verifier will prevent this instruction from being used on types with
non-trivial ownership. Define a `store [assign]` as follows:

    store %x to [assign] %x_ptr : $*C

       =>

    %old_x = load %x_ptr : $*C
    store %new_x to %x_ptr : $*C
    release_value %old_x : $C

*NOTE* `store` is defined as a consuming operation. We also provide
`store [init]` in the case where we know statically that there is no
previous value in the memory location:

    store %x to [init] %x_ptr : $*C

       =>

    store %new_x to %x_ptr : $*C

# Implementation

## Goals

Our implementation strategy goals are:

1. zero impact on other compiler developers until the feature is fully
   developed. This implies all work will be done behind a flag.
2. separation of feature implementation from updating passes.

Goal 2 will be implemented via a pass that transforms ownership qualified
`load`/`store` instructions into unqualified `load`/`store` right after SILGen.

## Plan

We begin by adding initial infrastructure for our development. This means:

1. Adding to SILOptions a disabled by default flag called
"EnableSILOwnershipModel". This flag will be set by a false by default frontend
option called "-enable-sil-ownership-mode".

2. Bots will be brought up to test the compiler with
   "-enable-sil-ownership-model" set to true. The specific bots are:

   * RA-OSX+simulators
   * RA-Device
   * RA-Linux.

   The bots will run once a day until the feature is close to completion. Then a
   polling model will be followed.

Now that change isolation is borrow, we develop building blocks for the
optimization:

1. Two enums will be defined: `LoadInstOwnershipQualifier`,
   `StoreInstOwnershipQualifier`. The exact definition of these enums are as
   follows:

       enum class LoadOwnershipQualifier {
         Unqualified, Take, Copy, Trivial
       };
       enum class StoreOwnershipQualifier {
         Unqualified, Init, Assign, Trivial
       };

    *NOTE* `LoadOwnershipQualifier::Unqualified` and
    `StoreOwnershipQualifier::Unqualified` are only needed for staging purposes.

2. Creating a `LoadInst`, `StoreInst` will be changed to require an ownership
qualifier. At this stage, this argument will default to `Unqualified`. "Bare"
`load`, `store` when parsed via textual SIL will be considered to be
unqualified. This implies that the rest of the compiler will not have to be
changed as a result of this step.

3. Support will be added to SIL, IRGen, Serialization, SIL Printing, and SIL
Parsing for the rest of the qualifiers. SILGen will not be modified at this
stage.

4. The `load_borrow` and `end_borrow` instructions will be implemented in SIL,
   IRGen, Serialization, SIL Printing, and SIL Parsing. They will not be used
   immediately.

4. A pass called the "OwnershipModelEliminator" will be implemented. It will
   blow up all `load`, `store` instructions with non `*::Unqualified` ownership
   into their constituant ARC operations and `*::Unqualified` `load`, `store`
   insts. It will not process `load_borrow` and `end_borrow` since currently it
   is not expected for SILGen to emit such instructions.

5. An option called "EnforceSILOwnershipMode" will be added to the verifier. If
the option is set, the verifier will assert if:

   a. `load`, `store` operations with trivial ownership are applied to memory
      locations with non-trivial type.

   b. `load`, `store` operation with unqualified ownership type are present in
   the IR.

   c. `load_borrow` or `end_borrow` are present in the IR. This is because
   currently we do not support SIL containing such instructions in SIL
   Ownership Mode. Once we have the ability to verify borrowing scopes, this
   will no longer be the case, but this is a different proposal.

Finally, we wire up the building blocks:

1. If SILOption.EnableSILOwnershipModel is true, then the after SILGen SIL
   verification will be performed with EnforceSILOwnershipModel set to true.
2. If SILOption.EnableSILOwnershipModel is true, then the pass manager will run
   the OwnershipModelEliminator pass right after SILGen before the normal pass
   pipeline starts.
3. SILGen will be changed to emit non-unqualified ownership qualifiers on load,
   store instructions when the EnableSILOwnershipModel flag is set. We will use
   the verifier throwing to guarantee that we are not missing any specific
   cases.

Then once all of the bots are green, we change SILOption.EnableSILOwnershipModel
to be true by default. After a cooling off period, we move all of the code
behind the SILOwnershipModel flag in front of the flag. We do this so we can
reuse that flag for further SILOwnershipModel changes.

## Optimizer Changes

Since the SILOwnershipModel eliminator will eliminate the ownership qualifiers
on load, store instructions right after ownership verification, there will be no
immediate effects on the optimizer and thus the optimizer changes can be done in
parallel with the rest of the ARC optimization work.

But, in the long run, we want to enforce these ownership invariants all
throughout the SIL pipeline implying these ownership qualified `load`, `store`
instructions must be processed by IRGen, not eliminated by the SILOwnershipModel
eliminator. Thus we will need to update passes to handle these new instructions
and also will need to implement the `load_borrow`, `end_borrow` instruction.

The main optimizer changes can be separated into the following areas: memory
forwarding, dead stores, ARC optimization. In all of these cases, the necessary
changes are relatively trivial to respond to. We give a quick taste of two of
them: store->load forwarding and ARC Code Motion.

### store->load forwarding

Currently we perform store->load forwarding as follows:

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load %x_ptr : $C
    use(%y)

      =>

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    use(%x)

In a world, where we are using ownership qualified load, store, we have to also
consider the ownership implications. *NOTE* Since we are not modifying the
store, `store [assign]` and `store [init]` are treated the same. Thus without
any loss of generality, lets consider solely `store`.

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load [copy] %x_ptr : $C
    use(%y)

      =>

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    strong_retain %x
    use(%x)

### ARC Code Motion

If ARC Code Motion wishes to move the ARC semantics of ownership qualified
`load`, `store` instructions, it must now consider read/write effects. On the
other hand, we can perform more aggressive ARC code motion of ownership
qualified loads and stores in the face of deinits. This is because we no longer
need to worry about our code motion causing a deinit to fire in between (without
any loss of generality) the load/retain.

### Normal Code Motion

Normal code motion will lose some effectiveness since many of the load/store
operations that it used to be able to move now must consider ARC information. We
may need to consider running ARC code motion earlier in the pipeline where we
normally run Normal Code Motion to ensure that we are able to handle these
cases.

### ARC Optimization

The main implication for ARC optimization is that instead of eliminating just
retains, releases, it must be able to recognize ownership qualified `load`,
`store` and set their flags as appropriate. Also in general ARC optimization and
memory behavior will need to recognize the `end_borrow` instruction as a code
motion barrier.

### Function Signature Optimization

Semantic ARC affects function signature optimization in the context of the owned
to borrow optimization. Specifically:

1. A `store [assign]` must be recognized as a release of the old value that is
   being overridden. In such a case, we can move the `release` of the old value
   into the caller and change the `store [assign]` into a `store [init]`.
2. A `load [copy]` must be recognized as a retain in the callee. Then function
   signature optimization will transform the `load [copy]` into a
   `load_borrow`. This would require the addition of a new `@borrow` return
   value convention.

# Appendix

## Partial Initialization of Loadable References in SIL

In SIL, a value of non-trivial loadable type is loaded from a memory location as
follows:

    %x = load %x_ptr : $*S
    ...
    retain_value %x_ptr : $S

At first glance, this looks reasonable, but in truth there is a hidden drawback:
the partially initialized zone in between the load and the retain
operation. This zone creates a period of time when an "evil optimizer" could
insert an instruction that causes x to be deallocated before the copy is
finished being initialized. Similar issues come up when trying to perform a
store of a non-trival value into a memory location.

Since this sort of partial initialization is allowed in SIL, the optimizer is
forced to be overly conservative when attempting to move releases passed retains
lest the release triggers a deinit that destroys a value like `%x`. Lets look at
two concrete examples that show how semantically providing ownership qualified
load, store instructions eliminate this problem.

**NOTE** Without any loss of generality, we will speak of values with reference
semantics instead of non-trivial values.

## Case Study: Partial Initialization and load [copy]

### The Problem

Consider the following swift program:

    func opaque_call()

    final class C {
      var int: Int = 0
      deinit {
        opaque_call()
      }
    }

    final class D {
      var int: Int = 0
    }

    var GLOBAL_C : C? = nil
    var GLOBAL_D : D? = nil

    func useC(_ c: C)
    func useD(_ d: D)

    func run() {
        let c = C()
        GLOBAL_C = c
        let d = D()
        GLOBAL_D = d
        useC(c)
        useD(d)
    }

Notice that both `C` and `D` have fixed layouts and separate class hierarchies,
but `C`'s deinit has a call to the function `opaque_call` which may write to
`GLOBAL_D` or `GLOBAL_C`. Additionally assume that both `useC` and `useD` are
known to the compiler to not have any affects on instances of type `D`, `C`
respectively and useC assigns `nil` to `GLOBAL_C`. Now consider the following
valid SIL lowering for `run`:

    sil_global GLOBAL_D : $D
    sil_global GLOBAL_C : $C

    final class C {
      var x: Int
      deinit
    }

    final class D {
      var x: Int
    }

    sil @useC : $@convention(thin) () -> ()
    sil @useD : $@convention(thin) () -> ()

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load %global_c : $*C (3)
      strong_retain %c2 : $C (4)
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Lets optimize this function! First we perform the following operations:

1. Since `(2)` is storing to an identified object that can not be `GLOBAL_C`, we
   can store to load forward `(1)` to `(3)`.
2. Since a retain does not block store to load forwarding, we can forward `(2)`
   to `(5)`. But lets for the sake of argument, assume that the optimizer keeps
   such information as an analysis and does not perform the actual load->store
   forwarding.
3. Even though we do not foward `(2)` to `(5)`, we can still move `(4)` over
   `(6)` so that `(4)` is right before `(7)`.

This yields (using the ' marker to designate that a register has had load-store
forwarding applied to it):

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      strong_retain %c : $C (4')
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Then by assumption, we know that `%useC` does not perform any releases of any
instances of class `D`. Thus `(6)` can be moved past `(7')` and we can then pair
and eliminate `(6)` and `(9)` via the rules of ARC optimization using the
analysis information that `%d2` and `%d` are th same due to the possibility of
performing store->load forwarding. After performing such transformations, `run`
looks as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      %d2 = load %global_d : $*D (5)
      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Now by assumption, we know that `%useD_func` does not touch any instances of
class `C` and `%c` does not contain any ivars of type `D` and is final so none
can be added. At first glance, this seems to suggest that we can move `(10)`
before `(8')` and then pair/eliminate `(4')` and `(10)`. But is this a safe
optimization perform? Absolutely Not! Why? Remember that since `useC_func`
assigns `nil` to `GLOBAL_C`, after `(7')`, `%c` could have a reference count
of 1. Thus `(10)` _may_ invoke the destructor of `C`. Since this destructor
calls an opaque function that _could_ potentially write to `GLOBAL_D`, we may be
be passing `%d2`, an already deallocated object to `%useD_func`, an illegal
optimization!

Lets think a bit more about this example and consider this example at the
language level. Remember that while Swift's deinit are not asychronous, we do
not allow for user level code to create dependencies from the body of the
destructor into the normal control flow that has called it. This means that
there are two valid results of this code:

- Operation Sequence 1: No optimization is performed and `%d2` is passed to
  `%useD_func`.
- Operation Sequence 2: We shorten the lifetime of `%c` before `%useD_func` and
   a different instance of `$D` is passed into `%useD_func`.

The fact that 1 occurs without optimization is just as a result of an
implementation detail of SILGen. 2 is also a valid sequence of operations.

Given that:

1. As a principle, the optimizer does not consider such dependencies to avoid
   being overly conservative.
2. We provide constructs to ensure appropriate lifetimes via the usage of
   constructs such as fix_lifetime.

We need to figure out how to express our optimization such that 2
happens. Remember that one of the optimizations that we performed at the
beginning was to move `(6)` over `(7')`, i.e., transform this:

      %d = alloc_ref $D
      %global_d_addr = global_addr GLOBAL_D : $D
      %d = load %global_d_addr : $*D (5)
      strong_retain %d : $D (6)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

into:

      %global_d_addr = global_addr GLOBAL_D : $D
      %d2 = load %global_d_addr : $*D (5)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      strong_retain %d2 : $D (6)

This transformation in Swift corresponds to transforming:

      let d = GLOBAL_D
      useC(c)

to:

      let d_raw = load_d_value(GLOBAL_D)
      useC(c)
      let d = take_ownership_of_d(d_raw)

This is clearly an instance where we have moved a side-effect in between the
loading of the data and the taking ownership of such data, that is before the
`let` is fully initialized. What if instead of just moving the retain, we moved
the entire let statement? This would then result in the following swift code:

      useC(c)
      let d = GLOBAL_D

and would correspond to the following SIL snippet:

      %global_d_addr = global_addr GLOBAL_D : $D

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      %d2 = load %global_d_addr : $*D (5)
      strong_retain %d2 : $D (6)

Moving the load with the strong_retain to ensure that the full initialization is
performed even after code motion causes our SIL to look as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %d2 = load %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Giving us the exact result that we want: Operation Sequence 2!

### Defining load [copy]

Given that we wish the load, store to be tightly coupled together, it is natural
to express this operation as a `load [copy]` instruction. Lets define the `load
[copy]` instruction as follows:

    %1 = load [copy] %0 : $*C

      =>

    %1 = load %0 : $*C
    retain_value %1 : $C

Now lets transform our initial example to use this instruction:

Notice how now if we move `(7)` over `(3)` and `(6)` now, we get the following SIL:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %d2 = load [copy] %global_d : $*D (5)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

We then perform the previous code motion:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)
      strong_release %d : $D (9)

      %d2 = load [copy] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
      strong_release %c : $C (10)
    }

We then would like to eliminate `(9)` and `(10)` by pairing them with `(3)` and
`(8)`. Can we still do so? One way we could do this is by introducing the
`[take]` flag. The `[take]` flag on a `load [take]` says that one is
semantically loading a value from a memory location and are taking ownership of
the value thus eliding the retain. In terms of SIL this flag is defined as:

    %x = load [take] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C

Why do we care about having such a `load [take]` instruction when we could just
use a `load`? The reason why is that a normal `load` has unsafe unowned
ownership (i.e. it has no implications on ownership). We would like for memory
that has non-trivial type to only be able to be loaded via instructions that
maintain said ownership. We will allow for casting to trivial types as usual to
provide such access if it is required.

Thus we have achieved the desired result:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [take] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %d2 = load [take] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
    }

----

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Attached below is an updated version of the proposal. Again a rendered version is located at:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

Michael

----

# Summary

This document proposes:

1. adding the following ownership qualifiers to `load`: `[take]`, `[copy]`,
   `[borrow]`, `[trivial]`.
2. adding the following ownership qualifiers to `store`: `[init]`, `[assign]`,
   `[trivial]`.
3. requiring all `load` and `store` operations to have ownership qualifiers.
4. banning the use of `load [trivial]`, `store [trivial]` on memory locations of
   `non-trivial` type.

This will allow for:

1. eliminating optimizer miscompiles that occur due to releases being moved into
   the region in between a `load`/`retain`, `load`/`release`,
   `store`/`release`. (For a specific example, see the appendix).
2. explicitly modeling `load [trivial]`/`store [trivial]` as having `unsafe
   unowned` ownership semantics. This will be enforced via the verifier.
3. more aggressive ARC code motion.

# Definitions

## ownership qualified load

We propose four different ownership qualifiers for load. Define `load [trivial]`
as:

    %x = load [trivial] %x_ptr : $*Int

      =>

    %x = load %x_ptr : $*Int

A `load [trivial]` can not be used to load values of non-trivial type.

Should we mandate the reverse as well, that e.g. load [copy] cannot be used to
load values of trivial type? That's a little more work for substituting cloners, but it
keeps everything nice and canonical.

No. I think that in the trivial case, load [copy] optimizes to load [trivial] as a canonicalization. This is just like applying a retain_value to a trivial type.

I guess I'm fine with that.

Define
`load [copy]` as:

    %x = load [copy] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C
    retain_value %x : $C

Then define `load [take]` as:

    %x = load [take] %x_ptr : $*Builtin.NativeObject

      =>

    %x = load %x_ptr : $*Builtin.NativeObject

**NOTE** `load [take]` implies that the loaded from memory location no longer
owns the result object (i.e. a take is a move). Loading from the memory location
again without reinitialization is illegal.

Next we provide `load [borrow]`:

    %x = load [borrow] %x_ptr : $*Builtin.NativeObject
    ...
    endBorrow(%x, %x_ptr)

      =>

    %x = load %x_ptr : $*Builtin.NativeObject
    ...
    endBorrow(%x, %x_ptr)

`load [borrow]` implies that in the region between the `load` and the
`endBorrow`, the loaded object must semantically remain alive.

For consistency with other multi-word SIL instructions, this should be end_borrow.

Sure.

I wonder whether it might make more sense for load [borrow] to be a different instruction.
There's a couple reasons for that first. The first is that it's the only load which introduces
a scope, which is a really big difference structurally. The second is that it's the only load
which returns a non-owned value, which will be a typing difference when we record
ownership in the type system.

I am fine with a load_borrow. If this is the only change left that you want can I just send out a proposal with that small change and start implementing. I am nervous about perfection being the enemy of the good (and I want to start implementing this weekend if possible *evil smile*).

Yes, it's fine to introduce this incrementally.

We can discuss the formation rules on scopes concurrently. I think the same theoretical
foundation is probably what we'll use for pseudo-linearity, memory-initialization soundness,
and so on.

John.

···

On Oct 7, 2016, at 6:04 PM, Michael Gottesman <mgottesman@apple.com> wrote:

On Oct 7, 2016, at 5:09 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Oct 7, 2016, at 2:38 PM, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

Anyway, I really like that load [borrow] is scoped.. Are you planning to include the formation
restrictions on scopes instructions immediately, or will you do that in a later proposal?

It will be done in a later proposal. We are just trying to set the stage for verification.

The requirements we need from scopes are:
  - there has to be a well-defined set of IPs that lie within the scope and
  - there can't be a non-explicit transition into or out of the scope.

One way to get the first condition is to say that there has to be a unique scope-ender that
post-dominates the scope-beginner, but that's a pretty hard restriction for SILGen to satisfy
(as well as the optimizer, I imagine), and it doesn't handle "unreachable" or infinite loops.
We need to allow multiple scope-enders, and we need to allow scope-enders to be missing
in some cases.

I agree with you here. We definitely want to be able to support multiple scope-enders.

Here's the right formalism, I think:

For all walks W beginning from the entry point of the function:
  For each node B in the CFG which is a scope-beginner:
    Let E be the set of scope-enders for B, and consider just the sub-sequence S of nodes
    of W where the node is a member of {B} U E. Then the elements of S at even
    positions (starting from 0) must be B, and the elements at odd positions must be
    members of E. Furthermore, if the walk ends in a return or throw instruction, then
    S must have even length.

Note that for this to be true, all the scope-enders must be dominated by the scope-beginner.

It is sufficient to just consider the set of "beeline" paths, i.e. acyclic paths ending in either a true
terminator (a return, throw, or unreachable) or an edge back to a node already in the path.
No such path may include multiple scope-enders for the same scope-beginner. If the path ends
in a return or throw, it must include a matching scope-ender after every scope-beginner. If
it ends in a loop back, then for every scope-beginner in the path, if the path contains a scope-ender
then the loop destination must either precede the scope-beginner or follow the scope-ender;
otherwise, the loop destination must follow the scope-beginner.

Or, as a decision algorithm in Swift for a single scope-beginner:

  var blockEntryIsInScope = [Block: Bool]()
  func scan(startingFrom inst: Instruction, isInScope: Bool) {
    if inst is ReturnInst || inst is ThrowInst {
      guard !isInScope else { invalid("ended function while in scope") }
      return
    }

    if let term = inst as? TerminatorInst {
      for successor in term.successors {
        guard begin.dominates(successor) else {
          guard !isInScope else { invalid("branch out of scope while in scope") }
          continue
        }
        if let cachedValue = blockEntryIsInScope[successor] {
          if cachedValue != isInScope {
            invalid(isInScope ? "branch out of scope while in scope" : "branch into scope after exiting scope")
          }
        } else {
          blockEntryIsInScope[successor] = isInScope
          scan(startingFrom: successor.begin, isInScope: isInScope)
        }
      }
      return
    }

    if inst.endsScopeOf(begin) {
      guard isInScope else { invalid("ending scope that was already ended") }
      scan(startingFrom: inst.next, isInScope: false)
    } else {
      scan(startingFrom: inst.next, isInScope: isInScope)
    }
  }
  scan(startingFrom: begin, isInScope: true)

Since this is tangential to the current proposal, can we introduce a side thread?

John.

The `endBorrow` communicates to the optimizer:

1. That the value in `%x_ptr` should not be destroyed before endBorrow.
2. Uses of `%x` should not be sunk past endBorrow since `%x` is only a shallow
   copy of the value in `%x_ptr` and past that point `%x_ptr` may not remain
   alive.

An example of where this construct is useful is when one has a let binding to a
class instance `c` that contains a let field `f`. In that case `c`'s lifetime
guarantees `f`'s lifetime meaning that returning `f` over the function call
boundary is safe.

## ownership qualified store

First define a `store [trivial]` as:

    store %x to [trivial] %x_ptr : $*Int

      =>

    store %x to %x_ptr : $*Int

The verifier will prevent this instruction from being used on types with
non-trivial ownership. Define a `store [assign]` as follows:

    store %x to [assign] %x_ptr : $*C

       =>

    %old_x = load %x_ptr : $*C
    store %new_x to %x_ptr : $*C
    release_value %old_x : $C

*NOTE* `store` is defined as a consuming operation. We also provide
`store [init]` in the case where we know statically that there is no
previous value in the memory location:

    store %x to [init] %x_ptr : $*C

       =>

    store %new_x to %x_ptr : $*C

# Implementation

## Goals

Our implementation strategy goals are:

1. zero impact on other compiler developers until the feature is fully
   developed. This implies all work will be done behind a flag.
2. separation of feature implementation from updating passes.

Goal 2 will be implemented via a pass that transforms ownership qualified
`load`/`store` instructions into unqualified `load`/`store` right after SILGen.

## Plan

We begin by adding initial infrastructure for our development. This means:

1. Adding to SILOptions a disabled by default flag called
"EnableSILOwnershipModel". This flag will be set by a false by default frontend
option called "-enable-sil-ownership-mode".

2. Bots will be brought up to test the compiler with
   "-enable-sil-ownership-model" set to true. The specific bots are:

   * RA-OSX+simulators
   * RA-Device
   * RA-Linux.

   The bots will run once a day until the feature is close to completion. Then a
   polling model will be followed.

Now that change isolation is borrow, we develop building blocks for the
optimization:

1. Two enums will be defined: `LoadInstOwnershipQualifier`,
   `StoreInstOwnershipQualifier`. The exact definition of these enums are as
   follows:

       enum class LoadOwnershipQualifier {
         Unqualified, Take, Copy, Borrow, Trivial
       };
       enum class StoreOwnershipQualifier {
         Unqualified, Init, Assign, Trivial
       };

    *NOTE* `LoadOwnershipQualifier::Unqualified` and
    `StoreOwnershipQualifier::Unqualified` are only needed for staging purposes.

2. Creating a `LoadInst`, `StoreInst` will be changed to require an ownership
qualifier. At this stage, this argument will default to `Unqualified`. "Bare"
`load`, `store` when parsed via textual SIL will be considered to be
unqualified. This implies that the rest of the compiler will not have to be
changed as a result of this step.

3. Support will be added to SIL, IRGen, Serialization, SILPrinting, and SIL
Parsing for the rest of the qualifiers. SILGen will not be modified at this
stage.

4. A pass called the "OwnershipModelEliminator" will be implemented. It will
   blow up all `load`, `store` instructions with non `*::Unqualified` ownership
   into their constituant ARC operations and `*::Unqualified` `load`, `store`
   insts.

3. An option called "EnforceSILOwnershipMode" will be added to the verifier. If
the option is set, the verifier will assert if:

   a. `load`, `store` operations with trivial ownership are applied to memory
      locations with non-trivial type.

   b. `load`, `store` operation with unqualified ownership type are present in
   the IR.

Finally, we wire up the building blocks:

1. If SILOption.EnableSILOwnershipModel is true, then the after SILGen SIL
   verification will be performed with EnforceSILOwnershipModel set to true.
2. If SILOption.EnableSILOwnershipModel is true, then the pass manager will run
   the OwnershipModelEliminator pass right after SILGen before the normal pass
   pipeline starts.
3. SILGen will be changed to emit non-unqualified ownership qualifiers on load,
   store instructions when the EnableSILOwnershipModel flag is set. We will use
   the verifier throwing to guarantee that we are not missing any specific
   cases.

Then once all of the bots are green, we change SILOption.EnableSILOwnershipModel
to be true by default. After a cooling off period, we move all of the code
behind the SILOwnershipModel flag in front of the flag. We do this so we can
reuse that flag for further SILOwnershipModel changes.

## Optimizer Changes

Since the SILOwnershipModel eliminator will eliminate the ownership qualifiers
on load, store instructions right after ownership verification, there will be no
immediate affects on the optimizer and thus the optimizer changes can be done in
parallel with the rest of the ARC optimization work.

But, in the long run, we want to enforce these ownership invariants all
throughout the SIL pipeline implying these ownership qualified `load`, `store`
instructions must be processed by IRGen, not eliminated by the SILOwnershipModel
eliminator. Thus we will need to update passes to handle these new
instructions. The main optimizer changes can be separated into the following
areas: memory forwarding, dead stores, ARC optimization. In all of these cases,
the necessary changes are relatively trivial to respond to. We give a quick
taste of two of them: store->load forwarding and ARC Code Motion.

### store->load forwarding

Currently we perform store->load forwarding as follows:

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load %x_ptr : $C
    use(%y)

      =>

    store %x to %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    use(%x)

In a world, where we are using ownership qualified load, store, we have to also
consider the ownership implications. *NOTE* Since we are not modifying the
store, `store [assign]` and `store [init]` are treated the same. Thus without
any loss of generality, lets consider solely `store`.

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    %y = load [copy] %x_ptr : $C
    use(%y)

      =>

    store %x to [assign] %x_ptr : $C
    ... NO SIDE EFFECTS THAT TOUCH X_PTR ...
    strong_retain %x
    use(%x)

### ARC Code Motion

If ARC Code Motion wishes to move the ARC semantics of ownership qualified
`load`, `store` instructions, it must now consider read/write effects. On the
other hand, it will be able to now not consider the side-effects of destructors
when moving retain/release operations.

### Normal Code Motion

Normal code motion will lose some effectiveness since many of the load/store
operations that it used to be able to move now must consider ARC information. We
may need to consider running ARC code motion earlier in the pipeline where we
normally run Normal Code Motion to ensure that we are able to handle these
cases.

### ARC Optimization

The main implication for ARC optimization is that instead of eliminating just
retains, releases, it must be able to recognize ownership qualified `load`,
`store` and set their flags as appropriate.

### Function Signature Optimization

Semantic ARC affects function signature optimization in the context of the owned
to borrow optimization. Specifically:

1. A `store [assign]` must be recognized as a release of the old value that is
   being overridden. In such a case, we can move the `release` of the old value
   into the caller and change the `store [assign]` into a `store [init]`.
2. A `load [copy]` must be recognized as a retain in the callee. Then function
   signature optimization will transform the `load [copy]` into a `load
   [borrow]`. This would require the addition of a new `@borrow` return
   value convention.

# Appendix

## Partial Initialization of Loadable References in SIL

In SIL, a value of non-trivial loadable type is loaded from a memory location as
follows:

    %x = load %x_ptr : $*S
    ...
    retain_value %x_ptr : $S

At first glance, this looks reasonable, but in truth there is a hidden drawback:
the partially initialized zone in between the load and the retain
operation. This zone creates a period of time when an "evil optimizer" could
insert an instruction that causes x to be deallocated before the copy is
finished being initialized. Similar issues come up when trying to perform a
store of a non-trival value into a memory location.

Since this sort of partial initialization is allowed in SIL, the optimizer is
forced to be overly conservative when attempting to move releases passed retains
lest the release triggers a deinit that destroys a value like `%x`. Lets look at
two concrete examples that show how semantically providing ownership qualified
load, store instructions eliminate this problem.

**NOTE** Without any loss of generality, we will speak of values with reference
semantics instead of non-trivial values.

## Case Study: Partial Initialization and load [copy]

### The Problem

Consider the following swift program:

    func opaque_call()

    final class C {
      var int: Int = 0
      deinit {
        opaque_call()
      }
    }

    final class D {
      var int: Int = 0
    }

    var GLOBAL_C : C? = nil
    var GLOBAL_D : D? = nil

    func useC(_ c: C)
    func useD(_ d: D)

    func run() {
        let c = C()
        GLOBAL_C = c
        let d = D()
        GLOBAL_D = d
        useC(c)
        useD(d)
    }

Notice that both `C` and `D` have fixed layouts and separate class hierarchies,
but `C`'s deinit has a call to the function `opaque_call` which may write to
`GLOBAL_D` or `GLOBAL_C`. Additionally assume that both `useC` and `useD` are
known to the compiler to not have any affects on instances of type `D`, `C`
respectively and useC assigns `nil` to `GLOBAL_C`. Now consider the following
valid SIL lowering for `run`:

    sil_global GLOBAL_D : $D
    sil_global GLOBAL_C : $C

    final class C {
      var x: Int
      deinit
    }

    final class D {
      var x: Int
    }

    sil @useC : $@convention(thin) () -> ()
    sil @useD : $@convention(thin) () -> ()

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load %global_c : $*C (3)
      strong_retain %c2 : $C (4)
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Lets optimize this function! First we perform the following operations:

1. Since `(2)` is storing to an identified object that can not be `GLOBAL_C`, we
   can store to load forward `(1)` to `(3)`.
2. Since a retain does not block store to load forwarding, we can forward `(2)`
   to `(5)`. But lets for the sake of argument, assume that the optimizer keeps
   such information as an analysis and does not perform the actual load->store
   forwarding.
3. Even though we do not foward `(2)` to `(5)`, we can still move `(4)` over
   `(6)` so that `(4)` is right before `(7)`.

This yields (using the ' marker to designate that a register has had load-store
forwarding applied to it):

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      strong_retain %c : $C (4')
      %d2 = load %global_d : $*D (5)
      strong_retain %d2 : $D (6)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

Then by assumption, we know that `%useC` does not perform any releases of any
instances of class `D`. Thus `(6)` can be moved past `(7')` and we can then pair
and eliminate `(6)` and `(9)` via the rules of ARC optimization using the
analysis information that `%d2` and `%d` are th same due to the possibility of
performing store->load forwarding. After performing such transformations, `run`
looks as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      %d2 = load %global_d : $*D (5)
      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Now by assumption, we know that `%useD_func` does not touch any instances of
class `C` and `%c` does not contain any ivars of type `D` and is final so none
can be added. At first glance, this seems to suggest that we can move `(10)`
before `(8')` and then pair/eliminate `(4')` and `(10)`. But is this a safe
optimization perform? Absolutely Not! Why? Remember that since `useC_func`
assigns `nil` to `GLOBAL_C`, after `(7')`, `%c` could have a reference count
of 1. Thus `(10)` _may_ invoke the destructor of `C`. Since this destructor
calls an opaque function that _could_ potentially write to `GLOBAL_D`, we may be
be passing `%d2`, an already deallocated object to `%useD_func`, an illegal
optimization!

Lets think a bit more about this example and consider this example at the
language level. Remember that while Swift's deinit are not asychronous, we do
not allow for user level code to create dependencies from the body of the
destructor into the normal control flow that has called it. This means that
there are two valid results of this code:

- Operation Sequence 1: No optimization is performed and `%d2` is passed to
  `%useD_func`.
- Operation Sequence 2: We shorten the lifetime of `%c` before `%useD_func` and
   a different instance of `$D` is passed into `%useD_func`.

The fact that 1 occurs without optimization is just as a result of an
implementation detail of SILGen. 2 is also a valid sequence of operations.

Given that:

1. As a principle, the optimizer does not consider such dependencies to avoid
   being overly conservative.
2. We provide constructs to ensure appropriate lifetimes via the usage of
   constructs such as fix_lifetime.

We need to figure out how to express our optimization such that 2
happens. Remember that one of the optimizations that we performed at the
beginning was to move `(6)` over `(7')`, i.e., transform this:

      %d = alloc_ref $D
      %global_d_addr = global_addr GLOBAL_D : $D
      %d = load %global_d_addr : $*D (5)
      strong_retain %d : $D (6)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

into:

      %global_d_addr = global_addr GLOBAL_D : $D
      %d2 = load %global_d_addr : $*D (5)

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      strong_retain %d2 : $D (6)

This transformation in Swift corresponds to transforming:

      let d = GLOBAL_D
      useC(c)

to:

      let d_raw = load_d_value(GLOBAL_D)
      useC(c)
      let d = take_ownership_of_d(d_raw)

This is clearly an instance where we have moved a side-effect in between the
loading of the data and the taking ownership of such data, that is before the
`let` is fully initialized. What if instead of just moving the retain, we moved
the entire let statement? This would then result in the following swift code:

      useC(c)
      let d = GLOBAL_D

and would correspond to the following SIL snippet:

      %global_d_addr = global_addr GLOBAL_D : $D

      // Call the user functions passing in the instances that we loaded from the globals.
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')
      %d2 = load %global_d_addr : $*D (5)
      strong_retain %d2 : $D (6)

Moving the load with the strong_retain to ensure that the full initialization is
performed even after code motion causes our SIL to look as follows:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D

      strong_retain %c : $C (4')
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c) : $@convention(thin) (@owned C) -> () (7')

      %d2 = load %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %c : $C (10)
    }

Giving us the exact result that we want: Operation Sequence 2!

### Defining load [copy]

Given that we wish the load, store to be tightly coupled together, it is natural
to express this operation as a `load [copy]` instruction. Lets define the `load
[copy]` instruction as follows:

    %1 = load [copy] %0 : $*C

      =>

    %1 = load %0 : $*C
    retain_value %1 : $C

Now lets transform our initial example to use this instruction:

Notice how now if we move `(7)` over `(3)` and `(6)` now, we get the following SIL:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %d2 = load [copy] %global_d : $*D (5)

      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)

      strong_release %d : $D (9)
      strong_release %c : $C (10)
    }

We then perform the previous code motion:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [copy] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)
      strong_release %d : $D (9)

      %d2 = load [copy] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
      strong_release %c : $C (10)
    }

We then would like to eliminate `(9)` and `(10)` by pairing them with `(3)` and
`(8)`. Can we still do so? One way we could do this is by introducing the
`[take]` flag. The `[take]` flag on a `load [take]` says that one is
semantically loading a value from a memory location and are taking ownership of
the value thus eliding the retain. In terms of SIL this flag is defined as:

    %x = load [take] %x_ptr : $*C

      =>

    %x = load %x_ptr : $*C

Why do we care about having such a `load [take]` instruction when we could just
use a `load`? The reason why is that a normal `load` has unsafe unowned
ownership (i.e. it has no implications on ownership). We would like for memory
that has non-trivial type to only be able to be loaded via instructions that
maintain said ownership. We will allow for casting to trivial types as usual to
provide such access if it is required.

Thus we have achieved the desired result:

    sil @run : $@convention(thin) () -> () {
    bb0:
      %c = alloc_ref $C
      %global_c = global_addr @GLOBAL_C : $*C
      strong_retain %c : $C
      store %c to %global_c : $*C (1)

      %d = alloc_ref $D
      %global_d = global_addr @GLOBAL_D : $*D
      strong_retain %d : $D
      store %d to %global_d : $*D (2)

      %c2 = load [take] %global_c : $*C (3)
      %useC_func = function_ref @useC : $@convention(thin) (@owned C) -> ()
      apply %useC_func(%c2) : $@convention(thin) (@owned C) -> () (7)

      %d2 = load [take] %global_d : $*D (5)
      %useD_func = function_ref @useD : $@convention(thin) (@owned D) -> ()
      apply %useD_func(%d2) : $@convention(thin) (@owned D) -> () (8)
    }

On Oct 6, 2016, at 3:03 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Oct 5, 2016, at 4:48 PM, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

On Oct 5, 2016, at 4:40 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

On Oct 4, 2016, at 1:04 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Sep 30, 2016, at 11:54 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

The document attached below contains the first "Semantic ARC" mini proposal: the High Level ARC Memory Operations Proposal.

An html rendered version of this markdown document is available at the following URL:

https://gottesmm.github.io/proposals/high-level-arc-memory-operations.html

----

# Summary

This document proposes:

1. adding the `load_strong`, `store_strong` instructions to SIL. These can only
   be used with memory locations of `non-trivial` type.

I would really like to avoid using the word "strong" here. Under the current proposal, these instructions will be usable with arbitrary non-trivial types, not just primitive class references. Even if you think of an aggregate that happens to contain one or more strong references as some sort of aggregate strong reference (which is questionable but not completely absurd), we already have loadable non-strong class references that this operation would be usable with, like native unowned references. "load_strong %0 : $*@sil_unowned T" as an operation yielding a scalar "@sil_unowned T" is ridiculous, and it will only get more ridiculous when we eventually allow this operation to work with types that are currently address-only, like weak references.

Brainstorming:

Something like load_copy and store_copy would be a bit unfortunate, since store_copy doesn't actually copy the source operand and we want to have a load_copy [take].

load_value and store_value seem excessively generic. It's not like non-trivial types aren't values.

One question that comes to mind: do we actually need new instructions here other than for staging purposes? We don't actually need new instructions for pseudo-linear SIL to work; we just need to say that we only enforce pseudo-linearity for non-trivial types.

If we just want the instruction to be explicit about ownership so that we can easily distinguish these cases, we can make the rule always explicit, e.g.:
  load [take] %0 : $*MyClass
  load [copy] %0 : $*MyClass
  load [trivial] %0 : $*Int

  store %0 to [initialization] %1 : $*MyClass
  store %0 to [assignment] %1 : $*MyClass
  store %0 to [trivial] %1 : $*Int

John.

The reason why I originally suggested to go the load_strong route is that we already have load_weak, load_unowned instructions. If I could add a load_strong instruction, then it would make sense to assign an engineer to do a pass over all 3 of these instructions and combine them into 1 load instruction. That is, first transform into a form amenable for canonicalization and then canonicalize all at once.

As you pointed out, both load_unowned and load_weak involve representation changes in type (for instance the change of weak pointers to Optional<T>). Such a change would be against the "spirit" of a load instruction to perform such representation changes versus ownership changes.

In terms of the properties that we actually want here, what is important is that we can verify that no non-trivially typed values are loaded in an unsafe unowned manner. That can be done also with ownership flags on load/store.

Does this sound reasonable:

1. We introduce two enums that define memory ownership changes, one for load and one for store. Both of these enums will contain a [trivial] ownership.
2. We enforce in the verifier that non-trivial types must have a non-trivial ownership modifier on any memory operations that they are involved in.

Sorry for not being explicit. I will not add new instructions, just modifiers. Assuming that this is agreeable to you, I am going to prepare a quick additional version of the proposal document.

That sounds great, thanks.

John.

There’s a lot in the proposal that makes sense to discuss for completeness but isn’t motivated by a particular need. Please separate functionality. We only need load [copy] at first right? When do those need to be promoted to load_borrow? load [trivial] is an optimization, so that should follow a functionally complete implementation. load [take] should definitely not exist until there’s some motivation.

-Andy

···

On Oct 7, 2016, at 6:04 PM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

I wonder whether it might make more sense for load [borrow] to be a different instruction.
There's a couple reasons for that first. The first is that it's the only load which introduces
a scope, which is a really big difference structurally. The second is that it's the only load
which returns a non-owned value, which will be a typing difference when we record
ownership in the type system.

I am fine with a load_borrow. If this is the only change left that you want can I just send out a proposal with that small change and start implementing. I am nervous about perfection being the enemy of the good (and I want to start implementing this weekend if possible *evil smile*).

>If ARC Code Motion wishes to move the ARC semantics of ownership qualified load, store instructions, it must now consider read/write effects. On the other hand, it will be able to now not consider the side-effects of destructors when moving retain/release operations.

Can you elaborate on what you mean by this? Does this mean the ARC optimizer has additional freedom somehow to ignore side effects in deinits? What grants that ability?

Sorry for not being clear.

deinits in ARC are not sequenced with respect to memory operations in normal control flow, but if ARC performs any code motion, we must ensure that memory safety is respected. Such semantics are not new.

The main change here is that previously there were situations where we would split up the load/retain in a load [copy] operation. Then if the right things occured, we could cause the object to be deallocated before we use the value or took full ownership of the value.

Ah, I see what you mean. Got it.

In this case, we would be passing off to useD a deallocated instance of class D1 which would be undefined behavior.

So as long as for these load [copy] operations, we move the load/retain fused together, we can guarantee that an object is produced instantaneously in a safe way without any worry.

Was this helpful?

Yep, totally. The new approach is much more clear, thanks Michael!

-Chris

···

On Oct 11, 2016, at 10:10 AM, Michael Gottesman <mgottesman@apple.com> wrote:

I wonder whether it might make more sense for load [borrow] to be a different instruction.
There's a couple reasons for that first. The first is that it's the only load which introduces
a scope, which is a really big difference structurally. The second is that it's the only load
which returns a non-owned value, which will be a typing difference when we record
ownership in the type system.

I am fine with a load_borrow. If this is the only change left that you want can I just send out a proposal with that small change and start implementing. I am nervous about perfection being the enemy of the good (and I want to start implementing this weekend if possible *evil smile*).

There’s a lot in the proposal that makes sense to discuss for completeness but isn’t motivated by a particular need. Please separate functionality. We only need load [copy] at first right? When do those need to be promoted to load_borrow?

These are needed for the ARC optimizer to eliminate retain, release operations, i.e. a:

%0 = load [copy] %x_ptr

destroy_value %1

=>

%0 = load [borrow] %x_ptr

borrow_end(%0, %x_ptr)

These constructs will be needed by engineers to update passes like ARC. By implementing such modifiers now, we can begin to implement support in the various passes for these new instructions via sil-opt/etc in parallel to other semantic ARC work.

load [trivial] is an optimization, so that should follow a functionally complete implementation.

Yes you are correct that given that we are exploding the load [copy] in the eliminator, the trivial load is not *strictly* needed. But as soon as we start upgrading passes, we are going to want this. Again assuming that parallel work can be done, it makes sense to set the stage for optimizer work that will occur in parallel to further semantic ARC work.

load [take] should definitely not exist until there’s some motivation.

If you look at the frontend, there are places where the frontend wants to emit a take. Unless we are willing to use unqualified loads for those cases (which we can not if we are trying to prove that no unqualified loads are emitted by the frontend), then we must have a load [take].

Did I provide the motivation that you requested?

Michael

···

On Oct 7, 2016, at 9:25 PM, Andrew Trick <atrick@apple.com> wrote:

On Oct 7, 2016, at 6:04 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

-Andy

Yes. My general request is for each commit to be easy to review and the functionality obvious to test. I’m convinced we’ll eventually want the variants. Although I still want to understand better when we need to [take] values out of memory.

I also want to prove that my understanding of the model is accurate by seeing everything work with load [copy].

-Andy

···

On Oct 7, 2016, at 10:08 PM, Michael Gottesman <mgottesman@apple.com> wrote:

On Oct 7, 2016, at 9:25 PM, Andrew Trick <atrick@apple.com <mailto:atrick@apple.com>> wrote:

On Oct 7, 2016, at 6:04 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

I wonder whether it might make more sense for load [borrow] to be a different instruction.
There's a couple reasons for that first. The first is that it's the only load which introduces
a scope, which is a really big difference structurally. The second is that it's the only load
which returns a non-owned value, which will be a typing difference when we record
ownership in the type system.

I am fine with a load_borrow. If this is the only change left that you want can I just send out a proposal with that small change and start implementing. I am nervous about perfection being the enemy of the good (and I want to start implementing this weekend if possible *evil smile*).

There’s a lot in the proposal that makes sense to discuss for completeness but isn’t motivated by a particular need. Please separate functionality. We only need load [copy] at first right? When do those need to be promoted to load_borrow?

These are needed for the ARC optimizer to eliminate retain, release operations, i.e. a:

%0 = load [copy] %x_ptr

destroy_value %1

=>

%0 = load [borrow] %x_ptr

borrow_end(%0, %x_ptr)

These constructs will be needed by engineers to update passes like ARC. By implementing such modifiers now, we can begin to implement support in the various passes for these new instructions via sil-opt/etc in parallel to other semantic ARC work.

load [trivial] is an optimization, so that should follow a functionally complete implementation.

Yes you are correct that given that we are exploding the load [copy] in the eliminator, the trivial load is not *strictly* needed. But as soon as we start upgrading passes, we are going to want this. Again assuming that parallel work can be done, it makes sense to set the stage for optimizer work that will occur in parallel to further semantic ARC work.

load [take] should definitely not exist until there’s some motivation.

If you look at the frontend, there are places where the frontend wants to emit a take. Unless we are willing to use unqualified loads for those cases (which we can not if we are trying to prove that no unqualified loads are emitted by the frontend), then we must have a load [take].

Did I provide the motivation that you requested?

I wonder whether it might make more sense for load [borrow] to be a different instruction.
There's a couple reasons for that first. The first is that it's the only load which introduces
a scope, which is a really big difference structurally. The second is that it's the only load
which returns a non-owned value, which will be a typing difference when we record
ownership in the type system.

I am fine with a load_borrow. If this is the only change left that you want can I just send out a proposal with that small change and start implementing. I am nervous about perfection being the enemy of the good (and I want to start implementing this weekend if possible *evil smile*).

There’s a lot in the proposal that makes sense to discuss for completeness but isn’t motivated by a particular need. Please separate functionality. We only need load [copy] at first right? When do those need to be promoted to load_borrow?

These are needed for the ARC optimizer to eliminate retain, release operations, i.e. a:

%0 = load [copy] %x_ptr

destroy_value %1

=>

%0 = load [borrow] %x_ptr

borrow_end(%0, %x_ptr)

These constructs will be needed by engineers to update passes like ARC. By implementing such modifiers now, we can begin to implement support in the various passes for these new instructions via sil-opt/etc in parallel to other semantic ARC work.

load [trivial] is an optimization, so that should follow a functionally complete implementation.

Yes you are correct that given that we are exploding the load [copy] in the eliminator, the trivial load is not *strictly* needed. But as soon as we start upgrading passes, we are going to want this. Again assuming that parallel work can be done, it makes sense to set the stage for optimizer work that will occur in parallel to further semantic ARC work.

load [take] should definitely not exist until there’s some motivation.

If you look at the frontend, there are places where the frontend wants to emit a take. Unless we are willing to use unqualified loads for those cases (which we can not if we are trying to prove that no unqualified loads are emitted by the frontend), then we must have a load [take].

Did I provide the motivation that you requested?

Yes. My general request is for each commit to be easy to review and the functionality obvious to test. I’m convinced we’ll eventually want the variants. Although I still want to understand better when we need to [take] values out of memory.

Just as a quick example, the API for emitLoad in SILGenFunction:

  ManagedValue emitLoad(SILLocation loc, SILValue addr,
                        const TypeLowering &rvalueTL,
                        SGFContext C, IsTake_t isTake,
                        bool isGuaranteedValid = false);

Notice the IsTake_t parameter. I see that code path used in several locations in SILGenFunction.

I also want to prove that my understanding of the model is accurate by seeing everything work with load [copy].

I am fine doing everything initially with load [copy] (and when SILGen requires load [take]). The other things can wait until we need them. I just don't want to have to do another proposal at that point ; ).

···

On Oct 7, 2016, at 10:26 PM, Andrew Trick <atrick@apple.com> wrote:

On Oct 7, 2016, at 10:08 PM, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

On Oct 7, 2016, at 9:25 PM, Andrew Trick <atrick@apple.com <mailto:atrick@apple.com>> wrote:

On Oct 7, 2016, at 6:04 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

-Andy

Well, you might as well *implement* load [trivial] now. It's literally going to be, like, five lines in various switch statements plus an obvious case in the verifier. You can incrementally move things over to actually start *using* load [trivial] whenever you like.

John.

···

On Oct 7, 2016, at 10:36 PM, Michael Gottesman <mgottesman@apple.com> wrote:

On Oct 7, 2016, at 10:26 PM, Andrew Trick <atrick@apple.com <mailto:atrick@apple.com>> wrote:

On Oct 7, 2016, at 10:08 PM, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

On Oct 7, 2016, at 9:25 PM, Andrew Trick <atrick@apple.com <mailto:atrick@apple.com>> wrote:

On Oct 7, 2016, at 6:04 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

I wonder whether it might make more sense for load [borrow] to be a different instruction.
There's a couple reasons for that first. The first is that it's the only load which introduces
a scope, which is a really big difference structurally. The second is that it's the only load
which returns a non-owned value, which will be a typing difference when we record
ownership in the type system.

I am fine with a load_borrow. If this is the only change left that you want can I just send out a proposal with that small change and start implementing. I am nervous about perfection being the enemy of the good (and I want to start implementing this weekend if possible *evil smile*).

There’s a lot in the proposal that makes sense to discuss for completeness but isn’t motivated by a particular need. Please separate functionality. We only need load [copy] at first right? When do those need to be promoted to load_borrow?

These are needed for the ARC optimizer to eliminate retain, release operations, i.e. a:

%0 = load [copy] %x_ptr

destroy_value %1

=>

%0 = load [borrow] %x_ptr

borrow_end(%0, %x_ptr)

These constructs will be needed by engineers to update passes like ARC. By implementing such modifiers now, we can begin to implement support in the various passes for these new instructions via sil-opt/etc in parallel to other semantic ARC work.

load [trivial] is an optimization, so that should follow a functionally complete implementation.

Yes you are correct that given that we are exploding the load [copy] in the eliminator, the trivial load is not *strictly* needed. But as soon as we start upgrading passes, we are going to want this. Again assuming that parallel work can be done, it makes sense to set the stage for optimizer work that will occur in parallel to further semantic ARC work.

load [take] should definitely not exist until there’s some motivation.

If you look at the frontend, there are places where the frontend wants to emit a take. Unless we are willing to use unqualified loads for those cases (which we can not if we are trying to prove that no unqualified loads are emitted by the frontend), then we must have a load [take].

Did I provide the motivation that you requested?

Yes. My general request is for each commit to be easy to review and the functionality obvious to test. I’m convinced we’ll eventually want the variants. Although I still want to understand better when we need to [take] values out of memory.

Just as a quick example, the API for emitLoad in SILGenFunction:

  ManagedValue emitLoad(SILLocation loc, SILValue addr,
                        const TypeLowering &rvalueTL,
                        SGFContext C, IsTake_t isTake,
                        bool isGuaranteedValid = false);

Notice the IsTake_t parameter. I see that code path used in several locations in SILGenFunction.

I also want to prove that my understanding of the model is accurate by seeing everything work with load [copy].

I am fine doing everything initially with load [copy] (and when SILGen requires load [take]). The other things can wait until we need them. I just don't want to have to do another proposal at that point ; ).

I wonder whether it might make more sense for load [borrow] to be a different instruction.
There's a couple reasons for that first. The first is that it's the only load which introduces
a scope, which is a really big difference structurally. The second is that it's the only load
which returns a non-owned value, which will be a typing difference when we record
ownership in the type system.

I am fine with a load_borrow. If this is the only change left that you want can I just send out a proposal with that small change and start implementing. I am nervous about perfection being the enemy of the good (and I want to start implementing this weekend if possible *evil smile*).

There’s a lot in the proposal that makes sense to discuss for completeness but isn’t motivated by a particular need. Please separate functionality. We only need load [copy] at first right? When do those need to be promoted to load_borrow?

These are needed for the ARC optimizer to eliminate retain, release operations, i.e. a:

%0 = load [copy] %x_ptr

destroy_value %1

=>

%0 = load [borrow] %x_ptr

borrow_end(%0, %x_ptr)

These constructs will be needed by engineers to update passes like ARC. By implementing such modifiers now, we can begin to implement support in the various passes for these new instructions via sil-opt/etc in parallel to other semantic ARC work.

load [trivial] is an optimization, so that should follow a functionally complete implementation.

Yes you are correct that given that we are exploding the load [copy] in the eliminator, the trivial load is not *strictly* needed. But as soon as we start upgrading passes, we are going to want this. Again assuming that parallel work can be done, it makes sense to set the stage for optimizer work that will occur in parallel to further semantic ARC work.

load [take] should definitely not exist until there’s some motivation.

If you look at the frontend, there are places where the frontend wants to emit a take. Unless we are willing to use unqualified loads for those cases (which we can not if we are trying to prove that no unqualified loads are emitted by the frontend), then we must have a load [take].

Did I provide the motivation that you requested?

Yes. My general request is for each commit to be easy to review and the functionality obvious to test. I’m convinced we’ll eventually want the variants. Although I still want to understand better when we need to [take] values out of memory.

Just as a quick example, the API for emitLoad in SILGenFunction:

  ManagedValue emitLoad(SILLocation loc, SILValue addr,
                        const TypeLowering &rvalueTL,
                        SGFContext C, IsTake_t isTake,
                        bool isGuaranteedValid = false);

Notice the IsTake_t parameter. I see that code path used in several locations in SILGenFunction.

I guess it’s doing this to forward locals variables at their last use, avoiding a copy. Although probably not theoretically necessary, I guess it would be silly not to do this.
-Andy

···

On Oct 7, 2016, at 10:36 PM, Michael Gottesman <mgottesman@apple.com> wrote:

On Oct 7, 2016, at 10:26 PM, Andrew Trick <atrick@apple.com <mailto:atrick@apple.com>> wrote:

On Oct 7, 2016, at 10:08 PM, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

On Oct 7, 2016, at 9:25 PM, Andrew Trick <atrick@apple.com <mailto:atrick@apple.com>> wrote:

On Oct 7, 2016, at 6:04 PM, Michael Gottesman via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

I also want to prove that my understanding of the model is accurate by seeing everything work with load [copy].

I am fine doing everything initially with load [copy] (and when SILGen requires load [take]). The other things can wait until we need them. I just don't want to have to do another proposal at that point ; ).

-Andy