[discussion notes] SIL address types and borrowing

On swift-dev, John already sent out a great writeup on SIL SSA:
Representing "address-only" values in SIL.

While talking to John I also picked up a lot of insight into how
address types relate to SIL ownership and borrow checking. I finally
organized the information into these notes. This is not a
proposal. It's background information for those of us writing and
reviewing proposals. Just take it as a strawman for future
discussions. (There's also a good chance I'm getting something
wrong).

[My commentary in brackets.]

** Recap of address-only.

Divide address-only types into two categories:
1. By abstraction (compiler doesn't know the size).
2. The type is "memory-linked", i.e. the address is significant at runtime.
   - weak references (anything that registers its address).
   - C++ this.
   - Anything with interior pointers.
   - Any shared-borrowed value of a type with "nonmutating" properties.
     ["nonmutating" properties allow mutation of state attached to a value.
      Rust atomics are an example.]
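A rough sketch of the Rust-atomics analogy (Rust here, not SIL; the type and names are illustrative): state attached to the value can be mutated through a shared borrow, which is exactly why the borrow must point at the original storage.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A value with a "nonmutating" property: `hits` can be updated even
// through a shared (immutable) borrow, because the mutation is attached
// to the value's storage rather than to the binding.
pub struct Counter {
    pub hits: AtomicUsize,
}

impl Counter {
    // Takes &self (a shared borrow), yet mutates state. The borrow must
    // refer to the original storage location: a copy would update the
    // wrong object.
    pub fn record(&self) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let c = Counter { hits: AtomicUsize::new(0) };
    let shared = &c; // shared borrow of c
    shared.record();
    shared.record();
    // Both increments landed in c's storage.
    println!("{}", c.hits.load(Ordering::Relaxed)); // prints 2
}
```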

Address-only will not be reflected in SIL types. SIL addresses should
only be used for formal memory (pointers, globals, class
properties, captures). We'll get to inout arguments later...

As with opaque types, when IRGen lowers a memory-linked borrowed type,
it needs to allocate storage.

Concern: SILGen has built-in tracking of managed values that automates
insertion of cleanups. Lowering address-only types after SILOpt would
require rediscovering that information based on CFG analysis. Is this
too heroic?

This was already described by John. Briefly recapping:

e.g. Constructing Optional<Any>

We want initialization to be done in-place, as such:

%0 = struct_element_addr .. #S.any
%1 = init_existential_addr %0, $*Any, $Optional<X>
%2 = inject_enum_data_addr %1, $Optional<X>.Some
apply @initX(%2)

SILValue initialization would look something like:

%0 = apply @initX()
%1 = enum #Optional.Some, %0 : $X
%2 = existential %1 : $Any

[I'm not sure we actually want to represent an existential container
this way, but enum, yes.]

Lowering now requires discovering the storage structure, bottom-up,
hoisting allocation, inserting cleanups as John explained.

Side note: Before lowering, something like alloc_box would directly
take its initial value.

** SILFunction calling convention.

For ownership analysis, there's effectively no difference between the
value/address forms of argument ownership:

@owned / @in
@guaranteed / @in_guaranteed
return / @out
@owned arg + @owned return / @inout

Regardless of the representation we choose for @inout, @in/@out will
now be scalar types. SILFunction will maintain the distinction between
@owned/@in etc. based on whether the type is address-only. We need
this for reabstraction, but it only affects the function type, not the
calling convention.

Rather than building a tuple, John prefers SIL support for anonymous
aggregate as "exploded values".

[I'm guessing because tuples are a distinct formal type with their own
convention and common ownership. This may need some discussion though.]

Example SIL function type:

$(@in P, @owned Q) -> (@owned R, @owned S, @out T, @out U)

%p = apply f: $() -> P
%q = apply g: $() -> Q
%exploded = apply h(%p, %q)
%r = project_exploded %exploded, #0 : $R
%s = project_exploded %exploded, #1 : $S
%t = project_exploded %exploded, #2 : $T
%u = project_exploded %exploded, #3 : $U

Exploded types require all their elements to be projected with their
own independent ownership.

** Ownership terminology.

Swift "owned" = Rust values = SIL @owned = implicitly consumed
Swift "borrowed" = Rust immutable borrow = SIL @guaranteed = shared
Swift "inout" = Rust mutable borrow = SIL @inout = unique

Swift "inout" syntax is already (nearly) sufficient.

"borrowed" may not need syntax on the caller side, just a way to
qualify parameters. Swift still needs syntax for returning a borrowed
value.
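Since the terminology table leans on the Rust mapping, here is the same three-way split as a minimal Rust sketch (names are illustrative):

```rust
// Swift "owned" / SIL @owned: the callee consumes the value.
pub fn consume(v: Vec<i32>) -> usize {
    v.len() // v is dropped here; the caller no longer owns it
}

// Swift "borrowed" / SIL @guaranteed: shared, read-only access.
pub fn inspect(v: &[i32]) -> usize {
    v.len()
}

// Swift "inout" / SIL @inout: unique (exclusive) mutable access.
pub fn grow(v: &mut Vec<i32>) {
    v.push(0);
}

fn main() {
    let mut v = vec![1, 2, 3];
    println!("{}", inspect(&v)); // shared borrow; v remains usable
    grow(&mut v);                // exclusive borrow; nothing may overlap it
    println!("{}", consume(v));  // v is moved; it may not be used afterwards
}
```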

** Representation of borrowed values.

Borrowed values represent some shared storage location.

We want some borrowed value references to be passed as SIL values, not SIL addresses:
- Borrowed class references should not be indirected.
- Optimize borrowing other small non-memory linked types.
- Support capture promotion, and other SSA optimizations.
- Borrow CoW values directly.

[Address-only borrowed types will still be passed as SIL addresses (why not?)]

Borrowed types with potentially mutating properties must be passed by
SIL address because they are not actually immutable and their storage
location is significant.

Borrowed references have a scope and need an end-of-borrow marker.

[The end-of-borrow marker semantically changes the memory state, and
statically enforces non-overlapping memory states. It does not
semantically write-back a value. Borrowed values with mutating fields
are semantically modified in-place.]

[Regardless of whether borrowed references are represented as SIL
values or addresses, they must be associated with formal storage. That
storage must remain immutable at the language level (although it may
have mutating fields) and the value cannot be destroyed during the
borrowed scope].

[Trivial borrowed values can be demoted to copies so we can eliminate
their scope]

[Anything borrowed from global storage (and not demoted to a copy)
needs its scope to be dynamically enforced. Borrows from local storage
are sufficiently statically enforced. However, in both cases the
optimizer must respect the static scope of the borrow.]

[I think borrowed values are effectively passed @guaranteed. The
end-of-borrow scope marker will then always be at the top-level
scope. You can't borrow in a caller and end its scope in the callee.]

** Borrowed and inout scopes.

inout value references are also scoped. We'll get to their
representation shortly. Within an inout scope, memory is in an
exclusive state. No borrowed scopes may overlap with an inout state,
which is to say, memory is either shared or exclusive.

We need a flag for stored properties, even for simple trivial
types. That's the only way to provide a simple user model. At least we
don't need this to be implemented atomically; we're not detecting race
conditions. Optimizations will come later. We should be able to prove
that some stored properties are never passed as inout.

The stored property flag needs to be a tri-state: owned, borrowed, exclusive.

The memory value can only be destroyed in the owned state.

The user may mark some storage locations as "unchecked" as an
opt-out. That doesn't change the optimizer's constraints. It simply
bypasses the runtime check.
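Rust's RefCell is a close analogue of the tri-state flag described above (owned / shared-count / exclusive), and its flag is likewise non-atomic. A small sketch (function names are mine):

```rust
use std::cell::RefCell;

// Returns (shared_sum, exclusive_blocked): multiple shared borrows may
// coexist, but an exclusive borrow may not overlap them.
pub fn demo(slot: &RefCell<i32>) -> (i32, bool) {
    let a = slot.borrow();                        // flag: shared
    let b = slot.borrow();                        // a second shared borrow is fine
    let blocked = slot.try_borrow_mut().is_err(); // exclusive is refused
    (*a + *b, blocked)
}

fn main() {
    let slot = RefCell::new(10);
    let (sum, blocked) = demo(&slot);
    println!("{} {}", sum, blocked); // prints "20 true"

    // Once all shared borrows end, the flag returns to "owned" and an
    // exclusive borrow succeeds.
    *slot.borrow_mut() += 1;
    println!("{}", *slot.borrow()); // prints "11"
}
```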

** Ownership of loaded values.

[MikeG already explained possibilities of load ownership in
[swift-dev] [semantic-arc][proposal] High Level ARC Memory Operations]

For the sake of understanding the model, it's worth realizing that we
only need one form of load ownership: load_borrow. We don't
actually need an operation that loads an owned value out of formal
storage. This makes canonical sense because:

- Semantically, a load must at least be a borrow because the storage
  location's non-exclusive flag needs to be dynamically checked
  anyway, even if the value will be copied.

- Code motion in the SIL optimizer has to obey the same limitations
  within borrow scopes regardless of whether we fuse loads and copies
  (retains).

[For the purpose of semantic ARC, the copy_value would be the RC
root. The load and copy_value would effectively be "coupled" by the
static scope of the borrow. e.g. we would not want to move a release
inside the static scope of a borrow.]

[Purely in the interest of concise SIL, I still think we want a load [copy].]
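As a Rust-flavored sketch of what a fused load [copy] would mean (the RefCell stands in for dynamically checked formal storage; names are illustrative): borrow the storage, copy inside the borrow scope, end the borrow.

```rust
use std::cell::RefCell;

// "load [copy]": take a shared borrow of the storage, copy the value
// inside the borrow scope, end the borrow. The copy is an independently
// owned value afterwards.
pub fn load_copy(storage: &RefCell<String>) -> String {
    let borrowed = storage.borrow(); // load_borrow: begin borrow scope
    (*borrowed).clone()              // copy_value inside the scope
}                                    // end_borrow when `borrowed` drops

fn main() {
    let storage = RefCell::new(String::from("hello"));
    let owned = load_copy(&storage);
    // The storage can now be mutated without affecting the copy.
    storage.borrow_mut().push_str(", world");
    println!("{} / {}", owned, *storage.borrow()); // hello / hello, world
}
```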

** SIL value ownership and aggregates

Operations on values:
1. copy
2. forward (move)
3. borrow (share)

A copy or forward produces an owned value.
An owned value has a single consumer.
A borrow has static scope.

For simplicity, passing a bb argument only has move semantics (it
forwards the value). Later that can be expanded if needed.

We want to allow simultaneous access to independent subelements of a
fragile aggregate. We should be able to borrow one field while
mutating another.
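Rust already permits exactly this on fragile aggregates: disjoint fields of one value may be borrowed shared and mutably at the same time. A minimal sketch (the struct is illustrative):

```rust
pub struct Pair {
    pub name: String,
    pub count: u32,
}

// Borrow one field (shared) while mutating another (exclusive). Because
// the aggregate is fragile, the two accesses are known to be disjoint.
pub fn bump(p: &mut Pair) -> (usize, u32) {
    let name_ref = &p.name;       // shared borrow of one subelement
    let count_mut = &mut p.count; // exclusive borrow of a sibling
    *count_mut += 1;
    (name_ref.len(), *count_mut)
}

fn main() {
    let mut p = Pair { name: String::from("abc"), count: 0 };
    println!("{:?}", bump(&mut p)); // prints "(3, 1)"
}
```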

Is it possible to forward a subelement within an aggregate? No. But we
can fully explode an owned aggregate into individual owned elements
and reconstruct the aggregate. This makes use of the @exploded type
feature described in the calling convention.
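The explode-and-reconstruct idea corresponds to Rust's destructuring of an owned value; a sketch (types and names are illustrative):

```rust
pub struct Wrapper {
    pub label: String,
    pub payload: Vec<u8>,
}

// "Explode" an owned aggregate into independently owned elements, move
// one of them through a consuming operation, then reconstruct.
pub fn rebuild(w: Wrapper) -> Wrapper {
    let Wrapper { label, payload } = w; // the aggregate no longer exists
    let label = label + "!";            // each piece is owned on its own
    Wrapper { label, payload }          // reconstruct from the pieces
}

fn main() {
    let w = Wrapper { label: String::from("a"), payload: vec![1, 2] };
    let w2 = rebuild(w);
    println!("{} {:?}", w2.label, w2.payload); // prints "a! [1, 2]"
}
```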

[I don't think forwarding a subelement is useful anyway except for
modeling @inout semantics...]

That leads us to this question: Does an @inout value reference have
formal storage (thus a SIL address) or is it just a convention for
passing owned SSA values?

** World 1: SSA @inout

Projecting an element produces a new SILValue. Does this SILValue have
its own ownership associated with its lifetime, or is it derived
from its parent object by looking through projections?

Either way, projecting any subelement requires reconstructing the
entire aggregate in SIL, through all nesting levels. This will
generate a massive amount of SILValues. Superficially they all need
their own storage.

[We could claim that projections don't need storage, but that only
solves one side of the problem.]

[I argue that this actually obscures the producer/consumer
relationship, which is the opposite of the intention of moving to
SSA. Projecting subelements for mutation fundamentally doesn't make
sense. It does make sense to borrow a subelement (not for
mutation). It also makes sense to project a mutable storage
location. The natural way to project a storage location is by
projecting an address...]

** World 2: @inout formal storage

In this world, @inout references continue to have SILType $*T with
guaranteed exclusive access.

Memory state can be:
- uninitialized
- holds an owned value
  - has exclusive access
  - has shared access

Expected transitions also need to be handled:
  - must become uninitialized
  - must become initialized
  - must preserve initialization state

We need to mark initializers with some "must initialize" marker,
similar to how we mark deinitializers [this isn't clear to me yet].

We could give address types qualifiers to distinguish the memory state
of their pointee (uninitialized, shared, exclusive). Addresses
themselves could be pseudo-linear types. This would provide the same
use-def guarantees as the SSA @inout approach, but producing a new
address each time memory changes state would also be complicated and
cumbersome (though not as bad as SSA).
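The uninitialized-to-initialized transition tracked by such qualifiers has a Rust analogue in MaybeUninit, where the "must initialize" obligation is explicit (a sketch; the function is illustrative):

```rust
use std::mem::MaybeUninit;

// The memory-state transitions, sketched with MaybeUninit: the address
// exists before the value does, and the "must initialize" obligation
// has to be discharged before the value may be read.
pub fn init_then_take() -> String {
    let mut slot: MaybeUninit<String> = MaybeUninit::uninit(); // uninitialized
    slot.write(String::from("ready")); // transition: uninitialized -> owned value
    // Only after initialization may we assume the initialized state and
    // take ownership of the stored value.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("{}", init_then_take()); // prints "ready"
}
```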

[[
We didn't talk about the alternative, but presumably exclusive
vs. shared scope would be delimited by pseudo memory operations as
such:

%a = alloc_stack

begin_exclusive %a
apply foo(%a) // must be marked an initializer?
end_exclusive %a

begin_shared %a
apply bar(%a) // immutable access
end_shared %a

dealloc_stack %a

Values loaded from shared memory also need to be scoped. They must be
consumed within the shared region. e.g.

%a2 = ref_element_addr

%x = load_borrow %a2

end_borrow %x, %a2

It makes sense to me that a load_borrow would implicitly transition
memory to shared state, and end_borrow would implicitly return memory
to an owned state. If the address type is already ($* @borrowed T), then
memory would remain in the shared state.
]]

For all sorts of analysis and optimization, from borrow checking to
CoW to ARC, we really need aliasing guarantees. Knowing we have a
unique address to a location is about as good as having an owned
value.

To get this guarantee we need to structurally guarantee
unique addresses.

[Is there a way to do this without making all the element_addr
operations scoped?]

With aliasing guarantees, verification should be able to statically
prove that most formal storage locations are properly initialized and
uninitialized (pseudo-linear type) by inspecting the memory
operations.

Likewise, we can verify the shared vs. exclusive states.

Representing @inout with addresses doesn't really add features to
SIL. In any case, SIL address types are still used for
formal storage. Exclusive access through any of the following
operations must be guaranteed dynamically:

- ref_element_addr
- global_addr
- pointer_to_address
- alloc_stack
- project_box

We end up with these basic SIL Types:

$T = owned value

$@borrowed T = shared value

$*T = exclusively accessed

$* @borrowed T = shared access

[I think the non-address @borrowed type is only valid for concrete
types that the compiler knows are not memory-linked? This can be used
to avoid passing borrowed values indirectly for arrays and other
small, free-to-copy values].

[We obviously need to work through concrete examples before we can
claim to have a real design.]

-Andy

On swift-dev, John already sent out a great writeup on SIL SSA:
Representing "address-only" values in SIL.

While talking to John I also picked up a lot of insight into how
address types relate to SIL ownership and borrow checking. I finally
organized the information into these notes. This is not a
proposal. It's background information for those of us writing and
reviewing proposals. Just take it as a strawman for future
discussions. (There's also a good chance I'm getting something
wrong).

Thanks for the write-up; this is great. Commentary / clarification / speculation below.

[My commentary in brackets.]

** Recap of address-only.

Divide address-only types into two categories:
1. By abstraction (compiler doesn't know the size).
2. The type is "memory-linked". i.e. the address is significant at runtime.
   - weak references (anything that registers its address).
   - C++ this.
   - Anything with interior pointers.
   - Any shared-borrowed value of a type with "nonmutating" properties.
     ["nonmutating" properties allow mutation of state attached to a value.
      Rust atomics are an example.]

Sort of. "nonmutating" on a setter in Swift today just means that the property can be
mutated even on a formally-immutable aggregate, i.e. either a shared borrow or
an r-value. Generally this implies that the aggregate has some sort of reference-like
semantics, like UnsafePointer does. Rust and C++, however, have language features
which allow you to declare a directly-stored field to be mutable even when the
aggregate is not. If we wanted to model a feature like that in Swift, we'd end up with
a field with a nonmutating setter, but the more directly important thing is that the field
itself would be "mutable" (in the C++ sense), and we would have to pass around
borrows of the aggregate as pointers to the original value in order to preserve the
basic semantics of mutating the field.
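The directly-stored mutable field John describes (C++ `mutable`) maps onto Rust's Cell; a sketch of why borrows must be pointers to the original value (the struct and names are illustrative):

```rust
use std::cell::Cell;

// A directly-stored field that is mutable even when the aggregate is
// not: the Rust `Cell` analogue of a C++ `mutable` member.
pub struct Cached {
    pub input: u32,
    pub last_result: Cell<u32>, // writable through a shared borrow
}

// Takes a shared borrow, yet updates the field. The borrow must be a
// pointer to the original value; mutating a copy would be observably
// different.
pub fn lookup(c: &Cached) -> u32 {
    c.last_result.set(c.input * 2);
    c.last_result.get()
}

fn main() {
    let c = Cached { input: 21, last_result: Cell::new(0) };
    println!("{}", lookup(&c));          // prints "42"
    println!("{}", c.last_result.get()); // the original was updated: "42"
}
```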

Address-only will not be reflected in SIL types. SIL addresses should
only be used for formal memory (pointers, globals, class
properties, captures). We'll get to inout arguments later...

As with opaque types, when IRGen lowers a memory-linked borrowed type,
it needs to allocate storage.

Concern: SILGen has built-in tracking of managed values that automates
insertion of cleanups. Lowering address-only types after SILOpt would
require rediscovering that information based on CFG analysis. Is this
too heroic?

This was already described by John. Briefly recapping:

e.g. Constructing Optional<Any>

We want initialization to be done in-place, as such:

%0 = struct_element_addr .. #S.any
%1 = init_existential_addr %0, $*Any, $Optional<X>
%2 = inject_enum_data_addr %1, $Optional<X>.Some
apply @initX(%2)

SILValue initialization would look something like:

%0 = apply @initX()
%1 = enum #Optional.Some, %0 : $X
%2 = existential %1 : $Any

[I'm not sure we actually want to represent an existential container
this way, but enum, yes.]

Lowering now requires discovering the storage structure, bottom-up,
hoisting allocation, inserting cleanups as John explained.

For the most part, you wouldn't have to insert cleanups. There are specific
cases where cleanup insertion would be required in order to eliminate moves
and get optimal code.

For example, if we had this code:

  ...
  let ex: Any = try createMyBigStruct()
  ...

then the natural SIL pattern might be:

  ...
  %fn = function_ref @createMyBigStruct
  try_apply %fn() normal %norm, unwind %abort
norm(%result: $MyBigStruct):
  %ex = existential %result as $Any
  ...
abort(%error: $Error):
  throw %error

The optimal IR pattern here is to evaluate the result of createMyBigStruct
directly into the allocated buffer in the existential %ex (wherever that gets
allocated). That means pre-allocating that buffer, but the SIL will only
contain code to destroy it after we reach %ex. So we basically need to turn
it into:

  ...
  %ex = alloc_stack $Any
  %ex_buffer = init_existential %ex, $MyBigStruct
  %fn = function_ref @createMyBigStruct
  try_apply %fn(%ex_buffer) normal %norm, unwind %abort
norm(%result: $MyBigStruct):
  ...
abort(%error: $Error):
  dealloc_existential %ex, $MyBigStruct
  throw %error

Side note: Before lowering, something like alloc_box would directly
take its initial value.

It at least *could*. I've since become convinced that we can reason about
local memory initialization well enough to make that unnecessary.

** SILFunction calling convention.

For ownership analysis, there's effectively no difference between the
value/address forms of argument ownership:

@owned / @in
@guaranteed / @in_guaranteed
return / @out
@owned arg + @owned return / @inout

Regardless of the representation we choose for @inout, @in/@out will
now be scalar types. SILFunction will maintain the distinction between
@owned/@in etc. based on whether the type is address-only. We need
this for reabstraction, but it only affects the function type, not the
calling convention.

To be clear, it affects the *low-level* calling convention. It just doesn't require
a different pattern of SIL.

Rather than building a tuple, John prefers SIL support for anonymous
aggregate as "exploded values".

[I'm guessing because tuples are a distinct formal type with their own
convention and common ownership. This may need some discussion though.]

Yes. A tuple value could potentially actually be used in the code, so it would take
some work to prove that it wasn't necessary to actually build that tuple. In contrast,
if the components of the tuple can only be used separately, then obviously they
can be kept as independent values.

The ownership thing is more deadly. There's no reason a function *has* to return
all owned values or all borrowed values, but it's not obvious that we want the
type system to be able to express things like "a tuple of an owned and a borrowed
value" as opposed to being able to say that tuples are merely aggregate values
that are either owned or borrowed in their entirety.

Example SIL function type:

$(@in P, @owned Q) -> (@owned R, @owned S, @out T, @out U)

%p = apply f: $() -> P
%q = apply g: $() -> Q
%exploded = apply h(%p, %q)
%r = project_exploded %exploded, #0 : $R
%s = project_exploded %exploded, #1 : $S
%t = project_exploded %exploded, #2 : $T
%u = project_exploded %exploded, #3 : $U

Exploded types require all their elements to be projected with their
own independent ownership.

** Ownership terminology.

Swift "owned" = Rust values = SIL @owned = implicitly consumed
Swift "borrowed" = Rust immutable borrow = SIL @guaranteed = shared
Swift "inout" = Rust mutable borrow = SIL @inout = unique

Swift "inout" syntax is already (nearly) sufficient.

"borrowed" may not need syntax on the caller side, just a way to
qualify parameters. Swift still needs syntax for returning a borrowed
value.

Right. But these are relatively shallow language-design decisions that we
don't need to worry about for SIL.

** Representation of borrowed values.

Borrowed values represent some shared storage location.

We want some borrowed value references to be passed as SIL values, not SIL addresses:
- Borrowed class references should not be indirected.
- Optimize borrowing other small non-memory linked types.
- Support capture promotion, and other SSA optimizations.
- Borrow CoW values directly.

[Address-only borrowed types will still be passed as SIL addresses (why not?)]

They have to be when memory-linked, for all the reasons above and below.

Borrowed types with potentially mutating properties must be passed by
SIL address because they are not actually immutable and their storage
location is significant.

Borrowed references have a scope and need an end-of-borrow marker.

[The end-of-borrow marker semantically changes the memory state, and
statically enforces non-overlapping memory states. It does not
semantically write-back a value. Borrowed values with mutating fields
are semantically modified in-place.]

Right. One of the language-design challenges with mutable fields is that identity
becomes very important — it's important to understand when you're passing
a copy of a value vs. a borrow of it. For types like Rust atomics this is fine
because they're not copyable, but that's not necessarily true for all types.
And yet I really don't think we want a sigil just to say that we're passing a
borrowed value instead of an owned one for normal use; it's like passing
by const & in C++, the caller almost certainly doesn't care that you're doing it,
especially since we'll be guaranteeing that nobody modifies the value while
it's borrowed.
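The identity point can be made concrete in Rust (a sketch; the struct and function names are mine): with a mutable field, passing a copy and passing a borrow are observably different.

```rust
use std::cell::Cell;

#[derive(Clone)]
pub struct Flagged {
    pub flag: Cell<bool>, // mutable even through a shared borrow
}

// Mutating through a copy does not affect the original...
pub fn set_on_copy(f: Flagged) {
    f.flag.set(true); // lands on the (consumed) copy
}

// ...while mutating through a borrow does.
pub fn set_on_borrow(f: &Flagged) {
    f.flag.set(true); // lands on the original storage
}

fn main() {
    let original = Flagged { flag: Cell::new(false) };

    set_on_copy(original.clone());
    println!("{}", original.flag.get()); // prints "false": identity was lost

    set_on_borrow(&original);
    println!("{}", original.flag.get()); // prints "true"
}
```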

[Regardless of whether borrowed references are represented as SIL
values or addresses, they must be associated with formal storage. That
storage must remain immutable at the language level (although it may
have mutating fields) and the value cannot be destroyed during the
borrowed scope].

[Trivial borrowed values can be demoted to copies so we can eliminate
their scope]

Right.

[Anything borrowed from global storage (and not demoted to a copy)
needs its scope to be dynamically enforced. Borrows from local storage
are sufficiently statically enforced. However, in both cases the
optimizer must respect the static scope of the borrow.]

[I think borrowed values are effectively passed @guaranteed. The
end-of-borrow scope marker will then always be at the top-level
scope. You can't borrow in a caller and end its scope in the callee.]

Right. The caller has to statically guarantee for the duration, so there's
no reason to split any responsibility here.

** Borrowed and inout scopes.

inout value references are also scoped. We'll get to their
representation shortly. Within an inout scope, memory is in an
exclusive state. No borrowed scopes may overlap with an inout state,
which is to say, memory is either shared or exclusive.

We need a flag for stored properties, even for simple trivial
types. That's the only way to provide a simple user model. At least we
don't need this to be implemented atomically; we're not detecting race
conditions. Optimizations will come later. We should be able to prove
that some stored properties are never passed as inout.

The stored property flag needs to be a tri-state: owned, borrowed, exclusive.

The memory value can only be destroyed in the owned state.

The user may mark some storage locations as "unchecked" as an
opt-out. That doesn't change the optimizer's constraints. It simply
bypasses the runtime check.

** Ownership of loaded values.

[MikeG already explained possibilities of load ownership in
[swift-dev] [semantic-arc][proposal] High Level ARC Memory Operations]

For the sake of understanding the model, it's worth realizing that we
only need one form of load ownership: load_borrow. We don't
actually need an operation that loads an owned value out of formal
storage. This makes canonical sense because:

- Semantically, a load must at least be a borrow because the storage
  location's non-exclusive flag needs to be dynamically checked
  anyway, even if the value will be copied.

- Code motion in the SIL optimizer has to obey the same limitations
  within borrow scopes regardless of whether we fuse loads and copies
  (retains).

[For the purpose of semantic ARC, the copy_value would be the RC
root. The load and copy_value would effectively be "coupled" by the
static scope of the borrow. e.g. we would not want to move a release
inside the static scope of a borrow.]

[Purely in the interest of concise SIL, I still think we want a load [copy].]

Yeah, I agree. I expect it'll be the most common form of load by far;
making it take three instructions would be pretty unfortunate. And it might
be easier for IRGen to optimize as a single instruction, e.g. for types where
we bundle the "is this being accessed" bit directly into the storage.

** SIL value ownership and aggregates

Operations on values:
1. copy
2. forward (move)
3. borrow (share)

A copy or forward produces an owned value.
An owned value has a single consumer.
A borrow has static scope.

For simplicity, passing a bb argument only has move semantics (it
forwards the value). Later that can be expanded if needed.

Reasonable.

We want to allow simultaneous access to independent subelements of a
fragile aggregate. We should be able to borrow one field while
mutating another.

Is it possible to forward a subelement within an aggregate? No. But we
can fully explode an owned aggregate into individual owned elements
and reconstruct the aggregate. This makes use of the @exploded type
feature described in the calling convention.

[I don't think forwarding a subelement is useful anyway except for
modeling @inout semantics...]

That leads us to this question: Does an @inout value reference have
formal storage (thus a SIL address) or is it just a convention for
passing owned SSA values?

** World 1: SSA @inout

Projecting an element produces a new SILValue. Does this SILValue have
its own ownership associated with its lifetime, or is it derived
from its parent object by looking through projections?

Either way, projecting any subelement requires reconstructing the
entire aggregate in SIL, through all nesting levels. This will
generate a massive amount of SILValues. Superficially they all need
their own storage.

[We could claim that projections don't need storage, but that only
solves one side of the problem.]

[I argue that this actually obscures the producer/consumer
relationship, which is the opposite of the intention of moving to
SSA. Projecting subelements for mutation fundamentally doesn't make
sense. It does make sense to borrow a subelement (not for
mutation). It also makes sense to project a mutable storage
location. The natural way to project a storage location is by
projecting an address...]

** World 2: @inout formal storage

In this world, @inout references continue to have SILType $*T with
guaranteed exclusive access.

Memory state can be:
- uninitialized
- holds an owned value
  - has exclusive access
  - has shared access

Expected transitions also need to be handled:
  - must become uninitialized
  - must become initialized
  - must preserve initialization state

We need to mark initializers with some "must initialize" marker,
similar to how we mark deinitializers [this isn't clear to me yet].

We could give address types qualifiers to distinguish the memory state
of their pointee (uninitialized, shared, exclusive). Addresses
themselves could be pseudo-linear types. This would provide the same
use-def guarantees as the SSA @inout approach, but producing a new
address each time memory changes state would also be complicated and
cumbersome (though not as bad as SSA).

[[
We didn't talk about the alternative, but presumably exclusive
vs. shared scope would be delimited by pseudo memory operations as
such:

%a = alloc_stack

begin_exclusive %a
apply foo(%a) // must be marked an initializer?
end_exclusive %a

begin_shared %a
apply bar(%a) // immutable access
end_shared %a

dealloc_stack %a

I think alloc_stack returns an owned (but uninitialized) address and there's
a general scoped operation to turn an owned address into a borrow. Or it could
be implicit in the operation that needs a borrowed value, as you suggest below.

John.

···

On Oct 7, 2016, at 11:10 PM, Andrew Trick via swift-dev <swift-dev@swift.org> wrote:

Values loaded from shared memory also need to be scoped. They must be
consumed within the shared region. e.g.

%a2 = ref_element_addr

%x = load_borrow %a2

end_borrow %x, %a2

It makes sense to me that a load_borrow would implicitly transition
memory to shared state, and end_borrow would implicitly return memory
to an owned state. If the address type is already ($* @borrowed T), then
memory would remain in the shared state.
]]

For all sorts of analysis and optimization, from borrow checking to
CoW to ARC, we really need aliasing guarantees. Knowing we have a
unique address to a location is about as good as having an owned
value.

To get this guarantee we need to structurally guarantee
unique addresses.

[Is there a way to do this without making all the element_addr
operations scoped?]

With aliasing guarantees, verification should be able to statically
prove that most formal storage locations are properly initialized and
uninitialized (pseudo-linear type) by inspecting the memory
operations.

Likewise, we can verify the shared vs. exclusive states.

Representing @inout with addresses doesn't really add features to
SIL. In any case, SIL address types are still used for
formal storage. Exclusive access through any of the following
operations must be guaranteed dynamically:

- ref_element_addr
- global_addr
- pointer_to_address
- alloc_stack
- project_box

We end up with these basic SIL Types:

$T = owned value

$@borrowed T = shared value

$*T = exclusively accessed

$* @borrowed T = shared access

[I think the non-address @borrowed type is only valid for concrete
types that the compiler knows are not memory-linked? This can be used
to avoid passing borrowed values indirectly for arrays and other
small, free-to-copy values].

[We obviously need to work through concrete examples before we can
claim to have a real design.]

-Andy

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Could you add this (and John’s previous writeup) to the docs in the repo?

I was reasonably far along in adding unowned optionals a while back but got totally lost in SILGen.
This info looks really valuable, but personally I find that with the mailing list format it’s hard to ever find this kind of stuff when I need it.

Thanks

Karl

P.S. going to pick up that unowned optional stuff soon, once I have time to read the docs about SILGen

···

On 8 Oct 2016, at 08:10, Andrew Trick via swift-dev <swift-dev@swift.org> wrote:

On swift-dev, John already sent out a great writeup on SIL SSA:
Representing "address-only" values in SIL.

While talking to John I also picked up a lot of insight into how
address types relate to SIL ownership and borrow checking. I finally
organized the information into these notes. This is not a
proposal. It's background information for those of us writing and
reviewing proposals. Just take it as a strawman for future
discussions. (There's also a good chance I'm getting something
wrong).

[My commentary in brackets.]

** Recap of address-only.

Divide address-only types into two categories:
1. By abstraction (compiler doesn't know the size).
2. The type is "memory-linked". i.e. the address is significant at runtime.
   - weak references (anything that registers its address).
   - C++ this.
   - Anything with interior pointers.
   - Any shared-borrowed value of a type with "nonmutating" properties.
     ["nonmutating" properties allow mutation of state attached to a value.
      Rust atomics are an example.]

Address-only will not be reflected in SIL types. SIL addresses should
only be used for formal memory (pointers, globals, class
properties, captures). We'll get to inout arguments later...

As with opaque types, when IRGen lowers a memory-linked borrowed type,
it needs to allocate storage.

Concern: SILGen has built-in tracking of managed values that automates
insertion of cleanups. Lowering address-only types after SILOpt would
require rediscovering that information based on CFG analysis. Is this
too heroic?

This was already described by John. Briefly recapping:

e.g. Constructing Optional<Any>

We want initialization to be in-place, as such:

%0 = struct_element_addr .. #S.any
%1 = init_existential_addr %0, $*Any, $Optional<X>
%2 = inject_enum_data_addr %1, $Optional<X>.Some
apply @initX(%2)

SILValue initialization would look something like:

%0 = apply @initX()
%1 = enum #Optional.Some, %0 : $X
%2 = existential %1 : $Any

[I'm not sure we actually want to represent an existential container
this way, but enum, yes.]

Lowering now requires discovering the storage structure, bottom-up,
hoisting allocation, inserting cleanups as John explained.

Side note: Before lowering, something like alloc_box would directly
take its initial value.

** SILFunction calling convention.

For ownership analysis, there's effectively no difference between the
value/address forms of argument ownership:

@owned / @in
@guaranteed / @in_guaranteed
return / @out
@owned arg
+ @owned return / @inout

Regardless of the representation we choose for @inout, @in/@out will
now be scalar types. SILFunction will maintain the distinction between
@owned/@in etc. based on whether the type is address-only. We need
this for reabstraction, but it only affects the function type, not the
calling convention.

Rather than building a tuple, John prefers SIL support for anonymous
aggregates as "exploded values".

[I'm guessing because tuples are a distinct formal type with their own
convention and common ownership. This may need some discussion though.]

Example SIL function type:

$(@in P, @owned Q) -> (@owned R, @owned S, @out T, @out U)

%p = apply f: $() -> P
%q = apply g: $() -> Q
%exploded = apply h(%p, %q)
%r = project_exploded %exploded, #0 : $R
%s = project_exploded %exploded, #1 : $S
%t = project_exploded %exploded, #2 : $T
%u = project_exploded %exploded, #3 : $U

Exploded types require all their elements to be projected with their
own independent ownership.

** Ownership terminology.

Swift "owned" = Rust values = SIL @owned = implicitly consumed
Swift "borrowed" = Rust immutable borrow = SIL @guaranteed = shared
Swift "inout" = Rust mutable borrow = SIL @inout = unique

Swift "inout" syntax is already (nearly) sufficient.

"borrowed" may not need syntax on the caller side, just a way to
qualify parameters. Swift still needs syntax for returning a borrowed
value.

** Representation of borrowed values.

Borrowed values represent some shared storage location.

We want some borrowed value references to be passed as SIL values, not SIL addresses:
- Borrowed class references should not be indirected.
- Optimize borrowing other small non-memory linked types.
- Support capture promotion, and other SSA optimizations.
- Borrow CoW values directly.

[Address-only borrowed types will still be passed as SIL addresses (why not?)]

Borrowed types with potentially mutating properties must be passed by
SIL address because they are not actually immutable and their storage
location is significant.

Borrowed references have a scope and need an end-of-borrow marker.

[The end-of-borrow marker semantically changes the memory state, and
statically enforces non-overlapping memory states. It does not
semantically write-back a value. Borrowed values with mutating fields
are semantically modified in-place.]

[Regardless of whether borrowed references are represented as SIL
values or addresses, they must be associated with formal storage. That
storage must remain immutable at the language level (although it may
have mutating fields) and the value cannot be destroyed during the
borrowed scope.]

[Trivial borrowed values can be demoted to copies so we can eliminate
their scope.]

[Anything borrowed from global storage (and not demoted to a copy)
needs its scope to be dynamically enforced. Borrows from local storage
are sufficiently statically enforced. However, in both cases the
optimizer must respect the static scope of the borrow.]

[I think borrowed values are effectively passed @guaranteed. The
end-of-borrow scope marker will then always be at the top-level
scope. You can't borrow in a caller and end its scope in the callee.]

** Borrowed and inout scopes.

inout value references are also scoped. We'll get to their
representation shortly. Within an inout scope, memory is in an
exclusive state. No borrowed scopes may overlap with an inout state,
which is to say, memory is either shared or exclusive.

We need a flag for stored properties, even for simple trivial
types. That's the only way to provide a simple user model. At least we
don't need this to be implemented atomically; we're not detecting race
conditions. Optimizations will come later. We should be able to prove
that some stored properties are never passed as inout.

The stored property flag needs to be a tri-state: owned, borrowed, exclusive.

The memory value can only be destroyed in the owned state.

The user may mark some storage locations as "unchecked" as an
opt-out. That doesn't change the optimizer's constraints. It simply
bypasses the runtime check.

** Ownership of loaded values.

[MikeG already explained possibilities of load ownership in
[swift-dev] [semantic-arc][proposal] High Level ARC Memory Operations]

For the sake of understanding the model, it's worth realizing that we
only need one form of load ownership: load_borrow. We don't
actually need an operation that loads an owned value out of formal
storage. This makes canonical sense because:

- Semantically, a load must at least be a borrow because the storage
  location's non-exclusive flag needs to be dynamically checked
  anyway, even if the value will be copied.

- Code motion in the SIL optimizer has to obey the same limitations
  within borrow scopes regardless of whether we fuse loads and copies
  (retains).

[For the purpose of semantic ARC, the copy_value would be the RC
root. The load and copy_value would effectively be "coupled" by the
static scope of the borrow. e.g. we would not want to move a release
inside the static scope of a borrow.]

[Purely in the interest of concise SIL, I still think we want a load [copy].]

** SIL value ownership and aggregates

Operations on values:
1. copy
2. forward (move)
3. borrow (share)

A copy or forward produces an owned value.
An owned value has a single consumer.
A borrow has static scope.

For simplicity, passing a bb argument only has move semantics (it
forwards the value). Later that can be expanded if needed.

We want to allow simultaneous access to independent subelements of a
fragile aggregate. We should be able to borrow one field while
mutating another.

Is it possible to forward a subelement within an aggregate? No. But we
can fully explode an owned aggregate into individual owned elements
and reconstruct the aggregate. This makes use of the @exploded type
feature described in the calling convention.

[I don't think forwarding a subelement is useful anyway except for
modeling @inout semantics...]

That leads us to this question: Does an @inout value reference have
formal storage (thus a SIL address) or is it just a convention for
passing owned SSA values?

** World 1: SSA @inout

Projecting an element produces a new SILValue. Does this SILValue have
its own ownership associated with its lifetime, or is it derived
from its parent object by looking through projections?

Either way, projecting any subelement requires reconstructing the
entire aggregate in SIL, through all nesting levels. This will
generate a massive amount of SILValues. Superficially they all need
their own storage.

[We could claim that projections don't need storage, but that only
solves one side of the problem.]

[I argue that this actually obscures the producer/consumer
relationship, which is the opposite of the intention of moving to
SSA. Projecting subelements for mutation fundamentally doesn't make
sense. It does make sense to borrow a subelement (not for
mutation). It also makes sense to project a mutable storage
location. The natural way to project a storage location is by
projecting an address...]

** World 2: @inout formal storage

In this world, @inout references continue to have SILType $*T with
guaranteed exclusive access.

Memory state can be:
- uninitialized
- holds an owned value
  - has exclusive access
  - has shared access

Expected transitions that need to be handled:
  - must become uninitialized
  - must become initialized
  - must preserve initialization state

We need to mark initializers with some "must initialize" marker,
similar to how we mark deinitializers [this isn't clear to me yet].

We could give address types qualifiers to distinguish the memory state
of their pointee (uninitialized, shared, exclusive). Addresses
themselves could be pseudo-linear types. This would provide the same
use-def guarantees as the SSA @inout approach, but producing a new
address each time memory changes state would also be complicated and
cumbersome (though not as bad as SSA).

[[
We didn't talk about the alternative, but presumably exclusive
vs. shared scope would be delimited by pseudo memory operations as
such:

%a = alloc_stack

begin_exclusive %a
apply foo(%a) // must be marked an initializer?
end_exclusive %a

begin_shared %a
apply bar(%a) // immutable access
end_shared %a

dealloc_stack %a

Values loaded from shared memory also need to be scoped. They must be
consumed within the shared region. e.g.

%a2 = ref_element_addr

%x = load_borrow %a2

end_borrow %x, %a2

It makes sense to me that a load_borrow would implicitly transition
memory to shared state, and end_borrow would implicitly return memory
to an owned state. If the address type is already ($* @borrowed T), then
memory would remain in the shared state.
]]

For all sorts of analysis and optimization, from borrow checking to
CoW to ARC, we really need aliasing guarantees. Knowing we have a
unique address to a location is about as good as having an owned
value.

To get this guarantee we need to structurally guarantee
unique addresses.

[Is there a way to do this without making all the element_addr
operations scoped?]

With aliasing guarantees, verification should be able to statically
prove that most formal storage locations are properly initialized and
uninitialized (pseudo-linear type) by inspecting the memory
operations.

Likewise, we can verify the shared vs. exclusive states.

Representing @inout with addresses doesn't really add features to
SIL. In any case, SIL address types are still used for
formal storage. Exclusive access through any of the following
operations must be guaranteed dynamically:

- ref_element_addr
- global_addr
- pointer_to_address
- alloc_stack
- project_box

We end up with these basic SIL Types:

$T = owned value

$@borrowed T = shared value

$*T = exclusively accessed

$* @borrowed T = shared access

[I think the non-address @borrowed type is only valid for concrete
types that the compiler knows are not memory-linked? This can be used
to avoid passing borrowed values indirectly for arrays and other
small, free-to-copy values.]

[We obviously need to work through concrete examples before we can
claim to have a real design.]

-Andy

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

I think there's a size threshold at which SSA @inout is manageable, and might lead to overall better register-oriented code, if the aggregates can be exploded into a small number of individual values. The cost of reconstructing the aggregate could be mitigated somewhat by introducing 'insert' instructions for aggregates to pair with the projection instructions, similar to LLVM's insertvalue/extractvalue. "%x = project_value %y.field; %x' = transform(%x); %y' = insert %y.field, %x'" isn't too terrible compared to the address-oriented formulation. Tracking ownership state through projections and insertions might be tricky; I haven't thought about that aspect.

-Joe

···

On Oct 7, 2016, at 11:10 PM, Andrew Trick via swift-dev <swift-dev@swift.org> wrote:
** World 1: SSA @inout

Projecting an element produces a new SILValue. Does this SILValue have
its own ownership associated with its lifetime, or is it derived
from its parent object by looking through projections?

Either way, projecting any subelement requires reconstructing the
entire aggregate in SIL, through all nesting levels. This will
generate a massive amount of SILValues. Superficially they all need
their own storage.

[We could claim that projections don't need storage, but that only
solves one side of the problem.]

[I argue that this actually obscures the producer/consumer
relationship, which is the opposite of the intention of moving to
SSA. Projecting subelements for mutation fundamentally doesn't make
sense. It does make sense to borrow a subelement (not for
mutation). It also makes sense to project a mutable storage
location. The natural way to project a storage location is by
projecting an address...]

Could you add this (and John’s previous writeup) to the docs in the repo?

Yeah, it’s unfortunate that design discussions are buried in a flood of email. On the flip side, I’ve checked in some premature design docs that are probably nonsense now. I’m currently preparing a type-safe memory model design doc to check in. After that I’ll probably work on a document for SIL SSA with address-only types, which should cover John’s writeup. I’ll have to work with Michael Gottesman and John McCall to get SIL ownership docs checked in.

I was reasonably along the way to adding unowned optionals a while back but got totally lost in SILGen.
This info looks really valuable, but personally I find that with the mailing list format it’s hard to ever find this kind of stuff when I need it.

Thanks

Karl

P.S. going to pick up that unowned optional stuff soon, once I have time to read the docs about SILGen

There are SILGen docs somewhere?

-Andy


I should have decorated the code with transitions to get the point across:

%a = alloc_stack // -> owned/uninitialized

begin_exclusive %a // -> exclusive/uninitialized
apply foo(%a) // -> exclusive/initialized
end_exclusive %a // -> owned/initialized

begin_shared %a // -> shared (implies initialized)
apply bar(%a) // immutable access
end_shared %a // -> owned/initialized

dealloc_stack %a // -> invalid

-Andy

···

On Oct 8, 2016, at 12:39 AM, John McCall <rjmccall@apple.com> wrote:

%a = alloc_stack

begin_exclusive %a
apply foo(%a) // must be marked an initializer?
end_exclusive %a

begin_shared %a
apply bar(%a) // immutable access
end_shared %a

dealloc_stack %a

I think alloc_stack returns an owned (but uninitialized) address and there's
a general scoped operation to turn an owned address into a borrow. Or it could
be implicit in the operation that needs a borrowed value, as you suggest below.

Could you add this (and John’s previous writeup) to the docs in the repo?

I was reasonably along the way to adding unowned optionals a while back but got totally lost in SILGen.
This info looks really valuable, but personally I find that with the mailing list format it’s hard to ever find this kind of stuff when I need it.

Thanks

Karl

P.S. going to pick up that unowned optional stuff soon, once I have time to read the docs about SILGen

I am not sure if it is appropriate to document this sort of thing in the docs directory. This is because, as Andy explicitly mentioned, this document is not an actual proposal or a plan of record. Rather, this is meant to be a record of an in person side discussion that occurred in between two individuals. In the past, when we have had these in person side conversations, notes were not provided to the wider group of developers resulting in siloed knowledge and obscured visibility into the design process.

Eliminating such problems is the intention behind sending out these notes, not providing finalized proposals for placement in the docs directory.

Michael

···

On Oct 8, 2016, at 10:09 AM, Karl via swift-dev <swift-dev@swift.org> wrote:

On 8 Oct 2016, at 08:10, Andrew Trick via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

On swift-dev, John already sent out a great writeup on SIL SSA:
Representing "address-only" values in SIL.

While talking to John I also picked up a lot of insight into how
address types relate to SIL ownership and borrow checking. I finally
organized the information into these notes. This is not a
proposal. It's background information for those of us writing and
reviewing proposals. Just take it as a strawman for future
discussions. (There's also a good chance I'm getting something
wrong).

[My commentary in brackets.]

** Recap of address-only.

Divide address-only types into two categories:
1. By abstraction (compiler doesn't know the size).
2. The type is "memory-linked". i.e. the address is significant at runtime.
   - weak references (anything that registers its address).
   - C++ this.
   - Anything with interior pointers.
   - Any shared-borrowed value of a type with "nonmutating" properties.
     ["nonmutating" properties allow mutation of state attached to a value.
      Rust atomics are an example.]

Address-only will not be reflected in SIL types. SIL addresses should
only be used for formal memory (pointers, globals, class
properties, captures). We'll get to inout arguments later...

As with opaque types, when IRGen lowers a memory-linked borrowed type,
it needs to allocate storage.

Concern: SILGen has built-in tracking of managed values that automates
insertion of cleanups. Lowering address-only types after SILOpt would
require rediscovering that information based on CFG analysis. Is this
too heroic?

This was already described by John. Briefly recapping:

e.g. Constructung Optional<Any>

We want initialization should be in-place as such:

%0 = struct_element_addr .. #S.any
%1 = init_existential_addr %0, $*Any, $Optional<X>
%2 = inject_enum_data_addr %1, $Optional<X>.Some
apply @initX(%2)

SILValue initialization would look something like:

%0 = apply @initX()
%1 = enum optional.Some, %0 : $X
%2 = existential %1 : $Any

[I'm not sure we actually want to represent an existential container
this way, but enum, yes.]

Lowering now requires discovering the storage structure, bottom-up,
hoisting allocation, inserting cleanups as John explained.

Side note: Before lowering, something like alloc_box would directly
take its initial value.

** SILFunction calling convention.

For ownership analysis, there's effectively no difference between the
value/address forms of argument ownership:

@owned / @in
@guaranteed / @in_guaranteed
return / @out
@owned arg
+ @owned return / @inout

Regardless of the representation we choose for @inout, @in/@out will
now be scalar types. SILFunction will maintain the distinction between
@owned/@in etc. based on whether the type is address-only. We need
this for reabstraction, but it only affects the function type, not the
calling convention.

Rather than building a tuple, John prefers SIL support for anonymous
aggregates as "exploded values".

[I'm guessing because tuples are a distinct formal type with their own
convention and common ownership. This may need some discussion though.]

Example SIL function type:

$(@in P, @owned Q) -> (@owned R, @owned S, @out T, @out U)

%p = apply f: $() -> P
%q = apply g: $() -> Q
%exploded = apply h(%p, %q)
%r = project_exploded %exploded, #0 : $R
%s = project_exploded %exploded, #1 : $S
%t = project_exploded %exploded, #2 : $T
%u = project_exploded %exploded, #3 : $U

Exploded types require all of their elements to be projected, each
with its own independent ownership.
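On the caller side the effect resembles tuple destructuring in Rust, where each destructured element gets its own independent ownership — though John's point is precisely that SIL would do this without materializing a formal tuple type. A rough sketch (the result types standing in for R, S, T, U are invented here):

```rust
// Hypothetical stand-ins for the R, S, T, U result types above.
fn h() -> (String, String, Vec<u8>, Vec<u8>) {
    ("r".to_string(), "s".to_string(), vec![1], vec![2])
}

fn main() {
    // Each element is projected out with its own independent ownership.
    let (r, s, t, u) = h();
    drop(r);                 // one element can be consumed early...
    assert_eq!(s, "s");      // ...while the others live on independently
    assert_eq!(t, vec![1]);
    assert_eq!(u, vec![2]);
}
```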

** Ownership terminology.

Swift "owned" = Rust values = SIL @owned = implicitly consumed
Swift "borrowed" = Rust immutable borrow = SIL @guaranteed = shared
Swift "inout" = Rust mutable borrow = SIL @inout = unique
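Taking the Rust column of the mapping literally, the three modes can be sketched as follows (a rough analogy only, not a claim that the Swift and Rust semantics coincide):

```rust
// A rough Rust rendering of the three ownership modes above.
fn consume(v: Vec<i32>) -> usize { v.len() }   // owned: implicitly consumed
fn share(v: &Vec<i32>) -> usize { v.len() }    // shared (immutable) borrow
fn exclusive(v: &mut Vec<i32>) { v.push(4) }   // unique (mutable) borrow

fn main() {
    let mut v = vec![1, 2, 3];
    assert_eq!(share(&v), 3);   // shared borrows may overlap each other
    exclusive(&mut v);          // unique borrow: no overlapping access
    assert_eq!(consume(v), 4);  // consumes v; it is unusable afterward
}
```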

Swift "inout" syntax is already (nearly) sufficient.

"borrowed" may not need syntax on the caller side, just a way to
qualify parameters. Swift still needs syntax for returning a borrowed
value.

** Representation of borrowed values.

Borrowed values represent some shared storage location.

We want some borrowed value references to be passed as SIL values, not SIL addresses:
- Borrowed class references should not be indirected.
- Optimize borrowing other small non-memory linked types.
- Support capture promotion, and other SSA optimizations.
- Borrow CoW values directly.

[Address-only borrowed types will still be passed as SIL addresses (why not?)]

Borrowed types with potentially mutating properties must be passed by
SIL address because they are not actually immutable and their storage
location is significant.

Borrowed references have a scope and need an end-of-borrow marker.

[The end-of-borrow marker semantically changes the memory state, and
statically enforces non-overlapping memory states. It does not
semantically write-back a value. Borrowed values with mutating fields
are semantically modified in-place.]

[Regardless of whether borrowed references are represented as SIL
values or addresses, they must be associated with formal storage. That
storage must remain immutable at the language level (although it may
have mutating fields) and the value cannot be destroyed during the
borrowed scope].

[Trivial borrowed values can be demoted to copies so we can eliminate
their scope]

[Anything borrowed from global storage (and not demoted to a copy)
needs its scope to be dynamically enforced. Borrows from local storage
are sufficiently statically enforced. However, in both cases the
optimizer must respect the static scope of the borrow.]

[I think borrowed values are effectively passed @guaranteed. The
end-of-borrow scope marker will then always be at the top-level
scope. You can't borrow in a caller and end its scope in the callee.]

** Borrowed and inout scopes.

inout value references are also scoped. We'll get to their
representation shortly. Within an inout scope, memory is in an
exclusive state. No borrowed scopes may overlap with an inout state,
which is to say, memory is either shared or exclusive.

We need a flag for stored properties, even for simple trivial
types. That's the only way to provide a simple user model. At least we
don't need this to be implemented atomically; we're not detecting race
conditions. Optimizations will come later. We should be able to prove
that some stored properties are never passed as inout.

The stored property flag needs to be a tri-state: owned, borrowed, exclusive.

The memory value can only be destroyed in the owned state.

The user may mark some storage locations as "unchecked" as an
opt-out. That doesn't change the optimizer's constraints. It simply
bypasses the runtime check.
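Rust's RefCell is a concrete instance of this kind of dynamically checked borrow flag (and, like the flag described above, it is non-atomic — it is not a race detector). A small sketch:

```rust
use std::cell::RefCell;

fn main() {
    // RefCell keeps a non-atomic runtime borrow flag, much like the
    // stored-property flag sketched above: overlapping shared reads are
    // fine, but exclusive access must not overlap anything.
    let x = RefCell::new(vec![1, 2, 3]);
    {
        let a = x.borrow();                    // shared state
        let b = x.borrow();                    // overlapping shared: ok
        assert_eq!(a.len() + b.len(), 6);
        assert!(x.try_borrow_mut().is_err());  // exclusive during shared: rejected
    }                                          // shared borrows end here
    x.borrow_mut().push(4);                    // exclusive state: ok now
    assert_eq!(x.borrow().len(), 4);
}
```

Rust's `UnsafeCell` plays roughly the role of the "unchecked" opt-out: the aliasing rules still apply, only the check disappears.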

** Ownership of loaded values.

[MikeG already explained possibilities of load ownership in
[swift-dev] [semantic-arc][proposal] High Level ARC Memory Operations]

For the sake of understanding the model, it's worth realizing that we
only need one form of load ownership: load_borrow. We don't
actually need an operation that loads an owned value out of formal
storage. This makes canonical sense because:

- Semantically, a load must at least be a borrow because the storage
  location's non-exclusive flag needs to be dynamically checked
  anyway, even if the value will be copied.

- Code motion in the SIL optimizer has to obey the same limitations
  within borrow scopes regardless of whether we fuse loads and copies
  (retains).

[For the purpose of semantic ARC, the copy_value would be the RC
root. The load and copy_value would effectively be "coupled" by the
static scope of the borrow. e.g. we would not want to move a release
inside the static scope of a borrow.]

[Purely in the interest of concise SIL, I still think we want a load [copy].]

** SIL value ownership and aggregates

Operations on values:
1. copy
2. forward (move)
3. borrow (share)

A copy or forward produces an owned value.
An owned value has a single consumer.
A borrow has static scope.

For simplicity, passing a bb argument only has move semantics (it
forwards the value). Later that can be expanded if needed.

We want to allow simultaneous access to independent subelements of a
fragile aggregate. We should be able to borrow one field while
mutating another.
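Rust already permits exactly this for fragile (non-abstracted) aggregates: borrows of disjoint fields of a local struct may overlap, even when one of them is mutating. A minimal sketch:

```rust
struct Pair {
    a: Vec<i32>,
    b: Vec<i32>,
}

fn main() {
    let mut p = Pair { a: vec![1], b: vec![2] };
    // Borrow one field while mutating a sibling field of the same
    // aggregate; the accesses are disjoint, so both are permitted.
    let a_ref = &p.a;             // shared borrow of p.a only
    p.b.push(3);                  // exclusive access to p.b only
    assert_eq!(a_ref, &vec![1]);  // a_ref is still live and valid here
    assert_eq!(p.b, vec![2, 3]);
}
```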

Is it possible to forward a subelement within an aggregate? No. But we
can fully explode an owned aggregate into individual owned elements
and reconstruct the aggregate. This makes use of the @exploded type
feature described in the calling convention.
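The explode/reconstruct pattern looks like struct destructuring in Rust: the whole aggregate is consumed at once, each field becomes an independently owned value, and the aggregate can be rebuilt afterward — with no partial forward of one field while the rest stays usable. A sketch, with an invented `Agg` type:

```rust
struct Agg {
    x: String,
    y: Vec<u8>,
}

fn main() {
    let agg = Agg { x: "x".to_string(), y: vec![7] };
    // Fully explode the owned aggregate into individually owned elements;
    // `agg` is consumed as a whole (no forwarding of one subelement while
    // the rest of the aggregate remains usable).
    let Agg { x, y } = agg;
    // ...work with the owned pieces, then reconstruct the aggregate:
    let rebuilt = Agg { x, y };
    assert_eq!(rebuilt.x, "x");
    assert_eq!(rebuilt.y, vec![7]);
}
```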

[I don't think forwarding a subelement is useful anyway except for
modeling @inout semantics...]

That leads us to this question: Does an @inout value reference have
formal storage (thus a SIL address) or is it just a convention for
passing owned SSA values?

** World 1: SSA @inout

Projecting an element produces a new SILValue. Does this SILValue have
its own ownership associated with its lifetime, or is it derived
from its parent object by looking through projections?

Either way, projecting any subelement requires reconstructing the
entire aggregate in SIL, through all nesting levels. This will
generate a massive amount of SILValues. Superficially they all need
their own storage.

[We could claim that projections don't need storage, but that only
solves one side of the problem.]

[I argue that this actually obscures the producer/consumer
relationship, which is the opposite of the intention of moving to
SSA. Projecting subelements for mutation fundamentally doesn't make
sense. It does make sense to borrow a subelement (not for
mutation). It also makes sense to project a mutable storage
location. The natural way to project a storage location is by
projecting an address...]

** World 2: @inout formal storage

In this world, @inout references continue to have SILType $*T with
guaranteed exclusive access.

Memory state can be:
- uninitialized
- holds an owned value
  - has exclusive access
  - has shared access

Expected transitions need to be handled:
  - must become uninitialized
  - must become initialized
  - must preserve initialization state

We need to mark initializers with some "must initialize" marker,
similar to how we mark deinitializers [this isn't clear to me yet].

We could give address types qualifiers to distinguish the memory state
of their pointee (uninitialized, shared, exclusive). Addresses
themselves could be pseudo-linear types. This would provide the same
use-def guarantees as the SSA @inout approach, but producing a new
address each time memory changes state would also be complicated and
cumbersome (though not as bad as SSA).

[[
We didn't talk about the alternative, but presumably exclusive
vs. shared scope would be delimited by pseudo memory operations as
such:

%a = alloc_stack

begin_exclusive %a
apply foo(%a) // must be marked an initializer?
end_exclusive %a

begin_shared %a
apply bar(%a) // immutable access
end_shared %a

dealloc_stack %a

Values loaded from shared memory also need to be scoped. They must be
consumed within the shared region. e.g.

%a2 = ref_element_addr

%x = load_borrow %a2

end_borrow %x, %a2

It makes sense to me that a load_borrow would implicitly transition
memory to shared state, and end_borrow would implicitly return memory
to an owned state. If the address type is already ($* @borrowed T), then
memory would remain in the shared state.
]]

For all sorts of analysis and optimization, from borrow checking to
CoW to ARC, we really need aliasing guarantees. Knowing we have a
unique address to a location is about as good as having an owned
value.

To get this guarantee we need to structurally guarantee
unique addresses.

[Is there a way to do this without making all the element_addr
operations scoped?]

With aliasing guarantees, verification should be able to statically
prove that most formal storage locations are properly initialized and
uninitialized (pseudo-linear type) by inspecting the memory
operations.

Likewise, we can verify the shared vs. exclusive states.

Representing @inout with addresses doesn't really add features to
SIL. In any case, SIL address types are still used for
formal storage. Exclusive access through any of the following
operations must be guaranteed dynamically:

- ref_element_addr
- global_addr
- pointer_to_address
- alloc_stack
- project_box

We end up with these basic SIL Types:

$T = owned value

$@borrowed T = shared value

$*T = exclusively accessed

$* @borrowed T = shared access

[I think the non-address @borrowed type is only valid for concrete
types that the compiler knows are not memory-linked? This can be used
to avoid passing borrowed values indirectly for arrays and other
small, free-to-copy values].

[We obviously need to work through concrete examples before we can
claim to have a real design.]

-Andy

_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

We would have to make sure SROA+mem2reg could still kick in. If that happens, I don’t think we need to worry about inout ownership semantics anymore. A struct_extract is then essentially a borrow. Its parent’s lifetime needs to be guaranteed, but I don’t know whether the subobject needs explicit scoping in SIL, since there are no inout scopes to worry about and nothing for the runtime to do when the scope ends.

(Incidentally, this would never happen to a CoW type that has a uniqueness check—to mutate a CoW type, its value needs to be in memory).

-Andy

···

On Oct 10, 2016, at 6:23 PM, Joe Groff <jgroff@apple.com> wrote:

I think there's a size threshold at which SSA @inout is manageable, and might lead to overall better register-oriented code, if the aggregates can be exploded into a small number of individual values. The cost of reconstructing the aggregate could be mitigated somewhat by introducing 'insert' instructions for aggregates to pair with the projection instructions, similar to how LLVM has insert/extractelement. "%x = project_value %y.field; %x' = transform(%x); %y' = insert %y.field, %x" isn't too terrible compared to the address-oriented formulation. Tracking ownership state through projections and insertions might be tricky; I haven't thought about that aspect.

-Joe

Does a uniqueness check still need to be associated with a memory location once we associate ownership with SSA values? It seems to me like it wouldn't necessarily need to be. One thing I'd like us to work toward is being able to reliably apply uniqueness checks to rvalues, so that code in a "pure functional" style gets the same optimization benefits as code that explicitly uses inouts.

-Joe

···

On Oct 10, 2016, at 6:58 PM, Andrew Trick <atrick@apple.com> wrote:

We could have an is_unique instruction that returns a “new” reference to storage. But our model for CoW data types relies on mutating methods, so I don't really know what you have in mind.

-Andy

···

On Oct 11, 2016, at 10:10 AM, Joe Groff <jgroff@apple.com> wrote:

As I've pointed out in the past, this doesn't make any semantic sense. Projecting out a buffer reference as a true r-value creates an independent value and therefore requires bumping the reference count. The only query that makes semantic sense is "does this value hold a unique reference to its buffer", which requires some sort of language tool for talking abstractly about values without creating new, independent values. Our only existing language tool for that is inout, which allows you to talk about the value stored in a specific mutable variable. Ownership will give us a second and more general tool, borrowing, which allows you to abstractly refer to immutable existing values.

My point is that this isn't a SIL representational problem, it's a semantic problem that's being correctly reflected in SIL. The language tools which will allow us to fix the semantic problem will necessarily also need to be reflected correctly in SIL, and so there won't be a representational problem.

John.

···

On Oct 11, 2016, at 10:10 AM, Joe Groff via swift-dev <swift-dev@swift.org> wrote:

It doesn't fundamentally have to be tied to mutating methods. After all, you ought to be able to take a value parameter you received as uniquely-referenced, and work on that in-place:

func appendTwoArrays(a: [Int], b: [Int]) -> [Int] {
  var a2 = __move__ a // fake syntax to force a move of ownership
  if isUniquelyReferenced(&a2) {
    a2.buffer._appendInPlace(b.buffer)
  } else {
    a2.buffer = Array(buffer: ArrayBuffer(appending: a2.buffer, and: b.buffer))
  }
  return a2
}
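A rough Rust analogue of this sketch, with `Rc` standing in for the hypothetical uniquely-checkable CoW buffer (the uniqueness test here is `strong_count`/`weak_count`, mirroring `isUniquelyReferenced`, not the proposed Swift semantics):

```rust
use std::rc::Rc;

// Take the value by move, then branch on uniqueness of the owned handle:
// mutate in place if unique, otherwise copy first (copy-on-write).
fn append(mut a: Rc<Vec<i32>>, b: &[i32]) -> Rc<Vec<i32>> {
    if Rc::strong_count(&a) == 1 && Rc::weak_count(&a) == 0 {
        // Unique: safe to mutate the shared storage in place.
        Rc::get_mut(&mut a).unwrap().extend_from_slice(b);
    } else {
        // Shared: copy the buffer, mutate the copy, replace the handle.
        let mut copied = (*a).clone();
        copied.extend_from_slice(b);
        a = Rc::new(copied);
    }
    a
}

fn main() {
    let unique = Rc::new(vec![1]);
    assert_eq!(*append(unique, &[2]), vec![1, 2]);  // in-place path

    let shared = Rc::new(vec![1]);
    let keep = Rc::clone(&shared);
    assert_eq!(*append(shared, &[3]), vec![1, 3]);  // copy path
    assert_eq!(*keep, vec![1]);                     // original unchanged
}
```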

-Joe

···

On Oct 11, 2016, at 10:33 AM, Andrew Trick <atrick@apple.com> wrote:

If we have @owned values, then we also have the ability to do a uniqueness check on that value, don't we? This would necessarily consume the value, but we could conditionally produce a new known-unique value on the path where the uniqueness check succeeds.

entry(%1: @owned $X):
  is_uniquely_referenced %1, yes, no
yes(%2: /*unique*/ @owned $X):
  // %2 is unique, until copied at least
no(%3: @owned $X):
  // %3 is not
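A rough Rust analogue of this branch, using `Rc::try_unwrap` as the consuming uniqueness check (`Rc` stands in for a hypothetical uniquely-checkable reference; this is an analogy, not the proposed SIL semantics):

```rust
use std::rc::Rc;

// Consuming the owned handle yields either the unique payload
// (the `yes` block) or the still-shared handle back (the `no` block).
fn check(x: Rc<String>) -> &'static str {
    match Rc::try_unwrap(x) {
        Ok(_unique) => "unique",  // sole owner: payload moved out
        Err(_shared) => "shared", // other references exist; handle returned
    }
}

fn main() {
    let a = Rc::new(String::from("v"));
    assert_eq!(check(a), "unique");

    let b = Rc::new(String::from("v"));
    let _keep = Rc::clone(&b);
    assert_eq!(check(b), "shared");
}
```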

-Joe

···

On Oct 11, 2016, at 10:50 AM, John McCall <rjmccall@apple.com> wrote:

You had to copy $X to make it @owned. You could check uniqueness of @borrowed $X, but then you’d need to copy to create a new array (mutation) before destroying the original that you borrowed from.

-Andy

···

On Oct 11, 2016, at 11:02 AM, Joe Groff <jgroff@apple.com> wrote:

On Oct 11, 2016, at 10:50 AM, John McCall <rjmccall@apple.com> wrote:

On Oct 11, 2016, at 10:10 AM, Joe Groff via swift-dev <swift-dev@swift.org> wrote:

On Oct 10, 2016, at 6:58 PM, Andrew Trick <atrick@apple.com> wrote:

On Oct 10, 2016, at 6:23 PM, Joe Groff <jgroff@apple.com> wrote:

On Oct 7, 2016, at 11:10 PM, Andrew Trick via swift-dev <swift-dev@swift.org> wrote:
** World 1: SSA @inout

Projecting an element produces a new SILValue. Does this SILValue have
it's own ownership associated with it's lifetime, or is it derived
from it's parent object by looking through projections?

Either way, projecting any subelement requires reconstructing the
entire aggregate in SIL, through all nesting levels. This will
generate a massive amount of SILValues. Superficially they all need
their own storage.

[We could claim that projections don't need storage, but that only
solves one side of the problem.]

[I argue that this actually obscures the producer/consumer
relationship, which is the opposite of the intention of moving to
SSA. Projecting subelements for mutation fundamentally doesn't make
sense. It does make sense to borrow a subelement (not for
mutation). It also makes sense to project a mutable storage
location. The natural way to project a storage location is by
projecting an address...]

I think there's a size threshold at which SSA @inout is manageable, and might lead to overall better register-oriented code, if the aggregates can be exploded into a small number of individual values. The cost of reconstructing the aggregate could be mitigated somewhat by introducing 'insert' instructions for aggregates to pair with the projection instructions, similar to how LLVM has insert/extractelement. "%x = project_value %y.field; %x' = transform(%x); %y' = insert %y.field, %x" isn't too terrible compared to the address-oriented formulation. Tracking ownership state through projections and insertions might tricky; haven't thought about that aspect.

-Joe

We would have to make sure SROA+mem2reg could still kick in. If that happens, I don’t think we need to worry about inout ownership semantics anymore. A struct_extract is then essentially a borrow. Its parent’s lifetime needs to be guaranteed, but I don’t know if the subobject needs explicit scoping in SIL, since there are no inout scopes to worry about and nothing for the runtime to do when the scope ends.

(Incidentally, this would never happen to a CoW type that has a uniqueness check: to mutate a CoW type, its value needs to be in memory).

Does a uniqueness check still need to be associated with a memory location once we associate ownership with SSA values? It seems to me like it wouldn't necessarily need to be. One thing I'd like us to work toward is being able to reliably apply uniqueness checks to rvalues, so that code in a "pure functional" style gets the same optimization benefits as code that explicitly uses inouts.

As I've pointed out in the past, this doesn't make any semantic sense. Projecting out a buffer reference as a true r-value creates an independent value and therefore requires bumping the reference count. The only query that makes semantic sense is "does this value hold a unique reference to its buffer", which requires some sort of language tool for talking abstractly about values without creating new, independent values. Our only existing language tool for that is inout, which allows you to talk about the value stored in a specific mutable variable. Ownership will give us a second and more general tool, borrowing, which allows you to refer abstractly to immutable existing values.

If we have @owned values, then we also have the ability to do a uniqueness check on that value, don't we? This would necessarily consume the value, but we could conditionally produce a new known-unique value on the path where the uniqueness check succeeds.

entry(%1: @owned $X):
is_uniquely_referenced %1, yes, no
yes(%2: /*unique*/ @owned $X):
// %2 is unique, until copied at least
no(%3: @owned $X):
// %3 is not

-Joe


You had to copy $X to make it @owned.

This is the part I think I'm missing. It's not clear to me why this is the case, though. You could have had an Array return value that has never been stored in memory, so never needed to be copied. If you have an @inout memory location, and we enforce the single-owner property on inouts so that they act like a Rust-style mutable borrow, then you should also be able to take the value out of the memory location as long as you move a value back in before the scope of the inout expires.

-Joe

···

On Oct 11, 2016, at 11:19 AM, Andrew Trick <atrick@apple.com> wrote:


You could check uniqueness of @borrowed $X, but then you’d need to copy to create a new array (mutation) before destroying the original that you borrowed from.

-Andy

I'm not sure what your goal is here vs. relying on borrowing. Both still require actual analysis to prove uniqueness at any given point, as you note with your "until copied at least" comment.

Also, from a higher level, I'm not sure why we care whether a value that was semantically an r-value held a unique reference. CoW types are immutable even if the reference is shared, and that should be structurally straightforward to take advantage of under any ownership representation.

John.

···

On Oct 11, 2016, at 11:22 AM, Joe Groff <jgroff@apple.com> wrote:


My high-level goal was to get to a point where we could support in-place optimizations on unique buffers that belong to values that are semantically rvalues at the language level. It seems to me that we ought to be able to make 'stringA + B + C + D' as efficient as '{ var tmp = stringA; tmp += B; tmp += C; tmp += D; tmp }()' by enabling uniqueness checks and in-place mutation of the unique-by-construction results of +-ing strings. If you think that works under the borrow/inout-in-memory model, then no problem; I'm also trying to understand the design space a bit more.

-Joe

···

On Oct 11, 2016, at 11:44 AM, John McCall <rjmccall@apple.com> wrote:


Ah right, that optimization. The problem here with using borrows is that you really want static enforcement that both (1) you've really got ownership of a unique reference (so e.g. you aren't just forwarding a borrowed value down) and (2) you're not accidentally copying the reference and so ruining the uniqueness check. Those are hard guarantees to get with an implicitly-copyable type.

I wonder if it would make more sense to make copy-on-write buffer references a move-only type, so that as long as you were just working with the raw reference (as opposed to the CoW aggregate, which would remain copyable) it wouldn't get implicitly copied anymore. You could have mutable and immutable buffer reference types, both move-only, and there could be a consuming checkUnique operation on the immutable one that, I dunno, returned an Either of the mutable and immutable versions.

For CoW aggregates, you'd need some @copied attribute on the field to make sure that the CoW aggregate was still copyable. Within the implementation of the type, though, you would be projecting out the reference immediately, and thereafter you'd be certain that you were borrowing / moving it around as appropriate.

I dunno. It's an idea.

John.

···

On Oct 11, 2016, at 11:49 AM, Joe Groff <jgroff@apple.com> wrote:


So, to project a MutableArrayStorage we would need to explode an owned Array into an owned ConstArrayStorage (forwarding its value). Then pass it to:
  isUniqueOwned(ConstArrayStorage) -> either<MutableArrayStorage, ConstArrayStorage>

Making it move-only should give us the necessary guarantee for Erik's CoW proposal: Nothing can mutate the storage as long as ConstArrayStorage is alive.

Then how would we get MutableArrayStorage from inout Array?
We can project the address of an inout ConstArrayStorage from the inout Array. Then we need to magically cast its address to *MutableArrayStorage, and somehow tie it to the inout ConstArrayStorage scope.

-Andy

···

On Oct 11, 2016, at 2:14 PM, John McCall <rjmccall@apple.com> wrote:


Well, makeUnique would be a ConstArrayStorage -> MutableArrayStorage function, and when you were done, you would demote to ConstArrayStorage and write back. It's perhaps a little ugly.

John.

···

On Oct 11, 2016, at 4:48 PM, Andrew Trick <atrick@apple.com> wrote:

On Oct 11, 2016, at 2:14 PM, John McCall <rjmccall@apple.com> wrote:

On Oct 11, 2016, at 11:49 AM, Joe Groff <jgroff@apple.com> wrote:

On Oct 11, 2016, at 11:44 AM, John McCall <rjmccall@apple.com> wrote:

On Oct 11, 2016, at 11:22 AM, Joe Groff <jgroff@apple.com> wrote:

On Oct 11, 2016, at 11:19 AM, Andrew Trick <atrick@apple.com> wrote:

On Oct 11, 2016, at 11:02 AM, Joe Groff <jgroff@apple.com> wrote:

On Oct 11, 2016, at 10:50 AM, John McCall <rjmccall@apple.com> wrote:

On Oct 11, 2016, at 10:10 AM, Joe Groff via swift-dev <swift-dev@swift.org> wrote:

On Oct 10, 2016, at 6:58 PM, Andrew Trick <atrick@apple.com> wrote:

On Oct 10, 2016, at 6:23 PM, Joe Groff <jgroff@apple.com> wrote:

On Oct 7, 2016, at 11:10 PM, Andrew Trick via swift-dev <swift-dev@swift.org> wrote:
** World 1: SSA @inout

Projecting an element produces a new SILValue. Does this SILValue have
its own ownership associated with its lifetime, or is it derived
from its parent object by looking through projections?

Either way, projecting any subelement requires reconstructing the
entire aggregate in SIL, through all nesting levels. This will
generate a massive amount of SILValues. Superficially they all need
their own storage.

[We could claim that projections don't need storage, but that only
solves one side of the problem.]

[I argue that this actually obscures the producer/consumer
relationship, which is the opposite of the intention of moving to
SSA. Projecting subelements for mutation fundamentally doesn't make
sense. It does make sense to borrow a subelement (not for
mutation). It also makes sense to project a mutable storage
location. The natural way to project a storage location is by
projecting an address...]

I think there's a size threshold at which SSA @inout is manageable, and might lead to overall better register-oriented code, if the aggregates can be exploded into a small number of individual values. The cost of reconstructing the aggregate could be mitigated somewhat by introducing 'insert' instructions for aggregates to pair with the projection instructions, similar to LLVM's insertvalue/extractvalue. "%x = project_value %y.field; %x' = transform(%x); %y' = insert %y.field, %x'" isn't too terrible compared to the address-oriented formulation. Tracking ownership state through projections and insertions might be tricky; I haven't thought about that aspect.
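Written out as a block, the two formulations compare roughly like this (project_value and insert are hypothetical instructions sketched here, not existing SIL):

```
// Address-oriented formulation: mutate the field in place.
%a = struct_element_addr %y_addr : $*S, #S.field
apply @transform(%a)

// Hypothetical value-oriented formulation: project the field,
// transform it, and re-insert to rebuild the aggregate.
%x  = project_value %y : $S, #S.field
%x2 = apply @transform(%x)
%y2 = insert %y : $S, #S.field, %x2
```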

-Joe

We would have to make sure SROA+mem2reg could still kick in. If that happens, I don’t think we need to worry about inout ownership semantics anymore. A struct_extract is then essentially a borrow. Its parent’s lifetime needs to be guaranteed, but I don’t know if the subobject needs explicit scoping in SIL since there are no inout scopes to worry about and nothing for the runtime to do when the scope ends.

(Incidentally, this would never happen to a CoW type that has a uniqueness check—to mutate a CoW type, its value needs to be in memory).
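For the loadable case this mostly falls out of existing SIL; a sketch of the post-SROA/mem2reg shape:

```
%s = load [copy] %addr : $*S        // S promoted to an SSA value
%f = struct_extract %s : $S, #S.x   // effectively a borrow of %s
// %f is only valid while %s is alive; no explicit end-of-borrow
// operation is needed, since nothing runs at runtime when the
// borrow ends.
```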

Does a uniqueness check still need to be associated with a memory location once we associate ownership with SSA values? It seems to me like it wouldn't necessarily need to be. One thing I'd like us to work toward is being able to reliably apply uniqueness checks to rvalues, so that code in a "pure functional" style gets the same optimization benefits as code that explicitly uses inouts.

As I've pointed out in the past, this doesn't make any semantic sense. Projecting out a buffer reference as a true r-value creates an independent value and therefore requires bumping the reference count. The only query that makes semantic sense is "does this value hold a unique reference to its buffer", which requires some sort of language tool for talking abstractly about values without creating new, independent values. Our only existing language tool for that is inout, which allows you to talk about the value stored in a specific mutable variable. Ownership will give us a second and more general tool, borrowing, which allows you to refer abstractly to immutable existing values.

If we have @owned values, then we also have the ability to do a uniqueness check on that value, don't we? This would necessarily consume the value, but we could conditionally produce a new known-unique value on the path where the uniqueness check succeeds.

entry(%1: @owned $X):
is_uniquely_referenced %1, yes, no
yes(%2: /*unique*/ @owned $X):
// %2 is unique, until copied at least
no(%3: @owned $X):
// %3 is not

-Joe

You had to copy $X to make it @owned.

This is the part I think I'm missing. It's not clear to me why this is the case, though. You could have had an Array return value that has never been stored in memory, so never needed to be copied. If you have an @inout memory location, and we enforce the single-owner property on inouts so that they act like a Rust-style mutable borrow, then you should also be able to take the value out of the memory location as long as you move a value back in before the scope of the inout expires.

I'm not sure what your goal is here vs. relying on borrowing. Both still require actual analysis to prove uniqueness at any given point, as you note with your "until copied at least" comment.

Also, from a higher level, I'm not sure why we care whether a value that was semantically an r-value was a unique reference. CoW types are immutable even if the reference is shared, and that should be structurally straightforward to take advantage of under any ownership representation.

My high-level goal was to get to a point where we could support in-place optimizations on unique buffers that belong to values that are semantically rvalues at the language level. It seems to me that we ought to be able to make 'stringA + B + C + D' as efficient as '{ var tmp = stringA; tmp += B; tmp += C; tmp += D; tmp }()' by enabling uniqueness checks and in-place mutation of the unique-by-construction results of +-ing strings. If you think that works under the borrow/inout-in-memory model, then no problem; I'm also trying to understand the design space a bit more.

Ah right, that optimization. The problem here with using borrows is that you really want static enforcement that both (1) you've really got ownership of a unique reference (so e.g. you aren't just forwarding a borrowed value down) and (2) you're not accidentally copying the reference and so ruining the uniqueness check. Those are hard guarantees to get with an implicitly-copyable type.

I wonder if it would make more sense to make copy-on-write buffer references a move-only type, so that as long as you were just working with the raw reference (as opposed to the CoW aggregate, which would remain copyable) it wouldn't get implicitly copied anymore. You could have mutable and immutable buffer reference types, both move-only, and there could be a consuming checkUnique operation on the immutable one that, I dunno, returned an Either of the mutable and immutable versions.

For CoW aggregates, you'd need some @copied attribute on the field to make sure that the CoW attribute was still copyable. Within the implementation of the type, though, you would be projecting out the reference immediately, and thereafter you'd be certain that you were borrowing / moving it around as appropriate.
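A rough sketch of the move-only buffer-reference idea, using present-day Swift ~Copyable/consuming syntax (none of which existed when this was written; all names here are invented for illustration):

```swift
// Raw storage object; the class reference is what gets checked for
// uniqueness.
final class RawStorage { var elements: [Int] = [] }

// Move-only reference wrappers: no implicit copies can sneak in and
// ruin a uniqueness check.
struct ConstStorageRef: ~Copyable { let object: RawStorage }
struct MutableStorageRef: ~Copyable { let object: RawStorage }

// The "Either of the mutable and immutable versions".
enum CheckedStorage: ~Copyable {
    case unique(MutableStorageRef)
    case shared(ConstStorageRef)
}

// Consuming uniqueness check: takes ownership of the const reference
// and yields a mutable reference only on the proven-unique path.
func checkUnique(_ ref: consuming ConstStorageRef) -> CheckedStorage {
    var object = ref.object
    if isKnownUniquelyReferenced(&object) {
        return .unique(MutableStorageRef(object: object))
    }
    return .shared(ConstStorageRef(object: object))
}
```

The CoW aggregate itself would stay copyable, with something like the @copied field attribute described above mediating copies of the stored reference.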

I dunno. It's an idea.

John.

So, to project a MutableArrayStorage we would need to explode an owned Array into an owned ConstArrayStorage (forwarding its value). Then pass it to:
isUniqueOwned(ConstArrayStorage) -> either<MutableArrayStorage, ConstArrayStorage>

Making it move-only should give us the necessary guarantee for Erik's CoW proposal: Nothing can mutate the storage as long as ConstArrayStorage is alive.

Then how would we get MutableArrayStorage from inout Array?
We can project the address of an inout ConstArrayStorage from the inout Array. Then we need to magically cast its address to *MutableArrayStorage, and somehow tie it to the inout ConstArrayStorage scope.
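A SIL-flavored sketch of that shape (the address cast uses unchecked_addr_cast purely for illustration; how to tie it to the inout scope is exactly the open question):

```
%a = struct_element_addr %array_addr : $*Array, #Array.storage
// %a : $*ConstArrayStorage, projected from the inout Array
%m = unchecked_addr_cast %a : $*ConstArrayStorage to $*MutableArrayStorage
// ... mutate through %m ...
// Before the inout scope ends, %a must once again hold a valid
// ConstArrayStorage (the "demote and write back" step).
```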