Why doesn't enum destructuring use guaranteed references?

Running this on my MBP with 10 as the parameter: https://gist.github.com/zneak/ae33bb970a08632cfb2925e2049f9e7a

I get a runtime of about 10 seconds, 45% of which is spent in retain/release calls according to Instruments (!!), and at least half of that from Expr.count. Looking at the IR, Swift generously sprinkles retain/release calls through the outlined copy method:

  • `self` is retained at the beginning of `count`
  • The values that you get out of destructuring are retained
    • Of course, when you get `count` on these, they are retained again

Of course, Expr.count cannot modify itself or its subexpressions because the functions are not mutating; in fact, no function in that program can mutate an enum case. Why, then, is Swift retaining/releasing `self` and the values obtained from destructured patterns? They can't go away. Shouldn't we be getting guaranteed references instead of owning references?

That seems to hit pattern-matching-heavy programs with indirect cases pretty hard, and it's pretty frustrating because that seems to be about the nicest way to write this program, and there's no workaround from the developer's perspective. I don't think that this is a fatal flaw of refcounting, but unless I'm missing something, that's sub-par behavior.

Félix

<moving to swift-dev, where most of the perf optimization people lurk>

This is a great question; I’m not sure what the answer is. Maybe it is a simple case missed by the ARC optimizer?

-Chris

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Ran into this over the summer. My understanding is that Michael Gottesman (CC’d) has been looking into pattern matching at +0.

The code in SILGenPattern needs to be reworked, but I found this problem intersects with the really old dynamic casting entry points in SILGen too which would seem to complicate a “simple fix”. It’d be good to hear what plans, if any, there are to address this. It would be a real win to clean all of this up.

~Robert Widmann


It is a bit more complicated than that. There are codegen aspects and optimization aspects. In both cases there are "structural issues" that I am currently in the process of fixing as part of the "semantic sil" effort. There are some retain/release pairs that we could eliminate by teaching the optimizer about enums/indirect boxes. I go over all of this below.

TLDR: There are structural issues here that make this difficult/unsafe in the general case. Semantic SIL eliminates all of the structural issues with ARC and makes this problem (among other problems) "trivial", so we have been focusing our engineering time on implementing that. That being said, we could use the effects analysis to handle this trivial case. But it would have to be very conservative. Chris, feel like doing some ARC Optimization? = p.

# Codegen Aspects

## Background Info: Local Retain/Release Pairs

If one does not know about local vs non-local retain/release pairs, see the appendix for an explanation.

## +0 Self

In Swift today all parameters except for self are passed at +1; self is passed at +0. Even so, SILGen is often conservative around self and will emit a copy/destroy around the call site. This is actually better: it trades a non-local retain/release pair split between caller and callee for a local retain/release pair in the caller, which the optimizer can eliminate without any form of interprocedural optimization. That is why there is a retain/release at the count call. I will go into why we are not optimizing it in the optimization section below.
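In SIL pseudocode, the caller-side pattern looks roughly like this (a sketch with invented names, not actual SILGen output):

----
// Caller of the count getter:
strong_retain %self                  // local copy emitted conservatively by SILGen
%n = apply %count_getter(%self)      // self is passed @guaranteed (+0)
strong_release %self                 // local destroy in the same function
----

Because both halves of the pair live in the caller, a purely intraprocedural pass can pair and delete them.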

## Pattern Matching at +1

All pattern matching in Swift today is done at +1. That means that in the following Swift:


----
enum Expr: CustomStringConvertible {
  case Int(n: Int)
  indirect case Var(x: String)
  indirect case Add(f: Expr, g: Expr)
  indirect case Mul(f: Expr, g: Expr)
  indirect case Pow(f: Expr, g: Expr)
  indirect case Ln(f: Expr)

       ...

  var count: Int {
    switch self {
    case .Int, .Var: return 1
    case let .Ln(f): return f.count
    case let .Add(f, g), let .Mul(f, g), let .Pow(f, g):
      return f.count + g.count
    }
  }
}
----

even though self is passed at +0 to count, SILGen will emit a retain before the switch. This is unfortunate and adds unnecessary ARC traffic in read-only methods on enums. There is no reason switches must be emitted this way, so as part of the work to make SILGenPattern able to pass the ownership verifier, I plan on changing all pattern emission to be done at +0. As an added benefit, once that work is complete, refactorings in SILGenPattern will be exposed that will enable us to significantly simplify SILGenPattern's handling of cleanups. That code has led to obscure bugs in the past and is IMO overly complex.
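As a sketch (invented value names, heavily simplified), the +1 switch emission for the Add case looks something like:

----
// %0 = self, passed @guaranteed (+0)
strong_retain %0                          // the retain emitted before the switch
switch_enum %0 : $Expr, case #Expr.Add!enumelt: bb1, ...

bb1(%box : $<Add payload box>):           // payload comes out at +1
  // ... project f and g, recurse into count ...
  strong_release %0
----

Pattern emission at +0 would drop both the retain and the release.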


(FYI I am already writing a longer response, but got interrupted by visiting with my mother ; )). Give me a bit.


Thanks Michael! Other people might have different opinions, but knowing that a "definitive" fix is on the way is plenty for me and I'm not looking to make a fuss.

If you still have some time, I have another question or two. You said that only self is passed at +0. Does this mean, for instance, that custom operators would still get +1 Expr values? It seems to me that if you receive any boxed value at +0, then it's always safe to also pass it further at +0 since it is immutable. That would have a huge impact on this program. Are there downsides to more pervasive +0 parameters?

Félix

On Dec 29, 2017, at 3:42 PM, Michael Gottesman <mgottesman@apple.com> wrote:


# Optimizer Aspects

Let's start by looking at count.getter. Here is the SIL in CFG form:

<count.getter.pdf>

Let's analyze bb4 (I think bb5 suffers from similar issues).

The retain/release pair on %0 here suffers from the guaranteed optimization being turned off. The guaranteed optimization says that any retain/release pair on a guaranteed value can be eliminated if it is a "semantic pairing". We had to turn off this optimization because, without semantic-sil, we were guessing which retain/release pairings were "semantic pairings"; in certain cases this resulted in mispairings, so we disabled it in the general case. Specifically, consider the following straw man:

----
strong_retain %x
...
strong_release obfuscated(%x)
...
strong_retain obfuscated'(%x)
...
strong_release %x
----

in this case we would pair and remove the outer retain/release pair, yielding:

----
strong_release obfuscated(%x)
...
strong_retain obfuscated'(%x)
----

While the reference count on %x is still balanced, the balancing is now in inverse order, which can result in premature releases and use-after-frees. Once semantic-sil is complete, all pairings are explicit in the IR, so such a bug cannot happen. To work around the problem today, we are conservative and require any retain/release pair around a guaranteed parameter to be +1 before the pair and to have a use afterwards.

The second retain/release pair (i.e. %13) is slightly different: it suffers from the optimizer needing to make a connection between the box in the enum and the value loaded from the box. Today we have enough knowledge in the optimizer to make the first connection without issue via RCIdentity analysis. We do not, however, make the connection between the box being guaranteed and the value loaded from the box being effectively guaranteed as a result. Even if we made that connection today, I am not sure it would be safe in the general case. With semantic-sil all of those problems go away.

For both of these, I think that by using an appropriate interprocedural effects analysis (which we have), we could teach the ARC optimizer to be more aggressive around guaranteed values. The specific properties would be:

1. No side-effects (ignoring retain/releases).
2. No retains/releases that are unbalanced.
3. No escapes.

I would have to think in more detail in order to be sure though.

# Appendix

## Local Retain/Release Pairs

There are two types of retain/release pairs in Swift: function-local and non-local. An example of a local retain/release pair is when one creates a new binding, i.e.:

----
func f(x: Klass) {
  let xhat = x
  ...
}
----

When the assignment to xhat occurs we retain x, and at the end of the function (assuming that xhat is not used after that point) we release xhat. Since both the retain and the release are in f, we can potentially pair and eliminate them without needing to modify the call graph. In contrast, a function non-local pairing is created when a parameter is passed at +1, e.g.:

----
func g(x: Klass) { ... x is used but not consumed ... }
func f(x: Klass) {
  g(x: x)
}
----

In this case, when we call g (ignoring any peepholes), we retain x first in f (the caller) and then release x in g (the callee). This is bad for optimization purposes since it means that we cannot eliminate the retain/release pair without modifying the call graph or inlining. For non-resilient functions, we of course can and do inline and modify the call graph via function signature optimization. For resilient functions this is impossible by the very nature of resilience.
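The same two shapes in SIL pseudocode (a simplified sketch):

----
// Local pair: both halves in f; pairable without touching the call graph.
sil @f:
  strong_retain %x        // from `let xhat = x`
  ...
  strong_release %x       // end of f

// Non-local pair: retain in the caller f, release in the callee g.
sil @f:
  strong_retain %x        // x passed at +1
  apply @g(%x)
sil @g:
  ...
  strong_release %x       // g consumes the +1 it received
----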


Thanks Michael! Other people might have different opinions, but knowing that a "definitive" fix is on the way is plenty for me and I'm not looking to make a fuss.

No worries. Happy to help = ). If you are interested, I would be happy to help guide you in implementing one of these optimizations.

If you still have some time, I have another question or two. You said that only self is passed at +0. Does this mean, for instance, that custom operators would still get +1 Expr values?

Yes. But again, assuming a non-resilient operator, the optimizer can in many cases convert the +1 to +0 via the owned->guaranteed optimization.
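To make that concrete: a hypothetical custom operator on a trimmed-down Expr (illustrative code, not the gist's) whose operands arrive at +1 under today's convention:

```swift
indirect enum Expr {
  case int(Int)
  case add(Expr, Expr)

  // Read-only getter: `self` is the lone +0 (guaranteed) parameter.
  var count: Int {
    switch self {
    case .int: return 1
    case let .add(f, g): return f.count + g.count
    }
  }
}

// Both `lhs` and `rhs` are ordinary parameters, so they are passed at
// +1 (owned); for non-resilient code the owned->guaranteed optimization
// may later demote them to +0.
func + (lhs: Expr, rhs: Expr) -> Expr {
  return .add(lhs, rhs)
}

let sum = Expr.int(1) + Expr.int(2)
assert(sum.count == 2)
```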

It seems to me that if you receive any boxed value at +0, then it's always safe to also pass it further at +0 since it is immutable.
That would have a huge impact on this program. Are there downsides to more pervasive +0 parameters?

The main downside is that in some cases you do want to have +1 parameters, namely places where you are forwarding values into a function for storage. An example of such a case is a value passed to a setter or an initializer. I would think of it more as an engineering trade-off.

I am currently experimenting with changing normal function arguments to +0 for this purpose, leaving initializers and setters as taking values at +1. Regardless of the default, we can always provide an attribute for the purpose of allowing the user to specify such behavior.
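As an illustration of such an attribute, here are the underscored spellings that later Swift compilers accept (not available at the time of this thread; treat the exact spellings as an assumption):

```swift
final class Klass {}

// +0 (guaranteed): good for read-only uses; the caller need not retain.
func reader(_ x: __shared Klass) {
  _ = x  // read-only use
}

struct Container {
  var stored: Klass
  // +1 (owned): good when the value is forwarded into storage.
  init(_ x: __owned Klass) { self.stored = x }
}

let k = Klass()
reader(k)
let c = Container(k)
assert(c.stored === k)
```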

Michael


No worries. Happy to help = ). If you are interested, I would be happy to help guide you in implementing one of these optimizations.

That sounds fun. I'll have to check with my manager after the holidays.

The main downside is that in some cases you do want to have +1 parameters, namely places where you are forwarding values into a function for storage. An example of such a case is a value passed to a setter or an initializer. I would think of it more as an engineering trade-off.

I am currently experimenting with changing normal function arguments to +0 for this purpose, leaving initializers and setters as taking values at +1. Regardless of the default, we can always provide an attribute for the purpose of allowing the user to specify such behavior.

It sounds like having flexible parameter ownership rules doesn't have too much overhead if it can be user-specified (in some future). Would it be feasible to use escape analysis to decide if a parameter should be +0 or +1?

More ARC questions! I remember that some time ago, someone else (John McCall?) said that references aren't tied to a scope. This was in the context of deterministic deallocation; the point was that, contrary to C++, an object could potentially be released immediately after its last use rather than at the end of its scope. However, that's not really what you said ("When the assignment to xhat occurs we retain x and at the end of the function [...], we release xhat"), and it's not how Swift 4 works from what I can tell (release is not called "as soon as possible" in my limited testing).
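A small self-contained experiment (hypothetical code) makes the scope-based behavior observable; the exact release points depend on optimization level, but in unoptimized builds both objects stay alive to the end of the function:

```swift
final class Tracked {
  static var releasedCount = 0
  deinit { Tracked.releasedCount += 1 }
}

func demo() {
  let a = Tracked()
  _ = a               // last use of `a`
  // A release-after-last-use scheme could free `a` here; with
  // scope-based cleanups, `a` stays alive while `b` does its work.
  let b = Tracked()
  _ = b
}

demo()
assert(Tracked.releasedCount == 2)  // both freed once demo() returns
```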

It seems to me that this penalizes functional-style programming in at least two ways:

1. It kills tail call optimization, because the compiler will often put a release call in the tail position.
2. Especially in recursion-heavy algorithms, objects can be kept around for longer than they need to be.

Here's an example where both hit:

class Foo {
  var value: [Int]
  
  init() {
    value = Array<Int>(repeating: 0, count: 10000)
  }
  
  init(from foo: Foo) {
    value = foo.value
    value[0] += 1
  }
}

func silly(loop: Int, foo: Foo) -> Foo {
  guard loop != 0 else { return foo }
  let copy = Foo(from: foo)
  return silly(loop: loop - 1, foo: copy)
}

print(silly(loop: 10000, foo: Foo()).value[0])

I wouldn't call that "expert Swift code" (indeed, if I come to my senses <https://github.com/apple/swift-evolution/blob/master/proposals/0193-cross-module-inlining-and-specialization.md#proposed-solution> and use a regular loop, it does just fine), but in this form, it does need about 800MB of memory, and it can't use tail calls.
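The "regular loop" form alluded to here might look like the following — a sketch, with `Foo` repeated from the snippet above so it is self-contained, and `sillyLoop` an invented name:

```swift
class Foo {
    var value: [Int]

    init() {
        value = Array(repeating: 0, count: 10_000)
    }

    init(from foo: Foo) {
        value = foo.value
        value[0] += 1
    }
}

// Iterative rewrite: each old copy's last reference dies on the next
// loop iteration, so memory stays flat and no tail call is needed.
func sillyLoop(loop: Int, foo: Foo) -> Foo {
    var current = foo
    for _ in 0..<loop {
        current = Foo(from: current)
    }
    return current
}
```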

The function has the opposite problem from the pattern matching case. It is specialized such that `foo` is passed at +0: it is retained at `return foo` and released (in the caller) after the call to `silly`. However, the optimal implementation would pass it at +1, do nothing for `return foo`, and release it (in the callee) after the call to `Foo(from: foo)`. (Or, even better, it would release it after `value = foo.value` in the init function.)

I'll note that escape analysis would correctly find that +1 is the "optimal" ownership convention for `foo` in `silly` :) but it won't actually solve either the memory use problem or the missed tail call unless the release call is also moved up.

I guess that the question is: what does Swift gain by keeping objects around for longer than they need to be? Is it all about matching C++ or is there something else?

Félix

···

Le 29 déc. 2017 à 20:50, Michael Gottesman <mgottesman@apple.com> a écrit :

No worries. Happy to help = ). If you are interested, I would be happy to help guide you in implementing one of these optimizations.

That sounds fun. I'll have to check with my manager after the holidays.

Nerd snipe success? = p. In general, we haven't had many people from the open-source community working on the optimizer. This is unfortunate, so I am more than happy to help.

The main downside is that in some cases you do want to have +1 parameters, namely places where you are forwarding values into a function for storage. An example of such a case is a value passed to a setter or an initializer. I would think of it more as an engineering trade-off.

I am currently experimenting with changing normal function arguments to +0 for this purpose, leaving initializers and setters as taking values at +1. Regardless of the default, we can always provide an attribute for the purpose of allowing the user to specify such behavior.

It sounds like having flexible parameter ownership rules doesn't have too much overhead if it can be user-specified (in some future). Would it be feasible to use escape analysis to decide if a parameter should be +0 or +1?

No. A parameter's convention is ABI. You don't want to change ABI-related things like that via escape analysis since it means that as a function changes, due to the optimizer, the ABI can change =><=. *BUT* recently Adam Nemet has added support for LLVM's opt remark infrastructure to Swift. Something like that /could/ provide suggestions when it is run that this might be a profitable parameter to change from +0 to +1 or vice versa. Then the user could make that change via annotation. Keep in mind that this would most likely be an ABI breaking change.

More ARC questions! I remember that some time ago, someone else (John McCall?) said that references aren't tied to a scope. This was in the context of deterministic deallocation, and the point was that contrary to C++, an object could potentially be released immediately after its last use in the function rather than at the end of its scope. However, that's not really what you said ("When the assignment to xhat occurs we retain x and at the end of the function [...], we release xhat")

You are extrapolating too far from my example. That being said, I was unclear. What I was trying to show was the difference in between a case where xhat is passed off to another function, e.g.:

func g(_ x: AnyObject) {
  let xhat = x
  h(xhat)   // xhat is forwarded (consumed) by the call to h
}

and a situation where xhat is not consumed and is destroyed within g.
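Filling in enough types for both cases to compile — `Thing`, `h`, and `g2` are invented names, and this is only a sketch of the contrast being described:

```swift
final class Thing {
    var payload = 0
}

func h(_ t: Thing) -> Int { t.payload }

// xhat is forwarded to another function, as in the sketch above:
// ownership of the reference can be handed off to h.
func g(_ x: Thing) -> Int {
    let xhat = x
    return h(xhat)
}

// xhat never escapes g2: its reference can be dropped as soon as
// the read of payload is done, inside g2 itself.
func g2(_ x: Thing) -> Int {
    let xhat = x
    return xhat.payload
}
```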

, and it's not how Swift 4 works from what I can tell (release is not called "as soon as possible" from my limited testing).

Cases like this are due to the optimizer seeing some use that it can not understand. The optimizer must be conservative so sometimes things that the user thinks the optimizer should see through/understand, it can not. The way to see that is to look at the SIL level and see what is stopping the code motion. There are ways that you can get debug output from the optimizer. This additionally may be a case where an opt-remark like system could help guide the user on why code motion has stopped.

It seems to me that this penalizes functional-style programming in at least two ways:

  • This kills tail call optimizations, because often the compiler will put a release call in the tail position
  • Especially in recursion-heavy algorithms, objects can be kept around for longer than they need to

Here's an example where both hit:

class Foo {
  var value: [Int]
  
  init() {
    value = Array<Int>(repeating: 0, count: 10000)
  }
  
  init(from foo: Foo) {
    value = foo.value
    value[0] += 1
  }
}

func silly(loop: Int, foo: Foo) -> Foo {
  guard loop != 0 else { return foo }
  let copy = Foo(from: foo)
  return silly(loop: loop - 1, foo: copy)
}

print(silly(loop: 10000, foo: Foo()).value[0])

I wouldn't call that "expert Swift code" (indeed, if I come to my senses and use a regular loop, it does just fine), but in this form, it does need about 800MB of memory, and it can't use tail calls.

The function has the opposite problem from the pattern matching case. It is specialized such that `foo` is passed at +0: it is retained at `return foo` and released (in the caller) after the call to `silly`. However, the optimal implementation would pass it at +1, do nothing for `return foo`, and release it (in the callee) after the call to `Foo(from: foo)`. (Or, even better, it would release it after `value = foo.value` in the init function.)

I'll note that escape analysis would correctly find that +1 is the "optimal" ownership convention for `foo` in `silly` :) but it won't actually solve either the memory use problem or the missed tail call unless the release call is also moved up.

I guess that the question is: what does Swift gain by keeping objects around for longer than they need to? Is it all about matching C++ or is there something else?

Again, I think you are extrapolating a bit. Swift is not attempting to keep objects around for longer than they need to be at all. Such situations are more likely due to optimizer inadequacies or unimplemented optimizations [again, nerd snipe alert, patches welcome ; )]. All of these things take engineering time to do and engineering time is something that must be prioritized with respect to the overall needs of the project.

When I compile this on my machine. Here is what the SIL looks like:

viewcfg.pdf (29.5 KB)

···

On Dec 30, 2017, at 1:21 PM, Félix Cloutier <felixcloutier@icloud.com> wrote:

Le 29 déc. 2017 à 20:50, Michael Gottesman <mgottesman@apple.com> a écrit :

That sounds fun. I'll have to check with my manager after the holidays.

Nerd snipe success? = p

I guess so? :nerd_face: I'm an easy target.

It sounds like having flexible parameter ownership rules doesn't have too much overhead if it can be user-specified (in some future). Would it be feasible to use escape analysis to decide if a parameter should be +0 or +1?

No. A parameter's convention is ABI. You don't want to change ABI related things like that via escape analysis since it means that as a function changes, due to the optimizer, the ABI can change =><=.

That makes sense, I hadn't thought about it.

Cases like this are due to the optimizer seeing some use that it can not understand. The optimizer must be conservative so sometimes things that the user thinks the optimizer should see through/understand, it can not. The way to see that is to look at the SIL level and see what is stopping the code motion. There are ways that you can get debug output from the optimizer. This additionally may be a case where an opt-remark like system could help guide the user on why code motion has stopped.

My limited testing was basically checking this program:

final class Foo {
  var bar = 4
}

let instance = Foo()

@inline(never)
func print(_ x: Int) {
  Swift.print(x)
}

func main() {
  let foo = instance
  print(foo.bar)
  print(0)
}

On my first pass I noticed that foo is released at the end of the function (hence the rest of my message), but upon closer inspection I see that it is, in fact, retained after `foo.bar` is accessed:

sil hidden @_T04test4mainyyF : $@convention(thin) () -> () {
bb0:
  %0 = global_addr @_T04test8instanceAA3FooCv : $*Foo // user: %1
  %1 = load %0 : $*Foo // users: %11, %6, %4, %2
  debug_value %1 : $Foo, let, name "foo" // id: %2
  // function_ref print(_:)
  %3 = function_ref @_T04test5printySiF : $@convention(thin) (Int) -> () // users: %10, %7
  %4 = ref_element_addr %1 : $Foo, #Foo.bar // user: %5
  %5 = load %4 : $*Int // user: %7
  strong_retain %1 : $Foo // id: %6
  %7 = apply %3(%5) : $@convention(thin) (Int) -> ()
  %8 = integer_literal $Builtin.Int64, 0 // user: %9
  %9 = struct $Int (%8 : $Builtin.Int64) // user: %10
  %10 = apply %3(%9) : $@convention(thin) (Int) -> ()
  strong_release %1 : $Foo // id: %11
  %12 = tuple () // user: %13
  return %12 : $() // id: %13
} // end sil function '_T04test4mainyyF'

So while I thought earlier that I didn't know why it wasn't released, I guess that the better question is why it's retained at all!

I guess that the question is: what does Swift gain by keeping objects around for longer than they need to? Is it all about matching C++ or is there something else?

Again, I think you are extrapolating a bit. Swift is not attempting to keep objects around for longer than they need to be at all. Such situations are more likely due to optimizer inadequacies or unimplemented optimizations [again, nerd snipe alert, patches welcome ; )]. All of these things take engineering time to do and engineering time is something that must be prioritized with respect to the overall needs of the project.

Of course. I think that I was being a bit aggressive with "what is the benefit of this"; I knew there was a fair chance that it was "we had other things to do".

Félix

···

Le 30 déc. 2017 à 14:22, Michael Gottesman <mgottesman@apple.com> a écrit :

ABI only matters at public entry points for ABI-stable binaries. We can do this sort of analysis freely within modules, or in a hypothetical multi-module-optimization mode.

-Joe

···

On Dec 30, 2017, at 11:23 AM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

No. A parameter's convention is ABI. You don't want to change ABI-related things like that via escape analysis since it means that as a function changes, due to the optimizer, the ABI can change =><=. *BUT* recently Adam Nemet has added support for LLVM's opt remark infrastructure to Swift. Something like that /could/ provide suggestions when it is run that this might be a profitable parameter to change from +0 to +1 or vice versa. Then the user could make that change via annotation. Keep in mind that this would most likely be an ABI breaking change.

That sounds fun. I'll have to check with my manager after the holidays.

Nerd snipe success? = p

I guess so? :nerd_face: I'm an easy target.

It sounds like having flexible parameter ownership rules doesn't have too much overhead if it can be user-specified (in some future). Would it be feasible to use escape analysis to decide if a parameter should be +0 or +1?

No. A parameter's convention is ABI. You don't want to change ABI related things like that via escape analysis since it means that as a function changes, due to the optimizer, the ABI can change =><=.

That makes sense, I hadn't thought about it.

Cases like this are due to the optimizer seeing some use that it can not understand. The optimizer must be conservative so sometimes things that the user thinks the optimizer should see through/understand, it can not. The way to see that is to look at the SIL level and see what is stopping the code motion. There are ways that you can get debug output from the optimizer. This additionally may be a case where an opt-remark like system could help guide the user on why code motion has stopped.

My limited testing was basically checking this program:

final class Foo {
  var bar = 4
}

let instance = Foo()

@inline(never)
func print(_ x: Int) {
  Swift.print(x)
}

func main() {
  let foo = instance
  print(foo.bar)
  print(0)
}

On my first pass I noticed that foo is released at the end of the function (hence the rest of my message), but upon closer inspection I see that it is, in fact, retained after `foo.bar` is accessed:

sil hidden @_T04test4mainyyF : $@convention(thin) () -> () {
bb0:
  %0 = global_addr @_T04test8instanceAA3FooCv : $*Foo // user: %1
  %1 = load %0 : $*Foo // users: %11, %6, %4, %2
  debug_value %1 : $Foo, let, name "foo" // id: %2
  // function_ref print(_:)
  %3 = function_ref @_T04test5printySiF : $@convention(thin) (Int) -> () // users: %10, %7
  %4 = ref_element_addr %1 : $Foo, #Foo.bar // user: %5
  %5 = load %4 : $*Int // user: %7
  strong_retain %1 : $Foo // id: %6
  %7 = apply %3(%5) : $@convention(thin) (Int) -> ()
  %8 = integer_literal $Builtin.Int64, 0 // user: %9
  %9 = struct $Int (%8 : $Builtin.Int64) // user: %10
  %10 = apply %3(%9) : $@convention(thin) (Int) -> ()
  strong_release %1 : $Foo // id: %11
  %12 = tuple () // user: %13
  return %12 : $() // id: %13
} // end sil function '_T04test4mainyyF'

So while I thought earlier that I didn't know why it wasn't released, I guess that the better question is why it's retained at all!

I guess that the question is: what does Swift gain by keeping objects around for longer than they need to? Is it all about matching C++ or is there something else?

Again, I think you are extrapolating a bit. Swift is not attempting to keep objects around for longer than they need to be at all. Such situations are more likely due to optimizer inadequacies or unimplemented optimizations [again, nerd snipe alert, patches welcome ; )]. All of these things take engineering time to do and engineering time is something that must be prioritized with respect to the overall needs of the project.

Of course. I think that I was being a bit aggressive with "what is the benefit of this"; I knew there was a fair chance that it was "we had other things to do".

I do not believe in bad questions and I think that you asked said questions in an appropriate/respectful way. So don't worry about it: everything's good. = ).

That being said, if you want to take me up on the "nerd snipe", just send an email to swift-dev and +CC me. I may be the appropriate person to help you, but if not I will hook you up with the right person to help you.

Michael

···

On Dec 30, 2017, at 9:41 PM, Félix Cloutier <felixcloutier@icloud.com> wrote:

Le 30 déc. 2017 à 14:22, Michael Gottesman <mgottesman@apple.com> a écrit :


No. A parameter's convention is ABI. You don't want to change ABI-related things like that via escape analysis since it means that as a function changes, due to the optimizer, the ABI can change =><=. *BUT* recently Adam Nemet has added support for LLVM's opt remark infrastructure to Swift. Something like that /could/ provide suggestions when it is run that this might be a profitable parameter to change from +0 to +1 or vice versa. Then the user could make that change via annotation. Keep in mind that this would most likely be an ABI breaking change.

ABI only matters at public entry points for ABI-stable binaries. We can do this sort of analysis freely within modules, or in a hypothetical multi-module-optimization mode.

Sure. We were talking about the general case, though, which does include public entry points of ABI-stable binaries. So my statement was correct.

Michael

···

On Jan 2, 2018, at 7:00 PM, Joe Groff <jgroff@apple.com> wrote:

On Dec 30, 2017, at 11:23 AM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

-Joe

Felix, on a completely different note. Do you think you could contribute these workloads to the swift benchmark suite. Then we can track it over time.

Michael

···

On Dec 30, 2017, at 10:03 PM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

