Profiling ARC

Jiho_Choi · February 17, 2017, 2:48am

Hi,

I was curious about the overhead of ARC and started profiling some
benchmarks found in the Computer Language Benchmark Game (
http://benchmarksgame.alioth.debian.org/u64q/measurements.php?lang=swift).
So far, it seems that ARC sequence optimization is surprisingly good and
most benchmarks don't have to perform ARC operations as often as I
expected. I have some questions regarding this finding.
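To make the observation concrete, here is a minimal Swift sketch of the kind of redundant retain/release pair that ARC optimization can remove. The class `Box` and the function `readTwice` are hypothetical and are not taken from any benchmark. Whether the pair is emitted and then removed depends on the compiler version and flags, so the comments describe typical unoptimized versus -O behavior, not a guarantee.

```swift
// Hypothetical reference type used only for illustration.
final class Box {
    var value: Int
    init(_ value: Int) { self.value = value }
}

// Binding `local` creates a second strong reference to the same object.
// Without optimization the compiler may emit a retain here and a matching
// release when `local` goes out of scope. The caller already keeps `box`
// alive for the whole call, so the ARC optimizer can usually prove the pair
// is redundant and delete both operations under -O.
func readTwice(_ box: Box) -> Int {
    let local = box
    return local.value + box.value
}

print(readTwice(Box(21)))  // 42
```

One way to check whether the pair survives is to compare the output of `swiftc -Onone -emit-sil` and `swiftc -O -emit-sil` for this file.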

I compiled all benchmarks with the "-O -wmo" flags and used Pin to count
the number of calls to the ARC runtime (e.g., swift_rt_swift_retain).

1. Reference counting is considered to have high overhead because counting
operations are frequent and also have to be atomic. At least for the
benchmarks I tested, this is not the case and there is almost no overhead.
Is this expected behavior? Or is it because the benchmarks are too simple
(they are all single-file programs)? How large would you estimate the
overhead of ARC to be?

2. I also tried compiling the same benchmarks with "-Xfrontend
-assume-single-threaded" to measure the overhead of atomic operations.
Judging from the source code of this experimental pass and the SIL
optimizer's statistics, the pass seems to work as expected, converting all
ARC operations in user code into nonatomic ones. However, even with this
flag, some atomic ARC runtime functions are still called from the user code
(not the library). Stranger still, the SIL output showed that all ARC
operations in the user code had been turned into nonatomic ones. The
documentation says ARC operations are never implicit in SIL, so if there
are no atomic ARC operations at the SIL level, I would expect the user code
never to call the atomic ARC runtime. Am I missing something?

3. Are there more realistic benchmarks available? Swift's official
benchmarks also seem pretty small.

Thanks,
Jiho

Slava_Pestov · February 17, 2017, 4:03am

Hi,

> I was curious about the overhead of ARC and started profiling some benchmarks found in the Computer Language Benchmark Game (http://benchmarksgame.alioth.debian.org/u64q/measurements.php?lang=swift). So far, it seems that ARC sequence optimization is surprisingly good and most benchmarks don't have to perform ARC operations as often as I expected. I have some questions regarding this finding.

> I compiled all benchmarks with "-O -wmo" flags and counted the number of calls to ARC runtime (e.g., swift_rt_swift_retain) using Pin.

> 1. Reference counting is considered to have high overhead due to frequent counting operations which also have to be atomic. At least for the benchmarks I tested, it is not the case and there is almost no overhead. Is it expected behavior? Or is it because the benchmarks are too simple (they are all single-file programs)? How do you estimate the overhead of ARC would be?

It is possible that the optimizer eliminated many reference counting operations here. Also my understanding is that while atomic operations are more expensive than non-atomic operations, the real cost only comes into play if you actually have contention due to bouncing cache lines. In a single-threaded workload the overhead is not that great.

> 2. I also tried to compile the same benchmarks with "-Xfrontend -assume-single-threaded" to measure the overhead of atomic operations. Looking at the source code of this experimental pass and SIL optimizer's statistic, the pass seems to work as expected to convert all ARC operations in user code into nonatomic. However, even when using this flag, there are some atomic ARC runtime called from the user code (not library). More strangely, SIL output said all ARC operations in the user code have turned into nonatomic. The documentation says ARC operations are never implicit in SIL. So if there is no atomic ARC at SIL-level, I expect the user code would never call atomic ARC runtime. Am I missing something?

IRGen still emits atomic reference counting operations when it produces value witness operations. I think there's a PR open right now to address this: "[WIP] Enhance -assume-single-threaded option" by mtake, apple/swift PR #7421 (https://github.com/apple/swift/pull/7421)
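As a rough illustration of where such value witness operations come from, here is a minimal sketch; the type `Payload` and the function `pair` are hypothetical. In an unspecialized generic function the compiler does not know the concrete layout of `T`, so copying a `T` goes through the type's value witness table rather than through an explicit retain instruction in that function's SIL, and per the reply above, that IRGen-generated path is where atomic reference counting can remain. Under -O -wmo the optimizer will often specialize `pair` for `Payload`, in which case the copies become ordinary retains and this path disappears.

```swift
// Hypothetical reference type; copying a reference to it implies a retain.
final class Payload {
    let id: Int
    init(id: Int) { self.id = id }
}

// Unspecialized, this function copies `value` twice through T's value
// witness table (an operation such as initializeWithCopy), because the
// concrete type of T is unknown here.
func pair<T>(_ value: T) -> (T, T) {
    return (value, value)
}

let original = Payload(id: 7)
let (first, second) = pair(original)
print(first === second)  // true: both copies refer to the same object
```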

> 3. Are there more realistic benchmarks available? Swift's official benchmarks also seem pretty small.

Contributions are welcome :-)

On Feb 16, 2017, at 6:48 PM, Jiho Choi via swift-dev <swift-dev@swift.org> wrote:

Thanks,
Jiho
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

dgrove-oss · February 17, 2017, 7:30pm

hmm, I wonder if your method of profiling is really finding all the ARC
operations. The Swift version of regex-dna is about 25x slower than the
Java version (on Linux). I looked at some prof profiles about a month ago
and at the time roughly 80% of all execution samples were attributed to
swift_retain/swift_release operations coming from CoreFoundation's regex
implementation.

--dave

(See attached file: regex-dna.svg)

Mikio_Takeuchi · February 17, 2017, 5:40am

Hi,

I just created a new PR, #7557 (https://github.com/apple/swift/pull/7557),
which replaces #7421.

Thanks,
-- Mikio

Michael_Gottesman · February 17, 2017, 10:31pm

> hmm, I wonder if your method of profiling is really finding all the ARC operations. The Swift version of regex-dna is about 25x slower than the Java version (on Linux). I looked at some prof profiles about a month ago and at the time roughly 80% of all execution samples were attributed to swift_retain/swift_release operations coming from CoreFoundation's regex implementation.

Question: where is this regex-dna benchmark? Is it in the Swift benchmark suite?

Jiho_Choi · February 20, 2017, 10:24pm

You are right that regex-dna has many ARC operations from libFoundation.
Another outlier in terms of the number of ARC operations is binary-trees.
In this case, the ARC operations come from the user code, and the optimizer
couldn't make much difference.

Other than these two, the optimizer seems to be working pretty well at
removing ARC operations.
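For readers unfamiliar with binary-trees, here is a simplified Swift sketch of its core pattern. It is written for illustration and is not copied from either benchmark program; the names `TreeNode`, `bottomUpTree`, and `itemCheck` are assumptions. Every node is a separate class instance with its own reference count, so building and walking the tree touches many reference counts, which is consistent with the ARC operations coming from user code.

```swift
// Hypothetical node type: each node is its own heap object with a refcount.
final class TreeNode {
    let left: TreeNode?
    let right: TreeNode?
    init(left: TreeNode?, right: TreeNode?) {
        self.left = left
        self.right = right
    }
}

// Builds a complete binary tree of the given depth, allocating every node.
func bottomUpTree(_ depth: Int) -> TreeNode {
    if depth <= 0 {
        return TreeNode(left: nil, right: nil)
    }
    return TreeNode(left: bottomUpTree(depth - 1), right: bottomUpTree(depth - 1))
}

// Counts the nodes. Loading `node.left` and `node.right` yields new strong
// references; when the optimizer cannot prove them redundant, each one costs
// a retain/release pair, one plausible source of the remaining user-code ARC.
func itemCheck(_ node: TreeNode) -> Int {
    guard let left = node.left, let right = node.right else { return 1 }
    return 1 + itemCheck(left) + itemCheck(right)
}

print(itemCheck(bottomUpTree(4)))  // 31
```

A complete tree of depth 4 has 2^5 - 1 = 31 nodes, which is what the sketch prints.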

Roman_Levenstein · February 17, 2017, 10:55pm

> Question. Where is this regex-dna benchmark, is it in the swift benchmark suite?

Here is a Swift version:
http://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexdna&lang=swift&id=2

And here is a Java version:
http://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexdna&lang=java&id=7

And, BTW, the Swift version is not multi-threaded, but the Java version is.

dgrove-oss · February 17, 2017, 11:04pm

Sorry, I shouldn't have assumed that everyone knew what the Computer
Language Benchmark Game was.

There's a set of 10 toy benchmarks written in 30 different languages,
available at http://benchmarksgame.alioth.debian.org/. The webpage also
lets you see the results of regular performance runs and how the various
programs stack up against each other.

As usual with small benchmarks, there are lots of ways in which the
programs aren't realistic.

After dispatch became available on Linux with Swift 3, I had a side project
of going through the Swift implementations and adding concurrency to a few
Swift programs that didn't have it already.

regex-dna is the main outlier for Swift (which is why I had profiled it).
There's enough ARC overhead that using dispatch actually made it slower...
The sequential Swift version of regex-dna is:
http://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexdna&lang=swift&id=2
My slower concurrent version is:
http://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexdna&lang=swift&id=3

I suspect that the main fix for improving the performance of this program
is actually doing something in CoreFoundation, but I got sidetracked and
didn't finish looking into it.
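For context, a regex-dna-style program spends most of its time counting regular-expression matches over a large DNA string. Below is a minimal Swift sketch of that step, assuming it goes through Foundation's `NSRegularExpression` (on Linux, swift-corelibs-foundation, which builds on CoreFoundation and, per the profile described above, ends up in its regex code). The helper `countMatches` and the sample input are hypothetical, and `agggtaaa|tttaccct` is just one example variant pattern.

```swift
import Foundation

// Hypothetical helper: counts non-overlapping matches of `pattern` in `text`.
func countMatches(of pattern: String, in text: String) throws -> Int {
    let regex = try NSRegularExpression(pattern: pattern, options: [])
    // NSRange offsets are measured in UTF-16 code units.
    let range = NSRange(location: 0, length: text.utf16.count)
    return regex.numberOfMatches(in: text, options: [], range: range)
}

// Example input containing two "agggtaaa" and one "tttaccct".
let sample = "agggtaaacctttaccctagggtaaa"
let count = try countMatches(of: "agggtaaa|tttaccct", in: sample)
print(count)  // 3
```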

--dave

From: Michael Gottesman <mgottesman@apple.com>
To: David P Grove/Watson/IBM@IBMUS
Cc: Jiho Choi <jray319@gmail.com>, swift-dev <swift-dev@swift.org>
Date: 02/17/2017 05:32 PM
Subject: Re: [swift-dev] Profiling ARC
Sent by: mgottesman@apple.com

Michael_Gottesman · February 20, 2017, 11:20pm

Are you talking about this one (there are two)?

http://benchmarksgame.alioth.debian.org/u64q/program.php?test=binarytrees&lang=swift&id=1

Michael

Jiho_Choi · February 21, 2017, 2:31am

I used the older versions (binary-trees #6 and binary-trees #7), which I
downloaded a couple of weeks ago. It seems they have updated the
binary-trees benchmarks since then.

I just profiled the one you linked and got a similar result. The optimizer
removed about 30% of the ARC operations, which is better than the almost
none it removed in the older versions. However, compared to other
benchmarks, where most of the ARC operations in the user code are removed,
it is still pretty low.

Michael_Gottesman · February 18, 2017, 12:03am

I am familiar with it, just not all of the benchmarks by name.

Michael_Gottesman · February 21, 2017, 5:18am

> I used the older versions (binary-trees #6 & binary-trees #7) which I downloaded a couple of weeks ago. It seems like they updated binary-trees benchmarks since then.
>
> I just profiled the one you linked and got a similar result. The optimizer removed about 30% of ARC operations, which is better than almost none in the older versions. However, compared to other benchmarks, where most of ARC operations in the user code are removed, it is still pretty low.

Sure. I wasn't saying anything about the number of ARC operations in that benchmark. I just wanted to be clear about which benchmark was being discussed, that is all.
