Huge difference in speed in Array<Class>/Array<Struct> on OSX/Ubuntu platform with same? toolchain


(Sergey Kuratov) #1

Hello,
I've just started working with Swift (I have a C++ background) and am running
some tests to avoid design problems. But right from the start I ran into a
very disappointing case.

1. I have 2 similar computers:
- Ubuntu 15.10 with clang and Swift-2.2 dev 2016.01.25 for Ubuntu 15.10
- iMac with OSX 10.11 and Swift-2.2 dev 2016.01.25 for OSX (switched to
this toolchain)

2. Took the extremely simple source code of the n-body sample
<http://benchmarksgame.alioth.debian.org/u64q/program.php?test=nbody&lang=swift&id=3>

3. Built a Release version with -Ounchecked -whole-module-optimization on both
computers and got these results:

Ubuntu - 8.4 sec
OSX - 167.3 sec
I was floored by these results.

4. Changed "class Body" to "struct Body" and got different results:

Ubuntu - 12.2 sec
OSX - 10.0 sec
Again, I was very surprised by the results.

5. Added

    let arrPtr = UnsafeMutablePointer<Body>(bodies)

before the loops and used arrPtr instead of bodies inside the loops. New results:

For "class Body" variant:
Ubuntu - 7.3 sec
OSX - 11.1 sec

For "struct Body" variant:
Ubuntu - 6.7 sec
OSX - 8.8 sec
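For comparison, here is a minimal sketch in current Swift (5.x syntax, not the Swift 2.2 used above) of the struct variant accessed through a scoped buffer pointer. Passing `bodies` straight to a pointer initializer, as in step 5, relies on implicit array-to-pointer conversion, and that pointer is only guaranteed valid for the duration of the call that receives it; `withUnsafeMutableBufferPointer` keeps the pointer valid for the whole closure. The `Body` fields and the `advance(_:dt:)` function are hypothetical stand-ins modelled on the n-body benchmark, not the benchmark's actual source.

```swift
// Hypothetical Body modelled on the n-body benchmark: position, velocity, mass.
struct Body {
    var x, y, z: Double
    var vx, vy, vz: Double
    let mass: Double
}

// One simulation step. The buffer pointer `b` is valid for the whole closure,
// unlike a pointer produced by implicitly converting `bodies` in a call.
func advance(_ bodies: inout [Body], dt: Double) {
    bodies.withUnsafeMutableBufferPointer { b in
        let n = b.count
        // Pairwise velocity updates from gravitational attraction.
        for i in 0..<n {
            for j in (i + 1)..<n {
                let dx = b[i].x - b[j].x
                let dy = b[i].y - b[j].y
                let dz = b[i].z - b[j].z
                let d2 = dx * dx + dy * dy + dz * dz
                let mag = dt / (d2 * d2.squareRoot())
                let mi = b[i].mass, mj = b[j].mass
                b[i].vx -= dx * mj * mag
                b[i].vy -= dy * mj * mag
                b[i].vz -= dz * mj * mag
                b[j].vx += dx * mi * mag
                b[j].vy += dy * mi * mag
                b[j].vz += dz * mi * mag
            }
        }
        // Move every body along its updated velocity.
        for i in 0..<n {
            b[i].x += dt * b[i].vx
            b[i].y += dt * b[i].vy
            b[i].z += dt * b[i].vz
        }
    }
}

var bodies = [
    Body(x: 0, y: 0, z: 0, vx: 0, vy: 0, vz: 0, mass: 1),
    Body(x: 1, y: 0, z: 0, vx: 0, vy: 0, vz: 0, mass: 2),
]
advance(&bodies, dt: 0.5)
print(bodies[0].x, bodies[1].x)  // 0.5 0.75
```

Because `Body` is a struct here, the elements live inline in the array's storage, so the loops index plain memory without per-element reference counting.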

I believe something is wrong with the Swift compiler/optimizer if the same
toolchain generates very different code for the same x64 processor,
especially in case 1 ("class Body" on OSX), which I guess is a bug. Can
somebody comment on these results?

PS: I forgot to mention that a similar n-body program in Java
<http://benchmarksgame.alioth.debian.org/u64q/program.php?test=nbody&lang=java&id=2>
takes 6.5 sec on Ubuntu and 9.2 sec on OSX.


(Joe Groff) #2

Arrays of class type have additional overhead on Apple platforms due to the NSArray interoperability. You can use ContiguousArray<Class> if you don't need to interoperate with Objective-C to get more consistent performance.

-Joe
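A minimal sketch of Joe's suggestion, assuming a hypothetical, trimmed-down `Body` class (the benchmark's real class has more fields): the collection is declared as `ContiguousArray<Body>` instead of `[Body]`, so its storage is always Swift-native. If the array later has to be handed to Objective-C, it can be converted with `Array(bodies)`.

```swift
// Hypothetical, trimmed-down reference type standing in for the benchmark's "class Body".
class Body {
    var x: Double
    var vx: Double
    init(x: Double, vx: Double) {
        self.x = x
        self.vx = vx
    }
}

// ContiguousArray never wraps an NSArray, so element reads need no
// native-or-bridged check, on Apple platforms or elsewhere.
func step(_ bodies: ContiguousArray<Body>, dt: Double) {
    for body in bodies {
        body.x += dt * body.vx  // mutates the shared object; the array itself is unchanged
    }
}

let bodies: ContiguousArray<Body> = [Body(x: 0, vx: 2), Body(x: 1, vx: -4)]
step(bodies, dt: 0.25)
print(bodies.map { $0.x })  // [0.5, 0.0]
```

The only change from the `[Body]` version is the declared collection type; the loop body stays the same.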

On Jan 28, 2016, at 10:30 PM, Sergey Kuratov via swift-users <swift-users@swift.org> wrote:

> I believe something is wrong with the Swift compiler/optimizer if the same toolchain generates very different code for the same x64 processor, especially in case 1 ("class Body" on OSX), which I guess is a bug.
_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users


(Joe Groff) #3

Dave Abrahams can probably elaborate on the history better than me, but the current design balances a number of considerations. Microbenchmarks tend to be poor indicators of real-world performance, and in most Cocoa application code, arrays are shuttled into and out of Cocoa pretty frequently, so forcing every index read operation to do a deep conversion was unacceptable.

-Joe


On Jan 29, 2016, at 11:32 AM, Trent Nadeau <tanadeau@gmail.com> wrote:

I haven't looked at the bridging code yet, but is it possible that the overhead could only be incurred on the first conversion to NSArray? That way if you don't use that functionality it's consistent across platforms, and if you do use it, you get an initial O(n) hit and then O(1) afterwards.


(Trent Nadeau) #4

I haven't looked at the bridging code yet, but is it possible that the
overhead could only be incurred on the first conversion to NSArray? That
way if you don't use that functionality it's consistent across platforms,
and if you do use it, you get an initial O(n) hit and then O(1) afterwards.


On Fri, Jan 29, 2016 at 2:27 PM, Joe Groff via swift-users <swift-users@swift.org> wrote:

Arrays of class type have additional overhead on Apple platforms due to
the NSArray interoperability. You can use ContiguousArray<Class> if you
don't need to interoperate with Objective-C to get more consistent
performance.

-Joe

--
Trent Nadeau


(Sergey Kuratov) #5

Basically I understand what's going on (especially after profiling with
Instruments), but my question is not about the implementation.
I'm talking about UNIFORM behaviour across all platforms (I expect we will
also see Windows, Android, and other ports).
So Array on Linux should behave the same as Array on OSX, Windows, Android, ...
If we need OS-specific behaviour, we should have a specific type like
ObjcArray<Class>. Isn't that the right way?
Or at least the language guide should say "Use Array only if you need
to interact with the ObjC runtime...", and of course the same for all other
ObjC-bridged classes. Right?


On Sat, Jan 30, 2016 at 5:55 AM, Joe Groff <jgroff@apple.com> wrote:

Dave Abrahams can probably elaborate on the history better than me, but
the current design balances a number of considerations. Microbenchmarks
tend to be poor indicators of real-world performance, and in most Cocoa
application code, arrays are shuttled into and out of Cocoa pretty
frequently, so forcing every index read operation to do a deep conversion
was unacceptable.

-Joe


(Dave Abrahams) #6

On Fri Jan 29 2016, Trent Nadeau <swift-users-AT-swift.org> wrote:

> I haven't looked at the bridging code yet, but is it possible that the
> overhead could only be incurred on the first conversion to NSArray? That
> way if you don't use that functionality it's consistent across platforms,
> and if you do use it, you get an initial O(n) hit and then O(1)
> afterwards.

There's no overhead on conversion to NSArray. The overhead applies to
all arrays that aren't statically known not to be backed by an NSArray that
comes from ObjC. In practice, that's all Arrays whose Element type is a
class or ObjC existential. The overhead is due to a branch on each access,
where we decide whether to take the fast path (the backing storage was
created by Swift, a.k.a. a "native" buffer) or the slow path (the
backing storage is an otherwise unknown NSArray). There is a similar
slow path for a deferred type check when accessing elements in the result of

  someArrayOfBase as! [Derived]
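The per-access branch described above can be pictured with a much-simplified model. The sketch below is hypothetical: `ClassArrayBuffer`, `ForeignStorage`, `FakeNSArray`, and `Node` are illustrative names, not the standard library's internal types, and the real buffer is laid out differently. It only shows the shape of the decision: native storage is read directly, a downcast result is read and then type-checked per element, and foreign (NSArray-like) storage goes through a dynamic call.

```swift
// Hypothetical stand-in for NSArray-backed storage handed over from Objective-C:
// elements can only be fetched through a dynamically dispatched call.
protocol ForeignStorage {
    func element(at index: Int) -> AnyObject
}

// Hypothetical, simplified model of an array buffer whose elements are class instances.
enum ClassArrayBuffer<Element: AnyObject> {
    case native(ContiguousArray<Element>)             // created by Swift
    case nativeUnchecked(ContiguousArray<AnyObject>)  // e.g. result of a [Base] -> [Derived] downcast
    case foreign(any ForeignStorage)                  // otherwise unknown NSArray-like storage

    // Every read starts with this switch: the branch that costs time in hot loops.
    subscript(index: Int) -> Element {
        switch self {
        case .native(let elements):
            return elements[index]                         // fast path
        case .nativeUnchecked(let objects):
            return objects[index] as! Element              // deferred type check per access
        case .foreign(let storage):
            return storage.element(at: index) as! Element  // slow path
        }
    }
}

// Hypothetical element type and foreign storage used only for this demo.
final class Node {
    let id: Int
    init(id: Int) { self.id = id }
}

struct FakeNSArray: ForeignStorage {
    let objects: [AnyObject]
    func element(at index: Int) -> AnyObject { objects[index] }
}

let native = ClassArrayBuffer<Node>.native([Node(id: 1), Node(id: 2)])
let downcast = ClassArrayBuffer<Node>.nativeUnchecked([Node(id: 4)])
let foreign = ClassArrayBuffer<Node>.foreign(FakeNSArray(objects: [Node(id: 3)]))
print(native[1].id, downcast[0].id, foreign[0].id)  // 2 4 3
```

When the element type is a struct, no such storage could have come from Objective-C, which matches the observation above that only class and ObjC-existential element types pay for the branch.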

We have discussed other ways to slice the efficiency and usability
tradeoffs here; if you'd like to talk about that I'd recommend bringing
it up on the swift-evolution list.

HTH,

--
-Dave


(Nadav Rotem) #7

On Jan 29, 2016, at 11:27 AM, Joe Groff via swift-users <swift-users@swift.org> wrote:

> Arrays of class type have additional overhead on Apple platforms due to the NSArray interoperability. You can use ContiguousArray<Class> if you don't need to interoperate with Objective-C to get more consistent performance.

Using ContiguousArray is a very good idea. You can find other optimization tips and tricks here:

https://github.com/apple/swift/blob/master/docs/OptimizationTips.rst

-Nadav
