[Review] Remove C-style for-loops with conditions and incrementers

Paul_Cantrell · December 11, 2015, 5:12am

Hold the presses.

David, I found the radical differences in our results troubling, so I did some digging. It turns out that the zip+stride code:

    var sum = 0
    for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += 1
    }

…runs much faster if you actually use both i and j inside the loop:

    var sum = 0
    for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += i-j
    }

Weird, right? This is with optimization on (default “production” build). It smells like a compiler quirk.
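
A note for anyone re-running this on current Swift: the `x.stride(to:by:)` method spelling used in this thread was replaced by the free function `stride(from:to:by:)` in Swift 3. The sketch below restates the two loop bodies in that spelling. The function names `countOdd` and `sumDifferences` and the small bounds are assumptions for illustration only; they are not taken from the gist, and the sketch says nothing about timing.

```swift
// Swift 3+ spelling: stride(from:to:by:) is a free function.
// Counts the odd values of i and never reads j (the "sum += 1" variant).
func countOdd(first: Int, second: Int) -> Int {
    var sum = 0
    for (i, _) in zip(stride(from: first, to: 0, by: -1),
                      stride(from: second, to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += 1
    }
    return sum
}

// Reads both i and j in the body (the "sum += i-j" variant).
func sumDifferences(first: Int, second: Int) -> Int {
    var sum = 0
    for (i, j) in zip(stride(from: first, to: 0, by: -1),
                      stride(from: second, to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += i - j
    }
    return sum
}

print(countOdd(first: 10, second: 20))        // 5
print(sumDifferences(first: 10, second: 20))  // -25
```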

With that tweak, the zip+stride approach actually clocks in faster than the C-style for. Yes, you read that right: faster. Also smells like a quirk. Am I doing something fantastically stupid in my code? Or maybe it’s just my idiosyncratic taste in indentation? :P

Here’s my test case, which was a command-line app with manual timing, followed by David’s dropped into the same harness, followed by David’s but with sum += i-j instead of sum += 1:

https://gist.github.com/pcantrell/6bbe80e630d227ed0262

Point is: no big performance difference here; even a performance advantage (that is probably a compiler artifact).

David and Thorsten, you might want to reconsider your reviews?

Results:

—————— Paul’s comparison ——————

zip+stride

  Iter 0: 0.519110977649689
  Iter 1: 0.503385007381439
  Iter 2: 0.503321051597595
  Iter 3: 0.485216021537781
  Iter 4: 0.524757027626038
  Iter 5: 0.478078007698059
  Iter 6: 0.503880977630615
  Iter 7: 0.498068988323212
  Iter 8: 0.485781013965607
         ——————————————
  Median: 0.524757027626038

C-style

  Iter 0: 0.85480797290802
  Iter 1: 0.879491031169891
  Iter 2: 0.851797997951508
  Iter 3: 0.836017966270447
  Iter 4: 0.863684952259064
  Iter 5: 0.837742984294891
  Iter 6: 0.839070022106171
  Iter 7: 0.849772989749908
  Iter 8: 0.819278955459595
         ——————————————
  Median: 0.863684952259064

Zip+stride takes 0.607579217692143x the time of C-style for

—————— David’s comparison ——————

zip+stride

  Iter 0: 1.15285503864288
  Iter 1: 1.1244450211525
  Iter 2: 1.24192994832993
  Iter 3: 1.02782195806503
  Iter 4: 1.13640999794006
  Iter 5: 1.15879601240158
  Iter 6: 1.12114900350571
  Iter 7: 1.21364599466324
  Iter 8: 1.10698300600052
         ——————————————
  Median: 1.13640999794006

C-style

  Iter 0: 0.375869989395142
  Iter 1: 0.371365010738373
  Iter 2: 0.356527984142303
  Iter 3: 0.384984970092773
  Iter 4: 0.367590010166168
  Iter 5: 0.365644037723541
  Iter 6: 0.384257972240448
  Iter 7: 0.379297018051147
  Iter 8: 0.363133013248444
         ——————————————
  Median: 0.367590010166168

Zip+stride takes 3.09151491202482x the time of C-style for

—————— David’s comparison, actually using indices in the loop ——————

zip+stride

  Iter 0: 0.328687965869904
  Iter 1: 0.332105994224548
  Iter 2: 0.336817979812622
  Iter 3: 0.321089029312134
  Iter 4: 0.338591992855072
  Iter 5: 0.348567008972168
  Iter 6: 0.34687602519989
  Iter 7: 0.34755402803421
  Iter 8: 0.341500997543335
         ——————————————
  Median: 0.338591992855072

C-style

  Iter 0: 0.422354996204376
  Iter 1: 0.427953958511353
  Iter 2: 0.403640985488892
  Iter 3: 0.415378987789154
  Iter 4: 0.403639018535614
  Iter 5: 0.416707038879395
  Iter 6: 0.415345013141632
  Iter 7: 0.417587995529175
  Iter 8: 0.415713012218475
         ——————————————
  Median: 0.403639018535614

Zip+stride takes 0.838848518865867x the time of C-style for

Program ended with exit code: 0
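
A side observation on the listings above: in all six blocks the "Median" line is identical to the Iter 4 sample, that is, the middle of the nine values in run order rather than in sorted order. Since the gist is not available, whether the harness sorted its samples is only a guess. The sketch below shows one way a manual-timing harness could compute a sorted median; `secondsTaken` and `median` are hypothetical helper names, not the harness's own.

```swift
import Foundation

// Hypothetical helper: wall-clock seconds taken by one run of `body`.
func secondsTaken(_ body: () -> Void) -> Double {
    let start = Date()
    body()
    return Date().timeIntervalSince(start)
}

// Median of the samples: sort first, then take the middle element
// (or the mean of the two middle elements for an even count).
func median(_ samples: [Double]) -> Double {
    precondition(!samples.isEmpty, "median of an empty list is undefined")
    let sorted = samples.sorted()
    let mid = sorted.count / 2
    return sorted.count % 2 == 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2
}

// The nine zip+stride samples from "Paul's comparison" above, in run order.
let paulZipStride = [
    0.519110977649689, 0.503385007381439, 0.503321051597595,
    0.485216021537781, 0.524757027626038, 0.478078007698059,
    0.503880977630615, 0.498068988323212, 0.485781013965607,
]

// After sorting, the middle value is the Iter 2 sample, not Iter 4.
print(median(paulZipStride))
```

Using the sorted median would shift the reported ratios slightly, but it would not change which variant comes out faster in any of the three comparisons.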

Cheers,

Paul

On Dec 10, 2015, at 5:36 PM, David Owens II <david@owensd.io> wrote:

Here’s my basic test case:

let first = 10000000
let second = 20000000

class LoopPerfTests: XCTestCase {

    func testZipStride() {
        self.measureBlock {
            var sum = 0
            for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
                if i % 2 == 0 { continue }
                sum += 1
            }
            print(sum)
        }
    }

    func testCStyleFor() {
        self.measureBlock {
            var sum = 0
            for var i = first, j = second; i > 0 && j > 0; i -= 1, j -= 2 {
                if i % 2 == 0 { continue }
                sum += 1
            }
            print(sum)
        }

    }

}

Non-optimized timings:
testCStyleFor - 0.126s
testZipStride - 2.189s

Optimized timings:
testCStyleFor - 0.008s
testZipStride - 0.015s

That’s a lot worse than 34%; even in optimized builds, that’s 2x slower and in debug builds, that’s 17x slower. I think it’s unreasonable to force people to write a more verbose while-loop construct to simply get the performance they need.

Also, the readability argument is very subjective; for example, I don’t find the zip version more readable. In fact, I think it obscures what the logic of the loop is doing. But again, that’s subjective.

-David

On Dec 10, 2015, at 2:41 PM, Paul Cantrell <cantrell@pobox.com> wrote:

Is there any guarantee that these two loops have the exact same runtime performance?

for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
   if i % 2 == 0 { continue }
   print(i, j)
}

for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
   if i % 2 == 0 { continue }
   print(i, j)
}

In a quick and dirty test, the second is approximately 34% slower.

I’d say that’s more than acceptable for the readability gain. If you’re in that rare stretch of critical code where the extra 34% actually matters, write it using a while loop instead.

P
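
For reference, a minimal sketch of the while-loop rewrite suggested above, next to the zip+stride form, in Swift 3+ spelling. It collects the pairs into an array instead of printing them so the two can be compared. The function names are hypothetical. One thing to watch: a bare `continue` inside the while body would skip the decrements, so the sketch guards the body with an `if` instead.

```swift
// zip+stride form, collecting the pairs instead of printing them.
func oddPairsZipStride(first: Int, second: Int) -> [[Int]] {
    var pairs: [[Int]] = []
    for (i, j) in zip(stride(from: first, to: 0, by: -1),
                      stride(from: second, to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        pairs.append([i, j])
    }
    return pairs
}

// while-loop form of the C-style for.
func oddPairsWhile(first: Int, second: Int) -> [[Int]] {
    var pairs: [[Int]] = []
    var i = first, j = second
    while i > 0 && j > 0 {
        if i % 2 != 0 {
            pairs.append([i, j])
        }
        // These must run on every pass; a `continue` above would skip them.
        i -= 1
        j -= 2
    }
    return pairs
}

print(oddPairsWhile(first: 10, second: 20))
// [[9, 18], [7, 14], [5, 10], [3, 6], [1, 2]]
```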

On Dec 10, 2015, at 4:07 PM, David Owens II via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 10, 2015, at 1:57 PM, thorsten--- via swift-evolution <swift-evolution@swift.org> wrote:

Yes, performance is one thing neglected by the discussions and the proposal.

This is my primary objection to this proposal; it assumes (or neglects?) that all of the types used can magically be inlined to nothing but the imperative code. This isn’t magic; someone has to implement the optimizations to do this.

Is there any guarantee that these two loops have the exact same runtime performance?

for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
   if i % 2 == 0 { continue }
   print(i, j)
}

for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
   if i % 2 == 0 { continue }
   print(i, j)
}

And can you guarantee that performance is relatively the same across debug and release builds? Because historically, Swift has suffered greatly in this regard with respect to the performance of optimized versus non-optimized builds.

These types of optimizer issues are real-world things I’ve had to deal with (and have written up many blog posts about). I get the desire to simplify the constructs, but we need an escape hatch to write performant code when the optimizer isn’t up to the job.

-David

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

thorsten · December 11, 2015, 7:47am

Big thanks for doing the measurements!

-Thorsten


thorsten · December 11, 2015, 7:55am

I revoked my support for two reasons: one, worries about performance, which David Owens II has in the meantime confirmed suffers considerably; and two, the repeated suggestions of using defer for the incrementing clause even though it changes semantics.
Removing the for-loop would therefore mean that we lose a construct needed in cases where high performance is required, and that instead of increasing readability we would decrease it in those cases where defer would be misused.

-Thorsten
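
To make the semantic difference concrete, here is a small sketch under stated assumptions: the two function names are hypothetical, and the loop counter is declared outside the loop so the effect is observable. In a C-style for, the increment clause is skipped when the body exits via `break`; a `defer` placed in the loop body runs on every exit from the body, `break` included.

```swift
// Increment written with defer: it also runs when the body exits via break.
func firstNegativeIndexUsingDefer(_ xs: [Int]) -> Int {
    var i = 0
    while i < xs.count {
        defer { i += 1 }
        if xs[i] < 0 { break }  // defer still fires, so i ends one past the match
    }
    return i
}

// Increment written at the end of the body: skipped on break,
// which matches what a C-style increment clause does.
func firstNegativeIndexManual(_ xs: [Int]) -> Int {
    var i = 0
    while i < xs.count {
        if xs[i] < 0 { break }
        i += 1
    }
    return i
}

print(firstNegativeIndexUsingDefer([3, -1, 4]))  // 2
print(firstNegativeIndexManual([3, -1, 4]))      // 1
```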

On Dec 11, 2015, at 1:33 AM, David Waite <david@alkaline-solutions.com> wrote:

On Dec 10, 2015, at 2:57 PM, thorsten--- via swift-evolution <swift-evolution@swift.org> wrote:

Yes, performance is one thing neglected by the discussions and the proposal.
That just occurred to me and I'm going to revoke my support for the proposal for these reasons.

-Thorsten

Do you mean your support is conditional based on performance?

-DW

gparker42 · December 11, 2015, 5:34am

One problem: zip+stride suffers tremendously at -Onone. One test looked like this (normalized elapsed time; smaller is better):

1.0 zip+stride -O
1.2 for(;;) -O
19.3 for(;;) -Onone
261.7 zip+stride -Onone

Presumably all of these can be improved with compiler work, but I don't know how far zip+stride can be pushed in the -Onone case.

On Dec 10, 2015, at 9:12 PM, Paul Cantrell via swift-evolution <swift-evolution@swift.org> wrote:

David, I found the radical differences in our results troubling, so I did some digging. It turns out that the zip+stride code:

…runs much faster if you actually use both i and j inside the loop:

Weird, right? This is with optimization on (default “production” build). It smells like a compiler quirk.

With that tweak, the zip+stride approach actually clocks in faster than the C-style for. Yes, you read that right: faster. Also smells like a quirk. Am I doing something fantastically stupid in my code? Or maybe it’s just my idiosyncratic taste in indentation? :P

--
Greg Parker gparker@apple.com Runtime Wrangler

owensd · December 11, 2015, 8:33am

I don’t know what you did, your gist 404s.

Here’s an update with the while-loop: Performance for Iterations In Swift · GitHub and using both i and j within the loops. This is a simple OS X framework project with unit tests.

Debug Build:
testZipStride - 2.496s
testCStyleFor - 0.210s
testWhileLoop - 0.220s

Release Build:
testZipStride - 0.029s
testCStyleFor - 0.018s
testWhileLoop - 0.019s
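
For comparison with the normalized table format used earlier in the thread, the six timings above can be divided by the fastest one. This is only arithmetic on the numbers as posted; the `Timing` and `Normalized` types and the `normalized` function are hypothetical names introduced for the sketch.

```swift
struct Timing {
    let name: String
    let seconds: Double
}

struct Normalized {
    let name: String
    let ratio: Double
}

// Divide every timing by the fastest one, so the fastest row reads 1.0.
func normalized(_ timings: [Timing]) -> [Normalized] {
    guard let fastest = timings.map({ $0.seconds }).min(), fastest > 0 else {
        return []
    }
    return timings.map { Normalized(name: $0.name, ratio: $0.seconds / fastest) }
}

// Timings in seconds, copied from the debug and release runs above.
let timings = [
    Timing(name: "testZipStride debug", seconds: 2.496),
    Timing(name: "testCStyleFor debug", seconds: 0.210),
    Timing(name: "testWhileLoop debug", seconds: 0.220),
    Timing(name: "testZipStride release", seconds: 0.029),
    Timing(name: "testCStyleFor release", seconds: 0.018),
    Timing(name: "testWhileLoop release", seconds: 0.019),
]

for row in normalized(timings).sorted(by: { $0.ratio < $1.ratio }) {
    let rounded = (row.ratio * 10).rounded() / 10  // one decimal place
    print(rounded, row.name)
}
```

On these figures the release zip+stride run is about 1.6 times the fastest row and the debug zip+stride run is about 139 times.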

I ran these tests from my MacBook Pro, the previous tests were from my iMac.

When you use the sum += (i - j) construct, I think all you are ending up with is a hot path that the optimizer can optimize better (my guess is that i-j turns into a constant expression; after all, the difference is always 1, but I don’t know enough about the SIL representation to confirm that). If you use a code path where that expression is not constant (again, assuming my suspicion is correct), zip+stride is again slower.
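
One way to probe the guess above, as a sketch only: when second is twice first, as in the test case, every pair satisfies j == 2 * i, so i - j equals -i. The value itself is therefore not a single constant; what stays constant (at 1) is the step between successive differences. Whether the optimizer exploits that is not something this sketch can show. The function name `differences` is hypothetical and the spelling is Swift 3+.

```swift
// All values of i - j produced by the zipped strides, in iteration order.
func differences(first: Int, second: Int) -> [Int] {
    return zip(stride(from: first, to: 0, by: -1),
               stride(from: second, to: 0, by: -2)).map { pair in
        pair.0 - pair.1
    }
}

let diffs = differences(first: 10, second: 20)
print(diffs)  // [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]

// Step between consecutive differences.
let steps = zip(diffs, diffs.dropFirst()).map { pair in pair.1 - pair.0 }
print(steps)  // [1, 1, 1, 1, 1, 1, 1, 1, 1]
```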

I would argue the following:

- The code is not objectively easier to read or understand with the zip+stride construct (arguably, they are not even semantically equivalent).
- The debug builds are prohibitively slower, especially in the context of code with high-performance requirements (I’m doing a lot of prototyping in Swift in the context of games, so yes, performance matters considerably).
- The optimized builds are still slower than the for-in “equivalent” functionality.
- The optimizer is inconsistent, like all optimizers are (this is a simple truth, not a value judgement - optimizers are not magic, they are code that is run like any other code and can only do as well as they are coded under the conditions they are coded against), at actually producing similar results with code that ends up with slightly different shapes.
- There is no functionally equivalent version of the code that I can write that is not more verbose and does not require artificial scoping constructs to achieve the same behavior.

So no, there is no evidence that I’ve seen to reconsider my opinion that this proposal should not be implemented. If there is evidence to show that my findings are incorrect or a poor summary of the general problem I am seeing, then of course I would reconsider my opinion.

-David


dwaite · December 11, 2015, 11:20am

This might be a controversial opinion, but:

- The indices/values being created by a for loop should be immutable through each iteration of the actual loop (this is actually enforced by for-in and forEach()). This is appropriate, for example, for iterating a sequence.
- If you are modifying indices within the body of a loop, you should use while or repeat..while. This is appropriate for use cases like traversal of a singly-linked list.
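
The two buckets can be sketched as follows, in Swift 3+ spelling. The `Node` class and both function names are hypothetical and exist only for this illustration.

```swift
// A minimal singly-linked list node (hypothetical, for illustration).
final class Node {
    let value: Int
    var next: Node?
    init(_ value: Int, next: Node? = nil) {
        self.value = value
        self.next = next
    }
}

// Bucket 1: for-in. The index i is a fresh immutable binding on each pass.
func evens(upTo limit: Int) -> [Int] {
    var result: [Int] = []
    for i in stride(from: 0, to: limit, by: 2) {
        // Writing `i += 1` here would not compile: i is a constant.
        result.append(i)
    }
    return result
}

// Bucket 2: while. The cursor is advanced inside the body.
func values(from head: Node?) -> [Int] {
    var result: [Int] = []
    var cursor = head
    while let node = cursor {
        result.append(node.value)
        cursor = node.next
    }
    return result
}

print(evens(upTo: 7))                                      // [0, 2, 4, 6]
print(values(from: Node(1, next: Node(2, next: Node(3))))) // [1, 2, 3]
```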

We have seen plenty of for statements that fit each bucket, so there isn’t a single fix-it option. In fact, I don’t believe it is appropriate to have a fix-it option more complex than “it looks like you meant to use a range”. We will not be able to interpret the purpose of the loop, and it will be easier for developers to recognize the meaning of the code they wrote than that of our generated equivalent.

Defer might be appropriate in some cases. It might not be appropriate in others. Why are we trying to guess the programmer’s intentions?

-DW

On Dec 11, 2015, at 12:55 AM, thorsten@portableinnovations.de wrote:

I revoked my support for two reasons: one, worries about performance, which David Owens II has in the meantime confirmed suffers considerably; and two, the repeated suggestions of using defer for the incrementing clause even though it changes semantics.
Removing the for-loop would therefore mean that we lose a construct needed in cases where high performance is required, and that instead of increasing readability we would decrease it in those cases where defer would be misused.

-Thorsten

thorsten · December 11, 2015, 8:00am

Sorry, no, not at the moment, as these results are too quirky to be relied on. Furthermore, Greg Parker showed problems with -Onone. Removing a language feature should rest on a more solid foundation than what we currently have.

-Thorsten

On Dec 11, 2015, at 6:12 AM, Paul Cantrell <cantrell@pobox.com> wrote:

David and Thorsten, you might want to reconsider your reviews?

Paul_Cantrell · December 11, 2015, 3:30pm

With that tweak, the zip+stride approach actually clocks in faster than the C-style for. Yes, you read that right: faster.

One problem: zip+stride suffers tremendously at -Onone. One test looked like this (normalized elapsed time; smaller is better)

Do we really care about -Onone performance? Doesn’t that flag specifically mean “I don’t care about performance”?

All kinds of Swift code incur massive performance penalties with -Onone, but the core team hasn’t let that hold back the language. See the “results” section here, for example: Apples to apples, Part II · Jesse Squires

IMO, we should design languages around performance concerns only when a construct has an _inherent_ performance limitation. I’d say these timing results show pretty clearly that no such inherent limitation exists here.

Cheers, P

michelf · December 11, 2015, 2:45pm

Defer is geared to do cleanup work, work that must not be avoided even when an exception is thrown. If one day Swift becomes capable of properly unwinding the stack when a C++ or Objective-C exception is thrown, defer's performance characteristics might change a little bit. More code will be added to handle the exception case, and an entry will have to be added to the unwind table so the unwind mechanism can call that increment that should not be called in the first place.

That's not supported right now, but the Swift team at Apple has obviously been thinking about it.
https://github.com/apple/swift/blob/master/docs/ErrorHandlingRationale.rst

All this illustrates that defer is the wrong mechanism for implementing a loop. Add that the semantics are slightly wrong anyway, and the only remaining thing in its favor is that it's more syntactically convenient. Well, I think that's a problem. It must not be syntactically more convenient. Let's not encourage the use of defer to implement a loop.

Keeping the C-style for loop would help avoid defer being used in that situation. Adding an optional increment clause to a while loop could do the trick too. There might be better solutions. But please, not defer.

On Dec 11, 2015, at 6:20 AM, David Waite <david@alkaline-solutions.com> wrote:

Defer might be appropriate in some cases. It might not be appropriate in others. Why are we trying to guess the programmer’s intentions?

--
Michel Fortin
michel.fortin@michelf.ca
https://michelf.ca

Paul_Cantrell · December 11, 2015, 3:44pm

Your revised results are now right in line with what I get in my test harness, so that’s reassuring!

I’d quibble with this:

The optimized builds are still slower than the for-in “equivalent” functionality.

That’s not an accurate summary. Depending on precisely what’s in the loop, the for-in flavor is clocking in anywhere from 80% slower to 20% faster.

None of this performance testing undercuts your entirely valid concerns about syntax. We have, I think, widespread agreement on the list that the C-style for is very rarely used in most Swift code in the wild — but if your usage patterns are unusual and you use it a lot, I can see why you’d be reluctant to part with it!

It’s a question, then, of whether it’s worth having a leaner language at the expense of making some less-common code more verbose when optimized. I’m not sure that any of the C-style for audits people have done on the list have covered games. Are there other game developers on the list using Swift who could do the audit on their code?

Cheers,

Paul

On Dec 11, 2015, at 2:33 AM, David Owens II <david@owensd.io> wrote:

I don’t know what you did, your gist 404s.

Here’s an update with the while-loop: Performance for Iterations In Swift · GitHub and using both i and j within the loops. This is a simple OS X framework project with unit tests.

Debug Build:
testZipStride - 2.496s
testCStyleFor - 0.210s
testWhileLoop - 0.220s

Release Build:
testZipStride - 0.029s
testCStyleFor - 0.018s
testWhileLoop - 0.019s

I ran these tests from my MacBook Pro, the previous tests were from my iMac.

When you use the sum += (i - j) construct, I think all you are ending up with is a hot path that the optimizer can optimize better (my guess is that i-j turns into a constant expression; after all, the difference is always 1, but I don’t know enough about the SIL representation to confirm that). If you use a code path where that expression is not constant (again, assuming my suspicion is correct), zip+stride is again slower.

I would argue the following:

- The code is not objectively easier to read or understand with the zip+stride construct (arguably, they are not even semantically equivalent).
- The debug builds are prohibitively slower, especially in the context of code with high-performance requirements (I’m doing a lot of prototyping in Swift in the context of games, so yes, performance matters considerably).
- The optimized builds are still slower than the for-in “equivalent” functionality.
- The optimizer is inconsistent, like all optimizers are (this is a simple truth, not a value judgement - optimizers are not magic, they are code that is run like any other code and can only do as well as they are coded under the conditions they are coded against), at actually producing similar results with code that ends up with slightly different shapes.
- There is no functionally equivalent version of the code that I can write that is not more verbose and does not require artificial scoping constructs to achieve the same behavior.

So no, there is no evidence that I’ve seen to reconsider my opinion that this proposal should not be implemented. If there is evidence to show that my findings are incorrect or a poor summary of the general problem I am seeing, then of course I would reconsider my opinion.

-David

On Dec 10, 2015, at 9:12 PM, Paul Cantrell <cantrell@pobox.com> wrote:

Hold the presses.

David, I found the radical differences in our results troubling, so I did some digging. It turns out that the zip+stride code:

    var sum = 0
    for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += 1
    }

…runs much faster if you actually use both i and j inside the loop:

    var sum = 0
    for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += i-j
    }

Weird, right? This is with optimization on (default “production” build). It smells like a compiler quirk.

With that tweak, the zip+stride approach actually clocks in faster than the C-style for. Yes, you read that right: faster. Also smells like a quirk. Am I doing something fantastically stupid in my code? Or maybe it’s just my idiosyncratic taste in indentation? :P

Here’s my test case, which was a command-line app with manual timing, followed by David’s dropped into the same harness, followed by David’s but with sum += i-j instead of sum += 1:

    https://gist.github.com/pcantrell/6bbe80e630d227ed0262

Point is: no big performance difference here; even a performance advantage (that is probably a compiler artifact).

David and Thorsten, you might want to reconsider your reviews?

Results:

—————— Paul’s comparison ——————

zip+stride

  Iter 0: 0.519110977649689
  Iter 1: 0.503385007381439
  Iter 2: 0.503321051597595
  Iter 3: 0.485216021537781
  Iter 4: 0.524757027626038
  Iter 5: 0.478078007698059
  Iter 6: 0.503880977630615
  Iter 7: 0.498068988323212
  Iter 8: 0.485781013965607
         ——————————————
  Median: 0.524757027626038

C-style

  Iter 0: 0.85480797290802
  Iter 1: 0.879491031169891
  Iter 2: 0.851797997951508
  Iter 3: 0.836017966270447
  Iter 4: 0.863684952259064
  Iter 5: 0.837742984294891
  Iter 6: 0.839070022106171
  Iter 7: 0.849772989749908
  Iter 8: 0.819278955459595
         ——————————————
  Median: 0.863684952259064

Zip+stride takes 0.607579217692143x the time of C-style for
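
A side note on the "Median" lines: in each result block the reported value equals the Iter 4 sample, which would be consistent with taking the middle element of the samples without sorting them first. That is only a guess, since the harness is not reproduced here. A minimal sketch of a median over sorted samples, using the numbers printed above (`median` is a hypothetical helper, not code from the gist):

```swift
// Median of a non-empty list of samples: sort first, then take the middle.
func median(_ samples: [Double]) -> Double {
    precondition(!samples.isEmpty, "median needs at least one sample")
    let sorted = samples.sorted()
    let mid = sorted.count / 2
    if sorted.count % 2 == 1 { return sorted[mid] }
    return (sorted[mid - 1] + sorted[mid]) / 2
}

// Samples copied from "Paul's comparison" above.
let zipStride = [0.519110977649689, 0.503385007381439, 0.503321051597595,
                 0.485216021537781, 0.524757027626038, 0.478078007698059,
                 0.503880977630615, 0.498068988323212, 0.485781013965607]
let cStyle = [0.85480797290802, 0.879491031169891, 0.851797997951508,
              0.836017966270447, 0.863684952259064, 0.837742984294891,
              0.839070022106171, 0.849772989749908, 0.819278955459595]

print(median(zipStride))                   // 0.503321051597595 (Iter 2)
print(median(cStyle))                      // 0.849772989749908 (Iter 7)
print(median(zipStride) / median(cStyle))  // ratio based on sorted medians
```

With these samples the sorted medians still put zip+stride well ahead of the C-style loop, so the conclusion of the comparison is unchanged.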

—————— David’s comparison ——————

zip+stride

  Iter 0: 1.15285503864288
  Iter 1: 1.1244450211525
  Iter 2: 1.24192994832993
  Iter 3: 1.02782195806503
  Iter 4: 1.13640999794006
  Iter 5: 1.15879601240158
  Iter 6: 1.12114900350571
  Iter 7: 1.21364599466324
  Iter 8: 1.10698300600052
         ——————————————
  Median: 1.13640999794006

C-style

  Iter 0: 0.375869989395142
  Iter 1: 0.371365010738373
  Iter 2: 0.356527984142303
  Iter 3: 0.384984970092773
  Iter 4: 0.367590010166168
  Iter 5: 0.365644037723541
  Iter 6: 0.384257972240448
  Iter 7: 0.379297018051147
  Iter 8: 0.363133013248444
         ——————————————
  Median: 0.367590010166168

Zip+stride takes 3.09151491202482x the time of C-style for

—————— David’s comparison, actually using indices in the loop ——————

zip+stride

  Iter 0: 0.328687965869904
  Iter 1: 0.332105994224548
  Iter 2: 0.336817979812622
  Iter 3: 0.321089029312134
  Iter 4: 0.338591992855072
  Iter 5: 0.348567008972168
  Iter 6: 0.34687602519989
  Iter 7: 0.34755402803421
  Iter 8: 0.341500997543335
         ——————————————
  Median: 0.338591992855072

C-style

  Iter 0: 0.422354996204376
  Iter 1: 0.427953958511353
  Iter 2: 0.403640985488892
  Iter 3: 0.415378987789154
  Iter 4: 0.403639018535614
  Iter 5: 0.416707038879395
  Iter 6: 0.415345013141632
  Iter 7: 0.417587995529175
  Iter 8: 0.415713012218475
         ——————————————
  Median: 0.403639018535614

Zip+stride takes 0.838848518865867x the time of C-style for

Program ended with exit code: 0

Cheers,

Paul

On Dec 10, 2015, at 5:36 PM, David Owens II <david@owensd.io> wrote:

Here’s my basic test case:

let first = 10000000
let second = 20000000

class LoopPerfTests: XCTestCase {

    func testZipStride() {
        self.measureBlock {
            var sum = 0
            for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
                if i % 2 == 0 { continue }
                sum += 1
            }
            print(sum)
        }
    }

    func testCStyleFor() {
        self.measureBlock {
            var sum = 0
            for var i = first, j = second; i > 0 && j > 0; i -= 1, j -= 2 {
                if i % 2 == 0 { continue }
                sum += 1
            }
            print(sum)
        }

    }

}

Non-optimized timings:
testCStyleFor - 0.126s
testZipStride - 2.189s

Optimized timings:
testCStyleFor - 0.008s
testZipStride - 0.015s

That’s a lot worse than 34%; even in optimized builds, that’s 2x slower and in debug builds, that’s 17x slower. I think it’s unreasonable to force people to write a more verbose while-loop construct to simply get the performance they need.
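
The 2x and 17x figures follow from the timings listed above. A quick check of the arithmetic (the variable names are just for illustration):

```swift
// Timings copied from the post, in seconds.
let debugCStyle = 0.126, debugZip = 2.189
let releaseCStyle = 0.008, releaseZip = 0.015

let debugRatio = debugZip / debugCStyle        // a little over 17
let releaseRatio = releaseZip / releaseCStyle  // just under 2

print("debug:   zip+stride takes \(debugRatio)x the time of C-style for")
print("release: zip+stride takes \(releaseRatio)x the time of C-style for")
```

For comparison, "34% slower" corresponds to a ratio of 1.34.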

Also, the readability argument is very subjective; for example, I don’t find the zip version more readable. In fact, I think it obscures what the logic of the loop is doing. But again, that’s subjective.

-David

On Dec 10, 2015, at 2:41 PM, Paul Cantrell <cantrell@pobox.com> wrote:

Is there any guarantee that these two loops have the exact same runtime performance?

for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
   if i % 2 == 0 { continue }
   print(i, j)
}

for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
   if i % 2 == 0 { continue }
   print(i, j)
}

In a quick and dirty test, the second is approximately 34% slower.

I’d say that’s more than acceptable for the readability gain. If you’re in that rare stretch of critical code where the extra 34% actually matters, write it using a while loop instead.
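
A sketch of the while-loop fallback mentioned above, next to the zip+stride form, in Swift 3+ syntax. The function names are hypothetical, and the while body is restructured to avoid `continue` so that both decrements run on every pass.

```swift
// zip+stride form: collects the "i j" pairs the loop would print.
func pairsViaZip() -> [String] {
    var out: [String] = []
    for (i, j) in zip(stride(from: 10, to: 0, by: -1),
                      stride(from: 20, to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        out.append("\(i) \(j)")
    }
    return out
}

// while form: same condition and same decrements as the C-style for.
func pairsViaWhile() -> [String] {
    var out: [String] = []
    var i = 10, j = 20
    while i > 0 && j > 0 {
        if i % 2 != 0 { out.append("\(i) \(j)") }
        i -= 1
        j -= 2
    }
    return out
}

print(pairsViaZip())    // ["9 18", "7 14", "5 10", "3 6", "1 2"]
print(pairsViaWhile())  // same pairs
```

The while version keeps `i` and `j` alive after the loop, which is the scoping cost raised elsewhere in this thread.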

P

On Dec 10, 2015, at 4:07 PM, David Owens II via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 10, 2015, at 1:57 PM, thorsten--- via swift-evolution <swift-evolution@swift.org> wrote:

Yes, performance is one thing neglected by the discussions and the proposal.

This is my primary objection to this proposal; it assumes (or neglects?) that all of the types used can magically be inlined to nothing but the imperative code. This isn’t magical, someone has to implement the optimizations to do this.

Is there any guarantee that these two loops have the exact same runtime performance?

for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
   if i % 2 == 0 { continue }
   print(i, j)
}

for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
   if i % 2 == 0 { continue }
   print(i, j)
}

And can you guarantee that performance is relatively the same across debug and release builds? Because historically, Swift has suffered greatly in this regard with respect to the performance of optimized versus non-optimized builds.

These types of optimizer issues are real-world things I’ve had to deal with (and have written up many blog posts about). I get the desire to simplify the constructs, but we need an escape hatch to write performant code when the optimizer isn’t up to the job.

-David

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Wallacy · December 11, 2015, 5:11pm

FWIW:

Performance differences between loop forms exist in any language, and in the vast
majority of cases they are only a matter of compiler optimization differences:

Using this example:

"For instance, the following code shows that the "do-while" is a bit
faster. This is because the "jmp" instruction is not used by the "do-while"
loop."

int main(int argc, char* argv[]) {
    int i;
    char x[100];
    // "FOR" LOOP:
    for (i = 0; i < 100; i++) { x[i] = 0; }
    // "WHILE" LOOP:
    i = 0; while (i < 100) { x[i++] = 0; }
    // "DO-WHILE" LOOP:
    i = 0; do { x[i++] = 0; } while (i < 100);
    return 0;
}

*// "FOR" LOOP:*

010013C8 mov dword ptr [ebp-0Ch],0
010013CF jmp wmain+3Ah (10013DAh)

  for(i=0; i<100; i++ )
  { x[i] = 0;
    010013D1 mov eax,dword ptr [ebp-0Ch] <<< UPDATE i
    010013D4 add eax,1
    010013D7 mov dword ptr [ebp-0Ch],eax
    010013DA cmp dword ptr [ebp-0Ch],64h <<< TEST
    010013DE jge wmain+4Ah (10013EAh) <<< COND JUMP
    010013E0 mov eax,dword ptr [ebp-0Ch] <<< DO THE JOB..
    010013E3 mov byte ptr [ebp+eax-78h],0
    010013E8 jmp wmain+31h (10013D1h) <<< UNCOND JUMP
  }

*// "WHILE" LOOP:*

  i=0;
  010013EA mov dword ptr [ebp-0Ch],0
  while(i<100 )
  { x[i++] = 0;
    010013F1 cmp dword ptr [ebp-0Ch],64h <<< TEST
    010013F5 jge wmain+6Ah (100140Ah) <<< COND JUMP
    010013F7 mov eax,dword ptr [ebp-0Ch] <<< DO THE JOB..
    010013FA mov byte ptr [ebp+eax-78h],0
    010013FF mov ecx,dword ptr [ebp-0Ch] <<< UPDATE i
    01001402 add ecx,1
    01001405 mov dword ptr [ebp-0Ch],ecx
    01001408 jmp wmain+51h (10013F1h) <<< UNCOND JUMP
  }

*// "DO-WHILE" LOOP:*

  i=0;
  0100140A mov dword ptr [ebp-0Ch],0
  do
  { x[i++] = 0;
    01001411 mov eax,dword ptr [ebp-0Ch] <<< DO THE JOB..
    01001414 mov byte ptr [ebp+eax-78h],0
    01001419 mov ecx,dword ptr [ebp-0Ch] <<< UPDATE i
    0100141C add ecx,1
    0100141F mov dword ptr [ebp-0Ch],ecx
    01001422 cmp dword ptr [ebp-0Ch],64h <<< TEST
    01001426 jl wmain+71h (1001411h) <<< COND JUMP
  }
  while(i<100);
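
For comparison, a sketch of the same three loop shapes in Swift 3+ syntax (for-in over a range, while, and repeat-while). The function names are made up for illustration. Whether the generated code differs between the three depends on the compiler and optimization level; the listing above comes from an unoptimized C build.

```swift
// "FOR" loop: for-in over a half-open range.
func zeroWithForIn() -> [UInt8] {
    var x = [UInt8](repeating: 1, count: 100)
    for i in 0..<100 { x[i] = 0 }
    return x
}

// "WHILE" loop: test first, then body and update.
func zeroWithWhile() -> [UInt8] {
    var x = [UInt8](repeating: 1, count: 100)
    var i = 0
    while i < 100 { x[i] = 0; i += 1 }
    return x
}

// "DO-WHILE" loop: body and update first, then test.
func zeroWithRepeatWhile() -> [UInt8] {
    var x = [UInt8](repeating: 1, count: 100)
    var i = 0
    repeat { x[i] = 0; i += 1 } while i < 100
    return x
}

print(zeroWithForIn() == zeroWithWhile())        // true
print(zeroWithWhile() == zeroWithRepeatWhile())  // true
```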

···

On Fri, Dec 11, 2015 at 06:34, David Owens II via swift-evolution <swift-evolution@swift.org> wrote:

I don’t know what you did, your gist 404s.

Here’s an update with the while-loop, using both i and j within the
loops (gist: “Performance for Iterations In Swift” on GitHub).
This is a simple OS X framework project with unit tests.

*Debug Build:*

   - testZipStride - 2.496s
   - testCStyleFor - 0.210s
   - testWhileLoop - 0.220s

*Release Build:*

   - testZipStride - 0.029s
   - testCStyleFor - 0.018s
   - testWhileLoop - 0.019s

I ran these tests from my MacBook Pro, the previous tests were from my
iMac.

When you use the sum += (i - j) construct, I think all you are ending up
with is a hot path that the optimizer can optimize better (my guess
is that the i-j turns into a constant expression - after all, the
difference is always 1, but I don’t know enough about the SIL
representation to confirm that). If you use a code path where that
expression is not constant (again, assuming my suspicion is correct),
the zip+stride is again slower.
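
One way to probe the constant-expression guess is to run a small instance of the same loop and look at the values of i - j. This is a sketch in Swift 3+ syntax, and `differences` is a hypothetical helper. In the test case second is twice first and j falls twice as fast as i, so j == 2 * i on every pass and i - j == -i, which changes each iteration. This says nothing about what the optimizer actually does with the expression; it only shows that the value itself is not constant.

```swift
// Collects i - j for every iteration that survives the `continue`.
func differences(first: Int, second: Int) -> [Int] {
    var out: [Int] = []
    for (i, j) in zip(stride(from: first, to: 0, by: -1),
                      stride(from: second, to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        out.append(i - j)
    }
    return out
}

print(differences(first: 10, second: 20))  // [-9, -7, -5, -3, -1]
```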

I would argue the following:

   1. The code is not objectively easier to read or understand with the
   zip+stride construct (arguably, they are not even semantically equivalent).
   2. The debug builds are prohibitively slower, especially in the
   context of high-performance requirement code (I’m doing a lot of
   prototyping Swift in the context of games, so yes, performance matters
   considerably).
   3. The optimized builds are still slower than the for-in “equivalent”
   functionality.
   4. The optimizer is inconsistent, like all optimizers are (this is a
   simple truth, not a value judgement - optimizers are not magic, they are
   code that is run like any other code and can only do as well as they are
   coded under the conditions they are coded against), at actually producing
   similar results with code that ends up with slightly different shapes.
   5. There is no functionally equivalent version of the code that I can
   write that is not more verbose, while requiring artificial scoping
   constructs, to achieve the same behavior.

So no, there is no evidence that I’ve seen to reconsider my opinion that
this proposal should not be implemented. If there is evidence to show that
my findings are incorrect or a poor summary of the general problem I am
seeing, then of course I would reconsider my opinion.

-David

owensd · December 11, 2015, 5:18pm

As for c-style-for vs while, the two are mechanically convertible.

This is provably false and has been demonstrated, but here is an example of it again:

    var sum = 0
    for var i = 10 /* expr1 */; i > 0 /* expr2 */; i -= 1 /* expr3 */ {
        if i % 2 == 0 { continue } // statement
        sum += 1 // statement
    }
    print(sum)

    var sum = 0
    var i = 10 // expr1
    while i > 0 /* expr2 */ {
        if i % 2 == 0 { continue } // statement
        sum += 1 // statement

        i -= 1 // expr3
    }
    print(sum)

It’s not just a mechanical conversion; early loop exits and continuations need to be considered as well. The rote conversion to the while version above is an infinite loop: at i = 10 the continue skips expr3, so i never changes. Not only that, expr1 now leaks variables into a scope that is no longer contained within the loop.

Where heavy performance is not required, for-in is more readable, maintainable, and optimizable to a sufficient extent that I do not see it as a bar to conversion.

This is a purely philosophical difference that I don’t think we’ll agree on. The injection of types is required to make for-in constructs work, with the hope that the optimizer can clean all of this up later. This is fundamentally the same claim that C++ makes with its zero-cost abstractions. Mike Acton has some great talks on YouTube that demonstrate many of the real-world problems he runs into.

Should the for ;; construct be used for all loop iteration? No, I don’t think so. But we shouldn’t make the claim that the while-loop is an equivalent form, because it’s not. Can it be used to mimic the functionality? Of course, but not in a mechanical conversion like you suggest. Care has to be taken that the statements don’t essentially branch and an artificial scope needs to be added.

    var sum = 0
    if true { // artificial scope so that i does not leak
        var i = 10 // expr1
        while i > 0 /* expr2 */ {
            if i % 2 == 0 { continue } // statement (still skips expr3, as above)
            sum += 1 // statement

            i -= 1 // expr3
        }
    }
    print(sum)
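
For reference, one possible hand conversion that does terminate, sketched in Swift 3+ syntax. It uses a bare `do` block for the artificial scope and `defer` to run expr3 on every pass, including passes that hit `continue`. The function name is hypothetical, and this is one option among several rather than a canonical translation.

```swift
func countOddWithWhile() -> Int {
    var sum = 0
    do {                          // artificial scope: keeps `i` from leaking
        var i = 10                // expr1
        while i > 0 {             // expr2
            defer { i -= 1 }      // expr3: runs whenever the body exits, `continue` included
            if i % 2 == 0 { continue }
            sum += 1
        }
    }
    return sum
}

print(countOddWithWhile())  // 5
```

One difference from the C-style for remains: `defer` also runs when the body exits via `break`, whereas expr3 in a C-style for does not. That only matters if the variable is read after the loop, but it shows the conversion still takes some care.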

And then there is this:

Performance is definitely a consideration; we already reverted a pull request that removed C-style fors from the standard library. I believe Andy is currently looking into where the regressions come from. stride(...) performing poorly seems like something we should fix regardless, since that's arguably the idiomatic way to write such loops and ought to perform well. I agree we should investigate ways to ensure common loops over ranges or strides perform reasonably at -Onone if we move forward with this.

-David

···

On Dec 11, 2015, at 8:27 AM, Erica Sadun via swift-evolution <swift-evolution@swift.org> wrote:

Erica_Sadun · December 11, 2015, 4:27pm

For many of these number-crunching, performance-stretching scenarios, may I suggest once again that if you're doing serious number crunching, Accelerate or a similar approach is to be preferred? As for c-style-for vs while, the two are mechanically convertible.

Where heavy performance is not required, for-in is more readable, maintainable, and optimizable to a sufficient extent that I do not see it as a bar to conversion.

-- E

···

On Dec 11, 2015, at 8:44 AM, Paul Cantrell via swift-evolution <swift-evolution@swift.org> wrote:

Your revised results are now right in line with what I get in my test harness, so that’s reassuring!

I’d quibble with this:

The optimized builds are still slower than the for-in “equivalent” functionality.

That’s not an accurate summary. Depending on precisely what’s in the loop, the for-in flavor is clocking in anywhere from 80% slower to 20% faster.

None of this performance testing undercuts your entirely valid concerns about syntax. We have, I think, widespread agreement on the list that the C-style for is very rarely used in most Swift code in the wild — but if your usage patterns are unusual and you use it a lot, I can see why you’d be reluctant to part with it!

It’s a question, then, of whether it’s worth having a leaner language at the expense of making some less-common code more verbose when optimized. I’m not sure that any of the C-style audits people have done on the list have been games. Are there other game developers on the list using Swift who could do the audit on their code?

Cheers,

Paul

Erica_Sadun · December 11, 2015, 4:49pm

For many of these number-crunching, performance-stretching scenarios, may I suggest once again that if you're doing serious number crunching, Accelerate (or a similar approach) is to be preferred?

As for c-style-for vs while, the two are mechanically convertible: http://imgur.com/G4qxING

Where heavy performance is not required, for-in is more readable and maintainable. And it is still optimizable.

Let me close with three points, one new, two from my previous input on this matter:
* Optimization specifics: I'm not sure that -Onone optimization performance for 2.x should determine whether a feature is or is not in 3.x.
* Extreme coding: I prefer to refactor unnecessarily complex code and eliminate edge case abuse over retaining archaic control-flow patterns.
* Reactionary preservation: If the ultimate goal is to program in C, the C compiler is not going away.

-- E

···

On Dec 11, 2015, at 9:27 AM, Erica Sadun <erica@ericasadun.com> wrote:

On Dec 11, 2015, at 8:44 AM, Paul Cantrell via swift-evolution <swift-evolution@swift.org> wrote:

Your revised results are now right in line with what I get in my test harness, so that’s reassuring!

I’d quibble with this:

The optimized builds are still slower than the for-in “equivalent” functionality.

That’s not an accurate summary. Depending on precisely what’s in the loop, the for-in flavor is clocking in anywhere from 80% slower to 20% faster.

None of this performance testing undercuts your entirely valid concerns about syntax. We have, I think, widespread agreement on the list that the C-style for is very rarely used in most Swift code in the wild — but if your usage patterns are unusual and you use it a lot, I can see why you’d be reluctant to part with it!

It’s a question, then, of whether it’s worth having a leaner language at the expense of making some less-common code more verbose when optimized. I’m not sure that any of the C-style audits people have done on the list have been games. Are there other game developers on the list using Swift who could do the audit on their code?

Cheers,

Paul

On Dec 11, 2015, at 2:33 AM, David Owens II <david@owensd.io> wrote:

I don’t know what you did, your gist 404s.

Here’s an update with the while-loop, using both i and j within the loops (GitHub gist: “Performance for Iterations In Swift”). This is a simple OS X framework project with unit tests.

Debug Build:
testZipStride - 2.496s
testCStyleFor - 0.210s
testWhileLoop - 0.220s

Release Build:
testZipStride - 0.029s
testCStyleFor - 0.018s
testWhileLoop - 0.019s

I ran these tests from my MacBook Pro, the previous tests were from my iMac.

When you use the sum += (i - j) construct, I think all you are ending up with is a hot path that the optimizer can optimize better (my guess is that the i-j turns into a constant expression - after all, the difference is always 1, but I don’t know enough about the SIL representation to confirm that). If you use a code path where that expression is not constant time (again, assuming my suspicion is correct), the zip+stride is again slower.

I would argue the following:

* The code is not objectively easier to read or understand with the zip+stride construct (arguably, they are not even semantically equivalent).
* The debug builds are prohibitively slower, especially in the context of high-performance requirement code (I’m doing a lot of prototyping in Swift in the context of games, so yes, performance matters considerably).
* The optimized builds are still slower than the for-in “equivalent” functionality.
* The optimizer is inconsistent, like all optimizers are (this is a simple truth, not a value judgement - optimizers are not magic, they are code that is run like any other code and can only do as well as they are coded under the conditions they are coded against), at actually producing similar results with code that ends up with slightly different shapes.
* There is no functionally equivalent version of the code that I can write that is not more verbose, while requiring artificial scoping constructs, to achieve the same behavior.

So no, there is no evidence that I’ve seen to reconsider my opinion that this proposal should not be implemented. If there is evidence to show that my findings are incorrect or a poor summary of the general problem I am seeing, then of course I would reconsider my opinion.

-David

On Dec 10, 2015, at 9:12 PM, Paul Cantrell <cantrell@pobox.com> wrote:

Hold the presses.

David, I found the radical differences in our results troubling, so I did some digging. It turns out that the zip+stride code:

    var sum = 0
    for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += 1
    }

…runs much faster if you actually use both i and j inside the loop:

    var sum = 0
    for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += i-j
    }

Weird, right? This is with optimization on (default “production” build). It smells like a compiler quirk.

With that tweak, the zip+stride approach actually clocks in faster than the C-style for. Yes, you read that right: faster. Also smells like a quirk. Am I doing something fantastically stupid in my code? Or maybe it’s just my idiosyncratic taste in indentation? :P

Here’s my test case, which was a command-line app with manual timing, followed by David’s dropped into the same harness, followed by David’s but with sum += i-j instead of sum += 1:

    https://gist.github.com/pcantrell/6bbe80e630d227ed0262

Point is: no big performance difference here; even a performance advantage (that is probably a compiler artifact).

David and Thorsten, you might want to reconsider your reviews?

Results:

—————— Paul’s comparison ——————

zip+stride

  Iter 0: 0.519110977649689
  Iter 1: 0.503385007381439
  Iter 2: 0.503321051597595
  Iter 3: 0.485216021537781
  Iter 4: 0.524757027626038
  Iter 5: 0.478078007698059
  Iter 6: 0.503880977630615
  Iter 7: 0.498068988323212
  Iter 8: 0.485781013965607
         ——————————————
  Median: 0.524757027626038

C-style

  Iter 0: 0.85480797290802
  Iter 1: 0.879491031169891
  Iter 2: 0.851797997951508
  Iter 3: 0.836017966270447
  Iter 4: 0.863684952259064
  Iter 5: 0.837742984294891
  Iter 6: 0.839070022106171
  Iter 7: 0.849772989749908
  Iter 8: 0.819278955459595
         ——————————————
  Median: 0.863684952259064

Zip+stride takes 0.607579217692143x the time of C-style for

—————— David’s comparison ——————

zip+stride

  Iter 0: 1.15285503864288
  Iter 1: 1.1244450211525
  Iter 2: 1.24192994832993
  Iter 3: 1.02782195806503
  Iter 4: 1.13640999794006
  Iter 5: 1.15879601240158
  Iter 6: 1.12114900350571
  Iter 7: 1.21364599466324
  Iter 8: 1.10698300600052
         ——————————————
  Median: 1.13640999794006

C-style

  Iter 0: 0.375869989395142
  Iter 1: 0.371365010738373
  Iter 2: 0.356527984142303
  Iter 3: 0.384984970092773
  Iter 4: 0.367590010166168
  Iter 5: 0.365644037723541
  Iter 6: 0.384257972240448
  Iter 7: 0.379297018051147
  Iter 8: 0.363133013248444
         ——————————————
  Median: 0.367590010166168

Zip+stride takes 3.09151491202482x the time of C-style for

—————— David’s comparison, actually using indices in the loop ——————

zip+stride

  Iter 0: 0.328687965869904
  Iter 1: 0.332105994224548
  Iter 2: 0.336817979812622
  Iter 3: 0.321089029312134
  Iter 4: 0.338591992855072
  Iter 5: 0.348567008972168
  Iter 6: 0.34687602519989
  Iter 7: 0.34755402803421
  Iter 8: 0.341500997543335
         ——————————————
  Median: 0.338591992855072

C-style

  Iter 0: 0.422354996204376
  Iter 1: 0.427953958511353
  Iter 2: 0.403640985488892
  Iter 3: 0.415378987789154
  Iter 4: 0.403639018535614
  Iter 5: 0.416707038879395
  Iter 6: 0.415345013141632
  Iter 7: 0.417587995529175
  Iter 8: 0.415713012218475
         ——————————————
  Median: 0.403639018535614

Zip+stride takes 0.838848518865867x the time of C-style for

Program ended with exit code: 0
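
Since the gist is not reproduced in the thread, here is a hedged sketch of what a command-line harness with manual timing could look like. Everything in it is an assumption rather than the code that produced the numbers above: the helper names `timeRuns` and `median`, the use of Foundation's `Date` as the clock, the smaller loop bounds, the modern `stride(from:to:by:)` spelling, and the `while` loop standing in for the C-style `for`. One detail worth noting: in the output above, every reported "Median" equals the Iter 4 value, which is what taking the middle element without sorting would give; the sketch sorts before picking the middle.

```swift
import Foundation

/// Runs `body` `iterations` times. Returns each run's wall-clock duration in seconds
/// plus the last result, so the loop's work is actually used.
func timeRuns(iterations: Int, _ body: () -> Int) -> (durations: [Double], result: Int) {
    var durations: [Double] = []
    var result = 0
    for _ in 0..<iterations {
        let start = Date()
        result = body()
        durations.append(Date().timeIntervalSince(start))
    }
    return (durations, result)
}

/// Median of a non-empty sample. The values are sorted first.
func median(_ values: [Double]) -> Double {
    precondition(!values.isEmpty, "median needs at least one value")
    let sorted = values.sorted()
    let mid = sorted.count / 2
    return sorted.count % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

let first = 1_000_000   // assumed sizes, smaller than the thread's 10000000 / 20000000
let second = 2_000_000

let zipStride = timeRuns(iterations: 9) {
    var sum = 0
    for (i, j) in zip(stride(from: first, to: 0, by: -1), stride(from: second, to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += i - j
    }
    return sum
}

// While-loop stand-in for the C-style for.
let whileLoop = timeRuns(iterations: 9) {
    var sum = 0
    var i = first, j = second
    while i > 0 && j > 0 {
        if i % 2 != 0 { sum += i - j }
        i -= 1
        j -= 2
    }
    return sum
}

print("zip+stride median:", median(zipStride.durations))
print("while loop median:", median(whileLoop.durations))
print("same result:", zipStride.result == whileLoop.result)
```

Building once with `-Onone` and once with `-O` gives the debug/release split discussed in this thread; the absolute numbers will differ by machine and compiler version.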

Cheers,

Paul

On Dec 10, 2015, at 5:36 PM, David Owens II <david@owensd.io> wrote:

Here’s my basic test case:

let first = 10000000
let second = 20000000

class LoopPerfTests: XCTestCase {

    func testZipStride() {
        self.measureBlock {
            var sum = 0
            for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
                if i % 2 == 0 { continue }
                sum += 1
            }
            print(sum)
        }
    }

    func testCStyleFor() {
        self.measureBlock {
            var sum = 0
            for var i = first, j = second; i > 0 && j > 0; i -= 1, j -= 2 {
                if i % 2 == 0 { continue }
                sum += 1
            }
            print(sum)
        }

    }

}

Non-optimized timings:
testCStyleFor - 0.126s
testZipStride - 2.189s

Optimized timings:
testCStyleFor - 0.008s
testZipStride - 0.015s

That’s a lot worse than 34%; even in optimized builds, that’s 2x slower and in debug builds, that’s 17x slower. I think it’s unreasonable to force people to write a more verbose while-loop construct to simply get the performance they need.
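
The ratios behind that claim can be recomputed from the four timings reported above. This is only arithmetic on the posted numbers; the function name is made up for the example.

```swift
/// How many times slower `slow` is than `fast`.
func slowdown(slow: Double, fast: Double) -> Double {
    return slow / fast
}

// Timings in seconds, copied from the post above.
let debugRatio = slowdown(slow: 2.189, fast: 0.126)    // zip+stride vs C-style, non-optimized
let releaseRatio = slowdown(slow: 0.015, fast: 0.008)  // zip+stride vs C-style, optimized

print(debugRatio)    // about 17.4
print(releaseRatio)  // about 1.9
```

So the optimized gap is a little under 2x and the non-optimized gap a little over 17x, which matches the rounded figures in the post.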

Also, the readability argument is very subjective; for example, I don’t find the zip version more readable. In fact, I think it obscures what the logic of the loop is doing. But again, that’s subjective.

-David

On Dec 10, 2015, at 2:41 PM, Paul Cantrell <cantrell@pobox.com> wrote:

Is there any guarantee that these two loops have the exact same runtime performance?

for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
   if i % 2 == 0 { continue }
   print(i, j)
}

for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
   if i % 2 == 0 { continue }
   print(i, j)
}

In a quick and dirty test, the zip+stride version is approximately 34% slower.

I’d say that’s more than acceptable for the readability gain. If you’re in that rare stretch of critical code where the extra 34% actually matters, write it using a while loop instead.

P

On Dec 10, 2015, at 4:07 PM, David Owens II via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 10, 2015, at 1:57 PM, thorsten--- via swift-evolution <swift-evolution@swift.org> wrote:

Yes, performance is one thing neglected by the discussions and the proposal.

This is my primary objection to this proposal; it assumes (or neglects?) that all of the types used can magically be inlined to nothing but the imperative code. This isn’t magical; someone has to implement the optimizations to do this.

Is there any guarantee that these two loops have the exact same runtime performance?

for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
   if i % 2 == 0 { continue }
   print(i, j)
}

for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
   if i % 2 == 0 { continue }
   print(i, j)
}

And can you guarantee that performance is relatively the same across debug and release builds? Because historically, Swift has suffered greatly in this regard with respect to the performance of optimized versus non-optimized builds.

These types of optimizer issues are real-world things I’ve had to deal with (and have written up many blog posts about). I get the desire to simplify the constructs, but we need an escape hatch to write performant code when the optimizer isn’t up to the job.

-David

owensd · December 11, 2015, 6:18pm

For optimized code, yes, I’ve seen greater than 2x. The example I’ve shown on this thread is nearly 2x.

Many of the performance issues I ran into had to do with ARC aggressively adding retain/release, especially in the context of array access. The optimizer then goes through and removes a bunch of those it determines are not necessary.

Abstractions aren’t performance quirks; they will _always_ introduce overhead, especially in debug builds. The hope is that the optimizer can be sufficiently magical to turn those abstractions into the equivalent fast code.

I completely agree that for many use cases, the performance overhead of for-in will not have a user measurable impact. I’m simply asking for a way to write the cstyle-loop when I need to without jumping through arbitrary hoops of converting the functionality into a while-loop, and not introducing subtle bugs that I need to validate.

-David

···

On Dec 11, 2015, at 9:39 AM, Paul Cantrell <cantrell@pobox.com> wrote:

Did you get 1000x performance differences in optimized code? Or even >2x?

Is there any sign that the 1000x differences at -Onone are not solvable? We’ve speculated sufficiently already; I’d want to hear from the compiler developers on that question.

One should not design a language around its current performance quirks, but certainly _should_ design it around any inherent performance limitations. Whether these are “inherent” is an open question here.

P

Stephen_Canon · December 11, 2015, 4:45pm

If Swift is to be a systems language someday, then we also need to be able to write (parts of) Accelerate in Swift.

– Steve

···

On Dec 11, 2015, at 11:27 AM, Erica Sadun via swift-evolution <swift-evolution@swift.org> wrote:

For many of these number-crunching, performance-stretching scenarios, may I suggest once again that, if you're doing serious number crunching, Accelerate or similar approaches are to be preferred? As for c-style-for vs while, the two are mechanically convertible.

<Screen Shot 2015-12-08 at 3.54.37 PM.png>

Where heavy performance is not required, for-in is more readable, maintainable, and optimizable to a sufficient extent that I do not see it as a bar to conversion.

-- E

cloutiertyler · December 11, 2015, 7:54pm

David, the

if true {

}

scope mechanism can be accomplished with

do {

}

Nonetheless, I would agree that the point remains that the while loop equivalent is more difficult to read, write, and maintain. I would argue, perhaps somewhat controversially, that the loop that should go is the while loop, as it has a strict subset of the functionality of the C-style for loop.

Why not allow each of the three expressions to be optional?

for var i = 0 while i < 5 step i += 1 {

or

while someCondition() {

or

for var i = 0 step i += 1 {

or

for var i = 0 while someOtherCondition() {

The mechanical conversion of for to while that you proposed below demonstrates that there is a purpose to the C style for loop features, why not allow them to be optional?

Tyler

···

On Dec 11, 2015, at 9:18 AM, David Owens II via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 11, 2015, at 8:27 AM, Erica Sadun via swift-evolution <swift-evolution@swift.org> wrote:

As for c-style-for vs while, the two are mechanically convertible.

This is provably false and has been demonstrated, but here is an example of it again:

    var sum = 0
    for var i = 10 /* expr1 */; i > 0 /* expr2 */; i -= 1 /* expr3 */ {
        if i % 2 == 0 { continue } // statement
        sum += 1 // statement
    }
    print(sum)

    var sum = 0
    var i = 10 // expr1
    while i > 0 /* expr2 */ {
        if i % 2 == 0 { continue } // statement
        sum += 1 // statement

        i -= 1 // expr3
    }
    print(sum)

It’s not just a mechanical conversion; consideration for early loop-exits and continuations needs to be made as well. The rote conversion for the while version above is an infinite loop: the continue skips expr3, so i stays at 10 forever. Not only that, expr1 now leaks variables into a scope that is no longer contained within the loop.

Where heavy performance is not required, for-in is more readable, maintainable, and optimizable to a sufficient extent that I do not see it as a bar to conversion.

This is a purely philosophical difference that I don’t think we’ll agree on. The injection of types is required to make for-in constructs work, with the hope that the optimizer can clean all of this up later. This is fundamentally the same claim that C++ makes with its zero-cost abstractions. Mike Acton has some great talks on YouTube that demonstrate many of the real-world problems he runs into.

Should the for ;; construct be used for all loop iteration? No, I don’t think so. But we shouldn’t make the claim that the while-loop is an equivalent form, because it’s not. Can it be used to mimic the functionality? Of course, but not in a mechanical conversion like you suggest. Care has to be taken that the statements don’t essentially branch and an artificial scope needs to be added.

    var sum = 0
    if true {
        var i = 10 // expr1
        while i > 0 /* expr2 */ {
            if i % 2 == 0 {
                i -= 1 // expr3 has to be repeated before every continue
                continue
            } // statement
            sum += 1 // statement

            i -= 1 // expr3
        }
    }
    print(sum)
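
One way to sidestep both problems described above, offered as a sketch rather than as anything proposed in this thread: put the increment in a `defer` at the top of the loop body, so it runs at the end of every iteration even when `continue` is hit, and use a bare `do` block to keep `i` out of the enclosing scope. The function name `countOdd` is hypothetical. One difference from the C-style `for` remains: the deferred increment also runs when the loop is left via `break`, which only matters if `i` is read afterwards inside the `do` block.

```swift
/// Counts the odd values visited by the equivalent of
/// `for var i = start; i > 0; i -= 1`.
func countOdd(from start: Int) -> Int {
    var sum = 0
    do {
        var i = start              // expr1, visible only inside the do block
        while i > 0 {              // expr2
            defer { i -= 1 }       // expr3, runs whenever the loop body is left
            if i % 2 == 0 { continue }
            sum += 1
        }
    }
    return sum
}

print(countOdd(from: 10))  // 5
```

The loop control is still split across three lines, so this addresses the correctness and scoping concerns but not the readability one.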

And then there is this:

Performance is definitely a consideration; we already reverted a pull request that removed C-style fors from the standard library. I believe Andy is currently looking into where the regressions come from. stride(...) performing poorly seems like something we should fix regardless, since that's arguably the idiomatic way to write such loops and ought to perform well. I agree we should investigate ways to ensure common loops over ranges or strides perform reasonably at -Onone if we move forward with this.

-David

Stephen_Canon · December 11, 2015, 5:13pm

I should note that I agree that performance is mainly a red herring. There is likely some optimizer work to be done, but unless the optimizer guys speak up and say it’s infeasible, I don’t think it’s a major concern.

What does concern me is that the mechanical translation into a while loop is strictly worse from the standpoint of code readability and maintenance. The only great virtue of c-style-for is that all of the loop control is in a single place, which makes it easy to find and study. None of the proposals for replacing it that I’ve seen floated[1] have that feature (which is literally the only thing that it has going for it, in my book).

The proposed while-loop is especially painful, as not only does it split the control structure into three source locations (one of them at the end of the loop), but it also loses scoping of the iteration variable.

– Steve

[1] ok, except for proposals that are just mechanical translations of semicolons into keywords like:

for let i=0 while i<N step i+=2 {
}
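
For comparison, later Swift releases can express the same three pieces (start, condition, step) in a single loop header using standard-library calls. This is a sketch that assumes Swift 3.1 or newer, where `sequence(first:next:)` and `prefix(while:)` are both available; neither existed when this thread was written, and the helper name `evenSteps` and its parameter are placeholders.

```swift
/// Values visited by the equivalent of `for var i = 0; i < n; i += 2`.
func evenSteps(below n: Int) -> [Int] {
    var visited: [Int] = []
    // Start (0), step (+2) and condition (i < n) all sit in the loop header.
    for i in sequence(first: 0, next: { $0 + 2 }).lazy.prefix(while: { $0 < n }) {
        visited.append(i)
    }
    return visited
}

print(evenSteps(below: 10))  // [0, 2, 4, 6, 8]
```

The loop variable stays scoped to the loop, at the cost of two closures that the optimizer has to see through, which is the performance concern raised elsewhere in this thread.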

···

On Dec 11, 2015, at 11:27 AM, Erica Sadun via swift-evolution <swift-evolution@swift.org> wrote:

For many of these number-crunching, performance-stretching scenarios, may I suggest once again that, if you're doing serious number crunching, Accelerate or similar approaches are to be preferred? As for c-style-for vs while, the two are mechanically convertible.

<Screen Shot 2015-12-08 at 3.54.37 PM.png>

Where heavy performance is not required, for-in is more readable, maintainable, and optimizable to a sufficient extent that I do not see it as a bar to conversion.

-- E

On Dec 11, 2015, at 8:44 AM, Paul Cantrell via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Your revised results are now right in line with what I get in my test harness, so that’s reassuring!

I’d quibble with this:

The optimized builds are still slower than the for-in “equivalent” functionality.

That’s not an accurate summary. Depending on precisely what’s in the loop, the for-in flavor is clocking in anywhere from 80% slower to 20% faster.

None of this performance testing undercuts your entirely valid concerns about syntax. We have, I think, widespread agreement on the list that the C-style for is very rarely used in most Swift code in the wild — but if your usage patterns are unusual and you use it a lot, I can see why you’d be reluctant to part with it!

It’s a question, then, of whether it’s worth having a leaner language at the expense of making some less-common code more verbose when optimized. I’m not sure that any of the C-style audits people have done on the list have been games. Are there other game developers on the list using Swift who could do the audit on their code?

Cheers,

Paul

On Dec 11, 2015, at 2:33 AM, David Owens II <david@owensd.io <mailto:david@owensd.io>> wrote:

I don’t know what you did, your gist 404s.

Here’s an update with the while-loop: Performance for Iterations In Swift · GitHub and using both i and j within the loops. This is a simple OS X framework project with unit tests.

Debug Build:
testZipStride - 2.496s
testCStyleFor - 0.210s
testWhileLoop - 0.220s

Release Build:
testZipStride - 0.029s
testCStyleFor - 0.018s
testWhileLoop - 0.019s

I ran these tests from my MacBook Pro, the previous tests were from my iMac.

When you use the sum += (i - j) construct, I think all you are ending up with a hot-path that the optimizer can end up optimizing better (my guess is that the i-j turns into a constant expression - after all, the difference is always 1, but I don’t know enough about the SIL representation to confirm that). If you use a code path where that expression is not constant time (again, assuming my suspicion is correct), the zip+stride is against slower.

I would argue the following:

1. The code is not objectively easier to read or understand with the zip+stride construct (arguably, they are not even semantically equivalent).
2. The debug builds are prohibitively slower, especially in the context of high-performance requirement code (I’m doing a lot of prototyping Swift in the context of games, so yes, performance matters considerably).
3. The optimized builds are still slower than the for-in “equivalent” functionality.
4. The optimizer is inconsistent, like all optimizers are (this is a simple truth, not a value judgement - optimizers are not magic, they are code that is run like any other code and can only do as well as they are coded under the conditions they are coded against), at actually producing similar results with code that ends up with slightly different shapes.
5. There is no functionally equivalent version of the code that I can write that is not more verbose, while requiring artificial scoping constructs, to achieve the same behavior.

So no, there is no evidence that I’ve seen to reconsider my opinion that this proposal should not be implemented. If there is evidence to show that my findings are incorrect or a poor summary of the general problem I am seeing, then of course I would reconsider my opinion.

-David

On Dec 10, 2015, at 9:12 PM, Paul Cantrell <cantrell@pobox.com> wrote:

Hold the presses.

David, I found the radical differences in our results troubling, so I did some digging. It turns out that the zip+stride code:

    var sum = 0
    for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += 1
    }

…runs much faster if you actually use both i and j inside the loop:

    var sum = 0
    for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += i-j
    }

Weird, right? This is with optimization on (default “production” build). It smells like a compiler quirk.

With that tweak, the zip+stride approach actually clocks in faster than the C-style for. Yes, you read that right: faster. Also smells like a quirk. Am I doing something fantastically stupid in my code? Or maybe it’s just my idiosyncratic taste in indentation? :P

Here’s my test case, which was a command-line app with manual timing, followed by David’s dropped into the same harness, followed by David’s but with sum += i-j instead of sum += 1:

    https://gist.github.com/pcantrell/6bbe80e630d227ed0262
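A command-line harness with manual timing, of the kind described above, might look roughly like this. It is a sketch and not the contents of the gist: the `measure` and `median` helpers are hypothetical names, the loop bounds are smaller than the thread's so it finishes quickly, and it is written for current Swift, so it uses the free function `stride(from:to:by:)` and compares against a while loop because the C-style for no longer compiles. This `median` sorts the timings before taking the middle element.

```swift
import Foundation

// Hypothetical helper: run `body` `runs` times and return each run's duration in seconds.
func measure(runs: Int, _ body: () -> Int) -> [TimeInterval] {
    var timings: [TimeInterval] = []
    for run in 0..<runs {
        let start = Date()
        let result = body()
        let elapsed = Date().timeIntervalSince(start)
        // Printing the result keeps the loop's work observable to the optimizer.
        print("  Iter \(run): \(elapsed) (result \(result))")
        timings.append(elapsed)
    }
    return timings
}

// Hypothetical helper: median of a non-empty list (sort, then take the middle element).
func median(_ values: [TimeInterval]) -> TimeInterval {
    let sorted = values.sorted()
    return sorted[sorted.count / 2]
}

// Assumed sizes, smaller than the thread's 10000000 and 20000000.
let first = 100_000
let second = 200_000

print("zip+stride")
let zipTimes = measure(runs: 9) {
    var sum = 0
    for (i, j) in zip(stride(from: first, to: 0, by: -1), stride(from: second, to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += i - j
    }
    return sum
}

print("while loop")
let whileTimes = measure(runs: 9) {
    var sum = 0
    var i = first, j = second
    while i > 0 && j > 0 {
        if i % 2 != 0 { sum += i - j }
        i -= 1
        j -= 2
    }
    return sum
}

print("Zip+stride takes \(median(zipTimes) / median(whileTimes))x the time of the while loop")
```

Timings from a harness like this are only meaningful in an optimized build, and they will vary by machine and compiler version.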

Point is: no big performance difference here; even a performance advantage (that is probably a compiler artifact).

David and Thorsten, you might want to reconsider your reviews?

Results:

—————— Paul’s comparison ——————

zip+stride

  Iter 0: 0.519110977649689
  Iter 1: 0.503385007381439
  Iter 2: 0.503321051597595
  Iter 3: 0.485216021537781
  Iter 4: 0.524757027626038
  Iter 5: 0.478078007698059
  Iter 6: 0.503880977630615
  Iter 7: 0.498068988323212
  Iter 8: 0.485781013965607
         ——————————————
  Median: 0.524757027626038

C-style

  Iter 0: 0.85480797290802
  Iter 1: 0.879491031169891
  Iter 2: 0.851797997951508
  Iter 3: 0.836017966270447
  Iter 4: 0.863684952259064
  Iter 5: 0.837742984294891
  Iter 6: 0.839070022106171
  Iter 7: 0.849772989749908
  Iter 8: 0.819278955459595
         ——————————————
  Median: 0.863684952259064

Zip+stride takes 0.607579217692143x the time of C-style for

—————— David’s comparison ——————

zip+stride

  Iter 0: 1.15285503864288
  Iter 1: 1.1244450211525
  Iter 2: 1.24192994832993
  Iter 3: 1.02782195806503
  Iter 4: 1.13640999794006
  Iter 5: 1.15879601240158
  Iter 6: 1.12114900350571
  Iter 7: 1.21364599466324
  Iter 8: 1.10698300600052
         ——————————————
  Median: 1.13640999794006

C-style

  Iter 0: 0.375869989395142
  Iter 1: 0.371365010738373
  Iter 2: 0.356527984142303
  Iter 3: 0.384984970092773
  Iter 4: 0.367590010166168
  Iter 5: 0.365644037723541
  Iter 6: 0.384257972240448
  Iter 7: 0.379297018051147
  Iter 8: 0.363133013248444
         ——————————————
  Median: 0.367590010166168

Zip+stride takes 3.09151491202482x the time of C-style for

—————— David’s comparison, actually using indices in the loop ——————

zip+stride

  Iter 0: 0.328687965869904
  Iter 1: 0.332105994224548
  Iter 2: 0.336817979812622
  Iter 3: 0.321089029312134
  Iter 4: 0.338591992855072
  Iter 5: 0.348567008972168
  Iter 6: 0.34687602519989
  Iter 7: 0.34755402803421
  Iter 8: 0.341500997543335
         ——————————————
  Median: 0.338591992855072

C-style

  Iter 0: 0.422354996204376
  Iter 1: 0.427953958511353
  Iter 2: 0.403640985488892
  Iter 3: 0.415378987789154
  Iter 4: 0.403639018535614
  Iter 5: 0.416707038879395
  Iter 6: 0.415345013141632
  Iter 7: 0.417587995529175
  Iter 8: 0.415713012218475
         ——————————————
  Median: 0.403639018535614

Zip+stride takes 0.838848518865867x the time of C-style for

Program ended with exit code: 0

Cheers,

Paul

On Dec 10, 2015, at 5:36 PM, David Owens II <david@owensd.io> wrote:

Here’s my basic test case:

let first = 10000000
let second = 20000000

class LoopPerfTests: XCTestCase {

    func testZipStride() {
        self.measureBlock {
            var sum = 0
            for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
                if i % 2 == 0 { continue }
                sum += 1
            }
            print(sum)
        }
    }

    func testCStyleFor() {
        self.measureBlock {
            var sum = 0
            for var i = first, j = second; i > 0 && j > 0; i -= 1, j -= 2 {
                if i % 2 == 0 { continue }
                sum += 1
            }
            print(sum)
        }

    }

}

Non-optimized timings:
testCStyleFor - 0.126s
testZipStride - 2.189s

Optimized timings:
testCStyleFor - 0.008s
testZipStride - 0.015s

That’s a lot worse than 34%; even in optimized builds, that’s 2x slower and in debug builds, that’s 17x slower. I think it’s unreasonable to force people to write a more verbose while-loop construct to simply get the performance they need.

Also, the readability argument is very subjective; for example, I don’t find the zip version more readable. In fact, I think it obscures what the logic of the loop is doing. But again, that’s subjective.

-David

On Dec 10, 2015, at 2:41 PM, Paul Cantrell <cantrell@pobox.com> wrote:

Is there any guarantee that these two loops have the exact same runtime performance?

for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
   if i % 2 == 0 { continue }
   print(i, j)
}

for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
   if i % 2 == 0 { continue }
   print(i, j)
}

In a quick and dirty test, the second is approximately 34% slower.

I’d say that’s more than acceptable for the readability gain. If you’re in that rare stretch of critical code where the extra 34% actually matters, write it using a while loop instead.
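For reference, a while-loop form of the two-counter loop could look like the sketch below. It assumes current Swift (the free function `stride(from:to:by:)`, not the Swift 2 method spelling used in this thread) and collects the visited pairs instead of printing them, so the two forms can be compared directly.

```swift
// Pairs visited by the zip+stride form.
var zipPairs: [[Int]] = []
for (i, j) in zip(stride(from: 10, to: 0, by: -1), stride(from: 20, to: 0, by: -2)) {
    if i % 2 == 0 { continue }
    zipPairs.append([i, j])
}

// Pairs visited by a hand-written while loop.
var whilePairs: [[Int]] = []
var i = 10, j = 20
while i > 0 && j > 0 {
    // Guard the body with `if` rather than `continue`, so the decrements below always run.
    if i % 2 != 0 {
        whilePairs.append([i, j])
    }
    i -= 1
    j -= 2
}

print(zipPairs)                 // [[9, 18], [7, 14], [5, 10], [3, 6], [1, 2]]
print(zipPairs == whilePairs)   // true
```

The two forms agree for these bounds. The while version leaves `i` and `j` visible after the loop.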

P

On Dec 10, 2015, at 4:07 PM, David Owens II via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 10, 2015, at 1:57 PM, thorsten--- via swift-evolution <swift-evolution@swift.org> wrote:

Yes, performance is one thing neglected by the discussions and the proposal.

This is my primary objection to this proposal; it assumes (or neglects?) that all of the types used can magically be inlined to nothing but the imperative code. This isn’t magical, someone has to implement the optimizations to do this.

Is there any guarantee that these two loops have the exact same runtime performance?

for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
   if i % 2 == 0 { continue }
   print(i, j)
}

for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
   if i % 2 == 0 { continue }
   print(i, j)
}

And can you guarantee that performance is relatively the same across debug and release builds? Because historically, Swift has suffered greatly in this regard with respect to the performance of optimized versus non-optimized builds.

These types of optimizer issues are real-world things I’ve had to deal with (and have written up many blog posts about). I get the desire to simplify the constructs, but we need an escape hatch to write performant code when the optimizer isn’t up to the job.

-David

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

fluidsonic · December 11, 2015, 5:17pm

This does not take continue into account, which would then require a duplication of expr3.
It would also change the scope of variables defined in expr1, potentially causing collisions or unexpected shadowing.

On Fri, Dec 11, 2015 at 5:27 PM, Erica Sadun via swift-evolution <swift-evolution@swift.org> wrote:

For many of these number-crunching, performance-stretching scenarios, may
I suggest once again that, if you're doing serious number crunching,
Accelerate or similar approaches are to be preferred? As for c-style-for vs
while, the two are mechanically convertible.

Where heavy performance is not required, for-in is more readable,
maintainable, and optimizable to a sufficient extent that I do not see it
as a bar to conversion.

-- E

Brent_Royal-Gordon · December 12, 2015, 6:37pm

As for c-style-for vs while, the two are mechanically convertible.

This is provably false and has been demonstrated, but here is an example of it again:

    var sum = 0
    for var i = 10 /* expr1 */; i > 0 /* expr2 */; i -= 1 /* expr3 */ {
        if i % 2 == 0 { continue } // statement
        sum += 1 // statement
    }
    print(sum)

    var sum = 0
    var i = 10 // expr1
    while i > 0 /* expr2 */ {
        if i % 2 == 0 { continue } // statement
        sum += 1 // statement

        i -= 1 // expr3
    }
    print(sum)

It’s not just a mechanical conversion; consideration for early loop-exits and continuations needs to be made as well. The rote conversion for the while version above is an infinite loop. Not only that, expr1 now leaks variables into a scope that is no longer contained within the loop.
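Both problems named above can be worked around in a while loop, at the cost of extra scaffolding. The sketch below is one possible conversion and was not proposed in this thread: it wraps the loop in a `do` block so the counter stays scoped, and puts expr3 in a `defer` so it still runs when the body exits through `continue`.

```swift
var sum = 0
do {
    // The `do` block keeps `i` from leaking into the enclosing scope.
    var i = 10                       // expr1
    while i > 0 {                    // expr2
        // `defer` runs whenever this iteration's scope exits, including via `continue`.
        defer { i -= 1 }             // expr3
        if i % 2 == 0 { continue }   // statement
        sum += 1                     // statement
    }
}
print(sum)  // 5
```

One behavioral difference remains: the `defer` also runs when the body exits through `break`, whereas expr3 of a C-style for is skipped in that case. That only matters if the counter is read after the loop.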

You know, Perl *makes* it mechanically convertible by adding a continue {} block after the while {} loop. The continue {} block is run between the loop body and the condition, so it’s skipped by `break` (well, `last`) and other early constructs that leave the loop, but not `continue` (spelled `next` there). Maybe something along those lines would help us here by making the loss of `for` more palatable. (Although that does kind of defeat the purpose of removing this syntax…)

--
Brent Royal-Gordon
Architechies