Suitable forum for discussing and optimizing small programs to help improve the compiler/optimizer/stdlib?

Andrew_Trick · March 6, 2018, 6:08pm

I have no opinion on which forum group to use, since I don't pay much attention to the categories. A performance tag that I could sign up to "watch" might be helpful, though.

I'm working on a response to the first bug report, SR-6254.

github.com/apple/swift: [SR-6254] Redundant load elimination + alias analysis opportunity.

Opened 01:44PM - 30 Oct 17 UTC by jepers · Labels: bug, performance, Compiler

| Field | Value |
|------------------|-----------------|
|Previous ID | SR-6254 |
|Radar | rdar://problem/35301451 |
|Original Reporter | @jepers |
|Type | Bug |

Attachment: [Download](https://user-images.githubusercontent.com/2727770/164962404-7f842f40-1a7b-47f3-a70c-9d9618cf4180.gz)

Environment: Recent development snapshots (at least 2017-10-26 … 29), macOS 10.13.

Additional Detail from JIRA:

| Field | Value |
|------------------|-----------------|
|Votes | 0 |
|Component/s | Compiler |
|Labels | Bug, Performance |
|Assignee | None |
|Priority | Medium |

md5: c127c1be23378239d82633a19201e410

**relates to**:

* [SR-7023](https://bugs.swift.org/browse/SR-7023) Missed optimization opportunity, but not missed if wrapping code in a nested func

**Issue Description:**

EDIT 1: I have attached an updated version of the code so that it will compile without warnings with snapshot 2017-12-03 (and also added a // NOTE about the need to use `@inline(__always)` in Table.swift).

EDIT 2: Attached another version of the demonstration program updated to compile with dev snapshot 2018-03-03.

The file below is part of the program in the attachment (the updated one) and explains the issue:

main.swift

```swift
//
// This main.swift file is a demonstration of some IMHO strange
// ways to make the program faster / slower.
//
// NOTE: Don't try to compile this with anything earlier than
// DEVELOPMENT-SNAPSHOT-2017-10-29, since you'll probably just
// get a lot of errors not related to the issues meant
// to be demonstrated here.
//
// Please read the code and comments of this file to get an idea
// about what the conditional compilation flags do, and look at
// the following (which is when using my MBP, macOS 10.13):
//
// $ ./build.sh && ./Dlapg
// time: 8.04189954401227 seconds
//
// $ ./build.sh -D KNOWABLE_SIZE && ./Dlapg
// time: 8.15726826502942 seconds
//
// $ ./build.sh -D KNOWABLE_SIZE -D MAGIC && ./Dlapg
// time: 2.39418747503078 seconds
//
// $ ./build.sh -D KNOWABLE_SIZE -D MAGIC -D DISPEL_MAGIC && ./Dlapg
// time: 8.13506004196825 seconds
//
// I think I can understand why -D KNOWABLE_SIZE could make it faster,
// (although I am interested in if it really should in this case), but
// I do not at all understand why -D MAGIC is needed and why
// -D DISPEL_MAGIC dispels the magic.
//
// Also see the NOTE in Table.swift about the need to use
// @inline(__always) for the two initializers there.
//

import QuartzCore // Only used for CACurrentMediaTime()

func createRandomDiffusionLimitedAggregationPatternImage() {
    // A pseudo random number generator seeded with value from SecRandomCopyBytes:
    let rg = Xoroshiro128Plus()

    #if KNOWABLE_SIZE
    // Statically knowable image size, seems to make some optimizations possible BUT ...
    // ... only if compiling with -D MAGIC!
    let sz = Vector2(123, 234)
    #else
    // Statically unknowable image size, seems to make some optimizations impossible:
    let sz = (Vector2(120, 230) ..<& Vector2(130, 240)).randomPoint(using: rg)
    #endif

    let img = Table<FloatLinear2D>(size: sz) // Linear gamma grayscale raster image.

    #if DISPEL_MAGIC
    img[.zero].e0 = 0 // This magically dispels the magic of magicImgBounds.
    #endif

    #if MAGIC
    let magicImgBounds = img.bounds // See how and why this is used (at only one place) in the timed code below.
    #endif

    // Set all pixels to black:
    for c in img.coordinates { img[c].e0 = 0.0 }

    // Set a couple of pixels to white, to have something for the "corals" to grow on:
    for _ in 0 ..< 7 { img[img.bounds.randomPoint(using: rg)].e0 = 1.0 }

    // The following VectorRange is the 3x3 two-dim closed range (-1, -1) to (1, 1), but
    // since VectorRange always interprets its upperBound as exclusive we use (2, 2):
    let neighborhood = VectorRange(uncheckedBounds: (Vector2<Int>(-1, -1), Vector2<Int>(2, 2)))

    // Generate the coral pattern by random walking until hitting a white pixel, and also
    // measure and report the time it takes:
    let t0 = CACurrentMediaTime()
    for _ in 0 ..< 500_000 {
        var c = img.bounds.randomPoint(using: rg) // (magicImgBounds has no effect here.)
        for i in 0 ..< 20_000 {
            let dir = neighborhood.randomPoint(using: rg)
            c = (c &+ dir).periodicallyBounded(by: img.bounds) // (magicImgBounds has no effect here.)
            var sum = Float(0)
            for nc in neighborhood.points {
                #if MAGIC
                let gc = (c &+ nc).periodicallyBounded(by: magicImgBounds)
                #else
                let gc = (c &+ nc).periodicallyBounded(by: img.bounds)
                #endif
                sum = sum + img[gc].e0
            }
            if sum > 0.0 {
                if sum == 1.0 && i > 500 { img[c].e0 = 1.0 }
                break
            }
        }
    }
    let t1 = CACurrentMediaTime()
    print("time:", t1 - t0, "seconds")

    // Repeat the generated pattern as 3 x 3 tiles and save the result as a png file:
    let img2 = type(of: img).init(size: img.size * 3)
    for c in img2.coordinates { img2[c] = img[c.periodicallyBounded(by: img.bounds)] }
    img2.saveAsPng(path: "pattern.png")
}

createRandomDiffusionLimitedAggregationPatternImage()
```

So the demonstration is meant to compile without errors and produce images like the one contained in the attached zip file. And the expected result (assuming these really are issues and not expected behaviors) is that the compiler should be able to optimize this program (so that time is 2 rather than 8) with -D KNOWABLE_SIZE but without needing -D MAGIC.
(I first encountered these issues within Xcode and I could've attached an Xcode project or Swift package but I think the risk of accidentally introducing unrelated issues might be reduced by only attaching the source files and a simple build.sh to compile them.)
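The MAGIC flag in the report is roughly a manual version of what the bug title asks the optimizer to do: `img.bounds` is read once into the local constant `magicImgBounds`, so the hot neighborhood loop no longer relies on the compiler proving that repeated reads through the class reference are redundant. A minimal sketch of that workaround pattern in isolation follows. `Grid`, `wrap`, `sumNeighbors` and `sumNeighborsHoisted` are hypothetical names, not the reporter's `Table`/`Vector2` types; the sketch only shows the shape of the change and does not reproduce the timings above, since whether the hoisted version is faster depends on the compiler version.

```swift
// Hypothetical stand-in for the reporter's Table type: a class instance whose
// bounds (width, height) are stored properties read through a reference.
final class Grid {
    var width: Int
    var height: Int
    var pixels: [Float]

    init(width: Int, height: Int) {
        self.width = width
        self.height = height
        pixels = [Float](repeating: 0, count: width * height)
    }
}

// Wrap v into 0..<n, also for negative v (periodic boundary, like
// periodicallyBounded(by:) in the report).
func wrap(_ v: Int, _ n: Int) -> Int {
    let r = v % n
    return r < 0 ? r + n : r
}

// Reads g.width and g.height through the class reference on every iteration;
// eliminating those repeated loads is left entirely to the optimizer.
func sumNeighbors(_ g: Grid, _ x: Int, _ y: Int) -> Float {
    var sum: Float = 0
    for dy in -1...1 {
        for dx in -1...1 {
            sum += g.pixels[wrap(y + dy, g.height) * g.width + wrap(x + dx, g.width)]
        }
    }
    return sum
}

// Same computation, but the bounds are read once into local constants before
// the loop, as the MAGIC flag does with magicImgBounds.
func sumNeighborsHoisted(_ g: Grid, _ x: Int, _ y: Int) -> Float {
    let w = g.width, h = g.height
    var sum: Float = 0
    for dy in -1...1 {
        for dx in -1...1 {
            sum += g.pixels[wrap(y + dy, h) * w + wrap(x + dx, w)]
        }
    }
    return sum
}
```

For example, on a 4×4 grid where only pixel (0, 0) is set, both functions return 1 for the cell (3, 3), because its neighborhood wraps around the edges, and 0 for the cell (2, 2).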

That bug report already has more than enough information. I was able to reproduce it right away. Spending more time on the bug report/benchmark probably won't get it fixed any quicker, but it may give you the insight to work around the problem more quickly next time.

I mainly want to assure you that filing bugs like this is extremely helpful, even if it seems like no one is looking at them. Posting to the forum also helps draw attention. If it's a problem that hasn't been solved in months and is still affecting you, it helps to update the bug or post to the forum again.

I know that reducing a performance problem to a standalone benchmark is aggravating when you don't know what is causing fluctuations. But if you take the time to do that (as you did in SR-6264), that's very helpful to someone debugging the problem. Whoever actually does the debugging will be able to further reduce it to a regression benchmark and add it to the benchmark suite. The reporter does not need to add a benchmark.
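When reducing a problem to a standalone program, one common way to keep run-to-run fluctuations from muddying the picture is to repeat the measurement and keep the fastest run. The sketch below does that with `DispatchTime` from the Dispatch module (available on macOS and Linux) instead of the macOS-only `CACurrentMediaTime()` used in the report. `measureMin`, `workload` and `sink` are hypothetical names, and this is not the Swift benchmark suite's harness, just a small helper for reporters.

```swift
import Dispatch

// Runs `body` `repetitions` times and returns the fastest wall-clock time in
// seconds. Keeping the minimum ignores runs that were slowed down by
// unrelated system activity.
func measureMin(repetitions: Int = 5, _ body: () -> Void) -> Double {
    precondition(repetitions > 0, "need at least one repetition")
    var best = Double.infinity
    for _ in 0..<repetitions {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        best = min(best, Double(end - start) / 1e9)
    }
    return best
}

// Hypothetical workload standing in for the code under investigation.
func workload(_ n: Int) -> Int {
    var acc = 0
    for i in 0..<n { acc = acc &+ i &* i }
    return acc
}

// Accumulating into a global keeps each result observable after the
// timed closure returns.
var sink = 0
let elapsed = measureMin { sink = sink &+ workload(1_000_000) }
print("fastest of 5 runs:", elapsed, "seconds (sink: \(sink))")
```

Compile with `swiftc -O` before timing; unoptimized builds say little about what the optimizer does with the code.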