Expensive calls to "outlined init with take"

I've found a workaround. It's super ugly...

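// Reads the renamed stored property `_schemeKind` straight out of the bytes of `self`.
@inlinable @inline(__always)
internal var schemeKind: Optional<WebURL.SchemeKind> {
  withUnsafeBytes(of: self) {
    // The force-unwrap relies on `_schemeKind` being a stored property with a known offset.
    $0.load(fromByteOffset: MemoryLayout.offset(of: \Self._schemeKind)!, as: Optional<WebURL.SchemeKind>.self)
  }
}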

Basically I renamed the stored properties and wrote my own accessors for them. I did this for like 3 or 4 stored properties, all optionals. I really, really wish I didn't have to do this, but now my benchmarks are showing the best results I have ever seen from this project, with common cases improving by 17-24%(!!!).
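
For anyone who wants to try the same pattern, here is a minimal self-contained sketch. `Structure`, `SchemeKind` and `_port` are made-up stand-ins for illustration, not the real WebURL definitions, and whether it helps at all will depend on the compiler version and the types involved.

```swift
// Hypothetical stand-in for WebURL.SchemeKind; the real enum is not shown in this post.
enum SchemeKind: Equatable {
  case http, https, file, other
}

// Hypothetical stand-in for the struct that owns the optional stored properties.
struct Structure {
  // Stored properties renamed with a leading underscore.
  var _schemeKind: SchemeKind?
  var _port: UInt16?

  // Hand-written accessor: load the value from the raw bytes of `self`
  // at the stored property's offset.
  @inline(__always)
  var schemeKind: SchemeKind? {
    withUnsafeBytes(of: self) {
      $0.load(
        fromByteOffset: MemoryLayout<Structure>.offset(of: \Structure._schemeKind)!,
        as: SchemeKind?.self)
    }
  }

  // Same pattern for a second optional stored property.
  @inline(__always)
  var port: UInt16? {
    withUnsafeBytes(of: self) {
      $0.load(
        fromByteOffset: MemoryLayout<Structure>.offset(of: \Structure._port)!,
        as: UInt16?.self)
    }
  }
}

let example = Structure(_schemeKind: .https, _port: 8080)
let empty = Structure(_schemeKind: nil, _port: nil)
print(example.schemeKind == .https, example.port == 8080, empty.schemeKind == nil)
```

The rest of the code keeps using `schemeKind` and `port` as before; only the storage names change.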

Summary
benchmark                                              column     results/path_9 results/path_10       %
--------------------------------------------------------------------------------------------------------
Constructor.SpecialNonFile.AverageURLs                 std                 26.08           63.28 -142.65
Constructor.SpecialNonFile.AverageURLs                 warmup         3829983.00      3466338.00    9.49
Constructor.SpecialNonFile.AverageURLs                 iterations       37796.00        44360.00  -17.37
Constructor.SpecialNonFile.AverageURLs                 time             33922.00        28055.00   17.30
Constructor.SpecialNonFile.AverageURLs filtered        std                 35.28           33.86    4.03
Constructor.SpecialNonFile.AverageURLs filtered        warmup         4742413.00      5451983.00  -14.96
Constructor.SpecialNonFile.AverageURLs filtered        iterations       28011.00        28872.00   -3.07
Constructor.SpecialNonFile.AverageURLs filtered        time             45356.00        43094.50    4.99
Constructor.SpecialNonFile.IPv4 host                   std                 35.11           34.60    1.45
Constructor.SpecialNonFile.IPv4 host                   warmup         3577949.00      2827990.00   20.96
Constructor.SpecialNonFile.IPv4 host                   iterations       36648.00        42892.00  -17.04
Constructor.SpecialNonFile.IPv4 host                   time             35112.00        27630.00   21.31
Constructor.SpecialNonFile.IPv4 host filtered          std                 50.86           36.97   27.31
Constructor.SpecialNonFile.IPv4 host filtered          warmup         4221503.00      3779503.00   10.47
Constructor.SpecialNonFile.IPv4 host filtered          iterations       32093.00        33531.00   -4.48
Constructor.SpecialNonFile.IPv4 host filtered          time             39854.00        36601.00    8.16
Constructor.SpecialNonFile.IPv6 host                   std                 32.64           37.21  -14.00
Constructor.SpecialNonFile.IPv6 host                   warmup         4345729.00      3213953.00   26.04
Constructor.SpecialNonFile.IPv6 host                   iterations       32983.00        44068.00  -33.61
Constructor.SpecialNonFile.IPv6 host                   time             37398.00        28309.50   24.30
Constructor.SpecialNonFile.IPv6 host filtered          std                 24.86           38.78  -56.02
Constructor.SpecialNonFile.IPv6 host filtered          warmup         5475037.00      3436541.00   37.23
Constructor.SpecialNonFile.IPv6 host filtered          iterations       30296.00        37909.00  -25.13
Constructor.SpecialNonFile.IPv6 host filtered          time             41760.00        34307.00   17.85
Constructor.SpecialNonFile.Percent-encoding components std                 37.57           32.74   12.86
Constructor.SpecialNonFile.Percent-encoding components warmup         1278889.00      1141999.00   10.70
Constructor.SpecialNonFile.Percent-encoding components iterations      104477.00       119079.00  -13.98
Constructor.SpecialNonFile.Percent-encoding components time             12318.00        11108.00    9.82
Constructor.SpecialNonFile.Percent-encoded hostnames   std                104.19           37.64   63.87
Constructor.SpecialNonFile.Percent-encoded hostnames   warmup         1214436.00      1039024.00   14.44
Constructor.SpecialNonFile.Percent-encoded hostnames   iterations      109287.00       131662.00  -20.47
Constructor.SpecialNonFile.Percent-encoded hostnames   time             11629.00         9915.00   14.74
Constructor.SpecialNonFile.Long paths                  std                 24.11           25.35   -5.14
Constructor.SpecialNonFile.Long paths                  warmup        10246773.00      9702339.00    5.31
Constructor.SpecialNonFile.Long paths                  iterations       12921.00        13470.00   -4.25
Constructor.SpecialNonFile.Long paths                  time             94723.00        93464.50    1.33
Constructor.SpecialNonFile.Complex paths 1             std                 47.32           28.06   40.70
Constructor.SpecialNonFile.Complex paths 1             warmup         2991180.00      2786803.00    6.83
Constructor.SpecialNonFile.Complex paths 1             iterations       47141.00        48935.00   -3.81
Constructor.SpecialNonFile.Complex paths 1             time             26821.00        26165.00    2.45
Constructor.SpecialNonFile.Complex paths 2             std                 35.28           26.18   25.79
Constructor.SpecialNonFile.Complex paths 2             warmup         3751616.00      3253327.00   13.28
Constructor.SpecialNonFile.Complex paths 2             iterations       37345.00        41286.00  -10.55
Constructor.SpecialNonFile.Complex paths 2             time             32966.00        32142.00    2.50
Constructor.SpecialNonFile.Long query 1                std                194.64           70.05   64.01
Constructor.SpecialNonFile.Long query 1                warmup          362005.00       328641.00    9.22
Constructor.SpecialNonFile.Long query 1                iterations      407376.00       476298.00  -16.92
Constructor.SpecialNonFile.Long query 1                time              3089.00         2738.00   11.36
Constructor.SpecialNonFile.Long query 2                std                 34.20           30.88    9.71
Constructor.SpecialNonFile.Long query 2                warmup         2504260.00      2733403.00   -9.15
Constructor.SpecialNonFile.Long query 2                iterations       52335.00        54263.00   -3.68
Constructor.SpecialNonFile.Long query 2                time             24316.00        23348.00    3.98
--------------------------------------------------------------------------------------------------------
                                                       std                                         14.02
                                                       warmup                                      11.72
                                                       iterations                                 -13.05
                                                       time                                        11.09
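
A note on reading the table: judging by the numbers, the % column appears to be the change from path_9 to path_10 relative to path_9, so a positive value in a "time" row means path_10 is faster. This is inferred from the rows above, not taken from the benchmark tool, and `percentChange` is a made-up helper name. A quick sketch that checks it against two of the "time" rows:

```swift
// Relative change between two benchmark columns, as a percentage of the first.
// Assumption: this is how the "%" column in the table was produced.
func percentChange(from old: Double, to new: Double) -> Double {
  (old - new) / old * 100
}

// "time" rows for AverageURLs and IPv6 host, copied from the table.
let averageURLs = percentChange(from: 33922.00, to: 28055.00)
let ipv6Host = percentChange(from: 37398.00, to: 28309.50)
print(averageURLs, ipv6Host)
```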

Wow. I'm ecstatic. I can't begin to describe the countless hours I've spent trying to figure out which algorithmic changes I could make to bring these benchmarks anywhere close to this level. In particular the first one ("AverageURLs") - it's the most important, the most representative of real URLs on the web, and I'd set myself the goal of getting it under 30ms. But no matter how much I tried, I just couldn't get it reliably under that level.

With one change, it has steam-rolled past that. As you can imagine, I've run them all again multiple times to confirm. The best I've seen is 27.6, and the worst is 28.8. It's an arbitrary benchmark and there's random variance and all that, but still - it's huge. Even now I can't really believe I've seen those numbers.

We need this fixed, badly. I hate to think how many other developers are out there, wondering why their code isn't meeting their performance targets, spending who-knows-how-much time trying to think of clever tricks or novel algorithms, when there are things like this lurking in the compiler.
