URLComponents(url:) is much faster than URLComponents(string:) - possible bridging performance issue?

I've noticed a rather strange performance difference between URLComponents(url:resolvingAgainstBaseURL:) and URLComponents(string:): the former seems a fair bit faster than the latter. This is on macOS 11.6.

Components From URL

Looking at the implementation for NSURLComponents(url:resolvingAgainstBaseURL:), I find that it forwards to the CoreFoundation function _CFURLComponentsCreateWithURL. If we are resolving against the base URL, this function:

  1. Calls CFURLCopyAbsoluteURL to do the resolving, then
  2. CFURLGetString on the result, which it then forwards to...
  3. _CFURLComponentsCreateWithString, which creates the CFURLComponents
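
To make the three steps concrete, here is a rough sketch of the same sequence using only public calls. It assumes a Darwin platform (where URL casts to CFURL and CFString casts to String), the example.com URLs are made up, and URLComponents(string:) stands in for the private _CFURLComponentsCreateWithString, so this mirrors the described flow rather than the actual implementation.

```swift
import Foundation

// Made-up URLs for illustration only.
let base = URL(string: "https://example.com/dir/")!
let relative = URL(string: "page?q=1", relativeTo: base)!

// 1. Resolve the URL against its base.
let absolute: CFURL = CFURLCopyAbsoluteURL(relative as CFURL)

// 2. Get the string of the resolved URL.
let resolvedString = CFURLGetString(absolute) as String

// 3. Parse components from that string. This is a public stand-in for
//    the private _CFURLComponentsCreateWithString step.
let components = URLComponents(string: resolvedString)

print(resolvedString)
print(components?.path ?? "nil")
```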

URL.absoluteString

A quick note here: if we look at the implementation for NSURL.absoluteString, we can see it uses exactly the same pattern as above: CFURLCopyAbsoluteURL, followed by CFURLGetString on the result.

Components From String

Now, let's look at NSURLComponents(string:) - as you might expect, it just calls _CFURLComponentsCreateWithString directly. Same function as before, but without resolving the URL and copying its string.

Now, you can probably see where I'm going with this. In theory, if I have a URL instance, I should be able to call .absoluteString, feed the result into URLComponents(string:) myself, and expect exactly the same performance as calling URLComponents(url:resolvingAgainstBaseURL:) with true. The benefit of doing it manually is that I'd also have the .absoluteString, which I can use for other purposes.
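
For reference, the two routes being compared look like this side by side. This is a minimal sketch with a made-up base and relative URL; it only checks that both routes produce equal components and says nothing about their cost.

```swift
import Foundation

// Made-up URLs for illustration only.
let baseURL = URL(string: "https://example.com/a/b/")!
let url = URL(string: "c?x=1", relativeTo: baseURL)!

// Route 1: let Foundation resolve the URL and build the components.
let fromURL = URLComponents(url: url, resolvingAgainstBaseURL: true)

// Route 2: take the absolute string ourselves, then parse it.
let absoluteString = url.absoluteString
let fromString = URLComponents(string: absoluteString)

// Both routes are expected to describe the same resolved URL.
print(fromURL == fromString)
```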

Performance

But that isn't what I'm seeing. Benchmarking the two on a set of URL strings (specifically, these strings), I'm seeing a 27% regression. Just calling these functions, and nothing else. And it's repeatable - I can change back and forth and consistently get the same results.

URLComponents(url:)

for url in average_urls {
  let cmps = URLComponents(url: url, resolvingAgainstBaseURL: true)
  blackHole(cmps)
}

name                        time         std        iterations
--------------------------------------------------------------
FoundationToWeb.AverageURLs 30076.000 ns ±  23.54 %      43943
FoundationToWeb.IPv4        40066.000 ns ±  24.83 %      31307
FoundationToWeb.IPv6        44944.000 ns ±  24.34 %      28852

URLComponents(string:)

for url in average_urls {
  let str = url.absoluteString
  let cmps = URLComponents(string: str)
  blackHole(cmps)
}

name                        time         std        iterations
--------------------------------------------------------------
FoundationToWeb.AverageURLs 41393.000 ns ±  22.51 %      32415
FoundationToWeb.IPv4        60179.500 ns ±  22.54 %      20860
FoundationToWeb.IPv6        65387.000 ns ±  21.91 %      19123

The only thing that I can think of is that it's the whole ._swiftObject stuff that's causing the overhead (bridging the string to Swift, rather than keeping it in C). Does that sound plausible, or could there be some other explanation for this? As shown above, they ultimately call the exact same CF functions, in the same order.

I can't find where ._swiftObject is implemented for these (presumably CFStrings), so I haven't been able to examine it further.

I haven't dug into the specifics of this at all, but since you mention macOS 11.6, just a reminder so you know you're looking at the right code: the Foundation and CoreFoundation frameworks which ship on Darwin platforms are not the Swift reimplementations in swift-corelibs-foundation. Those Swift sources are used only on non-Darwin platforms. If you want to be sure of what's actually happening on your system, you'll be better served jumping into something like Hopper and looking at the actual frameworks on your system. (This might help explain any differences between what you expect from the source code and what you observe.)


I don't think that will work; there isn't even any binary in /System/Library/Frameworks any more. Apparently that's due to the dyld shared cache.

EDIT: Oh, Hopper can actually extract the binaries from the shared cache. Neat!

Yep! Hopper is pretty magical. (I believe there may also be an incantation to pull a library out of the dyld shared cache onto disk for inspection, but I don't remember what it is off the top of my head)

One alternative for general viewing of Darwin cross-platform libraries — you can find the copies of the system libraries used when running apps inside of the iOS simulator within Xcode itself:

<path to Xcode>/Contents/Developer/Platforms/iPhoneOS.platform/Library/Developer/CoreSimulator/Profiles/Runtimes/iOS.simruntime/Contents/Resources/RuntimeRoot/System/Library/Frameworks

has the system frameworks on disk. Obviously, this won't work for macOS-specific frameworks, and the ones in there are built for the sim (so may not necessarily match the architecture of an actual hardware device — though they do tend to be fat archives), but many relevant frameworks are present and don't change much between platforms. For general introspection, this may be sufficient.

[I believe the same is available for the other OS simulators, but for casual introspection, the iOS sim may be enough.]

Are you doing many iterations and going through the same small URL set again and again? Do the results change if you run your URL set across many event loop invocations, so that any given URL of the test set is only used once per event loop invocation (obviously, the timing code would have to be adjusted)? Alternatively, create a bunch of unique URLs and only use each URL once. This is to rule out URL caching effects, if any.
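
A rough way to try the unique-URL variant is sketched below. The URL pattern, the count of 10,000, and the measure helper are all made up for illustration, and a single timed pass like this is far noisier than a proper benchmark harness, so treat the numbers as a sanity check only.

```swift
import Foundation

// Build unique URLs so that each one is parsed exactly once per route.
// The host and path pattern are made up for illustration.
let uniqueURLs: [URL] = (0..<10_000).compactMap {
    URL(string: "https://example.com/item/\($0)?page=\($0 % 7)")
}

// Hypothetical helper: run a closure once and return elapsed nanoseconds.
func measure(_ body: () -> Void) -> UInt64 {
    let start = DispatchTime.now().uptimeNanoseconds
    body()
    return DispatchTime.now().uptimeNanoseconds - start
}

// Count successful parses so the results are observably used.
var parsedCount = 0

let viaURL = measure {
    for url in uniqueURLs {
        if URLComponents(url: url, resolvingAgainstBaseURL: true) != nil {
            parsedCount += 1
        }
    }
}

let viaString = measure {
    for url in uniqueURLs {
        if URLComponents(string: url.absoluteString) != nil {
            parsedCount += 1
        }
    }
}

print("url: \(viaURL) ns, string: \(viaString) ns, parsed: \(parsedCount)")
```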

I'm using the Google benchmark library, which I've found to be really great for profiling code snippets. It uses a dynamic number of iterations, but I've found that it gives incredibly consistent results across runs performed even days apart, or when switching back and forth between implementations, so I consider its measurements to be reliable.

I linked the URL strings I used in the OP. I used exactly the same executable as I use to profile Foundation -> WebURL conversion, except that the specific benchmarks I linked to were modified to only test this aspect of Foundation's API (that's why the results say "FoundationToWeb"; I didn't change the name, but WebURL wasn't involved in any of it), e.g.:

So the numbers are a measure of how long it took to process 17 URL instances, and the average difference between the 2 APIs was 11.3μs, meaning 665ns overhead per URL. Just in overheads, mind.
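
The arithmetic behind those figures, using the AverageURLs rows from the two result tables, can be checked with a few lines. The variable names are mine; the inputs are the measured times from the OP.

```swift
// AverageURLs timings from the two result tables, in nanoseconds.
let urlInitNanos = 30_076.0      // URLComponents(url:resolvingAgainstBaseURL:)
let stringInitNanos = 41_393.0   // absoluteString + URLComponents(string:)
let urlsPerIteration = 17.0      // number of URLs in the benchmark set

// Total difference per iteration: 11,317 ns, i.e. about 11.3 microseconds.
let differenceNanos = stringInitNanos - urlInitNanos

// Overhead per URL: roughly 665 ns.
let perURLNanos = differenceNanos / urlsPerIteration

// The same gap expressed two ways: about 27% of the slower time,
// or about 38% on top of the faster time.
let shareOfSlower = differenceNanos / stringInitNanos
let increaseOverFaster = differenceNanos / urlInitNanos

print(differenceNanos, perURLNanos, shareOfSlower, increaseOverFaster)
```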

I took a brief look at the bridging machinery, and it's quite complex.

Thanks to Hopper, I was able to confirm that URL.absoluteString results in a call to String.init(_cocoaString: AnyObject). It clearly will not produce tagged pointers (because the values are too large), but the strings are ASCII and support _fastCStringContents, so they hit a relatively happy path. But then you need to pass them back in to URLComponents and bridge them again via String._bridgeToObjectiveCImpl(), so you get hit with bridging twice. It's not unthinkable that it could take ~600ns for both steps combined.
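
At the source level, the double bridging amounts to something like the following. This is only a sketch for Darwin platforms: the NSString here is a made-up stand-in for whatever the URL hands back, and the casts merely mark where the two conversions happen, not which internal entry points get called.

```swift
import Foundation

// Made-up ASCII string, too long to fit in a tagged pointer.
let cocoaString = NSString(string: "https://example.com/some/fairly/long/path?with=query")

// First bridge: Objective-C string -> Swift String
// (the direction taken when reading URL.absoluteString).
let swiftString = cocoaString as String

// Second bridge: Swift String -> Objective-C string
// (the direction taken when passing the string to URLComponents(string:)).
let bridgedBack = swiftString as NSString

print(bridgedBack.isEqual(to: swiftString))
```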

URLComponents.init(url:) can avoid all of this, because URL doesn't need the same bridging magic that String does.

So my assumption for now is that bridging is to blame. I wonder if it would be possible for APIs like CFURLGetString to be adapted to write to the buffer provided by new initializers such as String.init(unsafeUninitializedCapacity:) so they could produce native Swift strings, rather than allocating and writing a CFString only to immediately bridge it.
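
To illustrate the idea, here is a minimal sketch of that initializer, whose full name is String(unsafeUninitializedCapacity:initializingUTF8With:) and which is available from macOS 11. The writeURLBytes function is hypothetical: it stands in for a C-level API that writes UTF-8 bytes into a caller-provided buffer and returns the count written. No such variant of CFURLGetString is claimed to exist.

```swift
// Hypothetical stand-in for a C-level API that fills a caller-provided
// buffer with the URL's UTF-8 bytes and returns how many it wrote.
func writeURLBytes(into buffer: UnsafeMutableBufferPointer<UInt8>) -> Int {
    let bytes = Array("https://example.com/a/b?c=d".utf8)
    precondition(bytes.count <= buffer.count, "buffer too small")
    for (index, byte) in bytes.enumerated() {
        buffer[index] = byte
    }
    return bytes.count
}

// The closure writes directly into the new string's storage, so the result
// is a native Swift string with no intermediate Objective-C string.
let nativeString = String(unsafeUninitializedCapacity: 64) { buffer in
    writeURLBytes(into: buffer)
}

print(nativeString)  // prints "https://example.com/a/b?c=d"
```

The capacity of 64 is arbitrary here; a real caller would need to know or query the required length first.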


Essentially that scenario is what initially motivated me to add that String initializer, so yeah, that sounds like a great fix. I’ll make a note to assign a bug to myself about it.
