Foundation URL Improvements

To be fair, the copyright comment at the top of CFURL's implementation is dated from 1998, which means it's at least 3 years older than Billie Eilish.

When you consider the number of system APIs that traffic in terms of Foundation URL (not to mention all of the internal ones we never see!), as well as all of the code written over that period... it becomes very difficult to do something like change the parsing behaviour. Who knows what that could do?

IMHO, Foundation URL has really been let down by the weakness of URL standards as a whole. It turns out, in hindsight, that they weren't nearly as stable as people at the time hoped they would be, and baking it in to the system ended up locking down outdated behaviour.

It sucks, but nobody could have forseen it. The web was just a totally different place in the 90s.

11 Likes

I was curious about that. I'm fine switching to WebURL but was wondering if there could be a way to make URL that everybody uses more suited for modern day practical usages.

I think this is one of those cases where it is best to cut losses and just transition to something else over time. Foundation.URL is to WebURL as Foundation.NSString is to Swift.String.

2 Likes

CustomStringConvertible

The description might be improved by always returning the absoluteString.
Currently, it can return the relativeString, a " -- " separator, and the baseURL.

CustomDebugStringConvertible

If necessary, the debugDescription can return the separated components.

LosslessStringConvertible

Would it be possible to add this conformance?
Or might it conflict with the proposed non-failable initializer?

I don't want to derail this thread; I think it's great that the Foundation team are looking to make URL more economic, but I guess to some extent it is inevitable to bring up the parsing behaviour. It's something I encountered quite a lot working on Apple platforms (URLs spread everywhere - and if one platform refuses to accept them, you get a bunch of eye-rolls from your colleagues and a lot of reluctance to accommodate).

There are a lot of changes to things like parsing behaviour - but it goes even deeper. Foundation URL represents both relative and absolute URLs (i.e. with a scheme) in a single type, but the WHATWG standard doesn't even consider relative URLs to be URLs. According to the web model, every URL must have a scheme. Stuff like this:

URL(string: "foo") // <- Parses successfully! Is this a URL?

Just wouldn't fly.

Not to mention that pretty much every other URL library has gone the other way in terms of things like automatically percent-decoding URL components, the percent-encoding sets are different, and there's a lot of protocol-specific rules (e.g. HTTP URLs can't have empty hostnames - it's forbidden by the HTTP spec, and actually enforced at the URL level so it won't even parse).

Then you get in to things like extra leading slashes (e.g. http:///////example.com gets turned in to http://example.com/ - but only for HTTP/S and other "special" schemes because that turns out to be necessary in order to not break the web), and how back-slashes are treated like forward-slashes in certain URLs, and it starts adding up to a lot of risk.

I have no idea what would happen if URL suddenly started parsing things like WebURL across the board, but I'd be very surprised if nothing broke anywhere. Maybe it would be acceptable breakage, but maybe it would be catastrophic.

12 Likes

I agree: it would be a bad idea to change existing behavior.

Foundation.NSString has extremely similar problems, and Swift.String should (to some extent) be considered a model of how to deal with them. String is relatively unusual among string implementations for being extremely compliant with Unicode standards. It would have been ridiculous to retrofit that behavior onto NSString.

I’m not advocating for more toll-free bridging, obviously. And I don’t meant to disparage the Foundation team at all: they have a formidable amount of technical debt and lock-in to contend with.

5 Likes

There’s a subtle hierarchy to Swift string representations:

  • CustomStringConvertible says that an instance can be converted into a String value (which is actually true regardless, hence the guidance to never require or otherwise depend on it) using a certain property.
  • LosslessStringConvertible says that an instance can be converted into a String value using a certain property, and that String value can be converted back into an equivalent instance with a certain initializer.
  • RawRepresentable with a RawValue of String says that an instance can be converted into a String value using a certain property, and that String value alone can be converted back into an equivalent instance with a certain initializer.

I think it comes down to how strict the String initializer is supposed to be.

1 Like

ouch. :frowning: this is really bad. What would happen for an URL on a network drive with a slow network? If this falls into the category of "straightforward changes" I'd suggest to fix this.

4 Likes

Task local values must be expressed with closures and such "with..." blocks (because values are just bound for the duration of the call and cannot escape it), so yeah that's the API you'd have for it. Yeah, such API would just work™ I think.

3 Likes

I wholeheartedly agree, and that's definitely a future direction that we will explore. This proposal is about making URL more Swifty™ with some quick since it really needs some modernization.

5 Likes

Wholeheartedly agree. The current working directory concept is ancient, fragile and needs retirement badly. I'd say we shall not just not support it with the new sugar, but deprecate.

2 Likes

Thanks for the suggestion! There are a few more posts mentioned URLQueryItem I'll gather all related suggestions and hopefully come up with a good solution in the next revision :grin:

1 Like

Any support for query (or form) encoding of dictionaries or arrays will have to be accompanied by several encoding options for each, as there’s no single standard to follow. Frankly, almost all such encoding needs different options, such as space encoding, data, and percent escaping.

6 Likes

I like the way the WHATWG standard (which claims ownership of form-encoding) describes it. I've read a fair few standards documents, and I've never seen anything like this before:

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.

It really is something not even a mother could love.

11 Likes

Good catch. I meant precondition. The plan is to treat malformed Strings as programmer errors even in the release builds.

I thought about that in the earlier version of the proposal. We decided URL isn't the best fit for ExpressibleByStringLiteral because technically it isn't. Some strings can't be turned into an URL. Imagine this scenario:

let url: URL = "www.apple.com"

It would be pretty confusing if/when this crashes.

Ahh thanks for pointing that out!

This is a rename of the existing method appendingPathComponent() so it allows multiple components. We purposely chose path as the argument label to indicate that you are appending a path, so it could contain multiple path components.

I agree! Let us know if you have other suggestions :grin:

The cached resource values does not impact equality. The URL caches are cleared on the next turn of the RunLoop (which, unfortunately, also means runloop-less daemons and such NEVER get them invalidated). It's primarily designed for the situation where a URL is passed between several different layers of code that end up repeating access of the same metadata. In these situations, the URL cache could reduce the amount of disk access needed.

1 Like

True, although I suppose the same could be said for the proposed StaticString initializer. Are the call-sites really different enough to conclude that one makes the mistake obvious and the other does not?

let url: URL = "www.apple.com"
let url = URL("www.apple.com")

I see. I actually quite it somewhat surprising that .appendingPathComponent allows you to append multiple components. It seems like the user is explicitly saying that the provided string should be considered a single component and percent-encoded accordingly (even if it contains user input which happens to contain a forward-slash).

func urlForBand(_ bandName: String) -> URL {
  URL(string: "https://example.com/music/bands")!
    .appendingPathComponent(bandName)
}

// Imagine the band name is not a string literal,
// but instead derived from runtime data (perhaps a search field).
// This results in 2 components:
urlForBand("AC/DC") // https://example.com/music/bands/AC/DC

Also, if I can control the bandName parameter in the above example, I can add any number of ".." components and take control of the entire path. That could potentially have security implications in certain circumstances, and in other situations just leads to particular inputs returning the incorrect result unless the developer knows about this and manually adds percent-encoding.

Okay, that's quite interesting. I took a brief look at the implementation in corelibs-foundation, and this seems to be part of quite an elaborate mechanism.

  • Every URL (including HTTP URLs, which I don't believe make use of resource values) has to store an extra pointer for this
  • Every URL needs to pay for potentially cleaning up this object at deinit (it's often overlooked, but cheap destruction is just as important as cheap construction)
  • The pointer has to be initialized and cleared using atomic instructions
  • Getting and setting resource values additionally must be gated using an NSLock

I wasn't actually able to find the location where RunLoop clears these values (perhaps it's not implemented in corelibs-foundation?), but it seems to me that this requires one of two implementation strategies:

  • Every URL with resource values must register itself with the RunLoop for later cleanup. Essentially, you could think of these auto-cleaning caches as similar to auto-zeroing weak references.
  • The cached values must be paired with some kind of tick number from the runloop. They will stay in memory, but return nil if the current tick number is different to the cached one.

So, how about an alternative: move this caching behaviour to FileManager.

  1. This could more easily be made to work for file paths as well
  2. It could more reliably work for all URLs and paths which resolve to the same file, rather than being state tied to a particular URL instance
  3. The size of URL instances can shrink by 64 bits (multiplied by n URLs, that could be significant), and instances become a little cheaper to destroy
  4. Because Apps generally have many more URLs than they do FileManagers, overall memory usage should be lower
  5. It becomes a lot easier to wipe the entire cache for all paths/URLs on each iteration of the runloop, because they'd all be stored in a single Dictionary rather than being split across n URL instances. FM could even offer new APIs for greater control of this cache.
4 Likes

Can the new function signatures be appending(subPath:) and appending(subPaths:), so it's clear that they don't take whole paths.

This design is quite dated and deserves replacement. The following fragment illustrates the problem:

Timer.scheduledTimer(withTimeInterval: 1, repeats: true) { _ in
    let fm = FileManager.default
    let tempFolder = URL(fileURLWithPath: NSTemporaryDirectory())
    let name = UUID().uuidString
    let file = tempFolder.appendingPathComponent(name)

    // MARK: first create a file
    let data = "Hello".data(using: .utf8)!
    try! data.write(to: file)
    
    // MARK: "cache" resource values
    let values = try! file.resourceValues(forKeys: [.isDirectoryKey])
    assert(values.isDirectory == false)

    // MARK: then delete the file and create a directory with the same name
    try? fm.removeItem(at: file)
    try! fm.createDirectory(at: file, withIntermediateDirectories: false, attributes: nil)
    let values2 = try! file.resourceValues(forKeys: [.isDirectoryKey])
    assert(values2.isDirectory == true) // Fails
    
    // MARK: cleanup
    try? fm.removeItem(at: file)
}

The change to the file hierarchy could be done externally. I didn't check it but it could be also possible to get mixed results (some keys of the same item from old cached entries, some that weren't in cache newly obtained).

"AutoreleasePool style" caching, for the duration of current run loop invocation (like 1/100 second) smells bad, as if developers didn't want (or were short on time) to invest in a proper caching / cache invalidation architecture. There's a lost optimisation opportunity in the current resource caching as it is very short lived and there could have been a significant speedup if caching outlived run loop invocations (and were totally unaware of the very concept of run loop!)

:100: my thoughts. If I am not mistaken FileManager already has caching, so instead of introducing another caching on top (and to the side) of it resource caching can be inside FM.

If we were implementing resource caching today it's unlikely we'd end up with the design currently implemented.

I hope the Foundation team:

  • firmly intends to not break existing code in a whim, including code which relies on its bugs/quirks/features.
  • was prepared for a "culture clash" as they decided to open up their plans to this forum.

I think it is important that we do not throw "why don't you just" sentences to the Foundation team, or blame design decisions in a hurry and without consideration.

Since Swift was introduced, we have all seen Foundation make progress towards a better Swift integration. It would be respectful to acknowledge that the Foundation team is quite aware of frictions with the new language. The mere presence of those new FOU proposals is a clear sign that they are looking to accelerate this process, and are looking for feedback, support, and contributions. I understand that many years of complete opacity and "black hole" radars have created frustration. But irony, disregard, and "api-ageism" are rhetorical tools that are unlikely to have the desired effect - and are painful to read.

A backward-compatible solution for the Foundation.URL cache issues could be to phase out apis and behaviors that go against immutability, thread-safety and Swift concurrency, and generally hinder progress (through deprecation, or Pitch: Unavailability from asynchronous contexts). Those legacy apis would be replaced with new ones, available from both ObjC and Swift. Or those legacy apis would not be replaced at all, leaving the job to another Core Swift library (stdlib, Swift System, ...).

22 Likes

Indeed! The following test passes when running on the main thread and fails when running on a different queue:

var file: URL!

init() {
    Timer.scheduledTimer(withTimeInterval: 1, repeats: true) { _ in
         // MARK: uncomment for crash
        // DispatchQueue.global().async {
            self.test()
        // }
    }
}

func test() {
    if file != nil {
        test_phase2()
    } else {
        test_phase1()
    }
}

func test_phase1() {
    print("phase1")
    let fm = FileManager.default
    let tempFolder = URL(fileURLWithPath: NSTemporaryDirectory())
    let name = UUID().uuidString
    file = tempFolder.appendingPathComponent(name)

    // MARK: first create a file
    let data = "Hello".data(using: .utf8)!
    try! data.write(to: file)
    
    // MARK: "cache" resource values
    let values = try! file.resourceValues(forKeys: [.isDirectoryKey])
    assert(values.isDirectory == false)

    // MARK: then delete the file and create a directory with the same name
    try? fm.removeItem(at: file)
    try! fm.createDirectory(at: file, withIntermediateDirectories: false, attributes: nil)
}

func test_phase2() {
    print("phase2")
    let fm = FileManager.default
    
    let values2 = try! file.resourceValues(forKeys: [.isDirectoryKey])
    assert(values2.isDirectory == true) // Fails

    // MARK: cleanup
    try? fm.removeItem(at: file)
    file = nil
}

Can this be improved as part of this FOU? The quick & dirty fix would be to never cache when on a secondary queue (or running in a RunLoop-less daemon environment). As Gwendal rightfully mentions above the "bug compatibility" is also an important concern, so the new/fixed behaviour should be an opt-in.