Foundation URL Improvements

I like the way the WHATWG standard (which claims ownership of form-encoding) describes it. I've read a fair few standards documents, and I've never seen anything like this before:

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.

It really is something not even a mother could love.

11 Likes

Good catch. I meant precondition. The plan is to treat malformed Strings as programmer errors even in the release builds.

I thought about that in the earlier version of the proposal. We decided URL isn't the best fit for ExpressibleByStringLiteral because technically it isn't. Some strings can't be turned into an URL. Imagine this scenario:

let url: URL = "www.apple.com"

It would be pretty confusing if/when this crashes.

Ahh thanks for pointing that out!

This is a rename of the existing method appendingPathComponent() so it allows multiple components. We purposely chose path as the argument label to indicate that you are appending a path, so it could contain multiple path components.

I agree! Let us know if you have other suggestions :grin:

The cached resource values does not impact equality. The URL caches are cleared on the next turn of the RunLoop (which, unfortunately, also means runloop-less daemons and such NEVER get them invalidated). It's primarily designed for the situation where a URL is passed between several different layers of code that end up repeating access of the same metadata. In these situations, the URL cache could reduce the amount of disk access needed.

1 Like

True, although I suppose the same could be said for the proposed StaticString initializer. Are the call-sites really different enough to conclude that one makes the mistake obvious and the other does not?

let url: URL = "www.apple.com"
let url = URL("www.apple.com")

I see. I actually quite it somewhat surprising that .appendingPathComponent allows you to append multiple components. It seems like the user is explicitly saying that the provided string should be considered a single component and percent-encoded accordingly (even if it contains user input which happens to contain a forward-slash).

func urlForBand(_ bandName: String) -> URL {
  URL(string: "https://example.com/music/bands")!
    .appendingPathComponent(bandName)
}

// Imagine the band name is not a string literal,
// but instead derived from runtime data (perhaps a search field).
// This results in 2 components:
urlForBand("AC/DC") // https://example.com/music/bands/AC/DC

Also, if I can control the bandName parameter in the above example, I can add any number of ".." components and take control of the entire path. That could potentially have security implications in certain circumstances, and in other situations just leads to particular inputs returning the incorrect result unless the developer knows about this and manually adds percent-encoding.

Okay, that's quite interesting. I took a brief look at the implementation in corelibs-foundation, and this seems to be part of quite an elaborate mechanism.

  • Every URL (including HTTP URLs, which I don't believe make use of resource values) has to store an extra pointer for this
  • Every URL needs to pay for potentially cleaning up this object at deinit (it's often overlooked, but cheap destruction is just as important as cheap construction)
  • The pointer has to be initialized and cleared using atomic instructions
  • Getting and setting resource values additionally must be gated using an NSLock

I wasn't actually able to find the location where RunLoop clears these values (perhaps it's not implemented in corelibs-foundation?), but it seems to me that this requires one of two implementation strategies:

  • Every URL with resource values must register itself with the RunLoop for later cleanup. Essentially, you could think of these auto-cleaning caches as similar to auto-zeroing weak references.
  • The cached values must be paired with some kind of tick number from the runloop. They will stay in memory, but return nil if the current tick number is different to the cached one.

So, how about an alternative: move this caching behaviour to FileManager.

  1. This could more easily be made to work for file paths as well
  2. It could more reliably work for all URLs and paths which resolve to the same file, rather than being state tied to a particular URL instance
  3. The size of URL instances can shrink by 64 bits (multiplied by n URLs, that could be significant), and instances become a little cheaper to destroy
  4. Because Apps generally have many more URLs than they do FileManagers, overall memory usage should be lower
  5. It becomes a lot easier to wipe the entire cache for all paths/URLs on each iteration of the runloop, because they'd all be stored in a single Dictionary rather than being split across n URL instances. FM could even offer new APIs for greater control of this cache.
4 Likes

Can the new function signatures be appending(subPath:) and appending(subPaths:), so it's clear that they don't take whole paths.

This design is quite dated and deserves replacement. The following fragment illustrates the problem:

Timer.scheduledTimer(withTimeInterval: 1, repeats: true) { _ in
    let fm = FileManager.default
    let tempFolder = URL(fileURLWithPath: NSTemporaryDirectory())
    let name = UUID().uuidString
    let file = tempFolder.appendingPathComponent(name)

    // MARK: first create a file
    let data = "Hello".data(using: .utf8)!
    try! data.write(to: file)
    
    // MARK: "cache" resource values
    let values = try! file.resourceValues(forKeys: [.isDirectoryKey])
    assert(values.isDirectory == false)

    // MARK: then delete the file and create a directory with the same name
    try? fm.removeItem(at: file)
    try! fm.createDirectory(at: file, withIntermediateDirectories: false, attributes: nil)
    let values2 = try! file.resourceValues(forKeys: [.isDirectoryKey])
    assert(values2.isDirectory == true) // Fails
    
    // MARK: cleanup
    try? fm.removeItem(at: file)
}

The change to the file hierarchy could be done externally. I didn't check it but it could be also possible to get mixed results (some keys of the same item from old cached entries, some that weren't in cache newly obtained).

"AutoreleasePool style" caching, for the duration of current run loop invocation (like 1/100 second) smells bad, as if developers didn't want (or were short on time) to invest in a proper caching / cache invalidation architecture. There's a lost optimisation opportunity in the current resource caching as it is very short lived and there could have been a significant speedup if caching outlived run loop invocations (and were totally unaware of the very concept of run loop!)

:100: my thoughts. If I am not mistaken FileManager already has caching, so instead of introducing another caching on top (and to the side) of it resource caching can be inside FM.

If we were implementing resource caching today it's unlikely we'd end up with the design currently implemented.

I hope the Foundation team:

  • firmly intends to not break existing code in a whim, including code which relies on its bugs/quirks/features.
  • was prepared for a "culture clash" as they decided to open up their plans to this forum.

I think it is important that we do not throw "why don't you just" sentences to the Foundation team, or blame design decisions in a hurry and without consideration.

Since Swift was introduced, we have all seen Foundation make progress towards a better Swift integration. It would be respectful to acknowledge that the Foundation team is quite aware of frictions with the new language. The mere presence of those new FOU proposals is a clear sign that they are looking to accelerate this process, and are looking for feedback, support, and contributions. I understand that many years of complete opacity and "black hole" radars have created frustration. But irony, disregard, and "api-ageism" are rhetorical tools that are unlikely to have the desired effect - and are painful to read.

A backward-compatible solution for the Foundation.URL cache issues could be to phase out apis and behaviors that go against immutability, thread-safety and Swift concurrency, and generally hinder progress (through deprecation, or Pitch: Unavailability from asynchronous contexts). Those legacy apis would be replaced with new ones, available from both ObjC and Swift. Or those legacy apis would not be replaced at all, leaving the job to another Core Swift library (stdlib, Swift System, ...).

22 Likes

Indeed! The following test passes when running on the main thread and fails when running on a different queue:

var file: URL!

init() {
    Timer.scheduledTimer(withTimeInterval: 1, repeats: true) { _ in
         // MARK: uncomment for crash
        // DispatchQueue.global().async {
            self.test()
        // }
    }
}

func test() {
    if file != nil {
        test_phase2()
    } else {
        test_phase1()
    }
}

func test_phase1() {
    print("phase1")
    let fm = FileManager.default
    let tempFolder = URL(fileURLWithPath: NSTemporaryDirectory())
    let name = UUID().uuidString
    file = tempFolder.appendingPathComponent(name)

    // MARK: first create a file
    let data = "Hello".data(using: .utf8)!
    try! data.write(to: file)
    
    // MARK: "cache" resource values
    let values = try! file.resourceValues(forKeys: [.isDirectoryKey])
    assert(values.isDirectory == false)

    // MARK: then delete the file and create a directory with the same name
    try? fm.removeItem(at: file)
    try! fm.createDirectory(at: file, withIntermediateDirectories: false, attributes: nil)
}

func test_phase2() {
    print("phase2")
    let fm = FileManager.default
    
    let values2 = try! file.resourceValues(forKeys: [.isDirectoryKey])
    assert(values2.isDirectory == true) // Fails

    // MARK: cleanup
    try? fm.removeItem(at: file)
    file = nil
}

Can this be improved as part of this FOU? The quick & dirty fix would be to never cache when on a secondary queue (or running in a RunLoop-less daemon environment). As Gwendal rightfully mentions above the "bug compatibility" is also an important concern, so the new/fixed behaviour should be an opt-in.

I agree with the decision to avoid ExpressibleByStringLiteral conformance: it’s inappropriate for types that can’t accept all strings. The new StaticString initializer has runtime-checked preconditions, so the explicit call is important.

As for changing existing Foundation.URL behavior: I think the best thing to do is replace the type entirely with things like SystemPackage.FilePath and WebURL. Foundation.URL should be kept as a sort of legacy option.

Note that WebURL isn’t actually stable yet, so everyone should probably keep using Foundation.URL in production code anyway.

2 Likes

URL indeed feeds tons of apis, from Data.init(contentsOf:options:) to URLRequest, and URLRequest itself feeds a lot of other tools, URLSession, URLProtocol... It's called "Foundation" for a reason.

2 Likes

Foundation is an excellent framework full of powerful tools. It is also literally incapable of fully embracing Swift, due to the need for Objective-C compatibility (enumerations, etc.). Its monolithic nature also makes it difficult to implement across all the platforms Swift can run on, which include things like embedded systems.

In light of that, Foundation should avoid breaking changes to defined behavior: ultimately, it should be replaced by modern Swift packages (the core libraries) and/or the Standard Library. And yes, that means that all of those will likely need to be replaced someday too. That day is not now.

5 Likes

URL is the most frequently used foundation type after String / Data / Array / Dictionary. It is used in gazillion of places (QuickLook, Metal, CoreImage, EventKit to name just a few not obvious). It seems it would be much easier to fix URL rather than move everything on top of a new type. If WebURL can run in a special mode, that makes it 100% compatible with URL (including API and bug for bug compatibility) then URL can be just typealias for WebURL, and with appropriate opt-in the bug for bug compatibility requirement can be lifted ("opt_in_for_fewer_url_bugs_and_faster_performance = true").

Foundation.URL, through no fault of the Foundation team, is a bad tool for the job. It conforms to an obsolete and ambiguous standard, tries to fill many disparate roles, and is shackled by the need to remain Objective-C compatible.

Three of the four examples you just gave suffered from the same sort of issues: Foundation.NSString, Foundation.NSArray, and Foundation.NSDictionary. They were vastly more common than Foundation.URL ever was or will be, and they were wisely replaced by completely new types. Sure, they did have to implement hacky toll-free bridging to make that happen, but it happened.

Foundation.URL is unfixable. Applying the fixes needed would render it unsuitable for at least some of its existing roles, and constitute an overhaul so comprehensive that you would really be replacing it with a new type anyway.

Its primary successors (SystemPackage.FilePath, WebURL, etc.) are specifically designed to make conversion as painless as it can be without compromising on other things. For the moment, WebURL is not stable. FilePath is, however, and everyone using Foundation.URL for files should be considering whether they can use it instead (converting to Foundation.URL as necessary for APIs still tied to it).

1 Like

Whatever foundation team does here at the end of the day is widely expected to be Objective-C compatible! Even if the new implementation is in swift with Objective-C shim layer on top. Or are you suggesting Objective-C users shall not benefit from latest and greatest changes?

I do not believe it is so bad... Please give concrete examples that proves it unfixability. The very topic's question is about "how can we improve it", and I believe we can.. Even if by the end of the day improvement means:

 typealias URL = WebURL

with WebURL having some compatibility mode that replicates all URL's bugs and quirks (with explicit opt-out of the bugs & quirks).

Bridging is the crux of the matter here.... you may change URL to WebURL or to whatnot you like - just do the bridging that for all intents and purposes makes the thing acting the same way as the original URL - that's the way to go.

There's a lot of great feedback in this thread. A sincere thank you to everyone who has responded so far. I'll leave some of the more specific responses on URL itself to @icharleshu but do want to discuss Foundation and its process at a higher level.

I talk about Foundation to a lot of people, and I have worked on the library myself for quite some time. I think one of the best ways to describe it is as a double-edged sword.

On one hand, it has the burden of maintaining compatibility with hundreds of thousands of distinct apps and libraries. Those apps use APIs like URL in many ways that are unanticipated or unexpected, beyond the promises we thought we made about how these types should behave.

On the other hand, it has the potential of improving hundreds of thousands of apps and libraries with even small changes to APIs like URL -- both in the implementation and in the interface.

There are certainly times where it makes sense to start over. We've done this a few times in Foundation for new Swift API over the past few releases. Most recently it was with AttributedString and the new FormatStyle family for formatting dates, numbers, and more. There are also times where it makes sense to improve the existing APIs, because then the cost of adoption is very low for all existing clients.

So, we have responsibility but also opportunity. These are the kinds of tradeoffs we make in our API design every day. Personally, I think it's an exciting place to be. This thread, and the one about Locale, are about opening up this opportunity to more people: the ones who count themselves as authors of those thousands of apps and libraries.

Our basic goal is to gather more ideas and feedback. Of course, we will have to do some categorization of changes on the spectrum of "minor, compatible improvement" to "complete rethink." Sometimes the latter may not be practical or even possible. I expect that different people with different interests will come to different conclusions about what the best path forward is. I don't think that we will end up in a place where literally everyone is in agreement. For this particular RFC, the changes are certainly more on the "minor, compatible improvement" side. That doesn't mean ideas on the other side aren't valuable or interesting. We may just need to split them up.

Overall, what I believe is that your ideas and discussions here, along with the experience and time commitment from the Foundation team, will result in a better API for everyone in the end.

Now, as far as a concrete process for this RFC and also Locale: what we hope to do is open up this discussion here for a few weeks and iterate on changes. Then we will put a bow on it so that we can move on to other APIs and ideas. Hopefully we can land implementations in swift-corelibs-foundation around that time so that people can try out these APIs before they are finalized. There isn't a formal process yet, because honestly I'd like to grow one a bit more organically to see what works best for all of us, because the needs of a library like this are a bit different than those for the compiler and language. We'll figure it out together.

37 Likes

As far as the language is concerned, I think the biggest issue Foundation (and all system frameworks) have is portability: once you use it in Swift code, especially packages, the code becomes considerably less versatile. Platform support becomes a very complicated question.

I believe that people will be more willing to adopt Apple’s libraries and technologies if they could be confident they will work on non-Apple platforms. I’d certainly be more willing to use Combine if it was an open-source package. Since Foundation is dependent on the Objective-C runtime, that’s only really feasible if Foundation’s functionality is reimplemented as various packages too.

There’s been a lot of progress recently towards accomplishing that, and I think it should continue.

2 Likes

@icharleshu

(1) you think this will improve the ergonomics of using URL

Those, definitely, yes.
I create nearly exactly the same extensions for project that has to deal with local system file paths.
Very happy to see the type being reconsidered.

I often create extensions that resembles:

extension URL {
    static var cachesDirectory: URL { get }
    var contents: [URL] { get }
    var fileExists: Bool { get }
    var isDirectory: Bool { get }
    func remove()
    func appendingDir(path: String) -> URL
    func appendingFile(path: String) -> URL
    func createDirectoryIfNeeded() -> URL
}

.

(2) if there are other straightforward changes we should consider to help. Thanks!

  • Pure Swift? (This is Foundation improvement so pointless?)
  • As already mentioned by others, URL query support incl. automatic (or, as it's mentioned above that there's no single rule, so something close) percent encoding would be nice.
  • More imports from FileManager?

I know this goes beyond 'improvements', – please don't take this serious too much – but this is a good opportunity to express what I've been feeling.

I remember the very first time I started to learn Swift and use NSURL (it was Swift 1 or 2 I guess), I got confused that it handles both Remote (http, etc) and Local (file://) schemes (I didn't have Obj-C experience back then).

Now I know how the type, Foundation and ObjC behave and it's okay, but sometimes it still feels a bit (just a little bit) awkward that it's mixed in a single type. Kind of feels like 2 types are in 1 type, suffering from ObjC legacy.

I mean, ultimately I want something like:

public protocol URL
public struct WebURL : URL
public struct FileURL : URL
2 Likes

The replacements for Foundation.URL are in the works, and they work pretty much exactly like that: SystemPackage.FilePath for files, WebURL.WebURL for webpages.

FilePath is already stable, and as the original post says is the better option if you aren’t dependent on resource value caching.

WebURL is not stable yet, but it will be eventually.

2 Likes

In addition to wonderful URL.resourceBytes an ability to read a number of bytes from offset would be great to have. e.g. in this form:

url.resourceBytes[10000 ..< 10200]

if that's impossible then e.g this:

url.bytes(offset: 10000, size: 10200)

for file url read from offset is straightforward, for web URL HTTP range header can be used, if unsupported by the server raise an error.

The slice could be possible in a general form on all async sequences and not just URL's AsyncBytes:

extension AsyncSequence {
  public subscript<R: RangeExpression>(_ range: R) -> AsyncSlice<Self> where R.Bound == Int {
    let r = range.relative(to: 0..<Int.max)
    return AsyncSlice(self, range: r)
  }
}

public struct AsyncSlice<Base: AsyncSequence> {
  let base: Base
  let range: Range<Int>
  init(_ base: Base, range: Range<Int>) {
    self.base = base
    self.range = range
  }
}

extension AsyncSlice: AsyncSequence {
  public typealias Element = Base.Element
  
  public struct Iterator: AsyncIteratorProtocol {
    let range: Range<Int>
    var iterator: Base.AsyncIterator
    var index = 0
    
    init(_ iterator: Base.AsyncIterator, range: Range<Int>) {
      self.iterator = iterator
      self.range = range
    }
    
    public mutating func next() async rethrows -> Element? {
      while index <= range.lowerBound {
        _ = try await iterator.next()
        index += 1
      }
      
      if index >= range.upperBound {
        return nil
      }
      defer { index += 1 }
      return try await iterator.next()
    }
  }
  
  public func makeAsyncIterator() -> Iterator {
    Iterator(base.makeAsyncIterator(), range: range)
  }
}

I think those types of additions should be considered separately for their own merit and not just on URL.

3 Likes

So long as the solution leads to "HTTP range header" for web urls and "read from offset" for file urls (and thus won't incur unnecessary I/O of reading some leading bytes and throwing them away) - it's good. To put it differently read(offset:length:) must move O(length) bytes across the bus / wire. Possibly accompanied with some readAnyway(offset:length:) that first tries the fast path and if that fails (say, for web URLs for servers that do not support HTTP range headers) performs the operation anyway reading all bytes from the start and throwing leading bytes away, moving O(offset + length) bytes across the bus / wire.

3 Likes