Retain baseURL while enumerating / finding files recursively

Hi! I want to find all files with a given name within a given directory and its subtree.

If the URL that I give as the search directory happens to have a baseURL and a relative part, then I want the resulting URLs to still have that same baseURL.

It turns out the enumerator returns URLs that do not retain the baseURL, and I couldn't find any simpler way than the following (which I'm not even sure is correct) to achieve what I want. Is there a better way?

import Foundation

func findFiles(named fileName: String, in directory: URL) -> [URL] {
  let result = FileManager.default.enumerator(at: directory, includingPropertiesForKeys: nil)?
    .compactMap { $0 as? URL }
    .filter { $0.lastPathComponent == fileName } ?? []
  if let baseURL = directory.baseURL {
    return result.compactMap { $0.relative(to: baseURL) }
  } else {
    return result
  }
}

extension URL {
  func relative(to baseURL: URL) -> URL? {
    let fullString = absoluteString, baseString = baseURL.absoluteString
    guard fullString.hasPrefix(baseString) else { return nil }
    return URL(string: String(fullString.dropFirst(baseString.count)), relativeTo: baseURL)
  }
}

I would not recommend relying on baseURL for anything. Restructure your code so you do not.

The issue is that the baseURL/relativeString split is a property of the Foundation URL object model only. It does not exist in the URL string, and so it is very easy to lose just by converting a URL to a string and back.
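
As a small illustration of that point, here is a sketch of a round trip through the absolute string. The example URL and the variable names are made up for the demonstration; only `URL(string:relativeTo:)`, `absoluteString`, and `baseURL` come from Foundation.

```swift
import Foundation

// Illustrative values only: any base URL and relative reference would do.
let exampleBase = URL(string: "http://example.com/a/")!
let original = URL(string: "b/c", relativeTo: exampleBase)!

// Convert to a string and back, as might happen when saving or logging state.
let roundTripped = URL(string: original.absoluteString)!

print(original.baseURL != nil)       // true: the split is still there
print(roundTripped.baseURL == nil)   // true: the split did not survive
print(roundTripped.absoluteString == original.absoluteString) // true: same URL string
```

The two values describe the same resource, but only the first one remembers how it was constructed.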

The idea of a baseURL property just doesn't work. Consider: what happens if your baseURL has its own baseURL? In theory, any URL could be holding on to the memory for an unbounded chain of base URLs. Clearly this would be impractical, so the framework automatically clips it:

import Foundation

let base = URL(string: "http://example.com/")!

let rel1 = URL(string: "foo/", relativeTo: base)!
print(rel1)                  // foo/ -- http://example.com/
print(rel1.baseURL == base)  // true

let rel2 = URL(string: "bar/", relativeTo: rel1)!
print(rel2)                          // bar/ -- http://example.com/foo/
print(rel2.baseURL == rel1)          // false (!)
print(rel2.baseURL?.baseURL == base) // false (!)

But here we see this automatic clipping giving us a surprising result; because the Foundation object model contains these baseURL and relativeString properties, it means two URLs with the same absoluteString are not interchangeable, and hence do not compare as == to each other.

So what happens above is that when we wanted to create rel2, the framework noticed that rel1 has a base URL and, to avoid creating a large chain of objects, collapsed it into an "equivalent" absolute URL. But other parts of the object model do not agree that rel2.baseURL really is equivalent to rel1, so we get this confusing situation where rel2.baseURL == rel1 is false.

What's more, we were just creating the URL http://example.com/foo/bar/ here. If we had used different APIs to achieve that result, our baseURL might be something else. It's very fragile.

In short, the baseURL property is just not reliable. It's not a very good concept in the API, and the framework already struggles to make it work. And again, the baseURL/relativeString split does not actually exist in the URL string, so encoding and decoding your state may lose that information entirely.

In fact, I would go further - I think the single biggest improvement we could make to Foundation's URL interface would be to deprecate the baseURL and relativeString properties, and to resolve URLs against their base at construction. This is the approach that I used for WebURL:

import WebURL

let base = WebURL("http://example.com/")!

let rel1 = base.resolve("foo/")!  // "http://example.com/foo/"
let rel2 = rel1.resolve("bar/")!  // "http://example.com/foo/bar/"

// There is no baseURL property or references to other URLs.
// Each WebURL is just a single string, and they are equal 
// if their strings are equal.
//
// If you want to refer to any of these URLs later, 
// just store them somewhere like any other variable.

In my use case (a command-line-tool) I want to print the relative rather than the (potentially much longer) absolute file paths, to reduce noise in the output.

From your explanation, it seems we should avoid any functionality which creates URLs with a non-nil baseURL.

So then I guess I'd have to use a separate (absolute) URL for the base path, perhaps convert it into a string directly, and then pass it along to anywhere I need to print a relative path (as command-line tools usually do), and perform purely textual operations to remove the base prefix from the (absolute) URL.
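
The prefix-stripping approach described above could be sketched roughly as follows. The helper name `relativePath(of:from:)` and the example paths are hypothetical, and the sketch assumes both URLs are absolute file URLs that are already normalized (no `..` components or symlinks to resolve). It compares whole path components rather than raw strings, so `/usr/localhost` is not mistaken for a child of `/usr/local`.

```swift
import Foundation

/// Hypothetical helper: returns the path of `url` relative to `base`,
/// or nil if `url` does not lie inside `base`.
/// Assumes both are absolute, already-normalized file URLs.
func relativePath(of url: URL, from base: URL) -> String? {
    let baseParts = base.pathComponents
    let urlParts = url.pathComponents
    // Compare component by component, not character by character.
    guard urlParts.starts(with: baseParts) else { return nil }
    return urlParts.dropFirst(baseParts.count).joined(separator: "/")
}

// Illustrative paths only.
let searchRoot = URL(fileURLWithPath: "/usr/local", isDirectory: true)
let inside = URL(fileURLWithPath: "/usr/local/bin/tool", isDirectory: false)
let outside = URL(fileURLWithPath: "/usr/localhost/tool", isDirectory: false)

print(relativePath(of: inside, from: searchRoot) ?? "not inside")   // bin/tool
print(relativePath(of: outside, from: searchRoot) ?? "not inside")  // not inside
```

With something like this, the search function can return plain absolute URLs, and only the printing code needs to know the base directory.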

It's unfortunate that Swift's URL or FileManager doesn't provide a cleaner way to handle such a seemingly trivial use case.

You can simply pass options: .producesRelativePathURLs to FileManager.enumerator(at:includingPropertiesForKeys:options:errorHandler:) for this specific use case.

On second thought that might not be what you’re looking for, I believe it produces URLs relative to the directory URL, not to the directory URL’s baseURL, although I haven’t tried it out.

That's what I would recommend.

The Foundation URL model supports a lot of things (not even talking about the parser - just the object model). It supports absolute URLs, relative references, and multiple representations of each (resolved, and split baseURL/relativeString).

This is what I mean by the split representation:

import Foundation
let url = URL(string: "b/c", relativeTo: URL(string: "http://example.com/a/")!)!

// The URL components appear to be joined.

url.host  // "example.com"
url.path  // "/a/b/c"

// But underneath, the parts are stored separately.

url.relativeString  // "b/c"
url.baseURL         // "http://example.com/a/"

This amount of complexity is difficult to manage - if a user of the API constructs a URL in the wrong way (or slightly changes it during an apparently harmless refactoring), it may not behave as they expect it to:

import Foundation

let urlA = URL(string: "http://example.com/a/b/c")!
let urlB = URL(string: "/a/b/c", relativeTo: URL(string: "http://example.com")!)!
let urlC = URL(string: "b/c", relativeTo: URL(string: "http://example.com/a/")!)!

// All of these URLs have the same .absoluteString.

urlA.absoluteString == urlB.absoluteString // true
urlB.absoluteString == urlC.absoluteString // true

// But they are not interchangeable.

urlA == urlB // false (!)
urlB == urlC // false (!)
URL(string: urlB.absoluteString) == urlB // false (!)

// Let's imagine an application using URLs as keys in a dictionary:

var operations: [URL: TaskHandle] = [:]
operations[urlA] = TaskHandle { ... }
operations[urlA] // TaskHandle
operations[urlB] // nil (!)

If a library expects a particular baseURL, any code which needs to satisfy that requirement needs to work very delicately.

Some APIs and operations will just drop the baseURL, and it isn't even documented which of them do that. The API also doesn't include baseURL-preserving alternatives for every operation (for some, preserving it is not even possible). If you need an operation and only discover at runtime that it drops the baseURL, while your library expects a particular baseURL, you might just be stuck.

For example:

let base = URL(string: "http://example.com")!

var urlA = URL(string: "/a/b/c", relativeTo: base)!
assert(urlA.baseURL == base)

urlA.append(queryItems: [URLQueryItem(name: "test", value: "foo")])
assert(urlA.baseURL == nil) // ?! But what's my alternative?

So, the baseURL feature:

  • Doesn't work very reliably
  • Shouldn't become part of your API contract ("this parameter is a URL whose baseURL is..." :no_good_man:)
  • Costs a lot in complexity

I've been meaning to pitch this to Foundation at some point. There may be a path forward where we just deprecate the baseURL and relativeString properties, and have the initialisers eagerly resolve the relative string against the base URL in a new SDK version. If you're not using the baseURL and relativeString properties, you shouldn't (in theory) notice the difference, except for one thing:

I think that change would be enough to finally give Foundation URL the property that if two URLs have the same absolute string, they compare as ==, so the above example with URLs as dictionary keys would work as you expect and not be so fragile.
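
Until something like that exists, eager resolution can be approximated in user code. The following is only a sketch: `resolve(_:against:)` is a hypothetical helper, not a Foundation API, and it relies on `absoluteURL` returning a URL without a baseURL.

```swift
import Foundation

/// Hypothetical helper, not part of Foundation: resolves `reference`
/// against `base` immediately and discards the base/relative split.
func resolve(_ reference: String, against base: URL) -> URL? {
    URL(string: reference, relativeTo: base)?.absoluteURL
}

let literal = URL(string: "http://example.com/a/b/c")!
let viaRoot = resolve("/a/b/c", against: URL(string: "http://example.com")!)!
let viaDirectory = resolve("b/c", against: URL(string: "http://example.com/a/")!)!

print(viaRoot.baseURL == nil)    // true
print(viaRoot == literal)        // true
print(viaDirectory == literal)   // true

// Used as dictionary keys, all three now refer to the same entry.
// (A String stands in for the TaskHandle type from the earlier example.)
let pendingOperations: [URL: String] = [literal: "download"]
print(pendingOperations[viaDirectory] ?? "missing")  // download
```

This only helps for URLs you construct yourself; URLs handed to you by other code may still carry a baseURL unless you normalize them the same way.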

That's why I think removing/deprecating these properties would be the single biggest improvement we could make to Foundation's URL API.


Anyway, sorry, I went off a bit. It’s rare that somebody asks about these things, so it’s an opportunity to give some broader advice based on the time I’ve spent investigating and designing URL interfaces. To answer your question, try FilePath (from the System framework, also available as the swift-system package). Its removePrefix method looks like it might help if you know you're dealing with subpaths:

import System  // SystemPackage when using the swift-system package

var path: FilePath = "/usr/local/bin"
path.removePrefix("/usr/bin")   // false
path.removePrefix("/us")        // false
path.removePrefix("/usr/local") // true, path is "bin"