Is there a way to validate a URL from a string

Hi,

Almost everything seems to return a URL except an empty space.
Is there a way to check if a string is a valid URL?

import Foundation

// All three return a non-nil URL, even though none of them has a scheme.
let url1 = URL(string: "www.example.com")
let url2 = URL(string: "example")
let url3 = URL(string: ".")

Edited

I found this: URLValidator.swift in the vapor/validation repository on GitHub.

Thanks

Unfortunately, “URL” is an overloaded concept, and what is “valid” depends on context.

Ah, OK. That is probably why I couldn't find any well-defined answers.

I guess in my context, a valid URL is anything that the app can open.

In that case, only “the app” (whatever that is) can answer.

Apologies if the following is out of scope for the Swift forum.

Maybe I should give some background on what I was originally trying to do.

I wanted to accept a URL from the user in a SwiftUI app.

I used the following code:

TextField("URL", value: $url, format: .url, prompt: Text("URL"))

Unfortunately, that returned nil unless the user added a / at the end.

So I was looking for another way to validate the input.

The only way to find out if the URL is something that the app can open is to try opening it and handle the error if it fails. The URL can be as syntactically valid as you like, but if somebody has removed the resource it is supposed to point to, you won't be opening it.
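
A minimal sketch of that idea, assuming the resource can be read with Foundation's Data(contentsOf:). The helper name loadContents(of:) and the temporary file path are made up for illustration. For network resources an asynchronous API would be a better fit than this synchronous call.

```swift
import Foundation

/// Hypothetical helper: attempt to read the resource and report the outcome.
/// No syntactic validation happens here; opening the URL is the test.
func loadContents(of url: URL) -> Result<Data, Error> {
    do {
        return .success(try Data(contentsOf: url))
    } catch {
        return .failure(error)
    }
}

// A perfectly well-formed file URL that points at nothing.
let missingFile = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent("missing-" + UUID().uuidString)

switch loadContents(of: missingFile) {
case .success(let data):
    print("Opened, \(data.count) bytes")
case .failure(let error):
    print("Could not open: \(error.localizedDescription)")
}
```

The point is that the failure branch is where the app decides what to tell the user, rather than trying to predict the failure up front.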

@jeremyp Thanks! That makes sense, I will try that.

That's not to say you should never bother to do any validation. For example, you might want to accept only file URLs, or restrict HTTP URLs to HTTPS, or whatever. These checks are most easily done after you have created the URL, when you have easy access to its components.

You can't imagine the embarrassment I once had when a product I was installing (not an Apple product or anything to do with Swift) was quite happy to open file:///etc/passwd for the security penetration test team.
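
A scheme allow-list along those lines could be sketched as follows. The function name allowedURL(from:schemes:) and the default of HTTPS only are assumptions for illustration, not an established API.

```swift
import Foundation

/// Hypothetical helper: returns the URL only if it parses and its scheme is
/// on the allow-list. The scheme is lowercased before comparing, since
/// schemes are case-insensitive.
func allowedURL(from string: String, schemes: Set<String> = ["https"]) -> URL? {
    guard let url = URL(string: string),
          let scheme = url.scheme?.lowercased(),
          schemes.contains(scheme) else {
        return nil
    }
    return url
}

let accepted = allowedURL(from: "https://example.com/index.html")
let rejected = allowedURL(from: "file:///etc/passwd")
print(accepted != nil, rejected != nil)   // true false
```

Further checks, such as requiring a non-empty host, can be layered on in the same guard once the URL exists.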

That is a really good point. I had better accept only allowed schemes.

The app is sandboxed, but it is probably still a good idea to restrict input to certain allowed schemes.

N.B. The failable initializer URL(string:) considers “foo” a “valid URL”.

This surprised me, since I would have called it a valid URI at most, though that may be an incorrect assumption on my part.

So if you are talking about URLs in the sense of website addresses, you should most likely not rely on that initializer as a “validator”.

What is going on here is that Foundation's URL type models two things - URLs and relative references, using a single type named "URL".

The problem is that a relative reference is not, by itself, an identifier; just as a set of directions cannot be used to identify a place - however, if we have a starting point (an origin, or "base URL"), we can resolve a set of directions relative to that origin, in order to arrive at a new identifier.

It is simply a conceptual error to combine them in a single type. These strings represent different data, and there are operations which may make sense for one but not the other.
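
To make the distinction concrete, here is a small sketch using Foundation's own initializers. The base address example.com/docs/ is a placeholder chosen for illustration.

```swift
import Foundation

// A bare string with no scheme parses successfully, but only as a
// relative reference: a set of directions without a starting point.
let reference = URL(string: "foo")!
print(reference.scheme == nil)            // true

// Supplying a base URL gives the directions an origin, and resolving
// them produces an actual identifier.
let base = URL(string: "https://example.com/docs/")!
let resolved = URL(string: "foo", relativeTo: base)!.absoluteURL
print(resolved.absoluteString)            // https://example.com/docs/foo
```

Both values have the same type, URL, which is exactly why the compiler cannot tell you which of the two you are holding.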


What makes this even more challenging is that the URL API is confused and incorrect. URL purports to follow RFC-1738 and RFC-1808. The former gives us the most basic description of URL syntax:

In general, URLs are written as follows:

    <scheme>:<scheme-specific-part>

A URL contains the name of the scheme being used (<scheme>) followed
by a colon and then a string (the <scheme-specific-part>) whose
interpretation depends on the scheme.

Basically - schemes are mandatory, everything else is optional. This is the basic premise of URLs.

RFC-1808 came along later, and defined "relative URLs". It describes itself as "a companion to RFC 1738, which specifies the syntax and semantics of absolute URLs", and gives us a basic description of what relative URLs look like:

The syntax for relative URLs is a shortened form of that for absolute
URLs, where some prefix of the URL is missing and certain path
components ("." and "..") have a special meaning when interpreting a
relative path.

This is a good way to think about these concepts - as companions, but not the same thing. URLs as defined by RFC-1738 are referred to as "absolute URLs" by this standard. Absolute URLs must begin with a scheme, and since relative URLs drop some prefix of their URL, they necessarily do not include a scheme. Fundamentally, that is how you tell absolute vs. relative URLs apart.
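
Following that rule, a rough check for "is this an absolute URL" with Foundation is to ask whether a scheme was parsed. This is only a sketch: isAbsoluteURLString is a made-up name, and it says nothing about whether the scheme is one your app should accept.

```swift
import Foundation

/// Hypothetical helper: true only when the string parses as a URL and
/// carries a scheme, which is the RFC 1738 notion of an absolute URL.
func isAbsoluteURLString(_ string: String) -> Bool {
    guard let url = URL(string: string), let scheme = url.scheme else {
        return false
    }
    return !scheme.isEmpty
}

let candidates = ["https://example.com/", "mailto:user@example.com",
                  "www.example.com", "example", "."]
for candidate in candidates {
    print(candidate, isAbsoluteURLString(candidate))
}
```

The first two strings have a scheme and count as absolute. The last three are the inputs from the original question, which parse but are only relative references.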

And yet, Foundation's URL type accepts relative URLs as input (without a scheme, and without specifying a base URL to get a scheme from), and includes a property called absoluteURL which returns a non-optional URL. There is simply no way for that API to work:

URL(string: "foo")!.absoluteURL  // "foo" - Not an absolute URL :(

In short, you're stumbling upon several weaknesses of Foundation's URL model. It combines separate concepts in a single type, which is a poor design, and its APIs demonstrate a confused understanding of URL concepts, which is of course not ideal for a URL library.

I've been creating a new URL library, WebURL, which remedies several of these flaws. A WebURL value always contains an absolute URL, and if you want to resolve a relative reference, you call the .resolve function, which returns another absolute URL.

So things like WebURL("foo") return nil, as you expect. It is also much more lenient about the correctly-structured URL strings it accepts, as it is built to match modern browsers. I've noticed in particular some social media libraries and applications using it, as Foundation sometimes fails to parse URLs which work in browsers and are hence used across the web.

@Karl Thank you so much for that thorough explanation!

For a beginner like me, this could easily trip me up.

I will have a look at your library, thank you!

Questions:

  1. Could these changes be made to Foundation's URL, or would that be too big a design change, one that might break a lot of existing code that depends on it?

  2. Or is there any intention to build a URL type into Swift or the Swift standard library in the future?

  1. The issues are core parts of Foundation.URL's object model. It's not a quick fix - large swathes of the API should really be removed (absoluteString, absoluteURL, baseURL, relativeString, relativePath, etc), and other parts would need to significantly change their behaviour.

    It's not something that can just be fixed. The entire API needs a redesign - which is what I've been doing with my library.

  2. I have no knowledge of possible additions to the Swift standard library or any of Apple's libraries. That said, I have invested a significant amount of time familiarising myself with the URL standard (as well as related standards such as Unicode's IDNA standard, UTS46), and implementing the library, and would be delighted to collaborate on any future efforts based on WebURL.

    WebURL is one of the fastest and most comprehensively-tested implementations available, makes extensive use of Swift language features, and includes novel API designs which solve issues that have plagued libraries in other languages. Obviously I'm biased, but I think it's the best URL library around, and an asset to the Swift ecosystem. If there is to be some kind of official new URL type, I think it should start by leveraging the work that I've made available for free as an open-source package.

@Karl Thank you so much for that explanation!

Really good effort to build something entirely in Swift that others can use. Your library will be very helpful for me, thanks a lot!

This is fantastic work, @Karl. Would you consider starting a discussion around bringing something like WebURL into Swift (as a first step before a proposal)?
