Multiple hostname:port elements in a URL

i’m trying to parse a mongodb connection string, which has the form:

mongodb://[username:password@]host1[:port1][,...hostN[:portN]][/[defaultauthdb][?options]]

but it looks like WebURL only supports a single hostname and port pair. is there a way to parse multiple host-port pairs with WebURL?

The URL specification indicates that a URL has a single host and an optional port.

The MongoDB connection string does not meet these requirements and you'll probably have to implement this parsing yourself, as anything that wants to enforce standards-compliant URL parsing will not be able to support those strings.


Yeah the MongoDB connection string isn't a standards-compliant URL.

While it is true that the generic syntax only allows for a single host-and-port combo, you can go beyond the generic syntax and add substructure to any component, including the host. Because mongodb: is not a special scheme, its hostname is typically considered opaque (application-defined), so it is possible to encode a list of hosts in that opaque string:

We want the whole list to be considered the "host":

mongodb://foo:42,bar:64/...
host      ^^^^^^^^^^^^^ = "foo:42,bar:64"

Unfortunately, : is a reserved character, so when it is used unescaped like this, it is interpreted as a host/port delimiter within the single host-and-port combo of the generic syntax.
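To make that concrete, here is a small TypeScript sketch using the `URL` class built into Node.js and browsers. It is a different implementation of the same WHATWG standard, so I am assuming WebURL accepts and rejects the same strings. `tryParse` is just a helper name made up for this example.

```typescript
// Returns the parsed URL, or null if the WHATWG parser rejects the string.
function tryParse(input: string): URL | null {
  try {
    return new URL(input);
  } catch (_err) {
    return null;
  }
}

// A single host-and-port pair fits the generic syntax.
const single = tryParse("mongodb://foo:42/db");
console.log(single?.hostname, single?.port); // foo 42

// With a host list, the first ":" starts the port, and "42,bar:64"
// is not a valid port, so parsing fails.
const multi = tryParse("mongodb://foo:42,bar:64/db");
console.log(multi); // null
```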

There are two ways around this:

  1. Escape the :, making it clear that it is being used as a literal colon, not as a URL delimiter.

    mongodb://foo%3A42,bar%3A64/...
    host      ^^^^^^^^^^^^^^^^^ = "foo%3A42,bar%3A64"
    host (unescaped)            = "foo:42,bar:64"
    

    This would be a standards-compliant URL, as you can see by checking it against the reference implementation in the Live URL Viewer.

  2. Use a different delimiter. For example, $, *, and ~ are all allowed, and (IMO) are unlikely to appear in the hostnames themselves.

    mongodb://foo$42,bar*64,baz~99/...
    host      ^^^^^^^^^^^^^^^^^^^^ = "foo$42,bar*64,baz~99"
    

    This is also a standards-compliant URL, which you can confirm in the Live URL Viewer.

Of course, you would need the client which ultimately processes this URL and connects to the MongoDB server to understand this - it would need to either unescape the host-port pairs when parsing the host (maybe it already does?), or accept the alternate delimiter.
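As a rough sketch of what that unescaping step could look like for the escaped-colon form, here is some TypeScript using the built-in WHATWG `URL` class (standing in for WebURL, on the assumption that both follow the standard the same way). `decodeHostList` and `HostPort` are hypothetical names for this example, not part of any MongoDB client.

```typescript
interface HostPort {
  hostname: string;
  port?: number;
}

// Hypothetical helper: splits an opaque host such as "foo%3A42,bar%3A64"
// into host-port pairs.
function decodeHostList(url: URL): HostPort[] {
  // Split on "," first, then unescape each entry ("foo%3A42" becomes "foo:42").
  return url.hostname.split(",").map((entry): HostPort => {
    const decoded = decodeURIComponent(entry);
    const colon = decoded.lastIndexOf(":");
    if (colon === -1) {
      return { hostname: decoded }; // no port given
    }
    return {
      hostname: decoded.slice(0, colon),
      port: Number(decoded.slice(colon + 1)),
    };
  });
}

const escaped = new URL("mongodb://foo%3A42,bar%3A64/db");
console.log(escaped.hostname); // foo%3A42,bar%3A64
const pairs = decodeHostList(escaped);
console.log(pairs);
```

This sketch does not handle bracketed IPv6 addresses, and the same splitting idea would apply to an alternate delimiter by swapping the ":" for that character.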


damn. is it not valid even if we consider it an opaque URI?

unfortunately, i cannot change the format of the connection string, because it is defined by the mongodb connection URL specification.

is there truly no way i can leverage WebURL’s URL parsing capabilities to parse these URLs?

Well, there is mongodb-js/mongodb-connection-string-url, which is a JavaScript library. They use a regex to extract the first few components, then rewrite the URL string to use a dummy hostname. The rewritten string is parsed as a WHATWG URL, and the original hostname is stored alongside.

Since WebURL conforms to the same WHATWG URL standard, you should be able to create a wrapper which does the same, and get the same result.
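Here is a minimal sketch of that idea in TypeScript, using the built-in WHATWG `URL` class where a Swift wrapper would call WebURL. Everything in it (the regex, `parseConnectionString`, `PLACEHOLDER_HOST`, the result shape) is my own illustration of the approach described above, not the library's actual code, and it ignores things like `mongodb+srv://`.

```typescript
interface HostPort {
  hostname: string;
  port?: number;
}

interface ParsedConnectionString {
  hosts: HostPort[]; // the original host list, kept alongside the URL
  url: URL;          // everything else, parsed with a placeholder host
}

// Assumption: any syntactically valid hostname works as the stand-in.
const PLACEHOLDER_HOST = "placeholder.invalid";

function parseConnectionString(input: string): ParsedConnectionString {
  // Groups: 1 = scheme and "//", 2 = optional "userinfo@", 3 = host list, 4 = the rest.
  const match = /^(mongodb:\/\/)((?:[^\/?#@]*@)?)([^\/?#@]*)(.*)$/.exec(input);
  if (match === null) {
    throw new Error("not a mongodb:// connection string");
  }
  const [, scheme, userinfo, hostList, rest] = match;

  const hosts = hostList.split(",").map((entry): HostPort => {
    // Split a trailing ":digits" off as the port; hosts like "[::1]" stay intact.
    const parts = /^(.*?)(?::(\d+))?$/.exec(entry)!;
    return parts[2] === undefined
      ? { hostname: parts[1] }
      : { hostname: parts[1], port: Number(parts[2]) };
  });

  // Swap the host list for a single placeholder so the WHATWG parser accepts it.
  const url = new URL(scheme + userinfo + PLACEHOLDER_HOST + rest);
  return { hosts, url };
}

const parsed = parseConnectionString(
  "mongodb://user:pw@h1:27017,h2:27018/admin?replicaSet=rs0"
);
console.log(parsed.hosts);
console.log(parsed.url.username, parsed.url.pathname);
console.log(parsed.url.searchParams.get("replicaSet"));
```

The username, password, path, and query all come from the standards-compliant parser; only the host list is handled by hand.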

I know that is perhaps not the cleanest solution, but that package is used by the official MongoDB driver for Node.js, and npm reports 1.6 million downloads/week, so it should work well.

Unfortunately, this is one of the drawbacks of custom identifiers (including incompatible forks) - they are incompatible with existing tooling.
