I’m trying to parse a MongoDB connection string, which has the form:
mongodb://[username:password@]host1[:port1][,...hostN[:portN]][/[defaultauthdb][?options]]
But it looks like WebURL only supports a single hostname and port pair. Is there a way to parse multiple host-port pairs with WebURL?
MPLewis
(Mike Lewis)
The URL specification indicates that:
The MongoDB connection string does not meet these requirements and you'll probably have to implement this parsing yourself, as anything that wants to enforce standards-compliant URL parsing will not be able to support those strings.
Karl
(👑🦆)
Yeah the MongoDB connection string isn't a standards-compliant URL.
While it is true that the generic syntax only allows for a single host-and-port combo, you can go beyond the generic syntax and add substructure to any component, including the host. Because mongodb: is not a special scheme, its hostname is typically considered opaque (application-defined), so it is possible to encode a list of hosts in that opaque string:
We want the whole list to be considered the "host":
mongodb://foo:42,bar:64/...
     host ^^^^^^^^^^^^^ = "foo:42,bar:64"
Unfortunately, : is a reserved character, so when it is used unescaped like this, it is interpreted as a host/port delimiter within the single host-and-port combo of the generic syntax.
There are two ways around this:
- Escape the :, making it clear that it is being used as a literal colon, not as a URL delimiter.
  mongodb://foo%3A42,bar%3A64/...
       host ^^^^^^^^^^^^^^^^^ = "foo%3A42,bar%3A64"
             host (unescaped) = "foo:42,bar:64"
  This would be a standards-compliant URL, as you can see by checking it against the reference implementation in the Live URL Viewer.
- Use a different delimiter. For example, $, *, and ~ are all allowed, and (IMO) are unlikely to appear in the hostnames themselves.
  mongodb://foo$42,bar*64,baz~99/...
       host ^^^^^^^^^^^^^^^^^^^^ = "foo$42,bar*64,baz~99"
  This is also a standards-compliant URL (see the Live URL Viewer).
Of course, you would need the client which ultimately processes this URL and connects to the MongoDB server to understand this - it would need to either unescape the host-port pairs when parsing the host (maybe it already does?), or accept the alternate delimiter.
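To make the three spellings above concrete, here is a minimal TypeScript sketch. It uses the platform's built-in WHATWG `URL` class rather than WebURL (an assumption for illustration; both follow the same standard), and `tryParse` and `splitHostList` are hypothetical helpers, not part of any library. It also assumes the individual host names contain no literal colons, so IPv6 literals are out of scope.

```typescript
// Hypothetical helper: parse with the built-in WHATWG URL class, or return null on failure.
function tryParse(input: string): URL | null {
  try {
    return new URL(input);
  } catch {
    return null;
  }
}

// Hypothetical helper: split an opaque host such as "foo%3A42,bar%3A64"
// into host/port pairs. Assumes "," separates entries and that the port
// delimiter does not occur inside a host name.
function splitHostList(
  opaqueHost: string,
  portDelimiter: string = ":"
): { host: string; port?: number }[] {
  return opaqueHost.split(",").map((entry) => {
    // Undo the percent-escaping, so "%3A" becomes ":" again.
    const decoded = decodeURIComponent(entry);
    const i = decoded.lastIndexOf(portDelimiter);
    if (i === -1) return { host: decoded };
    return { host: decoded.slice(0, i), port: Number(decoded.slice(i + 1)) };
  });
}

// Unescaped colons: the parser reads "42,bar:64" as a port and rejects the URL.
const unescapedUrl = tryParse("mongodb://foo:42,bar:64/db");

// Escaped colons: the whole list survives as one opaque hostname.
const escapedUrl = tryParse("mongodb://foo%3A42,bar%3A64/db");

// Alternate delimiter ("~" here): also a single opaque hostname.
const tildeUrl = tryParse("mongodb://foo~42,bar~64/db");

if (escapedUrl !== null) {
  console.log(escapedUrl.hostname, splitHostList(escapedUrl.hostname));
}
if (tildeUrl !== null) {
  console.log(tildeUrl.hostname, splitHostList(tildeUrl.hostname, "~"));
}
console.log("unescaped form parsed:", unescapedUrl !== null);
```

Either way, the URL parser only hands back one opaque string; splitting it into pairs remains the application's job.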
Damn. Is it not valid even if we consider it an opaque URI?
Unfortunately, I cannot change the format of the connection string, because it is defined by the MongoDB connection URL specification.
Is there truly no way I can leverage WebURL’s URL parsing capabilities to parse these URLs?
Karl
(👑🦆)
Well, there is mongodb-js/mongodb-connection-string-url, which is a JavaScript library. They use a regex to extract the first few components, then rewrite the URL string to use a dummy hostname. The rewritten string is parsed as a WHATWG URL, and the original hostname is stored alongside.
Since WebURL conforms to the same WHATWG URL standard, you should be able to create a wrapper which does the same, and get the same result.
I know that is perhaps not the cleanest solution, but that package is used by the official MongoDB driver for Node.js, and npm reports 1.6 million downloads/week, so it should work well.
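A rough sketch of that rewrite-and-parse approach, in TypeScript with the built-in WHATWG `URL` class. This is not the library's actual code: the regex, the `DUMMY_HOST` placeholder, and the `parseConnectionString` function are all made up for illustration, and the sketch assumes that any reserved characters in the username and password are already percent-encoded. A Swift wrapper around WebURL could presumably follow the same shape.

```typescript
// Hypothetical placeholder host; any valid hostname would do.
const DUMMY_HOST = "dummy-host.invalid";

// Illustrative pattern: scheme://, optional userinfo@, host list, then the rest.
// The host list runs up to the first "/", "?" or "#".
const CONNECTION_STRING = /^([a-z][a-z0-9+.-]*:\/\/)([^/?#@]*@)?([^/?#]*)(.*)$/i;

interface ParsedConnectionString {
  url: URL;        // parsed from the rewritten string, with the placeholder host
  hosts: string[]; // the original "host[:port]" entries, kept alongside
}

function parseConnectionString(input: string): ParsedConnectionString {
  const match = CONNECTION_STRING.exec(input);
  if (match === null) {
    throw new TypeError(`Not a connection string: ${input}`);
  }
  const [, scheme = "", userinfo = "", hostList = "", rest = ""] = match;
  if (hostList === "") {
    throw new TypeError(`No hosts in connection string: ${input}`);
  }
  // Swap the host list for a single placeholder, then let the
  // standards-compliant parser handle everything else.
  const url = new URL(`${scheme}${userinfo}${DUMMY_HOST}${rest}`);
  return { url, hosts: hostList.split(",") };
}

const parsedExample = parseConnectionString(
  "mongodb://user:pw@foo:42,bar:64/admin?replicaSet=rs0"
);
console.log(parsedExample.hosts);        // the original host list entries
console.log(parsedExample.url.username); // credentials come from the URL parser
console.log(parsedExample.url.pathname); // so do the path and the query
console.log(parsedExample.url.searchParams.get("replicaSet"));
```

The wrapper has to treat `url.hostname` as meaningless and read hosts only from `hosts`; everything else (credentials, path, query) can be delegated to the URL type.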
Unfortunately, this is one of the drawbacks of custom identifiers (including incompatible forks) - they are incompatible with existing tooling.