I actually don't think this is very desirable.
Strings are messy, and for initialisers which parse runtime strings, I think there is value in limiting the set of accepted inputs instead of being overly permissive. If I were parsing some data and the string `"0_0"` parsed as an integer, I would find that unexpected. Same with `"0_0____3___"`. Both are accepted by the compiler as integer literals, though.
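To see the difference concretely (this is current behaviour: the literals compile, while the runtime conversions return nil):

```swift
// Both compile: after the first digit, the literal grammar allows "_"
// anywhere, including consecutive and trailing underscores.
let a = 0_0          // 0
let b = 0_0____3___  // 3
print(a, b)          // 0 3

// Neither parses at runtime: Int.init?(_:) rejects underscores today.
print(Int("0_0") as Any)          // nil
print(Int("0_0____3___") as Any)  // nil
```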
Literals are different from runtime strings. The value is right there staring at you, so we can build some conveniences into the language with relatively low risk of surprising people.
I would recommend reading *The Harmful Consequences of the Robustness Principle*, an informational IETF document which argues that prioritising permissiveness has led to a gradual decline in the quality of Internet standards. To summarise:
> The robustness principle, often phrased as "be conservative in what you send, and liberal in what you accept", has long guided the design and implementation of Internet protocols. The posture this statement advocates promotes interoperability in the short term, but can negatively affect the protocol ecosystem over time.
>
> [...]
>
> Applying the principle defers the effort of dealing with interoperability problems, which prioritizes progress. However, deferral can amplify the ultimate cost of handling interoperability problems.
>
> Divergent implementations of a specification emerge over time. When variations occur in the interpretation or expression of semantic components, implementations cease to be perfectly interoperable.
>
> Implementation bugs are often identified as the cause of variation, though it is often a combination of factors. Application of a protocol to uses that were not anticipated in the original design, or ambiguities and errors in the specification are often confounding factors. Disagreements on the interpretation of specifications should be expected over the lifetime of a protocol.
>
> Even with the best intentions, the pressure to interoperate can be significant. No implementation can hope to avoid having to trade correctness for interoperability indefinitely.
>
> An implementation that reacts to variations in the manner recommended in the robustness principle sets up a feedback cycle. Over time:
>
> - Implementations progressively add logic to constrain how data is transmitted, or to permit variations in what is received.
> - Errors in implementations or confusion about semantics are permitted or ignored.
> - These errors can become entrenched, forcing other implementations to be tolerant of those errors.
Consider what would happen if Swift applications parsing integers using `Int(someString)` suddenly started accepting underscores -- they would permit new variations in data they receive (the first point). If those Swift applications gain some traction, other applications will be pressured to support underscores as well for the sake of interoperability. Even if the authors of those Swift applications never intended to support this notation, they now do, and it has spread throughout some unsuspecting ecosystem.
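None of this means applications can never have underscore-aware parsing -- it just should be an explicit, opt-in choice at the call site. A minimal sketch of what that might look like (`intAllowingUnderscores` is a hypothetical name, not a standard-library API):

```swift
/// Hypothetical helper: callers who *want* underscore grouping opt in
/// explicitly, instead of every user of Int(String) inheriting it.
/// Note: stripping "_" wholesale is even more permissive than the
/// literal grammar (it would accept "_1" or "-_1"), so a real
/// implementation would want to validate underscore placement first.
func intAllowingUnderscores(_ string: String) -> Int? {
    Int(string.filter { $0 != "_" })
}

print(intAllowingUnderscores("1_000_000") as Any)  // Optional(1000000)
```

That way the permissiveness stays contained to the callers that asked for it.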
I can point to similar situations elsewhere in computing. For example, when parsing an IPv4 address in a URL, most browsers in the 90s would defer to libc's `inet_aton` function, which seemed to be the obvious choice (much like `Int(String)` is the obvious way to parse an integer). Unfortunately, this function accepts a wide variety of inputs -- far more than was actually intended to be supported in URLs.
Decades later, those inputs still need to be supported, for fear that somebody may be depending on them. The result is that `https://0xbadf00d/` is a valid URL, and equivalent to `https://11.173.240.13/`. That adds significant implementation complexity, and some of the accepted inputs have skirted the edge of being security vulnerabilities. All to support a feature that nobody ever wanted in the first place.
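You can verify this directly from Swift, since `inet_aton` is right there in libc (a quick demonstration; on Linux, import Glibc instead of Darwin):

```swift
import Darwin

// inet_aton accepts hex, octal, and fewer-than-four-component forms,
// not just dotted-decimal quads.
var address = in_addr()
if inet_aton("0xbadf00d", &address) != 0 {
    print(String(cString: inet_ntoa(address)))  // 11.173.240.13
}
```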
So yeah - it is good to have a simple, strict parser without these kinds of surprising edge-cases.
As a secondary matter, the added implementation complexity could hurt performance. I would consider `Int(String)` to be a high-impact function, and I don't think supporting these notations is so important that everybody who uses the function should pay for it.
If you want to parse integer literals like the compiler does when interpreting Swift source code, I think the `swift-syntax` library should provide that.
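For illustration only, here is roughly what such a facility might look like for the decimal case (`parseDecimalLiteral` is a hypothetical standalone function, not an existing `swift-syntax` API; sign handling and the `0b`/`0o`/`0x` prefixes are omitted for brevity):

```swift
/// Hypothetical sketch following Swift's decimal integer-literal
/// grammar: the first character must be an ASCII digit, after which
/// digits and "_" may appear freely (including trailing underscores).
func parseDecimalLiteral(_ text: String) -> Int? {
    guard let first = text.first, ("0"..."9").contains(first) else { return nil }
    var result = 0
    for character in text where character != "_" {
        guard ("0"..."9").contains(character),
              let digit = character.wholeNumberValue else { return nil }
        // Accumulate with overflow checks instead of trapping.
        let (shifted, overflowA) = result.multipliedReportingOverflow(by: 10)
        let (next, overflowB) = shifted.addingReportingOverflow(digit)
        guard !overflowA, !overflowB else { return nil }
        result = next
    }
    return result
}

print(parseDecimalLiteral("0_0____3___") as Any)  // Optional(3)
print(parseDecimalLiteral("_3") as Any)           // nil
```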