Consistent Numeric Representation in Strings and Literals

I don’t think @taylorswift was trying to make the point that the parsing is particularly difficult—the algorithm has of course already been written. Rather, I think what folks are trying to get at is that even if we decided to implement a more permissive string-to-int conversion, it’s not clear that the right behavior for a general purpose conversion that ships in the standard library is the same as the parsing algorithm used in the Swift compiler.

The constraints and use cases for the two situations are quite different and I don’t think that an abstract desire for consistency between the two really holds up.

I find it rather curious that there exist two such divergent approaches to what is essentially the same problem, namely converting a string or a literal to an integer. At any rate, I suspect use of the standard library routine will diminish as StaticBigInt literals come to be used more often in place of numeric strings.

Yeah, there are different use-cases.

If you were parsing a URL you could (today) use Int.init?(String) to parse the port number. But you wouldn't want to accept a URL like http://localhost:8_0_80/. If you were parsing a data format, such as JSON or YAML, you wouldn't want to accept "0_0__" as a numeric value. If you were validating a form, you wouldn't want to accept "3___3" as somebody's age.
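To make the strictness concrete, here is a small sketch of how the standard library initializer treats the inputs mentioned above. The variable names are only illustrative; the behaviour shown is that of `Int.init?(_:)`, which accepts an optional sign followed by ASCII digits and nothing else.

```swift
// Int.init?(_ description: String) rejects anything other than an
// optional leading "+" or "-" followed by ASCII digits.
let plainPort = Int("8080")          // parses
let underscoredPort = Int("8_0_80")  // rejected: underscores are not digits
let dataValue = Int("0_0__")         // rejected
let age = Int("3___3")               // rejected

print(plainPort as Any, underscoredPort as Any, dataValue as Any, age as Any)
```

So the URL, data-format, and form-validation cases already get the rejection they want without any extra checks.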

As has been mentioned, there are also localised number parsers for parsing non-ASCII digits and accounting for cultural conventions (e.g. the German convention is to use a comma as the decimal separator and a period as the thousands separator).
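As a rough illustration of the localised case, Foundation's `NumberFormatter` can be configured with a locale. This is only a sketch, assuming Foundation is available and that the `de_DE` locale data is present on the platform; the variable names are mine.

```swift
import Foundation

// A formatter set up for German conventions: "," is the decimal
// separator and "." is the grouping (thousands) separator.
let germanFormatter = NumberFormatter()
germanFormatter.locale = Locale(identifier: "de_DE")
germanFormatter.numberStyle = .decimal

// number(from:) returns nil when the text does not match the format.
let parsed = germanFormatter.number(from: "1.234,56")
print(parsed?.doubleValue as Any)
```

The same text handed to `Double("1.234,56")` is rejected, since the standard library conversion is not locale-aware.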

Source code is quite a specialised use-case. Developers sometimes have to write very long magic numbers, and the usual separators we use outside of programming, such as periods, commas, and spaces, are not ideal in integer literals. So we allow underscores. And we allow any number of them because developers sometimes like to align things for better code presentation. It doesn't really make a lot of sense to build that behaviour into a general-purpose string conversion by default; none of that really applies to other domains.
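If you do want literal-style underscores for your own input, it is only a few lines to write. The following is a minimal sketch, not the compiler's algorithm: `parseUnderscoredDecimal` is a hypothetical name, it handles unsigned decimal text only, and it assumes the rule that the first character must be a digit while any later character may be a digit or an underscore.

```swift
// Hypothetical helper: parses decimal text that may contain underscores
// after the first digit, e.g. "1_000_000". Returns nil on any other
// character, on empty input, or on overflow.
func parseUnderscoredDecimal(_ text: String) -> Int? {
    let zero = UInt8(ascii: "0")
    let nine = UInt8(ascii: "9")
    let underscore = UInt8(ascii: "_")

    // The first byte must be a digit, so "_1" and "" are rejected.
    guard let first = text.utf8.first, (zero...nine).contains(first) else {
        return nil
    }

    var value = 0
    for byte in text.utf8 {
        if byte == underscore { continue }  // separators carry no value
        guard (zero...nine).contains(byte) else { return nil }
        let (scaled, overflowedMul) = value.multipliedReportingOverflow(by: 10)
        let (sum, overflowedAdd) = scaled.addingReportingOverflow(Int(byte - zero))
        if overflowedMul || overflowedAdd { return nil }
        value = sum
    }
    return value
}

print(parseUnderscoredDecimal("1_000_000") as Any)
print(parseUnderscoredDecimal("3___3") as Any)
print(parseUnderscoredDecimal("_33") as Any)
```

Keeping this as an opt-in helper means the permissive behaviour only shows up where you have asked for it.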

What this illustrates is that it's totally valid to build your own integer parser if you have a different use-case (as the compiler does). Here are some Apache-licensed implementations of mine you can extend if you like:

  • parseDecimalU16 parses a decimal UInt16 from UTF8 bytes. It is exhaustively tested because why not? The compiler basically generates optimal assembly for this.

  • IPv4Address.parse implements inet_aton in pure Swift, so it can parse hex, decimal, or octal 32-bit unsigned integers. It can easily be extended to signed integers, and wider bit-widths.
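For anyone who wants a starting point before reading those, here is a minimal sketch of a strict decimal parser over UTF8 bytes. It is not the implementation linked above: `parseDecimalUInt16Sketch` is a hypothetical name, and the approach (wrapping subtraction to classify digits, overflow-reporting arithmetic) is simply one straightforward way to do it.

```swift
// Hypothetical sketch: parses a decimal UInt16 from UTF8 bytes.
// Accepts ASCII digits only; returns nil for empty input, any
// non-digit byte (including "_", "+", "-"), or a value above 65535.
func parseDecimalUInt16Sketch<Bytes: Sequence>(_ utf8: Bytes) -> UInt16?
where Bytes.Element == UInt8 {
    var value: UInt16 = 0
    var sawDigit = false
    for byte in utf8 {
        // Wrapping subtraction maps "0"..."9" to 0...9 and every other
        // byte to something >= 10, so one comparison classifies the byte.
        let digit = byte &- UInt8(ascii: "0")
        guard digit < 10 else { return nil }
        let (scaled, overflowedMul) = value.multipliedReportingOverflow(by: 10)
        let (sum, overflowedAdd) = scaled.addingReportingOverflow(UInt16(digit))
        guard !overflowedMul, !overflowedAdd else { return nil }
        value = sum
        sawDigit = true
    }
    return sawDigit ? value : nil
}

print(parseDecimalUInt16Sketch("8080".utf8) as Any)
print(parseDecimalUInt16Sketch("8_0_80".utf8) as Any)
```

Because it takes any sequence of bytes, it can be fed a slice of a larger buffer (such as the port component of a URL) without first building a String.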

Yes, source code is my use-case and I really appreciate the integer literal separators - they really help to make code tables and extended-precision numbers more readable. Thanks for the code, I'll have a look at the parsers you've provided.

Interesting that lookup tables are used. They don't seem to see much use in other places.