There have been extensive discussions on this topic that you may be interested in reading.
If, after reviewing these links, you feel like there's a new direction or perspective to be had on the topic, then by all means please do share! In that case, it can be helpful for the community if you'd write a short synopsis of what you learn from these readings so that we're not starting back at square one.
this topic was discussed to death last year so i’m just gonna summarize the main issue with this idea, which is that this becomes allowed:
'a' + 'b' == 195
there are no good ways around this problem: if you want to use an ascii literal as an integer, then you also have to be okay with ascii literals appearing anywhere you can use an integer
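For concreteness, here is a minimal sketch that simulates the pitched behavior with a retroactive literal conformance (illustrative only; the single-quote syntax from the pitch doesn't exist, so double-quoted literals stand in for it):

// Simulating "character literals as integers". Do not ship this;
// it exists only to show what the pitch would make legal everywhere.
extension UInt8: ExpressibleByUnicodeScalarLiteral {
    public init(unicodeScalarLiteral value: Unicode.Scalar) {
        self = UInt8(value.value) // traps for scalars above 0xFF
    }
}

let sum: UInt8 = "a" + "b"               // 97 + 98 == 195
let check = ("a" as UInt8) + "b" == 195  // true: the problem from above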
the right thing to do here is to have the 'a' literals be their own ASCII type, and then somehow have a way of safely rebinding a [UInt8] buffer to an [ASCII] buffer without copying it (since the underlying memory representation is the same). but we don’t have zero-cost abstractions in swift, so this isn’t going to work.
i’m nowadays in the camp that any solution to the 'a' as UInt8 problem is probably going to have to go through newtype or something like it
"Zero-cost abstractions" usually refers to "zero runtime cost" (EDIT: although I suppose that's zero code cost and not zero data cost, because of Swift's runtime metadata), and we absolutely do have zero-cost abstractions in Swift—a struct with one member has the same layout as that member. I don't think that was the problem.
let buffer: [UInt8]
let ascii: [ASCII] = buffer.map(ASCII.init)
which involves copying the entire bytestring. ofc you can use the unsafe memory rebinding APIs to do this without the copy, but we shouldn’t expect people to have to drop down to withMemoryRebound(to:_:) to do ascii string processing
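For reference, a sketch of what that drop-down looks like with the standard buffer-pointer APIs, assuming a single-byte ASCII struct like the one shown later in the thread:

// View [UInt8] storage as ASCII values without copying. This is only
// sound because ASCII has the same layout as UInt8.
struct ASCII { var codePoint: UInt8 }

let bytes: [UInt8] = [0x61, 0x62, 0x63] // "abc"
bytes.withUnsafeBufferPointer { raw in
    raw.withMemoryRebound(to: ASCII.self) { ascii in
        for character in ascii {
            print(character.codePoint) // 97, 98, 99
        }
    }
}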
Why do you need this, anyway? It shouldn’t be a common task.
As for your proposal, consider that there are multiple ways to interpret "a" as an integer, and if Swift were going to choose one of them, it would be UTF-8.
I think the current approach is fine. Just use UInt8(ascii: "a"). If you don’t want that to fail, don’t explicitly restrict it to ASCII: UInt32("a" as Unicode.Scalar).
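For clarity, both of those spellings use existing standard-library initializers:

// UInt8(ascii:) requires its Unicode.Scalar argument to be in the
// ASCII range and traps otherwise.
let a = UInt8(ascii: "a")                    // 97

// The unlabeled UInt32 initializer accepts any Unicode.Scalar and
// returns its scalar value, so it cannot fail.
let b = UInt32("a" as Unicode.Scalar)        // 97
let snowman = UInt32("☃" as Unicode.Scalar)  // 9731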
By the way, it is really weird that that UInt32 initializer doesn't have any argument labels. You'd expect it to be init(utf8:).
That explains a lot. Still, I would expect an exception to be made here, since there are multiple initializers taking a single parameter with a type that conforms to ExpressibleByUnicodeScalarLiteral.
UInt32("a") gives you nil. yes, i know technically this is overloading on argument type, which is okay, but from the perspective of the user it’s effectively overloading on return type (init(_:) vs init?(_:)), which is just bad swift.
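For concreteness, the two overloads in play (both exist in the standard library today):

// A bare string literal defaults to String, so overload resolution
// picks the failable LosslessStringConvertible initializer:
let viaString = UInt32("a")                   // init?(_: String), nil here

// Pinning the literal to Unicode.Scalar selects the non-failable
// scalar-value initializer instead:
let viaScalar = UInt32("a" as Unicode.Scalar) // init(_: Unicode.Scalar), 97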
Personally, I find the idea of exposing conversions between characters and integers in the language just plain weird, especially with Swift's String model. It seems straightforward at a glance, but the more you think about it, the more confusing it becomes.
Instead, I use an ASCII struct (note: this is pretty rough, and I'm still trying to refine the API).
public struct ASCII: Equatable, Comparable {
    public var codePoint: UInt8

    @inlinable public init(_ v: UInt8) { self.codePoint = v }
}

public extension ASCII {
    // Control Characters.
    @inlinable static var null              : ASCII { ASCII(0x00) }
    @inlinable static var startOfHeading    : ASCII { ASCII(0x01) }
    @inlinable static var startOfText       : ASCII { ASCII(0x02) }
    @inlinable static var endOfText         : ASCII { ASCII(0x03) }
    @inlinable static var endOfTransmission : ASCII { ASCII(0x04) }
    @inlinable static var enquiry           : ASCII { ASCII(0x05) }
    @inlinable static var acknowledge       : ASCII { ASCII(0x06) }
    ...

    // Upper-case letters.
    @inlinable static var A: ASCII { ASCII(0x41) }
    @inlinable static var B: ASCII { ASCII(0x42) }
    @inlinable static var C: ASCII { ASCII(0x43) }
    @inlinable static var D: ASCII { ASCII(0x44) }
    ...
}
Additionally, I've got a bunch of heterogeneous operator overloads so you can use == between ASCII-type values, Character, and UInt8, as well as pattern-match between them, like this:
let input: String = ...
let c = input[idx] // idx is some String.Index into input
switch c {
case ASCII.questionMark:
    url.query = ""
    state = .query
case ASCII.numberSign:
    url.fragment = ""
    state = .fragment
...
}
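The overloads themselves aren't shown above, so here is a minimal sketch of the shape they might take (names and signatures are my assumption, not necessarily the prototype's actual API):

// Hypothetical heterogeneous equality and pattern matching for ASCII.
// Character.asciiValue is nil for non-ASCII characters, so these
// comparisons fail safely rather than trapping.
extension ASCII {
    static func == (lhs: ASCII, rhs: Character) -> Bool { rhs.asciiValue == lhs.codePoint }
    static func == (lhs: Character, rhs: ASCII) -> Bool { lhs.asciiValue == rhs.codePoint }
    static func == (lhs: ASCII, rhs: UInt8) -> Bool { lhs.codePoint == rhs }
    static func == (lhs: UInt8, rhs: ASCII) -> Bool { lhs == rhs.codePoint }

    // ~= is what `case ASCII.questionMark:` invokes when switching
    // over a Character value.
    static func ~= (pattern: ASCII, value: Character) -> Bool {
        value.asciiValue == pattern.codePoint
    }
}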
Like I said, it's still kind of rough, but I've found it pretty okay so far. You can try it out and see what you think: ascii prototype · GitHub
the issue for me, which i brought up the last time this got slugged out, is that i can just never remember the full names of all the special characters
i can’t find the post, but i think this got ruled out pretty early on last time because it’s a very band-aidy solution and just exacerbates existing issues with over-overloaded operators
i just don’t know what even justifies the existence of this method when you can just use Unicode.Scalar.value. it’s not like it’s that much shorter, and it’s way less clear what’s going on if you don’t include the name Unicode.Scalar:
.init("a") as UInt32
("a" as Unicode.Scalar).value
Oh, yeah - that happens to me quite often as well. I tried to be as objective as possible, so I took the list of names from somewhere official-looking. For example, I grew up calling # "hash" or "pound", but that list calls it "number sign" so that's what I use in that type, even though the British names are more familiar to me.
As for the operators - it works for me. Certainly the Character operators make sense IMO, but I could see some people taking issue with comparing an integer directly with an ASCII character.
i really don’t like == with different types on the left and right hand sides. first off, it kind of contradicts the whole meaning of ==; second, it basically doubles the number of overloads you have to provide, since it has to work in both orders
Does it? I'm not sure. That's certainly a matter for debate.
Besides, at some point you need to balance pedantry with usability. I certainly prefer doing this over the alternative of making character literals assignable to integer-type values!
Sure, but you do that once when you implement the type, and never again. I think it is worth paying the boilerplate cost for a more natural interface (assuming the type-checker can handle it).
The longer reply to this is that, as we all know, literals + operators (see what I did there?) are a major bottleneck for type checking. Essentially any new operator overload that takes integer literals will throw some existing expressions that are just under the "too complex" limit over that limit. This is very much not a nice thing to do to users who write manifestly well-formed code; nor is the increase in compilation time. Until there is a dramatic reworking of this area of the compiler, vending additional overloads in the standard library is a big no-go.
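A contrived illustration of the failure mode (whether any particular expression crosses the limit depends on the compiler version and the overloads in scope):

// Every operand is a literal, so the type checker must consider each
// possible literal type for every term across every overload of + and *.
// Each additional operator overload that accepts integer literals
// multiplies the combinations to search, which is what pushes borderline
// expressions like this over the "too complex" threshold.
let d: Double = 1 + 2 * 3 + 4.0 * 5 + 6 * 7 + 8.0 * 9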