Using macros to convert constant single character string to ASCII UInt8 at compile time?

Over the years there have been several threads about adding a terse syntax to Swift for quoting constant ASCII characters in a form that is easy to store as UInt8. The decision appears to have been not to give up single quotes for that purpose. Most code paths from String to 8-bit values are either verbose or slow.

Is there anything that could be done with the new macro facilities to offer an efficient and terse way to do this conversion? Ideally at compile time?

Maybe make a #characterLiteral(_:) macro?

let a = #characterLiteral("a") // turns into ASCII character, or compiler error if it isn't.

I made a sample implementation.

// declaration:
@freestanding(expression)
public macro characterLiteral(_ scalar: Unicode.Scalar) -> Int8 = #externalMacro(module: "CMacrosMacros", type: "CharacterLiteralMacro")

// implementation:
import SwiftSyntax
import SwiftSyntaxBuilder
import SwiftSyntaxMacros

public struct CharacterLiteralMacro: ExpressionMacro {
    enum Error: Swift.Error {
        case didNotPassStaticStringLiteral
        case didNotPassUnicodeScalar
        case didNotPassASCIIScalar
    }
    
    public static func expansion<Node, Context>(of node: Node, in context: Context) throws -> ExprSyntax where Node : FreestandingMacroExpansionSyntax, Context : MacroExpansionContext {
        guard let literal = node.argumentList.first?.expression.as(StringLiteralExprSyntax.self),
              case let .stringSegment(segment) = literal.segments.first,
              case let .stringSegment(text) = segment.content.tokenKind else {
            throw Error.didNotPassStaticStringLiteral
        }
        
        guard let scalar = UnicodeScalar(text) else {
            throw Error.didNotPassUnicodeScalar
        }
        
        guard let ascii = UInt8(exactly: scalar.value) else {
            throw Error.didNotPassASCIIScalar
        }
        
        return ExprSyntax(IntegerLiteralExprSyntax(integerLiteral: Int(ascii)))
    }
}

Each use of the macro is then replaced by an integer literal at compile time.

let string: [Int8] = [
  #characterLiteral("h"),
  #characterLiteral("e"),
  #characterLiteral("l"),
  #characterLiteral("l"),
  #characterLiteral("o"),
]

// expands to:
let string: [Int8] = [
    104,
    101,
    108,
    108,
    111,
]

That's exactly what I was hoping would be possible. Thank you!

Awesome. I guess it would be even more helpful if it operated on strings with arbitrary length though… if you don't beat me to it, I'll have a go at that.

Just to clarify, is UInt8(ascii: "x") not sufficient for your use-case?
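
For readers who have not used it, here is a minimal sketch of UInt8(ascii:) from the standard library. The isVowel helper is a hypothetical name, included only to show the initialiser used in switch patterns.

```swift
// UInt8(ascii:) takes a Unicode.Scalar and traps at run time if it is not ASCII.
let lowercaseA = UInt8(ascii: "a")   // 97
let newline = UInt8(ascii: "\n")     // 10

// Hypothetical helper: the initialiser can appear directly in case patterns.
func isVowel(_ byte: UInt8) -> Bool {
    switch byte {
    case UInt8(ascii: "a"), UInt8(ascii: "e"), UInt8(ascii: "i"),
         UInt8(ascii: "o"), UInt8(ascii: "u"):
        return true
    default:
        return false
    }
}
```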


I don't know about the OP, but my personal use case would be to speed up

let string = "Hello, world!"
let uint8: [UInt8] = string.utf8.map { $0 }

by shifting this to the compilation phase.
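
For comparison, a small sketch of the run-time conversion being discussed. Array(string.utf8) is the more common spelling of the same operation; both still run when the program executes, which is the cost a macro would remove. The names viaMap and viaInit are only illustrative.

```swift
let string = "Hello, world!"

// The spelling from the post: copy each UTF-8 code unit into an array.
let viaMap: [UInt8] = string.utf8.map { $0 }

// Equivalent and slightly shorter: initialise the array from the UTF-8 view.
let viaInit = Array(string.utf8)
```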


Yeah, sorry, my reply was mostly to those upthread who were writing what appear to be complex replacements for that function. Mapping an entire string to an Array is definitely something a macro can do.
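
A rough sketch of the core logic such a whole-string macro might run on the literal's text, written as plain Swift so it does not depend on SwiftSyntax. The names ASCIIError, asciiBytes(of:) and arrayLiteralSource(for:) are hypothetical; a real macro would build syntax nodes rather than a source string.

```swift
// Hypothetical error type reporting the first offending scalar and its position.
enum ASCIIError: Error, Equatable {
    case nonASCII(scalar: Unicode.Scalar, offset: Int)
}

// Converts text to bytes, throwing on the first non-ASCII scalar.
func asciiBytes(of text: String) throws -> [UInt8] {
    var bytes: [UInt8] = []
    for (offset, scalar) in text.unicodeScalars.enumerated() {
        guard scalar.isASCII else {
            throw ASCIIError.nonASCII(scalar: scalar, offset: offset)
        }
        bytes.append(UInt8(scalar.value))
    }
    return bytes
}

// Renders the bytes as the source text of an array literal,
// which is roughly what the expansion would contain.
func arrayLiteralSource(for bytes: [UInt8]) -> String {
    "[" + bytes.map { String($0) }.joined(separator: ", ") + "]"
}
```

Inside a macro, the thrown error would surface as a compiler diagnostic at the call site instead of a run-time failure.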

The benefit to using a macro is that you'd get compile-time validation that the given string actually is ASCII. That means it will be checked earlier, and even if it exists on a path which doesn't happen to be exercised by the author's tests.

That said, the macro should use UInt8(ascii:) internally. (EDIT: No it shouldn't. I forgot that that initialiser traps on failure instead of being optional. In any case, it should check for ASCII).

UInt8(exactly: scalar.value) is not sufficient:

UInt8(exactly: Unicode.Scalar("«").value)  // 171, or 0xAB
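
To make the difference concrete, a short sketch contrasting the "fits in a byte" test with an actual ASCII test. Unicode.Scalar.isASCII is in the standard library; asciiValue(of:) is a hypothetical helper name.

```swift
let guillemet: Unicode.Scalar = "\u{AB}"   // «, which is Latin-1 but not ASCII

// Fits in 8 bits, so UInt8(exactly:) succeeds.
let fitsInByte = UInt8(exactly: guillemet.value) != nil

// isASCII checks that the value is at most 127, so it rejects this scalar.
let isASCII = guillemet.isASCII

// Hypothetical optional-returning alternative to the trapping UInt8(ascii:).
func asciiValue(of scalar: Unicode.Scalar) -> UInt8? {
    scalar.isASCII ? UInt8(scalar.value) : nil
}
```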

Sure, I don't dispute that. I'm not sure how useful that is though: it only works for compile-time constants, and there just aren't that many compile-time ASCII constants (128). It should be pretty apparent at compile time whether that is going to work or not. But yes, it's definitely true that a macro gives compiler errors that the other function does not. On the other hand, it also has to invoke an entire extra binary to do it, slowing down compile times.

I could see it being more useful for strings rather than individual characters.

But yes, the utility is somewhat limited, especially if it does only support single characters. And macros do have a significant compile-time cost, that is also true.

I am writing a parser which only accepts ASCII and it needs to mark some characters by setting the high bit. That puts Unicode right out.

I have code that does this, but it converts at run time and it really bloats up what should be some simple switch statements. I was looking for a compile time solution if such a thing was possible.
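
A guess at what that kind of parser code could look like, assuming the mark is bit 7 (0x80), which ASCII never uses. Every name here (markBit, marked, classify, Token) is hypothetical and not taken from the poster's code.

```swift
// Assumption: ASCII only occupies the low 7 bits, so bit 7 is free for a mark.
let markBit: UInt8 = 0x80

func marked(_ byte: UInt8) -> UInt8 { byte | markBit }
func isMarked(_ byte: UInt8) -> Bool { byte & markBit != 0 }
func unmarked(_ byte: UInt8) -> UInt8 { byte & ~markBit }

enum Token { case open, close, markedOpen, other }

// The switch stays compact when each case is a single byte expression.
func classify(_ byte: UInt8) -> Token {
    switch byte {
    case UInt8(ascii: "("): return .open
    case UInt8(ascii: ")"): return .close
    case UInt8(ascii: "(") | markBit: return .markedOpen
    default: return .other
    }
}
```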

To be clear, when compiled with optimizations turned on (-O or -c release) UInt8(ascii:) on a static string optimizes down directly to the integer value.


I've shared a basic implementation of a #utf8 macro in the pitch thread. It would be quite helpful in embedded environments where String with Unicode tables is prohibitively expensive.

You can add this statement to the macro body to make sure it's ASCII.

guard scalar.utf8.count == 1 else {
  throw Error.didNotPassASCIIScalar
}

That would be a different macro though, #ascii maybe? I leave that as an exercise to the reader; it feels optional to me, as ASCII is a subset of UTF-8 anyway.

It is a subset, but it is an important subset. Knowing that you're dealing with ASCII strings can simplify some textual operations.

Speaking of which, if there were a #utf8 macro, it would be great if we could ensure the text is normalised (NFC, NFD, etc). Having the text in a known normalisation form can also make some processing easier.

Also, just daydreaming about how much "useful" functionality could be added to this...

  • It would be nice if the macro could create any ExpressibleByArrayLiteral type, not just Array.
  • For #utf8, I'd also create a variant which produces CChars in case you're doing C interop on a system with signed char.
  • For #ascii, I'd add a variant which produces wchar_t (hello, Windows users). It can be pretty annoying to create them otherwise.
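
For reference, a sketch of the run-time equivalents of the last two variants. CChar(truncatingIfNeeded:) is used so the code compiles whether CChar is signed or unsigned on the target, and UTF-16 code units stand in for wide characters on the assumption that the receiving API expects UTF-16, as Windows wide-string APIs do.

```swift
let text = "hi"

// CChar variant: reinterpret each UTF-8 code unit as the platform's char type.
let cChars: [CChar] = text.utf8.map { CChar(truncatingIfNeeded: $0) }

// The standard library also offers a NUL-terminated version.
let cString: [CChar] = Array(text.utf8CString)

// Wide variant (assumption: UTF-16 code units), with an explicit terminator.
let wide: [UInt16] = Array(text.utf16) + [0]
```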