How to create an enum-like struct macro?

the type:

@frozen public
struct Macrolanguage:Equatable, Hashable, Sendable
{
    public
    var rawValue:UInt16

    @inlinable public
    init(rawValue:UInt16)
    {
        self.rawValue = rawValue
    }
}

the goal:

extension Macrolanguage
{
    @inlinable public static
    var en:Self { .init(rawValue: 0x656E) }

    @inlinable public static
    var es:Self { .init(rawValue: 0x6573) }

    ...
}
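For reference, each raw value in the goal above is just the two ASCII bytes of the name packed with the first character in the high byte ('e' is 0x65 and 'n' is 0x6E, so "en" becomes 0x656E). Below is a minimal sketch of that computation and of the source text a macro would be expected to emit for one name. The helper names `packASCII` and `constructorSource` are hypothetical, and this is plain Swift rather than the actual macro implementation, which is not shown in this thread.

```swift
/// Packs a two-character ASCII string into a UInt16, first character in the
/// high byte. Returns nil if the input is not exactly two ASCII bytes.
/// (Hypothetical helper, for illustration only.)
func packASCII(_ name: String) -> UInt16? {
    let bytes = Array(name.utf8)
    guard bytes.count == 2, bytes.allSatisfy({ $0 < 0x80 }) else {
        return nil
    }
    return UInt16(bytes[0]) << 8 | UInt16(bytes[1])
}

/// Renders the declaration one would expect to be generated for `name`,
/// in the same shape as the hand-written accessors in the goal.
/// (Hypothetical helper, for illustration only.)
func constructorSource(for name: String) -> String? {
    guard let code = packASCII(name) else { return nil }
    let hex = String(code, radix: 16, uppercase: true)
    return """
    @inlinable public static
    var \(name):Self { .init(rawValue: 0x\(hex)) }
    """
}
```

Calling `constructorSource(for: "en")` yields the text of the first accessor in the goal; a real macro would build the equivalent syntax nodes instead of a string.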

what i have so far:

public
protocol RawRepresentableEnumeration<RawValue>:RawRepresentable
{
    init(rawValue:RawValue)
}

@freestanding(declaration, names: arbitrary)
public
macro constructors<Self, RawValue>(_ strings:String...) = #externalMacro(
    module: "InlineValueMacros",
    type: "InlineASCII.ConstructorMacro")
    where   Self:RawRepresentableEnumeration,
            Self.RawValue == RawValue,
            RawValue:FixedWidthInteger,
            RawValue:UnsignedInteger

extension Macrolanguage:RawRepresentableEnumeration
{
    #constructors<Self, UInt16>("en", "es")
}

the problem:

error: circular reference expanding freestanding macro 'constructors'
    #constructors<Self, UInt16>("en")
    ^
note: through reference here
struct Macrolanguage:RawRepresentableEnumeration, Equatable, Hashable, Sendable
       ^
note: through reference here
struct Macrolanguage:RawRepresentableEnumeration, Equatable, Hashable, Sendable
       ^
note: through reference here
struct Macrolanguage:RawRepresentableEnumeration, Equatable, Hashable, Sendable

what am i doing wrong?

Would you mind sharing InlineASCII.ConstructorMacro implementation?

i actually ended up giving up on @freestanding(declaration) macros entirely and switched to an @attached(extension) macro instead.

one thing i really don't like about this is how the list of member names has to appear inline in the attached macro expression.

@RawRepresentableByIntegerEncoding(
    "aa",
    "ab",
    "ae",
    "af",
    "ak",
    "am",
    "an",
    "ar",
    "as",
    "av",
    "ay",
    "az",
    "ba",
    "be",
    "bg",
    "bi",

With an attached macro you have the option of putting the list in a member with a well-known name or attribute of its own, if you think that’s better.

is it required to be in the "main" lexical block; can the list live in an extension?

I’m thinking of something like this:

@SynthesizeRawValues
extension LangCode {
  private static let allRawValuesForMacro: [String] = ["en", "es", …]
}

It’s strictly more work for the compiler, but it could be enough of an improvement for a human reader that you prefer it.

i thought the macro attribute had to be applied to the primary declaration block? at least that's what SE-0402 seems to say.

Extension macros do, but you don’t need an extension macro here, just a member macro. Though I could still be missing something; I haven’t worked with macros much.

it needs to be an extension macro, because it requires a conformance to RawRepresentableByIntegerEncoding to be present, otherwise there is no way of knowing if init(rawValue:) is available.

public
protocol RawRepresentableByIntegerEncoding<RawValue>:RawRepresentable
    where RawValue:ExpressibleByIntegerLiteral
{
    init(rawValue:RawValue)
}

@attached(extension, names: arbitrary, conformances: RawRepresentableByIntegerEncoding)
public
macro RawRepresentableByIntegerEncoding(_ strings:String...) = #externalMacro(
    module: "InlineValueMacros",
    type: "InlineASCII.ConstructorMacro")

I don’t think it’s so bad in this case if you generate declarations that then fail to compile. If you restrict it to being an extension macro so you can produce a dedicated error message, you not only require the original declaration, but you also require the declaration of RawRepresentable to be on the original declaration. TIL!

i was actually (pleasantly) surprised to find that extension macros are able to look up type-checked conformances, so the required conformance can live anywhere.

...
    "wa",
    "wo",
    "xh",
    "yi",
    "yo",
    "za",
    "zh",
    "zu")
@frozen public
struct Macrolanguage:Equatable, Hashable, Sendable
{
    public
    var rawValue:UInt16

    @inlinable public
    init(rawValue:UInt16)
    {
        self.rawValue = rawValue
    }
}
extension Macrolanguage:RawRepresentableByIntegerEncoding
{
}

if only the attribute itself could also live on an extension block…

Stepping back, I assume you don't want to have the ordinary String-backed enum:

enum Macrolanguage: String {
    case en
    case es
    // ...
    var code: UInt16 {
        // get uint16 from rawValue 
    }
}

e.g. because of performance? But is it actually slow, or did you just assume it is? Short strings are stored inline with no heap object, so the string to four char code conversion could be really quick. E.g. this code:

    var code: UInt16 {
        let v = rawValue.utf8
        let i = v.startIndex
        let j = v.index(after: v.startIndex)
        return UInt16(v[i]) << 8 + UInt16(v[j])
    }

takes fewer than 20 CPU instructions to execute start to finish and doesn't involve memory allocations / releases, retain count business or locks.


As another alternative, is it possible to make a macro that converts a string to its fourCharCode integer representation?

enum Macrolanguage: UInt16 {
    case en = #fourCharCode("en")
    case es = #fourCharCode("es")
    // ...
}

Personally I tend to avoid putting things like these into an enum or an enum-like struct ("will I ever have a need to use .zh or .zu literals in code? perhaps not"). But maybe it's justified in your case.

no, because the entire 26² codepage is technically assignable (like country codes), and i’d like the type to be able to consistently represent unknown cases.

Can you expand on that? Now you have:

@RawRepresentableByIntegerEncoding(
    "aa",
    "ab",
    "ae",
    ...

How different is that compared to:

enum Macrolanguage: String {
    case aa, ab, ae, ...
}

you can parse one dynamically (for example, from an Accept-Language header), and get a valid Macrolanguage value, but one that doesn’t match any of the pre-defined cases.
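To make that difference concrete, here is a minimal sketch of the struct approach with a parsing initializer. The `init?(_:)` initializer and its "two lowercase ASCII letters" rule are assumptions made for illustration; they are not part of the type shown earlier in the thread.

```swift
struct Macrolanguage: Equatable, Hashable, Sendable {
    var rawValue: UInt16

    init(rawValue: UInt16) {
        self.rawValue = rawValue
    }

    /// Hypothetical parsing initializer: accepts exactly two lowercase
    /// ASCII letters and packs them with the first letter in the high byte.
    init?(_ string: String) {
        let bytes = Array(string.utf8)
        guard bytes.count == 2,
              bytes.allSatisfy({ (0x61 ... 0x7A).contains($0) })
        else {
            return nil
        }
        self.init(rawValue: UInt16(bytes[0]) << 8 | UInt16(bytes[1]))
    }

    // Two of the predefined values, written out by hand.
    static var en: Self { .init(rawValue: 0x656E) }
    static var es: Self { .init(rawValue: 0x6573) }
}

// A predefined code parses to the same value as the static member.
let known = Macrolanguage("en")
// A well-formed code with no predefined member still parses to a valid value.
let unassigned = Macrolanguage("qq")
// Malformed input is rejected.
let malformed = Macrolanguage("english")
```

With a String-backed enum, `Macrolanguage(rawValue: "qq")` would instead return nil, because there is no matching case.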

Hmm. How about:

enum Macrolanguage {
    case known(KnownLanguage)
    case unknown(String)
}

enum KnownLanguage: String {
    case en
    case es
    case ab
    ...
}

what happens if someone tries to create an unknown("en")?

Good catch. Hide it so they can't?

struct Macrolanguage {
    var storage: Storage
    
    enum Storage {
        case known(KnownLanguage)
        case unknown(String)
    }
    
    init(_ rawValue: String) {
        if let v = KnownLanguage(rawValue: rawValue) {
            storage = .known(v)
        } else {
            storage = .unknown(rawValue)
        }
    }
}

at that point, why not just use the layout (UInt16 ←→ (UInt8, UInt8)) that matches the actual data being modeled?
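A minimal sketch of what that layout looks like in practice, assuming the same packing as the constants earlier in the thread (first character in the high byte). The `bytes` and `description` members are hypothetical additions for illustration.

```swift
struct Macrolanguage: Equatable, Hashable, Sendable, CustomStringConvertible {
    var rawValue: UInt16

    /// The two ASCII bytes of the code, high byte first.
    /// (Hypothetical member, for illustration only.)
    var bytes: (UInt8, UInt8) {
        (UInt8(rawValue >> 8), UInt8(rawValue & 0xFF))
    }

    /// Converts the stored bytes back to a two-character string.
    var description: String {
        String(decoding: [bytes.0, bytes.1], as: UTF8.self)
    }
}

// 0x7A is 'z' and 0x75 is 'u'.
let zu = Macrolanguage(rawValue: 0x7A75)
```

Known and unknown codes share one representation here, so there is no second way to spell "en", and the whole value occupies the two bytes of its `UInt16`.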

That's my concern as well... Do you need to literally write: Macrolanguage.zu in your app? In what cases? Maybe you just need to know if the language is "known" or "unknown"?