On the (uneasy) coexistence of String, [UInt8], and ByteBuffer

taylorswift · July 12, 2022, 7:36pm

i have some hex encoding code that’s growing (in my opinion) an absurd amount of boilerplate.

public 
enum Base16 
{
    @inlinable public static 
    func encodeBigEndian<Words>(_ words:Words, as _:String.Type = String.self, 
        by ascii:(UInt8) throws -> UInt8) rethrows -> String
    {
        try .init(unsafeUninitializedCapacity: 2 * MemoryLayout<Words>.size)
        {
            var utf8:UnsafeMutableBufferPointer<UInt8> = $0
            try Self.encodeBigEndian(words, utf8: &utf8, by: ascii)
            return $0.count
        }
    }

    @inlinable public static 
    func encodeBigEndian<Words>(lowercasing words:Words, as _:String.Type = String.self) 
        -> String
    {
        Self.encodeBigEndian(words, by: Self.ascii(lowercasing:))
    }
    @inlinable public static 
    func encodeBigEndian<Words>(uppercasing words:Words, as _:String.Type = String.self) 
        -> String
    {
        Self.encodeBigEndian(words, by: Self.ascii(uppercasing:))
    }

    @inlinable public static 
    func encodeBigEndian<Words>(lowercasing words:Words, as _:[UInt8].Type = [UInt8].self) 
        -> [UInt8]
    {
        Self.encodeBigEndian(words, by: Self.ascii(lowercasing:))
    }
    @inlinable public static 
    func encodeBigEndian<Words>(uppercasing words:Words, as _:[UInt8].Type = [UInt8].self) 
        -> [UInt8]
    {
        Self.encodeBigEndian(words, by: Self.ascii(uppercasing:))
    }

    @inlinable public static 
    func encodeBigEndian<Words>(_ words:Words, as _:[UInt8].Type = [UInt8].self, 
        by ascii:(UInt8) throws -> UInt8) rethrows -> [UInt8]
    {
        try .init(unsafeUninitializedCapacity: 2 * MemoryLayout<Words>.size)
        {
            try Self.encodeBigEndian(words, utf8: &$0, by: ascii)
            $1 = $0.count
        }
    }

    @inlinable public static 
    func encodeBigEndian<UTF8, Words>(lowercasing words:Words, utf8:inout UTF8)
        where UTF8:MutableCollection, UTF8.Element == UInt8
    {
        Self.encodeBigEndian(words, utf8: &utf8, by: Self.ascii(lowercasing:))
    }
    @inlinable public static 
    func encodeBigEndian<UTF8, Words>(uppercasing words:Words, utf8:inout UTF8)
        where UTF8:MutableCollection, UTF8.Element == UInt8
    {
        Self.encodeBigEndian(words, utf8: &utf8, by: Self.ascii(uppercasing:))
    }

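    /// Core routine: writes two ASCII hex digits per byte of `words` into `utf8`, 
    /// visiting the bytes of `words` in memory order. `utf8` must already hold 
    /// exactly `2 * MemoryLayout<Words>.size` elements.
    @inlinable public static 
    func encodeBigEndian<UTF8, Words>(_ words:Words, utf8:inout UTF8, 
        by ascii:(UInt8) throws -> UInt8) rethrows
        where UTF8:MutableCollection, UTF8.Element == UInt8
    {
        try withUnsafeBytes(of: words)
        {
            assert(utf8.count == $0.count * 2)
            
            var offset:UTF8.Index = utf8.startIndex
            for byte:UInt8 in $0 
            {
                // high nibble first, then low nibble
                utf8[offset] = try ascii(byte >> 4)
                utf8.formIndex(after: &offset)
                utf8[offset] = try ascii(byte & 0x0f)
                utf8.formIndex(after: &offset)
            }
        }
    }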
}

any tips on how to cut this down to size?

lukasa · July 14, 2022, 11:14am

It's a bit tricky to know which of this is boilerplate and which of it is API surface, because all of this is public. Which of these entry points do you consider API and which are boilerplate?

taylorswift · July 14, 2022, 6:57pm

they are all API surface. what bugs me is that they all do essentially the same thing: performantly emit UTF-8 data (whose length is known beforehand) as some collection type that can store UTF-8 text, be it String, [UInt8], ByteBuffer, etc.

lukasa · July 16, 2022, 1:25pm

Well, the ByteBuffer case can follow the [UInt8] case: ByteBuffer itself is not a Collection, but ByteBufferView is, and you can freely convert between the two.

More broadly, however, you don't have much of a way to cut this down further. There's no protocol for "things that can be initialized directly via raw storage", so you can't write code generic over that. You could define such a protocol yourself and conform String and [UInt8] to it to slightly trim things down.

scanon · July 17, 2022, 8:16pm

This is what I would do, FWIW.
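
For illustration, here is a minimal sketch of the protocol approach lukasa suggests above. It is not code posted by anyone in the thread. The names `UTF8Initializable`, `init(utf8Count:fillingWith:)`, `Hex`, and `lowercaseDigit` are hypothetical, and the sketch assumes a toolchain and deployment target where `String.init(unsafeUninitializedCapacity:initializingUTF8With:)` is available (Swift 5.3 or later; macOS 11 or later on Apple platforms).

```swift
/// Hypothetical protocol (not in the standard library): a type that can be
/// built by filling a buffer of exactly `count` UTF-8 code units.
protocol UTF8Initializable {
    init(utf8Count count: Int,
         fillingWith fill: (UnsafeMutableBufferPointer<UInt8>) throws -> Void) rethrows
}

extension String: UTF8Initializable {
    init(utf8Count count: Int,
         fillingWith fill: (UnsafeMutableBufferPointer<UInt8>) throws -> Void) rethrows {
        try self.init(unsafeUninitializedCapacity: count) { buffer in
            // Hand the caller exactly `count` code units, then report them as initialized.
            try fill(UnsafeMutableBufferPointer(rebasing: buffer[..<count]))
            return count
        }
    }
}

extension Array: UTF8Initializable where Element == UInt8 {
    init(utf8Count count: Int,
         fillingWith fill: (UnsafeMutableBufferPointer<UInt8>) throws -> Void) rethrows {
        try self.init(unsafeUninitializedCapacity: count) { buffer, initializedCount in
            try fill(UnsafeMutableBufferPointer(rebasing: buffer[..<count]))
            initializedCount = count
        }
    }
}

/// Hypothetical stand-in for the thread's `Base16` namespace.
enum Hex {
    /// Assumed helper: maps a value in 0...15 to a lowercase ASCII hex digit.
    static func lowercaseDigit(_ nibble: UInt8) -> UInt8 {
        nibble < 10 ? 0x30 + nibble : 0x61 + nibble - 10
    }

    /// One generic entry point replaces the per-output-type overloads.
    /// Bytes of `words` are visited in memory order, as in the original post.
    static func encode<Words, Output: UTF8Initializable>(
        _ words: Words, as _: Output.Type = Output.self
    ) -> Output {
        Output(utf8Count: 2 * MemoryLayout<Words>.size) { utf8 in
            withUnsafeBytes(of: words) { bytes in
                for (i, byte) in bytes.enumerated() {
                    utf8[2 * i] = Hex.lowercaseDigit(byte >> 4)
                    utf8[2 * i + 1] = Hex.lowercaseDigit(byte & 0x0f)
                }
            }
        }
    }
}

// The output type is picked by the surrounding type context.
let text: String = Hex.encode(UInt16(0xabcd).bigEndian)
let bytes: [UInt8] = Hex.encode(UInt16(0xabcd).bigEndian)
```

With this shape, each variant (lowercasing, uppercasing, custom digit function) needs one generic entry point rather than one per output type. ByteBuffer is not covered by the sketch; per the ByteBufferView remark above, it could reuse the existing MutableCollection-based overload.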