On the (uneasy) coexistence of String, [UInt8], and ByteBuffer

i have some hex-encoding code that’s accumulating what is (in my opinion) an absurd amount of boilerplate.

public 
enum Base16 
{
    @inlinable public static 
    func encodeBigEndian<Words>(_ words:Words, as _:String.Type = String.self, 
        by ascii:(UInt8) throws -> UInt8) rethrows -> String
    {
        try .init(unsafeUninitializedCapacity: 2 * MemoryLayout<Words>.size)
        {
            var utf8:UnsafeMutableBufferPointer<UInt8> = $0
            try Self.encodeBigEndian(words, utf8: &utf8, by: ascii)
            return $0.count
        }
    }
    @inlinable public static 
    func encodeBigEndian<Words>(lowercasing words:Words, as _:String.Type = String.self) 
        -> String
    {
        Self.encodeBigEndian(words, by: Self.ascii(lowercasing:))
    }
    @inlinable public static 
    func encodeBigEndian<Words>(uppercasing words:Words, as _:String.Type = String.self) 
        -> String
    {
        Self.encodeBigEndian(words, by: Self.ascii(uppercasing:))
    }
    @inlinable public static 
    func encodeBigEndian<Words>(lowercasing words:Words, as _:[UInt8].Type = [UInt8].self) 
        -> [UInt8]
    {
        Self.encodeBigEndian(words, by: Self.ascii(lowercasing:))
    }
    @inlinable public static 
    func encodeBigEndian<Words>(uppercasing words:Words, as _:[UInt8].Type = [UInt8].self) 
        -> [UInt8]
    {
        Self.encodeBigEndian(words, by: Self.ascii(uppercasing:))
    }
    @inlinable public static 
    func encodeBigEndian<Words>(_ words:Words, as _:[UInt8].Type = [UInt8].self, 
        by ascii:(UInt8) throws -> UInt8) rethrows -> [UInt8]
    {
        try .init(unsafeUninitializedCapacity: 2 * MemoryLayout<Words>.size)
        {
            try Self.encodeBigEndian(words, utf8: &$0, by: ascii)
            $1 = $0.count
        }
    }
    @inlinable public static 
    func encodeBigEndian<UTF8, Words>(lowercasing words:Words, utf8:inout UTF8)
        where UTF8:MutableCollection, UTF8.Element == UInt8
    {
        Self.encodeBigEndian(words, utf8: &utf8, by: Self.ascii(lowercasing:))
    }
    @inlinable public static 
    func encodeBigEndian<UTF8, Words>(uppercasing words:Words, utf8:inout UTF8)
        where UTF8:MutableCollection, UTF8.Element == UInt8
    {
        Self.encodeBigEndian(words, utf8: &utf8, by: Self.ascii(uppercasing:))
    }
    @inlinable public static 
    func encodeBigEndian<UTF8, Words>(_ words:Words, utf8:inout UTF8, 
        by ascii:(UInt8) throws -> UInt8) rethrows
        where UTF8:MutableCollection, UTF8.Element == UInt8
    {
        try withUnsafeBytes(of: words)
        {
            assert(utf8.count == $0.count * 2)
            
            var offset:UTF8.Index = utf8.startIndex
            for byte:UInt8 in $0 
            {
                utf8[offset] = try ascii(byte >> 4)
                utf8.formIndex(after: &offset)
                utf8[offset] = try ascii(byte & 0x0f)
                utf8.formIndex(after: &offset)
            }
        }
    }
}

any tips on how to cut this down to size?

It's a bit tricky to know which of this is boilerplate and which of it is API surface, because all of this is public. Which of these entry points do you consider API and which are boilerplate?

they are all API surface. what bugs me is that they all do essentially the same thing: efficiently emit UTF-8 data (whose length is known beforehand) as some collection type that can store UTF-8 text, be it String, [UInt8], ByteBuffer, etc.

Well, the ByteBuffer case can follow the [UInt8] case: ByteBuffer itself is not a Collection, but ByteBufferView is, and you can convert freely between the two.

More broadly, however, you don't have much of a way to cut this down further. There's no protocol for "things that can be initialized directly via raw storage", so you can't write code generic over that. You could define such a protocol yourself and conform String and [UInt8] to it to slightly trim things down.
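A minimal sketch of that idea, under some assumptions: the protocol name `UTF8StorageInitializable`, its initializer label, and the `HexSketch` namespace with its `lowercase(_:)` helper are all made up for illustration, and none of them exist in the standard library. The two conformances just forward to the existing unsafe-uninitialized-capacity initializers on String and Array, so one generic function can replace the paired String and [UInt8] overloads.

```swift
// Hypothetical protocol (not part of the standard library): a type that can be
// built by writing UTF-8 code units straight into uninitialized storage.
protocol UTF8StorageInitializable {
    init(unsafeUninitializedUTF8Capacity capacity: Int,
        initializingWith initializer: (UnsafeMutableBufferPointer<UInt8>) throws -> Int) rethrows
}

// Forwards to String's own initializer. On Apple platforms that initializer
// needs a recent OS (macOS 11 / iOS 14 or later).
extension String: UTF8StorageInitializable {
    init(unsafeUninitializedUTF8Capacity capacity: Int,
        initializingWith initializer: (UnsafeMutableBufferPointer<UInt8>) throws -> Int) rethrows {
        try self.init(unsafeUninitializedCapacity: capacity) { try initializer($0) }
    }
}

// Forwards to Array's initializer, which reports the count via an inout parameter.
extension Array: UTF8StorageInitializable where Element == UInt8 {
    init(unsafeUninitializedUTF8Capacity capacity: Int,
        initializingWith initializer: (UnsafeMutableBufferPointer<UInt8>) throws -> Int) rethrows {
        try self.init(unsafeUninitializedCapacity: capacity) { buffer, count in
            count = try initializer(buffer)
        }
    }
}

// Hypothetical namespace, standing in for the Base16 enum from the question.
enum HexSketch {
    // Maps a nibble (0 ... 15) to its lowercase ASCII hex digit.
    static func lowercase(_ nibble: UInt8) -> UInt8 {
        nibble < 10 ? 0x30 + nibble : 0x61 + nibble - 10
    }

    // One generic entry point; the output type is chosen by the caller.
    // Bytes are emitted in memory order, exactly as withUnsafeBytes(of:) yields them.
    static func encode<Words, Output: UTF8StorageInitializable>(_ words: Words,
        as _: Output.Type = Output.self,
        by ascii: (UInt8) throws -> UInt8) rethrows -> Output {
        try withUnsafeBytes(of: words) { bytes in
            try Output(unsafeUninitializedUTF8Capacity: 2 * bytes.count) { utf8 in
                var offset = utf8.startIndex
                for byte in bytes {
                    // UInt8 is a trivial type, so plain assignment into the
                    // uninitialized buffer is acceptable here.
                    utf8[offset] = try ascii(byte >> 4)
                    offset += 1
                    utf8[offset] = try ascii(byte & 0x0f)
                    offset += 1
                }
                return offset // number of code units written
            }
        }
    }
}

// Usage: the same call produces either output type, picked by the annotation.
let input: (UInt8, UInt8) = (0xde, 0xad)
let string: String = HexSketch.encode(input, by: HexSketch.lowercase)
let array: [UInt8] = HexSketch.encode(input, by: HexSketch.lowercase)
```

This only collapses the String and [UInt8] pairs. The `utf8:inout` overloads that write into an existing MutableCollection (the route a ByteBufferView would take) fill storage that is already initialized, so they would stay separate.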

This is what I would do, FWIW.
