Idea: Bytes Literal

I’m definitely in favor of adding more support around binary data that contains embedded text. Beyond that, though, the biggest improvement for me when working with binary data would be converging on the utilities from “[Prototype] Protocol-powered generic trimming, searching, splitting, …”: those are the ones I add to Data whenever I work with byte formats. And I do use Data; it’s the type whose API is designed for aggregate byte work, even if it’s not quite the API I’d design. I’m not sure it’s worth adding another Bytes type to the Swift project.
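To make “the utilities I add” concrete, here is a minimal sketch of two such helpers written generically, so they work on Data, [UInt8], or any other suitable collection. The names `firstIndex(ofRun:)` and `trimming(while:)` are made up for illustration; they are not the prototype’s API and not part of the standard library.

```swift
// Hypothetical helpers; the names are illustrative only.
extension Collection where Element: Equatable {
    /// Returns the index where the first occurrence of `pattern` begins,
    /// or nil if `pattern` does not occur. Naive O(n * m) scan.
    func firstIndex<C: Collection>(ofRun pattern: C) -> Index?
        where C.Element == Element
    {
        if pattern.isEmpty { return startIndex }
        var start = startIndex
        while start != endIndex {
            if self[start...].starts(with: pattern) { return start }
            formIndex(after: &start)
        }
        return nil
    }
}

extension BidirectionalCollection {
    /// Drops leading and trailing elements that satisfy `predicate`.
    func trimming(while predicate: (Element) throws -> Bool) rethrows -> SubSequence {
        let head = try drop(while: predicate)
        var end = head.endIndex
        while end != head.startIndex {
            let previous = head.index(before: end)
            if try !predicate(head[previous]) { break }
            end = previous
        }
        return head[head.startIndex..<end]
    }
}

// Usage on a plain byte array: zero padding around "foo\r\n".
let packet: [UInt8] = [0x00, 0x00, 0x66, 0x6F, 0x6F, 0x0D, 0x0A, 0x00]
let body = packet.trimming { $0 == 0 }
let crlfStart = packet.firstIndex(ofRun: [0x0D, 0x0A] as [UInt8])
```

Because the helpers are constrained only on the collection protocols, the same code applies to Data without a Data-specific overload, which is the appeal of the protocol-powered approach.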

How about Byte? C has the problem that char is used as the basis for uint8_t, the element type for string literals, and as the basis for raw byte manipulation through pointer casts, but Swift doesn’t have that problem. I don’t really know what you’d do with an individual Byte, and when you want to manipulate memory opaquely you use UnsafeRawPointer.

However, I have frequently wanted to use a literal string as a search term when scanning a Data. Data("foo".utf8) (or Array("foo".utf8)) isn’t terribly complicated, but it doesn’t handle the case you mention of “strings” containing non-UTF-8 bytes. So an ExpressibleByByteStringLiteral could be useful—and in fact, since every valid Unicode string is a valid byte string, one approach would be to not invent any new syntax, but to have ExpressibleByByteStringLiteral refine ExpressibleByStringLiteral, with the sole addition of the \xFF escape being valid. I would then argue that the one type in the stdlib to natively implement it would be UnsafeRawBufferPointer, with any other implementations (such as Data’s) built on top of that. It’s a little different from the other literal protocols, but I think it makes sense in practice, and it means #include() or whatever becomes a string literal, either a normal one or a byte string depending on the contents of the file.
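As a rough illustration of the gap, this is what searching looks like with today’s Foundation API. The payload and needles are invented sample data; the textual needle can be spelled as a string, while the non-UTF-8 needle has to fall back to an array of integer literals, which is the case a byte string literal with a \xFF escape would cover.

```swift
import Foundation

// Invented sample payload: a four-byte binary marker (not valid UTF-8)
// followed by ASCII text.
let payload = Data([0xFF, 0xD8, 0xFF, 0xE0]) + Data("JFIF foo bar".utf8)

// Textual needle: write it as a string and take its UTF-8 bytes.
let textNeedle = Data("foo".utf8)
let textRange = payload.range(of: textNeedle)

// Non-UTF-8 needle: there is no string spelling for these bytes today,
// so the only option is an integer array.
let binaryNeedle = Data([0xFF, 0xE0])
let binaryRange = payload.range(of: binaryNeedle)
```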

This doesn’t solve ASCII byte literals. As much as I want to be Unicode-correct, I’m inclined to say Chris’s single-quote syntax is a way forward, as an ExpressibleByByteLiteral that only UInt8 and maybe Int8 conform to by default. I wouldn’t want to use single quotes for byte strings because the distinction feels too subtle in the end (too easy to use the wrong one and get a weird error, or to miss out on checking that you thought you’d get), but using them for individual bytes is well-precedented by C. (I don’t remember why the previous character literals proposal got stuck, but calling them “byte literals” helps some, at least for me.)
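For comparison, the closest spelling available today is the existing `UInt8(ascii:)` initializer. A small sketch, using an invented `key=value` line as sample input:

```swift
// Today's spelling of an ASCII byte constant. With a byte literal this
// would presumably shrink to something like '=' (hypothetical syntax).
let equalsSign = UInt8(ascii: "=")
let newline = UInt8(ascii: "\n")

// Invented sample input: split a line of bytes at the first "=".
let line = Array("key=value\n".utf8)
let parts = line.split(separator: equalsSign, maxSplits: 1)
let key = Array(parts[0])
let value = Array(parts[1].prefix { $0 != newline })
```

This works, but every byte constant needs an initializer call, and the ASCII requirement on the string literal argument is only enforced at run time, which is part of why a dedicated literal is attractive.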

And if we want to save the single quote for other uses, we don’t actually need the close-quote. 'a, '\n…though '\' does look a little weird. :-)
