Why don't string views (`UTF8View`, etc.) conform to `Codable`, `Hashable`, etc.?

stephencelis · November 30, 2021, 2:42pm

While working with generic code that can be conditionally codable and hashable, I ran into this limitation. Is there any reason these types can't be extended with `Codable` and `Hashable` conformances? It seems they could adopt most conformances from `String` (with the exception of `RangeReplaceableCollection` for some views).
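To make the limitation concrete, here is a minimal sketch under stated assumptions: `Tagged` is a hypothetical generic type standing in for the conditionally codable and hashable generic code described above, and `HashableUTF8View` is a hypothetical workaround wrapper, not a stdlib API. The wrapper compares and hashes the view's UTF-8 code units element by element.

```swift
// Hypothetical generic type that is conditionally Equatable, Hashable,
// and Codable, with all three conformances synthesized.
struct Tagged<Value> {
    var value: Value
}
extension Tagged: Equatable where Value: Equatable {}
extension Tagged: Hashable where Value: Hashable {}
extension Tagged: Codable where Value: Codable {}

// Tagged<String> picks up these conformances; Tagged<String.UTF8View> only
// would if the view itself conformed, which (as discussed here) it did not.

// Hypothetical workaround: wrap the view and compare/hash its code units.
struct HashableUTF8View: Hashable {
    var view: String.UTF8View

    static func == (lhs: HashableUTF8View, rhs: HashableUTF8View) -> Bool {
        lhs.view.elementsEqual(rhs.view)
    }

    func hash(into hasher: inout Hasher) {
        // Feed the length, then every code unit, matching what == compares.
        hasher.combine(view.count)
        for byte in view {
            hasher.combine(byte)
        }
    }
}

let tags: Set = [
    Tagged(value: HashableUTF8View(view: "swift".utf8)),
    Tagged(value: HashableUTF8View(view: "swift".utf8)),
]
print(tags.count)  // 1
```

Because this sketch compares raw code units, strings that `String` treats as equal under canonical equivalence (such as `"\u{E9}"` and `"e\u{301}"`) produce unequal wrappers. That is a design choice of the sketch, not necessarily what a stdlib conformance would do.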

kylemacomber · November 30, 2021, 2:47pm

Ya, those are oversights. Now that we have the ability to add protocol conformances with availability, we need to audit the stdlib for all these missing conformances.

stephencelis · November 30, 2021, 3:05pm

Thanks! Opened a bug to track this: [SR-15534] String views should be codable, hashable (apple/swift issue #57837 on GitHub).

itaiferber · November 30, 2021, 5:54pm

Regarding `String` views/`Substring`s and `Codable` specifically, I suspect some discussion and exploration will be needed to decide what encoding format is expected, in particular when there's a mismatch between the view type and the text encoding the `Codable` format itself expects. For example, some formats might be constrained to UTF-8, so `String.UTF16View` might need some indirect representation.

- The naive implementation for these views (encoding into an `UnkeyedEncodingContainer` as integers) would take care of this, but some might be surprised to see a `String.UTF8View` encode as an array of integers in a UTF-8-compatible format (or a `UTF16View` in a UTF-16-compatible format).
  - One could expect some encoders to special-case certain views based on their known output format, but that's not something you can necessarily rely on.
- Another possibility is that views don't encode themselves directly, but instead pass their underlying `String` to the encoder and let the encoder grab whatever representation it handles best (effectively transcoding). This could work, but there's also some potential for surprise (encoding a UTF-16 view but getting UTF-8 data out).
  - This could be compatible with what `Substring` does: one "easy" `Codable` conformance for `Substring` and its views would be to create `String`s out of them (potentially made more performant by wrapping the `Substring` in some way rather than allocating a new buffer) and encode those.
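The two strategies in the list can be sketched roughly as follows for `String.UTF16View`. `CodeUnitEncodedUTF16`, `StringEncodedUTF16`, and `Payload` are hypothetical names; the sketch wraps the view rather than conforming the stdlib type directly (a retroactive conformance could clash with a future official one), and it uses Foundation's `JSONEncoder` purely as an example of a UTF-8 format.

```swift
import Foundation

// Strategy 1 (hypothetical wrapper): encode the view "naively" as an
// unkeyed container of UInt16 code units.
struct CodeUnitEncodedUTF16: Codable {
    var view: String.UTF16View

    init(_ view: String.UTF16View) { self.view = view }

    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        for unit in view {
            try container.encode(unit)
        }
    }

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        var units: [UInt16] = []
        while !container.isAtEnd {
            units.append(try container.decode(UInt16.self))
        }
        // Ill-formed UTF-16 (e.g. an unpaired surrogate) is repaired to U+FFFD.
        view = String(decoding: units, as: UTF16.self).utf16
    }
}

// Strategy 2 (hypothetical wrapper): hand a String to the encoder and let
// the format pick its own text representation (effectively transcoding).
struct StringEncodedUTF16: Codable {
    var view: String.UTF16View

    init(_ view: String.UTF16View) { self.view = view }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(String(decoding: view, as: UTF16.self))
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        view = try container.decode(String.self).utf16
    }
}

// Hypothetical container so each strategy is encoded under a key.
struct Payload<T: Codable>: Codable {
    var text: T
}

let encoder = JSONEncoder()
let naive = try encoder.encode(Payload(text: CodeUnitEncodedUTF16("hi".utf16)))
print(String(decoding: naive, as: UTF8.self))       // {"text":[104,105]}
let transcoded = try encoder.encode(Payload(text: StringEncodedUTF16("hi".utf16)))
print(String(decoding: transcoded, as: UTF8.self))  // {"text":"hi"}
```

With a UTF-8 format like JSON, the first strategy produces exactly the surprising integer array described above, while the second transcodes and emits an ordinary JSON string.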

There's also a bit of a philosophical question regarding decoding: it might feel a little funny to request decoding a `Substring` or a view ("a substring of what? who owns the original?"). The way `String`/`Substring`/views are written, this isn't actually a problem at all, but there may be a question of what would be considered semantically meaningful. (In terms of implementation, at least, it's pretty trivial to have these types decode a `String` and initialize from that `String` directly.)
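A minimal sketch of the "decode a `String` and initialize from it" approach, applied to `Substring` (the same pattern would carry over to the views). `CodableSubstring` and `Message` are hypothetical names, and `JSONEncoder`/`JSONDecoder` are used only for demonstration.

```swift
import Foundation

// Hypothetical wrapper: encodes a Substring by copying it into a String,
// and decodes by reading a String and wrapping all of it in a Substring.
struct CodableSubstring: Codable {
    var value: Substring

    init(_ value: Substring) { self.value = value }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(String(value))
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        // The decoded Substring owns a fresh String and spans all of it;
        // nothing about the original base string survives the round trip.
        value = Substring(try container.decode(String.self))
    }
}

struct Message: Codable {
    var word: CodableSubstring
}

let source = "hello, world"
let slice = source.dropFirst(7)  // Substring "world", sharing storage with source
let data = try JSONEncoder().encode(Message(word: CodableSubstring(slice)))
let decoded = try JSONDecoder().decode(Message.self, from: data)
print(decoded.word.value)        // world
print(decoded.word.value.base)   // world (a new String, not "hello, world")
```

After a round trip, the decoded value's `base` contains only the decoded text, which is one concrete answer to "a substring of what?": it's a substring of itself.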

Either way, I think it's worth considering and seeing what feels right!