[Pitch] Add Equatable and Hashable conformance to String views

[Pitch] Add Equatable and Hashable Conformance to String views

  • Proposal: SE-NNNN
  • Authors: Shantini Vyas, David Smith
  • Review Manager: TBD
  • Status: Implementation Blocked
  • Implementation: #42237

Important Note

Existing apps may have already added these conformances in extensions, and Swift does not currently provide a compatible way for the stdlib to add conformances that are duplicates of existing ones from extensions. This proposal is conditional on a solution to this problem being added to the language, and is on hold until then. See the Source Compatibility section for more details.

Glossary

  • String Viewis not a Swift type per se, but is a helpful catch-all for describing a string or substring's unicode representation, for example String.UTF8View. It will be used here to refer to:

    • String.UTF8View
    • String.UTF16View
    • String.UnicodeScalarView
    • Substring.UTF8View
    • Substring.UTF16View
    • Substring.UnicodeScalarView

Introduction and Motivation

String views provide useful functionality to String and Substring. We propose expanding that functionality by making them Equatable and Hashable.

Adding an Equatable conformance would allow one to directly compare String views and determine if they have the same value.

let isEqual: Bool = "Dog".UTF8View == "Cat".UTF8View

Swift's Dictionary and Set enable fast, safe indexing and lookup. We propose adding String Views as types that are available for use as Dictionary keys or Set members.

Example usage:

var myDictionary: [String.UTF8View : Int]
var mySet: [String.UTF8View]

The original suggestion for these conformances can be found here.

Detailed design

Adding Hashable and Equatable conformance to these types is straightforward. Consider the following implementation for String.UTF8View:

extension String.UTF8View {
  @_alwaysEmitIntoClient
  public static func ==(lhs: Self, rhs: Self) -> Bool {
    lhs.elementsEqual(rhs)
  }
  
  @_alwaysEmitIntoClient
  public func hash(into hasher: inout Hasher) {
    hasher.combine(0xFF as UInt8)
    for element in self {
      hasher.combine(element)
    }
  }

  @_alwaysEmitIntoClient
  public var hashValue: Int {
    var hasher = Hasher()
    self.hash(into: &hasher)
    return hasher.finalize()
  }
}

@available(SwiftStdlib 5.7, *)
extension String.UTF8View: Equatable {}

@available(SwiftStdlib 5.7, *)
extension String.UTF8View: Hashable {}

The Equatable conformance leverages the already-optimized elementsEqual() function for collections. Since the String views are all collections, this will work just fine.

The Hashable conformance also leverages collection functionality and uses the collection's contents and a terminator to generate a unique hash, consistent with other Standard Library collection types.

The implementation above does not work as of writing (Swift 5.7):

  • The above causes issues with retroactive conformances to the same. This is further discussed in the compatibility section below.

Source Compatibility, Effect on ABI stability, and Effect on API resilience

While this is an additive change, and as such would normally not impact ABI or API compatibility, this proposal runs into an unsolved issue in the language and cannot currently be made compatible; hence its unusual conditional status.

Specifically, there's two separate potential compatibility issues caused by "retroactive" conformances added by existing code that uses the standard library:

At compile time, the addition of == to types that didn't previously conform to Equatable can cause compilation failures due to the type checker being unable to determine which definition of == to use. Normally this would be resolved by the standard library's definition being shadowed by the client's definition, but because one definition may be inside an extension while the other is not, this ends up not happening in practice in some cases.

At runtime, we run into a related issue. Any dynamic dispatch of Equatable or Hashable methods must pick an implementation to call. When only a single conformance is present this is easy: use the one from the conformance. When two or more conformances are present, this is impossible to do safely in Swift as it is today.

  1. Assume the existence of a standard library client that has added a conformance to Equatable or Hashable that has unusual semantics in some way (this is relatively unlikely in any individual case, but the universe of software is large).
  2. If we pick the conformance from the standard library, the client loses its special semantics. Presumably it had those semantics for a reason, and will now behave incorrectly.
  3. If we pick the conformance from the client, then any uses in the standard library itself (or other libraries the unusual client links) lose their non-special semantics, and may behave incorrectly.
  4. It's unclear what "pick both" or "pick neither" would actually mean

The issues described here are very general, applicable to essentially any attempt to add new conformances to existing protocols to the standard library. Given that, we can reasonably expect that there will be a serious effort to find a solution in the future, at which point this proposal can be finalized.

Alternatives Considered

In this bug report and linked discussions, it was raised that String views (see Glossary) would additionally benefit from conformance to Codable.

Codable conformance facilitates easy encoding and decoding between Swift objects and data sources. For a struct or enum to be Codable, all of its properties must also be Codable, hence the desire to add that conformance to String views. However, it is unclear what the expected behavior of the String views would be in this case, and we have chosen to omit Codable conformance from this proposal for simplicity.

Acknowledgements

  • Karoy Lorentey
  • Holly Borla
  • Steve Canon
  • Alex Akers
24 Likes

This would be great! :clap:

One thing the pitch could mention (even though it might be obvious to some) is that string view equality can be different from string equality, since it directly compares the two strings' Unicode code units instead of using canonical equivalence. That is, two strings that have composed and and decomposed versions of the same character compare as equal, but their string views do not:

let a = "Beyoncé"
let b = "Beyonce\u{301}"

a == b           // true
a.utf8 == b.utf8 // false

This isn't a problem at all (and in fact is a source of better comparison performance), just probably worth noting.

12 Likes

Yes, that's a good call out, thanks!

1 Like

I'm very excited to see this land! I've bumped into this a number of times, and using elementsEqual to work around it has always been a bit awkward for me, so I'm looking forward to seeing it.

1 Like

+1. Great write up!

I’m interested in hearing more about reasoning or uncertainty behind Codable. Equatable/Hashable establishes substitutability for views based on binary semantics (I.e. element-wise). The natural question is whether we can serialize under binary semantics as well (I.e. serialize the elements).

By binary semantics I mean we’re not using Unicode Canonical Equivalence / we’re not normalizing the contents.

5 Likes

I don't know if @shantini shares the same thoughts or concerns that I do, but a few of my own open questions from the thread that produced the GH issue:

Specifically, there are some questions about compatibility with various formats (what does it mean to encode a UTF-16 view into a format that only supports UTF-8 strings?) and ownership (what does it mean semantically to try to decode a String view?). I won't rehash the contents here (though there isn't much more to them) but I think there are some interesting things worth exploring here!

1 Like

@itaiferber thanks for answering!! Your original post summarized the concerns really well - aside from implementation questions there are some ideological ones on what it means for a String view to be Codable.

4 Likes

Should lexicographicallyPrecedes(_:) be used for Comparable conformances?

Two string views that compare elementwise equal are views of strings that are themselves equal; the converse is not true, as discussed above, but that's arguably a feature, as @nnnnnnnn points out.

The relative ordering of two unequal string views does not, to my knowledge, tell you anything about the ordering of the two strings. What would such comparisons be used for, and are the use cases which are not better served by ordering under Unicode-aware rules common enough that the benefits of providing for them in the standard library would outweigh the risk that it will cause users to unwittingly confuse Unicode and non-Unicode orderings?

4 Likes

Out of curiosity, why are all the methods in the illustrated implementation marked @_alwaysEmitIntoClient?

Section 5.17, Binary Order in the Unicode Standard:

When comparing text that is visible to end users, a correct linguistic sort should be used, as described in Section 5.16, Sorting and Searching. However, in many circumstances the only requirement is for a fast, well-defined ordering. In such cases, a binary ordering can be used.

Not all encoding forms of Unicode have the same binary order. UTF-8 and UTF-32 data, and UTF-16 data containing only BMP characters, sort in code point order, whereas UTF-16 data containing a mix of BMP and supplementary characters does not.

(There is sample code for sorting UTF-16 data in code point order.)

I don't know if we'd want binary ordering in the Standard Library, but it wouldn't be confusing, because String views are for working at the level of code points or code units.

I'm not sure what "it" refers to in your statement that "it wouldn't be confusing." Currently, the standard library provides "a [singular] fast, well-defined ordering" for locale-independent comparison of strings.

Having more than one ordering depending on which view of a string one sorts on—even if each ordering is fast and on its own well-defined—would indeed be confusing. Therefore, the questions I asked: what use cases are there for these other orderings, and do those use cases outweigh the significant cons of having more than one ordering?

3 Likes