SE-0261: Identifiable Protocol

masters3d · July 6, 2019, 3:48pm

Something being hashable allows us to compare objects and value types without explicitly declaring equality for every conforming type. Losing the ability to just check the hash to see whether objects and values are equal weakens the case for having it in the standard library.

Jean-Daniel · July 6, 2019, 5:09pm

Checking the hash tells you whether objects are different, not whether they are equal. Equal hashes don't imply equal objects.

That's why Hashable inherits from Equatable: when hashes are equal, you must be able to compare the objects to check whether they are actually equal.
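A minimal sketch of that point, using a hypothetical `Point` type whose `hash(into:)` is deliberately constant: the two values collide, yet `Set` keeps both because membership is confirmed with `==`, not with the hash alone.

```swift
// A deliberately terrible hash: every Point lands in the same bucket.
struct Point: Hashable {
    var x: Int
    var y: Int

    static func == (lhs: Point, rhs: Point) -> Bool {
        lhs.x == rhs.x && lhs.y == rhs.y
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(0) // constant: all Points share one hash value
    }
}

let a = Point(x: 1, y: 2)
let b = Point(x: 3, y: 4)

// Equal hashes, yet the values are not equal...
let sameHash = a.hashValue == b.hashValue
let sameValue = a == b

// ...so Set falls back on == and keeps both elements.
let points: Set<Point> = [a, b]
```

The collision only costs performance; correctness comes from the Equatable requirement Hashable inherits.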

koher · July 6, 2019, 6:12pm

How about a case where we design an ID like this? The ID has to be Comparable but doesn't have to be Hashable. I don't want to work out the best implementation of hash(into:) for the ID just to silence the compiler. Of course, we can add : Hashable to the ID to synthesize hash(into:) automatically. However, if we provide a library that includes the ID, and users of the library also want to use the ID as a hash table key, they cannot replace the automatically synthesized hash(into:) with the best implementation for their case.

It may seem a corner case, but we cannot foresee everything. I think it is more prudent to choose the minimal requirement that we really need.

xwu · July 6, 2019, 6:27pm

I don't understand. The IDs mentioned in this link are 64-bit integers.

jawbroken · July 6, 2019, 6:27pm

The Instagram link is just a 64-bit number that could trivially be Hashable. Sure, you can always say that almost no protocol should inherit from any other, because users can just require conformance to both protocols in a typealias or generic signature (e.g. Hashable itself doesn't technically need to imply Equatable, you could make everyone write Equatable & Hashable) but that puts an extra burden on users. So you need to weigh up the complexity of using the protocol vs the complexity of conforming to it. In this case, I don't feel like Hashable conformance is a significant burden on top of Equatable conformance, especially because both can be synthesised in basically the same way, and it greatly increases the flexibility of the ID.

anandabits · July 6, 2019, 8:21pm

This will not change. Struct instances themselves will continue to not have identity. What Identifiable does is recognize that many struct instances represent a snapshot of the state of an entity which does have a persistent identity. The id property correlates the snapshot with the entity and allows it to be distinguished from snapshots of state of other entities.
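To make the snapshot idea concrete, here is a small sketch using the standard library's `Identifiable` as it shipped in Swift 5.1; `Contact` and its fields are hypothetical. Two snapshots of the same entity share an `id` while their state, and therefore `==`, differs.

```swift
// A snapshot of a contact's state; `id` ties it to the persistent entity.
struct Contact: Identifiable, Equatable {
    let id: Int
    var name: String
}

let before = Contact(id: 42, name: "Ada")
var after = before
after.name = "Ada Lovelace"              // the entity's state changes...

let sameEntity = before.id == after.id   // ...but it is still the same entity
let sameSnapshot = before == after       // while the two snapshots differ
```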

This makes no sense whatsoever to me. The identity provided for by Identifiable often will not be object identity. Several commenters have suggested dropping the default implementation where Self: AnyObject in order to emphasize that this protocol is not strictly about object identity.
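For reference, a sketch of what that default does, assuming the `extension Identifiable where Self: AnyObject` default (an `ObjectIdentifier`-typed `id`) that the Swift 5.1 standard library ships; `Session` is a hypothetical class. Two instances holding identical state still get different ids, because the default is object identity rather than record identity.

```swift
// No `id` is declared, so the class picks up the standard library default:
// `var id: ObjectIdentifier { ObjectIdentifier(self) }`.
final class Session: Identifiable {
    var user: String
    init(user: String) { self.user = user }
}

let s1 = Session(user: "alice")
let s2 = Session(user: "alice")   // same state, different object

let defaultIDsDiffer = s1.id != s2.id                 // object identity, not record identity
let idIsObjectIdentifier = s1.id == ObjectIdentifier(s1)
```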

Agreed.

This is an interesting possible direction.

This proposal intentionally does not address this problem. There are a number of ways to approach the problem. All of the ways I'm aware of are compatible with the proposal in its current form.

The reason is that Hashable is very commonly useful when working with identity values and it is difficult to imagine a type that can be used as an identifier for which a Hashable conformance is an onerous requirement. Placing the requirement on the protocol eliminates the need for writing out the constraint at a lot of usage sites.
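A short sketch of the usage-site benefit, with hypothetical `indexByID` and `Product` names: because `Identifiable.ID` is already constrained to `Hashable`, a generic function can key a `Dictionary` by `id` without an extra `where T.ID: Hashable` clause.

```swift
// No `where T.ID: Hashable` is needed: the protocol already guarantees it.
func indexByID<T: Identifiable>(_ items: [T]) -> [T.ID: T] {
    var index: [T.ID: T] = [:]
    for item in items {
        index[item.id] = item   // later duplicates overwrite earlier ones
    }
    return index
}

struct Product: Identifiable {
    let id: String
    var price: Int
}

let catalog = indexByID([Product(id: "a", price: 3), Product(id: "b", price: 5)])
```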

Comparable can be useful on identifiers for technical purposes such as tree storage. But identifiers usually do not have any inherent semantically meaningful notion of ordering. For this reason it is reasonable to omit a Comparable conformance on an identifier type if it is known that Comparable is not necessary (for use in trees, etc).

The associated type is essential. Different identifier types are necessary in different contexts. In particular, many people prefer to use strongly typed identifiers, which prevent incorrect use of, for example, an ID<Person> where an ID<Company> is required. I have seen production bugs caused by this kind of accidental misuse.
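One common way to get strongly typed identifiers is a phantom type parameter. In this sketch the wrapper is called `TypedID` (a hypothetical name, chosen so it does not collide with the protocol's `ID` associated type inside conforming types), and `employees(of:)` is a stand-in lookup.

```swift
// `Entity` is a phantom type parameter: never stored, but it makes
// TypedID<Person> and TypedID<Company> distinct, non-interchangeable types.
struct TypedID<Entity>: Hashable {
    let rawValue: Int
}

struct Person: Identifiable {
    let id: TypedID<Person>
    var name: String
}

struct Company: Identifiable {
    let id: TypedID<Company>
    var name: String
}

func employees(of company: TypedID<Company>) -> [TypedID<Person>] {
    // hypothetical lookup; a fixed answer keeps the sketch self-contained
    company.rawValue == 1 ? [TypedID<Person>(rawValue: 7)] : []
}

let acme = Company(id: TypedID(rawValue: 1), name: "Acme")
let staff = employees(of: acme.id)
// employees(of: Person(id: TypedID(rawValue: 1), name: "Bo").id)  // compile-time error
```

Hashable is synthesized here even though `Entity` is unconstrained, since only the stored `rawValue` has to be Hashable.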

The point you raise about existentials is a well known language limitation. There has been enough discussion about lifting it recently that I am optimistic it will happen before too long (even if it takes more time to fully flesh out constraints on existentials). Instead of changing the design of the protocol we should focus on lifting the language limitation so that the protocol would be viable for Combine.

There is no way for a protocol to prevent this. If it were possible to define a protocol requirement let id: ID that would have been considered but unfortunately it isn't (yet).
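A sketch of why the protocol cannot enforce stability, using hypothetical types: the requirement `var id: ID { get }` only asks for something readable, so an immutable `let`, a mutable `var`, and a computed property that changes on every read all conform.

```swift
struct Fixed: Identifiable {
    let id: Int                 // immutable stored property
}

struct Reassignable: Identifiable {
    var id: Int                 // mutable stored property
}

final class Unstable: Identifiable {
    private var reads = 0
    var id: Int {               // computed: a new value on every access
        reads += 1
        return reads
    }
}

var r = Reassignable(id: 1)
r.id = 2                        // reassigning the id is allowed
let u = Unstable()
let firstRead = u.id
let secondRead = u.id
```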

As soon as the above mentioned language limitation around existentials is lifted this will no longer be an issue. Is there anything you can do now that would position Combine to adopt the protocol in the future when the language limitation is lifted (or at least leave that door open)?

The code sample you posted above isn't actually relying on existentials - it uses a generic constraint. If that is representative of what you need to do, maybe you could do this:

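// or extension Foo, and possibly private in either case
// (assumes a stored `var seenItems = Set<AnyHashable>()` on the enclosing type)
func hasBeenSeen<F: Foo>(_ item: F) -> Bool {
    // Erase the concrete ID type so ids of different types share one Set
    let id = AnyHashable(item.id)
    guard !seenItems.contains(id) else {
        return true
    }
    seenItems.insert(id)
    return false
}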

It's possible that this wouldn't meet your performance requirements or that you really do need an existential elsewhere. But at least in this example you're not facing a hard limit in the type system.
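A self-contained variant of the same idea, with hypothetical `SeenTracker`, `User`, and `Order` types. It relies on `Set.insert(_:)` returning an `inserted` flag, which folds the `contains` check and the insertion into one step.

```swift
final class SeenTracker {
    private var seenItems = Set<AnyHashable>()

    /// Returns true if an item with this id was recorded before;
    /// otherwise records it and returns false.
    func hasBeenSeen<T: Identifiable>(_ item: T) -> Bool {
        // insert(_:) reports whether the element was newly added
        !seenItems.insert(AnyHashable(item.id)).inserted
    }
}

struct User: Identifiable { let id: Int }
struct Order: Identifiable { let id: String }

let tracker = SeenTracker()
let r1 = tracker.hasBeenSeen(User(id: 1))      // first sighting
let r2 = tracker.hasBeenSeen(User(id: 1))      // seen before
let r3 = tracker.hasBeenSeen(Order(id: "1"))   // a String "1" is not the Int 1
```

One caveat of this design: two different types whose ids are both `Int` would share keys; pairing the id with `ObjectIdentifier(T.self)` would keep them apart.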

What is the reason this signature chooses to use an existential instead of generics?

xwu · July 7, 2019, 12:21am

This review has been supremely useful in clarifying the semantics of the proposed protocol. It's not enough to say that the protocol is "not strictly" about object identity; as @Karl clarifies, it's in fact strictly not about object identity. Unless I'm again mistaken, the only type for which identity as defined by this protocol would be semantically coincident with object identity would be one where the state that's modeled is machine memory itself.

The reason that I (and I'm guessing others) have misunderstood the proposal is due to the proposed default implementation. By declaring that the default identity for the purposes of Identifiable is the object identifier for all reference types, the proposal yokes the two concepts together. Your statement that "the identity provided for by Identifiable often will not be object identity" is entirely a repudiation of that default implementation; the two simply cannot be reconciled.

koher · July 7, 2019, 1:10am

anandabits:

koher:

Why must ID be Hashable? I think Equatable is enough for identification.

The reason is that Hashable is very commonly useful when working with identity values and it is difficult to imagine a type that can be used as an identifier for which a Hashable conformance is an onerous requirement. Placing the requirement on the protocol eliminates the need for writing out the constraint at a lot of usage sites.

koher:

DBs often use trees for indices instead of hash tables. Don't we want to make ID Comparable instead of Hashable in such cases?

Comparable can be useful on identifiers for technical purposes such as tree storage. But identifiers usually do not have any inherent semantically meaningful notion of ordering. For this reason it is reasonable to omit a Comparable conformance on an identifier type if it is known that Comparable is not necessary (for use in trees, etc).

My opinion is not that Comparable should be added to the requirements of ID. I think wanting ID to be Hashable is also a technical purpose: we want it because the standard library's Dictionary is implemented with hash tables and is common in Swift today. If dictionaries implemented with trees were common in Swift, as in Haskell, would you propose requiring ID to conform to Comparable for convenience?

koher · July 7, 2019, 1:52am

What I meant was:

The IDs encode information into 64-bit data.
That data has different characteristics from arbitrary 64-bit integers.
So the optimal hash implementation for uniformity is different for these IDs.
Just adding : Hashable does not necessarily synthesize the optimal one.
In cases like the one I referred to, adding : Hashable just to silence the compiler prevents users from providing the optimal implementation.
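A sketch of the kind of ID being discussed, under an assumed bit layout (41 bits of milliseconds, 13 bits of shard, 10 bits of sequence; `PackedID` and the layout are hypothetical). The synthesized `Hashable` conformance simply feeds the whole `rawValue` into `Hasher`, which is the conformance a library would lock in for its users.

```swift
// Hypothetical layout: 41 bits of milliseconds | 13 bits of shard | 10 bits of sequence.
struct PackedID: Hashable {
    let rawValue: UInt64

    init(milliseconds: UInt64, shard: UInt64, sequence: UInt64) {
        // Each field must fit in its slot.
        precondition(milliseconds >> 41 == 0 && shard >> 13 == 0 && sequence >> 10 == 0)
        rawValue = (milliseconds << 23) | (shard << 10) | sequence
    }

    var milliseconds: UInt64 { rawValue >> 23 }
    var shard: UInt64 { (rawValue >> 10) & 0x1FFF }
    var sequence: UInt64 { rawValue & 0x3FF }
}

let id = PackedID(milliseconds: 1_562_400_000_000, shard: 5, sequence: 17)
let cache: [PackedID: String] = [id: "photo"]
```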

In practice, I think treating the IDs as 64-bit integers and adding : Hashable to synthesize hash(into:) automatically works well enough. But if ID does not have the Hashable requirement, we don't have to care about it at all. As I mentioned at first, I have no concrete example of a case where the Hashable requirement becomes a problem in practice, so I know this is an awkward example. But I am not sure there are really no such cases, because I can't foresee everything.

It is a kind of minimalism. I don't want to force types into an unnecessary Hashable conformance. I prefer being minimal because extra requirements may cause problems in the long term. Identifiable.ID in SwiftUI is OK, but the standard library is universal, so I want it to be kept minimal.

Thank you all for continuing to try to understand what I mean despite my poor English.

anandabits · July 7, 2019, 2:26am

If Comparable were a pervasive constraint used for purely technical reasons in Swift, that would certainly be a consideration. But it isn't, so that's a hypothetical.

One important distinction is that Hashable is always about technical concerns, whereas Comparable often has semantic meaning in the non-technical domains represented by our data.

From the perspective of minimalism, if you don’t need Identifiable you will not be adding that conformance either. If you do need it there must be a reason. What generic code do you want to write with an Identifiable constraint where the ID is only required to conform to Equatable but not Hashable (or Comparable if you replace hash tables with trees)?

koher · July 7, 2019, 10:36am

How about the following example?

protocol Identifiable {
    associatedtype ID: Equatable
    var id: ID { get }
}

struct Table<Value> {
    ...
}

// when the table has a primary key
extension Table where Value: Identifiable, Value.ID: Comparable {
    func selectedValue(wherePrimaryKeyIs key: Value.ID) -> Value? {
        ...
    }
    ...
}

I agree with that. So associatedtype ID: Hashable is acceptable to me while associatedtype ID: Comparable is not, although I prefer associatedtype ID: Equatable, because Identifiable semantically means just that values can be identified by their ids, and equality of ids is enough for those semantics.

anandabits · July 7, 2019, 12:56pm

Your example shows a signature but not an implementation. How do you plan to provide an efficient implementation using only Equatable?

koher · July 7, 2019, 2:03pm

I didn't intend to provide an efficient implementation using only Equatable. I mean it is good to make it possible to choose the appropriate requirement (Hashable, Comparable, or something else) depending on the case.

I intended to use trees for an efficient implementation in the case I showed, as implied by the requirement Value.ID: Comparable. I chose that because the Table type represents a database table, whose indices are usually implemented with some kind of tree.
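A runnable sketch of the Comparable-only path. The Equatable-only protocol is given a hypothetical name, `EquatableIdentifiable`, to avoid clashing with the standard library's `Identifiable`, and a sorted array with binary search stands in for a database's tree index: lookup is O(log n) and no hashing is involved.

```swift
// The Equatable-only variant from the example above (hypothetical name).
protocol EquatableIdentifiable {
    associatedtype ID: Equatable
    var id: ID { get }
}

// A sorted array stands in for a tree index over a Comparable key.
struct SortedTable<Value: EquatableIdentifiable> where Value.ID: Comparable {
    private var rows: [Value]

    init(_ values: [Value]) {
        rows = values.sorted { $0.id < $1.id }
    }

    func selectedValue(wherePrimaryKeyIs key: Value.ID) -> Value? {
        // Lower-bound binary search: a match, if any, stays in rows[low..<high].
        var low = 0
        var high = rows.count
        while low < high {
            let mid = low + (high - low) / 2
            if rows[mid].id < key {
                low = mid + 1
            } else {
                high = mid
            }
        }
        return low < rows.count && rows[low].id == key ? rows[low] : nil
    }
}

struct Row: EquatableIdentifiable {
    let id: Int
    var value: String
}

let table = SortedTable([Row(id: 30, value: "c"), Row(id: 10, value: "a"), Row(id: 20, value: "b")])
```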

It is also possible to think of more flexible uses of Identifiable for tables (some database systems permit hash tables to be used for indices instead of trees).

protocol Identifiable {
    associatedtype ID: Equatable
    var id: ID { get }
}

protocol TableProtocol {
    associatedtype Value: Identifiable
    func selectedValue(wherePrimaryKeyIs key: Value.ID) -> Value?
    ...
}

struct Table<Value: Identifiable>: TableProtocol where Value.ID: Comparable {
    func selectedValue(wherePrimaryKeyIs key: Value.ID) -> Value? {
        ...
    }
    ...
}

struct HashTable<Value: Identifiable>: TableProtocol where Value.ID: Hashable {
    func selectedValue(wherePrimaryKeyIs key: Value.ID) -> Value? {
        ...
    }
    ...
}

struct SingletonTable<Value: Identifiable>: TableProtocol {
    private var value: Value
    
    init(value: Value) {
        self.value = value
    }

    func selectedValue(wherePrimaryKeyIs key: Value.ID) -> Value? {
        guard value.id == key else { return nil }
        return value
    }
    ...
}

Neither Hashable nor Comparable is required for the last one, SingletonTable.
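A compilable version of the Equatable-only cases, with the `...` bodies filled in by hypothetical stand-ins: `SingletonTable` as above, plus a `ScanTable` (hypothetical) that also needs only Equatable but pays for it with an O(n) scan. `Token` is a hypothetical ID type that is neither Hashable nor Comparable.

```swift
protocol EquatableIdentifiable {           // hypothetical name for the Equatable-only variant
    associatedtype ID: Equatable
    var id: ID { get }
}

protocol TableProtocol {
    associatedtype Value: EquatableIdentifiable
    func selectedValue(wherePrimaryKeyIs key: Value.ID) -> Value?
}

// Needs only Equatable: a single comparison.
struct SingletonTable<Value: EquatableIdentifiable>: TableProtocol {
    private var value: Value
    init(value: Value) { self.value = value }

    func selectedValue(wherePrimaryKeyIs key: Value.ID) -> Value? {
        value.id == key ? value : nil
    }
}

// Also needs only Equatable, at the cost of a linear scan.
struct ScanTable<Value: EquatableIdentifiable>: TableProtocol {
    private var rows: [Value]
    init(rows: [Value]) { self.rows = rows }

    func selectedValue(wherePrimaryKeyIs key: Value.ID) -> Value? {
        rows.first { $0.id == key }
    }
}

// An ID type that is Equatable but deliberately not Hashable or Comparable.
struct Token: Equatable {
    let text: String
}

struct Grant: EquatableIdentifiable {
    let id: Token
    var scope: String
}

let single = SingletonTable(value: Grant(id: Token(text: "t1"), scope: "read"))
let scan = ScanTable(rows: [Grant(id: Token(text: "a"), scope: "read"),
                            Grant(id: Token(text: "b"), scope: "write")])
```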

anandabits · July 7, 2019, 2:29pm

I see, I must have misread your post. I still think the Hashable constraint is warranted.

No significant downsides to including it have been articulated. The closest example is a not-necessarily-optimal Hashable conformance (usually synthesized by the compiler) that may not be used. The downside to omitting Hashable is having to write out an additional constraint at many usage sites. This can make signatures more difficult to understand, especially for programmers who are less familiar with generics.

On balance, I think the benefit of including it outweighs the relatively minimal cost.

Karl · July 7, 2019, 5:32pm

The thing about this is that if two live objects have the same reference identity/memory address (by ===), they must also be considered substitutable (by Equatable's ==). Swift does not allow any kind of funky aliasing that could lead to two references to the same address being considered different by Equatable's semantics.

In other words, if a === b, then a == b in all valid programs.

Meanwhile, the purpose of this protocol is that two objects that are not considered substitutable (i.e. a == b might be false) may have the same record identity (i.e. a ==== b is true).

In other words, if a ==== b, then a == b may or may not be true.

A record identity operator might be a useful shorthand, but it also has the potential to be confusing given this difference.
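A sketch of such a record identity operator; `====` is hypothetical and not part of the proposal or the standard library, and `Account` is illustrative. It shows the asymmetry described above: `====` can be true while `==` is false.

```swift
// Hypothetical "record identity" operator: compares only the ids.
infix operator ====: ComparisonPrecedence

func ==== <T: Identifiable>(lhs: T, rhs: T) -> Bool {
    lhs.id == rhs.id
}

struct Account: Identifiable, Equatable {
    let id: Int
    var balance: Int
}

let morning = Account(id: 9, balance: 100)
let evening = Account(id: 9, balance: 250)

let sameRecord = morning ==== evening    // same id
let substitutable = morning == evening   // state differs
```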

cal · July 7, 2019, 5:54pm

This is a great addition to the standard library!

I think id for the variable name is a practical tradeoff between clarity and brevity. There is so much prior art that it shouldn't be ambiguous, and it would be used quite often in code that uses this protocol.

I do agree, though, that the associatedtype should be spelled Identifier. The associatedtype itself will be referenced several orders of magnitude less often than the id property, so brevity matters less there.

Perhaps:

protocol Identifiable {
    associatedtype Identifier: Hashable
    var id: Identifier { get }
}

Karl · July 7, 2019, 6:27pm

Philippe_Hausler:

There is one gotcha that seems a bit cagey. The protocol allows the id to potentially be re-assigned or re-generated. Consider the following usage:
struct Contact: Identifiable {
    var id: Int { generateID() }
    var name: String
}
That would mean that any access to id would return a newly generated identifier. This would probably be really bad. Furthermore, the protocol allows the var to be assignable too (which has the same failure mode as returning a hash). These objections should not be considered blocking, but more as something that should be considered, imho.

FWIW, CoreData does this. Create a new instance of an entity, and the objectID will be some temporary ID. Save the context, and that object now has a new, permanent objectID.

Actually, while testing this out I noticed CoreData has some fascinating behaviour in this area. Check this out:

CoreData example

import CoreData

let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true).appendingPathComponent("testDB.sqlite")!
print(url)

let stack = try! CDStack(url: url)
try! stack.doTest()

class CDStack {
  let coord: NSPersistentStoreCoordinator
  let moc: NSManagedObjectContext

  init(url: URL) throws {
    let model = NSManagedObjectModel()
    do {
      let ent  = NSEntityDescription()
      ent.name = "MyEntity"
      do {
        let valAttr = NSAttributeDescription()
        valAttr.attributeType = .floatAttributeType
        valAttr.name = "myProp"
        ent.properties = [valAttr]
      }
      model.entities = [ent]
    }
    coord = NSPersistentStoreCoordinator(managedObjectModel: model)
    try coord.addPersistentStore(ofType: NSSQLiteStoreType, configurationName: nil, at: url, options: nil)

    moc = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    moc.persistentStoreCoordinator = coord
  }

  func doTest() throws {
    func printID(id: NSManagedObjectID) {
      print("ID: \(id.uriRepresentation()) isTemp=\(id.isTemporaryID)")
    }

    let newObj = NSEntityDescription.insertNewObject(forEntityName: "MyEntity", into: moc)
    newObj.setValue(NSNumber(floatLiteral: 3.141), forKey: "myProp")

    // Print the object's ID. Should be temporary.
    let tempID = newObj.objectID
    printID(id: tempID)
    assert(tempID.isTemporaryID)
    // Save the MOC.
    try moc.save()
    print("Saved ✌️")
    // Print the ID we got before the save. Should still be temporary.
    printID(id: tempID)
    assert(tempID.isTemporaryID)
    // Print the object's ID. Should be non-temporary.
    printID(id: newObj.objectID)
    assert(newObj.objectID.isTemporaryID == false)
    assert(tempID != newObj.objectID, "objectID should have changed")
    assert(tempID.isEqual(to: newObj.objectID) == false, "objectID should have changed")

    // But we can still fetch using the old (temporary) ID.
    let fetchedTemp = moc.object(with: tempID)
    let fetchedPerm = moc.object(with: newObj.objectID)
    assert(fetchedTemp.objectID == tempID)
    assert(fetchedPerm.objectID == newObj.objectID)
    // Prints two different objects! '===' returns false! 
    print(fetchedTemp, fetchedPerm, fetchedTemp === fetchedPerm)
  }
}

So after the context save, tempID remains unchanged (i.e. essentially an immutable object), but newObj.objectID now returns a different (permanent) ID. That said, we can still query the DB using the temporary ID, and if we do that we get some other object (not newObj).

I'm sure there are reasons why they did it this way, and I think it shows that there are use-cases for reassigning an object's ID.
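A portable sketch of that use case without Core Data; `RecordID`, `Note`, and `markSaved(assignedKey:)` are hypothetical. The id starts as a temporary UUID and is replaced by a permanent key on a simulated save, much like the temporary and permanent object IDs above.

```swift
import Foundation

// An id that is either a client-side placeholder or a store-assigned key.
enum RecordID: Hashable {
    case temporary(UUID)
    case permanent(Int)
}

struct Note: Identifiable {
    private(set) var id: RecordID = .temporary(UUID())
    var text: String

    init(text: String) { self.text = text }

    // Simulated save: the store assigns a permanent key, replacing the id.
    mutating func markSaved(assignedKey: Int) {
        id = .permanent(assignedKey)
    }
}

var note = Note(text: "hello")
let tempID = note.id
note.markSaved(assignedKey: 101)
```

Code that stored `tempID` (for example as a dictionary key) now holds an id the value no longer reports, which is exactly the hazard raised earlier in the thread.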

michelf · July 7, 2019, 6:59pm

We could even go further with the renaming to avoid some confusion:

protocol IdentifiableContent {
    associatedtype ContentIdentifier: Hashable
    var id: ContentIdentifier { get }
}

That should make it clear this isn't about object identity.

gwendal.roue · July 7, 2019, 7:35pm

Hello,

I wish the identifier would not be called id or ID, but something much longer like identifier or identity.

The reason I think id is a bad fit for this protocol is that types conforming to it will often have existing identifiers that are not easy to rename and will clash with the protocol: uid, uuid, fooId, _id. Yeah, real-world plain data objects that map database records or JSON objects can have such properties.

We do not want the protocol to "mess" with those conventions, and introduce confusion.
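For illustration, a type whose identifier is already named `uuid` (the hypothetical `Member`) can still adopt `Identifiable` by forwarding, but it then exposes two names for the same value, which is the kind of overlap described above.

```swift
import Foundation

// A record type whose existing identifier is already named `uuid`.
struct Member {
    let uuid: UUID
    var nickname: String
}

// Conformance by forwarding: `id` and `uuid` now name the same value.
extension Member: Identifiable {
    var id: UUID { uuid }
}

let m = Member(uuid: UUID(), nickname: "gw")
let sameValue = m.id == m.uuid
```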

My point of view is that identity is a very good name for our purpose.

gwendal.roue · July 7, 2019, 7:43pm

I would like to further push for identity because this word matches the purpose of the protocol. Identity is what does not change when objects change. It is the one constant quality of an object, stable and distinct from other objects of its kind.