Hashing Weak Variables

Nickolas_Pohilets · December 4, 2019, 11:09am

I'm trying to make a wrapper (property delegate) that would encapsulate how weak references should be compared and hashed.

Consider this example:

import Foundation  // needed for String(format:)

class MyClass { }

struct WeakRef: Hashable {
    weak var ref: MyClass?

    static func == (_ lhs: WeakRef, _ rhs: WeakRef) -> Bool {
        return lhs.ref === rhs.ref
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(ref.map(ObjectIdentifier.init))
    }

    var hashString: String {
        return String(format:"%016X", self.hashValue)
    }
}

var strongRefs: [MyClass] = [MyClass(), MyClass()]
let ref1 = WeakRef(ref: strongRefs[0])
let ref2 = WeakRef(ref: strongRefs[0])
let ref3 = WeakRef(ref: strongRefs[1])
print(ref1 == ref2, ref2 == ref3)
print(ref1.hashString, ref2.hashString, ref3.hashString)
strongRefs = []
print(ref1 == ref2, ref2 == ref3)
print(ref1.hashString, ref2.hashString, ref3.hashString)

It prints:

true false
00000000F8F86466 00000000F8F86466 00000000FF656E44
true true
00000000FF863A4C 00000000FF863A4C 00000000FF863A4C

My ref1 and ref2 are declared as immutable, so neither their hashes nor their equality relation should change. Yet both change once the referenced objects are deallocated, so this is not a correct implementation.

I could store the ObjectIdentifier in the initializer, but this might still break if one object is deallocated and another one is created at exactly the same address.

Ideally, I would like to store something that is distinct for every object but maintains its identity even after the object is deallocated, until the last weak reference is destroyed.

Based on mikeash.com: Friday Q&A 2017-09-22: Swift 4 Weak References, a reference to the side table looks exactly like what I need - it is unique per object, and keeps living until there are no weak references left. So even if a new object is created at the same address, the two side tables would still be distinct.

I was able to come up with this:

class MyClass { }

private struct RefHolder {
    weak var ref: MyClass?
}

struct WeakRef: Hashable {
    var ref: MyClass? {
        get { impl.ref }
        set {
            impl.ref = newValue
            updateSideTablePtr()
        }
    }
    private var sideTablePtr: UnsafeRawPointer?
    private var impl: RefHolder

    init(ref: MyClass?) {
        self.impl = RefHolder(ref: ref)
        self.updateSideTablePtr()
    }

    private mutating func updateSideTablePtr() {
        self.sideTablePtr = withUnsafeBytes(of: &self.impl) { (ptr: UnsafeRawBufferPointer) in
            ptr.bindMemory(to: UnsafeRawPointer?.self)[0]
        }
    }

    static func == (_ lhs: WeakRef, _ rhs: WeakRef) -> Bool {
        return lhs.sideTablePtr == rhs.sideTablePtr
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(sideTablePtr)
    }
}

But that's obviously relying on private implementation details. Is there a way to implement this using proper public API?

P.S. Weak references to ObjC objects don't have a side table, so the code above effectively behaves as a solution with saved ObjectIdentifier.

glessard · December 4, 2019, 12:25pm

Yes, but you would know that the old value has been deallocated when the weak reference becomes nil (you save both, clearly). At that point, you can safely clean up your old data and save the new. When I’ve done this, my equality method delegates to the ObjectIdentifier and to the identity operator, while the hash value uses only the ObjectIdentifier. That respects the Hashable contract.
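A minimal sketch of the approach described above, assuming a generic wrapper that saves both the ObjectIdentifier and the weak reference at initialization. The name WeakBox is hypothetical and not taken from the post.

```swift
// Hypothetical wrapper: saves the identifier and the weak reference together.
struct WeakBox<T: AnyObject>: Hashable {
    private(set) weak var reference: T?
    let identifier: ObjectIdentifier

    init(_ object: T) {
        reference = object
        identifier = ObjectIdentifier(object)
    }

    // Equality delegates to the saved identifier and to the identity operator.
    static func == (lhs: WeakBox, rhs: WeakBox) -> Bool {
        lhs.identifier == rhs.identifier && lhs.reference === rhs.reference
    }

    // The hash uses only the saved identifier, so it never changes.
    func hash(into hasher: inout Hasher) {
        hasher.combine(identifier)
    }
}
```

Equal values always share an identifier and therefore a hash. Two boxes whose objects lived at the same address at different times and are both dead still compare equal under this sketch.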

Note that when a struct contains a weak reference you delegate some mutability to the reference counting system. An instance of WeakRef that’s declared let indeed has a non-changing weak reference and its memory representation is constant, but every use of the weak reference involves the liveness check on the object, and that can produce the illusion of a change.

Nickolas_Pohilets · December 4, 2019, 1:26pm

That would not work if both of the objects are already deallocated.

Nickolas_Pohilets · December 4, 2019, 1:32pm

That's good, that's exactly what I need. The question is how I can get around this illusion and compare the two underlying representations.

Nickolas_Pohilets · December 4, 2019, 2:11pm

I've been thinking about how this can be reliably implemented for ObjC weak references, and I think associated objects are the answer. I can emulate side table behaviour by storing an associated object and using the identity of that object as a key.

This approach sounds like a reliable solution for Swift as well, but for Swift it is suboptimal. I would prefer to use the associated objects approach as a fallback, and still have a fast path for native Swift reference counting.

glessard · December 4, 2019, 4:54pm

What's the harm, then? Both are deallocated, therefore neither has an identity other than nil

glessard · December 4, 2019, 5:20pm

With apologies to the memory-safety police:

struct Weak<T: AnyObject>
{
  let identifier: ObjectIdentifier
  weak var reference: T?

  init(_ r: T) {
    reference = r
    identifier = ObjectIdentifier(r)
  }
}

class Thing {}

var t = Thing() as Optional
var weak = Weak(t!)

// peek at the memory representation
withUnsafeBytes(of: &weak) {
  for b in $0 { print(String(b, radix: 16), terminator: "") }; print()
}

// deallocate the underlying object
t = nil
assert(weak.reference == nil)

// peek at the memory representation again
withUnsafeBytes(of: &weak) {
  for b in $0 { print(String(b, radix: 16), terminator: "") }; print()
}

You'll get the same 128-bit hex string twice.
In other words, the contents of your variable don't change by themselves when you don't touch it.

gnuoyd · December 4, 2019, 5:24pm

I've been thinking about how this can be reliably implemented for ObjC
weak references, and I think associated objects are the answer. I can
emulate side table behaviour by storing an associated object and using
the identity of that object as a key.

This approach sounds like a reliable solution for Swift as well, but
for Swift it is suboptimal. I would prefer to use the associated
objects approach as a fallback, and still have a fast path for native
Swift reference counting.

What if you stick the weak references into a dictionary indexed by
ObjectIdentifier and identified by serial number? Identify each WeakRef
by the serial number corresponding to the referenced object, if any has
already been assigned; otherwise, assign a new serial number. See the
following code, which I haven't even tried to compile:

struct WeakRef: Hashable {
	private struct ExtantRecord {
		weak var object: MyClass?
		var serialNumber: UInt64
	}
	private static var nextSerialNumber: UInt64 = 0
	// Maps an object's address-based identifier to its record.
	private static var extantWeakRefs: [ObjectIdentifier: ExtantRecord] = [:]

	private let serialNumber: UInt64

	func hash(into hasher: inout Hasher) {
		hasher.combine(serialNumber)
	}
	static func ==(_ l: WeakRef, _ r: WeakRef) -> Bool {
		return l.serialNumber == r.serialNumber
	}
	init(object o: MyClass) {
		let oid = ObjectIdentifier(o)

		// TBD synchronize access to extantWeakRefs, nextSerialNumber
		// Reuse a record only if it still points at this very object;
		// a record left by a dead object at the same address has a nil
		// `object` and fails the identity test.
		guard let extant = WeakRef.extantWeakRefs[oid], extant.object === o
		    else {
			serialNumber = WeakRef.nextSerialNumber
			WeakRef.nextSerialNumber += 1
			WeakRef.extantWeakRefs[oid] = ExtantRecord(object: o,
			    serialNumber: serialNumber)
			// Collect garbage after every 1024 records.
			if serialNumber % 1024 == 0 {
				WeakRef.collectGarbage()
			}
			return
		}
		serialNumber = extant.serialNumber
	}
	// Remove records of weak refs that have changed to nil.
	private static func collectGarbage() {
		// TBD synchronize access to extantWeakRefs
		extantWeakRefs = extantWeakRefs.filter { $0.value.object != nil }
	}
}

Dave

AlexanderM · December 4, 2019, 7:05pm

IDK how to solve your problem, but side tables aren't the answer. In this context, they have the same issue as the regular objects.

Just as an object could exist, die, and have a new object allocated at its same address (resulting in an object with the same address, but distinct logical identity), so too can a side table be made, die (when all weak refs have been destroyed upon access), and have a new side table allocated at its same address.

Joe_Groff · December 4, 2019, 8:07pm

If you control the class being referenced, you can make it so that it constructs itself with an identifier taken from a monotonic counter, something like this:

var nextFooId = 0

class Foo {
  let id: Int

  init() {
    // You may need synchronization here if objects can be inited on
    // multiple threads
    id = nextFooId
    nextFooId += 1
  }
}

If you use that id as your hash for the weak reference, it should then be stable against reallocation in the same memory address.
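As a rough sketch of how that id could back a weak wrapper: the WeakFoo type below is hypothetical and assumes the id is captured once, when the wrapper is created, so that equality and hashing never consult the weak reference. Foo is repeated here without synchronization only to keep the example self-contained.

```swift
var nextFooId = 0

class Foo {
    let id: Int

    init() {
        id = nextFooId
        nextFooId += 1
    }
}

// Hypothetical wrapper: identity comes from the captured id, not the address.
struct WeakFoo: Hashable {
    private(set) weak var ref: Foo?
    let id: Int?   // nil only when the wrapper was created from nil

    init(_ foo: Foo?) {
        ref = foo
        id = foo?.id
    }

    static func == (lhs: WeakFoo, rhs: WeakFoo) -> Bool {
        lhs.id == rhs.id
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }
}
```

Because the counter never repeats a value, a new Foo allocated at a recycled address gets a different id, and wrappers of the old object keep their original hash and equality.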

AlexanderM · December 4, 2019, 8:16pm

TBH I would synchronize that in any case. If there's no threading, there won't be much contention, thus not much performance impact.

The annoyance of not synchronizing it and then unintentionally initializing off-thread (such as from a callback to a network call) would be so grave that I would just nip it in the bud. My $0.02

Joe_Groff · December 4, 2019, 8:25pm

Sure, I mostly left the synchronization out for clarity.
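One possible way to add the synchronization discussed here is to guard the counter with a lock. This is only a sketch: the IdGenerator type and the fooIds instance are hypothetical names, and NSLock from Foundation is just one of several primitives that would do.

```swift
import Foundation

// Hypothetical helper: hands out unique, increasing ids under a lock.
final class IdGenerator {
    private let lock = NSLock()
    private var next = 0

    func makeId() -> Int {
        lock.lock()
        defer { lock.unlock() }
        let id = next
        next += 1
        return id
    }
}

let fooIds = IdGenerator()

class Foo {
    // Safe to call from any thread: the counter is only touched under the lock.
    let id = fooIds.makeId()
}
```

With little or no contention the lock costs very little, which matches the point made above about synchronizing unconditionally.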

Nickolas_Pohilets · December 5, 2019, 4:19pm

Your code still does not work if a new object is allocated at the same address: the ObjectIdentifiers will be equal, so the old serial number will be taken from the table. You need to somehow clear the table when the object is deallocated. That's exactly what associated objects do, but in far fewer lines of code. Instead of a serial number I'm using the identity of an allocated NSObject(), which is retained by the WeakRef.
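The associated-object idea could look roughly like the sketch below. It is not the code from the post: the names AssociatedWeakRef and tokenKey are hypothetical, the wrapper is constrained to NSObject subclasses as an assumption, synchronization is omitted, and it only compiles where the Objective-C runtime is available.

```swift
#if canImport(ObjectiveC)
import Foundation

// One byte allocated once; its address serves as the association key.
private let tokenKey = UnsafeRawPointer(UnsafeMutablePointer<UInt8>.allocate(capacity: 1))

// Hypothetical wrapper: identity is the token object attached to the target.
struct AssociatedWeakRef<T: NSObject>: Hashable {
    private(set) weak var ref: T?
    // Held strongly, so the token's address cannot be reused while any
    // wrapper still refers to it, even after the target is deallocated.
    private let token: NSObject?

    init(_ object: T?) {
        ref = object
        guard let object = object else {
            token = nil
            return
        }
        // TBD synchronize the get-or-create step below.
        if let existing = objc_getAssociatedObject(object, tokenKey) as? NSObject {
            token = existing
        } else {
            let fresh = NSObject()
            objc_setAssociatedObject(object, tokenKey, fresh, .OBJC_ASSOCIATION_RETAIN)
            token = fresh
        }
    }

    static func == (lhs: AssociatedWeakRef, rhs: AssociatedWeakRef) -> Bool {
        lhs.token === rhs.token
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(token.map { ObjectIdentifier($0) })
    }
}
#endif
```

A new object allocated at a recycled address starts with no association, so it receives a fresh token and never compares equal to wrappers of the old object.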

Nickolas_Pohilets · December 5, 2019, 4:19pm

I don't. I'm trying to build a generic non-invasive solution, which would work for any Swift or ObjC class.

Nickolas_Pohilets · December 5, 2019, 4:25pm

Dying when all the weak refs have been destroyed would be exactly what I need.
But it turns out things are more complicated. When weak refs are copied, I expected the copy to take a reference to the side table and increment the number of weak references. But for already-dead objects that does not always happen. It does not reproduce for local variables, but it did reproduce when I put my WeakRef into a map and then tried to read it back. Not sure if it is related to the VWT (value witness table) or something else.

gnuoyd · December 5, 2019, 4:33pm

Sorry, I should have explained my intention better: ObjectIdentifiers
may be equal, but the object member of the ExtantRecord will turn to
nil when the first object is reclaimed. The guard tests for that
condition so that the serial number is not reused in that case.

Dave

Nickolas_Pohilets · December 5, 2019, 4:45pm

Yes, correct. That would work. But associated objects still do the same in fewer lines of code.

Nickolas_Pohilets · December 5, 2019, 4:53pm

So, I'm pretty happy with associated objects as an implementation, and I'm moving on to dealing with the type checker - Generic constraint for "weakly referencable"

Joe_Groff · December 5, 2019, 5:03pm

Note that associated objects would only work for Objective-C classes, on Apple platforms. Swift classes without ObjC ancestry do not support associated objects.

Nickolas_Pohilets · December 5, 2019, 9:01pm

Good catch! We are not planning to support non-Apple platforms in the foreseeable future, but if that ever becomes the case, we can use something similar to @gnuoyd's solution as a fallback.