Adding more data structures to the standard library

I just had to change the equality condition. Thank you for spotting this!

    public static func ==(lhs: WeakHashableContainer<T>, rhs: WeakHashableContainer<T>) -> Bool {
        switch (lhs.weakReference, rhs.weakReference) {
        case let (.some(a), .some(b)): return a == b
        case (.some, .none), (.none, .some): return false
        case (.none, .none): return lhs.hashValue == rhs.hashValue
        }
    }

I think I might even compare just the hashValues... but somehow I feel it's wrong.
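That feeling is right: comparing only hashValues is unsound, because distinct values may legitimately share a hash. A minimal sketch with a hypothetical type that forces collisions:

```swift
// Hypothetical type that forces hash collisions: every value hashes
// identically, but values with different ids are still unequal.
struct CollidingID: Hashable {
    let id: Int

    static func == (lhs: CollidingID, rhs: CollidingID) -> Bool {
        lhs.id == rhs.id
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(0) // deliberately constant
    }
}

let a = CollidingID(id: 1)
let b = CollidingID(id: 2)
assert(a.hashValue == b.hashValue) // hashes collide
assert(a != b)                     // but the values differ
```

So equal hashes never imply equal values; only the converse holds.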


Maybe we should open a discussion thread for each data structure? :) I feel like this thread has branched. :stuck_out_tongue:


The problem is the "upon deallocation" part. I might be wrong, but I didn't find any way to "listen" for object deallocation and react to it.

I misspoke by calling it a Dictionary, it would be your WeakDictionary type, and for this to work, it would need to be a class (so that every WeakKeyWrapper can capture a shared reference to it).

IDK how Dictionary handles deletion. I would have thought that assigning nil for a given key is special-cased to trigger deletion, and doesn't literally store nil.

Ah, never mind. I guess the key would get released, not the key wrapper, so you can't handle the removal in the key wrapper's deinit. :frowning:

The best I could do to get rid of the value upon key deallocation was calling a cleanup method on each collection access... but this adds a tremendous O(n) overhead to every access to the collection.
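A minimal sketch of that clean-on-access approach (all type names here are hypothetical, not the poster's actual implementation): every read first drops entries whose weak reference has died, which is the O(n) cost being described.

```swift
// A box holding a weak reference, so the array doesn't retain elements.
final class WeakBox<T: AnyObject> {
    weak var value: T?
    init(_ value: T) { self.value = value }
}

struct WeakArray<Element: AnyObject> {
    private var boxes: [WeakBox<Element>] = []

    mutating func append(_ element: Element) {
        boxes.append(WeakBox(element))
    }

    // O(n) sweep performed on every access: this is the overhead
    // discussed above.
    mutating func liveElements() -> [Element] {
        boxes.removeAll { $0.value == nil }
        return boxes.compactMap { $0.value }
    }
}
```

For small collections (multicast delegates, coordinators) the linear sweep is usually acceptable; it is the large, long-lived tables where it hurts.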

I'm currently using that data structure but, being aware of the constraint, I use it for small collections to handle multicast delegate or coordinator patterns.

The behaviour of NSMapTable was not good enough in my use case (cleanup happens upon collection resize... depending on the usage this might never happen, and the value would leak for the entire app lifecycle).

If you really want to use that, and you are OK with being limited to Apple's platforms, you can register for a dispatch memory pressure event and do a cleanup of all the weak dictionaries when pressure events fire.

It would "work" on macOS, but not be great (it could force page-ins during a memory pressure event). On iOS/tvOS it would work well (no page-outs on those systems, so memory pressure events are really "running low on RAM, will kill a process if things get worse")
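The registration being suggested can be sketched with a Dispatch memory-pressure source (Apple platforms only; `sweepAllWeakCollections` is a hypothetical hook standing in for your own cleanup entry point):

```swift
import Dispatch

// Hypothetical cleanup entry point: drop dead entries from every
// registered weak collection.
func sweepAllWeakCollections() {
    // ... iterate your weak dictionaries and purge nil'd keys ...
}

// Register for system memory-pressure events (Apple platforms only).
let source = DispatchSource.makeMemoryPressureSource(
    eventMask: [.warning, .critical],
    queue: .main
)
source.setEventHandler {
    let event = source.data
    if event.contains(.warning) || event.contains(.critical) {
        sweepAllWeakCollections()
    }
}
source.activate()
```

As noted above, this trades eager cleanup for a lazy sweep that only runs when the system is already short on memory, with the caveats about page-ins and daemon wake ordering discussed below.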

This approach is likely acceptable for applications on iOS, but wouldn't be something we could use in daemons in general. In order to avoid "thundering herd" problems, the kernel doesn't wake up everything at once to receive memory pressure notifications, so by the time a process receives the warning, it's likely multiple other daemons would have been killed already. In general we've found that userspace memory notifications are much harder to make good use of than most people expected.

Of course this does leave us with the dilemma of how to implement weak collections usefully. Back when we were doing tracing garbage collection for ObjC, there was a "weak callback" system that allowed collections to update when the weak reference was cleared, but it was pretty tricky to use well. With ARC weak references the best available scheme was clearing things out on mutation, as has been mentioned here.

I haven't thought this through yet, but it's at least theoretically possible that Swift ARC could do better than ObjC ARC for this due to the different structure for weak references. It would need runtime changes at the very least though.

Yet NSMapTable only clears memory on the next resize of the data structure.

A "weak callback" would solve the issue. Each container would have to listen for this callback and clean up.

NSMapTable in ARC mode clears on resize. In GC mode it clears more immediately. Somewhat irrelevant now that GC is gone, but useful for historical perspective, since the GC mode weak was the original design.


We could certainly augment the weak-references runtime to allow some sort of notification, but every time I sketch out what that would require from the bookkeeping data structures, I'm left with the inescapable impression that it would probably use more peak memory than just using lazier approaches like clear-on-resize or clear-on-memory-pressure.


We definitely don't want a synchronous callback, and we definitely don't want to spawn a thread, so that does rather limit our options. I guess that points towards coroutines again, "switch to this the next time the current thread would go idle".

Still a bit concerning tbh.

Speculating wildly, I wonder if a really naive heuristic like "have the runtime keep a count of the number of weak references cleared, look at it on reads from weak collections, and if it's gone up by at least 10% of the table size since the last time we looked, see if we can clear stuff out" would work acceptably. I guess that would mean that you'll have random latency spikes in your reads for work that could be done off the hot path :confused:
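The runtime counter speculated about above is not visible from user code, so it can only be approximated. A sketch under that limitation (all names hypothetical): count dead entries during the read the collection already has to do, and compact the backing storage only once they pass the 10% threshold, amortising the removal work.

```swift
// Box holding a weak reference so storage doesn't retain elements.
final class Weak<T: AnyObject> {
    weak var value: T?
    init(_ value: T) { self.value = value }
}

struct AmortizedWeakArray<Element: AnyObject> {
    private var storage: [Weak<Element>] = []

    mutating func append(_ element: Element) {
        storage.append(Weak(element))
    }

    mutating func liveElements() -> [Element] {
        var live: [Element] = []
        var dead = 0
        for box in storage {
            if let value = box.value { live.append(value) } else { dead += 1 }
        }
        // Compact only once at least 10% of entries are dead,
        // amortising the removal across many reads.
        if dead > 0, dead * 10 >= storage.count {
            storage.removeAll { $0.value == nil }
        }
        return live
    }
}
```

Note the user-level version still pays O(n) to count the dead entries; only a runtime-maintained counter would make the check O(1), which is precisely why this needs runtime support.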


Is a "callback" intended as "single job executed upon deallocation"? In that case, multiple subjects might be interested in this event.

My current design for the WeakDictionary is considering the fact that in a WeakCollection you might store fairly heavy objects that you want to clear from memory as soon as they are no longer needed.

The problem is relatively small for WeakArray, because all that remains in memory is the empty container. But for WeakDictionary the story is fairly different, because of the patterns it enables.

Example: hash consing or Flyweight patterns.
Imagine a Coordinator pattern: the coordinator generates the ViewController (and the whole stack needed to keep the VC functional), then returns it. The coordinator keeps a weak reference to the generated VC.
The parent coordinator stores in a WeakDictionary the view controller as weak key and the coordinator that generated it as value (the same coordinator might build two different VCs, and you want to keep it alive for as long as at least one VC it generated is still alive).

The coordinator is implicitly retained by the VC, and the VC is retained by UIKit and will stay alive for as long as it stays on screen. When the VC is dismissed, I expect the coordinator to die as well. In this case, the coordinator might be a fairly heavy object that still will not cause memory pressure, and the nature of the design pattern might never cause a resize of the data structure, so this memory is for all intents and purposes a memory leak.

True Flyweight/hash consing in Swift is currently not possible. The closest I got to making it work is this extra O(n) cleanup work on each access. A real-time "dead" notification would be amazing.
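For concreteness, a hypothetical sketch of hash consing under that constraint: a pool interns one canonical instance per key, holds it weakly, and pays the O(n) clean-on-access cost to evict dead entries (all names here are illustrative, not an existing API).

```swift
// Box holding a weak reference to an interned object.
final class WeakRef<T: AnyObject> {
    weak var value: T?
    init(_ value: T) { self.value = value }
}

// The heavyweight object being shared.
final class Interned {
    let label: String
    init(label: String) { self.label = label }
}

final class InternPool {
    private var table: [String: WeakRef<Interned>] = [:]

    func intern(_ label: String) -> Interned {
        // O(n) sweep: drop entries whose object has been deallocated.
        for (key, ref) in table where ref.value == nil {
            table[key] = nil
        }
        if let existing = table[label]?.value {
            return existing // reuse the live canonical instance
        }
        let fresh = Interned(label: label)
        table[label] = WeakRef(fresh)
        return fresh
    }
}
```

Because the pool only holds weak references, interned objects die as soon as all clients release them; the per-access sweep is what keeps the table itself from leaking the boxes.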


If a real-time cleanup is not possible, maybe in this data structure we could make the cleanup strategy selectable?

enum CleanupStrategy {
    case onNextAccess
    case onMemoryPressure
    case onResize
}
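A sketch of how a container might take that strategy at init (hypothetical names; only `.onNextAccess` is actually implemented here, the other two would hook into a memory-pressure source or the resize path):

```swift
enum CleanupStrategy {
    case onNextAccess
    case onMemoryPressure
    case onResize
}

// Box holding a weak reference so storage doesn't retain elements.
final class WeakEntry<T: AnyObject> {
    weak var value: T?
    init(_ value: T) { self.value = value }
}

struct ConfigurableWeakArray<Element: AnyObject> {
    let strategy: CleanupStrategy
    private var storage: [WeakEntry<Element>] = []

    init(strategy: CleanupStrategy = .onNextAccess) {
        self.strategy = strategy
    }

    mutating func append(_ element: Element) {
        storage.append(WeakEntry(element))
    }

    mutating func liveElements() -> [Element] {
        if strategy == .onNextAccess {
            storage.removeAll { $0.value == nil } // eager O(n) sweep
        }
        return storage.compactMap { $0.value }
    }
}
```

Letting the caller pick the strategy keeps the O(n) sweep off the hot path for large tables while preserving the eager behaviour for small delegate-style collections.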

Sorry about the noise! I just started to understand the unique idiom in the extension. The .filter calls .insert on every element, and then returns the result of the filter.
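Presumably the idiom being described is this one: `Set.insert(_:)` returns a tuple whose `inserted` flag is true only for the first occurrence, so `filter` keeps each element once while preserving order.

```swift
let input = [4, 1, 2, 3, 1, 5, 2]
var seen = Set<Int>()
// insert(_:) returns (inserted:, memberAfterInsert:); `inserted` is
// false for duplicates, so filter drops repeated elements in order.
let unique = input.filter { seen.insert($0).inserted }
// unique == [4, 1, 2, 3, 5]
```

This is O(n) overall, since each `insert` and membership check is expected constant time.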

--

If the elements of the array are Hashable, like the built-in value type Int, this works:

var array = [4,1,2,3,1,5,2]
array = Array(Set(array))
print(array)
// e.g. [2, 3, 5, 1, 4] (Set iteration order is not guaranteed)

That code works, but loses the original ordering of the elements.

So something like

extension Collection where Element: Hashable {
	func uniqueElements() -> [Element] {
		var insertedElements: Set<Element> = []
		return reduce([]) { accumulator, element in
			guard !insertedElements.contains(element) else { return accumulator }
			insertedElements.insert(element)
			return accumulator + [element]
		}
	}
}

I don’t understand the distinction between an ordered set and an array. I assume there is one.

An ordered set, like a Set, enforces uniqueness of elements and has constant-time contains. An Array can have duplicate values (or values that can't be equated at all) and contains takes linear time.


Note that this implementation is accidentally quadratic. Every time around the loop, you are creating a new array (via accumulator + [element]) which takes linear time, inside a loop that takes linear time.

An alternative would be to use reduce(into:). But given the reduce needs to capture and mutate the "seen" set, the readability benefits of reduce are probably outweighed by its complexity, and a simple for loop would be better IMO, especially if you use a where clause to eliminate the inner if (guard with early return also seems over-complicated given the simplicity of the inner check):

    var seen: Set<Element> = []
    var uniqued: [Element] = []
    for element in self where !seen.contains(element) {
      uniqued.append(element)
      seen.insert(element)
    }
    return uniqued