Adding more data structures to the standard library

SwiftNIO has CircularBuffer which might be useful?

(It also has ByteBuffer which arguably fits in the stdlib remit too, but the Data question remains in the long grass).

4 Likes

This sounds like a good place to start. Additionally, given Swift’s reliance on copy-on-write I think we should be considering persistent data structures that could dramatically reduce copying for large collections (especially when they see a lot of small updates).

Specifically, we should have a type that is similar to Array but trades contiguity for sharing, similar to Clojure’s persistent vectors. The Persistence for the Masses paper looks like an approach that might be interesting to consider for Swift.

8 Likes

There was a thread on introducing a Swift version of NSOrderedSet in this thread for Swift 5. I thought it was pitched by a Swift core team member but might not have been. Seems like a good place to start for resurrecting the idea.

3 Likes

Here's a rough list comparing collections in Java and Swift standard libraries. A star next to an entry means that there is a common value outside the standard library - for Java that means Guava, for Swift that means a Foundation type without an unprefixed equivalent.

I did eliminate all values where the type only differs by synchronization/concurrency in the Java space, as the concurrency model is neither finalized nor likely to be equivalent in Swift.

I did not bother listing any of the Deque Java types, as IMHO there is no Swift equivalent to any of them.

Java Class Swift Equivalent
Sets
HashSet Set
EnumSet ---
--- OptionSet
TreeSet NSOrderedSet*
CopyOnWriteArraySet ---
LinkedHashSet ---
Maps
HashMap Dictionary
EnumMap ---
WeakHashMap NSMapTable*
IdentityHashMap NSMapTable*
LinkedHashMap ---
TreeMap NSOrderedDictionary*
Lists/Sequences
ArrayList Array
LinkedList ---
5 Likes

My opinions on targets:

  1. EnumSet and EnumMap equivalents would generate offsets for RawRepresentable enums and represent these structures as a bit vector and sparsely populated array, respectively. Such structures would require language/compiler support, and might be worth consideration. The big limitation is that such structures have no literal equivalent - something like NSAttributedString might be well-served using EnumMap, but it would complicate the visible API.
  2. NSOrderedSet / NSOrderedDictionary 'swifty' equivalents would still be useful, and would probably deserve peer review to create any base protocols.
  3. Queues/Deques are worth considering as a first-class concept, but I punted on the interface because their usage overlaps significantly with concurrency (through publish/subscribe systems).

I don't fully understand CopyOnWriteArray(Set|List), but I suspect their use cases are already covered by our value typed, CoWable Array and Set.

WeakHashMap, IdentityHashMap

Although NSMapTable covers these use cases, it's not generic and not the most elegant. Luckily, Swift allows us to implement these incredibly efficiently because we can make cheap, in-inline objects (structs).

WeakHashMap

Can be easily implemented using a weak box to wrap keys or values within a regular Dictionary:

struct Weak<T: AnyObject> {
	weak var object: T?
	
	init(wrapping object: T) {
		self.object = object
	}
}

extension Weak: Equatable where T: Equatable {
	static func == (lhs: Weak, rhs: Weak) -> Bool { 
		return lhs.object == rhs.object
	}
}

extension Weak: Hashable where T: Hashable {
	func hash(into hasher: inout Hasher) {
		hasher.combine(object)
	}
}

IdentityHashMap

Can easily implement using a weak box to wrap keys within a regular Dictionary:

struct IdentityBox<T: AnyObject>: Equatable {
	let object: T
	
	init(warpping object: T) {
		self.object = object
	}
	
	static func == (lhs: IdentityBox, rhs: IdentityBox) -> Bool { 
		return lhs.object === rhs.object
	}
}

extension IdentityBox: Hashable where T: Hashable {
	func hash(into hasher: inout Hasher) {
		hasher.combine(object)
	}
}

P.s. Can you please edit your table to add queue, deque, and stack, for sake of completeness?

2 Likes

IdentityBox is more or less typed ObjectIdentifier right?

1 Like

@Lantua It's similar, but as far as I can tell, you can create an ObjectIdentifier for a given AnyObject and store it, but once you go to retrieve it, you can't get the AnyObject instance back.

You can't do the equivalent of

let object = someObject()
let objectIdentityBox = IdentityBox(wrapping: object)
let retrievedObject = objectIdentityBox.object // There's no counterpart to this API
assert(retrievedObject === object)

Given that ObjectIdentifier's underlying data is raw pointer to the object, we may be able to expose that as AnyObject. Though I don't have enough knowledge regarding BuildIn types to say for sure.

Yeah, I can't think of any reason about why it wouldn't be possible, it's just currently not done.

Like this?

class SomeClass {
    var property: String
    init(_ property: String) {
        self.property = property
    }
}

let a = SomeClass("A")
let b = SomeClass("B")
let aID = ObjectIdentifier(a)
let bID = ObjectIdentifier(b)

func object<T>(ofType objectType: T.Type, from id: ObjectIdentifier) -> T {
    pray("object wasn't deallocated")    
    return unsafeBitCast(id, to: objectType)
}

print(object(ofType: SomeClass.self, from: bID).property) // B
print(object(ofType: SomeClass.self, from: aID).property) // A

@cukr It makes sense for ObjectIdentifier to just be a struct containing a single pointer. That makes unsafeBitCast(_:to:) work now, but I wouldn't rely on it. Cool idea thought.

You can even use that code to cast directly to an AnyObject, and it works. Still, if we decide to implement this, I'd rather just expose the property from within ObjectIdentifier.

1 Like

Oh right, OI doesn't retain the original object, recovering it could be unsafe. If it's important for IdentifierMap, we'll probably need another struct.

1 Like

Why doesn't OI retain? What use is an ObjectIdentifier pointed at an object that no longer exists?

1 Like

I don't know exactly as to why, but identifying object even after it's long gone is still very useful in database setting so I suppose it's useful in some similar case. Plus it's easier to opt-in by adding a dictionary [ObjectIdentifier: OriginalObject] while it's harder to opt-out.

In any case, the problem is not whether or not it should, but that it is.

The problem with identifying objects after they are gone, is that ObjectIdentifiers in that case aren't unique anymore, even though documentation says that they should be unique (radar 46952352)

(also sorry for derailing the thread about data structures D: )

It's dangerous though, because nothing stops allocation of a new object at the same memory address. OI would consider them as identical, even though they could be completely different objects, with different values, and even of a different type altogether.

I think it's a problem that it is that way. I would love to hear someone chime in as to what might motivate OI to not retain objects.

Hey Alexander,
This unrelated but, on what websites are you practicing?

1 Like

WeakHashMap Can be easily implemented using a weak box to wrap keys or values within a regular Dictionary

I didn't consider this a possibility because the other requirements of a Java weak map. For example:

  • The values need to potentially be dereferenced/deallocated when the keys go away
  • Iteration cannot break if such changes happen due to actions on another thread.

IdentityHashMap Can easily implement using a weak box to wrap keys within a regular Dictionary :

Yes, with an extra step to box/unbox to use. It does have the advantage over java that you can control whether null keys/values are allowed.

It's interesting to note that IdentityHashMap is a linear probe algorithm, and not a bucket-chain like normal java HashMaps.

P.s. Can you please edit your table to add queue, deque, and stack, for sake of completeness?

As I noted, I left them all off because the whole category has no real Swift equivalents to the Java API. In particular, a lot of the Queue/Deque diversity in Java is based on the early encouragement to use many blocking threads for processing (such as creating a reading and a writing thread for each socket).

If you ignore such concurrency-related methods and go for a simpler API, you can use Array as a Queue. You can use Array directly as a Deque, or wrap it with a more efficient implementation that doesn't necessarily require a full new array/copy to push/pop off the front.

Other options in Java like priority deques don't have an easy corresponding underlying implementation (well, NSSortedMap).

For those reasons I punted on representing it in the table. And punting on representing it was why I made the table and my comments on the table separate posts :-)

If you suggest what to put (Queue/Deque interfaces to nothing on Swift side?), I'll edit the post.