Does `Dictionary.Keys.contains(_:)` have `O(1)` time complexity?

This question stemmed from this StackOverflow post, in which it was called into question whether Dictionary.Keys.contains(_:) has linear or constant time complexity.

I saw this Swift evolution proposal, Provide Custom Collections for Dictionary Keys and Values, which was implemented in Swift 4 and specifically calls out this issue as something it aims to improve.

Yet, when I look at the implementation of Dictionary.Keys, I don't see anything that would suggest that it has hash-based, O(1) behavior for .contains(_:) calls. I was expecting to find an overload of contains(_:) that forwards to self._variant, but that's not the case.

So what's going on here?

Dictionary.Keys implements _customContainsEquatableElement, which is a customisation point of Sequence that's called from contains(_ element:).

Dictionary.Keys' implementation of this customisation point forwards the call on to the underlying variant buffer, which indeed performs the lookup in O(1) time. So yes, dict.keys.contains(key) is an O(1) operation.
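A quick, unscientific way to see the difference for yourself is to time a lookup through Dictionary.Keys against one through an Array copy of the keys. This is just an illustrative sketch (the names and sizes are arbitrary, not from the thread):

import Foundation

// Build a large dictionary and an Array copy of its keys.
let bigDict = Dictionary(uniqueKeysWithValues: (0..<1_000_000).map { ($0, "\($0)") })
let keysAsArray = Array(bigDict.keys)

// Crude wall-clock timing helper.
func measure(_ label: String, _ body: () -> Void) {
    let start = Date()
    body()
    print(label, Date().timeIntervalSince(start))
}

measure("Dictionary.Keys.contains") { _ = bigDict.keys.contains(999_999) }  // hash lookup, expected O(1)
measure("Array.contains")           { _ = keysAsArray.contains(999_999) }   // linear scan, O(n)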

3 Likes

Hey Hamish. Great find!

Do you know why this was implemented in this way? What's the advantage over having types merely provide their own contains(_:) function?

The problem with making contains(_:) itself a customisation point is that it requires Element to conform to Equatable, a constraint that cannot be expressed on a customisation point; e.g. we cannot say:

protocol Sequence {
  associatedtype Element

  // ...
  func contains(_ element: Element) -> Bool where Element : Equatable
}

and there's no sensible default implementation for the case where Element isn't Equatable.

_customContainsEquatableElement solves this problem by returning a Bool?: the default implementation returns nil, which covers both the case where Element isn't Equatable and the case where the type doesn't provide a custom implementation. It's underscored because it's an implementation detail of the conforming type.
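As a rough sketch (simplified, not the actual standard library source), the default contains(_:) does something along these lines, consulting the hook first and only falling back to a linear scan if it returns nil:

extension Sequence where Element: Equatable {
    // Hypothetical name; the real method is simply contains(_:).
    func sketchContains(_ element: Element) -> Bool {
        // Dictionary.Keys answers here via a hash-based lookup;
        // most sequences return nil from the default implementation.
        if let answer = self._customContainsEquatableElement(element) {
            return answer
        }
        // Fallback: O(n) linear scan.
        for candidate in self where candidate == element {
            return true
        }
        return false
    }
}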

Being a customisation point matters because customisation points are dynamically dispatched, so they are honoured even when called from a generic context. If Dictionary.Keys had just defined its own contains(_:) method, it would be called for a direct call to dict.keys.contains(key), but it would not be called from a generic context such as:

func foo<S : Sequence>(_ sequence: S, element: S.Element) -> Bool where S.Element : Equatable {
  return sequence.contains(element)
}

let dict = ["": 0]
print(foo(dict.keys, element: ""))
6 Likes

We have in the past discussed allowing 'where' clauses for non-generic declarations too, as long as they're nested inside a generic context (however, there might be other reasons to implement contains(_:) the way it's implemented today).

4 Likes

The Swift documentation says it is O(n). Is the documentation wrong?

Looks like that documentation is just inherited from Sequence.contains(_:)?

As I learned the definition, the documentation is right, but it has little real-life relevance:
In the worst case, an equality check has to be performed against every member of the dictionary (O(n)). However, "worst case" means hash collisions for every key, which is very unlikely. The common case (no collisions) completes in constant time (you just can't guarantee it).
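To make the worst case concrete, here's a small sketch (mine, not from the thread) of a key type whose hash deliberately ignores its value, so every key lands in the same bucket and lookups degrade to the linear behaviour the documentation allows for:

// A deliberately bad key: every value hashes identically, so all keys collide.
struct CollidingKey: Hashable {
    let value: Int
    func hash(into hasher: inout Hasher) {
        hasher.combine(0)   // ignores `value`, forcing one long collision chain
    }
}

var dict: [CollidingKey: Int] = [:]
for i in 0..<1_000 {
    dict[CollidingKey(value: i)] = i
}
// This lookup now walks the whole collision chain: O(n) equality checks.
print(dict.keys.contains(CollidingKey(value: 999)))   // true, but slowly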

1 Like

keys.contains(someKey) is O(n); dict[someKey] != nil would be O(1). – jscs Mar 15 '19 at 17:26
He is right.

Both should be worst-case O(n) and average-case O(1), as is typical of hash tables.

4 Likes

"keys.contains(someKey) is O(n)" is wrong, I suppose? As the posts above mentioned:

Dictionary.Keys' implementation of this customisation point forwards the call on to the underlying variant buffer, which indeed performs the lookup in O(1) time. So yes, dict.keys.contains(key) is an O(1) operation.

1 Like

Apple documentation says Dictionary.keys.contains is O(1): Apple Developer Documentation.
Dict.keys returns an array, and array.contains is O(n).

No it doesn't, it returns a Dictionary.Keys: keys | Apple Developer Documentation
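An easy way to check in a playground (my own snippet, not from the thread):

let dict = ["a": 1, "b": 2]
print(type(of: dict.keys))       // a Dictionary<String, Int>.Keys view, not [String]
print(dict.keys.contains("a"))   // hash-based lookup, expected O(1)

// An Array is only created if you ask for one explicitly,
// and that copy's contains(_:) is the O(n) linear scan:
let keyArray = Array(dict.keys)  // [String]
print(keyArray.contains("a"))    // O(n)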

3 Likes

if the worst case is O(n) and the best case is O(1), shouldn't that make the average case (ever so) slightly worse than O(1)?

the app below shows more fine-grained O() behaviour. it uses Set instead of a dictionary, but i believe the behaviour with Dictionary would be similar, as both Set and Dictionary are implemented using hash tables. it populates a set with numbers between 0 and N and then searches for these numbers, calculating the average number of EQ calls per lookup. it also outputs the min and max number of EQ calls per lookup.

here X is N and Y is the average number of EQ calls per element.
if we consider the segment 12K to 24K, there's a (not so) linear growth of EQ calls with N. then the number of EQ calls drops (presumably the hash table doubled and all elements moved to a bigger one), and then it grows again between 25K and 49K. then it drops again, and so on.

the app
import Foundation

struct S: Equatable, Hashable {
    let value: Int
    static var eqCount = 0   // counts how many times == has been called
    
    public static func == (a: S, b: S) -> Bool {
        eqCount += 1
        return a.value == b.value
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(value)
    }
}

var n = 100

print("n\tavg\tmin\tmax")

while n < 10_000_000 {
    var s: Set<S> = []
    
    // populate the set with values 0 ..< n
    for i in 0 ..< n {
        s.insert(S(value: i))
    }

    S.eqCount = 0
    var min = Int.max
    var max = 0

    // look up every value, recording how many == calls each lookup takes
    for i in 0 ..< n {
        let old = S.eqCount
        let has = s.contains(S(value: i))
        let cur = S.eqCount - old
        if max < cur { max = cur }
        if min > cur { min = cur }
        assert(has)
    }
    let avg = Double(S.eqCount) / Double(n)
    print("\(n)\t\(avg)\t\(min)\t\(max)")
    n += 100
}

// set the SWIFT_DETERMINISTIC_HASHING environment variable to 1 for reproducible results

/* beginning of the output
n    avg    min    max
100    1.37    1    6
200    1.24    1    6
300    1.6866666666666668    1    13
400    1.2925    1    9
500    1.472    1    12
600    1.7816666666666667    1    17
700    2.2314285714285713    1    41
800    1.32875    1    8                ****** drop (table doubled)
900    1.4066666666666667    1    16
1000    1.498    1    16
1100    1.6336363636363636    1    18
1200    1.8091666666666666    1    26
1300    1.96    1    36
1400    2.138571428571429    1    36
1500    2.4513333333333334    1    44
1600    1.306875    1    9              ****** drop (table doubled)
1700    1.3429411764705883    1    9
1800    1.3894444444444445    1    12
1900    1.4231578947368422    1    14
2000    1.445    1    14
2100    1.4928571428571429    1    17
2200    1.535909090909091    1    17
2300    1.598695652173913    1    21
2400    1.6529166666666666    1    22
2500    1.6948    1    22
2600    1.7876923076923077    1    22
2700    1.875925925925926    1    22
2800    1.9728571428571429    1    23
2900    2.0717241379310343    1    28
3000    2.201    1    31
3100    1.3048387096774194    1    19   ****** drop (table doubled)
3200    1.32375    1    19
3300    1.346060606060606    1    19
*/
3 Likes

Dictionary.Keys implements contains efficiently, with the same lookup algorithm used by Dictionary's keying subscript, or Set.contains().

The expected number of hashing/equality operations performed by this method is an O(1) function, as long as the key type implements Hashable correctly. (I.e., by feeding all distinguishing bits into the hasher in its hash(into:) implementation.) The maximum number of operations performed is O(n), where n is the number of items in the dictionary.

That isn't quite how these asymptotic bounds work. For example, if f(n) = 1 almost all the time, except for once in a blue moon when f(n) = n, then f(n) is still O(n) -- as long as there isn't just a finite number of these slow cases.

(And not just because every O(1) function is also O(n), O(n^2), O(2^n) etc.)

Assuming that the contents of a hash table are "random enough" and that the hash function is "good enough", lookup operations are expected to perform O(1) hashing/equality operations. However, hash tables are probabilistic data structures: for every hash function, no matter how good, a tiny fraction of input data will unavoidably generate collision chains of a size that is proportional to the number of keys in the table. (Hashes map a (potentially) infinite set of values to a finite range of integers, so if the input type is "large enough", we can always find an arbitrarily large set of input values that map to the same hash value. The challenge is to make sure these will only rarely occur in practice.)

So technically, the cost of looking up items in a Set or Dictionary isn't an O(1) function -- the best upper bound is O(n). That said, with proper hashing, the expected value of the lookup cost is an O(1) function; linear-time lookups are vanishingly rare.

Hasher is (supposed to be) a "good enough" hash function even if the distribution of the input data is unknown. And with random seeding, it's (supposed to be) good enough even in the face of maliciously chosen data -- until and unless the seed becomes known to the attacker.

However, Set/Dictionary analysis is complicated by the fact that their performance tightly depends on the quality of the key's Hashable implementation. All bets are off if a key type implements hashValue rather than hash(into:), or if it feeds less data to the hasher than what it compares in ==. Generally, these mistakes make it trivial to generate collisions -- so they are a really, really bad idea if the input data isn't 100% trusted.
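For example (my own sketch, not from the post), a key type whose == compares more state than hash(into:) feeds to the hasher makes such collisions trivial to produce:

// Broken Hashable: == compares both fields, but only `id` is hashed.
struct BrokenKey: Hashable {
    let id: Int
    let name: String

    static func == (a: BrokenKey, b: BrokenKey) -> Bool {
        a.id == b.id && a.name == b.name
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(id)   // `name` is never hashed, so any two keys sharing an id collide
    }
}

// 10,000 distinct keys that all share the same hash value:
let keys = (0..<10_000).map { BrokenKey(id: 1, name: "key-\($0)") }
let set = Set(keys)   // every insert probes the same collision chain
print(set.contains(BrokenKey(id: 1, name: "key-9999")))   // true, after O(n) == calls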

Would it be reasonable to expose this much detail in API docs? Sadly, no. FWIW, the OrderedSet docs attempt to provide guidance without delving too much into the "scary" details.

the app below shows more fine-grained O() behaviour.

This is an excellent demonstration!

This behavior is also visible (albeit much less clearly) in benchmarking charts -- by the tell-tale sawtooth curve in elapsed time.

14 Likes

Karoy, quite interesting analysis, thanks.

in many cases the whole set of elements is finite, thus the number of "fast" and "slow" cases is obviously finite. examples: plain (C style) enum, int(8/16/32/64), a string of a fixed length, a Data of a fixed length, or an ADT holding the above.

Yes! However, saying that "lookups in a Set<Int> are O(1), because they will never execute more than 2^64 comparisons" would sound rather hollow to my ears. It would be highly misleading!

The Set and Dictionary implementations we actually have in the stdlib are finite data structures -- they cannot hold more than 2^63 - 1 items. (The actual limit is a tiny, tiny fraction of this, as the address spaces of the architectures Swift runs on are much, much smaller than that -- not to mention the size of physically addressable memory on any actual machine.) So no matter what the key type is, and no matter how it implements hashing, set lookups in the stdlib will never execute more than 9,223,372,036,854,775,807 comparisons. This is true even if the keys are Strings of "arbitrary" sizes. However, this emphatically does not mean that set lookups perform O(1) comparisons in the worst case!

Click to expand some nonsensical ramblings

The big O notation is concerned with a function's behavior "at infinity" -- it isn't even well-defined for functions that only have values over a finite set of inputs. So the performance guarantees we document technically aren't about the concrete implementations we are actually running on our computers -- they are about some idealized, abstract versions that are infinitely scalable. (As far as I'm aware, these platonic data structures cannot ever be truly implemented in our physical universe. Our computers are just fancy finite state machines: they have a very finite storage capacity. They cannot even truly represent simple integer values! Everything is a lie.)

The best we can do is to come up with a formula that expresses, say, the expected number of comparisons executed while looking up a value in a hash table of a certain fill factor, assuming the hash function distributes keys evenly. Then, we implicitly extrapolate this function over every integer value, so that we can simplify it using asymptotic analysis. (Even though this extrapolation is somewhat questionable, when you think about it.)

For example, it wouldn't be very practical to try documenting performance with precise formulas like 3 * n^2 + 42 * n + 1 -- it seems much preferable to use a hand-wavy O(n^2) instead. (Note that the linear term dominates the quadratic for n < 14 in this case; strictly speaking this makes the big-oh simplification less useful, because it discards this information.)

The implicit assumption is that the growth rates that we express using the big-oh shorthand will tell us something useful about behavior at empirically reproducible (or rather, commonly occurring) inputs. One aspect of this is that the constants involved must be small enough -- it doesn't matter if an algorithm technically takes constant time to execute if that constant time is measured in trillions of years.

In the case of Set, Dictionary, Array etc, the big-oh notation works great. The benchmark plot above experimentally demonstrates that Set<Int>.contains is a roughly "flat" function over input sizes between 1 and 16 million, while Array<Int>.contains is roughly linear over 4..<32k. The chart does not prove that Set.contains is O(1) or Array.contains is O(n) -- but it does show that the large-scale behavior of the idealized implementations are a reasonably good predictor of the performance of the actual small-scale data structures, on the particular set of input data that was measured by the benchmark.

The big-oh notation is all about omitting details and (over)simplifying things -- the art of it is in making sure that the resulting simple model still tells us something useful of the actual observable behavior.

8 Likes

On second thought, let's not talk about Data.

1 Like

I often jest "Everything is O(1) if your constant is large enough" :crazy_face: but I digress...

10 Likes

One of my favorite strange complexity facts related to this is that there are more efficient ways to multiply numbers than the obvious ones, but the big-O improvement only starts winning over the higher constant factor at scales most people don't care about. e.g. Fürer's algorithm starts winning above 2^2^64.

2 Likes

how would you explain the growth of the graph of the Set benchmark after 100K? as you've said, we can't just draw a constant line higher, the same way we can't say that "O(N^2) == O(1)" because N is actually finite.

i wonder what makes you say so?