Adding more data structures to the standard library

Note that this implementation is accidentally quadratic: every time around the loop you create a brand-new array (via accumulator + [element]), which takes linear time, inside a loop that itself runs a linear number of times.
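The shape in question is roughly this (a reconstruction for reference, not the exact original code):

    extension Sequence where Element: Hashable {
      func uniquedQuadratic() -> [Element] {
        var seen = Set<Element>()
        return reduce([Element]()) { accumulator, element in
          guard !seen.contains(element) else { return accumulator }
          seen.insert(element)
          // Copies the whole accumulator: O(n) per step, O(n^2) overall.
          return accumulator + [element]
        }
      }
    }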

An alternative would be to use reduce(into:). But given that the reduce needs to capture and mutate the "seen" set, the readability benefits of reduce are probably outweighed by its complexity, and a simple for loop would be better IMO, especially if you use a where clause to eliminate the inner if (guard with an early return also seems over-complicated, given the simplicity of the inner check):

    for element in self where !seen.contains(element) {
      uniqued.append(element)
      seen.insert(element)
    }
4 Likes

I had a hunch that my implementation wasn't that performant, and that these algorithms are almost always better off with a for-in :sweat_smile:

You might also gain a slight edge by using the fact that inserted tells you whether the element was already there, but I'd want to do some benchmarking on different kinds of types to check that it was genuinely faster.

    for element in self {
      let (inserted, _) = seen.insert(element)
      if inserted { uniqued.append(element) }
    }
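
Something like this would do for a rough first check (an untested sketch; a real benchmarking harness would be more trustworthy):

    import Foundation

    // Hypothetical micro-benchmark: same input, two loop variants.
    let input = (0..<1_000_000).map { _ in Int.random(in: 0..<1_000) }

    func time(_ label: String, _ body: () -> Void) {
      let start = Date()
      body()
      print(label, Date().timeIntervalSince(start))
    }

    time("contains, then insert") {
      var uniqued = [Int]()
      var seen = Set<Int>()
      for element in input where !seen.contains(element) {
        uniqued.append(element)
        seen.insert(element)
      }
    }

    time("insert, check inserted") {
      var uniqued = [Int]()
      var seen = Set<Int>()
      for element in input {
        if seen.insert(element).inserted { uniqued.append(element) }
      }
    }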
4 Likes

If you're using reduce to build up an array, it's a really strong indication you should probably actually use map (keeps a 1-to-1 relationship between input and output elements), flatMap (1 to many), or filter (many to fewer).
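
A toy illustration of how each shape reads (hypothetical values, just for contrast):

    let numbers = [1, 2, 3, 4]

    // map: 1 to 1 (each input element produces exactly one output element).
    let doubled = numbers.map { $0 * 2 }            // [2, 4, 6, 8]

    // flatMap: 1 to many (each input element can produce several outputs).
    let repeated = numbers.flatMap { [$0, $0] }     // [1, 1, 2, 2, 3, 3, 4, 4]

    // filter: many to fewer (each input element is kept or dropped).
    let evens = numbers.filter { $0 % 2 == 0 }      // [2, 4]

    // The same `doubled` via reduce works, but hides the 1-to-1 shape.
    let doubled2 = numbers.reduce(into: [Int]()) { $0.append($1 * 2) }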

I've already shown a good solution here: Adding more data structures to the standard library

2 Likes

I think I fixed your problem with doubly linked list indices: I abstracted away the nodes and stored a reference to them in the index itself. Here is the code for the type.

I'd recommend against filter for this, for the exact same reasons as reduce. You are closing over your "seen" set and making assumptions about how filter works by mutating external state on every call. They're fairly safe assumptions in this case, but still, the for loop looks much simpler to my eyes. This general desire to force problems into stock higher-order functions is an anti-pattern that should be discouraged IMO.

If you really want to use them, reduce(into:) can be written to pass along both the built-up array and the seen set to each closure call.

3 Likes

Hey Ben, thanks for taking the time to write!

This is kinda funny, because in my head I have your name tagged as "that guy who points out accidentally quadratic code" (from your talks at WWDC 2017 and the Functional Swift Conference 2018).

The only assumption here (I think) is that multiple calls to the closure can't happen in parallel; otherwise the seen set would need to be synchronized. But as you said, that's a safe call to make.

I was on the fence about this, but I eventually settled in the other camp.

filter is a more specific construct than for. When you see filter, you instantly know a few things, without even needing to look at the details:

  • output.count ≤ input.count, and
  • the input and the output have the same element type.

for always needs case-by-case reading to infer anything beyond "something happens 0, 1, n, or infinitely many times here." I make the same argument against using reduce when other APIs will do the same thing. reduce, like for, is really general. That means you can do a lot with it, but it also means you don't know anything about what it's doing unless you look at it carefully.

I've heard from many people who don't like the idea of having side effects in closures passed to otherwise-pure higher-order functions, but personally, I see this as idealism impinging on pragmatism. I value the readability of that code over function purity in cases like this, where the purity doesn't add value.

I'm not saying function purity doesn't add value, but I am saying that seeking function purity for purity's sake is a non-goal of mine. If the purity helps with parallelism, reasoning about code, etc., then by all means, go for it. I usually start by aiming to write pure functions when possible, but I'm constantly weighing how much complexity I'm accumulating, how much value it adds, and what a "competing" impure implementation would look like. Often, pure functions can be more complex than impure ones, so there's a cost to consider. Consider this (extreme) example of trying to implement this code in a pure way (ignoring the reduce(into:) vs reduce(_:) distinction):

    extension Sequence where Element: Hashable {
      // Please don't abuse `reduce` like this.
      // Note: the empty array needs an explicit type here; a bare `[]`
      // can't be inferred inside the tuple.
      func uniqued1() -> [Element] {
        return reduce(into: (result: [Element](), seen: Set<Element>())) { accumulator, element in
          if accumulator.seen.insert(element).inserted {
            accumulator.result.append(element)
          }
        }.result
      }

      func uniqued2() -> [Element] {
        var seen = Set<Element>() // mutated by capture, but so simple!
        return self.filter { seen.insert($0).inserted }
      }

      // The "if it ain't broken, don't fix it" approach
      func uniqued3() -> [Element] {
        var uniqued = [Element]()
        var seen = Set<Element>()

        for element in self where !seen.contains(element) {
          uniqued.append(element)
          seen.insert(element)
        }

        return uniqued
      }
    }

The second and third implementations are my personal favorites: #2 is the simplest if you're familiar with Swift (IMO), and #3 is the most all-around simple and approachable one for people who are less familiar with Swift. But #1 is the worst of all. It "adds" function purity, but piping multiple pieces of state along via parameters instead of closing over them is messy. This is precisely what closures are for, after all: to easily capture necessary state and move us past the days of userInfo: parameters.
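
To make the comparison concrete, a quick (hypothetical) usage example; all three preserve first-occurrence order:

    let input = [3, 1, 3, 2, 1]
    input.uniqued1() // [3, 1, 2]
    input.uniqued2() // [3, 1, 2]
    input.uniqued3() // [3, 1, 2]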

7 Likes

For the SortedSet type I just wrote, I added a debouncing lazy filter, which gets rid of consecutive duplicates. After some confusion over terminology, I also wrote another filter that gets rid of any duplicate. The latter uses an object that conforms to SetAlgebra, not just Set. If you allow custom set types, you can run the algorithm on non-Hashable elements, or with custom equivalence tests, like case-insensitive strings. And since I pass the set as a function argument, I have the option to pre-load a ban list, whose elements appear zero times instead of at most once.
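
The idea, roughly (a simplified sketch, not the actual implementation; the method name is made up):

    extension Sequence {
      // Uniquing over any SetAlgebra type: elements need not be Hashable,
      // and pre-loading `seen` turns it into a ban list.
      func uniqued<S: SetAlgebra>(seen: S) -> [Element] where S.Element == Element {
        var seen = seen
        return filter { seen.insert($0).inserted }
      }
    }

    // e.g. with a pre-loaded ban list: "b" appears zero times in the output.
    ["a", "b", "a", "c"].uniqued(seen: Set(["b"])) // ["a", "c"]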

FYI, ObjC's NSArray supports efficient prepend (and append); I don't remember the exact details (as in, the first prepend may be inefficient). I don't know if most Swift Arrays do, but a Swift Array can be backed by an NSArray, so if you really wanted to make sure one was efficient as a deque, you could make sure it was a bridged NSArray...

(And really, for a code challenge it ought to be sufficient to say "typealias Deque = Array // Assume O(1) prepends" as opposed to actually needing one...)

Also, we seem to be leaving out in-memory B-trees (see https://github.com/attaswift/BTree for a good Swift implementation); they cover a lot of ground for efficient storage.
