Differences between `map` and `filter` implementations

Checking the code on GitHub, I found that there are some differences between how map and filter work in Sequence. For example filter supports regrowth while map doesn't.
Is there any rationale behind this?

@inlinable
  public func map<T>(
    _ transform: (Element) throws -> T
  ) rethrows -> [T] {
    let initialCapacity = underestimatedCount
    var result = ContiguousArray<T>()
    result.reserveCapacity(initialCapacity)

    var iterator = self.makeIterator()

    // Add elements up to the initial capacity without checking for regrowth.
    for _ in 0..<initialCapacity {
      result.append(try transform(iterator.next()!))
    }
    // Add remaining elements, if any.
    while let element = iterator.next() {
      result.append(try transform(element))
    }
    return Array(result)
  }
@inlinable
  public __consuming func filter(
    _ isIncluded: (Element) throws -> Bool
  ) rethrows -> [Element] {
    return try _filter(isIncluded)
  }

  @_transparent
  public func _filter(
    _ isIncluded: (Element) throws -> Bool
  ) rethrows -> [Element] {

    var result = ContiguousArray<Element>()

    var iterator = self.makeIterator()

    while let element = iterator.next() {
      if try isIncluded(element) {
        result.append(element)
      }
    }

    return Array(result)
  }

What do you mean by "regrowth"? Aside from the fact that map assert that the first underestimatedCount items exist, they don't look much different for me.

With regrowth I mean that in map if in the closure transform elements are added into the original array, those elements aren't taking into account. Instead, in filter if in the closure isIncluded elements are added into the original array, those are considered and the closure is applied also to them.
I found it curious, I don't know if there's a specific reason behind it.

This is not a correct understanding of the code you are reading. Both apply the given closure to each element.

The comment about "rechecking for regrowth," as @Lantua mentions, refers to checking whether iterator.next() is nil or not; this is unnecessary in the implementation of map for the first initialCapacity elements because of the semantics of map:

Every element is transformed and the transformed result added to the return value of map (that is just what it means to map), so we can reserve underestimatedCount elements as the initialCapacity for the return value. Since we have performed that computation, we can make use of the result to skip testing for nil for the first initialCapacity iterations of the loop. Since we have reserved that capacity for the return value, we also know that the return value doesn't have to allocate memory during the first initialCapacity iterations.

By contrast, every element is transformed and, if the transformed result is true, the original value is added to the return value of filter (that is just what it means to filter). Therefore, we cannot reserve some number of elements beforehand for the return value based on the number of elements in the input (self), and without information about the number of elements in the input, we test for nil on every iteration of the loop.

2 Likes

Just to add, for most of the Collections that make use of Sequence APIs, the underestimatedCount is the same as count. In those cases, you wouldn't even need to iterate the second loop.

In addition to the comments above about how the regrowth is unnecessary, the operation you describe here is also unlikely to do what you want it to do.

In general, modifying a Swift sequence while iterating it has no defined semantics. What it will do depends on the collection and the mode of iteration. In some cases you will trigger copy-on-write, meaning that you continue to iterate the original copy of the sequence. In other cases you will trigger exclusive access violations. In other cases still you will invalidate your indices and make the iteration subtly unsafe, potentially crashing.

Do not modify the sequence you are iterating during the iteration.

1 Like

@andreaantonioni You might find this relevant

I think the comment in the source code is out of date.

It was added in https://github.com/apple/swift/commit/0c39db22bc420c8331f5a430dcbe67ed4d1d4b5d#diff-1925379b2182a63848c1c7f7f94d3baaf0c66370ba76e46b13be8e5853208cb4R282 which was using a type that seemed to have a special way to add elements in a pre-allocated space, and elements beyond the pre-allocated space.

I don't know if using iterator.next() instead of while let element = iterator.next() gives you a little performance boost (this seems to be @xwu theory in their answer), but in that case the same optimization should be possible in filter (but filter also has more code inside the loop, the isIncluded call, so the optimization might not work as well as in map).

Terms of Service

Privacy Policy

Cookie Policy