Add SequenceInstantiable, map(into:) to the stdlib, make map, filter, reduce etc. lazy by default

(Logan Shire) #1

The fact that map, filter, reduce etc return arrays by default is a consistent point of frustration for myself and many other users. Having to jump through a bunch of hoops just to avoid unnecessary intermediate arrays with .lazy all over the place is annoying at best and hazardous at worst (especially for large collections).

Just a couple of weeks ago @tkrajacic posted hoping for a way to easily map from a Set to a Set which unfortunately isn't really possible without Higher Kinded Types. That said, we could still make the current APIs much more ergonomic and mapping sequences lazy by default.

I propose adding a SequenceInstantiable protocol to the standard library which is defined as follows:

protocol SequenceInstantiable: Sequence {
    associatedtype SequenceElement
    init<S>(_ s: S) where S: Sequence, S.Element == SequenceElement
}

We could then make Array, Set, Dictionary and all of the other concrete standard library sequence types implement it in extensions:

extension Array: SequenceInstantiable {
    typealias SequenceElement = Element
}

extension Set: SequenceInstantiable {
    typealias SequenceElement = Element
}

extension Dictionary: SequenceInstantiable {
    typealias SequenceElement = (Key, Value)
    init<S>(_ s: S) where S: Sequence, S.Element == (Key, Value) {
        self.init(uniqueKeysWithValues: s)
    }
}

This would allow us to provide a map(into:) function where you can feed an arbitrary sequence into a type's constructor:

extension Sequence {
    func map<S>(into: S.Type) -> S where S: SequenceInstantiable, S.SequenceElement == Element {
        return S(self)
    }
}

This function would allow us to chain lazy functional operators and then finally pipe it into the desired type:

let mappedValues = setOfValues.lazy
    .map { ... }
    .filter { ... }
    .map(into: Set.self)

I'd also, perhaps more controversially, like to propose making map, filter et al lazy by default. I.e. we'd replace them with, essentially:

extension Sequence {
    func newFilter(_ isIncluded: @escaping (Element) -> Bool) -> LazyFilterSequence<Self> {
        return self.lazy.filter(isIncluded)
    }

    func newMap<U>(_ transform: @escaping (Element) -> U) -> LazyMapSequence<Self, U> {
        return self.lazy.map(transform)
    }
}

I believe making these functions lazy by default will save engineers from more bugs than any additional frustration this might cause. We can maintain almost complete source compatibility by writing another extension:

extension Sequence {
    func newMap<S, U>(_ transform: @escaping (Element) -> U) -> S where S: SequenceInstantiable, S.SequenceElement == U {
        return S(self.lazy.map(transform))
    }
}

Which will provide the identical behavior to today's map assuming the type can be inferred to be an array. If the type cannot be inferred to be an array, the Xcode source migrator could add an explicit array type annotation.

Aside: While playing around with this in a playground, I noticed that the current lazy sequence map, filter etc do not take throwing transforms, while the regular ones do. I wasn't sure if this was an oversight or if this was intentional.

(Letanyan Arumugam) #2

This still suffers from a performance loss as you still need to create another collection after you create an array. I'd much rather a solution that avoids this unnecessary allocation. Something like:

extension Set {
  func map<NE: Hashable>(_ transform: (Element) -> NE) -> Set<NE> {
    var result = Set<NE>()
    for e in self {
      result.insert(transform(e))
    }
    return result
  }
}

This solution doesn't account for moving from one type of collection to another, but Swift has been quite clear that type conversions should be done using initializers. This is why we have Double(_:Int) and not Int.doubleValue() for instance.

(Thomas Krajacic) #3

It's probably around half a year ago that I asked about this, and the problem with mapping a Set is that the result could easily have a different number of items (because of uniquing). Please don't push other people in front of your proposal. If you must, link to their post.

Edit: It's more than a year ago and here is my post: Provide map and flatMap for Set<A> that return a Set

(Ben Cohen) #4

This pitch covers a lot of different topics, and I'd suggest breaking it out into separate threads, to help keep focus on the parts that are more likely to make progress. For example, making map and filter lazy by default is pretty well-trodden ground at this point. It is unlikely to get traction both for the original reasons they aren't and for source-compatibility reasons.

When proposing a new (non-existential) protocol, there is one question that needs to be answered above all things: what new generic code could be written using this protocol? In concrete cases, there is no need for a protocol. If you want a Set, you can use it's initializer from a sequence. It's only if you want to write generic code that abstracts over multiple different types, including Set, that the protocol is needed.

The standard library already has a protocol that allows you to initialize many collections from a sequence: RangeReplaceableCollection. So the question is, is there a common need to write code that is generic over both Set or a potential SortedSet, which aren't range-replaceable, and range-replaceable things like Array or a linked list?

I would say almost certainly not. Set types behave extremely differently to an Array when you construct them from a sequence. They discard elements, they don't preserve ordering. So the number of cases where you would write generic code that genuinely doesn't care about these differences is likely very small.

This pitch cites one such case: being able to write a function that can be tacked onto the end of a chain that then constructs such a type. The goal here is to allow chaining, and I completely agree that the left-right-left tennis match caused by Swift's initiailizer mechanism is very annoying. But this is just a solution for one specific case, of chains of sequence operations. There are other times when having to zip to the left and wrap the whole expression is irritating:

Float(int1 + int2 + int3)

Ideally we would solve this problem across the board. This could be solved at the language level. But there are also library-level solutions. I'm a supporter of adding pipe-forward to the standard library for this:

infix operator |>

func |> <T,U>(lhs: T, rhs: (T)->U) -> U {
  return rhs(lhs)
}

let set = (0..<10).lazy.map(String.init) |> Set.init
let float = (1 + 2 + 3) |> Float.init
9 Likes
(Rudolf Adamkovič) #5

I'm +1 on the idea but −1 on the map(into:) naming and the |> operator. What I would like to see is a simple method along these lines:

extension Whatchamacallit {
    func apply<T>(_ closure: (Self) -> T) -> T {
        return closure(self)
    }
}

Unlike the |> operator, this would read like today's Swift:

valuesIfExist
    .compactMap { value in value }
    .filter { value in value <= maxValue }
    .map { value in value + 1 }
    .apply(Set.init)

The question is, where should the apply method live? It is potentially useful with any type except Never. If Any could be extended, this would be a great addition...