Pre-pitch: striding through collections should be easier

collection
stdlib

(Zachary Waldowski) #1

stride(from:through:by:) and stride(from:to:by:) are of great utility, but their relationship to collection indices — or, rather, their lack thereof — is optimized for the old world where indices were all Strideable. There's no real equivalent for the present world, and String ergonomics are really hurting for it. It seems to me such a utility built for Collection would round out dropFirst, prefix, split et. al. the same way stride rounds out +, min, et. al. There's been a bunch of prior art to this (1, 2, 3).

Building an iterative version for forward collections into the stdlib would perform a lot better than hand-done versions, which typically frankenstein some assemblage of count, dropFirst, and prefix from Stack Overflow and unknowingly walk the collection many times. A straw-man implementation:

extension Collection {

    func split(every stride: Int) -> AnySequence<SubSequence> {
        return AnySequence { () -> AnyIterator<SubSequence> in
            var start = startIndex
            let end = endIndex
            return AnyIterator { () -> SubSequence? in
                guard start != end else { return nil }
                let slice = self.suffix(from: start).prefix(stride)
                start = slice.endIndex
                return slice
            }
        }
    }

}

"Hello, playground".stride(by: 5) // => "Hello", ", pla", "ygrou", "nd"

A real version would use concrete types and conditional conformances instead of AnySequence. Ideally we could even use opaque types and not need to worry about bikeshed-perfect names. :wink:

If there's interest, I'll start working a proposal based on the sketch above.


(Daryle Walker) #2

You may need to update your Swift reading material. One problem with Swift's pace is that there are already multiple generations of techniques that have been made completely obsolete by later updates. You probably shouldn't do this in Swift 3, and definitely not in Swift 4 and later. Nowadays, you would make a lazy Collection that wraps the target collection and also takes a stride amount. I think I tried it once, but it's not among my Gists.

struct StridingCollection<Base: Collection> {
    let base: Base
    let span: Int

    let stragglerCount: Int

    init(_ base: Base, stride: Int) {
        precondition(stride > 0)

        self.base = base
        span = stride

        stragglerCount = base.count % span
    }
}

extension StridingCollection: Collection {
    var startIndex: Base.Index { return base.startIndex }
    var endIndex: Base.Index { return base.endIndex }

    func index(after i: Base.Index) -> Base.Index {
        precondition(i < endIndex)

        return base.index(i, offsetBy: span, limitedBy: base.endIndex) ?? base.endIndex
    }
    subscript(position: Base.Index) -> Base.SubSequence { return base[position..<index(after: position)] }
}

If you check that Base is RandomAccessCollection, you could make this conform to BidirectionalCollection and RandomAccessCollection support (by adding index(before:) and overriding distance and index(_: offsetBy:).) This code here is very quick and dirty; my attempts to make it bidirectional failed.


(Zachary Waldowski) #3

What is "this"? Writing functions? I don't understand what point you're making.


(Daryle Walker) #4

Channeling code through closures hiding in anonymous types like AnySequence, instead of using custom lazy sequence/collection types that can expose more information to the user and other parts of your code.


(Nate Cook) #5

This is definitely a useful thing to add, and you're definitely right that stride(from:to:by:) and its companion were more suited to when we could call successor() on indexes. A couple notes:

  1. We don't need to put the start and end indexes as parameters, since that gets handled by slicing the collection. That is, you'd want to write array[5..<13].stride(by: 2).
  2. There are two related but distinct utilities here, what I would call chunking or splitting a collection (which you're calling striding), and what I think of as striding. Just as stride(from:to:by:) gives individual values in the e.g. Int space, I would expect "striding" by 3 to give me the 1st, 4th, and 7th elements of a collection, and so on, not the subsequences of length 3.

There's an existing proposal for the functionality you're describing, with the name chunks(of:), here: https://github.com/apple/swift-evolution/pull/935


(Zachary Waldowski) #6

The post mentions twice that the specific AnySequence implementation is not what is being proposed; it’s the syntax it yields that is being proposed.


(Zachary Waldowski) #7

Both excellent points, thank you! I can incorporate those notes and see if I can get in touch with Luciano on advancing the proposal.


(Daryle Walker) #8

I just threw together a quick type for this (with Swift 5 beta) in my Gists.