Sequence/Collection Enhancements


(Ben Cohen) #1

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss additive algorithms for Sequence and Collection.

Here is a list of commonly requested algorithms to operate on Sequence or Collection:

In-place transformations:
transform elements in a MutableCollection using a closure i.e. in-place map
remove(where:) that takes a predicate i.e. in-place filter
remove(indices:) that takes a Sequence of Index
bulk operations (replace, remove, insert) on RangeReplaceableCollection for a Sequence of subranges
Sorting:
change sort/sorted/partition to be stable
possibly add an explicitly unstable sort
provide partial sorting i.e. the _n_th lowest element of a sequence
Select a random element of a RandomAccessCollection (note this could include a random element of 0..<n)
Check if all elements in a Sequence match a predicate (i.e. !contains(!predicate))
reduce with an inout argument for the running value (PR for proposal here: https://github.com/apple/swift-evolution/pull/587)
cycle – repeat a collection over and over.
scan/accumulate/reductions/some other spelling of a reduce that returns the running values (previously proposed: https://github.com/apple/swift-evolution/blob/master/proposals/0045-scan-takewhile-dropwhile.md)
rotation of elements in a collection (deferred proposal: https://github.com/apple/swift-evolution/blob/master/proposals/0078-rotate-algorithm.md)
Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.
As this focus area is larger in scope than Dictionary, it is likely that it will need to be broken up into multiple separate proposals. The plan is to get an initial high-level signal from this thread on which areas to put focus on. We are also looking for volunteers to put together proposals and implementations – if you're interested in helping with that, please email me off-list. High quality community implementations will significantly improve the chances of a feature making it into Swift 4. Smaller, more focussed proposals are also more likely to make it than larger ones.

All methods added to the standard library increase complexity, so need a strong justification to reduce the risk of API sprawl. When requesting additions/modifications, please keep the following questions in mind:

Is the suggested addition a common operation that many would find useful? Can it be flexible enough to cover different needs?
Will it encourage good practice? Might it be misused or encourage anti-patterns?
Can the operation be composed simply from existing std lib features? Is that composition intuitive/readable?
Is writing the equivalent by hand hard to get right? Are there common correctness traps that this addition would help avoid?
Is writing the equivalent by hand hard to make efficient? Are there common performance traps that this addition would help avoid?
Might a native implementation be able to execute more efficiently, by accessing internals, than the equivalent implementation using public APIs?
Thanks!

Ben


In-place map for MutableCollection
(Guillaume Lessard) #2

A question that comes up often and has also led to a few syntax suggestions revolves around Optionals of Collections or Sequences. Since they can easily arise through optional chaining, making optional sequences easier to use would be useful.

Much could be accomplished post-SE-0143 by adding a conditional conformance to Optional approximately like this:

extension Optional: Sequence where Wrapped: Sequence {
  func makeIterator() -> AnyIterator<Wrapped.Iterator.Element> {
    switch self {
    case .some(let sequence):
      return AnyIterator(sequence.makeIterator())
    case .none:
      return AnyIterator { nil }
    }
  }
}

This way, a for-in loop could handle an optional sequence just as it can a regular one.

Cheers,
Guillaume Lessard


(Philippe Hausler) #3

I definitely find the intent here to be an interesting view at performant collection operations; in Foundation we have exposed IndexSet to do this type of thing. When Foundation migrated to structural types it was really obvious to make IndexSet a structure - it was a logical change because it already was used as that concept in the Objective-C APIs. One of it’s biggest features is that it interoperates with NSArray/NSMutableArray which unfortunately it does not operate upon Array/Collection etc as of current. But there are APIs like UI/NS CollectionView that some of the primitive interoperation is based upon IndexSet which mirror NSArray/NSMutableArray making modeling really nice. As of current Swift is lacking some of those handy parallels. The IndexSet type has been used across many APIs in the SDK (as well as third party code). I wonder if it would be reasonable to sink an encapsulation type of the collection of ranges (as an efficient store) down and make it generic for the Index type. That way we would have a cohesive story when it comes to interoperation with on-going improvements that might happen in UIKit/AppKit/Foundation etc where the changes like this would interoperate well.

There are also a few other methods that I think might be worth investigating;

non mutation
Fetching elements out of a collection given a sequence of ranges.
Fetching a sequence of ranges where a predicate matches elements (and perhaps even a sequence of ranges as a constraint upon that predicate)

mutation
Inserting a sequence of elements into the collection at indexes defined by a sequence of ranges
Replacing elements at indexes defined by a sequence of ranges with elements from a sequence

If we modified IndexSet to be generic (and perhaps moved it further down to interoperate with collection) I am certain that this would work really well with the already established APIs we have (just making bridged NSIndexSets really a IndexSet<Int> ). However this would mean that we would need some sort of way of indicating types were “Countable” which might be a decent addendum to the collection protocol cluster.

I think that overall these APIs have worked nicely with other counterparts (perhaps with a slight caveat of the matching of replacement counts and index counts). Additionally I am quite certain that having an encapsulating efficient storage for a sequence of ranges can allow for some really nice performance improvements and getting the performance point right is not always easy or obvious; so having the higher level APIs to handle these can not only be easier for the developer but also offer some really great performance optimizations too.

I have already tinkered with the idea of using IndexSet on Data and it is amazing; I can easily write really complex parsers for doing awful things like parsing mach-o headers from executables or other such things. I have also tinkered with the idea of making a slice type that is a sparse slice (it breaks some other assumptions of how collections work) and that works pretty nicely too (I think that should be a further discussion and this is probably not the time to introduce that idea just yet).

Hope that gives somethings to chew on from a frameworks perspective.

Cheers!
Philippe Hausler

···

On Feb 16, 2017, at 4:39 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss additive algorithms for Sequence and Collection.

Here is a list of commonly requested algorithms to operate on Sequence or Collection:

In-place transformations:
transform elements in a MutableCollection using a closure i.e. in-place map
remove(where:) that takes a predicate i.e. in-place filter
remove(indices:) that takes a Sequence of Index
bulk operations (replace, remove, insert) on RangeReplaceableCollection for a Sequence of subranges
Sorting:
change sort/sorted/partition to be stable
possibly add an explicitly unstable sort
provide partial sorting i.e. the _n_th lowest element of a sequence
Select a random element of a RandomAccessCollection (note this could include a random element of 0..<n)
Check if all elements in a Sequence match a predicate (i.e. !contains(!predicate))
reduce with an inout argument for the running value (PR for proposal here: https://github.com/apple/swift-evolution/pull/587)
cycle – repeat a collection over and over.
scan/accumulate/reductions/some other spelling of a reduce that returns the running values (previously proposed: https://github.com/apple/swift-evolution/blob/master/proposals/0045-scan-takewhile-dropwhile.md)
rotation of elements in a collection (deferred proposal: https://github.com/apple/swift-evolution/blob/master/proposals/0078-rotate-algorithm.md)
Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.
As this focus area is larger in scope than Dictionary, it is likely that it will need to be broken up into multiple separate proposals. The plan is to get an initial high-level signal from this thread on which areas to put focus on. We are also looking for volunteers to put together proposals and implementations – if you're interested in helping with that, please email me off-list. High quality community implementations will significantly improve the chances of a feature making it into Swift 4. Smaller, more focussed proposals are also more likely to make it than larger ones.

All methods added to the standard library increase complexity, so need a strong justification to reduce the risk of API sprawl. When requesting additions/modifications, please keep the following questions in mind:

Is the suggested addition a common operation that many would find useful? Can it be flexible enough to cover different needs?
Will it encourage good practice? Might it be misused or encourage anti-patterns?
Can the operation be composed simply from existing std lib features? Is that composition intuitive/readable?
Is writing the equivalent by hand hard to get right? Are there common correctness traps that this addition would help avoid?
Is writing the equivalent by hand hard to make efficient? Are there common performance traps that this addition would help avoid?
Might a native implementation be able to execute more efficiently, by accessing internals, than the equivalent implementation using public APIs?
Thanks!

Ben

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(David Waite) #4

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss additive algorithms for Sequence and Collection.

Here is a list of commonly requested algorithms to operate on Sequence or Collection:

In-place transformations:
transform elements in a MutableCollection using a closure i.e. in-place map

I assume this would be an (Element)->Element transform?

Check if all elements in a Sequence match a predicate (i.e. !contains(!predicate))

I created this one to clean up some Sequence code earlier this week

Is writing the equivalent by hand hard to make efficient? Are there common performance traps that this addition would help avoid?

I suspect this is where remove(where:) and remove(indices:) for example would really shine as additions.

-DW

···

On Feb 16, 2017, at 5:39 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:


(Ole Begemann) #5

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss additive algorithms for Sequence and Collection.

Here is a list of commonly requested algorithms to operate on Sequence or Collection:

In-place transformations:
transform elements in a MutableCollection using a closure i.e. in-place map
remove(where:) that takes a predicate i.e. in-place filter
remove(indices:) that takes a Sequence of Index

Is it possible to implement this (efficiently) with RangeReplaceableCollection's current feature set? Since replaceSubrange(_:with:) potentially invalidates indices, you can't call it repeatedly and expect that the passed-in indices are still valid.

I suppose it's possible to create a new empty collection, then iterate over the existing collection's indices and append each element that doesn't match one of the indices to be removed, but that sounds a bit awkward to me. Is there a better solution?

bulk operations (replace, remove, insert) on RangeReplaceableCollection for a Sequence of subranges

This relates to my question above. Is there a better way to implement this than creating an new empty collection and iterating over the existing elements, replacing/skipping/inserting at the passed-in subranges?

···

On 17 Feb 2017, at 01:39, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:


(Brent Royal-Gordon) #6

This is not *exactly* on the same topic, but should I revise and re-submit SE-0132 (systematic renaming of various inconsistently-named collection algorithms)?

<https://github.com/apple/swift-evolution/blob/master/proposals/0132-sequence-end-ops.md>

···

On Feb 16, 2017, at 4:39 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss additive algorithms for Sequence and Collection.

--
Brent Royal-Gordon
Architechies


(plx) #7

Some belated feedback.

Here is a list of commonly requested algorithms to operate on Sequence or Collection:

In-place transformations:
transform elements in a MutableCollection using a closure i.e. in-place map

+0.5 in the stdlib if only to side-step numerous re-implementations and perhaps also for minor performance gains.

+0.5 if it’s paired with an in-place variant (`mutating func mapInPlace(_ transform: (inout Element) -> Void) -> Void`); could perhaps be better-named `updateEach`.

Both are minor but would be nice-to-have as built-ins.

remove(where:) that takes a predicate i.e. in-place filter

+2 as long as this is an overridable method. It’s a common need and difficult for users to write an efficient implementation.

One quibble: would this be on `Collection`, `MutableCollection`, or some new `ShrinkableCollection`? This issue I’d have with putting this onto `Collection` itself is that that would seemingly shut the door on `Collection` being adoptable by any type that implements a “collection” with some fixed, intrinsic size.

Thus either adding it to `MutableCollection` (at cost of being overly narrow) or adding it to some new refinement like `ShrinkableCollection` (thus allowing e.g. `Set` and `Dictionary` to conform w/out eliminating fixed-size “collections” from adopting `Collection`).

remove(indices:) that takes a Sequence of Index

+0 if *exactly* as stated; the point of a method like this would presumably be efficient bulk-removal, and a mere “a `Sequence` of `Index`” doesn’t provide much useful information: no count, no guarantee each index is included at-most once, no guarantee of ordering, and so on.

I think the “right” eventual solution is an equivalent to `(NS)IndexSet`—loosely speaking, a “collection” of non-empty, non-overlapping, ordered-ascending ranges of indices—but unfortunately it seems both difficult and premature to contemplate such a thing at this time.

bulk operations (replace, remove, insert) on RangeReplaceableCollection for a Sequence of subranges

See above: a `Sequence` of subranges is a bit better than a `Sequence` of `Index`, but doesn’t provide much information (# of ranges? # of indices? ranges ordered somehow? ranges non-overlapping? ranges non-empty?).

Thus for both of the previous two I strongly think building them around an `(NS)IndexSet`-like component is the way to do them but to do that `(NS)IndexSet`-alike type properly would IMHO be a bigger project than the current scope; in the absence of such an index-set type I’m not sure there’s enough benefit to these bulk operations to justify their inclusion.

Select a random element of a RandomAccessCollection (note this could include a random element of 0..<n)

+1 to add to the stdlib some reasonable way to draw a value at random from `0..<n` along with the obvious convenience method on `RandomAccessCollection`. Especially with cross-platform ambitions having the stdlib provide a non-terrible way to do this seems prudent.

That said, I’d be wary of doing anything more than that basic pair—random value from `0..<n` + convenience on `RandomAccessCollection`—at least at this point in time…I say this because I’d like to believe the stdlib will eventually gain a more-structured approach to randomness (perhaps something analogous to what C++11 picked up), and thus this feature should be kept very minimal.

Check if all elements in a Sequence match a predicate (i.e. !contains(!predicate))

+1, this in the stdlib is better than a million independent re-implementations.

As a philosophical point, I’d actually prefer if `all`, `any`, `notAll`, and `notAny` (change names to suit) were all in the stdlib even if just as convenience methods defined in an extension: no point having everyone roll their own and there’s likely a minor performance advantage if such code goes in the stdlib.

When only one “polarity” of a logical operation is available (`all` but not `none`, or `remove(where:)` but not `keep(where:)`, etc.) it seems inevitable to have occasional mismatches, and wrapping a closure in another closure just to apply `!` to it always at least “feels” wasteful.

reduce with an inout argument for the running value (PR for proposal here: https://github.com/apple/swift-evolution/pull/587)

+0.5? The performance consideration is real, but I’d hope the eventual ownership system would be able to mitigate the performance issues.

rotation of elements in a collection (deferred proposal: https://github.com/apple/swift-evolution/blob/master/proposals/0078-rotate-algorithm.md)

+0.5, it’s another nice thing to have in the standard library but not especially urgent.

Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.
As this focus area is larger in scope than Dictionary, it is likely that it will need to be broken up into multiple separate proposals. The plan is to get an initial high-level signal from this thread on which areas to put focus on. We are also looking for volunteers to put together proposals and implementations – if you're interested in helping with that, please email me off-list. High quality community implementations will significantly improve the chances of a feature making it into Swift 4. Smaller, more focussed proposals are also more likely to make it than larger ones.

All methods added to the standard library increase complexity, so need a strong justification to reduce the risk of API sprawl.

One last question: what’s the right cost model for adding an associated type to a standard library protocol?

EG: for the collection-rotation above, the `RotatedCollection` type as presently proposed would result in `RotatedCollection<Base>` returning `RotatedCollection<RotatedCollection<Base>>` from `func rotated(shiftingToStart:)`; this *could* be avoided by having `Collection`:

- gain `associatedtype Rotation: Collection where…`
- make `func rotated(shiftingToStart index: Index) -> Rotation`
- default `Rotation = RotatedCollection<Self>`

…and then having `RotatedCollection<Base>` make use `Rotation = RotatedCollection<Base>`.

There’s a benefit but also a cost, and it’s never been clear *how costly* one should consider a new associated type on a stdlib protocol.

···

On Feb 16, 2017, at 6:39 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

When requesting additions/modifications, please keep the following questions in mind:

Is the suggested addition a common operation that many would find useful? Can it be flexible enough to cover different needs?
Will it encourage good practice? Might it be misused or encourage anti-patterns?
Can the operation be composed simply from existing std lib features? Is that composition intuitive/readable?
Is writing the equivalent by hand hard to get right? Are there common correctness traps that this addition would help avoid?
Is writing the equivalent by hand hard to make efficient? Are there common performance traps that this addition would help avoid?
Might a native implementation be able to execute more efficiently, by accessing internals, than the equivalent implementation using public APIs?
Thanks!

Ben

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Ben Cohen) #8

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss additive algorithms for Sequence and Collection.

Here is a list of commonly requested algorithms to operate on Sequence or Collection:

In-place transformations:
transform elements in a MutableCollection using a closure i.e. in-place map
remove(where:) that takes a predicate i.e. in-place filter
remove(indices:) that takes a Sequence of Index

Is it possible to implement this (efficiently) with RangeReplaceableCollection's current feature set?

In-place, I don’t think so, because of the shuffling-down-being-O(n) problem.

It is possible with a combo of MutableCollection and RangeReplaceableCollection, because MutableCollection implies constant-time mutation of a single element:

extension MutableCollection where Self: RangeReplaceableCollection {
    mutating func remove(where predicate: (Iterator.Element) -> Bool) {
        guard var i = index(where: predicate) else { return }
        var j = index(after: i)
        while j != endIndex {
            let e = self[j]
            if !predicate(e) {
                // here's where you need MutableCollection...
                self[i] = e
                formIndex(after: &i)
            }
            formIndex(after: &j)
        }
        // and this is the part you need RRC for...
        removeSubrange(i..<endIndex)
    }
}

But if you tried to implement this with RRC alone, you’d hit the challenge of single-element replacement not necessarily being O(n). String is the poster child for this problem since the graphemes you might be wanting to remove are variable width.

Making this a full-fledged member of one of RangeReplaceableCollection would allow types that can’t offer constant-time single element replacement, but can implement an algorithm that slides down values over the top of others efficiently in one pass, to provide an efficient implementation. Additionally, types like Array or String that are backed by contiguous storage might be able to make use of lower-level memory operations. The default implementation could be to rebuild self from scratch using lazy filter, which while sub-optimal can at least be done in linear time.

Since replaceSubrange(_:with:) potentially invalidates indices, you can't call it repeatedly and expect that the passed-in indices are still valid.

I suppose it's possible to create a new empty collection, then iterate over the existing collection's indices and append each element that doesn't match one of the indices to be removed, but that sounds a bit awkward to me. Is there a better solution?

bulk operations (replace, remove, insert) on RangeReplaceableCollection for a Sequence of subranges

This relates to my question above. Is there a better way to implement this than creating an new empty collection and iterating over the existing elements, replacing/skipping/inserting at the passed-in subranges?

In-place operations avoid having to allocate/free a whole new buffer, avoid copying the initial prefix of retained elements unnecessarily (a big deal if you are removing a small proportion of the whole, especially if you sometimes remove none), and also enable customized efficient implementations for specific types. Collections that didn’t implement them could just get a default that builds up from scratch like you describe.

If we had multi-range bulk remove, you could also implement remove(where:) using it, at the cost of having to do it in two passes.

···

On Feb 20, 2017, at 7:38 AM, Ole Begemann <ole@oleb.net> wrote:

On 17 Feb 2017, at 01:39, Ben Cohen via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:


(David Hart) #9

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss additive algorithms for Sequence and Collection.

This is not *exactly* on the same topic, but should I revise and re-submit SE-0132 (systematic renaming of various inconsistently-named collection algorithms)?

<https://github.com/apple/swift-evolution/blob/master/proposals/0132-sequence-end-ops.md>

I think it's worth re-proposing. But it will probably have to be in parallel to the String rewrite which will affect collection functions.

···

On 22 Feb 2017, at 12:19, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

On Feb 16, 2017, at 4:39 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

--
Brent Royal-Gordon
Architechies

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution