[Pitch] remove(at: Set<Index>)


(Karl) #1

So like most good pitches, this one comes about because of a bug I recently fixed which seems common and subtle enough that I think it would be nice to include in the standard library.

Removing a collection of elements at once from a collection. You might want to do this in some kind of advanced filtering code. In my case, our code was merging adjacent elements in-place, and storing a list of indexes that were no longer required because they were now part of a merged element, and then we cleaned up the duplicates by removing those indexes.

A näive implementation might look like this:

for index in indexesToRemove {
    myArray.remove(at: index)
}

However, as the array is mutated, those indexes won’t make sense any more. You’ll end up with invalid results - for example, if you have the array [0,1,2] and indexesToRemove is [0,1], your resulting array will actually be [1] (not [2], as expected). Actually removing a batch of indexes is subtly more complex. Here’s my generic implementation:

extension RangeReplaceableCollection where Index:Hashable, Self:BidirectionalIndexable {

  mutating func remove(at indexes: Set<Index>) {
    var removed : IndexDistance = 0
    for idx in indexes.sorted() {
      remove(at: index(idx, offsetBy: -removed))
      removed = removed.advanced(by: 1)
    }
  }
}

I think it would be nice to have this in the standard library. I think it’s a reasonably common problem and it’d be nice to make it easier for people.

Karl


(Dmitri Gribenko) #2

So like most good pitches, this one comes about because of a bug I recently fixed which seems common and subtle enough that I think it would be nice to include in the standard library.

Removing a collection of elements at once from a collection. You might want to do this in some kind of advanced filtering code. In my case, our code was merging adjacent elements in-place, and storing a list of indexes that were no longer required because they were now part of a merged element, and then we cleaned up the duplicates by removing those indexes.

A näive implementation might look like this:

for index in indexesToRemove {
    myArray.remove(at: index)
}

However, as the array is mutated, those indexes won’t make sense any more. You’ll end up with invalid results - for example, if you have the array [0,1,2] and indexesToRemove is [0,1], your resulting array will actually be [1] (not [2], as expected). Actually removing a batch of indexes is subtly more complex. Here’s my generic implementation:

extension RangeReplaceableCollection where Index:Hashable, Self:BidirectionalIndexable {

        mutating func remove(at indexes: Set<Index>) {

Hi Karl,

This sounds like a good idea to me. You can make this method even
more useful by making it generic over collections of appropriate
indices. Then you can drop the Hashable requirement for indices.

                var removed : IndexDistance = 0
                for idx in indexes.sorted() {
                        remove(at: index(idx, offsetBy: -removed))
                        removed = removed.advanced(by: 1)

This implementation will not work correctly for all collections, since
removing an element at the beginning of the collection invalidates the
indices pointing at the tail. Adjusting the index by "-removed" won't
help, the indices are invalidated already. (Consider what happens in
a UnicodeScalar String view where indices have to jump over variable
number of code units in the underlying storage.)

You can make it correct (and work even faster, in O(n)) by walking
from the start, and moving the elements into place, and then
truncating the collection. You will need to require
MutableCollection.

You can make a variant that works for collections that are not
MutableCollections, but then the runtime will be O(n^2). You will
need to remove elements one by one walking from the end and shifting
the tail of the collection.

Dmitri

···

On Sat, Jun 18, 2016 at 9:09 PM, Karl via swift-evolution <swift-evolution@swift.org> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Saagar Jha) #3

This isn’t actually that complex, especially if you ditch the “C-style” for
loop algorithm and switch it to, as you mentioned, “filtering code”. filter,
enumerated (to get indices), and map (to go back to elements) are more than
up to the task. Plus, this is much more efficient.

var myArray = [0, 1, 2]
let indices: Set = [0, 1]
myArray = myArray.enumerated().filter {
    return !indices.contains($0.offset)
}.map {
    return $0.element // to get the elements back
}
print(myArray) // prints “[2]"

Adding it to the standard library might be useful, but it’s not as hard as
it looks.

···

On Sat, Jun 18, 2016 at 9:09 PM Karl via swift-evolution < swift-evolution@swift.org> wrote:

So like most good pitches, this one comes about because of a bug I
recently fixed which seems common and subtle enough that I think it would
be nice to include in the standard library.

Removing a collection of elements at once from a collection. You might
want to do this in some kind of advanced filtering code. In my case, our
code was merging adjacent elements in-place, and storing a list of indexes
that were no longer required because they were now part of a merged
element, and then we cleaned up the duplicates by removing those indexes.

A näive implementation might look like this:

for index in indexesToRemove {
    myArray.remove(at: index)
}

However, as the array is mutated, those indexes won’t make sense any more.
You’ll end up with invalid results - for example, if you have the array
[0,1,2] and indexesToRemove is [0,1], your resulting array will actually be
[1] (not [2], as expected). Actually removing a batch of indexes is subtly
more complex. Here’s my generic implementation:

extension RangeReplaceableCollection where Index:Hashable,
Self:BidirectionalIndexable {

        mutating func remove(at indexes: Set<Index>) {
                var removed : IndexDistance = 0
                for idx in indexes.sorted() {
                        remove(at: index(idx, offsetBy: -removed))
                        removed = removed.advanced(by: 1)
                }
        }
}

I think it would be nice to have this in the standard library. I think
it’s a reasonably common problem and it’d be nice to make it easier for
people.

Karl

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Saagar Jha


(Karl) #4

What exactly are the guarantees for index safety across mutations? Both Collection and Indexable stop short of making any kind of guarantees about what happens to indexes after you mutate. For example, popFirst() has the potential to invalidate all stored indexes, but you wouldn’t know that if you were just working in terms of Collection.

If we can guarantee tail-end removal is safe, a minimalist’s solution could be:

extension RangeReplaceableCollection {

  mutating func remove<S:Sequence where S.Iterator.Element == Index>(at indexes: S) {

    for idx in indexes.sorted(isOrderedBefore: >) {
      remove(at: idx)
    }
  }
}

Maybe with Array having a more optimal implementation if those sorted indexes form a continuous range.

Karl

···

On 19 Jun 2016, at 06:58, Saagar Jha <saagarjha28@gmail.com> wrote:

This isn’t actually that complex, especially if you ditch the “C-style” for loop algorithm and switch it to, as you mentioned, “filtering code”. filter, enumerated (to get indices), and map (to go back to elements) are more than up to the task. Plus, this is much more efficient.

var myArray = [0, 1, 2]
let indices: Set = [0, 1]
myArray = myArray.enumerated().filter {
    return !indices.contains($0.offset)
}.map {
    return $0.element // to get the elements back
}
print(myArray) // prints “[2]"
Adding it to the standard library might be useful, but it’s not as hard as it looks.

On Sat, Jun 18, 2016 at 9:09 PM Karl via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
So like most good pitches, this one comes about because of a bug I recently fixed which seems common and subtle enough that I think it would be nice to include in the standard library.

Removing a collection of elements at once from a collection. You might want to do this in some kind of advanced filtering code. In my case, our code was merging adjacent elements in-place, and storing a list of indexes that were no longer required because they were now part of a merged element, and then we cleaned up the duplicates by removing those indexes.

A näive implementation might look like this:

for index in indexesToRemove {
    myArray.remove(at: index)
}

However, as the array is mutated, those indexes won’t make sense any more. You’ll end up with invalid results - for example, if you have the array [0,1,2] and indexesToRemove is [0,1], your resulting array will actually be [1] (not [2], as expected). Actually removing a batch of indexes is subtly more complex. Here’s my generic implementation:

extension RangeReplaceableCollection where Index:Hashable, Self:BidirectionalIndexable {

        mutating func remove(at indexes: Set<Index>) {
                var removed : IndexDistance = 0
                for idx in indexes.sorted() {
                        remove(at: index(idx, offsetBy: -removed))
                        removed = removed.advanced(by: 1)
                }
        }
}

I think it would be nice to have this in the standard library. I think it’s a reasonably common problem and it’d be nice to make it easier for people.

Karl

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution
--
-Saagar Jha


(Haravikk) #5

Has .enumerated() been changed to use the collection’s internal Index type? Last time I tried something like this it failed because .enumerated() only returns numeric offsets, but the index type of a collection may not necessary be compatible (not all indices are just a plain numeric value, consider a type that returns AnyIndex). So this will only work when your indices are integers starting from zero.

I think the following should work for the generic case, and is pretty similar:

extension MutableCollection {
  public mutating func remove(indices: Set<Index>) {
    var index = self.startIndex
    self = self.filter {
      let result = indices.contains(index)
      self.formIndex(after: &index)
      return result
    }
  }
}

(note: I'm not in a position to test this just now so it may not work exactly as written, but that’s the gist of how to do it safely I think)

The main question I have is how do you get into a situation where you’ve generated the indices, but could not put the same code into a call to .filter? For example:

  var indicesToRemove:Set<Index> = []
  for eachIndex in myArray.indices {
    if someCondition(myArray[eachIndex]) { indicesToRemove.insert(eachIndex) }
  }
  myArray.remove(indices: indicesToRemove)

Could be rewritten as:

  myArray = myArray.filter { someCondition($0) }

If what we want is in-place removal then I think what we need is some kind of MutatingIteratorProtocol which includes a .remove() method; this method would remove the last element retrieved without invalidating the iterator or skipping an element, allowing us to do the following:

  var iterator = myArray.makeIterator() // Conforms to MutatingIteratorProtocol
  while let eachElement = iterator.next() {
    if someCondition(eachElement) { iterator.remove() }
  }

···

On 19 Jun 2016, at 05:58, Saagar Jha via swift-evolution <swift-evolution@swift.org> wrote:

This isn’t actually that complex, especially if you ditch the “C-style” for loop algorithm and switch it to, as you mentioned, “filtering code”. filter, enumerated (to get indices), and map (to go back to elements) are more than up to the task. Plus, this is much more efficient.

var myArray = [0, 1, 2]
let indices: Set = [0, 1]
myArray = myArray.enumerated().filter {
    return !indices.contains($0.offset)
}.map {
    return $0.element // to get the elements back
}
print(myArray) // prints “[2]"
Adding it to the standard library might be useful, but it’s not as hard as it looks.


(Dmitri Gribenko) #6

What exactly are the guarantees for index safety across mutations? Both
Collection and Indexable stop short of making any kind of guarantees about
what happens to indexes after you mutate. For example, popFirst() has the
potential to invalidate all stored indexes, but you wouldn’t know that if
you were just working in terms of Collection.

https://github.com/apple/swift/blob/master/docs/IndexInvalidation.rst

Mutating a collection invalidates all indices. Specific types and
protocols relax this:

- Replacing an element in a MutableCollection does not invalidate any indices.

- RangeReplaceableCollection only invalidates indices after the one
where the structural mutation happened. For example,
replaceRange(a..<b, ...) invalidates "a" and all indices after it.

Collections in stdlib/private/StdlibCollectionUnittest/MinimalCollections.swift.gyb
check these rules.

If we can guarantee tail-end removal is safe, a minimalist’s solution could
be:

extension RangeReplaceableCollection {

mutating func remove<S:Sequence where S.Iterator.Element == Index>(at
indexes: S) {

for idx in indexes.sorted(isOrderedBefore: >) {
remove(at: idx)
}
}
}

Yes, this works, but it is O(n^2). We can do better.

Dmitri

···

On Sat, Jun 18, 2016 at 10:59 PM, Karl <razielim@gmail.com> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Karl) #7

Scratch that: it should still be a Set and Index should still be Hashable. Otherwise the order of the indexes could be important and there may be duplicates. That’s a level of sophistication where you should know how about index invalidation while mutating and can do it manually.

extension RangeReplaceableCollection where Index : Hashable {

  mutating func remove(at indexes: Set<Index>) {

    for idx in indexes.sorted(isOrderedBefore: >) {
      remove(at: idx)
    }
  }
}

It’s not actually O(n^2) - it’s O(m*n), where m is never larger than n, and likely much smaller if n is large. Otherwise, you should probably use an inclusive filter and create a new instance, rather than than removing most of your large numbers of elements in-place.

So it’s probably better to loop over m. Given that, I’m not sure how we’d improve on the O(n) remove-at-index. That said, I’m seeing lots of weirdness in Xcode 8 (which may just be my machine), so I’m not really certain what MutableCollection even provides — apparently it supports in-place ’sort’ (via CMD+click), but it’s not in auto-complete and trying to use it is a compile error.

Karl

···

On 19 Jun 2016, at 07:59, Karl <raziel.im+swift-evo@gmail.com> wrote:

What exactly are the guarantees for index safety across mutations? Both Collection and Indexable stop short of making any kind of guarantees about what happens to indexes after you mutate. For example, popFirst() has the potential to invalidate all stored indexes, but you wouldn’t know that if you were just working in terms of Collection.

If we can guarantee tail-end removal is safe, a minimalist’s solution could be:

extension RangeReplaceableCollection {

  mutating func remove<S:Sequence where S.Iterator.Element == Index>(at indexes: S) {

    for idx in indexes.sorted(isOrderedBefore: >) {
      remove(at: idx)
    }
  }
}

Maybe with Array having a more optimal implementation if those sorted indexes form a continuous range.

Karl

On 19 Jun 2016, at 06:58, Saagar Jha <saagarjha28@gmail.com <mailto:saagarjha28@gmail.com>> wrote:

This isn’t actually that complex, especially if you ditch the “C-style” for loop algorithm and switch it to, as you mentioned, “filtering code”. filter, enumerated (to get indices), and map (to go back to elements) are more than up to the task. Plus, this is much more efficient.

var myArray = [0, 1, 2]
let indices: Set = [0, 1]
myArray = myArray.enumerated().filter {
    return !indices.contains($0.offset)
}.map {
    return $0.element // to get the elements back
}
print(myArray) // prints “[2]"
Adding it to the standard library might be useful, but it’s not as hard as it looks.

On Sat, Jun 18, 2016 at 9:09 PM Karl via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
So like most good pitches, this one comes about because of a bug I recently fixed which seems common and subtle enough that I think it would be nice to include in the standard library.

Removing a collection of elements at once from a collection. You might want to do this in some kind of advanced filtering code. In my case, our code was merging adjacent elements in-place, and storing a list of indexes that were no longer required because they were now part of a merged element, and then we cleaned up the duplicates by removing those indexes.

A näive implementation might look like this:

for index in indexesToRemove {
    myArray.remove(at: index)
}

However, as the array is mutated, those indexes won’t make sense any more. You’ll end up with invalid results - for example, if you have the array [0,1,2] and indexesToRemove is [0,1], your resulting array will actually be [1] (not [2], as expected). Actually removing a batch of indexes is subtly more complex. Here’s my generic implementation:

extension RangeReplaceableCollection where Index:Hashable, Self:BidirectionalIndexable {

        mutating func remove(at indexes: Set<Index>) {
                var removed : IndexDistance = 0
                for idx in indexes.sorted() {
                        remove(at: index(idx, offsetBy: -removed))
                        removed = removed.advanced(by: 1)
                }
        }
}

I think it would be nice to have this in the standard library. I think it’s a reasonably common problem and it’d be nice to make it easier for people.

Karl

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution
--
-Saagar Jha


(Haravikk) #8

Apologies for the double-post but I’m honour bound to follow up that my generic method is a load of nonsense at the moment; it’s missing some kind of constructor (I don’t recall if there’s a common option that will satisfy this), also, due to the way .filter {} currently works even in its lazy form it can end up calling the filter closure more than once per item, so advancing the index currently won’t work as expected (elements are skipped, children murdered, it’s just the worst).

So yeah, best option to do this right now is with a proper loop over self.indices, but it still requires some kind constructor to build a new copy of the correct type.

For arrays this is a bit simpler as you *can* safely remove indices in reverse order, since later indices cannot invalidate earlier ones, but this is reliant on the array’s indices being integers (reasonable enough, but not really correct in the strictest sense), so like so:

  extension Array {
    public mutating func remove(indices: Set<Index>) {
      for eachIndex in indices.sort(>) { // greater than should give descending order
        self.remove(at: eachIndex)
      }
    }
  }

But yeah, this is only safe on types where removing later items does not change the order of earlier ones, though this should be true on Array, Dictionary and Set since none of these shrink as elements are removed, it’s not strictly a good idea to rely on this behaviour as it’s implementation dependent, even though it’s unlikely this will change.

···

On 19 Jun 2016, at 11:12, Haravikk via swift-evolution <swift-evolution@swift.org> wrote:

I think the following should work for the generic case, and is pretty similar:

extension MutableCollection {
  public mutating func remove(indices: Set<Index>) {
    var index = self.startIndex
    self = self.filter {
      let result = indices.contains(index)
      self.formIndex(after: &index)
      return result
    }
  }
}


(Dave Abrahams) #9

This is still an O(N^2) algorithm. You can do this in O(N) by taking
Dmitri's approach.

···

on Sun Jun 19 2016, Haravikk <swift-evolution@swift.org> wrote:

On 19 Jun 2016, at 11:12, Haravikk via swift-evolution <swift-evolution@swift.org> wrote:

I think the following should work for the generic case, and is pretty similar:

extension MutableCollection {
  public mutating func remove(indices: Set<Index>) {
    var index = self.startIndex
    self = self.filter {
      let result = indices.contains(index)
      self.formIndex(after: &index)
      return result
    }
  }
}

Apologies for the double-post but I’m honour bound to follow up that my generic method is a load of nonsense at the moment; it’s missing some kind of constructor (I don’t recall if there’s a common option that will satisfy this), also, due to the way .filter {} currently works even in its lazy form it can end up calling the filter closure more than once per item, so advancing the index currently won’t work as expected (elements are skipped, children murdered, it’s just the worst).

So yeah, best option to do this right now is with a proper loop over self.indices, but it still requires some kind constructor to build a new copy of the correct type.

For arrays this is a bit simpler as you *can* safely remove indices in reverse order, since later indices cannot invalidate earlier ones, but this is reliant on the array’s indices being integers (reasonable enough, but not really correct in the strictest sense), so like so:

  extension Array {
    public mutating func remove(indices: Set<Index>) {
      for eachIndex in indices.sort(>) { // greater than should give descending order
        self.remove(at: eachIndex)
      }
    }
  }

--
Dave