[Proposal] Add .order() family of methods to Collection


(Taras Zakharko) #1

Swift standard library already offers a useful set of sort() functions. However, it is also often useful to know how the collection should be rearranged in order to become sorted. For example, R defines the order() function which returns a permutation of collection indexes which rearrange the collection in an order. I suggest to add similar functionality to the Swift Collections E.g.:

var out = []
for i in collection.order({$0 < $1}) { out.append(collection[i]) }
// out is now sorted collection

Knowing the sort order is useful in many applications where the data cannot or should not be rearranged and yet some information about the ordering is helpful, e.g. for traversing the collection in a specific way. It is also helpful for maintaining multiple ordering relations associated with the same collection.

The implementation should be fairly straightforward. E.g. here is an extension method I use:

extension CollectionType {
    func order(@noescape isOrderedBefore: (Self.Generator.Element, Self.Generator.Element) -> Bool) -> [Self.Index] {
        return indices.sort({ isOrderedBefore(self[$0], self[$1]) })
    }
}

Best,

Taras


(Milos Rankovic) #2

Hi Taras,

The implementation should be fairly straightforward. E.g. here is an extension method I use:

extension CollectionType {
    func order(@noescape isOrderedBefore: (Self.Generator.Element, Self.Generator.Element) -> Bool) -> [Self.Index] {
        return indices.sort({ isOrderedBefore(self[$0], self[$1]) })
    }
}

Just a side note that you could also:

extension SequenceType {
  func order(@noescape isOrderedBefore: (Generator.Element, Generator.Element) -> Bool) -> [Int] {
    return enumerate().sort{ isOrderedBefore($0.1, $1.1) }.map{ $0.0 }
  }
}

(0...3).reverse().order(<) // [3, 2, 1, 0]

This way you can `order` all sequences, and it is more efficient as you don’t fetch elements by index inside the `isOrderedBefore`. (You could also *not* `map` at the end and return all the elements along with their original indexes.)

milos

···

On 11 Apr 2016, at 08:48, Taras Zakharko via swift-evolution <swift-evolution@swift.org> wrote:


(Taras Zakharko) #3

Hi,

I think it depends on how expensive the indexed access is on the collection. I use it primarily on arrays, where self[i] essentially boils down to a pointer dereference, so I expect the generated code to be very efficient. Your version might be faster for collection with expensive element access, but it should be slower for arrays and the like, as it involves additional intermediate structure allocations and copies.

— Taras

···

Just a side note that you could also:

extension SequenceType {
  func order(@noescape isOrderedBefore: (Generator.Element, Generator.Element) -> Bool) -> [Int] {
    return enumerate().sort{ isOrderedBefore($0.1, $1.1) }.map{ $0.0 }
  }
}

(0...3).reverse().order(<) // [3, 2, 1, 0]

This way you can `order` all sequences, and it is more efficient as you don’t fetch elements by index inside the `isOrderedBefore`. (You could also *not* `map` at the end and return all the elements along with their original indexes.)

milos


(Dave Abrahams) #4

This is about as efficient as it gets:

extension CollectionType {
  func order(@noescape isOrderedBefore areInOrder: (Generator.Element, Generator.Element) -> Bool) -> [Index] {
    var a = Array(indices)
    a.sortInPlace { areInOrder(self[$0], self[$1]) }
    return a
  }
}

···

on Mon Apr 11 2016, Taras Zakharko <swift-evolution@swift.org> wrote:

Hi,

I think it depends on how expensive the indexed access is on the collection. I
use it primarily on arrays, where self[i] essentially boils down to a pointer
dereference, so I expect the generated code to be very efficient. Your version
might be faster for collection with expensive element access, but it should be
slower for arrays and the like, as it involves additional intermediate structure
allocations and copies.

— Taras

    Just a side note that you could also:

    extension SequenceType {
    func order(@noescape isOrderedBefore: (Generator.Element, Generator.Element)
    -> Bool) -> [Int] {
    return enumerate().sort{ isOrderedBefore($0.1, $1.1) }.map{ $0.0 }
    }
    }

    (0...3).reverse().order(<) // [3, 2, 1, 0]

    This way you can `order` all sequences, and it is more efficient as you
    don’t fetch elements by index inside the `isOrderedBefore`. (You could also
    *not* `map` at the end and return all the elements along with their original
    indexes.)

--
Dave


(Milos Rankovic) #5

Hi Taras,

···

On 11 Apr 2016, at 19:38, Taras Zakharko <taras.zakharko@uzh.ch <mailto:taras.zakharko@uzh.ch>> wrote:

Your version might be faster for collection with expensive element access, but it should be slower for arrays and the like, as it involves additional intermediate structure allocations and copies.

No, not really. The cost of my enumeration and mapping is linear so they add nothing to the sorting order of complexity. Your two subscript calls inside the predicate, however, definitely do. In other words, your implementation will not be faster under any circumstances and it will in fact grow nonlinearly slower compared to the enumeration approach as the length of array increases…

milos

(My message appear to have bounced: I’m sorry if I end up sending multiple copies!)


(Milos Rankovic) #6

Yes, you are right that this is in fact not obvious!

I’ve just measured and, though my implementation is faster, it’s advantage only grows linearly with the length of the array:

  let a = (1...100_000).shuffled
  
  func testPerformanceOf_orderSequence() {
    self.measureBlock {
      self.a.orderSequence(<) // 0.464 sec
    }
  }
  
  func testPerformanceOf_orderCollection() {
    self.measureBlock {
      self.a.orderCollection(<) // 1.067 sec
    }
  }

It remains only twice as fast for 10,000 and 1000 elements (below that is the margin of error).

milos

···

On 11 Apr 2016, at 20:47, Taras Zakharko <taras.zakharko@uzh.ch <mailto:taras.zakharko@uzh.ch>> wrote:

the question is what is faster: {self[0] < self[1]} or {$1.0 < $1.1}