[Proposal]: Drastically improve searching API (indexOf(…)) of CollectionType


(Vincent Esche) #1

After having had this code laying around on my Mac since 6/26/15 (waiting for Swift to go open source)
I figured it’s about damn time I submit the actual RFC for it. So without further ado…

One of the areas where Swift’s stdlib is still quite lacking is searching.
Which is a shame as searching (along with indexing) might be among
the most important tasks of computer science/software development.
One might not need fast collection searches for writing a banal fart or flashlight app,
but almost every other app is likely to benefit from having a proper searching API.

So I’d like to fix that.

Attached you find a full RFC along with the full and functional source code to make it happen.

I’d love to hear your opinion on this and will be more than happy to answer questions.

Rendered Version + Full Implementation: https://gist.github.com/regexident/2b7531bd748f57679671
(The code is tested using Quick/Nimble. Tests would thus have to be ported to Swift’s XCTest.)

- Vincent

···

Markdown-dump for

# Improved Collection Search

* Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md)
* Author(s): [Vincent Esche](https://github.com/regexident)
* Status: **Review**
* Review manager: TBD

## Introduction

This RFC proposes an extension to the currently rather limited and linear searching API of `CollectionType`, that is `indexOf(element:)` and `indexOf(predicate:)`.
It proposes the backwards-compatible refactoring of those existing methods as well as the introduction of a view-based ensemble of methods for efficient binary-search.

Swift-evolution thread: [link to the discussion thread for that proposal](https://lists.swift.org/pipermail/swift-evolution)

## Motivation

Indexing of and searching in data is a big deal in software development. As such it is crucial to have capable means of doing so in a language that aspires to be a highly performant programming language.

Swift's current API for searching is basically limited to these two methods on `CollectionType`:

> ```swift
> func indexOf(element: Self.Generator.Element) -> Self.Index?
> ```
> Returns the first index where value appears in self or nil if value is not found.
>
> **Complexity**: `O(self.count)`.

and:

> ```swift
> func indexOf(@noescape predicate: (Self.Generator.Element) throws -> Bool) rethrows -> Self.Index?
> ```
> Returns the first index where predicate returns true for the corresponding value, or nil if such value is not found.
>
> **Complexity**: `O(self.count)`.

The author sees a couple of issue with these two methods:

1. They do not provide an option for restricting the "haystack" to a certain `range` of `self`.
2. They do not provide a fast (`O(log2(self.count))`) path for sorted collections.
3. They should not be limited to `CollectionType` but instead be moved to `Indexable`.

In many situations it is desirable to perform fast searches on sorted collections instead of having to fall back to naïve linear searches.

Further more while a single `index` might be the most common subject of interest, there are many scenarios where a `range` or `count` of matches are of more use.

And anyone who is writing a custom ordered collection type will eventually want to find the insertion index for a given element and will thus end up writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which all the other methods can be implemented, actually).

## Proposed solution

The author proposes:

1. A backwards-compatible refactoring of `CollectionType.indices`, moving it to `Indexable`.

2. A backwards-compatible refactoring of `indexOf(…)` (adding optional `range:` and moving it to `Indexable`).

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` with a TIME complexity of `O(self.count)`.

4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without cluttering `Indexable`'s interface.

## Detailed design

### `CollectionType.indices`:

The author proposes the relocation of `.indices` from `CollectionType` to `Indexable`:

extension Indexable {
    /// Return the range of valid index values.
    ///
    /// The result's `endIndex` is the same as that of `self`.  Because
    /// `Range` is half-open, iterating the values of the result produces
    /// all valid subscript arguments for `self`, omitting its `endIndex`.
    public var indices: Range<Self.Index> { get }
}

After all all it does is provide a convenience method for generating a Range from its `startIndex` and `endIndex`. There is no need for `self` to be a `CollectionType` for this.

This change should not break any existing code as it generalizes the property.

### `Indexable` of `Comparable` elements:

The author proposes the addition of the following method on `Indexable where Element : Comparable`:

extension Indexable where Index: RandomAccessIndexType, _Element: Comparable {
	/// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
	///
	/// - Complexity: O(`self.count`)
	public func isSorted(range: Range<Index>? = nil) -> Bool
}

This method would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of `Equatable ` elements:

The author proposes the addition of the following method on `Indexable where Element : Equatable`:

extension Indexable where Index : RandomAccessIndexType, _Element : Equatable {

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(element: _Element, range: Range<Index>? = default) -> Self.Index?

    /// Returns the first range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(element: _Element, range: Range<Index>? = default) -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(element: _Element, range: Range<Index>? = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of any elements:

The author proposes the addition of the following methods on `Indexable`:

extension Indexable where Index : RandomAccessIndexType {

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Self.Index?

    /// Returns the first range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Int

    /// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
    /// 
    /// - Complexity: O(`self.count`)
    public func isSorted(range: Range<Index>? = default, @noescape isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

**Note**: For explanation of and reasoning behind the `lensView:` argument as well as the distinction between `T` and `Base._Element`, see below.

### `Indexable.binarySearchView(validateAgainst:?)`:

The author proposes the addition of a `binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a specialized search view for performing searches with O(`log2(self.count)`) complexity:

extension Indexable where Index: RandomAccessIndexType {
	/// Create a view into a sorted indexable that allows access within `bounds`.
	///
	/// - Complexity: O(`self.count`) if a validation closure is provided, otherwise O(`1`).
	public func binarySearchView(validateAgainst isOrderedBefore: ((_Element, _Element) -> Bool)? = nil) -> BinarySearchView<Self>
}

### `BinarySearchView<T: Indexable>`:

The author proposes the introduction of a `BinarySearchView` struct with the following interface:

public struct BinarySearchView<Base : Indexable where Base.Index : RandomAccessIndexType> : Indexable {
    public typealias Index = Base.Index

    public var startIndex: Base.Index { get }

    public var endIndex: Base.Index { get }

    public subscript (position: Index) -> Base._Element { get }

    /// Create a view into a sorted collection `base` that allows access within `bounds`.
    ///
    /// - Complexity: O(1).
    public init(base: Base, validateAgainst isOrderedBefore: ((Base._Element, Base._Element) -> Bool)? = default)

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `lensView(element) < value` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `value < lensView(element)` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index?

    /// Returns the range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Int

    internal let _base: Base
}

As well as the following extension for `Indexable where Element: Comparable`:

extension BinarySearchView where Base._Element : Comparable {
    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `… < element` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `element < …` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index?

    /// Returns the range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(log2(self.count))`.

An actual implementation of these methods can be found in `BinarySearchView.swift`.

### Why `lensView`?

Let's assume one had a collection of Persons like this:

struct Person {
    let name: String
    let age: Int
}

let persons: [Person] = [ … ]

Now let's assume one wanted to find the first person that's called `John Doe`. One could either change `Person` to conform to `Equatable`, allowing `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?

Fine, but what if one wanted to search `Person`s by both their `name` or their `age` depending on the given use case? You're out of luck now.

Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)` that is available for any type of `Element` regardless of its conformance to `Equatable`:

let searchView = persons.binarySearchView()
let index = persons.indexOf { $0.name == "John Doe" }

Unfortunately however such a `predicate` cannot be used when searching for an element's property in a sorted(!) collection or a collection of non-`Comparable` elements within a TIME complexity of `O(log2(self.count))` as binary search uses `<` instead of `==` internally and as the "needle" gets passed as both the first (in `indexOf(…)` itself) and second argument (in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator, one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V, E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).

The use of `lensView` however allows one to do just that without making a mess.

let searchView = persons.binarySearchView() // sorted!
let index = searchView.indexOf("John Doe") { $0.name }

The `lensView` works similarly to how keypaths work in Objective-C, just type-safe.

## Impact on existing code

The author proposes:

1. Moving `CollectionType.indices` to `Indexable` should not cause any issues with existing code.

2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not cause any issues with existing code as the `range` argument is optional and the default behaves like it used to.

2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should not cause any issues with existing code either, as the `range` argument is optional and the default behaves like it used to. Swift's trailing-closure magic allows `indexOf(predicate: …)` to still properly map to `indexOf(range: nil, predicate: …)`.

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` do not conflict with any existing API.

4. The introduction of a `BinarySearchView` on `Indexable` does not conflict with any existing API.

## Alternatives considered

The author's initial approach was to implement all methods on Indexable. This however required all methods to either be provided in two variants (one for unsorted, one for sorted `Indexable`s):

final public func indexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

final public func sortedIndexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

Or require the introduction of an additional argument:

final public func indexOf(element: _Element, range: Range<Index>? = nil, isSorted: Bool = true) -> Index? {
    if isSorted {
        return …
    } else {
        return …
    }
}

It also would have cluttered `Indexable` with methods such as `lowerBoundOf` and `upperBoundOf`, which are only ever applicable when `self` is sorted accordingly. The introduction of a dedicated BinarySearchView fixes these issues.

One also would have had to switch from `predicate: (T) -> Bool` to `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified API alternative.


(Brent Royal-Gordon) #2

I'm not sure about the rest of this, but...

1. A backwards-compatible refactoring of `CollectionType.indices`, moving it to `Indexable`.

2. A backwards-compatible refactoring of `indexOf(…)` (adding optional `range:` and moving it to `Indexable`).

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` with a TIME complexity of `O(self.count)`.

4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without cluttering `Indexable`'s interface.

I don't think you quite understand what `Indexable` is for.

`Indexable` is a minimal protocol containing the very most basic parts of `CollectionType`'s interface. It's used to avoid circular definitions in `CollectionType`. The doc comments on it describe it as "almost an implementation detail". I don't think it's appropriate to move a whole bunch of stuff into `Indexable` when it's supposed to be a minimal protocol.

···

--
Brent Royal-Gordon
Architechies


(TJ Usiyan) #3

The proposed additions to Indexible only require Indices… I don't see how
adding them is such a mistake. If we can gain these behaviors while only
*requiring* that minimum interface… go for it.

TJ

···

On Mon, Jan 4, 2016 at 3:49 PM, Brent Royal-Gordon via swift-evolution < swift-evolution@swift.org> wrote:

I'm not sure about the rest of this, but...

>> 1. A backwards-compatible refactoring of `CollectionType.indices`,
moving it to `Indexable`.
>>
>> 2. A backwards-compatible refactoring of `indexOf(…)` (adding optional
`range:` and moving it to `Indexable`).
>>
>> 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to
`Indexable` with a TIME complexity of `O(self.count)`.
>>
>> 4. The introduction of a `BinarySearchView` on `Indexable`, allowing
for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`,
`rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without
cluttering `Indexable`'s interface.

I don't think you quite understand what `Indexable` is for.

`Indexable` is a minimal protocol containing the very most basic parts of
`CollectionType`'s interface. It's used to avoid circular definitions in
`CollectionType`. The doc comments on it describe it as "almost an
implementation detail". I don't think it's appropriate to move a whole
bunch of stuff into `Indexable` when it's supposed to be a minimal protocol.

--
Brent Royal-Gordon
Architechies

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Vincent Esche) #4

There are a couple of things I’m not 100% happy with/sure about:

1.
I don’t like how it’s
“<index/range/count>Of(element:range:)” but “<index/range/count>Of(range:predicate:)”.
The reason I went for “<index/range/count>Of(range:predicate:)” was to go with the standard pattern of putting closures last and thus allowing for trailing closure syntax.

The current order allows for this:
“<index/range/count>Of(0..10) { $0.name == “Foobar" }”
which I like syntax-wise.
It however makes it look like one was looking for a range ("indexOf(0..10, …)”), not the predicate.

While the alternative requires this:
“<index/range/count>Of(predicate: { $0.name == “Foobar" }, range: 0..10)”

I’m actually leaning towards the latter now. Dang!

2.
I’m unsure about the name of “lensView”. I first went with “transform”, then with “mappingFunction”, but found that neither of those made it clear that the closure should provide a specific view into the element. Both “transform" and “mappingFunction” imply the transformation from one data type to another. It’s not about transforming. It’s about accessing.
Thus looked for names of similar patterns and found “keypaths” (kind of) in ObjC and lenses in Haskell. To people familiar with functional programming the name should be clear. The closure passed to “lensView” is basically the getter part of a functional lens.
Anyway, I’m not too attached to the name and more than open to suggestions.

3.
“BinarySearchView.init(base:validateAgainst:)” currently asserts. This should probably be split into two init functions. One assuming the base to be sorted (“BinarySearchView.init(base:)”, O(1)). And one allowing for a check, returning nil on failure (“BinarySearchView.init?(base:validateAgainst:)”, O(self.count)). Or should at least throw an error, instead of panicking.

Note: I made some minor documentation changes/fixes.
See https://gist.github.com/regexident/2b7531bd748f57679671 for up-to-date RFC/source code (vs. Markdown-dump at bottom of OP).

- Vincent

···

On 04 Jan 2016, at 16:13, Vincent Esche <regexident.mailinglists@gmail.com> wrote:

After having had this code laying around on my Mac since 6/26/15 (waiting for Swift to go open source)
I figured it’s about damn time I submit the actual RFC for it. So without further ado…

One of the areas where Swift’s stdlib is still quite lacking is searching.
Which is a shame as searching (along with indexing) might be among
the most important tasks of computer science/software development.
One might not need fast collection searches for writing a banal fart or flashlight app,
but almost every other app is likely to benefit from having a proper searching API.

So I’d like to fix that.

Attached you find a full RFC along with the full and functional source code to make it happen.

I’d love to hear your opinion on this and will be more than happy to answer questions.

Rendered Version + Full Implementation: https://gist.github.com/regexident/2b7531bd748f57679671
(The code is tested using Quick/Nimble. Tests would thus have to be ported to Swift’s XCTest.)

- Vincent

Markdown-dump for

# Improved Collection Search

* Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md)
* Author(s): [Vincent Esche](https://github.com/regexident)
* Status: **Review**
* Review manager: TBD

## Introduction

This RFC proposes an extension to the currently rather limited and linear searching API of `CollectionType`, that is `indexOf(element:)` and `indexOf(predicate:)`.
It proposes the backwards-compatible refactoring of those existing methods as well as the introduction of a view-based ensemble of methods for efficient binary-search.

Swift-evolution thread: [link to the discussion thread for that proposal](https://lists.swift.org/pipermail/swift-evolution)

## Motivation

Indexing of and searching in data is a big deal in software development. As such it is crucial to have capable means of doing so in a language that aspires to be a highly performant programming language.

Swift's current API for searching is basically limited to these two methods on `CollectionType`:

> ```swift
> func indexOf(element: Self.Generator.Element) -> Self.Index?
> ```
> Returns the first index where value appears in self or nil if value is not found.
>
> **Complexity**: `O(self.count)`.

and:

> ```swift
> func indexOf(@noescape predicate: (Self.Generator.Element) throws -> Bool) rethrows -> Self.Index?
> ```
> Returns the first index where predicate returns true for the corresponding value, or nil if such value is not found.
>
> **Complexity**: `O(self.count)`.

The author sees a couple of issue with these two methods:

1. They do not provide an option for restricting the "haystack" to a certain `range` of `self`.
2. They do not provide a fast (`O(log2(self.count))`) path for sorted collections.
3. They should not be limited to `CollectionType` but instead be moved to `Indexable`.

In many situations it is desirable to perform fast searches on sorted collections instead of having to fall back to naïve linear searches.

Further more while a single `index` might be the most common subject of interest, there are many scenarios where a `range` or `count` of matches are of more use.

And anyone who is writing a custom ordered collection type will eventually want to find the insertion index for a given element and will thus end up writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which all the other methods can be implemented, actually).

## Proposed solution

The author proposes:

1. A backwards-compatible refactoring of `CollectionType.indices`, moving it to `Indexable`.

2. A backwards-compatible refactoring of `indexOf(…)` (adding optional `range:` and moving it to `Indexable`).

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` with a TIME complexity of `O(self.count)`.

4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without cluttering `Indexable`'s interface.

## Detailed design

### `CollectionType.indices`:

The author proposes the relocation of `.indices` from `CollectionType` to `Indexable`:

extension Indexable {
    /// Return the range of valid index values.
    ///
    /// The result's `endIndex` is the same as that of `self`.  Because
    /// `Range` is half-open, iterating the values of the result produces
    /// all valid subscript arguments for `self`, omitting its `endIndex`.
    public var indices: Range<Self.Index> { get }
}

After all all it does is provide a convenience method for generating a Range from its `startIndex` and `endIndex`. There is no need for `self` to be a `CollectionType` for this.

This change should not break any existing code as it generalizes the property.

### `Indexable` of `Comparable` elements:

The author proposes the addition of the following method on `Indexable where Element : Comparable`:

extension Indexable where Index: RandomAccessIndexType, _Element: Comparable {
	/// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
	///
	/// - Complexity: O(`self.count`)
	public func isSorted(range: Range<Index>? = nil) -> Bool
}

This method would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of `Equatable ` elements:

The author proposes the addition of the following method on `Indexable where Element : Equatable`:

extension Indexable where Index : RandomAccessIndexType, _Element : Equatable {

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(element: _Element, range: Range<Index>? = default) -> Self.Index?

    /// Returns the first range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(element: _Element, range: Range<Index>? = default) -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(element: _Element, range: Range<Index>? = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of any elements:

The author proposes the addition of the following methods on `Indexable`:

extension Indexable where Index : RandomAccessIndexType {

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Self.Index?

    /// Returns the first range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Int

    /// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
    /// 
    /// - Complexity: O(`self.count`)
    public func isSorted(range: Range<Index>? = default, @noescape isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

**Note**: For explanation of and reasoning behind the `lensView:` argument as well as the distinction between `T` and `Base._Element`, see below.

### `Indexable.binarySearchView(validateAgainst:?)`:

The author proposes the addition of a `binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a specialized search view for performing searches with O(`log2(self.count)`) complexity:

extension Indexable where Index: RandomAccessIndexType {
	/// Create a view into a sorted indexable that allows access within `bounds`.
	///
	/// - Complexity: O(`self.count`) if a validation closure is provided, otherwise O(`1`).
	public func binarySearchView(validateAgainst isOrderedBefore: ((_Element, _Element) -> Bool)? = nil) -> BinarySearchView<Self>
}

### `BinarySearchView<T: Indexable>`:

The author proposes the introduction of a `BinarySearchView` struct with the following interface:

public struct BinarySearchView<Base : Indexable where Base.Index : RandomAccessIndexType> : Indexable {
    public typealias Index = Base.Index

    public var startIndex: Base.Index { get }

    public var endIndex: Base.Index { get }

    public subscript (position: Index) -> Base._Element { get }

    /// Create a view into a sorted collection `base` that allows access within `bounds`.
    ///
    /// - Complexity: O(1).
    public init(base: Base, validateAgainst isOrderedBefore: ((Base._Element, Base._Element) -> Bool)? = default)

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `lensView(element) < value` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `value < lensView(element)` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index?

    /// Returns the range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Int

    internal let _base: Base
}

As well as the following extension for `Indexable where Element: Comparable`:

extension BinarySearchView where Base._Element : Comparable {
    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `… < element` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `element < …` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index?

    /// Returns the range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(log2(self.count))`.

An actual implementation of these methods can be found in `BinarySearchView.swift`.

### Why `lensView`?

Let's assume one had a collection of Persons like this:

struct Person {
    let name: String
    let age: Int
}

let persons: [Person] = [ … ]

Now let's assume one wanted to find the first person that's called `John Doe`. One could either change `Person` to conform to `Equatable`, allowing `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?

Fine, but what if one wanted to search `Person`s by both their `name` or their `age` depending on the given use case? You're out of luck now.

Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)` that is available for any type of `Element` regardless of its conformance to `Equatable`:

let searchView = persons.binarySearchView()
let index = persons.indexOf { $0.name == "John Doe" }

Unfortunately however such a `predicate` cannot be used when searching for an element's property in a sorted(!) collection or a collection of non-`Comparable` elements within a TIME complexity of `O(log2(self.count))` as binary search uses `<` instead of `==` internally and as the "needle" gets passed as both the first (in `indexOf(…)` itself) and second argument (in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator, one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V, E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).

The use of `lensView` however allows one to do just that without making a mess.

let searchView = persons.binarySearchView() // sorted!
let index = searchView.indexOf("John Doe") { $0.name }

The `lensView` works similarly to how keypaths work in Objective-C, just type-safe.

## Impact on existing code

The author proposes:

1. Moving `CollectionType.indices` to `Indexable` should not cause any issues with existing code.

2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not cause any issues with existing code as the `range` argument is optional and the default behaves like it used to.

2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should not cause any issues with existing code either, as the `range` argument is optional and the default behaves like it used to. Swift's trailing-closure magic allows `indexOf(predicate: …)` to still properly map to `indexOf(range: nil, predicate: …)`.

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` do not conflict with any existing API.

4. The introduction of a `BinarySearchView` on `Indexable` does not conflict with any existing API.

## Alternatives considered

The author's initial approach was to implement all methods on Indexable. This however required all methods to either be provided in two variants (one for unsorted, one for sorted `Indexable`s):

final public func indexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

final public func sortedIndexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

Or require the introduction of an additional argument:

final public func indexOf(element: _Element, range: Range<Index>? = nil, isSorted: Bool = true) -> Index? {
    if isSorted {
        return …
    } else {
        return …
    }
}

It also would have cluttered `Indexable` with methods such as `lowerBoundOf` and `upperBoundOf`, which are only ever applicable when `self` is sorted accordingly. The introduction of a dedicated BinarySearchView fixes these issues.

One also would have had to switch from `predicate: (T) -> Bool` to `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified API alternative.


(Florian Bachmann) #5

Hey Vincent,
good proposal, love to see this in production. Better collection search
would become quite handy :slight_smile:

regards,
Flori

···

On Mon, Jan 4, 2016 at 4:13 PM, Vincent Esche via swift-evolution < swift-evolution@swift.org> wrote:

After having had this code laying around on my Mac since 6/26/15 (waiting
for Swift to go open source)
I figured it’s about damn time I submit the actual RFC for it. So without
further ado…

One of the areas where Swift’s stdlib is still quite lacking is searching.
Which is a shame as searching (along with indexing) might be among
the most important tasks of computer science/software development.
One might not need fast collection searches for writing a banal fart or
flashlight app,
but almost every other app is likely to benefit from having a proper
searching API.

So I’d like to fix that.

Attached you find a full RFC along with the full and functional source
code to make it happen.

I’d love to hear your opinion on this and will be more than happy to
answer questions.

Rendered Version + Full Implementation:
https://gist.github.com/regexident/2b7531bd748f57679671
(The code is tested using Quick/Nimble. Tests would thus have to be ported
to Swift’s XCTest.)

- Vincent

Markdown-dump for

# Improved Collection Search

* Proposal: [SE-NNNN](
https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md
)
* Author(s): [Vincent Esche](https://github.com/regexident)
* Status: **Review**
* Review manager: TBD

## Introduction

This RFC proposes an extension to the currently rather limited and linear
searching API of `CollectionType`, that is `indexOf(element:)` and
`indexOf(predicate:)`.
It proposes the backwards-compatible refactoring of those existing methods
as well as the introduction of a view-based ensemble of methods for
efficient binary-search.

Swift-evolution thread: [link to the discussion thread for that proposal](
https://lists.swift.org/pipermail/swift-evolution)

## Motivation

Indexing of and searching in data is a big deal in software development.
As such it is crucial to have capable means of doing so in a language that
aspires to be a highly performant programming language.

Swift's current API for searching is basically limited to these two
methods on `CollectionType`:

> ```swift
> func indexOf(element: Self.Generator.Element) -> Self.Index?
> ```
> Returns the first index where value appears in self or nil if value is
not found.
>
> **Complexity**: `O(self.count)`.

and:

> ```swift
> func indexOf(@noescape predicate: (Self.Generator.Element) throws ->
Bool) rethrows -> Self.Index?
> ```
> Returns the first index where predicate returns true for the
corresponding value, or nil if such value is not found.
>
> **Complexity**: `O(self.count)`.

The author sees a couple of issue with these two methods:

1. They do not provide an option for restricting the "haystack" to a
certain `range` of `self`.
2. They do not provide a fast (`O(log2(self.count))`) path for sorted
collections.
3. They should not be limited to `CollectionType` but instead be moved to
`Indexable`.

In many situations it is desirable to perform fast searches on sorted
collections instead of having to fall back to naïve linear searches.

Further more while a single `index` might be the most common subject of
interest, there are many scenarios where a `range` or `count` of matches
are of more use.

And anyone who is writing a custom ordered collection type will eventually
want to find the insertion index for a given element and will thus end up
writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which
all the other methods can be implemented, actually).

## Proposed solution

The author proposes:

1. A backwards-compatible refactoring of `CollectionType.indices`, moving
it to `Indexable`.

2. A backwards-compatible refactoring of `indexOf(…)` (adding optional
`range:` and moving it to `Indexable`).

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to
`Indexable` with a TIME complexity of `O(self.count)`.

4. The introduction of a `BinarySearchView` on `Indexable`, allowing for
fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`,
`rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without
cluttering `Indexable`'s interface.

## Detailed design

### `CollectionType.indices`:

The author proposes the relocation of `.indices` from `CollectionType` to
`Indexable`:

extension Indexable {
    /// Return the range of valid index values.
    ///
    /// The result's `endIndex` is the same as that of `self`.  Because
    /// `Range` is half-open, iterating the values of the result produces
    /// all valid subscript arguments for `self`, omitting its `endIndex`.
    public var indices: Range<Self.Index> { get }
}

After all all it does is provide a convenience method for generating a
Range from its `startIndex` and `endIndex`. There is no need for `self` to
be a `CollectionType` for this.

This change should not break any existing code as it generalizes the
property.

### `Indexable` of `Comparable` elements:

The author proposes the addition of the following method on `Indexable
where Element : Comparable`:

extension Indexable where Index: RandomAccessIndexType, _Element:
Comparable {
/// Return true iff all elements in `self` within `range` are sorted
according to `isOrderedBefore`.
///
/// - Complexity: O(`self.count`)
public func isSorted(range: Range<Index>? = nil) -> Bool
}

This method would perform a linear scan of the `Indexable` with a TIME
complexity of `O(self.count)`.

An actual implementation of these methods can be found in
`Indexable.swift`.

### `Indexable` of `Equatable ` elements:

The author proposes the addition of the following method on `Indexable
where Element : Equatable`:

extension Indexable where Index : RandomAccessIndexType, _Element :
Equatable {

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(element: _Element, range: Range<Index>? = default)
-> Self.Index?

    /// Returns the first range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(element: _Element, range: Range<Index>? = default)
-> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(element: _Element, range: Range<Index>? = default)
-> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME
complexity of `O(self.count)`.

An actual implementation of these methods can be found in
`Indexable.swift`.

### `Indexable` of any elements:

The author proposes the addition of the following methods on `Indexable`:

extension Indexable where Index : RandomAccessIndexType {

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or
`nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(range: Range<Index>? = default, @noescape
predicate: (_Element) throws -> Bool) rethrows -> Self.Index?

    /// Returns the first range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or
`nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(range: Range<Index>? = default, @noescape
predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(range: Range<Index>? = default, @noescape
predicate: (_Element) throws -> Bool) rethrows -> Int

    /// Return true iff all elements in `self` within `range` are sorted
according to `isOrderedBefore`.
    ///
    /// - Complexity: O(`self.count`)
    public func isSorted(range: Range<Index>? = default, @noescape
isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
}

These methods would perform a linear scan of the `Indexable` with a TIME
complexity of `O(self.count)`.

An actual implementation of these methods can be found in
`Indexable.swift`.

**Note**: For explanation of and reasoning behind the `lensView:` argument
as well as the distinction between `T` and `Base._Element`, see below.

### `Indexable.binarySearchView(validateAgainst:?)`:

The author proposes the addition of a
`binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a
specialized search view for performing searches with O(`log2(self.count)`)
complexity:

extension Indexable where Index: RandomAccessIndexType {
/// Create a view into a sorted indexable that allows access within
`bounds`.
///
/// - Complexity: O(`self.count`) if a validation closure is provided,
otherwise O(`1`).
public func binarySearchView(validateAgainst isOrderedBefore: ((_Element,
_Element) -> Bool)? = nil) -> BinarySearchView<Self>
}

### `BinarySearchView<T: Indexable>`:

The author proposes the introduction of a `BinarySearchView` struct with
the following interface:

public struct BinarySearchView<Base : Indexable where Base.Index :
> : Indexable {
    public typealias Index = Base.Index

    public var startIndex: Base.Index { get }

    public var endIndex: Base.Index { get }

    public subscript (position: Index) -> Base._Element { get }

    /// Create a view into a sorted collection `base` that allows access
within `bounds`.
    ///
    /// - Complexity: O(1).
    public init(base: Base, validateAgainst isOrderedBefore:
((Base._Element, Base._Element) -> Bool)? = default)

    /// Returns first index of the first `element` in `self` within range
`range` for which
    /// `lensView(element) < value` evaluates to false or `nil` if
`element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf<T : Comparable>(value: T, range:
Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool =
default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns first index of the first `element` in `self` within range
`range` for which
    /// `value < lensView(element)` evaluates to true or `nil` if
`element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf<T : Comparable>(value: T, range:
Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) =
default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or
`nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf<T : Comparable>(value: T, range: Range<Index>? =
default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape
lensView: Base._Element -> T) -> Base.Index?

    /// Returns the range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or
`nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf<T : Comparable>(value: T, range: Range<Index>? =
default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape
lensView: Base._Element -> T) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` for which `lensView(element) == value` or
`nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf<T : Comparable>(value: T, range: Range<Index>? =
default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape
lensView: Base._Element -> T) -> Int

    internal let _base: Base
}

As well as the following extension for `Indexable where Element:
Comparable`:

extension BinarySearchView where Base._Element : Comparable {
    /// Returns first index of the first `element` in `self` within range
`range` for which
    /// `… < element` evaluates to false or `nil` if `element` is not
found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf(element: Base._Element, range: Range<Index>?
= default, @noescape isOrderedBefore: (Base._Element, Base._Element) ->
Bool = default) -> Base.Index

    /// Returns first index of the first `element` in `self` within range
`range` for which
    /// `element < …` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf(element: Base._Element, range: Range<Index>?
= default, @noescape isOrderedBefore: (Base._Element, Base._Element) ->
Bool = default) -> Base.Index

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf(element: Base._Element, range: Range<Index>? =
default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool
= default) -> Base.Index?

    /// Returns the range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf(element: Base._Element, range: Range<Index>? =
default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool
= default) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf(element: Base._Element, range: Range<Index>? =
default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool
= default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME
complexity of `O(log2(self.count))`.

An actual implementation of these methods can be found in
`BinarySearchView.swift`.

### Why `lensView`?

Let's assume one had a collection of Persons like this:

struct Person {
    let name: String
    let age: Int
}

let persons: [Person] = [ … ]

Now let's assume one wanted to find the first person that's called `John
Doe`. One could either change `Person` to conform to `Equatable`, allowing
`indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?

Fine, but what if one wanted to search `Person`s by both their `name` or
their `age` depending on the given use case? You're out of luck now.

Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)`
that is available for any type of `Element` regardless of its conformance
to `Equatable`:

let searchView = persons.binarySearchView()
let index = persons.indexOf { $0.name == "John Doe" }

Unfortunately however such a `predicate` cannot be used when searching for
an element's property in a sorted(!) collection or a collection of
non-`Comparable` elements within a TIME complexity of `O(log2(self.count))`
as binary search uses `<` instead of `==` internally and as the "needle"
gets passed as both the first (in `indexOf(…)` itself) and second argument
(in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator,
one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V,
E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).

The use of `lensView` however allows one to do just that without making a
mess.

let searchView = persons.binarySearchView() // sorted!
let index = searchView.indexOf("John Doe") { $0.name }

The `lensView` works similarly to how keypaths work in Objective-C, just
type-safe.

## Impact on existing code

The author proposes:

1. Moving `CollectionType.indices` to `Indexable` should not cause any
issues with existing code.

2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not
cause any issues with existing code as the `range` argument is optional and
the default behaves like it used to.

2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should
not cause any issues with existing code either, as the `range` argument is
optional and the default behaves like it used to. Swift's trailing-closure
magic allows `indexOf(predicate: …)` to still properly map to
`indexOf(range: nil, predicate: …)`.

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to
`Indexable` do not conflict with any existing API.

4. The introduction of a `BinarySearchView` on `Indexable` does not
conflict with any existing API.

## Alternatives considered

The author's initial approach was to implement all methods on Indexable.
This however required all methods to either be provided in two variants
(one for unsorted, one for sorted `Indexable`s):

final public func indexOf(element: _Element, range: Range<Index>? = nil)
-> Index? {
return …
}

final public func sortedIndexOf(element: _Element, range: Range<Index>? =
nil) -> Index? {
return …
}

Or require the introduction of an additional argument:

final public func indexOf(element: _Element, range: Range<Index>? = nil,
isSorted: Bool = true) -> Index? {
    if isSorted {
        return …
    } else {
        return …
    }
}

It also would have cluttered `Indexable` with methods such as
`lowerBoundOf` and `upperBoundOf`, which are only ever applicable when
`self` is sorted accordingly. The introduction of a dedicated
BinarySearchView fixes these issues.

One also would have had to switch from `predicate: (T) -> Bool` to
`isOrderedBefore: (T, T) -> Bool` for all methods for the second unified
API alternative.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(nschum) #6

Hi Vincent,

I’m all for more search in the standard library. Most of these functions are very likely to be re-implemented in slightly different ways across projects and therefore would benefit from standardization.

Here are my notes on things that I think need some work:

- I dislike the possible assumption of a Collection to be “sorted" without validation. That’s an optimization that I consider too risky for a standard library. I think validation should be mandatory and also a “precondition" rather than a thrown error. (What are you supposed to do when you catch the error? The only thing I can think of is: Sort and try again.)

[Note: I only realized before sending that you had just changed this *to* an error. Why?]

- To me the name “binarySearchView” actually implies that it’s a sorted view into the collection (i.e. sorted once for O(log n) lookups afterwards). I would look for a different name, perhaps this one, inspired by your comment and modelled after “lazy":

let index = assumeSorted(persons).indexOf("John Doe”)

- I find it especially problematic that each method in BinarySearchView requires that the collection be sorted according to that method’s arguments beforehand. That is, the interface will let me use methods with any comparison function even though just one will work.

let persons = [Person(name: "Alice", age: 50), Person(name: "Bob", age: 30)]
let searchView = persons.binarySearchView // assuming sorted!
searchView.indexOf(Person(name: "Alice", age: 50), isOrderedBefore: <) // works
searchView.indexOf(Person(name: "Alice", age: 50), isOrderedBefore: >) // nonsense

The same applies to "lensViews": The following code is wrong, but looks fine at first glance:

let searchView = persons.binarySearchView // assuming sorted!
searchView.indexOf("John Doe") { $0.name } // works
searchView.indexOf(30) { $0.ago } // nonsense

It would be safer to make both properties of the searchView to avoid that possibility in the first place.

let searchView = persons.binarySearchView(<, {$0.name}) // assuming sorted!
searchView.indexOf("John Doe”)

- Have you considered just returning dictionaries, like Python’s “Counter"? That’s O(n), the same as the cost of validation, but allows O(1) lookups afterwards, even if the collection isn’t sorted.

- Have you considered including subsequence search?

Conclusion:
- Search functions, yes please!
- Search views need work, but might not even be worth it if validation is mandatory
- I would probably just remove lensView and maybe put that into separate proposal and consider its use across the entire stdlib. It seems like an orthogonal issue and the proposal would be more focused without. But I don’t see much benefit over using “map”, yet.

Nik

···

On 4 Jan 2016, at 16:13, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

After having had this code laying around on my Mac since 6/26/15 (waiting for Swift to go open source)
I figured it’s about damn time I submit the actual RFC for it. So without further ado…

One of the areas where Swift’s stdlib is still quite lacking is searching.
Which is a shame as searching (along with indexing) might be among
the most important tasks of computer science/software development.
One might not need fast collection searches for writing a banal fart or flashlight app,
but almost every other app is likely to benefit from having a proper searching API.

So I’d like to fix that.

Attached you find a full RFC along with the full and functional source code to make it happen.

I’d love to hear your opinion on this and will be more than happy to answer questions.

Rendered Version + Full Implementation: https://gist.github.com/regexident/2b7531bd748f57679671
(The code is tested using Quick/Nimble. Tests would thus have to be ported to Swift’s XCTest.)

- Vincent

Markdown-dump for

# Improved Collection Search

* Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md)
* Author(s): [Vincent Esche](https://github.com/regexident)
* Status: **Review**
* Review manager: TBD

## Introduction

This RFC proposes an extension to the currently rather limited and linear searching API of `CollectionType`, that is `indexOf(element:)` and `indexOf(predicate:)`.
It proposes the backwards-compatible refactoring of those existing methods as well as the introduction of a view-based ensemble of methods for efficient binary-search.

Swift-evolution thread: [link to the discussion thread for that proposal](https://lists.swift.org/pipermail/swift-evolution)

## Motivation

Indexing of and searching in data is a big deal in software development. As such it is crucial to have capable means of doing so in a language that aspires to be a highly performant programming language.

Swift's current API for searching is basically limited to these two methods on `CollectionType`:

> ```swift
> func indexOf(element: Self.Generator.Element) -> Self.Index?
> ```
> Returns the first index where value appears in self or nil if value is not found.
>
> **Complexity**: `O(self.count)`.

and:

> ```swift
> func indexOf(@noescape predicate: (Self.Generator.Element) throws -> Bool) rethrows -> Self.Index?
> ```
> Returns the first index where predicate returns true for the corresponding value, or nil if such value is not found.
>
> **Complexity**: `O(self.count)`.

The author sees a couple of issue with these two methods:

1. They do not provide an option for restricting the "haystack" to a certain `range` of `self`.
2. They do not provide a fast (`O(log2(self.count))`) path for sorted collections.
3. They should not be limited to `CollectionType` but instead be moved to `Indexable`.

In many situations it is desirable to perform fast searches on sorted collections instead of having to fall back to naïve linear searches.

Further more while a single `index` might be the most common subject of interest, there are many scenarios where a `range` or `count` of matches are of more use.

And anyone who is writing a custom ordered collection type will eventually want to find the insertion index for a given element and will thus end up writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which all the other methods can be implemented, actually).

## Proposed solution

The author proposes:

1. A backwards-compatible refactoring of `CollectionType.indices`, moving it to `Indexable`.

2. A backwards-compatible refactoring of `indexOf(…)` (adding optional `range:` and moving it to `Indexable`).

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` with a TIME complexity of `O(self.count)`.

4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without cluttering `Indexable`'s interface.

## Detailed design

### `CollectionType.indices`:

The author proposes the relocation of `.indices` from `CollectionType` to `Indexable`:

extension Indexable {
    /// Return the range of valid index values.
    ///
    /// The result's `endIndex` is the same as that of `self`.  Because
    /// `Range` is half-open, iterating the values of the result produces
    /// all valid subscript arguments for `self`, omitting its `endIndex`.
    public var indices: Range<Self.Index> { get }
}

After all all it does is provide a convenience method for generating a Range from its `startIndex` and `endIndex`. There is no need for `self` to be a `CollectionType` for this.

This change should not break any existing code as it generalizes the property.

### `Indexable` of `Comparable` elements:

The author proposes the addition of the following method on `Indexable where Element : Comparable`:

extension Indexable where Index: RandomAccessIndexType, _Element: Comparable {
	/// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
	///
	/// - Complexity: O(`self.count`)
	public func isSorted(range: Range<Index>? = nil) -> Bool
}

This method would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of `Equatable ` elements:

The author proposes the addition of the following method on `Indexable where Element : Equatable`:

extension Indexable where Index : RandomAccessIndexType, _Element : Equatable {

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(element: _Element, range: Range<Index>? = default) -> Self.Index?

    /// Returns the first range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(element: _Element, range: Range<Index>? = default) -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(element: _Element, range: Range<Index>? = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of any elements:

The author proposes the addition of the following methods on `Indexable`:

extension Indexable where Index : RandomAccessIndexType {

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Self.Index?

    /// Returns the first range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Int

    /// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
    /// 
    /// - Complexity: O(`self.count`)
    public func isSorted(range: Range<Index>? = default, @noescape isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

**Note**: For explanation of and reasoning behind the `lensView:` argument as well as the distinction between `T` and `Base._Element`, see below.

### `Indexable.binarySearchView(validateAgainst:?)`:

The author proposes the addition of a `binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a specialized search view for performing searches with O(`log2(self.count)`) complexity:

extension Indexable where Index: RandomAccessIndexType {
	/// Create a view into a sorted indexable that allows access within `bounds`.
	///
	/// - Complexity: O(`self.count`) if a validation closure is provided, otherwise O(`1`).
	public func binarySearchView(validateAgainst isOrderedBefore: ((_Element, _Element) -> Bool)? = nil) -> BinarySearchView<Self>
}

### `BinarySearchView<T: Indexable>`:

The author proposes the introduction of a `BinarySearchView` struct with the following interface:

public struct BinarySearchView<Base : Indexable where Base.Index : RandomAccessIndexType> : Indexable {
    public typealias Index = Base.Index

    public var startIndex: Base.Index { get }

    public var endIndex: Base.Index { get }

    public subscript (position: Index) -> Base._Element { get }

    /// Create a view into a sorted collection `base` that allows access within `bounds`.
    ///
    /// - Complexity: O(1).
    public init(base: Base, validateAgainst isOrderedBefore: ((Base._Element, Base._Element) -> Bool)? = default)

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `lensView(element) < value` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `value < lensView(element)` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index?

    /// Returns the range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Int

    internal let _base: Base
}

As well as the following extension for `Indexable where Element: Comparable`:

extension BinarySearchView where Base._Element : Comparable {
    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `… < element` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `element < …` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index?

    /// Returns the range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(log2(self.count))`.

An actual implementation of these methods can be found in `BinarySearchView.swift`.

### Why `lensView`?

Let's assume one had a collection of Persons like this:

struct Person {
    let name: String
    let age: Int
}

let persons: [Person] = [ … ]

Now let's assume one wanted to find the first person that's called `John Doe`. One could either change `Person` to conform to `Equatable`, allowing `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?

Fine, but what if one wanted to search `Person`s by both their `name` or their `age` depending on the given use case? You're out of luck now.

Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)` that is available for any type of `Element` regardless of its conformance to `Equatable`:

let searchView = persons.binarySearchView()
let index = persons.indexOf { $0.name == "John Doe" }

Unfortunately however such a `predicate` cannot be used when searching for an element's property in a sorted(!) collection or a collection of non-`Comparable` elements within a TIME complexity of `O(log2(self.count))` as binary search uses `<` instead of `==` internally and as the "needle" gets passed as both the first (in `indexOf(…)` itself) and second argument (in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator, one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V, E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).

The use of `lensView` however allows one to do just that without making a mess.

let searchView = persons.binarySearchView() // sorted!
let index = searchView.indexOf("John Doe") { $0.name }

The `lensView` works similarly to how keypaths work in Objective-C, just type-safe.

## Impact on existing code

The author proposes:

1. Moving `CollectionType.indices` to `Indexable` should not cause any issues with existing code.

2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not cause any issues with existing code as the `range` argument is optional and the default behaves like it used to.

2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should not cause any issues with existing code either, as the `range` argument is optional and the default behaves like it used to. Swift's trailing-closure magic allows `indexOf(predicate: …)` to still properly map to `indexOf(range: nil, predicate: …)`.

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` do not conflict with any existing API.

4. The introduction of a `BinarySearchView` on `Indexable` does not conflict with any existing API.

## Alternatives considered

The author's initial approach was to implement all methods on Indexable. This however required all methods to either be provided in two variants (one for unsorted, one for sorted `Indexable`s):

final public func indexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

final public func sortedIndexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

Or require the introduction of an additional argument:

final public func indexOf(element: _Element, range: Range<Index>? = nil, isSorted: Bool = true) -> Index? {
    if isSorted {
        return …
    } else {
        return …
    }
}

It also would have cluttered `Indexable` with methods such as `lowerBoundOf` and `upperBoundOf`, which are only ever applicable when `self` is sorted accordingly. The introduction of a dedicated BinarySearchView fixes these issues.

One also would have had to switch from `predicate: (T) -> Bool` to `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified API alternative.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Vincent Esche) #7

That was my reasoning behind choosing Indexable and not CollectionType.

Neither the Swift documents, nor the “headers” indicate any semi-privateness of Indexable, AFAIK.
(Apart from maybe Base._Element kind of hinting at it, but then again it’s Indexable, not _Indexable.)

Anyway, I’d be totally fine with sticking it on CollectionType, if that’s preferred.

Vincent

···

On 05 Jan 2016, at 00:58, T.J. Usiyan via swift-evolution <swift-evolution@swift.org> wrote:

The proposed additions to Indexible only require Indices… I don't see how adding them is such a mistake. If we can gain these behaviors while only *requiring* that minimum interface… go for it.

TJ

On Mon, Jan 4, 2016 at 3:49 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
I'm not sure about the rest of this, but...

>> 1. A backwards-compatible refactoring of `CollectionType.indices`, moving it to `Indexable`.
>>
>> 2. A backwards-compatible refactoring of `indexOf(…)` (adding optional `range:` and moving it to `Indexable`).
>>
>> 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` with a TIME complexity of `O(self.count)`.
>>
>> 4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without cluttering `Indexable`'s interface.

I don't think you quite understand what `Indexable` is for.

`Indexable` is a minimal protocol containing the very most basic parts of `CollectionType`'s interface. It's used to avoid circular definitions in `CollectionType`. The doc comments on it describe it as "almost an implementation detail". I don't think it's appropriate to move a whole bunch of stuff into `Indexable` when it's supposed to be a minimal protocol.

--
Brent Royal-Gordon
Architechies

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Donnacha Oisín Kidney) #8

Unless I’m confused, all of the “ranged” overloads can already be achieved by composing ranged subscripts with the indexOf, etc., methods:

let ar = [0, 1, 2, 3, 4, 5]
let ind = ar[3...5].indexOf { n in n % 2 == 0 }
ar[ind!] // 4

···

On 5 Jan 2016, at 01:00, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

There are a couple of things I’m not 100% happy with/sure about:

1.
I don’t like how it’s
“<index/range/count>Of(element:range:)” but “<index/range/count>Of(range:predicate:)”.
The reason I went for “<index/range/count>Of(range:predicate:)” was to go with the standard pattern of putting closures last and thus allowing for trailing closure syntax.

The current order allows for this:
“<index/range/count>Of(0..10) { $0.name == “Foobar" }”
which I like syntax-wise.
It however makes it look like one was looking for a range ("indexOf(0..10, …)”), not the predicate.

While the alternative requires this:
“<index/range/count>Of(predicate: { $0.name == “Foobar" }, range: 0..10)”

I’m actually leaning towards the latter now. Dang!

2.
I’m unsure about the name of “lensView”. I first went with “transform”, then with “mappingFunction”, but found that neither of those made it clear that the closure should provide a specific view into the element. Both “transform" and “mappingFunction” imply the transformation from one data type to another. It’s not about transforming. It’s about accessing.
Thus looked for names of similar patterns and found “keypaths” (kind of) in ObjC and lenses in Haskell. To people familiar with functional programming the name should be clear. The closure passed to “lensView” is basically the getter part of a functional lens.
Anyway, I’m not too attached to the name and more than open to suggestions.

3.
“BinarySearchView.init(base:validateAgainst:)” currently asserts. This should probably be split into two init functions. One assuming the base to be sorted (“BinarySearchView.init(base:)”, O(1)). And one allowing for a check, returning nil on failure (“BinarySearchView.init?(base:validateAgainst:)”, O(self.count)). Or should at least throw an error, instead of panicking.

Note: I made some minor documentation changes/fixes.
See https://gist.github.com/regexident/2b7531bd748f57679671 for up-to-date RFC/source code (vs. Markdown-dump at bottom of OP).

- Vincent

On 04 Jan 2016, at 16:13, Vincent Esche <regexident.mailinglists@gmail.com <mailto:regexident.mailinglists@gmail.com>> wrote:

After having had this code laying around on my Mac since 6/26/15 (waiting for Swift to go open source)
I figured it’s about damn time I submit the actual RFC for it. So without further ado…

One of the areas where Swift’s stdlib is still quite lacking is searching.
Which is a shame as searching (along with indexing) might be among
the most important tasks of computer science/software development.
One might not need fast collection searches for writing a banal fart or flashlight app,
but almost every other app is likely to benefit from having a proper searching API.

So I’d like to fix that.

Attached you find a full RFC along with the full and functional source code to make it happen.

I’d love to hear your opinion on this and will be more than happy to answer questions.

Rendered Version + Full Implementation: https://gist.github.com/regexident/2b7531bd748f57679671
(The code is tested using Quick/Nimble. Tests would thus have to be ported to Swift’s XCTest.)

- Vincent

Markdown-dump for

# Improved Collection Search

* Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md)
* Author(s): [Vincent Esche](https://github.com/regexident)
* Status: **Review**
* Review manager: TBD

## Introduction

This RFC proposes an extension to the currently rather limited and linear searching API of `CollectionType`, that is `indexOf(element:)` and `indexOf(predicate:)`.
It proposes the backwards-compatible refactoring of those existing methods as well as the introduction of a view-based ensemble of methods for efficient binary-search.

Swift-evolution thread: [link to the discussion thread for that proposal](https://lists.swift.org/pipermail/swift-evolution)

## Motivation

Indexing of and searching in data is a big deal in software development. As such it is crucial to have capable means of doing so in a language that aspires to be a highly performant programming language.

Swift's current API for searching is basically limited to these two methods on `CollectionType`:

> ```swift
> func indexOf(element: Self.Generator.Element) -> Self.Index?
> ```
> Returns the first index where value appears in self or nil if value is not found.
>
> **Complexity**: `O(self.count)`.

and:

> ```swift
> func indexOf(@noescape predicate: (Self.Generator.Element) throws -> Bool) rethrows -> Self.Index?
> ```
> Returns the first index where predicate returns true for the corresponding value, or nil if such value is not found.
>
> **Complexity**: `O(self.count)`.

The author sees a couple of issue with these two methods:

1. They do not provide an option for restricting the "haystack" to a certain `range` of `self`.
2. They do not provide a fast (`O(log2(self.count))`) path for sorted collections.
3. They should not be limited to `CollectionType` but instead be moved to `Indexable`.

In many situations it is desirable to perform fast searches on sorted collections instead of having to fall back to naïve linear searches.

Further more while a single `index` might be the most common subject of interest, there are many scenarios where a `range` or `count` of matches are of more use.

And anyone who is writing a custom ordered collection type will eventually want to find the insertion index for a given element and will thus end up writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which all the other methods can be implemented, actually).

## Proposed solution

The author proposes:

1. A backwards-compatible refactoring of `CollectionType.indices`, moving it to `Indexable`.

2. A backwards-compatible refactoring of `indexOf(…)` (adding optional `range:` and moving it to `Indexable`).

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` with a TIME complexity of `O(self.count)`.

4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without cluttering `Indexable`'s interface.

## Detailed design

### `CollectionType.indices`:

The author proposes the relocation of `.indices` from `CollectionType` to `Indexable`:

extension Indexable {
    /// Return the range of valid index values.
    ///
    /// The result's `endIndex` is the same as that of `self`.  Because
    /// `Range` is half-open, iterating the values of the result produces
    /// all valid subscript arguments for `self`, omitting its `endIndex`.
    public var indices: Range<Self.Index> { get }
}

After all all it does is provide a convenience method for generating a Range from its `startIndex` and `endIndex`. There is no need for `self` to be a `CollectionType` for this.

This change should not break any existing code as it generalizes the property.

### `Indexable` of `Comparable` elements:

The author proposes the addition of the following method on `Indexable where Element : Comparable`:

extension Indexable where Index: RandomAccessIndexType, _Element: Comparable {
	/// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
	///
	/// - Complexity: O(`self.count`)
	public func isSorted(range: Range<Index>? = nil) -> Bool
}

This method would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of `Equatable ` elements:

The author proposes the addition of the following method on `Indexable where Element : Equatable`:

extension Indexable where Index : RandomAccessIndexType, _Element : Equatable {

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(element: _Element, range: Range<Index>? = default) -> Self.Index?

    /// Returns the first range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(element: _Element, range: Range<Index>? = default) -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(element: _Element, range: Range<Index>? = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of any elements:

The author proposes the addition of the following methods on `Indexable`:

extension Indexable where Index : RandomAccessIndexType {

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Self.Index?

    /// Returns the first range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Int

    /// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
    /// 
    /// - Complexity: O(`self.count`)
    public func isSorted(range: Range<Index>? = default, @noescape isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

**Note**: For explanation of and reasoning behind the `lensView:` argument as well as the distinction between `T` and `Base._Element`, see below.

### `Indexable.binarySearchView(validateAgainst:?)`:

The author proposes the addition of a `binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a specialized search view for performing searches with O(`log2(self.count)`) complexity:

extension Indexable where Index: RandomAccessIndexType {
	/// Create a view into a sorted indexable that allows access within `bounds`.
	///
	/// - Complexity: O(`self.count`) if a validation closure is provided, otherwise O(`1`).
	public func binarySearchView(validateAgainst isOrderedBefore: ((_Element, _Element) -> Bool)? = nil) -> BinarySearchView<Self>
}

### `BinarySearchView<T: Indexable>`:

The author proposes the introduction of a `BinarySearchView` struct with the following interface:

public struct BinarySearchView<Base : Indexable where Base.Index : RandomAccessIndexType> : Indexable {
    public typealias Index = Base.Index

    public var startIndex: Base.Index { get }

    public var endIndex: Base.Index { get }

    public subscript (position: Index) -> Base._Element { get }

    /// Create a view into a sorted collection `base` that allows access within `bounds`.
    ///
    /// - Complexity: O(1).
    public init(base: Base, validateAgainst isOrderedBefore: ((Base._Element, Base._Element) -> Bool)? = default)

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `lensView(element) < value` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `value < lensView(element)` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index?

    /// Returns the range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Int

    internal let _base: Base
}

As well as the following extension for `Indexable where Element: Comparable`:

extension BinarySearchView where Base._Element : Comparable {
    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `… < element` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `element < …` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index?

    /// Returns the range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(log2(self.count))`.

An actual implementation of these methods can be found in `BinarySearchView.swift`.

### Why `lensView`?

Let's assume one had a collection of Persons like this:

struct Person {
    let name: String
    let age: Int
}

let persons: [Person] = [ … ]

Now let's assume one wanted to find the first person that's called `John Doe`. One could either change `Person` to conform to `Equatable`, allowing `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?

Fine, but what if one wanted to search `Person`s by both their `name` or their `age` depending on the given use case? You're out of luck now.

Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)` that is available for any type of `Element` regardless of its conformance to `Equatable`:

let searchView = persons.binarySearchView()
let index = persons.indexOf { $0.name == "John Doe" }

Unfortunately however such a `predicate` cannot be used when searching for an element's property in a sorted(!) collection or a collection of non-`Comparable` elements within a TIME complexity of `O(log2(self.count))` as binary search uses `<` instead of `==` internally and as the "needle" gets passed as both the first (in `indexOf(…)` itself) and second argument (in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator, one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V, E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).

The use of `lensView` however allows one to do just that without making a mess.

let searchView = persons.binarySearchView() // sorted!
let index = searchView.indexOf("John Doe") { $0.name }

The `lensView` works similarly to how keypaths work in Objective-C, just type-safe.

## Impact on existing code

The author proposes:

1. Moving `CollectionType.indices` to `Indexable` should not cause any issues with existing code.

2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not cause any issues with existing code as the `range` argument is optional and the default behaves like it used to.

2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should not cause any issues with existing code either, as the `range` argument is optional and the default behaves like it used to. Swift's trailing-closure magic allows `indexOf(predicate: …)` to still properly map to `indexOf(range: nil, predicate: …)`.

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` do not conflict with any existing API.

4. The introduction of a `BinarySearchView` on `Indexable` does not conflict with any existing API.

## Alternatives considered

The author's initial approach was to implement all methods on Indexable. This however required all methods to either be provided in two variants (one for unsorted, one for sorted `Indexable`s):

final public func indexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

final public func sortedIndexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

Or require the introduction of an additional argument:

final public func indexOf(element: _Element, range: Range<Index>? = nil, isSorted: Bool = true) -> Index? {
    if isSorted {
        return …
    } else {
        return …
    }
}

It also would have cluttered `Indexable` with methods such as `lowerBoundOf` and `upperBoundOf`, which are only ever applicable when `self` is sorted accordingly. The introduction of a dedicated BinarySearchView fixes these issues.

One also would have had to switch from `predicate: (T) -> Bool` to `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified API alternative.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Vincent Esche) #9

Fair point! I’m all in favor of a simple(r) API. Noted, and thanks for the comment!

···

On 05 Jan 2016, at 03:07, Donnacha Oisín Kidney <oisin.kidney@gmail.com> wrote:

Unless I’m confused, all of the “ranged” overloads can already be achieved by composing ranged subscripts with the indexOf, etc., methods:

let ar = [0, 1, 2, 3, 4, 5]
let ind = ar[3...5].indexOf { n in n % 2 == 0 }
ar[ind!] // 4

On 5 Jan 2016, at 01:00, Vincent Esche via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

There are a couple of things I’m not 100% happy with/sure about:

1.
I don’t like how it’s
“<index/range/count>Of(element:range:)” but “<index/range/count>Of(range:predicate:)”.
The reason I went for “<index/range/count>Of(range:predicate:)” was to go with the standard pattern of putting closures last and thus allowing for trailing closure syntax.

The current order allows for this:
“<index/range/count>Of(0..10) { $0.name == “Foobar" }”
which I like syntax-wise.
It however makes it look like one was looking for a range ("indexOf(0..10, …)”), not the predicate.

While the alternative requires this:
“<index/range/count>Of(predicate: { $0.name == “Foobar" }, range: 0..10)”

I’m actually leaning towards the latter now. Dang!

2.
I’m unsure about the name of “lensView”. I first went with “transform”, then with “mappingFunction”, but found that neither of those made it clear that the closure should provide a specific view into the element. Both “transform" and “mappingFunction” imply the transformation from one data type to another. It’s not about transforming. It’s about accessing.
Thus looked for names of similar patterns and found “keypaths” (kind of) in ObjC and lenses in Haskell. To people familiar with functional programming the name should be clear. The closure passed to “lensView” is basically the getter part of a functional lens.
Anyway, I’m not too attached to the name and more than open to suggestions.

3.
“BinarySearchView.init(base:validateAgainst:)” currently asserts. This should probably be split into two init functions. One assuming the base to be sorted (“BinarySearchView.init(base:)”, O(1)). And one allowing for a check, returning nil on failure (“BinarySearchView.init?(base:validateAgainst:)”, O(self.count)). Or should at least throw an error, instead of panicking.

Note: I made some minor documentation changes/fixes.
See https://gist.github.com/regexident/2b7531bd748f57679671 for up-to-date RFC/source code (vs. Markdown-dump at bottom of OP).

- Vincent

On 04 Jan 2016, at 16:13, Vincent Esche <regexident.mailinglists@gmail.com <mailto:regexident.mailinglists@gmail.com>> wrote:

After having had this code laying around on my Mac since 6/26/15 (waiting for Swift to go open source)
I figured it’s about damn time I submit the actual RFC for it. So without further ado…

One of the areas where Swift’s stdlib is still quite lacking is searching.
Which is a shame as searching (along with indexing) might be among
the most important tasks of computer science/software development.
One might not need fast collection searches for writing a banal fart or flashlight app,
but almost every other app is likely to benefit from having a proper searching API.

So I’d like to fix that.

Attached you find a full RFC along with the full and functional source code to make it happen.

I’d love to hear your opinion on this and will be more than happy to answer questions.

Rendered Version + Full Implementation: https://gist.github.com/regexident/2b7531bd748f57679671
(The code is tested using Quick/Nimble. Tests would thus have to be ported to Swift’s XCTest.)

- Vincent

Markdown-dump for

# Improved Collection Search

* Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md)
* Author(s): [Vincent Esche](https://github.com/regexident)
* Status: **Review**
* Review manager: TBD

## Introduction

This RFC proposes an extension to the currently rather limited and linear searching API of `CollectionType`, that is `indexOf(element:)` and `indexOf(predicate:)`.
It proposes the backwards-compatible refactoring of those existing methods as well as the introduction of a view-based ensemble of methods for efficient binary-search.

Swift-evolution thread: [link to the discussion thread for that proposal](https://lists.swift.org/pipermail/swift-evolution)

## Motivation

Indexing of and searching in data is a big deal in software development. As such it is crucial to have capable means of doing so in a language that aspires to be a highly performant programming language.

Swift's current API for searching is basically limited to these two methods on `CollectionType`:

> ```swift
> func indexOf(element: Self.Generator.Element) -> Self.Index?
> ```
> Returns the first index where value appears in self or nil if value is not found.
>
> **Complexity**: `O(self.count)`.

and:

> ```swift
> func indexOf(@noescape predicate: (Self.Generator.Element) throws -> Bool) rethrows -> Self.Index?
> ```
> Returns the first index where predicate returns true for the corresponding value, or nil if such value is not found.
>
> **Complexity**: `O(self.count)`.

The author sees a couple of issue with these two methods:

1. They do not provide an option for restricting the "haystack" to a certain `range` of `self`.
2. They do not provide a fast (`O(log2(self.count))`) path for sorted collections.
3. They should not be limited to `CollectionType` but instead be moved to `Indexable`.

In many situations it is desirable to perform fast searches on sorted collections instead of having to fall back to naïve linear searches.

Further more while a single `index` might be the most common subject of interest, there are many scenarios where a `range` or `count` of matches are of more use.

And anyone who is writing a custom ordered collection type will eventually want to find the insertion index for a given element and will thus end up writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which all the other methods can be implemented, actually).

## Proposed solution

The author proposes:

1. A backwards-compatible refactoring of `CollectionType.indices`, moving it to `Indexable`.

2. A backwards-compatible refactoring of `indexOf(…)` (adding optional `range:` and moving it to `Indexable`).

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` with a TIME complexity of `O(self.count)`.

4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without cluttering `Indexable`'s interface.

## Detailed design

### `CollectionType.indices`:

The author proposes the relocation of `.indices` from `CollectionType` to `Indexable`:

extension Indexable {
    /// Return the range of valid index values.
    ///
    /// The result's `endIndex` is the same as that of `self`.  Because
    /// `Range` is half-open, iterating the values of the result produces
    /// all valid subscript arguments for `self`, omitting its `endIndex`.
    public var indices: Range<Self.Index> { get }
}

After all all it does is provide a convenience method for generating a Range from its `startIndex` and `endIndex`. There is no need for `self` to be a `CollectionType` for this.

This change should not break any existing code as it generalizes the property.

### `Indexable` of `Comparable` elements:

The author proposes the addition of the following method on `Indexable where Element : Comparable`:

extension Indexable where Index: RandomAccessIndexType, _Element: Comparable {
	/// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
	///
	/// - Complexity: O(`self.count`)
	public func isSorted(range: Range<Index>? = nil) -> Bool
}

This method would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of `Equatable ` elements:

The author proposes the addition of the following method on `Indexable where Element : Equatable`:

extension Indexable where Index : RandomAccessIndexType, _Element : Equatable {

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(element: _Element, range: Range<Index>? = default) -> Self.Index?

    /// Returns the first range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(element: _Element, range: Range<Index>? = default) -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(element: _Element, range: Range<Index>? = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of any elements:

The author proposes the addition of the following methods on `Indexable`:

extension Indexable where Index : RandomAccessIndexType {

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Self.Index?

    /// Returns the first range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Int

    /// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
    /// 
    /// - Complexity: O(`self.count`)
    public func isSorted(range: Range<Index>? = default, @noescape isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

**Note**: For explanation of and reasoning behind the `lensView:` argument as well as the distinction between `T` and `Base._Element`, see below.

### `Indexable.binarySearchView(validateAgainst:?)`:

The author proposes the addition of a `binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a specialized search view for performing searches with O(`log2(self.count)`) complexity:

extension Indexable where Index: RandomAccessIndexType {
	/// Create a view into a sorted indexable that allows access within `bounds`.
	///
	/// - Complexity: O(`self.count`) if a validation closure is provided, otherwise O(`1`).
	public func binarySearchView(validateAgainst isOrderedBefore: ((_Element, _Element) -> Bool)? = nil) -> BinarySearchView<Self>
}

### `BinarySearchView<T: Indexable>`:

The author proposes the introduction of a `BinarySearchView` struct with the following interface:

public struct BinarySearchView<Base : Indexable where Base.Index : RandomAccessIndexType> : Indexable {
    public typealias Index = Base.Index

    public var startIndex: Base.Index { get }

    public var endIndex: Base.Index { get }

    public subscript (position: Index) -> Base._Element { get }

    /// Create a view into a sorted collection `base` that allows access within `bounds`.
    ///
    /// - Complexity: O(1).
    public init(base: Base, validateAgainst isOrderedBefore: ((Base._Element, Base._Element) -> Bool)? = default)

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `lensView(element) < value` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `value < lensView(element)` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index?

    /// Returns the range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Int

    internal let _base: Base
}

As well as the following extension for `Indexable where Element: Comparable`:

extension BinarySearchView where Base._Element : Comparable {
    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `… < element` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `element < …` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index?

    /// Returns the range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(log2(self.count))`.

An actual implementation of these methods can be found in `BinarySearchView.swift`.

### Why `lensView`?

Let's assume one had a collection of Persons like this:

struct Person {
    let name: String
    let age: Int
}

let persons: [Person] = [ … ]

Now let's assume one wanted to find the first person that's called `John Doe`. One could either change `Person` to conform to `Equatable`, allowing `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?

Fine, but what if one wanted to search `Person`s by both their `name` or their `age` depending on the given use case? You're out of luck now.

Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)` that is available for any type of `Element` regardless of its conformance to `Equatable`:

let searchView = persons.binarySearchView()
let index = persons.indexOf { $0.name == "John Doe" }

Unfortunately however such a `predicate` cannot be used when searching for an element's property in a sorted(!) collection or a collection of non-`Comparable` elements within a TIME complexity of `O(log2(self.count))` as binary search uses `<` instead of `==` internally and as the "needle" gets passed as both the first (in `indexOf(…)` itself) and second argument (in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator, one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V, E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).

The use of `lensView` however allows one to do just that without making a mess.

let searchView = persons.binarySearchView() // sorted!
let index = searchView.indexOf("John Doe") { $0.name }

The `lensView` works similarly to how keypaths work in Objective-C, just type-safe.

## Impact on existing code

The author proposes:

1. Moving `CollectionType.indices` to `Indexable` should not cause any issues with existing code.

2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not cause any issues with existing code as the `range` argument is optional and the default behaves like it used to.

2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should not cause any issues with existing code either, as the `range` argument is optional and the default behaves like it used to. Swift's trailing-closure magic allows `indexOf(predicate: …)` to still properly map to `indexOf(range: nil, predicate: …)`.

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` do not conflict with any existing API.

4. The introduction of a `BinarySearchView` on `Indexable` does not conflict with any existing API.

## Alternatives considered

The author's initial approach was to implement all methods on Indexable. This however required all methods to either be provided in two variants (one for unsorted, one for sorted `Indexable`s):

final public func indexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

final public func sortedIndexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

Or require the introduction of an additional argument:

final public func indexOf(element: _Element, range: Range<Index>? = nil, isSorted: Bool = true) -> Index? {
    if isSorted {
        return …
    } else {
        return …
    }
}

It also would have cluttered `Indexable` with methods such as `lowerBoundOf` and `upperBoundOf`, which are only ever applicable when `self` is sorted accordingly. The introduction of a dedicated BinarySearchView fixes these issues.

One also would have had to switch from `predicate: (T) -> Bool` to `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified API alternative.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Vincent Esche) #10

I’ve updated a couple of things in the proposal draft:

1.
Given the obvious redundancy of “range:” in the existence of "CollectionType.subscript(bounds:)” I’ve removed it from the proposed API.
For the sake of lightweighted-ness and speed “BinarySearchView" now accepts a range and supports ranged subscripts itself (required for efficient implementation of rangeOf).

2.
“isOrderedBefore” and “predicate” now support throwing closures, as does the rest of stdlib. Changed affected methods to rethrow accordingly.

3.
BinarySearchView now as two initializers: “init(base:bounds?:)” and “init(base:bounds?:isOrderedBefore:) throws”. Changed the latter from "assert()” to "throw BinarySearchViewError.BaseNotSorted”.

This solves two of the issues (1 & 3) I previously had with the implementation.

Updated Proposal Draft: https://gist.github.com/regexident/2b7531bd748f57679671

Any comments welcome.

- Vincent

···

On 05 Jan 2016, at 12:15, Vincent Esche <regexident.mailinglists@gmail.com> wrote:

Fair point! I’m all in favor of a simple(r) API. Noted, and thanks for the comment!

On 05 Jan 2016, at 03:07, Donnacha Oisín Kidney <oisin.kidney@gmail.com <mailto:oisin.kidney@gmail.com>> wrote:

Unless I’m confused, all of the “ranged” overloads can already be achieved by composing ranged subscripts with the indexOf, etc., methods:

let ar = [0, 1, 2, 3, 4, 5]
let ind = ar[3...5].indexOf { n in n % 2 == 0 }
ar[ind!] // 4

On 5 Jan 2016, at 01:00, Vincent Esche via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

There are a couple of things I’m not 100% happy with/sure about:

1.
I don’t like how it’s
“<index/range/count>Of(element:range:)” but “<index/range/count>Of(range:predicate:)”.
The reason I went for “<index/range/count>Of(range:predicate:)” was to go with the standard pattern of putting closures last and thus allowing for trailing closure syntax.

The current order allows for this:
“<index/range/count>Of(0..10) { $0.name == “Foobar" }”
which I like syntax-wise.
It however makes it look like one was looking for a range ("indexOf(0..10, …)”), not the predicate.

While the alternative requires this:
“<index/range/count>Of(predicate: { $0.name == “Foobar" }, range: 0..10)”

I’m actually leaning towards the latter now. Dang!

2.
I’m unsure about the name of “lensView”. I first went with “transform”, then with “mappingFunction”, but found that neither of those made it clear that the closure should provide a specific view into the element. Both “transform" and “mappingFunction” imply the transformation from one data type to another. It’s not about transforming. It’s about accessing.
Thus looked for names of similar patterns and found “keypaths” (kind of) in ObjC and lenses in Haskell. To people familiar with functional programming the name should be clear. The closure passed to “lensView” is basically the getter part of a functional lens.
Anyway, I’m not too attached to the name and more than open to suggestions.

3.
“BinarySearchView.init(base:validateAgainst:)” currently asserts. This should probably be split into two init functions. One assuming the base to be sorted (“BinarySearchView.init(base:)”, O(1)). And one allowing for a check, returning nil on failure (“BinarySearchView.init?(base:validateAgainst:)”, O(self.count)). Or should at least throw an error, instead of panicking.

Note: I made some minor documentation changes/fixes.
See https://gist.github.com/regexident/2b7531bd748f57679671 for up-to-date RFC/source code (vs. Markdown-dump at bottom of OP).

- Vincent

On 04 Jan 2016, at 16:13, Vincent Esche <regexident.mailinglists@gmail.com <mailto:regexident.mailinglists@gmail.com>> wrote:

After having had this code laying around on my Mac since 6/26/15 (waiting for Swift to go open source)
I figured it’s about damn time I submit the actual RFC for it. So without further ado…

One of the areas where Swift’s stdlib is still quite lacking is searching.
Which is a shame as searching (along with indexing) might be among
the most important tasks of computer science/software development.
One might not need fast collection searches for writing a banal fart or flashlight app,
but almost every other app is likely to benefit from having a proper searching API.

So I’d like to fix that.

Attached you find a full RFC along with the full and functional source code to make it happen.

I’d love to hear your opinion on this and will be more than happy to answer questions.

Rendered Version + Full Implementation: https://gist.github.com/regexident/2b7531bd748f57679671
(The code is tested using Quick/Nimble. Tests would thus have to be ported to Swift’s XCTest.)

- Vincent

Markdown-dump for

# Improved Collection Search

* Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md)
* Author(s): [Vincent Esche](https://github.com/regexident)
* Status: **Review**
* Review manager: TBD

## Introduction

This RFC proposes an extension to the currently rather limited and linear searching API of `CollectionType`, that is `indexOf(element:)` and `indexOf(predicate:)`.
It proposes the backwards-compatible refactoring of those existing methods as well as the introduction of a view-based ensemble of methods for efficient binary-search.

Swift-evolution thread: [link to the discussion thread for that proposal](https://lists.swift.org/pipermail/swift-evolution)

## Motivation

Indexing of and searching in data is a big deal in software development. As such it is crucial to have capable means of doing so in a language that aspires to be a highly performant programming language.

Swift's current API for searching is basically limited to these two methods on `CollectionType`:

> ```swift
> func indexOf(element: Self.Generator.Element) -> Self.Index?
> ```
> Returns the first index where value appears in self or nil if value is not found.
>
> **Complexity**: `O(self.count)`.

and:

> ```swift
> func indexOf(@noescape predicate: (Self.Generator.Element) throws -> Bool) rethrows -> Self.Index?
> ```
> Returns the first index where predicate returns true for the corresponding value, or nil if such value is not found.
>
> **Complexity**: `O(self.count)`.

The author sees a couple of issue with these two methods:

1. They do not provide an option for restricting the "haystack" to a certain `range` of `self`.
2. They do not provide a fast (`O(log2(self.count))`) path for sorted collections.
3. They should not be limited to `CollectionType` but instead be moved to `Indexable`.

In many situations it is desirable to perform fast searches on sorted collections instead of having to fall back to naïve linear searches.

Further more while a single `index` might be the most common subject of interest, there are many scenarios where a `range` or `count` of matches are of more use.

And anyone who is writing a custom ordered collection type will eventually want to find the insertion index for a given element and will thus end up writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which all the other methods can be implemented, actually).

## Proposed solution

The author proposes:

1. A backwards-compatible refactoring of `CollectionType.indices`, moving it to `Indexable`.

2. A backwards-compatible refactoring of `indexOf(…)` (adding optional `range:` and moving it to `Indexable`).

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` with a TIME complexity of `O(self.count)`.

4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without cluttering `Indexable`'s interface.

## Detailed design

### `CollectionType.indices`:

The author proposes the relocation of `.indices` from `CollectionType` to `Indexable`:

extension Indexable {
    /// Return the range of valid index values.
    ///
    /// The result's `endIndex` is the same as that of `self`.  Because
    /// `Range` is half-open, iterating the values of the result produces
    /// all valid subscript arguments for `self`, omitting its `endIndex`.
    public var indices: Range<Self.Index> { get }
}

After all all it does is provide a convenience method for generating a Range from its `startIndex` and `endIndex`. There is no need for `self` to be a `CollectionType` for this.

This change should not break any existing code as it generalizes the property.

### `Indexable` of `Comparable` elements:

The author proposes the addition of the following method on `Indexable where Element : Comparable`:

extension Indexable where Index: RandomAccessIndexType, _Element: Comparable {
	/// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
	///
	/// - Complexity: O(`self.count`)
	public func isSorted(range: Range<Index>? = nil) -> Bool
}

This method would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of `Equatable ` elements:

The author proposes the addition of the following method on `Indexable where Element : Equatable`:

extension Indexable where Index : RandomAccessIndexType, _Element : Equatable {

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(element: _Element, range: Range<Index>? = default) -> Self.Index?

    /// Returns the first range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(element: _Element, range: Range<Index>? = default) -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(element: _Element, range: Range<Index>? = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

### `Indexable` of any elements:

The author proposes the addition of the following methods on `Indexable`:

extension Indexable where Index : RandomAccessIndexType {

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func indexOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Self.Index?

    /// Returns the first range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func rangeOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`self.count`)
    public func countOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Int

    /// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
    /// 
    /// - Complexity: O(`self.count`)
    public func isSorted(range: Range<Index>? = default, @noescape isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.

An actual implementation of these methods can be found in `Indexable.swift`.

**Note**: For explanation of and reasoning behind the `lensView:` argument as well as the distinction between `T` and `Base._Element`, see below.

### `Indexable.binarySearchView(validateAgainst:?)`:

The author proposes the addition of a `binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a specialized search view for performing searches with O(`log2(self.count)`) complexity:

extension Indexable where Index: RandomAccessIndexType {
	/// Create a view into a sorted indexable that allows access within `bounds`.
	///
	/// - Complexity: O(`self.count`) if a validation closure is provided, otherwise O(`1`).
	public func binarySearchView(validateAgainst isOrderedBefore: ((_Element, _Element) -> Bool)? = nil) -> BinarySearchView<Self>
}

### `BinarySearchView<T: Indexable>`:

The author proposes the introduction of a `BinarySearchView` struct with the following interface:

public struct BinarySearchView<Base : Indexable where Base.Index : RandomAccessIndexType> : Indexable {
    public typealias Index = Base.Index

    public var startIndex: Base.Index { get }

    public var endIndex: Base.Index { get }

    public subscript (position: Index) -> Base._Element { get }

    /// Create a view into a sorted collection `base` that allows access within `bounds`.
    ///
    /// - Complexity: O(1).
    public init(base: Base, validateAgainst isOrderedBefore: ((Base._Element, Base._Element) -> Bool)? = default)

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `lensView(element) < value` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `value < lensView(element)` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) = default, @noescape lensView: Base._Element -> T) -> Base.Index

    /// Returns the first index where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index?

    /// Returns the range where an element appears in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Int

    internal let _base: Base
}

As well as the following extension for `Indexable where Element: Comparable`:

extension BinarySearchView where Base._Element : Comparable {
    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `… < element` evaluates to false or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func lowerBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns first index of the first `element` in `self` within range `range` for which
    /// `element < …` evaluates to true or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func upperBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index

    /// Returns the first index where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func indexOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index?

    /// Returns the range where `element` appears in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func rangeOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Range<Base.Index>?

    /// Returns the number of `element`s that appear in `self`
    /// within range `range` or `nil` if `element` is not found.
    ///
    /// - Complexity: O(`log2(self.count)`).
    public func countOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Int
}

These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(log2(self.count))`.

An actual implementation of these methods can be found in `BinarySearchView.swift`.

### Why `lensView`?

Let's assume one had a collection of Persons like this:

struct Person {
    let name: String
    let age: Int
}

let persons: [Person] = [ … ]

Now let's assume one wanted to find the first person that's called `John Doe`. One could either change `Person` to conform to `Equatable`, allowing `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?

Fine, but what if one wanted to search `Person`s by both their `name` or their `age` depending on the given use case? You're out of luck now.

Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)` that is available for any type of `Element` regardless of its conformance to `Equatable`:

let searchView = persons.binarySearchView()
let index = persons.indexOf { $0.name == "John Doe" }

Unfortunately however such a `predicate` cannot be used when searching for an element's property in a sorted(!) collection or a collection of non-`Comparable` elements within a TIME complexity of `O(log2(self.count))` as binary search uses `<` instead of `==` internally and as the "needle" gets passed as both the first (in `indexOf(…)` itself) and second argument (in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator, one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V, E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).

The use of `lensView` however allows one to do just that without making a mess.

let searchView = persons.binarySearchView() // sorted!
let index = searchView.indexOf("John Doe") { $0.name }

The `lensView` works similarly to how keypaths work in Objective-C, just type-safe.

## Impact on existing code

The author proposes:

1. Moving `CollectionType.indices` to `Indexable` should not cause any issues with existing code.

2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not cause any issues with existing code as the `range` argument is optional and the default behaves like it used to.

2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should not cause any issues with existing code either, as the `range` argument is optional and the default behaves like it used to. Swift's trailing-closure magic allows `indexOf(predicate: …)` to still properly map to `indexOf(range: nil, predicate: …)`.

3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` do not conflict with any existing API.

4. The introduction of a `BinarySearchView` on `Indexable` does not conflict with any existing API.

## Alternatives considered

The author's initial approach was to implement all methods on Indexable. This however required all methods to either be provided in two variants (one for unsorted, one for sorted `Indexable`s):

final public func indexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

final public func sortedIndexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
	return …
}

Or require the introduction of an additional argument:

final public func indexOf(element: _Element, range: Range<Index>? = nil, isSorted: Bool = true) -> Index? {
    if isSorted {
        return …
    } else {
        return …
    }
}

It also would have cluttered `Indexable` with methods such as `lowerBoundOf` and `upperBoundOf`, which are only ever applicable when `self` is sorted accordingly. The introduction of a dedicated BinarySearchView fixes these issues.

One also would have had to switch from `predicate: (T) -> Bool` to `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified API alternative.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution