On Mon, Jan 4, 2016 at 4:13 PM, Vincent Esche via swift-evolution < swift-evolution@swift.org> wrote:
After having had this code laying around on my Mac since 6/26/15 (waiting
for Swift to go open source)
I figured it’s about damn time I submit the actual RFC for it. So without
further ado…
One of the areas where Swift’s stdlib is still quite lacking is searching.
Which is a shame as searching (along with indexing) might be among
the most important tasks of computer science/software development.
One might not need fast collection searches for writing a banal fart or
flashlight app,
but almost every other app is likely to benefit from having a proper
searching API.
So I’d like to fix that.
Attached you find a full RFC along with the full and functional source
code to make it happen.
I’d love to hear your opinion on this and will be more than happy to
answer questions.
Rendered Version + Full Implementation:
https://gist.github.com/regexident/2b7531bd748f57679671
(The code is tested using Quick/Nimble. Tests would thus have to be ported
to Swift’s XCTest.)
- Vincent
Markdown-dump for
# Improved Collection Search
* Proposal: [SE-NNNN](
https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md
)
* Author(s): [Vincent Esche](https://github.com/regexident\)
* Status: **Review**
* Review manager: TBD
## Introduction
This RFC proposes an extension to the currently rather limited and linear
searching API of `CollectionType`, that is `indexOf(element:)` and
`indexOf(predicate:)`.
It proposes the backwards-compatible refactoring of those existing methods
as well as the introduction of a view-based ensemble of methods for
efficient binary-search.
Swift-evolution thread: [link to the discussion thread for that proposal](
https://lists.swift.org/pipermail/swift-evolution\)
## Motivation
Indexing of and searching in data is a big deal in software development.
As such it is crucial to have capable means of doing so in a language that
aspires to be a highly performant programming language.
Swift's current API for searching is basically limited to these two
methods on `CollectionType`:
> ```swift
> func indexOf(element: Self.Generator.Element) -> Self.Index?
> ```
> Returns the first index where value appears in self or nil if value is
not found.
>
> **Complexity**: `O(self.count)`.
and:
> ```swift
> func indexOf(@noescape predicate: (Self.Generator.Element) throws ->
Bool) rethrows -> Self.Index?
> ```
> Returns the first index where predicate returns true for the
corresponding value, or nil if such value is not found.
>
> **Complexity**: `O(self.count)`.
The author sees a couple of issue with these two methods:
1. They do not provide an option for restricting the "haystack" to a
certain `range` of `self`.
2. They do not provide a fast (`O(log2(self.count))`) path for sorted
collections.
3. They should not be limited to `CollectionType` but instead be moved to
`Indexable`.
In many situations it is desirable to perform fast searches on sorted
collections instead of having to fall back to naïve linear searches.
Further more while a single `index` might be the most common subject of
interest, there are many scenarios where a `range` or `count` of matches
are of more use.
And anyone who is writing a custom ordered collection type will eventually
want to find the insertion index for a given element and will thus end up
writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which
all the other methods can be implemented, actually).
## Proposed solution
The author proposes:
1. A backwards-compatible refactoring of `CollectionType.indices`, moving
it to `Indexable`.
2. A backwards-compatible refactoring of `indexOf(…)` (adding optional
`range:` and moving it to `Indexable`).
3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to
`Indexable` with a TIME complexity of `O(self.count)`.
4. The introduction of a `BinarySearchView` on `Indexable`, allowing for
fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`,
`rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without
cluttering `Indexable`'s interface.
## Detailed design
### `CollectionType.indices`:
The author proposes the relocation of `.indices` from `CollectionType` to
`Indexable`:
extension Indexable {
/// Return the range of valid index values.
///
/// The result's `endIndex` is the same as that of `self`. Because
/// `Range` is half-open, iterating the values of the result produces
/// all valid subscript arguments for `self`, omitting its `endIndex`.
public var indices: Range<Self.Index> { get }
}
After all all it does is provide a convenience method for generating a
Range from its `startIndex` and `endIndex`. There is no need for `self` to
be a `CollectionType` for this.
This change should not break any existing code as it generalizes the
property.
### `Indexable` of `Comparable` elements:
The author proposes the addition of the following method on `Indexable
where Element : Comparable`:
extension Indexable where Index: RandomAccessIndexType, _Element:
Comparable {
/// Return true iff all elements in `self` within `range` are sorted
according to `isOrderedBefore`.
///
/// - Complexity: O(`self.count`)
public func isSorted(range: Range<Index>? = nil) -> Bool
}
This method would perform a linear scan of the `Indexable` with a TIME
complexity of `O(self.count)`.
An actual implementation of these methods can be found in
`Indexable.swift`.
### `Indexable` of `Equatable ` elements:
The author proposes the addition of the following method on `Indexable
where Element : Equatable`:
extension Indexable where Index : RandomAccessIndexType, _Element :
Equatable {
/// Returns the first index where `element` appears in `self`
/// within range `range` or `nil` if `element` is not found.
///
/// - Complexity: O(`self.count`)
public func indexOf(element: _Element, range: Range<Index>? = default)
-> Self.Index?
/// Returns the first range where `element` appears in `self`
/// within range `range` or `nil` if `element` is not found.
///
/// - Complexity: O(`self.count`)
public func rangeOf(element: _Element, range: Range<Index>? = default)
-> Range<Self.Index>?
/// Returns the number of `element`s that appear in `self`
/// within range `range` or `nil` if such an element is not found.
///
/// - Complexity: O(`self.count`)
public func countOf(element: _Element, range: Range<Index>? = default)
-> Int
}
These methods would perform a linear scan of the `Indexable` with a TIME
complexity of `O(self.count)`.
An actual implementation of these methods can be found in
`Indexable.swift`.
### `Indexable` of any elements:
The author proposes the addition of the following methods on `Indexable`:
extension Indexable where Index : RandomAccessIndexType {
/// Returns the first index where an element appears in `self`
/// within range `range` for which `lensView(element) == value` or
`nil` if such an element is not found.
///
/// - Complexity: O(`self.count`)
public func indexOf(range: Range<Index>? = default, @noescape
predicate: (_Element) throws -> Bool) rethrows -> Self.Index?
/// Returns the first range where an element appears in `self`
/// within range `range` for which `lensView(element) == value` or
`nil` if such an element is not found.
///
/// - Complexity: O(`self.count`)
public func rangeOf(range: Range<Index>? = default, @noescape
predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?
/// Returns the number of `element`s that appear in `self`
/// within range `range` or `nil` if `element` is not found.
///
/// - Complexity: O(`self.count`)
public func countOf(range: Range<Index>? = default, @noescape
predicate: (_Element) throws -> Bool) rethrows -> Int
/// Return true iff all elements in `self` within `range` are sorted
according to `isOrderedBefore`.
///
/// - Complexity: O(`self.count`)
public func isSorted(range: Range<Index>? = default, @noescape
isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
}
These methods would perform a linear scan of the `Indexable` with a TIME
complexity of `O(self.count)`.
An actual implementation of these methods can be found in
`Indexable.swift`.
**Note**: For explanation of and reasoning behind the `lensView:` argument
as well as the distinction between `T` and `Base._Element`, see below.
### `Indexable.binarySearchView(validateAgainst:?)`:
The author proposes the addition of a
`binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a
specialized search view for performing searches with O(`log2(self.count)`)
complexity:
extension Indexable where Index: RandomAccessIndexType {
/// Create a view into a sorted indexable that allows access within
`bounds`.
///
/// - Complexity: O(`self.count`) if a validation closure is provided,
otherwise O(`1`).
public func binarySearchView(validateAgainst isOrderedBefore: ((_Element,
_Element) -> Bool)? = nil) -> BinarySearchView<Self>
}
### `BinarySearchView<T: Indexable>`:
The author proposes the introduction of a `BinarySearchView` struct with
the following interface:
public struct BinarySearchView<Base : Indexable where Base.Index :
> : Indexable {
public typealias Index = Base.Index
public var startIndex: Base.Index { get }
public var endIndex: Base.Index { get }
public subscript (position: Index) -> Base._Element { get }
/// Create a view into a sorted collection `base` that allows access
within `bounds`.
///
/// - Complexity: O(1).
public init(base: Base, validateAgainst isOrderedBefore:
((Base._Element, Base._Element) -> Bool)? = default)
/// Returns first index of the first `element` in `self` within range
`range` for which
/// `lensView(element) < value` evaluates to false or `nil` if
`element` is not found.
///
/// - Complexity: O(`log2(self.count)`).
public func lowerBoundOf<T : Comparable>(value: T, range:
Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool =
default, @noescape lensView: Base._Element -> T) -> Base.Index
/// Returns first index of the first `element` in `self` within range
`range` for which
/// `value < lensView(element)` evaluates to true or `nil` if
`element` is not found.
///
/// - Complexity: O(`log2(self.count)`).
public func upperBoundOf<T : Comparable>(value: T, range:
Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) =
default, @noescape lensView: Base._Element -> T) -> Base.Index
/// Returns the first index where an element appears in `self`
/// within range `range` for which `lensView(element) == value` or
`nil` if such an element is not found.
///
/// - Complexity: O(`log2(self.count)`).
public func indexOf<T : Comparable>(value: T, range: Range<Index>? =
default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape
lensView: Base._Element -> T) -> Base.Index?
/// Returns the range where an element appears in `self`
/// within range `range` for which `lensView(element) == value` or
`nil` if such an element is not found.
///
/// - Complexity: O(`log2(self.count)`).
public func rangeOf<T : Comparable>(value: T, range: Range<Index>? =
default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape
lensView: Base._Element -> T) -> Range<Base.Index>?
/// Returns the number of `element`s that appear in `self`
/// within range `range` for which `lensView(element) == value` or
`nil` if such an element is not found.
///
/// - Complexity: O(`log2(self.count)`).
public func countOf<T : Comparable>(value: T, range: Range<Index>? =
default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape
lensView: Base._Element -> T) -> Int
internal let _base: Base
}
As well as the following extension for `Indexable where Element:
Comparable`:
extension BinarySearchView where Base._Element : Comparable {
/// Returns first index of the first `element` in `self` within range
`range` for which
/// `… < element` evaluates to false or `nil` if `element` is not
found.
///
/// - Complexity: O(`log2(self.count)`).
public func lowerBoundOf(element: Base._Element, range: Range<Index>?
= default, @noescape isOrderedBefore: (Base._Element, Base._Element) ->
Bool = default) -> Base.Index
/// Returns first index of the first `element` in `self` within range
`range` for which
/// `element < …` evaluates to true or `nil` if `element` is not found.
///
/// - Complexity: O(`log2(self.count)`).
public func upperBoundOf(element: Base._Element, range: Range<Index>?
= default, @noescape isOrderedBefore: (Base._Element, Base._Element) ->
Bool = default) -> Base.Index
/// Returns the first index where `element` appears in `self`
/// within range `range` or `nil` if `element` is not found.
///
/// - Complexity: O(`log2(self.count)`).
public func indexOf(element: Base._Element, range: Range<Index>? =
default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool
= default) -> Base.Index?
/// Returns the range where `element` appears in `self`
/// within range `range` or `nil` if `element` is not found.
///
/// - Complexity: O(`log2(self.count)`).
public func rangeOf(element: Base._Element, range: Range<Index>? =
default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool
= default) -> Range<Base.Index>?
/// Returns the number of `element`s that appear in `self`
/// within range `range` or `nil` if `element` is not found.
///
/// - Complexity: O(`log2(self.count)`).
public func countOf(element: Base._Element, range: Range<Index>? =
default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool
= default) -> Int
}
These methods would perform a linear scan of the `Indexable` with a TIME
complexity of `O(log2(self.count))`.
An actual implementation of these methods can be found in
`BinarySearchView.swift`.
### Why `lensView`?
Let's assume one had a collection of Persons like this:
struct Person {
let name: String
let age: Int
}
let persons: [Person] = [ … ]
Now let's assume one wanted to find the first person that's called `John
Doe`. One could either change `Person` to conform to `Equatable`, allowing
`indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?
Fine, but what if one wanted to search `Person`s by both their `name` or
their `age` depending on the given use case? You're out of luck now.
Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)`
that is available for any type of `Element` regardless of its conformance
to `Equatable`:
let searchView = persons.binarySearchView()
let index = persons.indexOf { $0.name == "John Doe" }
Unfortunately however such a `predicate` cannot be used when searching for
an element's property in a sorted(!) collection or a collection of
non-`Comparable` elements within a TIME complexity of `O(log2(self.count))`
as binary search uses `<` instead of `==` internally and as the "needle"
gets passed as both the first (in `indexOf(…)` itself) and second argument
(in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator,
one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V,
E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).
The use of `lensView` however allows one to do just that without making a
mess.
let searchView = persons.binarySearchView() // sorted!
let index = searchView.indexOf("John Doe") { $0.name }
The `lensView` works similarly to how keypaths work in Objective-C, just
type-safe.
## Impact on existing code
The author proposes:
1. Moving `CollectionType.indices` to `Indexable` should not cause any
issues with existing code.
2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not
cause any issues with existing code as the `range` argument is optional and
the default behaves like it used to.
2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should
not cause any issues with existing code either, as the `range` argument is
optional and the default behaves like it used to. Swift's trailing-closure
magic allows `indexOf(predicate: …)` to still properly map to
`indexOf(range: nil, predicate: …)`.
3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to
`Indexable` do not conflict with any existing API.
4. The introduction of a `BinarySearchView` on `Indexable` does not
conflict with any existing API.
## Alternatives considered
The author's initial approach was to implement all methods on Indexable.
This however required all methods to either be provided in two variants
(one for unsorted, one for sorted `Indexable`s):
final public func indexOf(element: _Element, range: Range<Index>? = nil)
-> Index? {
return …
}
final public func sortedIndexOf(element: _Element, range: Range<Index>? =
nil) -> Index? {
return …
}
Or require the introduction of an additional argument:
final public func indexOf(element: _Element, range: Range<Index>? = nil,
isSorted: Bool = true) -> Index? {
if isSorted {
return …
} else {
return …
}
}
It also would have cluttered `Indexable` with methods such as
`lowerBoundOf` and `upperBoundOf`, which are only ever applicable when
`self` is sorted accordingly. The introduction of a dedicated
BinarySearchView fixes these issues.
One also would have had to switch from `predicate: (T) -> Bool` to
`isOrderedBefore: (T, T) -> Bool` for all methods for the second unified
API alternative.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution