Binary Search and Swift

rockbruno · October 10, 2020, 2:10pm

Binary Search was suggested to Swift 4 years ago but it didn't see the light of day. Most people were in favor of it but there were concerns about adding an API that can easily be misused (the requirement of being sorted).

I agree with that, though I don't really see that as that much of a problem. It's sad, because you could argue that binary search is one of the most useful algorithms to add in a library like this. What do we think of it as an addition to Swift Algorithms after all this time?

As a suggestion, I guess that for example if we are too worried about it being misused we could add a debug-only assertion that checks if the array is properly sorted before running so that newer developers will get a crash in debug if they attempt to use it the wrong way.

rockbruno · October 10, 2020, 2:30pm

I agree that a different name could be a good solution for this. Though I believe Swift uses the unsafe keyword specifically for things that cannot guarantee a valid result in terms of corruption, instead of just right/wrong in this case.

timv · October 10, 2020, 4:49pm

Swift Algorithms already (sort of) has a binary search: the partitioningIndex(where:) method requires that the collection is partitioned according to the predicate, and returns the index of the first element that satisfies the predicate.

let array = [2, 3, 5, 7, 11, 13, 17, 19]
let index = array.partitioningIndex(where: { $0 >= 10 })
print(array[..<index], array[index...]) // => [2, 3, 5, 7] [11, 13, 17, 19]

From here a binarySearch method is as simple as

extension RandomAccessCollection where Element: Comparable {
    func binarySearch(_ element: Element) -> (index: Index, found: Bool) {
        let index = partitioningIndex(where: { $0 >= element })
        let found = index != endIndex && self[index] == element
        return (index, found)
    }
}

I personally don't really have a problem with these methods returning nonsense in case the collection isn't sorted/partitioned accordingly. Many existing APIs can be misused already, such as trying to index an array with an index that is too large, or trying to sort an array with a comparison method that does not impose a strict weak ordering. We could even add an assertion for debug builds that checks that the input is valid.

CTMacUser · October 12, 2020, 5:30am

I just found out about this library while browsing through Apple's GitHub site for SwiftNIO. I was doing this to take a break from... writing algorithms! In fact, I did binary search and sort check. What about this interface:

extension Sequence {
    /// Returns a sequence containing the longest prefix of elements that are
    /// sorted according to the given predicate.
    public func sortedPrefix(stopsAtDuplicate: Bool = false, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> [Element]

    /// Returns a sequence formed by skipping the longest prefix of elements that
    /// were sorted according to the given predicate.
    public func dropSortedPrefix(stopsAtDuplicate: Bool = false, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> PrependedIteratorSequence<Iterator> 
}

extension Collection {
    /// Returns the past-the-end index for the longest prefix of the collection
    /// that can be considered sorted according to the given predicate.
    public func sortedPivot(stopsAtDuplicate: Bool = false, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> Index
}

All three have a variant the drops the predicate to default it to <. The PrependedIteratorSequence type carries over a dropped suffix and the element just before it, since I can't test where a sort breaks without reading the first post-sort element from the iterator. (The standard DropWhileSequence does the same thing.)

In sorting, runs of the same value are OK. But I added an optional flag for strictly increasing sequences/collections, for advanced work.

extension Collection {
    /// Returns the index of some element equivalent to the given value,
    /// assuming the collection is (at least) partially sorted according to the
    /// given predicate.
    public func someIndex(of element: Element, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> (index: Index, isMatch: Bool)

    /// Returns the index of the earliest element whose order is equivalent to
    /// the element at the given index, assuming the collection is (at least)
    /// partially sorted according to the given predicate.
    public func lowerBound(for pivot: Index, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> Index

    /// Returns the index of the earliest element whose order is greater than
    /// the element at the given index, assuming the collection is (at least)
    /// partially sorted according to the given predicate.
    public func upperBound(for pivot: Index, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> Index

    /// Returns the range of indices for the elements of the collection that are
    /// all equivalent to the given value, assuming the collection is (at least)
    /// partially sorted according to the given predicate.
    public func sortedBounds(of element: Element, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> Range<Index>
}

All four have a variant that removes the predicate parameter to default it to <. I was inspired by the analogous functions from C++, but tried to improve their efficiency.

binary_search just tells you if a match exists or not. It doesn't tell you where it is (if successful) or the best place to insert (if failure). Common implementations also just punt the work to lower_bound. The someIndex(of: by:) method gives you both the location (so you can do further work), and whether it's a match (so you don't need to test for endIndex and for element equality before the main work) with a custom algorithm.
Speaking of a custom algorithm, someIndex(of: by:), lowerBound(for: by:), and upperBound(for: by:) do separate parts of the process, and so have slightly different algorithms. Like I said, unlike C++, someIndex never punts to lowerBound, so it's free to stop at any match, even in the middle. If you want the lower border, you must explicitly ask for it.
binary_search, lower_bound, and upper_bound base their searches off a value. While someIndex does the same thing, lowerBound and upperBound work off a known-good matching index. This is for efficiency; you have to call someIndex or your own substitute first. And there's no need to worry about what happens if you use lb or ub with a non-existent value; you couldn't get a matching index to use in the first place.
sortedBounds(of: by:) calls the other three methods for a basic user interface.

dabrahams · October 12, 2020, 5:09pm

That algorithm is, fundamentally, a binary search. The background for understanding that is in this paper. I do wish it hadn't been renamed though.

CTMacUser · October 27, 2020, 6:54am

I just added a version of these methods as a pull request.