Sorted Collections API Review
Hello! I'd like to post the overview of the new SortedDictionary
and SortedSet
APIs bring brought to swift-collections. Please feel free to take a look below and leave any feedback!
Introduction
A module of Swift Collections containing new data structures which efficiently maintain their elements in sorted order. The initial release includes SortedSet
and SortedDictionary
.
Motivation
The Swift Collections package currently includes two data structures which maintain its members in a well-defined order: OrderedSet
and OrderedDictionary
. These types are useful in a variety of use-cases, as they enumerate their elements in insertion order. However, there also exist many situations where its desired to maintain elements in an order specified by a predefined comparator. For example, a common pattern in user interfaces is displaying a list with entries sorted in some order, such as in chronological order.
Quick ad hoc implementations of sorted data structures can have many pitfalls. A naive implementation can devolve to quadratic performance. A smarter implementation using binary search is difficult to get correct and has subtle potential pitfalls. For this reason, there is natural place to provide high-performance, production-grade sorted data structures.
Proposed Solution
Two new types, SortedSet
and SortedDictionary
, which guarantee logarithmic performance, mimicking their traditional Set
and Dictionary
counterparts as closely as possible.
Underlying these types is a B-Tree data structure. The use of a tree is unique in that it departs from the use of a contiguous storage buffer (such as in hash tables or arrays). Instead B-Trees can be thought of as a variant of a binary search tree where each node contains multiple 'keys' stuffed into a single node (as a contiguous buffer). The amount of keys is a careful balancing act. Binary search will typically be fastest within a single contiguous buffer, however as a contiguous buffer gets larger, random insertions become slower, combined with the costs of large allocations. A B-Tree implementation optimizes between these two characteristics.
B-Trees also have notable efficiency implementing the CoW optimization. As the data is not stored within a single contiguous data structure, only the nodes along the path must be copied. The following example demonstrates this on a tree mutating element E
:
βββββ βββββ
β D β β D*β
βββββ΄ββββ΄ββββ βββββ΄ββββ΄ββββ
β β β β
βββ΄ββ βββ΄ββ βββ΄ββ βββ΄ββ
β B β β F β ββββΊ β B β β F*β
ββ΄ββββ΄β ββ΄ββββ΄β ββ΄ββββ΄β ββ΄ββββ΄β
β β β β β β β β
βββ΄ββ βββ΄ββ βββ΄ββ βββ΄ββ βββ΄ββ βββ΄ββ βββ΄ββ βββ΄ββ
β A β β C β β E β β G β β A β β C β β E*β β G β
βββββ βββββ βββββ βββββ βββββ βββββ βββββ βββββ
For larger trees, this means the vast majority of the data structure can be re-used.
Sorted data structures based on B-Trees are not bound to a uniqueness constraint. Dropping such a constraint opens the door to the possibility of efficient 'multi' variants of data structures such as dictionaries capable of storing multiple instances of key-value pairs with the same key, or sets with multiple instances of the same element. This module focuses on developing a versatile and high performance B-Tree data structure, along with sorted and unique variants of Set
and Dictionary
, however the place of these 'multi' data structures are an open question enabled by the implementation of the underlying tree data structure.
Detailed Design
SortedSet
A SortedSet
mimics the API of a traditional Set, however it automatically maintains its elements in ascending sorted order:
let states: SortedSet = ["Nebraska", "Arkansas", "Colorado"]
// Familiar methods from SetAlgebra
states.formUnion(["Iowa", "Virginia"])
// Element-range based operations
states["D"...] // SubSequence with ["Iowa", "Nebraska", "Virginia"]
states // ["Arkansas", "Colorado", "Iowa", "Nebraska", "Virginia"]
Definition
struct SortedSet<Element: Comparable>
: BidirectionalCollection,
SetAlgebra,
ExpressibleByArrayLiteral,
CustomStringConvertible, CustomDebugStringConvertible, CustomReflectable {
struct Index { ... }
typealias Indices = DefaultIndices<Self>
struct Iterator { ... }
struct SubSequence: BidirectionalCollection, Equatable {
typealias Element = SortedSet.Element
typealias Index = SortedSet.Index
typealias SubSequence = Self
struct Iterator { ... }
var base: SortedSet { get }
subscript(range: Range<Element>) -> Self
subscript(range: PartialRangeFrom<Element>) -> Self
subscript(range: PartialRangeThrough<Element>) -> Self
subscript(range: PartialRangeUpTo<Element>) -> Self
}
}
// Conditional conformances
extension SortedSet: Hashable where Element: Hashable {}
extension SortedSet: Encodable where Element: Encodable {}
extension SortedSet: Decodable where Element: Decodable {}
extension SortedSet.SubSequence: Hashable where Element: Hashable {}
By using a custom Index
, SortedSet
can maintain a constant time subscript.
Additionally, it provides a fast path for advancing the index. However, mutations
with an index such as remove(at:)
by nature are still logarithmic.
SortedSet Specific APIs
SortedSet
provides a new methods which do not exist on a traditional Set
:
extension SortedSet {
/// Creates a sorted set from a sequence of sorted elements.
///
/// If duplicates are encountered within the sequence, the last instance of
/// the element is kept within the collection and preceeding instances are
/// discarded.
///
/// This provides an optimized method of constructing a ``SortedSet``, when
/// the initializer does not need to sort the data.
///
/// On debug builds, this will trap if the provided sequence is not sorted.
/// Otherwise, providing a non-sorted sequence can put the collection in an
/// undefined state.
///
/// - Complexity: O(`n`) where `n` is the number of elements in `elements`.
init<S: Sequence>(sortedElements elements: S) where S.Element == Element
}
Additionally, SortedSet
offers new and convenient APIs which are unique to its sorted nature:
extension SortedSet {
/// Returns a sequence of elements in the collection bounded by
/// limiting element values.
///
/// This is particularly useful when applied with a bound corresponding to some
/// group of elements.
///
/// let students: SortedSet = ...
/// students["A"..<"B"] // Sequence of students with names beginning with "A"
///
/// - Complexity: O(log(`self.count`))
subscript(range: Range<Element>) -> SubSequence<Element>
subscript(range: PartialRangeFrom<Element>) -> SubSequence<Element>
subscript(range: PartialRangeThrough<Element>) -> SubSequence<Element>
subscript(range: PartialRangeUpTo<Element>) -> SubSequence<Element>
}
Individual implementations must exist for each of the Range
types. The reason a generic implementation over RangeExpression
cannot be used is that there are no known bounds to clamp to to obtain a concrete range.
These range subscripts are also implemented upon SortedSet.SubSequence
.
SetAlgebra and the Set Interface Mimicked
SortedSet
contains respective implementations of almost all the methods
specific to Set
:
extension SortedSet: SetAlgebra {
/// Creates an empty sorted set.
public init()
/// - Complexity: O(`n`) where `n` is the number of elements in `elements`.
public init<S: Sequence>(_ elements: S) where S.Element == Element
// Testing for membership
/// - Complexity: O(log(`self.count`))
func contains(_ element: Self.Element) -> Bool
// Adding and Removing Elements
/// - Complexity: O(log(`self.count`))
mutating func insert(_ newMember: Self.Element) -> (inserted: Bool, memberAfterInsert: Self.Element)
/// - Complexity: O(log(`self.count`))
mutating func update(with newMember: Self.Element) -> Self.Element?
/// - Complexity: O(log(`self.count`))
mutating func remove(_ member: Self.Element) -> Self.Element?
// Combining Sets
/// - Complexity: O(`self.count` + `other.count`)
func union(_ other: Self) -> Self
/// - Complexity: O(`self.count` + `other.count`)
mutating func formUnion(_ other: Self) -> Self
/// - Complexity: O(`self.count` + `other.count`)
func intersection(_ other: Self) -> Self
/// - Complexity: O(`self.count` + `other.count`)
mutating func formIntersection(_ other: Self) -> Self
/// - Complexity: O(`self.count` + `other.count`)
func symmetricDifference(_ other: Self) -> Self
/// - Complexity: O(`self.count` + `other.count`)
mutating func formSymmetricDifference(_ other: Self) -> Self
/// - Complexity: O(`self.count`)
func subtract(_ other: Self)
/// - Complexity: O(`self.count`)
mutating func subtracting(_ other: Self) -> Self
// Comparing Sets
/// - Complexity: O(max(`self.count`, `other.count`))
func isStrictSubset(of other: Self) -> Bool
/// - Complexity: O(max(`self.count`, `other.count`))
func isStrictSuperset(of other: Self) -> Bool
/// - Complexity: O(max(`self.count`, `other.count`))
func isDisjoint(with other: Self) -> Bool
/// - Complexity: O(max(`self.count`, `other.count`))
func isSubset(of other: Self) -> Bool
/// - Complexity: O(max(`self.count`, `other.count`))
func isSuperset(of other: Self) -> Bool
}
The methods which are not implemented are those related to capacity. See below
for rational on exposing capacity
init(minimumCapacity: Int)
var capacity: Int { get }
func reserveCapacity(Int)
Custom Implementations
SortedSet
implements optimized implementations for certain methods from Collection:
/// - Complexity: O(log(`self.count`))
func randomElement<T: RandomNumberGenerator>(using generator: inout T) -> Element?
An optimized implementation can reduce the amount of (non-trivial) indexing operations that would have been performed by the default implementation.
Partial RangeReplaceableCollection
SortedSet
implements a subset of the RangeReplaceableCollection
methods it can
support, similar to OrderedSet
. SortedSet
cannot fully conform to
RangeReplaceableCollection
as some of its methods could result in violating the sortedness invariant.
extension SortedSet {
/// - Complexity: O(`self.count`)
func filter(_ isIncluded: (Element) throws -> Bool) rethrows -> Self
/// - Complexity: O(log(`self.count`))
mutating func popFirst() -> Element?
/// - Complexity: O(log(`self.count`))
mutating func popLast() -> Element?
/// - Complexity: O(log(`self.count`))
@discardableResult
mutating func removeFirst() -> Element
/// - Complexity: O(log(`self.count`) + `k`)
mutating func removeFirst(_ k: Int)
/// - Complexity: O(log(`self.count`))
@discardableResult
mutating func removeLast() -> Element
/// - Complexity: O(log(`self.count`) + `k`)
mutating func removeLast(_ k: Int)
/// - Complexity: O(log(`self.count`))
@discardableResult
mutating func remove(at index: Index) -> Element
/// - Complexity: O(log(`self.count` + `range.count`))
mutating func removeSubrange(_ range: Range<Self.Index>)}}
The methods which were not implemented due to design decisions are:
// Not implemented due to the potential to violate the uniqueness
// and sortedness invariants.
//
// However, it may be possible to design a variant of this which
// does not allow violating the invariants.
mutating func applying(_ difference: CollectionDifference<Element>) -> Self?
// In the case of duplicate elements, this does not make it clear
// which collections' element takes precedence.
//
// Instead use of formUnion(_:) and union(_:) is preferred. This
// is the same approach taken by Set.
static func +(Self, Other) -> Self
// See below for rationale on capacity-related methods
func reserveCapacity(Int)
The methods which are not logical to implement are:
// Would violate the uniqueness invariant
init(repeating: Element, count: Int)
// May violate the sortedness invariant
mutating func append(_ element: Element)
// May violate the sortedness invariant
mutating func append<S: Sequence>(_ element: S) where S.Element == Element
// May violate the sortedness invariant
mutating func insert(_ element: Element, at index: Index)
// May violate the sortedness invariant
mutating func replaceSubrange<C: Collection>(_ range: Range<Index>, with: C) where C.Element == Element
SortedDictionary
A SortedDictionary
mimics the API of a traditional Dictionary
, however it automatically maintains its elements in ascending sorted order by its keys:
let httpCodes: SortedDictionary = [
200: "OK",
403: "Access forbidden",
404: "File not found",
500: "Internal server error"]
// Familiar operations from Dictionary
httpCodes[301] = "Moved permanently"
httpCodes
// [200: "OK",
// 301: "Moved permanently"
// 403: "Access forbidden",
// 404: "File not found",
// 500: "Internal server error"]
Definition
struct SortedDictionary<Key: Comparable, Value>
: BidirectionalCollection,
ExpressibleByDictionaryLiteral,
CustomStringConvertible, CustomDebugStringConvertible, CustomReflectable {
typealias Element = (key: Key, value: Value)
struct Index { ... }
typealias Indices = DefaultIndices<Self>
struct Iterator { ... }
struct SubSequence: BidirectionalCollection {
typealias Element = SortedDictionary.Element
typealias Index = SortedDictionary.Index
typealias SubSequence = Self
struct Iterator { ... }
var base: SortedDictionary { get }
}
}
// Key and Value views
extension SortedDictionary {
struct Keys: BidirectionalCollection, Equatable {
typealias Element = Key
typealias Index = SortedDictionary.Index
typealias Indices = Range<Index>
struct Iterator { ... }
struct SubSequence: BidirectionalCollection {
typealias Element = SortedDictionary.Key
typealias Index = SortedDictionary.Index
typealias SubSequence = Self
struct Iterator { ... }
var base: SortedDictionary.Keys { get }
}
}
struct Values: BidirectionalCollection {
typealias Element = Value
typealias Index = SortedDictionary.Index
typealias Indices = Range<Index>
struct Iterator { ... }
struct SubSequence: BidirectionalCollection {
typealias Element = SortedDictionary.Value
typealias Index = SortedDictionary.Index
typealias SubSequence = Self
struct Iterator { ... }
var base: SortedDictionary.Values { get }
}
}
}
// Conditional conformances
extension SortedDictionary: Equatable where Value: Equatable {}
extension SortedDictionary: Hashable where Key: Hashable, Value: Hashable {}
extension SortedDictionary: Encodable where Key: Encodable, Value: Encodable {}
extension SortedDictionary: Decodable where Key: Decodable, Value: Decodable {}
extension SortedDictionary.SubSequence: Equatable where Value: Equatable {}
extension SortedDictionary.SubSequence: Hashable where Key: Hashable, Value: Hashable {}
extension SortedDictionary.Keys: Hashable where Key: Hashable {}
extension SortedDictionary.Keys.SubSequence: Hashable where Element: Hashable {}
extension SortedDictionary.Values: Equatable where Value: Hashable {}
extension SortedDictionary.Values: Hashable where Element: Hashable {}
extension SortedDictionary.Values.SubSequence: Equatable where Element: Value {}
extension SortedDictionary.Values.SubSequence: Hashable where Element: Hashable {}
SortedDictionary Specific APIs
SortedDictionary
exposes a few APIs which are not available on a traditional Dictionary
.
extension SortedDictionary {
/// Creates a sorted dictionary from a sequence of sorted elements.
///
/// If duplicates are encountered within the sequence, the last instance of
/// the element is kept within the collection and preceeding instances are
/// discarded.
///
/// This provides an optimized method of constructing a ``SortedDictionary``, when
/// the initializer does not need to sort the data.
///
/// On debug builds, this will trap if the provided sequence is not sorted.
/// Otherwise, providing a non-sorted sequence can put the collection in an
/// undefined state.
///
/// - Complexity: O(`n`) where `n` is the number of elements in `elements`.
init<S: Sequence>(sortedKeysWithValues keysAndValues: S) where S.Element == Element
/// Allows in-place mutation of a dictionary's value at a particular key.
///
/// - Parameters:
/// - key: The key to look up (or append). If `key` does not already exist
/// in the dictionary, it is appended with the supplied default value.
/// - defaultValue: The default value to append if `key` doesn't exist in
/// the dictionary.
/// - body: A function that performs an in-place mutation on the dictionary
/// value.
/// - Returns: The return value of `body`.
/// - Complexity: O(log(`self.count`))
mutating func modifyValue<R>(
forKey key: Key,
default defaultValue: @autoclosure () -> Value,
_ body: (inout Value) throws -> R
) rethrows -> R
}
The func modifyValue(forKey:default_:)
allows for efficient in-place mutation of a dictionaries value at a key. This API parallels a similar method on OrderedDictionary
.
Additionally, SortedDictionary
offers new and convenient APIs which are unique to its sorted nature:
extension SortedDictionary {
/// Returns a sequence of elements in the collection bounded by
/// limiting key values.
///
/// This is particularly useful when applied with a bound corresponding to a
/// some group.
///
/// let httpCodes: SortedDictionary = ...
/// let successCodes = httpCodes[200..<300]
///
/// - Complexity: O(log(`self.count`))
subscript(range: Range<Key>) -> SubSequence<Element> { get }
subscript(range: PartialRangeFrom<Key>) -> SubSequence<Element>
subscript(range: PartialRangeThrough<Key>) -> SubSequence<Element>
subscript(range: PartialRangeUpTo<Key>) -> SubSequence<Element>
}
Individual implementations must exist for each of the Range
types. The reason a generic implementation over RangeExpression
cannot be used is that there are no known bounds to clamp to to obtain a concrete range.
These range subscripts are also implemented upon:
SortedDictionary.SubSequence
SortedDictionary.Keys
SortedDictionary.Keys.SubSequence
Dictionary Interface Mimicked
SortedDictionary
implements almost entirely the same API as a standard dictionary:
extension SortedDictionary {
// Initializers
/// - Complexity: O(1)
init()
/// - Complexity: O(n log n) where n is the number of elements in `keysAndValues`
init<S: Sequence>(_ keysAndValues: S) where S.Element == Element
/// - Complexity: O(n log n) where n is the number of elements in `keysAndValues`
init<S: Sequence>(grouping: S, by: (S.Element) -> Key) where Value == [S.Element]
// Accessing Keys and Values
/// - Complexity: O(log(`self.count`))
subscript(key: Key) -> Value? { get set _modify }
/// - Complexity: O(log(`self.count`))
subscript(
key: Key,
default defaultValue: @autoclosure () -> Value
) -> Value { get set _modify }
/// - Complexity: O(log(`self.count`))
func index(forKey key: Key) -> Index?
/// - Complexity: O(1)
var keys: Keys { get }
/// - Complexity: O(1)
var values: Values { get set _modify }
// Adding Keys and Values
/// - Complexity: O(log(`self.count`))
mutating func updateValue(_ value: Value, forKey key: Key) -> Value?
/// - Complexity: O(`self.count` + `other.count`)
mutating func merge<S: Sequence>(_ other: S, uniquingKeysWith combine: (Value, Value) throws -> Value) rethrows
/// - Complexity: O(`self.count` + `other.count`)
mutating func merge(_ other: SortedDictionary<Key, Value>, uniquingKeysWith combine: (Value, Value) throws -> Value) rethrows
/// - Complexity: O(`self.count` + `other.count`)
func merging<S: Sequence>(_ other: S, uniquingKeysWith combine: (Value, Value) throws -> Value) rethrows -> SortedDictionary<Key, Value>
/// - Complexity: O(`self.count` + `other.count`)
func merging(_ other: SortedDictionary<Key, Value>, uniquingKeysWith combine: (Value, Value) throws -> Value) rethrows -> SortedDictionary<Key, Value>
// Removing Keys and Values
/// - Complexity: O(log(`self.count`))
mutating func removeValue(forKey key: Key) -> Value?
// Transforming a Dictionary
/// - Complexity: O(`self.count`)
func mapValues<T>(_ transform: (Value) throws -> T) rethrows -> SortedDictionary<Key, T>
/// - Complexity: O(`self.count`)
func compactMapValues<T>(_ transform: (Value) throws -> T?) rethrows -> SortedDictionary<Key, T>
}
The methods not implemented are the following:
// SortedDictionaries can efficiently handle duplicate elements so these
// initializers are not as useful.
init<S: Sequence>(uniqueKeysWithValues: S) where S.Element == Element
init<S: Sequence>(S, uniquingKeysWith: (Value, Value) -> Value) where S.Element == Element
// See below for the rationale behind not implementing capacity-related
// methods.
init(minimumCapacity: Int)
var capacity: Int
func reserveCapacity(_ capacity: Int)
Custom Implementations
SortedDictionary implements optimized implementations for certain methods from Collection:
func randomElement<T: RandomNumberGenerator>(using generator: inout T) -> Element?
Partial RangeReplaceableCollection
SortedDictionary
implements most of the methods specified by RangeReplaceableCollection
. These additional methods are possible to implement thanks to the well-defined ordering of a SortedDictionary
:
extension SortedDictionary {
// Implemented on Dictionary
/// - Complexity: O(`self.count`)
func filter(_ isIncluded: (Element) throws -> Bool) rethrows -> Self
/// - Complexity: O(log(`self.count`))
mutating func popFirst() -> Element?
/// - Complexity: O(log(`self.count`))
mutating func popLast() -> Element?
/// - Complexity: O(log(`self.count`))
@discardableResult
mutating func removeFirst() -> Element
/// - Complexity: O(log(`self.count`) + `k`)
mutating func removeFirst(_ k: Int)
/// - Complexity: O(log(`self.count`))
@discardableResult
mutating func removeLast() -> Element
/// - Complexity: O(log(`self.count`) + `k`)
mutating func removeLast(_ k: Int)
/// - Complexity: O(log(`self.count`))
@discardableResult
mutating func remove(at index: Index) -> Element
/// - Complexity: O(log(`self.count`) + `range.count`)
mutating func removeSubrange(_ range: Range<Self.Index>)}
The methods which were not implemented due to design decisions are:
// Not implemented due to the potential to violate the uniqueness
// and sortedness invariants. For this reason, this is also not
// implemented on a traditional Dictionary
//
// However, it may be possible to design a variant of this which
// does not allow violating the invariants.
mutating func applying(_ difference: CollectionDifference<Element>) -> Self?
// In the case of duplicate elements, this does not make it clear
// which collections' element takes precedence.
//
// Instead use of merge(...) methods are preferred. This is the
// same approach taken by the standard Dictionary type
static func +(Self, Other) -> Self
// See below for rationale on capacity-related methods
func reserveCapacity(Int)
The methods which are not logical to implement are:
// Would violate the uniqueness invariant
init(repeating: Element, count: Int)
// May violate the sortedness invariant
mutating func append(_ element: Element)
// May violate the sortedness invariant
mutating func append<S: Sequence>(_ element: S) where S.Element == Element
// May violate the sortedness invariant
mutating func insert(_ element: Element, at index: Index)
// May violate the sortedness invariant
mutating func replaceSubrange<C: Collection>(_ range: Range<Index>, with: C) where C.Element == Element
Considerations
Constant-time startIndex
The sorted collections do notably depart from the specified requirement of an O(1) Collection.startIndex
. Maintaining a reference to the first node adds considerable complexity to the implementation and has non-trivial performance trade-offs on all other operations.
Exposing offset-based subscripts.
Many of the other collections expose offset-based subscripts. Specifically, Array
, Deque
, and the ordered collection module use primarily integer-offset subscripts. The sorted collections also have their elements efficiently cardinally identifiable and therefore can also expose such an API. As a result, there is motivation to provide some extent of offset-based APIs.
However, those data structures conform RandomAccessCollection
and can provide an O(1) integer subscript. This is not the case for a sorted collection which requires logarithmic time to find elements. Additionally, in contrast to OrderedSet
and OrderedDictionary
, a sorted collection cannot efficiently use an integer as its index type (as it would not be able to provide an O(1)
subscript or offset advancement fast-path). Using an optimized implementation, a custom Index
can achieve O(1) amortized performance when advancing an indexβ , while integer-offsets would only be able to achieve O(log n) performance.
This means that if integer-offsets are being exposed, they wouldn't make sense to use as the Index
type and would require a parallel-set of integer-offset based APIs to exist. This does increase API surface area considerably and add maintenance burden.
Furthermore, there is a potential for ambiguity in the case of SortedDictionary
:
let httpCodes: SortedDictionary = [
200: "OK",
403: "Access forbidden",
404: "File not found",
500: "Internal server error"]
httpCodes[200] // Does `200` refer to the offset or key.
These sorted collections are not the only type facing these concerns. The String
type in the standard library shares similar performance questions. Rather than implement an ad hoc solution for this, we believe the best approach would be future language-level or standard library developments which ease the syntactical burden of expressing index manipulations.
Instead, it is currently recommend to use the Index
type to perform offset-based operations:
let measurements: SortedSet<Int> = ...
var medianIndex = measurements.startIndex
measurements.formIndex(&medianIndex, offsetBy: measurements.count / 2)
let median: Int = measurements[medianIndex]
β while a custom Index
can achieve O(1) amortized advancement. The current implementation does not, as doing so require maintaining a memory-heavy stack of all nodesβ a trade-off deemed too significant. Instead, O(1) amortized advancement is achieved through the iterator implementation on the sorted data structures and their respective subsequences. The current implementation, does however, use an efficient fast-path which means it is near constant-time for advancement. Currently, for all practically possible dataset sizes, the log(n)
factor is below 3.
Violating Sortedness Invariants
It is expected that an object's Comparable
implementation is properly defined. If not,
the collections will not panic or trap, but their behavior is undefined and they may
return incorrect values. An example of a ill-defined Comparable
implementation is:
struct Person: Comparable {
var name: String
var age: Int
public func <(lhs: Self, rhs: Self) -> Bool {
if externalSource.useNameForComparison() {
return lhs.name < rhs.name
} else {
return lhs.age < rhs.age
}
}
}
The standard library in multiple places puts trust that an implementation does indeed fulfill its stated invariants. Since the sorted collections do neither trap, panic, nor violate the safe-nature of Swift, we've decided it is alright to expect this.
Conforming to RandomAccessCollection
While RandomAccessCollection
does not provide new methods on its own, other tooling, libraries, and frameworks often depend on performant random access by checking conformance to RandomAccessCollection
. Examples are the Swift Algorithms package and SwiftUI.
While the proposed implementation has very slow growth in access performance across collection size, almost all operations are O(log collection.count)
and it would thereby be wrong to conform to RandomAccessCollection.
Exposing the concept of capacity
The B-Trees used within these sorted collections do have a concept of capacity (in contrast to other trees such as AVL or Red-Black Trees). However, there are a few pitfalls to exposing them.
-
It is not immediately obvious how varying a B-Trees capacity will affect performance in contrast to other data structures such as
Array
. AnArray
is a contiguous buffer and therefore thefunc reserveCapacity(Int)
method provides a clear optimization in reducing future allocations. However, with B-Trees their performance in comparison to their capacity followed a 'U'-shaped curve. A larger node capacity will likely mean slower lookup performance as the default capacities have been tuned to be the most optimally performant possible. -
A tree where
self.count < self.capacity
does not necessarily mean that an element will be able to be inserted without a need for allocation as the node with the free slot could be different from the node in which the element needs to be inserted to maintain sortedness. -
Efficiently maintaining the overall tree's free capacity would require additional memory and runtime performance overhead, and it is not clear, potentially aside from debugging and instrumentation, what a
capacity
property would be used for to warrant the trade-offs.
Providing custom comparators
A common situation may be to customize the property by which elements are sorted within these collections. A few of the most obvious and ergonomic approaches are:
- Passing a
KeyPath
parameter to specify a property on which to sort - Providing a closure implementing some comparison predicate such as
<
similar to theSequence.sorted(by:)
API
The primary shortfall is that both approaches would induce significant runtime overhead as every internal method would need to store and defer to either the key-path or closure to perform a comparison operation.
The current goal of the package is to provide a data-structure that is low-level and capable of performing common operations efficiently. This means in its current form, it is possible to build such an interface on top of the existing SortedSet
and SortedDictionary
implementations such as:
struct CustomSort<Element, Key: Comparable>: Comparable {
var element: Element
var property: KeyPath<Element, Key>
init(_ value: Element, on property: KeyPath<Element, Key>) {
self.element = element
self.property = property
}
static func ==(lhs: Self, rhs: Self) -> Bool {
lhs[keyPath: property].element == rhs[keyPath: property].element
}
static func <(lhs: Self, rhs: Self) -> Bool {
lhs[keyPath: property] < rhs[keyPath: property]
}
}
let students = SortedSet(
students.map { student in
CustomSort(student, on: \.name)
}
)
Whether this is within scope and a common enough pattern to warrant an additional data structure in the package maybe a future question.