Introduction
This is inspired by the proposal of adding count
methods to Sequence
. Apart from count(of: Element)
and count(where:)
, there is another count
method and, in addition, a contains
method that I consider a sensible addition to Collection
.
Methods
count<T: Collection>(of other: T, overlapping: Bool) -> Int where T.Element == Element
Returns the number of occurrences of the given subcollection in the collection. When overlapping
is true, overlapping occurrences are also counted.
contains<T: Collection>(_ other: T) -> Bool where T.Element == Element
Returns a boolean value indicating whether the collection contains the given subcollection.
Motivation
Strings
There is a rather tricky and non-trivial way (for a regular user) to count the number of occurrences of a substring in a string:
str.components(separatedBy: "substring") - 1
The method used in this approach not only is unintuitive for the task in question, but also unnecessarily inefficient – a shortcoming that is partially solved with .lazy
without gaining any advantage in semantics, ergonimics an readability.
StringProtocol
has a contains<T>(_ other: T)
method as part of the NSString
compatibility API, which can become a special case of the proposed method, thus avoiding breaking changes and redundancy.
Arrays and Data
There is neither a way nor a relatively short workaround for counting the number of occurrences of a given subcollection in a collection or verify whether an ordered collection contains a given subcollection.
As @xwu pointed out, It is sensible to wonder and have a efficient way to find out if or how many times a sequence occurs in an instance of Data
, Array
and other ordered collections.
Issues
Counting how many times a subsequence appears in a sequence can be done either considering or ignoring overlapping occurrences.
[1, 1, 1].count(of: [1, 1]) == 1 // non-overlapping
[1, 1, 1].count(of: [1, 1]) == 2 // overlapping
A straight-forward and natural solution is an additional parameter, overlapping: Bool
.
Splitting the method and hence extending count
for semantic integrity, in my opinion, is an intuitively harmful approach that also isn't consistent with the existing count
name family.
Implementation
StringProtocol
It is reasonable to assume the current StringProtocol.contains(subsequence:)
method is well optimized for strings, therefore it should be left as is or replaced by a Hashable
optimization of the proposed method if appropriate.
Collection
default implementation
(*) *If the provided collection is empty, the below implementations return 1
and true
respectively. However, I understand this can be misleading and it is yet to be discussed which variant is most convenient.
count
Code
extension Collection where Element: Equatable {
@_inlineable
public func count<T: Collection>(of other: T, overlapping: Bool) -> Int where T.Element == Element {
if other.startIndex == other.endIndex { return 0 }
if self.startIndex == self.endIndex { return 0 }
var count = 0
var currentMainSelfIndex = self.startIndex
var currentHelperSelfIndex = self.startIndex
var currentOtherIndex = other.startIndex
if overlapping {
while currentMainSelfIndex < self.endIndex {
while other[currentOtherIndex] == self[currentHelperSelfIndex] {
if other.index(after: currentOtherIndex) == other.endIndex {
count += 1
break
}
if self.index(after: currentHelperSelfIndex) == self.endIndex { return count }
currentHelperSelfIndex = self.index(after: currentHelperSelfIndex)
currentOtherIndex = other.index(after: currentOtherIndex)
}
currentMainSelfIndex = self.index(after: currentMainSelfIndex)
currentHelperSelfIndex = currentMainSelfIndex
currentOtherIndex = other.startIndex
}
return count
}
while currentMainSelfIndex < self.endIndex {
while other[currentOtherIndex] == self[currentHelperSelfIndex] {
if other.index(after: currentOtherIndex) == other.endIndex {
count += 1
currentMainSelfIndex = currentHelperSelfIndex
break
}
if self.index(after: currentHelperSelfIndex) == self.endIndex { return count }
currentHelperSelfIndex = self.index(after: currentHelperSelfIndex)
currentOtherIndex = other.index(after: currentOtherIndex)
}
currentMainSelfIndex = self.index(after: currentMainSelfIndex)
currentHelperSelfIndex = currentMainSelfIndex
currentOtherIndex = other.startIndex
}
return count
}
}
Complexity
-
n
is the collectioncount
,m
is the subcollectioncount
. -
Time
- best: Ď´(n)
- worst: Ď´(nm)
- average: O(nm)
-
Memory Ď´(1)
contains
Code
extension Collection where Element: Equatable {
@_inlineable
public func contains<T: Collection>(_ other: T) -> Bool where T.Element == Element {
if other.startIndex == other.endIndex { return true }
var currentMainSelfIndex = self.startIndex
var currentHelperSelfIndex = self.startIndex
var currentOtherIndex = other.startIndex
while currentMainSelfIndex < self.endIndex {
while other[currentOtherIndex] == self[currentHelperSelfIndex] {
if other.index(after: currentOtherIndex) == other.endIndex {
return true
}
if self.index(after: currentHelperSelfIndex) == self.endIndex { return false }
currentHelperSelfIndex = self.index(after: currentHelperSelfIndex)
currentOtherIndex = other.index(after: currentOtherIndex)
}
currentMainSelfIndex = self.index(after: currentMainSelfIndex)
currentHelperSelfIndex = currentMainSelfIndex
currentOtherIndex = other.startIndex
}
return false
}
}
Complexity
-
n
is the collectioncount
,m
is the subcollectioncount
. -
Time
- best: Ď´(n)
- worst: Ď´(nm)
- average: O(nm)
-
Memory Ď´(1)
Being closely related to the aforementioned proposal, the motivation is pretty much the same – to introduce a method for a common task, make it intuitive, easy to use and prevent the user from accidentally writing inefficient code.