I was doing some work with String earlier and was wondering why split<T>(separator: T) -> [Substring] where T: StringProtocol doesn't exist in the standard library. I know that this functionality exists in Foundation under the name components<T>(separatedBy: T) -> [String] where T: StringProtocol, but this points to a more general missing function.
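To make the contrast concrete, here is a small sketch of the two existing APIs mentioned above. The sample string and variable names are only illustrative.

```swift
import Foundation

let csv = "a, b, c"

// Standard library: the separator is a single element (a Character),
// and the result is an array of Substring values sharing storage with `csv`.
let byCharacter: [Substring] = csv.split(separator: Character(","))
// byCharacter is ["a", " b", " c"]

// Foundation: the separator can be any StringProtocol value,
// and the result is an array of new String values.
let byString: [String] = csv.components(separatedBy: ", ")
// byString is ["a", "b", "c"]
```

Note that the two calls differ in name, in the separator type, and in the element type of the result, even though they perform nearly the same operation.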
The generic function that is missing here is splitting a collection by another collection.
extension Collection where Element: Equatable {
    /// Returns the longest possible subsequences of the collection, in order,
    /// around sequences equal to the elements of the given separator collection.
    ///
    /// The resulting array consists of at most `maxSplits + 1` subsequences.
    /// Collections that are used to split the collection are not returned as part
    /// of any subsequence.
    ///
    /// - Parameters:
    ///   - separator: The collection whose sequence of elements should be split upon.
    ///   - maxSplits: The maximum number of times to split the collection, or
    ///     one less than the number of subsequences to return. If
    ///     `maxSplits + 1` subsequences are returned, the last one is a suffix
    ///     of the original collection containing the remaining elements.
    ///     `maxSplits` must be greater than or equal to zero. The default value
    ///     is `Int.max`.
    ///   - omittingEmptySubsequences: If `false`, an empty subsequence is
    ///     returned in the result for each consecutive pair of `separator`
    ///     instances in the collection and for each instance of `separator` at
    ///     the start or end of the collection. If `true`, only nonempty
    ///     subsequences are returned. The default value is `true`.
    /// - Returns: An array of subsequences, split from this collection's
    ///   elements.
    ///
    /// - Complexity: O(*n* \* *m*) in the worst case, where *n* is the length of
    ///   the collection and *m* is the length of the separator.
    func split<OtherCollection>(
        separator: OtherCollection,
        maxSplits: Int = Int.max,
        omittingEmptySubsequences: Bool = true
    ) -> [SubSequence] where OtherCollection: Collection, OtherCollection.Element == Element {
        precondition(maxSplits >= 0, "Must take zero or more splits")
        // `isEmpty` is O(1), whereas `count` can be O(n) for non-random-access collections.
        guard !isEmpty else {
            return []
        }
        // An empty separator never matches, so the whole collection is one piece.
        guard !separator.isEmpty else {
            return [self[startIndex..<endIndex]]
        }
        var splitArray = [SubSequence]()
        var isMatching = false
        var subSequenceStartIndex = startIndex
        // Where the separator match currently in progress began, which is also
        // the end of the subsequence that precedes it.
        var subSequenceEndIndex = startIndex
        var separatorIndex = separator.startIndex
        var i = startIndex
        var splitCount = 0
        while i < endIndex && splitCount < maxSplits {
            if !isMatching {
                if self[i] == separator[separatorIndex] {
                    isMatching = true
                    subSequenceEndIndex = i
                    separatorIndex = separator.index(after: separatorIndex)
                } else {
                    i = index(after: i)
                    continue
                }
            } else {
                if self[i] == separator[separatorIndex] {
                    separatorIndex = separator.index(after: separatorIndex)
                } else {
                    // The partial match failed. Resume just after the element
                    // where it began so that overlapping candidates are not
                    // skipped (for example, "ab" inside "aab").
                    isMatching = false
                    separatorIndex = separator.startIndex
                    i = index(after: subSequenceEndIndex)
                    continue
                }
            }
            i = index(after: i)
            // The whole separator has been matched: emit the preceding subsequence.
            if isMatching && separatorIndex == separator.endIndex {
                if !omittingEmptySubsequences || subSequenceStartIndex != subSequenceEndIndex {
                    splitArray.append(self[subSequenceStartIndex..<subSequenceEndIndex])
                    splitCount += 1
                }
                separatorIndex = separator.startIndex
                subSequenceStartIndex = i
                isMatching = false
            }
        }
        // Whatever remains after the last separator (or after `maxSplits` splits).
        if !omittingEmptySubsequences || subSequenceStartIndex != endIndex {
            splitArray.append(self[subSequenceStartIndex..<endIndex])
        }
        return splitArray
    }
}
This split function would remove the need to import Foundation and to distinguish components(separatedBy:) from the very similar split(separator:) overloads that already exist in the standard library. Furthermore, it would serve the more general purpose of splitting a collection whose elements conform to Equatable by any other collection with the same Element type. So when splitting a String, you could split it not only by another String, but also by a Substring, a [Character], or any other collection whose Element is Character. I feel that this fills an open gap in Swift's Collection API.
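As a rough illustration of that flexibility, the sketch below uses a simplified stand-in for the function proposed above. The name pieces(separatedBy:) is hypothetical and chosen only so the snippet compiles on its own without clashing with existing split overloads. It also drops maxSplits and always omits empty subsequences.

```swift
extension Collection where Element: Equatable {
    // Hypothetical helper for illustration only, not the proposed API.
    // Simplified: no `maxSplits`, and empty subsequences are always omitted.
    func pieces<S: Collection>(separatedBy separator: S) -> [SubSequence]
        where S.Element == Element
    {
        // An empty separator never matches, so return the whole collection.
        guard !separator.isEmpty else {
            return isEmpty ? [] : [self[startIndex...]]
        }
        var result: [SubSequence] = []
        var pieceStart = startIndex
        var i = startIndex
        while i != endIndex {
            if self[i...].starts(with: separator) {
                if pieceStart != i { result.append(self[pieceStart..<i]) }
                // Skip past the matched separator.
                i = index(i, offsetBy: separator.count)
                pieceStart = i
            } else {
                formIndex(after: &i)
            }
        }
        if pieceStart != endIndex { result.append(self[pieceStart...]) }
        return result
    }
}

let text = "a--b--c"
let dashes: Substring = "--"

// The same call accepts any collection of Character as the separator.
let byString = text.pieces(separatedBy: "--")                          // String
let bySubstring = text.pieces(separatedBy: dashes)                     // Substring
let byCharacters = text.pieces(separatedBy: ["-", "-"] as [Character]) // [Character]

// It is not limited to strings: here an [Int] is split by another [Int].
let numbers = [1, 0, 0, 2, 0, 0, 3].pieces(separatedBy: [0, 0])
```

All three string calls produce the same three pieces, and the integer example yields three single-element slices, which is the uniformity across separator types that the pitch is after.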
Should we add such a function to the standard library?
Is there a need for this?