Splitting a Collection by another Collection

I was doing some work with String earlier and wondered why split<T>(separator: T) -> [Substring] where T: StringProtocol doesn't exist in the standard library. Foundation offers the same operation under the name components<T>(separatedBy: T) -> [String] where T: StringProtocol (note that it returns [String] rather than [Substring]), but its absence from the standard library points to a more general missing function.
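To make the status quo concrete, here is a small sketch of the two existing options. It relies only on the standard library's single-element split(separator:) and Foundation's components(separatedBy:); the variable names are illustrative.

```swift
import Foundation

let list = "one, two, three"
let comma: Character = ","

// Standard library: the separator is a single element (a Character here),
// and the pieces are Substrings that share storage with `list`.
let bySingleCharacter: [Substring] = list.split(separator: comma)
print(bySingleCharacter)

// Foundation: the separator can be a whole string, but the pieces come
// back as freshly allocated Strings.
let byString: [String] = list.components(separatedBy: ", ")
print(byString)
```

Splitting on the single Character leaves the leading spaces in place, which is what makes a multi-element separator attractive.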

The generic function that is missing here is splitting a collection by another collection.

extension Collection where Element: Equatable {
    /// Returns the longest possible subsequences of the collection, in order,
    /// around sequences equal to the elements of the given separator collection.
    ///
    /// The resulting array consists of at most `maxSplits + 1` subsequences.
    /// Collections that are used to split the collection are not returned as part
    /// of any subsequence.
    ///
    /// - Parameters:
    ///   - separator: The collection whose sequence of elements should be split upon.
    ///   - maxSplits: The maximum number of times to split the collection, or
    ///     one less than the number of subsequences to return. If
    ///     `maxSplits + 1` subsequences are returned, the last one is a suffix
    ///     of the original collection containing the remaining elements.
    ///     `maxSplits` must be greater than or equal to zero. The default value
    ///     is `Int.max`.
    ///   - omittingEmptySubsequences: If `false`, an empty subsequence is
    ///     returned in the result for each consecutive pair of `separator`
    ///     instances in the collection and for each instance of `separator` at
    ///     the start or end of the collection. If `true`, only nonempty
    ///     subsequences are returned. The default value is `true`.
    /// - Returns: An array of subsequences, split from this collection's
    ///   elements.
    ///
    /// - Complexity: O(*n* × *m*) in the worst case, where *n* is the length
    ///   of the collection and *m* is the length of `separator`.
    func split<OtherCollection>(separator: OtherCollection, maxSplits: Int = Int.max, omittingEmptySubsequences: Bool = true) -> [SubSequence] where OtherCollection: Collection, OtherCollection.Element == Element {
        precondition(maxSplits >= 0, "Must take zero or more splits")

        // `isEmpty` is O(1), whereas `count` can be O(n) for collections
        // that are not random-access.
        guard !isEmpty else {
            return []
        }
        // An empty separator can never match, so the whole collection is
        // returned as a single subsequence.
        guard !separator.isEmpty else {
            return [self[startIndex..<endIndex]]
        }
        var splitArray = [SubSequence]()
        var subSequenceStartIndex = startIndex
        var i = startIndex
        while i != endIndex && splitArray.count < maxSplits {
            // Try to match the entire separator starting at `i`.
            var matchEndIndex = i
            var separatorIndex = separator.startIndex
            while matchEndIndex != endIndex, separatorIndex != separator.endIndex, self[matchEndIndex] == separator[separatorIndex] {
                formIndex(after: &matchEndIndex)
                separator.formIndex(after: &separatorIndex)
            }
            guard separatorIndex == separator.endIndex else {
                // No full match here. Retry from the very next element so that
                // a partial match cannot hide a real one (e.g. "aab" split by "ab").
                formIndex(after: &i)
                continue
            }
            if !omittingEmptySubsequences || subSequenceStartIndex != i {
                splitArray.append(self[subSequenceStartIndex..<i])
            }
            // Resume scanning just past the separator.
            i = matchEndIndex
            subSequenceStartIndex = matchEndIndex
        }
        if !omittingEmptySubsequences || subSequenceStartIndex != endIndex {
            splitArray.append(self[subSequenceStartIndex..<endIndex])
        }
        return splitArray
    }
}

This split function would remove the need to import Foundation just to split a string by another string, and it would sit naturally alongside the split(separator:) overloads that already exist in the standard library. More generally, it would let you split any collection whose elements conform to Equatable by any other collection with the same Element type. A String, for example, could be split not only by another String but also by a Substring, a [Character], or any other collection whose Element is Character. I feel that this fills a gap in Swift's Collection API.
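As a rough illustration of that generality, the sketch below splits a String by a String, a Substring, and a [Character], and an [Int] by an [Int]. It is a compact variant of the idea rather than a definitive implementation: the method name chunks(separatedBy:) is hypothetical, chosen so the example cannot collide with any split overload the toolchain already provides, and the naive rescan at every position is an assumption made for brevity.

```swift
extension Collection where Element: Equatable {
    /// Hypothetical helper (not a standard library API): splits the
    /// collection around every occurrence of `separator`.
    func chunks<S: Collection>(
        separatedBy separator: S,
        maxSplits: Int = .max,
        omittingEmptySubsequences: Bool = true
    ) -> [SubSequence] where S.Element == Element {
        precondition(maxSplits >= 0, "Must take zero or more splits")
        guard !isEmpty else { return [] }
        guard !separator.isEmpty else { return [self[startIndex..<endIndex]] }

        var result: [SubSequence] = []
        var pieceStart = startIndex
        var i = startIndex
        while i != endIndex && result.count < maxSplits {
            // Compare the separator against the elements starting at `i`.
            var matchEnd = i
            var s = separator.startIndex
            while matchEnd != endIndex, s != separator.endIndex, self[matchEnd] == separator[s] {
                formIndex(after: &matchEnd)
                separator.formIndex(after: &s)
            }
            if s == separator.endIndex {
                // Full match: emit the piece before it, then skip the separator.
                if !omittingEmptySubsequences || pieceStart != i {
                    result.append(self[pieceStart..<i])
                }
                i = matchEnd
                pieceStart = matchEnd
            } else {
                // Partial or no match: move on by one element.
                formIndex(after: &i)
            }
        }
        if !omittingEmptySubsequences || pieceStart != endIndex {
            result.append(self[pieceStart..<endIndex])
        }
        return result
    }
}

let text = "a--b----c"
let byString = "--"
let bySubstring = "--x".dropLast()            // a Substring holding "--"
let byCharacters: [Character] = ["-", "-"]

// Any collection of Character works as the separator.
let fromString = text.chunks(separatedBy: byString)
let fromSubstring = text.chunks(separatedBy: bySubstring)
let fromCharacters = text.chunks(separatedBy: byCharacters)
print(fromString, fromSubstring, fromCharacters)

// The same options as the single-element split are available.
let keepingEmpty = text.chunks(separatedBy: byString, omittingEmptySubsequences: false)
let atMostOneSplit = text.chunks(separatedBy: byString, maxSplits: 1)
print(keepingEmpty, atMostOneSplit)

// Nothing here is specific to strings.
let numbers = [1, 0, 0, 2, 0, 0, 3].chunks(separatedBy: [0, 0])
print(numbers)
```

Because the result is [SubSequence], the String cases return Substrings and the array case returns ArraySlices, so no element storage is copied.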

Should we add such a function to the standard library?
Is there a need for this?

  • split(separator: CharacterSet) cannot be in the Standard Library because it requires the presence of CharacterSet, which only exists at the Foundation level.
  • String.split(separator: T) -> [Substring] where T: StringProtocol would be a valid addition to the Standard Library, though it would likely be better generalized for Collection instead. A much more filled-out pitch exists here:

As per @SDGGiesbrecht's suggestion, I've changed the post from just splitting Strings to splitting a collection by another collection.