Pitch [stdlib]: Variant of split that doesn't discard split separator in result

At present, Swift's standard library has a split method that consumes the separator, e.g.:

[1, 2, 3, 4].split(separator: 3) // [[1, 2], [4]]
"abc def".split(separator: " ") // ["abc", "def"]
Array(0...10).split { $0.isMultiple(of: 3) } // [[1, 2], [4, 5], [7, 8], [10]]

This is great when you're splitting strings on whitespace or commas or breaking down other sequences where you don't want to retain the separator.

However, you often want to retain the separator. One recent example I ran into is breaking down "PascalCase" identifiers.

E.g., I wanted to break strings like "SanFrancisco" down into an array such as ["San", "Francisco"] (and then join it together again with space separators).

Ideally there would be a function in the Swift standard library that acted like the following:

[1, 2, 3, 4].splitBefore(separator: 3) // [[1, 2], [3, 4]]
"abc def".splitBefore(separator: " ") // ["abc", " def"]
Array(0...10).splitBefore { $0.isMultiple(of: 3) } // [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
"SanFrancisco".splitBefore { $0.isUppercase } // ["San", "Francisco"]
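For illustration, the behaviour above can be approximated today with a small extension. This is only a rough sketch: `splitBefore` is the hypothetical name from this pitch, not an existing standard library API, and the `whereSeparator` label is an assumption borrowed from the existing split. As written, it never produces empty chunks, so a separator in first position simply starts the first chunk.

```swift
extension Collection {
    /// Hypothetical `splitBefore`: starts a new chunk in front of every element
    /// matching `isSeparator`, keeping that element as the chunk's first item.
    func splitBefore(
        whereSeparator isSeparator: (Element) throws -> Bool
    ) rethrows -> [SubSequence] {
        var result: [SubSequence] = []
        var chunkStart = startIndex
        var index = startIndex
        while index != endIndex {
            // Close the current chunk at a separator, unless the chunk would be empty.
            if index != chunkStart, try isSeparator(self[index]) {
                result.append(self[chunkStart..<index])
                chunkStart = index
            }
            formIndex(after: &index)
        }
        // Whatever remains (including a trailing separator) forms the last chunk.
        if chunkStart != endIndex {
            result.append(self[chunkStart..<endIndex])
        }
        return result
    }
}

extension Collection where Element: Equatable {
    /// Convenience overload (also hypothetical) for a single separator value.
    func splitBefore(separator: Element) -> [SubSequence] {
        splitBefore(whereSeparator: { $0 == separator })
    }
}

let words = "SanFrancisco".splitBefore { $0.isUppercase }
print(words.joined(separator: " ")) // San Francisco
```

Like the existing split, this returns `SubSequence` values (`Substring`, `ArraySlice`), so no elements are copied until the caller converts the chunks.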

How about it? This would be a purely additive change so wouldn't break ABI.

PS: I'm not such a fan of using splitBefore as the method name. But I can't currently think of anything better.


What about appending another parameter to the existing split function?

func split(
    separator: Element, 
    maxSplits: Int = Int.max, 
    omittingEmptySubsequences: Bool = true,
    consumeSeparator: Bool = true
) -> [AnyBidirectionalCollection<Element>]

I assume this will impact ABI, though, so perhaps not the best approach to get this through.

It does, but we have ways to work around that (at a small cost of dead symbols remaining in the ABI). So this shouldn't rule that option out if it's considered the best solution.

(it's also source breaking, but only in really footling ways which I think can be ignored)

Wouldn't we then need another parameter to specify whether the split happens before or after the delimiter? I am not sure I want the split method to have so many options.


You could use an enum instead of a Boolean in that case, I think.

That would make for a nicer API, I guess:

func split(
    separator: Element, 
    maxSplits: Int = Int.max, 
    options: SplitOptions = [.omitEmptySubsequences, .consumeSeparator]
) -> [AnyBidirectionalCollection<Element>]
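To make that shape concrete, here is one way it might look, purely as a sketch. `SplitOptions` and both of its members are hypothetical names taken from the signature above. Two things differ from that signature and are my assumptions: the sketch returns `[SubSequence]` like the existing split, and `options` has no default value so that it cannot collide with the standard `split(separator:)`. It also assumes that leaving out `.consumeSeparator` means the separator stays at the start of the following chunk.

```swift
/// Hypothetical option set mirroring the flags discussed in this thread.
struct SplitOptions: OptionSet {
    let rawValue: Int
    static let omitEmptySubsequences = SplitOptions(rawValue: 1 << 0)
    static let consumeSeparator = SplitOptions(rawValue: 1 << 1)
}

extension Collection where Element: Equatable {
    /// Illustrative only: `options` is required here so existing calls keep
    /// resolving to the standard library's `split(separator:)`.
    func split(
        separator: Element,
        maxSplits: Int = .max,
        options: SplitOptions
    ) -> [SubSequence] {
        precondition(maxSplits >= 0, "Must take zero or more splits")
        let omitEmpty = options.contains(.omitEmptySubsequences)
        let consume = options.contains(.consumeSeparator)

        var result: [SubSequence] = []
        var chunkStart = startIndex
        var index = startIndex
        var splits = 0

        while index != endIndex && splits < maxSplits {
            if self[index] == separator {
                let chunk = self[chunkStart..<index]
                if !(omitEmpty && chunk.isEmpty) {
                    result.append(chunk)
                    splits += 1
                }
                // Either drop the separator or keep it as the head of the next chunk.
                chunkStart = consume ? self.index(after: index) : index
            }
            formIndex(after: &index)
        }

        let tail = self[chunkStart..<endIndex]
        if !(omitEmpty && tail.isEmpty) {
            result.append(tail)
        }
        return result
    }
}

// Keeps the separator at the start of the second chunk: [[1, 2], [3, 4]]
print([1, 2, 3, 4].split(separator: 3, options: [.omitEmptySubsequences]))
```

An option set keeps the parameter list short, but a "split after the delimiter" mode would still need a third member or a separate enum, which is the concern raised earlier in the thread.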

Could the existing method be updated to use the new implementation?
