Why does split() filter out empty elements by default?

(Vladislav Alekseev) #1

E.g. in Python/Kotlin, when you split a string using a separator, the result includes empty strings by default:

"1,,,,,1".split(',')
['1', '', '', '', '', '1']

In Swift, it will omit empty elements:

"1,,,,1".split(separator: ",").count 
$R0: Int = 2

You can keep the empty elements by asking for them explicitly (omittingEmptySubsequences: false).

I wonder what discussions and arguments led to the decision to make the split method omit empty elements by default.
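For comparison, here is a minimal sketch of both behaviours on the same input as above. The variable names are only illustrative, and the printed values are what I would expect from the standard library's String.split:

```swift
let input = "1,,,,1"

// Default: empty subsequences between adjacent separators are dropped.
let compact = input.split(separator: ",")
print(compact)  // ["1", "1"]

// Opting in keeps every empty piece, so four separators give five parts.
let full = input.split(separator: ",", omittingEmptySubsequences: false)
print(full)        // ["1", "", "", "", "1"]
print(full.count)  // 5
```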

(Max Moiseev) #2

/cc @dabrahams

UPD: after some quick investigation I can probably speculate about the decision to make the API the way it is.

It looks like neither Python nor Kotlin gives you the option to omit the empty parts. With no choice available, they have to return all the parts, as you would lose information otherwise. From personal experience, I have always found myself filtering out empty subsequences immediately after calling split, so I can see why omitting them can be considered a useful default. And since there is an option to include the empty ones, no information is lost.

(Ben Rimmington) #3

Foundation also includes empty strings in the result, with no option to exclude them.

import Foundation
"1,,,,,1".components(separatedBy: ",")
// $R0: [String] = ["1", "", "", "", "", "1"]

The split API was originally a free function in June 2013, which predates the first CHANGELOG entry. It might be difficult to answer your question, unless someone at Apple has an exceptional memory (or access to internal mailing lists).

(Vladislav Alekseev) #4

Thanks for your thoughts about the decision. While you may be right about using split() and then filtering out the empty elements, it seems odd to make split() also act as a filter() by adding one more argument to it.

(Jordan Rose) #5

It may be that it would make more sense for split not to filter out empty elements, but at this point that would be a behavior-breaking change. As much as we might like to, I'm not sure the default is so bad that it's worth changing the behavior of existing, working code.

(Max Moiseev) #6

The important thing is that split does not perform unnecessary work when you don't want the empty parts. split(separator: ",", omittingEmptySubsequences: false).filter { $0 != "" } would be less efficient, even if you add a .lazy in front of the filter, I think. Maybe I'm mistaken, but split will create an array, and the lazy filter will then create a lazy collection. That's fine, but if you want to pass it somewhere as an array, you'd have to wrap it in an Array.init call, which will make another array. It gets even worse if you forget to include lazy. All that work can be avoided by just letting split have one extra parameter and not include the empty parts in the result in the first place.
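To make the comparison concrete, here is a rough sketch of the three variants. The names direct, filtered and lazyFiltered are only illustrative, and the comments describe my understanding of the allocations rather than anything measured:

```swift
let input = "1,,,,1"

// Single step: empty pieces are never added to the result.
let direct = input.split(separator: ",")

// Two steps: split builds a five-element array, then filter builds a second one.
let filtered = input
    .split(separator: ",", omittingEmptySubsequences: false)
    .filter { !$0.isEmpty }

// Lazy variant: the filter itself allocates no array,
// but Array.init still creates one when an array is required.
let lazyFiltered = Array(
    input
        .split(separator: ",", omittingEmptySubsequences: false)
        .lazy
        .filter { !$0.isEmpty }
)

print(direct == filtered)      // true
print(direct == lazyFiltered)  // true
```

All three produce the same elements; the difference is only in how many intermediate collections are created along the way.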