`NonEmpty` collections support

DevAndArtist · April 24, 2021, 8:07am

Update:

This thread was moved from Collections to Discussion as it didn't fit the project and to reach a wider audience. Below you will find the previous discussion while the new participants are welcome to add their feedback right after this last comment.

Previous discussion starting point:

Hi there, I asked in the announcement thread about the potential adoption of a collection wrapper type NonEmpty, but haven't received any response yet. I guess a standalone thread might be a better signal for this question. :)

The open source project:

cc @lorentey, @stephencelis & @mbrandonw

Some previous discussions:

Karl · April 24, 2021, 8:32am

Yikes - it looks like they’re adding their own conformance to StringProtocol. That’s a bad idea - the documentation explicitly says not to do that.

ktraunmueller · April 24, 2021, 5:33pm

I have never, in my twenty years of software engineering, found myself in need of a guaranteed non-empty collection.

mattpolzin · April 24, 2021, 6:03pm

I’ve used NonEmpty types plenty. Every time I call first on a collection that I know has at least one element and then have to check whether the result was nil, I am frustrated that I have forced myself to handle a code path that will never be executed. Tangentially, force unwrapping is not the answer, because I don’t like placing any faith in my future self to never accidentally pass an empty collection into code expecting a non-empty collection. The answer is definitely to create a NonEmpty collection wherever I first know it is guaranteed to be non-empty and pass that around, never needing to worry or check again.
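To make that concrete, here is a minimal sketch of such a wrapper. `NonEmptyArray` and `describe` are hypothetical names made up for illustration; this is not the API of the open source project mentioned above.

```swift
// Hypothetical minimal wrapper, for illustration only.
struct NonEmptyArray<Element> {
    let head: Element
    let tail: [Element]

    // Failable: the one place where emptiness has to be handled.
    init?(_ elements: [Element]) {
        guard let head = elements.first else { return nil }
        self.head = head
        self.tail = Array(elements.dropFirst())
    }

    // Non-optional, unlike Array.first and Array.last.
    var first: Element { head }
    var last: Element { tail.last ?? head }
    var count: Int { tail.count + 1 }
}

// Code that receives the wrapper needs neither a nil check nor a force unwrap.
func describe(_ values: NonEmptyArray<Int>) -> String {
    "starts with \(values.first), \(values.count) in total"
}

if let values = NonEmptyArray([3, 1, 4]) {
    print(describe(values))
}
```

The check happens once, in the failable initializer, and everything downstream works with the guarantee.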

jrose · April 24, 2021, 6:44pm

Yeah, it's always felt like overhead to me, but I think many people say the same thing when first encountering Optional. The funny thing is that Optional is reversed: "I've never needed a non-nullable type". Maybe I'll have to try using NonEmpty more.

phoneyDev · April 24, 2021, 7:51pm

This is a workaround for the "Dark side of Optionals." Makes sense to me. Nobody likes unwrapping something that's logically known (or believed) to be non-optional. Having a compiler guarantee of this for collections would be very useful.

Karl · April 24, 2021, 9:17pm

One of the reasons I've never felt the need for something like this is because the whole empty/non-empty distinction is not incredibly useful for generic code.

Most of the time, if you have a function which takes a generic collection, you're interested in processing more than a single element from it (otherwise you could use something like c.first.map). If your function really does require a certain number of elements, it will already need to handle the case when the collection contains fewer elements than required. The empty vs. non-empty distinction is just a special case of that; having the caller check first if the collection is empty just adds noise and complexity.

Furthermore, the benefits (non-optional first and last properties, etc) disappear the moment you do something like take a slice of the collection.

And then you need to consider the caller - having initializers with separate head and tail parameters is not very convenient. I'd say it's substantially more of a burden than having to unwrap first for a collection you somehow know isn't empty.
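A rough sketch of both costs, using a hypothetical `NonEmptyArray` written only for this example (the names and the variadic initializer are assumptions, not any library's API):

```swift
// Hypothetical wrapper, for illustration only.
struct NonEmptyArray<Element> {
    var head: Element
    var tail: [Element]

    // Total initializer: the caller has to split off the first element.
    init(_ head: Element, _ tail: Element...) {
        self.head = head
        self.tail = tail
    }

    var first: Element { head }

    // Dropping elements can empty the collection, so the result is a plain Array.
    func dropFirst(_ k: Int = 1) -> [Element] {
        Array(([head] + tail).dropFirst(k))
    }
}

let numbers = NonEmptyArray(1, 2, 3)
let firstNumber: Int = numbers.first   // non-optional
let rest = numbers.dropFirst()         // [Int]: the guarantee is gone
let nextNumber: Int? = rest.first      // optional again
```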

You might say that generic code isn't a good candidate for NonEmpty - but then, none of the other examples in the README are very compelling either, IMO. Take the GraphQL example - if an empty set of fields is literally the only error that could possibly happen when sending a request to a server, it might be worthwhile; but since that's obviously not the case, it ends up being a lot of fuss for very little benefit.

If anybody has some compelling examples of general classes of problems, where forcing the caller to deal with emptiness specifically apart from all other errors can have real benefits, I'd love to hear them.

idrougge · April 25, 2021, 1:57am

I think there is a space where something expressing the emptiness of something on a type level is useful, and that is optional collections.

Often you have a model object which is an optional collection (array or string), where you must guard both against .some(collection) and .isEmpty.

If the world consisted only of Swift code, this would be a lesser problem, but dealing with foreign/sloppy systems, you often must deal with systems which don’t draw a clear line between a string being absent or empty.
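Something like the following is what I mean; `Profile`, `NonEmptyString` and `NormalizedProfile` are made-up names for the sketch, and the wrapper is only one possible way to collapse "absent" and "empty" into a single case.

```swift
// Hypothetical model decoded from a system that may send no name or "".
struct Profile {
    var nickname: String?
}

// Without a non-empty type, every reader repeats both checks.
func greeting(for profile: Profile) -> String {
    guard let nickname = profile.nickname, !nickname.isEmpty else {
        return "Hello!"
    }
    return "Hello, \(nickname)!"
}

// Hypothetical wrapper: absent and empty both become nil at construction.
struct NonEmptyString {
    let value: String

    init?(_ value: String?) {
        guard let value = value, !value.isEmpty else { return nil }
        self.value = value
    }
}

struct NormalizedProfile {
    var nickname: NonEmptyString?

    init(_ profile: Profile) {
        nickname = NonEmptyString(profile.nickname)
    }
}

// After normalization there is only one thing left to check.
func greeting(for profile: NormalizedProfile) -> String {
    guard let nickname = profile.nickname else { return "Hello!" }
    return "Hello, \(nickname.value)!"
}
```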

DevAndArtist · April 25, 2021, 6:25am

Well, yes and no. If a slice is taken from a non-empty collection over a non-empty, in-bounds index range, then the slice itself won't be empty. However, there are some difficulties in the language that prevent us from expressing such a thing. Beyond that, it would be beneficial if a non-optional implementation of an optional requirement were propagated further through other types. That is basically the second issue with slices: their first and last properties are optional and won't inherit the non-optionality of those properties from the wrapped non-empty collection.

If there are good solutions for tackling these problems in general, I think this thread is a good incubation point for such ideas. Maybe instead of a wrapper type, we would be better off with some kind of compiler support for non-emptiness.

mattpolzin · April 25, 2021, 4:44pm

It’s totally reasonable to want specific examples of NonEmpty uses, but let me at least provide a bit more abstract motivation. NonEmpty really is a dual of non-Optional, as pointed out by other commenters, and therefore the use cases are somewhat similar and also similarly easy to overlook if you have always used collections that can be empty in the past. If you write code that needs to check more than once whether a collection is empty or not (or grab the first element and handle nil as a problem), then a NonEmpty collection would have saved code and reduced opportunities for bugs.

Maybe a user is required to enter at least one value in a field but another part of the codebase uses those values; and let’s say you’ve got validation code so you can tell the user if they’ve failed to enter at least one value; then your code that uses those values elsewhere, perhaps in another view, should not need to assume you’ve validated the user input and you should not need to write UI for the edge case that the collection is empty somehow. Instead, create a NonEmpty collection the first time you know it is not empty and use that guarantee elsewhere.

Maybe instead you’ve got a collection coming out of an API response and used all over the place in your app but the collection must have at least one value (maybe it’s a list of options for something); this is a great place for NonEmpty so you can handle an erroneous or unexpected API response just once in your response handling code and create a NonEmpty collection. What do you do if the collection from the API is empty? It’s up to you, but now you know you are handling it the same way everywhere because you handle it up front as soon as you could possibly recognize the problem instead of pushing that handling out into potentially disparate parts of the codebase.
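As a sketch of that "handle it once at the boundary" idea: `parseOptions`, `ResponseError` and the small `NonEmptyArray` wrapper below are all hypothetical, and throwing is just one of the ways you might choose to react to an empty response.

```swift
// Hypothetical wrapper, for illustration only.
struct NonEmptyArray<Element> {
    let head: Element
    let tail: [Element]

    init?(_ elements: [Element]) {
        guard let head = elements.first else { return nil }
        self.head = head
        self.tail = Array(elements.dropFirst())
    }

    var first: Element { head }
    var count: Int { tail.count + 1 }
}

enum ResponseError: Error {
    case noOptions
}

// Boundary: the only place that decides what an empty response means.
func parseOptions(_ raw: [String]) throws -> NonEmptyArray<String> {
    guard let options = NonEmptyArray(raw) else {
        throw ResponseError.noOptions
    }
    return options
}

// Elsewhere in the app: no empty-state handling is needed.
func defaultSelection(in options: NonEmptyArray<String>) -> String {
    options.first
}

func menuTitle(for options: NonEmptyArray<String>) -> String {
    "\(options.count) options, default: \(options.first)"
}
```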

Maybe you’re writing an API for a library and you can either express in documentation that a function expects a non-empty collection or you can make the parameter explicitly a NonEmpty type. In the former case you must either produce a default if the collection is empty or return a nil or error result from the function if the function is given an empty collection; it might have been much nicer if the user of your API could check whether they meet the invariant of a non-empty collection up front and not need to thread the handling of the error case through your function.
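Here is roughly what the two API shapes look like side by side. `averageOrNil`, `average` and the wrapper are invented for this sketch; they are not taken from any existing library.

```swift
// Variant 1: the requirement lives in documentation only, so the
// function needs an escape hatch for callers that break it.
/// - Parameter values: Must contain at least one value.
func averageOrNil(_ values: [Double]) -> Double? {
    guard !values.isEmpty else { return nil }
    return values.reduce(0, +) / Double(values.count)
}

// Hypothetical wrapper, for illustration only.
struct NonEmptyArray<Element> {
    let head: Element
    let tail: [Element]

    init(_ head: Element, _ tail: Element...) {
        self.head = head
        self.tail = tail
    }
}

// Variant 2: the requirement lives in the parameter type, so the
// result is total and the caller deals with emptiness up front.
func average(_ values: NonEmptyArray<Double>) -> Double {
    let all = [values.head] + values.tail
    return all.reduce(0, +) / Double(all.count)
}

let maybeMean = averageOrNil([2, 4])      // Double?
let mean = average(NonEmptyArray(2, 4))   // Double
```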

jawbroken · April 25, 2021, 6:19pm

To quote myself from the other thread:

I don't find the analogy to Optional that compelling, because roughly all you can do with an Optional is conditionally unwrap it. That's why I made the analogy to a non-zero number above, i.e. you're going to want to do a lot of Sequence/Collection operations on a non-empty collection but be thwarted or forced to drop the non-empty guarantee.

mattpolzin · April 25, 2021, 6:48pm

This happens all the time with Optional; you have a non-optional type and then you do something that removes that guarantee and you end up with an optional type. Why is it so much less appealing to dip in and out of a non-empty guarantee than it is to dip in and out of a non-optional guarantee?

I think “I don’t often use collections with a need to know whether it has any elements or not” is a perfectly valid statement, but I’m not sure I buy an argument that there’s more of a drawback than infrequency of use or even unfamiliarity (RE my argument that you don’t see the use cases if you’ve spent your whole career not looking for them).

DevAndArtist · April 25, 2021, 7:03pm

To me, solving non-emptiness is as important as eliminating the extra cases introduced by (Value?, Error?) through the Result<Value, Error> type.
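For anyone who hasn't run into the (Value?, Error?) problem, a small sketch of what I mean. `Failure` and the two `describe` functions are made up for the example; `Result` is the standard library type.

```swift
struct Failure: Error {}

// A (Value?, Error?) pair can represent four states, two of them meaningless.
func describe(value: Int?, error: Error?) -> String {
    switch (value, error) {
    case (.some(let value), .none):
        return "value \(value)"
    case (.none, .some):
        return "error"
    case (.none, .none), (.some, .some):
        return "invalid state"   // representable, so it has to be handled
    }
}

// Result can only represent the two meaningful states.
func describe(_ result: Result<Int, Error>) -> String {
    switch result {
    case .success(let value):
        return "value \(value)"
    case .failure:
        return "error"
    }
}
```

A non-empty type does the same job for collections: the state that should never occur stops being representable, so nobody has to write a branch for it.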

Dante-Broggi · April 25, 2021, 7:20pm

A potential reason is that with T vs. T? one is going from 1 thing to 0 or 1 things, which doubles the number of possible states, whereas going from NonEmpty<[T]> to [T], only adds 1 state (empty) to the Int.max existing states, which may be almost negligible in relation.

Francois_Green · April 26, 2021, 3:36am

The Ceylon Programming Language has this concept. Maybe some of those ideas can find a home in Swift Collections.

mattpolzin · April 26, 2021, 1:29pm

This is a tempting observation, because optional and empty states do both add 1 additional possible value to a type. But T can be anything, so one specific example of T -> Optional<T> would be [U] -> Optional<[U]>, which, just like NonEmpty<[U]> -> [U], adds 1 possible value but does not double the total number of representable values. It may not be particularly common to work with optional arrays, but I could say the same thing more generally by letting T be any enum: enumerations take on one of some number of values, where that number is generally greater than 1. Or let T be any struct: the number of possible values is the product of the number of values each field can take. Therefore the cases where T -> Optional<T> doubles the number of possible values are relatively uncommon.
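The counting can be spelled out in a few lines. `Direction`, `Move` and `optionalCount` are hypothetical names that exist only to do the arithmetic.

```swift
enum Direction: CaseIterable {
    case north, east, south, west
}

// A struct's value count is the product of its fields' value counts.
struct Move {
    var direction: Direction
    var isFast: Bool
}

let directionCount = Direction.allCases.count   // 4
let boolCount = 2                               // false, true
let moveCount = directionCount * boolCount      // 4 * 2 = 8

// Wrapping a type in Optional adds exactly one value (nil).
func optionalCount(_ count: Int) -> Int {
    count + 1
}

print(optionalCount(1))               // Void?:      1 -> 2, the only doubling
print(optionalCount(boolCount))       // Bool?:      2 -> 3
print(optionalCount(directionCount))  // Direction?: 4 -> 5
print(optionalCount(moveCount))       // Move?:      8 -> 9
```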

DevAndArtist · April 26, 2021, 1:39pm

Another abstract thought:
If T was a "non-empty collection with a fixed size of 1", then T? would be "a potentially empty collection with a maximum size of 1". Since T is a theoretical sub-type of T? (an enum sub-type has the same number of cases as the parent enum, or fewer), it also implies that a non-empty collection is a sub-type of a collection which can be empty. However, Swift does not yet allow us to express this sub-type for collections.

And since the analogy for an emptiable collection would be the Optional type, we just discovered that without a proper non-empty sub-type the status quo for collections is still the same as in languages which don't rely on an Optional-like type and have to deal with "null pointer exceptions".

With that, I would like to second what @jrose mentioned above. The polarities for the "default" are reversed. In other words, we do lean towards a non-optional type more often than to an optional, but it doesn't make the optional type not useful. On the collection side we lean towards emptiable collections more often, but this should not imply that non-empty collections aren't useful in other use-cases.

AnotherUser · April 26, 2021, 5:10pm

I'm in favor of NonEmpty. I would very much like some way to guarantee that collection.first will be non-nil without needing to spread in failure cases that will never be executed. A compile-time safety check as simple as this is always welcome.

stephencelis · April 26, 2021, 6:08pm

We didn't find any reason not to given the discussion here: StringProtocol - Do not declare new conformances? - #3 by Joe_Groff

And because NonEmpty is simply a wrapper around a raw string this also seemed perfectly fine to do.

typesanitizer · April 26, 2021, 7:26pm

Since people are mentioning that they haven't run into wanting this before, I can give some examples of use cases I've run into in practice:

In a compiler, after the parsing stage, I want to make sure a lambda has 1 or more parameters.
In a graphics program, I want to make sure that a pipeline short-circuits early in case the array of selected graphics is empty (this array is used to apply changes), this can be enforced by having the later stages of the pipeline accept non-empty arrays. More generally, I think this is useful for working with selections, when you want to do something special when nothing is selected vs when 1 or more items are selected.
In certain situations, I want to make sure a string is not empty (because it represents a name or a filepath) and I want to pass that proof elsewhere. For example, when working on the Swift compiler (so this was C++ code), we ran into a bug where in some cases a file path was surprisingly empty. We have a hack to work around that: swift/LoadedModuleTrace.cpp at a21f323c1657913445f27c8aaebd31a0b8248dd8 · apple/swift · GitHub where we print a warning instead of asserting, since we haven't been able to reproduce the issue reliably.

Is such functionality a little bit of overhead at the call-site? Absolutely. Is it more work to forward/duplicate methods? Yes. However, the benefits come up later when you don't need to either throw an error or write fatalError("Impossible") or have hard to debug issues where you end up with empty collections in places that you don't expect.