SE-0194: Derived Collection of Enum Cases

The index API shouldn't be on Collection, because it doesn't not make sense for a Set or Dictionary to have indices, since they are unordered. And your point about a B-tree is a red-herring, because there is no B-tree in the standard library, and writing your own data structure is so fantastically rare that it seems strange to hobble the Collection API for a 0.001% use case.

It's possible I'm mistaken and don't realize it. What I do realize is that every time I try to index into a string, I cuss out the screen because the API is so nonsensical and annoying to use.

1 Like

What I do realize is that every time I try to index into a string, I cuss out the screen because the API is so nonsensical and annoying to use

String having integer indexes would be an index into the code units of the underlying string structure (which is nearly useless in terms of general purpose algorithms, and potentially surprising when e.g. a string operation returns a new string where the underlying data has gone from say UTF-8 to UTF-16), or would make indexing an O(n) operation to the index Character (making most algorithms relying upon it very slow).

I'd rather have O(N) speed and correctness than a ridiculous API. And no, indexing by an Int would not index into the code units, because a String is a Collection of Characters, so indexing into the collection would index into the Characters. If I wanted to index into the UTF8/UTF16/whatever, I'd use those views of the String, and not the string itself.

1 Like

Thanks for your input. I don't think resilient library issue invalidates this approach.

Let me clarify something: I asked this literal/macro to be scoped because I want it to be only callable from within a declaration (or extension) of an enum, not like an API that a client code can call to obtain all values of an enum. So, a call to #allEnumValues will be only valid as a part of enum declaration, or in an extension when it is safe (in terms of resilience) to do so. It would not be safe to call it in an extension to a non-frozen enum dynamically liked from an external binary framework. An implementation of a resilient library can still call #allEnumValues as part of its implementation to get an accurate list of values at the time of its compilation, which will always remain valid.

String indexing has been discussed very fully elsewhere. No point in rehashing here.

Itā€™s not a limitation of Collection that results in its not having integer indices, but a deliberate design decision predicated on the idea that thereā€™s almost always a better way to do something than O(N) indexing.

I don't understand what ordering has to do with indices. You expect to be able to iterate the contents of a Set or Dictionary, do you not? Therefore it is representable as some kind of Sequence. An Index is simply a stable pointer to an Element, so if you can also produce a Sequence of stable pointers to your Elements (which Set/Dictionary can), you can also be a Collection. BidirectionalCollection adds the ability to generate that pointer Sequence in reverse, and RandomAccessCollection adds some performance guarantees about index calculations. Ordered/Unordered doesn't come in to it AFAICT.

The exact meaning of "pointer" definitely depends on your implementation. A flat, 1-dimensional Array-like Collection might be happy with a integer, but something like a JoinedCollection might need something different (or even if they used integers, the values might be just as non-intuitive as an opaque Index).

I feel like having a general-sounding #allEnumValues expression is worse than the proposal in several ways:

  • Having different names for "allValues" in different types means less consistency across code bases, no ability to abstract over types that can enumerate all of their values (e.g., via a common protocol). Every developer will invent their own name for this same concept.
  • It will be surprising to developers to have a general-sounding expression #allEnumValues that nonetheless is quite restricted. OTOH, we already have a fairly-consistent set of rules for synthesized protocol conformances (Equatable, Hashable, RawRepresentable, Codable, etc.) that the proposed ValueEnumerable leverages.
  • Using it for imported enums will still need to either be banned, or will be a source of code-evolution problems (again, the resilience issue).
4 Likes

People love to make that assertion, but sorry, it's just wrong. There is no sense in which Set and Dictionary lack an ordering. You can traverse their elements and they repeatably come out in the same order. The only thing you can't do is control the order of the elements. But that's just as true of 3...10, or if you prefer an example where the progression of values is less directly correlated to ordering, (3...10).lazy.map { $0 % 2 == 0 ? $0 : -$0 }.

Useful and meaningful things can be done with positions in a collection whose order you can't control that would otherwise force a premature copy of the elements into an array, at an unnecessary cost. For example, you could

  • slice up regions of a Set and hand them out for parallel processing
  • sort the indices of a Set of large values based on some feature of the corresponding element, then traverse these sorted indices to reference the elements of the set.

I'm sure if you use your imagination you can come up with many others.

It's not at all a red herring. We'd like to have a B-tree in the standard library, and moreover we want to support those ā€œfantastically rareā€ 3rd party library designers who will supply data structures for ā€œthe rest of us.ā€ One of the goals of the protocol design is to allow interesting data structures that will be developed in the future to fit into the ecosystem and take advantage of all the generic algorithms we (and others) supply.

There are several reasons for that. The main one is probably the Standard Library's ā€œfault,ā€ in that it hasn't yet supplied the high-level operations that would make your indexing into a string unnecessary, or you haven't been properly made aware of the ones it does have. For casual use, there is in principle no reason you should need to index almost any collection: there should always be a higher-level operation you can reach for or compose to do what you want, usually in a more correct, efficient, and maintainable way. Of course, many of these operations are still missing from the standard library, and that list only scratches the surface, notably not including many of the operations that would be most useful on strings.

I think there's also at least a small part of this frustration you can take responsibility for yourself: as long as you come at programming with the preconceived idea that non-integer indexing is an awful mistake, every time you bump into it, it's going to be annoying, and you're also going to fail to appreciate the motivations for it that will help you use indices powerfully.

All that said, there's no reasonable representation for strings that offers efficient integer indexing without exposing serious correctness traps for the casual user. So even if you think all the other examples are bogus, there's still no good way to get you what you wantā€”strings indexed by integersā€”without unacceptably compromising other design goals.

5 Likes

As I touched it in one of my previous responses. We do still need an API for clients of the enum and we may decide to add the contract of that API as a protocol in standard library.

Let me also clarify that I don't think the proposed client API is satisfactory, especially considering library evolution issues.

On the other hand, the key motivator for asking for #allEnumValues is that it provides a clear path for further enhancement for supporting library evolution-related issues such as #allEnumValues(!deprecated) example I gave before.

The proposal as it stands does not provide any path for being able to filter enum cases based on availability and is not easily extendable to support it. As it stands now, only a hashtag special symbol can access attributes like availability and deprecation.

This is great to hear! Have you considered eRRB-Trees or something similar? (https://public.sinusoid.es/misc/immer/immer-icfp17.pdf) It seems like this would be a great fit for Swift.

I think there are two axis here, one is the need to have a mathematically sound API, that will allow any kind of data structure to be built. The other is a pragmatic need to have an easy way to get a substring from a string. I don't think these are mutually exclusive, it's just that we are in version 4 of the language and we are able to build C* red-orange hash trees, but we can't do the Python equivalent of s[2:5] (I'm not even talking about "advanced" stuff like s[2:-2]) in a non-convoluted way.

In case it isn't obvious, I very much appreciate both axes. The frustration of not to be able to ā€œdo the Python thingā€ in this space is real, which is why I've repeatedly made proposals to address the problem.

1 Like

I disagree. I've written a few and hope to write more. I've found that the Collection API is not so much bad as it is inconsistently documented. At various times, I have been sure about what needed to be implemented to conform only to find that the requirements changed and the the best way to find the new set was to stare at the interface and let my mind wander until the requirements came clear like a magic eye image.
Regarding indices, I actually hope that we make it easier to use indices as they are on Set and Dictionary. Given more ways to receive a Set.Index, people will have a better understanding of why the index types exist and the benefits of the current situation.

4 Likes

What's the status of this proposal?

If reviews are still valid, I'm +1 on this proposal, with the weird custom collection weird but fine as long as I can Int index into it. What I'm really not sold on is this being accomplished using runtime machinery and not just something generated at build time. I don't see anything in the proposal or discussion about why that part's necessary.

If this is still open to votes then Iā€™m a huge +1 as long as the collection is source ordered and +0 if not source ordered.

Also I feel like Int is the best index.