The array is highlighting the asymmetry, but there are plenty of times where you might know that an index always exists at offset five or offset ten or whatever. I can very reasonably imagine an API that is generic over a Collection but has a precondition to have at least n elements present. Or in scripting contexts, you are being rough and ready and you just know you are only getting strings that have a three-character prefix or whatever. I think these kind of cases are reasonably frequent.
I don't understand the relevancy of this. We don't 'know' the indices when we use bare subscript (index: Index), and that does not return an optional. The implementation can certainly support returning a non-optional result.
Yes we do know. Besides iterating over indices, using startIndex, manipulating startIndex and endIndex with the various index(...) API, and using the firstIndex(...) and lastIndex(...) API give us known starting indexes and ones based off known indexes. Some Index types don't even have public initializers; you have to use the index manipulation API above instead of creating a value on your own.
I'm inclined that we don't include .first and .last.
Many precedent languages for location before the end already use -1 for the last element, so people would be familiar with it. Code snippet would also already incorporate that fact.
It feels harmful to me to have choices between very similar concepts, especially when neither case fits perfectly for what I want.
From the description of how overloads are structured I don't see any problems with this approach from type-checker perspective. Type-checker is pretty good at dealing with concrete or partially concrete overloads, so performance shouldn't be a problem.
I don't think so, but although that seems to be an uncommon opinion here, I doubt that the audience in this forum is representative for the majority of actual users of Swift.
However, I'm not questioning that we have a performance-centric API; that's great, and it definitely shouldn't be replaced.
Instead, I'd prefer adding an alternative mechanism that fully focuses on ease of use and flexibility.
In this regard, the pitched solution imho doesn't perform that well compared to possible alternatives.
Correct me if I'm wrong, but the actual problem to solve is that we have to repeat ourselves way too often for accessing elements in a collection.
Obviously, we need to specify the collection itself, but we have to repeat this to get any index for it, and we have to repeat it even more often when we want offset such an index.
This is fine when those positions are stored anyways, but there are many situations where this isn't the case, and all the repetition is just a burden:
It's clear that the index has to come from the base collection, and the pitch addresses the shortcomings of the general case.
However, there is variety of situations where the elements of the collection give us much more options than asking for the first or last element. As soon as we deal with Comparable or Equatable, there can be other interesting indexes in a collection.
Especially for Strings, there are many applications, and with flexible indexing, it would be possible to tackle many problems where people normally reach out for regular expressions with all their downsides.
In the most naive implementation, that "flexible" indexing would utilize closure which take a Collection, and return an index (or range) from that collection.
This could not only be the first or last element, but also the first element bigger than 42 (for a collection of Int), the last occurrence of a newline in a String or something completely different.
Nice! I updated the gist with this simplification.
I originally avoided that to skip a bunch of introspection-branching at the top for RACs. Since OffsetBound as written is fairly resilient, we wouldnât be able to leverage any static analysis from usage of .start or .end at the call site even if this were inlinable. But it might be worth it anyways. I tried to think of a more generally useful customization hook that could drive this, but couldnât.
We can certainly bit-pack the anchor into a single word along with the offset. Swiftâs address space is effectively capped at 60-bits (which itself is cautiously high). We wouldnât want this approach, though:
These are the two most important reasons. Also, we only have one zero representation, so we couldnât have zero mean both .start and .end, and the latter is useful for half-open ranges.
In practice, unless this type is going to commonly appear in large collections, there is not a huge difference between one word and two words (bucket sizes, stack alignment, etc). It is also resilient, so it will be handled indirectly anyways.
Not everyone is as big a fan of hand-packing bits as I tend to be, so unless thereâs a strong need for it, I went with a simpler and more understandable approach
I was planning to add a setter for the slicing subscript for RangeReplaceableCollection, but I was surprised to the find the protocol doesnât actually require one!
For the single-element one, a setter on MutableCollection would not be able to support nil doing removal, it would have to no-op.
I chose to leave these off. If we wanted to add setters to RRC, then it might make sense to add them there. Iâll update the âAlternatives Consideredâ with this rationale.
This is awesome. Especially for programs that are generic over Collection, this would make indexing operations a lot more readable. Right now, indexing String or a generic Collection whose index type is not Int is extremely cumbersome and this would really help to fix this. Big +1 for me on this proposal. I really hope this makes it into Swift.
I went with dropping .start and .end, since I found .first and .last to be so much clearer in pretty much every example I wrote. It also communicates the level of abstraction better, itâs more like Collection.first than Collection.startIndex. These offsets can be used across collections of different sizes and shapes, so long as theyâre shaped enough for the element at that offset to be present. In contrast, indices can only be used on the collection from which they were derived.
Along these lines, I added RRC methods like insert and replace, as well as the subscript setter, which fleshes out this abstraction.
I looked a little deeper into this. There are two limitations that stop us: we can't provide just a setter implementation, nor can we refer to an inherited implementation/requirement. Because Collectionâs subscript getter is a requirement of the protocol, RRC cannot provide a default implementation of a setter.
But, since we're defining the getter for the OffsetBound API, we can add a setter! Updated with a setter.
A funny piece of trivia: the value passed in to the subscript setter has to be of type Collection.SubSequence, which limits its convenience in generic code. But, ArraySlice conforms to ExpressibleByArrayLiteral and Substring conforms to ExpressibleByStringLiteral, so concrete use sites with literals work.
A bit of a tangent, but this is a good motivation for why set-only subscripts are useful. Itâs the reason Iâve wanted them for a long time, so that one can define an API where the subscript getter returns a concrete type, while the subscript setter can accept any type conforming to certain requirements.
which I think is very, very unfortunate from the point of newbie usability. It seems like an active bug magnet for newbies and old fogies alike, in contrast to this:
Edit: Specifically, the .last version invites off-by-1 errors, and it's unsymmetrical with offset calculations relative to .start/.first. Perfect humans don't make off-by-1 errors, but I'm not none one of them.
Wouldn't the change mitigate that by separating the naming so there's no longer any confusion. Previously, with start / end, the similarity to startIndex / endIndex is what would likely lead to errors. Frankly, this formulation is what I think most people would expect, with start and end, with the indexes being eternally confusing, at least for me.
Honestly, I'm fine either way, whether to use start/end or first/last. The API is high level enough that we can safely depart from the existing one. What's important is which one would be more ergonomic for the targeted use case.
I can see we'll spend a good amount of time debating and/or comparing between the two.
makes more sense to me and is symmetrical. I'm not sure that it inherently invites off-by-1 errors more than any indexing scheme, though it is less precedented in other languages.
Custom operators are somewhat covered in the Other Syntax section of the proposal. I can't recall if anyone has proposed the pipe operator before, but I don't immediately find it to be that readable in the examples (e.g. str[|3 ..< |3] vs str[|3 ..< 3|] vs str[3| ..< |3] vs str[3| ..< 3|]), though perhaps I'm just not used to it. I also think defining 3| as .last - 2 could be fairly confusing, and isn't symmetric (e.g. in terms of trimming n characters from the start of end). Why did you choose that interpretation?