Critic notes about Swift Range types jungle

tera · July 23, 2023, 5:57pm

TL&DR:
We have so many different and incompatible range types in Swift: Range, ClosedRange, PartialRangeFrom, PartialRangeThrough, PartialRangeUpTo, UnboundRange plus a RangeExpression protocol. Wouldn't it be better to have a single type instead?

Critic notes about Swift Range types jungle

There's a ClosedRange and Range (which is OpenRange). Perhaps OpenRange name would be better.
Technically they are half open ranges. HalfClosedRange
I can see a benefit of these two ranges being different:
Double.random(in: 0 ... 1)
Double.random(in: 0 ..< 1)

but what about the other two?

Double.random(in: 0 >.. 1)
Double.random(in: 0 >.< 1)

No, we can only have "half open vs closed" on the right end, left end is always closed. why? So technically what we have is RightHalfClosedRange
Is 1 ..< 9 equal to 1 ... 10? Practically speaking yes, but you can't compare them and their hashValues are different.
The concept of one sided ranges adds another type... Is "1 ..." equal to "1 ... .max" - again, for practical purposes yes, but those are different types that can't be compared.
A peculiar property about one sided ranges is that while you can use white space with normal ranges you can't with one side ranges (no space allowed for prefix / postfix operators)
UnboundedRange is the whole new thing... it's not even generic?!
UnboundedRange<Int> // Error
Is unbound range of Int (if existed!) equal to .min ... .max ? Practically yes, but again they are different types and can't be compared.
As a developer implementing an algorithm that wants to be as general as possible I have to express the algorithm taking RangeExpression parameter instead of a concrete type, which is less convenient due to the need of support of not just one generic dimension (boundary type) but the other dimension (concrete range type I am dealing with):

func foo<T>(range: ClosedRange<T>) {
    print(range.upperBound)
}

vs

func foo<T, Range: RangeExpression<T>> (range: Range) {
    print(range.upperBound) // Error
    fatalError()
}

Other languages have a notion of "range type", which is a similar but different concept: say, a type 0 ... 10 would only accept values within this range, attempting to have a value of that type out of that range would be a runtime error. Swift totally lacks here and I'd welcome this addition to be in Swift one day.

Could we done ranges differently in Swift?

// no RangeExpression protocol

// This is a SINGLE range type takes care of all cases
struct Range<T>: Hashable {
    enum Openness { case `open`, close }
    struct Edge {
        var openness: Openness
        var value: T
    }
    var lower: Edge
    var upper: Edge
}

infix operator .>.. : RangeFormationPrecedence // †
infix operator .>.<  : RangeFormationPrecedence

func ... <T: Comparable> (lhs: T, rhs: T) -> Range<T> { ... }
func ..< <T: Comparable> (lhs: T, rhs: T) -> Range<T> { ... }
func .>.. <T: Comparable> (lhs: T, rhs: T) -> Range<T> { ... }
func .>.< <T: Comparable> (lhs: T, rhs: T) -> Range<T> { ... }
// Option A. either no one sides ranges, use explicit constants like min / max / inf
// Option B. or similar to what we have now, returning the same Range type

(1 ... 9) == (1 ..< 10)
(1 ... 9) == (0 .>.. 9)
(1...) == (1 ... .max) // Option A: Error
(1...) == (1 ... .max) // Option B: true
(1.0...) == (1 ... .inf) // Option A: Error
(1.0...) == (1 ... .inf) // Option B: true

† - using ".>.." and ".>.<" name instead of simpler ">.." and ">.<" names is a consequence of an unfortunate rule:

Karl · July 23, 2023, 6:29pm

I will say: I think we should replace UnboundedRange.

Sometimes I implement things which include replaceSubrange, but don't conform to RangeReplaceableCollection (for reasons that are not worth going in to now). So, to ensure I support half-open ranges and all those other nice convenient ranges, I implement the function as:

mutating func replaceSubrange(
  _ subrange: some RangeExpression<Index>,
  with newElements: some Sequence<Element>
) -> Range<Index>

This works with everything except UnboundedRange, which actually cannot conform to RangeExpression since, as you noted, it is not generic and therefore has nothing for the Bound associated type to witness.

Ugh.

xwu · July 23, 2023, 6:44pm

Is not replacing the entire unbounded “sub”-range with a sequence of new elements isomorphic to reassigning the value by calling the initializer that takes that sequence as its argument? In the general case, since there are collection types that check and enforce some invariant after mutation, calling replaceSubrange to perform this operation would be less than ideal. Is there some reason this specific example is coming to mind?

While I agree that there are specific examples where expressivity limitations cause ... to be excluded where it does not make sense to be, this example gives me some pause as to whether in a sizable proportion of use cases, or even more often than not, the right thing is for the unbounded range case to be considered explicitly and handled separately, such that the current expressivity limitation is a feature and not just a bug.

xwu · July 23, 2023, 6:51pm

Hi @tera,

A lot of these points are not new and have been discussed at length in the past. Indeed, some of your observations (e.g., about one-sided ranges) are not entirely accurate and a careful study of those discussions would have helped to clarify. I would encourage you to spend some time with the very large amount of knowledge that has been preserved on these forums (the whole point of switching from the mailing list was to make it possible and easy to do so) before prompting the community to rehash them here again. If after such study there are new insights about how the standard library can be improved which you’ve newly discovered that haven’t been considered in the past, it would be of course delightful to hear them.

I want to emphasize that it would take a deeper level of insight to move the needle on this particular point. It goes without saying that, if it would “be better” to have one range type instead of six, the core team and the folks doing the implementation work would not have opted for six range types: it is implausible that they just didn’t think of that option and need it pointed out. Now, if you want to undertake a deep study of all the reasons why six types were chosen instead of one, and then look for any reasons which applied back then but do not apply to Swift as it is now thanks to new language features, and you can demonstrate convincingly that reworking the standard library in an ABI- and API-compatible way using the new features results in quantifiable benefits to user code (performance, correctness, etc.), then by all means we should have that conversation. But the onus for that work has to be on you: the people who designed the range types explained their rationale to the satisfaction of the core team at the time their proposals were adopted—they’re not on standby for all eternity to explain their choices on these forums.

Karl · July 23, 2023, 6:57pm

Not if your type doesn't support initialising a value out of thin air, such as if it is a mutable "view" type. Indeed, one of the reasons why I opted not to conform to RRC was so that I could omit that initialiser.

In my case, I was doing it with URL path components. A user may very well want to set the entire path using a collection of components, replacing anything that currently exists. But they can't create a PathComponents view from nothing, as it needs to know which URL it is part of and have the appropriate backing storage to write to (the behaviour can actually change depending on the values of other URL components -- like everything URL-related, the details are more complex than anyone would believe).

I don't see a problem with that. It is perfectly acceptable for a user to write:

url.pathComponents.replaceSubrange(
  url.pathComponents.startIndex...,
  with: ["foo", "bar", "baz"]
)

And we would need to deal with all invariant checks just the same as if they used a more compact spelling:

url.pathComponents.replaceSubrange(..., with: ["foo", "bar", "baz"])

The latter has the additional benefit that we don't call the .pathComponents getter twice.

taylorswift · July 23, 2023, 7:24pm

tera:

// This is a SINGLE range type takes care of all cases
struct Range<T>: Hashable {
    enum Openness { case `open`, close }
    struct Edge {
        var openness: Openness
        var value: T
    }
    var lower: Edge
    var upper: Edge
}

assuming the alignment is not too weird, this type will have a stride of 4 * MemoryLayout<T>.stride. the existing ranges, which encode Openness in the type system, occupy half as much space in memory.

for what it’s worth, i don’t think the appeals to historical precedent hold water as arguments against simplifying the range model. but doubling the memory footprint of a currency type like Range<T> is clearly going to be a nonstarter.

tera · July 23, 2023, 7:47pm

In the list of swift/proposals looking for "range" I can see only SE-0172 which introduces a new RangeExpression and a few new types, but not the previous split in to Range / ClosedRange which probably predates Swift evolution process. Would love to read relevant old discussions – would take me a bit to find them so bear with me.

Do you think range values are getting stored en mass, e.g. in Array of ranges?

The bullet point #10 from my list above could actually help here:

struct Edge {
    var value: T
    var openness: Openness
}

This could occupy 8 bytes for T being an interval of numbers from 0 to 2^55 (7 bytes) which is very big number sufficient for most if not all practical applications.

taylorswift · July 23, 2023, 7:54pm

for any type in the standard library (well, maybe not ContinuousClock) if you ask the question “is it being stored en masse” the answer is yes, somewhere someone is storing it en masse.

probably not by itself, probably as a field in a larger struct, but yes.

tera:

The bullet point #10 from my list above could actually help here:
struct Edge {
    var value: T
    var openness: Openness
}
This could occupy 8 bytes for T being an interval of numbers from 0 to 2^55 (7 bytes) which is very big number sufficient for most if not all practical applications.

well, it would not work with Int, which has a size, stride, and alignment of 8, would it?

tera · July 23, 2023, 8:41pm

You mean ContinuousClock.Instant? What's wrong with it?
Or do you just mean that people won't store it en masse because it's huge?

Not an arbitrary Int, although huge numbers rarely needed as indexes or bounds (It's like an array subscript - for all practical applications it could be 32 bits). If you have a practical example of having index value greater than what could fit in 7 bytes please share.

As a convenience we could allow creating such ranges from Int:

enum Openness {
    case `open`, closed
}

typealias Int56 = (UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8)

struct Edge {
    var value: Int56
    var openness: Openness
    
    init(value: Int, openness: Openness) throws {
        guard let value = Int56(value) else { throw EdgeError.tooBigInteger }
        self.value = value
        self.openness = openness
    }
}

aggie33 · July 23, 2023, 9:10pm

I would say the two most common uses of ranges are:

For loops

for i in 1...100 { ... }

Collection subscripting

print([1, 2, 3][1...])

Neither of which are affected very much if the memory size of ranges increases. That said, this seems kind of unnecessary. This makes it harder to write simple algorithms on ranges, and is basically an AnyRangeExpression type. There are a lot of range types, but they all make sense.

I do think it would be good to update UnboundedRange. Currently, it's just a type alias for a function taking an _UnboundedRange and returning void, which can never be called since _UnboundedRange is an uninhabited enum. But once we gain the ability to extend function types to conform to protocols, UnboundedRange can actually conform to RangeExpression. Until then, we might want to make an actual, empty unbounded range type to use as an easy default argument to range expression parameters.

aggie33 · July 24, 2023, 2:23am

Also, the option you propose only replaces Range and ClosedRange. It doesn't address UnboundedRange, PartialRangeUpTo, PartialRangeThrough and PartialRangeFrom.

- (.open, .close) - 1..<3 // Range<Bound> (1, 2)
- (.close, .close) - 1>.<3 // currently does not exist (2)
- (.open, .open) - 1...3 // ClosedRange<Bound> (1, 2, 3)
- (.close, .open) - 1>..3 // currently does not exist (2, 3)

Adding a third case to Openness, unbounded, would fix this.

- (.unbounded, .close) - ..<3 // PartialRangeUpTo<Bound> (Int.min, ..., 2)
- (.unbounded, .open) - ...3 // PartialRangeThrough<Bound> (Int.min, ..., 2, 3)
- (.close, .unbounded) 1>.. // currently does not exist (2, 3, ..., Int.max)
- (.open, .unbounded) 1... // PartialRangeFrom<Bound> (1, 2, 3, ..., Int.max)
- (.unbounded, .unbounded) ... // UnboundedRange (Int.min, ..., Int.max)

bbrk24 · July 24, 2023, 2:29pm

Note that under the current rules of the language, this cannot exist. An operator that does not start with . may not contain it, and > may not be a unary postfix operator.

tera · July 24, 2023, 3:08pm

I found the concept of "unbounded" very strange. e.g. this range (pseudocode):

(UInt8(1) ... UInt8.unbounded)

doesn't feel right to me. Imho it is as good as a bounded range between 1 and 255, i.e. (1 ... max)

Even for Double it could be represented as a bounded range from 1 to +inf

I wonder if that was the very reason to not have "left open" ranges...

Double.random(in: 1 ... 2) // ✅
Double.random(in: 1 ..< 2) // ✅
Double.random(in: 1 >.. 2) // 🛑 oops
// But let's workaround: -Double.random(in: -2 ..< -1)
Double.random(in: 1 >.< 2) // 🛑 oops, no workaround other than manual filtering of "1.0" value

aggie33 · July 24, 2023, 4:20pm

A range expression is usually either sugar for a collection or a comparison operation. If I'm pattern matching:

switch n {
 case 1...:
  print("one or over")
 default:
  print("under one")
}

// in your pitch:
switch n {
  case 1...Int.max:
    ...
  default:
    ...
}

That's basically equivalent to:

if n >= 1 { ... } else { ... }

So an unbounded range serves a useful role in pattern matching and collection subscripting. In your hypothetical example, one would have to write

let string = "Hello, world!"
let index = string.firstIndex(of: "w")

// swift now
print(string[index...])

// swift with your proposal
print(string[index..<string.endIndex])

So I think unbounded ranges do serve a useful purpose.

aggie33 · July 24, 2023, 4:23pm

I think the range design was also influenced by what it was replacing. They wanted to replace these methods:

prefix(upTo:)
prefix(through:)
suffix(from:)

So they added three new range types to match them.

PartialRangeUpTo<Bound>
PartialRangeThrough<Bound>
PartialRangeFrom<Bound>

But the fact that a range operator not including the left bound would have to be spelled like this .>.. probably influenced the decision.

QuinceyMorris · July 24, 2023, 5:55pm

Indexing is asymmetrical regarding left and right. To see this, think about inserting an element between elements 3 and 4 of an existing array of 5 Ints. The function to do this could be either "insert after 3" or "insert before 4", and it's a non-trivial difference because it means the index parameter to the function is different in each case.

Swift (like most languages) has opted for the "right handed" approach: it's "insert before 4".

As a consequence, the openness also most frequently belongs on the right-hand end of a range, since (to end the range between element 3 and 4) we typically want to use index 4 ("before"), not 3 ("after"), for consistency even though both would work.

That doesn't mean that "openness on the left" would somehow be bad. It's just not something that there seems to be a lot of demand for. By contrast, "openness on the right" fits naturally with constructs like for index in 0 ..< array.count, being concise and avoiding an off-by-one error.

tera · July 24, 2023, 7:24pm

It doesn't feel "unbound" though in the sense that "you keep going and never stop" or "stop when reaching infinity". It's more like "figureOutYourselfWhereToStop". For example if you had this API:

extension CGRect {
	func iterateArea(
	    horizontalRange: Range<Float>,
	    verticalRange: Range<Float>,
	    stepWidth: CGFloat,
	    stepHeight: CGFloat,
	    execute: (Range<Float>) -> Void
	)
}

to iterate a given CGRect with a certain 2D range and use it:

// iterate right half
rect.iterateArea(
    horizontalRange: (rect.left + rect.width/2)...,
    verticalRange: rect.top...,
    stepWidth: 0.01,
    stepHeight: 0.01
)

here you would assume assume that iteration stops at rectangle boundaries rather than at the natural "unbound" value which is infinity. Maybe "unbound" just a bad name.

Sounds plausible.

I can definitely see how we started with that. Then we extended the concept of range to floating numbers - something unrelated to "for in" loop or "indexes" at all. Compared to integer ranges, open v closed variants of which are in all fairness trivially convertible one to another, extending ranges to floats opened us up to something previously unseen: opened vs closed range of floats to allow things like "Float.random(in: -1...1)" vs "Float.random(in: -1..<1)". From there is just one small step to "left open" ranges to get the picture complete, the step we haven't done yet.

dcowuno · July 28, 2023, 3:37pm

At least be helpful and link to the previous literature, please. Assuming everyone is operating in good faith, OP likely searched for this topic and didn't find whatever previous literature you're referring to. I just searched and couldn't find anything along the lines of an authoritative conclusion on the range topic. The only thing close that I found is a comment saying "we can't change range because ABI stability", which all but admits that the status quo sucks but Swift is entrenched so GG. So do you really fault people for continuing to bring up the topic when they encounter a shitty part of the language? Is there no future major release of Swift planned where an ABI break might be allowable to address some more existential concerns that these discussions could be filed under so that progress can be made towards improvement?

xwu · July 28, 2023, 4:53pm

A large portion of the relevant discussions take place during the pitch and proposal review phases for the relevant proposal(s), and this is very much the case here. Yes, that involves actually reading a quite extensive amount of prior writing, but it's only right that the person who is kicking off an unsolicited "critique" of an existing design study it in detail. That is not—and cannot be—my job.

I don't see any "existential" concern raised here. There is absolutely no plan for breaking ABI, and even if that weren't a concern, let's be clear that we are here to evolve an existing, broadly used language and not remake it. As the name suggests, this process is fundamentally an accretive one.

Lantua · July 28, 2023, 9:25pm

Amongst other things: