Unify and expand Range and its likes

Some light background

There currently exists 6 types in the standard library that represent intervals.

5 of them conform to RangeExpression:

  1. Range
  2. ClosedRange
  3. PartialRangeFrom
  4. PartialRangeThrough
  5. PartialRangeUpTo

And a 6th UnboundedRange that is like a mystery.


My beefs with ranges

  1. They do not represent all kinds of intervals. Here are the ones they are able to represent:

    let r1 = 0...1 // r1 is [0, 1]
    let r2 = 0..<1 // r2 is [0, 1)
    let r3 = 0...  // r3 is [0, ∞)
    let r4 = ...1  // r4 is (-∞, 1]
    let r5 = ..<1  // r5 is (-∞, 1)
    let r6 = ...   // r6 is (-∞, ∞), I'm not sure if the assignment works tho
    

    These "range" types cover only 6 kinds of intervals.

    An interval is defined by its endpoints' values, and the openness of its boundaries. An endpoint can be either bounded (non-infinite) or unbounded (infinite), and a boundary either open or closed. Thus, there are either 8 (if an infinite endpoint cannot have a closed boundary) or 16 (if an infinite endpoint can have a closed boundary) kinds of intervals. No matter which ever rule we use, the "range" types are at least 2 short.

  2. You can not easily iterate a RangeExpression-conforming type backwards.

    Since Swift removed C-style for-loops in favour of for-in loops, it has become harder to iterate descendingly than ascendingly. However, sometimes it is convenient and beneficial to do something like this:

    for index in 5..>0 {
        // for-loop block...
    }
    
  3. All 5 existing RangeExpression-conforming types are highly similar in terms of both their use cases and their structures. However, the only thing they share code-wise is the RangeExpression-conformance. Combined with that RangeExpression can only be used as a generic constraint because of its associated type Bound, it leads to lots of boilerplate code when something needs to work with more than 1 "range" type:

    For example, starting from line 943 of Range.swift are these 2 functions (with comments omitted):

    @inlinable
    public func overlaps(_ other: Range<Bound>) -> Bool {
        let isDisjoint = other.upperBound <= self.lowerBound
            || self.upperBound <= other.lowerBound
            || self.isEmpty || other.isEmpty
        return !isDisjoint
    }
    
    @inlinable
    public func overlaps(_ other: ClosedRange<Bound>) -> Bool {
        let isDisjoint = other.upperBound < self.lowerBound
            || self.upperBound <= other.lowerBound
            || self.isEmpty
        return !isDisjoint
    }
    

    Then, starting from line 444 of ClosedRange.swift are 2 more of these functions (again with comments omitted):

    @inlinable
    public func overlaps(_ other: ClosedRange<Bound>) -> Bool {
        let isDisjoint = other.upperBound < self.lowerBound
            || self.upperBound < other.lowerBound
        return !isDisjoint
    }
    
    @inlinable
    public func overlaps(_ other: Range<Bound>) -> Bool {
        return other.overlaps(self)
    }
    

    There will need to be at least 12 more near-identical instance methods to cover all possible overlap tests between "range" types if we ignore polarity (e.g. only closedRange.overlaps(partialRangeFrom), but no partialRangeFrom.overlaps(closedRange)) and UnboundedRange. Then 18 more on top of that if we include UnboundedRange, or 33 more if we both include both polarity and UnboundedRange.

    In addition to all this, it is even harder to represent an interval that can be either bounded or unbounded or open or closed, using just a variable (or constant). You have to work around it with either a conditional logic or a wrapping type.

    For example, starting from line 13 of VersionSetSpecifier.swift in Swift Package Manager is this enum:

    /// An abstract definition for a set of versions.
    public enum VersionSetSpecifier: Hashable {
        /// The universal set.
        case any
        /// The empty set.
        case empty
        /// A non-empty range of version.
        case range(Range<Version>)
        /// The exact version that is required.
        case exact(Version)
        /// A range of disjoint versions (sorted).
        case ranges([Range<Version>])
    }
    

    It needs a custom enum just to work with different kinds of version intervals. If it needs to work with closed intervals, it will have to either add a new case, or convert a ClosedRange instance to a Range instance (and this only works if Bound is Strideable). Also, .ranges only works with Range. If the set ever needs to contain both ClosedRange and Range, then there needs to be another custom type and/or logic.

  4. Range is ambiguous. It at least misled me to think that it's a big umbrella over all "range" types. All other "range" types have descriptive names that explain what kind of ranges they are. Range is probably better called LeftClosedRightOpenRange, or something similar.

  5. UnboundedRange is a mystery. It works in many places where other "range" types work, but there is no indication of how it works. It's a type alias of (UnboundedRange_) -> (), which in turn is a complete mystery and has almost no footprint in the entire codebase. The unbounded range operator ... is declared as a postfix operator but acts kind of like an expression. And it's not callable. The only documentation it has (or at least the only bit I can find) is a TODO: "How does the ... ranges work?"

    UPDATE: As it turns out, UnboundedRange isn't that mysterious after all. See Standard Library thread: How does UnboundedRange work?


My solution

I started a draft pull request that introduces a new generic type Interval:

https://github.com/apple/swift/pull/32865

This new type should be void of all listed problems above.

It's able to represent all kinds of intervals by holding 4 constants:

public let lowerBoundary: Boundary
public let lowerEndpoint: Endpoint
public let upperEndpoint: Endpoint
public let upperBoundary: Boundary

It can be iterated backwards by using a set of inverse interval operators, or by manually setting a flag:

public var isInverse: Bool

An unbounded interval is not much different from other intervals. It just has unbounded endpoints and open boundaries. It's debatable whether an unbounded interval can be closed. Also, because an unbounded interval is not of a special type, it contains the same wrapped type information as the non-unbounded intervals.

Lots of basic things are already implemented in the draft pull request, but there is still a lot to work on. There are a lot of bikesheddable things too. For example, the interval operators are a bit unwieldy, compared to range operators. And the current Interval design doesn't allow all the "range" types to become a type alias, like how CountableRange works.


Overall, I would like to seek the community's opinions and feedbacks, and improve it, before turning it into a proper pitch.

7 Likes

You can iterate over Range or CountableRange backwards by using .reversed(). Range is a bidirectional collection so this generates efficient code, as shown in this compiler explorer demo.

You cannot iterate the other range types because they are infinite in size.

9 Likes

This is true, but comes with drawbacks.

First, it's harder (albeit slightly) to iterate descendingly than ascendingly.

In a C-style for-loop, both ascending and descending iterations have the same complexity:

for(var i = 0; i < 10; i++) {
    // ...
}

for(var i = 9; i >= 0; i--) {
    // ...
}

Whereas with a for-in-loop using a range expression, iterating descendingly is slightly more complex than ascendingly:

for i in 1..<10 {
    // ...
}

for i in (1..<10).reversed() {
    // ...
}

Second, range.reversed() returns a ReversedCollection<Range>, not Range, which means it does not work in every place where Range works.

There are some technical barriers to this change, like source and ABI compatibility, but I’m going to put those aside, because there’s a more important issue: we wouldn’t want this design. We’ve considered similar ones in the past, but we rejected them in favor of the current solution, where decisions about boundary and direction are baked into the type. Let me explain why.

1. The current design enforces the correctness of iteration.

For instance, suppose you try to do this:

let myRange = ..<3
for i in myRange { ... }

This iteration is impossible to perform because it has no starting value—the last value will be 3, but what will the first value be? Fortunately, the compiler knows that it can’t iterate over a PartialRangeTo, so it rejects this code.

But if this was done with your Interval type:

let myInterval = Interval(fromUnboundedTo: 3, .exclusive)
for i in myInterval { ... }

Then there would be no indication in the type system that this iteration is impossible to perform, so the mistake could only be caught by a runtime crash instead of a compile-time diagnostic.

2. The current design makes loops over ranges much faster.

For instance, currently when the compiler sees this:

for i in myRange { ... }

It knows that it can turn that into something like:

var i = myRange.lowerBound
while i < myRange.upperBound {
  ...
  i &+= 1
}

And in general, it actually does—a Swift for i in someRange loop is nearly always as fast as a C for loop.

With your interval type, Swift would have to generate more flexible—and therefore much slower—code to account for all the possible variations in an Interval<Int>:

var i: Int
let endpoint: Interval<Int>.Endpoint
let startBoundary, endBoundary: Interval<Int>.Boundary
let step: Int
if myInterval.isInverse {
  step = -1
  startBoundary = myInterval.upperBoundary
  endBoundary = myInterval.lowerBoundary
  endpoint = myInterval.lowerEndpoint
  switch myInterval.upperEndpoint {
  case .bounded(let bound):
    i = bound
  case .unbounded:
    fatalError(“Can’t iterate from an unbounded endpoint”)
  }
}
else {
  step = 1
  startBoundary = myInterval.lowerBoundary
  endBoundary = myInterval.upperBoundary
  endpoint = myInterval.upperEndpoint
  switch myInterval.lowerEndpoint {
  case .bounded(let bound):
    i = bound
  case .unbounded:
    fatalError(“Can’t iterate from an unbounded endpoint”)
  }
}

if startBoundary == .open {
  i += step
}

while ({ () -> Bool in
  guard case .bounded(let end) = endpoint else { return true }
  switch (myInterval.isInverse, endBoundary) {
  // I actually may have flipped some of the tests here, but you know what I mean.
  case (true, .open):
    return i > end
  case (true, .closed):
    return i >= end
  case (false, .open):
    return i < end
  case (false, .closed):
    return i <= end
  }
}()) {
  ...
  i += step
}

(Note the fatalError calls—those are point 1.)

Even if you provide methods to perform these operations, they will still end up executing the same amount of code—just better encapsulated. The only way to actually reduce the amount of code executed when iterating over an Interval would be by constant folding (the optimizer determining, for instance, that isInterval will always be false for a particular loop and removing code that only runs when it’s true), but unlike information embedded in the individual types like Range, information from constant folding can’t cross function call boundaries unless the loop is inlined into the caller. So loops over Intervals would frequently be much slower than loops over Ranges.

It’s tempting to make a maximally configurable replacement for the Range types—I’ve had the same idea in the past. But these types have the design they have for specific, compelling reasons.

28 Likes

Thanks for the detailed feedback.

These are definitely 2 points that I neglected when designing Interval, and it seems like it's hard to improve or get around them.

^ I do have a question for (or perhaps an argument against) the first point, though. I think the compiler proves the correctness of iteration by checking if the type conforms to Sequence? Left-unbounded ranges fail at compile time if you try to iterate over it, because they don't conform to Sequence. But I think it becomes problematic when it comes to PartialRangeFrom. PartialRangeFrom conforms to Sequence, and so does ReversedCollection<PartialRangeFrom>. As a result, you can have something like this:

for i in (1...).reversed() {
	print(i)
}

It compiles, and runs non-stop. Instead of throwing an error, it just keeps running and never enters the for-loop body. Is this behaviour fixable with the current design? And if it's not, would the current design still be better than Interval in terms of point 1?

Here is the assembly code:

It's quite long, and I don't know much of x86 instruction set to understand it.

1 Like

PartialRangeFrom is not a BidirectionalCollection, so calling reversed() on it does not produce a ReversedCollection.

Instead it produces an Array.

The program hangs while attempting to materialize that array. This is exactly the same behavior you would see when converting any other very long (or endless) sequence into an array.

• • •

To “fix” the behavior, we could introduce a deprecated overload:

extension PartialRangeFrom where Bound: Strideable, Bound.Stride: SignedInteger {
  @available(*, deprecated, message: "Cannot reverse a PartialRangeFrom")
  func reversed() -> [Bound] { fatalError() }
}

That way attempting to call reversed on a PartialRangeFrom would give a compile-time warning and trap at runtime, instead of hanging.

The hang would still happen if reversed was called on such a value in a generic context, but such is life.

• • •

Interestingly, this code traps at runtime for me (Xcode 11.3.1, Swift 5.1.3):

let a = (1 as Int8)...
let b = a.reversed()  // Execution was interrupted, reason: EXC_BAD_INSTRUCTION 
print(b.count)

And indeed so does this:

let a = Array((1 as Int8)...)

Or even this:

let a = ((1 as Int8)...).max()

The implication is clearly that partial ranges in fact model truly unbounded, one-sided intervals, not restricted to the set of representable values for the element type.

8 Likes
0...10
0<..10
0..<10
0<.<10

...10
..<10

0...
0<..

...

I'm not sure which 8 or 16 you meant, but these are the nine we should have. There have been discussions here before about the lack of ranges with open lower bounds, but no real reason was provided for their absence as far as I remember.

Your count of 9 is correct. My math must be off.

I got the 16 from (2²)², because there are 2 endpoints, and each can be either bounded or unbounded, and open or closed. But that includes things like [0, ∞], which need to be removed from the count (unless we allow closed unbounded intervals). And that must be where my calculation went wrong. I probably counted [-∞, ∞] twice.

I don't know if it's much of a reason, but as far as the operators grammar goes, <.. and <.< are invalid.

You can also define custom operators that begin with a dot ( . ). These operators can contain additional dots. For example, .+. is treated as a single operator. If an operator doesn’t begin with a dot, it can’t contain a dot elsewhere. For example, +.+ is treated as the + operator followed by the .+ operator.

Also because of this rule, I had to make up some quite awkward opertors that uses "~" in place of ".":

Ah I didn't realise operators had to start with a . if they wanted to contain one. Let's hope that's not the reason we don't have <.. though!

By the way, your operator >=~~~ should probably be written something like ~~~~~ since it would correspond to ..., no? In normal swift . signifies a range that is closed on that side.

I've considered something similar. In the TODO comment right above operator declarations, I left some notes on using bullet points instead of dots to circumvent the grammar, and shorten the operator to 3-character long. The problem is operators like ..., ~~~, and ••• lack the directionality that <=~<= and >=~>= have.

1 and 2 seem like arguments not to unify the partial (unbounded) and bounded families. But I'd like to take a weaker position; why not unify ClosedRange and Range? Or, an even weaker position: unifying ClosedRange and Range expressions.

In the spirit of "my beefs with ranges", let me lay out the drawbacks of ClosedRange/Range:

  1. Range is ambiguous. Following other Swift type families like Int32/Int64/Int, I might expect types like OpenRange, ClosedRange and Range (which is preferred). I understand we don't need a machine-dependent semantics like we do for Int, and we want to indicate which type is preferred by giving it a short name, but the inability to clearly indicate off-by-1 bugs in a code review is a real bother.
  2. Conversions are hard. Consider this OS function:
extension MTLComputeCommandEncoder {
    public func setBuffers(_ buffers: [MTLBuffer?], offsets: [Int], range: Range<Int>)
}

I would argue that it is equally sensible for a client to use either syntax:

instance.setBuffers(buffers, offsets: offsets, range: 0..<1)
instance.setBuffers(buffers, offsets: offsets, range: 0...0)

However, only one syntax is allowed here. Someone could write their function <B: RangeExpression> but in practice they didn't, everyone just uses Range. If range expressions were convertible or we had an ExpressibleByRangeLiteral type conformance it would go a long way.


Although this is more an argument in favor of a broader unification, I understand in theory why we want a expressive and therefore safer type system, but in practice I don't think we live in that world. We have a preferred type, everyone uses Range and lives dangerously. Presenting again the OS function:

extension MTLComputeCommandEncoder {
    public func setBuffers(_ buffers: [MTLBuffer?], offsets: [Int], range: Range<Int>)
}

I can call this with

encoder.setBuffers([],offset:[],range:0..<25)

and not only will it compile, it actually UBs with an out-of-bounds read inside Metal. (For those unfamiliar with Metal, the function is supposed to assign the elements of the first parameter to an internal array with the indices in the last parameter, but here the sizes mismatch.)

Arguably this function should take a PartialRangeFrom and infer the final index from the array count rather than having 2 parameters that may disagree. In doing so it would be more obvious when implementing this function that you can't iterate over the range parameter without first converting to a complete/iterable type by supplying an end index from the first parameter. I believe this is the bug @beccadax envisions as being prevented by an expressive typesystem.

I believe there are ObjC bridging reasons to use Range here. It's unfair of me to present some ObjC bug as not being fixed by Swift, but it illustrates how these types are actually used in the wild. When there's better tooling for preferred types, it tilts the playing field in favor of their use, away from what might be the "natural" type we might prefer.

Separately from bridging, there is also a potential callside argument for preferring "the unnatural type" here. That is, if Metal takes an unbounded range and infers the end index from the array's count, it's easier to implement the function correctly, but calling it becomes harder:

encoder.setBuffers(someBuffers,offset:someBuffers.map{0},range:0...)
encoder.setBuffers(moreBuffers,offset:moreBuffers.map{0},range:<#?#>) //what goes here again?

vs by requiring redundant information, we have to remember to check it's the same inside the function, but callers can more easily reason about the arguments:

encoder.setBuffers(someBuffers,offset:someBuffers.map{0},range:0..<5)
encoder.setBuffers(moreBuffers,offset:moreBuffers.map{0},range:5..<10) //copy from line above
encoder.setBuffers(evenMoreBuffers,offset:moreBuffers.map{0},range:10..<17)

I can see arguments for either style. In any case, I think most uses of Range does not truly consider which type really ought to be used. In cases where it is considered, I think the considerations may be at some variance with what choice might produce the best local typesafety.


I do think a lot of this can be fixed without compromising type expressivity or deeply overhauling the stdlib. For example,

  • we could support implicit conversions between (some) ranges (or perhaps range expressions) which would "unify" the syntax callside, without compromising the codegen or safety
  • Provide a more thorough ObjC bridge for the non-preferred ranges
  • typealias OpenRange = Range

I think the Metal example shows a plausible motivation for unifying bounded and unbounded ranges (or expressions) as well, but this is perhaps a bit more specialized, and may be better handled by generics or overloads than new language surface area.

1 Like