[Pitch] `contains(_:)` for Ranges

sky · December 4, 2022, 5:54am

Hey folks. While working on Advent of Code, I ran into the need for a contains(_:) for ClosedRange. What I found was that I had one available, but because I had imported RegexBuilder in my source file, it was one of the contains(_:) extension methods that were added onto Collection as part of SE-0357. As a result, I was unexpectedly performing O(n) operations on my ranges when O(1) would've been possible.

To correspond to the existing overlaps(_:) method, I'd like to propose a contains(_:) or isSuperset(of:) method to be included into the standard library for ClosedRange, and Range if possible. It seems like a strange gap in the existing API. I saw this was last proposed back in 2017, so maybe this is worth raising once again. Hopefully the community will find this a positive inclusion.

Nevin · December 4, 2022, 3:34pm

It might help to clarify that you are talking about a method with signature

contains(_: Self) -> Bool

or perhaps

contains(_: RangeExpression<Bound>) -> Bool

as indeed there is already

contains(_: Bound) -> Bool

In any case, I think the alternative spelling isSuperset(of:) that you suggest, may be preferable.

xwu · December 4, 2022, 3:45pm

If we were to add one operation, I'd add the whole lot of them and make sure that the names all align with those of operations on Set:

isSuperset(of:)
isStrictSuperset(of:)
isSubset(of:)
isStrictSubset(of:)
isDisjoint(with:)

davedelong · December 4, 2022, 4:00pm

The notion of "containment" is slightly more complex when it comes to ranges, because there are 13 different ways you can describe how ranges are related: Allen's interval algebra - Wikipedia.

scanon · December 4, 2022, 4:10pm

The ones that are independent of ordering map cleanly onto the set algebra notions, however, so it makes sense to treat them as a different thing vs. the ones that use the notion of order.

sky · December 4, 2022, 4:27pm

This is all good feedback. I raised this pitch without much knowledge of the underlying math or computer science, so I appreciate it. I'll make sure to update the OP with the relevant signatures and behaviors with regard to their relations.

The only concern I have with a name other than contains(_:) is that exposure of the new Collection extension in _StringProcessing. This method is public, undocumented, and O(n), all of which can be unexpected for users working with Ranges. I would want any solution we implement to have an API impact in this case. That said, I appreciate the desire for consistency (despite not directly conforming SetAlgebra), and I vaguely recall Swift's API design ethos standing against having multiple methods to express the same thing, like we see in Ruby.

xwu · December 4, 2022, 4:29pm

Sounds like there's a problem with that method being available on range types because it is inappropriate to use; if so, we should fix the problem, whether by deprecating or some other method to warn users, independent of what APIs are added.

As a general principle, I'd argue that a string processing facility exposing unnecessarily inefficient operations on non-string processing types should be considered a bug.

anon9791410 · December 4, 2022, 11:40pm

Sequence.contains(_: some Sequence<Element>) should be in the standard library.
I'm happy that it exists in some form now (only for Collection, like with sadly much of the Algorithms package), but it does need to be handled better.

contains and isSuperset mean the same thing for ranges, because their elements are both unique and ordered, but contains more clearly demonstrates the idea of order. These overloads need to be added to the standard library to fix the problem:

`contains` for all `RangeExpression`s

public protocol RangeProtocol<Bound>: RangeExpression {
  func contains(_ range: ClosedRange<Bound>) -> Bool
  func contains(_ range: PartialRangeFrom<Bound>) -> Bool
  func contains(_ range: PartialRangeThrough<Bound>) -> Bool
}

public extension RangeProtocol where Bound: Strideable, Bound.Stride: SignedInteger {
  func contains(_ range: PartialRangeUpTo<Bound>) -> Bool {
    contains(PartialRangeThrough(range))
  }

  func contains(_ range: Range<Bound>) -> Bool {
    contains(ClosedRange(range))
  }
}

public protocol FiniteRange: RangeProtocol { }

public extension FiniteRange {
  func contains(_ range: PartialRangeFrom<Bound>) -> Bool {
    false
  }

  func contains(_ range: PartialRangeThrough<Bound>) -> Bool {
    false
  }

  func contains(_ range: PartialRangeUpTo<Bound>) -> Bool {
    false
  }
}

// MARK: -

extension ClosedRange: RangeProtocol {
  public func contains(_ range: Self) -> Bool {
    contains(range.lowerBound) && contains(range.upperBound)
  }
}

extension ClosedRange: FiniteRange { }

// MARK: -

extension PartialRangeFrom: RangeProtocol {
  public func contains(_ range: ClosedRange<Bound>) -> Bool {
    contains(range.lowerBound)
  }

  public func contains(_ range: Self) -> Bool {
    contains(range.lowerBound)
  }

  public func contains(_ range: PartialRangeThrough<Bound>) -> Bool {
    false
  }
}

public extension PartialRangeFrom {
  func contains(_ range: PartialRangeUpTo<Bound>) -> Bool {
    false
  }

  func contains(_ range: Range<Bound>) -> Bool {
    contains(range.lowerBound)
  }
}

// MARK: -

extension PartialRangeThrough: RangeProtocol {
  public func contains(_ range: ClosedRange<Bound>) -> Bool {
    contains(range.upperBound)
  }

  public func contains(_ range: PartialRangeFrom<Bound>) -> Bool {
    false
  }

  public func contains(_ range: Self) -> Bool {
    contains(range.upperBound)
  }
}

// MARK: -

extension PartialRangeUpTo: RangeProtocol {
  public func contains(_ range: ClosedRange<Bound>) -> Bool {
    contains(range.upperBound)
  }

  public func contains(_ range: PartialRangeFrom<Bound>) -> Bool {
    false
  }

  public func contains(_ range: PartialRangeThrough<Bound>) -> Bool {
    contains(range.upperBound)
  }
}

public extension PartialRangeUpTo {
  func contains(_ range: PartialRangeUpTo<Bound>) -> Bool {
    range.upperBound <= upperBound
  }

  func contains(_ range: Range<Bound>) -> Bool {
    range.upperBound <= upperBound
  }
}

// MARK: -

extension Range: RangeProtocol {
  public func contains(_ range: ClosedRange<Bound>) -> Bool {
    contains(range.lowerBound) && contains(range.upperBound)
  }
}

public extension Range {
  func contains(_ range: Self) -> Bool {
    range.upperBound <= upperBound
  }
}

extension Range: FiniteRange { }

xwu · December 5, 2022, 12:57am

How would you propose to make this behave correctly and usefully for infinite or single-pass sequences? And if neither are supported, why is it “sad” that it’s only implemented for collections?

anon9791410 · December 5, 2022, 1:20am

I don't think that question makes any sense. There's tons of stuff in the standard library which assumes a Sequence is neither of those. If you want to have those be type-safe argument to contains, you'll need more protocols that derive from Sequence.

Because for everything that's provided to us for Collection, we need to go implement the Sequence version ourselves. It's not respectful of everyone's time, especially when people making the decision to work on Collection are getting paid well for it.

xwu · December 5, 2022, 2:02am

Let me rephrase—what type exists which can conform to Sequence but not Collection for which this contains API implemented on Sequence would behave usefully while relying on only the guarantees of that protocol?

xwu · December 5, 2022, 2:10am

Which standard library APIs vended on Sequence return an incorrect result when invoked on a single-pass or infinite sequence?

bbrk24 · December 5, 2022, 2:26pm

Would "not returning" be considered an incorrect result? I can't think of any that return garbage, but filter, suffix, and the map family have to iterate over the entire sequence before returning.

xwu · December 5, 2022, 4:53pm

I’d argue that it’s the only correct behavior: if you insist on mapping over an infinite sequence or retrieving the last n elements, such an operation must not return.

Nevin · December 5, 2022, 5:28pm

I believe the various Stride types fit the bill. They should be collections, but technical limitations mean they are only sequences.

In theory, with a sufficiently clever implementation, filter on unbounded ranges could sometimes be finite:

(0...).filter{ $0 < 10 }

xwu · December 5, 2022, 7:57pm

It’s good that you brought that up. There’s a lot I could say about these types. But that’s for another time.

Saliently for this conversation, though, these types should be random access collections, but they are not—and most if not all of the technical hold-up comes down to that. (There are not sufficient semantic guarantees made by Strideable such that it is possible to actually implement random access when striding over values of an arbitrary Strideable type.)

I say that this is salient here because, if we were to vend a “plain” Collection conformance to make the existing contains available for stride types, or if we were to make contains available on all Sequence types for that purpose as suggested above, we would be exacerbating rather than solving @sky’s issue which prompted this thread. Namely, there would be yet another O(n) implementation of an algorithm that users will want to reach for that’d be wildly inefficient for the most common strides.

Hence my question: is there an example of a type which can conform to Sequence but not Collection for which this contains API implemented on Sequence would behave usefully—and thus be “respectful” to make available so that users don’t have to write their own?

The stride types don’t fit the bill because, to my mind at least, it is much better to have the user manually call isMultiple(of: 3) to make their own efficient version of contains for an integer stride-by-3 instance, than to vend an O(n) implementation which will repeatedly add 3.

xwu · December 5, 2022, 8:13pm

I’d take the view that it is correct behavior for filter to test the predicate against every element of a sequence, or at least to behave as-if it has done so in all observable respects.

Swift doesn’t require the “purity” of filter predicates, and I think we can agree that (0...).filter { print($0 < 10); return $0 < 10 } should never return.

Now, never returning is itself observable, so even if the compiler were “sufficiently clever,” I’d argue that filter is correct not to return from (0...).filter { $0 < 10 }.

Practically, as you know of course, prefix(while:) would be how one spells a definitely returning version of the above operation now and in the future.

anon9791410 · December 5, 2022, 11:05pm

UnfoldSequence. I think you will take issue, as the documentation says "the sequence may potentially be infinite in length". Typically, as with every other sequence, that's not going to be the case.

Feel free to add more guarantees via the type system—I'll use them!