Shorthand for Offsetting startIndex and endIndex

So sorry, I completely forgot I saw this. I must have been subconsciously influenced. :worried:

Not at all; it's on a public forum for anybody to use. I was just commenting that I was looking at a similar design, but I was a bit worried it wouldn't get accepted because it might be too minimal. But these days I'm less worried about that.

The operators that take an Index and IndexDistance are problematic for non-RandomAccess collections where both of those types are, e.g., Int.

This operator from your demo:

func + <C: Collection>(idx: C.Index, distance: C.IndexDistance) -> IndexExpression<C> {
  return IndexExpression(base: .index(idx), distance: distance)
}

And the nearly-identical operator from Letanyan’s post:

func +<C: Collection>(lhs: C.Index, rhs: Int) -> IndexOffset<C> {
  return IndexOffset(base: .index(lhs), offset: rhs)
}

Will not actually get called in an expression like c[c.startIndex + 4] if c uses Int indexes. That is totally fine if c is RandomAccess such as Array, but if not then it is a problem.
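To make the overload-resolution point concrete, here is a minimal sketch. OffsetIndex is a hypothetical stand-in for IndexExpression / IndexOffset, and the behaviour shown is what I would expect from a current Swift compiler, not something stated in the posts above:

```swift
// Hypothetical wrapper type, standing in for IndexExpression / IndexOffset.
struct OffsetIndex<C: Collection> {
    let base: C.Index
    let distance: Int
}

// Generic operator in the style of the two operators quoted above.
// Note that C can only be inferred from the return type.
func + <C: Collection>(lhs: C.Index, rhs: Int) -> OffsetIndex<C> {
    return OffsetIndex(base: lhs, distance: rhs)
}

let a = [10, 20, 30, 40, 50]

// With no type context, Int's own `+` is chosen, so this is a plain Int.
let plain = a.startIndex + 4
print(type(of: plain))   // Int

// The custom operator is only picked when the result type is spelled out.
let wrapped: OffsetIndex<[Int]> = a.startIndex + 4
print(wrapped.distance)  // 4
```

For Array the plain Int result happens to be a valid index, which is why the problem only shows up for collections whose Int indices are not simple offsets.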

I never suggested removing the offset label in the subscript. I still think it’s important to have it.

Perhaps you could show a larger set of before/after examples that demonstrate the improvement in usability? The improvement is not very clear yet.

Going back to your original motivation, you've ended up replacing this:

let m1 = s[...s.index(s.startIndex, offsetBy: 4)]

with this (including unfortunate but necessary parentheses):

let m2 = s[offset: ...(s.startIndex + 4)]

That example is not obviously a big win, is it?

If there was a practical way of getting something like this, in the general case:

let m3 = s[...s.startIndex.offset(by: 4)]

it starts to look convincing, I think. (cf. advanced(by:))

(Also, your last-posted code yielded compile-time ambiguities in a number of examples I tried.)

IMO we should be thinking in terms of replacing many SubSequence-returning operations with something equally legible. I don't want to keep a.prefix(3), a.dropFirst() and a.dropLast(10) around (at least, not without deprecation) once we have a unified slicing syntax that handles these cases. Most of the proposals I've seen don't seem to result in a readability win over a.prefix(3).

All of these:

let m1 = s[...s.index(s.startIndex, offsetBy: 4)]
let m2 = s[offset: ...(s.startIndex + 4)]
let m3 = s[...s.startIndex.offset(by: 4)]

are just bad ways to write:

let m4 = s.prefix(5)

You can also combine these:

let m5 = s.dropFirst(5).prefix(5) // elements 5...9
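A quick check of the equivalence, as a sketch with a made-up sample string (the names s, m1, m4 and m5 are just local to this snippet):

```swift
let s = "abcdefghijklmnopqrstuvwxyz"

// The index-based spelling and prefix(5) produce the same Substring.
let m1 = s[...s.index(s.startIndex, offsetBy: 4)]
let m4 = s.prefix(5)
print(m1 == m4)  // true

// Chaining the convenience methods selects elements 5...9.
let m5 = s.dropFirst(5).prefix(5)
print(m5)        // fghij
```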

The only functional reason to have a slicing syntax for this is so that we can do mutations:

 s[...s.startIndex.offset(by: 4)].sort() // sort the first 5 elements
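The same thing can be tried today with the long-hand index spelling. This is only a sketch: it uses an Array because sort() needs a mutable random-access collection, and the sample values are made up:

```swift
var a = [9, 3, 7, 1, 8, 2, 6]

// Sorting through a slice subscript mutates the original array in place.
a[...a.index(a.startIndex, offsetBy: 4)].sort()
print(a)  // [1, 3, 7, 8, 9, 2, 6]

// By contrast, this does not compile, because prefix(_:) returns an
// immutable temporary value:
// a.prefix(5).sort()
```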

I'm not saying I have the answer of course: every suggestion I've made in the past has raised vociferous objections from some people. For example,

 s[..<++5].sort()         // sort the first 5 elements
 s[--5...].sort()         // sort the last 5 elements
 let tail = s[..<++1]     // let tail = s.dropFirst()

It also composes; you're not stuck with offsets on both ends (a very minor win):

 s[someIndex..<--42].sort() // sort elements starting at someIndex, 
                            // ending 42 before endIndex

I think the main problem most people have is that it's only legible if you know the notation, though they usually pronounce that concern as something more like “^%@!&%!, Dave!”

That might be a slight overreaction: we're already in "interesting notation" territory with a[i..<j] and I for one am happy to embrace that fully…but I guess others feel differently.

Other possible directions, none of which I find very satisfying:

  • Extend the language to allow prefix et al. to become “lvalue-returning functions” so you could write

    x.prefix(3).sort()
    
  • Add variants of the prefix/suffix/drop methods to support mutation, such as

    x.mutatingPrefix(3) { $0.sort() }
    
  • Give up on mutation and just keep using the prefix/suffix/drop methods
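For what it's worth, the second option can already be approximated in a library. A rough sketch, assuming a hypothetical mutatingPrefix method (it is not an existing standard library API) that clamps the length to the collection:

```swift
extension MutableCollection {
    /// Hypothetical method: hands the first `maxLength` elements to `body`
    /// as a mutable slice, so changes are written back into `self`.
    mutating func mutatingPrefix(_ maxLength: Int,
                                 _ body: (inout SubSequence) -> Void) {
        // Clamp to endIndex when the collection is shorter than maxLength.
        let end = index(startIndex, offsetBy: maxLength, limitedBy: endIndex) ?? endIndex
        body(&self[startIndex..<end])
    }
}

var x = [5, 4, 3, 2, 1]
x.mutatingPrefix(3) { $0.sort() }
print(x)  // [3, 4, 5, 2, 1]
```

It works, but it is exactly the kind of extra API surface the list above is worried about.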

My main complaint about all of these is that they all either maintain the size of, or expand, an API that is IMO already too large and that fails to draw all these similar operations together syntactically.


Yes, after further exploration of this design, I found some unfortunate issues, one of which I think is a deal breaker.

let i = s.startIndex + 4 // doesn't work :(

Do you have any other cases?

I’ve only been casually observing this thread, but one thing that strikes me is that all the options seem to break down by butting heads against the range operator.

It seems to me that what we have here isn’t strictly a Range as it’s defined by the type, but rather a from-to offset relative to the receiver’s start & end index range. A “Range Slice” if you will.

What if this is really a different type, with a different operator? Don’t take this as a real suggestion, but rather as a straw-man for some operator syntax that I haven’t given any thought to:

x[slice: start➡️-4].sort()

I forgot to mention:

s[someIndex++7] = y
s[...someIndex--7].sort()

Just wanted to throw the following into the mix:

var str = "Hello, playground"

extension Collection {

    subscript (offset offset: IndexDistance) -> Element {
        return self[self.index(offset < 0 ? self.endIndex : self.startIndex, offsetBy: offset)]
    }
}

extension Collection where Indices: Collection {

    subscript <MyRange: RangeExpression>(_ startOffset: Indices.IndexDistance, _ rangeInit: (Index, Index) -> MyRange, _ endOffset: Indices.IndexDistance) -> SubSequence where MyRange.Bound == Index {
        let startIndex = self.indices[offset: startOffset]
        let endIndex = self.indices[offset: endOffset]
        return self[rangeInit(startIndex, endIndex).relative(to: self)]
    }
}

struct WonkyRange<Bound> {
    var lowerBound: Bound
    var upperBound: Bound
}

func ..< <Bound>(lowerBound: Bound, upperBound: Bound) -> WonkyRange<Bound> {
    return WonkyRange(lowerBound: lowerBound, upperBound: upperBound)
}

extension Collection where Indices: Collection {

    subscript (offsets offsets: WonkyRange<Indices.IndexDistance>) -> SubSequence {
        let startIndex = self.indices[offset: offsets.lowerBound]
        let endIndex = self.indices[offset: offsets.upperBound]
        return self[startIndex ..< endIndex]
    }
}

str[offset: 4]          // "o"
str[offset: -2]         // "n"
str[4, ..<, -2]         // "o, playgrou"
str[-5, ..., -2]        // "roun"

str[offsets: 4 ..< -2]  // "o, playgrou"

let range = str.indices[offset: 4] ..< str.indices[offset: -2]
str[range]              // "o, playgrou"

I don't think your Comparable conformance is correct, strictly speaking. In the switch, .start() < .end(), but this depends on the length of the collection. So let a = [1, 2, 3]; a.indices[-3] < a.indices[2] should be true.

Of course, but that is unknowable absent a collection and trivial in the context of any particular collection. The point is that it should be possible to express a range that is some offset from the start to some offset from the end, which is a valid thing to want to express. By contrast, 2...(-3) is never a valid range.
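A small sketch of why the ordering is unknowable without a collection. The helper name startAnchorPrecedesEndAnchor and the sample arrays are made up for illustration; the helper just resolves "2 after the start" and "3 before the end" against a concrete collection:

```swift
// Hypothetical helper: compares "2 after the start" with "3 before the end"
// once both offsets are resolved to real indices of `c`.
func startAnchorPrecedesEndAnchor<C: BidirectionalCollection>(_ c: C) -> Bool {
    let fromStart = c.index(c.startIndex, offsetBy: 2)
    let fromEnd = c.index(c.endIndex, offsetBy: -3)
    return fromStart < fromEnd
}

let short = [1, 2, 3]
let long = Array(1...10)

print(startAnchorPrecedesEndAnchor(short))  // false: index 2 vs. index 0
print(startAnchorPrecedesEndAnchor(long))   // true:  index 2 vs. index 7
```

So the same pair of anchored offsets orders differently depending on the collection, which is the tension with a collection-independent Comparable conformance.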

Just wanted to make sure that the issue is raised, because I feel that it is an important enough point to stress.

My intuition is that it's worth taking the time to find a better way about it, seeing as Comparable is a fairly central protocol in the language and has some guarantees that I think all types in the standard library that conform to it should uphold.

I was trying random variations in a playground, but didn't record what gave errors. FWIW the problem was a clash between the immutable and mutable variants of the "offset:" subscript function.

After @dabrahams’s friendly dope slap, insisting on positive unification and attractiveness in the syntax, and rereading the opinions expressed in the entire thread, I tried starting at the syntax end of the problem, and came up with this, that might be a reasonable compromise:

Part A

I think there’s some UX pressure for a very straightforward Int-domain solution. The trouble is, if it gets tangled up with the alternative “anchored” solution, nobody is happy. So I broke this up into two parts. Part A is the pure Int-domain, Part B is the rest of it.

In this part, the idea is just to provide subscripts taking plain ol’ Ints, either a single Int (element access) or an Int range (slice access). For example:

func printS<S> (_ s: S) where S: Sequence {
	for e in s {
		print (e, terminator: " ")
	}
	print ()
}

let c = "abcdefghijklmnopqrstuvwxyz"

print (c [at: 3]) // d
printS (c [at: 5..<10]) // f g h i j 
printS (c [at: 5...10]) // f g h i j k

The word “offset” doesn’t appear in the syntax. This was a deliberate choice, because I think the range semantics are uncomfortable when described as “offset” or “offsets”. Erring in the direction of brevity was also a deliberate choice regarding the "O(???)" issue.

The Part A implementation is trivial in principle:

extension Collection {
	subscript (at offset: Int) -> Self.Element {
		return self [index (startIndex, offsetBy: offset)]
	}
	subscript<R: RangeExpression> (at r: R) -> SubSequence where R.Bound == Int {
		let range = r.relative (to: 0 ..< self.count) // 0 ..< count is a collection whose indices are the offsets
		return self [index (startIndex, offsetBy: range.lowerBound) ..< index (startIndex, offsetBy: range.upperBound)]
	}
}

but the RangeExpression subscript code is a hack in practice, a point I’ll come back to. For now, I’ve also omitted the extensions for mutating collections.
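A self-contained variant of the range subscript that can be pasted into a playground, as a sketch only. It resolves the RangeExpression against the half-open range 0 ..< count, so that partial ranges pick up the right bounds; the sample string and offsets are my own:

```swift
extension Collection {
    subscript<R: RangeExpression>(at r: R) -> SubSequence where R.Bound == Int {
        // 0 ..< count is itself a collection with Int indices, so
        // relative(to:) fills in the missing bound of a partial range.
        let offsets = r.relative(to: 0 ..< count)
        let lower = index(startIndex, offsetBy: offsets.lowerBound)
        let upper = index(startIndex, offsetBy: offsets.upperBound)
        return self[lower ..< upper]
    }
}

let c = "abcdefghijklmnopqrstuvwxyz"
print(c[at: 5 ..< 10])  // fghij
print(c[at: 23...])     // xyz
print(c[at: ..<3])      // abc
```

Note that count is O(n) for a collection like String, so this trades efficiency for brevity.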

Part B1

The larger UX pressure seems to be for a way of offsetting relative to specific indices, especially startIndex and endIndex. It seems impossible to get agreement on special index marker or operator syntax, so I went with syntax that doesn’t require learning anything, except three symbols (.startOffset, .endOffset, .offset) that look like enum cases (but aren’t, for implementation reasons), extending @xwu's suggestion. For example:

	// Anchored near the start and end
	
	print (c [at: .startOffset]) // a
	printS (c [at: .startOffset ..< .endOffset]) // a b c d e f g h i j k l m n o p q r s t u v w x y z 
	printS (c [at: .startOffset + 1 ..< .endOffset - 1]) // b c d e f g h i j k l m n o p q r s t u v w x y 
	
	// Anchored near some index

	let i = c.index (c.startIndex, offsetBy: 3) // getting an index the old way
	print (c [at: .offset (i) + 1]) // e
	printS (c [at: .offset (i) + 1 ..< .endOffset]) // e f g h i j k l m n o p q r s t u v w x y z 

	// Partial ranges

	printS (c [at: (.startOffset + 1)...]) // b c d e f g h i j k l m n o p q r s t u v w x y z 
	printS (c [at: ..<(.endOffset - 1)]) // a b c d e f g h i j k l m n o p q r s t u v w x y 
	printS (c [at: ...(.endOffset - 1)]) // a b c d e f g h i j k l m n o p q r s t u v w x y z 

I believe the prefix/suffix/dropWhatever family of convenience functions can all be implemented in terms of this, so I think it meets @dabrahams’s unification goal, and I think the syntax is as simple as it can get without introducing anything new to recognize. The implementation is a bit more complicated than Part A, to hide the internals:

struct CollectionIndexOffset<C>: Comparable where C: Collection {
	fileprivate enum IndexType {
		case start
		case end
		case offset (C.Index)
	}
	
	fileprivate let indexType: IndexType
	fileprivate let offset: Int
	
	static var startOffset: CollectionIndexOffset {
		return CollectionIndexOffset (indexType: .start, offset: 0)
	}
	static var endOffset: CollectionIndexOffset {
		return CollectionIndexOffset (indexType: .end, offset: 0)
	}
	static func offset (_ index: C.Index) -> CollectionIndexOffset {
		return CollectionIndexOffset (indexType: .offset (index), offset: 0)
	}
	
	static func + (lhs: CollectionIndexOffset, rhs: Int) -> CollectionIndexOffset {
		switch lhs.indexType {
		case .start:
			return CollectionIndexOffset (indexType: .start, offset: lhs.offset + rhs)
		case .end:
			return CollectionIndexOffset (indexType: .end, offset: lhs.offset + rhs)
		case .offset (let index):
			return CollectionIndexOffset (indexType: .offset (index), offset: lhs.offset + rhs)
		}
	}
	
	static func - (lhs: CollectionIndexOffset, rhs: Int) -> CollectionIndexOffset {
		switch lhs.indexType {
		case .start:
			return CollectionIndexOffset (indexType: .start, offset: lhs.offset - rhs)
		case .end:
			return CollectionIndexOffset (indexType: .end, offset: lhs.offset - rhs)
		case .offset (let index):
			return CollectionIndexOffset (indexType: .offset (index), offset: lhs.offset - rhs)
		}
	}
	
	static func < (lhs: CollectionIndexOffset<C>, rhs: CollectionIndexOffset<C>) -> Bool {
		switch (lhs.indexType, rhs.indexType) {
		case (.start, .end), (.start, .offset), (.offset, .end):
			return true
		default:
			return false
		}
	}
	
	static func == (lhs: CollectionIndexOffset<C>, rhs: CollectionIndexOffset<C>) -> Bool {
		switch (lhs.indexType, rhs.indexType) {
		case (.start, .start), (.offset, .offset), (.end, .end):
			return true
		default:
			return false
		}
	}
}

extension Collection {
	private func _indexAtOffset (_ offset: CollectionIndexOffset<Self>) -> Self.Index {
		switch offset.indexType {
		case .start:
			return self.index (self.startIndex, offsetBy: offset.offset)
		case .end:
			return self.index (self.endIndex, offsetBy: offset.offset)
		case .offset (let i):
			return self.index (i, offsetBy: offset.offset)
		}
	}
	
	subscript (at offset: CollectionIndexOffset<Self>) -> Self.Element {
		return self [self._indexAtOffset (offset)]
	}
	
	subscript (at range: Range<CollectionIndexOffset<Self>>) -> SubSequence {
		return self [self._indexAtOffset (range.lowerBound) ..< self._indexAtOffset (range.upperBound)]
	}
	
	subscript (at range: ClosedRange<CollectionIndexOffset<Self>>) -> SubSequence {
		return self [self._indexAtOffset (range.lowerBound) ... self._indexAtOffset (range.upperBound)]
	}
	
	subscript (at range: PartialRangeFrom<CollectionIndexOffset<Self>>) -> SubSequence {
		return self [self._indexAtOffset (range.lowerBound) ..< self.endIndex]
	}
	
	subscript (at range: PartialRangeUpTo<CollectionIndexOffset<Self>>) -> SubSequence {
		return self [self.startIndex ..< self._indexAtOffset (range.upperBound)]
	}
	
	subscript (at range: PartialRangeThrough<CollectionIndexOffset<Self>>) -> SubSequence {
		return self [self.startIndex ... self._indexAtOffset (range.upperBound)]
	}
	
/*	subscript<R: RangeExpression> (range r: R) -> SubSequence where R.Bound == CollectionIndexOffset<Self> {
		let range = r.relative (to: ???)
		return self [self._indexAtOffset (range.lowerBound) ..< self._indexAtOffset (range.upperBound)]
	}*/
}

There’s no RangeExpression version of the subscript in this case, because I couldn’t find a hack to make it work. (Again, more on that later. Again, the mutability variants are omitted.)
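To illustrate the claim that the prefix/drop family can be expressed through anchored offsets, here is a much-reduced sketch. AnchoredOffset, resolve and the from:upTo: subscript are hypothetical simplifications of the code above (start/end anchors only, explicit bounds instead of ranges, and no Comparable conformance), not the actual proposal:

```swift
// Hypothetical, simplified stand-in for CollectionIndexOffset.
struct AnchoredOffset {
    enum Anchor { case start, end }
    let anchor: Anchor
    let distance: Int

    static let startOffset = AnchoredOffset(anchor: .start, distance: 0)
    static let endOffset = AnchoredOffset(anchor: .end, distance: 0)

    static func + (lhs: AnchoredOffset, rhs: Int) -> AnchoredOffset {
        return AnchoredOffset(anchor: lhs.anchor, distance: lhs.distance + rhs)
    }
    static func - (lhs: AnchoredOffset, rhs: Int) -> AnchoredOffset {
        return lhs + (-rhs)
    }
}

extension BidirectionalCollection {
    // Turns an anchored offset into a real index of this collection.
    func resolve(_ o: AnchoredOffset) -> Index {
        switch o.anchor {
        case .start: return index(startIndex, offsetBy: o.distance)
        case .end: return index(endIndex, offsetBy: o.distance)
        }
    }

    // Takes the two bounds separately, so no range type is involved.
    subscript(from lower: AnchoredOffset, upTo upper: AnchoredOffset) -> SubSequence {
        return self[resolve(lower) ..< resolve(upper)]
    }
}

let letters = "abcdefghij"
print(letters[from: .startOffset, upTo: .startOffset + 3])  // abc       (prefix(3))
print(letters[from: .startOffset + 1, upTo: .endOffset])    // bcdefghij (dropFirst())
print(letters[from: .startOffset, upTo: .endOffset - 2])    // abcdefgh  (dropLast(2))
```

Unlike the convenience methods, this sketch does not clamp out-of-range offsets, so it traps where prefix and dropLast would quietly return a shorter slice.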

Part B2

One defect with Part B1 is there’s no way to do this:

	let o = c.offset (i) - 2
	print (c [at: o])
	printS (c [at: o ... o + 3])

The easiest answer is just to add some implementation to make that work:

extension Collection
{
	var startOffset: CollectionIndexOffset<Self> { return CollectionIndexOffset.startOffset }
	var endOffset: CollectionIndexOffset<Self> { return CollectionIndexOffset.endOffset }
	func offset (_ index: Index) -> CollectionIndexOffset<Self> {
		return CollectionIndexOffset (indexType: .offset (index), offset: 0)
	}
}

The Partial Range Problem

Partial ranges with expressions at the ends turn out to be ugly because parentheses are always needed, while no parentheses are needed for the equivalent full ranges:

.startOffset ... .endOffset - 1 // vs.
...(.endOffset - 1)

One solution is to not support partial ranges, but I think the real problem is in the unary range operators themselves, which effectively have the “wrong” precedence for their semantics. Since it’s not obvious to me if this is something that can be fixed, I’ve punted on the problem and written the parentheses.
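The precedence difference can be seen with plain Ints, no offsets involved. A minimal sketch (the values are arbitrary):

```swift
// Binary ... binds more loosely than -, so this is 0 ... 9.
let full = 0 ... 10 - 1

// Prefix ... binds more tightly than any binary operator,
// so the subtraction has to be parenthesized.
let partial = ...(10 - 1)

// Without parentheses the expression is read as (...10) - 1,
// which does not compile:
// let bad = ...10 - 1

print(full.upperBound, partial.upperBound)  // 9 9
```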

The RangeExpression Problem

RangeExpression would be a good way to simplify the Part B implementation, but it doesn’t seem to have the required expressibility. Given a RangeExpression, I can’t see how to get the lower and upper bounds, except .relative(to:), but there is no readily-available collection that this implementation can be relative to. In Part A, I cheated by creating a suitable collection. In Part B, there doesn’t seem to be any way of cheating.

Perhaps someone else can suggest a way of making RangeExpression work here.

Per naming guidelines, at: specifically means that the argument is an index, which is not the case here. The label used for this feature can be several things, but it absolutely, without debate, cannot be at:.


Where do the naming guidelines say that?


This causes conflicts with Part A, though.

struct _CollectionWithIndexOffsetIndices<C: Collection> : Collection {
  typealias Element = Void
  typealias Index = CollectionIndexOffset<C>

  init() {}

  var startIndex: Index {
    return .startOffset
  }

  var endIndex: Index {
    return .endOffset
  }

  subscript(position: Index) -> Iterator.Element {
    fatalError("This should not be called")
  }

  func index(after i: Index) -> Index {
    return i + 1
  }
}

extension RangeExpression {
  func _relativeAsOffset<C>(to c: C) -> Range<C.Index> 
  where Bound == CollectionIndexOffset<C> {
    let offsetRange = _CollectionWithIndexOffsetIndices<C>()
    let relativeOffsetRange = relative(to: offsetRange)
    
    let lb = relativeOffsetRange.lowerBound
    let ub = relativeOffsetRange.upperBound
    
    let start = c._indexAtOffset(lb)
    let end = c._indexAtOffset(ub)
    return start..<end
  }
}

extension Collection {
  subscript<R: RangeExpression>(at offset: R) -> SubSequence
  where R.Bound == CollectionIndexOffset<Self> {
    get {
      return self[offset._relativeAsOffset(to: self)]
    }
  }
}

printS (c [at: .startOffset ..< .endOffset]) // a b c d e f g h i j k l m n o p q r s t u v w x y z 
printS (c [at: .startOffset + 1 ..< .endOffset - 1]) // b c d e f g h i j k l m n o p q r s t u v w x y 
  
// Anchored near some index

let i = c.index (c.startIndex, offsetBy: 3) // getting an index the old way
printS (c [at: .offset (i) + 1 ..< .endOffset]) // e f g h i j k l m n o p q r s t u v w x y z 

// Partial ranges

printS (c [at: (.startOffset + 1)...]) // b c d e f g h i j k l m n o p q r s t u v w x y z 
printS (c [at: ..<(.endOffset - 1)]) // a b c d e f g h i j k l m n o p q r s t u v w x y 
printS (c [at: ...(.endOffset - 1)]) // a b c d e f g h i j k l m n o p q r s t u v w x y z 

This issue was discussed at some length during review of SE-0023. As you know, the guidelines themselves offer three admonitions:

[1] Include all the words needed to avoid ambiguity for a person reading code where the name is used. For example, consider a method that removes the element at a given position within a collection.

// ...
employees.remove(at: x)

If we were to omit the word at from the method signature, it could imply to the reader that the method searches for and removes an element equal to x, rather than using x to indicate the position of the element to remove.

[2] Omit needless words. ... In particular, omit words that merely repeat type information.

// ...
allViews.removeElement(cancelButton)

In this case, the word Element adds nothing salient at the call site. ...

[3] Compensate for weak type information to clarify a parameter’s role. Especially when a parameter type is NSObject, Any, AnyObject, or a fundamental type such Int or String, type information and context at the point of use may not fully convey intent. ...

func add(_ observer: NSObject, for keyPath: String) // ... vague

To restore clarity, precede each weakly typed parameter with a noun describing its role:

func addObserver(_ observer: NSObject, forKeyPath keyPath: String) // ... clear

Given these guidelines, the following question was asked during review of SE-0023:

Many of the ObjC APIs will come across with prepositions in the external labels, such as:

func insert(_ anObject: AnyObject, atIndex index: Int)

... when we look at the various API options in front of us when designing new APIs, do we use "atIndex", "at", or simply stick with the default "index"? This is where I think the guidelines don't help steer us in any direction.

To which, you replied (with emphasis added by me):

That'll come in as

func insert(_ anObject: AnyObject, at index: Int)

... the intent would be to move you toward what I wrote above. ... we could be more explicit about that. I think the admonition to avoid merely repeating type information is the applicable one here, but it really only works to push you away from "index" in the context of an associated Index type, and NSArray doesn't have one.


So to synthesize:

The rules tell us that, ordinarily, repeating type information is discouraged but weak type parameters such as Int do require clarification.

However, the usage at: is preferred over atIndex: even when the type is Int, with Index regarded as "needless" type information instead of clarification. This is illustrated by explicit examples in SE-0005/6, which interpret the guidelines. Such usage was confirmed by you, a principal guideline author, during review of SE-0023 as the intended interpretation of the guidelines, which was merely not made explicit in the guideline text.


The first paragraph above makes sense. I don’t think you need any of the rest of it to support an argument that more semantic information is needed in the label for always-integer offsets, as opposed to indices which often have “stronger” types. In particular, you seemed to be implying that (I had said) “at:” should be reserved for use with indices, which doesn’t sound right to me.