Shorthand for Offsetting startIndex and endIndex

QuinceyMorris · February 12, 2018, 8:13am

It's a unification in the sense that all other slicing (assuming I haven't missed something) is expressible in terms of this single syntax. In actual use, a convenience method like prefix is often preferable. Many of the examples I gave are not the best way to do something; they're more about checking that everything works.

That would be c.dropFirst(5) though! I mention it because people have made the wrong convenience method conversion a couple of times in this thread, and there's probably important usability information in that fact (if we can figure out what).

Yes, I can see that as one way of proceeding. The .at static method could just be eliminated from the proposal without harming anything else, and the extra parentheses are ugly. But I suspect that having the more complete and consistent anchored syntax would end up being wanted.

BTW, I tried using unary .. instead of .at for this. It's cleaner in one way, but that's an awful lot of dots in a row. I also tried various other symbol combinations for both unary and binary operators, but anything non-standard just looks incomprehensible, IMO.

In fact, I went that route first, but it turns out to be really irritating not to have both the open and closed range operators, especially for a collection that is not bidirectional. It also means there is a new operator to learn, so in the end I moved away from that approach.

QuinceyMorris · February 12, 2018, 7:12pm

Incidentally, your IndexOffsetRange as written doesn't actually work, because there's no way to specify the end of the range.

0..0 means "start to start" (an empty range), and 0..-1 means "start to end-1" (i.e. drop last element). You would have to use -1 for endIndex and that introduces even more off-by-one danger than was discussed earlier.

For a while I played with a binary .. operator for insetting, not offsetting, with negative values meaning "the other anchor":

0..0 means "start to end (all)"
1..1 means "start+1 to end-1 (drop first and last)"
0..-1 means "start to start+1 (first element)"
-1..0 means "end-1 to end (last element)"
-20..-20 means "end-20 to start+20"

It's really simple and nice, but in actual use it seemed just too hard to figure out/remember that the minus sign swapped the anchor, not the inset.
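To make the inset reading concrete, here is a minimal sketch written as a plain function rather than an operator. The name `insetSlice` is hypothetical (it is not proposed API), and it is constrained to `BidirectionalCollection` because end-anchored values step backwards from `endIndex`:

```swift
/// Hypothetical helper illustrating the "inset" reading of `lower..upper`:
/// - lower >= 0: inset from the start; lower < 0: count back from the end.
/// - upper >= 0: inset from the end;   upper < 0: count forward from the start.
/// Traps, like any Range, if the resolved start lies after the resolved end.
func insetSlice<C: BidirectionalCollection>(_ c: C, _ lower: Int, _ upper: Int) -> C.SubSequence {
    let start = lower >= 0
        ? c.index(c.startIndex, offsetBy: lower)   // start + lower
        : c.index(c.endIndex, offsetBy: lower)     // end - |lower|
    let end = upper >= 0
        ? c.index(c.endIndex, offsetBy: -upper)    // end - upper
        : c.index(c.startIndex, offsetBy: -upper)  // start + |upper|
    return c[start..<end]
}

let a = [0, 1, 2, 3, 4, 5]
print(Array(insetSlice(a, 0, 0)))   // [0, 1, 2, 3, 4, 5]  (all)
print(Array(insetSlice(a, 1, 1)))   // [1, 2, 3, 4]        (drop first and last)
print(Array(insetSlice(a, 0, -1)))  // [0]                 (first element)
print(Array(insetSlice(a, -1, 0)))  // [5]                 (last element)
```

Written out this way, the sign selects which anchor a value is measured from, not the direction of the inset, which is the part that proved hard to remember.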

Letan · February 12, 2018, 7:49pm

I used nil to represent this internally. I didn't think it would be suitable to allow nil for the operators. And a 'partial range' would better serve this intent.

QuinceyMorris · February 12, 2018, 9:45pm

That's true if the ends of the range are constants, but it's awkward if they're expressions calculated at run time.

palimondo · February 12, 2018, 11:50pm

+💯

Given that this topic has come up before, it may help to identify the hotspots of contention and confusion:

Collection’s slicing overloads of the Sequence methods (prefix, suffix, dropFirst, dropLast): a general unification of these that adopts subscripting as shorthand syntax
Concerns over the performance characteristics of subscript arithmetic for indices not backed by integers, given the performance guarantees of the Collection protocol family
Mutating a subscripted slice, which @dabrahams brought up

I have re-read this thread, and it is not clear to me what the reasons would be against adopting something like @Letan's .. solution for arithmetic relative to the start and end indices. I would perhaps rename it to SliceRange and propose that we borrow Python's terminology for these operations on Sequence and Collection, calling it slicing.

I believe this is closest to what @ben-cohen and @dabrahams described as future directions in Strings in Swift 4:

Slicing a Sequence

SliceRange describes Sequence operations that return a subsequence relative to the specified bounds. It is formed from integer bounds using the .. operator. Non-negative bounds are relative to the start of the sequence; negative bounds are relative to the end.

Half-Bounded `SliceRange`

| `Sequence` Slice | Condition | Equivalent `Sequence` Operations |
|---|---|---|
| `s[i..]` | `0 <= i` | `s.dropFirst(i)` |
| `s[i..]` | `i < 0` | `s.suffix(abs(i))` |
| `s[..i]` | `0 <= i` | `s.prefix(i)` |
| `s[..i]` | `i < 0` | `s.dropLast(abs(i))` |
Edit: fixed, thanks to @QuinceyMorris

Bounded `SliceRange` : `s[i..j]`

| Conditions | `0 <= j` | `j < 0` |
|---|---|---|
| `0 <= i` | `s.prefix(j).dropFirst(i)` | `s.dropFirst(i).dropLast(abs(j))` |
| `i < 0` | unsupported | `s.suffix(abs(i)).dropLast(abs(j))` |
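Because every row of both tables is defined in terms of existing `Sequence` methods, the mapping can be checked with the standard library alone, before any new operator exists. A quick check on a six-element sequence; the `SliceRange` spellings appear only in comments:

```swift
// Each line spells out one table row using standard-library calls only.
let s = sequence(first: 0) { $0 < 5 ? $0 + 1 : nil }  // 0, 1, 2, 3, 4, 5

// Half-bounded rows
print(Array(s.dropFirst(2)))              // s[2..]    -> [2, 3, 4, 5]
print(Array(s.suffix(2)))                 // s[-2..]   -> [4, 5]
print(Array(s.prefix(2)))                 // s[..2]    -> [0, 1]
print(Array(s.dropLast(2)))               // s[..-2]   -> [0, 1, 2, 3]

// Bounded rows
print(Array(s.prefix(4).dropFirst(2)))    // s[2..4]   -> [2, 3]
print(Array(s.dropFirst(2).dropLast(1)))  // s[2..-1]  -> [2, 3, 4]
print(Array(s.suffix(3).dropLast(1)))     // s[-3..-1] -> [3, 4]
```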

The SliceRange combination with a lowerBound relative to the end of the sequence and an upperBound relative to its start is not supported for Sequence, because it cannot be expressed in terms of successive sequence operations without knowing the sequence length. This limitation could potentially be lifted if I weren’t aiming to express the slices in terms of sequence operations: the suffix implementation already accumulates elements in a buffer, so also keeping a count of elements from the beginning, in order to trim the result, is doable.
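As a rough illustration of that last point, the unsupported case (lower bound from the end, upper bound from the start) can be computed in one pass by keeping only the last `k` elements together with their positions from the start. The function `sliceFromEnd(_:last:upTo:)` is hypothetical and takes both offsets as magnitudes:

```swift
/// Hypothetical helper for the unsupported `s[i..j]` case with i < 0 <= j.
/// Pass k = abs(i). Returns the elements whose position p (counted from the
/// start) satisfies count - k <= p < j, without knowing `count` up front.
func sliceFromEnd<S: Sequence>(_ s: S, last k: Int, upTo j: Int) -> [S.Element] {
    precondition(k >= 0 && j >= 0, "pass magnitudes: k = abs(i), j >= 0")
    return s.enumerated()          // pair each element with its position
        .suffix(k)                 // keep only the last k pairs
        .filter { $0.offset < j }  // trim against the start-relative bound
        .map { $0.element }
}
```

For a six-element sequence `0...5`, `sliceFromEnd(seq, last: 5, upTo: 3)` yields `[1, 2]`, which is what `seq[-5..3]` would mean.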

Here’s a fully working implementation sketch I’ve put together in a Playground:

infix operator .. : RangeFormationPrecedence
prefix operator ..
postfix operator ..
prefix operator ..-
infix operator ..- : RangeFormationPrecedence

public struct SliceRange {
    var lowerBound: Int
    var upperBound: Int?
}

func ..(lhs: Int, rhs: Int) -> SliceRange {
    return SliceRange(lowerBound: lhs, upperBound: rhs)
}

prefix func ..(upperBound: Int) -> SliceRange {
    return SliceRange(lowerBound: 0, upperBound: upperBound)
}

postfix func ..(lowerBound: Int) -> SliceRange {
    return SliceRange(lowerBound: lowerBound, upperBound: nil)
}

// Negating `SliceRange` to allow slices with lower bound
// relative to the end of `Sequence`
public prefix func -(slice: SliceRange) -> SliceRange {
    return SliceRange(lowerBound: -slice.lowerBound,
                      upperBound: slice.upperBound)
}

// Resolves unary operator juxtaposition of `..` and `-` for
// `SliceRange` with upper bound from the end of `Sequence`
public prefix func ..-(upperBound: Int) -> SliceRange {
    return ..(-upperBound)
}

public func ..-(lowerBound: Int, upperBound: Int) -> SliceRange {
    return lowerBound..(-upperBound)
}


extension Sequence {
    subscript(slice: SliceRange) -> SubSequence {
        switch (slice.lowerBound, slice.upperBound) {
        case (0, .some(let upTo)) where 0 <= upTo:
            return self.prefix(upTo)
        case (0, .some(let n)) where n < 0:
            return self.dropLast(-n)
        case (let n, nil) where 0 <= n:
            return self.dropFirst(n)
        case (let length, nil) where length < 0:
            return self.suffix(-length)
        case (let n, .some(let upTo)) where 0 <= n && 0 <= upTo:
//            return self.prefix(upTo).dropFirst(n)
// Value of type 'Self.SubSequence' has no member 'dropFirst'
            return (self.prefix(upTo) as! AnySequence<Element>)
                .dropFirst(n) as! Self.SubSequence
        case (let n, .some(let upTo)) where 0 <= n && upTo < 0:
//            return self.dropFirst(n).dropLast(-upTo)
            return (self.dropFirst(n) as! AnySequence<Element>)
                .dropLast(-upTo) as! Self.SubSequence
        case (let length, .some(let upTo)) where length < 0 && 0 <= upTo:
            fatalError("Unsupported SliceRange combination for Sequence: lowerBound from end of the sequence and upperBound from start of sequence.")
        case (let n, .some(let upTo)) where n < 0 && upTo < 0:
// XXX Why is the forced cast suggested by compiler necessary???
            return self.suffix(-n).dropLast(-upTo) as! Self.SubSequence
        default:
            fatalError("Unexpected combination: " + String(describing: slice))
        }
    }
}


// Tests

let seq = sequence(first: 0) { $0 < 6 - 1 ? $0 &+ 1 : nil }

print(Array(  seq             )) // [0, 1, 2, 3, 4, 5]
print(Array(  seq[2..]        )) // [2, 3, 4, 5]
print(Array(  seq[..2]        )) // [0, 1]
print(Array(  seq[(-2)..]     )) // [4, 5]
print(Array(  seq[-2..]       )) // [4, 5]
print(Array(  seq[..(-2)]     )) // [0, 1, 2, 3]
print(Array(  seq[..-2]       )) // [0, 1, 2, 3]
print(Array(  seq[2..4]       )) // [2, 3]
print(Array(  seq[2..(-1)]    )) // [2, 3, 4]
print(Array(  seq[2..-1]      )) // [2, 3, 4]
//print(Array(  seq[-5..3]      )) // unsupported combination
print(Array(  seq[-3..(-1)]   )) // [3, 4]
print(Array(  seq[-3..-1]     )) // [3, 4]
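The sketch above targets the Swift of early 2018, when `Sequence` still had a `SubSequence` associated type. Its `as! AnySequence<Element>` casts succeed for the `UnfoldSequence` in the tests, but would likely trap for a sequence whose `SubSequence` is something else, such as an `Array` (`ArraySlice`). Since SE-0234 removed `Sequence.SubSequence` in Swift 5, `prefix` and `dropFirst` return wrapper types while `suffix` and `dropLast` return arrays. One hedged way to port it is to return `[Element]` in every case; this is an adaptation, not the code posted above:

```swift
// Adapted sketch for Swift 5+ (assumption: no Sequence.SubSequence, so every
// case materializes an Array instead of returning a lazy slice).
infix operator .. : RangeFormationPrecedence
prefix operator ..
postfix operator ..
prefix operator ..-
infix operator ..- : RangeFormationPrecedence

struct SliceRange {
    var lowerBound: Int   // >= 0: from the start, < 0: from the end
    var upperBound: Int?  // nil: up to the end
}

func .. (lhs: Int, rhs: Int) -> SliceRange {
    SliceRange(lowerBound: lhs, upperBound: rhs)
}
prefix func .. (upperBound: Int) -> SliceRange {
    SliceRange(lowerBound: 0, upperBound: upperBound)
}
postfix func .. (lowerBound: Int) -> SliceRange {
    SliceRange(lowerBound: lowerBound, upperBound: nil)
}
// Postfix `..` binds tighter than prefix `-`, so `-2..` is `-(2..)`.
prefix func - (slice: SliceRange) -> SliceRange {
    SliceRange(lowerBound: -slice.lowerBound, upperBound: slice.upperBound)
}
// `..-2` and `2..-1` lex as a single `..-` operator.
prefix func ..- (upperBound: Int) -> SliceRange {
    SliceRange(lowerBound: 0, upperBound: -upperBound)
}
func ..- (lowerBound: Int, upperBound: Int) -> SliceRange {
    SliceRange(lowerBound: lowerBound, upperBound: -upperBound)
}

extension Sequence {
    subscript(slice: SliceRange) -> [Element] {
        let lower = slice.lowerBound
        switch slice.upperBound {
        case nil:
            return lower >= 0 ? Array(dropFirst(lower)) : suffix(-lower)
        case let upper? where upper >= 0:
            precondition(lower >= 0,
                "Unsupported: lowerBound from the end with upperBound from the start")
            return Array(prefix(upper).dropFirst(lower))
        case let upper?:
            return lower >= 0
                ? dropFirst(lower).dropLast(-upper)
                : Array(suffix(-lower).dropLast(-upper))
        }
    }
}
```

With this version, `seq[2..-1]` yields `[2, 3, 4]` as before, and the subscript also works on an `Array` without any casts.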

Slicing a Collection

I don’t have an up-to-date master build on hand, but I don’t see any fundamental obstacles to using @Letan’s code:

extension SliceRange: RangeExpression {
    public func relative<C: Collection>(to collection: C) -> Range<Bound>
        where C.Index == Bound {
            let startBase = lowerBound < 0
                ? collection.endIndex
                : collection.startIndex
            let endBase = upperBound == nil || upperBound! < 0
                ? collection.endIndex
                : collection.startIndex
            
            let start = collection.index(startBase, offsetBy: lowerBound)
            let end = collection.index(endBase, offsetBy: upperBound ?? 0)
            
            return start..<end
    }
}
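As written, that conformance is unlikely to compile: `RangeExpression` needs a `Bound` type equal to the collection's `Index` (integer offsets are not `String.Index`) as well as a `contains(_:)` method, and negative offsets need `BidirectionalCollection`. A hedged alternative that avoids `RangeExpression` altogether is a dedicated subscript that resolves the offsets itself. The `SliceRange` shape is repeated so the block stands alone, and the subscript is hypothetical:

```swift
// Same shape as the SliceRange sketch; repeated so this block is self-contained.
struct SliceRange {
    var lowerBound: Int   // >= 0: from startIndex, < 0: from endIndex
    var upperBound: Int?  // nil: the unmodified endIndex
}

extension BidirectionalCollection {
    /// Hypothetical: resolves start- or end-relative offsets to real indices,
    /// then slices. Costs O(|offset|) steps unless the collection is RandomAccess.
    subscript(slice: SliceRange) -> SubSequence {
        let start = slice.lowerBound >= 0
            ? index(startIndex, offsetBy: slice.lowerBound)
            : index(endIndex, offsetBy: slice.lowerBound)
        let end: Index
        switch slice.upperBound {
        case nil:
            end = endIndex
        case let upper? where upper >= 0:
            end = index(startIndex, offsetBy: upper)
        case let upper?:
            end = index(endIndex, offsetBy: upper)
        }
        return self[start..<end]  // traps if start > end, like any Range
    }
}
```

Because indices are resolved against the collection, the lower-from-end and upper-from-start combination that Sequence cannot support works here; for example, on `"Hello, world"` the slice with lower bound `-5` and upper bound `10` is `"wor"`.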

To address the performance concerns, we should clearly document that SliceRange conversion to Range (via RangeExpression protocol conformance) performs computation whose performance guarantees are given by the underlying collection.

I’m not sure if the above slicing implementation for Collections automatically works for mutable range modifications as @dabrahams mentions above… Does it?

xwu · February 13, 2018, 12:01am

No, prefix(upTo:) takes an Index, not an offset. The distinction is already confusing enough when Index == Int: the spelling should not make them more confusable.

Adding new operators works against this purpose, especially when they're only one character apart from existing operators: x[2..] and x[2...] would be entirely different for a slice, and that's not acceptable.
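The index-versus-offset confusion is easiest to see on a slice, where `Index == Int` but the indices do not start at zero:

```swift
let numbers = [10, 20, 30, 40, 50]
let tail = numbers.dropFirst(2)     // ArraySlice [30, 40, 50], indices 2..<5

print(Array(tail.prefix(upTo: 3)))  // [30]          (up to *index* 3)
print(Array(tail.prefix(3)))        // [30, 40, 50]  (first 3 *elements*)
```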

QuinceyMorris · February 13, 2018, 12:23am

Negative offsets don't work at all for Collection, because it has no backwards stepping of indices. Collection doesn't even have a dropLast method. Anything to do with last or negative offsets requires BidirectionalCollection.

Then, again, once you get to the offset's sign indicating an anchor (start or end), there is no way of representing an un-offset end, absent a -0 different from 0. I don't see the viability of any solution that prevents referring to the end of the collection (in the sense of unmodified endIndex).
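One way around the missing `-0` is to make the anchor explicit instead of encoding it in the sign. The names below (`AnchoredOffset`, `resolve(_:)`, and the `from:to:` subscript) are hypothetical, and the extension is on `BidirectionalCollection` because end-relative offsets step backwards:

```swift
/// Hypothetical: an offset that names its anchor explicitly, so "end + 0"
/// (the unmodified endIndex) is distinct from "start + 0".
enum AnchoredOffset {
    case fromStart(Int)
    case fromEnd(Int)  // typically <= 0
}

extension BidirectionalCollection {
    /// Converts an anchored offset to a real index (O(|n|) unless RandomAccess).
    func resolve(_ offset: AnchoredOffset) -> Index {
        switch offset {
        case .fromStart(let n): return index(startIndex, offsetBy: n)
        case .fromEnd(let n):   return index(endIndex, offsetBy: n)
        }
    }

    subscript(from lower: AnchoredOffset, to upper: AnchoredOffset) -> SubSequence {
        return self[resolve(lower)..<resolve(upper)]
    }
}
```

The cost of this design is verbosity: every bound carries its anchor, which is exactly the information the sign-based spellings try to compress.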

palimondo · February 13, 2018, 12:23am

Please don’t immediately focus on crusading against adding operators; think instead about lifting the computation of index-relative offsets into its own SliceRange type. Admittedly, the .. operator is crucial for ease of use. I think SliceRange could be a fundamental and prominent type, with the potential to simplify a ton of our APIs.

The core issue is that Range, as is, cannot be reused for relative index computation. It is already taken for slicing with fully formed indices, and it enforces lowerBound <= upperBound, which makes it unusable for a Python-style s[2:-3] case. I feel that relative indexing is more common, and currently we have to jump through too many hoops to compute the indices manually. Hence this thread.

Lifting the whole conversion between edge-relative and fully formed indices into one type gives us a place to document the behavior fully. Ranges have three-character operators; SliceRange has a two-character operator. IMHO, that's not that confusing.

palimondo · February 13, 2018, 12:35am

I’m not sure what you mean… Sequences don’t even have ends, yet we have defined methods like suffix and dropLast on them. By that standard our whole Sequence protocol is broken… But all Collections are also Sequences and all these methods are available on them.

I guess you are thinking about performance guarantees? As @Karl said before, it is the type of your collection that gives you the performance guarantees, but it is the Sequence and Collection protocols that define the functional capability. We have fallback implementations for everything, so you will always get your slice back correctly. It just might take longer.

xwu · February 13, 2018, 12:42am

That isn’t true. Range can be used just fine as long as the bounds aren’t naked integers but are types that express the desired semantics, as I demonstrated above.

The difference is that you want to create a different Range type in order to use operators to distinguish indices from offsets; that is unacceptable to me for the reasons above, namely that it decreases rather than increases clarity.

QuinceyMorris · February 13, 2018, 2:03am

Yep, sorry, I was looking at the official documentation for Collection (Collection | Apple Developer Documentation) which doesn't document all of the methods. (I guess it only documents methods for which Collection provides alternative, faster implementations.)

I'm a bit confused about your range semantics, though. The 2's in s[2..] and s[2..4] don't mean the same thing ("take the last 2" vs. "drop the first 2", according to my reading of your definitions). However that's resolved, it highlights the difficulty for an average person to grasp what undecorated numbers mean.

Based on existing comments, I don't see the community buying into something that isn't immediately and blindingly obvious (syntax), and doesn't need mental decoding (semantics).

palimondo · February 13, 2018, 3:10am

I believe I have captured same semantics as Python’s slicing. Given the 2s are both positive and are sitting at the lowerBounds, they do mean exactly the same thing. ~~You are probably getting lost in the inversed order of operations necessary to first get the end of the sequence (4) relative from beginning.~~

~~This demonstrates that correctly composing sequence operations requires more practice than interpreting relative indices. This is the reason to deprecate the sequence methods in favor of slicing.~~ This demonstrates that posting after midnight is unwise.

QuinceyMorris · February 13, 2018, 4:11am

It's the table under this heading that's wrong, with the sign tests interchanged. Your code does what you meant.

Given that seq[2..4] prints as 2 3, the binary .. operator is "half-open" (like ..<) not "closed" (like ...). If it's intended to be half-open, you also need a way of referencing "end" on the RHS of the binary operator.

palimondo · February 13, 2018, 5:24am

I meant Half-Bounded and Bounded SliceRange. I’ve fixed it now, thanks for pointing it out! I shouldn’t post after 1am…
(I still can’t see where the table doesn’t match the code, but that could be my lack of sleep.)

QuinceyMorris · February 13, 2018, 5:33am

Reformatting the text of the first 2 lines of the table:

Sequence Slice                  s[i..]	

Condition                       0 <= i
Equivalent Sequence Operations  s.suffix(i)

Condition                       i < 0	
Equivalent Sequence Operations  s.dropFirst(abs(i))

So if i is 2, that text says it should do suffix(2), but it actually does dropFirst(2), right?

And if i is -2, it actually does suffix(abs(-2)), right?

palimondo · February 13, 2018, 5:43am

You are right. Thank you! Fixed in edit.

Erica_Sadun · February 27, 2018, 2:53pm

It's the ergonomics and the readability. I like subscripting with labels because they provide the affordances that best suit the code while warning-in-use that you are conducting a non-constant-time lookup. I'd much rather have sugar than the current index calls (with their long and needlessly repetitive strands) or these kinds of workarounds that more or less mandate comments to explain what you're trying to achieve.

And while I agree that most production code does not use specific indices, Swift is also a teaching language, an interviewing language, a scripting language, and a prototyping language. All of these are worthy uses and should not be dismissed as out of line with the direction and philosophy of the language, especially when there's such an easy, clean, and obvious approach as labeled-subscript sugar.

MutatingFunk · February 27, 2018, 5:20pm

I'd like to see a labelled offset subscript added, as long as it is as efficient as manually written code.

Just as an offhand depiction (apologies for any errors):

extension MutableCollection { // the setter requires MutableCollection
    subscript(offset offset: Int) -> Element { //corrected Index → Int
        get {return self[self.index(self.startIndex, offsetBy: offset)]}
        set {self[self.index(self.startIndex, offsetBy: offset)] = newValue}
    }
}

It looks to me like the index lookup is happening twice. In any user-code written currently, the index would be calculated, stored, and reused for both get and set. Can Swift optimise away the duplicate lookup here, or is there any way of avoiding it in the subscript implementation?
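Whether the optimizer merges the two `index(_:offsetBy:)` calls in a get/set pair (for example in `c[offset: 1] += 10`) is not something to rely on. A hedged alternative that guarantees a single index computation is a mutating method that hands the element out `inout`; the name `withElement(atOffset:_:)` is hypothetical:

```swift
extension MutableCollection {
    /// Hypothetical helper: computes the index once, then lets `body`
    /// read and modify the element in place.
    mutating func withElement<R>(
        atOffset offset: Int,
        _ body: (inout Element) throws -> R
    ) rethrows -> R {
        let i = index(startIndex, offsetBy: offset)  // the only index computation
        return try body(&self[i])
    }
}

var scores = [1, 2, 3]
scores.withElement(atOffset: 1) { $0 += 10 }
print(scores)  // [1, 12, 3]
```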

Michael_Ilseman · February 27, 2018, 5:40pm

Right, I'm not advocating that the current state of things is good. That's why my earlier reply was pitching something like s[offset: 0...4].
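A minimal sketch of the labeled `s[offset: 0...4]` idea, accepting any integer range expression (closed, half-open, or partial). The subscript is hypothetical, start-relative only, and uses `count`, which is O(n) for collections that are not `RandomAccessCollection`:

```swift
extension Collection {
    /// Hypothetical: slices by integer offsets measured from startIndex.
    subscript<R: RangeExpression>(offset offsets: R) -> SubSequence where R.Bound == Int {
        // Resolve partial ranges such as `2...` or `..<3` against 0..<count.
        let r = offsets.relative(to: 0..<count)
        let lower = index(startIndex, offsetBy: r.lowerBound)
        let upper = index(lower, offsetBy: r.count)
        return self[lower..<upper]
    }
}

let greeting = "Hello, world"
print(greeting[offset: 0...4])  // Hello
print(greeting[offset: 7...])   // world
```

Offsets are measured from the slice's own `startIndex`, so on an `ArraySlice` they behave like positions rather than indices, which keeps the two notions visibly separate.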

QuinceyMorris · February 27, 2018, 6:52pm

I'm pretty sure you meant subscript(offset offset: Int).

Just a reminder before we get on this treadmill again:

It's not problematic to find a solution for start-relative offsets. However, such a solution is unsatisfying for two reasons:

People need to use end-relative offsets too, to avoid other horrendous-looking expressions involving c.index(c.endIndex, offsetBy:-offset).
The ideal (and the pitch in this thread) is to provide a single, unified syntax that can express all combinations of prefix/suffix/dropFirst/dropLast (and any others I forgot), as a preliminary to finishing the work in SE-0132. That needs end-relative offsets.

This thread has stagnated because there's no consensus about syntax for end-relative offsets.
