Shorthand for Offsetting startIndex and endIndex

dabrahams · March 8, 2018, 6:13pm

That’s why you don’t allow a plain Int here. My suggested syntax solves that problem. I don’t know why nobody seemed to take notice, even to trash the idea...

palimondo · March 8, 2018, 6:26pm

Oh, so the -- and ++ in those operators were there to separate them from Int operators? Hmm…

dabrahams · March 8, 2018, 6:28pm

Among other things, yes. It also connotes a possible O(N) stepwise cost that is incurred in the general case.

palimondo · March 8, 2018, 6:37pm

I still don’t grok the syntax you suggested. Is ..< the normal Range operator, with -- lifting the Int to some relative offset type, or is ..<++ a new operator?

ckeithray · March 8, 2018, 6:44pm

Even if the index type is not an Int, we can overload operators + and - to create Index offsets. I don't think we need to create new operators.

DeFrenZ · March 8, 2018, 6:55pm

The point of the operator was to avoid the explicit .start/.startIndex anchors.
++n := .start + n and --n := .end - n.
Though I'd probably prefer to avoid that, I think clarity suffers with it.
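For comparison, here is a minimal sketch of the anchor-based spelling that ++n and --n would abbreviate, built only from overloads of the existing + and - operators. Everything named here (`Anchor`, `resolve(_:)`, the `from:upTo:` labels) is hypothetical and not taken from any concrete proposal in this thread. It is restricted to BidirectionalCollection so that end-relative offsets can walk backwards.

```swift
// Hypothetical illustration: `Anchor`, `resolve(_:)` and the `from:upTo:`
// labels are made-up names, not proposed or existing standard library API.
struct Anchor {
    enum Base { case start, end }
    let base: Base
    let offset: Int

    static let start = Anchor(base: .start, offset: 0)
    static let end = Anchor(base: .end, offset: 0)

    // Overloads of the existing + and - operators; no new operators needed.
    static func + (lhs: Anchor, rhs: Int) -> Anchor {
        return Anchor(base: lhs.base, offset: lhs.offset + rhs)
    }
    static func - (lhs: Anchor, rhs: Int) -> Anchor {
        return Anchor(base: lhs.base, offset: lhs.offset - rhs)
    }
}

extension BidirectionalCollection {
    // Turns a relative anchor into a concrete index; O(n) in the general case.
    func resolve(_ anchor: Anchor) -> Index {
        switch anchor.base {
        case .start: return index(startIndex, offsetBy: anchor.offset)
        case .end: return index(endIndex, offsetBy: anchor.offset)
        }
    }

    subscript(from lower: Anchor, upTo upper: Anchor) -> SubSequence {
        return self[resolve(lower)..<resolve(upper)]
    }
}

let arr = [10, 20, 30, 40, 50]
let middle = arr[from: .start + 1, upTo: .end - 1]
print(Array(middle)) // [20, 30, 40]
```

The labeled subscript sidesteps the ambiguity with plain Int indices, at the cost of the brevity the operator spelling was after.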

dabrahams · March 8, 2018, 7:40pm

Both. You can’t make j..<++3 compile unless ..<++ is an operator. There are of course variants that don’t require introducing that operator, e.g. j..<3++

ckeithray · March 8, 2018, 7:56pm

How about argument names instead of operators? Alpha to Omega would be nice, but Alpha looks like the letter A...

Here, I've used "ø" for offsets from startIndex and "Ω" for offsets from endIndex

subscript(ø start: Index, Ω end: Index ) -> SubSequence {...}

subscript(ø start: Index, ø end: Index ) -> SubSequence {...}

subscript(Ω start: Index, Ω end: Index ) -> SubSequence {...}

subscript(ø start: Index ) -> SubSequence {...}

subscript(Ω end: Index ) -> SubSequence {...}

subscript(ø start: Index, length: Int ) -> SubSequence {...}

subscript(Ω end: Index, length: Int ) -> SubSequence {...}

var a = arr[ø: 1, Ω: 3]  // 1 after start, 3 before end

var b = arr[ø: 1, ø: 3]  // 1 after start, 3 after start

var c = arr[Ω: 4, Ω: 2]  // 4 before end, 2 before end

var d = arr[ø: 2]        // 2 after start through to the end

var e = arr[Ω: 2]        // 2 before end through to the end

var f = arr[ø: 2, 3]     // 2 after start, for 3 elements

var g = arr[Ω: 2, 3]     // 2 before end, for 3 elements

xwu · March 8, 2018, 8:30pm

If we're going to start using symbols, then clearly this is a job for emoji.
I recommend arr[🔜(1)..<🔚(2)].

Like @DeFrenZ, I don't understand this desire to avoid words, which everyone understands.

Erica_Sadun · March 8, 2018, 11:39pm

Not only that but what does .start or .end belong to? It makes no sense to use dot notation as a "member" of the calling object in the subscript. Adding only relative values removes this need but introduces a challenge of trying to specify:

Offset from start (a normal "Index")
Offset from end (via Xiaodi Wu's negative number)
Offset from starting position (a vector form, start and magnitude)

(This doesn't take into account automatically reversed negative-magnitude versions, or allowing negative offsets from the end in the starting position of the range of indices.)

The cleanest solution I have seen to date (via Wux) is:

collection[ i ... j ] -> collection startIndex offset by i through startIndex offset by j. Sub ..< for "up to"
collection[ i ... -j ] -> collection startIndex offset by i through endIndex offset by -j, and again ..< variant

For the vector form:

collection[ i ...+ j ] -> collection startIndex offset by i through startIndex offset by i + j (ditto ..<)

If negative progressions are allowed, producing a reversed order with flipped arguments:

collection [ i ...- j ] -> collection startIndex offset by (i - j) through startIndex offset by i, with items reversed
collection [ i ...- -j ] -> collection endIndex offset by -j through startIndex offset by i, with items reversed

and so forth. You may need to burn an operator or four but I don't think you need to refer to a keyword (start, end) or play games with collection property inference (that collection[startIndex] refers to the collection's startIndex and not the startIndex of self who owns this scope). There are a bunch of edge cases that I've left out, but it wouldn't be terribly hard to enumerate all possibilities of calling patterns.

MutatingFunk · March 8, 2018, 11:53pm

I may be missing something obvious, but if i and j are integers, isn't i ...+ j equivalent to i ... i + j? Do we really need the vector form?

QuinceyMorris · March 9, 2018, 12:12am

A couple of reminders:

1. Forms like collection[ i ... j ] aren't possible, because of the ambiguity when Collection.Index == Int.
2. The Collection subscripts (with index and range parameters) are documented to be O(1).
3. c.index(c.endIndex, offsetBy: -offset) isn't allowed for Collection. It needs BidirectionalCollection.

#3 has code-arounds (e.g. c.index(c.startIndex, offsetBy: c.count-offset)), but they're necessarily O(n), conflicting with #2.
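As a rough sketch of that last point, the two ways of reaching an end-relative index look like this. The method names `indexFromEnd(_:)` and `indexFromEndBackward(_:)` are invented for the illustration and are not standard library API.

```swift
// Hypothetical helper names, for illustration only.
extension Collection {
    // Works for any Collection, but is O(n) in general: both `count` and
    // the forward offset may have to walk the collection.
    func indexFromEnd(_ offset: Int) -> Index {
        precondition(offset >= 0)
        return index(startIndex, offsetBy: count - offset)
    }
}

extension BidirectionalCollection {
    // Negative offsets are only valid for bidirectional collections.
    // The cost is proportional to `offset` rather than to `count`.
    func indexFromEndBackward(_ offset: Int) -> Index {
        precondition(offset >= 0)
        return index(endIndex, offsetBy: -offset)
    }
}

let letters = "abcdef"
let forward = letters.indexFromEnd(2)
let backward = letters.indexFromEndBackward(2)
print(letters[forward], forward == backward) // e true
```

Either way the offset has to be resolved to an index before an O(1) subscript can be used, which is why doing the resolution inside a subscript runs into reminder #2.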

QuinceyMorris · March 9, 2018, 12:14am

Also, forms that depend on spaces to resolve syntactical ambiguity around range operators have been generally unappealing, so far.

Also, forms that depend on parentheses solely to eliminate unary-operator-coalescing issues have not seen much support, so far.

Also, forms that incorporate symbols with no clear, pre-existing meaning in regard to indexing seem pointless, since they tend to mystify rather than demystify.

(This thread has been going for a while.)

QuinceyMorris · March 9, 2018, 6:16am

One more try. This steals good ideas from anyone and everyone contributing to this thread.

It’s a complete but minimal implementation of needed functionality, AFAICT.
It does not use range operators.
It does not use negative offsets.
It doesn’t introduce any new operators.
It doesn’t depend on spacing or parenthesizing to disambiguate syntax.
It doesn’t violate any existing O(1) complexity constraints.
Its semantics are reasonably transparent (no need for a decoder ring).

There is one set of functions for all Sequences:

	var x = [10,20,30,40,50,60,70,80,90]

	_ = x.slice (skipFirst: 1, skipLast: 2) // [20,30,40,50,60,70]
	_ = x.slice (skipFirst: 1, take: 2) // [20,30]
	_ = x.slice (skipLast: 1, take: 2) // [70,80]

plus index and range functions for all Collections, with complexity O(n) at worst:

	let i1 = x.index (afterFirst: 1) // .startIndex+1
	let i2 = x.index (beforeLast: 1) // .endIndex-2 (not .endIndex-1)
	
	let r1 = x.range (skipFirst: 1, skipLast: 2) // .startIndex+1 ..< .endIndex-2
	let r2 = x.range (skipFirst: 1, take: 2) // .startIndex+1 ..< .startIndex+3
	let r3 = x.range (skipLast: 1, take: 2) // .endIndex-3 ..< .endIndex-1

which, being already resolved, can be used with standard O(1) subscripts:

	_ = x [i1] // 20
	_ = x [i2] // 80 (not 90)
	
	_ = x [r1] // [20,30,40,50,60,70]
	_ = x [r2] // [20,30]
	_ = x [r3] // [70,80]

plus, finally, new subscripts for brevity and mutability, with complexity O(n) at worst:

	x [afterFirst: 1] = 42
	x [beforeLast: 1] = 43
	
	x [skipFirst: 1, skipLast: 2] = [100, 200, 300]
	x [skipFirst: 1, take: 2] = [100, 200, 300]
	x [skipLast: 1, take: 2] = [100, 200, 300]

Obviously, convenience functions may be added at will. For example, x [skipFirst: 0, take: 3] might be x [takeFirst: 3], or x.first (3), etc.

The implementation is pretty short:

public extension Sequence {
	
	public func slice (skipFirst skip: Int, take: Int) -> SubSequence {
		precondition (take >= 0 && skip >= 0)
		
		if skip > 0 {
			return dropFirst (skip).prefix (take)
		}
		else {
			return prefix (take)
		}
	}
	
	public func slice (skipLast skip: Int, take: Int) -> SubSequence {
		precondition (take >= 0 && skip >= 0)
		
		if skip > 0 {
			return dropLast (skip).suffix (take)
		}
		else {
			return suffix (take)
		}
	}
	
	public func slice (skipFirst firstSkip: Int, skipLast lastSkip: Int) -> SubSequence {
		precondition (firstSkip >= 0 && lastSkip >= 0)
		
		switch (firstSkip > 0, lastSkip > 0)
		{
		case (true, true):
			return dropFirst (firstSkip).dropLast (lastSkip)
		case (true, false):
			return dropFirst (firstSkip)
		case (false, true):
			return dropLast (lastSkip)
		case (false, false):
			return dropFirst (0)
		}
	}
}

public extension Collection {
	
	public func range (skipFirst skip: Int, take: Int) -> Range<Index> {
		precondition (skip >= 0 && take >= 0)
		
		let start = index (afterFirst: skip)
		let end = index (start, offsetBy: take)
		
		return start ..< end
	}
	
	public func range (skipLast skip: Int, take: Int) -> Range<Index> {
		precondition (skip >= 0 && take >= 0)
		
		let start = _index (beforeLast: skip + take)
		let end = index (start, offsetBy: take)
		
		return start ..< end
	}
	
	public func range (skipFirst firstSkip: Int, skipLast lastSkip: Int) -> Range<Index> {
		precondition (firstSkip >= 0 && lastSkip >= 0)
		
		return index (afterFirst: firstSkip) ..< _index (beforeLast: lastSkip)
	}
	
	public func index (afterFirst skip: Int) -> Index {
		precondition (skip >= 0)
		
		guard skip > 0 else { return startIndex }
		
		return index (startIndex, offsetBy: skip)
	}
	
	public func index (beforeLast skip: Int) -> Index {
		return _index (beforeLast: skip + 1)
	}
	
	public func _index (beforeLast skip: Int) -> Index {
		precondition (skip >= 0)
		
		guard skip > 0 else { return endIndex }
		
		return index (startIndex, offsetBy: count - skip)
	}
	
	public subscript (afterFirst skip: Int) -> Element {
		return self [index (afterFirst: skip)]
	}
	
	public subscript (beforeLast skip: Int) -> Element {
		return self [index (beforeLast: skip)]
	}
	
	public subscript (skipFirst skip: Int, take take: Int) -> Self.SubSequence {
		return self.slice (skipFirst: skip, take: take)
	}
	
	public subscript (skipLast skip: Int, take take: Int) -> Self.SubSequence {
		return self.slice (skipLast: skip, take: take)
	}
	
	public subscript (skipFirst firstSkip: Int, skipLast lastSkip: Int) -> Self.SubSequence {
		return self.slice (skipFirst: firstSkip, skipLast: lastSkip)
	}
}

public extension BidirectionalCollection {
	
	public func _index (beforeLast skip: Int) -> Index {
		precondition (skip >= 0)
		
		guard skip > 0 else { return endIndex }
		
		return index (endIndex, offsetBy: -skip)
	}
}

public extension MutableCollection {
	
	public subscript (afterFirst skip: Int) -> Element {
		get { return self [index (afterFirst: skip)] }
		set { self [index (afterFirst: skip)] = newValue }
	}
	
	public subscript (beforeLast skip: Int) -> Element {
		get { return self [index (beforeLast: skip)] }
		set { self [index (beforeLast: skip)] = newValue }
	}
	
	public subscript (skipFirst skip: Int, take take: Int) -> Self.SubSequence {
		get { return self.slice (skipFirst: skip, take: take) }
		set { self [self.range (skipFirst: skip, take: take)] = newValue }
	}
	
	public subscript (skipLast skip: Int, take take: Int) -> Self.SubSequence {
		get { return self.slice (skipLast: skip, take: take) }
		set { self [self.range (skipLast: skip, take: take)] = newValue }
	}
	
	public subscript (skipFirst firstSkip: Int, skipLast lastSkip: Int) -> Self.SubSequence {
		get { return self.slice (skipFirst: firstSkip, skipLast: lastSkip) }
		set { self [self.range (skipFirst: firstSkip, skipLast: lastSkip)] = newValue }
	}
}

For simplicity, the Sequence functions are implemented in terms of existing functions, but a real implementation ought to be the other way round: the new functions should be primitive, and the existing functions should be convenience methods.

nuclearace · March 9, 2018, 1:48pm

I was thinking about this kind of approach yesterday, glad to see someone suggested it. I feel like this is some sort of compromise. The biggest issue I had with this approach is thinking of names for the params/labels. Take, for example: it is not immediately obvious to me what it does.

Another thing it doesn't really address is the brevity that some people want. It seems some people really do want powerful slicing expressions like you have in more script oriented languages like Python or Perl.

nuclearace · March 9, 2018, 1:52pm

Perl 6 does have * to refer to the end of an array.

my @alphabet = 'A' .. 'Z';
say @alphabet[*-1];  # OUTPUT: «Z␤» 
say @alphabet[*-2];  # OUTPUT: «Y␤» 
say @alphabet[*-3];  # OUTPUT: «X␤»

It also allows that to be used in the slicing syntax.

anthonylatsis · March 9, 2018, 3:54pm

Could somebody be so kind as to explain why we can't have proper Int-based subscripts on String, like in Array for instance, instead of discussing how to simplify
s[s.index(s.startIndex, offsetBy: 7)...s.index(s.startIndex, offsetBy: 11)]?

P.S. According to the String Manifesto, String will become a Collection of Characters again (it hasn't yet, has it?), which emphasizes the question for me even more.
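For concreteness, here is what that expression does on a sample string (the string itself is made up for the illustration). Each call to index(_:offsetBy:) walks forward from startIndex one Character at a time, so its cost grows with the offset.

```swift
// Sample input chosen for illustration only.
let s = "Hello, Swift world"

// Resolve the two Int offsets into String.Index values first...
let lower = s.index(s.startIndex, offsetBy: 7)
let upper = s.index(s.startIndex, offsetBy: 11)

// ...then slice with the ordinary closed-range subscript.
let word = s[lower...upper]
print(word) // Swift
```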

nuclearace · March 9, 2018, 4:10pm

Someone correct me if I'm wrong, but it has to do with Unicode correctness and algorithm complexity. Because of the way Collection is defined, subscripts should be O(1). But to be Unicode correct, you have to traverse the string, which can be O(n). So the idea is that if we had Int based subscripts on String, all of a sudden you have O(n) behavior on a subscript.

It also has to do with what a "character" really is, because Unicode defines code points that combine to form a single grapheme. What happens when you have a String that is really composed of multiple Unicode code points, but composes to a single grapheme?

That being said, I'm also in the camp that thinks there should be some easier way to express slicing up a string that doesn't revolve around a ton of index calculation.

An extreme case:



let zalgo = "Ḩ̷͕̺͍͉̇͛́̀͑̍̕͢e̢̢̘̫̺͖͓̥͎̐̈̀̄̚͞ c̵̛̗̯̫̫͓̪̓͛̂͘ͅo̢̡̳͕̰̗̻͇̽̿̈͗̀̐̿̌͠m̢̪͕̮̱̘͇̈̉͗͆̀̕͘e̳͕͓͚͎̪̟̬̰͆̓͑̃̿̊ͅs̵͇̳͓̥̼̠͌͒̎̄͒̅̿͢"



print(zalgo.count) // 8
print(zalgo.utf16.count) // 112
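A less extreme sketch of the same effect, using a single combining accent (the sample strings are made up for the illustration):

```swift
// "é" written as one precomposed scalar, and as "e" plus a combining accent.
let composed = "caf\u{E9}"
let decomposed = "cafe\u{301}"

print(composed.count, decomposed.count)   // 4 4
print(decomposed.unicodeScalars.count)    // 5
print(decomposed.utf16.count)             // 5
print(composed == decomposed)             // true
```

Both strings have four Characters and compare equal, but the number of underlying code units differs, so the position of the n-th Character cannot be computed from n without walking the string.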

xwu · March 9, 2018, 4:37pm

It has. Collections can have opaque indices.

QuinceyMorris · March 9, 2018, 6:52pm

Previously, I had skip/keep but that looked a bit weird in a subscript on the LHS of an assignment, because it replaces the thing you're "keeping". Before that I had skip/next, which reads better, but is slightly odd in the end-relative case, where it means "next to the left". There probably isn't any single word that is totally clear on its own.

If we can cover the basic functionality (and meet the basic requirements) in a straightforward, consistent way, I don't see a problem adding "convenience" API on top that introduces operators for brevity, or uses Python-style signed offsetting. I just wasn't much in favor of those being the only choices.

I believe the ultimate goal is to be able to complete SE-0132.