That’s why you don’t allow a plain Int here. My suggested syntax solves that problem. I don’t know why nobody seemed to take notice, even to trash the idea...
Oh, so the -- and ++ in those operators were there to separate them from Int operators? Hmm…
Among other things, yes. It also connotes a possible O(N) stepwise cost that is incurred in the general case.
I still don’t grok the syntax you suggested. Is the ..< the normal Range operator, with the -- lifting the Int to some relative offset type, or is ..<++ a new operator?
Even if the index type is not an Int, we can overload the + and - operators to create Index offsets. I don't think we need to create new operators.
The point of the operator was to avoid the explicit .start/.startIndex anchors: ++n := .start + n and --n := .end - n.
That said, I'd probably prefer to avoid it; I think clarity suffers with it.
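A minimal sketch of the prefix-operator idea, assuming ++n and --n produce an anchored offset (following ++n := .start + n and --n := .end - n) and that the existing ..< is overloaded to combine two of them. The names Anchor, AnchoredOffset, AnchoredRange and resolve(_:) are hypothetical, not standard library API. Note the spaces around ..<: without them the lexer reads ..<-- as a single operator token, which is the problem raised in the next reply.

```swift
// Hypothetical sketch: ++n means "n after startIndex", --n means "n before endIndex".
enum Anchor { case start, end }

struct AnchoredOffset {
    let anchor: Anchor
    let distance: Int
}

struct AnchoredRange {
    let lower: AnchoredOffset
    let upper: AnchoredOffset
}

// Swift 3 removed ++/-- for numbers, so these spellings are free to declare.
prefix operator ++
prefix operator --

prefix func ++ (n: Int) -> AnchoredOffset { AnchoredOffset(anchor: .start, distance: n) }
prefix func -- (n: Int) -> AnchoredOffset { AnchoredOffset(anchor: .end, distance: n) }

// Overload the standard half-open range operator for anchored offsets.
func ..< (lhs: AnchoredOffset, rhs: AnchoredOffset) -> AnchoredRange {
    AnchoredRange(lower: lhs, upper: rhs)
}

extension BidirectionalCollection {
    // Turn an anchored offset into a concrete index of this collection.
    func resolve(_ offset: AnchoredOffset) -> Index {
        switch offset.anchor {
        case .start: return index(startIndex, offsetBy: offset.distance)
        case .end: return index(endIndex, offsetBy: -offset.distance)
        }
    }

    subscript(anchored: AnchoredRange) -> SubSequence {
        self[resolve(anchored.lower) ..< resolve(anchored.upper)]
    }
}

let a = [10, 20, 30, 40, 50]
print(Array(a[++1 ..< --1])) // [20, 30, 40]
```

Resolving through index(_:offsetBy:) keeps the cost explicit: O(1) on random-access collections, O(k) stepping otherwise.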
Both. You can’t make j..<++3 compile unless ..<++ is an operator. There are of course variants that don’t require introducing that operator, e.g. j..<3++
How about argument names instead of operators? Alpha to Omega would be nice, but Alpha looks like the letter A...
Here, I've used "ø" for offsets from startIndex and "Ω" for offsets from endIndex.
subscript(ø start: Int, Ω end: Int) -> SubSequence {...}
subscript(ø start: Int, ø end: Int) -> SubSequence {...}
subscript(Ω start: Int, Ω end: Int) -> SubSequence {...}
subscript(ø start: Int) -> SubSequence {...}
subscript(Ω end: Int) -> SubSequence {...}
subscript(ø start: Int, length: Int) -> SubSequence {...}
subscript(Ω end: Int, length: Int) -> SubSequence {...}
var a = arr[ø: 1, Ω: 3] // 1 after start, 3 before end
var b = arr[ø: 1, ø: 3] // 1 after start, 3 after start
var c = arr[Ω: 4, Ω: 2] // 4 before end, 2 before end
var d = arr[ø: 2] // 2 after start through to the end
var e = arr[Ω: 2] // 2 before end through to the end
var f = arr[ø: 2, 3] // 2 after start, for 3 elements
var g = arr[Ω: 2, 3] // 2 before end, for 3 elements
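A sketch of how most of the labeled subscripts above might be implemented, assuming the offsets are plain Int counts and the collection is a BidirectionalCollection, so offsets from endIndex can walk backwards. ø and Ω are legal Swift identifier characters, so they work as argument labels; the helpers fromStart(_:) and fromEnd(_:) are hypothetical. The Ω-plus-length form is left out because its direction is unclear: 3 elements forward from 2 before the end would run past endIndex.

```swift
// Hypothetical sketch: ø counts from startIndex, Ω counts back from endIndex.
extension BidirectionalCollection {
    private func fromStart(_ n: Int) -> Index { index(startIndex, offsetBy: n) }
    private func fromEnd(_ n: Int) -> Index { index(endIndex, offsetBy: -n) }

    subscript(ø start: Int, Ω end: Int) -> SubSequence { self[fromStart(start) ..< fromEnd(end)] }
    subscript(ø start: Int, ø end: Int) -> SubSequence { self[fromStart(start) ..< fromStart(end)] }
    subscript(Ω start: Int, Ω end: Int) -> SubSequence { self[fromEnd(start) ..< fromEnd(end)] }
    subscript(ø start: Int) -> SubSequence { self[fromStart(start)...] }
    subscript(Ω start: Int) -> SubSequence { self[fromEnd(start)...] }

    // A subscript parameter with no explicit label is unlabeled, so this is called as c[ø: 2, 3].
    subscript(ø start: Int, length: Int) -> SubSequence {
        let lower = fromStart(start)
        return self[lower ..< index(lower, offsetBy: length)]
    }
}

let arr = Array(0..<10)
print(Array(arr[ø: 1, Ω: 3])) // [1, 2, 3, 4, 5, 6]
print(Array(arr[ø: 2, 3]))    // [2, 3, 4]
```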
If we're going to start using symbols, then clearly this is a job for emoji.
I recommend arr[🔜(1)..<🔚(2)].
Like @DeFrenZ, I don't understand this desire to avoid words, which everyone understands.
Not only that, but what do .start or .end belong to? It makes no sense to use dot notation as a "member" of the calling object in the subscript. Using only relative values removes this need, but introduces the challenge of specifying:
- Offset from start (a normal "Index")
- Offset from end (via Xiaodi Wu's negative number)
- Offset from starting position (a vector form, start and magnitude)
(This doesn't take into account automatically reversed, negative-magnitude versions, or negative offsets from the end in the starting position of the range.)
The cleanest solution I have seen to date (via Wux) is:
- collection[ i ... j ] -> collection startIndex offset by i through startIndex offset by j. Substitute ..< for "up to".
- collection[ i ... -j ] -> collection startIndex offset by i through endIndex offset by -j, and again a ..< variant.
For the vector form:
- collection[ i ...+ j ] -> collection startIndex offset by i through startIndex offset by i + j (ditto ..<)
If negative progressions are allowed, producing a reversed order with flipped arguments:
- collection[ i ...- j ] -> collection startIndex offset by (i - j) through startIndex offset by i, with items reversed
- collection[ i ...- -j ] -> collection endIndex offset by -j through startIndex offset by i, with items reversed
And so forth. You may need to burn an operator or four, but I don't think you need to refer to a keyword (start, end) or play games with collection property inference (assuming that collection[startIndex] refers to the collection's startIndex and not the startIndex of the self that owns this scope). There are a bunch of edge cases that I've left out, but it wouldn't be terribly hard to enumerate all possible calling patterns.
I may be missing something obvious, but if i and j are integers, isn't i ...+ j equivalent to i ... i + j? Do we really need the vector form?
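For plain Int offsets it does look like pure sugar. A sketch, assuming the vector form is spelled ...+ as in the post above (Swift allows custom operators that begin with a dot to contain further dots); the operator declaration here is hypothetical:

```swift
// Hypothetical operator: `i ...+ j` means "from i through i + j".
infix operator ...+ : RangeFormationPrecedence

func ...+ (start: Int, length: Int) -> ClosedRange<Int> {
    start ... start + length
}

let values = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print((2 ...+ 3) == (2 ... 5)) // true
print(Array(values[1 ...+ 2])) // [20, 30, 40]
```

On an Array, whose indices start at 0, these offsets coincide with indices; for other collections the range would still have to be resolved against startIndex, which is where the real design questions lie.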
A couple of reminders:
1. Forms like collection[ i ... j ] aren't possible, because of the ambiguity when Collection.Index == Int.
2. The Collection subscripts (with index and range parameters) are documented to be O(1).
3. c.index(c.endIndex, offsetBy: -offset) isn't allowed for Collection. It needs BidirectionalCollection.
#3 has workarounds (e.g. c.index(c.startIndex, offsetBy: c.count - offset)), but they're necessarily O(n), conflicting with #2.
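The difference behind reminder #3 shows up in a small sketch (the helper names forwardIndex(fromEnd:) and backwardIndex(fromEnd:) are hypothetical): a BidirectionalCollection can step back from endIndex, while a plain Collection has to go through count and walk forward from startIndex.

```swift
extension Collection {
    // Works for any Collection, but `count` and the forward walk are both O(n)
    // unless the collection is also a RandomAccessCollection.
    func forwardIndex(fromEnd k: Int) -> Index {
        index(startIndex, offsetBy: count - k)
    }
}

extension BidirectionalCollection {
    // Negative offsets need index(before:), so this only exists on BidirectionalCollection.
    // It takes k steps, or O(1) on a RandomAccessCollection.
    func backwardIndex(fromEnd k: Int) -> Index {
        index(endIndex, offsetBy: -k)
    }
}

let forwardOnly = AnyCollection([1, 2, 3, 4, 5]) // type-erased, forward-only
print(forwardOnly[forwardOnly.forwardIndex(fromEnd: 2)]) // 4

let array = [1, 2, 3, 4, 5]
print(array[array.backwardIndex(fromEnd: 2)]) // 4
```

Calling forwardOnly.backwardIndex(fromEnd:) would not compile, since AnyCollection is not bidirectional.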
Also, forms that depend on spaces to resolve syntactical ambiguity around range operators have been generally unappealing, so far.
Also, forms that depend on parentheses solely to eliminate unary-operator-coalescing issues have not seen much support, so far.
Also, forms that incorporate symbols with no clear, pre-existing meaning in regard to indexing seem pointless, since they tend to mystify rather than demystify.
(This thread has been going for a while.)
One more try. This steals good ideas from anyone and everyone contributing to this thread.
- It’s a complete but minimal implementation of needed functionality, AFAICT.
- It does not use range operators.
- It does not use negative offsets.
- It doesn’t introduce any new operators.
- It doesn’t depend on spacing or parenthesizing to disambiguate syntax.
- It doesn’t violate any existing O(1) complexity constraints.
- Its semantics are reasonably transparent (no need for a decoder ring).
There is one set of functions for all Sequences:
var x = [10,20,30,40,50,60,70,80,90]
_ = x.slice (skipFirst: 1, skipLast: 2) // [20,30,40,50,60,70]
_ = x.slice (skipFirst: 1, take: 2) // [20,30]
_ = x.slice (skipLast: 1, take: 2) // [70,80]
plus index and range functions for all Collections, with complexity O(n) at worst:
let i1 = x.index (afterFirst: 1) // .startIndex+1
let i2 = x.index (beforeLast: 1) // .endIndex-2 (not .endIndex-1)
let r1 = x.range (skipFirst: 1, skipLast: 2) // .startIndex+1 ..< .endIndex-2
let r2 = x.range (skipFirst: 1, take: 2) // .startIndex+1 ..< .startIndex+3
let r3 = x.range (skipLast: 1, take: 2) // .endIndex-3 ..< .endIndex-1
which, being already resolved, can be used with standard O(1) subscripts:
_ = x [i1] // 20
_ = x [i2] // 80 (not 90)
_ = x [r1] // [20,30,40,50,60,70]
_ = x [r2] // [20,30]
_ = x [r3] // [70,80]
plus, finally, new subscripts for brevity and mutability, with complexity O(n) at worst:
x [afterFirst: 1] = 42
x [beforeLast: 1] = 43
x [skipFirst: 1, skipLast: 2] = [100, 200, 300]
x [skipFirst: 1, take: 2] = [100, 200, 300]
x [skipLast: 1, take: 2] = [100, 200, 300]
Obviously, convenience functions may be added at will. For example, x [skipFirst: 0, take: 3] might be x [takeFirst: 3], or x.first (3), etc.
The implementation is pretty short:
public extension Sequence {
public func slice (skipFirst skip: Int, take: Int) -> SubSequence {
precondition (take >= 0 && skip >= 0)
if skip > 0 {
return dropFirst (skip).prefix (take)
}
else {
return prefix (take)
}
}
public func slice (skipLast skip: Int, take: Int) -> SubSequence {
precondition (take >= 0 && skip >= 0)
if skip > 0 {
return dropLast (skip).suffix (take)
}
else {
return suffix (take)
}
}
public func slice (skipFirst firstSkip: Int, skipLast lastSkip: Int) -> SubSequence {
precondition (firstSkip >= 0 && lastSkip >= 0)
switch (firstSkip > 0, lastSkip > 0)
{
case (true, true):
return dropFirst (firstSkip).dropLast (lastSkip)
case (true, false):
return dropFirst (firstSkip)
case (false, true):
return dropLast (lastSkip)
case (false, false):
return dropFirst (0)
}
}
}
public extension Collection {
public func range (skipFirst skip: Int, take: Int) -> Range<Index> {
precondition (skip >= 0 && take >= 0)
let start = index (afterFirst: skip)
let end = index (start, offsetBy: take)
return start ..< end
}
public func range (skipLast skip: Int, take: Int) -> Range<Index> {
precondition (skip >= 0 && take >= 0)
let start = _index (beforeLast: skip + take)
let end = index (start, offsetBy: take)
return start ..< end
}
public func range (skipFirst firstSkip: Int, skipLast lastSkip: Int) -> Range<Index> {
precondition (firstSkip >= 0 && lastSkip >= 0)
return index (afterFirst: firstSkip) ..< _index (beforeLast: lastSkip)
}
public func index (afterFirst skip: Int) -> Index {
precondition (skip >= 0)
guard skip > 0 else { return startIndex }
return index (startIndex, offsetBy: skip)
}
public func index (beforeLast skip: Int) -> Index {
return _index (beforeLast: skip + 1)
}
public func _index (beforeLast skip: Int) -> Index {
precondition (skip >= 0)
guard skip > 0 else { return endIndex }
return index (startIndex, offsetBy: count - skip)
}
public subscript (afterFirst skip: Int) -> Element {
return self [index (afterFirst: skip)]
}
public subscript (beforeLast skip: Int) -> Element {
return self [index (beforeLast: skip)]
}
public subscript (skipFirst skip: Int, take take: Int) -> Self.SubSequence {
return self.slice (skipFirst: skip, take: take)
}
public subscript (skipLast skip: Int, take take: Int) -> Self.SubSequence {
return self.slice (skipLast: skip, take: take)
}
public subscript (skipFirst firstSkip: Int, skipLast lastSkip: Int) -> Self.SubSequence {
return self.slice (skipFirst: firstSkip, skipLast: lastSkip)
}
}
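// Note: protocol-extension methods are statically dispatched, so the Collection-extension
// methods above always call the Collection version of _index(beforeLast:). This backward-walking
// overload is only picked when _index(beforeLast:) is called directly on a concrete BidirectionalCollection.
public extension BidirectionalCollection {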
public func _index (beforeLast skip: Int) -> Index {
precondition (skip >= 0)
guard skip > 0 else { return endIndex }
return index (endIndex, offsetBy: -skip)
}
}
public extension MutableCollection {
public subscript (afterFirst skip: Int) -> Element {
get { return self [index (afterFirst: skip)] }
set { self [index (afterFirst: skip)] = newValue }
}
public subscript (beforeLast skip: Int) -> Element {
get { return self [index (beforeLast: skip)] }
set { self [index (beforeLast: skip)] = newValue }
}
public subscript (skipFirst skip: Int, take take: Int) -> Self.SubSequence {
get { return self.slice (skipFirst: skip, take: take) }
set { self [self.range (skipFirst: skip, take: take)] = newValue }
}
public subscript (skipLast skip: Int, take take: Int) -> Self.SubSequence {
get { return self.slice (skipLast: skip, take: take) }
set { self [self.range (skipLast: skip, take: take)] = newValue }
}
public subscript (skipFirst firstSkip: Int, skipLast lastSkip: Int) -> Self.SubSequence {
get { return self.slice (skipFirst: firstSkip, skipLast: lastSkip) }
set { self [self.range (skipFirst: firstSkip, skipLast: lastSkip)] = newValue }
}
}
For simplicity, the Sequence functions are implemented in terms of existing functions, but a real implementation ought to be the other way round: the new functions should be primitive, and the existing functions should be convenience methods.
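One caveat when trying the code above on a current toolchain: it targets Swift 4, and Swift 5 removed Sequence's SubSequence associated type (SE-0234), so the Sequence extension and the Collection subscripts built on it no longer compile as written. A sketch of the three slice functions restricted to Collection, where dropFirst(_:), dropLast(_:), prefix(_:) and suffix(_:) still return SubSequence, keeps the same labels and reproduces the results listed earlier:

```swift
// Sketch for Swift 5+: the same slice(...) labels, defined on Collection only.
extension Collection {
    func slice(skipFirst skip: Int, take: Int) -> SubSequence {
        precondition(skip >= 0 && take >= 0)
        return dropFirst(skip).prefix(take)
    }

    func slice(skipLast skip: Int, take: Int) -> SubSequence {
        precondition(skip >= 0 && take >= 0)
        return dropLast(skip).suffix(take)
    }

    func slice(skipFirst firstSkip: Int, skipLast lastSkip: Int) -> SubSequence {
        precondition(firstSkip >= 0 && lastSkip >= 0)
        // dropFirst/dropLast clamp, so over-long skips yield an empty slice.
        return dropFirst(firstSkip).dropLast(lastSkip)
    }
}

let x = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(Array(x.slice(skipFirst: 1, skipLast: 2))) // [20, 30, 40, 50, 60, 70]
print(Array(x.slice(skipFirst: 1, take: 2)))     // [20, 30]
print(Array(x.slice(skipLast: 1, take: 2)))      // [70, 80]
```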
I was thinking about this kind of approach yesterday; glad to see someone suggested it. I feel like this is a sort of compromise. The biggest issue I had with this approach is thinking of names for the params/labels. take, for example, is not immediately obvious to me.
Another thing it doesn't really address is the brevity that some people want. It seems some people really do want powerful slicing expressions like those in more script-oriented languages such as Python or Perl.
Perl 6 (now Raku) does have * to refer to the end of an array.
my @alphabet = 'A' .. 'Z';
say @alphabet[*-1]; # OUTPUT: «Z»
say @alphabet[*-2]; # OUTPUT: «Y»
say @alphabet[*-3]; # OUTPUT: «X»
It also allows that to be used in the slicing syntax.
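A rough Swift analog, assuming a hypothetical fromEnd: label on BidirectionalCollection where k counts back from endIndex (so fromEnd: 1 is the last element, like @alphabet[*-1]), plus an inclusive form for slicing between two end-relative positions:

```swift
extension BidirectionalCollection {
    // Hypothetical label: c[fromEnd: 1] is the last element.
    subscript(fromEnd k: Int) -> Element {
        precondition(k >= 1, "fromEnd: 1 is the last element")
        return self[index(endIndex, offsetBy: -k)]
    }

    // Inclusive slice between two end-relative positions, e.g. the last three elements.
    subscript(fromEnd start: Int, through stop: Int) -> SubSequence {
        precondition(start >= stop && stop >= 1)
        return self[index(endIndex, offsetBy: -start) ... index(endIndex, offsetBy: -stop)]
    }
}

let alphabet = Array("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(alphabet[fromEnd: 1])                     // Z
print(String(alphabet[fromEnd: 3, through: 1])) // XYZ
```

The label does the job of Perl's *, which avoids the Index == Int ambiguity that a bare negative number would have on Array.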
Can somebody kindly explain why we can't have proper Int-based subscripts on String, like in Array, instead of discussing how to simplify s[s.index(s.startIndex, offsetBy: 7)...s.index(s.startIndex, offsetBy: 11)]?
P.S. According to the String Manifesto, String will become a Collection of Characters again (it hasn't yet, has it?), which makes the question even more pressing for me.
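For reference, the expression in the question can at least be wrapped in a helper. A sketch, assuming a hypothetical offsets: label on StringProtocol; each call still walks from startIndex, so it is O(n), and that cost is exactly why String doesn't offer it as a plain Int subscript:

```swift
extension StringProtocol {
    // Hypothetical helper: slice by Character offsets. O(n), because it walks
    // from startIndex instead of jumping directly to a position.
    subscript(offsets range: ClosedRange<Int>) -> SubSequence {
        let lower = index(startIndex, offsetBy: range.lowerBound)
        let upper = index(lower, offsetBy: range.upperBound - range.lowerBound)
        return self[lower ... upper]
    }
}

let s = "Hello, playground"
print(s[offsets: 7...11]) // playg
```

The second index is computed from the first rather than from startIndex again, which saves one walk but does not change the overall complexity.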
Someone correct me if I'm wrong, but it has to do with Unicode correctness and algorithm complexity. Because of the way Collection is defined, subscripts should be O(1). But to be Unicode-correct, you have to traverse the string, which can be O(n). So the idea is that if we had Int-based subscripts on String, all of a sudden you'd have O(n) behavior on a subscript.
It also has to do with what a "character" really is: Unicode defines code points that combine to form a single grapheme. What happens when you have a String composed of multiple Unicode code points that combine into a single grapheme?
That being said, I'm also in the camp that thinks there should be some easier way to express slicing up a string that doesn't revolve around a ton of index calculation.
An extreme case:
let zalgo = "Ḩ̷͕̺͍͉̇͛́̀͑̍̕͢e̢̢̘̫̺͖͓̥͎̐̈̀̄̚͞ c̵̛̗̯̫̫͓̪̓͛̂͘ͅo̢̡̳͕̰̗̻͇̽̿̈͗̀̐̿̌͠m̢̪͕̮̱̘͇̈̉͗͆̀̕͘e̳͕͓͚͎̪̟̬̰͆̓͑̃̿̊ͅs̵͇̳͓̥̼̠͌͒̎̄͒̅̿͢"
print(zalgo.count) // 8
print(zalgo.utf16.count) // 112
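A smaller case that can be checked by hand: "é" written as e followed by U+0301 COMBINING ACUTE ACCENT is one Character but two Unicode scalars, so a Character offset cannot be mapped to a code-unit offset without walking the string.

```swift
let decomposed = "cafe\u{301}" // "e" followed by U+0301 COMBINING ACUTE ACCENT
let precomposed = "caf\u{E9}"  // U+00E9 LATIN SMALL LETTER E WITH ACUTE

print(decomposed.count)                // 4 Characters (grapheme clusters)
print(decomposed.unicodeScalars.count) // 5 Unicode scalars
print(decomposed.utf8.count)           // 6 UTF-8 code units
print(precomposed.utf8.count)          // 5 UTF-8 code units
print(decomposed == precomposed)       // true: String compares by canonical equivalence
```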
It has. Collections can have opaque indices.
Previously, I had skip/keep, but that looked a bit weird in a subscript on the LHS of an assignment, because it replaces the thing you're "keeping". Before that I had skip/next, which reads better, but is slightly odd in the end-relative case, where it means "next to the left". There probably isn't any single word that is totally clear on its own.
If we can cover the basic functionality (and meet the basic requirements) in a straightforward, consistent way, I don't see a problem adding "convenience" API on top that introduces operators for brevity, or uses Python-style signed offsetting. I just wasn't much in favor of those being the only choices.
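A sketch of what such a convenience layer could look like with Python-style signed offsets: non-negative values count from startIndex, negative ones count back from endIndex. The signed: label and the index(signedOffset:) helper are hypothetical; the label keeps the subscript from colliding with the ordinary Index subscript when Index == Int. As with Python's a[-2:0], an upper bound of 0 means startIndex, not the end.

```swift
extension BidirectionalCollection {
    // Hypothetical helper: k >= 0 counts from startIndex, k < 0 counts back from endIndex.
    func index(signedOffset k: Int) -> Index {
        k >= 0 ? index(startIndex, offsetBy: k) : index(endIndex, offsetBy: k)
    }

    // Half-open slice between two signed offsets, like Python's c[lower:upper].
    subscript(signed lower: Int, _ upper: Int) -> SubSequence {
        self[index(signedOffset: lower) ..< index(signedOffset: upper)]
    }
}

let numbers = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(Array(numbers[signed: 1, -2]))  // [20, 30, 40, 50, 60, 70]
print(Array(numbers[signed: -3, -1])) // [70, 80]
```

These two calls reproduce the skipFirst: 1, skipLast: 2 and skipLast: 1, take: 2 results from the proposal, which suggests the signed form could sit on top of the labeled API rather than replace it.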
I believe the ultimate goal is to be able to complete SE-0132.