Proposal: Python's indexing and slicing


(Amir Michail) #1

Examples:

>>> l=[1,2,3,4,5]
>>> l[-1]
5
>>> l[-2]
4
>>> l[2:4]
[3, 4]
>>> l[2:]
[3, 4, 5]
>>> l[-2:]
[4, 5]
>>> l[:3]
[1, 2, 3]
>>> l[::2]
[1, 3, 5]
>>> l[::]
[1, 2, 3, 4, 5]
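For reference, here is a sketch of how each of the Python expressions above can be written with today's Swift standard library. It assumes Swift 4 or later, since one-sided ranges such as `l[2...]` did not exist when this thread was written. It uses `Array` only, and the variable names are illustrative.

```swift
let l = [1, 2, 3, 4, 5]

// Python l[-1] and l[-2]: count back from the end explicitly.
let last = l[l.count - 1]            // 5 (l.last gives Optional(5))
let secondToLast = l[l.count - 2]    // 4

// Python l[2:4]: a half-open range of indices.
let middle = Array(l[2..<4])         // [3, 4]

// Python l[2:] and l[:3]: one-sided ranges.
let fromTwo = Array(l[2...])         // [3, 4, 5]
let firstThree = Array(l[..<3])      // [1, 2, 3]

// Python l[-2:]: the last two elements.
let lastTwo = Array(l.suffix(2))     // [4, 5]

// Python l[::2]: every second element.
let everyOther = stride(from: 0, to: l.count, by: 2).map { l[$0] }  // [1, 3, 5]

// Python l[::]: a copy of the whole array.
let copy = Array(l[...])             // [1, 2, 3, 4, 5]
```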


(Joe Groff) #2

Accepting negative indices is problematic for two reasons: it imposes runtime overhead in the index operation to check the sign of the index; also, it masks fencepost errors, since if you do foo[m-n] and n is accidentally greater than m, you'll quietly load the wrong element instead of trapping. I'd prefer something like D's `$-n` syntax for explicitly annotating end-relative indexes.
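The sketch below makes the fencepost concern concrete. It uses a hypothetical Python-style accessor: `pythonStyleElement(_:at:)` is an invented helper, not a proposed API. When a computed index `m - n` goes negative by mistake, the accessor silently returns an element counted from the end. Swift's built-in subscript would trap instead.

```swift
// Hypothetical Python-style lookup: a negative position counts back from the end.
// The sign check runs on every access, which is the runtime overhead described above.
func pythonStyleElement<T>(_ array: [T], at i: Int) -> T {
    let resolved = i < 0 ? array.count + i : i
    return array[resolved]  // still traps if resolved falls outside the array
}

let foo = [10, 20, 30, 40, 50]
let m = 1
let n = 2  // bug: n was meant to be at most m

// m - n == -1, so this quietly returns the last element (50) instead of failing.
let value = pythonStyleElement(foo, at: m - n)

// Swift's own subscript, foo[m - n], traps with an index-out-of-range error.
```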

-Joe

···

On Dec 18, 2015, at 4:42 AM, Amir Michail via swift-evolution <swift-evolution@swift.org> wrote:



(Jacob Bandes-Storch) #3

Or perhaps some subscripts with parameter labels, like

extension Array {
    subscript(fromEnd distance: Int) -> Element {
        return self[endIndex - distance]
    }
}

[0, 1, 2][fromEnd: 1] // returns 2
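Jacob's version relies on `Array` having integer indices. Below is a sketch of the same idea generalized to any `BidirectionalCollection`, so it also works on `String`, whose indices are not integers. It assumes Swift 4 or later, where `index(_:offsetBy:)` replaced the `advancedBy` spelling used elsewhere in this thread. The `fromStart:` label is borrowed from later in the thread, and out-of-range offsets are expected to trap.

```swift
extension BidirectionalCollection {
    /// Element `distance` positions after startIndex (0 is the first element).
    subscript(fromStart distance: Int) -> Element {
        return self[index(startIndex, offsetBy: distance)]
    }

    /// Element `distance` positions before endIndex (1 is the last element),
    /// matching the Array-only version above.
    subscript(fromEnd distance: Int) -> Element {
        return self[index(endIndex, offsetBy: -distance)]
    }
}

let word = "swift"
let secondLetter = word[fromStart: 1]   // "w"
let lastLetter = word[fromEnd: 1]       // "t"
let lastNumber = [0, 1, 2][fromEnd: 1]  // 2
```

On non-random-access collections such as `String`, each access walks `distance` steps. That is the hidden-cost concern raised later in the thread.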

Jacob Bandes-Storch

···

On Fri, Dec 18, 2015 at 1:46 PM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 18, 2015, at 4:42 AM, Amir Michail via swift-evolution <swift-evolution@swift.org> wrote:


_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Dave Abrahams) #4

Yes, we already have facilities to do most of what Python can do here, but one major problem IMO is that the “language” of slicing is so non-uniform: we have [a..<b], dropFirst, dropLast, prefix, and suffix. Introducing “$” for this purpose could make it all hang together and also eliminate the “why does it have to be so hard to look at the 2nd character of a string?!” problem. That is, use the identifier “$” (yes, that’s an identifier in Swift) to denote the beginning-or-end of a collection. Thus,

  c[c.startIndex.advancedBy(3)] => c[$+3] // Python: c[3]
  c[c.endIndex.advancedBy(-3)] => c[$-3] // Python: c[-3]
  c.dropFirst(3) => c[$+3...] // Python: c[3:]
  c.dropLast(3) => c[..<$-3] // Python: c[:-3]
  c.prefix(3) => c[..<$+3] // Python: c[:3]
  c.suffix(3) => c[$-3...] // Python: c[-3:]
   
It even has the nice connotation that, “this might be a little more expen$ive than plain indexing” (which it might, for non-random-access collections). I think the syntax is still a bit heavy, not least because of “..<” and “...”, but the direction has potential.

I haven’t had the time to really experiment with a design like this; the community might be able to help by prototyping and using some alternatives. You can do all of this outside the standard library with extensions.
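Here is a minimal sketch of that kind of extension, written for Swift 4 or later rather than the Swift 2 APIs in this thread. It makes several assumptions:

* A hypothetical `Anchor` type stands in for `$`. A named type is used because whether a bare `$` is accepted as an identifier has changed across Swift versions.
* `Anchor.start + n` and `Anchor.end - n` denote positions.
* `..<` is overloaded to pair two anchors.
* Element access traps when out of range.
* Slicing clamps ("forgiving" slicing).

All names are illustrative.

```swift
/// Hypothetical stand-in for the proposed "$": a position written as an
/// offset from the start or the end of whatever collection it is applied to.
struct Anchor {
    var fromEnd: Bool
    var offset: Int

    static let start = Anchor(fromEnd: false, offset: 0)
    static let end = Anchor(fromEnd: true, offset: 0)

    static func + (a: Anchor, n: Int) -> Anchor {
        return Anchor(fromEnd: a.fromEnd, offset: a.offset + n)
    }

    static func - (a: Anchor, n: Int) -> Anchor {
        return Anchor(fromEnd: a.fromEnd, offset: a.offset - n)
    }

    /// Pairs two anchors; nothing is resolved until the range meets a collection.
    static func ..< (lower: Anchor, upper: Anchor) -> AnchorRange {
        return AnchorRange(lower: lower, upper: upper)
    }
}

struct AnchorRange {
    var lower: Anchor
    var upper: Anchor
}

extension BidirectionalCollection {
    /// Resolves an anchor to an index clamped into startIndex...endIndex.
    func clampedIndex(_ a: Anchor) -> Index {
        if a.fromEnd {
            guard a.offset < 0 else { return endIndex }
            return index(endIndex, offsetBy: a.offset, limitedBy: startIndex) ?? startIndex
        } else {
            guard a.offset > 0 else { return startIndex }
            return index(startIndex, offsetBy: a.offset, limitedBy: endIndex) ?? endIndex
        }
    }

    /// Element access is strict: an out-of-range anchor is expected to trap.
    subscript(a: Anchor) -> Element {
        let base = a.fromEnd ? endIndex : startIndex
        return self[index(base, offsetBy: a.offset)]
    }

    /// Slicing is forgiving: both ends are clamped, and an inverted range
    /// produces an empty slice instead of trapping.
    subscript(r: AnchorRange) -> SubSequence {
        let upper = clampedIndex(r.upper)
        let lower = Swift.min(clampedIndex(r.lower), upper)
        return self[lower..<upper]
    }
}

let c = [1, 2, 3, 4, 5]
let atThree = c[Anchor.start + 3]                            // 4, like Python c[3]
let threeFromEnd = c[Anchor.end - 3]                         // 3, like Python c[-3]
let dropFirst3 = Array(c[Anchor.start + 3 ..< Anchor.end])   // [4, 5], like c.dropFirst(3)
let dropLast3 = Array(c[Anchor.start ..< Anchor.end - 3])    // [1, 2], like c.dropLast(3)
let prefix3 = Array(c[Anchor.start ..< Anchor.start + 3])    // [1, 2, 3], like c.prefix(3)
let suffix3 = Array(c[Anchor.end - 3 ..< Anchor.end])        // [3, 4, 5], like c.suffix(3)
let clamped = Array(c[Anchor.start + 10 ..< Anchor.end])     // [], no trap
let word = "hello"
let secondChar = word[Anchor.start + 1]                      // "e"
```

The sketch uses explicit `Anchor.end` as the upper bound instead of one-sided ranges. This keeps the overload set small.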

-Dave

···

On Dec 18, 2015, at 1:46 PM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 18, 2015, at 4:42 AM, Amir Michail via swift-evolution <swift-evolution@swift.org> wrote:



(Paul Ossenbruggen) #5

I would like to avoid what you currently have to do for iterating a subcontainer.

for a in container[0..<container.count-4] {
  // do something.
}

The slicing syntax would certainly help in these common situations. Maybe there are easy ways that I am not aware of.
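For this particular loop, the standard library already has a shorter spelling: `dropLast(_:)` returns a slice without the trailing elements. Like `prefix` and `suffix`, it treats its argument as a maximum. It therefore does not trap on short collections the way an explicit `0..<container.count - 4` range would. Here is a small sketch with made-up contents:

```swift
let container = [1, 2, 3, 4, 5, 6, 7]

var visited: [Int] = []
// Iterate over everything except the last four elements.
for a in container.dropLast(4) {
    visited.append(a)  // "do something"
}
// visited == [1, 2, 3]

// With fewer than four elements, dropLast(4) is simply empty, whereas
// short[0..<short.count - 4] would trap on an invalid range.
let short = [1, 2]
let nothing = Array(short.dropLast(4))  // []
```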

- Paul

···

On Dec 18, 2015, at 2:39 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 18, 2015, at 1:46 PM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 18, 2015, at 4:42 AM, Amir Michail via swift-evolution <swift-evolution@swift.org> wrote:



(Lily Ballard) #6

Interesting idea.

One downside is it masks potentially O(N) operations
(ForwardIndex.advancedBy()) behind the + operator, which is typically
assumed to be an O(1) operation. Also, the $+3 syntax suggests that it
requires there to be at least 3 elements in the sequence, but
prefix()/suffix()/dropFirst/etc. all take maximum counts, so they
operate on sequences of fewer elements.
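The distinction can be seen directly with standard-library calls. The existing slicing methods treat their argument as an upper bound on the count, while element subscripting requires the position to exist. Here is a small sketch, assuming Swift 4 or later:

```swift
let small = [1, 2]

// prefix/suffix/dropFirst/dropLast accept counts larger than the collection.
let p = Array(small.prefix(3))      // [1, 2]
let s = Array(small.suffix(3))      // [1, 2]
let df = Array(small.dropFirst(3))  // []
let dl = Array(small.dropLast(3))   // []

// Element access has no such leeway: small[small.startIndex + 3] would trap.
```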

There's also some confusion with using $ for both start and end. What
if I say c[$..<$]? We'd have to infer from position that the first $ is
the start and the second $ is the end, but then what about
c[$+n..<$+m]? We can't treat the usage of + as meaning "from start"
because the argument might be negative. And if we use the overall sign
of the operation/argument together, then the expression `$+n` could
mean from start or from end, which comes right back to the problem with
Python syntax.

I think Jacob's idea has some promise though:

c[c.startIndex.advancedBy(3)] => c[fromStart: 3]
c[c.endIndex.advancedBy(-3)] => c[fromEnd: 3]

But naming the slice operations is a little trickier. We could actually
just go ahead and re-use the existing method names for those:

c.dropFirst(3) => c[dropFirst: 3]
c.dropLast(3) => c[dropLast: 3]
c.prefix(3) => c[prefix: 3]
c.suffix(3) => c[suffix: 3]
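These labeled subscripts could be prototyped outside the standard library as thin wrappers over the existing methods. That keeps the "maximum count" semantics unchanged. The sketch below assumes exactly the labels written above, which are illustrative and not an existing API:

```swift
extension Collection {
    // Each subscript forwards to the method of the same name; the explicit
    // SubSequence return type selects the Collection overloads.
    subscript(dropFirst n: Int) -> SubSequence { return dropFirst(n) }
    subscript(dropLast n: Int) -> SubSequence { return dropLast(n) }
    subscript(prefix n: Int) -> SubSequence { return prefix(n) }
    subscript(suffix n: Int) -> SubSequence { return suffix(n) }
}

let c = [1, 2, 3, 4, 5]
let withoutFirst3 = Array(c[dropFirst: 3])  // [4, 5]
let withoutLast3 = Array(c[dropLast: 3])    // [1, 2]
let first3 = Array(c[prefix: 3])            // [1, 2, 3]
let last3 = Array(c[suffix: 3])             // [3, 4, 5]
```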

That's not so compelling, since we already have the methods, but I
suppose it makes sense if you want to try and make all slice-producing
methods use subscript syntax (which I have mixed feelings about). But
the [fromStart:] and [fromEnd:] subscripts seem useful.

-Kevin Ballard

···

On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution wrote:



(Dennis Lysenko) #7

Dave, perhaps we could use "^" as an anchor point for the start index and $
as the anchor point for the end index? It's familiar to anyone who knows a
bit of regex, and to all vim users. My main worry would be that ^ is already
the infix XOR operator.

···

On Fri, Dec 18, 2015 at 5:43 PM Paul Ossenbruggen via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 18, 2015, at 2:39 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 18, 2015, at 1:46 PM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 18, 2015, at 4:42 AM, Amir Michail via swift-evolution <swift-evolution@swift.org> wrote:



(Dave Abrahams) #8

Yes, we already have facilities to do most of what Python can do here, but one major problem IMO is that the “language” of slicing is so non-uniform: we have [a..<b], dropFirst, dropLast, prefix, and suffix. Introducing “$” for this purpose could make it all hang together and also eliminate the “why does it have to be so hard to look at the 2nd character of a string?!” problem. That is, use the identifier “$” (yes, that’s an identifier in Swift) to denote the beginning-or-end of a collection. Thus,

  c[c.startIndex.advancedBy(3)] => c[$+3] // Python: c[3]
  c[c.endIndex.advancedBy(-3)] => c[$-3] // Python: c[-3]

  c.dropFirst(3) => c[$+3...] // Python: c[3:]
  c.dropLast(3) => c[..<$-3] // Python: c[:-3]
  c.prefix(3) => c[..<$+3] // Python: c[:3]
  c.suffix(3) => c[$-3...] // Python: c[-3:]

It even has the nice connotation that, “this might be a little more expen$ive than plain indexing” (which it might, for non-random-access collections). I think the syntax is still a bit heavy, not least because of “..<” and “...”, but the direction has potential.

I haven’t had the time to really experiment with a design like this; the community might be able to help by prototyping and using some alternatives. You can do all of this outside the standard library with extensions.

Interesting idea.

One downside is it masks potentially O(N) operations (ForwardIndex.advancedBy()) behind the + operator, which is typically assumed to be an O(1) operation.

Yeah, but the “$” is sufficiently unusual that it doesn’t bother me too much.

Also, the $+3 syntax suggests that it requires there to be at least 3 elements in the sequence, but prefix()/suffix()/dropFirst/etc. all take maximum counts, so they operate on sequences of fewer elements.

For indexing, $+3 would make that requirement. For slicing, it wouldn’t. I’m not sure why you say something about the syntax suggests exceeding bounds would be an error.

There's also some confusion with using $ for both start and end. What if I say c[$..<$]? We'd have to infer from position that the first $ is the start and the second $ is the end, but then what about c[$+n..<$+m]? We can't treat the usage of + as meaning "from start" because the argument might be negative. And if we use the overall sign of the operation/argument together, then the expression `$+n` could mean from start or from end, which comes right back to the problem with Python syntax.

There’s a problem with Python syntax? I’m guessing you mean that c[a:b] can have very different interpretations depending on whether a and b are positive or negative?

First of all, I should say: that doesn’t really bother me. The 99.9% use case for this operation uses literal constants for the offsets, and I haven’t heard of it causing confusion for Python programmers. That said, if we wanted to address it, we could easily require n and m above to be literals, rather than Ints (which incidentally guarantees it’s an O(1) operation). That has upsides and downsides of course.

I think Jacob's idea has some promise though:

c[c.startIndex.advancedBy(3)] => c[fromStart: 3]
c[c.endIndex.advancedBy(-3)] => c[fromEnd: 3]

But naming the slice operations is a little trickier. We could actually just go ahead and re-use the existing method names for those:

c.dropFirst(3) => c[dropFirst: 3]
c.dropLast(3) => c[dropLast: 3]
c.prefix(3) => c[prefix: 3]
c.suffix(3) => c[suffix: 3]

That's not so compelling, since we already have the methods, but I suppose it makes sense if you want to try and make all slice-producing methods use subscript syntax (which I have mixed feelings about).

Once we get efficient in-place slice mutation (via slice addressors), it becomes a lot more compelling, IMO. But I still don’t find the naming terribly clear, and I don’t love that one needs to combine two subscript operations in order to drop the first and last element or take just elements 3..<5.

Even if we need separate symbols for “start” and “end” (e.g. using “$” for both might just be too confusing for people in the end, even if it works otherwise), I still think a generalized form that allows ranges to be used everywhere for slicing is going to be much easier to understand than this hodgepodge of words we use today.

But the [fromStart:] and [fromEnd:] subscripts seem useful.

Yeah… I really want a unified solution that covers slicing as well as offset indexing.

-Dave

···

On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:
On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution wrote:


(Dave Abrahams) #9

Dave, perhaps we could use "^" as an anchor point for the start index and $ as the anchor point for the end index? It's familiar to anyone who knows a bit of regex, and to all vim users. My main worry would be that ^ is already the infix XOR operator.

We could.

Downsides:
• it's a needless additional sigil
• it requires a language extension to express x[^ + 3] unless we fake it with prefix operators ^- and ^+.
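To illustrate the second bullet: a bare `^` cannot appear by itself in an index expression, but a two-character prefix operator can be declared today. The minimal sketch below assumes a hypothetical `^+` operator and a hypothetical `FromStart` wrapper type. Neither is part of Swift.

```swift
// Hypothetical prefix operator: ^+n means "n positions after the start".
prefix operator ^+

struct FromStart {
    var offset: Int
}

prefix func ^+ (n: Int) -> FromStart {
    return FromStart(offset: n)
}

extension Collection {
    // Expected to trap if the collection has fewer than offset + 1 elements.
    subscript(position: FromStart) -> Element {
        return self[index(startIndex, offsetBy: position.offset)]
    }
}

let x = [10, 20, 30, 40]
let elem = x[^+2]               // 30
let word = "swift"
let secondLetter = word[^+1]    // "w"
```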

-Dave

···

On Dec 19, 2015, at 8:27 PM, Dennis Lysenko <dennis.s.lysenko@gmail.com> wrote:
On Fri, Dec 18, 2015 at 5:43 PM Paul Ossenbruggen via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 18, 2015, at 2:39 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 18, 2015, at 1:46 PM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 18, 2015, at 4:42 AM, Amir Michail via swift-evolution <swift-evolution@swift.org> wrote:


(Lily Ballard) #10

Yes, we already have facilities to do most of what Python can do
here, but one major problem IMO is that the “language” of slicing is
so non-uniform: we have [a..<b], dropFirst, dropLast, prefix, and
suffix. Introducing “$” for this purpose could make it all hang
together and also eliminate the “why does it have to be so hard to
look at the 2nd character of a string?!” problem. That is, use the
identifier “$” (yes, that’s an identifier in Swift) to denote the
beginning-or-end of a collection. Thus,

c[c.startIndex.advancedBy(3)] => c[$+3] // Python: c[3]
c[c.endIndex.advancedBy(-3)] => c[$-3] // Python: c[-3]

c.dropFirst(3) => c[$+3...] // Python: c[3:]
c.dropLast(3) => c[..<$-3] // Python: c[:-3]
c.prefix(3) => c[..<$+3] // Python: c[:3]
c.suffix(3) => c[$-3...] // Python: c[-3:]

It even has the nice connotation that, “this might be a little more
expen$ive than plain indexing” (which it might, for non-random-
access collections). I think the syntax is still a bit heavy, not
least because of “..<” and “...”, but the direction has potential.

I haven’t had the time to really experiment with a design like this;
the community might be able to help by prototyping and using some
alternatives. You can do all of this outside the standard library
with extensions.

Interesting idea.

One downside is it masks potentially O(N) operations
(ForwardIndex.advancedBy()) behind the + operator, which is typically
assumed to be an O(1) operation.

Yeah, but the “$” is sufficiently unusual that it doesn’t bother me
too much.

Also, the $+3 syntax suggests that it requires there to be at least 3
elements in the sequence, but prefix()/suffix()/dropFirst/etc. all
take maximum counts, so they operate on sequences of fewer elements.

For indexing, $+3 would make that requirement. For slicing, it
wouldn’t. I’m not sure why you say something about the _syntax_
suggests exceeding bounds would be an error.

Because there's no precedent for + behaving like a saturating addition,
not in Swift and not, to my knowledge, anywhere else either. The closest
example that comes to mind is floating-point numbers eventually ending
up at Infinity, but that's not really saturating addition, that's just a
consequence of Infinity + anything == Infinity. Nor do I think we should
be establishing precedent of using + for saturating addition, because
that would be surprising to people. Additionally, I don't think adding a
$ to an array slice expression should result in a behavioral difference,
e.g. array[3..<array.endIndex] and array[$+3..<$] should behave the same.

There's also some confusion with using $ for both start and end. What
if I say c[$..<$]? We'd have to infer from position that the first $
is the start and the second $ is the end, but then what about
c[$+n..<$+m]? We can't treat the usage of + as meaning "from start"
because the argument might be negative. And if we use the overall
sign of the operation/argument together, then the expression `$+n`
could mean from start or from end, which comes right back to the
problem with Python syntax.

There’s a problem with Python syntax? I’m guessing you mean that
c[a:b] can have very different interpretations depending on whether a
and b are positive or negative?

Exactly.

First of all, I should say: that doesn’t really bother me. The 99.9%
use case for this operation uses literal constants for the offsets,
and I haven’t heard of it causing confusion for Python programmers.
That said, if we wanted to address it, we could easily require n and m
above to be literals, rather than Ints (which incidentally guarantees
it’s an O(1) operation). That has upsides and downsides of course.

I don't think we should add this feature in any form if it only
supports literals.

I think Jacob's idea has some promise though:

c[c.startIndex.advancedBy(3)] => c[fromStart: 3]
c[c.endIndex.advancedBy(-3)] => c[fromEnd: 3]

But naming the slice operations is a little trickier. We could
actually just go ahead and re-use the existing method names
for those:

c.dropFirst(3) => c[dropFirst: 3]
c.dropLast(3) => c[dropLast: 3]
c.prefix(3) => c[prefix: 3]
c.suffix(3) => c[suffix: 3]

That's not so compelling, since we already have the methods, but I
suppose it makes sense if you want to try and make all slice-
producing methods use subscript syntax (which I have mixed feelings
about).

Once we get efficient in-place slice mutation (via slice addressors),
it becomes a lot more compelling, IMO. But I still don’t find the
naming terribly clear, and I don’t love that one needs to combine two
subscript operations in order to drop the first and last element or
take just elements 3..<5.

You can always add more overloads, such as

c[dropFirst: 3, dropLast: 5]

but I admit that there's a bunch of combinations here that would need
to be added.
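Here is a sketch of that combined overload, built only on the existing `dropFirst(_:)` and `dropLast(_:)` methods. The two-label subscript is hypothetical. Both counts remain maxima, so an over-large request yields an empty slice rather than a trap.

```swift
extension Collection {
    // Drops `front` elements from the start and `back` from the end.
    subscript(dropFirst front: Int, dropLast back: Int) -> SubSequence {
        let trimmedFront: SubSequence = dropFirst(front)
        return trimmedFront.dropLast(back)
    }
}

let c = Array(1...10)
let inner = Array(c[dropFirst: 3, dropLast: 5])   // [4, 5]
let none = Array(c[dropFirst: 8, dropLast: 5])    // []
```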

My concern over trying to make it easier to take elements 3..<5 is that
incrementing indexes is verbose for a reason, and adding a feature that
makes it really easy to index into any collection by using integers is a
bad idea as it will hide O(N) operations behind code that looks like
O(1). And hiding these operations makes it really easy to accidentally
turn an O(N) algorithm into an O(N^2) algorithm.

Even if we need separate symbols for “start” and “end” (e.g. using “$”
for both might just be too confusing for people in the end, even if it
works otherwise), I still think a generalized form that allows ranges
to be used everywhere for slicing is going to be much easier to
understand than this hodgepodge of words we use today.

I'm tempted to say that if we do this, we should use two different
sigils, and more importantly we should not use + and - but instead use
methods on the sigils like advancedBy(), as if the sigils were literally
placeholders for the start/end index. That way we won't write code that
looks O(1) when it's not. For example:

col[^.advancedBy(3)..<$]

Although we'd need to revisit the names a little, because $.advancedBy(-3)
is a bit odd when we know that $ can't ever take a non-negative
number for that.

Or maybe we should just use $ instead as a token that means "the
collection being indexed", so you'd actually say something like

col[$.startIndex.advancedBy(3)..<$.startIndex.advancedBy(5)]

This solves the problem of subscripting a collection without having to
store it in a local variable, without discarding any of the intentional
index overhead. Of course, if the goal is to make index operations more
concise this doesn't really help much, but my argument here is that it's
hard to cut down on the verbosity without hiding O(N) operations.
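The idea of "a token that means the collection being indexed" can be approximated today with a subscript that takes a closure. The closure's `$0` then plays the role of `$`, while the index arithmetic stays explicit. The sketch below works under that assumption; the `using:` label and `makeCollection()` are invented for illustration.

```swift
extension Collection {
    // Hypothetical: the closure receives the collection itself and returns
    // the range of indices to slice, so no local variable is needed.
    subscript(using range: (Self) -> Range<Index>) -> SubSequence {
        return self[range(self)]
    }
}

func makeCollection() -> [Int] {
    return [0, 1, 2, 3, 4, 5, 6]
}

let slice = makeCollection()[using: { $0.index($0.startIndex, offsetBy: 3)..<$0.index($0.startIndex, offsetBy: 5) }]
// Array(slice) == [3, 4]
```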

-Kevin Ballard

···

On Mon, Dec 21, 2015, at 11:56 AM, Dave Abrahams wrote:

On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:
On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution wrote:

But the [fromStart:] and [fromEnd:] subscripts seem useful.

Yeah… I really want a unified solution that covers slicing as well as
offset indexing.

-Dave


(Dave Abrahams) #11

Yes, we already have facilities to do most of what Python can do here, but one major problem IMO is that the “language” of slicing is so non-uniform: we have [a..<b], dropFirst, dropLast, prefix, and suffix. Introducing “$” for this purpose could make it all hang together and also eliminate the “why does it have to be so hard to look at the 2nd character of a string?!” problem. That is, use the identifier “$” (yes, that’s an identifier in Swift) to denote the beginning-or-end of a collection. Thus,

  c[c.startIndex.advancedBy(3)] => c[$+3] // Python: c[3]
  c[c.endIndex.advancedBy(-3)] => c[$-3] // Python: c[-3]

  c.dropFirst(3) => c[$+3...] // Python: c[3:]
  c.dropLast(3) => c[..<$-3] // Python: c[:-3]
  c.prefix(3) => c[..<$+3] // Python: c[:3]
  c.suffix(3) => c[$-3...] // Python: c[-3:]

It even has the nice connotation that, “this might be a little more expen$ive than plain indexing” (which it might, for non-random-access collections). I think the syntax is still a bit heavy, not least because of “..<” and “...”, but the direction has potential.

I haven’t had the time to really experiment with a design like this; the community might be able to help by prototyping and using some alternatives. You can do all of this outside the standard library with extensions.

Interesting idea.

One downside is it masks potentially O(N) operations (ForwardIndex.advancedBy()) behind the + operator, which is typically assumed to be an O(1) operation.

Yeah, but the “$” is sufficiently unusual that it doesn’t bother me too much.

Also, the $+3 syntax suggests that it requires there to be at least 3 elements in the sequence, but prefix()/suffix()/dropFirst/etc. all take maximum counts, so they operate on sequences of fewer elements.

For indexing, $+3 would make that requirement. For slicing, it wouldn’t. I’m not sure why you say something about the syntax suggests exceeding bounds would be an error.

Because there's no precedent for + behaving like a saturating addition, not in Swift and not, to my knowledge, anywhere else either. The closest example that comes to mind is floating-point numbers eventually ending up at Infinity, but that's not really saturating addition, that's just a consequence of Infinity + anything == Infinity. Nor do I think we should be establishing precedent of using + for saturating addition, because that would be surprising to people.

To call this “saturating addition” is an…interesting…interpretation. I don’t view it that way at all. The “saturation,” if there is any, happens as part of subscripting. You don’t even know what the “saturation limit” is until you couple the range expression with the collection.

In my view, the addition is part of an EDSL that represents a notional position offset from the start or end, then the subscript operation forgivingly trims these offsets as needed.

Additionally, I don't think adding a $ to an array slice expression should result in a behavioral difference, e.g. array[3..<array.endIndex] and array[$+3..<$] should behave the same.

I see your point, but don’t (necessarily) agree with you there. “$” here is used as an indicator of several things, including not-necessarily-O(1) and forgiving slicing. We could introduce a label just to handle that:

array[forgivingAndNotO1: $+3..<$]

but it doesn’t look like a win to me.

There's also some confusion with using $ for both start and end. What if I say c[$..<$]? We'd have to infer from position that the first $ is the start and the second $ is the end, but then what about c[$+n..<$+m]? We can't treat the usage of + as meaning "from start" because the argument might be negative. And if we use the overall sign of the operation/argument together, then the expression `$+n` could mean from start or from end, which comes right back to the problem with Python syntax.

There’s a problem with Python syntax? I’m guessing you mean that c[a:b] can have very different interpretations depending on whether a and b are positive or negative?

Exactly.

First of all, I should say: that doesn’t really bother me. The 99.9% use case for this operation uses literal constants for the offsets, and I haven’t heard of it causing confusion for Python programmers. That said, if we wanted to address it, we could easily require n and m above to be literals, rather than Ints (which incidentally guarantees it’s an O(1) operation). That has upsides and downsides of course.

I don't think we should add this feature in any form if it only supports literals.
  

I think Jacob's idea has some promise though:

c[c.startIndex.advancedBy(3)] => c[fromStart: 3]
c[c.endIndex.advancedBy(-3)] => c[fromEnd: 3]
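Jacob's labeled subscripts can already be written as extensions. A sketch assuming current Swift naming (index(_:offsetBy:) rather than advancedBy(_:)); fromStart: and fromEnd: are hypothetical, not standard-library API, and both trap on an out-of-bounds offset, as ordinary indexing does:

```swift
extension Collection {
    /// Hypothetical: the element `distance` positions after `startIndex`.
    subscript(fromStart distance: Int) -> Element {
        return self[index(startIndex, offsetBy: distance)]
    }
}

extension BidirectionalCollection {
    /// Hypothetical: the element `distance` positions before `endIndex`,
    /// so `fromEnd: 1` is the last element (Python's `c[-1]`).
    subscript(fromEnd distance: Int) -> Element {
        return self[index(endIndex, offsetBy: -distance)]
    }
}

let word = "swift"
print(word[fromStart: 1])    // w
print(word[fromEnd: 1])      // t
print([0, 1, 2][fromEnd: 1]) // 2
```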

But naming the slice operations is a little trickier. We could actually just go ahead and re-use the existing method names for those:

c.dropFirst(3) => c[dropFirst: 3]
c.dropLast(3) => c[dropLast: 3]
c.prefix(3) => c[prefix: 3]
c.suffix(3) => c[suffix: 3]
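If one did want subscript spellings for the existing methods, they could simply forward to them. A hypothetical sketch (none of these labeled subscripts exist in the standard library):

```swift
extension Collection {
    // Hypothetical labeled subscripts that forward to the existing methods,
    // so every slicing operation is spelled with subscript syntax.
    subscript(dropFirst n: Int) -> SubSequence { return dropFirst(n) }
    subscript(dropLast n: Int) -> SubSequence { return dropLast(n) }
    subscript(prefix n: Int) -> SubSequence { return prefix(n) }
    subscript(suffix n: Int) -> SubSequence { return suffix(n) }
}

let c = [1, 2, 3, 4, 5, 6]
print(Array(c[dropFirst: 3])) // [4, 5, 6]
print(Array(c[dropLast: 3]))  // [1, 2, 3]
print(Array(c[prefix: 3]))    // [1, 2, 3]
print(Array(c[suffix: 3]))    // [4, 5, 6]
```

Since they forward, these inherit the methods' "maximum count" behavior: c[prefix: 10] returns all six elements.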

That's not so compelling, since we already have the methods, but I suppose it makes sense if you want to try and make all slice-producing methods use subscript syntax (which I have mixed feelings about).

Once we get efficient in-place slice mutation (via slice addressors), it becomes a lot more compelling, IMO. But I still don’t find the naming terribly clear, and I don’t love that one needs to combine two subscript operations in order to drop the first and last element or take just elements 3..<5.

You can always add more overloads, such as

c[dropFirst: 3, dropLast: 5]

but I admit that there's a bunch of combinations here that would need to be added.
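An overload like c[dropFirst: 3, dropLast: 5] composes the two existing methods. A hypothetical sketch, relying on the standard library's guarantee that a SubSequence's own SubSequence is the same type:

```swift
extension Collection {
    /// Hypothetical two-label overload: drops `frontCount` elements from the front
    /// and `backCount` from the back in a single subscript.
    subscript(dropFirst frontCount: Int, dropLast backCount: Int) -> SubSequence {
        let trimmedFront: SubSequence = dropFirst(frontCount)
        // SubSequence.SubSequence == SubSequence, so the result type lines up.
        return trimmedFront.dropLast(backCount)
    }
}

let digits = Array(0..<10)
print(Array(digits[dropFirst: 3, dropLast: 5])) // [3, 4]
print(Array(digits[dropFirst: 8, dropLast: 5])) // []
```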

My point is that we have an English language soup that doesn’t compose naturally. Slicing in Python is much more elegant and composes well. If we didn’t currently have 6 separate methods (7 including subscript for index-based slicing) for handling this, that need to be separately documented and understood, I wouldn’t be so eager to replace the words with an EDSL, but in this case IMO it is an overall simplification.

My concern over trying to make it easier to take elements 3..<5 is that incrementing indexes is verbose for a reason, and adding a feature that makes it really easy to index into any collection by using integers is a bad idea as it will hide O(N) operations behind code that looks like O(1). And hiding these operations makes it really easy to accidentally turn an O(N) algorithm into an O(N^2) algorithm.
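To make the O(N^2) concern concrete: String is not a random-access collection, so offsetting an index from startIndex costs time proportional to the offset. An illustrative comparison; both loops collect the same characters:

```swift
let text = "hello, world"

// Looks like O(1) indexing, but each index(_:offsetBy:) walks from startIndex,
// so this loop is O(n^2) overall.
var viaOffsets: [Character] = []
for offset in 0..<text.count {
    viaOffsets.append(text[text.index(text.startIndex, offsetBy: offset)])
}

// Advancing a single index one step at a time keeps the loop O(n).
var viaIndices: [Character] = []
var position = text.startIndex
while position != text.endIndex {
    viaIndices.append(text[position])
    position = text.index(after: position)
}

print(viaOffsets == viaIndices) // true
```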

As I’ve said, I consider the presence of “$” to be enough of an indicator that something co$tly is happening, though I’m open to other ways of indicating it. I’m trying to strike a balance between “rigorous” and “easy to use,” here. Remember that Swift has to work in playgrounds and for beginning programmers, too. I am likewise unsatisfied with the (lack of) ease-of-use of String as well (e.g. for lexing and parsing tasks), and have made improving it a priority for Swift 3. I view fixing the slicing interface as part of that job.

Even if we need separate symbols for “start” and “end” (e.g. using “$” for both might just be too confusing for people in the end, even if it works otherwise), I still think a generalized form that allows ranges to be used everywhere for slicing is going to be much easier to understand than this hodgepodge of words we use today.

I'm tempted to say that if we do this, we should use two different sigils, and more importantly we should not use + and - but instead use methods on the sigils like advancedBy(), as if the sigils were literally placeholders for the start/end index. That way we won't write code that looks O(1) when it's not. For example:

col[^.advancedBy(3)..<$]

Although we'd need to revisit the names a little, because $.advancedBy(-3) is a bit odd when we know that $ can never be advanced by a positive number.

Or maybe we should just use $ instead as a token that means "the collection being indexed", so you'd actually say something like

col[$.startIndex.advancedBy(3)..<$.startIndex.advancedBy(5)]

I really like that direction, but I don’t think it does enough to solve the ease-of-use problem; I still think the result looks and feels horrible compared to Python for the constituencies mentioned above.

I briefly implemented this syntax, which was intended to suggest repeated incrementation:

  col.startIndex++3 // col.startIndex.advancedBy(3)

I don’t think that is viable, especially now that we’ve dropped “++” and “--”. But this syntax

  col[$.start⛄️3..<$.start⛄️5]

begins to be interesting for some definition of ⛄️.

This solves the problem of subscripting a collection without having to store it in a local variable, without discarding any of the intentional index overhead. Of course, if the goal is to make index operations more concise this doesn't really help much, but my argument here is that it's hard to cut down on the verbosity without hiding O(N) operations.
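Without any new syntax, a hypothetical helper that hands the collection to a closure comes close to Kevin's "$ as the collection being indexed" idea, with the closure's $0 playing that role. A sketch; slice(_:) and makeWord() are made-up names for illustration:

```swift
extension Collection {
    /// Hypothetical helper: `bounds` receives the collection itself, so a range can be
    /// computed from a temporary without first storing it in a local variable.
    func slice(_ bounds: (Self) -> Range<Index>) -> SubSequence {
        return self[bounds(self)]
    }
}

func makeWord() -> String { return "swift" }

// `$0` stands for the collection being indexed; the index overhead stays explicit.
let middle = makeWord().slice { $0.index($0.startIndex, offsetBy: 3)..<$0.index($0.startIndex, offsetBy: 5) }
print(middle) // ft
```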

That ship has already sailed somewhat, because e.g. every Collection has to have a count property, which can be O(N). But I still like to uphold it where possible. I just don’t think the combination of “+” and “$” necessarily has such a strong O(1) connotation… especially because the precedent for seeing those symbols together is regexps.

-Kevin Ballard

But the [fromStart:] and [fromEnd:] subscripts seem useful.

Yeah… I really want a unified solution that covers slicing as well as offset indexing.

-Dave

-Dave

···

On Dec 21, 2015, at 1:51 PM, Kevin Ballard <kevin@sb.org> wrote:
On Mon, Dec 21, 2015, at 11:56 AM, Dave Abrahams wrote:

On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:
On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution wrote:


(Lily Ballard) #12

I could definitely see this working given an appropriate operator. I'm
opposed to + because I think that the notion of addition is strongly
tied to O(1) behavior, and the precedent in Swift is for addition to
trap if it goes out of bounds instead of to be saturating (or however
you want to describe that behavior).

There is a potential concern here though which is that some operations
make sense as "saturating" slices (e.g. the offsets are trimmed to be
in-bounds) and some would likely want to trap for out of bounds. This
would be the difference between the advancedBy(_:) and
advancedBy(_:limit:) methods. This is one reason why I suggested using $
merely as a stand-in for the collection, so the existing behavior around
advancedBy(_:) vs advancedBy(_:limit:) would apply. More generally, any
slicing operation that's the equivalent of prefix(), dropFirst(), or
dropLast() would likely want the "saturating" behavior, but any slicing
operation that's working with explicit offsets and merely wants to make
it more concise would want to trap if the offsets go out of bounds. This
could be represented as two different operators (if the trapping
operator is ⛄️ perhaps the "saturating" one could even be &⛄️, just as
&+ is the non-trapping version of +).
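For reference, today's Swift spells that pair index(_:offsetBy:) and index(_:offsetBy:limitedBy:); the latter returns nil rather than a clamped index when the limit would be passed, so a saturating flavor needs an explicit fallback. A sketch of all three behaviors; saturatingIndex(_:offsetBy:) is a hypothetical helper:

```swift
extension Collection {
    /// Hypothetical saturating offset: stops at `endIndex` instead of trapping.
    /// Assumes a non-negative `distance`.
    func saturatingIndex(_ i: Index, offsetBy distance: Int) -> Index {
        return index(i, offsetBy: distance, limitedBy: endIndex) ?? endIndex
    }
}

let s = "abc"

// Trapping flavor: in bounds here; an offset of 10 would be a runtime error.
let third = s.index(s.startIndex, offsetBy: 2)

// Checked flavor: nil when the offset would pass the limit.
let checked = s.index(s.startIndex, offsetBy: 10, limitedBy: s.endIndex)

// Saturating flavor: clamps to the end.
let clamped = s.saturatingIndex(s.startIndex, offsetBy: 10)

print(s[third])              // c
print(checked == nil)        // true
print(clamped == s.endIndex) // true
```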

As for operator choice, my first idea was to use ~> (and I know it's
already taken, but that's an internal implementation detail and could be
changed), but when I tried writing out the 3..<5 example it looks weird
to have > be in the operator that's used with ..<, because that becomes
col[$.start~>3..<$.start~>5]. So I think something else is more
appropriate. Heck, the ++ suggestion you offered has promise, especially
now that we've dropped ++ and --.

On another note, I'm tempted to say that we should use $start and $end
instead of $.start and $.end. The compiler doesn't currently allow this,
because it expects a number after the $, but I see no reason why we
can't relax that rule and allow $start to be a valid token. The benefit
of this approach is it frees up $ to be used by third-party code (such
as in the older thread about rebinding `self` for DSLs where I suggested
that a block-based API can use $ as the parameter name so code would say
something like `$.expect(foo).to(.equal(bar))`).

-Kevin Ballard

···

On Mon, Dec 21, 2015, at 07:29 PM, Dave Abrahams wrote:



(Donnacha Oisín Kidney) #13

Why not make the “forgiving” version the default? I mean, the majority of python-style composable slicing would be happening on arrays and array slices, for which there’s no performance overhead, and the forgiving version would seem to suit the “safe-by-default” philosophy. I’ve seen mistakes like this:

let ar = [1, 2, 3, 4, 5]
let arSlice = ar[2..<5]
arSlice[1]

on a few occasions, for instance. I would think something like this:

let ar = [0, 1, 2, 3, 4, 5]

let arSlice = ar[2...] // [2, 3, 4, 5]
arSlice[..<3] // [2, 3, 4]
arSlice[...3] // [2, 3, 4, 5]
arSlice[direct: 2] // 2
arSlice[0] // 2

Would be what was expected from most programmers learning Swift, while leaving the unforgiving option open to those who need it.
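The mistake in the first snippet comes from slices sharing their base array's indices. An illustration in current Swift, with the trapping line commented out:

```swift
let ar = [1, 2, 3, 4, 5]
let arSlice = ar[2..<5] // elements [3, 4, 5]

// The slice keeps the base array's indices 2, 3, 4, so
// `arSlice[1]` is out of bounds and traps at run time:
// _ = arSlice[1]

print(arSlice.startIndex)          // 2
print(arSlice[arSlice.startIndex]) // 3
// Offset-based access has to go through the slice's own startIndex.
print(arSlice[arSlice.index(arSlice.startIndex, offsetBy: 1)]) // 4
```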

···

On 22 Dec 2015, at 03:29, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 21, 2015, at 1:51 PM, Kevin Ballard <kevin@sb.org> wrote:

On Mon, Dec 21, 2015, at 11:56 AM, Dave Abrahams wrote:

On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution wrote:

Yes, we already have facilities to do most of what Python can do here, but one major problem IMO is that the “language” of slicing is so non-uniform: we have [a..<b], dropFirst, dropLast, prefix, and suffix. Introducing “$” for this purpose could make it all hang together and also eliminate the “why does it have to be so hard to look at the 2nd character of a string?!” problem. That is, use the identifier “$” (yes, that’s an identifier in Swift) to denote the beginning-or-end of a collection. Thus,

  c[c.startIndex.advancedBy(3)] => c[$+3] // Python: c[3]
  c[c.endIndex.advancedBy(-3)] => c[$-3] // Python: c[-3]

  c.dropFirst(3) => c[$+3...] // Python: c[3:]
  c.dropLast(3) => c[..<$-3] // Python: c[:-3]
  c.prefix(3) => c[..<$+3] // Python: c[:3]
  c.suffix(3) => c[$-3...] // Python: c[-3:]
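For comparison, the left-hand sides of this table in today's standard library, where advancedBy(_:) is spelled index(_:offsetBy:), applied to a String:

```swift
let c = "abcdefg"

let fourth = c[c.index(c.startIndex, offsetBy: 3)]      // Python: c[3]
let thirdFromEnd = c[c.index(c.endIndex, offsetBy: -3)] // Python: c[-3]

print(fourth, thirdFromEnd) // d e
print(c.dropFirst(3))       // defg  (Python: c[3:])
print(c.dropLast(3))        // abcd  (Python: c[:-3])
print(c.prefix(3))          // abc   (Python: c[:3])
print(c.suffix(3))          // efg   (Python: c[-3:])
```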

It even has the nice connotation that, “this might be a little more expen$ive than plain indexing” (which it might, for non-random-access collections). I think the syntax is still a bit heavy, not least because of “..<“ and “...”, but the direction has potential.

I haven’t had the time to really experiment with a design like this; the community might be able to help by prototyping and using some alternatives. You can do all of this outside the standard library with extensions.

Interesting idea.

One downside is it masks potentially O(N) operations (ForwardIndex.advancedBy()) behind the + operator, which is typically assumed to be an O(1) operation.

Yeah, but the “$” is sufficiently unusual that it doesn’t bother me too much.

Also, the $+3 syntax suggests that it requires there to be at least 3 elements in the sequence, but prefix()/suffix()/dropFirst/etc. all take maximum counts, so they also work on sequences with fewer elements.


_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Lily Ballard) #14

Why not make the “forgiving” version the default? I mean, the majority of python-style composable slicing would be happening on arrays and array slices, for which there’s no performance overhead, and the forgiving version would seem to suit the “safe-by-default” philosophy. I’ve seen mistakes like this:

let ar = [1, 2, 3, 4, 5]
let arSlice = ar[2..<5]
arSlice[1]

on a few occasions, for instance. I would think something like this:

let ar = [0, 1, 2, 3, 4, 5]

let arSlice = ar[2...] // [2, 3, 4, 5]
arSlice[..<3] // [2, 3, 4]
arSlice[...3] // [2, 3, 4, 5]
arSlice[direct: 2] // 2
arSlice[0] // 2

Would be what was expected from most programmers learning Swift, while
leaving the unforgiving option open to those who need it.

You seem to be arguing against the notion that array slices preserve the
indexing of the base array, but that's not what's under discussion here.

-Kevin Ballard

···

On Mon, Dec 21, 2015, at 08:28 PM, Donnacha Oisín Kidney wrote:

On 22 Dec 2015, at 03:29, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 21, 2015, at 1:51 PM, Kevin Ballard <kevin@sb.org> wrote:

On Mon, Dec 21, 2015, at 11:56 AM, Dave Abrahams wrote:

On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution wrote:



(Jordan Rose) #15

Without commenting on the rest of this thread, the current rule is that identifiers starting with "$" are reserved for the debugger (not counting implicit closure args). We can change that rule, but the debugger folks won't be happy—the implicit variables you get from the REPL, for example, should stay short. I'm not sure if '$' itself falls under the current rule, though.

Jordan

···

On Dec 21, 2015, at 19:47 , Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

On another note, I'm tempted to say that we should use $start and $end instead of $.start and $.end. The compiler doesn't currently allow this, because it expects a number after the $, but I see no reason why we can't relax that rule and allow $start to be a valid token. The benefit of this approach is it frees up $ to be used by third-party code (such as in the older thread about rebinding `self` for DSLs where I suggested that a block-based API can use $ as the parameter name so code would say something like `$.expect(foo).to(.equal(bar))`).


(Donnacha Oisín Kidney) #16

I don’t think I am. Maybe I’m confused: the current suggestion is the addition of a $ operator (or labelled subscripts, or another operator) to signify “offset indexing”, yes? As in:

someCollection[$3] == someCollection[someCollection.startIndex.advancedBy(3)]
someCollection[$3..<$] == someCollection[someCollection.startIndex.advancedBy(3)..<someCollection.endIndex]

I’m not arguing against preserving the indexing of the base array, I understand its benefits. I’m arguing that, instead of using an extra indicator (like $) to indicate offset indexing, with the default being non-offset, why not have the offset indexing be the default, requiring an extra indication (like the label direct) for the non-offset. This would keep the benefits of non-offset indexing, because you’d still have access to it.
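A minimal sketch of the offset-by-default idea in current Swift. `OffsetSlice` is a hypothetical wrapper, not a standard library type: its plain subscript counts from the slice's first element, while the `direct:` label (the name used in this post) reaches the indices the slice shares with its base array.

```swift
// Hypothetical wrapper: offsets by default, base indices on request.
struct OffsetSlice<Element> {
    let base: ArraySlice<Element>

    // Offset subscript: 0 always means the first element of the slice.
    subscript(offset: Int) -> Element {
        base[base.startIndex + offset]
    }

    // Direct subscript: uses the indices inherited from the original array.
    subscript(direct index: Int) -> Element {
        base[index]
    }
}

let ar = [0, 1, 2, 3, 4, 5]
let slice = OffsetSlice(base: ar[2...])  // wraps the elements 2, 3, 4, 5
print(slice[0])          // 2
print(slice[direct: 2])  // 2
print(slice[3])          // 5
```

Because both subscripts are O(1) on `ArraySlice`, this only sidesteps the cost concern for arrays; the same wrapper over a non-random-access collection would hide linear work.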

I think that’s part of this discussion, right? I could start another thread, if not.

Oisín

···

On 22 Dec 2015, at 20:06, Kevin Ballard <kevin@sb.org> wrote:

On Mon, Dec 21, 2015, at 08:28 PM, Donnacha Oisín Kidney wrote:

Why not make the “forgiving” version the default? I mean, the majority of Python-style composable slicing would be happening on arrays and array slices, for which there’s no performance overhead, and the forgiving version would seem to suit the “safe-by-default” philosophy. I’ve seen mistakes like this:

let ar = [1, 2, 3, 4, 5]
let arSlice = ar[2..<5]
arSlice[1]

on a few occasions, for instance. I would think something like this:

let ar = [0, 1, 2, 3, 4, 5]

let arSlice = ar[2...] // [2, 3, 4, 5]
arSlice[..<3] // [2, 3, 4]
arSlice[...3] // [2, 3, 4, 5]
arSlice[direct: 2] // 2
arSlice[0] // 2

Would be what was expected from most programmers learning Swift, while leaving the unforgiving option open to those who need it.

You seem to be arguing against the notion that array slices preserve the indexing of the base array, but that's not what's under discussion here.
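For comparison, this is what current Swift does with the array from the quoted example: an `ArraySlice` keeps the base array's indices, which is why `arSlice[1]` traps instead of returning the slice's second element.

```swift
let ar = [1, 2, 3, 4, 5]
let arSlice = ar[2..<5]                 // holds 3, 4, 5

print(arSlice.startIndex)               // 2: the slice reuses the base array's indices
print(arSlice[2])                       // 3
// arSlice[1] would trap, because 1 is outside arSlice.indices (2..<5).

// Offset-style access has to be spelled relative to startIndex.
print(arSlice[arSlice.startIndex + 1])  // 4
```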

-Kevin Ballard

On 22 Dec 2015, at 03:29, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 21, 2015, at 1:51 PM, Kevin Ballard <kevin@sb.org> wrote:

On Mon, Dec 21, 2015, at 11:56 AM, Dave Abrahams wrote:

On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution wrote:

Yes, we already have facilities to do most of what Python can do here, but one major problem IMO is that the “language” of slicing is so non-uniform: we have [a..<b], dropFirst, dropLast, prefix, and suffix. Introducing “$” for this purpose could make it all hang together and also eliminate the “why does it have to be so hard to look at the 2nd character of a string?!” problem. That is, use the identifier “$” (yes, that’s an identifier in Swift) to denote the beginning-or-end of a collection. Thus,

  c[c.startIndex.advancedBy(3)] => c[$+3] // Python: c[3]
  c[c.endIndex.advancedBy(-3)] => c[$-3] // Python: c[-3]

  c.dropFirst(3) => c[$+3...] // Python: c[3:]
  c.dropLast(3) => c[..<$-3] // Python: c[:-3]
  c.prefix(3) => c[..<$+3] // Python: c[:3]
  c.suffix(3) => c[$-3...] // Python: c[-3:]

It even has the nice connotation that, “this might be a little more expen$ive than plain indexing” (which it might, for non-random-access collections). I think the syntax is still a bit heavy, not least because of “..<” and “...”, but the direction has potential.

I haven’t had the time to really experiment with a design like this; the community might be able to help by prototyping and using some alternatives. You can do all of this outside the standard library with extensions.
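Following that invitation, here is a rough prototype of the indexing half of the idea, built only from extensions. Because the thread notes that `$`-prefixed identifiers are tied up with debugger reservations, the sketch uses a hypothetical `Edge` enum (`Edge.start`, `Edge.end`) where the proposal writes `$`. `RelativePosition` and `resolvedIndex(_:)` are likewise invented names, and the code uses current Swift's `index(_:offsetBy:)` in place of `advancedBy`.

```swift
// Hypothetical stand-in for "$": which end of the collection to count from.
enum Edge { case start, end }

// A position that only becomes a real index once a collection is known.
struct RelativePosition {
    let edge: Edge
    let offset: Int
}

// Edge.start + 3 and Edge.end - 3 mirror the proposal's $+3 and $-3.
func + (edge: Edge, offset: Int) -> RelativePosition {
    RelativePosition(edge: edge, offset: offset)
}

func - (edge: Edge, offset: Int) -> RelativePosition {
    RelativePosition(edge: edge, offset: -offset)
}

extension BidirectionalCollection {
    // Turns a relative position into a real index. This walks the collection,
    // so it is O(n) unless the collection is random-access.
    func resolvedIndex(_ position: RelativePosition) -> Index {
        let anchor = position.edge == .start ? startIndex : endIndex
        return index(anchor, offsetBy: position.offset)
    }

    // Single-element access traps on an out-of-range position, like ordinary indexing.
    subscript(position: RelativePosition) -> Element {
        self[resolvedIndex(position)]
    }
}

let c = "swift-evolution"
print(c[Edge.start + 1])  // w
print(c[Edge.end - 1])    // n
```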

Interesting idea.

One downside is it masks potentially O(N) operations (ForwardIndex.advancedBy()) behind the + operator, which is typically assumed to be an O(1) operation.

Yeah, but the “$” is sufficiently unusual that it doesn’t bother me too much.

Also, the $+3 syntax suggests that it requires there to be at least 3 elements in the sequence, but prefix()/suffix()/dropFirst/etc. all take maximum counts, so they operate on sequences of fewer elements.
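A quick check of the maximum-count behavior described above, using the existing standard library methods:

```swift
let short = [1, 2]

// Each call asks for more elements than exist and simply returns what is available.
print(Array(short.prefix(3)))     // [1, 2]
print(Array(short.suffix(3)))     // [1, 2]
print(Array(short.dropFirst(3)))  // []
print(Array(short.dropLast(3)))   // []
```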

For indexing, $+3 would make that requirement. For slicing, it wouldn’t. I’m not sure why you say something about the syntax suggests exceeding bounds would be an error.

Because there's no precedent for + behaving like a saturating addition, not in Swift and not, to my knowledge, anywhere else either. The closest example that comes to mind is floating-point numbers eventually ending up at Infinity, but that's not really saturating addition, that's just a consequence of Infinity + anything == Infinity. Nor do I think we should be establishing precedent of using + for saturating addition, because that would be surprising to people.

To call this “saturating addition” is an…interesting…interpretation. I don’t view it that way at all. The “saturation,” if there is any, happens as part of subscripting. You don’t even know what the “saturation limit” is until you couple the range expression with the collection.

In my view, the addition is part of an EDSL that represents a notional position offset from the start or end, then the subscript operation forgivingly trims these offsets as needed.
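A sketch of that forgiving trimming, assuming hypothetical `Edge` and `RelativePosition` types and a `[from:to:]` subscript standing in for `c[$+3..<$]`. The clamping relies on the standard `index(_:offsetBy:limitedBy:)`, which returns `nil` rather than stepping past its limit.

```swift
enum Edge { case start, end }

// Hypothetical notional position, offset from one end of a collection.
struct RelativePosition {
    let edge: Edge
    let offset: Int

    static func start(_ n: Int) -> RelativePosition { RelativePosition(edge: .start, offset: n) }
    static func end(_ n: Int) -> RelativePosition { RelativePosition(edge: .end, offset: n) }
}

extension BidirectionalCollection {
    // Resolves a position, clamping it into startIndex...endIndex instead of trapping.
    func clampedIndex(_ position: RelativePosition) -> Index {
        switch position.edge {
        case .start:
            // Only forward moves from the start are meaningful; stop at endIndex.
            let distance = Swift.max(position.offset, 0)
            return index(startIndex, offsetBy: distance, limitedBy: endIndex) ?? endIndex
        case .end:
            // Only backward moves from the end are meaningful; stop at startIndex.
            let distance = Swift.min(position.offset, 0)
            return index(endIndex, offsetBy: distance, limitedBy: startIndex) ?? startIndex
        }
    }

    // Forgiving slice: both bounds are clamped, and a lower bound past the
    // upper bound yields an empty slice rather than a trap.
    subscript(from lower: RelativePosition, to upper: RelativePosition) -> SubSequence {
        let lowerIndex = clampedIndex(lower)
        let upperIndex = Swift.max(lowerIndex, clampedIndex(upper))
        return self[lowerIndex..<upperIndex]
    }
}

let a = [10, 20, 30, 40, 50]
print(Array(a[from: .start(3), to: .end(0)]))  // [40, 50], like Python a[3:]
print(Array(a[from: .end(-10), to: .end(0)]))  // [10, 20, 30, 40, 50], like a[-10:]
print(Array(a[from: .start(8), to: .end(0)]))  // [], like a[8:]
```

As in the discussion, the trimming only happens when the position meets a concrete collection; the positions themselves know nothing about bounds.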

Additionally, I don't think adding a $ to an array slice expression should result in a behavioral difference, e.g. array[3..<array.endIndex] and array[$+3..<$] should behave the same.

I see your point, but don’t (necessarily) agree with you there. “$” here is used as an indicator of several things, including not-necessarily-O(1) and forgiving slicing. We could introduce a label just to handle that:

array[forgivingAndNotO1: $+3..<$]

but it doesn’t look like a win to me.

There's also some confusion with using $ for both start and end. What if I say c[$..<$]? We'd have to infer from position that the first $ is the start and the second $ is the end, but then what about c[$+n..<$+m]? We can't treat the usage of + as meaning "from start" because the argument might be negative. And if we use the overall sign of the operation/argument together, then the expression `$+n` could mean from start or from end, which comes right back to the problem with Python syntax.

There’s a problem with Python syntax? I’m guessing you mean that c[a:b] can have very different interpretations depending on whether a and b are positive or negative?

Exactly.
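To illustrate the sign-dependence under discussion, here is a hypothetical Python-style subscript on `Array` (`python:` is an invented label). The same expression means "from the start" or "from the end" depending on its runtime sign, so an accidentally negative value reads a valid but wrong element instead of trapping.

```swift
extension Array {
    // Hypothetical Python-style subscript: negative values count from the end.
    subscript(python i: Int) -> Element {
        self[i < 0 ? count + i : i]
    }
}

let xs = [1, 2, 3, 4, 5]
print(xs[python: -1])     // 5
print(xs[python: 2])      // 3

let m = 2, n = 3          // suppose n was meant to be at most m
print(xs[python: m - n])  // 5: silently reads the last element
```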

First of all, I should say: that doesn’t really bother me. The 99.9% use case for this operation uses literal constants for the offsets, and I haven’t heard of it causing confusion for Python programmers. That said, if we wanted to address it, we could easily require n and m above to be literals, rather than Ints (which incidentally guarantees it’s an O(1) operation). That has upsides and downsides of course.

I don't think we should add this feature in any form if it only supports literals.

I think Jacob's idea has some promise though:

c[c.startIndex.advancedBy(3)] => c[fromStart: 3]
c[c.endIndex.advancedBy(-3)] => c[fromEnd: 3]
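These two subscripts are easy to write as extensions. This sketch uses current Swift spellings and restricts itself to `BidirectionalCollection` so that stepping backward from `endIndex` is allowed.

```swift
extension BidirectionalCollection {
    // c[fromStart: 3] is shorthand for c[c.index(c.startIndex, offsetBy: 3)].
    subscript(fromStart distance: Int) -> Element {
        self[index(startIndex, offsetBy: distance)]
    }

    // c[fromEnd: 3] is shorthand for c[c.index(c.endIndex, offsetBy: -3)],
    // so fromEnd: 1 is the last element.
    subscript(fromEnd distance: Int) -> Element {
        self[index(endIndex, offsetBy: -distance)]
    }
}

let word = "indexing"
print(word[fromStart: 3])  // e
print(word[fromEnd: 1])    // g
```

Both still cost O(distance) on a `String`; the label only makes the offset explicit.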

But naming the slice operations is a little trickier. We could actually just go ahead and re-use the existing method names for those:

c.dropFirst(3) => c[dropFirst: 3]
c.dropLast(3) => c[dropLast: 3]
c.prefix(3) => c[prefix: 3]
c.suffix(3) => c[suffix: 3]

That's not so compelling, since we already have the methods, but I suppose it makes sense if you want to try and make all slice-producing methods use subscript syntax (which I have mixed feelings about).
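Reusing the method names as subscript labels could look like the following sketch. The subscripts themselves are hypothetical; each one simply forwards to the existing method of the same name.

```swift
extension BidirectionalCollection {
    // Subscript spellings of the existing slicing methods.
    subscript(dropFirst n: Int) -> SubSequence { dropFirst(n) }
    subscript(dropLast n: Int) -> SubSequence { dropLast(n) }
    subscript(prefix n: Int) -> SubSequence { prefix(n) }
    subscript(suffix n: Int) -> SubSequence { suffix(n) }
}

let c = [1, 2, 3, 4, 5, 6, 7]
print(Array(c[dropFirst: 3]))  // [4, 5, 6, 7]
print(Array(c[dropLast: 3]))   // [1, 2, 3, 4]
print(Array(c[prefix: 3]))     // [1, 2, 3]
print(Array(c[suffix: 3]))     // [5, 6, 7]
```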

Once we get efficient in-place slice mutation (via slice addressors), it becomes a lot more compelling, IMO. But I still don’t find the naming terribly clear, and I don’t love that one needs to combine two subscript operations in order to drop the first and last element or take just elements 3..<5.

You can always add more overloads, such as

c[dropFirst: 3, dropLast: 5]

but I admit that there's a bunch of combinations here that would need to be added.
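One such overload, sketched by chaining the existing methods; like them, it trims instead of trapping when the counts exceed the length. The label pair comes from the example above, and the implementation is only illustrative.

```swift
extension BidirectionalCollection {
    // Hypothetical combined subscript: drop from both ends in one step.
    subscript(dropFirst first: Int, dropLast last: Int) -> SubSequence {
        let trimmedFront: SubSequence = dropFirst(first)
        return trimmedFront.dropLast(last)
    }
}

let digits = Array(0..<10)
print(Array(digits[dropFirst: 3, dropLast: 5]))  // [3, 4], i.e. elements 3..<5
print(Array(digits[dropFirst: 8, dropLast: 5]))  // []
```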

My point is that we have an English language soup that doesn’t compose naturally. Slicing in Python is much more elegant and composes well. If we didn’t currently have 6 separate methods (7 including subscript for index-based slicing) for handling this, that need to be separately documented and understood, I wouldn’t be so eager to replace the words with an EDSL, but in this case IMO it is an overall simplification.

My concern over trying to make it easier to take elements 3..<5 is that incrementing indexes is verbose for a reason, and adding a feature that makes it really easy to index into any collection by using integers is a bad idea as it will hide O(N) operations behind code that looks like O(1). And hiding these operations makes it really easy to accidentally turn an O(N) algorithm into an O(N^2) algorithm.
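A small illustration of that concern on `String`, whose indices are not random-access. Both hypothetical functions below count vowels and agree on the result, but the first advances from `startIndex` on every iteration and therefore does quadratic work overall.

```swift
// O(n^2): each iteration re-walks the string from the start.
func vowelCountByOffset(_ s: String) -> Int {
    var total = 0
    for i in 0..<s.count {
        // index(_:offsetBy:) is O(i) for String.
        let ch = s[s.index(s.startIndex, offsetBy: i)]
        if "aeiou".contains(ch) { total += 1 }
    }
    return total
}

// O(n): visits each character once.
func vowelCountByIteration(_ s: String) -> Int {
    s.filter { "aeiou".contains($0) }.count
}

print(vowelCountByOffset("swift evolution"))     // 6
print(vowelCountByIteration("swift evolution"))  // 6
```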

As I’ve said, I consider the presence of “$” to be enough of an indicator that something co$tly is happening, though I’m open to other ways of indicating it. I’m trying to strike a balance between “rigorous” and “easy to use,” here. Remember that Swift has to work in playgrounds and for beginning programmers, too. I am likewise unsatisfied with the (lack of) ease-of-use of String as well (e.g. for lexing and parsing tasks), and have made improving it a priority for Swift 3. I view fixing the slicing interface as part of that job.

Even if we need separate symbols for “start” and “end” (e.g. using “$” for both might just be too confusing for people in the end, even if it works otherwise), I still think a generalized form that allows ranges to be used everywhere for slicing is going to be much easier to understand than this hodgepodge of words we use today.

I'm tempted to say that if we do this, we should use two different sigils, and more importantly we should not use + and - but instead use methods on the sigils like advancedBy(), as if the sigils were literally placeholders for the start/end index. That way we won't write code that looks O(1) when it's not. For example:

col[^.advancedBy(3)..<$]

Although we'd need to revisit the names a little, because $.advancedBy(-3) is a bit odd when we know that $ can't ever take a non-negative number for that.

Or maybe we should just use $ instead as a token that means "the collection being indexed", so you'd actually say something like

col[$.startIndex.advancedBy(3)..<$.startIndex.advancedBy(5)]
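Swift's closure shorthand already gives something close to "`$` means the collection being indexed": a subscript can take a closure that receives the collection. The `relative:` label and `makeGreeting()` are hypothetical and exist only for this sketch.

```swift
extension Collection {
    // Hypothetical subscript: the closure receives the collection itself, so a
    // temporary can be sliced without first storing it in a local variable.
    subscript(relative makeRange: (Self) -> Range<Index>) -> SubSequence {
        self[makeRange(self)]
    }
}

func makeGreeting() -> String { "hello, world" }

// $0 plays the role of the thread's "$": the collection being indexed.
let middle = makeGreeting()[relative: {
    $0.index($0.startIndex, offsetBy: 3)..<$0.index($0.startIndex, offsetBy: 5)
}]
print(middle)  // lo
```

The index arithmetic stays as verbose as before; only the need for a named local disappears.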

I really like that direction, but I don’t think it does enough to solve the ease-of-use problem; I still think the result looks and feels horrible compared to Python for the constituencies mentioned above.

I briefly implemented this syntax, which was intended to suggest repeated incrementation:

col.startIndex++3 // col.startIndex.advancedBy(3)

I don’t think that is viable, especially now that we’ve dropped “++” and “--”. But this syntax

col[$.start⛄️3..<$.start⛄️5]

begins to be interesting for some definition of ⛄️.

This solves the problem of subscripting a collection without having to store it in a local variable, without discarding any of the intentional index overhead. Of course, if the goal is to make index operations more concise this doesn't really help much, but my argument here is that it's hard to cut down on the verbosity without hiding O(N) operations.

That ship has already sailed somewhat, because e.g. every Collection has to have a count property, which can be O(N). But I still like to uphold it where possible. I just don’t think the combination of “+” and “$” necessarily has such a strong O(1) connotation… especially because the precedent for seeing those symbols together is regexps.

-Kevin Ballard

But the [fromStart:] and [fromEnd:] subscripts seem useful.

Yeah… I really want a unified solution that covers slicing as well as offset indexing.

-Dave

-Dave



(Lily Ballard) #17

Oh that's a good point, I hadn't thought of that. It makes sense to keep $abc reserved for the debugger. I don't believe LLDB tries to use a bare $ anywhere (although I could be wrong) so leaving that as a valid identifier should be fine.

-Kevin Ballard

···

On Tue, Dec 22, 2015, at 07:48 PM, Jordan Rose wrote:

> On Dec 21, 2015, at 19:47 , Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:
>
> On another note, I'm tempted to say that we should use $start and $end instead of $.start and $.end. The compiler doesn't currently allow this, because it expects a number after the $, but I see no reason why we can't relax that rule and allow $start to be a valid token. The benefit of this approach is it frees up $ to be used by third-party code (such as in the older thread about rebinding `self` for DSLs where I suggested that a block-based API can use $ as the parameter name so code would say something like `$.expect(foo).to(.equal(bar))`).

Without commenting on the rest of this thread, the current rule is that identifiers starting with "$" are reserved for the debugger (not counting implicit closure args). We can change that rule, but the debugger folks won't be happy—the implicit variables you get from the REPL, for example, should stay short. I'm not sure if '$' itself falls under the current rule, though.

Jordan


(Lily Ballard) #18

It was an intentional decision for Swift's indexes to not be based on
offset indexing. In many cases (including indexing into strings)
calculating index offsets is an O(N) operation. The design of Swift's
indexes is such that you pay the cost when constructing an index, rather
than when using the index, so that way you can pay the cost once and re-
use that index many times (and similarly, if you index over the
collection, you can save indexes and revisit them without any cost).
Switching to offset indexing by default would throw away that cost model and
cause a lot of collection operations to accidentally be O(N) when they
look like they're O(1) (which would in turn cause many O(N) algorithms
to accidentally become O(N^2)).
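A short sketch of that pay-once model on `String`, using current Swift spellings: the offset is computed once, and the resulting index, like indices saved during iteration, can then be reused without walking the string again.

```swift
let text = "The quick brown fox"

// Computing this index walks 16 characters from the start (linear for String)...
let foxStart = text.index(text.startIndex, offsetBy: 16)

// ...but reusing it afterwards does not repeat that walk.
print(text[foxStart])     // f
print(text[foxStart...])  // fox

// Indices saved while iterating can be revisited later.
let spaceIndices = text.indices.filter { text[$0] == " " }
print(spaceIndices.count)  // 3
print(text[text.index(after: spaceIndices[0])...].prefix(5))  // quick
```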

-Kevin Ballard

···

On Tue, Dec 22, 2015, at 12:58 PM, Donnacha Oisín Kidney wrote:

I don’t think I am. Maybe I’m confused: the current suggestion is the
addition of a $ operator (or labelled subscripts, or another operator)
to signify “offset indexing”, yes? As in:

someCollection[$3] == someCollection[someCollection.startIndex.advancedBy(3)]
someCollection[$3..<$] == someCollection[someCollection.startIndex.advancedBy(3)..<someCollection.endIndex]

I’m not arguing against preserving the indexing of the base array, I
understand its benefits. I’m arguing that, instead of using an extra
indicator (like $) to indicate offset indexing, with the default being
non-offset, why not have the *offset* indexing be the default,
requiring an extra indication (like the label direct) for the non-
offset. This would keep the benefits of non-offset indexing, because
you’d still have access to it.

I *think* that’s part of this discussion, right? I could start
another thread, if not.

Oisín

On 22 Dec 2015, at 20:06, Kevin Ballard <kevin@sb.org> wrote:

On Mon, Dec 21, 2015, at 08:28 PM, Donnacha Oisín Kidney wrote:

Why not make the “forgiving” version the default? I mean, the
majority of Python-style composable slicing would be happening on
arrays and array slices, for which there’s no performance overhead,
and the forgiving version would seem to suit the “safe-by-default”
philosophy. I’ve seen mistakes like this:

let ar = [1, 2, 3, 4, 5]
let arSlice = ar[2..<5]
arSlice[1]

on a few occasions, for instance. I would think something like this:

let ar = [0, 1, 2, 3, 4, 5]

let arSlice = ar[2...] // [2, 3, 4, 5]
arSlice[..<3] // [2, 3, 4]
arSlice[...3] // [2, 3, 4, 5]
arSlice[direct: 2] // 2
arSlice[0] // 2

Would be what was expected from most programmers learning Swift,
while leaving the unforgiving option open to those who need it.

You seem to be arguing against the notion that array slices preserve
the indexing of the base array, but that's not what's under
discussion here.

-Kevin Ballard

On 22 Dec 2015, at 03:29, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 21, 2015, at 1:51 PM, Kevin Ballard <kevin@sb.org> wrote:

On Mon, Dec 21, 2015, at 11:56 AM, Dave Abrahams wrote:

On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution wrote:

Yes, we already have facilities to do most of what Python can
do here, but one major problem IMO is that the “language” of
slicing is so non-uniform: we have [a..<b], dropFirst,
dropLast, prefix, and suffix. Introducing “$” for this purpose
could make it all hang together and also eliminate the “why
does it have to be so hard to look at the 2nd character of a
string?!” problem. That is, use the identifier “$” (yes,
that’s an identifier in Swift) to denote the beginning-or-end
of a collection. Thus,

c[c.startIndex.advancedBy(3)] => c[$+3] // Python: c[3]
c[c.endIndex.advancedBy(-3)] => c[$-3] // Python: c[-3]

c.dropFirst(3) => c[$+3...] // Python: c[3:]
c.dropLast(3) => c[..<$-3] // Python: c[:-3]
c.prefix(3) => c[..<$+3] // Python: c[:3]
c.suffix(3) => c[$-3...] // Python: c[-3:]

It even has the nice connotation that, “this might be a little
more expen$ive than plain indexing” (which it might, for non-random-
access collections). I think the syntax is still a bit heavy,
not least because of “..<” and “...”, but the direction has
potential.

I haven’t had the time to really experiment with a design like
this; the community might be able to help by prototyping and
using some alternatives. You can do all of this outside the
standard library with extensions.

Interesting idea.

One downside is it masks potentially O(N) operations
(ForwardIndex.advancedBy()) behind the + operator, which is
typically assumed to be an O(1) operation.

Yeah, but the “$” is sufficiently unusual that it doesn’t bother
me too much.

Also, the $+3 syntax suggests that it requires there to be at
least 3 elements in the sequence, but
prefix()/suffix()/dropFirst/etc. all take maximum counts, so
they operate on sequences of fewer elements.

For indexing, $+3 would make that requirement. For slicing, it
wouldn’t. I’m not sure why you say something about
the syntax suggests exceeding bounds would be an error.

Because there's no precedent for + behaving like a saturating
addition, not in Swift and not, to my knowledge, anywhere else
either. The closest example that comes to mind is floating-point
numbers eventually ending up at Infinity, but that's not really
saturating addition, that's just a consequence of Infinity +
anything == Infinity. Nor do I think we should be establishing
precedent of using + for saturating addition, because that would
be surprising to people.

To call this “saturating addition” is
an…interesting…interpretation. I don’t view it that way at all.
The “saturation,” if there is any, happens as part of subscripting.
You don’t even know what the “saturation limit” is until you couple
the range expression with the collection.

In my view, the addition is part of an EDSL that represents a
notional position offset from the start or end, then the subscript
operation forgivingly trims these offsets as needed.

Additionally, I don't think adding a $ to an array slice
expression should result in a behavioral difference, e.g.
array[3..<array.endIndex] and array[$+3..<$] should behave the
same.

I see your point, but don’t (necessarily) agree with you there.
“$” here is used as an indicator of several things, including
not-necessarily-O(1) and forgiving slicing. We could introduce a
label just to handle that:

array[forgivingAndNotO1: $+3..<$]

but it doesn’t look like a win to me.

There's also some confusion with using $ for both start and end.
What if I say c[$..<$]? We'd have to infer from position that
the first $ is the start and the second $ is the end, but then
what about c[$+n..<$+m]? We can't treat the usage of + as
meaning "from start" because the argument might be negative. And
if we use the overall sign of the operation/argument together,
then the expression `$+n` could mean from start or from end,
which comes right back to the problem with Python syntax.

There’s a problem with Python syntax? I’m guessing you mean that
c[a:b] can have very different interpretations depending on
whether a and b are positive or negative?

Exactly.

First of all, I should say: that doesn’t really bother me. The
99.9% use case for this operation uses literal constants for the
offsets, and I haven’t heard of it causing confusion for Python
programmers. That said, if we wanted to address it, we could
easily require n and m above to be literals, rather than Ints
(which incidentally guarantees it’s an O(1) operation). That has
upsides and downsides of course.

I don't think we should add this feature in any form if it only
supports literals.

I think Jacob's idea has some promise though:

c[c.startIndex.advancedBy(3)] => c[fromStart: 3]
c[c.endIndex.advancedBy(-3)] => c[fromEnd: 3]

But naming the slice operations is a little trickier. We could
actually just go ahead and re-use the existing method names for
those:

c.dropFirst(3) => c[dropFirst: 3]
c.dropLast(3) => c[dropLast: 3]
c.prefix(3) => c[prefix: 3]
c.suffix(3) => c[suffix: 3]

That's not so compelling, since we already have the methods, but
I suppose it makes sense if you want to try and make all slice-
producing methods use subscript syntax (which I have mixed
feelings about).

Once we get efficient in-place slice mutation (via slice
addressors), it becomes a lot more compelling, IMO. But I still
don’t find the naming terribly clear, and I don’t love that one
needs to combine two subscript operations in order to drop the
first and last element or take just elements 3..<5.

You can always add more overloads, such as

c[dropFirst: 3, dropLast: 5]

but I admit that there's a bunch of combinations here that would
need to be added.

My point is that we have an English language soup that doesn’t
compose naturally. Slicing in Python is much more elegant and
composes well. If we didn’t currently have 6 separate methods (7
including subscript for index-based slicing) for handling this,
that need to be separately documented and understood, I wouldn’t be
so eager to replace the words with an EDSL, but in this case IMO it
is an overall simplification.

My concern over trying to make it easier to take elements 3..<5 is
that incrementing indexes is verbose for a reason, and adding a
feature that makes it really easy to index into any collection by
using integers is a bad idea as it will hide O(N) operations
behind code that looks like O(1). And hiding these operations
makes it really easy to accidentally turn an O(N) algorithm into
an O(N^2) algorithm.

As I’ve said, I consider the presence of “$” to be enough of an
indicator that something co$tly is happening, though I’m open to
other ways of indicating it. I’m trying to strike a balance
between “rigorous” and “easy to use,” here. Remember that Swift
has to work in playgrounds and for beginning programmers, too. I
am likewise unsatisfied with the (lack of) ease-of-use of String as
well (e.g. for lexing and parsing tasks), and have made improving
it a priority for Swift 3. I view fixing the slicing interface as
part of that job.

Even if we need separate symbols for “start” and “end” (e.g.
using “$” for both might just be too confusing for people in the
end, even if it works otherwise), I still think a generalized
form that allows ranges to be used everywhere for slicing is
going to be much easier to understand than this hodgepodge of
words we use today.

I'm tempted to say that if we do this, we should use two different
sigils, and more importantly we should not use + and - but instead
use methods on the sigils like advancedBy(), as if the sigils were
literally placeholders for the start/end index. That way we won't
write code that looks O(1) when it's not. For example:

col[^.advancedBy(3)..<$]

Although we'd need to revisit the names a little, because $.advancedBy(-3)
is a bit odd when we know that $ can't ever take a non-negative
number for that.

Or maybe we should just use $ instead as a token that means "the
collection being indexed", so you'd actually say something like

col[$.startIndex.advancedBy(3)..<$.startIndex.advancedBy(5)]

I really like that direction, but I don’t think it does enough to
solve the ease-of-use problem; I still think the result looks and
feels horrible compared to Python for the constituencies mentioned
above.

I briefly implemented this syntax, which was intended to suggest
repeated incrementation:

col.startIndex++3 // col.startIndex.advancedBy(3)

I don’t think that is viable, especially now that we’ve dropped
“++” and “--”. But this syntax

col[$.start⛄️3..<$.start⛄️5]

begins to be interesting for some definition of ⛄️.

This solves the problem of subscripting a collection without
having to store it in a local variable, without discarding any of
the intentional index overhead. Of course, if the goal is to make
index operations more concise this doesn't really help much, but
my argument here is that it's hard to cut down on the verbosity
without hiding O(N) operations.

That ship has already sailed somewhat, because e.g. every
Collection has to have a count property, which can be O(N). But I
still like to uphold it where possible. I just don’t think the
combination of “+” and “$” necessarily has such a strong O(1)
connotation… especially because the precedent for seeing those
symbols together is regexps.

-Kevin Ballard

But the [fromStart:] and [fromEnd:] subscripts seem useful.

Yeah… I really want a unified solution that covers slicing as
well as offset indexing.

-Dave

-Dave



(Donnacha Oisín Kidney) #19

There are two different issues here, as far as I can tell (both of which the $ operator seems to be trying to solve). The first is being able to use the start and end indices of a collection in a more concise way, so you can write ar[$3..<$] instead of ar[(ar.startIndex+3)..<ar.endIndex], and the second is coercing Ints into the indices of other collections, possibly in worse-than-O(1) time.

With regards to arrays, all of these operations are going to be O(1), of course. So, in that case, there’s no need to steer people away from using the offset version. To me, something like ar[3..<] is clear, and it’s obvious what’s going on. If the end index is left out, it’s kind of “implicit”. Similarly with the start index: I’d expect ar[..<3] to return the first three elements of the array.

If you follow that logic through, and use the same language with array slices, I think that “offset-as-default” is the only way that makes sense. Of course, access to the indices of the original array is important, but you can manage that with a labelled subscript.

In terms of non-random-access collections, the open-ended and open-started slice syntax (ar[3..<]) is still applicable, without any performance hit (if you only allowed the collection’s native index type to be used). Beyond that, I’m not sure what the best option is.
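For reference, Swift 4 later added one-sided ranges (SE-0172), spelled `ar[3...]` and `ar[..<3]` rather than `ar[3..<]`. They work with a collection's native indices, including on the non-random-access `String`, without introducing integer offsets; the `path` value below is just sample data.

```swift
let ar = [10, 20, 30, 40, 50]
print(ar[3...])  // [40, 50]
print(ar[..<3])  // [10, 20, 30]

// On String the bound must be one of the string's own indices.
let path = "docs/slicing.md"
if let slash = path.firstIndex(of: "/") {
    print(path[path.index(after: slash)...])  // slicing.md
    print(path[..<slash])                      // docs
}
```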

Oisín

···

On 22 Dec 2015, at 21:08, Kevin Ballard <kevin@sb.org> wrote:

It was an intentional decision for Swift's indexes to not be based on offset indexing. In many cases (including indexing into strings) calculating index offsets is an O(N) operation. The design of Swift's indexes is such that you pay the cost when constructing an index, rather than when using the index, so that way you can pay the cost once and re-use that index many times (and similarly, if you index over the collection, you can save indexes and revisit them without any cost). Switching to offset indexing by default would throw away that cost model and cause a lot of collection operations to accidentally be O(N) when they look like they're O(1) (which would in turn cause many O(N) algorithms to accidentally become O(N^2)).

-Kevin Ballard

On Tue, Dec 22, 2015, at 12:58 PM, Donnacha Oisín Kidney wrote:

I don’t think I am. Maybe I’m confused: the current suggestion is the addition of a $ operator (or labelled subscripts, or another operator) to signify “offset indexing”, yes? As in:

someCollection[$3] == someCollection[someCollection.startIndex.advancedBy(3)]
someCollection[$3..<$] == someCollection[someCollection.startIndex.advancedBy(3)..<someCollection.endIndex]

I’m not arguing against preserving the indexing of the base array, I understand its benefits. I’m arguing that, instead of using an extra indicator (like $) to indicate offset indexing, with the default being non-offset, why not have the offset indexing be the default, requiring an extra indication (like the label direct) for the non-offset. This would keep the benefits of non-offset indexing, because you’d still have access to it.

I think that’s part of this discussion, right? I could start another thread, if not.

Oisín

On 22 Dec 2015, at 20:06, Kevin Ballard <kevin@sb.org> wrote:

On Mon, Dec 21, 2015, at 08:28 PM, Donnacha Oisín Kidney wrote:

Why not make the “forgiving” version the default? I mean, the majority of Python-style composable slicing would be happening on arrays and array slices, for which there’s no performance overhead, and the forgiving version would seem to suit the “safe-by-default” philosophy. I’ve seen mistakes like this:

let ar = [1, 2, 3, 4, 5]
let arSlice = ar[2..<5]
arSlice[1]

on a few occasions, for instance. I would think something like this:

let ar = [0, 1, 2, 3, 4, 5]

let arSlice = ar[2...] // [2, 3, 4, 5]
arSlice[..<3] // [2, 3, 4]
arSlice[...3] // [2, 3, 4, 5]
arSlice[direct: 2] // 2
arSlice[0] // 2

Would be what was expected from most programmers learning Swift, while leaving the unforgiving option open to those who need it.

You seem to be arguing against the notion that array slices preserve the indexing of the base array, but that's not what's under discussion here.

-Kevin Ballard

On 22 Dec 2015, at 03:29, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 21, 2015, at 1:51 PM, Kevin Ballard <kevin@sb.org> wrote:

On Mon, Dec 21, 2015, at 11:56 AM, Dave Abrahams wrote:

On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution wrote:

Yes, we already have facilities to do most of what Python can do here, but one major problem IMO is that the “language” of slicing is so non-uniform: we have [a..<b], dropFirst, dropLast, prefix, and suffix. Introducing “$” for this purpose could make it all hang together and also eliminate the “why does it have to be so hard to look at the 2nd character of a string?!” problem. That is, use the identifier “$” (yes, that’s an identifier in Swift) to denote the beginning-or-end of a collection. Thus,

  c[c.startIndex.advancedBy(3)] => c[$+3] // Python: c[3]
  c[c.endIndex.advancedBy(-3)] => c[$-3] // Python: c[-3]

  c.dropFirst(3) => c[$+3...] // Python: c[3:]
  c.dropLast(3) => c[..<$-3] // Python: c[:-3]
  c.prefix(3) => c[..<$+3] // Python: c[:3]
  c.suffix(3) => c[$-3...] // Python: c[-3:]

It even has the nice connotation that, “this might be a little more expen$ive than plain indexing” (which it might, for non-random-access collections). I think the syntax is still a bit heavy, not least because of “..<” and “...”, but the direction has potential.

I haven’t had the time to really experiment with a design like this; the community might be able to help by prototyping and using some alternatives. You can do all of this outside the standard library with extensions.
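As one rough way to prototype this with extensions, here is a sketch. It uses hypothetical names `start` and `end` in place of `$` (to avoid relying on `$` being usable as an identifier) and modern spellings (`index(_:offsetBy:)` instead of `advancedBy`). As an assumption about the intended semantics, slicing clamps out-of-range offsets the way `prefix`/`dropFirst` do, while single-element access still traps. For non-random-access collections such as `String`, resolving an offset walks the collection, so these subscripts can be O(n).

```swift
// Hypothetical stand-ins for "$": notional positions at either end of
// whatever collection the expression is eventually applied to.
enum Edge { case start, end }

struct EdgeOffset {
    var edge: Edge
    var offset: Int
}

struct EdgeRange {
    var lower: EdgeOffset
    var upper: EdgeOffset
}

let start = EdgeOffset(edge: .start, offset: 0)
let end = EdgeOffset(edge: .end, offset: 0)

// `+`, `-` and `..<` only record offsets; nothing is resolved until the
// expression meets a concrete collection in a subscript.
func + (lhs: EdgeOffset, rhs: Int) -> EdgeOffset {
    EdgeOffset(edge: lhs.edge, offset: lhs.offset + rhs)
}

func - (lhs: EdgeOffset, rhs: Int) -> EdgeOffset {
    EdgeOffset(edge: lhs.edge, offset: lhs.offset - rhs)
}

func ..< (lhs: EdgeOffset, rhs: EdgeOffset) -> EdgeRange {
    EdgeRange(lower: lhs, upper: rhs)
}

extension BidirectionalCollection {
    // Resolves a notional position, clamping it into startIndex...endIndex.
    private func clampedIndex(_ p: EdgeOffset) -> Index {
        switch p.edge {
        case .start:
            return index(startIndex, offsetBy: Swift.max(p.offset, 0),
                         limitedBy: endIndex) ?? endIndex
        case .end:
            return index(endIndex, offsetBy: Swift.min(p.offset, 0),
                         limitedBy: startIndex) ?? startIndex
        }
    }

    // Slicing is forgiving: offsets past either end are trimmed.
    subscript(r: EdgeRange) -> SubSequence {
        let lo = clampedIndex(r.lower)
        let hi = Swift.max(lo, clampedIndex(r.upper))
        return self[lo..<hi]
    }

    // Element access is strict: an out-of-bounds position traps.
    subscript(p: EdgeOffset) -> Element {
        let base = p.edge == .start ? startIndex : endIndex
        return self[index(base, offsetBy: p.offset)]
    }
}

let numbers = [10, 20, 30, 40, 50]
print(Array(numbers[start + 3 ..< end]))  // [40, 50], like dropFirst(3)
print(Array(numbers[end - 3 ..< end]))    // [30, 40, 50], like suffix(3)
print(numbers[end - 1])                   // 50
```

Because `+` binds more tightly than `..<`, `start + 3 ..< end` groups as `(start + 3) ..< end` without parentheses.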

Interesting idea.

One downside is it masks potentially O(N) operations (ForwardIndex.advancedBy()) behind the + operator, which is typically assumed to be an O(1) operation.

Yeah, but the “$” is sufficiently unusual that it doesn’t bother me too much.

Also, the $+3 syntax suggests that there must be at least 3 elements in the sequence, but prefix()/suffix()/dropFirst()/etc. all take maximum counts, so they also work on sequences with fewer elements.
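The contrast between the two behaviours can be seen with the existing methods. The count-based methods clamp to whatever is available, while integer subscripting traps.

```swift
let short = [1, 2]

// prefix/suffix/dropFirst/dropLast take a maximum count, so asking for
// more elements than exist is not an error.
let firstThree = Array(short.prefix(3))          // [1, 2]
let lastThree = Array(short.suffix(3))           // [1, 2]
let withoutFirstThree = Array(short.dropFirst(3)) // []
let withoutLastThree = Array(short.dropLast(3))   // []

// Direct subscripting has no such leeway: short[3] would trap at run time.
print(firstThree, lastThree, withoutFirstThree, withoutLastThree)
```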

For indexing, $+3 would make that requirement. For slicing, it wouldn’t. I’m not sure why you say the syntax suggests exceeding bounds would be an error.

Because there's no precedent for + behaving like a saturating addition, not in Swift and not, to my knowledge, anywhere else either. The closest example that comes to mind is floating-point numbers eventually ending up at Infinity, but that's not really saturating addition, that's just a consequence of Infinity + anything == Infinity. Nor do I think we should be establishing precedent of using + for saturating addition, because that would be surprising to people.

To call this “saturating addition” is an…interesting…interpretation. I don’t view it that way at all. The “saturation,” if there is any, happens as part of subscripting. You don’t even know what the “saturation limit” is until you couple the range expression with the collection.

In my view, the addition is part of an EDSL that represents a notional position offset from the start or end, then the subscript operation forgivingly trims these offsets as needed.

Additionally, I don't think adding a $ to an array slice expression should change its behavior; e.g. array[3..<array.endIndex] and array[$+3..<$] should behave the same.

I see your point, but don’t (necessarily) agree with you there. “$” here is used as an indicator of several things, including not-necessarily-O(1) and forgiving slicing. We could introduce a label just to handle that:

array[forgivingAndNotO1: $+3..<$]

but it doesn’t look like a win to me.

There's also some confusion with using $ for both start and end. What if I say c[$..<$]? We'd have to infer from position that the first $ is the start and the second $ is the end, but then what about c[$+n..<$+m]? We can't treat the usage of + as meaning "from start" because the argument might be negative. And if we use the overall sign of the operation/argument together, then the expression `$+n` could mean from start or from end, which comes right back to the problem with Python syntax.

There’s a problem with Python syntax? I’m guessing you mean that c[a:b] can have very different interpretations depending on whether a and b are positive or negative?

Exactly.
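A small model of Python's rule makes the ambiguity concrete. The helper `pythonIndex` is hypothetical and only mirrors Python's behaviour: negative values count back from the end. It also reproduces the fencepost hazard raised earlier in the thread. A computed index such as `m - n` silently switches from start-relative to end-relative when it goes negative, instead of trapping.

```swift
/// Hypothetical model of Python's index rule: negative values count back
/// from the end. Returns nil where Python would raise IndexError.
func pythonIndex(_ i: Int, count: Int) -> Int? {
    let resolved = i < 0 ? count + i : i
    return (0..<count).contains(resolved) ? resolved : nil
}

let items = ["a", "b", "c", "d", "e"]
let m = 2

// Intended: m - n with n <= m. A mistake that makes n = 3 does not trap;
// it quietly selects an element counted from the end instead.
for n in [1, 2, 3] {
    if let i = pythonIndex(m - n, count: items.count) {
        print("m - \(n) -> items[\(i)] = \(items[i])")
    }
}
// m - 1 -> items[1] = b
// m - 2 -> items[0] = a
// m - 3 -> items[4] = e
```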

First of all, I should say: that doesn’t really bother me. The 99.9% use case for this operation uses literal constants for the offsets, and I haven’t heard of it causing confusion for Python programmers. That said, if we wanted to address it, we could easily require n and m above to be literals, rather than Ints (which incidentally guarantees it’s an O(1) operation). That has upsides and downsides of course.
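One possible way to sketch a literals-only restriction is an offset type that conforms to `ExpressibleByIntegerLiteral` but offers no initializer from a run-time `Int`. This is an assumption about how it might be enforced; the thread doesn't specify a mechanism, and `LiteralOffset` and `subscript(offset:)` are hypothetical names. It is a convention rather than a hard guarantee, because `init(integerLiteral:)` can still be called explicitly.

```swift
/// Hypothetical offset type that is meant to be written only as a literal.
struct LiteralOffset: ExpressibleByIntegerLiteral {
    let value: Int
    init(integerLiteral value: Int) {
        self.value = value
    }
}

extension Collection {
    /// Element at a literal offset from startIndex; traps if out of range.
    subscript(offset k: LiteralOffset) -> Element {
        self[index(startIndex, offsetBy: k.value)]
    }
}

let letters = ["a", "b", "c"]
print(letters[offset: 1])  // b

// let k = 1
// letters[offset: k]      // does not compile: Int is not LiteralOffset
```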

I don't think we should add this feature in any form if it only supports literals.

I think Jacob's idea has some promise though:

c[c.startIndex.advancedBy(3)] => c[fromStart: 3]
c[c.endIndex.advancedBy(-3)] => c[fromEnd: 3]
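Here is a sketch of those two labeled subscripts. It generalizes the `Array` extension suggested earlier in the thread to any `BidirectionalCollection` and uses today's `index(_:offsetBy:)` in place of `advancedBy`. Like ordinary subscripting, both trap when the offset is out of range. For `String`, computing the index walks the characters, so each access is O(n).

```swift
extension BidirectionalCollection {
    /// Element `distance` positions after startIndex (fromStart: 0 is the first element).
    subscript(fromStart distance: Int) -> Element {
        self[index(startIndex, offsetBy: distance)]
    }

    /// Element `distance` positions before endIndex (fromEnd: 1 is the last element).
    subscript(fromEnd distance: Int) -> Element {
        self[index(endIndex, offsetBy: -distance)]
    }
}

let word = "swift"
print(word[fromStart: 3])  // f
print(word[fromEnd: 3])    // i
```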

But naming the slice operations is a little trickier. We could actually just go ahead and re-use the existing method names for those:

c.dropFirst(3) => c[dropFirst: 3]
c.dropLast(3) => c[dropLast: 3]
c.prefix(3) => c[prefix: 3]
c.suffix(3) => c[suffix: 3]

That's not so compelling, since we already have the methods, but I suppose it makes sense if you want to try and make all slice-producing methods use subscript syntax (which I have mixed feelings about).
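Those subscript spellings can be prototyped as thin wrappers that forward to the existing methods, so they inherit the same maximum-count (clamping) behaviour. The labels are the hypothetical ones from the list above, not standard-library API.

```swift
extension Collection {
    // Hypothetical subscript spellings forwarding to the existing methods.
    subscript(dropFirst k: Int) -> SubSequence { dropFirst(k) }
    subscript(dropLast k: Int) -> SubSequence { dropLast(k) }
    subscript(prefix k: Int) -> SubSequence { prefix(k) }
    subscript(suffix k: Int) -> SubSequence { suffix(k) }
}

let c = [1, 2, 3, 4, 5, 6, 7, 8]
print(Array(c[dropFirst: 3]))  // [4, 5, 6, 7, 8]
print(Array(c[dropLast: 3]))   // [1, 2, 3, 4, 5]
print(Array(c[prefix: 3]))     // [1, 2, 3]
print(Array(c[suffix: 3]))     // [6, 7, 8]
```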

Once we get efficient in-place slice mutation (via slice addressors), it becomes a lot more compelling, IMO. But I still don’t find the naming terribly clear, and I don’t love that one needs to combine two subscript operations in order to drop the first and last element or take just elements 3..<5.

You can always add more overloads, such as

c[dropFirst: 3, dropLast: 5]

but I admit that there's a bunch of combinations here that would need to be added.
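A combined overload like that is straightforward to sketch by chaining the existing methods. `Collection` guarantees that a slice of a slice has the same `SubSequence` type, so the result type stays simple. The subscript itself is hypothetical.

```swift
extension Collection {
    /// Hypothetical combined form: trim from both ends in one subscript.
    /// Like the underlying methods, it clamps rather than trapping.
    subscript(dropFirst head: Int, dropLast tail: Int) -> SubSequence {
        dropFirst(head).dropLast(tail)
    }
}

let values = Array(1...10)
print(Array(values[dropFirst: 3, dropLast: 5]))  // [4, 5]
```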

My point is that we have an English language soup that doesn’t compose naturally. Slicing in Python is much more elegant and composes well. If we didn’t currently have 6 separate methods (7 including subscript for index-based slicing) for handling this, that need to be separately documented and understood, I wouldn’t be so eager to replace the words with an EDSL, but in this case IMO it is an overall simplification.

My concern over trying to make it easier to take elements 3..<5 is that incrementing indexes is verbose for a reason, and adding a feature that makes it really easy to index into any collection by using integers is a bad idea as it will hide O(N) operations behind code that looks like O(1). And hiding these operations makes it really easy to accidentally turn an O(N) algorithm into an O(N^2) algorithm.
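The O(N)-to-O(N^2) risk shows up with `String`, whose indices are not random-access. Both functions below have hypothetical names and collect every other character. The first re-walks the string from `startIndex` for each integer offset, so it does quadratic work overall even though it reads like a simple loop. The second advances a single index and stays linear.

```swift
/// Reads like O(n), but index(_:offsetBy:) walks from the start on every
/// iteration, so the loop as a whole is O(n^2) for String.
func everyOtherByOffset(_ s: String) -> String {
    var result = ""
    for i in stride(from: 0, to: s.count, by: 2) {
        result.append(s[s.index(s.startIndex, offsetBy: i)])
    }
    return result
}

/// Advances one index through the string, so the whole pass is O(n).
func everyOtherByIndex(_ s: String) -> String {
    var result = ""
    var i = s.startIndex
    while i < s.endIndex {
        result.append(s[i])
        // Step forward by up to two characters without passing endIndex.
        i = s.index(i, offsetBy: 2, limitedBy: s.endIndex) ?? s.endIndex
    }
    return result
}

print(everyOtherByOffset("abcdefg"))  // aceg
print(everyOtherByIndex("abcdefg"))   // aceg
```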

As I’ve said, I consider the presence of “$” to be enough of an indicator that something co$tly is happening, though I’m open to other ways of indicating it. I’m trying to strike a balance between “rigorous” and “easy to use,” here. Remember that Swift has to work in playgrounds and for beginning programmers, too. I am likewise unsatisfied with the (lack of) ease-of-use of String as well (e.g. for lexing and parsing tasks), and have made improving it a priority for Swift 3. I view fixing the slicing interface as part of that job.

Even if we need separate symbols for “start” and “end” (e.g. using “$” for both might just be too confusing for people in the end, even if it works otherwise), I still think a generalized form that allows ranges to be used everywhere for slicing is going to be much easier to understand than this hodgepodge of words we use today.

I'm tempted to say that if we do this, we should use two different sigils, and more importantly we should not use + and - but instead use methods on the sigils like advancedBy(), as if the sigils were literally placeholders for the start/end index. That way we won't write code that looks O(1) when it's not. For example:

col[^.advancedBy(3)..<$]

Although we'd need to revisit the names a little, because $.advancedBy(-3) is a bit odd when we know that $ can never be advanced by a positive amount.

Or maybe we should just use $ instead as a token that means "the collection being indexed", so you'd actually say something like

col[$.startIndex.advancedBy(3)..<$.startIndex.advancedBy(5)]

I really like that direction, but I don’t think it does enough to solve the ease-of-use problem; I still think the result looks and feels horrible compared to Python for the constituencies mentioned above.

I briefly implemented this syntax, which was intended to suggest repeated incrementing:

col.startIndex++3 // col.startIndex.advancedBy(3)

I don’t think that is viable, especially now that we’ve dropped “++” and “--”. But this syntax

col[$.start⛄️3..<$.start⛄️5]

begins to be interesting for some definition of ⛄️.

This solves the problem of subscripting a collection without having to store it in a local variable, without discarding any of the intentional index overhead. Of course, if the goal is to make index operations more concise this doesn't really help much, but my argument here is that it's hard to cut down on the verbosity without hiding O(N) operations.
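Today's closure shorthand `$0` can approximate the “$ means the collection being indexed” idea without new syntax. The closure-taking subscript below is a hypothetical extension, not standard-library API. The closure receives the collection and returns a range of its indices, so the result of a call such as the hypothetical `makeGreeting()` doesn't need its own local variable. The explicit `index(_:offsetBy:)` calls also keep any O(n) cost visible.

```swift
extension Collection {
    /// Hypothetical: `body` receives the collection itself and returns the
    /// index range to slice, so no local variable is needed for it.
    subscript(_ body: (Self) -> Range<Index>) -> SubSequence {
        self[body(self)]
    }
}

func makeGreeting() -> String { "Hello, world" }

// Characters at offsets 7..<12, sliced directly off the call's result.
let word = makeGreeting()[{
    $0.index($0.startIndex, offsetBy: 7)..<$0.index($0.startIndex, offsetBy: 12)
}]
print(word)  // world
```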

That ship has already sailed somewhat, because e.g. every Collection has to have a count property, which can be O(N). But I still like to uphold it where possible. I just don’t think the combination of “+” and “$” necessarily has such a strong O(1) connotation… especially because the precedent for seeing those symbols together is regexps.

-Kevin Ballard

But the [fromStart:] and [fromEnd:] subscripts seem useful.

Yeah… I really want a unified solution that covers slicing as well as offset indexing.

-Dave
