[RFC] New collections model: collections advance indices


(Dmitri Gribenko) #1

Hi,

We would like to propose a major change to how collection indices
work. The standard library team has discussed this idea internally
and we wrote a prototype. Now we think it is a viable direction to
consider, and we are bringing it for wider public discussion.

I'm pasting the first section of the proposal below to give you a
general idea about this change, but please read the proposal to
understand the full details.

You can find the most up to date version of the proposal at
https://github.com/gribozavr/swift-evolution/blob/new-collections/proposals/NNNN-collections-move-indices.md

Permalink: https://github.com/gribozavr/swift-evolution/blob/87df19a9a9d73e64a2a966b807440216a608b8ad/proposals/NNNN-collections-move-indices.md

Dmitri

## Introduction

We are proposing a new model for collections, where indices can only be
advanced forward or backward by the corresponding collection instance.
Indices become opaque tokens representing collection positions, that can
be produced and consumed by collection APIs. This allows us to reduce
the amount of data stored in indices to the bare minimum.

Compared to the current state, the new scheme simplifies implementation
of non-trivial indices, and fixes concurrency issues in `Set` and
`Dictionary` indices. It also allows us to eliminate reference-counted
stored properties from most indices, including non-trivial ones, like
`Set.Index` and `Dictionary.Index`, creating more optimizable code.

Out of scope for this proposal:

* Expanding the set of concrete collections provided by the standard
  library.

* Expanding the set of collection protocols to provide functionality
  beyond what is already provided (for example, protocols for sorted
  collections, queues etc.) Discussing how other concrete collections
  fit into the current protocol hierarchy is in scope, though.

···

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


[Pitch] Make offset index available for String
(Trent Nadeau) #2

Why is the protocol for iterators called IteratorProtocol instead of
Iterator or Iterable? If that is/was discussed elsewhere, I apologize, but
I don't remember seeing that particular name before. Is that new to this
proposal?

···

On Tue, Mar 1, 2016 at 9:04 PM, Dmitri Gribenko via swift-evolution < swift-evolution@swift.org> wrote:

Hi,

We would like to propose a major change to how collection indices
work. The standard library team has discussed this idea internally
and we wrote a prototype. Now we think it is a viable direction to
consider, and we are bringing it for wider public discussion.

I'm pasting the first section of the proposal below to give you a
general idea about this change, but please read the proposal to
understand the full details.

You can find the most up to date version of the proposal at

https://github.com/gribozavr/swift-evolution/blob/new-collections/proposals/NNNN-collections-move-indices.md

Permalink:
https://github.com/gribozavr/swift-evolution/blob/87df19a9a9d73e64a2a966b807440216a608b8ad/proposals/NNNN-collections-move-indices.md

Dmitri

## Introduction

We are proposing a new model for collections, where indices can only be
advanced forward or backward by the corresponding collection instance.
Indices become opaque tokens representing collection positions, that can
be produced and consumed by collection APIs. This allows us to reduce
the amount of data stored in indices to the bare minimum.

Compared to the current state, the new scheme simplifies implementation
of non-trivial indices, and fixes concurrency issues in `Set` and
`Dictionary` indices. It also allows us to eliminate reference-counted
stored properties from most indices, including non-trivial ones, like
`Set.Index` and `Dictionary.Index`, creating more optimizable code.

Out of scope for this proposal:

* Expanding the set of concrete collections provided by the standard
  library.

* Expanding the set of collection protocols to provide functionality
  beyond what is already provided (for example, protocols for sorted
  collections, queues etc.) Discussing how other concrete collections
  fit into the current protocol hierarchy is in scope, though.

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
Trent Nadeau


(Brent Royal-Gordon) #3

We would like to propose a major change to how collection indices
work. The standard library team has discussed this idea internally
and we wrote a prototype. Now we think it is a viable direction to
consider, and we are bringing it for wider public discussion.

This is super-interesting, and overall I think it's probably an improvement (although I haven't gone *really* deep with generic CollectionType code very often).

I do have a question, though it may simply betray my ignorance. `advance` has two forms:

  func advance(i: Index, by n: IndexDistance) -> Index
  func advance(i: Index, by n: IndexDistance, limit: Index) -> Index

This is a pretty straight port of the old `advance`:

  public func advancedBy(amount: Distance) -> Self
  public func advancedBy(amount: Distance, limit: Self) -> Self

But I'm not sure the `limit` parameter is still appropriate. I've always figured that you were supposed to pass the collection's `endIndex` there, but now that `advance` *is* a method on the collection, it can access that itself. Would we be better served with something like this?

  func unsafeAdvance(i: Index, by n: IndexDistance) -> Index // no endIndex check
  func advance(i: Index, by n: IndexDistance) -> Index // endIndex check

Or do you imagine the `limit` parameter being put to some other use that I'm not thinking of?

···

--
Brent Royal-Gordon
Architechies


(Karoy Lorentey) #4

This looks interesting! As the author of a number of custom collection
implementations, including a rather elaborate B-tree package
(https://github.com/lorentey/BTree), it always felt strange to me that
indices are expected to be able to move around the collection on their
own, while element access has to go through by the collection. It is a
great idea to fix this asymmetry.

I’ll have to carefully read through it a couple more times and look at
the prototype branch to form a real opinion, but at first glance I
like the proposal. Here are a couple of quick thoughts, with more to
come once I had time to think about the implications in detail:

- I’m not at a great fan of the `*IndexType` protocols in Swift 2. I
  do not believe they work hard enough compared to how hard it is to
  implement them, and I welcome any change that makes them even a
  little bit simpler for the collection writer.

- Having to call a collection method to increment an index looks
  unnatural at first glance, but I can see myself getting used to it
  in a few hours.

- I know that it isn't a new requirement, but I do dislike that
  `Indexable` specifies the complexity of index operations; this puts
  a hard constraint on custom collection design. I do understand the
  desire for concrete complexity promises on operations using
  indexes, but can't we express these instead e.g. in terms of number
  of index accesses?

- I love that there is a section with detailed guidance on designing
  tree-based collections. It’s interesting and informative.

- My B-trees are persistent data structures, thus my nodes cannot have
  parent or sibling links. Index lookup and navigation is still O(1)
  though, as my indices contain pointers to every node on the path to
  the current element. Since I have to keep looking up these nodes
  anyway to retrieve elements and to navigate around in the tree, I
  simply decided to keep them directly in the index. B-trees are super
  shallow, so there are only a handful of nodes on any path.

- I found that the most straightforward place to implement tree
  navigation methods like `next(:)` and `advance(:by:)` is on the path
  struct that contains the actual node references. There is no reason
  I couldn't have the new collection methods simply call through to
  these path methods, though -- I am currently doing the same thing in
  the BTreeIndex type anyway.
  
- I'm using weak references inside the index, with a (seriously
  underdeveloped) index invalidation method that happens to be closer
  to #2b than #2a. I'm not happy about using weak references, but this
  seemed the most sensible thing to do. I'd love to replace them with
  `unowned(unsafe)`, and the mutation counter seems like a great idea.
  The ARC issue mentioned at the end of the proposal is rather scary,
  though -- I don't know how I would protect against that.
  
  Generators/iterators look to be safe from this issue,
  so I’ll probably optimize their path representation first.

- For mutation, I think custom APIs often make much more sense
  than general-purpose solutions. I try to discourage use of the normal
  B-tree index for doing complex tree mutations, and I instead provide
  a cursor construct that was designed especially for performing
  a batch of mutations in a batch:

  https://github.com/lorentey/BTree/blob/master/Sources/BTreeCursor.swift#L295-L707

  The cursor is like an index on steroids. It has an identity with
  mutable state on its own, and it takes unique ownership of the tree
  while it is active. This frees the cursor to disable some costly
  invariants (such as maintaining up-to-date descendant counts in each
  node). This in turn allows for convenient batch editing of elements
  in the tree, with amortized O(1) insertion and removal operations.

  The cursor's approach goes the exact opposite way of this proposal:
  not only is the collection not necessary to use the cursor, but the
  collection's value isn't even available while there is an active
  cursor on it. (This is like how
  `Array.withUnsafeMutableBufferPointer()` works.)

- I'm almost positive this has been discussed before, but what is the
  rationale behind allowing non-Int `IndexDistance`s? The distance is
  getting cast to Int in a lot of places anyway (IIRC, even the stdlib
  uses numericCasts to cut a way through it.)

    associatedtype IndexDistance : SignedIntegerType = Int

- The `Indices` associated type is intriguing. I assume it is brand new?
  It seems strange that it is allowed to hold a strong reference, but
  I’ll have to look through the prototype code to grok it.

  Superficial comment: I’m not too happy with the name. The irregular
  plural is hard on non-native English speakers, plus it seems weird
  to have both an `Index` and an `Indices` type. The `indices` property
  calls it `IndexRange` (I assume by accident); I think I like that
  name better.

- In this declaration:

    subscript(position: Index) -> Generator.Element { get }

  I find the argument name rather unfortunate, because I've been using
  the term "position" to consistently refer to the (numerical)
  position of an element in an ordered collection, which is typically
  not the same as the element's index. Could we just quietly rename
  this to `index` or `i`? :slight_smile:

···

On 2016-03-02, at 03:04, Dmitri Gribenko <gribozavr@gmail.com> wrote:

Hi,

We would like to propose a major change to how collection indices
work. The standard library team has discussed this idea internally
and we wrote a prototype. Now we think it is a viable direction to
consider, and we are bringing it for wider public discussion.

I'm pasting the first section of the proposal below to give you a
general idea about this change, but please read the proposal to
understand the full details.

You can find the most up to date version of the proposal at
https://github.com/gribozavr/swift-evolution/blob/new-collections/proposals/NNNN-collections-move-indices.md

Permalink: https://github.com/gribozavr/swift-evolution/blob/87df19a9a9d73e64a2a966b807440216a608b8ad/proposals/NNNN-collections-move-indices.md

Dmitri

## Introduction

We are proposing a new model for collections, where indices can only be
advanced forward or backward by the corresponding collection instance.
Indices become opaque tokens representing collection positions, that can
be produced and consumed by collection APIs. This allows us to reduce
the amount of data stored in indices to the bare minimum.

Compared to the current state, the new scheme simplifies implementation
of non-trivial indices, and fixes concurrency issues in `Set` and
`Dictionary` indices. It also allows us to eliminate reference-counted
stored properties from most indices, including non-trivial ones, like
`Set.Index` and `Dictionary.Index`, creating more optimizable code.

Out of scope for this proposal:

* Expanding the set of concrete collections provided by the standard
library.

* Expanding the set of collection protocols to provide functionality
beyond what is already provided (for example, protocols for sorted
collections, queues etc.) Discussing how other concrete collections
fit into the current protocol hierarchy is in scope, though.

--
Karoly


(plx) #5

# General Remarks: Great!

Thanks for sharing this proposal; it's a big change, but will be a big improvement once it's in place, and it's encouraging to see the team is willing to undertake changes of such scale.

I'm not sure how much discussion you'll actually manage to scare up b/c the issues this proposal addresses are *not* common, but are nevertheless rather significant when encountered.

E.G.: it's nice to be able to create simple chain/product/etc. collection-combinators which can, themselves, be collections, without winding up with indices indices forced to choose between *either* holding a back-reference to retrieve various startIndex/endIndex values as-needed *or* carrying around a complete set of all needed startIndex/endIndex values.

I don’t have any negative feedback that isn’t subsumed by the next section.

# Concern: Linearity Baked-In

Even with this change, I have some concern that the proposed protocol hierarchy from `Collection` onward feels like it has a foreseeable lack of generality due to how strongly "linear" the design wants `Collection` to be.

Is this the right time to raise such concerns (e.g. in-scope for this discussion)?

Other than that concern about generality I think this proposal looks great "on paper” and I hope to get a chance to experiment with it soon.

···

On Mar 1, 2016, at 8:04 PM, Dmitri Gribenko via swift-evolution <swift-evolution@swift.org> wrote:

Hi,

We would like to propose a major change to how collection indices
work. The standard library team has discussed this idea internally
and we wrote a prototype. Now we think it is a viable direction to
consider, and we are bringing it for wider public discussion.

I'm pasting the first section of the proposal below to give you a
general idea about this change, but please read the proposal to
understand the full details.

You can find the most up to date version of the proposal at
https://github.com/gribozavr/swift-evolution/blob/new-collections/proposals/NNNN-collections-move-indices.md

Permalink: https://github.com/gribozavr/swift-evolution/blob/87df19a9a9d73e64a2a966b807440216a608b8ad/proposals/NNNN-collections-move-indices.md

Dmitri

## Introduction

We are proposing a new model for collections, where indices can only be
advanced forward or backward by the corresponding collection instance.
Indices become opaque tokens representing collection positions, that can
be produced and consumed by collection APIs. This allows us to reduce
the amount of data stored in indices to the bare minimum.

Compared to the current state, the new scheme simplifies implementation
of non-trivial indices, and fixes concurrency issues in `Set` and
`Dictionary` indices. It also allows us to eliminate reference-counted
stored properties from most indices, including non-trivial ones, like
`Set.Index` and `Dictionary.Index`, creating more optimizable code.

Out of scope for this proposal:

* Expanding the set of concrete collections provided by the standard
library.

* Expanding the set of collection protocols to provide functionality
beyond what is already provided (for example, protocols for sorted
collections, queues etc.) Discussing how other concrete collections
fit into the current protocol hierarchy is in scope, though.

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Dmitri Gribenko) #6

Hi,

What does everyone think about requiring indices to conform to
Hashable, in addition to the existing requirements for Equatable and
Comparable?

I don't think this should limit any viable collection designs, and yet
might be useful, for example, to store indices in a set.

Dmitri

···

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Dmitri Gribenko) #7

I'm using the new names introduced by SE-0006
https://github.com/apple/swift-evolution/blob/master/proposals/0006-apply-api-guidelines-to-the-standard-library.md

The protocol is not called Iterator to disambiguate it with the
Iterator associated type in Sequence from the protocol:

protocol Sequence {
  associatedtype Iterator : IteratorProtocol
}

It is not called Iterable because that would have the wrong meaning
(Sequence is iterable, the iterator iterates over its elements).

Dmitri

···

On Tue, Mar 1, 2016 at 6:55 PM, Trent Nadeau <tanadeau@gmail.com> wrote:

Why is the protocol for iterators called IteratorProtocol instead of
Iterator or Iterable? If that is/was discussed elsewhere, I apologize, but I
don't remember seeing that particular name before. Is that new to this
proposal?

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Trent Nadeau) #8

Ah. Thanks. I've read through that proposal before but totally forgot that
change.

···

On Tue, Mar 1, 2016 at 10:01 PM, Dmitri Gribenko <gribozavr@gmail.com> wrote:

On Tue, Mar 1, 2016 at 6:55 PM, Trent Nadeau <tanadeau@gmail.com> wrote:
> Why is the protocol for iterators called IteratorProtocol instead of
> Iterator or Iterable? If that is/was discussed elsewhere, I apologize,
but I
> don't remember seeing that particular name before. Is that new to this
> proposal?

I'm using the new names introduced by SE-0006

https://github.com/apple/swift-evolution/blob/master/proposals/0006-apply-api-guidelines-to-the-standard-library.md

The protocol is not called Iterator to disambiguate it with the
Iterator associated type in Sequence from the protocol:

protocol Sequence {
  associatedtype Iterator : IteratorProtocol
}

It is not called Iterable because that would have the wrong meaning
(Sequence is iterable, the iterator iterates over its elements).

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/

--
Trent Nadeau


(Dmitri Gribenko) #9

If limit was implied to be endIndex, how would Slice implement the
limiting advance? Only by ignoring the advance() implementation from
the collection, and incrementing manually. So it seems like advance
with an explicit limit is a basic operation that you can use to build
others.

Another thing is that 'unsafeAdvance' is not really unsafe, it used to
be implementation-defined, but not memory-unsafe. But now that the
collection is always available in advance(_:by:), we should be able to
require it to perform a trap if result is out of bounds.

Dmitri

···

On Wed, Mar 2, 2016 at 1:08 AM, Brent Royal-Gordon <brent@architechies.com> wrote:

We would like to propose a major change to how collection indices
work. The standard library team has discussed this idea internally
and we wrote a prototype. Now we think it is a viable direction to
consider, and we are bringing it for wider public discussion.

This is super-interesting, and overall I think it's probably an improvement (although I haven't gone *really* deep with generic CollectionType code very often).

I do have a question, though it may simply betray my ignorance. `advance` has two forms:

        func advance(i: Index, by n: IndexDistance) -> Index
        func advance(i: Index, by n: IndexDistance, limit: Index) -> Index

This is a pretty straight port of the old `advance`:

        public func advancedBy(amount: Distance) -> Self
        public func advancedBy(amount: Distance, limit: Self) -> Self

But I'm not sure the `limit` parameter is still appropriate. I've always figured that you were supposed to pass the collection's `endIndex` there, but now that `advance` *is* a method on the collection, it can access that itself. Would we be better served with something like this?

        func unsafeAdvance(i: Index, by n: IndexDistance) -> Index // no endIndex check
        func advance(i: Index, by n: IndexDistance) -> Index // endIndex check

Or do you imagine the `limit` parameter being put to some other use that I'm not thinking of?

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Dave Abrahams) #10

This looks interesting! As the author of a number of custom collection
implementations, including a rather elaborate B-tree package
(https://github.com/lorentey/BTree),

FWIW, I am impressed by (even jealous of) this work, so your feedback in
this area is much appreciated.

it always felt strange to me that indices are expected to be able to
move around the collection on their own, while element access has to
go through by the collection. It is a great idea to fix this
asymmetry.

I’ll have to carefully read through it a couple more times and look at
the prototype branch to form a real opinion, but at first glance I
like the proposal. Here are a couple of quick thoughts, with more to
come once I had time to think about the implications in detail:

- I’m not at a great fan of the `*IndexType` protocols in Swift 2. I
  do not believe they work hard enough compared to how hard it is to
  implement them, and I welcome any change that makes them even a
  little bit simpler for the collection writer.

- Having to call a collection method to increment an index looks
  unnatural at first glance, but I can see myself getting used to it
  in a few hours.

One thing that hasn't been mentioned so far is that when the algorithms
using indices are extensions on the collection protocol (or members of a
specific collection) these APIs can be used without qualification, which
makes them read like free functions, which ends up looking quite natural
to my eye.

- I know that it isn't a new requirement, but I do dislike that
  `Indexable` specifies the complexity of index operations; this puts
  a hard constraint on custom collection design. I do understand the
  desire for concrete complexity promises on operations using
  indexes, but can't we express these instead e.g. in terms of number
  of index accesses?

The problem is that eventually it becomes really difficult to simply
describe the efficiency of any given algorithm. We don't want people to
have to do complex algebra to understand how algorithms compose with
data structures.

Yes, it's a trade-off, and loosening the upper bound on the cost of
indexing was one of the things we considered. We tried to carefully
think through different possible collection designs (trees in
particular!) to understand what the implications of keeping the O(1)
upper bound were. We'd be happy to discuss specific tradeoffs with you.

- I love that there is a section with detailed guidance on designing
  tree-based collections. It’s interesting and informative.

- My B-trees are persistent data structures, thus my nodes cannot have
  parent or sibling links. Index lookup and navigation is still O(1)
  though, as my indices contain pointers to every node on the path to
  the current element. Since I have to keep looking up these nodes
  anyway to retrieve elements and to navigate around in the tree, I
  simply decided to keep them directly in the index. B-trees are super
  shallow, so there are only a handful of nodes on any path.

Yeah, one other way to get O(1) here is to consider that log N is
bounded by a small constant, for practical purposes.

- I found that the most straightforward place to implement tree
  navigation methods like `next(:)` and `advance(:by:)` is on the path
  struct that contains the actual node references. There is no reason
  I couldn't have the new collection methods simply call through to
  these path methods, though -- I am currently doing the same thing in
  the BTreeIndex type anyway.

- I'm using weak references inside the index, with a (seriously
  underdeveloped) index invalidation method that happens to be closer
  to #2b than #2a. I'm not happy about using weak references, but this
  seemed the most sensible thing to do. I'd love to replace them with
  `unowned(unsafe)`, and the mutation counter seems like a great idea.
  The ARC issue mentioned at the end of the proposal is rather scary,
  though -- I don't know how I would protect against that.

Hmm, Dmitri, I thought we decided that using unowned(unsafe) to manage
the "indices" collection was simply untenable. Remember the
thread-safety issue, iterating indices on one thread while mutating the
collection on another?

  Generators/iterators look to be safe from this issue,
  so I’ll probably optimize their path representation first.

- For mutation, I think custom APIs often make much more sense
  than general-purpose solutions. I try to discourage use of the normal
  B-tree index for doing complex tree mutations, and I instead provide
  a cursor construct that was designed especially for performing
  a batch of mutations in a batch:

  https://github.com/lorentey/BTree/blob/master/Sources/BTreeCursor.swift#L295-L707

  The cursor is like an index on steroids. It has an identity with
  mutable state on its own, and it takes unique ownership of the tree
  while it is active. This frees the cursor to disable some costly
  invariants (such as maintaining up-to-date descendant counts in each
  node). This in turn allows for convenient batch editing of elements
  in the tree, with amortized O(1) insertion and removal operations.

  The cursor's approach goes the exact opposite way of this proposal:
  not only is the collection not necessary to use the cursor, but the
  collection's value isn't even available while there is an active
  cursor on it. (This is like how
  `Array.withUnsafeMutableBufferPointer()` works.)

- I'm almost positive this has been discussed before, but what is the
  rationale behind allowing non-Int `IndexDistance`s?

One could imagine indexing a file-backed thing on a 32-bit platform,
which could require 64-bit distances.

  The distance is getting cast to Int in a lot of places anyway (IIRC,
  even the stdlib uses numericCasts to cut a way through it.)

We've tried to be careful enough in balancing ease-of-use (Int
almost everywhere) with flexibility, but we might have made mistakes,
for sure.

    associatedtype IndexDistance : SignedIntegerType = Int

- The `Indices` associated type is intriguing. I assume it is brand new?
  It seems strange that it is allowed to hold a strong reference, but
  I’ll have to look through the prototype code to grok it.

  Superficial comment: I’m not too happy with the name. The irregular
  plural is hard on non-native English speakers, plus it seems weird
  to have both an `Index` and an `Indices` type. The `indices` property
  calls it `IndexRange` (I assume by accident); I think I like that
  name better.

- In this declaration:

    subscript(position: Index) -> Generator.Element { get }

  I find the argument name rather unfortunate, because I've been using
  the term "position" to consistently refer to the (numerical)
  position of an element in an ordered collection,

“Offset” would be a better name for that, IMO.

···

on Tue Mar 01 2016, Károly Lőrentey <swift-evolution@swift.org> wrote:

  which is typically not the same as the element's index. Could we
  just quietly rename this to `index` or `i`? :slight_smile:

On 2016-03-02, at 03:04, Dmitri Gribenko <gribozavr@gmail.com> wrote:

Hi,

We would like to propose a major change to how collection indices
work. The standard library team has discussed this idea internally
and we wrote a prototype. Now we think it is a viable direction to
consider, and we are bringing it for wider public discussion.

I'm pasting the first section of the proposal below to give you a
general idea about this change, but please read the proposal to
understand the full details.

You can find the most up to date version of the proposal at
https://github.com/gribozavr/swift-evolution/blob/new-collections/proposals/NNNN-collections-move-indices.md

Permalink:
https://github.com/gribozavr/swift-evolution/blob/87df19a9a9d73e64a2a966b807440216a608b8ad/proposals/NNNN-collections-move-indices.md

Dmitri

## Introduction

We are proposing a new model for collections, where indices can only be
advanced forward or backward by the corresponding collection instance.
Indices become opaque tokens representing collection positions, that can
be produced and consumed by collection APIs. This allows us to reduce
the amount of data stored in indices to the bare minimum.

Compared to the current state, the new scheme simplifies implementation
of non-trivial indices, and fixes concurrency issues in `Set` and
`Dictionary` indices. It also allows us to eliminate reference-counted
stored properties from most indices, including non-trivial ones, like
`Set.Index` and `Dictionary.Index`, creating more optimizable code.

Out of scope for this proposal:

* Expanding the set of concrete collections provided by the standard
library.

* Expanding the set of collection protocols to provide functionality
beyond what is already provided (for example, protocols for sorted
collections, queues etc.) Discussing how other concrete collections
fit into the current protocol hierarchy is in scope, though.

--
-Dave


(Dmitri Gribenko) #11

# General Remarks: Great!

Thanks for sharing this proposal; it's a big change, but will be a big improvement once it's in place, and it's encouraging to see the team is willing to undertake changes of such scale.

I'm not sure how much discussion you'll actually manage to scare up b/c the issues this proposal addresses are *not* common, but are nevertheless rather significant when encountered.

E.G.: it's nice to be able to create simple chain/product/etc. collection-combinators which can, themselves, be collections, without winding up with indices indices forced to choose between *either* holding a back-reference to retrieve various startIndex/endIndex values as-needed *or* carrying around a complete set of all needed startIndex/endIndex values.

Agreed! We already have that problem in the lazy flatMap collection.

I don’t have any negative feedback that isn’t subsumed by the next section.

# Concern: Linearity Baked-In

Even with this change, I have some concern that the proposed protocol hierarchy from `Collection` onward feels like it has a foreseeable lack of generality due to how strongly "linear" the design wants `Collection` to be.

Is this the right time to raise such concerns (e.g. in-scope for this discussion)?

We can definitely dive into more details about this. One thing that I
would want to understand is whether this non-linearity concept could
be added later without a re-design.

Could you provide more information about the linearity issues that you
have in mind? Are you thinking of something like Segmented Iterators
and Hierarchical Algorithms [1]?

[1] http://lafstern.org/matt/segmented.pdf

Other than that concern about generality I think this proposal looks great "on paper” and I hope to get a chance to experiment with it soon.

If you want to try an early version, there is a prototype
implementation in
https://github.com/apple/swift/blob/master/test/Prototypes/CollectionsMoveIndices.swift

Dmitri

···

On Thu, Mar 3, 2016 at 9:56 AM, plx via swift-evolution <swift-evolution@swift.org> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Dmitri Gribenko) #12

Hi Károly,

Sorry for a delayed reply!

This looks interesting! As the author of a number of custom collection
implementations, including a rather elaborate B-tree package
(https://github.com/lorentey/BTree), it always felt strange to me that
indices are expected to be able to move around the collection on their
own, while element access has to go through by the collection. It is a
great idea to fix this asymmetry.

I’ll have to carefully read through it a couple more times and look at
the prototype branch to form a real opinion, but at first glance I
like the proposal. Here are a couple of quick thoughts, with more to
come once I had time to think about the implications in detail:

- I’m not at a great fan of the `*IndexType` protocols in Swift 2. I
  do not believe they work hard enough compared to how hard it is to
  implement them, and I welcome any change that makes them even a
  little bit simpler for the collection writer.

Thank you! I added your point to the proposal.

- Having to call a collection method to increment an index looks
  unnatural at first glance, but I can see myself getting used to it
  in a few hours.

- I know that it isn't a new requirement, but I do dislike that
  `Indexable` specifies the complexity of index operations; this puts
  a hard constraint on custom collection design. I do understand the
  desire for concrete complexity promises on operations using
  indexes, but can't we express these instead e.g. in terms of number
  of index accesses?

I think Dave answered this one.

- I love that there is a section with detailed guidance on designing
  tree-based collections. It’s interesting and informative.

- My B-trees are persistent data structures, thus my nodes cannot have
  parent or sibling links. Index lookup and navigation is still O(1)
  though, as my indices contain pointers to every node on the path to
  the current element.

Are these pointers unsafe pointers or ARC references? If they are ARC
references, then we get extra retain/releases. If they are unsafe
pointers, we need a safety check -- and I am interested to know what
was your approach to this.

- I found that the most straightforward place to implement tree
  navigation methods like `next(:)` and `advance(:by:)` is on the path
  struct that contains the actual node references. There is no reason
  I couldn't have the new collection methods simply call through to
  these path methods, though -- I am currently doing the same thing in
  the BTreeIndex type anyway.

- I'm using weak references inside the index, with a (seriously
  underdeveloped) index invalidation method that happens to be closer
  to #2b than #2a. I'm not happy about using weak references, but this
  seemed the most sensible thing to do. I'd love to replace them with
  `unowned(unsafe)`, and the mutation counter seems like a great idea.
  The ARC issue mentioned at the end of the proposal is rather scary,
  though -- I don't know how I would protect against that.

Does your tree have in-place mutation? (E.g., if a tree is uniquely
referenced, and there's only one live index, does tree[i] = 42
reallocate any nodes?)

I think the reasoning about unowned(safe) applies to weak references.
(The ARC issue.)

Fundamentally, the problem is that if you want an entity to be an
independent value, it has to hold strong references to all its
critical parts. For example, an index can hold a weak reference to
something, but only if it does not care and continues to be fully
functional if that reference suddenly becomes nil. But if a valid
index can't operate without a reference to collection's storage, it
has to be a strong reference.

You can replace a strong reference with an unsafe(unowned) reference,
but as an optimization, when it is possible to guarantee memory safety
by other means that are cheaper than ARC. You should see no
functional difference between using a strong reference and
unsafe(unowned) + necessary safety checks.

  Generators/iterators look to be safe from this issue,
  so I’ll probably optimize their path representation first.

Generators/iterators are separate values. So they should be
traversing a snapshot of the collection value. Iterators shouldn't be
auto-updating with concurrent collection mutations. For example, for
any collection implementation, this code terminates and appends
collection's elements to itself:

var c = getCollection()
for item in c {
  c.append(item)
}

Thus, iterators should be holding a strong reference to the
collection's storage, and make it non-uniquely referenced.

- For mutation, I think custom APIs often make much more sense
  than general-purpose solutions. I try to discourage use of the normal
  B-tree index for doing complex tree mutations, and I instead provide
  a cursor construct that was designed especially for performing
  a batch of mutations in a batch:

  https://github.com/lorentey/BTree/blob/master/Sources/BTreeCursor.swift#L295-L707

  The cursor is like an index on steroids. It has an identity with
  mutable state on its own, and it takes unique ownership of the tree
  while it is active. This frees the cursor to disable some costly
  invariants (such as maintaining up-to-date descendant counts in each
  node). This in turn allows for convenient batch editing of elements
  in the tree, with amortized O(1) insertion and removal operations.

  The cursor's approach goes the exact opposite way of this proposal:
  not only is the collection not necessary to use the cursor, but the
  collection's value isn't even available while there is an active
  cursor on it. (This is like how
  `Array.withUnsafeMutableBufferPointer()` works.)

That's a very powerful concept!

- I'm almost positive this has been discussed before, but what is the
  rationale behind allowing non-Int `IndexDistance`s? The distance is
  getting cast to Int in a lot of places anyway (IIRC, even the stdlib
  uses numericCasts to cut a way through it.)

    associatedtype IndexDistance : SignedIntegerType = Int

- The `Indices` associated type is intriguing. I assume it is brand new?
  It seems strange that it is allowed to hold a strong reference, but
  I’ll have to look through the prototype code to grok it.

No, it is not new. It is a special type for `var indices`, which is
the collection of indices. This variable used to be typed as
`Range<Index>` -- basically wrapping a pair of indices.
`Range<Index>` won't be able to conform to Collection in this
proposal, since you can't increment all indices in isolation.
Therefore, in the general case, `var indices` has to capture
startIndex, endIndex and the collection itself. In some cases (for
example, Array) it can be Range<Index>. To allow using the cheapest
possible implementation, we are making the type of `var indices` an
associated type.

The reason why it needs to hold a strong reference is the same as why
generators need to hold a strong reference. This index collection is
a separate value. The value should be usable regardless of what
happens to other values.

Consider this example:

var c = getCollection()
for i in c.indices {
  print(i)
  c.append(c[i])
}

Imagine that `c` is uniquely referenced, and `c.indices` holds an
unowned reference to `c` that does not increase the reference count.
Then, when a new element is appended, the collection might need to
reallocate the storage. The old storage will be deallocated. When
`c.indices` needs to increment the index, it will try to access an
unowned reference that was deallocated, and the program will trap.

  Superficial comment: I’m not too happy with the name. The irregular
  plural is hard on non-native English speakers, plus it seems weird
  to have both an `Index` and an `Indices` type. The `indices` property
  calls it `IndexRange` (I assume by accident); I think I like that
  name better.

I'll let Dave answer this one :slight_smile:

Dmitri

···

On Tue, Mar 1, 2016 at 10:39 PM, Károly Lőrentey <karoly@lorentey.hu> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(plx) #13

So far I can’t think of a single index that would have difficulty implementing `Hashable`.

One policy question though:

  struct MutationTrackingTreeIndex<T> {
    private let mutation: Int
    private unsafe(unowned) let treeNode: TreeNode<T>
  }

  // which hashValue would you expect:
  extension MutationTrackingTreeIndex : Hashable {

    var hashValue: Int {
      return ObjectIdentifier(treeNode).hashValue
    }

    var hashValue: Int {
      return ObjectIdentifier(treeNode).hashValue ^ self.mutation
    }

  }

…I’d assume the 2nd, but there’s then going to be room for confusion vis-a-vis how it works with Array (and other Int-indexed collections, or even e.g. an “Array” backed by a persistent tree of some kind).

I think `Hashable` indices is a good idea but I would want the expectation for considerations like the above to be documented for consistency’s sake.

···

On Mar 7, 2016, at 7:25 PM, Dmitri Gribenko via swift-evolution <swift-evolution@swift.org> wrote:

Hi,

What does everyone think about requiring indices to conform to
Hashable, in addition to the existing requirements for Equatable and
Comparable?

I don't think this should limit any viable collection designs, and yet
might be useful, for example, to store indices in a set.

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Joe Groff) #14

Is there a reason for Hashable to be distinct from Equatable at all? Every Equatable type can (badly) satisfy the requirements of Hashable with `return 0;`, if nothing else.

-Joe

···

On Mar 7, 2016, at 5:25 PM, Dmitri Gribenko via swift-evolution <swift-evolution@swift.org> wrote:

Hi,

What does everyone think about requiring indices to conform to
Hashable, in addition to the existing requirements for Equatable and
Comparable?

I don't think this should limit any viable collection designs, and yet
might be useful, for example, to store indices in a set.


(Pyry Jahkola) #15

Why impose such a restriction?

I think it's better to just make any concrete index types (e.g. DictionaryIndex<K,V>) in the standard library Hashable where possible. People are still free to constrain their algorithms to `C : CollectionType where C.Index : Hashable` whenever needed.

Besides, in the proposed design, is it really necessary to make Index also Comparable, or wouldn't Equatable be enough? Is it just a question of convenience and failing to find an example of and index type that's Equatable but not Comparable?

— Pyry

···

On 08 Mar 2016, at 03:25, Dmitri Gribenko via swift-evolution <swift-evolution@swift.org> wrote:

What does everyone think about requiring indices to conform to
Hashable, in addition to the existing requirements for Equatable and
Comparable?


(Howard Lovatt) #16

I have used the node of a linked list as the list's index! The node wasn't
Hashable, Equatable, or Comparable; hash and equate would be easy to add
but Comparable would be expensive. It is probably not a big deal since
linked lists aren't that great a data structure anyway. Code below:

class Node<Element> {

    var value: Element

    private(set) var next: Node<Element>? = nil

    init(_ value: Element) { self.value = value }

    init(value: Element, successor: Node<Element>) {

        self.value = value

        self.next = successor

    }

}

class LinkedListGenerator<Element>: GeneratorType {

    private var index: Node<Element>?

    init(_ startIndex: Node<Element>) { index = startIndex }

    func next() -> Element? {

        guard let i = index else { return nil }

        let element = i.value

        index = i.next

        return element

    }

}

class LinkedList<Element>: SequenceType { // Has same interface as
Indexable, but without the ForwardIndexType constraint

    private(set) var startIndex: Node<Element>

    private(set) var endIndex: Node<Element>

    convenience init<S: SequenceType where S.Generator.Element ==

(elements:

S) {

        var gen = elements.generate()

        var element = gen.next()

        guard let first = element else { fatalError("Empty list") }

        self.init(first)

        element = gen.next()

        while element != nil {

            append(element!)

            element = gen.next()

        }

    }

    init(_ element: Element) {

        startIndex = Node(element)

        endIndex = startIndex

    }

    subscript(index: Node<Element>) -> Element {

        get { return index.value }

        set { index.value = newValue }

    }

    func generate() -> LinkedListGenerator<Element> { return
LinkedListGenerator(startIndex) }

    func prepend(element: Element) { startIndex = Node(value: element,
successor: startIndex) }

    func prepend(elements elements: LinkedList<Element>) {

        let old = startIndex

        startIndex = elements.startIndex

        elements.endIndex.next = old

    }

    func append(element: Element) {

        endIndex.next = Node(element)

        endIndex = endIndex.next!

    }

    func append(elements elements: LinkedList<Element>) {

        endIndex.next = elements.startIndex

        endIndex = elements.endIndex

    }

}

  -- Howard.

···

On 8 March 2016 at 12:25, Dmitri Gribenko via swift-evolution < swift-evolution@swift.org> wrote:

Hi,

What does everyone think about requiring indices to conform to
Hashable, in addition to the existing requirements for Equatable and
Comparable?

I don't think this should limit any viable collection designs, and yet
might be useful, for example, to store indices in a set.

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Howard Lovatt) #17

Sorry, unfortunately I can't see that it really helps. All that happens is
that the client of the collection now holds the reference to the collection
as well as the reference to the iterator. In the more traditional model the
client holds a reference to the iterator which in turn holds a reference to
the collection.

Maybe the problem arose with the double indirection trick. Would a better
approach be to keep it simple:

    struct SomeCollection<E>: Collection {
        private var refToInternalStorage: UnsafePointerOfSomeSort<E>
        func iterable() -> SomeCollectionIterator { return
SomeCollectionIterator(self) }
        mutating func append(element: E) {
            copyInternalStorageIfNecessary(size + 1) // Makes a copy if
either not large enough or if already aliased
            // Append `element` to `refToInternalStorage`
        }
        // Other functions access `refToInternalStorage`, but if mutating
call `copyInternalStorageIfNecessary(Int)` before modifying
    }

    struct SomeCollectionIterator<E>: Iterator {
        private let someCollection: SomeCollection<E> // Note let not var -
could be direct link to internal storage
        private var index: Index
        init(someCollection: someCollection <E>) {
            self.someCollection = someCollection
            index = someCollection.firstIndex()
        }
        // next()
    }

Obviously above is pseudo code - but hopefully you get the idea.

The difference comes when with the proposal you do:

    var array = [1]
    var iterator = array.makeIterator()
    array.removeAll()
    array.next(iterator) // Runtime exception

Similarly with modifications in another thread. With the above suggested
`obvious` implementation:

    var array = [1]
    var iterator = array.iterator()
    array.removeAll()
    iterator.next() // OK, returns 1 because it has the original before
array was mutated.

In essence I am saying do the obvious, since that is easiest for the
clients (i.e. the programmers that use Swift).

Sorry to be negative,

-- Howard.

  -- Howard.

···

On 2 March 2016 at 14:03, Trent Nadeau via swift-evolution < swift-evolution@swift.org> wrote:

Ah. Thanks. I've read through that proposal before but totally forgot that
change.

On Tue, Mar 1, 2016 at 10:01 PM, Dmitri Gribenko <gribozavr@gmail.com> > wrote:

On Tue, Mar 1, 2016 at 6:55 PM, Trent Nadeau <tanadeau@gmail.com> wrote:
> Why is the protocol for iterators called IteratorProtocol instead of
> Iterator or Iterable? If that is/was discussed elsewhere, I apologize,
but I
> don't remember seeing that particular name before. Is that new to this
> proposal?

I'm using the new names introduced by SE-0006

https://github.com/apple/swift-evolution/blob/master/proposals/0006-apply-api-guidelines-to-the-standard-library.md

The protocol is not called Iterator to disambiguate it with the
Iterator associated type in Sequence from the protocol:

protocol Sequence {
  associatedtype Iterator : IteratorProtocol
}

It is not called Iterable because that would have the wrong meaning
(Sequence is iterable, the iterator iterates over its elements).

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/

--
Trent Nadeau

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Dave Abrahams) #18

Awesome paper; everybody should read it :wink:

Collections aren't necessarily about fundamentally linear structures,
but about *linearizable* structures. If you want to build a system of
generic algorithms that work efficiently on all kinds of data
structures, flat linearity won't always be the most efficient
abstraction, but:

1. It's generally simpler to write linear algorithms (e.g. even copying
   from one hierarchical data structure to another with different
   intermediate segment boundaries can be hard to code).

2. You'll want the linear specialization for the leaves of your data
   structure regardless of what else you do, so having a linearized
   abstraction is important even if you're going to build a hierarchical
   one.

···

on Thu Mar 03 2016, Dmitri Gribenko <swift-evolution@swift.org> wrote:

On Thu, Mar 3, 2016 at 9:56 AM, plx via swift-evolution > <swift-evolution@swift.org> wrote:

# General Remarks: Great!

Thanks for sharing this proposal; it's a big change, but will be a
big improvement once it's in place, and it's encouraging to see the
team is willing to undertake changes of such scale.

I'm not sure how much discussion you'll actually manage to scare up
b/c the issues this proposal addresses are *not* common, but are
nevertheless rather significant when encountered.

E.G.: it's nice to be able to create simple
chain/product/etc. collection-combinators which can, themselves, be
collections, without winding up with indices indices forced to
choose between *either* holding a back-reference to retrieve various
startIndex/endIndex values as-needed *or* carrying around a complete
set of all needed startIndex/endIndex values.

Agreed! We already have that problem in the lazy flatMap collection.

I don’t have any negative feedback that isn’t subsumed by the next section.

# Concern: Linearity Baked-In

Even with this change, I have some concern that the proposed
protocol hierarchy from `Collection` onward feels like it has a
foreseeable lack of generality due to how strongly "linear" the
design wants `Collection` to be.

Is this the right time to raise such concerns (e.g. in-scope for this discussion)?

We can definitely dive into more details about this. One thing that I
would want to understand is whether this non-linearity concept could
be added later without a re-design.

Could you provide more information about the linearity issues that you
have in mind? Are you thinking of something like Segmented Iterators
and Hierarchical Algorithms [1]?

[1] http://lafstern.org/matt/segmented.pdf

--
-Dave


(Karoy Lorentey) #19

This looks interesting! As the author of a number of custom collection
implementations, including a rather elaborate B-tree package
(https://github.com/lorentey/BTree),

FWIW, I am impressed by (even jealous of) this work, so your feedback in
this area is much appreciated.

Why thank you. *blush*

One thing that hasn't been mentioned so far is that when the algorithms
using indices are extensions on the collection protocol (or members of a
specific collection) these APIs can be used without qualification, which
makes them read like free functions, which ends up looking quite natural
to my eye.

Nice!

- I know that it isn't a new requirement, but I do dislike that
`Indexable` specifies the complexity of index operations; this puts
a hard constraint on custom collection design. I do understand the
desire for concrete complexity promises on operations using
indexes, but can't we express these instead e.g. in terms of number
of index accesses?

The problem is that eventually it becomes really difficult to simply
describe the efficiency of any given algorithm. We don't want people to
have to do complex algebra to understand how algorithms compose with
data structures.

Yes, it's a trade-off, and loosening the upper bound on the cost of
indexing was one of the things we considered. We tried to carefully
think through different possible collection designs (trees in
particular!) to understand what the implications of keeping the O(1)
upper bound were. We'd be happy to discuss specific tradeoffs with you.

That's fair! So far, I've always been able to implement constant access,
and the code has only become better of it.

The only place where I intentionally decided against it is the tree-backed,
Array-like `List`. It really wants to be indexed by an Int, to emulate
Array's loose approach to index invalidation. (Although, like you said,
B-tree lookup is practically O(1) anyway, so maybe I'll just call it that.)

This reminds me that index invalidation rules are all over the place across
the various collection types and not very well documented. Some ideas for
improvements:

- From the sample code in its documentation, it looks like
  `MutableCollection` requires subscript assignment to not invalidate
  indices. Can we make this a formal requirement?
- What about range assignment in `MutableCollection`? E.g., what happens
  to indices if it changes the element count? (Is supporting that a requirement?)
- Does `RangeReplacableCollection` imply Array-like index invalidation?
  I think it should.

By the way, does the requirement for (amortized) constant complexity also
apply to advancing an index? That's OK for trees, but it may not come cheap
for hashed collections. Native dictionaries & sets currently have an
index.successor() with O(n) complexity (if I'm not mistaken).

What about `Collection.indices`? Getting the first index will definitely
take O(log(n)) for tree collections, and bridged dictionaries/sets have
this at O(n).

- I'm using weak references inside the index, with a (seriously
underdeveloped) index invalidation method that happens to be closer
to #2b than #2a. I'm not happy about using weak references, but this
seemed the most sensible thing to do. I'd love to replace them with
`unowned(unsafe)`, and the mutation counter seems like a great idea.
The ARC issue mentioned at the end of the proposal is rather scary,
though -- I don't know how I would protect against that.

Hmm, Dmitri, I thought we decided that using unowned(unsafe) to manage
the "indices" collection was simply untenable. Remember the
thread-safety issue, iterating indices on one thread while mutating the
collection on another?

Hm, aren't multithreaded mutations supposed to use COW to get their own
copy to mutate?

Or is this about invalidation of the return value of `Collection.indices`?
That's not supposed to survive (all) mutations, is it?

(Ugh, how do I refer to an instance of Indices (the associated type) without
confusing it with indices in the sense of multiple `Index`es?)

- I'm almost positive this has been discussed before, but what is the
rationale behind allowing non-Int `IndexDistance`s?

One could imagine indexing a file-backed thing on a 32-bit platform,
which could require 64-bit distances.

Ah, OK! But it begs the question: would a file-backed collection be able
to satisfy the O(1) indexing requirement without making a complete
mockery of it? :slight_smile:

The distance is getting cast to Int in a lot of places anyway (IIRC,
even the stdlib uses numericCasts to cut a way through it.)

We've tried to be careful enough in balancing ease-of-use (Int
almost everywhere) with flexibility, but we might have made mistakes,
for sure.

I think the largest speed bump there is that sequences and collections
are often loaded into Arrays, and an oversized collection would have to
tread very carefully to avoid all of these places. The API is geared
towards finite (and relatively small-ish) collections/sequences.
I suspect a Collection with a billion elements wouldn't work much better
than an infinite sequence.

- In this declaration:

subscript(position: Index) -> Generator.Element { get }

I find the argument name rather unfortunate, because I've been using
the term "position" to consistently refer to the (numerical)
position of an element in an ordered collection,

“Offset” would be a better name for that, IMO.

That's spot on! Although in some contexts it could be misunderstood to
mean a delta. I'll sleep on it, but I think you're right.

···

On 2016-03-03 16:25:45 +0000, Dave Abrahams via swift-evolution said:

on Tue Mar 01 2016, Károly Lőrentey <swift-evolution@swift.org> wrote:

--
Károly


(plx) #20

# General Remarks: Great!

Thanks for sharing this proposal; it's a big change, but will be a big improvement once it's in place, and it's encouraging to see the team is willing to undertake changes of such scale.

I'm not sure how much discussion you'll actually manage to scare up b/c the issues this proposal addresses are *not* common, but are nevertheless rather significant when encountered.

E.G.: it's nice to be able to create simple chain/product/etc. collection-combinators which can, themselves, be collections, without winding up with indices indices forced to choose between *either* holding a back-reference to retrieve various startIndex/endIndex values as-needed *or* carrying around a complete set of all needed startIndex/endIndex values.

Agreed! We already have that problem in the lazy flatMap collection.

I don’t have any negative feedback that isn’t subsumed by the next section.

# Concern: Linearity Baked-In

Even with this change, I have some concern that the proposed protocol hierarchy from `Collection` onward feels like it has a foreseeable lack of generality due to how strongly "linear" the design wants `Collection` to be.

Is this the right time to raise such concerns (e.g. in-scope for this discussion)?

We can definitely dive into more details about this. One thing that I
would want to understand is whether this non-linearity concept could
be added later without a re-design.

Sorry for the belated reply; I had to think for a bit about this one!

I’m not entirely sure, in part b/c I don’t have a ready-to-go alternative design to propose.

What I had in mind was to try and "reserve room” for alternative indexing schemes to be added later without undue awkwardness.

However, when I went looking for real-world examples of such alternative indexing schemes it seems it’s just not something anyone actually does, and there’s probably a reason for that!

I think it’s possible to adjust the protocol hierarchy to “reserve room”—remove `Indexable`’s methods from `Collection`, add them back to a new `ForwardCollection` between `Collection` and `BidirectionalCollection`—but it’d only make sense to do that if you expect to have use for that room in the future.

Could you provide more information about the linearity issues that you
have in mind? Are you thinking of something like Segmented Iterators
and Hierarchical Algorithms [1]?

[1] http://lafstern.org/matt/segmented.pdf

That’s a neat paper! It also seems limited, insofar as it only demonstrated an improved implementation of `fill` (which is somewhat suggestive vis-a-vis the suspicions outlined above).

What I *was* thinking of was something generalizing the chapter on “Bifurcate Coordinates” in Stepanov’s “Elements of Programming” (this approach seems easily generalizable, but that doesn’t seem to have been taken up anywhere).

My thoughts were that as collections get stronger semantics, there’s an increasing disconnect between the semantics of the collection and the semantics of a “linear” index. Working up to it:

Arrays and array-like collections are truly “linear”, and are well-modeled by “linear” indices as-per the proposal; there’s a rich collection of generic algorithms that work on such collections and their indices, and it’s nice to have them in Swift.

Sets and dictionaries (and similar) are what I guess you could call “flat” (or “almost-linear”) collections: iteration aside, many index-based generic algorithms don’t work (particular mutating algorithms), and even where such algorithms *do work* it tends to be like this:

  let stringSet: Set<String> = [“a”, “b”, “c”, “d” ]
  let bPrefix = stringSet.prefixThrough(stringSet.indexOf(“b”)!)

…where the operation is *well-defined* but doesn’t have much to do with the semantics of `Set`.

Collections with even-stronger semantics (tree-shaped collections, order-maintaining collections, geometric collections, etc.) will likely have even further semantic disconnections!

As an example, one thing I’m working on is *basically* a “set" of `(Value,CGRect)` pairs (with the rects possibly-overlapping each other); it keeps its contents organized so you can traverse its contents sorted by any combination of edge (north/south/east/west) and order (ascending/descending).

The way I have been structuring it is that each of those traversals is basically like this:

  extension CustomCollection {

    /// Returns a collection providing a specific, read-only, ordered “view” of the collection's contents.
    var byAscendingNorthernEdges: AscendingNorthernEdgeView { get }

  }

…and although I *did* make `CustomCollection` itself into a `CollectionType`, I can’t think of any scenario in which I’d ever intentionally want to use its `Index` for anything other than iteration.

Similarly, consider something like the tree-index situation vis-a-vis generic algorithms like `binarySearch`, `upperBound`, `lowerBound`, and `equalRange`:

- if the tree *isn’t* ordered-as-required the generic algorithms won’t work b/c the precondition isn’t being met
- if the tree *is* ordered-as-required the algorithms *will* work, but an implementation using linear-indices would likely be sub-optimal vis-a-vis implementations that exploit the tree-structure (start at root, follow appropriate branch, etc.)

…all of which leads me to wonder a bit about the general usefulness of such linear indices outside of iteration.

To be clear, there’s nothing in all of the above that has anything specifically to do with “Collections move Indices”; it’s all about potential future directions.

Even then, it seems like most other languages stick with what I’ve been calling a “linear” model and their “fancier" collections *are* collections (as per their basic collection-API) and then use additional API to get at the “fancier” functionality, so it’s not clear there’s a real need to do different here.

···

On Mar 3, 2016, at 3:28 PM, Dmitri Gribenko <gribozavr@gmail.com> wrote:
On Thu, Mar 3, 2016 at 9:56 AM, plx via swift-evolution > <swift-evolution@swift.org> wrote:

Other than that concern about generality I think this proposal looks great "on paper” and I hope to get a chance to experiment with it soon.

If you want to try an early version, there is a prototype
implementation in
https://github.com/apple/swift/blob/master/test/Prototypes/CollectionsMoveIndices.swift

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/