Add ability to validate collection indices


(Alexey Komnin) #1

Hello everyone!

Currently, indices of a collection might be invalidated due to call to
any mutating method of the object. Assume, you have two indices - they
point to the first and the last elements of an array - and you call
removeLast() routine. The latter index would be invalidated during
this call but the first one is not. It's understandable.

Now imagine, you've got two indices of type SetIndex<T>. You don't
know what elements of the Set they point to. Then you call remove()
routine. Which of indices would still be valid?

Here is the complete example:

    var testSet: Set = [1, 2, 3, 4, 5]

    let idx = testSet.index(of: 4)!
    print("value at index \(idx): \(testSet[idx])")

    testSet.remove(3)
    print("value at index \(idx): \(testSet[idx])")
    // fatal error: attempting to access Set elements using an invalid Index

The reference states that indices may become invalid as a result of
mutating operations, but it doesn't state which of indices.

The proposed solution is: define new method for Collection protocol to
make it possible to validate index before subscripting particular
collection:
    validate(index: Self.Index)

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.
Thanks.

- Alexey Komnin


(Daniel Vollmer) #2

Hi,

[snip]

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.

I think this is a fruitless attempt, as even if the indices are still valid,
they may not refer to the same elements they referenced before the mutation.

Of course, mutating methods should state whether the invalidate existing
indices, but I think that’s about the best that can be reasonably done.

  Daniel.

···

On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution@swift.org> wrote:


(Anton Zhilin) #3

It will be impossible without huge additional overhead.
I assume that "invalidated" means "no more points to that element".

Consider that an element is inserted in the middle of an Array.
Assuming enough capacity, all iterators after that one will be invalidated.
But all new iterators pointing to the same spots will be valid.
How do you differentiate between the "old" ones and the "new" ones?

I see only one general approach to this:
1. Make iterator type a class
2. Add to the collection, an array of all iterators, which have been
created (and are being used)
3. Add a "valid" flag to iterator
4. On most operations on the collection, it will walk through its
iterators, marking some as "invalid".

It's a safe way to eliminate some "out of bounds" errors, but it's just
utterly rediculous.


(Dave Abrahams) #4

Hello everyone!

Currently, indices of a collection might be invalidated due to call to
any mutating method of the object. Assume, you have two indices - they
point to the first and the last elements of an array - and you call
removeLast() routine. The latter index would be invalidated during
this call but the first one is not. It's understandable.

Now imagine, you've got two indices of type SetIndex<T>. You don't
know what elements of the Set they point to. Then you call remove()
routine. Which of indices would still be valid?

Here is the complete example:

    var testSet: Set = [1, 2, 3, 4, 5]

    let idx = testSet.index(of: 4)!
    print("value at index \(idx): \(testSet[idx])")

    testSet.remove(3)
    print("value at index \(idx): \(testSet[idx])")
    // fatal error: attempting to access Set elements using an invalid Index

The reference states that indices may become invalid as a result of
mutating operations, but it doesn't state which of indices.

Yes. This is something we've been meaning to better nail down. There
are also index relationships we want to ensure between collections and
their copies and slices.

The proposed solution is: define new method for Collection protocol to
make it possible to validate index before subscripting particular
collection:
    validate(index: Self.Index)

I don't see how that really helps programmers. What they need most is
the ability to reason about index validity, IMO.

···

on Fri Dec 16 2016, Alexey Komnin <swift-evolution@swift.org> wrote:

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.
Thanks.

- Alexey Komnin
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave


(Dave Abrahams) #5

We can do a bit more. For example, RangeReplaceableCollection's
replaceRange should return the range in the new collection state
corresponding to the elements that were replaced.

···

on Fri Dec 16 2016, Daniel Vollmer <swift-evolution@swift.org> wrote:

Hi,

On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution@swift.org> wrote:

[snip]

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.

I think this is a fruitless attempt, as even if the indices are still valid,
they may not refer to the same elements they referenced before the mutation.

Of course, mutating methods should state whether the invalidate existing
indices, but I think that’s about the best that can be reasonably
done.

--
-Dave


(Haravikk) #6

There was one idea I was exploring but never quite came up with a final design for, which would be the ability for types to "tag" and invalidate values that they return.

In essence, a collection might return an index and tag it as "indices", and a later call to remove, add etc. might invalidate all values tagged as "indices", and provide some guidance on what to do about it. The Swift compiler will make a best effort to track these tags to provide warnings.

I was thinking something like this (heavily cut down):

  struct Foo : Collection {
    func index(_ i:Index, offsetBy n:IndexDistance) -> Index @tag("indices") { return i + n }
    @invalidates("indices") func remove(at:Index) { /* Remove an element */ }
  }

  var foo = Foo()
  let index = foo.index(foo.startIndex, offsetBy: 5)
  foo.remove(at: index) // This is fine
  foo.remove(at: index) // Warning: indices of foo have been invalidated

In other words, the variable index is linked (in the compiler, not at run-time) to the type instance that created it by the specified tag "indices". Now, when that type invalidates "indices" that variable is effectively marked as invalid, thus any attempt to use it will produce a warning that it may no longer be valid. Of course in the case of simple indices it should be fine, though technically speaking you're still doing something that could still fail at runtime.

The idea here is that, in the simpler cases at least, the compiler gains some awareness of index invalidation, allowing us to potentially avoid run-time errors entirely; naturally it gets more complex if the indices are passed around, but as long as it is still possible to track where they came from (e.g- if it's a collection held by a class reference) then it can still work. This feature could also be used to detect use of an index for the wrong type or instance; i.e- if two instances have the same index type, indices may not be compatible between the two, currently mixing and matching them isn't a compiler error, but can fail unexpectedly at runtime, though there's probably a simpler solution to that specific case (e.g- some way to make the type checker to treat their indices as if they were different types).

···

On 16 Dec 2016, at 14:51, Anton Zhilin via swift-evolution <swift-evolution@swift.org> wrote:

It will be impossible without huge additional overhead.
I assume that "invalidated" means "no more points to that element".

Consider that an element is inserted in the middle of an Array.
Assuming enough capacity, all iterators after that one will be invalidated.
But all new iterators pointing to the same spots will be valid.
How do you differentiate between the "old" ones and the "new" ones?

I see only one general approach to this:
1. Make iterator type a class
2. Add to the collection, an array of all iterators, which have been created (and are being used)
3. Add a "valid" flag to iterator
4. On most operations on the collection, it will walk through its iterators, marking some as "invalid".

It's a safe way to eliminate some "out of bounds" errors, but it's just utterly rediculous.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Karl) #7

Oh, we really do. It’s not such a big deal for Array and basic collections, but it’s absolutely essential for String (where a character index may represent a variable number of bytes).

I think the way to resolve this would be to return some kind of IndexDisplacement type which knows how the underlying indexes were mutated (e.g. "100 bytes added at offset 36”) so it can transform previously-valid indexes in to indexes which are valid post-mutation. See: https://bugs.swift.org/browse/SR-2689

- Karl

···

On 19 Dec 2016, at 04:10, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

on Fri Dec 16 2016, Daniel Vollmer <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Hi,

On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution@swift.org> wrote:

[snip]

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.

I think this is a fruitless attempt, as even if the indices are still valid,
they may not refer to the same elements they referenced before the mutation.

Of course, mutating methods should state whether the invalidate existing
indices, but I think that’s about the best that can be reasonably
done.

We can do a bit more. For example, RangeReplaceableCollection's
replaceRange should return the range in the new collection state
corresponding to the elements that were replaced.

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Alexey Komnin) #8

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

Ok. I agree. All the indices must be invalidated if the collection
object is mutated. I just thought about making it straight and obvious
how to check if index is still valid for simple collections like
Array. Returning additional index information from mutating methods is
a subtle thing, because some methods do not change the object layout,
for example append does not change validity of indices; other methods
may invalidate a lot or, even, all indices, for example,
replaceSubrange.

I just want to understand the purpose of indices and what behavior is
intended for them. It's unclear for me now. I just see a bunch of ways
to shoot myself in the foot and looking for a way to change that
disposition.

As I see it, the problem is: there is no way to find if index is valid
for collection. Obvious way of checking if index is within the
(startIndex..<endIndex) bounds does not work. So, I know only three
possible solutions to the problem:

1. provide a way to check if index is valid;
2. state in documentation that all indices of any collection become
invalid after each mutation;
3. state in documentation that accessing elements of collection via
index after mutation is unspecified behavior.

Am I right? Did I miss something?

I would prefer option #1, if there are no other feasible solutions to
the problem.

Thanks.
- Alex Komnin

P.S. I just noticed that our discussion fell out of swift-evolution
list. Added it back.

···

On Mon, Dec 19, 2016 at 8:30 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Mon Dec 19 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

I don't personally agree with #3, FWIW

Why? I thought it may be useful.

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

- Alex Komnin

On Mon, Dec 19, 2016 at 4:20 PM, Dave Abrahams <dabrahams@apple.com> wrote:

I don't personally agree with #3, FWIW

On Dec 19, 2016, at 3:01 AM, Alexey Komnin <interfere.work@gmail.com> wrote:

Ok. Through the discussion I've come to the following:

1. collection invalidates all indices on any mutation as a common solution;
2. collection may keep some indices valid if it is feasible;
3. there are should be a way to check if index is valid;
4. mutating methods should return index of inserted/replaced element.

Did I miss anything?

- Alex Komnin

On Mon, Dec 19, 2016 at 6:10 AM, Dave Abrahams via swift-evolution >>>> <swift-evolution@swift.org> wrote:

on Fri Dec 16 2016, Daniel Vollmer <swift-evolution@swift.org> wrote:

Hi,

On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution@swift.org> wrote:

[snip]

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.

I think this is a fruitless attempt, as even if the indices are still valid,
they may not refer to the same elements they referenced before the mutation.

Of course, mutating methods should state whether the invalidate existing
indices, but I think that’s about the best that can be reasonably
done.

We can do a bit more. For example, RangeReplaceableCollection's
replaceRange should return the range in the new collection state
corresponding to the elements that were replaced.

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave


(Rien) #9

Solutions that work only part of the time will raise expectations of the user only to crush hem later when they fail.
It will likely generate a number of “bug” reports that must then be explained.

I would suggest that functionality like this should only be applied to custom types, never on the build-in types.

Regards,
Rien

Site: http://balancingrock.nl
Blog: http://swiftrien.blogspot.com
Github: http://github.com/Swiftrien
Project: http://swiftfire.nl

···

On 18 Dec 2016, at 10:12, Haravikk via swift-evolution <swift-evolution@swift.org> wrote:

On 16 Dec 2016, at 14:51, Anton Zhilin via swift-evolution <swift-evolution@swift.org> wrote:

It will be impossible without huge additional overhead.
I assume that "invalidated" means "no more points to that element".

Consider that an element is inserted in the middle of an Array.
Assuming enough capacity, all iterators after that one will be invalidated.
But all new iterators pointing to the same spots will be valid.
How do you differentiate between the "old" ones and the "new" ones?

I see only one general approach to this:
1. Make iterator type a class
2. Add to the collection, an array of all iterators, which have been created (and are being used)
3. Add a "valid" flag to iterator
4. On most operations on the collection, it will walk through its iterators, marking some as "invalid".

It's a safe way to eliminate some "out of bounds" errors, but it's just utterly rediculous.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

There was one idea I was exploring but never quite came up with a final design for, which would be the ability for types to "tag" and invalidate values that they return.

In essence, a collection might return an index and tag it as "indices", and a later call to remove, add etc. might invalidate all values tagged as "indices", and provide some guidance on what to do about it. The Swift compiler will make a best effort to track these tags to provide warnings.

I was thinking something like this (heavily cut down):

  struct Foo : Collection {
    func index(_ i:Index, offsetBy n:IndexDistance) -> Index @tag("indices") { return i + n }
    @invalidates("indices") func remove(at:Index) { /* Remove an element */ }
  }

  var foo = Foo()
  let index = foo.index(foo.startIndex, offsetBy: 5)
  foo.remove(at: index) // This is fine
  foo.remove(at: index) // Warning: indices of foo have been invalidated

In other words, the variable index is linked (in the compiler, not at run-time) to the type instance that created it by the specified tag "indices". Now, when that type invalidates "indices" that variable is effectively marked as invalid, thus any attempt to use it will produce a warning that it may no longer be valid. Of course in the case of simple indices it should be fine, though technically speaking you're still doing something that could still fail at runtime.

The idea here is that, in the simpler cases at least, the compiler gains some awareness of index invalidation, allowing us to potentially avoid run-time errors entirely; naturally it gets more complex if the indices are passed around, but as long as it is still possible to track where they came from (e.g- if it's a collection held by a class reference) then it can still work. This feature could also be used to detect use of an index for the wrong type or instance; i.e- if two instances have the same index type, indices may not be compatible between the two, currently mixing and matching them isn't a compiler error, but can fail unexpectedly at runtime, though there's probably a simpler solution to that specific case (e.g- some way to make the type checker to treat their indices as if they were different types).
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Dave Abrahams) #10

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

Ok. I agree. All the indices must be invalidated if the collection
object is mutated.

I didn't say that, and in the general case, I don't agree. Collection
indices really must be stable when the collection is sorted, for
example. It is only structure-changing operations that can invalidate
indices, and even then that's only true for some collections and some
operations.

Note that this means that the copy-on-write step for
conceptually-non-structure-changing mutations may not be allowed to
change the data structure's physical structure, if that would require
invalidating indices.

I just thought about making it straight and obvious how to check if
index is still valid for simple collections like Array. Returning
additional index information from mutating methods is a subtle thing,
because some methods do not change the object layout, for example
append does not change validity of indices;

It certainly could. Conceptually, in its most general form, a
Collection is some kind of tree (which must be of limited depth to avoid
violating big-O expectations, but memory limitations pretty much take
care of that---yes, this requires a small amount of handwaving). In its
most general form, an index describes a path through that tree.
Appending to an in-memory B-tree could require the allocation of a new
root node, which changes the paths to all elements in the tree, thereby
invalidating all indices.

To avoid this effect we'd either have to require the tree to become
unbalanced, which is too great a big-O killer, or that it be a
RandomAccessCollection that could be indexed by Ints, which in itself
could hurt the performance of linear traversals.

other methods may invalidate a lot or, even, all indices, for example,
replaceSubrange.

I just want to understand the purpose of indices and what behavior is
intended for them. It's unclear for me now.

I hope the above helps.

I just see a bunch of ways to shoot myself in the foot and looking for
a way to change that disposition.

The best way to do it would be to write the evolution proposal that lays
out the general index invalidation rules and the guarantees we should
give for the various data structures, explaining why those guarantees
are the right ones. If we can get the proposal passed, the issue should
be totally clear for everyone :slight_smile:

As I see it, the problem is: there is no way to find if index is valid
for collection. Obvious way of checking if index is within the
(startIndex..<endIndex) bounds does not work.
So, I know only three possible solutions to the problem:

1. provide a way to check if index is valid;

There's always c.indices.contains(i), though it's potentially very
inefficient and may not tell you what you want to know (at least, not if
you want to know the index points to the same element you thought it
did).

But the bigger question is, what would your program do differently if it
discovered that an index it wanted to use was invalid with respect to a
collection?

2. state in documentation that all indices of any collection become
invalid after each mutation;

That's too broad; as noted above, index invalidation only relates to
structural mutation. We should document the minimum guarantees provided
by all Collections and also any stronger guarantees provided by specific
Collections.

3. state in documentation that accessing elements of collection via
index after mutation is unspecified behavior.

Definitely not; you wouldn't be able to write any interesting generic
mutating algorithms if that were the case.

···

on Tue Dec 20 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

Am I right? Did I miss something?

I would prefer option #1, if there are no other feasible solutions to
the problem.

Thanks.
- Alex Komnin

P.S. I just noticed that our discussion fell out of swift-evolution
list. Added it back.

On Mon, Dec 19, 2016 at 8:30 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Mon Dec 19 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

I don't personally agree with #3, FWIW

Why? I thought it may be useful.

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

- Alex Komnin

On Mon, Dec 19, 2016 at 4:20 PM, Dave Abrahams <dabrahams@apple.com> wrote:

I don't personally agree with #3, FWIW

On Dec 19, 2016, at 3:01 AM, Alexey Komnin <interfere.work@gmail.com> wrote:

Ok. Through the discussion I've come to the following:

1. collection invalidates all indices on any mutation as a common solution;
2. collection may keep some indices valid if it is feasible;
3. there are should be a way to check if index is valid;
4. mutating methods should return index of inserted/replaced element.

Did I miss anything?

- Alex Komnin

On Mon, Dec 19, 2016 at 6:10 AM, Dave Abrahams via swift-evolution >>>>> <swift-evolution@swift.org> wrote:

on Fri Dec 16 2016, Daniel Vollmer <swift-evolution@swift.org> wrote:

Hi,

On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution@swift.org> wrote:

[snip]

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.

I think this is a fruitless attempt, as even if the indices are still valid,
they may not refer to the same elements they referenced before the mutation.

Of course, mutating methods should state whether the invalidate existing
indices, but I think that’s about the best that can be reasonably
done.

We can do a bit more. For example, RangeReplaceableCollection's
replaceRange should return the range in the new collection state
corresponding to the elements that were replaced.

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

--
-Dave


(Jon Hull) #11

My understanding is that currently indexes are immutable value types, and that we ask the collection for new indices with some relationship to the old ones.

I know we hate reference types here in Swiftland, but why not have indices be simple classes that are immutable to everyone but the collection itself. From the outside, since it is immutable, it should have the same semantics. But because it is a reference type, the collection could store weak reference to all the indices it gives out. Then when the collection mutates in a way which invalidates indices, it can mutate the internals of the indices so that they still point to the same piece of data that they always did (or gets marked as invalid in the case of delete, etc…)

Is there a horrible problem with this approach I am missing?

Thanks,
Jon

···

On Dec 20, 2016, at 5:01 AM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

on Tue Dec 20 2016, Alexey Komnin <interfere.work-AT-gmail.com <http://interfere.work-at-gmail.com/>> wrote:

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

Ok. I agree. All the indices must be invalidated if the collection
object is mutated.

I didn't say that, and in the general case, I don't agree. Collection
indices really must be stable when the collection is sorted, for
example. It is only structure-changing operations that can invalidate
indices, and even then that's only true for some collections and some
operations.

Note that this means that the copy-on-write step for
conceptually-non-structure-changing mutations may not be allowed to
change the data structure's physical structure, if that would require
invalidating indices.

I just thought about making it straight and obvious how to check if
index is still valid for simple collections like Array. Returning
additional index information from mutating methods is a subtle thing,
because some methods do not change the object layout, for example
append does not change validity of indices;

It certainly could. Conceptually, in its most general form, a
Collection is some kind of tree (which must be of limited depth to avoid
violating big-O expectations, but memory limitations pretty much take
care of that---yes, this requires a small amount of handwaving). In its
most general form, an index describes a path through that tree.
Appending to an in-memory B-tree could require the allocation of a new
root node, which changes the paths to all elements in the tree, thereby
invalidating all indices.

To avoid this effect we'd either have to require the tree to become
unbalanced, which is too great a big-O killer, or that it be a
RandomAccessCollection that could be indexed by Ints, which in itself
could hurt the performance of linear traversals.

other methods may invalidate a lot or, even, all indices, for example,
replaceSubrange.

I just want to understand the purpose of indices and what behavior is
intended for them. It's unclear for me now.

I hope the above helps.

I just see a bunch of ways to shoot myself in the foot and looking for
a way to change that disposition.

The best way to do it would be to write the evolution proposal that lays
out the general index invalidation rules and the guarantees we should
give for the various data structures, explaining why those guarantees
are the right ones. If we can get the proposal passed, the issue should
be totally clear for everyone :slight_smile:

As I see it, the problem is: there is no way to find if index is valid
for collection. Obvious way of checking if index is within the
(startIndex..<endIndex) bounds does not work.
So, I know only three possible solutions to the problem:

1. provide a way to check if index is valid;

There's always c.indices.contains(i), though it's potentially very
inefficient and may not tell you what you want to know (at least, not if
you want to know the index points to the same element you thought it
did).

But the bigger question is, what would your program do differently if it
discovered that an index it wanted to use was invalid with respect to a
collection?

2. state in documentation that all indices of any collection become
invalid after each mutation;

That's too broad; as noted above, index invalidation only relates to
structural mutation. We should document the minimum guarantees provided
by all Collections and also any stronger guarantees provided by specific
Collections.

3. state in documentation that accessing elements of collection via
index after mutation is unspecified behavior.

Definitely not; you wouldn't be able to write any interesting generic
mutating algorithms if that were the case.

Am I right? Did I miss something?

I would prefer option #1, if there are no other feasible solutions to
the problem.

Thanks.
- Alex Komnin

P.S. I just noticed that our discussion fell out of swift-evolution
list. Added it back.

On Mon, Dec 19, 2016 at 8:30 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Mon Dec 19 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

I don't personally agree with #3, FWIW

Why? I thought it may be useful.

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

- Alex Komnin

On Mon, Dec 19, 2016 at 4:20 PM, Dave Abrahams <dabrahams@apple.com> wrote:

I don't personally agree with #3, FWIW

On Dec 19, 2016, at 3:01 AM, Alexey Komnin <interfere.work@gmail.com> wrote:

Ok. Through the discussion I've come to the following:

1. collection invalidates all indices on any mutation as a common solution;
2. collection may keep some indices valid if it is feasible;
3. there are should be a way to check if index is valid;
4. mutating methods should return index of inserted/replaced element.

Did I miss anything?

- Alex Komnin

On Mon, Dec 19, 2016 at 6:10 AM, Dave Abrahams via swift-evolution >>>>>> <swift-evolution@swift.org> wrote:

on Fri Dec 16 2016, Daniel Vollmer <swift-evolution@swift.org> wrote:

Hi,

On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution@swift.org> wrote:

[snip]

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.

I think this is a fruitless attempt, as even if the indices are still valid,
they may not refer to the same elements they referenced before the mutation.

Of course, mutating methods should state whether the invalidate existing
indices, but I think that’s about the best that can be reasonably
done.

We can do a bit more. For example, RangeReplaceableCollection's
replaceRange should return the range in the new collection state
corresponding to the elements that were replaced.

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

--
-Dave
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Dave Abrahams) #12

My understanding is that currently indexes are immutable value types,
and that we ask the collection for new indices with some relationship
to the old ones.

“Immutable value type” doesn't make all that much sense in swift, since
you can always inout a var and change its value. But your
basic point stands.

I know we hate reference types here in Swiftland, but why not have
indices be simple classes that are immutable to everyone but the
collection itself.

Because we don't want to penalize algorithms on arrays by forcing
indexing to allocate class instances.

From the outside, since it is immutable, it should have the same
semantics. But because it is a reference type, the collection could
store weak reference to all the indices it gives out.
Then when the collection mutates in a way which invalidates indices,
it can mutate the internals of the indices so that they still point to
the same piece of data that they always did (or gets marked as invalid
in the case of delete, etc…)

Is there a horrible problem with this approach I am missing?

Only if you care about silly things like performance :wink:

···

on Tue Dec 20 2016, Jonathan Hull <jhull-AT-gbis.com> wrote:

Thanks,
Jon

On Dec 20, 2016, at 5:01 AM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

on Tue Dec 20 2016, Alexey Komnin <interfere.work-AT-gmail.com > <http://interfere.work-at-gmail.com/>> wrote:

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

Ok. I agree. All the indices must be invalidated if the collection
object is mutated.

I didn't say that, and in the general case, I don't agree. Collection
indices really must be stable when the collection is sorted, for
example. It is only structure-changing operations that can invalidate
indices, and even then that's only true for some collections and some
operations.

Note that this means that the copy-on-write step for
conceptually-non-structure-changing mutations may not be allowed to
change the data structure's physical structure, if that would require
invalidating indices.

I just thought about making it straight and obvious how to check if
index is still valid for simple collections like Array. Returning
additional index information from mutating methods is a subtle thing,
because some methods do not change the object layout, for example
append does not change validity of indices;

It certainly could. Conceptually, in its most general form, a
Collection is some kind of tree (which must be of limited depth to avoid
violating big-O expectations, but memory limitations pretty much take
care of that---yes, this requires a small amount of handwaving). In its
most general form, an index describes a path through that tree.
Appending to an in-memory B-tree could require the allocation of a new
root node, which changes the paths to all elements in the tree, thereby
invalidating all indices.

To avoid this effect we'd either have to require the tree to become
unbalanced, which is too great a big-O killer, or that it be a
RandomAccessCollection that could be indexed by Ints, which in itself
could hurt the performance of linear traversals.

other methods may invalidate a lot or, even, all indices, for example,
replaceSubrange.

I just want to understand the purpose of indices and what behavior is
intended for them. It's unclear for me now.

I hope the above helps.

I just see a bunch of ways to shoot myself in the foot and looking for
a way to change that disposition.

The best way to do it would be to write the evolution proposal that lays
out the general index invalidation rules and the guarantees we should
give for the various data structures, explaining why those guarantees
are the right ones. If we can get the proposal passed, the issue should
be totally clear for everyone :slight_smile:

As I see it, the problem is: there is no way to find if index is valid
for collection. Obvious way of checking if index is within the
(startIndex..<endIndex) bounds does not work.
So, I know only three possible solutions to the problem:

1. provide a way to check if index is valid;

There's always c.indices.contains(i), though it's potentially very
inefficient and may not tell you what you want to know (at least, not if
you want to know the index points to the same element you thought it
did).

But the bigger question is, what would your program do differently if it
discovered that an index it wanted to use was invalid with respect to a
collection?

2. state in documentation that all indices of any collection become
invalid after each mutation;

That's too broad; as noted above, index invalidation only relates to
structural mutation. We should document the minimum guarantees provided
by all Collections and also any stronger guarantees provided by specific
Collections.

3. state in documentation that accessing elements of collection via
index after mutation is unspecified behavior.

Definitely not; you wouldn't be able to write any interesting generic
mutating algorithms if that were the case.

Am I right? Did I miss something?

I would prefer option #1, if there are no other feasible solutions to
the problem.

Thanks.
- Alex Komnin

P.S. I just noticed that our discussion fell out of swift-evolution
list. Added it back.

On Mon, Dec 19, 2016 at 8:30 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Mon Dec 19 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

I don't personally agree with #3, FWIW

Why? I thought it may be useful.

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

- Alex Komnin

On Mon, Dec 19, 2016 at 4:20 PM, Dave Abrahams <dabrahams@apple.com> wrote:

I don't personally agree with #3, FWIW

On Dec 19, 2016, at 3:01 AM, Alexey Komnin <interfere.work@gmail.com> wrote:

Ok. Through the discussion I've come to the following:

1. collection invalidates all indices on any mutation as a common solution;
2. collection may keep some indices valid if it is feasible;
3. there are should be a way to check if index is valid;
4. mutating methods should return index of inserted/replaced element.

Did I miss anything?

- Alex Komnin

On Mon, Dec 19, 2016 at 6:10 AM, Dave Abrahams via swift-evolution >>>>>>> <swift-evolution@swift.org> wrote:

on Fri Dec 16 2016, Daniel Vollmer <swift-evolution@swift.org> wrote:

Hi,

On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution@swift.org> wrote:

[snip]

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.

I think this is a fruitless attempt, as even if the indices are still valid,
they may not refer to the same elements they referenced before the mutation.

Of course, mutating methods should state whether the invalidate existing
indices, but I think that’s about the best that can be reasonably
done.

We can do a bit more. For example, RangeReplaceableCollection's
replaceRange should return the range in the new collection state
corresponding to the elements that were replaced.

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

--
-Dave
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

<https://lists.swift.org/mailman/listinfo/swift-evolution>

--
-Dave


(Alexey Komnin) #13

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

Ok. I agree. All the indices must be invalidated if the collection
object is mutated.

I didn't say that, and in the general case, I don't agree.

Sorry, I misunderstood your point here before. Now I see what you were
talking about. Thanks for the clarification.

The best way to do it would be to write the evolution proposal that lays
out the general index invalidation rules and the guarantees we should
give for the various data structures, explaining why those guarantees
are the right ones. If we can get the proposal passed, the issue should
be totally clear for everyone :slight_smile:

Yes, I think the same way. I want to end up with a proposal, but I
have no insight into existing plans about evolution of Collection
protocol. That's why I decided to start from the question to mailing
list.

Collection indices really must be stable when the collection is sorted, for
example. It is only structure-changing operations that can invalidate
indices, and even then that's only true for some collections and some
operations.

Yes, it's a subtle thing I was talking about. Indices are not stable
if we are sorting a Set, for example. Therefore, we have a set of
collections keeping indices safe after sorting and a set of
collections which don't.

Let's take a look at very simple collection, or, as common example, a
set of well-known collections. Here is (incomplete) list:
* Array
* Forward List (single-linked)
* Set

What is a notion of an index for each of them? It's pretty obvious
what is the index when we are talking about Array. But what is the
index when we are talking about Set? Would it be different for Hashed
set (current default implementation in Swift stdlib) and Tree-based
set? What if we make it using skip-list as a base collection?

Forward list is a pretty simple data structure. Is it a collection?
Yes, indeed! But does it have an index? I don't know, because notion
of an index is unclear for me. We could define index for an element in
the list as a number of strides from the head node. Hence the index
is of integer type.

The same notion is applicable to all collections. We could define an
index for a Set as a number of iterator steps from the beginning to
the node with an element. But it does kill all performance.

We should document the minimum guarantees provided
by all Collections and also any stronger guarantees provided by specific
Collections.

I have tried to sum up all I knew about different collections:

* Unordered collections, based on hash function, may invalidate any
number of indices during any mutation, because:
    a) any mutation may cause rehashing,
    b) current implementation may invalidate indices of elements with
collided hashValue;
* Random-access collections like Array, or double-ended queue, may
invalidate indices only during removal operation. Note, that
invalidated index (or indices) may point to other element;
* Forward and double-linked lists invalidate only index of deleted element;
* Tree-based collections may or may not invalidate indices due to
insert operation, depending on the internal algorithm of any
particular collection;
* Some complex collections, like LRU or LFU cache may invalidate
indices during get subscript operation.

There is no any common way as well as no common meaning of an index.

So, the possible minimum guarantee: any particular index may be
invalidated as a result of any operation.
If we drop complex containers out of Collections list, we could
restrict the rule: any particular index may be invalidated as a result
of any mutating operation.
Restriction for RandomAccessCollection: all indices in range
(startIndex..<endIndex) are and remain valid during collection
lifetime.

Also there is a bunch of non-trivial rules for any particular
collection. Now, there are like three or four collections in stdlib.
Is there any more in future plans?

- Alex Komnin

P.S. Sorry, if I am acting little nosey, but I am in the middle of
implementing some additional collections in Swift for my project, and
I would prefer to make them as close to canonical stdlib
implementation as it possible.

···

On Tue, Dec 20, 2016 at 4:01 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Tue Dec 20 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

Ok. I agree. All the indices must be invalidated if the collection
object is mutated.

I didn't say that, and in the general case, I don't agree. Collection
indices really must be stable when the collection is sorted, for
example. It is only structure-changing operations that can invalidate
indices, and even then that's only true for some collections and some
operations.

Note that this means that the copy-on-write step for
conceptually-non-structure-changing mutations may not be allowed to
change the data structure's physical structure, if that would require
invalidating indices.

I just thought about making it straight and obvious how to check if
index is still valid for simple collections like Array. Returning
additional index information from mutating methods is a subtle thing,
because some methods do not change the object layout, for example
append does not change validity of indices;

It certainly could. Conceptually, in its most general form, a
Collection is some kind of tree (which must be of limited depth to avoid
violating big-O expectations, but memory limitations pretty much take
care of that---yes, this requires a small amount of handwaving). In its
most general form, an index describes a path through that tree.
Appending to an in-memory B-tree could require the allocation of a new
root node, which changes the paths to all elements in the tree, thereby
invalidating all indices.

To avoid this effect we'd either have to require the tree to become
unbalanced, which is too great a big-O killer, or that it be a
RandomAccessCollection that could be indexed by Ints, which in itself
could hurt the performance of linear traversals.

other methods may invalidate a lot or, even, all indices, for example,
replaceSubrange.

I just want to understand the purpose of indices and what behavior is
intended for them. It's unclear for me now.

I hope the above helps.

I just see a bunch of ways to shoot myself in the foot and looking for
a way to change that disposition.

The best way to do it would be to write the evolution proposal that lays
out the general index invalidation rules and the guarantees we should
give for the various data structures, explaining why those guarantees
are the right ones. If we can get the proposal passed, the issue should
be totally clear for everyone :slight_smile:

As I see it, the problem is: there is no way to find if index is valid
for collection. Obvious way of checking if index is within the
(startIndex..<endIndex) bounds does not work.
So, I know only three possible solutions to the problem:

1. provide a way to check if index is valid;

There's always c.indices.contains(i), though it's potentially very
inefficient and may not tell you what you want to know (at least, not if
you want to know the index points to the same element you thought it
did).

But the bigger question is, what would your program do differently if it
discovered that an index it wanted to use was invalid with respect to a
collection?

2. state in documentation that all indices of any collection become
invalid after each mutation;

That's too broad; as noted above, index invalidation only relates to
structural mutation. We should document the minimum guarantees provided
by all Collections and also any stronger guarantees provided by specific
Collections.

3. state in documentation that accessing elements of collection via
index after mutation is unspecified behavior.

Definitely not; you wouldn't be able to write any interesting generic
mutating algorithms if that were the case.

Am I right? Did I miss something?

I would prefer option #1, if there are no other feasible solutions to
the problem.

Thanks.
- Alex Komnin

P.S. I just noticed that our discussion fell out of swift-evolution
list. Added it back.

On Mon, Dec 19, 2016 at 8:30 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Mon Dec 19 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

I don't personally agree with #3, FWIW

Why? I thought it may be useful.

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

- Alex Komnin

On Mon, Dec 19, 2016 at 4:20 PM, Dave Abrahams <dabrahams@apple.com> wrote:

I don't personally agree with #3, FWIW

On Dec 19, 2016, at 3:01 AM, Alexey Komnin <interfere.work@gmail.com> wrote:

Ok. Through the discussion I've come to the following:

1. collection invalidates all indices on any mutation as a common solution;
2. collection may keep some indices valid if it is feasible;
3. there are should be a way to check if index is valid;
4. mutating methods should return index of inserted/replaced element.

Did I miss anything?

- Alex Komnin

On Mon, Dec 19, 2016 at 6:10 AM, Dave Abrahams via swift-evolution >>>>>> <swift-evolution@swift.org> wrote:

on Fri Dec 16 2016, Daniel Vollmer <swift-evolution@swift.org> wrote:

Hi,

On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution@swift.org> wrote:

[snip]

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.

I think this is a fruitless attempt, as even if the indices are still valid,
they may not refer to the same elements they referenced before the mutation.

Of course, mutating methods should state whether the invalidate existing
indices, but I think that’s about the best that can be reasonably
done.

We can do a bit more. For example, RangeReplaceableCollection's
replaceRange should return the range in the new collection state
corresponding to the elements that were replaced.

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

--
-Dave


(Haravikk) #14

Solutions that work only part of the time will raise expectations of the user only to crush hem later when they fail.
It will likely generate a number of “bug” reports that must then be explained.

I would suggest that functionality like this should only be applied to custom types, never on the build-in types.

Regards,
Rien

It's not so much that it only works part of the time, but more that in situations where the "link" would be broken we would need to issue a warning, which could get annoying given that currently we have no way to suppress these.

This is why I stopped working on it as there's a lot of details to hammer out; I do think it's feasible, but there's a lot to consider. For example, indices could be treated in a manner similar to non-escaping closures, in that you can't store or pass them in any way that would break this link to the parent, unless you use a keyword or attribute to explicitly handle them in an "unsafe" way.

So it's not so much that it only works "part of the time", but rather that it relies on a type of inference, and needs some way to correctly (and usefully) handle situations where that inference can't be performed, which is why I put it to one side as it was taking up too much time.

If I ever do have more time I might put up a more formal thread to discuss it properly; I do think it's feasible, and possible to make it work well in most common cases, the question mark is just how best to handle those cases where either more data is needed, or the warning needs to be overridden somehow.

···

On 18 Dec 2016, at 10:04, Rien <Rien@balancingrock.nl> wrote:

On 18 Dec 2016, at 10:12, Haravikk via swift-evolution <swift-evolution@swift.org> wrote:

On 16 Dec 2016, at 14:51, Anton Zhilin via swift-evolution <swift-evolution@swift.org> wrote:
It will be impossible without huge additional overhead.
I assume that "invalidated" means "no more points to that element".

Consider that an element is inserted in the middle of an Array.
Assuming enough capacity, all iterators after that one will be invalidated.
But all new iterators pointing to the same spots will be valid.
How do you differentiate between the "old" ones and the "new" ones?

I see only one general approach to this:
1. Make iterator type a class
2. Add to the collection, an array of all iterators, which have been created (and are being used)
3. Add a "valid" flag to iterator
4. On most operations on the collection, it will walk through its iterators, marking some as "invalid".

It's a safe way to eliminate some "out of bounds" errors, but it's just utterly rediculous.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

There was one idea I was exploring but never quite came up with a final design for, which would be the ability for types to "tag" and invalidate values that they return.

In essence, a collection might return an index and tag it as "indices", and a later call to remove, add etc. might invalidate all values tagged as "indices", and provide some guidance on what to do about it. The Swift compiler will make a best effort to track these tags to provide warnings.

I was thinking something like this (heavily cut down):

  struct Foo : Collection {
    func index(_ i:Index, offsetBy n:IndexDistance) -> Index @tag("indices") { return i + n }
    @invalidates("indices") func remove(at:Index) { /* Remove an element */ }
  }

  var foo = Foo()
  let index = foo.index(foo.startIndex, offsetBy: 5)
  foo.remove(at: index) // This is fine
  foo.remove(at: index) // Warning: indices of foo have been invalidated

In other words, the variable index is linked (in the compiler, not at run-time) to the type instance that created it by the specified tag "indices". Now, when that type invalidates "indices" that variable is effectively marked as invalid, thus any attempt to use it will produce a warning that it may no longer be valid. Of course in the case of simple indices it should be fine, though technically speaking you're still doing something that could still fail at runtime.

The idea here is that, in the simpler cases at least, the compiler gains some awareness of index invalidation, allowing us to potentially avoid run-time errors entirely; naturally it gets more complex if the indices are passed around, but as long as it is still possible to track where they came from (e.g- if it's a collection held by a class reference) then it can still work. This feature could also be used to detect use of an index for the wrong type or instance; i.e- if two instances have the same index type, indices may not be compatible between the two, currently mixing and matching them isn't a compiler error, but can fail unexpectedly at runtime, though there's probably a simpler solution to that specific case (e.g- some way to make the type checker to treat their indices as if they were different types).


(Dave Abrahams) #15

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

Ok. I agree. All the indices must be invalidated if the collection
object is mutated.

I didn't say that, and in the general case, I don't agree.

Sorry, I misunderstood your point here before. Now I see what you were
talking about. Thanks for the clarification.

The best way to do it would be to write the evolution proposal that lays
out the general index invalidation rules and the guarantees we should
give for the various data structures, explaining why those guarantees
are the right ones. If we can get the proposal passed, the issue should
be totally clear for everyone :slight_smile:

Yes, I think the same way. I want to end up with a proposal, but I
have no insight into existing plans about evolution of Collection
protocol. That's why I decided to start from the question to mailing
list.

Speaking for myself, I have no plans to change the fundamental model
with respect to index invalidation. We reworked indexing for Swift 3
and I think we've settled in a pretty good place.

I think, to complete the Collection model for performance purposes, we
need to add (something like) these requirements:

  associatedtype Segments: Collection
    where Segments.Iterator.Element : Collection,
      Segments.Iterator.Elements.Iterator.Element == Iterator.Element

  var segments: Segments? { get }

(for more information on that, see http://lafstern.org/matt/segmented.pdf)

But as I say, those changes shouldn't really impact index invalidation.

Collection indices really must be stable when the collection is sorted, for
example. It is only structure-changing operations that can invalidate
indices, and even then that's only true for some collections and some
operations.

Yes, it's a subtle thing I was talking about. Indices are not stable
if we are sorting a Set, for example.

You can't sort a set :slight_smile:

Therefore, we have a set of collections keeping indices safe after
sorting and a set of collections which don't.

Let's take a look at very simple collection, or, as common example, a
set of well-known collections. Here is (incomplete) list:
* Array
* Forward List (single-linked)
* Set

What is a notion of an index for each of them? It's pretty obvious
what is the index when we are talking about Array. But what is the
index when we are talking about Set? Would it be different for Hashed
set (current default implementation in Swift stdlib) and Tree-based
set?
What if we make it using skip-list as a base collection?

As I've said in text you quoted below, in its most general form, an
index describes a path through a tree.

Forward list is a pretty simple data structure. Is it a collection?
Yes, indeed!

Sure, informally it's a collection. Whether or not it conforms to
`Collection` depends on how you design it.

But does it have an index?

The simplest index would be a pointer to a node in the list. However
note that such a list couldn't have value semantics.

I don't know, because notion of an index is unclear for me. We could
define index for an element in the list as a number of strides from
the head node. Hence the index is of integer type.

No, we couldn't, because subscripting is required to be O(1).

The same notion is applicable to all collections. We could define an
index for a Set as a number of iterator steps from the beginning to
the node with an element. But it does kill all performance.

We should document the minimum guarantees provided
by all Collections and also any stronger guarantees provided by specific
Collections.

I have tried to sum up all I knew about different collections:

* Unordered collections, based on hash function, may invalidate any
number of indices during any mutation, because:
    a) any mutation may cause rehashing,

No; changing the value associated with a dictionary key doesn't
invalidate anything.

    b) current implementation may invalidate indices of elements with
collided hashValue;
* Random-access collections like Array, or double-ended queue, may
invalidate indices only during removal operation.

It may really depend how you represent indices. A deque can be more
efficient with indices that consist of a pair of pointers (or a pair of
offsets) than with a single integer. I've seen it implemented with a
circular buffer for the root of the “tree.” Different implementation
choices can result in different invalidation behaviors. The question
is, which implementation choices are worth accomodating?

Note, that invalidated index (or indices) may point to other element;
* Forward and double-linked lists invalidate only index of deleted element;

This is in general true of data structures that store one element per
allocation (e.g. a red-black tree). Note that in redesigning the
indexing model we intentionally decided that the most important data
structures to accomodate all have some contiguous storage (e.g. a
B-tree) and that the specific goal of fitting linked lists into the
model could be sacrificed if necessary to acheive other goals. That's
why, for example, I'm pretty sure, you can't build a linked list
Collection with value semantics.

* Tree-based collections may or may not invalidate indices due to
insert operation, depending on the internal algorithm of any
particular collection;
* Some complex collections, like LRU or LFU cache may invalidate
indices during get subscript operation.

No, we can't have read operations invalidating indices. No generic
algorithms would work reliably on such a “Collection”. One might as
well not make the data structure conform to the protocol.

There is no any common way as well as no common meaning of an index.

There *is* a common meaning of an index. It is a position in the data
structure. The rest of its meaning is given by the fact that
Collections are multipass and the indices of all elements are reachable
from startIndex by applying index(after:) zero or more times.

So, the possible minimum guarantee: any particular index may be
invalidated as a result of any operation.
If we drop complex containers out of Collections list, we could
restrict the rule: any particular index may be invalidated as a result
of any mutating operation.

That's not strong enough. Any MutableCollection can be sorted. Sorting
wouldn't work if indices were invalidated by the assignment operations
used in sorting.

Restriction for RandomAccessCollection: all indices in range
(startIndex..<endIndex) are and remain valid during collection
lifetime.

Also there is a bunch of non-trivial rules for any particular
collection. Now, there are like three or four collections in stdlib.
Is there any more in future plans?

Yes, we anticipate adding more some day. But more importantly, the
Collection protocols are designed to cover a range of plausible data
structures both inside and outside the standard library, and the index
invalidation rules must both:

1. allow important algorithms—in the standard library (such as sort),
   and those that may be written by others—to work, and

2. be implementable for a wide range of important concrete collection
   models.

···

on Fri Dec 23 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

- Alex Komnin

P.S. Sorry, if I am acting little nosey, but I am in the middle of
implementing some additional collections in Swift for my project, and
I would prefer to make them as close to canonical stdlib
implementation as it possible.

On Tue, Dec 20, 2016 at 4:01 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Tue Dec 20 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

Ok. I agree. All the indices must be invalidated if the collection
object is mutated.

I didn't say that, and in the general case, I don't agree. Collection
indices really must be stable when the collection is sorted, for
example. It is only structure-changing operations that can invalidate
indices, and even then that's only true for some collections and some
operations.

Note that this means that the copy-on-write step for
conceptually-non-structure-changing mutations may not be allowed to
change the data structure's physical structure, if that would require
invalidating indices.

I just thought about making it straight and obvious how to check if
index is still valid for simple collections like Array. Returning
additional index information from mutating methods is a subtle thing,
because some methods do not change the object layout, for example
append does not change validity of indices;

It certainly could. Conceptually, in its most general form, a
Collection is some kind of tree (which must be of limited depth to avoid
violating big-O expectations, but memory limitations pretty much take
care of that---yes, this requires a small amount of handwaving). In its
most general form, an index describes a path through that tree.
Appending to an in-memory B-tree could require the allocation of a new
root node, which changes the paths to all elements in the tree, thereby
invalidating all indices.

To avoid this effect we'd either have to require the tree to become
unbalanced, which is too great a big-O killer, or that it be a
RandomAccessCollection that could be indexed by Ints, which in itself
could hurt the performance of linear traversals.

other methods may invalidate a lot or, even, all indices, for example,
replaceSubrange.

I just want to understand the purpose of indices and what behavior is
intended for them. It's unclear for me now.

I hope the above helps.

I just see a bunch of ways to shoot myself in the foot and looking for
a way to change that disposition.

The best way to do it would be to write the evolution proposal that lays
out the general index invalidation rules and the guarantees we should
give for the various data structures, explaining why those guarantees
are the right ones. If we can get the proposal passed, the issue should
be totally clear for everyone :slight_smile:

As I see it, the problem is: there is no way to find if index is valid
for collection. Obvious way of checking if index is within the
(startIndex..<endIndex) bounds does not work.
So, I know only three possible solutions to the problem:

1. provide a way to check if index is valid;

There's always c.indices.contains(i), though it's potentially very
inefficient and may not tell you what you want to know (at least, not if
you want to know the index points to the same element you thought it
did).

But the bigger question is, what would your program do differently if it
discovered that an index it wanted to use was invalid with respect to a
collection?

2. state in documentation that all indices of any collection become
invalid after each mutation;

That's too broad; as noted above, index invalidation only relates to
structural mutation. We should document the minimum guarantees provided
by all Collections and also any stronger guarantees provided by specific
Collections.

3. state in documentation that accessing elements of collection via
index after mutation is unspecified behavior.

Definitely not; you wouldn't be able to write any interesting generic
mutating algorithms if that were the case.

Am I right? Did I miss something?

I would prefer option #1, if there are no other feasible solutions to
the problem.

Thanks.
- Alex Komnin

P.S. I just noticed that our discussion fell out of swift-evolution
list. Added it back.

On Mon, Dec 19, 2016 at 8:30 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Mon Dec 19 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

I don't personally agree with #3, FWIW

Why? I thought it may be useful.

Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.

- Alex Komnin

On Mon, Dec 19, 2016 at 4:20 PM, Dave Abrahams <dabrahams@apple.com> wrote:

I don't personally agree with #3, FWIW

On Dec 19, 2016, at 3:01 AM, Alexey Komnin <interfere.work@gmail.com> wrote:

Ok. Through the discussion I've come to the following:

1. collection invalidates all indices on any mutation as a common solution;
2. collection may keep some indices valid if it is feasible;
3. there are should be a way to check if index is valid;
4. mutating methods should return index of inserted/replaced element.

Did I miss anything?

- Alex Komnin

On Mon, Dec 19, 2016 at 6:10 AM, Dave Abrahams via swift-evolution >>>>>>> <swift-evolution@swift.org> wrote:

on Fri Dec 16 2016, Daniel Vollmer <swift-evolution@swift.org> wrote:

Hi,

On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution@swift.org> wrote:

[snip]

What do you think? I feel, like we should discuss it before I
formalize it as a proposal.

I think this is a fruitless attempt, as even if the indices are still valid,
they may not refer to the same elements they referenced before the mutation.

Of course, mutating methods should state whether the invalidate existing
indices, but I think that’s about the best that can be reasonably
done.

We can do a bit more. For example, RangeReplaceableCollection's
replaceRange should return the range in the new collection state
corresponding to the elements that were replaced.

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

--
-Dave

--
-Dave


(Rien) #16

I should clarify:

“works only part of the time” was meant to say that it does not cover all cases. Sometimes a warning will be given and sometimes not. When referring to time I was thinking of the programmer, not of compilation or execution time.

I think that “partially detectable” features increase the cognitive workload rather than decreasing it. If that is the case then I would rather do without.

Regards,
Rien

Site: http://balancingrock.nl
Blog: http://swiftrien.blogspot.com
Github: http://github.com/Swiftrien
Project: http://swiftfire.nl

···

On 18 Dec 2016, at 12:39, Haravikk <swift-evolution@haravikk.me> wrote:

On 18 Dec 2016, at 10:04, Rien <Rien@balancingrock.nl> wrote:

Solutions that work only part of the time will raise expectations of the user only to crush hem later when they fail.
It will likely generate a number of “bug” reports that must then be explained.

I would suggest that functionality like this should only be applied to custom types, never on the build-in types.

Regards,
Rien

It's not so much that it only works part of the time, but more that in situations where the "link" would be broken we would need to issue a warning, which could get annoying given that currently we have no way to suppress these.

This is why I stopped working on it as there's a lot of details to hammer out; I do think it's feasible, but there's a lot to consider. For example, indices could be treated in a manner similar to non-escaping closures, in that you can't store or pass them in any way that would break this link to the parent, unless you use a keyword or attribute to explicitly handle them in an "unsafe" way.

So it's not so much that it only works "part of the time", but rather that it relies on a type of inference, and needs some way to correctly (and usefully) handle situations where that inference can't be performed, which is why I put it to one side as it was taking up too much time.

If I ever do have more time I might put up a more formal thread to discuss it properly; I do think it's feasible, and possible to make it work well in most common cases, the question mark is just how best to handle those cases where either more data is needed, or the warning needs to be overridden somehow.

On 18 Dec 2016, at 10:12, Haravikk via swift-evolution <swift-evolution@swift.org> wrote:

On 16 Dec 2016, at 14:51, Anton Zhilin via swift-evolution <swift-evolution@swift.org> wrote:
It will be impossible without huge additional overhead.
I assume that "invalidated" means "no more points to that element".

Consider that an element is inserted in the middle of an Array.
Assuming enough capacity, all iterators after that one will be invalidated.
But all new iterators pointing to the same spots will be valid.
How do you differentiate between the "old" ones and the "new" ones?

I see only one general approach to this:
1. Make iterator type a class
2. Add to the collection, an array of all iterators, which have been created (and are being used)
3. Add a "valid" flag to iterator
4. On most operations on the collection, it will walk through its iterators, marking some as "invalid".

It's a safe way to eliminate some "out of bounds" errors, but it's just utterly rediculous.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

There was one idea I was exploring but never quite came up with a final design for, which would be the ability for types to "tag" and invalidate values that they return.

In essence, a collection might return an index and tag it as "indices", and a later call to remove, add etc. might invalidate all values tagged as "indices", and provide some guidance on what to do about it. The Swift compiler will make a best effort to track these tags to provide warnings.

I was thinking something like this (heavily cut down):

  struct Foo : Collection {
    func index(_ i:Index, offsetBy n:IndexDistance) -> Index @tag("indices") { return i + n }
    @invalidates("indices") func remove(at:Index) { /* Remove an element */ }
  }

  var foo = Foo()
  let index = foo.index(foo.startIndex, offsetBy: 5)
  foo.remove(at: index) // This is fine
  foo.remove(at: index) // Warning: indices of foo have been invalidated

In other words, the variable index is linked (in the compiler, not at run-time) to the type instance that created it by the specified tag "indices". Now, when that type invalidates "indices" that variable is effectively marked as invalid, thus any attempt to use it will produce a warning that it may no longer be valid. Of course in the case of simple indices it should be fine, though technically speaking you're still doing something that could still fail at runtime.

The idea here is that, in the simpler cases at least, the compiler gains some awareness of index invalidation, allowing us to potentially avoid run-time errors entirely; naturally it gets more complex if the indices are passed around, but as long as it is still possible to track where they came from (e.g- if it's a collection held by a class reference) then it can still work. This feature could also be used to detect use of an index for the wrong type or instance; i.e- if two instances have the same index type, indices may not be compatible between the two, currently mixing and matching them isn't a compiler error, but can fail unexpectedly at runtime, though there's probably a simpler solution to that specific case (e.g- some way to make the type checker to treat their indices as if they were different types).


(Haravikk) #17

Again I'd compare it more to non-escaping closures; most of the time they work just fine and you don't run into any trouble, but occasionally you want to do something that is actually non-escaping but that the compiler can't verify, which is why we have a mechanism to work around it.

Designed properly this should be fine, as it's not as if many programmers habitually scatter their indices to the four winds; in the vast majority of cases indices need to be kept somewhere close to what they refer to, otherwise they're no use at all. Thing of it like a type returning a value with the caveat "keep it where I can see it", like when I lend someone a pen; if you want to take it somewhere else you do so on the caveat that something terrible may happen to you so you best know what you're doing :smirk:

Anyway, my point was to just raise it as a possible solution to this that hadn't been discussed here yet; I never really made a formal proposal for it so if someone else would like to that'd be great, otherwise I'll try to at some point (alongside updating some of my other unfinished proposals).

···

On 18 Dec 2016, at 11:56, Rien <Rien@balancingrock.nl> wrote:

I should clarify:

“works only part of the time” was meant to say that it does not cover all cases. Sometimes a warning will be given and sometimes not. When referring to time I was thinking of the programmer, not of compilation or execution time.

I think that “partially detectable” features increase the cognitive workload rather than decreasing it. If that is the case then I would rather do without.

Regards,
Rien


(Xiaodi Wu) #18

Is the idea that a Collection will invalidate all indices that *may* be
stale, or only ones that *definitely* refer to the wrong thing? It sounds
like OP is asking for the latter, which isn't really feasible, but you're
proposing the former, which may be doable.

However, I have to agree with Rien that this idea sounds brittle. Moreover,
I think you're saying we'd need a workaround when indices are "scattered to
the four winds," but IMO it's precisely when indices are scattered to the
four winds that index invalidation warnings might be most valuable. By
contrast, in your example where indices are kept close at hand, one can see
by inspection that reusing the index is troublesome. It almost sounds to me
like a reasonably achievable solution would help diagnose problems in only
the easiest cases where help is least needed, and that at the same time the
more complex scenarios are rare enough that a more complete solution can't
be justified. I also wonder if some of it would naturally be taken care of
by the borrow-checking features on the horizon for Swift 4/5.

The comparison to non-escaping closures actually gives me pause. IIUC, the
only way to work around the problem of a notionally escaping closure not
actually being escaping is currently extremely horrible (`unsafeBitCast`)
because the proposed `withoutActuallyEscaping` has turned out to be
non-trivial to implement--even though it made it through the evolution
process. It would be very bad if a similar engineering problem presented
itself with Collection indices, especially if the practical gains for the
feature in question are scant.

I should clarify:

“works only part of the time” was meant to say that it does not cover all

cases. Sometimes a warning will be given and sometimes not. When referring
to time I was thinking of the programmer, not of compilation or execution
time.

I think that “partially detectable” features increase the cognitive

workload rather than decreasing it. If that is the case then I would rather
do without.

Regards,

Rien

Again I'd compare it more to non-escaping closures; most of the time they
work just fine and you don't run into any trouble, but occasionally you
want to do something that is actually non-escaping but that the compiler
can't verify, which is why we have a mechanism to work around it.

Designed properly this should be fine, as it's not as if many programmers
habitually scatter their indices to the four winds; in the vast majority of
cases indices need to be kept somewhere close to what they refer to,
otherwise they're no use at all. Thing of it like a type returning a value
with the caveat "keep it where I can see it", like when I lend someone a
pen; if you want to take it somewhere else you do so on the caveat that
something terrible may happen to you so you best know what you're doing :smirk:

Anyway, my point was to just raise it as a possible solution to this that
hadn't been discussed here yet; I never really made a formal proposal for
it so if someone else would like to that'd be great, otherwise I'll try to
at some point (alongside updating some of my other unfinished proposals).

···

On Sun, Dec 18, 2016 at 14:47 Haravikk via swift-evolution < swift-evolution@swift.org> wrote:

On 18 Dec 2016, at 11:56, Rien <Rien@balancingrock.nl> wrote:

_______________________________________________

swift-evolution mailing list

swift-evolution@swift.org

https://lists.swift.org/mailman/listinfo/swift-evolution