Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.
Ok. I agree. All the indices must be invalidated if the collection
object is mutated.
I didn't say that, and in the general case, I don't agree.
Sorry, I misunderstood your point here before. Now I see what you were
talking about. Thanks for the clarification.
The best way to do it would be to write the evolution proposal that lays
out the general index invalidation rules and the guarantees we should
give for the various data structures, explaining why those guarantees
are the right ones. If we can get the proposal passed, the issue should
be totally clear for everyone :-)
Yes, I think the same way. I want to end up with a proposal, but I
have no insight into existing plans about evolution of Collection
protocol. That's why I decided to start from the question to mailing
list.
Speaking for myself, I have no plans to change the fundamental model
with respect to index invalidation. We reworked indexing for Swift 3
and I think we've settled in a pretty good place.
I think, to complete the Collection model for performance purposes, we
need to add (something like) these requirements:
associatedtype Segments: Collection
where Segments.Iterator.Element : Collection,
Segments.Iterator.Elements.Iterator.Element == Iterator.Element
var segments: Segments? { get }
(for more information on that, see http://lafstern.org/matt/segmented.pdf\)
But as I say, those changes shouldn't really impact index invalidation.
Collection indices really must be stable when the collection is sorted, for
example. It is only structure-changing operations that can invalidate
indices, and even then that's only true for some collections and some
operations.
Yes, it's a subtle thing I was talking about. Indices are not stable
if we are sorting a Set, for example.
You can't sort a set :-)
Therefore, we have a set of collections keeping indices safe after
sorting and a set of collections which don't.
Let's take a look at very simple collection, or, as common example, a
set of well-known collections. Here is (incomplete) list:
* Array
* Forward List (single-linked)
* Set
What is a notion of an index for each of them? It's pretty obvious
what is the index when we are talking about Array. But what is the
index when we are talking about Set? Would it be different for Hashed
set (current default implementation in Swift stdlib) and Tree-based
set?
What if we make it using skip-list as a base collection?
As I've said in text you quoted below, in its most general form, an
index describes a path through a tree.
Forward list is a pretty simple data structure. Is it a collection?
Yes, indeed!
Sure, informally it's a collection. Whether or not it conforms to
`Collection` depends on how you design it.
But does it have an index?
The simplest index would be a pointer to a node in the list. However
note that such a list couldn't have value semantics.
I don't know, because notion of an index is unclear for me. We could
define index for an element in the list as a number of strides from
the head node. Hence the index is of integer type.
No, we couldn't, because subscripting is required to be O(1).
The same notion is applicable to all collections. We could define an
index for a Set as a number of iterator steps from the beginning to
the node with an element. But it does kill all performance.
We should document the minimum guarantees provided
by all Collections and also any stronger guarantees provided by specific
Collections.
I have tried to sum up all I knew about different collections:
* Unordered collections, based on hash function, may invalidate any
number of indices during any mutation, because:
a) any mutation may cause rehashing,
No; changing the value associated with a dictionary key doesn't
invalidate anything.
b) current implementation may invalidate indices of elements with
collided hashValue;
* Random-access collections like Array, or double-ended queue, may
invalidate indices only during removal operation.
It may really depend how you represent indices. A deque can be more
efficient with indices that consist of a pair of pointers (or a pair of
offsets) than with a single integer. I've seen it implemented with a
circular buffer for the root of the “tree.” Different implementation
choices can result in different invalidation behaviors. The question
is, which implementation choices are worth accomodating?
Note, that invalidated index (or indices) may point to other element;
* Forward and double-linked lists invalidate only index of deleted element;
This is in general true of data structures that store one element per
allocation (e.g. a red-black tree). Note that in redesigning the
indexing model we intentionally decided that the most important data
structures to accomodate all have some contiguous storage (e.g. a
B-tree) and that the specific goal of fitting linked lists into the
model could be sacrificed if necessary to acheive other goals. That's
why, for example, I'm pretty sure, you can't build a linked list
Collection with value semantics.
* Tree-based collections may or may not invalidate indices due to
insert operation, depending on the internal algorithm of any
particular collection;
* Some complex collections, like LRU or LFU cache may invalidate
indices during get subscript operation.
No, we can't have read operations invalidating indices. No generic
algorithms would work reliably on such a “Collection”. One might as
well not make the data structure conform to the protocol.
There is no any common way as well as no common meaning of an index.
There *is* a common meaning of an index. It is a position in the data
structure. The rest of its meaning is given by the fact that
Collections are multipass and the indices of all elements are reachable
from startIndex by applying index(after:) zero or more times.
So, the possible minimum guarantee: any particular index may be
invalidated as a result of any operation.
If we drop complex containers out of Collections list, we could
restrict the rule: any particular index may be invalidated as a result
of any mutating operation.
That's not strong enough. Any MutableCollection can be sorted. Sorting
wouldn't work if indices were invalidated by the assignment operations
used in sorting.
Restriction for RandomAccessCollection: all indices in range
(startIndex..<endIndex) are and remain valid during collection
lifetime.
Also there is a bunch of non-trivial rules for any particular
collection. Now, there are like three or four collections in stdlib.
Is there any more in future plans?
Yes, we anticipate adding more some day. But more importantly, the
Collection protocols are designed to cover a range of plausible data
structures both inside and outside the standard library, and the index
invalidation rules must both:
1. allow important algorithms—in the standard library (such as sort),
and those that may be written by others—to work, and
2. be implementable for a wide range of important concrete collection
models.
···
on Fri Dec 23 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:
- Alex Komnin
P.S. Sorry, if I am acting little nosey, but I am in the middle of
implementing some additional collections in Swift for my project, and
I would prefer to make them as close to canonical stdlib
implementation as it possible.
On Tue, Dec 20, 2016 at 4:01 PM, Dave Abrahams <dabrahams@apple.com> wrote:
on Tue Dec 20 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:
Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.
Ok. I agree. All the indices must be invalidated if the collection
object is mutated.
I didn't say that, and in the general case, I don't agree. Collection
indices really must be stable when the collection is sorted, for
example. It is only structure-changing operations that can invalidate
indices, and even then that's only true for some collections and some
operations.
Note that this means that the copy-on-write step for
conceptually-non-structure-changing mutations may not be allowed to
change the data structure's physical structure, if that would require
invalidating indices.
I just thought about making it straight and obvious how to check if
index is still valid for simple collections like Array. Returning
additional index information from mutating methods is a subtle thing,
because some methods do not change the object layout, for example
append does not change validity of indices;
It certainly could. Conceptually, in its most general form, a
Collection is some kind of tree (which must be of limited depth to avoid
violating big-O expectations, but memory limitations pretty much take
care of that---yes, this requires a small amount of handwaving). In its
most general form, an index describes a path through that tree.
Appending to an in-memory B-tree could require the allocation of a new
root node, which changes the paths to all elements in the tree, thereby
invalidating all indices.
To avoid this effect we'd either have to require the tree to become
unbalanced, which is too great a big-O killer, or that it be a
RandomAccessCollection that could be indexed by Ints, which in itself
could hurt the performance of linear traversals.
other methods may invalidate a lot or, even, all indices, for example,
replaceSubrange.
I just want to understand the purpose of indices and what behavior is
intended for them. It's unclear for me now.
I hope the above helps.
I just see a bunch of ways to shoot myself in the foot and looking for
a way to change that disposition.
The best way to do it would be to write the evolution proposal that lays
out the general index invalidation rules and the guarantees we should
give for the various data structures, explaining why those guarantees
are the right ones. If we can get the proposal passed, the issue should
be totally clear for everyone :-)
As I see it, the problem is: there is no way to find if index is valid
for collection. Obvious way of checking if index is within the
(startIndex..<endIndex) bounds does not work.
So, I know only three possible solutions to the problem:
1. provide a way to check if index is valid;
There's always c.indices.contains(i), though it's potentially very
inefficient and may not tell you what you want to know (at least, not if
you want to know the index points to the same element you thought it
did).
But the bigger question is, what would your program do differently if it
discovered that an index it wanted to use was invalid with respect to a
collection?
2. state in documentation that all indices of any collection become
invalid after each mutation;
That's too broad; as noted above, index invalidation only relates to
structural mutation. We should document the minimum guarantees provided
by all Collections and also any stronger guarantees provided by specific
Collections.
3. state in documentation that accessing elements of collection via
index after mutation is unspecified behavior.
Definitely not; you wouldn't be able to write any interesting generic
mutating algorithms if that were the case.
Am I right? Did I miss something?
I would prefer option #1, if there are no other feasible solutions to
the problem.
Thanks.
- Alex Komnin
P.S. I just noticed that our discussion fell out of swift-evolution
list. Added it back.
On Mon, Dec 19, 2016 at 8:30 PM, Dave Abrahams <dabrahams@apple.com> wrote:
on Mon Dec 19 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:
I don't personally agree with #3, FWIW
Why? I thought it may be useful.
Because I don't see any way to use it in correct code. At least, none
that can't be more-usefully and reliably provided by returning more
index information from mutating methods.
- Alex Komnin
On Mon, Dec 19, 2016 at 4:20 PM, Dave Abrahams <dabrahams@apple.com> wrote:
I don't personally agree with #3, FWIW
On Dec 19, 2016, at 3:01 AM, Alexey Komnin <interfere.work@gmail.com> wrote:
Ok. Through the discussion I've come to the following:
1. collection invalidates all indices on any mutation as a common solution;
2. collection may keep some indices valid if it is feasible;
3. there are should be a way to check if index is valid;
4. mutating methods should return index of inserted/replaced element.
Did I miss anything?
- Alex Komnin
On Mon, Dec 19, 2016 at 6:10 AM, Dave Abrahams via swift-evolution >>>>>>> <swift-evolution@swift.org> wrote:
on Fri Dec 16 2016, Daniel Vollmer <swift-evolution@swift.org> wrote:
Hi,
On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution@swift.org> wrote:
[snip]
What do you think? I feel, like we should discuss it before I
formalize it as a proposal.
I think this is a fruitless attempt, as even if the indices are still valid,
they may not refer to the same elements they referenced before the mutation.
Of course, mutating methods should state whether the invalidate existing
indices, but I think that’s about the best that can be reasonably
done.
We can do a bit more. For example, RangeReplaceableCollection's
replaceRange should return the range in the new collection state
corresponding to the elements that were replaced.
--
-Dave
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
--
-Dave
--
-Dave
--
-Dave