Fixing the confusion between non-mutating algorithms and single-pass sequences


(Jon Hull) #1

I have also been thinking about this problem for the last week or so (as well as the finite/infinite bit). I don’t really have anything detailed that is ready to share (and it sounds like you are headed in a different direction now). I still wanted to share the gist of my thoughts, in case they help spark ideas in others…

My thought was to follow the first rejected approach: removing sequence and letting the Iterator protocol model single-pass. Iterators would be reference types.

I followed a similar path, and my version also has a pretty large duplication of API between Iterator and Collection… the difference though, is I think I have a way to avoid most external duplication of API.

Basically, I added back in a super-minimal protocol to fill the structural gap left by Sequence. I call it “IteratorProvider” and it only has a single function which vends an iterator. Collection adheres to this, and Iterator adheres to it by returning itself. All of the other methods from Sequence remain on Iterator. Thus anyone with API that only needs a single pass would take a IteratorProvider and then work on the iterator it provides.

The big difference is that Collection and Iterator are still separate protocols, iterator is a reference type, and most of the methods from sequence are now on iterator.

I think this makes more sense semantically than the current model (or renaming sequence). I also really think it is important to have iterators be reference types (anything else is really a lie)

The rejected “consumedIn” idea also gave me an idea of how to reduce the internal API repetition, if desired.

Have a fileprivate method on Iterator (I will call it “consumedIn” here, but it is private, so call it whatever) that wraps the Iterator in a collection. The multi-pass-ness of that secret collection is a lie, but it is fileprivate so it should never get into the wild where someone can find that out. Then you would just define map() etc… on an extension of Iterator and have them forward to “self.consumedIn.map”, etc…. It does still have duplication of definitions, but the implementations would be in a single spot.

Another option, if the subterfuge of a secret collection is undesirable, would be to make “consumedIn” be public and have it create an array-like collection. The default implementation would actually make an eager copy, but specialized cases could work with the created collection to avoid copying the iterator’s contents where possible. Then you would remove all of the eager methods from Iterator and just use collection’s version.

Food for thought…

Thanks,
Jon

···

Hi,

I'd like to continue the discussion of the issue raised by David Waite
inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/: <inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/:>

> My main motivation for proposing this is the potential for developer confusion. As stated during one of the previous threads on the naming of map, flatMap, filter, etc. methods on Sequence, Sequence has a naming requirement not typical of the rest of the Swift standard library in that many methods on Sequence may or may not be destructive. As such, naming methods for any extensions on Sequence is challenging as the names need to not imply immutability.

I'd like to focus on a particular point: methods on Sequence can
consume elements, but the APIs are not markedmutating.

Dave Abrahams, Max Moiseev, and I have discussed this issue and we
agree this problem is severe and worth solving, we also think that the
likely solutions would be source-breaking, so it is important that we
discuss it now.

We have discussed a few options.

- Rejected option: remove Sequence, let IteratorProtocol model
single-pass data streams

- Rejected option: use a syntactic marker, like sequence.consumedIn.map {}

- Rejected option: mutating APIs on Sequence, non-mutating APIs on Collection

Proposed: rename Sequence to IterableOnce or TraversableOnce. We think
that Sequence does not convey the single-pass restriction clearly. The
term "sequence" has been used in math (as in "integer sequence"), and
since the math domain does not have mutation, "sequence" can be
understood to mean "multi-pass", since you can traverse a sequence of
integers an arbitrary number of times.

We think that only the last option is viable in the Swift language as
it exists now, without creating an undue burden for API vendors and
users.

For more details about rejection options, please see the full writeup:
https://gist.github.com/gribozavr/47f4717f3afc762549383e94da7f748b

Dmitri


(Dave Abrahams) #2

I have also been thinking about this problem for the last week or so
(as well as the finite/infinite bit). I don’t really have anything
detailed that is ready to share (and it sounds like you are headed in
a different direction now). I still wanted to share the gist of my
thoughts, in case they help spark ideas in others…

Hi Jonathan,

Thanks for your thoughts...

My thought was to follow the first rejected approach: removing
sequence and letting the Iterator protocol model single-pass.
Iterators would be reference types.

I followed a similar path, and my version also has a pretty large
duplication of API between Iterator and Collection… the difference
though, is I think I have a way to avoid most external duplication of
API.

Basically, I added back in a super-minimal protocol to fill the
structural gap left by Sequence. I call it “IteratorProvider” and it
only has a single function which vends an iterator. Collection
adheres to this, and Iterator adheres to it by returning itself. All
of the other methods from Sequence remain on Iterator. Thus anyone
with API that only needs a single pass would take a IteratorProvider
and then work on the iterator it provides.

That leaves us back where we are now: people will see that
IteratorProvider is a simple, universal protocol for both single-and
multi-pass sequences, write algorithm libraries that depend on
multi-pass-ness, and test them with the most prevalent examples, which
happen to be multipass.

The big difference is that Collection and Iterator are still separate
protocols,

That's currently the case.

iterator is a reference type,

That's slightly different, but we have reasons for not wanting to make
that requirement.

and most of the methods from sequence are now on iterator.

I think this makes more sense semantically than the current model (or
renaming sequence). I also really think it is important to have
iterators be reference types (anything else is really a lie)

Once they're reference types, you might as well just make them conform
to the same protocol as collections for algorithms, because none of the
“mutating/nonmutating” distinctions will be honored anyway. Furthermore
it implies efficiency costs we're not prepared to pay. The main problem
is that the right way to model single-pass traversal is with move-only
types, which can't be copied... and that's not something we can express
in Swift today.

The rejected “consumedIn” idea also gave me an idea of how to reduce
the internal API repetition, if desired.

Have a fileprivate method on Iterator (I will call it “consumedIn”
here, but it is private, so call it whatever) that wraps the Iterator
in a collection. The multi-pass-ness of that secret collection is a
lie, but it is fileprivate so it should never get into the wild where
someone can find that out. Then you would just define map() etc… on an
extension of Iterator and have them forward to “self.consumedIn.map”,
etc…. It does still have duplication of definitions, but the
implementations would be in a single spot.

Yeah, we thought about that, too. That doesn't really help the public
API, though. That's the problem we're trying to solve. And if we don't
expose the thing publicly, any user wishing to write her own algorithm
that works on both Iterator and Collection has to write it herself.

Another option, if the subterfuge of a secret collection is
undesirable, would be to make “consumedIn” be public and have it
create an array-like collection. The default implementation would
actually make an eager copy, but specialized cases could work with the
created collection to avoid copying the iterator’s contents where
possible. Then you would remove all of the eager methods from
Iterator and just use collection’s version.

Ultimately, there are lots of interesting ideas for addressing this
space, but none of them lead to an answer that we we're willing to bet
solves the design problems and is implementable in time for Swift
3... except for the renaming.

···

on Thu Jul 14 2016, Jonathan Hull <jhull-AT-gbis.com> wrote:

Food for thought…

Thanks,
Jon

Hi,

I'd like to continue the discussion of the issue raised by David Waite
inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/: <inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/:>

> My main motivation for proposing this is the potential for
developer confusion. As stated during one of the previous threads on
the naming of map, flatMap, filter, etc. methods on Sequence,
Sequence has a naming requirement not typical of the rest of the
Swift standard library in that many methods on Sequence may or may
not be destructive. As such, naming methods for any extensions on
Sequence is challenging as the names need to not imply immutability.

I'd like to focus on a particular point: methods on Sequence can
consume elements, but the APIs are not markedmutating.

Dave Abrahams, Max Moiseev, and I have discussed this issue and we
agree this problem is severe and worth solving, we also think that the
likely solutions would be source-breaking, so it is important that we
discuss it now.

We have discussed a few options.

- Rejected option: remove Sequence, let IteratorProtocol model
single-pass data streams

- Rejected option: use a syntactic marker, like sequence.consumedIn.map {}

- Rejected option: mutating APIs on Sequence, non-mutating APIs on Collection

Proposed: rename Sequence to IterableOnce or TraversableOnce. We think
that Sequence does not convey the single-pass restriction clearly. The
term "sequence" has been used in math (as in "integer sequence"), and
since the math domain does not have mutation, "sequence" can be
understood to mean "multi-pass", since you can traverse a sequence of
integers an arbitrary number of times.

We think that only the last option is viable in the Swift language as
it exists now, without creating an undue burden for API vendors and
users.

For more details about rejection options, please see the full writeup:
https://gist.github.com/gribozavr/47f4717f3afc762549383e94da7f748b

Dmitri

--
Dave


(Jon Hull) #3

Comments inline:

I have also been thinking about this problem for the last week or so
(as well as the finite/infinite bit). I don’t really have anything
detailed that is ready to share (and it sounds like you are headed in
a different direction now). I still wanted to share the gist of my
thoughts, in case they help spark ideas in others…

Hi Jonathan,

Thanks for your thoughts...

My thought was to follow the first rejected approach: removing
sequence and letting the Iterator protocol model single-pass.
Iterators would be reference types.

I followed a similar path, and my version also has a pretty large
duplication of API between Iterator and Collection… the difference
though, is I think I have a way to avoid most external duplication of
API.

Basically, I added back in a super-minimal protocol to fill the
structural gap left by Sequence. I call it “IteratorProvider” and it
only has a single function which vends an iterator. Collection
adheres to this, and Iterator adheres to it by returning itself. All
of the other methods from Sequence remain on Iterator. Thus anyone
with API that only needs a single pass would take a IteratorProvider
and then work on the iterator it provides.

That leaves us back where we are now: people will see that
IteratorProvider is a simple, universal protocol for both single-and
multi-pass sequences, write algorithm libraries that depend on
multi-pass-ness, and test them with the most prevalent examples, which
happen to be multi pass.

Let me make a quick counter-argument, because I thought about it a bit, and I don’t think it does have the same problem (especially with careful/better naming).

The difference is that the ONLY method on IteratorProvider is the one to get an iterator. There is no map, filter, sort, first, count, etc… just a way to get a single-pass iterator. This changes the mindset when using it. You are aware that you are getting a single-pass iterator.

True, people might try to get the iterator a second time, but we can make the iteratorProvider method optional (and trying to get an iterator from an iterator which is spent would return nil) and then they are forced to deal with the case where it was single-pass.

It is a fairly subtle difference, but an important one IMHO.

The big difference is that Collection and Iterator are still separate
protocols,

That's currently the case.

Yes, you are correct. I don’t know why I thought they were connected in the current version.

iterator is a reference type,

That's slightly different, but we have reasons for not wanting to make
that requirement.

Would you mind expanding on those reasons?

and most of the methods from sequence are now on iterator.

I think this makes more sense semantically than the current model (or
renaming sequence). I also really think it is important to have
iterators be reference types (anything else is really a lie)

Once they're reference types, you might as well just make them conform
to the same protocol as collections for algorithms, because none of the
“mutating/nonmutating” distinctions will be honored anyway. Furthermore
it implies efficiency costs we're not prepared to pay. The main problem
is that the right way to model single-pass traversal is with move-only
types, which can't be copied... and that's not something we can express
in Swift today.

ah.

The rejected “consumedIn” idea also gave me an idea of how to reduce
the internal API repetition, if desired.

Have a fileprivate method on Iterator (I will call it “consumedIn”
here, but it is private, so call it whatever) that wraps the Iterator
in a collection. The multi-pass-ness of that secret collection is a
lie, but it is fileprivate so it should never get into the wild where
someone can find that out. Then you would just define map() etc… on an
extension of Iterator and have them forward to “self.consumedIn.map”,
etc…. It does still have duplication of definitions, but the
implementations would be in a single spot.

Yeah, we thought about that, too. That doesn't really help the public
API, though. That's the problem we're trying to solve. And if we don't
expose the thing publicly, any user wishing to write her own algorithm
that works on both Iterator and Collection has to write it herself.

Another option, if the subterfuge of a secret collection is
undesirable, would be to make “consumedIn” be public and have it
create an array-like collection. The default implementation would
actually make an eager copy, but specialized cases could work with the
created collection to avoid copying the iterator’s contents where
possible. Then you would remove all of the eager methods from
Iterator and just use collection’s version.

Ultimately, there are lots of interesting ideas for addressing this
space, but none of them lead to an answer that we we're willing to bet
solves the design problems and is implementable in time for Swift
3... except for the renaming.

Yeah. We are out of time. Is there any hope of coming back to this in Swift 4, or would it be too much of a breaking change?

Thanks,
Jon

···

On Jul 15, 2016, at 11:47 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Thu Jul 14 2016, Jonathan Hull <jhull-AT-gbis.com <http://jhull-at-gbis.com/>> wrote:

Food for thought…

Thanks,
Jon

Hi,

I'd like to continue the discussion of the issue raised by David Waite
inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/: <inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/:> <inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/: <inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/:>>

My main motivation for proposing this is the potential for

developer confusion. As stated during one of the previous threads on
the naming of map, flatMap, filter, etc. methods on Sequence,
Sequence has a naming requirement not typical of the rest of the
Swift standard library in that many methods on Sequence may or may
not be destructive. As such, naming methods for any extensions on
Sequence is challenging as the names need to not imply immutability.

I'd like to focus on a particular point: methods on Sequence can
consume elements, but the APIs are not markedmutating.

Dave Abrahams, Max Moiseev, and I have discussed this issue and we
agree this problem is severe and worth solving, we also think that the
likely solutions would be source-breaking, so it is important that we
discuss it now.

We have discussed a few options.

- Rejected option: remove Sequence, let IteratorProtocol model
single-pass data streams

- Rejected option: use a syntactic marker, like sequence.consumedIn.map {}

- Rejected option: mutating APIs on Sequence, non-mutating APIs on Collection

Proposed: rename Sequence to IterableOnce or TraversableOnce. We think
that Sequence does not convey the single-pass restriction clearly. The
term "sequence" has been used in math (as in "integer sequence"), and
since the math domain does not have mutation, "sequence" can be
understood to mean "multi-pass", since you can traverse a sequence of
integers an arbitrary number of times.

We think that only the last option is viable in the Swift language as
it exists now, without creating an undue burden for API vendors and
users.

For more details about rejection options, please see the full writeup:
https://gist.github.com/gribozavr/47f4717f3afc762549383e94da7f748b <https://gist.github.com/gribozavr/47f4717f3afc762549383e94da7f748b>

Dmitri

--
Dave


(Dave Abrahams) #4

Comments inline:

I have also been thinking about this problem for the last week or so
(as well as the finite/infinite bit). I don’t really have anything
detailed that is ready to share (and it sounds like you are headed in
a different direction now). I still wanted to share the gist of my
thoughts, in case they help spark ideas in others…

Hi Jonathan,

Thanks for your thoughts...

My thought was to follow the first rejected approach: removing
sequence and letting the Iterator protocol model single-pass.
Iterators would be reference types.

I followed a similar path, and my version also has a pretty large
duplication of API between Iterator and Collection… the difference
though, is I think I have a way to avoid most external duplication of
API.

Basically, I added back in a super-minimal protocol to fill the
structural gap left by Sequence. I call it “IteratorProvider” and it
only has a single function which vends an iterator. Collection
adheres to this, and Iterator adheres to it by returning itself. All
of the other methods from Sequence remain on Iterator. Thus anyone
with API that only needs a single pass would take a IteratorProvider
and then work on the iterator it provides.

That leaves us back where we are now: people will see that
IteratorProvider is a simple, universal protocol for both single-and
multi-pass sequences, write algorithm libraries that depend on
multi-pass-ness, and test them with the most prevalent examples, which
happen to be multi pass.

Let me make a quick counter-argument, because I thought about it a
bit, and I don’t think it does have the same problem (especially with
careful/better naming).

The difference is that the ONLY method on IteratorProvider is the one
to get an iterator. There is no map, filter, sort, first, count, etc…
just a way to get a single-pass iterator. This changes the mindset
when using it. You are aware that you are getting a single-pass
iterator.

Maybe. What's to stop people from extending IteratorProvider?

True, people might try to get the iterator a second time, but we can
make the iteratorProvider method optional (and trying to get an
iterator from an iterator which is spent would return nil)
and then they are forced to deal with the case where it was
single-pass.

Now you can't loop over the same array multiple times.

It is a fairly subtle difference, but an important one IMHO.

The big difference is that Collection and Iterator are still separate
protocols,

That's currently the case.

Yes, you are correct. I don’t know why I thought they were connected in the current version.

iterator is a reference type,

That's slightly different, but we have reasons for not wanting to make
that requirement.

Would you mind expanding on those reasons?

All detailed below, before you write “ah.”

and most of the methods from sequence are now on iterator.

I think this makes more sense semantically than the current model (or
renaming sequence). I also really think it is important to have
iterators be reference types (anything else is really a lie)

Once they're reference types, you might as well just make them conform
to the same protocol as collections for algorithms, because none of the
“mutating/nonmutating” distinctions will be honored anyway. Furthermore
it implies efficiency costs we're not prepared to pay. The main problem
is that the right way to model single-pass traversal is with move-only
types, which can't be copied... and that's not something we can express
in Swift today.

ah.

The rejected “consumedIn” idea also gave me an idea of how to reduce
the internal API repetition, if desired.

Have a fileprivate method on Iterator (I will call it “consumedIn”
here, but it is private, so call it whatever) that wraps the Iterator
in a collection. The multi-pass-ness of that secret collection is a
lie, but it is fileprivate so it should never get into the wild where
someone can find that out. Then you would just define map() etc… on an
extension of Iterator and have them forward to “self.consumedIn.map”,
etc…. It does still have duplication of definitions, but the
implementations would be in a single spot.

Yeah, we thought about that, too. That doesn't really help the public
API, though. That's the problem we're trying to solve. And if we don't
expose the thing publicly, any user wishing to write her own algorithm
that works on both Iterator and Collection has to write it herself.

Another option, if the subterfuge of a secret collection is
undesirable, would be to make “consumedIn” be public and have it
create an array-like collection. The default implementation would
actually make an eager copy, but specialized cases could work with the
created collection to avoid copying the iterator’s contents where
possible. Then you would remove all of the eager methods from
Iterator and just use collection’s version.

Ultimately, there are lots of interesting ideas for addressing this
space, but none of them lead to an answer that we we're willing to bet
solves the design problems and is implementable in time for Swift
3... except for the renaming.

Yeah. We are out of time. Is there any hope of coming back to this
in Swift 4, or would it be too much of a breaking change?

There's always hope :slight_smile:

···

on Mon Jul 18 2016, Jonathan Hull <swift-evolution@swift.org> wrote:

On Jul 15, 2016, at 11:47 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Thu Jul 14 2016, Jonathan Hull <jhull-AT-gbis.com <http://jhull-at-gbis.com/>> wrote:

Food for thought…

Thanks,
Jon

Hi,

I'd like to continue the discussion of the issue raised by David Waite
inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/:
<inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/:>
<inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/:
<inhttp://thread.gmane.org/gmane.comp.lang.swift.evolution/21295/:>>

My main motivation for proposing this is the potential for

developer confusion. As stated during one of the previous threads on
the naming of map, flatMap, filter, etc. methods on Sequence,
Sequence has a naming requirement not typical of the rest of the
Swift standard library in that many methods on Sequence may or may
not be destructive. As such, naming methods for any extensions on
Sequence is challenging as the names need to not imply immutability.

I'd like to focus on a particular point: methods on Sequence can
consume elements, but the APIs are not markedmutating.

Dave Abrahams, Max Moiseev, and I have discussed this issue and we
agree this problem is severe and worth solving, we also think that the
likely solutions would be source-breaking, so it is important that we
discuss it now.

We have discussed a few options.

- Rejected option: remove Sequence, let IteratorProtocol model
single-pass data streams

- Rejected option: use a syntactic marker, like sequence.consumedIn.map {}

- Rejected option: mutating APIs on Sequence, non-mutating APIs on Collection

Proposed: rename Sequence to IterableOnce or TraversableOnce. We think
that Sequence does not convey the single-pass restriction clearly. The
term "sequence" has been used in math (as in "integer sequence"), and
since the math domain does not have mutation, "sequence" can be
understood to mean "multi-pass", since you can traverse a sequence of
integers an arbitrary number of times.

We think that only the last option is viable in the Swift language as
it exists now, without creating an undue burden for API vendors and
users.

For more details about rejection options, please see the full writeup:
https://gist.github.com/gribozavr/47f4717f3afc762549383e94da7f748b
<https://gist.github.com/gribozavr/47f4717f3afc762549383e94da7f748b>
<https://gist.github.com/gribozavr/47f4717f3afc762549383e94da7f748b
<https://gist.github.com/gribozavr/47f4717f3afc762549383e94da7f748b>>

Dmitri

--
Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
Dave