AnySequence and type erasure


(Svein Halvor Halvorsen) #1

I'm sure people here know about the problem that AnySequence, AnyGenerator,
etc solves.
In Swift, a protocol can have an associated type, which is kinda like
generics, but you cannot declare a variable like this:

let sequence: SequenceType<Int>

If you want a sequence, any sequence, over ints you need to wrap the
protocol in a new concrete, generic type, AnySequence<T> that itself
implements the protocol SequenceType, and where the associated type Element
is equal to the generic type T.

The standard library does this. And, like many others, I have tried to
implement this for my self, to try to better understand the problem, and I
expected that I would end up with a design very similar to what the
standard library does. However, I have not seen the need for the complex
boxing mechanism. I'm probably misunderstanding something, though.

Can someone please look at this code, and give me constructive feedback? Is
this a novel and working solution, or am I missing something?:

struct AnyGenerator<Element>: GeneratorType {
    init<G: GeneratorType where G.Element == Element>(_ gen: G) {
        var gen = gen
        self._next = { gen.next() }
    }
    private let _next: () -> Element?
    func next() -> Element? {
        return _next()
    }
}

struct AnySequence<Element>: SequenceType {
    init<S: SequenceType where S.Generator.Element == Element>(_ seq: S) {
        self.underestimateCount = seq.underestimateCount
        self._generate = { AnyGenerator(seq.generate()) }
    }
    private let _generate: () -> AnyGenerator<Element>
    func generate() -> AnyGenerator<Element> {
        return _generate()
    }
    let underestimateCount: () -> Int
}


(Dmitri Gribenko) #2

Hi,

Your design should work, but it won't provide optimal performance.
Sequence has a lot of requirements that can be customized by the
conforming type (e.g., map()). We need to forward those to the
original type to get the best performance.

While you can create a closure for each of the forwarded methods,
let's say n in total, it is n times more wasteful compared to using
just a single reference like the standard library does.

Dmitri

···

On Fri, Jun 17, 2016 at 1:39 AM, Svein Halvor Halvorsen via swift-users <swift-users@swift.org> wrote:

I'm sure people here know about the problem that AnySequence, AnyGenerator,
etc solves.
In Swift, a protocol can have an associated type, which is kinda like
generics, but you cannot declare a variable like this:

let sequence: SequenceType<Int>

If you want a sequence, any sequence, over ints you need to wrap the
protocol in a new concrete, generic type, AnySequence<T> that itself
implements the protocol SequenceType, and where the associated type Element
is equal to the generic type T.

The standard library does this. And, like many others, I have tried to
implement this for my self, to try to better understand the problem, and I
expected that I would end up with a design very similar to what the standard
library does. However, I have not seen the need for the complex boxing
mechanism. I'm probably misunderstanding something, though.

Can someone please look at this code, and give me constructive feedback? Is
this a novel and working solution, or am I missing something?:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Svein Halvor Halvorsen) #3

Ok. Good to hear that I'm not too far off :slight_smile:

Can you provide me with an example of a sequence type or two that has some
map or other function that would benefit from an optimized wrapper? Or:
under what circumstances would the stdlib implementation outperform mine?
I'm completely new to this all, and I'm trying to understand. Could some
random access collection type parallelize a filter? If you want I guarantee
correct ordering, under what circumstances does some sequence type work
better than mere iteration?

···

fre. 17. jun. 2016 kl. 15.49 skrev Dmitri Gribenko <gribozavr@gmail.com>:

On Fri, Jun 17, 2016 at 1:39 AM, Svein Halvor Halvorsen via > swift-users <swift-users@swift.org> wrote:
> I'm sure people here know about the problem that AnySequence,
AnyGenerator,
> etc solves.
> In Swift, a protocol can have an associated type, which is kinda like
> generics, but you cannot declare a variable like this:
>
> let sequence: SequenceType<Int>
>
> If you want a sequence, any sequence, over ints you need to wrap the
> protocol in a new concrete, generic type, AnySequence<T> that itself
> implements the protocol SequenceType, and where the associated type
Element
> is equal to the generic type T.
>
> The standard library does this. And, like many others, I have tried to
> implement this for my self, to try to better understand the problem, and
I
> expected that I would end up with a design very similar to what the
standard
> library does. However, I have not seen the need for the complex boxing
> mechanism. I'm probably misunderstanding something, though.
>
> Can someone please look at this code, and give me constructive feedback?
Is
> this a novel and working solution, or am I missing something?:

Hi,

Your design should work, but it won't provide optimal performance.
Sequence has a lot of requirements that can be customized by the
conforming type (e.g., map()). We need to forward those to the
original type to get the best performance.

While you can create a closure for each of the forwarded methods,
let's say n in total, it is n times more wasteful compared to using
just a single reference like the standard library does.

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Dmitri Gribenko) #4

Compare the default implementations of map() for Sequence and
Collection. In the Sequence case, we don't know the size of the
resulting array, so we have to grow the resulting array as we pull the
elements from the sequence. In the case of running Collection.map()
we know the final size from the count property. In the case of Array
we can do even better and eliminate a check (the _expectEnd()
call). However, certain collections where calculating the number of
elements might be expensive, can opt into the Sequence behavior (e.g.,
various string views); this is up to the designer of the specific
collection.

Dmitri

···

On Fri, Jun 17, 2016 at 12:37 PM, Svein Halvor Halvorsen <svein.h@lvor.halvorsen.cc> wrote:

Ok. Good to hear that I'm not too far off :slight_smile:

Can you provide me with an example of a sequence type or two that has some
map or other function that would benefit from an optimized wrapper? Or:
under what circumstances would the stdlib implementation outperform mine?

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Svein Halvor Halvorsen) #5

Is this a problem in reality? The map() on Sequence uses
underestimatedCount to reserve capacity, and if the sequence wraps a
collection type, underestimatedCount is presumably O(1) and returns the
same value as count, non?

···

2016-06-18 10:23 GMT+02:00 Dmitri Gribenko <gribozavr@gmail.com>:

On Fri, Jun 17, 2016 at 12:37 PM, Svein Halvor Halvorsen > <svein.h@lvor.halvorsen.cc> wrote:
> Ok. Good to hear that I'm not too far off :slight_smile:
>
> Can you provide me with an example of a sequence type or two that has
some
> map or other function that would benefit from an optimized wrapper? Or:
> under what circumstances would the stdlib implementation outperform mine?

Compare the default implementations of map() for Sequence and
Collection. In the Sequence case, we don't know the size of the
resulting array, so we have to grow the resulting array as we pull the
elements from the sequence. In the case of running Collection.map()
we know the final size from the count property. In the case of Array
we can do even better and eliminate a check (the _expectEnd()
call). However, certain collections where calculating the number of
elements might be expensive, can opt into the Sequence behavior (e.g.,
various string views); this is up to the designer of the specific
collection.


(Dmitri Gribenko) #6

You can't call '.count' on a Sequence without potentially consuming
the sequence.

Dmitri

···

On Thu, Jun 23, 2016 at 2:12 AM, Svein Halvor Halvorsen <svein.h@lvor.halvorsen.cc> wrote:

2016-06-18 10:23 GMT+02:00 Dmitri Gribenko <gribozavr@gmail.com>:

On Fri, Jun 17, 2016 at 12:37 PM, Svein Halvor Halvorsen >> <svein.h@lvor.halvorsen.cc> wrote:
> Ok. Good to hear that I'm not too far off :slight_smile:
>
> Can you provide me with an example of a sequence type or two that has
> some
> map or other function that would benefit from an optimized wrapper? Or:
> under what circumstances would the stdlib implementation outperform
> mine?

Compare the default implementations of map() for Sequence and
Collection. In the Sequence case, we don't know the size of the
resulting array, so we have to grow the resulting array as we pull the
elements from the sequence. In the case of running Collection.map()
we know the final size from the count property. In the case of Array
we can do even better and eliminate a check (the _expectEnd()
call). However, certain collections where calculating the number of
elements might be expensive, can opt into the Sequence behavior (e.g.,
various string views); this is up to the designer of the specific
collection.

Is this a problem in reality? The map() on Sequence uses underestimatedCount
to reserve capacity, and if the sequence wraps a collection type,
underestimatedCount is presumably O(1) and returns the same value as count,
non?

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/