union types


(Drew Crawford) #1

This is not really a well-thought-out proposal, but more of a brainstorm about whether there is a good language-level solution.

A problem I frequently run into in Swift is the inability to use generic types as first-class "complete" types.

Let's say that I have a protocol (with an associated type) and some structs that implement it (with different associated types).

//all good feature requests involve factories

protocol Factory {
    typealias Product
    func make() -> Product
    var description : String { get }
}

struct IntFactory : Factory {
    typealias Product = Int
    func make() -> Int { return 0 }
    var description : String { get { return "IntFactory" } }
}

struct StringFactory : Factory {
    typealias Product = String
    func make() -> String { return "Hello world" }
    var description : String { get { return "StringFactory" } }
}

Now it is easy to work on the underlying IntFactory and StringFactory:

IntFactory().make() //static dispatch
StringFactory().make() //static dispatch

...but how do I write a function that works on either Factory?

func foo(a: Factory) {
    a.make() //dynamic dispatch
}
error: protocol 'Factory' can only be used as a generic constraint because it has Self or associated type requirements

I could use generics:

func foo<A: Factory>(a: A) {}

but now I need to bubble up generics through every caller in the call stack:

func baz<A: Factory>(a: A){bar(a)}
func bar<A: Factory>(a: A){foo(a)}
func foo<A: Factory>(a: A) {a.make()}

class WhyIsthisGeneric<A: Factory> {
    var a: A //because of an implementation detail of Factory, of course
}

I submit that this couples the implementation details of Factory too tightly to unrelated functions and methods (and perhaps entire classes that now become generic so I can create ivars).

Here's what I think is an elegant solution:

typealias EitherFactory = union(Factory, [IntFactory, StringFactory])
let a : EitherFactory = IntFactory()
func baz(a: EitherFactory){bar(a)}
func bar(a: EitherFactory){foo(a)}
func foo(a: EitherFactory){a.make()}

Here union is a new builtin that causes the compiler to automatically write this type behind the scenes:

enum EitherFactory {
    case intFactory(IntFactory)
    case stringFactory(StringFactory)
    
    func make() -> Any {
        switch(self) {
        case .intFactory(let f):
            return f.make()
        case .stringFactory(let f):
            return f.make()
        }
    }
    
    var description : String {
        get {
            switch(self) {
            case .intFactory(let f):
                return f.description
            case .stringFactory(let f):
                return f.description
            }
        }
    }

    var intFactory : IntFactory? {
        switch(self) {
        case .intFactory(let f):
            return f
        default:
            return nil
        }
    }
    var stringFactory : StringFactory? {
        switch(self) {
        case .stringFactory(let f):
            return f
        default:
            return nil
        }
    }
}

This generated type is fully-specified, and so it may be used in any place a first-class type is allowed.

Arguments in favor of this proposal:

1. It allows protocols with Self or associated type constraints to be promoted to "full" types in many practical use cases (specifically, when the list of types can be enumerated, which is always the case for protocols with private or internal visibility).
2. It allows the user to opt into the simplicity of dynamic dispatch with their generic types.
3. Since the boilerplate is automatically generated, it updates automatically for new functions and methods added to the protocol, whereas my current solution is tedious, manual, and error-prone.
4. The semantics allow for a (future) optimizer to optimize away the boilerplate. For example, if we write

let a: EitherFactory = IntFactory()
func foo(a: EitherFactory){a.make()}
foo(a)

    Our optimizer may emit a specialization "as if" I had bubbled generics:

let a: IntFactory = IntFactory()
func foo_specialized_intFactory (a: IntFactory){a.make()}
foo_specialized_intFactory(a)

    Under this optimization the switch statement and the dynamic dispatch are eliminated. So the semantics allow "at least" dynamic dispatch performance, and "up to" static dispatch performance, given a strong optimizer.

Motivating case:

This proposal arises (most recently) from the problem of trying to write code that is generic across IPv4 and IPv6. For example:

final class Socket {
    func getsockname() -> ???
}

In the IPv4 case this function should return `sockaddr_in`, but in the v6 case it should return `sockaddr_in6`. So this socket can only be represented as a protocol with associated type requirements, and so it cannot be trivially used e.g. as a function parameter, as an ivar, etc. This significantly complicates the implementation.

Incompleteness:

The full semantics of the union builtin are underspecified.

1. In the example, `make() -> Int` and `make() -> String` unify to `make() -> Any`, but would `sequence() -> CollectionType` and `sequence() -> SequenceType` unify to `sequence() -> Any`? Perhaps not.
2. What is the behavior if the arguments/return values of a function are themselves unions?

And finally, would it be a better idea merely to promote generics to "full" types, without the use of an explicit union builtin? The approach here is more narrowly tailored, but that is not necessarily the right language design.


(Dmitri Gribenko) #2

    func make() -> Any {

What about parameter types that you erase? Would you downcast and trap in
case of a mismatch?

Did you look at AnySequence and other related types that implement a
similar pattern manually?

How do you expect people will use the result of such an operation in
practice? The type being 'Any' makes it completely opaque.

Motivating case:

This proposal arises (most recently) from the problem of trying to write
code that is generic across IPv4 and IPv6. For example

final class Socket {
    func getsockname() -> ???
}

In the IPv4 case this function should return `sockaddr_in`, but in the v6
case it should return `sockaddr_in6`. So this socket can only be
represented as a protocol with associated type requirements, and so it
cannot be trivially used e.g. as a function parameter, as an ivar, etc.
This significantly complicates the implementation.

If your library is a high-level one, I definitely wouldn't want to see it
return 'Any' from Socket methods. Instead, a library should erase the
differences between transport mechanisms in the high-level API, while still
providing a low-level API for those who need it, as well as for the
implementation of the high-level API.

Dmitri

···

On Fri, Dec 11, 2015 at 3:22 PM, Drew Crawford via swift-evolution < swift-evolution@swift.org> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Drew Crawford) #3

How do you expect people will use the result of such an operation in practice? The type being 'Any' makes it completely opaque.

If your library is a high-level one, I definitely wouldn't want to see it return 'Any' from Socket methods. Instead, a library should erase the differences between transport mechanisms in the high-level API, while still providing a low-level API for those who need it, as well as for the implementation of the high-level API.

Perhaps it would be better to recursively unify parameters/returns into their own unions, such that

final class Socket {
    func getsockname() -> union(Any, [sockaddr_in, sockaddr_in6])
}
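A minimal sketch of what such a recursively unified return type might expand to, written by hand. The names `SocketAddress`, `IPv4SocketAddress`, and `IPv6SocketAddress` are hypothetical; the two structs are simplified stand-ins for `sockaddr_in` and `sockaddr_in6` so that the example compiles without platform imports, and the stored `local` address is an assumption made only to keep it self-contained.

```swift
// Hypothetical stand-ins for the C structs sockaddr_in / sockaddr_in6.
struct IPv4SocketAddress { var port: UInt16; var address: UInt32 }
struct IPv6SocketAddress { var port: UInt16; var address: [UInt8] }

// Roughly what union(Any, [sockaddr_in, sockaddr_in6]) could generate (assumed shape).
enum SocketAddress {
    case v4(IPv4SocketAddress)
    case v6(IPv6SocketAddress)

    // A member shared by every case is forwarded with its real type, not Any.
    var port: UInt16 {
        switch self {
        case .v4(let a): return a.port
        case .v6(let a): return a.port
        }
    }
}

final class Socket {
    private let local: SocketAddress
    init(local: SocketAddress) { self.local = local }

    // Non-generic signature: Socket itself does not need a type parameter.
    func getsockname() -> SocketAddress { return local }
}

let socket = Socket(local: .v4(IPv4SocketAddress(port: 8080, address: 0x7F00_0001)))
let name = socket.getsockname()
```

Callers that only need the port never switch; callers that need the concrete address pattern-match on the case, which is where the closed set of types pays off.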

Did you look at AnySequence and other related types that implement a similar pattern manually?

AnySequence has a different motivation. In AnySequence, the underlying type is fully erased, or alternately, the underlying type is an element from an open set. This is appropriate for a public API where the user may create their own types unknown to the library author which need to be erased.

In a union, the underlying type is only partially erased, or alternately, the underlying type is an element from a closed enumeration. This is appropriate for a private/internal API where the types are known at compile time.

I suspect it would be difficult to automatically unify an *open* set of types, which is why AnySequence et al are manually maintained. But unifying a *closed* set of types can be implemented in a preprocessor. So unification is much more appropriate in this case.
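For contrast, here is a rough sketch of the open-set approach, hand-written in the closure-based style of AnySequence. `AnyFactory` and `DoublingFactory` are hypothetical names, and the protocol uses the later `associatedtype` spelling, so a reasonably recent toolchain is assumed.

```swift
protocol Factory {
    associatedtype Product
    func make() -> Product
    var description: String { get }
}

struct IntFactory: Factory {
    func make() -> Int { return 0 }
    var description: String { return "IntFactory" }
}

// Hypothetical eraser: stores closures instead of the concrete factory,
// so it can wrap any conforming type, including ones the library never saw.
struct AnyFactory<Product>: Factory {
    private let _make: () -> Product
    let description: String

    init<F: Factory>(_ base: F) where F.Product == Product {
        _make = base.make
        description = base.description
    }

    func make() -> Product { return _make() }
}

// A conformance added later, e.g. by a client of the library (hypothetical).
struct DoublingFactory: Factory {
    let base: Int
    func make() -> Int { return base * 2 }
    var description: String { return "DoublingFactory" }
}

let erased: [AnyFactory<Int>] = [AnyFactory(IntFactory()), AnyFactory(DoublingFactory(base: 21))]
let made = erased.map { $0.make() }
```

Note that `Product` is still a generic parameter of the wrapper, so `AnyFactory<Int>` and `AnyFactory<String>` remain distinct types, and every new protocol requirement needs another hand-written closure. The closed union avoids both, at the cost of knowing the full list of types up front.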


(Joe Groff) #4

This is not really a well-thought-out proposal, but more of a brainstorm about whether there is a good language-level solution.

A problem I frequently run into in Swift is the inability to use generic types as first-class "complete" types.

Let's say that I have a protocol (with an associated type) and some structs that implement it (with different associated types).

//all good feature requests involve factories

protocol Factory {
    typealias Product
    func make() -> Product
    var description : String { get }
}

struct IntFactory : Factory {
    typealias Product = Int
    func make() -> Int { return 0 }
    var description : String { get { return "IntFactory" } }
}

struct StringFactory : Factory {
    typealias Product = String
    func make() -> String { return "Hello world" }
    var description : String { get { return "StringFactory" } }
}

This seems like it would be addressed just by allowing Factory to be used as a dynamic type, with its Product type generalized to Any. We'll be set up to support that with some runtime work to store associated types in protocol witness tables (which is also necessary to fix cyclic conformances, one of our Swift 3 goals).
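As a rough illustration of that direction: later Swift releases (5.7 and newer, assumed here) accept a protocol with associated types as a dynamic type through the `any` spelling, with `Product` generalized to `Any` at the call site. The sketch uses the later `associatedtype` keyword, and `foo` is just the function from the original example given a return value.

```swift
protocol Factory {
    associatedtype Product
    func make() -> Product
    var description: String { get }
}

struct IntFactory: Factory {
    func make() -> Int { return 0 }
    var description: String { return "IntFactory" }
}

struct StringFactory: Factory {
    func make() -> String { return "Hello world" }
    var description: String { return "StringFactory" }
}

// No generic parameter is needed: the value is passed as a dynamic type.
func foo(_ a: any Factory) -> Any {
    // The unconstrained associated type is erased to its upper bound, Any.
    return a.make()
}

let products: [Any] = [foo(IntFactory()), foo(StringFactory())]
```

The caller gets an opaque `Any` back and has to downcast to recover the concrete product, which is the usability concern raised earlier in the thread.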

-Joe

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Drew Crawford) #5

In the alternate example

protocol Factory {
    typealias Product: ProductProtocol
    func make() -> Product
    var description : String { get }
}

would it generalize to ProductProtocol (i.e. not Any)?

That is potentially a good solution, especially if compatible with existing Swift 3 work.

···

On Dec 11, 2015, at 9:01 PM, Joe Groff <jgroff@apple.com> wrote:

with its Product type generalized to Any.


(Joe Groff) #6

Yeah, when generalizing a protocol type, we ought to be able to either generalize the associated types to their upper bounds, for use cases like yours, or constrain them to specific types, for the AnyGenerator<T> kind of case.
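A small sketch of both options as they can be spelled in later Swift releases (5.7 and newer, assumed here). The `label` requirement and the `Int`/`String` conformances to `ProductProtocol` are hypothetical, added only so there is something to call on the generalized result; `Factory<Product>` declares a primary associated type, which the original protocol did not have.

```swift
// Hypothetical bound for the associated type.
protocol ProductProtocol { var label: String { get } }
extension Int: ProductProtocol { var label: String { return "Int(\(self))" } }
extension String: ProductProtocol { var label: String { return "String(\(self))" } }

// Product is declared as a primary associated type so it can be constrained below.
protocol Factory<Product> {
    associatedtype Product: ProductProtocol
    func make() -> Product
}

struct IntFactory: Factory { func make() -> Int { return 0 } }
struct StringFactory: Factory { func make() -> String { return "Hello world" } }

// 1. Generalize to the upper bound: make() yields `any ProductProtocol`, not Any,
//    so members of ProductProtocol remain usable.
let factories: [any Factory] = [IntFactory(), StringFactory()]
let labels = factories.map { $0.make().label }

// 2. Constrain to a specific type: make() is statically known to return Int.
let intOnly: any Factory<Int> = IntFactory()
let n: Int = intOnly.make()
```

The first form matches the "upper bound" use case discussed above; the second corresponds to the AnyGenerator<T> kind of case, where the element type stays visible in the signature.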

-Joe

···

On Dec 11, 2015, at 7:48 PM, Drew Crawford <drew@sealedabstract.com> wrote:



(Matthew Johnson) #7

Yeah, when generalizing a protocol type, we ought to be able to either generalize the associated types to their upper bounds, for use cases like yours, or constrain them to specific types, for the AnyGenerator<T> kind of case.

I'm really glad to see that this is planned as part of the Swift 3 generics work.

Will you also be able to support constraints relating more than one associated type? Something like protocol<P where P.Associated == P.Other.Associated>? And a further generalization to protocol<P, Q where P.Associated == Q.Associated>?

Will you also support protocol types that are partially bound allowing use of members that do not reference the unbound associated types?

Matthew


(David Hart) #8

Type erasure to the rescue? This links back to our Twitter discussions, Joe.
David.

···

On 12 Dec 2015, at 04:53, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:
