Generalized abstract data types (GADTs)?

griotspeak · November 28, 2020, 10:02pm

"cases with where clauses" is a nice benefit of my proposed syntax as well. (I'm biased, I know)

dan-zheng · November 28, 2020, 10:06pm

The where clause syntax on enum cases is quite idiomatic. Fits neatly in Swift and is more concise and easier to understand.

Yes. GADTs in Swift can be presented as still enums, but with more refined cases.

Edit: this syntax raises some questions.

Do we need to require that all enum generic type parameters are bound in the where clauses of each case? I believe so - or we can default unspecific generic type parameters to something (Never).
- Special-case of the above: can we have enums where some cases have where clauses and others do not? That probably makes sense, we do not require where clauses to bind all generic type parameters.

dan-zheng · November 28, 2020, 10:34pm

@efremale asks: why GADTs? I think some folks may be familiar, but here goes.

tldr resources, not all directly related to GADTs:

What is dependent typing? (Stack Overflow's version of this post)
Lambda cube (framework for thinking about type systems - in terms of values, types, and functions)
A Little Taste of Dependent Types (my favorite introductory talk)
GADTs in Dotty (Scala 3) (fellow GADT language design)
Why GADTs matter for performance (interesting motivating use case)

Values can depend on values (λ→)

All languages have functions, taking and returning values. Thus, values can depend on values (via functions).

let x = 1
let y = x + x
// y depends on x

Cool: this is the simply typed lambda calculus; also JavaScript and Python. Let's start adding types!

Values can depend on types (λ2)

Can values depend on types? Yes, that is called generic functions: they take generic (type) parameters and there may exist values with those types.

func identity<T>(_ x: T) -> T { x }

Types can depend on types (Fω)

Can types depend on types? Yes, that is called generic types, or type constructors. These are type-level functions taking and returning types.

struct Pair<First, Second> {
  let first: First
  let second: Second
}
// "`Pair`" by itself is like a type-level function.
// It takes a type `First` and a type `Second` and returns `Pair<First, Second>`.

func pair<A, B>(_ a: A, _ b: B) { Pair<A, B>(a, b) }

Generics add a lot of expressivity to languages. Otherwise, we may need some special-case support for specific useful generic types (e.g. special array type T[] in C).

Some languages also have ability to abstract over type constructors of a certain arity, via protocols with associated types (PATs) or higher-kinded types.

Many statically typed languages live in this category, including Swift.

Types can depend on values (λP)

Typically, values exist only at runtime, while types exist at compile-time. How can types depend on values?

This is the world of dependent types, or types dependent on values. In terms of expressivity, dependent types are a big increase in specificity of types but not in computational power (hot take).

My favorite introduction to dependent types is A Little Taste of Dependent Types. (It's also one of my favorite programming languages talks.)

One step towards dependent types is GADTs. Consider:

// Regular enum.
enum Exp<T> {
  case bool(Bool)
  case int(Int)
}

Without GADTs, we cannot enforce that .bool(Bool) must have type Exp<Bool>. Neither can we know that values of type Exp<Bool> must come from the Exp.bool case.

// Refined enum (GADT). I'll pick a syntax:
enum Exp<T> {
  case bool(Bool) where T == Bool
  case int(Int) where T == Int
}

With GADTs: we are able to know the above two points, because the language contains more type information about specific enum cases. This adds type safety, so we don't need unsafe force-cast operations (as!). Type safety improves usability, similar to flow-sensitive typing in other languages!

griotspeak · November 28, 2020, 11:20pm

Yes to question 1 I think and definitely yes to 2.

griotspeak · November 29, 2020, 12:49am

Thinking about this further… the answer to the first question is probably "yes, at least to start" but it would be nice to relax it later to something like: "If you have an enum with multiple type parameters, you can leave one of the type parameters declared by the enum unbound." I think this could lead to a bad (difficult to explain) ruleset though… so the simpler "bind everything" sounds like a better starting point.

jrose · November 29, 2020, 5:14am

Some of this comes down to the difference between

enum Foo<T> {
  case a where T: BinaryInteger
  case b
}

and

enum Bar {
  case a<T> where T: BinaryInteger
  case b

The first case Foo, at a language level, is pretty easy to deal with: it just has some cases that are known-unreachable in switch statements. (The compile-time and run-time enum layout operations may also have to handle these cases that may not be valid.) Other than that, it’s basically the same as

enum Baz<T> {
  case a
  case b
}

that already works today. For that reason, I don’t see any need to place any specific requirements on where clauses.

Bar is trickier because it means switch statements generate types—I think you can basically build dependent types out of that feature. But if the case’s generic parameters are fully bound, then it’s “fine”—it’s the same as @Alejandro’s extensions with generic parameters, and simpler because you can’t actually name them in a switch in Swift’s current syntax. (Maybe that’s what you were talking about the whole time!)

griotspeak · November 29, 2020, 6:35am

I want to clarify, just to be super clear, that my suggestion is in between the “two poles” you gave here. T is still introduced at the type level when there are cases that “don’t need it”

dan-zheng · November 29, 2020, 11:56am

This is quite interesting.

Isn't Bar roughly the same as:

struct Bar {
  static func a<T>() -> Self where T: BinaryInteger { ... }
  static func b() -> { ... }
}

Which is almost valid today:

hi.swift:3:28: error: expected expression
  static func b() -> { ... }
                           ^
hi.swift:2:17: error: generic parameter 'T' is not used in function signature
  static func a<T>() -> Self where T: BinaryInteger { ... }
                ^

So we actually need to represent GADT enums as:

struct Bar {
  static func a<T>(_ type: T.Type) -> Self where T: BinaryInteger { ... }
                   ^~~~~~~~~~~~~~ appease ye old Swift type-checker.
                                  scala defaults to `Any` or `Never`
                                  (depending on variance) but
                                  i guess you think different ().
  static func b() -> { ... }
}

Okay. What more runtime metadata support do we need for GADTs? I feel like we don't need any more (fingers crossed, but also adding runtime support isn't bad - it's good because it makes the runtime more powerful. start learning here). We just need type-checker additions for representing enum refinement type (T == Int) information.

Agree? @griotspeak @jrose @Max_Desiatov

Future (Swift 7?)

Eventual IDE support (SourceKit) for GADTs and "flow-sensitive typing" features will be pretty cool, like visualizable and interactive "Clang static analyzer arrows" in the core language semantics. I look forward to that day!

_{Clang static analyzer, arrows show dataflow/symbolic execution path.

GADTs can have similar nice IDE integration for the new flow-sensitive typing capabilities.}

If we agree on a GADT syntax, I'm happy to file some bugs.swift.org issues for adding GADTs. Also happy to mentor interested contributors, hmu!

anandabits · November 29, 2020, 2:58pm

jrose:

Some of this comes down to the difference between
enum Foo<T> {
  case a where T: BinaryInteger
  case b
}
and
enum Bar {
  case a<T> where T: BinaryInteger
  case b
The first case Foo, at a language level, is pretty easy to deal with: it just has some cases that are known-unreachable in switch statements. (The compile-time and run-time enum layout operations may also have to handle these cases that may not be valid.) Other than that, it’s basically the same as
enum Baz<T> {
  case a
  case b
}
that already works today. For that reason, I don’t see any need to place any specific requirements on where clauses.

I’m confused as to why you wrote an example where a case introduces a new type parameter. GADTs don’t require that.

GADTs are more like your Foo example. As was mentioned already, the usual examples require the type parameters to be bound in all cases, thus supporting switching over the subset of cases relevant for a given value based on the concrete type bound to T. I don’t think flow sensitive typing is required for these use cases. The compiler just needs to determine which cases are possible based on what is known about T. When it can statically determine that a value cannot have certain cases, the user does not need a code path to handle them in a switch (and the compiler should generate a warning or error if they are handled).

That said, I think there are interesting cases that would be supported by allowing arbitrary constraints. Knowing you have a Foo<Int> doesn’t tell you anything about which cases need to be handled (it meets the constraints of case a but could also have a value of case b since that is unconstrained). However, when switching over your Foo, code in the case .a would know T: BinaryInteger. This is where flow sensitive typing comes in.

So I think there are two separable features here. It probably makes sense to carve out an initial proposal that focuses on statically determining which subset of cases may be contained in a value based on static information about the type parameters. This probably requires all cases to fully bind the type parameters. This would deliver a lot of value and power to users without the complexity of flow-sensitive typing. A follow-on proposal could relax the limitations on what kinds of constraints are supported on cases and introduce flow-sensitive typing.

dan-zheng · November 29, 2020, 3:00pm

Incremental, modular, progressive disclosure of complexity, Swifty. You have my axe

Yes, I believe this is accurate. GADTs are better than flow-sensitive typing.

jrose · November 29, 2020, 9:31pm

Ah, I didn't actually think you'd be able to use the BinaryInteger-ness of T (or a value thereof) within a switch over a Foo, but I see that we'd have problems otherwise:

enum Foo<T> {
  case a where T: BinaryInteger
  case b
}

func test<T>(_ foo: Foo<T>) -> Int? {
  switch foo {
  case a:
    // within this block we know that T: BinaryInteger,
    // but does the compiler allow us to access that?
    return Int(T())
  case b:
    return nil
  }
}

enum FooPayload<T> {
  case a(T) where T: BinaryInteger
  case b
}

func test<T>(_ foo: FooPayload<T>) -> Int {
  switch foo {
  case a(value):
    // it's pretty annoying if we *can't* do integer things
    return Int(value)
  case b:
    return nil
  }
}

enum FooRestricted<T> {
  case a(T) where T == Int
  case b
}

func test<T>(_ foo: FooRestricted<T>) -> Int {
  switch foo {
  case a(value):
    // we will be pelted with tomatos if you can't do this
    return value
  case b:
    return nil
  }
}

It would definitely make the implementation more complicated to support the Foo and FooPayload examples (and doubly so with library evolution, which allows you to add more cases to an existing enum), so I understand now why you'd want to restrict the feature to the FooRestricted case.

[Implementation thoughts: the only place you know T: BinaryInteger is where the case value is constructed, so to avoid a global lookup later you could store that conformance in the value, or in a separate "case name + argument types" hash table to go with the existing "nominal type + argument types" hash table.]

@griotspeak has taught me it's an also-useful feature in some cases. The most common example is a statically-typed expression tree:

enum Expr<Result: Equatable> {
  case integer(Int) where Result == Int
  case boolean(Bool) where Result == Bool
  case not(Expr<Bool>) where Result == Bool
  case plus(Expr<Int>, Expr<Int>) where Result == Int

  // this one
  case equal<Arg>(Expr<Arg>, Expr<Arg>) where Result == Bool

  func evaluate() -> ResultType { … }
}

let expr /*: Expr<Bool>*/ =
  Expr.equal(.plus(.integer(1), .integer(2)), .integer(3))

There's not really a way to express this structure (statically) without allowing generic parameters on cases. But there's also a lot more language and implementation work that comes from supporting this, and I don't actually know where it shows up besides expression trees.

Note that we can express the same operations with protocols if we get fully-bound protocol values (everything except exhaustive switches):

protocol Expr: Equatable {
  associatedtype Result
  func evaluate() -> Result
}
struct Integer: Expr {
  var value: Int
  func evaluate() -> Int { self.value }
  // …
}
struct Bool: Expr { … }
struct Not: Expr { … }
struct Plus: Expr { … }

struct Equal<Arg>: Expr {
  var left, right: Expr<Arg>
  func evaluate() -> Bool { … }
  // …
}

let expr: Any<Expr where Result == Bool> =
  Equal(Plus(Integer(1), Integer(2)), Integer(3))

And that does imply that the same is true of the GADTs actually being proposed. But because enums and protocols are not unified in Swift and are used for different purposes, I still think it makes sense to add GADT support to the language.

jrose · November 29, 2020, 11:09pm

>< As if I haven't said enough already, I'd like to point out that restricting where clauses to fully constraining parameters (rather than allowing protocol or superclass bounds) still doesn't mean that all of the parameters have to be fully constrained in a particular clause. There's no real reason why you can't have enum Expr<Result, Metadata> and allow the metadata part to be unconstrained even in a case where Result == Bool.

dan-zheng · November 29, 2020, 11:24pm

I think that you were right to say earlier that this is the gateway to full dependent types.

Haskell is taking that same route uuic (proposal, Reddit comment with context) towards Dependent Haskell. Terms can have forall in their type:

The Grand Plan for Unification of Class Methods and Associated Types [in Haskell]

At this point, we end up with the following set of features:

Visible dependent quantification in terms:
idv :: forall a -> a -> a
idv t x = x :: t

x = idv Double  42     -- x :: Double
n = idv Integer 42     -- n :: Integer
...

This is like "leaving some GADT enum generic parameters unconstrained" in Swift. Those generic parameters wouldn't default to some specific type (Any or Never), they become a forall (rank-2, <T, U, ...>(...) -> ...) type.

anandabits · November 29, 2020, 11:28pm

That’s a good point. Do you have any thoughts about a more relaxed rule might be without allowing arbitrary constraints? What is the sweet spot of straightforwardness, flexibility and ease of implementation?

jrose · November 30, 2020, 12:15am

I'm really bad at reading Haskell, but unconstrained type-level parameters are not at all dependent types. A dependent type would be something like this:

let a: SomeType = …
let b: SomeType = …

var derived = foo(a)
derived = foo(b) // invalid, may be a different type

That can happen with my equal expression example, though we have to contort the syntax to show it without naming the types:

let a: Expr<Bool> = Expr.equal(bool1, bool2)
let b: Expr<Bool> = Expr.equal(int1, int2)

if case .equal(let firstExprA, _) = a {
  if case .equal(let firstExprB, _) = b {
    var derived = firstExprA // dynamically Expr<Bool>, statically "Expr<ArgType of .equal of a>`
    derived = firstExprB // dynamically Expr<Int>, statically "Expr<ArgType of .equal of b>`
  }
}

But it can't happen with just where clauses, because even though the type is unconstrained in the case it's constrained at any use sites (like generic enums today):

enum Foo<Payload, Index: BinaryInteger> {
  case boolean(value: Payload, index: Index) where Payload == Bool
  case none(index: Index)
  case reallyNone
}

func test<T, Index>(_ a: Foo<T, Index>, _ b: Foo<T, Index>) {
  if case .boolean(let b, let i) = a:
    // We know b is a Bool because of the fully-constrained Payload.
    // We know i is an Index because of the original type of 'a'.
    if case .none(let j) = b {
      // We know j is an Index because of the original type of 'b'.
      // We know it's the *same* Index.
      // So this is legal.
      var derived = i
      derived = j
    }
  }
}

I think "only equality constraints are supported in where clauses on cases", with no other restrictions, makes a good place to start (now that you all have talked through it here!). It avoids a lot of "how should this work" questions, and doesn't rule out future support for other kinds of constraints.

Something you will face in the implementation is what to do with this:

enum Ick<Element> {
  case set(Set<Element>) where Element == Int
  case ick
}

print(MemoryLayout<Ick<Bool>>.size)

The simplest thing to do is to lay out every instantiation of the enum Ick as if all cases were possible and then not use some of them, and this is absolutely the approach you should take for any initial prototyping. This has the virtue of requiring no runtime work and thus no thoughts about deploying to older Swift runtimes, but

it doesn't scale to other kinds of constraints, like Element: Hashable, where you can't construct the payload type if the constraint isn't satisfied
it wastes space—an Ick<Bool> is always going to be ick, so it could be a zero-sized enum

For long-term support, we might want to change compile-time and run-time enum layout logic to ignore cases that aren't possible for a given instantiation. However, that does mean checking case requirements when instantiating an enum type, which feels super weird to me (note that I didn't even make an Ick value in the example above) and would require dynamic lookups for protocol constraints. So I'm a bit stymied.

(We might have to do something like say "cases with non-equality constraints get boxed like indirect cases" just to sidestep this layout requirement, whether or not we do any case trimming. But that shouldn't be necessary for equality-constraint-only where clauses, since that's going to run through a known list.)

anandabits · November 30, 2020, 1:19am

That makes sense. Would you include support for equality constraints based on associated types, such as this?

enum Foo<T: Identifiable, U> {
    case bar(T) where T.ID == U
}

jrose · November 30, 2020, 1:32am

Yeah, I would. On the implementation side it might be trickier to trim the case, but I don't think that's a sufficient reason not to do it.

anandabits · November 30, 2020, 1:45am

The reason I can think of for not supporting it as that supporting this kind of constraint without flow-sensitive typing could be surprising and frustrating. Consider the above example: when you pattern match on bar and extract the value it would be natural to expect its id to be known to be of type U. If we’re splitting flow-sensitive typing off to a future proposal we might want to reject constraints that subtly suggest it is available.

jrose · November 30, 2020, 2:05am

Oh, that's fair. Maybe the rule is "you can only use equality constraints with a generic parameter of the enum type, not an associated type thereof".

anandabits · November 30, 2020, 2:19am

Yeah, that’s what I was wondering. I think that would be a good place to start. It would be incredible to see this happen!