Generalized abstract data types (GADTs)?

Context

Generalized abstract data types (GADTs) are a useful and natural extension of enums (ADTs) in Swift today.

Wikipedia explains GADTs quite well. Haskell's wiki also provides a motivating use case for why you'd want to graduate from ADTs to GADTs.

Motivation

In a phrase, I'd summarize GADTs as "more-typed enum cases". Have you ever wished you could give a type to a specific enum case?

// A simple expression: a boolean or a number.
enum Expression<T> {
  case bool(Bool)
  case number(Int)
}

// Evaluates an expression to a value.
func evaluate<T>(_ expression: Expression<T>) -> T {
  switch expression {
  case .bool(let bool):
    // Uh oh, we can't write this without a forced-cast (as!).
    return bool as! T
  case .number(let int):
    // Ouch again. Dear type-checker, can't you do better?
    return int as! T
  }
}

Enter GADTs to the rescue. Pseudocode:

// Binary boolean functions.
enum BoolFunction {
  case and, or
  // Evaluates the function for the given operands.
  func callAsFunction(_ lhs: Bool, _ rhs: Bool) { ... }
}

// Binary number functions.
enum NumberFunction {
  case add, subtract, multiply, divide
  // Evaluates the function for the given operands.
  func callAsFunction(_ lhs: Int, _ rhs: Int) { ... }
}

// A simple language of booleans, numbers, and operations.
// GADT syntax is "the type ascription for each enum case".
indirect enum Exp<T> {
  case bool(Bool): Exp<Bool> // (1)
  case number(Int): Exp<Int> // (2)
  case boolFunction(BoolFunction, lhs: Exp<Bool>, rhs: Exp<Bool>): Exp<Bool> // (3)
  case numberFunction(NumberFunction, lhs: Exp<Int>, rhs: Exp<Int>): Exp<Int> // (4)
}

// Evaluates an expression to a value.
func evaluate<T>(_ expression: Exp<T>) -> T {
  switch expression {
  case .bool(let bool):
    // (1) Since `Exp` is a GADT, the type-checker gains extra knowledge
    // when pattern-matching on cases.
    // Here, it knows that `T == Bool` when matching the `.bool` case.
    return bool

  case .number(let int):
    // (2) Here, the type-checker knows that `T == Int`.
    return int

  case boolFunction(let function, lhs: let lhs, rhs: let rhs):
    // (3) Here, the type-checker knows that `T == Bool`.
    return function(lhs, rhs)

  case numberFunction(let function, lhs: let lhs, rhs: let rhs):
    // (4) Here, the type-checker knows that `T == Int`.
    return function(lhs, rhs)
  }
}

Question

What would be needed to implement GADTs in Swift?

Just curious and wanted to start a discussion. I'd like this thread to focus more on low-level implementation details and feasibility, rather than high-level language feature design (this is not a Swift Evolution pitch) - but certainly the latter informs the former.

Among possible type system extensions for Swift (like higher-kinded types and rank-2 types), GADTs seem to be relatively useful and easier to implement.


Kudos to @rxwei for his Gist teaching me about GADTs in Swift!

It is a neat embedding. Perhaps Swift compiler-supported GADTs can desugar to something like it that is already expressible in Swift today. Or perhaps the compiler can have more infrastructure (type-checking, SIL, runtime) to support GADTs.

10 Likes

Ok, had to read up @rxwel's gist to understand the usefulness of this. The usefulness really only becomes apparent when you have expressions consuming A and evaluating to B, like e.g. a comparison of Ints. The benefit is that you can actually statically infer the type to which te expression evaluates.

However, there are ways in Swift to achieve this already.

protocol BinaryExpression {
associatedtype Lhs
associatedtype Rhs
associatedtype Result
func evaluate(_ lhs: Lhs, _ rhs: Rhs) -> Result 
}

enum IntExpr<E : BinaryExpression> where E.Lhs == Int, E.Rhs == Int, E.Result == Int {
case value(Int)
case binary(E, Int, Int)
func evaluate() -> Int {...}
}

enum BasicIntFunction : BinaryExpression {
case add, subtract, multiply, divide
func evaluate(_ lhs: Int, _ rhs: Int) -> Int {...}
}

enum Comparison<T : Equatable & Comparable> : BinaryExpression {
case lessThan
case equal
case greaterThan
func evaluate(_ lhs: T, _ rhs: T) -> Bool{...}
}

enum MyDesiredSyntaxTree {

case value(Bool)
case comparison(Comparison<Int>, IntExpr<BasicIntFunction>, IntExpr<BasicIntFunction>)

func evaluate() -> Bool {
   switch self {
      case .value(let bool):
        return bool
      case .comparison(let c, let lhs, let rhs):
         return c.evalute(lhs.evaluate(), rhs.evaluate())
}
}

}

Sure: You won't have the shorthand Expr<Bool> or Expr<Int> evaluating to Bool or Int respectively, but that can be solved by naming conventions of your expression types. Also, this approach keeps the switch statements shorter.

3 Likes

Thanks for the observation!

Protocols and associated types (PATs) are fundamental general language features, which make them capable of modeling many things - including GADTs - with a high degree of explicitness.

First-class GADTs would be intentionally designed to be more scoped (tries to do less) but more ergonomic (tries to do it much better).

In two words each: PATs are a "general technology" (general and low-level, can model many things explicitly) and GADTs are a "specialized technology" (intentional, scoped, useful language feature).

@Max_Desiatov told me about some nice GADT design ideas (à la Swift Evolution pitch), so I'd like to open the floor to discuss some more design-y topics (feature details, use cases, interaction with other language features, naming and bikeshedding).

If the design discussion grows too big, we can create a new forum thread for just GADT design, versus this thread about GADT implementation. It makes sense to discuss design first, after all.

@Max_Desiatov and others: would you like to share your GADT ideas? :slightly_smiling_face:

2 Likes

Let me state upfront that I hope we'll see dependent types in Swift or in a language inspired by Swift someday. GADT is a welcome step in that direction allowing us to write more type-safe code.

Firstly, as Swift doesn't call its enums ADT, I wonder if this could be named differently for broader audience? What about "explicit generic enum types" or "generalized enum types"?

In terms of syntax, I initially thought if case bool(Bool) -> Exp<Bool> makes more sense than case bool(Bool): Exp<Bool>. But this would then imply that signatures for cases always look like functions, which is not true. case foo(Bar) introduces an equivalent of static func foo(_: Bar) -> Self to the scope of this enum, not static func foo(_: Bar), where the latter has empty tuple () as its return type. For this reason case bool(Bool): Exp<Bool> fits better in my opinion, as it reads as an explicit hint to the type checker, not as a return type.

We also need to come up with more use cases. One I found convincing is a translation of safe lists to Swift from the Haskell wikibook:

struct Empty {}
struct NonEmpty {}

indirect enum SafeList<Element, IsEmpty> {
  case nil: SafeList<Element, Empty>
  case cons(Element, SafeList<Element, IsEmpty>): SafeList<Element, NonEmpty>

  func append(_ element: Element) -> SafeList<Element, NonEmpty> {
    .cons(element, self)
  }
}

extension SafeList where IsEmpty == NonEmpty {
  var head: Element {
    guard case let .cons(result, _) = self else { fatalError() }
    return result
  }
}

let empty = SafeList<Int, Empty>.nil
let nonEmpty = empty.append(42)

nonEmpty.head // evaluates to 42
empty.head // compile-time error: `head` is not available on empty lists
5 Likes

I think case bool(Bool) -> Exp<Bool> (1) is more faithful to Haskell, but case bool(Bool): Exp<Bool> (2) is more idiomatic in Swift. Nice supporting arguments for (2) - I agree.

struct Empty and struct NonEmpty (and their users) are a manual encoding of type witnesses, similar to @rxwei's Gist. First-class GADTs would be more ergonomic!

The original example code in Haskell also declares Empty and NonEmpty explicitly:

-- we have to define these types
data Empty
data NonEmpty

Which seem to me to be equivalent to empty structs. :thinking:

Ah, I missed that the Haskell code also declares empty structs (Swift terminology) Empty and NonEmpty. That makes sense.

I would encourage us to challenge this assertion. Can we do better with first-class GADTs in Swift?

I believe it's equivalent to Never in Swift as Haskell doesn't differentiate between the struct and enum in declarations:

// struct
data Point2D = Point Int Int
data Point3D = Point Int Int Int

// enum
data SomePoint = Two Point2D | Three Point3D

Yes, data Empty is equivalent to Never. data Empty is an empty type, used as a marker type for the SafeList GADT.

Swift structs are a special case of enums (ADTs) with one case. Stored properties are associated values.


I'd still like to poke the bear: are such marker types necessary? If yes, can we prove it? If not, how can we do better?

Thank you for bringing this topic up. I’d really love to see GADTs come to Swift. I believe @Joe_Groff has thoughts on the topic.

Thanks for linking to this! I hadn’t seen it before and it’s much better than anything I had come up with when trying to encode GADT in Swift.

3 Likes

Some GADT implementation thoughts. What's needed are:

  1. Type-checking support (type-checking, type-inference, diagnostics).

This involves flow-sensitive typing: scope-sensitive unification type constraints in GADT pattern-matching declaration bodies. I wonder if Swift's constraint system supports this? Maybe it's as simple as cloning constraint graphs and adding new constraints (T == Int).

  1. Runtime support (for carrying around GADT type witness information).

Enum runtime metadata probably needs to be extended somehow. I wonder how, and what are the reflection API implications?

  1. Intermediate infrastructure.

Maybe we need intermediate glue to support the above two main points (e.g. new SIL instructions).

2 Likes

I'm not sure what to make of the idea to add something looking like a type annotation to enum cases. Would they only be supported for "generalized enum types"? In that case, this would be awkwardly specific syntax. Also, you'd have to know quite a bit about the enum to understand that the second generic type actually allows only two types, right?

Question on the intent: Should your example enable the compiler to derive:

extension SafeList where IsEmpty == NonEmpty {
   var head: Element {
      switch self {
         case let .cons(result, _):
            return result
      }//exhaustive
   }
}

?

That is correct, it should be able to derive that. But I'm not sure if this derivation machinery should be a part of this same pitch, I reckon this "derivation" logic would be quite complex. The whole GADT proposal itself is pretty big in terms of implementation details, so I hesitate to propose even more type checker rules that could be out of scope for the initial phase of this potential work.

I think "generalized enums" could be good. In Swift, enum (and all nominal types) is first a declaration and second a type.

But "generalized" sounds confusing, especially to students. I'm sure there's a better name out there.


Edit: actually, I think "generalized enums" is a great name, if presented correctly.

We typically don't think of GADTs as ADTs, but as something different.

However, in Swift, we can think of GADTs (generalized enums) as enums. GADTs are enums first, in the same way that protocols-with-associated-types (PATs) are very much protocols.

In The Swift Programming Language, there probably needn't be a "Generalized Enums" chapter, merely a "Generalized Enums" section in the "Enums" chapter.

When thinking about implementation, a quite "obvious" approach (although I'm not an expert on working on the compiler) would be to synthesize individual enums for each type variable that has this "type-level enum semantics". That is, in a first step, the compiler would derive from the enum cases that specialize a type variable that this type variable has "enum semantics". Your SafeList example would look to the compiler like this:

indirect enum SafeList<Element, @EnumSemantics IsEmpty> {
  case nil: SafeList<Element, Empty>
  case cons(Element, SafeList<Element, IsEmpty>): SafeList<Element, NonEmpty>
}

In a second step, the compiler can then create sub-enums for each "concrete" type you gave:

@Witness(SafeList<Element, Empty>)
enum SafeListEmpty<Element>{
   case nil
}

@Witness(SafeList<Element, NonEmpty>)
indirect enum SafeListNonEmpty<Element> {
   case cons(Element, SafeList<Element, @Indirect(IsEmpty)>
}

While I'm not 100% sure how to treat the @Indirect(IsEmpty) (in your example, it is a type variable of the enclosung enum, which would mean that in the NonEmpty case, standard type inference would give you an eager infinite list???), these synthesized enums should actually do the job. That is, extensions for SafeList where IsEmpty == (Non)Empty would just be an alias for the internal SafeList(Non)Empty. Unconstrained extensions would require you to write out all the cases in a switch block, but if you pair this with standard generic specialization, the compiler would just remove the case blocks for the types that aren't matched. It's like having an implicit #if #else.

Three things to think about:

  1. To integrate smoother with existing syntax, I'd propose where clauses instead of type annotation like statements in the enum cases.

  2. While the compiler should easily be able to derive which type parameters have "type level enum semantics", the reader may not be able to do that - especially if they don't have access to the original declaration. Requiring an explicit attribute for those type parameters may be helpful.

  3. Unlike Haskell, Swift has mutating funcs. In order to make the static type checks work correctly, we would have to either ban mutating funcs for those generalized enums or at the very least make sure that their actual type doesn't change.

Off topic: I recently had a situation in Kotlin where I checked a property of this for being non-null and in that scope calling a method updating that property [yeah ... please don't judge me for this :smiley:]. The method potentially could make the value null again, but Kotlin's flow-dependent type system told me it was non-null. Now, in my case, this was actually correct, but I doubt that the compiler is that smart - it would have to look into the actual implementation of the other method and check all code paths (the latter part actually being equivalent to the halting problem). I'm glad that Swift went another way and requires us to bind unwrapped optionals to a new variable/constant.

I agree with your user experience report of Kotlin (flow-dependent typing) vs Swift (useful implicit conversions to and from Optional, Any, etc).

Swift is nice! It's good to be explicit (typed), implicitly (inferred), and to minimize surprise.

I don’t. It’s not obvious that this is the only axis of generalization available, and reserving such a… general… name for a particular extension is short-sighted. (Swift enums could be considered a generalization of C enums, for example.)

Following the pattern of “protocols with associated types” and “enums with associated values”, how about “enums with case types”? (Although this could also apply to at least one other direction – treating cases as distinct subtypes, by analogy with Kotlin’s sealed-classes-as-sum-types pattern.)

1 Like

I'm a fan of the spirit! Removing "generalized" is big.


That said, the tricky thing about "case types" is that enum cases already have a type: Self. With GADTs, the case types will be Self refined.

It's the same trickiness that we (@rxwei) had in naming SE-0253: Callable values of user-defined nominal types. A lot of qualification (for the level of precision warranted for an Evolution proposal) is necessary because Swift already had "callable values": closures and metatypes.

Maybe "enums with case types (ECTs)" is fine as a colloquial name, like PATs. And the formal name would be something like "enums with refined case types", or just "GADTs".

It's nice to see other folks excited about GADTs. The syntax that I arrived at while pestering, ahem, discussing this feature with folks was

// use the familiar `where` keyword
indirect enum Parser<Value> {
	case emptyString(Value)
	case literal(String)
	case choice<A, B>(Parser<A>, Parser<B>) where Value == (A, B) // Or
								   // ‘… where Parser<A, B>’
 								   // ‘… where Self == Parser<A, B>’ 
 								   // Or… 
	case sequence(Parser<Value>, Parser<Value>)
}

This works well because it conveys that the case is conditional on the type without complicating other bits of the enum syntax. (and… after applying it to the OP example, I think it helps some with clarifying intent. Leaving T in the associated type spot means less up front parsing for the human, I think)

indirect enum Exp<T> {
  case bool(T) where T == Bool
  case number(T) where T == Int
  case boolFunction(T) where T ==  (BoolFunction, lhs: Exp<Bool>, rhs: Exp<Bool>)
  case numberFunction(T) where T == (NumberFunction, lhs: Exp<Int>, rhs: Exp<Int>)
}

8 Likes