Pitch: Making expression type checking of operator expressions fast

rudkx · November 16, 2018, 7:01am

This is a follow-up to a post from earlier in the year. I've had an opportunity to refine the original idea and implement it, and the implementation is in-tree but disabled by default. I inadvertently removed the branch of swift-evolution that the previous pitch was in, but that isn't of much concern since this is substantially different than the previous pitch.

Extending operator declarations with designated types

Proposal: SE-NNNN
Authors: Mark Lacey
Review Manager: TBD
Status: Implemented under staging options

During the review process, add the following fields as needed:

Implementation: This is currently in-tree, disabled by default, and enabled by staging options (-enable-operator-designated-types, -solver-enable-operator-designated-types) and only changes behavior in -swift-version 5 mode.
Decision Notes: Rationale, Additional Commentary
Bugs: SR-NNNN, SR-MMMM
Previous Revision: 1
Previous Proposal: SE-XXXX

Introduction

Swift's expression type checker is known to have cases where it is extremely slow in type checking expressions.

Some of these slow typechecking cases involve expressions that make use of operators (whereas others involve collections or chained single-expression closures).

This proposal addresses speeding up type checking for cases that are slow primarily due to the number of combinations of overloads of operators that are examined in the course of typechecking.

Swift-evolution thread: Discussion thread topic for that proposal

Motivation

Users sometimes write reasonable looking expressions that result in either the compiler taking a very long time to typecheck the expression or results in the diagnostic the compiler is unable to type-check this expression in reasonable time without any further hints as to how to fix the problem.

There are various reasons users hit this error:

Expressions involving many operators and (usually) several literals, that should successfully typecheck.
Expressions that should not successfully typecheck (but fail to emit a good diagnostic prior to the type checker giving up).
Expressions involving collection literals sometimes fail even if they have only a handful of elements (e.g. < 20).
Chains of single-expression closures, which in addition to exacerbating the issues with operators and literals, also create new problems due to a lack of explicit types for their parameters.

This proposal aims to address only the first of these issues.

Proposed solution

We introduce syntax to define designated types in operator declarations. These types will be used as the first place to look up operator overloads for a given operator. If the expression successfully typechecks with an overload found via those types, we stop looking at the remaining overloads for the operator.

The intent is to use this within Policy.swift in the standard library in order to provide a hint to the type checker as to which overload should be preferred for typechecking an operator. You can see the current usage of this syntax here.

Users can also add their own designated types to operators that they declare, but they will not have a way to add their own types as designated types to operators already defined in Policy.swift.

In effect this ties a particular operator to the semantics of one (or occassionally multiple) particular named entity(ies), usually a protocol, but sometimes a concrete type. For example, if you're implementing <<, making BinaryInteger the designated type for that operator says that the expectation is that you will be implementing this for a type conforming to BinaryInteger and behaving in the way BinaryInteger would expect it to.

The goal is to use this hint to replace the currently undocumented (and difficult to reason about) hacks in the type checker that are in place today and which are required in order to typecheck even relatively simple expressions.

Detailed design

Operator declaration syntax is extended to allow for one or more designated types to be specified. For infix operators, these types are listed after the (optional) precedence group. For prefix and postfix operators, these types come immediately after a : following the operator name. Some examples:

infix operator * : MultiplicationPrecedence, Numeric
infix operator / : MultiplicationPrecedence, BinaryInteger, FloatingPoint
prefix operator ~ : BinaryInteger

Specifying one or more types results in the expression type checker looking at declarations defined in those types before other declarations of the operator, and stopping as soon as it finds that one of the declarations in one of the types can be used to successfully type check the expression. If it fails to typecheck successfully with one of those declarations, it continues attempting all the other overloads for that operator.

In cases where there are multiple designated types for a given operator, the specific order in which the overloads in those types are examined is left unspecified, but should produce results that are compatible with the existing type checker's notion of the "best" solution if attempting all of the overloads could result in multiple solutions.

Experimental results

The implementation for this is already in-tree, and is enabled as such:

-Xfrontend -enable-operator-designated-types: enables parsing the extended operator declarations
-Xfrontend -solver-enable-operator-designated-types: enables having the constraint solver make use of the declarations, but only when -swift-version 5 is also specified

The current implementation has been tested across 25 test cases that were known to be slow and/or result in the type checker giving up after expending substantial effort attempting to type check the expression.

In addition to enabling this new designated type functionality, this testing disables the shrink portion of the expression type checker (via -Xfrontend -solver-disable-shrink). This portion of the expression type checker is known to help occassionally but often results in substantially increasing the expression type checking time since it frequently doesn't provide any benefit but does a lot of work. We also disable a number of other problematic hacks (via -Xfrontend -disable-constraint-solver-performance-hacks).

In 14 cases, the expressions in question now typecheck in less than 30ms (often as little as 5ms). In the 11 remaining cases, the expression is either one that would fail to typecheck if we were able to typecheck it quickly, or involves other things that are known to be problematic, like closures.

Some illustrative examples:

// Was "too complex", now 13ms:
let i: Int? = 1
let j: Int?
let k: Int? = 2

let _ = [i, j, k].reduce(0 as Int?) {
  $0 != nil && $1 != nil ? $0! + $1! : ($0 != nil ? $0! : ($1 != nil ? $1! : nil))
}


// Was "too complex", now 5ms:
func test(a: [String], b: String, c: String) -> [String] {
  return a.map { $0 + ": " + b + "(" + c + $0 + ")" }
}

// Was "too complex", now 25ms:
func test(strings: [String]) {
  for string in strings {
    let _ = string.split(omittingEmptySubsequences: false) { $0 == "C" || $0 == "D" || $0 == "H" || $0 == "S"}
  }
}

// Was "too complex", now 5ms:
func test(n: Int) -> Int {
  return n == 0 ? 0 : (0..<n).reduce(0) {
    ($0 > 0 && $1 % 2 == 0) ? ((($0 + $1) - ($0 + $1)) / ($1 - $0)) + (($0 + $1) / ($1 - $0)) : $0
  }
}

// Was "too complex", now 2ms:
let a = 1
_ = -a + -a - -a + -a - -a

A number (25+) of other test cases were previously compiling quickly were also tested using this functionality and with the existing hacks disabled, and those cases are as fast or faster than previously.

Source compatibility

Because of the fact that we fall back to behavior that is very similar to the current behavior of the type checker (exhaustively testing combinations of operators in the cases), this change is very compatible with the current behavior. One failure was seen in the source compatibility suite when running with this new behavior enabled by default in all language modes. The failure was in BlueSocket, and that same failure also manifests if you build the affected code with -swift-version 4. Specificially, for let mask = 1 << (2 as Int32), we now infer mask as Int rather than Int32.

Effect on ABI stability

None.

Effect on API resilience

None.

Alternatives considered

There have been various attempts at fixing this problem strictly through changing implementation details of the expression type checker. To date none of those attempts has been completely successful at making relatively straightforward expressions typecheck in a reasonable period of time in all cases.

There have been various iterations of this proposal that suggested alternatives such as treating operators like member lookup, with something similar to these designated types used in places where literals appear in an expression. This current iteration of the proposal seems simpler, easier to teach, and also creates a strong tie between the operator declaration and the semantic meaning of the operator by saying, e.g. that << is intended to be defined by types that implement BinaryInteger and that < is intended to be defined by types that implement Comparable.

jrose · November 16, 2018, 6:08pm

I'll note the case where this breaks source compatibility: if someone is shadowing or overloading an operator in one of the protocols.

extension Int {
  static func +(a: Int, b: Int) -> Int {
    return a &+ b
  }
}

print(Int.max + 1)

This example's contrived, but if the operator in the protocol is itself generic (like collection concatenation), you could actually end up with less efficient behavior. It might still be worth it, though.

Michael_Ilseman · November 16, 2018, 6:16pm

Big +1

This is 5-30ms in contrast to what? Seconds? timing out?

xwu · November 16, 2018, 7:03pm

Speeding up type checking would be huge. However I have two questions:

First:

I know that there was work to allow types to be designated, not just protocols. Therefore, will the Boolean operators have Bool as their designated types as part of this pitch? (I think, for consistency of behavior when others attempt to overload the standard library's operators, that it would make sense to do so even if it's not required for type checking performance.)

Second:

There was a subtle but (IMO) big problem that occurred when heterogeneous comparison operators were introduced for integers. Namely, comparing to a literal value preferred heterogeneous comparison to the default integer literal type Int over homogeneous comparison.

This was partially worked around by taking advantage of the "currently undocumented (and difficult to reason about) hacks in the type checker" to implement homogeneous comparison operators in concrete types to be preferred over the protocol's operators.

However, the same problem persisted in generic code, which can be illustrated as follows:

func f() -> Bool {
  return UInt.max == ~0
}

func g<T : FixedWidthInteger>(_: T.Type) -> Bool {
  return T.max == ~0
}

f()          // `true`
g(UInt.self) // `false`

So, will this pitch (a) make the partial workaround stop working and expose this problem more widely? (b) solve the problem altogether in some way?

rudkx · November 16, 2018, 11:53pm

Oh yes, it's definitely possible to come up with cases where we will now go through a specified protocol and at the end of type checking the expression rewrite the reference in terms of the witness for the type we deduce.

Most of the cases were ones where we gave up, usually after several seconds. There were a few that successfully type checked after ~7-8 seconds.

I looked at this a little at one point, including reintroducing a Boolean protocol for the purpose. In the end I decided to try to have minimal impact on the stdlib. Your point about consistency is interesting, though. WDYT, @Ben_Cohen?

With this change enabled, the generic function produces true, since it now type checks to calling through Equatable's homogeneous implementation.

benrimmington · November 17, 2018, 12:53am

Could the range operators have Comparable as their designated type?

postfix operator ... : Comparable

prefix operator ... : Comparable
prefix operator ..< : Comparable

infix operator  ... : RangeFormationPrecedence, Comparable
infix operator  ..< : RangeFormationPrecedence, Comparable

There are closed range operators on both Comparable and Strideable, but could the latter be removed now? It was originally for CountableClosedRange, when that was a separate type.

Ben_Cohen · November 17, 2018, 1:19am

I'd be really reluctant to introduce a "boolean protocol" just for this purpose. There are naturally families of numbers and collections, but there's only one Bool (pace interop hacks). Any chance, if it's really needed, we could support a concrete type instead of a protocol?

rudkx · November 17, 2018, 1:47am

Yes, it already works for concrete types. If you think that makes more sense now, I'll look at adding the designations for the Bool operators.

rudkx · November 17, 2018, 2:47am

Yes. Ideally these would be requirements on Comparable as well, rather than just have default implementations in an extension.

Is there a reason that's not already the case?

xwu · November 17, 2018, 6:57am

The more I think about it, the more I like the pitch because:

Not only does it serve a practical purpose of speeding up type checking, but because protocols aren’t just bags of syntax, it also makes possible the expression of designated semantics for each operator.

This is wonderful and desirable, considering that every style guide I’ve seen tells users that if they implement a standard library operator for a custom type, they should make sure that the semantics of the operator are respected. Now we can actually write out what those semantics actually are by associating each operator with a type, protocol, or protocols that define such semantics.

Seen in that light, it would be ideal if every standard library operator had at least one type or protocol designated.

It also raises a point regarding + and +=: currently, its designated types/protocols are AdditiveArithmetic, Array, String, but these represent not three but only two semantics: arithmetic addition and concatenation. It also seems somewhat confining to enumerate only the two standard library types that use + for concatenation, as it would be reasonable for others to create their own types that use the operator in that way. This would seem to justify a protocol Concatenatable: MutableCollection with the appropriate semantic definition of + and +=.

Dante-Broggi · November 17, 2018, 7:11am

We already have a protocol for 'Concatenatable', it is called RangeReplaceableCollection . Perhaps the Array, String specification should be replaced with that, instead.

anandabits · November 17, 2018, 2:13pm

Keep in mind that + is an immutable concatenation operator. If we’re going to go in this direction I think we would need a protocol to recognize types that are concatenable even if they are not mutable.

benrimmington · November 17, 2018, 3:06pm

Probably for the reason described in the motivation of SE-0232. Each implementation is so simple that there is no reasonable alternative implementation.

To answer my own question, the other countable range operators on Strideable were removed in apple/swift#13342. It seems like the one in "ClosedRange.swift" can also be removed (the validation test suite passes) so I've created a pull request.

mdiep · November 18, 2018, 6:42pm

Can you explain more about how this speeds up typechecking? Does it cut down on the number of disjunctions? Does it simplify choosing the correct overload?

This wouldn’t do anything for overloads outside the standard library, correct? E.g. I’m a big fan of GitHub - Rightpoint/Anchorage: A collection of operators and utilities that simplify iOS layout code.. This wouldn’t do anything to speed up usage of the overloads from that library, right?

David_Ungar2 · November 22, 2018, 4:45am

Several thoughts:

Reminds me of how we optimized the arithmetic operators in Smaltalk and Self VMs: by optimizing for the integer cases.
If I can specify one designated type, I might want to designate a list to be checked in order.
Is there any way to do this for everything where the types get inferred, not just operators? Could be more orthogonal, less surprising that way.
What if I'm using a framework that provides an operator with a designated type, but in my code I want/need a different type? For instance, suppose I want to write an infinite-precision integer library, reuse all the arithmetic operators, but I know that in my library the types will always be bignums? Maybe there's some sort of notion of context?

Overall, I like the direction, but worry about how the examples can potentially lead to an overly-specific design. And then there's Jordan's point, if I understand correctly, that it could change semantics.

Hmm.... maybe something like this belongs under the covers? As adaptive feedback to the compiler? When it compiles something, it could remember the types that were found, write the info out in a file, and the next time it compiles the same thing, read that file and use it to optimize type inference. Thoughts?

taylorswift · November 22, 2018, 5:26am

There should be a way to separately specify the left and right types for the infix operators. Other than that i’m way in favor of this.

xwu · November 22, 2018, 6:07am

This feature would allow custom operators to specify designated types as well; if adopted by a library, it would speed up type checking for its custom operators as well.

It sounds like there are some misunderstandings regarding the "designated type" feature. The designated type is the type or protocol that the compiler will first look to in order to find a suitable declaration of the operator function; it is a suitable declaration found on that type or protocol which in turn determines the inferred type(s) of the operand(s).

For the arithmetic operators the "designated type" is the appropriate protocol (for example, for multiplication it would be the 'Arithmetic' protocol). Any custom bignum type would conform to those protocols and become "designated."

An operator function is only ever declared once, and you choose whether to declare it on the LHS type or the RHS type if they are not the same. You would not designate both LHS and RHS types for a heterogeneous infix operator, nor would you want to, because that defeats the purpose of the feature. With this feature, you designate the type on which the operator function is declared. (The feature could also theoretically disambiguate a duplicated definition of the same heterogeneous operator where today it would be ambiguous if implemented twice, but if you designated LHS and RHS types separately, then it would "re-ambiguate" all over again!)

mdiep · November 23, 2018, 2:42am

The example I gave was for additional overloads of existing operators from the standard library: == and +, e.g.

rudkx · November 28, 2018, 11:52pm

Yes, that isn't what I initially had in mind when I started down the route of extending the operator declaration syntax, but it is a nice result of the current design.

Yes, I think it's unfortunate that we end up with two meanings for these. I think the best alternative might be to make the concatenation behavior have a different name, e.g. ++, but I think that's a separate discussion with obvious source compatibility and migration concerns.

The difference here is that it only attempts overloads defined in that designated type, and if one of them succeeds, it stops looking for additional solutions. The performance improvement comes from (generally) having only a single overload defined in those types. If you start adding a bunch of additional overloads, you'll see type checking time start to get exponential for that operator again.

We currently allow multiple types to be specified. The implementation attempts to order the overloads based on whether we know something about the concrete types, otherwise it attempts them in order.

I took at quick look at this library when you originally responded and IIRC the overloads were all top-level rather than defined in concrete types, and the concrete types involved did not conform to the associated designated types for those operators. To get the benefit, of this improvement they would need to update their library to move the overloads into concrete types, and have those types conform to the associated protocols.

Note that one benefit of this proposal is that despite that library adding a bunch of overloads, code that is primarily using operators with stdlib types but also happens to import that library will not suffer slower type checking for those stdlib types. That is a really significant benefit since today it is possible to import a library and go from having a program that compiles successfully to one that hits expressions that are considered too complex to type check just due to the import (without actually making use of any of the types or operators defined in the library).

xwu · July 28, 2019, 8:31pm

Out of curiosity, how has this idea played out in-tree? Did it seem to solve the type checking performance problem satisfactorily? Was it deemed too disruptive as ABI was being finalized? Is there a possible future for it?

(I hate to bump a topic so old, but the idea described here is a bit complicated to describe anew in a new thread.)