[Pre-Pitch] Runtime Type Guarantees

Jon_Hull · November 16, 2019, 5:02am

I have had an idea for a while now kicking around in the back of my head, and I could use some help fleshing it out into a workable solution for Swift. (Note: I am using Array below to simplify, but this would all really apply to Sequence/Collection, etc…). The syntax is mostly placeholder syntax, which is what I need help with.

Motivation

There is always a tradeoff between safety and performance when writing reusable code. Low level code either has to make assumptions about the values being passed to it (e.g. a value is non-zero, or an array is non-empty) or it has to check those assumptions. When these functions call one another, but could also be called from the outside, you often find them repeatedly performing the same check.

For example, if I have code which can only return a meaningful value if an array passed to it is non-empty (e.g. first), then it must either:

crash if the array is empty
return an optional
throw an error

Each of these options has an associated cost which is compounded when you call lower level functions which have to make the same choice. Each level has to deal with the optional, or the error, or has a chance of crashing.

It would be nice to be able to make that check only once, and then to tell the lower-level code that you guarantee the check has been made (and then to have the compiler enforce this guarantee). In short, it would be nice to use the type system for this, but we can’t, because the check can only be made at runtime.

Proposed solution

While the check itself can only be made at runtime, there is still a lot we can guarantee at compile time. If the check itself returns a type with the guarantee baked in, then we can just use it like we would any other type: As the type of a variable, parameter, return type, etc….

if let myNonEmptyArray = myArray as? NonEmpty Array<T> {
	///We know that the array is not empty and can use it accordingly
}

Once we have that, we can make functions that only take arrays which are known not to be empty:

func foo(_ array: NonEmpty Array<T>) {
	///Do something with an array guaranteed not to be empty
}

A non-empty array would still have all the methods, etc… available to any Array (i.e. it is still an Array), but we could also add conditional methods that take advantage of the additional knowledge. For example, first and last could return a non-optional result, avoiding repeated unwrapping:

extension NonEmpty Array {
	var first:Element {
		return self[0] ///We don't have to worry about a crash or optionals here
	}

    var isEmpty:Bool {
       return false ///We can optimize here
    }
}

I believe that these modifiers should be sub-typeable in the same way protocols are. For example, you might have a NonZero Int, and a Positive Int, where Positive also guarantees non-zero-ness.

A couple of other use-cases for this:

• Guaranteeing a floating point value is an actual number which isn’t NaN, inf, etc…
• Guaranteeing a non-zero or positive number
• Guaranteeing a string wasn’t created by the user (or has been appropriately escaped)
• Helping to enforce read/write access rules (especially across Actor boundaries)

Basically, anywhere where you would repeatedly need to make the same runtime check, or you are dealing with optionals which only happen in an edge case. I believe this could actually lead to both safer code (replacing areas where the check was only enforced by documentation) and faster low-level code (avoiding checks which have already been made by higher-level code).

What I Need Help With

The syntax.

How are these defined?

The main parts of the definition are:

The Name
The check (e.g. !array.isEmpty)
(Optionally) A Sub-modifier (e.g. NonZero and Positive)

What does it look like?

The check would then be available by using is and as?

if myVar is Modifier Type {
	//The check succeeded
} else {
	//The check failed
}

if let checkedVar = uncheckedVar as? Modifier Type {
	//We can use the modified type here
}

How to spell them?

I am using ModifierName TypeName, but if that doesn’t work for some reason we could do something else. We need to find a syntax which we know isn't ambiguous.

Any thoughts are appreciated...

typesanitizer · November 16, 2019, 6:42am

In other languages like Rust, Haskell and the like, this problem is "solved" by writing newtypes (single element structs in Swift terms). For example, the Rust standard library has types like std::num::NonZeroU8 and the Haskell standard library has types like Data.List.NonEmpty.

In Swift, this would look like (apologies if my Swift is not idiomatic):

func getUncheckedFirst<T>(_ a: Array<T>) -> T {
    // TODO: implement this
}

struct NonEmpty<T> {
  var inner: Array<T>
  init?(from array: Array<T>) {
    if array.count == 0 { return nil }
    self.inner = array
  }
  var first: T {
      get {
          return getUncheckedFirst(inner)
      }
      set {
          inner[0] = newValue
      }
  }
}

Given the presence of language features like protocols, extensions, properties and fallible inits, I think this can make for particularly nice APIs in Swift, while avoiding the performance costs.

Here are some of the trade-offs around using newtypes vs built-in compiler logic:

Implementation work: Using newtypes doesn't require any special logic built into the compiler. On the other hand, building these kinds of predicates into the compiler means more
- implementation work
- more room for bugs
- more room for edge cases which is not a bug but doesn't work even though one might expect it to (information not getting propagated correctly because this requires flow-sensitive analysis)
- more room for "why does X work but X + ∆X does not?" because the underlying theory the compiler can handle is not sufficiently well defined. From a user's perspective, the difference might be small but it might be complex from a general implementation perspective. This is the biggest design challenge with such a feature. What limitations are acceptable and where should the line be drawn about what goes into the compiler vs what goes into libraries?
Code reuse: The newtype approach has less code reuse because one needs to write forwarding methods or wrapper methods in a lot of cases. Depending on the meta programming facilities, the duplication here can be reduced to some extent.
Performance: both approaches should give the same performance. There is one edge case where "bake the logic into the compiler" has an edge: it can help the compiler reason about the number of inhabitants in a type and potentially optimize the layout better. Another edge case is that if the compiler can reason about things like "these two numbers are small, so addition cannot trap" and omit checks. ~~However, in both of these cases~~ [thanks @TellowKrinkle for the correction below] For the second case, adding an "unsafeUnreachable" primitive (not sure if it already exists?) would help get the same effect.
- As a last resort, if you literally need the best performance possible bar none, you probably should be using modern C++, Rust or Fortran instead of writing that core bit in Swift.

Personally, I think the tradeoff is that newtypes consist of "write-once easy-to-check code", vs compiler logic which I think will be (relatively speaking) "constant-maintenance hard-to-check code", so I would prefer that library authors defined such newtypes (or make a case for inclusion in a key library if they feel there is some value there) instead of having this as a language-level feature.

mayoff · November 16, 2019, 6:57am

You might be interested in Point-Free’s NonEmpty package:

NonEmpty is a lightweight type that can transform any collection type into a non-empty version.

Jon_Hull · November 16, 2019, 7:28am

Thanks.

My two concerns about NewType (please tell me if I am wrong):

It isn't a subtype of the type. A NonEmpty Array<T> would be able to be passed to anything which takes Array<T> (so you don't need a combinatorial explosion of functions)
You would have to re-write or forward all of the functions from the original type

If a NewType is still considered a member of it's originating type (and auto forwards messages in some way), then I would be fine with that approach...

TellowKrinkle · November 16, 2019, 7:43am

I don't think adding unsafeUnreachables will ever convince a compiler to change the layout of a type, Rust's uses compiler magic

DevAndArtist · November 16, 2019, 8:14am

It‘s an interesting pitch, however I personally lean towards a static solution rather than anything at runtime. There are not only a NonEmpty collection that I personally want, but also other things like a fixed range of a given numeric type etc.

I don‘t remember much about compile-tome constant expression evaluation, but it seems that it will solve a few of such shortcomings. Wasn‘t it part of the TensorFlow Swift Project?

——
Edit: Ah found it: Compile Time Constant Expressions for Swift · GitHub

Jon_Hull · November 16, 2019, 9:36am

I actually think we could get an extraordinary amount of compile-time behavior from just small modifications of this pitch.

For example, if we define inits in an extension, they could create these modified types right at compile time:

extension NonEmpty Array {
    init(repeating: Element, count: Positive Int) {
        ///Create non-empty array here
    }
}

All of the runtime stuff would still work, of course.

When compile-time constant expressions get added, it should also naturally allow a lot more to be done at compile time without having to otherwise change this design.

The one thing which would be nice, but I have no idea how to do without breaking things, would be to have these be create-able using literals. Maybe adding an extra failable initializer to the ExpressableBy... protocols with a default implementation that fails.

extension NonEmpty Array {
    init?(arrayLiteral value: ArrayLiteralType) {
        guard !value.isEmpty else {return nil}
        self = value
    }
}

Or now that I think about it, I guess you could get part-way there by allowing an override, and then having a precondition.

extension NonEmpty Array {
    init(arrayLiteral value: ArrayLiteralType) {
        precondition(!value.isEmpty)
        self = value
    }
}

Compiler magic could then default to NonEmpty Array if the variable needs to infer:

var a = [1,2,3] ///Equivalent to var a:NonEmpty Array<Int> = [1,2,3]

Jean-Daniel · November 16, 2019, 10:57am

You can't justify that safety check have such a great impact on performances that it requires a language change without some numbers at least.

I don't think this is the right way to do it. I would rather use composability (like protocol too) than inheritance. If I want a Zero or Positive constraint (for √ function for instance), I won't be able to express it without introducing yet another new constraint.

It would be preferable to have Positive meaning [0; +∞[, and using something like NonZero & Positive to express ]0; +∞[.

Moreover, it would be clearer that you can pass the value to functions taking NonZero input and to functions taking Positive input.

Jon_Hull · November 16, 2019, 11:25am

You convinced me on the composition front (vs inheritance). How would you spell that?

As for numbers, I guess the best way to count would be to look for preconditions in the standard library and 3rd party libraries, as well as documentation that says to only pass a certain subset of values. I run into it often enough that it just seemed obvious to me, but other's milage may vary.

As for performance, you would be eliminating a lot of branches by selecting the right overload at compile time... not to mention that you eliminate the possibility of runtime crashing/misuse by catching mistakes at compile time. That seems worthwhile to me.

Jean-Daniel · November 16, 2019, 11:49am

Yes, but until shown that these branches have a real impact on real world code, this is just premature optimisation.

Another question is how much of stdlib would benefit from this. If only a few methods are optimised for constrained types, the benefit will be totally negligible.

Don't forget that until recently, all methods call required many branches (objc_msgSend checked at least for nil object, smart pointer, and for selector not found in cache) and it was not enough an issue to have to heavily change the language.

michelf · November 16, 2019, 2:01pm

The way I see it, you're sort-of attaching some restrictions to a type at the point where you're defining the value without really changing its nature. Let's call this a decorator.

decorator NonEmpty: Collection {
   // things that must be true at any time
   invariant {
      assert(!isEmpty)
   }
}

The decorator's main role is maintaining this invariant. To that end, it has to forbid all the mutating methods that could break that invariant, such as array.removeAll(). This is tricky since there can be any number of methods that could mutate the value and break the invariant. To ensure correctness the compiler should insert a call to the invariant after mutations, whether it is caused by a mutating method or by passing it inout to some other function. Statically forbidding some methods (such as removeAll) then becomes a nice convenience for catching some issues at compile-time but is not necessary for correctness.

Some notes about enforcing the invariant:

can't really enforce it with reference types since other references could have no decorators and mutate the object skipping the invariant
a remove method can put the collection in a non-empty state if you're removing the last element, but should nonetheless be allowed so you can remove elements except the last one
a removeAll method can be statically guarantied to fail the invariant so should be statically disallowed when possible

Once you can ensure the invariant is properly maintained, you can think about adding some affordances. It might look something like this:

decorator NonEmpty: Collection {
   // things that must be true at any time
   invariant {
      assert(!isEmpty)
   }

   // Add decorator to the function's return type.
   // The implementation isn't changed.
   // `invariant` is run on the returned value when applying NonEmpty to it.
   refine func map<T>(_ transform: (Element) -> T) -> NonEmpty Array<T>

   // don't change `func filter`:  can't guaranty the result is NonEmpty

   // Add decorator to the function's return type.
   // `NotNullable`'s `invariant` is run upon returning the value.
   refine var first: NotNullable Element?
   // Ideally, the compiler could see `NotNullable Element?` as equivalent 
   // to `Element` and allow an automatic conversion. Or better, it could 
   // allow us to write a plain `Element` as the return type as a special case.
}

Note that instead of reducing the number of checks, all this does is evaluate the invariant automatically at every turn so any correctness issue is caught early. This is likely to run slower unless the optimizer is very smart about it and can turn decorators added to function signatures to its advantage. But it makes it easier to express correctness constraints and it enables some affordances.

anandabits · November 16, 2019, 2:28pm

What you’re looking for is known as refinement types. One implementation I’m aware of is Liquid Haskell. This is a very sophisticated type system feature. I have brought it up in the past and am interested in it myself. However, IMO there are a lot lower hanging fruit in the type / generics system that should be tackled before we consider features like this.

Tino · November 16, 2019, 5:33pm

I don‘t have numbers either (but I can‘t remember a single proposal that did its homework here ;-), but the idea as I read it is a generalisation of a well established concept:
Mutability, which adds important guarantees for structs without creating extra types.
So I wouldn’t focus that much on the runtime-aspect of the pitch, but rather consider the benefits during compilation.

However, my gut feeling is that Swift is already too conservative for such innovative features which did not prove their usefulness yet

typesanitizer · November 16, 2019, 5:33pm

Ah right! Layouts are determined without looking at function bodies so that makes perfect sense. Thanks for the correction, I've edited my comment to reflect that.

Tino · November 16, 2019, 5:45pm

Something similar popped up during discussion about the (already big) family of collection/sequence protocols (e.g having
Sequence & Multipass & Finite instead of extra protocols for all combinations)

typesanitizer · November 16, 2019, 5:54pm

That is correct. However, it doesn't necessarily mean that you need a combinatorial explosion of functions. You would write a core set of functions which address the primary needs and convert at the boundaries.

The problem with the "still considered a member of it's originating type" is that it (kinda') breaks down in the presence of mutation. Let's say NonemptyArray was a subtype of Array.

var x: NonemptyArray<String> = ["Hello World!"]
(x as Array<String>).remove(at: 0)

What should this code do at runtime?

Perform a copy on the upcast: This means that x will not be mutated. However, the compiler can't bake remove to be special -- this means that we potentially need to start performing defensive copies on up casts in a lot more places! Ouch.
Don't perform a copy on the upcast: This means that we violated the invariant of x being a non-empty array. Ouch.

Jean-Daniel · November 17, 2019, 12:00am

There where very few proposals where the main motivation was optimisation. And reading the discussion, it even looks like having enforced constraints may even slow down the code.

That said, it does not mean that proposition has no value, but it should really make clear that generating more performant code is not the main motivation, and instead push the fact that extending the type system like this may be a great tool for API designing.

TellowKrinkle · November 17, 2019, 3:08am

Personally, my expected result would be one of these two:

(x as Array) is an lvalue and you cannot call mutating methods on it
Preconditions are rechecked on mutations and this would cause an immediate precondition failure similar to what would happen if you overflowed an integer, which is much nicer than having the precondition failure happen at the point x is used.

Jon_Hull · November 17, 2019, 3:44am

I hadn't considered the mutability issue that several people have brought up. Thanks for raising it, it is helpful.

I'll have to think about it more, but off the top of my head, the best way to handle mutable functions would be to disallow them unless they are explicitly defined for the modified (refinement) type. We would provide a fixit explaining and suggesting to cast it to the unmodified type to perform the mutation.

This should keep full speed while getting the programmer to think about the underlying issue (i.e if you remove something, it may be empty... and thus no longer fits in the type).

Jon_Hull · November 17, 2019, 3:48am

It would not compile.

You would need to do:

var x: NonEmpty Array<String> = ["Hello World!"]
var x2: Array<String> = x
x2.remove(at: 0)