[Pre-Pitch] Runtime Type Guarantees

It's an interesting pitch, but I personally lean towards a static solution rather than anything at runtime. It's not only a NonEmpty collection that I want; there are also other things like a fixed range of a given numeric type, etc.

I don't remember much about compile-time constant expression evaluation, but it seems that it would solve a few of these shortcomings. Wasn't it part of the Swift for TensorFlow project?

——
Edit: Ah found it: https://gist.github.com/marcrasi/b0da27a45bb9925b3387b916e2797789

6 Likes

I actually think we could get an extraordinary amount of compile-time behavior from just small modifications of this pitch.

For example, if we define inits in an extension, they could create these modified types right at compile time:

extension NonEmpty Array {
    init(repeating: Element, count: Positive Int) {
        // Create the non-empty array here
    }
}

All of the runtime stuff would still work, of course.

When compile-time constant expressions get added, it should also naturally allow a lot more to be done at compile time without having to otherwise change this design.

The one thing which would be nice, but which I have no idea how to do without breaking things, would be to make these creatable using literals. Maybe add an extra failable initializer to the ExpressibleBy... protocols with a default implementation that fails.

extension NonEmpty Array {
    init?(arrayLiteral value: ArrayLiteralType) {
        guard !value.isEmpty else { return nil }
        self = value
    }
}

Or now that I think about it, I guess you could get part-way there by allowing an override, and then having a precondition.

extension NonEmpty Array {
    init(arrayLiteral value: ArrayLiteralType) {
        precondition(!value.isEmpty)
        self = value
    }
}

Compiler magic could then default to NonEmpty Array when the variable's type has to be inferred:

var a = [1, 2, 3] // Equivalent to: var a: NonEmpty Array<Int> = [1, 2, 3]

Without at least some numbers, you can't justify the claim that these safety checks have such a great impact on performance that they require a language change.

I don't think this is the right way to do it. I would rather use composability (as protocols do) than inheritance. If I want a Zero or Positive constraint (for a √ function, for instance), I won't be able to express it without introducing yet another new constraint.

It would be preferable to have Positive mean [0; +∞[, and to use something like NonZero & Positive to express ]0; +∞[.

Moreover, it would be clearer that you can pass the value to functions taking NonZero input and to functions taking Positive input.
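
For example, the composed constraints might be spelled something like this (purely hypothetical syntax, assuming the pitch's prefix-constraint notation composes with & the way protocols do; none of these names exist today):

// [0; +∞[ is enough for √:
func squareRoot(_ x: Positive Double) -> Positive Double

// ]0; +∞[, spelled by composition instead of a dedicated constraint:
func divide(_ x: Double, by y: (NonZero & Positive) Double) -> Double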

1 Like

You convinced me on the composition front (vs inheritance). How would you spell that?

As for numbers, I guess the best way to count would be to look for preconditions in the standard library and third-party libraries, as well as documentation that says to only pass a certain subset of values. I run into it often enough that it just seemed obvious to me, but others' mileage may vary.

As for performance, you would be eliminating a lot of branches by selecting the right overload at compile time... not to mention that you eliminate the possibility of runtime crashing/misuse by catching mistakes at compile time. That seems worthwhile to me.
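
For instance (hypothetical, reusing the pitch's spelling), an overload on the refined type can drop both the optional and the branch that the general overload needs:

// General overload: must account for emptiness at runtime.
func largest(in values: Array<Int>) -> Int? {
    values.max()
}

// Refined overload: emptiness was ruled out at creation time,
// so the result doesn't need to be optional at all.
func largest(in values: NonEmpty Array<Int>) -> Int {
    values.max()! // in principle even this check could be elided
}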

Yes, but until it's shown that these branches have a real impact on real-world code, this is just premature optimisation.

Another question is how much of stdlib would benefit from this. If only a few methods are optimised for constrained types, the benefit will be totally negligible.

Don't forget that until recently, every method call required many branches (objc_msgSend checked at least for a nil object, for tagged pointers, and for the selector not being found in the cache), and it was not enough of an issue to require heavy changes to the language.

The way I see it, you're sort-of attaching some restrictions to a type at the point where you're defining the value without really changing its nature. Let's call this a decorator.

decorator NonEmpty: Collection {
   // things that must be true at any time
   invariant {
      assert(!isEmpty)
   }
}

The decorator's main role is maintaining this invariant. To that end, it has to forbid all the mutating methods that could break that invariant, such as array.removeAll(). This is tricky since there can be any number of methods that could mutate the value and break the invariant. To ensure correctness, the compiler should insert a call to the invariant after mutations, whether the mutation is caused by a mutating method or by passing the value inout to some other function. Statically forbidding some methods (such as removeAll) then becomes a nice convenience for catching some issues at compile time, but is not necessary for correctness.
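
As a rough approximation of what the compiler would generate, here's the same idea hand-written in today's Swift (NonEmptyArray and checkInvariant are made-up names for illustration):

// Hand-rolled approximation of the decorator: the invariant is
// re-checked after every mutation that could break it.
struct NonEmptyArray<Element> {
    private(set) var elements: [Element]

    init?(_ elements: [Element]) {
        guard !elements.isEmpty else { return nil } // invariant at creation
        self.elements = elements
    }

    private func checkInvariant() {
        assert(!elements.isEmpty, "NonEmpty invariant violated")
    }

    // Allowed, but may fail the invariant, so it is re-checked afterwards.
    @discardableResult
    mutating func remove(at index: Int) -> Element {
        defer { checkInvariant() }
        return elements.remove(at: index)
    }

    // Statically safe: append can never empty the collection.
    mutating func append(_ element: Element) {
        elements.append(element)
    }
}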

Some notes about enforcing the invariant:

  • you can't really enforce it with reference types, since other references could have no decorators and mutate the object, skipping the invariant
  • a remove method can put the collection in an empty state if you're removing the last element, but should nonetheless be allowed so you can remove any element except the last one
  • a removeAll method can be statically guaranteed to fail the invariant, so it should be statically disallowed when possible

Once you can ensure the invariant is properly maintained, you can think about adding some affordances. It might look something like this:

decorator NonEmpty: Collection {
   // things that must be true at any time
   invariant {
      assert(!isEmpty)
   }

   // Add decorator to the function's return type.
   // The implementation isn't changed.
   // `invariant` is run on the returned value when applying NonEmpty to it.
   refine func map<T>(_ transform: (Element) -> T) -> NonEmpty Array<T>

   // don't change `func filter`: can't guarantee the result is NonEmpty

   // Add decorator to the function's return type.
   // `NotNullable`'s `invariant` is run upon returning the value.
   refine var first: NotNullable Element?
   // Ideally, the compiler could see `NotNullable Element?` as equivalent 
   // to `Element` and allow an automatic conversion. Or better, it could 
   // allow us to write a plain `Element` as the return type as a special case.
}

Note that instead of reducing the number of checks, all this does is evaluate the invariant automatically at every turn so any correctness issue is caught early. This is likely to run slower unless the optimizer is very smart about it and can turn decorators added to function signatures to its advantage. But it makes it easier to express correctness constraints and it enables some affordances.

1 Like

What you’re looking for is known as refinement types. One implementation I’m aware of is Liquid Haskell. This is a very sophisticated type-system feature. I have brought it up in the past and am interested in it myself. However, IMO there is a lot of lower-hanging fruit in the type/generics system that should be tackled before we consider features like this.

13 Likes

I don't have numbers either (but I can't remember a single proposal that did its homework here ;-), but the idea as I read it is a generalisation of a well-established concept: mutability, which adds important guarantees for structs without creating extra types. So I wouldn't focus that much on the runtime aspect of the pitch, but rather consider the benefits during compilation.

However, my gut feeling is that Swift is already too conservative for innovative features like this, which haven't yet proven their usefulness :cry:

1 Like

Ah right! Layouts are determined without looking at function bodies so that makes perfect sense. Thanks for the correction, I've edited my comment to reflect that.

Something similar popped up during the discussion about the (already big) family of collection/sequence protocols (e.g. having Sequence & Multipass & Finite instead of extra protocols for all combinations).
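
A sketch of that direction (Multipass and Finite are hypothetical names from that discussion, not standard library protocols):

protocol Multipass: Sequence {}
protocol Finite: Sequence {}

// One generic function per combination of requirements, spelled by
// composition, instead of a dedicated protocol for every combination:
func elementCount<S: Multipass & Finite>(of s: S) -> Int {
    s.reduce(0) { count, _ in count + 1 }
}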

1 Like

That is correct. However, it doesn't necessarily mean that you need a combinatorial explosion of functions. You would write a core set of functions which address the primary needs and convert at the boundaries.
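
A sketch of that pattern (the NonEmpty wrapper here is a stand-in for illustration, not a proposed API):

// Core implementation, written once against the plain type.
func mean(_ values: [Double]) -> Double? {
    values.isEmpty ? nil : values.reduce(0, +) / Double(values.count)
}

// Stand-in refined type.
struct NonEmpty<Element> {
    var elements: [Element]
}

// Thin shim at the boundary: convert, then reuse the core function.
func mean(_ values: NonEmpty<Double>) -> Double {
    mean(values.elements)! // safe: non-emptiness is guaranteed by the type
}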

The problem with "still considered a member of its originating type" is that it (kinda) breaks down in the presence of mutation. Let's say NonemptyArray were a subtype of Array.

var x: NonemptyArray<String> = ["Hello World!"]
(x as Array<String>).remove(at: 0)

What should this code do at runtime?

  1. Perform a copy on the upcast: This means that x will not be mutated. However, the compiler can't special-case remove -- this means that we potentially need to start performing defensive copies on upcasts in a lot more places! Ouch. (This option is sketched below.)
  2. Don't perform a copy on the upcast: This means that we violated the invariant of x being a non-empty array. Ouch.
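
To illustrate, under option 1 the code above would behave something like this (a sketch, not proposed semantics):

var x: NonemptyArray<String> = ["Hello World!"]
var copy = x as Array<String> // the upcast performs a defensive copy
copy.remove(at: 0)            // only the copy is mutated...
// ...so x is still non-empty and its invariant survives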
2 Likes

There were very few proposals where the main motivation was optimisation. And reading the discussion, it even looks like having enforced constraints may even slow down the code.

That said, it does not mean that the proposal has no value, but it should really make clear that generating more performant code is not the main motivation, and instead emphasise that extending the type system like this may be a great tool for API design.

1 Like

Personally, my expected result would be one of these two:

  • (x as Array) is not an lvalue, so you cannot call mutating methods on it
  • Preconditions are rechecked on mutations, and this would cause an immediate precondition failure, similar to what would happen if you overflowed an integer; this is much nicer than having the precondition failure happen at the point where x is used (sketched below)
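
The second option would behave something like this (hypothetical, reusing the earlier example):

var x: NonemptyArray<String> = ["Hello World!"]
(x as Array<String>).remove(at: 0) // re-checks the precondition and traps here,
                                   // at the mutation, not later where x is read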
4 Likes

I hadn't considered the mutability issue that several people have brought up. Thanks for raising it, it is helpful.

I'll have to think about it more, but off the top of my head, the best way to handle mutating methods would be to disallow them unless they are explicitly defined for the modified (refinement) type. We would provide a fix-it explaining this and suggesting a cast to the unmodified type to perform the mutation.

This should keep full speed while getting the programmer to think about the underlying issue (i.e. if you remove something, the result may be empty... and thus no longer fit the type).

It would not compile.

You would need to do:

var x: NonEmpty Array<String> = ["Hello World!"]
var x2: Array<String> = x
x2.remove(at: 0)

I don't follow. The OP wants both code reuse and performance. If you go with option 1, code reuse is affected. If you go with option 2, performance is affected (the proposal wants static checking instead of runtime checking).

I mean, the runtime check for emptiness is going to have to happen somewhere (assuming the array contents aren't known at compile time); this just moves it from the object-use side to the object-creation side, with the hope that you will create/modify objects less often than you use them (often true). Unless you know of a way to guarantee that an array stays non-empty on modification without checking that it's non-empty.

I think in general the goal of attaching preconditions to objects is to put more work on the side of object creation in return for less work when using the objects, which pays off when you use objects multiple times. But I don't think it's possible to reduce the cost of modification: someone has to check whether that integer is nonzero, whether it's the person making it or the person using it.

As for the lvalue option, it prevents possibly surprising people when they crash from making modifications that could break preconditions, forcing them to acknowledge it by recreating a new NonEmpty or whatever from the result and force-unwrapping it to indicate their intent. The minus is that it will seem horribly unnecessary for other mutating methods that will always succeed, like .append.
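
Roughly, the accounting looks like this (NonEmpty is a stand-in wrapper whose initializer performs the one check):

struct NonEmpty<Element> {
    var elements: [Element]
    init?(_ elements: [Element]) {
        guard !elements.isEmpty else { return nil } // the one check, at creation
        self.elements = elements
    }
    var first: Element { elements[0] } // no optional to unwrap on use
}

let xs = NonEmpty([1, 2, 3])! // pay for the check once here...
for _ in 0..<1_000 {
    _ = xs.first              // ...then use it many times without re-checking
}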

2 Likes

What I think can be done in this case is to allow mutating methods which have been defined specifically for that modified (refinement) type. So you would be able to use .append right on the variable, but you would not be allowed to use .remove without first putting the value in a variable of the appropriate type.

I believe this can all still be done at compile time, with a few explicit runtime checks where necessary as before.
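
In use, that rule might look like this (hypothetical syntax, following the pitch's spelling):

var x: NonEmpty Array<String> = ["Hello"]
x.append("World")        // OK: append is defined for NonEmpty Array
// x.remove(at: 0)       // error: remove isn't available on NonEmpty Array

var y: Array<String> = x // widen to the unmodified type first...
y.remove(at: 0)          // ...then mutate freely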

The problem is that with the runtime check on modification scheme, every modification is penalized! This is bad, because we probably want to modify things quite a bit for performance (consider pushing elements into a preallocated array). The newtype approach moves the checking to creation time instead of modification time, which avoids this drawback. [Maybe I am misunderstanding your point, but from my POV, the subtyping approach seems to require checking on modification, whereas the newtype approach does checking on creation.]

> As for the lvalue option, it prevents possibly surprising people when they crash from making modifications that could break preconditions, forcing them to acknowledge it by recreating a new NonEmpty or whatever from the result and force-unwrapping it to indicate their intent. The minus is that it will seem horribly unnecessary for other mutating methods that will always succeed, like .append.

I don't think the lvalue option is a feasible one. I wrote (x as Array<String>) out explicitly to make it clear that there was an upcast. If NonEmptyArray<String> were a subtype of Array<String>, the user would expect the compiler to do that upcast implicitly, no question about it.

var x: NonEmptyArray<String> = ["Hello"]
x.remove(at: 0)

If that fails to compile, how can one claim that NonEmptyArray<String> is indeed a subtype of Array<String>? It is not, because you aren't able to use a method of the supertype.

1 Like

Think of it more like a variable which has been declared with let. You can't call mutating methods on it, but it is still of the same type and can be assigned to variables declared with var and mutated there. (Note: in our case, we could also define some mutating functions which are explicitly allowed, if we want.)
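
The same pattern with let in today's Swift:

let a = ["Hello"]
// a.remove(at: 0)  // error: cannot use mutating member on immutable value 'a'
var b = a           // still an Array; copy it into a var...
b.remove(at: 0)     // ...and mutate there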

Make sense?