[Proposal idea] Support for pure functions

Andrew_Trick · January 14, 2016, 4:27am

I'm moving the discussion from "Proposal proposal: @pure keyword" here and jumping in this time.

1. Pure means that the function always return the same value given the same arguments, and has no side effects (it purely computes a result from its inputs), making it possible for the compiler, or a cache, to reuse the result from a previous call. This is the simplest definition, and it provide strong guaranties. Let's call that "strongly pure".

2. Pure just mean that the function has no access to global variables. It can only mutate "outside" things through inout parameters or pointers (including class references) passed to it by the caller. So in the general case you can't reuse the results. But you can use this function to mutate the state inside a strongly pure one. A strongly pure function in this case is one with no inout or pointer in the signature.

There are two independently useful proporties, which I choose to call @pure and @noglobals. (Those names haven't been bikeshedded). However, I don't think @pure needs to be as strong as was suggested. A pure function can mutate its copy of any values that it takes and return those to the caller. It cannot access shared state--that is, state the may be reachable via other values. Put simply, it can't access heap objects unless we can guarantee those objects are uniquely referenced or immutable.

These properties can and should be inferred by the compiler. However, I feel that at least @noglobals should be the default, if not @pure, so that published APIs permit future optimization. If they aren't accepted as default, then we should aggressively add annotations to stdlib entry points. The compiler can then infer purity in higher level user code.

Note that I'm looking at this mainly from an optimizer-hint perspective, and haven't seriously considered supporting computed lets.

Is a method impure if it uses self? I suppose it could be. I guess self is
an inout parameter. I presume an inout parameter is a known expected.
side-effect.

I think normal inout parameters, including self are still @pure (closure captures are not pure). inout arguments do not have pointer-like semantics and are guaranteed not to alias for the purpose of side-effect visibility. The inout arguments simply need to be treated as results. Naturally, a @pure call cannot be eliminated if the inout argument is used after the call. Similarly, two mutating pure method calls on the same struct are not redundant. These facts are obvious to the optimizer, so I don't see the issue.

The value of the inout argument can mutate locally within a @pure function. If that argument is a struct, then those struct members can mutate.

The argument values, whether inout or not, can be copied without losing purity. Incrementing reference counts should not be considered a side effect from the perspective of function purity. It's true that we have an isUniquelyReferenced() API, but there is a requirement on the user of this API to ensure identical program behavior regardless of the return value (it is purely an optimization for CoW implementations).

All that's good in theory, but there is a major detail that needs addressing. Memory allocation breaks the guaranties of a "strongly pure" function. For instance, if you return a newly allocated object, or a struct with a pointer to an object, the object is going to be a different one every time. That object is mutable memory, and returning a different chunk of mutable memory is quite different in semantics from returning the same one. If you want strong purity guaranties when returning objects (and thus be able to optimize by reusing the result from a previous call), there needs to be a way to return objects that have a language-enforced guaranty of immutability... same for structs that can have a pointer to an object or other memory. Without immutability guaranties, `@pure` has almost no optimization value.

I think broader support for immutability requires a separate proposal. I will say that we would like to optimize around calls that allocate objects but are otherwise @pure (they can't be marked @pure because they are not idempotent). We can probably get a lot of mileage out of marking them @noglobals. If we need to do better, we could introduce a @pure_with_alloc sort of attribute later.

I'm concerned that with this definition we won't be able to mark many APIs
as pure, even though they actually are pure. The issue is that this
definition disallows local mutation. Consider CollectionType.sort() -- the
way it is implemented is that it first copies the collection into an array,
and then sorts that array in-place. sortInPlace() isn't pure, but because
the mutation happens on local state, the whole operation is pure.

Array<T>.sortInPlace() should definitely be considered @pure. Of course, that won't be inferred from the rules above because it's implementation uses UnsafeMutableBufferPointer, but we will annotate it as such. Chris L. already mentioned that we need this escape hatch.

There is a more general problem with CoW data types. Simply reading an array element is superficially impure because it accesses array storage. We would work around this using the same "force pure" mechanism because we know the storage is uniquely referenced or immutable.

This leads me to a much bigger issue though. Consuming a generic value is not necessarilly @pure because of deinit(). In fact it isn't even @noglobals. So this single argument nop function, which doesn't even have an inout, is impure and may access globals:

func foo<T>(t: T) {}

To fix this we need to:

- Assert that freeing a non-ObjC Swift object has no side effects other than it's deinit().

- Introduce a type modifier that prevents any subclasses or protocol conformance from introducing an impure deinit().

Then we could write generic code that guarantees purity:

func @pure foo<T : PureType>(t: T) {}

We still have a problem because our core library routines need to work on all types and we aren't going to accept an explosion of purity in the API. It's particularly problematic if we want @pure or @noglobal to be default.

This seems like it will require polymorphic effects/attributes that can be derived from the generic type parameters. Generally, we want to be able to say that foo<T>(t: T) is pure whenever T is a "pure type" as I explained above. This idea could be extended to support declaring a function purity conditionally depending on the purity (or @noglobals property) of any closure arguments.

So, to conclude, I'm strongly in favor of defining @pure and @noglobals semantics, getting the defaults right, and annotating APIs from the outset. However, I don't have a compelling design proposal short of introducing a significant language feature. Suggested alternatives or proposals for the necessary language support are welcome.

AndyT

···

On Sat, Jan 9, 2016 at 11:01 PM, Michel Fortin via swift-evolution < swift-evolution at swift.org> wrote:
On 9 Jan 2016, at 8:04 PM, Andrew Bennett <cacoyi at gmail.com> wrote:
On Sat, Jan 9, 2016 at 11:01 PM, Michel Fortin via swift-evolution <swift-evolution at swift.org> wrote:
On Sat, Jan 9, 2016 at 9:29 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:

Andrew · January 14, 2016, 8:25am

Very interesting points Andy, I think I agree with pretty much everything
you've said.

Good point on `func foo<T>(t: T) {}` being impure without a type
constraint, deinit seems to have some interesting consequences.

I've been throwing around the idea of having this notation:

func myFunc<T,U>(value: T, apply: T -> U) @pure(apply) -> U

to indicate that myFunc is pure if 'apply' is also pure. Similar to
@rethrows.

If it also requires that A has a pure deinit, or further requirements, then
perhaps you could use "@pure(value,apply)". In your case you would annotate
it like this:

func foo<T>(t: T) @pure(t) {}

Then the compiler can make foo pure only if T's `deinit` is marked @pure.
The issue there is, as you say, if T can be subclassed then those need
@pure deinit. This could be solved if the annotation is always inherited,
likewise for overrides.

···

On Thu, Jan 14, 2016 at 3:27 PM, Andrew Trick <atrick@apple.com> wrote:

I'm moving the discussion from "Proposal proposal: @pure keyword" here and
jumping in this time.

On Sat, Jan 9, 2016 at 11:01 PM, Michel Fortin via swift-evolution < > swift-evolution at swift.org> wrote:
> 1. Pure means that the function always return the same value given the
same arguments, and has no side effects (it purely computes a result from
its inputs), making it possible for the compiler, or a cache, to reuse the
result from a previous call. This is the simplest definition, and it
provide strong guaranties. Let's call that "strongly pure".
>
> 2. Pure just mean that the function has no access to global variables.
It can only mutate "outside" things through inout parameters or pointers
(including class references) passed to it by the caller. So in the general
case you can't reuse the results. But you can use this function to mutate
the state inside a strongly pure one. A strongly pure function in this case
is one with no inout or pointer in the signature.

There are two independently useful proporties, which I choose to call
@pure and @noglobals. (Those names haven't been bikeshedded). However, I
don't think @pure needs to be as strong as was suggested. A pure function
can mutate its copy of any values that it takes and return those to the
caller. It cannot access shared state--that is, state the may be reachable
via other values. Put simply, it can't access heap objects unless we can
guarantee those objects are uniquely referenced or immutable.

These properties can and should be inferred by the compiler. However, I
feel that at least @noglobals should be the default, if not @pure, so that
published APIs permit future optimization. If they aren't accepted as
default, then we should aggressively add annotations to stdlib entry
points. The compiler can then infer purity in higher level user code.

Note that I'm looking at this mainly from an optimizer-hint perspective,
and haven't seriously considered supporting computed lets.

On 9 Jan 2016, at 8:04 PM, Andrew Bennett <cacoyi at gmail.com> wrote:
> Is a method impure if it uses self? I suppose it could be. I guess self
is
> an inout parameter. I presume an inout parameter is a known expected.
> side-effect.

I think normal inout parameters, including self are still @pure (closure
captures are not pure). inout arguments do not have pointer-like semantics
and are guaranteed not to alias for the purpose of side-effect visibility.
The inout arguments simply need to be treated as results. Naturally, a
@pure call cannot be eliminated if the inout argument is used after the
call. Similarly, two mutating pure method calls on the same struct are not
redundant. These facts are obvious to the optimizer, so I don't see the
issue.

The value of the inout argument can mutate locally within a @pure
function. If that argument is a struct, then those struct members can
mutate.

The argument values, whether inout or not, can be copied without losing
purity. Incrementing reference counts should not be considered a side
effect from the perspective of function purity. It's true that we have an
isUniquelyReferenced() API, but there is a requirement on the user of this
API to ensure identical program behavior regardless of the return value (it
is purely an optimization for CoW implementations).

On Sat, Jan 9, 2016 at 11:01 PM, Michel Fortin via swift-evolution > <swift-evolution at swift.org> wrote:
> All that's good in theory, but there is a major detail that needs
addressing. Memory allocation breaks the guaranties of a "strongly pure"
function. For instance, if you return a newly allocated object, or a struct
with a pointer to an object, the object is going to be a different one
every time. That object is mutable memory, and returning a different chunk
of mutable memory is quite different in semantics from returning the same
one. If you want strong purity guaranties when returning objects (and thus
be able to optimize by reusing the result from a previous call), there
needs to be a way to return objects that have a language-enforced guaranty
of immutability... same for structs that can have a pointer to an object or
other memory. Without immutability guaranties, `@pure` has almost no
optimization value.

I think broader support for immutability requires a separate proposal. I
will say that we would like to optimize around calls that allocate objects
but are otherwise @pure (they can't be marked @pure because they are not
idempotent). We can probably get a lot of mileage out of marking them
@noglobals. If we need to do better, we could introduce a @pure_with_alloc
sort of attribute later.

On Sat, Jan 9, 2016 at 9:29 PM, Dmitri Gribenko <gribozavr at gmail.com> > wrote:
> I'm concerned that with this definition we won't be able to mark many
APIs
> as pure, even though they actually are pure. The issue is that this
> definition disallows local mutation. Consider CollectionType.sort() --
the
> way it is implemented is that it first copies the collection into an
array,
> and then sorts that array in-place. sortInPlace() isn't pure, but
because
> the mutation happens on local state, the whole operation is pure.

Array<T>.sortInPlace() should definitely be considered @pure. Of course,
that won't be inferred from the rules above because it's implementation
uses UnsafeMutableBufferPointer, but we will annotate it as such. Chris L.
already mentioned that we need this escape hatch.

There is a more general problem with CoW data types. Simply reading an
array element is superficially impure because it accesses array storage. We
would work around this using the same "force pure" mechanism because we
know the storage is uniquely referenced or immutable.

This leads me to a much bigger issue though. Consuming a generic value is
not necessarilly @pure because of deinit(). In fact it isn't even
@noglobals. So this single argument nop function, which doesn't even have
an inout, is impure and may access globals:

func foo<T>(t: T) {}

To fix this we need to:

- Assert that freeing a non-ObjC Swift object has no side effects other
than it's deinit().

- Introduce a type modifier that prevents any subclasses or protocol
conformance from introducing an impure deinit().

Then we could write generic code that guarantees purity:

func @pure foo<T : PureType>(t: T) {}

We still have a problem because our core library routines need to work on
all types and we aren't going to accept an explosion of purity in the API.
It's particularly problematic if we want @pure or @noglobal to be default.

This seems like it will require polymorphic effects/attributes that can be
derived from the generic type parameters. Generally, we want to be able to
say that foo<T>(t: T) is pure whenever T is a "pure type" as I explained
above. This idea could be extended to support declaring a function purity
conditionally depending on the purity (or @noglobals property) of any
closure arguments.

So, to conclude, I'm strongly in favor of defining @pure and @noglobals
semantics, getting the defaults right, and annotating APIs from the outset.
However, I don't have a compelling design proposal short of introducing a
significant language feature. Suggested alternatives or proposals for the
necessary language support are welcome.

AndyT

michelf · January 14, 2016, 12:22pm

In D, for template functions, the pure attribute is deduced from the function's body. It'd be nice to have deduced purity so you don't have to figure out those conditional purity requirements for every function.

Deduced purity should probably not extend automatically to other resilience domains though, because it provides a guaranty that might not apply to future versions of that function. So you still need a way to annotate.

···

Le 14 janv. 2016 à 3:25, Andrew Bennett <cacoyi@gmail.com> a écrit :

Very interesting points Andy, I think I agree with pretty much everything you've said.

Good point on `func foo<T>(t: T) {}` being impure without a type constraint, deinit seems to have some interesting consequences.

I've been throwing around the idea of having this notation:
func myFunc<T,U>(value: T, apply: T -> U) @pure(apply) -> U
to indicate that myFunc is pure if 'apply' is also pure. Similar to @rethrows.

If it also requires that A has a pure deinit, or further requirements, then perhaps you could use "@pure(value,apply)". In your case you would annotate it like this:
func foo<T>(t: T) @pure(t) {}

Then the compiler can make foo pure only if T's `deinit` is marked @pure. The issue there is, as you say, if T can be subclassed then those need @pure deinit. This could be solved if the annotation is always inherited, likewise for overrides.

--
Michel Fortin
https://michelf.ca

Andrew_Trick · January 14, 2016, 5:53pm

That works for me as long as we can also annotate types to indicate that subclasses and conformances can’t introduce an impure deinit (or introduce class members with impure deinit).

Andy

···

On Jan 14, 2016, at 12:25 AM, Andrew Bennett <cacoyi@gmail.com> wrote:

Good point on `func foo<T>(t: T) {}` being impure without a type constraint, deinit seems to have some interesting consequences.

I've been throwing around the idea of having this notation:
func myFunc<T,U>(value: T, apply: T -> U) @pure(apply) -> U
to indicate that myFunc is pure if 'apply' is also pure. Similar to @rethrows.

If it also requires that A has a pure deinit, or further requirements, then perhaps you could use "@pure(value,apply)". In your case you would annotate it like this:
func foo<T>(t: T) @pure(t) {}

Then the compiler can make foo pure only if T's `deinit` is marked @pure. The issue there is, as you say, if T can be subclassed then those need @pure deinit. This could be solved if the annotation is always inherited, likewise for overrides.

Andrew_Trick · January 14, 2016, 5:56pm

That’s right. This proposal is really about
- introducing API support so we can optimize across resilience
- allowing annotation of low level routines, like sortInPlace, that the compiler can’t figure out

Andy

···

On Jan 14, 2016, at 4:22 AM, Michel Fortin <michel.fortin@michelf.ca> wrote:

In D, for template functions, the pure attribute is deduced from the function's body. It'd be nice to have deduced purity so you don't have to figure out those conditional purity requirements for every function.

Deduced purity should probably not extend automatically to other resilience domains though, because it provides a guaranty that might not apply to future versions of that function. So you still need a way to annotate.

Joe_Groff · January 14, 2016, 6:53pm

Deducing purity from function bodies is compile-time expensive and brittle, and impossible across resilience boundaries. It wouldn't work well with our non-instantiation-based generics model.

-Joe

···

On Jan 14, 2016, at 4:22 AM, Michel Fortin via swift-evolution <swift-evolution@swift.org> wrote:

Le 14 janv. 2016 à 3:25, Andrew Bennett <cacoyi@gmail.com> a écrit :

Very interesting points Andy, I think I agree with pretty much everything you've said.

Good point on `func foo<T>(t: T) {}` being impure without a type constraint, deinit seems to have some interesting consequences.

I've been throwing around the idea of having this notation:
func myFunc<T,U>(value: T, apply: T -> U) @pure(apply) -> U
to indicate that myFunc is pure if 'apply' is also pure. Similar to @rethrows.

If it also requires that A has a pure deinit, or further requirements, then perhaps you could use "@pure(value,apply)". In your case you would annotate it like this:
func foo<T>(t: T) @pure(t) {}

Then the compiler can make foo pure only if T's `deinit` is marked @pure. The issue there is, as you say, if T can be subclassed then those need @pure deinit. This could be solved if the annotation is always inherited, likewise for overrides.

In D, for template functions, the pure attribute is deduced from the function's body. It'd be nice to have deduced purity so you don't have to figure out those conditional purity requirements for every function.

Deduced purity should probably not extend automatically to other resilience domains though, because it provides a guaranty that might not apply to future versions of that function. So you still need a way to annotate.

David_Sweeris · January 15, 2016, 12:07am

Pure functions, as a concept, get +3245 from because, among other reasons, I *think* it’d help with auto-parallelization, which is of course the greatest computational problem of our time (besides a lack of native CPU support for arbitrary precision math). If not, then merely +1… I simply do not believe it wouldn’t eventually be useful for something (manual parallelization comes to mind), and the sooner we get language support for deep concepts like this, the less code will have to be recompiled if it breaks something (and somehow gets accepted anyway).

At the moment, the only other strong opinion I have on the matter is that only going with “auto-detect” is a bad idea, because then there’s no clear way to tell why you can’t pass your “I-thought-it-was-pure-but-it-really-isn’t” function to something that requires a function that actually is pure. I don’t mind the compiler auto-detecting purity (or anything, really) in addition to manual annotations, if it can, but I’ll always want a way to override it.

- Dave Sweeris

···

On Jan 14, 2016, at 04:22, Michel Fortin via swift-evolution <swift-evolution@swift.org> wrote:

In D, for template functions, the pure attribute is deduced from the function's body. It'd be nice to have deduced purity so you don't have to figure out those conditional purity requirements for every function.

Deduced purity should probably not extend automatically to other resilience domains though, because it provides a guaranty that might not apply to future versions of that function. So you still need a way to annotate.