Reducing array abstraction

This question is somewhere between swift-dev and swift-users, not sure where best to post this.

I’m working on a project that wants very low-abstraction-penalty array operations, particularly with varargs. Given the current language limitations (no fixed-size arrays), the best I’ve been able to come up with is something like this, where “lowlevelAPI” contains the heavy lifting (and is assumed to be opaque), and the “safeAPI” wrappers exist merely to provide a convenient, safe interface for clients:

array_abstraction.swift (412 Bytes)
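The attachment itself is not reproduced in the archive; a minimal sketch of the pattern being described might look like the following (the names and bodies are hypothetical, only the shape matters):

```swift
// Hypothetical reconstruction of the pattern described above; the real
// attachment is not shown in the archive.

// "lowlevelAPI": opaque heavy lifting over a raw buffer.
func lowlevelAPI(_ base: UnsafePointer<Int>, _ count: Int) -> Int {
    var sum = 0
    for i in 0..<count {
        sum += base[i]
    }
    return sum
}

// "safeAPI": a thin, safe varargs wrapper for clients.
func safeAPI(_ values: Int...) -> Int {
    return values.withUnsafeBufferPointer { buf in
        guard let base = buf.baseAddress else { return 0 }
        return lowlevelAPI(base, buf.count)
    }
}

print(safeAPI(1, 2, 3, 4)) // 10
```

The question below is whether the temporary varargs `Array` in `safeAPI` can be lowered to a plain stack allocation.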

This question is somewhere between swift-dev and swift-users, not sure where best to post this.

I’m working on a project that wants to get very low-abstraction penalty array operations, particularly with varargs.

It’s awesome to hear you’re still using Swift ;-)

Given the current language limitations (no fixed-size arrays), the best I’ve been able to come up with is something like this, where “lowlevelAPI” contains the heavy lifting (and is assumed to be opaque), and the “safeAPI” wrappers exist merely to provide a convenient, safe interface for clients:

<array_abstraction.swift>

Given whole-module optimization of the program, we should, in principle, be able to optimize this down to the equivalent of an LLVM array alloca in the clients, a few stores to it, and then passing pointers to the LLVM alloca into lowlevelAPI. However, Swift is not doing this, not even with:

$ swiftc array_abstraction.swift -emit-sil -o - -O

In this trivial case (with constant initializers) it does do the “array outlining” optimization, but this is nowhere near the abstraction level it could be.

Is there a better way to achieve this, and if not, is there any planned work (perhaps aligned with the ABI stability efforts) to improve varargs performance enough to avoid heap allocation? Any individual call to a varargs function has a known argument count, so it is a perfect candidate for stack allocation.

In fact I filed an internal radar about this a while ago, but the discussion didn’t go anywhere because we had bigger fish to fry. I think the main difficulty is that we would need some kind of “copy-on-escape” sequence type. Iterating over the vararg inside the callee should be efficient, but if I then take the value and store it somewhere else, it would need to copy the buffer from the stack to the heap.

Also there were source stability implications because people assume the vararg argument inside the function body is literally a Swift.Array.

Perhaps the work being done on ownership can help here though. In the ownership manifesto, John describes a ‘shared’ attribute on function parameters. What if I declare my vararg as shared:

func takesArgs(_ xs: shared Int...)

This means the caller is passing along a borrowed value, not the value itself. With today’s vararg implementation this would just pass the Array<Int> at +0, but perhaps this can be the extension point where we introduce the stack-allocated varargs that you propose, and say that a ‘shared’ vararg argument actually has a more efficient representation. This allows it to be introduced additively without breaking ABI or source compatibility.

What do you think?

Slava

···

On Oct 6, 2017, at 11:06 PM, Chris Lattner via swift-dev <swift-dev@swift.org> wrote:

-Chris

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

This question is somewhere between swift-dev and swift-users, not sure where best to post this.

I’m working on a project that wants very low-abstraction-penalty array operations, particularly with varargs. Given the current language limitations (no fixed-size arrays), the best I’ve been able to come up with is something like this, where “lowlevelAPI” contains the heavy lifting (and is assumed to be opaque), and the “safeAPI” wrappers exist merely to provide a convenient, safe interface for clients:

<array_abstraction.swift>

Given whole-module optimization of the program, we should, in principle, be able to optimize this down to the equivalent of an LLVM array alloca in the clients, a few stores to it, and then passing pointers to the LLVM alloca into lowlevelAPI. However, Swift is not doing this, not even with:

$ swiftc array_abstraction.swift -emit-sil -o - -O

In this trivial case (with constant initializers) it does do the “array outlining” optimization,

What do you mean by the array outlining optimization specifically?

We definitely already have a heap->stack for classes in the guise of the StackPromotion optimization is that what you are talking about with the "array outlining" optimization? (outlining to me is referring to specifically code outlining). IIRC Erik (+CC) do special work to make it work for fixed size array. I would ask why that optimization is not kicking in for varargs. Perhaps, we could add a special recognition that the given array will not escape through a varargs? Or provide some way of saying, trust me this doesn't escape.

In terms of what Slava was talking about with copy-on-escape: that can be implemented (assuming a sane runtime ; p) by initializing any COW data structure with a reference count of 2. Then you are guaranteed that any write to or escape of the COW data structure will cause a copy. Once we have guaranteed parameters, this comes for free, since any guaranteed parameter must be copied before a mutable use.

I do think that you will run into issues with escaping around C APIs though.
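As an illustration of the copy-on-write discipline being appealed to here, a user-level sketch (this demonstrates the general CoW pattern via `isKnownUniquelyReferenced`, not the runtime's internal count-of-2 trick itself):

```swift
// User-level sketch of copy-on-write: a buffer that is not uniquely
// referenced is copied before the first mutation. The runtime trick
// described above achieves a similar effect by making the buffer look
// non-unique from the start, so any write forces a copy.
final class Storage {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

struct CoWBox {
    private var storage: Storage
    init(_ elements: [Int]) { storage = Storage(elements) }
    var elements: [Int] { return storage.elements }

    mutating func append(_ x: Int) {
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.elements) // shared: copy before writing
        }
        storage.elements.append(x)
    }
}

var a = CoWBox([1, 2])
var b = a          // storage is now shared between 'a' and 'b'
b.append(3)        // forces a copy; 'a' is unaffected
print(a.elements)  // [1, 2]
print(b.elements)  // [1, 2, 3]
```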

···

On Oct 6, 2017, at 11:06 PM, Chris Lattner via swift-dev <swift-dev@swift.org> wrote:

but this is nowhere near the abstraction level it could be.

Is there a better way to achieve this, and if not, is there any planned work (perhaps aligned with the ABI stability efforts) to improve varargs performance enough to avoid heap allocation? Any individual call to a varargs function has a known argument count, so it is a perfect candidate for stack allocation.

-Chris


This question is somewhere between swift-dev and swift-users, not sure where best to post this.

I’m working on a project that wants to get very low-abstraction penalty array operations, particularly with varargs.

It’s awesome to hear you’re still using Swift ;-)

Indeed, it is nice to get to use it as well as hack on it, instead of largely just hacking on it :-). OTOH, this makes me as impatient as everyone else to get conditional conformances, recursive conformances, varargs splat, and so many other things. :) :)

Is there a better way to achieve this, and if not, is there any planned work (perhaps aligned with the ABI stability efforts) to improve varargs performance enough to avoid heap allocation? Any individual call to a varargs function has a known argument count, so it is a perfect candidate for stack allocation.

In fact I filed an internal radar about this a while ago, but the discussion didn’t go anywhere because we had bigger fish to fry.

I think I confused things in my email. The long term points that you mention are important (and I’ll comment on them below), but independently of the representation of varargs, there is a generally useful optimization that this code should be able to benefit from: when converting a statically knowable fixed sized array to an UnsafeBufferPointer (or UnsafePointer) for a call, we can get rid of the heap traffic in the client. This is generally useful e.g. when calling into random C APIs.

I’m curious to know if anyone has thought about that or plans to work on it, because it would help for a lot of kinds of code, and is related to the existing “outlining” optimization. Coming back to varargs though:

I think the main difficulty is that we would need some kind of “copy-on-escape” sequence type. Iterating over the vararg inside the callee should be efficient, but if I then take the value and store it somewhere else, it would need to copy the buffer from the stack to the heap.
Also there were source stability implications because people assume the vararg argument inside the function body is literally a Swift.Array.
Perhaps the work being done on ownership can help here though. In the ownership manifesto, John describes a ‘shared’ attribute on function parameters.

Right. If we ignore source compatibility for the moment, it seems clear that the best callee side representation for varargs is a borrowing array slice (like llvm::ArrayRef or UnsafeBufferPointer). This provides the essential abstraction needed by varargs without forcing an onerous representation on the caller, and without implying any reference counting operations. This would allow passing an Array, an ArraySlice, a StaticArray, or anything else with contiguous elements.
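Today one can already approximate that callee-side shape with `UnsafeBufferPointer`, which accepts any contiguous storage without copying; a sketch:

```swift
// Sketch: an UnsafeBufferPointer-based entry point plays the role of
// llvm::ArrayRef here. It borrows any contiguous elements without
// copying or reference counting.
func total(_ xs: UnsafeBufferPointer<Int>) -> Int {
    return xs.reduce(0, +)
}

let array = [1, 2, 3, 4]
let slice = array[1...2]

// Both Array and ArraySlice can lend out their contiguous storage.
let a = array.withUnsafeBufferPointer { total($0) }
let b = slice.withUnsafeBufferPointer { total($0) }
print(a) // 10
print(b) // 5
```

The difference is that a language-level borrowed slice would carry safety guarantees (no escaping) that `UnsafeBufferPointer` leaves to the programmer.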

What if I declare my vararg as shared:

func takesArgs(_ xs: shared Int...)

This seems like a great way to address the source compatibility issue: a normal varargs argument could be passed as an Array<T> for compatibility with existing source, and a “borrowed Int…” could be passed more efficiently, allowing folks to opt into that.

This means the caller is passing along a borrowed value, not the value itself. With today’s vararg implementation this would just pass the Array<Int> at +0, but perhaps this can be the extension point where we introduce the stack-allocated varargs that you propose, and say that a ‘shared’ vararg argument actually has a more efficient representation. This allows it to be introduced additively without breaking ABI or source compatibility.

What do you think?

This would be totally awesome. How much do you think that people depend on varargs being an escapable array inside the body of functions? It would be much cleaner to just make them nonescaping slices all the time.

-Chris

···

On Oct 6, 2017, at 11:12 PM, Slava Pestov <spestov@apple.com> wrote:

On Oct 6, 2017, at 11:06 PM, Chris Lattner via swift-dev <swift-dev@swift.org> wrote:

This question is somewhere between swift-dev and swift-users, not sure where best to post this.

I’m working on a project that wants very low-abstraction-penalty array operations, particularly with varargs. Given the current language limitations (no fixed-size arrays), the best I’ve been able to come up with is something like this, where “lowlevelAPI” contains the heavy lifting (and is assumed to be opaque), and the “safeAPI” wrappers exist merely to provide a convenient, safe interface for clients:

<array_abstraction.swift>

Given whole-module optimization of the program, we should, in principle, be able to optimize this down to the equivalent of an LLVM array alloca in the clients, a few stores to it, and then passing pointers to the LLVM alloca into lowlevelAPI. However, Swift is not doing this, not even with:

$ swiftc array_abstraction.swift -emit-sil -o - -O

In this trivial case (with constant initializers) it does do the “array outlining” optimization,

What do you mean by the array outlining optimization specifically?

We definitely already have a heap->stack for classes in the guise of the StackPromotion optimization is that what you are talking about with the "array outlining" optimization? (outlining to me is referring to specifically code outlining). IIRC Erik (+CC) do special work to make it work for fixed size array. I would ask why that optimization is not kicking in for varargs. Perhaps, we could add a special recognition that the given array will not escape through a varargs? Or provide some way of saying, trust me this doesn't escape.

*We definitely already have a heap->stack for classes in the guise of the StackPromotion optimization. Is that the sort of thing that you are talking about when you say the "array outlining" optimization? (outlining to me is referring to specifically code outlining). IIRC Erik (+CC) did special work to make heap->stack work for fixed size Swift.Array that do not escape. But my memory may be incorrect. If my memory is correct, I would ask why that optimization is not kicking in for varargs. I imagine it is an issue around the optimization assuming that the array escapes. Perhaps, we could add a special recognition that the given array will not escape through a varargs? Or provide some way of saying, trust me this doesn't escape? I imagine that a special case would work here since IIRC we can know in the optimizer via the AST that we are dealing with a varargs parameter. Things sort of follow from there.

(Sorry for the half-written reply. Needed coffee.)

···

On Oct 8, 2017, at 11:57 AM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

On Oct 6, 2017, at 11:06 PM, Chris Lattner via swift-dev <swift-dev@swift.org> wrote:

In terms of what Slava was talking about with copy-on-escape: that can be implemented (assuming a sane runtime ; p) by initializing any COW data structure with a reference count of 2. Then you are guaranteed that any write to or escape of the COW data structure will cause a copy. Once we have guaranteed parameters, this comes for free, since any guaranteed parameter must be copied before a mutable use.

I do think that you will run into issues with escaping around C APIs though.

but this is nowhere near the abstraction level it could be.

Is there a better way to achieve this, and if not, is there any planned work (perhaps aligned with the ABI stability efforts) to improve varargs performance enough to avoid heap allocation? Any individual call to a varargs function has a known argument count, so it is a perfect candidate for stack allocation.

-Chris


This question is somewhere between swift-dev and swift-users, not sure where best to post this.

I’m working on a project that wants very low-abstraction-penalty array operations, particularly with varargs. Given the current language limitations (no fixed-size arrays), the best I’ve been able to come up with is something like this, where “lowlevelAPI” contains the heavy lifting (and is assumed to be opaque), and the “safeAPI” wrappers exist merely to provide a convenient, safe interface for clients:

<array_abstraction.swift>

Given whole-module optimization of the program, we should, in principle, be able to optimize this down to the equivalent of an LLVM array alloca in the clients, a few stores to it, and then passing pointers to the LLVM alloca into lowlevelAPI. However, Swift is not doing this, not even with:

$ swiftc array_abstraction.swift -emit-sil -o - -O

In this trivial case (with constant initializers) it does do the “array outlining” optimization,

What do you mean by the array outlining optimization specifically?

I mean the "Outline global variable” thing that IPO/GlobalOpt.cpp does. I have no idea why it is called that.

We definitely already have a heap->stack for classes in the guise of the StackPromotion optimization is that what you are talking about with the "array outlining" optimization? (outlining to me is referring to specifically code outlining). IIRC Erik (+CC) do special work to make it work for fixed size array. I would ask why that optimization is not kicking in for varargs. Perhaps, we could add a special recognition that the given array will not escape through a varargs? Or provide some way of saying, trust me this doesn't escape.

This is actually pretty straightforward to fit into the existing optimizer, and I’m working on a prototype patch for it in my spare time this weekend. The problem is that it will only work on non-Apple systems, because of the way that BridgeSupport models the ObjC interop goop. IMO, SIL and the stdlib have the wrong division of labor here. I’ll follow up with something more concrete when I have a chance.

In terms of what Slava was talking about with copy-on-escape: that can be implemented (assuming a sane runtime ; p) by initializing any COW data structure with a reference count of 2. Then you are guaranteed that any write to or escape of the COW data structure will cause a copy. Once we have guaranteed parameters, this comes for free, since any guaranteed parameter must be copied before a mutable use.

I do think that you will run into issues with escaping around C APIs though.

C APIs cannot escape an array unless they have access to the refcount. Beyond the SIL representational problem with bridging, what I’m getting at seems like a straightforward extension to ArrayElementValuePropagation.cpp.

-Chris

···

On Oct 8, 2017, at 11:57 AM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

On Oct 6, 2017, at 11:06 PM, Chris Lattner via swift-dev <swift-dev@swift.org> wrote:

This question is somewhere between swift-dev and swift-users, not sure where best to post this.

I’m working on a project that wants very low-abstraction-penalty array operations, particularly with varargs. Given the current language limitations (no fixed-size arrays), the best I’ve been able to come up with is something like this, where “lowlevelAPI” contains the heavy lifting (and is assumed to be opaque), and the “safeAPI” wrappers exist merely to provide a convenient, safe interface for clients:

<array_abstraction.swift>

Given whole-module optimization of the program, we should, in principle, be able to optimize this down to the equivalent of an LLVM array alloca in the clients, a few stores to it, and then passing pointers to the LLVM alloca into lowlevelAPI. However, Swift is not doing this, not even with:

$ swiftc array_abstraction.swift -emit-sil -o - -O

In this trivial case (with constant initializers) it does do the “array outlining” optimization,

What do you mean by the array outlining optimization specifically?

I mean the "Outline global variable” thing that IPO/GlobalOpt.cpp does. I have no idea why it is called that.

We definitely already have a heap->stack for classes in the guise of the StackPromotion optimization is that what you are talking about with the "array outlining" optimization? (outlining to me is referring to specifically code outlining). IIRC Erik (+CC) do special work to make it work for fixed size array. I would ask why that optimization is not kicking in for varargs. Perhaps, we could add a special recognition that the given array will not escape through a varargs? Or provide some way of saying, trust me this doesn't escape.

We already do heap->stack promotion for array buffers. This is done in the StackPromotion pass. It uses interprocedural escape analysis to check if it can be done. So if the callee is known it should work with the varargs array. BTW it also works in your example, in testAPI():

%1 = alloc_ref [stack] [tail_elems $Int * %0 : $Builtin.Word] $_ContiguousArrayStorage<Int>

But it’s not zero cost, because we initialize the metadata pointer + refcount in a runtime function. Also we do ref-count operations on the array in the callee (this will improve as soon as we have a ref count representation for immortal objects).

···

On Oct 8, 2017, at 3:14 PM, Chris Lattner <clattner@nondot.org> wrote:

On Oct 8, 2017, at 11:57 AM, Michael Gottesman via swift-dev <swift-dev@swift.org> wrote:

On Oct 6, 2017, at 11:06 PM, Chris Lattner via swift-dev <swift-dev@swift.org> wrote:

This is actually pretty straightforward to fit into the existing optimizer, and I’m working on a prototype patch for it in my spare time this weekend. The problem is that it will only work on non-Apple systems, because of the way that BridgeSupport models the ObjC interop goop. IMO, SIL and the stdlib have the wrong division of labor here. I’ll follow up with something more concrete when I have a chance.

In terms of what Slava was talking about with copy-on-escape: that can be implemented (assuming a sane runtime ; p) by initializing any COW data structure with a reference count of 2. Then you are guaranteed that any write to or escape of the COW data structure will cause a copy. Once we have guaranteed parameters, this comes for free, since any guaranteed parameter must be copied before a mutable use.

I do think that you will run into issues with escaping around C APIs though.

C APIs cannot escape an array unless they have access to the refcount. Beyond the SIL representational problem with bridging, what I’m getting at seems like a straightforward extension to ArrayElementValuePropagation.cpp.

-Chris

I talked about this with Joe today and he came up with an excellent idea.

We could still change the ABI for varargs so that at the SIL level, the caller passes an UnsafePointer<T> to a stack allocated buffer together with an element count, instead of a reference to the array. Then, as a transitional step, we could have the callee just always first box the varargs into a heap-allocated Swift.Array on entry into the function.

This has the advantage that it gives us a better ABI going forward without impacting source compatibility. Then we can design “tear-off arrays” or whatever we want to do there later, and stage the new behavior in with a new keyword, -swift-version flag or some implementation that maintains perfect backward compatibility with the current semantics.

Also, even without the language change, constructing the vararg array inside the callee will theoretically allow the optimizer to optimize it better, eliminating the heap allocation or performing scalar replacement on array elements.
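A rough source-level model of that lowering might look like this (the names are hypothetical, and the real change would happen in SILGen rather than in user code):

```swift
// Hypothetical source-level model of the proposed ABI: the caller passes
// a stack buffer as pointer + count; the callee reboxes into Swift.Array
// on entry, so the body still sees an ordinary [Int].
func takesArgsLowered(_ base: UnsafePointer<Int>, _ count: Int) -> Int {
    let xs = Array(UnsafeBufferPointer(start: base, count: count)) // boxing on entry
    return xs.reduce(0, +) // original body, written against [Int]
}

// Caller side: materialize the arguments contiguously and pass them along.
let args = [10, 20, 30]
let result = args.withUnsafeBufferPointer { buf in
    takesArgsLowered(buf.baseAddress!, buf.count)
}
print(result) // 60
```

Because the boxing now happens inside the callee, the optimizer can see both the allocation and all its uses in one function, which is what enables the heap elimination mentioned above.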

What do you think? It would be a great project for an external contributor who is already familiar with the code base ;-)

Slava

···

On Oct 7, 2017, at 2:01 PM, Chris Lattner <clattner@nondot.org> wrote:

Right. If we ignore source compatibility for the moment, it seems clear that the best callee side representation for varargs is a borrowing array slice (like llvm::ArrayRef or UnsafeBufferPointer). This provides the essential abstraction needed by varargs without forcing an onerous representation on the caller, and without implying any reference counting operations. This would allow passing an Array, an ArraySlice, a StaticArray, or anything else with contiguous elements.

What if I declare my vararg as shared:

func takesArgs(_ xs: shared Int...)

This seems like a great way to address the source compatibility issue: a normal varargs argument could be passed as an Array<T> for compatibility with existing source, and a “borrowed Int…” could be passed more efficiently, allowing folks to opt into that.

This is fantastic, I didn’t notice the [stack] marker there! I’m thrilled you’re on top of this. It looks like you have a well-developed framework here, which I haven’t fully ingested. As such, I have a few questions for you:

1) With a debug build of the stdlib, the “isNative” and other sanity checks in the stdlib create a lot of bridge_object_to_word -> zext to i64 -> and-with-magic-arch-specific-constant checks to validate that the array is properly in a native state. Have you considered introducing a “bridge_object_classify” SIL instruction which translates a bridge object value into a tuple of bits (isNative, isObjC, etc) + unbitmangled value? It seems super weird to me that there are a bunch of architecture-specific magic constants in stdlib/public/core/Builtin.swift. It seems that these should be encapsulated into a SIL instruction, which would then allow the SIL optimizer to eliminate these checks.

Particularly with a debug stdlib, it is super common to see an “array.uninitialized” call which then gets bitwise queried through a super-verbose series of SIL instructions that are opaque to the SIL optimizer (because they are arch specific). Encapsulating this into SIL seems like a good call both from putting the arch-specific bits into the SIL layer which can expand it in IRGen (if it survives that long) and also optimize it away in the common case when it is knowable if these bits are 0 or 1 (as when coming from an array literal or other common known-native construct).

2) You have apparently great infrastructure for escape analysis of arrays. Have you considered adding one more check? If the only use of the reference-counted object is a single release in the current function (common when passing array literals to C functions), then you can eliminate the heap allocation AND all the refcounting instructions, using a simple alloc_stack (of an array of tuples?) and putting the dealloc_stack where the single release was. This seems like it would be a particularly big win for the simple case of things like:

  memcpy(dest, [1,2,3,4], sizeof(Int)*4)

among others. Because the array literal would turn into the C-equivalent construct with zero overhead.

3) There are other optimizations like ArrayElementValuePropagation, which don’t handle the BridgeObjectToWordInst or AddressToPointer or UncheckedRefCastInst, and thus give up. This causes a number of limitations (though I have no idea if they are important in practice or not): since conversions to UnsafePointer cannot mutate the array, they should not prevent element optimizations, count propagation optimizations, etc.

In any case, I’m super-thrilled you and your team have been pushing this forward, since arrays are really fundamental to all swift programs. I’m very much looking forward to the time when we can pass borrowed array slices down the stack with zero overhead.

-Chris

···

On Oct 8, 2017, at 3:30 PM, Erik Eckstein <eeckstein@apple.com> wrote:

We definitely already have a heap->stack for classes in the guise of the StackPromotion optimization is that what you are talking about with the "array outlining" optimization? (outlining to me is referring to specifically code outlining). IIRC Erik (+CC) do special work to make it work for fixed size array. I would ask why that optimization is not kicking in for varargs. Perhaps, we could add a special recognition that the given array will not escape through a varargs? Or provide some way of saying, trust me this doesn't escape.

We already do heap->stack promotion for array buffers. This is done in the StackPromotion pass. It uses interprocedural escape analysis to check if it can be done. So if the callee is known it should work with the varargs array. BTW it also works in your example, in testAPI():

%1 = alloc_ref [stack] [tail_elems $Int * %0 : $Builtin.Word] $_ContiguousArrayStorage<Int>

But it’s not zero cost, because we initialize the metadata pointer + refcount in a runtime function. Also we do ref-count operations on the array in the callee (this will improve as soon as we have a ref count representation for immortal objects).

Right. If we ignore source compatibility for the moment, it seems clear that the best callee side representation for varargs is a borrowing array slice (like llvm::ArrayRef or UnsafeBufferPointer). This provides the essential abstraction needed by varargs without forcing an onerous representation on the caller, and without implying any reference counting operations. This would allow passing an Array, an ArraySlice, a StaticArray, or anything else with contiguous elements.

What if I declare my vararg as shared:

func takesArgs(_ xs: shared Int...)

This seems like a great way to address the source compatibility issue: a normal varargs argument could be passed as an Array<T> for compatibility with existing source, and a “borrowed Int…” could be passed more efficiently, allowing folks to opt into that.

I talked about this with Joe today and he came up with an excellent idea.

We could still change the ABI for varargs so that at the SIL level, the caller passes an UnsafePointer<T> to a stack allocated buffer together with an element count, instead of a reference to the array. Then, as a transitional step, we could have the callee just always first box the varargs into a heap-allocated Swift.Array on entry into the function.

That makes perfect sense to me. What are the tradeoffs vs passing an UnsafeBufferPointer?

This has the advantage that it gives us a better ABI going forward without impacting source compatibility. Then we can design “tear-off arrays” or whatever we want to do there later, and stage the new behavior in with a new keyword, -swift-version flag or some implementation that maintains perfect backward compatibility with the current semantics.

+1. I think the common case is to pass the array down the stack or immediately consume it (e.g. through a for-in loop), so the array should never have to be materialized in the common case.

Also, even without the language change, constructing the vararg array inside the callee will theoretically allow the optimizer to optimize it better, eliminating the heap allocation or performing scalar replacement on array elements.

Yep.

What do you think? It would be a great project for an external contributor who is already familiar with the code base ;-)

That would be great. I don’t expect to have time to do that myself, but I’d love to see it! :-)

-Chris

···

On Oct 9, 2017, at 9:55 PM, Slava Pestov via swift-dev <swift-dev@swift.org> wrote:

On Oct 7, 2017, at 2:01 PM, Chris Lattner <clattner@nondot.org> wrote:

We definitely already have a heap->stack for classes in the guise of the StackPromotion optimization is that what you are talking about with the "array outlining" optimization? (outlining to me is referring to specifically code outlining). IIRC Erik (+CC) do special work to make it work for fixed size array. I would ask why that optimization is not kicking in for varargs. Perhaps, we could add a special recognition that the given array will not escape through a varargs? Or provide some way of saying, trust me this doesn't escape.

We already do heap->stack promotion for array buffers. This is done in the StackPromotion pass. It uses interprocedural escape analysis to check if it can be done. So if the callee is known it should work with the varargs array. BTW it also works in your example, in testAPI():

%1 = alloc_ref [stack] [tail_elems $Int * %0 : $Builtin.Word] $_ContiguousArrayStorage<Int>

But it’s not zero cost, because we initialize the metadata pointer + refcount in a runtime function. Also we do ref-count operations on the array in the callee (this will improve as soon as we have a ref count representation for immortal objects).

This is fantastic, I didn’t notice the [stack] marker there! I’m thrilled you’re on top of this. It looks like you have a well-developed framework here, which I haven’t fully ingested. As such, I have a few questions for you:

1) With a debug build of the stdlib, the “isNative” and other sanity checks in the stdlib create a lot of bridge_object_to_word -> zext to i64 -> and-with-magic-arch-specific-constant checks to validate that the array is properly in a native state. Have you considered introducing a “bridge_object_classify” SIL instruction which translates a bridge object value into a tuple of bits (isNative, isObjC, etc) + unbitmangled value? It seems super weird to me that there are a bunch of architecture-specific magic constants in stdlib/public/core/Builtin.swift. It seems that these should be encapsulated into a SIL instruction, which would then allow the SIL optimizer to eliminate these checks.

Particularly with a debug stdlib, it is super common to see an “array.uninitialized” call which then gets bitwise queried through a super-verbose series of SIL instructions that are opaque to the SIL optimizer (because they are arch specific). Encapsulating this into SIL seems like a good call both from putting the arch-specific bits into the SIL layer which can expand it in IRGen (if it survives that long) and also optimize it away in the common case when it is knowable if these bits are 0 or 1 (as when coming from an array literal or other common known-native construct).

2) You have apparently great infrastructure for escape analysis of arrays. Have you considered adding one more check? If the only use of the reference-counted object is a single release in the current function (common when passing array literals to C functions), then you can eliminate the heap allocation AND all the refcounting instructions, using a simple alloc_stack (of an array of tuples?) and putting the dealloc_stack where the single release was. This seems like it would be a particularly big win for the simple case of things like:

  memcpy(dest, [1,2,3,4], sizeof(Int)*4)

among others, because the array literal would turn into the C-equivalent construct with zero overhead.

Sounds like the ReleaseDevirtualizer

3) There are other optimizations like ArrayElementValuePropagation, which don’t handle the BridgeObjectToWordInst or AddressToPointer or UncheckedRefCastInst, and thus give up. This causes a number of limitations (though I have no idea if they are important in practice or not): since conversions to UnsafePointer cannot mutate the array, they should not prevent element optimizations, count propagation optimizations, etc.

1) and 3)
There are some discussions ongoing regarding array bridging. So things may change anyway.

In any case, I’m super-thrilled you and your team have been pushing this forward, since arrays are really fundamental to all swift programs.

Thanks :-)

  I’m very much looking forward to the time when we can pass borrowed array slices down the stack with zero overhead.

Me, too!

···

On Oct 9, 2017, at 9:46 PM, Chris Lattner <clattner@nondot.org> wrote:
On Oct 8, 2017, at 3:30 PM, Erik Eckstein <eeckstein@apple.com> wrote:

-Chris

We definitely already have heap->stack promotion for classes in the guise of the StackPromotion optimization. Is that what you are talking about with the “array outlining” optimization? (“Outlining”, to me, refers specifically to code outlining.) IIRC Erik (+CC) did special work to make it work for fixed-size arrays. I would ask why that optimization is not kicking in for varargs. Perhaps we could add a special recognition that the given array will not escape through varargs? Or provide some way of saying “trust me, this doesn’t escape”.

We already do heap->stack promotion for array buffers. This is done in the StackPromotion pass. It uses interprocedural escape analysis to check if it can be done. So if the callee is known it should work with the varargs array. BTW it also works in your example, in testAPI():

%1 = alloc_ref [stack] [tail_elems $Int * %0 : $Builtin.Word] $_ContiguousArrayStorage<Int>

But it’s not zero cost, because we initialize the metadata pointer + refcount in a runtime function. Also we do ref-count operations on the array in the callee (this will improve as soon as we have a ref count representation for immortal objects).


Sounds like the ReleaseDevirtualizer

Perhaps similar, but it really isn’t. I’m specifically talking about replacing the alloc_ref [stack] with an alloc_stack.


1) and 3)
There are some discussions ongoing regarding array bridging. So things may change anyway.

Ok, sounds promising but also mysterious. Can you elaborate?

-Chris

···

On Oct 10, 2017, at 9:50 AM, Erik Eckstein <eeckstein@apple.com> wrote:
