swift_tailRealloc

Michael_Gottesman · May 22, 2018, 5:29pm

@johannesweiss and I have been talking about creating a swift_reallocTail builtin for performance sensitive systems applications that want to use the system realloc as much as possible to reduce the amount of actual reallocations that occur. For your consideration/feedback:

NOTE Feedback is welcome from all, but I would like to ping specifically: @rjmccall, @jckarter, @atrick, @eeckstein. Also @jordanrose would be useful for naming purposes ; ).

The main challenge in implementing a generic version of swift_reallocTail that can work for any swift type is namely that the system realloc may move values without any way for the caller
from the caller. Luckily we have that exact predicate in the value witness table!? Basically we would implement in the stdlibCore a simple function called tailRealloc that would expand inline to (most likely via a SILGen builtin):

public func swift_tailRealloc<T>(x : T) {
    if (valueWitnessTable<T>.isBitwiseTakeable()) {
        realloc(x)
        return
    }
    // ... otherwise alloc new memory and move values over one by one ...
    swift_tailAllocRuntimeFunc(x)
}

After full specialization, this should reduce to a single function call in all cases. In the case of a generic context, we will still be able to go down the bitwise takeable path and potentially save a ton of compile time by not needing to allocate more memory.

swift_tailAllocRuntimeFunction would necessarily be some sort of function implemented in the runtime in C++ or be a resilient swift function. It needs to be a future proof way to handle any possible future realloc scenario. We could just do a new allocation + witness table moves + destroy of old. That would preserve AFAIKT current behavior in the stdlib, while giving us extra performance in other cases.

Thoughts, objections, and or questions?

EDIT: Answers to some common questions:

I am assuming that we have uniqueness before this builtin is invoked.
Since this is a Builtin, I am specifically /not/ talking about public API for this. Just implementing the underlying functionality.
I am assuming the ability to determine that the contents of a class are all bitwise takable in a way that is similar to what we do for structs.

Michael_Gottesman · May 22, 2018, 5:32pm

Meant to ping @Erik_Eckstein, @John_McCall, @Joe_Groff, @Andrew_Trick...

Erik_Eckstein · May 22, 2018, 5:37pm

For the implementation we would also need a SIL instruction + Builtin, similar to alloc_ref.

Michael_Gottesman · May 22, 2018, 7:22pm

Sure. I was more describing the high level vision. But yes. I imagine you would implement it with such a builtin and then have IRGen lower the instruction appropriately.

Joe_Groff · May 22, 2018, 7:55pm

Can you give an example of what you want this for? This doesn't make any sense:

public func swift_tailRealloc<T>(x : T) {
    if (valueWitnessTable<T>.isBitwiseTakeable()) {
        realloc(x)
        return
    }
    // ... otherwise alloc new memory and move values over one by one ...
    swift_tailAllocRuntimeFunc(x)
}

since a value of arbitrary type T has no allocation associated with it; it's a value that could be in registers, on the stack, or embedded in an aggregate. Do you mean tail allocation for class instances? That would only be possible if you know you have the unique reference to a class instance, since otherwise other references to the class instance are going to expect the original memory to remain valid. With some cooperation from the malloc implementation, maybe you could define a similar operation that creates a new class instance if it needs to allocate a new instance or otherwise retains the existing instance in-place if it has room to grow (which is IIRC how Array and Dictionary work). We however don't currently have an "is bitwise takable" attribute that applies to the contents of a class instance, or any generic notion of "copy constructors" for class instances, so you would need to be able to provide a function to describe how the copy should work.

jrose · May 22, 2018, 8:18pm

I'm very leery of providing this as a general-purpose functionality. For one, the signature of the function is a lot different from what you wrote:

public func swift_tailRealloc<T: AnyObject>(_ object: inout T, tailAllocationInBytes: Int, tailAlignment: Int)

Why should it be like this? Because (1) you can only realloc a class instance that's uniquely-referenced, and (2) you'll of course get a new pointer back afterwards.

However, really I'd suggest potentially making this an API on ManagedBuffer, rather than declaring a top-level function that works on any object type.

Michael_Gottesman · May 22, 2018, 10:18pm

Joe_Groff
May 22
Can you give an example of what you want this for? This doesn't make any sense:

public func swift_tailRealloc(x : T) {
if (valueWitnessTable.isBitwiseTakeable()) {
realloc(x)
return
}
// ... otherwise alloc new memory and move values over one by one ...
swift_tailAllocRuntimeFunc(x)
}

since a value of arbitrary type T has no allocation associated with it; it's a value that could be in registers, on the stack, or embedded in an aggregate. Do you mean tail allocation for class instances?

Yes. It was more of a sketch. I am specifically talking about tail allocation of class instances that are unique.

That would only be possible if you know you have the unique reference to a class instance, since otherwise other references to the class instance are going to expect the original memory to remain valid.

That is an assumption in the design that uniqueness would be required /and/ that weak pointers would be included in the notion of uniqueness.

With some cooperation from the malloc implementation, maybe you could define a similar operation that creates a new class instance if it needs to allocate a new instance or otherwise retains the existing instance in-place if it has room to grow. We however don't currently have an "is bitwise takable" attribute that applies to the contents of a class instance, or any generic notion of "copy constructors" for class instances, so you would need to be able to provide a function to describe how the copy should work.

I would assume that as part of this proposal we would add the isBitwiseTakeable to classes as well. From my chatting with Arnold, we have all of the information we just do not emit it for classes (evidently we emit it for structs). I am not saying that the two cases are completely isomorphic but there is precedent and code already in the compiler for implementing this.

Michael_Gottesman · May 22, 2018, 10:28pm

jrose https://forums.swift.org/u/jrose Jordan Rose https://forums.swift.org/u/jrose
May 22
I'm very leery of providing this as a general-purpose functionality. For one, the signature of the function is a lot different from what you wrote:

public func swift_tailRealloc<T: AnyObject>(_ object: inout T, tailAllocationInBytes: Int, tailAlignment: Int)
Why should it be like this? Because (1) you can only realloc a class instance that's uniquely-referenced, and (2) you'll of course get a new pointer back afterwards.

However, really I'd suggest potentially making this an API on ManagedBuffer, rather than declaring a top-level function that works on any object type.

A few things:

I think you missed an important detail around the proposal namely that the posted code is pseudo-code for a Builtin that would be SILGened. I think I mentioned pretty clearly in the post that it was for a SILGened builtin. Since it is a builtin it would not be exposed directly to users since it can only be used in the Swift module.
I should have mentioned that I was assuming uniqueness... sorry.
I would be fine with having this be exposed via an API on ManagedBuffer.

jrose · May 22, 2018, 10:53pm

Ah, my mistake. I thought that you meant the body was pseudocode, but that the declaration was important to expose to SwiftNIO. If you think this is only important for Array (and maybe Set and Dictionary) then yeah, maybe it doesn't have to have a public declaration at all.

Michael_Gottesman · May 22, 2018, 10:57pm

A discussion of the API to vend so Swift-NIO can use this is something that should be a separate discussion/proposal IMO. This is only for the low level implementation.

Michael_Gottesman · May 22, 2018, 11:12pm

I talked a little bit with JoeG and Andy. One question that came up is would we have a faster runtime if we instead of using realloc, we used malloc_size + our own realloc implementation. My feeling is that it wouldn't be since the malloc implementation in realloc could avoid moves in a case where we have filled up a malloc block and there is an adjacent unused block.

@johannesweiss do you have info/thoughts about this?

johannesweiss · May 23, 2018, 1:49pm

Indeed. The prime use-case for this are (very) large ByteBuffers in NIO where the copying/moving around is very costly. We also always allocate powers of 2 which means as soon as we hit PAGE_SIZE there's a very real chance that the allocator can find an empty adjacent page in (virtual) memory.
Actually, I wouldn't be surprised at all to find allocator implementations that deliberately spread the pages for large allocation so that realloc can basically always work.

Arnold · May 23, 2018, 3:45pm

@Michael_Gottesman

I don't think that you would need the notion of is bitwise takable for the content of classes if we encapsulate this in the standard library. A ManagedBuffer is defined as a class holding a Header field and tail allocated Elements. All we have to do is assert that the Header _isBitwiseTakable and so is an Element and add this as a precondition to using the the realloc function or dynamically check and change the behavior accordingly. Adding a _isBitwiseTakable builtin is pretty simple (see _isPOD).

Michael_Gottesman · May 23, 2018, 5:04pm

@Arnold You are correct. And I guess given your work with the existential inline buffers to only store bitwise movable things, that would just work for existentials, right?

Arnold · May 23, 2018, 5:26pm

@Michael_Gottesman Yes my recent change to only store bitwise takable values in the inline buffer means that existentials are bitwise takable themselves also. (They currently don't report so - I need to fix that. Thanks!)

Michael_Gottesman · May 23, 2018, 5:29pm

@Arnold Just wanted to give you some props = ). That is a really great change. +1!

Michael_Gottesman · May 23, 2018, 7:29pm

@Joe_Groff @jrose Do you guys have any other concerns/etc?

Joe_Groff · May 23, 2018, 7:42pm

Adding this as API to ManagedBuffer makes sense to me. Hopefully that doesn't really need any additional compiler support—would the optimizer need to know about realloc-ing class instances?

jrose · May 23, 2018, 8:02pm

None from me. To Joe's point, I hope we would model this as creating a different instance that would happen to be cheap to construct in some circumstances.

John_McCall · May 23, 2018, 8:36pm

I'm hesitant to expose realloc directly in Swift because I don't think it's a very good API. A better API would attempt to change an existing allocation in-place, but would just report failure if that doesn't succeed instead of doing the free-and-memcpy thing.