swift_tailRealloc

@johannesweiss and I have been talking about creating a swift_reallocTail builtin for performance sensitive systems applications that want to use the system realloc as much as possible to reduce the amount of actual reallocations that occur. For your consideration/feedback:

NOTE Feedback is welcome from all, but I would like to ping specifically: @rjmccall, @jckarter, @atrick, @eeckstein. Also @jordanrose would be useful for naming purposes ; ).


The main challenge in implementing a generic version of swift_reallocTail that can work for any swift type is namely that the system realloc may move values without any way for the caller
from the caller. Luckily we have that exact predicate in the value witness table!? Basically we would implement in the stdlibCore a simple function called tailRealloc that would expand inline to (most likely via a SILGen builtin):

public func swift_tailRealloc<T>(x : T) {
    if (valueWitnessTable<T>.isBitwiseTakeable()) {
        realloc(x)
        return
    }
    // ... otherwise alloc new memory and move values over one by one ...
    swift_tailAllocRuntimeFunc(x)
}

After full specialization, this should reduce to a single function call in all cases. In the case of a generic context, we will still be able to go down the bitwise takeable path and potentially save a ton of compile time by not needing to allocate more memory.

swift_tailAllocRuntimeFunction would necessarily be some sort of function implemented in the runtime in C++ or be a resilient swift function. It needs to be a future proof way to handle any possible future realloc scenario. We could just do a new allocation + witness table moves + destroy of old. That would preserve AFAIKT current behavior in the stdlib, while giving us extra performance in other cases.

Thoughts, objections, and or questions?

EDIT: Answers to some common questions:

  1. I am assuming that we have uniqueness before this builtin is invoked.
  2. Since this is a Builtin, I am specifically /not/ talking about public API for this. Just implementing the underlying functionality.
  3. I am assuming the ability to determine that the contents of a class are all bitwise takable in a way that is similar to what we do for structs.
4 Likes

Meant to ping @Erik_Eckstein, @John_McCall, @Joe_Groff, @Andrew_Trick...

For the implementation we would also need a SIL instruction + Builtin, similar to alloc_ref.

Sure. I was more describing the high level vision. But yes. I imagine you would implement it with such a builtin and then have IRGen lower the instruction appropriately.

Can you give an example of what you want this for? This doesn't make any sense:

public func swift_tailRealloc<T>(x : T) {
    if (valueWitnessTable<T>.isBitwiseTakeable()) {
        realloc(x)
        return
    }
    // ... otherwise alloc new memory and move values over one by one ...
    swift_tailAllocRuntimeFunc(x)
}

since a value of arbitrary type T has no allocation associated with it; it's a value that could be in registers, on the stack, or embedded in an aggregate. Do you mean tail allocation for class instances? That would only be possible if you know you have the unique reference to a class instance, since otherwise other references to the class instance are going to expect the original memory to remain valid. With some cooperation from the malloc implementation, maybe you could define a similar operation that creates a new class instance if it needs to allocate a new instance or otherwise retains the existing instance in-place if it has room to grow (which is IIRC how Array and Dictionary work). We however don't currently have an "is bitwise takable" attribute that applies to the contents of a class instance, or any generic notion of "copy constructors" for class instances, so you would need to be able to provide a function to describe how the copy should work.

I'm very leery of providing this as a general-purpose functionality. For one, the signature of the function is a lot different from what you wrote:

public func swift_tailRealloc<T: AnyObject>(_ object: inout T, tailAllocationInBytes: Int, tailAlignment: Int)

Why should it be like this? Because (1) you can only realloc a class instance that's uniquely-referenced, and (2) you'll of course get a new pointer back afterwards.

However, really I'd suggest potentially making this an API on ManagedBuffer, rather than declaring a top-level function that works on any object type.

Joe_Groff
May 22
Can you give an example of what you want this for? This doesn't make any sense:

public func swift_tailRealloc(x : T) {
if (valueWitnessTable.isBitwiseTakeable()) {
realloc(x)
return
}
// ... otherwise alloc new memory and move values over one by one ...
swift_tailAllocRuntimeFunc(x)
}

since a value of arbitrary type T has no allocation associated with it; it's a value that could be in registers, on the stack, or embedded in an aggregate. Do you mean tail allocation for class instances?

Yes. It was more of a sketch. I am specifically talking about tail allocation of class instances that are unique.

That would only be possible if you know you have the unique reference to a class instance, since otherwise other references to the class instance are going to expect the original memory to remain valid.

That is an assumption in the design that uniqueness would be required /and/ that weak pointers would be included in the notion of uniqueness.

With some cooperation from the malloc implementation, maybe you could define a similar operation that creates a new class instance if it needs to allocate a new instance or otherwise retains the existing instance in-place if it has room to grow. We however don't currently have an "is bitwise takable" attribute that applies to the contents of a class instance, or any generic notion of "copy constructors" for class instances, so you would need to be able to provide a function to describe how the copy should work.

I would assume that as part of this proposal we would add the isBitwiseTakeable to classes as well. From my chatting with Arnold, we have all of the information we just do not emit it for classes (evidently we emit it for structs). I am not saying that the two cases are completely isomorphic but there is precedent and code already in the compiler for implementing this.

jrose https://forums.swift.org/u/jrose Jordan Rose https://forums.swift.org/u/jrose
May 22
I'm very leery of providing this as a general-purpose functionality. For one, the signature of the function is a lot different from what you wrote:

public func swift_tailRealloc<T: AnyObject>(_ object: inout T, tailAllocationInBytes: Int, tailAlignment: Int)
Why should it be like this? Because (1) you can only realloc a class instance that's uniquely-referenced, and (2) you'll of course get a new pointer back afterwards.

However, really I'd suggest potentially making this an API on ManagedBuffer, rather than declaring a top-level function that works on any object type.

A few things:

  1. I think you missed an important detail around the proposal namely that the posted code is pseudo-code for a Builtin that would be SILGened. I think I mentioned pretty clearly in the post that it was for a SILGened builtin. Since it is a builtin it would not be exposed directly to users since it can only be used in the Swift module.
  2. I should have mentioned that I was assuming uniqueness... sorry.
  3. I would be fine with having this be exposed via an API on ManagedBuffer.

Ah, my mistake. I thought that you meant the body was pseudocode, but that the declaration was important to expose to SwiftNIO. If you think this is only important for Array (and maybe Set and Dictionary) then yeah, maybe it doesn't have to have a public declaration at all.

A discussion of the API to vend so Swift-NIO can use this is something that should be a separate discussion/proposal IMO. This is only for the low level implementation.

1 Like

I talked a little bit with JoeG and Andy. One question that came up is would we have a faster runtime if we instead of using realloc, we used malloc_size + our own realloc implementation. My feeling is that it wouldn't be since the malloc implementation in realloc could avoid moves in a case where we have filled up a malloc block and there is an adjacent unused block.

@johannesweiss do you have info/thoughts about this?

Indeed. The prime use-case for this are (very) large ByteBuffers in NIO where the copying/moving around is very costly. We also always allocate powers of 2 which means as soon as we hit PAGE_SIZE there's a very real chance that the allocator can find an empty adjacent page in (virtual) memory.
Actually, I wouldn't be surprised at all to find allocator implementations that deliberately spread the pages for large allocation so that realloc can basically always work.

1 Like

@Michael_Gottesman

I don't think that you would need the notion of is bitwise takable for the content of classes if we encapsulate this in the standard library. A ManagedBuffer is defined as a class holding a Header field and tail allocated Elements. All we have to do is assert that the Header _isBitwiseTakable and so is an Element and add this as a precondition to using the the realloc function or dynamically check and change the behavior accordingly. Adding a _isBitwiseTakable builtin is pretty simple (see _isPOD).

1 Like

@Arnold You are correct. And I guess given your work with the existential inline buffers to only store bitwise movable things, that would just work for existentials, right?

@Michael_Gottesman Yes my recent change to only store bitwise takable values in the inline buffer means that existentials are bitwise takable themselves also. (They currently don't report so - I need to fix that. Thanks!)

2 Likes

@Arnold Just wanted to give you some props = ). That is a really great change. +1!

1 Like

@Joe_Groff @jrose Do you guys have any other concerns/etc?

Adding this as API to ManagedBuffer makes sense to me. Hopefully that doesn't really need any additional compiler support—would the optimizer need to know about realloc-ing class instances?

None from me. To Joe's point, I hope we would model this as creating a different instance that would happen to be cheap to construct in some circumstances.

I'm hesitant to expose realloc directly in Swift because I don't think it's a very good API. A better API would attempt to change an existing allocation in-place, but would just report failure if that doesn't succeed instead of doing the free-and-memcpy thing.

1 Like