Array capacity optimization

While discussing the idea of adding a way to work with an array's uninitialized capacity, it came up that arrays will sometimes allocate more storage than strictly requested, as a performance optimization. The difference can be significant:

var a: [UInt8] = []
a.reserveCapacity(5)
// a.capacity == 16

I can imagine a few possible rationales for this behavior, but I haven't been able to find the reasons documented. In the current paradigm, where the uninitialized part of an array is inaccessible, there generally isn't any ill effect here. However, if we open up a way of accessing the full capacity of an array, not having control over the specific size of the allocation ends up feeling very strange.

The extra padding doesn't appear to be consistent, depending instead on how an array grows to a certain size:

// Appending to an array allocates the same as reserveCapacity
a = []
a.append(contentsOf: 1...5)
// a.capacity == 16

// The same array created using a literal is more stingy
a = [1, 2, 3, 4, 5]
// a.capacity == 5

I'd like to see whether the benefit outweighs the potential confusion here. My specific questions are:

  • What's the intended purpose of allocating more space than requested?
  • How would I go about turning this off to test whether that purpose is being served? I've followed the code down to Builtin.allocWithTailElems_1, and frankly get lost in lib/AST/Builtins.cpp.

In the first example, it appears from some quick testing that reserveCapacity will reserve a minimum of 16 elements.

My understanding is that capacity doubles when a new allocation is required (this is a fairly common strategy for minimizing reallocation, which requires an expensive copy), so that in your second example (a.append(contentsOf:)), you are reallocating 4 times. (Edited: it appears that append(contentsOf:) will also allocate a minimum of 16 elements.)

In your third example (a = [1,2,3,4,5]), you will get a capacity of 10 with the next addition to the array:

var arr = [1,2,3,4,5]
let c1 = arr.capacity  // c1 = 5
arr.append(6)
let c2 = arr.capacity  // c2 = 10
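
For what it's worth, the doubling itself is easy to observe (a quick sketch; the exact capacities depend on the platform's allocator):

var demo: [Int] = []
var lastCapacity = -1
for i in 1...64 {
    demo.append(i)
    if demo.capacity != lastCapacity {
        lastCapacity = demo.capacity
        print("count \(demo.count) -> capacity \(demo.capacity)")
    }
}
// Typically prints a doubling sequence (2, 4, 8, 16, 32, 64 or similar);
// the exact values depend on malloc's rounding.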

I realize that you probably already know this, but I wanted to get my understanding out there both to confirm its accuracy and as a pointer for others.

Absolutely! We have the rationale for this growth strategy documented in the Array.reserveCapacity docs and a couple other places—it's what allows us to document Array as having O(1) appends over many operations.
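
To spell out the arithmetic behind that guarantee (a back-of-the-envelope sketch, assuming a pure 2x strategy starting from capacity 1): the reallocations copy 1 + 2 + 4 + … elements, which totals less than 2n for n appends, so the whole sequence is O(n) work, i.e. amortized O(1) per append. For example:

// Elements copied across all reallocations while a doubling array
// grows to hold n elements; the total stays below 2 * n.
func totalCopies(growingTo n: Int) -> Int {
    var capacity = 1
    var copies = 0
    while capacity < n {
        copies += capacity   // the old buffer's contents move to the new one
        capacity *= 2
    }
    return copies
}

print(totalCopies(growingTo: 1_000_000))  // 1048575: about one copy per element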

I don't think that guarantee would be compromised by making reserveCapacity allocate only the requested amount of space. As the docs in that method indicate, it's essentially a programmer error to repeatedly call it (without implementing your own exponential growth strategy). I don't think it should have to handle the use case of an array that will immediately continue growing beyond the requested capacity.

If I'm understanding this (and my tests) correctly, it will only overallocate by a maximum of 15 elements, basically rounding up to the next multiple of 16.

If the elements were themselves large, I could see this being a potential problem, but what else am I missing?

Edited: it appears that it overallocates by at most 15 bytes:

var a3: [UInt8] = []
a3.reserveCapacity(2)
let c5 = a3.capacity // c5 = 16

var a4: [UInt64] = []
a4.reserveCapacity(2)
let c6 = a4.capacity // c6 = 2

The source of this is malloc_size (see ContiguousArrayBuffer's init(_uninitializedCount uninitializedCount: Int, minimumCapacity: Int)).

from the malloc_size man page:

The malloc_size() function returns the size of the memory block that backs the allocation pointed to by ptr. The memory block size is always at least as large as the allocation it backs, and may be larger.

So sometimes malloc over-allocates because it can, and malloc_size gives us the actual size allocated. We use this because, why not?
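
You can see the same rounding by asking malloc directly (a sketch for Apple platforms, where malloc_size comes in via Darwin):

import Darwin

// Ask for 5 bytes; malloc hands back a larger block, and
// malloc_size reports the block's true size.
let p = malloc(5)!
print(malloc_size(p))  // commonly 16 on Apple platforms
free(p)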

That makes sense! I'm suggesting giving it the same behavior as UnsafeMutableBufferPointer.allocate(capacity: c), where the resulting buffer always has c elements. In that case, there's the potential for allocated memory past the buffer that is just never accessible, right?

I think the same trade-off would be worthwhile if we end up allowing access to an array's full capacity, since bugs like this are too easy:

let a = [1, 2, 3, 4, 5]
var b = Array(minimumCapacity: a.count)
b.withFullCapacityUnsafeMutableBufferPointer { buffer, initializedCount in
    for i in 0..<buffer.count {
        buffer[i] = a[i] * 2   // subscript out of bounds on `a` sometimes
    }
    initializedCount = buffer.count
}

We could keep some of the same behavior that we have today, where automatic expansion (due to appending) would get the full amount, but make reserveCapacity(_:) and the hypothetical init(capacity:) always result in an array with that exact capacity.

Relatedly, it's sometimes argued that our 2x growth factor is bad and that something slightly lower might make sense. It'd be an interesting experiment for someone to make some adjustments and see the impact on both the benchmarks and perhaps other more targeted benchmarking code (or even a real-world application, like NIO, which has memory-use tests as well).

Changing from 2x would dent the amortized-constant-time-append guarantee, but depending on the factor chosen, maybe not in a materially important way.

OTOH, if someone knows a good rebuttal of that link, that would be interesting too!

FWIW Foundation and CF generally prefer to grow hash tables and arrays by 1.618x (the golden ratio). I don't know if they have recent performance numbers to justify that.

I'm not sure what you meant by dent here, but that property is preserved for any growth factor > 1, as your link says.

That's true, but only frivolously so as the value gets closer to 1, since n has to become huge before the amortization pays for itself. I don't have a good feel for where it starts to swing in the wrong direction in real-world use (and many use cases might differ, so there probably isn't a right answer). Needs some experimentation.
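
For a rough sense of that constant (a hedged sketch ignoring allocator rounding): with growth factor r, growing to n elements copies roughly n / (r - 1) elements in total, so the per-append overhead scales like 1 / (r - 1):

// Approximate copies-per-element for geometric growth at factor r.
for r in [1.1, 1.3, 1.618, 2.0] {
    print(r, 1.0 / (r - 1.0))  // ≈ 10.0, 3.3, 1.6, 1.0
}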

As an amateur programmer, I'm wondering whether this topic has ever been a topic of research in academia…? o_O

We also currently use 2 as the growth factor for ByteBuffer. But I don't think we currently have many real-world benchmarks where we actually need to grow a ByteBuffer. The fast path is usually that we allocate with a good enough estimate, and the data we actually need in one contiguous buffer fits without growth.

In user code, however, ByteBuffer growth seems rather likely, so I'd be quite interested to see a benchmark too.

A reserve operation does not and should not guarantee a maximum capacity. The solution here is to call it init(reserveCapacity:) and make sure the documentation is clear ;)

Can you say a bit more about why reserving capacity shouldn't guarantee a maximum? I'm concerned about a risk that users of this API will introduce subtle bugs by assuming the capacity of the array is what they asked for, and I don't see a down side to limiting the capacity exposed to the user after one of those calls.

If a user calls reserveCapacity(100) today, they probably only need space for 100 elements (and our documentation discourages asking for too little). If that's true, the resulting capacity of 104 (or whatever) doesn't actually help them with performance, and there wouldn't be an issue with us artificially limiting the visible capacity of the array to 100.

(This question is specifically about Array, not RangeReplaceableCollection, which doesn't make any guarantees about reserve operations at all.)

While it may not help, exposing a few items' worth of extra space that we get from malloc won't hurt performance, either.

I believe it's unfortunate that capacity is a public property; it exposes information about internal implementation details that the stdlib may want/need to change in the future. I don't know a good use case for it, either: using the capacity getter is a distinct code smell in Swift programs.

We got the API right for String: its storage capacity isn't exposed. I suggest we should deprecate capacity in Array, Set and Dictionary, rather than trying to simplify its apparent behavior.

Utility and simplicity: An API should maximize both. When reserve allows the implementation to choose the capacity, it satisfies all use cases with a single API. There's no need to invent another reserveAsMuchAsYouCan API.

Performance: reserve is sometimes part of a user-side growth strategy. Allowing the data structure to use the entire available capacity reduces allocation and copying.

Consistency: The current rule is that you can't assume capacity across any mutating call that may resize the data structure. If user code relies on capacity, the code needs to read the capacity. That's a simple rule that's easy to understand and verify.

Quality: Creating a special rule that users can sometimes assume capacity may lead to a more general misunderstanding that users can rely on the implementation behavior, for example assuming the capacity after initialization. That behavior can change from release to release and break code.

I think you have a valid concern that users may reuse the same limit variable for a call to reserve and within the closure. I think that concern should be addressed by the design and documentation of the new unsafe API, not by redefining the semantics of the existing API. For example, maybe we should not encourage the reserve(x) + withUnsafeFullCapacityBufferPointer() pattern and provide some kind of combined API instead.

One reason, which Rust encountered in practice, is that it became a quadratic footgun: e.g. if reserveCapacity gives an exact value for the capacity, this loop is likely to be quadratic:

for newArray in something {
    // With exact reservation, every pass reallocates to the exact new size
    // and copies the whole array, making the loop O(n^2) overall.
    array.reserveCapacity(array.count + newArray.count)
    array.append(contentsOf: newArray)
}

This is of course probably not the best way to write this sort of loop, but things like this did come up.

All that said, it looks like reserveCapacity actually does have this problem now, according to the documentation? (Rust switched the reserveCapacity equivalent to doing amortized growth, and added a reserve_exact that... still isn't quite exact because of the malloc-usable-size thing.)

That's an interesting issue. I wonder if splitting the API wouldn't just cause people to start using reserve_exact in these places instead, though. (After all, they know the exact count being appended!)

Making reserveCapacity round up to max(2 * oldCapacity, newCapacity) would make it pointless to call it here -- after all, append(contentsOf:) already does the same thing.
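
To put rough numbers on the footgun (a hedged simulation of the loop upthread, comparing exact reservation against a max(2 * oldCapacity, newCapacity) rule; this models copying only, not real allocator behavior):

// Total elements copied while appending `chunks` chunks of `chunkSize`
// elements, under two hypothetical reserveCapacity policies.
func copiesAppending(chunks: Int, chunkSize: Int, amortized: Bool) -> Int {
    var count = 0, capacity = 0, copies = 0
    for _ in 0..<chunks {
        let requested = count + chunkSize
        if requested > capacity {
            copies += count  // reallocating copies the existing elements
            capacity = amortized ? max(2 * capacity, requested) : requested
        }
        count += chunkSize
    }
    return copies
}

print(copiesAppending(chunks: 1000, chunkSize: 10, amortized: false))  // 4995000
print(copiesAppending(chunks: 1000, chunkSize: 10, amortized: true))   // 10230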

I think it helps a little that the argument of Swift's reserveCapacity specifies the full storage size, including already occupied slots, and not just the free space to be made available. It perhaps makes the function a little less convenient, but it forces us to think a little about how storage works, and that slight speed bump may be enough to make us realize the problem.
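
Concretely (a hedged illustration; reserveExtraCapacity is a made-up name, shown only for contrast with delta-based APIs like Rust's Vec::reserve):

var array = [1, 2, 3]
let incoming = [4, 5, 6]

// Swift's reserveCapacity takes the total desired capacity,
// counting the elements already stored:
array.reserveCapacity(array.count + incoming.count)
array.append(contentsOf: incoming)

// A delta-based API would take just the additional space instead:
// array.reserveExtraCapacity(incoming.count)  // hypothetical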

I'm wondering whether this topic has ever been a topic of research in academia

This topic has been heavily studied, but the optimal behavior is workload- and system-dependent. For any fixed workload, it's pretty easy to fix the Right Behavior, but for a general-purpose language running on a diverse set of platforms, it's fairly subtle.

The consensus seems to be for a geometric progression with a factor between 1.6 and 2. The fundamental tradeoff is that as the growth factor gets larger, you have more wasted space on average (note that this is mainly a concern with sizes smaller than a page or two--for huge capacities it's less of a problem because unused pages will simply never be faulted into the working set); conversely, for smaller growth factors, you don't get to amortize the cost of reallocating as much.

For very large (multiple-page) buffers, there's a reasonable argument to be made that something like the golden ratio (or an approximation thereof) is optimal in a fairly workload-independent fashion.
