My app allows the user to record audio. I'm using a ContiguousArray to store the raw audio samples. The size of the array is often very large (>1,000,000 samples). As audio is recorded, the samples are appended to the array. The final size of the array is always known before recording starts. So to avoid reallocations, I call reserveCapacity() before recording.
Despite this, I've found that ContiguousArray seems to change its capacity to slightly more than its count almost every time I call append(contentsOf:). Because of this, the copying and reallocation starts to get pretty expensive towards the end of the user's recording. Time Profiler shows that ContiguousArray._copyToNewBuffer(oldCount:) is taking up a lot of CPU cycles:
And CPU usage climbs steadily throughout the recording, indicating that the appends are not amortized O(1) as the documentation claims. Instead they seem to be O(n), where n is the array's count before the append.
Calling reserveCapacity() again after every append is not a solution, since this forces the array to reallocate again. This results in worse performance.
Am I misunderstanding reserveCapacity()? What's the point of calling it if ContiguousArray reallocates itself anyway every time I append to it? How do I get it to hold its capacity through multiple appends over 60+ seconds of recording?
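To make this concrete, here's roughly what my recording path looks like (a simplified sketch — `samples`, `frameCount`, and `addSamples` are placeholder names, not my real code):

```swift
// Simplified sketch of the recording path.
var samples = ContiguousArray<Float>()
let frameCount = 3_000_000            // final size, known before recording starts
samples.reserveCapacity(frameCount)   // reserve once, up front

// Called repeatedly with each new buffer from the audio callback.
func addSamples(_ buffer: [Float]) {
    samples.append(contentsOf: buffer)
}
```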
Hi @dkun7944 – can you post code that demonstrates the issue you're seeing? Hopefully you haven't found a bug in the implementation but it's possible. Is there a chance you're making a copy of the array around each loop, causing CoW to trigger? It's sometimes possible to do that in ways that aren't immediately clear by inspecting the code, especially with helper functions.
Here's a bit of code that should demonstrate it's not at least trivially broken:
var a: ContiguousArray<Int> = []
let b = (0..<10).map { _ in Int.random(in: 0..<10) }
var oldAddr = ""
for _ in 0..<10_000 {
    a.append(contentsOf: b)
    let newAddr = a.withUnsafeMutableBufferPointer { "\($0.baseAddress!)" }
    if newAddr != oldAddr {
        print("Address changed to", newAddr, " count =", a.count)
        oldAddr = newAddr
    }
}
which on Swift 5.1 shows exponential growth:
Address changed to 0x0000000100796220 count = 10
Address changed to 0x00000001005067d0 count = 20
Address changed to 0x0000000100506930 count = 30
Address changed to 0x0000000100506a90 count = 50
Address changed to 0x0000000100800020 count = 90
Address changed to 0x0000000100800620 count = 190
Address changed to 0x0000000100801220 count = 390
Address changed to 0x0000000100802a20 count = 770
Address changed to 0x0000000100805a20 count = 1540
Address changed to 0x0000000108000020 count = 3070
Address changed to 0x0000000108010020 count = 8190
Address changed to 0x0000000108030020 count = 16390
Address changed to 0x0000000110000020 count = 32770
Address changed to 0x0000000110080020 count = 65540
edit: updated to use append(contentsOf:) since that was cited in the original post
Oh – any chance you're calling reserveCapacity before every append? i.e., in the above code, a.reserveCapacity(a.count + b.count) to make sure you have at least enough space? That would defeat the exponential growth, because the array is never allowed to over-reserve.
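To illustrate, here's a variant of the loop above with that exact-reservation pattern added (a sketch of mine, with a reallocation counter; not the original poster's code):

```swift
var a: ContiguousArray<Int> = []
let b = Array(0..<10)
var oldAddr = ""
var reallocations = 0
for _ in 0..<10_000 {
    // Reserving the exact minimum before every append leaves no headroom,
    // so the buffer has to move almost every time it fills up.
    a.reserveCapacity(a.count + b.count)
    a.append(contentsOf: b)
    let newAddr = a.withUnsafeMutableBufferPointer { "\($0.baseAddress!)" }
    if newAddr != oldAddr {
        reallocations += 1
        oldAddr = newAddr
    }
}
print(reallocations) // far more often than the ~15 moves geometric growth needs
```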
Here's something about reserveCapacity that might be easy to miss:
var a = ContiguousArray<Float>()
a.reserveCapacity(1_000_000) // <-- You want *at least* 1_000_000
print(a.capacity) // <-- It will print e.g. 1_000_440, i.e. you got some more.
That is, you cannot trust that it will reserve exactly the capacity that you ask for.
If I'm not mistaken, your code assumes that you can, and that's what's causing your issue.
The correct way to check the actual capacity is by using the array's .capacity property, not storing and using your "wished-for minimum capacity" frameCount.
This would be clearer if you had to write e.g. myArray.reserveCapacity(atLeast: frameCount).
Don't think of it as casting. That sort of syntax always indicates you're constructing a new instance of something, and with construction there may be a lot of work (like copying).
As for the core problem at hand, have you verified that the capacity you're setting is actually correct and not just "kind of close" or something like that?
@Jens Correct. The problem is that after the initial reserveCapacity call, ContiguousArray reduces its capacity to just over the count, and then re-allocates on nearly every append instead of growing geometrically, which is very expensive. It seems like the array completely forgets my initial call to reserveCapacity once I've appended to it.
@BigZaphod Got it. Yes, I've verified that the capacity I'm setting is exactly correct. Even if it were only "kind of close," I still don't see why this issue would occur. To give you an idea, the initial reserveCapacity is around 3,000,000, and the array reallocates itself after appending only 732 samples (see the image in the 4th post).
FWIW, I copied your example into a playground, with this code to exercise it:
let l = Loop(frameCount: 10_000)
let samples = Array(repeating: Float32(42), count: 10)
for _ in 0..<1000 {
    l.addSamples(samples)
}
print(l.originalSamples.count)
And also with 100_000 frames (10 * 10_000). I can't reproduce the issue. This is my output:
Address changed to 0x000000011406c020 count = 10
10000
I'm leaning towards this being a CoW issue. Something is happening before each batch of new frames is added that causes the reserved capacity to be lost. It seems like something must be grabbing a copy of the array.
@Karl Something else grabbing a copy of the array causes the original array to lose its reserved capacity? I do have a waveform renderer in a separate class grabbing a copy of the array regularly.
That sounds like the issue. IIRC, when copy-on-write gets triggered, the new Array doesn't have the same capacity as the old array.
Here's the implementation of append(contentsOf:). As you can see, it calls into reserveCapacityForAppend, which in turn calls _reserveCapacityImpl(minimumCapacity: self.count + newElementsCount, ...)
I don't think so. What's worse, just making a copy (via Array(loop.originalSamples)) wouldn't be enough either. The waveform renderer would need to get a copy and immediately mutate it (it's important that the renderer is the one to do this, because with copy-on-write, whoever writes is the one that pays for the copy).
If the renderer gets a copy and appends a bogus value to it, it should get its own copy and the original Array will keep its capacity.
Copy-on-write means the party doing the mutation (in this case, the loop) gets the new copy. The old storage, with all its excess capacity, is retained by those who only wanted a read-only view.
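Here's a sketch of that reader-mutates-first trick (the `recording` and `snapshot` names are just for illustration): if the reader copies and immediately mutates, the reader pays for the copy, and the writer's storage keeps its headroom.

```swift
var recording = ContiguousArray<Float>()
recording.reserveCapacity(1_000_000)
recording.append(contentsOf: [Float](repeating: 0, count: 10))

// Reader side (e.g. a waveform renderer): copy, then mutate right away,
// so that *this* reference is the one that pays the copy-on-write cost.
var snapshot = recording
snapshot.append(0)       // forces `snapshot` onto its own storage
snapshot.removeLast()    // undo the bogus element

// Writer side: `recording`'s storage is uniquely referenced again, so this
// append happens in place and the reserved capacity survives.
recording.append(contentsOf: [Float](repeating: 0, count: 10))
print(recording.capacity) // still >= 1_000_000
```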
Someone else grabbing the array causes the storage's reference count to become 2, and the next time you append, the array has to copy the storage first, to avoid mutating storage that is referenced in two places. That's just how copy-on-write works. (And when the storage is copied, the extra reserved capacity is not allocated again.)
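You can watch the reserved headroom disappear with a few lines (a sketch; the numbers are illustrative):

```swift
var a = ContiguousArray<Float>()
a.reserveCapacity(1_000_000)
a.append(contentsOf: [Float](repeating: 0, count: 10))
let capacityBefore = a.capacity   // >= 1_000_000: the reservation held

let reader = a                    // storage reference count is now 2
a.append(0)                       // must copy first, and the copy is sized
                                  // for the append, not the old reservation
let capacityAfter = a.capacity    // small again: the headroom is gone

print(capacityBefore, capacityAfter, reader.count)
```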