ArraySlice creates undesired copies

pronebird · August 26, 2022, 7:00am

ArraySlice breaks the promise of being a view into a larger array. Instead, it creates copies. Consider the following example:

func test(_ slice: inout ArraySlice<UInt8>) {
    var payload = slice.dropFirst(1)

    payload.withUnsafeMutableBytes { bufferPointer in
        bufferPointer.storeBytes(of: 8, as: UInt8.self)
    }

    print("payload: \(payload)")
}

var buffer = [UInt8](repeating: 0, count: 10)
test(&buffer[5...])

print("buffer = \(buffer)")

slice.dropFirst(1) returns ArraySlice<UInt8>, yet the buffer remains unchanged after the mutation. Why doesn't payload point to the same underlying storage?
This applies to all methods in the same family, e.g. prefix, suffix, drop, etc.
It's also unclear when the copying occurs and how dropFirst(1) differs from slice[slice.index(after: slice.startIndex)...].
It looks like once you define a variable with var, the whole thing is copied, either immediately or upon mutation; I haven't verified which. Does the same happen with a let variable?
How does one make a reference to avoid going through the same variable, especially inconvenient when offsets are involved? Drop to raw pointers and YOLO? I don't see any way to reference a mutable subrange.

lukasa · August 27, 2022, 9:05am

This isn't quite true, but I can see how it feels that way.

Because you made a copy.

This is not entirely obvious, but once you bring payload into existence there are now two copies of the array: one in the variable called buffer (and aliased as slice within func test), one in the variable called payload. When you subsequently write into the variable payload, you trigger the copy-on-write pattern, which copies buffer into payload and then mutates it.

You can fix this by continuing to operate directly on slice:

func test(_ slice: inout ArraySlice<UInt8>) {
    slice = slice.dropFirst(1)

    slice.withUnsafeMutableBytes { bufferPointer in
        bufferPointer.storeBytes(of: 8, as: UInt8.self)
    }

    print("slice: \(slice)")
}

var buffer = [UInt8](repeating: 0, count: 10)
test(&buffer[5...])

print("buffer = \(buffer)")

The copy occurs at the point where you call withUnsafeMutableBytes, as this is your first mutating access to the variable payload. dropFirst(1) is exactly identical to slice[slice.index(after: slice.startIndex)...], and you could replace it in your code without changing the behaviour.
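As a quick check of that equivalence, here is a minimal sketch (the variable names `dropped` and `subscripted` are only illustrative). It compares the two expressions on a slice of the same ten-byte buffer; neither one is mutated, so no copy is triggered here.

```swift
let buffer = [UInt8](repeating: 0, count: 10)
let slice = buffer[5...]

// Both expressions yield an ArraySlice<UInt8> that keeps the base array's indices.
let dropped = slice.dropFirst(1)
let subscripted = slice[slice.index(after: slice.startIndex)...]

print(dropped.indices)         // 6..<10
print(subscripted.indices)     // 6..<10
print(dropped == subscripted)  // true
```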

let variables cannot be mutated at all, and so will never trigger the copy-on-write.
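A small sketch of the difference, using made-up names (`original`, `readOnly`, `writable`): the let slice can only be read, while the var slice becomes an independent value the moment it is written to.

```swift
let original: [UInt8] = [1, 2, 3, 4]

// A `let` slice can only be read, so there is nothing that could trigger a copy.
let readOnly = original[1...]

// A `var` slice is an independent value; its first mutation
// leaves `original` (and `readOnly`) untouched.
var writable = original[1...]
writable[writable.startIndex] = 9

print(original)  // [1, 2, 3, 4]
print(readOnly)  // [2, 3, 4]
print(writable)  // [9, 3, 4]
```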

Don't create the copy variable. See above.

Karl · August 27, 2022, 12:26pm

Another way to express what @lukasa is saying is that even slices have value semantics.

The variable slice is provided to the function inout, but you never change its value, you only change the value of payload. To demonstrate, add this line to the end of the function:

slice = payload

And I believe it should do what you expect:

payload: [8, 0, 0, 0]
buffer = [0, 0, 0, 0, 0, 8, 0, 0, 0]

(Of course, that may not be the most efficient way to do it, as you may see intermediate copies; you're probably better off using slice directly, but this is the semantic reason why the code doesn't have the result you expect.)

pronebird · August 27, 2022, 2:17pm

This would drop the first element from buffer and this is not desired. The array should maintain the same length.

The idea is to reference a subrange within the same buffer and be able to modify it (without addition or removal).

dropFirst(1) should literally bump startIndex by 1, and it even returns ArraySlice, but once mutated the ground beneath slips away and suddenly ArraySlice allocates new storage.

I understand Swift semantics but they are getting really twisted especially when working with views that are supposed to be linked to the same storage.

Karl · August 27, 2022, 2:30pm

Oh, that's a good point; I forgot about the dropFirst() changing the length.

Slices are values, not references. Since they are values, they are independent collections.

The concept of a "view" does not really exist in the language, but we use that term to refer to values which might share their underlying storage with another variable (rather than being a completely fresh Array or whatever). They cannot have shared mutable state, so mutations cannot happen in-place if the storage is shared.

Admittedly, this can be a little counter-intuitive if you are expecting them to be references or behave like classes in other languages. You should think of them more like independent values.

Just like other value types, if they have a uniquely-referenced storage, the mutation can happen in-place. But I would suggest that that is more of an implementation detail and not necessary to understand the high-level semantics.

jrose · August 27, 2022, 6:10pm

To add to that, the place where the original Array gets mutated is when the slicing subscript setter is called; the place where that happens is after the function call ends and the inout is…concluded, for lack of a better term. Everything up to that is mutating the independent value-typed ArraySlice (which itself is copied).

I don’t think it’s possible to build a safe “view” on a value-typed collection without move-only types, but perhaps we’ll be able to explore that in the future. Meanwhile, if you want to mutate the original Array, you either have to inout all the way down (which, admittedly, the current Collection APIs do not make easy; you want &slice[offset: 1...]), or pass the base collection and the indexes separately.
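Both options can be sketched roughly as follows. The helper names (`store`, `writeFirst`, `test`) and the variables `first` and `second` are hypothetical, chosen only for illustration; the sketch sticks to standard subscripts and keeps the array's length unchanged.

```swift
// Option 1: pass the base array and the range separately.
func store(_ value: UInt8, in buffer: inout [UInt8], range: Range<Int>) {
    for index in range {
        buffer[index] = value
    }
}

var first = [UInt8](repeating: 0, count: 10)
store(8, in: &first, range: 6..<8)
print(first)   // [0, 0, 0, 0, 0, 0, 8, 8, 0, 0]

// Option 2: inout all the way down, never binding the sub-slice to a new variable.
func writeFirst(_ slice: inout ArraySlice<UInt8>) {
    slice[slice.startIndex] = 7
}

func test(_ slice: inout ArraySlice<UInt8>) {
    // Compute the index first, then pass the sub-slice inout.
    let next = slice.index(after: slice.startIndex)
    writeFirst(&slice[next...])
}

var second = [UInt8](repeating: 0, count: 10)
test(&second[5...])
print(second)  // [0, 0, 0, 0, 0, 0, 7, 0, 0, 0]
```

In both cases the array keeps its ten elements, because the bounds of the slice that is written back are never changed.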

lukasa · August 28, 2022, 6:35am

Just to absolutely clarify, the code I rewrote to use ArraySlice does produce the expected behaviour:

slice: [8, 0, 0, 0]
buffer = [0, 0, 0, 0, 0, 8, 0, 0, 0]

Mutating the bounds of the ArraySlice has no impact on the original array, as the bounds modification applies only to the ArraySlice that is torn down immediately after.