Making collection wrappers mutable?

nnnnnnnn · December 22, 2023, 11:39pm

Inspired by this issue, I'm curious what people think of making the collection wrappers in Swift Algorithms conform to MutableCollection when possible:

Is there a specific reason why Chain2Sequence does not conform to MutableCollection when Base1 and Base2 do?

I'm trying to build an API that uses two some MutableCollection<T> as parameters and one as return value. I need it to be mutable, because I need to change a specific element in the collection. Internally, the parameters will be chained using chain(_:_:) - or at least that's the goal.

Standard library collection wrappers, which the Algorithms package tries to follow, typically don't have mutating conformances. For example, ReversedCollection doesn't conform to MutableCollection, even though it does conditionally conform to RandomAccessCollection when its underlying collection does.

From a performance standpoint, allowing mutations through Chain2Sequence could be a win, since only one of the underlying collections would get copied, and after that things would be unique. In theory, we could even support RangeReplaceableCollection conformance.

To my eye, the biggest drawback is that people might expect changes in some of the collection wrappers to be visible in the original collection (ReversedCollection in particular). Are there other pros or cons that I'm not seeing?

bbrk24 · December 23, 2023, 12:12am

What happens if you insert something in between the original collections?

nnnnnnnn · December 23, 2023, 4:16am

I think we’d just append to the first collection, as that’s the more efficient operation in the common case. That said, since the internals of the chain type and its index are internal, whatever implementation decisions we make here wouldn’t really be externally visible.

ffried · December 23, 2023, 1:16pm

Thanks @nnnnnnnn for writing down your thoughts here!

As the author of the GitHub issue, I'm (obviously) in favor of adding the conformance.

To my eye, the biggest drawback is that people might expect changes in some of the collection wrappers to be visible in the original collection (ReversedCollection in particular).

Interesting point. However, I find it hard to come up with code that I could see people expecting the original to be modified. The only place where this is currently possible is with subsequences - but only when going through the subscript-getter for ranges.
When modifying a subsequence stored in a separate var, the original also stays unmodified:

var arr = [1, 2, 3, 4]
var subArr = arr[2...3]
subArr[subArr.startIndex] *= 2
assert(arr == [1, 2, 3, 4])

I think we’d just append to the first collection, as that’s the more efficient operation in the common case. That said, since the internals of the chain type and its index are internal, whatever implementation decisions we make here wouldn’t really be externally visible.

I thought about RangeReplaceableCollection as well. I'm a bit more reluctant here, though. While the implementation is internal, there might be cases where e.g. it does matter which collection gets extended when an element is inserted. For example because one of the two collections has a more efficient storage than the other.
Also, what if someone implemented a RangeReplaceableCollection using reference types? Then the decision would be externally visible - and become harder to justify.
Whereas when modifying elements (via MutableCollection) it's always clear which sequence an element belongs to.

LucianoPAlmeida · February 18, 2024, 1:46am

A bit late to the discussion, but I would worry about the maintaining of the state of the modified wrapper collection as the original underlying collection wouldn't change, the state of mutations such as modified a value will need to be kept at the wrapper collection so in a case for example of a ReversedCollection where all elements where modified, the collection wrapper would have to have an internal storage of the same size of the original collection to keep track of the modifications which comes with a cost.
But I think that would be the only con I thought about.

nnnnnnnn · February 20, 2024, 11:06pm

The copies would be separate – changes you make to e.g. a ReversedCollection wouldn't be visible in the original collection, so there wouldn't be "extra" storage required in the wrapper type beyond the wrapped collection. That wrapped collection's storage would be copied on the first mutation, and then be separate from the original collection that was reversed.

Something like this:

let myArray: [Int] = [1, 2, 3, 4, 5, 6]
var reversed = myArray.reversed()
if let firstEven = reversed.firstIndex(where: { $0.isMultiple(of: 2) }) {
    reversed[firstEven] *= 10    // storage is copied here
}
print(Array(reversed))
// [60, 5, 4, 3, 2, 1]
print(myArray)
// [1, 2, 3, 4, 5, 6]

LucianoPAlmeida · February 20, 2024, 11:24pm

Hi @nnnnnnnn

Sorry, I think my comment was not very detailed and didn't explain very well the point I was trying to convey

That was in some way my point, because now non-mutating is efficient because it takes advantage of Cow not needing a copy in the wrapper ReversedCollection but after the first mutation the wrapper now has a new copy of underlying, or what I was thinking it was another way of storage just to keep track of mutation inside the wrapper collection, which either way has the same underlying storage than base.

In your example, we see that another 6 element array was copied in the wrapper after mutation so there is now 2 arrays in memory. So my comment was more on the fact that the wrapper which use to be efficient with arrays for example due to CoW, would be possible expensive copies after mutation because in someway wrapper now has to keep track of mutations.

Is not possible to have then mutating without having an underlying copy of base, which may be not an efficient in place mutation operation. In my understanding a simple reversed[0] = 2 would be a linear operation because of the underlying copy in first mutation.