Capturing a reference to `self`

I want to capture a reference to self, and I am having some difficulty.

Here’s an example:

struct Reference<T> {
  var getter: ()->T
  
  var value: T { return getter() }
  
  init(_ value: @escaping @autoclosure ()->T) {
    getter = value
  }
}

This type captures a reference to the value it is initialized with:

var a = [0]
let r = Reference(a)
a = [1, 2, 3]
print(r.value)        // [1, 2, 3]

The reference r sees the new, updated value of a, and it does not trigger copy-on-write. This is exactly what I want.

However, if we make a convenience member on Array for creating such a reference, it behaves differently:

extension Array {
  var reference: Reference<Array> {
    return Reference(self)
  }
}

var b = [0]
let s = b.reference
b = [1, 2, 3]
print(s.value)        // [0]

This time, the reference does trigger copy-on-write, and it does not see the new, updated value of b. This is not what I want.

Is there any way to make this work, so that I can write a.reference instead of Reference(a) and have it do the same thing?

(The difference in spelling may seem trivial, but it becomes important when conforming to protocols.)

Not really. In the first case, you're capturing a reference to the variable a; in the second case, you're capturing a reference to the constant self. There's no way to access the "dynamic scope" of the caller from inside another function body.

EDIT: you can make this clearer for yourself by dropping the @autoclosure and writing the closures explicitly.
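
A minimal sketch of that suggestion, reusing the Reference type from the question but without @autoclosure, so each capture is visible at the call site. The variable names are illustrative; the printed values match the ones reported in the question.

```swift
struct Reference<T> {
  let getter: () -> T
  var value: T { return getter() }
  init(_ getter: @escaping () -> T) { self.getter = getter }
}

extension Array {
  var reference: Reference<Array> {
    // `self` is the immutable value this getter was called on,
    // so the closure captures that value, not the caller's variable.
    return Reference({ self })
  }
}

var a = [0]
let r = Reference({ a })   // the closure captures the variable `a`
let s = a.reference        // the closure captures the value of `self`
a = [1, 2, 3]
print(r.value)             // [1, 2, 3]
print(s.value)             // [0]
```

Written this way, the two closures are { a } and { self }: the first re-reads the variable each time, while the second holds on to the value that was passed in.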

I tested it with a custom COW type. As soon as you use an instance member of a COW-optimized type, the retain count of the private reference storage is incremented, and it does not drop after you return from the reference getter. That makes the retain count 2 throughout the first example (no idea why 2 and not 1); in the second example it becomes 3 during the reference call and stays that way. The later mutation then causes a full copy because of the COW logic.

I don’t follow what you’re saying here. Could you provide a working example?

Also, how are you viewing the retain count?

That is unfortunate. :-(

(And @autoclosure was the last thing I added, just to make the call-site nicer.)

I am using this to create IndirectIndices for a MutableCollection, which allows you to mutate the collection while iterating over its indices without risk of an unwanted COW copy:

for i in IndirectIndices(x) {
  x[i] += 1
}

As far as I can tell, this works and does not make a copy.
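
The thread never shows how IndirectIndices is implemented, so the following is only a hypothetical sketch of one way it could look, assuming it stores a getter closure (like Reference above) and re-reads the caller's variable each time it advances. It is written against Collection for brevity; the names and layout are assumptions.

```swift
// Hypothetical sketch; not the actual IndirectIndices from this thread.
struct IndirectIndices<Base: Collection>: Sequence, IteratorProtocol {
  private let base: () -> Base      // re-reads the caller's variable on each call
  private var position: Base.Index

  init(_ base: @escaping @autoclosure () -> Base) {
    self.base = base
    self.position = base().startIndex
  }

  mutating func next() -> Base.Index? {
    // `current` is a temporary; it goes out of scope when next() returns,
    // before the loop body gets a chance to mutate the collection.
    let current = base()
    guard position != current.endIndex else { return nil }
    defer { position = current.index(after: position) }
    return position
  }
}

var x = [1, 2, 3]
for i in IndirectIndices(x) {
  x[i] += 1
}
print(x)  // [2, 3, 4]
```

Because the temporary lives only inside next(), the intent is that the storage is uniquely referenced again by the time x[i] += 1 runs. Whether a copy is really avoided depends on the optimizer, so treat that as something to measure rather than a guarantee.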

However, when I try to make it “for i in x.indirectIndices” (or even just indices), then the unwanted copy-on-write occurs.

If we could make this work, then DefaultIndices and Dictionary.Values.Indices and any other types where it matters could be implemented to capture the base collection by reference.

In any case, I think conceptually it should work, because self should be synonymous with the thing that the method was called on—in this case, a.

import Foundation  // for CFGetRetainCount (Apple platforms)

struct S {
  class Storage {
    var value: String
    init(value: String) {
      self.value = value
    }
  }

  var storage: Storage

  init(value: String) {
    storage = Storage(value: value)
  }

  var value: String {
    get { return storage.value }
    set {
      if isKnownUniquelyReferenced(&storage) {
        storage.value = newValue
      } else {
        storage = Storage(value: newValue)
      }
    }
  }
}

struct Reference<T> {
  var getter: () -> T
  var value: T {
    return getter()
  }
  init(_ value: @escaping () -> T) {
    getter = value
  }
}

extension S {
  var reference: Reference<S> {
    return Reference({ self })
  }

  func foo() {
    print(CFGetRetainCount(storage))
  }
}

var a = S(value: "swift")
print(CFGetRetainCount(a.storage)) // 2
a.foo()                            // 3
let r = Reference({ a })
print(CFGetRetainCount(a.storage)) // 2
a.value = "rust"
print(CFGetRetainCount(a.storage)) // 2
print(r.value.value)               // "rust"

var b = S(value: "swift")
print(CFGetRetainCount(b.storage)) // 2
b.foo()                            // 3
// `l` holds one more reference than `r` does in the first variant.
// That gives the inner storage a retain count of 3, which forces
// a copy when the COW-optimized setter runs.
let l = b.reference
print(CFGetRetainCount(b.storage)) // 3 🧐
b.value = "rust"
print(CFGetRetainCount(b.storage)) // 2
print(l.value.value)               // "swift"

It just doesn't make sense to "capture a collection by reference". You're capturing the variable the collection is stored in and re-accessing it every time. Which is fine, but which is never something you can do from a member of the type stored in that variable, because you've already gotten the value out of the variable to access the member.

(I see the use case here, but it's not something Swift already has, and it seems subtly dangerous anyway. Consider what would happen if your loop inserted at an index instead of just changing an individual value.)

First of all, “inserting” is not a valid operation on MutableCollection, so that could not happen in generic code without additional constraints.

Second, what would happen is, in order to increment the index, the IndirectIndices instance would access the collection (which has been mutated) through its reference, and ask the collection to form the next valid index after the current one. This should succeed, and the loop will continue over the valid indices of the collection.

Contrast that with the existing attractive nuisance:

If you are iterating over indices that retain the collection, and the loop inserts a value, then the collection’s copy-on-write triggers and now the loop is using indices from one collection (the original, pre-mutation instance) to index into another collection (the new, silently-created copy). This can fail badly.

Edit:

In fact, inserting or removing elements while looping over indices can be problematic even for collections where indices is trivial:

var a = [1, 2, 3]
for i in a.indices {
  a.remove(at: i)     // Fatal error: Index out of range
}

Obviously that particular loop is terrible for many reasons, but the point remains—if you do in fact need to perform nontrivial modifications to a collection, then looping over IndirectIndices will work as expected, with no extra copies and no crashes, whereas indices won’t.

The more I think about it, the less I see any use-case for indices in generic code at all. If you’re not mutating the collection, then you can just loop over its values. And if you are mutating, then you really ought to follow the documented recommendation and manually advance an index.
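
For reference, here is a minimal sketch of that manual-advance pattern in generic code. The function name incrementEach is hypothetical; startIndex, endIndex and formIndex(after:) are the standard Collection requirements.

```swift
// Hypothetical helper illustrating the manual-advance pattern.
func incrementEach<C: MutableCollection>(_ c: inout C) where C.Element == Int {
  var i = c.startIndex
  while i != c.endIndex {
    c[i] += 1                 // in-place mutation; no separate indices value is kept alive
    c.formIndex(after: &i)    // ask the collection itself for the next index
  }
}

var numbers = [1, 2, 3]
incrementEach(&numbers)
print(numbers)  // [2, 3, 4]
```

Nothing besides the collection variable itself holds on to the collection here, which is why this pattern sidesteps the extra copy.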

I just don’t see the purpose of indices, since the one thing it looks like it would be good at is actually a pitfall.

Are there any important uses of indices in the standard library?

But "the current one" may no longer be valid after insertion. A RangeReplaceableCollection is, in general, allowed to invalidate all indexes on insertion.

The point of indices is for things like searching, where you need to produce an index at the end rather than a value.
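
A small sketch of that kind of search, where the result is an index rather than an element. The helper indexOfFirst is hypothetical and only mirrors what the standard firstIndex(where:) already does.

```swift
// Hypothetical helper; the standard library's firstIndex(where:) covers this case.
func indexOfFirst<C: Collection>(
  in c: C,
  where predicate: (C.Element) -> Bool
) -> C.Index? {
  // Walk the indices so that the matching position can be returned.
  for i in c.indices where predicate(c[i]) {
    return i
  }
  return nil
}

let words = ["swift", "rust", "go"]
if let hit = indexOfFirst(in: words, where: { $0.count == 4 }) {
  print(words[hit])  // rust
}
```

The collection is not mutated during the loop, so the copy-on-write concern from earlier in the thread does not arise here.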

…but even the recommended approach of manually advancing an index has that problem:

var a = [1, 2, 3]
var i = a.startIndex

while i != a.endIndex {
  if a[i].isMultiple(of: 2) { a = [] }
  a.formIndex(after: &i)
}

Programmers need to be careful about invalidating indices any time they make non-trivial mutations to a collection. That isn’t going to change.

I’m trying to make a convenient way to iterate over indices without triggering copy-on-write when making simple in-place mutations. And it works, but it has to be spelled IndirectIndices(a) rather than a.indices.

Probably the best solution in this space is to introduce a mutateEach method though.
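
No such method exists in the standard library as far as this thread shows, so the following is only a sketch of what a mutateEach could look like; the name and signature are assumptions.

```swift
extension MutableCollection {
  // Hypothetical API: visits every element in place through an inout parameter.
  mutating func mutateEach(_ body: (inout Element) throws -> Void) rethrows {
    var i = startIndex
    while i != endIndex {
      try body(&self[i])
      formIndex(after: &i)
    }
  }
}

var values = [1, 2, 3]
values.mutateEach { $0 += 1 }
print(values)  // [2, 3, 4]
```

With this shape the caller never handles indices at all, so there is no separate indices value that could keep the collection alive during the mutation.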

Ah yes, that makes sense. It’s useful for non-mutating operations that produce an Index. Thanks.