SE-240's CollectionDifference.Change

mpangburn · March 27, 2019, 2:51am

SE-240 introduced a new type, CollectionDifference, for ordered collection diffing.
The computed insertions and removals are expressed through an enum named Change, which looks like this:

public enum Change {
    case insert(offset: Int, element: ChangeElement, associatedWith: Int?)
    case remove(offset: Int, element: ChangeElement, associatedWith: Int?)
}

(source)

This type is rather unergonomic, as the associated values of each case are identical. Accessing the properties of a Change requires pattern matching, and not every associated value may be relevant.
The design is also problematic when iterating over the insertions: [Change] or removals: [Change] properties of CollectionDifference:

for insertion in diff.insertions {
    switch insertion {
        case .insert(offset: let offset, element: let element, associatedWith: let associatedOffset):
            /* highlight all inserted lines some color */
        case .remove(_, _, _):
            fatalError("Found a `remove` when iterating over insertions?")
}

The source even defines internal helper computed variables to avoid this.

The following design feels more natural to me:

public enum Change {
    public enum Kind {
        case insert
        case remove
    }

    public let kind: Kind
    public let offset: Int
    public let element: ChangeElement
    public let associatedOffset: Int?
}

Is there a motivating reason for Change to be structured as an enum with associated values?

mpangburn · March 27, 2019, 2:57am

cc @numist, who wrote the original proposal. (Thanks, by the way!)

xwu · March 27, 2019, 3:06am

mpangburn:

This type is rather unergonomic, as the associated values of each case are identical. Accessing the properties of a Change requires pattern matching, and not every associated value may be relevant.
The design is also problematic when iterating over the insertions: [Change] or removals: [Change] properties of CollectionDifference:
for insertion in diff.insertions {
    switch insertion {
        case .insert(offset: let offset, element: let element, associatedWith: let associatedOffset):
            /* highlight all inserted lines some color */
        case .remove(_, _, _):
            fatalError("Found a `remove` when iterating over insertions?")
}
The source even defines internal helper computed variables to avoid this.

I would suspect the rationale is similar to that of the core team in rejecting case accessors for Result:

Case accessors are a more complex question. The current proposal doesn't include case accessors, but the original proposal did, and we're anticipating a future proposal that will add automatic synthesis for case accessors. That synthesis will undoubtedly use names based on the actual case names, so at a first glance, it seems imprudent to use case names that we're not sure we'd want for the accessors. [...] More importantly, we don't want to burden this proposal because we haven't resolved those questions yet to our satisfaction.

mpangburn · March 27, 2019, 3:19am

That rationale makes sense for Result, which is necessarily a tagged union because the type of its payload differs between the two cases.

In Change, though, the enum is really just a flag describing the type of change; the related data is the same in both cases. Case accessors would make this design work, but is the ergonomic difference between switch change and switch change.kind significant enough to justify an unergonomic design absent a new future language feature?

numist · March 27, 2019, 8:58pm

Thanks for ccing me!

The core team didn't actually express an opinion about the enum nature of Change (and at the time I wasn't aware of any history), but there is some backstory here.

I've been working on this proposal for about 4 years, and Change was originally coded as an enum because those first versions of the proposal encoded cases that had wildly different associated values. One early version looked like:

enum Change {
  case remove(at: Int)
  case insert(element: ChangeElement, at: Int)
  case move(from: Int, to: Int)
}

This version had a number of major issues that the proposal (and discussion) goes into more deeply, but to summarize: reverting a diff requires remove be able to encode elements, and move makes it impossible to safely apply a diff with a 1-pass enumeration of its changes.

That's all history, but I think it still makes sense to use associated values for Change because, despite their matching types, offsets are not directly interchangeable between instances of insert and remove. To crib from a related conversation in the proposal thread:

SE-0240: Ordered Collection Diffing

That sounds like a good solution! Do we then infer reloads from insert/remove associated with the same offset ?

Not exactly, but you're on the right track! "Position" and "offset" are extremely related concepts, but offsets are vulnerable to shifting caused by insertions and removals of earlier elements. This is the most common reason people have trouble with representing (and using) diffs.

Ultimately to determine if an element moved from its start and end offsets, you need to take into account the number of insertions before its final offset minus the number of removes before its original offset (this is one reason why the diff type exposes sorted insertions and removals properties—they allow people to answer the question of position sameness in logarithmic time).

In this way, insert and remove behave like different (but extremely related) types.

That leaves element as a truly shared value, and Change could have been more purely represented with a type like:

public enum Change {
    public enum Type {
        case insert(offset: Int, associatedWith: Int?)
        case remove(offset: Int, associatedWith: Int?)
    }

    public let type: Type
    public let element: ChangeElement
}

…but that's also pretty cumbersome to deal with, and is no better at serving your examples.

That all said, I am very sympathetic to the ergonomic issues and don't consider the API to be finished. The curse of writing a new API is that you can't always predict how it will be used (assuming it gets used at all ), so my main goal for SE-0240 was for the idea of diffing to be incorporated into the standard library in a distilled form that reserved the most flexibility for future improvements.

That totally includes exposing the (currently internal) convenience accessors, and between the standard library and your example here I think there's a strong argument for exposing an offset convenience accessor in the future.

mpangburn · March 27, 2019, 10:01pm

Thanks Scott!

If I'm understanding you correctly, it's appropriate to keep the offsets as part of the enum cases (despite having identical types) because the semantic interpretation varies between cases.

If that's the case, it would be interesting to consider a design where the associated values are bundled into their own structs, i.e.

public enum Change {
    public struct Insertion {
        public let offset: Int
        public let element: ChangeElement
        public let associatedOffset: Int?
    }

    public struct Removal {
        public let offset: Int
        public let element: ChangeElement
        public let associatedOffset: Int?
    }

    case insert(Insertion)
    case remove(Removal)
}

This approach retains the same semantic distinction, and the helper computed variables could be defined on Change just as easily. It also comes with the following advantages:

Pattern matching on an instance of Change means binding only one value, which reduces noise in cases where not all properties of the payload are used.
The ergonomic problem of the insertions and removals properties in CollectionDifference disappears by updating the types of these properties:

public let insertions: [Change.Insertion]
public let removals: [Change.Removal]

Thus eliminating the invalid state case that currently exists when iterating over these properties.

mattrips · March 28, 2019, 7:10am

Nice improvement!

malhal · May 24, 2024, 12:01pm

I know this was long ago but I was just wondering if at the time you knew that diff is actually a collection itself so you can simply do:
for change in diff {
Which resolves the issue of .remove not being a valid case for insertions.

FYI it is missing from the current version of the docs for some reason.

github.com

numist/swift/blob/e89b53409f44e7c156748665ecaa382ecf8efdd9/stdlib/public/core/CollectionDifference.swift#L216


      
                switch c {
                  case .remove(let o, let e, let a):
                    return .insert(offset: o, element: e, associatedWith: a)
                  case .insert(let o, let e, let a):
                    return .remove(offset: o, element: e, associatedWith: a)
                }
              })
            }
          }
          
          /// A CollectionDifference is itself a Collection.
          ///
          /// The enumeration order of `Change` elements is:
          ///
          /// 1. `.remove`s, from highest `offset` to lowest
          /// 2. `.insert`s, from lowest `offset` to highest
          ///
          /// This guarantees that applicators on compatible base states are safe when
          /// written in the form:
          ///
          /// ```

taylorswift · May 27, 2024, 2:41pm

@theMomax is it expected that DocC can associate doccomment blocks on extensions to types in the same module? if i remember correctly, this documentation is only preserved for extensions to out-of-module types.

theMomax · May 27, 2024, 9:47pm

The conformance to Collection is listed under Conforms To Relationships. Only the doc comment above that extension block that states the conformance is not included. As far as I am aware that is the expected behavior for extensions to types in the same module, yes.