SE-240's CollectionDifference.Change

SE-240 introduced a new type, CollectionDifference, for ordered collection diffing.
The computed insertions and removals are expressed through an enum named Change, which looks like this:

public enum Change {
    case insert(offset: Int, element: ChangeElement, associatedWith: Int?)
    case remove(offset: Int, element: ChangeElement, associatedWith: Int?)
}

(source)

This type is rather unergonomic, as the associated values of each case are identical. Accessing the properties of a Change requires pattern matching, and not every associated value may be relevant.
The design is also problematic when iterating over the insertions: [Change] or removals: [Change] properties of CollectionDifference:

for insertion in diff.insertions {
    switch insertion {
        case .insert(offset: let offset, element: let element, associatedWith: let associatedOffset):
            /* highlight all inserted lines some color */
        case .remove(_, _, _):
            fatalError("Found a `remove` when iterating over insertions?")
}

The source even defines internal helper computed variables to avoid this.

The following design feels more natural to me:

public enum Change {
    public enum Kind {
        case insert
        case remove
    }

    public let kind: Kind
    public let offset: Int
    public let element: ChangeElement
    public let associatedOffset: Int?
}

Is there a motivating reason for Change to be structured as an enum with associated values?

1 Like

cc @numist, who wrote the original proposal. (Thanks, by the way!)

1 Like

I would suspect the rationale is similar to that of the core team in rejecting case accessors for Result:

Case accessors are a more complex question. The current proposal doesn't include case accessors, but the original proposal did, and we're anticipating a future proposal that will add automatic synthesis for case accessors. That synthesis will undoubtedly use names based on the actual case names, so at a first glance, it seems imprudent to use case names that we're not sure we'd want for the accessors. [...] More importantly, we don't want to burden this proposal because we haven't resolved those questions yet to our satisfaction.

That rationale makes sense for Result, which is necessarily a tagged union because the type of its payload differs between the two cases.

In Change, though, the enum is really just a flag describing the type of change; the related data is the same in both cases. Case accessors would make this design work, but is the ergonomic difference between switch change and switch change.kind significant enough to justify an unergonomic design absent a new future language feature?

Thanks for ccing me!

The core team didn't actually express an opinion about the enum nature of Change (and at the time I wasn't aware of any history), but there is some backstory here.

I've been working on this proposal for about 4 years, and Change was originally coded as an enum because those first versions of the proposal encoded cases that had wildly different associated values. One early version looked like:

enum Change {
  case remove(at: Int)
  case insert(element: ChangeElement, at: Int)
  case move(from: Int, to: Int)
}

This version had a number of major issues that the proposal (and discussion) goes into more deeply, but to summarize: reverting a diff requires remove be able to encode elements, and move makes it impossible to safely apply a diff with a 1-pass enumeration of its changes.

That's all history, but I think it still makes sense to use associated values for Change because, despite their matching types, offsets are not directly interchangeable between instances of insert and remove. To crib from a related conversation in the proposal thread:

In this way, insert and remove behave like different (but extremely related) types.

That leaves element as a truly shared value, and Change could have been more purely represented with a type like:

public enum Change {
    public enum Type {
        case insert(offset: Int, associatedWith: Int?)
        case remove(offset: Int, associatedWith: Int?)
    }

    public let type: Type
    public let element: ChangeElement
}

…but that's also pretty cumbersome to deal with, and is no better at serving your examples.

That all said, I am very sympathetic to the ergonomic issues and don't consider the API to be finished. The curse of writing a new API is that you can't always predict how it will be used (assuming it gets used at all :sweat_smile:), so my main goal for SE-0240 was for the idea of diffing to be incorporated into the standard library in a distilled form that reserved the most flexibility for future improvements.

That totally includes exposing the (currently internal) convenience accessors, and between the standard library and your example here I think there's a strong argument for exposing an offset convenience accessor in the future.

1 Like

Thanks Scott!

If I'm understanding you correctly, it's appropriate to keep the offsets as part of the enum cases (despite having identical types) because the semantic interpretation varies between cases.

If that's the case, it would be interesting to consider a design where the associated values are bundled into their own structs, i.e.

public enum Change {
    public struct Insertion {
        public let offset: Int
        public let element: ChangeElement
        public let associatedOffset: Int?
    }

    public struct Removal {
        public let offset: Int
        public let element: ChangeElement
        public let associatedOffset: Int?
    }

    case insert(Insertion)
    case remove(Removal)
}

This approach retains the same semantic distinction, and the helper computed variables could be defined on Change just as easily. It also comes with the following advantages:

  • Pattern matching on an instance of Change means binding only one value, which reduces noise in cases where not all properties of the payload are used.
  • The ergonomic problem of the insertions and removals properties in CollectionDifference disappears by updating the types of these properties:
public let insertions: [Change.Insertion]
public let removals: [Change.Removal]

Thus eliminating the invalid state case that currently exists when iterating over these properties.

6 Likes

Nice improvement!