[Pitch] Add `remove(where:)` Method for `Set` and `Dictionary`

This proposal introduces a new method, remove(where:), for the Set and Dictionary data structures in Swift. The method provides a streamlined way to remove elements that satisfy a given condition, improving code readability, performance, and developer experience.

3 Likes

Don’t forget to link to the proposal.

here is the pitch: 🚀 Add remove(where:) for Set and Dictionary 🎯 by krishpranav · Pull Request #2647 · swiftlang/swift-evolution · GitHub
PR: ✨ Add remove(where:) Method to Set and Dictionary for Seamless Element Removal ✨ by krishpranav · Pull Request #78600 · swiftlang/swift · GitHub

As @scanon mentioned on GitHub, can you clarify where you would find this method to be useful? Set and Dictionary don't have a notion of order, so a method that removes the "first" that matches a predicate in iteration order would behave unpredictably. What use case can this solve?

I think what Dave wants is a removeAll(where:) method.

4 Likes

As @scanon also notes in that GH issue (beat me to it!), these operations already exist (in an inverse capacity) as

Well, sort of. filter makes a new set or dictionary, while Dave wants to remove the elements in-place. Unfortunately, filter uses the term-of-art name (while the not-in-place operation "ought to be" filtered by the naming guidelines otherwise), which doesn't leave a great name for the in-place operation (which is why I'm sort of vaguely hopeful that we can turn this into a partition instead).

7 Likes

Ah, of course. I'd missed that :see_no_evil:

Going off of Dave's

I realized I don't understand if this is supposed to be guaranteed to work. (It does work.) Is lazy "a snapshot"?

public extension Set {
  @inlinable mutating func subtract(where shouldBeRemoved: @escaping (Element) -> Bool) {
    subtract(lazy.filter(shouldBeRemoved))
  }
}

Edit: It is a snapshot:

extension Sequence {
  public var lazy: LazySequence<Self> {
    return LazySequence(_base: self)
  }
}
public struct LazySequence<Base: Sequence> {
  internal var _base: Base
  internal init(_base: Base) { self._base = _base }
}

Lazy sequence types don't support errors so the equivalent for supporting errors definitely requires "a separate collection".

@inlinable mutating func subtract<Error>(
  where shouldBeRemoved: (Element) throws(Error) -> Bool
) throws(Error) {
  // This `filter` returns a `Set`. 
  // `lazy.filter` would return an `Array`.
  do { try subtract(filter(shouldBeRemoved)) } 
  catch { throw error as! Error }
}

Off-topic as I realise this pitch is about removing all that match the predicate, but I would find a singular remove to be helpful as I often know that there will be at most one match and exiting after the first one would be more efficient.
E.g. I have a dictionary of unique objects with keys derived from the object in some way. One of the objects is then modified in a way that changes the derived key. I need to update the dictionary but I'm not sure what the old key was so I need search through it to find the object.
Such a function could be named removeOne(where:), which has precedent in e.g. databases like mongodb.
Further off-topic, I would also love a removeFirst(where:) for arrays :joy:

[edit] On-topic, +1 on the pitch! Agreed that removeAll(where:) seems a better name.

2 Likes

fwiw, IdentifiedArray may be better suited for the case where you need unique identification from a key(s) where the hash value isn't necessarily the most appropriate.

1 Like

I think that removeAll(where:) and removeFirst(where:) to be better names than what is proposed, and they both are already in the standard library in the form of naming.

Swift's Array already has removeAll(where:) and you can remove the first element based on a predicate using firstIndex(where:) and remove(at:) for sequences/collections/dictionaries (the proposal uses the first(where:) functions in its implementation, but only working with the indexes would be better IMO).

That being said, I still see value in adding the two proposed functions as they currently don't exist in the standard library for the given types, and I currently have extensions for them in my projects because they're pretty useful.

+1see my follow-up response below

Clarification: the pitch here removes specifically the first match in iteration order (and returns it). You can see this explicitly shown in both the pitch text and the PR.

If the key is dependent and entirely derived from the value, is Dictionary the right type to model this collection?

1 Like

Can you explain, perhaps, what the use case is for removing the first match that appears in iteration order for a Set or Dictionary?

as i understand it, the pattern here is to have some ‘canonical table’ like [ObjectIdentifier: ObjectModel], and some dynamic index of the objects like [ObjectName: ObjectModel] that needs to be updated when objects are renamed.

Hm, the way I read the pitch, remove is the first match for Set, but all matches for Dictionary:

[edit] But the code does appear to remove only one, so this is a bit confusing.

A Dictionary is important for efficiently fetching the items by key.

I cannot think of a valid use case in removing the first match for a Set. We can achieve that just using remove(_ member:). I do see valid use cases using removeAll(where:) for a set and I probably don't need to explain why (manipulation is more performant when working with sets).

The only valid use case for removeFirst(where:) for a dictionary that I can think of would be in data fragmentation and multiple parent/child dictionaries that hold different data, but a better data structure would invalidate that use case. However, I do see valid use cases using removeAll(where:) for a dictionary.

I rarely use a dictionary as data storage, but not having these functions makes working with them harder.

Looking at the proposal again, I do not support it 100% in its current form. I only support the implementation of removeAll(where:) for Set and Dictionary, and only removeFirst(where:) for dictionary, as neither of them are in the standard library and are pretty useful.

Right: the proposal and implementation removes only one item—the first item in iteration order—that matches the predicate. I am specifically questioning whether this is indeed something we want to add to Set and Dictionary.

I think it's a separate conversation about removeAll—it's not proposed or implemented here, that I can see.

But in your scenario for use of removeFirst(where:), the value is an object that is modified in ways that invalidate the key, such that you need to fetch the items from the collection by the value?

Is this a real-world, best-practice use case for Dictionary.removeFirst(where:)—i.e., should the standard library be vending such an API if a better data structure is what would best serve users?

I'm not entirely sure. I think it would be nice to have when working with nested dictionaries or data fragmentation using dictionaries, but that's about it. A more experienced developer would use a better data structure (at least I hope so) but I don't think this edge case should justify adding it to Swift.

You're right. If this was about removeAll I would support it.

removeAll is a well-defined operation for both Set and Dictionary. Removing the first match is undefined behavior, because "first item" is undefined for Set and Dictionary.

I accept the optimization argument, in case only one item matches the predicate. The best use case I can think of, is a table with primary and secondary keys. The primary is used as index in the Dictionary, the secondary can be used as alternative (and inefficient) identification.

I would formulate the removal of a single item slightly different, in order to make it well-defined. The method can be called removeUnique (assuming this name is available). The documentation should say that it is the responsibility of the caller to guarantee that at most one item matches the predicate, otherwise the behavior is undefined. The implementation remains as proposed (find a match, remove it, and stop searching because no other match is expected).