Inquiry: Improved Composability of stdlib Collections?


(plx) #1

Swift’s standard library's core collections' value-like, copy-on-write behavior is handy in the small, but they compose rather poorly with each other: if you have e.g. [K:Set<V>], and plan to mutate the “inner" sets, it seems at present as if there is no way to avoid an unnecessary transient copy of each set for each mutation you perform upon it.

Various workarounds/solutions exist but all are full of unfortunate tradeoffs and/or limited further composability; in concrete cases you can do alright, but it’s hard to craft a generic solution that doesn’t inadvertently reintroduce the problem you started-out trying to solve.

Is anything currently on the roadmap to address this specifically — whether optimizer improvements, extended APIs, implementation adjustments, a `borrow` construct, or some other approach?


(Joe Groff) #2

Do you have specific examples of problems you're trying to solve?

-Joe

···

On Dec 5, 2015, at 11:34 AM, plx via swift-evolution <swift-evolution@swift.org> wrote:

Swift’s standard library's core collections' value-like, copy-on-write behavior is handy in the small, but they compose rather poorly with each other: if you have e.g. [K:Set<V>], and plan to mutate the “inner" sets, it seems at present as if there is no way to avoid an unnecessary transient copy of each set for each mutation you perform upon it.

Various workarounds/solutions exist but all are full of unfortunate tradeoffs and/or limited further composability; in concrete cases you can do alright, but it’s hard to craft a generic solution that doesn’t inadvertently reintroduce the problem you started-out trying to solve.

Is anything currently on the roadmap to address this specifically — whether optimizer improvements, extended APIs, implementation adjustments, a `borrow` construct, or some other approach?


(plx) #3

Swift’s standard library's core collections' value-like, copy-on-write behavior is handy in the small, but they compose rather poorly with each other: if you have e.g. [K:Set<V>], and plan to mutate the “inner" sets, it seems at present as if there is no way to avoid an unnecessary transient copy of each set for each mutation you perform upon it.

Various workarounds/solutions exist but all are full of unfortunate tradeoffs and/or limited further composability; in concrete cases you can do alright, but it’s hard to craft a generic solution that doesn’t inadvertently reintroduce the problem you started-out trying to solve.

Is anything currently on the roadmap to address this specifically — whether optimizer improvements, extended APIs, implementation adjustments, a `borrow` construct, or some other approach?

Do you have specific examples of problems you're trying to solve?

I'm not entirely sure if you're asking:

1. what are some specific problems with the standard library collections' composability?
2. what are some specific problems with available solutions to the problems in (1)?
3. what are some specific problems for which *performant* nested collections would be useful?

...so I'll answer (1) and (2) quickly, then provide some context for (3).

Before I'll begin, I'll preface this by re-stating what I wanted to know:

- whether or not the problems I see for (1) and (2) are seen as such
- whether or not anything is already on the roadmap to address (1) and (2)

...and in case, thanks for your time and consideration.

For (1), the problem is simply that when you nest Swift's collections -- at least within a `Dictionary`, e.g. `[K:Set<V>]` -- there's currently no direct way to do efficient, in-place mutations of the inner `Set<V>` values; performance here is essentially quadratic (see Appendix A).
  
That you get ~quadratic performance in this context isn't surprising, given the collections' design; the aspect I see as problematic is the lack of any straightforward way to do any better.

For (2), if you want e.g. all of the following:

- a collection `DictionaryOfSets` behaving ~ [K:Set<V>]
- re-using the basic collection types
- with value-like semantics (e.g. to fit in with the rest of the language)
- with better-than-quadratic performance on updates

...then the following outline seems to be the minimal amount of work to achieve all of the above:

- a class like `class Box<T> { var storage: T }` (etc.)
- a struct like `struct _DictionaryOfSets<K,V> { var storage: [K:Box<Set<V>>] }`
  - best if this implements all-or-most of the public API on `DictionaryOfSets`
  - knows how to perform a deep copy of `storage`
- a struct like `struct DictionaryOfSets<K,V> { var buffer: ManagedBuffer<_DictionaryOfSets<K,V>,Void> }`
  - takes care of checking `isUniquelyReferenced` and making deep-copies when needed
  - with ownership now assured, forwards all calls to the buffer's `_DictionaryOfSets`
  
...which is certainly tractable, but also feels rather disproportionate: to make it work we've now added 2 additional levels of indirection -- one due to the outer `ManagedBuffer`, and one more due to `Box` -- which we only need because we have no other way to manipulate the essentially-equivalent ownership-and-copying-(...etc.) behavior of `Dictionary` and `Set`.

This all feels like it could -- and should! -- be unnecessary, with e.g. some way of indicating an intent to do an in-place update if possible.

As for (3), a surprising amount of application-infrastructure code amounts to babysitting a nested collection of some kind.

As one example, if you want a generic solution to the "pool-and-reuse a bunch of expensive-to-create elements" (consider e.g. `UITableView` or `UICollectionView`), a `[Key:Set<V>]`-like collection is extremely handy:

    protocol ReusableComponentType: Equatable, Hashable {
      func prepareForPooling()
      func prepareForReuse()
      var reuseIdentifier: String { get }
    }
    
    protocol ComponentFactoryType {
      typealias Component: ReusableComponentType
      func instantiateComponent(reuseIdentifier: String) -> Component
    }
    
    class ReusableComponentPool<Component:ReusableComponentType> {
      private var factories: [String:AnyComponentFactory<Component>]
      private var poolSizeLimits: [String:Int]
      private var reusePool: DictionaryOfSets<String,Component>
      // ^ NESTED COLLECTION HERE, ~ [String:Set<Component>]
      
      func dequeueComponentForReuseIdentifier(identifier: String) -> Component {
        if let existing = self.reusePool.popFirstElementForKey(identifier) {
          existing.prepareForReuse()
        } else if let created = self.factories[identifier]?.instantiateComponent(reuseIdentifier: identifier) {
          return created
        } else {
          fatalError("No registered factory for identifier '\(identifier)'!")
        }
      }
      
      func poolComponentForPotentialReuse(component: Component) {
        self.reusePool.insertElement(
          component,
          forKey: component.identifier
        )
      }
      
      func drainOverfullPoolsIfNecessary() {
        for (identifier,sizeLimit) in self.poolSizeLimits {
          self.reusePool.shrinkSetForKey(
            identifier,
            ifAboveCount: sizeLimit
          )
        }
      }
      
    }

...but it's also a handy thing in other places (tracking active-transfers-by-host as ~`[String:Set<NSURL>]`, tracking discovered characteristics-of-services in `CoreBluetooth` as ~`[CBUUID:Set<CBUUID>]`, tracking invalidated layout attributes by supplementary-element-kind as ~`[String:Set<NSIndexPath>]`, organizing equivalence-classes for certain types of unit tests, and so on).

Note that it's not just `[K:Set<V>]`, either; sometimes `[K:[V]]` is handy, sometimes `[K:[Q:V]]` is handy (and ~ `[(K,Q):V]` isn't right); if Swift's stdlib gets an `OrderedSet` or `OrderedDictionary` there would be uses for those, as well.

Note that although in any specific use-case, you are arguably better off with a specific solution, e.g.:

    private class SingleComponentPool<Component:ReusableComponentType> {
      let factory: AnyComponentFactory<Component>
      let sizeLimit: Int?
      var pool: Set<Component>
    }
    
    class ReusableComponentPool<Component:ReusableComponentType> {
      private var pools: [String:SingleComponentPool<Component>]
    }

...instead of the earlier above, there's still arguably a utility to having a fancy collection that can be used in various contexts, with a rich supporting API.

In closing, thanks again for taking the time to read and thanks in advance for any consideration or response.

## Appendix: Concrete Benchmarks

To make the sure the performance implications are clear:

    func test10BareInserts() {
      self.measureBlock() {
        var target = Set<Int>()
        for index in 0..<10 {
          target.insert(index)
        }
      }
    }
    
    func test10NestedInserts() {
      self.measureBlock() {
        var target = [Int:Set<Int>]()
        var target[0] = Set<Int>()
        for index in 0..<10 {
          target[0]?.insert(index)
        }
      }
    }

...and so on, for e.g. `test100*`, `test1000*`, ...etc. Here's how this performs:

- 10 inserts:
  - bare: *0.000003
  - nested: *0.000008
- 100 inserts:
  - bare: *0.000012
  - nested: *0.000073
- 1000 inserts:
  - bare: *0.000081
  - nested: 0.003
- 10000 inserts:
  - bare: 0.001
  - nested: 0.431
- 100000 inserts:
  - bare: 0.009
  - nested: 41.937
- 1000000 inserts:
  - bare: 0.112
  - nested: ??? (didn't run, but > 1 hr if trend continues…)
  
…(where the un-marked values are average times, and those marked `*` are eyeballed modes over the runs, since XCTest seemingly rounds averages < 0.001 to 0.000). Tested under a Release build, FWIW.

As always whether this matters depends on problem-size and context.

···

On Dec 5, 2015, at 8:10 PM, Joe Groff <jgroff@apple.com> wrote:

On Dec 5, 2015, at 11:34 AM, plx via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

-Joe