SE-0256: Introduce {Mutable}ContiguousCollection protocol

tkremenek · April 3, 2019, 6:44am

The review of SE-0256: Introduce {Mutable}ContiguousCollection protocol begins now and runs through April 12, 2019.

The proposal is written by @Ben_Cohen .

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to me as the review manager via email or direct message on the forums. If you send me email, please put "SE-0256" somewhere in the subject line.

What goes into a review of a proposal?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift.

When reviewing a proposal, here are some questions to consider:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Thank you for contributing to Swift!

Ted Kremenek
Review Manager

benrimmington · April 5, 2019, 1:14am

Could the new protocols also add withUnsafeBytes(_:) and withUnsafeMutableBytes(_:) methods?

This would be similar to the Swift._HasContiguousBytes and Foundation.ContiguousBytes protocols. Foundation also conforms EmptyCollection and CollectionOfOne.

Users could also require Bytes: ContiguousCollection where Bytes.Element == UInt8 for safety.

Ben_Cohen · April 5, 2019, 1:22am

If you have a typed pointer/buffer, you can trivially turn it into a raw pointer/buffer.

This means withUnsafeBytes could be added as a protocol extension (i.e. it wouldn't need to be a customization point on the protocol). However, I don't think this is a good idea. My personal opinion is given how trivial it is to convert from one to the other, it was a mistake to add methods to Array to do this, as it clutters the API surface without adding sufficient benefit. I would rather not perpetuate that on any new contiguously stored types in future.

This is different from Data and ContiguousBytes, which only provide untyped buffers.

You make a good point about adding the conformance to CollectionOfOne and EmptyCollection. I should have included that in this proposal.

benrimmington · April 5, 2019, 1:55am

A generic method in a low-level library (which can't import Foundation) might want to support:

Array<UInt8>, ArraySlice<UInt8>, etc.
ByteBuffer from SwiftNIO.
Data from Foundation.

Is there a way to do this, with guaranteed access to contiguous storage?

Ben_Cohen · April 5, 2019, 2:11am

No. To do that they would need a ContiguousBytes-like protocol in the standard library. Which might make for a reasonable proposal, but it would be a different proposal.

This proposal cannot address that goal, because Data cannot provide a typed pointer so could not conform to the protocols being proposed.

anandabits · April 5, 2019, 2:15am

+1. This looks like a nice addition to the language. I’m a big proponent of using the type system to get static semantic guarantees.

I agree with the conformance of `Array. The exception for bridged arrays of objects is an extreme edge case. Further, such arrays are even more unlikely to be used with the API provided by these protocols. Clear documentation of this such as that included in the proposal is sufficient.

Douglas_Gregor · April 11, 2019, 12:57am

I'm -1 on this proposal, because I don't feel that the use cases warrant the expansion of the Collection protocol hierarchy.

The Collection protocol hierarchy is a nontrivial protocol hierarchy that confronts users fairly early in their use of the Swift standard library. Each protocol in the hierarchy brings with it a wealth of algorithms and some easily-describable set of conforming types. Because it's so visible and already fairly complex, there should be a very high bar to extending it with new protocols. The use cases in this proposal aren't compelling enough to meet that high bar.

The primary example in the proposal shows how cumbersome it is to use withContiguousStorageIfAvailable as motivation for {Mutable}ContiguousCollection:

func dspAdd<A: Collection, B: Collection>(
  _ a: A, _ b: B, _ result: inout [Float]
) where A.Element == Float, B.Element == Float {
  let n = a.count
  // try accessing contiguous underlying buffers:
  let wasContiguous: ()?? =
    a.withContiguousStorageIfAvailable { abuf in
      b.withContiguousStorageIfAvailable { bbuf in
        vDSP_vadd(abuf.baseAddress!, 1, bbuf.baseAddress!, 1, &result, 1, UInt(n))
      }
  }
  // if they weren't contiguous, create two arrays try again
  if wasContiguous == nil || wasContiguous! == nil {
    dspAdd(Array(a), Array(b), &result)
  }
}

However, we can do a lot better for the cases where we want to realize storage by adding a few simple extensions, e.g.,

extension Collection {
  func withForcedContiguousStorage<R>(
    _ body: (UnsafeBufferPointer<Element>) -> R
  ) throws -> R {
    if let result = withContiguousStorageIfAvailable(body) {
      return result
    }

    return Array(self).withContiguousStorageIfAvailable(body)!
  }
}

extension MutableCollection {
  mutating func withForcedContiguousMutableStorage<R>(
    _ body: (inout UnsafeMutableBufferPointer<Element>) -> R
  ) throws -> R {
    if let result = withContiguousMutableStorageIfAvailable(body) {
      return result
    }

    var array = Array(self)
    let result = array.withContiguousMutableStorageIfAvailable(body)!
    for (myIdx, arrayIdx) in zip(self.indices, array.indices) {
      self[myIdx] = array[arrayIdx]
    }
    return result
  }
}

The dspAdd example is basically the same as with the proposed solution, but without the type system there to enforce contiguousness:

func dspAdd<A: Collection, B: Collection, R: MutableCollection>(
  _ a: A, _ b: B, _ result: inout R
) where A.Element == Float, B.Element == Float, R.Element == Float {
  let n = a.count
  a.withForcedContiguousStorage { abuf in
    b.withForcedContiguousStorage { bbuf in
      result.withForcedContiguousMutableStorage { rbuf in
        vDSP_vadd(abuf.baseAddress!, 1, bbuf.baseAddress!, 1, rbuf.baseAddress!, 1, UInt(n))
      }
    }
  }
}

Unlike in the proposal, this version will "work" when given a noncontiguous collection, in the sense that it will provide the correct semantics. However, performance will be poor due to the extra copies incurred. This is not that much different from when copy-on-write causes performance surprises because the collection being referenced is not unique. Neither situation is ideal, but on balance we benefit from having the simpler semantics, and performance-minded programmers can ensure that the passed-in arrays are contiguous if the problem manifests in production. A performance-critical API like vDSP_add could choose to log (or even trap!) at runtime to guide programmers along the most-efficient path, without pushing the guarantee into the type system.

It is reasonable to point out that introducing the protocols makes the "contiguous" contract stronger (it does!), but that point is made weaker because array isn't always contiguous, and there are a number of other data structures (like String and Data) that are often-but-not-always contiguous. Designing an API to require ContiguousCollection excludes such data structures, which is unfortunate given that developers are getting explicit control to make a String instance contiguous.

Doug

tkremenek · April 18, 2019, 4:59am

Proposal Rejected

The Core Team did not feel the motivation behind the proposal warranted expanding the Collection protocol hierarchy. The Collection hierarchy is already quite rich, and any motivation to expanding it in the Standard Library needs to meet a high bar in general utility. At this time the proposed change was deemed too niche to motivate that change.

Thank you to everyone who participated in the review of this proposal!