`init(capacity:)` on RangeReplaceableCollection

Nevin · October 28, 2018, 5:21pm

Introduction

This pitch is for a small convenience to create arrays with a specified capacity:

var x = [Int](capacity: n)

Motivation

The RangeReplaceableCollection protocol provides an empty initializer and a reserveCapacity method, and it is extremely common to use them together. Here are just a few examples from the standard library:

// RuntimeFunctionCounters.swift
var functionNames : [String] = []
functionNames.reserveCapacity(numRuntimeFunctionCounters)

// Sequence.swift
var result = ContiguousArray<T>()
result.reserveCapacity(initialCapacity)

// String.swift
var a: [TargetEncoding.CodeUnit] = []
a.reserveCapacity(targetLength + 1)

// StringUTF8View.swift
var result = ContiguousArray<CChar>()
result.reserveCapacity(utf8.count + 1)

Although these are only two lines each, that is still twice as many lines as necessary. It is a small amount of boilerplate, but it is boilerplate nonetheless and it adds up.

Solution

To streamline code like the above, this proposal adds the following initializer to the standard library:

extension RangeReplaceableCollection {
  @inlinable
  public init(capacity: Int) {
    self.init()
    self.reserveCapacity(capacity)
  }
}

This allows the previous examples to be written as:

// RuntimeFunctionCounters.swift
var functionNames = [String](capacity: numRuntimeFunctionCounters)

// Sequence.swift
var result = ContiguousArray<T>(capacity: initialCapacity)

// String.swift
var a = [TargetEncoding.CodeUnit](capacity: targetLength + 1)

// StringUTF8View.swift
var result = ContiguousArray<CChar>(capacity: utf8.count + 1)
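
For anyone who wants to try the spelling before anything lands in the standard library, here is a minimal sketch that declares the pitched initializer locally. The `capacity:` label is the pitch's spelling, not an existing API, and `names` is just an illustrative variable.

```swift
// Local sketch of the pitched initializer; not part of the standard library.
extension RangeReplaceableCollection {
    init(capacity: Int) {
        self.init()
        self.reserveCapacity(capacity)
    }
}

var names = [String](capacity: 8)
names.append("swift_retain")

print(names.count)          // 1
print(names.capacity >= 8)  // true: Array reserves at least the requested amount
```

The same extension covers ContiguousArray and any other conforming type, because it only relies on the two existing protocol requirements.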

Other types

Dictionary and Set do not conform to RangeReplaceableCollection; however, they do implement init(minimumCapacity:). To align the spelling, we could rename these existing initializers so they become init(capacity:) as well. This is not a core part of the proposal, but is mentioned here for completeness.

xwu · October 28, 2018, 5:38pm

Bikeshedding point: I think this would be most accurately "minimumCapacity" as it is on existing types, as the underlying capacity meets or exceeds the requested amount.

Other than that I think this sounds pretty reasonable.

Nevin · October 28, 2018, 5:59pm

I thought about that, and it basically comes down to two things:

1. The existing method that this wraps is reserveCapacity, not reserveMinimumCapacity

2. This is a convenience feature, so it should be convenient

xwu · October 28, 2018, 6:06pm

The term "reserve" (to me at least) implies minimum, and I think the name is accurate. When I reserve dinner for four people at a restaurant, I am not taken aback when the table is big enough for six.

On the other hand, since Swift uses the term "capacity" to refer to the amount allocated, an initializer labeled "capacity" more strongly suggests that you get that amount allocated. This is bolstered by common usage of the word in scenarios such as the capacity of a room, where it is not a minimum or a suggested amount.

I don't find it convincing that a longer label is less convenient, and I would expect this new feature to align with the existing usage of terminology in Swift.

Anyway, my two cents on the name. I like the idea overall.

Nevin · October 28, 2018, 6:17pm

Similarly, “capacity” also implies minimum.

If I ask for a cell phone with the capacity to store my music collection, I am not taken aback when it can do so a hundred times over.

Or if a teacher tells a student “I know you have the capacity to learn this”, it is no surprise that the teacher also believes the student can learn many other things too.

Thus I think “minimum” is unnecessary on the existing Set/Dictionary initializers.

xwu · October 28, 2018, 6:27pm

That is indeed another way to look at it, but it is ambiguous--much more so than "reserve." If I were to ask you what the capacity of your 128GB iPhone is, it wouldn't be correct to say that it's 16GB.

Moreover, even in the example you give, you may in fact be taken aback. If you ask for a phone with the capacity to store your music collection, and the salesperson sells you a phone that can store it 100 times over but at 100 times the cost, you would be justified in being taken aback!

In any case, the existing precedent argues for "minimumCapacity" and the bar for breaking from precedent is a high one, which I don't think is achieved merely by the benefit of a slightly shorter name that most people don't have to type anyway thanks to autocomplete. (Plus, the usual adage about code being read more often than written.)

Ponyboy47 · October 28, 2018, 7:16pm

I needed this functionality just yesterday and am definitely in favor of it. I was actually a little surprised it didn’t already exist.

On the bikeshedding front, I would have to agree that we should be consistent and match the existing reserveCapacity. While I do think ‘minimumCapacity’ is more accurate about what happens behind the scenes, unless we’re going to change the function name to match the initializer, we should just be consistent.

Nevin · October 28, 2018, 7:27pm

To clarify, are you suggesting the spelling init(reserveCapacity:)?

Ponyboy47 · October 28, 2018, 7:36pm

Sorry, yes, that would be my preference.

beccadax · October 28, 2018, 7:38pm

As the coiner (I think) of “conveniences should be convenient,” I’ll rebut with a different slogan, this time from the API Design Guidelines: “Clarity is more important than brevity.”

torquato · October 28, 2018, 7:47pm

Both can be very convenient.

MLewin · October 28, 2018, 8:02pm

+1 to the proposal. I've wondered why this doesn't already exist.

I'm also happy to store my bike in this shed regardless of how it is constructed.

antonthehuman · October 28, 2018, 8:07pm

I need to mention that the existing method is reserveCapacity(_ minimumCapacity: Int), so it would be natural to promote the parameter name to the argument label for the initializer.
The reason for the word "minimum" is also that it makes clear the resulting capacity is not guaranteed to be "exact", but rather "at least" this.
But anyway, +1 for the proposal.

Ponyboy47 · October 28, 2018, 9:21pm

I didn’t realize the actual parameter name was minimumCapacity!

Not sure if this means it should actually be named init(minimumCapacity:) because of that, or if it shouldn’t be named that since people may not realize that’s the parameter’s name.

Whatever it does end up getting named, I’ll just be happy it’s an actual initializer.

Rod_Brown · October 28, 2018, 9:31pm

I’m gonna back init(minimumCapacity:) to give clarity and consistency of the behaviour. Bumping the parameter name up from reserveCapacity(_ minimumCapacity) further solidifies this idea.

I’d prefer to avoid the ambiguity around “capacity” as a word. For fixed length arrays for example (if/when they happen) capacity would be a min and max wouldn’t it?

Jens · October 29, 2018, 12:39am

I would prefer init(minimumCapacity:) (or perhaps init(reserveCapacity:)) over init(capacity:), for the reasons given above and because here:

var a = Array<UInt8>()
a.reserveCapacity(1)
print(a.capacity) // Prints 16

I reserved a minimum capacity of 1 and got 16 (exactly).

It would be weird if this should instead be interpreted as:
I reserved a minimum capacity of 1 and got 16 or more.

The documentation for the .capacity property says that it is:

The total number of elements that the array can contain without allocating new storage.

And I think this:

var a = [UInt8](minimumCapacity: 1)
print(a.capacity) // Prints 16

makes more sense than this

var a = [UInt8](capacity: 1)
print(a.capacity) // Prints 16
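
The exact number printed above depends on the element type and the allocator, so the only thing worth relying on is the lower bound. A small sketch of that, assuming a locally declared init(minimumCapacity:) (the spelling discussed in this thread, not an existing Array API):

```swift
// Local sketch; Array does not currently ship an init(minimumCapacity:).
extension RangeReplaceableCollection {
    init(minimumCapacity: Int) {
        self.init()
        self.reserveCapacity(minimumCapacity)
    }
}

for requested in [1, 10, 100, 1_000] {
    let a = [UInt8](minimumCapacity: requested)
    // The exact capacity is unspecified; Array only promises "at least".
    print(requested, a.capacity >= requested)
}
```

Each line should end in `true`, whatever the actual allocation sizes turn out to be.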

benrimmington · October 29, 2018, 1:59am

I filed SR-1431 (with a confusing title) a few months before the Swift 3.0 release, to suggest making the following APIs standard:

var capacity: Int { get }
init(minimumCapacity: Int)
mutating func removeAll(keepingCapacity: Bool = false)
mutating func reserveCapacity(_ minimumCapacity: Int)

Since then, SE-0165 added the missing APIs to Dictionary and Set.

Using init(capacity:) rather than init(minimumCapacity:) could lead to false assumptions, if I understand the pitch for SE-0223 correctly.

RangeReplaceableCollection has a default implementation of reserveCapacity(_:) which does nothing.

github.com/apple/swift/blob/a05dd3ed26214a66ce11a84d08d20168b65c5c98/stdlib/public/core/RangeReplaceableCollection.swift#L645-L656

/// Prepares the collection to store the specified number of elements, when
/// doing so is appropriate for the underlying type.
///
/// If you will be adding a known number of elements to a collection, use
/// this method to avoid multiple reallocations. A type that conforms to
/// `RangeReplaceableCollection` can choose how to respond when this method
/// is called. Depending on the type, it may make sense to allocate more or
/// less storage than requested or to take no action at all.
///
/// - Parameter n: The requested number of elements to store.
@inlinable
public mutating func reserveCapacity(_ n: Int) {}

Foundation.Data only has this no-op reserveCapacity(_:) AFAICT. There's an existing Data.init(capacity:) which could be deprecated and renamed.
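
To make the false-assumption risk concrete, here is a rough sketch of a conforming type that never overrides reserveCapacity(_:) and therefore inherits the no-op default. `PassthroughList` and `storageCapacity` are hypothetical names invented for this example, and init(capacity:) is the pitched initializer declared locally.

```swift
// Hypothetical wrapper type that relies on the default, no-op reserveCapacity(_:).
struct PassthroughList<Element>: RangeReplaceableCollection {
    private var storage: [Element] = []

    init() {}

    var startIndex: Int { storage.startIndex }
    var endIndex: Int { storage.endIndex }
    func index(after i: Int) -> Int { storage.index(after: i) }
    subscript(position: Int) -> Element { storage[position] }

    mutating func replaceSubrange<C: Collection>(
        _ subrange: Range<Int>, with newElements: C
    ) where C.Element == Element {
        storage.replaceSubrange(subrange, with: newElements)
    }

    // Exposed only so the example can inspect the backing array.
    var storageCapacity: Int { storage.capacity }
}

// The pitched initializer, declared locally for the sketch.
extension RangeReplaceableCollection {
    init(capacity: Int) {
        self.init()
        self.reserveCapacity(capacity)
    }
}

let list = PassthroughList<Int>(capacity: 100)
print(list.storageCapacity >= 100)  // false: the default reserveCapacity did nothing
```

So for such a type the argument is only a hint, which is one more reason a label promising an exact "capacity" could mislead.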

Should a get-only capacity property also be added, to RangeReplaceableCollection and/or concrete collection types? A default implementation could return the count, if that makes sense.

Nevin · October 29, 2018, 8:14pm

Sounds good to me.

jrose · October 29, 2018, 10:14pm

I've been going the other direction when this has come up in the last few months: there is nothing useful you can do from knowing the current capacity, and so it doesn't make any sense to expose it. Just call reserveCapacity.

jarod · October 29, 2018, 11:29pm

There was some discussion back in 2016 about the empty initializer requirement of RangeReplaceableCollection. I'm not sure how feasible it is at this point considering source compatibility, but I've always hoped that this would eventually be revisited and the requirement could be removed. I have some custom collection types that have additional stored properties with no reasonable default value. It's useful and makes sense for them to conform to RangeReplaceableCollection and get all of its built-in functionality, but the empty initializer requirement makes that problematic because of the non-defaultable properties. I'm currently hacking around it by trapping in the empty initializer, but that's obviously not ideal.

I don't want to hijack this thread to discuss that change, but I wonder if the functionality of this proposal could instead be split into a separate protocol so that we don't add more APIs that rely on the empty initializer of RangeReplaceableCollection.
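
For context, a simplified sketch of the workaround described above. `TaggedList` and its `tag` property are hypothetical stand-ins for a collection with a stored property that has no reasonable default.

```swift
// Hypothetical collection whose `tag` has no sensible default value.
struct TaggedList<Element>: RangeReplaceableCollection {
    let tag: String
    private var storage: [Element] = []

    init(tag: String) { self.tag = tag }

    // Required by RangeReplaceableCollection, but there is no tag to use here.
    init() { fatalError("TaggedList needs a tag; use init(tag:)") }

    var startIndex: Int { storage.startIndex }
    var endIndex: Int { storage.endIndex }
    func index(after i: Int) -> Int { storage.index(after: i) }
    subscript(position: Int) -> Element { storage[position] }

    mutating func replaceSubrange<C: Collection>(
        _ subrange: Range<Int>, with newElements: C
    ) where C.Element == Element {
        storage.replaceSubrange(subrange, with: newElements)
    }
}

var list = TaggedList<Int>(tag: "primes")
list.append(contentsOf: [2, 3, 5])  // fine: does not go through init()
print(Array(list))                  // [2, 3, 5]
```

An initializer built on top of init(), like the one pitched here, would trap for a type written this way, which is why a separate protocol might be worth considering.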