SE-0453: Vector, a fixed size array

Jumhyn · November 14, 2024, 3:07am

Hello Swift community,

The initial review of SE-0453: Vector, a fixed size array begins now and runs through 27 November, 2024.

For this initial review, the Language Steering Group requests that feedback on the name Vector be withheld. Feedback should be focused on the substantive API surface and fundamental behavior of the type. A subsequent review will be run at a later date where naming feedback will be considered on-topic.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to the review manager. When emailing the review manager directly, please keep the proposal link at the top of the message.

Trying it out

macOS toolchain

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift. When writing your review, here are some questions you might want to answer:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

More information about the Swift evolution process is available at

https://github.com/swiftlang/swift-evolution/blob/main/process.md

Thank you,

Freddy Kellison-Linn
Review Manager

xwu · November 14, 2024, 4:31am

Two bits of feedback—

  public init<State: ~Copyable, E: Error>(
    expand state: consuming State,
    with next: (inout State) throws(E) -> Element
  ) throws(E)

  public init<E: Error>(
    unfold first: consuming Element,
    with next: (borrowing Element) throws(E) -> Element
  ) throws(E)

These are pretty obviously useful. The point I want to make here is that expand and unfold are clearly being used as terms-of-art and—given that other noncopyable collections are probably going to pop up sooner or later—we should be happy with their naming here as they're undoubtedly going to be used as precedent.
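
For readers who want to see the two shapes side by side, here is a rough sketch for copyable elements only. The free functions `expanded` and `unfolded` are hypothetical names, Array stands in for Vector, and plain `rethrows` replaces typed throws. It also assumes that `unfold` uses `first` as element 0 and calls the closure for each later element.

```swift
/// Hypothetical stand-in for `init(expand:with:)`: calls `next` once per
/// element, threading mutable state through every call.
func expanded<State, Element>(
  count: Int,
  state: State,
  with next: (inout State) throws -> Element
) rethrows -> [Element] {
  var state = state
  var result: [Element] = []
  result.reserveCapacity(count)
  for _ in 0..<count {
    result.append(try next(&state))
  }
  return result
}

/// Hypothetical stand-in for `init(unfold:with:)`: starts from `first` and
/// derives each later element from the one before it (an assumption about
/// the proposed semantics).
func unfolded<Element>(
  count: Int,
  first: Element,
  with next: (Element) throws -> Element
) rethrows -> [Element] {
  precondition(count > 0, "need room for the first element")
  var result = [first]
  result.reserveCapacity(count)
  while result.count < count {
    let previous = result[result.count - 1]
    result.append(try next(previous))
  }
  return result
}

// Fibonacci numbers via mutable state.
let fib = expanded(count: 6, state: (0, 1)) { (s: inout (Int, Int)) -> Int in
  let value = s.0
  s = (s.1, s.0 + s.1)
  return value
}
// Powers of two via the previous element.
let powers = unfolded(count: 5, first: 1) { $0 * 2 }
print(fib)    // [0, 1, 1, 2, 3, 5]
print(powers) // [1, 2, 4, 8, 16]
```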

This is all very reasonable—but even if Equatable conformance itself is not available, my intuition is that it's going to be quite important that we have == and != supported (even if it's @_alwaysEmitIntoClient), much in the same way that we're bringing in key useful, implementable collection-like APIs.
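
As an illustration of operators without the conformance, here is a minimal sketch. `Triple` is a hypothetical fixed-size stand-in, not the proposed type, and it is deliberately not declared `Equatable`.

```swift
// Hypothetical three-element container used only for illustration.
struct Triple<Element> {
  var storage: (Element, Element, Element)
}

// The operators exist whenever the elements are Equatable, even though
// Triple itself never conforms to Equatable.
extension Triple where Element: Equatable {
  static func == (lhs: Triple, rhs: Triple) -> Bool {
    // Tuples of Equatable elements already support == in the standard library.
    lhs.storage == rhs.storage
  }

  static func != (lhs: Triple, rhs: Triple) -> Bool {
    !(lhs == rhs)
  }
}

let a = Triple(storage: (1, 2, 3))
let b = Triple(storage: (1, 2, 3))
let c = Triple(storage: (1, 2, 4))
print(a == b) // true
print(a != c) // true
```

The trade-off is that generic code constrained to `Equatable` still cannot accept such a type; only direct uses of `==` and `!=` work.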

A similar auditing of the "other protocols" for supportable, useful APIs that we will want on the first go (lest end users find themselves filling the gaps and maybe incorrectly) might be a good idea.

Jon_Shier · November 14, 2024, 4:42am

I'd also like to point out that with is a pretty poor external label, as it contains exactly zero useful information about the parameter. Since this is a closure parameter it won't be seen in most cases, but a better name would be nice.

To prevent the headache of this and any potential new availability feature, we're holding off on these conformances until they are fully generalized.

While there are reasons to do this, adding these conformances after the fact always seems to become a Big Deal for one reason or another, and would also make the type and these conformances have different deployment targets. Can we just skip that and hold the proposal until it's complete?

MPLewis · November 14, 2024, 8:43am

We do plan to propose new protocols that look like Sequence and Collection that avoid implicit copying making them suitable for types like Vector and containers of noncopyable elements.
…

Much of the Collection API that we are generalizing here for this type are all API we feel confident will be included in any future container protocol.
…

Remember, one can still iterate a Vector instance with the usual indices property (which is what noncopyable vector instances would have had to deal with regardless until new container protocols have been proposed):

All of these caveats are absolutely screaming at me that not enough work has been done on those future directions of container protocols, which Vector will be the first conformer to. Please do not push this through without doing the groundwork first, or you risk an extremely painful transition for a type being pitched as a low-level, highly performant building block.

Karl · November 14, 2024, 11:44am

Hm? These seem to me like stopgaps, providing the equivalent of:

extension Array {
  init(_ elements: some Sequence<Element>)
}

except that Sequence doesn't support non-copyable Element types or throwing, so it uses a closure instead.

I expect the new collection protocols will support both of these things, at which point the initialisers become redundant. So I don't think these will be used as precedent.

Speaking of which, initialisation from a sequence seems useful, but I couldn't find it mentioned in the proposal. The SIMD types have such an initialiser, and I think it makes sense on Vector as well.

xwu · November 14, 2024, 3:39pm

Karl:

These seem to me like stopgaps, providing the equivalent of:
extension Array {
  init(_ elements: some Sequence<Element>)
}
except that Sequence doesn't support non-copyable Element types or throwing, so it uses a closure instead.

I'm not so sure about that. Even if sequence(first:next:) were extended to support noncopyable elements, it'd still be pretty lousy usability to require a user first to obtain such a sequence from a global function, then to initialize Vector in a second step.

That said, circling back to my original point, given that we have precedent with sequence(first:next:) and sequence(state:next:), perhaps those should be the same labels used here for the arguments to Vector.init.
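
For reference, this is how the two existing standard library functions behave today with copyable elements. Both return lazy, potentially infinite sequences, so the sketch bounds them with `prefix` before collecting into an Array.

```swift
// sequence(first:next:) derives each element from the previous one.
let doubled = Array(sequence(first: 1, next: { $0 * 2 }).prefix(5))

// sequence(state:next:) threads mutable state through each call.
let fibSequence = sequence(state: (0, 1)) { (state: inout (Int, Int)) -> Int? in
  let value = state.0
  state = (state.1, state.0 + state.1)
  return value
}
let fibonacci = Array(fibSequence.prefix(6))

print(doubled)   // [1, 2, 4, 8, 16]
print(fibonacci) // [0, 1, 1, 2, 3, 5]
```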

xwu · November 14, 2024, 3:45pm

I empathize with the sentiment, but it's a no-win situation: it's difficult to design useful protocols well without conforming types which exercise them (exhibit: the present collection hierarchy, which can't accommodate noncopyable elements), it's difficult to design types that are expected to conform to a protocol hierarchy without that hierarchy already designed, and it's nigh impossible to design them all at once.

There is no way to start but to start somewhere: that this will be painful doesn't mean that it will be more painful than any alternative.

Karl · November 14, 2024, 4:03pm

We haven't felt the need to add such APIs to other containers, with copyable elements. So it stands to reason that if we had a version of Sequence which allowed for non-copyable elements and throwing, we wouldn't add these initialisers.

I guess I don't really mind having these as a temporary solution, but I also wouldn't disagree with suggestions to drop them until the Sequence successor is available. Doing so would make Vector less usable for non-copyable containers and element types, but that just reflects the reality of where the language is right now. While we continue to lack a protocol akin to Sequence or Collection with support for non-copyable types (both conformers and element types), generic algorithms which support those things (such as these Vector initialisers) are not really expressible or usable.

xwu · November 14, 2024, 4:18pm

But is that because the elements are...always copyable?

The whole crux of the problem is that containers which support noncopyable elements will share APIs which we haven't fully worked out that aren't present in our current protocols and types. It does not stand to reason that, because we don't have some API currently among our current collection types, it then won't be generalized for future containers that support noncopyable elements.

Indeed, I would hold open the possibility that each addition here without precedent on other types could indeed be replicated going forward: that is the design exercise we're engaging in.

Karl · November 14, 2024, 4:56pm

As far as I can tell, this is not a pattern that users (at least on these forums) frequently struggle with and ask to be made more visible, and the proposal does not make any argument whatsoever that these are particularly good APIs that deserve to be made so prominent.

Now, I'm quite sure these APIs only exist as a stopgap until we have a generic solution, but you're coming back with a very strict interpretation which forces me to concede that that is technically an assumption on my part.

But then I would counter that, using the same strict way of thinking, since the proposal does not make any argument in favour of these specific APIs or discuss their shape (and if they are actually needed as separate entry-points, that discussion would need to get quite deep into what the Sequence successor will support, and so is surely out of scope), they must be dropped. We can't yet know whether they are needed.

xwu · November 14, 2024, 5:04pm

This goes to my reply above to @MPLewis: we have to start somewhere—implicit in running this proposal for review before any successors to the collection hierarchy is that we start the work here. We can only know that they’re needed for this concrete type, and we can only know they’re needed more broadly later.

Karl · November 14, 2024, 5:10pm

OK, I'm on board with that, but a lot depends on what "needed for this concrete type" means - is it only 'needed' because we don't have the proper abstraction yet, or is it 'needed' in the sense that it represents something intrinsic to the type and will always be needed?

Also, we need a better explanation of why the API is shaped as it is in the proposal if we are to evaluate either one of these.

Joe_Groff · November 14, 2024, 5:30pm

The thing that's unique about Vector that demands these initializers isn't the fact that it's potentially noncopyable, but that it's fixed size and must always be fully initialized. For pretty much any other data structure, expand and unfold could be expressed in terms of generic operations like init(), reserveCapacity(), and append, even for noncopyable elements, but you can't append or remove elements from a Vector, so the initial set of elements must be generated at initialization time. This is a concern that's pretty specific to Vector, so we'd want something shaped like these initializers independent of what protocols we have now or come up with later.

I do have concerns that even these initializers aren't sufficiently primitive to allow for more esoteric initialization patterns (even though they may still be useful enough to provide pre-built in the standard library). The most general unsafe initializer would obviously be something like Array's init(unsafeUninitializedCapacity:initializingWith:) initializer, which could give you an UnsafeMutableBufferPointer and leave it up to the programmer to populate as they see fit. A more general safe primitive might be an initializer that invokes a closure to initialize each element while passing a Span or MutableSpan (once we have those) over the previously-initialized elements. That would still only work for sequential initialization, but would allow the closure body to involve existing elements in the initialization however it sees fit (which could allow for sorting, shuffling, etc.).
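
To make the two primitives concrete, here is a hedged sketch. The first half uses the real Array initializer mentioned above. The second half is a hypothetical safe variant, `generatedFromPrevious`, that passes an `ArraySlice` of the elements built so far, standing in for the Span that does not exist yet.

```swift
// Real API: the closure receives raw storage and reports how many
// elements it initialized.
let squares = [Int](unsafeUninitializedCapacity: 4) { buffer, initializedCount in
  for i in 0..<4 {
    (buffer.baseAddress! + i).initialize(to: i * i)
  }
  initializedCount = 4
}

/// Hypothetical safe primitive: `next` sees every previously initialized
/// element and returns the one to place after them.
func generatedFromPrevious<Element>(
  count: Int,
  with next: (ArraySlice<Element>) throws -> Element
) rethrows -> [Element] {
  var result: [Element] = []
  result.reserveCapacity(count)
  for _ in 0..<count {
    let element = try next(result[...])
    result.append(element)
  }
  return result
}

// Each element is one more than the sum of everything before it.
let runningSums = generatedFromPrevious(count: 5) {
  (previous: ArraySlice<Int>) -> Int in
  previous.reduce(0, +) + 1
}
print(squares)     // [0, 1, 4, 9]
print(runningSums) // [1, 2, 4, 8, 16]
```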

Karl · November 14, 2024, 5:56pm

I don't follow. How are expand and unfold more suitable than a generic init(_: some NonCopyableSequence<Element>)?

Other than support for non-copyable types and throwing, how are they any better than the SIMD initialiser (which also creates a fixed-size, always fully initialised value, with no reserveCapacity or append)?

Joe_Groff · November 14, 2024, 6:20pm

If you already have such a sequence, and the sequence is consumable, then yeah, you could have a safe initializer that consumes the elements from the sequence and moves them into the new Vector. It isn't clear yet that consuming out of in-memory data structures is a generally useful operation that's worth the implementation complexity, since doing so requires tracking partial initialization state as you consume out the sequence (and potentially need to abort and put things back in a consistent state if you throw or break out of the consumer). Also, that initializer would need an initializer like expand or unfold to be implemented in terms of (if not written as a primitive against unsafe building blocks itself).

nnnnnnnn · November 14, 2024, 9:01pm

I agree that matching the sequence function labels would be great – I'm pretty sure that the review adding that API considered unfold as a name and went the other way.

I agree with some of the other commenters that initializing from an existing sequence could be a common / useful thing to do. It's relatively simple but definitely not straightforward to build that initializer on top of the (state:next:) initializer. Could I suggest two variations?

extension Vector { // where Element: Copyable
    /// Creates a vector with the first `count` elements of `sequence`.
    /// Returns `nil` if `sequence` has too few elements.
    init?(prefixing sequence: some Sequence<Element>) { ... }

    /// Creates a vector with the elements of `sequence`.
    /// Returns `nil` if `sequence` doesn't have exactly `count` elements.
    init?(exactly sequence: some Sequence<Element>) { ... }
}

Because most of us aren't using non-copyable values yet, these would make it easier to get set up with a vector that's initialized to something other than a constant:

var counters = Vector<5, _>(prefixing: 1...)! 
// [1, 2, 3, 4, 5]

guard var localSlice = Vector<16, _>(prefixing: myArray[i...]) 
    else { return }
// work with localSlice
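
A rough sketch of the two variations, for anyone who wants to experiment before a toolchain is at hand. Array stands in for Vector, the size is passed as a runtime `count`, and the function names `prefixing` and `exactly` are hypothetical. The iterator plays the role of the state in a (state:next:) style loop.

```swift
/// Hypothetical stand-in for `init?(prefixing:)`: takes the first `count`
/// elements, or returns nil if the sequence runs out early.
func prefixing<S: Sequence>(_ sequence: S, count: Int) -> [S.Element]? {
  var iterator = sequence.makeIterator()
  var result: [S.Element] = []
  result.reserveCapacity(count)
  for _ in 0..<count {
    guard let element = iterator.next() else { return nil }
    result.append(element)
  }
  return result
}

/// Hypothetical stand-in for `init?(exactly:)`: also rejects leftovers.
func exactly<S: Sequence>(_ sequence: S, count: Int) -> [S.Element]? {
  var iterator = sequence.makeIterator()
  var result: [S.Element] = []
  result.reserveCapacity(count)
  for _ in 0..<count {
    guard let element = iterator.next() else { return nil }
    result.append(element)
  }
  // One extra call to next() detects a sequence that is too long.
  return iterator.next() == nil ? result : nil
}

let firstFive = prefixing(1..., count: 5)
let tooShort = prefixing([1, 2], count: 3)
let exactFit = exactly([1, 2, 3], count: 3)
let tooLong = exactly([1, 2, 3, 4], count: 3)
```

The early `return nil` is the part that does not map cleanly onto a closure-based initializer, since the closure has no way to abandon initialization other than throwing.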

Alejandro · November 14, 2024, 9:11pm

This is also trivially done with just the index closure based initializer though without an optional:

var counts = Vector<5, Int> { $0 + 1 }
// [1, 2, 3, 4, 5]

guard myArray.count >= i + 16 else {
  return
}

var localSlice = Vector<16, _> { myArray[i + $0] }
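
The same idea can be tried today with Array as a stand-in. The helper `indexed` is a hypothetical name, and the sample values for `myArray` and `i` are made up for illustration. Note that the bounds check has to account for the offset `i`, not just the element count.

```swift
/// Hypothetical stand-in for the index-closure initializer: element `n`
/// is whatever `body(n)` returns.
func indexed<Element>(count: Int, _ body: (Int) -> Element) -> [Element] {
  (0..<count).map(body)
}

let counts = indexed(count: 5) { $0 + 1 }

// Illustrative data: 30 elements, reading 16 of them starting at offset 10.
let myArray = Array(100..<130)
let i = 10

var localSlice: [Int]? = nil
if myArray.count - i >= 16 {
  localSlice = indexed(count: 16) { myArray[i + $0] }
}
print(counts) // [1, 2, 3, 4, 5]
```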

xwu · November 15, 2024, 1:24am

Hmm, this and Nate's init?(prefixing:) do feel more like potato/potato (ha), but I think it's less obvious how we would straightforwardly achieve the (good) effect of having init?(exactly:) return nil if the number of elements doesn't exactly line up without having it as a dedicated API.

Ben_Cohen · November 15, 2024, 3:44am

nnnnnnnn:

Could I suggest two variations?

    /// Creates a vector with the first `count` elements of `sequence`.
    /// Returns `nil` if `sequence` has too few elements.
[...]
    /// Creates a vector with the elements of `sequence`.
    /// Returns `nil` if `sequence` doesn't have exactly `count` elements.
}

You left off the option I'd choose, which is it traps with the wrong number of elements.

Returning nil for too few feels like silently accepting what is quite likely an error with too many by just ditching the overflow. But copying all the elements, and then bailing when there are too many really feels like "nil on programmer error" which is not generally a pattern the standard library follows.

This feels very different from Dictionary's "nil is an expected likely result and so combining looking for a key and returning its value is a useful and ergonomic operation". This seems more like the equivalent subscript on Array.

Ben_Cohen · November 15, 2024, 3:53am

(and if you have a sequence you know is longer, myVec(seq.prefix(n)) ought to be cheap)
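
A minimal sketch of the trapping option, again with Array standing in for Vector and a hypothetical free function `filled(from:count:)`. A wrong element count is treated as a programmer error, so it stops the program instead of returning nil.

```swift
/// Hypothetical trapping variant: the sequence must have exactly `count`
/// elements, otherwise execution stops.
func filled<S: Sequence>(from sequence: S, count: Int) -> [S.Element] {
  var iterator = sequence.makeIterator()
  var result: [S.Element] = []
  result.reserveCapacity(count)
  for _ in 0..<count {
    guard let element = iterator.next() else {
      preconditionFailure("sequence has too few elements")
    }
    result.append(element)
  }
  precondition(iterator.next() == nil, "sequence has too many elements")
  return result
}

// A longer sequence is trimmed explicitly by the caller with prefix(_:).
let source = [10, 20, 30, 40]
let firstThree = filled(from: source.prefix(3), count: 3)
print(firstThree) // [10, 20, 30]
```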