SE-0453: Vector, a fixed size array

xwu · November 15, 2024, 6:29am

Yes, in some ways. However, the bounds of an array's indices are trivially obtainable before subscripting, while in the general case the number of elements in a sequence is not (single-pass, etc.).

In such cases where checking the precondition yourself is not straightforward, the standard library does offer nil-returning operations as a pattern. And of course, a force unwrap gets you your preferred alternative with no fuss.
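A minimal sketch of the nil-returning pattern, using a hypothetical helper `prefixArray(exactly:from:)` over plain Array. It is not part of the proposal or the standard library. It returns nil when a possibly single-pass sequence runs out early, and a caller who knows the sequence is long enough can simply force-unwrap.

```swift
/// Hypothetical helper: collects exactly `n` elements from `source`,
/// or returns nil if the sequence ends before `n` elements are produced.
/// The sequence is iterated only once, so single-pass sequences are fine.
func prefixArray<S: Sequence>(exactly n: Int, from source: S) -> [S.Element]? {
    var iterator = source.makeIterator()
    var result: [S.Element] = []
    result.reserveCapacity(n)
    for _ in 0 ..< n {
        guard let element = iterator.next() else { return nil }
        result.append(element)
    }
    return result
}

let three = prefixArray(exactly: 3, from: 1...10)   // [1, 2, 3]
let tooFew = prefixArray(exactly: 3, from: [1, 2])  // nil
let forced = prefixArray(exactly: 2, from: "ab")!   // force unwrap: ["a", "b"]
```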

grynspan · November 15, 2024, 6:20pm

This type also serves as a safer replacement to withUnsafeTemporaryAllocation when the amount of things to reserve is known statically:

One of the primary motivators for that API was that it was often impossible to safely express certain patterns where you needed to have one (or more) values that were initialized via some foreign language (read: C) function.

This type is of course meant to be a safe replacement for that sort of pattern, but it sounds like it will require values to be eagerly initialized, so it can't help with those interop cases. Any ideas how we can close that gap and eliminate remaining use cases for temporary buffers?
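For reference, a minimal sketch of the interop pattern described above, assuming a C-style function that writes its results through a pointer. `fillSquares` is a hypothetical Swift stand-in for such a foreign function, and `sumOfSquares` is an illustrative name; the key point is that the buffer is handed out uninitialized and filled by the callee.

```swift
/// Hypothetical stand-in for a C function such as
/// `void fill_squares(int32_t *out, size_t count);`
/// It initializes every element of the buffer it is given.
func fillSquares(_ out: UnsafeMutablePointer<Int32>, _ count: Int) {
    for i in 0 ..< count {
        (out + i).initialize(to: Int32(i * i))
    }
}

/// Reserves scratch space (stack-allocated when small enough), lets the
/// "foreign" function initialize it, and reads the results before the
/// buffer goes away. Int32 is trivial, so no deinitialization is needed.
func sumOfSquares(below count: Int) -> Int32 {
    withUnsafeTemporaryAllocation(of: Int32.self, capacity: count) { buffer in
        if let base = buffer.baseAddress {
            fillSquares(base, count)
        }
        return buffer.reduce(0, +)
    }
}
```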

Another protection the temporary buffers API provides is that it automatically heap-allocates larger buffers. This type can't do that, but does the compiler know enough to heap-allocate instances of sufficient length? Or is this going to blow up my stack?

let vector = Vector<1_000_000, UInt128>(...)
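A rough back-of-the-envelope check of the question above, assuming the value is stored inline with each element taking its full stride. `inlineByteCount`, the `Wide` alias (a 16-byte stand-in for UInt128 that avoids availability requirements), and the 8 MiB budget are all illustrative assumptions; real stack limits vary by platform and by thread.

```swift
/// Hypothetical helper: bytes needed to store `count` elements of `T`
/// contiguously and inline, using the element stride.
func inlineByteCount<T>(of type: T.Type, count: Int) -> Int {
    MemoryLayout<T>.stride * count
}

// A 16-byte stand-in for UInt128.
typealias Wide = (UInt64, UInt64)

let bytes = inlineByteCount(of: Wide.self, count: 1_000_000)  // 16_000_000
// Assumed stack budget, for illustration only.
let assumedStackLimit = 8 * 1024 * 1024                        // 8_388_608
let fitsOnStack = bytes < assumedStackLimit                    // false
```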

dnadoba · November 15, 2024, 6:30pm

I don't think Vector is meant for this kind of use case. read doesn't guarantee that it initializes all elements; in fact, it quite commonly doesn't. You want a fixed-capacity type for that, not a fixed-size type like Vector. OutputSpan sounds like a more suitable type for that. There isn't a type-level guarantee for any of read's behavior, though, so a wrapper function will still be required, either written manually or generated automatically through special annotations on the interface.
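To illustrate the fixed-capacity shape described above, here is a sketch using the existing `Array(unsafeUninitializedCapacity:initializingWith:)` initializer. `readSome` is a hypothetical stand-in for a read-like C function that may initialize fewer elements than the capacity and reports how many it wrote; `readUpTo` is an illustrative wrapper name.

```swift
/// Hypothetical stand-in for a read(2)-style function: writes up to
/// `capacity` bytes from `source` into `out` and returns how many it wrote.
func readSome(from source: [UInt8], into out: UnsafeMutablePointer<UInt8>, capacity: Int) -> Int {
    let n = min(capacity, source.count)
    for i in 0 ..< n {
        (out + i).initialize(to: source[i])
    }
    return n
}

/// Fixed capacity, variable count: only the initialized prefix becomes
/// part of the array, so a short read is representable.
func readUpTo(_ capacity: Int, from source: [UInt8]) -> [UInt8] {
    [UInt8](unsafeUninitializedCapacity: capacity) { buffer, initializedCount in
        guard let base = buffer.baseAddress else {
            initializedCount = 0
            return
        }
        initializedCount = readSome(from: source, into: base, capacity: capacity)
    }
}
```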

grynspan · November 15, 2024, 6:34pm

Then we should make sure the proposal and any documentation don't position Vector as a replacement (except under the conditions where it can be, of course.)

(I worry the tone of this reply may come off as too brusque; if it reads that way, it's not intentional! I just want to make sure we guide developers to the best available solutions for the problems they face, and C interop is the Problem That Will Never Die.)

jrose · November 16, 2024, 9:08pm

I’d like to see a bit more written down on why size = stride for Vector. I do think it’s the right choice, but it’s technically inconsistent with the tuple workaround if the element type has size < stride. (Fortunately all C types define size = stride, so this won’t be a problem for imports.)
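A small illustration of the size/stride mismatch mentioned above, using a hypothetical struct whose size is smaller than its stride. A tuple's size omits the trailing padding after its last element, whereas a layout where every element occupies a full stride (the size = stride choice) does not. The numbers assume a typical 64-bit platform.

```swift
/// Hypothetical element type: 8 + 1 bytes of data, 8-byte alignment.
struct Padded {
    var wide: Int64
    var narrow: Int8
}

let elementSize = MemoryLayout<Padded>.size          // 9
let elementStride = MemoryLayout<Padded>.stride      // 16

// The tuple workaround: the last element contributes only its size.
let tupleSize = MemoryLayout<(Padded, Padded)>.size  // 25 (16 + 9)

// A two-element fixed-size array with size == stride would instead report:
let strideBasedSize = 2 * elementStride              // 32
```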

Implementation concerns re: imports

The following facts seem incompatible:

- swiftinterfaces contain inlinable code as source
- Compiled swiftmodules implement cross-references by matching name and type (unless y’all have changed that)
- A compilation unit can only import a C type one way, even if it’s referenced from multiple modules with conflicting swift-versions / upcoming-feature dialects

How are you planning to change the imported types of arrays without breaking existing inlinable code, and still allowing mixing modules built with and without this option? For struct fields you might be able to work around it with property overloading tricks, but typedefs like uuid_t and pointer-to-array function parameters (const uint8_t (*key)[32]) kind of need a canonical form.
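For context, a sketch of what existing code written against today's tuple import can look like. `Key4` is a hypothetical four-byte stand-in for a C array type such as `uuid_t` (which is imported as a 16-element tuple of UInt8), and `firstByte`/`byteSum` are illustrative names. Code like this depends on the tuple spelling, which is what makes changing the imported type a compatibility question.

```swift
/// Hypothetical stand-in for an imported C array like `uint8_t key[4]`,
/// which the importer currently represents as a homogeneous tuple.
typealias Key4 = (UInt8, UInt8, UInt8, UInt8)

/// Existing code may touch elements through tuple member syntax...
func firstByte(of key: Key4) -> UInt8 {
    key.0
}

/// ...or reinterpret the tuple's storage as raw bytes.
func byteSum(of key: Key4) -> Int {
    withUnsafeBytes(of: key) { raw in
        raw.reduce(0) { $0 + Int($1) }
    }
}
```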

benrimmington · November 18, 2024, 11:42am

The existing sequence(first:next:) API gives its closure the previous element. The proposed Vector.init(unfold:with:) API always gives its closure the first element. I think a span of previously-initialized elements (rather than just the previous element) would be more useful, as suggested by @Joe_Groff.

The current index might also be useful, or perhaps the initializers could be combined?

extension Vector where Element: ~Copyable {

  public init<State: ~Copyable, Failure: Error>(
    state: consuming State = (),
    first: consuming Element? = nil,
    next: (
      _ index: Int,
      _ state: inout State,
      _ previous: Span<Element>
    ) throws(Failure) -> Element
  ) throws(Failure)
}
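To keep things runnable without the proposal's toolchain, here is an Array-based sketch of how a combined initializer along these lines could behave. `makeArray(count:state:first:next:)` is hypothetical, `ArraySlice` stands in for `Span`, and copyable elements are assumed. The Fibonacci case needs the two previous elements, which neither a "first element" nor a "previous element" closure can see.

```swift
/// Hypothetical Array analogue of the combined initializer above.
/// `ArraySlice` stands in for `Span`; `count` stands in for Vector's size parameter.
func makeArray<State, Element>(
    count: Int,
    state: State,
    first: Element? = nil,
    next: (_ index: Int, _ state: inout State, _ previous: ArraySlice<Element>) throws -> Element
) rethrows -> [Element] {
    var state = state
    var result: [Element] = []
    result.reserveCapacity(count)
    for index in 0 ..< count {
        if index == 0, let first {
            // An explicit first element skips the closure for index 0.
            result.append(first)
        } else {
            // The closure sees every element initialized so far.
            let element = try next(index, &state, result[...])
            result.append(element)
        }
    }
    return result
}

// Uses the previously-initialized elements.
let fibonacci: [Int] = makeArray(count: 7, state: (), first: 1) { _, _, previous in
    previous.count < 2 ? 1 : previous[previous.count - 1] + previous[previous.count - 2]
}
// [1, 1, 2, 3, 5, 8, 13]

// Uses the index and the inout state instead.
let runningTotals: [Int] = makeArray(count: 4, state: 10) {
    (index: Int, total: inout Int, _: ArraySlice<Int>) -> Int in
    total += index
    return total
}
// [10, 11, 13, 16]
```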

wes1 · November 18, 2024, 7:15pm

With the linked macOS toolchain, I'm unable to use an explicit numeric type parameter.

The inferred parameter works with literals

var v3i: Vector<_, Int> = [0, 1, 2]

But any explicit numeric type parameter fails, blocking tests for the closure initializer:

var v3i = Vector<3, Int> { $0 }

Errors:

On number: Expected '>' to end generic argument clause
At top of file: New Swift parser generated errors for code that C++ parser accepted
- Diagnostic: parser_new_parser_errors
- (assertions enabled in the toolchain?: building stdlib with assertions)

Am I missing some experimental/upcoming feature flag?

Would building the toolchain without assertions unblock this error?

Thanks in advance -

Configuration details

macOS toolchain: (org.swift.pr.76438.1604)
macOS 15.1 on Apple silicon
Xcode 16.0 (16A242d) (but same result from command-line with SPM)
Package.swift settings
- Swift language mode: 6
- SwiftSettings: .unsafeFlags(["-Xfrontend", "-disable-availability-checking"])
  - (to avoid availability error wrt macOS 9999)

Alejandro · November 18, 2024, 7:36pm

Yes, you'll need to add -enable-experimental-feature ValueGenerics for the swift-syntax parser to parse integer generics.
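For SwiftPM users, a minimal `Package.swift` sketch combining that flag with the settings from the configuration details above. The package and target names are placeholders, and this assumes a toolchain that still recognizes the `ValueGenerics` experimental feature name.

```swift
// swift-tools-version:6.0
import PackageDescription

let package = Package(
    name: "VectorPlayground",  // placeholder name
    targets: [
        .executableTarget(
            name: "VectorPlayground",
            swiftSettings: [
                // Lets the parser accept integer generic arguments such as Vector<3, Int>.
                .enableExperimentalFeature("ValueGenerics"),
                // Same workaround as above for the macOS 9999 availability error.
                .unsafeFlags(["-Xfrontend", "-disable-availability-checking"]),
            ]
        )
    ]
)
```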

lorentey · November 19, 2024, 12:36am

We've already laid that groundwork!

Vector is not introducing any new core container API that we haven't already committed to support in SE-0437, with the UnsafeBufferPointer generalizations that we shipped in Swift 6.0. The Span type that was recently accepted in SE-0447 also shipped with a partial subset of the UBP API surface, reinforcing this even more.

Luckily, we don't need to make random decisions here -- in fact, our hands are tied by the need to interoperate with our rich set of existing constructs.

The new container protocols are not expected to replace Collection; they will need to indefinitely coexist with it. Often, types will want to conform to both protocol families, so synchronizing their shared API surface is the only practical choice.

For instance, the two protocol families will share the same concept of an Index, and they will also share much of the same core index-related APIs:

- Index (defined by U*BP, Span and Vector)
- isEmpty, count (implemented by U*BP, Span and Vector)
- subscript(_: Index) (implemented by U*BP, Span and Vector)
- startIndex, endIndex (implemented by U*BP and Vector)
- index(after:), formIndex(after:) (implemented by U*BP and Vector)
- index(before:), formIndex(before:) (implemented by U*BP and Vector)
- index(_:offsetBy:), distance(from:to:) (implemented by U*BP)
- swapAt(_:_:) (implemented by U*BP and Vector)
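To make the shared surface concrete, a sketch of a generic algorithm written only in terms of entry points from the list above (`isEmpty`, `startIndex`, `endIndex`, `index(before:)`, `formIndex(after:)`, `formIndex(before:)`, `swapAt(_:_:)`). It is constrained to today's Collection protocols, so it runs unchanged on both Array and UnsafeMutableBufferPointer. `reverseInPlace` is an illustrative name; the standard library's own `reverse()` already does this.

```swift
/// Illustrative in-place reversal using only the shared index API.
func reverseInPlace<C: MutableCollection & BidirectionalCollection>(_ c: inout C) {
    guard !c.isEmpty else { return }
    var lo = c.startIndex
    var hi = c.index(before: c.endIndex)
    while lo < hi {
        c.swapAt(lo, hi)
        c.formIndex(after: &lo)
        c.formIndex(before: &hi)
    }
}

var numbers = [1, 2, 3, 4, 5]
reverseInPlace(&numbers)  // [5, 4, 3, 2, 1]

var more = [1, 2, 3, 4]
more.withUnsafeMutableBufferPointer { buffer in
    // Same code, with an UnsafeMutableBufferPointer receiver.
    reverseInPlace(&buffer)
}
// more == [4, 3, 2, 1]
```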

The only entry point on this list whose shape isn't entirely settled here is actually the most important one: the indexing subscript. We don't currently have a way to implement the right accessors for such subscripts. This is a shared limitation across UBP types, Span types and now Vector. We have not blocked UBP generalizations or Span on resolving this issue; that we're also proposing Vector without it fits into this pattern, although things are getting a little uncomfortable.

(Swift's extended accessor model is still being drafted, so we aren't yet able to provide the accessors we actually need on any of these types. Vector's subscript is currently implemented with _read and _modify accessors, but we know that those do not have the correct semantics. Span and UBP are currently using special unsafe addressors that are unique to them. We are hoping that the (eventual) replacements of the current accessors will not lead to source compatibility issues. A recent development has put some question marks around that; I hope to get some clarity soon.)

So this proposal does not tweak the core APIs that we've already embraced in SE-0437.
There are a few places where Vector is slightly pushing boundaries, though -- let's consider those.

The `Indices` typealias

SE-0437 did not generalize UnsafeBufferPointer's preexisting indices property; however, SE-0447 did decide to add one to Span. This was a special compromise to enable more pleasant iteration: in SE-0437, I assumed people would be okay with for i in 0 ..< buf.count; @glessard was of the opinion that for i in span.indices is more practical.
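The two loop styles, side by side on an UnsafeBufferPointer (which, as a Collection, already has an `indices` property). `sumByCount` and `sumByIndices` are illustrative names.

```swift
/// Counting-range style.
func sumByCount(_ buf: UnsafeBufferPointer<Int>) -> Int {
    var total = 0
    for i in 0 ..< buf.count { total += buf[i] }
    return total
}

/// `indices` style.
func sumByIndices(_ buf: UnsafeBufferPointer<Int>) -> Int {
    var total = 0
    for i in buf.indices { total += buf[i] }
    return total
}

let values = [3, 1, 4, 1, 5]
let sums = values.withUnsafeBufferPointer { (sumByCount($0), sumByIndices($0)) }
// (14, 14)
```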

Now, in SE-0453, @Alejandro decided that Vector should adopt the indices property from Span, but he also introduced a new Indices associated type. I don't mind that; it matches Collection's preexisting concept, and it will definitely not conflict with whatever "indices view" a noncopyable container protocol might provide (if any).

(The upcoming container protocols will not come with an Indices associated type, nor an indices property requirement -- the names Indices/indices are taken by Collection, and the shape of the existing indices view isn't going to work for noncopyable containers. But this should not prevent specific container types from providing an indices property that's compatible with Collection. The random-access container protocol may even provide a Collection-style range-returning indices property as a default "algorithm" for strideable-indexed containers, whether or not the general container shape will include anything like an indices view.)
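A sketch of the "default algorithm" idea from the parenthetical above. `RandomAccessContainerSketch` and `Triple` are hypothetical and are not the planned container protocols. The protocol requires no `Indices` associated type and no `indices` member, yet conforming types with integer indices still get a Collection-style, range-returning `indices` from an extension.

```swift
/// Hypothetical stand-in for a random-access container protocol with Int indices.
/// Note: no Indices associated type and no indices requirement.
protocol RandomAccessContainerSketch {
    var startIndex: Int { get }
    var endIndex: Int { get }
}

extension RandomAccessContainerSketch {
    /// Default "algorithm": a Collection-style, range-returning indices view.
    var indices: Range<Int> { startIndex ..< endIndex }
}

/// Hypothetical fixed-count container of three Ints, for illustration only.
struct Triple: RandomAccessContainerSketch {
    var storage: (Int, Int, Int)
    var startIndex: Int { 0 }
    var endIndex: Int { 3 }

    subscript(position: Int) -> Int {
        switch position {
        case 0: return storage.0
        case 1: return storage.1
        case 2: return storage.2
        default: preconditionFailure("index out of range")
        }
    }
}

let triple = Triple(storage: (10, 20, 30))
var tripleTotal = 0
for i in triple.indices { tripleTotal += triple[i] }
// tripleTotal == 60
```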

We can also choose to simply remove the Indices typealias from Vector (while keeping the property, Span-style). As the typealias serves no actual purpose in this context, this may be the most pragmatic choice.

To restore consistency across these types, we'll want to retroactively generalize the indices property on the U*BP types. (We can do this in the next round of stdlib generalizations, as part of the series of salad proposals started by SE-0437 -- we don't need to dilute the Vector proposal with such work.)

On the new initializers

SE-0453 includes init(expand:with:), init(unfold:with:), and reduce(into:_:). The expanding initializer is intended to be the reverse of reduce; the unfolding initializer is the reverse of a hypothetical fold method that we aren't proposing at this time:

extension Vector where Element: ~Copyable {
  func fold<E: Error>(
    _ updateAccumulatingResult: (inout Element, borrowing Element) throws(E) -> Void
  ) throws(E) -> Element?
}

let numbers = Vector<100, Int>(expand: MyFavoriteRNG()) { $0.next() }
let sum = numbers.fold { $0 += $1 }
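The relationship between these operations can be sketched with plain arrays. `expand(count:from:_:)` and `fold(_:_:)` are hypothetical Array-based analogues of `init(expand:with:)` and the `fold` method above. A deterministic counter stands in for `MyFavoriteRNG` so that the result is checkable.

```swift
/// Hypothetical Array analogue of init(expand:with:): threads an inout state
/// through `count` calls, collecting each produced element.
func expand<State, Element>(count: Int, from state: State, _ next: (inout State) -> Element) -> [Element] {
    var state = state
    var result: [Element] = []
    result.reserveCapacity(count)
    for _ in 0 ..< count { result.append(next(&state)) }
    return result
}

/// Hypothetical Array analogue of fold: combines elements into the first one.
/// Returns nil for an empty array, matching the `Element?` result above.
func fold<Element>(_ elements: [Element], _ combine: (inout Element, Element) -> Void) -> Element? {
    guard var result = elements.first else { return nil }
    for element in elements.dropFirst() { combine(&result, element) }
    return result
}

let generated = expand(count: 100, from: 0) { (n: inout Int) -> Int in
    n += 1
    return n
}                                         // [1, 2, ..., 100]
let total = fold(generated) { $0 += $1 }  // 5050
```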

Note that the precise signatures of the new initializers are specific to Vector, and only Vector -- in other cases, we'll probably want these initializers to take an explicit count.
Still, the new initializers are establishing useful new terminology, and they set an important naming precedent for defining similar operations elsewhere (if we decide to do that). I believe they are particularly important to have on Vector, so now is a good time to introduce them, and to argue about their names.

For example, would Vector(unfolding: foo) { ... } and Vector(expanding: bar) { ... } read better?

On the `reduce` algorithm

Vector.reduce(into:_:) is an analogue of the preexisting Sequence.reduce(into:_:) algorithm. It is less clear to me that it is a good idea to add this operation right now, as I expect the container protocols will (soonish) come with generic variants for the same operation, and there is a real chance that this definition will interact badly with those.

The primary problem is that removing the expectation of copyability splits the classic reduce(into:) into (at least) two separate variants: one that borrows items, and another that consumes them, along with the vector itself.

The proposal makes the decision to use the unqualified reduce(into:) name for the borrowing variant, but I don't think it's likely that will fit our eventual general rule. In any case, it is much too early to make that decision right now -- we do not have full insight into how the dust will settle. (For example, one potential direction is to generally use the unqualified names for the consuming variants, and to introduce a "borrowing view" to hold the others: vector.map vs. vector.borrowing.map.)

Accordingly, I recommend removing reduce(into:) from the proposal, pending future work in this area.

Alejandro · November 19, 2024, 10:01pm

The typealias is mainly an artifact from when this type conditionally conformed to Collection et al. It is trivially removable, since it shouldn't be very useful now that the conformance has been removed.

wes1 · November 19, 2024, 10:56pm

Should the docs emphasize that the fixed size has to be statically determinable?

After using this a bit, I seem to be blocked by the constraints on the Integer generic parameter being a literal or a type parameter. I can't build anything like resize or even initialize from array for arbitrary sizes, and virtually all of my applications require an arbitrary size.

If developers can't write this, could perhaps Array and String.utf8 be updated to produce a Vector with the corresponding count?

Similarly, should the docs state that every element is de-initialized, notwithstanding errors (including initialization failure)?

wes1 · November 19, 2024, 11:26pm

FYI, here are some issues with the current toolchain implementation that shouldn't affect the semantics discussion. Worth a bug report?

First, very large count parameters crash the compiler.

Second, while looking at how swapAt(_:_:) handles ~Copyable elements, I found instead that reading a ~Copyable element from a mutable Vector crashes the compiler, with a trace that includes Found outside of lifetime use?!

@main
struct M {
    public static func main() {
        struct T: ~Copyable { var i: Int }
        var tn = Vector<2, T> { T(i: $0) }
        let i = tn[0].i
        print(i)
    }
}

Jean-Daniel · November 21, 2024, 7:41pm

I think Array is what you're looking for. Vector is explicitly designed as a fixed size collection (with size statically defined and part of the type).

The constraints on the Size parameter are defined by SE-0452: Integer Generic Parameters and may be relaxed in the future, but Vector would still remain fixed size.

wes1 · November 22, 2024, 1:50am

<apologies: wording/bikeshedding only...>

Understood. Resize et al. would produce new vectors with new elements, and hence a different type with a new count parameter. Initializing a Vector with the same count as an input Array is the first thing many users will want to do, and something they could do in every other language with arbitrary fixed-size buffers. I don't object to the proposal's limitations, but I believe they could be communicated better, so I raised a documentation issue.

For docs/communication, I (still) think most readers will find the proposal's terms "fixed-size", "array", and "integer" confusingly similar to other concepts with broader meanings.

"Inline storage of fully-initialized values" really helps to explain this type and imply its other constraints, but leaves out count. And it makes "array", a variable-capacity type with heap-allocated storage, a doubly inapt comparison.

All Swift types are fixed size. Here what's fixed by the type parameter is not the size but the count/capacity. I appreciate the generality of "integer generic parameters" (discussed in its future directions), but we could avoid implying anything about integer semantics here by describing this as something like a "count-dependent type" and referring to the "count parameter". This is a nominal buffer type that happens to use a number as a name, signifying the element count and serving equality constraints.

e.g., here's code using count as a type parameter

extension Vector {
  func mirrored<T>(_ by: (Element)->T) -> Vector<count, T> {
    .init { by(self[$0]) }
  }
}

What could go wrong with this usage? Well, the difference between "count" and "capacity" is exposed as soon as one overlays Vector with zero-value behaviors, as needed to consume elements or use parts (e.g., in a ring). (I'm still wondering what it means to consume indexed ~Copyable values.) And as soon as the type checker learns to do forms of M+N or M*N where dimensionality is implied, "length" might sound better.

So: "fixed-count inline storage of fully-initialized values"? Rename count to capacity or length for future-proofing?
