Vector, a fixed-size array

Can we actually just disable implicit copying on the Vector type?

Would it make the type too unwieldy if we required users to explicitly write out copies using the copy operator?

2 Likes

Does “don’t know anything more specific about” mean “not special-cased in the compiler”? Because otherwise I am concerned that within a single module, the language rules allow calls to func f(arg: borrowing Vector<1, AtomicInt>) to be incorrectly optimized into pass-by-value calls.

This is just impossible. Implementation wise, Vector stores a special builtin type. When the compiler is determining whether Vector is pass by value or pass by address it goes to look at all of the stored properties to determine this fact. This builtin says, "if my element is address only, then I am address only. Otherwise I am loadable." Vector<_, Atomic<Int>> will never accidentally be passed by value in any such form ever.

1 Like

So the saving grace isn’t lack of special knowledge, it’s actually more special knowledge.

We have @_staticExclusiveOnly, @_rawLayout, and now this special dependent version of @_rawLayout. Is there perhaps a missing lower-level piece of the puzzle?

Vector<1, AtomicInt> cannot be copyable because it will directly store a move-only value. If the compiler can see the structure of Vector and AtomicInt (which it can see since both should be @frozen), it can see that AtomicInt is also annotated with @_rawLayout which means the compiler cannot move it and must pass it by reference. If the compiler cannot see the structure of either type, it must assume either could be @_rawLayout and must still pass by reference.

Edit: Alex is faster than I am. :slight_smile:

1 Like

The entire standard library is built on special compiler magic. It's not unreasonable to say that a new fundamental type like Vector is built on a little more special magic.

2 Likes

I’m not ruling out special magic entirely, but I am pointing out a number of indications that there might exist a better underlying formalization.

This was also my original feeling, but I've had trouble coming up with an alternative design that doesn't create other problems. I touched on this a bit in the "alternatives considered" for the integer generics proposal:

I think we do want to allow people to have "tail allocations" inside of structs and other value types, and once you allow for that, you have to define how to move/copy/destroy the values inside of that allocation. If we don't have a type to attach those definitions to, then we need to put it somewhere else. And if we don't track the size of the overall type with its "allocations" as generic parameters, then we need to propagate it some other way, that also composes when you nest types and so on. There might still be a nice alternative to a concrete Vector type that we haven't thought of.

I also don't think that Vector's copyability is inherently undesirable. For small buffers inside of small structs that are otherwise copyable, it's desirable to be so. We do need better control over implicit copying, but it seems to me that the thresholds for when people care about implicit copies are not universal, and they also don't cleanly follow type substitution rules, since it's usually a combination of size, number of nontrivial elements, genericity, and other variables that make a copy "expensive".

9 Likes

Given tail allocations and integer generic parameters, a client can construct their own Vector pretty trivially—unless a correct implementation requires builtin compiler magic that is only available to the stdlib. This is why I want to challenge the assumption that such magic is required.

I agree. It would be a minor shame if Vector<1, Int> could not be optimized the same way as Int, but I think that’s a reasonable sacrifice to preserve the correct semantics of Vector<1, AtomicInt>, or to avoid the pitfalls that @lorentey pointed out above.

While unfortunate, not conforming Vector to Sequence and Collection would be an OK stopgap in my opinion.

Span has the same limitation as well and in both cases the solution would be to fall back to a manual for loop.

I'd imagine the people eager to use either probably have a large overlap and represent only a small subset of Swift developers so it's probably OK as a temporary state until we get borrowing iteration.

6 Likes

Alternatively, given integer generic parameters and Vector, you can do tail allocation. Either way, the magic has to bottom out somewhere, either special syntax or a special stdlib type.

2 Likes

You can certainly talk about how some operation on Vector<N, Int> takes O(N) time. Whether the size of the collection is part of the type or not doesn’t change the time or space complexity of an algorithm.

4 Likes

Alas, that just means Vector’s conformance to Collection is technically, not just practically, out of compliance.

Apologies for that side note; I did not intent to derail the discussion with esoteric snark about the big-oh stuff. If y'all consider Vector's slicing subscript to give off complexity vibes closer to "O(n)" than "O(1)", then that's alright with me.

Technically this does not, by itself, prevent the type from conforming to Collection -- the protocol's performance expectations are documented not to be binding, after all, and Collection types do routinely violate them, inside and outside the stdlib. But it is certainly a mark against it.

Linear time complexity for slicing and makeIterator() is a real problem, but it may not be as big as it initially looks. (Although it does work directly against the goals of the performance predictability roadmap that this effort is ultimately a part of.)

Space complexity, on the other hand, will certainly be a very serious problem. (Note: neither Sequence nor Collection has documented expectations about memory use.) Vector is very much intended to gain widespread use in severely memory-deprived environments, oftentimes serving as storage for C-style buffers. It seems unlikely we can afford the luxury of making full copies of such things every time someone dares invoking a for-in loop over them.

So it seems something has to give: we can either (1) try changing Sequence and Collection, or (2) simply not conform Vector to either. I argue that at this time, it would only be practical to choose the latter option.

Sequence and Collection deeply assume that it is cheap to copy conforming types; indeed that's a large part of what gives them their expressive power. Iteration, slicing, our rich set of lazy transformations are all heavily based on this assumption. I do not believe we can (or should) try to break this.

However, with an investment of considerable sweat and tears, it may be possible to soften this (and other) assumptions. I think there is a real chance that this will end in failure; and I believe it would be an unwelcome distraction to try going in this direction at this time.

I believe we have a better chance of succeeding at this once we have established a workable model for containers of noncopyable/nonescapable elements, and once we have successfully generalized a few more protocols than just ExpressibleByNilLiteral. Trying to start directly with Collection would not end well.

We have a real and extremely pressing need for a noncopyable container model; but after extensive effort, we still do not have a workable plan that would allow Sequence and Collection to support such containers. Accordingly, the plan has been to propose a separate set of protocols to serve them. The new protocols need to trade some of Collection's elegance and expressive power for unbridled (and, crucially, predictable) performance.

Obviously, Vector will be able to efficiently conform to these new protocols. The noncopyable container protocols will provide Vector with (borrowing/consuming) for-in loop syntax and a set of efficient generic algorithms, avoiding the performance pitfalls that plague Sequence/Collection. I expect they will satisfy the very singular taste of low level systems engineers working in memory-constrained contexts.

(We also have the option to go even further, by not just removing the Sequence/Collection conformances, but also removing Vector's conditional copyability. This would avoid all problems with copying vectors, and that seems tempting -- so let's look at why I think that would be a bad idea.

The performance concerns that arise from copying Vector are in fact precisely why we initially drafted it as a noncopyable type, until a colleague smarter than me showed us why it needs to be conditionally copyable. (Conditional) copyability is what makes Vector a vector, and it is what gives it its primary use case: to be a less awkward substitute for homogeneous tuples. I do not believe a noncopyable type could fill that need, but I think we do need solve it, e.g., for sensible C interop.

Removing Vector's conditional Sequence/Collection conformance certainly harms this tuple-replacement use case, but it still leaves plenty of food on the plate. Removing its copyability would break it.)

10 Likes

I'm of the opinion that we should just accept it as a constraint in the initial version of this type because the utility of Collection conformance far outweighs the potential performance problems. Once we have borrowing/non-escaping iteration (which we presumably need for move-only collections anyway) we can move the type over to it and eliminate the performance issue.

I can certainly appreciate the concern about embedded or low-memory systems here, and I think my response would be "do not iterate in embedded." Perhaps we could suppress Collection conformance when $Embedded, at least in the short term?

3 Likes

I think we can assume that we will add some syntactic sugar for this new type, in line with arrays and tuples.
Given that, it's not too important to pick a short and catchy name (like the proposed Vector) for "the real name". Better keep these short names for specialized usages (e.g. mathematical vectors) and use a longer and more descriptive name for this new type, e.g. FixedSizeArray.

5 Likes

Wouldn't the main issue be generic contexts rather than for-in iteration though?

For for-in this might get fixed with a recompile after borrowing iteration is available. But the footgun of passing a Vector to an API accepting only a Sequence or Collection and not their future borrowing counterparts would remain forever.

Although that could maybe be addressed with compiler or linter warnings once the new noncopyable container protocols exist?

3 Likes

I suppose we could punt a bit and add an asCollection(_:) member, something like:

extension Vector {
  public typealias CollectionRepresentation = UnsafeBufferPointer<Element>
  @inlinable public borrowing func asCollection<R, E>(_ body: (CollectionRepresentation) throws(E) -> R) throws(E) -> R {
    try withUnsafeBytes(of: self) { buffer throws(E) in
      try buffer.withMemoryRebound(to: Element.self) { buffer throws(E) in
        try body(buffer)
      }
    }
  }
}

And then rely on the compiler avoiding a copy when getting the address for the buffer pointer. But that just doesn't seem particularly ergonomic, and the compiler can be a fickle beast.

I don’t think this assertion stands without a very strong argument. Accidentally-quadratic algorithms are a well-known trap in Swift’s iterables design, and as @Karoy_Lorentey pointed out it would be particularly ironic for a loop over a Vector to be quadratic in time and (super-)linear in space when the whole point of the data type is to reduce external allocations.

14 Likes

Given all of the discussion here about needing more language support for noncopyable/nonescapabale container types and efficient/syntactically concise iteration over them, I’m left wondering if this proposal isn’t putting the cart before the horse - sure, this proposal addresses some real-world gaps now, but until the supporting pieces (like borrowing for/in loops and efficient Collection conformance or equivalent) are there I think we run the risk of discovering later that this proposal made an incorrect long-term choice and have to deal with a painful transition for what is being pitched as a fundamental, low-level building block type.

There are currently ugly but useable and well-understood workarounds to the immediate pain this pitch addresses and I’d personally like to see this punted until there are more pieces in place for the larger vision of how this type fits into the standard library and language features.

14 Likes