Approaches for fixed-size arrays

Joe_Groff · July 12, 2022, 9:15pm

We're looking into how best to give Swift the equivalent functionality of fixed-size arrays in C or other systems languages. Here are some directions I've been considering, and I'd love to hear the community's ideas and reactions:

Tuples as fixed-size arrays

This was the approach outlined in our most recent pitch on the subject: we already use homogeneous tuples to represent imported fixed-size array fields from C, and library developers sometimes reach for them manually (usually under duress) to reserve fixed-size space within a data type. So we could embrace this hack, and give the language nice sugar for declaring (n * T) instead of (T, T, T, ...), make such tuples conform to collection protocols so they can be indexed, sliced, and transformed with standard collection APIs, and so on. The main benefit of this approach is compatibility, particularly:

we don't need to break source compatibility with the C importer's existing behavior
we don't need to worry about back deploying a new kind of type metadata to older OS runtimes

On the other hand, there are drawbacks, including:

Swift does not normally reserve tail padding for aggregates, including tuples; for example, the size of (Int32, Int8) would be 5 bytes, even though the stride (the size rounded up to alignment, representing the distance between contiguous values in memory) would still most likely be 8. And by extension, the size of (2 * (Int32, Int8)) would be 8+5 = 13 bytes, rather than the full 16 bytes. If this becomes the recommended tool for fixed-size arrays, then generic code will have to be mindful that the final element may not occupy the entire stride of the type.
Swift's calling convention currently always eagerly destructures tuples. A function taking a (1024 * Int8) argument would therefore take 1,024 Int8 arguments at the machine code level, which is obviously not great. Thankfully such argument types are currently rare, so we could probably get away with changing the default ABI for very large tuples, but as with any ABI break we'd still need to do work to keep compatibility with the original convention just in case.
Tuples have a wide range of other special case behavior, such as elementwise implicit conversions, labeled/unlabeled conversions, exploding into closure arguments, and so on, much of which is not very scalable. Making it easy to define very large tuples will exacerbate these problems if we don't mitigate them.

A new kind of type

We could introduce a new kind of builtin type to represent fixed-size arrays. Starting from a clean slate, we can keep them detangled from the complexities of tuples. Conversely, if we take this route, we have to consider how a new fixed-size array type interacts with backward compatibility with the existing ecosystem:

A new kind of type comes with new runtime metadata formats which won't be fully supported by the Swift runtime in already-shipped OSes. Such types would thus either become unavailable when deploying to those OSes, or be available with limited functionality, unless we're able to back-deploy full runtime support for them.
We would want to migrate the C importer's behavior to use this new kind of type when importing array fields of C aggregates, which would create source breaks. We could manage this by some combination of:
- importing C array fields twice, once under their own fieldName as the existing homogeneous tuple representation, and again as fieldNameArray or something similar as a fixed size array,
- conditionalizing the behavior on language version mode, so that Swift 6 code sees the imported field in its array form.

A nominal type using new generics features

With some new type system features, particularly integer generic arguments, we could theoretically express fixed-size arrays as a regular library type:

struct FixedArray<n: Int, T> {
  // something to represent the layout of the array
}

Any such addition to the language would more than likely have the same back deployment issues for handling new type metadata as a builtin type would. We would also still need some kind of primitive type, perhaps available only to the standard library, to represent the underlying layout of the FixedArray type, so as far as workload and tradeoffs, this approach seems like a superset of the "new builtin type" approach.

Views over fixed-size storage

Alternatively, instead of directly exposing fixed-size arrays as first class types, we could provide the ability for type declarations to allocate fixed-size space within themselves, with API to expose the fixed-size buffer as an UnsafeBufferPointer (or, using our planned extensions to allow for move-only and nonescaping types, a safe BufferView type). We could have something that behaves similarly to a property wrapper, allocating storage for N elements in the containing type, and presenting the wrapped property API as a BufferView
of the storage:

struct MyFloat80 {
  var exponent: Int16 = 0

  // Allocate space within MyFloat80 for four contiguous UInt16 values
  @FixedArray(count: 4, initialValue: 0)
  var significand: BufferView<UInt16>
}

MemoryLayout<Float80>.size // => 10
Float80().significand[0] // => 0

A big benefit of this approach is that it avoids having to expand the type system in backward-incompatible ways—we'd need a mechanism to expand the capabilities of property wrappers so that the FixedArray annotation can allocate the right amount of storage in the containing type, but that's a compile-time transformation, and once that's done, the resulting type representation with fixed storage should fit within our existing runtime type system. This view-over-storage approach could also be a potential design direction for allowing systems programmers to allocate raw memory within classes and move-only structs, for use with atomics, locks, and other low-level primitives that cannot be stored in normal Swift properties, allowing one to write something like:

class LockHolder {
  // Store a lock inline in the object without a separate allocation
  // or dirty hacks
  @RawMemory(of: os_unfair_lock.self, count: 1)
  var lock: UnsafePointer<os_unfair_lock>
}

or even, when we have move-only types:

@moveonly
struct UnfairLock {
  @RawMemory(of: os_unfair_lock.self, count: 1)
  private var lock: UnsafePointer<os_unfair_lock>

  init() { lock.initialize(with: OS_UNFAIR_LOCK_INIT) }
}

class LockHolder {
  var lock = UnfairLock()
}

On the flip side, implementing this design safely relies on a chain of other new language features: we need move only or nonescaping types, as well as the new BufferView family of move-only view types. We are actively working on these features, but there is nonetheless a risk they may not be completed. A BufferView also loses some possibly useful static information about the underlying fixed-size storage

My thinking

I was originally in favor of the homogeneous tuples approach; despite the shortcomings and legacy issues with tuples, it seemed worthwhile to avoid any question of whether something as fundamental as fixed size arrays could be back deployed. Since then, we've gained some practical experience handling back deployment of expansions to the type system, with varying degrees of functionality:

Concurrency added a bunch of new features to the runtime type system, including new builtin types and new function type attributes. We were able to successfully back deploy this functionality to older OSes using a dynamic library; however, doing so was a nontrivial amount of engineering work.
Swift 5.7 adds new forms of existential types like any Protocol<AssociatedType>, and extends the runtime metadata representation to support more complex constraints. These newly supported cases aren't fully supported when deploying to older OSes; however, they can be used in limited circumstances when we know the new metadata forms will not be necessary at runtime, such as in struct or class fields, or as nongeneric function arguments.

Particularly, our experience with expanding existential types is promising for the prospects of introducing a new kind of fixed size array type; it would be very useful to be able to use fixed-size arrays as fields and function arguments, even if they aren't fully expressive when targeting older OS versions. One wrinkle with taking this approach for fixed-size arrays is that they should be Collections, and get most of their API by virtue of being Collections. Calling an unspecialized generic method on Collection would need the type's metadata, and that would not be possible for fixed-size arrays on older OSes if we take this approach. We could conceivably do some forced specialization when possible, for collection methods in the standard library that are inlinable, and that may get us pretty far.

All that said, I see a lot of appeal in the BufferView approach. Oftentimes, fixed-size arrays aren't directly useful for specifically being an array of their fixed size, but are used as a tool to provide what's really an efficient fixed-capacity collection or buffer for some variable amount of information. That leads me to wonder whether fixed-size arrays are more a means to that end, and that focusing on the more general case of being able to allocate a buffer inside the layout of a type is the better problem to solve, and is an approach that can also potentially address the actual fixed-size array case.

benpious · July 12, 2022, 11:13pm

Having tried to use the Vulkan graphics framework from Swift recently and getting bitten by nearly every issue described in the older pitch, I'm very happy to see that there are plans to improve this.

I think the FixedArray approach is interesting if other types are allowed to take advantage of using an integer for the dimension, for example, one could then potentially implement a 3D math library with decls like Matrix<4> for 4 column arrays, Vec<3> for 3D vectors, etc.

The spelling of the last proposal seems like it would be super verbose if the author of a program needs a lot of fixed length arrays; it might make sense to have some sugar for it like [Int; 4] like Rust does.

compnerd · July 12, 2022, 11:28pm

When it comes to working with low level code, the fixed sized arrays being an array of their fixed size tend to be fairly useful. Consider cases where you are processing binary data with explicit record types potentially if fixed known arrays.

Personally, I find myself leaning towards the nominal type with the generic constraints swapped - struct FixedArray<T, N: Int> {...}. This doesn't preclude the views over fixed-size storage and that may still be useful even with the nominal type.

ibex10 · July 13, 2022, 2:57am

Wouldn't something like this be even more natural to write?

// Arrays
let store : Array <Int> [3] = [0, 0, 1]
// or
let store : Int [3] = [0, 0, 1]

// Vectors - I wish Swift had these
let rx13 : Matrix <Int> [3][3] = [0, 0, 1,
                                  0, 1, 0, 
                                  1, 0, 0]

let r1: Row <Real> [3] = [0, 0, 1]

let c1: Column <Real> [3] = [0, 0, 1]

technogen · July 13, 2022, 4:13am

Thank you so much for pushing this forward!

I'm very strongly in favor of the last approach where types are allowed to allocate extra memory in-place. This feature alone would be extremely useful and would cover a lot of use cases for low-level data representations. Later on, when non-type generic parameters are added, a proper FixedSizeArray could be implemented.

DevAndArtist · July 13, 2022, 5:18am

Extra advantages of a nominal type using new generics features is that:

it could use a Range instead of just an integer representing a count which would cover many many more general areas
it would seamlessly introduce the NonEmpty-ness (assuming older collection types could be extended with a default unbound range)

—-

FixedArray is just one type that does not cover the majority of problems. An Array today is a dynamic and unbound collection with the possibility of being empty. Being able to statically represent the boundary of a collection would elevate the existing types to a new level. We could be able to express non-empty sets, dictionaries and other types of collections. We could have collection with minimal amount of values and a fixed upper bound. A property wrapper simply does not seem to cut here.

Erik_Eckstein · July 13, 2022, 9:29am

I think we have to distinguish two use cases:

Where the datatype layout matters and the fixed size array is just n times the element and nothing more, e.g. for clang importing or for interpreting binary data. IMO, for this use case homogeneous tuples are a good solution. It's probably not even needed to support non-trivial or generic element types for this use case.
A "full-featured" fixed-size array type, which can be used as fast alternative to swift's Array.

I think we are in agreement that a goal for a fixed-size array type is that it's a "save" data type. That means:

no buffer overflows: we can do that easily with bounds checking (for both use cases)
no use after free: as you said, for BufferView we can ensure that with move-only/non-escaping language features.
no use of uninitialized data. This has some interesting implications on how to initialize a fixed-size array:

Either it's initialized in one step, e.g.

  var a = FixedSizeArray<String>(repeating: "-", count: 5)

or something like a closure based initializer:

  var a = FixedSizeArray<String>(count: 5) { computeSomeStringForIndex($0) }

But it would also be nice to support initialization with a loop, e.g.

  var a: FixedSizeArray<String>(capacity: 5)
  for i in 0..<5 {
    a.append(computeSomeStringForIndex(i))
  }

For this, the fixed-size array type must maintain some kind of "length" field, which separates the initialized from the uninitialized elements.
A length field would also enable to use a fixed-size array to implement data structures like llvm::SmallVector.

Such an array would rather be a "fixed-capacity" array than a "fixed-size" array.

Therefore I'm also in favor of the BufferView approach (if I understood correctly, a BufferView contains a "length").

Though, it would be nice to support fixed-size arrays with a nice and simple syntax, like with the nominal type approach.
I don't think we need integer generic arguments for this (which would probably a non-trivial thing to add).
It's also doable with "regular" initializer arguments, e.g.

  var a: FixedSizeArray<String>(capacity: 5)

How this works:

the low-level allocation of the storage (implemented in the library) is done with a SIL builtin.
the compiler checks if the argument to the builtin is a constant
the initializer of FixedSizeArray is required to be @inline(always) + @alwaysEmitIntoClient (it makes sense to add a new attribute for this, e.g. @guaranteedInline)

This would also enable that FixedSizeArrays can be nested into other data structures, e.g.

struct Matrix<T> {
  var elems: FixedSizeArray<T>
  
  @guaranteedInline
  init(rows: Int, columns: Int, initialValue: T) {
    self.elems = FixedSizeArray(repeating: initialValue, count: rows * columns)
  }
}

And actually, the capacity doesn't even need to be a compile time constant: the underlying allocation builtin can be codegen'd to a dynamic alloca.

lukasa · July 13, 2022, 11:40am

I think @Erik_Eckstein has a great idea here: we do want to separate this into two problems. The C importer has limits on what it can do and how safe it can be. In particular, it cannot make the safety guarantees that Erik would like to see.

These two separate problems definitely make the BufferView approach tempting.

ibex10 · July 13, 2022, 12:28pm

How about writing a fixed-size array in a concise way, like this?

let a: Array <String> [3] = ["f", "o", "o"]

// or
let a: String [3] = ["f", "o", "o"]

// or
let a: [3 x String] = ["f", "o", "o"]

// 2 dimensional - 3 rows of 5 Ints
var u: Array <Int> [3][5]

// or
var u: Int [3][5]

// or
var v: [3 x [5 x Int]]

Gero · July 13, 2022, 2:38pm

Having by chance read the older indirect for structs thread the @FixedArray/"Views over fixed-size storage" and @Erik_Eckstein points make me wonder if this could or would be abused to define recursive structs, like trees and such.

As that thread mentions people have used the regular non-fixed arrays to get a level of indirection to store values of a struct inside itself (heap be damned, I guess). I've done it myself, actually, but it feels icky as you're using a non-fixed array in spite of knowing how many elements it can/should always have (in a simple linked list either one or none).
Or in other words: If there is something like a fixed array, would this be a valid and good place to use one? Because people will surely do that.

Now I don't know much about what move-only types would entail, so I am not sure whether this potential backdoor into quasi-indirect structs would be even possible with the approaches discussed here (i.e. if there are aspects of this that would be preventing an indirection/recursion), but I thought I'd bring that up so smarter folks can think about it.

John_McCall · July 13, 2022, 3:30pm

How would the lifetime restriction work here such that this could actually be stack-allocated? And it doesn't seem like you could make this a stored property of another type.

If we're looking for a way to create a fixed-sized array exposed as a BufferView/UBP without introducing a first class type (which would either require a new user-exposed structural type or non-type generic arguments), I think we need some sort of property wrapper with a @const length parameter which just creates fixed-size array storage behind the scenes using whatever builtin magic we decide upon in the compiler.

The only problem I see with the magic-property-wrapper approach is that you'd also have to write the type annotation with BufferView, i.e.

@FixedSizeArray(count: 512) var buffer: BufferView<UInt8>

But maybe that's not so bad given that we expect these to be somewhat rarely used.

And hmm, I suppose you'd always be able to write this:

@FixedSizeArray<Int>(count: 512) var buffer

A little unnatural, but tolerable.

tera · July 13, 2022, 3:54pm

Syntax wise, I'm with the "as natural syntax as possible" camp on this. Like the mentioned:

var x: [Int] = [1, 2, 3] // normal resizable array
var y: Int[3] = [1, 2, 3] // fixed array
var z: Int[2][3] = [[1, 2, 3], [4, 5, 6]] // fixed 2d array
var t: Int[] = [1, 2, 3] // possibly even this: new syntax for normal resizable arrays

Joe_Groff · July 13, 2022, 5:07pm

The reason I put the count first in my proposed syntaxes (whether that be (n * T), [n * T], or FixedArray<n, T>) is so that the dimensions compose when you nest the array in the same order as you index them. If you have (4 * (5 * Int)) then the indices range from array[0][0] to array[3][4] and you don't have to mentally swap them, and/or we don't need a special case language rule to interpret a postfix type syntax like Array[4][5] by inverting the nesting of the counts.

Right, that's the idea I had in mind with that approach.

Erik_Eckstein · July 13, 2022, 5:46pm

With the BufferView approach, the actual buffer doesn't need to be located within the data type. It can be allocated somewhere on stack, together with the value which contains the fixed size array.
So the idea is that with the @guaranteedInline initializer, the low-level allocation builtin will end up in the function where such a value is constructed and alive. This builtin then directly translates into a stack allocation. (In case of a class property or global it's a bit different, but the principle is the same).
This allows to put a fixed-size array (or any type which contains a fixed-size array) into another type - as long as their initializers are @guaranteedInline.
The nice thing about this is that the size of the fixed-size array doesn't need to be constant. You can e.g. define a general-purpose "Matrix" (containing a fixed-size array) with a configurable size. See my example.

John_McCall · July 13, 2022, 5:48pm

Sure, but I think that might be a different feature. (It also overlaps heavily with SE-0322.)

Erik_Eckstein · July 14, 2022, 8:49am

It doesn't matter to which feature this maps to. Most importantly we should try to make a great fixed-size array

designerfuzzi · July 16, 2022, 7:39pm

apropos backward compatibility.. throwing this in here..
dealing with clang, there are the following CXTypes for arrays interpreted from C++ or ObjC..
we have of course CXType_ObjCObjectPointer .. in example: NSArray or NSMutableArray etc. which has its sister types Array<Any> in Swift.
and there are..

CXType_ConstantArray
CXType_Vector
CXType_IncompleteArray
CXType_VariableArray
CXType_DependentSizedArray

plus there are UnsafeBufferPointer<YourType> etc.. with the known approach of access via tuples, which is extremely cumbersome to deal with when developing with Audio buffers, Metal shared memory buffers or just some simple [CChar] of fixed size, aligned or nonaligned storing. I almost always end up coding UnsafePointer casts and make use of its siblings storing the length somewhere else.. so yes ConstantArray or DependentSizedArray rather then "FixedSizeArray" would be awesome and finally getting rid of one major barrier of coding in swift. I have my problems with "fixedSize" because fixed size when? ConstantArray or DependentSizedArray like-like make it clear size/stride aka length is known at compile time or is given at runtime.
The most important missing feature is being able to access indices without this tuple nightmare.

Matrix is just a naming convention and comes down to its data type because you can store simd [float4] in an array and have a matrix as well..

var variableArray: [Int] = []
var constantArray: [Float](128) = [] // would be nice..
var constantMatrix: [float4](128,4) = []  // also sugar

and certainly those data types have use cases which will most likely be accessed in for-loops or while-loops or direct access via Index

using a property ( see _y below) to store one dimension count the following allows then 2 dimensional access

subscript (lhs1: Int, lhs2: Int) -> T {
    get {
        let lhs = (lhs1 * _y) + lhs2
        if lhs > value.count-1  {
            fatalError("out of Bound")
        }
        return value[lhs]
    }
    set {
        let lhs = (lhs1 * _y) + lhs2
        if lhs > value.count-1  { return }
        value[lhs] = newValue
    }
}

which lets me access like

container[1,2] = 123 // and that looks swift to me.

ibex10 · July 17, 2022, 6:37am

Then, what would the syntax look like for accessing an entire row or entire column of the container?

I use the following syntax in P64, an app which I have developed for linear algebra.

let tm = Matrix [4][4](...)

let u = tm [1][2] // third element of second row

let r0 = tm [0] // first row
// or let r0 : Row [4] = tm [0]

let c2 = tm [][2]  // third column
// or let c2 : Column [4] = tm [][2]

designerfuzzi · July 17, 2022, 7:52am

as i coded my test stuff stored in some UnsafeMutableBufferPointer<T> inside a struct
i can do this..

subscript (range: Range<Int>) -> UnsafeMutableBufferPointer<T>.SubSequence {
    return value[range.startIndex..<range.endIndex]
    // return value[range]  //...  or simply
}

That should allow to access it with a range. I mean the rest is math and out of Bound havoc.

michelf · July 17, 2022, 12:19pm

Personally I think the tuple approach fits the language better on a conceptual level, but I can see it has issues.

So I wouldn't mind having (2 * Int) be a fixed-array type distinct from (Int, Int) if that helps things ABI-wise. I think it'd be nice if they could be share the same literal syntax, match the same way with case, and easily convert to and from the tuple form with explicit coercion like myFixedArray as (Int, Int) and myTuple as (2 * Int).

The fixed-storage-property-wrapper-with-a-BufferView approach would make fixed arrays much less approachable. Compare this:

  // Allocate space within MyFloat80 for four contiguous UInt16 values
  @FixedArray(count: 4, initialValue: 0)
  var significand: BufferView<UInt16>

to this:

  // Allocate space within MyFloat80 for four contiguous UInt16 values
  var significand: (4 * UInt16)

and tell me why the first one is a better syntax. I see exactly the same thing with a lot more boilerplate. Even the tuple syntax (UInt16, UInt16, UInt16, UInt16) is better than this.

In short, the (4 * UInt16) syntax could produce its own type distinct from a (UInt16, UInt16, UInt16, UInt16) tuple type. And that type could be initialized with a tuple literal, destructured like a tuple in case matching, easily convert to an from its tuple form with type coercion, disallow special tuple behaviors that scale badly, and have its own ABI and calling convention.