Approaches for fixed-size arrays

rex-remind · July 18, 2022, 2:10am

I think it's important to address the use cases @benpious pointed out - leveraging accelerators and related libraries for graphics and compute, or even compute on cpu; it seems sensible from that perspective that a tensor, matrix, or vector type dependent on generic integer dimensions would be the most ergonomic approach (e.g. struct FixedArray<n: Int, T> and friends).

Following from @Erik_Eckstein it does seem like (1), i.e. (n * T) tuple, makes sense as an isolated feature for clang importing, but it's possible that (2) i.e. "full featured" fixed-size array has multiple features among it depending on use case, and not to mention (3) BufferView has it's own other uses beyond fixed sized arrays.

It sounds like the attempt here is to kill 2+ birds with one stone, does it make sense though to pigeonhole the graphics/compute use case into these solutions or is that hamstringing it and it should be considered separately?

I personally have an affinity towards dimension dependent types, and yet the same question should then be asked about BufferView use cases if the topic's focus was instead in the dependent types direction.

fan · July 18, 2022, 3:06am

Glad to see this moving forward! When we were doing SwiftFusion we debated a lot on this issue. C++ has const generics so it is possible to write Matrix<10,100> which is automatically a new type. In Swift we end up writing a lot of gybs to generate types like Matrix10x100 - works but certainly not elegant.

This use case however is slightly different from the proposed, specifically in that what we want is not simply an array of some kind, but a way in the type system to represent type parameters. This will have huge implications. For example, is this something Swift's type system can elegantly handle?

struct FixedArray<n: Int, T> {
  var a: Array<T>;
  init() { a = Array<T>.init(repeating: T.zero, count: n);  }
}

More interestingly, can we do this?

func prod<m: Int, n: Int>(a: Matrix<m,n>, b: Matrix<n,m>) -> Matrix<m,m> { ... }

Geordie_J · July 18, 2022, 9:04pm

FWIW I've mostly come across "fixed size arrays" in the wild when importing code from C, either directly or indirectly. For example: dealing with SysEx messages in CoreMIDI gives you some great types like 255 * Int8 (I will spare you all from writing out what that really imports as). In that case we really just want to be able to do something with the contents – in this particular case, we want to convert (part of) the fixed type to a string. So some "safe" generic "view" over the array would be great for that use case. The workarounds we have for this use-case are ok for now – they are usable, but ugly.

The other use-case I come across a bunch is wanting to have a guaranteed (if such a thing is possible) stack allocation of a fixed size, for performance reasons. Examples of this are working with scratch buffers when doing audio processing – maybe we want 512 Floats for the lifetime of a call. Or maybe doing some processing of (C)Strings where intermediate results get thrown away. To be honest I'm not sure what kind of API would be ideal there, but ideally it would allow interfacing with generic Collection code – to be used in the same way we would use a standard array. Using a Tuple for this kind of use-case is basically impossible as is (because getting a pointer to it promotes the value to the heap AFAIK, which defeats the point).

Oschly · July 19, 2022, 7:26am

var t: Int[] = [1, 2, 3]

Looking at it gives me idea of fixed-size array, but its size would be resolved by counting elements in the initial value.

ibex · July 19, 2022, 11:31am

counterpart in C++ world:

const long v [] = {1, 2, 3, 4, 5};
const long v_size = sizeof (v) / sizeof (v [0]);

abdel-17 · July 19, 2022, 1:51pm

MemoryLayout<T>.size * count is more ideal because you don't need to check if the collection is empty.

tera · July 19, 2022, 1:59pm

We can reuse "_" to represent derived fixed size:

var t: Int[_] = [1, 2, 3] // same as Int[3]

BTW, with the new Int[] syntax instead of [Int] for flexible arrays (and with the new Int[3] syntax for fixed arrays) it would be more logical to have the new Value[Key] syntax instead of [Key: Value] for dictionaries. On the second thought I'm taking it back.

Joe_Groff · July 19, 2022, 3:57pm

michelf:

The fixed-storage-property-wrapper-with-a-BufferView approach would make fixed arrays much less approachable. Compare this:
  // Allocate space within MyFloat80 for four contiguous UInt16 values
  @FixedArray(count: 4, initialValue: 0)
  var significand: BufferView<UInt16>
to this:
  // Allocate space within MyFloat80 for four contiguous UInt16 values
  var significand: (4 * UInt16)

As a strawman idea, what if we provided sugar syntax for declaring fixed-size buffer members that provide a BufferView as API, so you can write something more like the second declaration to get the effect of the first?

michelf · July 19, 2022, 5:20pm

I don't think the semantics around @FixedArray and BufferView are clear enough at this point to know if (4 * UInt16) is the appropriate sugar for it. (4 * UInt16) seems appropriate for a type that works mostly like a homogenous tuple, and to me this is how I would expect a fixed array to behave. If the semantics are different, then the sugar shouldn't make it look like a tuple.

But from what I see there, it looks like BufferView doesn't know its size statically, so it'll be impossible to coerce it to and from an equivalent tuple form using "as". It also looks like a function parameter accepting a (4 * UInt16) won't work. And it seems to me you won't be able to make an array of fixed arrays with [(4 * UInt16)], or even a fixed array of fixed arrays with (4 * (4 * UInt16)). And I'm not sure if you can initialize it with a literal. I think (4 * UInt16) would need to be a proper type for all of this. I might be wrong here. This requires clarification.

Joe_Groff · July 19, 2022, 7:11pm

That's right, the idea I'm floating would just be sugar for declaring fixed storage backing a stored property, it wouldn't be valid as a general argument type, computed property type, or in other contexts that are looking for a type.

Joe_Groff · July 19, 2022, 7:34pm

One thing to consider: if we do get integer generic arguments at some point in the future, we would still need some kind of primitive storage-allocating construct to define a FixedArray<n, T> type. We could conceivably use the BufferView property wrapper approach as that primitive:

struct FixedArray<n: Int, T> {
  @FixedArrayBuffer(n)
  private var storage: BufferView<T>

  subscript(index: Int) -> T { return storage[index] }
  /* ... */
}

Having the ability to allocate raw uninitialized storage could also be used to build fixed-capacity arrays, like LLVM's SmallVector type, that have fixed inline storage for up to some number of elements but can have elements added or removed up to that limit without allocating:

@moveonly
struct SmallArray<capacity: Int, T> {
  private(set) var count: Int

  @UninitializedBuffer(count: capacity)
  private var storage: BufferView<T>

  init() { count = 0 }
  mutating func append(value: T) {
    assert(count < capacity)
    storage.baseAddress[count].initialize(with: value)
    count += 1
  }

  deinit {
    for i in 0..<count {
      (storage.baseAddress + i).destroy()
    }
}

michelf · July 19, 2022, 8:48pm

Seems like the true primitive is @UninitializedBuffer. You can build @FixedArrayBuffer and FixedArray on top of it, just by eagerly initializing the buffer. Seems good for low-level stuff.

But what's the purpose of offering @FixedArrayBuffer? I don't see why people should use @FixedArrayBuffer instead of a FixedArray type.

Joe_Groff · July 19, 2022, 9:12pm

One fine point of design for these space-allocating primitives is their value witness behavior when values of the containing type are copied, moved, and destroyed. I'm assuming FixedArrayBuffer would treat the buffer as being n copies of the element type, and the aggregate would copy, move, and destroy itself like it contained n contiguous values of that type, whereas UninitializedBuffer is just space, copying or destruction behavior, and leaves all of that to the type's own initialization and deinitialization (and thus could only appear inside a class or moveonly struct). If we exposed rule-of-five abilities on the primitive, then sure, you could build FixedArrayBuffer the UninitializedBuffer, but allowing for arbitrary user-defined copying and moving behavior in native Swift would incur other tradeoffs. If the main reason to do so would be to allow FixedArrayBuffer to be defined, then maybe it's better to have it be a separate thing.

If we can get the pseudo-property wrappers sooner than type-level integers, and those wrappers provide a building block that we'd need in some form or another even when we have type-level integers, then that might chart a nice trajectory for developing the language. A nominal fixed-size array type definitely has a lot of advantages, but my judgement is that it'll require a nontrivial new type system feature that's likely to be difficult to back-deploy, so if we can get a good chunk of the benefit for less effort that still works toward the eventual goal, I like that idea.

Andrew_Trick · July 19, 2022, 10:54pm

To put it another way, the central question is whether FixedArray<Int, T> needs to be a copyable type. And, by extenstion whether other structs with FixedArray members need to be copyable.

In short @UninitializedBuffer only makes sense for move-only types (or types with reference semantics). @FixedArrayBuffer makes sense for copyable types with value semantics.

michelf · July 20, 2022, 12:23am

Won't the @FixedArrayBuffer approach mean every library will need to create its own separate FixedArray type for function parameters and properties? A sad situation for interoperability of fixed arrays.

Have you considered implementing it as some kind of struct containing a tuple with extra padding? Something like:

struct FixedArrayImpl<TupleType, PaddingType> {
   var storage: TupleType
   var _padding: PaddingType
}

I don't really know but I assume the calling convention for passing a struct won't destructure members of a tuple inside, so it should scale well in that regard. And then, in usage:

(3 * Int32)          is  FixedArrayImpl<(Int32, Int32, Int32), Void>
(3 * (Int32, Int8))  is  FixedArrayImpl<((Int32, Int8), (Int32, Int8), (Int32, Int8)), (Int8, Int8, Int8)>

And from there, everything else is implemented as sugar over that special type. The user never has to see its real name or generic parameters.

You still have a type to back-deploy though, so maybe that's a problem.

scanon · July 20, 2022, 12:42am

In practice, most algorithms would be written against either:

A type that wraps the fixed-length array, if the size is truly part of the type.
Collection or RandomAccessCollection, to which such a type or view would conform, if the size is just "variable but knowable at compile time".

Note that the situation you're afraid of doesn't arise in either one of these scenarios.

Nickolas_Pohilets · July 20, 2022, 11:40am

IIUC, BufferView does not own the data. And this creates a risk that view may outlive its underlying storage.

Will fixed-sized arrays be limited to POD-types, or it should be possible to store types with non-trivial copying as well?

Untyped @UninitializedBuffer with a typed view cannot copy non-trivial types. When struct is copied, it is the backing storage which is responsible for copying, not the publicly visible BufferView.

Fully initialised FixedArray / (n * T) can copy them, the same way as tuples do.

Partially initialized SmallArray struct has sufficient information to implement the copy, but Swift has no copy constructors, so there is no place to implement it. Before we get move-only types, one way to work around this would be to implement SmallArray as an enum:

/// Have layout similar to struct with count property, but not the same
/// Discriminator encodes cases in the following order: [.s1, .s2, .s3, .s4, .s0]
enum SmallArray4<T> {
    case s0
    case s1(T)
    case s2(T, T)
    case s3(T, T, T)
    case s4(T, T, T, T)

   var count: Int {
       switch self {
           case .s0: return 0
           case .s1: return 1
           case .s2: return 2
           case .s3: return 3
       }
   }

    subscript(_ index: Int) -> T {
        switch (self, index) {
        case let (.s1(x), 0): return x
        case let (.s2(x, _), 0): return x
        case let (.s2(_, x), 1): return x
        case let (.s3(x, _, _), 0): return x
        case let (.s3(_, x, _), 1): return x
        case let (.s3(_, _, x), 2): return x
        case let (.s4(x, _, _, _), 0): return x
        case let (.s4(_, x, _, _), 1): return x
        case let (.s4(_, _, x, _), 2): return x
        case let (.s4(_, _, _, x), 2): return x
        default: fatalError("Index out of bounds")
        }
    }
}

If integers as generic params were supported, SmallArray could be implemented recursively, but then also we need a way to terminate recursion somehow:

enum SmallArray<let n: Int, T> {
    case base(SmallArray<n - 1, T>)
    case tail((n * T))
}

justinmeiners · July 20, 2022, 3:08pm

The lack of fixed-size arrays has often been a hurdle for me, so I am excited progress is being made here.

Regarding the tuple option:

I would guess the main use case is actually having fixed length arrays in a struct. As long as passing such a struct to a function doesn't cause the destructuring, that doesn't sound horrible.

Joe_Groff · July 20, 2022, 3:27pm

BufferView would be born as a move-only type that borrows the underlying storage, so that the compiler enforces that the view doesn't outlive the buffer.

Joe_Groff · July 20, 2022, 3:32pm

or (3) They can take an UnsafeBufferPointer or BufferView over whatever underlying array type if they care about the memory being contiguous, but not the size or ownership policy of the memory. One thing I'd like to get a sense of is how often fixed-size arrays are directly useful as value types of exactly N elements, vs. how often they're used to take up storage within another type that's the actual interesting one. If people do want to write lots of nongeneric code taking arrays of exact size, then @michelf's concerns about interoperability going forward are a definite concern.