Arrays - Russian dolls... how much can I ditch in my own stdlib if i don't care about supporting classes or objc?

carlos42421 · July 13, 2020, 10:23pm

I'm writing my own super slim standard library for microcontrollers. The first version is functioning but does not have Arrays, Dictionaries or Strings. I'm adding them one at a time and trying to seriously squeeze down what goes in to the bare minimum I need. My next target is adding arrays.

I am not supporting user defined classes in my product and I am not supporting objective c. I only want c array like semantics for storing value types. At least for now.

From the documentation in swift/docs and looking over the stdlib code, I can see that arrays seem to have a fairly small runtime but the standard library code for them is very large. In particular there seems to be this russian dolls effect with many structures nested within each other.

A few questions first to make sure I'm not talking nonsense. My understanding is...

In the simpler case of ContiguousArray, it contains a _ContiguousArrayBuffer that contains a field _storage: __ContiguousArrayStorageBase, which acts a pointer to an underlying memory buffer with a header followed by elements. The code for buffer makes use of pointer arithmetic and undocumented builtins such as Builtin.allocWithTailElems_1 and Builtin.projectTailElems to manipulate the memory buffer. It's unclear to me how these work and I can't find documentation. How is the header size determined?

This is described in docs as being handled by ManagedBufferPointer<_ArrayBody,T>. It looks like that's not strictly true but the code is nearly identical, maybe copy/pasted at one time and diverged?

The header is an instance of _ContiguousArrayStorage, which is a __ContiguousArrayStorageBase, which is a __SwiftNativeNSArrayWithContiguousStorage, which is an empty class on platforms that don't support objective C or cocoa. But in reality, these header instances aren't standard heap allocated class instances as I understand it, they could be allocated in stack memory, using the magic described above, is that right?

The structure for Array itself is even more complex, allowing for Cocoa/NSArray bridging.

I'd like to radically trim down this hierarchy in my version.

What would be the danger/downside in my case in just implementing effectively ContinuousArray, if necessary renaming it Array to get rid of all the NSArray/Cocoa/objc compatibility that I don't need?
In my version, what would be the harm of replacing __ContiguousArrayStorageBase with _ArrayBody as the header?

3405691582 · July 14, 2020, 3:07am

I can't speak to the rest of your concerns, but having stuck my nose around here recently, I can share some knowledge.

These builtins are not completely undocumented, but they are poorly documented. See ./include/swift/AST/Builtins.def for some small documentation.

AIUI, allocWithTailElems is like the C idiom where you allocate an array at the end of the struct and use that array for additional storage (I might not be remembering that idiom correctly, actually).

Essentially, you can think of the struct being over-allocated with additional space, and the arguments to allocWithTailElems determine how much extra space to allocate. projectTailElems gets you to a pointer at the start of that tail-allocated space.

carlos42421 · July 14, 2020, 12:50pm

That's useful, thank you.

So I think with a bit of tweaking, this ends up as a simpler question. From what I can see, ContiguousArray is all I need, it is fast and efficient enough for most things people will need, gives std::vector like performance for modification and reading for value types. In fact, it looks like Array uses _ContiguousArrayBuffer for storage when objc is disabled as it is on my platform.

So it looks like all I will need to support arrays are copy over the Array and _ContiguousArrayBuffer types (plus the array literal protocols of course), then figure out what to do with the "header"?

Does that sound right?

Joe_Groff · July 14, 2020, 4:33pm

You should not need any of the bridging code if you don't have ObjC interop. That's how Array works on Linux and other non-Apple platforms. If you want to be even more minimal, you don't necessarily need to exactly duplicate _ContiguousArrayBuffer's implementation, either. You could generate non-class heap objects instead. All that ultimately matters is the surface-level Array API; everything below it is implementation detail.

jrose · July 14, 2020, 7:46pm

I worked on a stripped-down stdlib for my Swift-on-Classic-Mac project, and Array was one of the types I touched the least—mainly because as you say it's so complicated. The main things I cut out were some helpers that were only used in ObjC modes, plus the empty array singleton. But there wasn't much to that.

Now, Array is implemented in terms of classes, since that's how it gets its copy-on-write behavior. But for your embedded stdlib, you may not care about that, and might be willing to forego the resource management aspect. In that case, you can get away with just using UnsafeBufferPointer for quite a lot. And if you want to build your own simpler Array, you might be able to do it by starting with ManagedBuffer and building it up from scratch.

carlos42421 · July 14, 2020, 10:02pm

Cool. Thanks Joe.

One simple thing I’m trying to understand but can’t work out...

Is the ContiguousArrayBuffer’s storage always allocated from heap memory?

Or can it be in other memory? (Stack/global)?

I had always sort of assumed that arrays could Sometimes be stored using stack memory, like alloca type storage. Especially if they’re not mutated and they don’t exist beyond the end of a function (which must be fairly common in simple cases).

I suppose a simpler question is...

Are arrays always stored on the heap?

Carl

carlos42421 · July 14, 2020, 10:11pm

That’s interesting.

I have actually done exactly that! In version 3.2 of S4A, I added a lot of helpers and sample code to use UnsafeBufferPointer for array type work on the embedded platform.

I only just found out about ManagedBuffer. It could be a slight upgrade.

The main reason I feel like I’ll have to implement some form of Array class is that it seems to be pretty fundamental to the standard use of the compiler.

Things like varargs methods rely on it?

And I think people expect to have things like array literals that you iterate over Because you would have that in C.

I’m not sure if array literals can exist without Array being defined?

jrose · July 14, 2020, 11:23pm

That's true, you'll still have to build something named "Array" that supports a few of the same entry points in order to make varargs work.

David_Smith · July 14, 2020, 11:35pm

This is a bit nebulous tbh. Here's my unverified understanding of the current status:

In theory, the optimizer can promote a heap allocation to the stack based on escape analysis
In practice, it's very difficult to get this to fire, and…
Array immediately passes its buffer to malloc_size() anyway, which should count as an escape

We'd like to fix this at some point, since it'd be a nice perf win to avoid unnecessarily heap allocating things.

carlos42421 · July 15, 2020, 9:25am

Ok. That makes sense. The docs seem to suggest that arrays can end up on the stack like arrays in C. But when I thought about it for a minute, I realised the technical challenges.

A contiguous in memory array would have to be fixed size (from the frame setup steps at the beginning of a function) otherwise stack allocation would be difficult to do efficiently.

And the COW semantics so well known and loved by swift developer, giving value type semantics with reference type efficiencies needs some heap allocation magic to work.

I think I’ve got all I need thanks guys. Actually in order to get all the benefits of swift arrays for developers, I think I will probably end up keeping the contiguous buffer code more or less unchanged. I’m going to try and see if there’s a practical way to create a type that gains a bit more of the C performance in my case. But I doubt it will work.

carlos42421 · July 15, 2020, 1:29pm

Bonus question...

I wanted to understand how memory management and C.O.W. are done by "native swift" arrays, for example when using _ContiguousArrayBuffer.

My understanding is...

_ContiguousArrayBuffer is a value type, containing only one member var _storage: __ContiguousArrayStorageBase.

__ContiguousArrayStorageBase is a class, so _ContiguousArrayBuffer contains just one field, a pointer to an instance of __ContiguousArrayStorageBase or a subclass thereof.

The designated initialiser for _ContiguousArrayBuffer is...

  internal init(
    _uninitializedCount uninitializedCount: Int,
    minimumCapacity: Int
  )

...rather unusually, instead of calling a standard constructor of __ContiguousArrayStorageBase or a subclass to initialise the _storage field, this initialiser calls Builtin.allocWithTailElems_1 followed by _initStorageHeader to allocate (heap) memory and to partially initialise the class instance. This creates a class like struct followed by a c array, forming the swift array header and body.

During use, the array is a value type, the pointer to the underlying storage is copied to any new copies and a reference count maintained using ARC, with a strong reference to the storage from each usage of it in a buffer/array, like with any normal reference type field inside a value type struct/enum. When all struct/enums go out of scope and the ref count drops to zero, the storage will be deinitialised and the heap storage deallocated, just like any normal reference object inside a value type.

If an array is modified in any way (capacity reserved, an element at a subscript modified, elements appended, etc.) then isMutableAndUniquelyReferenced() is called on the buffer. That calls down into _isUnique() which calls Builtin.isUnique() passing a ref to the storage. This intrinsic looks at the reference count and returns true if and only if this is the sole reference, and the pointer is not 'aliased' (if that is a reasonable term to use in this context).

For example, in the (sort of) setter for ContiguousArray subscript we see this code...

    _modify {
      _makeMutableAndUnique()
      _checkSubscript_native(index)
      let address = _buffer.subscriptBaseAddress + index
      yield &address.pointee
    }

where _makeMutableAndUnique() calls isMutableAndUniquelyReferenced() and if it returns false (the storage is shared with another Array), the underlying storage is copied to new memory block (probably on the heap) and the new block used for the storage in a new _ContiguousArrayBuffer, which the Array starts using before the modification is made. The above mechanism forms the basis of the C.O.W. behaviour intrinsic to Array and ContiguousArray.

Arrays also expand to new buffers when capacity is exhausted, in an intelligent way in O(1) amortised time, but that's another story.

I'm curious if I'm right about the above, plus if it's right it might make interesting documentation somewhere, and it will help me to design my own similar collection(s).

Carl

p.s. @jrose ... I love your Swift for classic macs project. I'm convinced it's even more crazy than mine!

jrose · July 15, 2020, 7:12pm

That all sounds correct, with a clarification on this point:

The rules of exclusive access already guarantee that there is no aliasing of the array value if you're mutating it, and checking the reference count makes sure there are no other array values that are around. That's why isUnique takes the object reference as inout: to invoke the exclusivity rules.

(There is one caveat to isUnique: it only checks the strong reference count. If you're using it for copy-on-write behavior, you need to make sure the reference to the object doesn't escape. See also SR-5633.)

carlos42421 · July 16, 2020, 12:18pm

I might need to take a step back to check that I haven't missed anything.

I'm talking about the very simple situation of...

let a = [1,2,3]
var b = a
b[1] = 9
print(b) // shows "1","9","3"
print(a) // shows "1","2","3"

After the statement var b = a, under the hood there will be two value type variables in memory somewhere (probably on the stack), a and b. Each one will contain its own buffer, and that buffer will contain a pointer to the underlying array storage. Unknown to the casual and carefree developer, actual array storage will be shared between the two value types.

During the setter call b[1] = 9, the array storage will be duplicated and a new array storage block set up for use by the value type variable b.

My understanding is that the mechanism to achieve this is as I described. Basically the shared storage is an instance of a class and reference counting together with _isUnique()* determines if more than one array value type is sharing the same storage. Is that correct?

I'm not sure how exclusive access would come into it if they're two separate array lvalues/variables?

Seems like I'm missing something?

(*Footnote for people reading later, the internal stdlib function _isUnique() is visible to the outside world as isKnownUniquelyReferenced(_:), which might be more familiar from blogs about C.O.W. etc.)

jrose · July 17, 2020, 1:16am

Yes, you're correct on all counts here. The exclusive access thing only comes in if you are trying to access the same variable twice, and of course if you do that for Array you're always mutating the struct. I've been trying to come up with an example, even a contrived example, where this comes up, and I haven't been able to, so I wanted to at least confirm that you are on the right track.

carlos42421 · July 17, 2020, 9:38am

Cool!

Is it worth me creating a pull request to add this to some documentation somewhere to help the next confused guy?

Or is it covered elsewhere or would be too much of a maintenance headache when the cow mechanism changes in the future?

jrose · July 17, 2020, 10:44pm

Hm, maybe? I feel like as far as user-made CoW types goes, the docs for isKnownUniquelyReferenced(_:) are already okay, and maybe the docs for ManagedBuffer should mention how to put the two APIs together. But docs/StandardLibraryProgrammersManual.md does mention that it would like more discussion of COW types and buffers, so even just putting your _ContiguousArrayBuffer post verbatim into that doc might be good. (There's also a brief mention in docs/OptimizationTips.rst, but that's more about applying COW than implementing a COW type.) Some of the standard library devs might have more of an opinion.