After banging my head against some code's performance for several hours, I realized that close to half of my runtime is spent doing refcounting/CoW operations on some arrays and dictionaries. These are local variables that never escape their enclosing function, so in theory I could ditch RC/CoW entirely and just free the memory when the variable goes out of scope. As I see it, my options are:
1. Accept this performance penalty.
2. Write my own array type that manually manages raw pointers and calls realloc when needed.
3. Import C++ std::vector and use that.
All three of these options are suboptimal. Is there an existing type to use when I need a growable buffer but don't care about CoW? Are there plans to add ~Copyable collections to the standard library to do away with superfluous RC traffic? Or do I have to revert to low-level pointer madness or the STL?
Why do you consider your own implementation of an array (as a move-only type) suboptimal? IMO that's the best way to step away from RC/CoW.
Also, if your array doesn't escape and you know its required capacity up front, you might benefit from withUnsafeTemporaryAllocation.
Simply put, I’d be spending effort on a solved problem and risking making a mistake when the implementation already exists (case in point std::vector).
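For a sense of the effort being discussed, a move-only growable buffer could be sketched roughly as below. This is a minimal sketch, not a vetted implementation: `UniqueBuffer` and all of its members are hypothetical names, it assumes Swift 5.9 or later for `~Copyable`, and it grows by allocating a new block and moving the elements over rather than calling realloc.

```swift
// Hypothetical move-only growable buffer (assumes Swift 5.9+ for ~Copyable).
// The value has exactly one owner, so it needs no reference count and no
// uniqueness check before mutation.
struct UniqueBuffer<Element>: ~Copyable {
    private var storage: UnsafeMutablePointer<Element>
    private var capacity: Int
    private(set) var count: Int

    init(minimumCapacity: Int = 8) {
        let initialCapacity = max(minimumCapacity, 1)
        storage = UnsafeMutablePointer<Element>.allocate(capacity: initialCapacity)
        capacity = initialCapacity
        count = 0
    }

    // Appends one element, doubling the capacity when the buffer is full.
    mutating func append(_ element: Element) {
        if count == capacity { grow() }
        (storage + count).initialize(to: element)
        count += 1
    }

    subscript(index: Int) -> Element {
        get {
            precondition(index >= 0 && index < count, "index out of range")
            return storage[index]
        }
        set {
            precondition(index >= 0 && index < count, "index out of range")
            storage[index] = newValue
        }
    }

    // Moves the initialized elements into a block twice as large.
    private mutating func grow() {
        let newCapacity = capacity * 2
        let newStorage = UnsafeMutablePointer<Element>.allocate(capacity: newCapacity)
        newStorage.moveInitialize(from: storage, count: count)
        storage.deallocate()
        storage = newStorage
        capacity = newCapacity
    }

    // Runs when the owning variable goes out of scope.
    deinit {
        storage.deinitialize(count: count)
        storage.deallocate()
    }
}
```

Even at this size there are several places to get initialization, bounds, or deallocation wrong, which is the risk the reply above is pointing at.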
That’s a large part of the reason I’m making this an Array in the first place; I don’t know ahead of time how many items it can have. I suppose I could come up with an overestimate with only a little additional effort, but even that only solves it for the array, not the dictionary.
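The overestimate approach mentioned above might look something like the following. It is only a sketch under stated assumptions: `sumOfEvens(in:)` is a hypothetical function made up for illustration, the input length is used as the upper bound on how many items the scratch storage can hold, and `withUnsafeTemporaryAllocation` requires Swift 5.6 or later.

```swift
// Hypothetical example: gather matching values into scratch storage whose
// capacity is an overestimate (every input element might match).
func sumOfEvens(in input: [Int]) -> Int {
    return withUnsafeTemporaryAllocation(of: Int.self, capacity: input.count) { scratch -> Int in
        var used = 0
        for value in input where value % 2 == 0 {
            // The scratch memory starts uninitialized, so initialize each slot.
            (scratch.baseAddress! + used).initialize(to: value)
            used += 1
        }

        var total = 0
        for index in 0..<used {
            total += scratch[index]
        }

        // Deinitialize what was initialized before the storage is reclaimed.
        scratch.baseAddress?.deinitialize(count: used)
        return total
    }
}
```

The scratch pointer must not escape the closure, and as the post says, this covers only the array case; it does nothing for the dictionary.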
If it's the case that the array never escapes its local context, then in principle the optimizer ought to be able to take advantage of that fact to reduce or eliminate refcounting and CoW overhead. If you're able to share some code examples, we might be able to help figure out why that isn't happening for you and if we can fix it on the compiler's end.
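Without the actual code this is only a guess, but one well-known source of avoidable CoW traffic with arrays stored in dictionaries is reading the array out, mutating the local copy, and storing it back. The sketch below contrasts that with in-place mutation through the defaulting subscript; both function names are hypothetical, and this may not be the pattern causing the overhead in this thread.

```swift
// Hypothetical grouping helpers; both return the same dictionary.

// Copies whenever the bucket already exists: `bucket` and the dictionary both
// reference the same storage, so the append sees a non-unique buffer and
// duplicates it first.
func groupByLengthCopying(_ words: [String]) -> [Int: [String]] {
    var groups: [Int: [String]] = [:]
    for word in words {
        var bucket = groups[word.count] ?? []
        bucket.append(word)
        groups[word.count] = bucket
    }
    return groups
}

// Mutates in place: the defaulting subscript lets the append operate on the
// array held by the dictionary without creating a second reference to it.
func groupByLengthInPlace(_ words: [String]) -> [Int: [String]] {
    var groups: [Int: [String]] = [:]
    for word in words {
        groups[word.count, default: []].append(word)
    }
    return groups
}
```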
It feels like every change I make to improve performance in one place makes it even worse somewhere else. If I get rid of some of the swift_release calls, that time is now taken up by __TMC_END__. It took a lot of effort to even get this far performance-wise.