When I do:
struct TwoArrays<T, U> {
    let i: Array<T>
    let j: Array<U>
}
then
func testFromForum() {
    print(Date(), getMemoryUsageString()!, "Creating random large column values...")
    let numRows = 50_000_000
    let doublesColumn: [Double] = (0..<numRows).map { Double.random(in: 0.0...Double($0)) }
    let intsColumn: [Int] = (0..<numRows).map { Int.random(in: 0...$0) }
    print(intsColumn.count, doublesColumn.count)
    print(Date(), getMemoryUsageString()!, "Done creating random columns")
    let largeColumns = TwoArrays(i: intsColumn,
                                 j: doublesColumn)
    print(Date(), getMemoryUsageString()!, "Done creating array of arrays")
}
(the same as the original code, but using a struct instead of an array with a different element type)
The results are what you'd expect:
2020-10-19 21:23:34 +0000 22.6 MB Creating random large column values...
50000000 50000000
2020-10-19 21:25:16 +0000 823 MB Done creating random columns
2020-10-19 21:25:16 +0000 823 MB Done creating array of arrays
Memory stays at 823 MB after the struct is created, so nothing was copied. In the code with an array of BaseArrowArrayElement, creating the new array copies every element into storage with a larger stride. With the struct, the arrays just share their existing buffers, so you get the copy-on-write optimization (or really, I guess, you avoid the always-copy pessimization).
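Here is a small sketch of that difference, in case it helps. ColumnElement is a made-up protocol standing in for BaseArrowArrayElement, and storageAddress(of:) is a helper I wrote for this example; neither is from the original code. It compares element strides and checks whether the new value still points at the same element storage.

```swift
// Hypothetical stand-in for the protocol used in the original code.
protocol ColumnElement {}
extension Double: ColumnElement {}

struct TwoArrays<T, U> {
    let i: [T]
    let j: [U]
}

// Helper for this example only: address of an array's element storage
// (nil for an empty array).
func storageAddress<Element>(of array: [Element]) -> UnsafeRawPointer? {
    array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

let ints: [Int] = Array(0..<1_000)
let doubles: [Double] = (0..<1_000).map { Double($0) }

// Wrapping the arrays in a generic struct only retains the existing buffers.
let wrapped = TwoArrays(i: ints, j: doubles)
let sharesStorage = storageAddress(of: doubles) == storageAddress(of: wrapped.j)

// Converting to an array of protocol type boxes every element into new storage.
// (Newer Swift versions also let you spell this type as `any ColumnElement`.)
let boxed: [ColumnElement] = doubles.map { $0 as ColumnElement }
let boxedSharesStorage = storageAddress(of: doubles) == storageAddress(of: boxed)

print(MemoryLayout<Double>.stride, MemoryLayout<ColumnElement>.stride)
print(sharesStorage, boxedSharesStorage)
```

On a 64-bit platform I'd expect the protocol-typed stride to be several times the 8 bytes of a Double, which is where the extra memory in the array-of-protocol version would come from.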
Looking at that array of protocol type, it seems like what you're trying to do is make an array of heterogeneous arrays (like a data frame of column vectors). I think that might be where things get weird. I could not get the code to compile like that (although I was skimming it pretty quickly).
Making a data-frame-like collection of heterogeneous-type arrays is kind of hard to do without variadic generics. Right now you probably need either a bunch of types with various numbers of generic parameters, or something where you keep an array of arrays of Int, an array of arrays of Double, etc., plus metadata to keep track of which is which, like (untested):
enum IntOrDouble {
    case integer
    case double
}
struct Whatever {
    var integerColumns: Array<Array<Int>>
    var doubleColumns: Array<Array<Double>>
    // One entry per column: the first field is the name, the second says
    // which array to use, and the third is the index within that array.
    var fieldMetadata: Array<(String, IntOrDouble, Int)>
}
and you just put more stuff in there to add more kinds of column.
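To make the metadata idea a bit more concrete, here is a rough sketch of how adding and looking up columns might work. All of the names here (SimpleFrame, ColumnKind, ColumnInfo, ColumnValues, addIntegerColumn, addDoubleColumn, column(named:)) are ones I made up for illustration, and it assumes column names are unique.

```swift
// Which typed storage array a column lives in.
enum ColumnKind {
    case integer
    case double
}

// Per-column bookkeeping: name, storage kind, and index within that storage.
struct ColumnInfo {
    let name: String
    let kind: ColumnKind
    let index: Int
}

// The result of a lookup, tagged with the element type.
enum ColumnValues {
    case integers([Int])
    case doubles([Double])
}

struct SimpleFrame {
    private(set) var integerColumns: [[Int]] = []
    private(set) var doubleColumns: [[Double]] = []
    private(set) var fieldMetadata: [ColumnInfo] = []

    mutating func addIntegerColumn(named name: String, _ values: [Int]) {
        fieldMetadata.append(ColumnInfo(name: name, kind: .integer, index: integerColumns.count))
        integerColumns.append(values)
    }

    mutating func addDoubleColumn(named name: String, _ values: [Double]) {
        fieldMetadata.append(ColumnInfo(name: name, kind: .double, index: doubleColumns.count))
        doubleColumns.append(values)
    }

    // Returns nil when no column has the given name.
    func column(named name: String) -> ColumnValues? {
        guard let info = fieldMetadata.first(where: { $0.name == name }) else { return nil }
        switch info.kind {
        case .integer: return .integers(integerColumns[info.index])
        case .double: return .doubles(doubleColumns[info.index])
        }
    }
}

var frame = SimpleFrame()
frame.addIntegerColumn(named: "id", [1, 2, 3])
frame.addDoubleColumn(named: "score", [0.5, 1.5, 2.5])

if case .doubles(let scores)? = frame.column(named: "score") {
    print(scores.count)
}
```

Because the columns are stored as concretely typed arrays, appending an existing [Int] or [Double] should just share its buffer rather than re-box every element. Supporting another element type means another storage array plus another case in each enum.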
There is a bunch of ABI and memory layout documentation that I learned a lot from at different times. I think the ABI Stability Manifesto is the place to start (swift/ABIStabilityManifesto.md at main in apple/swift on GitHub), but I might be forgetting another good layout document that goes through the actual layout of bytes.