Minimizing Memory Footprint of Swift Types <-> C Types

Thanks everyone for your help!

Thanks, I didn't know that values stored as protocol types carry additional memory overhead. This goes back to my question here, where I'm struggling to store these different types (Double, Bool, Decimal, Date, Int, String) in a way that lets me operate on all of them without writing a chain of if statements in every function that touches them:

if Double
else if Bool
else if Date
...
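
For what it's worth, here is a rough sketch of one way to keep that branching in a single place: an enum with associated values, plus a quick per-element size comparison against a protocol-typed value. The `CellValue` name and its cases are hypothetical and were not proposed in this thread, and I've left out expected sizes because the exact numbers depend on the platform and compiler.

```swift
import Foundation

// Hypothetical closed set of cell types; the name and cases are illustrative only.
enum CellValue {
    case double(Double)
    case bool(Bool)
    case int(Int)
    case string(String)
    case date(Date)
    case decimal(Decimal)

    // The only place that has to branch on the stored type.
    var text: String {
        switch self {
        case .double(let v): return String(v)
        case .bool(let v): return String(v)
        case .int(let v): return String(v)
        case .string(let v): return v
        case .date(let v): return String(v.timeIntervalSince1970)
        case .decimal(let v): return "\(v)"
        }
    }
}

let row: [CellValue] = [.double(1.5), .bool(true), .int(42), .string("abc")]
let rendered = row.map { $0.text }
print(rendered)

// Per-element storage cost: a bare Double, a protocol-typed (existential)
// value, and the enum above.
print("Double:", MemoryLayout<Double>.size)
print("any CustomStringConvertible:", MemoryLayout<any CustomStringConvertible>.size)
print("CellValue:", MemoryLayout<CellValue>.size)
```

Callers only ever touch `text`, so adding a new operation means writing one more switch rather than repeating the if chain everywhere. The enum still pays for its largest payload on every element, so this is about tidier code more than minimal memory.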

It's my limited understanding of the protocol-based solutions proposed over in that post that led to the current state of things. However, I phrased the question in that post in terms of type erasure, and that may have been a red herring.

This is an interesting idea, thanks for suggesting it. If the goal is to iterate across all of the data once, then we need to keep in memory only the full dataset in C plus the current small subset in Swift. In fact, if that's the goal, we need to keep in memory only the subset we're currently operating on. The Apache Arrow format can be memory mapped, so we can simply iterate through the file as we operate on the subset we have available to us. But I think this could turn into considerable overhead if we're doing many repeated random reads?
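
To make the single-pass case concrete, here is a minimal sketch using Foundation's `Data(contentsOf:options:)` with `.mappedIfSafe`. It writes a flat buffer of raw Doubles to a temporary file as a stand-in for a large column. That file is an assumption for illustration only and is not the Arrow file format. It then walks the mapped bytes in chunks, so only one chunk is copied into a Swift array at a time. The chunk size is arbitrary.

```swift
import Foundation

// Stand-in for a large on-disk column: a flat buffer of raw Doubles.
// This is NOT the Arrow file format, only an illustration.
let values: [Double] = (0..<1_000).map { Double($0) }
let url = FileManager.default.temporaryDirectory
    .appendingPathComponent("column-\(UUID().uuidString).bin")
let bytes = values.withUnsafeBufferPointer { Data(buffer: $0) }
try bytes.write(to: url)

// Ask Foundation to memory map the file if it is safe to do so.
let mapped = try Data(contentsOf: url, options: .mappedIfSafe)
let elementSize = MemoryLayout<Double>.stride
let count = mapped.count / elementSize
let chunkSize = 256  // arbitrary; tune to the working set you can afford

var total = 0.0
var start = 0
while start < count {
    let end = min(start + chunkSize, count)
    // Copy only the current chunk out of the mapped bytes.
    let chunk = mapped.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> [Double] in
        (start..<end).map { raw.load(fromByteOffset: $0 * elementSize, as: Double.self) }
    }
    total += chunk.reduce(0, +)
    start = end
}

try? FileManager.default.removeItem(at: url)
print(total)
```

This only covers the sequential pass. It says nothing about how repeated random reads would behave, which is the part I'm still unsure about.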

Yes, that's absolutely the thinking. And yeah, the code snippet I provided wasn't enough to compile. Here's a more complete example for Double (conversion bodies omitted):

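// GArrowArray is assumed to come from the Arrow GLib C bindings; the import is omitted here.

// Base protocol without Self requirements, so it can be used as an existential type.
public protocol BaseArrowArrayElement: CustomStringConvertible {
}

// Per-type conversions between a Swift array and a GArrowArray.
protocol ArrowArrayElement: Equatable, BaseArrowArrayElement {
    static func toGArrowArray(array: [Self]) throws -> UnsafeMutablePointer<GArrowArray>?
    static func fromGArrowArray(_ gArray: UnsafeMutablePointer<GArrowArray>?) -> [Self]
}

extension Double: ArrowArrayElement {
    static func toGArrowArray(array: [Double]) throws -> UnsafeMutablePointer<GArrowArray>? {
        // Conversion body omitted; the placeholder only lets the snippet type-check.
        fatalError("toGArrowArray is not shown here")
    }

    static func fromGArrowArray(_ gArray: UnsafeMutablePointer<GArrowArray>?) -> [Double] {
        // Conversion body omitted; the placeholder only lets the snippet type-check.
        fatalError("fromGArrowArray is not shown here")
    }
}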

As you noticed, the trouble with this is that I don't know how many columns there are at compile time; that's determined by the file being read in. Your solution with a struct that contains metadata is very interesting, and I'll play around with this idea.
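
My first guess at what such a struct could look like is below. It is only a sketch: `ColumnStorage`, `Column`, and `Table` are hypothetical names of mine, not the code that was suggested. It assumes each column keeps a plain typed array, with one type tag per column instead of one per element. The number of columns is just the length of an array, so it can be decided when the file is read.

```swift
// Hypothetical per-column storage: one type tag per column, not per element.
enum ColumnStorage {
    case doubles([Double])
    case bools([Bool])
    case ints([Int])
    case strings([String])

    var count: Int {
        switch self {
        case .doubles(let a): return a.count
        case .bools(let a): return a.count
        case .ints(let a): return a.count
        case .strings(let a): return a.count
        }
    }
}

// Metadata (here just a name) alongside the typed storage.
struct Column {
    let name: String
    let storage: ColumnStorage
}

struct Table {
    var columns: [Column] = []  // length is decided at runtime

    func column(named name: String) -> Column? {
        columns.first { $0.name == name }
    }
}

// Columns are appended as they are discovered, e.g. while reading a file.
var table = Table()
table.columns.append(Column(name: "price", storage: .doubles([1.5, 2.5, 4.0])))
table.columns.append(Column(name: "inStock", storage: .bools([true, false, true])))

// Typed access: unwrap the storage once, then work on a plain [Double].
var priceSum = 0.0
if let price = table.column(named: "price"), case .doubles(let values) = price.storage {
    priceSum = values.reduce(0, +)
}

print(table.columns.map { "\($0.name): \($0.storage.count) rows" })
print(priceSum)
```

The appeal for me is that the elements stay in contiguous, unboxed arrays, and the switch over types happens once per column rather than once per value.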

Many thanks for this, I'll give this a read.