Minimizing Memory Footprint of Swift Types <-> C Types

I have written a Swift interface to the Apache Arrow data format, which is popular for high-performance computing and data science, with file types like .arrow, .feather, and .parquet. My library works end-to-end: Swift->CArrow->file->CArrow->Swift. However, it is very memory-inefficient: a dataset that can be loaded from disk as C types in 30GB of RAM takes >120GB of RAM to decode to Swift types. This must be remedied for my library to be useful.

First, I'm trying to understand memory requirements of some basic Swift types. I'm using a helper function to print the current Swift process' memory usage:

import Foundation

#if canImport(Darwin)
enum MachError: Error {
    case FailedToGetMemory(String)
}

func getMemoryUsage() -> UInt64? {
    var taskInfo = mach_task_basic_info()
    var count = mach_msg_type_number_t(MemoryLayout<mach_task_basic_info>.size)/4
    let kerr: kern_return_t = withUnsafeMutablePointer(to: &taskInfo) {
        $0.withMemoryRebound(to: integer_t.self, capacity: 1) {
            task_info(mach_task_self_, task_flavor_t(MACH_TASK_BASIC_INFO), $0, &count)
        }
    }

    if kerr == KERN_SUCCESS {
        return taskInfo.resident_size
    } else {
        return nil
    }
}

func getMemoryUsageString() -> String? {
    if let memoryUsage: UInt64 = getMemoryUsage() {
        return ByteCountFormatter().string(fromByteCount: Int64(memoryUsage))
    } else {
        return nil
    }
}

#else
func getMemoryUsageString() -> String? {
    // TODO: Implement this for Linux
    return ""
}
#endif

And here is the code whose memory usage I'm testing:

print(Date(), getMemoryUsageString()!, "Creating random large column values...")
let numRows = 50_000_000
let doublesColumn: [Double] = (0..<numRows).map { Double.random(in: 0.0...Double($0)) }
let intsColumn: [Int] = (0..<numRows).map { Int.random(in: 0...$0) }
print(intsColumn.count, doublesColumn.count)
print(Date(), getMemoryUsageString()!, "Done creating random columns")
let largeColumns: [[BaseArrowArrayElement]] = [doublesColumn, intsColumn]
print(Date(), getMemoryUsageString()!, "Done creating array of arrays")

Where BaseArrowArrayElement is super simple:

public protocol BaseArrowArrayElement: CustomStringConvertible {
}

Here is the output:

2020-10-19 19:45:49 +0000 27.1 MB Creating random large column values...
50000000 50000000
2020-10-19 19:47:57 +0000 827.7 MB Done creating random columns
2020-10-19 19:48:05 +0000 4.83 GB Done creating array of arrays

50M Doubles and 50M Int(64)s is 2 columns * 8 bytes * 50M values = 800M bytes = 0.8GB, so the memory usage after creating the arrays is exactly as expected: 827.7 MB.
What then am I doing to cause it to balloon to 4.83GB just by putting them into an array of arrays? I was hoping copy on write would save me here, or worst case it would 2x the size from 0.8 to 1.6GB? This is instead 6x!

This is on my macOS 10.15.6 MacBook Pro with Swift for Tensorflow 0.11:

$ which swift
/Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.11.xctoolchain/usr/bin/swift
$ swift --version
Swift version 5.3-dev (LLVM db8896f3f345af2, Swift 61684f62a6132c0)
Target: x86_64-apple-darwin19.6.0

I have a slew of followup questions but will try to take this one step at a time. Are there any resources on writing memory constrained Swift code where safety can be sacrificed in the interest of operating on terabytes of data in as little memory space as possible? Any pointers will be helpful, thanks!

The first thing to do here would be to avoid using protocols as values in this way. A protocol used this way needs to allocate extra memory to potentially store an arbitrary payload under the hood, and needs to bring type metadata and dynamic dispatch tables around with it.

If you're looking to optimize for terabytes of memory usage, use concrete types.
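For a concrete sense of the cost, MemoryLayout shows the difference on a 64-bit platform (BoxedElement here is a hypothetical stand-in for BaseArrowArrayElement):

```swift
// Stand-in protocol (hypothetical) mirroring BaseArrowArrayElement.
protocol BoxedElement: CustomStringConvertible {}
extension Double: BoxedElement {}

// A concrete Double occupies 8 bytes per array element...
print(MemoryLayout<Double>.stride)           // 8

// ...but the existential box is a 3-word inline buffer, a type-metadata
// pointer, and one witness-table pointer: 40 bytes per element.
print(MemoryLayout<any BoxedElement>.stride) // 40
```

That 5x per-element blowup, before any heap spillover for larger payloads, lines up with the 6x growth observed above.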


Or enums! Although even a single-bit discriminator would waste way more memory than one bit, since the payload will need padding to match alignment requirements.


Have you considered staying in unsafe api land (a.k.a. C land), and only bridging to/from Swift types on request?

E.g. materialize a Swift.Array<Int> of some small result set, only when queried. Have the backing store still be the C-allocated memory.


When I do:

struct TwoArrays<T, U> {
  let i : Array<T>
  let j : Array<U>
}

then

  func testFromForum() {
    print(Date(), getMemoryUsageString()!, "Creating random large column values...")
    let numRows = 50_000_000
    let doublesColumn: [Double] = (0..<numRows).map { Double.random(in: 0.0...Double($0)) }
    let intsColumn: [Int] = (0..<numRows).map { Int.random(in: 0...$0) }
    print(intsColumn.count, doublesColumn.count)
    print(Date(), getMemoryUsageString()!, "Done creating random columns")
    let largeColumns = TwoArrays(i: intsColumn,
                                  j: doublesColumn)
    print(Date(), getMemoryUsageString()!, "Done creating array of arrays")
  }

(the same as the original code, but using a struct instead of an array of existentials)

The results are what you'd expect:

2020-10-19 21:23:34 +0000 22.6 MB Creating random large column values...
50000000 50000000
2020-10-19 21:25:16 +0000 823 MB Done creating random columns
2020-10-19 21:25:16 +0000 823 MB Done creating array of arrays

In the code with an array of BaseArrowArrayElement, creating the new array copies everything into an array with a larger stride. With the struct you get the copy-on-write optimization (or really the absence of the always-copy pessimization, I guess).

Looking at that array of protocol type, it seems like what you're trying to do is make an array of heterogeneous arrays (like a data frame of column vectors). I think that might be where things get weird: I could not get the code to compile like that (although I was only glancing over it pretty quickly).

Making a data-frame like collection of heterogeneous-type arrays is kind of hard to do without variadic generics. You probably need to have a bunch of types with various numbers of generic parameters to manage that right now, or maybe something where you have an array of arrays of Int, array of arrays of double, etc, with metadata to keep track of which is which, like (untested):

enum IntOrDouble {
  case integer
  case double
}
struct Whatever {
  var integerColumns: Array<Array<Int>>
  var doubleColumns: Array<Array<Double>>
  // One entry per field: (name, which storage array to use,
  // index within that array)
  var fieldMetadata: Array<(String, IntOrDouble, Int)>
}

and you just put more stuff in there to add more kinds of column.
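Filling in that sketch, a runnable version (all names invented for illustration) might look like:

```swift
enum ColumnKind { case integer, double }

struct DataFrame {
    var integerColumns: [[Int]] = []
    var doubleColumns: [[Double]] = []
    // One entry per field: (column name, which storage array, index within it).
    var fields: [(name: String, kind: ColumnKind, index: Int)] = []

    mutating func add(name: String, ints: [Int]) {
        fields.append((name, .integer, integerColumns.count))
        integerColumns.append(ints)
    }

    mutating func add(name: String, doubles: [Double]) {
        fields.append((name, .double, doubleColumns.count))
        doubleColumns.append(doubles)
    }

    // Look up an integer column by name via the metadata.
    func ints(named name: String) -> [Int]? {
        guard let f = fields.first(where: { $0.name == name && $0.kind == .integer })
        else { return nil }
        return integerColumns[f.index]
    }
}

var frame = DataFrame()
frame.add(name: "counts", ints: [1, 2, 3])
frame.add(name: "scores", doubles: [0.5, 0.9])
print(frame.ints(named: "counts") ?? [])  // [1, 2, 3]
```

Because every column lives in a concrete `[Int]` or `[Double]`, nothing gets boxed per element.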

There is a bunch of ABI and memory layout documentation that I learned a lot from at different times. I think the ABI Stability Manifesto stuff is the place to start: https://github.com/apple/swift/blob/main/docs/ABIStabilityManifesto.md but I might be forgetting another good layout document that goes through the actual layout of bytes.


You also probably could have the entire array conform to some BaseArrowArray instead of treating it as an array of BaseArrowArrayElement. This should remove the extra allocation (and indirection).
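A minimal sketch of that idea (protocol and property names invented here, not from the library): conform at the column level, so only each Array value gets boxed, never the 50M elements inside it:

```swift
// Hypothetical protocol: conformance at the whole-array level.
protocol ArrowColumn {
    var rowCount: Int { get }
}

// The Array value itself fits in the existential's inline buffer,
// so the element storage stays contiguous and unboxed.
extension Array: ArrowColumn {
    var rowCount: Int { count }
}

let columns: [any ArrowColumn] = [[1.5, 2.5] as [Double], [1, 2, 3] as [Int]]
print(columns.map { $0.rowCount })  // [2, 3]
```

Here there are only two existential boxes in total, one per column, instead of one per element.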

Thanks everyone for your help!

Thanks, I didn't know protocols carry additional memory baggage. This goes back to my question here where I'm struggling to contain these various different types (Double, Bool, Decimal, Date, Int, String) in a way that I can operate on all of them without writing if statements in every function where I work on them:

if Double
else if Bool
else if Date
...

It's my limited understanding of the solutions using protocols proposed over in that post that led to the current state of things. However, I phrased the question in that post in terms of type erasure, and that may have been a red herring.

This is an interesting idea, thanks for suggesting it. If the goal is to iterate across all of the data once, then we need to keep in memory only the full dataset in C + the current small subset in Swift. In fact, if that's the goal, we need to keep in memory only the subset we're currently operating on. The Apache Arrow format can be memory mapped, and we can simply iterate through the file as we operate on the subset available to us. But I think this could turn into considerable overhead if we're doing many repeated random reads?
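For what it's worth, Foundation already exposes memory mapping through Data's reading options, so pages are faulted in lazily instead of read eagerly (the file path below is just for the demo):

```swift
import Foundation

// Hypothetical demo file; in practice this would be the .arrow file.
let url = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent("demo.bin")
try! Data((0..<8).map { UInt8($0) }).write(to: url)

// .alwaysMapped asks for mmap-backed storage: the kernel pages
// bytes in on access rather than reading the whole file up front.
let mapped = try! Data(contentsOf: url, options: .alwaysMapped)
print(mapped.count)  // 8
```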

Yes that's absolutely the thinking. Yeah, the code snippet I provided wasn't enough to compile. Here's a more complete example for Double:

public protocol BaseArrowArrayElement: CustomStringConvertible {
}

protocol ArrowArrayElement: Equatable, BaseArrowArrayElement {
    static func toGArrowArray(array: [Self]) throws -> UnsafeMutablePointer<GArrowArray>?
    static func fromGArrowArray(_ gArray: UnsafeMutablePointer<GArrowArray>?) -> [Self]
}

extension Double: ArrowArrayElement {
    static func toGArrowArray(array: [Double]) throws -> UnsafeMutablePointer<GArrowArray>? {
        // ... builds the GArrowArray via the Arrow GLib C API (body elided) ...
    }

    static func fromGArrowArray(_ gArray: UnsafeMutablePointer<GArrowArray>?) -> [Double] {
        // ... reads the values back out of the C array (body elided) ...
    }
}

As you noticed, the trouble with this is that I don't know how many columns there are at compile time, it's determined by the file being read in. Your solution with a struct that contains metadata is very interesting, I'll play around with this idea.

Many thanks for this, I'll give this a read.

If most of the types are simple data types, there may be some optimization to be had by using an enum. You can even use indirect on rare & large cases to reduce the enum's stride.

enum ArrowElement {
  case int(Int), double(Double), ...

  // Will use out-of-place storage
  indirect case bigRare(...) 
}

It'd still incur extra memory. I'd expect 2x (the size of the largest data type), since those enum tag bits will misalign the data (and it won't be easy to access the data directly, either).
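That 2x estimate is easy to check with MemoryLayout (64-bit platform assumed): Int and Double payloads have no spare bits, so the tag takes an extra byte, and alignment rounds the stride up:

```swift
enum ArrowElement {
    case int(Int)
    case double(Double)
}

// 8-byte payload + 1 tag byte = 9 bytes of content,
// rounded up to 8-byte alignment = 16 bytes per element.
print(MemoryLayout<ArrowElement>.size)    // 9
print(MemoryLayout<ArrowElement>.stride)  // 16
```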


Using @dfsweeney's struct approach I got my Swift->CArrow->File flow down from 6x memory usage to 2x memory usage. Thanks! The single copy is a result of Apache Arrow's C array builder and probably can't be optimized away.

Now I am looking at the other end: File->CArrow->Swift and I currently have 3x memory usage. One of the allocations occurs when I create a Swift Array to contain values that currently exist as C values. Here is my code for taking an Arrow C array and converting it to a Swift Array:

// From https://github.com/pvieito/PythonKit/blob/671302aebaa8b9bef4daccbf7fc07c014d1fd357/PythonKit/NumpyConversion.swift
extension Array: ConvertibleFromGArrowArray
where Element: ArrowArrayElement {
    init?(gArray: UnsafeMutablePointer<GArrowArray>?) {
        let primitiveArray: UnsafeMutablePointer<GArrowPrimitiveArray>? = GARROW_PRIMITIVE_ARRAY(gArray)
        let buffer: UnsafeMutablePointer<GArrowBuffer>? = garrow_primitive_array_get_data_buffer(primitiveArray)
        let ptrVal: OpaquePointer? = garrow_buffer_get_data(buffer) // GBytes
        var gSize = g_bytes_get_size(ptrVal)
        guard let bufferPointer: UnsafeRawPointer = g_bytes_get_data(ptrVal, &gSize) else {
            return nil
        }
        let ptr: UnsafePointer<Element> = bufferPointer.assumingMemoryBound(to: Element.self)

        // This code avoids constructing and initializing from `UnsafeBufferPointer`
        // because that uses the `init<S: Sequence>(_ elements: S)` initializer,
        // which performs unnecessary copying.
        let dummyPointer = UnsafeMutablePointer<Element>.allocate(capacity: 1)
        let scalarCount = Int(garrow_array_get_length(gArray))
        // TODO: This is allocating memory equal to the size of the dataset
        self.init(repeating: dummyPointer.move(), count: scalarCount)
        dummyPointer.deallocate()
        withUnsafeMutableBufferPointer { buffPtr in
            buffPtr.baseAddress!.assign(from: ptr, count: scalarCount)
        }
    }
}

As expected, the self.init(repeating:count:) is allocating memory equal to the size of the data column, thus doubling the memory necessary to do this GArrowArray->Swift Array conversion. Is this necessary? Is there no way to simply point an Array to the existing contiguous C array of values? I'm ok with unsafety. Keeping the C arrays in scope for the lifetime of the Swift Arrays would be acceptable.

I haven't been in a programming language with pointers in ~6 years so this is wild stuff for me.

I'm pretty sure there's no way to inject an arbitrary pointer into Array, even if you can guarantee its lifetime. That said, nothing stops you from creating your own array type that thinly wraps the said C array and conforming it to Collection or even MutableCollection, so that most users can perform operations on it just fine.

PS

This is a move that'd make anyone faint :scream:. If you're targeting a newer version of Swift, you may want to use init(unsafeUninitializedCapacity:initializingWith:), which would make the code at the end somewhat more formal.
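For illustration, here's how that initializer (available since Swift 5.1) replaces the dummy-element trick when copying from an existing pointer; the source array below stands in for the C-owned buffer:

```swift
// Stand-in for the C-owned buffer in the original code.
let source: [Double] = [1.5, 2.5, 3.5]

let copied: [Double] = source.withUnsafeBufferPointer { src in
    // The closure receives uninitialized storage; we initialize exactly
    // `src.count` elements and report that count back to Array.
    Array(unsafeUninitializedCapacity: src.count) { dest, initializedCount in
        dest.baseAddress!.initialize(from: src.baseAddress!, count: src.count)
        initializedCount = src.count
    }
}
print(copied)  // [1.5, 2.5, 3.5]
```

No element is ever read before it has been initialized, so there is no uninitialized-memory hazard.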


If you're really OK with unsafety, you can make a struct that conforms to Collection, holds an UnsafePointer, and just passes subscript calls on the struct through to subscript on the UnsafePointer. (That should hold for UnsafeMutablePointer as well, although I did not personally test this.)
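A minimal sketch of such a wrapper (the type name is invented); keeping the underlying memory alive while the view is in use is entirely the caller's responsibility:

```swift
// Zero-copy view over C-owned memory.
struct CArrayView<Element>: RandomAccessCollection {
    let base: UnsafePointer<Element>
    let count: Int

    var startIndex: Int { 0 }
    var endIndex: Int { count }

    // Each subscript access reads straight from the C memory; no copy is made.
    subscript(position: Int) -> Element { base[position] }
}

// Stand-in for a buffer handed to us by the C library.
let storage = UnsafeMutablePointer<Int>.allocate(capacity: 4)
for i in 0..<4 { (storage + i).initialize(to: i * i) }

let view = CArrayView(base: UnsafePointer(storage), count: 4)
let values = Array(view)
print(values)  // [0, 1, 4, 9]

storage.deinitialize(count: 4)
storage.deallocate()
```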

Alternately you should be able to make an UnsafeBufferPointer from the UnsafePointer and use that as a Collection.

If I have it right, you get an UnsafeRawPointer to the first element of the C array. If you make from that an UnsafePointer of the correct type, with matching alignment and stride and the correct count of elements (or an UnsafeBufferPointer, to get the Collection conformance), you should be OK as long as the underlying buffer does not get deallocated or reused.
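The UnsafeBufferPointer route needs no new type at all, since it already conforms to RandomAccessCollection (the allocation below stands in for memory owned by the C library, which must outlive the view):

```swift
// Stand-in for memory owned by the C library.
let cBuffer = UnsafeMutablePointer<Double>.allocate(capacity: 3)
cBuffer.initialize(repeating: 0.5, count: 3)

// Zero-copy Collection view over the C memory.
let view = UnsafeBufferPointer(start: cBuffer, count: 3)
let total = view.reduce(0, +)
print(total)  // 1.5

cBuffer.deinitialize(count: 3)
cBuffer.deallocate()
```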

I feel like that would be hard to guarantee: you're getting the buffer from a library that might have incentives to reuse the same block of memory.

Another risk is that Swift's idea of the alignment and stride might not match the alignment and stride of the underlying C type, either in the field or in a future version, and bad stuff will happen.

I'm not sure that the copy itself is that bad if the underlying C array is deallocated or reused after you copy it. You'd have double memory use during the copy but then if the library deallocated the original you'd drop back to just one copy of the same data.

By the way, I'm not sure the process memory that the kernel knows about would appear to get the memory back in that case: it's probably deallocating through malloc, which might hold on to some memory pages after they are freed in order to reuse them itself. If it's directly using mmap to allocate and deallocate, then the process memory should match up.

I imagine that making the memory copy could be optimized. If it's really large you might be able to call mmap directly to make the original buffer copy-on-write, or do another trick like that, to make a copy and take ownership of it even though it's Unsafe.

Hope that helps and kind of makes sense, I'm thinking as I type.


As @Lantua said, this move reads uninitialised memory. Please don't do that.

If you're really trying to minimise your allocations and copying, you'd be better off using your own Collection type, rather than Array.


Thanks, I fixed it with the updated method you referenced :+1:

@Lantua @dfsweeney @Karl Thanks for your suggestions to move toward a custom data structure to further improve memory. I think this is right and I will begin working on it.

It is precisely peak memory usage that is the issue. We are dealing with many terabytes of data, and the question is how much data can be processed in memory / how many hundreds of gigabytes or terabytes of RAM we need to process a given chunk of the data. The fact that it will be freed helps only if the peak usage is still below the available RAM. The Arrow format is naturally chunked, so we could just consume small chunks and expect the CArrow chunk to be deallocated before we move on to the next chunk. In that case, peak memory usage becomes (single chunk size in CArrow) + (size of the dataset in Swift). However, in my testing I didn't see memory freeing up no matter how I played with scopes, which leads to your next point:

This is very interesting and could explain why I'm not seeing the memory free up when I'm expecting it to. This was actually intended as a followup question. I will investigate this further and see if I can come up with some self-contained sample code to illustrate the issue if I can't figure it out.

@xanderdunn You may also be interested in the discussion in this thread where it was discovered the default allocator on Linux was keeping memory around (for certain allocation sizes) even after the program had released it. I don't know if you can switch allocators on macOS, but it's something to keep in mind.


Since you'll be looking into implementing custom Collection, it's worth mentioning that you can now use any storage strategy, including having multiple chunks in one collection, especially that it seems to match how you process data.
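As a sketch of that chunked-storage idea (all names invented here): one logical column stored as several chunks, matching Arrow's chunked layout, but exposed as a single RandomAccessCollection:

```swift
struct ChunkedColumn<Element>: RandomAccessCollection {
    private let chunks: [[Element]]
    private let starts: [Int]  // global start index of each chunk
    let count: Int

    init(chunks: [[Element]]) {
        self.chunks = chunks
        var running = 0
        var starts: [Int] = []
        for chunk in chunks {
            starts.append(running)
            running += chunk.count
        }
        self.starts = starts
        self.count = running
    }

    var startIndex: Int { 0 }
    var endIndex: Int { count }

    subscript(position: Int) -> Element {
        // Linear scan backward for the owning chunk; swap in a
        // binary search if there are many chunks.
        var c = starts.count - 1
        while starts[c] > position { c -= 1 }
        return chunks[c][position - starts[c]]
    }
}

let column = ChunkedColumn(chunks: [[1, 2], [3], [4, 5, 6]])
print(Array(column))  // [1, 2, 3, 4, 5, 6]
```

A real implementation would back each chunk with the C buffer instead of a Swift Array, but the Collection surface would be the same.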

It should really be noted that the allocator on Darwin (iOS, macOS, tvOS, etc.) is much, much different from the one on Linux. Also, using autoreleasepool is essential on any Darwin target, as a lot of Swift is just bridged to Objective-C types.
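The usual idiom is to wrap each chunk of work in autoreleasepool so bridged Objective-C temporaries die at the end of every iteration rather than accumulating (describeChunk below is a hypothetical stand-in for per-chunk work):

```swift
import Foundation

// Hypothetical per-chunk work that goes through bridged Foundation API.
func describeChunk(_ i: Int) -> String {
    ByteCountFormatter().string(fromByteCount: Int64(i) * 1_000_000)
}

for i in 1...3 {
    #if canImport(ObjectiveC)
    // On Darwin, drain autoreleased Objective-C temporaries after each chunk.
    autoreleasepool { _ = describeChunk(i) }
    #else
    // No Objective-C runtime on Linux, so no pool is needed.
    _ = describeChunk(i)
    #endif
}
print("processed 3 chunks")
```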

Thanks. Although I mentioned macOS in my original post because my minimal reproducing code also worked on macOS, everything I'm doing here is on Linux. It's Linux where I'm seeing the allocations stick around. If I get a minimal reproducing example on a much smaller dataset, then I'll try it on macOS as well. I haven't yet tried to reproduce the deallocation issue on macOS. But let's be clear, I don't yet have evidence our issues are the same; I might just be doing something wrong. I'm very green with respect to Swift memory management. Much further testing to be done.
