Saving Arrays of Different Element Types to File

I am trying to understand why the following code works:

import Foundation

// TODO: How to turn these two functions into Array extensions?
func arrayToFile<T>(array: [T], toFile url: URL) throws {
    var array = array
    /*let data = Data(bytes: &wArray, count: wArray.count * MemoryLayout<Double>.stride)*/
    let data = Data(bytesNoCopy: &array, count: array.count * MemoryLayout<T>.stride, deallocator: .none)
    try data.write(to: url)
}

func dataFileToArray<T>(url: URL) throws -> [T] {
    let data = try Data(contentsOf: url)
    let value = data.withUnsafeBytes {
        $0.baseAddress?.assumingMemoryBound(to: T.self)
    }
    return [T](UnsafeBufferPointer(start: value, count: data.count / MemoryLayout<T>.stride))
}

func testArraySaving() {
    let url = URL(fileURLWithPath: "./testBinary.data")

    let longString = "asldkjfhiasdjkhfakjsdhfaksoufhasdhflajsdhlfkjahslkfawufaiuwehfuy4i7y10i724y051720469r8qywpoeuigahpsjkd;hajksdn;vas[vaoijspvjkansdkjvanpweufiqihp2i4utypqiw7rypgaywrg97atpsdiufap093846t09w7640t97a60rs7gatypsiudhafpskjhfpqu3i4ipt793rpgya9usphgajshgaljks]"
    let testArray: [String] = ["one", "two", "three", longString, "eleventeenth", longString]
//    let testDoublesArray: [Double] = [1.0, 2.0, 3.0, 4.0, 5.0]
    do {
        try arrayToFile(array: testArray, toFile: url)
    } catch {
        print("Failed to save: \(error)")
    }

    do {
        let result: [String] = try dataFileToArray(url: url)
        assert(result[3] == longString)
        assert(result[5] == longString)
        print(result)
    } catch {
        print("Failed to read: \(error)")
    }
}

testArraySaving()

Specifically, why does this successfully store and retrieve Strings of varying lengths? It relies on MemoryLayout<String>.stride. I understand this would work with types like Bool or Float because they are fixed length - always a certain number of bytes. However, this is not the case for Strings, which can be arbitrarily long. Rather than describing the length of the String, is this instead describing the pointers to the Strings?

This becomes even more confusing when I realize this works too:

    let testArray: [Any] = ["one", "two", true, 4.33, longString, "eleventeenth", longString, [1, 2, 3, 4]]
...
        let result: [Any] = try dataFileToArray(url: url)
        print(result[0]) // one
        print(type(of: result[0])) // String
        print(result[7]) // [1, 2, 3, 4]
        print(type(of: result[7])) // Array<Int>

When the type of array being stored and retrieved is set to Any, it even works for arrays within arrays. What does it mean to have a MemoryLayout<Any>? Each individual element is not an Any?

What is this voodoo magįcks?

Try reading directly from the data. Before we proceed, there are a few things I'd like to mention about the unsafe APIs.


Firstly, I don't recommend encoding data this way. It relies on the in-memory representation, which can easily change if you're not careful.

Secondly, there are a few life cycle rules you need to honour when using unsafe APIs. In this line,

let data = Data(bytesNoCopy: &array, count: array.count * MemoryLayout<T>.stride, deallocator: .none)

Given that you're using Data(bytesNoCopy:count:deallocator:), you need to ensure that the memory pointed to by the first argument stays valid for the entire lifetime of the returned data. That immediately raises two problems:

  • You can't use &array to create a long-lived pointer. &x is valid only for the duration of the function call (here, Data.init). So the moment Data.init returns, that pointer is already invalid. (Newer compilers should warn you about that.)
  • Even if &array still pointed to array afterward, nothing ensures that array remains valid until data.write. The compiler could deallocate array after Data.init (its last use), but before data.write.

A safer route is to use the copying version of the initializer:

let data = Data(bytes: array, count: array.count * MemoryLayout<T>.stride)

Again, in this block,

let value = data.withUnsafeBytes {
  $0.baseAddress?.assumingMemoryBound(to: T.self)
}
return [T](UnsafeBufferPointer(start: value, count: data.count / MemoryLayout<T>.stride))

withUnsafeBytes guarantees that the pointer is valid only within the block, so you can't escape anything that contains that pointer. You need to create the array inside that block, then return that array:

return data.withUnsafeBytes { (buffer) -> [T] in
  let value = buffer.baseAddress?.assumingMemoryBound(to: T.self)
  return [T](UnsafeBufferPointer(start: value, count: buffer.count / MemoryLayout<T>.stride))
}

Yes, you are writing and reading pointer values. Try writing a file, relaunching the application, and reading it again. :boom:


OK, the first thing you could try when doing raw-memory shenanigans like this is to check whether the data, especially longString, is actually saved into the file. Just make a separate writer program and reader program, then try again.

Edit:

@Karl beat me to it.


Thanks @Lantua and @Karl.

Lack of memory safety is intentional to process hundreds of gigabytes of data.

Thank you for pointing out this mistake!

This is an important point. Files saved in this way would potentially not be compatible with future versions of Swift.

Oh, of course! Reading the file in another process exposes the issue. We do, however, expect this to work for types like Float, Int, Double, and Bool, correct? This appears to be the case. I haven't written C in ages, nor do I know if Swift's memory management resembles C's, but if it does, then I think this makes sense. I need some basic education on Swift memory handling of its basic types. This may be a good reference for me, although it doesn't mention Strings.

Yes, at least on the same system. The constraint you want is the type being “trivially copyable”, but we don’t have a way to write that constraint today.
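Until such a constraint exists, one option is a run-time check. The following is a minimal sketch, assuming the standard library's underscored _isPOD function is available in your toolchain (underscored API is not guaranteed to be stable); writeTrivial is a hypothetical name, not something from the original code:

```swift
import Foundation

// Hypothetical helper: refuses to dump element types that are not plain old data.
func writeTrivial<T>(_ array: [T], to url: URL) throws {
    // _isPOD is an underscored standard-library function; treat it as unstable API.
    precondition(_isPOD(T.self), "\(T.self) is not trivially copyable")
    // Data(buffer:) copies the bytes, so the Data does not depend on the array's storage.
    let data = array.withUnsafeBufferPointer { Data(buffer: $0) }
    try data.write(to: url)
}
```

With this check, writeTrivial([1.0, 2.0, 3.0], to: url) writes the raw doubles, while passing a [String] or [Any] traps instead of silently writing pointer values.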


If you need to use the pointer, use it while the pointer is still valid:

array.withUnsafeBytes { array in
  let data = Data(bytesNoCopy: array, ...)
  try data.write(...)
}
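Spelled out, that pattern might look like the sketch below. It assumes the element type is trivially copyable (Int, Double, and so on); writeArrayScoped is a hypothetical name, and the empty-array branch is my own addition:

```swift
import Foundation

// Hypothetical helper: the no-copy Data is created and used entirely inside
// withUnsafeBytes, so the pointer never outlives the closure.
func writeArrayScoped<T>(_ array: [T], to url: URL) throws {
    try array.withUnsafeBytes { (rawBuffer: UnsafeRawBufferPointer) -> Void in
        guard let base = rawBuffer.baseAddress else {
            // An empty array may have no base address; write an empty file instead.
            try Data().write(to: url)
            return
        }
        // bytesNoCopy wants a mutable pointer; the bytes are only read here.
        let data = Data(bytesNoCopy: UnsafeMutableRawPointer(mutating: base),
                        count: rawBuffer.count,
                        deallocator: .none)
        try data.write(to: url)
    }
}
```

The Data value must not be returned or stored anywhere that outlives the closure, for the same reason the original &array version was invalid.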

There's a small quibble. An array of Int and an array of a protocol type Foo have different memory layouts, even if the array of Foo contains only Int values.
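A quick way to see the difference is to compare strides. This is only an illustrative sketch: Foo is the placeholder protocol from the sentence above, and the exact numbers depend on the platform, so none are claimed here:

```swift
protocol Foo {}
extension Int: Foo {}

// Stride of a plain Int element.
let intStride = MemoryLayout<Int>.stride
// Stride of a protocol-typed element: a box that also carries type information,
// even when the value inside is an Int.
let fooStride = MemoryLayout<Foo>.stride
// Any is boxed in a similar way, which is why [Any] has a single fixed stride.
let anyStride = MemoryLayout<Any>.stride

print(intStride, fooStride, anyStride)
```

So MemoryLayout<Any>.stride describes the size of the box, not of whatever value the box currently holds.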

It doesn't. Like (almost?) any other type that represents variable-length user information, String has to use out-of-line memory allocation. That means that the raw struct itself contains only the pointer to that user data plus some bookkeeping information. Serializing and deserializing that struct is useless to the user, because it skips the data the user actually needs.
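This is easy to check: the in-memory size of a String value does not grow with its contents. A small sketch (the variable names are mine):

```swift
let short = "one"
let long = String(repeating: "x", count: 10_000)

// The String struct itself has the same size regardless of content length.
let shortSize = MemoryLayout.size(ofValue: short)
let longSize = MemoryLayout.size(ofValue: long)
print(shortSize, longSize, MemoryLayout<String>.stride)

// The actual character data lives elsewhere and is what a file would need.
print(short.utf8.count, long.utf8.count)
```

Writing MemoryLayout<String>.stride bytes per element therefore saves the struct, not the 10,000 characters it refers to.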

The Encodable and Decodable protocols are for types that can serialize and deserialize their instances' user information to and from a stream or archive. You need to use these, or some other serialization library with similar ideas, to convert an object to and from a byte stream. Your own types would have to conform, of course. (Or the archiving type has to have methods that directly read and write the user type.)
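As a rough sketch of that approach, here is one way to save and load an array with Foundation's JSONEncoder and JSONDecoder. The function names saveArray and loadArray are hypothetical, and JSON is just one possible format (PropertyListEncoder would work similarly):

```swift
import Foundation

// Hypothetical helper: encodes the elements' contents, not their in-memory bytes.
func saveArray<T: Encodable>(_ array: [T], to url: URL) throws {
    let data = try JSONEncoder().encode(array)
    try data.write(to: url)
}

// Hypothetical helper: rebuilds the array from the encoded contents.
func loadArray<T: Decodable>(_ type: T.Type, from url: URL) throws -> [T] {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode([T].self, from: data)
}
```

Unlike the raw-memory version, a file written this way can be read back by another process, because it contains the string contents rather than pointers. It is slower and larger than a raw dump, which matters for very large numeric data sets.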
