Memory management / C interop

I was surprised to see memory corrupted/freed in the following scenario:

some_c_lib.h

#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *bytes;
    size_t len;
} ByteSlice;

typedef struct {
    int32_t id;
    ByteSlice firstName;
    ByteSlice lastName;
} Person;

typedef struct {
    const Person *people;
    size_t count;
} PersonArray;

void *process_person(const Person *person);
void *process_person_array(const PersonArray *personArray);

Swift:

import Foundation // for String(bytes:encoding:)

struct Student {
    let id: Int32
    let firstName: String
    let lastName: String
}

extension String {
    var asByteSlice: ByteSlice {
        let bytes = Array(self.utf8)
        return ByteSlice(bytes: bytes, len: bytes.count)
    }
}

extension ByteSlice {
    var asString: String? {
        let bufferPtr = UnsafeBufferPointer(start: bytes, count: len)
        return String(bytes: bufferPtr, encoding: .utf8)
    }
}

// this method works
func works(students: [Student]) {
    let people: [Person] = students.map {
        Person(id: $0.id, firstName: $0.firstName.asByteSlice, lastName: $0.lastName.asByteSlice)
    }

    for var person in people {
        _ = process_person(&person) // <-- works fine
    }
}

// We want to pass in an array of people to the C method at once, using a PersonArray 
// but this method fails.
func failure(students: [Student]) {
    let people: [Person] = students.map {
        Person(id: $0.id, firstName: $0.firstName.asByteSlice, lastName: $0.lastName.asByteSlice)
    }
    var personArray = PersonArray(people: people, count: people.count)

    // When debugging, "lastName" is still correct at this point
    print("1. personArray: \(personArray), lastName: \(personArray.people[0].lastName.asString ?? "nil")")

    _ = process_person_array(&personArray) // !!! <- an internal failure happens here because the firstName/lastName byte buffers have been freed/reused/corrupted

    // "lastName" is sometimes, but not always, corrupted (random values), or sometimes "\0\0\0\0"
    print("2. personArray: \(personArray), lastName: \(personArray.people[0].lastName.asString ?? "nil")")
}

I was expecting it all to "just work", but when it didn't, I realized I don't really understand the semantics of the generated interface between Swift and C for more complex cases like PersonArray.

The most relevant documentation I could find is Apple Developer Documentation, but it doesn't seem to cover the case of a buffer of structs.

Is there a way to tell Swift to retain the whole memory tree of personArray: its people and their byte buffers?

Or are we just stuck with manual memory management when dealing with a buffer of structs like this?


I don’t think PersonArray is your problem here; rather, it’s how you generate the ByteSlice values. Specifically, this code:

extension String {
    var asByteSlice: ByteSlice {
        let bytes = Array(self.utf8)
        return ByteSlice(bytes: bytes, len: bytes.count)
    }
}

In this context, you can think of the ByteSlice(bytes:len:) call on line 4 as something along the lines of:

extension String {
    var asByteSlice: ByteSlice {
        let bytes = Array(self.utf8)
        return bytes.withUnsafeBufferPointer { buf in
            return ByteSlice(bytes: buf.baseAddress!, len: buf.count)
        }
    }
}

and the docs for withUnsafeBufferPointer(_:) explain why that doesn’t work:

The pointer passed as an argument to body is valid only during the
execution of withUnsafeBufferPointer(_:). Do not store or return the
pointer for later use.

In your works(students:) case, the memory those pointers refer to is no longer guaranteed to be valid, but it just happens to still hold the expected values, so things appear to work. In your failure(students:) case, that memory gets reused for something else, and things fall apart.
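To make the quoted rule concrete, here is a minimal sketch of the scoped alternative: every pointer is used only inside the closure that vends it, and the C call happens in the innermost scope. So that it runs on its own, it assumes Swift stand-ins for the imported ByteSlice and Person types and a hypothetical Swift version of process_person; in a real project those come from the C module, and processStudent is an illustrative name.

```swift
// Stand-ins for the types Swift imports from some_c_lib.h
// (assumption: declared in Swift only so this sketch is self-contained).
struct ByteSlice {
    var bytes: UnsafePointer<UInt8>?
    var len: Int
}

struct Person {
    var id: Int32
    var firstName: ByteSlice
    var lastName: ByteSlice
}

// Hypothetical stand-in for the C function process_person():
// it reads the name bytes through the pointers, as C code would.
func process_person(_ person: UnsafePointer<Person>) -> String {
    let p = person.pointee
    let first = UnsafeBufferPointer(start: p.firstName.bytes, count: p.firstName.len)
    let last = UnsafeBufferPointer(start: p.lastName.bytes, count: p.lastName.len)
    return String(decoding: first, as: UTF8.self) + " " + String(decoding: last, as: UTF8.self)
}

// Each buffer pointer is valid only inside its closure, so the Person that
// holds them is built, passed to "C", and discarded without leaving that scope.
func processStudent(id: Int32, firstName: String, lastName: String) -> String {
    let first = Array(firstName.utf8)
    let last = Array(lastName.utf8)
    return first.withUnsafeBufferPointer { f in
        last.withUnsafeBufferPointer { l -> String in
            var person = Person(
                id: id,
                firstName: ByteSlice(bytes: f.baseAddress, len: f.count),
                lastName: ByteSlice(bytes: l.baseAddress, len: l.count))
            return withUnsafePointer(to: &person) { process_person($0) }
        }
    }
}
```

The nesting grows with the number of pointers, which is why this pattern suits a single Person better than an array of them.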

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple


This is a common area of bugs in Swift programs, and some of my colleagues have expressed annoyance that the code you wrote works at all. @eskimo has covered the "what is happening", but I'd like to elaborate on your surprise:

It is literally not possible for Swift to automatically solve this problem for you. Pointers in C have no semantic lifetime: there is no way to know when they are no longer needed. This means there's no way to communicate that to Swift's ARC implementation: it's simply not possible for Swift to tell the C code that it'd like retains/releases to occur.

When writing Swift code that interacts with C, you are in C's world and have to play by C's rules. C's rules are: you are responsible for ensuring your pointers live an appropriate amount of time. Sadly, in many cases this means some form of manual memory management.
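One form that manual memory management could take is sketched below, assuming a Swift stand-in for the imported ByteSlice type. The helper names makeOwnedByteSlice, freeByteSlice and string(from:) are hypothetical. The slice points at a heap copy, so it stays valid after the function returns, but nothing frees it until you do.

```swift
// Stand-in for the ByteSlice type Swift imports from some_c_lib.h
// (assumption: declared in Swift only so this sketch is self-contained).
struct ByteSlice {
    var bytes: UnsafePointer<UInt8>?
    var len: Int
}

// Copies the string's UTF-8 bytes into a heap buffer owned by the caller.
// Unlike a pointer from withUnsafeBufferPointer, this one outlives the call.
func makeOwnedByteSlice(_ string: String) -> ByteSlice {
    let utf8 = Array(string.utf8)
    // Allocate at least one byte so an empty string still gets a real allocation.
    let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: max(utf8.count, 1))
    buffer.initialize(from: utf8, count: utf8.count)
    return ByteSlice(bytes: UnsafePointer(buffer), len: utf8.count)
}

// Call exactly once per makeOwnedByteSlice(_:), and only after the C side
// is finished with the pointer.
func freeByteSlice(_ slice: ByteSlice) {
    slice.bytes?.deallocate()
}

// Reads a slice back as a String (roughly what the C side would see).
func string(from slice: ByteSlice) -> String {
    String(decoding: UnsafeBufferPointer(start: slice.bytes, count: slice.len), as: UTF8.self)
}
```

A common way to keep the pairing honest is to call freeByteSlice(_:) from a defer placed right after the slice is created, in the same function that makes the C call.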


Thank you for the responses!

I understand you to be saying that the ByteSlice approach doesn't work because the memory pointed to by byteSlice.bytes is not retained: bytes is a pointer field on a C struct, and ARC cannot take C pointers into account when managing lifetimes.

struct SwiftByteSlice {
    let bytes: [UInt8]
    let count: Int
}

let bytes = Array("asdf".utf8)
let swiftByteSlice = SwiftByteSlice(bytes: bytes, count: bytes.count)

Of course, the above works (not that you'd want to do it). It is so "easy" to use C structs in Swift that I expected them to behave like Swift structs, when they're really quite a different beast.

I don’t think PersonArray is your problem here

Isn't it also a problem, though? From what you've told me, I believe we are similarly required to manage the lifetime of personArray.people manually, since it's another C pointer (a Person *). Right?

Yes, PersonArray is also a problem.
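A minimal sketch of handling both lifetimes at once follows, assuming Swift stand-ins for the imported types and a hypothetical Swift version of process_person_array (copyBytes, decode and processAll are illustrative names too). The name bytes are heap copies freed in a defer, and the Person buffer is only used inside people.withUnsafeBufferPointer, so the PersonArray never outlives it.

```swift
// Stand-ins for the imported C types (assumption: declared in Swift
// only so this sketch is self-contained).
struct ByteSlice { var bytes: UnsafePointer<UInt8>?; var len: Int }
struct Person { var id: Int32; var firstName: ByteSlice; var lastName: ByteSlice }
struct PersonArray { var people: UnsafePointer<Person>?; var count: Int }

struct Student { let id: Int32; let firstName: String; let lastName: String }

func decode(_ s: ByteSlice) -> String {
    String(decoding: UnsafeBufferPointer(start: s.bytes, count: s.len), as: UTF8.self)
}

// Hypothetical stand-in for the C function process_person_array().
func process_person_array(_ array: UnsafePointer<PersonArray>) -> [String] {
    let a = array.pointee
    return UnsafeBufferPointer(start: a.people, count: a.count).map { p in
        decode(p.firstName) + " " + decode(p.lastName)
    }
}

// Heap copy of the UTF-8 bytes; the caller must deallocate it.
func copyBytes(_ string: String) -> ByteSlice {
    let utf8 = Array(string.utf8)
    let buf = UnsafeMutablePointer<UInt8>.allocate(capacity: max(utf8.count, 1))
    buf.initialize(from: utf8, count: utf8.count)
    return ByteSlice(bytes: UnsafePointer(buf), len: utf8.count)
}

func processAll(_ students: [Student]) -> [String] {
    // 1. Name buffers live until this function returns.
    let people = students.map {
        Person(id: $0.id, firstName: copyBytes($0.firstName), lastName: copyBytes($0.lastName))
    }
    defer {
        for p in people {
            p.firstName.bytes?.deallocate()
            p.lastName.bytes?.deallocate()
        }
    }
    // 2. The Person buffer is only valid inside this closure, so the
    //    PersonArray is built, used and discarded there.
    return people.withUnsafeBufferPointer { buf -> [String] in
        var array = PersonArray(people: buf.baseAddress, count: buf.count)
        return withUnsafePointer(to: &array) { process_person_array($0) }
    }
}
```

This assumes the C function does not keep any of the pointers after it returns; if it does, the buffers have to live as long as the C library says, not just for the call.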


Note that the compiler recently became better at diagnosing some cases like this, so it will now flag these issues some of the time. It cannot find all of them for you, however, so "there's no error" is not the same as "it is semantically correct".

This is an extremely common source of bugs when working with types and interfaces imported from C, and we need to do a better job establishing common patterns that people can use to defuse the footgun that they create, but the quick-and-dirty guidance is "never surface a type in Swift that wraps a C pointer whose lifetime and ownership are not directly tied to the containing object".
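One way to follow that guidance is sketched below, assuming a Swift stand-in for the imported ByteSlice type; OwnedBytes and withByteSlice(_:) are hypothetical names, not an established pattern from the standard library. A class owns the allocation and frees it in deinit, and the C-facing ByteSlice is only handed out inside a closure, while `self`, and therefore the buffer, is alive for the duration of the method call.

```swift
// Stand-in for the imported C type (assumption: declared in Swift
// only so this sketch is self-contained).
struct ByteSlice {
    var bytes: UnsafePointer<UInt8>?
    var len: Int
}

// Ties the pointer's lifetime to an object: the buffer is freed in deinit.
final class OwnedBytes {
    private let base: UnsafeMutablePointer<UInt8>
    let count: Int

    init(_ string: String) {
        let utf8 = Array(string.utf8)
        count = utf8.count
        // At least one byte, so empty strings still get a real allocation.
        base = UnsafeMutablePointer<UInt8>.allocate(capacity: max(count, 1))
        base.initialize(from: utf8, count: count)
    }

    deinit { base.deallocate() }

    // The slice is only valid inside `body`; `body` must not store it.
    // This makes misuse harder, not impossible.
    func withByteSlice<R>(_ body: (ByteSlice) throws -> R) rethrows -> R {
        try body(ByteSlice(bytes: UnsafePointer(base), len: count))
    }
}
```

If the slice has to stay valid across several separate calls, wrapping them in withExtendedLifetime(owner) { ... } keeps the owner from being released early.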


I wonder if it’s possible to add some kind of escaping/noescape attribute to pointer types to prevent this. It would be source-breaking (and so require a new language version number), and require some C header annotations as well as a version of withoutActuallyEscaping to deal with unannotated headers, but I think it’s worth it.

Effectively this would annotate whether the memory being pointed to has a caller-managed lifetime or not.

Part of the original motivation for the C interop design was that we didn't think we could realistically expect C libraries to annotate themselves. I agree that, if we could do that, it would make sense to limit the magic pointer argument behavior only to C functions that promise to be well behaved.
