Crash when adding particular Strings to Arrays with C++ interop

I'm constructing Swift String Arrays in C++ to pass into Swift code and I'm finding that adding particular strings seems to choke things up and leads to a crash when the Array goes out of scope. I can repro this in a fresh Xcode 15.0.1 (15A507) template project by turning on C++ interop and wiring up the code below.

Am I doing something wrong here, or does this seem like a bug? These strings all seem to be valid UTF-8, and everything seems to work correctly if I take out the offending few.

Swift code:

static public func printStrings(strings: [String]) {
  for s in strings {
    print("GOT STRING", s)
  }
  print("DONE PRINTING")
}

C++ code:

{
  auto array = swift::Array<swift::String>::init();

  // These are all good.
  array.append("தமிழ்");
  array.append("简体中文");
  array.append("繁體中文");
  array.append("हिंदी");

  // This one's trouble.
  array.append("ภาษาไทย");
    
  // As is this one.
  array.append("🤷🏼‍♂️");

  // (note: still crashes without this)
  SwiftTestProject::printStrings(array);

  // Crashes here when tearing down array.
}

Output:

GOT STRING தமிழ்
GOT STRING 简体中文
GOT STRING 繁體中文
GOT STRING हिंदी
GOT STRING
GOT STRING
DONE PRINTING
<EXC_BAD_ACCESS in swift::Array<swift::String>::~Array()>

I wonder if it'd help to prefix the literals with u8? (e.g. u8"ภาษาไทย")

Maybe in the absence of that explicit encoding indicator, C++ is somehow mangling the encoding? Bit odd that it would be only for some non-ASCII characters, though.

You could also try using Unicode escape sequences for at least one of your problematic strings, and see if that helps, e.g. u8"\U0001F937\U0001F3FC\u200D\u2642\uFE0F" for your shrug emoji there.
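
One caveat on the escape-sequence idea: C++ universal character names cannot designate surrogate halves such as \ud83e, so the emoji has to be written by code point (\U0001F937 and so on) rather than as UTF-16 pairs. Below is a minimal sketch, assuming a compiler whose ordinary literal encoding is UTF-8 (the default for Clang and GCC). The names kShrugEscaped, kShrugBytes, and escapedMatchesExplicitBytes are made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// The shrug emoji sequence written by code point:
// U+1F937 U+1F3FC U+200D U+2642 U+FE0F
static const char kShrugEscaped[] = "\U0001F937\U0001F3FC\u200D\u2642\uFE0F";

// The same sequence spelled out as explicit UTF-8 bytes, for comparison.
static const char kShrugBytes[] =
    "\xF0\x9F\xA4\xB7"   // U+1F937
    "\xF0\x9F\x8F\xBC"   // U+1F3FC
    "\xE2\x80\x8D"       // U+200D
    "\xE2\x99\x82"       // U+2642
    "\xEF\xB8\x8F";      // U+FE0F

// 17 UTF-8 bytes, not counting the terminating NUL
// (holds only when the literal encoding is UTF-8, as assumed above).
static_assert(sizeof(kShrugEscaped) - 1 == 17, "expected 17 UTF-8 bytes");

// True when the escaped literal produced exactly the bytes listed above.
bool escapedMatchesExplicitBytes() {
    return sizeof(kShrugEscaped) == sizeof(kShrugBytes) &&
           std::memcmp(kShrugEscaped, kShrugBytes, sizeof(kShrugBytes)) == 0;
}
```

If escapedMatchesExplicitBytes() returns true, the literal already reaches Swift as well-formed UTF-8, which would point away from an encoding problem on the C++ side.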

That sounds like an access to deallocated memory to me.


What's the difference between the good and bad strings?

// These are all good.
"தமிழ்".utf8.count // → 15
"简体中文".utf8.count // → 12
"繁體中文".utf8.count // → 12
"हिंदी".utf8.count // → 15

// This one's trouble.
"ภาษาไทย".utf8.count // → 21
    
// As is this one.
"🤷🏼‍♂️".utf8.count // → 17

All the bad ones are above 15 bytes. I believe 15 is the upper limit of String's inline storage (for non-ASCII strings). Above that threshold String's storage is on the heap.
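
The same comparison can be made on the C++ side before the strings ever reach Swift. This is a rough sketch, assuming UTF-8 source and literal encoding, and treating the 15-byte limit above as a working assumption rather than a documented guarantee. The names kAssumedInlineCapacity, utf8ByteCount, and exceedsAssumedInlineCapacity are hypothetical.

```cpp
#include <cstddef>

// Working assumption taken from the observation above, not a documented constant.
constexpr std::size_t kAssumedInlineCapacity = 15;

// Number of bytes in a string literal, excluding the terminating NUL.
template <std::size_t N>
constexpr std::size_t utf8ByteCount(const char (&)[N]) {
    return N - 1;
}

// True when the literal is longer than the assumed inline capacity.
template <std::size_t N>
constexpr bool exceedsAssumedInlineCapacity(const char (&)[N]) {
    return N - 1 > kAssumedInlineCapacity;
}

// One "good" and one "bad" string from the original post.
static_assert(utf8ByteCount("简体中文") == 12, "4 code points, 3 bytes each");
static_assert(!exceedsAssumedInlineCapacity("简体中文"), "fits in 15 bytes");
static_assert(utf8ByteCount("ภาษาไทย") == 21, "7 code points, 3 bytes each");
static_assert(exceedsAssumedInlineCapacity("ภาษาไทย"), "longer than 15 bytes");
```

A check like this only flags which literals are likely to exercise the failing path; it does not avoid the crash.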

Well spotted. Then I guess just this will crash as well?

array.append("0123456789ABCDEF");

I think ASCII strings are packed more tightly so they can still fit in the inline storage above 15. ASCII is a 7-bit encoding, so that leaves extra bits. I'm not sure of the exact threshold but 17 or 18 bytes sounds likely for ASCII strings.

15 is the threshold:

withUnsafeBytes(of: "0123456789ABCDE") { p in
    dumpHex(p.baseAddress!, 16) // 30 31 32 33 34 35 36 37 38 39 41 42 43 44 45 ef
    // 16 here is for MemoryLayout<String>.size
}
withUnsafeBytes(of: "0123456789ABCDEF") { p in
    dumpHex(p.baseAddress!, 16) // 10 00 00 00 00 00 00 d0 50 7c 03 00 01 00 00 80
}
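
The dumpHex helper used above is not shown in the thread. For anyone inspecting the bytes from the C++ side instead, here is a minimal sketch of an equivalent byte dumper. Both the function and the kInlineDump / inlinePayload names are hypothetical, and the 16 bytes are simply copied from the first dump above.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats `count` bytes starting at `address` as space-separated lowercase hex.
std::string dumpHex(const void* address, std::size_t count) {
    const unsigned char* bytes = static_cast<const unsigned char*>(address);
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < count; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// The 16 bytes reported above for the 15-character string.
const unsigned char kInlineDump[16] = {
    0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
    0x38, 0x39, 0x41, 0x42, 0x43, 0x44, 0x45, 0xef};

// The first 15 bytes are plain ASCII, i.e. the text is stored inline.
// Assumption: the low nibble of the last byte (0xef & 0x0f == 15) is the length.
std::string inlinePayload() {
    return std::string(reinterpret_cast<const char*>(kInlineDump), 15);
}
```

Read this way, the first dump holds the characters themselves, while the second dump for the 16-character string no longer contains any of the ASCII codes, which is consistent with the text living elsewhere.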

I might have mixed this up with NSString's tagged pointer representation. Looks like the actual limit for Swift.String inline storage is either 15, 14, or 10 depending on the platform.

Just confirmed that 15 does seem to be the cutoff here.

  auto array = swift::Array<swift::String>::init();
  array.append("0123456789ABCDE"); // len 15
  array.append("0123456789ABCDEF"); // len 16

Gives me:

GOT STRING 0123456789ABCDE
GOT STRING 
DONE PRINTING.
<EXC_BAD_ACCESS in swift::Array<swift::String>::~Array()>

(Also just noticed that the bad prints are full of U+FFFD (replacement character) if that means anything).

Thanks for reporting this issue. Could you please file an issue on GitHub (apple/swift)? We'll work on fixing it.

Ok, issue filed. Thanks everyone for taking a look at this.

Thank you!

But wait, there's more! 😅

So I'm now looking into workarounds and trying to pass my stuff in to Swift as std::vectors instead of converting to Swift arrays, but I'm running into a different (possibly unrelated?) issue there.

I'm defining a class in Swift:

public class TestClass {
  
  public init(_ values: CppFloatVec) {
    for v in values {
      print("GOT VALUE", v)
    }
    print("DONE PRINTING.")
  }
  
  static public func create(_ values: CppFloatVec) -> TestClass {
    return TestClass(values)
  }
}

I'm defining CppFloatVec in the common header:

using CppFloatVec = std::vector<float>;

And I'm creating an instance of that class from C++.

void doTest() {
  auto vals = std::vector<float>{1.0f};
  auto obj = TestCommand::TestClass::init(vals);
}

I get the object created and printing vals correctly, but at the end of doTest when obj goes out of scope I get a debugger break in the std::vector destructor:
TestCommand(39267,0x1e1271ec0) malloc: *** error for object 0x600001298050: pointer being freed was not allocated

Interestingly, if I change the line to call TestCommand::TestClass::create(vals) instead of calling the initializer directly then I don't get the error.

Any idea what would be happening here? I'm putting together a separate bug report & repro steps for this one, but thought I'd mention it here first in case it seems like there's some common factor with the original issue.

Ok, second bug & from-scratch repro case is up:

Thanks, the two issues are manifestations of the same underlying problem with consumed parameters being passed from C++ to Swift, so I'll use one issue to track the fix for Swift 5.10.
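
For readers unfamiliar with the term, a "consumed" parameter is one whose ownership passes to the callee, which then becomes responsible for releasing it. The sketch below is only a plain C++ analogy for that bookkeeping, not a model of what the Swift compiler actually does. Resource, borrow, consume, and g_releases are hypothetical names. The point is that under either convention the underlying buffer must be released exactly once; a crash in a destructor or a "pointer being freed was not allocated" error is the kind of symptom one would expect when that count goes wrong.

```cpp
#include <utility>

// Counts how many times an owned buffer is released.
static int g_releases = 0;

// Move-only stand-in for a value that owns a heap buffer.
class Resource {
public:
    Resource() : owns_(true) {}
    Resource(Resource&& other) noexcept : owns_(other.owns_) { other.owns_ = false; }
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;
    ~Resource() { if (owns_) ++g_releases; }  // release only if still the owner
private:
    bool owns_;
};

// Borrowing callee: the caller keeps ownership and releases the buffer later.
void borrow(const Resource&) {}

// Consuming callee: takes ownership; the parameter's destructor releases the buffer.
void consume(Resource) {}

int releasesAfterBorrow() {
    g_releases = 0;
    { Resource r; borrow(r); }
    return g_releases;
}

int releasesAfterConsume() {
    g_releases = 0;
    { Resource r; consume(std::move(r)); }
    return g_releases;
}
```

Both helper functions return 1 here because ownership is handed over explicitly; if caller and callee each believed they owned the buffer, the count would be 2.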

Thanks again for getting that addressed.

Just wanted to mention that I'm running into another C++ interop issue and wanted to ask if it looks like that same already-addressed bug:

In this case it's String values coming out of Swift instead of going into Swift and it's giving me a hang instead of a crash so I thought I'd ask if it looks like something different. Apologies if I'm just reporting the same bug again though.

Thanks again,
-Eric