Value Witness table gets corrupt on s390x (big_endian)

The validation tests on s390x show several failing test cases that share the same cause:

[ RUN      ] Collection.MinimalMutableRangeReplaceableCollection<OpaqueValue<Int>>.Type._preprocessingPass/semantics
stderr>>> a.out: /home/work/sw/swift4/swift/stdlib/public/runtime/HeapObject.cpp:76: swift::HeapObject *_swift_allocObject_(const swift::HeapMetadata *, size_t, size_t): Assertion `isAlignmentMask(requiredAlignmentMask)' failed.
stderr>>> CRASHED: SIGABRT
the test crashed unexpectedly

Here is a sample Swift program that hits the same error:

var x = [1, 2]
enum SillyError : Error { case JazzHands }
do {
  try x.withUnsafeMutableBufferPointer { p in
    p[0] = 4
    p[1] = 5

    throw SillyError.JazzHands
  }
} catch {}

Debugging shows that the problem surfaces at ErrorObjectNative.cpp:75 and traces back to ErrorObjectNative.cpp:38:

   33   static std::pair<size_t, size_t>
   34   _getErrorAllocatedSizeAndAlignmentMask(const Metadata *type) {
   35     // The value is tail-allocated after the SwiftError record with the
   36     // appropriate alignment.
   37     auto vw = type->getValueWitnesses();
-> 38     size_t size = sizeof(SwiftError);
   39     unsigned valueAlignMask = vw->getAlignmentMask();
   40     size = (size + valueAlignMask) & ~(size_t)valueAlignMask;
   41     size += vw->getSize();

At this point, the type is:

(lldb) fr v *type
(swift::Metadata) *type = (Kind = 2929167701780)

and vw is

(lldb) fr v -L *vw
0x000002aa000016f8: (const swift::ValueWitnessTable) *vw = {
0x000002aa000016f8:   initializeBufferWithCopyOfBuffer = 0xebbff0580024a7fb
0x000002aa00001700:   destroy = 0xff60b90400bfc0e5
0x000002aa00001708:   initializeWithCopy = 0xffffffd7ebbfb0f8
0x000002aa00001710:   assignWithCopy = 0x000407feebbff058
0x000002aa00001718:   initializeWithTake = 0x0024a7fbff58b904
0x000002aa00001720:   assignWithTake = 0x00bfb904000ac010
0x000002aa00001728:   initializeBufferWithTakeOfBuffer = 0x00002b3541201008
0x000002aa00001730:   getEnumTagSinglePayload = 0xc03000002adce300
0x000002aa00001738:   storeEnumTagSinglePayload = 0xb0a00024c0e5ffff
0x000002aa00001740:   size = 18206623652139499524
0x000002aa00001748:   flags = {
0x000002aa00001748:     Data = 575905532951392344
  }
0x000002aa00001750:   stride = 10317799924218116
}

It seems that both the type and its value witnesses are corrupted.

The stack trace is:
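For reference, the failing assertion only requires the mask to be one less than a power of two. Below is a minimal C++ sketch of that check and of the size rounding in the listing above. The helper names are hypothetical stand-ins, not the runtime's own functions, and the sketch assumes the mask check works the usual way for power-of-two alignments.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the runtime check: a valid alignment mask is
// (power of two) - 1, so mask and mask + 1 share no set bits.
constexpr bool isAlignmentMaskSketch(std::size_t mask) {
  return (mask & (mask + 1)) == 0;
}

// Mirrors the rounding in the listing: pad the header up to the value's
// alignment, then append the value's size.
inline std::size_t tailAllocatedSizeSketch(std::size_t headerSize,
                                           std::size_t valueSize,
                                           std::size_t valueAlignMask) {
  // Garbage metadata would typically trip a check like this one.
  assert(isAlignmentMaskSketch(valueAlignMask) && "not an alignment mask");
  std::size_t padded = (headerSize + valueAlignMask) & ~valueAlignMask;
  return padded + valueSize;
}
```

With a valid mask such as 7, a 20-byte header is padded to 24 bytes before the 8-byte value is appended. A flags word read from garbage memory, like the one in the dump, is unlikely to pass the mask check.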

(lldb) bt
* thread #1, name = 'cast2', stop reason = step over
  * frame #0: 0x000003fffdd24ea4 libswiftCore.so`_getErrorAllocatedSizeAndAlignmentMask(type=0x000002aa00006ce8) at ErrorObjectNative.cpp:38
    frame #1: 0x000003fffdd24d4a libswiftCore.so`swift_allocError(type=0x000002aa00006ce8, errorConformance=0x0000000000000000, initialValue=0x000003ff00000000, isTake=true) at ErrorObjectNative.cpp:73
    frame #2: 0x000002aa00001880 cast2`closure #1 in (p=0x000003fffffff1e0, $error=(instance_type = 0x0000000000000000)) at cast2.swift:11
    frame #3: 0x000002aa000018f6 cast2`thunk for @callee_guaranteed (@inout UnsafeMutableBufferPointer<Int>) -> (@error @owned Error) at cast2.swift:0
    frame #4: 0x000002aa00001a1c cast2`partial apply for thunk for @callee_guaranteed (@inout UnsafeMutableBufferPointer<Int>) -> (@error @owned Error) at cast2.swift:0
    frame #5: 0x000003fffd8697de libswiftCore.so`Array.withUnsafeMutableBufferPointer<A>(body=0x000002aa000019c0 cast2`partial apply forwarder for reabstraction thunk helper from @callee_guaranteed (@inout Swift.UnsafeMutableBufferPointer<Swift.Int>) -> (@error @owned Swift.Error) to @callee_guaranteed (@inout Swift.UnsafeMutableBufferPointer<Swift.Int>) -> (@out (), @error @owned Swift.Error) at cast2.swift, self=0x000002aa00007200, $error=(instance_type = 0x0000000000000000)) at Arrays.swift:4154
    frame #6: 0x000002aa00001610 cast2`main at cast2.swift:7
    frame #7: 0x000003fffd0a2ece libc.so.6`__libc_start_main + 270

I have not been able to find how and where type is produced.
Do you have any idea where type comes from and what the root cause of the errors is?

By the way, on x86_64, the type is

(lldb) fr v *type
(swift::Metadata) *type = (Kind = 2)

and works fine.

Thanks,

Sam

You pretty much have no choice but to backtrack and figure out where type is coming from, because yeah, it doesn't look like it's a valid type-metadata object at all.

Thanks @John_McCall
In fact, in the stack trace shown above, frame #1 (ErrorObjectNative.cpp:73) goes directly back to
the sample Swift code (frame #2):

   8          p[0] = 4
   9          p[1] = 5
   10
-> 11         throw SillyError.JazzHands
   12       }
   13     } catch {}

It seems there should be some code between the two frames, but how can I find it?

Thanks,

  1. Check whether the type metadata is valid when the swiftc-generated code calls the runtime. (The generated code can't be directly calling _getErrorAllocatedSizeAndAlignmentMask, because that function is static in the runtime.)

  2. Assuming it isn't, figure out where it's coming from in the generated code. You will need to ask your debugger to show disassembly for that code, and probably to step through it instruction by instruction.

On x86_64 (which works correctly), I traced the assembly code instruction by instruction; execution goes through sysdeps/x86_64/dl-trampoline.h as follows:

(gdb) disass
Dump of assembler code for function swift_allocError@plt:
=> 0x00005555555550c0 <+0>:     jmpq   *0x5faa(%rip)        # 0x55555555b070 <swift_allocError@got.plt>
   0x00005555555550c6 <+6>:     pushq  $0xe
   0x00005555555550cb <+11>:    jmpq   0x555555554fd0
End of assembler dump.
(gdb) ni
0x00005555555550c6 in swift_allocError@plt ()
(gdb)
0x00005555555550cb in swift_allocError@plt ()
(gdb) si
0x0000555555554fd0 in ?? ()
(gdb) disass
No function contains program counter for selected frame.
(gdb) ni
0x0000555555554fd6 in ?? ()
(gdb)
_dl_runtime_resolve_avx () at ../sysdeps/x86_64/dl-trampoline.h:64
64      ../sysdeps/x86_64/dl-trampoline.h: No such file or directory.
(gdb) disass 0x0000555555554fd0
No function contains specified address.
(gdb) ni
67      in ../sysdeps/x86_64/dl-trampoline.h
(gdb)
69      in ../sysdeps/x86_64/dl-trampoline.h
....
159     in ../sysdeps/x86_64/dl-trampoline.h
(gdb)
swift::swift_allocError (type=0x5, errorConformance=0x7fffffffe3a8, initialValue=0x2, isTake=false)
    at /home/work/sw/swift4/swift/stdlib/public/runtime/ErrorObjectNative.cpp:71
71                              bool isTake) {
(gdb)
  1. The debugger did not show the source of dl-trampoline.h; I searched all paths and could not find it.
    It seems dl-trampoline.h belongs to the libc package.
  2. I still have not found where and how type is produced.
  3. Is swift::swift_allocError called by some statement in the IR file?

Thanks,

Yes; it’s called when creating an Error value from a value of a type that conforms to the protocol.

Thanks John,

I generated IR for the sample Swift code, but I cannot figure out which statement calls "swift::swift_allocError".
The IR file is attached; could you help us find it?
(See attached file: sample.zip)

Sorry for not sending this to the forum; large files cannot be posted there.

Thanks,

Sam Ding,
Linux on z Systems Open Source Ecosystem
IBM Toronto Lab,
email: samding@ca.ibm.com
phone: (905)413-2947

It's the throw statement.

@John_McCall

Yes, it is the throw statement in Swift that calls swift::swift_allocError, as the debugger shows. What I am asking is which statement in the IR file calls swift::swift_allocError. The parameters of that statement in the IR can probably give us a hint about type.

Thanks,

You mean, what line in the LLVM IR output corresponds to calling swift_allocError? It'll be the line that looks like call { %swift.error*, %swift.opaque* } @swift_allocError(...); LLVM IR is pretty intuitive to read. But I was suggesting that you try to track the value in assembly.

It's possible that the problem is that @swift_allocError returns two values and that your target's calling-convention lowering for that doesn't look like what the Swift frontend expects — e.g. maybe the C code returns the result indirectly but swiftc expects it to be returned directly. In that case, the fix might just be to make this a swiftcc function.
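To make the two-value return concrete, here is a small C++ sketch. The struct and function names are hypothetical stand-ins for the { %swift.error*, %swift.opaque* } pair, not the runtime's declarations. How such a struct is returned is decided by the platform's C ABI, not by the source code, which is why a mismatch between caller and callee would not show up at this level.

```cpp
#include <cassert>

// Hypothetical stand-in for the pair of pointers returned by the runtime call.
struct ErrorAndValueSketch {
  void *errorBox;     // the allocated error object
  void *valueBuffer;  // the tail-allocated storage for the thrown value
};

static_assert(sizeof(ErrorAndValueSketch) == 2 * sizeof(void *),
              "two pointers, no padding");

// Uses the default C calling convention of whatever target this is built for.
ErrorAndValueSketch allocErrorSketch(void *box, void *value) {
  return ErrorAndValueSketch{box, value};
}

// A caller compiled against the same convention reads both fields back.
bool callerSeesBothFields() {
  int box = 0, value = 0;
  ErrorAndValueSketch r = allocErrorSketch(&box, &value);
  assert(r.errorBox == &box);
  return r.errorBox == &box && r.valueBuffer == &value;
}
```

On x86-64 System V, a 16-byte struct like this comes back in two registers. On a target whose C ABI returns aggregates through a hidden pointer argument (which appears to be the situation on s390x here), the callee expects that pointer in the first argument slot, so a caller that assumes a direct return would pass every visible argument one slot too early. That would be consistent with the callee seeing a type value that is not valid metadata.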

Thanks @John_McCall

Yes, you are right. The problem is the calling convention. After adding SWIFT_CC(swift) to the function swift_allocError, it works on s390x.
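As a rough illustration of what such an annotation looks like, the C++ sketch below marks a function with Clang's swiftcall attribute when the compiler offers it. The macro and function names are hypothetical, and this is an assumption about the general shape of the fix, not the actual runtime patch.

```cpp
#include <cassert>

// Hypothetical macro: use Clang's swiftcall attribute where available,
// and fall back to the default C convention elsewhere.
#if defined(__has_attribute)
#if __has_attribute(swiftcall)
#define SKETCH_SWIFT_CC __attribute__((swiftcall))
#endif
#endif
#ifndef SKETCH_SWIFT_CC
#define SKETCH_SWIFT_CC
#endif

// Hypothetical two-pointer result, standing in for the runtime's return value.
struct BoxAndValueSketch {
  void *box;
  void *value;
};

// Declaration and definition carry the same annotation, so every caller that
// sees the declaration agrees with the callee on how the result is returned.
SKETCH_SWIFT_CC BoxAndValueSketch allocBoxSketch(void *box, void *value);

SKETCH_SWIFT_CC BoxAndValueSketch allocBoxSketch(void *box, void *value) {
  return BoxAndValueSketch{box, value};
}
```

The point of the annotation is that both sides of the call are lowered with the same convention; compiler-generated callers must of course be updated to match.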

Note that there are other functions like:

swift_deallocError
swift_getErrorValue
swift_errorRetain
swift_willThrow

Should we add SWIFT_CC(swift) to them?

Thanks,

If you wouldn't mind doing this work, actually, I think it would be reasonable to make every Swift runtime function use swiftcc. There's no good reason to use the C calling convention for any of them.

(Note that you should update RuntimeFunctions.def and the call-emission code in IRGen as well. But note that this just applies to the swift_* functions, not e.g. objc_*.)

If that looks like too much, it's fine to just do swift_allocError. You might want to go ahead and look for other runtime functions that return multiple values, though.
