Infinite loop for an array that element is dictionary with enum as key on s390x


(Sam Ding) #1

A sample case as follows:

public enum LinkParameter: String {
    case val
    case type
    case rel
}
 var linkParameters   = [LinkParameter.rel:"root"]
 var headerValue = ""
 for (linkParamer, value) in linkParameters {     
     headerValue += "; \(linkParamer)="
print("aaaaaaa headerValue=\(headerValue)")
 }

gets infinite loop on s390x. After 1st iteration linkParamer always is val that is the 1st value of the enum and value is nil.
We have updates the lib/IRGen/GenType.cpp on v4.2 to solved the infinite loop for an enum (optional) as array element.
If changing this sample code dictionary to tuple, it works fine.
So just wonder what difference between dictionary and tuple that work differently?
What is the special processing for a dictionary?

Thanks for comment / suggestion


(Carl Peto) #2

This might not be very helpful but I'd assume if it's spinning indefinitely, there must be a problem with the Iterator being returned by makeIterator() on the dictionary. I've put your code in a playground on iOS and it works fine (of course, but worth checking). I would look at the implementation of makeIterator() in the standard library file stdlib/public/core/Dictionary.swift and try to look for why on your platform the returned object is not producing the right values from its next() method. A candidate to look at might be this...

extension _NativeDictionary.Iterator: IteratorProtocol {
  @usableFromInline
  internal typealias Element = (key: Key, value: Value)

  @inlinable
  internal mutating func next() -> Element? {
    guard index != endIndex else { return nil }
    let result = base.assertingGet(at: index)
    base.formIndex(after: &index)
    return result
  }
}

from about line 4205 on my current version of stdlib/public/core/Dictionary.swift.

If for some reason base.formIndex(after: &index) fails to change the index on your platform then I think that would create the endless loop you're seeing.

Sorry if you know all that and I'm just stating the obvious!


(Sam Ding) #3

@carlos42421
Thank you for your response.
A quick question. what is your version (the code shown above)? I did not find them in stdlib/public/core/Dictionary.swift either v4.2 or master.

In fact the debugger goes to follow code from the 2nd iteration (v 4.2):

   frame #0: 0x000003fffd7795f6 libswiftCore.so`DictionaryIterator.next(self=Swift.DictionaryIterator<Key, Swift.String> @ 0x000003fffffff2d8) at Dictionary.swift:4664
   4661   @inlinable // FIXME(sil-serialize-all)
   4662   @inline(__always)
   4663   public mutating func next() -> (key: Key, value: Value)? {
-> 4664     if _fastPath(_guaranteedNative) {
   4665       return _nativeNext()
   4666     }
   4667

then

   4667
   4668     switch _state {
   4669     case ._native:
-> 4670       return _nativeNext()
   4671 #if _runtime(_ObjC)
   4672     case ._cocoa(let cocoaIterator):
   4673       if let (anyObjectKey, anyObjectValue) = cocoaIterator.next() {

then it returns nil

   4640   internal mutating func _nativeNext() -> (key: Key, value: Value)? {
   4641     switch _state {
   4642     case ._native(let startIndex, let endIndex, let buffer):
  4643       if startIndex == endIndex {
->   4644         return nil       // return here
   4645       }

And it returns for in statement
on both s390x and x86_64.
However from this point, both platforms have different traces if debugger steps in:
on x86_64, it goes to HeapObject.cpp:332 and exits from the loop, but s390x goes to SwiftObject.mm:601 and enters the loop again.
What makes the both different?


(Carl Peto) #4

My code is from September 11th, fa08557f5305890ed99ce3bff527c5cd9b09d4f6, so don't worry about the details of what I posted. The point I was making was to dig into the iterators, as you're doing. Unfortunately I can't debug into that here as I don't have a linux environment. :frowning: But what you've found looks promising.

The iterator should provide each key/value pair from the dictionary, one after the other until the iterator is spent, at which time it returns nil, which is translated to .none in swift and causes for... in to end.

I assume the iteration you're talking about is after it's already returned one key/value pair on each platform?

I would expect on the first iteration, when it is returning the first key/value pair in your example, startIndex != endIndex and it skips line 4644. Then as I understand what you're saying, in your debug session, when it should be ending the loop, it hits line 4644, startIndex == endIndex and the function _nativeNext() returns nil. Is that correct? The same on both platforms?

If this function returns nil but the loop doesn't end, there's something else going on. I think you're doing the right thing, step through the code in SwiftObject and see where it goes. Another thing you can do is dump the compiled output to SIL and paste that here for people to have a look at. It should expand and show what's going on at a lower level.

To be honest now though, you've exhausted my knowledge and I'm guessing. I'll have to stand aside for someone more knowledgeable to take over.


(Sam Ding) #5

Yes, the sample swift code has one element in the array, so it is expected to loop once. After the 1st iteration, for in statement goes to function _nativeNext() and returns nil on both x86_64 and s390x.

I have created a repo to store the sample swift files as well as IR and SIL files.
See here
sample1.swift gets infinite loop on s390x, but sample2.swift does not on s390x, which uses only one variable in for in statement

IR and SIL are generated on both x86_64 and s390x platforms.


(Jordan Rose) #6

This is almost certainly the same issue that has come up before: the implementation of Optional on big-endian systems doesn't seem to be correct yet. It looks like @levivic is working on something like this in https://github.com/apple/swift/pull/19832.


(Sam Ding) #7

@jrose

In fact we already applied the patch from patch but which seems does not cover this case.


(Jordan Rose) #8

Sure, it might not be sufficient. My point is that someone needs to go look at what IRGen is doing and what the runtime is doing and see how they're not matching up.

So far no one's been focusing on big-endian systems, so it might still take some work to get them in a steady state.


(Sam Ding) #9

The dictionary of the array is an Optional,
If the key (enum) of the dictionary in the array is replaced with Int or String, the sample code works fine.
Wonder what makes them different between enum and Int (or String).


(Carl Peto) #10

(I'm making a chatty rather than helpful comment, feel free to ignore...)

I think @jrose has hit the nail on the head, there's something in either the runtime or the standard library on this processor (his suggestion is it's the fact that it's endianness) that's buggy in this circumstance. Just want anyone to know that if IBM want to hire me to help fix this, I'm sure I can give them a very reasonable rate. :rofl:

...sadly, back in the real world (where I remain a lowly iOS developer by day)... I think your best bet is to try and debug into the runtime in this circumstance and find out where it looks suspicious. I'd love to help but I don't have access to the relevant platform! (I've never even seen one or met anyone who works on zSystem machines... would love to!)


#11

You could get an old cheap PowerMac G5 install Ubuntu 16.04 LTS Server on it and have a try porting Swift… :grin:

Well, nevertheless, a serious question from an amateur. Why is endianess so difficult? Why isn't it cared about right from the beginning?


(Carl Peto) #12

Well I wasn’t there at the beginning. You’d have to ask Chris Lattner. ;)

I’d guess because it started as an iOS and macOS language so like most projects didn’t have all platform support. Nor did Linux at first. And that’s 25 years old compared to about 5 years old. :slight_smile:


(Sam Ding) #13

After applying the patch, it seems to match the runtime process for Big-Endian.
When debugging the runtime of the sample swift code, go skip the 1st iteration and come to for in statement,

   9     var linkParameters   = [LinkParameter.rel:"root"]
   10   // var headerValue = ""
-> 11    for (linkParamer, value  ) in linkParameters {
   12   //   let tmp = linkParamer
   13    }

At this point, on x86_64, linkParamer is rel, however, on s390x, it is val (the 1st case value). What makes them different possibly ?

then step in, we can observed that it indeed gets to nil of the list

   frame #0: 0x000003fffd7d9e0e libswiftCore.so`DictionaryIterator._nativeNext(self=Swift.DictionaryIterator<Key, Swift.String> @ 0x000003fffffff328) at Dictionary.swift:4644
   4641     switch _state {
   4642     case ._native(let startIndex, let endIndex, let buffer):
   4643       if startIndex == endIndex {
-> 4644         return nil
   4645       }

Then it calls storeEnumTagSinglePayloadImpl and getEnumTagSinglePayloadImpl to get enum tag. After returning from getEnumTagSinglePayloadImpl, the enumAddr is the 0x03 (which is the nil of the enum as there is only 3 cases in the sample code):

03ffffffefc0, numEmptyCases=1, self=0x000003fffdeb5df0) at Metadata.cpp:1091
   1088   auto ret = getEnumTagSinglePayloadImpl(enumAddr, numEmptyCases, self, size,
   1089                                      numExtraInhabitants,
   1090                                      getExtraInhabitantIndex);
-> 1091   return  ret;
   1092 }
   1093
   1094 template <bool IsPOD, bool IsInline>
Target 0: (ss) stopped.
(lldb) fr v -L enumAddr
0x000003ffffffee30: (const swift::OpaqueValue *) enumAddr = 0x000003ffffffefc0
(lldb) memory read --size 1 --format x --count 16 0x000003ffffffefc0
0x3ffffffefc0: 0x03 0x00 0x02 0xaa 0x00 0x00 0x00 0x00         // on s390x, 0x03
0x3ffffffefc8: 0x00 0x00 0x03 0xff 0xfd 0xbf 0x9f 0x28
....
(lldb)  memory read --size 1 --format x --count 16 0x00007fffffffe180
0x7fffffffe180: 0x03 0x38 0xdc 0xf7 0xff 0x7f 0x00 0x00          // on x86_64, 0x03
0x7fffffffe188: 0x18 0x38 0xdc 0xf7 0xff 0x7f 0x00 0x00

At this point, seems that both platforms get the nil (0x03) for the enum,
The patch contains a right shift 56 bits for Big-endian platform link.

After returning to for in statement, x86_64 calls

 333  void swift::swift_release(HeapObject *object) {
-> 334    _swift_release(object);
   335  }

to go to the end of the loop, however s390x does NOT call this swift_release, and entry to the loop.
What is the behind logic that makes the different?


(Sam Ding) #14

See the sample1_x86_64.ir, there are two times icmp with 3 in line 196 and 226, respectively.
The same thing happens on sample1_s390x.ir. (lines 197 & 230).
This is for an array element is a dictionary whose key is enum, but if change the array element to tuple (the 1st item is enum), there is only once icmp with 3 (here 3 is the total number of the enum).
What logic make them different? Why is it needed to check the loop quit condition two times?

This part of ir code is come from lib/IRGen/EnumPayload.cpp:530 (EnumPayload::emitCompare). This function is called two times and
executes following code based the patches of s390x:

....
566     if (isMasked) {
567       if (!IGF.IGM.Triple.isLittleEndian() && IGF.IGM.Triple.isArch64Bit()) {
568         v  = IGF.Builder.CreateLShr(v, 56);
569       }
570       auto maskConstant = llvm::ConstantInt::get(intTy, maskPiece);
571       v = IGF.Builder.CreateAnd(v, maskConstant);
572     }

If we hack these code to make

 568         v  = IGF.Builder.CreateLShr(v, 56);

execute only once (either in the 1st or the 2nd calling), the for in loop exits the loop correctly on s390x.
So my questions here are

  1. why this part of code is called twice?
  2. is there any way to tell the difference between the 1st time and 2nd time calling?

@jrose @carlos42421 Could you please give your comment?

By the way, this is for v4.2, the master branch has changed so the ir is different, but it still calls the function twice.

Thanks,


(Sam Ding) #15

Here is the call stack when hitting the function lib/IRGen/EnumPayload.cpp:530 (EnumPayload::emitCompare) .

Breakpoint 1, swift::irgen::EnumPayload::emitCompare (this=0x3ffffff5c78, IGF=..., mask=..., value=...)
    at /home/work/sw/swift42/swift/lib/IRGen/EnumPayload.cpp:564
564         bool isMasked = !maskPiece.isAllOnesValue();
(gdb) bt
#0  swift::irgen::EnumPayload::emitCompare (this=0x3ffffff5c78, IGF=..., mask=..., value=...)
    at /home/work/sw/swift42/swift/lib/IRGen/EnumPayload.cpp:564
#1  0x00000000805ad8c8 in swift::irgen::EnumPayload::emitSwitch (this=0x3ffffff5c78, IGF=..., mask=..., cases=...,
    dflt=...) at /home/work/sw/swift42/swift/lib/IRGen/EnumPayload.cpp:519
#2  0x00000000802bc996 in (anonymous namespace)::SinglePayloadEnumImplStrategy::emitValueSwitch (this=0x8b711340,
    IGF=..., value=..., dests=..., defaultDest=0x8b87e070) at /home/work/sw/swift42/swift/lib/IRGen/GenEnum.cpp:1948
#3  0x00000000804b7bc0 in (anonymous namespace)::IRGenSILFunction::visitSwitchEnumInst (this=0x3ffffff6af0,
    inst=0x8af2dac0) at /home/work/sw/swift42/swift/lib/IRGen/IRGenSIL.cpp:2933
#4  0x00000000804a872e in swift::SILInstructionVisitor<(anonymous namespace)::IRGenSILFunction, void>::visit (
    this=0x3ffffff6af0, inst=0x8af2dac0) at /home/work/sw/swift42/swift/include/swift/SIL/SILNodes.def:525
#5  0x00000000804a014a in (anonymous namespace)::IRGenSILFunction::visitSILBasicBlock (this=0x3ffffff6af0,
    BB=0x8aee4740) at /home/work/sw/swift42/swift/lib/IRGen/IRGenSIL.cpp:1830
#6  0x000000008049be24 in (anonymous namespace)::IRGenSILFunction::emitSILFunction (this=0x3ffffff6af0)
    at /home/work/sw/swift42/swift/lib/IRGen/IRGenSIL.cpp:1696
#7  0x000000008049b2ea in swift::irgen::IRGenModule::emitSILFunction (this=0x3ffffff7998, f=0x8b752410)
    at /home/work/sw/swift42/swift/lib/IRGen/IRGenSIL.cpp:1612
#8  0x00000000802488f0 in swift::irgen::IRGenerator::emitGlobalTopLevel (this=0x3ffffff90c8,
    emitForParallelEmission=false) at /home/work/sw/swift42/swift/lib/IRGen/GenDecl.cpp:1080
#9  0x000000008041d214 in performIRGeneration (Opts=..., M=0x8ae186b0, SILMod=..., ModuleName=..., PSPs=...,
    LLVMContext=..., SF=0x0, outModuleHash=0x0, StartElem=0) at /home/work/sw/swift42/swift/lib/IRGen/IRGen.cpp:752
#10 0x000000008041bee2 in swift::performIRGeneration (Opts=..., M=0x8ae186b0, SILMod=..., ModuleName=..., PSPs=...,
    LLVMContext=..., parallelOutputFilenames=..., outModuleHash=0x0)
    at /home/work/sw/swift42/swift/lib/IRGen/IRGen.cpp:1074
#11 0x000000008021f76a in swift::RunImmediately (CI=..., CmdLine=..., IRGenOpts=..., SILOpts=...)
    at /home/work/sw/swift42/swift/lib/Immediate/Immediate.cpp:301
#12 0x00000000801c945e in processCommandLineAndRunImmediately (Invocation=..., Instance=..., SM=..., MSF=...,
    observer=0x0, ReturnValue=@0x3ffffffb5f4: 0) at /home/work/sw/swift42/swift/lib/FrontendTool/FrontendTool.cpp:1129
#13 0x00000000801c48ec in performCompileStepsPostSILGen (Instance=..., Invocation=..., SM=...,
    astGuaranteedToCorrespondToSIL=true, MSF=..., PSPs=..., moduleIsPublic=false, ReturnValue=@0x3ffffffb5f4: 0,
    observer=0x0, Stats=0x0) at /home/work/sw/swift42/swift/lib/FrontendTool/FrontendTool.cpp:1315
#14 0x00000000801bee7c in performCompile (Instance=..., Invocation=..., Args=..., ReturnValue=@0x3ffffffb5f4: 0,
    observer=0x0, Stats=0x0) at /home/work/sw/swift42/swift/lib/FrontendTool/FrontendTool.cpp:975
#15 0x00000000801bdb04 in swift::performFrontend (Args=..., Argv0=0x3fffffff71f "/home/work/sw_42/usr/bin/swift",
    MainAddr=0x800d9c74 <getExecutablePath[abi:cxx11](char const*)>, observer=0x0)
    at /home/work/sw/swift42/swift/lib/FrontendTool/FrontendTool.cpp:1789
#16 0x00000000800dbade in run_driver (ExecName=..., argv=...)
    at /home/work/sw/swift42/swift/tools/driver/driver.cpp:121
#17 0x00000000800dab2c in main (argc_=9, argv_=0x3fffffff4a8)
    at /home/work/sw/swift42/swift/tools/driver/driver.cpp:248

We checked the calling functions of each layer, but did not find the difference between the 1st and 2nd time calling.
Seems that the while statement in /lib/IRGen/IRGenSIL.cpp:1692 invokes the calling.
But how do we find the difference of the 2 time calls?

Thanks,


(Carl Peto) #16

Hi @samding,

I hadn't replied to any of your messages as, to be honest, I was a little out of my depth. haha.

But I find it really interesting!

From the little I've seen in my own work, and your investigation, I'm trying to summarise the situation. As I understand from what you're saying, there is some kind of mismatch on big endian systems to do with how enumerations are handled. It looks to me like when SIL/LLVM IR/assembly is emitted for enumerations, there's some structure emitted that gives details of the cases, etc. as static data in the object files.

Something that confuses me is these seem to be called "metadata" (one example seems to be "enum tag" information) but I'm not sure that they are the same thing as LLVM IR metadata, they seem to be unrelated to that, but I could be wrong. My understanding is embryonic!

When iterating through using for...in, runtime functions in "MetadataImpl.h" are called (on the master branch, might be named different in 4.2), like swift_getEnumCaseSinglePayload, getEnumTagSinglePayloadImpl. These look through the "enum tag" "metadata" for the enum to provide the information for iteration, specifically in this case whether next() should return another result or nil.

And on big endian systems, there's some mismatch between how the compiler emits this "metadata" and how the runtime interprets it, that is causing this bug.

Is that right?


(Sam Ding) #17

Actually I found the difference between the 1st and 2nd times to call
EnumPayload::emitCompare, and then it is ok to have a work around to make the sample
code work correctly.

Thanks.