Weird SIGSEGV crashes in Collection-related functions


(Zino) #1

Hello,

I have some Kitura code that I'm currently developing, and for the most part it works great. I do have, however, a real head scratcher.

The program walks through a MongoDB collection, gathers some numbers, and outputs some stats.

On one server (x64, Ubuntu 18.04 LTS, HDD, every package up to date), the Docker container runs perfectly: after tens of thousands of requests, nothing seems wrong.
On the other server (x64, Ubuntu 18.04 LTS, SSD, every package up to date), the container crashes during collection-intensive manipulations.

(Note: this happens with both the Swift 4.1 and Swift 4.2 images.)

thread #7, name = 'MongoStats', stop reason = signal SIGSEGV: invalid address (fault address: 0x400033030010)
    frame #0: 0x00007ffff7ca4cf9 libswiftCore.so`swift_getGenericMetadata + 569
    frame #1: 0x00007ffff7a125bf libswiftCore.so`Swift.IndexingIterator.next() -> Swift.Optional<A.Element> + 79
    frame #2: 0x000055555563d64b MongoStats`Document.makeIndexKey(keyParts=<unavailable>, self=<unavailable>) at Document+Subscripts.swift:54

and

thread #2, name = 'MongoStats', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x00007ffff73dad05 libswiftCore.so`swift_dynamicCastClass + 85
    frame #1: 0x0000000000000010
    frame #2: 0x00007ffff7317d71 libswiftCore.so`merged protocol witness for Swift.Collection.count.getter : Swift.Int in conformance Swift.ArraySlice<A> : Swift.Collection in Swift + 33
    frame #3: 0x00007ffff71762e0 libswiftCore.so`protocol witness for Swift.Sequence._copyToContiguousArray() -> Swift.ContiguousArray<A.Element> in conformance Swift.ArraySlice<A> : Swift.Sequence in Swift + 16
    frame #4: 0x00007ffff736b836 libswiftCore.so`function signature specialization <Arg[0] = Owned To Guaranteed> of Swift.Array.init<A where A == A1.Element, A1: Swift.Sequence>(A1) -> Swift.Array<A> + 22
    frame #5: 0x00007ffff717da95 libswiftCore.so`Swift.Array.init<A where A == A1.Element, A1: Swift.Sequence>(A1) -> Swift.Array<A> + 21
    frame #6: 0x000055555562873f MongoStats`Document.getValue(position=144, type=objectId, kittenString=false, indexKey=nil, self=BSON.Document @ 0x00007fffdfffb520) at Document+ParsingSupport.swift:463

and

thread #2, name = 'MongoStats', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x00007ffff73e7069 libswiftCore.so`swift_getGenericWitnessTable + 345
    frame #1: 0x00007ffff717fa20 libswiftCore.so`Swift.Array.append<A where A == A1.Element, A1: Swift.Sequence>(contentsOf: A1) -> () + 528
    frame #2: 0x0000555555a2a3f8 MongoStats`static MongoConnect.populateIndustries(of=BSON.ObjectId @ 0x00007fffecf32928, appendTo=0 values, self=<unavailable>) at main.swift:123

I strongly suspect a race condition of some kind, since it sometimes works fine, but my code isn't using any dispatching at all. The crashes occur only in Collection manipulations, both in MongoKitten and in my own top-level code. They range from SIGSEGV on type casts to SIGSEGV on count, with no affinity for any particular symbol.

I am at a loss as to how to work around these crashes: they don't raise exceptions, errors, or anything else I can catch, and they print no message. The process goes straight to a segmentation fault.

If this message isn't 100% clear, please forgive me, as English is not my native tongue.

--
zino


(Cory Benfield) #2

Your code may not be dispatching, but is Kitura? You may be able to run your code with Thread Sanitizer turned on, which can help reveal thread-safety issues. swift build -Xswiftc -sanitize=thread should do the trick.
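
For illustration, here is a minimal sketch of the kind of bug Thread Sanitizer is meant to flag: several threads appending to one Array. The names StatsCollector and runLockedDemo are hypothetical and do not come from Kitura, MongoKitten, or the code in this thread. The version shown is the guarded one; the comment marks where removing the lock would turn it into a data race.

```swift
import Foundation
import Dispatch

// Hypothetical stand-in for code that gathers stats from several threads.
final class StatsCollector {
    private var values: [Int] = []
    private let lock = NSLock()

    // Without the lock, concurrent calls would mutate the same Array
    // storage at the same time. That is a data race (undefined behaviour)
    // and can surface as a SIGSEGV somewhere inside libswiftCore.
    func record(_ value: Int) {
        lock.lock()
        defer { lock.unlock() }
        values.append(value)
    }

    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return values.count
    }
}

// Spreads 1,000 appends over the available cores and returns the final count.
func runLockedDemo() -> Int {
    let collector = StatsCollector()
    DispatchQueue.concurrentPerform(iterations: 1_000) { i in
        collector.record(i)
    }
    return collector.count
}

print(runLockedDemo())
```

If frameworks underneath your code call into it from more than one thread, shared arrays like this one are the first place I would look.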


(Zino) #3

When I start the program with Thread Sanitizer enabled, it crashes immediately.

Swift 4.2 image:

* thread #1, name = 'MongoStats', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x0000555555fb3a4b MongoStats`::MonotonicNanoTime() at sanitizer_linux_libcdep.cc:754
    frame #2: 0x0000555555f8d4d8 MongoStats`::PopulateFreeArray() at sanitizer_allocator_primary64.h:700
    frame #3: 0x0000555555f8d0c5 MongoStats`::GetFromAllocator() at sanitizer_allocator_primary64.h:136
    frame #4: 0x0000555555f8cd29 MongoStats`::Refill() at sanitizer_allocator_local_cache.h:111
    frame #5: 0x0000555555f8c9d7 MongoStats`::Allocate() at sanitizer_allocator_local_cache.h:47
    frame #6: 0x0000555555f8a333 MongoStats`::Allocate() at sanitizer_allocator_combined.h:62
    frame #7: 0x0000555555f894b7 MongoStats`::user_alloc_internal() at tsan_mman.cc:157
    frame #8: 0x0000555555f899a7 MongoStats`::user_calloc() at tsan_mman.cc:183
    frame #9: 0x0000555555f3d01c MongoStats`::__interceptor_calloc() at tsan_interceptors.cc:684
    frame #10: 0x00007ffff69f5627

Total available RAM on that machine: 14 GB
Average load: 0.36

Multiple other Swift Docker containers are running fine.

What the heck have I run into here?

Side note: the Swift 4.1 image runs better and reports tons of potential data races. I'll dig into that. Thanks for the pointer!


(Cory Benfield) #4

Hrm, that looks like a bug in TSan. What OS are you running on? I'll see if I can reproduce locally.


(Zino) #5

Ubuntu 18.04 LTS, Docker version 17.12.1-ce, build 7390fc6

The Dockerfile uses FROM swift:4.2 and then installs libpq (I'm using PostgreSQL AND MongoDB, I know, I know).


(Cory Benfield) #6

Yup, this is a bug in Swift 4.2, filed as SR-8809.


(Zino) #7

Side note: after aggressively segregating the Mongo code from the rest, crashes have become very rare. Of course, performance drops mightily, but as we all know, stability > performance :wink:
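
One possible form of that segregation, sketched under assumptions: every access to the shared collection is funneled through a single serial DispatchQueue. SerializedIndustries and runSerialDemo are hypothetical names used only for illustration, not the actual code from this project.

```swift
import Dispatch

// Hypothetical wrapper: every read and write of the array goes through
// one serial queue, so only one thread touches the storage at a time.
final class SerializedIndustries {
    private var names: [String] = []
    private let queue = DispatchQueue(label: "example.mongo.serial")

    func append(contentsOf newNames: [String]) {
        queue.sync { self.names.append(contentsOf: newNames) }
    }

    // Returns a copy, so callers never hold a reference to the shared storage.
    func snapshot() -> [String] {
        return queue.sync { self.names }
    }
}

// 100 concurrent callers each append two names; returns the final count.
func runSerialDemo() -> Int {
    let industries = SerializedIndustries()
    DispatchQueue.concurrentPerform(iterations: 100) { i in
        industries.append(contentsOf: ["industry-\(i)-a", "industry-\(i)-b"])
    }
    return industries.snapshot().count
}

print(runSerialDemo())
```

Because every caller waits its turn on the queue, the work is effectively single-threaded, which would be consistent with the performance drop.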


(Johannes Weiss) #8

Given that TSan doesn't work, you might want to try ASan: swift build -c release --sanitize=address. Alternatively, TSan/ASan on macOS might work.