Weird SIGSEGV crashes in Collection related functions

(Zino) #1


I have some Kitura code that I'm currently developing, and for the most part it works great. I do have, however, a real head scratcher.

The program walks through a MongoDB collection, gathers some numbers and outputs some stats.

On one server (x64, Ubuntu 18.04 LTS, HDD, all packages up to date), the Docker container runs perfectly: after tens of thousands of requests, nothing seems wrong.
On the other server (x64, Ubuntu 18.04 LTS, SSD, all packages up to date), the container crashes during collection-intensive manipulations.

(Note: this happens with both the swift 4.1 and swift 4.2 images.)

thread #7, name = 'MongoStats', stop reason = signal SIGSEGV: invalid address (fault address: 0x400033030010)
    frame #0: 0x00007ffff7ca4cf9`swift_getGenericMetadata + 569
    frame #1: 0x00007ffff7a125bf` -> Swift.Optional<A.Element> + 79
    frame #2: 0x000055555563d64b MongoStats`Document.makeIndexKey(keyParts=<unavailable>, self=<unavailable>) at Document+Subscripts.swift:54


thread #2, name = 'MongoStats', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x00007ffff73dad05`swift_dynamicCastClass + 85
    frame #1: 0x0000000000000010
    frame #2: 0x00007ffff7317d71`merged protocol witness for Swift.Collection.count.getter : Swift.Int in conformance Swift.ArraySlice<A> : Swift.Collection in Swift + 33
    frame #3: 0x00007ffff71762e0`protocol witness for Swift.Sequence._copyToContiguousArray() -> Swift.ContiguousArray<A.Element> in conformance Swift.ArraySlice<A> : Swift.Sequence in Swift + 16
    frame #4: 0x00007ffff736b836`function signature specialization <Arg[0] = Owned To Guaranteed> of Swift.Array.init<A where A == A1.Element, A1: Swift.Sequence>(A1) -> Swift.Array<A> + 22
    frame #5: 0x00007ffff717da95`Swift.Array.init<A where A == A1.Element, A1: Swift.Sequence>(A1) -> Swift.Array<A> + 21
    frame #6: 0x000055555562873f MongoStats`Document.getValue(position=144, type=objectId, kittenString=false, indexKey=nil, self=BSON.Document @ 0x00007fffdfffb520) at Document+ParsingSupport.swift:463


thread #2, name = 'MongoStats', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x00007ffff73e7069`swift_getGenericWitnessTable + 345
    frame #1: 0x00007ffff717fa20`Swift.Array.append<A where A == A1.Element, A1: Swift.Sequence>(contentsOf: A1) -> () + 528
    frame #2: 0x0000555555a2a3f8 MongoStats`static MongoConnect.populateIndustries(of=BSON.ObjectId @ 0x00007fffecf32928, appendTo=0 values, self=<unavailable>) at main.swift:123

I strongly suspect a race condition of some kind, since it sometimes works fine, but my code isn't using any dispatching at all. The crashes all occur in Collection manipulations, both in MongoKitten and in my own top-level code. They range from SIGSEGV on type casts to SIGSEGV on count, with no affinity for any particular symbol.

I am at a loss as to how to work around these crashes: they don't raise exceptions or errors that I can catch, and they print no message. The process goes straight to a segmentation fault.

If this message isn't 100% clear, please forgive me, as English is not my native tongue.


(Cory Benfield) #2

Your code may not be using any dispatching, but is Kitura? You may be able to run your code with Thread Sanitizer turned on, which can help reveal thread-safety issues. swift build -Xswiftc -sanitize=thread should do the trick.
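
For illustration only, here is a minimal sketch of the kind of issue Thread Sanitizer is designed to report. Nothing in it comes from the program discussed here: the Counter class and its method names are hypothetical. Calling unsafeIncrement() from several threads at once is a data race, while safeIncrement() serializes access with a lock.

```swift
import Foundation
import Dispatch

// Hypothetical example type, not taken from the original program.
final class Counter {
    private var value = 0
    private let lock = NSLock()

    // Unsynchronized: concurrent calls to this method are a data race,
    // which is the kind of access Thread Sanitizer is meant to flag.
    func unsafeIncrement() {
        value += 1
    }

    // Synchronized: the lock ensures only one thread touches `value` at a time.
    func safeIncrement() {
        lock.lock()
        defer { lock.unlock() }
        value += 1
    }

    // Reads go through the same lock so they never overlap a write.
    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

let counter = Counter()

// Run 1,000 increments spread across the available cores.
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    counter.safeIncrement()
}

print(counter.current) // prints 1000
```

Swapping safeIncrement() for unsafeIncrement() in the loop gives an unpredictable total, and a build with the sanitizer enabled should report the conflicting accesses.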

(Zino) #3

When I start the program with Thread Sanitizer enabled, it crashes immediately:

swift 4.2 image

* thread #1, name = 'MongoStats', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x0000555555fb3a4b MongoStats`::MonotonicNanoTime() at
    frame #2: 0x0000555555f8d4d8 MongoStats`::PopulateFreeArray() at sanitizer_allocator_primary64.h:700
    frame #3: 0x0000555555f8d0c5 MongoStats`::GetFromAllocator() at sanitizer_allocator_primary64.h:136
    frame #4: 0x0000555555f8cd29 MongoStats`::Refill() at sanitizer_allocator_local_cache.h:111
    frame #5: 0x0000555555f8c9d7 MongoStats`::Allocate() at sanitizer_allocator_local_cache.h:47
    frame #6: 0x0000555555f8a333 MongoStats`::Allocate() at sanitizer_allocator_combined.h:62
    frame #7: 0x0000555555f894b7 MongoStats`::user_alloc_internal() at
    frame #8: 0x0000555555f899a7 MongoStats`::user_calloc() at
    frame #9: 0x0000555555f3d01c MongoStats`::__interceptor_calloc() at
    frame #10: 0x00007ffff69f5627

Total available RAM on that machine: 14 GB
Average load: 0.36

Multiple other Swift Docker containers are running fine.

What the heck have I run into here?

Side note: the swift 4.1 image runs better and reports tons of potential data races. I'll dig into that. Thanks for the pointer!

(Cory Benfield) #4

Hrm, that looks like a bug with TSAN. What OS are you running on? I'll see if I can reproduce locally.

(Zino) #5

Ubuntu 18.04LTS - docker version 17.12.1-ce, build 7390fc6

The Dockerfile uses FROM swift:4.2 and then installs libpq (I'm using PostgreSQL AND MongoDB, I know, I know).

(Cory Benfield) #6

Yup, this is a bug in Swift 4.2, filed as SR-8809.

(Zino) #7

Side note: after aggressively segregating the Mongo code from the rest, crashes have become very rare. Of course, performance drops mightily, but as we all know, stability > performance :wink:
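
The post does not say how the segregation was done. One possible reading, shown here purely as an assumption, is to confine all access to shared collections to a single serial queue. SerializedBox and the "industry-N" strings below are hypothetical names made up for this sketch.

```swift
import Dispatch

// Hypothetical wrapper: every read and write of `value` runs on one serial
// queue, so no two threads can touch the underlying collection at once.
final class SerializedBox<Value> {
    private var value: Value
    private let queue = DispatchQueue(label: "example.serialized-box")

    init(_ value: Value) {
        self.value = value
    }

    // Mutations are executed one at a time on the serial queue.
    func mutate(_ body: (inout Value) -> Void) {
        queue.sync { body(&self.value) }
    }

    // Reads use the same queue, so they never overlap a mutation.
    func read<T>(_ body: (Value) -> T) -> T {
        return queue.sync { body(self.value) }
    }
}

let industries = SerializedBox<[String]>([])

// 100 concurrent appends, each funnelled through the serial queue.
DispatchQueue.concurrentPerform(iterations: 100) { i in
    industries.mutate { $0.append(contentsOf: ["industry-\(i)"]) }
}

print(industries.read { $0.count }) // prints 100
```

Funnelling everything through one queue gives up parallelism, which matches the performance drop described above. Note that calling mutate or read from inside another mutate or read on the same box would deadlock.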

(Johannes Weiss) #8

Given that TSan doesn't work, you might want to try ASan: swift build -c release --sanitize=address. Alternatively, TSan/ASan on macOS might work.