Build crash when building in QEMU using new Swift 5.6 arm64 image

Hello! I'm trying to update the rvolosatovs/protoc Docker image to add arm64 support. This image contains Google protobuf support for a number of languages, including Swift. To build a multiarch image, I depend on the newly released swift:5.6-focal image, since it is one of the few Swift Docker images with arm64 support. I need to build grpc-swift in this section of my Dockerfile in order to successfully build the protoc image:

FROM swift:5.6-focal as swift_builder
RUN apt-get update && \
    apt-get install -y unzip patchelf libnghttp2-dev curl libssl-dev zlib1g-dev build-essential

RUN mkdir -p /grpc-swift && \
    curl -sSL https://api.github.com/repos/grpc/grpc-swift/tarball/1.7.1 | tar xz --strip 1 -C /grpc-swift && \
    cd /grpc-swift && make && make plugins && \
    ...

This runs successfully on native linux/arm64 and linux/amd64. However, when I am building in QEMU with --platform=linux/arm64,linux/amd64 on an x86_64 processor, building grpc-swift fails every time with make: *** [Makefile:25: all] Error 1. I added the -v switch to the grpc-swift swift build command, but it turns out the point of failure is different every time. It will eventually always fail at this point:

/usr/bin/swiftc -module-name NIOConcurrencyHelpers -incremental -emit-dependencies -emit-module -emit-module-path /grpc-swift/.build/aarch64-unknown-linux-gnu/debug/NIOConcurrencyHelpers.swiftmodule -output-file-map /grpc-swift/.build/aarch64-unknown-linux-gnu/debug/NIOConcurrencyHelpers.build/output-file-map.json -parse-as-library -c /grpc-swift/.build/checkouts/swift-nio/Sources/NIOConcurrencyHelpers/NIOAtomic.swift /grpc-swift/.build/checkouts/swift-nio/Sources/NIOConcurrencyHelpers/atomics.swift /grpc-swift/.build/checkouts/swift-nio/Sources/NIOConcurrencyHelpers/lock.swift -I /grpc-swift/.build/aarch64-unknown-linux-gnu/debug -target aarch64-unknown-linux-gnu -swift-version 5 -enable-batch-mode -index-store-path /grpc-swift/.build/aarch64-unknown-linux-gnu/debug/index/store -Onone -enable-testing -g -j8 -DSWIFT_PACKAGE -DDEBUG -Xcc -fmodule-map-file=/grpc-swift/.build/aarch64-unknown-linux-gnu/debug/CNIOAtomics.build/module.modulemap -Xcc -I -Xcc /grpc-swift/.build/checkouts/swift-nio/Sources/CNIOAtomics/include -module-cache-path /grpc-swift/.build/aarch64-unknown-linux-gnu/debug/ModuleCache -parseable-output -parse-as-library -color-diagnostics
make: *** [Makefile:25: all] Error 1

I'm asking here first because I suspect the problem is with the build environment rather than the grpc-swift package. But I'm at a bit of a loss as to how to debug this further, since I'm not at all familiar with Swift. Could this be a memory corruption problem? Or something in QEMU? How can I get more verbose compiler output on the cause of the error? Can anyone else try compiling this code (or their own code) in the latest Swift image while emulating arm64 on amd64? You can enter the image build environment like this:

docker run --platform linux/arm64 -it swift:5.6-focal bash

And then run the above build instructions, or your own. I have reproduced this on 3 machines so far, and I'd be curious whether other Swift code also fails. Being able to build this way matters because most CI runners use x86, so we must build multiarch images on that architecture.

Could you post the full output somewhere? The point of failure probably looks different each time because the build spawns a number of jobs concurrently, so the last line of the output isn't necessarily the one that failed.
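One way to check this is to save the whole log and search it for the first lines that look like real failures, rather than reading only the tail. A minimal sketch; the function name and the search patterns are assumptions, not output that swift build is known to print:

```shell
#!/bin/sh
# Hypothetical helper: print up to 20 lines (with line numbers) from a saved
# build log that look like real failures, instead of trusting the last line.
find_failures() {
    log_file="$1"
    # These patterns are guesses at what a failing job might print.
    grep -n -i -E 'error:|signal|core dumped|illegal instruction' "$log_file" | head -n 20
}
```

For example, save the log with `make 2>&1 | tee build.log` and then run `find_failures build.log`.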

With regard to QEMU: yes, many things can go wrong, for example:

  • the QEMU_CPU environment variable might not be set to max, so it might be emulating a too-old CPU
  • out of memory
  • Qemu emulation bug
  • unsupported instruction in Qemu's CPU emulation
  • ...

Are you using Docker for Mac? If yes, what version is that?
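To rule out the first two items in the list above quickly, something like the following could be run inside the emulated container. It is only a sketch: the function names are made up for illustration, and it assumes a Linux-style /proc/meminfo is available:

```shell
#!/bin/sh
# Report the QEMU_CPU setting, or "unset" when the variable is missing or empty.
qemu_cpu_setting() {
    echo "${QEMU_CPU:-unset}"
}

# Print the MemAvailable value in kB from a meminfo-style file
# (defaults to /proc/meminfo; a path can be passed for testing).
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Print a short summary of the emulated environment.
report_env() {
    echo "arch: $(uname -m)"
    echo "QEMU_CPU: $(qemu_cpu_setting)"
    echo "MemAvailable (kB): $(mem_available_kb)"
}
```

Calling `report_env` before the build gives a quick record of the architecture, the QEMU_CPU setting, and the available memory to attach to a bug report.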

Actually, I can reproduce. I ran docker run -it --rm --platform linux/arm64 swift:5.6-focal on an Intel Mac, i.e. we're emulating arm64. Then (in the container) I did ulimit -c 99999999 to get core files, then I ran QEMU_CPU=max make -j1, and that created a core file for me:

root@d15eab5ee357:/grpc-swift# file *.core 
qemu_swift-driver_20220321-151932_7165.core: ELF 64-bit LSB core file, ARM aarch64, version 1 (SYSV), SVR4-style, from '/usr/bin/swift-driver --driver-mode=swiftc -Xfrontend -new-driver-path -Xfronte', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/bin/swift-driver', platform: 'aarch64'
root@d15eab5ee357:/grpc-swift# lldb /usr/bin/swift-driver -c *.core
(lldb) target create "/usr/bin/swift-driver" --core "qemu_swift-driver_20220321-151932_7165.core"
[...]
Core file '/grpc-swift/qemu_swift-driver_20220321-151932_7165.core' (aarch64) was loaded.
(lldb) bt
* thread #1, name = 'swift-driver', stop reason = signal SIGTRAP
  * frame #0: 0x0000005502f00404 libdispatch.so`_dispatch_unfair_lock_lock_slow.cold.1
    frame #1: 0x0000005502f1ee7c libdispatch.so`_dispatch_unfair_lock_lock_slow + 152
    frame #2: 0x0000005502f1f1fc libdispatch.so`_dispatch_workq_worker_register + 160
    frame #3: 0x0000005502f16614 libdispatch.so`_dispatch_worker_thread + 200
    frame #4: 0x0000005502e4851c libpthread.so.0`start_thread(arg=0x00000055284927ff) at pthread_create.c:477:8
    frame #5: 0x00000055033e222c libc.so.6`___lldb_unnamed_symbol633$$libc.so.6 + 12
(lldb) 

So something in libdispatch is getting a SIGTRAP. I don't have time to debug right now, but I hope that gets you a little further. You can set various QEMU_* variables, as documented by QEMU, to get all sorts of debugging info.
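For anyone repeating the steps above across several crashes, the backtrace part can be scripted. A rough sketch, assuming core files follow the qemu_<program>_<date>-<time>_<pid>.core naming seen above and that the crashed binary lives in /usr/bin; both are assumptions, and the function names are hypothetical:

```shell
#!/bin/sh
# Derive the crashed program's name from a core file name of the form
# qemu_<program>_<date>-<time>_<pid>.core (the pattern seen in this thread).
core_program_name() {
    base=$(basename "$1" .core)
    base=${base#qemu_}
    # Drop the trailing _<date>-<time>_<pid> fields.
    echo "$base" | sed -E 's/_[0-9]+-[0-9]+_[0-9]+$//'
}

# Print a backtrace for every core file in the current directory.
backtrace_all_cores() {
    for core in ./*.core; do
        [ -e "$core" ] || continue
        prog=$(core_program_name "$core")
        echo "== $core ($prog) =="
        # Assumption: the binary is installed at /usr/bin/<program>.
        lldb --batch -o 'bt' -c "$core" "/usr/bin/$prog"
    done
}
```

Run `backtrace_all_cores` from the directory holding the core files, after enabling core dumps with ulimit -c and reproducing the crash.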


Sure, full output of an example build is here; the relevant part starts around line 285. This is the Dockerfile I am building.

I am using Ubuntu 21.10 Desktop on an aging Intel i7-4810MQ, so it is conceivable the CPU might not support some modern features, but I was also able to reproduce in GitHub Actions CI, which I believe uses more modern Xeons. Here are my local Docker environment details:

strophy@W540:~/Code/strophy/docker-protobuf$ docker version
Client: Docker Engine - Community
 Cloud integration: 1.0.17
 Version:           20.10.13
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 10 14:07:55 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.13
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       906f57f
  Built:            Thu Mar 10 14:05:44 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.10
  GitCommit:        2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I am using the following commands to prepare the builder:

docker pull tonistiigi/binfmt
docker run --privileged --rm tonistiigi/binfmt --uninstall qemu-*
docker run --privileged --rm tonistiigi/binfmt --install all
docker buildx rm multiarch
docker buildx create --name multiarch --driver docker-container --use
docker buildx inspect --bootstrap
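With the builder prepared like this, the multi-platform build itself might be wrapped as follows. This is a sketch only: build_multiarch, DRY_RUN, and the example tag are hypothetical, and it relies on the builder having been selected with --use as above:

```shell
#!/bin/sh
# Build the Dockerfile in the current directory for several platforms.
# With DRY_RUN=1 the command is printed instead of executed.
build_multiarch() {
    platforms="$1"   # e.g. linux/arm64,linux/amd64
    tag="$2"         # hypothetical image tag
    set -- docker buildx build --platform "$platforms" -t "$tag" .
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 build_multiarch linux/arm64,linux/amd64 example/protoc:test` prints the command so it can be checked before a long emulated build is started.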

I don't understand the commands you ran or what libdispatch does, or even how a compiler really works. If you could point me in the right direction I can start reading, but I'm not sure I can solve this problem on my own.

Essentially, there's either a bug in libdispatch (which is a core component of Swift's runtime) or a bug in Qemu. The correct thing to do here is to file a bug against libdispatch.

Thanks for the information. libdispatch does not allow issues on GitHub, does not specify how to contact maintainers in the CONTRIBUTING.md, and has not responded to my request to join the mailing list, so I'm not sure where or how to submit a bug. I've discussed with the maintainer of the protoc image and we decided to omit Swift from the list of supported languages until someone familiar with the language is able to help us with this bug.

Ping @Pierre_Habouzit ?

Please comment on this GitHub issue tracking the lack of Swift support when the bug is resolved.

Another user is also facing this issue: Build `aarch64` Docker images in CI · Issue #124 · fwcd/d2 · GitHub

Just tried this again for the first time in a while and it seems to work using the swift:5.9-jammy image.
