I'm trying to run a more recent Swift on my Scaleway VPS, but starting w/ 5.7, calling the driver just seems to hang.
5.5 and 5.6 work:
docker run -it --rm --name ts swift:5.6 bash
root@b48676db35cc:/# swift --version
Swift version 5.6.3 (swift-5.6.3-RELEASE)
Target: x86_64-unknown-linux-gnu
Starting w/ Swift 5.7 (up to 6.0.1 and nightlies) this hangs:
docker run -it --rm --name ts swift:5.7 bash
root@b48676db35cc:/# swift --version
# hangs forever
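For comparing image tags without watching a terminal, the check above can be scripted with a timeout. This is only a minimal sketch: `times_out` is a made-up helper name, the 30 second cut-off is arbitrary, and the commented usage assumes `docker` is on the host's PATH and the images are already pulled.

```python
import subprocess

def times_out(cmd, seconds):
    """Return True if cmd is still running after `seconds`, else False."""
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=seconds)
    except subprocess.TimeoutExpired:
        return True  # subprocess.run has already killed the child process
    return False

# Example (image tags taken from the commands above):
#   times_out(["docker", "run", "--rm", "swift:5.6", "swift", "--version"], 30)
#   times_out(["docker", "run", "--rm", "swift:5.7", "swift", "--version"], 30)
```

Killing the `docker` client does not necessarily stop the container, so a hung container may still need a `docker rm -f` afterwards.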
The VPS is small: only 2GB of physical memory, but 8GB of swap, which doesn't get touched. Is there a minimum amount of memory required by Swift now? I played with -m 1g --memory-swap=8gb etc., but that doesn't seem to help.
Starting the driver in GDB got me this suspicious thing (temporary resource shortage that isn't temporary?):
I wonder if it thinks there are more than two cores. I've seen problems in the past with things that detect the number of actual cores on the system rather than the number of cores assigned to the container.
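One way to see whether the two numbers differ inside the container is to compare the CPU count the OS reports with the set of CPUs the process is actually allowed to run on. A minimal sketch, assuming python3 is available in the container (not verified for the swift images); `cpu_counts` is a made-up helper name.

```python
import os

def cpu_counts():
    """Return (CPUs reported by the OS, CPUs this process may run on)."""
    online = os.cpu_count() or 1
    # sched_getaffinity is Linux-only; fall back to the online count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        allowed = len(os.sched_getaffinity(0))
    else:
        allowed = online
    return online, allowed

if __name__ == "__main__":
    online, allowed = cpu_counts()
    print(f"online={online} allowed={allowed}")
```

Note that this only catches restrictions expressed as CPU affinity (for example Docker's `--cpuset-cpus`). A quota set with `--cpus` limits CPU time rather than the set of cores, so it would not show up in either number.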
I posted that above already. This is w/ gdb in the swift:5.7 image:
root@6d7be96bf00f:/# gdb swiftc
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
(gdb) r --version
Starting program: /usr/bin/swiftc --version
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
process 401 is executing new program: /usr/bin/swift-driver
warning: Section .debug_names in /usr/bin/swift-driver length 5772 does not match section length 1577208, ignoring .debug_names.
warning: Could not find DWO CU /home/build-user/build/buildbot_linux/swiftdriver-linux-x86_64/x86_64-unknown-linux-gnu/release/ModuleCache/28OHVX4XUBPU4/CSwiftScan-P6X4JH2ZLZ8C.pcm(0x9abb811053751965) referenced by CU at offset 0x11f73 [in module /usr/bin/swift-driver]
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
^C
Program received signal SIGINT, Interrupt.
0x000076213c0a078a in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffd50215db0, rem=rem@entry=0x7ffd50215db0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
(gdb) bt
#0 0x000076213c0a078a in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffd50215db0, rem=rem@entry=0x7ffd50215db0)
at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
#1 0x000076213c0a5677 in __GI___nanosleep (req=req@entry=0x7ffd50215db0, rem=rem@entry=0x7ffd50215db0) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
#2 0x000076213c0a55ae in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#3 0x000076213ccc21eb in _dispatch_temporary_resource_shortage () from /usr/bin/../lib/swift/linux/libdispatch.so
#4 0x000076213ccd1265 in _dispatch_root_queue_poke_slow () from /usr/bin/../lib/swift/linux/libdispatch.so
#5 0x000076213ccdbffe in _dispatch_epoll_init () from /usr/bin/../lib/swift/linux/libdispatch.so
#6 0x000076213ccc8cbf in _dispatch_once_callout () from /usr/bin/../lib/swift/linux/libdispatch.so
#7 0x000076213ccdbf66 in _dispatch_event_loop_poke () from /usr/bin/../lib/swift/linux/libdispatch.so
#8 0x000076213cccc234 in _dispatch_lane_resume_activate () from /usr/bin/../lib/swift/linux/libdispatch.so
#9 0x000076213cd21d0d in $s8Dispatch0A6SourceCAA0aB8ProtocolA2aDP6resumeyyFTW () from /usr/bin/../lib/swift/linux/libswiftDispatch.so
#10 0x0000623db05929bb in swift_driver_main () at /home/build-user/swift-driver/Sources/swift-driver/main.swift:59
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x762139ee27c0 (LWP 401) "swift-driver" 0x000076213c0a078a in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffd50215db0,
rem=rem@entry=0x7ffd50215db0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
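Going by the function names alone, frames #3 and #4 look like Dispatch retrying after it failed to start a worker thread. That reading is an assumption, not something confirmed here. A quick way to test thread creation independently of Dispatch is to start a single thread from another runtime inside the same container. A minimal sketch, assuming python3 is installed in the image; `can_start_thread` is a made-up helper name.

```python
import threading

def can_start_thread(timeout=5.0):
    """Try to start one extra thread; return (ok, error message or None)."""
    ran = threading.Event()
    worker = threading.Thread(target=ran.set)
    try:
        worker.start()
    except RuntimeError as exc:
        # CPython raises RuntimeError when the OS refuses to create a thread.
        return False, str(exc)
    worker.join(timeout)
    return ran.is_set(), None

if __name__ == "__main__":
    ok, error = can_start_thread()
    print("thread started" if ok else f"thread creation failed: {error}")
```

If this fails in the swift:5.7 container but succeeds in swift:5.6, the problem is below Dispatch, in thread creation itself.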
Sorry, hadn't spotted that. It isn't immediately obvious to me what's going wrong, but I'll ask some of our Dispatch experts to take a look and see what they think.
If it was, then the issue seems to lie with the pthreads implementation itself, since we should be able to create more than one thread. That makes me wonder which version of Linux you're using and which version of glibc is installed on the machine that is going wrong.
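Both values can be read from inside each container. Since a container shares the host's kernel but brings its own C library, running this in a working image and a hanging one shows whether the libc differs between them. A minimal sketch, assuming python3 is available in the images; `runtime_versions` is a made-up helper name.

```python
import platform

def runtime_versions():
    """Return the running kernel release and the C library of this Python."""
    # libc_ver() inspects the Python executable; it returns empty strings
    # when it cannot determine the library (e.g. on non-glibc systems).
    libc_name, libc_version = platform.libc_ver()
    return {
        "kernel": platform.release(),
        "libc": f"{libc_name} {libc_version}".strip(),
    }

if __name__ == "__main__":
    for key, value in runtime_versions().items():
        print(f"{key}: {value or 'unknown'}")
```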
Also it occurs to me that with Docker you're sharing the system with other processes. If you had a runaway process outside of the container that had made absolutely boatloads of threads, thread creation within the container might conceivably fail.
Yes, it was only one thread. As far as I can tell it is GCD very early in the bootstrapping phase?
Also remember: this affects 5.7+.
The 5.5 and 5.6 images run just fine, so they seem to be able to create threads.
The system is under very low load; I can't imagine that there is a particular issue w/ creating two threads:
163, also shown in the top output above. I don't think it is a specific constraint, given that Swift 5.6 does run just fine.
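To rule out a configured limit rather than guess, the usual thread and process ceilings can be read directly. A minimal sketch, assuming python3 on Linux; `thread_limits` and `read_first_line` are made-up helper names, and the cgroup path assumes cgroup v2 mounted at /sys/fs/cgroup (on cgroup v1 the file lives elsewhere, so `None` there does not mean "no limit").

```python
import resource
from pathlib import Path

def read_first_line(path):
    """Return the first line of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().split("\n", 1)[0].strip()
    except OSError:
        return None

def thread_limits():
    """Collect the limits that can make thread creation fail."""
    # RLIMIT_NPROC is per user; resource.RLIM_INFINITY means no limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "rlimit_nproc_soft": soft,
        "rlimit_nproc_hard": hard,
        "threads_max": read_first_line("/proc/sys/kernel/threads-max"),
        "cgroup_pids_max": read_first_line("/sys/fs/cgroup/pids.max"),
    }

if __name__ == "__main__":
    for name, value in thread_limits().items():
        print(f"{name}: {value}")
```

Running it both on the host and inside the container shows whether either side has a limit anywhere near the current thread count.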
I've also tried to find something in a log file, but couldn't so far.
I'll have to leave for today, but maybe I could fire up a container somehow and give you SSH access if you want to explore it further.
The other thing I could try is see whether I can fire up a fresh Scaleway 2-CPU and check whether it has the same issue (this one is years old and upgraded forwards, I even did the upgrade to Noble just to see whether that would fix the issue).
Given that 2-CPU works for Sven on Xeons, maybe it is some weird AMD EPYC vs Xeon difference?