What failed in CI build?

Hello,

I have a SwiftPM PR that is blocking two other PRs targeting release/6.3. However, the Windows platform build failed on CI and I'm unable to determine the reason.

Looks like there is a similar failure on 🍒[clang][Dependency Scanning] Move Module Timestamp Update After Compi… by qiongsiwu · Pull Request #12009 · swiftlang/llvm-project · GitHub.

The build reports a fatalError() and nothing else:

<...SNIP...>
[1217/1239] Compiling icu_packaged_data.cpp
error: fatalError
Error: Error: swift.exe exited with code 1.
Invocation:
  T:\Program Files\Swift\Toolchains\0.0.0+Asserts\usr\bin\swift.exe test --scratch-path T:\x86_64-unknown-windows-msvc\FoundationTests --package-path C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift-corelibs-foundation -c debug -Xbuild-tools-swiftc -IT:\Program Files\Swift\Platforms\Windows.platform\Developer\SDKs\Windows.sdk\usr\lib\swift -Xbuild-tools-swiftc -LT:\Program Files\Swift\Platforms\Windows.platform\Developer\SDKs\Windows.sdk\usr\lib\swift\windows -Xcc -IT:\Program Files\Swift\Platforms\Windows.platform\Developer\SDKs\Windows.sdk\usr\lib\swift -Xlinker -LT:\Program Files\Swift\Platforms\Windows.platform\Developer\SDKs\Windows.sdk\usr\lib\swift\windows -debug-info-format none 

Call stack:
  at Invoke-Program, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 807
  at <ScriptBlock>, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 1603
  at Invoke-IsolatingEnvVars, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 826
  at Build-SPMProject, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 1564
  at <ScriptBlock>, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 2470
  at Invoke-IsolatingEnvVars, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 826
  at Test-Foundation, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 2460
  at Invoke-BuildStep, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 613
  at <ScriptBlock>, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 3369

    at Invoke-Program, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 811
    at <ScriptBlock>, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 1603
    at Invoke-IsolatingEnvVars, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 826
    at Build-SPMProject, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 1564
    at <ScriptBlock>, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 2470
    at Invoke-IsolatingEnvVars, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 826
    at Test-Foundation, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 2460
    at Invoke-BuildStep, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 613
    at <ScriptBlock>, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 3369
  From System.Management.Automation.RuntimeException: Error: swift.exe exited with code 1.
  Invocation:
    T:\Program Files\Swift\Toolchains\0.0.0+Asserts\usr\bin\swift.exe test --scratch-path T:\x86_64-unknown-windows-msvc\FoundationTests --package-path C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift-corelibs-foundation -c debug -Xbuild-tools-swiftc -IT:\Program Files\Swift\Platforms\Windows.platform\Developer\SDKs\Windows.sdk\usr\lib\swift -Xbuild-tools-swiftc -LT:\Program Files\Swift\Platforms\Windows.platform\Developer\SDKs\Windows.sdk\usr\lib\swift\windows -Xcc -IT:\Program Files\Swift\Platforms\Windows.platform\Developer\SDKs\Windows.sdk\usr\lib\swift -Xlinker -LT:\Program Files\Swift\Platforms\Windows.platform\Developer\SDKs\Windows.sdk\usr\lib\swift\windows -debug-info-format none 
  Call stack:
    at Invoke-Program, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 807
    at <ScriptBlock>, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 1603
    at Invoke-IsolatingEnvVars, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 826
    at Build-SPMProject, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 1564
    at <ScriptBlock>, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 2470
    at Invoke-IsolatingEnvVars, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 826
    at Test-Foundation, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 2460
    at Invoke-BuildStep, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 613
    at <ScriptBlock>, C:\Users\swift-ci\jenkins\workspace\swiftpm-PR-windows\swift\utils\build.ps1: line 3369  
<...SNIP...>

What troubleshooting can be done here?

@compnerd : Sorry for the mention, but do you have any insights or thoughts on how to troubleshoot this?

Sorry, there isn't really anything to go on there.

All I can tell is that something (possibly SPM, swift-driver, or the frontend) crashed when building the Foundation tests. There was a crash caused by a race condition that only seemed to reproduce on the CI hosts. Using a release build helped, but it adds maybe ~30 minutes to the build (@bnbarham might remember the actual time). Unfortunately, extracting information from the crash was not really possible because SPM would consume the error messages. The problem only showed up under SPM, so I couldn't get any more information on the underlying issue.

Do you know how much memory is installed in those failing CI jobs and how many processes are running simultaneously? I have seen that particular trace many times when building swift-foundation-icu, where that last C++ file is from, in constrained build environments with not enough memory. It is one of two issues:

  1. That packaged data file is a fairly large 200 MB array of ICU data, and clang often allocates 15-20x the size of the array to the compiler process before it eventually writes that object file, which seems like an LLVM bug. To work around it, I have to make sure I have enough RAM and swap.
  2. Sometimes that ICU C++ compile goes through, but then several large swift-frontend jobs are started and all contend for memory.

I first pass in -j 1 to the ninja or SwiftPM compile and that makes sure there is less contention for memory, allowing the build to go through. Other times, I have no choice but to provide more RAM or swap. I suspect that is what you're seeing here.
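To put rough numbers on the first issue, here is a minimal Python sketch of the arithmetic. The 200 MB size and the 15-20x multiplier come from the description above; the helper names and the idea of budgeting parallel jobs by available memory are assumptions for illustration only, not how ninja or SwiftPM actually schedule work.

```python
MB = 1000 ** 2
GB = 1000 ** 3


def estimate_peak_bytes(array_bytes, multiplier):
    """Rough peak memory for compiling one large data array."""
    return array_bytes * multiplier


def max_parallel_jobs(available_bytes, per_job_bytes):
    """Hypothetical budget: how many such jobs fit in memory (at least 1)."""
    return max(1, available_bytes // per_job_bytes)


# 200 MB of ICU data at 15x and 20x.
low = estimate_peak_bytes(200 * MB, 15)
high = estimate_peak_bytes(200 * MB, 20)

print(f"estimated peak: {low / GB:.0f}-{high / GB:.0f} GB")
print("jobs that fit in 8 GB:", max_parallel_jobs(8 * GB, high))
```

By this estimate a single compile of that file wants roughly 3-4 GB, so a constrained machine may only have room for one such job at a time, which is consistent with -j 1 helping.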

The issue there was a race in the move for response files on Windows (atomic, but not so much?). That should be fixed by Use UUID for response file names and add 'resolveArgumentList' API which contains original command-line if a response file is used by artemcm · Pull Request #1989 · swiftlang/swift-driver · GitHub, so it probably isn’t this issue.

This was only true before we understood the underlying cause. None of us tried the exact layout used on CI, and response files are only used when the arguments end up being too long. The paths on CI were long enough to hit this, but the paths used by anyone who tried to reproduce it were not. When using the same layout, it was also hit on local machines.
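To illustrate why path length matters and why unique names avoid collisions, here is a small Python sketch. It is not swift-driver's implementation: the length threshold, the file naming scheme, and all function names are hypothetical, and real response files also need argument quoting, which is skipped here.

```python
import os
import tempfile
import uuid

# Hypothetical threshold; the real limit depends on the platform and tool.
MAX_COMMAND_LINE_LENGTH = 8000


def needs_response_file(command, limit=MAX_COMMAND_LINE_LENGTH):
    """Long paths inflate the joined command line past the limit."""
    return len(" ".join(command)) > limit


def write_response_file(args, directory):
    """Write one argument per line to a uniquely named file."""
    # A UUID in the name means concurrent jobs never share a path.
    path = os.path.join(directory, f"arguments-{uuid.uuid4()}.resp")
    with open(path, "w", encoding="utf-8") as handle:
        for arg in args:
            handle.write(arg + "\n")
    return path


def resolve_command(tool, args, directory):
    """Return the command as-is, or the tool plus an @file reference."""
    if not needs_response_file([tool] + args):
        return [tool] + args
    return [tool, "@" + write_response_file(args, directory)]


with tempfile.TemporaryDirectory() as scratch:
    short = resolve_command("swiftc", ["-c", "main.swift"], scratch)
    long_args = [f"file{i}.swift" for i in range(2000)]
    first = resolve_command("swiftc", long_args, scratch)
    second = resolve_command("swiftc", long_args, scratch)
    print(short)
    print(first[1] != second[1])
```

The short command never touches a response file, which matches the observation that short local paths did not reproduce the failure.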

For posterity, the release build helped because release is WMO (whole-module optimization) and thus didn’t hit the race. 30 minutes sounds about right. That change was reverted though (after the above fix was merged).

FWIW it wasn’t SwiftPM consuming the error in this case, but rather the driver. I fixed the only case I found of that though (Always diagnose failures in job execution by bnbarham · Pull Request #1978 · swiftlang/swift-driver · GitHub). Not that SwiftPM couldn’t be improved as well (Add extra output for --verbose and --very-verbose to help diagnose failures · Issue #9075 · swiftlang/swift-package-manager · GitHub).

The logs above do suggest that there’s another case of this, however, so maybe I missed one, i.e. throwing a Driver.ErrorDiagnostics.emitted even though no error has been emitted.
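The failure mode described here can be sketched generically in Python. This is only an illustration of the pattern, not the driver's code: the class and function names are hypothetical stand-ins, with DiagnosticsEmitted playing the role of an "already reported" sentinel error.

```python
class DiagnosticsEmitted(Exception):
    """Sentinel: the failure was already reported, so callers stay quiet."""


class DiagnosticsEngine:
    """Collects error messages that would be shown to the user."""

    def __init__(self):
        self.errors = []

    def error(self, message):
        self.errors.append(message)


def run_job(job, diagnostics):
    """Buggy pattern: raises the sentinel without recording any error."""
    if job() != 0:
        raise DiagnosticsEmitted()


def run_with_guard(job, diagnostics):
    """Make sure a failure never ends up completely silent."""
    try:
        run_job(job, diagnostics)
    except DiagnosticsEmitted:
        if not diagnostics.errors:
            # Nothing was reported, so add a fallback message.
            diagnostics.error("job failed but no diagnostic was emitted")
        return False
    return True


engine = DiagnosticsEngine()
succeeded = run_with_guard(lambda: 1, engine)
print(succeeded, engine.errors)
```

Without the guard, the caller would trust the sentinel and print nothing, which is how a build can end with a bare failure and no explanation.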

Not saying the issue above doesn’t have one of the two causes you mention (though I should note that the Windows CI machines have a fairly large amount of RAM), but this was also what we thought the problem above was for a while. Mostly mentioning because we likely would have found the driver issue much earlier if we hadn’t made that assumption.

It's great to see these being addressed, but it looks like the focus has been on main. However, I'm encountering the issues on release/6.2.

Also, re-triggering the build reproduces the issues. I have tried about 3 re-triggers so far. Do we need to cherry-pick some of the changes to release/6.2?

I do think that cherry picking some of the changes that we used as a workaround for the other issue would be a good litmus test.

Is the printing done prior to the step or after completion?

I assume you mean the printing of the build output above, where the last step shown is Compiling icu_packaged_data.cpp. It depends on whether you are building Foundation with CMake/Ninja or SwiftPM (unfortunately, I don't think that misleading build output was ever fixed), and on which of the two problematic steps I listed causes the OOM failure, clang or swiftc. In this case, I have definitely seen that trace recently when building Foundation with ninja in a Linux AArch64 container running on macOS arm64, where swiftc was taking too much memory: it went away when I allocated more memory to the Linux container.

Finally, let me say that I have never used Swift on Windows, not counting iterating on pulls through the Windows CI. This is my experience building the Swift toolchain in smaller build environments, both running Linux and natively on my Android phone.

I’m confused, your initial post mentions release/6.3, not release/6.2. Which are you seeing this on?

If it’s release/6.2 then yes, it’s likely the issue I mentioned above. Cherry-pick for that is [6.2 🍒] Use UUID for response file names and add 'resolveArgumentList' API which contains original command-line if a response file is used by artemcm · Pull Request #1996 · swiftlang/swift-driver · GitHub, though I believe @ArtemC was waiting on verifying the change further before merging (and it may be too late now).

I did also have [6.2] Build foundation tests in release by bnbarham · Pull Request #84445 · swiftlang/swift · GitHub up, but it’s really a workaround (and one that adds a fair amount to CI time) - this is a real failure on Windows when response files are used.

Oops, big-finger typo. It was release/6.2.