Proper way to escalate Swift 6.1 crash blockers

Hi,

We have been building our full stack with the 6.1 betas and still hit a crash bug in the compiler, with a fully reproducible test case linked (which works fine with 6.0). The issue has not even been triaged and seems dropped. Is there any "official" way of escalating such crashes for review? It would be a pity if we can't move on to 6.1 when it drops. Understood that there is a serious case load now with both 6.1 (and WWDC...) coming up soon, but I would presume that reproducible regressions should at least be evaluated.

Often we can see crashes due to assertions being enabled, but we still crash with Xcode 16.3b3 which supposedly is without assertions.

Cheers,

Joakim

3 Likes

Thanks for surfacing this.

Please ping me on distributed module issues. It's unclear whether this is a distributed change or something related, but I can at least try to triage / diagnose.

1 Like

It may be worth adding ordo-one/package-distributed-system to the source compatibility suite... I see you opened up a PR for it; I just pinged the CI bot to kick off tests again.

Please don't hesitate to ping me on those, I can try my best to get those looked at.

1 Like

Yeah, we thought the same, as it has bitten us a couple of times now. That PR is ready for merge AFAIK, but still waiting to go in... Thanks for the quick response.

From what I can see this is an assertion failure. This shouldn't be an issue in the 6.1 release since the release toolchains are built without assertions. However, the Linux nightly toolchains are built with assertions and that's why you are running into this. We should get this fixed nevertheless and check why the assertion failed.
cc @tbkka who has been doing some work on assertions previously.

1 Like

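Yeah, that is why I wrote: "Often we can see crashes due to assertions being enabled, but we still crash with Xcode 16.3b3 which supposedly is without assertions."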

It may be in this case that we crash after the assertion, I'll ask someone to see if that backtrace is different and attach it to the GitHub issue if so (ping @Sarunas).

Hi Franz!

Happens in Xcode 16.3 Beta 3, built without assert:

DEVELOPER_DIR=/Applications/Xcode-16.3.0-Beta.3.app/Contents/Developer swift --version
swift-driver version: 1.120.5 Apple Swift version 6.1 (swiftlang-6.1.0.110.5 clang-1700.0.13.3)
Target: arm64-apple-macosx15.0
DEVELOPER_DIR=/Applications/Xcode-16.3.0-Beta.3.app/Contents/Developer swift build -c release
Building for production...
error: compile command failed due to signal 11 (use -v to see invocation)
Please submit a bug report (https://swift.org/contributing/#reporting-bugs) and include the crash backtrace.
Stack dump:
0.	Program arguments: /Applications/Xcode-16.3.0-Beta.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swift-frontend -frontend
[...]
1.	Apple Swift version 6.1 (swiftlang-6.1.0.110.5 clang-1700.0.13.3)
2.	Compiling with the current language version
3.	While emitting IR SIL function "@$s17DistributedSystemAAC0A00a5ActorB0AacDP10remoteCall2on6target10invocation8throwing9returningqd_1_qd___AC06RemoteE6TargetV17InvocationEncoderQzzqd_0_mqd_1_mtYaKAC0aC0Rd__s5ErrorRd_0_2IDQyd__0cP0Rtzr1_lFTW".
 for 'remoteCall(on:target:invocation:throwing:returning:)' (at /Users/sarunas/GitHub/ordo-one/package-distributed-system/Sources/DistributedSystem/DistributedSystem.swift:1062:12)
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  swift-frontend           0x000000010abdac28 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 56
1  swift-frontend           0x000000010abd8a60 llvm::sys::RunSignalHandlers() + 112
2  swift-frontend           0x000000010abdb264 SignalHandler(int) + 360
3  libsystem_platform.dylib 0x000000019baaade4 _sigtramp + 56
4  swift-frontend           0x0000000104fd2780 swift::irgen::emitWitnessTableRef(swift::irgen::IRGenFunction&, swift::CanType, llvm::Value**, swift::ProtocolConformanceRef) + 276
5  swift-frontend           0x0000000104fd2780 swift::irgen::emitWitnessTableRef(swift::irgen::IRGenFunction&, swift::CanType, llvm::Value**, swift::ProtocolConformanceRef) + 276
6  swift-frontend           0x0000000104fd31cc swift::irgen::emitGenericRequirementFromSubstitutions(swift::irgen::IRGenFunction&, swift::GenericRequirement, swift::MetadataState, swift::SubstitutionMap, bool) + 240
7  swift-frontend           0x0000000104fda57c void llvm::function_ref<void (swift::GenericRequirement)>::callback_fn<(anonymous namespace)::EmitPolymorphicArguments::emit(swift::SubstitutionMap, swift::irgen::WitnessMetadata*, swift::irgen::Explosion&)::$_0>(long, swift::GenericRequirement) + 64
8  swift-frontend           0x0000000104fdaca4 void llvm::function_ref<void (swift::GenericRequirement)>::callback_fn<(anonymous namespace)::PolymorphicConvention::enumerateUnfulfilledRequirements(llvm::function_ref<void (swift::GenericRequirement)> const&)::$_0>(long, swift::GenericRequirement) + 212
9  swift-frontend           0x0000000104fc8bb8 swift::irgen::enumerateGenericSignatureRequirements(swift::CanGenericSignature, llvm::function_ref<void (swift::GenericRequirement)> const&) + 220
10 swift-frontend           0x0000000104fd2988 swift::irgen::emitPolymorphicArguments(swift::irgen::IRGenFunction&, swift::CanTypeWrapper<swift::SILFunctionType>, swift::SubstitutionMap, swift::irgen::WitnessMetadata*, swift::irgen::Explosion&) + 292

Thanks for confirming. That's really helpful!

2 Likes

So the good news is that I'm reproducing the crash readily, so thank you for the ping on this.

The mixed good/bad news is that it also reproduces on current main / 6.2, which means this has been around since 6.1 somehow. It is not just about some aggressive assertion, since removing the assertion just gets us a real crash elsewhere.

I also confirm it only fails in release mode AFAICS, and not in debug mode...

Thanks for the ping here, I'll see what I can find out.

3 Likes

Yeah, the assertion guards against the unexpected appearance of an invalid conformance here. Since we’re in IRGen, all conformances we see should be valid; if a type doesn’t conform to some protocol along the way we should have diagnosed an error first. The crash you see without the assert is the null pointer dereference that occurs immediately after.
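
As a rough Swift-level analogy of the behaviour described here (the compiler itself is written in C++, so this is only an illustration, and every name in it is hypothetical): Swift's `assert` is evaluated in debug builds but not in `-O` builds, so a violated invariant then surfaces at whatever fails next, here a force unwrap. `precondition` stays checked in `-O` builds, so the failure is reported at the check itself.

```swift
// Hypothetical stand-in for a conformance lookup; this is not compiler code.
struct ConformanceTable {
    private var witnesses: [String: Int] = [:]

    mutating func register(type: String, witnessTable: Int) {
        witnesses[type] = witnessTable
    }

    // `assert` is only evaluated in debug (-Onone) builds. In an optimized
    // build a missing entry is not caught by the check and instead traps at
    // the force unwrap that follows.
    func witnessTableDebugChecked(for type: String) -> Int {
        let entry = witnesses[type]
        assert(entry != nil, "invalid conformance for \(type)")
        return entry!
    }

    // `precondition` is also evaluated in optimized (-O) builds, so the
    // failure is reported at the check in both configurations.
    func witnessTableAlwaysChecked(for type: String) -> Int {
        let entry = witnesses[type]
        precondition(entry != nil, "invalid conformance for \(type)")
        return entry!
    }
}

var table = ConformanceTable()
table.register(type: "DistributedSystem", witnessTable: 1)

// Only the valid path is exercised here; looking up an unregistered type
// would stop the program in either variant.
print(table.witnessTableDebugChecked(for: "DistributedSystem"))
print(table.witnessTableAlwaysChecked(for: "DistributedSystem"))
```

Building the same file with and without `-O` and looking up an unregistered type is one way to see the two failure modes side by side.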

1 Like

We’re moving to always-on assertions (ASSERT vs assert) so these differences will be less of an issue in the long run.

5 Likes

As Slava said, we're moving towards having all assertions built into every compiler, including release compilers.

For now, some compilers are still being built without some assertions, but that is changing.

3 Likes

That’s great news!

We’ve had so many assert-only-on-nightly build crashes over the years that should then just go away.

Same thing for lldb swift support? Similar issues there historically.

1 Like

Well, there are a few steps in the causal chain before we get to the desired outcome, but yeah, the basic idea is if the release toolchain has assertions, we can hopefully shake out assertion failures during release toolchain qualification.

I always tell people that an assertion failure is a feature request for the test suite; even in cases where the assertion is just wrong and the precondition might not hold, it suggests that the code path isn’t being tested properly in CI, where asserts are always on.

4 Likes

I minimized the issue and we're working on a fix.

Workaround:
@hassila While we're working on a fix I found a workaround, in case this is something you'd like to give a shot: if the conforming types (all of which have ad hoc protocol requirement constraints, i.e. anything with the Param: SerializationRequirement, like DistributedActorSystem and the Decoder/Encoder/ResultHandler) are structs, we don't hit the problem. It is generally around how the witness is dispatched (and missing the added constraint), and doesn't affect struct types.

2 Likes

Thanks @ktoso, we'll have a look if we can do that.

We've unfortunately got a couple of enum types that are difficult to get rid of (e.g. Optional...).

Update from the engineer working on it:

I removed 'Tranferable' conformance from the Optional type (it is actually not required for DS itself, but for the dependencies), and changed the ConnectionState type from enum to struct, but no luck. The compiler crashes with the same error. Maybe I missed some type, but I checked it twice...

@hassila enums? I'm talking about:

public class DistributedSystem: DistributedActorSystem, @unchecked Sendable {

And others conforming to DistributedTargetInvocationDecoder, DistributedTargetInvocationEncoder and DistributedTargetInvocationResultHandler.

I see the result handler is a final class and the encoder/decoder are structs -- they should actually be fine.

I just realized that you could make the DistributedSystem final, or even just make all methods which are witnesses to requirements with ad hoc constraints final, and things will work as well (I confirmed that workaround just now).

An even simpler workaround should be to make the remoteCall definition final; AFAICS this works around the problem properly (I just tried). This applies to all methods which are a protocol witness where you have ... : SerializationRequirement (those are the ad hoc requirements). If those functions are final, in a final class, or in a struct/enum, things will work AFAICS.

Meanwhile we're working on a proper fix for non final classes as well.
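
A minimal sketch of the three shapes mentioned above, using a hypothetical `CallSystem` protocol in place of the real `DistributedActorSystem` (whose ad hoc `SerializationRequirement` constraints can't be spelled in an ordinary protocol). It is not expected to reproduce the crash; it only shows where `final` would go in each variant. All type and method names here are assumptions made for illustration.

```swift
// Hypothetical protocol standing in for DistributedActorSystem.
protocol CallSystem {
    func remoteCall<Res: Codable>(target: String, returning: Res.Type) -> String
}

// Variant 1: the class stays non-final, only the witness method is final.
class OpenSystem: CallSystem {
    final func remoteCall<Res: Codable>(target: String, returning: Res.Type) -> String {
        "\(target) -> \(Res.self)"
    }
}

// Subclassing is still possible; only the witness can't be overridden.
class DerivedSystem: OpenSystem {
    func extra() -> String { "derived" }
}

// Variant 2: the whole class is final.
final class SealedSystem: CallSystem {
    func remoteCall<Res: Codable>(target: String, returning: Res.Type) -> String {
        "\(target) -> \(Res.self)"
    }
}

// Variant 3: a struct, which has no class-style dynamic dispatch at all.
struct ValueSystem: CallSystem {
    func remoteCall<Res: Codable>(target: String, returning: Res.Type) -> String {
        "\(target) -> \(Res.self)"
    }
}

let systems: [any CallSystem] = [DerivedSystem(), SealedSystem(), ValueSystem()]
for system in systems {
    print(system.remoteCall(target: "greet", returning: Int.self))
}
```

Variant 1 is the least invasive of the three when a class hierarchy has to be kept, since derived classes keep compiling as long as they don't override the witness.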

3 Likes

Hi, Ktoso!
Thank you for your reply.
Could you provide more details on how you tried the proposed simple fix? Unfortunately, we can't make the DistributedSystem class final without significant code refactoring, as there is a derived class used for better code structuring.
Making the remoteCall method final doesn't work either due to the following compilation error:

error: emit-module command failed with exit code 1 (use -v to see invocation)
<>/package-distributed-system/Sources/DistributedSystem/DistributedSystem.swift:1062:23: error: method 'remoteCall(on:target:invocation:throwing:returning:)' must be as accessible as its enclosing type because it matches a requirement in protocol 'DistributedActorSystem'
1060 |     }
1061 | 
1062 |     public final func remoteCall<Actor, Err, Res>(
     |                       `- error: method 'remoteCall(on:target:invocation:throwing:returning:)' must be as accessible as its enclosing type because it matches a requirement in protocol 'DistributedActorSystem'
1063 |         on actor: Actor,
1064 |         target: RemoteCallTarget,

I guess it's the repo you've mentioned in another thread; have you been able to figure it out?

A bit weird: I've just tried in a playground and adding final works, but trying with your repo triggers the "must be as accessible as its enclosing type" error. Maybe some package limitation, or actually a bug?