IRGen tests and dealing with ABI differences

So, looking through the IRGen failures that I am seeing on the Windows targets, it seems that the IRGen tests are going to need some serious finessing to make it correct for all the targets. On Windows, we use the singleton strategy to deal with the class initialisation due to the constant cross-module data initialisation not being possible. However, this manifests in some tests, that I think that we shouldn't just disable. However, the code generation patterns are sufficiently different (I'm taking a particular example here, but that isn't specifically the issue):

IRGen.typemetadata assumes that the type metadata accessor for an ObjC derived type will call swift_getInitializedObjCClass which is not the case with the singleton pattern, and instead we will call swift_getSingletonMetadata and extract the value from the returned type. This is then followed with an additional phi node to check if the cache was empty. Checking this requires a series of instructions to be checked.

I'm not sure what is the best way to handle this though (and most of the remaining tests are similarly complicated). Do we really want to start forking off versions of the tests?

CC: @John_McCall @Joe_Groff @Slava_Pestov

It seems to me like the best approach depends on the purpose of each test. If a test is exercising a specific metadata instantiation strategy, then it would make sense to either change the test so that it uses a consistent strategy across platforms, or else conditionalize the test so that it only runs on the platforms that use that strategy. If the test isn't specifically exercising any particular metadata instantiation strategy, then it may be best to make the test less specific so that it isn't checking for irrelevant codegen details. Our lit tests have a bad habit of being overspecific.

Right, there is no specific strategy specified, it just assumes that all platforms are the same. It just so happens that Linux and Darwin are similar enough that the tests got away with it. My question was more on the order of how do I make the tests less specific especially since I don't have my fancy tool (a.k.a a preprocessor) to deal with it. Do we really want to have a different variant of all the IRGen tests that test the same thing on different platforms? Do we want to duplicate the test RUN command for every target somehow? I'm trying to find the best solution out of a pile of pretty bad ones.

In the case you mentioned in your OP:

it sounds like this code is specifically interested in exercising subclasses of ObjC objects, which are only a concern for Darwin. Can this test be extracted out into its own file that REQUIRES: objc_interop?

Well, that particular case also seems to be sad to limit to just Darwin. It actually currently does run and pass on Linux targets as well. In theory, it should also run on Windows (provided that someone provides a recent objc.dll to run against. It would be a shame to just disable it for convenience. As it is, there is insufficient testing on Windows, so I'd prefer to keep as many tests enabled as possible.

In practice, the ObjC interop is so tightly coupled to Apple's implementation that there's a lot more work you'd have to do to make it work on a non-Apple platform. I don't think you'd be getting any useful coverage on Windows out of ObjC interop tests unless you were also actively porting Swift's runtime to interop with a non-Apple ObjC runtime.

1 Like

Sounds like an interesting challenge as a follow up set of changes after the non-interop case works :wink:

Okay, this one might be more illustrative of the issue:

IRGen.c_layout

It currently checks to ensure that chareth is sext-ing the formal arguments as well as the return value, which does not hold on Windows (since it does it implicitly as per the ABI). However, doing this via regex also doesn't give us enough control as sometimes we want to perform capture substitutions as well. So, what is the right thing to do here? Have a separate variant of the entire test for windows? Because on non-Windows, that is relevant to the test IMO.

Would this: GitHub - gnustep/libobjc2: Objective-C runtime library intended for use with Clang. ObjC runtime be sufficient?

No, that wouldn't be sufficient. The current assumptions in the code generation require that the Darwin ABI be followed, and the GNUStep runtime has its own ABI.

I have no objection in general to disabling tests on Windows and then cloning them for different test output, although it's probably better to try to improve the lit substitution logic to the point of having a Windows vs. non-Windows %-substitution.

I've been trying to use substitutions as approach, or regexes where the pattern is comparable, but when you start going down comparing entire basic block flow that tends to just break down. But, perhaps having a lit substitution for the initialization strategy is what we need, because then we can do CHECK-RESILIENT-STRATEGY and CHECK-FRAGILE-STRATEGY. Would that be reasonable to add to the swift lit configuration?

It's fine to just provide a complete different set of CHECK lines for a function if that's what you have to do.

I don't think either of these is more resilient or fragile than the other, and even if they were, people reading the test aren't going to remember this specific detail. Windows/COFF has stood out before as needing different treatment from Mach-O and ELF, which generally can be treated the same; I would recommend naming the substitution values along those lines.

c_layout is trying to test C ABI interop, and it looks like as written, it's only taking into account different Darwin flavors (and maybe taking advantage of the fact that x86_64 macOS and Linux both mostly follow the same standard AMD C ABI). It seems to me like we could add %target-os to the CHECK: line to add the OS as an explicit dimension to the set of tests, but that might be unwieldy. Like John said, at this point it might make sense to split it into separate files for Windows/Linux/Darwin ABI tests.

Hmm, for the c_layout test, I really like your suggestion of having a (CHECK-)?SYSV through a %target-abi substitution. For the others, its the default strategy which is employed which causes the difference, so perhaps having two substitutions isn't too terrible comparatively.

Thanks, I think that the additional substitutions at the IRGen level might be what I was looking for as a much more palatable solution.