Compiling 4.2 on armv7 (and i686)

Awesome, sounds good.

Now for something different: CMakeLists.txt

I hacked around this originally, thinking there was just something weird going on locally, but apparently there's more to it.

I had to add support for detecting i686 to CMakeLists.txt and target.py. That's all good and happy. However, this piece where SWIFT_HOST_VARIANT_ARCH is set from SWIFT_HOST_VARIANT_ARCH_default is failing.

For whatever reason, even though SWIFT_HOST_VARIANT_ARCH_default is set to i686, the line that copies it and enables caching of the result will always wind up blank, and cause cmake to bail further down because of it. It works fine if I disable caching like I did locally below. I will add that I also already added message() calls around this block of code to see that things are what I expected, and _default is indeed "i686", but still it winds up blank. I'm scratching my head as to what could be going on here, but I don't really want to send the PR for the Linux 32-bit host changes until I know what's going on here (and it is the last patch I need to resolve before this first PR is ready).

@@ -597,8 +599,9 @@ endif()
 
 set(SWIFT_HOST_VARIANT_SDK "${SWIFT_HOST_VARIANT_SDK_default}" CACHE STRING
     "Deployment sdk for Swift host tools (the compiler).")
-set(SWIFT_HOST_VARIANT_ARCH "${SWIFT_HOST_VARIANT_ARCH_default}" CACHE STRING
-    "Deployment arch for Swift host tools (the compiler).")
+# FIXME: Do not commit. Not sure why i686 won't cache properly here
+#        and keeps winding up blank instead.
+set(SWIFT_HOST_VARIANT_ARCH "${SWIFT_HOST_VARIANT_ARCH_default}")
 
 #
 # Enable additional warnings.

EDIT: Never mind. I noticed that build-script-impl also sets SWIFT_HOST_VARIANT_ARCH, and even if it sets it to blank, CMake takes that value over the non-blank default.

Good news is that with the work put in so far, I figured I should be able to get to the Swift PM on Arm32, so I ran that build overnight.

The bad news is that it got stuck in Swift PM and many tests fail with the same issue:

stderr>>> AnyHashableCasts.swift.gyb.tmp.out: /mnt/usb/pi/buildSwift.exp/swift/stdlib/public/runtime/Metadata.cpp:3382: void initializeResilientWitnessTable(swift::GenericWitnessTable *, void **): Assertion `impl != nullptr && "no implementation for witness"' failed.

It looks like there's another 32-bit issue lurking with the Resilient Witness Table added in 4.2, but the code mostly looks fine.

EDIT: Hmm, this appears to be 32-bit ARM only, which will make it a bit harder to find. 64-bit ARM is fine, i686/x86 is fine.

Okay, this is starting to get into areas of linking/etc that I'm really rough/rusty on.

To try to figure out the resilient witness issue, I took a look at the SIL output, which is identical between i686 and armv7. I doubted this was the problem, but it doesn't hurt to confirm. I also dumped the IR, but I'm still trying to make heads or tails of what is in there.

I also instrumented initializeResilientWitnessTable() in the stdlib to see what is going on there. That is where things get a little more interesting, but I'm also having some trouble deciphering what it means:

i686 (Good):

[ RUN      ] AnyHashableCasts.HashableEnum.value(5) as HashableEnum as? AnyHashable
stdout>>> ==== Protocol: 0xb76aa180 ====
stdout>>> 0, -756012: (nil), (nil)
stdout>>> 1, -756000: 0xb752b950, (nil)
stdout>>> 2, -755988: 0xb752b980, (nil)
stdout>>> ==== Witnesses: 0x466d40 ====
stdout>>> 0, -28076: 0xb752b950, 0x45a0e0
stdout>>> 1, -28068: 0xb752b980, 0x45a110
stdout>>> ==== Protocol: 0xb76aa11c ====
stdout>>> 0, -758140: 0xb7526910, (nil)
stdout>>> ==== Witnesses: 0x466d6c ====
stdout>>> 0, -28100: 0xb7526910, 0x45a150
[       OK ] AnyHashableCasts.HashableEnum.value(5) as HashableEnum as? AnyHashable

armv7 (Bad):

[ RUN      ] AnyHashableCasts.HashableEnum.value(5) as HashableEnum as? AnyHashable
stdout>>> ==== Protocol: 0x76de2790 ====
stdout>>> 0, -57364: (nil), (nil)
stdout>>> 1, -57352: 0x76cfb5bc, (nil)
stdout>>> 2, -57340: 0x76cfb5c4, (nil)
stdout>>> ==== Witnesses: 0x4e6d70 ====
stdout>>> 0, -26156: 0x4da5a0, 0x4dbab8
stdout>>> 1, -26148: 0x4da5ac, 0x4dbacc
stderr>>> AnyHashableCasts.swift.gyb.tmp.out: /mnt/usb/pi/buildSwift.exp/swift/stdlib/public/runtime/Metadata.cpp:3401: void initializeResilientWitnessTable(swift::GenericWitnessTable *, void **): Assertion `impl != nullptr && "no implementation for witness"' failed.
stderr>>> CRASHED: SIGABRT
the test crashed unexpectedly
[     FAIL ] AnyHashableCasts.HashableEnum.value(5) as HashableEnum as? AnyHashable

The rows in the dump output are: Index, "offset" of data, "Requirement Function", "Witness Implementation" (or default for protocols).

The interesting thing that stands out to me is that the protocol requirements are coming from a different module (the stdlib, I presume), and with i686, the requirement function in the resilient witness table matches it. This seems to suggest to me that the resilient witness table is supposed to participate in dynamic linking. And in the case of ELF ARM32, the requirement is pointing at part of the local library. Is that true? If so, then the question winds up becoming: why is the ELF ARM32 version producing this output, when ARM64 works, and obviously Mach-O ARM32/64 as well?

The test AnyHashableCasts.5 as AnyHashable as? Int passes here, which I suspect is because Int is part of the same module already.

[ RUN      ] AnyHashableCasts.5 as AnyHashable as? Implemented
[       OK ] AnyHashableCasts.5 as AnyHashable as? Implemented
[ RUN      ] AnyHashableCasts.5 as AnyHashable as? Unimplemented
[       OK ] AnyHashableCasts.5 as AnyHashable as? Unimplemented

That is also interesting, but there aren't any new requirements showing up that need to be linked dynamically, which makes it unsurprising that it passes.

Anyone know enough about how the linker is supposed to behave here in general?

The "Requirement function" of the resilient witness (i.e., the Function non-static data member of TargetResilientWitness) is intended to point to a function that is defined in the same library as the protocol itself. If it's not pointing into that library, that's a problem.

That Function field is described as a RelativeIndirectPointer, which means it's a 32-bit offset. When you apply that 32-bit offset to the address of the Function field itself, you get the address of a function pointer, which should reference a function called a "dispatch thunk". As noted before, that thunk is emitted alongside the protocol.

It is possible that the linker has a bug involving a relative address to a pointer to a function. We ran into a similar bug in the Darwin linker where relatively-referenced symbols were getting coalesced but the relative offsets weren't getting adjusted accordingly. We still have

Aside: on master, we no longer use a RelativeIndirectPointer to the dispatch thunk. Instead, we use a RelativeIndirectablePointer that references an alias pointing to the requirement's description within the protocol metadata. I'm curious whether that scheme---which is what we'll be using going forward---also triggers the bug.

Doug

That's what I was figuring. I just wasn't confident about that assumption, so this helps.

Speaking of that, the LLVM IR being emitted is identical for the tables, so this is the line of thought I'm thinking of as well. I kinda garbled my post there, whoops. Speaking of, what was supposed to be at the end of the last sentence? Seemed relevant. :slight_smile:

I did notice that there's been more work here, and I attempted a poor backport of some of it with no luck. Although knowing what I know this morning between investigation and your confirmation, it wouldn't have worked anyway because I backported the wrong parts. I'll look into doing a more intelligent backport based on what you say here and what I've pieced together. I'm also looking at a couple of small issues where, if GNU ld is the default, it sometimes winds up getting used in a few places (in various repos) instead of gold. At least on Arm64, it was actually spewing linker warnings when it happened because of relocations Swift is emitting in 4.2. I want to eliminate that variable as well.

Hah. We still have hacks in IRGen to work around linker bugs involving relative references to coalesced string constants, e.g., in GenDecl.cpp:

/// Get or create a global string constant.
///
/// \returns an i8* with a null terminator; note that embedded nulls
///   are okay
///
/// FIXME: willBeRelativelyAddressed is only needed to work around an ld64 bug
/// resolving relative references to coalesceable symbols.
/// It should be removed when fixed. rdar://problem/22674524
llvm::Constant *IRGenModule::getAddrOfGlobalString(StringRef data,
                                               bool willBeRelativelyAddressed) {

I suggest trying to backport IRGen/Runtime: Use method descriptors instead of dispatch thunks as k… · apple/swift@fcbe997 · GitHub. That's the commit that switched from using functions to using descriptors, and it gets rid of the suspicious-looking getFunctionGOTEquivalent.

Doug

Thanks for that, you just saved me some time hunting in the GenProto.cpp source history for that commit (which was what I was going to do in about an hour or so once my current build finished). :slight_smile:

That change still does look promising, but I haven't gotten a chance to try it yet. I made the mistake of attempting to just keep pulling in PRs referenced by that commit to satisfy functions/etc it relies on, but it got out of hand quick as changing the mangling was breaking libdispatch when applied to the 4.2 branch. I'm currently still waiting on a fresh build with a more hand-crafted backport, and won't really know until the morning how much more code I have to bring over by hand to get it to build cleanly.

@Douglas_Gregor To backport the suggested change, I'm also having to backport PR 19067. I've got it pretty much working, except the TBDGen portion doesn't map cleanly on the swift-4.2-RELEASE tag, which causes the unit tests checking for the method descriptor to be in the TBD output to fail.

The commit in question that doesn't map properly is this one: Method descriptors by slavapestov · Pull Request #19067 · apple/swift · GitHub

Pinging @Slava_Pestov as well in case there's some guidance on how to map the above patch onto the 4.2 release tag.

I'm also curious if anyone knows if Linux Swift uses the tbd files, or if this is strictly meant to support Darwin's linker?

In Swift 4.2, IRGen does not emit method descriptors, so you should just drop this commit -- these symbols don't need to appear in the TBD file.

However, the suggestion I received was to backport your recent change which ditched the RelativeIndirectPointer in favor of a new approach which depends on the method descriptor. The hope was this change uses a mechanism that the Arm32 Linux linker can handle better. Because right now, resilient witness tables don't link properly at runtime, making it pretty difficult to provide a Linux 4.2 build for certain platforms (Raspberry Pi).

Is there maybe another approach here I should be looking at that doesn't wind up sucking in the method descriptor changes?

If my analysis is correct, we need to emit the method descriptors for 4.2 to address the problem @Kaiede is seeing with ELF ARM32.

Doug

@Douglas_Gregor Your analysis here is correct. I was able to get armv7 working well enough to build one of my projects and pass its unit tests by backporting the method descriptor work.

What's interesting though is the backport broke i686 in some ways (although master was fine at the 2018-09-22 snapshot which I think includes these changes?), but i686 is also a ways off from being fully stood up. It was the i686 result that made me think it wasn't working, so I was surprised that armv7 did work and didn't have the same test failures.

The downside here is that by backporting a Swift 5 ABI detail, it makes cross compilers more complicated than in the past. I can't target both armv7 and AArch64 with the same cross compiler build, for example, unless I either do some touchy work, or simply say AArch64 also gets the ABI change too. :confused:

I just wanted to add a thanks to the folks offering up their help up to this point.

i686 is on the way to being bootstrapped, but needs some llvm work and more fixes for very recent regressions, as well as finishing up opening PRs for some of the other patches.

Armv7 appears to be working well enough to be usable, with the caveat introduced by backporting the Swift 5 change. Enough that I am going to start seeing how well the build works in a test environment long term. (i.e. how well it works running aquarium lights on a 24/7 schedule, and how well NIO/Kitura build with it)

I'll need to work around [SR-8847] DispatchTimeInterval String representation regression Swift 4.2/Ubuntu 18.04 · Issue #650 · apple/swift-corelibs-libdispatch · GitHub, but that's okay, at least all Linux devs using Dispatch are hitting that one. :slight_smile:

Perhaps @Kaiede should just be using master?

Considering the goal of this is to bring folks using 3.1.1 and 4.1.2/4.1.3 up to the current release, it wouldn't just be me using master. It'd be most anyone trying to target the Raspberry Pi or other Armv7 SBCs.

I'm also a little hesitant to just say that 4.2 is off-limits to these devices because some of the ABI stabilization work regressed a community-supported platform. That's a whole yearly cycle we'd be cut off from, along with the improvements it brings.

Anyhow, now that I've at least got a working 4.2 build, I've turned my attention to the master branch for both arm32 and x86 Linux in an attempt to see what changes are needed to unblock CI nodes for the two platforms.

I originally started with the 2018-09-22 snapshot, which worked well up through the package manager on x86. There's some unimplemented relocations in LLVM that currently block x86 (discussed earlier in the thread). It's just so much faster than a Raspberry Pi to build, it helps flush out the easy stuff, even without being fully stood up.

However, I am now looking at 2018-10-03, and I've hit a new break in a new function swift_getAssociatedTypeWitness. On x86 Linux, this just up and segfaults on me when trying to launch swift-build-stage1, without arguments, and without writing out anything from the fatal errors. I haven't had time to do a full investigation yet. @Douglas_Gregor, just pinging you in case this is interesting. I still need to flesh out more details in a debugger, so I'm not expecting it'll ring a bell.

The title is now a bit of a misnomer, but I think it makes sense to keep using the thread discussing Armv7 and x86 standup.

I hit this at one point in the last week or two with x86_64 Linux, and fixed it in one of my pull requests... but I don't know which specific pull request or commit addressed the problem. The last week or so was fairly rocky with the landing of swift_getAssociatedTypeWitness().

Doug

Alright, I'll move to a newer snapshot before digging too much into this particular problem.

Getting back to Arm32 for a moment, I'm hitting an interesting issue with this chunk of code in parseExtendedAvailabilitySpecList:

  auto Attr = new (Context)
  AvailableAttr(AtLoc, SourceRange(AttrLoc, Tok.getLoc()),
                PlatformKind.getValue(),
                Message, Renamed,
                Introduced.Version, Introduced.Range,
                Deprecated.Version, Deprecated.Range,
                Obsoleted.Version, Obsoleted.Range,
                PlatformAgnostic,
                /*Implicit=*/false);
  return makeParserResult(Attr);

I'm getting an error consistently:

swift: /mnt/usb/pi/buildSwift.master/llvm/include/llvm/ADT/PointerIntPair.h:163: static intptr_t llvm::PointerIntPairInfo<swift::AvailableAttr *, 2, llvm::PointerLikeTypeTraits<swift::AvailableAttr *> >::updatePointer(intptr_t, PointerT) [PointerT = swift::AvailableAttr *, IntBits = 2, PtrTraits = llvm::PointerLikeTypeTraits<swift::AvailableAttr *>]: Assertion `(PtrWord & ~PointerBitMask) == 0 && "Pointer is not sufficiently aligned"' failed.

2.      With parser at source location: /mnt/usb/pi/buildSwift.master/build/buildbot_linux/swift-linux-armv7/stdlib/public/core/4/FloatingPointTypes.swift:4889

The line being parsed is:
@available(*, unavailable, message: "Float80 is only available on non-Windows x86 targets.")

I've done a couple things here. First, I made DeclAttribute 8-byte aligned for grins. Second, I added a printf (since I can't get much out of the debugger without disassembling and dumping raw memory) to tell me what Attr became, and I'm seeing this: Attr = 0x8cb5574

So, I'm getting a bit confused. i686/x86 handles this fine without changes, only Arm32 has problems with it. And every other exit point in this function returns nullptr, which should show up as 0 when implicitly converting it to a ParserResult<AvailableAttr>. No diagnostic is in the logs, which also helps rule out the other exit points in the function. But I'm left with a couple thoughts/questions:

  1. The new (Context) AvailableAttr() syntax is a little odd, since it looks like it uses the operator on AttributeBase, which forces alignment to be the same as AttributeBase, which seems like a problem if classes that inherit AttributeBase wind up being more strict than AttributeBase itself.
  2. Why is 4-byte alignment still not enough to satisfy PointerIntPair when asking for 2 bits for the Int? It seems like it should, no problem. Only thing I can think of is that there's a sneaky 3rd bit being asked for somewhere.

@ahoppen, maybe you have a bit of insight, seeing as you wrote this code, originally?