Compiling 4.2 on armv7 (and i686)

Knowing Darwin works here is useful info. Might give me another angle to see what might be different.

Maybe someone can tell me if this is a red herring or not. The compiler is choking on the SIL output for this function, where Base is _UnmanagedString<UInt16>:

extension ReversedCollection.Iterator: IteratorProtocol, Sequence {
  public typealias Element = Base.Element
  
  @inlinable
  @inline(__always)
  public mutating func next() -> Element? {
    guard _fastPath(_position != _base.startIndex) else { return nil }
    _base.formIndex(before: &_position)
    return _base[_position] // THIS IS WHERE THINGS GO BAD
  }
}

I've looked at the SIL output for the function, and it mostly makes sense, after reading up on SIL. However, here's where it goes south:

bb12:                                             // Preds: bb11                                                                
  %97 = struct_element_addr %20 : $*UInt16, #UInt16._value // user: %98                                                         
  %98 = load %97 : $*Builtin.Int16                // user: %99                                                                  
  %99 = struct $UInt16 (%98 : $Builtin.Int16)     // user: %100                                      
  %100 = enum $Optional<UInt16>, #Optional.some!enumelt.1, %99 : $UInt16 // user: %101                                          
  br bb16(%100 : $Optional<UInt16>)               // id: %101       

The specific piece that dies is the load into %98. The logic is weird, but after muddling through the Integers.swift.gyb file, I realize _value for all integers uses the signed built-in type, even if the Swift type isn't signed. So this looks somewhat wrong at first glance, but is actually correct. But what about how %20 is calculated?

  %18 = integer_literal $Builtin.Word, 4294967295 // user: %20                                                                  
  %19 = pointer_to_address %4 : $Builtin.RawPointer to [strict] $*UInt16 // user: %20                                           
  %20 = index_addr %19 : $*UInt16, %18 : $Builtin.Word // users: %97, %21      

This looks a bit more weird. I'm having a hard time drumming it back up, but I could have sworn I read that these literals are expected to be signed rather than unsigned. Yet here we have 0xFFFF_FFFF as a literal. So I suspect this may be the root problem: the compiler is attempting to offset an index by about 4 billion elements in the positive direction (a very bad day), versus 1 element in the negative direction (the expected behavior).
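
To make the two readings concrete, here's a small standalone Swift sketch (my own illustration, not anything the compiler itself does) of that same 32-bit pattern read as a signed offset, zero-extended, and sign-extended:

let pattern: UInt32 = 0xFFFF_FFFF

let asSignedOffset = Int32(bitPattern: pattern)  // -1: the intended "one element backwards"
let zeroExtended   = UInt64(pattern)             // 4294967295: what the folded literal becomes
let signExtended   = Int64(asSignedOffset)       // -1: what sign extension would preserve

print(asSignedOffset, zeroExtended, signExtended)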

Am I on the right track here? Or is my memory faulty and literals shouldn't be signed in this case?

The Builtin types correspond to LLVM's types, which are neither signed nor unsigned; it's the Swift struct type and the operations on it that impart those semantics. That pointer arithmetic indeed looks suspicious. Even though the operation isn't fundamentally signed, I'm not sure why we would walk one byte backwards there; @Michael_Ilseman might recognize that logic, though.

Considering this is a Swift.ReversedCollection<_UnmanagedString<UInt16>>.Iterator being worked with, the answer is that it works by decrementing the Index. In this case, because _UnmanagedString<UInt16>.Index is backed by UnsafePointer<UInt16>, it's doing the index decrement using index_addr. That all makes sense.

I was more concerned about whether the unsigned literal might cause issues if it were ever loaded as-is into, say, a 64-bit int on 32-bit systems rather than being sign-extended, and then used to try to work out what type is expected to be at the new address.

EDIT:

Okay, this is pretty interesting:

(gdb) p static_cast<IntegerLiteralInst *>(IAI->getIndex().Value)->getValue()
$58 = {static WORD_MAX = 18446744073709551615, U = {VAL = 4294967295, pVal = 0xffffffff}, BitWidth = 64}

When asked about the APInt for the literal, Swift reports it as 64 bits wide. So that makes this even more suspicious. It may not even be a bug in the StringGuts code, then; it just happened to expose this underlying issue. That would explain why nothing stood out in the diff.

It also looks like this is intentional, as the IntegerLiteralInst type stores the literal as llvm::APInt::WordType (which is always uint64_t) in a trailing object. It then creates the APInt from it, which explains the 64-bit width. This really looks like the value somehow got created from a 32-bit signed value somewhere and didn't get sign-extended before the instruction was constructed.

I just want to say, this is getting all sorts of weird. I've discovered a pattern in the SIL that's generating weird behaviors during constant folding beyond just this one case:

  public func deallocate() {
    Builtin.deallocRaw(_rawValue, (-1)._builtinWordValue, (-1)._builtinWordValue)
  }

This looks pretty benign, but the end result in the SIL gets weird:

// UnsafeMutablePointer.deallocate()
sil [serialized] @$SSp10deallocateyyF : $@convention(method) <Pointee> (UnsafeMutablePointer<Pointee>) -> () {
// %0                                             // users: %2, %1
bb0(%0 : $UnsafeMutablePointer<Pointee>):
  debug_value %0 : $UnsafeMutablePointer<Pointee>, let, name "self", argno 1 // id: %1
  %2 = struct_extract %0 : $UnsafeMutablePointer<Pointee>, #UnsafeMutablePointer._rawValue // user: %12
  %3 = integer_literal $Builtin.Int2048, -1       // user: %4
  %4 = builtin "s_to_s_checked_trunc_Int2048_Int32"(%3 : $Builtin.Int2048) : $(Builtin.Int32, Builtin.Int1) // users: %6, %5
  %5 = tuple_extract %4 : $(Builtin.Int32, Builtin.Int1), 0 // user: %7
  %6 = tuple_extract %4 : $(Builtin.Int32, Builtin.Int1), 1
  %7 = struct $Int (%5 : $Builtin.Int32)          // user: %8
  %8 = struct_extract %7 : $Int, #Int._value      // user: %9
  %9 = builtin "zextOrBitCast_Int32_Word"(%8 : $Builtin.Int32) : $Builtin.Word // user: %12
  %10 = integer_literal $Builtin.Int32, -1        // user: %11
  %11 = builtin "zextOrBitCast_Int32_Word"(%10 : $Builtin.Int32) : $Builtin.Word // user: %12
  %12 = builtin "deallocRaw"(%2 : $Builtin.RawPointer, %9 : $Builtin.Word, %11 : $Builtin.Word) : $()
  %13 = tuple ()                                  // user: %14
  return %13 : $()                                // id: %14
} // end sil function '$SSp10deallocateyyF'

This becomes:

// UnsafeMutablePointer.deallocate()
sil [serialized] @$SSp10deallocateyyF : $@convention(method) <Pointee> (UnsafeMutablePointer<Pointee>) -> () {
// %0                                             // users: %2, %1
bb0(%0 : $UnsafeMutablePointer<Pointee>):
  debug_value %0 : $UnsafeMutablePointer<Pointee>, let, name "self", argno 1 // id: %1
  %2 = struct_extract %0 : $UnsafeMutablePointer<Pointee>, #UnsafeMutablePointer._rawValue // user: %6
  %3 = integer_literal $Builtin.Int32, -1         // user: %4
  %4 = builtin "zextOrBitCast_Int32_Word"(%3 : $Builtin.Int32) : $Builtin.Word // user: %6
  %5 = integer_literal $Builtin.Word, 4294967295  // user: %6
  %6 = builtin "deallocRaw"(%2 : $Builtin.RawPointer, %4 : $Builtin.Word, %5 : $Builtin.Word) : $()
  %7 = tuple ()                                   // user: %8
  return %7 : $()                                 // id: %8
} // end sil function '$SSp10deallocateyyF'

That's... interesting. We start with an Int2048 literal (expected), which is then truncated using sign information to Int32 (why Int32? What decides this?), and that then winds up being thrown into the zextOrBitCast_Int32_Word built-in, which extends Int32 to Word, because Word is still considered 64 bits (it seems to be treated as a pointer type, and the maximum width of pointer types is 64 bits, no matter what your host/target is). That creates an interesting situation where we truncate down using sign-preserving means, but then extend back up using a means that doesn't preserve the sign. I'm not sure why it does this, or why the StringGuts change was what it took to start firing asserts. But because of constant folding, these become Builtin.Word literals with 0xFFFF_FFFF as the value (positive, instead of negative). When the optimizer tries to make sense of the loads, it chokes.
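
As a rough Swift-level re-enactment of that folding sequence (illustration only, not the optimizer's code): start from -1, truncate it to 32 bits with sign preserved, then widen it back to 64 bits with a zero extension rather than a sign extension.

let literal = -1                                       // stands in for the Int2048 literal
let truncated = Int32(truncatingIfNeeded: literal)     // sign-preserving trunc: still -1
let zextToWord = UInt64(UInt32(bitPattern: truncated)) // zextOrBitCast-style widen: 4294967295
let sextToWord = Int64(truncated)                      // sext would have kept -1

print(truncated, zextToWord, sextToWord)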

Now I'm wondering what the heck could be different on Darwin, unless this is specifically a host behavior rather than a target behavior. Are i386/arm32 on Darwin only ever built as targets from a 64-bit host? If that's the case, it would explain it, if for some reason the compiler emits Builtin.Int64 in these cases on 64-bit hosts.


The Int2048 is a byproduct of the _ExpressibleByBuiltinIntegerLiteral protocol, which tries to provide some abstraction between literals in the source language and the implementation so that literals can be user-extensible. The resulting SIL looks reasonable (though we're missing some constant folding that ought to be happening); Word would be 32-bit on a 32-bit target, so the manipulations here should end up with -1 getting passed to deallocRaw in the end.
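
As a user-level illustration of that extensibility (my own example; it doesn't touch the stdlib internals), any type can adopt the public ExpressibleByIntegerLiteral protocol and receive the literal through a standard integer type of its choosing:

struct Offset: ExpressibleByIntegerLiteral {
  var value: Int32
  // The compiler routes integer literals through this initializer.
  init(integerLiteral value: Int32) { self.value = value }
}

let o: Offset = -1   // o.value == -1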

Keep in mind the snippets aren't the full folding process; that would be very verbose. :slight_smile:

The literal stuff makes sense, except when a 32-bit host treats Builtin.Word as a 64-bit value and then tries to optimize address loads in a 32-bit address space. Then the sign of the original literal is suddenly important again, yes?

Also, what is expected if I took this SIL and used it with a 64-bit target? Word in that case is 64-bit, and again this value seems problematic.

Unfortunately, the things I'm getting explanations on are the things we already agree make sense. But hear me out here:

What I'm reading is that when SIL encounters a Builtin.Word, it treats it like a 64-bit value. Always. Word isn't fixed-width and follows the pointer size, so when the compiler generates an IntegerLiteralInst for that type, it asks for the maximum width (which is hard-coded at 64 bits for pointer-width semantics). I always get a 64-bit APInt, which means -1 becomes 4 billion under the zero-extension behavior that happens when folding the Int2048 literal into an Int32 literal and then folding the up-conversion to Builtin.Word.

It may all be a wash when it gets truncated while outputting LLVM IR, but the SIL optimizer, doing 64-bit math with this value, is hitting the error. That's why the Projection kind has no value: the memory location isn't remotely correct and is well out of bounds of anything the optimizer knows about when processing the function.

I suspect the StringGuts change exposes this bug precisely because it is a reversed collection that is using an UnsafePointer as the Index, causing SILGen to produce the address-indexing and load instructions which need to be checked by the RedundantLoadElim pass.

Again, because it's a 32-bit Word once you generate the IR for LLVM, unless the optimizer actually wants to check the address-indexing behavior, you wouldn't even know this is a problem. Unless I tried to build 64-bit targets on a 32-bit host, or the optimizer actually wanted to know what is at offset 4 billion * sizeof(UInt16).
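
For a sense of scale, some back-of-the-envelope arithmetic (mine, not the compiler's) on that offset:

// ~8.6 GB past the base pointer: double the entire 32-bit address space.
let bogusOffsetBytes = UInt64(0xFFFF_FFFF) * UInt64(MemoryLayout<UInt16>.stride)
print(bogusOffsetBytes)   // 8589934590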

I apologize if I come off as a bit harsh here, or rude. I'm just a little frustrated because it feels like I'm so close to the answer here, but I'm also literally brand new to this code base. I don't have enough experience to confirm my thinking is correct, and would appreciate any help walking through this logic, or guidance on how to build a test to confirm/disprove my hypothesis.

If I were more familiar with the code, I'd probably have tested my hypothesis with a fix by now, hence my frustration. :slight_smile:

No, it looks like you're right. I wonder how this works on Apple's 32-bit platforms.

This definitely seems suspect to me.

SIL doesn't know anything about the target configuration, so it treats Word as 64-bit conservatively. It is suspect that it's doing a zextOrBitCast_Int32_Word instead of a sextOrBitCast_Int32_Word. We could try fixing that, though I suspect in practice it still won't matter once we lower to LLVM IR.

@jrose, that's the code block I'm referring to, yes. And as a sheer hack, I set it to 32 on my local build just now. I was able to actually build the stdlib on 32-bit x86 Debian against the 2018-9-22 snapshot in master, so I think that lends a lot of evidence that this is the issue, and one of only two.

(The other is that the static assert you recently added for SILLocation fails on 32-bit; I disabled it just to see how far I could get on master overnight, out of a lack of anything else to try.)

@Joe_Groff the issue is that LLVM isn't the only one attempting to make sense of these literals; SIL is too. So whatever change is made needs to account for that. But I agree you can probably get away with leaving the emitted SIL alone so long as Builtin.Word has the correct size for the target platform.

Sure, we shouldn't rely on the stars aligning for things to work by accident. We should still fix the standard library to use sensible sign-extending operations here. Maybe we have problems like this elsewhere, but at least in the case you've highlighted, the only thing that appears to be using the value is the deallocRaw call, which should only care about the low 32 bits in practice on a 32-bit platform.
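
To spell out why it's a wash for deallocRaw on a 32-bit target (a quick illustration, not stdlib code): the low 32 bits are identical whether -1 was zero-extended or sign-extended to 64 bits first.

let zext: UInt64 = 0x0000_0000_FFFF_FFFF   // -1 zero-extended to 64 bits
let sext: UInt64 = 0xFFFF_FFFF_FFFF_FFFF   // -1 sign-extended to 64 bits

print(UInt32(truncatingIfNeeded: zext) == UInt32(truncatingIfNeeded: sext))   // true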

I'll just be clear: the case I highlighted around dealloc isn't the break, but it was a good example of the constant folding that led to the break I am hitting in next().

The SIL from that post further up is what is breaking after all the constant folding is done.


What would that fix be and where would it apply? This example is the default ReversedCollection operation that's calling formIndex(before:), which seems sensible:

  public mutating func next() -> Element? {
    guard _fastPath(_position != _base.startIndex) else { return nil }
    _base.formIndex(before: &_position)
    return _base[_position]
  }

_UnmanagedString's Index type is UnsafePointer<CodeUnit> (either UInt8 or UInt16), which I don't think breaks the indexing model:

extension _UnmanagedString : RandomAccessCollection {
  internal typealias Element = UTF16.CodeUnit
  // Note that the Index type can't be an integer offset because Collection
  // requires that SubSequence share indices with the original collection.
  // Therefore, we use pointers as the index type; however, we also provide
  // integer subscripts as a convenience, in a separate extension below.
  @usableFromInline
  internal typealias Index = UnsafePointer<CodeUnit>
...
  internal subscript(position: Index) -> UTF16.CodeUnit { ... }
...
  internal subscript(offset: Int) -> UTF16.CodeUnit { ... }
...

Since pointers and integers are distinct types, I wouldn't expect the Int-based convenience subscript to conflict with anything.

It's a RandomAccessCollection, deriving its index(before:) etc from the default:

  public func index(before i: Index) -> Index {
    let result = i.advanced(by: -1)
    // FIXME: swift-3-indexing-model: tests for the trap.
    _failEarlyRangeCheck(
      result, bounds: Range(uncheckedBounds: (startIndex, endIndex)))
    return result
  }

So in this case I would expect the UnsafePointer<UInt16> to be decrementing itself by its byte stride of 2, instead of an Int that decrements itself by 1. Is this not the case? Where is the issue, or am I missing something?
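
For what it's worth, a quick standalone check of that stride behavior (my own sketch, not the stdlib's code) agrees: advancing an UnsafePointer<UInt16> by -1 moves it back one element, i.e. two bytes.

let storage = UnsafeMutablePointer<UInt16>.allocate(capacity: 4)
defer { storage.deallocate() }

let i = UnsafePointer(storage) + 2        // some valid index into the buffer
let before = i.advanced(by: -1)           // back one element, not one byte

print(UnsafeRawPointer(i) - UnsafeRawPointer(before))   // 2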

Looking at @Kaiede's example above suggests that the problem may be in Int._builtinWordValue.

No worries, we appreciate that you're trying so hard to get to the bottom of this problem!

Here's the current _builtinWordValue implementation from the gyb template:
  public // @testable
  var _builtinWordValue: Builtin.Word {
% if BuiltinName == 'Int32':
    return Builtin.zextOrBitCast_Int32_Word(_value)
% elif BuiltinName == 'Int64':
    return Builtin.truncOrBitCast_Int64_Word(_value)
% end
  }

CC @moiseev. Should we universally zext or sext, or should this vary with the signedness of Self?

We ought to sext for signed types and zext for unsigned types.
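
For concreteness, here's a rough sketch of the kind of change being discussed, in the shape of the gyb snippet above (the signedness check is written with a hypothetical 'signed' variable; the real template may spell it differently):

  public // @testable
  var _builtinWordValue: Builtin.Word {
% if BuiltinName == 'Int32':
%   if signed:   # hypothetical per-type signedness flag for this sketch
    return Builtin.sextOrBitCast_Int32_Word(_value)
%   else:
    return Builtin.zextOrBitCast_Int32_Word(_value)
%   end
% elif BuiltinName == 'Int64':
    return Builtin.truncOrBitCast_Int64_Word(_value)
% end
  }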

I filed [SR-8870] _builtinWordValue should zext/sext based on signedness · Issue #51376 · apple/swift · GitHub


If I read the docs correctly, index_addr takes the type into account. So it is more like ptr[-1] in C, not ((byte *)ptr) - 1. Or, more concretely: %19[%18]. Makes sense to me.

That's certainly one place where it enforces unsigned extension of the type, yes. I'm not sure why it's emitting it in the case of formIndex(before:); I just know that it is. Trying to trace down all the hits where we use unsigned extension on Int32 to Word in GDB was painful, and I never actually caught it with my breakpoint due to the noise, unfortunately.

Later today, I'll take a peek to see if updating _builtinWordValue is enough to fix it, or if the break remains (i.e. there's another path to the builtin conversion).

EDIT:

@Joe_Groff I think you'll be happy to know that my test of modifying IntegerTypes.swift.gyb to use signed conversions works. Since that's a lot simpler than trying to make SIL see Builtin.Word as 32-bit (and probably less risky), and even works at the latest development snapshot on master, I'll put together the PR tonight for the change. It'll give me time to also start putting together the PR for building on x86 32-bit Linux hosts, too.
