Compilation conditions for word size

idrougge · July 15, 2019, 11:46pm

Introduction

A lot of code written to support more than one platform contains #if arch() conditions to handle differences between 32-bit and 64-bit platforms, or between little- and big-endian CPUs. This can turn into long statements like #if arch(x86_64) || arch(arm64) || arch(s390x) || arch(powerpc64) || arch(powerpc64le), enumerating every (currently) supported architecture that must be handled by that #if … #endif clause.
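To make the pattern concrete, here is a minimal sketch of what such a clause typically looks like today. The 64-bit list is the one quoted above; the 32-bit branch (arch(i386), arch(arm)) and the name nativeWordBits are illustrative assumptions, not a complete or authoritative list.

```swift
// The architecture-enumeration pattern the proposal wants to replace.
// Every new target must be added to one of these lists by hand.
#if arch(x86_64) || arch(arm64) || arch(s390x) || arch(powerpc64) || arch(powerpc64le)
let nativeWordBits = 64
#elseif arch(i386) || arch(arm)
let nativeWordBits = 32
#else
// Fail loudly instead of silently compiling the wrong branch on an unlisted target.
#error("Unsupported architecture: add it to one of the lists above")
#endif

print("Int is \(Int.bitWidth) bits; the arch() list says \(nativeWordBits)")
```

On any architecture missing from both lists, the #error fires, which is the maintenance burden the Motivation section describes.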

Motivation

Checking each platform currently supported by Swift in order to conditionally compile 32-bit or 64-bit code does not express intent — it is just a list of platforms with certain implicit properties.
Furthermore, it is very fragile. Whenever a new platform is supported by Swift, it must be added to every source file that uses this technique to determine which code path to compile.

Proposed Solution

The current compilation conditions for os(), arch(), etc. should be amended with a wordLength() condition, or several in case different kinds of machine words need to be differentiated.
An equally valid spelling might be intWidth().

This would replace a line such as #if arch(x86_64) || arch(arm64) || arch(s390x) || arch(powerpc64) || arch(powerpc64le) with #if wordLength(64), or even #if wordLength(>=64).

Impact on existing code

This would not deprecate the current arch() condition, so any existing code would still compile as usual.

Alternatives considered

if Int.bitWidth == Int64.bitWidth would ideally do the same thing as #if intWidth(64). This is not guaranteed to be resolved at compile-time, though, so people still rely on checking the architecture with compilation conditionals. Compilation conditionals also have the advantage of being explicit about compile-time evaluation.

jrose · July 15, 2019, 11:49pm

I've long been in favor of this, so thank you for bringing it up again. I'll note that if Int.bitWidth == Int64.bitWidth only works in statement contexts; you can't use it to choose what members to put in a struct, for example. So we still benefit from the #if form.
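A minimal sketch of the distinction jrose draws, assuming a 64-bit-or-not split is all that matters. describeWordSize and PackedCounter are hypothetical names; the struct still has to fall back on an arch() list because no word-size condition exists.

```swift
// Statement context: an ordinary `if` works here. Whether the comparison is
// folded away at compile time is up to the optimizer, not guaranteed.
func describeWordSize() -> String {
    if Int.bitWidth == Int64.bitWidth {
        return "64-bit Int"
    } else {
        return "32-bit Int"
    }
}

// Declaration context: choosing which stored properties exist cannot be done
// with an `if` statement, so `#if` is required.
struct PackedCounter {
#if arch(x86_64) || arch(arm64)
    var value: UInt64 = 0      // a single native word on these targets
#else
    var low: UInt32 = 0        // hypothetically, two 32-bit halves elsewhere
    var high: UInt32 = 0
#endif
}

print(describeWordSize())
print(MemoryLayout<PackedCounter>.size)
```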

scanon · July 16, 2019, 12:28am

I'm broadly in favor of this, but would like some clarity on what the wordLength on platforms like arm64_32 or x86_32 is expected to be.

compnerd · July 16, 2019, 12:56am

I’m not a fan of the wordLength. Right now, I see two different conditions that we are trying to check:

- pointer size (32-bit vs 64-bit)
- an LP64 vs LLP64 environment
ARM64_32 and x86_32 are interesting cases (L64P32). The word size there is supposed to be 64-bit.

Where would word size generally get used? (For the motivating cases, _pointerSize seems sufficient; am I missing something?)

jrose · July 16, 2019, 1:04am

arm64_32 is ILP32, but sure. (The point is taken since the "word length" could reasonably be said to be 64 bits even though that's not the size of int, long, or Swift.Int.)
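The ILP32/LP64/LLP64 distinction can be probed at run time (not in #if) from the sizes of C int, C long, and a pointer. This is a hypothetical sketch: DataModel and classify are illustrative names, and the platforms named in the comments reflect common conventions (for example, 64-bit Windows being LLP64).

```swift
// Hypothetical classification of the C data model. Runtime only; this
// cannot be used to select declarations the way an #if condition can.
enum DataModel: Equatable {
    case ilp32    // int, long, pointer all 32-bit (e.g. i386, arm64_32)
    case lp64     // long and pointer 64-bit (e.g. 64-bit Linux and Darwin)
    case llp64    // long 32-bit, pointer 64-bit (e.g. 64-bit Windows)
    case unknown
}

func classify(intBytes: Int, longBytes: Int, pointerBytes: Int) -> DataModel {
    switch (intBytes, longBytes, pointerBytes) {
    case (4, 4, 4): return .ilp32
    case (4, 8, 8): return .lp64
    case (4, 4, 8): return .llp64
    default:        return .unknown
    }
}

// CInt and CLong are the standard library's aliases for C `int` and `long`.
let current = classify(
    intBytes: MemoryLayout<CInt>.size,
    longBytes: MemoryLayout<CLong>.size,
    pointerBytes: MemoryLayout<UnsafeRawPointer>.size
)
print(current)
```

LP64 and LLP64 share the same pointer size, so a pointer-width check alone cannot tell them apart.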

xwu · July 16, 2019, 2:20am

I have this implemented as _pointer_bit_width in this PR from a year ago:

github.com/apple/swift: [WIP] [Parse] Add _pointer_bit_width platform condition (apple:master ← xwu:native_word_size, opened 05 Feb 2018)

Instead of explicitly listing each platform when testing for the native word size (which, per documentation, is the bit width of `Int`), add an explicit (underscored) platform conditional. This supersedes the solution discussed in #14386 and #14409.

Overall, I think pointer bit width would be a reasonable way to go.

idrougge · July 17, 2019, 11:27pm

I'm not versed in the intricacies of pointer size versus Int size, but even Foundation itself contains a lot of #if arch(…) clauses which have more to do with the size of Int than the size of pointers as such (though they may be intertwined in all currently supported Swift platforms).

Code "in the wild" often seems to be concerned with the differing underlying types of CGFloat on 64-bit and 32-bit platforms. I don't know exactly which heuristic would cover that, but it's obvious that the combination of arch(i386) and arch(arm) holds the key to that.

compnerd · July 18, 2019, 2:37am

CGFloat is directly related to the pointer size. The default type of CGFloat on 32-bit Darwin is float, while 64-bit Darwin uses double.

The checks in Foundation are usually for LP64 vs LLP64 rather than 32-bit vs 64-bit. Unfortunately, _pointer_bit_width is insufficient to detect this particular case :-(.

Joe_Groff · July 18, 2019, 2:38am

This is an accident of history, though, not a guarantee.

compnerd · July 19, 2019, 1:28am

Agreed that it is purely an accident of history, but, would it not be fair to characterise this as something which has become embedded into the ABI (that is, it is not possible to change this without breaking ABI and therefore it is unlikely to change)?

Joe_Groff · July 19, 2019, 1:32am

There could theoretically be new ABIs where CGFloat doesn't follow the word size.

idrougge · July 19, 2019, 8:30am

That would be a breach of contract, though.

The size and precision of this type depend on the CPU architecture. When you build for a 64-bit CPU, the CGFloat type is a 64-bit, IEEE double-precision floating point type, equivalent to the Double type. When you build for a 32-bit CPU, the CGFloat type is a 32-bit, IEEE single-precision floating point type, equivalent to the Float type.

Rod_Brown · July 19, 2019, 11:18am

From what you quoted, nothing specifically says CGFloat follows the word size. It says CGFloat is CPU architecture-dependent; that is, if the architecture changes, it may differ. It implies a correlation, but does not state one outright.

Nevertheless, I think this points to a complexity we need to think about in the architecture check, and shows perhaps one of the reasons this may have been held back for some time: we need to really define what we mean by these checks, and what we expect to be able to do based on them. How flexible do we need the checks to be? What are the problems we intend to solve so this feature fills the requirements of the developers like yourself who'll use it?

Definitely pro the idea. I think we just need to be careful to fully understand the goal.

scanon · July 19, 2019, 1:08pm

Eh, that's already bogus. CGFloat is Float on arm64_32, which indisputably is "for a 64-bit CPU".

But more to the point, code that isn't the CoreGraphics SDK overlay should not be trying to infer what CGFloat is based on pointer size, CPU, or anything else. It should use CGFloat when necessary, convert to a type of known size when necessary, and use MemoryLayout when exact layout is required. Attempting to infer what CGFloat "really is" instead of using the information provided by the SDK is basically always a bug.
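A minimal sketch of the approach scanon describes, assuming Foundation is available (it provides CGFloat on Linux as well as on Darwin): convert explicitly to a fixed-width type rather than guessing what CGFloat is. toDouble is a hypothetical helper name.

```swift
import Foundation  // makes CGFloat available on Darwin and Linux

// Convert to a type whose width is known instead of inferring CGFloat's width.
func toDouble(_ value: CGFloat) -> Double {
    return Double(value)  // exact whether CGFloat is backed by Float or Double
}

let width: CGFloat = 12.5
let asDouble = toDouble(width)

// When exact layout matters, ask MemoryLayout rather than the architecture.
let cgFloatBytes = MemoryLayout<CGFloat>.size

print(asDouble, cgFloatBytes)
```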

jberry · July 19, 2019, 4:03pm

This discussion seems to be going a bit into the weeds with the distraction of CGFloat.

It seems like what's needed are independent tests for the bit size of the natural word in the architecture, and the bit size of a pointer.

- intBitWidth
- wordBitWidth
- pointerBitWidth

More generally, and in addition, perhaps a conditional to test the size of a type: bitWidth(Int) or bitWidth(CGFloat) would also address the CGFloat and similar issues if needed?

scanon · July 19, 2019, 4:08pm

This sort of information already has the spelling MemoryLayout<T>.size--are you really just asking to be able to use that in an #if context?

jberry · July 19, 2019, 4:12pm

Yes, that would seem to cover the general case of checking the width of a type, including Int. I don't think it covers the width of a pointer (though _pointer_bit_width does, I guess, if it's public)?

scanon · July 19, 2019, 4:14pm

MemoryLayout<UnsafeRawPointer>.size?

jberry · July 19, 2019, 4:14pm

Ah yes!
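The runtime equivalent of the suggested bitWidth(T) check is a one-liner on top of MemoryLayout. A sketch: bitWidth(of:) is a hypothetical helper, and it works only in ordinary code, not in an #if condition.

```swift
// Storage width of a type in bits, assuming 8-bit bytes.
func bitWidth<T>(of _: T.Type) -> Int {
    return MemoryLayout<T>.size * 8
}

let intBits = bitWidth(of: Int.self)
let pointerBits = bitWidth(of: UnsafeRawPointer.self)
print("Int: \(intBits) bits, pointer: \(pointerBits) bits")
```

This reports storage bits, not value bits: Bool, for instance, reports 8.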

Joe_Groff · July 19, 2019, 4:27pm

Basing an #if conditional on the size of specific types would be problematic, because it creates a layering problem. #if happens before any imports or semantic analysis even happens, so we wouldn't even be able to do name lookup to find out where a type is, let alone its layout. In principle it could also create circular dependencies, like:

#if sizeof(Foo) == 8
import Foo_is_16 // defines struct Foo { var x, y: Int64 }
#elseif sizeof(Foo) == 16
import Foo_is_8 // defines struct Foo { var x: Int64 }
#endif

It also seems like it invites misuse, since, like with the discussion above, you could check the size of one type and draw inappropriate conclusions about other types from it. Having a conditional that checks a higher-level trait of the platform (like ILP32/LP64/LLP64-ness) makes sense to me, though we should be careful to specify exactly what these mean.

Note that, if you're just trying to conditionalize logic within a function, without changing the types of declarations, if MemoryLayout<UnsafeRawPointer>.size == 8 already works fine and will get constant folded away.
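A sketch of the function-body approach, using FNV-1a hashing (a swapped-in example, not from the thread) because its constants differ between 32-bit and 64-bit words. It checks MemoryLayout<UInt>.size rather than the pointer size, since UInt is the type whose width matters here. Because an ordinary if is type-checked on every platform, the constants are built with UInt(truncatingIfNeeded:) so the 64-bit literals still compile on 32-bit targets.

```swift
// FNV-1a over bytes, producing a word-sized hash.
func fnv1a(_ bytes: [UInt8]) -> UInt {
    // Both sides are fixed for a given target, so this can be constant folded.
    let is64 = MemoryLayout<UInt>.size == 8

    // Offset basis and prime for 64-bit and 32-bit FNV-1a, respectively.
    var hash: UInt = is64
        ? UInt(truncatingIfNeeded: 0xcbf29ce484222325 as UInt64)
        : UInt(truncatingIfNeeded: 0x811c9dc5 as UInt32)
    let prime: UInt = is64
        ? UInt(truncatingIfNeeded: 0x100000001b3 as UInt64)
        : UInt(truncatingIfNeeded: 0x01000193 as UInt32)

    for byte in bytes {
        hash ^= UInt(byte)   // xor first, then multiply (the "1a" variant)
        hash = hash &* prime // wrapping multiply, as FNV requires
    }
    return hash
}

print(String(fnv1a(Array("hello".utf8)), radix: 16))
```

Unlike #if, this cannot change which declarations exist, which is why the thread still wants a platform condition for that case.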