[Pitch] Pointer bit width compile time conditional

drexin · August 10, 2022, 6:33pm

Hi all,

There has been some discussion on this in the past (e.g. Compilation conditions for word size) and @xwu provided an implementation, but there was never a proposal to actually add this. I think this is a very useful feature, so I took it over, created an updated PR, and am now creating this pitch.

Looking forward to your feedback.

Introduction

As a multi-platform language, Swift supports various CPU architectures with different
pointer sizes. The pointer size can be checked at runtime using MemoryLayout, but
it is currently not possible to use this information at compile-time. This proposal
aims to add a compile-time conditional to check the pointer size in bits for the
given target architecture.

Motivation

Currently the only way to branch on pointer size at compile time is to list every
supported CPU that uses the specific pointer size as follows:

#if arch(i386) || arch(arm) || arch(arm64_32) || arch(wasm32)

This code is error prone and hard to maintain. Whenever a new target architecture
is added, code has to be carefully examined and updated for the new architecture.

Being able to branch on the pointer size at compile time would eliminate the need
to update code like this and instead only require the pointer size to be configured
when adding the new target architecture to the compiler.
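For illustration, here is a minimal sketch of how the workaround looks today when written out in full. The constant names are hypothetical, and the `#else` branch assumes that every architecture not listed uses 64-bit pointers, which is exactly the kind of assumption that silently goes stale when a new target is added.

```swift
// Today's workaround: enumerate every known 32-bit architecture by name.
// Any 32-bit target missing from this list falls into the #else branch.
#if arch(i386) || arch(arm) || arch(arm64_32) || arch(wasm32)
let assumedPointerBits = 32
#else
let assumedPointerBits = 64  // assumption: everything else is 64-bit
#endif

// The runtime value can be used to cross-check the hand-maintained list.
let actualPointerBits = MemoryLayout<UnsafeRawPointer>.size * 8

print("assumed: \(assumedPointerBits), actual: \(actualPointerBits)")
```

If the two values disagree on some target, the architecture list needs updating.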

Proposed solution

We are proposing to add a new compile-time conditional, pointerBitWidth, that
checks whether the target architecture uses the specified pointer size.

Usage:

struct MyStruct {
#if pointerBitWidth(32)
    let myProperty: Float
#elseif pointerBitWidth(64)
    let myProperty: Double
#else
    #error("Unsupported pointer size.")
#endif
}

Source compatibility and ABI

This change is purely additive.

Alternatives considered

An alternative approach would be to use the existing runtime mechanism instead.

Example:

if MemoryLayout<UnsafeRawPointer>.size == 4 { // system uses 32-bit pointers
    // ...
}

This should generally be constant folded by the compiler, but it is limited
in that it can only affect runtime control flow and not be used for conditionally
available functions or type layouts.

Example:

struct MyStruct {
    if MemoryLayout<UnsafeRawPointer>.size == 4 {
        let myProperty: Float
    } else {
        let myProperty: Double
    }
}

This code is invalid and will cause a compiler error.
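By contrast, the runtime check is fine wherever ordinary control flow is allowed. A minimal sketch (the function name is hypothetical):

```swift
// A runtime branch on pointer size is valid inside a function body,
// because `if` is an ordinary statement there.
func pointerWidthDescription() -> String {
    if MemoryLayout<UnsafeRawPointer>.size == 4 {
        return "32-bit pointers"
    } else {
        return "pointers wider than 32 bits"
    }
}

print(pointerWidthDescription())
```

Both branches must still type-check on every target, which is why this cannot replace a compile-time conditional for declarations.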

jrose · August 10, 2022, 6:45pm

Seems reasonable, with the caveat that choosing between Float and Double based on the platform word width is generally a bad idea, CGFloat notwithstanding!

Bikeshed on the name: Rust has just barely begun attempting to distinguish pointers from addresses to handle CHERI as well as platforms with segmented address spaces (see their “strict provenance” experiment). Right now Swift has two core word-sized type families: Int/UInt, and *Pointer. If those ever diverge, which width would this query?

(I don’t think this is likely to happen for Swift any time soon, and we can always change the name at that point. But it might be nice to consider it now.)

David_Smith · August 10, 2022, 6:50pm

This distinction (word/pointer/integers) does seem potentially important, with Windows being LLP64 while most Unix-like systems are LP64.
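As a rough sketch of what that distinction looks like from Swift, the standard library's C type aliases can be inspected at runtime. The `widths` array is a hypothetical name; the values printed depend on the target.

```swift
// Widths of the C-compatible aliases next to Swift's own Int and pointers.
// On an LP64 target CLong is 64 bits; on an LLP64 target it is 32 bits,
// even though pointers are 64 bits wide in both cases.
let widths: [(name: String, bits: Int)] = [
    ("CInt", MemoryLayout<CInt>.size * 8),
    ("CLong", MemoryLayout<CLong>.size * 8),
    ("CLongLong", MemoryLayout<CLongLong>.size * 8),
    ("Int", Int.bitWidth),
    ("UnsafeRawPointer", MemoryLayout<UnsafeRawPointer>.size * 8),
]

for entry in widths {
    print("\(entry.name): \(entry.bits) bits")
}
```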

drexin · August 10, 2022, 6:51pm

Yeah, it's a silly example. Happy to change that to something better.

I think it would always be based on *Pointer, thus the name pointerBitWidth.

ksluder · August 10, 2022, 6:53pm

Is it then worth having intBitWidth as well?

benrimmington · August 11, 2022, 8:53am

This will be a useful feature. Is there a name that works better with the Int and UInt types?

#if bitWidth(64)
#if wordBitWidth(64)

In the previous pitch from 2019, it was observed that "word" can be ambiguous (e.g. in Windows APIs and x86 assembly language). However, in Swift the Atomics.DoubleWord and BinaryInteger.Words APIs both use the UInt type.

Should _endian(big) and _endian(little) also be made available? I only found a few uses:

I mention endianness because it's also related to computer architecture:

#if arch(powerpc64le)
#if arch(*, bitWidth: 64, endian: little)
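For reference, a small sketch of how the existing underscored condition can be used and cross-checked at runtime. The constant names are hypothetical, and `_endian` is underscore-prefixed, so treat it as unofficial.

```swift
// Compile-time view: which byte order was this code compiled for?
#if _endian(little)
let compiledForLittleEndian = true
#else
let compiledForLittleEndian = false
#endif

// Runtime view: on a little-endian machine the little-endian
// representation of 1 is still 1; on a big-endian machine it is 256.
let runningLittleEndian = UInt16(1).littleEndian == 1

print("compile-time: \(compiledForLittleEndian), runtime: \(runningLittleEndian)")
```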

scanon · August 11, 2022, 12:49pm

"Word" is quite problematic (especially when we talk about targets like x86_32 or arm64_32) and should be avoided in this setting. I would simply call this intBitWidth(); it is Int.bitWidth, after all.

However, I'm nervous about adding this feature. The need that it addresses is a very real pain point, but the examples of why it is needed and how it would be used all show why it is also a bad idea: attempting precise control of memory layout in Swift, with the language as we have it today, is usually a programming error. If you want to precisely lay out structs to match a wire format like this, you should instead define them in a .h file, and the C preprocessor already has the appropriate conditionals. I worry that adding this feature without providing the other tools necessary for precise layout control in Swift will lead people to write code that they should not write.

I also wonder if the ultimate solution we want is not to "simply" allow compiler-evaluable expressions to be used (non-recursively?) in compile-time conditions. Should this in fact be something like:

#if UnsafePointer<T>.bitWidth == 64
#else
#endif

This is obviously a much larger feature, but it also seems like a much better feature. There's obviously a tradeoff between making an easy fix now and building some wild long-term solution, but I do worry about scattering stop-gap features across the language in the meantime.

tera · August 11, 2022, 2:28pm

If it is literally, say

#if MemoryLayout<UnsafeRawPointer>.size == 8
#else
#endif

or #if UnsafeRawPointer.size == 8

and assuming all pointer sizes are equal (are they?), can't the relevant code in the parser just have a special case for the MemoryLayout<UnsafeRawPointer>.size string? (I know it's a bit dirty, e.g. what if I put spaces around the dots, or what about "UnsafeMutableRawPointer", etc., but still.)

Re: size vs bitWidth: is it important to have "64" instead of "8"? Can that number ever be 63 or 65, or will it always be in chunks of 8?
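To make the two spellings concrete, here is a minimal sketch of the conversion between them (the constant names are hypothetical). Note that this is the storage size of a pointer, which by itself says nothing about how many of those bits are significant in an address.

```swift
// MemoryLayout reports storage size in bytes; a bit width is that times 8.
let pointerBytes = MemoryLayout<UnsafeRawPointer>.size
let pointerBits = pointerBytes * UInt8.bitWidth  // UInt8.bitWidth is 8

print("\(pointerBytes) bytes = \(pointerBits) bits")
```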

xwu · August 11, 2022, 2:38pm

Perhaps, but presumably we'd want some way to distinguish between evaluating these properties for the target or host platform, and even then it's all quite a mouthful (MemoryLayout<UnsafeRawPointer>.size is...a lot), not to mention that a bitWidth API doesn't even exist on UnsafePointer, compiler-evaluable or not.

The set of memory-related compile-time conditionals that users have identified a need for is not infinite, and endianness and pointer bit width have definitely been desired and/or already have use cases in the code base. The lack of bitWidth on pointer types may also indicate that some of these things are more useful at compile time than at runtime, and are not exactly of a kind with the broader ask of using compiler-evaluable runtime facilities in compile-time conditionals. So I think it's justifiable to have a separate, self-contained proposal with the simple syntax for the already-identified use cases. I'm a little biased, though, obviously.

ksluder · August 11, 2022, 2:40pm

I’m not very familiar with the effort to support Swift on s390x, but historically that architecture has had 31-bit pointers.

scanon · August 11, 2022, 3:06pm

At risk of derailing the thread on a tangent, under what circumstance would you ever want to evaluate them for the host platform?

Joe_Groff · August 11, 2022, 4:14pm

I'm not a huge fan of this myself, but would using the jargon acronyms like #if LP32/LP64/LLP64/ILP32/etc be "term of art" enough to be widely understood? They are very specific and compact for folks who know them, but possibly hard to understand for folks who don't.

tera · August 11, 2022, 4:35pm

I'm trying to understand what I would want to do differently in these cases (31 vs 32 bits), specifically at compile time.

I always google them, every time

xwu · August 11, 2022, 4:47pm

I wouldn’t, not for these conditionals certainly. But as a user, I know that #if means the compiler is doing the evaluating, and I know that using #if os(…) in SwiftPM configurations is a footgun because it does exactly what it says on the tin—Swift conditionalizes on the platform on which SwiftPM is running and not the target platform of the package. Naturally this leads me to wonder if a generalized facility to evaluate Swift at compile time on the host machine is evaluating these expressions for the host platform or the target platform.

beccadax · August 11, 2022, 7:27pm

scanon:

I also wonder if the ultimate solution we want is not to "simply" allow compiler-evaluable expressions to be used (non-recursively?) in compile-time conditions. Should this in fact be something like:
#if UnsafePointer<T>.bitWidth == 64
#else
#endif

This would be difficult from a layering perspective because we want to first build the AST for the current file and then process its imports, but building the AST requires evaluating the condition, and the condition would refer to imported declarations.

beccadax · August 11, 2022, 7:30pm

(Sorry for the accidental serial post.)

drexin:

Usage:

struct MyStruct {
#if pointerBitWidth(32)
    let myProperty: Float
#elseif pointerBitWidth(64)
    let myProperty: Double
#else
    #error("Unsupported pointer size.")
#endif
}

Nit: do we want to spell this with a comparison operator, like we do for version checks?

#if intBitWidth(<64)
    let myProperty: Float
#else
    let myProperty: Double
#endif
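For comparison, this is a sketch of the existing version-check spelling that the operator form would mirror. The constant name is hypothetical; `swift(>=…)` and `compiler(>=…)` are the existing conditions.

```swift
// Existing compile-time conditions already take a comparison operator
// inside the parentheses.
#if swift(>=5.0)
let languageModeNote = "Swift 5 language mode or later"
#else
let languageModeNote = "earlier than Swift 5"
#endif

print(languageModeNote)
```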

jrose · August 11, 2022, 8:10pm

Probably not! There are platforms with 16-bit pointers, even if Swift doesn’t support them.

ksluder · August 11, 2022, 10:41pm

One thing I can think of is defining a pointer value as a compile-time constant.

Would #if LLP64 reflect how the Clang importer parses C headers? That could be useful when trying to interoperate with C, but if C and Swift diverge on a particular platform, developers could follow the wrong lead. (@compnerd, does Swift on Windows map Int to Int32 or Int64?)

As another example, RISC-V defines four different ISAs: two 32-bit ISAs, one 64-bit ISA, and one 128-bit ISA. These present the expected power-of-2-sized linear address spaces. The 128-bit ISA isn’t completely finished yet, but it’s documented in the user-level spec.

There’s also the issue of how many bits of a virtual address are actually valid. This is actually defined by a combination of hardware and runtime settings. For example, an x86 processor may or may not support PAE. If it does, and the OS enables it, the page table maps 32-bit virtual addresses to 64-bit physical addresses. All x86_64 processors map 64-bit virtual addresses to 64-bit physical addresses, but the top 64-N bits must be a sign extension of the Nth bit, where N is determined by the number of page levels the OS has told the processor to implement.
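The sign-extension rule can be sketched as a small check. This is only an illustration under the description above: `isCanonical` and `validBits` are hypothetical names, and the value of N (48 in the usage below) is an assumption about one common configuration.

```swift
/// Returns true when the top (64 - n) bits of `address` are all copies of
/// bit n-1, i.e. the address is a valid sign extension of an n-bit value.
func isCanonical(_ address: UInt64, validBits n: Int) -> Bool {
    precondition(n > 0 && n <= 64, "validBits must be in 1...64")
    let shift = 64 - n
    // Shift the n-bit value to the top, then shift back arithmetically so
    // that bit n-1 is replicated into every higher bit.
    let extended = Int64(bitPattern: address << shift) >> shift
    return UInt64(bitPattern: extended) == address
}

print(isCanonical(0x0000_7FFF_FFFF_FFFF, validBits: 48))
print(isCanonical(0xFFFF_8000_0000_0000, validBits: 48))
print(isCanonical(0x0000_8000_0000_0000, validBits: 48))
```

The first two addresses pass the check and the third does not, because bit 47 is set while the bits above it are clear.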

I suspect it’s worth providing expressions that are tailored to particular use cases. Otherwise Swift libraries might make unfortunate decisions like CoreGraphics did, by basing the definition of CGFloat on #if LP64 when the integer size has nothing to do with the floating point size.

compnerd · August 12, 2022, 5:10am

I don't think that Swift and Clang can diverge when it applies to the importing of C APIs.

Int maps to something 64-bit or 32-bit but is not an alias to Int32 or Int64. This applies across the platforms.
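A quick sketch of that distinction (constant names are hypothetical): Int can have the same size as Int64 on a given target while remaining a separate type.

```swift
// Int may match Int64 in size on a 64-bit target...
let sameSizeAsInt64 = MemoryLayout<Int>.size == MemoryLayout<Int64>.size

// ...but it is a distinct type, not a typealias for Int32 or Int64.
let sameTypeAsInt64 = (Int.self == Int64.self)
let sameTypeAsInt32 = (Int.self == Int32.self)

print("same size as Int64: \(sameSizeAsInt64)")
print("same type as Int64: \(sameTypeAsInt64), as Int32: \(sameTypeAsInt32)")
```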

ksluder · August 12, 2022, 3:40pm

Sorry, I should have been more clear. As you point out, ClangImporter and standalone clang have to agree with each other or else absolute chaos will ensue.

My concern is about situations where Swift and clang choose different defaults for each language’s natively-sized types—e.g. a platform where Swift decides that Int has 64 bits, but established C convention is that int has 32 bits. I think it’s important to explicitly allow Swift to make the decision to deviate from established C practice on any given platform. The LLP model exists to minimize the porting effort for existing codebases. There are no Swift codebases that predate the existence of 64-bit processors, so Swift should choose the size of Int based solely on what’s right for the modern architecture.

On an LLP platform where Swift chooses to use 64-bit integers, ClangImporter would still treat int as 32 bits, and imported C structures will have Int32 members in the Swift projection. And #if LLP would evaluate to true to match how ClangImporter handles C structures. But #if LLP cannot serve as a substitute for #if integerBitWidth() on such a platform, because Swift.Int has 64 bits.