[Pitch] Low-level operations for volatile memory accesses

kubamracek · January 16, 2024, 11:11pm

Hello, Swift Community!

@rauhul and I would like to pitch adding low-level support for volatile memory operations in Swift, which are very common and necessary in embedded programming for configuring hardware devices. This is not supposed to be a complete user-facing high-level solution for volatile operations, quite the opposite: this should be only the very first step that makes the most primitive low-level unsafe volatile operations available, and it assumes that safe(r) layers are built on top of those. Concretely, swift-mmio is an existing library that provides structured high-level MMIO APIs. Today it uses a C header to perform the actual volatile operations using Clang's volatile support. This proposal aims only to improve the internal implementation of libraries like swift-mmio by giving them a way to perform volatile accesses directly in Swift code; there's no expectation that the described low-level primitives would end up surfaced in user-facing APIs.

The implementation for this pitch can be seen in this PR: https://github.com/apple/swift/pull/70944.

Introduction

Volatile operations are needed to program registers in environments like firmware for microcontrollers or drivers.

This proposal:

adds APIs for the most basic load + store volatile operations, on 8, 16, 32 and 64 bit integers,
assumes the actual volatile semantics are defined by Clang and LLVM, which match the commonly understood behavior (no removal or reordering by the compiler), see the definition at https://llvm.org/docs/LangRef.html#volatile-memory-accesses,
packages the new APIs into a separate module which must be explicitly imported by the user (import Volatile) -- this is intended to prevent accidental usage of volatile operations in e.g. userspace code for inter-thread synchronization (a common misuse of volatile in C).

This proposal:

doesn't try to provide any safety or structured access to these operations, as that is left for libraries built on top of the low-level APIs,
doesn't try to make it possible to mark struct members or variables as "volatile" (like C allows), instead only a pointer to storage can be volatile and a load/store using an eligible pointer can be volatile -- this sidesteps many design issues around volatile that C has,
doesn't try to provide an abstraction over making more types volatile (e.g. how the AtomicValue and AtomicRepresentation protocols in swift-atomics provide an abstraction for making custom types atomic) -- the reasoning is that, unlike atomics, volatile operations are only intended for MMIO HW register access and similar usage, which is only meaningful on machine-level integer types, and allowing custom types to become volatile would encourage misuse for inter-thread synchronization.

Proposed solution

The proposal is to create a new library/module called "Volatile" that would be shipped with the toolchain (given how close the logic is to the compiler implementation) and would define the following API set:

struct UnsafeVolatilePointer<Pointee> {
  init(bitPattern: UInt)
}

extension UnsafeVolatilePointer<UInt8> { // also 16, 32, 64
  func load() -> UInt8 // with LLVM volatile load semantics, assume natural alignment
  func store(_ value: UInt8) // with LLVM volatile store semantics, assume natural alignment
}

For convenience, the module would also define helper extensions on UnsafeMutablePointer:

extension UnsafeMutablePointer<UInt8> { // also 16, 32, 64
  func volatileLoad() -> UInt8 // with LLVM volatile load semantics, assume natural alignment
  func volatileStore(_ value: UInt8) // with LLVM volatile store semantics, assume natural alignment
}

An example usage of these APIs by e.g. a piece of code running on an embedded device:

import Volatile

func turnLEDOnOrOff(enable: Bool) {
  let gpio_a1_enable = UnsafeVolatilePointer<UInt32>(bitPattern: GPIO_BASE + GPIO_A1_ENABLE_OFFSET)
  gpio_a1_enable.store(enable ? 0x1 : 0x0)
}
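The example above writes the whole register. If the enable flag instead shared its register with other bits (an assumption, not something the pitch states), the same two primitives would be combined into a read-modify-write. The sketch below shows that pattern with a stand-in type so it can run on a host; StandInVolatilePointer, ledEnableBit and setLED are hypothetical names, and the stand-in performs ordinary, non-volatile accesses.

```swift
// Stand-in for the pitched UnsafeVolatilePointer so the sketch runs on a host.
// NOTE: these are ordinary, NON-volatile accesses; real MMIO code needs the
// volatile primitives described in the pitch.
struct StandInVolatilePointer<Pointee> {
  let address: UInt
  init(bitPattern: UInt) { self.address = bitPattern }
}

extension StandInVolatilePointer where Pointee == UInt32 {
  func load() -> UInt32 {
    UnsafeMutablePointer<UInt32>(bitPattern: address)!.pointee
  }
  func store(_ value: UInt32) {
    UnsafeMutablePointer<UInt32>(bitPattern: address)!.pointee = value
  }
}

// Read-modify-write: exactly one load and one store, other bits preserved.
func setLED(_ register: StandInVolatilePointer<UInt32>, enable: Bool) {
  // Hypothetical bit position of the LED enable flag within the register.
  let ledEnableBit: UInt32 = 1 << 3
  let old = register.load()
  register.store(enable ? (old | ledEnableBit) : (old & ~ledEnableBit))
}

// Host-side demonstration: heap memory plays the role of the device register.
let fakeRegister = UnsafeMutablePointer<UInt32>.allocate(capacity: 1)
fakeRegister.initialize(to: 0x0000_00F0)
let ledRegister = StandInVolatilePointer<UInt32>(bitPattern: UInt(bitPattern: fakeRegister))

setLED(ledRegister, enable: true)
let afterEnable = ledRegister.load()
setLED(ledRegister, enable: false)
let afterDisable = ledRegister.load()
fakeRegister.deallocate()
```

On real hardware the read-modify-write is not atomic with respect to interrupts or other bus masters; the pitched primitives only constrain what the compiler may do.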

However, as mentioned above, we don't expect application-level code to use UnsafeVolatilePointer or volatileLoad/volatileStore directly. Instead, a higher-level library should provide structured and safe access to the semantics of the hardware, for example:

import MMIO

func turnLEDOnOrOff(enable: Bool) {
  // pseudo-code, not part of this proposal:
  GPIO.a1.modify { $0.enable = enable } // the accessors compute the correct pointee/offset and perform a volatile store
}
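To make the layering concrete, here is a minimal sketch of how a library might build a modify-style accessor on top of load and store. It is not swift-mmio's actual API: GPIOControl, Register and StandInVolatilePointer are hypothetical, the bit layout is assumed, and the stand-in uses ordinary, non-volatile accesses so the code can run on a host.

```swift
// Stand-in for the pitched pointer type; NON-volatile, host-only.
struct StandInVolatilePointer {
  let address: UInt
  init(bitPattern: UInt) { self.address = bitPattern }
  func load() -> UInt32 { UnsafeMutablePointer<UInt32>(bitPattern: address)!.pointee }
  func store(_ value: UInt32) { UnsafeMutablePointer<UInt32>(bitPattern: address)!.pointee = value }
}

// Hypothetical typed view of one control register (bit 0 = enable).
struct GPIOControl {
  var rawValue: UInt32
  var enable: Bool {
    get { (rawValue & 0x1) != 0 }
    set { rawValue = newValue ? (rawValue | 0x1) : (rawValue & ~0x1) }
  }
}

// Hypothetical register wrapper: one load, a pure in-memory edit, one store.
struct Register {
  let pointer: StandInVolatilePointer
  func modify(_ body: (inout GPIOControl) -> Void) {
    var value = GPIOControl(rawValue: pointer.load())
    body(&value)
    pointer.store(value.rawValue)
  }
}

// Host-side demonstration with heap memory in place of the device.
let backing = UnsafeMutablePointer<UInt32>.allocate(capacity: 1)
backing.initialize(to: 0x10)
let a1 = Register(pointer: StandInVolatilePointer(bitPattern: UInt(bitPattern: backing)))
a1.modify { $0.enable = true }
let registerAfterModify = backing.pointee
backing.deallocate()
```

The application-level caller never sees an address or a raw integer, which is the separation the pitch describes.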

Because of the very limited intended usage of these low-level volatile operations, the proposed solution is not trying to add anything beyond the absolute minimum set of primitives. Concretely:

UnsafeVolatilePointer does not offer any pointer arithmetic facilities.
There are no conversion APIs between UnsafePointer and UnsafeVolatilePointer.
There is no structured / offset-based API on UnsafeVolatilePointer.

Detour: The complexity of "volatile" in C

In C, "volatile" is a type qualifier, and it's specifically allowed even on non-pointer types and on aggregate types. Ostensibly correct use of volatile pointers in C may produce operations violating hardware requirements in innocuous settings. See the following examples:

typedef struct {
    volatile uint8_t  field0;
    volatile uint32_t field1;
} device_t;

// User might try to copy the value of one memory-mapped device_t directly to another:
volatile device_t* inst0 = (volatile device_t*)0xfe00;
volatile device_t* inst1 = (volatile device_t*)0xff00;
*inst0 = *inst1; // ✗ on 32-bit platforms results in a memcpy optimized to two 32 bit load store pairs

This shows that the convenience of volatile in C has some dangerous sharp edges. This proposal specifically aims to avoid this problem by only allowing volatile operations on machine level integer types.
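With only integer-sized primitives, the copy from the C example has to be spelled out field by field, which makes each access width explicit. The sketch below assumes the usual layout of device_t (field0 at offset 0, field1 at offset 4); copyDevice and StandInVolatilePointer are hypothetical, and the stand-in performs ordinary, non-volatile accesses on heap memory so it can run on a host.

```swift
// Stand-in for the pitched pointer type; NON-volatile, host-only.
struct StandInVolatilePointer<Pointee: FixedWidthInteger> {
  let address: UInt
  init(bitPattern: UInt) { self.address = bitPattern }
  func load() -> Pointee {
    UnsafeMutableRawPointer(bitPattern: address)!.load(as: Pointee.self)
  }
  func store(_ value: Pointee) {
    UnsafeMutableRawPointer(bitPattern: address)!.storeBytes(of: value, as: Pointee.self)
  }
}

// Copies one device block to another: one 8-bit pair, then one 32-bit pair.
func copyDevice(from source: UInt, to destination: UInt) {
  // Assumed offsets mirroring the C struct: uint8_t at 0, uint32_t at 4.
  let field0Offset: UInt = 0
  let field1Offset: UInt = 4

  let field0 = StandInVolatilePointer<UInt8>(bitPattern: source + field0Offset).load()
  StandInVolatilePointer<UInt8>(bitPattern: destination + field0Offset).store(field0)

  let field1 = StandInVolatilePointer<UInt32>(bitPattern: source + field1Offset).load()
  StandInVolatilePointer<UInt32>(bitPattern: destination + field1Offset).store(field1)
}

// Host-side demonstration: two 8-byte heap blocks play the two devices.
let deviceA = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)
let deviceB = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)
deviceA.initializeMemory(as: UInt8.self, repeating: 0, count: 8)
deviceB.initializeMemory(as: UInt8.self, repeating: 0, count: 8)
deviceA.storeBytes(of: UInt8(0x5A), as: UInt8.self)
deviceA.storeBytes(of: UInt32(0xDEAD_BEEF), toByteOffset: 4, as: UInt32.self)

copyDevice(from: UInt(bitPattern: deviceA), to: UInt(bitPattern: deviceB))
let copiedField0 = deviceB.load(as: UInt8.self)
let copiedField1 = deviceB.load(fromByteOffset: 4, as: UInt32.self)
deviceA.deallocate()
deviceB.deallocate()
```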

Conclusion

As mentioned, the intended usage of these low-level primitive operations is very limited, and users who need to access MMIO registers should prefer high-level structured safe abstractions like what swift-mmio provides. For those users, this proposal doesn't change anything. For implementors of library code that implements these abstractions and for code that for some reason needs to perform direct MMIO device accesses, this proposal provides a basic set of primitive operations for volatile loads and stores so that they don't need to resort to workarounds like bridging headers.

Thoughts?

Joe_Groff · January 16, 2024, 11:20pm

We recently accepted Atomics into the standard library, which have a similar fundamental issue to volatile accesses that the hardware only supports them for a fixed set of types. Atomic<T> requires T: AtomicRepresentable to impose this restriction rather than providing ad-hoc overloads of the operations only for specific supported types. Should UnsafeVolatilePointer use the same T: AtomicRepresentable constraint, or should we have a similar VolatileRepresentable protocol? If we use the same protocol, there are a few benefits:

Helper code can still be written generically over T: AtomicRepresentable, which can't be done with ad-hoc overloads for specific types.
The standard library conforms a bunch of eligible types to AtomicRepresentable in addition to the UInt types, including the signed integers and pointer types, and it provides a default implementation for types that conform to RawRepresentable, allowing user types to be made AtomicRepresentable when they can be mapped to a primitive atomic-representable type.
The implementation already conditionalizes AtomicRepresentable conformances to types which have hardware support for atomics, so for example, 64-bit types are not AtomicRepresentable on platforms which don't have 64-bit atomic operations. This also seems valuable for volatile accesses, where we also don't want to support loads and stores that might tear on the target platform.

It's plausible though that there are platforms we care about that could support non-tearing non-atomic accesses of sizes different from the set of sizes for which it supports atomic operations, in which case we'd need another protocol to go this road.
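As a rough illustration of the "separate protocol" road, the sketch below defines a hypothetical VolatileRepresentable protocol and a generic helper written against it. None of these names exist in the standard library or in the pitch; the encoding requirements are a simplification loosely modeled on the idea of mapping a type to a primitive storage type, and the pointer is a non-volatile stand-in so the code can run on a host.

```swift
// Hypothetical protocol: a type that maps to a fixed-width integer storage.
protocol VolatileRepresentable {
  associatedtype Storage: FixedWidthInteger
  init(volatileStorage: Storage)
  var volatileStorage: Storage { get }
}

extension UInt8: VolatileRepresentable {
  init(volatileStorage: UInt8) { self = volatileStorage }
  var volatileStorage: UInt8 { self }
}

// A user type opting in through its raw value.
enum PinMode: UInt8, VolatileRepresentable {
  case input = 0, output = 1
  // Unknown bit patterns fall back to .input; a real design would need a policy.
  init(volatileStorage: UInt8) { self = PinMode(rawValue: volatileStorage) ?? .input }
  var volatileStorage: UInt8 { rawValue }
}

// Stand-in for the pitched pointer type; NON-volatile, host-only.
struct StandInVolatilePointer<Pointee: VolatileRepresentable> {
  let address: UInt
  init(bitPattern: UInt) { self.address = bitPattern }
  func load() -> Pointee {
    Pointee(volatileStorage: UnsafeMutablePointer<Pointee.Storage>(bitPattern: address)!.pointee)
  }
  func store(_ value: Pointee) {
    UnsafeMutablePointer<Pointee.Storage>(bitPattern: address)!.pointee = value.volatileStorage
  }
}

// Generic helper: works for any conforming type, not one overload per width.
func update<T: VolatileRepresentable>(_ pointer: StandInVolatilePointer<T>, _ transform: (T) -> T) {
  pointer.store(transform(pointer.load()))
}

let modeBacking = UnsafeMutablePointer<UInt8>.allocate(capacity: 1)
modeBacking.initialize(to: 0)
let modeRegister = StandInVolatilePointer<PinMode>(bitPattern: UInt(bitPattern: modeBacking))
update(modeRegister) { _ in .output }
let decodedMode = modeRegister.load()
let rawModeByte = modeBacking.pointee
modeBacking.deallocate()
```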

wadetregaskis · January 17, 2024, 6:50am

It sounds like the proposal is mainly interested in the cases where reads and/or writes have side-effects, or some other reason why the exact number of reads & writes is significant? As opposed to shared memory (e.g. multi-threading) for which we already have atomics and similar, or more general instruction ordering for which we [can] have separate memory barrier methods?

The size of volatile accesses

What about the other pointer types, e.g. UnsafeMutableBufferPointer?

Is that accurate? Device memory (in ARM parlance) is volatile but isn't necessarily registers (e.g. memory-mapped but non-coherent SRAM or DRAM). There can still be a need for the semantics that [C-style] volatile provides, most notably: ensuring reads & writes happen exactly as written, with no repetition (e.g. reloading a spilled register) nor eliding. Repetition might "merely" be a performance concern, but I think it is important even as such. And elision is of course a functional concern.

Granted you can maybe build the necessary abstractions atop primitive word-sized reads, but that might not give you the performance you need. e.g. what about SIMD loads & stores?

Type vs variable vs access modifiers

I agree that C has a lot of flaws in its implementation, but that doesn't prove to me that the type modifier approach is inherently wrong. I can see how tying volatility only to the pointer types could help prevent some types of misuses, but it's also pretty limiting - it's intuitive to me to declare a type to represent a section of address space and mark either some or all of it as being volatile. It's also much nicer to have few[er] high-level pointers to composite types, than having to deal with myriad pointers to individual, piecemeal values.

Swift has a more powerful type system than C; perhaps it could do a better job of a volatile type modifier (or equivalent). e.g. maybe it could just not do problematic things like coalesce volatile accesses (re. your UInt8 + UInt32 struct example) or other things that change the load / store width.

I'm not saying the pitch is wrong in this respect, I'm just saying I don't think it proves it's right, yet. It'd be helpful to have some elaboration as to why the type & variable modifier approaches are intrinsically wrong, not merely implemented wrong in some existing languages.

Relationship to atomicity

As @Joe_Groff mentioned, volatility and atomicity are quite often required together. So I find Joe's suggestions appealing in that respect.

But, volatile is not necessarily about shared access, in the way atomics are. Often it's just to prevent the compiler from unwittingly creating problems even in simple serial code. So I do think the concept needs to be applicable to more than just AtomicRepresentables (unless I misunderstand what types can actually, plausibly conform to AtomicRepresentable?).

Conceptually, I don't think all volatile accesses have to be atomic. It could be perfectly fine to split up a read into smaller pieces, even to reorder their loads. The point may simply be to ensure the overall read actually happens when & where it's supposed to.

Conversely, not all atomic accesses are volatile. It is a valid circumstance to merely need a consistent view of a given set of bits, not necessarily to care how often or precisely when those bits are read from memory. (If I understand correctly, Joe is not proposing that constraint - I'm just noting it.)

What is this 'volatile' thing, anyway?

This might be a chance to break from the name 'volatile', given it has very inconsistent meanings across popular languages. A new, distinct name might also help streamline the proposal & review, by preventing people bringing assumptions and preconceptions, about what the intent of the proposal is, based on their personal experience with other languages.

Depending on who you ask, 'volatile' is used to:

Ensure reads & writes are not reordered (and with respect to what is a further source of disagreement).
Ensure reads & writes are not elided or duplicated.
Ensure reads & writes are atomic (as in not torn).
Ensure reads & writes are not cached (not merely conceptually, as implied by some of the other points, but explicitly bypassing processor caches a.k.a. non-temporal loads & stores).
Communicate that reads and/or writes might have unspecified side-effects.
Prevent speculation & prefetching.
Probably other stuff I'm not thinking of right now, or not even aware of.

Whether it actually does some or all of those things in any given language and toolchain is yet another fun question. Suffice to say it's a mess. Maybe Swift can avoid that confusion entirely.

scanon · January 17, 2024, 3:08pm

Half the point of C-style volatile as an (imperfect) binding for device memory (and similar) is that you cannot combine scalar accesses into wider accesses as an optimization when working with device memory, because the bus might not actually support any other transaction size.

wadetregaskis:

It could be perfectly fine to split up a read into smaller pieces, even to reorder their loads. The point may simply be to ensure the overall read actually happens when & where it's supposed to.

These two sentences are at odds with each other, and with the constraint you list later "communicate that reads and/or writes might have unspecified side-effects." If reads and writes can have side effects, and must happen where and when they are supposed to, then they must not be split, merged, or reordered. C (and C++) "volatile" is used for a lot of things that it doesn't actually do, but this is the one thing that it very clearly does have to do to be useful.

nikolai.ruhe · January 17, 2024, 3:47pm

While this might be the case for some uses it is not generally true with regard to Swift MMIO. Volatile read access to memory mapped registers in MCUs cannot be split up or reordered.

Oftentimes the semantics are implemented in a way where reading a pair of registers must happen in the proper order. The second read clears an interrupt, kicks off another ADC measurement or has other side effects.

ksluder · January 17, 2024, 4:18pm

Is it worth retaining the volatile terminology? The pitch explicitly limits this feature to memory-mapped IO, excluding all the other use cases for volatile in C.

JanWillemBrands · January 17, 2024, 4:24pm

I propose PEEK and POKE

scanon · January 17, 2024, 4:36pm

I would gently suggest JF's talk "deprecating volatile" for anyone who wants to understand what "volatile" means (and meant historically, and also was incorrectly believed to mean) in the context of C and C++ code, as a pretty good primer for understanding this proposal.

wadetregaskis · January 17, 2024, 4:37pm

Yep. But it might. There are use-cases either way, within the nebulous existing definition of 'volatile'. Remember that I wasn't focusing on C's definition of 'volatile'.

I don't think so. You might need to ensure the compiler doesn't elide the read, but otherwise not care how it does the read. Think more about cases of shared memory than side-effects, e.g. where you're in a loop monitoring for a signal that something has changed, and upon receiving it you need to actually see those changes (as opposed to the compiler mistakenly believing the value cannot change and hoisting the read out of the loop entirely).

Now, it might be that compiler memory barriers (or similar) serve this purpose. But I'm not sure we have those yet, in Swift, do we? In any case, the functionality I've described is a common use-case for volatile in other languages (for better or worse). Thus why I mentioned it, to - if nothing else - consider the legacy and baggage associated with 'volatile'.

ksluder · January 17, 2024, 4:39pm

kubamracek:

For convenience, the module would also define helper extensions on UnsafeMutablePointer:

extension UnsafeMutablePointer<UInt8> { // also 16, 32, 64
  func volatileLoad() -> UInt8 // with LLVM volatile load semantics, assume natural alignment
  func volatileStore(_ value: UInt8) // with LLVM volatile store semantics, assume natural alignment
}

This seems like a footgun. If the whole point of this feature is to model pointers that can’t be treated like normal pointers, why offer the ability to confuse them? Especially if this means types bridged from C/C++ will produce UnsafeMutablePointers that you just “have to know” are actually volatile.

My preferred alternative would be to remove the volatile methods from UnsafeMutablePointer and add the ability to convert between UnsafeMutablePointer and UnsafeVolatilePointer. (And per my previous comment, rename UnsafeVolatilePointer to something like UnsafeMMIOPointer.)

Other quick questions:

Is it implied that UnsafeVolatilePointer.Pointee is constrained to trivial types?
Is there a potential use case for UnsafeVolatileRawPointer? (Edit: probably not, since I am assuming all volatile pointers are indeed constrained to trivial types)

JetForMe · January 17, 2024, 5:12pm

Can this proposal ensure that it’s possible to minimize the number of machine instructions application-level code ends up generating to access hardware registers? On smaller devices this can be critical, from both performance and code size perspectives. Having to call through a “safe layer” might make some operations impractical or even impossible if the compiler can’t guarantee optimal code generation, even in a debug build.

Relatedly, in some processors it’s important to order writes tightly. As I recall, 8-bit AVR has 16-bit MMIO register(s) that can be written atomically if one writes the high byte followed by the low byte with no intervening writes. It’s not clear to me (upon my admittedly cursory reading) that this proposal allows for that.
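A sketch of what that high-then-low sequence could look like as two 8-bit stores follows. FakeBus is a hypothetical recorder used in place of real volatile stores so the order can be inspected on a host; the register address and the assumption that the high byte lives at the low byte's address plus one are illustrative only. With real volatile stores, the LLVM semantics referenced in the pitch (no reordering of volatile operations relative to each other) would be what keeps the two writes in program order; it says nothing about interrupts arriving between them.

```swift
// Hypothetical recorder standing in for 8-bit volatile stores.
final class FakeBus {
  var log: [(address: UInt, value: UInt8)] = []
  func store8(_ value: UInt8, at address: UInt) {
    log.append((address: address, value: value))
  }
}

// Writes a 16-bit value as two byte stores: high byte first, then low byte.
// Assumes the high byte register sits at lowAddress + 1.
func write16HighThenLow(_ value: UInt16, lowAddress: UInt, bus: FakeBus) {
  bus.store8(UInt8(value >> 8), at: lowAddress + 1)
  bus.store8(UInt8(truncatingIfNeeded: value), at: lowAddress)
}

let bus = FakeBus()
// 0x84 is an illustrative address, not taken from the thread.
write16HighThenLow(0x1234, lowAddress: 0x84, bus: bus)
let writtenValues = bus.log.map { $0.value }
let writtenAddresses = bus.log.map { $0.address }
```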

kubamracek · January 17, 2024, 5:22pm

JetForMe:

Can this proposal ensure that it’s possible to minimize the number of machine instructions application-level code ends up generating to access hardware registers?

This will be the case. It's already "solved" in swift-mmio by making all the layers transparent/inlineable, which doesn't produce any optimization-blocking barrier on the actual volatile reads and writes. The rest is a task for LLVM, which should already be able to do a pretty good job at this, and in case there are any missed optimization opportunities at the LLVM level, this proposal doesn't add any problems to that.

nikolai.ruhe · January 17, 2024, 5:45pm

... or just MMIOPointer.

If the types are constrained to trivial types, and the memory can't be allocated, there's nothing unsafe about this type of pointer.

wadetregaskis · January 17, 2024, 5:59pm

But it still has to be initialised with an arbitrary address, which the compiler cannot verify. So that's unsafe, isn't it?

scanon · January 17, 2024, 6:05pm

IIRC I suggested off-thread that the init could have an unsafe: label, but the type could just be VolatilePointer.
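One possible reading of that suggestion, as a sketch: the type name drops "Unsafe" and the initializer label carries it instead. The VolatilePointer type below is hypothetical and uses ordinary, non-volatile accesses on heap memory so that it can run on a host.

```swift
// Hypothetical shape: safe-sounding type name, explicitly unsafe construction.
struct VolatilePointer<Pointee: FixedWidthInteger> {
  let address: UInt

  // The caller vouches that `address` is a valid, naturally aligned location.
  init(unsafe address: UInt) { self.address = address }

  // Stand-in bodies: NON-volatile loads and stores, host-only.
  func load() -> Pointee {
    UnsafeMutablePointer<Pointee>(bitPattern: address)!.pointee
  }
  func store(_ value: Pointee) {
    UnsafeMutablePointer<Pointee>(bitPattern: address)!.pointee = value
  }
}

let statusBacking = UnsafeMutablePointer<UInt16>.allocate(capacity: 1)
statusBacking.initialize(to: 0)
let status = VolatilePointer<UInt16>(unsafe: UInt(bitPattern: statusBacking))
status.store(0xBEEF)
let statusReadBack = status.load()
statusBacking.deallocate()
```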

wadetregaskis · January 17, 2024, 6:12pm

That makes some sense… but isn't this 'volatile pointer' essentially a logical subclass of the regular pointer types? So is it purely the initialisation step that's unsafe, with UnsafePointer et al? I was under the impression the 'Unsafe' in the name was to represent the general expert-mode nature of raw pointer manipulation?

Joe_Groff · January 17, 2024, 6:14pm

For their use as an MMIO interface, volatile pointers are more or less completely distinct from normal pointers. If you're referencing a hardware register at a fixed address, that address is never going to be deallocated or reallocated, and there are no arithmetic operations on the type as proposed to let you get to other possibly-nonexistent addresses. The only "unsafe" part is getting the address right when you construct it.

scanon · January 17, 2024, 6:15pm

No; UnsafePointer has all sorts of operations (including arithmetic), which makes pretty much everything you could do with one potentially unsafe. [Unsafe]VolatilePointer is not a subtype of UnsafePointer and doesn't have any of these operations; all you can do is create one from an address, load, and store.

wadetregaskis · January 17, 2024, 6:34pm

I think I have to reiterate that using the word 'volatile' here is creating a lot of confusion, because some people are clearly interpreting that in a very specific way (e.g. C's "volatile" keyword) while others are not.

I get that this specific pitch is motivated by some specific work in a specific space (Embedded Swift), but I believe Swift is intended to be a general-purpose language. As such, people come to it from languages other than C, such as Java, or C#, where 'volatile' means something very different.

wes1 · January 17, 2024, 8:46pm

Based on the docs and discussion, below is a variation of the UnsafeVolatilePointer docc that might head off misperception and misuse with a narrower scope.

But it raises a question: is @_transparent transitive, such that a transitive-transitive call is inlined? i.e., can a library offer @_transparent function() to wrap a @_transparent VolatilePointer.load() such that the client code only gets the builtin LLVM volatile load? (Particularly when supplying simple-enum values?)

https://github.com/apple/swift/blob/main/docs/TransparentAttr.md
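The layering being asked about might look like the sketch below: a @_transparent library function that wraps a @_transparent load and maps the result to a simple enum. The names are hypothetical and the load is an ordinary, non-volatile stand-in so the code runs on a host. The sketch only shows the shape; whether the client ends up with a single load instruction would need to be confirmed by inspecting the generated code.

```swift
// Stand-in for the pitched pointer type; NON-volatile, host-only.
struct StandInVolatilePointer {
  let address: UInt
  init(bitPattern: UInt) { self.address = bitPattern }

  @_transparent
  func load() -> UInt8 {
    UnsafeMutablePointer<UInt8>(bitPattern: address)!.pointee
  }
}

// Hypothetical simple enum a driver library might expose.
enum PowerState: UInt8 {
  case off = 0
  case on = 1
}

// Library-level wrapper, itself marked @_transparent.
@_transparent
func readPowerState(at address: UInt) -> PowerState? {
  PowerState(rawValue: StandInVolatilePointer(bitPattern: address).load())
}

let powerBacking = UnsafeMutablePointer<UInt8>.allocate(capacity: 1)
powerBacking.initialize(to: 1)
let powerState = readPowerState(at: UInt(bitPattern: powerBacking))
powerBacking.deallocate()
```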

UnsafeVolatilePointer docc:

A pointer for LLVM "volatile" access to unsigned integers, suitable mainly for implementing memory-mapped device I/O without interference from compiler optimizations for normal memory access.

An LLVM volatile load or store cannot be added, removed, or reordered relative to other volatile operations by the compiler. However, they may be reordered with non-volatile operations and thus are not suitable for inter-thread synchronization.

Memory address errors may be fatal on initialization or use. These pointers are intended to be initialized not from dynamic memory but from fixed locations validated for a specific device.

The load and store methods are @_transparent, i.e., compiled directly into the call site (similar to @inline(__always)) and skipped over when debugging. Constrained devices may require eliminating any call overhead.

This API is intended only for use by device driver libraries internally mapping known memory locations using valid unsigned integers. The public API of such libraries may wish to avoid surfacing volatile addresses, operations, or values directly to client code if any errors are possible, consistent with the need to eliminate call overhead.