[Pitch] Unaligned Loads and Stores from Raw Memory

This is another in a series of proposed improvements to the UnsafePointer and UnsafeBufferPointer families. This time, we add a load operation from potentially-unaligned memory offsets.

I look forward to your feedback.

Unaligned Loads & Stores from Raw Memory

Introduction

Swift does not currently provide a clear way to load data from an arbitrary source of bytes, such as a binary file, in which data may be stored without respect for in-memory alignment. This proposal aims to rectify the situation, making workarounds unnecessary.

Swift-evolution thread:

Motivation

The method UnsafeRawPointer.load<T>(fromByteOffset offset: Int, as type: T.Type) -> T requires the address at self+offset to be properly aligned to access an instance of type T. Attempting to use a combination of pointer and byte offset that is not aligned for T results in a runtime crash. Unfortunately, data saved to files or network streams generally does not adhere to the same restrictions as in-memory layouts, and is often not properly aligned. When copying data from such sources to memory, Swift users therefore frequently encounter alignment mismatches that require a workaround.
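As a rough illustration of this precondition, the hypothetical helper isAligned below (not part of the standard library) tests whether self+offset is a multiple of MemoryLayout<T>.alignment, which is approximately the condition load(fromByteOffset:as:) checks.

```swift
// Hypothetical helper (not a standard library API): reports whether
// `pointer + offset` satisfies the alignment that `load(fromByteOffset:as:)`
// requires for an instance of `T`.
func isAligned<T>(_ pointer: UnsafeRawPointer, offset: Int = 0, for type: T.Type) -> Bool {
    let address = UInt(bitPattern: pointer + offset)
    // Alignments are powers of two; a zero remainder means the address is aligned.
    return address % UInt(MemoryLayout<T>.alignment) == 0
}

// Usage: allocate 8 bytes with 4-byte alignment, then probe two offsets.
let raw = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)
print(isAligned(raw, offset: 0, for: UInt32.self)) // true
print(isAligned(raw, offset: 3, for: UInt32.self)) // false
raw.deallocate()
```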

For example, given an arbitrary data stream in which a 4-byte value is encoded at byte offsets 3 through 6:

let data = Data([0x0, 0x0, 0x0, 0xff, 0xff, 0xff, 0xff, 0x0])

In order to extract all the 0xff bytes of this stream into a UInt32, we would like to be able to use load(as:), as follows:

let result = data.dropFirst(3).withUnsafeBytes { $0.load(as: UInt32.self) }

However, that will currently crash at runtime, because in this case load requires the base pointer to be correctly aligned for accessing UInt32. A workaround is required, such as the following:

let result = data.dropFirst(3).withUnsafeBytes { buffer -> UInt32 in
  var storage = UInt32.zero
  return withUnsafeMutableBytes(of: &storage) { scratch -> UInt32 in
    scratch.copyBytes(from: buffer.prefix(MemoryLayout<UInt32>.size))
    return scratch.load(as: UInt32.self)
  }
}

The necessity of this workaround (or of others that produce the same outcome) is unsatisfactory for two reasons. Firstly, it is tremendously non-obvious. Secondly, it requires two copies instead of the expected single copy: the first to a correctly-aligned raw buffer, and the second to the final, correctly-typed variable. We should be able to do this with a single copy.

The kinds of types for which it is important to improve loads from arbitrary alignments are types whose values can be copied bit for bit, without reference counting operations. These types are commonly referred to as "POD" (plain old data) or "trivial" types. We propose to restrict the use of the unaligned loading operation to those types.
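The workaround above can be generalized to any trivial type, which makes the two copies explicit. This is a minimal sketch: loadUnalignedViaScratch is a hypothetical name, and the trivial-type check relies on the standard library's underscored _isPOD function, an implementation detail with no stability guarantee.

```swift
// Hypothetical helper (not a standard library API) generalizing the workaround
// to any trivial type `T`. It performs two copies:
// raw bytes -> aligned scratch storage -> returned value.
func loadUnalignedViaScratch<T>(
    from buffer: UnsafeRawBufferPointer,
    fromByteOffset offset: Int = 0,
    as type: T.Type
) -> T {
    // `_isPOD` reports whether `T` is trivial (copyable bit for bit).
    precondition(_isPOD(T.self), "only trivial (POD) types are supported")
    let size = MemoryLayout<T>.size
    precondition(offset >= 0 && offset + size <= buffer.count, "out of bounds")

    // Copy 1: move the bytes into storage that is correctly aligned for `T`.
    let scratch = UnsafeMutableRawPointer.allocate(
        byteCount: size, alignment: MemoryLayout<T>.alignment)
    defer { scratch.deallocate() }
    if size > 0 {
        scratch.copyMemory(from: buffer.baseAddress! + offset, byteCount: size)
    }
    // Copy 2: an ordinary aligned load produces the result.
    return scratch.load(as: T.self)
}

let bytes: [UInt8] = [0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0x00]
let value = bytes.withUnsafeBytes {
    loadUnalignedViaScratch(from: $0, fromByteOffset: 3, as: UInt32.self)
}
print(value == UInt32.max) // true
```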

Proposed solution

We propose to add an API, UnsafeRawPointer.loadUnaligned(fromByteOffset:as:), to support unaligned loads from UnsafeRawPointer, UnsafeRawBufferPointer and their mutable counterparts. These loads will be explicitly restricted to POD types. Loading a non-POD type remains meaningful only when the source memory is another live object whose memory is, by construction, already correctly aligned. The original API will continue to support this case.

UnsafeMutableRawPointer.storeBytes(of:toByteOffset:as:) is documented to be meaningful only for POD types. However, it enforces at runtime (via a debug precondition) that the destination offset is correctly aligned. We propose to lift the alignment restriction while leaving the API unchanged and lightly updating its documentation. Please see the ABI stability section for a discussion of binary compatibility with this approach.

The UnsafeRawBufferPointer and UnsafeMutableRawBufferPointer types will receive matching changes.
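For illustration, the relaxed store behaviour can be approximated today by copying raw bytes instead of performing a typed store. This sketch uses a hypothetical helper name, storeBytesUnaligned, and the underscored _isPOD function for the trivial-type check; it is not the proposed implementation.

```swift
// Hypothetical helper (not a standard library API): writes the bytes of a
// trivial value at any byte offset, regardless of alignment, by copying raw
// bytes rather than performing a typed store.
func storeBytesUnaligned<T>(
    of value: T,
    into buffer: UnsafeMutableRawBufferPointer,
    toByteOffset offset: Int = 0
) {
    precondition(_isPOD(T.self), "only trivial (POD) types are supported")
    let size = MemoryLayout<T>.size
    precondition(offset >= 0 && offset + size <= buffer.count, "out of bounds")
    var copy = value
    withUnsafeBytes(of: &copy) { source in
        // Rebase the destination slice so that it starts at `offset`.
        let destination = UnsafeMutableRawBufferPointer(
            rebasing: buffer[offset ..< offset + size])
        destination.copyMemory(from: source)
    }
}

var storage = [UInt8](repeating: 0, count: 8)
storage.withUnsafeMutableBytes { buffer in
    // `.littleEndian` fixes the byte order so the output is platform-independent.
    storeBytesUnaligned(of: UInt32(0x0403_0201).littleEndian, into: buffer, toByteOffset: 3)
}
print(storage) // [0, 0, 0, 1, 2, 3, 4, 0]
```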

Detailed design

extension UnsafeRawPointer {
  /// Returns a new instance of the given type, constructed from the raw memory
  /// at the specified offset.
  ///
  /// This function only supports loading trivial types.
  /// A trivial type does not contain any reference-counted property
  /// within its in-memory representation.
  /// The memory at this pointer plus `offset` must be laid out
  /// identically to the in-memory representation of `T`.
  ///
  /// - Note: A trivial type can be copied with just a bit-for-bit copy without
  ///   any indirection or reference-counting operations. Generally, native
  ///   Swift types that do not contain strong or weak references or other
  ///   forms of indirection are trivial, as are imported C structs and enums.
  ///
  /// - Parameters:
  ///   - offset: The offset from this pointer, in bytes. `offset` must be
  ///     nonnegative. The default is zero.
  ///   - type: The type of the instance to create.
  /// - Returns: A new instance of type `T`, read from the raw bytes at
  ///   `offset`. The returned instance isn't associated
  ///   with the value in the range of memory referenced by this pointer.
  public func loadUnaligned<T>(fromByteOffset offset: Int = 0, as type: T.Type) -> T
}
extension UnsafeMutableRawPointer {
  /// Returns a new instance of the given type, constructed from the raw memory
  /// at the specified offset.
  ///
  /// This function only supports loading trivial types.
  /// A trivial type does not contain any reference-counted property
  /// within its in-memory representation.
  /// The memory at this pointer plus `offset` must be laid out
  /// identically to the in-memory representation of `T`.
  ///
  /// - Note: A trivial type can be copied with just a bit-for-bit copy without
  ///   any indirection or reference-counting operations. Generally, native
  ///   Swift types that do not contain strong or weak references or other
  ///   forms of indirection are trivial, as are imported C structs and enums.
  ///
  /// - Parameters:
  ///   - offset: The offset from this pointer, in bytes. `offset` must be
  ///     nonnegative. The default is zero.
  ///   - type: The type of the instance to create.
  /// - Returns: A new instance of type `T`, read from the raw bytes at
  ///   `offset`. The returned instance isn't associated
  ///   with the value in the range of memory referenced by this pointer.
  public func loadUnaligned<T>(fromByteOffset offset: Int = 0, as type: T.Type) -> T

  /// Stores the given value's bytes into raw memory at the specified offset.
  ///
  /// The type `T` to be stored must be a trivial type. The memory
  /// must also be uninitialized, initialized to `T`, or initialized to
  /// another trivial type that is layout compatible with `T`.
  ///
  /// After calling `storeBytes(of:toByteOffset:as:)`, the memory is
  /// initialized to the raw bytes of `value`. If the memory is bound to a
  /// type `U` that is layout compatible with `T`, then it contains a value of
  /// type `U`. Calling `storeBytes(of:toByteOffset:as:)` does not change the
  /// bound type of the memory.
  ///
  /// - Note: A trivial type can be copied with just a bit-for-bit copy without
  ///   any indirection or reference-counting operations. Generally, native
  ///   Swift types that do not contain strong or weak references or other
  ///   forms of indirection are trivial, as are imported C structs and enums.
  ///
  /// If you need to store a copy of a value of a type that isn't trivial into memory,
  /// you cannot use the `storeBytes(of:toByteOffset:as:)` method. Instead, you must know
  /// the type of value previously in memory and initialize or assign the
  /// memory. For example, to replace a value stored in a raw pointer `p`,
  /// where `U` is the current type and `T` is the new type, use a typed
  /// pointer to access and deinitialize the current value before initializing
  /// the memory with a new value.
  ///
  ///     let typedPointer = p.bindMemory(to: U.self, capacity: 1)
  ///     typedPointer.deinitialize(count: 1)
  ///     p.initializeMemory(as: T.self, repeating: newValue, count: 1)
  ///
  /// - Parameters:
  ///   - value: The value to store as raw bytes.
  ///   - offset: The offset from this pointer, in bytes. `offset` must be
  ///     nonnegative. The default is zero.
  ///   - type: The type of `value`.
  public func storeBytes<T>(of value: T, toByteOffset offset: Int = 0, as type: T.Type)
}

UnsafeRawBufferPointer and UnsafeMutableRawBufferPointer receive a similar loadUnaligned function. It enables loading from an arbitrary offset within the buffer, subject to the usual index validation rules of BufferPointer types: indexes are checked when client code is compiled in debug mode, and unchecked when client code is compiled in release mode.

extension Unsafe{Mutable}RawBufferPointer {
  /// Returns a new instance of the given type, constructed from the raw memory
  /// at the specified offset.
  ///
  /// This function only supports loading trivial types.
  /// A trivial type does not contain any reference-counted property
  /// within its in-memory stored representation.
  /// The memory at `offset` bytes into the buffer must be laid out
  /// identically to the in-memory representation of `T`.
  ///
  /// - Note: A trivial type can be copied with just a bit-for-bit copy without
  ///   any indirection or reference-counting operations. Generally, native
  ///   Swift types that do not contain strong or weak references or other
  ///   forms of indirection are trivial, as are imported C structs and enums.
  ///
  /// You can use this method to create new values from the buffer pointer's
  /// underlying bytes. The following example creates two new `Int32`
  /// instances from the memory referenced by the buffer pointer `someBytes`.
  /// The bytes for `a` are copied from the first four bytes of `someBytes`,
  /// and the bytes for `b` are copied from the next four bytes.
  ///
  ///     let a = someBytes.loadUnaligned(as: Int32.self)
  ///     let b = someBytes.loadUnaligned(fromByteOffset: 4, as: Int32.self)
  ///
  /// The memory to read for the new instance must not extend beyond the buffer
  /// pointer's memory region---that is, `offset + MemoryLayout<T>.size` must
  /// be less than or equal to the buffer pointer's `count`.
  ///
  /// - Parameters:
  ///   - offset: The offset, in bytes, into the buffer pointer's memory at
  ///     which to begin reading data for the new instance. The buffer pointer
  ///     plus `offset` does not need to be aligned for accessing an instance
  ///     of type `T`. The default is zero.
  ///   - type: The type to use for the newly constructed instance. The memory
  ///     must be initialized to a value of a type that is layout compatible
  ///     with `type`.
  /// - Returns: A new instance of type `T`, copied from the buffer pointer's
  ///   memory.
  public func loadUnaligned<T>(fromByteOffset offset: Int = 0, as type: T.Type) -> T
}

Additionally, the semantics of UnsafeMutableRawBufferPointer.storeBytes(of:toByteOffset:as:) will be changed in the same way as those of its counterpart UnsafeMutableRawPointer.storeBytes(of:toByteOffset:as:), no longer enforcing alignment at runtime. Again, the index validation behaviour is unchanged: indexes are checked when client code is compiled in debug mode, and unchecked when client code is compiled in release mode.

extension UnsafeMutableRawBufferPointer {
  /// Stores a value's bytes into the buffer pointer's raw memory at the
  /// specified byte offset.
  ///
  /// The type `T` to be stored must be a trivial type. The memory must also be
  /// uninitialized, initialized to `T`, or initialized to another trivial
  /// type that is layout compatible with `T`.
  ///
  /// The memory written to must not extend beyond the buffer pointer's memory
  /// region---that is, `offset + MemoryLayout<T>.size` must be less than or
  /// equal to the buffer pointer's `count`.
  ///
  /// After calling `storeBytes(of:toByteOffset:as:)`, the memory is
  /// initialized to the raw bytes of `value`. If the memory is bound to a
  /// type `U` that is layout compatible with `T`, then it contains a value of
  /// type `U`. Calling `storeBytes(of:toByteOffset:as:)` does not change the
  /// bound type of the memory.
  ///
  /// - Note: A trivial type can be copied with just a bit-for-bit copy without
  ///   any indirection or reference-counting operations. Generally, native
  ///   Swift types that do not contain strong or weak references or other
  ///   forms of indirection are trivial, as are imported C structs and enums.
  ///
  /// If you need to store a copy of a value of a type that isn't trivial into memory,
  /// you cannot use the `storeBytes(of:toByteOffset:as:)` method. Instead, you must know
  /// the type of value previously in memory and initialize or assign the memory.
  ///
  /// - Parameters:
  ///   - value: The value to store as raw bytes.
  ///   - offset: The offset, in bytes, into the buffer pointer's memory at
  ///     which to begin writing the bytes of `value`. The buffer pointer plus
  ///     `offset` does not need to be aligned for accessing an instance of
  ///     type `T`. The default is zero.
  ///   - type: The type of `value`.
  public func storeBytes<T>(of value: T, toByteOffset offset: Int = 0, as type: T.Type)
}

Source compatibility

This proposal is source compatible. The proposed API modifications relax existing restrictions while keeping the same signatures, so they are source compatible. The API additions are source compatible by definition.

Effect on ABI stability

Existing binaries that expect the old behaviour of storeBytes will not be affected by the relaxed behaviour proposed here, as we will ensure that the old symbol (with its existing semantics) will remain.

New binaries that require the new behaviour will correctly backwards deploy by the use of the @_alwaysEmitIntoClient attribute. The new API will likewise use the @_alwaysEmitIntoClient attribute.

Effect on API resilience

If the added API were removed in a future release, the change would be source-breaking but not ABI-breaking, because the proposed additions will always be inlined.

Alternatives considered

Use a marker protocol to restrict unaligned loads to trivial types

We could enforce at compile time that unaligned loads are used only with trivial types, by declaring a new marker protocol for trivial types and requiring conformance to it for any type loaded through a function that can load from unaligned offsets. While this may be the ideal outcome, we believe this option would take too long to be realized. The approach proposed here can be a stepping stone on the way there.

Relax the alignment restriction on the existing load API

Arguably, user expectations are that the load API supports unaligned loads, but since that is not the case with the existing API, source-compatibility considerations dictate that the behaviour of the existing API should not change. If its preconditions were relaxed, a developer would encounter runtime crashes when deploying to a server using Swift 5.5, having tested using a newer toolchain.

For that reason, we chose to leave the existing API untouched.

Add a separate unaligned store API

Adding a separate unaligned store API would avoid ABI stability concerns, but the old API would become redundant. The risk of removing the restriction on storeBytes is less than it is for load, as the restriction is implemented using _debugPrecondition, which is compiled away in release mode.

Rename storeBytes to storeUnaligned, or call loadUnaligned loadFromBytes instead

The idea of giving the "load" and "store" operations more symmetric names is compelling; however, there is a fundamental asymmetry in the operations themselves. When a load operation completes, a new value is created to be managed by the Swift runtime. On the other hand, the storeBytes operation is completely transparent to the runtime: the destination is a container of bytes that is not managed by the Swift runtime. For this reason, the "store" operation has the word "bytes" in its name.

Acknowledgments

Thanks to the Swift Standard Library team for valuable feedback and discussion.

Can we please consider exposing the Trivial marker protocol for this?

We have considered it, and we believe it would delay this feature unreasonably. It is an intended next step for this and other features.

Source compatibility means all existing code has to work when compiled against a newer standard library. It does not mean all new code has to work when compiled against an older standard library.

I'm in favor of this variant of the proposal (and a possible loadAligned when performance is key) because of the principle that the default thing should be correct. Yes, we're already in UnsafePointer territory; yes, there's still the huge pitfall of bindMemory; but I still think it'd be a better user experience to get slower but always valid loads by default.

That said, binary compatibility may prove to be a limiting factor here: on Apple platforms, code compiled with Swift 5.6 must run with an old implementation of UnsafeRawPointer.load back to Swift 5.0. I suspect it'd be safe because a non-inlined version of the code would be generic, and therefore already using something like memcpy, but someone would have to verify.

This is not a problem; we can mark the "new" version @_alwaysEmitIntoClient (aEIC) for back-deployment, while keeping the "old" version around and marked unavailable, with appropriate dancing around mangling.

We don't have a nice solution like this for people building from source on older toolchains, however. While we don't have to support that use pattern, it is nice to do so when we can, to eliminate surprises. "This API is also available in your toolchain, but has a totally different behavior" is a nasty surprise.

Overall +1. I've been using this for years and suggesting the same workaround as described in the proposal whenever the issue comes up on these forums.

Does it? I'm not sure the scratch.load(as: UInt32.self) is actually necessary. You made a mutable raw pointer from storage and copied some bytes in to it; no need to load again (although I'd expect the compiler to eliminate it anyway).

The biggest advantage, IMO, is that it removes the closure used by withUnsafeMutableBytes. I've seen lots of situations where the compiler won't actually inline that closure, and you end up with a function call to closure #3 in someFunc. It can really hurt performance, and there's nothing you as a Swift user can do about it.

I think the new function should assert that the loaded type is POD. Even unsafe APIs do try to validate their inputs in debug builds.

Imagine, for example, that you're loading a POD type defined in a 3rd-party library, and you update your dependencies, and suddenly the library has made the type non-POD (which is totally fine - POD-ness is not part of any API contract we can express today). I think you should be notified loudly when that changes.

Related to that, we should also fail if somebody tries to load a resilient type.
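A debug-only check along these lines can be sketched with the standard library's underscored _isPOD function (an implementation detail with no stability guarantee). The names assertTrivial, Point and Named are hypothetical, and this sketch does not attempt the resilient-type check mentioned above.

```swift
// Sketch of a debug-only trivial-type check. `assert` is compiled out in
// optimized builds, like other debug checks in unsafe APIs.
func assertTrivial<T>(_ type: T.Type) {
    assert(_isPOD(type), "\(type) is not a trivial (POD) type")
}

struct Point { var x: Int32; var y: Int32 }   // trivial: plain integer fields
struct Named { var name: String; var id: Int } // not trivial: String holds reference-counted storage

print(_isPOD(Point.self)) // true
print(_isPOD(Named.self)) // false
assertTrivial(Point.self)  // passes silently
```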

That is unfortunate. IIRC, we already have layout constraints (e.g. _Trivial, _Trivial(64), etc.), but they are only used by @_specialize. I'd very much like to see them usable as real generic constraints one day.

I think I'd be in favour of deprecating (not removing) the old API, so we'd have:

  • loadAligned
  • loadUnaligned
  • storeBytes

I'm not such a fan of plain load being the strictest version; and if we can't remove it due to source compatibility, we should at least give it a better name and discourage the old name. It would also help reinforce this point about loads and stores having different alignment considerations:

And one more thing:

Comment is *ahem* unaligned 🙃

There are various ways to write the workaround, but in any case there must be an intermediate storage location, which means there are two copies: from the original bytes to the intermediate, and from the intermediate to the destination.

It does.

My tools failed me.

I agree with this statement in principle. There's actually no bindMemory pitfall with raw pointers, which is why this is such an important API.

On the other hand, we may want users reaching for the aligned version first. That saves them from some potentially serious performance pitfalls because of accidental misalignment, which they're very unlikely to find during testing. If they do in fact need misaligned access but reach for the wrong thing, they'll presumably find out quickly during testing.

I think this is highly debatable though. The backward source compatibility argument just tipped me over to the loadUnaligned side without some very strong evidence to counter it.

Incidentally, we could add a default aligned store along with storeUnaligned later. I just don't think that's worth doing until we have a Trivial layout constraint.

I'm a big +1 on this pitch. I've been implementing half-functional versions of loadUnaligned in all kinds of codebases, so I'm delighted to get a proper version that's more generally useful.

Will this support C unions? If I understand correctly, C unions take zero bytes in C, but the Swift equivalent (using enums) takes up at least one byte, making it impossible to map between the two.

No, it won't. This is not a C interoperability feature, in the sense that it does not change Swift's interaction with C.

I'm not sure what you're really asking. C unions are the size of their largest member, so are generally not zero bytes unless empty or composed only of zero-byte types. An empty Swift enum or struct is also a zero-byte type, with a stride of one byte.
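The zero-size, one-byte-stride behaviour of empty Swift types can be checked with MemoryLayout; EmptyStruct and EmptyEnum are illustrative names.

```swift
// Empty types occupy zero bytes, yet consecutive elements still advance by one byte.
struct EmptyStruct {}
enum EmptyEnum {}

print(MemoryLayout<EmptyStruct>.size, MemoryLayout<EmptyStruct>.stride) // 0 1
print(MemoryLayout<EmptyEnum>.size, MemoryLayout<EmptyEnum>.stride)     // 0 1
```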

This is a much-needed change. I would strongly prefer we make the existing APIs allow unaligned loads and stores, instead of introducing new ones. We don't usually consider broadening an API to be a source-breaking change, and as Steve noted, these should be always-inlined anyway, so there is no ABI concern.

For the purposes of future-proofing across new architectures that Swift might target, we definitely want to have explicitly-aligned loads. There exists, and will continue to exist, hardware on which load operations that are not statically aligned require expensive software or hardware fixup. That could be a new operation, or it could be the existing operation, but we want it to exist in the standard library.

+1. I’ve needed this plenty of times.

However, I really don’t like leaving the existing load function as it is. It’s a footgun, and almost never what you want. If we can’t change it to do unaligned loads, I kinda like @Karl’s suggestion of deprecating it and introducing both loadAligned and loadUnaligned. That means assumptions about alignedness are at least documented in the code.

Can you expand on this? Except when you're interacting with a data format that requires explicit unaligned accesses, we want people to be aligning their operations as much as possible for performance reasons. Aligned loads and stores should be the 99% use case for most programs, even when working with raw memory.

I've seen a lot of claims either way - e.g. Data alignment for speed: myth or reality? – Daniel Lemire's blog

Still, I agree that alignment is generally the best thing for portable performance, but it might not be critical on modern processors. Loads which cross cache-lines seem to be more consistently bad for performance, but even then it appears that sometimes such loads can be predicted.

This is absolutely true for the major architectures we are currently building for, but we are aiming for a language that has the ability to correctly support different architectures in the future. In any case aligned-by-default reduces the probability of a load that straddles a cache line.

Loads that straddle cache-lines are bad but not terrible. Loads, and especially stores, that straddle page boundaries are significant pain points. Exact numbers depend on uArch details, but it's certainly not uncommon for streaming unaligned stores to have an amortized cost of 1.5-2x. Even when they're not slow themselves, unaligned memory accesses also defeat other hardware optimizations, such as store to load forwarding.

It's not really a question of prediction, but rather of building the resources in the memory hierarchy to handle it without needing to replay a lot of work. The Intel optimization manual has some good information on the subject for Intel CPUs, as do Agner Fog's notes. The main thing to observe is that relatively simpler designs have higher costs for misalignment, because they have less budget for transistors to handle the fixup; they're more likely to stall and replay the access in a slow-but-careful mode, or worse, trap and let software fix it. So it's not a big issue for computer or phone CPUs, but lower-power devices and area- or power-constrained devices are more likely to encounter problems.
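The discussion above turns on whether an access straddles a cache line or a page. As a rough illustration, assuming 64-byte cache lines and 4 KiB pages (both vary by microarchitecture), the hypothetical helper below checks whether an access of size bytes at address crosses a boundary. A naturally aligned access whose size divides the boundary never does, which is why aligned-by-default avoids these cases.

```swift
// Hypothetical helper: does an access of `size` bytes starting at `address`
// cross a multiple of `boundary`? Assumes `boundary > 0`.
func straddlesBoundary(address: UInt, size: UInt, boundary: UInt) -> Bool {
    return (address % boundary) + size > boundary
}

let cacheLine: UInt = 64 // assumption: 64-byte cache lines
let page: UInt = 4096    // assumption: 4 KiB pages

print(straddlesBoundary(address: 60, size: 4, boundary: cacheLine)) // false: ends exactly at the line boundary
print(straddlesBoundary(address: 62, size: 4, boundary: cacheLine)) // true: split across two lines
print(straddlesBoundary(address: 4094, size: 4, boundary: page))    // true: split across two pages
```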

Having an alignment-required variant seems reasonable to me. One way to express that might be to give UnsafeRawPointer assertingAlignment and/or unsafelyAssumingAlignment methods, so you could write ptr.assertingAlignment(as: Int.self).load(...).
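A minimal sketch of what the suggested assertingAlignment method might look like, written as an extension on UnsafeRawPointer. The method is hypothetical (it does not exist in the standard library), and this version checks alignment only in debug builds, via assert.

```swift
// Hypothetical API sketch: check alignment in debug builds, then return the
// pointer unchanged so that an ordinary aligned `load` can follow.
extension UnsafeRawPointer {
    func assertingAlignment<T>(as type: T.Type) -> UnsafeRawPointer {
        assert(UInt(bitPattern: self) % UInt(MemoryLayout<T>.alignment) == 0,
               "pointer is not aligned for \(T.self)")
        return self
    }
}

let p = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 8)
p.storeBytes(of: 42, as: Int.self)
let value = UnsafeRawPointer(p).assertingAlignment(as: Int.self).load(as: Int.self)
print(value) // 42
p.deallocate()
```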