[Pitch] Unaligned Loads and Stores from Raw Memory

I agree with this statement in principle. There's actually no bindMemory pitfall with raw pointers, which is why this is such an important API.

On the other hand, we may want users reaching for the aligned version first. That saves them from some potentially serious performance pitfalls because of accidental misalignment, which they're very unlikely to find during testing. If they do in fact need misaligned access but reach for the wrong thing, they'll presumably find out quickly during testing.

I think this is highly debatable, though. The backward source compatibility argument tipped me over to the loadUnaligned side, absent some very strong evidence to counter it.

Incidentally, we could add a default aligned store along with storeUnaligned later. I just don't think that's worth doing until we have a Trivial layout constraint.

I'm a big +1 on this pitch. I've been implementing half-functional versions of loadUnaligned in all kinds of codebases, so I'm delighted to get a proper version that's more generally useful.

Will this support C unions? If I understand correctly, C unions take zero bytes in C, but the Swift equivalent (using enums) takes up at least one byte, making it impossible to map between the two.

No it won’t. This is not a C interoperability feature, in the sense that it does not change Swift's interaction with C.

I'm not sure what you're really asking. C unions are the size of their largest member, so are generally not zero bytes unless empty or composed only of zero-byte types. An empty Swift enum or struct is also a zero-byte type, with a stride of one byte.
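
The size-versus-stride point can be checked with MemoryLayout. This is only a sketch: the type names below are made up for illustration, and the enum's exact size is printed rather than asserted, since it depends on how the compiler lays out the case tag.

```swift
// Hypothetical types, for illustration only.
struct EmptyStruct {}
enum IntOrFloat {
    case int(Int32)
    case float(Float)
}

// An empty type occupies zero bytes, but its stride is one byte so that
// consecutive elements in a buffer still have distinct addresses.
let emptySize = MemoryLayout<EmptyStruct>.size
let emptyStride = MemoryLayout<EmptyStruct>.stride
print("EmptyStruct size:", emptySize, "stride:", emptyStride)

// A Swift enum with payloads has to remember which case is active, so it
// is not laid out like a C union of the same members.
print("IntOrFloat size:", MemoryLayout<IntOrFloat>.size)
```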

This is a much-needed change. I would strongly prefer we make the existing APIs allow unaligned loads and stores, instead of introducing new ones. We don't usually consider broadening an API to be a source-breaking change, and as Steve noted, these should be always-inlined anyway, so there is no ABI concern.

For the purposes of future-proofing across new architectures that Swift might target, we definitely want to have explicitly-aligned loads. There exists, and will continue to exist, hardware on which load operations that are not statically aligned require expensive software or hardware fix up. That could be a new operation, or it could be the existing operation, but we want it to exist in the standard library.

+1. I’ve needed this plenty of times.

However, I really don’t like leaving the existing load function as it is. It’s a footgun, and almost never what you want. If we can’t change it to do unaligned loads, I kinda like @Karl’s suggestion of deprecating it and introducing both loadAligned and loadUnaligned. That means assumptions about alignedness are at least documented in the code.

Can you expand on this? Except when you're interacting with a data format that requires explicit unaligned accesses, we want people to be aligning their operations as much as possible for performance reasons. Aligned loads and stores should be the 99% use case for most programs, even when working with raw memory.

I've seen a lot of claims either way, e.g. "Data alignment for speed: myth or reality?" on Daniel Lemire's blog.

Still, I agree that alignment is generally the best thing for portable performance, but it might not be critical on modern processors. Loads which cross cache-lines seem to be more consistently bad for performance, but even then it appears that sometimes such loads can be predicted.

This is absolutely true for the major architectures we are currently building for, but we are aiming for a language that has the ability to correctly support different architectures in the future. In any case aligned-by-default reduces the probability of a load that straddles a cache line.

Loads that straddle cache-lines are bad but not terrible. Loads, and especially stores, that straddle page boundaries are significant pain points. Exact numbers depend on uArch details, but it's certainly not uncommon for streaming unaligned stores to have an amortized cost of 1.5-2x. Even when they're not slow themselves, unaligned memory accesses also defeat other hardware optimizations, such as store to load forwarding.

It's not really a question of prediction, rather building the resources in the memory hierarchy to handle it without needing to replay a lot of work. The Intel optimization manual has some good information on the subject for Intel CPUs, as do Agner Fog's notes. The main thing to observe is that relatively simpler designs have higher costs for misalignment, because they have less budget for transistors to handle the fixup; they're more likely to stall and replay the access in a slow-but-careful mode, or worse, trap and let software fix it. So it's not a big issue for computer or phone CPUs, but lower-power devices and area- or power-constrained devices are more likely to encounter problems.

Having an alignment-required variation seems reasonable to me. One way to express that might also be to give UnsafeRawPointer assertingAlignment and/or unsafelyAssumingAlignment methods, so you could write ptr.assertingAlignment(as: Int).load....
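
To make that suggestion concrete, here is a rough sketch of what an asserting variant might look like. The assertingAlignment(as:) method below is hypothetical and written only for illustration; it is not a standard library API, and a real design might differ in naming and in how the check is elided in optimized builds.

```swift
extension UnsafeRawPointer {
    /// Hypothetical helper, not a standard library API: traps if the
    /// pointer is not suitably aligned for `T`, otherwise returns it.
    func assertingAlignment<T>(as type: T.Type) -> UnsafeRawPointer {
        precondition(
            Int(bitPattern: self) % MemoryLayout<T>.alignment == 0,
            "pointer is not aligned for \(T.self)")
        return self
    }
}

// Allocate 16 bytes with 8-byte alignment and store an Int at offset 0.
let buffer = UnsafeMutableRawPointer.allocate(byteCount: 16, alignment: 8)
buffer.storeBytes(of: 42, as: Int.self)

// The alignment check passes here, so the ordinary aligned load follows.
let value = UnsafeRawPointer(buffer)
    .assertingAlignment(as: Int.self)
    .load(as: Int.self)
print(value)

buffer.deallocate()
```

With this shape, the alignment assumption is spelled out at the call site, and a misaligned pointer fails at the check rather than somewhere inside the load.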

The UnsafeRawPointer APIs work with imported C unions. You can try it today. This proposal doesn't add anything new in that respect.

Converting from C unions to Swift enums is an unrelated problem. Their layouts are not compatible except in certain single-payload cases dictated by the ABI.

Interacting with binary serialization formats that require unaligned accesses (e.g. formats like BSON) has been 100% of my use cases. load looks like exactly the right API for that kind of thing, and when it doesn’t work, you’re left thinking that you’re “holding it wrong”, because surely the whole withUnsafe… dance can’t be the best way.
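
For readers who haven't hit this, here is a sketch of the kind of workaround being described, next to the pitched alternative. The byte layout is a made-up example rather than real BSON, and the second half assumes a toolchain in which loadUnaligned(fromByteOffset:as:) is available under the name used in this pitch.

```swift
// A made-up packed record: one tag byte, then a little-endian UInt32.
let bytes: [UInt8] = [0x01, 0x78, 0x56, 0x34, 0x12]

// The workaround: copy the four payload bytes into an aligned local
// value. Calling load(fromByteOffset: 1, as: UInt32.self) directly
// would be a misaligned load, which is the case under discussion.
var copied: UInt32 = 0
withUnsafeMutableBytes(of: &copied) { destination in
    bytes.withUnsafeBytes { source in
        destination.copyMemory(
            from: UnsafeRawBufferPointer(rebasing: source[1..<5]))
    }
}
let viaCopy = UInt32(littleEndian: copied)

// The pitched API (assumed available): one call, no temporary.
let viaLoadUnaligned = bytes.withUnsafeBytes { source in
    UInt32(littleEndian:
        source.loadUnaligned(fromByteOffset: 1, as: UInt32.self))
}

print(String(viaCopy, radix: 16), String(viaLoadUnaligned, radix: 16))
```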

I agree with the idea of broadening the current load API to support this, instead of introducing a new one.

The necessity of this workaround (or of others that produce the same outcome) is unsatisfactory for two reasons; firstly it is tremendously non-obvious.

I'm worried that knowing to use loadUnaligned(fromByteOffset:as:) instead would be equally non-obvious.

If loadUnaligned(fromByteOffset:as:) were to exist, would there be any reason to call load(fromByteOffset:as:)? Is there anything I'm overlooking?

For better or worse, there are platforms that don’t support unaligned loads, and on those platforms this would have to be implemented by loading bytes and stitching them together (effectively memcpy). Even on x86 aligned loads are faster but I don’t know how much statically knowing that buys you.
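
As an illustration of "loading bytes and stitching them together", here is a minimal sketch for one fixed case, a little-endian UInt32. The helper name is made up, and this is not a claim about how the standard library would implement an unaligned load.

```swift
// Hypothetical helper: assemble a little-endian UInt32 one byte at a
// time. Single-byte loads are always aligned, so this works even on
// targets without hardware support for unaligned access.
func stitchedLittleEndianUInt32(_ base: UnsafeRawPointer, offset: Int) -> UInt32 {
    var result: UInt32 = 0
    for i in 0..<4 {
        let byte = base.load(fromByteOffset: offset + i, as: UInt8.self)
        result |= UInt32(byte) << (8 * i)
    }
    return result
}

// The UInt32 payload starts at offset 1, so it is not 4-byte aligned
// relative to the start of the array's storage.
let packed: [UInt8] = [0xFF, 0xEF, 0xBE, 0xAD, 0xDE]
let stitched = packed.withUnsafeBytes { raw in
    stitchedLittleEndianUInt32(raw.baseAddress!, offset: 1)
}
print(String(stitched, radix: 16))
```

The four byte loads and shifts are the cost being referred to; a statically aligned load lets the compiler emit a single load instruction instead.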

Even on x86 aligned loads are faster but I don’t know how much statically knowing that buys you.

It allows you to prevent performance bugs from creeping in by making it so that unaligned accesses will fail, producing easy-to-diagnose errors immediately instead of performance regressions that may go undetected for months.

This proposal is actually in active review, so if you have thoughts about it, please post them there instead of here in the pitch thread.
