@Andrew_Trick, @SusanCheng, alignment is one of the things I've yet to strongly grasp. Could you help me understand what my options are (today) if the received byte stream is "packed", i.e. potentially not aligned the way Swift would like the values to be? My intuition is that if you're accessing a value in place and it is misaligned that seems bad, but my understanding was that UnsafeRawPointer.load(as:) first makes a copy of the data so it is never associating the type with the original (potentially unaligned) bytes, which seems... less bad?
Although load(as:) doesn't associate a type with the memory being loaded, it still unfortunately assumes that the pointer is aligned properly for a value of that type. Like in C, you could use memcpy or Swift's equivalent UnsafeMutableRawPointer.copyMemory method to do unaligned loads and stores. I usually add some helper methods to UnsafeRawPointer to facilitate this; as Andrew said, it'd be great to have these in the standard library:
import Foundation  // provides memcpy

extension UnsafeRawPointer {
    func loadUnaligned<T>(as: T.Type) -> T {
        // Relies on the type being POD (no refcounting or other management).
        assert(_isPOD(T.self))
        // Temporary storage that is correctly aligned for T.
        let buffer = UnsafeMutablePointer<T>.allocate(capacity: 1)
        defer { buffer.deallocate() }
        // Byte-wise copy, so the source address may have any alignment.
        memcpy(buffer, self, MemoryLayout<T>.size)
        return buffer.pointee
    }
}
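For comparison, here is a minimal sketch of the same idea using copyMemory instead of memcpy. The name loadUnalignedCopy is made up for this illustration, and the FixedWidthInteger constraint is an assumption that stands in for the POD check by restricting the helper to integer types.

```swift
extension UnsafeRawPointer {
    /// Hypothetical helper (not a standard library API): copies the bytes at
    /// this address into a properly aligned local value and returns it.
    func loadUnalignedCopy<T: FixedWidthInteger>(as type: T.Type) -> T {
        var value: T = 0
        withUnsafeMutableBytes(of: &value) { destination in
            // Byte-wise copy; the source address may have any alignment.
            destination.baseAddress!.copyMemory(from: self,
                                                byteCount: MemoryLayout<T>.size)
        }
        return value
    }
}

// A 32-bit field that starts at byte offset 1, so it is not 4-byte aligned.
let packedBytes: [UInt8] = [0xFF, 0x78, 0x56, 0x34, 0x12]
let rawField = packedBytes.withUnsafeBytes { raw in
    (raw.baseAddress! + 1).loadUnalignedCopy(as: UInt32.self)
}
// The copy is in memory order; interpret it as little endian explicitly.
let decodedField = UInt32(littleEndian: rawField)
print(String(decodedField, radix: 16))  // prints "12345678"
```

Swift 5.7 and later ship loadUnaligned(fromByteOffset:as:) on UnsafeRawPointer (SE-0349), which covers this case directly where it is available.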
Not officially, but there isn't an ordained way to do so.
Currently, yes. A withScopedAllocation { } API to give you a pointer to temporary, uninitialized memory would also be a great addition. If you have a balanced malloc/free, however, note that it will get promoted away by LLVM.
Unfortunately, memcpy is currently the way to load misaligned data (semantically one byte at a time). Or you can load the bytes yourself and piece them together with bitwise operators.
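The "piece them together with bitwise operators" option could look roughly like the sketch below. The function name readUInt32LE and the choice of little-endian order are assumptions for illustration only; nothing here is a standard library API.

```swift
/// Hypothetical helper: assembles a little-endian UInt32 from the four bytes
/// starting at `offset`, one byte at a time, so alignment never matters.
func readUInt32LE(_ bytes: [UInt8], at offset: Int) -> UInt32? {
    // Bounds check instead of trusting the incoming data.
    guard offset >= 0, offset <= bytes.count - 4 else { return nil }
    var result: UInt32 = 0
    for index in 0..<4 {
        // Byte 0 is the least significant byte in little-endian order.
        result |= UInt32(bytes[offset + index]) << (8 * index)
    }
    return result
}

let sample: [UInt8] = [0xFF, 0x78, 0x56, 0x34, 0x12]
if let value = readUInt32LE(sample, at: 1) {
    print(String(value, radix: 16))  // prints "12345678"
}
```

Because the value is built arithmetically, the result is the same on big-endian and little-endian hosts.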
This is obviously an unsatisfactory way to handle packed data. I think there's an argument to be made for having UnsafeRawPointer.load default to loading unaligned data. That wasn't done initially because:
- compiler support didn't exist, but that's easy to add
- ignoring alignment may indicate a programmer error; this way you're forced to think about alignment and make it explicit
- we wanted higher-level APIs to be expressible in terms of raw pointers without changing semantics or losing performance
- we can loosen this restriction over time but can't strengthen it
It'd be great if load and store just worked with unaligned data, I agree. That would require constraining them to POD types, or at least supporting unaligned accesses only for POD types, since generic non-POD types don't have value witnesses for unaligned accesses. That's probably fine, and supporting unaligned loads only for POD types would be backward compatible with the current semantics.
@Andrew_Trick I'm a long-time Swift user but new to contributing: Is there anything I can do to help add weight to that [SR-10273]? I know with bugreporter.apple.com the WWDC recommendation is to file duplicates to express cumulative interest.
@cconway Bringing it up on this forum was a good start. I added an explanation about how to proceed in the bug. Essentially, someone needs to follow through with a prototype. I could help with that. Then someone needs to drive the Swift Evolution proposal to decide whether to change default behavior or simply add a new public API flag.
Ok, I will plan to do something like the extension @Joe_Groff posted above for the time being.
I'm curious, however: The code snippet that @SusanCheng posted seems to suggest that calling UnsafeRawPointer.load(fromByteOffset:as:) in an unaligned case will trigger a stop in execution due to _debugPrecondition(), but I haven't noticed any issues with my non-alignment-aware parsing code prior to Xcode 10.2. Have I just gotten lucky that my binary data was aligned and didn't trigger the precondition? Say I do end up parsing misaligned data in a production build without a mitigation like memcpy: Would I expect a crash, poor performance, unpredictable behavior, or something else?
My code does the same thing as UnsafeRawPointer.load but without the alignment check. That means it just force-loads a value of type T from memory and ignores the alignment.
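For anyone wanting to see what an alignment check amounts to, here is a minimal sketch that tests an address against MemoryLayout<T>.alignment. The function isAligned is a hypothetical name for illustration; it is not the standard library's internal check, only the same idea.

```swift
/// Hypothetical helper: true when `pointer`'s address is a multiple of the
/// alignment that T requires.
func isAligned<T>(_ pointer: UnsafeRawPointer, for type: T.Type) -> Bool {
    return UInt(bitPattern: pointer) % UInt(MemoryLayout<T>.alignment) == 0
}

// Allocate storage with a known 8-byte alignment to make the result predictable.
let storage = UnsafeMutableRawPointer.allocate(byteCount: 16, alignment: 8)
print(isAligned(storage, for: UInt32.self))      // prints "true"
print(isAligned(storage + 1, for: UInt32.self))  // prints "false"
storage.deallocate()
```

A parser could run a check like this in debug builds to find out whether its input happens to be aligned, rather than relying on luck.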
I think it’s easy to get dragged down in the maw of unsafe memory access here, because most of us have come from some C-based language where that’s the only option. Personally I think that’s a mistake in most cases. One of the main goals of Swift is safety, and if you parse incoming data using the same style as you’d use in C, you are likely to suffer from the same exploitable memory management bugs as C.
Hence my post on the thread you referenced, outlining a completely different, and much safer, way to approach this problem. It’s not applicable in all cases, but IMO you should default to this approach, leaving the unsafe, C-style code for situations where profiling has shown that you absolutely need it.
@eskimo, thanks for re-linking to that post. I looked more closely at your alternative implementation and that does answer part of what I was originally asking about, which is best practices when parsing binary data from a source you don't control.
Does my extension below still capture what you were trying to demonstrate about moving up in abstraction? Instead of calling Data.withUnsafePointer() I envision getting slices of my Data packet around expected values, then parsing the expected integer type from it with this function. NOTE: I had to insert a .reversed() to your example code in order for it to correctly parse little endian integers.
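The extension referred to above is not reproduced in this thread. As a rough, hypothetical sketch of the approach described (slice the Data around an expected value, then build the integer from the reversed bytes), something like the following would work. The method name littleEndianInteger is an assumption made for this example.

```swift
import Foundation

extension Data {
    /// Hypothetical helper: interprets the whole of this Data (typically a
    /// slice of a larger packet) as a little-endian integer of type T.
    func littleEndianInteger<T: FixedWidthInteger>(as type: T.Type) -> T? {
        // Reject slices of the wrong length instead of reading out of bounds.
        guard count == MemoryLayout<T>.size else { return nil }
        var value: T = 0
        // reversed() visits the most significant byte first for little-endian
        // input. Iterating avoids assumptions about a slice's start index.
        for byte in reversed() {
            value = (value << 8) | T(truncatingIfNeeded: byte)
        }
        return value
    }
}

let packet = Data([0xAA, 0x34, 0x12, 0xBB])
let field = packet[1..<3]  // slice around the expected 16-bit value
if let parsed = field.littleEndianInteger(as: UInt16.self) {
    print(String(parsed, radix: 16))  // prints "1234"
}
```

No unsafe pointers are involved, so alignment of the underlying storage does not come into it.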
It's a shame Data is so complex and a lot of the safe solutions are so convoluted. I wish we had a more basic array-of-bytes data structure in standard lib with convenient low level methods like (de)serialization of ints and simpler interop with C.
> Does my extension below still capture what you were trying to
> demonstrate about moving up in abstraction?
Yep. I have written code like that myself (-:
> I had to insert a .reversed() to your example code in order for it
> to correctly parse little endian integers.
Be careful here. My code used big endian because that’s the documented order for 'icns' data. If you always reverse, you’re assuming little endian. That may be the right thing to do in your case, but if your goal is to support host endian — that is, big endian on big-endian architectures, little endian on little-endian ones — you’ll have to conditionalise that reverse.
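One way to conditionalise the reverse is to make the byte order an explicit parameter. The sketch below is only an illustration: the ByteOrder enum, decodeUInt32, and hostByteOrder are hypothetical names, not existing APIs.

```swift
/// Hypothetical byte-order tag for the data being parsed.
enum ByteOrder {
    case big
    case little
}

/// Hypothetical helper: decodes exactly four bytes in the stated byte order.
func decodeUInt32(_ bytes: [UInt8], order: ByteOrder) -> UInt32? {
    guard bytes.count == 4 else { return nil }
    // Normalise to most-significant-byte-first, reversing only when needed.
    let ordered: [UInt8] = (order == .big) ? bytes : bytes.reversed()
    var value: UInt32 = 0
    for byte in ordered {
        value = (value << 8) | UInt32(byte)
    }
    return value
}

// Host order, if that is really what the format calls for: on a little-endian
// host, littleEndian leaves the value unchanged.
let hostByteOrder: ByteOrder = (UInt16(1).littleEndian == 1) ? .little : .big

let raw: [UInt8] = [0x12, 0x34, 0x56, 0x78]
if let bigValue = decodeUInt32(raw, order: .big),
   let littleValue = decodeUInt32(raw, order: .little) {
    print(String(bigValue, radix: 16))     // prints "12345678"
    print(String(littleValue, radix: 16))  // prints "78563412"
}
```

For a documented file format the order is normally fixed by the format, so passing .big or .little explicitly is usually the right call; hostByteOrder only matters when the data was written in the native order of the machine that reads it.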
> It's a shame Data is so complex and a lot of the safe solutions are so
> convoluted. I wish we had a more basic array-of-bytes data structure
> in standard lib with convenient low level methods like
> (de)serialization of ints and simpler interop with C.
It’s not easy to meet these requirements. On the one hand, you’re saying that Data is too complex, and on the other hand you’re asking to make it more complex by adding support for serialisation of common types. Also, from your other posts, I know that you’re very concerned about performance, and that’s often at odds with convenience.
However, I am sympathetic to your goals here, and I’m not the only one. If you search Swift Forums for data standard library, you’ll see multiple evolution threads about moving Data to the standard library. And the SwiftNIO folks created ByteBuffer because it offers specific advantages over Data.
If you want to help make this wish a reality, I recommend that you engage in Swift Evolution.