MemoryLayout.size of an enum?

I expected the size of this enum to be 4:

	enum Signature : UInt32 {
		case eocd =  0x06054b50
	}

	MemoryLayout<Signature>.size    // Is 0.

I was trying to write a read method like this:

extension
FileHandle
{
	func
	read<T>()
		throws
		-> T
	{
		let count = MemoryLayout<T>.size
		guard
			let data = try self.read(upToCount: count),
			data.count == count
		else
		{
			throw Errors.readError
		}
		
		return data.withUnsafeBytes { $0.load(as: T.self) }
	}
}

What am I not understanding here?

Just a guess, but maybe single-case enums aren't actually stored, because their value is known at compile time - an instance can only ever be the sole case?

Similarly the raw value (0x06054b50) can just be inlined anywhere the case is referenced.

9 Likes

Yeah, that does seem to be the case. Adding a second case changed the size to 1. Thanks!

Yeah, even when the enum has a RawValue, it's still stored as just the discriminant, and since there's only one case the discriminant is basically just the empty tuple, so it never needs to be stored anywhere.

Things only get more complicated when you have associated values like:

enum Number {
    case integer(Int)
    case float(Double)
}
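
For reference, this is roughly what the compiler reports for that layout on a typical 64-bit platform (a sketch; the exact numbers are an implementation detail):

// Using the Number enum above:
MemoryLayout<Number>.size       // typically 9: 8 bytes of payload + 1 tag byte
MemoryLayout<Number>.stride     // typically 16: size rounded up to 8-byte alignment
MemoryLayout<Number>.alignment  // 8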
1 Like

The raw value of an enum does not affect its memory representation whatsoever. It gets compiled to something like:

enum Signature {
  case eocd
}

extension Signature: RawRepresentable {
  typealias RawValue = UInt32

  var rawValue: RawValue {
    switch self {
    case .eocd:
      return 0x06054b50
    }
  }

  init?(rawValue: RawValue) {
    switch rawValue {
    case 0x06054b50:
      self = .eocd
    default:
      return nil
    }
  }
}

There aren't a lot of great ways to manually lay out enums the way you'd like, so I've just been defaulting to structs with static getters for everything:

struct Signature: Equatable {
  // Equatable so the `case .eocd` pattern match in the switch below compiles
  let value: UInt32

  init(_ value: UInt32) {
    self.value = value
  }

  static var eocd: Signature {
    // you can be fancy and make this ExpressibleByIntegerLiteral
    // if you'd like
    Signature(0x06054b50)
  }
}

switch signature {
case .eocd:
  ...
default:
  ...
}
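
If you do want the literal syntax mentioned in the comment above, the conformance is just this (a sketch):

extension Signature: ExpressibleByIntegerLiteral {
  init(integerLiteral value: UInt32) {
    self.init(value)
  }
}

let eocd: Signature = 0x06054b50  // same as Signature(0x06054b50)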
5 Likes

There is one exception: @objc enums, since they have to be compatible with C. Those do get laid out as their raw type, with the raw value as the representation. (But also, the reason we didn’t do that for “regular” Swift enums was the hope that you wouldn’t be reinterpreting bytes the same way in Swift, because enums can have normal methods and properties now.)
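
As a sketch of what that means for the example in this thread (Apple platforms only, since @objc needs the ObjC runtime; the name ObjCSignature is made up):

import Foundation

@objc enum ObjCSignature: UInt32 {
    case eocd = 0x06054b50
}

MemoryLayout<ObjCSignature>.size    // 4 - same as UInt32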

11 Likes

All of that makes sense. I changed it to a struct, and I think it will work fine that way for me.

For the sake of completeness, what you could also do is constrain the generic to T: RawRepresentable, use the memory size of T.RawValue, and then initialise from the bytes with T(rawValue: $0.load(as: T.RawValue.self))!. That would work with the original enum declared above.

5 Likes

This is how I would do it: it saves you from having to write the switch yourself, and because init?(rawValue:) returns an Optional<Self>, it's easy to add a fallback path.

Interestingly, an array of signatures is not zero bytes (that has to do with MemoryLayout<Signature>.stride == 1).
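
A quick sketch of what that looks like, using the enum from the original post (size vs. stride is the interesting part):

enum Signature: UInt32 {
    case eocd = 0x06054b50
}

MemoryLayout<Signature>.size    // 0 - the single case needs no storage
MemoryLayout<Signature>.stride  // 1 - stride is rounded up to at least 1 byte
// So an array's element buffer still occupies count * stride bytes,
// even though each element's size is reported as 0.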

1 Like

The enum's size is determined by the compiler; it doesn't correspond to the size of its raw value.

Does this work for you:

import Foundation

enum Signature : UInt32 {
    case eocd =  0x06054b50
}

enum Errors: Error {
    case readError
}

extension FileHandle {
    func read<T: RawRepresentable>() throws -> T {
        let count = MemoryLayout<T.RawValue>.size
        guard
            let data = try self.read(upToCount: count),
            data.count == count,
            case let rawValue = data.withUnsafeBytes({ $0.load(as: T.RawValue.self) }),
            let value = T.init(rawValue: rawValue)
        else {
            throw Errors.readError
        }
        return value
    }
}
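
Hypothetical usage, for illustration (the file path is made up; the explicit type annotation is what drives the generic inference):

let handle = try FileHandle(forReadingFrom: URL(fileURLWithPath: "archive.zip"))
let signature: Signature = try handle.read()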

This isn't safe; RawRepresentable.RawValue can be any type, including a reference type. You can't necessarily just materialise one from untrusted bytes.

I can recall suggestions that BitwiseCopyable could allow for these constructs to be written safely - so you could write <T: RawRepresentable> where T.RawValue: BitwiseCopyable. I'm not entirely sure that is safe, either.

For an integer or float, it doesn't matter; it's not a safety issue. You may read an unexpected/nonsense value, but every bit-pattern is defined. If you load an integer or float from some unknown memory address, the compiler doesn't assume any particular value, so there's no assumption that can be violated and no UB.

But if you load an enum from random bytes, it may have a bit-pattern that doesn't correspond to any of its cases. So when performing an exhaustive switch, you may not land in any branch:

enum MyEnum: BitwiseCopyable {
  case a, b, c
} // size: 1 byte

let myEnumValue = RandomBytes(1).load(as: MyEnum.self) // RandomBytes: stand-in for some source of untrusted bytes

var x: SomeClass
switch myEnumValue {
case .a: x = SomeClass(...)
case .b: x = SomeClass(...)
case .c: x = SomeClass(...)
}
// Is 'x' definitely initialised here?
// What if myEnumValue is none of a/b/c?

I don't think the compiler can/would guarantee that all enum switches land in a deterministic branch, even if the enum has an invalid bit-pattern. At the same time, MyEnum is unquestionably BitwiseCopyable. So clearly BitwiseCopyable isn't the constraint we're looking for.

To connect this directly to the problem: if T.RawValue were another enum, it is theoretically possible that T.init?(rawValue:) wouldn't actually initialise the T. And then switching over the resulting T could also lead to UB.

In order for this to be truly safe (given that it is generic), we need a constraint where every bit-pattern is an allowed value. Then you could truly read values from untrusted ("random") data without fear of UB.

Or you could just limit this to the one integer type this thing needs to support.
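
For example, here's a sketch of that last option - the readValue name is made up, and it reuses the Errors enum defined earlier in the thread:

extension FileHandle {
    // Constrain RawValue to UInt32, so every bit-pattern we load is a valid
    // UInt32; init?(rawValue:) then rejects patterns that aren't a real case.
    func readValue<T: RawRepresentable>() throws -> T where T.RawValue == UInt32 {
        let count = MemoryLayout<UInt32>.size
        guard
            let data = try self.read(upToCount: count),
            data.count == count,
            let value = T(rawValue: data.withUnsafeBytes({ $0.load(as: UInt32.self) }))
        else {
            throw Errors.readError
        }
        return value
    }
}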

4 Likes

Yeah, a Swift equivalent of Rust's bytemuck AnyBitPattern or C++23's std::is_implicit_lifetime would be useful for cases like this.

Once we get BitwiseCopyable, we should be able to at least define our own protocol for this, but we would have to conform all of (U)IntN, (U)Int, and Float(32/64) to that protocol by hand. But not Bool, never Bool.
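
Something like this, as a sketch - the protocol name is made up, and the conformances are a promise the compiler can't check for you:

/// Marker: every bit-pattern of the conforming type is a valid value.
protocol AnyBitPatternLoadable /* refine BitwiseCopyable, eventually */ {}

extension UInt8: AnyBitPatternLoadable {}
extension UInt16: AnyBitPatternLoadable {}
extension UInt32: AnyBitPatternLoadable {}
extension UInt64: AnyBitPatternLoadable {}
extension Int8: AnyBitPatternLoadable {}
extension Int16: AnyBitPatternLoadable {}
extension Int32: AnyBitPatternLoadable {}
extension Int64: AnyBitPatternLoadable {}
extension Int: AnyBitPatternLoadable {}
extension UInt: AnyBitPatternLoadable {}
extension Float: AnyBitPatternLoadable {}
extension Double: AnyBitPatternLoadable {}
// Deliberately not Bool: only the bit-patterns 0 and 1 are valid Bools.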

1 Like

This might ultimately be worth a pitch to extend enums and/or the compiler to support serdes of enums. It's not that you can't do it yourself today, but it is both surprisingly involved and very easy to do wrong - both factors that make something a good candidate for language/stdlib inclusion.

2 Likes