That's correct.

The other tool one might reach for here is loadUnaligned(fromByteOffset:as:), applied to the incoming byte stream before you decode into a tuple, but both approaches are valid.
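For example, a minimal sketch, assuming the bytes arrive in a [UInt8] and the packed Int64 starts at a known offset (loadUnaligned(fromByteOffset:as:) requires Swift 5.7):

// Hypothetical 8 bytes holding a packed Int64.
let bytes: [UInt8] = [0x14, 0x00, 0x00, 0x00, 0x0C, 0x00, 0x00, 0x00]
let raw = bytes.withUnsafeBytes {
  $0.loadUnaligned(fromByteOffset: 0, as: Int64.self)
}
// BSON stores integers little-endian, so normalize before decoding.
let value = Int64(littleEndian: raw)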


A few questions:

  1. You need some kind of generic Int32 + Int32 packed into Int64. Yes?

  2. Always the same widths? Is Int32 + UInt32 packed into Int64 needed?
    You can use UInt32(bitPattern:) to remove the sign, and then Int32 and UInt32 are the same. This gives you a homogeneous tuple, but you have to encode/decode the field (see the sketch after this list).

  3. Always the same types? Is Int16 + UInt32 packed into Int64 needed?

  4. Will you ever have holes? Things like Int32 + UInt16 packed into Int64 (48 bits total, with a 16-bit hole at the end).

  5. Are enums a thing? For example, you have to encode:

    enum Foo {
      case a(Int32)
      case b(Int16, UInt16)
    }
    

    This can be packed with a UInt8 tag + UInt32 payload into Int64, as sketched after this list.

  6. Is this still a problem:

    /// Contains the line number in the upper 20 bits, and
    /// the column index in the lower 12 bits.
    public let rawValue: Int32
    
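To make questions 2 and 5 concrete, here is a minimal sketch; the exact layout (tag in the low byte, payload in the next 32 bits) is my assumption, and encode/decode are hypothetical helpers:

// (Foo as defined in question 5.)
enum Foo {
  case a(Int32)
  case b(Int16, UInt16)
}

func encode(_ foo: Foo) -> Int64 {
  let tag: UInt8
  let payload: UInt32
  switch foo {
  case .a(let value):
    tag = 0
    payload = UInt32(bitPattern: value) // question 2: remove the sign
  case .b(let first, let second):
    tag = 1
    payload = UInt32(UInt16(bitPattern: first)) << 16 | UInt32(second)
  }
  return Int64(UInt64(payload) << 8 | UInt64(tag))
}

func decode(_ raw: Int64) -> Foo {
  let bits = UInt64(bitPattern: raw)
  let tag = UInt8(truncatingIfNeeded: bits)
  let payload = UInt32(truncatingIfNeeded: bits >> 8)
  switch tag {
  case 0:
    return .a(Int32(bitPattern: payload))
  case 1:
    return .b(
      Int16(bitPattern: UInt16(truncatingIfNeeded: payload >> 16)),
      UInt16(truncatingIfNeeded: payload)
    )
  default:
    fatalError("unknown tag \(tag)")
  }
}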

Combinatorics

(I don't like this approach, but hey, it is there, so somebody can mention it.)

If you need an "everything can be packed together" approach with compile-time safety against overflow (it prevents packing 100 bits into 64 bits), but you don't need to pack arbitrary bit widths together (like 20 bits + 12 bits = 32 bits):

struct Pack64 {
  init(_ n0: UInt8, _ n1: Int32, _ n2: UInt16, _ n3: Int8)
  // And so on… with tons of overloads.
}

The number of overloads can be reduced:

  • remove sign

    protocol PackInteger8 {
      var encode: UInt8 { get }
      func decode(_ n: UInt8) -> Self
    }
    
    extension UInt8: PackInteger8 {
      var encode: UInt8 { self }
      func decode(_ n: UInt8) -> Self { n }
    }
    
    extension Int8: PackInteger8 {
      var encode: UInt8 { UInt8(bitPattern: self) }
      func decode(_ n: UInt8) -> Self { Int8(bitPattern: n) }
    }
    
  • remove order - what is the difference between (Int16, Int32) and (Int32, Int16)? None. You can just encode bigger fields first.

With both optimizations, for 64-bit values we will have only 9 cases (each number is a field width in bits):

  • [32, 32]
  • [32, 16, 16]
  • [32, 16, 8, 8]
  • [32, 8, 8, 8, 8]
  • [16, 16, 16, 16]
  • [16, 16, 16, 8, 8]
  • [16, 16, 8, 8, 8, 8]
  • [16, 8, 8, 8, 8, 8, 8]
  • [8, 8, 8, 8, 8, 8, 8, 8]

This is more manageable (example for [32, 16, 8, 8]):

struct Pack64 {
  init<T1: PackInteger32, T2: PackInteger16, T3: PackInteger8, T4: PackInteger8>(
    _ n1: T1,
    _ n2: T2,
    _ n3: T3,
    _ n4: T4
  ) {
    // Code similar to my previous post.
  }
}

struct Unpack {
  let value: Int64

  init(value: Int64) { self.value = value }

  func unpack<T1: PackInteger32, T2: PackInteger16, T3: PackInteger8, T4: PackInteger8>(
    _ n1: inout T1,
    _ n2: inout T2,
    _ n3: inout T3,
    _ n4: inout T4
  ) {
    // Code similar to my previous post.
  }
}

But yeah… this is a monstrosity. It does, however, give you a compile-time check that you will never "overflow" the underlying storage (64 bits).

Though the ergonomics are awful:

  • you have to sort fields by width - this prevents any straightforward code generation and is annoying to write.
  • SourceLocation.encode and SourceLocation.decode are totally different and have to be written by hand, and you can't do this on "autopilot".
  • does not pass the "can a new employee use this" check - it has a little bit of magic. And if you need more than 5 minutes to explain a piece of code, then it is too complicated.
  • ugly af

Macros

You can do this, but macros are very new, and I would not use them in production.
Let other people test them; maybe after a few months they will be a viable option.
This is kind of living on the edge.

For me, every time I ask myself "Is it safe to use this?", the answer is always "No". The fact that I even ask this question is a red flag.

Tuple ptr (unsafeBitCast)

If you only need Int32 + Int32 = Int64, then these may be viable options.
They only work for homogeneous elements!
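A minimal sketch of the unsafeBitCast variant (unsafeBitCast checks at runtime that both types have the same size; note that which half of the Int64 each element lands in depends on the platform's byte order):

let pair: (Int32, Int32) = (20, 12)
let packed = unsafeBitCast(pair, to: Int64.self)
let roundTripped = unsafeBitCast(packed, to: (Int32, Int32).self)
assert(roundTripped == pair)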

Not sure about performance. The standard Int solution is just a couple of bit operations (mainly << and |), and in my experience those tend to be fast. But maybe the tuple-pointer approach is faster.

Final

I'm not sure if I understand the problem correctly (see my questions above).

If I had to write something, then I would probably go with the solution from my previous post plus unit tests to check for overflows (packing 100 bits into 64 bits, etc.). It looks nice, is easy to write, and can support packing 20 bits + 12 bits = 32 bits. The only drawback is the runtime checks for overflow.

struct SourceLocation {
  typealias BSONStorage = Int32
  let line: Int32
  let column: UInt32

  var encode: BSONStorage {
    var p = Pack32()
    p.packInt32(self.line, width: 20)
    p.packUInt32(self.column, width: 12)
    return p.signed
  }

  static func decode(_ source: BSONStorage) -> SourceLocation {
    // Just paste code from "encode" and use multi-cursors.
    var p = Unpack32(source)
    let line = p.unpackInt32(width: 20)
    let column = p.unpackUInt32(width: 12)
    return SourceLocation(line: line, column: column)
  }
}
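Pack32 and Unpack32 here come from my previous post; for context, a rough sketch of what they could look like (a reconstruction, not the exact code from that post, with range checking for signed values simplified):

struct Pack32 {
  private var bits: UInt32 = 0
  private var offset = 0

  var signed: Int32 { Int32(bitPattern: bits) }

  mutating func packUInt32(_ value: UInt32, width: Int) {
    precondition(offset + width <= 32, "packed more than 32 bits")
    precondition(value >> width == 0, "value does not fit in \(width) bits")
    bits |= value << offset
    offset += width
  }

  mutating func packInt32(_ value: Int32, width: Int) {
    // Remove the sign, then keep only the low `width` bits.
    let mask: UInt32 = width == 32 ? .max : (1 << width) - 1
    packUInt32(UInt32(bitPattern: value) & mask, width: width)
  }
}

struct Unpack32 {
  private let bits: UInt32
  private var offset = 0

  init(_ source: Int32) { self.bits = UInt32(bitPattern: source) }

  mutating func unpackUInt32(width: Int) -> UInt32 {
    let mask: UInt32 = width == 32 ? .max : (1 << width) - 1
    let value = (bits >> offset) & mask
    offset += width
    return value
  }

  mutating func unpackInt32(width: Int) -> Int32 {
    // Sign-extend the `width`-bit field back to a full Int32.
    let shift = 32 - width
    return Int32(bitPattern: unpackUInt32(width: width) << shift) >> shift
  }
}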

Obviously there is also jrose's library. It gives you compile-time "overflow" checks, and you can manage a single bit at a time.

BSON (and therefore MongoDB) does not support unsigned integers. You can round-trip them with init(bitPattern:) and friends, but they will not sort correctly.
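A quick illustration of why the bits survive but the ordering does not:

let big = UInt64.max
let stored = Int64(bitPattern: big)        // becomes -1
assert(UInt64(bitPattern: stored) == big)  // the bits round-trip
assert(stored < 1)                         // but it sorts before small values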