Bit fields, like in C

Bitfields are a notorious minefield in C and C++. They look like a thing you want to use at first, but their layout basically isn't specified by the language at all. They are occasionally useful if you only need to support a single compiler for a single hardware platform, but broadly speaking they cause more harm than good. When you want to operate on a packed format, it's almost always better to simply define a bunch of access functions that do your "bitfield" access explicitly with shifts and masks.
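To illustrate the shifts-and-masks approach, here's a hypothetical packed byte sketched in Swift (the field names and layout are invented for illustration). Unlike a C bitfield, the layout is fully determined by the code itself:

```swift
// Hypothetical packed byte: bits 0..3 = length, bit 4 = ready flag,
// bits 5..7 = channel. Explicit shifts and masks leave nothing
// implementation-defined.
struct PacketFlags {
    var raw: UInt8 = 0

    var length: UInt8 {
        get { raw & 0b1111 }
        set { raw = (raw & ~0b1111) | (newValue & 0b1111) }
    }

    var isReady: Bool {
        get { raw & 0b1_0000 != 0 }
        set { raw = newValue ? raw | 0b1_0000 : raw & ~0b1_0000 }
    }

    var channel: UInt8 {
        get { raw >> 5 }
        set { raw = (raw & 0b1_1111) | ((newValue & 0b111) << 5) }
    }
}
```

In C you'd write the same thing as inline functions or macros over a `uint8_t`, which is exactly the "access functions" pattern described above.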

I would only want to do this for Swift if we completely and unambiguously define the "compressed storage format" of fields. But we don't have a way to specify the layout of a "normal" struct in Swift yet (you have to define a C struct and import it instead), so this is somewhat putting the cart before the horse.

I don't think that there's any need to have a sea of types. The right way to model this (in my opinion) is to be able to define a "compressed storage format" for a type, and have the compiler emit the pack/unpack operations between that format and the "normal Swift struct" layout where necessary.

10 Likes

Swift has the OptionSet protocol to implement bitfields. It’s not very C-like but that’s good, as it works much like any other Swift struct or enum. I could write more but you should look at the documentation instead.
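A minimal example of the pattern (the type here is invented, but the protocol requirements are the real stdlib API):

```swift
// An OptionSet assigns each option one bit of the raw value; set algebra
// (insert, contains, union) is provided by the protocol.
struct FilePermissions: OptionSet {
    let rawValue: UInt8

    static let read    = FilePermissions(rawValue: 1 << 0)
    static let write   = FilePermissions(rawValue: 1 << 1)
    static let execute = FilePermissions(rawValue: 1 << 2)

    static let readWrite: FilePermissions = [.read, .write]
}
```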

4 Likes

OptionSet is good at what it does, but it's only a piece of what is meant by "bitfields" in C or C++ (and it has some of the same difficulty as C bitfields around endianness and layout, which I would expect a complete "bitfield" proposal to solve).

3 Likes

I don't think we ever want a "specified layout" for most types, because it doesn't matter in the vast majority of cases, and we'd want to eventually be able to do automatic layout optimization to minimize padding, to pack Bools, enums, and other small fields into spare bits or bitfields automatically, to prune dead fields, and so on. It would be nice if the "actually I do want a fully specified layout" feature did support precise description of bit-packing, though.

1 Like

Oh definitely. I don't want this for normal types; rather, it would be an opt-in thing (much like C bitfields are!)

If we ever have a "precise layout" feature, I think it should be directed squarely at use cases like network protocols and serialization. Language interoperation use cases should be handled by importing C headers. Storage optimization use cases should mostly be handled by (1) the compiler doing a certain amount of bit-packing of value types automatically, which will allow (2) programmers to rework their representations in order to take advantage of that.

That said, the compiler can't automatically compress to less than the formally-expressible range of a type. We can bit-pack Bool and frozen enums, but we'd still need some ability to just put a range constraint on an integer.
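A library-level stopgap for that last point can be sketched today as a property wrapper that validates a range on assignment (the `Ranged` name and shape are invented here). Unlike a compiler-known constraint, it can only trap on out-of-range values; it can't actually reclaim the unused bits:

```swift
// Hypothetical sketch: constrain an integer property to a closed range.
// A compiler-known version of this is what would let automatic layout
// optimization pack the value into fewer bits; a wrapper can only validate.
@propertyWrapper
struct Ranged<Value: FixedWidthInteger> {
    private var value: Value
    let range: ClosedRange<Value>

    var wrappedValue: Value {
        get { value }
        set {
            precondition(range.contains(newValue), "value out of range")
            value = newValue
        }
    }

    init(wrappedValue: Value, _ range: ClosedRange<Value>) {
        precondition(range.contains(wrappedValue), "initial value out of range")
        self.value = wrappedValue
        self.range = range
    }
}

struct Header {
    @Ranged(0...15) var version: UInt8 = 1  // logically fits in 4 bits
}
```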

4 Likes

What makes you say that? I like the idea of repr(C) in Rust; isn't that something that's desirable to ultimately have?

1 Like

I personally don't see eliminating C headers as an interesting goal in and of itself. If you already have an existing C header that perfectly describes the interface you want to work with, you should just use that header instead of trying to duplicate it perfectly in Swift as an exercise.

2 Likes

repr(C) would require the entirety of C (including all of the various vendor-specific packing and layout extensions) to be reflected into Swift. By having C interop be through C headers, we can import types with the full expressivity of the C dialects Clang supports (including things like bitfields which are standard in C but have no current Swift equivalent) without having to translate everything precisely into Swift.

9 Likes

I realized, and should have mentioned, that automatic synthesis requires the tuple or grid array to not be a loosely-sized type, since CompactValue specifies a compile-time sized array.

Since each value goes from taking one bit of space to one byte or one word of space (depending on how Bool is stored), the transport type definitely isn't compact! We should use "BooleanExpansion" and "booleanExpansion" for the names instead.

Maybe we can use a new conversion initializer:

public protocol BinaryInteger : /*...*/ {
    //...

    /// Creates a new instance from a bit pattern copied from the given Boolean
    /// grid array, sign-extending or truncating as needed to fit.
    ///
    /// When the count of `source` is equal to or greater than this type's bit
    /// width, the result keeps only the lowest-indexed elements of `source`.
    /// For example, when converting a pack of 16 Boolean values to an 8-bit
    /// type, only the first 8 elements of `source` are used.
    ///
    ///     let p: Int16 = -500
    ///     let pa = p.booleanExpansion
    ///     // 'p' has a binary representation of 0b11111110_00001100, so the
    ///     // expansion of 'pa' is [false, false, true, true, false, false,
    ///     // false, false, false, true, true, true, true, true, true, true].
    ///     let q = Int8(transliterating: pa)
    ///     // q == 12
    ///     // 'q' has a binary representation of 00001100
    ///     // 'q.booleanExpansion' is [false, false, true, true, false, false,
    ///     // false, false]
    ///
    /// When the count of `source` is less than this type's bit width, the result is
    /// *sign-extended* to fill the remaining bits. That is, most-significant
    /// bits of the result are filled with 0 or 1 depending on whether
    /// `source.last` is `false` or `true`. (If `source` is empty, the result is
    /// set to zero.)
    ///
    ///     let u: Int8 = 21
    ///     // 'u' has a binary representation of 00010101
    ///     // Its Boolean expansion is [true, false, true, false, true, false,
    ///     // false, false].
    ///     let v = Int16(transliterating: u.booleanExpansion)
    ///     // v == 21
    ///     // 'v' has a binary representation of 00000000_00010101
    ///
    ///     let w: Int8 = -21
    ///     // 'w' has a binary representation of 11101011
    ///     // Its Boolean expansion is [true, true, false, true, false, true,
    ///     // true, true].
    ///     let x = Int16(transliterating: w.booleanExpansion)
    ///     // x == -21
    ///     // 'x' has a binary representation of 11111111_11101011
    ///     let y = UInt16(transliterating: w.booleanExpansion)
    ///     // y == 65515
    ///     // 'y' has a binary representation of 11111111_11101011
    ///
    /// - Parameter source: A Boolean grid array to convert to this type.
    init<T: [_? ; Bool]>(transliterating source: T)

    //...
}

That may have to wait until we get value-based generic parameters, since the bounds can't be encoded per instance, only per type instantiation. Then we need to determine the span of values and how many bits are required, all at @constexpr time, so the results can be used within a BitFieldRepresentable.
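Until then, the semantics can be prototyped in today's Swift over a plain `[Bool]`, since the `[_? ; Bool]` constraint syntax doesn't exist yet. This follows the doc comment above: LSB-first ordering, truncation of extra elements, and sign extension from the last element:

```swift
extension FixedWidthInteger {
    /// Prototype of the hypothetical `transliterating` initializer: builds a
    /// value from an LSB-first Boolean array, truncating extra elements and
    /// sign-extending from the last element when the array is short.
    init(transliterating source: [Bool]) {
        let fill = source.last ?? false  // empty array yields zero
        var result: Self = 0
        for i in 0..<Self.bitWidth {
            let bit = i < source.count ? source[i] : fill
            if bit { result |= Self(1) << i }
        }
        self = result
    }
}
```

This reproduces the examples in the doc comment: the expansion of `Int8(-21)` transliterates to `Int16(-21)` and to `UInt16(65515)`.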

I wouldn't want to punt to C; it's a failure every time we have to punt to C. What if we want bit-fields for a Swift-original type, not something ported over?

So, something in the direction of using property wrappers to potentially compact a Bool array to a bit-packed one?

Running with that idea, when we have variadic generics, we could use another property wrapper to create the bit-field effect. This property wrapper would replace the "@compact" built-in.

@dynamicMemberLookup
@propertyWrapper
public struct BitFielded<Storage: FixedWidthInteger, variadic Value: BitFieldRepresentable> {
    @constexpr private static let bitCounts = toArray(Value.#map(\.BooleanExpansion.count))
    @constexpr private static let bitOffsets = scan(bitCounts)
    @constexpr private static let totalBitCount = bitCounts.reduce(0, +)
    @constexpr private static let wholeWordCount = totalBitCount / Storage.bitWidth
    @constexpr private static let partialWordCount = totalBitCount % Storage.bitWidth == 0 ? 0 : 1
    @constexpr private static let wordCount = wholeWordCount + partialWordCount

    private typealias Words = [wordCount ; Storage]
    private var storage: Words

    public var wrappedValue: (#explode(Value)) {
        get {
            // Can't think of anything right now.
        }
        set {
            self = Self(wrappedValue: newValue, into: Storage.self)
        }
    }
    @inlinable public var projectedValue: Self { return self }

    public init(wrappedValue: (#explode(Value)), into type: Storage.Type) {
        storage = fill(a: Words.self, with: 0)
        // The rest of this is too complicated for me to figure out right now.
    }

    subscript<T>(dynamicMember member: KeyPath<#explode(Value), T>) -> T {
        get {
            // 1. Peel off the head key path component.
            // 2. Get its offset from the value-tuple's start.
            // 3. Get its bit-length within the entire bit-field span.
            // 4. Extract that span of bits.
            // 5. Convert it to the component's full type.
            // 6. Follow the rest of the key path off that component.
        }
        set { /* Do something similar to the "get," but write back too. */ }
    }
}

// I don't know if this would be legal for defaulting the storage type.
extension BitFielded where Storage == UInt {
    @inlinable public init(wrappedValue: (#explode(Value))) {
        self.init(wrappedValue: wrappedValue, into: Storage.self)
    }
}

Future idea after the code above: make placeholder empty types that push the next member to a new Storage word.

Other strictly typed languages have clearly defined bit fields. Too bad no one looked at Ada when designing Swift.

   type RAM_Register is
      record
         Opcode : Opcode_Number;
         Z      : Bit;
         C      : Bit;
         R      : Bit;
         I      : Bit;
         Cond   : Condition_Number;
         Rsvd_1 : Bit;
         Rsvd_2 : Bit;
         Dest   : Operand;
         Src    : Operand;
      end record;

   for RAM_Register use
      record
         Opcode at 0 range 28 .. 31;
         Z      at 0 range 27 .. 27;
         C      at 0 range 26 .. 26;
         R      at 0 range 25 .. 25;
         I      at 0 range 24 .. 24;
         Cond   at 0 range 20 .. 23;
         Rsvd_1 at 0 range 19 .. 19;
         Rsvd_2 at 0 range 18 .. 18;
         Dest   at 0 range  9 .. 17;
         Src    at 0 range  0 ..  8;
      end record;

In addition to the use-cases mentioned earlier, bit fields are also handy when accessing hardware-level registers with bit-packed fields.
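For comparison, the Ada layout above could be written in today's Swift with range-based accessors (the field types are collapsed to plain unsigned integers for this sketch, and only the getters are shown):

```swift
// Sketch of the RAM_Register layout over a UInt32, with each field read
// by the bit range given in the Ada representation clause. The remaining
// fields follow the same pattern.
struct RAMRegister {
    var raw: UInt32

    private func field(_ range: ClosedRange<Int>) -> UInt32 {
        (raw >> range.lowerBound) & ((1 << range.count) - 1)
    }

    var opcode: UInt32 { field(28...31) }
    var z: Bool { field(27...27) != 0 }
    var cond: UInt32 { field(20...23) }
    var dest: UInt32 { field(9...17) }
    var src: UInt32 { field(0...8) }
}
```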

As @scanon mentioned, during my work on Decimal numbers, the encoding/decoding was made simpler with a set of bit field functions. Maybe these functions could be formalized and added as a FixedWidthInteger extension.

1 Like

Here's the FixedWidthInteger extension:

/// Defines bit-related operations such as setting/getting bits of a number
extension FixedWidthInteger {
  private func mask(_ size: Int) -> Self {
    // Guard the full-width case: shifting 1 by Self.bitWidth yields 0,
    // and 0 - 1 would trap for unsigned types.
    size < Self.bitWidth ? (Self(1) << size) - 1 : ~Self(0)
  }
  
  /// Returns the bits in the `range` of the current number where
  /// `range.lowerBound` ≥ 0 and the `range.upperBound` < Self.bitWidth
  public func get(range: ClosedRange<Int>) -> Int {
    precondition(range.lowerBound >= 0 && range.upperBound < Self.bitWidth)
    return Int((self >> range.lowerBound) & mask(range.count))
  }
  
  /// Returns the `n`th bit of the current number where
  /// 0 ≤ `n` < Self.bitWidth
  public func get(bit n: Int) -> Int {
    precondition(n >= 0 && n < Self.bitWidth)
    return ((1 << n) & self) == 0 ? 0 : 1
  }
  
  /// Logically inverts the `n`th bit of the current number where
  /// 0 ≤ `n` < Self.bitWidth
  public mutating func toggle(bit n: Int) {
    precondition(n >= 0 && n < Self.bitWidth)
    self ^= 1 << n
  }
  
  /// Sets to `0` the `n`th bit of the current number where
  /// 0 ≤ `n` < Self.bitWidth
  public mutating func clear(bit n: Int) {
    precondition(n >= 0 && n < Self.bitWidth)
    self &= ~(1 << n)
  }
  
  /// Sets to `1` the `n`th bit of the current number where
  /// 0 ≤ `n` < Self.bitWidth
  public mutating func set(bit n: Int) {
    precondition(n >= 0 && n < Self.bitWidth)
    self |= 1 << n
  }
  
  /// Replaces the `n`th bit of the current number with the lowest bit of
  /// `value` where 0 ≤ `n` < Self.bitWidth
  public mutating func set(bit n: Int, with value: Int) {
    value.isMultiple(of: 2) ? self.clear(bit: n) : self.set(bit: n)
  }
  
  /// Sets to `0` the bits in the `range` of the current number where
  /// `range.lowerBound` ≥ 0 and the `range.upperBound` < Self.bitWidth
  public mutating func clear(range: ClosedRange<Int>) {
    precondition(range.lowerBound >= 0 && range.upperBound < Self.bitWidth)
    self &= ~(mask(range.count) << range.lowerBound)
  }
  
  /// Replaces the bits in the `range` of the current number with `value`
  /// where `range.lowerBound` ≥ 0 and `range.upperBound` < Self.bitWidth
  public mutating func set(range: ClosedRange<Int>, with value: Int) {
    self.clear(range: range)
    self |= (Self(value) & mask(range.count)) << range.lowerBound
  }
}
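
A quick usage sketch (the relevant functions are repeated so the snippet stands alone), packing an invented three-field format into a UInt16:

```swift
// Repeats the relevant pieces of the extension above so this snippet is
// self-contained, then shows typical field encoding/decoding.
extension FixedWidthInteger {
    private func mask(_ size: Int) -> Self {
        size < Self.bitWidth ? (Self(1) << size) - 1 : ~Self(0)
    }

    public func get(range: ClosedRange<Int>) -> Int {
        precondition(range.lowerBound >= 0 && range.upperBound < Self.bitWidth)
        return Int((self >> range.lowerBound) & mask(range.count))
    }

    public mutating func clear(range: ClosedRange<Int>) {
        precondition(range.lowerBound >= 0 && range.upperBound < Self.bitWidth)
        self &= ~(mask(range.count) << range.lowerBound)
    }

    public mutating func set(range: ClosedRange<Int>, with value: Int) {
        self.clear(range: range)
        self |= (Self(value) & mask(range.count)) << range.lowerBound
    }
}

// Hypothetical format: a 4-bit opcode, a flag bit, and an 8-bit operand.
var reg: UInt16 = 0
reg.set(range: 12...15, with: 0xA)  // opcode
reg.set(range: 11...11, with: 1)    // flag
reg.set(range: 0...7, with: 0xC3)   // operand
// reg == 0xA8C3
```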
1 Like

At the level of a gut reaction, I hate this idea. It raises all the questions that C bitfields raise, including:

  • How does this interact with per-platform endianness, when there are more than 8 bits?
  • What is the numbering order of bits within a single byte (left to right or right to left, and which is the left; alternatively, high order to low or low to high, and which is the high end)?
  • How does the top to bottom layout in source code relate to bit order within the memory that the bit fields occupy?
  • Is the source code order restrictive on the memory layout order?
  • What happens to multiple-bit fields when they cross byte boundaries?
  • When a series of bit fields are interrupted by another kind of struct member, can the compiler rearrange them for better memory packing?
  • When there's a multiple-bit field (e.g. 3 bits) how do you alias that to 3 single bit fields?

etc.

Now, clearly, all of those questions could have absolutely definitive answers in a spec or proposal, but people are likely to have various preferences about what the answers should be, except maybe the answer, "It should behave exactly like C, for interoperability reasons." In that case, we're stuck with the terrible C ergonomics of bit fields.

I don't think you're wrong that bit field access is useful in plenty of use-cases, but I can't help feeling that it'd be better if each use-case had its own definition and implementation of the behavior appropriate to itself, rather than contaminating Swift with stuff that C already does badly.

2 Likes

Just to be clear, there is no "exactly like C" for these questions, because C makes almost all of these implementation-defined, and there's no consensus implementation choice.

7 Likes

fwiw, I found it very useful to map bit-ranges on UInt64 to Int enums, etc., and then use atomic operations available on UInt64, to encode complex state changes without locks or actors. Everything was static-constant, and on initialization it would catch layout errors.

It's not so much a use-case as a class of solutions available if the domain can be mapped using the API. Regardless, Swift already had all the language features needed.
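To make that concrete, here's a hedged sketch of the encoding side (the names and layout are invented; real code would keep `raw` in a 64-bit atomic and apply each transition with a compare-and-exchange):

```swift
// Hypothetical layout over a UInt64 state word:
//   bits 0..1  = phase       (a two-bit enum)
//   bits 2..33 = generation  (32-bit counter, bumped on every transition)
enum Phase: UInt64 { case idle = 0, running = 1, draining = 2, done = 3 }

struct PackedState {
    private static let phaseMask: UInt64 = 0b11
    private static let generationShift: UInt64 = 2

    var raw: UInt64

    var phase: Phase { Phase(rawValue: raw & Self.phaseMask)! }
    var generation: UInt64 { (raw >> Self.generationShift) & 0xFFFF_FFFF }

    init(phase: Phase, generation: UInt64) {
        raw = phase.rawValue | ((generation & 0xFFFF_FFFF) << Self.generationShift)
    }

    // One logical transition = one new word; with atomics this would be a
    // single compare-and-exchange of `raw`.
    func advanced(to next: Phase) -> PackedState {
        PackedState(phase: next, generation: generation &+ 1)
    }
}
```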

1 Like

I'm not sure about your confusion. Except for wacky graphic formats (thanks Steve), bit 0 is usually the least significant bit, in the same way that it is for the integer types. Even if you have a wacky use-case where the bits need to be transferred across a serial communications channel, the registers holding the data to be sent are organized (AFAIK) with bit 0 as the LSB. Top to bottom source ordering is irrelevant since the bit fields are defined by bit ranges.

For example, 0b100 is always interpreted as the number 4 and never as the number 1.

BTW, endianness these days is really a non-issue since most computers use 64-bit buses that transfer words and not bytes. Internally, registers hold data, by convention, with the right-most bit being the LSB. Computers can emulate endianness for those who still believe this is a real concern (mainly for backward compatibility).

Interesting, can you share some examples?

This is really, really not the case. There are lots of sub-byte image pixel formats and bitstream formats that define bits as being ordered from MSB to LSB. There are also lots that go from LSB to MSB. There are even two-byte pixel formats with little-endian bit order and big-endian byte order such that the bits of one of the color channels end up being discontiguous when viewed as a 16b field with "normal" ordering:

|           byte 1            |         byte 0          |
+-----------------------------+-------------------------+
| bit  0  1  2  3  4  5  6  7 |  0  1  2  3  4  5  6  7 |
|     G3 G4 G5 B0 B1 B2 B3 B4 | R0 R1 R2 R3 R4 G0 G1 G2 |
+-----------------------------+-------------------------+

Bitfields and bitstream formats are a mess, and for any crazy thing you can think of that no one would ever do, there are tens of formats that did it, and you'll get stuck supporting one of them eventually.

12 Likes

If I had to support this graphic format, I would reorder things as follows:

|           byte 1            |         byte 0          |
+-----------------------------+-------------------------+
| bit  7  6  5  4  3  2  1  0 |  7  6  5  4  3  2  1  0 |
|     B4 B3 B2 B1 B0 G5 G4 G3 | G2 G1 G0 R4 R3 R2 R1 R0 |
+-----------------------------+-------------------------+

No harm, no foul, and it is now in a more standard format, assuming bit 7 means the most significant bit. If bit 0 actually means the opposite, we can just flip the bit numbers. The graphic stream remains the same.

I recall working with some of these graphic formats, which are usually from cheap LCD controllers circa the 1980s. Most were lacking documentation, and what did exist was pretty marginal. These days graphic controllers support at least 8 bits per color, if not more.

I regret to inform you that sub-byte formats are alive and well, not because controllers can't handle higher bit-depth, but for data size reasons. They're not used as pervasively as they were, but 565 and 1555 are still fairly common for things like map tiles where you don't really need more bit depth and file size (or dirty memory size) is at a premium.

4 Likes