A Bite-Sized BitArray

Hey Swift community!

First things first, this post is my first interaction on here, and I'm still trying to get a feel for things on here, so please excuse me if I butcher this or I don't match the vibe quite right :sweat_smile:

I'm interested in working on a GSoC project that adds to Swift Stdlib's data structures, mentored by @lorentey! Particularly, a BitArray.

A BitArray works like an Array (in that you're storing a collection of Booleans), except it only takes 1/8 of the memory. This is because each Boolean values takes up a whole byte of memory, when you only need 1 bit to differentiate between true and false values. The BitArray would solve this by bit-stuffing the 1s and 0s to represent the trues and falses respectively into a UInt8, and the implementation behind the BitArray would be more like an Array, where a new element is needed for every 8 Boolean values. The values could be accessed like a normal array, and an example of working code could appear as follows:

`var flags: BitArray = [true, false, true, true, false, true, false, false]
// Accessing and changing values
flags[2] // true
flags[2] = false
flags[2] // false
flags[2].toggle()
flags[2] // true`

flags could be stored into one UInt8, depending on whether you'd want to store the bits from the front or the back (still need to research which is better), as 10110100 (or decimal number 180) or as 01001011 (decimal number 75). Adding more values would just create a new UInt8, and continue on with the stuffing.

Use Cases
I've serveyed a small number of people on Tech Twitter to see what kinds of use cases such an array could be useful for. I've gotten a wide range of opinions on whether or not this would even be useful, haha. So far I've gotten some neat ideas such as using it for gaming (levels passed), memory usage representation, learning websites (completed modules).

Challenges
The primary challenge I see would be to get the subscript to return a bit value stuffed into a UInt8 as a Boolean. I am hoping this won't be as challenging when I actually get into it as it perhaps appears at the moment. A clever use of bit operations along with some value casting might do the trick, while still permitting an acceptable (maybe even efficient) runtime. I could be wrong, but I'm excited to find out nonetheless.

Another challenge would be to figure out what to do when a UInt8 is holding less than 8 values. This is very likely to be the case with the last UInt8 in the collection. This would be an issue because, say we removed the last 2 values from flag. Because they would be empty, would we represent those last 2 bits with a 0? Wouldn't that just look like it was 2 false values? This is a challenge that, at the moment, I'm still not sure how to solve, but imagine it would be fun to figure out how to solve it!

Stretch Goals / Exposed Methods
I believe the stretch goals would be to expose new functions to the BitArray. In addition to toggle(), some obvious functions would be append(val), remove(at: index), removeLast(), etc.

These functions along with the data structure implementation will probably take up the entire summer on their own, if not longer. However, it is still neat to think of some other potential functions that could return valuable information to the developer. For example, inspired by a project David Smith (unfortunately can't mention him as I can't mention more than 2 people as a new user :slightly_frowning_face:) was telling me about, returning the beginning index of the first block with N contiguous false values could be useful for a BitArray representing blocks of memory where the developer was trying to find a certain amount of free space!

You Mean OptionSet?!
I realized from Bruno Rocha (unfortunately can't mention him as I can't mention more than 2 people as a new user :slightly_frowning_face:) that this BitArray is indeed very similar to an OptionSet! I've still yet to do some more research on the nuances between OptionSet and BitArray, but I believe the primary difference here is that OptionSet has a very specific purpose: storing options! I wish I could elaborate more on the differences, but again I still have yet to do more in-depth research!

So no, not quite an OptionSet, but the concepts are very similar!

My Questions to the Community!

First of all, thanks for reading all this! Second, I'd love to know your thoughts on the following:

  1. In your development experiences, where and how could you see a BitArray being useful or more optimal in your code? (Looking for some use cases here!)

  2. What other challenges do you see that my noob self might not have been aware of? :joy:

  3. What kind of functions would you like to see such a data structure be exposed to?

  4. Any other general thoughts!

All thoughts, opinions, and insights appreciated! Thanks so much!

Last but not least, I'd like to also mention and give a heartfelt thanks to @kylemacomber for encouraging me, motivating me, believing in me, and giving me this idea as something to work on this summer!
As well as David Smith, who brought me in to the Swift Stdlib world, and constantly encourages and motivates me! And lets me talk out a lot of code, haha!

32 Likes

This is perhaps getting a little far from the core proposal, but ever since I saw @porglezomp 's wonderful BitMatch macros for Rust (GitHub - porglezomp/bitmatch: A Rust crate that allows you to match, bind, and pack the individual bits of integers.), I've wanted bitfield manipulation in general in Swift to be nicer. This sounds like it could be a good step in that direction :slight_smile:

5 Likes

Hello @MahanazAtiqullah and welcome!

A BitArray type is interesting to me for bit-level protocol parsing and serialization! Things like compression formats and packets can contain data that's not byte-aligned and has uncommon numbers of bits. To support this use-case, having a BitSlice sub-sequence type would probably be very useful! (Note that this whole use-case is extremely different from OptionSet!)

You might be interested in the premiere Rust implementation of bit arrays, bitvec, as a point of reference. myrrlyn has written a lot about the design of that library and its tradeoffs, and I suspect he'd be happy to talk to you about it.

This sounds like an excellent project to me, good luck!

4 Likes

Thank you very much @porglezomp for this! This is very interesting, and I will definitely look deeper into these and likely get back to you!

worth mentioning the Swift PNG library uses a bitarray exactly like the one you are describing to power high-performance LZ77 compression and decompression.

that said, making the implementation worthwhile in terms of performance is more complicated than just stuffing bits into a buffer of UInt8s. in particular, accounting for endianness can really kill performance, so itโ€™s important to choose a backing type that matches the maximum width of the bitslices you intend to extract from the bitarray. in fact itโ€™s probably better to think of the elements of a bitarray as being sliding windows of an atom type (UInt8, UInt16, etc.) instead of Bool. this is an important difference between a bitarray and an OptionSet.

generally you will want to include padding of at least 2 storage atoms on either side of the buffer. the padding has a width of 2 and not 1 because many use cases of bit arrays expect one atom of algorithm-defined margin space, and you will need a second atom beyond that so that you can extract bitslices branchlessly without accidentally reading outside the buffer bounds.

7 Likes

Thank you so much for all this insight @taylorswift! To be honest, I'm still very much a noob and I tend to come across entirely new hurdles and concepts at nearly every step :sweat_smile:. So this is some great stuff for me to have to dig deeper in and look in to!

As long as BitArray is it's own type, and not a hacked together specialization of Array :stuck_out_tongue: , this is a great idea. It's not the most common container, but it's very handy when you need it.

2 Likes

Thank you so much, @Nobody1707! That is very encouraging :relaxed:! I hope I can come through with a polished and beneficial implementation by the end of the summer!

2 Likes

A few things that would elevate this past "like an array but more compact" to think about:

  • one of the things a bit array could be valuable for is encoding/decoding binary packed data. So appending integers in a particular order and width could be very useful (append Int32 as little endian), as would "extract 7 bits starting from index I into an Int(8/32/64)"
  • providing a "view" of the bits projected into Int8/16/32/64 in both bit and little endian
  • supporting and/or/xor/nor/xnor on equal length bit arrays
  • population count (like bitArray.filter { $0 == true }.count but faster(*))

Oh, and be careful to ignore people who bikes head too many extra features onto your summer volunteer job ;-) (yes, I'm actually aiming that humor straight at me & my requests here)

(*) many CPUs have a "population count" operation counting the number of set bits in a register. (note: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=E6D7DF3C490F7CB59E216CFAF796AE56?doi=10.1.1.127.8246&rep=rep1&type=pdf describes a fast software implementation, there are many more out there)

3 Likes

Thank you so much @Josh_Osborne!
These are fantastic and very mind-opening ideas, and I will definitely coming back to these replies throughout the summer :smile:!

Oh, and be careful to ignore people who bikes head too many extra features onto your summer volunteer job ;-) (yes, I'm actually aiming that humor straight at me & my requests here)

Haha, I highly appreciate your humbleness and your encouragement! Definitely a good thing to keep in mind for me to keep my project a manageable bite! But either way, all these tips and advices open my mind for starting points to research so I appreciate it anyways!

Thank you for the link! I'll definitely look into it!

1 Like

The type wrapping the [UInt8] should keep track of how many bits of the final UInt8 are actually being used. The unused bits' values shouldn't matter, because your Sequence/Collection interface should never vend that part of the final byte.

For your Index, it should be like an (Array<UInt8>.Index, UInt8) tuple. The first member is the byte with the targeted bit; the second member is a mask referencing the position of the targeted bit. The endIndex would be (myBytes.endIndex, 1) if the bit elements take up the exact amount of space. This is OK since you should never be dereferencing it anyway.

1 Like

We already have FixedWidthInteger.nonzeroBitCount, which should lower directly to an LLVM intrinsic and take advantage of any CPU support:

5 Likes

You may want to start by practicing making a Collection out of exactly one integer:

/// A collection of `Bool` stored in an integer.
public struct CollectingInteger<Base: FixedWidthInteger & UnsignedInteger> {

    /// The bit storage.
    @usableFromInline
    var bits: Base

    /// Creates a collection from the bits of the given value.
    ///
    /// - Parameters:
    ///   - base: The source of the bits that will form the elements of this
    ///     collection.
    /// - Postcondition:
    ///   - `count == Base.bitWidth`.
    ///   - When not empty, `first` refers to the value of the lowest-order bit,
    ///     up to `last` referring to the highest-order bit.
    @inlinable public init(bitsOf base: Base = 0) { bits = base }

}

extension CollectingInteger: Hashable {}

extension CollectingInteger {

    /// Creates a collection from reading from the given iterator, optionally
    /// filling in a shortfall with the given value.
    public init?<I: IteratorProtocol>(
        extractingFrom source: inout I,
        fillingWith spares: Bool? = nil
    ) where I.Element == Bool {
        var baseValue: Base = 0
        for offset in 0..<Base.bitWidth {
            guard let nextBit = source.next() ?? spares else { return nil }

            if nextBit {
                baseValue |= 1 << offset
            }
        }
        self.init(bitsOf: baseValue)
    }

    /// Creates a collection from reading the given sequence, optionally filling
    /// in a shortfall with the given value.
    @inlinable
    public init?<S: Sequence>(
        readingFrom source: S,
        fillingWith spares: Bool? = nil
    ) where S.Element == Bool {
        var iterator = source.makeIterator()
        self.init(extractingFrom: &iterator, fillingWith: spares)
    }

}

extension CollectingInteger: ExpressibleByArrayLiteral {

    // Note: this crashes if `elements` is not long enough!
    @inlinable
    public init!(arrayLiteral elements: Bool...) {
        self.init(readingFrom: elements)
    }

}

extension CollectingInteger: RandomAccessCollection, MutableCollection { /*...*/ }

Then start on a collection of integers for the packed bits:

/// A collection of `Bool` packed into a collection of integers.
public struct CollectingIntegers<Base: RangeReplaceableCollection & MutableCollection>
where Base.Element: FixedWidthInteger & UnsignedInteger {

    /// The bit storage.
    @usableFromInline
    var bitPacks: Base

    @inlinable public init() { bitPacks = Base() }

}

extension CollectingIntegers: RangeReplaceableCollection, MutableCollection { /*...*/ }

extension CollectingIntegers: BidirectionalCollection where Base: BidirectionalCollection { /*...*/ }

extension CollectingIntegers: RandomAccessCollection where Base: RandomAccessCollection { /*...*/ }

extension CollectingIntegers: ExpressibleByArrayLiteral {

    @inlinable
    public init(arrayLiteral elements: Bool...) {
        self.init(elements)
    }

}

/// A collection of bits formed from an `Array` of bytes.
public typealias BitArray = CollectingIntegers<[UInt8]>

Edit: Added MutableCollection as a requirement to CollectingIntegers.Base. We really need to be able to specify the getter and setter of a subscript in separate extensions.

2 Likes

Thank you so much for this @CTMacUser!!
This helps a lot and I'll definitely be referring back to it in the summer!

Thanks for sharing this @Karl !

1 Like

Oh wow! I truly appreciate you taking the time to provide this for me!
Can't thank you enough!

Hello. I'm very glad that such topics become discussed in swift community.

In addition to the above, I suppose Bloom filter can be easily implemented on top of BitArray.

Also, Swift Collections library was announced several days ago: Swift.org - Introducing Swift Collections
This library is a good candidate for sharing efficient BitArray implementation with other people.

1 Like

Thank you @Dmitriy_Ignatyev!
Thanks, I'll take a peek at the Bloom filter soon!

Yes! The plan I believe is for BitArray to be a contribution to Swift Collections!

1 Like