Bit Array and Bit Set API Review (The end of a GSoC Project)

BitArray and BitSet API Review

Hey everyone! This is an overview of the API for the BitArray and BitSet data structures for the Swift Collections package. This project was done as a GSoC (Google Summer of Code) project for which you can find the original Swift Forums post here. This is the resulting API of the project. Please have a look! I'll appreciate any feedback you have!

Introduction

The BitArrayModule contains two data structures in its initial release, the BitArray and the BitSet, that allow for boolean values to be stored using less memory while also allowing for certain faster operations.

Motivation

Since a Bool has two values (true and false) in its domain, it only needs one bit of memory to represent its value. However, as a variable cannot be stored in a space smaller than one byte in memory, a Bool ends up using 8 times more memory than it would need. For a single value, this is hardly an issue, but for a larger Array<Bool>, this makes room for memory optimization. Additionally, if Bools were stored as bits, this could result in faster operations like that of bitwise AND, as chunks of values could likely be manipulated at once instead of iterating and operating over every single one.
Hence, creating a data structure that makes more efficient use of memory could also result in optimized performance.

Proposed Solution

As it isn't exactly feasible to create a variable that gets stored into a bit, the primary solution would be to have the Bools represented as the bits of an unsigned integer in a data structure called a BitArray

For example, the number 182 in binary is 10110110. Each 1 and 0 in the binary representation can represent a true and false value respectively. This allows us to store the values into an Array<UInt> as opposed to an Array<Bool>, making use of all of the bits, requiring up to 8 times less memory, but whose API would function much like an Array<BooL>.

Additionally, since every element in the storage is technically holding 8 or more boolean values, performance on common operations such as bitwise AND are much faster, as multiple boolean values can be manipulated by manipulating a single integer value in the array.

Such a data structure would also be a great storage solution to the Set<Int> version of a BitArray, whose integer values are to represent the indices of the true values in an Array<Bool>, called a BitSet. This set representation utilizing the BitArray would allow for the memory and speed optimization, but with an API functioning as a Set<Int>.

let bitArray: BitArray = [true, true, false, true, true] // a BitArray implementation that functions like an Array<Bool>
let bitSet: BitSet = [0, 1, 3, 4] // an equivalent set version of the BitArray, functioning as a Set<Int>

Detailed Design

BitArray

The BitArray mimics the API of an Array<Bool>, but maintains the Bools within the bits of an unsigned integer with its own unique bitwise operations.

var bitArray: BitArray = [true, false, true, false] // stored little-endian within an integer as 00000101, or 5
let secondBitArray: BitArray = [true, false, false, false]

// works much like an Array<Bool>
bitArray.removeLast()
bitArray.reversed()
let isEqual: Bool = (bitArray == secondBitArray) // false
bitArray.append(isEqual)
print(Array(bitArray)) // prints "[true, false, true, false, false]"

// also has its own unique bitwise operations
bitArray.formBitwiseAnd(secondBitArray)
bitArray ^= secondBitArray
let invertedBitArray = ~bitArray

Definition

struct BitArray: ExpressibleByArrayLiteral, RandomAccessCollection, MutableCollection, Equatable, Codable {
  subscript(position: Int) -> Bool
}

BitArray-specific initializer APIs

BitArray, like an Array<Bool>, is a RandomAccessCollection with integer indices. However, BitArray does not conform to RangeReplaceableCollection because there didn't seem to be any use cases. However, some relevant parts of RangeReplaceableCollection were implemented without conforming.

  /// Creates a new collection containing the specified number of a single, repeated `Bool`
  ///
  /// The following example creates a BitArray initialized with 5 true values
  ///
  ///     let fiveTrues = BitArray(repeating: true, count: 5)
  ///     print(fiveTrues)
  ///     // Prints "BitArray(storage: [31], excess: 5)"
  ///     print(Array(fiveTrues))
  ///     // Prints "[true, true, true, true, true]"
  ///
  /// - Parameters:
  ///   - repeatedValue: The  `Bool` to repeat.
  ///   - count: The number of times to repeat the value passed in the
  ///     `repeating` parameter. `count` must be zero or greater.
public init(repeating repeatedValue: Bool, count: Int)

It also has an initializer that initializes from the BitSet

/// Creates a new BitArray using a BitSet, where the indices in the BitSet become the true values in the BitArray. The resulting BitArray often contains many padded false values at the end from empty bits that fill up the word
///
/// The following example creates a BitArray initialized by a BitSet
///     let bitSet: BitSet = [0, 1, 3, 5]
///     let bitArray = BitArray(bitSet)
///     print(bitArray)
///     // Prints "BitArray(storage: [43], excess: 0)"
///     print(Array(bitArray)) 
///     // Prints "[true, true, false, true, false, true, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false]"
///
/// - Parameters:
///   - bitSet: An initialized BitSet
  public init(_ bitSet: BitSet)

BitArray bitwise Operation APIs

The BitArray offers bitwise operation methods, both mutating and non-mutating, and both in the form of overloaded operators and function calls.

Bitwise OR operation

The bitwise or operation can either be mutating, non-mutating, with a function call, or by using an overloaded operator.

/// mutates self by setting the value in each index to true if either self or the passed in BitArray -- or both -- have a true value in that respective index.
///
/// The following example mutates a BitArray to be OR'd with another BitArray
///     var bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     bitArray.formBitwiseOr(secondBitArray)
///     print(Array(bitArray))
///     Prints "[true, false, true, true]"
///
/// - Parameters:
///   - other: An initialized BitArray. `other` must have the same count as `self`
mutating func formBitwiseOr(_ other: BitArray)

/// overloaded operator that mutates the lhs BitArray by setting the value in each index to true if either the lhs BitArray or the rhs BitArray -- or both -- have a true value in that respective index.
///
/// The following example mutates the lhs to be OR'd with the rhs
///     var bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     bitArray |= secondBitArray
///     print(Array(bitArray))
///     Prints "[true, false, true, true]"
///
/// - Parameters:
///   - lhs: An initialized BitArray
///   - rhs: An initialized BitArray. `rhs` must have same count as `lhs`
static func |= (lhs: inout BitArray, rhs: BitArray)

/// creates and returns a new BitArray by setting the value in each index of the new BitArray to true if either self or the passed in BitArray -- or both -- have a true value in that respective index.
///
/// The following example creates and returns a BitArray that is the OR'd result of self and another BitArray
///     let bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     let newBitArray = bitArray.bitwiseOr(secondBitArray)
///     print(Array(newBitArray))
///     Prints "[true, false, true, true]"
///
/// - Parameters:
///   - other: An initialized BitArray. `other` must have the same count as `self`
/// - Returns: The BitArray that is the XOR'd result of `self` and `other`
func bitwiseOr(_ other: BitArray) -> BitArray

/// overloaded operator that creates and returns a new BitArray by setting the value in each index to true if either the lhs BitArray or the rhs BitArray -- or both -- have a true value in that respective index.
///
/// The following example creates and returns a BitArray that is the OR'd result of the lhs and rhs
///     let bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     let newBitArray = bitArray | secondBitArray
///     print(Array(newBitArray))
///     Prints "[true, false, true, true]"
///
/// - Parameters:
///   - lhs: An initialized BitArray
///   - rhs: An initialized BitArray. `rhs` must have same count as `lhs`
/// - Returns: The BitArray that is the XOR'd result of the lhs and rhs
static func | (lhs: BitArray, rhs: BitArray) -> BitArray

Bitwise AND operation

The bitwise AND operation can either be mutating, non-mutating, with a function call, or by using an overloaded operator.

/// mutates self by setting the value in each index to true if both self and the passed in BitArray have a true value in that respective index.
///
/// The following example mutates a BitArray to be AND'd with another BitArray
///     var bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     bitArray.formBitwiseAnd(secondBitArray)
///     print(Array(bitArray))
///     Prints "[true, false, false, false]"
///
/// - Parameters:
///   - other: An initialized BitArray. `other` must have the same count as `self`
mutating func formBitwiseAnd(_ other: BitArray)

/// overloaded operator that mutates the lhs BitArray by setting the value in each index to true if both the lhs BitArray and the rhs BitArray have a true value in that respective index.
///
/// The following example mutates the lhs to be AND'd with the rhs
///     var bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     bitArray &= secondBitArray
///     print(Array(bitArray))
///     Prints "[true, false, false, false]"
///
/// - Parameters:
///   - lhs: An initialized BitArray
///   - rhs: An initialized BitArray. `rhs` must have same count as `lhs`
static func &= (lhs: inout BitArray, rhs: BitArray)

/// creates and returns a new BitArray by setting the value in each index of the new BitArray to true if both self and the passed in BitArray have a true value in that respective index.
///
/// The following example creates and returns a BitArray that is the AND'd result of self and another BitArray
///     let bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     let newBitArray = bitArray.bitwiseAnd(secondBitArray)
///     print(Array(newBitArray))
///     Prints "[true, false, false, false]"
///
/// - Parameters:
///   - other: An initialized BitArray. `other` must have the same count as `self`
/// - Returns: The BitArray that is the AND'd result of `self` and `other`
func bitwiseAnd(_ other: BitArray) -> BitArray

/// overloaded operator that creates and returns a new BitArray by setting the value in each index of the new BitArray to true if both the lhs BitArray and the rhs BitArray have a true value in that respective index.
///
/// The following example creates and returns a BitArray that is the AND'd result of self and another BitArray
///     let bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     let newBitArray = bitArray & secondBitArray
///     print(Array(newBitArray))
///     Prints "[true, false, false, false]"
///
/// - Parameters:
///   - lhs: An initialized BitArray
///   - rhs: An initialized BitArray. `rhs` must have same count as `lhs`
/// - Returns: The BitArray that is the AND'd result of the lhs and rhs
static func & (lhs: BitArray, rhs: BitArray) -> BitArray

Bitwise XOR operation

The bitwise xor operation can either be mutating, non-mutating, with a function call, or by using an overloaded operator.

/// mutates self by setting the value in each index to true if either self or the passed in BitArray -- but not both -- have a true value in that respective index.
///
/// The following example mutates self to be XOR'd with another BitArray
///     var bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     bitArray.formBitwiseXor(secondBitArray)
///     print(Array(bitArray))
///     Prints "[false, false, true, true]"
///
/// - Parameters:
///   - other: An initialized BitArray. `other` must have the same count as `self`
mutating func formBitwiseXor(_ other: BitArray)

/// overloaded operator that mutates the lhs BitArray by setting the value in each index to true if either the lhs BitArray or the rhs BitArray -- but not both -- have a true value in that respective index.
///
/// The following example mutates a BitArray to be XOR'd with another BitArray
///     var bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     bitArray ^= secondBitArray
///     print(Array(bitArray))
///     Prints "[false, false, true, true]"
///
/// - Parameters:
///   - lhs: An initialized BitArray
///   - rhs: An initialized BitArray. `rhs` must have same count as `lhs`
static func ^= (lhs: inout BitArray, rhs: BitArray)

/// creates a new BitArray by setting the value in each index of the new BitArray to true if either self or the passed in BitArray -- but not both -- have a true value in that respective index.
///
/// The following example creates a new a BitArray that is the XOR'd result of self and another BitArray
///     let bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     let newBitArray = bitArray.bitwiseOr(secondBitArray)
///     print(Array(newBitArray))
///     Prints "[false, false, true, true]"
///
/// - Parameters:
///   - other: An initialized BitArray. `other` must have the same count as `self`
/// - Returns: The BitArray that is the XOR'd result of `self` and `other`
func bitwiseXor(_ other: BitArray) -> BitArray

/// creates a new BitArray by setting the value in each index of the new BitArray to true if either lhs BitArray or the rhs BitArray -- but not both -- have a true value in that respective index.
///
/// The following example creates a new a BitArray that is the XOR'd result of self and another BitArray
///     let bitArray: BitArray = [true, false, true, false]
///     let secondBitArray = [true, false, false, true]
///     let newBitArray = bitArray ^ secondBitArray
///     print(Array(newBitArray))
///     Prints "[false, false, true, true]"
///
/// - Parameters:
///   - lhs: An initialized BitArray
///   - rhs: An initialized BitArray. `rhs` must have same count as `lhs`
/// - Returns: The BitArray that is the XOR'd result of the lhs and rhs
static func ^ (lhs: BitArray, rhs: BitArray) -> BitArray

Bitwise NOT operation

The bitwise not operation is non-mutating and can be done with a function call or by using an overloaded operator.

/// creates a new BitArray by setting each value in the new BitArray to the opposite value of that in self
///
/// The following example creates a new a BitArray that is the inverse of self
///     let bitArray: BitArray = [true, false, true, false]
///     let newBitArray = bitArray.bitwiseNot()
///     print(Array(newBitArray))
///     Prints "[false, true, false, true]"
///
/// - Returns: The BitArray that contains the inverse or opposite values of that in self
func bitwiseNot() -> BitArray

/// The bitwise NOT operator (`~`) is a prefix operator that returns a value in which all the Bools of its argument are flipped
///
/// The following example creates a new a BitArray to be NOT'd with another BitArray
///     let bitArray: BitArray = [true, false, true, false]
///     let newBitArray = ~bitArray
///     print(Array(newBitArray))
///     Prints "[false, true, false, true]"
///
/// - Returns: The `BitArray` that contains the inverse or opposite values of that in `self`
static prefix func ~ (_ bitArray: BitArray) -> BitArray

BitArray-specific function APIs

There are two functions that RangeReplaceableCollections have that BitArray has as well, even though it does not conform to the RangeReplaceableCollection protocol.

They are the removeFirst(...) and removeLast(...) functions that remove a range of values.

/// Removes the first several values in the beginning of the BitArray
/// self must not be empty
///
/// The following example removes the first 3 elements of a BitArray
///     var bitArray: BitArray = [true, false, true, false, true]
///     bitArray.removeFirst(3)
///     print(Array(bitArray))
///     Prints "[false, true]"
///
/// - Parameters:
///   - k: an integer that represents the number of values desired to be removed from the beginning. 
///`k` must be a positive number, and less than the `count` value of `self` 
mutating func removeFirst(_ k: Int)

/// Removes the last several values in the end of the BitArray
///
/// The following example removes the last 3 elements of a BitArray
///     var bitArray: BitArray = [true, false, true, false, false, true]
///     bitArray.removeLast(3)
///     print(Array(bitArray))
///     Prints "[true, false, true]"
///
/// - Parameters:
///   - k: an integer that represents the number of values desired to be removed from the beginning. 
///`k` must be a positive number, and less than the `count` value of self  
mutating func removeLast(_ k: Int)

Additionally, there are two functions that indicate where the first and last true values are located in the BitArray

/// Returns the index of the first true value in the BitArray.
/// If there are no true values in the BitArray, the function returns nil
///
///     Examples:
///     let bitArray: BitArray = [false, true, false, true]
///     let firstTrue: Int? = bitArray.firstTrue()       
///     print(firstTrue)
///     Prints "2"
///
///     let allFalseBitArray: BitArray = [false, false, false]
///     let firstTrue: Int? = allFalseBitArray.firstTrue()       
///     print(firstTrue)
///     Prints "nil"
///
/// - Returns: An optional integer value of the index where the first true was found. 
/// If there are no true values in the BitArray, the function returns `nil`
func firstTrue() -> Int?

/// Returns the index of the last true value in the BitArray.
/// If there are no true values in the BitArray, the function returns nil
///
///     Examples:
///     let bitArray: BitArray = [false, true, false, true]
///     let lastTrue: Int? = bitArray.lastTrue()       
///     print(lastTrue)
///     Prints "2"
///
///     let allFalseBitArray: BitArray = [false, false, false]
///     let lastTrue: Int? = allFalseBitArray.lastTrue()       
///     print(lastTrue)
///     Prints "nil"
///
/// - Returns: An optional integer value of the index where the last true was found
/// If there are no true values in the BitArray, the function returns `nil`
func lastTrue() -> Int?

BitSet

BitSet is like BitArray, except a true value in its underlying storage represents an integer at the respective offset (a false value represents the absence of an integer at the respective offset). In fact, its storage is a BitArray. The API, however, works much like a Set<Int>, although BitSet does not actually conform to any Set<Int> protocols.

var bitSet: BitSet=  [0, 5, 8, 6]

// Some set operations that work on BitSet
bitSet.formUnion(BitSet([2, 4, 6]))
bitSet.insert(3)

Definition

struct BitSet: ExpressibleByArrayLiteral, BidirectionalCollection,  {
  struct Index: Comparable, Hashable {...}
  subscript(position: Index) -> Int
}

BitSet has an opaque (non-integer) index. Advancing the index forward (or backward) is O(n) because BitSet searches the underlying storage for the next (or previous) true bit.

BitSet's Set-Like APIs

Although BitSet doesn't conform to SetAlgebra, it makes use of many of the same functions. BitSet doesn't directly conform mainly because there are some unnecessary features and functions that come with SetAlgebra that BitSet does better off without.

Insert

/// Inserts a new member into the set
///     The following examples insert a 2 into a set that already has a 2 and a set that doesn't, respectively
///         var bitSetWithTwo: BitSet = [0, 2, 4, 5]
///         var didInsert: Bool = bitSetWithTwo.insert(2)
///         print(didInsert)
///         // Prints "false"
///         print(Array(bitSetWithTwo))
///         // Prints "[0, 2, 4, 5]"
///
///         var bitSetWithoutTwo: BitSet = [0, 3, 4, 6]
///         didInsert = bitSetWithoutTwo.insert(2)
///         print(didInsert)
///         // Prints "true"
///         print(Array(bitSetWithoutTwo))
///         // Prints "[0, 2, 3, 4, 6]"
///
/// - Parameter newMember: An element to insert into the set
/// - Returns: `true` if `newMember` was not contained in the
///   set. If an element equal to `newMember` was already contained in the
///   set, the method returns `false`. Returned value is discarded without giving a warning if unused
@discardableResult
mutating func insert(_ newMember: Int) -> Bool

Remove

/// Removes a member from the set
///     The following examples remove a 2 from a set that has a 2 and a set that doesn't, respectively
///         var bitSetWithTwo: BitSet = [0, 2, 4, 5]
///         var didRemove: Bool = bitSetWithTwo.remove(2)
///         print(didRemove) 
///         // Prints "true"
///         print(Array(bitSetWithTwo))
///         // Prints "[0, 4, 5]"
///
///         var bitSetWithoutTwo: BitSet = [0, 3, 4, 6]
///         didRemove = bitSetWithoutTwo.remove(2)
///         print(didRemove)
///         // Prints "false"
///         print(Array(bitSetWithoutTwo))
///          // Prints "[0, 3, 4, 6]"
///
/// - Parameter member: An element to remove from the set.
/// - Returns: `true` if `member` was contained in the
///   set. If an element equal to `member` was not already contained in the
///   set, the method returns `false`. Returned value is discarded without giving a warning if unused
@discardableResult
mutating func remove(_ member: Int) -> Bool

Contains

/// Checks whether the set contains a value or not
///
///     Example:
///     let bitSet: BitSet = [0, 2, 5, 6, 7]
///     let doesContain: Bool = bitSet.contains(0)
///     print(doesContain) // prints "true"
///     doesContain = bitSet.contains(8)
///     print(doesContain) // prints "false"
///
/// - Parameter member: An element to look for in the set.
/// - Returns: `true` if `member` exists in the set; otherwise returns `false`.
func contains(_ member: Int) -> Bool

Union

/// mutates self to be the resulting union of itself and another set
/// The union is the set of values that exist in either or both sets
///
///     The following example takes a set and transforms it into the union of itself and another set
///         var firstBitSet: BitSet = [0, 2, 3, 5]
///         let secondBitSet: BitSet = [2, 4, 5, 6]
///         firstBitSet.formUnion(secondBitSet)
///         print(Array(firstBitSet)) // prints "[0, 2, 3, 4, 5, 6]"
///
/// - Parameter other: Another BitSet. `other` must be finite
mutating func formUnion(_ other: BitSet)

/// creates a new BitSet that is the resulting union of itself and another set
/// The union is the set of values that exist in either or both sets
///
///     The following example creates a set that is the union of two sets
///         let firstBitSet: BitSet = [0, 2, 3, 5]
///         let secondBitSet: BitSet = [2, 4, 5, 6]
///         let resultSet = firstBitSet.union(secondBitSet)
///         print(Array(resultSet)) // prints "[0, 2, 3, 4, 5, 6]"
///
/// - Parameter other: Another BitSet
/// - Returns: A new BitSet with the elements that are in this set or `other` or both.
func union(_ other: BitSet) -> BitSet

Intersection

/// mutates self to be the resulting intersection of itself and another set
/// The intersection is the set of values that exist both sets
///
///     The following example takes a set and transforms it into the intersection of itself and another set
///         var firstBitSet: BitSet = [0, 2, 3, 5]
///         let secondBitSet: BitSet = [2, 4, 5, 6]
///         firstBitSet.formIntersection(secondBitSet)
///         print(Array(firstBitSet)) // prints "[2, 5]"
///
/// - Parameter other: Another BitSet
mutating func formIntersection(_ other: BitSet)

/// creates a new BitSet that is the resulting intersection of itself and another set
/// The intersection is the set of values that exist both sets
///
///     The following example creates a set that is the union of two sets
///         let firstBitSet: BitSet = [0, 2, 3, 5]
///         let secondBitSet: BitSet = [2, 4, 5, 6]
///         let resultSet = firstBitSet.intersection(secondBitSet)
///         print(Array(resultSet)) // prints "[2, 5]"
///
/// - Parameter other: Another BitSet
/// - Returns: A new BitSet with the elements that are both in this set and `other`.
func intersection(_ other: BitSet) -> BitSet

Symmetric Difference

/// mutates self to be the resulting symmetric difference of itself and another set
/// The symmetric difference is the set of values that exist one set or the other, but not both
///
///     The following example takes a set and mutates it into the symmetric difference between itself and another set
///         var firstBitSet: BitSet = [0, 2, 3, 5]
///         let secondBitSet: BitSet = [2, 4, 5, 6]
///         firstBitSet.formSymmetricDifference(secondBitSet)
///         print(Array(firstBitSet)) // prints "[0, 3, 4, 6]"
///
/// - Parameter other: Another BitSet
mutating func formSymmetricDifference(_ other: BitSet)

/// creates a new BitSet that is the resulting symmetric difference of itself and another set
/// The symmetric difference is the set of values that exist one set or the other, but not both
///
///     The following example creates a set that is the symmetric difference of two sets
///         let firstBitSet: BitSet = [0, 2, 3, 5]
///         let secondBitSet: BitSet = [2, 4, 5, 6]
///         let resultSet = firstBitSet.symmetricDifference(secondBitSet)
///         print(Array(resultSet)) // prints "[0, 3, 4, 6]"
///
/// - Parameter other: Another BitSet
/// - Returns: A new BitSet with the elements that are in either this set or the other set, but not both
func symmetricDifference(_ other: BitSet) -> BitSet

BitSet initializer from BitArray

A BitSet can conveniently be initialized from a BitArray. The BitArray simply becomes it's storage. If there are excessive false values at the end of the BitArray storage, they are removed as BitSet does not keep track of false values.

// Initializes a BitSet from a BitArray
init(_ bitArray: BitArray)

Considerations

Computed startIndex Property

The startIndex for the BitArray is traditionally a stored property of constant time. However, for the BitSet, as it needs to traverse and find the first true value in its BitArray storage type, is a computed property. Although this might not be ideal, when benchmarked, the operation is hardly slow at all. Consider this linear view of a benchmark comparing the BitSet's startIndex lookup compared to that of the Set<Int>:

Their runtimes hardly differ, and oftentimes the BitSet performs better than the Set<Int>. There may be a few worse-case scenarios where the BitSet's startIndex lookup is worse, but I felt it wasn't significant enough to go for the more complicated approach.

Lack of Conformance to RangeReplaceableCollection

Even though Array<Bool> conforms to RangeReplaceableCollection, I choose not to conform to it for BitArray. This is because use cases for the replaceSubrange(...) method seems to be a rarity (and actually couldn't think of any). Additionally, such a method would likely be costly in performance; as the conformance takes the replaceSubrange(...) implementation and utilizes it for all RangeReplaceableCollection methods, it could potentially make several other methods unnecessarily slow as well. Hence, as a conformance to RangeReplaceableCollection requires an implementation of replaceSubrange(...), the conformance seemed unnecessary. Instead, I added implementations of certain methods that might be useful.

Lack of Conformance to SetAlgebra

As BitSet is essentially a Set<Int> in its API. Therefore, not conforming to SetAlgebra is naturally an interesting choice. Instead, useful methods from SetAlbegra were implemented.
For example, for the insert(...) method:

// SetAlgebra method
@discardableResult
  mutating func insert( _ newMember: __owned Element) -> (inserted: Bool, memberAfterInsert: Element)

// BitSet method
@discardableResult
mutating func insert(_ newMember: Int) -> Bool

The insert method in SetAlgebra returns virtually useless information for a BitSet, resulting in the method doing more work than necessary.

Additionally, SetAlgebra requires an implementation of the update(...) method, that, not only returns more information than necessary, but has practically the same purpose as insert(...), making it practically useless, as well as confusing for an API.

mutating func update(with newMember: __owned Element) -> Element?

Hence, instead of implementing all of these practically useless or overworking methods, some useful methods were cherry picked from SetAlgebra and implemented manually without the conformance.

Some Neat Performance Benchmarks

As explained in the Proposed Solution section, a primary objective in implementing these data structures was to speed up typical operations (bitwise operations for the BitArray and set operations for the BitSet). As Array<Bool> does not have any bitwise operations, having them for BitArray is already a plus, and there isn't much to compare it to within the realm of Swift. However, for the BitSet, it is possible to test how much faster the BitSet set operations are compared to the Set<Int>. I was very happy to see the performance charts of the BitSet operations compared to that of Set<Int>, and am very excited to share them with you all as well! Check it out!

BitSet Union VS Set<Int> Union

BitSet Intersection VS Set<Int> Intersection

BitSet Symmetric Difference VS Set<Int> Symmetric Difference

Very pleasing to the eye if you ask me :smile:

End Remarks

Thank you for reviewing this API! Please feel free to leave feedback and suggestions! I;m very grateful for the great time I had with the Swift Standard Library on my GSoC project!

A special thanks to my primary mentor @David_Smith, as well as my other great mentors @lorentey and @kylemacomber. Also a special thanks to some Twitter friends who were willing to look over my draft for errors @ ziti and @ nasimuddin01

38 Likes

Do I understand correctly that BitSet uses a BitArray internally as its storage, and therefore if I create a bitset with a single numerically-large value, that a significant amount of memory will be allocated?

For example:

// Does this allocate a large amount of memory?
let x: BitSet = [1_000_000_000]

If so, was a design considered where BitSet instead uses, say, Dictionary<UInt> for storage?

5 Likes

It’s very nice to have this available! In particular, BitSet in turn makes it very easy to build an option set:

struct EnumSet<Option: RawRepresentable>: OptionSet where Option.RawValue == Int { … }

enum DataReadingOptions: Int {
  case quickly
  case correctingErrors
  case erasingOnceRead

  var asRawCStyleFlag: Int { 1 << self.rawValue }
}
Implementation comments

I suspect many BitSets/BitArrays have less than a word’s worth of bits in them. Is it worth adding a “small” representation like String has? For example, if the backing storage array is nil, excess can become storage itself, rather than a length. Of course, that means an extra branch on all operations, so you’d have to benchmark it!

9 Likes

Yes! Your understanding is correct, and is a good observation! About 125 million bytes of storage would be allocated for that particular BitSet.
Such a case was indeed taken into consideration. However, the purpose of the BitSet was to provide a Set<Int> API for the BitArray, allowing better support for different use cases. Additionally, a practical use case for a set with such a large value (or a large gap between values) seemed to be difficult to think of, and if such a use case were to come up, the developer would potentially be better off considering a regular Set<Int> data structure.

However, this is just the initial release! If such use cases were to be brought to our attention in the future, the optimization could surely be made! Perhaps you may even be the individual to make such a contribution! :smile:

Thanks so much for your concern and interest, btw! I appreciate it!

3 Likes

That's an interesting observation, thank you! Just to make sure we're on the same page and I understand correctly, are you speaking about excess in the sense that it keeps track of how many bits are used in the last element of the BitArray's storage (an Array<UInt>) and is not necessarily equivalent to the count (and is therefore never larger than the word)?

1 Like

This looks very cool! As someone who's written their own bit array, I'd really love to ditch my own thing and use something standard ;) A couple of comments from my side, although I must admit that some of these might be addressed in your actual implementation already–I haven't taken a loot at it yet.

I think it might be worth considering if we want to guarantee this. If so, then it is probably worth exposing this to users and accepting an initializer that can take an array of integers to make a BitArray out of. But if we don't commit to an internal representation, we can do whatever we like internally, which could mean a more efficient implementation on certain architectures with different endianness, or the ability to use bytes instead of integers as the backing (which, FWIW, I chose in my implementation) for space savings and serialization reasons.

I know some languages use operators that apply to entire collections, but in Swift my personal opinion is that we try to avoid having these sorts of things most of the time. The spelled-out versions seem reasonable, though, although they are generally not too hard to define by hand (e.g. zip(lhs, rhs).map(|)). Overall I'm neutral on the methods, mildly -1 on the operators.

These are also just array.firstIndex(of: true)–not completely sure we need a special method for them.

Note that Collection requires an O(1) startIndex. You will probably need to keep this around somewhere and update it as the BitSet is modified.

I'd really like BitArray to conform to RangeReplaceableCollection: off the top of my head one use case would be being able to easily flip bit-endianness. It's useful like it would be on [Bool]: if the compiler can't optimize this, I think it should be fixed to be able to.

2 Likes

Yeah, I admit to only looking at the implementation briefly, so you may have a reason why this isn’t viable, but…the idea is that Optional<Array<Int>> has two free sentinels: nil and the empty array. You could say that one of these is an indicator that excess should be interpreted differently in some way. For example, the top bits could be the size and the rest of the word the storage.

0 bits: { nil, (0 << 56) | 0 }
101: { nil, (3 << 56) | 0b101 }
0101: { nil, (4 << 56) | 0b0101 }
(and so on, up until the storage overflows…)
57 bits: { [bits], 7 }

There are other encoding schemes that would work here too, since you only need 6 bits to store a count up to 63, but this is one version. It absolutely trades speed and code size for memory usage, but I think that’s a valid trade here as it is for small strings.

More complicated encoding because I love encodings

One drawback of the encoding I described above is that it makes Optional<BitArray> more expensive since the nil storage is now in use by BitArray itself. So if the decision is entirely in the excess field, that means we don’t have to touch storage at all. Consider encoding lengths 0…56 as described above…and encoding excesses as negative values, so that if the last valid bit is, say, 60, excess would be -4 instead of 3. Then each function looks like this:

if self.excess < 0 {
  let count = 64 + self.excess + 1
  // access storage
} else {
  let count = self.excess >> 56
  // access bottom bits of self.excess
}

Checking a sign is a very efficient operation, so this shouldn’t be too much of a cost compared to checking for nil. The name excess would probably change then too.

I’m skipping the option of having an empty array be the marker, but that’s also an option. It’s a little more expensive to test, though, since it’s not a simple comparison against a constant integer or nil.

I feel uneasy about omitting RRC conformance too, but this particular use would be MutableCollection, not RRC.

3 Likes

My main concern with any alternate backing store implementation is whether it interferes with automatic vectorization, which this is otherwise a very natural target for. It might be fine though!

2 Likes

Ah, yes, you're right, you can do this with MutableCollection. The implementation I had in my head was

func reverseRange(_ range: Range<Index>) {
	replaceSubrange(range, with: self[range].reversed())
}

but of course you can just do swaps :slight_smile: However, isn't RangeReplaceableCollection where you get insertions and deletions from? Those sound like important operations…

1 Like

This is not true. The relevant documentation says:

Expected Performance
Types that conform to Collection are expected to provide the startIndex and endIndex properties and subscript access to elements as O(1) operations. Types that are not able to guarantee this performance must document the departure, because many collection operations depend on O(1) subscripting performance for their own performance guarantees.

And this is already the case for some standard library types, e.g. LazyFilterCollection.

2 Likes

I think users will find it rather unexpected that a type with “set” in its name would allocate so much memory for a single element. A set-like type should work efficiently for sparse data.

• • •

Additionally, I don’t understand the motivation for not having BitSet conform to SetAlgebra. It seems like the exact sort of type that protocol was designed for, and it would be rather odd if people were unable to use generic SetAlgebra algorithms on their bit-sets.

• • •

On a different note, what are the intended semantics for == on BitSet?

Intuitively I would expect two bit-sets to compare equal if they contain the same set of values, regardless of how they ended up in that state. However the current implementation appears to also include the number of trailing zeros in the backing storage. For example:

let a: BitSet = []
var b = a
b.insert(100)
b.remove(100)
print(a == b)   // “false”, but I would expect “true”
7 Likes

Yeah, I agree. The objection that some SetAlgebra APIs don't make a ton of sense for BitSet reminds me of the discussion (starting here) about whether clamped(to:) made sense as a method on Comparable, even though it resulted in API of questionable use on String. IMO, the value provided by being able to use BitSet in generic contexts expecting a set-like thing outweighs the somewhat confusing APIs.

The objection that the APIs for SetAlgebra result in BitSet doing more work than necessary are perhaps more compelling, but I'd be curious to know what the practical impact of this is. @MahanazAtiqullah, do you happen to have any benchmarks that compare the performance of the non-SetAlgebra-conforming BitSet to an implementation which does support the conformance?

4 Likes

There is already Set<Int> or similar for this, though, which has the opposite problem for dense data. I don't see a fundamental problem with having different types with different space-time tradeoffs, as long as they are documented. Is there is an implementation strategy that would work well for both cases?

3 Likes

The idea I had in mind was something like this:

struct BitSet {
  var storage: [Int: UInt]
  
  func contains(_ value: Int) -> Bool {
    let n = UInt.bitWidth._binaryLogarithm()
    let key = value >> n
    
    guard let x = storage[key] else {
      return false
    }
    
    let shift = value & (UInt.bitWidth - 1)
    let mask = (1 as UInt) << shift
    
    return (x & mask) != 0
  }
}

Definitely possible, but it's not immediately clear what sizes and densities this is more space efficient for, and there's definitely some methods (and sizes, and densities) where this is likely to be significantly slower.

1 Like

What about IndexSet? It should handle well sparse sets and sets with long runs of ones.

1 Like

This seems correct to me. My personal view on it (and I'm not the author, but was one of the mentors) is that the Bit* collections are more about "what if we had Collection 'views' onto bitfields", and IndexSet is more suited for massive sparse sets.

4 Likes

I think we need to be careful about the API here. In other languages, bits and Booleans are interchangeable, but in Swift we are careful to make the semantic distinction. This would be the first API, to my knowledge, which conflates 'set' bits with true in Swift, and we should think carefully whether that is desirable, because it is a departure from the current state of affairs.

Interestingly, I was of the opposite opinion: that the operators were very natural. Indeed, integer types in Swift represent both the numerical value and the sequence of bits, and these operators modify the sequence of bits, so to apply the same to BitArray is faithful to the semantics of these operators.

However, I do agree with the feedback that these should be spelled either one way or another; there should not be duplicate APIs.

I agree with this feedback as well. A separate API would be justifiable only if there were some special performance benefit here. Otherwise, I think the existing API surface would suffice.

I agree with this feedback as well.

I would also point out that the purpose of protocol conformance is to enable useful generic algorithms. For this reason, if BitSet can conform to SetAlgebra, it should--a user should be able to write an algorithm that uses intersections or unions which is generic over Sets and BitSets. I'm not sure I understand the rationale that there are APIs which are less useful and thus the conformance is unnecessary--by its nature, any sufficiently complex protocol will have APIs that are more useful or less useful for a specific conforming type, but that isn't a metric for deciding on conformance. (Having read on in the thread, I see others have made similar comments.)

I'm not sure where the disagreement lies. This text shows very clearly that O(1) startIndex is required by Collection. The type in question is able to guarantee this performance, and therefore should be designed in a way that does so.

2 Likes

The text shows that O(1) startIndex is expected, it is clearly not required. Roughly every type could guarantee this performance by taking the storage and performance penalty of computing and updating the startIndex in the initialiser and every mutating operation, but they demonstrably don't all do that. It might be the right tradeoff in this case, but it's not required.

2 Likes