Functions that accept arrays or array slices

scanon · May 2, 2022, 10:47pm

Echoing what Joe said, RandomAccessCollection is actually the wrong constraint for generic bignum arithmetic, broadly speaking. What you want for the low-level algorithms is a sequence of words ("limbs") laid out linearly in memory--this is an Unsafe[Mutable][Buffer]Pointer, not a RandomAccessCollection. RAC is needlessly generic, and will prevent some of the optimizations that you would otherwise be able to benefit from.

You might then present higher-level generic wrappers that marshal the data in to/out of whatever layout you're given if needed and call the low-level operations, but it's desirable to separate those from the implementation itself.

ksluder · May 3, 2022, 12:21am

What about a collection of UnsafeMutableBufferPointers? That would allow growing the bignum without reallocating it.

Either way, it sounds like you and Joe would advise implementing a struct BigNum that wraps a buffer or array instead of implementing this algorithm on any sort of protocol extension. Which is how I would have first approached this too; I’m curious why @wgjcv chose to go the extension route.

scanon · May 3, 2022, 12:46am

Personally, I would tend to implement the low level primitives (things like "add integer slices with carry", what GMP calls the mpn_* routines) directly on either UBP or a contiguous buffer abstraction, rather than bothering with a wrapping struct BigNum at all; that's a type for the user-facing high-level routines (what GMP calls mpz_*). This has the added virtue that it makes it possible to use existing C bignum packages as drop-in replacements for the low-level routines, either for performance reasons or as references for testing.

AlexanderM · May 3, 2022, 2:24am

IMO, fixits along these lines should be entirely eliminated.

I think the set of users who would benefit from this fixit is highly disjointed from the set of users who would understand it and its pitfalls. And that latter set of users would know to apply this, if it were necessary, without a fixit

AlexanderM · May 3, 2022, 2:31am

This is no "output".

You're trying to make a function which operates directly on the contents of your RandomAccessCollection. That collection, referred to as self within your method body, could be an Array<UInt>, an ArraySlice<UInt>, an UnsafeBufferPointer<UInt>, etc.

Any notion that the add function, when applied to all these desperate types, will "always output an array" is just ... not how any of this works.

The goal of even always having Array<UInt> is just contrary to the other goal, that this be performant and perform little/no copying/allocation. I.e. If you started with a SomeCollectionType<UInt>, and you want to avoid slow copies, then you don't even want the output to be Array<UInt>, there's no way that result could be materialized that doesn't involve copying the full contents into a freshly allocated array.

wgjcv · May 5, 2022, 2:27am

OK, one last shot at comparing mutating versus non-mutating functions to merge array and array slice arguments. I modified the mutating code again. The two types of functions should be on an equal footing. New benchmarks below the code. The impact of pointers to avoid using .StartIndex in the non-mutating code can be seen. Also it can be seen that the mutating code is still about 2x slower than the non-mutating code (apples to apples with no pointers). I have seen a website that claimed hat Swift functions that involved "self" would always be slower than functions that do not but I can't find it presently.

So this leads to a couple of questions:

Is there a way to avoid the penalty of using .StartIndex as required by array slice. Another poster suggested use of index(offset by:). I could not see how to use it.

The use of pointers that avoids .StartIndex offsetting makes a big difference in performance.

Do functions that use "self" really have this big of a performance impact?

// Returns a tuple containing the high and low components to UInt addition
 @inline(__always)
func addingFullWidth(_ x: UInt,_ y: UInt) -> (high: UInt, low: UInt) {
  let sum = x.addingReportingOverflow(y)
  return (sum.overflow ? 1 : 0, sum.partialValue)
}

// Returns the high and low parts of two seqeuential potentially overflowing additions
 @inline(__always)
func addingFullWidth(_ x: UInt, _ y: UInt, _ z: UInt) -> (high: UInt, low: UInt) {
  let xy = x.addingReportingOverflow(y)
  let xyz = xy.partialValue.addingReportingOverflow(z)
    let high:UInt = (xy.overflow ? 1 : 0) + (xyz.overflow ? 1 : 0)
  return (high, xyz.partialValue)
}

extension RandomAccessCollection
where Self: MutableCollection,
      Self: RangeReplaceableCollection,
      Element == UInt,
      Index == Int {

    // in-place add rhs to self
    mutating func Add<R: RandomAccessCollection> (_ rhs: R, _ uselastcarry:Bool = true)
    where R.Element == UInt, R.Index == Int {
        if rhs.count == 0 || (rhs.count == 1 && rhs.startIndex == 0) {
            return
        }
        let (sc, rc) = (self.count, rhs.count)
        let (si, ri) = (self.startIndex, rhs.startIndex)
        let minc = Swift.min(sc, rc)
        var rIndex = 0
        var carry:UInt = 0
        if sc >= rc {
            while rIndex < minc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], rhs[rIndex + ri], carry)
                rIndex += 1
            }
            if sc == rc {
                if carry > 0 && uselastcarry { self.append(1)}
                return
            }
            while carry > 0 && rIndex < rc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { self.append(1)}
            return
        } else {
            while rIndex < minc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], rhs[rIndex + ri], carry)
                rIndex += 1
            }
            self = self + rhs[minc...]
            while carry > 0 && rIndex < rc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { self.append(1)}
            return
        }
    }
}

// add two [UInt] arrays with pointers
func array64AddP<C1: RandomAccessCollection, C2: RandomAccessCollection>(_ x: C1,_ xPtr: UnsafeMutablePointer<UInt>,_ y: C2,_ yPtr: UnsafeMutablePointer<UInt>,_ uselastcarry:Bool = true) -> [UInt]
    where C1.Element == UInt, C1.Index == Int, C2.Element == UInt, C2.Index == Int {
    if x.count == 0 || (x.count == 1 && x.startIndex == 0) { return Array(y)}
    if y.count == 0 || (y.count == 1 && y.startIndex == 0) { return Array(x)}
    let (xc, yc) = (x.count, y.count)
    let minc = Swift.min(xc, yc)
    var rIndex = 0
    var result:[UInt]
    var carry:UInt = 0
    if xc >= yc {
        result = Array(x)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], yPtr[rIndex], carry)
            rIndex += 1
        }
        if xc == yc {
            if carry > 0 && uselastcarry { result.append(1)}
            return result
        }
        while carry > 0 && rIndex < xc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    } else {
        result = Array(y)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], xPtr[rIndex], carry)
            rIndex += 1
        }
        while carry > 0 && rIndex < yc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    }
    return result
}

// add two [UInt] arrays without pointers
func array64Add<C1: RandomAccessCollection, C2: RandomAccessCollection>(_ x: C1,_ y: C2,_ uselastcarry:Bool = true) -> [UInt]
    where C1.Element == UInt, C1.Index == Int, C2.Element == UInt, C2.Index == Int {
    if x.count == 0 || (x.count == 1 && x.startIndex == 0) { return Array(y)}
    if y.count == 0 || (y.count == 1 && y.startIndex == 0) { return Array(x)}
    let (xc, yc) = (x.count, y.count)
    let minc = Swift.min(xc, yc)
    let (xi, yi) = (x.startIndex, y.startIndex)
    var rIndex = 0
    var result:[UInt]
    var carry:UInt = 0
    if xc >= yc {
        result = Array(x)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], y[rIndex + yi], carry)
            rIndex += 1
        }
        if xc == yc {
            if carry > 0 && uselastcarry { result.append(1)}
            return result
        }
        while carry > 0 && rIndex < xc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    } else {
        result = Array(y)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], x[rIndex + xi], carry)
            rIndex += 1
        }
        while carry > 0 && rIndex < yc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    }
    return result
}

public func benchmarkTime(_ iter: Int, f:()->()) -> Double {
    f() // warmup
    let startTime = CFAbsoluteTimeGetCurrent()
    for _ in 0..<iter {f()}
    let elapsedTime = CFAbsoluteTimeGetCurrent() - startTime
    return elapsedTime
}

public func benchmark(title:String,_ iter: Int, f:()->()) {
    let averageTime = benchmarkTime(iter, f:f) / Double(iter)
    if averageTime >= 1.0 {
        print("\(title): \(String(averageTime)) seconds")
    } else if averageTime >= 0.001 {
        print("\(title): \(String(averageTime * 1000.0)) milliseconds")
    } else {
        print("\(title): \(String(averageTime * 1000000.0)) microseconds")
    }
}

for wordLength in [4, 10, 1000, 100000, 1000000] {
    print("wordLength =", wordLength)
    var u1:[UInt] = Array(repeating: UInt.max, count: wordLength)
    var u2 = u1[...]
    benchmark(title: "non-mutating add using pointers", 100) {
        let u3 = array64AddP(u1, &u1[u1.startIndex], u2, &u2[u2.startIndex])
    }
    
    benchmark(title: "non-mutating add reversed using pointers", 100) {
        let u3 = array64AddP(u2, &u2[u2.startIndex], u1, &u1[u1.startIndex])
    }
    
    benchmark(title: "non-mutating add", 100) {
        let u3 = array64Add(u1, u2)
    }
    
    benchmark(title: "non-mutating add reversed", 100) {
        let u3 = array64Add(u2, u1)
    }
    
    u1 = Array(repeating: UInt.max, count: wordLength)
    u2 = u1[...]
    benchmark(title: "mutating add", 100) {
        u1.Add(u2)
    }
    
    u1 = Array(repeating: UInt.max, count: wordLength)
    u2 = u1[...]
    benchmark(title: "mutating add reversed", 100) {
        u2.Add(u1)
    }
    
    print(" ")
}

wordLength = 4
non-mutating add using pointers: 0.8702278137207031 microseconds
non-mutating add reversed using pointers: 1.150369644165039 microseconds
non-mutating add: 1.6701221466064453 microseconds
non-mutating add reversed: 1.64031982421875 microseconds
mutating add: 2.0599365234375 microseconds
mutating add reversed: 1.9800662994384766 microseconds
 
wordLength = 10
non-mutating add using pointers: 1.1491775512695312 microseconds
non-mutating add reversed using pointers: 1.3506412506103516 microseconds
non-mutating add: 2.930164337158203 microseconds
non-mutating add reversed: 2.830028533935547 microseconds
mutating add: 4.35948371887207 microseconds
mutating add reversed: 4.31060791015625 microseconds
 
wordLength = 1000
non-mutating add using pointers: 5.700588226318359 microseconds
non-mutating add reversed using pointers: 5.890130996704102 microseconds
non-mutating add: 192.7196979522705 microseconds
non-mutating add reversed: 231.98962211608887 microseconds
mutating add: 448.7001895904541 microseconds
mutating add reversed: 436.0198974609375 microseconds
 
wordLength = 100000
non-mutating add using pointers: 602.8008460998535 microseconds
non-mutating add reversed using pointers: 559.7198009490967 microseconds
non-mutating add: 19.931190013885498 milliseconds
non-mutating add reversed: 21.635509729385376 milliseconds
mutating add: 43.83170962333679 milliseconds
mutating add reversed: 42.75159001350403 milliseconds
 
wordLength = 1000000
non-mutating add using pointers: 7.689861059188843 milliseconds
non-mutating add reversed using pointers: 7.640690803527832 milliseconds
non-mutating add: 209.53832030296326 milliseconds
non-mutating add reversed: 217.8787100315094 milliseconds
mutating add: 443.29445004463196 milliseconds
mutating add reversed: 460.3936803340912 milliseconds

wgjcv · May 10, 2022, 2:02am

Well, there are weeks where you can't do anything right. Last week was mine. The benchmark code messed up timing for the mutating functions. New code and benchmarks posted.

// Returns a tuple containing the high and low components to UInt addition
 @inline(__always)
func addingFullWidth(_ x: UInt,_ y: UInt) -> (high: UInt, low: UInt) {
  let sum = x.addingReportingOverflow(y)
  return (sum.overflow ? 1 : 0, sum.partialValue)
}

// Returns the high and low parts of two seqeuential potentially overflowing additions
 @inline(__always)
func addingFullWidth(_ x: UInt, _ y: UInt, _ z: UInt) -> (high: UInt, low: UInt) {
  let xy = x.addingReportingOverflow(y)
  let xyz = xy.partialValue.addingReportingOverflow(z)
    let high:UInt = (xy.overflow ? 1 : 0) + (xyz.overflow ? 1 : 0)
  return (high, xyz.partialValue)
}

extension RandomAccessCollection
where Self: MutableCollection,
      Self: RangeReplaceableCollection,
      Element == UInt,
      Index == Int {

    // in-place add rhs to self
    mutating func Add<R: RandomAccessCollection> (_ rhs: R, _ uselastcarry:Bool = true)
    where R.Element == UInt, R.Index == Int {
        if rhs.count == 0 || (rhs.count == 1 && rhs.startIndex == 0) {
            return
        }
        let (sc, rc) = (self.count, rhs.count)
        let (si, ri) = (self.startIndex, rhs.startIndex)
        let minc = Swift.min(sc, rc)
        var rIndex = 0
        var carry:UInt = 0
        if sc >= rc {
            while rIndex < minc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], rhs[rIndex + ri], carry)
                rIndex += 1
            }
            if sc == rc {
                if carry > 0 && uselastcarry { self.append(1)}
                return
            }
            while carry > 0 && rIndex < rc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { self.append(1)}
            return
        } else {
            while rIndex < minc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], rhs[rIndex + ri], carry)
                rIndex += 1
            }
            self = self + rhs[minc...]
            while carry > 0 && rIndex < rc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { self.append(1)}
            return
        }
    }
}

// add two [UInt] arrays with pointers
func array64AddP<C1: RandomAccessCollection, C2: RandomAccessCollection>(_ x: C1,_ xPtr: UnsafeMutablePointer<UInt>,_ y: C2,_ yPtr: UnsafeMutablePointer<UInt>,_ uselastcarry:Bool = true) -> [UInt]
    where C1.Element == UInt, C1.Index == Int, C2.Element == UInt, C2.Index == Int {
    if x.count == 0 || (x.count == 1 && x.startIndex == 0) { return Array(y)}
    if y.count == 0 || (y.count == 1 && y.startIndex == 0) { return Array(x)}
    let (xc, yc) = (x.count, y.count)
    let minc = Swift.min(xc, yc)
    var rIndex = 0
    var result:[UInt]
    var carry:UInt = 0
    if xc >= yc {
        result = Array(x)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], yPtr[rIndex], carry)
            rIndex += 1
        }
        if xc == yc {
            if carry > 0 && uselastcarry { result.append(1)}
            return result
        }
        while carry > 0 && rIndex < xc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    } else {
        result = Array(y)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], xPtr[rIndex], carry)
            rIndex += 1
        }
        while carry > 0 && rIndex < yc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    }
    return result
}

// add two [UInt] arrays without pointers
func array64Add<C1: RandomAccessCollection, C2: RandomAccessCollection>(_ x: C1,_ y: C2,_ uselastcarry:Bool = true) -> [UInt]
    where C1.Element == UInt, C1.Index == Int, C2.Element == UInt, C2.Index == Int {
    if x.count == 0 || (x.count == 1 && x.startIndex == 0) { return Array(y)}
    if y.count == 0 || (y.count == 1 && y.startIndex == 0) { return Array(x)}
    let (xc, yc) = (x.count, y.count)
    let minc = Swift.min(xc, yc)
    let (xi, yi) = (x.startIndex, y.startIndex)
    var rIndex = 0
    var result:[UInt]
    var carry:UInt = 0
    if xc >= yc {
        result = Array(x)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], y[rIndex + yi], carry)
            rIndex += 1
        }
        if xc == yc {
            if carry > 0 && uselastcarry { result.append(1)}
            return result
        }
        while carry > 0 && rIndex < xc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    } else {
        result = Array(y)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], x[rIndex + xi], carry)
            rIndex += 1
        }
        while carry > 0 && rIndex < yc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    }
    return result
}

public func benchmarkTime(_ iter: Int, f:()->()) -> Double {
    f() // warmup
    let startTime = CFAbsoluteTimeGetCurrent()
    for _ in 0..<iter {f()}
    let elapsedTime = CFAbsoluteTimeGetCurrent() - startTime
    return elapsedTime
}

public func benchmark(title:String,_ iter: Int, f:()->()) {
    let averageTime = benchmarkTime(iter, f:f) / Double(iter)
    if averageTime >= 1.0 {
        print("\(title): \(String(averageTime)) seconds")
    } else if averageTime >= 0.001 {
        print("\(title): \(String(averageTime * 1000.0)) milliseconds")
    } else {
        print("\(title): \(String(averageTime * 1000000.0)) microseconds")
    }
}

for wordLength in [4, 10, 1000, 100000, 1000000] {
    print("wordLength =", wordLength)
    var u1:[UInt] = Array(repeating: UInt.max, count: wordLength)
    var u2 = u1[...]
    benchmark(title: "non-mutating add using pointers", 100) {
        let u3 = array64AddP(u1, &u1[u1.startIndex], u2, &u2[u2.startIndex])
    }
    
    benchmark(title: "non-mutating add reversed using pointers", 100) {
        let u3 = array64AddP(u2, &u2[u2.startIndex], u1, &u1[u1.startIndex])
    }
    
    benchmark(title: "non-mutating add", 100) {
        let u3 = array64Add(u1, u2)
    }
    
    benchmark(title: "non-mutating add reversed", 100) {
        let u3 = array64Add(u2, u1)
    }
    
    u1 = Array(repeating: UInt.max, count: wordLength)
    u2 = u1[...]
    var u3 = u1
    benchmark(title: "mutating add", 100) {
        u1 = u3
        u1.Add(u2)
    }
    
    u1 = Array(repeating: UInt.max, count: wordLength)
    u2 = u1[...]
    var u4 = u2
    benchmark(title: "mutating add reversed", 100) {
        u2 = u4
        u2.Add(u1)
    }
    
    print(" ")
}

wordLength = 4
non-mutating add using pointers: 0.979900360107422 microseconds
non-mutating add reversed using pointers: 1.1897087097167969 microseconds
non-mutating add: 0.4601478576660156 microseconds
non-mutating add reversed: 0.489950180053711 microseconds
mutating add: 0.4291534423828125 microseconds
mutating add reversed: 0.7200241088867188 microseconds
 
wordLength = 10
non-mutating add using pointers: 1.1408329010009766 microseconds
non-mutating add reversed using pointers: 1.2707710266113281 microseconds
non-mutating add: 0.489950180053711 microseconds
non-mutating add reversed: 0.6008148193359375 microseconds
mutating add: 0.4601478576660156 microseconds
mutating add reversed: 0.8392333984375 microseconds
 
wordLength = 1000
non-mutating add using pointers: 3.0994415283203125 microseconds
non-mutating add reversed using pointers: 3.0100345611572266 microseconds
non-mutating add: 4.470348358154297 microseconds
non-mutating add reversed: 1.9109249114990237 microseconds
mutating add: 4.099607467651367 microseconds
mutating add reversed: 5.890130996704102 microseconds
 
wordLength = 100000
non-mutating add using pointers: 285.4800224304199 microseconds
non-mutating add reversed using pointers: 298.2497215270996 microseconds
non-mutating add: 466.43972396850586 microseconds
non-mutating add reversed: 187.39938735961914 microseconds
mutating add: 412.7800464630127 microseconds
mutating add reversed: 611.0906600952148 microseconds
 
wordLength = 1000000
non-mutating add using pointers: 4.8361194133758545 milliseconds
non-mutating add reversed using pointers: 4.813269376754761 milliseconds
non-mutating add: 5.4136693477630615 milliseconds
non-mutating add reversed: 2.7866196632385254 milliseconds
mutating add: 5.050569772720337 milliseconds
mutating add reversed: 6.79082989692688 milliseconds

Benchmarks now make sense.

ksluder · May 10, 2022, 1:45pm

Do you have an idea why the non-mutating version seems to outperform the mutating version for wordLength >= 1000?

wgjcv · May 10, 2022, 3:19pm

Not really, I suspect that either the way I am benchmarking is not completely fair between the mutating and non-mutating types or that the pointer code is more efficient with longer word lengths.

I am still not sure how to proceed with refactoring my code base. Generics certainly provide a path to using arrays and slices interchangeably (mostly). However, there is simply no way to get around the fact that using a non-zero based startIndex will be slower than using a zero-based one (functions that have to accommodate slices vs. functions that use just arrays and zero-based indexes). Using pointers does not feel right. Modifying all functions to use .startIndex feels right but hurts performance. My priorities are for more performant code which places me at a standstill.

I did convert a number of functions to use pointers as in the code above to see what problems might result. Where I hit a dead-end was the first function that used the generics that needed to call another function that used generics and pointers. I could not generate pointers that satisfied the second function from the first (cast a C1 or C2 pointer to array Uint pointer or array slice Uint pointer. I found that I could generate pointers if converting the arguments from C1 and C2 to either array or array slice, but the performance penalty is bad so no go.

ksluder · May 10, 2022, 3:54pm

This sounds like something which should be investigated. Is it because you’re calling startIndex in the inner loop? Could you pass the index as an argument to your inner loop, thus hoisting out the call to startIndex? You’d still need to call advanced(by:) in the inner loop, but generic specialization should turn that into a call to the + operator.

wgjcv · May 10, 2022, 4:25pm

I'm sure that there could be ways to optimize. Consider the following code:

        let (sc, rc) = (self.count, rhs.count)
        let (si, ri) = (self.startIndex, rhs.startIndex)
        let minc = Swift.min(sc, rc)
        var rIndex = 0
        var carry:UInt = 0
        if sc >= rc {
            while rIndex < minc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], rhs[rIndex + ri], carry)
                rIndex += 1
            }
       }

Could be rewritten to:

        let (sc, rc) = (self.count, rhs.count)
        let (si, ri) = (self.startIndex, rhs.startIndex)
        let minc = Swift.min(sc, rc)
        var rIndex = 0
        var rIndex2 = 0
        var carry:UInt = 0
        if sc >= rc {
            while rIndex < minc {
                rIndex2 = rIndex + si
                (carry, self[rIndex2]) = addingFullWidth(self[rIndex2], rhs[rIndex + ri], carry)
                rIndex += 1
            }
       }

Which removes one of the index offsets. But is there a way other than pointers to eliminate all three?

ksluder · May 10, 2022, 6:11pm

I’m looking at your code now and the way you’re using pointers does not seem like idiomatic Swift to me.

In particular:

    benchmark(title: "non-mutating add using pointers", 100) {
        let u3 = array64AddP(u1, &u1[u1.startIndex], u2, &u2[u2.startIndex])
    }

You should be using Array.withUnsafeBufferPointer(_:) to get the elements of u1 and u2. Then you don’t need to redundantly pass u1 and u2 to array64AddP—you can just pass the UnsafeBufferPointers to the function and use the buffer’s count and baseAddress directly. That will also allow you to change u1 and u2 from var to let.

Your algorithms don’t ensure result has reserved enough space to avoid reallocating during the loop. This is low-hanging fruit.

ksluder · May 10, 2022, 7:09pm

I rewrote your algorithm to feel a little more idiomatic, ~~and benchmarked both on my 16" MacBook Pro (2.4GHz 8-core Core i9) in clamshell mode:~~ but it currently crashes due to a bug in my logic.

Bogus Results

import CoreFoundation

// Returns a tuple containing the high and low components to UInt addition
 @inline(__always)
func addingFullWidth(_ x: UInt,_ y: UInt) -> (high: UInt, low: UInt) {
  let sum = x.addingReportingOverflow(y)
  return (sum.overflow ? 1 : 0, sum.partialValue)
}

// Returns the high and low parts of two seqeuential potentially overflowing additions
 @inline(__always)
func addingFullWidth(_ x: UInt, _ y: UInt, _ z: UInt) -> (high: UInt, low: UInt) {
  let xy = x.addingReportingOverflow(y)
  let xyz = xy.partialValue.addingReportingOverflow(z)
    let high:UInt = (xy.overflow ? 1 : 0) + (xyz.overflow ? 1 : 0)
  return (high, xyz.partialValue)
}

struct BigNum {
    var _digits: [UInt]
    
    init(_ value: UInt) {
        _digits = [value]
    }
    
    private func add(_ rhs: BigNum, into: inout BigNum) {
        let sc = _digits.count
        if sc == 0 {
            into._digits = _digits
            return
        }
        
        let rc = rhs._digits.count
        if rc == 0 {
            into._digits = rhs._digits
            return
        }
        
        let minc = min(sc, rc)
        into._digits.reserveCapacity(sc + rc + 1)
        
        var rIndex = 0
        var carry: UInt = 0
        while rIndex < minc {
            let (thisCarry, thisDigit) = addingFullWidth(_digits[rIndex], rhs._digits[rIndex], carry)
            carry = thisCarry
            into._digits[rIndex] = thisDigit
            rIndex += 1
        }
        if sc == rc {
            if carry > 0 {
                into._digits.append(1)
            }
            return
        } else if rc > sc {
            into._digits.append(contentsOf: rhs._digits[minc...])
        }
        while carry > 0 && rIndex < rc {
            let (thisCarry, thisDigit) = addingFullWidth(_digits[rIndex], carry)
            carry = thisCarry
            into._digits[rIndex] = thisDigit
            rIndex += 1
        }
        if carry > 0 {
            into._digits.append(1)
        }
    }
    
    static func +(lhs: BigNum, rhs: BigNum) -> BigNum {
        var result = BigNum(0)
        lhs.add(rhs, into: &result)
        return result
    }
    
    static func +=(lhs: inout BigNum, rhs: BigNum) {
        lhs.add(rhs, into: &lhs)
    }
}

public func benchmarkTime(_ iter: Int, f:()->()) -> Double {
    f() // warmup
    let startTime = CFAbsoluteTimeGetCurrent()
    for _ in 0..<iter {
        f()
    }
    let elapsedTime = CFAbsoluteTimeGetCurrent() - startTime
    return elapsedTime
}

public func benchmark(title:String,_ iter: Int, f:()->()) {
    let averageTime = benchmarkTime(iter, f: f) / Double(iter)
    if averageTime >= 1.0 {
        print("\(title): \(String(averageTime)) seconds")
    } else if averageTime >= 0.001 {
        print("\(title): \(String(averageTime * 1000.0)) milliseconds")
    } else {
        print("\(title): \(String(averageTime * 1000000.0)) microseconds")
    }
}

print("Sanity checks:")

print("9 + 1 == \(BigNum(9) + BigNum(1))")

var u = BigNum(9)
u += BigNum(1)
print("9 += 1 == \(u)")

print("UInt.max + 1 == \(BigNum(UInt.max) + BigNum(1))")

u = BigNum(UInt.max)
u += BigNum(1)
print("UInt.max += 1 == \(u)")

print("UInt.max + UInt.max == \(BigNum(UInt.max) + BigNum(UInt.max))")

u = BigNum(UInt.max)
u += BigNum(UInt.max)
print("UInt.max += UInt.max == \(u)")

for wordLength in [4, 10, 1000, 100000, 1000000] {
    print("wordLength =", wordLength)
    let u1: [UInt] = Array(repeating: UInt.max, count: wordLength)
    let u2 = u1[...]
    benchmark(title: "BigNum + BigNum", 100) {
        let _ = u1 + u2
    }
    
    var u3 = u1
    benchmark(title: "BigNum += BigNum", 100) {
        u3 += u2
    }
    
    print(" ")
}

wordLength = 4
non-mutating add using pointers: 0.7390975952148438 microseconds
non-mutating add reversed using pointers: 0.7700920104980469 microseconds
non-mutating add: 0.3898143768310547 microseconds
non-mutating add reversed: 0.41961669921875 microseconds
mutating add: 0.3695487976074219 microseconds
mutating add reversed: 0.6401538848876953 microseconds
BigNum + BigNum: 0.2110004425048828 microseconds
BigNum += BigNum: 0.12993812561035156 microseconds
 
wordLength = 10
non-mutating add using pointers: 0.7593631744384766 microseconds
non-mutating add reversed using pointers: 0.7808208465576172 microseconds
non-mutating add: 0.4100799560546875 microseconds
non-mutating add reversed: 0.4303455352783203 microseconds
mutating add: 0.3898143768310547 microseconds
mutating add reversed: 0.6902217864990234 microseconds
BigNum + BigNum: 0.209808349609375 microseconds
BigNum += BigNum: 0.12040138244628906 microseconds
 
wordLength = 1000
non-mutating add using pointers: 2.459287643432617 microseconds
non-mutating add reversed using pointers: 2.4902820587158203 microseconds
non-mutating add: 3.8003921508789062 microseconds
non-mutating add reversed: 2.1004676818847656 microseconds
mutating add: 3.830194473266602 microseconds
mutating add reversed: 7.270574569702148 microseconds
BigNum + BigNum: 0.4696846008300781 microseconds
BigNum += BigNum: 10.960102081298828 microseconds
 
wordLength = 100000
non-mutating add using pointers: 203.9790153503418 microseconds
non-mutating add reversed using pointers: 273.3898162841797 microseconds
non-mutating add: 253.46994400024417 microseconds
non-mutating add reversed: 136.64007186889648 microseconds
mutating add: 268.95999908447266 microseconds
mutating add reversed: 484.3103885650635 microseconds
BigNum + BigNum: 284.498929977417 microseconds
BigNum += BigNum: 1.0709309577941895 milliseconds
 
wordLength = 1000000
non-mutating add using pointers: 11.09326958656311 milliseconds
non-mutating add reversed using pointers: 10.905090570449829 milliseconds
non-mutating add: 4.498459100723267 milliseconds
non-mutating add reversed: 3.1947600841522217 milliseconds
mutating add: 4.407249689102173 milliseconds
mutating add reversed: 6.36989951133728 milliseconds
BigNum + BigNum: 6.1307597160339355 milliseconds
BigNum += BigNum: 9.435360431671143 milliseconds

There’s clearly something about my naïve += implementations that makes it slow. But assuming I didn’t introduce a logic error, my idiomatic out-of-place addition seems to be quite competitive with your in-place mutating version.

scanon · May 10, 2022, 8:04pm

This is much more than just "non-idiomatic". This simply does not work in Swift.

ksluder · May 10, 2022, 8:07pm

Since the argument is an UnsafePointer and the operand of & is an array, I assumed this worked by virtue of the C interop conveniences.

scanon · May 10, 2022, 8:10pm

No. It will sometimes "work" accidentally, but it's the compiler is completely within its rights to assume that the call to array64AddP only modifies the element u1[u1.startIndex], and u1 itself is unchanged.

wgjcv · May 11, 2022, 1:14am

Agreed. I agree with you and Steve Canon. What I have done is not what should be done. But I stated a couple of times before that I have not been able to create pointers within the function casting from types C1 and C2 that will allow me to perform operations on the UInt elements. All attempts I have made have resulted in a protocol compliance issue (such as binary integer) or a casting issue upon execution. So to see how pointers would or would not improve performance I passed them in. Again, not the proper way. If the function arguments were array or array slice (not a generic) you can create pointers and make them work, but once generics are used to fuse the two types the compiler and runtime fight any efforts to effectively use pointers. There probably is a way, but I am not a good enough Swift expert to determine what it is.

Passing unsafe buffer pointers into the functions rewrites the problem statement (function fusing the two types). It just moves the problem upstream.

ksluder · May 11, 2022, 1:27am

Can you elaborate on why you originally chose to formulate this as a protocol extension rather than a concrete type?

wgjcv · May 11, 2022, 4:57pm

Kyle,

My current library uses [UInt] as the magnitude of the big integer. It uses an array extension of [UInt] to perform operations on arrays (add, subtract, multiply, divide, shifts, logicals, bit fiddling, etc). The library works (provides correct results) but is not nearly as performant as C based libraries. The gold standard is GMP. I can perform computations at about 3x - 5x slower than GMP. For anything 3x or less I declare victory.

When it comes to Big Integer libraries, the faster it is the more you can do with it. For instance a good benchmark for any Big Integer library is how fast can you generate digits of Pi. In my case I can generate 1 billion digits in about 2.6 hours. I can compute all the digits of the numerator and denominator of the millionth bernoulli number in about 30 minutes. I can prime factor an 80 bit number into two 40 bit primes in about 8 hours. None of these benchmarks are fast but not embarrassing slow. These benchmark applications and the library are pure swift. I am using a slow laptop by todays standards (2013 MacBook Pro quad core). An Apple silicon machine would obviously be much faster. When I started my library about 2+ years ago I could not get within 2-3 orders of magnitude of GMP so I am making progress.

I am at the point where I just need to make my library more performant. I am in pretty good shape with large numbers. I have implemented a good public domain floating point FFT algorithm for multiplication of very large numbers. Where I am weak is for numbers that is not very small (> 20 UInt words) or very big (< 10000 Uint words or more) where divide and conquer algorithms are used to multiply. These algorithms include Karatsuba that splits numbers into two slices and performs operations on the two pieces and Toom Cook that splits numbers into three or more slices and performs operations on them. In both cases these divide and conquer algorithms have fewer operations than base case algorithms. Both algorithms are recursive and the more recursions you have the better performance you will get.

Now for the question of why am I trying to use generics to fuse array and array slice. The divide and conquer algorithms need to use slices. Slices should be performant since they give a view into an array. But they are not that easy to use. You cannot call a function that expects and array and give it an array slice. An array slice is not zero based indexed like an array (potential speed loss). If you just split an array and then immediately convert the slices back to arrays you will give up the performance that might be gained from the divide and conquer algorithms to begin with. The naive approach would to be to create functions with all the argument combinations:

array array
array slice
slice array
slice slice

Works but would result in hundreds of functions to test. Not happening. What is needed is a way for functions to handle arguments that are either array or array slice in the most performant way. Generics is one option. It does work. However in my testing new generic functions against my original array only functions I lose about 20-30% of the performance. Again this would negate the benefits of the divide and conquer algorithms. 20-30% does not sound like much but it is.

Hope this helps in the understanding.

ksluder · May 11, 2022, 4:59pm

This is the part I don’t understand. Rather use [UInt] directly as the type of a big integer, why not create a nominal BigInt type that wraps a [UInt], and implement your algorithms as methods on BigInt? That should avoid the need to understand anything about ArraySlice.