Functions that accept arrays or array slices

Note, this code's performance is potentially going to be significantly impacted by the way it builds its result array. result = result + [1] is a O(n) operation, which builds a whole new array, replacing the old one. Ideally you'd do result.append(1) to do an in-place append.

But this leads to the other thing, which is this function is allocating storage. Sometimes that's what you want, but usually for this kind of thing you want to implement this as the equivalent of +=, and then implement + in terms of that for the times when the user really does want a fresh array allocated.

This suggests what you want to do is implement this function as a mutating function on self that takes a right-hand side of another array:

extension RandomAccessCollection
where Self: MutableCollection,
      Self: RangeReplaceableCollection,
      Element == UInt {
    // in-place add rhs to self
    mutating func array64Add<
        RHS: RandomAccessCollection
    >(_ rhs: RHS),_ uselastcarry:Bool = true) where Element == UInt {
        
    }
}

I'd also suggest moving off using the pointer. Write the code generically, get it as clean as possible with the right algorithmic complexity. Then if it's slow because the optimizer isn't turning generic array subscript ops into loads during specialization, start to incorporate use of withUnsafeBufferPointer or withContiguousStorageIfAvailable to speed it back up. But start with clean code first, get the algorithm right in principle, then optimize for performance.

11 Likes

btw this uses RangeReplaceableCollection because that can grow when needed, which you do if the right-hand side needs more digits. It's unfortunate though, because you can't then use this algorithm on UnsafeBufferPointer<UInt> (because they can't grow... they're just a pointer and a length) which is a big benefit if you've gone to the trouble of writing it generically. So it's worth trying to externalize the requirement somehow so the caller needs to ensure enough space. At least in internal methods.

Is it possible to teach swift that result = result + [1] and result.append(1) is the same thing, so it does either in O(1)?

I created the below function to test above. It was harder to satisfy the compiler. And I ran into a case that still fails when it executes (see comment in code, self = result ...). This appears to be a case where the arguments being array slice and array is an issue.

I cannot see how the compiler can optimize all the .startIndex indexing. For a Big Integer of a few words long, maybe not an issue. But if you are computing a billion digits of pi that's a lot of additional operations.

// Returns a tuple containing the high and low components to UInt addition
 @inline(__always)
func addingFullWidth(_ x: UInt,_ y: UInt) -> (high: UInt, low: UInt) {
  let sum = x.addingReportingOverflow(y)
  return (sum.overflow ? 1 : 0, sum.partialValue)
}

extension RandomAccessCollection
where Self: MutableCollection,
      Self: RangeReplaceableCollection,
      Element == UInt,
      Index == Int {
    
    // in-place add rhs to self
    mutating func Add<R: RandomAccessCollection> (_ rhs: R,_ uselastcarry:Bool = true)
    where R.Element == UInt, R.Index == Int {
        if self.count == 0 || (self.count == 1 && self.startIndex == 0) {
            self = Array(rhs) as! Self
            return
        }
        if rhs.count == 0 || (rhs.count == 1 && rhs.startIndex == 0) {
            self = Array(self) as! Self
            return
        }
        let (sc, rc) = (self.count, rhs.count)
        let (si, ri) = (self.startIndex, rhs.startIndex)
        let minc = Swift.min(sc, rc)
        var rIndex = 0
        var carry:UInt = 0
        if sc >= rc {
            while rIndex < minc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], rhs[rIndex + ri], carry)
                rIndex += 1
            }
            if sc == rc {
                if carry > 0 && uselastcarry { self.append(1)}
                return
            }
            while carry > 0 && rIndex < rc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { self.append(1)}
        } else {
            var result = Array(rhs)
            while rIndex < minc {
                (carry, result[rIndex]) = addingFullWidth(result[rIndex], self[rIndex + si], carry)
                rIndex += 1
            }
            while carry > 0 && rIndex < rc {
                (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { result.append(1)}
            self = result as! Self // Could not cast value of type 'Swift.Array<Swift.UInt>' (0x7fff80bb7628) to 'Swift.ArraySlice<Swift.UInt>' (0x7fff80bb79b0).
        }
    }
}
  1. I really doubt that performant code is possible without involving pointers. Libraries like GMP get their performance from a number of things (good algorithms, bare metal C with lots of pointer usage, vectorization using SIMD and other processor instructions). I do not think that we will get a Swift package that will compete with GMP because those developers fight down to the last processor cycle, but we can and will get closer. My current code is often time in the 3-5x times slower range, but can be improved. I can also hope for future Apple Silicon features (I know just a dream). I could not find any functionality in Accelerate that were of any use.

  2. I have been unable to create a pointer to a generic argument in the function. The compiler has refused all attempts. I don't know if it even possiblle to get a pointer to a generic argument. May require the compiler to know the specific type at compile time which it can't.

  3. Just a rant, but when Swift was designed with different behaviors for array and array slice, that was a bad day.

Generic code is quite nice for performance tuning, because it allows you to focus on writing your algorithm at an abstract level against an opaque interface (say, "some RandomAccessCollection of UInts), and you can plug-in different implementations.

There is nothing about pointers that magically makes them fast. Pointers are just addresses in memory, and of course everything in your process' memory has an address. For example, Array uses pointers behind-the-scenes, but it layers validation on top to ensure memory safety, with the goal being that the optimiser will remove any redundant validation and it ends up being as fast as an unchecked pointer. There are two things it adds:

  • Temporal/lifetime safety through ARC. This can be an overhead, but the compiler is becoming much, much smarter about optimising it through "ownership" and other performance predictability features.

  • Spatial safety through bounds-checking. Ultimately this relies on the compiler's ability to reason about index values and how they evolve. Personally, I've found that Swift's use of signed integers can be very limiting here, and I've had great success addressing memory buffers using unsigned integers instead.

    My own results show that the compiler is so much better at reasoning about unsigned integers, we can get much more predictable bounds-checking elimination this way. Moreover, the compiler can better understand the access pattern, allowing optimisations like automatic vectorisation.

    This shouldn't be very surprising - proper collection algorithms obviously include their own bounds-checking (they don't rely on trapping at runtime!); we just need to find a way for the compiler to understand it better. I don't know if that means adding unsigned-indexed buffers to the standard library, or improving the optimiser (maybe both?).

Other than those two things, it is literally a pointer. We rely on the optimiser to remove them (and it is improving) but there are also things you can experiment with. It's an ongoing process to make those optimisations more accessible and predictable without jumping through special hoops; obviously we want idiomatic code to also be fast. So as far as portable code goes, I do actually think you can get at/close to C levels of performance - and you also do better, getting that kind of performance while also getting checked accesses to memory, which is a huge deal!

I should note that all of this is enabled by generics. The way I've been able to experiment with things like unsigned indexes at all is because my algorithms were all generic and didn't care what the actual index type was, so I could try them with an Array, and then with a wrapper which gives the Array unsigned indexes. That's why it's good to select your generic constraints carefully. Do you really need integer indexes? Remember that RandomAccessCollection gives you O(1) index construction from an integer offset anyway.

IMO, this is a really pleasing way to write code and optimise algorithms. It cleanly separates improvements to the generic algorithm itself from improvements to the data structure and how accesses are communicated to the compiler (and which validation we need it to perform for us).

As for non-portable code (processor-specific instructions), IIRC that is also something we want to add in some shape eventually. I don't think there are imminent plans for it, though. @scanon would know more.

2 Likes

Certainly. In fact we already have a form of this: a += [b] is rewritten as a.append(b). But I have a lot of misgivings about that optimization.

Relying on these optimizer heroics to let people write the “wrong” thing performance-wise, instead of encouraging them to understand what the code is actually doing in principle, is very dangerous. These kind of optimizations are great until they’re not, at which point people fall off a performance cliff. You can hard-code a = a + b to be rewritten efficiently when a is an Array and b is a literal, but not necessarily when a is another type (like swift-collection’s Deque) or when b is not a literal but another dynamically-generated Array of unknown size.

So if by having the simple case optimized, you let the community dismiss the concerns of why their code is inefficient, because they prefer how the inefficient version looks and the compiler sweeps the problem under the rug, you create a worse problem where the attitude “just write the code that you like, and the compiler will fix it” becomes ingrained.

5 Likes

The problem stems from things like this:

   self = Array(self) as! Self

When writing the code generically, you cannot do this. The compiler will let you write it, because as! means "don't question me, this is guaranteed to work" but it is absolutely not guaranteed to work – this conversion will crash at runtime if Self is anything other than an Array.

In this particular case, you can rewrite this line as self = Self(rhs). This is possible because RangeReplaceableCollection provides this functionality (the ability to initialize the type from another Sequence of any type with the same Element).

In this particular case, that operation is fine. But later on you do this:

            var result = Array(rhs)
[...]
            self = result as! Self

The same fix can be applied (var result = Self(rhs)). This will compile and work – but I don't recommend taking that fix. The reason being that you've again fallen back to solving the problem by allocating memory and replacing the left-hand side with it – something that will kill performance if this function is itself being called in a tight loop. Instead, the code needs to be rewritten so that it operates only to the left hand side.

It may be that the suggestion of generics was a bit premature here. I agree it's the desired end-state, but it's probably worth taking a different path to get there.

What I would do is go back to implementing this purely on a concrete Array type. But take a rigorous approach to what you are trying to achieve: write a function that you are confident does not allocate any memory so long as there is space in self. Any use of Array(...) should be eliminated. At the same time, try to avoid using literals like 0 for the index, instead using only relative things like startIndex.

Once you've done that, the next challenge is to write it so that overflows are detected and returned separately from the in-place addition. That way you can factor out the "growing the Array" part. Maybe some kind of precondition that the caller must include sufficient zero-initialized space to flow into in self, with some helper methods that do this for you before calling them add method.

Once you've done that, you should be able to use the same code with both Array and UnsafeMutableBufferPointer. You can wrap the body of the implementation in self.withUnsafeMutableBufferPointer { /* implementation goes here */ }. This should allow you to use a microbenchmark to determine if the implementation benefits from using pointers instead – but without having to bake the pointers into the function signature itself.

Finally, once everything is working nicely, you can reintroduce the generics and this code can be made to work for various inputs including arrays, slices, unsafe buffers, dequeues etc.

p.s. it looks like you're missing a helper function when posting this code, as there's a version of addingFullWidth that takes a third argument that means your snippets don't compile stand-alone

7 Likes

Sigh. I see the compiler suggests doing this as a fixit. That's not good.

6 Likes

as i understand it, += takes a inout, so it is equivalent to calling a mutating member like append(_:) on it.

for what it's worth i used to cargo cult append(_:) everywhere and never use += but i'm starting to question this. += is certainly easier to refactor just because it's easier to manipulate in a text editor.

along the same lines, is there any benefit to using CollectionOfOne instead of [b]? a while ago i was told this was faster and now CollectionOfOne is everywhere in my code, to questionable benefit.

1 Like

This is diverging from the OP so might be better as separate thread but...

There's really two things here:

  • avoiding allocating an unnecessary array for [b]
  • loop unrolling for appending a small fixed number of elements

In both cases, there are other paths than the bespoke a += [b] optimization. [b] ought to be caught by stack allocation, and once all the array code is inlined, it ought to unroll by itself. But both optimizations are quite brittle, so an earlier bespoke optimization just for appending is more likely to succeed and also possibly compiles faster.

2 Likes

Yes, that why I did it.

I have managed to get working code based on your input.

// Returns a tuple containing the high and low components to UInt addition
 @inline(__always)
func addingFullWidth(_ x: UInt,_ y: UInt) -> (high: UInt, low: UInt) {
  let sum = x.addingReportingOverflow(y)
  return (sum.overflow ? 1 : 0, sum.partialValue)
}

extension RandomAccessCollection
where Self: MutableCollection,
      Self: RangeReplaceableCollection,
      Element == UInt,
      Index == Int {
    
    // in-place add rhs to self
    mutating func Add<R: RandomAccessCollection> (_ rhs: R,_ uselastcarry:Bool = true)
    where R.Element == UInt, R.Index == Int {
        if self.count == 0 || (self.count == 1 && self.startIndex == 0) {
            self = Self(Array(rhs))
            return
        }
        if rhs.count == 0 || (rhs.count == 1 && rhs.startIndex == 0) {
            self = Self(Array(self))
            return
        }
        let (sc, rc) = (self.count, rhs.count)
        let (si, ri) = (self.startIndex, rhs.startIndex)
        let minc = Swift.min(sc, rc)
        var rIndex = 0
        var carry:UInt = 0
        if sc >= rc {
            while rIndex < minc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], rhs[rIndex + ri], carry)
                rIndex += 1
            }
            if sc == rc {
                if carry > 0 && uselastcarry { self.append(1)}
                return
            }
            while carry > 0 && rIndex < rc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { self.append(1)}
            self = Self(Array(self))
            return
        } else {
            var result = Array(rhs)
            while rIndex < minc {
                (carry, result[rIndex]) = addingFullWidth(result[rIndex], self[rIndex + si], carry)
                rIndex += 1
            }
            while carry > 0 && rIndex < rc {
                (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { result.append(1)}
            self = Self(result)
            return
        }
    }
}

var u1:[UInt] = [0]
var u2 = u1[1...]
u2.Add(u1)
print(u2)

For simplicity I chose to output self as an array and made the conversions necessary. I could have made the necessary conversions to make self's type remain unchanged but I would have to go this route I think:

if self is Array<UInt> {
    //  do what's necessary to make self an array
} else {
   //  do what's necessary to make self  a slice
}

I am going to have to ponder the output some. To this point I was trying to solve having two different inputs without thinking the output (sigh, two more options). This just gets worse and worse. No wonder I see developers just avoiding dealing with slices and just always converting to array.

The only thing I have not been able to solve is how to get a pointer to self or rhs in the add function. For sure if self is a slice the compiler simply refuses to cooperate no matter what I tried. The only sure way I know is to pass the pointer as an argument as in previous posts. However that complicates self even more.

Anyhow, I hope this post helps others that find themselves trying to merge array and array.

Next up is to do some benchmarking.

what about a += CollectionOfOne.init(b)?

You could eliminate the requirement that Index == Int if you use index(_, offsetBy:).

I have some initial benchmarking for mutating vs non-mutating versions. Code is below:


// Returns a tuple containing the high and low components to UInt addition
 @inline(__always)
func addingFullWidth(_ x: UInt,_ y: UInt) -> (high: UInt, low: UInt) {
  let sum = x.addingReportingOverflow(y)
  return (sum.overflow ? 1 : 0, sum.partialValue)
}

extension RandomAccessCollection
where Self: MutableCollection,
      Self: RangeReplaceableCollection,
      Element == UInt,
      Index == Int {
    
    // in-place add rhs to self
    mutating func Add<R: RandomAccessCollection> (_ rhs: R,_ uselastcarry:Bool = true)
    where R.Element == UInt, R.Index == Int {
        if self.count == 0 || (self.count == 1 && self.startIndex == 0) {
            self = Self(Array(rhs))
            return
        }
        if rhs.count == 0 || (rhs.count == 1 && rhs.startIndex == 0) {
            self = Self(Array(self))
            return
        }
        let (sc, rc) = (self.count, rhs.count)
        let (si, ri) = (self.startIndex, rhs.startIndex)
        let minc = Swift.min(sc, rc)
        var rIndex = 0
        var carry:UInt = 0
        if sc >= rc {
            while rIndex < minc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], rhs[rIndex + ri], carry)
                rIndex += 1
            }
            if sc == rc {
                if carry > 0 && uselastcarry { self.append(1)}
                return
            }
            while carry > 0 && rIndex < rc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { self.append(1)}
            self = Self(Array(self))
            return
        } else {
            var result = Array(rhs)
            while rIndex < minc {
                (carry, result[rIndex]) = addingFullWidth(result[rIndex], self[rIndex + si], carry)
                rIndex += 1
            }
            while carry > 0 && rIndex < rc {
                (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { result.append(1)}
            self = Self(result)
            return
        }
    }
}

// add two [UInt] arrays
func array64Add<C1: RandomAccessCollection, C2: RandomAccessCollection>(_ x: C1,_ xPtr: UnsafeMutablePointer<UInt>,_ y: C2,_ yPtr: UnsafeMutablePointer<UInt>,_ uselastcarry:Bool = true) -> [UInt]
    where C1.Element == UInt, C1.Index == Int, C2.Element == UInt, C2.Index == Int {
    if x.count == 0 || (x.count == 1 && x.startIndex == 0) { return Array(y)}
    if y.count == 0 || (y.count == 1 && y.startIndex == 0) { return Array(x)}
    let (xc, yc) = (x.count, y.count)
    let minc = Swift.min(xc, yc)
    var rIndex = 0
    var result:[UInt]
    var carry:UInt = 0
    if xc >= yc {
        result = Array(x)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], yPtr[rIndex], carry)
            rIndex += 1
        }
        if xc == yc {
            if carry > 0 && uselastcarry { result.append(1)}
            return result
        }
        while carry > 0 && rIndex < xc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    } else {
        result = Array(y)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], xPtr[rIndex], carry)
            rIndex += 1
        }
        while carry > 0 && rIndex < yc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    }
    return result
}


public func benchmarkTime(_ iter: Int, f:()->()) -> Double {
    f() // warmup
    let startTime = CFAbsoluteTimeGetCurrent()
    for _ in 0..<iter {f()}
    let elapsedTime = CFAbsoluteTimeGetCurrent() - startTime
    return elapsedTime
}

public func benchmark(title:String,_ iter: Int, f:()->()) {
    let averageTime = benchmarkTime(iter, f:f) / Double(iter)
    if averageTime >= 1.0 {
        print("\(title): \(String(averageTime)) seconds")
    } else if averageTime >= 0.001 {
        print("\(title): \(String(averageTime * 1000.0)) milliseconds")
    } else {
        print("\(title): \(String(averageTime * 1000000.0)) microseconds")
    }
}

for wordLength in [10, 1000, 100000, 1000000] {
    print("wordLength =", wordLength)
    var u1:[UInt] = Array(repeating: UInt.max, count: wordLength)
    var u2 = u1[...]
    benchmark(title: "nun-mutating add", 100) {
        let u3 = array64Add(u1, &u1[u1.startIndex], u2, &u2[u2.startIndex])
    }
    
    benchmark(title: "nun-mutating add reversed", 100) {
        let u3 = array64Add(u2, &u2[u2.startIndex], u1, &u1[u1.startIndex])
    }
    
    u1 = Array(repeating: UInt.max, count: wordLength)
    u2 = u1[...]
    benchmark(title: "mutating add", 100) {
        u1.Add(u2)
    }
    
    u1 = Array(repeating: UInt.max, count: wordLength)
    u2 = u1[...]
    benchmark(title: "mutating add reversed", 100) {
        u2.Add(u1)
    }
    print(" ")
}

Benchmarks performed on 2013 MacBook pro with code in release mode with safety checks disabled and optimized for speed. Results below:

wordLength = 10
nun-mutating add: 0.9202957153320312 microseconds
nun-mutating add reversed: 1.0395050048828125 microseconds
mutating add: 4.9495697021484375 microseconds
mutating add reversed: 4.409551620483398 microseconds
 
wordLength = 1000
nun-mutating add: 6.110668182373047 microseconds
nun-mutating add reversed: 5.890130996704102 microseconds
mutating add: 444.4706439971924 microseconds
mutating add reversed: 441.7300224304199 microseconds
 
wordLength = 100000
nun-mutating add: 557.7099323272705 microseconds
nun-mutating add reversed: 562.0193481445312 microseconds
mutating add: 40.66767930984497 milliseconds
mutating add reversed: 39.13396954536438 milliseconds
 
wordLength = 1000000
nun-mutating add: 7.4018096923828125 milliseconds
nun-mutating add reversed: 7.455489635467529 milliseconds
mutating add: 411.7432498931885 milliseconds
mutating add reversed: 409.24807071685

Quite a dramatic difference. I thought it would be much closer.

Sorry to sound like a stuck record but this still isn't in-place mutation. You've just created a version of the out-of-place function that, once it's done, replaces self with the new Array.

1 Like

No you're right, half of the function did not. I made some changes which should qualify as fully mutating.

// Returns a tuple containing the high and low components to UInt addition
 @inline(__always)
func addingFullWidth(_ x: UInt,_ y: UInt) -> (high: UInt, low: UInt) {
  let sum = x.addingReportingOverflow(y)
  return (sum.overflow ? 1 : 0, sum.partialValue)
}

extension RandomAccessCollection
where Self: MutableCollection,
      Self: RangeReplaceableCollection,
      Element == UInt,
      Index == Int {
    
    // in-place add rhs to self
    mutating func Add<R: RandomAccessCollection> (_ rhs: R,_ uselastcarry:Bool = true)
    where R.Element == UInt, R.Index == Int {
        if self.count == 0 || (self.count == 1 && self.startIndex == 0) {
            self = Self(Array(rhs))
            return
        }
        if rhs.count == 0 || (rhs.count == 1 && rhs.startIndex == 0) {
            self = Self(Array(self))
            return
        }
        let (sc, rc) = (self.count, rhs.count)
        let (si, ri) = (self.startIndex, rhs.startIndex)
        let minc = Swift.min(sc, rc)
        var rIndex = 0
        var carry:UInt = 0
        if sc >= rc {
            while rIndex < minc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], rhs[rIndex + ri], carry)
                rIndex += 1
            }
            if sc == rc {
                if carry > 0 && uselastcarry { self.append(1)}
                return
            }
            while carry > 0 && rIndex < rc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { self.append(1)}
            self = Self(Array(self))
            return
        } else {
            while rIndex < minc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], rhs[rIndex + ri], carry)
                rIndex += 1
            }
            self = self + rhs[minc...]
            while carry > 0 && rIndex < rc {
                (carry, self[rIndex + si]) = addingFullWidth(self[rIndex + si], carry)
                rIndex += 1
            }
            if carry > 0 && uselastcarry { self.append(1)}
            self = Self(Array(self))
            return
        }
    }
}

// add two [UInt] arrays
func array64Add<C1: RandomAccessCollection, C2: RandomAccessCollection>(_ x: C1,_ xPtr: UnsafeMutablePointer<UInt>,_ y: C2,_ yPtr: UnsafeMutablePointer<UInt>,_ uselastcarry:Bool = true) -> [UInt]
    where C1.Element == UInt, C1.Index == Int, C2.Element == UInt, C2.Index == Int {
    if x.count == 0 || (x.count == 1 && x.startIndex == 0) { return Array(y)}
    if y.count == 0 || (y.count == 1 && y.startIndex == 0) { return Array(x)}
    let (xc, yc) = (x.count, y.count)
    let minc = Swift.min(xc, yc)
    var rIndex = 0
    var result:[UInt]
    var carry:UInt = 0
    if xc >= yc {
        result = Array(x)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], yPtr[rIndex], carry)
            rIndex += 1
        }
        if xc == yc {
            if carry > 0 && uselastcarry { result.append(1)}
            return result
        }
        while carry > 0 && rIndex < xc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    } else {
        result = Array(y)
        while rIndex < minc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], xPtr[rIndex], carry)
            rIndex += 1
        }
        while carry > 0 && rIndex < yc {
            (carry, result[rIndex]) = addingFullWidth(result[rIndex], carry)
            rIndex += 1
        }
        if carry > 0 && uselastcarry { result.append(1)}
    }
    return result
}

public func benchmarkTime(_ iter: Int, f:()->()) -> Double {
    f() // warmup
    let startTime = CFAbsoluteTimeGetCurrent()
    for _ in 0..<iter {f()}
    let elapsedTime = CFAbsoluteTimeGetCurrent() - startTime
    return elapsedTime
}

public func benchmark(title:String,_ iter: Int, f:()->()) {
    let averageTime = benchmarkTime(iter, f:f) / Double(iter)
    if averageTime >= 1.0 {
        print("\(title): \(String(averageTime)) seconds")
    } else if averageTime >= 0.001 {
        print("\(title): \(String(averageTime * 1000.0)) milliseconds")
    } else {
        print("\(title): \(String(averageTime * 1000000.0)) microseconds")
    }
}

for wordLength in [4, 10, 1000, 100000, 1000000] {
    print("wordLength =", wordLength)
    var u1:[UInt] = Array(repeating: UInt.max, count: wordLength)
    var u2 = u1[...]
    benchmark(title: "nun-mutating add", 100) {
        let u3 = array64Add(u1, &u1[u1.startIndex], u2, &u2[u2.startIndex])
    }
    
    benchmark(title: "nun-mutating add reversed", 100) {
        let u3 = array64Add(u2, &u2[u2.startIndex], u1, &u1[u1.startIndex])
    }
    
    u1 = Array(repeating: UInt.max, count: wordLength)
    u2 = u1[...]
    benchmark(title: "mutating add", 100) {
        u1.Add(u2)
    }
    
    u1 = Array(repeating: UInt.max, count: wordLength)
    u2 = u1[...]
    benchmark(title: "mutating add reversed", 100) {
        u2.Add(u1)
    }
    print(" ")
}


New benchmarks below with similar results.

wordLength = 4
nun-mutating add: 0.9500980377197266 microseconds
nun-mutating add reversed: 1.0097026824951172 microseconds
mutating add: 2.1898746490478516 microseconds
mutating add reversed: 2.25067138671875 microseconds
 
wordLength = 10
nun-mutating add: 1.119375228881836 microseconds
nun-mutating add reversed: 1.329183578491211 microseconds
mutating add: 4.819631576538086 microseconds
mutating add reversed: 4.649162292480469 microseconds
 
wordLength = 1000
nun-mutating add: 5.730390548706055 microseconds
nun-mutating add reversed: 5.689859390258789 microseconds
mutating add: 458.0700397491455 microseconds
mutating add reversed: 473.87003898620605 microseconds
 
wordLength = 100000
nun-mutating add: 583.1098556518555 microseconds
nun-mutating add reversed: 607.140064239502 microseconds
mutating add: 42.97057032585144 milliseconds
mutating add reversed: 43.00993084907532 milliseconds
 
wordLength = 1000000
nun-mutating add: 7.75020956993103 milliseconds
nun-mutating add reversed: 7.6921093463897705 milliseconds
mutating add: 435.51946997642517 milliseconds
mutating add reversed: 428.4147894382477 milliseconds

This still isn't in-place mutation. The goal is not to write a mutating function – the reason for the mutating version is to avoid allocating new storage, and instead re-use the existing storage. But in your case, you are still allocating a new array and assigning it to self.

That said, there are probably other things at play here as well as that. But it's difficult to experiment with the code because it isn't stand-alone. It looks like it's relying on a 3-argument version of addingFullWidth that takes carry. I'm guessing this is somewhere else in your project, but not posted here?

1 Like

Sorry, Ben, you are right I did miss this function.

// Returns the high and low parts of two seqeuential potentially overflowing additions
 @inline(__always)
func addingFullWidth(_ x: UInt, _ y: UInt, _ z: UInt) -> (high: UInt, low: UInt) {
  let xy = x.addingReportingOverflow(y)
  let xyz = xy.partialValue.addingReportingOverflow(z)
    let high:UInt = (xy.overflow ? 1 : 0) + (xyz.overflow ? 1 : 0)
  return (high, xyz.partialValue)
}

I am sorry but I'm not sure I understand

self = Self(Array(self))

is to ensure that self is an array versus array slice on output

 self = self + rhs[minc...]

is needed if self is the smaller of the two operands.

self = Self(Array(rhs))

is the case of self = 0 added to rhs where the result will be rhs. Are you suggesting replacing this with removing the one element of self and appending rhs instead?? Or would it be a swap??

Stop trying to do that.

1 Like