I suppose -Ounchecked is arguably the already-existing solution to all this; it's just that I don't know what else it's doing. More importantly, I know from experience that -Ounchecked removes checks that would catch actual bugs, whereas what we're talking about here are errant checks that create bugs (in layperson terms).
Using &+ / &- isn't a great solution because they don't automatically check that the final result is actually valid. And checking manually is easy to forget to do, or screw up.
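For example (arbitrary values, assuming a 64-bit UInt):

```swift
let a: UInt = 5
let b: UInt = 10
// Mathematically a - b + 2 is -3, which isn't representable in UInt at all,
// but the wrapping operators never check the final result:
let wrong = a &- b &+ 2   // silently yields 18446744073709551613
```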
Granted, I don't recall how overflow & underflow trapping works on x86-64 or AArch64, e.g. whether they trap immediately or whether it's a condition-register bit that has to be explicitly checked in code. So the following might not be viable on today's architectures.
However, what I was pondering is whether - for addition & subtraction - the check could simply be a postcondition that overflow count == underflow count. It's kind of like using the over/under-flow bits as an extra bit of precision, albeit one you'd have to accumulate into a wider register to handle (practically speaking) arbitrarily long sequences of additions and subtractions.
It would of course incur some cost at runtime, but only in cases where you're doing more than one arithmetic operation sequentially, which is determinable at compile time and (I suspect) relatively uncommon. And you'd only have to do the extra work upon actual under- or over-flow, so this could be split into fast and slow paths such that the fast path is likely no slower than what we have today.
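To sketch what I mean (purely illustrative - the function name and the widened accumulator are just mine, not a claim about how the compiler would actually lower this; UInt32 is used so an Int64 accumulator trivially covers the range):

```swift
// Do the intermediate arithmetic in a wider signed accumulator, and only
// check validity once, at the end of the expression. In a real
// implementation the fast path would avoid the wide accumulator entirely
// unless an individual operation actually under- or over-flowed.
func sumOfDifference(_ a: UInt32, _ b: UInt32, _ c: UInt32) -> UInt32 {
    let wide = Int64(a) - Int64(b) + Int64(c)   // cannot overflow Int64
    precondition((0...Int64(UInt32.max)).contains(wide),
                 "final result out of range for UInt32")
    return UInt32(wide)
}
```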
There's a paragraph in that paper's introduction which really summarises this well:
> Integer errors and vulnerabilities occur when programmers reason about infinitely ranged mathematical integers, while implementing their designs with the finite precision, integral data types supported by hardware and language implementations.
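A concrete instance of that mismatch, in the same shape as the examples elsewhere in this thread (values arbitrary):

```swift
let a: UInt = 5
let b: UInt = 6
// The mathematical result of a - b + 2 is 1, which fits in UInt just fine,
// but the intermediate a - b underflows and the program traps at runtime.
let result = a - b + 2
```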
Why is that? At least at a glance, changing this - addition & subtraction specifically - seems like it would only:
- Turn currently-broken code (crashing at runtime) into correctly working code? No source or ABI incompatibilities.
- Possibly impose a performance penalty for some integer arithmetic, depending on how efficiently this can ultimately be implemented. But would this actually matter given that standard optimisation mode (-O) already imposes a perhaps-comparable penalty? Thus the existence of -Ounchecked?
This is a much more specialised case than what's proposed by the AIR model. I like what the AIR model is aiming for - e.g. it'd be great to have integer multiplication be associative too - but I have no idea how viable it is re. performance trade-offs and compiler complexity.
I assume that here you're alluding to the [lack of] constant folding? It sounds like you're saying "arbitrary runtime arithmetic cannot be folded, so for consistency the compiler doesn't fold any arithmetic"?
If I understand correctly, then can you elaborate as to why constant folding isn't performed in Swift? It's the first optimisation most students implement in compiler courses at university. I would never in a million years have guessed that Swift doesn't do it. Do any other (real-world) languages not do it? I've never heard of it causing any problems. I've definitely assumed its application a lot in the past, for performance. 
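To be clear about what I mean by constant folding, here's a generic illustration (not a claim about what Swift currently emits for this particular expression):

```swift
// All the operands are known at compile time, so the compiler could emit the
// constant 86_400 directly rather than performing two multiplications at runtime.
let secondsPerDay = 60 * 60 * 24
```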
Why would that change anything? The arithmetic is still all done at once, so the compiler can optimise out errant crashes due to transient under- or over-flow.
I think you're looking for an example more like:
```swift
func ƒ(_ a: UInt, _ b: UInt) -> UInt {
    a - b
}

let result = ƒ(5, 6) + 2
```
And yeah, that sort of thing does raise the possibility of source changes introducing behaviour changes. I do recognise that downside. I'm still inclined to call it worth it, though.
And in that case, it is more explicable because you've kind of "named" the intermediary, as ƒ (or its return value specifically, UInt, if you prefer). It's no longer a completely anonymous implementation detail.
In any case, I think that "I restructured my code and now oddly it broke" is a lesser evil than "I wrote trivial code and oddly it broke".
There are also unintended side-effects already to changes like that (let alone non-trivial scenarios, like refactoring things into whole separate files, modules, etc.). From a performance perspective, at least.
(and don't dismiss "mere" performance issues - bad performance is a functionality issue, most obviously when it makes things time out, such as a network request or the user's patience)
Not if you allow for vectorisation.
In scalar arithmetic all values can be promoted (or truncated) to register width for free (64-bit, today), and scalar arithmetic almost always has the same latency irrespective of how much of the register you use (though maybe there are power benefits…?). But in SIMD it still matters to performance how many distinct values you can pack into a register, as that directly determines your arithmetic throughput & latency.
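A rough illustration using the standard library's SIMD types (the particular widths are just to make the point, assuming 128-bit vector registers):

```swift
// The element width determines how many lanes fit in a 128-bit vector, and
// therefore how much work each vector operation does:
let narrow = SIMD16<UInt8>(repeating: 1)    // 16 lanes
let wide   = SIMD4<UInt32>(repeating: 1)    // only 4 lanes
let n = narrow &+ narrow                    // 16 additions per operation
let w = wide &+ wide                        // 4 additions per operation
```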