I suppose -Ounchecked is arguably the already-existing solution to all this; it's just that I don't know what else it's doing. More importantly, I know from experience that -Ounchecked removes checks that would catch actual bugs, whereas what we're talking about here are errant checks that create bugs (in layperson terms).
Using &+ / &- isn't a great solution because they don't automatically check that the final result is actually valid. And checking manually is easy to forget to do, or screw up.
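For example (arbitrary values, assuming a 64-bit UInt):

```swift
let a: UInt = 5
let b: UInt = 10
// Mathematically a - b + 2 is -3, which isn't representable in UInt at all,
// but the wrapping operators never check the final result:
let wrong = a &- b &+ 2   // silently yields 18446744073709551613
```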
Granted, I don't recall how overflow & underflow trapping works on x86-64 or AArch64, e.g. whether they trap immediately or whether it's a condition-register bit that has to be explicitly checked in code. So the following might not be viable on today's architectures.
However, what I was pondering is whether - for addition & subtraction - the check could simply be a postcondition that overflow count == underflow count. It's kind of like using the over/under-flow bits as an extra bit of precision, albeit one you'd have to accumulate into a wider register to handle (practically speaking) arbitrarily long sequences of additions and subtractions.
It would of course incur some cost at runtime, but only in cases where you're doing more than one arithmetic operation sequentially, which is determinable at compile time and (I suspect) relatively uncommon. And you'd only have to do the extra work upon actual under- or over-flow, so this could be split into fast and slow paths such that the fast path is likely no slower than what we have today.
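To sketch what I mean (purely illustrative - the function name and the widened accumulator are just mine, not a claim about how the compiler would actually lower this; UInt32 is used so an Int64 accumulator trivially covers the range):

```swift
// Do the intermediate arithmetic in a wider signed accumulator, and only
// check validity once, at the end of the expression. In a real
// implementation the fast path would avoid the wide accumulator entirely
// unless an individual operation actually under- or over-flowed.
func sumOfDifference(_ a: UInt32, _ b: UInt32, _ c: UInt32) -> UInt32 {
    let wide = Int64(a) - Int64(b) + Int64(c)   // cannot overflow Int64
    precondition((0...Int64(UInt32.max)).contains(wide),
                 "final result out of range for UInt32")
    return UInt32(wide)
}
```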
There's a paragraph in that paper's introduction which really summarises this well:
> Integer errors and vulnerabilities occur when programmers reason about infinitely ranged mathematical integers, while implementing their designs with the finite precision, integral data types supported by hardware and language implementations.
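A concrete instance of that mismatch, in the same shape as the examples elsewhere in this thread (values arbitrary):

```swift
let a: UInt = 5
let b: UInt = 6
// The mathematical result of a - b + 2 is 1, which fits in UInt just fine,
// but the intermediate a - b underflows and the program traps at runtime.
let result = a - b + 2
```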
Why is that? At least at a glance, changing this - addition & subtraction specifically - seems like it would only:
- Turn currently-broken code (crashing at runtime) into correctly working code? No source or ABI incompatibilities.
- Possibly impose a performance penalty for some integer arithmetic, depending on how efficiently this can ultimately be implemented. But would this actually matter given that standard optimisation mode (-O) already imposes a perhaps-comparable penalty? Thus the existence of -Ounchecked?
This is a much more specialised case than what's proposed by the AIR model. I like what the AIR model is aiming for - e.g. it'd be great to have integer multiplication be associative too - but I have no idea how viable it is re. performance trade-offs and compiler complexity.
I assume that here you're alluding to the [lack of] constant folding? It sounds like you're saying "arbitrary runtime arithmetic cannot be folded, so for consistency the compiler doesn't fold any arithmetic"?
If I understand correctly, then can you elaborate as to why constant folding isn't performed in Swift? It's the first optimisation most students implement in compiler courses at university. I would never in a million years have guessed that Swift doesn't do it. Do any other (real-world) languages not do it? I've never heard of it causing any problems. I've definitely assumed its application a lot in the past, for performance. 
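To be clear about what I mean by constant folding, here's a generic illustration (not a claim about what Swift currently emits for this particular expression):

```swift
// All the operands are known at compile time, so the compiler could emit the
// constant 86_400 directly rather than performing two multiplications at runtime.
let secondsPerDay = 60 * 60 * 24
```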
Why would that change anything? The arithmetic is still all done at once, so the compiler can optimise out errant crashes due to transient under- or over-flow.
I think you're looking for an example more like:
```swift
func ƒ(_ a: UInt, _ b: UInt) -> UInt {
    a - b
}

let result = ƒ(5, 6) + 2
```
And yeah, that sort of thing does raise the possibility of source changes introducing behaviour changes. I do recognise that downside. I'm still inclined to call it worth it, though.
And in that case, it is more explicable because you've kind of "named" the intermediary, as ƒ (or its return value specifically, UInt, if you prefer). It's no longer a completely anonymous implementation detail.
In any case, I think that "I restructured my code and now oddly it broke" is a lesser evil than "I wrote trivial code and oddly it broke".
There are also unintended side-effects already to changes like that (let alone non-trivial scenarios, like refactoring things into whole separate files, modules, etc.). From a performance perspective, at least.
(and don't dismiss "mere" performance issues - bad performance is a functionality issue, most obviously when it makes things time out, such as a network request or the user's patience)
Not if you allow for vectorisation.
In scalar arithmetic all values can be promoted (or truncated) to register width for free (64-bit, today), and scalar arithmetic almost always has the same latency irrespective of how much of the register you use (though maybe there are power benefits…?). But in SIMD it still matters to performance how many distinct values you can pack into a register, as that directly determines your arithmetic throughput & latency.
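A rough illustration using the standard library's SIMD types (the particular widths are just to make the point, assuming 128-bit vector registers):

```swift
// The element width determines how many lanes fit in a 128-bit vector, and
// therefore how much work each vector operation does:
let narrow = SIMD16<UInt8>(repeating: 1)    // 16 lanes
let wide   = SIMD4<UInt32>(repeating: 1)    // only 4 lanes
let n = narrow &+ narrow                    // 16 additions per operation
let w = wide &+ wide                        // 4 additions per operation
```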