Why does Swift use signed integers for unsigned indices?

Continuing the discussion from Pitch: enable bounds-checking for BufferPointers:

It's important to note that (realistically) this is never going to change [in Swift]. This discussion is purely for edification.

Official justification

Per the language guide:

Use UInt only when you specifically need an unsigned integer type with the same size as the platform’s native word size. If this isn’t the case, Int is preferred, even when the values to be stored are known to be nonnegative. A consistent use of Int for integer values aids code interoperability, avoids the need to convert between different number types, and matches integer type inference, as described in Type Safety and Type Inference.

And also:

Use the Int type for all general-purpose integer constants and variables in your code, even if they’re known to be nonnegative. Using the default integer type in everyday situations means that integer constants and variables are immediately interoperable in your code and will match the inferred type for integer literal values.

Use other integer types only when they’re specifically needed for the task at hand, because of explicitly sized data from an external source, or for performance, memory usage, or other necessary optimization. Using explicitly sized types in these situations helps to catch any accidental value overflows and implicitly documents the nature of the data being used.

Though note how that last sentence contradicts the earlier text since it is implicitly acknowledging that a signed type may permit logically invalid values (e.g. negative indices).

The Highlander problem

Though it's not written down anywhere formal that I can find, I recall the other reason is that Swift can have only one preferred numeric type (i.e. what i defaults to when you write let i = 5). It was argued that a signed type is more flexible - since it supports negative values - so it would ultimately require less explicit typing of variables.

e.g. consider if you had:

let a = 5
let b = -5
let result = a + b
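
For contrast, here's a rough, purely hypothetical sketch of what that code would look like if the default integer type were unsigned instead - hence the argument that a signed default needs fewer explicit annotations and conversions:

// pseudo-world where integer literals default to an unsigned type:
let a: UInt = 5
let b = -5                // must remain an Int – a UInt can't hold -5
let result = Int(a) + b   // mixing them now forces an explicit conversion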

Possible other reasons

Avoid crashes

I vaguely recall there was also an argument that it was to avoid unexpected crashes due to the very literal-minded nature of Swift's arithmetic evaluation, e.g.:

var index: UInt = 0
someArray[index - 5 + someArgument]   // imagine Array's subscript took a UInt

IIRC, that's basically guaranteed to crash even if someArgument is ≥ 5, because the arithmetic is performed step by step and it underflows on the first operation (-5).

It's hard to fault Swift for this because it's hard for the compiler to know what the "correct" order of associative operations is (if there even is one). The only true solution would probably be something like a numeric model of unlimited-precision intermediaries, which isn't trivial for a compiler to implement. It was apparently considered in Swift's early days but alas was rejected. I suspect it'll be a hallmark of the next generation of languages (it's in practice pretty similar to how popular languages like Python already work, with their unlimited-precision numeric types).

Note also that in at least some languages which use signed indices (e.g. Julia), the compiler actually has to do extra work as a result, so that it can detect incorrect uses (by magically knowing that, despite the signed type in use, the values are in principle never actually negative).

Convenience in loops

Similarly, a C-style for(;;) loop might "intentionally" underflow because it counts down and has a >= 0 check, requiring the index to go negative in order to actually escape the loop. But an unsigned index would crash (in Swift) due to the underflow.

However, Swift very deliberately avoids such "low-level" looping constructs - they're one of the only cases of syntax actually being removed - favouring unindexed enumeration (for object in collection) and range-based enumeration (for i in 0..<5). And for reverse order, for i in stride(from: 5, to: 0, by: -1) works just fine even if the 'from' and 'to' arguments are unsigned.
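
To illustrate (a rough sketch, assuming the bound happens to be a UInt):

let count: UInt = 5

// A literal translation of the C-style countdown traps, because the only way
// out of the loop is for `i` to go below zero, which a UInt can't do:
//
//   var i = count - 1
//   while i >= 0 { print(i); i -= 1 }   // traps once i == 0 and we subtract 1
//
// whereas the stride-based form terminates cleanly even with unsigned bounds:
for i in stride(from: count, to: 0, by: -1) {
    print(i)   // prints 5, 4, 3, 2, 1
}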

Potential performance benefits

In some other languages (e.g. C++) there are performance benefits to using signed integers for e.g. loop counters, because overflow is officially undefined behaviour for signed integers but not for unsigned integers, giving the compiler's optimiser more flexibility for signed integers.

However, I don't think this applies to Swift, since overflow is not undefined behaviour in the same sense (it crashes, which is still bad behaviour, but the Swift compiler is required to ensure it crashes; it can't just omit the check even if nothing would go wrong in its absence).
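
For what it's worth, Swift only forgoes the check when you ask for wrapping behaviour explicitly via the &-prefixed operators; it can't silently assume overflow never happens the way a C++ optimiser can. A small illustration:

let x = Int.max
// let a = x + 1     // checked addition: traps (or is rejected at compile time
//                   // when the compiler can see the overflow coming)
let b = x &+ 1       // explicitly wrapping addition: yields Int.min, no check
print(b == Int.min)  // true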

Downsides

I understand the original rationale for using signed integers for everything in Swift, but I think in retrospect that was a bad decision. It creates more problems than it solves, trading essentially just some occasional Int(…) / UInt(…) "noise" for a bunch of downsides:

  • It's conceptually wrong, and confusing.
  • It moves error diagnostics away from their source (e.g. you crash only when you try to use an invalid - negative - index, rather than when the negative index is created; see the sketch after this list).
  • It's fundamentally at odds with Objective-C code, which does use unsigned integers for indices, creating subtle bridging errors and complicating the bridging since there has to be special behaviour in place to implicitly rewrite NSUInteger (and friends) to Int.
  • It limits the size of containers. Though it's exceedingly rare that this is a practical issue when using Int on 64-bit architectures.
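
A quick sketch of the diagnostics point (illustrative values only):

let items = ["a", "b", "c"]
let offset = 5

// With Int, the invalid index is created silently here…
let index = items.count - offset   // -2, no diagnostic at this point

// …and only reported later, far from the arithmetic that produced it:
// print(items[index])             // traps: "Index out of range"

// With an unsigned index type, the underflow would trap at the subtraction
// itself, right where the logic error actually is:
// let uindex = UInt(items.count) - UInt(offset)   // would trap here instead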

I make a point of using the conceptually correct type in my own code, such as UInt whenever that applies, and I don't have any problem with it; in practice it adds very little explicit casting because it's unnatural (and suspicious) to be converting between signed and unsigned for a given application. I suspect the vast majority of the casts are merely to deal with 3rd party libraries (including Apple's) that use the wrong signedness.

9 Likes

Part of the reason we* were okay mapping NSUInteger to Int in Apple APIs is that NSNotFound is NSIntegerMax. So none of those APIs supported bigger sizes anyway. On top of that, Foundation used NSUInteger for its collections, but AppKit and UIKit tended to use NSInteger, with -1 as a placeholder rather than NSNotFound, so if we hadn’t combined the two we would have immediately had API problems. (This predated API notes and most of the per-API adjustments made for Swift.)

As noted, this whole discussion is abstract anyway, so you can pivot to “if Swift hadn’t had to be compatible with existing Cocoa APIs, would it have still picked Int everywhere?” And I think the answer is still yes, because again, positions are unsigned but offsets are signed anyway, but it’s much less of a clear win. (Rust went the other way, with no “standard” integer type and usize (UInt) preferred for positions, and it is annoying sometimes but mostly it’s fine.)

* speaking as someone who worked on Swift 1-5.

EDIT: Also, it’s great to catch negative indexes, but it doesn’t stop out-of-bounds positive indexes, and a collection can easily check for both with a single biased comparison. So we’re not saving the running code any work, and we’re not providing any real type-level guarantee to the humans writing the code that uses a collection. The only thing that is guaranteed to get better is writing a collection, and that isn’t a major thing to optimize your design for.

17 Likes

Thank you for the wonderful reply. I acknowledge that, as you said, this ain’t going to change in Swift, but I agree with your conclusions.

I also wonder if the explicit UInt(..) and Int(…) that would be required wouldn’t be more Swifty, since they nudge the programmer and reader to think about when those conversions are necessary (and maybe you never need negative values to begin with, so why not remove the complexity?).

But thanks again, what great food for thought.

I don’t want to go off-topic too much, but would you mind a sharing a bit how to check both bounds with a single comparison?

If you want to check whether an integer x is in the range l..<h (where l ≤ h), you can do it with UInt(bitPattern: x &- l) < UInt(bitPattern: h - l). Anything lower than l will wrap around, making it a very large number, and fail the comparison.
A lot of compilers will do this transformation if both bounds are known at compile time, and I was able to get both clang and gcc to do it if the lower bound is known to be zero, but it seems Swift's Range doesn't communicate its begin≤end invariant to the compiler, so it doesn't get the optimization: Compiler Explorer
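
For reference, the trick wrapped up as a Swift function (the function name is just for illustration):

// Single-comparison bounds check for x in l..<h, assuming l <= h.
// If x < l, the subtraction wraps around to a huge UInt and the test fails.
func isInRange(_ x: Int, from l: Int, to h: Int) -> Bool {
    UInt(bitPattern: x &- l) < UInt(bitPattern: h - l)
}

print(isInRange(3, from: 0, to: 5))    // true
print(isInRange(-1, from: 0, to: 5))   // false – wraps to a huge value
print(isInRange(5, from: 0, to: 5))    // false – upper bound is exclusive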

17 Likes

Thanks for creating this topic, this is a frequently asked question.

Many developers try to understand why Int should be used by default. An explanation like "Int is preferred, even when the values to be stored are known to be nonnegative" in fact says almost nothing. It amounts to "believe it, without any detailed and meaningful explanation". It is a good idea to gather more explanations and reasonable examples.

The same can be said of bounds checking – an invalid index leads to a crash, and this is what we want in most cases. In my opinion the same reasoning applies to UInt – if you use it, you should be careful about underflow crashes, just as with index subscripts.

My own observation is that indices are very often used incorrectly. For example, it is a common situation in mobile apps for an index to be passed from the UI's indexPath.row to a view model or interactor, where models[index] leads to a crash in production, because what you see in the UI and what is stored in the model can be very different things. In such cases elements should be looked up by their ID instead of by UI index.
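
Roughly like this (the types and names here are made up for illustration):

struct Product {
    let id: UInt64
    let name: String
}

final class ProductListViewModel {
    private var products: [Product] = []

    // Fragile: trusts that the UI's row index still matches the model array.
    func select(row: Int) -> Product {
        products[row]   // traps in production if the UI and model have diverged
    }

    // Safer: look the element up by its identity rather than its position.
    func select(id: UInt64) -> Product? {
        products.first { $0.id == id }
    }
}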

I have the same experience.

A consistent use of Int for integer values aids code interoperability – lots of codebases don't need to care about interoperability.
Some examples from one of my projects where using UInt has practical benefits are listed below.

Server responses mapping:

let notificationsCount: UInt
let confirmationsLimit: UInt
let pageNumber: UInt
let stage: UInt
let totalCount: UInt
...

There were situations when negative values came from the server (don't ask why, just because..). Using UInt leads to a decoding error, which in our case is detected automatically and bug tickets are created.
Using Int leads to a successful mapping, but problems occur later, very often as unexpected behaviour. Then users hit these problems, and various if-elses and asserts get added instead of fixing the underlying problem.
The same applies to data passed from the app to the server, where the app should perform the equivalent check.
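
A minimal sketch of the decoding behaviour described above (the model name is made up):

import Foundation

struct NotificationsResponse: Decodable {
    let notificationsCount: UInt
}

let json = Data(#"{"notificationsCount": -3}"#.utf8)

do {
    _ = try JSONDecoder().decode(NotificationsResponse.self, from: json)
} catch {
    print(error)   // DecodingError: the value -3 doesn't fit in UInt
}

// With `let notificationsCount: Int` the same payload decodes successfully,
// and the bad value only surfaces later as unexpected behaviour.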

A stricter contract – if a value exists, then it is valid.

let rating: UInt
let bonusPoints: UInt
let numberOfReviews: UInt
let perPage: UInt
let totalPages: UInt
let nextPage: UInt

func getNextPage() -> UInt {
  currentPage + 1
}
...

It's easier to test because we need to care less about negative values.

public struct AppVersion: LosslessStringConvertible {
  public let major: UInt16
  public let minor: UInt16
  public let patch: UInt16

  ...
}

App version components are statically known to be nonnegative.

import Foundation

extension Double {
  public func asString(fractionDigits: UInt8) -> String {
    String(format: "%.\(fractionDigits)f", self)
  }
}

If fractionDigits were an Int, how should this function be implemented? Of course, we could check that fractionDigits >= 0, but using UInt8 doesn't require this check at all.
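
For illustration (assuming the extension above):

print((1.0 / 3.0).asString(fractionDigits: 3))      // "0.333"
// print((1.0 / 3.0).asString(fractionDigits: -1))  // rejected at compile time:
//                                                  // -1 can't be a UInt8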

let promoId: UInt64
let orderId: UInt64
let categoryId: UInt64
...

UInt64 is often used as an ID type.

I think it would be best if AIR arithmetic was implemented as a standalone library, for those specialized applications that need it. It might be possible already, with some combination of clever operator overloading and macros.

Even in the absence of a fully general AIR model, I think it helps to look at specialized integer types like the sized and unsigned types as strictly storage representations, and do your general arithmetic in a "big enough" integer type. For instance, you could use property wrappers to store integer values in smaller fields while still projecting their public API in terms of Int or Int64, to minimize the need for user code to mix types. (Another idea we considered in the early days of Swift was to make it so that Int was 64-bit on all platforms, though we decided against it partly for the sake of maintaining NSInteger == Int on the 32-bit Apple platforms that were still kicking at the time. For a new systems-ish language that wanted to do something like AIR but not pay for true bignums, having the int type be 64 or 128 bits would probably be good enough for 99.9% of use cases.)
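
A rough sketch of the property-wrapper idea (nothing here is an existing API; the names and widths are just for illustration):

// Stores the value compactly as a UInt16, but presents it to clients as an Int,
// so user code never has to mix integer types.
@propertyWrapper
struct StoredUInt16 {
    private var storage: UInt16

    var wrappedValue: Int {
        get { Int(storage) }
        set { storage = UInt16(newValue) }   // traps if outside 0...65535
    }

    init(wrappedValue: Int) {
        storage = UInt16(wrappedValue)
    }
}

struct AppVersion {
    @StoredUInt16 var major: Int
    @StoredUInt16 var minor: Int
    @StoredUInt16 var patch: Int
}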

5 Likes

I wonder if it would be beneficial for Swift to leave it unspecified whether and when it uses an AIR model and when it doesn't, to allow for better optimizations.

Sorry, but what is the AIR model or AIR arithmetic? 🙂

1 Like

Consider an arithmetic expression written in terms of variables having fixed-width integer types. You can imagine a system where we use interval arithmetic to compute the possible range of values for each intermediate expression and then represent the intermediate values exactly without overflow, before truncating or clamping to some final result type.

2 Likes

In practical terms:

let example: UInt = 1 - 2 + 3

Today that either crashes (if you hide those values behind variables that the compiler can't see through) or fails with a compiler error (because the compiler recognises that otherwise it would crash at runtime). Even though to a human there's obviously nothing wrong with it.

"AIR" (and similar mechanisms) basically provide the illusion that until you actually store a numeric value, it has no limits on its upper and lower bounds (including going negative, even if the input types are unsigned). As a result, it's impossible to over- or under-flow in the midst of an expression, only when you actually try to store the value. So the above example would work just fine, assigning 2 to the variable.

It can be done pretty simply (e.g. using an actual arbitrary-precision type for the implicit intermediaries), but realistically - for viable runtime performance - it relies on the compiler cleverly deducing the actual numeric ranges of the values in question, and then using a suitable numeric type - e.g. an Int8 (or larger) in the above example.
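
You can approximate this by hand today, by doing the intermediate arithmetic in a signed type and only converting back when storing:

let a: UInt = 1, b: UInt = 2, c: UInt = 3

// let wrong: UInt = a - b + c                  // traps: a - b underflows first
let ok: UInt = UInt(Int(a) - Int(b) + Int(c))   // compute exactly, convert when storing
print(ok)                                       // 2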

7 Likes

It stands for “as-if infinitely ranged”.

10 Likes

In the general case, aren‘t indices that could have an invalid value “dangerous” anyway, regardless of their possible negativity?

1 Like

How would AIR handle this fragment:

func foo(_ a: Int64, _ b: Int64, _ c: Int64, _ d: Int64) -> Int64 {
    42 * a * b / c / d
}
...
let result = foo(.max, .max, .max, .max) // should be 42

Will it have to use arbitrary precision numbers?

2 Likes

Can't speak for the formal reasoning, but this can sometimes be useful:

let a = UnsafeMutablePointer<Int>.allocate(capacity: 10)
for i in 0..<10 {
    a[i] = i
}
let b = a.advanced(by: 3)
print(b[-1]) // 2

I guess the point that I'm trying to make is that there is a material distinction between software developers and software engineers. The former tend to think of things in abstract, higher level, concepts whereas the latter understand that it all boils down to the presence or absence of charge in some part of some circuit somewhere, that our forefathers have collectively decided to call a bit. Not to say that developers are ignorant of the underlying workings of machines, just that they usually defer that level of reasoning to someone else, e.g. the compiler (authors).

If you accept the above as fact, you may then categorize languages as "developer-focused" or "engineer-focused". It's possible to create a memory safe language (modulo bugs) if you never expose the above "state of charge" to the users of the language but as soon as you allow them to peek under the hood, all bets are off. Rust seems to do the right thing here with a formal separation between safe and unsafe code but ultimately I believe that, for any language that allows C interop, the only reasonable thing to do is to assume that "we're all consenting adults here" and let electrons be electrons.

Swift is still a young language built by engineers for developers and is currently quite far from being self-hosting but, there is a future where that's no longer true. It may even be the best future for the language because eating your own dog food is a good motivator to get things done right. If and when that becomes a goal, I'd argue that the ability to do arbitrary pointer arithmetic is more valuable than getting an extra 2 gigs of memory for what are, at this point, exotic setups (embedded 32 bit controllers usually dream of being able to address the first 2 gigs to begin with).

3 Likes

Good example, although the Int index type used in the Unsafe[Mutable]Pointer subscript doesn't explain why Unsafe[Mutable]BufferPointer and Array subscripts use Int instead of UInt – with the latter two, accessing p[-1] is always a mistake, or could you come up with an example when it is not?

2 Likes

Interesting, I always thought of BufferPointer as Pointer + count but it appears not to be the case. Thank you for pointing this out!

Generally speaking, negative indices almost always signal an underflow (i.e. a bug), but in some edge cases they allow you to express implicit ranges without additional overhead, and sometimes it's useful to gain access to the broader range if you're certain that the whole block is still valid. The examples off the top of my head are variables allocated relative to the current stack address, and headers that precede some payload (e.g. in data received from the network). Both could of course be formally fixed with better API design, but the API is not always under your control.

More or less, why ! is still a part of the language (try!, optional!)

There are many possibilities. One could use bignums, or a sufficiently large maximum intermediate like Int128, which should cover most real-world computations that would be able to fit into a fixed-size integer type at all. Integer divisions against a variable denominator are already relatively rare, and double divisions against multiple variable denominators even more so. Nonetheless, in such a system you ought to be allowed to trap at any point where overflow obviously becomes inevitable, and you could also do some algebra to reorder the operations and check intermediates in such a way that you avoid overflowing in successful computations while trapping early when the result has no hope of fitting in bounds.

3 Likes

Down this road there could be a subtle issue of a user starting with a valid expression:

var example: UInt = a - b + c // ✅

Then adding a term or few:

var example: UInt = a - b + c + d // 🛑 expression is too complex to parse

splitting that expression into sub expressions:

var example: UInt = a - b // 💣 runtime crash
example += c + d

getting a runtime crash after a mild refactoring. Looks fragile.

We could probably mitigate it by leaving the expression "untyped" for as long as possible:

// pseudo swift ahead:
var temp = a - b // not of a particular integer type yet
var example: UInt = temp + c + d

Not good for performance and/or real-time code.

This raises some questions:

  • Int128 is not nearly "infinitely ranged", which doesn't exactly match the AIR ("as-if infinitely ranged") acronym.
  • what to do with Int128 operations? Not have them at all? Go to the next Int256 type? Not have AIR semantics on Int128 but only on the smaller types? (Personally I'd prefer having the Int128 type and ops without AIR semantics on them over not having them at all or paying the tax of bignums.)

Perhaps it was a bad example, I could have used right shifts instead of divisions.

Am I right in assuming that AIR semantics would require non-trivial compiler support, compared to the current semantics, which have no compiler support for arithmetic ops – those operations (and the types themselves) are implemented in the standard library? Or could AIR semantics be implemented in the standard library?


Back to the original question. Int is a much more versatile type than UInt. Many languages of the past didn't even have unsigned integers, only signed. And the clever trick quoted above makes the compiler's bounds check almost as fast as it would be with an unsigned index (I assume I could write the check "normally" (x >= l && x <= h) and have the compiler convert it to the optimised form for me). So overall, signed indices were the right choice for Swift.


@TellowKrinkle, do you know how to express this in Swift?

bool rangeCheck(long num, long limit) {
    if (limit < 0)
        __builtin_unreachable();
    return num >= 0 && num < limit;
}
1 Like