Changing the behaviour of String.init(BinaryInteger) for negative values

String has an initialiser for BinaryIntegers, allowing you to print hex and binary representations of arbitrary integer values:

print(String(42, radix: 16))  // prints "2a"
print(String(42, radix: 2))   // prints "101010"

However, the results for negative values are... strange. Rather than return the full two's-complement representation, String prefixes a minus sign and prints the magnitude:

print(String(-42, radix: 16))  // prints "-2a"
print(String(-42, radix: 2))   // prints "-101010"

Similarly, the parsing initialiser FixedWidthInteger.init?(_: String, radix: Int) only accepts strings in this format, and cannot parse strings representing the actual bit pattern of negative numbers:

print(Int("-101010", radix: 2)) // prints "Optional(-42)"
print(Int("1111111111111111111111111111111111111111111111111111111111010110", radix: 2)) // prints "nil"

This is super annoying, and basically makes these initialisers useless for anything other than decimal strings. If you actually want to print a hex/binary dump of an integer, you're better off implementing it yourself.

Is it too late to change this behaviour? If so, what does the community think about adding versions of these initialisers which do the expected thing for constructing and parsing binary/hex strings?


What’s the bit width of the integer value you have in mind? Int64 since it’s the largest, or Int since it’s the natural one?

The behaviour I described occurs for all BinaryIntegers, and it's simple enough to make a generic version which prints the true binary/hex representation, so it would apply to all integer types in the standard library.

The conversions can quickly make use of bitPattern, so I don’t think implementing it ourselves is too troublesome. All that’s left is how often this comes up.

We’ll probably need new names for them anyway:

String.init(twosComplement:)
Int.init(twosComplement:)

This is what I'm currently using:

print(String(binary: -4242 as Int16)) // prints "1110111101101110"
print(Int16(binary: "1110111101101110")) // prints "-4242"
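The binary: initialisers above are the poster's own helpers, not standard library API, and their implementation isn't shown. A minimal sketch of what they might look like, assuming they are defined for any FixedWidthInteger and that parsing is failable (so the parsed value needs unwrapping before printing):

```swift
// Hypothetical reconstruction of the `binary:` helpers; not standard library API.
extension String {
    /// All `T.bitWidth` bits of `value`, most significant first, including leading zeros.
    init<T: FixedWidthInteger>(binary value: T) {
        // Reinterpret as the unsigned type of the same width, so negative
        // values print their two's-complement bit pattern.
        let digits = String(T.Magnitude(truncatingIfNeeded: value), radix: 2)
        self = String(repeating: "0", count: T.bitWidth - digits.count) + digits
    }
}

extension FixedWidthInteger {
    /// Parses a string of 0s and 1s as a raw bit pattern.
    /// Returns nil if the string is not binary or has more than `Self.bitWidth` significant bits.
    init?(binary text: String) {
        guard let bits = Magnitude(text, radix: 2) else { return nil }
        self.init(truncatingIfNeeded: bits)
    }
}

print(String(binary: -4242 as Int16))      // 1110111101101110
print(Int16(binary: "1110111101101110")!)  // -4242
```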

It's fairly common to want to print binary if you do low-level bit manipulations (e.g. packing data into an integer), and hex is super-useful if you need to dump larger amounts of memory or a file.

One interesting thing about the current design (that somebody might be relying on) is that it works with any size. You can do Int8 -> String -> Int easily.
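A small illustration of that round trip using only the existing initialisers, alongside what happens if the string instead holds an 8-bit two's-complement pattern, which does not carry its width with it:

```swift
let small: Int8 = -42

// Sign-magnitude string: width-independent, so it parses back into a wider type.
let text = String(small, radix: 2)
let wide = Int(text, radix: 2)
print(text)          // -101010
print(wide ?? 0)     // -42

// Two's-complement pattern of the Int8: read as an Int, it is a different number.
let pattern = String(UInt8(bitPattern: small), radix: 2)
let misread = Int(pattern, radix: 2)
print(pattern)       // 11010110
print(misread ?? 0)  // 214
```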

BinaryInteger doesn't imply fixed-width, which is where the behavior you're describing comes in. I think it's reasonable to have the functionality you describe, but I'd imagine it would be most appropriately constrained to FixedWidthInteger.


Edit: @cukr is also absolutely right below that the behavior is exactly correct, and that you're looking for something different altogether.


If we’re going to improve this area, I would very much like an option to include all the bits of a FixedWidthInteger (i.e. with the leading zeros).

That is, strawman syntax:

let x: UInt8 = 5
let s = String(x, radix: 2, fullWidth: true)
print(s)  // "00000101"

Maybe even a way to break it up into groupings:

let y: UInt64 = 0x00ABCDEFDEADBEEF
let z = String(y, radix: 16, fullWidth: true, separator: " ", every: 2)
print(z)  // "00 AB CD EF DE AD BE EF"

The separator could be used for inserting commas or spaces or whatever is needed.
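A rough sketch of how that strawman could behave, written as a free function. The name formatted, its parameter labels, and the extra uppercase: flag (needed to reproduce the upper-case hex above, since String(_:radix:) defaults to lower case) are all hypothetical. Full width is taken to mean the digit count of the type's largest unsigned value in the given radix, and signed values are formatted via their bit pattern:

```swift
// Hypothetical free function approximating
// `String(_:radix:fullWidth:separator:every:)`; not standard library API.
func formatted<T: FixedWidthInteger>(
    _ value: T, radix: Int, fullWidth: Bool = false,
    separator: String = "", every groupSize: Int = 0, uppercase: Bool = false
) -> String {
    // Format the unsigned bit pattern, so signed values show all their bits.
    let bits = T.Magnitude(truncatingIfNeeded: value)
    var digits = String(bits, radix: radix, uppercase: uppercase)
    if fullWidth {
        // The widest value of this type sets the padded length in this radix.
        let maxDigits = String(T.Magnitude.max, radix: radix).count
        digits = String(repeating: "0", count: maxDigits - digits.count) + digits
    }
    guard groupSize > 0, !separator.isEmpty else { return digits }
    // Split into groups from the right, so any short group ends up on the left.
    var groups: [Substring] = []
    var end = digits.endIndex
    while end > digits.startIndex {
        let start = digits.index(end, offsetBy: -groupSize, limitedBy: digits.startIndex)
            ?? digits.startIndex
        groups.insert(digits[start..<end], at: 0)
        end = start
    }
    return groups.joined(separator: separator)
}

print(formatted(UInt8(5), radix: 2, fullWidth: true))  // 00000101
let y: UInt64 = 0x00ABCDEFDEADBEEF
print(formatted(y, radix: 16, fullWidth: true, separator: " ", every: 2, uppercase: true))
// 00 AB CD EF DE AD BE EF
```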


The current implementation is correct. Negative numbers in binary are written with a minus sign before the number, just like decimal numbers. What you want is the two's-complement representation, which should be a completely different init/func.


We don't actually need fixed-width, because FixedWidthIntegers always return a constant for BinaryInteger.bitWidth (i.e. the bitwidth does not depend on the value, which seems like a result that should hold for custom integers, too).

BinaryInteger doesn't explicitly say that it is two's-complement, but it is. Even C and C++ are ditching support for non-two's-complement signed integers (paper here, talk here). That's all we need, even to print leading zeroes.

I agree that separators would also be really, really useful.

I don't think this feature is meaningful in the general case for any radix other than 2.

What would you propose to be the "full-width" representation of an Int31 type in hexadecimal? How about ternary?

You could pad additional sign bits, but what do you expect to do with such a result, and how would it be handled when you feed it back to an initializer to re-create the value from a String?


We do actually need fixed-width. What do you expect to be the String result of your proposed function for the base 2 representation of -42 in an arbitrary-width type?

To add to that: things like BigInt come to mind.

Is there a reason you're not just using the bitPattern initializer?

String(UInt16(bitPattern: -4242), radix: 2)
Int16(bitPattern: UInt16("1110111101101110", radix: 2)!)
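Spelled out with results: the bitPattern conversions keep every bit of the value, and the round trip recovers the original Int16. What String(_:radix:) does not do is print leading zeros, so a positive value comes out shorter than 16 digits:

```swift
let negative = String(UInt16(bitPattern: -4242 as Int16), radix: 2)
let positive = String(UInt16(bitPattern: 42 as Int16), radix: 2)
let back = Int16(bitPattern: UInt16(negative, radix: 2)!)

print(negative)  // 1110111101101110 (16 digits)
print(positive)  // 101010 (6 digits; leading zeros are not printed)
print(back)      // -4242
```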

As I said, the spelling “fullWidth” was just a strawman. The desired behavior is “show the leading zeros”. We can ignore signed types for now and focus on non-negative numbers.

A 31-bit unsigned integer has a representation in every integer base greater than 1. For any particular base, there is a maximum length for 31-bit integers, in terms of how many digits are required to represent the number in that base. Whatever the maximum length is, we can pad all 31-bit integers to that same length in that base, by prepending leading zeros.

That is what I want, and it works for any base.

The same idea applies to signed integers, with behavior as if converting the bit-pattern to the unsigned type of the same size. That is, identical behavior to what @Karl wants. After all, “show leading zeros” and “show leading ones” are really the same operation: “show all bits”.
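A sketch of that idea for an arbitrary base. Swift has no 31-bit type, so Int32/UInt32 stand in here, and paddedBitPattern is a hypothetical helper name. The padded length is the digit count of the unsigned type's maximum in that radix; feeding the string back in means parsing it as the unsigned type and then reinterpreting the bits:

```swift
/// Hypothetical helper: the bit pattern of `value`, read as the unsigned type of
/// the same width, written in `radix` and left-padded to the length of `Magnitude.max`.
func paddedBitPattern<T: FixedWidthInteger>(_ value: T, radix: Int) -> String {
    let digits = String(T.Magnitude(truncatingIfNeeded: value), radix: radix)
    let width = String(T.Magnitude.max, radix: radix).count
    return String(repeating: "0", count: width - digits.count) + digits
}

let positive = paddedBitPattern(42 as Int32, radix: 3)   // 17 zeros, then "1120"
let negative = paddedBitPattern(-42 as Int32, radix: 3)  // already 21 digits, no padding
// Parse as the unsigned type of the same width, then reinterpret the bits.
let roundTrip = Int32(bitPattern: UInt32(negative, radix: 3)!)
print(positive)
print(negative)
print(roundTrip)  // -42
```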

Well, the Int8 value is "11010110", so if the type returned the minimum number of bits, I would expect it to return "1010110" (i.e. 7 bits). That's a valid integer, basically the inverse of a sign-extension.
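That minimal form can be computed by counting the leading copies of the sign bit and keeping just one of them. A sketch, with a hypothetical helper name; note the result's length depends on the value, so a parser would need to know to sign-extend from the first digit:

```swift
/// Hypothetical helper: the shortest two's-complement bit string for `value`,
/// keeping exactly one sign bit (the inverse of sign extension).
func minimalTwosComplement<T: FixedWidthInteger & SignedInteger>(_ value: T) -> String {
    // Count the leading copies of the sign bit: leading zeros for non-negative
    // values, leading ones (leading zeros of the complement) for negative ones.
    let signBits = value < 0 ? (~value).leadingZeroBitCount : value.leadingZeroBitCount
    // Drop all of those copies except one.
    let width = T.bitWidth - signBits + 1
    // Full-width bit pattern, left-padded with zeros to T.bitWidth digits.
    let digits = String(T.Magnitude(truncatingIfNeeded: value), radix: 2)
    let full = String(repeating: "0", count: T.bitWidth - digits.count) + digits
    return String(full.suffix(width))
}

print(minimalTwosComplement(-42 as Int8))  // 1010110
print(minimalTwosComplement(42 as Int8))   // 0101010
```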

Because I want all the bits, and given that that initialiser trims leading zeroes and represents negative numbers weirdly, I'm just not sure if I can trust it. Maybe there are other non-obvious behaviours I don't know about.

Correct for a certain definition of "correct" 😉

I don't really mind if it's a behaviour change to the existing initialiser or a new one entirely, but I think most people who use the existing initialiser with a radix other than 10 expect different behaviour from what is currently implemented.

Or rather, specifically with binary and hex radixes.

Notice how you've reverted to talking about "bits" in the case of signed integers here. This explanation does not generalize to base 3, or to any other base that isn't a power of two, or that is a power of two that does not evenly divide the bit width of the fixed-width integer type.

Two's complement representation requires "sign extension" to represent a negative value in a type of greater bit width. But it is not possible to sign-extend in base 2 to an exact power of 3 (what I first, clumsily, called a "trit width"). This is where the semantics get problematic.

I am not arguing that there isn't some definition that can be invented. The question is: what is the meaning of such a result? In any case, as I showed in a preceding reply, there is an easy way to compose APIs to express clearly the operation that you propose, so if someone has use of such a result, they can already write it fairly straightforwardly:

String(UInt32(bitPattern: -42), radix: 3)
// Padding the result is an exercise for the reader

I'm not sure what you mean [edit: ah, in the first paragraph, you were replying to the question about -42 of arbitrary-width type; see my later reply].

UInt16(bitPattern:) takes arguments of type Int16. There is no trimming of anything; this is the purpose of the bitPattern APIs.

I guess I shouldn't have phrased it in the form of a question; let me rephrase: the functionality that you're looking for is present in the standard library, and the way to do it is as I've spelled above.


Here's a generic version; I do agree it would be reasonable as a standard library facility. It is fairly straightforward, though:

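extension String {
    // Writes the raw bits of `bitPattern` in a power-of-two radix whose digit size
    // evenly divides the bit width (e.g. binary or hex for Int16, but not octal).
    // Leading zeros are not added.
    init<T: FixedWidthInteger>(bitPattern: T, radix: Int) {
        precondition(radix >= 2 && radix.nonzeroBitCount == 1 && T.bitWidth % radix.trailingZeroBitCount == 0)
        // Reinterpret as the unsigned type of the same width, then format normally.
        self = String(T.Magnitude(truncatingIfNeeded: bitPattern), radix: radix)
    }
}

extension FixedWidthInteger {
    // Parses a raw bit pattern written in `radix` back into a value of Self.
    // Returns nil if the string is not valid in `radix` or has too many bits.
    init?(bitPattern: String, radix: Int) {
        precondition(radix >= 2 && radix.nonzeroBitCount == 1 && Self.bitWidth % radix.trailingZeroBitCount == 0)
        guard let temporary = Magnitude(bitPattern, radix: radix) else { return nil }
        // Reinterpret the unsigned bits as Self (a no-op for unsigned types).
        self = Self(truncatingIfNeeded: temporary)
    }
}

String(bitPattern: -4242 as Int16, radix: 2)     // "1110111101101110"
Int16(bitPattern: "1110111101101110", radix: 2)  // Optional(-4242)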

[Edited for less restrictive preconditions]

leadingZeroBitCount will be useful here.
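For example, for power-of-two radixes the number of leading zero digits follows directly from leadingZeroBitCount, without measuring the formatted string. A sketch with a hypothetical helper name, restricted to unsigned types (signed values can be converted through their bit pattern first):

```swift
/// Hypothetical helper: `value` in a power-of-two `radix`, padded with leading
/// zeros to the full `T.bitWidth / log2(radix)` digits.
func fullWidthString<T: FixedWidthInteger & UnsignedInteger>(_ value: T, radix: Int) -> String {
    let bitsPerDigit = radix.trailingZeroBitCount
    precondition(radix >= 2 && radix.nonzeroBitCount == 1 && T.bitWidth % bitsPerDigit == 0,
                 "radix must be a power of two whose digit size divides the bit width")
    let totalDigits = T.bitWidth / bitsPerDigit
    // Each whole group of leading zero bits is one leading zero digit. Zero still
    // formats as a single "0", so the padding is capped at totalDigits - 1.
    let zeroDigits = min(value.leadingZeroBitCount / bitsPerDigit, totalDigits - 1)
    return String(repeating: "0", count: zeroDigits) + String(value, radix: radix)
}

print(fullWidthString(UInt8(5), radix: 2))         // 00000101
print(fullWidthString(UInt16(0x00AB), radix: 16))  // 00ab
```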