Changing the behaviour of String.init(BinaryInteger) for negative values

I am talking about bits because the behavior I want is exactly and specifically to show a fixed-width BinaryInteger in a given base, with enough digits to represent all of the bits in that number.

I often want this in binary and hexadecimal, but I would also use it in octal and decimal. This is a general-purpose, useful operation.

Literally nobody has asked to sign-extend to a power of 3. We are talking about using the existing in-memory representation of a fixed-width BinaryInteger, and creating a string to display it in a given base, including all of the bits of the number when we do so.

And yes, I am still talking about bits, because the number is represented as a binary integer, and that is what I want to display, in the base of my choice.

This comes across as quite patronizing and I request that you stop.

It is not a matter of β€œinventing” something esoteric. I already explained the definition I want, and I think it is simple, straightforward, and natural. It is a useful operation for displaying raw data in binary or hexadecimal or any other base, making sure that all numbers are shown with the same length.

The padding is the essential part that I am trying to add.

Saying it is β€œleft as an exercise to the reader” is extremely dismissive, and if anything it makes my point stronger. The standard library does not currently provide a simple, obvious, easy-to-use solution for padding integers for display.

It requires first converting to a non-padded string, then calculating the number of zeros needed, then creating a string which repeats that number of zeros, and finally combining the two strings.

This is a real shortcoming, and one which I would like to see addressed.

1 Like

I'm talking about the String initialiser, not the bitPattern initialiser.

On the subject of questions: I guess you might want to consider asking more questions and having back-and-forth discussions, rather than making assertions and refusing to listen to others. I agree with @Nevin that you're coming across as rather rude and dismissive.

Once again, FixedWidthInteger is not required. Replace the constraint you have with BinaryInteger, and replace T.bitWidth with bitPattern.bitWidth, and it all works. What about this makes you think all values must have the same width?

I'm not trying to patronize you. I am, however, going to push again:

I disagree. I would argue that it is esoteric to display "all the bits" of an integer in, say, base 3 or 7, and in particular to pad it to some length that happens to be the closest power of 3 or 7 larger than the bit width of the value. In what way is it "all the bits" if the closest larger power of 3 or 7 has a closest smaller power of 2 that's larger than the bit width? Wouldn't the logic of sign extension dictate that there are more bits than that obtained using your proposed operation?

I agree vigorously! I think there is a role for more ergonomic string padding (not just for numbers, but in general), and in particular in offering such facilities in the standard library. I think this is quite distinct from the original topic of this post, however.

1 Like

Let me clear up the scope of this thread, which @Karl can correct me if it's too wide or too narrow:

  • We want to convert any FixedWidthInteger to String and back.
  • String representation supports binary and hexadecimal (we may want to include octal, but the representation is ambiguous at best at the moment).
  • The String representation is sign-extended/zero-extended to cover the full width of the integer in question T.bitWidth.

I need to say this because up thread we use things like non-10 radix, which includes things like base 3, base 9, or base 11, but when we asked about it, it seems we don't actually need them.

Consider the alternative: namely, an arbitrary-width type. What do you expect to be the String result of your proposed function for the base 2 representation of -42 in an arbitrary-width type? There is no answer that round-trips without at least one additional bit of information.

This is because a fixed-width type uses the leading bit as the sign bit; for an arbitrary-width type, by construction, there is no fixed leading bit.

The logical two's complement representation of a positive value has an infinite number of leading zeros, which is simple enough to output if we're truncating leading zeros. But the logical two's complement representation of a negative value has an infinite number of leading ones; as soon as you truncate that value to a finite length, you've instead got a (possibly large) positive value. You need at least one other bit of information to distinguish between a truncated negative value or a positive value given the same bits.

For example, you could require that the String representation of a positive arbitrary-width value must have a leading zero; that would be your additional bit of information. You could always output a sign instead. But this is not required for fixed-width types, and so the function that you're describing which outputs purely the bit pattern of a binary integer cannot generalize to arbitrary-width types.

4 Likes

Personally, my most common use-case is displaying numbers in hex, so I’m going to focus on that.

If we had a 2-bit UInt2 type then its values would range from 0 through 3.

If we wanted to display a UInt2 in hex, we would use one hex digit for that.

We would never say, β€œOh, a hex digit requires 4 bits, and a UInt2 only has 2 bits, so we can’t display it!” That would be silly.

Similarly, if we had UInt6, we would never say β€œOh, there’s no way to display that in hex because its bit-width isn’t a multiple of 4.” That would also be silly.

We already know how to handle these cases, and other bases as well. Adding padding does not change anything in this regard.

How does this not round-trip? The number of bits does not impact whether or not the bit-pattern can be parsed unambiguously.

An arbitrary-width type would need to include its sign bit, just like a FWI type, and so the above bit-pattern is absolutely fine.

I even found an example on Wikipedia using a "7-bit notation" (coincidentally also representing the value -42). So yes, it's a true two's-complement integer bitpattern, and can be unambiguously parsed in exactly the same way as a fixed-width integer bitpattern.

If supporting arbitrary BinaryInteger isn't the point, and we just lazily zero-pad the unused bits (not signed pad), maybe something like this?

protocol BitPatternInteger: FixedWidthInteger {
    init(bitPattern: Magnitude)
}

extension BitPatternInteger {
    var bitPattern: Magnitude { .init(truncatingIfNeeded: self) }
}

extension Int: BitPatternInteger { }
extension Int8: BitPatternInteger { }
extension Int16: BitPatternInteger { }
extension Int32: BitPatternInteger { }
extension Int64: BitPatternInteger { }
extension UInt: BitPatternInteger { init(bitPattern: Magnitude) { self = bitPattern } }
extension UInt8: BitPatternInteger { init(bitPattern: Magnitude) { self = bitPattern } }
extension UInt16: BitPatternInteger { init(bitPattern: Magnitude) { self = bitPattern } }
extension UInt32: BitPatternInteger { init(bitPattern: Magnitude) { self = bitPattern } }
extension UInt64: BitPatternInteger { init(bitPattern: Magnitude) { self = bitPattern } }

extension String {
    init<T>(fullWidth: T, radix: Int = 10, uppercase: Bool = false) where T: BitPatternInteger {
        let pattern = fullWidth.bitPattern
        let partial = String(pattern, radix: radix, uppercase: uppercase)
        self = String(repeating: Character("0"), count: pattern.leadingZeroBitCount / radix) + partial
    }
}
extension BitPatternInteger {
    init?(fullWidth string: String, radix: Int) {
        guard let pattern = Magnitude(string, radix: radix) else {
            return nil
        }
        self = Self(bitPattern: pattern)
    }
}

I have no quibbles with either unsigned integers represented in any base, or with signed integers represented in any base that's a power of 2. This is all very straightforward.

I suspect that most users would expect sign extension behavior in the latter case, however. To put it explicitly, I would expect the hexadecimal representation of -1 as Int6 to be ff rather than 3f.

The latter does work, but it would be more brittle in more use cases, and particularly in scenarios that use a padded representation as you mention that you would want. This is because ff would round-trip correctly using a truncating initializer for any signed integer type whose hexadecimal representation maxes out at two digits (Int5, Int6, Int7, Int8), whereas 3f would not (it would only round-trip for Int5 and Int6). Therefore, in order to round-trip the latter result, you would need to know the bit width of the originating type prior to conversion to a String, while in the former scenario, the length of the String itself is sufficient information for correct round-tripping.

The (IMO) more useful semantics with sign extension can be generalized for any radix that's a power of 2, but it doesn't work for base 3, 5, 6, 7, etc. This is where things get esoteric.

3 Likes

This is much more constructive feedback, thank you.

1 Like

If "1010110" round-trips to -42, then how would you represent 86 (0b1010110)?

(In fact, many BigInt types use a sign-magnitude representation behind the scenes, so in that case outputting "-2a" as the existing API does would actually represent the number of set bits in the underlying storage!)

Agree.

And then maybe you could have Int6("ff", base: 3, twoComplement: true) and get back -1. Whereas trying the same with "3f" would overflow because -63 is not representable with Int6.

(Edit: sorry didn't mean this as a reply to Karl.)

My pleasure. This is what I meant when I said:

To me, this would imply that somehow 8 bits were extracted from a 6-bit type.

The bit-pattern of -1 as Int6 would be 111111, which is exactly 3f in hex. I would be greatly surprised if the output were anything other than 3f.

Perhaps we ought to introduce a bitPattern property on FixedWidthInteger:

extension FixedWidthInteger {
  var bitPattern: Magnitude {
    Magnitude(truncatingIfNeeded: self)
  }
}

Then when I want to see the hex value of a negative number, I can construct a string from x.bitPattern using whatever syntax we end up with that works for UnsignedInteger & FixedWidthInteger (and hopefully includes options for padding and separators).

That should also address @Karl’s use-case: if someone wants the minus sign, they can use the existing String initializer, and if they want the bit-pattern they can start with x.bitPattern.

.init(truncatingIfNeeded) is most definitely not what we want to use. It is lossy conversion, and does not exist in FixedWidthInteger. To get proper bit pattern type, we will at least need to introduce new associated type.

…what?

I am constructing an instance of Self.Magnitude from an instance of Self. Those types should be the same size, and the conversion non-lossy.

I think this is totally fine in terms of eliminating confusion as to semantics.

Using Magnitude(truncatingIfNeeded:) is guaranteed not to be lossy because the protocol makes certain guarantees of FixedWidthInteger types as well as Magnitude. What @Nevin is proposing is syntactic sugar to improve the usability of getting the bit pattern.

1 Like

Right... I was thinking of .init(clamping:)...

1 Like

Only in the same sense that using the >> right shift operator "somehow" gives us additional set bits in the case of negative numbers.

One mental model of fixed-width two's complement integers is that they model a finite number of integers using a finite amount of memory, but the underlying binary sequence being modeled is unchanged among types:

// The number 1:
0b [...] 0000 0000 0000 0000 0000 0001
// 1 as Int8:               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
0b [...] 0000 0000 0000 0000β”‚0000 0001β”‚
//      Stored in memory -> β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

// The number -1:
0b [...] 1111 1111 1111 1111 1111 1111
// -1 as Int8:              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
0b [...] 1111 1111 1111 1111β”‚1111 1111β”‚
//      Stored in memory -> β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

A limited number of APIs in Swift operate only on the stuff inside the box, but many (even those that operate on bits, from >> to truncatingIfNeeded) take into account the entire arithmetic value. In general, I think, the APIs on FixedWidthInteger do tend to take "fixedness" of the binary sequence into account; so I see where you're arguing from.

But, in the scenario we discuss here, the output String represents more bits than are in the fixed-width integer. This is not avoidable, whether the result is 3f or ff. The only difference is that 3f implies that the two bits that were "somehow" extracted are zeros, whereas ff implies that those additional bits are ones.

If we didn't want users to rely on the value of the additional bits, we could take the extreme position that those bits could be 00, 01, 10, or 11 randomly. But if we had to choose a set result, I would argue that the only satisfactory choice here would be 11 given the mental model above.


Put another way, suppose you had an API that gives you an arbitrary amount of padding (I would actually advocate for such an API).

For example: "Give me the two's complement representation of -1 as Int8 in exactly 32 binary digits." You would surely expect the padding to be a bunch of leading ones, not zeros.

(This is, of course, in fact exactly the behavior of .init(truncatingIfNeeded:)--only here we're talking about converting from integers to String and vice versa rather than from integers to other integers.)

4 Likes