I noticed that my `Float8` implementation doesn't quite match the behavior of `Float` and `Double` when it comes to rounding, e.g.:
```swift
typealias F = Float8
let a = F.greatestFiniteMagnitude
let b = a.ulp / 2
                                 // F = Float   F = Float8
print(a + b.nextDown == a)       // true        true
print(a + b == .infinity)        // true        false
print(a + b.nextUp == .infinity) // true        true
```
(It's not specific to values near infinity or to the `+` operator; my type simply rounds a value lying exactly halfway between two representable values differently.)
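For reference, IEEE 754's default rounding (round to nearest, ties to even) resolves an exact tie in favour of the neighbour whose significand is even, and `Float(_: Double)` follows that rule. The sketch below checks this with `Float`, holding the exact midpoints in `Double` (which can represent every midpoint between adjacent finite `Float`s). It also shows why `greatestFiniteMagnitude + ulp / 2` becomes infinity for `Float`: the significand of `greatestFiniteMagnitude` is all ones, i.e. odd, so the tie goes up to 2^128, which is out of range.

```swift
// Exact midpoints between adjacent Floats, held in Double (which represents them exactly).
let one: Float = 1                                 // significand bit pattern 0x000000 (even)
let next = one.nextUp                              // significand bit pattern 0x000001 (odd)
let midLow = Double(one) + Double(one.ulp) / 2     // halfway between one and next
let midHigh = Double(next) + Double(next.ulp) / 2  // halfway between next and next.nextUp

// Round to nearest, ties to even: each tie goes to the neighbour with the even significand.
print(Float(midLow) == one)           // true  (tie rounds down)
print(Float(midHigh) == next.nextUp)  // true  (tie rounds up)

// greatestFiniteMagnitude has an all-ones (odd) significand, so its tie rounds up to 2^128,
// which overflows to infinity.
let m = Float.greatestFiniteMagnitude
print(m.significandBitPattern == 0x7F_FFFF)               // true
print(Float(Double(m) + Double(m.ulp) / 2) == .infinity)  // true
```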
The cause turns out to be the way I convert values of another floating-point type to `Float8`. The standard library's gyb template uses this for, e.g., `Float32.init(_ other: Float64)`:
```
public init(_ other: ${That}) {
% if srcBits > bits:
  _value = Builtin.fptrunc_FPIEEE${srcBits}_FPIEEE${bits}(other._value)
% elif srcBits < bits:
  _value = Builtin.fpext_FPIEEE${srcBits}_FPIEEE${bits}(other._value)
% else:
  _value = other._value
% end
}
```
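The LLVM `fptrunc` instruction rounds under IEEE 754's default mode, round to nearest with ties to even. As a point of comparison, here is a sketch of a hypothetical `referenceNarrow(_:)` that performs that rounding explicitly for `Double` to `Float`, by scaling onto `Float`'s grid of representable values and calling `rounded(.toNearestOrEven)`. It is only meant as a reference to test conversions against, not as how the standard library implements either conversion.

```swift
// Hypothetical reference narrowing Double -> Float with explicit
// round-to-nearest, ties-to-even, for comparison with Float(_:).
func referenceNarrow(_ d: Double) -> Float {
    // NaN, infinities and zeros need no rounding.
    guard d.isFinite, d != 0 else { return Float(d) }

    // Float stores 23 significand bits; below its smallest normal exponent (-126)
    // the spacing between Floats stays fixed at 2^-149 (the subnormal range).
    let e = max(d.exponent, Float.leastNormalMagnitude.exponent)
    let spacing = Double(sign: .plus, exponent: e - Float.significandBitCount, significand: 1)

    // Dividing and multiplying by a power of two is exact in Double here,
    // so the only rounding step is the explicit ties-to-even one.
    let q = (d / spacing).rounded(.toNearestOrEven) * spacing

    // With an unbounded exponent the result can reach 2^128; IEEE 754 makes that infinity.
    if q.magnitude > Double(Float.greatestFiniteMagnitude) {
        return q < 0 ? -Float.infinity : Float.infinity
    }
    return Float(q)  // q is exactly representable in Float, so this conversion is exact.
}
```

Comparing `referenceNarrow(c)` with `Float(c)` (or with any other conversion) over a set of inputs, including exact midpoints, shows whether the two agree on ties.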
My corresponding `Float8.init(_ other: Float32)` is this:
```swift
init<Source: BinaryFloatingPoint>(_ value: Source) {
  self = Float8._convert(from: value).value
}
```
(where `Float8._convert(from:)` is my own copy of the standard library method of the same name, made to prevent unintentional infinite recursion.)
I figured that should perform the same kind of conversion, but it doesn't, as can be demonstrated like this:
```swift
let a = Float.greatestFiniteMagnitude
let b = a.ulp / 2
let c = Double(a) + Double(b)
print(Float.init(c))             // inf
print(Float._convert(from: c))   // (value: 3.4028235e+38, exact: false)
```
Why aren't both `inf` (or both `3.4028235e+38`)?
That is, shouldn't `Float._convert(from: myDouble).value` always equal `Float(myDouble)`?
If not, why are they doing their rounding differently?
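One way to probe this systematically is to feed a conversion exact midpoints and check each result against the ties-to-even rule. Below is a sketch using a hypothetical helper `tieMismatches(_:convert:)`; the `towardZero` converter is likewise made up for illustration, to show the kind of result the `Float8` symptom above corresponds to.

```swift
// Hypothetical checker: returns the nonnegative finite Floats `f` in `samples` for which
// `convert` does not round the exact midpoint above `f` to the ties-to-even neighbour.
func tieMismatches(_ samples: [Float], convert: (Double) -> Float) -> [Float] {
    var mismatches: [Float] = []
    for f in samples where f >= 0 && f.isFinite {
        // For nonnegative f, f.ulp is the gap to the next larger value
        // (2^128 above greatestFiniteMagnitude), so this is the exact midpoint.
        let mid = Double(f) + Double(f.ulp) / 2
        // Ties go to the neighbour with the even significand; above
        // greatestFiniteMagnitude (odd significand) that neighbour is +infinity.
        let expected = (f.significandBitPattern & 1) == 0 ? f : f.nextUp
        if convert(mid) != expected { mismatches.append(f) }
    }
    return mismatches
}

let samples: [Float] = [0, 1, Float(1).nextUp, 1.5, 0.1, 1e30,
                        .leastNonzeroMagnitude, .greatestFiniteMagnitude]
print(tieMismatches(samples, convert: { Float($0) }))  // []

// Illustrative alternative rule: truncation toward zero (valid for nonnegative inputs only).
let towardZero: (Double) -> Float = { d in
    let f = Float(d)
    return Double(f) > d ? f.nextDown : f
}
print(tieMismatches([.greatestFiniteMagnitude], convert: towardZero))  // [3.4028235e+38]
```

Passing `{ Float._convert(from: $0).value }` as `convert` would probe the underscored method the same way; since its behaviour may differ between toolchains, no expected output is given for it here.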