That would be very cool!
One thing I cannot quite understand about the method we use (as described in the paper) is why the lower and upper bounds of the range should occur with 50% lower probability than all the values in between.
I can see how/why it might make sense at in-range exponent boundaries, but why should it make sense for the lowest and highest values of the full range, and how does it make sense at the exponent boundary between subnormals and normals (these two have the same .ulp, which is not the case for any other pair of adjacent exponents)?
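To make the remark about .ulp concrete, here is a small check that uses only standard library properties. It is just a sketch of the spacing on either side of two boundaries, and it assumes a platform that supports subnormals (not one that flushes them to zero):

```swift
// Boundary between subnormals and normals.
let smallestNormal = Float.leastNormalMagnitude   // 2^-126
let largestSubnormal = smallestNormal.nextDown    // just below the boundary
print(largestSubnormal.isSubnormal)
print(largestSubnormal.ulp == smallestNormal.ulp) // same spacing on both sides

// An ordinary exponent boundary, here at 1.0, for comparison.
let one = Float(1)
print(one.nextDown.ulp == one.ulp)                // spacing doubles when crossing upwards
print(one.ulp / one.nextDown.ulp)
```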
It's easier to explain with an example:
Let's say we only want to generate values within a single exponent range, the one with .ulp == 1, i.e. these values:
8388608.0 == Float(1 << 23) // This has .ulp == 1.0, and so have all these:
8388609.0 == Float(1 << 23).nextUp
8388610.0 == Float(1 << 23).nextUp.nextUp
8388611.0 == ...
...
16777215.0 == Float(1 << 24).nextDown == Float((1 << 24) - 1)
16777216.0 == Float(1 << 24) // (But this last one has ulp == 2.0 ...)
And we build a histogram (or rather just the lower and upper ends of one, because the whole histogram would take up ((1 << 23) + 1) * 8 bytes == 64 MB + 8 bytes), like so:
func test() {
    var generator = WyRand()
    let floatCount = ((1 << 23) + 1) * 500 // Meaning about 500 samples/bin, no?
    var lowerBins = [Int](repeating: 0, count: 16)
    var upperBins = [Int](repeating: 0, count: 16)
    for i in 0 ..< floatCount {
        let v = Float.random(inBinades: Float(1 << 23) ... Float(1 << 24),
                             using: &generator)
        let binIndex = Int(v)
        if binIndex < (1 << 23) + 16 { lowerBins[binIndex - (1 << 23)] += 1 }
        if binIndex >= (1 << 24) - 15 { upperBins[binIndex - ((1 << 24) - 15)] += 1 }
        if i & ((1 << 28) &- 1) == 0 {
            print("Count down: ", (floatCount - (i + 1)) / (1 << 28), "…")
        }
    }
    print("Lower 16 bins:")
    for i in 0 ..< 16 { print(i + (1 << 23), lowerBins[i]) }
    print("Upper 16 bins:")
    for i in 0 ..< 16 { print(i + ((1 << 24) - 15), upperBins[i]) }
}
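As a sanity check on the numbers used in the example above, this sketch recomputes them with standard library calls only. It assumes a 64-bit Int (8 bytes per bin), and the names binCount and fullHistogramBytes are mine, just for illustration:

```swift
// Endpoints of the binade used in the example.
let lower = Float(1 << 23)
let upper = Float(1 << 24)
print(lower.ulp, upper.ulp)                         // spacing at each endpoint
print(upper.nextDown == Float((1 << 24) - 1))       // last value with .ulp == 1

// Sizes, assuming Int is 64 bits wide.
let binCount = (1 << 23) + 1                        // values in lower ... upper
let fullHistogramBytes = binCount * 8               // one 8-byte Int per bin
let floatCount = binCount * 500                     // about 500 samples per bin
print(binCount, fullHistogramBytes, floatCount)
print(fullHistogramBytes - 64 * 1024 * 1024)        // bytes beyond 64 MB
```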
Then we can see what I mean:
Lower 16 bins:
8388608 220
8388609 475
8388611 507
8388612 483
8388613 477
8388614 521
...
...
Upper 16 bins:
...
...
16777211 485
16777212 478
16777213 529
16777214 467
16777215 508
16777216 242
The first and last value in the range will occur only half as often as the other values.
Why shouldn't all values occur about 500 times?
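For what it's worth, here is a minimal sketch of one model under which half-weight endpoints would show up. This is purely an assumption on my part for illustration, and I am not claiming it is what the method in the paper does: take evenly spaced "real" points over the closed range and round each one to the nearest representable value. Small integers stand in for the floats here, and n and pointsPerUnit are made-up parameters:

```swift
// Hypothetical model: evenly spaced points in [0, n], each rounded to the
// nearest integer. The integers 0...n stand in for the floats in the range.
let n = 4
let pointsPerUnit = 1000
var counts = [Int](repeating: 0, count: n + 1)

for k in 0 ..< n * pointsPerUnit {
    // Use cell midpoints so that no point lands exactly on a rounding tie.
    let x = (Double(k) + 0.5) / Double(pointsPerUnit)
    counts[Int(x.rounded())] += 1
}

// Interior values collect points from both sides, the two end values
// only from one side.
for (value, count) in counts.enumerated() { print(value, count) }
```

In this model each interior value owns a full-width interval of the real line, while the first and last values only own the half that lies inside the range, which would give the same halved counts as in the histogram above.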
And I don't like that the method seems to be built around including the upper bound value. It would feel better (less of a special case) to start with a half-open range ..< and then tackle the special case of ... (where the last value is special).