Add `Float16`

(Steve Canon) #1

This is a pretty simple pitch: add Float16 as a stdlib type. The semantics are specified by IEEE 754, so there are no real questions to resolve about the behavior, and the conformance to BinaryFloatingPoint defines almost all of the API. The bulk of the actual implementation is handled for us by LLVM.
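To illustrate how much of the API falls out of the BinaryFloatingPoint conformance, here is a minimal sketch. It assumes a toolchain and platform where Float16 is available (Swift 5.3 or later; it is not available on x86_64 macOS). The helper `describe` is a hypothetical name used only for this example.

```swift
// Assumption: Float16 is available on this toolchain and platform.
// `describe` is a hypothetical helper; it uses only requirements that
// every BinaryFloatingPoint type already provides.
func describe<T: BinaryFloatingPoint>(_ type: T.Type) -> String {
    "\(T.exponentBitCount) exponent bits, \(T.significandBitCount) significand bits"
}

let halfLayout = describe(Float16.self)   // IEEE 754 binary16: 5 and 10
let singleLayout = describe(Float.self)   // IEEE 754 binary32: 8 and 23

// Arithmetic on Float16 values stays in Float16.
let x: Float16 = 1.5
let y: Float16 = 0.25
let sum: Float16 = x + y

// The largest finite binary16 value is 65504.
let largest = Float16.greatestFiniteMagnitude

print(halfLayout)
print(singleLayout)
```

Nothing in this sketch is specific to Float16 beyond the type name, which is the point of the pitch: generic code written against BinaryFloatingPoint picks up the new type for free.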

Implementation in progress here:

Given how straightforward this is, I expect to progress to a formal proposal pretty quickly.

Alternatives Considered
The only real question I see is what to name the type. The obvious alternative would be Half, but Float16 is much more explicit, and (unlike Float and Double) there's no prior art from C/C++/etc that would lead us to use the less-clear name Half. So Float16 it is, following the precedent set by Float80.

(Saleem Abdulrasool) #2

Probably meaningless, but I would think that Float16 is a much better name than Half, given that the extension token in clang for this is __fp16, and that ACLE (ARM C Language Extensions) as well as ISO/IEC TS 18661-3:2015 define _Float16. This makes the name much easier to discover for those with a C/C++/ObjC/ObjC++ background.

(Steve Canon) #3

The counterpoint would be that no one actually uses _Float16 in [Obj]C[++] yet, while shader languages, which people do use, fairly consistently use half. But, as I've already said, I think Float16 is the better name, too.

(Holly Schilling) #4

Excuse my ignorance, but what architectures natively handle Float16?

(Steve Canon) #5

armv8.2 (in particular all recent Apple CPU cores), most mobile GPUs, Intel integrated graphics from SKL forward, etc.

Even cores that do not have half-precision arithmetic still have fast conversions between half and float (armv7 with VFPv4, all armv8 cores, all Intel cores since Ivy Bridge), which allow you to perform arithmetic with acceptable speed by converting to float, doing the arithmetic, and then converting back to half.
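The convert, compute, convert-back pattern described above can be sketched as follows. This is an illustration only, assuming a platform where Float16 is available; `multiplyAdd` is a hypothetical helper name, not a stdlib function.

```swift
// Assumption: Float16 is available on this toolchain and platform.
// Hypothetical helper: widen to Float, do the arithmetic there,
// then round once when narrowing back to Float16.
func multiplyAdd(_ a: Float16, _ b: Float16, _ c: Float16) -> Float16 {
    let wide = Float(a) * Float(b) + Float(c)   // computed in single precision
    return Float16(wide)                        // single rounding back to half
}

let result = multiplyAdd(1.5, 2, 0.25)
print(result)
```

Because the intermediate product is kept in Float, this rounds to half precision only once at the end, so results can differ slightly from rounding after every Float16 operation.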

Even if we didn't have efficient arithmetic (we do), we would still want to add these types to allow people to work with data structures that are going to be passed to the GPU for computation, and they would still be beneficial for reducing memory pressure in compute-intensive workloads (like CNN weights).
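As a rough illustration of the memory-pressure point, the following sketch compares the storage needed for the same number of weights in half and single precision. It assumes Float16 is available on the platform; the weight count and values are made up for the example.

```swift
// Assumption: Float16 is available on this toolchain and platform.
// The count and fill value are arbitrary illustration data.
let weightCount = 1_000
let halfWeights = [Float16](repeating: 0.5, count: weightCount)
let singleWeights = [Float](repeating: 0.5, count: weightCount)

// Element storage only; array header overhead is ignored.
let halfBytes = halfWeights.count * MemoryLayout<Float16>.stride
let singleBytes = singleWeights.count * MemoryLayout<Float>.stride

print(halfBytes, singleBytes)
```

The half-precision buffer takes half the bytes, which is also the layout a GPU kernel expecting 16-bit floats would consume directly.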

(^) #6

This type is called half in HLSL, so I’d be in favor of calling it Half in Swift. It would be weird to have Float16, but not Float32 or Float64.

(Martin R) #7

Float32 and Float64 are already defined as type aliases of Float and Double.
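A quick way to check this for yourself, as a small sketch that relies only on the existing stdlib type aliases:

```swift
// Float32 and Float64 are type aliases, so they name the very same types.
let sameFloat = Float32.self == Float.self
let sameDouble = Float64.self == Double.self

// No conversion is needed between an alias and its underlying type.
let value: Float32 = 1.5
let plain: Float = value

print(sameFloat, sameDouble)
```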

(Ben Cohen) #8

What's more, the FIXME above their declaration claims they should actually be the true names for the types and Float and Double be the type aliases, but for some (possibly no longer applicable) typechecker bugs.

(^) #9

Any plans to phase out Float and Double then?

(Steve Canon) #10

Those names are probably too entrenched to actually remove at this point, but we can avoid adding new names with the same issues (and we plausibly could make them the typedefs and deprecate eventually, though that would require some discussion).

(Rod Brown) #11

I'd prefer consistency either way. Half, Float & Double or Float16, Float32 and Float64. :blush:

(Steve Canon) #12

Given that Float80 is already a thing, that means you prefer Float16. Great. Let's move on.

(Saleem Abdulrasool) #13

At some point, we will need Float128 (Why AArch64, PPC64? Why?)

(Steve Canon) #14

Yes, and I intend to make the formal proposal here include the "intention" to implement that (and call it Float128), so we can get the Evolution process and bikeshedding for both done at once =)

As to the "why": because some HPC people told the architects that they wanted it, and the architects didn't push back hard enough.

(Karl) #15

Wouldn't it be a source-compatible, binary-breaking change to rename Float/Double? If so it would be good to do it soon.

Also, it seems like there are many versions of 2-byte floats: the IEEE version, an ARM variant of that which doesn't have infinity or NaN, and bfloat16, which keeps the same 8-bit exponent as Float and truncates the fractional part.

Would it also make sense to add bfloat16 (ideally with a better name)? Just going by the Wikipedia description, it sounds like a useful datatype with existing and planned hardware implementations.

(Jordan Rose) #16

We actually get out of the binary break for Float and Double because they've always had substitutions in the mangling. It might actually be possible to make this change without disturbing anybody.

(Xiaodi Wu) #17

What, really, is the point of swapping the type alias and the canonical name even if we could do it?

(Steve Canon) #18

There are really only two, and they're quite different.

ARM has moved away from their earlier format, and supports the IEEE 754 format now (in particular, the armv8.2 arithmetic instructions do not support the "AHP" encoding: "ARMv8.2-FP16 adds half-precision data-processing instructions, which always use the IEEE format. These instructions ignore the value of the relevant AHP field, and behave as if it has an Effective value of 0.")

bfloat16 will continue to be a thing, but it's really a compressed representation of Float, rather than its own format. In particular, any arithmetic on bfloat16 values (if those operators exist at all) should produce Float results, while arithmetic on Float16 will produce Float16 results.
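To make the "compressed representation of Float" idea concrete, here is a minimal sketch of the semantics described in this post. `BFloat16Storage` is a hypothetical type invented for illustration, not a stdlib or TensorFlow API, and it narrows by plain truncation for simplicity, where a real implementation would more likely round to nearest.

```swift
// Hypothetical storage-only type: the upper 16 bits of a Float's encoding
// (sign, 8-bit exponent, top 7 significand bits).
struct BFloat16Storage {
    var bits: UInt16

    // Assumption for this sketch: narrow by truncating the low 16 bits.
    init(truncating value: Float) {
        bits = UInt16(truncatingIfNeeded: value.bitPattern >> 16)
    }

    // Widening is exact: shift the stored bits back into place.
    var floatValue: Float {
        Float(bitPattern: UInt32(bits) << 16)
    }

    // Under the semantics described above, arithmetic yields Float.
    static func + (lhs: BFloat16Storage, rhs: BFloat16Storage) -> Float {
        lhs.floatValue + rhs.floatValue
    }
}

let a = BFloat16Storage(truncating: 1.5)
let b = BFloat16Storage(truncating: 2.25)
let total: Float = a + b

print(total)
```

Contrast this with Float16, where `x + y` on two Float16 values produces a Float16 result.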

(David Sweeris) #19

I support both adding this type and calling it Float16. I am not opposed to adding a typealias for it called Half if there’s enough support.

(Chris Lattner) #20

cc @rxwei

I'm very much +1 on this, including the name Float16.

Definitely -1 on these semantics. This is the behavior of C due to its promotion rules (which go all the way to double) but is definitely not the way BFloat16 works on ML accelerators. The B stands for "Brain" float, i.e. Google Brain, and there is a lot of hardware support for it in data centers today. There is also an Intel AVX extension to support BFloat16.

In any case, BFloat16 is something the Swift for TensorFlow project will have to support, in order for it to support Tensor<BFloat16>. It isn't clear if we should provide a full software implementation at (e.g.) the LLVM and compiler_rt level or whether BFloat16 should be a "storage only" marker type.