Error: module 'Builtin' has no member named 'zextOrBitCast_Word_Int32'

carlos4242 · October 21, 2018, 3:51pm

So this question has a few parts. And it's probably one of those "confirming my suspicions" questions more than anything else.

Background
I'm building a custom made, tiny, minimal stdlib for my 16 bit architecture (AVR), to go with the compiler I've already built (that was the easy part, amazingly enough).

In the process, naturally, I'm getting to understand a lot more about how the stdlib (and in fact the swift language generally) works.

As has been commented in many places, standard integer types in swift are not base types in the same way they might be in other languages but are (correct me if I'm wrong here), pretty much simple structs containing a member of type Builtin.Int8 or Builtin.Int16, etc., which lowers to the relevant LLVM primitive type during the IR emission phase of compilation (after going through the SIL phases). The methods in these structs are all "transparent" and basically fold down to simple LLVM IR in the ways you'd expect. After this arguably slightly circuitous process, code like...

let u1: UInt8 = 5
let u2: UInt8 = 7
let u3 = u1 + u2

... ends up lowered into the simple LLVM instructions and types that you would expect and that's then passed on into assembly/machine code emission by LLVM to create clean, fast, efficient code, appropriate to the optimisation level that you specify.
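As a rough illustration of the wrapper pattern described above, here is a minimal sketch in ordinary Swift. `TinyUInt8` is a hypothetical name, and because `Builtin.Int8` is not available outside the standard library, the sketch stores a plain `UInt8` as a stand-in for the primitive payload. It is not how the real stdlib is written, only the general shape.

```swift
// Hypothetical stand-in for a stdlib integer type. The real UInt8 stores a
// Builtin.Int8, which cannot be named in normal user code, so this sketch
// stores an ordinary UInt8 as its "primitive" payload.
struct TinyUInt8 {
    var _value: UInt8

    init(_ value: UInt8) {
        self._value = value
    }

    // Addition forwards to an overflow-reporting primitive operation and
    // traps if the result does not fit.
    @inline(__always)
    static func + (lhs: TinyUInt8, rhs: TinyUInt8) -> TinyUInt8 {
        let (result, overflow) = lhs._value.addingReportingOverflow(rhs._value)
        precondition(!overflow, "arithmetic overflow")
        return TinyUInt8(result)
    }
}

let u1 = TinyUInt8(5)
let u2 = TinyUInt8(7)
let u3 = u1 + u2
print(u3._value)
```

The struct adds no storage beyond the wrapped value, which is why the optimiser can reduce it to a plain integer operation.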

Question 1
In the standard library, the builtin type corresponding to Int and UInt is Builtin.Word. I'm trying to figure out what that is; it looks like it's platform dependent and my guess is that it's 64 bits on "64 bit platforms" (e.g. x86_64, modern iOS devices), it's 32 bits on "32 bit platforms" (e.g. older raspberry pis, old iOS devices) and... that's pretty much it. Therefore in theory the number of bits in Builtin.Word is derived from the target datalayout with the various parts usually defined by defaults in LLVM. Is that right?
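One way to observe this from ordinary Swift code, assuming a platform that Swift currently supports, is to compare the width of Int with the width of a pointer. This is only a small check, not a statement about how the compiler derives the width internally.

```swift
// On the platforms Swift currently supports, Int is expected to be as wide
// as a pointer for the compilation target.
let intBits = Int.bitWidth
let pointerBits = MemoryLayout<UnsafeRawPointer>.size * 8

print("Int is \(intBits) bits, pointers are \(pointerBits) bits")
```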

Question 2
Pretty much a follow-on from that question: it looks suspiciously like "16 bit" words are not supported by the compiler yet. That's the reason for the above error in my stdlib, because I'm trying to handle the case of converting from an (imagined) 16 bit word, as specified by the target datalayout for AVR, zero extended to 32 bits. But the compiler code (e.g. lib/AST/Builtins.cpp and many other places) does not have suitable builtin handling for truncating or zero extending 16 bit words.

Indeed this unit test seems to suggest that this is the case (only 32 bit and 64 bit words are currently handled by the compiler builtins)...

.../test/Parse/builtin_word.swift:
i128 = Builtin.truncOrBitCast_Word_Int128(word) // expected-error{{}}

word = Builtin.zextOrBitCast_Int128_Word(i128) // expected-error{{}}
word = Builtin.zextOrBitCast_Int64_Word(i64) // expected-error{{}}
word = Builtin.zextOrBitCast_Int32_Word(i32)
word = Builtin.zextOrBitCast_Int16_Word(i16)

i16 = Builtin.zextOrBitCast_Word_Int16(word) // expected-error{{}}
i32 = Builtin.zextOrBitCast_Word_Int32(word) // expected-error{{}}
i64 = Builtin.zextOrBitCast_Word_Int64(word)
i128 = Builtin.zextOrBitCast_Word_Int128(word)

word = Builtin.trunc_Int128_Word(i128)
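The accepted and rejected lines in that excerpt are consistent with a rule where Word is treated as "somewhere between 32 and 64 bits" and a conversion is only accepted if it is valid for every width in that range. The sketch below models that rule. It is an inference from the test excerpt only: `BuiltinWidth` and both functions are hypothetical names, not the compiler's actual implementation.

```swift
// Hypothetical model of a builtin integer width: either a fixed bit count or
// the platform-dependent Word, assumed here to be 32 or 64 bits.
enum BuiltinWidth {
    case fixed(Int)
    case word

    // Smallest width the type could have on any modelled platform.
    var least: Int {
        switch self {
        case .fixed(let bits): return bits
        case .word: return 32
        }
    }

    // Largest width the type could have on any modelled platform.
    var greatest: Int {
        switch self {
        case .fixed(let bits): return bits
        case .word: return 64
        }
    }
}

// Zero extension is accepted only if the source can never be wider than the
// destination.
func zextOrBitCastAccepted(from: BuiltinWidth, to: BuiltinWidth) -> Bool {
    return from.greatest <= to.least
}

// Truncation is accepted only if the source can never be narrower than the
// destination.
func truncOrBitCastAccepted(from: BuiltinWidth, to: BuiltinWidth) -> Bool {
    return from.least >= to.greatest
}

print(zextOrBitCastAccepted(from: .fixed(32), to: .word))
print(zextOrBitCastAccepted(from: .word, to: .fixed(32)))
print(zextOrBitCastAccepted(from: .word, to: .fixed(64)))
print(truncOrBitCastAccepted(from: .word, to: .fixed(128)))
```

In this model, the Word to Int32 zero extension from the title is rejected because Word might be 64 bits.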

Question 3
This might be a dumb question, but why do the unsigned and signed versions of the integers use the same underlying Builtin for the same bit width? e.g. Both UInt8 and Int8 store under the hood as Builtin.Int8 rather than one storing as Builtin.UInt8 and one as Builtin.Int8.

Footnote
Answers to the questions above will really help me in building my stdlib, but ultimately I'll need to find a way forward. I see two approaches: 1) Come up with hacks or workarounds for now. These would be for fairly obscure conversions like Int16 -> Int32 that I'm unlikely to ever need in well-formed AVR code, so I can possibly sidestep the issue. (Unless there are aspects I haven't thought about that mean I have to do a "proper" fix.) 2) Fix the compiler to allow 16 bit words. From a quick grep through the code, it looks like I'd need to fix a lot of places: initial parsing, SIL emission, SIL optimisation and of course many unit tests. Do people think it would be a lot of work?

Alejandro · October 21, 2018, 5:21pm

To answer question 3, LLVM integers simply represent an arbitrary integer with N bits. LLVM therefore has no separate unsigned and signed integer types; it's the arithmetic intrinsics that carry the signedness. For example, llvm.sadd.with.overflow.i8 is signed addition (notice the s in front of the add) and llvm.uadd.with.overflow.i8 is unsigned addition (notice the u in front of the add).
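The same idea can be seen from Swift itself. In this small sketch the same two 8-bit patterns are added once as unsigned and once as signed values. The resulting bits are identical and only the overflow flag differs, which is why one underlying storage type is enough.

```swift
// The same two 8-bit patterns, viewed as unsigned values.
let a: UInt8 = 0x7F
let b: UInt8 = 0x01

// Unsigned addition: 127 + 1 fits in 8 bits, so no overflow is reported.
let unsignedSum = a.addingReportingOverflow(b)

// Signed addition on the same bits: 127 + 1 exceeds Int8.max, so overflow
// is reported and the wrapped result is -128.
let signedSum = Int8(bitPattern: a).addingReportingOverflow(Int8(bitPattern: b))

print(unsignedSum.partialValue, unsignedSum.overflow)
print(signedSum.partialValue, signedSum.overflow)

// Both results share the bit pattern 0x80.
print(UInt8(bitPattern: signedSum.partialValue) == unsignedSum.partialValue)
```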

carlos4242 · October 23, 2018, 8:34am

I've done a bit more investigation but I'm still a bit lost on this, and I could use some help understanding how the parsing works. I want to add support for the above operation in Builtins. I can't work out how the compiler knows which operations like this are valid tokens and which aren't. I searched through the source code for zextOrBitCast_Word and found no matches apart from comments, standard library Swift code and unit tests, so I'm puzzled how it works. Also, the error message text "module 'Builtin' has no member" doesn't appear in the source either, so it must be doing some sort of magic parsing that I don't understand, because zextOrBitCast_Word_Int64 is fine!

Any advice or help where in the source code this magic parsing is done would be great!

Carl

carlos42421 · October 24, 2018, 3:07pm

This is probably effectively answered by my related question: Where Builtin gets recognised in the compiler, during AST, Sema or during/after SIL stage? - #2 by carlos42421.

Code patches will be needed, as usual, because of the lack of 16 bit pointer/register/int support in the standard library and compiler.