[ClangImporter] what compiler code affects the mapping of C types to stdlib types?

carlos42421 · July 5, 2020, 9:37am

I'm having some unusual results with the ClangImporter.

This declaration...

#define uint32 unsigned long
void _setupSerial(uint32 baudRate);

Seems to be read wrong by my ClangImporter.

Compiling this function...

public func SetupSerial() {
	_setupSerial(57600)
}

is lowered to this LLVM IR...

define protected swiftcc void @"$s3AVR11SetupSerialyyF"() #0 !dbg !23 {
entry:
  tail call void @_setupSerial(i16 -7936), !dbg !27
  ret void, !dbg !29
}

Which is wrong. It should be i32.

As I understand it, the mapping of types onto standard library types by ClangImporter is controlled by the special standard library types in CTypes.swift, such as...

public typealias CUnsignedLong = UInt

...so this might be the file I need to tweak for the standard library on my platform? (Int is 16 bits on my platform.)

What is confusing me is this: I thought that MappedTypes.def determined the mapping, but it is only used in one place that I can see, getSwiftStdlibType in ImportDecl.cpp.

However, something weird happened when I tried to debug this. I put breakpoints in this function and debugged the Clang import, but it was only called a few times, and only for unusual types defined in my header (mostly typealiases for function pointer types). I expected the lookup to happen every time a BOOL, unsigned char, etc. was imported.

I also noticed there's a BuiltinMappedTypes.def, how does this relate to the import mapping?

typesanitizer · July 5, 2020, 6:57pm

Seems to be read wrong by my ClangImporter.

My understanding (possibly incorrect) is that the behavior is "as expected" if you haven't modified your CTypes.swift. unsigned long gets imported as CUnsignedLong as per BuiltinMappedTypes.def. As you mentioned, CTypes.swift currently has CUnsignedLong = UInt for every platform other than x86_64 Windows, so it gets translated to UInt. UInt has the same width as Int. It is a bit mysterious as to why it is i16 and not u16 though.
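A minimal C++ sketch of that chain, to make the width mismatch concrete. The struct, the constant, and the bit widths are assumptions chosen to mirror the target described in this thread (16-bit Int, with the C side expecting 32 bits for unsigned long); none of these names exist in the compiler.

```cpp
#include <string_view>

// Hypothetical description of one C builtin type on one target (illustration only).
struct CTypeInfo {
  std::string_view cSpelling;   // spelling in the C header
  unsigned cBits;               // width the C ABI uses on this target
  std::string_view swiftAlias;  // alias picked for it on the Swift side
  unsigned swiftBits;           // width of the Swift type the alias resolves to
};

// Assumed values: unsigned long -> CUnsignedLong -> UInt, where UInt is
// 16 bits wide on this target but the C side passes 32 bits.
constexpr CTypeInfo kUnsignedLong{"unsigned long", 32, "CUnsignedLong", 16};

// A call is only ABI-compatible if both sides agree on the width.
constexpr bool widthsAgree(const CTypeInfo &t) { return t.cBits == t.swiftBits; }
```

With these assumed values `widthsAgree(kUnsignedLong)` is false, which is the situation the i16 in the IR reveals: the mapping itself behaves as written, but the alias resolves to a type of the wrong width for this target.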

If you think about the mapping, there are lots of different kinds of types. There are built-in atoms like int, unsigned int etc., there are built-in composites (like function types, which have a bunch of argument types and one return type), there are "core" typedefs (like uint32_t; this is somewhat fuzzy but you get what I mean), and there are user-defined types and typedefs.

If you think of a type as a tree structure, the built-in atoms are the leaves of the tree. BuiltinMappedTypes.def describes how Clang's built-in atoms translate to Swift (technically, these are just called builtins), barring special ~~hacks~~ cases.

Following the tree structure, translating types between languages is like writing a tree interpreter.

There is a Clang type -> Swift type tree interpreter in ImportType.cpp.
There are Swift type -> Clang type tree interpreters in GenClangType.cpp/ClangTypeConverter.cpp (these are very similar, so you can just look at one without worrying about why there are two of them).
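The "tree interpreter" idea can be sketched in a few lines of C++. This is a toy model, not the code in ImportType.cpp: the `CType` struct, the helper names, and the three hand-written leaf mappings are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A toy C type tree: builtin leaves plus one composite (function types).
struct CType {
  enum Kind { Builtin, Function } kind;
  std::string builtinName;                     // used when kind == Builtin
  std::vector<std::shared_ptr<CType>> params;  // used when kind == Function
  std::shared_ptr<CType> result;               // used when kind == Function
};

// Leaf translation: a hand-written stand-in for a builtin mapping table.
std::string importBuiltin(const std::string &name) {
  if (name == "int") return "CInt";
  if (name == "unsigned long") return "CUnsignedLong";
  if (name == "void") return "Void";
  return "<unmapped " + name + ">";
}

// The interpreter: recurse over the structure and map the leaves.
std::string importType(const CType &t) {
  if (t.kind == CType::Builtin) return importBuiltin(t.builtinName);
  assert(t.result && "function types need a result type");
  std::string out = "(";
  for (std::size_t i = 0; i < t.params.size(); ++i) {
    if (i != 0) out += ", ";
    out += importType(*t.params[i]);
  }
  return out + ") -> " + importType(*t.result);
}

// Small helpers to build trees.
std::shared_ptr<CType> builtinType(std::string name) {
  return std::make_shared<CType>(CType{CType::Builtin, std::move(name), {}, nullptr});
}
std::shared_ptr<CType> functionType(std::vector<std::shared_ptr<CType>> params,
                                    std::shared_ptr<CType> result) {
  return std::make_shared<CType>(
      CType{CType::Function, "", std::move(params), std::move(result)});
}
```

For the shape of `void _setupSerial(unsigned long)`, calling `importType(*functionType({builtinType("unsigned long")}, builtinType("void")))` in this sketch returns `(CUnsignedLong) -> Void`.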

What is confusing me is this: I thought that MappedTypes.def determined the mapping, but it is only used in one place that I can see, getSwiftStdlibType in ImportDecl.cpp.

MappedTypes.def is essentially a list of special cases that are handled for better user ergonomics (I might be wrong on the purpose, maybe @jrose can describe it better). For example, typedefs can be imported as the underlying type (i.e. erasure-on-import) or they can be translated to typealiases in Swift (i.e. transliteration-on-import). With these specific types, we want to make sure that no new typealiases are introduced; instead, we hard-code what they get imported as. For example, suppose a platform header defines typedef int int32_t because it has 32-bit ints. As a base assumption, let's say that int gets imported as Int (hopefully uncontroversial). Then we do not want the typedef to be erased (e.g. functions that take int32_t arguments in C would look like they take Int arguments in Swift), and we do not want to introduce a new typealias like typealias Int32 = Int. Instead, we want to make sure that int32_t gets mapped directly to Int32, which is a struct in the stdlib, without any extra indirection through a typealias. Since this doesn't fit cleanly into the two most common import strategies, and we care about ergonomics, and this is probably a more scalable solution than modifying headers with attributes for every platform we care about, we hardcode these.
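A rough C++ sketch of the "look up the typedef name first" idea. The table, its two rows, and the function name are hypothetical. The real list lives in MappedTypes.def and is consulted from getSwiftStdlibType, which this does not reproduce.

```cpp
#include <string_view>

// Hypothetical special-case table keyed by typedef name (illustration only).
struct MappedTypedef {
  std::string_view cName;
  std::string_view swiftName;
};

constexpr MappedTypedef kMappedTypedefs[] = {
    {"int32_t", "Int32"},
    {"uint32_t", "UInt32"},
};

// Returns the hard-coded Swift type for a typedef name, or an empty view
// when the typedef should go through the ordinary import strategies
// (erasure to the underlying type, or a new typealias).
constexpr std::string_view lookUpMappedTypedef(std::string_view cName) {
  for (const auto &entry : kMappedTypedefs)
    if (entry.cName == cName) return entry.swiftName;
  return {};
}
```

Here `lookUpMappedTypedef("int32_t")` yields `Int32` regardless of what the typedef's underlying type is on the platform, while a name that is not in the table, such as a user-defined typedef, yields an empty result and falls through to the general path.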

typesanitizer · July 5, 2020, 7:02pm

FWIW, I've filed [SR-13146] [docs] Document ClangImporter pieces - BuiltinMappedTypes.def and MappedTypes.def · Issue #55593 · apple/swift · GitHub to track adding documentation which would answer some of the questions you've raised.

jrose · July 5, 2020, 7:17pm

That all sounds right to me. BuiltinMappedTypes.def does the basic C types, and MappedTypes.def does specific typedefs that should be translated differently.

LLVM integers represent signedness on operations rather than types, so "i" is used for all bare integers.
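As a small worked example of why the IR shows `i16 -7936` for a literal of 57600: textual IR prints integer constants as signed two's-complement values, and both numbers share the 16-bit pattern 0xE100 (57600 - 65536 = -7936). The helper below is only an illustration in C++, not anything from LLVM.

```cpp
#include <cstdint>

// Reinterpret a 16-bit pattern as a signed two's-complement value.
constexpr std::int32_t asSigned16(std::uint16_t bits) {
  return bits >= 0x8000 ? static_cast<std::int32_t>(bits) - 0x10000
                        : static_cast<std::int32_t>(bits);
}

static_assert(asSigned16(57600) == -7936, "57600 and -7936 share one bit pattern");
static_assert(asSigned16(0xE100) == -7936, "0xE100 is that shared pattern");
```

So the value itself survives. What is wrong in the original IR is only the width of the argument (i16 where the C side expects i32).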

typesanitizer · July 5, 2020, 7:44pm

How am I just realizing this today after so much work on ClangImporter and IRGen. *awkwardly slinks into the shadows*

carlos42421 · July 5, 2020, 9:03pm

Hah. No awkward slinking for you @typesanitizer. You always reply quickly, with lots of detail. I really appreciate your help.

I spend much more time in the llvm end of things than swift so I was familiar with the i16 thing already. It threw me a lot when I first got into LLVM.

I think that's enough information for me to debug properly the next time something goes squirrelly, and it explains what each file does (plus why BuiltinMappedTypes.def has higher visibility)...

But yeah, more documentation on the clang importer would probably be good. I can remember getting quite mystified by this sort of thing in the ClangImporter back in the days when I was a regular swift/ios developer, before I spent time "behind the curtain"...

For my fix, I needed a small tweak to my C header files, and then to map CUnsignedInt to UInt16, CUnsignedLong to UInt, etc. in my CTypes.swift.

Thanks again for your help.

carlos42421 · July 5, 2020, 9:49pm

To make sure I understood your original post correctly: the ClangImporter basically parses header files like Clang normally would, doing parse and Sema and building an AST (I'm not too familiar with Clang internals). As part of that process, it builds the usual data structures you'd expect in an AST (functions, declarations, statements, etc.), and at the "leaf" nodes of that AST you have Clang builtin types, such as clang::BuiltinType::Int or clang::BuiltinType::ULongLong. When ClangImporter translates that AST into a Swift AST visible to the Swift half of the module, it relies on the function VisitBuiltinType in ImportType.cpp to do the mapping, which in turn uses the metaprogramming include file BuiltinMappedTypes.def to find the corresponding standard library types.
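For readers who have not met the ".def metaprogramming include file" pattern, here is a self-contained C++ sketch of it. The macro names, the enum, and the rows are hypothetical. A real .def file would be pulled in with #include rather than written as a list macro, and its contents may differ from what is shown.

```cpp
#include <string_view>

// Stand-in for a .def file: one row per (C builtin kind, Swift stdlib name).
// Hypothetical rows, for illustration only.
#define TOY_BUILTIN_MAPPINGS(ROW) \
  ROW(Int, "CInt")                \
  ROW(UInt, "CUnsignedInt")       \
  ROW(ULong, "CUnsignedLong")     \
  ROW(ULongLong, "CUnsignedLongLong")

// First expansion: generate an enum with one case per row.
enum class ToyBuiltinKind {
#define ROW(KIND, SWIFT_NAME) KIND,
  TOY_BUILTIN_MAPPINGS(ROW)
#undef ROW
};

// Second expansion: generate the switch that maps a kind to its Swift name.
constexpr std::string_view swiftNameFor(ToyBuiltinKind kind) {
  switch (kind) {
#define ROW(KIND, SWIFT_NAME) \
  case ToyBuiltinKind::KIND:  \
    return SWIFT_NAME;
    TOY_BUILTIN_MAPPINGS(ROW)
#undef ROW
  }
  return {};
}
```

The point of the pattern is that the table is written once and expanded several times with different definitions of the row macro, so the enum and the lookup cannot drift apart. In this sketch `swiftNameFor(ToyBuiltinKind::ULong)` evaluates to `CUnsignedLong`.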

In parallel with this process when "importing" typedefs, there are various special cases, one set of which is defined by typedefs matching the names in MappedTypes.def, which are imported to known standard library types, for example uint32_t is imported as UInt32. This is handled in getSwiftStdlibType called by VisitTypedefNameDecl. This function also handles some other special cases and finally emits a typealias node into the Swift AST.

Is the above roughly right?

Just a few nitpicky thoughts on this while I'm here...

Isn't the name BuiltinMappedTypes.def a little misleading? The types being mapped are clang builtins, not Builtins from the swift module of that name or llvm builtin types. Perhaps the names ClangBuiltinTypesMap.def and SpecialCaseTypesMap.def would be more clear?
On documentation: as well as documenting this in docs/ClangImporter.rst or similar, perhaps we could fill out the headers of the .def metaprogramming include files a little more? Happy to make suggestions!

Carl

typesanitizer · July 5, 2020, 11:25pm

Yes.

Yes, it is a bit unclear. The doc comment at the moment does say:

This file defines the database of builtin C types that are imported as swift
stdlib types.

so once you open the file, hopefully it is clear that the builtin refers to C not Swift, but I don't see that as a reason to not make it clearer, especially since filenames are not written very often so being verbose does not impose a consistent reading penalty. My general inclination is that if people X, Y, Z are fine with a name and person A finds it somewhat confusing, we should try to find a name that all of them are ok with.

I don't want to comment on exact suggestions because we'd probably want to make sure we still follow a consistent naming scheme. For example, if I search for things named .*?Types.def, I get

./include/swift/ClangImporter/BuiltinMappedTypes.def
./include/swift/ClangImporter/SIMDMappedTypes.def
./include/swift/Runtime/BuiltinTypes.def
./include/swift/Basic/FileTypes.def
./include/swift/AST/KnownObjCTypes.def
./include/swift/AST/KnownStdlibTypes.def
./include/swift/SIL/BridgedTypes.def
./lib/ClangImporter/MappedTypes.def

Apart from FileTypes.def, maybe the rest should be named consistently? I'm not sure. Or maybe just the .*?MappedTypes.def files should be renamed consistently. If you file a bug report for this, we can follow up there.

That said, I don't think just renaming these will solve our problems. For example, we do have a ClangSwiftTypeCorrespondence.h which could potentially be confused with either of these def files. (Don't ask me why that file exists.)

On documentation: as well as documenting this in docs/ClangImporter.rst or similar, perhaps we could fill out the headers of the .def metaprogramming include files a little more? Happy to make suggestions!

Yes, expanding the doc comments in the .def files was kinda the plan I had in my head (I should've written that down though in [SR-13146] [docs] Document ClangImporter pieces - BuiltinMappedTypes.def and MappedTypes.def · Issue #55593 · apple/swift · GitHub). We can discuss other ideas if you have any in the same JIRA or a separate JIRA.