Problems with `SwiftGlibc` and proposed fix

Summary

We (@gribozavr and @mboehme) propose to fix two problems with the SwiftGlibc module map:

  • Today, some submodules in SwiftGlibc fail to provide definitions that they should contain. As a consequence, Swift fails to import some code that compiles correctly with standalone Clang. As just one example, including signal.h should make the type pid_t available, but it currently does not.
  • SwiftGlibc is not compatible with the libc++ module map. Trying to include libc++ headers in a C++ module imported into Swift results in an error message about cyclic dependencies.

We propose a change to the SwiftGlibc module map that solves both of these problems and at the same time simplifies the module map.

This will unfortunately be a lengthy post, as we need to explain some context. We look forward to your feedback!

Problem description

SwiftGlibc is missing definitions in some submodules

Today, some submodules in SwiftGlibc fail to provide definitions that they should contain. As a consequence, SwiftGlibc fails to import some code that compiles correctly with standalone Clang.

This affects types such as pid_t that glibc defines using the following construct:

#ifndef __pid_t_defined
typedef __pid_t pid_t;
#define __pid_t_defined
#endif

glibc defines pid_t in this way in multiple header files, as required by POSIX (e.g. signal.h, unistd.h, and sys/types.h). A number of other types use the same construct (e.g. ssize_t, intptr_t, uid_t, gid_t).

Because Swift does not set the -fmodules-local-submodule-visibility flag, the __pid_t_defined macro leaks from the first submodule that defines it into all submodules that follow. As a consequence, SwiftGlibc ends up providing pid_t and similar types through just a single header / submodule – namely, the one that happened to be the first one to encounter the #ifndef __pid_t_defined construct.
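The effect can be reproduced without modules at all. Here is a minimal C sketch that simulates two headers guarding the same typedef inside one translation unit, which is roughly what happens when submodule headers are processed without -fmodules-local-submodule-visibility. All names (demo_pid_t, DEMO_PID_T_DEFINED, the DEMO_*_PROVIDES_PID_T markers, and the helper functions) are hypothetical and exist only for illustration; they are not part of glibc or Swift.

```c
#include <assert.h>
#include <stdio.h>

/* --- Simulated first header (think: termios.h) --- */
#ifndef DEMO_PID_T_DEFINED
typedef int demo_pid_t;
#define DEMO_PID_T_DEFINED
#define DEMO_FIRST_HEADER_PROVIDES_PID_T 1
#endif

/* --- Simulated second header (think: signal.h) --- */
/* The guard macro from the first header is still visible here,
   so this whole block is skipped. */
#ifndef DEMO_PID_T_DEFINED
typedef int demo_pid_t;
#define DEMO_PID_T_DEFINED
#define DEMO_SECOND_HEADER_PROVIDES_PID_T 1
#endif

/* Fill in 0 for whichever marker was never defined. */
#ifndef DEMO_FIRST_HEADER_PROVIDES_PID_T
#define DEMO_FIRST_HEADER_PROVIDES_PID_T 0
#endif
#ifndef DEMO_SECOND_HEADER_PROVIDES_PID_T
#define DEMO_SECOND_HEADER_PROVIDES_PID_T 0
#endif

int first_header_provides_pid_t(void) {
  return DEMO_FIRST_HEADER_PROVIDES_PID_T;
}

int second_header_provides_pid_t(void) {
  return DEMO_SECOND_HEADER_PROVIDES_PID_T;
}

/* Prints which simulated header ended up declaring the type. */
void print_providers(void) {
  demo_pid_t pid = 0;
  assert(sizeof pid == sizeof(int)); /* the type exists exactly once */
  printf("first header provides demo_pid_t:  %d\n",
         first_header_provides_pid_t());
  printf("second header provides demo_pid_t: %d\n",
         second_header_provides_pid_t());
}
```

Only the first simulated header contributes the typedef. In the module setting, this corresponds to a single submodule "owning" pid_t, while every later submodule that should also provide it does not.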

The result is that a C header file may fail to compile when imported into Swift, even though it compiles cleanly outside Swift with the same system headers. Here is a simple example:

#include <signal.h>

pid_t returnPidT();

This compiles cleanly with Clang, as it should, since POSIX requires signal.h to provide pid_t. However, importing this header into Swift fails with the following error message:

[snip]/swift/test/Interop/C/stdlib/Inputs/include-signal.h:3:1: error: diagnostic produced elsewhere: missing '#include <termios.h>'; declaration of 'pid_t' must be imported from module 'SwiftGlibc.POSIX.termios' before it is required
pid_t returnPidT();
^
/usr/include/termios.h:30:17: note: diagnostic produced elsewhere: previous declaration is here
typedef __pid_t pid_t;

Apparently, termios.h happened to be the first header that encountered the #ifndef __pid_t_defined construct.

An obvious solution would be to make Swift set -fmodules-local-submodule-visibility, which prevents definitions from leaking from one submodule into subsequent submodules, but there are a couple of issues with this:

  • Minor issue: An initial attempt to set -fmodules-local-submodule-visibility caused definitions from an imported module to become invisible in Swift (e.g. in the example above, attempting to use returnPidT() resulted in an error message that this function is unknown). This is because the module is not added to Clang’s Sema::VisibleModules when it is imported; an easy fix is to call Sema::ActOnModuleBegin() when importing the module.
  • Major issue: The -fmodules-local-submodule-visibility flag does not work correctly with Objective-C code. Fixing this is hard.

SwiftGlibc is not compatible with the libc++ module map

Attempting to include libc++ headers in a C++ module imported into Swift results in an error message about a cyclic dependency between SwiftGlibc and the libc++ module.

The cyclic dependency between libc++ and SwiftGlibc comes about because, essentially, glibc and libc++ are not layered with respect to each other.

libc++ wraps the C standard library headers. It therefore needs to be in the include path before glibc so that an #include <stdint.h> (for example) picks up libc++’s wrapper of stdint.h rather than glibc’s version. Unfortunately, this means that when glibc tries to include its own headers, it unwittingly includes libc++’s wrappers instead (when libc++ is in use).

Here is just one example of many:

  • /usr/include/c++/v1/stdint.h (from libc++'s module std.depr.stdint_h) contains an #include_next <stdint.h>, which resolves to /usr/include/stdint.h (from SwiftGlibc.C.stdint).
  • /usr/include/inttypes.h (from SwiftGlibc.C.inttypes) contains an #include <stdint.h>, which, because of the include path ordering, resolves to /usr/include/c++/v1/stdint.h (from libc++'s module std.depr.stdint_h).

Note that the cyclic dependency exists between the modules, not on the level of files, i.e. it's not the case (as far as we know) that there is a libc++ header file that includes a SwiftGlibc header file which then (possibly indirectly) includes the same libc++ header file again.
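To make the module-level cycle concrete, here is a small C model of how the two includes from the example resolve. It is only a sketch of the lookup rules, not how Clang implements them: the table, the struct, and the function names are hypothetical, and the include path is reduced to two directories (index 0 for libc++, index 1 for glibc).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry per header file: the include-path directory it lives in,
   its name, and the top-level Clang module that owns it. */
struct header {
  int dir;            /* index into the include path; 0 is searched first */
  const char *name;
  const char *module;
};

static const struct header headers[] = {
  {0, "stdint.h",   "std"},        /* /usr/include/c++/v1/stdint.h */
  {1, "stdint.h",   "SwiftGlibc"}, /* /usr/include/stdint.h */
  {1, "inttypes.h", "SwiftGlibc"}, /* /usr/include/inttypes.h */
};

/* Returns the owning module of the first header called `name` found in
   directory `first_dir` or later, or NULL if there is none. */
const char *resolve(const char *name, int first_dir) {
  const struct header *best = NULL;
  for (size_t i = 0; i < sizeof headers / sizeof headers[0]; ++i) {
    const struct header *h = &headers[i];
    if (h->dir >= first_dir && strcmp(h->name, name) == 0 &&
        (best == NULL || h->dir < best->dir)) {
      best = h;
    }
  }
  return best ? best->module : NULL;
}

/* Models #include <name>: search the whole include path. */
const char *include_angle(const char *name) {
  return resolve(name, 0);
}

/* Models #include_next <name> issued by a file in `current_dir`:
   continue the search in the directories after it. */
const char *include_next(const char *name, int current_dir) {
  return resolve(name, current_dir + 1);
}

/* Returns 1 if the two edges from the example form a module cycle. */
int has_module_cycle(void) {
  /* inttypes.h (SwiftGlibc) does #include <stdint.h>. */
  const char *from_glibc = include_angle("stdint.h");
  /* libc++'s stdint.h (std) does #include_next <stdint.h>. */
  const char *from_libcxx = include_next("stdint.h", 0);
  assert(from_glibc != NULL && from_libcxx != NULL);
  return strcmp(from_glibc, "std") == 0 &&
         strcmp(from_libcxx, "SwiftGlibc") == 0;
}
```

In this model, SwiftGlibc depends on std (via inttypes.h) and std depends on SwiftGlibc (via the #include_next), even though no individual file ends up including itself.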

The libc++ authors were presumably aware of this quirk, and accepted it. It does, however, mean that libc++ and glibc are not layered with respect to each other.

Here’s a more general description of the situation:

  • libc++ obviously needs to include C standard library headers, for several reasons:
    • Some of the implementation of libc++ uses the C standard library.
    • The C++ standard library provides wrappers of C headers (e.g. <cstdio> for <stdio.h>).
    • libc++ even wraps the C standard library headers themselves. (This is necessary because, in general, the C++ header <someheader.h> has different content from the C header <someheader.h>.) For example, libc++ provides stdio.h, which defines some macros and then does #include_next <stdio.h>.
  • glibc unwittingly includes libc++’s wrappers of C standard library headers when libc++ is in use, as noted above.
    • This is because glibc headers include other glibc headers using “angle bracket syntax”. When libc++ is in use, its include directory precedes the glibc include directory in the include path, and angle bracket syntax therefore finds libc++’s version of the C standard library headers.
    • POSIX headers include libc++ headers, too
      • Again, this is because libc++ precedes glibc in the include path

Proposal

The core problem is that SwiftGlibc defines a module map for glibc when it has no business doing so.

In fact, SwiftGlibc has no need to define a module map for glibc. The purpose of SwiftGlibc is to provide a module containing the definitions from the C standard library and POSIX headers so that Swift code can use them. This purpose can be fulfilled just as well in the following way:

  • Create a header file called SwiftGlibc.h containing #includes for all the C standard library and POSIX header files:

    // SwiftGlibc.h
    #include <complex.h>
    #include <ctype.h>
    #include <errno.h>
    // …
    
  • Add this header to the SwiftGlibc module map, and delete all of the other headers:

    module SwiftGlibc [system] {
      // various link statements omitted
    
      header "SwiftGlibc.h"
      export *
    }
    

To deal with the issue that not all headers are present on all platforms, the current SwiftGlibc module uses gyb conditions to include the right submodules for each platform; a different module map is generated for each platform and stored in a platform-specific directory. Our proposed approach allows this to be handled more easily and uniformly: We simply surround each #include with a __has_include condition, like this:

#if __has_include(<example.h>)
#include <example.h>
#endif
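As a rough, self-contained illustration of this pattern, the following C sketch guards one header that should exist everywhere (stdint.h) and one deliberately made-up header name. The DEMO_* macros, the helper functions, and demo_header_that_does_not_exist.h are hypothetical. The outer #ifdef is only there so that the sketch also builds with compilers that lack __has_include; Clang always provides it, so SwiftGlibc.h itself would not need that check.

```c
#include <assert.h>

/* Compilers without __has_include must not see it inside an #if
   expression, hence the nested check. */
#ifdef __has_include
#  if __has_include(<stdint.h>)
#    include <stdint.h>
#    define DEMO_HAVE_STDINT_H 1
#  endif
#  if __has_include(<demo_header_that_does_not_exist.h>)
#    include <demo_header_that_does_not_exist.h>
#    define DEMO_HAVE_MISSING_H 1
#  endif
#endif

/* Fill in 0 for headers that were not found. */
#ifndef DEMO_HAVE_STDINT_H
#define DEMO_HAVE_STDINT_H 0
#endif
#ifndef DEMO_HAVE_MISSING_H
#define DEMO_HAVE_MISSING_H 0
#endif

int have_stdint_h(void) { return DEMO_HAVE_STDINT_H; }
int have_missing_h(void) { return DEMO_HAVE_MISSING_H; }

/* The missing header is skipped silently instead of breaking the build. */
void check_presence_flags(void) {
  assert(have_missing_h() == 0);
}
```

The translation unit compiles whether or not a given header is present, which is the property that lets a single SwiftGlibc.h serve several platforms.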

We probably still need platform-specific module maps, as the corresponding link statements still need to be put into the module map conditionally. Nevertheless, a side effect of our proposal should be to significantly reduce the complexity of the SwiftGlibc module definition and related build system support code.

C / Objective-C / C++ code imported into Swift will no longer use SwiftGlibc to import C standard library headers. It will typically import them using textual inclusion, unless the platform provides a libc module map.

Advantages

  • Eliminates the missing definitions in some of SwiftGlibc’s submodules because it eliminates SwiftGlibc’s submodule structure (e.g. SwiftGlibc.C.stdio for stdio.h). This shouldn’t be a problem because the submodules aren’t visible from Swift anyway. This is because SwiftGlibc is not imported into user programs directly but through a Swift module called Glibc, and this step "flattens" the submodule structure.

  • Eliminates the cyclic dependency between SwiftGlibc and the libc++ module map, because SwiftGlibc is not used when compiling C++ code.

  • Avoids the need to identify the right directory from which to include the header files, as SwiftGlibc currently does (see the references to ${GLIBC_INCLUDE_PATH} and ${GLIBC_ARCH_INCLUDE_PATH} in glibc.modulemap.gyb). Instead, we can use standard mechanisms (include path, sysroot) to ensure that an #include statement resolves to the correct header file.

  • Is even agnostic of whether the underlying libc (which need not be glibc) is modularized or not. In particular, if the SwiftGlibc module is built with -stdlib=libc++, then SwiftGlibc.h will pick up the libc++ versions of the C standard library headers, which have module definitions in the libc++ module map. Note though that on most platforms, the Swift standard library is built against libstdc++, independent of whether Swift is configured to use libc++ for imported C++ code.

Disadvantages

  • Need to prevent Clang modules from re-exporting C standard library or POSIX definitions (details below).

Technical note

If the platform does not provide a module map for a given header, it will be included textually, both into the SwiftGlibc module and into any imported C / Objective-C / C++ modules that include the header. All of these modules will therefore contain the definitions from this header.

This is fine under Clang’s rules for modules; Clang has logic to deduplicate definitions caused by textual inclusion of the same header into multiple modules.

Main obstacle: Prevent imported Clang modules from re-exporting C standard library or POSIX APIs

A consequence of our proposed approach is that the C standard library and POSIX headers are no longer modularized (assuming the platform does not provide a module map for them). Indeed, this is the very goal of our approach, as modularizing these headers prevents their use with libc++.

This means, however, that any C / Objective-C / C++ module imported into Swift that includes a C standard library or POSIX header will also import all of the definitions from that header. As an example, assume we are importing the following C module:

// module.modulemap
module MyImportedModule {
  header "my_imported_module.h"
}
// my_imported_module.h
#include <math.h>

inline double sigmoid(double x) {
  return 1. / (1. + exp(-x));
}

When this module is imported into Swift, it will inadvertently make all of the definitions from math.h available as well:

// ModuleConsumer.swift
import MyImportedModule

// We can access sigmoid(), as intended.
let _ = sigmoid(1.0)
let _ = MyImportedModule.sigmoid(1.0)

// Unfortunately, we can also access anything from math.h as well.
let _ = cos(0.0)
let _ = MyImportedModule.cos(0.0)

This is unfortunate because source code may (and will) start depending on this behavior; it will then not be possible to port the code to a different platform (or a later version of the same platform) that provides a modularized libc.

Note that this is a general problem that affects any Clang module that textually includes headers for a non-modularized library. The module will export all of the declarations from these textually included headers, even though presumably it does not intend to do so.

To solve this problem, we will modify ClangImporter to output a warning if Swift code uses declarations that came from a header that was textually included into a Clang module. To fix the problem, the user can either create a module map for the textually included header, or they can indicate that the behavior is intentional by adding an export * to the module map for the existing module that includes the header textually.

The warning will be upgraded to an error in the specific case where the declarations came from a glibc header. This reflects the fact that the same code would trigger an error with today's SwiftGlibc module map.

We have a prototype for the proposed diagnostic. It can be implemented without needing to make any changes to Clang.

Alternative considered

We initially considered an alternative solution that would have required only minor changes to the SwiftGlibc module map:

  • Prevent macro definitions from leaking from one submodule to the next by turning on -fmodules-local-submodule-visibility.

  • Fix the cyclic dependency between SwiftGlibc and libc++ by applying the [no_undeclared_includes] attribute to the SwiftGlibc module. This attribute “specifies that the module can only reach non-modular headers and headers from used modules” (docs), which breaks the dependency cycle because glibc headers can now no longer see the libc++ wrapper headers.

Advantages

  • Requires no major changes to the SwiftGlibc module map.

Disadvantages

  • -fmodules-local-submodule-visibility currently does not work correctly for Objective-C, and fixing this is hard.

  • By design, a glibc header that includes another standard C header will include the glibc version, not the libc++ version, and this may not give exactly the right semantics.

  • Does not have the side effect of eliminating complexity from the SwiftGlibc module map, as our proposed approach does.


CC @Douglas_Gregor @jrose

This seems overall a good solution, and the general form of it has been tried and tested as the approach to wrap non-modularized platform headers in Swift packages. I think we might want to consider turning on -fmodules-local-submodule-visibility on non-Apple platforms anyway (and hopefully getting there for Apple as well, but that's Apple's business), but this no longer has to be a driver with this change.

It's possible someone is doing specific imports, but since none of the SwiftGlibc submodules are explicit (and thus in need of specific imports) I agree that it's very very unlikely.

As far as I know export * is the default that everyone's using except possibly at Google, so this shouldn't have a huge impact in practice. That said, I'd still like to hear what @Douglas_Gregor thinks of this (and perhaps @Bruno_Cardoso_Lopes as well) since it's not at all the original intent of export *. (Then again, the original intent of modules as I understand it was that there'd be no non-modularized headers left at all, with any textual includes declared explicitly.)


The public module is named Glibc; SwiftGlibc is an implementation detail of that. That adds another layer of "unlikely".


Big +1 to this. Having absolute paths in your modulemap is a pain when cross-compiling.

For example, the SwiftWasm toolchain build script has an extra stage which strips them. I guess most people doing cross-compiling know this issue.


I had thought about this too. One possible issue, however, is that this could hinder portability. In essence, the lack of -fmodules-local-submodule-visibility would make Apple platforms more "permissive" with respect to user-defined modules, too, so user code that builds on Apple platforms might fail to build on other platforms.

Hm, that's bad news. If everyone is using export *, and if export * is the mechanism we use to turn off the warning, then no one will get the warning.

So it seems we may have to go back and look for another way to turn off the warning / indicate that exporting declarations from textual headers is intentional.

The reason we proposed export * is that it did seem to us to be a very natural extension of its current meaning:

  • Currently, export * means "export all of the modular dependencies that were #included in this module".

  • Under our proposal, export * would mean "export all of the dependencies (whether modular or textual) that were #included in this module".
    (More precisely, Clang modules always export textual dependencies independent of whether export * is specified or not, but we would use export * to indicate that this behavior is desired and that Swift should not warn if declarations from textual dependencies are used in Swift.)

Note one specific consequence of this: If one of the dependencies gains a module map, i.e. if it changes from being textual to modular, the semantics of what is exported don't change. This seemed like a nice property.

All of this makes me realize however: If everyone is currently using export *, then they are already exporting definitions from any glibc headers they #include today. This isn't great, but it does mean that the proposed change to SwiftGlibc would not actually change anything.

Thoughts?

SwiftWasm is not a great example here, since it uses WASI instead of Glibc. But yes, we do need to make paths relative as a part of the build process. While WASI itself is based on Musl, I don't think its module should even be called Musl; I'm planning to create a separate WASI module for it, to be used as import WASI to make the differences in behavior explicit (and there's a ton of them).

To clarify: Your point is that people would only be able to access the submodules if they imported SwiftGlibc directly?

This is true because Swift modules that re-export Clang modules flatten the submodule structure of the Clang module. Here's an example on godbolt that demonstrates this:

import Glibc.C.stdio
puts("Hello, world!\n")

<source>:1:8: error: no such module 'Glibc.C.stdio'
import Glibc.C.stdio
       ^

This is a difference between Glibc and Darwin -- it is legal to say import Darwin.C.stdio.

It's also possible to say import SwiftGlibc.C.stdio, but as you point out, this is an implementation detail that users shouldn't be relying on and that we should be free to break.

By the way, here's a draft PR that shows what the modified SwiftGlibc would look like:

There are two failing tests on CI that I'll look at, but this should give a general idea. If someone wants to test this on one of the non-Linux platforms that use glibc.modulemap.gyb, I'd be interested to hear about the results.

Yes, that's what I mean -- thanks.

Yeah, this was my thought as well, and why I'm on board for this interpretation. The primary practical concern would be if someone needed the current behavior: textual includes are exported, but modular includes are not. (I expect this is unlikely.) Doug and Bruno may have abstract concerns as well.

Sure, I'm just using SwiftWasm as an example of cross-compiling that I saw recently. Even if you go on to use a different module that isn't named "SwiftGlibc", you'll encounter the same issues (path-finding, probably also C++ interop problems, etc.) if it's built in the same way.

This seems like a much better design for all cross-compiling toolchains.


Update: One of the failing CI tests unfortunately surfaced what looks like an existing Swift bug to me. I've written this up here:

Essentially, when the same C header is included textually in multiple Clang modules, the .swiftinterface printer can get confused about which module name should be used to qualify names from that header, resulting in a .swiftinterface file that doesn't compile.

I believe this needs to be fixed in TypePrinter::printModuleContext(), but it's not immediately clear to me how. Suggestions welcome!

For the purposes of the SwiftGlibc module, I'll see if I can work around this issue.

I've worked out how the issue should be fixed and have a fix in review in apple/swift pull request #32465, "When qualifying Clang types with a module, make sure we choose a visible module".

The warning for use of declarations from textually included headers is in review at

While trying to move this forward we discovered a bug:

Suppose a header helper.h that declares a function myFunction() is textually included into two Clang modules, ModuleA and ModuleB, and we have the following source file:

@_implementationOnly import ModuleA
import ModuleB

@inlinable
public func useMyFunction() {
  myFunction()
}

When looking for the declaration of myFunction(), Swift will choose either ModuleA or ModuleB as the module that provides it. If Swift chooses ModuleA, it will error out when checking whether myFunction() can be used in an inlinable context, because ModuleA is imported with @_implementationOnly.

We have a fix for this in #34476.

This issue was uncovered by #32404, which makes all libc headers textually included.

A project may be affected if:

  • It runs on Linux
  • It has textual includes of libc headers in a module other than Glibc while also importing Glibc
  • The other module is imported with @_implementationOnly

Since @_implementationOnly is not yet a stable feature, we hope we can move forward with #32404, but we wanted to first run it by the community. Do you have any concerns?

CC @codafi @gribozavr @hlopko

Thanks @scentini! No concerns from me.

Last call: unless there are comments by Monday, we'll submit the PR.

Thank you all for your contributions!

I've tried to make the necessary changes to libc-openbsd.gyb. However, the test for Interop/Cxx/class/memory-layout-silgen.swift trips -- perhaps predictably, since it's the only test that #includes stdint.h.

Specifically, this part of the error seems significant.

/usr/include/stdint.h:57:21: note: declaration here is not visible
typedef __uint32_t              uint32_t;

Manually adding /usr/include/stdint.h to the MemoryLayout module in test/Interop/Cxx/class/Inputs/module.modulemap makes the module compile okay, but this obviously wasn't necessary on Linux, so I'm a little lost as to how to proceed further on this.

Modules are still something of a black box to me and seem difficult to debug, so I don't know whether the test is wrong or I am holding something wrongly.