Libc portability, _GNU_SOURCE, platform checks... oh my

Hi everyone,

I've been working out the proper way to handle portable Swift code,
specifically around libc, as part of adding support for Fuchsia to Swift.

Some of this has been discussed on this list in the past but I wanted to
start a new thread to discuss some ideas on ways to fix some of it.

The issues I've noticed in the setup today:

* The Glibc module name is not ideal for Windows, BSD, Fuchsia, Android,
PS4, etc., where it's being used today. It's also not ideal on Linux where
people use musl (like Alpine Linux), uClibc (on embedded systems), or
dietlibc.

* The `Darwin` and `Glibc` modules provide (generally speaking) the same
APIs but force you to do a conditional compilation check to decide which
one to import.

* There is a very common `#if os(Linux)` check people do when importing
Darwin vs Glibc that is not inclusive of other OSes, even when the code
would likely work on other platforms. (Like swiftpm does:
https://github.com/apple/swift-package-manager/blob/master/Sources/libc/libc.swift)
This is partially fixed thanks to canImport, but you still have to manually
import one of two (or possibly more) libc modules depending on this check.

* Commonly used libc functions in glibc and musl are inaccessible in
Swift because they are hidden behind feature test macros that we don't
define when generating the Glibc module. See
https://linux.die.net/man/7/feature_test_macros

* Different libcs may expose the same types and functions in different
headers. This generally isn't an issue if you are including just the root
module for Glibc but can be brittle if you are including something like
Glibc.C.stdlib to get a type that might be defined in Glibc.C.string
instead on another system for example.

* We currently have no easy way to test at compile time whether the libc
you are building against actually contains the function you want to call.
This might be partially solved with canImport(), but that can be brittle
because of the previous issue as well.

* Because of limitations in the clang importer, glibc.modulemap.gyp bakes
in absolute paths on the build machine which means you can't use
relocatable sysroots (which means a toolchain for Android or Fuchsia can't
be packaged and distributed easily).

We could definitely spend some time cooking up the perfect solution and
then plan how to get there incrementally, or we could come up with
something a bit more pragmatic that smooths out most of the problems we
have today and leaves room for something better down the road.

I'm a fan of the latter approach myself, so in that spirit I have a few
high-level proposals to solve some of this.

*- Create a new "PlatformLibC" module -*

Swift should provide a common "PlatformLibC" module as part of the stdlib
to use instead of Darwin or Glibc.

Internally this module would just re-export Darwin, Glibc, and any other
libc module maps that are created (like for Windows or Fuchsia) under a
single module namespace.

Another idea is having an umbrella header of sorts, referenced in a module
map for this PlatformLibC module, that just includes headers from the
platform manually (potentially with feature defines like _GNU_SOURCE on
Linux). I'm not sure this solution would work side by side with the Glibc
and Darwin modules, however, since I've noticed the clang importer gets
angry when two modules specify imports for the same header path. (Someone
who knows the clang importer better can probably explain this.)

After it's implemented, we would encourage and drive devs to import this
module over the Darwin or Glibc modules directly so we can move away from
platform checks for libc access.

This module would be unprincipled and would be allowed to expose more
symbols and functions than may be strictly necessary and devs could use
`canImport` on it for some very basic compile time feature testing.

*- Defining _GNU_SOURCE for the Glibc module -*

Most of the more interesting libc types and functions added since C99
and in later POSIX standards are hidden behind feature test macros in
glibc. A number of people have hacked around this issue, including in
swiftpm and externally (like this crazy hack:
https://github.com/dunkelstern/UnchainedGlibc).

This was something that was discussed on this list in 2016 but nothing came
of it.
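
To make the problem concrete, here is a minimal C sketch of what the
feature test macro gates. The helper name make_cloexec_pipe is made up for
illustration, and the non-Linux fallback branch is my assumption about what
a portable caller would do; neither comes from the Swift repo. pipe2() is
only declared by glibc when _GNU_SOURCE is defined before the first libc
header is included, which is exactly what the Glibc module does not do
today.

```c
/* Must be defined before any libc header is included; without it,
 * glibc does not declare pipe2(). */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: creates a pipe with close-on-exec set on both ends.
 * Returns 0 on success, -1 on failure. */
static int make_cloexec_pipe(int fds[2]) {
#if defined(__linux__)
    /* GNU/Linux extension, hidden unless _GNU_SOURCE is defined. */
    return pipe2(fds, O_CLOEXEC);
#else
    /* Assumed fallback for platforms without pipe2(): plain POSIX calls. */
    if (pipe(fds) != 0) {
        return -1;
    }
    if (fcntl(fds[0], F_SETFD, FD_CLOEXEC) == -1 ||
        fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
#endif
}
```

Because the Swift importer sees the headers the same way a C compiler
without _GNU_SOURCE does, pipe2 in this sketch simply would not exist from
Swift's point of view.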

I went ahead and implemented this in
https://github.com/apple/swift/pull/13105 but later changed it to be
Fuchsia specific when we learned that doing this may cause some source
breakage for Linux users.

After investigating, the only source-breaking change this causes is
with the strerror_r() function, which changes its signature when
_GNU_SOURCE is defined (it returns a char* instead of an int). I'm not
sure how widely this function is used in Swift code directly today (I
know that swiftpm uses it, which is what broke when we tried this change on
Linux).
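
As a rough illustration of that signature flip, the following C11 sketch
asks the compiler which strerror_r() variant the headers expose. The
function name strerror_r_returns_int is made up for this example. It
assumes a C11 compiler (for _Generic) and that the libc declares
strerror_r with either an int or a char* return type.

```c
/* Defining _GNU_SOURCE selects the GNU variant on glibc. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <string.h>

/* Hypothetical probe: returns 1 if strerror_r() is the XSI/POSIX variant
 * (returns int), 0 if it is the GNU variant (returns char *).
 * The controlling expression of _Generic is not evaluated, so
 * strerror_r() is never actually called here. */
static int strerror_r_returns_int(void) {
    char buf[64];
    return _Generic(strerror_r(0, buf, sizeof buf),
                    int: 1,
                    char *: 0);
}
```

On glibc with _GNU_SOURCE defined I would expect this to pick the char*
branch, while on libcs that only ship the POSIX variant it picks int. Any
Swift code calling strerror_r and using the result as an Int32 hits the
same difference.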

If this change is likely to break too many people, one partial fix is to
reduce the feature define we set when compiling the Glibc module down to
something that exposes more libc functions but doesn't change this one
function's signature. Something like _XOPEN_SOURCE or _BSD_SOURCE would
work. We would still miss out on a few dozen useful GNU-specific libc
functions like dladdr, dup3, and pipe2, so it doesn't really solve the
problem fully.

Alternatively (and I'm not even sure this will work because it's pretty
crazy), we could remap the strerror_r function, as the clang importer
sees it, to an obfuscated name with an API note. We would then implement a
portable wrapper in C that just calls the POSIX-defined version of this
function, and expose that wrapper in the Glibc module manually with the
method signature it has today (like how we handle sem_open).
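
A minimal sketch of what such a wrapper could look like in C follows. The
name portable_strerror_r is made up, and this version adapts the GNU
variant rather than calling the POSIX one directly, so treat it as one
possible shape and not what the stdlib shim would necessarily do. It
assumes the GNU variant is exposed exactly when __GLIBC__ is defined
together with _GNU_SOURCE; other libcs (bionic, for instance) may need
their own check.

```c
/* Mimic a Glibc module that is built with _GNU_SOURCE. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <string.h>

/* Hypothetical wrapper: always presents the POSIX-style contract
 * (returns 0 on success or an errno value on failure, message in buf),
 * regardless of which strerror_r() variant the headers expose. */
static int portable_strerror_r(int errnum, char *buf, size_t buflen) {
    if (buf == NULL || buflen == 0) {
        return ERANGE;
    }
#if defined(__GLIBC__)
    /* GNU variant: returns a char * that may point at buf or at a
     * static string, so copy it into buf when it is somewhere else. */
    char *msg = strerror_r(errnum, buf, buflen);
    if (msg != buf) {
        strncpy(buf, msg, buflen - 1);
        buf[buflen - 1] = '\0';
    }
    return 0;
#else
    /* POSIX variant: already has the int-returning contract. */
    return strerror_r(errnum, buf, buflen);
#endif
}
```

Swift callers would then see a single int-returning function no matter
which feature macros the module was built with.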

*- JIT the Glibc (and/or PlatformLibC) module on the compiler host against
the target's sysroot headers instead of building it AOT as part of the
stdlib build -*

This is a bit more complex of an idea to implement but this would more or
less make the Glibc module as flexible as the sdk/platform provided with
the Darwin module today.

Instead of compiling the Glibc module (and some of the stdlib/platform
folder) with the toolchain, we move it out so that it's compiled and cached
on the host machine and rebuilt for each target -sdk passed in to swiftc.
That way the Glibc module will always match the libc headers you are
currently targeting, and will not be based on whatever the libc headers
looked like on the machine that built the toolchain.

For example, if I get a prebuilt Swift toolchain for Linux from somewhere
and point it at an old RedHat sysroot with a different and possibly
customized libc, the Glibc module that is compiled and cached for me right
then will match that target's libc.

Do these proposals sound reasonable? I could see the first and third ideas
needing a formal evolution proposal, but I'm less sure about the second
one.

Thanks!

- Zac Bowling
zbowling@google.com
