Using the C library from Swift: thoughts

There has been some discussion in the comments of this PR for adding FreeBSD support about the fact that it is presently abusing the Glibc module (which really should just be for systems that use Glibc, of which FreeBSD is not one).

During this discussion, the idea was again raised of having some kind of unified C library module so that users don't have to do

#if os(Linux)
import Glibc
#elseif os(macOS) || os(iOS) || ...
import Darwin
#elseif os(Windows)
import CRT
#else
#error("We need a C library!")
#endif

or similar in any file that needs the C library.

There are already two pitches on this topic. I don't want to hijack the FreeBSD PR, and I don't want this discussion to be a pitch either, because I think that would imply that someone has time to work on it soon, which is not a given. I do, however, want to talk about the current state of affairs and what we should do about this problem, hence this thread.

I think the people pitching the idea of having some portable C standard library module are right to argue for that. Right now, even after doing the dance above, it's still hard to use C library functions because of types being imported differently — e.g. FILE * can be imported as OpaquePointer, or as UnsafeMutablePointer<FILE>, or indeed as UnsafeMutablePointer<FILE>?, and exactly which you get depends on precisely how the C library in question defined the type. This makes it difficult to portably use C library functions that, on the C side, are perfectly portable.
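To make that concrete, here is a minimal sketch (the name writeGreeting is made up for illustration, and the import list is abbreviated). As long as the FILE * only lives in a local whose type is inferred, the same source should compile whichever way the pointer is imported. The trouble starts the moment the type has to be spelled out, for example in a stored property or a function signature.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#elseif canImport(Musl)
import Musl
#endif

// Hypothetical helper. The type of `file` is never written down, so it does
// not matter whether fopen() is imported as returning an optional
// OpaquePointer or an optional UnsafeMutablePointer<FILE>.
func writeGreeting(to path: String) -> Bool {
    guard let file = fopen(path, "w") else { return false }
    defer { fclose(file) }
    // fputs() returns a negative value (EOF) on failure.
    return fputs("hello\n", file) >= 0
}
```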

The reason for this thread is that although I like the idea of having a portable C standard library module (or modules, actually), it is sadly a lot harder to do that than it looks.

In particular, the way Clang modules work, each C type is defined by exactly one module. Thus if the C library headers declare FILE, then the module in the C library that defines FILE must be the only source of a type called FILE. If another FILE type is declared elsewhere and some code tries to use it, that will result in compiler errors (e.g. complaining that FILE was expected to be in Darwin.stdio_h but was found instead in Libc or vice-versa). We could in principle define a Swift interface to the C library that contains Swift definitions of all of the relevant functions and types — but then we have another issue, namely what you do if you are trying to interact with some third-party C code that hands you e.g. a FILE *, which will be whatever the underlying C library defined it as and not necessarily whatever the handwritten Swift bindings have settled upon.

There is a related problem, namely that the Glibc headers in particular are a long way from being properly modularizable, with the result that the Glibc module is actually quite broken, and this is hard to fix. For the Static SDK for Linux, we actually process the Musl headers slightly in order to make them modularisation-friendly, and it seems not unlikely that we will need to adopt a similar approach for Glibc, though that is also tricky because with the Static SDK we actually ship the headers, whereas for Glibc we need to start with whatever is on the system we're building on. (There is precedent for this approach — for years GCC did precisely this when installing on various proprietary UNIX platforms, running a header-fixing script and generating its own copy of the system headers that took precedence in the include path over the system installed ones.)

I would very much like, personally, to have the ability to do things like

import C99 // Or C11, or C23

to get a standardised interface for the functions and types declared in the relevant C standard. I would also like to see something like

import CExtensions

that gets you the set of common C extensions that are available basically everywhere (e.g. strdup, strcasecmp), but with any naming differences erased (so, e.g. strcasecmp would work on Windows, even though the function there is called _stricmp); there is quite a sizeable set of these that work essentially identically across the various UNIX platforms and indeed also on DOS/Windows.
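A rough sketch of what erasing one such naming difference could look like. The function name caseInsensitiveCompare is invented for this example and is not a proposed API; the import list is abbreviated, and I'm assuming strcasecmp is visible through each of the non-Windows modules listed.

```swift
#if os(Windows)
import CRT
#elseif canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#elseif canImport(Musl)
import Musl
#endif

// Hypothetical unified spelling: callers never see which underlying
// function name the platform happens to use.
func caseInsensitiveCompare(_ lhs: String, _ rhs: String) -> CInt {
    #if os(Windows)
    return _stricmp(lhs, rhs)
    #else
    return strcasecmp(lhs, rhs)
    #endif
}
```

The #if still exists, but it lives in exactly one place instead of at every call site.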

A similar strategy could be used for e.g. the POSIX API, though that one is rather larger and might need to be broken up into submodules for performance reasons.

But we need to think carefully about how we would go about implementing this. It is, sadly, very much not as simple as slapping together some new headers and writing a quick module map.

Addendum: There was a suggestion in the PR comments that this is something the Platform Steering Group should take on. PSG is clearly interested in it, but does not, on its own, work directly on things like this — for that, we would need to start a Working Group which would consist of various interested parties, in this case very likely including some PSG members but also clearly members of the wider community.

With a decade of hindsight, I wish we'd not brought up the Darwin or Glibc modules in the first place. They conflate C standard APIs with platform-specific ones, but even worse they do so in a way that's not configurable from Swift. For instance, if _GNU_SOURCE isn't defined before you include a header from glibc, you don't get access to all its interfaces. And SwiftGlibc.h (?) doesn't define _GNU_SOURCE, so in order to access something like pthread_setname_np(), you need to either resort to dynamic loading with dlsym() or to adding a C module to your package that provides wrapper functions. (Swift Testing has both. Yay.)
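For anyone who hasn't had to do it, the dlsym() route looks roughly like the sketch below. This is not Swift Testing's actual code. It assumes the two-argument Linux signature of pthread_setname_np(), and the helper name setCurrentThreadName is made up. On platforms without Glibc the sketch just reports failure.

```swift
#if canImport(Glibc)
import Glibc

// Assumed C signature on Linux:
//   int pthread_setname_np(pthread_t thread, const char *name);
typealias SetNameFunction = @convention(c) (pthread_t, UnsafePointer<CChar>) -> CInt

// Hypothetical helper: look the symbol up at run time instead of relying on
// the declaration being visible through the Glibc module.
func setCurrentThreadName(_ name: String) -> Bool {
    // dlopen(nil, ...) returns a handle for the main program and the
    // libraries already loaded with it.
    guard let handle = dlopen(nil, RTLD_LAZY) else { return false }
    defer { dlclose(handle) }
    guard let symbol = dlsym(handle, "pthread_setname_np") else { return false }
    let function = unsafeBitCast(symbol, to: SetNameFunction.self)
    return name.withCString { function(pthread_self(), $0) == 0 }
}
#else
// Placeholder so the sketch compiles elsewhere; other platforms spell this
// API differently or expose it directly.
func setCurrentThreadName(_ name: String) -> Bool { false }
#endif
```

Note that Linux limits thread names to 16 bytes including the terminator, so longer names are rejected.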

I don't feel strongly about distinct module names for different C versions, although an alternative would be to use the existing availability attribute and #if conditions, like we do with compiler() and swift().

I don't pretend to have all (or any!) of the answers to the harder problems in this area. But this is a general problem that will continue to grow as we add more platform support, so the sooner we start solving it, the better.

You seem to misunderstand what I was suggesting. Both pitches are very simple and clear: they just ask for a single name to be used to import the C stdlib for your platform, whether import CStdlib or import stdlib. That allows people invoking libc APIs to not even know the name of the various libc overlays when starting off, and not have to ever write or extend that giant import wart you copy-pasted in a bunch of their Swift files that have multi-platform C code.

This is an extremely simple solution to a simple problem. It doesn't try to solve a bunch of pie-in-the-sky wishes like a standard C99 or POSIX module, which in my opinion is never going to happen.

Yet, for whatever reason, that simple pitch has not moved forward after so many years. That is what the PSG could easily move forward, if it chose to, as it requires almost no effort to implement: just slap a file with some imports in the Swift stdlib, write up some doc on how to use it, and add a new platform overlay to it every year or two.
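As a sketch of how small that file could be: the module name CStdlib comes from the pitches, the list of underlying modules is taken from the ones mentioned in this thread and is surely incomplete, and @_exported is an underscored, unofficial attribute, so treat all of this as an assumption about how it might be done.

```swift
// Hypothetical CStdlib.swift, the only source file of a module named CStdlib.
// @_exported re-exports the imported module to anyone who imports this one.
#if canImport(Darwin)
@_exported import Darwin
#elseif canImport(Glibc)
@_exported import Glibc
#elseif canImport(Musl)
@_exported import Musl
#elseif canImport(Android)
@_exported import Android
#elseif os(Windows)
@_exported import CRT
#else
#error("No known C library module for this platform")
#endif
```

Client code would then write import CStdlib and nothing else. This only shortens the import; it does nothing about declarations being imported with different types on different platforms.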

All the more complicated directions you lay out are fine ideas, but I don't see anyone who is willing to invest in that yet.

Except that's not really what people are asking for. They're asking to be able to

import CStdlib

and then actually write code that depends on the C library without a bunch of

#if os(Linux)
...
#elseif os(macOS) || os(...)
...
#elseif os(Windows)
...
#endif

junk in the rest of their files too. Just removing the extra imports has very little value if you still have to conditionalise everything because of types being imported differently on different platforms.

Agreed, that's totally an option too. What I liked about

import C99

et al is that it's entirely clear what should or should not be in that module. Availability does have some advantages though — we can weak link, and we don't have to solve the issue of what happens if a program has a mix of import C99 and import C11.

I disagree. Both pitches are very clear: they only remove the import wart.

They do not help at all with implementation code like this:

#if canImport(Darwin)
var w: off_t = off_t(count)
let result: CInt = Darwin.sendfile(fd, descriptor, offset, &w, nil, 0)
written = w
return ssize_t(result)
#elseif os(Linux) || os(FreeBSD) || os(Android)
var off: off_t = offset
#if canImport(Glibc)
let result: ssize_t = Glibc.sendfile(descriptor, fd, &off, count)
#elseif canImport(Musl)
let result: ssize_t = Musl.sendfile(descriptor, fd, &off, count)
#elseif canImport(Android)
let result: ssize_t = Android.sendfile(descriptor, fd, &off, count)
#endif

You may think that fixing the import wart alone is too small a problem to fix like this. I, and I think many others, disagree.

Of course, there was a lot of subsequent talk in those pitch threads about that larger problem of implementation, ie actually calling all the platform-specific C APIs, but realistically who is going to do that work? I don't see it happening.

Now, devs invoking import CStdlib may assume we have done that work, so we should have explicit doc that disabuses them of that notion.

Perhaps it's worth noting that there are two use-cases here: the Swift stdlib being implemented on top of libc, and ordinary developers writing Swift who want to call into libc. For the former, we needn't care too deeply about how the Swift code calls into libc on the back end if we don't actually vend that module widely. For the latter, we could write libraries that are disconnected from the Swift stdlib that provide the clean API surface that is being discussed, and consider migration down the line.

This doesn't actually solve the "who will actually do that API resurfacing work" question, however.

Your example is sendfile(), which is a non-standard platform-specific API, and so it was always going to require platform tests.

I'm not talking about platform-specific APIs. I'm talking about the ones in the C standard, where we know what they're supposed to look like, what types their arguments have and what their return types are, because the standard tells us, but unfortunately because of the way Swift imports types we end up with different types on the Swift side with different C library implementations.

Indeed, I'd say that platform-specific APIs being in separate platform-specific modules (e.g. Darwin or Glibc) is absolutely fine and exactly how things should be. There's no good reason whatsoever that you should find e.g. asl_log(), sendfile() or malloc_size() in CStdlib (or whatever we call it). In fact, those absolutely shouldn't be there IMO.

Except that's not really what people are asking for. They're asking to be able to “import CStdLib” and then actually write code that depends on the C library without a bunch of

That is not at all what I want. I know platform differences will happen, and I want to deal with those, not with the Swift-specific import dance.

Even better if there were imports for individual headers, that way I could import the headers I want and deal with platform differences and extensions where applicable.

What I want is to not have to write Clang modules to import this and that standard header in almost every single one of my projects.

Basically, I want to write my C code in Swift, identical and as #if-ridden as it is, and use Swift's type system. I don't even need the standard library!

I take your point and perhaps I was a bit fuzzy in my language, but I was lumping the Swift typing issues you mention and the fact that strerror_r() has different versions in different libcs and all the base libc incompatibilities you're worried about as "platform-specific APIs." Perhaps that's too big a basket to throw them all into, but my point should've been clear: I'd like to fix the import wart alone first.

If you want to start a years-long project of a proper import C99 module too, go for it, but I see no reason not to fix the import wart first.

I agree with @al45tair on this point. As I understand his point, we have Swift-specific import differences at the level of individual APIs in the C standard which are not in any way related to platform-specific extensions. Put another way, there is not one "import wart" but many.

Merely aliasing how the import statement is spelled could actually make things worse: now what looks like the same import actually brings in totally different interfaces in ways that are not obvious in source.

It goes to a pet peeve of mine: the same things should look the same, and different things should look different (ideally, to the extent of and commensurate with the degree of their differences). Simply making different things appear the same isn't actually fixing anything, it's just hiding the problem if not creating another one.

For this reason, I would not want to see import C99 become utterable if it doesn't, at least to a very substantial degree (absolutes are always tricky and regrettable when it comes to these things), import a uniform interface on all platforms. The status quo where you acknowledge that you're really importing potentially very different interfaces is arguably better than writing one import statement that papers over those discrepancies without resolving them.

The specific API[1] that repeatedly kills me is FILE *. It's impossible to express correctly in pure Swift in a cross-platform way, but since Swift does not define much I/O API (nothing in the stdlib beyond print() and friends), you have to go up to Foundation or various packages. Which just pushes the problem of defining FILE * up a layer, and you may even end up with incompatible definitions if two libraries you use decide to wrap FILE * differently.

I'm not in any way saying fopen() is the ideal Swift API, but it should be easy to call it, stick its result in a class or move-only struct, and call fclose() on deinit. :melting_face:
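Something like the following sketch is what I have in mind. The names CFilePointer and CFile are made up, and the typealias encodes an assumption about which platforms import FILE as an opaque type (Android here); other C libraries may need their own branch, which is exactly the problem.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#elseif canImport(Android)
import Android
#endif

// Assumption: FILE is an incomplete type on Android and a complete struct
// on the other platforms covered here.
#if os(Android)
typealias CFilePointer = OpaquePointer
#else
typealias CFilePointer = UnsafeMutablePointer<FILE>
#endif

// Hypothetical wrapper that owns a FILE * and closes it when released.
final class CFile {
    private let pointer: CFilePointer

    init?(path: String, mode: String) {
        guard let pointer = fopen(path, mode) else { return nil }
        self.pointer = pointer
    }

    // Returns false if fputs() reports an error.
    @discardableResult
    func write(_ text: String) -> Bool {
        fputs(text, pointer) >= 0
    }

    deinit {
        fclose(pointer)
    }
}
```

Two libraries that each pick their own CFilePointer-style alias end up with wrappers that cannot be handed to one another without going back through the raw pointer.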


  1. Please don't read this reply as me complaining about the wonderful work the stdlib team has done. I just want to see us do even better.

Can we leverage APINotes in any way here to nudge the signatures to be correct for libc implementations that don't conform, so that we get uniform declarations synthesized by ClangImporter everywhere?

No, there is only one import wart: some variation of what he pasted in the OP. His point is that we should fix it in a different way that culminates in a working import C99, a project that could take a year or two, and he himself admits nobody is investing in implementing it.

I'm suggesting we get rid of the import wart for now with a crude solution, ie what has been pitched twice already, then we can switch to his more sophisticated solution later, if it ever exists.

Yes, but that is what the import wart does already. Merely aliasing it does not make the problem worse. Call it import platformC or some ridiculous name if you like, it's simply a way to stop that current import wart from spreading everywhere. :wink:

It's not doing any of those things: it's simply aliasing the import wart to a single import alias. The problem still clearly exists, as you will quickly find when writing the actual code calling those C APIs, as Alastair notes.

I suspect you, Steve, and Alastair are in the extreme minority in this opinion. Nothing has been papered over: we're merely saving ourselves from copy-pasting the import wart everywhere.

You may believe it appears that way to some: I think explicit doc should make clear it just aliases the import wart and that's it, with no help in implementation.

Is that import wart alias a small problem worth solving in its own right? I think so, and so did @millenomi and @timsneath and some others over the years.

Not at present, because API Notes are lacking a couple of annotations we'd need — I have asked for these, I might add, but we don't have them right now.

That would make it easier to get a uniform interface without modifying headers, though it doesn't solve all of the problems we have here.

One thing that I think helps clarify our thinking a bit here is to be clear what we're talking about when we say "the C library". The headers that make up the C standard library split roughly into a few categories:

Headers that don't really make sense in Swift

The API in these headers is either unusable in Swift, or is fighting against one or more aspects of Swift's semantic model, or is expressible in other ways in the language. There's basically no reason to use any of these in Swift ever.

header                                            notes
<assert.h>                                        Conditionally compiled macro that compares its argument to zero
<fenv.h> (since C99)                              Floating-point environment
<iso646.h> (since C95)                            Alternative operator spellings
<setjmp.h>                                        Nonlocal jumps
<signal.h>                                        Signal handling
<stdalign.h> (since C11) (deprecated in C23)      alignas and alignof convenience macros
<stdarg.h>                                        Variable arguments
<stddef.h>                                        Common macro definitions

Headers that cover features that already have Swift bindings in the stdlib

The API in these headers is already (partially) covered by existing stdlib functionality; it should be expanded to achieve parity where appropriate, but broadly speaking we have pretty good coverage for these API out-of-the-box already, and we should be adding missing API where necessary so that these headers are never needed.

header                                            notes
<ctype.h>                                         Functions to determine the type contained in character data
<errno.h>                                         Macros reporting error conditions
<float.h>                                         Limits of floating-point types
<limits.h>                                        Ranges of integer types
<stdatomic.h> (since C11)                         Atomic operations
<stdbit.h> (since C23)                            Macros to work with the byte and bit representations of types
<stdbool.h> (since C99) (deprecated in C23)       Macros for boolean type
<stdckdint.h> (since C23)                         Macros for performing checked integer arithmetic
<stdint.h> (since C99)                            Fixed-width integer types
<stdlib.h>                                        General utilities: memory management, program utilities, string conversions, random numbers, algorithms
<stdnoreturn.h> (since C11) (deprecated in C23)   noreturn convenience macro
<string.h>                                        String handling
<time.h>                                          Time/date utilities
<uchar.h> (since C11)                             UTF-16 and UTF-32 character utilities
<wchar.h> (since C95)                             Extended multibyte and wide character utilities
<wctype.h> (since C95)                            Functions to determine the type contained in wide character data
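A few illustrative pairings, as a sketch rather than a complete mapping. Each line names a C facility from one of the headers above in a comment and shows a standard library spelling that covers roughly the same ground, with no C library import at all.

```swift
// <limits.h>: INT_MAX
let intMax = Int32.max

// <float.h>: DBL_EPSILON
let epsilon = Double.ulpOfOne

// <ctype.h>: isdigit('7')
let sevenIsDigit = Character("7").isASCII && Character("7").isWholeNumber

// <stdlib.h>: strtol("ff", NULL, 16)
let parsed = Int("ff", radix: 16)

// <stdckdint.h>: ckd_add(&sum, INT_MAX, 1)
let (sum, overflowed) = Int32.max.addingReportingOverflow(1)

// <stdbit.h>: stdc_count_ones(0b1011u)
let ones = UInt32(0b1011).nonzeroBitCount
```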

Math library bindings

Swift bindings for these are available in the Numerics package, but we should consider moving bindings for <math.h>, <tgmath.h>, and <complex.h> into the standard library.

IO

Foundation has locale-aware formatted IO, and some of this stuff is in System already, but there's a real need for non-locale aware efficient IO formatting in the stdlib or a package, binding the <inttypes.h> and <stdio.h> functionality.

Manual Threading

Bindings for some of the <threads.h> API might plausibly go in the synchronization library.

That's "the C library".

As you may note, most of it is already directly available in Swift. So the bulk of what people are asking for when they talk about importing "the C library" is really platform "SDK" that happens to traditionally be laid out in /usr/include/*.h or analogous locations. Probably most of that should get Swift-native bindings of platform-native semantics (this is the goal for the Swift System package), but it's also useful to be able to talk about how to import it in Swift (especially how we could make it possible to import the specific bits that you need without getting all of it). However, it's important to be clear that all of it is fundamentally platform-specific, and tossing it into a single flat "C" module is going to cause as many problems as it fixes.

Totally agree with this. A lot of my thinking has been driven by frustration when trying to interact with things that e.g. use FILE *, same as @grynspan, though it isn't the only such problem. (And if you expand out to POSIX from just the C standard library, there are plenty more.)

About the only thing in your post where I have a slight worry is <signal.h>, not because I like it (or UNIX signals), but because I think we probably do want to have some kind of API for some of the things they get used for — Ctrl-C handling, window size change notifications and so on. But that could totally live in System or in some kind of terminal handling package and doesn't have to look particularly like signals. In fact, it'd be better if it didn't.

The other unhinged idea is to implement libc in Swift. :woman_shrugging:

(But then, @_cdecl is still not formalized...)

And you still have the problem of what to do when you get a FILE * or a malloc()ed block from some third-party C code you're using that is talking to the real platform C library.

Perhaps, but at least you can hide your #ifs on the back side of the API surface. You would be consistent on the front side, which makes the callsites better.
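As a small sketch of that idea, take strerror_r(), which was mentioned earlier in the thread as differing between libcs. The function name errorMessage(for:) is made up. The sketch assumes the XSI-style variant returning an int is what gets imported on the non-Windows platforms listed, and it uses strerror_s on Windows; libcs that expose the GNU variant would need another branch.

```swift
#if os(Windows)
import CRT
#elseif canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#elseif canImport(Musl)
import Musl
#endif

// Hypothetical facade: the #if lives here, once, and call sites stay uniform.
func errorMessage(for code: CInt) -> String {
    let capacity = 256
    var buffer = [CChar](repeating: 0, count: capacity)
    #if os(Windows)
    let status = strerror_s(&buffer, capacity, code)
    #else
    // Assumption: this resolves to the XSI variant, which returns 0 on success.
    let status = strerror_r(code, &buffer, capacity)
    #endif
    guard status == 0 else { return "Unknown error \(code)" }
    return buffer.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }
}
```

Callers just write errorMessage(for: errno) everywhere. It does not help with a FILE * handed over by third-party C code, though, which still arrives as whatever the platform C library defines.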