Pitch: The CStdlib module

I would be very happy to define CStdlib as importing (transitively) as many of those exact headers as the C preprocessor can find on the platform. It's important to note that at least some of those headers, despite being part of the standard, are not available everywhere (<uchar.h> is missing on Darwin, for example), hence the 'as many as possible'.

I think enumerating the differences would take some effort. But as examples: while the standard mandates how the declarations are organized into headers, it does not mandate which headers include which others. So modules can import other modules in different ways. Additionally, the same headers may provide extensions (e.g., the safe functions that Microsoft provides). Many of the annexes of the C standard are optional, as are many of the requirements of the standard itself.

Another interesting source of differences is the implementation itself. musl, as a matter of principle, defines stdin, stdout, and stderr as macros which, when expanded, cause failures if the names are used as identifiers. The standard maintains that once stdio.h is included, those macro names are reserved identifiers even though they are in the user's namespace.

As to more concrete differences, there are language standards, and libraries can partially support a standard. On Windows, VLAs are not supported (outside of x86 for historical reasons). Deprecated functions (e.g. gets) may also be removed more aggressively. I know that aligned_alloc from the C standard is not available on Windows, but there is an extension to support the aligned allocation behaviour. Defect reports (DRs) can cause partial implementation of interfaces as well.

However, I would agree that having CStdlib only include the submodules from the C standard is a good goal to aim for, though I am concerned that it may accidentally import something else due to dependencies in the C library headers.

I think that gives us a way out: by defining it in terms of importing specific headers from the platform, we can give a pretty precise idea of the semantics of this module rather than nebulously referring to a standard, which justifies the varying interface.

Two problems I foresee:

  • Someone uses API surface in a leaf module that is imported by CStdlib only in one or a few OSes. Any competent multiplatform CI for the leaf module will catch that.

  • Someone uses API surface that is missing on only some OSes. Is it fair to ask people to use canImport(…) or os(…) (which one?) for this?

I don’t have ready examples of these, though.
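For what it's worth, the aligned-allocation difference mentioned earlier in the thread could serve as an illustration of the second case. The following is only a sketch: allocateAligned and freeAligned are hypothetical names, and the import block assumes today's module names (Darwin, Glibc, Musl, CRT) rather than a future CStdlib. It uses os(…) because the difference is in the API surface of the platform, not in which module can be imported.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#elseif canImport(Musl)
import Musl
#elseif os(Windows)
import CRT
#endif

/// Hypothetical helper: allocates `size` bytes aligned to `alignment`.
/// `alignment` is assumed to be a power of two and a multiple of the pointer size.
func allocateAligned(size: Int, alignment: Int) -> UnsafeMutableRawPointer? {
#if os(Windows)
    // Windows has no aligned_alloc/posix_memalign; the CRT extension is used instead.
    return _aligned_malloc(size, alignment)
#else
    var pointer: UnsafeMutableRawPointer? = nil
    return posix_memalign(&pointer, alignment, size) == 0 ? pointer : nil
#endif
}

/// Hypothetical helper: releases memory obtained from `allocateAligned`.
func freeAligned(_ pointer: UnsafeMutableRawPointer?) {
#if os(Windows)
    // Memory from _aligned_malloc must be released with _aligned_free.
    _aligned_free(pointer)
#else
    free(pointer)
#endif
}
```

A leaf module that called posix_memalign unconditionally would build on Linux and macOS and fail only on Windows, which is the situation the second bullet describes.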


I took a look at the system modules (Darwin; WinSDK, visualc, and ucrt; Glibc), and indeed there is a lot of chaos, and fixing it will unavoidably be breaking...

Clear directions (from my perspective):

  • Group Glibc headers into modules like Darwin and WinSDK already do;
  • Replace ucrt and visualc with WinSDK.C. Non-standard parts fall into WinSDK.

Future directions:

  • Given that Bionic only governs Android, maybe we should directly use NDK for the main namespace (e.g. NDK.C and NDK.POSIX);
  • Add an #if platform() check for the dominant ABI instead of #if canImport(). The possible names are: BSD, Darwin, GNU, Musl, Android, MSVC, WASI (and Haiku?). This is mainly for OSes like Windows and Linux, which have variants with other ABIs. That's what I came up with to solve the Glibc/Musl imports and the checks for Apple platforms.

I would specifically reach out to @compnerd, for your opinion on restructuring how we import system libraries on Windows.

To make a unified WinSDK possible, we need to maintain symbolic links into %UniversalCRTSdkDir%\Include\%UCRTVersion% and %VCToolsInstallDir%\include, and place module.modulemap alongside them. I believe that's natural and reasonable for a pluggable Windows.sdk, and is possible to achieve.

My two bits: before we ship an official CStdlib module, I think we'd need to audit and Swiftify the APIs it contains.


I agree on figuring out what surface we are exposing. I disagree on ‘swiftifying’: this is the scope and domain of System and marginally of Foundation, and it is better left to those two libraries. (This is, in part, for ease of implementation, specifically in Foundation.)

I don't mean a full revamp as if libc were a native Swift module. I do think that certain C functions and interfaces are very "C" and there are sharp edges we can improve. For example, bsearch() could be overloaded in Swift such that you can pass it a Swift function/closure instead of needing to pass a non-capturing @convention(c) function and smuggle any context through a pointer.
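To make the bsearch() idea concrete, here is a rough sketch of what such an overlay could look like. Everything named here (binarySearch, ErasedComparator, trampoline) is hypothetical and not an existing API. It relies on the fact that bsearch() passes its key argument through unchanged as the first argument of the comparator, so the Swift closure can ride along in that slot. The imports assume a POSIX-like platform with today's Darwin or Glibc modules.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Type-erased comparator: receives a pointer to one array element.
private typealias ErasedComparator = (UnsafeRawPointer) -> Int32

// Non-capturing function handed to C. The "key" slot carries a pointer to
// the Swift closure rather than a real search key.
private func trampoline(_ key: UnsafeRawPointer?, _ element: UnsafeRawPointer?) -> Int32 {
    let comparator = key!.assumingMemoryBound(to: ErasedComparator.self).pointee
    return comparator(element!)
}

/// Hypothetical closure-based wrapper around bsearch().
/// `compare` returns a negative value if the target sorts before the element,
/// zero on a match, and a positive value if it sorts after.
func binarySearch<Element>(_ sorted: [Element], _ compare: (Element) -> Int32) -> Int? {
    withoutActuallyEscaping(compare) { compare -> Int? in
        let erased: ErasedComparator = { pointer in
            compare(pointer.assumingMemoryBound(to: Element.self).pointee)
        }
        return sorted.withUnsafeBufferPointer { buffer -> Int? in
            guard let base = buffer.baseAddress else { return nil }
            return withUnsafePointer(to: erased) { context -> Int? in
                let stride = MemoryLayout<Element>.stride
                guard let hit = bsearch(context, base, buffer.count, stride, trampoline) else {
                    return nil
                }
                // Convert the returned element address back into an array index.
                return UnsafeRawPointer(base).distance(to: UnsafeRawPointer(hit)) / stride
            }
        }
    }
}
```

Usage would then be something like binarySearch([1, 3, 5, 7, 9]) { Int32(7 - $0) }, with no @convention(c) function visible at the call site.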

Or, C has an errno_t type that is a typedef in C but could be made a proper struct in Swift and which could conform to Error, allowing throwing errno values without needing to import Foundation (to get POSIXError). Or maybe we could migrate POSIXError to this module and map errno_t directly to it.
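As a minimal sketch of the struct idea, assuming a POSIX-like platform: the Errno type and the changeDirectory function below are hypothetical illustrations, not existing declarations in any shipping module.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical wrapper around an errno value that can be thrown directly.
struct Errno: Error, Equatable, CustomStringConvertible {
    let rawValue: Int32

    /// Captures the calling thread's current errno.
    static var current: Errno { Errno(rawValue: errno) }

    /// Human-readable message obtained from strerror().
    var description: String { String(cString: strerror(rawValue)) }
}

/// Hypothetical throwing wrapper: chdir() returns non-zero on failure and sets errno.
func changeDirectory(to path: String) throws {
    if chdir(path) != 0 {
        throw Errno.current
    }
}
```

Nothing here needs Foundation; the only dependency is whichever module provides errno, strerror, and chdir.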

If we had all the time in the world…

  • I'd be interested in making the calling convention for C functions more consistent: some functions use errno to report errors, others use their return values. Some return 0 on success, non-zero on failure. Others return specific magic values. It'd be worth exploring if we could expose these functions consistently (and maybe have errorful functions become throwing.)
  • There's value in replacing some of C's magic return values with Optional or thrown errors. For example, read() returns an ssize_t with -1 indicating an error; it could instead be imported as func read(...) throws -> Int.
  • I'd want to think about C strings in API—can we import functions that take C strings as if they take Swift strings? But that's a fairly large surface area and worth its own discussion.
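To give the first two bullets some shape, here is a small sketch of one way to funnel the "-1 means failure, consult errno" convention into throws. It uses POSIX read(), which returns ssize_t with -1 on error. LibcError, throwingOnMinusOne, and readBytes are hypothetical names, and the sketch assumes a POSIX-like platform.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical error type carrying the errno value observed after a failed call.
struct LibcError: Error, Equatable {
    let code: Int32
}

/// Runs a call that reports failure by returning -1 and setting errno,
/// and turns that failure into a thrown error.
func throwingOnMinusOne<Value: SignedInteger>(_ call: () -> Value) throws -> Value {
    let result = call()
    if result == -1 {
        throw LibcError(code: errno)
    }
    return result
}

/// Hypothetical throwing wrapper: returns the number of bytes read.
func readBytes(from descriptor: Int32, into buffer: UnsafeMutableRawBufferPointer) throws -> Int {
    try throwingOnMinusOne { read(descriptor, buffer.baseAddress, buffer.count) }
}
```

Functions that signal failure differently (a non-zero return, a NULL pointer, a specific magic value) would each need their own adapter, which is part of why this is a large surface to audit.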

Some of what you propose already occurs — for example, we import (or define) the errno error type as POSIXError, and it does in fact conform to Error.

I think your requests are basically re-describing the mission statement for System, though? I really don’t want to propose either duplicating their work or requiring this library to be anything but a smart way for lower-level modules to obtain access to platform API, so that what you describe can be implemented elsewhere.


Fair enough. As I said, it's my two bits. 🙂


I want to confirm the goal of this feature. As far as I understand, the goal is to provide a standard way to import target-system-specific libc declarations, not to provide a cross-platform abstraction layer, right?

Assuming the above, I agree that we need this sort of shim module for interacting with low-level APIs. For those without knowledge of the various platforms, it has not been clear which module is the libc-like module on each platform.

I think we can treat importing CStdlib as the equivalent of #include, and delegate the responsibility for portability to the import site and to libc itself, as C does. So it feels acceptable to me to provide different APIs on different platforms.

For the sub-modularization of CStdlib, I think we don't need our own grouping rule, because we are not aiming to provide an abstraction layer. We can just use the header names as submodule names.

FWIW there was a similar discussion in the Rust community about the libc crate. https://github.com/rust-lang/rfcs/blob/master/text/1291-promote-libc.md

You’re right. We’re not talking about any abstraction layer here. CStdlib is just a shortcut to import headers from the C standard library. That is, import CStdlib.stdio should work like #include <stdio.h>.

However, there are obviously mixed goals across this long-running discussion thread. The seemingly simple goal of CStdlib is blocked by plenty of others. I would propose a possible step-by-step path to demonstrate this.

Standardize and modularize Glibc

Currently, Glibc is imported through SwiftGlibc.h, which only provides limited access to Glibc functionality. It also means we cannot modularize Glibc as we did for the system libraries on Darwin and Windows. SwiftGlibc is intended to be replaced by a standardized Glibc module map, but the work has been delayed for years.

Re-organize system libraries to a Darwin-like module

We have been special-casing Glibc and making this name an umbrella for any libc except those on Darwin and Windows. However, this is not accurate. Different libcs provide different sets of functionality, and we should allow consuming all possible APIs from the system library.

Also, the way we load Glibc today is not standardized, while the Windows practice has already provided a standardized way to do this. On how to organize the libc module (maybe better expressed as a system SDK module), I suggest referring to Darwin.modulemap since that’s really neat.

Re-export <SystemSDK>.C as CStdlib

After these two preparatory steps, we can finally provide CStdlib on supported platforms. The CStdlib module should be exactly the same as the C submodule from the system SDK. However, clang doesn’t allow submodules to be re-exported. If this restriction cannot be removed, we can implement the convention inside the Swift compiler, before it reaches for clang to resolve the module.

I want to especially point out that this step is vital yet very breaking. It will eventually change the system SDK module name for nearly every target that’s not using Glibc. To be frank, this is the major concern that is driving me to push and land the work before Swift 6.0.


I think it's time we got this over the finish line, given that the boilerplate libc import block keeps getting larger. I would like to split off Bionic soon, but I don't want to add a similar import Bionic at the top of every file, preferring to consolidate everything into import CStdlib first.
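For anyone who hasn't had to write it, the boilerplate in question looks roughly like the sketch below. The exact set of branches varies by project, and the module names are the ones I'd assume for current toolchains. The commented-out import CStdlib line is hypothetical, since that module does not exist yet.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#elseif canImport(Musl)
import Musl
#elseif canImport(WASILibc)
import WASILibc
#elseif os(Windows)
import CRT
#endif
// A split-off Bionic module would add yet another branch to the chain above.

// With the pitch, the whole chain would collapse to one line (hypothetical):
// import CStdlib

// Any libc call then works the same way regardless of which branch was taken.
let length = strlen("libc")
```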

What do we have to do to get this finalized?
