I'm not sure what to tell you other than that the Linux documentation itself does not list malloc() as safe: signal-safety(7) - Linux manual page
Perhaps I misinterpreted the RedHat article; I suppose it could be saying that the implementations of malloc and free need to work well enough to support getting to execve, without necessarily needing to be async-signal-safe themselves.
That's my interpretation, yeah.
At any rate, the point still stands that, even on Linux, there's very little you can do after calling fork() other than proceed to call execve() et al., and the original problem that brought us here isn't specific to Swift or to macOS.
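For concreteness, here's a minimal sketch of that sanctioned pattern in Swift. Everything the child needs is prepared before fork(), so that only async-signal-safe calls (execv, _exit) run in the child; this is meant to illustrate the POSIX rule, not to claim that running Swift in a forked child is otherwise fine.

import Darwin // on Linux: import Glibc

// Prepare argv entirely *before* fork(); strdup/malloc must not run in
// the child. (Even Swift's implicit bridging of String/Array arguments
// can allocate, which is part of why doing this from Swift at all is dicey.)
let path = strdup("/bin/echo")
let argv: [UnsafeMutablePointer<CChar>?] = [path, strdup("hello"), nil]

let pid = fork()
if pid == 0 {
    // Child: only async-signal-safe calls from here on.
    execv(path, argv)
    _exit(127) // reached only if execv fails
} else if pid > 0 {
    var status: Int32 = 0
    waitpid(pid, &status, 0) // parent reaps the child
}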
Indeed. For example, whether a C++ program can safely use operator new after a fork depends entirely on the standard library's implementation of ::operator new. That's a language feature whose behavior is imperiled by an OS syscall.
It turns out that Postgres can be compiled in an EXEC_BACKEND mode where it doesn't try to do meaningful work after fork(). It's mentioned here in a thread about refactoring where the child process creation actually happens.
If you are building Postgres for macOS yourself, have you tried building with EXEC_BACKEND enabled?
I will look into EXEC_BACKEND for macOS development, apparently out of necessity.
Recompiling Postgres is presumably better DX for our team than recompiling and maintaining a Swift fork that apparently wouldn’t work anyway.
That said, on Unix EXEC_BACKEND is provided for testing purposes and not considered standard. It does not appear to be well supported (looking through their repo, there are many gotchas mentioned).
All in all, it's hard for me to believe that there is something fundamentally problematic about this pattern when Postgres has been using it on Unix for decades. I would wager that effectively 100% of the Postgres requests any of us has ever been served were handled via fork without exec. What is being expressed in this thread simply doesn't add up to me in light of that lived reality.
We still haven't identified what PostgreSQL is doing on the forked thread in the first place, as the backtrace you've shared doesn't go back that far. Something in PostgreSQL is calling into Swift somehow, but we don't know what.
Effectively 100% of Postgres requests have also been served by a Linux host. If you’re not planning to deploy to macOS, container-based development seems like the way to go. Swift already differs dramatically between macOS and other platforms; on top of that, you’re now talking about building Postgres differently too.
Postgres does all of its work in forked worker processes.
Every Postgres request on Unix-like systems, including a simple SELECT * FROM table, runs in a forked process without exec. This appears to be its core concurrency model, like Apache's classic "one process per request".
In the stacktrace I posted, it's calling into my Swift Postgres extension, which is loaded as a dylib. I can post the full stacktrace but I believe it's just visual noise here.
Edit: I should note that I can (unreliably) work around these crashes by initializing my own os.Logger before the worker process that calls into the Swift Runtime is spun up. That's not reliable due to Postgres' internals.
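For reference, the shape of that workaround, assuming the warm-up lives in _PG_init (the hook Postgres calls in the parent when the shared library is loaded); the subsystem name is made up:

import os

// Hypothetical warm-up: touch os.Logger once in the parent, at library
// load time, so forked workers never trigger its initialization.
// As noted above, this is not reliable; it depends on Postgres loading
// the library in the parent before any workers are forked.
@_cdecl("_PG_init")
public func pgInit() {
    let logger = Logger(subsystem: "com.example.myextension", category: "init")
    logger.debug("extension loaded; os.Logger warmed up pre-fork")
}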
My stacktrace only differs from every other Postgres stacktrace because this particular request is calling the Swift function defined in my Postgres extension. Think SELECT mySwiftFunction(args…) FROM source_containing_args;. That Swift function is creating a union of two Sets of integers (which you can already see in the trace). That's it. This triggers the Swift Runtime to fetch type metadata (presumably for Int), which in turn initialises os.Logger internally. From my understanding, this is the first thing the Swift Runtime does on init on macOS but not on Linux.
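For anyone unfamiliar with the setup, here is a hypothetical sketch of that kind of entry point (the actual Postgres-facing glue, PG_FUNCTION_INFO_V1 and friends, lives in C and is omitted; the name and signature are illustrative):

// A C-callable Swift function that unions two sets of integers.
@_cdecl("my_swift_union_count")
public func mySwiftUnionCount(_ a: UnsafePointer<Int64>, _ aCount: Int32,
                              _ b: UnsafePointer<Int64>, _ bCount: Int32) -> Int64 {
    // Building these Sets forces the runtime to fetch type metadata; on
    // macOS that path is what ends up initializing os.Logger internally.
    let lhs = Set(UnsafeBufferPointer(start: a, count: Int(aCount)))
    let rhs = Set(UnsafeBufferPointer(start: b, count: Int(bCount)))
    return Int64(lhs.union(rhs).count)
}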
The last point is what this thread was supposed to be about. I don’t want the Swift Runtime to do that, because it’s causing a crash here.
We can argue that it shouldn't be doing that on macOS, sure. To be honest though, while I am curious about the OS fundamentals of "why" that is, I'm not that interested in this line of argument. Postgres does what it does, has been doing so for decades, and there is no issue on Linux (apparently due to Swift Runtime differences). I simply hoped there would be a way to prevent the Runtime from calling into os.Logger, which is unambiguously and reliably causing the segfaults I'm experiencing.
I understand where you’re coming from and I do feel like switching to Linux might just be the way to go here.
That said, my initial request in this thread was simply to reduce those Swift Runtime differences in one critical way: that the macOS Swift Runtime would no longer init os.Logger (which I don’t want or need).
It's Postgres doing the forking. I can only assume they are controlling the total state of the program (not sure exactly what that means though). There is unambiguously only one thread: the concurrency model of Postgres works by forking the process, not by starting additional threads.
This really has nothing to do with Swift as a systems language. It’s a matter of binaries not conforming to the platform’s documented conventions, and has come up in a variety of ways since long before the Swift language even existed. You cannot fork without exec on macOS if you interact with essentially any system API at all.
The classic paper "A fork() in the road" has some more detail on why this is pretty much always a bad design, separate from system-specific requirements. I would also note that macOS/iOS are not alone in this; Android, for example, also has a number of hard limitations on what you can do after fork(). This is pretty much the norm for mainstream non-Linux OSes.
@Geordie_J I know that you're just trying to get something to work and are caught in the middle here, but anything we do in Swift for this would be a very small bandaid trying to paper over a large problem in Postgres's implementation.
To restate what Steve said, it wouldn't be sufficient just to avoid initializing os.Logger. Your situation requires a guarantee that the Swift runtime will never call any async-signal-unsafe system API. That's just not feasible.
That said, the Linux ecosystem does seem willing to do the work to support these use cases. Without a similar commitment from the Swift runtime on Linux, however, a developer must conclude that Swift code must not be introduced into the address space of a program that calls fork() without quickly calling exec() under controlled conditions.
If this poses a problem for Swift on Linux adoption, one solution might be an extension of the @_noLocks feature to allow the programmer to statically assert that their code obeys the POSIX async-signal-safe rules. This would at least allow a programmer to do something like:
let fd: System.FileDescriptor
if os.fork() == 0 {
    @_asyncSafe {
        fd.close() // @_asyncSafe guarantees this won’t call into the runtime.
        os.exec(...)
    }
}
Is @_noLocks itself not quite enough for the purposes being discussed?
No, because the set of actions that are async-signal-safe is exceptionally narrow. Locks are a problem, but so are allocation, I/O, some syscalls, anything involving a Mach port/right, error handling… it has been said that the only safe thing to do in a signal handler is to assign a value to an integer, and even then there are constraints.
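To illustrate how narrow that rule is: the classic "safe" handler just stores to a sig_atomic_t flag and does nothing else. Note that even this is shaky in Swift, since accessing a Swift global can go through lazy-initialization machinery, which rather proves the point.

import Darwin

// All real work happens back in normal control flow; the handler only
// assigns to an integer flag.
var interrupted: sig_atomic_t = 0

signal(SIGINT, { _ in
    interrupted = 1 // assign to an integer; nothing else is safe here
})

while interrupted == 0 {
    pause() // sleep until a signal arrives
}
print("got SIGINT; cleaning up outside the handler")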
That Postgres has any success calling fork() this often is honestly surprising, but carefully-written C code in Postgres itself is probably well-understood by now. The idea that you could safely call a callback of unspecified provenance from within the child process, though? Absolutely not.
This was a great read, thank you @scanon.
I understand now why fork is not a great design choice in modern (or maybe any) code and that great lengths need to be taken to prevent issues with it. Postgres appears to go to those lengths.
The skeptics in this thread may be surprised to know that Postgres has a broad extension ecosystem that allows devs such as myself to tie their arbitrary code into creating custom database types, functions, aggregations, and more. Those arbitrary extensions work with locks, shared memory, malloc, and many of the other things touted as impossible here.
While more complicated extension use-cases sometimes reach for their own long-running worker processes and communicate with them via RPC from the forked Postgres process (maybe to avoid some of the issues mentioned in this thread), this is all well established and considered robust and unproblematic.
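For illustration, a hypothetical sketch of that worker-RPC shape from the forked backend's side, using only calls POSIX lists as async-signal-safe (socket(), connect(), send()); the socket path and framing here are invented:

import Darwin

// Send one request to a pre-spawned, long-running worker listening on a
// Unix-domain socket; the reply path is omitted for brevity.
func sendToWorker(_ payload: [UInt8]) -> Bool {
    let fd = socket(AF_UNIX, SOCK_STREAM, 0)
    guard fd >= 0 else { return false }
    defer { close(fd) }

    var addr = sockaddr_un()
    addr.sun_family = sa_family_t(AF_UNIX)
    let path = "/tmp/my_extension_worker.sock"
    withUnsafeMutableBytes(of: &addr.sun_path) { dst in
        path.utf8CString.withUnsafeBytes { src in
            dst.copyBytes(from: src.prefix(dst.count - 1))
        }
    }
    let rc = withUnsafePointer(to: &addr) { ptr in
        ptr.withMemoryRebound(to: sockaddr.self, capacity: 1) {
            connect(fd, $0, socklen_t(MemoryLayout<sockaddr_un>.size))
        }
    }
    guard rc == 0 else { return false }
    return payload.withUnsafeBytes { raw in
        send(fd, raw.baseAddress, raw.count, 0) == raw.count
    }
}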
There have indeed been discussions on Postgres' mailing list over the years about moving the macOS default to EXEC_BACKEND (avoiding the fork pattern), and moving the general concurrency model to threads. None have gained traction because they are seen as huge investments without significant tangible benefits.
I'm not here to argue the merits of any of this. Again, while I'm curious about the history and so on, I don't really care. I can just say that the pattern is established, and the fork in question is single-threaded, lock-free, and early/controlled enough in the parent's lifecycle to keep a clean address space. The forked child controls its state meticulously to manage any open file descriptors etc. before running (my) arbitrary code.
Again, I come back to the key point: the Swift Runtime is calling into macOS frameworks on init and I don’t want it to – this causes segfaults. Outside of Darwin this doesn’t happen. Why can we not discuss the possibility of optionally preventing this on Darwin, especially when this evidently suffices on Linux?
Edit: A fundamental misconception in this thread seems to be that I’m looking for a general-purpose async-signal-safe Swift Runtime. I’m explicitly not. That’s unambiguously not possible and has never been my goal. I simply want an option to align the Runtime with a version that does not exhibit one specific known-problematic behaviour.
Even if Swift were to do everything in its power to avoid calling system libraries at initialization time, these libraries are outside of Swift’s control, and even just linking against them (or their transitive dependencies) would be enough to potentially cause problems, since they can run arbitrary code at load time.
To guard against the possibility of this happening at all, you really do need a version of Swift that doesn't link against system libraries; i.e., I think your initial inclination to reach for Embedded Swift is likely going to be your most productive avenue of exploration, practically speaking.
This evidently suffices on Linux because Linux and its system libraries are written with a different model of fork(), and there is OS-level support for doing this that macOS lacks. There's a lot more leeway for Swift to get this "right" on Linux and a lot more leeway for Swift to get this "wrong" on macOS; even if Swift did it "wrong" on Linux it could still work, and even if Swift did everything "right" on macOS, it could still fail.
If you reproduce the crash and call bt all in lldb, how many threads are reported?
Regardless, the fact that there is an external library other than Postgres itself loaded in the process, and the fact that you were able to set a callback in the first place before fork() was called, mean that Postgres does not have sufficient control over the process for it to safely run any code between fork() and execve().
Checking an environment variable with getenv() is not async-signal-safe either.
Look, we're going in circles here: the long and short of it is that this operation that Postgres is doing is fundamentally unsafe in general, and specifically disallowed on macOS. "Postgres has been doing it for years" doesn't mean it's safe; it's just survivorship bias.
I don't think anybody here can give you more information at this point that will help you solve this problem. If you want an environment variable check here, please open a GitHub issue and the stdlib and/or runtime team will determine next steps.
Are any of these (RPC and the like) async-signal-safe?
I wonder if you could call "exec" yourself in your extension to avoid this whole issue.
But that's in a child process... I bet bt all in a newly forked child process would always report 1 thread only, regardless of the number of threads in the parent; ditto for task_threads(mach_task_self(), &threads, &threadCount).
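That should be easy to verify. Here's a quick sketch of the check using the Mach call mentioned above (which, per the discussion in this thread, is itself exactly the kind of port-touching API that is suspect in a forked child on macOS):

import Darwin

// Count the threads in the current task. In a freshly forked child, only
// the thread that called fork() survives, so this should report 1.
var threads: thread_act_array_t? = nil
var threadCount: mach_msg_type_number_t = 0
if task_threads(mach_task_self_, &threads, &threadCount) == KERN_SUCCESS {
    print("threads:", threadCount)
    // A complete version would deallocate each thread port and the array
    // with mach_port_deallocate and vm_deallocate.
}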