How to prevent a function (of a plugin) from being executed further?

theBitThatBytes · May 6, 2023, 12:23pm

Hi, I am developing a framework to manage data provided by various data sources via a (plugin) api. The plugin can be provided via a dynamic library or added directly to the source. In both cases, I have no control over the plugin content but I need a way to either suspend and resume or stop the running function.
For simplicity, the plugins currently only contain Swift code, but could also include C, C++ or F90 libraries in the future.
The plugin is loaded on demand and its main function is called. The returned data is then processed further.

I know that Swift provides APIs such as DispatchWorkItem and the new Task, both of which have a cancel method, but these only work if the code being cancelled checks for cancellation itself, and that is exactly the code I can't access.

Here is the simplified plugin workflow:

import Foundation

protocol SomePlugin {
    var id: UUID { get }
    func run()
}

struct MyPlugin: SomePlugin {
    let id: UUID = UUID()
    
    func run() {
        /* ..the plugin content - usually a result would be returned.. */
        while true {} // runs for ever to demonstrate a severe hang...
    }
}

func main() {
    let plugin: SomePlugin = MyPlugin()

    // plugin call in a different thread / async background 
    DispatchQueue.global(qos: .background).async {
        plugin.run()
    }

    sleep(4)
    // DO: stop plugin.run() from executing *or* suspend and resume later 
}

main()
dispatchMain()

And here is the simplified public API:

import DTX

let dtx = DTX()
dtx.data_src = [
    MyDataSource(),
    SomeOtherDataSource()
]
dtx.dylib_folder = .documentsDirectory
dtx.serve()

let res = dtx.latest(for: "NHH")

I hope there is a way to solve this problem.
Thank you in advance.

ibex10 · May 6, 2023, 1:10pm

By a function, if you mean a piece of code that executes in the same process as the caller, the execution of the function can't be preemptively stopped, suspended or resumed. However, it might be done cooperatively.
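
A minimal sketch of what "cooperatively" means here, using Foundation's Thread. The function name cancelCooperativeWorker is hypothetical and only for illustration. Thread.cancel() merely sets a flag, so the worker stops only because its loop polls that flag; Task.isCancelled and DispatchWorkItem.isCancelled follow the same flag-polling idea. A loop like `while true {}` would never notice the request.

```swift
import Foundation

/// Starts a worker that polls its own cancellation flag, cancels it after
/// `delay` seconds, and reports whether the worker exited within 2 seconds.
func cancelCooperativeWorker(after delay: TimeInterval) -> Bool {
    let finished = DispatchSemaphore(value: 0)

    let worker = Thread {
        // The loop ends only because it checks the flag itself.
        while !Thread.current.isCancelled {
            Thread.sleep(forTimeInterval: 0.001) // stand-in for one unit of work
        }
        finished.signal()
    }
    worker.start()

    Thread.sleep(forTimeInterval: delay)
    worker.cancel() // sets a flag; it does not interrupt the thread

    return finished.wait(timeout: .now() + 2) == .success
}
```

Calling `cancelCooperativeWorker(after: 0.05)` should return true for this worker, but only because the worker cooperates.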

If the called function runs in a different process, the execution of the function can be preemptively stopped, suspended, or resumed by terminating, suspending, or resuming the host process.

theBitThatBytes · May 6, 2023, 1:39pm

By a function I mean the piece of code that is executed when calling plugin.run(). This code runs in the same process but on a different thread.
I'd like to pause and resume or stop/force stop it like the OS does with its subprocesses.
Threads are constantly suspended and resumed to allow multiple processes to run "concurrently".
Swift's DispatchQueue and async/await API allow multiple tasks to run concurrently, even on single cores, which means there is a way to stop unknown code from running in the same process (without implementing is-canceled-checks).
How can this be achieved, is there an API for this?

rayx · May 6, 2023, 3:40pm

The OS suspends/resumes processes based on interrupts. Threads that are implemented in user space work in a cooperative manner. Neither is applicable in your case.

I think it might be possible to implement your requirement by introducing a monitoring task. I don't know the Swift concurrency API well enough (I rarely have a chance to use it), so I'll give an example using processes. Suppose you run a process on your system and you want to suspend/resume it based on some rules: you can start another process to monitor that process and take actions based on your rules. I suppose you can implement the same approach using the Swift concurrency API. For processes, one can send a stop/resume signal; I don't know if there is a similar mechanism to control tasks in Swift.

ibex10 · May 7, 2023, 1:29am

That would be really nice if we were able to do that sort of thing. Unfortunately, currently the way tasks and threads work rules that out.

If your code runs on macOS, I am certain that you can use an XPC service to host the function the execution of which you want to stop, but I am not sure whether you can suspend/resume the XPC service though. (An XPC service runs in its own process.)

What we really need is a new kind of Task, the execution of which can be preemptively suspended, resumed, or terminated. That would make my dreams come true.

I hope that @eskimo and @John_McCall will say a few words on this.

John_McCall · May 7, 2023, 3:30am

I cannot imagine adding a kind of task that can be preemptively stopped. That kind of feature is incredibly problematic.

theBitThatBytes · May 7, 2023, 12:02pm

Thanks for the answers.

I think it might be possible to implement your requirement by introducing a monitoring task.

@rayx Unfortunately, this is not possible because plugin.run() must run inside the same process (variable space). The plugins (can) use an API that provides access to inter-process variables and other APIs. Synchronising them across different processes would be a nightmare and would dramatically decrease performance. High performance and reliability are key to this project.

If your code runs on macOS, I am certain that you can use an XPC service to host the function the execution of which you want to stop, but I am not sure whether you can suspend/resume the XPC service though. (An XPC service runs in its own process.)

@ibex10 The same applies here, and the program must also run under Linux.

I cannot imagine adding a kind of task that can be preemptively stopped. That kind of feature is incredibly problematic.

@John_McCall Because it's unsafe?

I can't imagine that there is no solution to this problem. Control over threads is such an essential thing.... And I don't feel like implementing my own threading system.

eskimo · May 8, 2023, 8:56am

I'd like to pause and resume or stop/force stop it like the OS does
with its subprocesses.

If I understand your requirements correctly, what you’re looking to do is completely infeasible. Lemme explain the problem and then you can tell me if I’m ‘holding it wrong’.

My understanding is that you want to create a program with two threads of execution:

The first loads some arbitrary compiled code (Swift, C, Fortran, whatever) and runs it.
The second monitors the first. It can pause the code, do bits of work while it’s paused, and then resume the code.

Is that right?

If so, this is not going to work. Imagine that the first thread calls malloc and, while the memory allocator is working, the second thread suspends it. The memory allocator uses a lock to protect its data structures. If the second thread then goes on to call malloc — and, remember, in Swift you have no definitive way to control memory allocations — it’ll try to acquire the lock and you’ve deadlocked.
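
A rough simulation of this hazard, not a real thread suspension: the worker is frozen by blocking on a semaphore while it holds an NSLock that stands in for the allocator's internal lock. The function name hostCanLockWhileWorkerFrozen is hypothetical. The host uses a deadline so the demonstration terminates; with a plain lock() call it would hang.

```swift
import Foundation

/// Returns true if the host managed to take the shared lock while the
/// worker was frozen holding it.
func hostCanLockWhileWorkerFrozen() -> Bool {
    let sharedLock = NSLock()                 // stand-in for malloc's internal lock
    let lockTaken = DispatchSemaphore(value: 0)
    let thaw = DispatchSemaphore(value: 0)

    let worker = Thread {
        sharedLock.lock()
        lockTaken.signal()
        thaw.wait()                           // simulated suspension while holding the lock
        sharedLock.unlock()
    }
    worker.start()
    lockTaken.wait()                          // the worker now owns the lock

    // The host needs the same lock. The deadline keeps this example finite.
    let acquired = sharedLock.lock(before: Date().addingTimeInterval(0.2))
    if acquired { sharedLock.unlock() }

    thaw.signal()                             // let the worker finish
    return acquired
}
```

Here the call is expected to return false: as long as the worker stays frozen, the host cannot make progress on anything that needs that lock.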

IMPORTANT While I’m using malloc in this example, there are plenty of other such global locks. The other two that folks commonly hit are the Objective-C runtime lock and dynamic linker lock, but there are lots of others lurking in the system.

Unless you can seriously constrain the code run by the first thread, there’s no getting around this limitation.

The only reliable way to handle this is to run the first thread’s work in a separate process.
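
A minimal sketch of the separate-process approach using Foundation's Process, assuming the plug-in work is wrapped in some helper executable. The function name superviseChild is hypothetical, and /bin/sleep in the usage note is just a stand-in for such a helper.

```swift
import Foundation

/// Launches `executable` as a child process, pauses it, resumes it, and
/// finally terminates it. Returns the reason the child ended.
func superviseChild(executable: String,
                    arguments: [String]) throws -> Process.TerminationReason {
    let child = Process()
    child.executableURL = URL(fileURLWithPath: executable)
    child.arguments = arguments
    try child.run()

    _ = child.suspend()    // the whole child process stops running
    _ = child.resume()     // the child continues where it left off
    child.terminate()      // asks the child to terminate (SIGTERM)

    child.waitUntilExit()
    return child.terminationReason
}
```

For example, `try superviseChild(executable: "/bin/sleep", arguments: ["30"])` should report `.uncaughtSignal`, because sleep does not handle the termination signal. The host's own locks and heap are untouched by any of this, which is the point of the process boundary.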

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

AlexanderM · May 8, 2023, 3:48pm

Correct.

For anyone wondering what exactly is problematic, I'd suggest reading Java Thread Primitive Deprecation

Java had this, and deprecated it because of its inherent danger. Even in their high-level, managed paradise ~~garden~~ VM, they couldn't implement this capability without building a minefield.

theBitThatBytes · May 8, 2023, 6:42pm

Thank you for the detailed explanation. This is indeed what I am looking for.

I am aware of the problems, especially the deadlock issue. But I thought this would also apply to Swift's Thread, DispatchQueue, the OS threads etc. and thus already be solved?

A thought based on my limited knowledge of thread handling:
A thread is very similar to a process and has its own thread control block. The thread handler suspends threads based on interrupts. Some threads are suspended longer than others depending on their priority.
Somehow, the thread manager is able to suspend threads without causing serious problems like deadlocks. Swift's threads know nothing of their content, but are also scheduled, and thus constantly interrupted and resumed. Somehow problems like deadlocks do not occur.
If I send SIGKILL with kill -9 $pid to a process, this notifies the program to stop, suspends all threads and releases the used resources without causing any problems.
Then why is it not possible to simply suspend a thread safely? I don't need the thread to be suspended right at the next instruction, but at a reasonably safe time. For instance, can I ask the thread handler to stop the thread (safely) by providing a TID? Is it really that hard to re-implement what is already done at OS level?
I probably just scratched the surface when reading about this topic and am still having trouble understanding why this shouldn't be possible.

The only reliable way to handle this is to run the first thread’s work in a separate process.

Guess I'll have to stick with it for now. Already happy to find out how shared memory works in Swift....

tera · May 8, 2023, 7:26pm

Note that suspension and killing are different activities. With suspension you may have a similar deadlock when using processes instead of threads (process A grabs resource 1 and is then explicitly suspended by process B; process B then tries to grab resource 1 and blocks forever, or at least until someone else resumes process A).

What would that "reasonably safe time" be? When the thread doesn't hold any locks? That could work in theory and would resolve the deadlock issues raised above: the check for whether a thread needs to be, and can be, suspended would happen in the scheduler's timer interrupt and within calls to mutex unlock (among other things). But note that the thread could hold some lock (even a single one) for a long time. You've said above that you "have no control over the plugin content", so the plugin might do this for some internal reason.

BTW, why exactly do you need to suspend/resume or kill the plugin code? I have a hard time imagining the need in practice.

AlexanderM · May 8, 2023, 7:33pm

Here's a simple example to illustrate the problem:

Say thread A wants to make large changes to a file, without interference from other threads (or even other processes). It opens the file with O_EXLOCK. Anybody else who wants to open that file has to wait.

Now thread B instructs thread A to stop. What happens to everyone waiting on the locked file?

ahti · May 8, 2023, 9:53pm

One additional point to consider: If you have no control over the plug-in code, it could very well start new threads of its own, which you wouldn't even know you'd needed to stop.

I think that if you actually need this level of isolation, subprocesses are the right tool for the job.

If your description of the interaction between host and plugin is accurate ("The plugin is loaded on demand and its main function is called. The returned data is then processed further."), I would also try to stay away from shared memory etc. Just process arguments and stdin/stdout sound plenty capable for that.
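
A minimal sketch of that stdin/stdout style of interaction, assuming the plug-in is packaged as an executable that writes its result to standard output. The function name runAndCapture is hypothetical, and /bin/echo in the usage note is only a stand-in for a real plug-in executable.

```swift
import Foundation

/// Runs `executable` with `arguments` and returns everything it wrote to
/// standard output, decoded as UTF-8.
func runAndCapture(executable: String, arguments: [String]) throws -> String {
    let child = Process()
    let output = Pipe()
    child.executableURL = URL(fileURLWithPath: executable)
    child.arguments = arguments
    child.standardOutput = output
    try child.run()

    // Read until the child closes its end of the pipe, then reap the child.
    let data = output.fileHandleForReading.readDataToEndOfFile()
    child.waitUntilExit()
    return String(decoding: data, as: UTF8.self)
}
```

For example, `try runAndCapture(executable: "/bin/echo", arguments: ["42"])` returns the text echo printed, including its trailing newline.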

ibex10 · May 9, 2023, 12:36am

If we were to substitute processes for threads in this scenario, there wouldn't be any deadlock. Would there?

David_Smith · May 9, 2023, 3:59am

I have had to fix cross-process deadlocks due to suspended processes on multiple occasions; it's really just the same, except that there are usually fewer and more tightly controlled interaction points.

eskimo · May 9, 2023, 8:12am

But I thought this would also apply to […]

Yes.

and thus already be solved?

Or not be solvable O-:

If I send SIGKILL with kill -9 $pid to a process, this notifies the
program to stop, suspends all threads and releases the used resources
without causing any problems.

This is a very different situation. When the kernel cleans up a process, there are two types of resources involved:

In-process
In-kernel

The in-process resources are irrelevant. That process is terminated so there’s no need to clean it up.

In-kernel resources are more interesting. When a thread blocks within the kernel [1], it waits in one of two ways:

Interruptible
Uninterruptible

The idea is that an interruptible wait is used for long waits and uninterruptible wait is used for mutual exclusion purposes. When you kill a process, the kernel has a mechanism to interrupt any threads in an interruptible wait. That thread then starts failing with an error, eventually unwinding to the point where it returns back into user space. It’s at that point that the thread terminates.

Every now and again you’ll encounter a process that you can’t kill. When you look at it with ps, you’ll see a state of U, indicating that it’s stuck in an uninterruptible wait. It should leave that state relatively promptly but, on occasion, you see bugs in the kernel that cause it to get stuck there indefinitely.

Then why is it not possible to simply suspend a thread safely? I don't
need the thread to be suspended right at the next instruction, but at
a reasonably safe time.

As tera pointed out, the problem is your definition of “reasonably safe”. The OS does not have such a concept. You could add it yourself (constrain the code in the plug-in so that it marks up regions where it’s safe to suspend), but that means you can no longer support “arbitrary compiled code”, which was one of the preconditions in my initial response.
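
A minimal sketch of what such marked-up safe points could look like, assuming the plug-in agrees to call into the host at points where it holds no locks. SuspensionGate and its methods are hypothetical names, not an existing API.

```swift
import Foundation

/// Hypothetical gate that a host hands to a cooperating plug-in.
final class SuspensionGate: @unchecked Sendable {
    private let condition = NSCondition()
    private var paused = false
    private var stopped = false

    /// Host side: ask the plug-in to block at its next checkpoint.
    func pause() {
        condition.lock(); paused = true; condition.unlock()
    }

    /// Host side: let a paused plug-in continue.
    func resume() {
        condition.lock(); paused = false; condition.broadcast(); condition.unlock()
    }

    /// Host side: ask the plug-in to return at its next checkpoint.
    func stop() {
        condition.lock(); stopped = true; condition.broadcast(); condition.unlock()
    }

    /// Plug-in side: call only where no locks are held. Blocks while paused
    /// and returns false once the host has asked the plug-in to stop.
    func checkpoint() -> Bool {
        condition.lock()
        defer { condition.unlock() }
        while paused && !stopped { condition.wait() }
        return !stopped
    }
}
```

A cooperating plug-in would write its main loop as `while gate.checkpoint() { /* one unit of work */ }`. Code that never reaches a checkpoint cannot be paused or stopped this way.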

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

[1] I’m focusing on classic BSD kernels here. In Mach, a thread can block in a continuation, but it’s not really blocking in that case.

rayx · May 9, 2023, 8:37am

There will be a deadlock too. You can try this with the flock(1) command on Linux.

# cat works as expected
terminal1 $ flock /tmp/test -c cat

# cat doesn't echo your input, because it isn't running
terminal2 $ flock /tmp/test -c cat

# suspend the first flock process (SIGSTOP, signal 19 on most Linux systems)
terminal3 $ kill -STOP <pid>

The third command will keep the second flock process waiting forever.

Even if the OS didn't enforce the file lock, there would be a race condition, which is a worse issue for the application.

theBitThatBytes · May 9, 2023, 9:38am

One additional point to consider: If you have no control over the plug-in code, it could very well start new threads of its own, which you wouldn't even know you'd needed to stop.

Yeah, that's possible, but the pointers to created subthreads are stored in the thread control block. So they could easily be suspended or terminated as well.

If your description of the interaction between host and plugin is accurate ("The plugin is loaded on demand and its main function is called. The returned data is then processed further.").

Unfortunately, it's not that simple. The plugin can interact with the host and other plugins through an API. That's why I try to avoid subprocesses.

theBitThatBytes · May 9, 2023, 10:01am

BTW, why exactly do you need to suspend/resume or kill the plugin code? I have a hard time imagining the need in practice.

There are a few situations where pausing or terminating a thread is handy:

A Buggy Plugin
Since the plugins mostly contain scientific code written to solve a problem and not to run well, many plugins have serious design problems. For instance, one of them is the assumption that a value will eventually increase above a certain number, so loops like while x < 5 { do... } are used. Unfortunately, such a loop blocks forever if the assumption does not hold. Sounds weird, is weird, but it happens all the time, and recoding thousands of libraries is not an option.
Resource Constraints
Since hardware resources are limited, it is often helpful to suspend or even terminate tasks that require a lot of resources and run at low priority. Some high-resolution model plugins require gigabytes of memory and run for hours on multiple cores. Terminating or suspending them to balance the system is a must.
Plugin Dependencies
Let's say a plugin (B) relies on output provided by plugin (A). This case can be handled by semaphores, but if (A) fails, (B) should be terminated. Sharing data between plugins and processing it further is easier if threads can be suspended and resumed or terminated based on the dependency structure created when the plugins are loaded.

jayton · May 9, 2023, 10:30am

Threads don’t have their own heaps, though; simply terminating an OS thread would not clean up any memory associated with it. The primitive provided for this kind of cleanup is a process.