Hello,
Before I describe my problem in detail, I would like to summarize my situation, because I think it is important for understanding the problem. I hope someone can give me an idea that helps me troubleshoot. At the moment I am running out of ideas.
- I have an elaborate project with 240 test cases
- All tests run smoothly with multiple (4) workers on the Mac and also in a Linux environment
- All tests run smoothly with 1 worker on the Mac.
- One test fails if 1 worker runs the entire suite on Linux
- Executing the failing test alone works and the test turns green
- If I rename the test so that it runs in a different order, it works and turns green
- It does not work to leave the test where it is and run the entire suite.
Let's take a look at the interesting parts of the call chain.
The test:
func testGetExampleCallback() async throws {
    let cbi = ScriptProvider()
    let expect = expectation(description: "userValidate response")
    cbi.start(class: .userLogin, arguments: userOK) { result in
        do {
            let bodies = try result.get()
            XCTAssertEqual(bodies.count, 1)
            [...]
        } catch {
            if let err = error as? ScriptProvider.ScriptError {
                switch err {
                case .timeout:
                    XCTFail("Call timed out")
                    [...]
I have a class instance cbi that I call with .userLogin and some arguments. The arguments are irrelevant for the test so far, except that they contain a script that calls a fetch method. userOK implements a call to http://example.com:
fetch("http://example.com", {
    method: "get"
})
So far so good. cbi has a DispatchGroup
private let group = DispatchGroup()
and in start(class:arguments:) this group is used while the script is evaluated; when the script has finished, the group is notified. As long as SCRIPT_TIMEOUT is not reached, the script can do what it has to do (in our case: fetch):
_ = group.wait(timeout: DispatchTime.now() + DispatchTimeInterval.seconds(Constants.PROVIDER.SCRIPT_TIMEOUT))
I am waiting till the script has executed completely:
/// Is called when `commit` is called from within the script context
///
/// - Parameter data: Array of strings that are committed from within the script context
///
func valueDidCommitted(data: [String?]) {
    // set result to class variable
    committedResults = data
    // leave the DispatchGroup that is opened in `run()`
    group.leave()
}
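To make the waiting mechanism easier to discuss in isolation, here is a stripped-down sketch of the enter/wait/leave pattern. It is not the real implementation: MiniProvider, the timeout values and the use of a global background queue are assumptions made purely for illustration.

```swift
import Dispatch
import Foundation

// Hypothetical reduction of the pattern; `MiniProvider` is a placeholder name.
final class MiniProvider {
    private let group = DispatchGroup()
    private let lock = NSLock()            // protects committedResults across threads
    private var committedResults: [String?] = []

    /// Enters the group, starts `script` on a background queue, then blocks the
    /// calling thread until the group is left or `timeout` expires.
    /// Returns nil on timeout.
    func run(timeout: DispatchTimeInterval,
             script: @escaping (MiniProvider) -> Void) -> [String?]? {
        group.enter()
        DispatchQueue.global().async { script(self) }
        let outcome = group.wait(timeout: DispatchTime.now() + timeout)
        guard outcome == .success else { return nil }
        lock.lock()
        defer { lock.unlock() }
        return committedResults
    }

    /// Counterpart of `commit`: stores the data and leaves the group.
    func valueDidCommitted(data: [String?]) {
        lock.lock()
        committedResults = data
        lock.unlock()
        group.leave()
    }
}

// A script that commits immediately finishes before the timeout.
let fast = MiniProvider().run(timeout: .seconds(2)) { provider in
    provider.valueDidCommitted(data: ["ok"])
}

// A script that commits too late makes `run` return nil (the timeout case).
let slow = MiniProvider().run(timeout: .milliseconds(100)) { provider in
    Thread.sleep(forTimeInterval: 0.5)
    provider.valueDidCommitted(data: ["late"])
}
```

In this sketch the caller of run is blocked for the whole wait, so whatever eventually calls valueDidCommitted has to be able to make progress on another thread.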
That should give you all the details about the context of the method where the problem occurs. I don't think the context is the problem here, but after several days of debugging I want to show everything to anyone who can help.
I'm pretty sure I've overlooked something. As far as I understand, the wait is not the problem, because the script evaluates and calls the fetch method. That method executes, but the URLSessionTask inside it does not. (But only on Linux, only when I run the whole test suite, and only with a single test worker.)
Let's take a look at the implementation of the "fetch" method.
First I get a URLSession:
let session = URLSession.shared
(I also tried a standard session with different configurations, without success.)
I set some headers and an optional body (post) and prepare the request before I create my task:
let task = session.dataTask(with: request) { data, response, error in
    Log.info("REQUEST A")
    [...]
}
Log.info("REQUEST 0")
print("1. Task state: \(task.state)")
task.resume()
print("2. Task state: \(task.state)")
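For readers who want to reproduce the request setup, the steps above can be sketched roughly as follows. The helper names makeRequest and start, their parameters and the Accept header are hypothetical and not the real fetch implementation; the conditional import is only there because URLSession lives in FoundationNetworking on Linux.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession and URLRequest live here on Linux
#endif

// Hypothetical helper: prepares the request with headers and an optional body.
func makeRequest(url: URL, method: String,
                 headers: [String: String], body: Data?) -> URLRequest {
    var request = URLRequest(url: url)
    request.httpMethod = method.uppercased()
    for (field, value) in headers {
        request.setValue(value, forHTTPHeaderField: field)
    }
    request.httpBody = body  // nil for requests without a body
    return request
}

// Hypothetical helper: creates the task, resumes it, and prints the task state
// before and after `resume()`, like the logging in the snippet above.
func start(_ request: URLRequest, on session: URLSession,
           completion: @escaping (Result<Data, Error>) -> Void) -> URLSessionDataTask {
    let task = session.dataTask(with: request) { data, _, error in
        if let error = error {
            completion(.failure(error))
            return
        }
        completion(.success(data ?? Data()))
    }
    print("1. Task state: \(task.state)")
    task.resume()
    print("2. Task state: \(task.state)")
    return task
}

// Only the request is built here; no network call is made until `start` is called.
let request = makeRequest(url: URL(string: "http://example.com")!, method: "get",
                          headers: ["Accept": "text/html"], body: nil)
```

Calling start(request, on: URLSession.shared) { ... } would then correspond to the point where the completion handler is expected to run.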
This test works in every situation except on Linux with a single test worker.
But that is exactly the setup I need in order to support dockerized VS Code testing.
If I run this test as part of the whole suite in a Linux Docker container, ALL OTHER tests (even those that use the same fetch method) run fine, but this one logs:
REQUEST 0
... 1. Task state: suspended
... 2. Task state: running
... timeout!
I never get into the task callback and never see the REQUEST A log.
Since I see REQUEST 0, I'm pretty sure there is no other lock. It must be the task that is blocking or waiting for something. I have raised the network connection limit and the maximum number of open files on Linux and keep looking for clues, but so far I have found nothing.
How can I debug and fix this problem, given that it occurs only in this situation and only in this particular order (at the very end of my suite)?
I think another process/task/whatever is blocking the task. But what irritates me is that this does not happen on the Mac. And multiple workers also affect the result.
I have no idea how to find the origin of the problem.
I've tried instantiating a new URLSession every time I call the method, I've tried using a global session, I've also tried resetting and flushing it.
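As an illustration of the "new session per call" variant, a minimal sketch could look like this. The helper name makeIsolatedSession, the ephemeral configuration and the 5-second request timeout are assumptions chosen for the example, not the exact configurations I tried.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on Linux
#endif

// Hypothetical helper: builds a fresh session that shares no state with
// URLSession.shared and fails a stalled request after `requestTimeout` seconds.
func makeIsolatedSession(requestTimeout: TimeInterval) -> URLSession {
    let configuration = URLSessionConfiguration.ephemeral
    configuration.timeoutIntervalForRequest = requestTimeout
    return URLSession(configuration: configuration)
}

let isolatedSession = makeIsolatedSession(requestTimeout: 5)
let configuredTimeout = isolatedSession.configuration.timeoutIntervalForRequest

// Release the session once its outstanding tasks have finished.
isolatedSession.finishTasksAndInvalidate()
```

With a request timeout shorter than SCRIPT_TIMEOUT, a stalled task should surface as an error in the completion handler, provided the handler gets a chance to run at all.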
I did my best in the debugger, but after the third night I need some new ideas from you.
Thanks for any guesses, thoughts or support to help me solve the problem,
Kris
Details: Swift 5.9.2, Linux eaada6d1d3ec 6.5.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Tue Jan 9 22:39:36 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux, Colima docker environment.