dataTask() in a loop

Neph · May 4, 2023, 8:42am

Hello everyone,

My app downloads a couple of files from a server using a URLSessionDataTask. When a download finishes successfully (without any errors), it should start the next download (= next loop iteration). If there is any type of error, the whole thing has to abort and display the error message through the calling function (I've got my own error enum class for that, which doesn't throw). If everything finishes without errors, it simply returns to the calling function.

This function is called after another dataTask has finished (using a completion handler) but I never switch back to the main thread, so all of this is still running in the same background thread the previous task used.

My code (Swift 5, Xcode 14.2):

private func fileDownload(fileNames fns:[String]) {
    if !errorBool {
        print("Main thread: \(Thread.isMainThread)")
        let group = DispatchGroup()
        
        myloop:
        for f in fns {
            let url = getURL(f)

            group.enter()

            //DispatchQueue.global(qos: .background).async {
            let task = session.dataTask(with: url) {(data, response, error) in
                defer { group.leave() }
                print("Starting task!")
                
                if error != nil && data == nil {
                    self.errorBool = true
                    break myloop //TODO 1: "Cannot find label 'myloop' in scope", "self." doesn't help
                }
                
                if let httpResponse = response as? HTTPURLResponse {
                    //Do stuff with downloaded data, more error handling that sets the error flag and message
                }
            }
            task.resume()
            //}
            
            //TODO 2: How do I wait here for the task to finish?
            //group.wait()
            if errorBool {
                break myloop
            }
        }
        
        group.notify(queue: .main) {
            print("Done!")
            //Displays any errors in a popup (on the main thread) through the calling function
        }
    }
}

There are two things that I'm experiencing problems with:

1. How do I break the loop from within the task if there's an error ("TODO 1")? I might be able to replace the first break with a return with the same result but I'm still wondering: Is it possible?
2. More importantly, how do I wait at "TODO 2" until the task finishes, so I can break the loop if there are any errors? If I use group.wait() there, then the task never starts (deadlock?), even though it should automatically run on a background thread. I tried to switch to yet another background thread for the task (see inactive code above) but that didn't help either.

I thought that this is exactly what a DispatchGroup is used for but is it really or am I just using it incorrectly? Is there anything else that can accomplish a "wait" within this single function? I found a little bit of information about a Semaphore but no exact info how to do it in my case and some people even recommend not using semaphores anymore.

tourultimate · May 4, 2023, 10:16pm

It's much easier if you use Swift Concurrency to achieve what you want:

/// This function runs on the main thread
@MainActor func downloadAndShowError() async {
    do {
        // Try to download files from the urls
        // The function is suspended here, but the main thread is Not blocked.
        try await download(urls: [])
    } catch {
        // Show error if occurred, this will run on the main thread
        print("error occurred: \(error.localizedDescription)")
    }
}

/// This function asynchronously downloads data for all passed URLs.
func download(urls: [URL]) async throws {
    let session = URLSession(configuration: .default)
    for url in urls {
        // If an error occurs, then it will throw, loop will break and function throws, 
        // caller must deal with the error.
        let (data, response) = try await session.data(from: url)
        // Do something with data, response
        // You can even throw from here if you don't like the response...
    }
}

Neph · May 8, 2023, 8:39am

@tourultimate
Thanks for the recommendation and code example. I checked the docs and unfortunately session.data doesn't provide the error, like session.dataTask does, but I need its codes for error handling. I've also already got an error class that I'm using for this (and everywhere else in my app) but it doesn't throw errors and I don't want it to because I don't always want code execution to stop like that.

How do I do this with the dataTask? I thought the DispatchGroup is exactly what I need (according to its docs) but I can't seem to get it to work.

tera · May 8, 2023, 11:40am

You can catch the error:

do {
    let (data, response) = try await session.data(from: url)
} catch {
    print("that's your error here \(error)")
}

If you mean that the dataTask callback API may return combinations of results that the async data API does not support:

data           none none none none some some some some
response       none none some some none none some some
error          none some none some none some none some
------------------------------------------------------
dataTask API    N    Y    ?    ?    ?    N    Y    N
async data API  N    Y    N    N    N    N    Y    N

Specifically for the combinations marked with ?, you might be right.
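
For reference, the value caught in the catch block is an ordinary Error, so it can be conditionally cast to URLError and mapped to a non-throwing error value. A minimal sketch of that idea; DownloadFailure and classify(_:) are hypothetical names standing in for the app's own error enum, and the chosen cases are only examples:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Hypothetical app-level error value; it is returned, never thrown.
enum DownloadFailure: Equatable {
    case timedOut
    case offline
    case otherURLError(code: Int)
    case unknown
}

/// Maps whatever was caught to the app's own error value.
func classify(_ error: Error) -> DownloadFailure {
    // Conditional cast: a failure is not guaranteed to be a URLError.
    guard let urlError = error as? URLError else { return .unknown }
    switch urlError.code {
    case .timedOut:
        return .timedOut
    case .notConnectedToInternet:
        return .offline
    default:
        return .otherURLError(code: urlError.code.rawValue)
    }
}
```

Inside the catch block this would be called as classify(error), and the result can be stored or passed on without anything being thrown further.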

Neph · May 8, 2023, 1:09pm

I mean the URLError (let err = error as! URLError). You can get its code with err.errorCode and that's what I'm using to check for my custom non-throwable error.

It also leaves the problem with throwing to get back to the extra function (your downloadAndShowError). I'm already using dataTask for other requests and haven't noticed any problems with it. Adding data now means that I would not only need a second error class (that supports throw) but would then also have multiple approaches to very similar things (that aren't compatible with each other) in a single class. I'd like to avoid that (and I don't care if it's slightly "harder" to do with dataTask than Swift's new concurrency).

I'm only interested in specific combinations:

1. Some error and no data (always that combination in my tests) = error and abort (the response doesn't matter)
2. No response/response that's not a HTTPURLResponse -> error
3. Some response = possible success ->
3a. Correct status code -> success
3b. Wrong status code -> error
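
The combinations above could be encoded in one small function that both the callback API and the async API can feed. This is only a sketch: Outcome and evaluate are hypothetical names, and treating 200...299 as the "correct" status codes is an assumption, since the post doesn't say which codes count as success.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Hypothetical result type for one finished request.
enum Outcome: Equatable {
    case success
    case transportError
    case notHTTPResponse
    case badStatus(Int)
}

func evaluate(data: Data?, response: URLResponse?, error: Error?) -> Outcome {
    // 1. Some error and no data: abort, the response doesn't matter.
    if error != nil && data == nil {
        return .transportError
    }
    // 2. No response, or a response that is not an HTTPURLResponse.
    guard let http = response as? HTTPURLResponse else {
        return .notHTTPResponse
    }
    // 3a/3b. Assumed success range; adjust to whatever the server really returns.
    guard (200...299).contains(http.statusCode) else {
        return .badStatus(http.statusCode)
    }
    return .success
}
```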

tera · May 8, 2023, 1:23pm

I am still confused. What's wrong with something like this? It is very similar to dataTask API and you are getting URLError back via catching (converting throwing API into non throwing):

func performRequest(_ session: URLSession, _ url: URL, completion: @escaping (Data?, URLResponse?, Error?) -> Void) async {
    do {
        let (data, response) = try await session.data(from: url)
        completion(data, response, nil) // proceedWithData
    } catch {
        print("error \(error)")
        let err = error as! URLError // maybe other errors could be here as well
        completion(nil, nil, error) // proceedWithError
    }
}
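
A variant of the same idea returns the outcome as a Result instead of calling a completion handler, so the caller can await each request inside a plain loop and stop at the first failure without anything being thrown at it. A minimal sketch: downloadInOrder is a hypothetical name, and the fetch closure is a stand-in for a non-throwing wrapper around the real request (for example one built like performRequest above).

```swift
import Foundation

/// Runs `fetch` for each URL in order and stops at the first failure.
/// `fetch` is a placeholder for a non-throwing wrapper around the real request.
func downloadInOrder(
    _ urls: [URL],
    fetch: (URL) async -> Result<Data, Error>
) async -> Result<[Data], Error> {
    var collected: [Data] = []
    for url in urls {
        // The loop does not advance until this request has finished.
        switch await fetch(url) {
        case .success(let data):
            collected.append(data)
        case .failure(let error):
            // Abort: the remaining URLs are never requested.
            return .failure(error)
        }
    }
    return .success(collected)
}
```

Because the error travels inside the Result, the caller decides what to do with it; no second, throwing error type is needed.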

Neph · May 8, 2023, 2:27pm

Isn't that what dataTask already does?:

for f in fns {
    let url = getURL(f)

    performRequest(url) {(data, response, error) in
        ....
    }
    //More code here
}

If I understand this correctly, then I'll end up with the same problem as with my code in the first post: it instantly jumps to whatever comes after the completion handler in the loop.

I'll be honest: I'm hesitant to use data or any other version when I haven't even worked out how to do it with dataTask (I want to know why it works/doesn't work), that's why I asked about that one specifically.
I understand that concurrency is the "new thing" but dataTask (or even DispatchGroup) isn't deprecated and so far I don't see any reason not to use it, especially after already setting up everything else for and with it and if the new thing makes everything a lot more complicated (more passing around and lots of functions to do just one thing (the download) that I'm going to mix up when I come back to this a couple of months or maybe even a year from now).

bjhomer · May 8, 2023, 4:00pm

There isn't an easy way to do this download-one-at-a-time-sequentially thing with dataTask(); one of the benefits of the new structured concurrency features is that they enable you to perform asynchronous tasks in a loop in a way that was very difficult to do before.

Here's how I would do it without structured concurrency, though:

private func downloadFiles(filenames: [String], completion: @escaping (Error?)->Void) {
  var remainingFiles = filenames

  // A nested function that we can call repeatedly
  func downloadNextFileOrFinish() {
    guard !remainingFiles.isEmpty else {
      // We're done!
      completion(nil)
      return
    }
    // Take the files in their original order
    let nextFile = remainingFiles.removeFirst()

    downloadOneFile(nextFile, completion: { error in
      if let error {
        completion(error)
      }
      else {
        downloadNextFileOrFinish()
      }
    })
  }

  // Kick off the iterated downloading
  downloadNextFileOrFinish()
}

private func downloadOneFile(_ filename: String, completion: @escaping (Error?)->Void) {
  let url = getURL(filename)
  let task = session.dataTask(with: url) { (data, response, error) in
    guard error == nil else {
      completion(error)
      return
    }

    // Check the HTTP response, etc, then call completion
    completion(nil)
  }
  task.resume()
}

That's significantly more complex than the version shown above that uses structured concurrency. The advantage of await is that it doesn't just jump to what's after the completion block, the way dataTask does; instead, it waits for that call to finish first.
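
To see the control flow of the completion-handler chain without a network, here is a stripped-down sketch of the same pattern with a stubbed step function. runInOrder, StepFailed and the stub that pretends "b" fails are all hypothetical and only there for illustration.

```swift
import Foundation

struct StepFailed: Error {}

/// Same shape as the chain above, but generic over the per-item step
/// so that it can be run without a network.
func runInOrder(
    _ items: [String],
    step: @escaping (String, @escaping (Error?) -> Void) -> Void,
    completion: @escaping (Error?) -> Void
) {
    var remaining = items[...] // ArraySlice, so popFirst() is available

    func next() {
        guard let item = remaining.popFirst() else {
            completion(nil) // every item succeeded
            return
        }
        step(item) { error in
            if let error = error {
                completion(error) // stop: remaining items are never started
            } else {
                next()
            }
        }
    }
    next()
}

var started: [String] = []
runInOrder(["a", "b", "c"], step: { name, done in
    started.append(name)
    let error: Error? = (name == "b") ? StepFailed() : nil // pretend "b" fails
    done(error)
}, completion: { error in
    print("started:", started, "failed:", error != nil)
    // prints: started: ["a", "b"] failed: true
})
```

The third item is never started, which is the "abort on first error" behaviour asked for in the first post.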

tera · May 8, 2023, 4:49pm

No, the version I presented is an async function.

Another option to do what you want is to schedule all requests in parallel. I'm typing this in the web window, so treat it as pseudocode:

func downloadMany(_ urls: [URL], completion: @escaping ([Data?], [URLResponse?], [Error?]) -> Void) {
    let session = URLSession(configuration: ...)
    let count = urls.count
    var datas = [Data?](repeating: nil, count: count)
    var responses = [URLResponse?](repeating: nil, count: count)
    var errors = [Error?](repeating: nil, count: count)

    for (index, url) in urls.enumerated() {
        session.dataTask(with: URLRequest(url: url)) { data, response, error in
            if let error {
                errors[index] = error
                responses[index] = response
            } else {
                if let r = response as? HTTPURLResponse, r.statusCode is such and such {
                    this is error as well
                }
                responses[index] = response
                datas[index] = data
            }
            if finished == count or this is the first error and you want to stop {
                completion(datas, responses, errors)
            }
        }.resume()
    }
}

If you want, you can cancel / invalidate all outstanding requests after getting the first error.

In practice there will be a limit how many connections are established per host (you can control the maximum number).

The good thing - the overall download time would be quicker compared to the case you do requests one after another, in the extreme cases N times quicker (N = url count).

Neph · May 9, 2023, 8:37am

Hm, concurrency, that's an interesting alternative, thanks for the code (and I definitely have to look up what nested functions do)!
Is there absolutely no way to do dataTask in a loop with DispatchGroup? I thought that's exactly what the latter is for.
Do you know why people advise against using Semaphore for stuff like this?

Neph · May 9, 2023, 8:51am

No, the version I presented is an async function.

Ah, I see. Thanks for the code.

Edit: I tried to use it but it wants me to add async to every calling function and even gives me an error:

Cannot pass function of type '...[MyClass here]... async -> Void' to parameter expecting synchronous function type

Google says that you can surround the await part with a Task to avoid that, so probably just outside the loop like this?:

private func fileDownload(fileNames fns:[String]) {
    ....
    Task {
        print("Task Start")
        for f in fns {
            ....
            await actualFileDownload(url, completion: {(data, response, error) in
                ....
            })
        }
        ....
    }
    ....
    print("This is printed before 'Task Start'.")
}

This causes other problems though: Now I need a completion handler for fileDownload(), otherwise everything else that I used to call afterwards now runs before this task even starts...

One other option to do what you want is schedule all requests in parallel

Thanks for the code example. I know that it would be faster (in theory) but I decided against parallel downloads early on because all files are needed and if it doesn't succeed for one of them (or if there's a "bad" error), then the whole thing has to stop as soon as possible, without possibly finishing up other parallel downloads at the same time (and wasting mobile data). I'm also not sure if saving that much data (e.g. 10x instead of just 1x before it's saved to file) in memory is a good idea, as I don't know how many files there are and what their size is.

A question about data: I know that dataTask uses a background thread but doesn't switch back to the main thread in its completion handler. How does data handle this?

tera · May 9, 2023, 12:08pm

Yes, Task is right, that's the bridge between sync and async worlds.
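
A minimal sketch of that bridge, assuming the synchronous entry point keeps a completion handler so that follow-up work only runs after the awaited downloads. downloadAll is a hypothetical stand-in for the real async work, and the String? error message is only a placeholder for the app's own error type.

```swift
import Foundation

/// Hypothetical async worker; returns an error message, or nil on success.
func downloadAll(_ names: [String]) async -> String? {
    for name in names {
        // The real code would await each download here.
        if name.isEmpty { return "empty file name" }
    }
    return nil
}

/// Synchronous entry point: starts the async work and reports back
/// through a completion handler once all of it has finished.
func fileDownload(fileNames: [String], completion: @escaping @Sendable (String?) -> Void) {
    Task {
        let message = await downloadAll(fileNames)
        // Runs only after all awaited work is done. If the handler touches
        // the UI, hop to the main actor before calling it.
        completion(message)
    }
    // Code placed here runs immediately, usually before the Task body.
}
```

Only this one function needs a completion handler; everything it calls can stay async.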

I have no idea how long it takes in your case (and given your "as I don't know how many files there are and what their size is", you don't know that either), but if a single file download takes a second and there are 10 of them, the user may end up waiting, say, 5 seconds before the error occurs in the fifth file and the download stops, instead of not waiting at all.

That was just a quick and dirty example code. You may of course reshuffle it and make the callback to fire immediately with the individual file data, and doing this (up to) N times, followed by the final callback callout that says either "finished" (in which case all files are good) or "error" (in which case all files that are already created should be deleted). Whether you do the file download in parallel or sequential is unrelated to how you manage the resulting file data (e.g. you may have a sequential implementation that writes to memory first, and you may have a parallel implementation that writes to files immediately without storing individual file data in memory).

Edit: BTW, if you are going to write to the file system anyway why do you use dataTask and not downloadTask to begin with? There are certain advantages to the latter, in particular with just a couple additional steps you can download files in background (even when your app is not running!)

This is not entirely true... you control the queue that's used for the callback. It could be some background queue with:

URLSession.shared
URLSession(configuration: .default)

Or it could be a queue of your own choice, including the main queue (which runs on the main thread):

URLSession(configuration: .default, delegate: nil, delegateQueue: queue) // a given queue
URLSession(configuration: .default, delegate: nil, delegateQueue: .main) // main queue

BTW, if your callback code jumps to the main queue the first thing:

session.dataTask { data, response, error in
    DispatchQueue.main.async { // or, equally `otherQueue.async`
        // actual code
    }
}

you can avoid having that explicit queue dance by specifying the relevant queue in the session initialiser.

I believe you may end up on some system background queue/thread after "await data" even when you specify the queue explicitly as per above, but will leave this for others to comment upon.

Neph · May 9, 2023, 12:56pm

I edited my post again just before you posted your answer:

This causes other problems though: Now I need a completion handler for fileDownload(), otherwise everything else that I used to call afterwards now runs before this task even starts...

Is there a different way around that but also without having async influence basically everything else in the tree or is it just that or the task?

I have no idea how long it takes in your case (and given your "as I don't know how many files there are and what their size is", you don't know that either), but if a single file download takes a second and there are 10 of them, the user may end up waiting, say, 5 seconds before the error occurs in the fifth file and the download stops, instead of not waiting at all.

I'm aware but I prefer having more control over it. Luckily the download isn't necessary every time the app is started.

Edit: BTW, if you are going to write to the file system anyway why do you use dataTask and not downloadTask to begin with? There are certain advantages to the latter, in particular with just a couple additional steps you can download files in background (even when your app is not running!)

Doesn't downloadTask work quite similarly to dataTask, meaning that it would cause the same problem with waiting for a single file to finish downloading before continuing with the next one? There's no need for stuff to be downloaded while the app isn't running.

This is not entirely true... you control the queue that's used for the callback. It could be some background queue with:

You're right. I use URLSessionConfiguration.default, so I can change the timeouts and it seems to default to a background thread.

Gero · May 9, 2023, 4:06pm

Since I believe there was no direct answer to whether DispatchGroup can be used for this, I might add: for your approach, where you want to download the files sequentially (i.e. only start the next download after the previous one has finished), I would say "No".

Before structured concurrency, DispatchGroup was a good way to schedule multiple parallel downloads. The idea was to enter() the group and resume() a task in each loop iteration, and to leave() the group in the task's completion handler. Cancelling outstanding tasks when one encounters an error becomes a little trickier then (but far from impossible), and you don't really have a guarantee of order.
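
The enter/leave pattern for parallel work can be sketched as follows. fakeDownload is a hypothetical stand-in for resuming a real task, and the "size" it reports is just the length of the name; the point is only the balance of enter() and leave() and the use of notify instead of a blocking wait.

```swift
import Foundation

/// Stand-in for an asynchronous download: calls back later on a background queue.
func fakeDownload(_ name: String, completion: @escaping (Result<Int, Error>) -> Void) {
    DispatchQueue.global().async {
        completion(.success(name.count))
    }
}

func downloadAllInParallel(_ names: [String], completion: @escaping ([String: Int]) -> Void) {
    let group = DispatchGroup()
    let lock = NSLock() // callbacks may arrive concurrently
    var sizes: [String: Int] = [:]

    for name in names {
        group.enter() // one enter() per started task
        fakeDownload(name) { result in
            defer { group.leave() } // balanced leave() on every path
            if case .success(let size) = result {
                lock.lock()
                sizes[name] = size
                lock.unlock()
            }
        }
    }

    // Runs once every enter() has been matched by a leave(); nothing is blocked meanwhile.
    // An app would typically pass .main here to update the UI.
    group.notify(queue: .global()) {
        lock.lock()
        let snapshot = sizes
        lock.unlock()
        completion(snapshot)
    }
}
```

Note that all tasks are started before any of them finishes, which is why this fits parallel downloads rather than the one-after-another case.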

With a sequential approach like you described it is much easier to have a function do the download and pass results via completion handler back to the calling site. There you can then start the next download (or not, depending on whether the handler denotes an error), but you likely have to keep in mind to switch queues if need be (which usually there is, in my experience).

Structured concurrency basically does this for you and it has a much better error handling mechanic as you can throw errors. In this case you probably want to cast the Error to URLError, but that will always work because that's what the task actually throws when it encounters a, well, url error.

Jon_Shier · May 9, 2023, 6:16pm

Just a warning, it’s not guaranteed that tasks will always produce URLError. There have been occasions where it produced things like POSIXError as well.

tera · May 9, 2023, 6:35pm

async does in fact have this viral / infectious property on the surrounding code.

In this regard downloadTask works the same as dataTask: with either of them you can load files sequentially or in parallel; this is up to you. downloadTask differs in another way: it calls the provided callback with the URL of a local temporary file, which you are supposed to move to a permanent location (otherwise it will be auto-deleted), so it is a slightly more convenient (and faster) API if you want the downloaded data to end up as a file on the file system anyway.
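
The "move it to a permanent location" step could look roughly like this. keepDownloadedFile is a hypothetical helper name; in a downloadTask completion handler it would be called with the temporary URL the handler receives, before the handler returns.

```swift
import Foundation

/// Moves a finished download out of its temporary location,
/// replacing any older copy at the destination.
func keepDownloadedFile(at temporaryURL: URL, as destinationURL: URL) throws {
    let fileManager = FileManager.default
    if fileManager.fileExists(atPath: destinationURL.path) {
        try fileManager.removeItem(at: destinationURL)
    }
    try fileManager.moveItem(at: temporaryURL, to: destinationURL)
}
```

Since the file never has to be held in memory as Data, this also sidesteps the concern about keeping several large downloads in memory at once.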

I'd mention httpMaximumConnectionsPerHost briefly: if all your URLs are from a single host, then by setting the URLSession configuration's httpMaximumConnectionsPerHost to 1 and spawning all data / download tasks at once, you'll effectively have them executed sequentially.
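
Setting that property is a small configuration change. A minimal sketch (serialSession is just an illustrative name); note that the property limits connections, not requests, so whether it really serializes the downloads depends on the HTTP version in use:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

let configuration = URLSessionConfiguration.default
configuration.httpMaximumConnectionsPerHost = 1 // at most one connection per host
let serialSession = URLSession(configuration: configuration)
```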

bjhomer · May 9, 2023, 6:53pm

Neph:

Google says that you can surround the await part with a Task to avoid that, so probably just outside the loop like this?:
private func fileDownload(fileNames fns:[String]) {
    ....
    Task {
        print("Task Start")
        for f in fns {
            ....
            await actualFileDownload(url, completion: {(data, response, error) in
                ....
            })
        }
        ....
    }
    ....
    print("This is printed before 'Task Start'.")
}
This causes other problems though: Now I need a completion handler for fileDownload(), otherwise everything else that I used to call afterwards now runs before this task even starts...

Downloading files is a fundamentally asynchronous task involving networks and other computers which may both introduce arbitrary delays, so you have to do something to wait for the task to finish. Roughly speaking, you have three options:

1. Use a completion handler to tell the app what to do after the work completes. This results in a "split" line of execution, where execution immediately jumps to the code after the completion handler, and then later comes back and executes the code inside the completion handler. This can be annoying, as you've discovered; you can't just wait for the function to return, and often all your "useful" work ends up in the completion handler.
2. Block the thread until the work completes. This gives you the nice "single line of execution" model, but comes with a lot of problems. If you call this blocking function on the main thread, for example, you may lock up the UI of your app indefinitely. If you call it on a background thread, you have to hope that the system wasn't planning to use that thread for anything else. (A particularly pernicious example is when you block the very thread that the system was going to use to process the network response, resulting in indefinite deadlock.) Blocking the thread is widely regarded as poor practice, and the Swift SDK doesn't encourage it.
3. Use an async function to suspend that function until the work completes. This suspends the function (and everything in the call stack) until the work completes. It doesn't block the thread; the thread stays free for other work, and the function is simply resumed when the call completes. This is the best of both worlds: you get the single line of execution model and also avoid blocking the main thread.

However, this only works if the calling function can be suspended. A synchronous function is not designed to be suspended and resumed like this, so the calling stack has to be async. This does mean that at some point, you may have to call into this async stack from a synchronous function, and at that point, yes, you're back to using completion handlers. But using completion handlers only at the boundaries between sync and async functions is much easier than using completion handlers everywhere.
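
The deadlock mentioned under the blocking option can be reproduced without a network. In the sketch below a serial DispatchQueue stands in for the queue that delivers completion handlers (a session created without an explicit delegate queue uses a serial operation queue for its callbacks). This is presumably what happened with group.wait() in the first post, since that function is itself called from another task's completion handler; the timeout is only there so the demo terminates.

```swift
import Foundation

// Serial queue standing in for the queue that delivers completion handlers.
let callbackQueue = DispatchQueue(label: "callbacks")
let group = DispatchGroup()
let finished = DispatchSemaphore(value: 0)
var waitResult = DispatchTimeoutResult.success

callbackQueue.async {
    // We are now "inside a completion handler" on the serial callback queue.
    group.enter()

    // Stand-in for task.resume(): the work finishes elsewhere, but its
    // completion handler is delivered on the same serial callback queue.
    DispatchQueue.global().async {
        callbackQueue.async { group.leave() }
    }

    // Blocks callbackQueue, so the leave() above cannot run until the wait gives up.
    waitResult = group.wait(timeout: .now() + 1)
    finished.signal()
}

finished.wait()
print(waitResult == .timedOut ? "timed out: the callback never got to run" : "completed")
// prints: timed out: the callback never got to run
```

Without the timeout, the wait would never return, which matches the "task never starts" symptom: the task does run, but its completion handler has nowhere to execute.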

eskimo · May 10, 2023, 7:49am

if all your URL's are from a single host then by setting the URLSession configuration's httpMaximumConnectionsPerHost to 1 and spawning all data / download tasks at once you'll effectively have them executed sequentially.

Only for HTTP/1. HTTP/2 and later run all your requests over a single connection [1] so httpMaximumConnectionsPerHost is irrelevant.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

[1] A TCP connection in HTTP/2 and a QUIC connection in HTTP/3.

Neph · May 10, 2023, 9:00am

So DispatchGroup was never meant for sequential downloads? This makes it sound like it is possible and exactly what I need:

Groups allow you to aggregate a set of tasks and synchronize behaviors on the group. You attach multiple work items to a group and schedule them for asynchronous execution on the same queue or different queues. When all work items finish executing, the group executes its completion handler. You can also wait synchronously for all tasks in the group to finish executing.

... but when I tried it exactly as you described (enter before the task starts and leave in the completion handler), it never waited and with wait it caused some type of deadlock and I'm not sure exactly why.

Like the recursive approach bjhomer suggested?

Neph · May 10, 2023, 9:36am

Thanks for the warning, I already solved that with:

if let err = error as? URLError {
    //Handle URLError
} else {
    //Other error
}

What's the etiquette for replying to multiple posts in this forum? One "reply" per post, so it can be linked to it or just one post with multiple "@"?