Swift Combine in UIKit. URLSession dataTaskPublisher NSURLErrorDomain -1 for some users

After switching our API client to Combine, we started receiving reports from our users about the error "The operation couldn’t be completed (NSURLErrorDomain -1.)", which is the error.localizedDescription forwarded to the UI from our API client.

The top-level API call looks like this:

class SomeViewModel {
  private let serviceCategories: ICategoriesService
  private var cancellables = [AnyCancellable]()

  init(service: ICategoriesService) {
    self.serviceCategories = service
  }

  // ...

  // Yes, the block is ugly. We are only halfway through the migration to Combine
  func syncData(force: Bool = false, _ block: @escaping VoidBlock) {
    serviceCategories
      .fetch(force: force)
      .combineLatest(syncOrders(ignoreCache: force))
      .receive(on: DispatchQueue.main)
      .sink { [unowned self] completion in
        // bla-bla-bla
        // show alert on error
      }
      .store(in: &cancellables)
  }
}

The low-level API client call looks like this:

func fetch<R>(_ type: R.Type, at endpoint: Endpoint, page: Int, force: Bool) -> AnyPublisher<R, TheError> where R : Decodable {
  guard let request = request(for: endpoint, page: page, force: force) else {
    return Deferred { Future { $0(.failure(TheError.Network.cantEncodeParameters)) } }.eraseToAnyPublisher()
  }

  let decoder = JSONDecoder()
  decoder.keyDecodingStrategy = .convertFromSnakeCase

  return URLSession.shared
      .dataTaskPublisher(for: request)
      .subscribe(on: DispatchQueue.background)
      .tryMap { element in
        guard
          let httpResponse = element.response as? HTTPURLResponse,
          httpResponse.statusCode == 200 else
        { throw URLError(.badServerResponse) }
        
        return element.data
      }
      .decode(type: type, decoder: decoder)
      .mapError { error in
        // We map error to present in UI
        switch error {
        case is Swift.DecodingError:
          return TheError.Network.cantDecodeResponse
          
        default:
          return TheError(title: nil, description: error.localizedDescription, status: -2)
        }
      }
      .eraseToAnyPublisher()
}

In our analytics we can clearly see a chain of events:

  • application updated
  • application opened
  • main screen shown
  • alert shown (NSURLErrorDomain -1)
  • application backgrounded, then the user falls into an "opened, alert, backgrounded" loop, trying to restart or reinstall the app without success.

Our first thought was that the backend might be sending garbage to the client, but our server logs contain records for the API calls, correlated with the analytics logs by date and time, with HTTP status code 499.
So we can clearly determine this is not a server problem. We also have no reports or analytics records of this from users before this update.

Everything points to the new API client that was switched to Combine.

It looks like the session is dropped by the client for some reason, but at the same time it is not related to a memory release cycle: if the cancellable were released, the sink closure would never be executed and the alert message would not be shown.

Questions:

  • What can be wrong with this URLSession setup?
  • Have you faced similar behavior and managed to solve it?
  • Do you have any ideas on how to reproduce or at least simulate such an error with URLSession?

Notes:

  • We do not use SwiftUI
  • iOS versions vary from 14.8 to 15.0
  • 5 to 10% of users are affected
  • We never encountered this error during development or testing

is this error coming from (speaking in olde terms of the URLSession API) the error closure parameter or the response? it looks like it's coming from the latter, but it's worth double-checking.

in the above code, around the lines where you check for 200, change the code so it thinks it's getting 499.

it's a good idea (at least in debug builds) to show the exact error from URLResponses (and to treat those differently in logs from OS errors). something like this:

// request, logOrShowError, parse and httpError are placeholders, not real API
URLSession.shared.dataTask(with: request) { data, response, error in
    if let error = error {
        logOrShowError("OS level Error", error.localizedDescription) // or debugDescription
    } else if let error = response?.httpError {
        logOrShowError("URLResponse level Error", error.localizedDescription) // or debugDescription
    } else {
        do {
            let parsedObject = try parse(data)
            if let error = parsedObject.error { // errors could be at this level as well, depending upon a particular API
                logOrShowError("Data level Error", error.localizedDescription) // or debugDescription
            }
            // ...
        } catch {
            logOrShowError("Parsing Data Error", error.localizedDescription) // or debugDescription
        }
    }
}.resume()

where response.httpError is an extension that parses the HTTP status out of the response and converts it into an error. (i'd also use the "reason phrase" for further details about the error, but unfortunately it's not available in the URLSession API. i never understood why.)
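For illustration, a minimal sketch of what such an extension might look like. HTTPStatusError and httpError are hypothetical names (they are not part of Foundation), and treating every status outside 200...299 as an error is an assumption that may not match a particular API:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical error type carrying the HTTP status code.
struct HTTPStatusError: Error, CustomStringConvertible {
    let statusCode: Int

    var description: String {
        // localizedString(forStatusCode:) is the closest thing to a reason phrase
        // that Foundation offers; it is generated locally, not sent by the server.
        "HTTP \(statusCode): \(HTTPURLResponse.localizedString(forStatusCode: statusCode))"
    }
}

extension URLResponse {
    // Hypothetical helper: nil for non-HTTP responses and for 2xx statuses.
    var httpError: HTTPStatusError? {
        guard let http = self as? HTTPURLResponse else { return nil }
        return (200...299).contains(http.statusCode)
            ? nil
            : HTTPStatusError(statusCode: http.statusCode)
    }
}

// Responses built by hand, so no network access is needed to try it out.
let url = URL(string: "https://example.com")!
let response499 = HTTPURLResponse(url: url, statusCode: 499, httpVersion: nil, headerFields: nil)!
let response200 = HTTPURLResponse(url: url, statusCode: 200, httpVersion: nil, headerFields: nil)!
```

With this in place, response499.httpError carries the 499 status, while response200.httpError is nil.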

What are OS errors?

If you want to debug errors, then, generally, you need much more information than Error.localizedDescription provides. This property is meant to provide a short, simple description to the end user of the program. In your logs (which the end user doesn't see), you should use String.init(reflecting:) to get maximum information about the error.

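A small sketch of the difference, using a hand-made URLError(.unknown) as a stand-in for whatever the app actually receives. The exact strings depend on the platform and OS version, so none are shown here:

```swift
import Foundation

// Stand-in for the real failure; .unknown is the URLError code with raw value -1.
let sampleError: Error = URLError(.unknown)

// Short text meant for the end user.
let userFacing = sampleError.localizedDescription

// Debug-oriented text meant for logs and analytics.
let forLogs = String(reflecting: sampleError)

// Bridging to NSError exposes the domain and code separately,
// which is handy for grouping events in analytics.
let bridged = sampleError as NSError
print(userFacing)
print(forLogs)
print(bridged.domain, bridged.code)
```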

i'm merely talking about the difference between the error parameter and the error coming out of the response parameter in the completion handler (speaking in terms of the traditional version of URLSession's dataTask API). the very first thing to troubleshoot is to establish whether this error is the former or the latter. even when the request hits the server and the server responds with a 200-299 status, the client can still get an error, either in the error parameter or in the http response (e.g. caused by intermediate servers). of course it would be easier if there were a correlation between client errors and corresponding errors generated in server responses (afaiu from the original poster, this is not the case: the server doesn't generate error responses for those correlating client requests).

cc-ing @eskimo who's very fluent in this area.

Our server uses nginx, and 499 is a non-standard status code introduced by nginx for the case when a client closes the connection while nginx is processing the request.

It's also clear from the server logs that there is no reason for a timeout: no request took longer than 2 seconds, most requests take between 2 and 200 ms, and we use URLSession.shared, whose default timeout is 60 seconds.

My assumption is that it happens in the default case of the .mapError switch, since URLError(.badServerResponse) corresponds to code -1011 and not -1.
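To check that assumption, the error can be classified before it is turned into a UI message. The sketch below is only an illustration: ErrorSource and classify are hypothetical names, not part of the API client above.

```swift
import Foundation

// Hypothetical classification used only for logging.
enum ErrorSource: Equatable {
    case decoding
    case urlLoading(code: Int)
    case other
}

func classify(_ error: Error) -> ErrorSource {
    switch error {
    case is DecodingError:
        return .decoding
    case let urlError as URLError:
        // Covers both the error thrown in tryMap and failures reported by URLSession.
        return .urlLoading(code: urlError.code.rawValue)
    default:
        return .other
    }
}

let thrownInTryMap = classify(URLError(.badServerResponse)) // code -1011
let reportedAsMinusOne = classify(URLError(.unknown))       // code -1
```

Both cases would land in the default branch of the original switch, so logging the code separately is what tells them apart.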

As I had no other ideas, we released a hotfix with additional logs surrounding all possible cases and sent error.debugDescription to our analytics. Unfortunately, I didn't know about the recommendation to use String.init(reflecting: error), so that may be the next step.

If your use of DispatchQueue.background is an actual .background QoS queue, that may be your problem. Under certain circumstances background queues may not run, including when devices are in low power mode or when the app is in the background. It's possible that tasks created in those circumstances are never properly subscribed to and so may create internal errors from the DataTaskPublisher. Hard to say though without the source, but I would remove the subscribe(on:) anyway, as that doesn't really do anything for you. If you need the values from the publisher on a particular queue, and you should test its behavior first, you can use receive(on:).
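A rough sketch of the pipeline with subscribe(on:) removed and a single receive(on:) at the end. fetchData is a hypothetical, simplified stand-in for the fetch function above (no decoding, no custom error type):

```swift
import Combine
import Foundation

// Hypothetical simplified variant of the API client call.
func fetchData(for request: URLRequest,
               session: URLSession = .shared) -> AnyPublisher<Data, Error> {
    session
        .dataTaskPublisher(for: request)
        // No subscribe(on:) here: URLSession performs the networking off the main thread itself.
        .tryMap { element -> Data in
            guard
                let httpResponse = element.response as? HTTPURLResponse,
                httpResponse.statusCode == 200
            else { throw URLError(.badServerResponse) }
            return element.data
        }
        // Deliver values and completion on the main queue for UI work.
        .receive(on: DispatchQueue.main)
        .eraseToAnyPublisher()
}

// Building the publisher does not start a request; that only happens on subscription.
let publisher = fetchData(for: URLRequest(url: URL(string: "https://example.com")!))
```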


Thank you for your explanation.
It definitely looks like the reason for the issue. DispatchQueue.background is exactly

static let background = DispatchQueue.global(qos: .background)

as you guessed.

This is clearly my fault: a lack of knowledge and a misunderstanding of DispatchQoS. The documentation clearly says:

Assign this class to tasks or dispatch queues that you use to perform work while your app is running in the background.

Background can mean different things across Apple's platforms, unfortunately. You don't usually want to set a QoS anyway, so if you do want a separate queue, it's suggested you just start with DispatchQueue(label:) and test from there. Like I said, I don't think it's necessary in your case unless you want to send values back to a particular queue, like main.
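As a sketch of that suggestion, a private queue created without an explicit QoS. The label is a made-up example, and sync is used here only to keep the snippet deterministic:

```swift
import Foundation

// Hypothetical label; no qos: argument, so the system picks the priority.
let apiQueue = DispatchQueue(label: "com.example.api-client")

// Run a piece of work on the queue and wait for its result.
let itemCount = apiQueue.sync { [1, 2, 3].count }
print(itemCount)
```

In a Combine pipeline, a queue like this would be passed to receive(on:) if values need to arrive somewhere other than main.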

I misremembered. background QoS is supposed to be used in the background, but is considered discretionary and will be paused in low power mode (and probably other, undocumented conditions). While the scheduling of network requests may be discretionary, the handling of their results (and Combine's internal work) really isn't.
