Add splitMap function to Swift standard library

Dmitriy_Ignatyev · January 8, 2024, 1:50am

Hello Swift community

I have an idea to propose a splitMap function for the Sequence protocol, but first I'd like to gather feedback and thoughts from the community. In its simplest form, splitMap is:

extension Collection {
  public func splitMap<T1, T2, E: Error>(_ transform: (Element) throws(E) -> Either<T1, T2>) rethrows -> ([T1], [T2]) {
    var groupA: [T1] = []
    var groupB: [T2] = []
    
    for element in self {
      switch try transform(element) {
      case .left(let a): groupA.append(a)
      case .right(let b): groupB.append(b)
      }
    }
    
    return (groupA, groupB)
  }
}

extension ObservableType {
  public func splitMap<U1, U2>(_ predicate: @escaping (Element) throws -> Either<U1, U2>)
    -> (matches: Observable<U1>, nonMatches: Observable<U2>) {
    let stream = map(predicate).share()
    
    let hits = stream.compactMap { variant -> U1? in
      switch variant {
      case .left(let values): return values
      case .right: return nil
      }
    }
    
    let misses = stream.compactMap { variant -> U2? in
      switch variant {
      case .left: return nil
      case .right(let element): return element
      }
    }
    
    return (hits, misses)
  }
}

In my own projects I use it with Swift collections, RxSwift, and Combine. It could also be added to AsyncSequence / AsyncChannel.
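Neither snippet above shows the Either type, so here is a minimal self-contained sketch for trying the idea out. The two-case Either enum and the sample data are assumptions made for illustration, and the extension is written against Sequence rather than Collection, since only iteration is required.

```swift
// Assumed definition: the post does not show `Either`, so this two-case enum is a guess.
enum Either<Left, Right> {
  case left(Left)
  case right(Right)
}

extension Sequence {
  // Same shape as the Collection version in the post, written against Sequence.
  func splitMap<T1, T2>(
    _ transform: (Element) throws -> Either<T1, T2>
  ) rethrows -> ([T1], [T2]) {
    var groupA: [T1] = []
    var groupB: [T2] = []
    for element in self {
      switch try transform(element) {
      case .left(let a): groupA.append(a)
      case .right(let b): groupB.append(b)
      }
    }
    return (groupA, groupB)
  }
}

// Hypothetical input: strings that may or may not be port numbers.
let rawPorts = ["80", "http", "443", ""]
let (ports, invalid) = rawPorts.splitMap { text -> Either<Int, String> in
  if let port = Int(text) { return .left(port) }
  return .right(text)
}
print(ports)    // [80, 443]
print(invalid)  // ["http", ""]
```

Each output array gets its own element type, which is the point of mapping and splitting in one pass.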

Usage examples:

example 1

let (cellularTypes, unknownCarrierTypes) = carrierTypesDict.values
      .splitMap { carrierType -> Either<ReachabilityStatus.Cellular, String> in
        switch carrierType {
        case CTRadioAccessTechnologyGPRS:
          return .left(.cellularGPRS)
        case CTRadioAccessTechnologyEdge:
          return .left(.cellularEDGE)
        case CTRadioAccessTechnologyCDMA1x:
          return .left(.cellular2G)
        case CTRadioAccessTechnologyHSDPA:
          return .left(.cellularHSDPA)
        case CTRadioAccessTechnologyHSUPA:
          return .left(.cellularHSUPA)
        case CTRadioAccessTechnologyWCDMA,
             CTRadioAccessTechnologyCDMAEVDORev0,
             CTRadioAccessTechnologyCDMAEVDORevA,
             CTRadioAccessTechnologyCDMAEVDORevB,
             CTRadioAccessTechnologyeHRPD:
          return .left(.cellular3G)
        case CTRadioAccessTechnologyLTE:
          return .left(.cellularLTE)
        default:
          switch carrierType {
          case CTRadioAccessTechnologyNRNSA,
               CTRadioAccessTechnologyNR:
            return .left(.cellular5G)
          default:
            return .right(carrierType)
          }
        }
      }

example 2

let (activeProducts, inactiveProducts) = accumulator.hashedProductVMs
  .splitMap { rawViewModel -> Either<ProductVM, InactiveProductVM> in
    switch rawViewModel.state {
    case .ordinaryNotActive:
      return .right(rawViewModel.copy(transformedState: InactiveProductVMState.ordinaryNotActive(Empty())))

    case .ordinaryWithDiscountForAmount(let params):
      return .left(rawViewModel.copy(transformedState: ProductVMState.ordinaryWithDiscountForAmount(params)))
    case .ordinarySingularPrice(let params):
      return .left(rawViewModel.copy(transformedState: ProductVMState.ordinarySingularPrice(params)))
    case .gift(let params):
      return .left(rawViewModel.copy(transformedState: ProductVMState.gift(params)))
    }
  }

example 3

let (certificatesInstances, failedCertificates) = certificates
  .splitMap { certificateData -> Either<SecCertificate, Data> in
    if let certificateInstance = SecCertificateCreateWithData(nil, certificateData as CFData) {
      return .left(certificateInstance)
    } else {
      return .right(certificateData)
    }
  }

example 4 (Rx)

let (routeToOnlinePayment, routeToCashPayment): (Observable<(UInt64, NSDecimalNumber, PaymentInfo)>, Observable<Void>) = responses.paymentKindUpdated
  .filter { $0.isPaymentPossible }
  .splitMap { paymentData -> Either<(UInt64, NSDecimalNumber, PaymentInfo), Void> in
    if let paymentInfo = paymentData.paymentInfo {
      return .left((paymentData.orderId, NSDecimalNumber(value: paymentData.totalPrice), paymentInfo))
    } else {
      return .right(Void())
    }
  }

example 5 (Rx)

let (oAuthCodeRecoverableError, oAuthCodeUnrecoverableError) = authCodeErrorEvent
  .splitMap(AuthCodeErrorHelper.splitMapAuthCodeError(_:)) // `splitMapAuthCodeError(_:)` contains a lot of boilerplate

There are plenty of other examples but I think it's enough for now.

The topics to discuss are:

Is this useful to others? Maybe I run into such cases often because of the specifics of my projects.
Should it be added to the standard library, Swift Algorithms, Foundation, or somewhere else?
Can it be implemented for Sequence with three or more generic parameters, and if so, how? Variadic generics don't seem to suit this, or at least I don't know how to do it with them.
Is the name splitMap suitable, or should something else be picked (e.g. partition)?

sveinhal · January 8, 2024, 12:44pm

Is Dictionary(grouping:by:) useful as an alternative in some of your use cases?

tevelee · January 8, 2024, 1:36pm

The various partitioning APIs in swift-algorithms might also cover your use-case

https://swiftpackageindex.com/apple/swift-algorithms/1.2.0/documentation/algorithms/swift/sequence/partitioned(by:)

Dmitriy_Ignatyev · January 8, 2024, 1:39pm

Dictionary(grouping:by:) is for cases where the number of groups is not fixed and only becomes known at runtime.

splitMap is for cases where the number of groups is known statically, at compile time. See the examples with
let (activeProducts, inactiveProducts) =
and
let (oAuthCodeRecoverableError, oAuthCodeUnrecoverableError) =.
When writing the code we already know that an error is either recoverable or unrecoverable; only two variants are meaningful for the task.

So Dictionary(grouping:by:) is helpful in other situations.
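To make the contrast concrete, here is a small sketch of Dictionary(grouping:by:) on hypothetical sample data. The keys are ordinary runtime values, every group keeps the original element type, and each lookup is optional because the compiler cannot know which keys exist.

```swift
// Hypothetical input: strings that may or may not parse as integers.
let inputs = ["1", "two", "3"]

// The result type is [Bool: [String]]: groups are keyed at runtime
// and the elements are not transformed.
let groups = Dictionary(grouping: inputs) { Int($0) != nil }

// A group may be absent, so each lookup needs a fallback.
let parsable = groups[true] ?? []
let unparsable = groups[false] ?? []

print(parsable)    // ["1", "3"]
print(unparsable)  // ["two"]
```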

For simplicity I didn't write more complex examples, but in our codebase we also have a splitMap with 3 branches: func splitMap<A, B, C>(_ predicate: @escaping (Element) throws -> OneOfThree<A, B, C>).
Though I personally haven't needed to split into more than 3 branches, I still believe it is a good idea to write a generic solution that covers any number of branches.
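A three-branch variant might look like the sketch below. The OneOfThree case names and the Sequence-based, non-escaping signature are assumptions for illustration; only the type name and the generic parameters come from the signature quoted above.

```swift
// Assumed definition: only the name `OneOfThree<A, B, C>` appears in the thread.
enum OneOfThree<A, B, C> {
  case first(A)
  case second(B)
  case third(C)
}

extension Sequence {
  // Routes each transformed element into one of three typed arrays.
  func splitMap<A, B, C>(
    _ transform: (Element) throws -> OneOfThree<A, B, C>
  ) rethrows -> ([A], [B], [C]) {
    var groupA: [A] = []
    var groupB: [B] = []
    var groupC: [C] = []
    for element in self {
      switch try transform(element) {
      case .first(let a): groupA.append(a)
      case .second(let b): groupB.append(b)
      case .third(let c): groupC.append(c)
      }
    }
    return (groupA, groupB, groupC)
  }
}

// Hypothetical input: tokens that are integers, decimals, or plain words.
let tokens = ["42", "3.5", "abc", "7"]
let (ints, doubles, words) = tokens.splitMap { token -> OneOfThree<Int, Double, String> in
  if let i = Int(token) { return .first(i) }
  if let d = Double(token) { return .second(d) }
  return .third(token)
}
print(ints)     // [42, 7]
print(doubles)  // [3.5]
print(words)    // ["abc"]
```

Every extra branch needs its own enum and its own overload, which is why a solution generic over the number of branches would be attractive.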

Dmitriy_Ignatyev · January 8, 2024, 1:52pm

partitioned(by:) returns a tuple (falseElements: [Element], trueElements: [Element]) in which the elements of both arrays have the same type.
It is not useful for the examples provided because it only splits elements without mapping them. None of the examples can be written with partitioned(by:).

In other words, splitMap covers every situation where partitioned(by:) can be used, plus other cases. It is the more general solution.
At the same time, I'm not suggesting that partitioned(by:) be removed or replaced by splitMap.

bjhomer · January 8, 2024, 2:38pm

Are there things splitMap can do that could not be done with .map().partitioned(by:)?

Dmitriy_Ignatyev · January 8, 2024, 3:02pm

@bjhomer Try to express any of the provided examples that way. I have no idea how it could be done with a combination of map + partition.

bjhomer · January 8, 2024, 3:39pm

Ah, I see. You could do it by partitioning into two groups first and then running .map on each partition separately, but your proposed splitMap does make that use case easier to express.

Dmitriy_Ignatyev · January 8, 2024, 3:55pm

That approach requires force casts, force unwraps, force try, and the like.
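A sketch of the partition-then-map approach shows where the force unwrap comes from. To stay self-contained it uses the standard library's filter as a stand-in for partitioned(by:), on hypothetical sample data.

```swift
// Hypothetical input: strings that may or may not parse as integers.
let inputs = ["1", "two", "3"]

// Step 1: partition by a predicate. `filter` stands in for `partitioned(by:)`.
let parsable = inputs.filter { Int($0) != nil }
let unparsable = inputs.filter { Int($0) == nil }

// Step 2: map each partition separately. The type system no longer records that
// every element of `parsable` converts, so the conversion is repeated and force-unwrapped.
let numbers = parsable.map { Int($0)! }

print(numbers)     // [1, 3]
print(unparsable)  // ["two"]
```

The knowledge gained by the predicate in step 1 is lost before step 2, whereas a single transform returning an Either-like value carries it through in the types.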

scanon · January 8, 2024, 4:34pm

This is a thing that we want to be possible with variadics (and I would probably not add it to the standard library proper until it could be done in such a manner), so this points at variadics features or documentation that's missing. Can you either file a bug report or make a separate thread focused on this?

In the meantime, this might make sense for algorithms. In some sense it's "just" a reduce, but half of algorithms is "just a reduce", so I think I'm OK with that. @nnnnnnnn?

(There's also the question of using this as a backdoor introduction for Either that would have to be resolved somehow...)
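As a rough illustration of the "just a reduce" remark, the same split can be spelled with map followed by reduce(into:). The Either enum and the sample data here are assumptions made for the sketch.

```swift
// Assumed two-case enum; the thread does not show its definition.
enum Either<Left, Right> {
  case left(Left)
  case right(Right)
}

// Hypothetical input: strings that may or may not parse as integers.
let inputs = ["1", "two", "3"]

let (numbers, rejects) = inputs
  .map { text -> Either<Int, String> in
    if let n = Int(text) { return .left(n) } else { return .right(text) }
  }
  // The accumulator is a tuple of two arrays, one per branch.
  .reduce(into: (lefts: [Int](), rights: [String]())) { acc, item in
    switch item {
    case .left(let n): acc.lefts.append(n)
    case .right(let s): acc.rights.append(s)
    }
  }

print(numbers)  // [1, 3]
print(rejects)  // ["two"]
```

A dedicated method hides the accumulator and the switch, which is the same trade-off many existing algorithms make.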

nnnnnnnn · January 8, 2024, 8:44pm

This does seem like it matches the scope of some of the other additions in the Algorithms library. I agree that having Either in the public API poses a little bit of a challenge, though that's mitigated since it's transitory (the result of splitMap doesn't include Either).

The Algorithms library already has an internal Either type that we could make public for this purpose, along with an EitherSequence (I can't remember why they're separate). It'd be great to eventually have that functionality in the stdlib, but I don't think that future change needs to stand in the way of its use here.

Dmitriy_Ignatyev · January 9, 2024, 12:21am

I've created a separate thread:

PS: I'm not sure the topic name is accurate; please suggest a better one if appropriate.

Dmitriy_Ignatyev · January 9, 2024, 12:40am

Should I make a PR? Should it be added with Either only, or is a OneOfThree overload also useful from your point of view?
The Either type is useful on its own. However, once it becomes public, I suppose some people will be confused by Either: Comparable. There are two points here: first, that Either is Comparable at all, which can be surprising; second, the concrete implementation of Comparable in the library.
It might be a good idea:

to make it __Either, with underscores, and document that it is for internal use.
to make a general-purpose public Either plus an additional Either-like type for specific needs (such as one with a Comparable implementation for a concrete task, which may be incorrect or unusable as a general solution).

I'm not sure about the splitMap name. The existing split functions separate elements by a separator, so semantically a name like partitionMap seems more consistent with current naming, while splitMap might be more natural in a general sense. What are your thoughts?

nnnnnnnn · January 9, 2024, 6:17pm

Taking your questions in reverse!

For the name, I agree that partitionMap is a better match with the behavior of other algorithms. "split" is in a family (with "chunked" and "windows") of methods that divide a collection into subsequences but otherwise maintain their order, with split dropping the separator element(s). This new method is a partition with an added map step baked in for ergonomic reasons.

The Comparable conformance is there so that Either can be used as an index for the EitherSequence type. Rather than designing a proper Either addition to the Algorithms package (it should also be Hashable, Sendable, etc… should it conform to Error?), we can just add a single-purpose enum for this, named something like PartitionMapResult with a first and second case that will map to the position in the resulting tuple. I think that would be a better strategy for this limited purpose.
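A sketch of what such a single-purpose enum and method could look like is below. The case names follow the first/second suggestion above; everything else, including the generic parameter names and the sample data, is an assumption and not the package's actual API.

```swift
// Hypothetical single-purpose result type with `first` and `second` cases.
enum PartitionMapResult<First, Second> {
  case first(First)
  case second(Second)
}

extension Sequence {
  // Each case maps to the matching position in the returned tuple.
  func partitionMap<First, Second>(
    _ transform: (Element) throws -> PartitionMapResult<First, Second>
  ) rethrows -> ([First], [Second]) {
    var firsts: [First] = []
    var seconds: [Second] = []
    for element in self {
      switch try transform(element) {
      case .first(let value): firsts.append(value)
      case .second(let value): seconds.append(value)
      }
    }
    return (firsts, seconds)
  }
}

// Hypothetical input: sensor readings where negative values are invalid.
let readings = [12, -1, 7, -5]
let (valid, rejected) = readings.partitionMap { value -> PartitionMapResult<UInt, String> in
  value >= 0 ? .first(UInt(value)) : .second("negative reading: \(value)")
}
print(valid)     // [12, 7]
print(rejected)  // ["negative reading: -1", "negative reading: -5"]
```

Because the enum only appears in the closure's return type, it needs none of the conformances a general-purpose Either would.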

A PR would be welcome!

Dmitriy_Ignatyev · January 10, 2024, 2:45am

I've made a draft implementation in swift-algorithms/Sources/Algorithms/PartitionMap.swift at partitionMap · iDmitriyy/swift-algorithms · GitHub
Please share any suggestions for moving this toward a final implementation, test coverage, and a PR.

jaredgrubb · January 20, 2024, 5:54pm

Personally, I really don't love having to write this new type, eg, taking the example given:

.partitionMap { error -> PartitionMapResult2<URLSessionError, any Error> in ... }

Can it take the A/B types as args directly so that the return type can be inferred:

.partitionMap(URLSessionError.self, Error.self) { error in ... }

(Insert bikeshedding if arg labels are helpful here)

Dmitriy_Ignatyev · January 20, 2024, 9:53pm

Yes, unfortunately the full type needs to be specified, which is a bit ugly. Of course the A/B types can be passed as arguments. I can add overloads as a temporary solution while waiting for future improvements to the generics system, but I don't know whether that is an acceptable solution, as I'm not a library maintainer.
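One way such an overload could look is sketched below. The PartitionMapResult2 name comes from the example quoted above; its case names, the unlabeled metatype parameters, and the sample data are assumptions and not the draft's actual API.

```swift
// Assumed shape of the result enum; only its name appears in the thread.
enum PartitionMapResult2<A, B> {
  case first(A)
  case second(B)
}

extension Sequence {
  // Base version: the closure's return type must be spelled out at the call site.
  func partitionMap<A, B>(
    _ transform: (Element) throws -> PartitionMapResult2<A, B>
  ) rethrows -> ([A], [B]) {
    var firsts: [A] = []
    var seconds: [B] = []
    for element in self {
      switch try transform(element) {
      case .first(let a): firsts.append(a)
      case .second(let b): seconds.append(b)
      }
    }
    return (firsts, seconds)
  }

  // Overload: the metatype arguments pin A and B, so the closure needs no annotation.
  func partitionMap<A, B>(
    _: A.Type, _: B.Type,
    _ transform: (Element) throws -> PartitionMapResult2<A, B>
  ) rethrows -> ([A], [B]) {
    try partitionMap(transform)
  }
}

// Hypothetical input: strings that may or may not parse as integers.
let inputs = ["1", "two", "3"]
let (numbers, rejects) = inputs.partitionMap(Int.self, String.self) { text in
  if let n = Int(text) { return .first(n) } else { return .second(text) }
}
print(numbers)  // [1, 3]
print(rejects)  // ["two"]
```

The metatype values are never read; they exist only to fix the generic parameters.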