Array of tuples or struct of arrays for classification?

I'm implementing a search algorithm, and I want to show the results from most relevant to least relevant. I have an array of profiles from which I want to get an OrderedSet of the profiles' IDs, in order of relevance. My classifications, from most relevant to least relevant, are:

  • completeMatch
  • firstWordMatch
  • notFirstWordMatch
  • firstWordContains
  • notFirstWordContains
  • socialContains

I can call profiles.reduce(into:) with some intermediate model as the accumulator to do this. However, I'm wondering what that intermediate model, which I would then convert into the OrderedSet of profile IDs, should be:

  • An array of tuples composed of the profile ID and the classification as an enum:
enum SearchResult: Int {
    case completeMatch = 0
    case firstWordMatch = 1
    case notFirstWordMatch = 2
    case firstWordContains = 3
    case notFirstWordContains = 4
    case socialContains = 5
}

let results: [(Profile.ID, SearchResult)] = profiles
    .reduce(into: [(Profile.ID, SearchResult)]()) { /* ... */ }

let searchResults: OrderedSet<Profile.ID> = .init(
    results
        .sorted { lhs, rhs in
            let (_, lhsClassification) = lhs
            let (_, rhsClassification) = rhs
            return lhsClassification.rawValue < rhsClassification.rawValue
        }
        .map { profileID, _ in profileID }
)
  • Or a struct holding each classification as an independent ordered set:
struct SearchResult {
    var completeMatch: OrderedSet<Profile.ID> = []
    var firstWordMatch: OrderedSet<Profile.ID> = []
    var notFirstWordMatch: OrderedSet<Profile.ID> = []
    var firstWordContains: OrderedSet<Profile.ID> = []
    var notFirstWordContains: OrderedSet<Profile.ID> = []
    var socialContains: OrderedSet<Profile.ID> = []

    func joined() -> OrderedSet<Profile.ID> {
        return completeMatch
            .union(firstWordMatch)
            .union(notFirstWordMatch)
            .union(firstWordContains)
            .union(notFirstWordContains)
            .union(socialContains)
    }
}

let results: SearchResult = profiles
    .reduce(into: SearchResult()) { /* ... */ }

let searchResults = results.joined()

My main concern is time complexity, and I can't quite seem to decide which is better between the two.

From a purely theoretical standpoint, the first approach is O(n log n) (bounded by the sorting step), while the second is O(n) (assuming that you append to each of the OrderedSets rather than prepending).

But I believe you can get the first example down to O(n) as well if you essentially combine the two approaches: first place the IDs into designated arrays (one per SearchResult enum case) and then join them into a set as you do in the second example, because there's no reason for a comparison-based sort here. There's also no reason to ensure uniqueness during the matching pass: since you put the results into a single set later anyway, hashing the same ID twice just costs unnecessary CPU time compared to simply appending to an array.

The best scenario, of course, is if you can ensure the uniqueness of your IDs before handing them off to the algorithm (which really should be possible, because they are, well, IDs, and if you're getting them from a database, it should already maintain uniqueness for you). Then you need no sets at all, and thus no time spent on hashing.

I'm not quite sure what you mean by this – do you mean each enum case should have an associated value which would be an array?

I mean that you can do something like your struct from the second example, but use arrays instead of sets, i.e.:

var completeMatch: [Profile.ID] = []

as you don't have to ensure uniqueness at this step, since you combine everything into a single set later on anyway. To make things less verbose, you can instead use

var buckets: [SearchResult: [Profile.ID]] =
    Dictionary(uniqueKeysWithValues: SearchResult.allCases.map { ($0, []) })

— which is basically your standard data structure for a bucket sort — as long as you conform the SearchResult enum to Hashable and CaseIterable.
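
For illustration, the matching pass that fills these buckets could then look something like the following. Note that classify(_:) is a hypothetical stand-in for the matching logic elided as /* ... */ in your examples, not an existing function:

// CaseIterable has to be declared (in the same file as the enum);
// an enum without associated values is already Hashable automatically.
extension SearchResult: CaseIterable {}

for profile in profiles {
    // classify(_:) is a hypothetical helper that returns the match kind
    // for this profile, or nil when the profile doesn't match at all.
    if let matchKind = classify(profile) {
        buckets[matchKind, default: []].append(profile.id)
    }
}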

Thank you, this is a great solution. However, does uniqueKeysWithValues preserve the order of the SearchResult keys? Will the result be [.completeMatch: [...], .firstWordMatch: [...], .notFirstWordMatch: [...], etc.], or will the keys be mixed up?

If this is the case, I guess I could use an OrderedDictionary from swift-collections, unless there is a way to efficiently sort the keys with a normal dictionary.

It doesn't, but you don't need it, as your enum definition already encodes the ordering (either with the declaration order of the cases or with their Int raw values, which in your case match up). If you need to walk the buckets in this order, just do this:

for matchKind in SearchResult.allCases {
    let bucket = buckets[matchKind] ?? []  // the subscript returns an Optional
    // ...
}

This will iterate over the buckets in the order completeMatch -> firstWordMatch -> and so on.

I mean, you could use an OrderedDictionary too, but I don't see an immediately compelling reason for it: nothing suggests that you need to store the key-value pairs in this order; you only need to access them in that order.
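
That said, if you did decide you want the storage order preserved as well, OrderedDictionary from swift-collections supports the same kind of construction; a minimal sketch:

import OrderedCollections

// An OrderedDictionary keeps its keys in insertion order, so iterating
// it directly visits the buckets in allCases order.
var orderedBuckets = OrderedDictionary(
    uniqueKeysWithValues: SearchResult.allCases.map { ($0, [Profile.ID]()) }
)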

I see – so IIUC you mean doing something like this:

let searchResults = SearchResult.allCases
    .reduce(into: [Profile.ID]()) { array, matchKind in
        array.append(contentsOf: buckets[matchKind] ?? [])
    }

Yep, that would be one of the possible implementations of the "merge" step, where you combine all your search results into a single output array.
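
As an aside, the same merge can also be collapsed into a single flatMap, since flatMap concatenates the sub-arrays in iteration order:

let searchResults = SearchResult.allCases.flatMap { buckets[$0] ?? [] }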

To be clear, though, I'm not suggesting anything in particular, as I don't fully understand your goal (if the ultimate task were to enable robust textual search on profile IDs, which I assume to be strings persisted in some sort of database, I would have approached it quite differently); I'm just suggesting some surface-level improvements to the example code you provided. There are a lot of different ways you could go about this, but frankly, if your dataset doesn't exceed a thousand, or maybe even ten thousand, items, any performance gain discussed here may well turn out to be marginal.

Indeed, the search feature isn't one of the primary features of the app, so I didn't want to make the algorithm so complex that it becomes hard to test and maintain. I'm quite happy with the implementation I have now, so thank you very much for your help. However, I would still love to hear about your different approach, since I'm sure I'll be creating many more search features in the future.

To give you a bit more context: the data is stored on-device and consists of individual contacts, each with a name and surname and a list of social media accounts. The profile IDs are just the unique UUIDs assigned to each contact / profile (I'm using value types, so I don't have the concept of object identity).
