Array of tuples or struct of arrays for classification?

I'm implementing a search algorithm, and I want to show the results from most relevant to least relevant. I have an array of profiles from which I want to get an OrderedSet of the profiles' IDs, in order of relevance. My classifications, from most relevant to least relevant, are:

  • completeMatch
  • firstWordMatch
  • notFirstWordMatch
  • firstWordContains
  • notFirstWordContains
  • socialContains

I can call profiles.reduce(into:) with some intermediate model as the accumulator to do this. However, I'm wondering what that intermediate model, which I would then convert into the OrderedSet of profile IDs, should be:

  • An array of tuples composed of the profile ID and the classification as an enum:
enum SearchResult: Int {
    case completeMatch = 0
    case firstWordMatch = 1
    case notFirstWordMatch = 2
    case firstWordContains = 3
    case notFirstWordContains = 4
    case socialContains = 5
}

let results: [(Profile.ID, SearchResult)] = profiles
    .reduce(into: [(Profile.ID, SearchResult)]()) { /* ... */ }

let searchResults: OrderedSet<Profile.ID> = .init(
    results
        .sorted { lhs, rhs in
            let (_, lhsClassification) = lhs
            let (_, rhsClassification) = rhs
            return lhsClassification.rawValue < rhsClassification.rawValue
        }
        .map { profileID, _ in profileID }
)
  • Or a struct holding each classification as an independent ordered set:
struct SearchResult {
    var completeMatch: OrderedSet<Profile.ID> = []
    var firstWordMatch: OrderedSet<Profile.ID> = []
    var notFirstWordMatch: OrderedSet<Profile.ID> = []
    var firstWordContains: OrderedSet<Profile.ID> = []
    var notFirstWordContains: OrderedSet<Profile.ID> = []
    var socialContains: OrderedSet<Profile.ID> = []

    func joined() -> OrderedSet<Profile.ID> {
        return completeMatch
            .union(firstWordMatch)
            .union(notFirstWordMatch)
            .union(firstWordContains)
            .union(notFirstWordContains)
            .union(socialContains)
    }
}

let results: SearchResult = profiles
    .reduce(into: SearchResult()) { /* ... */ }

let searchResults = results.joined()

My main concern is time complexity, and I can't quite seem to decide which is better between the two.

From a purely theoretical standpoint, the first approach is O(n log n) (bounded by the sorting step), while the second is O(n) (assuming that you append to each of the OrderedSets rather than prepending).

But I believe you can get the first example down to O(n) as well if you essentially combine the two approaches: first place the IDs into designated arrays (one per SearchResult enum case) and then join them into a set as you do in the second example, because there's no reason for a comparison-based sort here. There's also no reason to ensure uniqueness during the matching pass: since you put the results into a single set later anyway, hashing the same ID twice just costs unnecessary CPU time compared to simply appending to an array.

The best scenario, of course, is if you can ensure the uniqueness of your IDs before handing them off to the algorithm (which really should be possible, because they are, well, IDs, and if you're getting them from a database, it should already maintain uniqueness for you). Then you need no sets at all, and thus no time spent on hashing.

I'm not quite sure what you mean by this – do you mean each enum case should have an associated value which would be an array?

I mean that you can do something like your struct from the second example, but use arrays instead of sets, i.e.:

var completeMatch: [Profile.ID] = []

as you don't have to ensure uniqueness at this step, since you combine everything into a single set later on anyway. To make things less verbose, you can instead use

var buckets: [SearchResult: [Profile.ID]] =
    Dictionary(uniqueKeysWithValues: SearchResult.allCases.map { ($0, []) })

— which is basically your standard data structure for a bucket sort — as long as you conform the SearchResult enum to Hashable and CaseIterable.
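
For illustration, the matching pass that fills these buckets could then look something like the following. Note that classify(_:) is a hypothetical stand-in for the matching logic elided as /* ... */ in your examples, not an existing function:

// CaseIterable has to be declared (in the same file as the enum);
// an enum without associated values is already Hashable automatically.
extension SearchResult: CaseIterable {}

for profile in profiles {
    // classify(_:) is a hypothetical helper that returns the match kind
    // for this profile, or nil when the profile doesn't match at all.
    if let matchKind = classify(profile) {
        buckets[matchKind, default: []].append(profile.id)
    }
}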

Thank you, this is a great solution. However, does uniqueKeysWithValues preserve the order of the SearchResult keys? Will the result be [.completeMatch: [...], .firstWordMatch: [...], .notFirstWordMatch: [...], etc.], or will the keys be mixed up?

If this is the case, I guess I could use an OrderedDictionary from swift-collections, unless there is a way to efficiently sort the keys with a normal dictionary.

It doesn't, but you don't need it, as your enum definition already encodes the ordering (either with the declaration order of the cases or with their Int raw values, which in your case match up). If you need to walk the buckets in this order, just do this:

for matchKind in SearchResult.allCases {
    let bucket = buckets[matchKind] ?? []  // the subscript returns an Optional
    // ...
}

This will iterate over the buckets in the order completeMatch -> firstWordMatch -> and so on.

I mean, you could use an OrderedDictionary too, but I don't see an immediately compelling reason for it: nothing suggests that you need to store the key-value pairs in this order; you only need to access them in that order.
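
That said, if you did decide you want the storage order preserved as well, OrderedDictionary from swift-collections supports the same kind of construction; a minimal sketch:

import OrderedCollections

// An OrderedDictionary keeps its keys in insertion order, so iterating
// it directly visits the buckets in allCases order.
var orderedBuckets = OrderedDictionary(
    uniqueKeysWithValues: SearchResult.allCases.map { ($0, [Profile.ID]()) }
)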

I see – so IIUC you mean doing something like this:

let searchResults = SearchResult.allCases
    .reduce(into: [Profile.ID]()) { array, matchKind in
        array.append(contentsOf: buckets[matchKind] ?? [])
    }

Yep, that would be one of the possible implementations of the "merge" step, where you combine all your search results into a single output array.
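
As an aside, the same merge can also be collapsed into a single flatMap, since flatMap concatenates the sub-arrays in iteration order:

let searchResults = SearchResult.allCases.flatMap { buckets[$0] ?? [] }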

To be clear, though, I'm not suggesting anything in particular, as I don't fully understand your goal (if the ultimate task were to enable robust textual search on profile IDs, which I assume to be strings persisted in some sort of database, I would have approached it quite differently); I'm just suggesting some surface-level improvements to the example code you provided. There are a lot of different ways you could go about this, but frankly, if your dataset doesn't exceed a thousand, or maybe even ten thousand, items, any performance gain discussed here may well turn out to be marginal.

Indeed, the search feature isn't one of the primary features of the app, so I didn't want to make the algorithm so complex that it becomes hard to test and maintain. I'm quite happy with the implementation I have now, so thank you very much for your help. However, I would still love to hear about your different approach, since I'm sure I'll be creating many more search features in the future.

To give you a bit more context: the data is stored on-device and consists of individual contacts, each with a name and surname and a list of social media accounts. The profile IDs are just the unique UUIDs assigned to each contact / profile (I'm using value types, so I don't have the concept of object identity).
