Run Swift function inside SQL query using GRDB

I have a table named WordBox that contains a column named menukad. I can filter the wordboxes for which menukads start with the string $searchText with this simple query:

static func matchingMenukads(_ searchText: String) -> QueryInterfaceRequest<WordBox> {
    var query = WordBox
        .filter(Column("menukad").like("\(searchText)%"))
}

Before going further, just some context:
in hebrew, a menukad is a word written with its nekkudot. What is it ?
As some may know, in hebrew, the vowel sounds are written as little dots or bars above or below the actual characters, (eg like so ָ ‎ ‎ֲ ). These are called nekkudot (which mean "dots" in hebrew).
Example for the verb 'to eat' (pronounced le'ekhol):
לֶאֱכוֹל with nekkudot
לאכול without nekkudot

When we write in hebrew, we never use the nekkudot.
I explain all of this because I am building a search engine that should look for $searchText inside several columns of my database, including the menukad column.
The menukad column contains the words with nekkudot but, as we rarely write the nekkudot in hebrew, the $searchText will often not contain nekkudot. So I need to be able to match $searchText with menukads but without nekkudot.

Technically, nekkudot are encoded as UTF-8 characters so you can write a String method that removes the nekkudot from its string:

import Foundation

extension String {
    func removeNekkudot() -> String {
        let pattern = "\\p{M}" // matches Unicode characters classified as "marks," which includes diacritics such as nekkudot in Hebrew
        let regex = try! NSRegularExpression(pattern: pattern, options: [])
        let range = NSRange(location: 0, length: self.utf16.count)
        return regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: "")
    }
}

So I want to match my $searchText with menukad with or menukad without nekkudot

First solution: I could duplicate the menukad column and remove all the nekkudot from it. It would work but I would duplicate data...

Question
So I would like to know if I coukd use removeNekkudot to efficiently match my menukad fields with my searchText directly in the SQL ? Is that feasible using GRDB ?

Doing it in SQL directly is the right way to go. Duplicating the column is not only more storage but (even worse IMO) allows the db to get into invalid states. Doing it in code would require you to fetch the entire table and then filter the result which is not going to scale.

So what you really need isn't support from GRDB but from sqlite itself. GRDB will then let you drop down to raw SQL so you can write the appropriate query.

You can insert raw WHERE clauses using GRDB. See here.

On the sqlite level, I think what you want is an application defined function. See here.

My understanding (I've never done it myself) is that you use the API covered in that link to create a string function with a name, say "REMOVEMODS", and a function pointer that points to your implementation in code, then you use it in a SQL by its name:

SELECT * FROM WordBox WHERE REMOVEMODS(menukad) LIKE "..."

Then you can use it in GRDB as:

  let request = WordBox.filter(literal: "REMOVEMODS(menukad) LIKE \"\(searchText)\"%")
1 Like

Thank you for validating that it is not the correct approach to duplicate my column.
I followed your advice and I setup an SQL function in my app setup as so:

struct AppDatabase {
    //TODO: Set this back to private after migrations
    public let dbWriter: any DatabaseWriter
    
    init(_ dbWriter: any DatabaseWriter) throws {
        self.dbWriter = dbWriter
        
        if isNewDatabase() {
            try migrator.migrate(dbWriter)
        }
    }
    
    private static let sqlLogger = OSLog(subsystem: Bundle.main.bundleIdentifier!, category: "SQL")
    
    static let removeNekkudotFunction = DatabaseFunction("remove_nekkudot", argumentCount: 1, pure: true) { values in
        guard let string = String.fromDatabaseValue(values[0]) else {
            return nil
        }
        return string.removeNekkudot()
    }

    public static func makeConfiguration(_ base: Configuration = Configuration()) -> Configuration {
        var config = base
        
        config.prepareDatabase { db in
            db.add(function: removeNekkudotFunction)
            
            if ProcessInfo.processInfo.environment["SQL_TRACE"] != nil {
                db.trace {
                    os_log("%{public}@", log: sqlLogger, type: .debug, String(describing: $0))
                }
            }
        }
        
#if DEBUG
        config.publicStatementArguments = true
#endif
        
        return config
    }
    
    private var migrator: DatabaseMigrator {
        var migrator = DatabaseMigrator()
        
#if DEBUG
        migrator.eraseDatabaseOnSchemaChange = true
#endif
        
        // Register migrations
        Migration_01_CreateTables.performMigration(migrator: &migrator)
        
        return migrator
    }
    
    private func isNewDatabase() -> Bool {
        var isNew = false
        try? dbWriter.read { db in
            isNew = try db.tableExists("wordBox") == false
        }
        NSLog(isNew ? "Database is new" : "Database is not new")
        return isNew
    }
}

extension AppDatabase {
    static let shared = makeShared()
    
    private static func makeShared() -> AppDatabase {
        do {
            guard let databaseURL = Bundle.main.url(forResource: "db", withExtension: "sqlite") else {
                fatalError("Database file not found in bundle.")
            }
            
            NSLog("Database file path: \(databaseURL.path)")
            
            let dbPool = try DatabasePool(
                path: databaseURL.path,
                configuration: AppDatabase.makeConfiguration()
            )
            
            let appDatabase = try AppDatabase(dbPool)
            return appDatabase
        } catch {
            fatalError("Unresolved error \(error)")
        }
    }
}

extension AppDatabase {
    var reader: DatabaseReader {
        dbWriter
    }
}

It now works but I still have an issue, because I actually didn't say everything: I need to integrate my SQL function inside the following big query:

let query = Card
    .joining(optional: Card.subcards.aliased(subcardAlias)
        .joining(optional: SubCard.content.aliased(wordboxAlias)
            .joining(optional: WordBox.translation.aliased(translationAlias))))
    .joining(optional: Card.header.aliased(headerAlias)
        .joining(optional: WordBox.translation.aliased(headerTranslationAlias)))
    .filter(
        wordboxAlias[Column("transcription")].like("\(searchText)%")
        || AppDatabase.removeNekkudotFunction(wordboxAlias[Column("menukad")]).like("\(searchText.removeNekkudot())%")
        || translationAlias[Column("body")].like("\(searchText)%")
        || headerAlias[Column("transcription")].like("\(searchText)%")
        || AppDatabase.removeNekkudotFunction(headerAlias[Column("menukad")]).like("\(searchText.removeNekkudot())%")
        || headerTranslationAlias[Column("body")].like("\(searchText)%")
    )
    .distinct()
    .select(Column("id"))
    .asRequest(of: Int64.self)

I inspired from the Custom SQL Functions section of this doc. It now fetches the expected values, but the request takes so long to execute.. Can I improve my code somewhere ?

Edit:
So I printed what happened inside AppDatabase.removeNekkudotFunction and it looks like the processing just takes too much time (I have around 240k lines to process). When I dont use this function my search is almost instantaneous, when I use it it takes ~15s...
I need to find a solution fast (at least a temporary one) so I think I might duplicate my column if I find nothing best :(

I would recommend to look at the output of explain for your queries. I don't know your schema, but with those performance numbers, I would expect the version without the function to be using indices, and the one with to fall back to a linear scan.

You can add an index over a call to your function if you mark the function as deterministic. Kind of similar to storing it directly in the table, but feels a little cleaner imo.

You might be looking for Full Text Search:
https://www.sqlite.org/fts5.html

I know with Postgres it supports a lot of different useful and "smart" stuff, one of which should solve your current problem. I'm not 100% sure about SQLite, but i'd assume it'll still be helpful.

1 Like

In an app I'm working on, I have a similar need. I want to perform a full-text search on a few textual columns, one of them containing an email address. When matching emails, I want to ignore the TLD (I don't want a user who types "com" to match john@apple.com).

To achieve this, I first built an SQL function that strips the TLD from a string:

let searchableEmail = DatabaseFunction("searchable_email", argumentCount: 1, pure: true) { dbValues in
    guard let string = String.fromDatabaseValue(dbValues[0]) else {
        return nil
    }
    
    if let dotIndex = string.lastIndex(of: ".") {
        return String(string[..<dotIndex])
    } else {
        return string
    }
}

Just as in your sample code, this function is added to every database connections:

var config: Configuration
config.prepareDatabase { db in
    db.add(function: searchableEmail)
}

Then, in my database schema, I add a generated column email_ft (ft as in "full-text"), based on the email column. By default, generated columns are virtual, which means that they do not use any storage. Data integrity is guaranteed by SQLite:

try db.create(table: "consumer") { t in
    ...
    t.column("fullName", .text).notNull()
    t.column("email", .text)
    t.column("email_ft", .text).generatedAs(searchableEmail(Column("email")))
}

See generatedAs(_:_:) for more information about generated columns.

I create a full-text table that indexes the searchable columns (fullName and email_ft):

try db.create(virtualTable: "consumer_ft", using: FTS5()) { t in
    t.synchronize(withTable: "consumer")
    t.column("fullName")
    t.column("email_ft")
}

See synchronize(withTable:) for more information about "external content tables".

I define the association between the consumer and consumer_ft tables in the Consumer record type, as well as a matching(_:) method for requests of consumers:

extension Consumer: TableRecord {
    static let fullText = hasOne(Table("consumer_ft"), using: ForeignKey([.rowID], to: [.rowID]))
}

extension DerivableRequest<Consumer> {
    func matching(_ pattern: FTS5Pattern?) -> Self {
        joining(required: Consumer.fullText.matching(pattern))
    }
}

Now I can let FTS5 perform an efficient search of consumers that match a user query:

let searchString: String // User Input
var request = Consumer.all()
if let pattern = FTS5Pattern(matchingAllPrefixesIn: searchString) {
    request = request.matching(pattern)
}
let consumers = try request.fetchAll(db)

See FTS5Pattern for more information about search patterns.

Your setup is a little bit more complex, @leonardkrief. But you see that we had a lot of common ideas. My contribution here is mostly about leveraging the full-text features of SQLite, and generated columns. I hope this helps!

2 Likes