Handling large SwiftData import

I have a Swift iOS app with a background task that imports a large number of records, currently in the 5,000s. I am using a model actor to upsert into my table, keyed on each record's id. The import typically takes 15 minutes to complete.

I’ve read in places (like here) that it’s better to chunk the data and only call try? modelContext.save() after each chunk. I’m not doing that yet.
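For reference, a minimal sketch of a chunked upsert inside a model actor. `Record`, `RecordDTO`, and the chunk size are all hypothetical placeholders, not your actual model:

    import Foundation
    import SwiftData

    // Hypothetical model and API payload type for illustration.
    @Model
    final class Record {
        @Attribute(.unique) var id: UUID
        var name: String
        init(id: UUID, name: String) { self.id = id; self.name = name }
    }

    struct RecordDTO { let id: UUID; let name: String }

    @ModelActor
    actor Importer {
        func upsertAll(_ dtos: [RecordDTO], chunkSize: Int = 50) throws {
            for start in stride(from: 0, to: dtos.count, by: chunkSize) {
                let chunk = dtos[start..<min(start + chunkSize, dtos.count)]
                for dto in chunk {
                    let id = dto.id
                    var descriptor = FetchDescriptor<Record>(
                        predicate: #Predicate { $0.id == id })
                    descriptor.fetchLimit = 1
                    if let existing = try modelContext.fetch(descriptor).first {
                        existing.name = dto.name   // update path
                    } else {
                        modelContext.insert(Record(id: dto.id, name: dto.name))
                    }
                }
                try modelContext.save() // one commit per chunk, not per record
            }
        }
    }

The key point is that save() runs once per chunk, so the cost of committing is amortized over many records.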

I was wondering whether there are any other conventions for dealing with large data imports in an iOS app. Could I do a quick write to a JSON file so that, if my app crashes during the import, rather than importing everything all over again, it can pick up from the JSON file, then delete it once it's done?

I suppose a web-oriented approach would be to use a cache as a middleman, but I’m not sure iOS provides that option.

Another possible, and more realistic, option: because the data I’m importing includes a date field, I could store the latest imported date in AppStorage (or UserDefaults); if my app crashes, the next API call can reference that field to cut off all the records before that date.
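That checkpoint idea could look something like this; the key name and helper are made up for illustration:

    import Foundation

    // Hypothetical checkpoint helper: persist the newest successfully
    // imported record date so a restarted import can ask the API
    // only for records newer than it.
    enum ImportCheckpoint {
        private static let key = "lastImportedRecordDate" // assumed key name

        static var lastDate: Date? {
            get { UserDefaults.standard.object(forKey: key) as? Date }
            set { UserDefaults.standard.set(newValue, forKey: key) }
        }
    }

    // After each successfully saved chunk (chunk/date are placeholders):
    // ImportCheckpoint.lastDate = chunk.map(\.date).max()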

That can become a real headache if the number of records grows over time.

Ask yourself the crucial question: Do I really need to import all of them at once? Are there other alternatives that scale well?


Importing 5,000 records is taking 15 minutes? That’s almost 0.2 s per record, which seems really, really high; are these records individually huge? You might try asking at the Apple developer forums for help optimizing your data model so that you don’t need to eagerly bring in so much data that you probably don’t always need.

In general, questions about Swift Data are better suited for the Apple forums, although in this case I think people can probably help without knowing much about Swift Data specifically.


Is this slowdown due to SwiftData itself or the underlying SQLite? I'd make a quick test to see what speed you could get from Core Data or raw SQLite directly.

If there's no better way (e.g. you tried Core Data / raw SQLite, it shows the same slowdown, and it can indeed be sped up by committing in smaller chunks), I'd then consider storing these two "pointer" fields in the SwiftData storage itself, say the latest record to trust and the latest record that was synced, instead of storing them in a separate location like a JSON file or user defaults:

    insertRecord(...)
    insertRecord(...)
    updatePointers(...)  // change the two pointers
    save()
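One hedged sketch of what those in-store pointers could be, assuming a SwiftData model dedicated to sync state (all names here are hypothetical):

    import Foundation
    import SwiftData

    // Keeping the sync "pointers" in the same SwiftData store means they
    // are committed in the same save() as the records themselves.
    @Model
    final class SyncState {
        var latestTrustedDate: Date?
        var latestSyncedDate: Date?
        init(latestTrustedDate: Date? = nil, latestSyncedDate: Date? = nil) {
            self.latestTrustedDate = latestTrustedDate
            self.latestSyncedDate = latestSyncedDate
        }
    }

    // After inserting a chunk of records into the same modelContext:
    // syncState.latestSyncedDate = newestDateInChunk
    // try modelContext.save() // records and pointers commit atomically

Because the pointers live in the same transaction as the data, a crash can never leave them disagreeing with what was actually written.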

@tera @John_McCall I just realized that I put 4 foreign-key constraints on the table. Not the smartest thing to do on a table I expect to write a lot of records to in a single request. I’ll look into removing them and see whether query performance stays acceptable.

I did a quick SwiftData test and could reproduce behaviour similar to what you see. The model is a simple key-value store, no FKs, etc.

The test (pseudocode):

        for x in 1...5000 {
            y = measureTime {
                for i in 0..<5000 {
                    fetch()
                    update()
                    if (i % x) == 0 {
                        save()
                    }
                }
                save()
            }
        }

so each iteration updates 5,000 items, calling save() after every x items.

This is how the time graph looks in practice (graphs not reproduced here); a closeup near zero shows a sweet spot between x = 30 and 70.

It's quite unfortunate behaviour... at least with SwiftData, to make it fast you have to save every now and then; but that means that, if atomicity is important, you'll essentially have to reimplement your own journaling on top of what SwiftData (or the underlying SQLite) already has.


What if 5000 becomes 50000, and so on?

Reading everything in at once is quite irresistible because it requires a minimal amount of coding, but it is guaranteed to make life miserable in the future.


Good point.

No idea how long that would take with SwiftData.

It's quick with JSON :-)

A quick test with pure JSON of 5M key/values (keys are "\(index)" strings where index is 0...N, and values are made with UUID().uuidString) suggests it's reasonably fast.

writer app:

created a dictionary of 5M random items in 2.6 seconds
json of 5M items encoded in 5 seconds, data size is 243888891 bytes (244 MB)
file written atomically in 0.05 seconds

reader app:

file read in 0.04 seconds
json of 5M items decoded in 3.7 seconds
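The numbers above can be roughly reproduced with a sketch like this (file name is arbitrary; timings will of course vary by machine):

    import Foundation

    let n = 5_000_000
    var dict = [String: String](minimumCapacity: n)
    for i in 0..<n { dict["\(i)"] = UUID().uuidString }

    // Writer side: encode and write atomically.
    let data = try JSONEncoder().encode(dict)
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("import-snapshot.json")
    try data.write(to: url, options: .atomic)

    // Reader side: read the file back and decode.
    let decoded = try JSONDecoder().decode([String: String].self,
                                           from: Data(contentsOf: url))
    assert(decoded.count == n)

The atomic write matters for the crash-recovery use case: either the complete snapshot exists on disk, or the old one is untouched.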

If you have the ability to refactor your "backend" to Core Data, you can try NSBatchInsertRequest. Here is a sample code project that might help you get started:
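A minimal sketch of what that looks like, assuming a Core Data entity named "Record" with "id" and "name" attributes (the entity and attribute names are placeholders for your own model):

    import CoreData

    // NSBatchInsertRequest (iOS 14+) writes rows directly to the store,
    // bypassing the managed object context for much higher throughput.
    func batchInsert(rows: [[String: Any]],
                     into context: NSManagedObjectContext) throws {
        let request = NSBatchInsertRequest(entityName: "Record", objects: rows)
        request.resultType = .statusOnly
        let result = try context.execute(request) as? NSBatchInsertResult
        guard (result?.result as? Bool) == true else {
            throw NSError(domain: "Import", code: 1) // insert reported failure
        }
        // Because the context is bypassed, refresh any in-memory objects
        // you still need (e.g. via mergeChanges(fromRemoteContextSave:into:)).
    }

Note that batch inserts don't run validation or fire relationship maintenance, which is part of why they're fast.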

I have built an app that has to keep a large database of product and order data on the device. At first I tried SwiftData, but I quickly gave up: performance was bad, and I was getting hard-to-debug crashes at times.

I instead used GRDB.swift, which was a game changer for me. Performance was way better, and I really like the API.

That said, SQLite (the database backend used by most persistence libraries on mobile devices) has bad throughput if you commit too often / have too many “small” transactions. This is explained here: SQLite Frequently Asked Questions

So you will need to chunk your import.
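With GRDB, chunking falls out naturally because each write block is one transaction. A hedged sketch, where `Item` and the chunk size are hypothetical:

    import GRDB

    // Hypothetical record type; GRDB's save(_:) upserts by primary key.
    struct Item: Codable, FetchableRecord, PersistableRecord {
        var id: String
        var name: String
    }

    func importChunks(_ items: [Item],
                      into dbQueue: DatabaseQueue,
                      chunkSize: Int = 500) throws {
        for start in stride(from: 0, to: items.count, by: chunkSize) {
            try dbQueue.write { db in // one SQLite transaction per chunk
                for item in items[start..<min(start + chunkSize, items.count)] {
                    try item.save(db)
                }
            }
        }
    }

Each dbQueue.write commits once, so the per-transaction overhead the SQLite FAQ describes is paid per chunk instead of per record.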

Best regards
Christoph


It might be a good idea to use a secondary db, like the one you suggested, for most of the data I’m importing. The data comes from an API, so I don’t really need it to be backed up via CloudKit.

However, there are some models I do want to back up with CloudKit, and SwiftData provides first-party support for that.

If I can have these two different data systems working together in my app, that might work.

Maybe have a look at this?

I have never used it and therefore cannot talk about its merits, but it sounds like what you need.