If you know mutation will be critical in a struct use case, is it the wrong object to use?

labatemarco · November 26, 2024, 7:26am

Hi all! Beginner to Swift here (apologies if I'm posting in the wrong topic), and I'm deciding which data structure to use for holding song info shaped like [uuid, name, time = nil]. The names can be expected to be updated frequently, and the songs frequently moved around, replaced, deleted, and added from/to their container. At first a struct seemed perfect; it's not a particularly demanding use case, but I would of course need quite a bit of mutability. I'm quite used to using nested dictionaries and the like for this in Python, but thought I'd ask the pros for the best approach. Thank you!

RandomHashTags · November 26, 2024, 9:43am

I personally try to use struct wherever I can, because they're fast and efficient. I rarely use class for data structures, but they do have a legitimate purpose.

However, you need to understand what kind of data you're working with to make an educated decision.

In your example you could use these value types: UUID, String, and Date. String (and UUID, but it depends) isn't the best type as its underlying storage is usually stored in heap memory, whereas Date is stored on the stack. Making your struct conform to Hashable will make inserting and removing it from a Set very performant.

import Foundation

struct SongInfo : Hashable {
    var time:Date? = nil
    let uuid:UUID
    var name:String
}
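
As a rough sketch of the Set usage mentioned above (the `playlist` variable and the song names are made up for illustration): note that the synthesized Hashable conformance hashes every stored property, so renaming a song means removing the old value and inserting the updated one.

```swift
import Foundation

struct SongInfo: Hashable {
    var time: Date? = nil
    let uuid: UUID
    var name: String
}

// Hypothetical container for illustration.
var playlist = Set<SongInfo>()

let song = SongInfo(uuid: UUID(), name: "Intro")
playlist.insert(song)

// The synthesized Hashable/Equatable conformance covers all stored
// properties, so a renamed song counts as a different element.
var renamed = song
renamed.name = "Intro (Remastered)"
playlist.remove(song)
playlist.insert(renamed)

let containsOld = playlist.contains(song)
let containsNew = playlist.contains(renamed)
print(containsOld, containsNew, playlist.count)
```

If the name changes often, keying a Dictionary by the uuid instead may be less awkward than remove-and-reinsert.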

You will pay the price in performance from copies, retains and releases if you mutate your struct heavily. You can mitigate a lot of this by utilizing inout, the relatively new ~Copyable, borrowing and consuming patterns, and if all else fails, unsafe pointers. At least for your example, alternative value types could be Int for the identifier and a SIMD64<UInt8> vector for the song name (or Array/ArraySlice of simd vectors).
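
A minimal sketch of the inout approach, assuming a simplified SongInfo with a String name (the `rename` helper and the sample songs are hypothetical). Mutating through `inout` or directly through an array subscript updates the element in place rather than building a separate copy and writing it back by hand.

```swift
struct SongInfo: Hashable {
    let id: Int
    var name: String
    var duration: Float? = nil
}

// `inout` lets the callee modify the caller's value directly.
func rename(_ song: inout SongInfo, to newName: String) {
    song.name = newName
}

var songs = [
    SongInfo(id: 1, name: "Intro"),
    SongInfo(id: 2, name: "Outro"),
]

rename(&songs[0], to: "Prelude")   // mutate an array element in place
songs[1].duration = 215.5          // direct in-place property mutation
songs.swapAt(0, 1)                 // reorder without manual temporaries

print(songs.map { "\($0.id): \($0.name)" })
```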

Something like this is what I would do (as all values are stored on the stack):

import Foundation

struct SongInfo : Hashable {
    let id:Int // or UInt64
    var time:Date? = nil
    var name:SIMD64<UInt8>
}

If absolute performance is no concern of yours and you just want it to work, you can absolutely use struct or class and the UUID, String, Date value types.
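
For anyone wondering how a name would get into and out of a SIMD64<UInt8>, here is one possible sketch. The `packName`/`unpackName` helpers and the zero-terminated encoding are assumptions for illustration, not something from the post above. Names longer than 64 UTF-8 bytes are truncated, which can cut a multi-byte character in half.

```swift
let maxNameBytes = 64

// Copies up to 64 UTF-8 bytes of the name into the vector; unused lanes stay 0.
func packName(_ name: String) -> SIMD64<UInt8> {
    var vector = SIMD64<UInt8>()   // all lanes start at zero
    for (index, byte) in name.utf8.prefix(maxNameBytes).enumerated() {
        vector[index] = byte
    }
    return vector
}

// Reads bytes up to the first zero lane and decodes them as UTF-8.
// Invalid sequences (e.g. from truncation) are repaired with U+FFFD.
func unpackName(_ vector: SIMD64<UInt8>) -> String {
    var bytes: [UInt8] = []
    for index in 0..<maxNameBytes {
        let byte = vector[index]
        if byte == 0 { break }
        bytes.append(byte)
    }
    return String(decoding: bytes, as: UTF8.self)
}

let packed = packName("Hello")
print(unpackName(packed))
```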

Karl · November 26, 2024, 9:44am

Why do you think structs might not be suitable for mutable data?

What exactly are you concerned about? Are you worried that it won't perform well, or that it will be difficult to work with?

vanvoorden · November 26, 2024, 6:27pm

This is actually a pretty important question. If you come from a platform or ecosystem that encourages reference semantics… you might not have a natural intuition for choosing value semantics in a language like Swift. The argument could be made that Swift is not a "functional programming language"… but there are inspirations like immutable data structures and "first class" value semantics.

Value Semantics can be a great choice for data models: like songs in a library. You can see the Food Truck sample app from Apple for an example of what this looks like in SwiftUI. That app is built from immutable data structures.

vanvoorden · November 26, 2024, 6:39pm

RandomHashTags:

Something like this is what I would do (as all values are stored on the stack):
struct SongInfo : Hashable {
    let id:Int // or UInt64
    var time:Date? = nil
    var name:SIMD64<UInt8>
}

It's a tradeoff:

MemoryLayout<SongInfo>.size == 96
MemoryLayout<SongInfo>.stride == 96
MemoryLayout<SongInfo>.alignment == 16

Is trading memory for CPU the "right decision"? Maybe. I don't have one single golden rule that works for all engineers.

This talk from Johannes Weiss uses custom-built copy-on-write data structures to try and optimize for space and retain calls on nested heap references.

Shameless Self-Promotion

The Swift-CowBox repo is a macro for easy copy-on-write semantics.

labatemarco · November 26, 2024, 7:16pm

Thanks for this, using inout parameters in a mutating func seems like a good option. In this case time would actually be a float holding the song duration.

labatemarco · November 26, 2024, 7:23pm

Truthfully, I don't think performance will be a problem either way for my lightweight application, but performance is the concern behind my question. I'd just like to develop good habits for this language.

labatemarco · November 26, 2024, 7:29pm

Thanks Rick, I'll check out this app. Yeah, I'm definitely not used to using mainly value semantics, so it's interesting.

hisekaldma · November 26, 2024, 8:09pm

This is incorrect. UUID is always stored inline, and Strings shorter than 16 bytes are also stored inline.

RandomHashTags · November 26, 2024, 8:30pm

Compromise is the name of the game. The optimal memory layout for this example would be the following (77 bytes, without wasting one alignment unit on padding):

struct SongInfo : Hashable {
    var name:SIMD64<UInt8>
    let id:Int // or UInt64
    var time:Float? = nil
}

An equivalent solution using String for the name would use 29 bytes on the stack, but would cost a constant performance overhead every time you mutate it. Luca said the data would be frequently updated, so the use case would determine if it would be worth it or not.
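
If you want to check these numbers on your own machine, something like the following should do it. The three struct names are hypothetical, and all three use Float? for time so that only the member order and the name type differ. The printed values depend on the platform, so none are listed here.

```swift
// Identifier first, vector last.
struct IdFirst {
    let id: Int
    var time: Float? = nil
    var name: SIMD64<UInt8>
}

// Vector first, as in the reordered layout above.
struct NameFirst {
    var name: SIMD64<UInt8>
    let id: Int
    var time: Float? = nil
}

// Same fields, but with a String name.
struct WithString {
    var name: String
    let id: Int
    var time: Float? = nil
}

func describe<T>(_ type: T.Type) -> String {
    "\(type): size \(MemoryLayout<T>.size), " +
    "stride \(MemoryLayout<T>.stride), " +
    "alignment \(MemoryLayout<T>.alignment)"
}

print(describe(IdFirst.self))
print(describe(NameFirst.self))
print(describe(WithString.self))
```

Keep in mind that size excludes trailing padding, while stride is what each element actually occupies inside an Array.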

You're right. Without knowing the maximum song name length you want to support, I made an educated guess that a reasonable maximum would be 64, in which case the String would not be inlined. That is why I use a SIMD64<UInt8> (you could also use smaller vectors to save memory).

~~UUID could also be inlined. In my example it probably would be.~~

However it can be moved to the heap under certain conditions, which is why I use Int as the identifier. ~~Because Int is smaller, it is less likely to be moved to the heap (at least in my example; and not wasting memory).~~

Correction: UUIDs are always stored inline. Whether one ends up on the stack or in the heap depends on how you use it. I still stand by using Int instead of UUID in this case, to avoid wasting memory and to improve performance.

vns · November 27, 2024, 8:43am

I feel like the discussion in this topic went a bit the wrong way; to me, thinking about memory layout is a much further optimization than what was asked.

@labatemarco there is a great link a few posts above that describes the trade-offs and main differences between value and reference semantics in Swift. I find it really great myself, and as was pointed out, getting this difference might take time if you haven't worked with a language that has a similar feature.

At a more general scale, Swift encourages reaching for value types more than for reference types, but that doesn't mean avoiding the latter. There are always use cases where one fits better than the other. It's hard to formalize this knowledge, though; mostly it comes with practice.

As for your specific question, it mostly depends on how you approach storing and sharing these song items. I would go with a single source of truth, something you can share to access the data. In the simplest version this would be a class, as we want a single shared instance of this data. The songs can then be structs that this class holds in some collection type.
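
Roughly, that arrangement could look like the sketch below. The `Song` and `SongLibrary` names and their methods are hypothetical, and there is no SwiftUI wiring here; it only shows a shared class owning an array of structs.

```swift
import Foundation

struct Song: Hashable {
    let id: UUID
    var name: String
    var duration: Float? = nil
}

// The single shared instance that owns all songs.
final class SongLibrary {
    private(set) var songs: [Song] = []

    func add(_ song: Song) {
        songs.append(song)
    }

    func rename(id: UUID, to newName: String) {
        guard let index = songs.firstIndex(where: { $0.id == id }) else { return }
        songs[index].name = newName   // in-place mutation of the stored struct
    }

    func remove(id: UUID) {
        songs.removeAll { $0.id == id }
    }

    func move(from source: Int, to destination: Int) {
        let song = songs.remove(at: source)
        songs.insert(song, at: destination)
    }
}

let library = SongLibrary()
let sameLibrary = library            // a second reference, not a copy

let song = Song(id: UUID(), name: "Intro")
library.add(song)
sameLibrary.rename(id: song.id, to: "Prelude")

// Every holder of the library sees the rename; the local `song` copy does not.
let nameSeenByLibrary = library.songs[0].name
print(nameSeenByLibrary, song.name)
```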

jaleel · November 27, 2024, 9:31am

Regarding performance, there was a nice paper and project.

They used high-level languages to write drivers, and it was actually quite performant, except for Swift. You can check the code and performance evaluation, and you will notice it's using classes; I think that's a bottleneck of the implementation due to the nature of ARC:

The generated flame graph for the release version (above) shows a lot of time being spent in the retain/release calls, which are part of the Automated Reference Counting (ARC) of Swift.

So yeah, long story short: memory is not always an issue, even when writing drivers, so it's always better to start with value types.

labatemarco · November 27, 2024, 12:29pm

I have to say, although it's a bit beyond the scope of my needs, I do love the depth of analysis provided by this forum! I agree with you about the single source of truth. This data structure would actually be a user-defined subset of the overall list of songs (which is also being presented), so my brain naturally thought reference semantics made sense, so that all objects using it are refreshed together when external name changes or deletions are made (triggered by external JSON). I think I'm going to try just using the pre-made SwiftUI List container, with both the full list and the user-defined subset populated from a 'single source of truth' as you said. The song objects will be partially mutable structs, possibly trying out one of the optimizations pointed out above, and then the only real decision left is whether to use a class, as you suggested, or a struct for the 'source of truth'.

vanvoorden · November 27, 2024, 8:17pm

I agree… and I also think the previous comment was more specifically about optimizing the difference between "pure" value types (which have no nested reference type semantics) and copy-on-write value types… which are something like "hybrid" types (which do have nested reference type semantics).

For the most part… I agree with a "progressive disclosure" philosophy. If a data structure is a struct… product engineers should believe this is a value type that follows value semantics. If a product engineer really wants to dive deep and optimize for performance then the infra engineer can choose to document the heap reference behavior (on the assumption the product engineer may then choose to strongly couple their product code to the private implementation details of the infra).

hisekaldma · November 28, 2024, 6:51am

UUID is a struct which stores its bytes as a tuple (you can look at the source yourself). This means it is always stored inline. It will never be moved to the heap behind your back.
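
A quick way to see this for yourself is to ask MemoryLayout. The `Tagged` struct below is a made-up wrapper for illustration: a UUID takes 16 bytes, and a struct containing one carries those 16 bytes directly rather than a pointer to them.

```swift
import Foundation

// Hypothetical wrapper, only here to show the UUID bytes are embedded.
struct Tagged {
    let uuid: UUID
    var flag: Bool
}

let uuidSize = MemoryLayout<UUID>.size
let taggedSize = MemoryLayout<Tagged>.size

print("UUID size: \(uuidSize)")
print("Tagged size: \(taggedSize)")
```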

tera · November 28, 2024, 8:10am

I'd follow these guidelines:

Prefer value types by default, as that simplifies the app overall.
For the value type version, be ready for additional memory overhead (and the corresponding CPU overhead to copy that memory): with value semantics you cannot "opt out" of deep copying (if you need to copy at all, that is), while with reference types you can opt out and use shallow copies instead, provided that shallow copying is enough for the task at hand.
For the reference type version, there would be an additional (albeit minor) memory toll (~32 bytes) and an additional (typically minor) ARC toll on the CPU.
In the unusual (and typically simple) case where you could use either value or reference types, run a test, measure both memory and CPU, and pick the best one.
If needed, consider the COW box ("hybrid") approaches, which keep value semantics and avoid making copies unless needed.
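
For reference, a hand-rolled COW box can be sketched as below. This is a generic illustration built on `isKnownUniquelyReferenced`, not how any particular library or macro implements it, and the `Storage` and `CowBox` names are made up.

```swift
// Heap storage shared between copies of the box.
final class Storage<Value> {
    var value: Value
    init(_ value: Value) { self.value = value }
}

struct CowBox<Value> {
    private var storage: Storage<Value>

    init(_ value: Value) {
        storage = Storage(value)
    }

    var value: Value {
        get { storage.value }
        set {
            if isKnownUniquelyReferenced(&storage) {
                // Nobody else shares the storage: write in place.
                storage.value = newValue
            } else {
                // Storage is shared with another copy: allocate our own.
                storage = Storage(newValue)
            }
        }
    }
}

let original = CowBox(["Intro", "Outro"])
var copy = original              // cheap: both boxes share one Storage
copy.value.append("Encore")      // first write gives `copy` its own Storage

print(original.value, copy.value)
```

Copying the box only copies a reference; the storage itself is duplicated on the first write to a shared box, which is what keeps value semantics intact.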

drewmccormack · November 28, 2024, 2:25pm

As someone already pointed out, I wouldn't worry too much about performance. Whether you use classes or structs, it will be fast enough. Consider that Apple's own SwiftData framework works with classes. In practice, you are very unlikely to run up against a performance issue due to using classes.

But there certainly are things to think about when choosing between structs and classes for a model. If you are just downloading from a web service, and presenting immutable data, there is not much to think about. Structs are probably a good choice. They are a pretty direct translation from the JSON you are probably downloading, and have advantages like working well across threads.

But your use case is clearly more advanced. You have mutable state, and that makes a big difference. There is a reason Apple use classes in SwiftData: with classes, your data doesn't get out of sync as easily. There is a single source of truth. Your UI will typically be sharing the same objects between views, so if one part of the UI updates a Song, the other parts get that change automatically.

Structs are also a legitimate choice, but you have to think more about how you merge them after changes. Your app will typically end up with many copies of the same struct representing the same Song. You need to make sure that when you make changes to a Song, that it gets stored properly, and that other parts of the app update with the new data.

You also have to think about merging of data. If your Song can be updated by various parts of your app, perhaps by the user, or a new download of metadata from the web, or syncing with another device or sharing extension, you need to be very careful about how the data is merged.

With a single object, the issue is less important, because each property can be updated independently, and everything sharing the object automatically gets the latest data. With a struct, if you are not careful, you can "clobber" other changes. For example, suppose your song playback engine updates the time property while the user enters a change to the name at the same time. If you aren't careful to merge these changes property by property, you might end up with one of the changes being overwritten.
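
To make the clobbering scenario concrete, here is a small sketch with a made-up `Song` struct and made-up variable names. Two parts of the app each edit their own copy, and writing whole structs back loses one of the edits, while a property-by-property merge keeps both.

```swift
struct Song: Equatable {
    let id: Int
    var name: String
    var time: Float
}

var stored = Song(id: 1, name: "Intro", time: 0)

// Two parts of the app each take their own copy of the same song.
var playbackCopy = stored
var editorCopy = stored

playbackCopy.time = 42          // playback engine advances the time
editorCopy.name = "Prelude"     // user renames the song

// Naive write-back: whole-struct assignment, last writer wins.
stored = playbackCopy
stored = editorCopy
let clobbered = stored          // the time update has been overwritten

// Property-by-property merge: each writer stores only what it changed.
var merged = Song(id: 1, name: "Intro", time: 0)
merged.time = playbackCopy.time
merged.name = editorCopy.name

print(clobbered)
print(merged)
```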