Why is `Task` a `struct` when it acts so much more like a reference type?

mredig · December 11, 2022, 12:06am

Tasks are implemented as structs in Swift. I hadn't put too much thought into it until now and have been unable to find any answer on Google so far.

Anyways, Tasks seem to behave much more like reference types than value types, in my opinion. If you create one, you have the option to not retain a "reference" to it. From the documentation:

A task runs regardless of whether you keep a reference to it. However, if you discard the reference to a task, you give up the ability to wait for that task’s result or cancel the task.

Additionally, its value changes without direct interaction (as a result of the async background operation, of course), which seems at odds with the established convention of needing to mark a self-mutating method on a struct as mutating.

This background change also affects other copies of said "value" type. Not to mention, if you cancel a task, it will affect any other references you have to it.
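A minimal sketch of that sharing, assuming a toolchain with Swift concurrency (5.5 or later) and top-level code in a main.swift. The names `task` and `copy` and the one-minute sleep are purely illustrative:

```swift
// A long-running task, so it is still alive when we cancel it.
let task = Task<Void, Never> {
    _ = try? await Task.sleep(nanoseconds: 60_000_000_000)
}

// Copying the struct copies the handle, not the underlying task.
let copy = task

// Cancel through the copy...
copy.cancel()

// ...and the original handle observes the cancellation too.
let originalSeesCancellation = task.isCancelled
let copySeesCancellation = copy.isCancelled
print(originalSeesCancellation, copySeesCancellation)
```

Neither `task` nor `copy` is declared with `var`, and `cancel()` is not a mutating method, yet the state visible through both changes.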

So why is it implemented as a struct? If there's some weird technical reason to do so, would it make sense to include some language in the docs to address and explain that?

tera · December 11, 2022, 4:16am

Not an answer, just an observation: Task is a magic struct:

// let's declare our `Task2` the same way `Task` is declared:
@frozen public struct Task2<Success, Failure> : Sendable where Success : Sendable, Failure : Error {
}

print(MemoryLayout<Task2<Never, Never>>.size) // 0 (as expected)
print(MemoryLayout<Task<Never, Never>>.size) // 8 (some magic here)

Presumably there's some task id stored in those 8 bytes and all methods are computed properties accessing some global table.

mredig · December 11, 2022, 4:20am

How do you say that what you said makes sense, but ultimately also doesn't? Like, I totally see what you are saying, but why would it be done like that? This just seems antithetical to the whole point of value semantics.

DevAndArtist · December 11, 2022, 10:49am

I don't know the reasons or the answers to most of the questions, but this stuff is open source and we can dig into the implementation. Task wraps Builtin.NativeObject internally (whatever this type is) and yes, it looks like it's a type following reference semantics. Even though it's a struct, a struct is not, and was never meant to be, inherently bound to value semantics.

jayton · December 11, 2022, 11:28am

What you’re seeing is merely the public interface of a struct with a private stored property.

Pinning down what “value semantics” means in Swift turns out to be very hard, and the kind of a type doesn’t really tell you much of anything. After all, structs can hold references to classes, by design. Being a struct does not imply “value semantics”, and was never intended to.

In this case, NativeObject presumably isn’t a Swift class using Swift native reference counting, and allocating a heap object with an additional layer of memory management is unnecessary.
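As a rough illustration of how a struct with a single private reference ends up pointer-sized, here is a sketch with hypothetical `Storage` and `Handle` types. This is not how Task is actually declared; it only mirrors the shape described above using an ordinary Swift class:

```swift
// Hypothetical heap-allocated storage.
final class Storage {
    var payload = 0
}

// A struct whose only stored property is a private reference.
// Its public interface shows no stored properties at all.
struct Handle {
    private let storage = Storage()
}

// The struct is exactly as large as the one reference it holds.
let handleSize = MemoryLayout<Handle>.size
let pointerSize = MemoryLayout<UnsafeRawPointer>.size
print(handleSize == pointerSize)
```

On a 64-bit platform both sizes are 8 bytes, which matches the size observed for Task earlier in the thread.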

lukasa · December 11, 2022, 2:21pm

This is the core insight: struct is not synonymous with "possesses value semantics". A great example of this is all the pointer types, which are all structs but have reference semantics.

Classes unavoidably have reference semantics, so all classes are reference types. However, the reverse is not true for structs: it's perfectly fine to have structs with reference semantics, so long as the type communicates that to the user. Task is one such type.
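A small sketch of the pointer case, using only standard library calls (the variable names are illustrative). `UnsafeMutablePointer` is a struct, but a copy of it still refers to the same memory:

```swift
// Allocate and initialize storage for a single Int.
let pointer = UnsafeMutablePointer<Int>.allocate(capacity: 1)
pointer.initialize(to: 1)

// Copying the struct copies the address, not the pointee.
let alias = pointer

// Writing through the copy is visible through the original.
alias.pointee = 42
let observed = pointer.pointee
print(observed)

// Clean up the manually managed memory.
pointer.deinitialize(count: 1)
pointer.deallocate()
```

Both `pointer` and `alias` are `let` constants, yet the value read through one changes when writing through the other, which is exactly what reference semantics means here.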

cukr · December 11, 2022, 2:46pm

kinda offtopic

Here's an example of a class with value semantics

// This protocol exists only because Swift doesn't let you write mutating func directly on classes 
private protocol Helper {
    var value: Int { get }
    init(value: Int)
}
extension Helper {
    mutating func increment(by other: Self) {
        self = Self(value: self.value + other.value)
    }
}

final class MyNumber: ExpressibleByIntegerLiteral, CustomStringConvertible, Helper {
    let value: Int
    init(value: Int) {
        self.value = value
    }
    init(integerLiteral: Int) {
        self.value = integerLiteral
    }
    var description: String {
        return "\(self.value)"
    }
}


var myNumber: MyNumber = 123
var copy = myNumber
let thousand: MyNumber = 1000

// cannot mutate `let` values
//thousand.increment(by: thousand) // Cannot use mutating member on immutable value: 'thousand' is a 'let' constant
// can mutate `var` values
copy.increment(by: thousand)
print(copy) // 1123
// mutating a copy doesn't mutate the original
print(myNumber) // 123

DevAndArtist · December 11, 2022, 4:07pm

Yes, basically any class that is fully immutable (on its storage side of things) can be considered to have value semantics.

Value semantics can apply to both reference and value types, and reference semantics can likewise apply to both value and reference types.

That is totally valid and by design.

lukasa · December 11, 2022, 7:34pm

This is not what value semantics means. Value semantics implies that two objects that have the same value are indistinguishable and substitutable: that is, that there is no way to observe the difference between them. Classes cannot meet this goal because all classes have identity. That is, we can extend the example above with one extra line:

print(myNumber === copy)

Which will print false. It is always possible to tell two class instances apart, and to perform computations based on that difference, so classes are never able to implement value semantics unless you can hide their identities (e.g. by wrapping them in structs to hide the class object).
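One common way to do that wrapping is copy-on-write. The sketch below uses hypothetical `Storage` and `Counter` types; it is a minimal illustration of the idea, not a description of how any particular standard library type is implemented:

```swift
// Hypothetical class that holds the actual data on the heap.
final class Storage {
    var value: Int
    init(value: Int) { self.value = value }
}

// The struct keeps the class private, so callers can never use === on it.
struct Counter {
    private var storage: Storage

    init(_ value: Int) {
        storage = Storage(value: value)
    }

    var value: Int {
        get { storage.value }
        set {
            if isKnownUniquelyReferenced(&storage) {
                // Nobody else shares this storage: mutate in place.
                storage.value = newValue
            } else {
                // Storage is shared with another copy: allocate a fresh one.
                storage = Storage(value: newValue)
            }
        }
    }
}

let first = Counter(1)
var second = first      // both copies share one Storage for now
second.value = 2        // triggers the copy, leaving `first` untouched
print(first.value, second.value)
```

Because `storage` is private, code outside `Counter` has no way to ask whether two counters share an instance, so the class identity is no longer observable.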

taylorswift · December 11, 2022, 7:52pm

i think we are splitting hairs at this point. to me, classes have exactly one property that ‘value’ types do not: a memory address that stays the same as long as the object is alive. this is what enables interop with other runtimes that think in terms of memory location, and the thing that all the other properties of reference types, like “identity”, are derived from.

jrose · December 11, 2022, 7:53pm

I think Adrian’s point is still valid for classes like UIColor or NSNumber, and generally most of the NSCopying types. You might be able to tell them apart, but they are designed such that there is no benefit to doing so, and idiomatic code probably will treat them as indistinguishable (they are certainly isEqual: in Objective-C, which is used to implement == in Swift). You can still put in meaning through subclassing (or ObjC associated objects), but the class is designed to get as close as possible to value semantics while still being on the heap for whatever reason.

DevAndArtist · December 11, 2022, 8:06pm

I think you're mixing "classes" with "objects" here. Classes can certainly be designed to have value semantics; even the Wikipedia page states that.

mredig · December 11, 2022, 10:08pm

This is a great discussion and I learned a lot from this thread!

However, a takeaway here, for me at least, is that we might want to improve the communication about the relationship between value semantics, value types, structs, and classes! I definitely had the impression that structs inherently were intended to perform like values and classes like references, and I don't think I'd be the only one, nor would my impression be uncommon.

In retrospect though, as @jayton mentioned, structs can hold references to classes by design and I don't know why that wasn't more obvious to me.

I also love the nuance in how people define these commonly used terms, and I think the disparity between those definitions highlights the need for better communication about what these relationships entail.

Thanks to everyone for their responses

ksluder · December 11, 2022, 11:56pm

This is not strictly true. If the compiler can prove the class instance does not escape the scope in which it is initialized, it doesn’t necessarily perform a heap allocation, and the object can move around in memory or even be destructured and rematerialized.

ksluder · December 12, 2022, 12:09am

Here’s my mental model:

A type can be a struct, class, or enum.
Values are instances of types.
A value of struct or enum type is an instance of the data described by the struct or enum definition.
A value of class type is a reference to an instance of the data described by the class definition.
A type has “value semantics” when, given an instance A, you can assign A to variables V1 and V2, and it is impossible for any operation on V1 to be visible through V2. This is implicitly untrue of classes, and can be untrue of structs or enums depending on their implementations.
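The last point can be tried out directly. This is a sketch with hypothetical types (`PlainCounter`, `Box`, `SharedCounter`): one struct that passes the V1/V2 test and one that fails it because of how it is implemented:

```swift
// A struct made only of plain values.
struct PlainCounter {
    var count = 0
}

// A struct that stores its state behind a class reference.
final class Box {
    var count = 0
}

struct SharedCounter {
    let box = Box()
    func increment() { box.count += 1 }
    var count: Int { box.count }
}

// Assign instance A to V1 and V2, then operate on V1 only.
let a = PlainCounter()
var v1 = a
let v2 = a
v1.count += 1
let plainLeaked = v2.count != 0     // is the change visible through V2?

let b = SharedCounter()
let w1 = b
let w2 = b
w1.increment()
let sharedLeaked = w2.count != 0    // is the change visible through W2?

print(plainLeaked, sharedLeaked)
```

The first check comes out false and the second true: both types are structs, but only `PlainCounter` has value semantics under this definition.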

That said, I’ve long wanted indirect struct for the same reason we have indirect enum: to build recursive data structures. The difference between indirect struct and class is that indirect struct retains the possibility of value semantics because assignment still creates a new instance of the type instead of copying a reference to the same instance.

taylorswift · December 12, 2022, 12:11am

this is an optimization that occurs when the compiler can prove that no one is observing the class from outside its original scope. if no one asks for its memory address, it doesn’t need one. i suppose it’s more of a guarantee that it can provide a stable address if requested, and that address is not bound to some function scope like withUnsafeBytes(of:). but you are right, we should probably document better that this is an observable behavior and not an implementation requirement, because most of the time classes can get inlined and optimized into value-like things like everything else.

tera · December 12, 2022, 1:57am

By way of example:

In the POSIX file system API, when you open a file you get a file descriptor back, which is an int: an index into the file descriptor table. You can then use it in fstat, fcntl, close, etc. "Int" is a "value type", but overall file descriptors behave like references: if you change or close one file descriptor, all "copies" of it are affected, even though they remain the same number. Moreover, if you managed to get a new file descriptor for the same underlying file, those "file descriptor values" (a different number, and however many copies of it you are using) would also be affected. One step from that is wrapping a small struct around this file descriptor and promoting the file handling functions to methods of that struct:

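struct FileDescriptor {
    // The raw POSIX descriptor: an index into the process's descriptor table.
    var file: Int32
    // Parameter lists elided, as in the C API they wrap.
    func fcntl(...) {}
    func fstat(...) {}
    func close(...) {}
}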

If you do that, nothing has materially changed: you still have a struct carrying "reference semantics".

OTOH, in the C standard library you can also find "FILE*", which is more like a "class" (a heap-allocated object). Does it fix anything with regard to "faux value" vs "true reference semantics"? No: you'd still be able to open two FILE* instances pointing to the same file, and then change one (e.g. write bytes) to affect the other (when you read bytes), although this would quickly become a mess due to the internal buffering done inside FILE.

I believe Task is very similar to the "struct FileDescriptor" example above.
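For anyone who wants to try the file descriptor analogy, here is a rough runnable sketch. It assumes a POSIX platform where /dev/null exists; the `rawValue`, `isOpen`, and `closeDescriptor` names are made up for the example:

```swift
import Foundation

// A thin struct around a raw POSIX file descriptor.
struct FileDescriptor {
    let rawValue: Int32

    // fcntl(F_GETFD) fails with -1 once the descriptor has been closed.
    var isOpen: Bool { fcntl(rawValue, F_GETFD) != -1 }

    // Closes the underlying descriptor; every copy of this struct is affected.
    func closeDescriptor() { _ = close(rawValue) }
}

let original = FileDescriptor(rawValue: open("/dev/null", O_RDONLY))
let copy = original                 // copies the Int32, not the open file

let openBefore = original.isOpen
copy.closeDescriptor()              // close through the copy
let openAfter = original.isOpen     // the original is now closed as well

print(openBefore, openAfter)
```

The struct stores nothing but an integer, and that integer never changes, yet closing through one copy invalidates all of them.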

lukasa · December 12, 2022, 8:03am

I agree that it isn't useful, but the behaviours of types that aren't useful are still there, and we cannot wish them away. It's not useful for programmers to be able to tell instances of UIColor apart by pointer value, but they can, and if they want to they can produce interesting programs whose behaviours only make sense in light of the fact that this is a property that the code has.

I argue that the existence of the === operator for classes encodes and reifies this idea in Swift. The Swift language explicitly enables programmers to ask whether two instances of a class have the same identity, regardless of the type's semantics. Whether it's useful is distinct from whether it's possible, and classes have no way to opt out of this behaviour. Because === explicitly operates on the notion of "instance identity", I don't see any way out of saying that classes unavoidably have reference semantics. Those semantics may not be useful, but they're always there.
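A small sketch of what that looks like for an immutable class, using a hypothetical `Money` type. Nothing about the instances can be mutated, and they compare equal, but their identity is still observable and can feed into further computation:

```swift
// A fully immutable class with value-style equality.
final class Money: Equatable {
    let cents: Int
    init(cents: Int) { self.cents = cents }

    static func == (lhs: Money, rhs: Money) -> Bool {
        lhs.cents == rhs.cents
    }
}

let first = Money(cents: 100)
let second = Money(cents: 100)

let sameValue = first == second         // compares the stored cents
let sameInstance = first === second     // compares instance identity

// Identity can also drive logic, e.g. counting distinct instances.
let distinctInstances = Set([first, second].map { ObjectIdentifier($0) }).count

print(sameValue, sameInstance, distinctInstances)
```

The two instances are equal by value, yet `===` reports them as different and the set of `ObjectIdentifier` values contains two entries.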

Nope, I'm sticking with classes.

Karl · December 12, 2022, 12:55pm

lukasa:

This is not what value semantics means. Value semantics implies that two objects that have the same value are indistinguishable and substitutable: that is, that there is no way to observe the difference between them. Classes cannot meet this goal because all classes have identity. That is, we can extend the example above with one extra line:
print(myNumber === copy)
Which will print false. It is always possible to tell two class instances apart, and to perform computations based on that difference, so classes are never able to implement value semantics unless you can hide their identities

I don't entirely agree. I would argue that "indistinguishable" and "substitutable" are contextual, and the mere fact that a variable exposes some kind of identity does not preclude it "having value semantics" as we mean it in Swift. In other words, it may indeed be possible to distinguish "this value" from "that value" in some contexts.

A couple of examples off the top of my head:

Array exposes a pointer to its contiguous element storage, which can certainly be considered a kind of identity. A similar argument applies to other collections which allow access to their underlying storage, such as String and its withUTF8 method.
```
let array1 = [1, 2, 3]
let array1Copy = array1

let array2 = [1, 2, 3]

print(array1.withUnsafeBufferPointer { $0.baseAddress })
// Optional(0x0000600001704160)
print(array1Copy.withUnsafeBufferPointer { $0.baseAddress })
// Optional(0x0000600001704160)

print(array2.withUnsafeBufferPointer { $0.baseAddress })
// Optional(0x0000600001700520)
```
I don't think we can accept a definition of "value semantics" for Swift which prescribes that neither Array nor String have value semantics, or that array1 and array2 in the example above don't contain "the same value".
Dictionary and Set indexes are not usable with other Dictionaries/Sets. Again, this exposes a kind of identity.
```
let dict1 = [ "a" : 1 ]
let dict1Copy = dict1
print(dict1Copy[dict1.startIndex])  // (key: "a", value: 1)

let dict2 = [ "a" : 1 ]
print(dict1 == dict2)  // true
print(dict2[dict1.startIndex])  // 💥 Fatal error: Attempting to access Dictionary elements using an invalid index
```
Dictionary indexes are actually kind of fascinating. The documentation says:

A dictionary’s indices stay valid across additions to the dictionary as long as the dictionary has enough capacity to store the added values without allocating more buffer. When a dictionary outgrows its buffer, existing indices may be invalidated without any notification.

Which implies that they have some kind of reference semantics, and explains why the above fails - even though we surely must conclude that dict1 and dict2 contain "the same value", they have different underlying buffers, possibly of different sizes, and thus we can derive distinct indexes from them.

And if you are able to derive values with reference semantics from a variable, how can that variable have value semantics?

OTOH, it would be highly impractical to declare that neither Dictionary nor Set has value semantics, so I don't believe we can accept such a strict definition in Swift.

michelf · December 12, 2022, 1:00pm

I'll disagree: a class that offers no mutation operation has value semantics. Just think of the memory address as part of the value. If you call Int.random() and it gives you a different Int every time, that's still a value. If you call a function and it returns a different memory address every time, that's a value too.

Value semantics is not about shared storage, it's about shared mutation. As long as the type doesn't allow this kind of shared mutation, it can be a value type. Exposing the address is not enough to enable shared mutation.

Don't confuse this with purity, where the result of a function depends purely on its arguments. If the resulting memory address can be observed to change when called with the same arguments, you've broken purity in some way, similar to how random() isn't pure either.