Why Is Covariant Self more flexible on protocols than classes?

Ouch, this better be a compilation error... to force us to write:

func g() -> Array<any BinaryInteger> {
  let a: [Int] = [1,2,3]
  return .init(a)
}

Entirely valid Swift, I'm afraid.

Note, .init(a) probably won't do what you think it will here. I think it invokes Array.init(_ other: Self) which... will implicitly convert a to [any BinaryInteger] then just bump the refcount on the buffer.

I get why these restrictions exist for non-final classes, but I wish they were loosened for classes that are marked final. It feels weird that Self isn't allowed in that case even though the class can't be subclassed.

5 Likes

Yeah, I’d love to see more permissive treatment of Self in final classes.

3 Likes

I understand, I meant it would be better if it was required to be explicit... Otherwise there's some invisible conversion that takes O(n) time – could (and will) catch people by surprise! A warning "please use an explicit conversion" would be a good start.

I don't quite understand, could you clarify what the difference is? In this test:

import Foundation

let array = [Int](repeating: 0, count: 100_000_000)

func foo1(_ a: [Int]) -> [any BinaryInteger] { a }
func foo2(_ a: [Int]) -> [any BinaryInteger] { .init(a) }
func foo3(_ a: [Int]) -> [any BinaryInteger] {
    a as [any BinaryInteger]
}
func foo4(_ a: [Int]) -> [any BinaryInteger] {
    a.map { $0 as any BinaryInteger }
}
func measure(_ title: String, array: [Int], execute: ([Int]) -> [any BinaryInteger]) {
    let start = Date()
    let result = execute(array)
    let elapsed = Date().timeIntervalSince(start)
    let eq = result.map { Int64($0) } == array.map { Int64($0) }
    print(title, elapsed, eq)
    precondition(eq)
}

measure("foo1", array: array, execute: foo1) // foo1 0.5514670610427856 true
measure("foo2", array: array, execute: foo2) // foo2 0.5648369789123535 true
measure("foo3", array: array, execute: foo3) // foo3 0.529823899269104 true
measure("foo4", array: array, execute: foo4) // foo4 0.5312349796295166 true

All four versions give the same result and take the same O(n) time.

I was the one who pondered whether it actually copies. I was curious because that's how I always assumed it was implemented, but it's inconsistent with copy-on-write, and it is possible to implement it without copying: either as a special container that is aware of both its storage element type and its nominal element type, or as a polymorphic box that thunks calls to the container interface through a vtable. Since it's easy to confirm which one it is (the performance of assignment would be dramatically different between the two), I decided to answer my own question, and confirmed it's doing an explicit conversion via copy. I agree with you this is the better approach.

I find it interesting you treat "casting" and "conversion" as such different and separate concepts. This is surely due to my time working with C++, but I treat the two not just as overlapping concepts but "casting" as a subset of "conversion" (specifically a lossless conversion). Maybe it makes sense to further restrict "casting" to a conversion of a reference (so you're still accessing the same identical object), but that's dubious because the whole point of "values" is that they don't have surrogate identities, so you could never tell the difference anyway.

This is because "casting" in C++ is literally nothing more than implicit conversion functions (either a single parameter constructor, or the operator T() member), or explicit ones through ordinary functions (free floating or member, depending on the syntax you want). This is exactly how smart pointer flavors of cast operations are written in the standard library.

If you static_cast something, the compiler looks for conversion functions. If you define Rectangle and Square with no inheritance relation, and give Rectangle a constructor Rectangle(const Square& square), then static_cast<Rectangle>(Square()) calls that constructor. What could "casting" mean if not "conversion"? It only restricts the concept to a conversion that in some sense "preserves" the value.

This has caused me to imagine a language that permits covariance with value semantics. If a Base defines a function func giveMeAnInt() -> Int64, then a subclass Sub: Base should be able to override that function as func giveMeAnInt() -> Int32, as long as Int32 is implicitly convertible to Int64. This is expanding the usual notion of covariance from pointer/reference types to all types, and treating the pointer case as merely exploiting that a Sub* is implicitly convertible to a Base*. You could do the same with contravariance.

Casting is not necessarily cheap. You always have to convert something, which can be arbitrarily expensive depending on the two types involved (also, does it cascade? Is it a direct conversion or does it require multiple conversion steps?), and casting pointers can be even worse. Dynamic casting can be very expensive because it has to explore the inheritance/conformance tree, which is many-to-many. If you're thinking that casting is cheap, you need to excise that idea entirely.

Even as (without ? or !) is not always a runtime no-op. If it's just being used for type inference it is, but it could also indicate toll-free bridging. The two have identical syntax because they are semantically equivalent. If you're worried about performance of toll-free bridging, should you also be worried about performance of let upcast = Sub() as Base? That's not free either. "Upcasting" from a concrete type to an existential box is expensive, both at the time of casting and later at access (the indirection can destroy performance in a loop through cache thrashing). The test I ran to prove "upcasting" arrays copies showed that creating the upcasted array took much longer, over 10x I think, than creating the original array. That's because of the expense of boxing every element. If you want "conversion" to be more explicit, arguably you should want all casting to be explicit, which I'm not necessarily against (conversely, a language that allows no implicit conversion is less capable than one that supports it, the more powerful language lets the user specify where it is possible).

In fact, after thinking about this, conversion functions are how Swift should implement covariant generic value types if it ever does, rather than making generic value variables polymorphic and thunking through vtables. So if you declare a generic value type to be covariant, MyStruct<out T>, the compiler both enforces that T only appears in covariant position, and requires you to define a conversion function. You'd either need a special keyword to identify this function, or you'd need generalized supertype constraints to write the required conversion as a regular init. The compiler would then insert the appropriate conversion where necessary.

This enables things you can't achieve by merely inserting explicit conversions. For example, today you can't map a Publisher existential, even with constrained associated types:

import Combine

let publisher: any Publisher<String, Never> = Just("")
let mapped = publisher.map { $0.capitalized } // Not possible

Why isn't this possible? It should even be possible without associated type constraints ($0 would just come in as Any), because the associated types are covariant here.

Because the return type of map is Publishers.Map<Self, Result>. There is covariant Self in a non-covariant position, as a generic parameter in a generic struct. But Self is used strictly covariantly in Publishers.Map. You should be able to "upcast" Publishers.Map<Self, String> to Publishers.Map<any Publisher<String, Never>, String>, with the latter being the static type of mapped. If we could declare it as Publishers.Map<out Self: Publisher, Result>, then we could express that this is possible, and the compiler would enforce it. We'd also need to deal with the fact that the existential doesn't conform to Publisher.
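As a point of comparison, the workaround we actually use today is to erase to a concrete wrapper first, so that the Self in Publishers.Map<Self, Result> is a concrete type rather than an existential. A minimal sketch, assuming Combine's AnyPublisher:

import Combine
import Foundation

// Erase to the concrete AnyPublisher wrapper; map is callable because Self
// is now the concrete AnyPublisher<String, Never>, not an existential.
let erased: AnyPublisher<String, Never> = Just("").eraseToAnyPublisher()
let mapped = erased.map { $0.capitalized } // Publishers.Map<AnyPublisher<String, Never>, String>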

Those two problems, lack of covariance in generic parameters and no way to express that a generic parameter be restricted to protocol existentials (and thus forbids access to metatype and non-covariant requirements), are the two major reasons why we still have to hand-write type erasers in Swift. Even if you could write a supertype-constrained explicit conversion, that won't solve this problem. You have to be able to express the covariance.

I personally believe Swift's primary goal right now should be to eliminate the need for hand-written type erasers. They're not only tedious to write and maintain (and most people, me included for a while, get them wrong, e.g. how do you correctly write a type eraser for a protocol with mutating requirements that correctly preserves value semantics?), they are loaded with footguns (they simply can't work properly with casting), and when I have to explain them to junior devs, or even senior ones coming from other languages, they invariably get confused, not just over why they are necessary but how to use them properly.
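For anyone who hasn't had to write one, here's roughly what the boilerplate looks like. A minimal sketch with a hypothetical Animal protocol of my own (it doesn't even attempt the mutating-requirement case mentioned above):

protocol Animal {
    associatedtype Food
    func eat(_ food: Food)
}

// Hand-written type eraser: capture each requirement as a closure.
struct AnyAnimal<Food>: Animal {
    private let _eat: (Food) -> Void

    init<A: Animal>(_ base: A) where A.Food == Food {
        _eat = base.eat
    }

    func eat(_ food: Food) {
        _eat(food)
    }
}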

1 Like

What else would you expect it to do?

The end result is that an implicit copy of a is made, with its own buffer having a refcount of 1, the refcount increases to 2 when the array being initialized takes ownership of it, then other is destroyed and the refcount goes back to 1. What other option would there be, except perhaps to avoid nudging the refcount from 2 then back to 1?

What I'd be afraid of is that it makes two copies, first in the implicit conversion, then again in initializing the receiver. But whether it implicitly converts then calls Array.init(other: Self), or doesn't implicitly convert and calls Array.init<S: Sequence>(_ other: S), double copying shouldn't happen either way.

In this test:

protocol Base {}
protocol Derived: Base {}

struct S: Derived {}

let derived: [Derived] = (0 ..< 10_000_000).map { _ in S() as Derived }
let base = derived as [Base] // ••• this is slow

the marked line is slow. Is this alright? When I do the same with classes (casting [Derived] to [Base]) it's a quick O(1).


This is understandably slow: [Base] as! [Derived] (whether for an array of objects or an array of existentials), because "as!" does the relevant checks per element, so that takes O(n). I reckon when I know what's involved and am sure I'm not making a mistake, I can get away with a fast:

// derivedItems are statically typed as [Base] and known to be [Derived]
// should be ok with objects and existentials
let derivedItems2 = unsafeBitCast(derivedItems, to: [Derived].self) // O(1)

Interestingly, in this example the conversion is of non-zero cost (I expected it to be zero cost):

enum E { case one, two }
let items = (0 ..< 100_000_000).map { _ in E.one }
let items2 = items as [E?] // O(n)

Optional<E> in this example takes the same amount of memory (1 byte) and uses the unused bit pattern to represent none (one = 0, two = 1, none = 2), and no physical conversion seems to be needed.
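A quick MemoryLayout check backs this up (a sketch; the numbers are what I'd expect from a 64-bit build):

enum E { case one, two }
print(MemoryLayout<E>.size)    // 1
print(MemoryLayout<E?>.size)   // 1 (nil reuses a spare bit pattern of E's byte)
print(MemoryLayout<Int?>.size) // 9 (Int has no spare bit patterns, so an extra byte is needed)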

unsafeBitCast doesn't work there (or if it does, it's a lucky coincidence and possibly dependent on compiler optimizations).

Casting from a concrete type to a protocol existential isn't just a bit cast. The byte representation of an existential box is very different than the byte representation of what it is holding. The box has to store type info (which it may store at the beginning) and it will store a pointer to the value, not the value itself, if the value doesn't fit into a limited space.
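To make that concrete, here's a sketch of the sizes involved (assuming a 64-bit platform; the 40 bytes are the usual 5-word existential container: a 3-word inline buffer plus type metadata and one witness table pointer; the struct names are mine):

protocol P {}
struct Tiny: P { var x: Int8 }                      // 1 byte
struct Large: P { var a = 0, b = 0, c = 0, d = 0 }  // 32 bytes

print(MemoryLayout<Tiny>.size)   // 1
print(MemoryLayout<Large>.size)  // 32
print(MemoryLayout<any P>.size)  // 40

let boxedTiny: any P = Tiny(x: 1) // fits the 24-byte inline buffer
let boxedLarge: any P = Large()   // too big: stored out of line (heap allocation)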

The documentation for unsafeBitCast specifically says not to use it with reference types because that will introduce undefined behavior (you must go through Unmanaged to first obtain an UnsafeRawPointer, and that is safe to bitcast to, say, an integer). That may be because the byte representation of object references can be different (that representation isn't guaranteed by Swift, AFAIK), and bit casting probably messes with reference counting (which is why you have to hand it over to manual reference counting first through Unmanaged).

What you're looking for, and what I've also wished for, is some sort of unsafeCast function, that performs any necessary representation transformation but doesn't perform the conditional check first, to squeeze a tiny bit of performance out when you know the cast will be valid. I'm not aware of any such function. unsafeDowncast only works with reference types. Since a representational change is often necessary (which can include making heap allocations), the conditional check might be a relatively unimportant cost to casting.
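For the class-only case, unsafeDowncast is exactly that: it skips the per-instance check that as! would do, and is undefined behavior if you're wrong. A minimal sketch (class names are mine):

class Vehicle {}
final class Car: Vehicle {}

let parked: Vehicle = Car()
// No runtime type check; undefined behavior if parked isn't actually a Car.
let car = unsafeDowncast(parked, to: Car.self)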

It's slow because unboxing the S instances from the any Derived boxes and reboxing them into any Base boxes is expensive. If they were classes it would just require copying the pointers and bumping reference counts. That isn't O(1) though. It's fast, but if you test that with various orders of magnitude of counts you'll see that it increases linearly with the count.

The compiler would have to be aware of the compatibility of byte representations of two types (which is very narrow, a two-value enum and an optional of that enum is a very special case) and only choose that optimization when it's well-formed. Even then, the compiler would also have to detect that situation with arrays and apply special optimizations to avoid copying on those casts. If it does that sort of thing, it probably only does so when optimizations are turned on.

Speaking of which, performance analysis is meaningless in code compiled without optimizations. For me, initializing that array of 100 million enums the way you did with a map took 43 seconds on a debug build. Changing it to init(repeating: .one, count: 100_000_000) brought it down to a couple seconds. Switching to a release build and keeping the map took it down to less than a second.

1 Like

Note that in the first example I was using existentials (Base & Derived were protocols there).

Do you think unsafeBitCast could break in this fragment?

class BaseClass {
    func foo() -> Int { 24 }
}
class DerivedClass: BaseClass {
    override func foo() -> Int { 42 }
}
let derived: [BaseClass] = (0 ..< 1000).map { _ in DerivedClass() as BaseClass }
// let derived2 = derived as! [DerivedClass]
let derived2 = unsafeBitCast(derived, to: [DerivedClass].self) // •••

derived2.forEach { derived in
    let base = derived as BaseClass
    precondition(base is DerivedClass)
    precondition(derived.foo() == 42)
}

In my test (similar to the above but with base and derived reversed and "as" uncommented) I do see O(1), so no extra work is done here (no retains, etc.):

casting [DerivedClass] to [BaseClass]
10M 953 ns
20M 953 ns
30M 2026 ns
40M 953 ns
50M 1072 ns
60M 953 ns
70M 953 ns
80M 953 ns
90M 953 ns

The compiler is well aware of that, of course. I used a two-value enum for simplicity; a 255-value enum would exhibit the same behaviour.

Of course, when testing performance I test release builds.

I'd love to see that!

Thinking out loud:

  • Use case: Cast derived class to base class
    Current syntax: "as"
    Ideal syntax: something "operator-looking", what we have now is ok.

  • Use case: Cast base class to derived class
    Current syntax: "as!" or "as?"
    Ideal syntax: something "function-looking", e.g.
    cast(base, to: Derived.self)
    optionalCast(base, to: Derived.self)
    bonus point: the "as!" version could be a throwing function instead of trapping.

  • Use case: bit cast
    Current syntax: unsafeBitCast(x, to: Y.self)
    Ideal syntax: something "operator-looking", e.g. x unsafeAs Y

Basically:

  • if we see an "operator looking" as or similar – we'd know it's O(1)
  • if we see a "function looking" "cast" or similar – we'd know it could take long (e.g. O(n)) time
  • plus, ideally there should be no hidden conversions that could take long (e.g. O(n)) time

Too late for Swift?

This isn't safe either. Swift doesn't make promises about the byte representation of anything, I think, including existential boxes.

One reason I can think of why this would fail is because if you, for example, do this:

protocol A {}
protocol B {}

struct C: A, B {}

let value: any A = C()
let bad = unsafeBitCast(value, to: (any B).self)

Then the box is holding a pointer to what it thinks is a witness table for B, but it's actually a witness table for A. You might get lucky if A were a subprotocol of B. Even if it works today, it could break on a change of OS version, minor language version, compiler settings, etc., because it's coupling to undocumented behavior.

Definitely. First, the documentation specifically says not to use it for casting between reference types, regardless of any type relationship. Second, this is bitcasting between two value types. You're guessing not only what the internal byte representation of Array is, but the calling convention of its member functions, since this is effectively casting member functions like subscript(position: Int) -> BaseClass to subscript(position: Int) -> DerivedClass.

Even if you're lucky and the bytes all align (today, with your compiler version, compiler settings, OS, etc.), isn't this breaking reference counting? You're trying to avoid an O(n) operation, right? But that would mean avoiding a call to retain on every element. If you don't do that, well, there's going to be an O(n) round of release calls on every element both when derived2 is destroyed and when derived is destroyed. Those objects will have a retain count of 1 and get deallocated after derived2 is destroyed but before derived is destroyed. Kaboom.

unsafeBitCast has a very restricted set of uses. Basically it's for casting to bytes so they can flow through code that doesn't understand those bytes and make it to some code somewhere else that can do the reverse cast of the bytes back to the same type, and this is only safe to do with things that are pure values with trivial copy constructors (structs with only members that are structs with only members that are structs... repeat until you get down to primitives).
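A sketch of the kind of use it is meant for, assuming two trivial types of identical size (the names are mine):

struct Pixel { var r, g, b, a: UInt8 } // trivial value type, 4 bytes

let pixel = Pixel(r: 1, g: 2, b: 3, a: 4)
let bits = unsafeBitCast(pixel, to: UInt32.self) // sizes match, both types are trivial
let back = unsafeBitCast(bits, to: Pixel.self)   // the reverse cast recovers the value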

That might be because you followed the rule I stated (but broke myself) and turned on optimizations. If your entire code is just declaring these variables and then not doing anything with them, the compiler can probably tell that it can optimize out both the extra releases and, correspondingly, the extra retains. Maybe if you do it in a function where derived is an incoming parameter and derived2 is stored in a variable (and so escapes) you'll see the O(n) operation.

Trying to guess what optimized code is doing is a fool's errand. It's usually suspect that anyone is trying in the first place because the compiler probably produced optimal code before you started coding in an unusual style to try to make it more optimized. Optimization usually comes down to reworking the (visible, not under the hood implementation detail) logic, i.e. realizing a nested loop is doing unnecessary redundant calculations, not trying to beat the language or standard library at common tasks. Arrays in Swift are probably aggressively optimized in ways that wouldn't even make sense to anyone who's not a compiler, machine language and computer architecture expert.

That we're not even asking the right questions is evident from the fact that this conversation is about Big O behavior. That's way less relevant to the vast majority of code (which isn't creating containers or running loops with hundreds of millions of elements/iterations) than cache misses and failed branch predictions. Performance on modern computer architectures, which have multiple layers of memory caches, superscalar CPUs, etc., is not easy to understand at all.

Here is a good short video on the subject.

Worrying about Big O, especially theoretical Big O (what you're taught in data structures courses), is not only usually irrelevant but often actively misleading.

This begs the question: can you guys give a single example from your careers where Swift doing implicit O(n) upcasting of arrays caused a problem at all, let alone one that was expensive to discover or solve?

If not, isn't it completely irrational to demand that Swift make writing the typical application code we spend nearly 100% of our time writing considerably more difficult in order to avoid what I can only guess is an entirely theoretical problem that none of us have ever actually encountered?

I don't even see how the implicit conversion is concerning even if it does turn out to be a performance bottleneck that requires reworking. You can't discover performance bottlenecks through analysis, you have to do it empirically by measurement. So you'll find the bottleneck whether it's implicit or not. Making the conversion explicit doesn't fix it. You'd have to rework your code to not need to do the conversion at all (the part I never see people actually think about in these performance thought experiments... what is the alternative? Why were you converting arrays to begin with, and how are you going to solve your problem without doing that?)

I end up giving this rant every time I see a "this is a performance footgun" conversation. The real footgun is thinking you can predict the performance of real code through this kind of analysis. And you've probably already wasted more time wondering about it than you ever would dealing with it because the performance cost is unnoticeable anyways.

I have experience optimizing a 3D graphics engine (it ran fine on iOS, then on Android it couldn't maintain 10 fps). This was C++, there were a few places where containers were being copied instead of passed by reference, so it was an important optimization to pass by reference. In a few places it was important to reserve enough capacity ahead of building up arrays. It was mostly identifying the expensive OpenGL calls and figuring out how to restructure the rendering process to avoid making them as often (i.e. saving and reusing stuff, which was much more difficult to code correctly, which is why I didn't do it that way to begin with). Swift avoids the "oops I passed by value" stuff specifically by hiding those implementation details under abstractions (you don't even get to see pointers, the compiler decides all of that and can thus do it better than you). The meat of the optimization was the logic of my code, and it didn't involve Big O at all. If one thing is O(n) and another is O(1), and n = 10, but the O(1) thing has a constant cost factor that is 1000x the O(n) thing, the O(n) thing wins by a factor of 100. The irony of Big O is that most of the time, the O isn't that Big.

The next rule is to test your actual program, not a snippet of code written solely for the purpose of testing its performance. That will not tell you anything about the performance of real code. If my guess (the best I can do) is right, you concluded that sharing reference counted objects in an array is O(1) only because it was in an unrealistic example where the sharing was completely unnecessary and could be optimized entirely out of the program. If you're copying that array and both copies escape, it's simply impossible for it to not go through and retain every element. That's actually a great lesson in "stop trying to outsmart the compiler". You want to make that conversion explicit, but the compiler optimized it in a way that isn't possible for the programmer to do... unless you argue that automatic reference counting is a performance footgun and Swift needs to go back to manual reference counting.

1 Like

Bringing the original topic to a conclusion, @Slava_Pestov answered the question:

Why is Covariant Self more flexible on protocols than classes?

Because classes don’t have Covariant Self. They have some entirely different thing that happens to share the name Self and maps to instancetype in Objective-C. Correspondingly, you’re only able to use this in ways that translate to instancetype, which is as the literal return type.

This might explain why, as some others remarked, Self doesn’t mean the same thing in a final class that it does in a value type. In a struct that conforms to Equatable, I can declare the implementation of == as static func == (lhs: Self, rhs: Self) -> Bool, but I can’t do that in a class, even a final class, because Self isn’t acting as an alias to the class itself, it’s still referring to that “dynamic Self” concept.
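To illustrate with the half that does compile (a sketch with my own Point type; per the explanation above, a class version of the same signature is rejected because Self there means dynamic Self):

struct Point: Equatable {
    var x = 0, y = 0
    // In a struct, Self is simply an alias for Point, so this satisfies Equatable.
    static func == (lhs: Self, rhs: Self) -> Bool {
        lhs.x == rhs.x && lhs.y == rhs.y
    }
}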

Classes could have a covariant Self, that would behave like it does in protocols, and becomes an invariant type alias in a final class. But since calling it Self would conflict with “dynamic Self”, it would be hard to add this to the language now.

Either way, if you want covariant Self, you have to use a protocol. That’s what the protocol “trick” does: gives you access to a covariant Self type parameter, which actually doesn’t exist at all in classes.

4 Likes

If it's a 255-value enum it's using every bit of that byte, so the Optional wrapper can't repurpose any of those bits for flagging nil. It would need an extra byte for that.

Speaking of which, are you sure Optionals of two-value enums stuff the nil flag into unused bits? That would mean that the byte representation, even the size in memory, of an Optional<T>, even for Ts that are just single bytes, depends on T. This would be very different than the alternative where the byte representation of Optional<T> is the same regardless of T (except that the value part is either the value itself or a pointer to it depending on whether the value fits).

Are you even sure that the size in memory of an Optional<T> depends on the size of T? How does that work with generic code, which has to be compiled to an existentialized version for calls outside its own module?

Does that have performance implications we aren't thinking of, given that the compiler doesn't know the nil flag is always located at the same bit of every optional?

Are you sure "unused bits" isn't referring to bits that are known to be zero due to memory alignment? That is, if an Optional is (always) 8 bytes large, and those bytes are either a pointer to the value that is aligned to at least 2 bytes, or the value itself if it's 7 bytes or smaller, then the least significant bit of the last byte is unused and you can repurpose it. My understanding is that at runtime all instances of generic types have identical structure and just carry the type parameter of the specialization around as a member.

A 255-value enum uses 0...254 for representing its cases, and uses 255 to represent nil (I have confirmed this with a concrete test). This isn't the case for every Optional<T>, but that was beside @tera's point.

@tera's point was that in this concrete example the compiler could completely compile away the cast, and it was merely pointed out as interesting that the compiler didn't. In generic contexts such optimisations of course aren't possible (although the compiler could technically generate code to identify these optimisation opportunities at runtime and fast-path [T] to [T?] when T to T? is a no-op, which would require type information to include whether T to T? is a no-op or not, probably not worth it at all).

For these enum examples, nil is represented as an extra enum case instead of as a bit flag (hence why the optional representation of the 255-value enum is still 1 byte even though all 8 bits of the enum's stride are used by at least one value of the enum). The compiler doesn't have to know/prove that nil is always represented the same way (it isn't) just to apply the optimisation in certain specific cases when the types involved are known at compile time.


Note that all of that is completely aside to the topic of this discussion, I believe that the example was simply brought up as another example of when the conversion could indeed be zero-cost. I just investigated/answered because these kinds of compiler details interest me (TIL that an enum with distinct UInt8 raw values isn't actually represented by those values under the hood, and the compiler does some really cool stuff to convert the enum representation to raw value and vice versa by fitting polynomials and using magic bit manipulation stuff).

I agree that big-O can be misleading and often leads people down the pit of premature optimisation, but in my opinion it's always good for developer experience if it's clear when you're doing something with non-zero-cost even if you'll eventually find it if you start profiling. In many people's minds, casts are essentially constant-time (even with A: X, casting A to X may be expensive if done many times over, but it's still constant-time), hence why this discussion has had quite a focus on O(1) vs O(n).

In a relatively well-designed language such as Swift, changes to language design become less and less appealing to existing happy users, but a new language mode (e.g. Swift 6) is always a good opportunity to make opt-in breaking changes with longer term benefits for the language ecosystem and the experience of newcomers to the language (I'm not saying that this should be in Swift 6).

1 Like

Thank you for correcting me. I have become trained to see the number 255 to mean the 0...255 range and automatically converted it to 256 in my head. Classic off-by-one error LOL.

By the way this is another reason you can't use unsafeBitCast except in a few narrow cases. You're guessing how the compiler is choosing byte representations.

My point isn't that it can't, but when (if ever) should it? I can imagine reasons why even in relatively simple code it becomes advantageous to have identical representations, such as code that declares two optionals of different types (both being 1 byte in size) but uses them in non-overlapping order (or can be reordered without changing logic to make it so), and superscalar CPUs like seeing repetitions of identical instructions (maybe that means stuff can be vectorized).

Once I started hearing about stuff like that, I gave up on trying to do mental analysis of code performance or guess what optimizing compilers are doing (that's why I posed that as questions).

Yeah exactly. It can also reorder lines of code, do all sorts of stuff that relies on no type punning (so don't simultaneously access memory as a T and as a U, even if you know the representations are compatible, because that confuses the compiler, which assumes no memory can ever be accessed through two differently typed variables), and so on. Watching that out of curiosity is fine, but the only lesson it gives you about the performance of the code you write is "just measure it".

Fun aside: I ran into a situation with a Metal shader that mysteriously stopped working on newer versions of iOS. I filed a bug with Apple, they eventually replied telling me it's "working correctly" but the compiler was confused by this trick I was doing to simulate double-precision math. The compiler thought it could safely reorder the arithmetic (because addition is commutative and associative), but that was messing it up by adding the two slices of the float together before performing subtraction, which caused the extra precision to be lost too early. They told me to mark the float variables as volatile to stop the compiler from messing with the order of operations.
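For the curious, the trick in question is error-free float summation in the Dekker/Knuth style. A sketch in Swift rather than MSL, with my own names; the point is that the recovered error term only survives if the compiler evaluates the expressions exactly as written:

// Knuth's two-sum: returns the rounded sum and the rounding error.
// If a compiler rewrites (a - aVirtual) + (b - bVirtual) using real-number
// algebra, the error term collapses to zero and the extra precision is lost.
func twoSum(_ a: Float, _ b: Float) -> (sum: Float, err: Float) {
    let sum = a + b
    let bVirtual = sum - a
    let aVirtual = sum - bVirtual
    let err = (a - aVirtual) + (b - bVirtual)
    return (sum, err)
}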

I don't know about you, but that does not qualify as "working correctly" to me LOL. At least they gave me a workaround.

Sure, it's good to be more knowledgeable about the language (I also wanted to know if upcasting is an eager copy), but I think the second sentence explains what's wrong with the first. It really just can't be clear what stuff costs in a high-level language like Swift (high enough that it fully hides the memory model), so your best bet is to accept that any assumptions about performance, especially "this is easy to do or implicit, ergo it's free", need to be eliminated. Because of all the compiler black magic I'll bet that's true in C as well. Even in assembly, the hardware is so complicated now you can't mentally model how it will handle code.

I also said that earlier, that casting can be very expensive so it should not be thought of as a cheap or compile-time only operation. Another reason why casting can be expensive is if you upcast (completely transparent, often requires no keywords) a struct to a protocol existential, and the struct isn't big enough to fit into the box's memory, it's going to have to heap-allocate a copy of the struct. The only reason that's not O(n) for a struct with array members is because of copy-on-write. But if you define "n" to be the size of the struct, it is O(n) (i.e. there's a 10x cost of upcasting a 10x sized struct). You say "constant time" but constant with respect to what?

Of course optimizations may be able to avoid that, so trying to guess that it's going to be expensive in some real code fragment is also a bad idea. You just have to measure.

Big O looks at the limit of large "n" (also, what is "n"?). In real code, "n" is usually not very large (I can't think of a real programming problem with tens or hundreds of millions of iterations that isn't embarrassingly parallel and therefore suited for a GPU... maybe calculating prime numbers out to arbitrary levels, but "optimizing" that is a problem for math PhDs). The video I linked explains why Big O concerns can get the wrong answer. Arrays are "bad" in terms of Big O at random insertion. But they're so much better at being cached that you should pretty much always use them.

Going from "this thing is O(n) in this contrived example" to "therefore that thing should be made explicit in the language" (or even worse, "the language shouldn't support this") is the real trouble I'm rebuffing. I can't see how this doesn't emerge out of undue assumptions about the way everything in the language works, and all the complex optimizations that are going on. The lesson here isn't that Swift made a mistake by letting us "unwittingly" pay large (O(n) or not, that doesn't matter) costs, it's that we make the mistake of thinking we can predict performance by just reading code.

I've never had to rewrite code to avoid upcasting an array because it's a bottleneck. I don't want to have to explicitly upcast everywhere just because I might one day encounter a scenario where I do have to avoid that. Requesting the language to do that is encouraging a dangerous mindset that the language is informing you of performance cost in general and that you can optimize without measuring.

Type checking expressions in Swift.

9 Likes

I understand that Big O is about the asymptotic behaviour (in the limit) but it's still a useful tool, especially when one of the algorithms you're deciding between is O(1). In the above discussion we've been using n to refer to the count of the array. Videos such as the one you refer to are cautionary tales, but aren't to be taken as 'big O is never useful'.

I've run into many problems that require millions of iterations and aren't embarrassingly parallel; often mathematical, but not always (e.g. generating meshes in short timeframes where splitting the work and recombining had too much overhead). My mathematical example was fitting a 1.7 million term polynomial which required over a trillion operations, but that is starting to become quite contrived :sweat_smile:

1 Like

This is not how it works — there's a COW at play here. If you run this test:

import Foundation

class BaseClass {}
class DerivedClass: BaseClass {}

func foo() {
    let d = DerivedClass()
    print("#A retain count:", CFGetRetainCount(d)) // 1
    let array: [DerivedClass] = [d]
    print("#B retain count:", CFGetRetainCount(d)) // 2
    
    var arrays: [[BaseClass]] = []
    for _ in 0 ..< 1000 {
        var convertedArray = array as [BaseClass]
        // convertedArray.append(DerivedClass())
        arrays.append(convertedArray)
    }
    print("#C retain count:", CFGetRetainCount(d)) // 2
    print("ignore:", arrays.map { $0.count }.reduce(0, +))
}

foo()

you'd see that the retain count of the element in question is unchanged... unless you do a mutation of the converted array (see the seemingly unrelated commented-out line); if you do, then the actual copy happens, with the per-element retain. Note that even though the original array and the subsequent copies of it are of different types ([DerivedClass] and [BaseClass] respectively) — that doesn't stop the COW machinery from working.

To sum up:

  • converting from [DerivedClass] to [BaseClass] with "as" is O(1)

  • converting from [BaseClass] to [DerivedClass] with "as!" is O(n), although this is not due to extra retains (those won't happen unless COW's copy is triggered) but due to extra checks to ensure that the class is actually Derived.

  • converting from [DerivedClass] to [BaseClass] or vice versa with a manual "map { $0 as Base/Derived }" is O(n) and this is in part due to extra retains.

Your mental model of how non-specialized generics work is not correct, I think. Non-specialized generic code isn't "existentialized", at least not in the sense that instances of generic types get boxed and unboxed. And the memory layout of instances of a generic type S<T> does depend on the concrete T.

When you have a generic function such as:

func f<T>(_ x: T)

The compiler will generate a non-specialized version of the function that looks something like this (pseudocode):

func f_T(_ x: UnsafeRawPointer, _ metadata_T: UnsafeRawPointer) { … }

  1. x is passed by pointer because the compiler doesn't know how big x is. (Importantly, this doesn't require a heap allocation. The caller can allocate space for x on the stack and pass a pointer into the caller's stack frame to the callee. And this is what Swift generally does.)

  2. The function also receives a pointer to the type metadata for T. From the type metadata, it can access T's value witness table, which contains information about T's memory layout (size and alignment) as well as function pointers to fundamental operations like how to copy or destroy instances of T. (For enums, the VWT also contains information about extra inhabitants/spare bits.)

So the body of f_T needs to do some indirection/pointer dereferencing to access x, but no boxing/unboxing is required. The actual memory layout of the instance x is always identical to the memory layout it would have if f were a non-generic function taking the concrete type.
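A two-line way to see the layout point (a sketch; Wrapper is my own name):

struct Wrapper<T> { var value: T }

print(MemoryLayout<Wrapper<Int8>>.size) // 1
print(MemoryLayout<Wrapper<Int>>.size)  // 8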

8 Likes