The foo version simply calls the bar function: the jmp (output.bar(Swift.Int?) -> Swift.Int) instruction indicates that we're unconditionally jumping to bar to continue execution there.
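In other words, foo's entire body is that one tail-jump into the shared code. Roughly what Compiler Explorer shows (with the symbol names demangled) is:

    output.foo(Swift.Int?) -> Swift.Int:
            jmp     (output.bar(Swift.Int?) -> Swift.Int)   # foo's whole body: jump into bar; bar's ret returns straight to foo's caller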
Nope, it’s optimising for speed. This code is faster.
The tiny number of extra instructions and jumps is a non-issue on almost any modern processor: they'll be branch-predicted correctly every time, making their cost essentially zero (a few cycles at worst). However, because both functions share one body, that body is executed more frequently and is therefore far more likely to be resident in the instruction cache, and the smaller binary helps everything else stay cached too.
If the instruction cache hits, execution moves swiftly; if it misses, you stall until the code is fetched from further down the memory hierarchy. Depending on where it lives (L2, L3, main memory, etc.) that can mean tens to hundreds of cycles of waiting. Trading a few predictable cycles to reduce the risk of that happening is a great trade.
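For concreteness, here's a minimal sketch of the kind of source that produces this, compiled with optimisations enabled (the bodies are assumed; the thread only shows the signature, and any two identical bodies will do):

    func bar(_ x: Int?) -> Int {
        return x ?? 0        // assumed body; what matters is that it's identical to foo's
    }

    func foo(_ x: Int?) -> Int {
        return x ?? 0        // same body as bar, so the optimiser emits one copy
    }

The optimiser keeps one copy of the shared body under bar and compiles foo down to the single jmp shown above.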
But why make foo indirect at all? Why not just go direct? It seems obvious, so there must be a good reason it isn't. I get that the compiler can detect identical function bodies, generate only one, and make the other an indirect jump to it, but in doing so it's still generating a duplicate function body for the indirect jump. That doesn't make sense to me: even though the CPU can optimise this out as you explained, the duplicate is still taking up cache.
Because the compiler doesn't know if it needs both symbols yet.
The mode that Compiler Explorer is compiling in doesn't perform linking: it generates a .o file and disassembles it. This means both symbols still need to be present, because other .o files may link against them and jump to them. The compiler therefore needs to emit a symbol for each function, and this single-jmp thunk is the smallest one that achieves that goal.
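You can reproduce that unlinked view locally (a minimal sketch; output.swift is an assumed file name, and nm prints the mangled $s... forms of the names):

    swiftc -O -c output.swift   # emits output.o without running the linker
    nm output.o                 # both foo's and bar's symbols are still listed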