The foo version simply calls the bar function: the jmp (output.bar(Swift.Int?) -> Swift.Int) instruction indicates that we're unconditionally jumping to bar to continue execution there.
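In other words, foo's entire body is that one tail-jump into the shared code. Roughly what Compiler Explorer shows (with the symbol names demangled) is:

    output.foo(Swift.Int?) -> Swift.Int:
            jmp     (output.bar(Swift.Int?) -> Swift.Int)   # foo's whole body: jump into bar; bar's ret returns straight to foo's caller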
Nope, it’s optimising for speed. This code is faster.
The tiny number of extra instructions and jumps is a non-issue on almost any modern processor: they'll be branch-predicted correctly every time, making their cost essentially zero (a few cycles at worst). However, because both functions share one body, that body is executed more frequently and is therefore far more likely to be resident in the instruction cache, and the smaller binary helps everything else stay cached too.
If the instruction cache hits, execution moves swiftly; if it misses, you stall until the code is fetched from further down the memory hierarchy. Depending on where it lives (L2, L3, main memory, etc.) that can mean tens to hundreds of cycles of waiting. Trading a few predictable cycles to reduce the risk of that happening is a great trade.
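For concreteness, here's a minimal sketch of the kind of source that produces this, compiled with optimisations enabled (the bodies are assumed; the thread only shows the signature, and any two identical bodies will do):

    func bar(_ x: Int?) -> Int {
        return x ?? 0        // assumed body; what matters is that it's identical to foo's
    }

    func foo(_ x: Int?) -> Int {
        return x ?? 0        // same body as bar, so the optimiser emits one copy
    }

The optimiser keeps one copy of the shared body under bar and compiles foo down to the single jmp shown above.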
But why make foo indirect at all? Why not just go direct? It seems obvious, so there must be a good reason it isn't. I get that the compiler can detect identical function bodies, generate only one, and make the other an indirect jump to it, but in doing so it's still generating a duplicate function body for the indirect jump. That doesn't make sense to me: even though the CPU can optimise this out as you explained, the duplicate is still taking up cache.
Because the compiler doesn't know if it needs both symbols yet.
The mode that Compiler Explorer is compiling in doesn't perform linking: it generates a .o file and disassembles it. This means both symbols still need to be present, because other .o files may link against them and jump to them. The compiler therefore needs to emit a symbol for each function, and this single-jmp thunk is the smallest one that achieves that goal.
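You can reproduce that unlinked view locally (a minimal sketch; output.swift is an assumed file name, and nm prints the mangled $s... forms of the names):

    swiftc -O -c output.swift   # emits output.o without running the linker
    nm output.o                 # both foo's and bar's symbols are still listed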