Motivation
This is going to be a bit long, but bear with me, because this is the important part.
This use case came to me while reading WAMR (WebAssembly Micro Runtime)'s interpreter code, and wondering if a similar interpreter can be implemented in Swift.
WAMR's interpreter actually comes in 3 versions, or rather, "speeds".
The "slowest speed" is a very basic while-and-switch. It loops infinitely (or rather, until the program exits), reading the opcode at the current instruction pointer and performing a switch to jump to that opcode's implementation. Then it loops and processes the next opcode. A simplified version of it would look like this (some, but not all, macros expanded for brevity):
while (frame_ip < frame_ip_end) {
    opcode = *frame_ip++;
    switch (opcode) {
        /* ... */
        case WASM_OP_F64_MIN:
        {
            float64 a, b;
            b = POP_F64();
            a = POP_F64();
            PUSH_F64(f64_min(a, b));
            continue;
        }
        /* ... */
    }
}
A port to Swift, or any other language, would be trivial.
The faster version uses a non-standard feature of certain C compilers (namely, GCC and LLVM) called "labels as values", or "computed goto". First, instead of switch cases, opcodes are organized under goto labels:
HANDLE_WASM_OP_F64_MIN:
{
    float64 a, b;
    b = POP_F64();
    a = POP_F64();
    PUSH_F64(f64_min(a, b));
    continue;
}
An array holds a mapping of opcodes to labels:
static const void *handle_table[256] = {
    &&HANDLE_WASM_OP_UNREACHABLE,
    &&HANDLE_WASM_OP_NOP,
    /* ... */
};
Then, the switch is replaced with a goto referencing labels from the array:
goto *handle_table[*frame_ip++];
So far, this is just a glorified switch with a very specific implementation. Additionally, at the end of each opcode, instead of looping back and switching again, it immediately jumps to the next instruction:
HANDLE_WASM_OP_F64_MIN:
{
    float64 a, b;
    b = POP_F64();
    a = POP_F64();
    PUSH_F64(f64_min(a, b));
    goto *handle_table[*frame_ip++];
}
This is essentially a glorified reswitch, a concept that has already come up a few times. Additionally, this is absolutely something a Sufficiently Smart™ optimizer can do: detect the while-and-switch pattern and convert the continue into a reswitch.
So far, nothing that warrants a new language feature. But it's the third, "fastest speed" that is the interesting one. This version expands on the use of labels as values, by not merely having a static constant array, but actually creating a dynamic array that directly maps instruction pointer locations to labels, without having to go through an integer opcode in between (Some macros and methods are expanded for brevity):
while (p < p_end) {
    opcode = *p++;
    p_org = p;
    disable_emit = false;

    do {
        uint32 label_addr = (uint32)(uintptr_t)handle_table[opcode];
        /* emit uint32 label address in 32-bit target */
        if (ctx->p_code_compiled) {
            STORE_U32(ctx->p_code_compiled, label_addr);
            ctx->p_code_compiled += sizeof(uint32);
        }
        else {
            increase_compiled_code_space(ctx, sizeof(uint32));
        }
        LOG_OP("\nemit_op [%02x]\t", opcode);
    } while (0);
The actual opcode handling then becomes:
HANDLE_WASM_OP_F64_MIN:
{
    float64 a, b;
    b = POP_F64();
    a = POP_F64();
    PUSH_F64(f64_min(a, b));
    do {
        const void *p_label_addr = *(void **)frame_ip;
        frame_ip += sizeof(void *);
        goto *p_label_addr;
    } while (0);
}
This technically saves only a single memory lookup per instruction. But that tiny saving adds up quickly, once you realize it applies to every instruction in the interpreted code.
So, when considering whether this algorithm could be implemented in a high-level language like Swift, the question arose as to how you could replace "labels as values" as a feature. And the obvious answer was "functions as values":
// Can't use a typed pointer here, because the type would recursively reference the method parameters
func wasm_op_f64_min(frame_ip: UnsafeRawPointer, stack: WasmStack) -> Void {
    let a: Double = stack.pop_f64()
    let b: Double = stack.pop_f64()
    stack.push_f64(f64_min(a, b))
    let label_ptr = frame_ip.assumingMemoryBound(to: ((UnsafeRawPointer, WasmStack) -> Void).self)
    let label_addr: (UnsafeRawPointer, WasmStack) -> Void = label_ptr.pointee
    let next_frame_ip = UnsafeRawPointer(label_ptr + 1)
    // But this is a call instead of a goto
    label_addr(next_frame_ip, stack)
}
The difference between a function and a label, however, is that a function is called, not jumped into, which grows the call stack. If this repeats for every instruction, it quickly causes runaway stack growth, or simply crashes with a stack overflow.
An alternative might be to use a trampoline, returning label_addr and next_frame_ip instead of making a call. But doing so cancels out the performance advantage of saving one memory lookup, by adding an entire stack frame allocation and deallocation in its place. What is needed is a way to explicitly perform a goto into a function instead of a call.
Proposal
The suggestion is to add a new syntax for performing a goto instead of a call. There are various options for the exact keyword; for simplicity, the remainder of this pitch will use return_call, named after the matching WebAssembly opcode.
Naturally, this syntax carries some restrictions.
The callee function must match the caller's return type:
func callee_void() -> Void {
}

func callee_int() -> Int {
    return 123
}

func good_caller_void() -> Void {
    return_call callee_void() // Unlike return, this is usable even if the return type is Void
}

func good_caller_int() -> Int {
    return_call callee_int() // Returns 123
}

func bad_caller_void() -> Void {
    return_call callee_int() // Error: Cannot discard the value, not even if @discardableResult is used
}

func bad_caller_int() -> Int {
    return_call callee_void() // Error: Cannot create a value, either
}

func bad_caller_int_manip() -> Int {
    return_call callee_int() + 1 // Error: Cannot manipulate the value after it is returned
}
It must have the same throwing type as well:
enum ExampleError: Error {
    case mayThrow
}

func callee_nothrow() -> Int {
    return 123
}

func callee_maythrow() throws -> Int {
    if Bool.random() {
        return 123
    }
    throw ExampleError.mayThrow
}

func good_caller_nothrow() -> Int {
    return_call callee_nothrow()
}

func good_caller_maythrow() throws -> Int {
    return_call callee_maythrow() // No need to use try. The error is implicitly forwarded
}

func bad_caller_nothrow() -> Int {
    return_call callee_maythrow() // Error: Cannot discard the throw
}

func bad_caller_maythrow() throws -> Int {
    return_call callee_nothrow() // Error: Unlike a regular return, it's not possible to add the possibility of an error
}

func bad_caller_maythrow_try() throws -> Int {
    return_call try callee_maythrow() // Error: Cannot use an expression. try is implicit when throwing, and should not be added
}

func bad_caller_nothrow_try() -> Int {
    return_call try! callee_maythrow() // Error: Cannot manipulate the return value in any way
}
Adding to the above, it's not possible for the caller to catch an error thrown by the callee. If return_call is wrapped in a do-catch, the do-catch is ignored:
enum ExampleError: Error {
    case shallThrow
}

func callee() throws -> Void {
    throw ExampleError.shallThrow
}

func caller() throws -> Void {
    do {
        return_call callee()
    } catch {
        print("Caught in caller")
    }
}

do {
    try caller()
} catch {
    print("Caught in main")
}
The above would print Caught in main.
Likewise, unlike return, any defer blocks in a function that performs return_call will be executed before the callee is called:
func callee() -> Void {
    print("1")
}

func caller() -> Void {
    print("2")
    defer {
        print("3")
    }
    print("4")
    return_call callee()
}

print("5")
caller()
print("6")
The above would print
5
2
4
3
1
6
The same applies to releasing local references. However, since not every release results in a deinit, examples related to ARC are left as an exercise for the reader.
Difference from Tail Call Optimization (TCO)
One might consider avoiding the addition of a new syntax, and simply transforming any return callee() into a return_call. However, this approach has several disadvantages:
- An optimization is not a guarantee. Optimizations can be turned off, while a given algorithm may not work unless a return_call is used.
- Even if the optimization is mandatory starting from a given Swift version, a program relying on it will compile without warning on an earlier version of Swift, only to fail later at run time. When compiling a program that relies on a later version of Swift using an earlier version, it is better to receive a syntax error at compile time than to hit an unexpected error at run time.
- TCO, being an optimization, has to preserve the semantics of return. For instance, it cannot apply in the presence of defer or do-catch. Not only that, it would "fail" without issuing any warning or error at compile time; the program would instead fail unexpectedly at run time.
- Likewise, TCO does not play well with ARC. The mere presence of a local reference variable makes reference handling that much more complex, if the release cannot occur after the callee returns.
- While return_call takes less memory than return, it is not guaranteed to be faster. As such, it should be up to the developer to decide when to use it, and when not to.
Alternative syntax
Other possible names for the new syntax:
- tail return
- tail_call
- tailcall
- return continue
- goto
Parameters
There is no semantic reason why the callee and caller would be required to have the same parameters, but it may simplify the implementation if they are. This aspect is open for debate.