Ideas for code size optimizations

Every browser already has its own ICU, either linked or embedded. Things like date formatting are bridged from JavaScript perfectly well, so there shouldn't be a need to ship a copy of any of that in apps built with SwiftWasm. But if we're talking strictly about code size optimizations rather than overall binary size optimizations, I agree that discussions about ICU are off-topic here.

With integer division and modulo, the divisor is tested for zero and the program traps before the operation is executed. Could this check be removed, since the CPU will trap on division by zero anyway?

The LLVM backend would need to be able to make that transformation, since LLVM's own division instructions have formally undefined behavior (UB) if they divide by zero or the result overflows.
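
A minimal Swift sketch of the two conditions under which Int's / traps: a zero divisor, and Int.min / -1, whose quotient doesn't fit in an Int. The helper name checkedQuotient is hypothetical; dividedReportingOverflow(by:) is the standard library's non-trapping variant. The hardware argument also doesn't hold uniformly: x86 idiv and WebAssembly's div_s trap in both cases, but AArch64's sdiv quietly returns 0 for a zero divisor, so dropping the check would change behavior on some targets.

```swift
// Hypothetical helper mirroring the conditions under which Swift's `/`
// traps on Int. The real check is emitted by the compiler, not by code like this.
func checkedQuotient(_ dividend: Int, _ divisor: Int) -> Int? {
    if divisor == 0 { return nil }                           // division by zero
    if dividend == Int.min && divisor == -1 { return nil }   // quotient overflows
    return dividend / divisor                                // can no longer trap
}

// The standard library reports both cases instead of trapping.
// For a zero divisor it returns (self, true); for Int.min / -1 it returns (Int.min, true).
let seven = 7
let byZero = seven.dividedReportingOverflow(by: 0)
let overflowed = Int.min.dividedReportingOverflow(by: -1)

print(byZero.partialValue, byZero.overflow)   // 7 true
print(overflowed.overflow)                    // true
```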

Could the calling convention for internal/private functions make use
of some of the CPU flags for returning the result?

Consider functions which return a single Optional or throw an error:
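
For concreteness, here is a sketch of functions with that shape and their callers. All names are hypothetical. Each call site conceptually ends in a compare of the returned nil tag or error value followed by a conditional branch; the exact machine code depends on the compiler and target.

```swift
// Hypothetical error type, for illustration only.
enum ParseError: Error { case empty }

// Returns a single Optional: the caller has to test for nil.
func firstDigit(of s: String) -> Int? {
    guard let c = s.first, let d = c.wholeNumberValue else { return nil }
    return d
}

// Throws instead: the caller has to test whether an error came back.
func parseLength(_ s: String) throws -> Int {
    if s.isEmpty { throw ParseError.empty }
    return s.count
}

// Caller side: a compare and a branch on the nil case.
func useOptional(_ s: String) -> Int {
    if let d = firstDigit(of: s) { return d }
    return -1
}

// Caller side: a compare and a branch on "was an error returned?".
func useThrowing(_ s: String) -> Int {
    do { return try parseLength(s) } catch { return -1 }
}
```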

The callee will store the nil flag or error in a register, and then the caller
will compare this value and branch. Couldn't the callee do the compare itself, so
that the ZF (Zero flag) indicates nil or a thrown error and the caller just has
to do the conditional branch after calling the function?

In the Optional case this would also save a register which could be used if the
return type was an Optional tuple. In the throwing case, callers simply need to
conditionally branch to a return and the compare would not be needed anywhere
in the return chain.

An alternative might be to use the CF (Carry flag) to indicate an error, since it can be
cleared or set with a single-byte instruction on x86 (clc/stc).

Even functions that return Bool could possibly make use of the ZF, since many callers
branch on the result, and the rest could convert the flag into a register value with setnz.

Flags could even be combined: e.g., a throws -> Bool function could use the CF
to indicate a throw and the ZF to hold the Bool.
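
A throws -> Bool call has exactly three outcomes, so two status bits are enough. The Swift model below is a hypothetical illustration of that encoding, not anything Swift or LLVM does today; the polarity of ZF (set when the Bool is false, as test would leave it for a zero byte) is an assumption.

```swift
// Hypothetical model of a flags-based return for `throws -> Bool`.
struct FlagReturn: Equatable {
    var carry: Bool   // CF: an error was thrown
    var zero: Bool    // ZF: the Bool result was false (only meaningful when !carry)
}

// The three possible outcomes of calling a `throws -> Bool` function.
enum Outcome: Equatable {
    case threw, returnedTrue, returnedFalse
}

// Callee side: set the two flags instead of filling result/error registers.
func encode(_ o: Outcome) -> FlagReturn {
    switch o {
    case .threw:         return FlagReturn(carry: true, zero: false)
    case .returnedTrue:  return FlagReturn(carry: false, zero: false)
    case .returnedFalse: return FlagReturn(carry: false, zero: true)
    }
}

// Caller side: branch on CF first (think jc), then on ZF (jz/jnz, or setnz
// to materialize the Bool). No compare of the caller's own is needed.
func decode(_ f: FlagReturn) -> Outcome {
    if f.carry { return .threw }
    return f.zero ? .returnedFalse : .returnedTrue
}
```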

Ultimately, this might consolidate the compares executed on return values into the callee.

If custom calling conventions can be used, could they allow for different register usage
for the passed parameters as well? Currently, structs with more than 4 words are passed
on the stack, but could there be a calling convention for single-argument functions that
allowed all 6 integer argument registers to be used before spilling? This might help reduce
loads and stores to and from the stack.
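
To make the threshold concrete: a struct with five Int fields occupies five words, one more than the four-word limit described above, even though a single-argument function on x86-64 has six integer argument registers it could in principle use. The struct and function names are hypothetical, and how the argument is actually lowered is up to the compiler.

```swift
// Hypothetical five-word aggregate: one word over the four-word limit.
struct FiveWords {
    var a: Int
    var b: Int
    var c: Int
    var d: Int
    var e: Int
}

// A single-argument function of the kind discussed above. With a custom
// convention, all five words could travel in registers instead of memory.
@inline(never)
func sum(_ s: FiveWords) -> Int {
    return s.a + s.b + s.c + s.d + s.e
}

let words = MemoryLayout<FiveWords>.size / MemoryLayout<Int>.size
print(words)                                          // 5
print(sum(FiveWords(a: 1, b: 2, c: 3, d: 4, e: 5)))   // 15
```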

Functions that aren't stable-ABI are in principle able to use arbitrary calling conventions, so long as all invocations of the compiler still have a way to agree on the different convention. LLVM has fastcc to represent this. Using flags or more registers for aggregate arguments could be done, but they aren't necessarily slam dunks for code size, since the knock-on effects of potentially having to spill a flag value, or move a flag from one status bit to another, tend to require long code sequences. throws already uses a zeroed register to represent "no error", ARM64 already has a single instruction for compare-with-zero-and-branch (cbz/cbnz), and on x86-64 that's a single uop even if it's encoded as two instructions.

Using the flags register like this on x86 is especially painful because almost all computational instructions (i.e. things that are not loads or stores or branches) modify it, so you have to ensure that nothing happens between when the flag is set and read. It's more feasible on ARM, where almost all instructions have versions that do not touch the flags, but still something of a hassle. None of this is rocket science to implement, but it's swimming against the stream of existing compiler designs to some extent, since "normal" calling conventions don't do this.

There are plenty of general-purpose registers available on x86_64 and ARM that are not currently used for function returns, though, so it's almost surely easier to use one of them (note also that x86 has jcxz and friends, and cx is available as an ad-hoc return register, if someone wanted to aggressively micro-optimize for code size).
