I quickly cooked up a minimal BigInt implementation from scratch (addition only); here's what I got:
| Count | Time |
|---|---|
| 10,000 | 1.834ms |
| 30,000 | 10.102ms |
| 100,000 | 94.939ms |
| 300,000 | 806.729ms |
| 1,000,000 | 8.699s |
It's not so bad, but not quite as good as the Node.js timings indicated above. In my implementation I've manually unrolled the loop, effectively working with "8x64"-bit ints in the outermost loop (a simplified, non-unrolled sketch of the core loop is included after the list below). Possible reasons why this implementation is slower than the Node.js one:
- they might be using `asm`.
- they could be using 128-bit ints, while my implementation is using 64-bit ints as the base type.
- they could be using SIMD tricks.
- there's some ARC traffic in my implementation.
- the difference could also just come down to computer specs (I'm on a 2021 MacBook Pro, which is an M1 Pro @ 3GHz if I'm not mistaken).
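
For reference, here's roughly what the core addition loop looks like when done one limb at a time. This is only a sketch under my own assumptions, not the actual code being benchmarked above, and the `BigUInt`/`limbs` names are illustrative:

```swift
// Sketch of limb-by-limb BigInt addition with 64-bit limbs stored
// least-significant first. Names are hypothetical, for illustration only.
struct BigUInt {
    var limbs: [UInt64]  // little-endian limb order

    static func + (a: BigUInt, b: BigUInt) -> BigUInt {
        let n = max(a.limbs.count, b.limbs.count)
        var result = [UInt64]()
        result.reserveCapacity(n + 1)
        var carry: UInt64 = 0
        for i in 0..<n {
            let x = i < a.limbs.count ? a.limbs[i] : 0
            let y = i < b.limbs.count ? b.limbs[i] : 0
            // Two overflow checks: x + y, then adding the incoming carry.
            let (sum1, o1) = x.addingReportingOverflow(y)
            let (sum2, o2) = sum1.addingReportingOverflow(carry)
            result.append(sum2)
            carry = (o1 ? 1 : 0) + (o2 ? 1 : 0)  // never exceeds 1
        }
        if carry != 0 { result.append(carry) }
        return BigUInt(limbs: result)
    }
}
```

Unrolling the outer loop by 8 just means processing eight limbs per iteration with the carry threaded through; it cuts loop and bounds-check overhead but doesn't change the O(n) nature of the addition.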
BTW, are there GPU-based BigInt implementations in the wild, and how do they compare to CPU-based ones?