The following code gets "optimized" into a call to a libc function, apparently called __multi3:
```swift
public struct Xoshiro256StarStar: RandomNumberGenerator {
    private var state: (UInt64, UInt64, UInt64, UInt64)

    private static func rotl(_ x: UInt64, _ k: Int32) -> UInt64 {
        return (x << k) | (x >> (64 - k))
    }

    public init(state: (UInt64, UInt64, UInt64, UInt64)) {
        assert(state.0 | state.1 | state.2 | state.3 != 0)
        self.state = state
    }

    public init(from value: UInt64) {
        assert(value != 0)
        self.init(state: (value, value << 1, value << 2, value << 3))
    }

    public mutating func next() -> UInt64 {
        let result = Self.rotl(state.1 * 5, 7) * 9
        let t = state.1 << 17
        state.2 ^= state.0
        state.3 ^= state.1
        state.1 ^= state.2
        state.0 ^= state.3
        state.2 ^= t
        state.3 = Self.rotl(state.3, 45)
        return result
    }
}
```
How is this acceptable? The point of Embedded Swift is targeting embedded platforms, where libc may not be present. Why would the compiler generate libc calls when not asked to?
Is there a flag to turn off whatever is causing this?
And why on earth would it do that anyway? I believe a function call is orders of magnitude slower than a few bitwise operations.
Embedded Swift minimizes external dependencies (i.e. functions that need to be available at link-time), but they still exist. There are generally two categories of dependencies: […] (2) functions/symbols that are implicitly added by LLVM and the compiler pipeline. […]
For (2), external dependencies are also triggered by specific code needing them, but they are somewhat lower-level patterns where it might not be obvious that such patterns should cause external dependencies: […]
multiplication/division/modulo intrinsics
[…] The user and/or the platform (via basic libraries like libc or compiler builtins) is expected to provide these well-known APIs.
I think there's a difference between memcpy and this function. It's not what I would consider a "well-known" API.
It seems to be related to overflow checks, though: using &* removes the calls (and makes more sense here anyway).
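For reference, here is a sketch of the same generator with the two multiplications in next() switched to the wrapping operator &*, as suggested above. Everything else is copied from the original post; this is an illustration, not the poster's final code.

```swift
public struct Xoshiro256StarStar: RandomNumberGenerator {
    private var state: (UInt64, UInt64, UInt64, UInt64)

    private static func rotl(_ x: UInt64, _ k: Int32) -> UInt64 {
        return (x << k) | (x >> (64 - k))
    }

    public init(state: (UInt64, UInt64, UInt64, UInt64)) {
        assert(state.0 | state.1 | state.2 | state.3 != 0)
        self.state = state
    }

    public mutating func next() -> UInt64 {
        // &* wraps modulo 2^64 instead of trapping on overflow, so no
        // overflow check (and no widened multiply to support it) is needed.
        let result = Self.rotl(state.1 &* 5, 7) &* 9
        let t = state.1 << 17
        state.2 ^= state.0
        state.3 ^= state.1
        state.1 ^= state.2
        state.0 ^= state.3
        state.2 ^= t
        state.3 = Self.rotl(state.3, 45)
        return result
    }
}

var rng = Xoshiro256StarStar(state: (1, 2, 3, 4))
print(rng.next())
print(Int.random(in: 0..<10, using: &rng))
```

Wrapping is also the semantics the algorithm relies on: the reference xoshiro256** is written with C's uint64_t, whose multiplication is defined modulo 2^64. With the checked `*`, a large enough state.1 would trap at runtime rather than produce the next value.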
What bothers me is that, to me, this kind of thing should just be inlined. I see no reason to emit calls to functions like roundf, which on Wasm, for example, is just a dedicated instruction. You're effectively calling a one-instruction function, which seems incredibly inefficient.
__multi3 isn't a libc function; it comes from the C runtime (libgcc, libcompiler_rt, etc.).
And why on earth would it do that anyway? I believe a function call is orders of magnitude slower than a few bitwise operations.
On most platforms (especially embedded platforms) a function call is a few cycles; a multi-word multiplication is generally more expensive,¹ and moving it out of line can be a substantial codesize (and indirectly, performance) win.
There is an (LLVM) performance bug here, namely that 64-bit multiply-with-overflow should have its own runtime binding, rather than using the 128-bit low multiply and then checking for overflow,² but using &* is what you really want anyway.
¹ Some embedded targets don't even have a multiplication instruction.
² We could also work around this at the stdlib level, but it would be better to fix it in LLVM so that all languages benefit.
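To make the point about multi-word multiplication concrete, the sketch below builds a full 64×64→128-bit unsigned product out of 32×32→64-bit partial products, roughly the kind of work a runtime routine has to do on a target without a wide multiply. The helper name mulFull64 is hypothetical, and this is not the actual compiler-rt implementation of __multi3 (which works on 128-bit operands).

```swift
/// Hypothetical helper: full 64x64 -> 128-bit unsigned multiply using only
/// 32x32 -> 64-bit partial products (schoolbook multiplication on 32-bit limbs).
func mulFull64(_ a: UInt64, _ b: UInt64) -> (high: UInt64, low: UInt64) {
    let mask: UInt64 = 0xFFFF_FFFF
    let aLo = a & mask, aHi = a >> 32
    let bLo = b & mask, bHi = b >> 32

    // Four partial products; each is at most (2^32 - 1)^2, so none overflows.
    let ll = aLo * bLo
    let lh = aLo * bHi
    let hl = aHi * bLo
    let hh = aHi * bHi

    // Sum the middle column (bits 32...95); it fits easily in 64 bits.
    let mid = (ll >> 32) + (lh & mask) + (hl & mask)
    // Low word: low 32 bits of the middle column on top of the low 32 bits of ll.
    let low = (mid << 32) | (ll & mask)
    // High word: top partial product plus every carry out of the lower columns.
    let high = hh + (lh >> 32) + (hl >> 32) + (mid >> 32)
    return (high, low)
}

let p = mulFull64(UInt64.max, UInt64.max)
print(p.high, p.low)
```

Even in this simple form it takes four multiplies plus several adds and shifts, and on a target with no multiply instruction at all each of those 32-bit multiplies becomes a further call or loop, which is why keeping it out of line can pay off in code size.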
That makes more sense. I initially forgot that the default in Swift is to perform those checks, and I got really confused about where it was getting the 128-bit integers from.
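The source of the 128-bit integers can be seen at the Swift level: an unsigned 64-bit multiply overflows exactly when the high half of the full 128-bit product is non-zero. The sketch below expresses that check with the standard library's multipliedFullWidth(by:) and compares it with multipliedReportingOverflow(by:). The helper multiplyOverflows is hypothetical, and whether a given backend actually lowers the checked `*` to a __multi3 call depends on the target.

```swift
/// Hypothetical helper: a UInt64 multiply overflows iff the high 64 bits of
/// the full 128-bit product are non-zero.
func multiplyOverflows(_ a: UInt64, _ b: UInt64) -> Bool {
    let (high, _) = a.multipliedFullWidth(by: b)
    return high != 0
}

let x: UInt64 = 0x9E37_79B9_7F4A_7C15

// Both checks agree; the checked `*` would trap here, while &* returns the
// low 64 bits (the same value as partialValue).
print(multiplyOverflows(x, 5))                        // true
print(x.multipliedReportingOverflow(by: 5).overflow)  // true
print(x &* 5 == x.multipliedReportingOverflow(by: 5).partialValue) // true
```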
It's annoying that this kind of thing can happen with very normal-looking code that just multiplies numbers, and I keep running into unexpected dependencies on functions I'd never heard of before. But I use the language on weird experimental targets, so I guess that makes sense.
My $0.02 here is that @Lua is completely right on the part that "innocent arithmetic in your code should not suddenly fail to link", and this observation from @scanon is relevant:
When building Mach-O files with the macOS toolchain, the toolchain contains prebuilt libcompiler_rt builtins libraries exactly for this purpose. So if LLVM decides to make a call into libcompiler_rt, that'll work just fine. If we replicate this for Wasm (include a Wasm build of libcompiler_rt in the Swift toolchain), then this should work out of the box on Wasm too. In a way, a toolchain really is incomplete if it doesn't contain libcompiler_rt.
(Of course, this is orthogonal to whether there's opportunities to improve the optimizer in LLVM.)
It doesn't have to contain libcompiler_rt specifically, but it has to link (or otherwise bundle) by default any routines that the target backend emits calls to on its own.