The following program demonstrates that the generic initializer below is slower when called "normally" than when it is passed as a closure to a nested function. I would have expected the opposite difference, if any.
Is this an optimizer bug / opportunity for improvement, or can it be understood in some other way?
import AppKit
extension FixedWidthInteger where Self: UnsignedInteger {

    /// Creates a value by mapping `source` from the full range of its type
    /// `S` to the full range of `Self`.
    ///
    /// ## Scaling Up
    ///
    /// The method used when scaling up to a higher bit width can be
    /// described by:
    ///
    ///     dstVal = (srcVal * dstMax) / srcMax
    ///
    /// (But it is efficiently computed without integer division and without
    /// the risk of overflowing.)
    ///
    /// So converting, for example, a (hypothetical) 2-bit type to a 4-bit
    /// type, i.e. mapping from the range [0, 3] to the range [0, 15]:
    ///
    ///         0       1       2       3
    ///     +-------+-------+-------+-------+
    ///         |       |       |       |
    ///      .--'      .'       '.      '--.
    ///      ↓         ↓         ↓         ↓
    ///     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ///      0 1 2 3 4 5 6 7 8 9 A B C D E F
    ///
    /// The implementation uses the fact that the bit pattern of the
    /// destination will always be the bit pattern of the source repeated,
    /// starting with the most significant bit (left). Here is a conversion
    /// from a (hypothetical) 2-bit type to a 5-bit type:
    ///
    ///     2-bit      5-bit
    ///     source     destination
    ///     (0) 00 --> 00000  (0)
    ///     (1) 01 --> 01010 (10)
    ///     (2) 10 --> 10101 (21)
    ///     (3) 11 --> 11111 (31)
    ///
    /// ## Scaling Down
    ///
    /// When converting from a larger to a smaller bit width, the source is
    /// simply shifted to discard the lower bits. So the opposite conversion,
    /// from 4 to 2 bits, looks like this:
    ///
    ///      0 1 2 3 4 5 6 7 8 9 A B C D E F
    ///     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ///      ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
    ///     +-------+-------+-------+-------+
    ///         0       1       2       3
    ///
    /// - Parameter source: The value to be mapped from the full range of
    ///   its type `S` to the full range of `Self`.
    init<S>(rangeConverted source: S)
        where S: FixedWidthInteger, S: UnsignedInteger
    {
        let srcByteCount = MemoryLayout<S>.size
        let dstByteCount = MemoryLayout<Self>.size
        if srcByteCount < dstByteCount {
            // Scaling up: repeat the source's bit pattern, doubling the
            // filled width each round until the destination is full.
            let dstBitCount = dstByteCount &* 8
            self = Self(truncatingIfNeeded: source)
            var shifts = srcByteCount &* 8
            while shifts < dstBitCount {
                self |= self << shifts
                shifts = shifts &* 2
            }
        } else if srcByteCount > dstByteCount {
            // Scaling down: discard the lower bits of the source.
            let shifts = (srcByteCount &- dstByteCount) &* 8
            let a = source >> shifts
            self = Self(truncatingIfNeeded: a)
        } else {
            // Same bit width: the value is unchanged.
            self = Self(truncatingIfNeeded: source)
        }
    }
}
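As a quick sanity check (separate from the benchmark), the bit-repetition scheme described in the doc comment can be verified against the exact formula for one concrete pair of types; `UInt8` → `UInt64` is assumed here:

```swift
// Check that repeating the source's bit pattern equals
// dstVal = (srcVal * dstMax) / srcMax for UInt8 -> UInt64.
// This is exact because UInt64.max / UInt64(UInt8.max) is
// 0x0101_0101_0101_0101 with no remainder, so the formula reduces to a
// single overflow-free multiplication for this pair of types.
var allMatch = true
for s: UInt64 in 0 ... 255 {
    var repeated = s
    repeated |= repeated << 8
    repeated |= repeated << 16
    repeated |= repeated << 32
    let formula = s &* (UInt64.max / UInt64(UInt8.max))
    allMatch = allMatch && (repeated == formula)
}
print(allMatch)  // true
```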
/*
// NOTE: I am including this specialized version only because it can be used
// to easily verify that the "Normal test" will be as fast as the "Faster test"
// if it uses this version. Simply change the "Normal test" so that it calls
// this version instead of the generic one (note that the "Faster test" should
// still use the generic initializer, and it will still be as fast as this
// specialized version).
extension UInt64 {
    init(rangeConvertedSpecialized value: UInt8) {
        self = UInt64(truncatingIfNeeded: value)
        self |= self << 8
        self |= self << 16
        self |= self << 32
    }
}
*/
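For the scaling-down branch, a concrete pair of real types behaves just like the 4-bit → 2-bit picture above; `UInt16` → `UInt8` is used here purely as an illustration:

```swift
// Scaling down UInt16 -> UInt8: shift right by
// (srcByteCount - dstByteCount) * 8 = 8 bits, keeping only the high byte.
let src: UInt16 = 0xABCD
let shifts = (MemoryLayout<UInt16>.size - MemoryLayout<UInt8>.size) &* 8
let dst = UInt8(truncatingIfNeeded: src >> shifts)
print(String(dst, radix: 16))  // ab
```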
func test() {
    let num = 100_000_000
    var randomBytes = [UInt8](repeating: 0, count: num)
    for trial in 0 ..< 5 {
        print("#\(trial)")
        // Set new random bytes for each trial:
        randomBytes.withUnsafeMutableBytes { (ptr) -> Void in
            let rc = SecRandomCopyBytes(nil, ptr.count, ptr.baseAddress!)
            precondition(rc == errSecSuccess)
        }
        // ---- First a test that calls the initializer normally ----
        var checksum = UInt64(0)
        let t0 = CACurrentMediaTime()
        for e in randomBytes {
            let dst = UInt64(rangeConverted: e)
            checksum = checksum ^ dst
        }
        let t1 = CACurrentMediaTime()
        print(" Normal test:", t1 - t0, "seconds (checksum: \(checksum))")
        // ---- Then the same test, only formulated differently ----
        func unexpectedlyFaster(initializer: (UInt8) -> UInt64) {
            var checksum = UInt64(0)
            let t0 = CACurrentMediaTime()
            for e in randomBytes {
                let dst = initializer(e)
                checksum = checksum ^ dst
            }
            let t1 = CACurrentMediaTime()
            print(" Faster test:", t1 - t0, "seconds (checksum: \(checksum))")
        }
        unexpectedlyFaster(initializer: UInt64.init(rangeConverted:))
    }
}
test()
Here is an example run on my MBP (with Xcode 9.3 beta 2, default toolchain, same result with recent dev snapshots):
› swiftc --version
Apple Swift version 4.1 (swiftlang-902.0.38 clang-902.0.30)
Target: x86_64-apple-darwin17.4.0
› swiftc -O -whole-module-optimization -gnone -static-stdlib test.swift
› ./test
#0
 Normal test: 0.13397653499851 seconds (checksum: 13744632839234567870)
 Faster test: 0.0593490870087408 seconds (checksum: 13744632839234567870)
#1
 Normal test: 0.129204154014587 seconds (checksum: 14974415777481871311)
 Faster test: 0.0549142229720019 seconds (checksum: 14974415777481871311)
#2
 Normal test: 0.129486935969908 seconds (checksum: 14757395258967641292)
 Faster test: 0.054901993018575 seconds (checksum: 14757395258967641292)
#3
 Normal test: 0.1290586279938 seconds (checksum: 16131858542891098079)
 Faster test: 0.0576003830065019 seconds (checksum: 16131858542891098079)
#4
 Normal test: 0.133851692953613 seconds (checksum: 15263776468834178003)
 Faster test: 0.0581462570116855 seconds (checksum: 15263776468834178003)