I want to join a large array with joined(), but I found that it is very slow, much slower than flatMap(). Here is the code to reproduce it.
As you can see, joined() takes around 0.26s while flatMap() takes around 0.0064s. Do you know why this happens?
import Foundation
var datas = [Data]()
var count = 0
(0...10000).forEach { _ in
    let data = """
        public struct FlattenSequence<Base: Sequence> where Base.Element: Sequence {
        public struct FlattenSequence<Base: Sequence> where Base.Element: Sequence {
        """.data(using: .utf8)!
    datas.append(data)
    count += data.count
}
// 1st approach: takes about 0.0049s, roughly 50 times faster than the 2nd approach.
let start1 = Date()
var wholeData1 = Data(capacity: count)
datas.forEach {
    wholeData1.append($0)
}
print("Time 1: \(Date().timeIntervalSince(start1))")
// 2nd approach: takes about 0.26s.
let start2 = Date()
let wholeData2 = Data(datas.joined())
print("Time 2: \(Date().timeIntervalSince(start2))")
// 3rd approach: takes about 0.0064s.
let start3 = Date()
let wholeData3 = Data(datas.flatMap { $0 })
print("Time 3: \(Date().timeIntervalSince(start3))")
print(wholeData1 == wholeData2)
print(wholeData1 == wholeData3)
print(count)
Thanks. The next question is: why?
I suppose it's because of this documentation comment on joined():
/// The `joined` method is always lazy, but does not implicitly
/// confer laziness on algorithms applied to its result. In other
/// words, for ordinary sequences `s`:
///
/// * `s.joined()` does not create new storage
/// * `s.joined().map(f)` maps eagerly and returns a new array
/// * `s.lazy.joined().map(f)` maps lazily and return
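To make those bullets concrete, here is a minimal sketch of what the types look like. The names `nested`, `flattened`, `doubled`, and `lazyDoubled` are just illustrative; the explicit type annotations are there so the compiler confirms which overloads are picked.

```swift
let nested: [[Int]] = [[1, 2], [3], [4, 5]]

// joined() only wraps the base array in a FlattenSequence;
// no elements are copied into new storage at this point.
let flattened: FlattenSequence<[[Int]]> = nested.joined()

// map on the joined result is the ordinary eager Sequence.map,
// so it walks the FlattenSequence element by element and returns an Array.
let doubled: [Int] = flattened.map { $0 * 2 }
print(doubled) // [2, 4, 6, 8, 10]

// Going through .lazy keeps the chain lazy until something iterates it,
// here the Array initializer.
let lazyDoubled = nested.lazy.joined().map { $0 * 2 }
print(Array(lazyDoubled)) // [2, 4, 6, 8, 10]
```

So `Array(datas.joined())` has to pull elements out of the FlattenSequence through its iterator, which is presumably where the time goes.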
But we already have LazySequence, so we can always call an array method lazily via array.lazy.method. Why did Swift decide to make joined() always lazy?
I did some more tests and found that the performance of datas.lazy.flatMap { $0 }, datas.lazy.joined(), and datas.joined() is quite similar, which suggests the slowness is indeed caused by laziness.
IMHO, the better option would be to remove the laziness from joined(); we could still call it lazily via array.lazy.joined().
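As a rough sketch of what an eager version could look like, something along these lines works for me. Note that `eagerlyJoined()` is a hypothetical helper I made up for illustration, not a standard library API; it is essentially the 1st approach (append in a loop) wrapped in an extension.

```swift
extension Collection where Element: Collection {
    /// Hypothetical eager counterpart to joined(); not part of the standard library.
    /// Builds the flattened array up front instead of returning a lazy wrapper.
    func eagerlyJoined() -> [Element.Element] {
        var result: [Element.Element] = []
        // Reserve the final size once so appends do not trigger repeated reallocation.
        result.reserveCapacity(reduce(0) { $0 + $1.count })
        for chunk in self {
            // Appends a whole inner collection at a time rather than one element at a time.
            result.append(contentsOf: chunk)
        }
        return result
    }
}

let chunks: [[UInt8]] = [[1, 2], [], [3, 4, 5]]
let flat = chunks.eagerlyJoined()
print(flat) // [1, 2, 3, 4, 5]
```

The lazy form would still be available through `chunks.lazy.joined()` for callers who do not want the extra storage.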
I get the same result with arrays of UInt8, using the following code.
import Foundation
var datas = [[UInt8]]()
var count = 0
(0...10000).forEach { _ in
    let data = """
        public struct FlattenSequence<Base: Sequence> where Base.Element: Sequence {
        public struct FlattenSequence<Base: Sequence> where Base.Element: Sequence {
        """.data(using: .utf8)!
    let uint8s = [UInt8](data)
    datas.append(uint8s)
    count += data.count
}
// 1st approach: takes about 0.0049s, roughly 50 times faster than the 2nd approach.
let start1 = Date()
var wholeData1 = [UInt8]()
datas.forEach {
    wholeData1.append(contentsOf: $0)
}
print("Time 1: \(Date().timeIntervalSince(start1))")
// 2nd approach, it takes 0.26s,
let start2 = Date()
let wholeData2 = Array(datas.joined())
print("Time 2: \(Date().timeIntervalSince(start2))")
// 3rd approach, it takes 0.0064s
let start3 = Date()
let wholeData3 = Array(datas.flatMap { $0 })
print("Time 3: \(Date().timeIntervalSince(start3))")
// 4th approach, it takes 0.0064s
let start4 = Date()
let wholeData4 = Array(datas.lazy.flatMap { $0 })
// Measure from start4 (the original measured from start3 by mistake).
print("Time 4: \(Date().timeIntervalSince(start4))")
print(count)