When initializing a large tuple of trivially-copyable types in embedded Swift, the compiler generates individual field copy operations for each element instead of using a single memcpy. For a 64-element tuple, this generates 64 separate store operations in the IR, causing compilation to hang or take an extremely long time.
Environment:
- Swift 6.0 (swiftlang-6.0.0.9.10 clang-1600.0.26.2)
- Target: riscv32-none-none-elf (embedded Swift, bare metal)
- macOS 15.3.1
Minimal reproducer:
// Compile with: swiftc -target riscv32-none-none-elf -enable-experimental-feature Embedded -wmo -O test.swift
@frozen
public struct SmallStruct {
    var a: UInt32 = 0
    var b: UInt32 = 0
    var c: UInt32 = 0
    var d: UInt32 = 0
}

public struct Container {
    // 64 elements × 16 bytes = 1KB tuple
    var storage: (
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct
    )

    public init() {
        let empty = SmallStruct()
        self.storage = (
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty
        )
    }
}

@_cdecl("test")
public func test() -> Container {
    return Container()
}
Expected behavior:
Since SmallStruct is trivially copyable (POD) and the entire tuple is bitwise-takable, the compiler should generate a single memcpy or memset for initialization.
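To make the expectation concrete, here is a minimal C++ analogy (not Swift, and not compiler code). The `Storage` type and both helper functions are hypothetical stand-ins for the 64-element tuple. The sketch only illustrates that, for a plain-data aggregate of this shape, one bulk byte copy gives the same result as 64 per-element copies:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// C++ stand-in for the Swift SmallStruct: four 32-bit fields, 16 bytes.
struct SmallStruct {
    std::uint32_t a = 0, b = 0, c = 0, d = 0;
};

// Hypothetical stand-in for the 64-element tuple (1 KB of plain data).
struct Storage {
    SmallStruct elems[64];
};

static_assert(std::is_trivially_copyable_v<Storage>,
              "the whole aggregate is plain data");
static_assert(sizeof(Storage) == 64 * 16,
              "64 elements x 16 bytes = 1024 bytes");

// Per-element copy: one assignment per element, analogous to the
// field-by-field sequences seen in the generated IR.
void copyPerElement(Storage &dst, const Storage &src) {
    for (int i = 0; i < 64; ++i)
        dst.elems[i] = src.elems[i];
}

// Bulk copy: a single memcpy covering the whole aggregate.
void copyBulk(Storage &dst, const Storage &src) {
    std::memcpy(&dst, &src, sizeof(Storage));
}
```

Both functions leave `dst` byte-for-byte identical to `src`; the difference is only in how many copy operations are expressed.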
Actual behavior:
Compilation hangs or takes minutes. When it eventually completes, the generated IR contains 64 separate field initialization sequences.
Analysis:
Looking at lib/IRGen/GenRecord.h, the initializeWithCopy method has an optimization path for trivially-copyable types at line 239:
if (this->isTriviallyDestroyable(ResilienceExpansion::Maximal) &&
    isa<LoadableTypeInfo>(this)) {
  return cast<LoadableTypeInfo>(this)->LoadableTypeInfo::initializeWithCopy(...);
}
The problem is that large tuples become FixedTupleTypeInfo (non-loadable, because they are too big for registers) rather than LoadableTupleTypeInfo. The isa<LoadableTypeInfo> check therefore fails, and even though the type is trivially copyable, initializeWithCopy falls through to the field-by-field loop.
The initializeWithTake method already handles this correctly for bitwise-takable types (line 270):
if (this->isBitwiseTakable(ResilienceExpansion::Maximal)) {
  IGF.Builder.CreateMemCpy(...);
}
A similar check should be added to initializeWithCopy for fixed-size, trivially-copyable, bitwise-takable types.
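The difference between the current and the proposed dispatch can be sketched with a small standalone C++ model. This is a toy, not IRGen code: `TypeTraits`, `CopyStrategy`, and the three functions are hypothetical names chosen to mirror the checks discussed in the analysis, and the model assumes the large tuple is fixed-size, trivially destroyable, and bitwise-takable but not loadable.

```cpp
#include <cstddef>

// Hypothetical summary of the properties the analysis refers to.
// These fields mirror the issue text; they are not the real IRGen classes.
struct TypeTraits {
    bool triviallyDestroyable;  // no destructor work needed (POD)
    bool bitwiseTakable;        // can be moved with a raw byte copy
    bool loadable;              // models isa<LoadableTypeInfo>(this)
    bool fixedSize;             // size is known at compile time
    std::size_t fieldCount;     // number of fields in the record or tuple
};

enum class CopyStrategy { LoadableFastPath, BulkMemcpy, PerField };

// Models the behavior described in the analysis: only loadable types
// take the fast path; everything else is copied field by field.
CopyStrategy currentStrategy(const TypeTraits &t) {
    if (t.triviallyDestroyable && t.loadable)
        return CopyStrategy::LoadableFastPath;
    return CopyStrategy::PerField;
}

// Models the proposed behavior: an extra check routes fixed-size,
// trivially-copyable, bitwise-takable types to a single memcpy.
CopyStrategy proposedStrategy(const TypeTraits &t) {
    if (t.triviallyDestroyable && t.loadable)
        return CopyStrategy::LoadableFastPath;
    if (t.fixedSize && t.triviallyDestroyable && t.bitwiseTakable)
        return CopyStrategy::BulkMemcpy;
    return CopyStrategy::PerField;
}

// Number of copy operations each strategy emits in this toy model.
std::size_t emittedCopies(CopyStrategy s, const TypeTraits &t) {
    return s == CopyStrategy::PerField ? t.fieldCount : 1;
}
```

In this model, a tuple described as `TypeTraits{true, true, false, true, 64}` gets 64 per-field copies under `currentStrategy` and one bulk copy under `proposedStrategy`. Whether the real fix needs additional conditions is something only the IRGen sources can confirm.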