Large tuples become FixedTupleTypeInfo (non-loadable, too big for registers) rather than LoadableTupleTypeInfo

When initializing a large tuple of trivially-copyable types in embedded Swift, the compiler copies each element individually instead of emitting a single memcpy. For a 64-element tuple, this produces 64 separate field-initialization sequences in the IR, and compilation either hangs or takes minutes.

Environment:

  • Swift 6.0 (swiftlang-6.0.0.9.10 clang-1600.0.26.2)

  • Target: riscv32-none-none-elf (embedded Swift, bare metal)

  • macOS 15.3.1

Minimal reproducer:

// Compile with: swiftc -target riscv32-none-none-elf -enable-experimental-feature Embedded -wmo -O test.swift

@frozen
public struct SmallStruct {
    var a: UInt32 = 0
    var b: UInt32 = 0
    var c: UInt32 = 0
    var d: UInt32 = 0
}

public struct Container {
    // 64 elements × 16 bytes = 1KB tuple
    var storage: (
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct,
        SmallStruct, SmallStruct, SmallStruct, SmallStruct
    )
    
    public init() {
        let empty = SmallStruct()
        self.storage = (
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty,
            empty, empty, empty, empty, empty, empty, empty, empty
        )
    }
}

@_cdecl("test")
public func test() -> Container {
    return Container()
}
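
As a quick sanity check of the size arithmetic in the comment above, here is a small host-side sketch using MemoryLayout. The nested typealiases (Four, Sixteen, SixtyFour) are shorthand introduced only for this illustration: they have the same size as the flat 64-element tuple in the reproducer but are not the same type, so this confirms only the 1 KB figure and says nothing about how IRGen lowers the flat tuple.

```swift
// Same field layout as SmallStruct in the reproducer.
struct SmallStruct {
    var a: UInt32 = 0
    var b: UInt32 = 0
    var c: UInt32 = 0
    var d: UInt32 = 0
}

// Hypothetical shorthand: nested tuples standing in for the flat 64-element tuple.
typealias Four = (SmallStruct, SmallStruct, SmallStruct, SmallStruct)
typealias Sixteen = (Four, Four, Four, Four)
typealias SixtyFour = (Sixteen, Sixteen, Sixteen, Sixteen)

// Four UInt32 fields, no padding.
let elementSize = MemoryLayout<SmallStruct>.size
// 64 elements of elementSize bytes each.
let tupleSize = MemoryLayout<SixtyFour>.size

print("element:", elementSize, "bytes, tuple:", tupleSize, "bytes")
```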

Expected behavior:

Since SmallStruct is trivially copyable (POD) and the entire tuple is bitwise-takable, the compiler should generate a single memcpy or memset for initialization.
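
To make the distinction concrete, here is a source-level sketch of the two shapes on a small tuple. It is only an analogy for what IRGen could emit, not the compiler's actual output; Quad, copyFieldByField and copyBulk are names made up for this example, and it is meant to run on a host toolchain rather than the embedded target.

```swift
// Hypothetical 4-element stand-in for the large tuple.
typealias Quad = (UInt32, UInt32, UInt32, UInt32)

// Element-wise: one assignment per field, analogous to the field-by-field path.
func copyFieldByField(_ src: Quad) -> Quad {
    var dst: Quad = (0, 0, 0, 0)
    dst.0 = src.0
    dst.1 = src.1
    dst.2 = src.2
    dst.3 = src.3
    return dst
}

// Bulk: a single byte copy of the whole value, analogous to one memcpy.
// This is only valid because every element is trivially copyable.
func copyBulk(_ src: Quad) -> Quad {
    var dst: Quad = (0, 0, 0, 0)
    withUnsafeMutableBytes(of: &dst) { dstBytes in
        withUnsafeBytes(of: src) { srcBytes in
            dstBytes.copyMemory(from: srcBytes)
        }
    }
    return dst
}

let sample: Quad = (1, 2, 3, 4)
// Both strategies yield the same value for trivially-copyable elements.
print(copyBulk(sample) == copyFieldByField(sample))
```

For trivial types the two are observably identical, which is why the bulk form is the one I would expect the compiler to pick.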

Actual behavior:

Compilation hangs or takes minutes. When it eventually completes, the generated IR contains 64 separate field initialization sequences.

Analysis:

Looking at lib/IRGen/GenRecord.h, the initializeWithCopy method has an optimization path for trivially-copyable types at line 239:

if (this->isTriviallyDestroyable(ResilienceExpansion::Maximal) &&
    isa<LoadableTypeInfo>(this)) {
  return cast<LoadableTypeInfo>(this)->LoadableTypeInfo::initializeWithCopy(...);
}

The problem is that large tuples become FixedTupleTypeInfo (non-loadable, too big for registers) rather than LoadableTupleTypeInfo. So the isa<LoadableTypeInfo> check fails, and even though the type is trivially copyable, it falls through to the field-by-field loop.

The initializeWithTake method already handles this correctly for bitwise-takable types (line 270):

if (this->isBitwiseTakable(ResilienceExpansion::Maximal)) {
  IGF.Builder.CreateMemCpy(...);
}

A similar check should be added to initializeWithCopy for fixed-size, trivially-copyable, bitwise-takable types.
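
The dispatch described above can be sketched as a toy model. This is not compiler code: TypeInfoModel, CopyStrategy, currentStrategy and proposedStrategy are all hypothetical names, and the model assumes the new branch would reuse the same isTriviallyDestroyable predicate as the existing check as a stand-in for "trivially copyable". The real change would live in the C++ in GenRecord.h.

```swift
// Toy model only; none of these types exist in the Swift compiler.
enum CopyStrategy { case loadableCopy, memcpy, fieldByField }

struct TypeInfoModel {
    var isLoadable: Bool              // models isa<LoadableTypeInfo>(this)
    var isFixedSize: Bool             // models a fixed-size type info
    var isTriviallyDestroyable: Bool  // models isTriviallyDestroyable(...)
    var isBitwiseTakable: Bool        // models isBitwiseTakable(...)
}

// The behavior described in the analysis: only loadable types get the fast path.
func currentStrategy(_ ti: TypeInfoModel) -> CopyStrategy {
    if ti.isTriviallyDestroyable && ti.isLoadable { return .loadableCopy }
    return .fieldByField
}

// The suggested behavior: an extra branch for fixed-size, trivial, bitwise-takable types.
func proposedStrategy(_ ti: TypeInfoModel) -> CopyStrategy {
    if ti.isTriviallyDestroyable && ti.isLoadable { return .loadableCopy }
    if ti.isFixedSize && ti.isTriviallyDestroyable && ti.isBitwiseTakable {
        return .memcpy
    }
    return .fieldByField
}

// A large tuple of trivial elements: fixed-size but not loadable.
let largeTuple = TypeInfoModel(isLoadable: false,
                               isFixedSize: true,
                               isTriviallyDestroyable: true,
                               isBitwiseTakable: true)
print(currentStrategy(largeTuple), proposedStrategy(largeTuple))
```

In this model the large tuple falls through to the field-by-field path today and would take the memcpy path with the extra branch, while loadable types keep their existing path.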

I tried compiling with -target riscv32-none-none-eabi on Compiler Explorer (the -elf target doesn't seem to be available in that toolchain) and could reproduce the long compile time with Swift 6.0 (47s).
Switching to Swift 6.1, 6.2 or nightly gives much faster compile times (3-4s).

But even with Swift 6.0 the compiler emits a call to memset, so maybe I'm not hitting the same issue.

Could you try a more recent release or a nightly? Some related work was done in 6.1(?) so that large imported C structs avoid the same problem, and I suspect this case has been fixed too.