Why does swift-frontend have multiple files marked with the -primary-file flag?

I cleaned the build folder with Cmd + Shift + K.
'Compile Mode' in the Build Settings is set to Incremental.
Using: Xcode 13.3.1

For every single file that is compiled, I see almost the same set of files marked with -primary-file. However, my understanding from swift-frontend --help was that, when compiling a single file, only that file should be marked as -primary-file. ChatGPT also confirmed this :sweat_smile:

For example, the logs for compiling CoolVC.swift look like this. Every file is marked with -primary-file.

CompileSwift normal x86_64 path/to/MyModule/CoolVC.swift (in target 'MyModule' from project 'Pods')
    cd /Users/mfaani/Dev/MyModule/main/Example/Pods
 /Applications/Xcode13.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swift-frontend -frontend -c -filelist /var/folders/gp/q_jsdfkjasdkfas0t2tz_s640000gp/T/TemporaryDirectory.k7DLR8/sources-1 
-primary-file path/to/MyModule/Constants/Segues.swift 
-primary-file path/to/MyModule/CustomViews/SettingsView.swift 
-primary-file path/to/MyModule/CoolVC.swift 
< A LOT MORE FILES>
-primary-file path/to/MyModule/LoaderVM.swift 
-primary-file path/to/MyModule/Utilities/SwiftUI/Shadow.swift -supplementary-output-file-map

I've tried inspecting the logs in a number of different scenarios:

I have a workspace with an app project + a dependencies project:

  • Fresh build after clean:
    • For code in app project I only see a single -primary-file being used to compile a swift file
    • For code in the dependencies project I see multiple -primary-file flags, along with the -filelist option, being used to compile a swift file
  • If I make a single change to a swift file from my dependencies project, then within the compilation of CoolVC, I see a single file marked as -primary-file.

For another simple app straight out of Xcode's app template, when I build for the first time, I only see a single file being marked as -primary-file. I even tried making the code in foo.swift depend on bar.swift, which depends on baz.swift, but I still got the expected result, i.e. a single file was marked as -primary-file.

So is there an explanation as to why in the dependencies project, only the first time I see multiple/all files marked as -primary-file?

(both repos/projects have 'Compile Mode' set to incremental)

2nd Question:
When the module has A LOT of files, I don't see all of them listed in the command for CoolVC.swift. For every file in the module I see an almost identical set of 25 swift files marked with -primary-file, along with a filelist, which gets created and vanishes into the ether after compilation finishes, meaning I can't inspect it. But I suppose it contains the list of all files in the module.

My question is, why is there a mix and match? Why not just pass the filelist and be done? Why are there both a filelist and 25 swift files explicitly listed?

You can read about the Swift compilation model here: https://download.swift.org/docs/assets/generics.pdf; see Chapter 2 (page 29). In “batch mode” the driver partitions the list of files in your module into one or more batches, and spawns a frontend job to compile each batch; the files of that batch are the primary files in the frontend job.
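To make the batching idea concrete, here is a minimal sketch in Python of how a driver could turn batches into frontend command lines. It is only an illustration: the `partition` and `frontend_args` helpers are hypothetical, the "contiguous, near-equal batches" policy is an assumption rather than the driver's real heuristic, and the argument layout simply mirrors the style of the commands shown in the PDF.

```python
from typing import List


def partition(files: List[str], batch_count: int) -> List[List[str]]:
    """Split files into contiguous, near-equal batches (assumed policy, not the real driver's)."""
    size, extra = divmod(len(files), batch_count)
    batches, start = [], 0
    for i in range(batch_count):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty batches when there are more batches than files
            batches.append(files[start:end])
        start = end
    return batches


def frontend_args(files: List[str], batch: List[str]) -> List[str]:
    """List every source file; prefix the files of this batch with -primary-file."""
    primaries = set(batch)
    args = ["swift-frontend", "-frontend", "-c"]
    for f in files:
        if f in primaries:
            args.append("-primary-file")
        args.append(f)
    return args


if __name__ == "__main__":
    sources = ["m.swift", "v.swift", "c.swift", "a.swift"]
    for batch in partition(sources, batch_count=2):
        print(" ".join(frontend_args(sources, batch)))
```

With two batches over four files, each printed command lists all four files but marks only its own two as primary, which is the same shape as the multi -primary-file command in the question.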


AWESOME PDF :medal_military:

Pg. 30
"

swiftc m.swift v.swift c.swift -###
swift-frontend -frontend -c -primary-file m.swift v.swift c.swift ...
swift-frontend -frontend -c m.swift -primary-file v.swift c.swift ...
swift-frontend -frontend -c m.swift v.swift -primary-file c.swift ...
ld m.o v.o c.o -o main

In the above, we’re performing a batch mode build, but the module only has three source files, so for maximum parallelism each batch consists of a single source file."


So it seems that in my example, 25 was the number used to achieve maximum parallelism.

I was going to ask what happens if you need to parse a whole lot more files, which I later noticed is covered under the topic of "Delayed Parsing". Pasting some notes just so this post contains the gist of what I'll need in the future:

Pg. 36:
"However, the situation with parsing and type checking is more subtle. At a minimum, each frontend job must parse and type check its primary files. Furthermore, the partition of source files into frontend jobs is artificial and not visible to the user, and certainly a declaration in a primary file can reference declarations in secondary files. Therefore, in the general case, the abstract syntax tree for all secondary files must be available to a frontend job as well. On the other hand, it would be inefficient if every frontend job was required to fully parse all secondary files, because the time spent in the parser would be proportional to the number of frontend jobs multiplied by the number of source files, negating the benefits of parallelism."

Assume there's 100 files to compile.
And say we have 10 cores:

Are you saying 'parallelizing a batch of 10 files at once to 10 cores' is better than 'parallelizing 10 files to 10 cores 10 times'?

What advantage do you get from the first one? Does it somehow end up with less parsing?

Another point to keep in mind is that in the build log in Xcode, it will appear as if there is one frontend invocation for each source file, but in reality there’s one per batch, so if a batch has two primary files it might seem like the same command was run twice with both primary files, when in fact it just shows the output twice. I think this is just an artifact of how Xcode models build results internally and collects errors, because clang always had a 1:1 mapping between frontend jobs and source files; there’s no equivalent of “batch mode”.

I’m not sure I quite follow, sorry. If you have 100 source files and 10 cores, the Swift driver will kick off 10 frontend jobs responsible for building 10 files each. Each job will parse its own 10 primary files completely, and then lazily parse bits of the remaining 90 files as needed during type checking.

Apologies my original question wasn't clear.

What is the gain with 'batch mode'?

Assume there's 100 files to compile.
And say we have 10 cores, which of the two is more performant?

  • 'parallelizing 10 batches, each with 10 files, to 10 cores' (fewer calls to the CPU, but each one takes longer)
  • 'parallelizing without batching, i.e. 10 files to 10 cores, 10 times' (more calls to the CPU, but each one is shorter)

I feel like the total time should be the same...

When a frontend job has more than one primary file, it can help amortize the cost of doing operations on secondary files, such as parsing and type checking the declarations that are referenced from primary files. It also amortizes the cost of loading declarations from imported Swift and Objective-C modules, if the same declarations are used from the primary files in a batch. You can find explanations in sections 2.2 and 2.3 of that PDF, although maybe it’s not as clear as it could be.

If we take it to its limit we’d have a single frontend job where all files are primary; that’s WMO mode. But it doesn’t take advantage of multicore concurrency. So batch mode is the sweet spot: one frontend job per CPU core so you get some amount of parallelism as well as caching information across some primary files.
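A toy cost model can show why the total time is not the same in the three cases. The sketch below is not a measurement of the real compiler: the `wall_time` function is hypothetical, and it assumes every frontend job pays a fixed overhead once (standing in for loading imported modules and processing shared secondary-file declarations), that each primary file adds a fixed cost, and that jobs run in waves of at most one per core. The numbers are made up.

```python
import math


def wall_time(files: int, jobs: int, cores: int,
              per_job_overhead: float, per_file_cost: float) -> float:
    """Estimated wall-clock time when `files` are split evenly across `jobs` (toy model)."""
    files_per_job = math.ceil(files / jobs)
    # Each job pays the shared overhead once, then the cost of its primary files.
    job_time = per_job_overhead + files_per_job * per_file_cost
    # Jobs beyond the core count have to wait for a later wave.
    waves = math.ceil(jobs / cores)
    return waves * job_time


if __name__ == "__main__":
    FILES, CORES = 100, 10
    OVERHEAD, PER_FILE = 2.0, 1.0  # arbitrary units, chosen only for illustration
    for label, jobs in [("one file per job", FILES), ("batch mode", CORES), ("WMO", 1)]:
        print(label, wall_time(FILES, jobs, CORES, OVERHEAD, PER_FILE))
```

Under these assumptions the one-file-per-job case pays the overhead 100 times (30.0 units of wall time), batch mode pays it once per core (12.0), and the single WMO job cannot spread its files across cores (102.0). If the overhead is set to 0, the first two cases come out equal, which matches the intuition in the question; the difference comes entirely from the repeated per-job work.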
