Error when dumping the AST for hundreds of files

Vinicius_Vendramini · November 1, 2018, 5:04pm

Hi all, I'm having trouble when trying to dump the AST for a set of approximately 700 Swift files. Smaller sets, of about 20 files, work fine.

Here's the deal: when working with an iOS app, Xcode prints the commands it uses to compile the Swift files in the Report navigator. If I take those commands, remove -parseable-output, -c and the -emit-... options and add -dump-ast, I can get the Swift compiler to dump the ASTs for those files.
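For readers who want to script this rewrite, here is a minimal Python sketch of the argument filtering described above. The function name to_dump_ast_command is made up for illustration. It also assumes that any -emit-* option whose name ends in "-path" takes its value as the following argument; that is a guess about the command line's shape, so check it against the actual command Xcode prints.

```python
def to_dump_ast_command(args):
    """Return a copy of a swiftc argument list rewritten to dump the AST.

    Drops -parseable-output, -c and every -emit-* option, then appends
    -dump-ast, mirroring the manual steps described in the post.
    """
    result = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This argument is the value of an -emit-*-path option.
            skip_next = False
            continue
        if arg in ("-parseable-output", "-c"):
            continue
        if arg.startswith("-emit-"):
            # Assumption: options ending in "-path" are followed by a
            # separate value argument that must be dropped as well.
            skip_next = arg.endswith("-path")
            continue
        result.append(arg)
    result.append("-dump-ast")
    return result
```

The function works on an already-split argument list, so a command copied from the Report navigator would first need to be split with something like shlex.split.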

This works just fine for two sets of about 20 files. However, if I try the same approach on a third set of 700 files, the compiler gets through 26 of them before it prints the following error:

<unknown>:0: error: unable to execute command: Illegal instruction: 4
<unknown>:0: error: compile command failed due to signal 4 (use -v to see invocation)
LLVM ERROR: IO failure on output stream: Invalid argument

Searching online I've found that similar LLVM errors happen when the computer runs low on disk space (not my case), which leads me to speculate that maybe the mechanisms used to print the AST are running out of memory or something.

I've been trying to diagnose the error without much success so far, since it's hard to replicate using a custom built Swift toolchain. Any help would be appreciated.

jrose · November 1, 2018, 5:09pm

I don't think anyone's worked on making -dump-ast useful for multiple files. I'm pretty sure the driver is buffering stderr for each subprocess to keep from interleaving them, but that could easily lead to capacity problems. Maybe a better answer would be to add a separate option that dumps the AST to files instead of to stderr, like other outputs.

Vinicius_Vendramini · November 1, 2018, 5:29pm

Oh, I really like this idea, and I'd be happy to try to make it happen. Could you give me some more details? Like, what kind of other outputs have alternatives that print to files?

jrose · November 2, 2018, 2:33am

I would model it after what we do for -emit-silgen; look for "TY_RawSIL" and "EmitSILGen" in the Driver and FrontendTool libraries. (Note that SR-6271 means you'll have to use an output file map to get this to work well.)

Vinicius_Vendramini · November 2, 2018, 3:32pm

Ok. I checked out the docs and a little bit of the code, and I think I get how the output file map is supposed to work. So, basically, I'd do something like this:

Add a new entry for a file type, something like ".ast", associated with AST dump files. Make sure it gets properly recognized by the output file map.
Look into -emit-sil as a basis to make -dump-ast print its output into the .ast files specified in the output file map.
Build systems could then call something like swiftc *.swift -output-file-map=outputFileMap.json -dump-ast, and the compiler would dump the ASTs into the provided .ast files. The outputFileMap.json could be, for instance:

{ 
  "main.swift": {
    "dump-ast": "main.ast"
  },
  "myClass.swift": { 
    "dump-ast": "myClass.ast"
  }
}

The current functionality of -dump-ast without an output file map would be unaffected.
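A build system would probably generate that JSON rather than write it by hand. The Python sketch below builds the same structure for a list of Swift files. The "dump-ast" key and the ".ast" extension are simply the ones proposed in this post, not a confirmed compiler feature, and build_output_file_map is a hypothetical helper name.

```python
import json
import os


def build_output_file_map(swift_files, out_dir=""):
    """Map each Swift source path to a proposed AST dump path.

    The "dump-ast" key follows the proposal in this thread; it is an
    assumption, not something the compiler is known to accept.
    """
    file_map = {}
    for path in swift_files:
        # "Sources/Foo.swift" -> "Foo"
        stem = os.path.splitext(os.path.basename(path))[0]
        file_map[path] = {"dump-ast": os.path.join(out_dir, stem + ".ast")}
    return file_map


if __name__ == "__main__":
    example = build_output_file_map(["main.swift", "myClass.swift"])
    print(json.dumps(example, indent=2))
```

Files with the same base name in different directories would collide in out_dir; a real tool would need to handle that case.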

jrose · November 2, 2018, 4:43pm

That sounds about right, except I wouldn't bother including a default extension for AST dumps. (The extension is mostly important for inputs rather than outputs.)

Vinicius_Vendramini · November 3, 2018, 8:49pm

OK, I've started looking into it but I've hit a snag... it seems that when I pass the -output-file-map=... option, the driver schedules some jobs that call the compiler recursively, except further calls (after the first one) don't trigger my breakpoints, so it's getting hard to follow the code. Is there a flag or some kind of trick that will change the way this is done and make it easier to debug?

jrose · November 5, 2018, 5:47pm

Not really. When acting on multiple files, the compiler is going to spin up several frontend jobs to handle them. There's a trick for slightly faster execution: if it only has to do one thing, it uses the exec system call instead, which makes breakpoints work. But if you want to do that for many files, you might be back in one-output-for-everything land.

…which might be fine in your case, actually. Okay, try adding -force-single-frontend-invocation. (This is equivalent to -whole-module-optimization right now, but wouldn't have to be in the future.)

Vinicius_Vendramini · November 5, 2018, 6:29pm

No deal... even when using -force-single-frontend-invocation, it seems the compiler spawns another process (or another thread? not sure) to handle that single frontend invocation, so I lose control again. Also, using this option seems to make it print the SIL directly to stdout, so I'm not sure if it'd help.

I'm curious though, how do you guys debug something like this (which I'd imagine isn't that uncommon) without using LLDB? I fell back to printing information to stdout for now, but it just doesn't seem like a reasonable long term approach.

jrose · November 5, 2018, 6:36pm

Most of the time we don't run commands through the driver at all; we just debug frontend invocations directly. You can always ask what the driver is going to run with -###. It does get tricky sometimes with multiple files, though; in those cases we either try to figure out what's causing the problem and get it down to one frontend instance, or we do what you're doing and dump out extra information.

It's rare that you really need a debugger attached to multiple processes at once, but sometimes it would be nice to have a conditional breakpoint that behaved that way to find out when something's busted.

Vinicius_Vendramini · November 5, 2018, 6:52pm

Oooooh I get it now! Using -### was great, it actually explains a lot.

So it looks like a call with -output-file-map=... and -emit-silgen actually becomes several calls for individual files, in the form of

swift -frontend -emit-silgen foo.swift [...] -o foo.sil

So maybe I can implement support for something like

swift -frontend -dump-ast foo.swift [...] -o foo.ast

and then use the same -output-file-map trick to separate the initial call into one separate call per file.
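To inspect what the driver plans to run, the text printed by -### can be split into individual frontend commands. The Python sketch below assumes that output has one command per line and that shell-style quoting is enough to split it; frontend_invocations and output_path are hypothetical helper names, and the sample text in the usage note is made up to match the command shape quoted above.

```python
import shlex


def frontend_invocations(driver_output):
    """Return the argument lists of all -frontend commands in -### output.

    Assumes one command per line, split with shell-style quoting rules.
    """
    invocations = []
    for line in driver_output.splitlines():
        line = line.strip()
        if not line:
            continue
        argv = shlex.split(line)
        if "-frontend" in argv:
            invocations.append(argv)
    return invocations


def output_path(argv):
    """Return the value following -o, or None if there is none."""
    if "-o" in argv:
        index = argv.index("-o")
        if index + 1 < len(argv):
            return argv[index + 1]
    return None
```

For example, feeding it the two lines "swift -frontend -emit-silgen foo.swift -o foo.sil" and "swift -frontend -emit-silgen bar.swift -o bar.sil" yields two argument lists whose output paths are foo.sil and bar.sil. Each of those argument lists can then be run directly under a debugger instead of going through the driver.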

Vinicius_Vendramini · November 7, 2018, 2:33pm

I think I managed to do it! I created a pull request, if you want to take a look.

I wasn't able to test it with a real iOS app however.

I tried building a toolchain with these changes using utils/build-toolchain myToolchain;
I then used that toolchain to perform the same steps as in the first post, removing the -parseable-output, -c and -emit-... options from Xcode's command and adding -dump-ast and (this time) an -output-file-map=....

When I try to run the resulting command, however, I just get a lot of these errors:

<unknown>:0: error: this mode does not support emitting reference dependency files

I should mention that I tried this on a new iOS app I created for these tests, since the original app doesn't build with the new toolchain (only Swift 4.2).

If I try the same method of changing the Xcode command without adding the output file map, I get the same error; but if I try it without the output file map and with the Swift 4.2 toolchain, everything works fine. This leads me to believe that some change happened since Swift 4.2 that made this method of modifying the Xcode commands not work anymore. Do you have any clue as to what that could be or how I can diagnose it?

jrose · November 7, 2018, 5:30pm

Reference dependencies (swiftdeps) are how incremental builds are implemented. You should be able to get past this by dropping -incremental.

I'll take a look at the PR!

Vinicius_Vendramini · November 7, 2018, 8:25pm

Removing -incremental helped; it now fails with an error that starts with

Assertion failed: (PrimarySourceFiles.size() == 1), function getPrimarySourceFile, [...]

which leads me to believe that the compiler is looking for a main.swift file but not finding it (since these files are just a part of an iOS app). I assume that since I'm passing -dump-ast the compiler shouldn't need a main.swift file (as it's not gonna finish the compilation), right?

Also, thanks for the quick review, I'll get right on to addressing your comments. Could you tell me an easy way to trigger the batch mode so I can test that?

jrose · November 7, 2018, 9:01pm

Heh, that is exactly the assertion for when you are using batch mode. "Primary or main" just isn't what you should be looking for; you want to do this for every primary file.

(I'm not quite sure how this will work with WMO, since that doesn't use the notion of "primary files" at all, but you probably don't need that.)

Vinicius_Vendramini · November 7, 2018, 10:55pm

Oh great! Two birds with one stone then.

Is there maybe an easier way to trigger batch mode? The process of building the toolchain and using it on an app is pretty cumbersome for a normal development cycle (and wouldn't let me use Xcode's debugging functionalities).

jrose · November 7, 2018, 11:09pm

"Batch mode" mostly just means "a frontend invocation with multiple -primary-file arguments", but it's also based on the number of files in the project and the number of parallel jobs allowed. Changing the -j option in the top-level invocation is an easy way to get batching even when you only have, say, two files.
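As a rough way to check whether a given frontend command is a batch invocation in the sense described here, one can count its -primary-file arguments. The Python sketch below does only that; it ignores the file-count and -j heuristics the driver uses to decide when to batch, and primary_files and is_batch_invocation are hypothetical helper names.

```python
def primary_files(argv):
    """Return the file that follows each -primary-file argument."""
    files = []
    # Stop one short of the end so argv[i + 1] always exists.
    for i, arg in enumerate(argv[:-1]):
        if arg == "-primary-file":
            files.append(argv[i + 1])
    return files


def is_batch_invocation(argv):
    """True if the command names more than one primary file."""
    return len(primary_files(argv)) > 1
```

Applied to the argument lists printed by -###, this shows quickly whether a change to -j actually produced batched frontend jobs.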

Vinicius_Vendramini · November 8, 2018, 5:40pm

Ok, I think I got it now. Before I saw your answer I found an -enable-batch-mode option that also did the trick :)

It should be fixed, I'm building a toolchain to be sure. In the meantime, I still have to write the tests. I've updated the PR, if you wanna take a look.

Vinicius_Vendramini · November 9, 2018, 7:20pm

Alright, I think I'm starting to get how the tests work, but I'm having trouble running them.

I tried doing utils/build-script [...] --test, but a few tests failed and I couldn't really tell why from the output information. So I tried using the lit approach, calling

llvm/utils/lit/lit.py -sv --param swift_site_config=build/Xcode-RelWithDebInfoAssert+swift-DebugAssert/swift-macosx-x86_64/test-macosx-x86_64/lit.site.cfg swift/test/Frontend/batch-mode.swift

but I keep getting the same error on several different tests:

[...]

UNRESOLVED: Swift(macosx-x86_64) :: Frontend/batch-mode.swift (1 of 1)
******************** TEST 'Swift(macosx-x86_64) :: Frontend/batch-mode.swift' FAILED ********************
Exception during script execution:
Traceback (most recent call last):
  File "[...]/swiftSource/llvm/utils/lit/lit/run.py", line 202, in _execute_test_impl
    result = test.config.test_format.execute(test, lit_config)

[...]

  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sre_parse.py", line 195, in __next
    if self.index >= len(self.string):
TypeError: object of type 'NoneType' has no len()


********************
Testing Time: 0.11s
********************
Unresolved Tests (1):
    Swift(macosx-x86_64) :: Frontend/batch-mode.swift

  Unresolved Tests   : 1

19 warning(s) in tests.

This seems to me like something went wrong in lit.py, which leads me to believe it wasn't my changes that made the tests fail. I think maybe my invocation doesn't make sense. Could you help me out? I wanted to get this done over the weekend if possible.

jrose · November 9, 2018, 9:14pm

I think there is a way to run the tests that way, but I've never done it that way. I usually point lit.py at the build directory's test folder and use --filter, as described in docs/Testing.md.

That said, I think the thing you're missing is --param build_mode=Debug, or something like it. Xcode builds put bin/swiftc and lib/swift/ in a configuration-specific subfolder, since they allow (in theory) building for both Debug and Release out of the same Xcode project.