[GSoC2020] Swift Module Explorer

denis631 · February 26, 2020, 3:06pm

Hi, I would like to work on the Module Explorer project as a part of GSoC2020.

I have several questions:

what exactly is the purpose of the swift module explorer and who'd be potential users of it?
what exactly is the desired output in comparison to what llvm-bcanalyzer doesn't do?
-> "decipher" the SIL block/index block and AST block definitions?
- is one interested in the API which the module exposes and the modules which it depends on/imports?
- present the .swiftmodule file internal structure in a more "user-friendly" way?
apparently there is already a command-line tool (swift-ide-test::print_module) which explores .swiftmodule files, as found here
I haven't tried it yet, but as an alternative to print_module I followed the answer here, and it worked for me, printing
this:

...
@_exported import DummyFramework
import Foundation
import SwiftOnoneSupport
class DummyClass {
  init()
  func f(x: Int) -> Int
  @objc deinit
}

for this code:

import Foundation

public class DummyClass {
  public init() {}

  public func f(x: Int) -> Int {
    return -1
  }
}

Thanks in advance

codafi · February 26, 2020, 6:23pm

Hi Denis, thanks for taking an active interest in the Swift Google Summer of Code project. Let me answer your questions:

what exactly is the purpose of the swift module explorer and who'd be potential users of it?

A Swift Module is a serialized representation of the Swift AST - something we require because Swift doesn't have header files. The compiler can emit complete Swift Modules, or it can output partial Swift Module files for (batched) incremental compilation and stitch those partial modules together into the final complete module. There are a lot of opportunities for different failure modes to arise during those processes: a Swift Module could serialize a broken or partial AST node, it could incorrectly form a cross-link to another declaration in a broken module, merging partial Swift modules might fail for a multitude of reasons, etc.

Serialization and Deserialization (the two components responsible for the bookkeeping for Swift Modules) are not well-instrumented to fail in loud and obvious ways (they're getting better though). That leaves us either reproducing the problem with the entire setup that led to the creation of the broken module or, in the few cases where the cause is blatantly obvious, dropping down to the llvm-bcanalyzer tool and slogging through a giant XML dump to e.g. pinpoint a broken crosslink.

As you can probably tell, the people this tool was envisioned for would be compiler engineers. If done in a more accessible way than just a command line tool - perhaps if it were exposed as a library that could be used to build a GUI tool, it might serve a more general-purpose educational role for the rest of our users.

what exactly is the desired output in comparison to what llvm-bcanalyzer doesn't do?

llvm-bcanalyzer is a very limited tool that is primarily meant for dumping bitstream files and stats about bitstream files. Its dump output is incredibly verbose, its verification powers are limited, and where they exist they do not help us accomplish our goal of identifying what is broken about a Swift module. Try dumping the standard library's .swiftmodule file sometime to get a feel for how hard it is to work with.

To take another example, it's pretty rare that the verification part of llvm-bcanalyzer would help us since Serialization and Deserialization use a C++ template-based DSL that guarantees we'll respect the bitstream file format and round-trip properly. So what matters to somebody working on those components is not what the bitstream file looks like, but whether the entities inside of it are consistent.

is one interested in the API which the module exposes and the modules which it depends on/imports?

Not particularly. The API surface of the module is a nice thing to be able to produce, sure, but as you note dumping that out of swift modules is already a facility we have in the compiler. In addition to the IDE dumping facility, the Swift Compiler can also generate Swift Interface files from a given Swift Module file, and does so today for most of the public SDK. Errors in Serialization or Deserialization often render these facilities useless for debugging because one or the other will usually crash before the module's contents can be read.

present the .swiftmodule file internal structure in a more "user-friendly" way?

This one is key. Having Swift-specific structured output would give us a lot of insights into a module file's higher-level structure and would be an enormous help to anybody that is trying to decipher anything about that structure.

denis631 · February 27, 2020, 2:28pm

Thank you for such a detailed response.

As far as I understand, the tool is needed to help speed up debugging of .swiftmodule issues.

Try dumping the standard library's .swiftmodule file sometime to get a feel
for how hard it is to work with.

I did try that out. This is what I meant by "deciphering" the generated XML file: a lot of the information is still unreadable, such as the abbrevid and op values.

I have another set of questions, in order to better understand the topic:

what are those cross-links you are referring to?
regarding possible failures
- how can one serialize a broken or partial AST node
- and in general, how can one reproduce the issues you've described?

llvm-bcanalyzer does not help us accomplish our goal of identifying what is broken about a Swift module
So what matters to somebody working on those components is not what the bitstream file looks like, but whether the entities inside of it are consistent
- what would help you accomplish this goal? should the module explorer verify the cross-links? what exactly would you like to see in the module explorer?

possible outcome (from what I understand right now)
- one could imagine a CLI tool that generates Graphviz output
  which in turn can be rendered into an image with all the links, dependencies, and metadata.
  what do you think?
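
To make the Graphviz idea a bit more concrete, here is a minimal sketch of what such output could look like. It assumes the import relationships have already been extracted from the module somehow; the function name dotGraph and its input shape are made up for illustration and are not part of any existing tool.

```swift
// Hypothetical helper: turns "module -> modules it imports" into Graphviz DOT text.
// The input dictionary is an assumption; a real tool would read it from the module file.
func dotGraph(named graphName: String, imports: [String: [String]]) -> String {
    var lines = ["digraph \"\(graphName)\" {"]
    // Sort keys and values so the output is stable between runs.
    for module in imports.keys.sorted() {
        lines.append("  \"\(module)\";")
        for dependency in (imports[module] ?? []).sorted() {
            lines.append("  \"\(module)\" -> \"\(dependency)\";")
        }
    }
    lines.append("}")
    return lines.joined(separator: "\n")
}

// Example input taken from the imports printed for DummyFramework above.
print(dotGraph(named: "imports",
               imports: ["DummyFramework": ["Foundation", "SwiftOnoneSupport"]]))
```

The resulting text could then be passed to Graphviz (for example dot -Tpng) to render an image.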

codafi · March 4, 2020, 6:45am

The idea is to serialize an unambiguous path that uniquely identifies a (serialized) Swift definition. A cross-reference is usually built as a chain of records that starts at a module and then proceeds through "path pieces" that grow increasingly more specific. When we deserialize the path pieces, we follow the breadcrumbs until we finally attempt a lookup that can be filtered based on the path pieces to get to the target declaration.

Cross-references can be subject to all kinds of shenanigans. Usually, serializing them isn't a problem (I mean, we can still serialize an invalid AST node, but that's increasingly rare these days). It's usually deserializing them where the problems arise. A given cross-ref can fail for a multitude of reasons: the path could no longer point to a unique declaration and raise an ambiguity, the path could lead to a unique declaration that deserializes in such a way that it no longer matches the path pieces, etc.

A concrete recent example of that last one was a deserialization failure where we had formed a cross-ref to an Objective-C convenience initializer that was ClangImport'd as a Swift convenience initializer. A subclass of that class was defined in Swift, and it automatically inherited that convenience initializer, so we emitted a SIL instruction that contained the cross-ref. But when we came around to merge-modules, we deserialized the cross-ref, discovered that the class suddenly had initializers, which caused the ClangImporter to re-import the initializer as a designated initializer. This caused deserialization to barf because the initializer the path referenced no longer existed.
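
The failure modes described above can be sketched in a few lines of Swift. This is only a toy model under loud assumptions: Decl, CrossRef, CrossRefError, and resolve are hypothetical names, the "kind" strings are invented, and the real compiler's records and lookup logic are considerably more involved.

```swift
// Toy model of cross-reference resolution. None of these types exist in the compiler.
struct Decl: Equatable {
    let module: String
    let context: [String]   // enclosing type names, outermost first
    let name: String
    let kind: String        // invented labels, e.g. "init.convenience"
}

struct CrossRef {
    let module: String
    let pathPieces: [String]  // all but the last piece name the context; the last names the decl
    let expectedKind: String  // what the serialized path believed it pointed at
}

enum CrossRefError: Error, Equatable {
    case emptyPath
    case notFound
    case ambiguous(count: Int)
    case kindMismatch(expected: String, found: String)
}

// Follow the path pieces, then filter the lookup results until one declaration is left.
func resolve(_ ref: CrossRef, in decls: [Decl]) throws -> Decl {
    guard let name = ref.pathPieces.last else { throw CrossRefError.emptyPath }
    let context = Array(ref.pathPieces.dropLast())
    let candidates = decls.filter {
        $0.module == ref.module && $0.context == context && $0.name == name
    }
    guard let first = candidates.first else { throw CrossRefError.notFound }
    guard candidates.count == 1 else {
        throw CrossRefError.ambiguous(count: candidates.count)
    }
    // A unique declaration was found, but it may no longer match what the path described.
    guard first.kind == ref.expectedKind else {
        throw CrossRefError.kindMismatch(expected: ref.expectedKind, found: first.kind)
    }
    return first
}

// The initializer was serialized as a convenience initializer,
// but it shows up as a designated initializer at deserialization time.
let decls = [
    Decl(module: "App", context: ["Base"], name: "init(value:)", kind: "init.designated"),
    Decl(module: "App", context: ["Base"], name: "run()", kind: "func"),
]
let staleRef = CrossRef(module: "App",
                        pathPieces: ["Base", "init(value:)"],
                        expectedKind: "init.convenience")
do {
    _ = try resolve(staleRef, in: decls)
} catch {
    print("cross-ref failed:", error)
}
```

Here the lookup still finds exactly one declaration, but resolution throws because the declaration no longer matches what the path recorded, which loosely mirrors the initializer example.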

It's difficult. Reproducing the above required having the source of the project available, and triaging the issue was time-consuming because it took a long time to build the project and its dependencies and to tease out the exact chain of steps that led to this bug. As you can see, a lot of these failures are cross-cutting.

I'd like to see a tool that was built to actually explore the contents of a module, not just dump it. I'd like to be able to see the different sections of the module, and be able to dig into their contents e.g. a section at a time, filter all the declarations for a given name, and occasionally see raw record values (these aren't usually very helpful unless they're strings). Beyond that, making the tool more Swift-specific would be icing on top but very difficult. The module format is not meant to be stable, and still changes quite frequently. It's a non-goal for this tool to keep up with and dump the exact format - it would tie the tool too strongly to any one version of the module format. Think objdump.
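
As a rough sketch of the kind of exploration described above, something along these lines could serve as a starting point for a library interface. Everything here is an assumption made for illustration: Record, Section, and ModuleExplorer are hypothetical types, the section and record names are placeholders, and the data is built in memory instead of being read from a real module file.

```swift
// Hypothetical, format-agnostic view of a module file's contents.
struct Record {
    let kind: String          // placeholder record kind, e.g. "class" or "func"
    let name: String?         // only present for records that carry an identifier
    let rawValues: [UInt64]   // raw operands, shown on request
}

struct Section {
    let name: String
    let records: [Record]
}

struct ModuleExplorer {
    let sections: [Section]

    // List the sections so a user can pick one to dig into.
    var sectionNames: [String] {
        return sections.map { $0.name }
    }

    // Look at a single section at a time.
    func section(named name: String) -> Section? {
        return sections.first(where: { $0.name == name })
    }

    // Filter every section for records that carry the given name.
    func records(named name: String) -> [(section: String, record: Record)] {
        var matches: [(section: String, record: Record)] = []
        for section in sections {
            for record in section.records where record.name == name {
                matches.append((section: section.name, record: record))
            }
        }
        return matches
    }
}

// Made-up sample data standing in for a parsed module.
let explorer = ModuleExplorer(sections: [
    Section(name: "decls", records: [
        Record(kind: "class", name: "DummyClass", rawValues: [1, 0, 42]),
        Record(kind: "func", name: "f(x:)", rawValues: [7]),
    ]),
    Section(name: "sil", records: [
        Record(kind: "function", name: "f(x:)", rawValues: [3, 9]),
    ]),
])

print(explorer.sectionNames)
for match in explorer.records(named: "f(x:)") {
    print(match.section, match.record.kind, match.record.rawValues)
}
```

Keeping the model this generic (named sections holding loosely typed records) is one way to avoid tying the tool to a single version of the module format.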

Graphviz is an option, certainly. But I'm wary of it as a tool we can use in general since these module files can often be quite large, and the visualization tools I'm familiar with aren't very good at handling large dot files.