[Re-review] SE-0274: Concise Magic File Names

  • What is your evaluation of the proposal?

+0.5. TL;DR:

  • I don't quite like having #filePath available at all, because even one usage of #filePath in your dependencies is sufficient to leak information about the build machine, and to make builds non-reproducible. There is not enough motivation in the proposal to understand why it is being added, so I can't propose a better alternative.

  • I think the "#moduleName + #fileName" alternative is better because it provides the information that people want directly (people said that they will want to parse out the module name), and this option will allow better deduplication of strings.

Long analysis follows.

The issues with information leakage and with unnecessary differences between binaries built in different ways are only partially addressed. To fully address them, we should not add #filePath, or should at least allow users to instruct the compiler to make #filePath stable, for example, by making it equal to #file. This issue is somewhat mitigated by the fact that #filePath is not forced to be an absolute path (it passes through whatever was provided on the command line), so as long as the build system is careful not to use absolute paths either, it should be possible to build code reproducibly. However, such an arrangement is fragile.

The availability of #filePath itself is an issue -- if someone in your dependency graph uses it, your builds can become non-reproducible and can leak information. Does everyone audit all of the open source code that they depend on? I think not. So I don't think compatibility would be a good reason to add #filePath, because we don't want users to just replace #file with #filePath to get the previous behavior. Generally, the proposal does not provide enough motivation for #filePath (it says "For those applications which still need a full path, we will provide a new magic identifier, #filePath" -- I would prefer to see the actual use cases listed).

The issue with bloated binaries is also not fully addressed in this proposal, because each of the #file strings will contain the module name. While all #file strings coming from one file can be deduplicated, module names across #file strings coming from multiple files can't be deduplicated, so we would be leaving some code size improvements on the table. We could enable such deduplication by adopting the #fileName + #moduleName alternative.
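
To make the deduplication argument concrete, here is a rough sketch that counts string bytes for a made-up set of source files. The module and file names are invented, and storing each unique string once with a trailing NUL byte is a simplifying assumption, not a description of what the Swift compiler or linker actually emits.

```swift
// Hypothetical (module, file) pairs; these names are made up for illustration.
let sources: [(module: String, file: String)] = [
    (module: "Networking", file: "Client.swift"),
    (module: "Networking", file: "Server.swift"),
    (module: "Networking", file: "Socket.swift"),
    (module: "Storage", file: "Cache.swift"),
]

// Assumed storage model: each unique string is emitted once, NUL-terminated.
func storedBytes(_ strings: Set<String>) -> Int {
    return strings.reduce(0) { $0 + $1.utf8.count + 1 }
}

// Combined "Module/File.swift" strings: the module name is repeated per file.
let combinedBytes = storedBytes(Set(sources.map { "\($0.module)/\($0.file)" }))

// Separate strings: each module name and each file name is stored once.
let separateBytes = storedBytes(Set(sources.map { $0.module }))
    + storedBytes(Set(sources.map { $0.file }))

print(combinedBytes, separateBytes) // prints "92 70"
```

The gap grows with the number of files per module, since the combined form pays for the module name once per file.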

What the proposal says about #fileName is true -- it is not a useful string by itself, it should always be combined with #moduleName to be non-ambiguous. However, that can (and should) be done at the API level, to allow the compiler and the linker to deduplicate strings.

ABI breakage for existing clients, like fatalError(_:file:line:) is unfortunate, but there are ways to deal with it (add a new entry point, and keep the old one for old clients).
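
As a sketch of what "add a new entry point, and keep the old one" could look like at the source level: the function name failureMessage and its parameters are hypothetical, and because #moduleName and #fileName do not exist, the callers below pass those strings explicitly. This only illustrates the API shape; it says nothing about how binary compatibility would actually be preserved.

```swift
// Hypothetical diagnostics helpers; these names are not part of the standard
// library or the proposal.

// New entry point: module and file name arrive as separate strings and are
// combined only at the API level, when the message is formatted.
func failureMessage(_ message: String, module: String, fileName: String, line: Int) -> String {
    return "\(module)/\(fileName):\(line): \(message)"
}

// Old entry point, kept so that existing clients passing a single
// pre-combined file string keep working.
func failureMessage(_ message: String, file: String, line: Int) -> String {
    return "\(file):\(line): \(message)"
}

let fromNewClient = failureMessage("index out of range", module: "Storage", fileName: "Cache.swift", line: 42)
let fromOldClient = failureMessage("index out of range", file: "Storage/Cache.swift", line: 42)
print(fromNewClient == fromOldClient) // prints "true"
```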

The proposal argues that file: String = "\(#moduleName)/\(#fileName)" gives us the callee's location (the place where the function is declared), and uses it as one of the justifications for not introducing #fileName. However, this behavior looks like a bug. Compare that default argument expression with file: String = #file, which gives us the caller's location. What we see is that default argument expressions involving magic identifiers are sometimes evaluated at the caller's location, and sometimes at the callee's location -- looks like a bug to me. Once that bug is fixed (so that all default argument expressions are evaluated with the caller's source location), clients that don't want to break the ABI would be able to combine the strings in the default argument if they like.
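
The inconsistency is easy to observe with #line, which follows the same rules as #file but makes the difference visible within a single file. This is a minimal sketch: directLine and nestedLine are made-up names, and the behavior shown is what I understand current compilers to do, not something the proposal specifies.

```swift
// The magic identifier is the entire default argument expression:
// it is resolved at each call site.
func directLine(line: Int = #line) -> Int {
    return line
}

// The magic identifier is nested inside a larger expression:
// it is resolved where the function is declared, so every caller
// receives the same value.
func nestedLine(line: Int = #line + 0) -> Int {
    return line
}

let directA = directLine()
let directB = directLine()
let nestedA = nestedLine()
let nestedB = nestedLine()

print(directA == directB) // false: two different call sites
print(nestedA == nestedB) // true: both report the declaration's line
```

A default argument such as "\(#moduleName)/\(#fileName)" would fall into the second category, which is why it cannot capture the call site today.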

Another reason to add separate #moduleName and #fileName is to address the use case of people who said that they want to parse #file. Sure, we can document how to parse it correctly in a forward-compatible way -- but we could instead provide the information that people want directly.
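
For reference, this is roughly the parsing that users would otherwise have to write themselves. It is a sketch under the assumption that the combined string has the form "ModuleName/FileName.swift" and that the module name contains no slash; splitConciseFile is a hypothetical helper, not an existing API.

```swift
// Hypothetical helper: splits a concise file string at its first slash.
// Returns nil when the string does not contain a module prefix.
func splitConciseFile(_ file: String) -> (module: String, fileName: String)? {
    guard let slash = file.firstIndex(of: "/") else {
        return nil
    }
    let module = String(file[..<slash])
    let fileName = String(file[file.index(after: slash)...])
    return (module: module, fileName: fileName)
}

let parsed = splitConciseFile("MyModule/Utils.swift")
let unparsed = splitConciseFile("Utils.swift")
print(parsed?.module ?? "none", parsed?.fileName ?? "none") // prints "MyModule Utils.swift"
print(unparsed == nil) // prints "true"
```

Any code like this bakes in an assumption about the string format, which is exactly what separate #moduleName and #fileName would make unnecessary.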

The only advantage that I see in introducing a combined #file instead of #moduleName + #fileName is having a standardized way to compose and communicate concise file paths. If we introduce #moduleName + #fileName, different users could end up combining them differently, making it more difficult for tooling to parse them out of logs and other places. However, as long as the format of the pseudo-filename that can be recognized by tooling is clearly documented, I think most users will adopt it.

If we must introduce #file, then I fully agree with @tkremenek's point about having to specify the format of the string. If we don't formally specify the format of the string, while providing a stable format in practice, users will still come to expect the behavior that is implemented. It is a trivial application of Hyrum's Law. I'm not even sure that it would be possible to exercise the flexibility to add a disambiguator in the future without breaking code. I think some code will be broken no matter what, because it will not expect a disambiguator.

  • Is the problem being addressed significant enough to warrant a change to Swift?

Yes. Of the reasons listed by the proposal, binary size due to embedding long paths is a serious issue for big libraries and applications, which will become a hurdle for larger-scale adoption of Swift. Reproducible builds are also a necessity for many large-scale deployments, which often have technical reasons to prefer reproducible, hermetic builds (simpler mental model, easier debugging of build system issues, caching of artifacts), and also non-technical reasons, like internal and external auditability and compliance requirements.

  • Does this proposal fit well with the feel and direction of Swift?

Yes, magic identifiers so far have been the way to express this sort of feature.

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

C and C++ have a magic filename macro, and it has been causing the issues that the proposal mentions. To mitigate them, Clang allows redefining the macro on the command line (-D__FILE__="example.cpp"). That requires the user to do their own plumbing in the build system, which is not nice, and few people care to do it. This proposal is different in that it tries to make the defaults good.

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I am deeply familiar with the issues that the proposal lists as motivation.
