SE-0274: Concise Magic File Names

scanon · January 14, 2020, 11:09pm

Clang provides __FILE_NAME__¹ as an extension in all of these languages for exactly the same reasons. They are significant issues.

"Clang-specific extension that functions similar to __FILE__ but only renders the last path component (the filename) instead of an invocation dependent full path to that file."

beccadax · January 14, 2020, 11:10pm

This isn’t actually true. Compiler diagnostics look like this:

Path/File.swift:42: error: message

While runtime errors look like this:

Precondition failed: message: file Path/File.swift, line 42

The two formats have almost nothing in common.

dabrahams · January 14, 2020, 11:16pm

Well, OK, they were the same at one point IIRC. Regardless, the same basic mechanism can be used by Emacs to navigate to both errors, and is, as you can see in swift-project-settings.el in the apple/swift repository on GitHub. That's by virtue of the same path information being available.
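
To make the "same basic mechanism" point concrete, here is a minimal sketch (not the Emacs code itself) that pulls a path and line number out of both message formats quoted above. The type and function names are made up for illustration, and the parsing assumes paths that contain no colon.

```swift
import Foundation

/// A source location recovered from a diagnostic or a runtime failure message.
/// (Hypothetical type, for illustration only.)
struct SourceLocation: Equatable {
    var path: String
    var line: Int
}

/// Parses a compiler diagnostic such as "Path/File.swift:42: error: message".
/// Assumes the path itself contains no ":" characters.
func parseCompilerDiagnostic(_ text: String) -> SourceLocation? {
    let parts = text.split(separator: ":", maxSplits: 2, omittingEmptySubsequences: false)
    guard parts.count >= 2, let line = Int(parts[1]) else { return nil }
    return SourceLocation(path: String(parts[0]), line: line)
}

/// Parses a runtime failure such as
/// "Precondition failed: message: file Path/File.swift, line 42".
/// Searches backwards so that a message containing "file" does not confuse it.
func parseRuntimeFailure(_ text: String) -> SourceLocation? {
    guard let fileMarker = text.range(of: ": file ", options: .backwards),
          let lineMarker = text.range(of: ", line ", options: .backwards),
          fileMarker.upperBound <= lineMarker.lowerBound,
          let line = Int(text[lineMarker.upperBound...])
    else { return nil }
    let path = String(text[fileMarker.upperBound..<lineMarker.lowerBound])
    return SourceLocation(path: path, line: line)
}
```

Both functions return the same kind of value, so an editor can jump to either error as long as the path it gets back can be resolved on disk.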

dabrahams · January 14, 2020, 11:22pm

Okay, but one thing they did not do in Clang was to change the behavior of assert to drop all path information, as is being proposed for Swift. Are people clamoring for a change to the behavior of assert in [Objective-]C[++]?

michelf · January 14, 2020, 11:41pm

What's proposed here is to include the module in addition to the file name, which should uniquely identify the file. With __FILE_NAME__, you don't really know whose file it is, so relying on it is a bit problematic for the general case.

Also, assert is usually included only in debug builds, presumably with plenty of debug symbols, so including a path is not that much of an issue. In Swift, fatalError and precondition are kept in release mode and will bring the file path into the executable.

allevato · January 15, 2020, 12:01am

I'm supportive of this proposal because I think it aligns with Swift's goals to have the default behavior of the compiler be safe even if the tool/build system invoking it is providing it with inputs that are "unsafe"—but this is an important point to raise, because it is the case that the "damage", so to speak, is caused by the default behavior of the most commonly used build systems. The compiler itself just treats #file as the verbatim string you give it on the command line for that file, but:

- Xcode passes all source files to the compiler as absolute paths. It doesn't have to worry about the CWD of the compiler process or set -working-directory.
- Likewise, SwiftPM (I believe) passes all source files to the compiler as absolute paths.
- Bazel is less on the hook here, because it runs the compiler with CWD as the workspace root (because it can't pass absolute paths, it doesn't know what they are at the time it builds the actions), so paths to sources are passed relative to that. This avoids issues with reproducibility on distributed builds, but a user could argue that it still leaks intra-workspace paths up to the basename of the source file, if they consider those to be sensitive (e.g., workspace_root/my/companys/sensitive/org/chart/MyProject/Foo.swift).
- I can't speak about other build systems.

So, the issues here could potentially be mitigated somewhat by changes to Xcode and SwiftPM (and those would probably be good changes independent of this proposal). SwiftPM is probably easier, because AFAIK all the sources have to be under a common source root.

I'm less sure what the best solution for Xcode would be, since its project model lets you refer to source files literally anywhere on the file system (and if you have build phases that generate sources, they most likely will not be under your project root). If your project had source files under a project root that you could relativize and some other set of generated or external source files that didn't live under the root, I can't think of a way to express the latter set on the command line that would avoid leaking something to #file. Maybe by using -vfsoverlay? That seems like a big hammer for this nail, though.

scanon · January 15, 2020, 12:59am

This is an interesting point; module + file name actually has more information than the full path, since a single absolute file path could be used in multiple modules, right?

jayton · January 15, 2020, 4:32pm

Purely as a point of data, I gave up on complaining about this for C-based languages in the days when Usenet was the hip forum for venting, but I’ve wanted __FILE_NAME__ regularly since then (and never once specifically preferred the full __FILE__).

tkremenek · January 15, 2020, 7:30pm

One bit of feedback I have is regarding this specific point in the proposal:

We propose changing the string that #file evaluates to. To preserve implementation flexibility, we do not specify the exact format of this string

I think in practice the format of the string that is chosen will become the one that is expected by users. Tooling will be written around that expectation.

My preference is we make the format precisely defined. I do not see the value of leaving the format not fully specified, as the flexibility will not exist in practice to just change the format at any time.

Making the format of the string specified also potentially allows the creation of tooling that maps from the concise string to a richer one (e.g., what would be the output of #filePath) with more information. That gets the benefits of conciseness, while also providing a path toward good tooling for the cases that will use #file.

QuinceyMorris · January 15, 2020, 10:00pm

FWIW, it sounds like this is a debugging workflow, in which case it seems the correct mechanism back to the source is through debug info, and anyway it's typically under control of the original developer if it goes through Emacs.

The privacy/information leakage problem is more about deployed apps, where the information is out in the wild. I doubt Emacs is involved in communicating information about the failure back to the original developer.

Joe_Groff · January 15, 2020, 10:32pm

I agree that we may as well specify the format, because it's useful for tooling to be able to process it. It seems like build systems such as swiftpm or Xcode could provide a mechanism by which the Module/file.swift string could be mapped back to a filesystem path appropriate for the current build configuration.
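
As a rough sketch of what such a mapping might look like: the type below is hypothetical (nothing like it exists in swiftpm or Xcode today, as far as this thread says) and simply assumes the build system records one full path per Module/file.swift string while it builds.

```swift
/// Hypothetical side table a build system could write next to its build
/// products. Nothing in the proposal specifies this; it is an assumption.
struct FilePathIndex {
    private var fullPaths: [String: String] = [:]

    init() {}

    /// Records the full path for a concise "Module/file.swift" string.
    /// Returns false when the concise string is already mapped to a
    /// different path, i.e. when two files would be indistinguishable.
    @discardableResult
    mutating func record(conciseString: String, fullPath: String) -> Bool {
        if let existing = fullPaths[conciseString], existing != fullPath {
            return false
        }
        fullPaths[conciseString] = fullPath
        return true
    }

    /// Looks up the full path for a concise string taken from a failure message.
    func fullPath(for conciseString: String) -> String? {
        return fullPaths[conciseString]
    }
}
```

Because the table lives with the build products rather than in the binary, the full paths stay on the developer's machine.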

benrimmington · January 16, 2020, 5:14am

How will #file and #filePath interact with #sourceLocation(file:line:) — the line control statement?

jayton · January 16, 2020, 3:49pm

Incidentally, is this practical? Many consumers of #file take StaticStrings, on the assumption that this is cheap. StaticString’s accessors are transparent, so this couldn’t be implemented as a new type of StaticString; you’d have to synthesize a StaticString at runtime each time assert etc. is called, unless new versions that take an @autoclosure () -> String were introduced.
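
For reference, the pattern in question looks roughly like the sketch below. The `check` function is a made-up stand-in for assert-like helpers; it returns the failure text instead of trapping so that it can be exercised safely.

```swift
/// Hypothetical assert-like helper that receives the caller's location
/// through default arguments, the way many consumers of #file do today.
/// Returns nil when the condition holds, otherwise the failure message.
func check(
    _ condition: @autoclosure () -> Bool,
    _ message: @autoclosure () -> String = "",
    file: StaticString = #file,
    line: UInt = #line
) -> String? {
    // The message closure is only evaluated on the failure path.
    guard !condition() else { return nil }
    return "Check failed: \(message()): file \(file), line \(line)"
}
```

The `file` parameter is filled in by the compiler at each call site, which is why the cost of producing that value matters for every caller.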

Ilias_Karim · January 16, 2020, 5:04pm

I've had some time to reconsider the proposal. After reading @dabrahams's response, I find myself agreeing that the proposal adds complexity that may not be needed.

While it's excellent that the proposal anticipates migration costs, a more straightforward way to preserve the intention of the previous #file variable, and one that eases adoption, would be to add new identifiers like #fileName.

In particular, I do NOT find that the proposal fulfills its goal. It trades one annoyance for another. Now instead of parsing the filename from a file path, engineers will parse the filename from a "filename (module)" abomination.

-1, on the grounds that the problem being addressed is not significant enough to warrant a change to Swift. It adds complexity where it is unnecessary. I have used __FILE__ in Objective-C and have loosely followed the discussion since the proposal was posted.

Karl · January 16, 2020, 8:08pm

Which raises the question: how does the flexibility exist to change the format at this time? Is it not true that users and tooling have expectations of the string's format today?

beccadax · January 17, 2020, 12:13am

Here's my suggestion, with one notable gap in it:

The file string will be in the following format:
file-string → module-name '/' directories(opt) file-name

module-name → identifier
directories → path-component '/' directories(opt)
file-name → path-component

identifier → [same as Swift identifier]
path-component → [all characters except slash and null]
'/' means '/' in this grammar, even on systems with a different path separator.

In current compilers, directories is always omitted. Future compilers may include some combination of directory names that appear in #filePath to distinguish it from other files with the same file-name; the algorithm they will use to select these directory names is not specified by this proposal.

The hole is, of course, the last sentence. I haven't yet come up with a good way to decide which directories to include. ("Remove the common prefix" is not a good choice because, if your compile includes generated source files outside of the tree, you end up with very large prefixes containing much of the information we'd like to remove.)

Suggestions are welcome, or we can just punt on this question, or we can even decide that we'll never support this.

#filePath returns the exact string passed to #sourceLocation(file:). #file splits the #filePath string on path separators, takes the last component, and combines it with the module name.
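
A minimal sketch of that rule, assuming '/' is the only path separator and that directories is always omitted (the function name is made up for illustration):

```swift
/// Builds the concise string described above from a module name and a
/// #filePath-style string. Hypothetical helper; assumes '/' separators only.
func conciseFileString(module: String, filePath: String) -> String {
    // Take the last path component; fall back to the whole string if
    // there is no component at all (for example, an empty path).
    let fileName = filePath.split(separator: "/").last.map(String.init) ?? filePath
    return "\(module)/\(fileName)"
}

// Two different paths with the same last component collapse to one string.
let first = conciseFileString(module: "App", filePath: "/work/Sources/App/Foo.swift")
let second = conciseFileString(module: "App", filePath: "/generated/Foo.swift")
print(first, second, first == second)
```

The last line shows the collision case: both calls produce App/Foo.swift, so the string alone cannot tell the two files apart.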

(This means that #sourceLocation actually is a way you could introduce two "files" with the same file-name. That suggests that maybe we ought to specify and implement the algorithm for generating directories—or we could just emit a warning on #sourceLocations that introduce conflicts. After all, #sourceLocation already lets you do silly things like roll line numbers backwards.)

dabrahams · January 17, 2020, 5:43am

Makes sense. Let's acknowledge, though, that this is a new kind of “safety” when it comes to Swift. And if we're actually going to engage in closing this safety hole, it seems to me:

1. If file paths are a security issue, so are the names of source files (and the names of modules). In general, people don't think about keeping sensitive information, such as something that might reveal the architecture of proprietary code, out of source file names. In fact, source files are usually named for implementation details of the program that end-users should never see.
2. assert, precondition, and fatalError (and maybe others) reveal that potentially-sensitive information implicitly, even if they are changed not to emit path information. The implicitness is a big part of what makes them dangerous.
3. I 100% agree with @QuinceyMorris that if we think this is a real issue, “the correct mechanism back to the source is through debug info.” That is, neither filenames nor paths should be emitted into executables, and functions like fatalError should emit some opaque identifier (like a hash) that can be mapped back to source lines only with the use of debug info.
4. We should therefore consider #file and #filePath (both as the former exists and as they are both proposed) to be unsafe constructs, especially when used as default arguments, because they are a vector for implicitly revealing this information. They should be changed to emit the kind of opaque identifier mentioned above.
5. For the tooling implications to work out well, we should ensure that mapping from opaque identifiers to source locations is easy and performant.
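
To illustrate what an opaque identifier and its reverse mapping could look like, here is a rough sketch. The choice of FNV-1a, the "file:line" key format, and all names are assumptions made for the example; a real design would keep the table in debug info rather than in a Swift dictionary.

```swift
/// 64-bit FNV-1a hash over the UTF-8 bytes of a string.
func fnv1a64(_ text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325   // FNV offset basis
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3        // FNV prime, wrapping multiply
    }
    return hash
}

/// Hypothetical table kept outside the shipped binary (standing in for
/// debug info) that maps opaque identifiers back to source locations.
struct LocationTable {
    private var locations: [UInt64: String] = [:]

    init() {}

    /// Registers a location and returns the identifier that the binary
    /// would emit in place of the file name and line number.
    mutating func register(file: String, line: UInt) -> UInt64 {
        let key = "\(file):\(line)"
        let id = fnv1a64(key)
        locations[id] = key
        return id
    }

    /// Maps an identifier from a failure report back to "file:line".
    func location(for id: UInt64) -> String? {
        return locations[id]
    }
}
```

One caveat on the sketch: an unsalted hash of a short, guessable string can be recovered by trying candidate names, so a plain hash like this hides less than it appears to.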

dabrahams · January 17, 2020, 5:47am

Yes, it's ugly, but you can probably afford to synthesize a StaticString at runtime if you're about to stop the program ;-).

QuinceyMorris · January 17, 2020, 6:58am

I like that idea of safety, but …

That's a bit too sweeping, I think. File names are often the same as type names, which are exposed already, in most cases. Paths (omitting file names) are different because they contain information that isn't part of the executable in any other way.

That's a slightly incomplete quote, since I was talking about a debug workflow.

For a release build, it would be wonderful if there was a way of trekking back to the source code via an opaque identifier, but I don't think debug info is a reasonable strategy. (Relying on debug info would basically be something akin to symbolicating a crash log, which is a huge PITA sometimes.)

In the past, I've considered using random strings for fatalError, preconditions, etc:

    fatalError("891919891") // ~= "this furble can't be mippled"

but there was no point in going to that trouble if the fatalError exposed the file path and line number anyway.

beccadax · January 17, 2020, 6:12pm

Sure, in theory, but there's a trade-off to be made between privacy and functionality. If we wanted to improve the assertion experience even more, we could embed the entire source file in the binary and point to the exact expression that tripped the assertion; we don't because that would be way too revealing. Contrariwise, we could have fatalError() emit nothing but the caller's memory address; we don't because that would offer woefully insufficient functionality.

The question is, does the current #file string strike the correct balance between privacy and functionality? I think that it doesn't—it includes much more information than is necessary to identify the location of a failure.

Could the proposed #file string still leak private information? Certainly! But most Swift users probably do not have such stringent privacy requirements, and the ones who can deploy obfuscators and the tooling to deal with their results will also have enough security expertise to recognize whether they need them. For everyone else, I think new #file is just enough, especially if we provide some compiler tooling to map #file strings back to #filePaths.