Concise Magic File Names

beccadax · December 3, 2019, 12:39am

In previous threads like We Need #fileName, we've generally agreed that the current #file feature makes some of the wrong tradeoffs by generating such a verbose path. I've prepared a proposal (and a matching implementation) to fix this problem.

Thanks to @davedelong for getting this process started.

Gist link

Concise magic file names

Proposal: SE-NNNN
Authors: Brent Royal-Gordon, Dave DeLong
Review Manager: TBD
Status: Awaiting review
Implementation: apple/swift#25656

Introduction

Today, #file evaluates to a string literal containing the full path to the current source file. We propose to instead have it evaluate to a human-readable string containing the filename and module name, while preserving the existing behavior in a new #filePath expression.

Swift-evolution thread: We need #fileName

Motivation

In Swift today, the magic identifier #file evaluates to a string literal containing the full path to the current file. It's a nice way to trace the location of logic occurring in a Swift process, but its use of a full path has a lot of drawbacks:

It clutters the debug output with irrelevant information. The path is usually very long and only a little bit of that information is necessary to locate the file in question. In a hundred-character path, the developer usually only cares about the last ten or twenty.
It's not portable. The same project may be located at different paths on different machines; a developer looking at a crash log doesn't care about a path on a build server.
It can inadvertently reveal private or sensitive information. The full path to a source file may contain a developer's username, hints about the configuration of a build farm, proprietary versions or identifiers, or the Sailor Scout you named an external disk after. Users probably don't know that this information is embedded in their binaries and may not want it to be there.
It bloats the final size of the binary. In testing with the Swift benchmark suite, a shorter #file string reduced code size by up to 5%. The larger code also impacts runtime performance; in the same tests, a couple dozen benchmarks ran noticeably faster, with several taking 22% less time.
It introduces artificial differences between binaries built on different machines. For instance, the same code built in two different environments might produce different binaries with different hashes. This makes life difficult for anyone trying to do distributed builds or find the differences between two binaries.

Situations where the full path is needed

While the full path is not needed when printing messages for the developer, some uses of #file do rely on it. In particular, Swift tests sometimes use #file to compute paths to fixtures relative to the source file that uses them. This has historically been necessary in SwiftPM because it did not support resources, but SE-0271 has added that feature and there is little need to resort to these tricks anymore.
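The fixture-lookup pattern described above tends to look something like the following sketch. The helper name `fixtureURL(named:relativeTo:)` and the `Fixtures` directory are hypothetical, chosen only for illustration; the point is that the default argument has to be a real file system path for the computation to work.

```swift
import Foundation

/// Hypothetical test helper: locates a fixture in a "Fixtures" directory that
/// sits next to the source file of the caller.
func fixtureURL(named name: String, relativeTo sourcePath: String = #file) -> URL {
    URL(fileURLWithPath: sourcePath)   // path of the calling source file
        .deletingLastPathComponent()   // directory containing that file
        .appendingPathComponent("Fixtures")
        .appendingPathComponent(name)
}

// Passing the path explicitly makes the computation visible.
let url = fixtureURL(named: "sample.json", relativeTo: "/src/Tests/ParserTests.swift")
print(url.path)
```

With the change proposed here, a helper like this would take #filePath as its default instead, since the new #file string is no longer a path.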

An analysis of the 1,073 places where #file is written in the Swift Source Compatibility Suite suggests that well over 90% of uses would be better served by a #file that did not include a full path. However, we do need to make some concession to the small portion of uses that need a full path for some reason.

Methodology

We applied several regular expressions to all 108 projects in the Source Compatibility Suite to try to classify uses of #file.

980 uses matched patterns that we believe represent display to humans:

419 uses matched a pattern for StaticString = #file; we take these to be default arguments that are eventually passed to StaticString-taking APIs like fatalError or XCTAssertEqual, since there is little other reason to use StaticString.
281 uses matched patterns for <StaticString typealias> = #file where the project usually passes values of that type to APIs like fatalError or XCTAssertEqual.
148 uses matched a pattern for String = #file, but also referenced #line on the same line. We take these to be attempts to capture a full source location for display to the user.
132 uses matched a pattern for interpolations of #file; we take these to be interpolated into a string that is then displayed to a user.

41 uses matched patterns that we believe represent path computation:

10 uses matched a pattern for String = #file, but did not have #line on the same line. We take these to be default arguments that will eventually be passed to String-taking file APIs like URL.init(fileURLWithPath:).
31 uses matched a pattern for uses in parenthesized lists (but didn't match the interpolation pattern); we take these to be passed to file APIs.

52 uses did not match any of these patterns.

We therefore estimate that about 6% (±3%) of uses actually want a full path so they can compute paths to other files, while 94% (±3%) would be better served by a more succinct string.

A manual check of 172 uses in 16 projects suggested that about 95% displayed the #file value to the user; this is in line with the regex-based estimate.
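The totals above can be cross-checked with a little arithmetic. The sketch below assumes the ±3% band comes from treating the 52 unmatched uses as uncertain in either direction; the proposal does not spell out how the margin was derived, so that reading is an assumption.

```swift
// Counts taken from the regex classification above.
let displayUses = [419, 281, 148, 132]   // patterns suggesting display to humans
let pathUses = [10, 31]                  // patterns suggesting path computation
let unmatched = 52                       // uses that matched no pattern

let displayTotal = displayUses.reduce(0, +)
let pathTotal = pathUses.reduce(0, +)
let total = displayTotal + pathTotal + unmatched

// Lower bound: none of the unmatched uses need a full path.
let lowShare = Double(pathTotal) / Double(total) * 100
// Upper bound: every unmatched use needs a full path.
let highShare = Double(pathTotal + unmatched) / Double(total) * 100

print(displayTotal, pathTotal, total)
print(lowShare, highShare)
```

The subtotals add up to the 1,073 uses cited earlier, and the two bounds come out to roughly 3.8% and 8.7%, which sits inside the "about 6% (±3%)" estimate.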

Proposed solution

We propose that #file should evaluate to a human-readable string which uniquely identifies a source file in the process, but which does not contain the full path. Specifically, it will contain the file name and module name.[1]

The new #file will otherwise behave as it did before, including its special behavior in default arguments. Standard library assertion functions will continue to use #file, and we encourage developers to use it in test helpers and most other places where they use #file today.

For those rare cases where developers actually need a full path, we propose adding a #filePath magic identifier with the same behavior that #file had in previous versions of Swift.

In a module named MagicFile and a file named NNNN-magic-file.swift at /Users/brent/Desktop, these features might result in output like this:

print(#file)     // => "NNNN-magic-file.swift (MagicFile)"
print(#filePath) // => "/Users/brent/Desktop/NNNN-magic-file.swift"

fatalError("Something bad happened!")
// => "Fatal error: Something bad happened!: file NNNN-magic-file.swift (MagicFile), line 1"

[1] This is sufficient to uniquely identify a file because the Swift compiler will not build a module which contains two identically-named source files, even if they're in different directories. This limitation ensures that identically-named private and fileprivate declarations in different files will have unique mangled names.
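The "special behavior in default arguments" mentioned above is what lets helpers report their caller's location rather than their own. A minimal sketch follows; `failureMessage` is a hypothetical helper, not part of the standard library or XCTest, and the message format simply mimics the fatalError output shown above.

```swift
/// Hypothetical helper: builds a failure message that points at the caller.
/// When `file` and `line` are left to their defaults, `#file` and `#line`
/// are evaluated at the call site, not at this declaration.
func failureMessage(_ message: String,
                    file: StaticString = #file,
                    line: UInt = #line) -> String {
    "\(message): file \(file), line \(line)"
}

// Explicit arguments stand in for whatever string the compiler would supply.
let report = failureMessage("Something bad happened!",
                            file: "NNNN-magic-file.swift (MagicFile)",
                            line: 1)
print(report)
```

Because the helper only displays the string, it behaves the same whether #file is a full path or the shorter form, which is the case the proposal argues is by far the most common.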

Detailed design

We do not specify the exact string used in #file—we only specify that it is human-readable and most likely unique in the process. (We expect some of the notation for module names to change, and we'd like to be able to make the string match it without another proposal.)

Disabling `#filePath`

Although it is not technically part of this proposal, we are considering adding a new compiler flag which distributed build systems can use to disable #filePath and other features incompatible with their build model.

Source compatibility

All existing source code will continue to compile, but the compiler will generate different strings for #file expressions. We anticipate that this will change the behavior of a small amount of existing code in non-trivial ways. However, we believe that this will most heavily impact tests and test support libraries, resulting in easily detected test failures rather than hidden bugs, and that adding #filePath makes these failures easy to correct.

Effect on ABI stability

None.

Effect on API resilience

None.

Alternatives considered

Deprecate `#file` and introduce two new syntaxes

Rather than changing the meaning of #file, we could keep its existing behavior, deprecate it, and provide two alternatives:

#filePath would continue to use the full path.
#fileName would use this new name-and-module string.

This is a more conservative approach that would avoid breaking any existing uses. We choose not to propose it for three reasons:

The name #fileName is misleading, because it sounds like the string only contains the file name, but it also contains the module name. #file is more vague, so we're more comfortable saying that it's "a string that identifies the file".
This alternative will force users to update every use of #file to one or the other option. We feel this is burdensome and unnecessary given how much more frequently the #fileName behavior would be appropriate.
This alternative gives users no guidance on which feature they ought to use. We feel that giving #file a shorter name gives users a soft push towards using it when they can, while resorting to #filePath only when necessary.

However, it's a perfectly reasonable alternative if the Core Team thinks this proposal is too radical.

Support more than two `#file` variants

We considered introducing additional #file-like features to generate other strings, selecting between them either with a compiler flag or with different magic identifiers. The full set of behaviors we considered included:

1. Path as written in the compiler invocation
2. Guaranteed-absolute path
3. Path relative to the Xcode SOURCE_DIR value, or some equivalent
4. Last component of the path (file name only)
5. File name plus module name
6. Empty string (sensible as a compiler flag)

We ultimately decided that supporting only 1 (as #filePath) and 5 (as #file) would adequately cover the use cases for #file. Five different syntaxes would devote a lot of language surface area to a small niche, and controlling the behavior with a compiler flag would create six language dialects that might break some code. Some of these behaviors would also require introducing new concepts into the compiler or would cause trouble for distributed build systems.

One particularly interesting change from our approach would be to replace #filePath's behavior 1 (path as written in the compiler invocation) with behavior 2 (guaranteed-absolute path); after all, a guaranteed-absolute path may be easier to process than one that depends on the compiler invocation. It seems like a reasonable alternative, but we think we could make that change later if we wished.
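To compare the candidate behaviors side by side, the sketch below models each one as a plain string transformation. Everything in it is hypothetical: the `FileStringVariant` enum, the `render` function, and the example invocation (a working directory of /Users/brent and a source root of /Users/brent/Desktop) are assumptions for illustration, and the "name (module)" format is only the one used in the example output earlier, which the proposal deliberately leaves unspecified.

```swift
/// The six behaviors considered above, in the same order.
enum FileStringVariant: CaseIterable {
    case asWritten, absolute, sourceRootRelative, lastComponent, nameAndModule, empty
}

/// Hypothetical model of what each behavior would produce for one source file.
func render(_ variant: FileStringVariant,
            pathAsWritten: String,
            workingDirectory: String,
            sourceRoot: String,
            module: String) -> String {
    // Join relative paths onto the working directory; leave absolute ones alone.
    let absolute = pathAsWritten.hasPrefix("/")
        ? pathAsWritten
        : workingDirectory + "/" + pathAsWritten
    let name = absolute.split(separator: "/").last.map { String($0) } ?? ""

    switch variant {
    case .asWritten:
        return pathAsWritten
    case .absolute:
        return absolute
    case .sourceRootRelative:
        let prefix = sourceRoot + "/"
        return absolute.hasPrefix(prefix) ? String(absolute.dropFirst(prefix.count)) : absolute
    case .lastComponent:
        return name
    case .nameAndModule:
        return "\(name) (\(module))"
    case .empty:
        return ""
    }
}

let variants = FileStringVariant.allCases.map {
    render($0,
           pathAsWritten: "Desktop/NNNN-magic-file.swift",
           workingDirectory: "/Users/brent",
           sourceRoot: "/Users/brent/Desktop",
           module: "MagicFile")
}
variants.forEach { print($0) }
```

Only the first and fifth entries correspond to what this proposal keeps, as #filePath and #file respectively.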

Other alternatives

We considered introducing a new alternative to #file (e.g. #fileName) while preserving the existing meaning of #file. However, a great deal of code already uses #file and would in practice probably never be converted to #fileName. The vast majority of this code would benefit from the new behavior, so we think it would be better to automatically adopt it. (Note that clang supports a __FILE_NAME__ alternative, but most code still uses __FILE__ anyway.)

We considered switching between the old and new #file behavior with a compiler flag. However, this creates a language dialect, and compiler flags are not a natural interface for users.

Finally, we could change the behavior of #file without offering an escape hatch. However, we think that the existing behavior is useful in rare circumstances and should not be totally removed.

SDGGiesbrecht · December 3, 2019, 1:02am

Sounds good to me.

Although it is not technically part of this proposal, we are considering adding a new compiler flag which distributed build systems can use to disable #filePath and other features incompatible with their build model.

I do use #filePath heavily in testing and scripting situations, but I could still get behind having the compiler strip it by default in any optimized build, without the need for its own flag.

Aciid · December 3, 2019, 2:16am

I am a strong -1 on the current pitch as it breaks existing code. Swift decided to be source compatible and this doesn’t seem like it meets the high bar of breaking existing working code. Perhaps we should gate this behaviour to a new -swift-version?

allevato · December 3, 2019, 2:26am

I'm glad to see this being clarified and formalized. Overall, I think this is the right direction to go—you rightly point out that we don't need a large number of knobs to fine-tune this feature.

I feel like the lede is buried a bit in the write-up regarding the description of the current (and proposed renamed) behavior; the initial sections of the pitch say:

Where a reader could interpret "full path to the [...] file" as meaning the absolute path, rather than "the path exactly as written in the swiftc invocation", which isn't clarified until much later in "Alternatives Considered".

In practice #file ends up frequently being the absolute path because that's what Xcode passes to the compiler, but Bazel is an example of a build system that passes paths that are relative to the workspace root (which is the CWD of invoked actions) and thus yields #file strings that match. So I think it would be helpful to clarify the "path exactly as written in the swiftc invocation" part up front.

The motivation cited in this section feels a bit inaccurate. #filePath on its own, if implemented as described here (i.e., as currently implemented), isn't necessarily incompatible with distributed build systems, as long as the build system does the right thing. Using Bazel again as an example, a remote execution environment might use different absolute paths to the workspace on each machine (for example, incorporating a hash or unique job ID), so a distributed build of a multi-module Swift project might have absolute paths that look like this:

machine 1: /build/01234567/username/workspaceroot/path/to/module1/SomeClass.swift
machine 2: /build/abcdef01/username/workspaceroot/path/to/module2/SomeStruct.swift
machine 3: /build/0badbeef/username/workspaceroot/path/to/module3/SomeEnum.swift

Since Bazel will use everything up through .../workspaceroot as the CWD for swiftc, the Swift build rules only pass path/to/module1/SomeClass.swift, path/to/module2/SomeStruct.swift, etc. to the compiler (in fact, Bazel doesn't even provide a way to get the full absolute paths when constructing those invocations). Therefore, #filePath strings would still be distributed-build-compatible because they don't encode any information about the world outside of the workspace.

However, purely as a security measure, I think it's reasonable that someone would want to disable #filePath in certain builds to avoid leaking details about their source directory layout in strings that ship with their binary.

This would be harmful for creating hermetic, reproducible builds. Surfacing absolute paths in user-visible ways when they weren't originally absolute breaks those guarantees and has historically required the addition of more flags to "fix" the problem, like -fdebug-prefix-map in Clang and -debug-prefix-map in Swift to strip path prefixes off of paths encoded in debug info.

If the Swift team wanted to always resolve the absolute paths regardless of how they were passed to the compiler, they would need to honor -debug-prefix-map, -working-directory, or some new flag to make those builds produce consistent strings on distributed systems where the workspace may not always be at the same file system location. In this case, it's best to simply avoid the problem by not introducing it in the first place.
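To make the concern concrete, here is a rough model of what prefix remapping has to do once absolute paths are in play. It only illustrates the idea behind flags like -debug-prefix-map; the function `remapPrefix` and the `/workspace` placeholder are hypothetical and say nothing about how the compiler actually implements remapping.

```swift
/// Hypothetical model of prefix remapping: the first matching `old` prefix
/// is replaced with its `new` counterpart; other paths pass through unchanged.
func remapPrefix(_ path: String, using prefixMap: [(old: String, new: String)]) -> String {
    for (old, new) in prefixMap where path.hasPrefix(old) {
        return new + String(path.dropFirst(old.count))
    }
    return path
}

// Per-machine workspace roots, in the style of the example above.
let roots = [
    "/build/01234567/username/workspaceroot",
    "/build/abcdef01/username/workspaceroot",
]
let relativePath = "path/to/module1/SomeClass.swift"

// Absolute paths differ from machine to machine...
let absolutePaths = roots.map { $0 + "/" + relativePath }
// ...until each machine's root is remapped to a common placeholder.
let remapped = roots.map { root in
    remapPrefix(root + "/" + relativePath, using: [(old: root, new: "/workspace")])
}
print(Set(absolutePaths).count, Set(remapped).count)
```

Passing the path exactly as written avoids the need for this step, since the relative path is already identical on every machine.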

anandabits · December 3, 2019, 2:39am

This seems generally sufficient, although I would like the format to be fully specified. This is necessary in order to support use cases that need to parse the string in order to tie file and line information back to a specific line of source code (for example, this might be useful in logs).

Thanks for including this detail. I wasn’t aware of it and believe the guarantee of uniqueness is important.

sindresorhus · December 3, 2019, 2:41am

I think adding a new #context magic keyword would be a better solution. This would be non-breaking and also reduce the verbosity when you need file, line, and column together. It would also allow adding new context metadata in the future without adding a new keyword each time.

SDGGiesbrecht · December 3, 2019, 3:02am

#context seems like a great idea for some situations, but I don’t think it actually helps with the motivation given here. It does nothing to resolve the problem of leaked details and wasted space. The ability to compile only the specific information you want is still important.

sindresorhus · December 3, 2019, 5:29am

I meant that the file from #context would be the short path, and filePath from #context would be the full path. So it would resolve the leaked detail and wasted space if the user uses file from #context, but not if they continue to use #file.

The ability to compile only the specific information you want is still important.

See this comment: We need `#fileName` - #32 by Joe_Groff

SDGGiesbrecht · December 3, 2019, 5:41am

Yes. You’re right. And I was part of that thread. I guess my memory span is shorter than I thought.

Max_Howell1 · December 3, 2019, 6:02am

You can’t change #file, why break (sure, tests mostly) code?

My personal preference would be a new #something, with paths relative to the source root, which SwiftPM could easily pass to swiftc, but otherwise raw instantiations of swiftc would be full path or relative to CWD unless the flag that controls this is passed.

Would also be nice to simultaneously go in and remove the “same filename is illegal” restriction for modules, since this is an artificial limitation as confirmed by Rose in previous conversations. The limitation is to identify source units and could again be fixed if the module compilation knew the module-root and thus the relative path of the file. Though this bit is more complex for sure.

If that isn’t going to be fixed (because I can see why nobody could be bothered), then well, this proposal depends on this behavior, which is not intentional, just coincidental. That can be fixed though, the filename portion of the static string could be module root relative.

Finally, #filePath is a little ugly, #filename perhaps?

DevAndArtist · December 3, 2019, 6:41am

Just another option.

#file == #file(path)
#file(name) // new

lukasa · December 3, 2019, 11:11am

I'm with @aciid and @Max_Howell1: I don't think this pitch has come anywhere close to justifying why #file should be changed. I'm absolutely happy to introduce new variants, and if you want to force users to think about their use of #file I'm ok with seeing it deprecated as well, but changing the behaviour is pretty brutal.

Incidentally, incautiously deprecating a # option is almost always brutal because it's not very easy to write multi-Swift-version libraries that can shim their way out of the problem. The result of that is that multi-Swift-version libraries are forced to disable warnings-as-errors, which tends to let bugs slip by. This is acceptable if the deprecation warning is emitted based on the tools version used to compile the source, instead of the compiler version.

AlexisQapa · December 3, 2019, 3:30pm

-1 for the reasons exposed by @Max_Howell1.

I personally find the unique filename rule really inconvenient and I'm hoping it might be lifted someday.

beccadax · December 4, 2019, 11:00pm

A collection of responses to various aspects of this objection:

This is a very fair question, and changing #file instead of deprecating and replacing it is certainly the most aggressive part of the proposal. Here are the major benefits of changing it, as I see them:

The benchmarking indicates that, if most code adopts the new behavior, we'll get substantial code size and performance gains at an unbelievably low engineering cost. It's hard to find bang-for-your-buck like this.
The information leaks caused by this are really pretty bad. Most likely, every app I ever shipped as an independent developer had my username embedded in the binary. People don't realize that this is happening and there's no sign of it in their source code. There's a fundamental issue of consent here that needs to be addressed.
Experience with __FILE_NAME__ in clang indicates that merely providing an alternative won't really make a difference—people are unlikely to adopt it in the places they ought to. To actually see these benefits in practice, we would at least need to deprecate #file and migrate people to one of two alternatives. But that would create a lot more friction (and a lot more backwards-source-compatibility issues) than changing #file would.

And here are the reasons I think the source break is acceptable:

Very little code will be broken by this change. The vast majority of uses (my estimate was 94%) merely display the string; they really don't care about its exact contents.
The code that will be broken by it is already very fragile. It would be broken by switching to a different build system, by your existing build system generating compiler invocations slightly differently, by compiling from a temporary copy of the file, by using #sourceLocation(file:), etc. Swift did not, and frankly could not, promise that these uses of #file wouldn't break, because so much of what made them work is outside of its control.
Broken code is very likely to be discovered immediately because these patterns were only ever usable in tests and local scripts. (I would hope tests would break when they can't load fixtures, anyway...) Code that would be broken by this could never have been deployed in binary form because it would not have worked without the project's source code installed at the right path.
Once you have discovered that you need to fix the code, the fix is trivial (change #file to #filePath).

No promises, but I'm hoping that we can get the new identifier(s) supported in Swift 5.2 but not actually deprecate/change the behavior of #file until the version after. That would at least ease the source compatibility problem.

I don't think we can design the language around warnings-as-errors. Warnings exist so that the compiler can point out issues that aren't worth breaking the build; warnings-as-errors overrides that judgment and breaks the build anyway. If you think that some of the warnings the compiler emitted for your code shouldn't be errors, the right answer is to turn off warnings-as-errors for your project, not to stop diagnosing the issue for everyone.

If this feature happened to be released in a version with a new language mode, I think that would be a reasonable solution, but I don't know if that will happen and I don't think it's worth delaying the change to wait for a new language mode.

We have promised to preserve source compatibility, but that promise has always had its limits. For example, there is always potential for us to break some code, somewhere, with any new overload, but that doesn't stop us from adding new overloads unless we see that the break is pretty common (e.g. count(where:)). I think this very fragile, poorly-supported pattern is another candidate for that treatment.

I ultimately think that it's reasonable to decide that changing #file's behavior is a bridge too far, but I think it's best to make the case for changing #file and then let the core team decide if the argument is strong enough. That's why I included a fully-formed alternative that doesn't break existing uses of #file in "alternatives considered"—it's basically ready to go if the core team wants it.

beccadax · December 6, 2019, 1:14am

Responses to other comments:

Thank you—I've rephrased parts of the proposal in light of these comments.

I've also called out the fact that, if we wanted to make #filePath absolute, we'd need to make it honor -debug-prefix-map. (Is there any reason why that wouldn't be a good approach?)

The string isn't intended to be parsed, although in practice, nothing will stop you from throwing a regular expression at it.

I don't want to specify too much here so that we can preserve some flexibility, allowing us to, for instance…

Change the format of #file if we drop the unique filename rule without needing an evolution proposal.

A naïve implementation of #context would end up increasing, not reducing, code size. For instance, it would probably contain the #function string, which today is rarely used. It would also contain both #file and #filePath even though only one would likely be used. You would have to count on the optimizer to delete the unused values.

Personally, I would like to see us change the way #file and friends are treated when within default arguments so that they are always generated at the ultimate call site. That would allow anyone to create their own type which automatically captured the contextual information they cared about, and nothing more:

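struct Context: Hashable {
  // String rather than StaticString: StaticString is not Hashable, so the
  // synthesized conformance would not compile with it.
  let file: String
  let line: UInt

  init(file: String = #file, line: UInt = #line) {
    self.file = file
    self.line = line
  }
}

// Under the hypothetical rule, the default Context() is built at foo()'s call site.
func foo(context: Context = Context()) {
  assert(context != Context(line: #line - 1),
         "captured info about foo()'s call site, not our default argument")
}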

But that is a different (and separately controversial) proposal.

#filename sounds like it might be the opposite of what it is, i.e. the name of the file by itself without a path. #filePath makes it clear that the path to the file is included.

As a personal rule, I try to make #foo(...) and @foo(...) syntax match normal argument list syntax as much as possible, with the hope that we can eventually make these features more extensible without having to special-case a lot of existing syntax. This would violate that (admittedly non-critical) rule.

allevato · December 6, 2019, 3:19am

Thanks!

I think that would be reasonable, if it was decided to make the path always absolute. One might argue that -debug-prefix-map (and the Clang flag it was based on) only affects the emitted debug info and that it's odd to overload the flag in a way that also affects the runtime behavior of the program, but if we're comfortable saying that #filePath should only be used for debugging, then maybe that's fine.

Digging into the current state of Clang a bit more, it looks like someone (only a few days ago!) just landed a couple new flags around this, based on flags already implemented in gcc: Rather than change the existing behavior of -fdebug-prefix-map, they added -fmacro-prefix-map to remap path prefixes in __FILE__ and -ffile-prefix-map to apply a remapping simultaneously using both -fmacro-prefix-map and -fdebug-prefix-map.

But that feels like a lot of unnecessary complexity. Again, if it was decided that #filePath should always be absolute, I think -debug-prefix-map could also serve that purpose here, unless someone has a compelling reason to want remapped paths in their DI but not elsewhere in their binary?

But looping back around, I think keeping the path in the form originally passed to the compiler is an even better solution because it avoids all these questions.

jayton · December 6, 2019, 9:06am

I’m very happy to see movement on this, and especially happy to see a proposal that dares to fix the bad default behaviour rather than assume legacy must always be preserved.

beccadax · December 6, 2019, 11:42pm

It might be necessary complexity for C—I was terrified to discover recently that #include __FILE__ is a thing that exists—but I agree that it probably wouldn't be necessary for Swift.

ben-cohen · December 7, 2019, 12:00am

"Source compatible" is not the same as "behavior compatible". This change does not fall under Swift's source compatibility guarantees. Which is not to say the change should not be given strict scrutiny.

I have nothing more to add over Brent's comments on just how big the code size wins are, and how bad the current information leakage is. These factors are real and considerable. This is going to come down to subjective preferences but these points are so significant IMO as to vastly outweigh concerns about breaking workflows that exploit the presence of the path for testing purposes (if someone finds a compelling example of how this may break code in production rather than tests or build systems, that might be different – but I would still find it a tough sell).

possen · December 7, 2019, 2:11am

Was thinking it might be useful to separate the pieces of info returned. #file would return just file name, #path would just return path, and #module would return module. Each part could be used to construct what you needed.