SE-0274: Concise Magic File Names

The review of SE-0274 — Concise Magic File Names begins now and runs through January 16, 2020.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to the review manager (via email or direct message in the Swift forums).

What goes into a review of a proposal?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift.

When reviewing a proposal, here are some questions to consider:

  • What is your evaluation of the proposal?

  • Is the problem being addressed significant enough to warrant a change to Swift?

  • Does this proposal fit well with the feel and direction of Swift?

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Thanks,
Ben Cohen
Review Manager

17 Likes

+1. Initially, I was against changing the behavior of #file, but after reading the discussion on the forums and the proposal rationale, I now think it's a better solution than the alternatives considered. I'm also convinced the status quo is actively harmful, justifying a potentially breaking change to the behavior.

Yes, I think the proposal does a good job of demonstrating the serious issues with the current approach to #file. Both the binary size and information leakage issues are worth fixing, and the fact that this proposal resolves both is very strong justification.

Just to provide some additional anecdotal evidence, when this change was first pitched I took a look at a large (>100k line) iOS app I work on to see how it would be affected. I didn't find any uses of #file which would break after this change, and I estimate it would shave off ~3% of the binary size if it was accepted.

I think so. The new behavior of #file seems like an overall better design, and one which we might have introduced from the start if not for the historical precedent of __FILE__.

Nearly all of the languages I've used in the past have had a similar feature, but most of them are similar to __FILE__ in that they include the entire path. As noted in the pitch thread, clang does provide a __FILE_NAME__ macro which is similar to this proposal, and seems to have been well-received by users.

I read the proposal and followed most of the discussion on the pitch thread.

2 Likes

As I said in the pitch thread, I consider this proposal to be a good idea even though I am a heavy user of #file during development‐time scripting.

As I read the proposal again though, this sentence caught my attention:

However, we should not merely tell projects like these that "your tests are now broken and fixing them will require you to totally redesign how you access fixtures"—we should provide a drop-in solution for projects that are currently using #file in these ways.

Instead of a drop‐in solution that’s still not‐so‐great, I wonder if it would be better to actually solve the problem and then forgo #filePath altogether. It strikes me that I have never really cared about the path of the file itself in any of the places where I use #file. What I’ve always been after is the root of the repository. I would be perfectly happy if Swift were to simply adopt an official recommendation that all tooling provide the source root in a standardized environment variable during testing and scripting (such as swift run and extensible build tools). If SwiftPM, Xcode, Bazel and whatever else all set a consistent PACKAGE_ROOT environment variable, then I would happily use that instead of deriving it from #filePath. Then we could avoid ever creating #filePath—or at least deprecate it from the start.

I see several advantages to doing that:

  • Since #file will be used mostly in default arguments, #filePath may end up being the one actually appearing more often in source code. With its greater visibility there is a risk users will discover it first and put us back in the position we are today, where everyone is bloating binaries and leaking sensitive information out of ignorance.
  • ProcessInfo.processInfo.environment["PACKAGE_ROOT"] is more verbose and less likely to be found or used without understanding what it does.
  • Environment variables don’t end up in in binaries, so even if the user includes such code in a distribution, there is no risk of leaked information.
  • Since it isn’t trying to port old behaviour, PACKAGE_ROOT can define its own semantics. For example, instead of an absolute path, it could be specified as a relative path from the current working directory from which the debug build is launched. That would give build systems much more flexibility, but users could still safely do URL(fileURLWithPath: ProcessInfo.processInfo.environment["PACKAGE_ROOT"]!)

or SOURCE_ROOT or REPOSITORY_ROOT. I don’t particularly care what it is called.

8 Likes

+1 on the changes to #file. The proposal is very well thought out and well motivated. I haven’t ever relied on the full path being available so I don’t have a strong opinion about #filePath.

+1 on the motivations, but -1 on the execution.

I hold the same critique as I did in the pitch thread:

I would much prefer this to be implemented as a struct, so that we have a single #context (or whatever) for all your context checking needs, rather than creating a bunch of separate # identifiers.

2 Likes

This would probably not fulfill the motivations set out in the proposal. Providing a struct that includes the path would require embedding the path in the binary.* Even if it isn't used – the caller would still need to provide it to the callee because it can't always know if it needs it. So the privacy concerns, while abated, still remain, as does the binary size impact.

* you say relative path – this does partially resolve the issue but has other downsides and is orthogonal to the concept of bundling the info into a type

7 Likes

Replacing #file by something else with a different name would preserve API compatibility, though. What do you do if one of your dependencies hasn't been updated to use SwiftPM resources and gets tests fixtures by assuming #file is a relative path? All of those tests will break, and sometimes the original authors of the library are not terribly responsive.

While this proposal is very carefully described as being "source-compatible", it does break the API. Lots of code depends on the current behaviour, and adopting this change would break those things.

My evaluation of the proposal is -1 because of that. I think we should reconsider how we provide source locations more generally (via a Context or SourceLocation type), and then we can build this new behaviour in to it from the start. The existing #file and #line literals would be deprecated but preserve their current behaviour.

That new type might not even include an absolute path at all. It might only provide relative paths - relative to some base address which you can set (and if you want absolute paths, that can be relative to the system root). We would also have a place to document exactly what this String contains or what it is supposed to be used for.

I think this is a well thought proposal.

That said, I have one reservation about including the module name in #file. While it's very practical to do so because in most contexts where you want the file name you also want to know to which module it belongs, it seems like it'd fit better in the language if we had #file and #module passed as separate parameters into functions.

But since we're not starting from a clean state and we want something that works with existing functions and APIs which currently all take a single string, including the module within the string is probably most reasonable solution.

Maybe we should consider adding #fileName and #module later so people don't try to parse #file to get to these.

The way I see it, a struct approach won't work that well. It's simple to say we should reconsider, but here's the implications I can see with this approach.

All the calls to assert and fatalError in everyone's compiled code currently want to call the function using a string for the file parameter. If you change this to a different type (a struct), the standard library must continue to support the old string-based function. Ok, that part is not too complicated: just add an overload. Then you must find a way for the new function taking a SourceLocation type to get called by default when compiling new code. Ok, that could be accomplished by the removal of the default parameter in the old function. Except that you should do that only when the deployment target is high enough to make the new version available. For that, we'd need to add in the language a way for default parameters to depend on the deployment target :expressionless:, or add a third always-inlined overload dispatching dynamically depending on availability . Also, other libraries providing similar functions taking #file and #line would need to be migrated in the same way. That migration seems a lot of trouble for everyone.

On top of that, one goal is to avoid embedding the full path in the binary. That will only work as long as all the libraries you use that take #file in their argument are migrated to the new struct-based system. I'm assuming the struct does not include the path. This also requires everyone in the chain to use a deployment target high enough so the new functions in the standard library are available, which might take a long time.

Compare this to just changing what is returned by #file: just recompile and it's done.

1 Like

Yes, please! As I said in the pitch thread, I believe the existing behavior is wrong. The first bullet point in the Motivation explains why, and I think it's sufficient to overcome any concerns about breakage.

I actually have one point of critique that follows from that motivation...

We do not specify the exact string produced by #file because...

I would propose that this be strengthened ever so slightly to make the minimal guarantee that no path component outside the project will ever be included. I think that would preserve future implementation flexibility but fully address the "leak" Motivation.

1 Like

I am a strong +1 as long as we get #filePath as well as is proposed. I make heavy use of it for build scripts and tools with SwiftPM. I would be very sad if that part of the proposal got lost for some reason.

I am also think the new #file will be great for libraries and apps since that is what I have seen them want/need that majority of the time.

1 Like

+1, including repurposing #file and adding #filePath.

I often manually remove the path in my logs, not for security concerns, but to make log outputs shorter. If this would give me this, but additionally remove the path from the binary, that's a huge win for me!

I've read the proposal and followed the pitch thread.

1 Like

+1 I briefly reviewed the proposal and I like the idea that this addresses some security concerns.

1 Like

This is difficult to do because the compiler doesn't really have any notion of what "the project" is—it is completely ignorant of any Xcode-style project files and there's no flag or anything that specifies the source directory. You could perhaps define "the project directory" as the common prefix of the paths of all source files, but this breaks down if you, say, create a .swift file in DerivedData and then compile it into your project.

1 Like

True. Just keep the absolute path out of it then.

It would still be valuable to have, for relative file name, module name, function name, type name, line number, etc.

I’m strongly in favor of the proposal.

My one hesitation is that perhaps we should add (and immediately deprecate) a compiler flag to retain the old behavior of #file, perhaps emitting a warning for each use of the old behavior at compile time. This would give projects that rely on the old behavior for tests the ability to do a staged migration to #filePath.

Most definitely. The potential for leaking personal information via absolute paths necessitates a change like the one proposed.

Although the magic identifiers are a somewhat ugly corner of the language, the proposed change is consistent with them.

The proposal is consistent with the features offered by clang.

I read the original pitch thread and the full proposal. I’ve participated in changing code to work around the current information-leaking behavior of #file.

I think it makes perfect sense.

Yes, even if it would happen to break some source files, it addresses an ergonomic issue, and code relying on the current behaviour has been relying on something which to many other developers has only been a disadvantage. #filePath will also cover those use cases in the future.

I also think it is preferable to alter #file as proposed as opposed to replacing it with a Context or Source struct mentioned as an alternative. In most contexts where I use #file, it is for quick debug output, and the more complex alternatives suggested are too unwieldy for those times when you just want a filename.

Yes.

I don't think I have, but I would expect them to behave in the same way as proposed here.

I read the original pitch and ensuing discussion as well as the formal pitch.

I like having a short filename option, but I don’t consider it to be particularly important.

I think the “filename (module)” is a bad idea. While the proposal states that this format could be changed later as needed, the reality is that many times people will be parsing the module name and using it in some non-user-output sort of way.

If we do choose that sort of convention, then the format should be described in the Swift documentation.

If we want to expose the module name (a good idea!), something like this makes more sense:

#filename — the short filename
#filepath— the full absolute path to the file
#module — the name of the module

It would be great to have something which gives the equivalent of String(describing: type(of: self))).

I think the best solution is to deprecate #file and create a #context, which resolves to a struct containing the relevant fields.

The #context struct could have a nice CustomStringConvertible that returns a nice summary in whatever format which is suitable for debugging purposes.

This keeps things nice and clean and avoids polluting the namespace with an ever-growing number of #names.

This would also make it simple and give a consistent place to add future info, if needed.

If we change the current behavior of #file or deprecate it, the full path behavior should be made available another way, such as #filePath, or maybe #context.absolutePath.

While these will all be breaking changes for some small number of users, I don’t think that is too relevant, as it would be pretty trivial to search and race #file to #filePath or whatever.

One concern — would moving everything into a struct necessitate including that full path information in the binary, even if that field is not referenced anywhere? Perhaps someone can speak to this.

If that is the case, then I’d prefer leaving the full path available as a special case outside of the #context struct.

Overall I think the change will be good, and some variation of this proposal should be adopted.

I would really like a solution to replace “ String(describing: type(of: self)))” with something that gives a similar result.

I’ve got many projects that have that text in there hundreds of time. Unlike #file, for example, you can’t put that as a default argument to a logging function and have it be useful.

1 Like

I agree. People will end up wanting access to the file and module name separately, and in absence of a direct way to get it, parse it from #file. In addition: I assume including the module name in #file would mean repeating the module name plus a few characters for every file. That might not amount to more than a few kilobytes for large projects, but still.

So with the prospect of people wanting to access this information, plus the small size overhead, I'd rather have #file and #module as separate things, instead of jamming them into one.

7 Likes

No, no, no! Let's not shoot ourselves in the foot (again) by treating a non-parseable description as some kind of API contract. The proposal says:

We do not specify the exact string produced by #file because it is not intended to be machine-parseable and we want to preserve flexibility as the compiler's capabilities change.

If the argument is that lots of people have build tooling requiring access to the file name (specifically) and the module name (specifically) then let's update the proposal to have magic syntax for those things (specifically). Or a separate proposal.

Otherwise, let's not replace one broken solution with a different broken solution.

In a disturbing trend I see more and more in these forums recently, it seems a bit unbalanced in its negative evaluation of the status quo—at least it makes no attempt to illuminate the intentional reasons for the current design—which makes me less confident that it is proposing the best answer to the issues.

#file exposes the paths passed to the compiler so that tooling can easily reveal the exact file and source line when an assertion fires or a test fails. The message currently printed on the console when an assertion fails is in the same format as compilation error messages (which also show the path passed to the compiler) so a general purpose IDE like Emacs can easily take you to the exact source line of a failure using the same mechanism that works for compilation errors, when you launch a compile/run job. I don't know how things work in Xcode, but IIUC we don't want Swift to be at a disadvantage when used outside the Apple ecosystem. I don't see anything in the proposal that specifically addresses how these use cases will be affected.

  • Is the problem being addressed significant enough to warrant a change to Swift?

I'm ambivalent. The problems as stated in the proposal are:

  • Inadvertent revelation of sensitive information: the problem in general is significant, but I'm not convinced the part of the problem addressed by this proposal—sensitive information in file paths—is.
  • Binary bloat: definitely significant, but should be addressed by building a trie of path fragments instead of storing every path as a distinct contiguous string. Even if this proposal is accepted that work should be done for #filePath.
  • Artificial differences between build artifacts: also significant, but that can be fixed at the build system level simply by specifying paths relative to the root of the project.

If two out of three of the issues mentioned can be handled without changing Swift, I wonder if the change is warranted. It definitely adds complexity. Also, the existing way things work is precedented in many Objective-C/C/C++ implementations, and I may of course just be ignorant, but I have not heard of these problems becoming significant issues for those very widely-deployed languages. Surely they should be a common source of complaints if they're significant for Swift?

  • Does this proposal fit well with the feel and direction of Swift?

Well, it adds complexity that may not be needed…

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

C/C++/Objective-C

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Thoroughly read the motivation and proposed solution; skimmed the rest.

3 Likes