Concise Magic File Names

SDGGiesbrecht · December 10, 2019, 8:19pm

But then, also thanks to the variation in tools, it would be absolutely impossible to use test fixtures. Since nothing in the compiler guarantees what the working directory will be during tests, (Xcode and SwiftPM already differ, no idea what Bazel does), the only way to find the source directory at all is by deriving it from the location of one of the source files. Some means of locating the source directory is an absolute necessity.
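As a rough illustration of what "deriving it from the location of one of the source files" tends to look like, here is a minimal sketch. The helper name `sourceDirectory(levelsUp:from:)` and the directory layout are hypothetical, and it assumes a compiler where `#filePath` is available; with the compilers current at the time of this post, `#file` plays that role.

```swift
import Foundation

/// Hypothetical helper: strips `levelsUp` trailing path components from the
/// path of the calling source file (1 = the directory containing that file).
func sourceDirectory(levelsUp: Int, from file: String = #filePath) -> URL {
    var url = URL(fileURLWithPath: file)
    for _ in 0..<levelsUp {
        url.deleteLastPathComponent()
    }
    return url
}

// Assumed layout: <root>/Tests/MyTests/ThisFile.swift, fixtures in <root>/Tests/Fixtures.
let fixtures = sourceDirectory(levelsUp: 2).appendingPathComponent("Fixtures")
print(fixtures.path)
```

This only works if the path handed to the compiler still points at the real source tree when the tests run, which is exactly the assumption under discussion in this thread.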

I would say that Bazel’s distributed build breaks reasonable expectations for normal builds. Compare concurrency: a developer ought to be able to normally assume that their tests run serially. If they go out of their way to apply a --parallel flag to opt into concurrent tests, then much of what used to work suddenly doesn’t, and the developer must go back and rewrite everything in a concurrency‐compatible way that restricts itself from touching certain incompatible things in the toolkit. I see it the same way with distributed builds. A developer ought to be able to normally assume that the repository has some consistent absolute location and that relative paths between source files are exactly what they expect. If they go out of their way to opt into a distributed build, then some of what used to work suddenly doesn’t, and the developer must go back and rewrite everything in a distributed‐compatible way that restricts itself from touching certain incompatible things in the toolkit. Both concurrency and distributed builds are valuable tools that have their place, but neither should dictate what can and cannot be done when they are not in use.

(SwiftPM’s coming support for resources somewhat alleviates the need for finding fixtures in the repository, since individual files can be accessed that way, but it doesn’t yet solve the issue of fixtures that are whole directories.)

SDGGiesbrecht · December 10, 2019, 8:21pm

And SwiftPM’s SE‐0271 cannot write into the repository fixtures to update them automatically. That is a deal killer for me.

beccadax · December 10, 2019, 9:16pm

No, it doesn’t. Bazel (AIUI) doesn't rearrange files in a project or anything like that; it just refers to them by relative paths instead of absolute ones so that it can place projects in temporary directory structures. This is totally something we want build systems to be able to do.

If the Swift driver wanted its clients to always pass absolute paths, it would enforce that. If the compiler wanted to promise to code that #file would be an absolute path, SILGen would convert the relative paths to absolute paths when it lowered #file to a string literal. If the language did not want to guarantee that you could use #file for anything but runtime diagnostics, it would pass through the exact same file string it uses in compile-time diagnostics—which is precisely what #file currently does.
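To make the "runtime diagnostics" use concrete, here is a minimal sketch of a helper that only ever prints the file string and never opens it. The function name and message format are hypothetical; the point is that this kind of code works the same whether the compiler was handed a relative or an absolute path.

```swift
/// Hypothetical assertion helper: returns a diagnostic string instead of trapping.
/// The file string is only interpolated into the message, never resolved on disk.
func failureMessage(
    unless condition: @autoclosure () -> Bool,
    _ message: String,
    file: StaticString = #file,
    line: UInt = #line
) -> String? {
    condition() ? nil : "\(file):\(line): \(message)"
}

// The default arguments capture the caller's file and line.
if let diagnostic = failureMessage(unless: 1 + 1 == 3, "arithmetic is broken") {
    print(diagnostic)
}
```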

So if you run your tests twice, the failing tests will pass the second time? I'm struggling to understand what the idea is here.

I mean, it’s your computer—nobody will stop you from doing this. (And #filePath will continue to support it just as well/poorly as #file previously did.) But nobody has to design shared tooling to accommodate it, either.

allevato · December 10, 2019, 9:47pm

It is absolutely not necessary—the build systems can provide that support:

If someone has a SwiftPM package with resources, post-SE-0271, the targets have those resources associated with them, they get placed in a location relative to the binary where the bundle APIs can find them regardless of CWD, and the user accesses them using the Bundle.moduleResources accessor.
A hypothetical yet-to-be-written Bazel rule could associate a set of resources with a swift_library target, generate the same Bundle extension, ensure that the bundle is available in the runfiles set so that a bazel test invocation—even executed remotely—copies the resources along with the binary to the runner so that it's found at runtime.

Then, the same Swift code would work regardless of whether it's built with SwiftPM or Bazel, with only a new build file having to be written.
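A minimal sketch of what such build-system-independent code might look like, assuming the generated accessor hands back an ordinary Foundation `Bundle`. The helper name and fixture name are hypothetical, and `Bundle.main` merely stands in for the per-target bundle (released SwiftPM spells that accessor `Bundle.module`).

```swift
import Foundation

/// Hypothetical lookup: the bundle is a parameter, so this code does not care
/// which build system generated the accessor that produced it.
func fixtureURL(named name: String, withExtension ext: String, in bundle: Bundle) -> URL? {
    bundle.url(forResource: name, withExtension: ext)
}

// Bundle.main is only a stand-in for the target's resource bundle in this sketch.
if let url = fixtureURL(named: "expected-output", withExtension: "json", in: .main) {
    print("found fixture at \(url.path)")
} else {
    print("fixture not bundled")
}
```

Nothing here depends on the working directory or on where the sources lived at compile time.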

This doesn't address the case of invoking the compiler directly, but in that case, you don't have any expectation that you're getting bundle-based resource support anyway.

Why? In many cases, relying on absolute paths only serves as a vector for information leaks and build/test fragility—rarely do you gain anything except the illusion of convenience, and it comes with high cost.

Other engineers and I have dealt with a number of pain points across multiple languages/compilers because they weren't designed with distributed builds in mind (I'm not singling Swift out here; it's historically been an issue in many systems): remotely-built applications that can't be debugged on a developer's local machine because the debug info has hard-coded absolute paths to files used in the build, coverage instrumentation data that can't be extracted into a report using llvm-cov for the same reason, tests that change behavior depending on how source files are passed to the compiler, and so on. A larger-than-I-would-like portion of my day-to-day work is fixing or working around these issues. I'd love to spend that time on more productive things.

Less reliance on absolute paths resolves these problems, with no real harm to the average developer not doing distributed builds, and is unambiguously a good thing. And as more developers and companies continue scaling up their processes and using distributed builds, issues like these will continue to be costly pain points unless we address them, which is why I'm glad @beccadax is taking this particular one on.

That's right; the file structure from the workspace root and below is not rearranged in any way, but that workspace root could theoretically be located at any physical location on the file system. This is important for sandboxing, doing multiple builds concurrently, and a number of other factors.

If I'm understanding the use case correctly, this is a pattern often used when writing tests against complex golden files that are difficult to maintain by hand; if the code is updated to change the expected output, you run the tests to generate the actual output files and then have a way to update the goldens.

One way to do that is to use #filePath, which I guess is fine. You could also write a script that separately copies the test outputs onto the goldens so that the tests don't have to write directly into the source tree. There are probably other approaches.

FWIW, when I say #filePath should be abandoned, I mean that we should strongly encourage people to use better solutions. If you absolutely need it, like for updating your test outputs, go for it. Heck, swift-format's pipeline generator even uses #file to scan the source tree for rules, because that's a development tool that runs locally as a manual pre-build step—it would never run on a distributed system isolated from the source code, so I acknowledge the fact that's fragile and that I'm a huge hypocrite. But overall, SE-0271 is likely to be a better answer for runtime resources for most use cases and we absolutely should be directing people to that as a general solution, leaving #filePath for the rare situations where the recommended solution won't work.

Karl · December 10, 2019, 10:01pm

So what we do with swiftplot is save the rendered files to a local relative directory (it gets placed in /private/tmp on my Mac). We use #file to discover the reference file location and use FileManager.contentsEqual(atPath:, andPath:) to compare them. We can still do that with SPM resources; we just get the reference file location from the Bundle APIs.
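Roughly, the comparison step could look like the sketch below. The helper name and the throwaway files are hypothetical and this is not swiftplot's actual code; only `FileManager.contentsEqual(atPath:andPath:)` is taken from the description above.

```swift
import Foundation

/// Hypothetical check: true when the rendered file is byte-for-byte
/// identical to the reference file.
func matchesReference(rendered: URL, reference: URL) -> Bool {
    FileManager.default.contentsEqual(atPath: rendered.path, andPath: reference.path)
}

// Demo with two throwaway files in a temporary directory.
let dir = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString)
try FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
let rendered = dir.appendingPathComponent("rendered.txt")
let reference = dir.appendingPathComponent("reference.txt")
try Data("plot".utf8).write(to: rendered)
try Data("plot".utf8).write(to: reference)
print(matchesReference(rendered: rendered, reference: reference)) // true
```

Where the `reference` URL comes from (a path derived from `#file`, or a bundle lookup) is the only part that changes between the two approaches.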

If we change the output in an expected way and want to update the reference images, I copy the version from /private/tmp into the repository.

It would be nice if CI captured that output, but that's an infrastructure issue. Some CI systems capture test artefacts.

So anyway - I don't see this as a reason to block changing #file.

SDGGiesbrecht · December 11, 2019, 1:01am

And I’m all for that. I’m only arguing against the full removal of all #filePath‐like functionality without an adequate replacement, which is what @allevato seemed to be suggesting a few comments ago:

(Note that since reporting assertions doesn’t actually require finding the source file, I (presumably mis‐)read the second half to be asking for the functionality of #filePath to be reduced as well.)

I understood (perhaps incorrectly) from @allevato’s comment upthread that each phase of compilation could be occurring on a different machine, and consequently at a different absolute path than other phases of the same build. Which would yield chaos for anything attempting to access “the” repository, since there are actually several.

@beccadax: “it just refers to them by relative paths instead of absolute ones so that it can place projects in temporary directory structures.”

Which is why we need #filePath (or some better replacement) in order to find our way back to the canonical source files.

@beccadax: “So if you run your tests twice, the failing tests will pass the second time?”

No. I make heavy use of this function for fixtures. Most of the time, the overwriteSpecificationInsteadOfFailing is explicitly false. But when a failure reports a difference—no matter how complicated—, and that difference is deemed to be desired, then instead of manually finding the specification and tediously hand‐crafting the new expectations—which could involve hundreds of lines—, you can simply flip the boolean literal to true, re‐run the test, restore the literal to false, and check in the files. It saves so much work, but it requires some means of locating the canonical source of the repository being tested in order to write into it. In the future that might be accomplishable by some means besides #filePath, but so far that is the only way that works reliably for both SwiftPM and Xcode.
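For readers unfamiliar with the pattern, here is a minimal sketch of how such a comparison might behave. Only the flag name `overwriteSpecificationInsteadOfFailing` comes from the description above; the function signature, the outcome enum and the temporary file are hypothetical, and the real function presumably reports failures through XCTest rather than returning a value.

```swift
import Foundation

enum SpecificationOutcome: Equatable { case matched, overwritten, mismatched }

/// Hypothetical comparison helper; only the flag name comes from the post.
func compare(
    _ actual: String,
    against specification: URL,
    overwriteSpecificationInsteadOfFailing: Bool
) throws -> SpecificationOutcome {
    let expected = try? String(contentsOf: specification, encoding: .utf8)
    if expected == actual { return .matched }
    guard overwriteSpecificationInsteadOfFailing else { return .mismatched }
    // Write the new expectation back to the specification file.
    try actual.write(to: specification, atomically: true, encoding: .utf8)
    return .overwritten
}

// A temporary file stands in for a specification inside the repository.
let specification = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString + ".txt")
try "old expectation".write(to: specification, atomically: true, encoding: .utf8)

let failed = try compare("new output", against: specification,
                         overwriteSpecificationInsteadOfFailing: false)
let updated = try compare("new output", against: specification,
                          overwriteSpecificationInsteadOfFailing: true)
let rerun = try compare("new output", against: specification,
                        overwriteSpecificationInsteadOfFailing: false)
print(failed, updated, rerun) // mismatched overwritten matched
```

In real use the `specification` URL has to point into the checked-out repository, which is why the workflow needs some way of locating the canonical sources.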

I’ve also often set up data generators in pseudo‐tests (because executable products interfere with building for iOS et al.) whose purpose is in a similar vein to GYB. They write derived or computed data into Swift or resource files. A concrete example is here, where it downloads the current DUCET (Default Unicode Collation Element Table), filters what it doesn’t need, processes it, and writes it into the repository as a resource that can be loaded with Codable. Any such writing into repository files requires being able to find the canonical repository source, and #filePath is currently the only way.

That’s what I mean. We need a way—which I agree doesn’t have to continue to be this one—, but removing #filePath before we get that better way would kill our current best option.

Maybe I said that poorly. I don’t mean a permanently consistent absolute path. I expect two separate clones of a package to behave identically. If the source is somehow designed in a way that makes that assumption not hold, then I agree with you that it was a foolish decision.

What I mean is that relative paths should work reliably. If I run GYB (which produces .swift files at relative locations respective to the .gyb ones), and then build, I expect the build to use those generated Swift files at their paths relative to the source root. If the two phases occur in separate clones of the repository behind my back, then the build phase will be missing those files because they were never generated in its version of the repository. So what I meant was that a developer ought to be able to reasonably assume that all build phases are operating on the same source directory, and that consequently any resolved relative paths continue to point at their intended targets. I would not want behaviour contrary to that unless I understood the extra restrictions and opted into it, much like opting into concurrency only when you know about the extra restrictions and are willing to stomach them in exchange for the benefits it brings. The same assumptions I have about relative paths working with GYB are the ones that govern the reading and writing into project files from Swift that I mentioned above.

It would also be fine if the build system needs to have the interdependencies explicitly registered to ensure all the right files still end up in the right places, and that their mutations make their way back to the canonical source. Again, I’m only saying we need a way, and until we get something better, #filePath is all we’ve got. So please don’t take it away until there actually is something better.

Then despite our apparent disagreement, I guess we actually agree. I would rather use better solutions wherever they are available just as much as you would. I just don’t want these sorts of development workflow tactics to become impossible because we dropped the flaky functionality before a sound replacement became available.

allevato · December 11, 2019, 2:02am

To clarify what I meant, reporting assertions in a log doesn't require finding the source file, but Xcode uses the file path recorded with the assertion to navigate to it in the UI, so we'd still need #filePath around for that. (If the new #file provides a source file basename and module name, perhaps the same navigation could be implemented in terms of that, but it would be a lot more difficult and require the format of that string to be well-defined, so I think #filePath is actually the better option here.)

I don't want to belabor this too much because this isn't a how-Bazel-works thread beyond my interest in making sure that a decision isn't made that would be incompatible with it, but for a distributed Swift compile with sandboxing, your sources get copied to the remote machine and then you can imagine the path to those sources being divided into two parts:

1. Everything up to the source root: Bazel doesn't let you know this part of the path at all through supported APIs, and you shouldn't care about it or need to know it unless you're trying to debug a build issue. This path won't be the same from compile to compile. There might even be multiple source roots at different paths being used by concurrent builds on the same machine.
2. Everything below the source root: these are laid out exactly as they are in the original source repository. The files here do not have to be the entire repository; they could be only the subset of sources that are required to compile that particular module.

The key concern is that you don't want any of the (1) parts of the path to end up in the compiled modules or binary; it makes those files impossible to cache, and may lead to other problems (like the debugging issues I mentioned earlier). Fortunately, the current behavior of #file is actually fine in this regard, which is why #filePath should be the same: since it just uses the path as given to the compiler, and Bazel only gives the (2) part of the path, we don't have any reproducibility issues.
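To illustrate the split, here is a small sketch with invented sandbox paths and a hypothetical helper name. It is not something Bazel exposes; it only shows why the part below the source root is stable across builds while the part above it is not.

```swift
/// Hypothetical helper: returns the part of `path` below `sourceRoot`,
/// or nil if `path` is not inside that root.
func pathBelowSourceRoot(_ path: String, sourceRoot: String) -> String? {
    let root = sourceRoot.hasSuffix("/") ? sourceRoot : sourceRoot + "/"
    guard path.hasPrefix(root) else { return nil }
    return String(path.dropFirst(root.count))
}

// Two sandboxes at different (invented) locations yield the same relative path.
let a = pathBelowSourceRoot("/tmp/sandbox-1/Sources/App/main.swift", sourceRoot: "/tmp/sandbox-1")
let b = pathBelowSourceRoot("/var/build/42/Sources/App/main.swift", sourceRoot: "/var/build/42")
print(a == b) // true
```

If only the relative part is ever baked into the compiled output, the two builds can produce identical, cacheable artifacts.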

Since SE-0271 has already been accepted and appears to be implemented (at least to some degree) based on a quick look at commit history, I suppose I've been replying to this thread under the assumption that the replacement for the general use cases would be available before (or at the same time as) this proposed change would be accepted and land.

@Aciid, with swift-5.2-branch having just been cut, is the implementation complete or is there still work to be done before the feature is ready?

AlexanderM · December 11, 2019, 2:10am

I'm not a fan of the primitive obsession here. Why "file_path (module_name)"? Why not a SwiftFileDescriptor struct, with fields like:

url: URL (relative to some portable base location),
name: String,
moduleName: String,
lineNumber: Int,

and perhaps a custom implementation of CustomStringConvertible that glues them together with spaces and parentheses and whatnot.

I say this because given the current design, I'm fairly certain that most call sites will be constantly calling components(separatedBy: " ").
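A minimal sketch of that idea, using the field names listed above. The type is hypothetical and not part of any proposal, and the exact `description` format is just one way of reproducing the "file_path (module_name)" shape.

```swift
import Foundation

/// Hypothetical type; the field names follow the list in this post.
struct SwiftFileDescriptor: CustomStringConvertible {
    var url: URL            // relative to some portable base location
    var name: String
    var moduleName: String
    var lineNumber: Int

    /// Glues the pieces together in the "file_path (module_name)" shape.
    var description: String { "\(url.relativePath) (\(moduleName))" }
}

let here = SwiftFileDescriptor(
    url: URL(string: "Sources/MyModule/Parser.swift")!,
    name: "Parser.swift",
    moduleName: "MyModule",
    lineNumber: 42
)
print(here)             // the glued-together string form
print(here.moduleName)  // structured access, no string splitting needed
```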

Aciid · December 11, 2019, 2:15am

I was hoping that it would be complete for 5.2 but that seems pretty unlikely at this point.

allevato · December 11, 2019, 2:57am

Ah, that's unfortunate!

While it's somewhat separate from the question of what #file or #filePath does, I can understand why some folks would be a bit reluctant if that feature isn't completely ready yet—even if we choose to keep #file's current behavior as #filePath, it would be nice if we could tell users to migrate their code only once from #file to the target resource bundling APIs for use cases where it's the best option, instead of migrating from #file to #filePath, and then again later from #filePath to the target resource bundling APIs once they're ready.

benrimmington · December 11, 2019, 6:28am

The Swift Programming Language book:

Inside a function, the value of #function is the name of that function, inside a method it is the name of that method, inside a property getter or setter it is the name of that property, inside special members like init or subscript it is the name of that keyword, and at the top level of a file it is the name of the current module.

// at the top level of a file
let moduleName: String = #function
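For comparison, a short sketch of the other contexts the book describes. The function and type names are made up for illustration, and it sticks to the cases whose output is easy to confirm.

```swift
func greet() -> String { #function }

struct Counter {
    // Inside a property getter, #function is the property's name.
    var label: String { #function }
    // Inside a method, it is the method's name including argument labels.
    func increment(by amount: Int) -> String { #function }
}

print(greet())                     // greet()
print(Counter().label)             // label
print(Counter().increment(by: 1))  // increment(by:)
```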

Karl · December 11, 2019, 3:31pm

Oh wow, I didn't know that. Still kinda feels like we shoe-horned it in, and I still think we should bundle all of these together in a #sourceLocation or #context expression which returns a type defined in the stdlib.

AlexanderM · May 15, 2020, 9:20pm

Can someone tip me off to how I add the -Xfrontend -enable-experimental-concise-pound-file flags to my Xcode configuration?

suyashsrijan · May 15, 2020, 9:39pm

You need to add it under "Other Swift Flags" in Build Settings.