Swift Windows working group

It was great to see the announcement of the Swift on Android working group. Is there any interest / activity in setting up something similar for Windows?

I would be open to setting up a Windows workgroup - though I don't personally see much activity outside of The Browser Company in terms of contributions to Windows from the rest of the community.

If there is interest in others contributing though, I'm more than happy to help facilitate that and help found that.

Doesn't Apple have a team in London working on Swift on Windows? Okay, that maybe does not count as "community" and I guess you're communicating anyway...

We, at Raycast, are expanding our usage of Swift on Windows and would therefore be interested in this working group. That said, our experience contributing to the compiler is currently nonexistent, but we would be open to allocating time for it.

That sounds great! What have you already shipped/are working on? What's the experience been like?

Great! What types of problems would you be looking to solve? There are a large number of ancillary projects, and a much broader vision of how the toolchain is meant to be consumed, but most of the work requires a fair amount of effort to come to light. If you are interested in helping solve those issues, that would be very welcome.

I’m very interested in being part of such a group. I’m partially retired, with 30+ years of Windows C++ development, and new to Swift. I have strong knowledge of Win32, COM, WinDbg, side-by-side (SxS) assemblies, and ETW.

I’m also a tenacious problem solver who loves collaborating with others. I spent the past 30 years as a developer of the JAWS Screen Reader for Windows.

The area I’m most interested in is ETW profiling of the various Swift tools to see whether there are performance gains to be had. I think this means making stack unwinding ETW-compatible, since without it, CPU profiling doesn’t properly attribute time to functions in Swift code.

I’m also interested in making Swift versions fit into the Windows SxS infrastructure, and in helping make the new Swift-Build work well on Windows.

Yes, I realize that none of this is simple, and that it’s likely to take lots of time. But I’m also interested in throwing myself into something meaningful and time is something I now have.

I have Swift building on Windows and have read as much as I can on the current state, but am at a bit of a loss in terms of how to approach any of the issues I mentioned above.

What I’d most like would be regular meet-ups/calls where issues can be discussed, and people like me can be more in the know about what’s being contemplated and changed on the Windows side, and of course, to be able to benefit from the deep knowledge of people like Saleem.

Welcome!

Just to make sure I read that correctly: you are interested in ETW profiling of the toolchain itself?

I think I can reasonably say the answer is “yes, there are performance gains to be had”. The problem is that I cannot say how to get those gains. ETW profiling is definitely something that would be needed. I don’t know if we really need to get into the actual tracepoints themselves; using WPA and the rest of WPT should be a good start IMO.

Well, your timing is rather impressive then. The SxS support is something that is currently in progress. There is a separate thread on the upcoming Windows SDK changes that I would suggest that you follow up on. The new experimental SDK enables the builds with SxS manifests embedded in the runtime. The work that currently remains is to:

  1. finish packaging the runtime
  2. ensure that the experimental SDK builds are not (unintentionally) breaking ABI - basically is the build correct
  3. ensure that there is no real performance regression being introduced
  4. ensure that the experimental SDK is complete and works for real world usage
  5. switch over to the experimental SDK
  6. drop the legacy SDK

Getting the toolchain to build on Windows is a good first step. I think that the next steps would be to address problems in the build. Right now, the only option for profiling is really WPT. In order to use WPA we would need good CodeView/PDB generation. Therein lies the difficulty: we have an increasing amount of Swift code in the toolchain, and the CodeView generation for Swift is not very good. As such, we would need to improve the quality of debug information that we emit so that we can get meaningful ETW traces from WPA. Fortunately, we use MSVC for the C/C++ code in the compiler so we can get good traces for that portion of the compiler.

I have been thinking a bit about this over the past few weeks and am trying to see if I can do something that would help facilitate this.

Yes, interested in doing CPU profiling of the tools themselves. I too was thinking that the codeview symbol generation for Swift code is the place to start. It’ll likely take a while to come up to speed with what’s currently there for Swift and then figure out what small steps I might be able to take to get started. I’m broadly familiar with .pdb/CodeView support in Clang and the documentation surrounding it. If you have any Swift specific pointers in this area to get me started I’ll be most appreciative. Otherwise, I’ll open a separate issue to discuss this further as I invariably will have questions.

Thanks for the warm welcome.

I wonder if it makes sense to be a bit more optimistic and simply start with what we have. Basically, let’s figure out whether we need many changes to build.ps1 to build the toolchain with PDBs that are good enough to get an initial trace. Perhaps there is enough low-hanging fruit to make progress without necessarily diving into the deep end of debugging (FWIW, if you want to work on that, I welcome that too! Debugging the toolchain on Windows has been particularly atrocious).

Well, the clang support for CodeView doesn’t play as much into this. We may want to perhaps generate some CodeView from clang for the decls that ClangImporter brings in and materialises, but, overall, I think that the only thing to be learnt from that is the desired metadata for proper DBI, TPI and IPI streams. Overall, the approach for the PDB was to model Swift as C++ as there is no proper Swift layout for PDBs. In the case of WPT, I would hazard a guess that we only really care about IPI and DBI, and don’t care too much about the TPI.

The other piece that will become more important is the pdata. We need to ensure that the LLVM PEI and pdata emission is as close to the ABI as possible. That is important because the unwinder needs to be able to perform the frame walking, which I believe we should already be pretty good at.

I had already built the toolchain with PDBs and had attempted what you just recommended. The problem is that I got an important step wrong. Although I was set up to process traces with the symbols from the toolchain I built, I mistakenly captured traces while using the installed compilers. When I got no samples in my trace, I jumped to the conclusion that something was wrong with the symbol generation rather than with me.

I reran my experiment using the tools I built, and lo and behold, the experiments yielded useful information. So far no smoking gun, but I’m very early in my exploratory phase, trying to better understand how all the pieces fit together.

One additional detail about me, I’m totally blind and for years was longing for the ability to analyze ETW data like my sighted counterparts using WPA and similar tools. And then I discovered the Microsoft.Windows.EventTracing.Processing.All NuGet package that allows slicing and dicing ETW data programmatically. This has allowed me to find all sorts of interesting performance issues in my day job. I’m hoping that once I know more about the lay of the land with Swift, I may be able to help here too.

Oh, that is amazing news! Did it require anything special? Was it as simple as “add -DebugInfo to the build.cmd invocation and run under WPA”?

The Windows toolchain really has gotten to the point where performance is an area of focus. If there are modifications to the build which would aid in profiling, I think that it makes sense to add an option to build.ps1 to build specifically for profiling through a switch. I want to make it simple to profile the toolchain to encourage others to also help identify any opportunities for improving the performance characteristics of the toolchain.

Nothing special at all other than

-DebugInfo -CDebugFormat codeview -SwiftDebugFormat codeview

I then started a CPU trace with

wpr -start cpu

Did a project build, and saved the trace with:

wpr -stop path-to-etl-file -skipPdbGen

The -skipPdbGen option avoids generating PDBs for any CLR apps that happen to be running while the trace is captured.

I think that the single biggest thing that could be done in terms of infrastructure to encourage others to start profiling is to have a public symbol server for symbols from The Browser Company’s nightly builds. It appears from your GitHub Actions build workflows that there’s currently a private symbol server, though I’m not sure whether you’re generating symbols.

From my experience, a symbol server is the most reliable way to get WPA and bespoke tools to find symbols. Second is for the PDBs to still be at the paths stored in their associated .exe/.dll files. Third is to put the path to a symbol file directory into _NT_SYMBOL_PATH. I’ve had mixed luck with WPA finding symbol files alongside their associated modules. It may work if they’re alongside the modules and there aren’t any PDBs at the paths in the debug section of the PE. What I’m fairly sure doesn’t work is having out-of-date symbol files at the path in the PE and the correct ones alongside the modules: these tools don’t seem to fall back gracefully when the PDBs at the PE-specified paths are the wrong ones.

A side benefit of profiling for those of us who are new to all of this is that it makes it very clear which processes are calling which others and which functions call others. I was easily able to see the sequence of steps involved in Swift importing a C++ library. The exact code path wasn’t obvious to me before.

Thanks very much for your encouragement.

I’m curious if you’ve made any further progress here. We are working on trying to figure out how to get the Azure Symbol Server opened up to the public. In the meantime, if there are findings that would be worth chasing down, it could be very interesting.

I wanted to wait to respond until I had found and resolved something significant, but since you asked …

The most significant thing I noticed is the amount of time spent in KernelBase.dll!GetFinalPathNameByHandleW.

When building swift-syntax using Ninja, this added up to a total of 50 seconds across all threads, spent either directly executing that function or waiting for something to complete. The numbers were similar when using “swift build”. These were runs in which the total time spent in GetFileInformationByHandle was about 3.5 seconds, and the time spent in CreateFileW was 16 seconds with Ninja and 30 seconds with swift build. This largely originates in llvm::vfs::FileSystem::getBufferForFile and, to a lesser degree, in llvm::sys::fs::getStatus.

I used Microsoft Detours to quickly detour CreateFile and GetFinalPathNameByHandle so that GetFinalPathNameByHandle simply returned the name passed in to CreateFile. Things got amazingly far along before they blew up (several swift-syntax targets built properly before builds started failing on module imports). My point in all of this is that it looks to me like the results of GetFinalPathNameByHandle are sometimes not used at all, or at least that proxies for what it returns might be constructed more efficiently.
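
The Detours experiment boils down to remembering, per handle, the name that was opened. Below is a minimal, portable C++ sketch of that bookkeeping, under stated assumptions: OpenedPathRegistry, recordOpen, lookup, and recordClose are hypothetical names, HANDLE values are modeled as integers, and the CloseHandle hook is an addition not mentioned above. The Detours wiring itself (DetourTransactionBegin, DetourAttach, DetourTransactionCommit) is omitted. As the experiment showed, the name given to CreateFileW is not a faithful substitute for the final path (it may be relative, differ in case, or traverse a symlink), so this is only a way to measure how much time is at stake.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical bookkeeping for the Detours experiment: a CreateFileW hook
// records the path it was asked to open, and a GetFinalPathNameByHandleW hook
// answers from this map instead of asking the file system. HANDLE values are
// modeled as plain integers so the sketch builds on any platform.
class OpenedPathRegistry {
public:
  using Handle = std::uintptr_t;

  // Called from the CreateFileW hook after a successful open.
  void recordOpen(Handle handle, std::wstring path) {
    assert(!path.empty() && "a successful open always has a file name");
    std::lock_guard<std::mutex> lock(mutex_);
    paths_[handle] = std::move(path);
  }

  // Called from the GetFinalPathNameByHandleW hook. std::nullopt means
  // "unknown handle": the hook should fall back to the real API.
  std::optional<std::wstring> lookup(Handle handle) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = paths_.find(handle);
    if (it == paths_.end())
      return std::nullopt;
    return it->second;
  }

  // Called from a (hypothetical) CloseHandle hook so that a recycled handle
  // value is never mapped to a stale path.
  void recordClose(Handle handle) {
    std::lock_guard<std::mutex> lock(mutex_);
    paths_.erase(handle);
  }

private:
  mutable std::mutex mutex_;
  std::unordered_map<Handle, std::wstring> paths_;
};
```

The mutex matters because the compiler and build tools open files from many threads at once.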

Other things of less significance, though interesting: llvm::sys::fs::createUniquePath repeatedly calls llvm::sys::Process::GetRandomNumber, which calls CryptAcquireContextW for every number generated, and CryptAcquireContextW is relatively slow. As a result, in a swift build of swift-syntax we spend a total of 10 seconds in CryptAcquireContextW and only about 0.5 seconds in CryptGenRandom. Caching the acquired crypto context in a static in GetRandomNumber gets the whole random number generation down to around 0.5 seconds, though there isn’t a user-perceivable difference, probably because these calls are amortized across threads.
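
Caching the context is a one-time-initialization pattern. The sketch below shows it portably: since the Windows calls cannot run everywhere, std::random_device stands in for CryptAcquireContextW/CryptGenRandom, and CryptoContext, getRandomNumberUncached, and getRandomNumberCached are hypothetical names, not LLVM’s actual code. In a real change the static would hold the HCRYPTPROV, and it would be worth confirming that sharing it across threads is safe.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <random>

// Counts how many times the "expensive" acquisition runs.
std::atomic<int> g_acquireCalls{0};

// Stand-in for the HCRYPTPROV returned by CryptAcquireContextW. Constructing
// it models the slow acquisition; next() models the cheap CryptGenRandom call.
class CryptoContext {
public:
  CryptoContext() { ++g_acquireCalls; }
  unsigned next() { return device_(); }

private:
  std::random_device device_;
};

// Pattern seen in the trace: acquire a fresh context for every number.
unsigned getRandomNumberUncached() {
  CryptoContext context;
  return context.next();
}

// Cached pattern: a function-local static is constructed once, on first use,
// and C++11 guarantees that this initialization is thread-safe. The mutex
// serializes next() because std::random_device makes no promise about
// concurrent calls on a shared instance.
unsigned getRandomNumberCached() {
  static CryptoContext context;
  static std::mutex mutex;
  std::lock_guard<std::mutex> lock(mutex);
  return context.next();
}
```

Calling getRandomNumberUncached() 100 times acquires 100 contexts; calling getRandomNumberCached() 100 times acquires one.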

FoundationEssentials::fileExists can be made more efficient by using GetFileAttributes instead of CreateFile and GetFileInformationByHandle, but this only accounts for about 7 seconds across all threads of the swift-syntax build.
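
A sketch of the cheaper existence check, written in C++ against the Win32 API rather than in Swift (FoundationEssentials’ actual implementation is not reproduced; fileExists below is a hypothetical free function). On non-Windows hosts it falls back to stat() so it can be compiled and tested anywhere. One behavioral difference to keep in mind: GetFileAttributesW reports on a symbolic link itself rather than its target, whereas CreateFileW follows the link by default, so a dangling link would count as existing.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/stat.h>
#endif

// One attribute query instead of CreateFileW + GetFileInformationByHandle +
// CloseHandle. Optionally reports whether the path is a directory.
bool fileExists(const std::string &path, bool *isDirectory = nullptr) {
#ifdef _WIN32
  // Widens byte-by-byte, which is only correct for ASCII paths; real code
  // would convert UTF-8 to UTF-16 (e.g. with MultiByteToWideChar).
  std::wstring wide(path.begin(), path.end());
  DWORD attrs = GetFileAttributesW(wide.c_str());
  if (attrs == INVALID_FILE_ATTRIBUTES)
    return false;
  if (isDirectory)
    *isDirectory = (attrs & FILE_ATTRIBUTE_DIRECTORY) != 0;
  return true;
#else
  // Portable fallback so the sketch can be exercised off Windows.
  struct stat st;
  if (stat(path.c_str(), &st) != 0)
    return false;
  if (isDirectory)
    *isDirectory = S_ISDIR(st.st_mode);
  return true;
#endif
}
```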

The Visual Studio COM API for locating Visual Studio is slow compared with getting the information from the environment, as is the case when running from a VS developer prompt. This time is amortized across multiple processes and threads, so I didn’t notice a speed improvement compared with runs where the info was already in the environment.
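
A minimal sketch of the environment-first lookup, assuming that the VSINSTALLDIR variable set by a VS developer prompt is a sufficient signal; which variables the toolchain actually consults is not covered in this thread. findVisualStudio is a hypothetical helper, and the COM query is passed in as a callback rather than implemented.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>

// Prefer what a VS developer prompt already exported; only pay for the COM
// query when the environment does not say where Visual Studio lives.
std::optional<std::string> findVisualStudio(
    const std::function<std::optional<std::string>()> &queryViaCOM) {
  if (const char *dir = std::getenv("VSINSTALLDIR"); dir && *dir)
    return std::string(dir);
  return queryViaCOM(); // slow path
}
```

Since every compiler process would still hit the slow path outside a developer prompt, caching across processes (for example, the build exporting the variable to its children) is where a real saving would come from.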

I also suspect that some improvement could be made by reworking the exponential backoff code in favor of some sort of eventing to find out when files become available. I even saw comments in the code to this effect, so I’m clearly not treading any new ground with that observation. Every now and then I get dramatically better numbers with a swift-syntax build, which makes me think that in those cases the waits were satisfied in a luckier way.
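
The event-driven alternative can be sketched in-process with a condition variable: the producer of an artifact announces it, and waiters wake immediately instead of sleeping through growing backoff intervals. ReadyNotifier and its methods are hypothetical, and the backoff code referred to above is not reproduced. Because module builds run in separate processes, a real fix would need a cross-process primitive, for example a named event (CreateEventW/SetEvent) or directory change notifications (ReadDirectoryChangesW); this sketch only illustrates the wait/notify shape.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>

// Waiters block until a producer marks a path ready, instead of polling the
// file system with exponentially growing sleeps.
class ReadyNotifier {
public:
  void markReady(const std::string &path) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ready_.insert(path);
    }
    cv_.notify_all(); // wake every waiter; each re-checks its own path
  }

  // Returns true as soon as `path` is ready, or false if `timeout` elapses.
  bool waitFor(const std::string &path, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout,
                        [&] { return ready_.count(path) != 0; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::set<std::string> ready_;
};
```

The predicate form of wait_for handles spurious wakeups and returns immediately if the path was already marked ready before the wait began.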

Which repos should I be forking if I ultimately plan to offer up pull requests? Yours, TheBrowserCompany, …?

Any guidance and advice very welcome.

Thank you for the detailed write up! It will take me some time to really internalize what we can do for some of these, but I suspect that you will be much further along by then.

The PRs really depend on the repo. For the most part, PRs against the swiftlang repos would be best, though the llvm/clang ones really should go against the llvm project.

One note about the VS installation: that is a major convenience, as it avoids the need for the VS Dev Command Prompt environment to be set up. I wonder if we can cache that somehow.

Is this for the module builds? If so, I suspect that the answer is the explicit module builds. That requires the early swift driver to be enabled and there are a few issues to resolve in the modulemap injection.

I know that this does actually make a difference (at least anecdotally). You should be able to try it out with the pending change (swiftlang/swift Pull Request #76574, “utils: enable early swift driver on Windows” by compnerd). The recent changes to the build are part of the effort to make the static standard library usable, which is needed to enable the early swift-driver on Windows.

Sadly, I do think that that only means that it will make this even more challenging.

My tests have all been with my locally built toolchain and libraries. Using those, I’ve then independently built swift-syntax both with swift-pm and CMake.

I thought that the early Swift driver was exclusively the domain of building the toolchain and supporting libraries. Am I thinking about things all wrong?

The early swift-driver is used to build the toolchain. However, when using the toolchain (after the result of build.ps1), all users will be using the “late” swift-driver. The early swift-driver is the same as the “late” swift-driver, with the minor difference that it is statically linked to the Swift runtime. During the build of the toolchain, we will rebuild swift-driver and that is what is shipped to the user.

So, you are correct that the early Swift driver is exclusively for building the toolchain, but Swift driver itself is what everyone uses.