LLVM monorepo transition

The LLVM project is moving to a “monorepo” at https://github.com/llvm/llvm-project (more background here). llvm, clang, clang-tools-extra, compiler-rt, and libcxx will be in the same Git repository. It's scheduled to become the canonical repository, replacing Subversion, at the next LLVM developers' meeting in October or November of 2019.

How does the monorepo transition impact Swift?

The Swift compiler builds against LLVM project sources hosted at github.com/apple, with histories in swift-clang, swift-llvm, swift-compiler-rt, swift-clang-tools-extra, and swift-libcxx based on git-svn mirrors of the Subversion repository.

  • These sources need to be rebased on top of the canonical LLVM project monorepo.
  • The Swift compiler and open source toolchain need to build against this new repository.

We're working on it.

Duncan (dexonsmith) and I (Alex_L) are working with Mishal (mishal_shah) on a full transition plan for the impacted repos on github.com/apple.

The high-level goal is to rebase swift-llvm, swift-clang, swift-clang-tools-extra, swift-compiler-rt, and swift-libcxx, merging their histories into a new repository downstream of the LLVM monorepo, and to change swift's update-checkout script to point at this new repository. The old repositories will be archived with their histories intact.

We'll follow up with more details in a few weeks.


Do you mean "rebase" in the git sense of the word? Or something else?

As with many SCM problems and git, there tend to be multiple solutions with different tradeoffs. What git approaches were ruled out and why?

EDIT – One more question: given that LLVM went with a "monorepo", was a downstream/derivative "Swift monorepo" considered? (Again, like many SCM problems, this is just a tradeoff with pros and cons.)


We will not be doing a "git rebase". In our "rebase", we plan to reconstruct the commit history by zippering the downstream split histories into one downstream monorepo. This preserves the existing split merge history from upstream to downstream, while reparenting those merges on top of the appropriate upstream monorepo commits. This should give us a one-to-one mapping from an existing split github.com/apple/swift-{llvm/clang/..} commit to a new monorepo commit, and vice versa. I think @dexonsmith should be able to give you a more detailed answer about the tradeoffs and the approaches that we are considering.
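To make the "zippering" idea more concrete, here is a toy Python model. It is a sketch under strong simplifying assumptions, not the tool being built: each split repository is reduced to its first-parent chain of downstream commits, a commit may record the upstream split-mirror commit it merged, commits are interleaved by timestamp, and trees are not modeled, so the new "hash" is just a SHA-1 over the old hash and the new parents. The names SplitCommit, MonoCommit, zipper, and upstream_map are all hypothetical.

```python
import hashlib
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SplitCommit:
    """A downstream commit in one split repository (e.g. swift-clang)."""
    sha: str
    timestamp: int
    subproject: str
    # Upstream split-mirror commit merged by this commit, if it is a merge.
    upstream_parent: Optional[str] = None


@dataclass
class MonoCommit:
    """A reconstructed commit in the downstream monorepo."""
    sha: str
    parents: List[str]
    source: str  # split hash this commit was reconstructed from
    subproject: str


def zipper(histories: Dict[str, List[SplitCommit]],
           upstream_map: Dict[str, str]) -> Dict[str, MonoCommit]:
    """Interleave per-subproject histories into one downstream history.

    histories: subproject -> commits in first-parent order, oldest first.
    upstream_map: upstream split-mirror hash -> upstream monorepo hash.
    Returns a one-to-one mapping from old split hash to new commit.
    """
    # Merge the already-sorted lists by timestamp; ties keep dict order.
    merged = heapq.merge(*histories.values(), key=lambda c: c.timestamp)
    result: Dict[str, MonoCommit] = {}
    prev: Optional[str] = None
    for c in merged:
        parents = [prev] if prev else []
        if c.upstream_parent is not None:
            # Reparent the upstream side of the merge onto the monorepo commit.
            parents.append(upstream_map[c.upstream_parent])
        # Stand-in for a real Git object hash: source hash plus new parents.
        digest = hashlib.sha1((c.sha + "".join(parents)).encode()).hexdigest()
        result[c.sha] = MonoCommit(digest, parents, c.sha, c.subproject)
        prev = digest
    return result
```

Because every split commit yields exactly one new commit, inverting the returned dictionary (`{m.sha: old for old, m in result.items()}`) gives the reverse mapping as well.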

The "Swift monorepo" is certainly interesting, but we haven't considered it in our plans, as we think that it should be a separate topic of discussion. This separation of concerns should also allow us to transition to the LLVM monorepo without affecting the majority of Swift developers and their existing workflows.

Interesting. I don't know if you're aware of this: the LLVM monorepo made some opportunistic cleanups relative to the "truth" that was in the SVN repository (for example, removing build results that were accidentally committed). How will the LLVM cleanups be reconciled in this transition? Will any additional cleanup be done as part of this transition? (Removing no-op commits, parent simplification, etc.)
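As an illustration of what "removing no-op commits" could mean in practice, here is a minimal sketch that flags commits whose tree is identical to their first parent's tree. It assumes a no-op commit is defined that way (an assumption, not a stated policy of the transition) and reads the real `git log --format='%H %T %P'` output (commit, tree, parent hashes). The function names are hypothetical.

```python
import subprocess
from typing import Dict, List


def find_noop_commits(log_lines: List[str]) -> List[str]:
    """Return commits whose tree equals their first parent's tree.

    Each line is "<commit> <tree> [<parent> ...]". Note that merges which
    brought in no changes are also reported; whether to drop those is a
    separate policy question, since it would discard merge history.
    """
    tree_of: Dict[str, str] = {}
    first_parent: Dict[str, str] = {}
    for line in log_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        commit, tree, parents = fields[0], fields[1], fields[2:]
        tree_of[commit] = tree
        if parents:
            first_parent[commit] = parents[0]
    return [c for c, p in first_parent.items()
            if p in tree_of and tree_of[p] == tree_of[c]]


def noop_commits_in(repo: str = ".") -> List[str]:
    """Run git log over all refs in `repo` and report candidate no-op commits."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--all", "--format=%H %T %P"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_noop_commits(out.splitlines())
```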

Separably, will a file be created in the repository that helps people map old git hashes to new git hashes? For example, if somebody mentions a hash in a bug database, it would be nice if they could just grep a file to get the post-monorepo hash.


I was really hoping that the patchset was going to be rebased. Since history has to be rewritten for this to work anyway, a true rebase would have a benefit: it would make it obvious what the patchset currently looks like, and it would make it easier to integrate the changes into the upstream repository. I realize that this opportunistic benefit would come at a great cost: most of the patches have been smeared across years of development, so rebasing them is not particularly straightforward (which, I believe, also makes it harder for someone else to merge the changes upstream).

Separably, will a file be created in the repository that helps people map old git hashes to new git hashes? For example, if somebody mentions a hash in a bug database, it would be nice if they could just grep a file to get the post-monorepo hash.

That's a good point; a file is a great idea. There are also other strategies to help with this, such as annotating the commit message with the pre-monorepo hash. This approach takes advantage of the fact that we'd already be rewriting history, and it would help when navigating history.


Yes, we are certainly aware of the cleanups made in the upstream monorepo. The upstream cleanups will be propagated to the new downstream monorepo. I believe @dexonsmith was looking into whether we could perform additional cleanups as well, so he might be able to provide more insight into what the options are.

At the moment we're planning to store the mapping from the old git hashes to new ones using the idea that @kocsenc suggested, i.e. by annotating the commit messages. We are also planning to provide a tool that will allow you to perform this conversion easily, without the need to dig it out of the commit history manually.
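Here is a minimal sketch of what such a lookup tool might do. It assumes the pre-monorepo hash is recorded as a commit-message trailer line of the form `apple-llvm-split-commit: <hash>`; that trailer name and the function names are hypothetical, since the actual annotation format has not been announced. The `%H`, `%B`, and `%xNN` placeholders are standard `git log` format codes. The same mapping can also be written to a plain text file, which covers the grep-able file suggested earlier in the thread.

```python
import re
import subprocess
from typing import Dict, Optional

# Hypothetical trailer key; the real annotation format is not yet defined.
TRAILER = "apple-llvm-split-commit"
TRAILER_RE = re.compile(rf"^{TRAILER}:\s*([0-9a-f]{{7,40}})\s*$", re.MULTILINE)

# %x1f / %x1e emit ASCII unit/record separators so multi-line bodies parse safely.
LOG_FORMAT = "%H%x1f%B%x1e"


def parse_log(raw: str) -> Dict[str, str]:
    """Map old split hashes to new monorepo hashes from `git log` output."""
    mapping: Dict[str, str] = {}
    for record in raw.split("\x1e"):
        record = record.strip("\n")
        if not record:
            continue
        new_sha, _, body = record.partition("\x1f")
        for old_sha in TRAILER_RE.findall(body):
            mapping[old_sha] = new_sha
    return mapping


def load_mapping(repo: str = ".") -> Dict[str, str]:
    """Run git log over all refs in `repo` and build the old -> new map."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--all", f"--format={LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)


def lookup(mapping: Dict[str, str], old: str) -> Optional[str]:
    """Resolve a possibly abbreviated old hash; None if missing or ambiguous."""
    hits = {new for full_old, new in mapping.items() if full_old.startswith(old)}
    return hits.pop() if len(hits) == 1 else None


def write_map_file(mapping: Dict[str, str], path: str) -> None:
    """Write "old new" pairs, one per line, so the file can be grepped."""
    with open(path, "w") as f:
        for old, new in sorted(mapping.items()):
            f.write(f"{old} {new}\n")
```

For example, `lookup(load_mapping("llvm-project"), "abc1234")` would return the new hash for an abbreviated old hash quoted in a bug report, assuming commits carry the hypothetical trailer.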

I know I implied this question earlier, but please let me be explicit: why was history rewriting/cleanup chosen over, say, git subtree merges, which don’t rewrite history or invalidate existing hashes? (I can see good arguments either way.)