I would like to make a change in the way we handle the master-next branch.
Summary: I’d like to switch to a model where we continuously test against the latest upstream LLVM changes. The goal is to simplify the process and make it easier to collaborate on maintaining master-next.
Background: We develop Swift against “stable” branches of LLVM (which I am using here to refer to the llvm, clang, and compiler-rt repositories) that are typically rebranched from trunk once for each release, with other commits individually cherry-picked for specific bug fixes and other changes. This insulates Swift development from the churn of changes in LLVM. At the same time, we maintain the “master-next” branches of Swift repos to keep up to date with trunk LLVM. For Swift, our “trunk” comes from the “upstream-with-swift” branches in our GitHub LLVM repos. We have existing automation to continuously merge changes from llvm.org into those upstream-with-swift branches.
We currently use a manual process to update master-next. Someone on the Swift team is designated as the "merge czar" and is responsible for this. This merge typically happens once every few weeks. Michael Gottesman developed some internal tools to help automate the process, but someone still needs to drive those tools manually. The process involves merging “master” to “master-next” for all the Swift repos and updating the “stable-next” branches of the GitHub LLVM repos for Swift. The “stable-next” branches are basically snapshots of the LLVM upstream-with-swift branches at the point where master-next was most recently merged.
Swift CI includes a set of Jenkins bots to test master-next building with the stable-next branches of LLVM (https://ci.swift.org/view/swift-master-next). The merge czar can use these bots to confirm that everything is working after a merge.
Reasons to change: The current process has the advantage that the merge czar can choose when to do a merge and can schedule that around other work, but it has some significant problems.
- It is difficult for multiple people to collaborate on updating master-next. The changes involved are often rev-locked between Swift and the LLVM repos, so there is no good way for someone to fix a problem without doing the whole merge process.
- The current system is hard to understand. I’ve been serving as the merge czar for the last few months, and it took me a while to figure out how to do it well.
- It requires extra “stable-next” branches in our GitHub LLVM repos, further adding to the complexity.
- The tools we have to help automate the process are currently internal to Apple and require ongoing maintenance. They could be cleaned up for public release, but that would take more work.
Proposal: We already have Jenkins bots testing master-next. I would like to add a job to continuously merge master to master-next and change the existing bots to build against the “upstream-with-swift” branches in our GitHub LLVM repos. The bots would then detect any new problems soon after they are introduced. Anyone could fix those problems, whether they are merge conflicts, build failures, or test issues. A partial fix could be applied directly without needing to resolve all of the outstanding issues.
This would avoid the need for our current internal merging tools. We already have automatic merging bots, so adding another one would not be difficult.
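To make the proposed merge job concrete, here is a minimal sketch of what one pass might look like. It is an illustration under assumptions, not a description of the existing automerger: the function names, the `runner` callback, and the exact sequence of git commands are all hypothetical, and the repository name is passed in rather than taken from any real bot configuration.

```python
from typing import Callable, List, Optional


def plan_automerge(repo: str,
                   source: str = "master",
                   target: str = "master-next") -> List[List[str]]:
    """Return the git commands one merge pass would run in a clone at `repo`.

    Hypothetical sketch: the real job may use different steps or options.
    """
    return [
        ["git", "-C", repo, "fetch", "origin"],
        ["git", "-C", repo, "checkout", target],
        # Bring the local target branch up to date before merging.
        ["git", "-C", repo, "merge", "--ff-only", f"origin/{target}"],
        # The actual master -> master-next merge.
        ["git", "-C", repo, "merge", "--no-edit", f"origin/{source}"],
        ["git", "-C", repo, "push", "origin", target],
    ]


def run_automerge(repo: str,
                  runner: Callable[[List[str]], bool]) -> Optional[List[str]]:
    """Run the plan, stopping at the first failing command.

    `runner` executes one command and reports success. Returns the command
    that failed (for example, a merge conflict), or None if all succeeded.
    """
    for cmd in plan_automerge(repo):
        if not runner(cmd):
            return cmd
    return None
```

A real job could pass something like `lambda cmd: subprocess.run(cmd).returncode == 0` as the runner. When the merge step fails, nothing is pushed, and the conflict is left for whoever picks it up to resolve directly on master-next.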
The biggest advantage is that it provides a straightforward model that anyone can understand: master-next becomes just another branch that anyone can modify, build and test in the usual way. Collaboration is no more difficult than for other branches.
The cost of this simpler approach is that we would need to be willing to let the master-next branch break occasionally. An LLVM change might break things in a way that takes some time to fix, and the master-next bots would continue to fail during that time. Someone might want to apply a partial fix that does not resolve all the issues, and we would want to allow that even if the bots still fail. That would mean we would have to relax (or override) the requirement for PR testing for commits in that kind of situation. In the worst case, if new problems are introduced more quickly than we can fix them, this approach could fall apart. My experience as merge czar over the last few months suggests that is unlikely. Usually there are no more than a few problems per week and most of them are easy to fix.
I propose to roll this out in steps. First, we can add a new Jenkins bot that tests master-next building against upstream-with-swift. If that goes well, and if there are no objections to this proposal, we can add the automerger to merge master into master-next. At the same time, we would update the other master-next bots to use upstream-with-swift instead of stable-next for the LLVM repos.
Alternative: We can achieve some of the same goals at a considerable increase in complexity by introducing an automatic gated merge solution. We would have automation perform the merge and commit it as long as everything works. If there were any problems, the automation would create a pull request that would need to be manually updated to resolve the problems. People could still collaborate by working together on the pull request branch. Until the problems were resolved, no further merging would take place. I would like to try the simple approach before considering this more complex solution, since I don’t think it will be necessary, at least in the near future.
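For comparison, the decision logic of the gated alternative could be sketched roughly as follows. This is only an illustration of the behavior described above; the type and function names are hypothetical, and a real implementation would also need to create and track the pull request.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    COMMIT = "commit the merge"
    OPEN_PR = "open a pull request for manual fixes"
    WAIT = "wait for the open pull request to be resolved"


@dataclass
class MergeAttempt:
    merged_cleanly: bool  # no merge conflicts
    build_ok: bool        # merged tree builds
    tests_ok: bool        # merged tree passes tests


def next_action(attempt: MergeAttempt, pr_open: bool) -> Action:
    """Decide what the gated automerger does on one pass (hypothetical sketch)."""
    # No further merging happens until outstanding problems are resolved.
    if pr_open:
        return Action.WAIT
    if attempt.merged_cleanly and attempt.build_ok and attempt.tests_ok:
        return Action.COMMIT
    return Action.OPEN_PR
```

The extra state (whether a pull request is open, and who updates it) is the added complexity that the simpler always-merge approach avoids.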
Any objections to this? Comments or suggestions?