Tuesday, September 16, 2008

What is cascading rebase?

The problem of cascading rebase is a serious one, and should not be taken lightly. Though the idea of cleaning history may be appealing, it is almost always inappropriate.

This problem is what makes all documenters of rebase functionality implore you to avoid rebasing public repositories. However, the nature of the problem is difficult to explain to those without a good understanding of the DAG-based branching and merging employed by most modern distributed VCSes, including Bazaar, Mercurial, and Git.

What starts simple…

Imagine a larger version of this very abbreviated history. Bob has started a project and commits new revisions to the mainline, and collects contributions by merging from others' branches.

Here we see Bob's mainline in red.

Topic branching

Alice has branched mainline-3 to create a feature, and made a few commits. The idea is that Bob will merge the feature branch, in green, back into mainline. Work on mainline can continue, as seen by additional mainline revisions.

Uh oh…

Bob realizes that mainline-2 was done wrong. He decides the problem is serious enough that it has to be redone.

This is not the only kind of rebasing; rebasing is a subsequence replacement operation, so it can be used to insert revs in history, delete revs from history, or insert and replace any number of revs at any point. In this case, we're replacing mainline-2 with mainline-2'.

Here is the completed rebase operation. The dead revisions are shown in light red.

Rebasing is really branching

Every revision in a proper DAG history contains an immutable reference to its parents. So mainline-3 has a reference to mainline-2. But we rewrote mainline-2, so we must rebuild mainline-3 so it refers to the new mainline-2 instead, 4 to 3, 5 to 4, and so on.

As such, rebasing doesn't really "rewrite history"; it finds the latest revision that doesn't have to be rewritten, branches off it, builds the new history onto that branch, and finally replaces the current branch with the new branch. You can see this clearly in an alternate, but equivalent visualization of the previous graph.

Why is the feature branch still connected to the dead revisions?

The rules don't just apply to mainline; feature-4 also has a reference to mainline-3. This is an important integrity feature—as a brancher working on a private feature, Alice wouldn't want Bob to be able to spontaneously rewrite her branch's source by altering revisions made before she branched. Therefore, those "dead" revisions still live as part of the feature branch's history.

I show them above the feature branch here to show how they have become part of Alice's history. They will also be shared with any other branches that were made off mainline after mainline-2 and before the rebase. Please note that the history of Alice's branch remains exactly the same as it was before the rebase.

What happens with a merge?

Feature branches live to die; they ought to be merged back into the mainline eventually, when the feature is ready to be part of it. So Alice would like Bob to merge her feature branch into mainline.

Except the idea of mainline has changed due to the rebase. Here is a symmetric merge diagram to illustrate.

One step in the cascade occurs

Alice by necessity includes the broken changes from the old mainline-2 and the mostly duplicate mainline-3. The DAG sees these as separate from mainline-2' and mainline-3', as they are. So the merge is wrong.

To fix this, Alice must produce a new branch and rewrite her changes onto it. We can do this with rebase, but it requires Alice to know which revisions are duplicates. Here, Alice must know that mainline-3' is the new basis. This seems simple, but imagine if mainline-2 had been simply deleted, or some new revisions had been inserted. Then the revisions would be numbered differently; in that case she would have to rebase from mainline-3 to mainline-2'.

Here is the result of her rebase, and the correct merge.

Now the cascade happens: if any topic branches were made from the "dead" feature-4, feature-5, and feature-6, they must also be rebased onto the new feature-4', 5', or 6', as appropriate. And so on. And so on. Hence my name for this issue, which I don't believe has been adopted by anyone but appropriately illustrates its seriousness, cascading rebase.

Cascading rebase cannot be solved automatically; all rebase tools recognize this and have some interactive conflict resolution features. Furthermore, there is no guarantee that when a rebase goes smoothly on the original branch, it will also go so smoothly on the cascade. It also gets wildly complicated to calculate the cascade when you include behavior like synchronization by merging and especially cross-merging.

Before you consider rebasing, consider other tools for fixing problems. If you made a mistake in an old revision, and it is serious enough that all branches since should receive the fix as well, a good alternative is to branch off the broken revision, write the fix, commit to the new branch, merge it into mainline, publish the revision, and ask others to merge it in. Even if they don't, the common history will mean that merging won't see a parallel "fixed" revision, making the merge cleaner and less likely to conflict.

You don't even have to create a new URL in many cases. In both Bazaar and Mercurial, the commit to the side branch will receive a revision number in your mainline. By merging the mainline at this revision number, branches will receive only that revision, without being forced to merge the head of mainline.

No comments: