Rebase Considered Essential
Published by marco on
Fossil is a distributed Source Control Manager that claims to offer the same power without the complexity of Git. The article Fossil: Rebase Considered Harmful by D. Richard Hipp (Fossil SCM) is part of the documentation for the tool.
One of the main selling points of Fossil is that it does not support rebase. In the article, the author lays out the many ways in which rebasing causes no end of woes for developers using Git.
I’d heard of Fossil before and I’d even skimmed this document before. This time around, though, I read it through to learn the author’s reasoning. My short take is: I do not want to use an SCM that does not allow rebase. I think a project benefits greatly in clarity if a developer is able to alter the local history before cementing commits into an unalterable history (i.e. pushing to the server).
Terminology and Concepts
The following definitions are not complete, but are sufficient for the ensuing discussion.
- A repository describes the history of a set of data
- A commit includes instructions for how to change the state of the repository
- A branch points to a commit, but generally refers to a set of commits; a repository may contain multiple, independent branches
- A merge operation integrates two branches with a merge commit that describes the delta; it retains all commits from both branches
- A rebase operation integrates two branches by re-applying each commit since the branches diverged from one branch to the other, possibly changing the original commits; it replaces the commits from one of the branches with new commits
A rebase is considered a destructive operation because it discards part of the history of the repository by rewriting commits.
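To make the difference between the two operations concrete, here is a minimal sketch in Python. It is a toy model, not how Git or Fossil actually store data: the `Commit` class and the `history`, `merge` and `rebase` helpers are names invented for this illustration, and a commit is reduced to a message plus its parents. The only property borrowed from real tools is that a commit's id depends on its parents.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Toy commit: a message plus parent commits (hypothetical model)."""
    message: str
    parents: tuple = ()

    @property
    def id(self) -> str:
        # The id covers the message and the parent ids, so the same
        # change applied on top of a different parent gets a new id.
        payload = self.message + "|" + ",".join(p.id for p in self.parents)
        return hashlib.sha1(payload.encode()).hexdigest()


def history(tip):
    """Return the ids of every commit reachable from tip."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit.id not in seen:
            seen.add(commit.id)
            stack.extend(commit.parents)
    return seen


def merge(ours, theirs):
    """Integrate two branches with one new commit that has two parents."""
    return Commit("Merge", (ours, theirs))


def rebase(commits, onto):
    """Re-apply commits (oldest first) on top of onto as new commits."""
    tip = onto
    for commit in commits:
        tip = Commit(commit.message, (tip,))
    return tip


base = Commit("initial")
main = Commit("main: update docs", (base,))
f1 = Commit("feature: add parser", (base,))
f2 = Commit("feature: add tests", (f1,))

merged = merge(main, f2)
rebased = rebase([f1, f2], main)

# The merge keeps both original feature commits in the history.
print({f1.id, f2.id} <= history(merged))
# The rebase replaces them: neither original id is reachable any more.
print(history(rebased).isdisjoint({f1.id, f2.id}))
```

Both checks print `True`. The merged history contains five commits (including the merge commit), while the rebased history contains four, two of which are new copies of the feature commits.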
If I think about it, though, many of the operations I’m accustomed to making are destructive:
- Editing the commit message
- Amending a commit with more changes
- Squashing commits
- Re-ordering commits
All of these operations are considered destructive because they modify the “true” history of the repository. But what do we mean by “true” history? Where does the story start?
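The reason these operations count as rewriting history can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Git's real object format: `commit_id` and `build` are hypothetical helpers, and a commit is just a message, a list of change descriptions and the id of its parent.

```python
import hashlib


def commit_id(message, changes, parent_id):
    # Toy stand-in for a commit hash: it covers the message,
    # the changes and the id of the parent commit.
    payload = repr((message, tuple(changes), parent_id))
    return hashlib.sha1(payload.encode()).hexdigest()


def build(commits):
    """Chain (message, changes) pairs, oldest first, and return their ids."""
    ids, parent = [], None
    for message, changes in commits:
        parent = commit_id(message, changes, parent)
        ids.append(parent)
    return ids


original = [
    ("Add parser", ["parser.py: add Parser class"]),
    ("WIP", ["parser.py: fix typo"]),
    ("Add parser tests", ["test_parser.py: add tests"]),
]

# Edit the message of the last commit: the earlier commits are untouched.
reworded = original[:2] + [("Add tests for the parser", original[2][1])]

# Squash the WIP commit into the first one.
squashed = [
    ("Add parser", original[0][1] + original[1][1]),
    original[2],
]

# Re-order the last two commits.
reordered = [original[0], original[2], original[1]]

old = build(original)
# Rewording the last commit leaves the first two ids as they were.
print(build(reworded)[:2] == old[:2])
# After the squash, no id from the original history survives.
print(set(build(squashed)).isdisjoint(old))
# After re-ordering, only the first commit keeps its id.
print(build(reordered)[0] == old[0])
```

In this model every edit produces new ids from the edited commit onward, which is why such edits are only safe on commits that have not yet been shared.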
The changes outlined above are not for sharing. It’s not interesting to the final reader that I had to backspace through and re-spell the word “outlined” in the previous sentence. It might be interesting to see different drafts, though, to see how I arrived at the final version. But those changes are at a different level of granularity.
Who decides where one level of granularity stops and the next begins? I think it’s the author of the commits. My workflow over the last ten years is based heavily on being able to massage commits so that I can prepare what I share to the server repository, where it can no longer be changed. I agree that there should be an unalterable history, but disagree with the author on where that history begins.
I agree with the author that developers should not work in silos, massaging their code until it is perfect, pushing only once there are no more errors and no-one could possibly take issue with anything in the feature. At this point, the author claims that many developers squash all of their local commits into a single so-called hairball commit that makes it look like the code sprang from the forehead of the developer as Eve sprang from Adam’s rib: whole and without blemish.
Hairball commits are acknowledged as bad, so attacking them as the prime reason to eliminate the tool that allows them seems to be more of a straw man.
Preventing developers from making any changes to local commits is not the way to solve the problem, though. While Fossil does not allow discarding any single commit from the history, the author acknowledges that Fossil allows developers to apply addenda that the common Fossil tools will show while hiding the original commits.
I see the author’s point—that (potentially) important parts of a history are retained whether the developer wants them or not. That is, it is not up to the developer to decide, but up to the archeologist examining the commits later. This is an interesting idea, but the argument is ultimately not convincing.
Let’s suppose a developer uses an SCM without rebase. Either there will be many commits in the history that, contrary to the author’s claims, do not provide any clarity because they are garbage commits (e.g. WIP and other sorts of investigatory commits that were quickly reverted or undone), or the developer will be terrified of making a commit before it’s ready and will run the risk of losing work or working less efficiently.
Developers will not magically become ego-less and kowtow to the machine. Instead, they will pick up bad habits that are worse than local rebasing. They will keep work uncommitted for too long or will fail to split up commits properly because they are afraid that they can’t fix them up later. In either case, it’s chaos in the commit history, and the project’s efficiency and reliability suffer.
But the author is arguing with a straw man that doesn’t really exist outside of shitty developer teams with undisciplined developers. One can argue that these are the kind of developers that many projects have, but that can only be addressed with process. Weakening the tools so that disciplined developers are less efficient is a bad idea.
You don’t like hairball commits? Tell developers to stop making them. Enforce the policy with reviews. The Git documentation already encourages developers to make focused commits. Rebase allows a developer to split up commits during or after a code review. Rebase can actually be used to combat hairball commits.
I have personally used it to split up commits that inadvertently mixed a bug fix or two into a large pile of refactoring changes. I’ve also often advised people to redesign their commits so that they tell a better story.
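As a rough illustration of what splitting a commit amounts to, here is a small Python sketch. It is not how any SCM implements the operation: `Hunk`, `Commit` and `split` are hypothetical names, and the `kind` label stands in for the judgment the developer makes when picking which changes belong together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hunk:
    path: str
    description: str
    kind: str  # "fix" or "refactor"; a label assumed for this example


@dataclass(frozen=True)
class Commit:
    message: str
    hunks: tuple


def split(commit, belongs_in_first, first_message, second_message):
    """Partition one commit's hunks into two focused commits."""
    first = tuple(h for h in commit.hunks if belongs_in_first(h))
    second = tuple(h for h in commit.hunks if not belongs_in_first(h))
    return Commit(first_message, first), Commit(second_message, second)


# A commit that accidentally mixes a bug fix into refactoring changes.
mixed = Commit("Refactor importer", (
    Hunk("importer.py", "extract read_rows()", "refactor"),
    Hunk("importer.py", "handle empty input file", "fix"),
    Hunk("cli.py", "rename flags variable", "refactor"),
))

fix, refactor = split(
    mixed,
    lambda hunk: hunk.kind == "fix",
    "Fix crash on empty input file",
    "Refactor importer",
)

for commit in (fix, refactor):
    print(commit.message, [h.description for h in commit.hunks])
```

No change is lost or added; the same hunks are simply regrouped so that the bug fix can be reviewed, reverted or cherry-picked on its own.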
Citations and Responses
I’ve interspersed citations from the document linked above and included responses and thoughts.
“[…] [some tools] accomplish things that cannot be done otherwise, or at least cannot be done easily. Rebase does not fall into that category, because it provides no new capabilities. (Emphasis added.)”
As discussed above, I think that this is fundamentally wrong. My workflow is considerably different than it was before I used Git or had access to rebase. I would now be much less efficient if I didn’t have rebase. It would make me constantly focus on cleaning up commits before I really care to. You could make the argument that cleaning up afterward takes more time, but I haven’t experienced that to be the case. Instead, I want to be able to set the priorities rather than worry about committing something that I cannot undo.
Telling a good story
And it’s not about ego or “looking stupid” to future readers of the history; instead, it’s about having control of the story you tell to those same readers. If you don’t have rebase, then you tell just as poor a story as if you use rebase badly. It’s perhaps closer to the “true” story, but it’s not the “best” story. Without rebase, you’re forcing future archeologists of your code to read all drafts as well as the final version simultaneously.
At Encodo, we don’t focus on ego, we focus on efficiency. We do not obliterate commits that make sense just to squash a whole feature. We retain commits in order to tell a good story about how a feature was built. We do not emphasize being able to build each commit: often we’ll add a failing test in one commit, then fix the bug in another commit, because that tells a better story.
We need rebase in order to massage local commits so that they tell this good story rather than uploading dozens of commits that no-one should ever have to look at (typos, code comments, formatting, etc.). Often, we’ll squash in little fixes and changes that come up during a review. Is the Fossil author suggesting that there is some benefit to seeing these in a separate commit? It would make understanding the commits at a later time that much harder.
I think most of the author’s concerns are addressed by using review and process to enforce better commits. Fossil can’t make this happen because the developers have to create good commits in the first place or, at least, eventually. Rebase helps better developers clean up their own commits and also helps them help others clean up their commits, teaching them how to tell the story of their code.
“A rebase is just a merge with historical references omitted”
Exactly. If I can’t eliminate WIP commits or squash local commits, then my workflow changes. Honestly what’s the point of keeping each commit? Many are scribbles, unwanted drafts. They’re not part of a history anyone would retell. Once commits are cleaned up and tell a good story, there is no need to keep the old commits around. At that point, you’re wasting the future archeologist’s time.
“Surely a better approach is to record the complete ancestry of every check-in but then fix the tool to show a “clean” history in those instances where a simplified display is desirable and edifying, but retain the option to show the real, complete, messy history for cases where detail and accuracy are more important.”
This feature is an interesting one for commits that can no longer be changed (i.e. have already been pushed), but why make the developer mark every accident and mistake instead of just letting him undo them? The “full” view would be of marginal to no value. Even once the messy commits were deciphered, they would most likely yield no useful information.
What possible benefit is it to keep a jungle of “fix typo” and “add missing file” or “fix broken test” commits just because the developer made a commit before running tests or seeing a warning in the IDE?
Command Line vs. UI
“So, another way of thinking about rebase is that it is a kind of merge that intentionally forgets some details in order to not overwhelm the weak history display mechanisms available in Git.”
I honestly think that this guy just wants to make Git look stupid and Fossil look spectacular. I understand fully that it’s silly to argue that Git doesn’t need a feature that Fossil has just because I’ve personally never needed it. A good feature is something that becomes essential once you have it, but you never knew you were missing it or were less efficient without it. Fossil’s ability to easily see which changes were made to a file after a given commit sounds like it might be that kind of feature. However, rebase in Git is such a feature, so if Fossil takes that away, it’s a deal-breaker.
At this point, I also think that the author is considering Git only as a command-line application rather than as something extended by a truly powerful UI like SmartGit, which provides fast access to gobs of historical data with little effort.
When does Siloed Development begin?
“Or, to put it another way, you are doing siloed development. You are not sharing your intermediate work with collaborators. This is not good for product quality.”
What has this guy seen in the wild that he’s reacting this way? Who hurt this poor man? How often does he expect us all to commit and push to the server? Should we code directly on the server? Where does he draw the line for “siloed” work? A day? An hour?
More to the point: who is paying developers (or a project lead) to examine unvetted commits? Do you think we’re made out of free time? Keeping everything around forever is not the most efficient way of optimizing information about your code. It’s a hoarder mentality.
I understand the sentiment: you want to avoid people massaging commits into oblivion, eliminating important information. But, honestly, I’ve seen the opposite problem: commits pushed to the server in the shabbiest form, thereafter unalterable.
“Many developers are drawn to private branches out of sense of ego. “I want to get the code right before I publish it.””
No, that is not my requirement. I want an efficient review that pinpoints (and fixes) errors quickly so no-one wastes time.
Online Repository Tools
The author claims that,
“Rebase adds new check-ins to the blockchain without giving the operator an opportunity to test and verify those check-ins. Just because the underlying three-way merge had no conflict does not mean that the resulting code actually works. Thus, rebase runs the very real risk of adding non-functional check-ins to the permanent record.”
This is true only for the special case of online merges. These should be avoided like the plague, in any case. I know that people really, really trust their tools. I know that they think that merges are infallible, that their CI builds their software and runs their tests and gives their pull request a green flag and a thumbs-up.
But anything other than a trivial pull request should be examined with tools more capable than online repository managers. Not only are they not as good, they are wildly inefficient when compared to a good desktop tool. I know this next generation of developers wants to do everything on their phones, but this is ridiculous. The screen is too small and the tools are too limited.
Get a machine with usable screen real estate and learn what being efficient really means. Not only will you be quicker, you’ll be better: your error rate will decrease and you’ll see connections in the commits much better than with the (comparatively) meager online tools. I’ve written before about one such UI, SmartGit, in Git: Managing local commits and branches and Using Git efficiently: SmartGit + BeyondCompare.
Other online tools have similar weaknesses versus their desktop brethren: for example, text editors like Word or Google Docs. It’s definitely a killer feature that they’re online, but their only selling point is that they’re attached to online document storage. As amazing as it is that these tools run in a browser, they are pathetic compared to tools from thirty years ago. My God, I fondly remember WriteNow 4.0 for Mac OS 6 and 7, which handled a 250-page document with aplomb, complete with figures, tables, TOC, numbering, custom styles, … all of those things that an editor should do. Somehow, just because it’s in the cloud, we’re supposed to be happy with WordPad instead of a full-fledged editor. It’s a joke.
Where does lying begin?
The author claims that,
“Rebasing is the same as lying. By discarding parentage information, rebase attempts to deceive the reader about how the code actually came together.”
Then you should include all command/undo buffers from your IDE, too. At this point in the document, the author is just repeating the same argument over and over, reformulated but not different.
“Unless your project is a work of fiction, it is not a “story” but a “history.” Honorable writers adjust their narrative to fit history. Rebase adjusts history to fit the narrative.”
That’s not even how human history works. It’s not even how your own stories about your own life work. This is the kind of mentality that wants to keep all 6000 pictures from a vacation. Why? Just in case you need that picture of the ground that you took by accident? Because you need all 300 pictures of the Matterhorn? You’re wasting your readers’ time and your own.
“The intent is that development appear as though every feature were created in a single step: no multi-step evolution, no back-tracking, no false starts, no mistakes.”
Again, he proposes to fix a problem—poorly built commits—by not allowing anyone to modify commits.
“We believe it is easier to understand a line of code from the 10-line check-in it was a part of — and then to understand the surrounding check-ins as necessary — than it is to understand a 500-line check-in that collapses a whole branch’s worth of changes down to a single finished feature.”
I agree with this 100%. As already noted above, though, the review should disallow such foolish hairball commits.
“The more comments you have from a given developer on a given body of code, the more concise documentation you have of that developer’s thought process.”
Correct. But you don’t want to see everything. He presents a false choice between all the history and an improperly truncated version. Then he says he’d rather have all of it, and wants to get rid of history-rewriting. This doesn’t fix the problem of shitty programmers making shitty commits. The only way to fix that is gatekeeping reviews and process. Taking a vital tool for clarity (rebasing) away from disciplined programmers is a terrible idea.
“If we rebase each feature branch down into the development branch as a single check-in, pushing only the rebase check-in up to the parent repo, only that fix’s developer has the information locally to perform the cherry-pick of the fix onto the stable branch.”
He really seems to be attacking a repo-management/history-editing process I’ve never used. It sounds horrid.
“Rebasing is an anti-pattern. It is dishonest. It deliberately omits historical information. It causes problems for collaboration. And it has no offsetting benefits.”
Only one of those sentences is true.