2009-03-09 14:41:43
DVCS and DAGs, Part 2
In Part 1 of this article, I talked about the differences between modeling version control history as a DAG vs. a Line. The two most noteworthy kinds of feedback I received on this entry were:
- Several people accused me of spreading pro-Line-model FUD because I mentioned some of the problems that happen with the DAG model and stopped short of saying that the DAG model is going to cure cancer, eliminate global warming and bring peace to the Middle East.
- Several people asked me how I drew those really cool diagrams.
Before I continue with Part 2, allow me to briefly respond to these two pieces of feedback.
My response about DVCS advocacy
Yes, my company ships a version control tool that is built on the Line model of history. Therefore, any DVCS is, to a certain extent, a competitor to my product.
I further acknowledge that I am breaking the rules.
- Business folks like me aren't supposed to ever say anything positive about their competitors.
- Our job is to feel threatened by change, and to spread that fear around to others.
- We're supposed to pretend like we don't know that every design choice has tradeoffs, and to insist that our way is better in all situations.
As my Mother can confirm, I don't always follow the rules very well. :-)
The simple fact is that I find this stuff interesting. I have been working in the version control industry for over a decade. I am writing a book on the topic. This is what I do. It's interesting to me.
Really.
But there is more happening here than just me being an entrepreneurial rebel. Let's see -- how can I say this nicely?
You Git fans need to chill.
Seriously, rabid advocacy by Git fans is making the world a lousy place to live. Git is really cool, but it is not the right tool for every situation.
In their defense, let's acknowledge that the apple didn't fall far from the tree on this particular issue. When people begin exploring DVCS, often one of the first things they find is the video of Linus Torvalds's 2007 presentation about Git. And what they find there is someone who doesn't seem to get it.
Folks, Subversion is probably the most popular version control tool in the world right now. Almost everyone using a version control tool today is using one that is built on the Line model of history, and they're using these tools successfully and productively. When someone refuses to acknowledge any validity in that model, they look clueless.
The Torvalds video has done plenty of damage. That kind of attitude is a big turn-off for people interested in what's new in the world of version control.
So, my fellow admirers of Git, if you are trying to prevent people from using DVCS tools and make sure that they stay confined to their current niches, then keep up the good work.
But if you really want to help the world see the benefits of Git and similar tools, then start realizing that people were getting productive work done before they existed.
My response about those cool diagrams
My DAG pictures were drawn by SourceGear's graphic artist, John Woolley, who also did all the artwork for the Evil Mastermind comic books. John is doing the layout and illustration work for my upcoming source control book as well.
However, because John's DAG pictures got more praise than my "thousand words", I have decided to be bitter and refuse to include any of his work in this blog entry. :-)
OK, let's talk more about DAGs
As I mentioned in Part 1, if a DAG is allowed to grow without guidance, things can turn into a real mess. DAGs are easier to create. Lines are easier to use. As soon as we embrace the DAG model to gain all its benefits, the very next thing that happens is that we want Lines back.
This is why every DVCS has features that can be used to make sure the DAG grows with guidance. Those features are designed to discourage people from committing without taking any responsibility for the complexity that increases every time we add another point of divergence.
In other words, every DVCS has features that allow developers to take a piece of the DAG and treat it like a Line.
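To make "leaf" and "Line" concrete, here is a minimal Python sketch. It assumes history is stored as a plain mapping from each node to the list of its parents, with a single root. The function names `leaves` and `is_line` are illustrative and do not come from any DVCS.

```python
def leaves(parents):
    """Return the nodes that no other node names as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return set(parents) - has_child


def is_line(parents):
    """True when no node has more than one parent or more than one child.

    Assumes the history has a single root.
    """
    child_count = {}
    for ps in parents.values():
        if len(ps) > 1:          # a merge node: more than one parent
            return False
        for p in ps:
            child_count[p] = child_count.get(p, 0) + 1
    return all(count <= 1 for count in child_count.values())


# Hypothetical histories: node name -> list of parent names.
line = {"a": [], "b": ["a"], "c": ["b"]}
forked = {"a": [], "b": ["a"], "c": ["a"]}
```

In this model, `line` has one leaf and `forked` has two. Each extra leaf is a point of divergence that someone will eventually have to merge or deliberately keep.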
Git
Git guides the growth of the DAG through its support for named branches. You are discouraged from committing something unless its parent is a leaf.
So, if I use the git checkout command to point my working directory to a DAG node which is not a leaf, Git politely fusses at me:
eric$ git checkout 9542b
Note: moving to "9542b" which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
git checkout -b <new_branch_name>
HEAD is now at 9542b5f... initial
If you only commit things that are based on the leaf, then your history stays very Line-like.
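The rule "only commit on top of a leaf" can be sketched as a toy model in Python. This is not how Git implements its check (Git tracks branch references, and the message above is about HEAD not pointing at a local branch). The `commit` helper and the node names are hypothetical.

```python
def commit(parents, new_node, parent):
    """Add new_node on top of parent.

    Returns True when parent already had a child, meaning this commit
    created a new point of divergence in the DAG.
    """
    diverged = any(parent in ps for ps in parents.values())
    parents[new_node] = [parent]
    return diverged


# Hypothetical history: "b" is the only leaf.
history = {"a": [], "b": ["a"]}

on_leaf = commit(history, "c", "b")      # extends the Line
on_non_leaf = commit(history, "d", "a")  # "a" already has a child
```

A tool that wants to guide the DAG can warn whenever this function would return True, and otherwise stay out of the way.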
Mercurial
Historically, Mercurial has been described as supporting only one branch per repository instance. Comparisons to Git often focused on Mercurial's apparent lack of intra-repository branching.
I speak in the past tense here because I have heard that Mercurial has since added features in this area.
I mention Mercurial here only so that its fans don't feel too left out. I can't speak from much experience using this particular tool.
Still, I feel comfortable citing Mercurial as an example of my point: in its early releases at least, Mercurial was guiding the growth of the DAG by preventing the user from diverging it. This almost certainly contributed to the widespread perception of Mercurial as a very easy-to-use tool.
Bazaar
This tool is the DVCS I have used the most, but I still can't call myself an expert. From my own experience, I would characterize Bzr as a tool that works very hard to guide the growth of the DAG.
Whenever I push changes from my local repo to a central server, Bzr requires me to merge in other changes and commit from the leaf, just like a Line model tool would do.
It's rather cool that Bazaar offers me the option of using it with a central server instead of as a pure DVCS. But in this mode, the same basic restriction applies: I can't commit anything unless my baseline is the leaf in the repo.
When I use Bzr, it usually feels like I am using a Line-model tool.
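The push restriction described above amounts to an ancestry test: the push is accepted only when the server's current leaf is already part of the history behind my local tip. Here is a minimal Python sketch of that idea. It is an assumption about the policy, not Bazaar's actual implementation, and `is_ancestor` and `can_push` are hypothetical names.

```python
def is_ancestor(parents, ancestor, node):
    """True when ancestor is node itself or reachable from node via parent links."""
    stack = [node]
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def can_push(parents, local_tip, remote_tip):
    """Accept the push only if the remote leaf is already in my history."""
    return is_ancestor(parents, remote_tip, local_tip)


# Hypothetical history: "b" is on the server, "c" is my local work,
# and "m" is a merge of the two.
history = {"a": [], "b": ["a"], "c": ["a"], "m": ["c", "b"]}

before_merge = can_push(history, "c", "b")
after_merge = can_push(history, "m", "b")
```

Before the merge the push is refused, because `c` and `b` are siblings. After merging, `m` descends from `b`, so the server's history stays a Line from its own point of view.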
My own preferences
On this particular issue, I actually prefer Git's way of doing things.
Bazaar seems to believe that DAG divergence is only legitimate when it happens in separate repo instances and must be resolved before anything can be pushed or committed together. This just feels too heavy-handed for a DVCS. Once I know about the DAG, I want to be allowed to think that way. I don't mind being warned when I am about to commit a DAG node which would have an older sibling. But forcing me to merge in order to commit feels very un-DVCS-like to me.
I like Git's ability to switch my baseline using "git checkout branchname". I understand that people who are not accustomed to thinking about the DAG do find this capability to be unintuitive. But I like it.
Note that I still like Line-model tools like Subversion and Vault as well. I'm just saying that a DAG-model tool should act like one.
Fossil
Lately, the DVCS which intrigues me the most is Fossil. It was written by D. Richard Hipp, the same guy who wrote SQLite.
Fossil has a number of interesting features. Most notable is the built-in support for bug tracking. This is one area where the other DVCSs all fail. They bring you distributed version control, but when it comes time for a developer to update the bug tracking system, things suddenly go back to the centralized world.
Anyway, I'm just getting started with looking closely at Fossil, but I do like the way its website talks about this problem of DAG divergence:
Having more than one leaf in the check-in tree is usually considered undesirable, and so forks are usually either avoided entirely, as in figure 1, or else quickly resolved as shown in figure 3. But sometimes, one does want to have multiple leaves. For example, a project might have one leaf that is the latest version of the project under development and another leaf that is the latest version that has been tested. When multiple leaves are desirable, we call the phenomenon branching instead of forking.
Nice. So far, I get the impression that Fossil works like Git does in this respect. When the DAG diverges, complexity increases. Feel free to offer me a little protection from that complexity by informing me of what's going on. But don't get in my way.