2009-02-23 15:47:45

On Git's lack of respect for immutability and the Best Practices for a DVCS

I learned something very important from the feedback after my entry last week on Git's index. Here's what I learned:

Suppose I wrote a 300 page book describing all the great things about Git and why it is so awesome.

Further suppose that on page 295 near the bottom, I include a one-sentence mention of a way that I think Git might change for the better.

Further suppose that I wrote that sentence in Klingon. And then I encrypted it with Schneier's latest cipher, wrapped it in base64 encoding, ran it through rot13 and then pasted it into the book.

If I did this, the primary response from the Git user community would be: "Eric's new book says that Git sucks. He doesn't get it."

Trust me, folks -- I get it. Commits to a DVCS are different. When you commit to a private instance of the repository, you don't "break the build". The rules and guidelines for a DVCS are different from the ones for a centralized system.

Best Practices

But some of the best practices are the same. Here's my off-the-cuff sloppy definition of a "best practice":

A best practice is a guideline that can be followed lots of times by lots of different people in lots of different situations with minimal likelihood of causing pain to the team.

Actually, I want to give TWO definitions. Here's another one, speaking as a source control vendor:

A best practice is a guideline that I can give to our customers to minimize the likelihood that they will need to call our tech support staff.

A technique can be "really cool" or "very powerful" and still not qualify for any reasonable person's definition of "best practice".

I stand by my original claims. I think "git add -p" is "really cool", but it doesn't qualify as a "best practice". It allows the developer to commit a version of the code that never existed in their working directory, which means nobody has ever compiled or tested it. Yes, that commit happens in a private instance of the repo, but that code is eligible to be pushed into another instance.

Is there a good outcome here?

Suppose I use "git add -p" to commit some code that doesn't even compile. What can happen?

Maybe this changeset never escapes my private repository instance. In that case, it has caused no harm. But it has also caused no benefit.
Maybe my next checkin fixes the build. So now the offending changeset is less likely to cause problems, because the fix will get pushed as well. But this scenario is equivalent to the centralized case where I break the build but fix it before anybody finds out. It's not very harmful, but it's not very helpful either.
Maybe I later use Git's history rewriting features to eliminate the offending changeset, replacing a chain of small changesets with one larger one that has been well-tested. In this scenario, I have eliminated all the potentially harmful effects, since the DAG will not have any nodes that are "broken". But now I have other concerns.
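
The third scenario can be sketched with a toy model in Python. This is not how Git stores anything; it only assumes that a changeset's identity is a hash of its parent and its content. The Changeset class, the "builds" flag, and the squash helper are all hypothetical names invented for the illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Changeset:
    """A toy changeset, identified by a hash of its parent and its diff."""
    parent: Optional[str]   # id of the parent changeset, None for the root
    diff: str               # stand-in for the actual change
    builds: bool            # hypothetical flag: does the tree compile here?

    @property
    def id(self) -> str:
        payload = f"{self.parent}|{self.diff}".encode()
        return hashlib.sha1(payload).hexdigest()


def commit_chain(parent_id, diffs_and_status):
    """Create one changeset per (diff, builds) pair, each on top of the last."""
    chain = []
    for diff, builds in diffs_and_status:
        cs = Changeset(parent_id, diff, builds)
        chain.append(cs)
        parent_id = cs.id
    return chain


def squash(chain):
    """Toy history rewrite: one changeset on the chain's original parent."""
    combined = "\n".join(cs.diff for cs in chain)
    return Changeset(chain[0].parent, combined, chain[-1].builds)


root = Changeset(None, "initial import", True)
chain = commit_chain(root.id, [("half a fix", False), ("rest of the fix", True)])
squashed = squash(chain)
old_ids = {cs.id for cs in chain}

print("broken node in original chain:", any(not cs.builds for cs in chain))
print("rewritten changeset builds:", squashed.builds)
print("rewritten changeset reuses an old id:", squashed.id in old_ids)
```

In this model the rewrite removes the broken node, but only by producing a changeset with a new identity. The old changesets are not edited; they are abandoned, and any other instance that already holds them now disagrees with mine about what the history is.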

Immutability

The issue of rewriting history is perhaps my biggest philosophical objection to the way Git works. Call me old fashioned if you like, but I believe changesets and the history of the repository should be immutable. Version control features that alter history make me squirm.

My own product supports an "Obliterate" feature and I hate it. I understand why it's there, but I still wish it wasn't. One thing I've learned from twelve years of supporting version control products is that customers will find a way to misuse things.

The purpose of Obliterate is to help with that once-a-year situation where you really screwed up and checked in something that should never have been in the repository and absolutely must be removed. But every now and then we get a tech support call from somebody who is using Obliterate every day. Those are the days when I want to ship the product with that feature locked and only enable it for customers where every developer has passed a written exam.

Think about it. Even if you love Git's ability to rewrite history, does this sound to you like a "best practice"? Or does it sound like a quick way to get a bunch of geeks addicted to recreational pharmaceuticals?

Sandboxes

Like I said, I get it. A DVCS gives me a private sandbox, so I can have more freedom while I play. It's "really cool" that I can kick and throw sand without hurting the other kids. But that doesn't mean it's a "best practice".

Conceptually, my private instance of the repository is still part of a larger whole. The entire repository may not exist on any one machine, but it exists in concept. It is one big Directed Acyclic Graph. When I use "git add -p" and check in something that doesn't compile, my offending commit is conceptually still a member of that DAG.
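
A minimal Python sketch of that idea follows, assuming each repository instance is just a dictionary from changeset id to a (parent id, builds) pair. The ids, the "builds" flag, and the helpers conceptual_dag, broken_nodes, and push are hypothetical, made up for the illustration rather than taken from any real tool.

```python
# Each repository instance maps changeset id -> (parent id, builds?).
shared = {"a1": (None, True), "b2": ("a1", True)}
private = dict(shared)
private["c3"] = ("b2", False)   # a partial commit that does not compile


def conceptual_dag(*instances):
    """The DAG 'in concept': the union of every instance's changesets."""
    merged = {}
    for repo in instances:
        merged.update(repo)
    return merged


def broken_nodes(dag):
    """Ids of changesets whose tree does not build."""
    return sorted(cid for cid, (_parent, builds) in dag.items() if not builds)


def push(source, target, tip):
    """Toy push: copy tip and any of its ancestors that target lacks."""
    while tip is not None and tip not in target:
        target[tip] = source[tip]
        tip = source[tip][0]


whole = conceptual_dag(shared, private)
print("broken on shared instance:", broken_nodes(shared))
print("broken in conceptual DAG:", broken_nodes(whole))

push(private, shared, "c3")
print("broken on shared instance after push:", broken_nodes(shared))
```

The broken changeset is a member of the conceptual DAG from the moment it is committed. A push does not create the problem; it only moves it to a place where other people can trip over it.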

The best practices for a DVCS are built around this principle: The extra freedom provided by a private sandbox should be held in the proper tension with a measure of respect for the entire DAG.