2005-02-24 20:08:13
Chapter 7: Branches
This is part of an online book called Source Control HOWTO, a best practices guide on source control, version control, and configuration management.
What is a branch?
A branch is what happens when your development team needs to work on two distinct copies of a project at the same time. This is best explained by citing a common example:
Suppose your development team has just finished and released version 1.0 of UltraHello, your new flagship product, developed with the hope of capturing a share of the rapidly growing market for "Hello World" applications.
But now that 1.0 is out the door, you have a new problem you have never faced before. For the last two years, everybody on your team has been 100% focused on this release. Everybody has been working in the same tree of source code. You have had only one "line of development", but now you have two:
- Development of 2.0. You have all kinds of new features which just didn't make it into 1.0, including "multilingual Hello", DirectX support for animated Hellos, and of course, the ability to read email.
- Maintenance of 1.0. Now that real customers are using UltraHello, they will probably find at least one bug your testing didn't catch. For bug fixes or other minor improvements requested by customers, it is quite possible that you will need to release a version 1.0.1.
It is important for these two lines of development to remain distinct. If you release a version 1.0.1, you don't want it to contain a half-completed implementation of a 2.0 feature. So what you need here is two distinct source trees so your team can work on both lines of development without interfering with each other.
The most obvious way to solve this problem would simply be to make a copy of your entire source control repository. Then you can use one repository for 1.0 maintenance and the other repository for 2.0 development. I know people who do it this way, but it's definitely not a perfect solution.
The two-repository approach becomes disappointing in situations where you want to apply a change to both trees. For example, every time we fix a bug in the 1.0 maintenance tree, we probably also want to apply that same bug fix to the 2.0 development tree. Do we really want to have to do this manually? If the bug fix is a simple change, like fixing the incorrect spelling of the word "Hello", then it won't take a programmer very long to make the change twice. But some bug fixes are more involved, requiring changes to multiple files. It would be nice if our source control tool would help. A primary goal for any source control tool should be to help software teams be more concurrent, everybody busy, all at the same time, without getting in each other's way.
To address this very type of problem, source control tools support a feature which is usually called "branching". This terminology arises from the tendency of computer scientists to use the language of a physical tree every time hierarchy is involved. In this particular situation, the metaphor breaks down very quickly, but we keep the name anyhow.
A somewhat better metaphor is a nature path which forks into two directions. Before the fork, there was one path. Now there are two, but they share a common history. When you use the branching feature of your source control tool, it creates a fork in the path of your development progress. You now have two trees, but the source control tool has not forgotten the fact that these two trees used to be one. For this reason, the SCM tool can help make it easier to take code changes from one fork and apply those changes to the other. We call this operation "merging branches", a term which highlights why the physical tree metaphor fails. The two forks of a nature path can merge back into one, but two branches of an oak tree just don't do that. I'll talk a lot more about merging branches in the next chapter.
At this point I should take a step back and admit that my example of doing 1.0 maintenance and 2.0 features is very simplistic. Real life examples are sometimes far more complicated, involving multiple branches, active development in each branch, and the need to easily migrate changes between any two of them. Branching and merging is perhaps the most complex operation offered by a source control tool, and there is much to say about it. I'll begin with some "cars and clocks" stuff and talk about how branching works "under the hood".
Two branching models
Best Practice: Organize your branches
The "folder" model of branching usually requires you to have one extra level of hierarchy in your repository tree. Keep your main development in a folder named $/trunk. Then create another folder called $/branches. Each time you create a branch off of the trunk, put it in $/branches.
First of all, let's acknowledge that there are [at least] two popular models for branching. In the first approach, a branch is like a parallel universe.
- The hierarchy of files and folders in the repository is sort of like the regular universe.
- For each branch, there is another universe which contains the same hierarchy of files and folders, but with different contents.
In order to retrieve a file, you specify not just a path but the name of the universe, er, branch, from which you want the file retrieved. If you don't specify a branch, then the file will be retrieved from the "default branch". This is the approach used by CVS and PVCS.
In the other branching model, a branch is just another folder, located in the same repository hierarchy as everything else. When you create a branch of a folder, it shows up as another folder. With this approach, a repository path is sufficient to describe a location.
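To make the difference concrete, here is a minimal Python sketch of how a lookup might work in each model. The dictionaries, the function names and the file hello.c are all hypothetical, invented purely for illustration; no real tool stores its data this way.

```python
# Toy repositories for illustration only; not the data model of any real tool.

# "Parallel universe" model: a location is a branch name plus a path.
universe_repo = {
    "default": {"$/projects/Hello/hello.c": "Hello v2 work"},
    "1.0": {"$/projects/Hello/hello.c": "Hello v1 fix"},
}


def get_universe(repo, path, branch="default"):
    """Retrieve a file by path and branch; the default branch is used if none is given."""
    return repo[branch][path]


# "Folder" model: a branch is just another folder, so a path alone is enough.
folder_repo = {
    "$/projects/Hello/trunk/hello.c": "Hello v2 work",
    "$/projects/Hello/branches/1.0/hello.c": "Hello v1 fix",
}


def get_folder(repo, path):
    """Retrieve a file by repository path only."""
    return repo[path]
```

Notice that in the first model the same path names two different files, depending on which universe you ask. In the second model, every file has exactly one path.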
Personally, I prefer the "folder" style of branching over the "parallel universe" style of branching, so my writing will generally come from this perspective. This is the approach used by most modern source control tools, including Vault, Subversion (they call it "copy"), Perforce (they call it "Inter-File Branching") and Visual Studio Team System (looks like they call it branching in "path space").
Under the hood
Good source control tools are clever about how they manage the underlying storage issues of branching. For example, let us suppose that the source code tree for UltraHello is stored in $/projects/Hello/trunk. This folder contains everything necessary to do a complete build of the shipping product, so there are quite a few subfolders and several hundred files in there.
Now that you need to go forward with 1.0 maintenance and 2.0 development simultaneously, it is time to create a branch. So you create a folder called $/projects/Hello/branches. Inside there, you create a branch called 1.0.
At the moment right after the branch, the following two folders are exactly the same:
$/projects/Hello/trunk
$/projects/Hello/branches/1.0
It appears that the source control tool has made an exact copy of everything in your source tree, but actually it hasn't. The repository database on disk has barely increased in size. Instead of duplicating the contents of every file, it has merely pointed the branch at the same contents as the trunk.
As you make changes in one or both of these folders, they diverge, but they continue to share a common history.
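One way to picture this "cheap copy" trick is the following Python sketch. It is only a toy under my own assumptions (the Repo class, its methods and the sample files are hypothetical), and real tools are considerably more sophisticated, but the idea of pointing two paths at one stored copy of the contents is the same.

```python
class Repo:
    """Toy repository: paths point at stored contents instead of holding them."""

    def __init__(self):
        self.blobs = {}    # content id -> file contents
        self.paths = {}    # repository path -> content id
        self._next_id = 0

    def write(self, path, contents):
        # Simplification: every write stores a new blob and repoints the path.
        blob_id = self._next_id
        self._next_id += 1
        self.blobs[blob_id] = contents
        self.paths[path] = blob_id

    def read(self, path):
        return self.blobs[self.paths[path]]

    def branch(self, src, dst):
        """Point every path under dst at the same blobs as src; nothing is copied."""
        prefix = src.rstrip("/") + "/"
        for path, blob_id in list(self.paths.items()):
            if path.startswith(prefix):
                self.paths[dst.rstrip("/") + "/" + path[len(prefix):]] = blob_id


repo = Repo()
repo.write("$/projects/Hello/trunk/hello.c", 'puts("Hello");')
repo.write("$/projects/Hello/trunk/README", "UltraHello")

repo.branch("$/projects/Hello/trunk", "$/projects/Hello/branches/1.0")
blobs_after_branch = len(repo.blobs)   # the branch added paths, not contents

# A change in the trunk makes the two folders diverge.
repo.write("$/projects/Hello/trunk/hello.c", 'puts("Hello, 2.0");')
blobs_after_edit = len(repo.blobs)
```

In this sketch, creating the branch adds new paths but no new contents. Only the later edit in the trunk stores something new, and the branch keeps pointing at the old contents.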
The Pitiful Lives of Nelly and Eddie
In order to use your source control tool most effectively, you need to develop just the right amount of fear of branching. This delicate balance seems to be very difficult to find. Most people either have too much fear or not enough.
Nelly is an example of a person who has too much fear of branching. Nelly has a friend who has a cousin with a neighbor who knows somebody whose life completely fell apart after they tried using the branch and merge features of their source control tool. So Nelly refuses to use branching at all. In fact, she wrote a 45-page policy document which requires her development team to never use branching, because after all, "it's not safe".
So Nelly's development team goes to great lengths to avoid using branching, but eventually they reach a point where they need to do concurrent development. When this happens, they do anything they can to solve the problem, as long as it doesn't involve the word "branch". They fork a copy of their tree and begin working with two completely separate repositories. When they need to make a change to both repositories, they simply make the change by hand, twice.
Best Practice: Don't be afraid of branches
If you're doing parallel development, let your source control tool help. That's what it was designed to do.
Obviously these people are still branching, but they keep Nelly happy by never using "the b word". These folks are happy, and we should probably just leave them alone, but the whole situation is kind of sad. Their source control tool has features which were specifically designed to make their lives easier.
At the other end of the spectrum is Eddie, who uses branching far too often. Eddie started out just like Nelly, afraid of branching because he didn't understand it. But to his credit, Eddie overcame his fear and learned how powerful branching and merging can be.
And then he went off the deep end.
After he tried branching and had a good first experience with it, Eddie now uses it all the time. He sometimes branches multiple times per week. Every time he makes a code change, he creates a private branch.
Eddie arrives on Monday morning and discovers that he has been assigned bug 7136 (in the Elbonian version, the main window is too narrow because the Elbonian language requires 9 words to say "Hello World"). So Eddie sits down at his desk and begins the process of fixing this bug. The first thing he does is create a branch called "bug_7136". He makes his code change there in his "private branch" and checks it in. Then, after verifying that everything is working okay, he uses the Merge Branches feature to migrate all changes from the trunk into his private branch, just to make sure his code change is compatible with the very latest stuff. Then he runs his test suite again. Then he notices that the repository has changed yet again, so he does this loop once more. Finally, he uses Merge Branches to apply his code fixes to the trunk. Then he grabs a copy of the trunk code, builds it and runs the test suite to verify that he didn't accidentally break anything. When at last he is satisfied that his code change is proper, he marks bug 7136 as complete. By now it is Friday afternoon at 4:00pm, and there's no point in starting anything new, so he just decides to go home.
Eddie never checks anything into the main trunk. He only checks stuff into his private branch, and then merges changes into the trunk. His care and attention to detail are admirable, but he's spending far more time using his source control tool than working on his code.
Let's not even think about what the kids would be like if Eddie and Nelly were to get married.
Dev--Test--Prod
Once you have established the proper level of comfort with the branching features of your source control tool, the next question is how to use those features effectively.
One popular methodology for SCM is often called "code promotion". The basic idea here is that your code moves through three stages, "dev" (stuff that is in active development), "test" (stuff that is being tested) and "prod" (stuff that is ready for production release):
- As code gets written by programmers, it is placed in the dev tree. This tree is "basically unstable". Programmers are only allowed to check code into dev.
- When the programmers decide they are done with the code, they "promote" it from dev to test. Programmers are not allowed to check code directly into the test tree. The only way to get code into test is to promote it. By promoting code to test, the programmers are handing the code over to the QA team for testing.
- When the testers decide the code meets their standards, they promote it from test to prod. Code can only be part of a release when it has been promoted to prod.
For a variety of reasons, I personally don't like working this way, but there's nothing wrong with it. Lots of people use this code promotion model effectively, especially in larger companies where the roles of programmer and tester are very clearly separated.
I understand that PVCS has specific feature support for "promotion groups", although I've never used this product personally. With other source control tools, the code promotion model can be easily implemented using three branches, one for dev, one for test, and one for prod. The Merge Branches feature is used to promote code from one level to the next.
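As a rough illustration of the three-branch approach, here is a small Python sketch of the promotion rules described above. Everything in it is hypothetical (the PromotionRepo class, its method names, and the use of a plain copy as a stand-in for Merge Branches); it is meant to show the rules, not how any particular tool implements them.

```python
STAGES = ["dev", "test", "prod"]


class PromotionRepo:
    """Toy model of code promotion: one tree per stage."""

    def __init__(self):
        self.trees = {stage: {} for stage in STAGES}

    def check_in(self, stage, path, contents):
        # Programmers may only check code into dev.
        if stage != "dev":
            raise PermissionError("direct check-ins are only allowed in dev")
        self.trees["dev"][path] = contents

    def promote(self, path, source):
        """Copy one file from source to the next stage (a stand-in for a merge)."""
        index = STAGES.index(source)
        if index == len(STAGES) - 1:
            raise ValueError("prod is the final stage")
        target = STAGES[index + 1]
        self.trees[target][path] = self.trees[source][path]
        return target


repo = PromotionRepo()
repo.check_in("dev", "hello.c", 'puts("Hello");')
first_target = repo.promote("hello.c", "dev")     # programmers hand off to QA
second_target = repo.promote("hello.c", "test")   # testers approve for release
```

The only way for contents to reach test or prod in this sketch is through promote, which mirrors the rule that nobody checks code directly into those trees.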
Eric's Preferred Branching Practice
Best Practice: Keep a "basically unstable" trunk.
Do your active development in the trunk, the stability of which increases as you approach release. After you ship, create a maintenance branch and always keep it very stable.
Here at SourceGear our main development tree is called the "trunk". In our repository it is rooted at $/trunk and it contains all the source code and documentation for our entire product.
Most new code is checked into the trunk. In general, our developers try to never "break the tree". Anyone who checks in code which causes the trunk builds to fail will be the recipient of heaping helpings of trash talk and teasing until he gets it fixed. The trunk should always build, and as much as possible, the resulting build should always work.
Nonetheless, the trunk is the place where active development of new features is happening. The trunk could be described as "basically unstable", a philosophy of branching which is explained in Essential CVS, a fine book on CVS by O'Reilly. In our situation, the stability of the trunk build fluctuates over the months during our development cycle.
During the early and middle parts of a development cycle, the trunk is often not very stable at all. As we approach alpha, beta and final release, things settle down and the trunk gets more and more stable. Not long before release, the trunk becomes almost sacred. Every code change gets reviewed carefully to ensure that we don't regress.
At the moment of release, a branch gets created. This branch becomes our maintenance tree for that release. Our current maintenance branch is called "3.0", since that's the current major version number of our product. When we need to do a bug fix or patch release, it is done in the maintenance branch. Each time we do a release out of the maintenance branch (like 3.0.2), we apply a label.
After the maintenance branch is created, the trunk once again becomes "basically unstable". Developers start adding the risky code changes we didn't want to include in the release. New feature work begins. The cycle starts over and repeats itself.
When to branch? Part 1: Principles
Best Practice: Don't create a branch unless you are willing to take care of it.
A branch is like a puppy.
Your decisions about when to branch should be guided by one basic principle: When you create a branch, you have to take care of it. There are responsibilities involved.
- In most cases, you will eventually have to perform one or more merge operations. Yes, the SCM tool will make that merge easy, but you still have to do it.
- If a merge is never necessary, then you probably have the responsibility of maintaining the branch forever.
- If you create a branch with the intention of never merging to or from it, and never making changes to it, then you should not be creating a branch. Use a label instead.
Be afraid of branches, but not so afraid that you never use the feature. Don't branch on a whim, but do branch when you need to branch.
When to branch? Part 2: Scenarios
There are some situations where branching is NOT the recommended way to go:
- Simple changes. As I mentioned above in my "Eddie" scenario, don't branch for every bug fix or feature.
- Customer-specific versions. There are exceptions to this rule, but in general, you should not branch simply for the sake of doing a custom version for a specific customer. Find a way to build the customizability into your app.
And there are some situations where branching is the best practice:
- Maintenance and development. The classic example, and the one I used above in my story about UltraHello. Maintaining version N while developing version N+1 is the perfect example of a time to use branching.
- Subteam. Sometimes a subset of your team needs to work on something experimental that will take several weeks. When they finish, their work will be folded into the main tree, but in the meantime, they need a separate place to work.
- Code promotion. If you want to use the dev-test-prod methodology I mentioned above, use a branch to model each of the three levels of code promotion.
When to branch? Part 3: Pithy Analogy
- A branch is like a working folder for multiple people.
- A working folder facilitates parallel development by allowing each person to have their own private place to work.
- When multiple people need a private place to work together, they need a branch.
Looking Ahead
In the next chapter I will delve into the topic of merging branches.