2004-08-26 13:33:33

Chapter 2: Checkins

This is part of an online book called Source Control HOWTO, a best practices guide on source control, version control, and configuration management.

In this chapter, I will explore the various situations wherein a repository is modified, starting with the simplest case of a single developer making a change to a single file.

Editing a single file

Consider the situation where a developer needs to make a change to one source file. The process has three steps:

Checkout the file
Edit the working file as needed
Checkin the file

I won't talk much about step 2 here, as it doesn't really involve the SCM tool directly. Editing the file usually involves some other tool, such as an integrated development environment (IDE).

But I do want to explore steps 1 and 3 in greater detail.

Step 1: Checkout

Checking out a file has two basic effects:

On the server, the SCM tool will remember the fact that you have the file checked out so that others may be informed.
On your client, the SCM tool will prepare your working file for editing by changing it to be writable.

The server side of checkout

File checkouts are a way of communicating your intentions to others. When you have a file checked out, other users can see that and avoid making changes to that file until you are done with it. The checkout status of a file is usually displayed somewhere in the user interface of the SCM client application. For example, in the following screendump from Vault, users can see that I have checked out libsgdcore.cpp:

Best Practice: Use checkouts and locks carefully

Use checkouts and locks only when you actually need them. A checkout discourages others from modifying a file, and a lock prevents them from doing so, so an unnecessary checkout or lock gets in the way of your teammates.

Don't checkout files just because you think you might need to edit them.

Don't checkout whole folders. Checkout the specific files you need.

Don't checkout hundreds or thousands of files at one time.

Don't hold exclusive locks any longer than necessary.

Don't go on vacation while holding exclusive locks on files.

This screendump also hints at the fact that there are actually two kinds of checkouts. The question is whether two people can checkout the same file at the same time. The answer varies across SCM tools, and some SCM tools can be configured to behave either way.

Sometimes the SCM tool will allow multiple people to checkout a file at the same time. SourceSafe and Vault both offer this capability as an option. When this "multiple checkouts" feature is used, things can get a bit more complicated. I'll talk more about this later.

If the SCM tool prevents anyone else from checking out a file which I have checked out, then my checkout is "exclusive" and may be described as a "lock". In the screendump above, the user interface is indicating that I have an exclusive lock on libsgdcore.cpp. Vault will allow no one else to checkout this file.
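To make the two kinds of checkout concrete, here is a minimal Python sketch of the bookkeeping a server might do. The `CheckoutRegistry` class and its `allow_multiple` flag are hypothetical names for illustration, not the actual design of Vault or SourceSafe: in exclusive mode an existing checkout acts as a lock, while the multiple-checkouts option lets several people hold the same file.

```python
class CheckoutError(Exception):
    """Raised when a checkout is refused because another user holds the file."""


class CheckoutRegistry:
    """Hypothetical server-side record of who has which file checked out."""

    def __init__(self, allow_multiple=False):
        # allow_multiple=True models the optional "multiple checkouts" setting.
        self.allow_multiple = allow_multiple
        self._holders = {}  # path -> set of users who have the file checked out

    def checkout(self, path, user):
        others = self._holders.get(path, set()) - {user}
        if others and not self.allow_multiple:
            # Exclusive mode: an existing checkout acts as a lock.
            raise CheckoutError(f"{path} is locked by {', '.join(sorted(others))}")
        self._holders.setdefault(path, set()).add(user)

    def release(self, path, user):
        """Called on checkin or on undo checkout."""
        holders = self._holders.get(path)
        if holders is not None:
            holders.discard(user)
            if not holders:
                del self._holders[path]

    def who_has(self, path):
        """What the client UI would display next to the file."""
        return sorted(self._holders.get(path, set()))
```

With the default settings, a second user's checkout of libsgdcore.cpp raises `CheckoutError`; with `allow_multiple=True` both users are recorded and `who_has` lists them all.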

The client side of checkout

On the client side, the effect of a checkout is quite simple: If necessary, the latest version of the file is retrieved from the server. The working file is then made writable, if it was not in that state already.

All of the files in a working folder are made read-only when the SCM tool retrieves them from the repository. A file is not made writable until it is checked out. This prevents the developer from accidentally editing a file that has not been checked out.
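The client side can be sketched with Python's standard `os` and `stat` modules. This is a rough illustration under stated assumptions: `client_checkout` and its `fetch_latest` callback (which downloads the newest version into the working file) are hypothetical, and version numbers are assumed to be plain integers.

```python
import os
import stat


def make_read_only(path):
    """What a get does under checkout-edit-checkin: clear all write bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def make_writable(path):
    """What a checkout does on the client: give the owner write permission."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IWUSR)


def client_checkout(path, local_version, latest_version, fetch_latest):
    """Hypothetical client-side checkout; fetch_latest(path) is an assumed callback."""
    # Make the file writable first so that a download can overwrite the
    # read-only copy; the end state is the same as "retrieve, then unlock".
    make_writable(path)
    if local_version != latest_version:
        fetch_latest(path)  # bring the working file up to date before editing
    return latest_version
```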

Undoing a checkout

Normally, a checkout ends when a checkin happens. However, sometimes we checkout a file and subsequently decide that we did not need to do so. When this happens, we "undo the checkout". Most SCM tools have a command which offers this functionality. On the server side, the command will remove the checkout and release any exclusive lock that was being held. On the client side, Vault offers the user three choices for how the working file should be treated:

Revert: Put the working file back in the state it was in when I checked it out. Any changes I made while I had the file checked out will be lost.
Leave: Leave the working file alone. This option will effectively leave the file in a state which we call "Renegade". It is a bad idea to edit a file without checking it out. When I do so, Vault notices my transgression and chastises me by letting me know that the file is "Renegade".
Delete: Delete the working file.

I usually prefer to work with "Revert" as my option for how the Undo Check Out command behaves.
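Here is one way the three Undo Check Out choices might look in code. This is a hypothetical Python sketch: `UndoMode`, `undo_checkout` and the `release_on_server` callback are invented for illustration, and the client is assumed to have kept a copy of the file's contents from checkout time so that Revert has something to restore.

```python
import enum
import os


class UndoMode(enum.Enum):
    REVERT = "revert"  # restore the file as it was when it was checked out
    LEAVE = "leave"    # keep local edits; the file becomes "Renegade"
    DELETE = "delete"  # remove the working file


def undo_checkout(path, original_contents, choice, release_on_server):
    """Hypothetical Undo Check Out; returns a status string for the client UI."""
    # Server side: remove the checkout and release any exclusive lock.
    release_on_server(path)
    if choice is UndoMode.REVERT:
        with open(path, "w") as f:
            f.write(original_contents)  # any local changes are lost
        return "reverted"
    if choice is UndoMode.DELETE:
        os.remove(path)
        return "deleted"
    # LEAVE: the edits stay on disk, but the file is no longer checked out.
    return "renegade"
```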

Step 3: Checkin

Best Practice: Explain your checkins completely

Every SCM tool provides a way to attach a comment to the changes you check into the repository. This comment is important. If we consistently use good checkin comments, our repository's history contains not only every change we have ever made but also an explanation of why each change happened. These records can be invaluable later, after we have forgotten the details.

I believe developers should be encouraged to enter checkin comments which are as long as necessary to explain what is going on. Don't just type "minor change". Tell us what the minor change was. Don't just tell us "fixed bug 1234". Tell us what bug 1234 is and tell us a little bit about the changes that were necessary to fix it.

After the file is checked out, the developer proceeds to make her changes. She edits the file and verifies that her change is correct. Having completed all this, she is ready to submit her changes to the repository. Doing so will make her change permanent and official. Submitting her changes to the repository is the operation we call "checkin".

The process of a checkin isn't terribly complicated:

The new version of the file is sent to the SCM server where it is stored.
The version number of the file in the repository is incremented by one.
The file is no longer considered to be checked out or locked.
The working file on the client side is made read-only again.

One issue does deserve special mention. Most SCM tools ask the user to enter a comment when making a checkin. This comment will be stored in the repository forever along with the changes being submitted. The comment provides a place for the developer to explain what was changed and why the change was made.
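The four checkin steps listed above can be modeled with a toy in-memory repository. This Python sketch is an illustration under assumptions, not how any real SCM server stores data: `Repository`, its methods and its lock rules are hypothetical, and step 4 happens on the client, so it appears only as a comment.

```python
class Repository:
    """Hypothetical in-memory repository illustrating checkout and checkin."""

    def __init__(self):
        self.history = {}  # path -> list of (contents, comment); index 0 is version 1
        self.locks = {}    # path -> user who has the file checked out

    def _check_holder(self, path, user):
        holder = self.locks.get(path)
        if holder not in (None, user):
            raise PermissionError(f"{path} is checked out by {holder}")

    def checkout(self, path, user):
        self._check_holder(path, user)
        self.locks[path] = user

    def checkin(self, path, user, contents, comment):
        self._check_holder(path, user)
        versions = self.history.setdefault(path, [])
        versions.append((contents, comment))  # 1. the server stores the new version
        new_version = len(versions)           # 2. the version number goes up by one
        self.locks.pop(path, None)            # 3. the file is no longer checked out
        return new_version  # 4. the client should then make its working file read-only
```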

The following screendump shows the checkin dialog box from Vault:

Checkins are additive

It is reassuring to remember one fundamental axiom of source control: Nothing is ever destroyed. Let us suppose that we are editing a file which is currently at version 4. When we checkin our changes, our new version of the file becomes version 5. Clients will be notified that the latest version is now 5. Clients that are still holding version 4 in their working folder will be warned that the file is now "Old".

But version 4 is still there. If we ask the server for the latest version, we will get 5. But if we specifically ask for version 4, or for any earlier version, we can still get it.

Each checkin adds to the history of our repository. We never subtract anything from that history.
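A minimal sketch of this append-only behavior, assuming a hypothetical `History` class that keeps every version of one file in memory: checkins only ever append, so any earlier version stays retrievable, and a client can compare its local version number against the latest to detect an "Old" file.

```python
class History:
    """Hypothetical append-only version history for a single file."""

    def __init__(self):
        self._versions = []  # never shrinks: checkins only append

    def checkin(self, contents):
        self._versions.append(contents)
        return len(self._versions)  # the new version number

    def get(self, version=None):
        """The latest version by default, or any earlier version on request."""
        if version is None:
            return self._versions[-1]
        if not 1 <= version <= len(self._versions):
            raise KeyError(version)
        return self._versions[version - 1]

    def is_old(self, local_version):
        """True if a client holding local_version should be told the file is "Old"."""
        return local_version < len(self._versions)
```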

Other kinds of checkins

We will informally use the word "checkin" to refer to any change which is made to the repository. It is common for a developer to say, "I made some checkins this afternoon to fix that bug", using the word "checkin" to include any of the following types of changes to the repository:

Create a new folder
Add a file
Rename a file or folder
Delete a file or folder
Move a file or folder

It may seem odd to refer to these operations using the word "checkin", because there is no corresponding "checkout" step. However, this looseness is typical of the way people use the word "checkin", so you'll get used to it.

I will take this opportunity to say a few things about how these operations behave. If we conceptually think of a folder as a list of files and subfolders, each of these operations is actually a modification of a folder. When we create a folder inside folder A, then we are modifying folder A to include a new subfolder in its list. When we rename a file or folder, the parent folder is being modified.

Just as the version number of a file is incremented when we modify it, these folder-level changes cause the version number of a folder to be incremented. If we ask for the previous version of a folder, we can still retrieve it just the way it was before. The renamed file will be back to the old name. The deleted file will reappear exactly where it was before.

It may bother you to realize that the "delete" command in your SCM tool doesn't actually delete anything. However, you'll get used to it.
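Treating a folder as a list of entries makes this easy to model. In this hypothetical Python sketch, `VersionedFolder` stores each folder version as an immutable set of names; rename and delete produce a new version rather than altering an old one, so asking for an earlier version brings the deleted or renamed entries back.

```python
class VersionedFolder:
    """Hypothetical folder modeled as a set of entry names, with every version kept."""

    def __init__(self, entries=()):
        self._versions = [frozenset(entries)]  # version 1

    @property
    def version(self):
        return len(self._versions)

    def _commit(self, entries):
        # Folder-level changes append a new version; old versions are never touched.
        self._versions.append(frozenset(entries))
        return self.version

    def _require(self, name):
        if name not in self._versions[-1]:
            raise KeyError(name)

    def add(self, name):
        return self._commit(self._versions[-1] | {name})

    def delete(self, name):
        self._require(name)
        return self._commit(self._versions[-1] - {name})

    def rename(self, old, new):
        self._require(old)
        return self._commit((self._versions[-1] - {old}) | {new})

    def listing(self, version=None):
        """Entries in the latest version, or in any earlier version on request."""
        v = self.version if version is None else version
        if not 1 <= v <= self.version:
            raise KeyError(v)
        return sorted(self._versions[v - 1])
```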

Atomic transactions

I've been talking mostly about the simple case of making a change to a single source code file. However, most programming tasks require us to make multiple repository changes. Perhaps we need to edit more than one file to accomplish our task. Perhaps our task requires more than just file modifications, but also folder-level changes like the addition of new files or the renaming of a file.

When faced with a complex task that requires several different operations, we would like to be able to submit all the related changes together in a single checkin operation. Although tools like SourceSafe and CVS do not offer this capability, some source control systems (like Vault and Subversion) do include support for "atomic transactions".

Best Practice: Group your checkins logically

I recommend that each transaction you check into the repository should correspond to one task. A "task" might be a bug fix or a feature. Include all of the repository changes which were necessary to complete that task, and nothing else. Avoid fixing multiple bugs in a single checkin transaction.

The concept is similar to the behavior of atomic transactions in a SQL database. The Vault server guarantees that all operations within a transaction will stay together. Either they will all succeed, or they will all fail. It is impossible for the repository to end up in a state with only half of the operations done. The integrity of the repository is assured.
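The all-or-nothing guarantee can be sketched by applying a whole batch to a copy of the repository state and keeping the result only if every operation succeeds. The function `commit_transaction` and its operation tuples are hypothetical, and a real server would also need durable storage and concurrency control, which this Python sketch ignores.

```python
def commit_transaction(tree, operations):
    """Apply every operation or none of them (hypothetical sketch).

    tree maps path -> contents for the latest repository state. operations is
    a list of tuples such as ("modify", path, text), ("add", path, text),
    ("rename", old, new) or ("delete", path). Returns the new state; if any
    operation is invalid, ValueError is raised and tree is left untouched.
    """
    staged = dict(tree)  # all work happens on a copy

    def require(condition, message):
        if not condition:
            raise ValueError(message)

    for op, *args in operations:
        if op == "add":
            path, text = args
            require(path not in staged, f"add: {path} already exists")
            staged[path] = text
        elif op == "modify":
            path, text = args
            require(path in staged, f"modify: {path} does not exist")
            staged[path] = text
        elif op == "rename":
            old, new = args
            require(old in staged and new not in staged,
                    f"rename: cannot rename {old} to {new}")
            staged[new] = staged.pop(old)
        elif op == "delete":
            (path,) = args
            require(path in staged, f"delete: {path} does not exist")
            del staged[path]
        else:
            raise ValueError(f"unknown operation {op!r}")
    # Only a fully successful batch produces a new state; the caller would
    # record it as the next repository version, keeping the old one in history.
    return staged
```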

To ensure that a transaction can contain all kinds of operations, Vault supports the notion of a pending change set. Essentially, the Vault client keeps a running list of changes you have made which are waiting to be sent to the server. When you invoke the Delete command, not only will it not actually delete anything, but it doesn't even send the command to the server. It merely adds the Delete operation to the pending change set, so that it can be sent later as part of a group.

In the following screendump, my pending change set contains three operations. I have modified libsgdcore.cpp. I have renamed libsgdcore.h to headerfile.h. And I have deleted libsgdcore_diff_file.c.

Note that these operations have not actually happened yet. They won't happen unless I submit them to the server, at which time they will take place as a single atomic transaction.

Vault persists the pending change set between sessions. If I shut down my Vault client and turn off my computer, the next time I launch the Vault client the pending change set will contain the same items it does now.
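A pending change set is essentially a persistent to-do list kept on the client. This Python sketch assumes a hypothetical `PendingChangeSet` class that stores operations in a JSON file (the file name and format are illustrative, not Vault's actual storage) and hands the whole list to an assumed `submit` callback in one call.

```python
import json
from pathlib import Path


class PendingChangeSet:
    """Hypothetical client-side list of operations waiting to be committed."""

    def __init__(self, store: Path):
        self.store = store
        # Reload whatever was pending when the client last shut down.
        self.ops = json.loads(store.read_text()) if store.exists() else []

    def record(self, *op):
        # Nothing is sent to the server; the operation is only remembered.
        self.ops.append(list(op))
        self.store.write_text(json.dumps(self.ops))  # survives a restart

    def commit(self, submit):
        """submit(ops) is an assumed callback that sends one atomic transaction."""
        submit([tuple(op) for op in self.ops])
        # Reached only if submit did not raise, so a failed commit keeps the list.
        self.ops = []
        self.store.write_text("[]")
```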

The Church of "Edit-Merge-Commit"

Up until now, I have explained everything about checkouts and checkins in a very "matter of fact" fashion. I have claimed that working files are always read-only until they are checked out, and I have claimed that files are always checked out before they are checked in. I have made broad generalizations and I have explained things in terms that sound very absolute.

I lied.

In reality, there are two very distinct doctrines for how this basic interaction with an SCM tool can work. I have been describing the doctrine I call "checkout-edit-checkin". Reviewing the simple case when a developer needs to modify a single file, the practice of this faith involves the following steps:

Checkout the file
Edit the working file as needed
Checkin the file

Followers of the "checkout-edit-checkin" doctrine effectively agree to live by the following rules:

Files in the working folder are read-only unless they are checked out.
Developers must always checkout a file before editing it. Therefore, the entire team always knows who is editing which files.
Checkouts are made with exclusive locks, so only one developer can checkout a file at one time.

This approach is the default behavior for SourceSafe and for Vault. However, CVS doesn't work this way at all. CVS uses the doctrine I call "edit-merge-commit". Practitioners of this religion perform the following steps to modify a single file:

Edit the working file as needed
Merge any recent changes from the server into the working file
Commit the file to the repository

The edit-merge-commit doctrine is a liberal denomination which preaches a message of freedom from structure. Its followers live by these rules:

Files in the working folder are always writable.
Nobody uses checkouts at all, so nobody knows who is editing which files.
When a developer commits his changes, he is responsible for ensuring that his changes were made against the latest version in the repository.

As I said, this is the approach which is supported by CVS. Vault supports edit-merge-commit as an option. In fact, when this option is turned on, we informally say that Vault is running in "CVS mode".
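The last rule in the list above amounts to an optimistic concurrency check: the server accepts a commit only if it was made against the latest version. A minimal Python sketch, assuming a hypothetical `commit` function and a file history kept as a plain list:

```python
class StaleWorkingFile(Exception):
    """Raised when a commit was not made against the latest version."""


def commit(history, base_version, new_contents):
    """Edit-merge-commit style commit with no checkout (hypothetical sketch).

    history: list of file contents, where version n is history[n - 1].
    base_version: the version the developer's edits were based on, after
    any merge step.
    """
    latest = len(history)
    if base_version != latest:
        # Someone else committed in the meantime: merge their changes first.
        raise StaleWorkingFile(f"based on version {base_version}, latest is {latest}")
    history.append(new_contents)
    return latest + 1
```

A developer who gets `StaleWorkingFile` merges the newer version into the working file, updates the base version, and tries the commit again.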

Each of these approaches corresponds to a different style of managing concurrent development on a team. People tend to have very strong feelings about which style they prefer. The religious flame war between these two churches can get very intense.

Holy Wars

The "checkout-edit-checkin" doctrine is obviously more traditional and conservative. When applied strictly, it is impossible for two people to modify a given file at the same time, thus avoiding the necessity of merging two versions of a file into one.

The "edit-merge-commit" doctrine teaches a lifestyle which is riskier. The risk is that the merge step may be tedious or cause problems. However, accepting this risk rewards us with a concurrent development style in which developers trip over each other a lot less often.

Still, these risks are real, and we will not flippantly disregard them. A detailed discussion of file merging appears in the next chapter. For now I will simply mention that most SCM tools include features that can safely do a three-way merge automatically. Not all developers are willing to trust this feature, but many do.
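The next chapter covers merging in detail, but the core idea of a three-way merge fits in a short sketch. This simplified, line-based Python version (a hypothetical `three_way_merge` function built on the standard library's `difflib.SequenceMatcher`) compares each modified copy against the common ancestor, takes changes that only one side made, and emits conflict markers when both sides changed the same or adjacent lines. Real merge tools are considerably more careful than this.

```python
import difflib


def _hunks(base, side, label):
    """Regions of base that side changed, as (start, end, new_lines, label)."""
    matcher = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    return [(i1, i2, side[j1:j2], label)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def _apply(base, start, end, hunks):
    """Rebuild base[start:end] with one side's sorted, non-overlapping hunks applied."""
    out, pos = [], start
    for i1, i2, new_lines, _ in hunks:
        out.extend(base[pos:i1])
        out.extend(new_lines)
        pos = i2
    out.extend(base[pos:end])
    return out


def three_way_merge(base, ours, theirs):
    """Merge two edited copies of base (lists of lines); returns (lines, conflicts)."""
    hunks = sorted(_hunks(base, ours, "ours") + _hunks(base, theirs, "theirs"),
                   key=lambda h: (h[0], h[1]))
    merged, conflicts, pos, k = [], 0, 0, 0
    while k < len(hunks):
        # Group hunks whose base ranges overlap or touch; adjacent edits by
        # different people are treated as a conflict to stay on the safe side.
        start, end = hunks[k][0], hunks[k][1]
        group = [hunks[k]]
        k += 1
        while k < len(hunks) and hunks[k][0] <= end:
            end = max(end, hunks[k][1])
            group.append(hunks[k])
            k += 1
        merged.extend(base[pos:start])  # unchanged lines before this region
        ours_h = [h for h in group if h[3] == "ours"]
        theirs_h = [h for h in group if h[3] == "theirs"]
        a = _apply(base, start, end, ours_h)
        b = _apply(base, start, end, theirs_h)
        if not theirs_h or a == b:  # only we changed it, or both made the same change
            merged.extend(a)
        elif not ours_h:            # only they changed it
            merged.extend(b)
        else:                       # both changed it differently: a human must decide
            conflicts += 1
            merged += ["<<<<<<< ours", *a, "=======", *b, ">>>>>>> theirs"]
        pos = end
    merged.extend(base[pos:])
    return merged, conflicts
```

Because this sketch refuses to guess whenever both sides touch the same region, it errs toward reporting conflicts rather than silently combining overlapping edits.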

So, when using the "edit-merge-commit" approach, the merge must happen, and we are left with two choices:

Attempt the automerge. (can be scary)
Merge the files by hand. (can be tedious)

Developers who prefer "checkout-edit-checkin" often find both of these choices to be unacceptable.

Best Practice: Get the best of both worlds

Here at SourceGear we are quite proud of the fact that Vault allows each developer to choose their own concurrent development style. Developers who prefer "checkout-edit-checkin" can work that way. Developers who prefer "edit-merge-commit" can use that approach, and they still have exclusive locks available to them for those times when they are needed. As far as I know, Vault is the only product that offers this flexibility.

I apologize for this completely shameless plug. I won't do it very often.

I will confess that I am a disciple of the edit-merge-commit religion. People who use edit-merge-commit often say that they cannot imagine going back to what life was like before. I agree.

It is so very convenient to never be required to checkout a file. All the files in my working folder are always writable. If I want to start working on a bugfix or a feature, I simply open a text editor and begin making my changes.

This benefit is especially useful when I am disconnected from the server. When people ask me about the best way to use Vault while "offline", I tell them to consider using edit-merge-commit. Since I don't have to contact the server to checkout a file, I can simply proceed with my changes. The only time I need the server is when it comes time to merge and commit.

As I said, automerge is amazingly safe in practice. Thousands of teams use it every day without incident. I have been actively using edit-merge-commit as my development style for over five years, and I cannot remember a situation where automerge produced an incorrect file. Experience has made me a believer.

Looking Ahead

In the next chapter, I will be talking in greater detail about the process of merging two modified versions of a file.
