Storing Context in Commit Messages


This is the third post in a series about asynchronous collaboration. By asynchronous, I mean that people on the team don’t always work at the exact same time. It’s common on distributed teams, especially across time zones. You can see all the posts so far in this series right here.

In the previous posts, I shared how I emphasize shared context in pull requests that I submit and review. In this post, I talk about an often overlooked way to share context: commit messages.

letter to my future self. Photo from nourishing existence.

Commit Messages as a Collaborative Tool

I’ve been slinging code for a minute, y’all.

You want to know what has saved my ass more than any other tool or practice I’ve used?

  • Not automated testing, even though I talk about it constantly.
  • Not pairing, even though I talk about that constantly too.
  • Not even the refactoring or debugging frameworks I won’t shut up about on this blog, to your dismay.

It’s version control. Version control has rescued my sorry programmer butt on countless occasions, and on countless more I have used version control to rescue other sorry programmer butts.

That’s why it’s such a shame that we woefully under-use it. Git, for example, is powerful and well-engineered and utterly badass. And if we’d learn to use it to its full capability, we could collaborate on code like this:

instead of like this:

Let’s talk about some git skills we can use to achieve a faster and smoother ride.

What makes a good commit message?

When I wrote about the practice of commit tracing to get up to speed on a code base, we agreed that a commit should have a few important qualities:

Clearly named: I want commit messages that concisely but completely explain what change they make. This allows me to pull up git log and choose a commit that makes a change that sounds relevant to my interests.

Well-circumscribed: I also want each commit to contain all the code changes that contributed to the stated change, and none of the code changes that did not. Because I am learning, I want to understand exactly how a given change was made without the distraction of sorting out unrelated changes.

Clearly-named, well-circumscribed commits allow another developer to come along after us and use our commits to understand the code base. This goal guides us to divide up our code changes into their discrete purposes. It also guides us to craft a title for the commit message that clearly explains what the current change set accomplishes for the software.

I sometimes also find it useful to provide additional context in a commit message:

  • Why did we want the software to do this?
  • Does this change resolve any other lingering issues?
  • Does this change introduce issues that developers should be aware of?
  • Does this change introduce any new code conventions?

Here’s an example of a commit message on the raft project, where the code was not doing what I expected because I wrote the software with an inaccurate understanding of all the use cases. I explain my assumption, explain that it was wrong, and document the new information. This way, others who modify this code after me won’t make the same mistake I did:

@chelseatroy commit 45c21c1d1d65dbc3092383cb25c0189db73ee59f

chelseatroy committed 5 days ago
Fix the logic that determines when the key value store 
must account for the term number in reads and writes

+ In the last commit, I assumed that the cases 
where the key value store needs to write to its 
log and the cases where the term number is not 
present in the command were the same. They are not. 
If a leader sends a follower server a command to 
catch up its log, that log has term numbers in it, 
but the follower server also needs to write those 
commands to its own log.

So we really have three DIFFERENT cases:
1. Terms present, do not write: A server restarts 
   and is catching up its in memory store from its own log
2. Terms present, DO write: A follower server is getting 
   caught up from a leader server
3. Terms absent, DO write: A leader server receives requests 
   to write to its logs from a client
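
If you write commits from the terminal, here is a minimal sketch of one way to produce a title plus a multi-paragraph body like the message above. It runs in a throwaway repository, and the file name, identity, and message text are all made up for illustration. It relies on the fact that each extra -m passed to git commit becomes its own paragraph.

```shell
# Work in a throwaway repository so nothing here touches a real project.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

echo "term = 1" > store.txt               # hypothetical file
git add store.txt

# The first -m is the title. Each later -m becomes a separate body paragraph.
git commit --quiet \
  -m "Fix when the store accounts for term numbers" \
  -m "In the last commit, I assumed two cases were the same. They are not." \
  -m "We really have three different cases, listed in the code comments."

# Print the full message (title and body) of the commit we just made.
git log -1 --format=%B
```

For anything longer than a couple of paragraphs, running git commit with no -m at all opens your configured editor, which is usually more comfortable.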

Now, if another developer (or my future self) runs git annotate on any of the lines that this commit changed, they can read the message that fully explains why it looks the way it does. Like so:

git annotate

You can get to this in JetBrains editors with the key binding Ctrl + V (Mac). That pulls up your VCS Operations menu. Scroll down to number 5 (Annotate) and hit enter. A vertical bar will appear next to your line numbers, with each line displaying a date (latest commit that changed this line) and a committer. If you hover over any of these, the full message appears in a modal as shown above.
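
Outside the editor, the same lookup can be sketched in a terminal. This example builds a tiny throwaway repository (the file, identity, and messages are invented), annotates the file, then finds the commit that last touched line 2 and prints its full message. It uses git blame for the second step because its --porcelain output is easy to parse; git annotate and git blame report the same information in slightly different formats.

```shell
# Throwaway repository with two commits that touch different lines.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

printf 'line one\nline two\n' > notes.txt # hypothetical file
git add notes.txt
git commit --quiet -m "Add notes"

# Change only the second line, and explain why in the commit body.
printf 'line one\nline two, revised\n' > notes.txt
git commit --quiet -a \
  -m "Revise line two" \
  -m "The first wording assumed a single reader."

# Show each line alongside the last commit that changed it.
git annotate notes.txt

# Grab the hash of the commit that last touched line 2
# (the first field of the first porcelain line), then read its message.
sha="$(git blame --porcelain -L 2,2 notes.txt | head -n 1 | cut -d ' ' -f 1)"
git show -s --format=%B "$sha"
```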

Now, this view of git annotate is only going to show the message for the last commit that touched this line. So if we need to make another change to this line later, the message for the commit where we do that change will show up here instead of this one.

That makes these commit messages seem too transient to put all this effort in, doesn’t it?

As it so happens, you can search for all the commits for a specific branch, date, or file (or set of files). You can even search for terms in the commit messages:

You get here in JetBrains editors with Command + 9 (Version Control Menu), then clicking the tab at the top that says “Log.” It should automatically show all your commits from your current branch. You have controls at the top for entering a search term, choosing a branch, a committer, a date, or a file path (you can check multiple file paths at once).
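
If you aren't in a JetBrains editor, plain git log offers similar filters. Here is a rough sketch against a throwaway repository, with invented file names and messages. Note that --author filters on the commit's author; git log also has a --committer option if that is the field you care about.

```shell
# Throwaway repository with three commits across two files.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

echo "stub" > server.txt                  # hypothetical files
echo "stub" > client.txt
git add .
git commit --quiet -m "Add server and client stubs"

echo "terms" >> server.txt
git commit --quiet -a -m "Fix term number handling in server"

echo "retry" >> client.txt
git commit --quiet -a -m "Retry client requests"

# Search commit messages for a term (-i makes the match case-insensitive).
git log --oneline -i --grep="term number"

# Only commits that touched a given file (several paths can follow the --).
git log --oneline -- client.txt

# Filter by who wrote the commit and by date.
git log --oneline --author="Example Dev" --since="1 week ago"
```

Putting a branch name right after git log limits the search to the commits reachable from that branch.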

What if it’s a big file, and you want the history of a specific line?

My editor doesn’t do that, but there’s a git command for it:

git commit track changes to line

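The command goes git log -L <start>,<end>:path/to/file.ext, where <start> and <end> are the first and last line numbers of the range you care about.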
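
To make that concrete, here is a small sketch in a throwaway repository (file name and messages invented). The -L option takes a line range and a file, written as start,end:path, and git log then lists only the commits that changed those lines, each with the relevant slice of its diff. The SUBJECT: prefix in the format string is just a marker added here to make the commit titles easy to spot among the diff output.

```shell
# Throwaway repository where only some commits touch line 2.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

printf 'alpha\nbeta\ngamma\n' > words.txt # hypothetical file
git add words.txt
git commit --quiet -m "Add words"

printf 'alpha\nBETA\ngamma\n' > words.txt
git commit --quiet -a -m "Capitalize beta"

printf 'alpha\nBETA\ngamma\ndelta\n' > words.txt
git commit --quiet -a -m "Append delta"

# History of line 2 only: the commit that appended a fourth line is left out.
git log -L 2,2:words.txt --format='SUBJECT:%s'
```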

Yes, I know: these commit messages are “long”.

Such messages offend the sensibilities of programmers who subscribe to the 80-character limit rule, even though the screens we invented that rule for have been out of circulation for 30 years. I do not base my asynchronous communication practices on rules we made up for screens we used when asynchronous communication didn’t exist.

Chelsea, are you saying we have to write all our commits this way?

No. I trust you to determine when this level of context is necessary and appropriate. In fact, I wrote this piece a while ago that shows you six different ways you could write commit messages. Hopefully that provides enough options to choose from.

That having been said, distributed teams (and even colocated ones!) usually derive their struggles from team members sharing too little context—not too much. So I endeavor to maximize the probability that the materials I leave behind can transfer all the context I have such that, if I am not immediately available, my siloed knowledge isn’t blocking somebody else. And commit messages provide a useful, timestamped journal in which to do that.

Conclusion

On distributed teams, it’s critical for team members to share context with each other. Commit messages can help us do this because they provide a timestamped journal with which to track the evolution of a code base.

First, the commit should be clearly named and well circumscribed. In addition, I use commit messages to provide additional context like:

  • Why did we want the software to do this?
  • Does this change resolve any other lingering issues?
  • Does this change introduce issues that developers should be aware of?
  • Does this change introduce any new code conventions?

Another developer (or my future self) can run git annotate on any of the lines that this commit changed and read the message that fully explains why it looks the way it does. We can even run a git command to see all the commits that changed a specific branch, file, or set of lines, so we can dig up a story of how this code has changed over time.

If you liked this piece, you might also like:

The rest of the Remote Work Category (including a long series on working from multiple locations)

This post on contributing to open source software (many OS teams are distributed)

Michael Lopp’s book on managing humans (or, if you don’t have time for that, my blog post on the book)
