Git Good

13 minute read

git ilustration

Git is a free and open source distributed Version Control System (VCS). Git can be hard, especially for people discovering it for the first time, but there are a lot of things that make this an exceptional VCS worth learning. This post is not meant to be a git tutorial, but rather a compilation of things and tips I find interesting. I also wanted to give a perspective of git independent of services like GitHub. Hopefully you can find something useful and “git better at it” 🙃.

If you’re looking to learn how to use git, take a look at the Pro Git book, it’s free and it is a great resource!

Git is not GitHub 😮

Services like GitHub made git particularly popular, but the convenience of something like Github has its downsides for new users: it causes confusion about the differences between GitHub or GitLab, and the actual version control system, git. While the former are great to host git repositories and make them publicly available (among other things), git is a piece of software designed to allow for distributed version control and collaboration. This means that you can use it offline, without GitHub, you can collaborate with people via chat, email, and anything that allows you to send text to someone else. Yes, GitHub makes it convenient to host your project and make it discoverable; it provides an incentive for collaboration, but, at the end of the day, GitHub is not git, just as Gmail is not e-mail.

Git is distributed 🖧

For those used to GitHub, collaborating with git over e-mail might seem anachronistic, but, consider the scale of projects and what git was designed to do. Being able to send contributions as plain text allied with the threaded nature of e-mail, means that you can have multiple discussions around certain aspects of a contribution, which is a big plus. Linux kernel development, for example, doesn’t happen on GitHub or GitLab. From the linux kernel GitHub Pull requests (GitHub’s system for submitting contributions):

Thanks for your contribution to the Linux kernel!

Linux kernel development happens on mailing lists, rather than on GitHub - this GitHub repository is a read-only mirror that isn’t used for accepting contributions. So that your change can become part of Linux, please email it to us as a patch.

Sending patches isn’t quite as simple as sending a pull request, but fortunately it is a well documented process.

Here’s what to do:

  • Format your contribution according to kernel requirements
  • Decide who to send your contribution to
  • Set up your system to send your contribution as an email
  • Send your contribution and wait for feedback

The development happens on various mailing lists for multiple subsystems. It’s distributed development to an unprecedented scale. This scale has a price, discipline in your contributions, something that GitHub doesn’t really help reinforce –which seems to be the whole issue that the creator of git Linus Torvalds has with it.

Git daemon 🌐

Focusing the development on a centralised service like GitHub can feel like you’re using an improved version of Subversion client/server approach, but remember, git is truly a decentralized system and we can do better. Suppose you don’t want to use GitHub, but want to make your repository available for other people to clone, pull from, etc. Git supports Peer-to-Peer setup out of the box with git daemon. Collaboration then happens on each peer local copy of the source tree and some form of communication channel (e.g. e-mail, chat, etc).

From your git folder you can execute the following command

git daemon --export-all --base-path=. --reuseaddr
  • --enable=upload-pack: (enabled by default) allows for git fetch, git pull, and git clone.
  • --export-all: means that you don’t have to create a file named git-daemon-export-ok in each exported repository.
  • --base-path: allows people to clone projects without specifying the entire path. Example: if you start the daemon with --base-path=/srv/git and try to pull from git://example.com/hello.git, git daemon will interpret the path as /srv/git/hello.git.
  • --reuseaddr: allows the server to restart without waiting for old connections to time out.

Congratulations, your machine is now running a git server and anyone can do:

git clone git://192.168.1.42/  #Your IP
# or
git remote add Foo git://192.168.1.42/
git fetch Foo
git checkout develop
git push Foo develop

Git patches 📝

As we have discussed, git is a decentralized system, you can send contributions to anyone without the need of a centralized git repository. This is not only the default way of collaborating with git, it is particularly useful when a server is down, or if you don’t have permissions to write to a remote repository —but would still like to propose changes.

You can create a patch from your current changes without committing the code on your source tree:

# changes in the working tree not yet staged for commit
git diff > big-improvements.patch
# or changes between the index and your last commit;
# what you would be committing if you run "git commit"
# without "-a" option.
git diff --cached > big-improvements.patch

As an example suppose the change is the addition of a README file, the diff would look like this:

diff --git a/README b/README
new file mode 100644
index 0000000..e69de29

More often, what we would like to do is to propose a change we have made in your local source tree. To do this, we can use the git format-patch command:

git format-patch master

if we have commits ahead of the master branch, a diff file will be generated. We can also reference other commits in the same branch. Suppose we made changes in the current branch and want to reference the changes in relation to the last commit. We can do:

git format-patch HEAD~1

the patch file will contain something like:

From daf1010eb425a67ca6b0ba60f7cbec15bcff31f1 Mon Sep 17 00:00:00 2001
From: John Doe <heresjohnny@bestmail.com>
Date: Tue, 09 Sep 2020 15:42:00 +0100
Subject: [PATCH] Update README

---
 README | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README b/README
index e69de29..d0fc019 100644
--- a/README
+++ b/README
@@ -0,0 +1 @@
+This is a README file, that is all.
--
2.28.0

to apply the patch we do:

# patch as unstaged changes in your branch
git apply 0001-Update-README.patch

# patch as commits
git am 0001-Update-README.patch

For more information, check the documentation for git diff, git format-patch, git apply, and git am.

Git tips 🔥

We are all bound to get stuck sometimes when things go wrong. A good starting point is this compilation. These are solutions to problems I often have to solve.

Premature commit 🔧

You pulled the trigger too fast on that commit and wish you could include additional changes? Make your changes, call git commit --amend and done. (Also useful to change the commit message.)

Go back ⌛

We messed up, go back to a previous commit.

# go back n commits n=1 in this case,
# --soft: optionally don't discard changes
# and put them on the staging area instead
git reset HEAD~1 --soft

Put changes on hold 🚧

So you want to get back to the state before you started making changes, but don’t want to throw these changes away:

# stash the changes
git stash
# get the changes back when needed
git stash pop

Where it went wrong 🔍

You have a problem and don’t know which commit introduced it, enter git bisect:

git bisect start          # start a bisect section
git bisect bad            # Current version is bad
git bisect good v2.2.1    # v2.2.1 is known to be good

bisect will now choose commits in the middle of the history and you can mark them as good or bad with the same commands. When no more revisions are available you’ll have a description of the commit that caused the problem. You can then reset the bisect state with git bisect reset.

Rebasing commits 💥

Sometimes we make two commits when in reality, we could have included all the changes in a single commit, and our history would be clearer. This is what is known as squashing. We can use git rebase to meld commits with previous commits. It can also be used instead of merging branches. Git’s rebase command temporarily rewinds the commits on your current branch, pulls in the commits from the other reference and reapplies the re-winded commits back on top.

Most people will advise you to always squash the commits and rebase them onto the parent branch (like master or develop) before you submit a pull request or send out a patch. Whether rebasing is preferred to merging really depends on the context.

If you want to read more about rebase vs merge, check out this post

Whatever you do DO NOT rebase commits in a upstream repository people can pull from. It will mess everyone’s history and lead to conflicts that all downstream peers will need to fix. Also, it’s never a good idea to rebase somebody else’s work, see this discussion.

# rebase last 2 commits
git rebase -i HEAD~2

The interactive system will open an editor where you can choose each commit in a list that are about to be changed. In this case, it will list 2 lines with the last 2 commits. This list reflects exactly how your branch will look like after the rebase:

pick c8175df added line
pick dc58443 added final line

# Rebase 65e38e0..dc58443 onto 65e38e0 (2 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# r, reword <commit> = use commit, but edit the commit message
# s, squash <commit> = meld into previous commit
# f, fixup <commit> = like "squash", but discard log message
# ...

You can pick, reorder or squash any commits you want to make for a more readable history.

As we discussed, you can also rebase the current branch onto another

git checkout feature
git rebase -i master

For more information about rebase see the documentation

Good commit messages 🧐

Commit messages should be consistent across a project in terms of style, content, and metadata. But some good rules are as follows:

  1. the minimum is a single short descriptive line (e.g. less than 50 characters);
  2. separate subject, body, other data, with blank lines;
  3. include metadata such as references to commits that introduce problems being solved.
  4. wrap the body to 72 characters.

A good one liner can be something like this

Fix typos in the abstract

the form <Verb> <Target> <Description> is sometimes enough context to describe a simple change.

A great way to learn what good commit messages look like is to study repositories where this is done right. It is to no surprise that the Linux kernel and git itself are good examples. There are also plenty resources on the subject such as How to Write a Git Commit Message.

Good commit messages are an important collaboration tool. They are the best way to communicate context about a change to a fellow developer, or to our future selves. git diff will tell us what changed, the WHAT, but this added context documents the WHY.

Git for writing 👨‍💻

I have been experimenting with git for writing. The idea being that we can benefit from using version control with our papers, lecture notes, blog posts, etc. Suppose we are considering git to track changes in a scientific paper. We can mark submissions with tags, use patches to incorporate changes from collaborators, branches to work on revisions, git diff to visualize the changes, etc.

Remember that git cares about meaningful lines, so the first thing to take into account is that we should, at minimum hard wrap our lines at a given character limit, and/or write each sentence on a different line. By using Markdown or LaTex to write a document, we will be tempted to use the editor for soft word wrapping. This can lead to huge one-line paragraphs. The problem with this is that changing a word in that paragraph will be recorded as a change to the entire paragraph. Moving lines around has a similar effect.

As a workaround, to visualise changes in such cases, we can use the --color-words option with git diff.

Consider a LaTeX document, for example. If we move lines around and call the following command:

git diff --color-words

this is what we get:

But what is actually recorded in the diff is the following:

To generate a pdf to visualise the changes in our tracked latex document, check out the git latexdiff tool that wraps around git and latexdiff.

The result of the following command is an output pdf with the changes in relation with the previous commit

git latexdiff HEAD~1

Archiving projects 🏛️

This is not related to git specifically, but more with good open science practices. Online repository hosts like Github are not archival. Git itself lets us tag commits to mark versions and releases, but tags can be deleted, and storage is totally dependent on us. Remember, git distributed, and local. Git is meant to track changes and collaborate, not archive the state of projects.

Platforms like Zenodo, on the other hand, let you conveniently archive versions of GitHub repositories based on GitHub releases (tags).

If you already use GitHub as a public accessible mirror for your project tracked by git, this means you can easily archive certain versions of your project and automatically make them citable since Zenodo attributes a Digital Object Identifier (DOI) to its submissions.

With this said, we should discuss how to archive, backup, or share your entire git repository without depending on any specific platform. There are a couple of options.

You could zip it but… 🗄️

A zip of your project folder will include EVERYTHING, yes, including your .git folder with all the changes, branches, reflogs but this might not be what you actually want. A zip of the project folder will include untracked files, and you could accidentally share sensitive information, irrelevant temporary files or IDE and editor configuration folders, etc.

Mirror clone 🔗

A git clone with the option --mirror creates a bare repository (which contains only the stuff in the .git directory) and it maps all refs (including remote-tracking branches, etc.) to the target repository. This means that these refs can be updated by a git remote update.

git clone --mirror myrepo repo.git
# or some remote repository
git clone --mirror https://github.com/davidenunes/repo repo.git

You could make a backup of your repository by creating a mirror (bare) repository and archive it.

To restore the bare repository, you can do the following:

mkdir myrepo
mv repo.git myrepo/.git
cd myrepo
git init
git checkout -f

If you want to refresh the backup you just need to call git remote update from the clone location.

Bundling 📦

Git is capable of bundling its data into a single binary file. You need to list out every reference or specific range of commits that you want to be included. If you intend for this to be cloned somewhere else, you should add HEAD as a reference as well. Alternatively, you can use --all to include all refs.

Within the project folder do:

# this will include all info to recreate the master branch
git bundle create repo.bundle HEAD master
# optionally --all for all refs to be included
git bundle create repo.bundle --all

You can then send or store this file, and in another machine unbundle it into a repository:

git clone -b master repo.bundle repo

If you don’t include the HEAD reference (or --all) you will get the following error:warning: remote HEAD refers to non-existent ref, unable to checkout.

Using --all makes your bundle file match what you would get with git clone --mirror

Thank You

Thank you for reading ❤️ I would love to know what you think, if you do things differently, or have any other neat tips / suggestions. Let me know in the comment section. You can also follow me on Twitter, or subscribe to the RSS feed for more content.

Comments