Git from first principles
06 July 2021
A Git tutorial for both beginners and existing users, revealing all the magic by starting from the underlying concepts that are often hidden away
Introduction
Git is the best version control software I've used, but it has a reputation for being complex and confusing, in part thanks to confusing commands and lackluster documentation. In this article I've tried to create my own guide that'll provide readers with an understanding - and hopefully confidence - when using Git. Even if you've been a Git user for a while, you might learn a thing or two - I certainly did while I was writing this.
I think learning some of Git's internal concepts and mechanisms is crucial for understanding how to use the tool as a whole. These details are normally hidden by the user-friendly Git commands and graphical Git clients. Without knowing what Git commands are doing, it's easy to get lost when something goes wrong and not know how to recover. Instead of hiding them, I'll explain some of the under-the-hood behaviour while teaching the day-to-day Git commands.
If you prefer video explanations over long text articles like this, I'd recommend checking out The Missing Semester's lecture on Git,1 Git For Ages 4 And Up for Git beginners,2 or Git From the Bits Up for those who are more experienced.3 This article takes inspiration from these, but also tries to expand on what they didn't have time to cover.
Getting started
Let's start by checking the standard Git help:
$ git help
usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args>]

These are common Git commands used in various situations:

start a working area (see also: git help tutorial)
   clone             Clone a repository into a new directory
   init              Create an empty Git repository or reinitialize an existing one

work on the current change (see also: git help everyday)
   add               Add file contents to the index
   mv                Move or rename a file, a directory, or a symlink
   restore           Restore working tree files
   rm                Remove files from the working tree and from the index
   sparse-checkout   Initialize and modify the sparse-checkout

examine the history and state (see also: git help revisions)
   bisect            Use binary search to find the commit that introduced a bug
   diff              Show changes between commits, commit and working tree, etc
   grep              Print lines matching a pattern
   log               Show commit logs
   show              Show various types of objects
   status            Show the working tree status

grow, mark and tweak your common history
   branch            List, create, or delete branches
   commit            Record changes to the repository
   merge             Join two or more development histories together
   rebase            Reapply commits on top of another base tip
   reset             Reset current HEAD to the specified state
   switch            Switch branches
   tag               Create, list, delete or verify a tag object signed with GPG

collaborate (see also: git help workflows)
   fetch             Download objects and refs from another repository
   pull              Fetch from and integrate with another repository or a local branch
   push              Update remote refs along with associated objects

'git help -a' and 'git help -g' list available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.
See 'git help git' for an overview of the system.
All the commands listed above are called porcelain commands; they are high-level and user-friendly, and all you need when using Git day-to-day. Porcelain commands use plumbing to get stuff done under the hood. Plumbing is a separate set of fundamental data structures and utilities. These terms are an analogy, comparing this dichotomy to the porcelain and plumbing found in a bathroom: the pretty porcelain hides away the confusing plumbing, which you usually don't want to think about.
I'll explain Git-specific terms as we get to them, but you can refer to the Git glossary with $ man gitglossary for an explanation of any new terms you come across.4
Creating a repository
A Git repository is a directory with a specific structure and set of files, storing all of the data Git needs.
To create a new Git repository, navigate to an empty directory and run $ git init:
$ git init
$ tree -a
.
└── .git
    ├── branches
    ├── config
    ├── description
    ├── HEAD
    ├── hooks
    │   ├── applypatch-msg.sample
    │   ├── commit-msg.sample
    │   ├── fsmonitor-watchman.sample
    │   ├── post-update.sample
    │   ├── pre-applypatch.sample
    │   ├── pre-commit.sample
    │   ├── pre-merge-commit.sample
    │   ├── prepare-commit-msg.sample
    │   ├── pre-push.sample
    │   ├── pre-rebase.sample
    │   ├── pre-receive.sample
    │   ├── push-to-checkout.sample
    │   └── update.sample
    ├── info
    │   └── exclude
    ├── objects
    │   ├── info
    │   └── pack
    └── refs
        ├── heads
        └── tags
Git has now created a new hidden directory called .git - your new repository.
Your previously empty directory is known as your working tree (a.k.a. worktree or working directory). It's a regular directory on your computer, but Git will control its contents. Running Git commands will store the contents of files from the working tree into the Git repository, as well as restore files stored in the repository to the working tree.
When you run $ git ..., Git searches your current directory and its parents for a valid repository in a .git directory.
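As a rough illustration of that lookup, here's a minimal Python sketch. The function name find_git_dir is hypothetical, and it only checks that a .git directory exists; real Git also validates the repository and honours settings such as the GIT_DIR environment variable, which this sketch ignores.

```python
from pathlib import Path
from typing import Optional


def find_git_dir(start: Path) -> Optional[Path]:
    """Return the nearest .git directory at or above `start`, or None."""
    current = start.resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in [current, *current.parents]:
        candidate = directory / ".git"
        if candidate.is_dir():
            return candidate
    return None
```

This is why Git commands work from any subdirectory of your working tree: the search simply keeps walking upwards until it finds a repository.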
I'll remove the description file and hooks/ directory, since they're optional. You can see that an empty repository is pretty simple:
.git
├── branches
├── config
├── HEAD
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
The config file stores your repository-specific configuration - options set with $ git config are stored in this file. To start off, set your username and email, which will be saved along with the changes you make:
# The [user] section has been added
# The [core] section contains some internal Git configuration
I'll explain the rest of the files and directories as we get to them.
Saving snapshots with commits
The purpose of version control is to store snapshots of a directory tree (in this case the files within your working tree). Git stores these snapshots with commits. Commits contain a snapshot of all the files, and some contextual information, such as the author and a message to explain what was changed from the previous snapshot.
Let's step through the process and explain how Git creates a commit. First, let's check the current status of our working tree:
This tells us that:
- We're on the default branch of the repository, called master
- We haven't committed anything yet (i.e. Git hasn't saved any snapshots)
- Our working tree has no changes that could be committed
I'll go into branches later. For now, let's create a small text file so our commit has something to take a snapshot of:
Git helpfully tells us that we need to add file1.txt for Git to track it (i.e. include it in snapshots) so let's do that:
Using $ git add has added file1.txt to the staging area. It tracks the contents that will be saved in the next snapshot, which can differ from the contents of your working tree. This is handy when you've made several unrelated changes in your working tree, and you want to save them as separate commits.
Think of the staging area as a virtual working tree contained within the repository - running $ git add copies the contents of files from your working tree into the staging area. When you create a commit, Git saves a copy of the staging area as the snapshot of the directory.
The status now shows file1.txt under "Changes to be committed", indicating that it is staged. You could also have changes in tracked files that aren't staged yet, in which case they would be listed under "Changes not staged for commit".
When you add content to the staging area, Git immediately saves it - specifically, it saves a snapshot of each file in the staging area. After running $ git add on file1.txt, some new things are added to the repository:
.git
├── branches
├── config
├── HEAD
├── index
├── info
│   └── exclude
├── objects
│   ├── 1e
│   │   └── d6543483aafc93c5323daea1860bd7a29857d4
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
We now have two new files: index, and a file in objects known as an object. There are four types of Git objects: blobs, trees, commits, and annotated tags; this article focuses on the first three. The file in objects/1e/ is a blob: this type contains metadata and the (compressed) contents of a file. In this case it's the contents of file1.txt when you ran $ git add on it.
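To make "metadata and compressed contents" a little more concrete, here's a minimal Python sketch of how such a file could be unpacked. It assumes the commonly documented loose-object layout: zlib-compressed data consisting of a header like "blob 14", a NUL byte, then the file contents. The helper parse_loose_object is my own name for illustration, not part of Git.

```python
import zlib
from typing import Tuple


def parse_loose_object(raw: bytes) -> Tuple[str, int, bytes]:
    """Decompress a loose object and split it into (type, size, contents)."""
    data = zlib.decompress(raw)
    # The header and the contents are separated by the first NUL byte.
    header, _, contents = data.partition(b"\x00")
    kind, size = header.split(b" ")
    return kind.decode("ascii"), int(size), contents
```

In practice you'd pass it the bytes read from a file under .git/objects; the plumbing command $ git cat-file, covered later, does this job properly.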
The full name of an object starts with the name of the subdirectory it's contained in. You can check the contents of the blob by using $ git show.
# Prepend the "1e" from the subdirectory
If you've been running these commands yourself, you might notice that the name of the object is different in your repository - that's likely because you didn't put exactly the same contents in your file1.txt, or the metadata might be different if you're using a different version of Git. The object's name (and hence file path) is actually a hash of its contents. The hash represents the contents of the object with a unique fixed-length string. Due to the additional metadata in the object, it won't simply be the hash of the contents of file1.txt. As of writing, the hash algorithm used is SHA-1, but Git is switching to SHA-256 in the near future.5
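Here's a small Python sketch of how a blob's name can be derived, assuming the SHA-1 object format in which the hashed data is a "blob <size>" header, a NUL byte, and then the file contents. The helper names blob_name and loose_object_path are my own, chosen for illustration.

```python
import hashlib


def blob_name(contents: bytes) -> str:
    """Compute the SHA-1 name Git would give a blob with these contents."""
    # The header is the object type, a space, the size in bytes, and a NUL byte.
    header = b"blob " + str(len(contents)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + contents).hexdigest()


def loose_object_path(name: str) -> str:
    """Path of a loose object, relative to the .git directory."""
    # The first two hex characters become the subdirectory name.
    return f"objects/{name[:2]}/{name[2:]}"


print(blob_name(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```

If you have Git to hand, $ git hash-object file1.txt should print the same name as blob_name does for that file's bytes.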
Note that you don't need to specify the object's full name in these commands. Git can work with the prefix of a name, as long as there's no ambiguity:
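Conceptually, the lookup amounts to finding the one stored name that starts with the given prefix. A minimal Python sketch, where resolve_prefix is a hypothetical helper rather than Git's implementation (real Git also enforces a minimum prefix length):

```python
from typing import Iterable


def resolve_prefix(prefix: str, names: Iterable[str]) -> str:
    """Return the single object name starting with `prefix`, else raise ValueError."""
    matches = [name for name in names if name.startswith(prefix)]
    if not matches:
        raise ValueError(f"no object matches prefix {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"prefix {prefix!r} is ambiguous")
    return matches[0]
```

The more objects a repository holds, the longer a prefix tends to need to be before it is unambiguous.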
The index file stores the state of the staging area. It tracks the state of files in your working tree, including the associated staged blob for each file, and some additional information like Unix file permissions. Git documentation often refers to the index and "staging area" interchangeably.
You can inspect the staging area with the plumbing command $ git ls-files:
Let's finalise our commit, attaching a description for this snapshot with the --message option:
Let's run $ git show to see the last commit that we just created:
$ git show
commit 173bb18cd1059b1efb048dc32442eb34b36c78a4 (HEAD -> master)
Great! Git has created a commit object:
- named 173b...
- created by myself
- at a certain date and time
- containing a file called file1.txt with the contents "Some contents".
Let's peek into the repository again to see what's changed:
$ tree .git
.git
├── branches
├── COMMIT_EDITMSG
├── config
├── HEAD
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           └── master
├── objects
│   ├── 1e
│   │   └── d6543483aafc93c5323daea1860bd7a29857d4
│   ├── 58
│   │   └── fa13f5e9dff3635df5401f0aa8ef5868f18e29
│   ├── fe
│   │   └── 89dfe71214c0d45973c551c45a449b3b2f49f7
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   └── master
    └── tags
There's a lot more data in the repository now, so let's go through it.
COMMIT_EDITMSG is the file you edit to enter the commit message. It will be cleared for the next commit, but for now the message for the last commit remains.
We can inspect the objects that were created with another Git plumbing command, $ git cat-file:
# Get the type (-t) of the object
# Print (-p) its contents
Here we see Git has created a new type of object, a tree. Trees refer to a collection of blobs and other trees. For each blob entry, the tree stores:
- the name of the blob
- the name of the file the blob stores the contents of
- the associated Unix file permissions
For each subtree entry, the tree stores:
- the name of the tree
- the name of the directory the tree stores the contents of
Thus each tree represents a single directory, including its files and subdirectories. This tree in particular represents the root of our repository, which only contains a single file: file1.txt.
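For the curious, here's a hedged Python sketch of how a tree's entries could be serialised and named, assuming the SHA-1 format where each entry is "<mode> <name>", a NUL byte, and the 20 raw bytes of the entry's object name. The names TreeEntry, tree_body, and tree_name are hypothetical, and the sketch sorts by plain name, whereas Git's real ordering treats directory names slightly differently.

```python
import hashlib
from typing import List, Tuple

# (mode, file or directory name, object name as 40 hex characters)
TreeEntry = Tuple[str, str, str]


def tree_body(entries: List[TreeEntry]) -> bytes:
    """Serialise entries in the layout a tree object stores them in."""
    body = b""
    for mode, name, hex_name in sorted(entries, key=lambda entry: entry[1]):
        # Each entry: "<mode> <name>", a NUL byte, then the 20 raw hash bytes.
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_name)
    return body


def tree_name(entries: List[TreeEntry]) -> str:
    """Hash the serialised tree, prefixed with its "tree <size>" header."""
    body = tree_body(entries)
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()


print(tree_name([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
```

A regular file typically has mode 100644, so the root tree in this example would hold a single entry such as ("100644", "file1.txt", "1ed6543483aafc93c5323daea1860bd7a29857d4").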
And now for the final object:
As you might have noticed, this is the object referenced by the earlier $ git show; it represents the new commit. Commit objects primarily contain:
- a reference to the tree for the root directory of this snapshot (i.e. the tree corresponding to the root of the working tree)
- the author and committer along with the date and time that each occurred
- the commit message
- a reference to the parent commit(s). This commit has no parents since it is the first commit in the repository
The author and committer are separate entries to support situations where one person creates a snapshot and shares it with someone else, who then creates a commit from it in their repository. Storing the author's identity keeps their name attributed to the snapshot.
Git generated the commit's tree from the index. In this case, it's created a single tree for the root directory, which contains only file1.txt. When there are staged files in subdirectories, index entries are collected together to form a tree for each directory. Each tree is linked to the trees that represent its subdirectories, ultimately forming the root tree which represents the full snapshot of the directory at the time of the commit.
Other information in the commit can either come from you (i.e. the commit message), or be inferred, such as the author/committer's name and email, and the date and time of the commit.
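Putting those pieces together, here's a rough Python sketch of what a commit object's contents look like before compression, assuming the SHA-1 format. The functions commit_body and commit_name are hypothetical helpers, and the identity string in the usage note is made up for illustration.

```python
import hashlib
from typing import Sequence


def commit_body(tree: str, parents: Sequence[str], author: str,
                committer: str, message: str) -> bytes:
    """Build the text of a commit object: headers, a blank line, the message."""
    lines = [f"tree {tree}"]
    # A root commit has no parent lines; a merge commit has several.
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return ("\n".join(lines) + "\n").encode()


def commit_name(body: bytes) -> str:
    """Hash the commit text, prefixed with its "commit <size>" header."""
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()
```

The author and committer strings take the form "Name <email> <unix timestamp> <timezone>", e.g. "A U Thor <author@example.com> 1625529600 +0000". Because the parent names are part of the hashed text, two commits with identical trees and messages but different parents get different names.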
With commits, we have a way of tracking snapshots of the repository and a way of saving some contextual information about said snapshots, which serves the fundamental purpose of a version control system. The rest of Git is primarily for manipulating objects so that they can represent more than a simple linear history, and to support collaboration between multiple users.
Normally you'll only be concerned with commits. You can run $ git log to list the "current" commit followed by its ancestors. Right now it'll show the first commit, 173bb18:
$ git log
commit 173bb18cd1059b1efb048dc32442eb34b36c78a4 (HEAD -> master)
This commit is also our current HEAD, which means it will be a parent of the next commit that is created.
Showing your changes
$ git status lists the files that differ between the previous commit and the staging area, and between the staging area and the working tree. You can use $ git diff to see a detailed list of the lines that have been changed in each file. Remember that, at this point, the staging area matches the previous commit:
# Edit file1.txt
# Create file2.txt
The diff lists lines that have changed between two versions of the same file. New lines are prefixed with +, and removed lines with -. If multiple parts of a file have been changed, the diff will group them into distinct hunks, which encapsulate changes in nearby lines, with some unchanged lines for additional context.
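To see hunks form without involving Git at all, here's a small sketch using Python's standard difflib module, which emits the same unified diff format. Its diffing algorithm is not Git's, so treat the output as an illustration of the format only; the file names and line contents are made up.

```python
import difflib

# A hypothetical 20-line file with changes at the very top and very bottom.
old = [f"line {i}\n" for i in range(1, 21)]
new = list(old)
new[0] = "first line changed\n"
new[19] = "last line changed\n"

diff = list(difflib.unified_diff(old, new, fromfile="a/file.txt", tofile="b/file.txt"))

# Each hunk starts with a header line beginning with "@@".
hunk_headers = [line for line in diff if line.startswith("@@")]
print("".join(diff))
```

The two changes are far enough apart that they land in separate hunks, each surrounded by three lines of context (difflib's default).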
You can see that the diff references the staged blob for file1.txt, 1ed6543, as well as a new blob name ff709a8. The new blob name is calculated from the file in the working tree, but is not actually saved as an object in the repository yet - a $ git add file1.txt would result in a blob with the name ff709a8... being created.
By default the diff shows changes between the staging area and the working tree, and excludes untracked files like file2.txt. If we stage all changes, the staging area will match the working tree, and so a normal $ git diff won't display anything:
# No difference between staging area and working tree, so no diff is shown
We can instead use $ git diff --staged to specify that we want to see the changes that we have staged. That means the differences between the last commit's tree and the staging area:
$ git diff --staged
diff --git a/file1.txt b/file1.txt
index 1ed6543..ff709a8 100644
+A new line
diff --git a/file2.txt b/file2.txt
new file mode 100644
index 0000000..1ed6543
+Some contents
Some contents
If you were to create a commit right now and then run $ git show, this same diff would be shown under the commit.
Here's a graph to visualise the commands and the current states of the working tree, index, and the tree of the current HEAD. The red lines represent what the commands are comparing:
Skipping the staging area
If the staging area is inconvenient or unnecessary, you can specify the files you want to commit after $ git commit, e.g. $ git commit file1.txt file2.txt. This will immediately commit the contents of these files as they are in the working tree.
Alternatively, run $ git commit --all. This will immediately stage all tracked files as they currently are in the working tree, and commit them. Note that you still need to use $ git add to track new files that you're committing for the first time, since --all only considers files that Git already tracks.
Staging hunks within a file
A common situation is having made several changes in your working tree and later realising you could split them up into several commits. E.g. you've fixed multiple bugs within a single source file. The staging area gives you the power to precisely control what goes into a snapshot, and this power makes it easier to keep commits self-contained. You can make sure each commit includes changes that serve a single purpose, that the project still builds correctly, and that all tests still succeed.
If you have multiple changes in a single file and want to split them into different commits, you can stage only a subset of your changes with $ git add --patch:
This interactive form goes through each hunk of the diff in the file(s) you want to stage, letting you choose what to do with it. If you ask for help with ?, here are your options:
y - stage this hunk
n - do not stage this hunk
q - quit; do not stage this hunk or any of the remaining ones
a - stage this hunk and all later hunks in the file
d - do not stage this hunk or any of the later hunks in the file
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help
The most important ones are, unsurprisingly, y and n, for "yes" and "no". s will let you split a hunk up if it has unchanged lines between changes, which can be handy if Git hasn't managed to split the diff up the way you want it:
(1/1) Stage this hunk [y,n,q,a,d,s,e,?]? s
Split into 2 hunks.
+First line
Some contents
(1/2) Stage this hunk [y,n,q,a,d,j,J,g,/,e,?]? y
Some contents
+Last line
(2/2) Stage this hunk [y,n,q,a,d,K,g,/,e,?]? y
If you want to change the contents of the hunk, you can use e to edit it manually:
diff --git a/file1.txt b/file1.txt
index 1ed6543..4a9a39f 100644
+First line
Some contents
+Last line
(1/1) Stage this hunk [y,n,q,a,d,s,e,?]? e
The following will open in your editor. I've manually added the line containing +Another line:
# Manual hunk edit mode -- see bottom for a quick guide.
+First line
Some contents
+Another line
+Last line
# ---
# To remove '-' lines, make them ' ' lines (context).
# To remove '+' lines, delete them.
# Lines starting with # will be removed.
#
# If the patch applies cleanly, the edited hunk will immediately be
# marked for staging.
# If it does not apply cleanly, you will be given an opportunity to
# edit again. If all lines of the hunk are removed, then the edit is
# aborted and the hunk is left unchanged.
After saving and quitting, you can see that your hunk has been staged without changing the file in your working tree:
# The staged diff shows "Another line" that was added by editing the hunk
# Comparing the working tree to the staging area, "Another line" is shown as
# removed, as editing the hunk didn't change the file in the working tree
Restoring after changes
If you've staged something and later decided you don't want to include it in the next commit, you can use $ git restore --staged to reset the staging area to match HEAD:
Use the --patch option to interactively pick which hunks to unstage, similar to $ git add --patch.
You can use $ git restore to restore the contents of the files in your working tree to match the staging area. Note that this will delete your changes from the working tree, and they can't be recovered from the repository if you haven't staged or committed them.
Finally, if you want to remove a file in a commit, you can remove the file from your working tree and subsequently run $ git add to stage the removal, or simply use $ git rm to remove and stage in one command.
refs: heads, tags and HEAD
A ref is an alias for a specific commit or another ref. Refs are more user-friendly than full commit names, and can be used in Git commands in their place. They are stored in files under .git/refs, each containing either a full commit name or the name of another ref.
Tags are user-specified refs, contained in .git/refs/tags. They're typically used to mark significant commits with a user-friendly name, e.g. labelling the commit used for a software release with "v1.0".
To create a new tag, use $ git tag:
$ git show lists the refs associated with a commit next to its name. In this case it's showing that refs HEAD, master, and the tag first point to the first commit in the repository.
A head (lowercase) points to a commit that is the "tip" of a branch. Branches represent a "line of development". When you add a commit while on a branch, Git automatically updates the head ref of a branch to point to the new commit, so the tip is maintained. "Branch" and "head" are sometimes used interchangeably in Git documentation. I'll expand on branches in the next section.
Heads are stored in .git/refs/heads. For example, heads/master tracks Git's default branch, master. Right now, it contains the name of the first commit object that was created.
HEAD (uppercase) is a special ref that tells Git which commit will be the parent of the next commit. It will normally point to a head, e.g. master, which is what the HEAD -> master represents in the previous $ git show:
So when HEAD is pointing to heads/master, that means you are on branch master, and creating a new commit will move heads/master to point to it.
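Since refs are just small text files, following HEAD to a commit can be sketched in a few lines of Python. This assumes the loose-file layout described above, where a symbolic ref such as HEAD contains a line like "ref: refs/heads/master"; resolve_ref is a hypothetical helper that ignores packed refs and other details real Git handles.

```python
from pathlib import Path


def resolve_ref(git_dir: Path, name: str = "HEAD") -> str:
    """Follow symbolic refs from `name` until reaching a commit name."""
    contents = (git_dir / name).read_text().strip()
    while contents.startswith("ref: "):
        # A symbolic ref holds the path of another ref, e.g. refs/heads/master.
        contents = (git_dir / contents[len("ref: "):]).read_text().strip()
    return contents
```

Calling resolve_ref(Path(".git")) on a branch follows HEAD to the head file and returns the commit name stored there; in a detached HEAD state the loop never runs, because HEAD already contains a commit name.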
When HEAD is pointing to a specific commit instead of a branch, this is known as
being in a "detached HEAD" state. You can resolve this situation by manually
manipulating your HEAD, which will be covered later.
Most Git commands will default to using HEAD as their argument, including $ git tag. You could specify any commit by name or by ref if you wanted, e.g. $ git tag <tagname> 173bb18 or $ git tag <tagname> HEAD.
Objects and refs are the two foundational components of Git: all operations involve manipulating some combination of the two.
Branches
The link between commits is that a commit refers to its parent(s) - this means a commit can have any number of "children". Through this, your commit history can diverge by having a commit with several children, and be rejoined later by a commit with several parents. These separate chains of commits are called branches, and they allow you to create multiple chains of commits that exist in parallel.
Git's default branch is called master, but its name is configurable and so may vary between platforms and teams. Branching conventions also vary, so some repositories might use their default branch for active development, while others may only update it with each release to end-users. See extra resources for some examples.
You can add and list branches with $ git branch, and switch which one you're on with $ git switch. Switching branches will also update the index and working tree to match the target:
# Create a new branch from the current HEAD, call it "my-branch"
# Branching has created a new head ref called my-branch
# Both heads point to the first commit, "Add file1.txt"
# HEAD still points to master
# Switching updates HEAD to point to my-branch
At this point we have a single commit, which is pointed to by the heads of both master and my-branch and the tag first. It is also currently our HEAD.
The above diagram represents the current commit history of the repository - in this case, just a single box for the first commit. The labels without boxes represent the current refs: the heads for master and the new branch my-branch, the tag first, and finally HEAD. These point to the contents of the ref, i.e. another ref or a commit.
Let's add a new commit while we're on my-branch:
# Use the --allow-empty option so we don't need to commit any changes
# HEAD still points to my-branch
# master still points to the first commit, "Add file1.txt"
# my-branch now points to the new commit, "Commit on my-branch"
# List HEAD followed by its chain of ancestors
# --oneline gives us a single-line summary
Now we've got a new commit, 630d4e3, whose parent is the first commit, 173bb18. Since HEAD was pointing to heads/my-branch, Git updated this head to point to the new commit, so it continues to track the tip of the branch. Note that heads/master and tags/first both continue to point to the original commit 173bb18, and HEAD still points to heads/my-branch:
Note that in the diagram, 630d4e3 points to its parent - it is the child commit which refers to its parent(s), not the other way around.
Let's switch back to master and create another commit:
# --all includes the tips of all branches
# --graph visualises ancestry: commits are asterisks, lines show parents
At this point, master and my-branch have diverged: they both contain commits that the other does not.
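"Diverged" can be stated precisely in terms of parent links. Here's a toy Python model of the history so far, using made-up labels instead of real commit names; the ancestors helper is hypothetical and simply walks parent links from a tip.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], tip: str) -> Set[str]:
    """All commits reachable from `tip` via parent links, including `tip`."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


# Each commit maps to its list of parents; the first commit has none.
parents = {"first": [], "on-my-branch": ["first"], "on-master": ["first"]}

only_on_master = ancestors(parents, "on-master") - ancestors(parents, "on-my-branch")
only_on_branch = ancestors(parents, "on-my-branch") - ancestors(parents, "on-master")
print(only_on_master, only_on_branch)  # {'on-master'} {'on-my-branch'}
```

Two branches have diverged exactly when both of these sets are non-empty.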
Branches are useful for separating changes that are a work-in-progress, like implementing a new feature to your application. Working on a separate branch avoids disrupting others with your potentially broken changes, and also avoids collisions between branches until you're ready to merge your changes.
Merge commits
Once you're happy with the state of your branch and want to include your changes in another branch, you can rejoin your branch to master (or any other). This is done with $ git merge, which will create a new commit with multiple parents, combining the changes in all of them:
# The merge command shares several options with commit, including --message
# The new commit has two parents:
# Now "Commit on my-branch" also appears in the history of master
After merging, master has another new commit f0d4, which has two parents from different branches: 6679 from my-branch and d86e from master. This single commit merges the commits from my-branch into master - such commits are called merge commits. The merge commit message comes from the message specified in the $ git merge command. The head of my-branch remains pointed to 6679, as the merge commit was made on the master branch, so we say that my-branch has been merged into master.
From here you could continue to add new commits on my-branch and master, and they could be merged together again later.
Alternatively, if you're finished with the branch, you can delete it:
Notice that the log still displays the separate branches in history - this is thanks to the merge commit. Git traverses both parents of the merge to show the point at which master and the other branch diverged, and shows that they were merged again. However, the head for my-branch has been removed.
Fast-forward merging
Sometimes a merge commit isn't necessary - for example, when merging branches that haven't diverged. In such cases, Git can simply adjust the head of the destination branch to match the branch being merged. These merges are known as fast-forwards, and Git can automatically detect when it's possible and apply this when you run $ git merge.
For an example, let's create a new commit on another branch:
# Create my-branch again and switch to it
# Switch back to master
Now we have a single new commit on my-branch, pointing to the current head of master, meaning these two branches have not diverged. Let's merge my-branch into master once more:
Git has automatically determined that the branches haven't diverged, and so a fast-forward was performed instead of creating a merge commit.
$ git merge lets you control its strategy with a few options:
- --ff: Create a merge commit if the branches have diverged, otherwise fast-forward. This is the default behaviour.
- --no-ff: Always create a merge commit.
- --ff-only: Only fast-forward. If the branches have diverged, the merge fails.
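The decision made by the default --ff behaviour can be sketched as a reachability check over parent links. This is a toy Python model with hypothetical names (is_ancestor, merge_kind) and single-letter commit labels, not Git's actual implementation.

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def merge_kind(parents: Dict[str, List[str]], current: str, other: str) -> str:
    """Classify what merging `other` into `current` would do by default."""
    if is_ancestor(parents, other, current):
        return "already up to date"  # nothing new to bring in
    if is_ancestor(parents, current, other):
        return "fast-forward"        # just move the current head forward
    return "merge commit"            # the branches have diverged
```

For example, with parents = {"a": [], "b": ["a"]}, merging "b" while on "a" is a fast-forward. Under this model, --ff-only would fail in the "merge commit" case, and --no-ff would create a merge commit even in the "fast-forward" case.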
Finally, let's delete my-branch again, since we're finished with it:
# Delete my-branch again
Fast-forwarding saves us from creating merge commits when they're unnecessary, which can help keep the commit history tidy. However, without a merge commit to indicate that a merge has taken place, the fact the merged branch existed is hidden. This may or may not be desirable, so it's worth thinking about when deciding between a normal merge and a fast-forward:
Cherry-picking commits
If there's a single commit in another branch that you want to incorporate into your branch, you can copy it using $ git cherry-pick. This will take the diff of the target commit (i.e. only the changes introduced in that commit) and apply it to your current working tree, then create a new commit copying the target's message and author.
Take a situation where you have a number of commits in another branch:
# Create a new branch
# Create two commits, the first one is empty
# Whilst the second one adds a new file called file2.txt
# Resulting in the following history:
Now we have a couple of commits that are only on branch cherrypick-from. If we want to include only one of these commits in another branch, we can cherry-pick to copy it. For example, let's create a new branch based on master and copy the "Second cherry-pick commit" commit which creates file2.txt:
# Create and switch another branch based off master, called cherrypick-to
# Use cherry-pick on the head of cherrypick-from
# i.e. cb3c9c2: "Second cherry-pick commit"
Now a "Second cherry-pick commit" commit also exists on cherrypick-to, which has added the new file2.txt. Note that the cherry-picked commit has a different hash from the original, indicating that they are distinct from one another. The parent of the new commit is different - even if everything else in the commit was copied exactly, the object's hash would differ and thus the commits would be different objects.
# Clean up: delete the cherry-pick branches
# Use the --force option to delete unmerged branches, which will cause the new
# "Cherry-pick" commits to be lost
Solving conflicts
A conflict is an error that can occur when Git attempts to merge changes from commits that have different ancestries. Specifically, it occurs when two diverged branches have applied different changes to the same part of the same file:
# Create a new branch and create a commit that modifies file1.txt
# Switch back to master and add a commit that modifies file1.txt
# Attempt to merge in the conflict branch
Git pauses during the merge to alert us - it has tried to automatically merge the changes that branch conflict has made to file1.txt, but failed. This is what we call a conflict. If we check $ git status, we can see that we are in a new state with "unmerged" files:
As the status suggests, if you don't want to go through with the merge at this point, you can run $ git merge --abort to cancel it.
Unmerged files are tracked within the index, which now associates three different blobs with file1.txt:
The first blob is the original contents of file1.txt, before either branch applied the changes that are causing the conflict. The second is the contents of the file in the target branch, master, and the third from branch conflict.
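Those three versions are what make an automatic merge possible at all. As a simplified Python sketch of the idea, here is the decision for a single region of a file; merge_three_way is a hypothetical helper, and real Git works hunk by hunk over line-based diffs rather than comparing whole strings.

```python
from typing import Optional


def merge_three_way(base: str, ours: str, theirs: str) -> Optional[str]:
    """Merge one region given three versions; None signals a conflict."""
    if ours == theirs:
        return ours    # both sides ended up with the same contents
    if ours == base:
        return theirs  # only their side changed this region
    if theirs == base:
        return ours    # only our side changed this region
    return None        # both sides changed it differently: a conflict
```

In this repository both branches changed the same line of file1.txt in different ways, which is the final case: there's no way to pick a winner automatically.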
Inside the repository, there are also a number of files that Git uses to track information about the current merge:
# MERGE_HEAD, MERGE_MODE, MERGE_MSG, and ORIG_HEAD have now appeared:
MERGE_HEAD contains the name of the commit currently being merged into the target branch. In this case, it contains the name of the tip of conflict.
Like HEAD, despite not being stored in .git/refs, MERGE_HEAD is a valid ref:
ORIG_HEAD contains the target commit which MERGE_HEAD is being merged into. In this case, it contains the name of the tip of master at the time the merge started. This is the commit that you will be returned to if you cancel the merge:
Both of these commits will be used as parents of the merge commit once all conflicts have been resolved.
MERGE_MSG contains the message that will be used for the merge commit, and MERGE_MODE contains the merge strategy being used (e.g. no-ff).
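Here's a small sketch that checks where MERGE_HEAD and ORIG_HEAD point while a merge is paused, and that aborting cleans them up. The repository, identity, and commit messages are placeholders of mine rather than the ones from the example above.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "Original line" > file1.txt
git add file1.txt && git commit -q -m "Base commit"
git checkout -q -b conflict
echo "Changes from the conflict branch" > file1.txt
git commit -q -a -m "Commit on conflict"
git checkout -q master
echo "Changes from master" > file1.txt
git commit -q -a -m "Commit on master"

git merge conflict || echo "Merge paused"

# Files that track the merge in progress
ls .git | grep -E '^(MERGE_HEAD|MERGE_MODE|MERGE_MSG|ORIG_HEAD)$'

# Both names resolve like any other ref
merge_head=$(git rev-parse MERGE_HEAD)   # tip of conflict
orig_head=$(git rev-parse ORIG_HEAD)     # tip of master when the merge started
echo "MERGE_HEAD=$merge_head"
echo "ORIG_HEAD=$orig_head"

# Cancelling the merge removes MERGE_HEAD and leaves master where it was
git merge --abort
```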
In this unmerged state, file1.txt contains a combination of the conflicting changes:
<<<<<<< HEAD
Changes from master
=======
Changes from the conflict branch
>>>>>>> conflict
The lines beginning with <, =, and > are called conflict markers, and indicate the areas of the file in which conflicts have occurred. The first section of a conflict area is between the begin marker line <<< and ===, and is labelled with the ref or commit (in this case, HEAD). This label indicates where those changes have come from. Similarly, the lines between === and the ending marker >>> are the contents of the other ref or commit in this conflict, in this case the tip of conflict.
Generally, "our" changes are in the first section, and "their" changes are in the second section. In a merge, "our" changes are from the current branch, which is the one being merged into, and "their" changes are from the branch that is being merged from.
To resolve the conflict, replace the conflict markers with the contents that should be there after the merge process. This can be the contents from either marker region, or some combination of the two. For this example, I'm going to change the line entirely:
# Stage file1.txt to mark it as merged
You'll be prompted for a commit message as usual:
# Conflicts:
# file1.txt
#
# It looks like you may be committing a merge.
# If this is not correct, please run
# git update-ref -d MERGE_HEAD
# and try again.
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# All conflicts fixed but you are still merging.
#
After you save and quit your editor, the merge commit will be created:
Instead of showing just a simple before-and-after, the diff shows the changes applied to both the contents from master and conflict, followed by the new change which replaced both. This is a combined diff, showing the change that the new commit introduces in comparison to all of its parents.6 If we kept the contents from either master or conflict the diff would be empty; technically the merge would have introduced no new changes. In that case, you could instead use $ git diff <merge-commit>^..<merge-commit> to show changes that the merge introduced into the target branch. I'll explain this syntax in the next section.
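The whole resolution can be sketched end to end as follows. This is a minimal example under my own assumptions (throwaway repository, placeholder identity, made-up contents), and it passes the merge message with -m so that no editor opens.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "Original line" > file1.txt
git add file1.txt && git commit -q -m "Base commit"
git checkout -q -b conflict
echo "Changes from the conflict branch" > file1.txt
git commit -q -a -m "Commit on conflict"
git checkout -q master
echo "Changes from master" > file1.txt
git commit -q -a -m "Commit on master"
git merge conflict || echo "Merge paused"

# Resolve by replacing the whole marked region, then stage to mark it as merged
echo "Resolved line" > file1.txt
git add file1.txt
git commit -q -m "Merge branch 'conflict'"   # concludes the merge

# The merge commit has two parents: the old tip of master and the tip of conflict
parents=$(git log -1 --format=%P HEAD | wc -w | tr -d ' ')
echo "Parents: $parents"

git show HEAD            # combined diff against both parents
git diff HEAD^ HEAD      # what the merge changed relative to master only
```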
# Delete the new branch to clean up
Exploring history and branches
Resetting HEAD
$ git reset changes the current HEAD (or the current branch's head) to the specified target, and optionally affects the staging area and working tree. Manually changing refs is handy in situations where you want to manipulate history to, for example, undo a commit, or to update a branch to match another branch.
To undo the commits made during the conflicts example, we can reset back to 10498bd, "Fast-forward commit":
By default, $ git reset won't affect the working tree, so file1.txt keeps the contents it had in 7c5b805. This is called a "mixed" reset. Git lists file1.txt under "Unstaged changes after reset" to indicate this file doesn't match the staging area after the reset.
If you want to reset HEAD but leave the staging area as it is, you can use the --soft option:
# A soft reset won't affect the staging area either
# The staging area keeps the changes that were staged in file1.txt
Finally, a hard reset using --hard will set everything to match the target: HEAD, the staging area, and the working tree:
And now we're back to "Fast-forward commit", as if the conflict merging had never happened!
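The three reset modes can be compared side by side with a sketch like this one. It uses a throwaway repository with two commits of my own invention, and $ git status --short to show what each mode leaves behind.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "first version" > file1.txt
git add file1.txt && git commit -q -m "Commit 1"
echo "second version of the file" > file1.txt
git commit -q -a -m "Commit 2"
first=$(git rev-parse HEAD^)
second=$(git rev-parse HEAD)

# Soft: only HEAD moves, so the newer contents are still staged
git reset -q --soft "$first"
soft_status=$(git status --short)     # "M  file1.txt" (staged)
git reset -q --soft "$second"         # undo: HEAD back on Commit 2

# Mixed (default): HEAD and the staging area move, the working tree does not
git reset -q "$first"
mixed_status=$(git status --short)    # " M file1.txt" (unstaged)
git reset -q --hard "$second"         # undo again

# Hard: HEAD, staging area and working tree all match the target
git reset -q --hard "$first"
hard_status=$(git status --short)     # empty
cat file1.txt
```

In the short status format the first column describes the staging area and the second the working tree, which is why the position of the M differs between the soft and mixed resets.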
Referring to commits by ancestry
You can also refer to commits from their descendants in a relative manner, which saves you the hassle of searching for names when dealing with the recent ancestors of a ref:
# "The parent of HEAD"
# "The grandparent of HEAD"
The ^ (caret) character means "parent of", so repeating the character will traverse as many commits as you want.
Note how commits with multiple parents are handled: HEAD^^ shows "Commit on master", despite the merge commit also having "Commit on my-branch" as a parent. This is because the first parent is implicitly chosen - for merge commits this is a commit on the branch that was merged into.
So ^ really means first parent. To manually choose which parent you want, use a number following the ^ (e.g. ^2 means "second parent"):
# "Second parent of the parent of HEAD"
You can keep adding ^ characters afterwards to traverse more parent commits.
Instead of repeating ^ for each parent, you can use ~ (tilde), which always means "first parent". Appending a number to ~ specifies how many first parents you want to traverse, and so is equivalent to using ^ sequentially that number of times:
# "The first grandparent of HEAD"
# "The grandparent of HEAD" again
You can combine both systems to get to any commit you want:
# Traverse the second parent of the merge commit to get the first commit
# First parent -> second parent -> first parent
# The same path: ^ and ~ can be substituted if no numbers are appended
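To experiment with these names, here's a sketch that rebuilds a similar history (a merge of my-branch into master, followed by one more commit). The repository, file names, and the subject helper are my own; only the commit messages mirror the example.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo 1 > a.txt && git add a.txt && git commit -q -m "First commit"
git checkout -q -b my-branch
echo 2 > b.txt && git add b.txt && git commit -q -m "Commit on my-branch"
git checkout -q master
echo 3 > c.txt && git add c.txt && git commit -q -m "Commit on master"
git merge -q -m "Merge my-branch" my-branch
echo 4 > d.txt && git add d.txt && git commit -q -m "Commit after merge"

# Hypothetical helper: print the first line of a commit's message
subject() { git log -1 --format=%s "$1"; }

subject 'HEAD^'      # Merge my-branch
subject 'HEAD^^'     # Commit on master    (first parent of the merge)
subject 'HEAD^^2'    # Commit on my-branch (second parent of the merge)
subject 'HEAD~3'     # First commit        (three first-parent steps)
subject 'HEAD^^2^'   # First commit        (via the second parent)
subject 'HEAD~1^2~1' # First commit        (the same path written with ~)
```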
Garbage collection and the reflog
When using $ git reset and other commands that affect refs, you can enter situations where commits are no longer accessible from any ref. But, as demonstrated earlier, you can still access these unreachable commits and reset to them. They aren't immediately deleted, so there's no need to worry about immediately losing your work.
If you've lost and forgotten the name of a commit, you'll be able to find it using the reflog ("ref-log"), which keeps a local history of changes to refs. These are stored in .git/logs.
In this case, the last few entries in the reflog are showing the most recent commits and the resets from the previous section. The reflog introduces another relative name syntax, specifically for the previous states of refs. This follows the format ref@{offset}, with older entries using larger offsets:
# HEAD@{0} is equivalent to HEAD
A reflog is kept for each head, so as long as an unreachable commit is accessible through an entry in any reflog, Git will keep it around. Reflogs are pruned over time when you run Git commands, by default keeping the last 90 days of history,7 which is plenty of time in most cases to recover useful work that has been accidentally made unreachable. Unreachable objects are created fairly regularly while using Git (e.g. when you stage a file multiple times before committing it) - garbage collection stops these objects from bloating your local repository.
Don't rely on the reflog to save things that should be kept safe in the long-term - use branches or the stash instead.
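As a quick illustration of recovering a commit through the reflog, here's a sketch in a throwaway repository. The commits and file contents are placeholders of mine.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "first version" > file1.txt
git add file1.txt && git commit -q -m "Commit 1"
echo "second version of the file" > file1.txt
git commit -q -a -m "Commit 2"
lost=$(git rev-parse HEAD)

# After this reset, "Commit 2" is no longer reachable from any branch
git reset -q --hard 'HEAD^'

# Newest entry first: the reset, then the two commits
git reflog

# HEAD@{1} is where HEAD pointed before the most recent change
previous=$(git rev-parse 'HEAD@{1}')

# Recover the commit by resetting back to it
git reset -q --hard 'HEAD@{1}'
git log --format=%s
```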
Stashing changes
A common situation is to have some additional changes after a commit, but these changes may affect test results, which makes it harder to check your commit stands on its own. Alternatively, you might want to quickly switch to working on something else, starting from a clean slate whilst saving your existing changes. Instead of creating a feature branch or temporary commit that you'll forget about, you can use the stash:
Git has added a stash ref that points to a commit:
# The stash commit is a merge commit, merging another commit into the current
# head of master. Here's the commit it's merging:
Stashing creates two commits: "index on...", which contains only changes that were staged, and "WIP on...", which contains the remaining changes you had in the working tree. The HEAD at the time of the stash is used as a parent for both commits, with its name being appended to the message of the stash commits. The "WIP" commit is a merge commit, merging the "index" commit into the HEAD, so it includes both the staged and unstaged changes. Finally, refs/stash points to the "WIP" commit.
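You can confirm this structure with the parent syntax from earlier. The sketch below assumes a throwaway repository with one staged and one unstaged change; the file names are my own.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "first version" > file1.txt
git add file1.txt && git commit -q -m "Commit 1"

echo "staged" > staged.txt && git add staged.txt     # a staged change
echo "second version of the file" > file1.txt        # an unstaged change
git stash push -q

head=$(git rev-parse HEAD)
wip_parent=$(git rev-parse 'stash^1')      # first parent of the "WIP" commit
index_parent=$(git rev-parse 'stash^2^')   # parent of the "index" commit

git log -1 --format=%s stash               # WIP on master: ...
git log -1 --format=%s 'stash^2'           # index on master: ...

# The "index" commit only holds what was staged
index_files=$(git diff --name-only HEAD 'stash^2')
echo "$index_files"
```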
You can list your saved stashes with $ git stash list:
The list uses the syntax for ref history, which hints that the stash system relies on the reflog to keep track of stash commits. Use $ git reflog to inspect the reflog in .git/logs/refs/stash:
Older stash commits are listed later in the reflog, and have a larger index. Unlike other reflogs, Git does not garbage collect old entries in the stash reflog (by default), so it's safe to save changes there in the long term.
Use $ git stash apply to apply (i.e. unstash) the "WIP" commit, placing its changes back into the working tree. In this example we only had unstaged changes; had any changes been staged, stash apply would restore them to the working tree as well:
After applying it, the stash entry is kept in the log, in case you want to keep it around. You can use $ git stash drop to remove the top stash commit from history, without applying it. Alternatively, you can drop a specific stash by name or index, or use $ git stash clear to remove all of them:
# Create a new stash, using the --message option to set the stash commit
# message
# No stashes remain
# The stash reflog has been cleared
To apply and drop a stash in one command, use $ git stash pop:
# Create another stash
# $ git stash push is the same as $ git stash
# No stashes remain
Use the --index option with pop or apply to also restore the staging area from the "index" commit, i.e. to re-stage the changes that were staged before the stash. Without it, those changes are restored to the working tree only.
Instead of stashing all changes in your working tree, you can pick which ones you want by providing a list of files after a -- (double-dash) to separate them from the options: $ git stash push -- file1.txt file2.txt.
Use the --patch option with $ git stash push to interactively select which changes you want to stash, similarly to $ git add --patch.
By default, stashing won't include untracked files - if you want to include them, use the --include-untracked option with $ git stash push.
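A couple of these options can be tried out with the following sketch. It assumes a throwaway repository with two tracked files, and the file names and contents are my own.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "one" > file1.txt && echo "one" > file2.txt
git add . && git commit -q -m "Commit 1"

echo "staged change" > file1.txt && git add file1.txt   # staged
echo "unstaged change" > file2.txt                      # not staged
git stash push -q

# Without --index both changes come back unstaged
git stash apply -q
plain=$(git status --short)
git reset -q --hard                # discard them; the stash entry is kept

# With --index the staged change is staged again
git stash pop -q --index
with_index=$(git status --short)
git reset -q --hard

# Untracked files are only stashed when asked for
echo "new file" > untracked.txt
echo "unstaged change" > file2.txt
git stash push -q --include-untracked
after_push=$(git status --short)   # empty: untracked.txt was stashed too

printf 'apply:\n%s\napply --index:\n%s\n' "$plain" "$with_index"
```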
Some Git commands like $ git merge have an --autostash option, which is handy if your working tree is dirty when you want to do a merge. This option stashes your changes before the merge, then applies the stash after it.
Rewriting history
Commit objects are immutable: the name of a commit is derived from the contents of all the files in the repository, the date and time of the commit, and several other factors, so changing any of these produces a commit with a different name. If we want to edit commits in the repository, we can't simply change the existing commits. We can however create new commits based on some existing ones and replace the existing commits, effectively rewriting history.
This is handy for correcting mistakes and keeping commits self-contained. Since all commits are performed on your local repository rather than relying on an external server, you're free to rewrite history before you're happy to share it with others. You can also rewrite history after sharing your commits, but this is a more dangerous operation.
Undoing commits
If you simply want to get rid of the commit on the tip of the branch, you can use $ git reset (as explained in the section on resetting HEAD).
Amending commits
The simplest scenario is wanting to edit the commit on the tip of the branch (i.e. the most recent one in history) - $ git commit has the option --amend which you can use in this scenario. Instead of adding a new commit on top of the tip, amending replaces the tip: Git uses your currently staged contents to create a new commit object with the same parent as the last commit, then moves the branch to it. You'll be presented with the previous commit's message to edit:
You'll then be presented with this in your editor:
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Date:
#
# On branch master
# Changes to be committed:
#
#
After saving and quitting, your amend will be applied:
# 33cda34 has replaced 10498bd
# Undo the amend by resetting back to 10498bd
# You could also reset to HEAD@{1}, which will also point to 10498bd
The amended commit keeps the author from the original commit, but the committer will store new information, i.e. who amended the commit, and when.
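Here's a sketch of an amend that shows the commit being replaced rather than edited. It runs in a throwaway repository, and the deliberately misspelled message and extra file are my own invention.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "one" > file1.txt
git add file1.txt && git commit -q -m "Fast-forward comit"   # typo to fix
before=$(git rev-parse HEAD)

# Stage a forgotten file and fix the message in one go
echo "extra" > file2.txt && git add file2.txt
git commit -q --amend -m "Fast-forward commit"
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"

# The original commit is still reachable through the reflog
old=$(git rev-parse 'HEAD@{1}')
git log --format='%h %s'     # history contains only the new commit
```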
Rebasing
If your situation involves something other than the commit at the tip of a branch, $ git rebase provides much more flexibility (and complexity).
Rebasing is the process of taking a contiguous set of commits and reapplying each to create a new set. It can reapply those commits on top of a different "tip", changing the commit that the set is attached to - think of this as "re-parenting".
You can use $ git rebase with another branch as the target (known as the new base), which will take the commits on the current branch that aren't in the target branch and reapply them to the current tip of the target. In this case, "reapplying" a commit effectively means cherry-picking it. In the end, your branch will be reformed into one you can merge with a fast-forward. To demonstrate:
# Create a new branch from the first commit, using the tag 'first'
Here's a clearer visualisation of the commit graph:
Now three commits are parented to the first commit, including the new commit we just made. Now let's try rebasing:
Now our new commit is parented to the tip of master instead of the first commit:
# Delete the branch to clean this example up
# Use the --force option since there are unmerged commits
# Commit 3ce5c9d will become unreachable
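The re-parenting can be checked with a sketch like the one below. The branch name rebase-me, the file names, and the commit messages are hypothetical; the tag first follows the example above.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo 1 > a.txt && git add a.txt && git commit -q -m "First commit"
git tag first
echo 2 > b.txt && git add b.txt && git commit -q -m "Commit on master"

# Branch off the first commit and add a commit there
git checkout -q -b rebase-me first
echo 3 > c.txt && git add c.txt && git commit -q -m "Commit to rebase"
old=$(git rev-parse HEAD)
echo "parent before: $(git log -1 --format=%s 'HEAD^')"   # First commit

# Reapply the branch's commits on top of the tip of master
git rebase -q master
new=$(git rev-parse HEAD)
echo "parent after:  $(git log -1 --format=%s 'HEAD^')"   # Commit on master

# master is now an ancestor of the branch, so merging it would fast-forward
git merge-base --is-ancestor master rebase-me && echo "fast-forward possible"
```

The commit gets a new name because its parent changed, even though its message and the change it introduces are the same.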
Note that conflicts can occur while commits are being reapplied. In that case, Git will inform you and let you interactively resolve the conflict similarly to a merge. Use:
- $ git rebase --abort to cancel the rebase entirely, and restore the branch to its state before the rebase started
- $ git rebase --skip to skip the conflicting commit and resume the rebase
- $ git rebase --continue after the conflict has been resolved to amend the conflicting commit and resume the rebase
Git will fast-forward (i.e. skip over) commits that don't need to be rewritten, which saves you from accidentally rewriting commits that you shouldn't. You can use the --no-ff or --force-rebase options to explicitly disable this behaviour.
Like $ git merge, $ git rebase has an --autostash option to quickly stash away changes in your working tree before rebasing, and reapply them after the rebase is complete.
Interactive rebasing
You can manually influence the rebase process, which makes rebasing a much more powerful tool than simply copying a set of commits to a new tip. Rebase using the --interactive option and Git will open your editor, presenting you with the list of commits that will be affected. You can edit the list to specify what to do with each commit:
The following is opened in your editor:
#
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup <commit> = like "squash", but discard this commit's log message
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# . create a merge commit using the original merge commit's
# . message (or the oneline, if no original merge commit was
# . specified). Use -c <commit> to reword the commit message.
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
Each line is a command, the first word being the command itself and the rest being its arguments. If the argument is a single commit, its name can be followed by any text, which Git uses to show the first line of the commit message. Lines starting with a # are comments which won't be interpreted as a command.
The comments below the list give you a quick tutorial on the available commands. pick is the most basic command, which means to cherry-pick the specified commit. Commands are executed from top to bottom, so reorder the pick commands if you want to change the order of the new commits. Place a squash command immediately after the commit that the argument should be combined into.
Once you save and quit, Git will run through each command in sequence and you'll be left with a history rewritten exactly the way you want it. The rebase process may be interrupted if you use an interactive command like edit or break, or if there are any conflicts. In this case you can use the --continue option once you're done.
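If you'd like to try this without an editor in the way, the sketch below scripts the edit. It relies on the GIT_SEQUENCE_EDITOR environment variable, which Git uses to edit the command list, and a sed one-liner standing in for the change you would normally type by hand. The repository and commits are placeholders of mine.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt" && git commit -q -m "Commit $n"
done

# Stand-in for editing by hand: turn the second line from "pick" into "drop".
# Git appends the path of the command list to the GIT_SEQUENCE_EDITOR command.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/drop/'" \
  git rebase -q --interactive --root

# "Commit 2" and the file it added are gone from the rewritten history
git log --format=%s
ls
```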
Squash and fixup commits
If you simply want to amend a commit that isn't at the tip of the branch, you can use $ git commit --squash <commit> and $ git commit --fixup <commit>, which both interact with the rebase process. The new commit's message will be prefixed with "squash!" or "fixup!", followed by the first line of the target commit's message. If you then run $ git rebase --interactive --autosquash <commit> (where <commit> is the parent of the earliest commit you want to amend), the list of commands will appropriately reorder and replace the command for squash and fixup commits.
This will save you the trouble of manually editing the command list in one go, instead setting up the commands as you commit. For example:
# Use --root to include root (i.e. first) commit in the rebase
You'll be presented with the following command list:
In this case the first commit will be amended, and our commit that merged commits 2 and 3 will be skipped.
You can run $ git config --global rebase.autoSquash true to enable the --autosquash option by default.
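Here's a sketch of the fixup workflow from start to finish. To keep it non-interactive it sets GIT_SEQUENCE_EDITOR=true, which accepts the generated command list unchanged; the repository, files, and commit messages are my own.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo 1 > a.txt && git add a.txt && git commit -q -m "Commit 1"
echo 2 > b.txt && git add b.txt && git commit -q -m "Commit 2"
echo 3 > c.txt && git add c.txt && git commit -q -m "Commit 3"
target=$(git rev-parse 'HEAD^')          # "Commit 2", the commit to amend

# Record the correction as a fixup commit aimed at the target
echo "2 fixed" > b.txt && git add b.txt
git commit -q --fixup "$target"
git log --format=%s                      # "fixup! Commit 2" is on top

# Rebase everything after the target's parent; --autosquash moves the fixup
# under its target and marks it as "fixup" in the command list
GIT_SEQUENCE_EDITOR=true git rebase -q --interactive --autosquash "$target^"

git log --format=%s                      # Commit 3, Commit 2, Commit 1
git show 'HEAD^:b.txt'                   # the corrected contents
```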
Remotes
Everything discussed so far are things you can do on your local repository - the final piece of the Git puzzle is how to collaborate with others by sharing your commits.
Git can keep track of remote repositories, which are normal repositories that exist at some location outside your .git repository. These can be elsewhere on your local system, or over the local network/internet using protocols like HTTP(S) or SSH.
Creating a new remote
To create a remote repository for our current repository, we can use $ git clone. For example, if your working tree exists in the subdirectory repository:
Cloning copies a repository into another location, in this case repository/.git into the directory remote. With the --bare option, Git will make the target directory a plain repository without a working tree, so now remote mirrors repository/.git, including refs and reachable objects only.
To instead set up an empty repository for a remote, you can run $ git init --bare. Similarly to $ git clone --bare, this repository won't have a working tree associated with it.
You can still use a few normal Git commands on a bare repository:
# Without a working tree most operations will fail
Back in our normal repository, we can register the new repository as our primary remote, called origin:
# A new section has been added for [remote "origin"]
Remotes are configured in the local repository config file. The url option is the location of the remote - in this case, a relative path to the remote on our local filesystem. More formats are supported, including https://... and ssh://....8
Cloning is the normal way to get a copy of an existing repository. The cloned URL will automatically be added as the origin remote, but more remotes can be added and existing ones adjusted with $ git remote set-url. Simply copying someone's .git directory would also work, but you would end up copying data which you probably don't want, such as their personal configuration, stashes, local branches, and unreachable objects.
Fetching
In order to collaborate, several people will use the same remote in their local repositories. Local repositories download the refs and objects stored in remote repositories through $ git fetch, which is known as fetching:
Fetching is controlled by the fetch option for the remote: +refs/heads/*:refs/remotes/origin/* - this means "download all refs under refs/heads/ in the remote repository to the local repository, and store them locally under refs/remotes/origin/". All objects that are reachable from those refs and not stored locally are also downloaded and saved in the local repository's object store.
With fetching, we have a mechanism for synchronising from a remote repository to a local one, and have new refs for keeping track of heads from remotes. These are called remote-tracking branches:
Logs now show the origin/master head alongside master for commit 4, showing that these heads have been synchronised since the last fetch. Note how remotes/ can be (and usually is) omitted from the head name.
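The remote setup and first fetch can be reproduced with a sketch like this. It follows the directory names used above (repository and remote) inside a temporary directory; the single commit is a placeholder.

```shell
work=$(mktemp -d)
git init -q "$work/repository" && cd "$work/repository"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
echo 1 > a.txt && git add a.txt && git commit -q -m "Commit 1"

# A bare clone next to the repository plays the role of the remote
git clone -q --bare . ../remote

# Register it as origin; this writes the [remote "origin"] config section
git remote add origin ../remote/
refspec=$(git config --get remote.origin.fetch)
echo "$refspec"                 # +refs/heads/*:refs/remotes/origin/*

# Download the remote's heads into refs/remotes/origin/
git fetch -q origin

# The remote-tracking branch and the local head name the same commit
git rev-parse refs/remotes/origin/master master
git log --oneline --decorate
```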
Pulling
Whilst fetching will simply download the remote's refs and new objects, you may also want to incorporate changes others have made into your local branches. This is done with $ git pull, and is called pulling.
Pulling will simply fetch, then merge your current head with its counterpart in the remote. The counterpart is not set by default - set it with $ git branch --set-upstream-to=<remote>/<branch>. For example:
This will add a branch section in .git/config:
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[user]
name = William
email = william@example.com
[remote "origin"]
url = ../remote/
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master
If we now attempt pulling:
As the origin and local heads are equal, no merge is necessary. If there were a difference, a merge commit would be created or your local head would be fast-forwarded. In the former scenario, this would create a new commit with the sole purpose of incorporating the changes others have introduced, which isn't ideal. Instead, if you already have new commits on your local branch, you can rebase them onto the origin's head with $ git pull --rebase. If you prefer this over the default behaviour, you may be interested in the config options pull.rebase and branch.autoSetupRebase.
Don't rewrite shared history
Avoid changing history that already exists on a remote. Essentially, this means not changing a history that currently contains an existing remote's head into one which no longer contains that head. For example:
Currently the remote head origin/master exists in the history of our HEAD, pointing to the same tip commit. If we were to amend the tip commit, this would no longer be the case:
Now in branch master, 10498bd has been replaced with 33cda34, so origin/master is no longer part of our local history: the local and remote branches have diverged. We can still force the remote to update its head to match our new local head with $ git push --force, but there's the possibility that someone else has already sent commits to the remote, and these commits will be lost because our local repository doesn't have them in its history. This can also cause trouble for everyone working locally on the branch with the old history. When they try to pull your changes, they could encounter conflicts on commits they didn't create, and be forced into a confusing merge or rebase as they try to apply the old history on top of the new history of the remote.
With common branches like master, people will be regularly pulling and applying new changes on top of the remote's version of that branch. It's particularly important to maintain the existing history of these branches to avoid inconveniencing their users. As such, it's better to create new commits rather than trying to rewrite the existing ones.
For example, if you want to undo a commit that hasn't been pushed yet, you can drop said commit with a rebase or reset. If that commit has already been pushed, use $ git revert to create a new commit that undoes the changes in another.
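As a minimal sketch of the second case, the following undoes a commit by adding a new one on top of it. The repository and the "bad" commit are made up for illustration, and --no-edit accepts the default revert message instead of opening an editor.

```shell
# Throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "good contents" > a.txt
git add a.txt && git commit -q -m "Commit 1"
echo "a mistake" > a.txt
git commit -q -a -m "Bad commit"
bad=$(git rev-parse HEAD)

# Create a new commit that applies the inverse of the bad one
git revert --no-edit "$bad"

cat a.txt                    # back to the contents from Commit 1
git log --format=%s          # the bad commit is still part of history
```

Because the existing commits are untouched, anyone who has already pulled the bad commit can pull the revert with a plain fast-forward.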
Thankfully, remotes will prevent you from rewriting branch history by default. However, this isn't an absolute rule, and you can force remotes to accept the rewritten history in cases where it makes sense.
For example, if you're working on temporary feature branches, it may be acceptable to first rebase them before merging them into a main branch. Your team should set rules on when shared branch history may be rewritten, and shared repositories can be configured to block forced rewrites to enforce those rules.
If you find yourself having accidentally rewritten shared history, try cancelling your current merge or rebase, or use the reflog to (mixed) reset back to the commit before you rewrote history:
# Or reset --hard (if you want to also reset the working tree and index!)
You may then have to cherry-pick any new commits created on top of the rewritten history.
Pushing
Finally, you can push (i.e. upload) changes to a remote with $ git push.
This uploads the head of the current branch and any associated objects, then updates the remote head in our local repository.
If your local history isn't up to date (i.e., someone else has pushed commits since your last pull), you'll be met with an error like this:
To ../remote/
! [rejected] master -> master (non-fast-forward)
error: failed to push some refs to '../remote/'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
In this case you'll need to perform a pull.
You'll get the same error if you rewrote shared history and caused the remote head to be lost in your local history.
If you're sure you want to push these changes anyway, you can use $ git push --force-with-lease to ignore the error and force the remote to update and match your local history. This option checks that the heads in the remote match your remote-tracking branches, so you won't accidentally lose commits that have been made since your last fetch. However, this option is not infallible; it would fail to protect you if another process is updating your tracking branches in the background.9
Similarly, the --force option also ignores the error, but doesn't perform any checks on the remote heads, so it should be used with even more caution. As mentioned previously, remotes can configure whether or not forced pushes are allowed on individual branches.
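To see the rejection and the lease check in action, here's a sketch with two clones of one bare remote. The clone names alice and bob, the identify helper, and all commits are hypothetical.

```shell
work=$(mktemp -d)

# Hypothetical helper: give a repository a placeholder identity
identify() {
  git -C "$1" config user.name "$2"
  git -C "$1" config user.email "$2@example.com"
}

# Seed repository, a bare remote cloned from it, and two working clones
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
identify "$work/seed" seed
cd "$work/seed"
echo 1 > a.txt && git add a.txt && git commit -q -m "Commit 1"
git clone -q --bare "$work/seed" "$work/remote"
git clone -q "$work/remote" "$work/alice" && identify "$work/alice" alice
git clone -q "$work/remote" "$work/bob" && identify "$work/bob" bob

# Bob pushes first
cd "$work/bob"
echo bob > b.txt && git add b.txt && git commit -q -m "Bob's commit"
git push -q origin master

# Alice commits without fetching, so her history lacks Bob's commit
cd "$work/alice"
echo alice > c.txt && git add c.txt && git commit -q -m "Alice's commit"
git push -q origin master
plain_status=$?      # non-zero: the push is rejected
git push -q --force-with-lease origin master
lease_status=$?      # non-zero: origin/master is stale, so the lease fails

# Integrate Bob's commit first; the push is then a fast-forward
git pull -q --rebase origin master
git push -q origin master
final_status=$?
git -C "$work/remote" log --format=%s master
```

A plain --force in Alice's position would have succeeded and silently discarded Bob's commit, which is exactly what the lease check guards against.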
Extra resources
Listed below are extensions and additional functionality that you may find useful if this article has held your interest:
- .gitignore files let you configure which files shouldn't be tracked.
- $ git bisect lets you efficiently search history for changes that introduced issues.
- $ git worktree lets you have multiple working trees that share a single repository.
- $ git sparse-checkout lets you create a working tree that contains a subset of files in a repository.
- Submodules let you maintain repositories as a subdirectory of another repository.
- git-svn is a bridge for using Git with an SVN repository.
- Git Large File Storage and git-annex are extensions which let you use Git to track large files without bloating local repositories with their full histories.
If you'd like to learn more about branching models, take a look at git-flow and GitHub flow.
If you prefer GUIs, Git has a built-in interface which you can launch with $ git gui. The Git website maintains a list of third-party GUI clients. If you're an Emacs or Vim user, I'd recommend checking out Magit or Fugitive respectively.
References
Git For Ages 4 And Up by Michael Schwern @ Linux.conf.au 2013. YouTube, Wayback Machine
Git From the Bits Up by Tim Berglund @ JAXconf 2013. YouTube, Wayback Machine
All supported remote URL formats are listed in https://git-scm.com/docs/git-fetch#_git_urls