When you travel to a country where you don't speak the native language, there is frequently no time to learn it properly. If you are to accomplish anything on your own, you need to know a few essential words and phrases. The same is true for Git. This lesson won't teach you to become a Git expert. Instead, we want you to be able to understand some of the vocabulary and to communicate your desires to Git and your actions to others. Along the way, we'll introduce a few Git concepts (nouns), commands (verbs), and command-line arguments (adverbs). In most cases, the English meaning of a word will help you recall its meaning in Git. Please keep in mind, though, that Git uses some of its verbs and nouns very differently from other revision control systems.
After this lesson, students should be able to:
copy an existing repository with the git clone command;
inspect a repository's history with git log;
navigate that history with git reflog and git checkout;
restore a file with git checkout.

The first concept we introduce is the repository. The repository contains a directory of files and folders and their revisions going back to its creation.
We will start with a small repository with only a few commits that has been created for you to practice with.
We will copy the bio-pipeline
repository from GitHub user ahmadia
using our second Git command, the clone command.
When we execute git clone,
the command makes a complete copy of another repository.
By default, it creates a Git repository of the same name
in the directory where you entered the command.
The command (and its successful output)
should look similar to this:
$ git clone https://github.com/ahmadia/bio-pipeline.git
Cloning into 'bio-pipeline'...
remote: Counting objects: 41, done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 41 (delta 19), reused 23 (delta 4)
Unpacking objects: 100% (41/41), done.
Checking connectivity... done
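The default directory name comes from the last part of the URL. As a rough sketch in Python (not Git's actual implementation, which handles more URL forms), the rule can be approximated as follows; the helper name default_clone_dir is hypothetical:

```python
from urllib.parse import urlparse


def default_clone_dir(url: str) -> str:
    """Guess the directory name `git clone` would create for `url`.

    Simplifying assumption: take the last path component and drop a
    trailing ".git". Real Git covers more cases (e.g. scp-style URLs).
    """
    path = urlparse(url).path.rstrip("/")
    name = path.rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


print(default_clone_dir("https://github.com/ahmadia/bio-pipeline.git"))
```

For the URL used in this lesson, the sketch yields bio-pipeline, which matches the "Cloning into 'bio-pipeline'..." line above.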
You can now enter the repository (which is also a directory on your file system) by typing:
$ cd bio-pipeline
If we now type ls,
we see that the repository
has some code
and a few data files.
$ ls
2013-05-24-2760-2763.txt Lumi.2761.csv Lumi.2763.csv
Lumi.2760.csv Lumi.2762.csv python_pipeline.ipy
If we add the -a flag to show everything,
we can see that Git has created a hidden directory called .git:
$ ls -a
. 2013-05-24-2760-2763.txt Lumi.2762.csv
.. Lumi.2760.csv Lumi.2763.csv
.git Lumi.2761.csv python_pipeline.ipy
Git stores information about the project in this special sub-directory. If we ever delete it, we will lose our local copy of the project's history, and any changes or commits we had not published yet.
We are looking
at the latest revision,
also referred to in the Git documentation as a commit,
of the bio-pipeline
repository.
If we want to see the name of this revision,
we use the log
command.
By default, when we execute git log, it gives us
information about this revision
and every other revision
made before it.
We use the command-line argument --max-count 1
to inform Git that we only want to see the current one.
$ git log --max-count 1
commit 61fd2bcece2126cdd8ee24f40a04c18d39403022
Author: Aron Ahmadia <aron@ahmadia.net>
Date: Tue Jun 4 10:59:21 2013 -0400
Made fixes to Python pipeline
Our fingers are starting to get sore from all of this typing.
Luckily, -n
is a common shortcut for number of things
in programming and at the command line.
To save a few keystrokes, we will instead type:
$ git log -n 1
which is equivalent to the previous command.
The output of the command
provides a summary of the revision
Aron Ahmadia committed to the repository on June 4, 2013.
The line: Made fixes to Python pipeline
is Aron's commit message.
The alphabet soup of digits and letters
starting with 61fd2
is called a hash.
The hash uniquely identifies this revision,
and was automatically generated by Git
as the final step of the commit process.
We can think of the hash as an identifier
permanently affixed to this exact version
of the code and data.
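To make "automatically generated" a little more concrete, here is a minimal Python sketch of how Git derives a hash from content. It assumes the default SHA-1 object format and hashes a file's contents (a "blob") rather than a whole commit; a commit's hash is computed in the same spirit over a description that includes its files, parents, author, and message. The function name git_blob_hash is hypothetical.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a file with these contents.

    Git prefixes the content with a small header ("blob", the size in
    bytes, and a NUL byte) before hashing.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty file always gets the same well-known hash.
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Changing even one character produces a completely different hash.
print(git_blob_hash(b"Lumi 2760\n"))
print(git_blob_hash(b"Lumi 2761\n"))
```

Because the hash depends only on the content, two people with identical content always compute the same identifier, which is why it can serve as a permanent name.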
Each revision's parent is the previous version of the code and data,
and immediately precedes it in history.
We can see each revision's parents
as output from git log
by adding the --parents
flag.
$ git log --parents -n 1
commit 61fd2bcece2126cdd8ee24f40a04c18d39403022 8595b710e3be4b2bf01d51a1c55842510b82ff87
Author: Aron Ahmadia <aron@ahmadia.net>
Date: Tue Jun 4 10:59:21 2013 -0400
Made fixes to Python pipeline
Notice that the parent revision is referred to only by its hash. Since the hash uniquely identifies this revision, this is the only information we need to look up the state of the repository when the parent revision was created.
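The parent links are what turn a pile of revisions into a history. The following Python sketch models that idea with a hypothetical three-revision repository (the names c1, c2, c3 are made up, and each revision is assumed to have at most one parent; real merge commits have more):

```python
from typing import Dict, List, Optional

# Hypothetical history: each revision maps to its parent.
# The very first revision has no parent (None).
PARENTS: Dict[str, Optional[str]] = {
    "c3": "c2",
    "c2": "c1",
    "c1": None,
}


def history(start: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Follow parent links from `start` back to the first revision."""
    chain: List[str] = []
    current: Optional[str] = start
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


print(history("c3", PARENTS))  # ['c3', 'c2', 'c1']
print(history("c1", PARENTS))  # ['c1']
```

Starting from the newest revision reaches everything; starting from an older one only reaches that revision's ancestors.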
Usually, we are interested in
how a revision differs from its parent.
We can see these differences
by adding the -p flag (or --patch)
to the git log command.
After you enter this command,
you will see the changes in this revision
presented in diff format.
If the output does not fit on the screen,
Git will pipe it into a pager (by default, less).
You can scroll up and down
through the log
by using the up and down arrow keys.
When you are done,
just press q.
$ git log -n 1 -p
commit 61fd2bcece2126cdd8ee24f40a04c18d39403022
Author: Aron Ahmadia <aron@ahmadia.net>
Date: Tue Jun 4 10:59:21 2013 -0400
Made fixes to Python pipeline
diff --git a/python_pipeline.ipy b/python_pipeline.ipy
new file mode 100644
index 0000000..ab9e62b
--- /dev/null
+++ b/python_pipeline.ipy
@@ -0,0 +1,44 @@
+%pylab
+import numpy as np
+
+f = open('Lumi.2760.csv')
+g = f.readlines()
+f.close()
...
The output is slightly cryptic because it is intended to be read by machines in addition to humans. The differences, also known as the diff, tell you how each file was changed from its previous version to this one.
In general, lines starting with a single ‘+’ were added, and lines starting with a single ‘-’ were removed. Lines without the initial ‘+’ or ‘-’ are present in both versions, and are provided as helpful context so you can understand the changes.
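As an illustration of that rule, the following Python sketch counts added and removed lines in a diff. It is a simplified reader, not a full diff parser: it assumes everything before the first @@ marker of each file is a header, and the function name count_changes is hypothetical. The sample text is the excerpt shown above.

```python
from typing import Tuple

SAMPLE = """\
diff --git a/python_pipeline.ipy b/python_pipeline.ipy
new file mode 100644
index 0000000..ab9e62b
--- /dev/null
+++ b/python_pipeline.ipy
@@ -0,0 +1,44 @@
+%pylab
+import numpy as np
+
+f = open('Lumi.2760.csv')
+g = f.readlines()
+f.close()
"""


def count_changes(diff_text: str) -> Tuple[int, int]:
    """Return (added, removed) line counts for a unified diff."""
    added = removed = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file begins; its headers follow
            continue
        if line.startswith("@@"):
            in_hunk = True  # the changed lines start after this marker
            continue
        if not in_hunk:
            continue  # skip header lines such as "--- /dev/null"
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


print(count_changes(SAMPLE))  # (6, 0)
```

Only six of the 44 added lines appear in the excerpt, so the sketch reports six additions and no removals.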
The diff headers in the output:
diff --git a/python_pipeline.ipy b/python_pipeline.ipy
new file mode 100644
index 0000000..ab9e62b
--- /dev/null
+++ b/python_pipeline.ipy
summarize the differences between the previous version of the file and its new version.
Since python_pipeline.ipy
was a file new to the repository,
we see this special line:
--- /dev/null
This indicates that there was no previous file, and this file is new. The following line:
+++ b/python_pipeline.ipy
tells you that the new file was named python_pipeline.ipy.
The numbers between the @@ markers
inform you which lines were changed:
@@ -0,0 +1,44 @@
In this case, Aron created a new file and added lines 1-44.
The lines following,
each preceded by a +,
are the contents of the new file he added,
python_pipeline.ipy.
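The @@ line can be read mechanically. Here is a small Python sketch, assuming the standard unified diff layout "@@ -old_start,old_count +new_start,new_count @@", in which a missing count means 1. The names Hunk and parse_hunk_header are hypothetical.

```python
import re
from typing import NamedTuple


class Hunk(NamedTuple):
    old_start: int  # first affected line in the previous version
    old_count: int  # number of lines taken from the previous version
    new_start: int  # first affected line in the new version
    new_count: int  # number of lines in the new version


HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> Hunk:
    """Extract the line ranges from a unified diff hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return Hunk(
        int(old_start),
        int(old_count or 1),  # count is omitted when it equals 1
        int(new_start),
        int(new_count or 1),
    )


print(parse_hunk_header("@@ -0,0 +1,44 @@"))
```

For the header in this lesson, the old range is empty (start 0, count 0) and the new range starts at line 1 and covers 44 lines, which is the "added lines 1-44" reading given above.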
Here are two more useful arguments to git log:
--oneline - Prints only the first few characters of the hash and the first line of the commit message in each revision.
--stat - Prints out a summary of files changed in each revision.
If we use them together, we see a nice text summary of how the repository has changed since it was created.
$ git log --stat --oneline
61fd2bc Made fixes to Python pipeline
python_pipeline.ipy | 44 ++++++++++++++++++++++++++++++++++++++++++++
python_pipeline.py | 49 -------------------------------------------------
2 files changed, 44 insertions(+), 49 deletions(-)
8595b71 first pass at making a pipeline
python_pipeline.py | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
d7c2a9d Merge branch 'add_2763'
a396b40 added Lumi 2763
Lumi.2763.csv | 10 ++++++++++
1 file changed, 10 insertions(+)
ef023fe added Lumi 2762
Lumi.2762.csv | 10 ++++++++++
1 file changed, 10 insertions(+)
779f888 Added Lumi 2761
Lumi.2761.csv | 10 ++++++++++
1 file changed, 10 insertions(+)
cbd6ff5 Added data file
2013-05-24-2760-2763.txt | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
Lumi.2760.csv | 10 ++++++++++
2 files changed, 60 insertions(+)
Git can't really travel through time,
but it does allow us to inspect its repositories
as they looked in the past.
Imagine that Git
automatically prints out all of your code, prose, and data,
(the contents of your repository),
and binds them into a complete book,
any time you want it to.
Imagine also that instead of a friendly librarian,
you have to ask Git
to retrieve revisions of your book for you.
Git happily does this
when you tell it to checkout
your revisions.
Let's see what the repository looked like
when it was first created,
by giving git checkout
the first four characters of the hash
of the oldest commit in our history:
$ git checkout cbd6
Note: checking out 'cbd6'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using `-b` with the checkout command again. Example:
git checkout -b new_branch_name
HEAD is now at cbd6ff5... Added data file
$ ls
2013-05-24-2760-2763.txt Lumi.2760.csv
We'll explain the detached HEAD
message in the next section.
For now, note that the contents of the directory have changed.
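Git accepted cbd6 because a hash prefix is enough as long as it matches exactly one revision. The following Python sketch imitates that lookup using the abbreviated hashes printed by git log --oneline earlier; the function name resolve_prefix is hypothetical, and real Git searches all objects in the repository rather than a short list.

```python
from typing import Iterable

# Abbreviated hashes as printed by `git log --stat --oneline` above.
KNOWN = [
    "61fd2bc", "8595b71", "d7c2a9d", "a396b40",
    "ef023fe", "779f888", "cbd6ff5",
]


def resolve_prefix(prefix: str, known: Iterable[str]) -> str:
    """Return the single hash starting with `prefix`, or raise an error."""
    matches = [h for h in known if h.startswith(prefix)]
    if not matches:
        raise KeyError(f"no revision starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous: {matches}")
    return matches[0]


print(resolve_prefix("cbd6", KNOWN))  # cbd6ff5
```

In a large repository, four characters may match several revisions; in that case a longer prefix is needed.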
Let's restore the original revision
by finding the right hash
in git log
.
$ git log --oneline
cbd6ff5 Added data file
Uh-oh. By default, git log
only shows the history leading up to the revision we currently have checked out.
Don't worry, we only need to add
the --all flag
to see all of a repository's available history:
$ git log --oneline --all
61fd2bc Made fixes to Python pipeline
8595b71 first pass at making a pipeline
d7c2a9d Merge branch 'add_2763'
a396b40 added Lumi 2763
ef023fe added Lumi 2762
779f888 Added Lumi 2761
cbd6ff5 Added data file
This is enough to go back to where we were, but let's use this as an opportunity to introduce another Git feature.
We'd like to go back to (or, check out)
the most recent revision.
We could use the output of git log --all,
but there is a better Git command
for navigating project history:
the reference log, or reflog.
Git keeps a reference log for you
that includes the revisions you have checked out.
The currently checked out revision
is referred to as,
for no particularly great reason,
the HEAD.
Every time you use git checkout,
HEAD moves to the new commit,
and the reflog gets another entry.
We can use the git reflog
command
to access this history and
see where we are,
and where we've been
in the history of our repository.
$ git reflog
cbd6ff5 HEAD@{0}: checkout: moving from master to cbd6
61fd2bc HEAD@{1}: clone: from https://github.com/ahmadia/bio-pipeline.git
By default, git reflog
outputs one line of text
for each time HEAD has moved.
The last move was caused by our checkout command,
and moved us to the revision identified by cbd6ff5.
We are interested in the first column of output,
which tells us which revision HEAD pointed to
after each move.
Since git reflog
reports our actions
going backwards in time,
the first row contains our current revision,
and the second row is one checkout back,
where we started.
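The bookkeeping described here can be imitated in a few lines of Python. This is a toy model under stated assumptions, not how Git stores its reflog: the class ToyRepo is hypothetical, and its messages name the previous revision by hash, whereas real Git printed "master" in the output above.

```python
from typing import List, Tuple


class ToyRepo:
    """A toy model of HEAD and its reference log."""

    def __init__(self, start: str) -> None:
        self.head = start
        # Newest entry first, like the output of `git reflog`.
        self.reflog: List[Tuple[str, str]] = [(start, "clone")]

    def checkout(self, revision: str) -> None:
        """Move HEAD and record the move at the top of the reflog."""
        message = f"checkout: moving from {self.head} to {revision}"
        self.head = revision
        self.reflog.insert(0, (revision, message))

    def show_reflog(self) -> List[str]:
        """Format entries as 'hash HEAD@{n}: message'."""
        return [
            f"{rev} HEAD@{{{i}}}: {msg}"
            for i, (rev, msg) in enumerate(self.reflog)
        ]


repo = ToyRepo("61fd2bc")
repo.checkout("cbd6ff5")
for line in repo.show_reflog():
    print(line)
```

The first printed row is the current revision, and the second row's first column (61fd2bc) is where we were before the checkout.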
Let's go back to the revision we started at.
$ git checkout 61fd
Previous HEAD position was cbd6ff5... Added data file
HEAD is now at 61fd2bc... Made fixes to Python pipeline
Explain git checkout to your neighbor.
Predict the output of git reflog if you call it now. Try it.
Try git checkout -. Can you explain to your neighbor what it does? (You may need to call it multiple times and inspect the reflog each time.)

At some point in the project's history,
Aron replaced the file python_pipeline.py
with python_pipeline.ipy.
Check out a revision that still contains python_pipeline.py.
Open python_pipeline.py in an editor.

git checkout is Git's Swiss Army Knife.
It does slightly different things,
depending on how it's called.
We just showed you
how to restore the entire directory to a previous state,
but git checkout
also allows us
to just restore a specific file.
Let's practice by doing something dangerous. First, let's make sure you're on the most recent revision.
$ git checkout 61fd
Then, go ahead and remove Lumi.2763.csv.
$ rm Lumi.2763.csv
$ ls Lumi.2763.csv
ls: Lumi.2763.csv: No such file or directory
There are a number of ways
to accidentally corrupt, modify, overwrite, or destroy files.
Here, we use the rm command
to simulate a catastrophic deletion
of our valuable data.
Fortunately, since our copy of Lumi.2763.csv
was committed to the repository,
it is as easy as pie to restore it.
$ git checkout Lumi.2763.csv
$ ls Lumi.2763.csv
Lumi.2763.csv
In fact, so long as an undamaged copy of our Git repository exists somewhere, we will always be able to recover lost or damaged files.
Modify a file, then use git checkout to recover the version stored in history.
Explain which version of foo.txt is recovered when the user types git checkout foo.txt in a Git repository.