Basics of using Git from the Command Line

Author(s): Helena Rasche
Overview
Questions:
  • How can I start tracking my changes with git?

  • How do I commit changes?

  • How can I undo a mistake?

Objectives:
  • Create a repository

  • Commit a file

  • Make some changes

  • Use the log to view the diff

  • Undo a bad change

Requirements:
Time estimation: 30 minutes
Last modification: Nov 28, 2022
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
Comment: Source

This tutorial contains text from a tutorial by Robert Adolf (@rdadolf), which is licensed CC-BY.

Version control is a way of tracking the change history of a project. Even if you have never used a version control tool, you’ve probably already done it manually: copying and renaming project folders (“paper-1.doc”, “paper-2.doc”, etc.) is a form of version control. Within bioinformatics (from research, to development, to sysadmin) a lot of us are using git as our primary method of source control for everything we do: notes, slides, tutorials, code, notebooks, ansible, system configuration, and more.

Git is a tool that automates and enhances a lot of the tasks that arise when dealing with larger, longer-living, and collaborative projects. It’s also become the common underpinning to many popular online code repositories, GitHub being the most popular.

While it can be used collaboratively, this tutorial focuses on a single-user git repository for the most basic operations.

Agenda

In this tutorial, you will learn how to create a git repo, and begin working with it.

  1. Why should you use version control?
  2. Pre-requisites
  3. Setting up a Repository
  4. Adding Files
  5. Logs
  6. Branching
  7. Undo! Revert!
  8. Further Reading

Why should you use version control?

If you ask 10 people, you’ll get 10 different answers, but one of the commonalities is that most people don’t realize how integral it is to their development process until they’ve started using it. Still, for the sake of argument, here are some highlights:

  • You can undo anything: Git provides a complete history of every change that has ever been made to your project, timestamped, commented, and attributed. If something breaks, you always have the choice of going back to a previous state.
  • You won’t need to keep undo-ing things: One of the advantages of using git properly is that by keeping new changes separate from a stable base, you tend to avoid the massive rollbacks associated with constantly tinkering with a single copy of the code.
  • You can identify exactly when and where changes were made (and by whom!): Git allows you to pinpoint when a particular piece of code was changed, so finding what other pieces of code a bug might affect or figuring out why a certain expression was added is easy.
  • Git forces teams to face conflicts directly: On a team-based project, many people are often working with the same code. By having a tool which understands when and where files were changed, it’s easy to see when changes might conflict with each other. While it might seem troublesome sometimes to have to deal with conflicts, the alternative—not knowing there’s a conflict—is much more insidious.

Pre-requisites

You will need to install git, if you have not done so already.
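
A quick way to confirm this prerequisite is to ask git for its version. The sketch below also prints the name and email that git would record in your commits. Whether these are already configured on your machine is an assumption, so the fallback messages cover the unset case.

```bash
# Print the installed version; "command not found" means git is not installed.
git --version

# Show the identity git will attach to commits, or a hint if it is unset.
git config --get user.name  || echo "user.name is not set"
git config --get user.email || echo "user.email is not set"
```

If either value is unset, `git config --global user.name "Your Name"` and `git config --global user.email "you@example.org"` set them once for all repositories (both values are placeholders).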

Setting up a Repository

Let’s create a new repository.

Hands-on: Create a Repository
  1. Make a new directory where you will store your files, and navigate into it.

    Input: Bash
    mkdir git-tutorial; cd git-tutorial;
    
  2. Create or “initialise” the git repository with the git init command.

    Input: Bash
    git init
    
    Output
    Initialized empty Git repository in /tmp/project/.git/
    

This has created a folder named .git in your project directory. This is where git stores all of the data it needs to track repository changes over time. It’s not terribly interesting yet, though!
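
If you are curious what is inside that folder, the following sketch creates a throwaway repository and lists its contents. The scratch directory from `mktemp` is an assumption made so the example does not touch your real project.

```bash
set -e
repo="$(mktemp -d)"     # hypothetical scratch directory, safe to delete afterwards
cd "$repo"
git init -q

# List what git created; expect entries such as HEAD, config, objects and refs.
ls .git

# HEAD records which branch you are on (main or master, depending on your setup).
cat .git/HEAD
```

You should never need to edit these files by hand; the git commands in the rest of this tutorial manage them for you.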

Hands-on: What's the status
  1. You can always check the status of a repository with git status

    Input: Bash
    git status
    
    Output
    On branch main
    
    No commits yet
    
    nothing to commit (create/copy files and use "git add" to track)
    
    

Adding Files

Let’s add our first file, often a (pretty empty) readme file.

Hands-on: Add and commit a file
  1. Create a new file, readme.md with some basic content

    Input: Bash
    echo "My Project" > readme.md
    
  2. Add a file with git add. This adds it to git’s staging area to be committed.

    Input: Bash
    git add readme.md
    
  3. Commit the file! This will add it to git’s log.

    What counts as a good commit message depends a lot on the community: some enforce specific style guides, some don’t. In general:

    • Keep the description short (<72 chars) and descriptive.
    • If needed, provide a long description as well, explaining your changes. (Use git commit without the -m flag!) A lot has been written about good commit messages; search the internet for ideas about what you think makes a good commit message!

    And beware of the trap we all fall into sometimes: unhelpful commit messages. (Figure: xkcd comic with commits in a table from 14h to 2h ago, starting with extremely useful messages like "created main loop & timing control" and becoming extremely unhelpful for the latest commits, like "AAAAAA" or "asdfasdf".) Even your author is very, very guilty of this, but you can do better!

    Input: Bash
    git commit -m "Add readme"
    
    Output
    [main (root-commit) f5ec14f] Add readme
     1 file changed, 1 insertion(+)
     create mode 100644 readme.md
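
As an alternative to opening the editor for a long description, `-m` can be given more than once; each value becomes its own paragraph, so the first is the short summary and the second the body. This is a minimal sketch: the scratch repository, the placeholder identity, and the message text are all made up for illustration.

```bash
set -e
repo="$(mktemp -d)"; cd "$repo"           # throwaway repository
git init -q
git config user.name "Example User"       # placeholder identity for this scratch repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md

# The first -m is the short summary line, the second -m becomes the body paragraph.
git commit -q -m "Add readme" -m "Start the project with a one-line readme so the repository has a landing page."

# %s prints the summary line, %b prints the body.
git log -1 --format='summary: %s%nbody: %b'
```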
    
Question: Is there anything left to do? Check the status

Check git status to see if there’s anything else left to resolve.

$ git status
On branch main
nothing to commit, working tree clean
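
The clean status above is the last stop of a cycle that every file goes through: untracked, then staged, then committed. The sketch below walks one file through that cycle using `git status --short`, a compact form of the same report. The file name and placeholder identity are assumptions for illustration.

```bash
set -e
repo="$(mktemp -d)"; cd "$repo"           # throwaway repository
git init -q
git config user.name "Example User"       # placeholder identity for this scratch repo
git config user.email "user@example.org"

echo "draft" > notes.txt
git status --short      # "?? notes.txt": the file exists but git is not tracking it

git add notes.txt
git status --short      # "A  notes.txt": staged, waiting to be committed

git commit -q -m "Add notes"
git status --short      # prints nothing: the working tree is clean
```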

Congratulations! You’ve made your first commit. The output of the commit command lists everything you’ve just done:

[main (root-commit) f5ec14f] Add readme
 1 file changed, 1 insertion(+)
 create mode 100644 readme.md

f5ec14f is the commit id: every commit you make is given a hash which uniquely refers to that specific commit. Next we see our commit message, Add readme, followed by a brief mention of how many files we’ve changed and how many insertions or deletions we’ve made to the text, and lastly which files we’ve added.

Exercise: Make some more commits

Hands-on: Make some more commits
  1. Add your name to the readme.md and commit your changes.

    Input: Bash
    echo "Author: hexylena" >> readme.md
    git add readme.md
    git commit -m 'Add author name'
    
  2. Make up a project description, add it to the readme, and commit.

    Input: Bash
    # printf interprets \n in every shell; Bash's builtin echo would write it literally.
    printf 'This project enables stakeholders to experience synergistic effects and increase link up opportunities to improve quarterly and YOY ROI.\n\n' >> readme.md
    git add readme.md
    git commit -m 'Add project description'
    
  3. Pick a license for your project, and mention it in the readme.md, and commit.

    Input: Bash
    # printf interprets \n in every shell; Bash's builtin echo would write it literally.
    printf '# License\nAGPL-3.0\n' >> readme.md
    git add readme.md
    git commit -m 'Add project license'
    

After this step you should have four commits to work with: the initial commit plus the three above!

Logs

One of the most helpful things about git is that, if you have written good commit messages, you can tell what you did and when!

Hands-on: Check the Receipts
  1. Check the log with git log. Notice that you can see each commit in reverse chronological order (newest at top), who made the commit, when, and what the commit message was.

    Input: Bash
    git log
    
    Output
    commit 5d05eb3ec22fd49282b585c60ef8f983d68c2fd7
    Author: Helena Rasche <hxr@hx42.org>
    Date:   Mon Jun 13 12:13:21 2022 +0200
    
        Add project license
    
    commit 62f974ec5f538232f65b016cf073815349364efa
    Author: Helena Rasche <hxr@hx42.org>
    Date:   Mon Jun 13 12:13:16 2022 +0200
    
        Add project description
    
    commit 10355c019c04052c15a95a817de04f9ea0ec336c
    Author: Helena Rasche <hxr@hx42.org>
    Date:   Mon Jun 13 12:13:11 2022 +0200
    
        Add author name
    
    commit f5ec14f05384d76812fc0576df5e4af79336f4e6
    Author: Helena Rasche <hxr@hx42.org>
    Date:   Mon Jun 13 11:59:23 2022 +0200
    
        Add readme
    

The output of git log is a great way to help you remember what you were doing.
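
When the full log gets long, a condensed view helps. The sketch below builds a small throwaway history and prints it with `git log --oneline`, which shows the abbreviated commit id and the summary line of each commit. The file name, messages, and placeholder identity are assumptions for illustration.

```bash
set -e
repo="$(mktemp -d)"; cd "$repo"           # throwaway repository
git init -q
git config user.name "Example User"       # placeholder identity for this scratch repo
git config user.email "user@example.org"

# Make three small commits so there is some history to look at.
for word in one two three; do
    echo "$word" >> notes.txt
    git add notes.txt
    git commit -q -m "Add line $word"
done

# One line per commit: abbreviated id followed by the summary, newest first.
git log --oneline

# Limit the output to the two most recent commits.
git log --oneline -n 2
```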

Hands-on: `git log -p`
  1. Use git log -p to see the log, along with which lines were changed in each commit.

But currently this log is pretty boring, so let’s replace a line and see how that looks.

Hands-on: Replace a line
  1. Update your project description in the readme.md, you’ve been told you need to support completely different features.

    Input: Bash
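    # GNU sed shown here; on macOS/BSD sed use: sed -i '' 's/enables.*ROI/creates baking recipes/' readme.md
    sed -i 's/enables.*ROI/creates baking recipes/' readme.md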
    git add readme.md
    git commit -m 'Update project description'
    
  2. Check what happened with the git log -p:

    Output
    $ git log -p
    commit 416a121dfcda14de0c2cb181f298b2c08950475f (HEAD -> main)
    Author: Helena Rasche <hxr@hx42.org>
    Date:   Mon Jun 13 12:18:00 2022 +0200
    
        Update project description
    
    diff --git a/readme.md b/readme.md
    index befc0c9..3b8899e 100644
    --- a/readme.md
    +++ b/readme.md
    @@ -1,6 +1,6 @@
     My Project
     Author: hexylena
    -This project enables stakeholders to experience synergistic effects and increase link up opportunities to improve quarterly and YOY ROI.
    +This project creates baking recipes.
    
     # License
     AGPL-3.0
    
    

    This is a diff, a comparison between two versions of a file.

    If you haven’t worked with diffs before, this can be something quite new or different.

    Say we have two versions of a grocery list, stored in two files. We’ll call them ‘old’ and ‘new’.

    Old
    $ cat old
    🍎
    🍐
    🍊
    🍋
    🍒
    🥑
    New
    $ cat new
    🍎
    🍐
    🍊
    🍋
    🍍
    🥑

    We can see that they have some different entries. We’ve removed 🍒 because they’re awful, and replaced them with a 🍍.

    Diff lets us compare these files:

    $ diff old new
    5c5
    < 🍒
    ---
    > 🍍

    Here we see that 🍒 is only in old, and 🍍 is only in new. Otherwise the files are identical.

    There are a couple of different diff formats; one is the ‘unified diff’:

    $ diff -U2 old new
    --- old 2022-02-16 14:06:19.697132568 +0100
    +++ new 2022-02-16 14:06:36.340962616 +0100
    @@ -3,4 +3,4 @@
     🍊
     🍋
    -🍒
    +🍍
     🥑

    This is basically what you see in the training materials which gives you a lot of context about the changes:

    • --- old is the ‘old’ file in our view
    • +++ new is the ‘new’ file
    • @@ these lines tell us where the change occurs and how many lines are added or removed.
    • Lines starting with a - are removed from our ‘new’ file
    • Lines with a + have been added.

    So when you go to apply these diffs to your files in the training:

    1. Ignore the header
    2. Remove lines starting with - from your file
    3. Add lines starting with + to your file

    The other lines (🍊/🍋 and 🥑) above just provide “context”, they help you know where a change belongs in a file, but should not be edited when you’re making the above change. Given the above diff, you would find a line with a 🍒, and replace it with a 🍍
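
The manual steps above are what `git apply` automates. As a minimal sketch, the following records a change as a unified diff, undoes the change, and then re-applies it from the saved diff. Plain fruit names stand in for the emoji, and the file names, placeholder identity, and scratch repository are assumptions for illustration.

```bash
set -e
repo="$(mktemp -d)"; cd "$repo"           # throwaway repository
git init -q
git config user.name "Example User"       # placeholder identity for this scratch repo
git config user.email "user@example.org"

printf 'apple\npear\norange\nlemon\ncherry\navocado\n' > list.txt
git add list.txt
git commit -q -m "Add grocery list"

# Swap cherry for pineapple (written via a temp file so it works with any sed).
sed 's/cherry/pineapple/' list.txt > list.tmp && mv list.tmp list.txt

# Save the change as a unified diff with two lines of context, then look at it.
git diff -U2 > change.patch
cat change.patch

# Throw the edit away, then let git re-apply it from the diff.
git restore list.txt
git apply change.patch
```

In the printed diff you should be able to pick out the header, the `@@` line, one `-` line, one `+` line, and the surrounding context lines described above.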

    Added & Removed Lines

    Removals are very easy to spot: we just have removed lines.

    --- old	2022-02-16 14:06:19.697132568 +0100
    +++ new 2022-02-16 14:10:14.370722802 +0100
    @@ -4,3 +4,2 @@
     🍋
     🍒
    -🥑

    And additions are likewise very easy: just add a new line between the other lines in your file.

    --- old	2022-02-16 14:06:19.697132568 +0100
    +++ new 2022-02-16 14:11:11.422135393 +0100
    @@ -1,3 +1,4 @@
     🍎
    +🍍
     🍐
     🍊

    Completely new files

    Completely new files look a bit different: there, the “old” file is /dev/null, the empty file on a Linux machine.

    $ diff -U2 /dev/null old
    --- /dev/null 2022-02-15 11:47:16.100000270 +0100
    +++ old 2022-02-16 14:06:19.697132568 +0100
    @@ -0,0 +1,6 @@
    +🍎
    +🍐
    +🍊
    +🍋
    +🍒
    +🥑

    And removed files are similar, except that the new file is /dev/null:

    --- old	2022-02-16 14:06:19.697132568 +0100
    +++ /dev/null 2022-02-15 11:47:16.100000270 +0100
    @@ -1,6 +0,0 @@
    -🍎
    -🍐
    -🍊
    -🍋
    -🍒
    -🥑

Who did that? git blame to the rescue

If you want to know who changed a specific line of a file, you can use git blame to find out it was probably your fault (as most of us experience when we check the logs.)

Input: Bash
git blame readme.md
Output
^f5ec14f (Helena Rasche 2022-06-13 11:59:23 +0200 1) My Project
10355c01 (Helena Rasche 2022-06-13 12:13:11 +0200 2) Author: hexylena
416a121d (Helena Rasche 2022-06-13 12:18:00 +0200 3) This project creates baking recipes.
62f974ec (Helena Rasche 2022-06-13 12:13:16 +0200 4)
5d05eb3e (Helena Rasche 2022-06-13 12:13:21 +0200 5) # License
5d05eb3e (Helena Rasche 2022-06-13 12:13:21 +0200 6) AGPL-3.0

Here we can see, for every line, which commit last affected it, who made that commit, and when.
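
On larger files you usually care about only a few lines. As a sketch, `git blame -L <start>,<end>` restricts the report to a line range. The two-commit history and placeholder identity below are made up for illustration.

```bash
set -e
repo="$(mktemp -d)"; cd "$repo"           # throwaway repository
git init -q
git config user.name "Example User"       # placeholder identity for this scratch repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md
git commit -q -m "Add readme"

echo "Author: hexylena" >> readme.md
git add readme.md
git commit -q -m "Add author name"

# Only report on line 2 of the file.
git blame -L 2,2 readme.md
```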

Branching

Git has the concept of branches which are most often used to manage development over time, before it’s considered final. Until now you’ve seen main in your commits and commit logs (or maybe master if your git installation is a bit older.)

Oftentimes you’ll see this pattern:

  1. There is a main branch with a lot of history
  2. You want to test out a new option, new configuration, new script you’re working on
    1. So you make a branch
    2. Work on that branch
  3. And merge it back into the main branch, once it’s done.

This is especially relevant for any project that is shared with others, has a public view, or a deployed version of the code. There you don’t want to affect anyone else using the project, or you don’t want to affect the production deployment, until you’re done making your changes.

Hands-on: Create a new branch
  1. git switch -c <branch> is the command used to create a new branch and switch to it.

    Input: Bash
    git switch -c test
    
    Output
    Switched to a new branch 'test'
    

If you look around, you’ll notice everything looks exactly the same! But in fact we are now on a different branch:

Hands-on: See available branches
  1. git branch lists our available branches, and puts an asterisk next to the one we’re currently on.

    Input: Bash
    git branch
    
    Output
      main
    * test
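
If you only want the name of the branch you are on, for example inside a script, `git branch --show-current` prints just that. A minimal sketch in a throwaway repository (the placeholder identity and file are assumptions):

```bash
set -e
repo="$(mktemp -d)"; cd "$repo"           # throwaway repository
git init -q
git config user.name "Example User"       # placeholder identity for this scratch repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md
git commit -q -m "Add readme"

git switch -c test

# Full listing, with an asterisk next to the current branch.
git branch

# Just the current branch name.
git branch --show-current      # prints: test
```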

We’re now on the test branch, so let’s make a commit.

Hands-on: Add a new file
  1. Add a new file, let’s call it docs.md. Write something into it, it doesn’t matter much what.

    Input: Bash
    echo "# Project Documentation" > docs.md
    
  2. Add it, commit it.

    Input: Bash
    git add docs.md
    git commit -m "Added documentation"
    

This file now only exists on the test branch.

Hands-on: Try Switching Branches
  1. Try switching back and forth between the main and test branches, and check what’s available on each!

    Input: Bash
    git branch
    
    Input: Bash
    git switch main
    ls
    
    Output
    readme.md
    
    Input: Bash
    git switch test
    ls
    
    Output
    docs.md   readme.md
    

Each branch has a different view of the repository, and might have different changes on it. Branches are really useful for keeping track of work in progress until it’s done. In a single-user environment most people don’t use them much, but once you’re collaborating with others they’re incredibly important!
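
You can also inspect those different views without switching back and forth. The sketch below lists the commits that exist only on `test`, and the files tracked on each branch. It assumes a throwaway repository with a placeholder identity, and it stores the starting branch name in a variable because that may be `main` or `master` on your machine.

```bash
set -e
repo="$(mktemp -d)"; cd "$repo"           # throwaway repository
git init -q
git config user.name "Example User"       # placeholder identity for this scratch repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md
git commit -q -m "Add readme"
base="$(git branch --show-current)"       # main or master, depending on configuration

git switch -c test
echo "# Project Documentation" > docs.md
git add docs.md
git commit -q -m "Added documentation"

# Commits reachable from test but not from the starting branch.
git log --oneline "$base..test"

# Files tracked on each branch, without switching to it.
git ls-tree --name-only "$base"
git ls-tree --name-only test
```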

Merging

Once you’re done with a branch, you can merge it into the main branch. This will take all of the work you did on that branch, and make it part of the main branch.

First, let’s compare the two branches, to see what changed.

Hands-on: Compare the branches
  1. Compare your current branch against the main branch with git diff main

    Input: Bash
    git diff main
    
    Output
    diff --git a/docs.md b/docs.md
    new file mode 100644
    index 0000000..384aaaa
    --- /dev/null
    +++ b/docs.md
    @@ -0,0 +1 @@
    +# Project Documentation
    

We can see the output shows all of our changes compared to the main branch and it looks like what we want, so, let’s merge it in.

Hands-on: Merge the `test` branch into `main`
  1. Switch to the main branch

    Input: Bash
    git switch main
    
  2. Merge in the test branch

    Input: Bash
    git merge test
    
    Output
    Updating 416a121..9a3387d
    Fast-forward
     docs.md | 1 +
     1 file changed, 1 insertion(+)
     create mode 100644 docs.md
    

This has merged all of the changes you made on the test branch into the main branch.
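
The word "Fast-forward" in the merge output means git did not need to combine anything: the main branch had no new commits of its own, so git simply moved it forward to the same commit as test. A sketch of how to check this, using a throwaway repository with a placeholder identity (the starting branch name is stored in a variable because it may be `main` or `master`):

```bash
set -e
repo="$(mktemp -d)"; cd "$repo"           # throwaway repository
git init -q
git config user.name "Example User"       # placeholder identity for this scratch repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md
git commit -q -m "Add readme"
base="$(git branch --show-current)"       # main or master, depending on configuration

git switch -c test
echo "# Project Documentation" > docs.md
git add docs.md
git commit -q -m "Added documentation"

git switch "$base"
git merge test

# After a fast-forward, both branch names resolve to the same commit id.
git rev-parse "$base" test
```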

Hands-on: Check the history
  1. Check git log -p again to see the history.

Undo! Revert!

Oh no, you’ve decided you liked your original project description better. Let’s find that commit and revert it.

Hands-on: Find and revert the bad commit
  1. Find the commit you want to revert, e.g. with git log, find the one named “Update project description” (or similar.)

    Input: Bash
    git log
    
  2. We can use the git revert command to undo this commit.

    Input: Bash
    git revert 416a121dfcda14de0c2cb181f298b2c08950475f
    

This generates a new commit, which reverts the older commit (and probably puts you in a text editor to edit the commit message). This is not the only way to undo mistakes, but probably the easiest.

If you check your git log you’ll see the change was undone in a second commit, reverting the first. So if you just look at the current files, it appears the description was never changed, but within the logs we can see both the change and the undo step.
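
The whole cycle can be sketched in a throwaway repository. The example uses `--no-edit` so that git keeps the default revert message instead of opening an editor; the descriptions and placeholder identity are made up for illustration.

```bash
set -e
repo="$(mktemp -d)"; cd "$repo"           # throwaway repository
git init -q
git config user.name "Example User"       # placeholder identity for this scratch repo
git config user.email "user@example.org"

echo "This project enables synergy." > readme.md
git add readme.md
git commit -q -m "Add project description"

echo "This project creates baking recipes." > readme.md
git add readme.md
git commit -q -m "Update project description"

# Undo the most recent commit by creating a new commit that reverses it.
git revert --no-edit HEAD

# Three commits: the original, the update, and the revert of the update.
git log --oneline

# The file is back to the original description.
cat readme.md
```

Here `HEAD` refers to the most recent commit; to revert an older commit, pass its id from `git log` instead, as in the hands-on above.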

With that you’ve got enough skills to track your own data/code/etc with git!

Further Reading

Key points
  • While git is extremely powerful, just using it for tracking changes is quite easy!

  • This does not take advantage of any advanced features, nor collaboration, but it is easy to expand into doing that.

Citing this Tutorial

  1. Helena Rasche, Basics of using Git from the Command Line (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/data-science/tutorials/git-cli/tutorial.html Online; accessed TODAY
  2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012



@misc{data-science-git-cli,
author = "Helena Rasche",
title = "Basics of using Git from the Command Line (Galaxy Training Materials)",
year = "",
month = "",
day = "",
url = "\url{https://training.galaxyproject.org/training-material/topics/data-science/tutorials/git-cli/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                   

Congratulations on successfully completing this tutorial!