

Working with the command line

Introduction

We trust that all of you have worked with computers before in some way. Programs that you are already familiar with provide a nice and shiny graphical user interface (GUI) that makes them easy to use. Probably one of the most widely known examples is the word processor Microsoft Word. As one part of our course, however, we will introduce you to working with the shell1). For an average user, this is the most direct way of interacting with the computer. We expect the shell to seem a bit cryptic at first. However, once you get used to it, you will appreciate the 'power of the shell'. So let's get started.

The command line prompt typically looks something like this:

username@computername:~$

Code documentation

Sometimes we will provide you with short pieces of code that you can copy and paste into your terminal. Commands like that will appear in grey boxes like this:

# this is only a comment
$ mkdir <dirname>
  • Lines starting with a hash sign (#) are comments and are not part of the code
  • The $ sign represents the prompt of your shell (see the example above). Do not copy it when you copy-paste commands from the DokuWiki to the command line
  • Words between <these signs> are placeholders. Replace them, including the angle brackets, with your own value when typing the command, e.g. mkdir my_project for the command above

Prerequisites

To complete this set of exercises, you should be familiar with

  1. how to open a shell on your computer (need info?)
  2. the concept of a directory tree and what a path is (need info?)
    Figure 1: Section of the directory tree of the AK Applied Bioinformatics. The root of the tree is on the left side and is represented by '/'. The absolute paths leading to the individual terminal files and directories are given in parentheses.

Things to remember

  • Your operating system stores information in directories (a.k.a. folders) and in files
  • A directory is just like a basket that can contain zero to many things
  • A file is like a document. It contains information such as text or program code
    • Files can be either human-readable, i.e. they contain plain text, or binary, i.e. they contain information that is not meant to be read by humans
  • Directories can contain
    • other directories (sometimes called sub-directories)
    • files
  • If a directory contains sub-directories, a tree-like structure emerges
  • It is a convention that file names indicate the file type by appending a '.' and an informative suffix, e.g. 'myinfo.txt'. However, this is only a convention, and little enforces it.
  • It is a convention that directory names do not contain any special characters. You will make your life substantially easier if you avoid characters such as '!', '?', '$' and the like. See also the next point in the list.
  • On Linux-based systems, avoid white spaces in file and directory names. Use underscores ('_') instead.
  • Each file and each directory can be addressed by a unique path
  • Absolute paths start at the root of the directory tree. Thus, an absolute path always starts with a '/'
  • Relative paths start at the current working directory. Thus, they never start with a '/'
  • Directories in a path are separated by a '/'
  • A path can end in a directory, but it always ends once it reaches a file, because a file cannot contain further entries
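To see absolute and relative paths side by side, here is a minimal sketch. It assumes a throwaway directory created with mktemp, and the names project, data and myinfo.txt are hypothetical. It addresses the same file once from the root and once from the current working directory:

```shell
#!/bin/bash
# create a throwaway directory tree: <tmp>/project/data/myinfo.txt (hypothetical names)
base=$(mktemp -d)
mkdir -p "$base/project/data"
echo "some plain text" > "$base/project/data/myinfo.txt"

# an absolute path starts at the root '/' and works from anywhere
abs_path="$base/project/data/myinfo.txt"
cat "$abs_path"

# a relative path starts at the current working directory
cd "$base/project"
pwd
rel_path="data/myinfo.txt"
cat "$rel_path"

# '..' refers to the parent directory, '.' to the current one
cd data
cat ../data/./myinfo.txt
```

Both cat calls print the same line, because both paths lead to the same file.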

Task List

Once you know how to open a BASH shell on the system you are using, it is time to learn how to use it. We have compiled a selection of resources and exercises for you. After working through them, you will be comfortable with the command line in no time.

1. Command Line Bootcamp

If you are not familiar with the command line, the command line bootcamp is a nice introduction to it. You can spend some time walking through the tutorial in a shell on your system. If you are within the AppliedBionformaticsFrankfurt network, you have access to an interactive environment.

Memorize the individual commands. It might also be a good idea to create short wiki pages for yourself that outline each command together with its most relevant options. See the following pages as examples:

  • Changing directories: cd
  • Locating your position in the directory tree: pwd
  • Looking into files: less
  • Linking files: ln
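You can try the commands listed above in a scratch directory. The minimal sketch below assumes a temporary directory from mktemp, and the file names are hypothetical. less is only mentioned in a comment because it opens an interactive pager.

```shell
#!/bin/bash
# move into a fresh scratch directory and print where we are
scratch=$(mktemp -d)
cd "$scratch"
pwd

# create a small text file and a symbolic link pointing to it
printf 'line 1\nline 2\n' > notes.txt
ln -s notes.txt notes_link.txt

# 'ls -l' marks the link with '->'; readlink prints its target
ls -l
readlink notes_link.txt

# reading through the link shows the content of the original file;
# interactively you would page through it with: less notes_link.txt
cat notes_link.txt

# 'cd ..' moves one level up ('cd' alone returns to your home directory)
cd ..
```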

Remember that the DokuWiki is a shared resource and that you can work together when creating these notes.

2. Custom exercises

2.1 Anaconda and Jupyter

We have compiled a set of tasks for you that will deepen your knowledge of working with the BASH shell and introduce some principles and data formats that are common in bioinformatics.

These exercises come in the format of Jupyter notebooks, which are a great way of making analyses reproducible and easy to share. If you do not have a working installation of Jupyter Notebook on your computer system, you can install it via Anaconda. Please set up Anaconda following the tutorial in our wiki. Then create a new environment named jupyter that contains Jupyter Notebook, and activate it:

mamba create -n jupyter jupyter
conda activate jupyter

Next, download our exercises from GitHub via this LINK. The easiest way to start the download is to click on the green “Code” button in the top right corner and select “Download ZIP” (Figure 2). Don't forget to unpack the ZIP archive with an archive manager of your choice.

Figure 2: Starting download of the ZIP file.

2.2 Exercises

Open a terminal on your system, activate the jupyter environment, and navigate to the directory you have just downloaded and extracted. Now you can start Jupyter Notebook by typing:

jupyter notebook

This opens a page in your web browser from which you can navigate to the `.ipynb` file of each exercise. The notebooks contain a set of instructions and some tasks. They also contain code cells in which you should document the commands that solve each task.

You can also use the code cells to experiment and find your solution, but we encourage you to try out all commands in your local terminal as well.

3. Using a computer cluster

In the previous exercises, you have learned to write commands and pipelines in the BASH shell. Now we want to look at how to scale our analyses up to large datasets. For such resource-heavy jobs, we have a computer cluster available that is managed by the SLURM workload manager. Please read through the information about SLURM and then solve the tasks below.

  • Have a look at the FASTQ file stored here:
    /share/project/mscmbw2/data/C_1.2/ForwardFile.fq
  • Check the size of the file using ls -lh
  • Count the number of header lines in the file and measure how long your command takes with the time command
  • Create a SLURM script file that executes the same command and run it on the cluster
  • Discuss with the other people in the course when it is best to use the computer cluster
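A minimal SLURM batch script could look like the sketch below. The job name, output file name and resource values are placeholders chosen for illustration, not the settings of our cluster, so check the SLURM information page for the limits that apply there. The commands at the end are stand-ins for your own command. sbatch reads lines starting with #SBATCH as job options, while BASH treats them as ordinary comments, so you can also test-run the script locally with bash.

```shell
#!/bin/bash
# job name shown in the queue (hypothetical)
#SBATCH --job-name=example_job
# file for the job's output; %j is replaced by the job ID
#SBATCH --output=example_job_%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:10:00

# replace the lines below with the command you want to run on the cluster;
# 'time' reports how long the command took
host=$(uname -n)
echo "Running on host: $host"
time sleep 1
```

Save the script, e.g. as example_job.sh (a hypothetical name), submit it with sbatch example_job.sh, and follow its state with squeue -u $USER.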

4. Environments

.bashrc

The .bashrc file is loaded and executed every time you start an interactive BASH shell (on most systems, it is also loaded at login via ~/.bash_profile or ~/.profile). It contains configuration for the terminal session, such as settings for command completion, the shell history, command aliases, paths to computer programs, and more. Because its name starts with a '.', the .bashrc is a hidden file and is not listed by a plain ls command. You can make it visible in your home directory with the following commands:

cd ~
ls -a
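Any file or directory whose name starts with a '.' is hidden from a plain ls. The minimal sketch below demonstrates this in a throwaway directory. The file names are hypothetical.

```shell
#!/bin/bash
# create one visible and one hidden file in a scratch directory
demo=$(mktemp -d)
cd "$demo"
touch visible.txt .hidden_config

# plain ls only shows visible.txt
ls
# -a also shows hidden entries, including '.' and '..'
ls -a
# -A shows hidden entries but omits '.' and '..'
ls -A
```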

Alias

If you often use the same long command, you can simplify your life by adding an alias for it to the end of your .bashrc:

alias alias_name="command_to_run"
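You can also define and inspect aliases directly in the current shell, which is handy for testing them before adding them to .bashrc. In the minimal sketch below, the alias name ll is only an example. In an interactive shell, aliases work right away. Inside a script, BASH only expands them after shopt -s expand_aliases.

```shell
#!/bin/bash
# needed only in scripts; interactive shells expand aliases by default
shopt -s expand_aliases

# define an alias for a long listing with human-readable file sizes
alias ll='ls -lh'

# show the definition of a single alias (plain 'alias' lists all of them)
alias ll
# 'type' tells you what the shell will run for a given name
type ll
```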

Exercise:

Create an alias called '...' that takes you two directories up in the directory tree.

solution

Open the .bashrc:

nano ~/.bashrc

Add the alias at the end of the file, then save with Ctrl+O and exit nano with Ctrl+X:

alias ...='cd ../..'

Reload the .bashrc:

source ~/.bashrc
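You can check that the alias behaves as intended without touching your real .bashrc by defining it in a scratch session. The minimal sketch below assumes a temporary directory tree, and the directory names a/b/c are hypothetical.

```shell
#!/bin/bash
# allow alias expansion in this script (not needed interactively)
shopt -s expand_aliases
alias ...='cd ../..'

# build a scratch tree and move to its deepest level
tree_root=$(mktemp -d)
mkdir -p "$tree_root/a/b/c"
cd "$tree_root/a/b/c"

# two levels up: from a/b/c to a
...
pwd
```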

$PATH

When you type a command like ls, you are telling the shell to run an executable file (a few commands, such as cd, are built into the shell itself). These executable files are stored in various directories on your computer system, which is why the variable $PATH exists: it holds a colon-separated list of directories. When you type a command, the shell searches these directories, in order, for an executable file with that name. Below you can learn how to add a new directory to $PATH to simplify your life.
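You can ask the shell how it resolves a command name with the BASH builtin type. In this short sketch, cd turns out to be a builtin, while ls is a file found through $PATH. The exact directory printed (e.g. /usr/bin/ls or /bin/ls) depends on your system.

```shell
#!/bin/bash
# cd is built into the shell, so no file is searched for
type cd
# ls is an executable file found via $PATH; -P prints only its path
type -P ls
# -t prints a one-word category: builtin, file, alias, function or keyword
type -t cd
type -t ls
```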

If you want to have a look at which paths are already stored in $PATH you can use the following command:

echo $PATH
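Because the entries of $PATH are separated by colons, the output of echo $PATH can be hard to read. This small sketch prints one directory per line using standard tools. It also shows how to split the variable into a BASH array.

```shell
#!/bin/bash
# replace each ':' with a newline to list the search directories in order
echo "$PATH" | tr ':' '\n'

# alternatively, split $PATH into a BASH array and count the entries
IFS=':' read -r -a path_dirs <<< "$PATH"
echo "number of directories in PATH: ${#path_dirs[@]}"
```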

A new directory can be added to the front of $PATH with the following command (directories that come first are searched first):

export PATH="<new path>:$PATH"

The export command makes the modified variable $PATH available to all child processes started from this shell. However, this change is only temporary and is lost when you close the shell. To make it permanent, add the same command to the end of your .bashrc. After saving, reload the .bashrc:

source ~/.bashrc
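The effect of export becomes visible as soon as a child process is started. In the minimal sketch below, MY_VAR is a hypothetical variable name, and bash -c starts a child shell. $PATH is passed on to child processes in the same way.

```shell
#!/bin/bash
# a plain variable is only known to the current shell
MY_VAR="hello"
before=$(bash -c 'echo "${MY_VAR:-<nothing>}"')
echo "child sees before export: $before"

# after export, child processes inherit a copy of the variable
export MY_VAR
after=$(bash -c 'echo "${MY_VAR:-<nothing>}"')
echo "child sees after export: $after"
```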

Exercise:

Sometimes you will have to install programs without using a package manager like Anaconda. You can always start installed programs by using the absolute path to their main script, but adding them to your $PATH will save you a lot of typing. Follow these steps to add an example script to your $PATH.

  • Create a new directory that you will later add to $PATH
mkdir ~/scripts
  • Create a new file called today in this directory (e.g. with nano ~/scripts/today) and add this example script
#!/bin/bash
# print the current date and time using the date command
echo "This is a $(date +"%A %d in %B of %Y (%r)")"
  • Add the directory containing the new file to $PATH
export PATH="$HOME/scripts:$PATH"

Check whether your path was added correctly by calling the script by its name:

today
This is a Tuesday 12 in October of 2021 (04:38:31 PM)

If you get a 'Permission denied' error, the file is not yet executable. In that case, change the permissions of your file:

chmod +x ~/scripts/today
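You can also script the whole exercise from start to finish. The minimal sketch below uses a temporary directory instead of ~/scripts, so your real home directory and .bashrc stay untouched. The heredoc writes the date example script from the exercise into a file named today.

```shell
#!/bin/bash
# temporary stand-in for ~/scripts
script_dir=$(mktemp -d)

# write the example script; the quoted 'EOF' keeps $(...) from being expanded now
cat > "$script_dir/today" << 'EOF'
#!/bin/bash
# print the current date and time using the date command
echo "This is a $(date +"%A %d in %B of %Y (%r)")"
EOF

# make it executable, then put its directory at the front of $PATH
chmod +x "$script_dir/today"
export PATH="$script_dir:$PATH"

# the shell now finds the script by its name alone
type -P today
today
```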

Enjoy your new power

Additional resources

1)
Sometimes the shell is also referred to as the terminal or the command line.