We provide below a set of basic commands to be used in the shell, together with an example application. To test the effect of these commands, download the file ejemplo.txt below and give it a try. Note that the example file seems to contain useless information. Still, once you understand what the commands are doing, they will help you throughout the course to extract and format information from sequence files, even if those files contain millions of lines.
XXXXXXXXXXXXX aaaaa xxxxx xxxxx bbbbb ccccc xxxxx xxxxx ddddd eeeee xxxxx aaaaa bbbbb ddddd fffff axaxa bxbxb XXXXXXXXXXXXX
Play with the following commands. If necessary, modify them to find out what they are doing. Remember to name the file only in the first command of a pipeline (each command after a | reads the output of the previous command), for example:
cat ejemplo.txt | grep "xxxxx"
Tip: You can create this file with:
nano ejemplo.txt
Paste the contents of the ejemplo file into the nano editor, then close it with CTRL + X (CTRL on macOS as well), press Y to save, and press Enter to confirm the file name.
Alternatively, you can download the file by clicking on its name and do the exercises in a Linux-like environment on your local computer (for example, MobaXterm on Windows, or the command line on macOS).
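Before trying the commands below on ejemplo.txt, the difference between uniq and sort | uniq -c can be seen on a tiny input. The three-line input here is hypothetical (it is not the content of ejemplo.txt): the line "b" occurs twice, but not on successive lines. The last command adds sort -nr, which is not used elsewhere on this page; it sorts the counts numerically (-n) in reverse order (-r).

```bash
# Hypothetical input (not ejemplo.txt): "b" occurs twice, but not on successive lines
printf 'b\na\nb\n' | uniq                       # b, a, b: uniq only removes successive duplicates
printf 'b\na\nb\n' | sort | uniq                # a, b: after sorting, the two b's are successive
printf 'b\na\nb\n' | sort | uniq -c             # each unique line once, preceded by its count (1 a, 2 b)
printf 'b\na\nb\n' | sort | uniq -c | sort -nr  # the most frequent line first (2 b, then 1 a)
```

This is why uniq is almost always used after sort.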
cat ejemplo.txt (displays the whole file)
less ejemplo.txt (displays the file page by page; press q to exit)
head -n3 ejemplo.txt (returns the first three lines of the file)
tail -n2 ejemplo.txt (returns the last two lines of the file)
cat ejemplo.txt | wc -l (counts the number of lines in the file)
cat ejemplo.txt | uniq (displays the content of the file, removing successive identical lines)
cat ejemplo.txt | sort | uniq (sorts the content of the file, removes successive identical lines, and displays the output)
cat ejemplo.txt | sort | uniq -c (shows each unique line once, preceded by the number of times it occurs in the file)
Wildcards: commands apply to all the files in the current directory whose names match the pattern. "*" matches any number of any characters. For example, the following command concatenates all the files whose names end in ".txt":
cat *.txt
"?" matches exactly one character of any kind:
cat ejemplo.t?t
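To see which file names a pattern expands to, you can put it after echo: the shell replaces the pattern with the matching names before the command runs. A minimal sketch, run in a fresh temporary directory with hypothetical empty files so that your own files are not involved:

```bash
cd "$(mktemp -d)" || exit 1      # fresh scratch directory (go back later with: cd -)
touch ejemplo.txt ejemplo.tst copy_ejemplo.txt notes.md   # hypothetical empty files

echo *.txt          # copy_ejemplo.txt ejemplo.txt   ("*" matches any number of characters)
echo ejemplo.t?t    # ejemplo.tst ejemplo.txt       ("?" matches exactly one character)
echo *plo.txt       # copy_ejemplo.txt ejemplo.txt
```

The matches are listed in alphabetical order, which is also the order in which cat *.txt concatenates the files.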
You can auto-complete paths by pressing the Tab key once or twice at any time while typing a path. If auto-completion does not work, there might be something wrong with your path.
You can check out this video to see how it works.
grep "xxxxx" ejemplo.txt (searches for the pattern 'xxxxx' in the file ejemplo.txt)
grep "xxxxx" ejemplo.txt | wc -l (counts the number of lines in ejemplo.txt that contain the pattern 'xxxxx')
grep -c "xxxxx" ejemplo.txt (also counts the number of lines in ejemplo.txt that contain the pattern 'xxxxx')
grep -v "x" ejemplo.txt (returns the lines that do not contain the pattern)
grep -i "x" ejemplo.txt (makes the pattern matching case-insensitive, so 'X' matches as well)
grep -o "aaaaa" ejemplo.txt (returns only the matching part of each line instead of the whole line)
grep -A2 "c" ejemplo.txt (returns each matching line plus the two lines after it)
grep -B2 "c" ejemplo.txt (returns each matching line plus the two lines before it)
grep -E "aaaaa|ccccc" ejemplo.txt (returns lines that contain either 'aaaaa' or 'ccccc')
grep "^x" ejemplo.txt (returns all lines starting with an 'x')
grep "x$" ejemplo.txt (returns all lines ending with an 'x')
grep "aaaaa..." ejemplo.txt (How many characters are returned?)
grep "a*" ejemplo.txt (returns all lines, because "a*" means zero or more 'a' and therefore also matches an empty string)
grep "a.*" ejemplo.txt (returns all lines containing at least one 'a'; ".*" matches any number of any characters)
grep "a[^a]" ejemplo.txt (returns all lines in which at least one 'a' is followed by a character other than 'a')
grep "a[ax]" ejemplo.txt (returns all lines in which an 'a' is followed either by an 'a' or by an 'x')
Note that regular expressions are matched against file contents, whereas wildcards are expanded by the shell to match file names. Do not confuse the two.
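Several of the options above can be checked on a small file whose content is known in advance. The three lines below (xaaa, aaax, bbbb) are hypothetical and stored in a temporary file created with mktemp; the comments show what each command should print.

```bash
# Hypothetical sample file (not ejemplo.txt) with three lines: xaaa, aaax, bbbb
sample=$(mktemp)
printf 'xaaa\naaax\nbbbb\n' > "$sample"

grep "^x" "$sample"               # xaaa: the line starts with an x
grep "x$" "$sample"               # aaax: the line ends with an x
grep -c "a" "$sample"             # 2: counts matching lines, not single matches
grep -o "a" "$sample" | wc -l     # 6: -o prints every single match on its own line
grep -v "a" "$sample"             # bbbb: the only line without an a
grep -E "^x|^b" "$sample"         # xaaa and bbbb: either alternative may match
grep "a[^a]" "$sample"            # aaax: the only a followed by a character other than a
```

Comparing grep -c with grep -o | wc -l shows the difference between counting lines and counting matches.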
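Sed: search and replace text. The result is printed on the terminal; the file itself is not changed.
sed 's/x/i/' ejemplo.txt (replaces the first 'x' on each line with an 'i')
sed 's/x/i/g' ejemplo.txt (replaces every 'x' on each line; g stands for global)
sed 's/axaxa/kkkkk/' ejemplo.txt (replaces 'axaxa' with 'kkkkk')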
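A minimal sketch of the difference between replacing the first match and replacing all matches, on a hypothetical line instead of ejemplo.txt. The third command uses a numeric flag, which replaces only the n-th match on each line; it is standard sed syntax but is not used elsewhere on this page.

```bash
echo "xaxbx" | sed 's/x/i/'     # iaxbx: only the first x is replaced
echo "xaxbx" | sed 's/x/i/g'    # iaibi: g replaces every x on the line
echo "xaxbx" | sed 's/x/i/2'    # xaibx: only the second x is replaced
```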
Sed and special characters (a special character might be a space, a tab (\t), a symbol reserved for regular expressions such as $, etc.):
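sed s/\t// ejemplo.txt (without quotes, the shell removes the backslash, so sed actually receives s/t// and deletes the first 't' on each line)
sed -e 's/\t//' ejemplo.txt (with single quotes, sed receives \t; GNU sed reads it as a tab and deletes the first tab on each line)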
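The difference between the two commands comes from the shell, not from sed. printf '%s\n' prints its argument exactly as the command receives it, which makes the effect of the quotes visible. The last line assumes GNU sed (the default on Linux); the sed shipped with macOS does not read \t as a tab.

```bash
printf '%s\n' s/\t//      # s/t//  : without quotes the shell removes the backslash
printf '%s\n' 's/\t//'    # s/\t// : single quotes pass the text on unchanged

printf 'a\tb\n' | sed s/\t//        # a<tab>b, unchanged: sed received s/t// and there is no t
printf 'a\tb\n' | sed -e 's/\t//'   # ab: GNU sed reads \t as a tab and deletes it
```

As a rule of thumb, put sed (and awk) scripts in single quotes.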
Cut: extract columns from a table (by default, the columns must be separated by tabs)
cut -f2 ejemplo.txt
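cut expects tab-separated columns unless another delimiter is given with -d. A minimal sketch on a hypothetical three-column table; the column names and values are made up for illustration.

```bash
table=$(mktemp)
# Hypothetical tab-separated table with a header line
printf 'id\tname\tlength\nseq1\tgeneA\t120\nseq2\tgeneB\t85\n' > "$table"

cut -f2 "$table"               # name, geneA, geneB: the second column
cut -f1,3 "$table"             # columns 1 and 3, still separated by a tab
echo "a,b,c" | cut -d ',' -f2  # b: -d sets another delimiter, here a comma
```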
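Save the output in a file (WARNING: do not give the output file the same name as the input file. The shell empties the output file before the command starts reading, so the input would be lost):
cut -f2 ejemplo.txt > copy_ejemplo.txt
To save without overwriting (lines get added to the end of the file):
grep "X" ejemplo.txt >> copy_ejemplo.txt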
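The difference between > and >>, and the reason for the warning, can be reproduced safely in a fresh temporary directory. The file out.txt and its content are hypothetical; wc -l < file is used because it prints only the number, without the file name.

```bash
cd "$(mktemp -d)" || exit 1      # fresh scratch directory, so no real file is touched
printf 'one\ttwo\n' > out.txt     # ">" creates out.txt (or empties an existing one first)
printf 'three\n' >> out.txt       # ">>" appends: out.txt now has 2 lines
wc -l < out.txt                   # 2

cut -f1 out.txt > out.txt         # same name for input and output...
wc -l < out.txt                   # 0: the shell emptied out.txt before cut could read it
```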
Echo: Print something on the terminal
echo "Witness me"
Tr (translate): replace characters with other characters, or delete them with -d.
cut -f1 ejemplo.txt
cut -f1 ejemplo.txt | tr -d '\n' (deletes all newline characters)
Notice anything in the output? Allow me to fix it (with GNU sed, this appends a newline at the end of the output):
cut -f1 ejemplo.txt | tr -d '\n' | sed -e 's/$/\n/'
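A few more tr examples on hypothetical strings. tr replaces each character of the first set with the character at the same position in the second set, and -d deletes the characters of the set. The last line shows a common alternative to the sed fix: running echo afterwards prints the missing final newline.

```bash
echo "Witness me" | tr 'a-z' 'A-Z'     # WITNESS ME: a-z and A-Z are ranges of letters
echo "acgt" | tr 'acgt' 'ACGT'         # ACGT: a->A, c->C, g->G, t->T
echo "a_b_c" | tr '_' ' '              # a b c
echo "a_b_c" | tr -d '_'               # abc: -d deletes the listed characters
printf 'x\ny\nz\n' | tr -d '\n'; echo  # xyz, then echo adds the final newline
```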
Rev: print each line backwards
echo "a b c d e"
echo "a b c d e" | rev
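rev reverses each line separately and keeps the order of the lines. Combined with cut, it gives a common trick to extract the last column when the number of columns is not known, because cut counts fields only from the left. The input strings are hypothetical.

```bash
printf 'abc\nxyz\n' | rev                              # cba, then zyx: line order is kept
echo "alpha beta gamma" | rev                          # ammag ateb ahpla
echo "alpha beta gamma" | rev | cut -d ' ' -f1 | rev   # gamma: the last space-separated field
```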
First, we create a second file. Observe the difference in the output when concatenating both files and when processing them separately in a for-loop:
grep "fffff" ejemplo.txt > copy_ejemplo.txt
cat ejemplo.txt | wc -l
cat copy_ejemplo.txt | wc -l
Both files
cat *plo.txt | wc -l
For-loop
for i in *plo.txt; do echo $i; done
Next, we change the command inside the loop so that, instead of printing the file name, it reads each file and counts its lines:
for i in *plo.txt; do cat $i | wc -l; done
Which file is which?
for i in *plo.txt; do cat $i | wc -l | sed -e "s/^/$i\t/"; done
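The same comparison as a self-contained sketch: two hypothetical files with known line counts are created in a fresh temporary directory. The loop uses printf and wc -l < "$i" instead of sed to put the file name in front of the count, which is just an alternative; quoting "$i" protects file names that contain spaces. The counts in the comments are as printed by GNU wc on Linux; the wc on macOS pads the numbers with spaces.

```bash
cd "$(mktemp -d)" || exit 1          # fresh scratch directory (go back later with: cd -)
printf 'a\nb\nc\n' > ejemplo.txt      # hypothetical file with 3 lines
printf 'c\n' > copy_ejemplo.txt       # hypothetical file with 1 line

cat *plo.txt | wc -l                  # 4: a single count for both files together

for i in *plo.txt; do
    # one line per file: the name, a tab, and its number of lines
    printf '%s\t%s\n' "$i" "$(wc -l < "$i")"
done
# copy_ejemplo.txt  1   (tab-separated)
# ejemplo.txt       3
```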
Awk: this command is also considered a programming language in its own right. It is particularly useful for processing the columns of a table. The basic syntax is the following (-F sets the field separator, here a tab, and $1, $2, ... refer to the first, second, ... column):
awk -F "\t" '{print $2 "\t" $1}' (prints column 2, a tab, and column 1)
grep "aaaaa" ejemplo.txt
grep "aaaaa" ejemplo.txt | awk -F "\t" '{print $2 "\t" $1}'
grep "aaaaa" ejemplo.txt | awk -F "\t" '$2 == "xxxxx" {print $0}' (prints the whole line, $0, only if column 2 is 'xxxxx')
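The awk examples above can be extended in a few common ways. A minimal sketch on a hypothetical tab-separated table with three columns (the third is a made-up number); NR (current line number), NF (number of fields) and the END block are standard awk features not shown elsewhere on this page.

```bash
# Hypothetical tab-separated table: two text columns and a made-up number
data=$(mktemp)
printf 'aaaaa\txxxxx\t3\naaaaa\tbbbbb\t5\nccccc\txxxxx\t7\n' > "$data"

awk -F "\t" '{print $2 "\t" $1}' "$data"           # columns 2 and 1, swapped
awk -F "\t" '$2 == "xxxxx" {print $1}' "$data"     # aaaaa, ccccc: column 2 must equal xxxxx
awk -F "\t" '$3 > 4' "$data"                        # lines 2 and 3: without {...} the whole line is printed
awk -F "\t" '{sum += $3} END {print sum}' "$data"   # 15: END runs once, after the last line
awk -F "\t" '{print NR, NF}' "$data"                # line number and number of fields: 1 3, 2 3, 3 3
```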
If-else: run one of two commands depending on a condition.
cat ejemplo.txt | wc -l
if [[ $(cat ejemplo.txt | wc -l) -ge 9 ]]; then echo "File has nine or more lines"; else echo "File has fewer than nine lines"; fi
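The same check written over several lines, on a hypothetical three-line file instead of ejemplo.txt. Here wc -l < file prints only the number (without the file name), which is all the comparison needs; the condition itself is the one from the one-line example. [[ ... ]] is bash syntax, so this sketch assumes a bash shell.

```bash
file=$(mktemp)
printf 'a\nb\nc\n' > "$file"                 # hypothetical file with 3 lines

n=$(wc -l < "$file")                         # "<" makes wc print the number only
if [[ $n -ge 9 ]]; then
    echo "File has nine or more lines"
else
    echo "File has fewer than nine lines"    # printed, since n is 3
fi
```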
To compare numbers: