Basic bash commands

Below is a set of basic shell commands, each with an example application. To test the effect of these commands, download the file ejemplo.txt below and give them a try. The example file may seem to contain useless information. Still, once you understand what the commands do, they will help you during the course to extract and format information from sequence files, even ones that contain millions of lines.

ejemplo.txt
XXXXXXXXXXXXX
aaaaa	xxxxx
xxxxx	bbbbb
ccccc	xxxxx
xxxxx	ddddd
eeeee	xxxxx
aaaaa	bbbbb
ddddd	fffff
axaxa	bxbxb
XXXXXXXXXXXXX

Play with the following commands. If necessary, make changes to find out what they are doing. Remember that in a pipeline the file is named only in the first command: cat ejemplo.txt | grep "xxxxx"

Reading files + counting lines

  • cat ejemplo.txt
  • less ejemplo.txt (press q to exit)
  • head -n3 ejemplo.txt (returns the first three lines of the file)
  • tail -n2 ejemplo.txt (returns the last two lines of the file)
  • cat ejemplo.txt | wc -l (counts the number of lines in the file)
  • cat ejemplo.txt | uniq (displays the content of the file by removing successive identical lines)
  • cat ejemplo.txt | sort | uniq (sorts the content of the file, removes successive identical lines, and displays the output)
  • cat ejemplo.txt | sort | uniq -c (sorts the content of the file and prefixes each distinct line with the number of times it occurs)
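
The difference between uniq alone and sort | uniq is easy to miss, so here is a minimal sketch of it. It assumes the columns of ejemplo.txt are tab-separated and that the two XXXXXXXXXXXXX lines are part of the file; the temporary file and the variable names are only illustrative.

```shell
# Recreate the example file in a temporary location (assumption: columns are
# tab-separated and the XXXXXXXXXXXXX lines belong to the file)
ejemplo=$(mktemp)
printf 'XXXXXXXXXXXXX\naaaaa\txxxxx\nxxxxx\tbbbbb\nccccc\txxxxx\nxxxxx\tddddd\neeeee\txxxxx\naaaaa\tbbbbb\nddddd\tfffff\naxaxa\tbxbxb\nXXXXXXXXXXXXX\n' > "$ejemplo"

total=$(( $(wc -l < "$ejemplo") ))                 # every line in the file
adjacent=$(( $(uniq "$ejemplo" | wc -l) ))         # uniq merges only neighbouring duplicates
distinct=$(( $(sort "$ejemplo" | uniq | wc -l) ))  # sorting first brings duplicates together

echo "total=$total after_uniq=$adjacent after_sort_uniq=$distinct"
# prints: total=10 after_uniq=10 after_sort_uniq=9

# uniq -c prefixes each distinct line with the number of times it occurs
sort "$ejemplo" | uniq -c
```

The two identical lines sit at the top and bottom of the file, so uniq on its own removes nothing; only after sorting do they become neighbours.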

Pattern matching using grep

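  • grep "xxxxx" ejemplo.txt (searches for the pattern 'xxxxx' in the file ejemplo.txt)
  • grep "xxxxx" ejemplo.txt | wc -l (counts the number of lines in ejemplo.txt that contain the pattern 'xxxxx')
  • grep -c "xxxxx" ejemplo.txt (also counts the number of lines in ejemplo.txt that contain the pattern 'xxxxx')
  • grep -v "x" ejemplo.txt (returns the lines that do not contain the pattern)
  • grep -i "x" ejemplo.txt (makes the pattern matching case insensitive)
  • grep -o "aaaaa" ejemplo.txt (returns only the matching part of each line)
  • grep -C2 "c" ejemplo.txt (returns each matching line plus two lines of context before and after it)
  • grep -A2 "c" ejemplo.txt (returns each matching line plus the two lines after it)
  • grep -B2 "c" ejemplo.txt (returns each matching line plus the two lines before it)
  • grep -E "aaaaa|ccccc" ejemplo.txt (uses extended regular expressions; returns lines that match either pattern)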
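
As a cross-check on the counting options above, the following sketch compares counting lines with counting matches. It assumes a tab-separated ejemplo.txt that includes the XXXXXXXXXXXXX lines; the temporary file and variable names are made up for the illustration.

```shell
# Recreate the example file (assumption: tab-separated, X lines included)
ejemplo=$(mktemp)
printf 'XXXXXXXXXXXXX\naaaaa\txxxxx\nxxxxx\tbbbbb\nccccc\txxxxx\nxxxxx\tddddd\neeeee\txxxxx\naaaaa\tbbbbb\nddddd\tfffff\naxaxa\tbxbxb\nXXXXXXXXXXXXX\n' > "$ejemplo"

lines_wc=$(( $(grep "xxxxx" "$ejemplo" | wc -l) ))   # lines containing xxxxx, counted by wc
lines_c=$(grep -c "xxxxx" "$ejemplo")                # same count, done by grep itself
lines_i=$(grep -i -c "xxxxx" "$ejemplo")             # -i also matches the upper-case lines
with_x=$(grep -c "x" "$ejemplo")                     # lines containing at least one x
without_x=$(grep -v -c "x" "$ejemplo")               # lines containing no x
matches_x=$(( $(grep -o "x" "$ejemplo" | wc -l) ))   # -o prints every match on its own line

echo "wc=$lines_wc c=$lines_c i=$lines_i"
# prints: wc=5 c=5 i=7
echo "with_x=$with_x without_x=$without_x matches_x=$matches_x"
# prints: with_x=6 without_x=4 matches_x=29
```

Note that -c counts lines, not matches: six lines contain an "x", but the letter occurs 29 times in total.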

-> What would you write to find lines that contain both "a" and "b"?

2) Understanding regular expressions.

grep "^x"
grep "x$"
grep "aaaaa..." #How many characters are highlighted?#
grep "a*"
grep "a.*"
grep "a[^a]"
grep "a[ax]"
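
To see what the anchors and the star do, it can help to count matching lines instead of reading the highlighted output. This is a small sketch under the assumption that ejemplo.txt is tab-separated and includes the XXXXXXXXXXXXX lines; the variable names are illustrative.

```shell
# Recreate the example file (assumption: tab-separated, X lines included)
ejemplo=$(mktemp)
printf 'XXXXXXXXXXXXX\naaaaa\txxxxx\nxxxxx\tbbbbb\nccccc\txxxxx\nxxxxx\tddddd\neeeee\txxxxx\naaaaa\tbbbbb\nddddd\tfffff\naxaxa\tbxbxb\nXXXXXXXXXXXXX\n' > "$ejemplo"

starts_x=$(grep -c '^x' "$ejemplo")   # ^ anchors the pattern at the start of the line
ends_x=$(grep -c 'x$' "$ejemplo")     # $ anchors the pattern at the end of the line
star=$(grep -c 'a*' "$ejemplo")       # a* also matches zero a's, so every line matches
a_any=$(grep -c 'a.*' "$ejemplo")     # at least one a, followed by anything

echo "starts_x=$starts_x ends_x=$ends_x star=$star a_any=$a_any"
# prints: starts_x=2 ends_x=3 star=10 a_any=3
```

The pattern a* matching all ten lines is the usual surprise: the star means "zero or more", so it succeeds even on lines without any "a".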
Regular expressions can be combined with the previous options, particularly with -o to extract characters after a pattern.
-> Try extracting only the first 3 characters of lines that start with "a"

3) sed

sed s/x/i/
sed s/x/i/g
sed s/axaxa/kkkkk/
Combine with regular expressions.
-> Try converting the first 3 characters into "iii"

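sed + special characters (a special character might be, for example, a tab, written \t)
sed s/\t// (unquoted: the shell removes the backslash, so sed receives s/t//)
sed -e 's/\t//' (quoted: sed receives \t, which GNU sed interprets as a tab)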
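
The effect of the g flag and of quoting can be checked on a single line. The sketch below assumes GNU sed (other sed implementations may not understand \t) and uses one tab-separated line taken from ejemplo.txt; the variable names are illustrative.

```shell
# One tab-separated line from the example file
line=$(printf 'axaxa\tbxbxb')

first=$(printf '%s\n' "$line" | sed 's/x/i/')     # replaces only the first x on the line
all=$(printf '%s\n' "$line" | sed 's/x/i/g')      # g replaces every x on the line
unquoted=$(printf '%s\n' "$line" | sed s/\t//)    # the shell strips the backslash: sed sees s/t//
quoted=$(printf '%s\n' "$line" | sed 's/\t//')    # GNU sed reads \t as a tab and deletes it

printf '%s\n' "$first" "$all" "$unquoted" "$quoted"
```

The unquoted variant leaves this line unchanged, because it looks for the letter "t" and the line does not contain one. Quoting the sed expression is therefore the safer habit.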

4) cut

cut -f2
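
One detail of cut that is worth knowing for real data: lines without any tab are printed unchanged unless -s is given. A minimal sketch, assuming a tab-separated ejemplo.txt that includes the XXXXXXXXXXXXX lines (variable names are illustrative):

```shell
# Recreate the example file (assumption: tab-separated, X lines included)
ejemplo=$(mktemp)
printf 'XXXXXXXXXXXXX\naaaaa\txxxxx\nxxxxx\tbbbbb\nccccc\txxxxx\nxxxxx\tddddd\neeeee\txxxxx\naaaaa\tbbbbb\nddddd\tfffff\naxaxa\tbxbxb\nXXXXXXXXXXXXX\n' > "$ejemplo"

first_out=$(cut -f2 "$ejemplo" | head -n 1)        # a line without a tab is passed through whole
all_lines=$(( $(cut -f2 "$ejemplo" | wc -l) ))     # so every input line produces output
tab_lines=$(( $(cut -s -f2 "$ejemplo" | wc -l) ))  # -s suppresses lines that have no delimiter

echo "first_out=$first_out all_lines=$all_lines tab_lines=$tab_lines"
# prints: first_out=XXXXXXXXXXXXX all_lines=10 tab_lines=8
```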
-> Extract the first column of lines that contain "d"

5) Save/overwrite output (WARNING: do not give the output file the same name as the input file)

cut -f2 ejemplo.txt > copy_ejemplo.txt
To save without overwriting (lines get added to file):
grep "X" ejemplo.txt >> copy_ejemplo.txt
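
The following sketch shows the difference between > and >>, and why the output must not have the same name as the input. It works in a temporary directory on a recreated copy of the example file (assumed tab-separated, X lines included); scratch.txt and the variable names are hypothetical.

```shell
# Work in a temporary directory so no real files are touched
dir=$(mktemp -d)
printf 'XXXXXXXXXXXXX\naaaaa\txxxxx\nxxxxx\tbbbbb\nccccc\txxxxx\nxxxxx\tddddd\neeeee\txxxxx\naaaaa\tbbbbb\nddddd\tfffff\naxaxa\tbxbxb\nXXXXXXXXXXXXX\n' > "$dir/ejemplo.txt"

cut -f2 "$dir/ejemplo.txt" > "$dir/copy_ejemplo.txt"     # > creates or overwrites the file
after_write=$(( $(wc -l < "$dir/copy_ejemplo.txt") ))

grep "X" "$dir/ejemplo.txt" >> "$dir/copy_ejemplo.txt"   # >> appends to the existing file
after_append=$(( $(wc -l < "$dir/copy_ejemplo.txt") ))

# Same name for input and output: the shell empties the file before cut reads it
cp "$dir/ejemplo.txt" "$dir/scratch.txt"
cut -f2 "$dir/scratch.txt" > "$dir/scratch.txt"
after_clobber=$(( $(wc -l < "$dir/scratch.txt") ))

echo "after_write=$after_write after_append=$after_append after_clobber=$after_clobber"
# prints: after_write=10 after_append=12 after_clobber=0
```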

6) Both files at once vs. for-loop

grep "xxxxx" ejemplo.txt | wc -l
grep "xxxxx" copy_ejemplo.txt | wc -l
cat *plo.txt | wc -l
for i in *plo.txt; do echo "$i"; done
for i in *plo.txt; do cat "$i" | wc -l; done
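
A sketch of the contrast between the combined count and the per-file loop, run in a temporary directory on recreated copies of the two files (ejemplo.txt is assumed to be tab-separated with the X lines included; the variable names are illustrative):

```shell
# Build both files in a temporary directory
dir=$(mktemp -d)
printf 'XXXXXXXXXXXXX\naaaaa\txxxxx\nxxxxx\tbbbbb\nccccc\txxxxx\nxxxxx\tddddd\neeeee\txxxxx\naaaaa\tbbbbb\nddddd\tfffff\naxaxa\tbxbxb\nXXXXXXXXXXXXX\n' > "$dir/ejemplo.txt"
cut -f2 "$dir/ejemplo.txt" > "$dir/copy_ejemplo.txt"

# cat with a wildcard concatenates all matching files: one combined count
combined=$(( $(cat "$dir"/*plo.txt | wc -l) ))
echo "combined: $combined lines"

# The for-loop handles the files one at a time: one count per file
files=0
for i in "$dir"/*plo.txt; do
    n=$(( $(wc -l < "$i") ))
    echo "$(basename "$i"): $n lines"
    files=$((files + 1))
done
```

Quoting "$i" keeps the loop working when a file name contains spaces.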

7) tr (translate)

cat ejemplo.txt | tr -d '\n'
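
As a minimal sketch of what tr does here: -d deletes the listed characters, while two character sets translate one into the other. The example assumes a tab-separated ejemplo.txt that includes the XXXXXXXXXXXXX lines; the variable names are illustrative.

```shell
# Recreate the example file (assumption: tab-separated, X lines included)
ejemplo=$(mktemp)
printf 'XXXXXXXXXXXXX\naaaaa\txxxxx\nxxxxx\tbbbbb\nccccc\txxxxx\nxxxxx\tddddd\neeeee\txxxxx\naaaaa\tbbbbb\nddddd\tfffff\naxaxa\tbxbxb\nXXXXXXXXXXXXX\n' > "$ejemplo"

joined=$(tr -d '\n' < "$ejemplo")                          # all lines glued into one string
newlines=$(( $(tr -d '\n' < "$ejemplo" | wc -l) ))         # no newline characters are left
commas=$(head -n 2 "$ejemplo" | tail -n 1 | tr '\t' ',')   # translate the tab into a comma

echo "length=${#joined} newlines=$newlines second_line=$commas"
# prints: length=114 newlines=0 second_line=aaaaa,xxxxx
```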