Differences

This shows you the differences between two versions of the page.

general:computerenvironment:shell_basic_commands [2019/01/11 15:03] – [Pattern matching using grep] ingogeneral:computerenvironment:shell_basic_commands [2020/09/07 18:20] (current) – [If / Else] ingo
Line 19: Line 19:
 Remember to call the file only in the first command given: Remember to call the file only in the first command given:
 cat ejemplo.txt | grep "xxxxx" cat ejemplo.txt | grep "xxxxx"
 +
 +----
 +Tip: You can create this file with:
 +<code>
 +nano ejemplo.txt
 +</code>
 +Paste the contents of the ejemplo file into the nano editor, then exit with Ctrl+X; nano asks whether to save, so confirm with Y and Enter.
 +
 +Alternatively, you can download the file by clicking on its name and do the exercises in a Linux environment on your local computer (for example using MobaXterm on Windows, or the command line on macOS).
 +----
 +
  
 ===== Reading files + counting lines ===== ===== Reading files + counting lines =====
Line 29: Line 40:
   * ''cat ejemplo.txt | sort | uniq'' (sorts the content of the file, removes successive identical lines, and displays the output)   * ''cat ejemplo.txt | sort | uniq'' (sorts the content of the file, removes successive identical lines, and displays the output)
   * ''cat ejemplo.txt | sort | uniq -c'' (prefixes each distinct line with the number of times it occurs)   * ''cat ejemplo.txt | sort | uniq -c'' (prefixes each distinct line with the number of times it occurs)
 +
 +
 +===== Wildcards =====
 +The commands will apply to all the files in the current directory whose names match the pattern.
 +"*" matches any number of any characters (including none). For example, the following command will concatenate all the files whose names end in ".txt": 
 +<code>
 +cat *.txt
 +</code>
 +"?" matches exactly one character:
 +<code>
 +cat ejemplo.t?t
 +</code>
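The difference between the two wildcards can be checked in a scratch directory. This is only a sketch: the directory and the files ''ejemplo.tot'' and ''notes.txt'' are hypothetical and created just for the illustration.

```shell
# Hypothetical scratch directory and file names, used only for this illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > ejemplo.txt
printf 'two\n' > ejemplo.tot
printf 'three\n' > notes.txt

# "*" matches any number of characters: every file name ending in ".txt".
star_matches=$(echo *.txt)
# "?" matches exactly one character: ejemplo.tot and ejemplo.txt.
question_matches=$(echo ejemplo.t?t)

echo "$star_matches"
echo "$question_matches"
```

Using ''echo'' instead of ''cat'' shows which file names the shell expands the pattern to before any command runs.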
 +
 +----
 +
 +===== Tab completion =====
 +
 +You can auto-complete paths by pressing the tab key once or twice at any time while writing a path. If the auto-completion does not work, there might be something wrong with your path.
 +
 +You can check out [[https://youtu.be/igwD7cL6QOk|this video]] to see how it works.
 +----
  
 ===== Pattern matching using grep ===== ===== Pattern matching using grep =====
Line 40: Line 72:
   * ''grep -B2 "c" ejemplo.txt'' (returns the matching plus the two preceding lines)   * ''grep -B2 "c" ejemplo.txt'' (returns the matching plus the two preceding lines)
   * ''grep -E "aaaaa|ccccc" ejemplo.txt'' (returns lines that contain **either** 'aaaaa' or 'ccccc')   * ''grep -E "aaaaa|ccccc" ejemplo.txt'' (returns lines that contain **either** 'aaaaa' or 'ccccc')
-    * What would you have write to find lines that contain both "a" and "b"?+<hidden Exercise> 
 +What would you have to write to find lines that contain both "a" and "b"? 
 +</hidden> 
 + 
 +----
  
 ===== Regular expressions ===== ===== Regular expressions =====
Line 51: Line 87:
   * ''grep "a[ax]" ejemplo.txt'' (returns all lines where an 'a' is either followed by an 'a' or by an 'x')   * ''grep "a[ax]" ejemplo.txt'' (returns all lines where an 'a' is either followed by an 'a' or by an 'x')
  
-Regular expressions can be combined with the previous options, particularly with -o to extract characters after a pattern.+Note that regular expressions are matched against file contents. Do not confuse them with wildcards, which the shell matches against file names.
 <hidden Exercise> <hidden Exercise>
-Try extracting only the first 3 characters of lines that start with "a"+Regular expressions can be combined with the previous options, particularly with -o to extract characters after a pattern. Try extracting only the first 3 characters of lines that start with "a"
 </hidden> </hidden>
-3) sed 
- sed s/x/i/ 
- sed s/x/i/g 
- sed s/axaxa/kkkkk/ 
  
- Combine with regular expressions. +---- 
- -> Try converting the first 3 characters into "iii" + 
-  +===== Text editing using sed ===== 
- sed + special characters (An special character might be +<code> 
- sed s/\t// +sed s/x/i/ ejemplo.txt 
- sed -e 's/\t//'+sed s/x/i/g ejemplo.txt 
 +sed s/axaxa/kkkkk/ ejemplo.txt 
 +</code> 
 + 
 +**Sed and special characters.**  
 +(A special character can be a space, a tab (\t), a symbol reserved for regular expressions ($), etc.): 
 + 
 +<code> 
 +sed s/\t// ejemplo.txt 
 +sed -e 's/\t//' ejemplo.txt 
 +</code> 
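The two commands above do not behave the same, because the shell processes the backslash before sed sees it. The sketch below assumes GNU sed (which reads ''\t'' as a tab) and uses a hypothetical one-line sample rather than ejemplo.txt.

```shell
# Hypothetical one-line sample: the letters "at", a tab, then "b".
sample=$(printf 'at\tb')

# Unquoted: the shell removes the backslash, so sed receives s/t// and deletes the first letter "t".
unquoted=$(printf '%s\n' "$sample" | sed s/\t//)
# Quoted: sed receives s/\t// unchanged; GNU sed reads \t as a tab and deletes it.
quoted=$(printf '%s\n' "$sample" | sed 's/\t//')

printf '%s\n' "$unquoted" "$quoted"
```

Quoting the sed expression is the safer habit whenever it contains backslashes, spaces, or symbols such as ''$''.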
 + 
 +<hidden Exercise> 
 +Combine sed with regular expressions: try converting the first 3 characters of each line into "iii". 
 +</hidden> 
 + 
 +---- 
 + 
 +===== Cut ===== 
 +Extract columns from a table 
 +<code> 
 +cut -f2 ejemplo.txt 
 +</code> 
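As a rough illustration of how cut picks columns, the sketch below builds a small hypothetical tab-separated table (not the real ejemplo.txt) and also shows the ''-d'' option for other delimiters.

```shell
# Hypothetical tab-separated sample table (not the real ejemplo.txt).
table=$(mktemp)
printf 'aaaaa\txxxxx\nbbbbb\tyyyyy\n' > "$table"

# cut splits on tabs by default; -f2 keeps the second column.
second=$(cut -f2 "$table")
# -d sets another delimiter, here a comma.
csv_first=$(printf 'a,b,c\n' | cut -d, -f1)

echo "$second"
echo "$csv_first"
rm -f "$table"
```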
 +<hidden Exercise> 
 +Extract the first column of lines that contain the character "d" 
 +</hidden> 
 +---- 
 + 
 +===== Save/overwrite output =====  
 +(Warning: do not give the output file the same name as the input file.)
 +<code> 
 +cut -f2 ejemplo.txt > copy_ejemplo.txt 
 +# To save without overwriting (lines get appended to the file): 
 +grep "X" ejemplo.txt >> copy_ejemplo.txt 
 +</code> 
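The difference between ''>'' and ''>>'' can be seen with a throwaway file. This is a minimal sketch; the file is a hypothetical temporary file created with ''mktemp''.

```shell
out=$(mktemp)            # hypothetical output file
echo "first"  >  "$out"  # creates the file (or overwrites it)
echo "second" >  "$out"  # overwrites again: "first" is gone
echo "third"  >> "$out"  # appends a line to the existing content

result=$(cat "$out")
echo "$result"
rm -f "$out"
```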
 +---- 
 + 
 +===== Other commands ===== 
 + 
 +**Echo:** 
 +Print something on the terminal 
 +<code> 
 +echo "Witness me" 
 +</code> 
 +**Translate:** 
 +Convert one character to another. 
 +<code> 
 +cut -f1 ejemplo.txt 
 +cut -f1 ejemplo.txt | tr -d '\n' 
 +# Notice anything in the output? The final newline was deleted too. To add it back: 
 +cut -f1 ejemplo.txt | tr -d '\n' | sed -e 's/$/\n/'
 +</code>
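Besides deleting characters with ''-d'', tr can replace one set of characters with another. The inputs below are hypothetical strings chosen only to illustrate this.

```shell
# Replace every space with a tab.
spaces_to_tabs=$(echo "a b c" | tr ' ' '\t')
# Replace each character of the first set with the character at the same position in the second set.
upper=$(echo "acgt" | tr 'acgt' 'ACGT')
# Delete all newline characters, joining the lines.
no_newlines=$(printf 'a\nb\nc\n' | tr -d '\n')

printf '%s\n' "$spaces_to_tabs" "$upper" "$no_newlines"
```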
 +**Rev:** 
 +Print each line backwards.
 +<code> 
 +echo "a b c d e" 
 +echo "a b c d e" | rev 
 +</code> 
 + 
 +===== Common errors ===== 
 +<hidden Calling the file twice> 
 +<code> 
 +grep "XXX" ejemplo.txt | sed -e 's/X/V/g' ejemplo.txt 
 +</code> 
 +The output of the first part of the command is not getting passed as the input to the second part of the command, since the file is being read again.  
 +</hidden> 
 +<hidden Saving file with the input name> 
 +<code> 
 +cat ejemplo.txt > copy.txt 
 +grep "aaa" copy.txt > copy.txt 
 +cat copy.txt 
 +</code> 
 +The input file is also the output file. The shell empties copy.txt before grep gets to read it, so its contents are lost. 
 +</hidden> 
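One common way around this problem is to write the result to a temporary file first and only then replace the original. This is a sketch of that idea, not part of the original exercises; the sample data and the ''.tmp'' suffix are assumptions made for the illustration.

```shell
# Hypothetical sample data standing in for copy.txt.
workdir=$(mktemp -d)
copy="$workdir/copy.txt"
printf 'aaa\nbbb\naaa ccc\n' > "$copy"

# Write the filtered lines to a temporary file first, then replace the original.
grep "aaa" "$copy" > "$copy.tmp" && mv "$copy.tmp" "$copy"

filtered=$(cat "$copy")
echo "$filtered"
```

The ''&&'' makes sure the original is only replaced if grep succeeded.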
 +<hidden Moving files into non-existing directories> 
 +<code> 
 +cat ejemplo.txt > copy.txt 
 +mv copy.txt folder 
 +cd folder 
 +cat folder 
 +</code> 
 +If the target directory does not exist, 'mv' renames the file to that name instead, so 'cd folder' fails and 'cat folder' prints the contents of the file.
 +</hidden> 
 +<hidden Grep and quotations> 
 +Trying to find all ">" symbols in the following file: 
 +<file txt important_sequence.fasta> 
 +>Important_sequence1 
 +AAATTCTCACCCCTCAGAAA 
 +>Important_sequence2 
 +ACCTCAGAAAAATTCTCACC 
 +</file> 
 +<code> 
 +grep > important_sequence.fasta 
 +</code> 
 +Without quotation marks, the shell interprets ">" as output redirection ("save to file") instead of passing it to grep as the search pattern. The original file will be overwritten with nothing. 
 +</hidden> 
 +<hidden Wildcards or regular expressions> 
 +<code> 
 +sed -e 's/a*//' ejemplo.txt 
 +</code> 
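 +Note that "*" has a different meaning as a wildcard and as a regular expression. As a wildcard, "*" stands for any number of characters; in a regular expression, it means "zero or more of the preceding character", so "a*" also matches an empty string. Wildcard "*" = Regex ".*" 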
 +</hidden> 
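The regular-expression meaning of "*" can be checked on a few hypothetical lines (these are not taken from ejemplo.txt):

```shell
# Hypothetical sample lines.
lines=$(printf 'aaa\nbbb\nabc\n')

# Regex "a*" means "zero or more a", so it matches every line, even "bbb".
regex_star=$(printf '%s\n' "$lines" | grep -c 'a*')
# Regex "a.*" means "an a followed by any characters", so only lines containing an "a" match.
regex_dotstar=$(printf '%s\n' "$lines" | grep -c 'a.*')
# sed 's/a*//' removes only the run of "a" at the start of each line (possibly an empty run).
sed_star=$(printf '%s\n' "$lines" | sed 's/a*//')

echo "$regex_star"
echo "$regex_dotstar"
printf '%s\n' "$sed_star"
```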
 +<hidden File names> 
 +Some naming schemes can make your life as a programmer a living hell. 
 +<file txt output file final final definitive 6.fasta> 
 +aaa 
 +bbb 
 +ccc 
 +</file> 
 +<code> 
 +rename -e 's/_/ /g' output_file_final_final_definitive_6.fasta 
 +cat output\ file\ final\ final\ definitive\ 6.fasta 
 +</code> 
 +</hidden> 
 + 
 +---- 
 + 
 +====== Slightly more advanced bash commands ====== 
 + 
 +===== For-loop ===== 
 +First we will create another file. Observe differences in output when concatenating both files and when working with them separately in a for-loop: 
 +<code> 
 +grep "fffff" ejemplo.txt > copy_ejemplo.txt 
 +cat ejemplo.txt | wc -l 
 +cat copy_ejemplo.txt | wc -l 
 +</code> 
 + 
 +**Both files** 
 +<code>cat *plo.txt | wc -l</code> 
 + 
 +**For-loop** 
 + 
 +<ff serif><fs large>for <fc #4682b4>i</fc> in <fc #4682b4>*plo.txt</fc>; do <fc #800080>echo $i</fc>; done</fs></ff> 
 + 
 +  * A <fc #4682b4>list of files</fc> that you want to process; namely, the files ejemplo.txt and copy_ejemplo.txt 
 +  * <fc #800080> Instruction for each element of your list</fc>; in this case, print said element on the terminal. "$i" is the current element of the list; the variable name is chosen by the user, so you could write "for k in ..." and refer to each element as "$k". 
 +  * The rest is part of the basic syntax of a for-loop. 
 + 
 +We next change the instruction from the example above, to reading the file and counting its lines: 
 +<code>for i in *plo.txt; do cat $i | wc -l; done</code> 
 + 
 +Which file is which? 
 +<code>for i in *plo.txt; do cat $i | wc -l | sed -e "s/^/$i\t/"; done</code> 
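An alternative way to label each count is to print the file name and the count together with ''printf''. The sketch below is one possible variant, using hypothetical file contents in a scratch directory and assuming a ''wc'' that prints the bare number (as GNU wc does when reading from standard input).

```shell
# Hypothetical scratch directory with two small files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\nc\n' > ejemplo.txt
printf 'a\n' > copy_ejemplo.txt

report=$(for i in *plo.txt; do
    # Quoting "$i" keeps file names that contain spaces intact.
    printf '%s\t%s\n' "$i" "$(wc -l < "$i")"
done)
echo "$report"
```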
 + 
 +<hidden Exercise> 
 +Find the lines that contain "a", "b", and "c" in ejemplo.txt, and save them into their respective files (a.txt, b.txt, c.txt).  
 +This means, each file should look like this: 
 +<file txt a.txt> 
 +aaaaa xxxxx 
 +aaaaa bbbbb 
 +axaxa bxbxb 
 +</file> 
 +Hint: let me start the command for you: 
 +<code> 
 +for i in a b c; do ... 
 +</code> 
 +</hidden> 
 + 
 + 
 +---- 
 + 
 +===== AWK ===== 
 +This command is also considered a programming language on its own. It is particularly useful when needing to process the elements of a table. The basic syntax is the following: 
 + 
 +<ff serif><fs large>awk <fc #008000>-F "\t"</fc> '{print <fc #4682b4>$2</fc> <fc #fa8072>"\t"</fc> $1}'</fs></ff> 
 +  * Indicate the <fc #008000>delimiter of the table</fc>. Here it is set to a tab; by default awk splits on any run of spaces or tabs. 
 +  * <fc #4682b4>$ followed by a number indicates the column</fc> that you wish to return 
 +  * <fc #fa8072>Text added to the table</fc> must be written inside quotation marks; in this case the text is the addition of a tab character. 
 +  * Columns can also be processed with mathematical operations. For instance, print $1 + $2.
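The points above can be sketched on a small numeric table. The table is hypothetical and only serves to show column swapping and arithmetic.

```shell
# Hypothetical tab-separated table with two numeric columns.
table=$(printf '1\t2\n10\t20\n')

# Print column 2, a tab, then column 1.
swapped=$(printf '%s\n' "$table" | awk -F "\t" '{print $2 "\t" $1}')
# Add the two columns of each row.
sums=$(printf '%s\n' "$table" | awk -F "\t" '{print $1 + $2}')

echo "$swapped"
echo "$sums"
```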
  
-4) cut 
- cut -f2 
- -> Extract the first column of lines that contain "d" 
  
-5) Save/overwrite output (ACHTUNG: do not give same name as input file) +<code> 
- cut -f2 ejemplo.txt > copy_ejemplo.txt+grep "aaaaa" ejemplo.txt 
 +grep "aaaaa" ejemplo.txt | awk -F "\t" '{print $2 "\t" $1}' 
 +</code>
  
- To save without overwriting (lines get added to file): +<ff serif><fs large>grep "aaaaa" ejemplo.txt | awk -F "\t" '<fc #800080>$2 == "xxxxx"</fc> 
- grep "X" ejemplo.txt >> copy_ejemplo.txt+ {print <fc #ff0000>$0</fc>}'</fs></ff>
  
-6) both files vs for-loop +  * You can indicate to only print the rows where the <fc #800080>column meets a certain condition</fc>. In this case, column 2 must be equal to "xxxxx"
- grep "xxxxx" ejemplo.txt | wc -l +    * Equal == 
- grep "xxxxx" copy_ejemplo.txt | wc -l +    * Not equal != 
- cat *plo.txt | wc -l+    * Greater than > 
 +  * <fc #ff0000>$0</fc> prints all columns of the table
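Each of the three comparison operators can be tried on a small table. The three-column table below is hypothetical (the third, numeric column is an assumption added to demonstrate ">").

```shell
# Hypothetical tab-separated table: two text columns and one numeric column.
table=$(printf 'aaaaa\txxxxx\t5\naaaaa\tbbbbb\t12\n')

# Rows where column 2 equals "xxxxx"; $0 prints the whole row.
equal=$(printf '%s\n' "$table" | awk -F "\t" '$2 == "xxxxx" {print $0}')
# Rows where column 2 differs from "xxxxx"; print only column 1.
not_equal=$(printf '%s\n' "$table" | awk -F "\t" '$2 != "xxxxx" {print $1}')
# Rows where column 3 is greater than 10; print only column 2.
greater=$(printf '%s\n' "$table" | awk -F "\t" '$3 > 10 {print $2}')

printf '%s\n' "$equal" "$not_equal" "$greater"
```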
  
- for i in *plo.txt; do echo $i; done +----
- for i in *plo.txt; do cat $i | wc -l; done+
  
-7) translate +===== If / Else =====
- cat ejemplo.txt | tr -d '\n'+
  
 +<code>
 +cat ejemplo.txt | wc -l
 +if [[ $(cat ejemplo.txt | wc -l) -ge 9 ]] ; then echo "File has nine or more lines"; else echo "File has less than nine lines"; fi
 +</code>
 +To compare numbers:
 +  * -eq = is equal to
 +  * -ne = is not equal to
 +  * -ge = is greater than or equal to
 +  * -le = is less than or equal to
 +  * -gt = is greater than
 +  * -lt = is less than
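The comparison operators can be combined with ''elif'' to test several ranges. The sketch below uses a hypothetical three-line temporary file and the single-bracket form ''[ ... ]'', which accepts the same numeric operators as ''[[ ... ]]''.

```shell
# Hypothetical three-line file standing in for ejemplo.txt.
file=$(mktemp)
printf 'a\nb\nc\n' > "$file"
n=$(wc -l < "$file")

if [ "$n" -ge 9 ]; then
    msg="nine or more lines"
elif [ "$n" -gt 0 ]; then
    msg="between one and eight lines"
else
    msg="empty file"
fi

echo "$msg"
rm -f "$file"
```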