Differences

This shows you the differences between two versions of the page.

--- general:computerenvironment:shell_basic_commands [2019/01/11 15:04] – [Regular expressions] ingo
+++ general:computerenvironment:shell_basic_commands [2020/09/07 18:20] (current) – [If / Else] ingo
@@ Line 19: / Line 19: @@
 Remember to call the file only in the first command given:
 cat ejemplo.txt | grep "xxxxx"
+----
+Tip: You can create this file with:
+<code>
+nano ejemplo.txt
+</code>
+Paste the contents of the ejemplo file into the nano editor, then exit with Ctrl + X (STRG + X on a German keyboard) and confirm that you want to save.
+Alternatively, you can download the file by clicking on its name and do the exercises in a Linux environment on your local computer (for example using MobaXterm on Windows, or the command line on macOS).
+----
 ===== Reading files + counting lines =====
@@ Line 29: / Line 40: @@
   * ''cat ejemplo.txt | sort | uniq'' (sorts the content of the file, removes successive identical lines, and displays the output)
   * ''cat ejemplo.txt | sort | uniq -c'' (prefixes each distinct line with the number of times it occurs)
+===== Wildcards =====
+The commands will apply to all the files in the current directory that match the pattern.
+"*" is any number of any character. For example, the following command will concatenate all the files that have the ending ".txt":
+<code>
+cat *.txt
+</code>
+"?" works as any single character:
+<code>
+cat ejemplo.t?t
+</code>
+----
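A minimal sketch of the two wildcards in action. The file names below are hypothetical and are created in a temporary directory only for this demonstration; `echo` is used instead of `cat` so that you see which names the shell expands each pattern to.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files, created only for this demonstration.
printf 'one\n' > ejemplo.txt
printf 'two\n' > ejemplo.tat
printf 'three\n' > notes.txt

# "*" matches any run of characters: both .txt files are matched.
star_matches=$(echo *.txt)
echo "$star_matches"

# "?" matches exactly one character: ejemplo.tat and ejemplo.txt both fit.
question_matches=$(echo ejemplo.t?t)
echo "$question_matches"
```

The shell expands the pattern before the command runs, so the command itself never sees the "*" or "?".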
+===== Tab completion =====
+You can auto-complete paths by pressing the tab key once or twice at any time while typing a path. If auto-completion does not work, there might be something wrong with your path.
+You can check out [[https://youtu.be/igwD7cL6QOk|this video]] to see how it works.
+----
 ===== Pattern matching using grep =====
@@ Line 45: / Line 77: @@
 ----
 ===== Regular expressions =====
   * ''grep "^x" ejemplo.txt'' (returns all lines starting with an 'x')
@@ Line 54: / Line 87: @@
   * ''grep "a[ax]" ejemplo.txt'' (returns all lines where an 'a' is either followed by an 'a' or by an 'x')
-Regular expressions can be combined with the previous options, particularly with -o to extract characters after a pattern.
+Note that regular expressions match against the contents of a file, whereas wildcards match file names. Do not confuse the two.
 <hidden Exercise>
-Try extracting only the first 3 characters of lines that start with "a"
+Regular expressions can be combined with the previous options, particularly with -o to extract characters after a pattern. Try extracting only the first 3 characters of lines that start with "a"
 </hidden>
@@ Line 62: / Line 95: @@
 ===== Text editing using sed =====
-	sed s/x/i/
+<code>
-	sed s/x/i/g
+sed s/x/i/ ejemplo.txt
-	sed s/axaxa/kkkkk/
+sed s/x/i/g ejemplo.txt
+sed s/axaxa/kkkkk/ ejemplo.txt
+</code>
-	Combine with regular expressions.
+**Sed and special characters.**
-	-> Try converting the first 3 characters into "iii"
+(A special character might be a space, a tab (\t), a symbol reserved for regular expressions ($), etc.):
-	sed + special characters (An special character might be )
-	sed s/\t//
-	sed -e 's/\t//'
-) cut
+<code>
-	cut -f2
+sed s/\t// ejemplo.txt
-	-> Extract the first column of lines that contain "d"
+sed -e 's/\t//' ejemplo.txt
+</code>
-) Save/overwrite output (ACHTUNG: do not give same name as input file)
+<hidden Exercise>
-	cut -f2 ejemplo.txt > copy_ejemplo.txt
+Combine with regular expressions -> Try converting the first 3 characters into "iii"
+</hidden>
-	To save without overwriting (lines get added to file):
+----
-	grep "X" ejemplo.txt >> copy_ejemplo.txt
-) both files vs for-loop
+===== Cut =====
-	grep "xxxxx" ejemplo.txt | wc -l
+Extract columns from a table
-	grep "xxxxx" copy_ejemplo.txt | wc -l
+<code>
-	cat *plo.txt | wc -l
+cut -f2 ejemplo.txt
+</code>
+<hidden Exercise>
+Extract the first column of lines that contain the character "d"
+</hidden>
+----
-	for i in *plo.txt; do echo $i; done
+===== Save/overwrite output =====
-	for i in *plo.txt; do cat $i | wc -l; done
+(Warning: do not give the output file the same name as the input file)
+<code>
+cut -f2 ejemplo.txt > copy_ejemplo.txt
+# To save without overwriting (lines get appended to the file):
+grep "X" ejemplo.txt >> copy_ejemplo.txt
+</code>
+----
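The difference between ">" and ">>" can be sketched as follows. The file demo.txt and its contents are assumptions made up for this example; they only stand in for ejemplo.txt.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-column, tab-separated input standing in for ejemplo.txt.
printf 'aaaaa\txxxxx\nbbbbb\tXXXXX\n' > demo.txt

# ">" creates the file, or replaces its previous contents if it exists.
cut -f2 demo.txt > copy_demo.txt
lines_after_overwrite=$(( $(wc -l < copy_demo.txt) ))

# ">>" appends to whatever is already in the file.
grep "X" demo.txt >> copy_demo.txt
lines_after_append=$(( $(wc -l < copy_demo.txt) ))

echo "after >: $lines_after_overwrite lines, after >>: $lines_after_append lines"
```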
+===== Other commands =====
+**Echo:**
+Print something on the terminal
+<code>
+echo "Witness me"
+</code>
+**Translate:**
+Convert one character to another.
+<code>
+cut -f1 ejemplo.txt
+cut -f1 ejemplo.txt | tr -d '\n'
+# Notice anything in the output? Allow me to fix it:
+cut -f1 ejemplo.txt | tr -d '\n' | sed -e 's/$/\n/'</code>
+**Rev:**
+Print each line backwards.
+<code>
+echo "a b c d e"
+echo "a b c d e" | rev
+</code>
+===== Common errors =====
+<hidden Calling the file twice>
+<code>
+grep "XXX" ejemplo.txt | sed -e 's/X/V/g' ejemplo.txt
+</code>
+The output of the first part of the command is not getting passed as the input to the second part of the command, since the file is being read again.
+</hidden>
+<hidden Saving file with the input name>
+<code>
+cat ejemplo.txt > copy.txt
+grep "aaa" copy.txt > copy.txt
+cat copy.txt
+</code>
+The input file is also the output file. The shell empties copy.txt before grep gets to read it, so its contents are lost.
+</hidden>
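One common way around this, sketched here under the assumption that a temporary file next to the original is acceptable, is to write the result to a different name first and rename it afterwards. The name copy.txt.tmp is hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input file for the demonstration.
printf 'aaa\nbbb\naaa\n' > copy.txt

# Write the result to a temporary file first, and replace the original
# only if grep succeeded, i.e. after it has finished reading copy.txt.
grep "aaa" copy.txt > copy.txt.tmp && mv copy.txt.tmp copy.txt

remaining=$(( $(wc -l < copy.txt) ))
echo "$remaining"
```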
+<hidden Moving files into non-existing directories>
+<code>
+cat ejemplo.txt > copy.txt
+mv copy.txt folder
+cd folder
+cat folder
+</code>
+If a directory does not exist, 'mv' renames the file instead</hidden>
+<hidden Grep and quotations>
+Trying to find all ">" symbols in the following file:
+<file txt important_sequence.fasta>
+>Important_sequence1
+AAATTCTCACCCCTCAGAAA
+>Important_sequence2
+ACCTCAGAAAAATTCTCACC
+</file>
+<code>
+grep > important_sequence.fasta
+</code>
+Without quotation marks, the shell interprets ">" as output redirection ("save to file") instead of passing it to grep as the pattern to find. The original file will be overwritten with nothing.
+</hidden>
+<hidden Wildcards or regular expressions>
+<code>
+sed -e 's/a*//' ejemplo.txt
+</code>
+Note that "*" has a different meaning as a wildcard and as a regular expression. Wildcard "*" = Regex ".*"
+</hidden>
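A small sketch of the two meanings of "a*". The file name abc.txt and the input string are made up for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'x\n' > abc.txt   # hypothetical file name

# As a wildcard, "a*" means: the letter "a" followed by anything.
glob_result=$(echo a*)

# As a regular expression, "a*" means: zero or more repetitions of "a",
# so only the leading run of "a" characters is removed.
regex_result=$(printf 'aaabc\n' | sed -e 's/a*//')

# The regex counterpart of the wildcard "a*" is "a.*", which here
# removes everything from the first "a" to the end of the line.
dot_star_result=$(printf 'aaabc\n' | sed -e 's/a.*//')

echo "glob: $glob_result | a*: $regex_result | a.*: $dot_star_result"
```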
+<hidden File names>
+Some naming schemes can make your life as a programmer a living hell.
+<file txt output file final final definitive 6.fasta>
+aaa
+bbb
+ccc
+</file>
+<code>
+rename -e 's/_/ /g' output_file_final_final_definitive_6.fasta
+cat output\ file\ final\ final\ definitive\ 6.fasta
+</code>
+</hidden>
+----
+====== Slightly more advanced bash commands ======
+===== For-loop =====
+First we will create another file. Observe differences in output when concatenating both files and when working with them separately in a for-loop:
+<code>
+grep "fffff" ejemplo.txt > copy_ejemplo.txt
+cat ejemplo.txt | wc -l
+cat copy_ejemplo.txt | wc -l
+</code>
+**Both files**
+<code>cat *plo.txt | wc -l</code>
+**For-loop**
+<ff serif><fs large>for <fc #4682b4>i</fc> in <fc #4682b4>*plo.txt</fc>; do <fc #800080>echo $i</fc>; done</fs></ff>
+  * A <fc #4682b4>list of files</fc> that you want to process; namely, the files ejemplo.txt and copy_ejemplo.txt
+  * <fc #800080> Instruction for each element of your list</fc>; in this case, print said element on the terminal. "$i" is the current element of the list; the variable name "i" is chosen by the user, so you could just as well name it "k" and refer to each element as "$k".
+  * The rest is part of the basic syntax of a for-loop.
+We next change the instruction from the example above, to reading the file and counting its lines:
+<code>for i in *plo.txt; do cat $i | wc -l; done</code>
+Which file is which?
+<code>for i in *plo.txt; do cat $i | wc -l | sed -e "s/^/$i\t/"; done</code>
+<hidden Exercise>
+Find the lines that contain "a", "b", and "c" in ejemplo.txt, and save them into their respective files (a.txt, b.txt, c.txt).
+This means, each file should look like this:
+<file txt a.txt>
+aaaaa	xxxxx
+aaaaa	bbbbb
+axaxa	bxbxb
+</file>
+Hint: let me start the command for you:
+<code>
+for i in a b c; do ...
+</code>
+</hidden>
+----
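The same loop can also be written over several lines, which is easier to read once the instruction grows. This is only a sketch: the two files are recreated here with made-up contents, and the variable "report" is an assumption introduced to collect the results.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical files whose names both end in "plo.txt".
printf 'a\nb\nc\n' > ejemplo.txt
printf 'a\n' > copy_ejemplo.txt

report=""
for i in *plo.txt; do
    # Count the lines of the current file; "$i" is quoted so that
    # names containing spaces are passed as a single argument.
    n=$(( $(wc -l < "$i") ))
    # Remember which count belongs to which file.
    report="$report$i:$n "
done
echo "$report"
```

Quoting "$i" matters as soon as a file name contains a space, which ties in with the file-name pitfalls described under common errors.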
+===== AWK =====
+This command is also considered a programming language on its own. It is particularly useful when needing to process the elements of a table. The basic syntax is the following:
+<ff serif><fs large>awk <fc #008000>-F "\t"</fc> '{print <fc #4682b4>$2</fc> <fc #fa8072>"\t"</fc> $1}'</fs></ff>
+  * Indicate the <fc #008000>delimiter of the table</fc>. Here it is set to a tab; the default is any run of whitespace (spaces or tabs).
+  * <fc #4682b4>$ followed by a number selects the column</fc> that you wish to return
+  * <fc #fa8072>Text added to the table</fc> must be written inside quotation marks; in this case the text is the addition of a tab character.
+  * Columns can also be processed with mathematical operations. For instance, print $1 + $2.
+<code>
+grep "aaaaa" ejemplo.txt
+grep "aaaaa" ejemplo.txt | awk -F "\t" '{print $2 "\t" $1}'
+</code>
+<ff serif><fs large>grep "aaaaa" ejemplo.txt | awk -F "\t" '<fc #800080>$2 == "xxxxx"</fc> {print <fc #ff0000>$0</fc>}'</fs></ff>
+  * You can indicate to only print the rows where the <fc #800080>column meets a certain condition</fc>. In this case, column 2 must be equal to "xxxxx".
+    * Equal ==
+    * Not equal !=
+    * Greater than >
+  * <fc #ff0000>$0</fc> stands for the whole line, so printing it returns all columns of the table
+----
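The three uses of awk described above (reordering columns, filtering rows by a condition, and arithmetic on columns) can be tried out on a small table. The table contents are assumptions standing in for ejemplo.txt.

```shell
# Hypothetical tab-separated table standing in for ejemplo.txt.
table=$(printf 'aaaaa\txxxxx\naaaaa\tbbbbb\n')

# Swap the two columns, joining them with a tab.
swapped=$(printf '%s\n' "$table" | awk -F "\t" '{print $2 "\t" $1}')

# Keep only the rows whose second column equals "xxxxx".
filtered=$(printf '%s\n' "$table" | awk -F "\t" '$2 == "xxxxx" {print $0}')

# Arithmetic on columns: add the two fields of a numeric row.
total=$(printf '3\t4\n' | awk -F "\t" '{print $1 + $2}')

printf '%s\n' "$swapped" "$filtered" "$total"
```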
-) translate
+===== If / Else =====
-	cat ejemplo.txt | tr -d '\n'
+<code>
+cat ejemplo.txt | wc -l
+if [[ $(cat ejemplo.txt | wc -l) -ge 9 ]] ; then echo "File has nine or more lines"; else echo "File has fewer than nine lines"; fi
+</code>
+To compare numbers:
+  * -eq = is equal to
+  * -ne = is not equal to
+  * -ge = is greater than or equal to
+  * -le = is less than or equal to
+  * -gt = is greater than
+  * -lt = is less than
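
A multi-line sketch of the same test with an extra branch. It assumes a made-up ten-line file, demo.txt, and uses the portable single-bracket form `[ ... ]`, which accepts the same numeric operators as `[[ ... ]]`.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical ten-line file standing in for ejemplo.txt.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > demo.txt

# Store the line count once instead of recomputing it in every test.
n=$(( $(wc -l < demo.txt) ))

if [ "$n" -ge 9 ]; then
    verdict="nine or more"
elif [ "$n" -gt 0 ]; then
    verdict="fewer than nine"
else
    verdict="empty"
fi
echo "File has $n lines: $verdict"
```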
