
Differences

This shows you the differences between two versions of the page.

general:computerenvironment:bash [2019/04/29 18:41] rubengeneral:computerenvironment:bash [2023/04/11 13:17] (current) felix
Line 11: Line 11:
 This command is also considered a programming language on its own. It is particularly useful when you need to process the elements of a table. The basic syntax is as follows: This command is also considered a programming language on its own. It is particularly useful when you need to process the elements of a table. The basic syntax is as follows:
  
-awk -F "\t" '{print $2 "\t" $1}' file.txt+<code>awk -F "\t" '{print $2 "\t" $1}' file.txt</code>
  
   * -F sets the field delimiter of your table, here a tab (the default is any run of whitespace).   * -F sets the field delimiter of your table, here a tab (the default is any run of whitespace).
Line 19: Line 19:
  
  
-awk -F "\t" '$3 > 10 {print $0}'+<code>awk -F "\t" '$3 > 10 {print $0}'</code>
  
   * You can keep only the rows in which a column meets a certain condition. For example, here column 3 must have a value greater than 10.   * You can keep only the rows in which a column meets a certain condition. For example, here column 3 must have a value greater than 10.
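As a minimal sketch of both awk forms, the following runs them on a small made-up table. The file contents and the variable names `tmp`, `filtered` and `swapped` are assumptions for illustration only.

```shell
# Hypothetical three-column, tab-separated table (name, group, score).
tmp=$(mktemp)
printf 'geneA\tx\t5\ngeneB\ty\t12\ngeneC\tz\t30\n' > "$tmp"

# Keep only the rows whose third column is greater than 10.
filtered=$(awk -F "\t" '$3 > 10 {print $0}' "$tmp")
printf '%s\n' "$filtered"

# Print the second column followed by the first one.
swapped=$(awk -F "\t" '{print $2 "\t" $1}' "$tmp")
printf '%s\n' "$swapped"

rm -f "$tmp"
```

With this input, the filter should keep the geneB and geneC rows and drop geneA.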
Line 31: Line 31:
 === Sort === === Sort ===
  
-sort -g -u file.txt+<code>sort -g -u file.txt</code>
  
   * -r to sort in descending order   * -r to sort in descending order
Line 43: Line 43:
 === Translate === === Translate ===
  
-tr '[ATGC]' '[TACG]' | rev+<code>tr '[ATGC]' '[TACG]' | rev</code>
  
   * This trick gives the reverse complement, which helps if you are working on the complementary strand.   * This trick gives the reverse complement, which helps if you are working on the complementary strand.
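A short sketch of the same pipeline on a made-up sequence. The sequence and the variable names `seq` and `revcomp` are assumptions; the square brackets are left out because tr does not need them.

```shell
# Hypothetical DNA sequence, read 5' to 3'.
seq="ATGGC"

# tr swaps each base for its complement, rev reverses the line.
revcomp=$(printf '%s\n' "$seq" | tr 'ATGC' 'TACG' | rev)
printf '%s\n' "$revcomp"   # prints GCCAT
```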
Line 53: Line 53:
 === sed (slightly more advanced) === === sed (slightly more advanced) ===
  
-sed -n -e '/AAA/,/BBB/ p' file.txt+<code>sed -n -e '/AAA/,/BBB/ p' file.txt</code>
  
   * This will find AAA, and keep all the lines in a file until it reaches BBB. Pro tip: use this one to extract a sequence in a multi-line fasta.   * This will find AAA, and keep all the lines in a file until it reaches BBB. Pro tip: use this one to extract a sequence in a multi-line fasta.
   * Note that using variables inside a sed command requires double quotation marks " instead of single '.   * Note that using variables inside a sed command requires double quotation marks " instead of single '.
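The note on quotation marks can be sketched as follows, assuming a small made-up multi-line fasta and two hypothetical shell variables, `start` and `end`, that hold the patterns.

```shell
# Hypothetical multi-line fasta with three records.
tmp=$(mktemp)
printf '>seq1\nAAAA\nCCCC\n>seq2\nGGGG\nTTTT\n>seq3\nACGT\n' > "$tmp"

start=">seq2"
end=">seq3"

# Double quotes let the shell expand $start and $end before sed runs.
# With single quotes, sed would search for the literal text $start.
block=$(sed -n -e "/$start/,/$end/ p" "$tmp")
printf '%s\n' "$block"

rm -f "$tmp"
```

Note that the line matching the end pattern is part of the output, and that a pattern containing a slash would need escaping before being used this way.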
 +
 +===== Working with lists and tables =====
 +
 +Play with the following sample files: {{ :general:computerenvironment:comm_join_example.tar.gz |}}
 +
 +=== comm ===
 +
 +To compare the contents of two files (in this case, the identifiers in the first column of each file):
 +
 +<code>comm <() <()</code>
 +
 +  * Within each "<()" we place the command that produces the input to compare.
 +  * Both inputs must be sorted.
 +  * I use the second part of the command (the sed and the final cut) to pad the output so that every line has three columns.
 +
 +<code>comm <(cut -f1 1_table.txt | sort) <(cut -f1 2_table.txt | sort) | sed -e 's/$/\t\t/' | cut -f1,2,3</code>
 +
 +  * Output:
 +  * First column: identifiers found only in the table of input 1
 +  * Second column: identifiers found only in the table of input 2
 +  * Third column: identifiers present in both tables
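To see each of the three columns on its own, here is a minimal sketch on two made-up tables. The table contents are assumptions and are not taken from the sample archive. The sorted identifier lists are written to temporary files instead of being passed through "<()", which keeps the sketch runnable in any POSIX shell. The options -1, -2 and -3 tell comm to suppress the corresponding column.

```shell
# Two hypothetical tab-separated tables sharing some identifiers.
dir=$(mktemp -d)
printf 'id1\t10\nid2\t20\nid3\t30\n' > "$dir/1_table.txt"
printf 'id2\tx\nid3\ty\nid4\tz\n' > "$dir/2_table.txt"

# comm needs sorted input: extract the first column and sort it.
cut -f1 "$dir/1_table.txt" | sort > "$dir/ids1.txt"
cut -f1 "$dir/2_table.txt" | sort > "$dir/ids2.txt"

# Suppress two of the three columns to keep a single one.
only_in_1=$(comm -23 "$dir/ids1.txt" "$dir/ids2.txt")   # column 1
only_in_2=$(comm -13 "$dir/ids1.txt" "$dir/ids2.txt")   # column 2
in_both=$(comm -12 "$dir/ids1.txt" "$dir/ids2.txt")     # column 3

printf 'only in 1: %s\n' "$only_in_1"
printf 'only in 2: %s\n' "$only_in_2"
printf 'in both:\n%s\n' "$in_both"

rm -rf "$dir"
```

With these tables, id1 appears only in the first input, id4 only in the second, and id2 and id3 in both.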
 +
 +----
 +
 +
 +=== join ===
 +
 +Join two tables based on a column in common.
 +
 +<code>join -t $'\t' <() <()</code>
 +
 +  * Input within the "<()" must be sorted on the column used to join.
 +  * I highly recommend joining on the first column only, the one with the identifiers.
 +  * If one of the tables has repeated identifiers, the output will contain every possible combination. 
 +  * By default, the output contains only the lines whose identifier appears in both tables. We can add the option -a1 or -a2 to also include the unmatched entries of one of the tables, with no joined values from the other. Do not use both.
 +
 +<code>join -t $'\t' -a1 <(sort -k1,1 1_table.txt) <(sort -k1,1 2_table.txt)</code>
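The difference between the default output and -a1 can be sketched on two made-up tables. The table contents are assumptions and are not taken from the sample archive. The tab is stored in a hypothetical variable `tab`, and the sorted tables go to temporary files rather than through "<()", so that the sketch also runs in a plain POSIX shell.

```shell
# Two hypothetical unsorted, tab-separated tables.
dir=$(mktemp -d)
tab=$(printf '\t')
printf 'id3\t30\nid1\t10\nid2\t20\n' > "$dir/1_table.txt"
printf 'id2\tx\nid4\tz\nid3\ty\n' > "$dir/2_table.txt"

# join needs both inputs sorted on the join column (here the first).
sort -k1,1 "$dir/1_table.txt" > "$dir/1_sorted.txt"
sort -k1,1 "$dir/2_table.txt" > "$dir/2_sorted.txt"

# Default: only identifiers present in both tables.
inner=$(join -t "$tab" "$dir/1_sorted.txt" "$dir/2_sorted.txt")

# -a1: also keep the lines of the first table that have no match.
left=$(join -t "$tab" -a1 "$dir/1_sorted.txt" "$dir/2_sorted.txt")

printf 'default:\n%s\n' "$inner"
printf 'with -a1:\n%s\n' "$left"

rm -rf "$dir"
```

Here the default output holds the id2 and id3 rows, while -a1 also keeps the id1 row, which has no extra column because it has no partner in the second table.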
 +
 +
 +
 +