Differences

This shows you the differences between two versions of the page.

--- general:computerenvironment:bash [2019/04/29 18:41] – ruben
+++ general:computerenvironment:bash [2023/04/11 13:17] (current) – felix
@@ Line 11: / Line 11: @@
 This command is also considered a programming language on its own. It is particularly useful when you need to process the elements of a table. The basic syntax is as follows:
-awk -F "\t" '{print $2 "\t" $1} file.txt
+<code>awk -F "\t" '{print $2 "\t" $1}' file.txt</code>
   * -F to indicate the delimiter of your table as tabs (default is space).
@@ Line 19: / Line 19: @@
-awk -F "\t" '$3 > 10 {print $0}'
+<code>awk -F "\t" '$3 > 10 {print $0}'</code>
   * You can restrict the output to the rows in which a column meets a certain condition. For example, this keeps only the rows whose column 3 has a value greater than 10.
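As a minimal sketch of the two awk patterns above, the snippet below builds a small tab-separated table and applies the filter and the column swap to it. The file contents and the variable names (`tmp`, `filtered`, `swapped`) are made up for illustration and are not part of the original page.

```shell
# Hypothetical three-column, tab-separated table used only for illustration.
tmp=$(mktemp)
printf 'geneA\tchr1\t5\ngeneB\tchr2\t12\ngeneC\tchr3\t30\n' > "$tmp"

# Keep only the rows whose third column is greater than 10.
filtered=$(awk -F "\t" '$3 > 10 {print $0}' "$tmp")

# Print column 2 followed by column 1, separated by a tab.
swapped=$(awk -F "\t" '{print $2 "\t" $1}' "$tmp")

printf '%s\n' "$filtered"
printf '%s\n' "$swapped"
rm -f "$tmp"
```

Note that the whole awk program sits inside one pair of single quotes, so the shell passes `$2` and `$3` to awk untouched.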
@@ Line 31: / Line 31: @@
 === Sort ===
-sort -g -u file.txt
+<code>sort -g -u file.txt</code>
   * -r to sort in descending order
@@ Line 43: / Line 43: @@
 === Translate ===
-tr '[ATGC]' '[TACG]' | rev
+<code>tr '[ATGC]' '[TACG]' | rev</code>
   * This trick gives the reverse complement of a sequence, which is useful if you are working on the complementary strand.
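A short sketch of the translate-and-reverse trick on a single sequence. The sequence and the variable names are hypothetical, and it assumes the `rev` utility is installed (it ships with util-linux on most Linux systems).

```shell
# Hypothetical short DNA sequence used only for illustration.
seq="AAGCT"

# tr swaps each base for its complement (the brackets map to themselves),
# then rev reverses the string, giving the reverse complement.
revcomp=$(printf '%s\n' "$seq" | tr '[ATGC]' '[TACG]' | rev)

printf '%s\n' "$revcomp"
```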
@@ Line 53: / Line 53: @@
 === sed (slightly more advanced) ===
-sed -n -e '/AAA/,/BBB/ p' file.txt
+<code>sed -n -e '/AAA/,/BBB/ p' file.txt</code>
   * This prints every line from the first line matching AAA up to and including the next line matching BBB. Pro tip: use this one to extract a sequence in a multi-line fasta.
   * Note that shell variables are only expanded inside a sed command if it is wrapped in double quotation marks " instead of single '.
+===== Working with lists and tables =====
+Play with the following sample files.{{ :general:computerenvironment:comm_join_example.tar.gz |}}
+=== comm ===
+To compare the contents of two files (in this case, the identifiers in the first column of each file):
+<code>comm <() <()</code>
+  * Within each "<()" we place the command that produces one of the inputs to compare.
+  * Both inputs must be sorted.
+  * I use the second part of the command (the sed, followed by the final cut) to pad the output so that every line has the correct number of columns
+<code>comm <(cut -f1 1_table.txt | sort) <(cut -f1 2_table.txt | sort) | sed -e 's/$/\t\t/' | cut -f1,2,3</code>
+  * Output:
+    * First column: identifiers found only in the table in input 1
+    * Second column: identifiers found only in the table in input 2
+    * Third column: identifiers present in both tables
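The three-column output can be checked on a tiny example. This is only a sketch: the table contents are invented, the sorted identifiers are written to temporary files instead of "<()" so that it also runs in shells without process substitution, and the tab is passed to sed through a shell variable (hence the double quotes) rather than as `\t`.

```shell
# Hypothetical identifier tables, created only for this illustration.
dir=$(mktemp -d)
printf 'id1\t10\nid2\t20\nid3\t30\n' > "$dir/1_table.txt"
printf 'id2\tx\nid3\ty\nid4\tz\n' > "$dir/2_table.txt"

# comm needs sorted input: extract and sort the first column of each table.
cut -f1 "$dir/1_table.txt" | sort > "$dir/ids1.txt"
cut -f1 "$dir/2_table.txt" | sort > "$dir/ids2.txt"

# Pad every line with two tabs, then keep exactly three columns.
tab=$(printf '\t')
result=$(comm "$dir/ids1.txt" "$dir/ids2.txt" | sed -e "s/\$/${tab}${tab}/" | cut -f1,2,3)

printf '%s\n' "$result"
rm -rf "$dir"
```

Here id1 lands in the first column, id4 in the second, and id2 and id3 in the third.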
+----
+=== join ===
+Join two tables based on a column in common.
+<code>join -t $'\t' <() <()</code>
+  * The input within each "<()" must be sorted on the join column.
+  * I highly recommend using only the first column with the identifiers to join.
+  * If one of the tables has repeated identifiers, the output will contain all possible combinations.
+  * By default, the output contains only the lines whose identifier is present in both tables. We can add option -a1 or -a2 to also include the entries of one of the tables that have no joined values from the other. Do not use both.
+<code>join -t $'\t' -a1 <(sort -k1,1 1_table.txt) <(sort -k1,1 2_table.txt)</code>
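To see the effect of -a1, here is a minimal sketch on two invented tables. The file contents and variable names are assumptions for illustration. The sorted tables are written to temporary files instead of "<()", and the tab is produced with printf instead of `$'\t'`, so the snippet does not depend on bash.

```shell
# Hypothetical unsorted tables, created only for this illustration.
dir=$(mktemp -d)
printf 'id3\tC\nid1\tA\nid2\tB\n' > "$dir/1_table.txt"
printf 'id2\tx\nid4\tz\nid3\ty\n' > "$dir/2_table.txt"

# join needs both inputs sorted on the join column (column 1 here).
tab=$(printf '\t')
sort -t "$tab" -k1,1 "$dir/1_table.txt" > "$dir/1_sorted.txt"
sort -t "$tab" -k1,1 "$dir/2_table.txt" > "$dir/2_sorted.txt"

# Default: only identifiers present in both tables.
inner=$(join -t "$tab" "$dir/1_sorted.txt" "$dir/2_sorted.txt")

# With -a1: also keep the lines of table 1 that have no match in table 2.
left=$(join -t "$tab" -a1 "$dir/1_sorted.txt" "$dir/2_sorted.txt")

printf '%s\n' "$inner"
printf '%s\n' "$left"
rm -rf "$dir"
```

The unmatched line for id1 is printed with only its own columns, so rows in the -a1 output can have different numbers of fields.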
