From a7fcb2c64e3a2052a529ce6eb6027b91f0547ebf Mon Sep 17 00:00:00 2001
From: jbayer
Date: Tue, 7 Jul 2015 10:51:40 +0200
Subject: [PATCH] Update HELP.md

Dependencies and workflow.
---
 HELP.md | 105 +++++++++++++++-----------------------------------------
 1 file changed, 28 insertions(+), 77 deletions(-)

diff --git a/HELP.md b/HELP.md
index e9016bb..8eef5df 100644
--- a/HELP.md
+++ b/HELP.md
@@ -23,12 +23,12 @@ With a list of UniProt accessions and their ranking values (e.g. expression prof
 
 ### Dependencies
 
-Linux
-Python 2.7.8
-    matplotlib
-    numpy
-R 3.1.1
-    VennDiagram
+- Linux
+- Python 2.7.8
+  - matplotlib
+  - numpy
+- R 3.1.1
+  - VennDiagram
 
 ### Database background
 
@@ -43,82 +43,33 @@ The LimiTT workflow is shown below:
 
-The input (grey) is composed of an optional list of miRNAs and an optional annotation file with a transcriptome or proteome mapped to the UniProt Knowledgebase. If an annotation file was submitted, the **black** path represents the processing steps of miRNAim, otherwise, the process is described by the **red path**.
+The input (grey) is composed of an optional list of miRNAs and an optional annotation file with a transcriptome or proteome mapped to the UniProt Knowledgebase. If an annotation file was submitted, the **black path** represents the processing steps of miRNAim, otherwise, the process is described by the **red path**.
 
-**A)** The workflow starts with the selection of MTIs from the four MTI DBs in consideration of the miRNAs given in the list, if submitted. At this, the user additionally has the opportunity to choose the DBs of interest, and to filter MTIs by several of their properties, like their occurrence over the DBs, the species they belong to, the experimental methods they were validated with or their stringency in case of starBase.
-
-B) Next, all target gene symbols of the selected MTIs are mapped to UniProtAccs, while
-
-C) all UniProtAccs are filtered from the annotation file simultaneously.
-
-D) Subsequently, both lists are overlapped, resulting in those MTIs which can be linked to the submitted data.
+- **A** The workflow starts with the selection of MTIs from the four MTI DBs in consideration of the miRNAs given in the list, if submitted. At this, the user additionally has the opportunity to choose the DBs of interest, and to filter MTIs by several of their properties, like their occurrence over the DBs, the species they belong to, the experimental methods they were validated with or their stringency in case of starBase.
+- **B** Next, all target gene symbols of the selected MTIs are mapped to UniProtAccs, while
+- **C** all UniProtAccs are filtered from the annotation file simultaneously.
+- **D** Subsequently, both lists are overlapped, resulting in those MTIs which can be linked to the submitted data.
 
 In case of a missing annotation file, steps (C) and (D) are ignored, and the resulting MTIs rely on the miRNA list or just on the adjustable properties.
-
-E) As an option, an enrichment analysis of the identified MTI sets is realized by submitting a ranked list with UniProtAccs. The analysis is based on the R implementation of GSEA (Subramanian, et al., 2005).
+- **E** As an option, an enrichment analysis of the identified MTI sets is realized by submitting a ranked list with UniProtAccs. The analysis is based on the R implementation of GSEA (Subramanian, et al., 2005).
 
 ### Parameters
 
--cl
-    Use the cluster parameter if you have no miRNA input but want the miRNAs to be
-    clustered over species and hairpin arm information (hsa-miR-123a-5p -> miR-123a).
-    Otherwise miRNAs are distinguished by their full nomenclature.
-
--base
-    Expects the MTI DBs (abbreviated with numbers) of interest separated by space.
-    Default is "-base 1 2 3 4" with 1: TarBase, 2: miRTarBase, 3: miRecords,
-    4: starBase.
-
--occ
-    If more than one DB was selected, the occurrence parameter can be used to define
-    the minimum number of DBs the MTIs have to occur in. Due to four possible MTI
-    DBs, the value range is between 1 and 4 with a default value of 2 DBs. If the
-    manually set value is higher than the number of selected DBs, it is
-    automatically changed to the number of DBs.
-
--exp
-    Experimental methods parameter to select the methods of interest.
-    The following categories are existent:
-        Western blot
-        Reporter assay
-        qPCR
-        Microarray
-        NGS
-        Other
-    Additionally it is possible to select experiments by distinctive substrings like
-    "race" or "chip". Separate categories/substrings with space and surround phrases
-    with a space in it by quotes (e.g. "Western blot").The comparison is not case
-    sensitive.
-
--spec
-    Species parameter to select the species and/or species category of interest,
-    delimited by space. Single species need to be named by their miRNA specific
-    abbreviation (e.g. hsa for Homo Sapiens). Additionally it is possible to select
-    from the following species categories by passing the single letter abbreviation:
-        a - animals (14 species)
-        p - plants ( 6 species)
-        v - viruses ( 4 species)
-        f - fungi ( 1 species)
-        e - excavata ( 1 species)
-    To ignore species information, pass the letter i. By ignoring species, target to
-    UniProt accession mapping will be done without species consideration.
-
--nspec
-    Parameter for ignoring species or species categories.\nSee above for more information.
-
--str
-    Expects a number between 1 and 3 or 5 which describes the number of CLIP-Seq
-    experiments supporting the MTIs within starBase.
-
--perm
-    Integer describing the number of permutations to calculate the Normalized
-    Enrichment Score (NES) and the False Discovery Rate (FDR) q-value for the MTI
-    set enrichment analysis. If the number of permutations is too small, NES and
-    FDR q-value of sets might result in NaN values.
-
--p
-    Integer describing the weighting of the ranking values to calculate the
-    Enrichment Score (ES) for MTI set enrichment analysis. The original GSEA paper
-    recommends to use a weighting of 0 if the ranking values are not normalized.
+A list of parameters can be obtained by calling 'LimiTT -h'
+
+Parameter | Explanation
+----------|------------
+-ia | Tab separated annotation file.
+-im | Tab separated miRNA file.
+-ir | Tab separated ranking file.
+-cl | Use the cluster parameter if you have no miRNA input but want the miRNAs to be clustered over species and hairpin arm information (hsa-miR-123a-5p -> miR-123a). Otherwise miRNAs are distinguished by their full nomenclature.
+-base | MTI DBs (abbreviated with numbers) of interest separated by space. Default is "-base 1 2 3 4" with 1: TarBase, 2: miRTarBase, 3: miRecords, 4: starBase.
+-occ | If more than one DB was selected, the occurrence parameter can be used to define the minimum number of DBs the MTIs have to occur in. Due to four possible MTI DBs, the value range is between 1 and 4 with a default value of 2 DBs. If the manually set value is higher than the number of selected DBs, it is automatically changed to the number of DBs.
+-exp | Experimental methods parameter to select the methods of interest. The following categories are existent: Western blot, Reporter assay, qPCR, Microarray, NGS and Other. Additionally it is possible to select experiments by distinctive substrings like "race" or "chip". Separate categories/substrings with space and surround phrases with a space in it by quotes (e.g. "Western blot"). The comparison is not case sensitive.
+-spec | Species parameter to select the species and/or species category of interest, delimited by space. Single species need to be named by their miRNA specific abbreviation (e.g. hsa for Homo Sapiens). Additionally it is possible to select from the following species categories by passing the single letter abbreviation: a - animals (14 species), p - plants (6 species), v - viruses (4 species), f - fungi (1 species) and e - excavata (1 species). To ignore species information, pass the letter i. By ignoring species, target to UniProt accession mapping will be done without species consideration.
+-nspec | Parameter for ignoring species or species categories. See above for more information.
+-str | Expects a number between 1 and 3 or 5 which describes the number of CLIP-Seq experiments supporting the MTIs within starBase.
+-perm | Integer describing the number of permutations to calculate the Normalized Enrichment Score (NES) and the False Discovery Rate (FDR) q-value for the MTI set enrichment analysis. If the number of permutations is too small, NES and FDR q-value of sets might result in NaN values.
+-p | Integer describing the weighting of the ranking values to calculate the Enrichment Score (ES) for MTI set enrichment analysis. The original GSEA paper recommends to use a weighting of 0 if the ranking values are not normalized.
 
 ### Input