From 33146774926057be485edce01d4553a41656f7f9 Mon Sep 17 00:00:00 2001 From: jbayer Date: Tue, 7 Jul 2015 10:33:43 +0200 Subject: [PATCH] Update HELP.md Edited captions, workflow, outline, etc. --- HELP.md | 260 +++++++++++++++++++++++++++----------------------------- 1 file changed, 123 insertions(+), 137 deletions(-) diff --git a/HELP.md b/HELP.md index 8bdb7cb..e9016bb 100644 --- a/HELP.md +++ b/HELP.md @@ -1,76 +1,126 @@ ---- LimiTT --- +LimiTT documentation +================================================= -Identification of miRNA target interactions (MTIs). +### Link miRNAs To Targets + +### Table of contents + +1. [Introduction] (#introduction) +2. [Dependencies] (#dependencies) +3. [Database background] (#database-background) +4. [Workflow] (#workflow) +5. [Parameters] (#parameters) +6. [Input] (#input) +7. [Output] (#output) +8. [References] (#references) +9. [Citation] (#citation) + +### Introduction --- -INTRODUCTION --- LimiTT automatically links validated MTIs and affected proteins to a list of miRNAs and a list of genes or proteins -mapped onto UniProt accessions (annotation file), by using the information of different databases containing experimentally -verified MTIs. If no miRNAs are existent, the pipeline will predict miRNAs by the use of the given genes/proteins. -With a list of UniProt accessions and their ranking values (e.g. expression profile), LimiTT will identify the significantly -enriched MTI sets from the identified MTIs. - --- -DEPEDENCIES --- - + Official and working version in Software Repository + - Linux - Python 3.4.2 - anaconda 2.1.0 - matplotlib - numpy - R 3.2.0 - VennDiagram - - + Faster version (not running from Software Repository) + - Linux - Python 2.7.8 - matplotlib - numpy - R 3.1.1 - VennDiagram +mapped onto UniProt accessions (annotation file), by using the information of different databases containing experimentally verified MTIs. 
If no miRNAs are existent, the pipeline will predict miRNAs by the use of the given genes/proteins. +With a list of UniProt accessions and their ranking values (e.g. expression profile), LimiTT will identify the significantly enriched MTI sets from the identified MTIs. + +### Dependencies + +Linux +Python 2.7.8 + matplotlib + numpy +R 3.1.1 + VennDiagram -See HELP_python2to3 for more information about the two versions. --- -DATABASE BACKGROUND --- - -LimiTT relies on experimentally validated MTIs received from the open source databases (DBs) TarBase (Vergoulis, et al. 2012, -version 6.0), miRTarBase (Hsu, et al. 2014, version 4.5), miRecords (Xiao, et al. 2009, version 1.0 update 2013) and -starBase (Li, et al. 2014, version 2.0). -The content of each DB was downloaded and partly preprocessed to easily access the MTIs. -In the case of TarBase, data of the miRNA DB miRBase (Griffiths-Jones, et al., 2006) had to be used to convert the miRBase -accession numbers to their miRNA identifiers. -To simplify the comparison of target symbols between the MTI DBs and to retrieve additional information for each target, -the symbols are mapped onto UniProt accessions (UniProtAccs) during the process. +### Database background + +LimiTT relies on experimentally validated miRNA target interactions (MTIs) received from the open source databases (DBs) TarBase (Vergoulis, et al. 2012, version 6.0), miRTarBase (Hsu, et al. 2014, version 4.5), miRecords (Xiao, et al. 2009, version 1.0 update 2013) and starBase (Li, et al. 2014, version 2.0). +The content of each DB was downloaded and partly preprocessed to easily access the MTIs. In the case of TarBase, data of the miRNA DB miRBase (Griffiths-Jones, et al., 2006) had to be used to convert the miRBase accession numbers to their miRNA identifiers. 
To simplify the comparison of target symbols between the MTI DBs and to retrieve additional information for each target, the symbols are mapped onto UniProt accessions (UniProtAccs) during the process. This is done by using the downloaded entries of the UniProt Knowledgebase (UniProtKB, Magrane and Consortium 2011), from which specific information was filtered and preprocessed. --- -WORKFLOW --- - -1. The workflow of LimiTT starts with the input of an optional list of miRNAs, adjusted to the output of the miRNA - quantification pipeline MIRPIPE (Kuenne, et al., 2014), and an obligatory file containing an annotated transcriptome - or proteome mapped onto UniProt accessions. -2. In consideration of the miRNAs given in the list, experimentally validated MTIs will be selected from the four MTI - databases, whereby the user can define the DBs of interest. Additionally to the choice of MTI DBs, the user has the - opportunity to select the occurrence of MTIs over the DBs, the species or species categories of interest, the experimental - methods and, in case of starBase, the minimal number of CLIP-Seq experiments the MTIs were verified with. -3. The next step is to map all target symbols of the selected MTIs onto UniProtAccs by considering the species the MTI - comes from. Another possibility is to ignore the species and thus map targets not just onto compatible UniProtAccs of - the same, but also of other species. This opportunity shall enable the inclusion of entries homologues to the mapped - target. -4. At the same time, all UniProtAccs are filtered out of the annotated file. -5. In the subsequent annotation mapping step, the both lists of UniProtAccs are overlapped resulting in validated - MTIs if the miRNA input was validated, otherwise in MTIs with annotated targets and predicted miRNAs. -6. 
Moreover, it is possible to start an enrichment analysis of the identified MTI sets by passing an expression file - to the approach, con-taining UniProtAccs together with ranking values. The enrichment analysis is based on the R - implementation of GSEA (Subramanian, et al., 2005) and it is fully integrated into LimiTT. - --- -INPUT FILES --- +### Workflow + +The LimiTT workflow is shown below: + + + +The input (grey) is composed of an optional list of miRNAs and an optional annotation file with a transcriptome or proteome mapped to the UniProt Knowledgebase. If an annotation file was submitted, the **black** path represents the processing steps of miRNAim, otherwise, the process is described by the **red path**. + +**A)** The workflow starts with the selection of MTIs from the four MTI DBs in consideration of the miRNAs given in the list, if submitted. At this, the user additionally has the opportunity to choose the DBs of interest, and to filter MTIs by several of their properties, like their occurrence over the DBs, the species they belong to, the experimental methods they were validated with or their stringency in case of starBase. + +B) Next, all target gene symbols of the selected MTIs are mapped to UniProtAccs, while + +C) all UniProtAccs are filtered from the annotation file simultaneously. + +D) Subsequently, both lists are overlapped, resulting in those MTIs which can be linked to the submitted data. +In case of a missing annotation file, steps (C) and (D) are ignored, and the resulting MTIs rely on the miRNA list or just on the adjustable properties. + +E) As an option, an enrichment analysis of the identified MTI sets is realized by submitting a ranked list with UniProtAccs. The analysis is based on the R implementation of GSEA (Subramanian, et al., 2005). + +### Parameters + +-cl + Use the cluster parameter if you have no miRNA input but want the miRNAs to be + clustered over species and hairpin arm information (hsa-miR-123a-5p -> miR-123a). 
+ Otherwise miRNAs are distinguished by their full nomenclature. + +-base + Expects the MTI DBs (abbreviated with numbers) of interest separated by space. + Default is "-base 1 2 3 4" with 1: TarBase, 2: miRTarBase, 3: miRecords, + 4: starBase. + +-occ + If more than one DB was selected, the occurrence parameter can be used to define + the minimum number of DBs the MTIs have to occur in. Due to four possible MTI + DBs, the value range is between 1 and 4 with a default value of 2 DBs. If the + manually set value is higher than the number of selected DBs, it is + automatically changed to the number of DBs. + +-exp + Experimental methods parameter to select the methods of interest. + The following categories are existent: + Western blot + Reporter assay + qPCR + Microarray + NGS + Other + Additionally it is possible to select experiments by distinctive substrings like + "race" or "chip". Separate categories/substrings with space and surround phrases + with a space in it by quotes (e.g. "Western blot").The comparison is not case + sensitive. + +-spec + Species parameter to select the species and/or species category of interest, + delimited by space. Single species need to be named by their miRNA specific + abbreviation (e.g. hsa for Homo Sapiens). Additionally it is possible to select + from the following species categories by passing the single letter abbreviation: + a - animals (14 species) + p - plants ( 6 species) + v - viruses ( 4 species) + f - fungi ( 1 species) + e - excavata ( 1 species) + To ignore species information, pass the letter i. By ignoring species, target to + UniProt accession mapping will be done without species consideration. + +-nspec + Parameter for ignoring species or species categories.\nSee above for more information. + +-str + Expects a number between 1 and 3 or 5 which describes the number of CLIP-Seq + experiments supporting the MTIs within starBase. 
+ +-perm + Integer describing the number of permutations to calculate the Normalized + Enrichment Score (NES) and the False Discovery Rate (FDR) q-value for the MTI + set enrichment analysis. If the number of permutations is too small, NES and + FDR q-value of sets might result in NaN values. + +-p + Integer describing the weighting of the ranking values to calculate the + Enrichment Score (ES) for MTI set enrichment analysis. The original GSEA paper + recommends to use a weighting of 0 if the ranking values are not normalized. + +### Input **Annotation File (-ia)** @@ -159,75 +209,8 @@ INPUT FILES |Q6A037|0.495874819| +------+-----------+ ---- -ARGUMENTS ---- - --cl - Use the cluster parameter if you have no miRNA input but want the miRNAs to be - clustered over species and hairpin arm information (hsa-miR-123a-5p -> miR-123a). - Otherwise miRNAs are distinguished by their full nomenclature. - --base - Expects the MTI DBs (abbreviated with numbers) of interest separated by space. - Default is "-base 1 2 3 4" with 1: TarBase, 2: miRTarBase, 3: miRecords, - 4: starBase. - --occ - If more than one DB was selected, the occurrence parameter can be used to define - the minimum number of DBs the MTIs have to occur in. Due to four possible MTI - DBs, the value range is between 1 and 4 with a default value of 2 DBs. If the - manually set value is higher than the number of selected DBs, it is - automatically changed to the number of DBs. --exp - Experimental methods parameter to select the methods of interest. - The following categories are existent: - Western blot - Reporter assay - qPCR - Microarray - NGS - Other - Additionally it is possible to select experiments by distinctive substrings like - "race" or "chip". Separate categories/substrings with space and surround phrases - with a space in it by quotes (e.g. "Western blot").The comparison is not case - sensitive. 
- --spec - Species parameter to select the species and/or species category of interest, - delimited by space. Single species need to be named by their miRNA specific - abbreviation (e.g. hsa for Homo Sapiens). Additionally it is possible to select - from the following species categories by passing the single letter abbreviation: - a - animals (14 species) - p - plants ( 6 species) - v - viruses ( 4 species) - f - fungi ( 1 species) - e - excavata ( 1 species) - To ignore species information, pass the letter i. By ignoring species, target to - UniProt accession mapping will be done without species consideration. - --nspec - Parameter for ignoring species or species categories.\nSee above for more information. - --str - Expects a number between 1 and 3 or 5 which describes the number of CLIP-Seq - experiments supporting the MTIs within starBase. - --perm - Integer describing the number of permutations to calculate the Normalized - Enrichment Score (NES) and the False Discovery Rate (FDR) q-value for the MTI - set enrichment analysis. If the number of permutations is too small, NES and - FDR q-value of sets might result in NaN values. - --p - Integer describing the weighting of the ranking values to calculate the - Enrichment Score (ES) for MTI set enrichment analysis. The original GSEA paper - recommends to use a weighting of 0 if the ranking values are not normalized. - ---- -OUTPUT FILES ---- +### Output **BarGraphs** @@ -336,9 +319,8 @@ OUTPUT FILES +--------+------+--------------------+----------+---------+ ---- -CITATIONS ---- + +### References Magrane, M. and Consortium, U. (2011). UniProt Knowledgebase: a hub of integrated protein data. In Database, 2011(0), pp. bar009–bar009. [doi:10.1093/database/bar009] @@ -353,3 +335,7 @@ Xiao, F. and Zuo, Z. and Cai, G. and Kang, S. and Gao, X. and Li, T. (2009). miR Yang, J.-H. and Li, J.-H. and Shao, P. and Zhou, H. and Chen, Y.-Q. and Qu, L.-H. (2010). 
starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data. In Nucleic Acids Research, 39(Database), pp. D202–D209. [doi:10.1093/nar/gkq1056] Kuenne, C. and Preussner, J. and Herzog, M. and Braun, T. and Looso, M. (2014). MIRPIPE: quantification of microRNAs in niche model organisms. In Bioinformatics, 30(23), pp. 3412–3413. [doi:10.1093/bioinformatics/btu573] + +### Citation + +Please cite Bayer J, Kuenne C, Preussner J and Looso M. LimiTT: Link miRNAs To Targets. *???* (2015), doi:tba
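
Illustrative note on the `-cl` and `-occ` parameters described in the patch: the sketch below shows one way the documented behaviour could look in Python. It is not taken from the LimiTT sources. The function names `cluster_mirna` and `filter_by_occurrence`, the regular expressions, and the dictionary layout are all assumptions made for illustration. The clustering helper also assumes that every name carries a species prefix such as `hsa-`.

```python
import re


def cluster_mirna(name):
    """Collapse a full miRNA name to its cluster name (assumed naming scheme)."""
    # Drop a leading species code such as "hsa-" (assumption: 3-4 lowercase letters).
    name = re.sub(r"^[a-z]{3,4}-", "", name)
    # Drop a trailing hairpin arm suffix, "-5p" or "-3p".
    return re.sub(r"-[35]p$", "", name)


def filter_by_occurrence(mti_to_dbs, selected_dbs=(1, 2, 3, 4), occ=2):
    """Keep MTIs found in at least `occ` of the selected databases.

    mti_to_dbs maps an MTI (here a hypothetical (miRNA, target) tuple) to the
    set of database numbers it occurs in (1: TarBase, 2: miRTarBase,
    3: miRecords, 4: starBase).
    """
    selected = set(selected_dbs)
    # As documented for -occ: a value above the number of selected DBs
    # is lowered to that number.
    threshold = min(occ, len(selected))
    return {
        mti
        for mti, dbs in mti_to_dbs.items()
        if len(selected & set(dbs)) >= threshold
    }
```

With the default `occ=2`, an MTI listed only in miRecords is dropped when all four databases are selected, but kept when miRecords is the only selected database, because the threshold is then lowered to 1.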