
LimiTT documentation

Link miRNAs To Targets

Table of contents

  1. [Introduction](#introduction)
  2. [Dependencies](#dependencies)
  3. [Database background](#database-background)
  4. [Workflow](#workflow)
  5. [Parameters](#parameters)
  6. [Input](#input)
  7. [Output](#output)
  8. [References](#references)
  9. [Citation](#citation)

Introduction

LimiTT automatically links validated miRNA-target interactions (MTIs) and the affected proteins to a list of miRNAs and a list of genes or proteins mapped onto UniProt accessions (annotation file), using several databases of experimentally verified MTIs. If no miRNA list is provided, the pipeline predicts miRNAs from the given genes/proteins. Given a list of UniProt accessions with ranking values (e.g. an expression profile), LimiTT identifies the MTI sets that are significantly enriched among the identified MTIs.

Dependencies

  • Linux
  • Python 2.7.8
    • matplotlib
    • numpy
  • R 3.1.1
    • VennDiagram

Database background

LimiTT relies on experimentally validated miRNA-target interactions (MTIs) retrieved from the open source databases (DBs) TarBase (Vergoulis, et al. 2012, version 6.0), miRTarBase (Hsu, et al. 2014, version 4.5), miRecords (Xiao, et al. 2009, version 1.0 update 2013) and starBase (Li, et al. 2014, version 2.0). The content of each DB was downloaded and partly preprocessed to make the MTIs easy to access. For TarBase, data from the miRNA DB miRBase (Griffiths-Jones, et al., 2006) had to be used to convert miRBase accession numbers to miRNA identifiers. To simplify the comparison of target symbols between the MTI DBs and to retrieve additional information for each target, the symbols are mapped onto UniProt accessions (UniProtAccs) during the process. The mapping uses downloaded entries of the UniProt Knowledgebase (UniProtKB, Magrane and Consortium 2011), from which specific information was filtered and preprocessed.

Workflow

The LimiTT workflow is shown below:

The input (grey) consists of an optional list of miRNAs and an optional annotation file with a transcriptome or proteome mapped to the UniProt Knowledgebase. If an annotation file was submitted, the black path represents the processing steps of LimiTT; otherwise, the process follows the red path.

  • A The workflow starts with the selection of MTIs from the four MTI DBs, restricted to the miRNAs in the list if one was submitted. At this step, the user can also choose the DBs of interest and filter MTIs by several of their properties, such as their occurrence across the DBs, the species they belong to, the experimental methods they were validated with, or their stringency in the case of starBase.
  • B Next, all target gene symbols of the selected MTIs are mapped to UniProtAccs, while
  • C all UniProtAccs are simultaneously extracted from the annotation file.
  • D Subsequently, both lists are overlapped, which yields the MTIs that can be linked to the submitted data. If no annotation file was submitted, steps C and D are skipped, and the resulting MTIs depend only on the miRNA list or on the adjustable properties.
  • E Optionally, an enrichment analysis of the identified MTI sets is performed if a ranked list of UniProtAccs is submitted. The analysis is based on the R implementation of GSEA (Subramanian, et al., 2005).
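Steps A to D can be pictured as a filter, a mapping and a set intersection. The following is only a minimal sketch of that idea, not LimiTT's actual code: the function `link_mtis`, the dictionary `symbol_to_acc` and the gene symbols `GENE_A` to `GENE_C` are hypothetical names chosen for illustration.

```python
def link_mtis(mtis, symbol_to_acc, mirnas=None, annotation_accs=None):
    """Return the (miRNA, UniProt accession) pairs that survive steps A to D.

    mtis            -- iterable of (miRNA, target gene symbol) pairs from the MTI DBs
    symbol_to_acc   -- hypothetical lookup: gene symbol -> list of UniProt accessions
    mirnas          -- optional miRNA list (step A)
    annotation_accs -- optional accessions taken from the annotation file (step C)
    """
    # Step A: keep only MTIs of the submitted miRNAs (all MTIs if no list is given).
    if mirnas is not None:
        wanted = set(mirnas)
        mtis = [(mirna, symbol) for mirna, symbol in mtis if mirna in wanted]

    # Step B: map target gene symbols onto UniProt accessions.
    # Symbols without a mapping simply drop out.
    mapped = set()
    for mirna, symbol in mtis:
        for acc in symbol_to_acc.get(symbol, ()):
            mapped.add((mirna, acc))

    # Steps C and D: overlap with the annotated accessions.
    # Both steps are skipped if no annotation file was submitted.
    if annotation_accs is None:
        return mapped
    annotated = set(annotation_accs)
    return set((mirna, acc) for mirna, acc in mapped if acc in annotated)


# Toy data: gene symbols and mappings are made up for this example.
mtis = [("miR-9", "GENE_A"), ("miR-24", "GENE_B"), ("miR-17", "GENE_C")]
symbol_to_acc = {"GENE_A": ["A0AVK6"], "GENE_B": ["A0AVK6", "A2A6A1"]}
linked = link_mtis(mtis, symbol_to_acc,
                   mirnas=["miR-9", "miR-24"],
                   annotation_accs=["A0AVK6"])
print(sorted(linked))
```

Without `annotation_accs`, the same call returns all mapped pairs, which corresponds to the red path of the workflow.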

Parameters

A list of parameters can be obtained by calling 'LimiTT -h'

| Parameter | Explanation |
|-----------|-------------|
| -ia | Tab-separated annotation file. |
| -im | Tab-separated miRNA file. |
| -ir | Tab-separated ranking file. |
| -cl | Use the cluster parameter if you have no miRNA input but want the miRNAs to be clustered across species and hairpin arm information (hsa-miR-123a-5p -> miR-123a). Otherwise, miRNAs are distinguished by their full nomenclature. |
| -base | MTI DBs of interest (abbreviated with numbers), separated by spaces. Default is "-base 1 2 3 4" with 1: TarBase, 2: miRTarBase, 3: miRecords, 4: starBase. |
| -occ | If more than one DB was selected, the occurrence parameter defines the minimum number of DBs in which an MTI has to occur. With four possible MTI DBs, the value ranges from 1 to 4; the default is 2. If the value is higher than the number of selected DBs, it is automatically reduced to the number of selected DBs. |
| -exp | Experimental methods parameter to select the methods of interest. The following categories exist: Western blot, Reporter assay, qPCR, Microarray, NGS and Other. It is also possible to select experiments by distinctive substrings such as "race" or "chip". Separate categories/substrings with spaces and put phrases that contain a space in quotes (e.g. "Western blot"). The comparison is not case sensitive. |
| -spec | Species parameter to select the species and/or species categories of interest, separated by spaces. Single species need to be named by their miRNA-specific abbreviation (e.g. hsa for Homo sapiens). Species categories can be selected by passing their single-letter abbreviation: a - animals (14 species), p - plants (6 species), v - viruses (4 species), f - fungi (1 species) and e - excavata (1 species). To ignore species information, pass the letter i. If species are ignored, targets are mapped to UniProt accessions without species consideration. |
| -nspec | Parameter for excluding species or species categories. See -spec for more information. |
| -str | Expects one of the numbers 1, 2, 3 or 5, which describes the number of CLIP-Seq experiments supporting the MTIs within starBase. |
| -perm | Integer describing the number of permutations used to calculate the Normalized Enrichment Score (NES) and the False Discovery Rate (FDR) q-value for the MTI set enrichment analysis. If the number of permutations is too small, the NES and FDR q-value of some sets might be NaN. |
| -p | Integer describing the weighting of the ranking values used to calculate the Enrichment Score (ES) for the MTI set enrichment analysis. The original GSEA paper recommends a weighting of 0 if the ranking values are not normalized. |
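The interplay of -base and -occ can be sketched in a few lines. This is an illustration under assumptions, not the LimiTT implementation: the function names and the `mti_to_dbs` dictionary (MTI -> set of DB numbers in which it occurs) are hypothetical.

```python
def effective_occurrence(occ, selected_dbs):
    """Reduce the -occ value to the number of selected DBs if it is larger."""
    return min(occ, len(set(selected_dbs)))


def filter_by_occurrence(mti_to_dbs, occ, selected_dbs):
    """Keep the MTIs that occur in at least `occ` of the selected DBs."""
    selected = set(selected_dbs)
    minimum = effective_occurrence(occ, selected)
    return set(mti for mti, dbs in mti_to_dbs.items()
               if len(selected & set(dbs)) >= minimum)


# DB numbers as in -base: 1 TarBase, 2 miRTarBase, 3 miRecords, 4 starBase.
mti_to_dbs = {
    ("miR-9", "A0AVK6"): {4},
    ("miR-24", "A0AVK6"): {2, 3},
}
# "-base 2 3 -occ 4": the requested value 4 is reduced to 2 selected DBs.
kept = filter_by_occurrence(mti_to_dbs, occ=4, selected_dbs=[2, 3])
print(kept)
```

Only occurrences in the selected DBs are counted here, so an MTI found solely in an unselected DB is dropped.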

Input

Annotation File (-ia)

  • File type: Tab delimited
  • Header: No
  • Required Content: One UniProt accession per line, or several accessions per line separated by commas.
  • Allowed Content: Several columns; empty content; accessions with attached information, for example on the underlying database, delimited by a pipe ( | ) symbol (e.g. sp|Q9XS59|S6A15_BOVIN). In this case, only the identifier that follows the first pipe symbol is saved (e.g. sp|Q9XS59|S6A15_BOVIN > Q9XS59). Identifiers from other databases are ignored.

Example "Required":

| |
|----|
| A2A6A1 |
| O88898,P54763,Q3UHC0 |
| G5E870 |
| Q6A037 |

Example "Allowed":

| | | |
|-------------------|-------|-----------------------------------------|
| comp1000309_c0_seq1 | slc15a3 | Q8IY34,O75618 |
| comp1000318_c0_seq1 | | |
| comp1000627_c0_seq1 | slc6a15 | sp\|Q9XS59\|S6A15_BOVIN |
| comp1000899_c0_seq1 | | gb\|CX212397.1,dbj\|DB530926.2,gi\|154363325 |
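The rules for the annotation file can be illustrated with a small parser. This is a sketch under assumptions: the function name is hypothetical, and recognising UniProt accessions by the accession pattern published in the UniProt help pages is a choice made for this example; LimiTT may decide differently which identifiers to keep.

```python
import re

# Accession pattern as published in the UniProt help pages.
UNIPROT_ACC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def accessions_from_line(line):
    """Collect the UniProt accessions from one tab-delimited annotation line."""
    found = []
    for cell in line.rstrip("\n").split("\t"):
        for token in cell.split(","):
            token = token.strip()
            if "|" in token:
                # Keep only the identifier after the first pipe symbol.
                token = token.split("|")[1]
            if UNIPROT_ACC.match(token):
                found.append(token)
    return found


print(accessions_from_line("comp1000627_c0_seq1\tslc6a15\tsp|Q9XS59|S6A15_BOVIN"))
print(accessions_from_line("comp1000899_c0_seq1\t\tgb|CX212397.1,dbj|DB530926.2"))
```

Applied to the "Allowed" example, contig names, gene symbols and GenBank or DDBJ identifiers do not match the pattern and are skipped, while Q8IY34, O75618 and Q9XS59 are kept.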

miRNA File (-im)

  • File type: Tab delimited
  • Header: No
  • Required Content: One mature miRNA identifier (e.g. hsa-miR-17a-5p) per line in column 1.
  • Allowed Content: Several columns and shortened miRNA identifiers. A shortened miRNA identifier has to consist at least of the prefix miR, lin or let, the identification number and, if present, the lettered suffix indicating sequence similarity (e.g. miR-17a).

Example "Required":

| |
|-------|
| hsa-miR-93b-5p |
| miR-36f |
| mmu-miR-29d-3p |
| miR-29c |

Example "Allowed": The example is an excerpt of original output of the MIRPIPE pipeline (Kuenne et al., 2014), to which the LimiTT parameters are adjusted.

| | | | | | |
|-------|------|----|---|----------------------|------|
| miR-93b | 52 | 1.00 | 170 | CAAGTGCTGTTCGTGCAGGTAG | 33 |
| miR-36f | 211 | 0.00 | 171 | ATTGAGCTATCTGTGTAG | 211 |
| miR-29d | 141233 | 0.02 | 172 | TAGCACCATATGAAATCAGTGT | 133582 |
| miR-29c | 55690 | 1.00 | 172 | TAGCACCATTTGAAATCGGTTA | 44200 |
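The identifier rules for the miRNA file, together with the clustering described for -cl (hsa-miR-123a-5p -> miR-123a), can be sketched with a regular expression. The pattern and the function names below are assumptions made for illustration; the pattern is deliberately simplified and does not cover every miRBase naming variant (for example viral names or identifiers with an additional numeric suffix).

```python
import re

# Optional species prefix, the core name (miR/lin/let, number, optional
# lettered suffix) and an optional hairpin arm (-5p or -3p).
MIRNA_ID = re.compile(
    r"^(?:(?P<species>[a-z]{3,4})-)?"
    r"(?P<core>(?:miR|lin|let)-\d+[a-z]*)"
    r"(?:-(?P<arm>[35]p))?$"
)


def cluster_name(identifier):
    """Strip species and arm information; return None for unrecognised names."""
    match = MIRNA_ID.match(identifier.strip())
    return match.group("core") if match else None


def read_mirnas(lines):
    """Take the identifier from column 1 and ignore all further columns."""
    names = []
    for line in lines:
        first = line.split("\t")[0].strip()
        if first:
            names.append(first)
    return names


rows = ["hsa-miR-93b-5p\n", "miR-36f\t211\t0.00\n", "mmu-miR-29d-3p\n", "\n"]
print([cluster_name(name) for name in read_mirnas(rows)])
```

Full and shortened identifiers end up in the same form, which is what makes them comparable across species and hairpin arms.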

Ranking File (-ir)

  • File type: Tab delimited
  • Header: No
  • Required Content: UniProt accessions in column one, corresponding ranking value in column two.
  • Allowed Content: The content does not need to be sorted by the ranking values.

Example:

| | |
|------|-----------|
| A2A6A1 | 0.152108244 |
| P54763 | 0.640846805 |
| Q3UHC0 | 0.931454837 |
| O88898 | 0.240325584 |
| G5E870 | 0.47554716 |
| Q6A037 | 0.495874819 |
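Since the ranking file may be unsorted, a reader of the file would typically sort it before the enrichment analysis. The sketch below shows one way to do that; the function name is hypothetical and the descending sort order is an assumption, not something stated for LimiTT.

```python
def read_ranking(lines):
    """Read (accession, value) pairs and return them sorted by value, descending."""
    ranking = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0].strip():
            continue  # skip empty or incomplete lines
        ranking.append((fields[0].strip(), float(fields[1])))
    # Assumption: the highest ranking value comes first.
    return sorted(ranking, key=lambda item: item[1], reverse=True)


rows = [
    "A2A6A1\t0.152108244\n",
    "P54763\t0.640846805\n",
    "Q3UHC0\t0.931454837\n",
    "O88898\t0.240325584\n",
    "G5E870\t0.47554716\n",
    "Q6A037\t0.495874819\n",
]
ranked = read_ranking(rows)
print([acc for acc, _ in ranked])
```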

Output

BarGraphs

The bar graphs provide an overview of the number of miRNAs and MTIs after the different processing steps of LimiTT. MiRNAs and MTIs are counted after searching the MTI databases, after filtering by their occurrence over the DBs, after mapping MTIs onto UniProtAccs and after mapping the remaining MTI targets onto the annotated UniProtAccs. Thus, the last number within the bars is the final result.

MTI Matrix

The matrix contains all identified MTIs, arranged with the targets as UniProtAccs in rows and the miRNAs in columns. If an interaction between a miRNA and a target was identified, a binary string represents the occurrence of the interaction across the chosen MTI DBs, with one digit per DB. The order of the DBs within the binary string is given in the first row.

Example:

Database order: TarBase, miRTarBase, miRecords, starBase

| | miR-9 | miR-15a | miR-17 | miR-19b | miR-24 | miR-26a |
|------|------|------|------|------|------|------|
| A0AVK6 | 0001 | | | | 0110 | |
| A2A6A1 | | | | 1110 | | |
| A2AAY5 | | | 1110 | | | 1001 |
| A2AHG0 | | 0001 | 0101 | | | |
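The binary strings in the matrix can be read position by position against the database order. A minimal sketch of this encoding, with hypothetical function names, could look like this:

```python
# Database order as given in the first row of the matrix.
DB_ORDER = ["TarBase", "miRTarBase", "miRecords", "starBase"]


def encode_occurrence(dbs, order=DB_ORDER):
    """Build the binary string: '1' for each DB that contains the MTI."""
    present = set(dbs)
    return "".join("1" if db in present else "0" for db in order)


def decode_occurrence(bits, order=DB_ORDER):
    """Translate a binary string back into the list of DB names."""
    return [db for db, bit in zip(order, bits) if bit == "1"]


print(encode_occurrence(["miRTarBase", "miRecords"]))
print(decode_occurrence("1001"))
```

In the example matrix, the entry 0110 for A0AVK6 and miR-24 therefore stands for an interaction listed in miRTarBase and miRecords.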

MTI Info

The MTI information file lists all identified target UniProt accessions together with the interacting miRNAs and further information collected during the process. If the annotation file and/or the miRNA list contained additional information, this information is also included in the MTI information file.

Standard columns:

| Column | Explanation |
|--------|-------------|
| UniProt Accessions | UniProt Accession of the identified miRNA target |
| miRNA Target | Target symbol from the MTI database(s) |
| miRNA | miRNA(s) identified to interact with the target |
| Review status | Review status of UniProtKB entry |
| Organism | The MTI's organism |
| Gene synonyms | Synonyms of the target gene |
| Protein names | Name(s) of the influenced protein |
| EC number | Enzyme Commission number |
| Existence | Evidence for protein existence |
| GO-IDs | Gene Ontology identifier(s) |

MTI Overlap HM

Based on the idea that each identified miRNA interacts with a set of target genes, the Heatmap (HM) depicts the ratio of overlapping UniProtAcc targets between each pair of these MTI sets. If the MTI set enrichment analysis was used, the Heatmap instead depicts, for each pair of MTI sets, the ratio of overlapping target genes that are part of the leading edge sets of the corresponding MTI sets.

MTI Sets ranked

If a ranking file was passed to LimiTT, a reduced version of the Gene Set Enrichment Analysis (GSEA) tool is started, which analyses the enrichment of the identified MTI sets based on the ranked UniProtAccs. Using a running sum statistic, a weighted Enrichment Score (ES) is calculated for each gene set from the position-dependent gene matches between the ranked list and the set. The Leading Edge analysis additionally identifies and analyses the core genes of the gene set that contribute most to the ES. Depending on whether the ES of an MTI set is positive or negative, the set of Leading Edge targets consists of the MTI set targets before or after the peak of the running sum, respectively. Based on this, three statistics are calculated:

  • tags: the ratio of Leading Edge targets to all targets in the given set.
  • list: the ratio of the UniProtAccs in the ranking file before (positive ES) or after (negative ES) the peak to all UniProtAccs in the file.
  • signal: a combination of the two previous statistics that describes the distribution of the MTI set targets over the ranked dataset. It results in 100% or more if all targets are found at the beginning of the ranked list.

To take the set sizes into account, the MTI set enrichment analysis next calculates the Normalized Enrichment Score (NES) for each gene set by using permutations of the dataset. Additionally, the False Discovery Rate (FDR) q-value is calculated, which represents the estimated probability that a set with a given NES is a false positive result.

Aside from the ES, NES, FDR q-value and Leading Edge analysis, the file lists the size of each MTI set, which is the number of UniProtAccs shared by the MTI set and the ranked list, and the index within the ranked file at which the running sum statistic reached the maximal ES.

Example:

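| MTI Set | Size | ES | NES | FDR q-val | Rank at Max | Leading Edge |
|---------|------|----|-----|-----------|-------------|--------------|
| miR-149 | 6 | 0.65 | 1.55 | 0.290 | 16 | tags=67%, list=29%, signal=53% |
| miR-301b | 4 | 0.61 | 1.29 | 0.790 | 1 | tags=25%, list=2%, signal=26% |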
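The running sum and the three Leading Edge statistics can be made concrete with a short sketch. It follows the weighted running sum of Subramanian et al. (2005) and the signal formula from the GSEA documentation, tags x (1 - list) x N / (N - set size). Whether LimiTT's reduced version matches these details exactly, including how the peak position itself is counted, is an assumption, and all function names are hypothetical.

```python
def running_sum(ranked, targets, p=1):
    """Running enrichment score for each position of the ranked list.

    ranked  -- list of (accession, ranking value), already sorted
    targets -- set of accessions that form the MTI set
    p       -- weighting of the ranking values (compare the -p parameter)
    """
    weights = [abs(value) ** p if acc in targets else 0.0 for acc, value in ranked]
    total_hit = float(sum(weights))
    n_miss = sum(1 for acc, _ in ranked if acc not in targets)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("set needs weighted hits and the list needs misses")
    running, curve = 0.0, []
    for (acc, _), weight in zip(ranked, weights):
        if acc in targets:
            running += weight / total_hit   # step up at a target
        else:
            running -= 1.0 / n_miss         # step down otherwise
        curve.append(running)
    return curve


def enrichment_score(curve):
    """Return (ES, index) of the maximum deviation from zero."""
    peak = max(range(len(curve)), key=lambda i: abs(curve[i]))
    return curve[peak], peak


def leading_edge(ranked, targets, es, peak):
    """Return (tags, list, signal) as fractions between 0 and 1."""
    accs = [acc for acc, _ in ranked]
    # Assumption: the peak position counts towards the window for a positive ES.
    window = accs[:peak + 1] if es >= 0 else accs[peak + 1:]
    n = float(len(accs))
    n_set = float(sum(1 for acc in accs if acc in targets))
    tags = sum(1 for acc in window if acc in targets) / n_set
    list_fraction = len(window) / n
    signal = tags * (1.0 - list_fraction) * n / (n - n_set)
    return tags, list_fraction, signal


# Toy data: ten made-up accessions g0..g9 with ranking values 10..1.
ranked = [("g%d" % i, 10.0 - i) for i in range(10)]
targets = {"g0", "g2", "g8"}
curve = running_sum(ranked, targets, p=1)
es, peak = enrichment_score(curve)
tags, list_fraction, signal = leading_edge(ranked, targets, es, peak)
print(round(es, 2), peak, round(tags, 2), round(list_fraction, 2), round(signal, 2))
```

As a plausibility check, tags = 4/6 and list = 16/55 give a signal of about 53% with this formula, which agrees with the miR-149 row above if the ranked list had 55 entries. That list length is inferred from the percentages and is not stated in the example.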

EnrichmentScore Plots

Enrichment plots depict, for each MTI set, the running enrichment score over all UniProtAccs in the ranked dataset (blue line), the positions of the targets of the current MTI set within the ranked list (black dashes) and the maximum ES, either positive or negative (red dash). Enrichment plots are created only if an MTI set enrichment analysis was started.

MTI Set Genes

The MTI set gene file is essentially a written version of all enrichment plots and is therefore produced only if an enrichment analysis was started. For each MTI set, the file lists the targets that overlap with the ranked list of UniProtAccs, the index of each of these targets within the ranked list, the running ES at this target and whether the target is a member of the leading edge set.

Example:

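| MTI set | Target | Index in Ranked List | Running ES | LE Member |
|---------|--------|----------------------|------------|-----------|
| miR-149 | Q9WV91 | 10 | 0.06 | Yes |
| miR-149 | Q80SW1 | 13 | 0.24 | Yes |
| miR-149 | Q71B07 | 15 | 0.44 | Yes |
| miR-181a | Q56A04 | 29 | -0.16 | Yes |
| miR-181a | A2AJK6 | 36 | 0.38 | Yes |
| miR-190a | A3KGB4 | 14 | 0.76 | Yes |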

References

Magrane, M. and Consortium, U. (2011). UniProt Knowledgebase: a hub of integrated protein data. In Database, 2011(0), pp. bar009–bar009. [doi:10.1093/database/bar009]

Subramanian, A. and Tamayo, P. and Mootha, V. K. and Mukherjee, S. and Ebert, B. L. and Gillette, M. A. and Paulovich, A. and Pomeroy, S. L. and Golub, T. R. and Lander, E. S. et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. In Proceedings of the National Academy of Sciences, 102(43), pp. 15545–15550. [doi:10.1073/pnas.0506580102]

Vergoulis, T. and Vlachos, I. S. and Alexiou, P. and Georgakilas, G. and Maragkakis, M. and Reczko, M. and Gerangelos, S. and Koziris, N. and Dalamagas, T. and Hatzigeorgiou, A. G. (2011). TarBase 6.0: capturing the exponential growth of miRNA targets with experimental support. In Nucleic Acids Research, 40(D1), pp. D222–D229. [doi:10.1093/nar/gkr1161]

Hsu, S.-D. and Tseng, Y.-T. and Shrestha, S. and Lin, Y.-L. and Khaleel, A. and Chou, C.-H. and Chu, C.-F. and Huang, H.-Y. and Lin, C.-M. and Ho, S.-Y. et al. (2013). miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions. In Nucleic Acids Research, 42(D1), pp. D78–D85. [doi:10.1093/nar/gkt1266]

Xiao, F. and Zuo, Z. and Cai, G. and Kang, S. and Gao, X. and Li, T. (2009). miRecords: an integrated resource for microRNA-target interactions. In Nucleic Acids Research, 37(Database), pp. D105–D110. [doi:10.1093/nar/gkn851]

Yang, J.-H. and Li, J.-H. and Shao, P. and Zhou, H. and Chen, Y.-Q. and Qu, L.-H. (2010). starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data. In Nucleic Acids Research, 39(Database), pp. D202–D209. [doi:10.1093/nar/gkq1056]

Kuenne, C. and Preussner, J. and Herzog, M. and Braun, T. and Looso, M. (2014). MIRPIPE: quantification of microRNAs in niche model organisms. In Bioinformatics, 30(23), pp. 3412–3413. [doi:10.1093/bioinformatics/btu573]

Citation

Please cite Bayer J, Kuenne C, Preussner J and Looso M. LimiTT: Link miRNAs To Targets. ??? (2015), doi:tba