Update HELP.md
Edited captions, workflow, outline, etc.
jbayer committed Jul 7, 2015
1 parent 8cec8c5 commit 3314677
Showing 1 changed file with 123 additions and 137 deletions.
260 changes: 123 additions & 137 deletions HELP.md
@@ -1,76 +1,126 @@
--- LimiTT ---
LimiTT documentation
=================================================

Identification of miRNA target interactions (MTIs).
### Link miRNAs To Targets

### Table of contents

1. [Introduction](#introduction)
2. [Dependencies](#dependencies)
3. [Database background](#database-background)
4. [Workflow](#workflow)
5. [Parameters](#parameters)
6. [Input](#input)
7. [Output](#output)
8. [References](#references)
9. [Citation](#citation)

### Introduction

--
INTRODUCTION
--
LimiTT automatically links validated MTIs and affected proteins to a list of miRNAs and a list of genes or proteins
mapped onto UniProt accessions (annotation file), using information from several databases containing experimentally
verified MTIs. If no miRNAs are supplied, the pipeline predicts miRNAs from the given genes/proteins.
With a list of UniProt accessions and their ranking values (e.g. expression profile), LimiTT will identify the significantly
enriched MTI sets from the identified MTIs.

--
DEPENDENCIES
--
+ Official and working version in Software Repository +
Linux
Python 3.4.2 - anaconda 2.1.0
matplotlib
numpy
R 3.2.0
VennDiagram

+ Faster version (not running from Software Repository) +
Linux
Python 2.7.8
matplotlib
numpy
R 3.1.1
VennDiagram
mapped onto UniProt accessions (annotation file), using information from several databases containing experimentally verified MTIs. If no miRNAs are supplied, the pipeline predicts miRNAs from the given genes/proteins.
Given a list of UniProt accessions and their ranking values (e.g. an expression profile), LimiTT identifies the significantly enriched MTI sets among the identified MTIs.

### Dependencies

Linux
Python 2.7.8
matplotlib
numpy
R 3.1.1
VennDiagram

See HELP_python2to3 for more information about the two versions.
--
DATABASE BACKGROUND
--

LimiTT relies on experimentally validated MTIs received from the open source databases (DBs) TarBase (Vergoulis, et al. 2012,
version 6.0), miRTarBase (Hsu, et al. 2014, version 4.5), miRecords (Xiao, et al. 2009, version 1.0 update 2013) and
starBase (Li, et al. 2014, version 2.0).
The content of each DB was downloaded and partly preprocessed to easily access the MTIs.
In the case of TarBase, data of the miRNA DB miRBase (Griffiths-Jones, et al., 2006) had to be used to convert the miRBase
accession numbers to their miRNA identifiers.
To simplify the comparison of target symbols between the MTI DBs and to retrieve additional information for each target,
the symbols are mapped onto UniProt accessions (UniProtAccs) during the process.
### Database background

LimiTT relies on experimentally validated miRNA target interactions (MTIs) retrieved from the open source databases (DBs) TarBase (Vergoulis, et al. 2012, version 6.0), miRTarBase (Hsu, et al. 2014, version 4.5), miRecords (Xiao, et al. 2009, version 1.0 update 2013) and starBase (Li, et al. 2014, version 2.0).
The content of each DB was downloaded and partly preprocessed to make the MTIs easy to access. In the case of TarBase, data from the miRNA DB miRBase (Griffiths-Jones, et al., 2006) was needed to convert miRBase accession numbers to miRNA identifiers. To simplify the comparison of target symbols between the MTI DBs and to retrieve additional information for each target, the symbols are mapped onto UniProt accessions (UniProtAccs) during the process.
This is done by using the downloaded entries of the UniProt Knowledgebase (UniProtKB, Magrane and Consortium 2011),
from which specific information was filtered and preprocessed.

--
WORKFLOW
--

1. The workflow of LimiTT starts with the input of an optional list of miRNAs, adjusted to the output of the miRNA
quantification pipeline MIRPIPE (Kuenne, et al., 2014), and an obligatory file containing an annotated transcriptome
or proteome mapped onto UniProt accessions.
2. Based on the miRNAs given in the list, experimentally validated MTIs are selected from the four MTI
databases, whereby the user can define the DBs of interest. In addition to the choice of MTI DBs, the user can
select the occurrence of MTIs across the DBs, the species or species categories of interest, the experimental
methods and, in the case of starBase, the minimal number of CLIP-Seq experiments the MTIs were verified with.
3. The next step is to map all target symbols of the selected MTIs onto UniProtAccs, taking into account the species the MTI
comes from. Alternatively, the species can be ignored, so that targets are mapped not only onto compatible UniProtAccs of
the same species but also onto those of other species. This option enables the inclusion of entries homologous to the mapped
target.
4. At the same time, all UniProtAccs are extracted from the annotation file.
5. In the subsequent annotation mapping step, both lists of UniProtAccs are overlapped, resulting in validated
MTIs if the miRNA input was validated, otherwise in MTIs with annotated targets and predicted miRNAs.
6. Moreover, it is possible to start an enrichment analysis of the identified MTI sets by passing an expression file
to the approach, containing UniProtAccs together with ranking values. The enrichment analysis is based on the R
implementation of GSEA (Subramanian, et al., 2005) and is fully integrated into LimiTT.

--
INPUT FILES
--
### Workflow

The LimiTT workflow is shown below:

<img width="600" src="https://bioinformatics.mpi-bn.mpg.de/static/images/limitt-workflow.jpg"/>

The input (grey) is composed of an optional list of miRNAs and an optional annotation file with a transcriptome or proteome mapped to the UniProt Knowledgebase. If an annotation file was submitted, the **black path** represents the processing steps of miRNAim; otherwise, the process follows the **red path**.

**A)** The workflow starts with the selection of MTIs from the four MTI DBs, restricted to the miRNAs given in the list, if one was submitted. At this stage, the user can also choose the DBs of interest and filter MTIs by several of their properties, such as their occurrence across the DBs, the species they belong to, the experimental methods they were validated with or, in the case of starBase, their stringency.

**B)** Next, all target gene symbols of the selected MTIs are mapped to UniProtAccs, while

**C)** all UniProtAccs are simultaneously extracted from the annotation file.

**D)** Subsequently, both lists are overlapped, resulting in those MTIs which can be linked to the submitted data.
If no annotation file was submitted, steps (C) and (D) are skipped, and the resulting MTIs depend only on the miRNA list or on the adjustable properties.

**E)** Optionally, an enrichment analysis of the identified MTI sets is performed when a ranked list of UniProtAccs is submitted. The analysis is based on the R implementation of GSEA (Subramanian, et al., 2005).
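
Steps (A) to (D) can be pictured with a small Python sketch. It is only an illustration under assumptions and not LimiTT's actual code: the record layout, the `symbol_to_accs` lookup, the function names and the toy gene symbols and accessions are all hypothetical.

```python
from collections import defaultdict


def select_mtis(records, min_occurrence=2):
    """Step A (sketch): keep MTIs reported by at least `min_occurrence` DBs.

    `records` is an iterable of (database, mirna, target_symbol) tuples.
    """
    dbs_per_mti = defaultdict(set)
    for database, mirna, symbol in records:
        dbs_per_mti[(mirna, symbol)].add(database)
    return {mti for mti, dbs in dbs_per_mti.items() if len(dbs) >= min_occurrence}


def link_to_annotation(mtis, symbol_to_accs, annotated_accs):
    """Steps B to D (sketch): map symbols to accessions, then overlap."""
    linked = []
    for mirna, symbol in sorted(mtis):
        # Step B: a symbol may map to several accessions (hypothetical lookup).
        for acc in sorted(symbol_to_accs.get(symbol, ())):
            # Step D: keep only accessions present in the annotation file (step C).
            if acc in annotated_accs:
                linked.append((mirna, symbol, acc))
    return linked


# Made-up toy data; the symbols and accessions are placeholders.
records = [
    ("TarBase", "hsa-miR-1-3p", "GENE_A"),
    ("miRTarBase", "hsa-miR-1-3p", "GENE_A"),
    ("starBase", "hsa-miR-2-5p", "GENE_B"),
]
symbol_to_accs = {"GENE_A": {"P00001"}, "GENE_B": {"P00002"}}
annotated_accs = {"P00001", "P00002"}

mtis = select_mtis(records, min_occurrence=2)
linked = link_to_annotation(mtis, symbol_to_accs, annotated_accs)
print(linked)
```

In this toy run only the interaction reported by two databases survives step (A), so only that one can be linked to the annotation.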

### Parameters

-cl
Use the cluster parameter if you have no miRNA input but want the miRNAs to be
clustered over species and hairpin arm information (hsa-miR-123a-5p -> miR-123a).
Otherwise miRNAs are distinguished by their full nomenclature.

-base
Expects the MTI DBs (abbreviated with numbers) of interest separated by space.
Default is "-base 1 2 3 4" with 1: TarBase, 2: miRTarBase, 3: miRecords,
4: starBase.

-occ
If more than one DB was selected, the occurrence parameter can be used to define
the minimum number of DBs the MTIs have to occur in. Due to four possible MTI
DBs, the value range is between 1 and 4 with a default value of 2 DBs. If the
manually set value is higher than the number of selected DBs, it is
automatically changed to the number of DBs.

-exp
Experimental methods parameter to select the methods of interest.
The following categories exist:
Western blot
Reporter assay
qPCR
Microarray
NGS
Other
Additionally, it is possible to select experiments by distinctive substrings like
"race" or "chip". Separate categories/substrings with a space and surround phrases
that contain a space with quotes (e.g. "Western blot"). The comparison is not case
sensitive.

-spec
Species parameter to select the species and/or species category of interest,
delimited by space. Single species need to be named by their miRNA-specific
abbreviation (e.g. hsa for Homo sapiens). Additionally, it is possible to select
from the following species categories by passing the single-letter abbreviation:
a - animals (14 species)
p - plants ( 6 species)
v - viruses ( 4 species)
f - fungi ( 1 species)
e - excavata ( 1 species)
To ignore species information, pass the letter i. By ignoring species, target to
UniProt accession mapping will be done without species consideration.

-nspec
Parameter for ignoring species or species categories. See above for more information.

-str
Expects one of the values 1, 2, 3 or 5, which sets the minimal number of CLIP-Seq
experiments supporting the MTIs within starBase.

-perm
Integer describing the number of permutations to calculate the Normalized
Enrichment Score (NES) and the False Discovery Rate (FDR) q-value for the MTI
set enrichment analysis. If the number of permutations is too small, NES and
FDR q-value of sets might result in NaN values.

-p
Integer describing the weighting of the ranking values to calculate the
Enrichment Score (ES) for MTI set enrichment analysis. The original GSEA paper
recommends using a weighting of 0 if the ranking values are not normalized.
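
Some of the parameter semantics described above can be made concrete in Python. This is a hedged sketch, not LimiTT's implementation: the function names are hypothetical, the regular expression assumes a three- or four-letter lowercase species prefix and a `-5p`/`-3p` arm suffix, and the method filter shows only the substring matching, not the mapping of methods to categories.

```python
import re

# Optional species prefix (e.g. "hsa-"), the miRNA name, optional arm suffix.
_MIRNA = re.compile(r"(?:[a-z]{3,4}-)?((?:miR|mir|let)-.+?)(?:-(?:5p|3p))?")


def cluster_mirna(name):
    """-cl (sketch): drop species and hairpin arm, e.g. hsa-miR-123a-5p -> miR-123a."""
    match = _MIRNA.fullmatch(name)
    # Names that do not fit the assumed pattern are returned unchanged.
    return match.group(1) if match else name


def effective_occurrence(requested, n_selected_dbs):
    """-occ (sketch): a value above the number of selected DBs is lowered to it."""
    return min(requested, n_selected_dbs)


def matches_experiment(method, selectors):
    """-exp (sketch): case-insensitive substring match against any selector."""
    method = method.lower()
    return any(selector.lower() in method for selector in selectors)


print(cluster_mirna("hsa-miR-123a-5p"))
print(effective_occurrence(4, 2))
print(matches_experiment("5' RACE", ["race", "Western blot"]))
```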

### Input

**Annotation File (-ia)**

@@ -159,75 +209,8 @@ INPUT FILES
|Q6A037|0.495874819|
+------+-----------+
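
To illustrate how a ranked list of accessions and the `-p` weighting interact, here is a minimal Python sketch of the weighted running-sum Enrichment Score as described for GSEA (Subramanian, et al., 2005). It is an assumption-laden illustration, not the R implementation that LimiTT integrates: the function name and the toy accessions are hypothetical, and the permutation step behind NES and FDR is left out.

```python
def enrichment_score(ranked, mti_set, p=1.0):
    """Weighted running-sum ES (sketch).

    ranked  -- list of (accession, ranking_value) pairs
    mti_set -- set of accessions targeted by one miRNA
    p       -- weighting exponent; p=0 gives every hit the same step size
    """
    ordered = sorted(ranked, key=lambda item: item[1], reverse=True)
    weights = [abs(value) ** p if acc in mti_set else 0.0 for acc, value in ordered]
    n_hits = sum(1 for acc, _ in ordered if acc in mti_set)
    n_miss = len(ordered) - n_hits
    hit_total = sum(weights)
    if n_hits == 0 or n_miss == 0 or hit_total == 0:
        raise ValueError("set must contain some, but not all, ranked accessions")

    running = 0.0
    best = 0.0
    for (acc, _), weight in zip(ordered, weights):
        if acc in mti_set:
            running += weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running  # ES is the largest deviation from zero
    return best


# Made-up ranked list with placeholder accessions.
ranked = [("ACC_A", 3.0), ("ACC_B", 2.0), ("ACC_C", 1.0), ("ACC_D", 0.5)]
es_top = enrichment_score(ranked, {"ACC_A", "ACC_B"}, p=0)
es_bottom = enrichment_score(ranked, {"ACC_D"}, p=0)
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking yields a positive score and one at the bottom a negative score.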

---
OUTPUT FILES
---
### Output

**BarGraphs**

@@ -336,9 +319,8 @@ OUTPUT FILES
+--------+------+--------------------+----------+---------+

---
CITATIONS
---

### References

Magrane, M. and Consortium, U. (2011). UniProt Knowledgebase: a hub of integrated protein data. In Database, 2011(0), pp. bar009–bar009. [doi:10.1093/database/bar009]

@@ -353,3 +335,7 @@ Xiao, F. and Zuo, Z. and Cai, G. and Kang, S. and Gao, X. and Li, T. (2009). miR
Yang, J.-H. and Li, J.-H. and Shao, P. and Zhou, H. and Chen, Y.-Q. and Qu, L.-H. (2010). starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data. In Nucleic Acids Research, 39(Database), pp. D202–D209. [doi:10.1093/nar/gkq1056]

Kuenne, C. and Preussner, J. and Herzog, M. and Braun, T. and Looso, M. (2014). MIRPIPE: quantification of microRNAs in niche model organisms. In Bioinformatics, 30(23), pp. 3412–3413. [doi:10.1093/bioinformatics/btu573]

### Citation

Please cite Bayer J, Kuenne C, Preussner J and Looso M. LimiTT: Link miRNAs To Targets. *???* (2015), doi:tba
