Update HELP.md
Input and Output tables
jbayer committed Jul 7, 2015
1 parent 791acf5 commit 45d32b3
Showing 1 changed file with 96 additions and 109 deletions.
205 changes: 96 additions & 109 deletions HELP.md
@@ -73,7 +73,7 @@ Parameter | Explanation

### Input

**Annotation File (-ia)**
#### Annotation File (-ia)

- *File type*: Tab delimited
- *Header*: No
@@ -99,7 +99,7 @@ comp1000627_c0_seq1|slc6a15|sp|Q9XS59|S6A15_BOVIN |
comp1000899_c0_seq1| |gb|CX212397.1,dbj|DB530926.2,gi|154363325|


**miRNA File (-im)**
#### miRNA File (-im)

- *File type*: Tab delimited
- *Header*: No
@@ -126,8 +126,7 @@ miR-29d|141233|0.02|172|TAGCACCATATGAAATCAGTGT|133582|
miR-29c|55690 |1.00|172|TAGCACCATTTGAAATCGGTTA|44200 |



**Ranking File (-ir)**
#### Ranking File (-ir)

- *File type*: Tab delimited
- *Header*: No
@@ -147,113 +146,101 @@ Q6A037|0.495874819|

### Output

**BarGraphs**

The bar graphs provide an overview of the number of miRNAs and MTIs after the different processing steps of LimiTT.
At this, miRNAs and MTIs are counted after searching the MTI databases, after filtering by their occurrence over the DBs,
after mapping MTIs onto UniProtAccs and after mapping the remaining MTI targets onto the annotated UniProtAccs.
Thus, the last number within the bars is the final result.

**MTI Matrix**

The matrix contains all identified MTIs, arranged by targets as UniProtAccs (rows) and miRNAs (columns).
If an interaction between a miRNA and a target was identified, a binary string indicates in which of the chosen MTI DBs the interaction occurs.
The order of the DBs for the binary string can be found in the first row.

Example:
|--------------------------------------------------------|
|Database order: TarBase, miRTarBase, miRecords, starBase|
|------|-----|-------|------|-------|------|-------------|
| |miR-9|miR-15a|miR-17|miR-19b|miR-24| miR-26a |
|------|-----|-------|------|-------|------|-------------|
|A0AVK6| 0001| | | | 0110| |
|------|-----|-------|------|-------|------|-------------|
|A2A6A1| | | | 1110| | |
|------|-----|-------|------|-------|------|-------------|
|A2AAY5| | | 1110| | | 1001 |
|------|-----|-------|------|-------|------|-------------|
|A2AHG0| |0001 | 0101| | | |
|------|-----|-------|------|-------|------|-------------|
#### BarGraphs

The bar graphs provide an overview of the number of miRNAs and MTIs after the different processing steps of LimiTT.
MiRNAs and MTIs are counted after searching the MTI databases, after filtering by their occurrence over the DBs, after mapping MTIs onto UniProtAccs and after mapping the remaining MTI targets onto the annotated UniProtAccs.
Thus, the last number within the bars is the final result.

#### MTI Matrix

The matrix contains all identified MTIs, arranged by targets as UniProtAccs (rows) and miRNAs (columns).
If an interaction between a miRNA and a target was identified, a binary string indicates in which of the chosen MTI DBs the interaction occurs.
The order of the DBs for the binary string can be found in the first row.

Example:

|Database order: TarBase, miRTarBase, miRecords, starBase| | | | | | |
|------|-----|-------|------|-------|------|-------------|
| |miR-9|miR-15a|miR-17|miR-19b|miR-24| miR-26a |
|A0AVK6| 0001| | | | 0110| |
|A2A6A1| | | | 1110| | |
|A2AAY5| | | 1110| | | 1001 |
|A2AHG0| |0001 | 0101| | | |
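
As an illustration of how a matrix cell can be read, the following minimal Python sketch decodes one binary string into database names. It assumes that the leftmost digit corresponds to the first database in the order row; the source does not state the digit direction explicitly. The function name `decode_mti_cell` and the constant `DB_ORDER` are hypothetical and not part of LimiTT.

```python
# Database order as given in the first row of the example matrix.
DB_ORDER = ["TarBase", "miRTarBase", "miRecords", "starBase"]


def decode_mti_cell(cell, db_order=DB_ORDER):
    """Return the databases flagged with '1' in one matrix cell.

    Assumption: the leftmost digit belongs to the first database in db_order.
    An empty cell means that no interaction was identified.
    """
    bits = cell.strip()
    if not bits:
        return []
    if len(bits) != len(db_order) or set(bits) - {"0", "1"}:
        raise ValueError(f"unexpected cell content: {cell!r}")
    return [db for bit, db in zip(bits, db_order) if bit == "1"]


# Cell for A0AVK6 / miR-24 from the example above.
print(decode_mti_cell("0110"))
```

Under this assumption, the cell `0110` for A0AVK6 and miR-24 would mean that the interaction is listed in miRTarBase and miRecords.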

**MTI Info**

The MTI information file lists all identified target UniProt Accessions together with the interacting miRNAs and further information collected during the process.
If additional information from the annotation file and/or the miRNA list was specified at the beginning of the process, it is also included in the MTI information file.

Standard columns:
UniProt Accessions UniProt Accession of the identified miRNA target
miRNA Target Target symbol from the MTI database(s)
miRNA miRNA(s) identified to interact with the target
Review status Review status of UniProtKB entry
Organism The MTI's organism
Gene synonyms Synonyms of the target gene
Protein names Name(s) of the influenced protein
EC number Enzyme Commission number
Existence Evidence for protein existence
GO-IDs Gene Ontology identifier(s)

**MTI Overlap HM**

Based on the idea that each identified miRNA interacts with a set of target genes, the Heatmap (HM) depicts the ratio of overlapping UniProtAcc targets between these MTI sets.
If the MTI set enrichment analysis was used, the Heatmap output will depict, for each MTI set, the ratio of overlapping target genes that are part of the leading edge sets of the corresponding MTI sets.

**MTI Sets ranked**

If a ranking file was passed to LimiTT, a reduced version of the Gene Set Enrichment Analysis tool is started, which analyses the enrichment of the identified MTI sets based on the ranked UniProtAccs.
Using a running sum statistic, a weighted Enrichment Score (ES) is calculated for each gene set based on position-dependent gene matches between the ranked list and the set.
The Leading Edge analysis additionally identifies and analyses the core genes of the gene set that mainly affect the ES.
It proceeds as follows: depending on whether the ES of an MTI set is positive or negative, the set of Leading Edge targets consists of the MTI set targets either before or after the peak in the running sum calculation.
Based on this, three statistics are calculated:

- *tags*: the ratio of leading edge targets to all targets in the given set.
- *list*: the ratio of the UniProtAccs in the ranking file before (positive ES) or after (negative ES) the peak to all UniProtAccs in the file.
- *signal*: a combination of the two previous calculations, describing the distribution of the MTI set targets over the ranked dataset; it results in 100% or more if all targets are found at the beginning of the ranked list.

To take the set sizes into account, the MTI set enrichment analysis then calculates the Normalized Enrichment Score (NES) for each gene set using permutations of the dataset.
Additionally, the False Discovery Rate (FDR) q-value is calculated, representing the estimated probability of a false positive result for each set with a given NES.

Aside from the ES, NES, FDR q-value and Leading Edge analysis, the file contains the size of each MTI set, which is the number of overlapping UniProtAccs between the MTI set and the ranked list,
and the index within the ranked gene file at which the running sum statistic reached the maximal ES.

Example:
|--------|----|----|----|---------|-----------|------------------------------|
|MTI Set |Size|ES |NES |FDR q-val|Rank at Max|Leading Edge |
|========|====|====|====|=========|===========|==============================|
|miR-149 | 6 |0.65|1.55|0.290 | 16 |tags=67%, list=29%, signal=53%|
|--------|----|----|----|---------|-----------|------------------------------|
|miR-301b| 4 |0.61|1.29|0.790 | 1 |tags=25%, list=2%, signal=26% |
|--------|----|----|----|---------|-----------|------------------------------|


**EnrichmentScore Plots**

Enrichment plots depict, for each MTI set, the running enrichment score over all UniProtAccs in the ranked dataset (blue line),
the positions of the targets of the current MTI set within the ranked list (black dashes) and the maximum ES, either positive or negative (red dash).
Enrichment plots are created only if an MTI set enrichment analysis was started.

**MTI Set Genes**

The MTI set gene file is essentially a written version of all enrichment plots and is therefore only produced if an enrichment analysis was initiated.
For each MTI set, the file lists the targets that overlap with the ranked list of UniProtAccs, the index of each of these targets within the ranked list,
the running ES at this target and whether it is a member of the leading edge set.

Example:
|--------|------|--------------------|----------|---------|
|MTI set |Target|Index in Ranked List|Running ES|LE Member|
|========|======|====================|==========|=========|
|miR-149 |Q9WV91| 10 |0.06 |Yes |
|--------|------|--------------------|----------|---------|
|miR-149 |Q80SW1| 13 |0.24 |Yes |
|--------|------|--------------------|----------|---------|
|miR-149 |Q71B07| 15 |0.44 |Yes |
|--------|------|--------------------|----------|---------|
|miR-181a|Q56A04| 29 |-0.16 |Yes |
|--------|------|--------------------|----------|---------|
|miR-181a|A2AJK6| 36 |0.38 |Yes |
|--------|------|--------------------|----------|---------|
|miR-190a|A3KGB4| 14 |0.76 |Yes |
|--------|------|--------------------|----------|---------|

#### MTI Info

The MTI information file lists all identified target UniProt Accessions together with the interacting miRNAs and further information collected during the process.
If additional information from the annotation file and/or the miRNA list was specified at the beginning of the process, it is also included in the MTI information file.

Standard columns:

Column | Explanation|
-------|------------|
UniProt Accessions|UniProt Accession of the identified miRNA target|
miRNA Target | Target symbol from the MTI database(s)|
miRNA | miRNA(s) identified to interact with the target|
Review status | Review status of UniProtKB entry|
Organism | The MTI's organism|
Gene synonyms | Synonyms of the target gene|
Protein names | Name(s) of the influenced protein|
EC number | Enzyme Commission number|
Existence | Evidence for protein existence|
GO-IDs | Gene Ontology identifier(s)|

#### MTI Overlap HM

Based on the idea that each identified miRNA interacts with a set of target genes, the Heatmap (HM) depicts the ratio of overlapping UniProtAcc targets between these MTI sets.
If the MTI set enrichment analysis was used, the Heatmap output will depict, for each MTI set, the ratio of overlapping target genes that are part of the leading edge sets of the corresponding MTI sets.
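
The source does not say how the overlap ratio is normalised. As a rough sketch only, the Python snippet below assumes that the value for a row set and a column set is the number of shared targets divided by the size of the row set. The function name `overlap_ratios` is hypothetical, and the sets are read off the example MTI matrix above.

```python
def overlap_ratios(mti_sets):
    """Return ratios[a][b] = |a ∩ b| / |a| for every pair of MTI sets.

    Assumption: the ratio is normalised by the size of the row set; LimiTT
    may use a different denominator.
    """
    names = sorted(mti_sets)
    ratios = {}
    for a in names:
        size_a = len(mti_sets[a])
        ratios[a] = {
            b: (len(mti_sets[a] & mti_sets[b]) / size_a if size_a else 0.0)
            for b in names
        }
    return ratios


# MTI sets taken from the example matrix (targets per miRNA).
mti_sets = {
    "miR-15a": {"A2AHG0"},
    "miR-17": {"A2AAY5", "A2AHG0"},
    "miR-26a": {"A2AAY5"},
}
ratios = overlap_ratios(mti_sets)
print(ratios["miR-17"]["miR-26a"], ratios["miR-26a"]["miR-17"])
```

With this normalisation the matrix is not symmetric: half of the miR-17 targets are shared with miR-26a, while the single miR-26a target is fully covered by miR-17.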

#### MTI Sets ranked

If a ranking file was passed to LimiTT, a reduced version of the Gene Set Enrichment Analysis tool is started, which analyses the enrichment of the identified MTI sets based on the ranked UniProtAccs.
Using a running sum statistic, a weighted Enrichment Score (ES) is calculated for each gene set based on position-dependent gene matches between the ranked list and the set.
The Leading Edge analysis additionally identifies and analyses the core genes of the gene set that mainly affect the ES.
It proceeds as follows: depending on whether the ES of an MTI set is positive or negative, the set of Leading Edge targets consists of the MTI set targets either before or after the peak in the running sum calculation.
Based on this, three statistics are calculated:

- *tags*: the ratio of leading edge targets to all targets in the given set.
- *list*: the ratio of the UniProtAccs in the ranking file before (positive ES) or after (negative ES) the peak to all UniProtAccs in the file.
- *signal*: a combination of the two previous calculations, describing the distribution of the MTI set targets over the ranked dataset; it results in 100% or more if all targets are found at the beginning of the ranked list.

To take the set sizes into account, the MTI set enrichment analysis then calculates the Normalized Enrichment Score (NES) for each gene set using permutations of the dataset.
Additionally, the False Discovery Rate (FDR) q-value is calculated, representing the estimated probability of a false positive result for each set with a given NES.
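
The running sum statistic can be sketched as follows. This is a minimal Python illustration of a weighted, GSEA-style running sum and not LimiTT's actual code: the function name `enrichment_score`, the `weight` exponent and the example scores are assumptions made for this sketch. Hits increase the running sum in proportion to the absolute ranking score, misses decrease it by a constant, and the ES is the value with the largest deviation from zero.

```python
def enrichment_score(ranked, mti_set, weight=1.0):
    """Return (ES, index of the peak, running-sum profile).

    ranked  -- list of (accession, score) pairs, already sorted by score
    mti_set -- set of accessions targeted by one miRNA
    weight  -- exponent applied to |score| for hits (assumed, as in weighted GSEA)
    """
    hit_weights = [abs(score) ** weight for acc, score in ranked if acc in mti_set]
    n_miss = len(ranked) - len(hit_weights)
    hit_total = sum(hit_weights)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, es, peak = 0.0, 0.0, -1
    profile = []
    for index, (acc, score) in enumerate(ranked):
        if acc in mti_set:
            running += abs(score) ** weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss                      # step down on a miss
        profile.append(running)
        if abs(running) > abs(es):                       # track largest deviation
            es, peak = running, index
    return es, peak, profile


# Invented scores for illustration only; the accessions appear in the examples.
ranked = [("Q9WV91", 3.0), ("Q80SW1", 2.0), ("Q71B07", 1.0), ("Q56A04", -1.0)]
es, peak, profile = enrichment_score(ranked, {"Q9WV91", "Q80SW1"})
print(es, peak)
```

The returned peak index corresponds to the position reported as "Rank at Max", and the profile is the curve that an enrichment plot would draw. The NES and FDR q-value require permutations and are not covered by this sketch.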

Aside from the ES, NES, FDR q-value and Leading Edge analysis, the file contains the size of each MTI set, which is the number of overlapping UniProtAccs between the MTI set and the ranked list,
and the index within the ranked gene file at which the running sum statistic reached the maximal ES.

Example:

MTI Set |Size|ES |NES |FDR q-val|Rank at Max|Leading Edge |
--------|----|----|----|---------|-----------|------------------------------|
miR-149 | 6 |0.65|1.55|0.290 | 16 |tags=67%, list=29%, signal=53%|
miR-301b| 4 |0.61|1.29|0.790 | 1 |tags=25%, list=2%, signal=26% |
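
The three Leading Edge statistics can be illustrated with a short Python sketch for the positive-ES case. The source only states that signal combines tags and list; the sketch assumes the signal-strength formula documented for GSEA, `tags * (1 - list) * N / (N - Nh)`, where `N` is the number of ranked UniProtAccs and `Nh` the set size. The function name `leading_edge_stats` is hypothetical, and the counts used below (4 leading edge targets, 55 ranked UniProtAccs) are assumed values chosen to be consistent with the miR-149 row, not figures given in the source.

```python
def leading_edge_stats(n_leading, set_size, peak_rank, list_size):
    """Return (tags, list, signal) as fractions for a set with positive ES.

    n_leading -- number of set targets in the leading edge
    set_size  -- number of set targets present in the ranked list (Nh)
    peak_rank -- number of ranked entries up to the ES peak
    list_size -- total number of ranked UniProtAccs (N)
    """
    if not 0 < set_size < list_size:
        raise ValueError("set_size must be positive and smaller than list_size")
    tags = n_leading / set_size
    list_fraction = peak_rank / list_size
    # Assumed GSEA-style combination of the two previous values.
    signal = tags * (1 - list_fraction) * list_size / (list_size - set_size)
    return tags, list_fraction, signal


# Assumed counts: 4 of 6 targets in the leading edge, peak at rank 16 of 55.
tags, list_fraction, signal = leading_edge_stats(4, 6, 16, 55)
print(f"tags={tags:.0%}, list={list_fraction:.0%}, signal={signal:.0%}")
```

Under this formula, a set whose targets all sit at the very top of the ranked list gets tags of 100% and a signal of 100%, which agrees with the description of signal above.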


#### EnrichmentScore Plots

Enrichment plots depict, for each MTI set, the running enrichment score over all UniProtAccs in the ranked dataset (blue line),
the positions of the targets of the current MTI set within the ranked list (black dashes) and the maximum ES, either positive or negative (red dash).
Enrichment plots are created only if an MTI set enrichment analysis was started.

#### MTI Set Genes

The MTI set gene file is essentially a written version of all enrichment plots and is therefore only produced if an enrichment analysis was initiated.
For each MTI set, the file lists the targets that overlap with the ranked list of UniProtAccs, the index of each of these targets within the ranked list,
the running ES at this target and whether it is a member of the leading edge set.

Example:

MTI set |Target|Index in Ranked List|Running ES|LE Member|
--------|------|--------------------|----------|---------|
miR-149 |Q9WV91| 10 |0.06 |Yes |
miR-149 |Q80SW1| 13 |0.24 |Yes |
miR-149 |Q71B07| 15 |0.44 |Yes |
miR-181a|Q56A04| 29 |-0.16 |Yes |
miR-181a|A2AJK6| 36 |0.38 |Yes |
miR-190a|A3KGB4| 14 |0.76 |Yes |


### References
