From 45d32b31ad1e01cb0d569197321cc9f9104affac Mon Sep 17 00:00:00 2001
From: jbayer
Date: Tue, 7 Jul 2015 11:25:59 +0200
Subject: [PATCH] Update HELP.md Input and Output tables

---
 HELP.md | 205 ++++++++++++++++++++++++++------------------------------
 1 file changed, 96 insertions(+), 109 deletions(-)

diff --git a/HELP.md b/HELP.md
index 66c468f..efd3b4f 100644
--- a/HELP.md
+++ b/HELP.md
@@ -73,7 +73,7 @@ Parameter | Explanation
 
 ### Input
 
-**Annotation File (-ia)**
+#### Annotation File (-ia)
 
 - *File type*: Tab delimited
 - *Header*: No
@@ -99,7 +99,7 @@ comp1000627_c0_seq1|slc6a15|sp|Q9XS59|S6A15_BOVIN |
 comp1000899_c0_seq1| |gb|CX212397.1,dbj|DB530926.2,gi|154363325|
 
 
-**miRNA File (-im)**
+#### miRNA File (-im)
 
 - *File type*: Tab delimited
 - *Header*: No
@@ -126,8 +126,7 @@ miR-29d|141233|0.02|172|TAGCACCATATGAAATCAGTGT|133582|
 miR-29c|55690 |1.00|172|TAGCACCATTTGAAATCGGTTA|44200 |
 
 
-
-**Ranking File (-ir)**
+#### Ranking File (-ir)
 
 - *File type*: Tab delimited
 - *Header*: No
@@ -147,113 +146,101 @@ Q6A037|0.495874819|
 
 ### Output
 
-**BarGraphs**
-
- The bar graphs provide an overview of the number of miRNAs and MTIs after the different processing steps of LimiTT.
- At this, miRNAs and MTIs are counted after searching the MTI databases, after filtering by their occurrence over the DBs,
- after mapping MTIs onto UniProtAccs and after mapping the remaining MTI targets onto the annotated UniProtAccs.
- Thus, the last number within the bars is the final result.
-
-**MTI Matrix**
-
- The matrix contains all identified MTIs ranged in targets as UniProtAccs (rows) and miRNAs (columns).
- If an interaction between miRNA and target was identified, a binary number represents the occurrence of the interaction over the chosen MTI DBs.
- The order of the DBs for the binary string can be found in the first row.
-
- Example:
- |--------------------------------------------------------|
- |Database order: TarBase, miRTarBase, miRecords, starBase|
- |------|-----|-------|------|-------|------|-------------|
- | |miR-9|miR-15a|miR-17|miR-19b|miR-24| miR-26a |
- |------|-----|-------|------|-------|------|-------------|
- |A0AVK6| 0001| | | | 0110| |
- |------|-----|-------|------|-------|------|-------------|
- |A2A6A1| | | | 1110| | |
- |------|-----|-------|------|-------|------|-------------|
- |A2AAY5| | | 1110| | | 1001 |
- |------|-----|-------|------|-------|------|-------------|
- |A2AHG0| |0001 | 0101| | | |
- |------|-----|-------|------|-------|------|-------------|
+#### BarGraphs
+
+The bar graphs provide an overview of the number of miRNAs and MTIs after the different processing steps of LimiTT.
+MiRNAs and MTIs are counted after searching the MTI databases, after filtering by their occurrence across the DBs, after mapping MTIs onto UniProtAccs and after mapping the remaining MTI targets onto the annotated UniProtAccs.
+Thus, the last number within the bars is the final result.
+
+#### MTI Matrix
+
+The matrix contains all identified MTIs, arranged with targets as UniProtAccs (rows) and miRNAs (columns).
+If an interaction between miRNA and target was identified, a binary string records in which of the chosen MTI DBs the interaction occurs.
+The order of the DBs for the binary string can be found in the first row.
+
+Example:
+
+|Database order: TarBase, miRTarBase, miRecords, starBase|
+|------|-----|-------|------|-------|------|-------------|
+| |miR-9|miR-15a|miR-17|miR-19b|miR-24| miR-26a |
+|A0AVK6| 0001| | | | 0110| |
+|A2A6A1| | | | 1110| | |
+|A2AAY5| | | 1110| | | 1001 |
+|A2AHG0| |0001 | 0101| | | |
-**MTI Info**
-
- The MTI information file is a list of all identified target UniProt Accessions together with the interacting miRNAs and further information which was collected during the process.
- If in the beginning of the process additional information from the annotation file and/or the miRNA list was specified, this information will also be part of the MTI information file.
-
- Standard columns:
- UniProt Accessions  UniProt Accession of the identified miRNA target
- miRNA Target        Target symbol from the MTI database(s)
- miRNA               miRNA(s) identified to interact with the target
- Review status       Review status of UniProtKB entry
- Organism            The MTI's organism
- Gene synonyms       Synonyms of the target gene
- Protein names       Name(s) of the influenced protein
- EC number           Enzyme Commission number
- Existence           Evidence for protein existence
- GO-IDs              Gene Ontology identifier(s)
-
-**MTI Overlap HM**
-
- Based on the idea that each identified miRNA interacts with a set of target genes, the Heatmap (HM) depicts the ratio of overlapping UniProtAcc targets between each of these MTI sets.
- If the MTI set enrichment analysis was used, the Heatmap output will depict for each MTI set the ratio overlapping target genes which are part of the leading edge sets of the corresponding MTI sets.
-
-**MTI Sets ranked**
-
- If a ranking file was passed to miRNA, a reduced version of the Gene Set Enrichment Analysis tool is started, analysing the enrichment of the identified MTI sets based on the ranked UniProtAccs.
- With a running sum statistic, a weighted Enrichment Score (ES) is calculated for each gene set based on position dependant gene matches between the ranked list and the set.
- The Leading Edge analysis additionally identifies and analyses the core genes of the gene set which mainly affect the ES.
- At this, the Leading Edge analysis proceeds as follows: Depending on whether the ES of a MTI set is positive or negative, the set of Leading Edge targets either consists of the MTI set targets before
- or after the peak in the running sum calculation.
- Based on this, three statistics are calculated, where tags represents the ratio of leading edge targets to all targets in the given set, list starts out from all UniProtAcc in the ranking file either before
- positive ES) or after (negative ES) the peak and calculates the ratio of these UniProtAccs to all existent within the file and where signal is a combination of the two previous calculations,
- describing the distribution of the MTI set targets over the ranked dataset, resulting in 100% or more, if all targets can be found at the beginning of the ranked list.
- To take the set sizes into account, MTI set enrichment analysis calculates in the next step the Normalized Enrichment Score (NES) for each gene set by using permutations of the dataset.
- Additionally, the False Discovery Rate (FDR) q-value is calculated, representing the estimated probability of a false positive result for each set with a given NES.
-
- Aside from the ES, NES, FDR q-value and Leading Edge analysis, the file consists of the size of each MTI set, which is the number of overlapping UniProtAccs between the MTI set and the ranked list,
- and the index within the ranked gene file at which the running sum statistic calculated the maximal ES.
-
- Example:
- |--------|----|----|----|---------|-----------|------------------------------|
- |MTI Set |Size|ES |NES |FDR q-val|Rank at Max|Leading Edge |
- |========|====|====|====|=========|===========|==============================|
- |miR-149 | 6 |0.65|1.55|0.290 | 16 |tags=67%, list=29%, signal=53%|
- |--------|----|----|----|---------|-----------|------------------------------|
- |miR-301b| 4 |0.61|1.29|0.790 | 1 |tags=25%, list=2%, signal=26% |
- |--------|----|----|----|---------|-----------|------------------------------|
-
-
-**EnrichmentScore Plots**
-
- Enrichments plots depict for each MTI set the running enrichments score over all UniProtAccs in the ranked dataset (blue line),
- the position of targets of the current MTI set within in the ranked list (black dashes) and the maximum ES, either positive or negative (red dash).
- Enrichment plots are created only if a MTI set enrichment analysis was started.
-
-**MTI Set Genes**
-
- The MTI set gene file output of LimiTT is more or less a written version of all enrichment plots and thus just produced, if an enrichment analyses was initiated.
- The file lists for each MTI set, the targets which overlap with the ranked list of UniProtAccs, the index of each of this targets within the ranked list,
- the running ES for this target and whether it is a member of the leading edge set or not.
-
- Example:
- |--------|------|--------------------|----------|---------|
- |MTI set |Target|Index in Ranked List|Running ES|LE Member|
- |========|======|====================|==========|=========|
- |miR-149 |Q9WV91| 10 |0.06 |Yes |
- |--------|------|--------------------|----------|---------|
- |miR-149 |Q80SW1| 13 |0.24 |Yes |
- |--------|------|--------------------|----------|---------|
- |miR-149 |Q71B07| 15 |0.44 |Yes |
- |--------|------|--------------------|----------|---------|
- |miR-181a|Q56A04| 29 |-0.16 |Yes |
- |--------|------|--------------------|----------|---------|
- |miR-181a|A2AJK6| 36 |0.38 |Yes |
- |--------|------|--------------------|----------|---------|
- |miR-190a|A3KGB4| 14 |0.76 |Yes |
- |--------|------|--------------------|----------|---------|
-
-
+#### MTI Info
+
+The MTI information file is a list of all identified target UniProt Accessions together with the interacting miRNAs and further information that was collected during the process.
+If additional information from the annotation file and/or the miRNA list was specified at the beginning of the process, it is also included in the MTI information file.
+
+Standard columns:
+
+Column | Explanation|
+-------|------------|
+UniProt Accessions|UniProt Accession of the identified miRNA target|
+miRNA Target | Target symbol from the MTI database(s)|
+miRNA | miRNA(s) identified to interact with the target|
+Review status | Review status of UniProtKB entry|
+Organism | The MTI's organism|
+Gene synonyms | Synonyms of the target gene|
+Protein names | Name(s) of the influenced protein|
+EC number | Enzyme Commission number|
+Existence | Evidence for protein existence|
+GO-IDs | Gene Ontology identifier(s)|
+
+#### MTI Overlap HM
+
+Based on the idea that each identified miRNA interacts with a set of target genes, the Heatmap (HM) depicts the ratio of overlapping UniProtAcc targets between each of these MTI sets.
+If the MTI set enrichment analysis was used, the Heatmap output will depict for each MTI set the ratio of overlapping target genes that are part of the leading edge sets of the corresponding MTI sets.
+
+#### MTI Sets ranked
+
+If a ranking file was passed via -ir, a reduced version of the Gene Set Enrichment Analysis tool is started, analysing the enrichment of the identified MTI sets based on the ranked UniProtAccs.
+With a running sum statistic, a weighted Enrichment Score (ES) is calculated for each gene set based on position-dependent gene matches between the ranked list and the set.
+The Leading Edge analysis additionally identifies and analyses the core genes of the gene set which mainly affect the ES.
+The Leading Edge analysis proceeds as follows: depending on whether the ES of an MTI set is positive or negative, the set of Leading Edge targets consists of the MTI set targets either before
+or after the peak in the running sum calculation.
+Based on this, three statistics are calculated: tags is the ratio of leading edge targets to all targets in the given set; list is the ratio of the UniProtAccs in the ranking file either before
+(positive ES) or after (negative ES) the peak to all UniProtAccs in the file; and signal is a combination of the two previous statistics,
+describing the distribution of the MTI set targets over the ranked dataset, and reaches 100% or more if all targets are found at the beginning of the ranked list.
+To take the set sizes into account, MTI set enrichment analysis calculates in the next step the Normalized Enrichment Score (NES) for each gene set by using permutations of the dataset.
+Additionally, the False Discovery Rate (FDR) q-value is calculated, representing the estimated probability of a false positive result for each set with a given NES.
+
+Aside from the ES, NES, FDR q-value and Leading Edge analysis, the file contains the size of each MTI set, which is the number of overlapping UniProtAccs between the MTI set and the ranked list,
+and the index within the ranked gene file at which the running sum statistic calculated the maximal ES.
+
+Example:
+
+MTI Set |Size|ES |NES |FDR q-val|Rank at Max|Leading Edge |
+--------|----|----|----|---------|-----------|------------------------------|
+miR-149 | 6 |0.65|1.55|0.290 | 16 |tags=67%, list=29%, signal=53%|
+miR-301b| 4 |0.61|1.29|0.790 | 1 |tags=25%, list=2%, signal=26% |
+
+
+#### EnrichmentScore Plots
+
+Enrichment plots depict, for each MTI set, the running enrichment score over all UniProtAccs in the ranked dataset (blue line),
+the position of targets of the current MTI set within the ranked list (black dashes) and the maximum ES, either positive or negative (red dash).
+Enrichment plots are created only if an MTI set enrichment analysis was started.
+
+#### MTI Set Genes
+
+The MTI set gene file output of LimiTT is essentially a written version of all enrichment plots and is thus produced only if an enrichment analysis was initiated.
+The file lists, for each MTI set, the targets that overlap with the ranked list of UniProtAccs, the index of each of these targets within the ranked list,
+the running ES for this target and whether it is a member of the leading edge set or not.
+
+Example:
+
+MTI set |Target|Index in Ranked List|Running ES|LE Member|
+--------|------|--------------------|----------|---------|
+miR-149 |Q9WV91| 10 |0.06 |Yes |
+miR-149 |Q80SW1| 13 |0.24 |Yes |
+miR-149 |Q71B07| 15 |0.44 |Yes |
+miR-181a|Q56A04| 29 |-0.16 |Yes |
+miR-181a|A2AJK6| 36 |0.38 |Yes |
+miR-190a|A3KGB4| 14 |0.76 |Yes |
+
 ### References