<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://deep.mpi-inf.mpg.de/DAC/files/style/deep_process_style.css"?>
<process>
<name>SAL</name>
<version>2</version>
<author>
<name>Filippos Klironomos</name>
<email>filippos.klironomos@mdc-berlin.de</email>
</author>
<description>
1) remove adaptors and map to reference without filtering or collapsing the reads
2) generate coverage track from the aligned reads
3) cluster reads that overlap 501-nt windows around each TSS/TES
4) predict TSS/TES-miRNAs by applying the following filters in turn:
- pick sharply defined, well-covered clusters and identify the peak of each cluster
- keep only peaks consisting of reads with an identical 5&apos; start position
- of these, keep only peaks whose average read length is between 18 and 24 nt
- of these, keep only peaks whose average mean phastCons score is 0.8 or above
5) generate coverage tracks for TSS/TES-miRNAs
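<![CDATA[
The last three filters of step 4 can be sketched in Python 3 as below. This is a minimal illustration under stated assumptions, not the pipeline's actual code (which used custom Python 2.7 scripts): the Read record, its field names and passes_filters are hypothetical; the peak is assumed to be supplied as the list of reads that form it; the 18 and 24 nt bounds are assumed inclusive; and the phastCons criterion is read as the average of per-read mean scores. Cluster selection and peak calling (the first filter) are not reproduced.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Read:
    """Hypothetical per-read record for the reads that make up a peak."""
    start5: int            # strand-aware 5' start position
    length: int            # read length in nt
    phastcons_mean: float  # mean phastCons score over the read


def passes_filters(peak: List[Read], min_len: int = 18, max_len: int = 24,
                   min_cons: float = 0.8) -> bool:
    """Return True if a peak passes the 5'-start, length and conservation filters."""
    if not peak:
        return False
    # All reads in the peak must share one 5' start position.
    if len({r.start5 for r in peak}) != 1:
        return False
    # Average read length must fall within [min_len, max_len] (bounds assumed inclusive).
    if not min_len <= mean(r.length for r in peak) <= max_len:
        return False
    # Average of the per-read mean phastCons scores must reach min_cons.
    return mean(r.phastcons_mean for r in peak) >= min_cons
```

For example, a peak of two reads starting at the same position with lengths 22 and 21 nt and mean phastCons scores 0.9 and 0.85 passes, while the same peak with one read shifted by a single nucleotide does not.
]]>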
</description>
<inputs>
<filetype>
<identifier>library</identifier>
<format>FASTQ</format>
<quantity>single</quantity>
<comment>{SampleID}.fastq library of raw reads to trim and map to the reference</comment>
</filetype>
</inputs>
<references>
<filetype>
<identifier>genome</identifier>
<format>fasta</format>
<quantity>single</quantity>
<comment><![CDATA[
hs37d5 and GRCm38mm10 genomes are modified as follows:
*) sequence IDs are simplified: everything from the first whitespace onward is removed from each header line;
*) all ambiguously called nucleotides [URYSWKMBDHV], in upper or lower case, are masked to 'N'.
The following sed commands perform both modifications:
sed -e 's/^>\(\S\+\)\s.*$/>\1/' -e '/^[^>]/s/[UuRrYySsWwKkMmBbDdHhVv]/N/g' hs37d5.fa > hs37d5_simple.fa
sed -e 's/^>\(\S\+\)\s.*$/>\1/' -e '/^[^>]/s/[UuRrYySsWwKkMmBbDdHhVv]/N/g' GRCm38mm10.fa > GRCm38mm10_simple.fa
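The \S, \s and \+ escapes above are GNU sed extensions. Where GNU sed is unavailable, the same two edits can be sketched in Python 3; simplify_line and simplify_fasta are hypothetical helper names, not part of the pipeline. As with the sed commands, header lines keep only the ID up to the first whitespace, and on sequence lines the listed ambiguity codes are replaced by an uppercase N in either case, while A, C, G, T and N (including soft-masked lowercase bases) are left untouched.

```python
import re

# IUPAC ambiguity codes masked by the sed command (N itself is excluded).
AMBIGUOUS = re.compile(r"[UuRrYySsWwKkMmBbDdHhVv]")
# Header line: keep '>' plus everything up to the first whitespace.
HEADER = re.compile(r"^>(\S+)\s.*$")


def simplify_line(line: str) -> str:
    """Apply the two sed substitutions to a single FASTA line."""
    line = line.rstrip("\n")
    if line.startswith(">"):
        return HEADER.sub(r">\1", line)
    return AMBIGUOUS.sub("N", line)


def simplify_fasta(in_path: str, out_path: str) -> None:
    """Stream in_path line by line and write the simplified FASTA to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(simplify_line(line) + "\n")
```

Usage would be, for example, simplify_fasta("hs37d5.fa", "hs37d5_simple.fa"); streaming line by line avoids loading a whole genome into memory.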
]]>
</comment>
</filetype>
<filetype>
<identifier>genome_index</identifier>
<format>bowtie-index</format>
<quantity>collection</quantity>
<comment><![CDATA[
bowtie version 1.1.1 index of hs37d5_simple.fa and GRCm38mm10_simple.fa generated as follows:
bowtie-build -f hs37d5_simple.fa hs37d5_simple.fa
bowtie-build -f GRCm38mm10_simple.fa GRCm38mm10_simple.fa
]]>
</comment>
</filetype>
</references>
<outputs>
<filetype>
<identifier>SampleID.SALv2.DATE.trimmed.bam</identifier>
<format>BAM</format>
<quantity>single</quantity>
<comment>adaptor-trimmed reads mapped to the reference without any filtering or collapsing</comment>
</filetype>
<filetype>
<identifier>SampleID.SALv2.DATE.trimmed.bedGraph</identifier>
<format>bedGraph</format>
<quantity>single</quantity>
<comment>reference genome coverage track of aligned adaptor-trimmed reads</comment>
</filetype>
<filetype>
<identifier>SampleID.SALv2.DATE.TSS.{sense,antisense}.tsv</identifier>
<format>TSV</format>
<quantity>single</quantity>
<comment>Coverage track for clustered reads mapped sense/antisense to TSS regions. Each cluster with a unique clusterId represents a TSS-miRNA prediction.
The format is BED-like (0-based, end-exclusive) and the columns are:
chr start end readId score strand clusterId min_phastCons_score max_phastCons_score mean_phastCons_score median_phastCons_score
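<![CDATA[
A minimal Python 3 sketch for reading one line of these files, assuming tab-separated fields in the column order listed above, no header line, and numeric phastCons columns; COLUMNS, parse_row and to_one_based are hypothetical names. The score column is kept as a string because its type is not specified. to_one_based shows how the 0-based, end-exclusive interval maps to 1-based, inclusive coordinates (as used in SAM).

```python
from typing import Dict, Tuple, Union

COLUMNS = ["chr", "start", "end", "readId", "score", "strand", "clusterId",
           "min_phastCons_score", "max_phastCons_score",
           "mean_phastCons_score", "median_phastCons_score"]
INT_COLS = {"start", "end"}
FLOAT_COLS = {c for c in COLUMNS if c.endswith("_phastCons_score")}

Row = Dict[str, Union[str, int, float]]


def parse_row(line: str) -> Row:
    """Split one tab-separated line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    row: Row = {}
    for name, value in zip(COLUMNS, fields):
        if name in INT_COLS:
            row[name] = int(value)
        elif name in FLOAT_COLS:
            row[name] = float(value)
        else:
            row[name] = value  # chr, readId, score, strand, clusterId kept as text
    return row


def to_one_based(row: Row) -> Tuple[int, int]:
    """Convert the 0-based, end-exclusive [start, end) to 1-based, inclusive."""
    return row["start"] + 1, row["end"]
```

Because the intervals are end-exclusive, the read length is simply end - start.
]]>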
</comment>
</filetype>
<filetype>
<identifier>SampleID.SALv2.DATE.TSS.{sense,antisense}.summary.tsv</identifier>
<format>TSV</format>
<quantity>single</quantity>
<comment>Summary of called peaks (clusters).
The format is BED-like (0-based, end-exclusive) and the columns are:
chr start end strand clusterId coverage geneId,symbol consensus_sequence
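<![CDATA[
A similar hedged sketch for the summary files, assuming tab-separated fields and that the geneId,symbol column holds both values in a single field separated by a comma; PeakSummary and parse_summary are hypothetical names, and coverage is parsed as a float because its unit is not stated.

```python
from typing import NamedTuple


class PeakSummary(NamedTuple):
    chrom: str
    start: int   # 0-based
    end: int     # end-exclusive
    strand: str
    cluster_id: str
    coverage: float
    gene_id: str
    symbol: str
    consensus_sequence: str


def parse_summary(line: str) -> PeakSummary:
    """Split one tab-separated summary line into typed fields."""
    chrom, start, end, strand, cluster_id, coverage, gene, seq = (
        line.rstrip("\n").split("\t")
    )
    # Assumed layout of the combined column: geneId and symbol joined by a comma.
    gene_id, _, symbol = gene.partition(",")
    return PeakSummary(chrom, int(start), int(end), strand, cluster_id,
                       float(coverage), gene_id, symbol, seq)
```
]]>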
</comment>
</filetype>
<filetype>
<identifier>SampleID.SALv2.DATE.TSS.{sense,antisense}.bed</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>Simplified 6-column BED version of the corresponding TSV file, with readIds removed.</comment>
</filetype>
<filetype>
<identifier>SampleID.SALv2.DATE.TES.{sense,antisense}.tsv</identifier>
<format>TSV</format>
<quantity>single</quantity>
<comment>Coverage track for clustered reads mapped sense/antisense to TES regions. Each cluster with a unique clusterId represents a TES-miRNA prediction.
The format is BED-like (0-based, end-exclusive) and the columns are:
chr start end readId score strand clusterId min_phastCons_score max_phastCons_score mean_phastCons_score median_phastCons_score
</comment>
</filetype>
<filetype>
<identifier>SampleID.SALv2.DATE.TES.{sense,antisense}.summary.tsv</identifier>
<format>TSV</format>
<quantity>single</quantity>
<comment>Summary of called peaks (clusters).
The format is BED-like (0-based, end-exclusive) and the columns are:
chr start end strand clusterId coverage geneId,symbol consensus_sequence
</comment>
</filetype>
<filetype>
<identifier>SampleID.SALv2.DATE.TES.{sense,antisense}.bed</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>Simplified 6-column BED version of the corresponding TSV file, with readIds removed.</comment>
</filetype>
</outputs>
<software>
<tool>
<name>standard</name>
<version>n/a</version>
<command_line><![CDATA[ CMDLINE ]]></command_line>
<loop>n/a</loop>
<comment>The following software tools are used in the pipeline:
flexbar version 2.4
bowtie version 1.1.1
samtools version 1.1
bedtools version 2.23.0
R version 3.2.0
Bioconductor version 3.1 (BiocInstaller 1.18.1)
bwtool version 1.0
custom Python 2.7 scripts
gawk version 4.0.1
</comment>
</tool>
</software>
</process>