<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://deep.mpi-inf.mpg.de/DAC/files/style/deep_process_style.css"?>
<process>
<name>SXP</name>
<version>1</version>
<author>
<name>Filippos Klironomos</name>
<email>filippos.klironomos@mdc-berlin.de</email>
</author>
<description>
The miRDeep2 pipeline involves:
*) mapping the reads to the genome and keeping the uniquely mapped ones;
*) extracting the genomic sequence bracketing the uniquely mapped reads;
*) folding the extracted sequences with RNAfold and keeping those that form unbifurcated hairpins;
*) scoring the putative precursors:
   *) a larger number of reads is expected to map to either the 5p or the 3p arm, and very few to the hairpin loop;
   *) the short 3&apos; overhang of the miRNA duplex, characteristic of Drosha/Dicer processing, adds to the score;
   *) the absolute (minimum free energy) and relative (Randfold) stabilities of the hairpin contribute to the score;
   *) a mature sequence whose 5&apos; end is identical to that of a known mature miRNA adds to the score;
*) estimating the false positive rate (FPR) by randomly permuting read signatures with respect to the putative precursor sequences.
Internally, miRDeep2 uses the following packages:
RNAfold version 2.1.7
RANDFOLD version 2
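The hairpin criterion can be illustrated with a short Bash sketch. It assumes the structure is given in the dot-bracket notation printed by RNAfold and treats it as an unbifurcated hairpin when, ignoring unpaired positions, every opening bracket precedes every closing bracket, i.e. a single stem that may still contain bulges or interior loops. The function name is hypothetical and the check is not taken from the miRDeep2 source:
<![CDATA[
```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of miRDeep2.
# Succeeds if a dot-bracket structure (as printed by RNAfold) is a single
# stem-loop: after dropping unpaired positions, all "(" come before all ")".
# Bulges and interior loops are therefore allowed; a multi-branch loop is not.
is_unbifurcated_hairpin() {
  local paired="${1//./}"      # remove unpaired positions (literal dots)
  local re='^[(]+[)]+$'        # one run of "(" followed by one run of ")"
  [[ $paired =~ $re ]]
}

# Usage: a stem with an interior loop passes, two side-by-side stems do not.
is_unbifurcated_hairpin '((((..((((....))))..))))' && echo "hairpin"
is_unbifurcated_hairpin '((((...))))((((...))))' || echo "bifurcated"
```
]]>
Stripping the dots first keeps the test independent of loop sizes; balance of the brackets is taken for granted because RNAfold always emits a balanced structure.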
</description>
<inputs>
<filetype>
<identifier>config</identifier>
<format>TSV</format>
<quantity>single</quantity>
<comment>
configuration file that miRDeep2 uses to locate the FASTQ library and to assign it a 3-character identifier
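Each line of this file pairs the path of a FASTQ library with its identifier, separated by a tab. A minimal Bash sketch of a validator follows; the three-character rule comes from the text above, while restricting identifiers to letters and digits and requiring them to be unique are assumptions. check_mirdeep2_config is a hypothetical name:
<![CDATA[
```shell
#!/usr/bin/env bash
# Hypothetical validator, not part of miRDeep2. Assumes one library per line,
# written as path, tab, identifier (as the generate_config tool does), and that
# identifiers are exactly three letters or digits and unique within the file.
check_mirdeep2_config() {
  awk -F'\t' '
    NF != 2 { print "line " NR ": expected 2 tab-separated fields"; bad = 1; next }
    $2 !~ /^[A-Za-z0-9][A-Za-z0-9][A-Za-z0-9]$/ { print "line " NR ": identifier must be 3 letters or digits"; bad = 1 }
    seen[$2]++ { print "line " NR ": duplicate identifier " $2; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Usage: write a one-library config the same way generate_config does, then check it.
cfg=$(mktemp)
printf '%s\t%s\n' sample.fastq ID1 > "$cfg"
check_mirdeep2_config "$cfg" && echo "config OK"
```
]]>
The identifier pattern is spelled out as three bracket expressions instead of an interval so that it also works with awk implementations lacking interval support.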
</comment>
</filetype>
</inputs>
<references>
<filetype>
<identifier>genome</identifier>
<format>fasta</format>
<quantity>single</quantity>
<comment>
The hs37d5 and GRCm38mm10 genomes are modified as follows:
*) IDs are simplified: everything from the first whitespace character onward is removed;
*) all nucleotide codes other than A, C, G, T and N, i.e. the IUPAC ambiguity codes [RYSWKMBDHV] and U, in upper or lower case, are masked to &quot;N&quot;.
The following commands perform both modifications:
<![CDATA[
sed -e 's/^>\(\S\+\)\s.*$/>\1/' -e '/^[^>]/s/[UuRrYySsWwKkMmBbDdHhVv]/N/g' hs37d5.fa > hs37d5_simple.fa
sed -e 's/^>\(\S\+\)\s.*$/>\1/' -e '/^[^>]/s/[UuRrYySsWwKkMmBbDdHhVv]/N/g' GRCm38mm10.fa > GRCm38mm10_simple.fa
]]>
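The same two sed expressions can be wrapped in a function and paired with a quick check that the result really contains only simple IDs and A, C, G, T or N in sequence lines. A minimal sketch, assuming GNU sed (needed for \S, \s and \+) and any POSIX awk; both function names are hypothetical:
<![CDATA[
```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of miRDeep2 or of this pipeline.

# The two sed expressions from above, reading FASTA on stdin (GNU sed).
simplify_fasta() {
  sed -e 's/^>\(\S\+\)\s.*$/>\1/' \
      -e '/^[^>]/s/[UuRrYySsWwKkMmBbDdHhVv]/N/g'
}

# Succeeds only if no header line contains a space or tab and every sequence
# line holds nothing but A, C, G, T or N (either case); reports offending lines.
check_simplified_fasta() {
  awk '
    /^>/ { if ($0 ~ /[ \t]/) { print "header with whitespace: line " NR; bad = 1 }; next }
    /[^ACGTNacgtn]/ { print "unexpected nucleotide code: line " NR; bad = 1 }
    END { exit bad }
  ' "$@"
}

# Toy example: a header with a description and a sequence line with R, Y and u.
printf '>chr1 toy description\nACGTRYacgtuN\n' | simplify_fasta
# prints ">chr1" and "ACGTNNacgtNN"
printf '>chr1 toy description\nACGTRYacgtuN\n' | simplify_fasta | check_simplified_fasta && echo "OK"
```
]]>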
</comment>
</filetype>
<filetype>
<identifier>genome_index</identifier>
<format>bowtie-index</format>
<quantity>collection</quantity>
<comment>
bowtie version 0.12.7 indexes of hs37d5_simple.fa and GRCm38mm10_simple.fa (the index base name is the FASTA file name itself), generated as follows:
bowtie-build -f hs37d5_simple.fa hs37d5_simple.fa
bowtie-build -f GRCm38mm10_simple.fa GRCm38mm10_simple.fa
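Before pointing mapper.pl -p at an index, it can be worth confirming that bowtie-build finished: bowtie 1 writes six files per base name, with the suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt. A minimal Bash sketch; check_bowtie_index is a hypothetical name:
<![CDATA[
```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if all six files of a bowtie 1 index
# exist and are non-empty for the given base name, e.g. hs37d5_simple.fa.
check_bowtie_index() {
  local base=$1 suffix missing=0
  for suffix in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
    if [[ ! -s "$base.$suffix" ]]; then
      echo "missing or empty: $base.$suffix"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (after bowtie-build has finished):
# check_bowtie_index hs37d5_simple.fa && echo "index complete"
```
]]>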
</comment>
</filetype>
<filetype>
<identifier>miRBase_mature</identifier>
<format>fasta</format>
<quantity>single</quantity>
<comment>known mature miRNA sequences from miRBase Release 20, uploaded to ASPERA</comment>
</filetype>
<filetype>
<identifier>miRBase_hairpin</identifier>
<format>fasta</format>
<quantity>single</quantity>
<comment>known miRNA precursor (hairpin) sequences from miRBase Release 20, uploaded to ASPERA</comment>
</filetype>
</references>
<outputs>
<filetype>
<identifier>SampleID.SXPv1.DATE.known.csv</identifier>
<format>csv</format>
<quantity>single</quantity>
<comment>
expression of known miRNAs quantified by miRDeep2
</comment>
</filetype>
<filetype>
<identifier>SampleID.SXPv1.DATE.known.bed</identifier>
<format>bed</format>
<quantity>single</quantity>
<comment>
BED track of expression of known miRNAs quantified by miRDeep2
</comment>
</filetype>
<filetype>
<identifier>SampleID.SXPv1.DATE.known.bedGraph</identifier>
<format>bedGraph</format>
<quantity>single</quantity>
<comment>
bedGraph track of expression of known miRNAs quantified by miRDeep2
</comment>
</filetype>
<filetype>
<identifier>SampleID.SXPv1.DATE.novel.bed</identifier>
<format>bed</format>
<quantity>single</quantity>
<comment>
BED track of expression of novel miRNAs predicted by miRDeep2
</comment>
</filetype>
<filetype>
<identifier>SampleID.SXPv1.DATE.novel.bedGraph</identifier>
<format>bedGraph</format>
<quantity>single</quantity>
<comment>
bedGraph track of expression of novel miRNAs predicted by miRDeep2
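The bedGraph tracks are derived from the corresponding BED tracks by the bed_to_bedGraph tool, which keeps columns 1 to 3 and the score in column 5. A variant that recognises header lines by their content (track, browser or comment lines) instead of by line number, so that it does not depend on how many header lines a BED file carries, could look like the following minimal sketch; bed_to_bedgraph is a hypothetical name and its output is tab-separated:
<![CDATA[
```shell
#!/usr/bin/env bash
# Hypothetical helper, not one of the pipeline tools: read a BED track on stdin
# and write a tab-separated bedGraph track (chrom, start, end, BED score) to stdout.
# Header lines are recognised by content (track, browser, #) instead of by line number.
bed_to_bedgraph() {
  awk -v desc="$1" '
    BEGIN {
      OFS = "\t"
      print "track type=bedGraph description=\"" desc "\" visibility=2 color=0,0,255 altColor=255,0,0"
    }
    $1 == "track" || $1 == "browser" || /^#/ || NF == 0 { next }
    { print $1, $2, $3, $5 }
  '
}

# Usage:
# bed_to_bedgraph "miRDeep2 novel miRNAs" < "{SampleID}.SXPv1.{DATE}.novel.bed" > "{SampleID}.SXPv1.{DATE}.novel.bedGraph"
```
]]>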
</comment>
</filetype>
</outputs>
<software>
<tool>
<name>generate_config</name>
<version>missing</version>
<command_line>
<![CDATA[ echo -ne "{SampleID.fastq}\tID1\n" > config ]]>
</command_line>
<loop>no looping</loop>
<comment>
create the configuration file that miRDeep2 uses to locate the FASTQ library {SampleID.fastq} and to assign it
the 3-character internal ID &quot;ID1&quot;
</comment>
</tool>
<tool>
<name>mapper.pl</name>
<version>miRDeep2.0.0.6</version>
<command_line>
<![CDATA[ mapper.pl config -d -e -h -j -k {Adaptor} -l 18 -m -p {genome_index} -s reads_collapsed.fa -t reads_vs_genome.arf -v -o 12 &> mapper_summary.log ]]>
</command_line>
<loop>no looping</loop>
<comment>
use the configuration file to locate the FASTQ library, convert it to FASTA and discard reads containing characters other than A, C, G, T, U and N;
clip the 3&apos; adaptor given by {Adaptor} and discard reads shorter than 18 nt;
collapse identical reads into the file &quot;reads_collapsed.fa&quot;;
map the reads to {genome_index} with bowtie (12 threads) and write the alignments to the file &quot;reads_vs_genome.arf&quot;;
write the progress report and mapping summary to &quot;mapper_summary.log&quot;

The ARF is a tab-separated text format with one alignment per line and the following columns:

readID        # ID of the read
readLength    # length of the read
readStart     # start position of the alignment in the read
readEnd       # end position of the alignment in the read
readSeq       # sequence of the read
chr           # ID of the reference sequence (chromosome) the read maps to
refLength     # length of the aligned region of the reference
refStart      # start position of the alignment in the reference
refEnd        # end position of the alignment in the reference
refSeq        # reference sequence of the aligned region
strand        # strand of the reference the read maps to
mm            # number of mismatches in the alignment
editString    # one character per aligned position: m = match, M = mismatch
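Because the ARF file is plain tab-separated text with one alignment per line, it can be summarised with standard tools. A minimal Bash and awk sketch (arf_summary is a hypothetical name) that counts, per reference sequence and strand, all alignments and those without mismatches; a read mapped to several loci is counted once per locus:
<![CDATA[
```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of miRDeep2: summarise an ARF file
# (13 tab-separated columns, listed above) per reference sequence and strand.
# Output columns: reference, strand, alignments, alignments without mismatches.
arf_summary() {
  awk -F'\t' '
    NF >= 13 {
      key = $6 "\t" $11             # reference ID and strand columns
      total[key]++
      if ($12 == 0) perfect[key]++   # mm column
    }
    END { for (k in total) print k "\t" total[k] "\t" (perfect[k] + 0) }
  ' "$@" | LC_ALL=C sort
}

# Usage:
# arf_summary reads_vs_genome.arf
```
]]>
Sorting with LC_ALL=C makes the output order independent of the awk array traversal order and of the locale.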
</comment>
</tool>
<tool>
<name>miRDeep2</name>
<version>miRDeep2.0.0.6</version>
<command_line>
<![CDATA[ miRDeep2.pl reads_collapsed.fa {genome} reads_vs_genome.arf {miRBase_mature} none {miRBase_hairpin} -t {Species} -P 2> miRDeep2.report.log ]]>
</command_line>
<loop>no looping</loop>
<comment>
quantify known miRNAs and predict putative novel miRNAs across samples.
The positional arguments are the collapsed reads, the genome, the read alignments (ARF), the known mature miRNAs {miRBase_mature},
the mature miRNAs of related species (&quot;none&quot; here) and the known precursors {miRBase_hairpin};
-t {Species} sets the species and -P indicates that the mature miRNA identifiers carry 5p/3p suffixes (miRBase v18 and later).
Standard error is written to miRDeep2.report.log.
</comment>
</tool>
<tool>
<name>rename_according_to_metadata_standards</name>
<version>missing</version>
<command_line>
<![CDATA[ cp miRNAs_expressed_all_samples_DATE_t_TIME.csv {SampleID}.SXPv1.{DATE}.known.csv ]]>
</command_line>
<loop>no looping</loop>
<comment>copy the known-miRNA expression table written by miRDeep2 to a file name that conforms to the metadata naming standards</comment>
</tool>
<tool>
<name>mirdeep2_csv2bed.pl</name>
<version>missing</version>
<command_line>
<![CDATA[
mirdeep2_csv2bed.pl -r result_DATE_t_TIME.csv -p -T {SampleID}
cp known_pres_DATE_t_TIME_score-50_to_na.bed {SampleID}.SXPv1.{DATE}.known.bed
echo "track name=\"{SampleID}.novel_miRNAs\" description=\"novel miRNAs detected by miRDeep2 for {SampleID}\" visibility=2 itemRgb=\"On\"" > "{SampleID}.SXPv1.{DATE}.novel.bed"
cat "novel_pres_DATE_t_TIME_score-50_to_na.bed" >> "{SampleID}.SXPv1.{DATE}.novel.bed"
]]>
</command_line>
<loop>no looping</loop>
<comment>
Generate BED tracks from the total precursor read counts of known and novel miRNAs and rename them according to metadata standards.
This tool has been uploaded to ASPERA.
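The DATE_t_TIME parts of the file names above stand for the time stamp that miRDeep2 puts into its output file names. When a working directory may contain several runs, a small helper can resolve such a name to the most recently modified matching file. A minimal Bash sketch; newest_file is a hypothetical name:
<![CDATA[
```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified file among its
# arguments (typically an expanded glob); fail if none of them exists.
newest_file() {
  local f newest=""
  for f in "$@"; do
    [[ -e $f ]] || continue                 # skip an unmatched glob pattern
    if [[ -z $newest || $f -nt $newest ]]; then newest=$f; fi
  done
  [[ -n $newest ]] && printf '%s\n' "$newest"
}

# Usage: resolve result_DATE_t_TIME.csv before calling mirdeep2_csv2bed.pl.
# result_csv=$(newest_file result_*_t_*.csv) || exit 1
# mirdeep2_csv2bed.pl -r "$result_csv" -p -T {SampleID}
```
]]>
Comparing modification times with -nt avoids parsing the output of ls and works for any file name.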
</comment>
</tool>
<tool>
<name>bed_to_bedGraph</name>
<version>missing</version>
<command_line>
<![CDATA[
gawk 'NR==3 {print "track type=bedGraph description=\"miRDeep2 known miRNAs\" visibility=2 color=0,0,255 altColor=255,0,0" > FILENAME"Graph"; print $1,$2,$3,$5 >> FILENAME"Graph"} NR>3 {print $1,$2,$3,$5 >> FILENAME"Graph"}' "{SampleID}.SXPv1.{DATE}.known.bed"
gawk 'NR==1 {print "track type=bedGraph description=\"miRDeep2 novel miRNAs\" visibility=2 color=0,0,255 altColor=255,0,0" > FILENAME"Graph"} NR>1 {print $1,$2,$3,$5 >> FILENAME"Graph"}' "{SampleID}.SXPv1.{DATE}.novel.bed"
]]>
</command_line>
<loop>no looping</loop>
<comment>
convert the BED tracks to bedGraph: write a bedGraph track line, then keep the chromosome, start, end and score (column 5) of each data line.
The commands assume that the known BED track begins with two header lines and that the novel BED track begins with the single track line written by the echo command of the previous tool.
</comment>
</tool>
</software>
</process>