Updated script. Now the path of subscripts is automatically acquired. … #33
…Fixed error which wrote output to the path of subscripts instead of the working directory
See comment.
Maybe you could check whether there are any new/unknown motifs and, if not, throw an error message.
I get the following:
The // paths work, but this should be changed.
I completely agree with @renewiegandt. Right now it is really hard for someone else to understand what the code is doing, so please add a lot more comments!
Please refer to our guidelines while doing so.
Anastasiia does have a new BED format with 9 columns. The data in /mnt/agnerds/masterjlu2018/out5/footprint_extraction/buenrostro50k_chr1_fp_called_peaks.bed contain only 8 columns. The strand column after score is missing. That is the first thing I notice.
Edit: I think this version still works with 8 columns, though. So it is a different error.
Please write your comments in English, thanks.
The path /mnt/workspace/rene.wiegandt/tmp6/ does not exist.
The path exists; it is /mnt/workspace1/rene.wiegandt/tmp6/. You can't reach it from your VM.
I have just made pull request #38 with my script, which now always produces the same number of columns: if I receive no information from the original input bed file, there will be a "." in the last column. @JannikHamp, please make sure that your script is working now.
You should also add documentation to the two subscripts. I'm not sure why you are writing different files depending on the help parameter?
The help variable is only introduced at that point and has nothing to do with the help option of the command compareBed.sh. pass1TrHelp.bed now contains the information from all comparisons.
Ah, then I got it mixed up. Maybe you could rename the variable?
I tried it on the full mDux data. I still got the same error.
-d /mnt/agnerdsJannik.Hamp/call_peaks_output_to_check.bed, but I have now corrected the script for 9-column bed files. It will not work with the 8-column file /mnt/agnerdsJannik.Hamp/call_peaks_output_to_check.bed.
If an error occurs, your script prints an error message to stdout but does not stop.
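One way to address this, as a minimal sketch: send the message to stderr and leave with a non-zero exit status. The helper names `die` and `require_file` are hypothetical and not part of compareBed.sh; the demo at the end only exists to show the resulting exit codes.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of compareBed.sh): print a message to stderr
# and stop the script with a non-zero exit status.
die() {
    echo "ERROR: $*" >&2
    exit 1
}

# Hypothetical check: abort when a required input file is missing.
require_file() {
    [ -f "$1" ] || die "file $1 does not exist"
}

# Demo: each check runs in a subshell, so the exit status can be inspected
# without terminating the calling shell.
demo_file=$(mktemp)
( require_file "$demo_file" ) && status_existing=0 || status_existing=$?
( require_file "$demo_file.missing" ) 2>/dev/null && status_missing=0 || status_missing=$?
rm -f "$demo_file"
echo "existing file: $status_existing, missing file: $status_missing"
```

In the script itself the check would be called directly, without the subshell, so that a missing file really ends the run.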
Overall try to be more consistent with your spacing. And please check your code for typos.
bin/1.2_filter_motifs/compareBed.sh
Outdated
#output

# This script utilizes bedtools to gain non-overlapping sequence parts between bed-files
# merge.R and maxScore.R are needed to be saved in the same directory than this to make it work |
...same directory as ...
bin/1.2_filter_motifs/compareBed.sh
Outdated
@@ -87,6 +89,7 @@ case $key in
esac
done

# stores unknown selected parameters for error report |
It would be nicer to write something like "check for unknown arguments".
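A minimal sketch of such a check, assuming a `case`-based parser like the one in this hunk. The option names `--min` and `--max` and the function `parse_args` are hypothetical; the real options of compareBed.sh may differ.

```shell
#!/usr/bin/env bash

# Hypothetical option names (--min, --max); the real options may differ.
parse_args() {
    min=""
    max=""
    unknown=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --min|--max)
                # each known option expects a value
                if [ $# -lt 2 ]; then
                    echo "ERROR: option $1 needs a value" >&2
                    return 1
                fi
                if [ "$1" = "--min" ]; then min="$2"; else max="$2"; fi
                shift 2
                ;;
            *)
                # check for unknown arguments: collect them for the error report
                unknown="$unknown $1"
                shift
                ;;
        esac
    done
}

parse_args --min 10 --foo --max 80 bar
if [ -n "$unknown" ]; then
    # a real script would exit 1 after this message
    echo "ERROR: unknown arguments:$unknown" >&2
fi
```

Collecting all unknown arguments first means the user sees every mistake in one error report instead of one per run.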
bin/1.2_filter_motifs/compareBed.sh
Outdated
@@ -99,6 +102,7 @@ then
exit 1 |
missing indentation
bin/1.2_filter_motifs/compareBed.sh
Outdated
@@ -99,6 +102,7 @@ then
exit 1
fi

# the help message
if [ $he == true ]
then
echo "This script utilies bedtools to select new footprints from data." |
utilizes
bin/1.2_filter_motifs/merge.R
Outdated
- colnames(splitted) = c("chromosome", "start", "stop", "id", "score", "length", "maxpos", "info")
+ colnames(splitted) = c("chromosome", "start", "stop", "id", "score", "strand", "length", "maxpos", "info")

# reading the second dataframe: called p1 (all sequences with zero overlap)
p1 = fread(paste(folder, "/pass1Tr.bed", sep='')) |
paste -> file.path
bin/1.2_filter_motifs/merge.R
Outdated
p1 = fread(paste(folder, "/pass1Tr.bed", sep=''))
- colnames(p1) = c("chromosome", "start", "stop", "id", "score", "length", "maxpos", "info")
+ colnames(p1) = c("chromosome", "start", "stop", "id", "score", "strand", "length", "maxpos", "info")
Not robust.
bin/1.2_filter_motifs/merge.R
Outdated
- colnames(splitted) = c("chromosome", "start", "stop", "id", "score", "length", "maxpos", "info")
+ colnames(splitted) = c("chromosome", "start", "stop", "id", "score", "strand", "length", "maxpos", "info")

# reading the second dataframe: called p1 (all sequences with zero overlap) |
Still a data.table.
bin/1.2_filter_motifs/merge.R
Outdated
splitted=splitted[which(splitted$stop - splitted$start >= min),]
splitted=splitted[which(splitted$stop - splitted$start <= max),]

# make the ids unique (because of duplicated ids of some footprints that got spliited in 2)
splitted$id=make.unique(as.character(splitted$id)) |
make.unique only appends numbers to duplicates leaving the first (of the duplicates) as is but we want every duplicate to get a number. (I think that already was discussed at some point.)
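To illustrate the numbering we want, here is a rough sketch in shell with awk rather than R, so it is not a drop-in change for merge.R. It assumes a tab-separated file with the id in column 4 (as in the column names above); the function name `number_duplicate_ids` and the `_1`, `_2` suffix format are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: give every occurrence of a duplicated id (column 4)
# a running number, including the first one. The file is read twice.
number_duplicate_ids() {
    awk 'BEGIN { FS = OFS = "\t" }
         # first pass: count how often each id occurs
         NR == FNR { total[$4]++; next }
         # second pass: suffix every occurrence of an id that is not unique
         total[$4] > 1 { $4 = $4 "_" (++seen[$4]) }
         { print }' "$1" "$1"
}

# Demo with three footprints, two of which share the id fp1.
demo_bed=$(mktemp)
printf 'chr1\t10\t20\tfp1\nchr1\t30\t40\tfp1\nchr1\t50\t60\tfp2\n' > "$demo_bed"
numbered_ids=$(number_duplicate_ids "$demo_bed" | cut -f4)
rm -f "$demo_bed"
echo "$numbered_ids"
```

The demo prints fp1_1, fp1_2 and fp2, whereas make.unique would leave the first fp1 unchanged.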
bin/1.2_filter_motifs/merge.R
Outdated
splitted=cbind(splitted, containsMaxpos=0)
splitted$containsMaxpos[intersect(which(splitted$start <= splitted$maxpos), which(splitted$stop > splitted$maxpos))] = 1

#calculate relative maxpos values
splitted$maxpos = splitted$maxpos - splitted$start
data.table::fwrite(splitted, paste(folder, "/merged.bed", sep=''), row.names=FALSE, col.names=FALSE, quote=FALSE, sep='\t') |
If you call library(data.table), the data.table:: prefix is not needed. Also use file.path instead of paste.
I get the following error with the mDux data:
No idea where the 10th column comes from, but this proves that you need to make the naming more robust. You should just use the column names from the bed file you get as input.
@JannikHamp I committed a few fixes to your branch. Please pull before doing further changes!
Few typos, wording.
R: Please use '<-' instead of '=' to assign a variable. In rare cases '=' can lead to errors in the script.
You should also add more comments to the statistics.
For each line what information is generated.
For example:
sum_data = sum(data[[3]]-data[[2]])
No idea what information is stored in sum_data.
bin/1.2_filter_motifs/compareBed.sh
Outdated
path=`echo $0 | sed 's/\/[^\/]*$/\//g'`
help=false

# display help when no parameters chosen |
.. no parameters are given
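On the `path=` line in the same hunk: the sed expression leaves `$0` unchanged when it contains no slash (for example when the script is started as `bash compareBed.sh`), so `$path/merge.R` would then point to the wrong place. A small sketch using `dirname` instead; the function name `script_dir` is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: absolute directory of the script path given as $1
# (normally "$0"). The subshell keeps the caller's working directory unchanged.
script_dir() {
    ( cd "$(dirname "$1")" && pwd -P )
}

# dirname also copes with a bare file name, which contains no slash.
bare_name_dir=$(dirname "compareBed.sh")
nested_dir=$(dirname "bin/1.2_filter_motifs/compareBed.sh")
echo "$bare_name_dir $nested_dir"

# In the script itself this would be: path=$(script_dir "$0")
```

Because the result is an absolute path, the subscripts are still found after the script changes into the working directory.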
bin/1.2_filter_motifs/compareBed.sh
Outdated
fi

if [ $ma == false ]
# motiffiles either from a directory OR comma separated list |
get motif files either...
bin/1.2_filter_motifs/compareBed.sh
Outdated
ma=true
fi
# creates an array of all files with bed in its name in the directory $motifs
declare -a motiffiles=(`ls $motifs | grep bed | sed "s|^|$motifs\/|g" | tr '\n' ' ' | sed "s|//|/|g"`) |
Why don't you check for the file ending? I would use: ' grep "*.bed" '
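Note that grep treats its pattern as a regular expression, so `*.bed` would not act as a wildcard there the way it does in the shell; an anchored regex such as `\.bed$` or a plain shell glob would check the file ending. A sketch with a glob, which also avoids parsing `ls` output and the `//` in the paths; the function name `list_bed_files` is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every regular file in directory $1 whose name
# ends in .bed, one per line. ${1%/} drops one trailing slash, so the
# resulting paths contain no "//".
list_bed_files() {
    for f in "${1%/}"/*.bed; do
        # if nothing matches, the glob stays literal and this test skips it
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Demo: only a.bed and b.bed are listed; bed_notes.txt merely contains "bed".
demo_dir=$(mktemp -d)
touch "$demo_dir/a.bed" "$demo_dir/b.bed" "$demo_dir/bed_notes.txt"
bed_files=$(list_bed_files "$demo_dir/")
rm -rf "$demo_dir"
echo "$bed_files"
```

Printing one path per line keeps the sketch short; file names containing newlines would need a different approach, such as filling a bash array directly from the glob.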
bin/1.2_filter_motifs/compareBed.sh
Outdated
if [ ! -d $workdir ]
then
mkdir $workdir
# the else case means, that the motiffiles were passed comma separated with no whitespace. |
motif files
bin/1.2_filter_motifs/compareBed.sh
Outdated
for i in ${motiffiles[@]}
do
if [ $help == true ]
# remove trailing tabs in motiffile |
motif file
Rscript --vanilla $path/merge.R $min $max $workdir $data
# check if header existed. If so, final output also has a header. |
... exists.
stop("footprint file has less than 9 columns. exiting.")
}

# remove sequences that are smaller than minimum (parameter) |
(parameter: min)
(parameter: max)
bin/1.2_filter_motifs/compareBed.sh
Outdated
else
help=true
bedtools intersect -v -a "$workdir"/pass1TrHelp.bed -b $i > "$workdir"/pass1Tr.bed
echo file $i does not exist |
I would write ERROR: or Error: in front of the error message. So the user can identify the error quicker.
Wow this finally looks good!