updated script. Now the path of subscripts is automatically acquired. … #33

JannikHamp · 2019-01-04T11:21:55Z

…Fixed error which wrote output to the path of subscripts instead of the working directory

renewiegandt · 2019-01-04T13:02:31Z

There is close to no documentation. Please document your code!
Please add your name and email address to the script.
See Critical bug in compareBed.sh #23.
It is still not working with the nextflow script! What is the -p parameter for now?
I need the location of the output file! Where is it saved, in './' or somewhere else?

renewiegandt

See comment.

renewiegandt · 2019-01-04T13:18:05Z

Maybe you could check whether there are any new/unknown motifs and, if there are none, throw an error message.
What happens right now if the output is empty?

JannikHamp · 2019-01-04T13:29:47Z

I will write the documentation later, at home.
Same here: I will add my name and email address later.
Paths with a double slash (//) still work in bash, even ///bla/file.fasta. I am not sure whether this is the source of the error, but I'll implement a check for a trailing /.
-p is the path where the two subscripts merge.R and maxScore.R are located. The parameter can be skipped now, since the path is acquired automatically.
The output directory is . (the current working directory) if no path is given.
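
As a rough illustration of the two points above (finding the subscript directory automatically and handling trailing slashes), a minimal sketch could look like the following. It is not the code from compareBed.sh; the helper names `script_dir` and `strip_trailing_slashes` are made up for this example.

```shell
# Hypothetical helpers for illustration, not taken from compareBed.sh.

# Print the absolute directory that contains the given script path.
script_dir() {
	( cd "$(dirname -- "$1")" && pwd )
}

# Remove all trailing slashes from a path, but keep a lone "/" intact.
strip_trailing_slashes() {
	local p="$1"
	while [ "${#p}" -gt 1 ] && [ "${p%/}" != "$p" ]; do
		p="${p%/}"
	done
	printf '%s\n' "$p"
}

# Example: merge.R and maxScore.R would be expected next to the script.
echo "subscripts expected in: $(script_dir "$0")"
echo "normalized motif dir:   $(strip_trailing_slashes /tmp/motifs///)"
```

With a normalized directory, joining it as `"$dir/file.bed"` no longer produces `//` in the resulting path, so a later `sed "s|//|/|g"` clean-up step would not be needed.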

renewiegandt · 2019-01-04T13:34:36Z

./compareBed.sh -d /mnt/agnerds/masterjlu2018/out5/footprint_extraction/buenrostro50k_chr1_fp_called_peaks.bed -m /mnt/workspace1/rene.wiegandt/tmp6/ --fasta /mnt/agnerds/masterjlu2018/testdata/presentation/mm10_chr.fa -o test.beb

I get the following:

Warning message:
In fread(paste(folder, "/pass2Tr.bed", sep = "")) :
  File '/mnt/agnerds/Rene.Wiegandt/repo/masterJLU2018/bin/1.2_filter_motifs/pass2Tr.bed' has size 0. Returning a NULL data.table.
Error in setnames(x, value) :
  Can't assign 8 names to a 0 column data.table
Calls: colnames<- -> names<- -> names<-.data.table -> setnames
Execution halted
Error: The requested file (/mnt/agnerds/Rene.Wiegandt/repo/masterJLU2018/bin/1.2_filter_motifs/merged.bed) could not be opened. Error message: (No such file or directory). Exiting!

The // paths work, but this should still be changed.

HendrikSchultheis

I completely agree with @renewiegandt. Right now it is really hard for someone else to understand what the code is doing, so please add a lot more comments!

HendrikSchultheis · 2019-01-04T13:43:47Z

Please refer to our guidelines while doing so.

JannikHamp · 2019-01-04T13:54:38Z

Anastasiia has a new BED format with 9 columns, though. The data in /mnt/agnerds/masterjlu2018/out5/footprint_extraction/buenrostro50k_chr1_fp_called_peaks.bed contain only 8 columns. The strand column after score is missing. That is the first thing I notice.

JannikHamp · 2019-01-04T13:56:07Z

Edit: However, I think this version still works with 8 columns. So it is a different error.

renewiegandt · 2019-01-04T13:59:47Z

Please write your comments in English, thanks.
I tried it with new data. It is still not working.

JannikHamp · 2019-01-04T14:05:06Z

The path /mnt/workspace/rene.wiegandt/tmp6/ does not exist.
So the script runs into an error when I pass it as the -m parameter while trying to reproduce the error.

renewiegandt · 2019-01-04T14:11:06Z

The path exists; it is /mnt/workspace1/rene.wiegandt/tmp6/. You can't reach it from your VM.
I'm going to run it on the full dataset. This is going to take a while. It is possible that everything gets filtered out in this dataset! You should definitely catch this case with an error message.
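
One way to catch the "everything was filtered" case before the R subscripts are called is a size check on the intermediate file. The following is only a sketch under that assumption; the helper name `require_non_empty` and the demo file names are hypothetical and do not come from compareBed.sh.

```shell
# Hypothetical helper: report an error when a file is missing or has size 0.
require_non_empty() {
	if [ ! -s "$1" ]; then
		echo "Error: $1 is missing or empty ($2)" >&2
		return 1
	fi
}

# Demo with throwaway files.
tmp="$(mktemp -d)"
: > "$tmp/empty.bed"
printf 'chr1\t10\t20\n' > "$tmp/filled.bed"

require_non_empty "$tmp/filled.bed" "demo" && echo "filled.bed: ok"
require_non_empty "$tmp/empty.bed" "all footprints were filtered out" || echo "empty.bed: caught"
```

In a real script the check would typically be followed by `|| exit 1`, so that an empty intermediate file stops the run instead of surfacing later as a `setnames` error.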

anastasiia · 2019-01-04T14:43:35Z

I have just opened pull request #38 with my script, which now always produces the same number of columns. If I receive no information from the original input bed file, there will be a "." in the last column. @JannikHamp, please make sure that your script is working now.

renewiegandt · 2019-01-04T15:55:57Z

You should also add documentation to the two subscripts.

I'm not sure why you are writing different files depending on the help parameter.
Could you explain it to me?

JannikHamp · 2019-01-04T16:04:31Z

The help variable is only introduced at this point and has nothing to do with the help message of the compareBed.sh command.
It only remembers which file was used in the last iteration.
Bash can only write to a file if it is not reading from it at the same time. The bedtools commands read and write at the same time, so I need to write the output to a different file. Example:
I have data with new footprints and motiffile1, motiffile2, and motiffile3:
I write the data to pass1Tr.bed.
Then I call the bedtools command with pass1Tr.bed and motiffile1 > pass1TrHelp.bed.
Next I compare pass1TrHelp.bed with motiffile2 > pass1Tr.bed.
Last I compare pass1Tr.bed with motiffile3 > pass1TrHelp.bed.

So pass1TrHelp.bed now contains the result of all comparisons.
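
The alternating scheme described above can be sketched as follows. To keep the example runnable without bedtools, the function `subtract` is a stand-in that compares whole lines with grep; it is an assumption for illustration and does not do the interval comparison that `bedtools intersect -v` does. The variable names `current` and `spare` are also hypothetical.

```shell
# Stand-in for `bedtools intersect -v -a "$1" -b "$2"`: print the lines of $1
# that do not occur in $2. Whole-line matching only, NOT an interval comparison.
subtract() {
	grep -v -x -F -f "$2" "$1" || true
}

workdir="$(mktemp -d)"
printf 'fp1\nfp2\nfp3\nfp4\n' > "$workdir/data.bed"
printf 'fp1\n' > "$workdir/motiffile1.bed"
printf 'fp3\n' > "$workdir/motiffile2.bed"
printf 'fp9\n' > "$workdir/motiffile3.bed"

current="$workdir/pass1Tr.bed"      # file holding the latest result
spare="$workdir/pass1TrHelp.bed"    # file the next step writes to
cp "$workdir/data.bed" "$current"

for motif in "$workdir"/motiffile*.bed; do
	subtract "$current" "$motif" > "$spare"
	# swap roles, so the next step reads what was just written
	swap="$current"; current="$spare"; spare="$swap"
done

echo "final result is in: $current"
cat "$current"
```

Keeping the name of the most recent result in a variable avoids a boolean flag whose name can be confused with the help option, and the script does not have to reason about whether the number of motif files was odd or even.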

renewiegandt · 2019-01-04T16:07:16Z

Ah, then I got it mixed up. Maybe you could rename the variable?

renewiegandt · 2019-01-04T16:12:10Z

I tried it on the full mDux data. I still got the same error.
Where is your test data? I would like to run the script with your data.

JannikHamp · 2019-01-04T16:17:35Z

-d /mnt/agnerdsJannik.Hamp/call_peaks_output_to_check.bed
-m /mnt/agnerds/Jannik.Hamp/TFBSscanOutmm10/test/
--fasta /mnt/agnerds/masterjlu2018/fasta/mm10_upper.fasta

But I have now corrected the script for 9-column BED files. It will not work with the 8-column file /mnt/agnerdsJannik.Hamp/call_peaks_output_to_check.bed.

renewiegandt · 2019-01-04T22:44:08Z

If an error occurs, your script prints an error message to stdout but does not stop.
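
A common shell pattern for this is a small helper that writes the message to stderr and exits with a non-zero status. The sketch below assumes a helper called `die`, which is a hypothetical name and not part of compareBed.sh; the file path is a placeholder.

```shell
# Hypothetical helper: print a message to stderr and abort with status 1.
die() {
	echo "Error: $*" >&2
	exit 1
}

# Demo in a subshell, so the abort does not end the surrounding shell.
status=0
(
	[ -f /nonexistent/footprints.bed ] || die "file /nonexistent/footprints.bed does not exist"
	echo "this line is never reached"
) || status=$?

echo "subshell exit status: $status"
```

Because the helper exits, nothing after the failed check is executed, and a calling pipeline such as nextflow can detect the failure from the exit status.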

HendrikSchultheis

Overall, try to be more consistent with your spacing. And please check your code for typos.

HendrikSchultheis · 2019-01-04T22:22:21Z

bin/1.2_filter_motifs/compareBed.sh

-#output
+
+# This script utilizes bedtools to gain non-overlapping sequence parts between bed-files
+# merge.R and maxScore.R are needed to be saved in the same directory than this to make it work


...same directory as ...

HendrikSchultheis · 2019-01-04T22:35:54Z

bin/1.2_filter_motifs/compareBed.sh

@@ -87,6 +89,7 @@ case $key in
 esac
 done

+# stores unknown selected parameters for error report 


It would be nicer to write something like "check for unknown arguments".

HendrikSchultheis · 2019-01-04T22:37:05Z

bin/1.2_filter_motifs/compareBed.sh

@@ -99,6 +102,7 @@ then
 exit 1


missing indentation

HendrikSchultheis · 2019-01-04T22:39:20Z

bin/1.2_filter_motifs/compareBed.sh

@@ -99,6 +102,7 @@ then
 exit 1
 fi

+# the help message
 if [ $he == true ]
 then
 	echo "This script utilies bedtools to select new footprints from data."


HendrikSchultheis · 2019-01-04T23:54:47Z

bin/1.2_filter_motifs/merge.R

-colnames(splitted) = c("chromosome", "start", "stop", "id", "score", "length", "maxpos", "info")
+colnames(splitted) = c("chromosome", "start", "stop", "id", "score", "strand", "length", "maxpos", "info")
+
+# reading the second dataframe: called p1 (all sequences with zero overlap)
 p1 = fread(paste(folder, "/pass1Tr.bed", sep=''))


paste -> file.path

HendrikSchultheis · 2019-01-04T23:54:55Z

bin/1.2_filter_motifs/merge.R

 p1 = fread(paste(folder, "/pass1Tr.bed", sep=''))
-colnames(p1) = c("chromosome", "start", "stop", "id", "score", "length", "maxpos", "info")
+colnames(p1) = c("chromosome", "start", "stop", "id", "score", "strand", "length", "maxpos", "info")


Not robust.

HendrikSchultheis · 2019-01-04T23:55:15Z

bin/1.2_filter_motifs/merge.R

-colnames(splitted) = c("chromosome", "start", "stop", "id", "score", "length", "maxpos", "info")
+colnames(splitted) = c("chromosome", "start", "stop", "id", "score", "strand", "length", "maxpos", "info")
+
+# reading the second dataframe: called p1 (all sequences with zero overlap)


Still a data.table.

HendrikSchultheis · 2019-01-05T00:01:42Z

bin/1.2_filter_motifs/merge.R

 splitted=splitted[which(splitted$stop - splitted$start >= min),]
 splitted=splitted[which(splitted$stop - splitted$start <= max),]
+
+# make the ids unique (because of duplicated ids of some footprints that got spliited in 2)
 splitted$id=make.unique(as.character(splitted$id))


make.unique only appends numbers to duplicates, leaving the first of the duplicates as is, but we want every duplicate to get a number. (I think that was already discussed at some point.)

HendrikSchultheis · 2019-01-05T00:03:47Z

bin/1.2_filter_motifs/merge.R

 splitted=cbind(splitted, containsMaxpos=0)
 splitted$containsMaxpos[intersect(which(splitted$start <= splitted$maxpos), which(splitted$stop > splitted$maxpos))] = 1
+
+#calculate relative maxpos values
 splitted$maxpos = splitted$maxpos - splitted$start
 data.table::fwrite(splitted, paste(folder, "/merged.bed", sep=''), row.names=FALSE, col.names=FALSE, quote=FALSE, sep='\t')


If you call library(data.table), the data.table:: prefix is not needed. Also use file.path instead of paste.

renewiegandt · 2019-01-06T14:36:28Z

The critical bug is fixed by cca8ac4 and f983d22.

renewiegandt · 2019-01-06T14:37:50Z

renewiegandt · 2019-01-06T18:31:44Z

I get the following error with the mDux data:

Command error:
  Error in setnames(x, value) :
    Can't assign 9 names to a 10 column data.table
  Calls: colnames<- -> names<- -> names<-.data.table -> setnames
  Execution halted
  Error in setnames(x, value) :
    Can't assign 9 names to a 10 column data.table
  Calls: colnames<- -> names<- -> names<-.data.table -> setnames
  Execution halted
  index file mm10_chr.fa.fai not found, generating...
  Error: The requested file (merged.bed) could not be opened. Error message: (No such file or directory). Exiting!

No idea where the 10th column comes from, but this proves that you need to make the naming more robust. You should just use the column names from the BED file you get as input.
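
To track down where an unexpected extra column comes from, it can help to count the tab-separated fields per line before the file reaches the R script. The following is only a diagnostic sketch; the helper name `column_counts` is hypothetical, and the trailing tab in the demo file is just one possible source of an extra column, assumed here for illustration.

```shell
# Hypothetical helper: print the distinct tab-separated column counts of a file.
column_counts() {
	awk -F '\t' '{ print NF }' "$1" | sort -n -u | paste -s -d ' ' -
}

tmp="$(mktemp -d)"
# The second line ends with a tab, which yields an extra (empty) column.
printf 'chr1\t10\t20\tid1\nchr1\t30\t40\tid2\t\n' > "$tmp/demo.bed"

counts="$(column_counts "$tmp/demo.bed")"
echo "column counts found: $counts"
if [ "$counts" != "4" ]; then
	echo "Warning: demo.bed does not have a uniform 4 columns" >&2
fi
```

A file with a uniform layout prints a single number; more than one number means that some lines have a different number of fields than others.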

renewiegandt · 2019-01-06T20:50:33Z

@JannikHamp I committed a few fixes to your branch. Please pull before doing further changes!

renewiegandt

A few typos and wording issues.
R: Please use '<-' instead of '=' to assign a variable. In rare cases '=' can lead to errors in the script.

You should also add more comments to the statistics.
For each line what information is generated.
For example:
sum_data = sum(data[[3]]-data[[2]])
No idea what information is stored in sum_data.

renewiegandt · 2019-01-09T09:59:15Z

bin/1.2_filter_motifs/compareBed.sh

+path=`echo $0 | sed 's/\/[^\/]*$/\//g'`
+help=false
+
+# display help when no parameters chosen


.. no parameters are given

renewiegandt · 2019-01-09T09:59:19Z

bin/1.2_filter_motifs/compareBed.sh

-fi
-
-if [ $ma == false ]
+# motiffiles either from a directory OR comma separated list


get motif files either...

renewiegandt · 2019-01-09T10:01:01Z

bin/1.2_filter_motifs/compareBed.sh

-	ma=true
-fi
+	# creates an array of all files with bed in its name in the directory $motifs  
+	declare -a motiffiles=(`ls $motifs | grep bed | sed "s|^|$motifs\/|g" | tr '\n' ' ' | sed "s|//|/|g"`)


Why don't you check for the file ending? I would use: ' grep "*.bed" '
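
As a sketch of checking the file ending: a shell glob such as `*.bed` selects files by suffix directly, while grep takes a regular expression rather than a glob, so an anchored pattern like `\.bed$` is what tests the ending there. The directory and file names below are throwaway examples, not paths from this project.

```shell
motifs="$(mktemp -d)"
: > "$motifs/a.bed"
: > "$motifs/b.bed"
: > "$motifs/bedtools_notes.txt"   # contains "bed" but is not a BED file

# Collect matches in the positional parameters (works in plain sh and bash,
# and keeps file names with spaces intact).
set --
for f in "$motifs"/*.bed; do
	[ -e "$f" ] || continue   # an unmatched glob stays literal, skip it
	set -- "$@" "$f"
done
echo "glob *.bed found $# files"

# For comparison: substring match versus anchored file ending.
echo "grep bed:      $(ls "$motifs" | grep -c bed) files"
echo "grep '\.bed\$': $(ls "$motifs" | grep -c '\.bed$') files"
```

The glob variant also avoids parsing the output of `ls`, and it needs no later `sed` step to prepend the directory to each file name.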

renewiegandt · 2019-01-09T10:01:15Z

bin/1.2_filter_motifs/compareBed.sh

-if [ ! -d $workdir ]
-then
-	mkdir $workdir
+# the else case means, that the motiffiles were passed comma separated with no whitespace.


motif files

renewiegandt · 2019-01-09T10:02:05Z

bin/1.2_filter_motifs/compareBed.sh

 for i in ${motiffiles[@]}
 do
-	if [ $help == true ]
+	# remove trailing tabs in motiffile


renewiegandt · 2019-01-09T10:05:36Z

bin/1.2_filter_motifs/compareBed.sh


-Rscript --vanilla $path/merge.R $min $max $workdir $data
+# check if header existed. If so, final output also has a header.


... exists.

renewiegandt · 2019-01-09T10:11:40Z

bin/1.2_filter_motifs/compareBed_runinfo.R

+	stop("footprint file has less than 9 columns. exiting.")
+}
+
+# remove sequences that are smaller than minimum (parameter)


(parameter: min)
(parameter: max)

renewiegandt · 2019-01-09T10:48:03Z

bin/1.2_filter_motifs/compareBed.sh

 	else
-		help=true
-		bedtools intersect -v -a "$workdir"/pass1TrHelp.bed -b $i > "$workdir"/pass1Tr.bed
+		echo file $i does not exist


I would write ERROR: or Error: in front of the error message, so the user can identify the error more quickly.

HendrikSchultheis

Wow this finally looks good!

updated script. Now the path of subscripts is automatically aquired. …

694291c

…Fixed error which wrote output to the path of subscripts instead of the working directory

renewiegandt requested review from renewiegandt and HendrikSchultheis January 4, 2019 11:52

renewiegandt requested changes Jan 4, 2019

renewiegandt added the "urgent" label (Needs to be done as fast as possible) Jan 4, 2019

HendrikSchultheis requested changes Jan 4, 2019

JannikHamp added 5 commits January 4, 2019 16:28

added some documentation

3ebb1b8

added collumn for strand information

7d834a4

added column "strand"

7de9a1b

documentation

1a49821

documentation

9b5b061

JannikHamp added 2 commits January 4, 2019 17:08

documentation

9ecba8a

documentation

58eff98

HendrikSchultheis requested changes Jan 5, 2019

JannikHamp and others added 3 commits January 5, 2019 19:03

Update compareBed.sh

b1cab10

Added check for \t at the end of lines in BED-file

cca8ac4

Remove header = false from fread

f983d22

merge.R: Set separator from auto to '\t' in fread

1134d74

JannikHamp added 5 commits January 8, 2019 13:53

this r script is no more necessary, the other does its job

37c69e6

New updated version. Faster and more robust

6802d72

updated version.. make unique updated, more robust in general

b787d14

added information for logfile

becfeae

added dicumentation and parameter for .stats output file

c55e8ff

HendrikSchultheis mentioned this pull request Jan 9, 2019

Critical bug in compareBed.sh #23

Closed

3 tasks

documentation changes

4acc20f

renewiegandt requested changes Jan 9, 2019

renewiegandt reviewed Jan 9, 2019

JannikHamp added 2 commits January 9, 2019 15:29

updated check for trailing tabs in motiffiles

90c8c05

removed echo from testing

1a35ce5

HendrikSchultheis mentioned this pull request Jan 9, 2019

ToDo List #10

Open

35 tasks

more documentation, replce =, <-

530627c

renewiegandt approved these changes Jan 9, 2019

HendrikSchultheis approved these changes Jan 9, 2019

renewiegandt merged commit 81c6fcd into dev Jan 9, 2019

HendrikSchultheis mentioned this pull request Jan 10, 2019

Review merge.R #26

Closed


		Rscript --vanilla $path/merge.R $min $max $workdir $data
		# check if header existed. If so, final output also has a header.
