The script relies mainly on bedtools for the comparison. The sequences in the bed-file from the experiment are compared to a bed-file of sequences with known motifs. In the first step, only sequences that do not overlap any known-motif sequence are selected. In the second step, the overlapping parts of the sequences that do overlap are selected as new sequences. The resulting bed-file contains the sequences selected in both steps, with two extra columns appended: first, the DNA sequence of the entry; second, a flag that is '1' ('no_overlap') or '0' ('overlap'), indicating whether the maximum score region of the entry overlaps any known-motif sequence.
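The two selection steps can be illustrated with a minimal Python sketch. It is not the code of compareBed.sh, only an illustration of the logic described above. The names `overlap_part` and `select_regions` are hypothetical, intervals are assumed to be BED-style (chromosome, 0-based start, exclusive end), and the two extra output columns are left out. The first step roughly corresponds to what `bedtools intersect -v` reports and the second to what plain `bedtools intersect` reports; whether the script calls bedtools exactly this way is an assumption.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end) with BED-style half-open coordinates
Interval = Tuple[str, int, int]


def overlap_part(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the part shared by a and b, or None if they do not overlap."""
    if a[0] != b[0]:
        return None
    start = max(a[1], b[1])
    end = min(a[2], b[2])
    # Adjacent intervals (end == start) share no bases in half-open coordinates.
    return (a[0], start, end) if start < end else None


def select_regions(experiment: List[Interval],
                   known: List[Interval]) -> List[Interval]:
    """Apply both selection steps and return the combined result."""
    no_overlap: List[Interval] = []     # step 1: entries without any overlap
    overlap_parts: List[Interval] = []  # step 2: overlapping parts as new entries
    for entry in experiment:
        parts = []
        for motif in known:
            part = overlap_part(entry, motif)
            if part is not None:
                parts.append(part)
        if parts:
            overlap_parts.extend(parts)
        else:
            no_overlap.append(entry)
    return no_overlap + overlap_parts


if __name__ == "__main__":
    experiment = [("chr1", 0, 10), ("chr1", 20, 30), ("chr2", 5, 15)]
    known = [("chr1", 25, 40), ("chr2", 0, 5)]
    for region in select_regions(experiment, known):
        print(*region, sep="\t")
```

The nested loop is quadratic in the number of entries, which is acceptable for an illustration; bedtools handles large files far more efficiently.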
Besides the experiment data (bed format) of sequences with possible motifs, the script requires a second bed-file of sequences with known motifs and a genome file in FASTA format.
For usage, run ./compareBed.sh
dependencies:
- bedtools
- R