The script relies mainly on bedtools for the comparison. The sequences in the bed-file of the experiment are compared with sequences of known motifs, also given in bed-format. In the first step, only sequences that do not overlap any known-motif sequence are selected. In the second step, the overlapping parts of overlapping sequences are also selected as new sequences. The resulting bed-file contains the sequences selected in both steps. Two extra columns are appended to the output file: first, the DNA sequence of the entry; second, a flag that is '1' ('no_overlap') if the maximum score region of the entry does not overlap any known-motif sequence, and '0' ('overlap') if it does.
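The two selection steps can be sketched with plain bedtools calls. This is a minimal illustration, not the actual content of compareBed.sh: the function name select_regions, its argument order, and the intermediate file names are assumptions made for this example. It uses `bedtools intersect -v` to keep entries without any overlap, and the default `bedtools intersect` behaviour to report only the overlapping portions.

```sh
#!/bin/sh
# Hypothetical helper, not part of compareBed.sh.
# Usage: select_regions experiment.bed known_motifs.bed out.bed
select_regions() {
    data_bed=$1     # experiment bed-file (assumed argument order)
    motifs_bed=$2   # bed-file with known-motif sequences
    out_bed=$3      # combined result of both steps

    # Step 1: keep experiment entries that overlap no known-motif entry.
    bedtools intersect -v -a "$data_bed" -b "$motifs_bed" > "$out_bed.step1"

    # Step 2: for overlapping entries, report only the overlapping part.
    bedtools intersect -a "$data_bed" -b "$motifs_bed" > "$out_bed.step2"

    # Combine both selections, sorted by chromosome and start position.
    cat "$out_bed.step1" "$out_bed.step2" | sort -k1,1 -k2,2n > "$out_bed"
    rm -f "$out_bed.step1" "$out_bed.step2"
}
```

Calling `select_regions experiment.bed known_motifs.bed new_regions.bed` (file names are placeholders) would write the combined selection to new_regions.bed. The sequence and flag columns described above are not produced by this sketch.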
Three input files are required: the experiment data (bed-format) of sequences with possible motifs, a bed-file of the sequences with known motifs, and a genome file in fasta format. The bed-file with known-motif sequences can be generated with the tool motifscan.py.
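As an illustration of why the genome file is needed, the following sketch appends the DNA sequence of each entry as an extra column using `bedtools getfasta`. Whether compareBed.sh does it this way is an assumption; the function name append_sequence and the temporary file name are made up for this example.

```sh
#!/bin/sh
# Hypothetical helper, not part of compareBed.sh.
# Usage: append_sequence regions.bed genome.fa out.bed
append_sequence() {
    regions_bed=$1
    genome_fa=$2
    out_bed=$3

    # Write "<chrom>:<start>-<end><TAB><sequence>" for every bed entry.
    bedtools getfasta -fi "$genome_fa" -bed "$regions_bed" -tab -fo "$out_bed.seq"

    # Keep only the sequence and paste it next to the original columns.
    cut -f2 "$out_bed.seq" | paste "$regions_bed" - > "$out_bed"
    rm -f "$out_bed.seq"
}
```

Note that bedtools creates a .fai index next to the fasta file on first use, so the directory containing the genome file has to be writable.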
For usage, run ./compareBed.sh
Dependencies:
- R ?version
- bedtools ?version