convert BED file to FASTA (one per cluster)
renewiegandt committed Oct 29, 2018
1 parent 11b8997 commit 89eb998
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions bin/bed_to_fasta.R
@@ -0,0 +1,25 @@
#!/usr/bin/env Rscript

# Example input for interactive testing: bed <- data.table::data.table(1:10,1:10,c("AAAAAAAAA","CTGAGA","CCCTAGC","GC","AA","ACGTACGTGTCA","GGGCCGCTA","GCA","TTTTTGCA","AAAATCGACGT"),c(1,2,3,1,1,1,2,2,3,3))

# Split a BED file by cluster.
# The sequences of each cluster are written to a FASTA file.
# @parameter bedInput <string> BED file with the sequence and cluster ID as columns
# @parameter out <string> output directory
# @parameter prefix <string> prefix for file names

args <- commandArgs(trailingOnly = TRUE)


bedInput <- args[1]
out <- args[2] # "G://Rene.Wiegandt/10_Master/"
prefix <- args[3] # "Fasta"

bed <- data.table::fread(bedInput, header = FALSE, sep = "\t")

clusters <- split(bed, bed$V4, sorted = TRUE, flatten = FALSE) # column 4: cluster ID
discard <- lapply(seq_along(clusters), function(i){
  sequences <- as.list(clusters[[i]][[3]]) # column 3: sequence
  outfile <- paste0(out, prefix, "_cluster_", i) # `out` must end with a path separator
  seqinr::write.fasta(sequences = sequences, names = clusters[[i]][[2]], file.out = outfile, as.string = TRUE) # column 2: name
})
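
To illustrate what the script produces, here is a minimal base-R sketch of the same per-cluster split, run on the toy table from the script. It is not the committed implementation: the helper `write_cluster_fasta` is hypothetical, it uses `writeLines` instead of `seqinr::write.fasta` (so sequences are not wrapped), and the `file.path` join, the `.fasta` extension and the use of the cluster ID rather than the list index in the file name are assumptions made for the example.

```r
# Toy table mirroring the example data in the script:
# V1/V2 = placeholder coordinates (V2 doubles as the sequence name),
# V3 = sequence, V4 = cluster ID.
bed <- data.frame(
  V1 = 1:10,
  V2 = 1:10,
  V3 = c("AAAAAAAAA", "CTGAGA", "CCCTAGC", "GC", "AA",
         "ACGTACGTGTCA", "GGGCCGCTA", "GCA", "TTTTTGCA", "AAAATCGACGT"),
  V4 = c(1, 2, 3, 1, 1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the commit): one FASTA file per cluster.
write_cluster_fasta <- function(bed, out_dir, prefix) {
  clusters <- split(bed, bed$V4)  # named list, one data.frame per cluster ID
  vapply(names(clusters), function(id) {
    cl <- clusters[[id]]
    # Assumption: ".fasta" extension and file.path() instead of paste0(out, ...).
    outfile <- file.path(out_dir, paste0(prefix, "_cluster_", id, ".fasta"))
    # Interleave ">name" header lines with their sequence lines.
    lines <- as.vector(rbind(paste0(">", cl$V2), cl$V3))
    writeLines(lines, outfile)
    outfile
  }, character(1))
}

out_dir <- tempfile("fasta_")
dir.create(out_dir)
files <- write_cluster_fasta(bed, out_dir, "Fasta")
```

Here `files` is a named character vector of the written paths, one per cluster ID. Each file alternates a `>name` header line with its sequence line, in the order the rows appear in the input.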
