How to use the scripts provided by the workflow
The workflow is divided into three sections: Section A analyzes the Nanopore data, Section B analyzes the gRNA NGS data, and Section C produces the final visualization.
The data files needed to run this analysis are:
- A FASTQ file containing all Nanopore reads
- The reference gene sequence in FASTA format
- An exon annotation file for the gene in BED3 format
- All guide sequences in FASTA format
- A TSV file with the NGS counts for each guide pair
- Example file:
GuideA	GuideB	Rb_Tiling_ctrl_A	Rb_tiling_lib	Rb_Tiling_ctrl_C	Rb_Tiling_end_B	Rb_Tiling_end_D
exon1_1234_0.60	exon2_1234_0.60	100	50	112	345	367
exon1_1234_0.60	exon3_1234_0.60	20	25	10	490	510
exon1_1234_0.60	exon4_1234_0.60	101	120	39	1123	1000
Note: The guide names must follow the format Name[X]_[locus]_[doench score], e.g. exon1_1234_0.60.
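As a minimal sketch, the count table and the guide-name convention above could be read in Python as follows. The helper names `parse_guide_name` and `read_guide_pair_counts` are hypothetical and not part of the workflow. The sketch assumes that the first two columns are `GuideA` and `GuideB` and that every remaining column holds integer counts for one sample.

```python
import csv
import io


def parse_guide_name(name):
    """Split a guide name of the form Name[X]_[locus]_[doench score]."""
    # rsplit keeps any extra underscores inside the Name[X] part intact
    label, locus, score = name.rsplit("_", 2)
    return label, locus, float(score)


def read_guide_pair_counts(handle):
    """Return a list of ((guide_a, guide_b), {sample: count}) tuples from a TSV handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    sample_columns = reader.fieldnames[2:]  # every column after GuideA and GuideB
    records = []
    for row in reader:
        pair = (row["GuideA"], row["GuideB"])
        counts = {column: int(row[column]) for column in sample_columns}
        records.append((pair, counts))
    return records


# Two rows of the example file, inlined so the sketch runs on its own
EXAMPLE = (
    "GuideA\tGuideB\tRb_Tiling_ctrl_A\tRb_tiling_lib\tRb_Tiling_ctrl_C\tRb_Tiling_end_B\tRb_Tiling_end_D\n"
    "exon1_1234_0.60\texon2_1234_0.60\t100\t50\t112\t345\t367\n"
    "exon1_1234_0.60\texon3_1234_0.60\t20\t25\t10\t490\t510\n"
)

records = read_guide_pair_counts(io.StringIO(EXAMPLE))
```

For a real file, pass a handle opened with `open(path, newline="")` instead of the inline `io.StringIO` example.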
Section A of the workflow analyzes the Nanopore reads and identifies the excisions.
- If the reads are split across multiple FASTA files, first merge them into a single file. This is easily done with basic shell commands:
cat *.fasta.txt > combined.fasta
- The combined FASTA file is the input for minimap2. It aligns the reads to the reference gene in spliced alignment mode (-ax splice) and reports no secondary alignments (--secondary=no):
minimap2 -ax splice --secondary=no gene.fa combined.fasta > align.sam
- Supplementary alignments (SAM FLAG 2048, or 2064 on the reverse strand) are then removed, leaving one record per read:
awk '($2 == "2064" || $2 == "2048") && (NR > 1) {next} {print}' align.sam > align_unique.sam
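The awk one-liner compares the FLAG column as a string. The minimal Python sketch below performs the same filtering but tests the supplementary bit (0x800 = 2048) directly, so it would also catch supplementary records that carry other FLAG bits. The function names `drop_supplementary` and `filter_sam` are hypothetical. For minimap2 output produced with `--secondary=no`, the two filters should behave the same.

```python
SUPPLEMENTARY_FLAG = 0x800  # SAM FLAG bit 2048: supplementary alignment


def drop_supplementary(sam_lines):
    """Yield SAM lines, skipping supplementary alignment records."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines are passed through unchanged
            continue
        flag = int(line.split("\t", 2)[1])  # column 2 is the bitwise FLAG
        if flag & SUPPLEMENTARY_FLAG:
            continue
        yield line


def filter_sam(in_path, out_path):
    """Write a copy of in_path without supplementary alignments to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_supplementary(src))
```

Usage: `filter_sam("align.sam", "align_unique.sam")`.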
- The filtered SAM file is converted to a BED12 file with paftools.js splice2bed. paftools.js is a script distributed with minimap2:
paftools.js splice2bed align_unique.sam > align.bed
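The simplified Python sketch below illustrates what the conversion produces for one spliced alignment. It is not paftools' actual code, and `cigar_to_blocks` and `to_bed12_columns` are hypothetical helpers. In the sketch, each `N` (skipped reference, i.e. an intron) in the CIGAR string ends one block and starts the next, while `M`, `D`, `=` and `X` extend the current block. Treating deletions as part of a block is an assumption.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_blocks(pos, cigar):
    """Return 0-based, half-open reference blocks for a SAM POS (1-based) and CIGAR."""
    ref = pos - 1  # SAM POS is 1-based, BED coordinates are 0-based
    block_start = ref
    blocks = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "MD=X":
            ref += length  # consumes reference, stays inside the current block
        elif op == "N":
            if ref > block_start:
                blocks.append((block_start, ref))
            ref += length  # skip the intron
            block_start = ref
        # I, S, H and P do not consume reference bases
    if ref > block_start:
        blocks.append((block_start, ref))
    return blocks


def to_bed12_columns(blocks):
    """Return chromStart, chromEnd, blockCount, blockSizes and blockStarts for the blocks."""
    chrom_start, chrom_end = blocks[0][0], blocks[-1][1]
    sizes = ",".join(str(end - start) for start, end in blocks)
    starts = ",".join(str(start - chrom_start) for start, _ in blocks)
    return chrom_start, chrom_end, len(blocks), sizes, starts
```

For example, `cigar_to_blocks(101, "5S50M200N30M2D20M")` yields two blocks separated by a 200 bp intron.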
- In the resulting BED12 file, the aligned exons of each read are stored as blocks in columns 10-12 (blockCount, blockSizes and blockStarts).
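The minimal sketch below reads those columns back in Python. The helper `bed12_blocks` is hypothetical. Block starts in column 12 are relative to chromStart in column 2, so adding the two gives the absolute, 0-based, half-open coordinates of each aligned exon.

```python
def bed12_blocks(line):
    """Return the aligned blocks of one BED12 record as absolute (start, end) pairs."""
    fields = line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])                                  # column 2
    block_count = int(fields[9])                                  # column 10
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]   # column 11
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]  # column 12
    if not len(sizes) == len(starts) == block_count:
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    return [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]
```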
- Overlapping these blocks with the exons of the reference gene identifies the missing (excised) exons. This is done by a script provided with this workflow:
python ./Project-Schnipp-Schnapp/scripts/exon_exon_excision_count.py -n align.bed -e exon_annotation.bed -o nanopore_count.tsv
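The internal logic of `exon_exon_excision_count.py` is not described here. The following is therefore only a hypothetical Python sketch of the overlap idea, with hypothetical function names. In the sketch, an exon counts as present in a read when the read's blocks cover at least `min_overlap` of its length (an assumed threshold, 50% by default). Reads are then tallied by their set of missing exons. The sketch assumes that the reference exons are sorted and that each read spans the whole gene. Otherwise, exons outside a truncated read would also be reported as missing.

```python
from collections import Counter


def missing_exons(read_blocks, exons, min_overlap=0.5):
    """Return 1-based numbers of reference exons not covered by a read's blocks."""
    missing = []
    for number, (ex_start, ex_end) in enumerate(exons, start=1):
        # total overlap between this exon and all aligned blocks of the read
        covered = sum(
            max(0, min(ex_end, b_end) - max(ex_start, b_start))
            for b_start, b_end in read_blocks
        )
        if covered < min_overlap * (ex_end - ex_start):
            missing.append(number)
    return tuple(missing)


def count_excisions(reads, exons, min_overlap=0.5):
    """Count reads per excision pattern (tuple of missing exon numbers)."""
    return Counter(missing_exons(blocks, exons, min_overlap) for blocks in reads)
```

An empty tuple in the resulting counter marks reads in which all exons are present.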