How to use the scripts provided by the workflow

The Workflow

Projekt-Schnipp-Schnapp_Pipeline

The workflow is divided into three sections: Section A analyzes the Nanopore data, Section B analyzes the gRNA NGS data, and Section C produces the final visualization.

Input

The data files needed to run this analysis are:

A FASTQ file containing all Nanopore reads
The reference gene in FASTA format
An exon annotation file for the gene in BED3 format
All guide sequences in FASTA format
A TSV file with guide pair NGS counts

Example file:

GuideA	GuideB	Rb_Tiling_ctrl_A	Rb_tiling_lib	Rb_Tiling_ctrl_C	Rb_Tiling_end_B	Rb_Tiling_end_D
exon1_1234_0.60	exon2_1234_0.60	100	50	112	345	367
exon1_1234_0.60	exon3_1234_0.60	20	25	10	490	510
exon1_1234_0.60	exon4_1234_0.60	101	120	39	1123	1000
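
As a rough illustration of how such a count table could be loaded, the sketch below reads the TSV with Python's standard `csv` module. The function name `read_guide_pair_counts` is hypothetical and not part of the workflow. It assumes that the first two columns are `GuideA` and `GuideB` and that every remaining column holds integer counts for one sample.

```python
import csv
import io


def read_guide_pair_counts(handle):
    """Read a guide pair count table from an open text handle.

    Assumption: columns 1-2 are the guide names, all other columns are
    integer counts for one sample each.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[2:]  # everything after GuideA and GuideB
    counts = {}
    for row in reader:
        if not row:  # skip empty lines
            continue
        guide_a, guide_b, *values = row
        counts[(guide_a, guide_b)] = dict(zip(samples, map(int, values)))
    return samples, counts


# Shortened version of the example file above (two samples, two guide pairs)
example = (
    "GuideA\tGuideB\tRb_Tiling_ctrl_A\tRb_tiling_lib\n"
    "exon1_1234_0.60\texon2_1234_0.60\t100\t50\n"
    "exon1_1234_0.60\texon3_1234_0.60\t20\t25\n"
)
samples, counts = read_guide_pair_counts(io.StringIO(example))
```

For a real file, pass an open file handle instead of the `io.StringIO` object.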

Note: Guide names must follow the format Name[X]_[locus]_[Doench score], for example exon1_1234_0.60.
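
To check guide names before running the analysis, a small validator along the following lines may help. This is only a sketch: the regular expression assumes that the locus is an integer and the Doench score is a decimal number, as in the example file, and `parse_guide_name` is a hypothetical helper, not one of the workflow's scripts.

```python
import re
from typing import NamedTuple


class GuideName(NamedTuple):
    name: str
    locus: int
    doench_score: float


# Assumed layout: <name>_<integer locus>_<decimal score>
GUIDE_NAME_RE = re.compile(r"^(?P<name>.+)_(?P<locus>\d+)_(?P<score>\d+(?:\.\d+)?)$")


def parse_guide_name(raw):
    """Split a guide name into its three parts or raise ValueError."""
    match = GUIDE_NAME_RE.match(raw)
    if match is None:
        raise ValueError(f"guide name does not match Name_locus_score: {raw!r}")
    return GuideName(match["name"], int(match["locus"]), float(match["score"]))


guide = parse_guide_name("exon1_1234_0.60")
```

Here `guide.name` is `"exon1"`, `guide.locus` is `1234` and `guide.doench_score` is `0.6`.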

Section A

Section A analyzes the Nanopore sequences and identifies the excisions.

If the sequences are split across multiple FASTA files, the first step is to merge them into a single file. This can be done with a basic shell command:

cat *.fasta.txt > combined.fasta

The combined FASTA file serves as the input for minimap2, which aligns the sequences to the reference sequence. Minimap2 is run in spliced alignment mode (-ax splice), and secondary alignments are suppressed (--secondary=no).

minimap2 -ax splice --secondary=no gene.fa combined.fasta > align.sam

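The aligned sequences are then filtered to remove supplementary alignments (SAM flags 2048 and 2064), so that only one alignment per read remains: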

awk '($2 == "2064" || $2 == "2048") && (NR > 1) {next} {print}' align.sam > align_unique.sam
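
For readers less familiar with awk, the same filter can be sketched in Python. FLAG 2048 (0x800) marks a supplementary alignment, and 2064 is 2048 + 16, i.e. a supplementary alignment on the reverse strand. The function name `drop_supplementary` is hypothetical. Unlike the awk command, which exempts only the first line, this sketch keeps every header line starting with `@`, which should give the same result for ordinary SAM files.

```python
def drop_supplementary(lines):
    """Yield SAM lines, skipping records whose FLAG is exactly 2048 or 2064."""
    for line in lines:
        if line.startswith("@"):  # header lines are always kept
            yield line
            continue
        fields = line.split("\t")
        if len(fields) > 1 and fields[1] in {"2048", "2064"}:
            continue  # supplementary alignment (forward or reverse strand)
        yield line


# Tiny made-up SAM fragment, truncated to the first four columns
sam_lines = [
    "@SQ\tSN:gene\tLN:5000\n",
    "read1\t0\tgene\t1\n",
    "read1\t2048\tgene\t900\n",
    "read2\t16\tgene\t10\n",
    "read2\t2064\tgene\t700\n",
]
kept = list(drop_supplementary(sam_lines))
```

A broader variant would test the bit itself (`int(flag) & 0x800`), which also catches supplementary records that carry additional flag bits. That goes beyond what the awk command does.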

The filtered SAM file is converted to a BED file using paftools.js, a script provided with minimap2:

paftools.js splice2bed align_unique.sam > align.bed

The resulting BED file contains the aligned exons of each sequence as blocks in columns 10-12.
By overlapping these exons with the exons from the reference gene, the missing exons can be identified. This is done by a script provided by this workflow:

python ./Project-Schnipp-Schnapp/scripts/exon_exon_excision_count.py -n align.bed -e exon_annotation.bed -o nanopore_count.tsv
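
The overlap idea can be illustrated with a short sketch. It is not the implementation of exon_exon_excision_count.py, whose internals are not described here. The helpers `bed12_blocks` and `missing_exons` are hypothetical, and "missing" is assumed to mean simply that no aligned block overlaps the reference exon. In BED12, column 11 holds the block sizes and column 12 the block starts relative to the start coordinate in column 2.

```python
def bed12_blocks(line):
    """Return absolute half-open (start, end) intervals for one BED12 line."""
    fields = line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    return [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]


def missing_exons(blocks, exons):
    """Return the reference exons that no aligned block overlaps."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    return [exon for exon in exons if not any(overlaps(exon, blk) for blk in blocks)]


# Made-up read with two aligned blocks, and three reference exons (BED3 coordinates)
bed_line = "gene\t0\t1000\tread1\t60\t+\t0\t1000\t0,0,0\t2\t100,200,\t0,800,"
reference_exons = [(0, 100), (300, 400), (800, 1000)]

blocks = bed12_blocks(bed_line)
excised = missing_exons(blocks, reference_exons)
```

In this example the read covers the first and last exon, so the middle exon `(300, 400)` is reported as missing. Note that this simple definition does not distinguish a real excision from a read that ends before reaching an exon.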
