
How to use the scripts provided by the workflow

renewiegandt edited this page Mar 31, 2021 · 5 revisions

The Workflow

[Figure: Projekt-Schnipp-Schnapp_Pipeline]

The workflow is separated into three sections: Section A analyzes the Nanopore data, Section B analyzes the gRNA NGS data, and Section C produces the final visualization.

Input

The data files needed to run this analysis are:

  • A FASTQ file containing all Nanopore reads
  • The reference gene in FASTA format
  • Exon gene annotation file in BED3 format
  • All guide sequences in FASTA format
  • A TSV file with guide pair NGS counts
    • Example file:
      GuideA GuideB Rb_Tiling_ctrl_A Rb_tiling_lib Rb_Tiling_ctrl_C Rb_Tiling_end_B Rb_Tiling_end_D
      exon1_1234_0.60 exon2_1234_0.60 100 50 112 345 367
      exon1_1234_0.60 exon3_1234_0.60 20 25 10 490 510
      exon1_1234_0.60 exon4_1234_0.60 101 120 39 1123 1000

Note: The guide names need to be in the following format: Name[X]_[locus]_[doench score]
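As an illustration of the naming convention, the following minimal sketch splits a guide name into its three parts and reads the count table. It is not part of the workflow's scripts: the `Guide` type and `parse_guide_name` helper are hypothetical names, and it assumes the locus is an integer and the Doench score a decimal number, as in the example file above.

```python
import csv
import io
from typing import NamedTuple


class Guide(NamedTuple):
    """Hypothetical container for the three parts of a guide name."""
    name: str      # e.g. "exon1"
    locus: int     # assumed to be an integer, e.g. 1234
    doench: float  # assumed to be a decimal score, e.g. 0.60


def parse_guide_name(guide_id: str) -> Guide:
    """Split 'Name[X]_[locus]_[doench score]' into its three fields."""
    # Split from the right so that underscores inside the name part survive.
    name, locus, doench = guide_id.rsplit("_", 2)
    return Guide(name, int(locus), float(doench))


# Shortened, tab-separated excerpt of the example file shown above.
EXAMPLE_TSV = (
    "GuideA\tGuideB\tRb_Tiling_ctrl_A\tRb_tiling_lib\n"
    "exon1_1234_0.60\texon2_1234_0.60\t100\t50\n"
)

for row in csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"):
    guide_a = parse_guide_name(row["GuideA"])
    guide_b = parse_guide_name(row["GuideB"])
    print(guide_a.name, guide_b.name, row["Rb_tiling_lib"])
```

A name that does not follow the format raises a `ValueError`, which makes malformed guide names visible before the analysis starts.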

Section A

The A section of the workflow analyses the Nanopore sequences and identifies the excisions.

  1. The first step is to merge the FASTA files if the sequences are split up into multiple files. This can be easily done using basic shell commands:

cat *.fasta.txt > combined.fasta

  2. The combined FASTA file serves as the input for minimap2, which aligns the sequences to the reference gene in spliced alignment mode (-ax splice). The option --secondary=no suppresses secondary alignments.

minimap2 -ax splice --secondary=no gene.fa combined.fasta > align.sam

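  3. The aligned sequences are then filtered to remove supplementary alignments (SAM flags 2048 and 2064), leaving one alignment record per read.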

awk '($2 == "2064" || $2 == "2048") && (NR > 1) {next} {print}' align.sam > align_unique.sam
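The same filter can be sketched in Python. This is an illustration only, not one of the workflow's scripts, and `drop_supplementary` is a hypothetical name. Instead of comparing against the two exact flag values, it tests the supplementary bit (0x800 = 2048) of the SAM flag. For single-end reads aligned with --secondary=no this should select the same records as the awk command, because 2064 is 2048 plus the reverse-strand bit 16.

```python
SUPPLEMENTARY = 0x800  # SAM flag bit 2048: supplementary alignment


def drop_supplementary(lines):
    """Yield header lines and every record whose supplementary bit is unset."""
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        flag = int(line.split("\t", 2)[1])  # column 2 of a SAM record is FLAG
        if flag & SUPPLEMENTARY:
            continue
        yield line


# Tiny in-memory example with hypothetical records.
sam_lines = [
    "@SQ\tSN:gene\tLN:1000\n",
    "read1\t0\tgene\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
    "read1\t2048\tgene\t500\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
    "read2\t2064\tgene\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
    "read3\t16\tgene\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
]
kept = list(drop_supplementary(sam_lines))
```

In this example `kept` holds the header, the primary alignment of read1, and the reverse-strand alignment of read3.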

  4. The filtered SAM file is converted to a BED file using paftools, a script provided with minimap2.

paftools.js splice2bed align_unique.sam > align.bed

  5. The resulting BED file stores the aligned exons of each read as blocks in columns 10-12 (block count, block sizes and block starts).

  6. By overlapping these exons with the exons from the reference gene, the missing exons can be identified. This is done by a script provided by this workflow.

python ./Project-Schnipp-Schnapp/scripts/exon_exon_excision_count.py -n align.bed -e exon_annotation.bed -o nanopore_count.tsv
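The idea behind steps 5 and 6 can be sketched as follows. This is not the code of exon_exon_excision_count.py, whose internals are not described here. The helper names `bed12_blocks` and `missing_exons` are hypothetical, and the sketch assumes that an exon counts as missing when it lies inside the aligned span of a read but is not overlapped by any of the read's blocks.

```python
def bed12_blocks(line):
    """Return (read_name, [(start, end), ...]) for one BED12 line."""
    fields = line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])
    name = fields[3]
    # Columns 11 and 12 are comma-separated lists, often with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    # Block starts are relative to chromStart; convert to absolute coordinates.
    blocks = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    return name, blocks


def missing_exons(blocks, exons):
    """Exons inside the read's aligned span that no block overlaps."""
    span_start = min(start for start, _ in blocks)
    span_end = max(end for _, end in blocks)
    missing = []
    for exon_start, exon_end in exons:
        if exon_start < span_start or exon_end > span_end:
            continue  # exon not fully covered by the read's span: not judged
        # Half-open intervals overlap if each starts before the other ends.
        overlapped = any(b_start < exon_end and exon_start < b_end
                         for b_start, b_end in blocks)
        if not overlapped:
            missing.append((exon_start, exon_end))
    return missing


# Hypothetical reference exons (as read from a BED3 file) and one BED12 record.
reference_exons = [(0, 100), (200, 300), (400, 500), (600, 700)]
bed_line = "gene\t0\t500\tread1\t60\t+\t0\t500\t0\t2\t100,100,\t0,400,"

read_name, read_blocks = bed12_blocks(bed_line)
excised = missing_exons(read_blocks, reference_exons)
```

Here the read aligns to the first and third exon, so `excised` contains only the second exon, (200, 300). The fourth exon lies outside the read's span and is therefore not reported.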

Section B

Section C