
How to use the scripts provided by the workflow

renewiegandt edited this page Apr 15, 2021 · 5 revisions

The Workflow

Projekt-Schnipp-Schnapp_Pipeline

The workflow is separated into three sections. Section A for analyzing the Nanopore data, Section B for analyzing the gRNA NGS data, and Section C for the final visualization.

Input

The data files needed to run this analysis are:

  • A FASTQ file containing all Nanopore reads
  • The reference gene in FASTA format
  • Exon gene annotation file in BED3 format
  • All guide sequences in FASTA format
  • A TSV-file with guide pair NGS counts
    • Example file:
      GuideA GuideB Rb_Tiling_ctrl_A Rb_tiling_lib Rb_Tiling_ctrl_C Rb_Tiling_end_B Rb_Tiling_end_D
      exon1_1234_0.60 exon2_1234_0.60 100 50 112 345 367
      exon1_1234_0.60 exon3_1234_0.60 20 25 10 490 510
      exon1_1234_0.60 exon4_1234_0.60 101 120 39 1123 1000

Note: The guide names need to be in the following format: Name[X]_[locus]_[doench score], for example exon1_1234_0.60.
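The naming convention above can be checked programmatically before running the workflow. The following is a minimal sketch, not part of the workflow scripts; the function name parse_guide_name and the GuideName type are hypothetical, and the locus is kept as text because the format does not state that it is numeric.

```python
from typing import NamedTuple


class GuideName(NamedTuple):
    name: str      # e.g. "exon1"
    locus: str     # kept as text (assumption: not necessarily numeric)
    doench: float  # Doench score


def parse_guide_name(guide: str) -> GuideName:
    """Split 'Name[X]_[locus]_[doench score]' into its three parts.

    Splitting from the right keeps underscores inside the name part intact.
    Raises ValueError if the name has fewer than three parts or the score
    is not a number.
    """
    name, locus, score = guide.rsplit("_", 2)
    return GuideName(name, locus, float(score))


if __name__ == "__main__":
    print(parse_guide_name("exon1_1234_0.60"))
```

A guide name that does not follow the convention raises a ValueError, which makes malformed entries in the count table easy to spot early.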

Section A

The A section of the workflow analyses the Nanopore sequences and identifies the excisions.

  1. If the sequences are split across multiple files, the first step is to merge the FASTA files. This can be done with basic shell commands:

cat *.fasta.txt > combined.fasta

  2. The combined FASTA file serves as the input for minimap2, which aligns the sequences to the reference sequence in spliced alignment mode. The --secondary=no option suppresses secondary alignments.

minimap2 -ax splice --secondary=no gene.fa combined.fasta > align.sam

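  3. The aligned sequences are then filtered to remove supplementary alignments (SAM flags 2048 and 2064), so that each read is represented only by its primary alignment: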

awk '($2 == "2064" || $2 == "2048") && (NR > 1) {next} {print}' align.sam > align_unique.sam

  4. The filtered SAM file is converted to a BED file using paftools.js, a script shipped with minimap2:

paftools.js splice2bed align_unique.sam > align.bed

  5. The resulting BED file contains the aligned exons of each sequence as blocks in columns 10-12 (block count, block sizes, and block starts).

  6. By overlapping these exons with the exons of the reference gene, the missing exons can be identified. This is done by a script provided by this workflow. The coordinates in the exon annotation must be relative to the reference sequence.

python ./ANGIE/scripts/exon_exon_excision_count.py -n align.bed -e exon_annotation.bed -o nanopore_count.tsv
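To illustrate the idea behind the last two steps, the sketch below reads the blocks of one BED12 line and reports which reference exons are not covered by any block. It is not the implementation of exon_exon_excision_count.py; the function names are hypothetical, and it assumes that any overlap of at least one base counts as "exon present", whereas the actual script may apply a stricter criterion.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end), as in BED


def bed12_blocks(line: str) -> List[Interval]:
    """Return the aligned blocks of one BED12 line as absolute intervals.

    Column 2 is chromStart, column 11 holds the block sizes and column 12
    the block starts relative to chromStart (both comma-separated).
    """
    fields = line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    return [(chrom_start + s, chrom_start + s + size)
            for s, size in zip(starts, sizes)]


def missing_exons(blocks: List[Interval], exons: List[Interval]) -> List[int]:
    """Return 0-based indices of reference exons overlapped by no block."""
    missing = []
    for i, (e_start, e_end) in enumerate(exons):
        overlapped = any(b_start < e_end and e_start < b_end
                         for b_start, b_end in blocks)
        if not overlapped:
            missing.append(i)
    return missing


if __name__ == "__main__":
    # Made-up example: three reference exons, one read covering the first and last.
    exons = [(0, 100), (200, 300), (400, 500)]
    read = "gene\t10\t480\tread1\t60\t+\t10\t480\t0,0,0\t2\t90,80\t0,390"
    print(missing_exons(bed12_blocks(read), exons))
```

In this made-up example the read aligns to the first and third exon only, so the second exon (index 1) is reported as excised.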

Section B

The B section of the workflow focuses on the analysis of the gRNA pair NGS counts. The section is split into subsections B.1 and B.2.

Subsection B.1

Subsection B.1 validates the position of the gRNAs on the reference sequence and the PAM orientation of the gRNAs in each pair. This is done by aligning the guide sequences to the reference using bowtie2. The resulting SAM file is converted to a TSV file with the following columns:
gRNA name | PAM orientation | reference name | position | sequence

This TSV file serves as the input for the guide setup in subsection B.2.
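The SAM-to-TSV conversion is not shown on this page, so the following is only a sketch of how it could be done. It assumes that the PAM orientation is taken from the alignment strand (SAM flag bit 0x10), which may differ from how the workflow defines it. The function names are hypothetical.

```python
import csv
import sys

COLUMNS = ["gRNA name", "PAM orientation", "reference name", "position", "sequence"]


def sam_to_guide_rows(sam_lines):
    """Turn SAM records into rows matching the five TSV columns.

    Header lines (starting with '@') and unmapped reads (flag bit 0x4) are
    skipped. The orientation is '+' or '-' based on flag bit 0x10
    (assumption: PAM orientation follows the alignment strand).
    """
    rows = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue
        orientation = "-" if flag & 0x10 else "+"
        # fields: 0 = QNAME, 2 = RNAME, 3 = POS (1-based), 9 = SEQ as stored in the SAM
        rows.append([fields[0], orientation, fields[2], int(fields[3]), fields[9]])
    return rows


def write_guide_tsv(rows, handle):
    """Write the rows with a header line as tab-separated values."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0",
        "exon1_1234_0.60\t0\tgene\t101\t42\t20M\t*\t0\t0\tACGT\tIIII",
        "exon2_1234_0.60\t16\tgene\t500\t42\t20M\t*\t0\t0\tTTGA\tIIII",
    ]
    write_guide_tsv(sam_to_guide_rows(example), sys.stdout)
```

Note that for reverse-strand alignments the SEQ field in a SAM file holds the reverse complement of the original guide sequence.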

Subsection B.2

Subsection B.2 normalizes the gRNA counts using the DESeq normalization.

python ./ANGIE/scripts/deseq_wrapper.py -i gRNA_count.tsv -p /Project-Schnipp-Schnapp/scripts/deseq_norm.R -o ./output_dir/
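For readers unfamiliar with the method, the sketch below shows the median-of-ratios size factor estimation that DESeq uses for normalization, written in plain Python. It assumes this is the step performed by deseq_norm.R; the R script itself may differ in details, and the function names here are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios method.

    counts: list of rows (one per guide pair), each a list of raw counts
    with one entry per sample. Rows containing a zero are ignored because
    their geometric mean is zero. Raises statistics.StatisticsError if no
    row is usable.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable row across samples
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


if __name__ == "__main__":
    # Made-up counts: the second sample was sequenced twice as deeply.
    raw = [[10, 20], [100, 200], [5, 10]]
    print(size_factors(raw))
    print(normalize(raw))
```

In the made-up example the second sample has exactly twice the counts of the first, so after normalization both samples show the same value for every row.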

The normalized counts serve as the input for the guide setup script, which prepares the guide pairs for quantification. The file passed with -a is the guide alignment TSV file produced in subsection B.1 (named guide_align.tsv here).

python ./ANGIE/scripts/guide_setup.py -i normalized.counts.tsv -a guide_align.tsv -o guide_setup.tsv

The resulting guide_setup.tsv serves as the input for the quantification script, which counts the gRNA pairs and bins them into intron-intron or exon-exon combinations.

python ./ANGIE/scripts/get_guide_pair_count.py -i guide_setup.tsv -o guide_pair_count.tsv
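How the binning works internally is not described here. As a rough illustration only, the sketch below assigns each guide position to the exon or intron it falls in and sums the counts per pair of features. The function names, the feature labels, and the input layout (position A, position B, count) are assumptions and do not reflect the actual format of guide_setup.tsv or the logic of get_guide_pair_count.py.

```python
from bisect import bisect_right
from collections import Counter


def locate(pos, exons):
    """Label a 0-based position as 'exonN', 'intronN', 'upstream' or 'downstream'.

    exons: sorted, non-overlapping half-open (start, end) intervals.
    Intron N is the gap between exon N and exon N+1 (1-based labels).
    """
    starts = [start for start, _ in exons]
    i = bisect_right(starts, pos) - 1  # last exon starting at or before pos
    if i < 0:
        return "upstream"
    if pos < exons[i][1]:
        return f"exon{i + 1}"
    if i + 1 < len(exons):
        return f"intron{i + 1}"
    return "downstream"


def bin_pairs(pairs, exons):
    """Sum counts per (feature of guide A, feature of guide B) bin.

    pairs: iterable of (pos_a, pos_b, count) tuples (hypothetical layout).
    """
    bins = Counter()
    for pos_a, pos_b, count in pairs:
        bins[(locate(pos_a, exons), locate(pos_b, exons))] += count
    return bins


if __name__ == "__main__":
    exons = [(100, 200), (300, 400), (500, 600)]
    pairs = [(250, 450, 10), (260, 440, 5), (150, 350, 3)]
    for bin_name, total in bin_pairs(pairs, exons).items():
        print(bin_name, total)
```

In this made-up example the first two pairs both flank the second exon and end up in the same intron1/intron2 bin, while the third pair cuts inside exon 1 and exon 2.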

Combine section A and B

To combine the data from sections A and B, call merge_count_dfs.py:

python ./ANGIE/scripts/merge_count_dfs.py -i nanopore_count.tsv -n guide_pair_count.tsv -o merged_counts.tsv

Section C

Section C plots the data. The plot_log2fold.py script plots the log2 fold change of each pair as a scatter plot.

python ./ANGIE/scripts/plot_log2fold.py -i guide_pair_count.tsv -o lfc_scatter.png
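As a reminder of what is being plotted, the sketch below computes a log2 fold change for one guide pair from normalized counts. Which samples are compared and whether a pseudocount is used is not stated on this page. The example assumes that the end samples are compared against the control samples, as the column names in the example input suggest, and adds a pseudocount of 1 to avoid division by zero. The function name is hypothetical.

```python
import math


def log2_fold_change(end_counts, ctrl_counts, pseudocount=1.0):
    """log2 of mean end count over mean control count for one guide pair.

    The pseudocount (assumption) keeps the ratio defined when a mean is zero.
    """
    end_mean = sum(end_counts) / len(end_counts)
    ctrl_mean = sum(ctrl_counts) / len(ctrl_counts)
    return math.log2((end_mean + pseudocount) / (ctrl_mean + pseudocount))


if __name__ == "__main__":
    # First row of the example input: two control and two end samples
    print(log2_fold_change([345, 367], [100, 112]))
```

A positive value means the pair is enriched in the end samples relative to the controls; a negative value means it is depleted.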

Additionally, scatter_exon_exon.py can be called to compare the bins with each other, for example the exon-exon combinations.

python ./Project-Schnipp-Schnapp/scripts/scatter_exon_exon.py -i merged_counts.tsv -e [number of bins] -q [quadrant split] -o scatter.png