This repository has been archived by the owner. It is now read-only.

A snakemake pipeline to preprocess data from single cell RNAseq

jenzopr/scRNAseq.preprocessing

single-cell-preprocessing

A Snakemake pipeline to preprocess single-cell RNA-seq data. sc-preprocess handles data from multiple platforms, e.g. C1, Wafergen and DropSeq.

Setup

git clone ...
cd single-cell-preprocessing
conda env create -f environment.yml

Quickstart for DropSeq-based data

Configuration

Edit the file config/dropseq.yml and adapt the options to your needs. Pay attention to the data section and add the paths to individual fastq files like this:

data:
  files:
    condition1:
      r1: path/to/condition1_r1.fastq.gz
      r2: path/to/condition1_r2.fastq.gz
    condition2:
      r1: path/to/condition2_r1.fastq.gz
      r2: path/to/condition2_r2.fastq.gz
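
Before starting a run, it can help to check that every condition in the data section lists both read files. The following is a minimal sketch, not part of the pipeline: the function name check_data_files is hypothetical, the config is passed in as an already-parsed dictionary, and it assumes that both r1 and r2 are required for each condition, as in the example above.

```python
from typing import Dict, List


def check_data_files(config: Dict) -> List[str]:
    """Return a list of problems found in config['data']['files'].

    Hypothetical helper: assumes each condition needs both 'r1' and 'r2'.
    """
    problems = []
    files = (config.get("data") or {}).get("files") or {}
    if not files:
        problems.append("data.files is empty or missing")
    for condition, reads in files.items():
        for key in ("r1", "r2"):
            # A missing or empty path is reported per condition and read.
            if not (reads or {}).get(key):
                problems.append(f"{condition}: missing {key}")
    return problems


# The same structure as the YAML example, written as a Python dictionary.
example = {
    "data": {
        "files": {
            "condition1": {
                "r1": "path/to/condition1_r1.fastq.gz",
                "r2": "path/to/condition1_r2.fastq.gz",
            },
            "condition2": {
                "r1": "path/to/condition2_r1.fastq.gz",
                "r2": "path/to/condition2_r2.fastq.gz",
            },
        }
    }
}

print(check_data_files(example))
```

An empty list means no problems were found. The check only inspects the structure of the dictionary; it does not test whether the fastq files exist on disk.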

Run analysis

A call to snakemake -s sc-preprocess.snake will create a whitelist of true barcodes per condition, demultiplex barcodes into fastq files for each valid barcode and quantify the data.

Quickstart for plate-based data (C1, Wafergen)

Configuration

Edit the file config/plateseq.yml and adapt the options to your needs. If you already have fastq files for each cell (sample), set demultiplex: False in the action section.

Writing the samplesheet

Plate-based data needs a samplesheet that describes each sample to be processed. A samplesheet is a tab-separated table that contains at least a column with the sample's name:

Name Batch Condition
cell1 batch1 wildtype
cell2 batch1 mutant
cell3 batch2 wildtype
cell4 batch2 mutant

Other columns might be needed, depending on the experimental setup:

  • C1 data needs a URL_r1 column with the path to the sample's fastq file, but no Barcode column.
  • Wafergen data needs a Barcode column with the barcode that is used during multiplexing, but no URL_r1 column.

The column names can be arbitrary, but the names of the sample name (index) and barcode columns must be given in the config/plateseq.yml file:

samplesheet:
  file: SampleSheet.txt
  index: Name
  barcode: Barcode
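
The samplesheet requirements described above can be checked with a short script before running the pipeline. This is a rough sketch under stated assumptions, not code from the pipeline: the function check_samplesheet and its platform argument are hypothetical, and the column names default to the ones used in this README (Name, Barcode, URL_r1).

```python
import csv
import io


def check_samplesheet(handle, index="Name", platform=None,
                      barcode="Barcode", url="URL_r1"):
    """Return a list of problems found in a tab-separated samplesheet.

    Hypothetical helper; 'platform' may be "C1", "Wafergen" or None.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []
    problems = []
    if index not in columns:
        problems.append(f"missing index column '{index}'")
    # Platform-specific columns, following the two bullet points above.
    if platform == "C1" and url not in columns:
        problems.append(f"missing column '{url}' (needed for C1 data)")
    if platform == "Wafergen" and barcode not in columns:
        problems.append(f"missing column '{barcode}' (needed for Wafergen data)")
    if index in columns:
        # Line 1 is the header, so data rows start at line 2.
        for line_number, row in enumerate(reader, start=2):
            if not (row[index] or "").strip():
                problems.append(f"line {line_number}: empty sample name")
    return problems


# The example table from this README, written with tab separators.
sheet = (
    "Name\tBatch\tCondition\n"
    "cell1\tbatch1\twildtype\n"
    "cell2\tbatch1\tmutant\n"
)
print(check_samplesheet(io.StringIO(sheet)))
print(check_samplesheet(io.StringIO(sheet), platform="Wafergen"))
```

For a file on disk, the handle could be opened with open("SampleSheet.txt", newline=""). The sketch only checks column presence and empty sample names; it does not verify that barcodes or fastq paths are valid.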

Run analysis

A call to snakemake -s sc-preprocess.snake will demultiplex barcodes into fastq files, if needed, and quantify the data.
