This repository has been archived by the owner. It is now read-only.

jenzopr/scRNAseq.preprocessing

single-cell-preprocessing

A Snakemake pipeline to preprocess single-cell RNA-seq data. sc-preprocess can handle data from multiple platforms, e.g. C1, Wafergen and DropSeq.

Setup

git clone ...
cd single-cell-preprocessing
conda env create -f environment.yml

Quickstart for DropSeq-based data

Configuration

Edit the file config/dropseq.yml and adapt the options to your needs. In particular, fill in the data section with the paths to the read 1 (r1) and read 2 (r2) fastq files of each condition, like this:

data:
  files:
    condition1:
      r1: path/to/condition1_r1.fastq.gz
      r2: path/to/condition1_r2.fastq.gz
    condition2:
      r1: path/to/condition2_r1.fastq.gz
      r2: path/to/condition2_r2.fastq.gz
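The data section maps each condition to one pair of fastq files. As a minimal sketch of how that structure can be consumed, assuming `config` is the dict obtained by loading config/dropseq.yml (Snakemake's `configfile:` directive exposes the YAML as a global `config` dict), the hypothetical helper `read_pairs` below lists the pairs and fails early when a mate is missing. It is not part of the pipeline.

```python
# Hypothetical helper, not part of sc-preprocess.
# Assumption: `config` is the parsed config/dropseq.yml as a plain dict.

def read_pairs(config):
    """Return {condition: (r1_path, r2_path)}, failing early on missing mates."""
    pairs = {}
    for condition, reads in config["data"]["files"].items():
        missing = [mate for mate in ("r1", "r2") if mate not in reads]
        if missing:
            raise ValueError(f"condition {condition!r} is missing {', '.join(missing)}")
        pairs[condition] = (reads["r1"], reads["r2"])
    return pairs


# The YAML example above, written as the equivalent Python dict.
example_config = {
    "data": {
        "files": {
            "condition1": {"r1": "path/to/condition1_r1.fastq.gz",
                           "r2": "path/to/condition1_r2.fastq.gz"},
            "condition2": {"r1": "path/to/condition2_r1.fastq.gz",
                           "r2": "path/to/condition2_r2.fastq.gz"},
        }
    }
}

for condition, (r1, r2) in read_pairs(example_config).items():
    print(condition, r1, r2)
```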

Run analysis

Running snakemake -s sc-preprocess.snake creates a whitelist of true cell barcodes for each condition, demultiplexes the reads into one fastq file per valid barcode and quantifies the data.
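The whitelist and demultiplexing steps can be illustrated with a simplified, in-memory sketch. It assumes the standard Drop-seq read 1 layout (a 12 bp cell barcode followed by an 8 bp UMI, with the cDNA on read 2) and picks true barcodes by simply keeping the most frequent ones. That heuristic is a stand-in chosen for illustration; this README does not say which whitelisting method the pipeline uses, and real tools also correct barcode sequencing errors and stream reads from fastq.gz files.

```python
from collections import Counter, defaultdict

# Assumed Drop-seq read-1 layout: 12 bp cell barcode, then 8 bp UMI.
BARCODE_LEN = 12
UMI_LEN = 8


def build_whitelist(read1_seqs, n_cells):
    """Keep the n_cells most frequent cell barcodes (simplified heuristic)."""
    counts = Counter(seq[:BARCODE_LEN] for seq in read1_seqs)
    return {barcode for barcode, _ in counts.most_common(n_cells)}


def demultiplex(read_pairs, whitelist):
    """Group (UMI, read-2 sequence) by whitelisted barcode; drop other reads."""
    per_cell = defaultdict(list)
    for r1, r2 in read_pairs:
        barcode = r1[:BARCODE_LEN]
        umi = r1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
        if barcode in whitelist:
            per_cell[barcode].append((umi, r2))
    return dict(per_cell)


# Toy read pairs: (read 1 = barcode + UMI, read 2 = cDNA).
reads = [
    ("A" * 12 + "C" * 8, "GATTACA"),
    ("A" * 12 + "G" * 8, "TTAGGCA"),
    ("A" * 12 + "C" * 8, "CCATGGA"),
    ("T" * 12 + "C" * 8, "ACGTACG"),
    ("T" * 12 + "A" * 8, "GGCCTTA"),
    ("G" * 12 + "A" * 8, "CCCGGGA"),  # rare barcode, likely background
]

whitelist = build_whitelist([r1 for r1, _ in reads], n_cells=2)
cells = demultiplex(reads, whitelist)
for barcode, records in cells.items():
    print(barcode, len(records))
```

Reads whose barcode is not in the whitelist (here the `G` barcode) are discarded, which mirrors the "one fastq file per valid barcode" behaviour described above.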

Quickstart for plate-based data (C1, Wafergen)

Configuration

Edit the file config/plateseq.yml and adapt the options to your needs. If you already have fastq files for each cell (sample), set demultiplex: False in the action section.
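One way to picture what the demultiplex switch decides is an input function that picks the fastq file for each sample. This is a hypothetical sketch, not the pipeline's code: it assumes the flag is read from config["action"]["demultiplex"], that the samplesheet is available as a dict per sample with a URL_r1 entry, and the "demultiplexed/{sample}.fastq.gz" path pattern is invented for illustration.

```python
# Hypothetical input function; the pipeline's own wiring may differ.
DEMUX_PATTERN = "demultiplexed/{sample}.fastq.gz"  # invented path, illustration only


def input_fastq(config, sample, samplesheet):
    """Return the fastq file that should be quantified for one sample."""
    if config["action"]["demultiplex"]:
        # Reads still have to be split by barcode, so point at the
        # (hypothetical) output of the demultiplexing step.
        return DEMUX_PATTERN.format(sample=sample)
    # Per-cell fastq files already exist: take the path from the samplesheet.
    return samplesheet[sample]["URL_r1"]


samplesheet = {"cell1": {"URL_r1": "path/to/cell1_r1.fastq.gz"}}
print(input_fastq({"action": {"demultiplex": False}}, "cell1", samplesheet))
print(input_fastq({"action": {"demultiplex": True}}, "cell1", samplesheet))
```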

Writing the samplesheet

Plate-based data needs a samplesheet that describes each sample to be processed. A samplesheet is a tab-separated table that contains at least one column with the sample names:

Name Batch Condition
cell1 batch1 wildtype
cell2 batch1 mutant
cell3 batch2 wildtype
cell4 batch2 mutant

Other columns might be needed, depending on the experimental setup:

  • C1 data needs a URL_r1 column with the path to each sample's fastq file, but no Barcode column.
  • Wafergen data needs a Barcode column with the barcode that is used during multiplexing, but no URL_r1 column.
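The column rules above can be checked before a run. The following is a minimal sketch using Python's standard csv module; the function names, the platform labels "C1" and "Wafergen", and the fastq paths in the example sheet are assumptions made for illustration, and the default column names follow the examples in this README.

```python
import csv
import io


def read_samplesheet(handle):
    """Parse a tab-separated samplesheet into (column names, list of row dicts)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


def check_columns(columns, platform, index_col="Name", barcode_col="Barcode"):
    """Raise ValueError if the columns do not match the platform's requirements."""
    columns = set(columns)
    if index_col not in columns:
        raise ValueError(f"missing sample name column {index_col!r}")
    if platform == "C1":
        required, forbidden = "URL_r1", barcode_col
    elif platform == "Wafergen":
        required, forbidden = barcode_col, "URL_r1"
    else:
        raise ValueError(f"unknown platform {platform!r}")
    if required not in columns:
        raise ValueError(f"{platform} samplesheets need a {required!r} column")
    if forbidden in columns:
        raise ValueError(f"{platform} samplesheets must not have a {forbidden!r} column")


# Illustrative C1 samplesheet; the fastq paths are made up.
SHEET = (
    "Name\tBatch\tCondition\tURL_r1\n"
    "cell1\tbatch1\twildtype\tpath/to/cell1_r1.fastq.gz\n"
    "cell2\tbatch1\tmutant\tpath/to/cell2_r1.fastq.gz\n"
)

columns, rows = read_samplesheet(io.StringIO(SHEET))
check_columns(columns, "C1")  # passes silently
print([row["Name"] for row in rows])
```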

Column names can be arbitrary, but the names of the sample name (index) column and the barcode column must be given in the config/plateseq.yml file:

samplesheet:
  file: SampleSheet.txt
  index: Name
  barcode: Barcode
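The index and barcode keys tell the pipeline which samplesheet columns to use. As a sketch of why both are needed for Wafergen data, the hypothetical function below builds a barcode-to-sample lookup from those two configured columns and rejects barcodes assigned to more than one sample. The barcodes in the example are made up, and the sheet is read from memory; in practice it would come from `open(config["samplesheet"]["file"], newline="")`.

```python
import csv
import io


def barcode_to_sample(config, handle):
    """Map each barcode to its sample name using the configured column names."""
    index_col = config["samplesheet"]["index"]
    barcode_col = config["samplesheet"]["barcode"]
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        barcode = row[barcode_col]
        if barcode in mapping:
            raise ValueError(
                f"barcode {barcode!r} is assigned to both "
                f"{mapping[barcode]!r} and {row[index_col]!r}"
            )
        mapping[barcode] = row[index_col]
    return mapping


config = {"samplesheet": {"file": "SampleSheet.txt", "index": "Name", "barcode": "Barcode"}}
sheet = io.StringIO("Name\tBarcode\ncell1\tACGTACGT\ncell2\tTGCATGCA\n")
print(barcode_to_sample(config, sheet))
```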

Run analysis

Running snakemake -s sc-preprocess.snake demultiplexes the reads into per-sample fastq files, if needed, and quantifies the data.
