single-cell-preprocessing

A Snakemake pipeline for preprocessing single-cell RNA-seq data. sc-preprocess handles data from multiple platforms, e.g. C1, Wafergen and DropSeq.

Setup

git clone ...
cd single-cell-preprocessing
conda env create -f environment.yml

Quickstart for DropSeq-based data

Configuration

Edit the file config/dropseq.yml and adapt the options to your needs. Pay attention to the data section and add the paths to individual fastq files like this:

data:
  files:
    condition1:
      r1: path/to/condition1_r1.fastq.gz
      r2: path/to/condition1_r2.fastq.gz
    condition2:
      r1: path/to/condition2_r1.fastq.gz
      r2: path/to/condition2_r2.fastq.gz
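
Before starting a run, it can help to confirm that every condition lists both reads. The sketch below is not part of the pipeline. It assumes the config has already been parsed into a dictionary (for example with a YAML parser), and check_data_section is a hypothetical helper name:

```python
import os


def check_data_section(config, require_existing=True):
    """Return a list of problems found in the data.files section.

    `config` is assumed to be the parsed content of config/dropseq.yml.
    """
    problems = []
    files = config.get("data", {}).get("files", {})
    if not files:
        problems.append("data.files is empty or missing")
    for condition, reads in files.items():
        for read in ("r1", "r2"):
            path = (reads or {}).get(read)
            if not path:
                problems.append(f"{condition}: missing {read}")
            elif require_existing and not os.path.isfile(path):
                problems.append(f"{condition}: {read} not found: {path}")
    return problems


# Hypothetical example mirroring the layout above; condition2 lacks r2.
example = {
    "data": {
        "files": {
            "condition1": {
                "r1": "path/to/condition1_r1.fastq.gz",
                "r2": "path/to/condition1_r2.fastq.gz",
            },
            "condition2": {"r1": "path/to/condition2_r1.fastq.gz"},
        }
    }
}
print(check_data_section(example, require_existing=False))
```

With require_existing=True the helper also reports paths that do not point to a file on disk.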

Run analysis

Running snakemake -s sc-preprocess.snake creates a whitelist of true barcodes per condition, demultiplexes the reads into fastq files for each valid barcode, and quantifies the data.

Quickstart for plate-based data (C1, Wafergen)

Configuration

Edit the file config/plateseq.yml and adapt the options to your needs. If you already have fastq files for each cell (sample), set demultiplex: False in the action section.

Writing the samplesheet

Plate-based data needs a samplesheet that describes each sample to be processed. A samplesheet is a tab-separated table that contains at least a column with the sample name:

Name	Batch	Condition
cell1	batch1	wildtype
cell2	batch1	mutant
cell3	batch2	wildtype
cell4	batch2	mutant

Other columns might be needed, depending on the experimental setup:

C1 data needs a URL_r1 column with the path to the sample's fastq file, but no Barcode column.
Wafergen data needs a Barcode column with the barcode that was used during multiplexing, but no URL_r1 column.

Column names can be arbitrary, but the names of the sample name (index) column and the barcode column must be given in the config/plateseq.yml file:

samplesheet:
  file: SampleSheet.txt
  index: Name
  barcode: Barcode
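
A quick check of the samplesheet against these settings can catch a misnamed column before the pipeline starts. The following is a minimal sketch, not code from the pipeline: check_samplesheet is a hypothetical helper, and the uniqueness check on sample names is an assumption about what an index column should satisfy.

```python
import csv
import io


def check_samplesheet(handle, index="Name", barcode=None, url=None):
    """Check a tab-separated samplesheet read from an open text handle.

    index   -- name of the sample-name column (samplesheet: index)
    barcode -- barcode column to require, e.g. for Wafergen data
    url     -- fastq path column to require, e.g. URL_r1 for C1 data
    """
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []
    problems = []
    for required in (index, barcode, url):
        if required and required not in columns:
            problems.append(f"missing column: {required}")
    if index in columns:
        seen = set()
        for row in reader:
            name = row[index]
            if not name:
                problems.append("empty sample name")
            elif name in seen:
                # Assumption: sample names must be unique to serve as an index.
                problems.append(f"duplicate sample name: {name}")
            seen.add(name)
    return problems


sheet = "Name\tBatch\tCondition\ncell1\tbatch1\twildtype\ncell2\tbatch1\tmutant\n"
print(check_samplesheet(io.StringIO(sheet)))
print(check_samplesheet(io.StringIO(sheet), barcode="Barcode"))
```

For C1 data one would pass url="URL_r1" and leave barcode unset; for Wafergen data, pass the barcode column name configured in config/plateseq.yml.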

Run analysis

Running snakemake -s sc-preprocess.snake demultiplexes the reads into fastq files, if needed, and quantifies the data.
