This repository has been archived by the owner. It is now read-only.
# single-cell-preprocessing
A Snakemake pipeline for preprocessing single-cell RNA-seq data. `sc-preprocess` can handle data from multiple platforms, e.g. C1, Wafergen and DropSeq.
## Setup
```
git clone ...
cd single-cell-preprocessing
conda env create -f environment.yml
```
## Quickstart for DropSeq-based data
#### Configuration
Edit the file `config/dropseq.yml` and adapt the options to your needs. Pay attention to the `data` section and add the paths to individual fastq files like this:
```
data:
  files:
    condition1:
      r1: path/to/condition1_r1.fastq.gz
      r2: path/to/condition1_r2.fastq.gz
    condition2:
      r1: path/to/condition2_r1.fastq.gz
      r2: path/to/condition2_r2.fastq.gz
```
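If there are many conditions, the `data` section can be generated instead of typed by hand. The following is a minimal sketch, not part of the pipeline: it assumes the fastq files are named `<condition>_r1.fastq.gz` and `<condition>_r2.fastq.gz`, and that the keys nest as `data` > `files` > condition > `r1`/`r2`. The helper names `collect_fastq_pairs` and `render_data_section` are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming scheme (not prescribed by the pipeline):
# <condition>_r1.fastq.gz and <condition>_r2.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<condition>.+)_(?P<read>r[12])\.fastq\.gz$")


def collect_fastq_pairs(filenames):
    """Group fastq paths by condition and read (r1/r2)."""
    files = {}
    for name in sorted(filenames):
        match = FASTQ_PATTERN.match(Path(name).name)
        if match is None:
            continue  # skip files that do not follow the assumed scheme
        files.setdefault(match["condition"], {})[match["read"]] = str(name)
    # keep only conditions for which both reads were found
    return {c: reads for c, reads in files.items() if set(reads) == {"r1", "r2"}}


def render_data_section(files):
    """Render the mapping as an indented `data` block for config/dropseq.yml."""
    lines = ["data:", "  files:"]
    for condition, reads in sorted(files.items()):
        lines.append(f"    {condition}:")
        for read in ("r1", "r2"):
            lines.append(f"      {read}: {reads[read]}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        "path/to/condition1_r1.fastq.gz",
        "path/to/condition1_r2.fastq.gz",
        "path/to/condition2_r1.fastq.gz",
        "path/to/condition2_r2.fastq.gz",
    ]
    print(render_data_section(collect_fastq_pairs(example)))
```

The printed block can be pasted into `config/dropseq.yml`. Conditions with only one of the two reads are dropped silently here, so it is worth comparing the result against the files on disk.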
#### Run analysis
A call to `snakemake -s sc-preprocess.snake` will create a whitelist of true barcodes per condition, demultiplex barcodes into fastq files for each valid barcode and quantify the data.
## Quickstart for plate-based data (C1, Wafergen)
#### Configuration
Edit the file `config/plateseq.yml` and adapt the options to your needs. If you already have fastq files for each cell (sample), set `demultiplex: False` in the `action` section.
#### Writing the samplesheet
Plate-based data needs a samplesheet that describes each sample to be processed. A samplesheet is a tab-separated table that contains *at least* a column with the sample name:
| Name | Batch | Condition |
| --- | --- | --- |
| cell1 | batch1 | wildtype |
| cell2 | batch1 | mutant |
| cell3 | batch2 | wildtype |
| cell4 | batch2 | mutant |
Other columns might be needed, depending on the experimental setup:
* **C1 data** needs a `URL_r1` column with the path to the sample's fastq file, but no `Barcode` column.
* **Wafergen data** needs a `Barcode` column with the barcode that is used during multiplexing, but no `URL_r1` column.
The column names can be arbitrary, but the names of the sample name column (`index`) and the barcode column must be given in the `config/plateseq.yml` file:
```
samplesheet:
  file: SampleSheet.txt
  index: Name
  barcode: Barcode
```
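Before starting a run, it can help to check that the samplesheet actually contains the columns named in the config. The sketch below is one way to do that with the Python standard library. It is not part of the pipeline, `check_samplesheet` is a hypothetical helper, and treating duplicate sample names as a problem is an assumption rather than a documented requirement.

```python
import csv
import io


def check_samplesheet(handle, index="Name", barcode=None, url=None):
    """Return a list of problems found in a tab-separated samplesheet.

    `index`, `barcode` and `url` are the column names to require;
    pass None for a column that the platform does not need.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    required = [col for col in (index, barcode, url) if col is not None]
    problems = [f"missing column: {col}" for col in required if col not in header]
    if index not in header:
        return problems  # cannot check sample names without the index column
    seen = set()
    for row in reader:
        name = row[index]
        if not name:
            problems.append("empty sample name")
        elif name in seen:
            # assumption: sample names are expected to be unique
            problems.append(f"duplicate sample name: {name}")
        seen.add(name)
    return problems


if __name__ == "__main__":
    # Made-up Wafergen-style sheet; the barcode sequences are placeholders.
    sheet = (
        "Name\tBatch\tCondition\tBarcode\n"
        "cell1\tbatch1\twildtype\tAAAAAA\n"
        "cell2\tbatch1\tmutant\tCCCCCC\n"
    )
    print(check_samplesheet(io.StringIO(sheet), index="Name", barcode="Barcode"))
```

For C1 data, the same check would be called with `url="URL_r1"` and `barcode=None`; for Wafergen data, with `barcode` set to the barcode column name from the config.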
#### Run analysis
A call to `snakemake -s sc-preprocess.snake` will demultiplex barcodes into fastq files, if needed, and quantify the data.