This repository has been archived by the owner. It is now read-only.

Added quickstart information
jenzopr authored Apr 6, 2018
1 parent 3c9d1ec commit 7368777
Showing 1 changed file with 55 additions and 1 deletion.
56 changes: 55 additions & 1 deletion README.md
@@ -1,2 +1,56 @@
# single-cell-preprocessing
A snakemake pipeline to preprocess data from single cell RNAseq
A snakemake pipeline to preprocess data from single cell RNAseq. `sc-preprocess` can handle data from multiple platforms, e.g. C1, Wafergen and DropSeq.

## Setup
```
git clone ...
cd single-cell-preprocessing
conda env create -f environment.yml
```

## Quickstart for DropSeq-based data
#### Configuration
Edit the file `config/dropseq.yml` and adapt the options to your needs. Pay attention to the `data` section and add the paths to individual fastq files like this:

```
data:
  files:
    condition1:
      r1: path/to/condition1_r1.fastq.gz
      r2: path/to/condition1_r2.fastq.gz
    condition2:
      r1: path/to/condition2_r1.fastq.gz
      r2: path/to/condition2_r2.fastq.gz
```
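Before starting a run, it can help to check that every condition lists both reads and that the files exist. The following is a minimal sketch, not part of the pipeline: `check_data_section` is a hypothetical helper, it takes the `data` section as a plain Python dict (load the YAML file with a parser of your choice), and it assumes that each condition needs both an `r1` and an `r2` entry, as in the example above.

```python
from pathlib import Path


def check_data_section(data):
    """Return a list of problems found in the `data` section (hypothetical helper)."""
    problems = []
    files = data.get("files") or {}
    if not files:
        problems.append("no conditions listed under data.files")
    for condition, reads in files.items():
        for key in ("r1", "r2"):
            # Assumption: both reads are required for every condition.
            path = (reads or {}).get(key)
            if not path:
                problems.append(f"{condition}: missing {key}")
            elif not Path(path).is_file():
                problems.append(f"{condition}: {key} not found: {path}")
    return problems


if __name__ == "__main__":
    # Same layout as the example configuration, written as a plain dict.
    example = {
        "files": {
            "condition1": {
                "r1": "path/to/condition1_r1.fastq.gz",
                "r2": "path/to/condition1_r2.fastq.gz",
            },
        }
    }
    for problem in check_data_section(example):
        print(problem)
```

An empty list means that all listed fastq files were found.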

#### Run analysis
A call to `snakemake -s sc-preprocess.snake` will create a whitelist of true barcodes per condition, demultiplex the reads into one fastq file per valid barcode, and quantify the data.

## Quickstart for plate-based data (C1, Wafergen)
#### Configuration
Edit the file `config/plateseq.yml` and adapt the options to your needs. If you already have fastq files for each cell (sample), set `demultiplex: False` in the `action` section.

#### Writing the samplesheet
Plate-based data needs a samplesheet that gives information for each sample to be processed. A samplesheet is a tab-separated table that contains *at least* a column with the sample name:

| Name | Batch | Condition |
| --- | --- | --- |
| cell1 | batch1 | wildtype |
| cell2 | batch1 | mutant |
| cell3 | batch2 | wildtype |
| cell4 | batch2 | mutant |

Other columns might be needed, depending on the experimental setup:
* **C1 data** needs a `URL_r1` column with the path to each sample's fastq file, but no `Barcode` column.
* **Wafergen data** needs a `Barcode` column with the barcode that is used during multiplexing, but no `URL_r1` column.

The column names can be arbitrary, but the names of the sample name column (`index`) and the barcode column must be given in the `config/plateseq.yml` file:
```
samplesheet:
  file: SampleSheet.txt
  index: Name
  barcode: Barcode
```
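A samplesheet can be sanity-checked against these settings before running the pipeline. The sketch below is an illustration only: `check_samplesheet` is a hypothetical helper, not a function of `sc-preprocess`, and the rule that sample names must be non-empty and unique is an assumption. It reads a tab-separated table and verifies that the configured `index` column (and, if given, the `barcode` column) is present.

```python
import csv
import io


def check_samplesheet(handle, index="Name", barcode=None):
    """Return a list of problems found in a tab-separated samplesheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []
    if index not in columns:
        return [f"missing index column: {index}"]
    problems = []
    if barcode is not None and barcode not in columns:
        problems.append(f"missing barcode column: {barcode}")
    seen = set()
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        name = (row[index] or "").strip()
        if not name:
            problems.append(f"line {line_no}: empty sample name")
        elif name in seen:
            # Assumption: sample names have to be unique.
            problems.append(f"line {line_no}: duplicate sample name {name}")
        seen.add(name)
    return problems


if __name__ == "__main__":
    sheet = io.StringIO(
        "Name\tBatch\tCondition\n"
        "cell1\tbatch1\twildtype\n"
        "cell2\tbatch1\tmutant\n"
    )
    # C1-style sheet: no barcode column is expected, so only `index` is checked.
    print(check_samplesheet(sheet, index="Name"))
```

For a file on disk, pass an open handle instead, for example `open("SampleSheet.txt", newline="")`. For Wafergen data, additionally pass `barcode="Barcode"` (or whatever column name is set in the configuration).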

#### Run analysis
A call to `snakemake -s sc-preprocess.snake` will demultiplex the reads into per-barcode fastq files, if needed, and quantify the data.
