diff --git a/README.md b/README.md
index dd5230d..51973b1 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,56 @@
 # single-cell-preprocessing
-A snakemake pipeline to preprocess data from single cell RNAseq
+A snakemake pipeline to preprocess data from single cell RNAseq. `sc-preprocess` can handle data from multiple platforms, e.g. C1, Wafergen and DropSeq.
+
+## Setup
+```
+git clone ...
+cd single-cell-preprocessing
+conda env create -f environment.yml
+```
+
+## Quickstart for DropSeq-based data
+#### Configuration
+Edit the file `config/dropseq.yml` and adapt the options to your needs. Pay attention to the `data` section and add the paths to the individual fastq files like this:
+
+```
+data:
+  files:
+    condition1:
+      r1: path/to/condition1_r1.fastq.gz
+      r2: path/to/condition1_r2.fastq.gz
+    condition2:
+      r1: path/to/condition2_r1.fastq.gz
+      r2: path/to/condition2_r2.fastq.gz
+```
+
+#### Run analysis
+A call to `snakemake -s sc-preprocess.snake` will create a whitelist of true barcodes per condition, demultiplex barcodes into fastq files for each valid barcode and quantify the data.
+
+## Quickstart for plate-based data (C1, Wafergen)
+#### Configuration
+Edit the file `config/plateseq.yml` and adapt the options to your needs. If you already have fastq files for each cell (sample), set `demultiplex: False` in the `action` section.
+
+#### Writing the samplesheet
+Plate-based data needs a samplesheet that gives information for each sample to be processed. A samplesheet is a tab-separated table that contains *at least* a column with the sample's name:
+
+| Name | Batch | Condition |
+| --- | --- | --- |
+| cell1 | batch1 | wildtype |
+| cell2 | batch1 | mutant |
+| cell3 | batch2 | wildtype |
+| cell4 | batch2 | mutant |
+
+Other columns might be needed, depending on the experimental setup:
+* **C1 data** needs a `URL_r1` column with the path to the sample's fastq file, but no `Barcode` column.
+* **Wafergen data** needs a `Barcode` column with the barcode that is used during multiplexing, but no `URL_r1` column.
+
+Column names can be arbitrary, but the names of the sample name (`index`) and barcode columns must be given in the `config/plateseq.yml` file:
+```
+samplesheet:
+  file: SampleSheet.txt
+  index: Name
+  barcode: Barcode
+```
+
+#### Run analysis
+A call to `snakemake -s sc-preprocess.snake` will demultiplex barcodes into fastq files, if needed, and quantify the data.
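
The samplesheet rules described in the README above can be checked before starting a run. The following is a minimal sketch and not part of the pipeline: the function name `check_samplesheet` is hypothetical, and the requirement that sample names be unique is an assumption the README does not state. It only verifies that the columns named under `index` and `barcode` in `config/plateseq.yml` exist in a tab-separated sheet.

```python
import csv
import io


def check_samplesheet(handle, index="Name", barcode=None):
    """Read a tab-separated samplesheet and run basic sanity checks.

    `index` and `barcode` mirror the keys of the `samplesheet` config section.
    This helper is hypothetical and is not the pipeline's own validation code.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []

    # The index column is always required; the barcode column only when given
    # (e.g. for Wafergen data).
    required = [index] + ([barcode] if barcode else [])
    missing = [name for name in required if name not in columns]
    if missing:
        raise ValueError("samplesheet is missing column(s): " + ", ".join(missing))

    rows = list(reader)

    # Assumption: sample names should be unique so that output files do not collide.
    names = [row[index] for row in rows]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError("duplicate sample name(s): " + ", ".join(duplicates))
    return rows


# The example table from the README, written as tab-separated text.
example = (
    "Name\tBatch\tCondition\n"
    "cell1\tbatch1\twildtype\n"
    "cell2\tbatch1\tmutant\n"
    "cell3\tbatch2\twildtype\n"
    "cell4\tbatch2\tmutant\n"
)

rows = check_samplesheet(io.StringIO(example), index="Name")
print(len(rows), "samples found")
```

Passing `barcode="Barcode"` for this example sheet raises a `ValueError`, because the table has no `Barcode` column. That is the situation a Wafergen run would hit with a C1-style samplesheet.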