Added quickstart information

jenzopr · Apr 6, 2018 · 7368777
1 parent 3c9d1ec
commit 7368777
Showing 1 changed file with 55 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -1,2 +1,56 @@
 # single-cell-preprocessing
-A snakemake pipeline to preprocess data from single cell RNAseq
+A Snakemake pipeline to preprocess single-cell RNA-seq data. `sc-preprocess` handles data from multiple platforms, e.g. C1, Wafergen and DropSeq.
+
+## Setup
+```
+git clone ...
+cd single-cell-preprocessing
+conda env create -f environment.yml
+```
+
+## Quickstart for DropSeq-based data
+#### Configuration
+Edit the file `config/dropseq.yml` and adapt the options to your needs. Pay attention to the `data` section and add the paths to individual fastq files like this:
+
+```
+data:
+  files:
+    condition1:
+      r1: path/to/condition1_r1.fastq.gz
+      r2: path/to/condition1_r2.fastq.gz
+    condition2:
+      r1: path/to/condition2_r1.fastq.gz
+      r2: path/to/condition2_r2.fastq.gz
+```
+
+#### Run analysis
+Running `snakemake -s sc-preprocess.snake` creates a whitelist of true barcodes per condition, demultiplexes the reads into fastq files for each valid barcode, and quantifies the data.
+
+## Quickstart for plate-based data (C1, Wafergen)
+#### Configuration
+Edit the file `config/plateseq.yml` and adapt the options to your needs. If you already have fastq files for each cell (sample), set `demultiplex: False` in the `action` section.
+
+#### Writing the samplesheet
+Plate-based data needs a samplesheet that describes each sample to be processed. A samplesheet is a tab-separated table that contains *at least* a column with the sample name:
+
+| Name  | Batch | Condition |
+| --- | --- | --- |
+| cell1 | batch1 | wildtype |
+| cell2 | batch1 | mutant |
+| cell3 | batch2 | wildtype |
+| cell4 | batch2 | mutant |
+
+Other columns might be needed, depending on the experimental setup:
+* **C1 data** needs a `URL_r1` column with the path to the sample's fastq file, but no `Barcode` column.
+* **Wafergen data** needs a `Barcode` column with the barcode that is used during multiplexing, but no `URL_r1` column.
+
+Column names can be arbitrary, but the names of the sample name column (`index`) and the barcode column must be given in the `config/plateseq.yml` file:
+```
+samplesheet:
+  file: SampleSheet.txt
+  index: Name
+  barcode: Barcode
+```
+
+#### Run analysis
+Running `snakemake -s sc-preprocess.snake` demultiplexes the reads into fastq files, if needed, and quantifies the data.
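
The samplesheet rules in the README can be checked before starting a plate-based run. The following is a minimal sketch, not part of `sc-preprocess`: the helper `check_samplesheet`, its `platform` argument, and the example barcode `ACGT` are assumptions made for illustration. It only checks that the columns the README lists as required are present, using the `index` and `barcode` column names from `config/plateseq.yml`.

```python
import csv
import io


def check_samplesheet(handle, index="Name", barcode="Barcode", platform="wafergen"):
    """Return a list of problems found in a tab-separated samplesheet.

    Hypothetical helper: `index` and `barcode` mirror the keys of the
    `samplesheet` section in config/plateseq.yml; `platform` is an
    illustrative switch, not an option of the pipeline.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []  # None for an empty file
    problems = []

    # Every samplesheet needs at least the sample name column.
    if index not in columns:
        problems.append(f"missing sample name column '{index}'")

    # Per the README: Wafergen data needs the barcode column,
    # C1 data needs a `URL_r1` column instead.
    required = barcode if platform == "wafergen" else "URL_r1"
    if required not in columns:
        problems.append(f"missing column '{required}' for {platform} data")

    return problems


# Made-up example sheet with a barcode column (Wafergen-style).
example = "Name\tBatch\tCondition\tBarcode\ncell1\tbatch1\twildtype\tACGT\n"
print(check_samplesheet(io.StringIO(example)))
print(check_samplesheet(io.StringIO(example), platform="c1"))
```

The first call reports no problems, while the second reports the missing `URL_r1` column. Returning a list of messages rather than raising on the first error lets all problems in a sheet be fixed in one pass.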