
# single-cell-preprocessing
A Snakemake pipeline for preprocessing single-cell RNA-seq data. `sc-preprocess` can handle data from multiple platforms, e.g. C1, Wafergen and DropSeq.

## Setup
```
git clone ...
cd single-cell-preprocessing
conda env create -f environment.yml
```

## Quickstart for DropSeq-based data
#### Configuration
Edit the file `config/dropseq.yml` and adapt the options to your needs. Pay attention to the `data` section and add the paths to individual fastq files like this:

```
data:
  files:
    condition1:
      r1: path/to/condition1_r1.fastq.gz
      r2: path/to/condition1_r2.fastq.gz
    condition2:
      r1: path/to/condition2_r1.fastq.gz
      r2: path/to/condition2_r2.fastq.gz
```
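Before starting a run, it can help to check that every condition lists both read files. The following is a minimal sketch and not part of the pipeline. It assumes the `data` section has already been loaded into a Python dict (for example with a YAML parser), and the function name `check_data_files` is hypothetical. The requirement that each condition has both `r1` and `r2` is inferred from the example above.

```python
def check_data_files(config):
    """Return a list of problems found in the `data: files:` section."""
    problems = []
    files = config.get("data", {}).get("files", {})
    if not files:
        problems.append("no conditions listed under data.files")
    for condition, reads in files.items():
        # Assumption: every condition needs both an r1 and an r2 entry.
        for key in ("r1", "r2"):
            if not (reads or {}).get(key):
                problems.append(f"{condition}: missing {key}")
    return problems


# Hypothetical config mirroring the layout shown above; condition2 lacks r2.
example = {
    "data": {
        "files": {
            "condition1": {
                "r1": "path/to/condition1_r1.fastq.gz",
                "r2": "path/to/condition1_r2.fastq.gz",
            },
            "condition2": {
                "r1": "path/to/condition2_r1.fastq.gz",
            },
        }
    }
}

for problem in check_data_files(example):
    print(problem)
```

An empty result means every condition has both entries. The sketch does not check whether the files exist on disk.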

#### Run analysis
A call to `snakemake -s sc-preprocess.snake` will create a whitelist of true barcodes per condition, demultiplex the reads into one fastq file per valid barcode, and quantify the data.

## Quickstart for plate-based data (C1, Wafergen)
#### Configuration
Edit the file `config/plateseq.yml` and adapt the options to your needs. If you already have fastq files for each cell (sample), set `demultiplex: False` in the `action` section. 

#### Writing the samplesheet
Plate-based data needs a samplesheet that describes each sample to be processed. A samplesheet is a tab-separated table that contains *at least* a column with the sample name:

| Name  | Batch | Condition |
| --- | --- | --- |
| cell1 | batch1 | wildtype |
| cell2 | batch1 | mutant |
| cell3 | batch2 | wildtype |
| cell4 | batch2 | mutant |

Other columns might be needed, depending on the experimental setup:
* **C1 data** needs a `URL_r1` column with the path to the sample's fastq file, but no `Barcode` column.
* **Wafergen data** needs a `Barcode` column with the barcode that is used during multiplexing, but no `URL_r1` column.

Column names can be arbitrary, but the names of the sample name column (`index`) and the barcode column must be given in the `config/plateseq.yml` file:
```
samplesheet:
  file: SampleSheet.txt
  index: Name
  barcode: Barcode
```
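The samplesheet and the `samplesheet` config section have to agree on column names. As a rough illustration, and not part of the pipeline, the sketch below reads a tab-separated samplesheet with Python's standard `csv` module and reports configured columns that are missing. The function `check_samplesheet` is hypothetical, and the duplicate-name check is an extra assumption that sample names should be unique.

```python
import csv
import io


def check_samplesheet(handle, index="Name", barcode=None):
    """Return a list of problems found in a tab-separated samplesheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []
    problems = []
    # The configured index column, and the barcode column if one is set,
    # must appear in the header row.
    for required in (index, barcode):
        if required is not None and required not in columns:
            problems.append(f"missing column: {required}")
    if index in columns:
        seen = set()
        for row in reader:
            name = row[index]
            # Assumption: sample names should be unique.
            if name in seen:
                problems.append(f"duplicate sample name: {name}")
            seen.add(name)
    return problems


# In-memory stand-in for SampleSheet.txt, using the columns from the table above.
sheet = (
    "Name\tBatch\tCondition\n"
    "cell1\tbatch1\twildtype\n"
    "cell2\tbatch1\tmutant\n"
)

print(check_samplesheet(io.StringIO(sheet), index="Name"))
print(check_samplesheet(io.StringIO(sheet), index="Name", barcode="Barcode"))
```

The first call reports no problems. The second reports a missing `Barcode` column, which is the situation to expect when a Wafergen-style config is paired with a samplesheet that has no barcodes.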

#### Run analysis
A call to `snakemake -s sc-preprocess.snake` will demultiplex barcodes into fastq files, if needed, and quantify the data.