# single-cell-preprocessing
A Snakemake pipeline to preprocess single-cell RNA-seq data. `sc-preprocess` can handle data from multiple platforms, e.g. C1, Wafergen and DropSeq.

## Setup
```
git clone ...
cd single-cell-preprocessing
conda env create -f environment.yml
```

## Quickstart for DropSeq-based data
#### Configuration
Edit the file `config/dropseq.yml` and adapt the options to your needs. Pay attention to the `data` section and add the paths to individual fastq files like this:

```
data:
  files:
    condition1:
      r1: path/to/condition1_r1.fastq.gz
      r2: path/to/condition1_r2.fastq.gz
    condition2:
      r1: path/to/condition2_r1.fastq.gz
      r2: path/to/condition2_r2.fastq.gz
```
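Before starting a run, it can help to check that every condition in the `data` section has both read files set. The following is a minimal sketch, not part of the pipeline: the function name `check_data_section` is hypothetical, and it assumes the config has already been parsed into a Python dict (e.g. with a YAML parser) and that each condition needs both `r1` and `r2`, as in the example above.

```python
# Read keys shown in the example config; assumed to be required for every condition.
REQUIRED_READS = ("r1", "r2")


def check_data_section(config):
    """Return a list of problems found in config['data']['files'] (empty if none)."""
    problems = []
    files = (config.get("data") or {}).get("files") or {}
    if not files:
        problems.append("data.files is missing or empty")
    for condition, reads in files.items():
        for key in REQUIRED_READS:
            # A missing key or an empty value both count as a problem.
            if not (reads or {}).get(key):
                problems.append(f"{condition}: missing '{key}'")
    return problems


# The parsed equivalent of the example config above.
example = {
    "data": {
        "files": {
            "condition1": {
                "r1": "path/to/condition1_r1.fastq.gz",
                "r2": "path/to/condition1_r2.fastq.gz",
            },
            "condition2": {
                "r1": "path/to/condition2_r1.fastq.gz",
                "r2": "path/to/condition2_r2.fastq.gz",
            },
        }
    }
}
print(check_data_section(example))
```

An empty list means every condition has both entries; the check does not test whether the files exist on disk.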

#### Run analysis
A call to `snakemake -s sc-preprocess.snake` will create a whitelist of true barcodes per condition, demultiplex the reads into one fastq file per valid barcode, and quantify the data.
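To make the whitelist and demultiplexing steps more concrete, here is a toy sketch in plain Python. It is an illustration only: this README does not describe how the pipeline decides which barcodes are "true", so the read-count threshold used here (`min_reads`) is a stand-in assumption, and the function names `build_whitelist` and `demultiplex` are hypothetical.

```python
from collections import Counter, defaultdict


def build_whitelist(barcodes, min_reads=2):
    """Keep barcodes seen at least `min_reads` times (assumed rule, for illustration)."""
    counts = Counter(barcodes)
    return {bc for bc, n in counts.items() if n >= min_reads}


def demultiplex(reads, whitelist):
    """Group (barcode, sequence) pairs by barcode, dropping barcodes not in the whitelist."""
    per_cell = defaultdict(list)
    for barcode, sequence in reads:
        if barcode in whitelist:
            per_cell[barcode].append(sequence)
    return dict(per_cell)


# Made-up (barcode, sequence) pairs standing in for parsed read pairs.
reads = [
    ("AAAC", "TTGCA"),
    ("AAAC", "GGCTA"),
    ("CCGT", "ACGTT"),
    ("AAAC", "CATGA"),
]
whitelist = build_whitelist(bc for bc, _ in reads)
cells = demultiplex(reads, whitelist)
print(sorted(cells))  # barcodes that passed the threshold
```

In the real pipeline each group would be written to its own fastq file and then quantified; the sketch stops at the grouping step.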

## Quickstart for plate-based data (C1, Wafergen)
#### Configuration
Edit the file `config/plateseq.yml` and adapt the options to your needs. If you already have fastq files for each cell (sample), set `demultiplex: False` in the `action` section.

#### Writing the samplesheet
Plate-based data needs a samplesheet that gives information for each sample to be processed. A samplesheet is a tab-separated table that contains *at least* a column with the sample name:

| Name | Batch | Condition |
| --- | --- | --- |
| cell1 | batch1 | wildtype |
| cell2 | batch1 | mutant |
| cell3 | batch2 | wildtype |
| cell4 | batch2 | mutant |

Other columns might be needed, depending on the experimental setup:
* **C1 data** needs a `URL_r1` column with the path to the sample's fastq file, but no `Barcode` column.
* **Wafergen data** needs a `Barcode` column with the barcode that is used during multiplexing, but no `URL_r1` column.

The column names can be arbitrary, but the names of the sample name (`index`) column and the barcode column need to be given in the `config/plateseq.yml` file:
```
samplesheet:
  file: SampleSheet.txt
  index: Name
  barcode: Barcode
```
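A samplesheet can be checked against these settings before running the pipeline. The sketch below is not part of `sc-preprocess`: the function `check_samplesheet` and its parameters are hypothetical, the barcode values are placeholders, and the uniqueness check on sample names is an assumption based on the column being used as the `index`.

```python
import csv
import io


def check_samplesheet(handle, index="Name", barcode=None, extra_required=()):
    """Read a tab-separated samplesheet and return (rows, problems)."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []
    # Columns that must exist: the index, the barcode column if one is configured,
    # and any platform-specific extras (e.g. URL_r1 for C1 data).
    required = [index] + ([barcode] if barcode else []) + list(extra_required)
    problems = [f"missing column '{c}'" for c in required if c not in columns]
    rows = list(reader)
    if index in columns:
        names = [row[index] for row in rows]
        duplicates = sorted({n for n in names if names.count(n) > 1})
        problems += [f"duplicate sample name '{n}'" for n in duplicates]
    return rows, problems


# Wafergen-style sheet with a Barcode column (barcodes are made-up placeholders).
sheet = io.StringIO(
    "Name\tBatch\tCondition\tBarcode\n"
    "cell1\tbatch1\twildtype\tACGTAC\n"
    "cell2\tbatch1\tmutant\tTGCATG\n"
)
rows, problems = check_samplesheet(sheet, index="Name", barcode="Barcode")
print(problems)
```

For C1 data, the same function could be called with `barcode=None` and `extra_required=("URL_r1",)`. To check a file on disk, pass `open("SampleSheet.txt", newline="")` instead of the `io.StringIO` object.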

#### Run analysis
A call to `snakemake -s sc-preprocess.snake` will demultiplex the reads into fastq files, if needed, and quantify the data.