# single-cell-preprocessing
A Snakemake pipeline for preprocessing single-cell RNA-seq data. `sc-preprocess` can handle data from multiple platforms, e.g. C1, Wafergen and DropSeq.
## Setup
```
git clone ...
cd single-cell-preprocessing
conda env create -f environment.yml
```
## Quickstart for DropSeq-based data
#### Configuration
Edit the file `config/dropseq.yml` and adapt the options to your needs. Pay attention to the `data` section and add the paths to individual fastq files like this:
```
data:
  files:
    condition1:
      r1: path/to/condition1_r1.fastq.gz
      r2: path/to/condition1_r2.fastq.gz
    condition2:
      r1: path/to/condition2_r1.fastq.gz
      r2: path/to/condition2_r2.fastq.gz
```
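Before starting a long run, it can help to check that every condition lists both reads. The following is a minimal sketch, not part of the pipeline: it assumes the config has already been loaded into a Python dict (for example with a YAML parser) with the layout shown above, and that both `r1` and `r2` are required for each condition, which is inferred from the example only. The function name `check_dropseq_files` is hypothetical.

```python
import os


def check_dropseq_files(config, check_exists=True):
    """Return a list of problems found under config['data']['files']."""
    problems = []
    files = config.get("data", {}).get("files", {})
    if not files:
        problems.append("no conditions listed under data.files")
    for condition, reads in files.items():
        for read in ("r1", "r2"):
            path = (reads or {}).get(read)
            if not path:
                problems.append(f"{condition}: missing {read}")
            elif check_exists and not os.path.isfile(path):
                problems.append(f"{condition}: {read} not found at {path}")
    return problems


# Same layout as the YAML example, written as a dict.
# condition2 deliberately lacks r2 to show what a report looks like.
config = {
    "data": {
        "files": {
            "condition1": {
                "r1": "path/to/condition1_r1.fastq.gz",
                "r2": "path/to/condition1_r2.fastq.gz",
            },
            "condition2": {
                "r1": "path/to/condition2_r1.fastq.gz",
            },
        }
    }
}

# check_exists=False skips the file system lookup for this demo.
for problem in check_dropseq_files(config, check_exists=False):
    print(problem)
```

With `check_exists=True` (the default) the function also reports paths that do not point to an existing file.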
#### Run analysis
Running `snakemake -s sc-preprocess.snake` creates a whitelist of true barcodes per condition, demultiplexes the reads into fastq files for each valid barcode, and quantifies the data.
## Quickstart for plate-based data (C1, Wafergen)
#### Configuration
Edit the file `config/plateseq.yml` and adapt the options to your needs. If you already have fastq files for each cell (sample), set `demultiplex: False` in the `action` section.
#### Writing the samplesheet
Plate-based data needs a samplesheet that describes each sample to be processed. A samplesheet is a tab-separated table that contains *at least* a column with the sample names:
| Name | Batch | Condition |
| --- | --- | --- |
| cell1 | batch1 | wildtype |
| cell2 | batch1 | mutant |
| cell3 | batch2 | wildtype |
| cell4 | batch2 | mutant |
Other columns might be needed, depending on the experimental setup:
* **C1 data** needs a `URL_r1` column with the path to the sample's fastq file, but no `Barcode` column.
* **Wafergen data** needs a `Barcode` column with the barcode that is used during multiplexing, but no `URL_r1` column.
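Because the samplesheet is plain tab-separated text, it can be generated from a script instead of being typed by hand. A minimal sketch using Python's standard `csv` module for a Wafergen-style sheet; the helper name `write_samplesheet` is hypothetical and the barcode values are placeholders, not real barcodes:

```python
import csv
import io


def write_samplesheet(rows, handle, columns):
    """Write rows (a list of dicts) as a tab-separated table with a header."""
    writer = csv.DictWriter(
        handle, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


columns = ["Name", "Batch", "Condition", "Barcode"]
rows = [
    # Barcode values are placeholders for illustration only.
    {"Name": "cell1", "Batch": "batch1", "Condition": "wildtype", "Barcode": "BARCODE_1"},
    {"Name": "cell2", "Batch": "batch1", "Condition": "mutant", "Barcode": "BARCODE_2"},
]

# Written to an in-memory buffer here; for a real sheet, pass a file opened
# with open("SampleSheet.txt", "w", newline="") instead.
buffer = io.StringIO()
write_samplesheet(rows, buffer, columns)
print(buffer.getvalue(), end="")
```

For C1 data, the same helper would be called with a `URL_r1` column in place of `Barcode`.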
Column names can be arbitrary, but the columns that hold the sample name (`index`) and the barcode (`barcode`) must be named in the `config/plateseq.yml` file:
```
samplesheet:
  file: SampleSheet.txt
  index: Name
  barcode: Barcode
```
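A quick consistency check between the samplesheet and these config entries can catch a misspelled column name before the pipeline starts. The sketch below is an illustration only: `read_samplesheet` is a hypothetical helper, not part of the pipeline, and rejecting duplicate sample names is an assumption about what a usable index column looks like.

```python
import csv
import io


def read_samplesheet(handle, index, barcode=None):
    """Read a tab-separated samplesheet and return {sample name: row}.

    index and barcode are the column names given in the samplesheet
    section of the config; barcode may be None (e.g. for C1 data).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    wanted = [index] + ([barcode] if barcode else [])
    missing = [column for column in wanted if column not in header]
    if missing:
        raise ValueError(f"samplesheet lacks column(s): {', '.join(missing)}")
    samples = {}
    for row in reader:
        name = row[index]
        if name in samples:
            # Assumption: sample names must be unique.
            raise ValueError(f"duplicate sample name: {name}")
        samples[name] = row
    return samples


# In-memory sheet for the demo; barcode values are placeholders.
sheet = (
    "Name\tBatch\tCondition\tBarcode\n"
    "cell1\tbatch1\twildtype\tBARCODE_1\n"
    "cell2\tbatch1\tmutant\tBARCODE_2\n"
)
samples = read_samplesheet(io.StringIO(sheet), index="Name", barcode="Barcode")
print(sorted(samples))
```

To check a real file, pass `open("SampleSheet.txt", newline="")` together with the `index` and `barcode` values from the config.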
#### Run analysis
Running `snakemake -s sc-preprocess.snake` demultiplexes the reads into per-barcode fastq files, if needed, and quantifies the data.