diff --git a/README.md b/README.md
index dadeb92..cac9d3e 100644
--- a/README.md
+++ b/README.md
@@ -57,51 +57,17 @@ Furthermore, steps can be skipped (to avoid re-running steps unnecessarily). Use
 
     ./run.py -h
 
-## Obtaining and preparing data
+## Further reading
 
-Scripts to download and prepare data from the [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra) are included in
-LSTrAP in the folder **helper**. Furthermore, it is recommended to remove splice variants from the GFF3 files, a script
-to do that is included there as well. Detailed instructions for each script provided to obtain and prepare data can be
-found [here](docs/helper.md)
+ * [LSTrAP output](docs/example_output.md)
+ * [Quality statistics](docs/quality.md): How to check the quality of samples and remove problematic samples
+ * [Helper Scripts](docs/helper.md): To acquire data from the [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra)
+ and process results.
 
-## Quality report
-
-After running LSTrAP a log file (*lstrap.log*) is written, in which samples which failed a quality measure
-are reported. Note that __no samples are excluded from the final network__. In case certain samples need to be excluded
-from the final network remove the htseq file for the sample you which to exclude and re-run the pipeline skipping all
-steps prior to building the network.
-
-    ./run.py config.ini data.ini --skip-interpro --skip-orthology --skip-bowtie-build --skip-trim-fastq --skip-tophat --skip-htseq --skip-qc
-
-More information on how the quality of samples is determined can be found [here](docs/quality.md).
-
-## Output
-
-Apart from the output all tools included generate, LSTrAP will generate raw and normalized expression matrices, a
-co‑expression network and co‑expression clusters.
-
-A detailed overview of files produces, including examples, can be found [here](docs/example_output.md).
-
-## Helper Scripts
-
-LSTrAP comes with a few additional scripts to assist users to download and process data from the [Sequence Read Archive](http://www.ncbi.nlm.nih.gov/sra),
-repeat analyses and the case study reported in the manuscript (Proost et al., *under preparation*).
-
-Details for each script can be found [here](docs/helper.md)
-
-## Running LSTrAP on transcriptome data
-
-To use LSTrAP on a *de novo* assembled transcriptome a little pre-processing is required. Instead of the genome a fasta
-file containing **coding** sequences can be used (remove UTRs). Using the helper script fasta_to_gff.py a gff file suited
-for LSTrAP can be generated.
-
-    python3 fasta_to_gff.py /path/to/transcript.cds.fasta > output.gff
-
-
 ## Contact
 
-LSTrAP was developed by [Sebastian Proost](mailto:proost@mpimp-golm.mpg.de) and [Marek Mutwil](mailto:mutwil@mpimp-golm.mpg.de) at the [Max-Planck Institute for Molecular Plant Physiology](http://www.mpimp-golm.mpg.de/2168/en)
+LSTrAP was developed by [Sebastian Proost](mailto:proost@mpimp-golm.mpg.de) and [Marek Mutwil](mailto:mutwil@gmail.com) at the [Max-Planck Institute for Molecular Plant Physiology](http://www.mpimp-golm.mpg.de/2168/en)
 
 ## Acknowledgements and Funding
 
diff --git a/docs/helper.md b/docs/helper.md
index 3a01543..5d854fb 100644
--- a/docs/helper.md
+++ b/docs/helper.md
@@ -18,6 +18,11 @@ Script to convert sra files into fastq. Sratools is required.
 
     python3 sra_to_fastq.py /sra/files/directory /fastq/output/directory
 
+## Running LSTrAP on transcriptome data
+
+To use LSTrAP on a *de novo* assembled transcriptome a little pre-processing is required. Instead of the genome a fasta
+file containing **coding** sequences can be used (remove UTRs). Using the helper script fasta_to_gff.py a gff file suited
+for LSTrAP can be generated.
 ### parse_gff.py