ADMIRE documentation
================================================
### Analysis of DNA methylation in genomic regions
### Table of contents
1. [Overview and Objective](#overview-and-objective)
2. [Workflow](#workflow)
3. [Parameter overview](#parameters)
4. [Input](#input)
5. [Output](#output)
6. [Examples](#examples)
7. [References](#references)
8. [Citation and License](#citation-and-license)
9. [Contact](#contact)
### Overview and Objective
DNA methylation of cytosine nucleotides is an epigenetic mechanism of gene regulation that affects cellular development and disease progression. Besides whole-genome bisulfite sequencing, Infinium HumanMethylation BeadChips are a versatile and cost-effective tool for investigating changes in methylation patterns at CpG sites, including CpG islands.
ADMIRE was developed as an open source, semi-automatic analysis pipeline and visualization tool for Illumina HumanMethylation450K BeadChip arrays.
#### Features
- Automatic filtering and normalization
- Statistical testing and multiple testing correction
- Supports an arbitrary number of samples and sample groups
- Differential methylation analysis on pre-calculated and individual genomic regions
- Provides ready-to-use files for genome browsers (such as IGV)
- Provides publication-ready figures for the most differentially methylated regions
- Performs gene set enrichment analysis on predefined and individual gene sets
### Workflow
The ADMIRE workflow is shown below:
[<img width="600" src="https://bioinformatics.mpi-bn.mpg.de/static/images/workflow.png"/>](#workflow "ADMIRE workflow")
1. **Import, filter and normalize data**: ADMIRE starts by evaluating the sample sheet (or sample definition file, see below) and reports an error if files are missing or cannot be read. Next, the data is aggregated and a quality control report is generated. Normalization is then performed according to the user-supplied parameters (see [1]). Normalized beta and M values are stored in the `normalized` subdirectory.
2. **Perform one-sided two-sample tests**: Based on the *sample_group* information in the sample sheet or sample definition file, one-sided two-sample tests are performed for each Illumina probe and each pair of sample groups. Two p-values are deliberately obtained per probe, one for higher methylation in each group, so that the p-values of probes within the same genomic region can be combined separately for each direction in step 3.
3. **Combine spatially correlated p-values within genomic regions**: Probe-specific p-values are mapped onto user-defined genomic regions and combined into p-values for the entire region, indicating higher methylation in either sample group. Multiple testing correction is then applied to obtain q-values (see [2]).
4. **Filter significant differentially methylated regions and visualize**: A user-defined q-value cut-off is used to select significantly differentially methylated genomic regions. For visualization, results are aggregated into BED files that can be loaded into IGV (see [3]). Additionally, the output contains tables ready to load into Excel, which can be used to filter for specific genomic locations, p-values, q-values or genes.
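Steps 2 and 3 yield, for each region, one combined p-value per direction of change. As a rough illustration of what "combining" means, the bash sketch below merges the one-sided p-values of the probes in one region using Fisher's method. This is a swapped-in technique chosen because it has a closed form: ADMIRE delegates this step to comb-p [2], which additionally accounts for the spatial autocorrelation of neighbouring probes, so the numbers will differ. The function name `fisher_combine` is hypothetical.

```shell
# Hypothetical helper (bash + awk): combine p-values with Fisher's method.
# Illustration only: unlike comb-p, it ignores spatial correlation between
# neighbouring probes, which makes it anti-conservative for correlated probes.
fisher_combine() {
  [ "$#" -gt 0 ] || return 1            # need at least one p-value
  printf '%s\n' "$@" | awk '
    { x += -2 * log($1); k++ }           # Fisher statistic X = -2 * sum(ln p_i)
    END {
      # X follows a chi-square distribution with 2k degrees of freedom.
      # For even degrees of freedom the upper tail has a closed form:
      # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
      h = x / 2; term = 1; s = 1
      for (i = 1; i < k; i++) { term *= h / i; s += term }
      printf "%.4f\n", exp(-h) * s
    }'
}

# One-sided p-values ("higher methylation in group A") of two probes in a region
fisher_combine 0.05 0.05   # prints 0.0175
```

With a single probe the combined p-value equals the probe's own p-value, which is a quick way to sanity-check the formula.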
### Parameters
A list of parameters can be obtained by calling `admire -h`:
Parameter | Explanation
------------ | -------------
-c | Comma-separated sample sheet (SampleSheet.csv)
-s | Tab-separated sample definition file (design.txt)
-z | Compressed (tar.gz) archive of the idat files (requires -c)
-e | Create a quality control report in PDF format
-r | Region file in BED format (regions.bed); use multiple -r parameters to analyze multiple region files
-p | Detection p-value threshold used to exclude probes prior to analysis (default: 0.01)
-t | Exclude probes for which more than a fraction t of the samples failed the detection p-value threshold (default: 0.4)
-n | Normalization method (one of fn, swan, noob, illumina, raw, quantile)
-b | For functional normalization, skip the noob background correction step
-d | For noob or functional normalization, skip the dye correction step
-f | For quantile normalization, skip fixing outliers prior to analysis
-l | For quantile normalization, label samples as bad if their median signal is below the given value (default: 10.5)
-m | For quantile normalization, remove bad samples
-q | Q-value cut-off after multiple testing correction (default: 0.05)
-i | Render advanced plots for the best i regions (default: 20)
-g | Gene set file for enrichment analysis; use multiple -g parameters to calculate enrichment for several gene sets
-o | Compress the output into the given tar.gz file
-h | Show this help message
-v | Show version information
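Because `-r` and `-g` must be repeated once per file, it can be convenient to build the argument list in a script. The bash sketch below assumes a hypothetical layout with region files in `regions/` and gene set files in `genesets/`; the helper name `admire_args` is also hypothetical, and it only prints the arguments (one per line) rather than calling ADMIRE.

```shell
# Sketch (bash): emit "-r <bed>" for every BED file and "-g <file>" for every
# gene set file, one argument per line. Directory names are assumptions.
shopt -s nullglob   # an empty directory yields no arguments, not a literal glob

admire_args() {
  local region_dir=$1 geneset_dir=$2
  local -a args=()
  local f
  for f in "$region_dir"/*.bed; do args+=(-r "$f"); done
  for f in "$geneset_dir"/*; do args+=(-g "$f"); done
  # printf with an empty array would print a blank line, so guard against it
  if (( ${#args[@]} )); then printf '%s\n' "${args[@]}"; fi
}
```

The printed arguments can then be read into an array and passed on, e.g. `mapfile -t extra < <(admire_args regions genesets)` followed by `admire -s design.txt "${extra[@]}"`. Reading one argument per line keeps file names with spaces intact, though not names containing newlines.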
### Input
ADMIRE supports two use cases, each with its own kind of input.
#### Input from scanning HumanMethylation450 BeadChips on HiScan/iScan systems
The default output of HumanMethylation450 BeadChip-compatible scanner systems consists of
* a SampleSheet.csv file and
* directories named after each chip's Sentrix ID, each containing two `*.idat` files (green and red channel) per sample.
To use the files generated by the scanner system with ADMIRE, all directories must be packed into a single gzip-compressed tar archive (e.g. by running `tar -zcvf compressFileName.tar.gz folderToCompress`).
ADMIRE can then be called with `admire -c SampleSheet.csv -z compressFileName.tar.gz`.
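If the scanner output folder contains more than the Sentrix-ID directories, the archive can be built selectively. The bash sketch below (the helper name `pack_idats` is hypothetical) packs only directories whose names consist of digits, which is an assumption about Sentrix IDs, keeps paths relative (e.g. `8769527070/8769527070_R01C01_Grn.idat`), and prints the number of archived `.idat` files as a quick check for two files per sample. That ADMIRE expects exactly this archive layout is likewise an assumption based on the `tar` example above.

```shell
# Sketch (bash, GNU tar): archive only the Sentrix-ID directories of a scanner
# output folder. ASSUMPTION: Sentrix directories are named by digits only.
shopt -s nullglob

pack_idats() {
  local scan_dir=$1 archive=$2
  local -a dirs=()
  local d
  for d in "$scan_dir"/*/; do
    d=${d%/}; d=${d##*/}                  # bare directory name
    [[ $d =~ ^[0-9]+$ ]] && dirs+=("$d")
  done
  if (( ${#dirs[@]} == 0 )); then
    echo "no Sentrix directories found in $scan_dir" >&2
    return 1
  fi
  # -C stores paths relative to the scan folder (Sentrix-ID/file.idat)
  tar -czf "$archive" -C "$scan_dir" "${dirs[@]}"
  # Print how many .idat files were archived (expect 2 per sample)
  tar -tzf "$archive" | grep -c '\.idat$'
}
```

Passing an absolute path for the archive avoids any ambiguity about where it is written.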
#### Custom input
ADMIRE can also process a tab-separated sample definition file with the following columns:
```
sample_id file channel sample_group
1 8769527070/8769527070_R01C01_Grn.idat Grn control
1 8769527070/8769527070_R01C01_Red.idat Red control
2 8769527070/8769527070_R01C02_Grn.idat Grn treatment
2 8769527070/8769527070_R01C02_Red.idat Red treatment
```
ADMIRE can then be called with `admire -s sample_definition.txt`.
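Before running `admire -s`, a quick consistency check of the sample definition file can catch typos early. The bash/awk sketch below (hypothetical helper `check_sample_definition`) verifies the header, that every `sample_id` has exactly one `Grn` and one `Red` row, and that both rows name the same `sample_group`. These rules are inferred from the two-files-per-sample layout described above; they are not ADMIRE's own validation.

```shell
# Sketch (bash + awk): validate a tab-separated sample definition file.
# Returns 0 if consistent, 1 otherwise; problems are reported on stderr.
# The rules below are ASSUMPTIONS inferred from the documented file layout.
check_sample_definition() {
  awk 'BEGIN { FS = "\t"; bad = 0 }
    NR == 1 {
      if ($1 != "sample_id" || $2 != "file" || $3 != "channel" || $4 != "sample_group") {
        print "unexpected header" | "cat 1>&2"; bad = 1
      }
      next
    }
    NF == 0 { next }                        # ignore blank lines
    {
      n[$1, $3]++                           # rows per (sample_id, channel)
      if (($1 in grp) && grp[$1] != $4) {
        print "sample " $1 ": conflicting sample_group" | "cat 1>&2"; bad = 1
      }
      grp[$1] = $4
    }
    END {
      for (s in grp)
        if (n[s, "Grn"] != 1 || n[s, "Red"] != 1) {
          print "sample " s ": needs exactly one Grn and one Red file" | "cat 1>&2"; bad = 1
        }
      exit bad
    }' "$1"
}
```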
#### Genomic regions
Custom genomic regions should be provided in [BED format](http://genome.ucsc.edu/FAQ/FAQformat.html#format1) and can be given by `admire -r regions1.bed -r regions2.bed ...`.
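Malformed region files are a common source of errors. As a hedged sketch, the hypothetical helper `check_bed` below checks only the minimal BED requirements: at least three tab-separated columns, integer coordinates, and a 0-based start smaller than the end (empty regions cannot contain probes). It skips `track`, `browser` and `#` lines; whether ADMIRE tolerates such header lines is not stated here, so removing them is the safer choice.

```shell
# Sketch (bash + awk): minimal sanity check for one or more BED files.
# Returns 1 and reports offending lines on stderr if any line is invalid.
check_bed() {
  awk 'BEGIN { FS = "\t"; bad = 0 }
    /^(track|browser|#)/ { next }          # header/comment lines are skipped
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
      print FILENAME ":" FNR ": invalid BED line" | "cat 1>&2"; bad = 1
    }
    END { exit bad }' "$@"
}

# e.g. check_bed regions1.bed regions2.bed && admire -s design.txt -r regions1.bed -r regions2.bed
```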
#### Normalization methods
ADMIRE features five normalization methods (see [1] for details):
* Functional normalization `admire -n fn`
* Quantile normalization `admire -n quantile`
* Noob normalization `admire -n noob`
* Illumina Genome Studio normalization `admire -n illumina`
* SWAN normalization `admire -n swan`
ADMIRE is also able to work on raw methylation values when called with `admire -n raw ...`.
In case of functional normalization, an additional noob background correction step can be skipped with `admire -b -n fn`.
In case of functional or noob normalization, the dye correction step can be skipped with `admire -d -n fn` or `admire -d -n noob`.
#### Gene set enrichment analysis
Gene sets can be given with `admire -g geneset1 -g geneset2 ...`. Gene set files should be text files with one gene symbol per line. The filename is used to name the gene set.
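Gene lists copied from spreadsheets or papers often contain blank lines, surrounding whitespace, Windows line endings or duplicates. The hypothetical bash helper `clean_gene_set` below turns such a list into one symbol per line, keeping the first occurrence of each symbol. Symbols are not case-converted, since case conventions differ between species; whether ADMIRE matches symbols case-sensitively is not documented here.

```shell
# Sketch (bash + awk): clean a raw gene list into a gene set file.
# Usage: clean_gene_set RAW_LIST OUTPUT_FILE
# The output file name becomes the gene set name in ADMIRE.
clean_gene_set() {
  tr -d '\r' < "$1" |                               # drop Windows line endings
    awk '{ gsub(/^[ \t]+|[ \t]+$/, "")               # trim surrounding whitespace
           if ($0 != "" && !seen[$0]++) print }' > "$2"   # skip blanks and duplicates
}

# e.g. clean_gene_set hox_raw.txt hox_genes && admire -s design.txt -g hox_genes
```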
#### Other available parameters
ADMIRE features the following other parameters:
* Q-value cutoff after multiple testing correction `admire -q 0.05`
* Quality control report in PDF format `admire -e report.pdf`
* Tar-gz compressed output file `admire -o output.tar.gz`
* Help and version information `admire -h` and `admire -v`
### Output
ADMIRE creates a number of output directories and files that are described below:
#### Files in the excel subdirectory
This subdirectory contains a csv file for each combination of sample group comparison (e.g. case-vs-control) and genomic region (e.g. promoters), with information about the genomic feature, its genomic location as well as p- and q-values of the sample groups.
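These tables can also be filtered on the command line. In the bash/awk sketch below (hypothetical helper `filter_by_qvalue`), the q-value column is looked up by its header name, which you must pass in because the exact header used by ADMIRE is not documented here. It assumes plain comma-separated fields without embedded commas, strips double quotes from header names, and drops rows whose q-value is not numeric (e.g. `NA`).

```shell
# Sketch (bash + awk): keep the header plus all rows with q-value < cut-off.
# Usage: filter_by_qvalue TABLE.csv COLUMN_NAME CUTOFF
# ASSUMPTION: no quoted fields containing commas.
filter_by_qvalue() {
  local csv=$1 column=$2 cutoff=$3
  awk -F',' -v col="$column" -v cut="$cutoff" '
    NR == 1 {
      for (i = 1; i <= NF; i++) { h = $i; gsub(/"/, "", h); if (h == col) c = i }
      if (!c) { print "column not found: " col | "cat 1>&2"; exit 1 }
      print; next
    }
    $c ~ /^[0-9.eE+-]+$/ && $c + 0 < cut + 0   # numeric and below the cut-off
  ' "$csv"
}
```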
#### Files in the visualization subdirectory
This subdirectory contains files for visualization with IGV. General files, such as the genomic locations of all Illumina probes and the genomic regions used during analysis, are located in the `annotation-tracks` subdirectory. Data-specific files are located in the `data-tracks` subdirectory. For each sample group comparison (e.g. case-vs-control), it contains information on significantly altered probe methylation (`control-case.igv`) as well as significantly altered genomic regions in [BED format](http://genome.ucsc.edu/FAQ/FAQformat.html#format1).
Additional images are stored in region-specific subdirectories.
#### Files in the normalized subdirectory
When ADMIRE is not run inside the Galaxy environment, the `normalized` subdirectory contains two matrices per normalization method: one with (normalized) beta values and one with (normalized) M values. Each row corresponds to a single Illumina probe and each column to a sample.
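Beta and M values are related by the standard logit transformation M = log2(beta / (1 - beta)). The bash/awk sketch below (hypothetical helper `beta_to_m`) applies it to a beta matrix, for example to cross-check the files in `normalized`. It assumes a tab-separated layout with a header row and probe IDs in the first column, which may differ from ADMIRE's actual files, and it omits the small offset some tools add; beta values of 0 or 1 and non-numeric entries are written as `NA` instead of infinite values.

```shell
# Sketch (bash + awk): convert a beta-value matrix to M values.
# ASSUMED layout: tab-separated, header row, probe ID in column 1.
beta_to_m() {
  awk 'BEGIN { FS = OFS = "\t" }
    NR == 1 { print; next }                # copy the header (sample names) unchanged
    {
      for (i = 2; i <= NF; i++) {
        b = $i + 0                         # non-numeric entries become 0, i.e. NA
        if (b <= 0 || b >= 1) $i = "NA"
        else $i = sprintf("%.4f", log(b / (1 - b)) / log(2))  # awk log() is natural log
      }
      print
    }' "$1"
}
```

A beta value of 0.5 maps to an M value of 0, and 0.8 and 0.2 map to +2 and -2, which makes the symmetry of the M scale around 0 easy to check.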
#### Files in the results subdirectory
When ADMIRE is not run inside the Galaxy environment, the `results` subdirectory contains intermediate result files, such as the output of statistical testing (`control-case.pvals.bed`) or the results of combining p-values (`comb-p` subdirectory).
#### Files in the geneset-enrichment subdirectory
tba
### Examples
#### Detection of differentially methylated promoters in permanent atrial fibrillation
This example uses data from a genome-wide DNA methylation study of permanent atrial fibrillation ([GEO dataset](http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-62727/)). The red- and green-channel `*.idat` files were downloaded for all 11 samples, and a custom sample definition file was created as follows:
```
sample_id file channel sample_group
1 GSM1532419_6929718127_R01C01_Grn.idat Grn fibrillation
1 GSM1532419_6929718127_R01C01_Red.idat Red fibrillation
2 GSM1532420_6929718127_R01C02_Grn.idat Grn fibrillation
2 GSM1532420_6929718127_R01C02_Red.idat Red fibrillation
3 GSM1532421_6929718127_R02C01_Grn.idat Grn fibrillation
3 GSM1532421_6929718127_R02C01_Red.idat Red fibrillation
4 GSM1532422_6929718127_R02C02_Grn.idat Grn fibrillation
4 GSM1532422_6929718127_R02C02_Red.idat Red fibrillation
5 GSM1532423_6929718127_R03C01_Grn.idat Grn fibrillation
5 GSM1532423_6929718127_R03C01_Red.idat Red fibrillation
6 GSM1532424_6929718127_R03C02_Grn.idat Grn fibrillation
6 GSM1532424_6929718127_R03C02_Red.idat Red fibrillation
7 GSM1532425_6929718127_R04C01_Grn.idat Grn fibrillation
7 GSM1532425_6929718127_R04C01_Red.idat Red fibrillation
8 GSM1532426_6929718127_R06C02_Grn.idat Grn control
8 GSM1532426_6929718127_R06C02_Red.idat Red control
9 GSM1532427_6929718167_R01C01_Grn.idat Grn control
9 GSM1532427_6929718167_R01C01_Red.idat Red control
10 GSM1532428_6929718167_R02C01_Grn.idat Grn control
10 GSM1532428_6929718167_R02C01_Red.idat Red control
11 GSM1532429_6929718167_R03C01_Grn.idat Grn control
11 GSM1532429_6929718167_R03C01_Red.idat Red control
```
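Writing such a file by hand is error-prone for larger studies. The bash sketch below (hypothetical helper `make_sample_definition`) generates it from a directory of downloaded `*.idat` files and a hypothetical two-column, tab-separated mapping file of GSM IDs to groups. For this dataset the mapping would assign GSM1532419 to GSM1532425 to `fibrillation` and GSM1532426 to GSM1532429 to `control`, as in the table above. Samples are numbered in sorted filename order, and the helper stops if a red-channel file or a group assignment is missing.

```shell
# Sketch (bash): build a sample definition file from GEO-style idat names
# (GSMxxxx_<Sentrix>_<position>_Grn.idat / _Red.idat).
# Usage: make_sample_definition IDAT_DIR GROUPS_TSV > sample_definition.txt
# GROUPS_TSV is a hypothetical mapping "GSM_ID<TAB>group" written by you.
make_sample_definition() {
  local idat_dir=$1 groups=$2
  local id=0 grn red gsm group
  printf 'sample_id\tfile\tchannel\tsample_group\n'
  for grn in "$idat_dir"/*_Grn.idat; do
    [ -e "$grn" ] || continue                       # no matching files
    red=${grn%_Grn.idat}_Red.idat
    [ -e "$red" ] || { echo "missing red channel for $grn" >&2; return 1; }
    gsm=${grn##*/}; gsm=${gsm%%_*}                  # e.g. GSM1532419
    group=$(awk -F'\t' -v g="$gsm" '$1 == g { print $2; exit }' "$groups")
    [ -n "$group" ] || { echo "no group for $gsm" >&2; return 1; }
    id=$((id + 1))
    # one row per channel; the format string is reused for the second row
    printf '%s\t%s\t%s\t%s\n' "$id" "${grn##*/}" Grn "$group" \
                              "$id" "${red##*/}" Red "$group"
  done
}
```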
A file containing all promoter regions is created from the [GENCODE V19 annotation](ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gff3.gz) by extracting the 2 kb upstream of each gene's TSS:
```
zcat gencode.v19.annotation.gff3.gz | grep '##sequence-region' | cut -d ' ' -f 2,4 | sed 's/ /\t/g' > hg19.genome
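gunzip gencode.v19.annotation.gff3.gz
# keep gene records only; this creates the gencode.v19.genes.gff3 file used below
awk 'BEGIN{FS="\t";OFS="\t"} $3=="gene"' gencode.v19.annotation.gff3 > gencode.v19.genes.gff3
# bedtools keeps GFF (1-based) coordinates; BED starts are 0-based, hence $4-1
bedtools flank -i gencode.v19.genes.gff3 -g hg19.genome -l 2000 -r 0 -s | awk 'BEGIN{FS="\t";OFS="\t"}{split($9,a,";");split(a[6],gn,"=");print $1,$4-1,$5,"ID="gn[2]"-promoter;"a[2]";"a[4]";"a[6],".",$7}' > gencode.v19.genes_promoter2000.bed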
```
ADMIRE is then called by
```
admire -s sample_definition.txt -e quality_report.pdf -r gencode.v19.genes_promoter2000.bed > log.txt &
```
### References
1. Aryee et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. *Bioinformatics* (**2014**), doi: 10.1093/bioinformatics/btu049
2. Pedersen et al. Comb-p: software for combining, analyzing, grouping and correcting spatially correlated P-values. *Bioinformatics* (**2012**), doi: 10.1093/bioinformatics/bts545
3. Thorvaldsdottir et al. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. *Brief. Bioinformatics* (**2013**), doi: 10.1093/bib/bbs017
### Citation and License
Please cite Preussner J, Kuenne C and Looso, M. ADMIRE: Analysis and visualization of differential methylation in genomic regions using Infinium HumanMethylation450K Chips. *???* (2015), doi:tba
#### The MIT License (MIT)
Copyright (c) 2015 Jens Preussner and Mario Looso
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
### Contact
In case of further questions, bugs or contributions, feel free to send an e-mail to Jens Preussner (jens.preussner@mpi-bn.mpg.de).