ADMIRE documentation
ADMIRE - Analysis of DNA methylation in genomic regions
Table of contents
- Overview and Objective
- Workflow
- Parameter overview
- Input
- Output
- Examples
- Citation and License
- Contact
Overview and Objective
DNA methylation at cytosine nucleotides is an epigenetic mechanism of gene regulation that affects cellular development and the stage of a disease. Besides whole-genome bisulfite sequencing, Infinium HumanMethylation BeadChips are a versatile and cost-effective tool for investigating changes in methylation patterns at CpG islands. ADMIRE was developed as an open-source, semi-automatic analysis pipeline and visualization tool for Illumina HumanMethylation450K BeadChip arrays.
Features
- Automatic filtering and normalization
- Statistical testing and multiple testing correction
- Supports arbitrary number of samples and sample groups
- Differential methylation analysis on pre-calculated and individual genomic regions
- Provides ready-to-plug-in files for genome browsers (like IGV)
Workflow
The ADMIRE workflow is shown below:
- Import, filter and normalize data: ADMIRE starts with evaluating the sample sheet (or sample definition file, see below). The user receives an error message in case files are missing or cannot be read. Next, the data is aggregated and a quality control report is generated. Normalization is then performed according to the parameters supplied by the user. Normalized beta and m values are stored in the normalized subdirectory.
- Perform one-sided two-sample tests: Based on the sample_group information in the sample sheet or sample definition file, one-sided two-sample tests are performed per Illumina probe and between pairs of two sample groups. Intentionally, two p-values are obtained for each probe, indicating a higher methylation in either group and allowing the combination of multiple p-values from within the same genomic region in step 3.
- Combine spatially correlated p-values with genomic regions: Probe-specific p-values are mapped onto user-defined genomic regions, and combined p-values are calculated for each entire genomic region, indicating a higher methylation in either sample group. Multiple testing correction is applied to obtain q-values.
- Filter significant differentially methylated regions and visualize: A user-defined q-value cut-off is used to filter for significantly differentially methylated genomic regions. For visualization, results are aggregated into BED files that can be loaded into IGV. Additionally, the output contains tables ready to load into Excel that can be used to filter for specific genomic locations, p-values, q-values or genes.
Parameters
A list of parameters can be obtained by calling ADMIRE -h:
Parameter | Explanation |
---|---|
-c | Comma separated sample definition file (SampleSheet.csv) |
-s | Tab separated sample definition file (design.txt) |
-z | Compressed input of idat files (requires -c). |
-e | Create quality control report in PDF |
-r | Region file in bed format (regions.bed), use multiple -r parameters to calculate for multiple region files |
-p | Detection p-value to exclude probes prior to analysis (0.01) |
-t | Exclude probes where more than t% of samples failed according to the detection p-value. (0.4) |
-n | Normalization method (fn,swan,noob,illumina,raw,quantile) |
-b | In case of functional normalization, skip noob background correction step |
-d | In case of noob or functional normalization, skip dye correction step |
-f | In case of quantile normalization, skip fixing outliers prior to analysis |
-l | In case of quantile normalization, label samples as bad if their median signals are below a given value (10.5) |
-m | In case of quantile normalization, remove bad samples |
-q | Q-value cutoff for multiple testing correction (0.05) |
-o | tar-gz compress output into file given |
-h | shows this help message |
-v | shows version information |
Input
ADMIRE has two different use cases and can handle two different inputs.
Input from scanning HumanMethylation450 BeadChips on HiScan/iScan systems
The default output of HumanMethylation450 BeadChip compatible scanner systems consists of
- a SampleSheet.csv file and
- file directories named after the chip's Sentrix ID, each containing two *.idat files per sample.
To use the files generated by the scanner system with ADMIRE, all file directories have to be compressed (e.g. by running tar -zcvf compressFileName.tar.gz folderToCompress).
ADMIRE can then be called with ADMIRE -c SampleSheet.csv -z compressFileName.tar.gz
Custom input
ADMIRE is also able to process a tab-separated sample definition file, with the following columns:
sample_id file channel sample_group
1 8769527070/8769527070_R01C01_Grn.idat Grn control
1 8769527070/8769527070_R01C01_Red.idat Red control
2 8769527070/8769527070_R01C02_Grn.idat Grn treatment
3 8769527070/8769527070_R02C01_Red.idat Red treatment
ADMIRE can then be called with ADMIRE -s sample_definition.txt.
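Before starting a run, it can help to check that the sample definition file is complete. The following is a minimal sketch, assuming a tab-separated file with the header shown above and assuming that every sample needs both a Grn and a Red row. The helper name check_channels is hypothetical and not part of ADMIRE.

```shell
# Hypothetical helper (not part of ADMIRE): print every sample_id that
# lacks a Grn row or a Red row in a tab-separated sample definition file.
check_channels() {
  awk -F'\t' '
    NR == 1      { next }          # skip the header row
                 { seen[$1] = 1 }  # remember every sample_id
    $3 == "Grn"  { grn[$1] = 1 }
    $3 == "Red"  { red[$1] = 1 }
    END {
      for (id in seen)
        if (!(id in grn) || !(id in red))
          print id
    }
  ' "$1" | sort -n
}
```

Calling check_channels sample_definition.txt prints nothing when every sample has both channels, and otherwise lists the incomplete sample_ids, one per line.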
Genomic regions
Custom genomic regions should be provided in BED format and are passed with one -r parameter per file: ADMIRE -r regions1.bed -r regions2.bed ...
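A quick sanity check of a custom region file can catch formatting problems early. The sketch below only tests the three mandatory BED columns (chromosome, integer start, integer end, with start not greater than end). The helper name check_bed is hypothetical, and how ADMIRE itself validates region files is not covered here.

```shell
# Hypothetical helper (not part of ADMIRE): report records that do not
# look like valid BED lines. Returns a non-zero status if any are found.
check_bed() {
  awk -F'\t' '
    /^(#|track|browser)/ { next }   # skip comments and header lines
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || ($2 + 0) > ($3 + 0) {
      print NR ": " $0              # line number and offending record
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

For example, check_bed regions1.bed prints each suspicious record prefixed by its line number, so it can be used in a script as: check_bed regions1.bed || echo "please fix regions1.bed".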
Normalization methods
ADMIRE features five different normalization methods (see Aryee, M.J., et al. Bioinformatics (2014) for details):
- Functional normalization: ADMIRE -n fn
- Quantile normalization: ADMIRE -n quantile
- Noob normalization: ADMIRE -n noob
- Illumina Genome Studio normalization: ADMIRE -n illumina
- SWAN normalization: ADMIRE -n swan
When called with ADMIRE -n raw ..., ADMIRE is also able to work on raw methylation values.
In case of functional normalization, an additional noob background correction step can be skipped with ADMIRE -b -n fn.
In case of functional or noob normalization, a dye correction step can be skipped with ADMIRE -d -n fn or ADMIRE -d -n noob.
Other available parameters
ADMIRE features the following other parameters:
- Q-value cutoff after multiple testing correction: ADMIRE -q 0.05
- Quality control report in PDF format: ADMIRE -e report.pdf
- Tar-gz compressed output file: ADMIRE -o output.tar.gz
- Help and version information: ADMIRE -h -v
Output
ADMIRE creates a number of output directories and files that are described below:
Files in the excel subdirectory
This subdirectory contains a csv file for each combination of sample group comparison (e.g. case-vs-control) and genomic region (e.g. promoters), with information about the genomic feature, its genomic location as well as p- and q-values of the sample groups.
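Besides Excel, these tables can also be filtered on the command line. The following is a minimal sketch, assuming plain comma-separated fields without quoted commas. The helper name filter_by_q is hypothetical, and because the column layout is not specified here, the q-value column is passed as an index that should be checked against the header of the actual file.

```shell
# Hypothetical helper (not part of ADMIRE): keep the header line and all
# rows whose value in column COL is numeric and <= CUTOFF.
# Usage: filter_by_q FILE COL CUTOFF
filter_by_q() {
  awk -F',' -v col="$2" -v cutoff="$3" '
    NR == 1 { print; next }                       # always keep the header
    $col ~ /^[0-9.eE+-]+$/ && ($col + 0) <= (cutoff + 0) { print }
  ' "$1"
}
```

A call such as filter_by_q table.csv 3 0.05 would keep rows whose third column is at most 0.05; table.csv and the column index 3 are placeholders.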
Files in the visualization subdirectory
This subdirectory contains files for visualization with IGV. General files, like the genomic location of all Illumina probes as well as the genomic regions used during analysis, are located in the annotation-tracks subfolder. Data-specific files are located in the data-tracks folder. There, for each sample-group comparison (e.g. case-vs-control), you can find information on significantly altered probe methylation (control-case.igv) as well as significantly altered genomic regions in BED format.
Files in the normalized subdirectory
When ADMIRE is not run inside the Galaxy environment, the normalized subdirectory contains two matrices per normalization method, one with (normalized) beta values, the other with (normalized) m values. Each row corresponds to a single Illumina probe, each column represents a sample.
Files in the results subdirectory
When ADMIRE is not run inside the Galaxy environment, the results subdirectory contains intermediate result files, like the output from statistical testing (control-case.pvals.bed) or results from combining p-values (comb-p subdirectory).
Examples
Detection of differentially methylated promoters in permanent atrial fibrillation
The analysis uses data from a genome-wide DNA methylation study in permanent atrial fibrillation (GEO dataset). The *.idat files for the red and green channels were downloaded for all 11 samples. A custom sample definition file is generated like the following:
sample_id file channel sample_group
1 GSM1532419_6929718127_R01C01_Grn.idat Grn fibrillation
1 GSM1532419_6929718127_R01C01_Red.idat Red fibrillation
2 GSM1532420_6929718127_R01C02_Grn.idat Grn fibrillation
2 GSM1532420_6929718127_R01C02_Red.idat Red fibrillation
3 GSM1532421_6929718127_R02C01_Grn.idat Grn fibrillation
3 GSM1532421_6929718127_R02C01_Red.idat Red fibrillation
4 GSM1532422_6929718127_R02C02_Grn.idat Grn fibrillation
4 GSM1532422_6929718127_R02C02_Red.idat Red fibrillation
5 GSM1532423_6929718127_R03C01_Grn.idat Grn fibrillation
5 GSM1532423_6929718127_R03C01_Red.idat Red fibrillation
6 GSM1532424_6929718127_R03C02_Grn.idat Grn fibrillation
6 GSM1532424_6929718127_R03C02_Red.idat Red fibrillation
7 GSM1532425_6929718127_R04C01_Grn.idat Grn fibrillation
7 GSM1532425_6929718127_R04C01_Red.idat Red fibrillation
8 GSM1532426_6929718127_R06C02_Grn.idat Grn control
8 GSM1532426_6929718127_R06C02_Red.idat Red control
9 GSM1532427_6929718167_R01C01_Grn.idat Grn control
9 GSM1532427_6929718167_R01C01_Red.idat Red control
10 GSM1532428_6929718167_R02C01_Grn.idat Grn control
10 GSM1532428_6929718167_R02C01_Red.idat Red control
11 GSM1532429_6929718167_R03C01_Grn.idat Grn control
11 GSM1532429_6929718167_R03C01_Red.idat Red control
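To confirm that the group assignment in such a file matches the study design, the samples per group can be counted. This is a minimal sketch, assuming the file is saved tab-separated with the header shown above; the helper name count_groups is hypothetical and not part of ADMIRE.

```shell
# Hypothetical helper (not part of ADMIRE): count distinct sample_ids
# per sample_group in a tab-separated sample definition file.
count_groups() {
  awk -F'\t' '
    NR == 1 { next }                                  # skip the header row
    !(($1, $4) in seen) { seen[$1, $4] = 1; n[$4]++ } # count each sample once
    END { for (g in n) print g "\t" n[g] }
  ' "$1" | sort
}
```

For the table above, count_groups sample_definition.txt should report 4 control and 7 fibrillation samples, since each sample contributes two rows but is counted once.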
A file containing all promoter regions is created from the GENCODE V19 annotation by extracting the 2 kb region upstream of each TSS (transcription start site):
zcat gencode.v19.annotation.gff3.gz | grep '##sequence-region' | cut -d ' ' -f 2,4 | sed 's/ /\t/g' > hg19.genome
gunzip gencode.v19.annotation.gff3.gz
# Assumed step: gencode.v19.genes.gff3 is not created by the commands shown; keeping only "gene" features is one plausible way to produce it
awk -F'\t' '$3 == "gene"' gencode.v19.annotation.gff3 > gencode.v19.genes.gff3
bedtools flank -i gencode.v19.genes.gff3 -g hg19.genome -l 2000 -r 0 -s | awk 'BEGIN{FS="\t";OFS="\t"}{split($9,a,";");split(a[6],gn,"=");print $1,$4,$5,"ID="gn[2]"-promoter;"a[2]";"a[4]";"a[6],".",$7}' > gencode.v19.genes_promoter2000.bed
ADMIRE is then called by
ADMIRE -s sample_definition.txt -e quality_report.pdf -r gencode.v19.genes_promoter2000.bed > log.txt &
Citation and License
Please cite Preussner J, Kuenne C and Looso, M. ADMIRE: Analysis and visualization of differential methylation in genomic regions using Infinium HumanMethylation450K Chips. Bioinformatics (2015), doi:tba
The MIT License (MIT)
Copyright (c) 2015 Jens Preussner and Mario Looso
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Contact
In case of further questions, bugs or contributions, feel free to send an e-mail to Jens Preußner (jens.preussner@mpi-bn.mpg.de).