ADMIRE documentation

Analysis of DNA methylation in genomic regions

Table of contents

  1. Overview and Objective
  2. Workflow
  3. Parameters
  4. Input
  5. Output
  6. Examples
  7. References
  8. Citation and License
  9. Contact

Overview and Objective

DNA methylation at cytosine nucleotides is an epigenetic mechanism of gene regulation that affects cellular development and the stage of a disease. Besides whole genome bisulfite sequencing, Infinium HumanMethylation BeadChips are a versatile and cost-effective tool to investigate changes of methylation patterns at CpG islands. ADMIRE was developed as an open source, semi-automatic analysis pipeline and visualization tool for Illumina HumanMethylation450K BeadChip arrays.

Features

  • Automatic filtering and normalization
  • Statistical testing and multiple testing correction
  • Supports arbitrary number of samples and sample groups
  • Differential methylation analysis on pre-calculated and individual genomic regions
  • Provides ready-to-plug-in files for genome browsers (like IGV)
  • Provides publication-ready figures for the most differentially methylated regions
  • Performs gene set enrichment analysis on predefined and individual gene sets

Workflow

The ADMIRE workflow is shown below:

  1. Import, filter and normalize data: ADMIRE starts with evaluating the sample sheet (or sample definition file, see below). The user receives an error message in case files are missing or cannot be read. Next, the data is aggregated and a quality control report is generated. Normalization is then performed according to the parameters supplied by the user (see [1]). Normalized beta and m values are stored in the normalized subdirectory.
  2. Perform one-sided two-sample tests: Based on the sample_group information in the sample sheet or sample definition file, one-sided two-sample tests are performed per Illumina probe and between pairs of two sample groups. Intentionally, two p-values are obtained for each probe, indicating a higher methylation in either group and allowing the combination of multiple p-values from within the same genomic region in step 3.
  3. Combine spatially correlated p-values with genomic regions: Probe-specific p-values are mapped onto user-defined genomic regions, and combined p-values are calculated for each genomic region, indicating a higher methylation in either sample group. Multiple testing correction is applied to obtain q-values (see [2]).
  4. Filter significantly differentially methylated regions and visualize: A user-defined q-value cut-off is used to filter for significantly differentially methylated genomic regions. For visualization, results are aggregated into BED files that can be loaded into IGV (see [3]). Additionally, the output contains tables that can be loaded into Excel and filtered for specific genomic locations, p-values, q-values or genes.
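
Step 3 ends with a multiple testing correction. To show how q-values relate to a list of p-values, the sketch below applies the Benjamini-Hochberg procedure. This is an assumption made for illustration only: the correction ADMIRE actually applies is the one described in [2], and the function name bh_qvalues is hypothetical.

```shell
# Illustrative only, not taken from ADMIRE.
# stdin: one p-value per line; stdout: "p<TAB>q", sorted by ascending p.
bh_qvalues() {
  LC_ALL=C sort -g | awk '
    { p[NR] = $1 }                      # p-values arrive sorted ascending
    END {
      n = NR; running_min = 1
      for (i = n; i >= 1; i--) {        # walk from largest to smallest p
        q = p[i] * n / i                # raw adjustment for rank i
        if (q < running_min) running_min = q
        qv[i] = running_min             # keep q-values monotone
      }
      for (i = 1; i <= n; i++) printf "%s\t%.4g\n", p[i], qv[i]
    }'
}
```

Regions whose q-value is at or below the cut-off given with -q would then count as significant.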

Parameters

A list of parameters can be obtained by calling admire -h:

Parameter	Explanation
-c	Comma-separated sample definition file (SampleSheet.csv)
-s	Tab-separated sample definition file (design.txt)
-z	Compressed input of idat files (requires -c)
-e	Create quality control report in PDF
-r	Region file in BED format (regions.bed); use multiple -r parameters to calculate for multiple region files
-p	Detection p-value to exclude probes prior to analysis (0.01)
-t	Exclude probes where more than t% of samples failed according to the detection p-value (0.4)
-n	Normalization method (fn, swan, noob, illumina, raw, quantile)
-b	In case of functional normalization, skip noob background correction step
-d	In case of noob or functional normalization, skip dye correction step
-f	In case of quantile normalization, skip fixing outliers prior to analysis
-l	In case of quantile normalization, label samples as bad if their median signals are below a given value (10.5)
-m	In case of quantile normalization, remove bad samples
-q	Q-value cutoff for multiple testing correction (0.05)
-i	Render advanced plots for the best i regions (20)
-g	Gene set file for enrichment analysis; use multiple -g parameters to calculate enrichment over many gene sets
-o	Compress output (tar.gz) into the given file
-h	Show this help message
-v	Show version information
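
The -p and -t options work together. The sketch below shows one reading of them: a measurement counts as failed when its detection p-value exceeds -p, and a probe is dropped when the share of failed samples exceeds -t. It assumes that -t is a fraction (0.4 meaning 40%) and that detection p-values are available as a tab-separated matrix with a header row and the probe ID in the first column. Both the layout and the name filter_probes are hypothetical and not part of ADMIRE.

```shell
# Hypothetical helper, not part of ADMIRE.
# stdin: tab-separated matrix, header row, probe ID in column 1,
#        one detection p-value per sample in the remaining columns.
# $1 = detection p-value threshold (like -p)
# $2 = maximum tolerated fraction of failed samples (like -t)
filter_probes() {
  awk -F '\t' -v p="$1" -v t="$2" '
    NR == 1 { print; next }             # keep the header
    NF > 1 {
      failed = 0
      for (i = 2; i <= NF; i++) if ($i + 0 > p + 0) failed++
      if (failed / (NF - 1) <= t + 0) print   # keep probes within tolerance
    }'
}
```

For example, `filter_probes 0.01 0.4 < detection_pvalues.tsv` would keep a probe that failed in one of five samples and drop one that failed in three of five (the file name is a placeholder).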

Input

ADMIRE accepts two kinds of input.

Input from scanning HumanMethylation450 BeadChips on HiScan/iScan systems

The default output of HumanMethylation450 BeadChip compatible scanner systems consists of

  • a SampleSheet.csv file and
  • file directories named after the chip's Sentrix ID, each containing two *.idat files per sample.

To use the files generated by the scanner system with ADMIRE, all file directories have to be compressed (e.g. by running tar -zcvf compressFileName.tar.gz folderToCompress). ADMIRE can then be called with admire -c SampleSheet.csv -z compressFileName.tar.gz

Custom input

ADMIRE is also able to process a tab-separated sample definition file, with the following columns:

sample_id	file	channel	sample_group
1	8769527070/8769527070_R01C01_Grn.idat	Grn	control
1	8769527070/8769527070_R01C01_Red.idat	Red	control
2	8769527070/8769527070_R01C02_Grn.idat	Grn	treatment
3	8769527070/8769527070_R02C01_Red.idat	Red	treatment

ADMIRE can then be called with admire -s sample_definition.txt.
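
Before calling ADMIRE it can help to check the sample definition file. The sketch below assumes that every sample_id needs one Grn row and one Red row, matching the two *.idat files per sample produced by the scanner. The function name check_channels is hypothetical and not part of ADMIRE.

```shell
# Hypothetical pre-flight check, not part of ADMIRE.
# $1 = tab-separated sample definition file with the header shown above.
# Prints every sample_id that lacks a Grn or a Red row.
check_channels() {
  awk -F '\t' '
    NR == 1 { next }                    # skip the header line
    { seen[$1] = 1; have[$1, $3] = 1 }  # remember sample and channel
    END {
      for (id in seen) {
        if (!((id, "Grn") in have)) print id "\tmissing Grn"
        if (!((id, "Red") in have)) print id "\tmissing Red"
      }
    }' "$1" | LC_ALL=C sort
}
```

Run on the four example rows above, this check would report sample 2 as missing Red and sample 3 as missing Grn, so those rows are presumably an abbreviated excerpt rather than a complete file.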

Genomic regions

Custom genomic regions should be provided in BED format and can be given by admire -r regions1.bed -r regions2.bed ....
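
BED files are tab-separated and need at least three columns: chromosome, 0-based start and end. The sketch below is a simple sanity check for a custom region file. The function name check_bed is hypothetical, and whether ADMIRE performs similar checks itself is not stated here.

```shell
# Hypothetical sanity check for a region file, not part of ADMIRE.
# $1 = BED file. Reports lines that do not look like valid BED records.
check_bed() {
  awk -F '\t' '
    /^(#|track|browser)/ { next }       # skip comment and header lines
    NF < 3 { print "line " NR ": fewer than 3 columns"; next }
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
      print "line " NR ": non-numeric coordinates"; next
    }
    $2 + 0 >= $3 + 0 { print "line " NR ": start is not smaller than end" }
  ' "$1"
}
```

An empty output means no problems were found, for example when running `check_bed regions1.bed`.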

Normalization methods

ADMIRE features five different normalization methods (see [1] for details):

  • Functional normalization admire -n fn
  • Quantile normalization admire -n quantile
  • Noob normalization admire -n noob
  • Illumina Genome Studio normalization admire -n illumina
  • SWAN normalization admire -n swan

ADMIRE can also work on raw methylation values when called with admire -n raw .... In case of functional normalization, the noob background correction step can be skipped with admire -b -n fn. In case of functional or noob normalization, the dye correction step can be skipped with admire -d -n fn or admire -d -n noob.

Gene set enrichment analysis

Gene sets can be given with admire -g geneset1 -g geneset2 .... Gene set files should be text files with one gene symbol per line. The filename is used to name the gene set.
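
A gene list copied from a spreadsheet or a paper is often separated by commas or tabs. The sketch below turns such a list into the one-symbol-per-line layout described above. The function name to_geneset and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper, not part of ADMIRE.
# Reads gene symbols separated by commas or whitespace on stdin and
# writes each unique symbol on its own line, keeping the input order.
to_geneset() {
  tr -s ', \t\r' '\n' | awk 'NF && !seen[$0]++'
}
```

A call such as `to_geneset < raw_list.txt > cardiac_genes.txt` followed by `admire ... -g cardiac_genes.txt` would then pass the cleaned list to ADMIRE.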

Other available parameters

ADMIRE features the following other parameters:

  • Q-value cutoff after multiple testing correction admire -q 0.05
  • Quality control report in PDF format admire -e report.pdf
  • Tar-gz compressed output file admire -o output.tar.gz
  • Help and version information admire -h -v

Output

ADMIRE creates a number of output directories and files that are described below:

Files in the excel subdirectory

This subdirectory contains a csv file for each combination of sample group comparison (e.g. case-vs-control) and genomic region (e.g. promoters), with information about the genomic feature, its genomic location as well as p- and q-values of the sample groups.
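
These tables can also be filtered on the command line. The sketch below keeps rows whose q-value is at or below a cut-off. The header layout of the csv files is not documented here, so the column name is passed as an argument; the name filter_by_q and the column name in the usage note are hypothetical. The sketch splits on plain commas and does not handle quoted fields.

```shell
# Hypothetical helper, not part of ADMIRE.
# $1 = CSV file, $2 = name of the q-value column, $3 = cutoff
filter_by_q() {
  awk -F ',' -v col="$2" -v cutoff="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i   # locate the column
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print; next                       # keep the header
    }
    $c + 0 <= cutoff + 0                # keep rows at or below the cutoff
  ' "$1"
}
```

For example, `filter_by_q case-vs-control.promoters.csv q_value 0.05` would print the header and all rows passing the cut-off, assuming a file and a column with those names exist.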

Files in the visualization subdirectory

This subdirectory contains files for visualization with IGV. General files, such as the genomic locations of all Illumina probes and the genomic regions used during analysis, are located in the annotation-tracks subfolder. Data-specific files are located in the data-tracks folder. For each sample group comparison (e.g. case-vs-control), this folder holds information on significantly altered probe methylation (control-case.igv) as well as significantly altered genomic regions in BED format. Additional images are stored in region-specific subdirectories.

Files in the normalized subdirectory

When ADMIRE is not run inside the Galaxy environment, the normalized subdirectory contains two matrices per normalization method, one with (normalized) beta values, the other with (normalized) m values. Each row corresponds to a single Illumina probe, each column represents a sample.
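
Beta values and m values carry the same information on different scales. The sketch below uses the standard relationship M = log2(beta / (1 - beta)), ignoring any offset that may be added when beta values are computed. It is an illustration only, and the function name beta_to_m is hypothetical.

```shell
# Illustrative only, not part of ADMIRE.
# stdin: one beta value per line; stdout: the corresponding M-value.
beta_to_m() {
  awk '{
    b = $1 + 0
    if (b <= 0 || b >= 1) print "NA"    # M is undefined at the boundaries
    else printf "%.4f\n", log(b / (1 - b)) / log(2)
  }'
}
```

A beta value of 0.5 maps to an M-value of 0, while values near 0 or 1 map to large negative or positive M-values.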

Files in the results subdirectory

When ADMIRE is not run inside the Galaxy environment, the results subdirectory contains intermediate result files, like the output from statistical testing (control-case.pvals.bed) or results from combining p-values (comb-p subdirectory).

Files in the geneset-enrichment subdirectory

tba

Examples

Detection of differentially methylated promoters in permanent atrial fibrillation

This example uses data from the GEO dataset on genome-wide DNA methylation in permanent atrial fibrillation. The *.idat files for the red and green channels were downloaded for all 11 samples. A custom sample definition file was then written as follows:

sample_id	file	channel	sample_group
1	GSM1532419_6929718127_R01C01_Grn.idat	Grn	fibrillation
1	GSM1532419_6929718127_R01C01_Red.idat	Red	fibrillation
2	GSM1532420_6929718127_R01C02_Grn.idat	Grn	fibrillation
2	GSM1532420_6929718127_R01C02_Red.idat	Red	fibrillation
3	GSM1532421_6929718127_R02C01_Grn.idat	Grn	fibrillation
3	GSM1532421_6929718127_R02C01_Red.idat	Red	fibrillation
4	GSM1532422_6929718127_R02C02_Grn.idat	Grn	fibrillation
4	GSM1532422_6929718127_R02C02_Red.idat	Red	fibrillation
5	GSM1532423_6929718127_R03C01_Grn.idat	Grn	fibrillation
5	GSM1532423_6929718127_R03C01_Red.idat	Red	fibrillation
6	GSM1532424_6929718127_R03C02_Grn.idat	Grn	fibrillation
6	GSM1532424_6929718127_R03C02_Red.idat	Red	fibrillation
7	GSM1532425_6929718127_R04C01_Grn.idat	Grn	fibrillation
7	GSM1532425_6929718127_R04C01_Red.idat	Red	fibrillation
8	GSM1532426_6929718127_R06C02_Grn.idat	Grn	control
8	GSM1532426_6929718127_R06C02_Red.idat	Red	control
9	GSM1532427_6929718167_R01C01_Grn.idat	Grn	control
9	GSM1532427_6929718167_R01C01_Red.idat	Red	control
10	GSM1532428_6929718167_R02C01_Grn.idat	Grn	control
10	GSM1532428_6929718167_R02C01_Red.idat	Red	control
11	GSM1532429_6929718167_R03C01_Grn.idat	Grn	control
11	GSM1532429_6929718167_R03C01_Red.idat	Red	control
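
Writing such a file by hand is error-prone, because a stray space or a missing tab is hard to spot. The sketch below prints the two rows for one sample from its file name prefix. The function name sample_rows is hypothetical and not part of ADMIRE.

```shell
# Hypothetical helper, not part of ADMIRE.
# $1 = sample_id
# $2 = idat prefix (file name without _Grn.idat / _Red.idat)
# $3 = sample_group
# Prints the tab-separated Grn and Red rows for one sample.
sample_rows() {
  printf '%s\t%s_Grn.idat\tGrn\t%s\n' "$1" "$2" "$3"
  printf '%s\t%s_Red.idat\tRed\t%s\n' "$1" "$2" "$3"
}

# Possible usage (commented out so that nothing is written by default):
# {
#   printf 'sample_id\tfile\tchannel\tsample_group\n'
#   sample_rows 1 GSM1532419_6929718127_R01C01 fibrillation
#   sample_rows 8 GSM1532426_6929718127_R06C02 control
# } > sample_definition.txt
```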

A file containing all promoter regions is created from the GENCODE V19 annotation by extracting the 2 kb region upstream of each gene's TSS:

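# Chromosome sizes for bedtools, taken from the GFF3 header lines
zcat gencode.v19.annotation.gff3.gz | grep '##sequence-region' | cut -d ' ' -f 2,4 | sed 's/ /\t/g' > hg19.genome
gunzip gencode.v19.annotation.gff3.gz
# Assumed step, missing from the original commands: keep gene features only
awk -F '\t' '$3 == "gene"' gencode.v19.annotation.gff3 > gencode.v19.genes.gff3
bedtools flank -i gencode.v19.genes.gff3 -g hg19.genome -l 2000 -r 0 -s | awk 'BEGIN{FS="\t";OFS="\t"}{split($9,a,";");split(a[6],gn,"=");print $1,$4,$5,"ID="gn[2]"-promoter;"a[2]";"a[4]";"a[6],".",$7}' > gencode.v19.genes_promoter2000.bed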
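
With -l 2000 -r 0 -s, the bedtools flank call above takes the 2 kb upstream of each feature relative to its strand: left of the start for + features, right of the end for - features. The sketch below reproduces only this interval arithmetic on BED6 input, to make the strand handling explicit. It is not a replacement for bedtools, clips coordinates only at 0 and not at chromosome ends, and the function name upstream_flank is hypothetical.

```shell
# Illustrative only, not part of ADMIRE or bedtools.
# stdin: BED6 (chrom, 0-based start, end, name, score, strand)
# $1 = flank size in bp
upstream_flank() {
  awk -F '\t' -v OFS='\t' -v n="$1" '
    # minus strand: upstream lies to the right of the feature end
    $6 == "-" { print $1, $3, $3 + n, $4, $5, $6; next }
    # plus strand (or unstranded): upstream lies to the left of the start
    { s = $2 - n; if (s < 0) s = 0; print $1, s, $2, $4, $5, $6 }
  '
}
```

For a + strand gene at chr1:5000-6000 this yields chr1:3000-5000, and for a - strand gene at chr1:9000-9500 it yields chr1:9500-11500.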

ADMIRE is then called by

admire -s sample_definition.txt -e quality_report.pdf -r gencode.v19.genes_promoter2000.bed > log.txt &

References

  1. Aryee et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics (2014), doi: 10.1093/bioinformatics/btu049
  2. Pedersen et al. Comb-p: software for combining, analyzing, grouping and correcting spatially correlated P-values. Bioinformatics (2012), doi: 10.1093/bioinformatics/bts545
  3. Thorvaldsdottir et al. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinformatics (2013), doi: 10.1093/bib/bbs017

Citation and License

Please cite: Preussner J, Kuenne C and Looso M. ADMIRE: Analysis and visualization of differential methylation in genomic regions using Infinium HumanMethylation450K Chips. ??? (2015), doi:tba

The MIT License (MIT)

Copyright (c) 2015 Jens Preussner and Mario Looso

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contact

In case of further questions, bugs or contributions, feel free to send an e-mail to Jens Preussner (jens.preussner@mpi-bn.mpg.de).