Home
WIlsON: Webbased Interactive Omics visualizatioN
CLARION Input Format
Feature Selection
Plots
Plot Interface Overview
Interactivity
Use Cases
Contact and License
The WIlsON app is based on a specialized tab-delimited input file format called CLARION, which is further described below. All types of data which can be broken down to a feature with assigned numerical values can be deployed on the WIlsON server (e.g. multi-omics). A user can then generate various plots following four basic steps:
- Filter for features of interest (e.g. transcripts, genes, proteins, probes) based on categorical (annotation) or numerical values
- Select plot type
- Adjust plot parameters
- Render/download result
CLARION: generiC fiLe formAt foR quantItative cOmparisons of high throughput screeNs
CLARION is a data format especially developed to be used with WIlsON, which relies on a tab-delimited table with a metadata header to describe the following columns. It is based on the Summarized Experiment format and supports all types of data which can be reduced to features and their annotation (e.g. genes, transcripts, proteins, probes) with assigned numerical values (e.g. count, score, log2foldchange, z-score, p-value). Most result tables derived from RNA-Seq, ChIP/ATAC-Seq, Proteomics, Microarrays, and many other analyses can thus be easily reformatted to become compatible without having to modify the code of WIlsON for each specific experiment.
Please check the following link for details concerning the CLARION format.
As mentioned above, the first step of using WIlsON is to go to the tab "Feature Selection" in the top menu, where a subset of data to be used for plotting can be chosen.
This is controlled by the additive application of filtering steps based on all columns supplied by the CLARION input file. If no filtering is desired, just press the "Select" button. The "highlight" pane supports the creation of a subset of the selected features. The highlighted data can be used in certain plots which support highlighting (e.g. scatterplot) to either add a fixed color or labels. After filtering, plots of interest can be selected and generated via the tabs on top.
The Data tab permits the selection of features for default plotting. These will be shown as e.g. dots in a scatterplot or cells in a heatmap according to the respective visualization parameters. The Highlight tab can be used for the selection of a second subset of features, which are to be displayed on top of the standard plot. The way this highlight subset is interpreted is chosen in the respective plot interfaces. Highlight features are only implemented for the scatter plot for now and can be used to render these with a specific color for emphasis or for adding floating labels.
The table at the top of the "Feature Selection" page displays the current selection. It can be sorted ascending or descending according to any available data column. The selection can be narrowed down further by using the keyword search field on the top right of the table or by manual selection of rows (mouse click on a row). Some cell values containing long text are truncated ('...'); hover over the respective cell to display the full content.
The current selection of features, as defined by the column filters below, can also be limited to a fixed number of features (e.g. the Top50) according to the values of a certain column. This requires sorting the table on that column first, so that the features are in the correct order. This does NOT change the basket of selected features itself, but only limits the number of features passed on to the desired plot. As such, it serves as a temporary sub-selection of the current selection shown in the table. This functionality is controlled by a range slider, which selects the TopX/BottomX features according to the current order. In combination with numeric filtering on e.g. fold changes, it can be used to generate a list of the Top50 up- and down-regulated genes.
Based on the column's content (text, numeric), WIlsON's Feature Selector provides appropriate filter interfaces to enable an efficient way to select data. These are split among the levels of data (feature, sample, condition, contrast) given in the input CLARION file. Please make sure to press the SELECT button once you are content with your filters to update the basket!
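The TopX/BottomX behaviour can be sketched in a few lines of Python. This is only an illustration of the idea, not WIlsON's implementation: the function name `limit_selection`, its parameters, and the column name `log2fc` are assumptions made for this example.

```python
def limit_selection(rows, column, top=0, bottom=0, descending=True):
    """Return the first `top` and last `bottom` rows after sorting by `column`.

    A new list is returned; `rows` (the basket) is left untouched.
    """
    ordered = sorted(rows, key=lambda row: row[column], reverse=descending)
    if top + bottom >= len(ordered):
        return ordered  # the limits cover the whole selection
    head = ordered[:top]
    tail = ordered[len(ordered) - bottom:] if bottom else []
    return head + tail


# Hypothetical basket of features that already passed the column filters.
basket = [
    {"id": "g1", "log2fc": 2.5},
    {"id": "g2", "log2fc": -3.1},
    {"id": "g3", "log2fc": 0.4},
    {"id": "g4", "log2fc": 4.0},
    {"id": "g5", "log2fc": -1.2},
]

# Two most up-regulated and one most down-regulated feature.
subset = limit_selection(basket, "log2fc", top=2, bottom=1)
print([row["id"] for row in subset])
```

Because the function returns a new list, the basket keeps all five features, which mirrors the statement that the limit is only a temporary sub-selection.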
Annotations can be filtered by clicking a dropdown menu containing all available values. The filter box supports querying using partial key words as well. 'Backspace' can be used to deselect prior selections.
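As a rough sketch of how such an annotation filter might behave, the Python snippet below first narrows the dropdown entries by a partial keyword and then keeps only rows carrying one of the chosen values. The function names, the column name `biotype`, and the case-insensitive matching are assumptions for illustration, not taken from WIlsON's code.

```python
def matching_values(available, query):
    """Dropdown entries that contain the typed text (case-insensitive, assumed)."""
    needle = query.lower()
    return [value for value in available if needle in value.lower()]


def filter_by_annotation(rows, column, chosen):
    """Keep rows whose annotation is among `chosen`; no choice means no restriction."""
    if not chosen:
        return list(rows)
    wanted = set(chosen)
    return [row for row in rows if row[column] in wanted]


# Hypothetical annotation column with a handful of values.
rows = [
    {"id": "g1", "biotype": "protein_coding"},
    {"id": "g2", "biotype": "lincRNA"},
    {"id": "g3", "biotype": "miRNA"},
    {"id": "g4", "biotype": "pseudogene"},
]
available = sorted({row["biotype"] for row in rows})

suggestions = matching_values(available, "rna")
selected = filter_by_annotation(rows, "biotype", suggestions)
print(suggestions, [row["id"] for row in selected])
```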
This filter is intended to select a numeric range. The 'inner' and 'outer' options define whether values within the set markers (inner) or outside of them (outer) are selected, which is also indicated by the slider coloring. As the step size is scaled according to the spread of the data (total range), the editable value fields beside the slider can be used to change its minimum and maximum value (the slider range is then recalculated). This makes it possible to fine-tune the filter, e.g. to select a very small value when the total range of the column is large.
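The difference between the 'inner' and 'outer' modes can be illustrated with a minimal Python sketch. The function `range_filter` and the column name `log2fc` are hypothetical, and whether the marker positions themselves count as inside the range is an assumption made here (they are treated as inside).

```python
def range_filter(rows, column, low, high, mode="inner"):
    """Select rows by a numeric range.

    inner: keep values with low <= value <= high
    outer: keep values with value < low or value > high
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if mode == "inner":
        keep = lambda value: low <= value <= high
    elif mode == "outer":
        keep = lambda value: value < low or value > high
    else:
        raise ValueError("mode must be 'inner' or 'outer'")
    return [row for row in rows if keep(row[column])]


# Hypothetical fold-change column.
rows = [{"id": i, "log2fc": v} for i, v in enumerate([-3.2, -2.0, -0.5, 0.0, 1.9, 2.0, 4.1])]

inside = range_filter(rows, "log2fc", -2, 2, mode="inner")
outside = range_filter(rows, "log2fc", -2, 2, mode="outer")
print([row["log2fc"] for row in inside])
print([row["log2fc"] for row in outside])
```

With these definitions every row falls into exactly one of the two modes, so 'outer' is simply the complement of 'inner' for the same pair of markers.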
These two areas report on the current selection of features for default plotting (Selected Features) and special highlighting (Highlighted Features) including the respectively applied filters.
The input data file can be chosen here. Apart from various example datasets, the user can also upload custom data in CLARION format.
The type of plot can be selected from the top menu after the desired features were selected.
The Gene Viewer consists of multiple plot types including line-, box-, violin- and barplots. It supports the visualization and comparison of individual genes and/or conditions.
A PCA is used to get an overview of the variation in the data based on the selected features. By default, the two dimensions with the highest variation (PC1 and PC2) are selected and presented in a two-dimensional scatterplot.
Similar to the PCA, this plot will show the global clustering of samples or conditions based on the selected features. A distance matrix is created using one of various options (e.g. euclidean, pearson, spearman, etc.) and visualized by a heatmap.
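To make the idea of a sample distance matrix concrete, here is a small pure-Python sketch. It shows the Euclidean distance and a correlation-based distance defined as 1 minus the Pearson correlation; that definition, the function names, and the sample names are assumptions for illustration and may differ from what WIlsON computes internally.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equally long value vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation (assumed definition); 0 = identical profile shape.

    Raises ZeroDivisionError if one of the vectors has zero variance.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def distance_matrix(samples, metric=euclidean):
    """Pairwise distances between all samples as a nested dict."""
    names = list(samples)
    return {a: {b: metric(samples[a], samples[b]) for b in names} for a in names}


# Hypothetical expression values of three selected features per sample.
samples = {
    "wt_1": [1.0, 2.0, 3.0],
    "wt_2": [2.0, 4.0, 6.0],
    "mt_1": [3.0, 2.0, 1.0],
}

by_euclid = distance_matrix(samples)
by_pearson = distance_matrix(samples, metric=pearson_distance)
print(by_pearson["wt_1"])
```

In this toy data `wt_1` and `wt_2` differ in magnitude but share the same profile, so they are far apart by Euclidean distance yet identical by the correlation-based one, which is why the choice of metric changes the resulting heatmap.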
This plot illustrates the dependency of two (X/Y axes) or three (X/Y/color) attributes. It supports a density estimation (kernel smoothing) and trend lines. The axes to be displayed can be chosen among the numeric columns to e.g. create Volcano, MA, or other kinds of scatter plots. The scatterplot supports highlighting of a subset of data (feature selection, pane highlight).
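For readers unfamiliar with Volcano and MA plots, the following Python sketch shows how their coordinates are conventionally derived. The function names and the assumption that raw p-values and mean expression values are available are illustrative; in WIlsON the axes are simply picked from the numeric columns already present in the input file.

```python
import math


def volcano_point(log2_fold_change, p_value):
    """Volcano plot: x = log2 fold change, y = -log10(p-value)."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return (log2_fold_change, -math.log10(p_value))


def ma_point(mean_a, mean_b):
    """MA plot: M = log2(a) - log2(b) on the y axis, A = mean log2 intensity on the x axis.

    Both means must be positive.
    """
    log_a, log_b = math.log2(mean_a), math.log2(mean_b)
    return (0.5 * (log_a + log_b), log_a - log_b)  # (A, M)


# Hypothetical values for a single gene.
print(volcano_point(2.0, 0.01))
print(ma_point(8.0, 2.0))
```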
Various parameters permit the creation of highly customized heatmaps of the selected features. Among these are different kinds of clusterings, transformations (log2, log10, rlog, zscore), and color schemes. The Heatmap module supports interactive and static heatmaps.
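Two of the listed transformations can be sketched as follows. This is a minimal illustration, assuming a pseudocount of 1 for the log2 transformation and the sample standard deviation for the z-score; neither detail is stated above, and both function names are hypothetical.

```python
import math
import statistics


def log2_transform(values, pseudocount=1.0):
    """log2 of each value; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(value + pseudocount) for value in values]


def zscore(values):
    """Center on the mean and scale by the sample standard deviation.

    A constant row has no spread, so it is mapped to zeros.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(value - mean) / sd for value in values]


# Hypothetical expression of one feature across three samples.
row = [0.0, 1.0, 3.0]
print(log2_transform(row))
print(zscore([1.0, 2.0, 3.0]))
```

Applying the z-score per feature (row) puts features with very different expression levels on a common scale, so the heatmap colors reflect relative changes between samples rather than absolute abundance.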
The layout of plot interfaces is fairly similar. The top bar permits selection of a plotting application while the remaining space is usually divided the following way:
It shows the currently selected features as well as global parameters depending on the plot/filter.
These tabs provide access to several subsections: plots, plot variants, filters or data tables. Tables contain the specific subset of data used for the plot.
This area will show the result of the current rendering/filtering: either a plot or the data as a table.
The bottom interface contains most of the parameters defining a plot, including axis transformation, coloring etc.
- All plots include an interactive help section. Click on for a step by step tour on how to use the current interface.
- Further details concerning specific interface elements are available with the symbols.
Thanks to the plotly package, several plots are available as interactive versions offering a range of additional options:
- Zoom / pan plot (either via UI or directly in plot)
- Mouse-over popup text box containing information of the selected feature
- Download currently selected viewport
It should be noted that the plotly versions are slower and more demanding on the processing power than the default ggplot2 versions.
See example data used within the demo and the use cases here.
Dataset: RNASeq Zhang 2015
Task: Create a heatmap comparing expression levels between wildtype (wt) and mutant (mt). Only select genes which are significantly differentially expressed and only use the top 10 considering the mean expression over all samples (BaseMean).
In order to filter, use the Feature Selection tab. For this example we want to filter for significantly differentially expressed genes, which is done on the contrast level. Set the following thresholds using inner/outer in combination with the range slider: fitted log2 fold change less than -2 or greater than 2 and p-value smaller than 0.1. As the latter might be difficult to select due to the tiny interval, change the max value of the slider using the box on the right side (essentially a zoom). Finally apply the filter by clicking on the SELECT button above.
Now the filtered table will be shown at the top of the page. To select the genes with the highest BaseMean, click on the BaseMean column header until the column's values are sorted in descending order (arrow down). Narrow the selection down further by using the slider directly below the table to select the Top 10 genes.
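The selection logic of this use case (contrast filter, then sorting, then Top 10) can be summarized in a short Python sketch. The function name, the column names `log2fc`, `pvalue`, and `basemean`, and the toy data are assumptions for illustration; they are not the column names of the example dataset.

```python
def significant_top_genes(rows, lfc_cutoff=2.0, p_cutoff=0.1, n=10):
    """Keep genes with |log2 fold change| > cutoff and p-value < cutoff,
    then return the n genes with the highest mean expression."""
    hits = [
        row for row in rows
        if abs(row["log2fc"]) > lfc_cutoff and row["pvalue"] < p_cutoff
    ]
    hits.sort(key=lambda row: row["basemean"], reverse=True)
    return hits[:n]


# Hypothetical contrast table (wt vs. mt).
table = [
    {"gene": "A", "log2fc": 3.1, "pvalue": 0.001, "basemean": 150.0},
    {"gene": "B", "log2fc": -2.4, "pvalue": 0.04, "basemean": 900.0},
    {"gene": "C", "log2fc": 0.8, "pvalue": 0.0001, "basemean": 5000.0},  # fold change too small
    {"gene": "D", "log2fc": 2.9, "pvalue": 0.3, "basemean": 700.0},  # not significant
    {"gene": "E", "log2fc": -4.0, "pvalue": 0.02, "basemean": 320.0},
]

top = significant_top_genes(table, n=2)  # n=2 only because the toy table is small
print([row["gene"] for row in top])
```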
Dataset: RNASeq Zhang 2015
Task: Compare wildtype (wt) versus mutant (mt) conditions and show the expression differences and significance using a scatter plot. Also highlight/label all genes which differ strongly between conditions.
The final step is to add the highlighting of genes that differ strongly between conditions. To do so, go back to Feature Selection, select the Highlight tab and filter for the respective features.
Filter for genes that differ strongly between conditions by first expanding the contrast panel and then setting the "Fitted Log2FoldChange" filter to select values less than -3 or greater than 3. Apply the filter by clicking the SELECT button. Note: By default, the filter displays an empty table, meaning that nothing is highlighted.
Dataset: RNASeq Zhang 2015
Task: Create a scatterplot of non-coding RNAs including labeling of those with the most prominent up-regulation
Wilson was created by Hendrik Schultheis, Jens Preussner, Carsten Kuenne, and Mario Looso.
Bioinformatics Core Unit, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany.
Copyright (C) 2017. This project is licensed under the MIT license.
The source code for the modular Wilson R package is available on Github.
The source code for the Wilson application implementing that package is available on Github.
The container for the Wilson application is available on Docker.
How to cite?
H. Schultheis, C. Kuenne, J. Preussner, R. Wiegandt, A. Fust, M. Bentsen and M. Looso. WIlsON: Webbased Interactive Omics VisualizatioN. Bioinformatics 35(6) 2018, doi: https://doi.org/10.1093/bioinformatics/bty711
Copyright © 2019 Dr. Mario Looso, Max Planck Institute for Heart and Lung Research