Gene Expression Clustering Tool

Introduction to Gene Expression Clustering

The Gene Expression Clustering tool is a web-based tool for performing sample clustering by selecting a desired set of genes from the NCI Genomic Data Commons (GDC), and visualizing a heatmap of a z-score transformed matrix.

Quick Reference Guide

At the Analysis Center, click the 'Gene Expression Clustering' card to launch the heatmap.

Analysis Center Gene Expression Clustering Card

Users can view publicly available genes and can log in with credentials to access controlled data.

There are four main panels in the Gene Expression Clustering tool: controls, heatmap, variables, and legend.

Gene Expression Clustering Tool Features

Controls

The control panel can modify the displayed data or the appearance of the matrix. Its functionalities are outlined below.

Gene Expression Clustering Tool Controls

  • Clustering: Modify the clustering method, the distance method, alter the column and row dendrogram dimensions, change the z-score cap and color scheme.
  • Cases: Adjust the visible characters of the case labels
  • Genes: Modify how cases are represented for each gene (Absolute, Percent, or None), row group and label lengths, rendering style, and the existing gene set
    • Edit Group: Displays a panel of currently selected genes, which can be modified by clicking on a gene to remove it from the gene set, searching for a particular gene to add, loading top variably expressed genes, or loading a pre-defined gene set provided by the MSigDB database
    • Create Group: Create a new gene set by searching for a particular gene, loading top mutated genes, or loading a pre-defined gene set provided by the MSigDB database
  • Variables: Search and select variables to add to the matrix below the heatmap
  • Cell Layout: Modify the format of the cells by changing colors, cell dimensions, and label formatting
  • Legend Layout: Alter the legend by changing the font size, dimensions, and other formatting preferences
  • Download: Download the plot in SVG format
  • Zoom: Adjust the zoom level by using the up and down arrows on the input box, entering a number, or using the sliding scale to view the case labels.

Heatmap

The Gene Expression Clustering heatmap displays the active cohort's cases horizontally along the top, genes along the left column, and the z-score transformed gene expression values in the cells.

Gene Expression Clustering Tool Heatmap

Hovering over a cell in the heatmap displays the case submitter_id, gene name, and gene expression value.

Gene Expression Clustering Tool Heatmap Cell

Clicking on a cell also gives users the option to launch the Disco plot, a circos plot displaying copy number data and consequences for that case.

Selecting cases on the cluster

Cases on the cluster can be selected by clicking on the top dendrogram. Once part of the dendrogram is selected, users can choose to zoom in to the cases, list all highlighted cases, or create a cohort of the selected cases.

Gene Expression Clustering Tool Heatmap Cases Dendrogram

Click on a case in the dendrogram to showcase the Disco plot or the GDC Case Summary Page.

Gene Expression Clustering Tool Heatmap Case Selection

Selecting genes on the cluster

Genes on the cluster can be selected by clicking on the left dendrogram. Once part of the dendrogram is selected, users can choose to list the genes selected or launch Gene Set Overrepresentation Analysis with the genes selected.

Gene Expression Clustering Tool Heatmap Genes Dendrogram

Clicking 'Gene set overrepresentation analysis' launches an ORA chart above. After a gene set group is selected, a table shows the results of the gene set overrepresentation analysis.

Gene Expression Clustering Tool Heatmap ORA
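For readers unfamiliar with overrepresentation analysis, the following is a minimal sketch of the idea using a one-sided hypergeometric test. This page does not state which statistic the tool computes, so the test itself, the function name `ora_p_value`, and the `background_size` parameter are illustrative assumptions rather than the tool's implementation.

```python
from math import comb

def ora_p_value(selected_genes, gene_set, background_size):
    """Return (overlap, p_value) for a one-sided hypergeometric test.

    Assumption: every gene in `gene_set` and `selected_genes` belongs to a
    background of `background_size` genes. This is a generic ORA sketch,
    not the statistic used by the tool.
    """
    selected = set(selected_genes)
    members = set(gene_set)
    k = len(selected & members)   # selected genes that are in the gene set
    n = len(selected)             # number of selected genes
    K = len(members)              # size of the gene set
    N = background_size           # size of the background
    # P(X >= k): probability of seeing at least k gene-set members by chance
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / comb(N, n)

# Hypothetical example: 3 genes selected from the dendrogram, 10-gene background
overlap, p = ora_p_value({"A", "B", "D"}, {"A", "B", "C"}, background_size=10)
print(overlap, round(p, 4))
```

A small p-value suggests that the selected genes overlap the gene set more than would be expected by chance.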

In the column of genes on the left, click on a gene to rename it, launch the ProteinPaint Lollipop plot, display the GDC Gene Summary Page, or remove the gene. The lollipop plot displays all cases across the GDC affected by SSMs in the selected gene.

Gene Expression Clustering Tool Gene Selection

Variables

Any variables added to the matrix appear below the heatmap. Users can hover over a cell to display the case submitter_id and their value for the given variable.

Gene Expression Clustering Tool Variables

Click on a variable to rename it, edit it by excluding categories, replace it with a different variable, or remove it entirely.

Gene Expression Clustering Tool Variable Selection

When editing the "Overall Survival" variable, users can choose between Time to Event or Exit Code. If Time to Event is selected, users have the option to convert the values to z-scores.

Gene Expression Clustering Tool Overall Survival Editing

Users can drag and drop a variable row that isn't used for clustering to reposition it.

Legend

In addition to the color coding system for the gene expression values, the legend displays the number of cases from the active cohort in each category for all variables that are selected to appear in the matrix.

Gene Expression Clustering Tool Legend

Users can click on a variable in the legend to hide a specific category, only show a specific category, or show all categories for the selected variable.

Gene Expression Clustering Tool Legend Selection

Accessing the Tool

At the analysis center, click the 'Gene Expression Clustering' card to launch the heatmap.

Analysis Tools with Gene Expression clustering app card

View publicly available genes, or log in with credentials to access controlled data.

Features

The following features are available once the default heatmap is loaded. The default heatmap shows all the glioma cases. There are four main panels, as outlined in the figure: 'Controls', 'Heatmap', 'Variables', and 'Legend'. Each feature is described in detail in the following sections.

Default view

Controls

The control panel as shown has various functionalities with which users can change or modify the appearance of the matrix. The control panel provides flexibility and a wide range of options to maximize user control.

Controls

Clustering

The clustering control button provides several options to modify the default clustering of the heatmap. Click on the button labeled 'Clustering' to display a menu with options as shown.

Clustering button

Cluster cases

Check or uncheck this option to show or hide the column dendrogram.

Clustering method

Click on the options to change the method of clustering. The heatmap will render again with the clustering method selected.

Clustering button

Distance method

Click on the options to change the distance method. The heatmap will render again with the distance method selected.

Distance method clustering
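To illustrate how the clustering method and the distance method interact, here is a small, self-contained sketch of agglomerative hierarchical clustering. The specific choices shown (average linkage, Euclidean and Manhattan distance) are assumptions made for illustration; this page does not list which options the tool offers, and all function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def average_linkage(c1, c2, points, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(points, dist=euclidean, linkage=average_linkage):
    """Repeatedly merge the two closest clusters.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a dendrogram is drawn from.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(pair[0], pair[1], points, dist),
        )
        merges.append((a, b, linkage(a, b, points, dist)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical z-score profiles for four cases (two genes each)
cases = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
for a, b, d in agglomerate(cases, dist=euclidean):
    print(a, b, round(d, 3))
```

Passing `dist=manhattan` instead of `dist=euclidean` changes the merge distances, and on less clear-cut data it can change the merge order, which is why the heatmap is rendered again after either setting changes.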

Column Dendrogram Height

Click or edit the number in the input box to adjust the height of the column dendrograms as shown.

Column Dendrogram Height

Column Dendrogram Height

Row Dendrogram Width

Similarly, the row dendrogram width can be adjusted as shown.

Row dendrogram width

Row dendrogram width

Z-score Cap

Z scores are used to compare gene expression across samples. A Z-score of zero indicates that the gene's expression level is the same as the mean expression level across all samples, while a positive Z-score indicates that the gene is expressed at a higher level than the mean, and a negative Z-score indicates that the gene is expressed at a lower level than the mean.
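The interpretation above can be made concrete with a short sketch that z-score transforms each gene across its samples. It assumes the population standard deviation is used and that genes with no variation map to zero; this page does not specify either detail, and `zscore_rows` is a hypothetical helper name.

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Z-score transform each gene (row) across its samples (columns).

    Assumptions: population standard deviation; a gene with identical
    values in every sample is mapped to all zeros.
    """
    result = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)
        if sigma == 0:
            result.append([0.0 for _ in row])
        else:
            result.append([(value - mu) / sigma for value in row])
    return result

# Hypothetical expression values for one gene across three samples
z = zscore_rows([[2.0, 4.0, 6.0]])[0]
print([round(v, 3) for v in z])  # the middle sample sits at the mean, so z = 0
```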

Users can increase or decrease the Z-score cap. For example, increase the Z-score cap from 5 to 10 as shown. Samples with lower gene expression become lighter, which highlights clusters with higher expression values, shown in red in the heatmap.

Z-score capping
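One way to see why raising the cap makes lower values lighter is to sketch the capping step together with a color intensity. The linear mapping from capped z-score to intensity is an assumption for illustration only, and both function names are hypothetical.

```python
def cap_zscores(row, cap):
    """Clamp every z-score into the range [-cap, cap]."""
    return [max(-cap, min(cap, z)) for z in row]

def color_intensity(z, cap):
    """Assumed linear scale: 0.0 is the lightest color, 1.0 the most saturated."""
    return min(abs(z), cap) / cap

print(cap_zscores([-7.2, 0.5, 12.0], cap=5.0))
# The same z-score is drawn lighter when the cap is raised
print(color_intensity(2.5, cap=5.0), color_intensity(2.5, cap=10.0))
```

Under this assumption, a z-score of 2.5 uses half of the color range at a cap of 5 but only a quarter at a cap of 10, so only the most extreme values remain strongly colored.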

Color Scheme

Click on the options to change the color scheme used. The heatmap will render again with the color scheme selected.

Color scheme

Cases

The 'Cases' control has these options:

Cases options

Case Label Character Limit

Adjust the number of visible characters in the sample labels. The default is '32'. Note that reducing the character limit truncates the labels.

Group Cases By

Clicking the "+" allows users to select a term to group cases by the categories in the term.

Sort Case Priority

Allows users to set the case sorting priority. By default, cases are sorted 'by presence' under the 'Basic' sort settings. To change the sorting, click on the 'Cases' tab.

Then click the second option, 'by consequence', to change the sorting. The clustering reloads with the new sorting.

To perform an advanced sorting, click 'Advanced' on the 'Sort Case Priority' menu as shown below.

Advanced sorting options

The user now has the option to sort the cases by each selected row, gene mutation, dictionary variable, or alphabetically by name. Details of each sort option are provided.

Genes

Users can modify the default gene set by clicking the 'Genes' button in the controls as shown. This displays options to edit genes as well as variables from the dropdown.

Geneset edit

Modifying Genes

Click the 'Edit Current Group' button as shown in the 'Gene set' to display a panel of current selected genes.

Editing geneset

Add/Delete a gene

In the search box, type a gene name, for example 'Wee1' as shown, and click 'Submit'.

Searching genes

The heatmap loads again after performing a clustering that includes 'WEE1' as shown.

Adding genes one by one

Click on the 'Edit' functionality again within the 'Gene set' menu option. To delete a gene, hover over the gene as shown. A red cross mark will appear as shown.

Deleting genes one by one

Click on the gene 'Wee1' to delete the gene from the gene set. Click submit to redo the clustering.

Load top variably expressed genes

Users have the option to load the top genes that are variably expressed. To do so, click on the 'Edit Selected Group' under 'Genes' controls. Click on the 'Top variably expressed genes' button.

Here the user can select 'Gene count' with the minimum cutoff to narrow down the list of top variably expressed genes. Additionally, the user can choose all genes or show genes from a subset.

Load top variably expressed genes
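As a rough illustration of what "top variably expressed" means, the sketch below ranks genes by the variance of their expression across cases and keeps the first `gene_count` of them. Ranking by population variance is an assumption, since this page does not describe the tool's ranking measure, and only the gene count setting is modeled here.

```python
from statistics import pvariance

def top_variable_genes(expression, gene_count):
    """Return the `gene_count` genes with the highest variance across cases.

    `expression` maps a gene name to its expression values, one per case.
    Assumption: population variance is the ranking measure.
    """
    ranked = sorted(expression, key=lambda gene: pvariance(expression[gene]), reverse=True)
    return ranked[:gene_count]

# Hypothetical expression values for three genes across three cases
expression = {
    "GENE_A": [1.0, 1.0, 1.0],    # no variation
    "GENE_B": [0.0, 10.0, 20.0],  # high variation
    "GENE_C": [4.0, 5.0, 6.0],    # low variation
}
print(top_variable_genes(expression, gene_count=2))
```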

Load MSigDB gene set

The gene expression clustering tool also enables users to load a pre-defined gene set provided by the MSigDB database. The current version enabled is the latest. Click on the dropdown button 'MSigDB (2023.2.Hs) gene set' and choose one of the following gene sets as shown.

MSigDB tree

For example, select a hallmark gene set for 'Hypoxia' as shown.

Hallmark hypoxia gene set

Note the info icon next to the gene set that provides additional information about this gene set as well as a link to the database and the original publication PMID as shown.

Info icon

Upon selecting an MSigDB gene set, the genes get updated as shown.

Selected geneset hypoxia

Click 'Submit' to reload the heatmap with the new gene set from MSigDB.

Load gene set

The gene expression clustering tool also enables users to load a user-saved custom gene set. Click 'Load gene set' and choose one of the user-saved custom gene sets.

load gene set

Adding gene as a variable

Users also have the option to add gene variant terms as variables to line up mutation consequences with clustered gene expression data.

To do so, click 'Genes', type a group name, and click 'Create New Group'.

Searching a gene as variable

Click Submit to reload the heatmap with the newly added KRAS gene as a variable. This displays the consequence type for the clustered samples for which KRAS has both the mutation calls and the gene expression data as shown.

Heatmap with KRAS gene as a variable

Variables

The button 'Variables' in the controls allows the user to search and select variables that get added below the heatmap.

Click the button 'Variables' to show the following dictionary tree.

Variables dictionary tree

Click the '+' button on the 'Demographic' to display all the terms under the parent term as shown. Select terms 'Ethnicity' and 'Year of birth' and click 'Submit 2 terms'.

Selecting and submitting variables

Once the variable terms are submitted, the heatmap will display the added variables as shown.

Variable heatmap

Download

The control panel provides an option to download the plot after the user has specified their customizations. Select the 'Download' button as shown below to save the visualization in either SVG or TSV format.

Download button

If the SVG format is selected, the download is saved to the default download folder, as shown at the bottom of the browser window.

Saved download

Adjusting the zoom using the zoom buttons

Adjust the zoom level by using the arrows on the input box or entering a number to view the sample labels as shown.

Adjusting the Zoom

Heatmap

Selecting cases on the cluster

Cases on the cluster can be selected interactively by clicking on the column dendrograms. Click on the dendrograms above the heatmap as shown. The dendrograms get highlighted in red.

Selecting case cluster

Once the dendrograms are selected, two options are displayed. A user can choose to zoom in to the cases or list all the cases highlighted in the dendrograms.

Clicking a case column

Click on a case label to display the options as shown.

Clicking case column

The user may choose to launch:

  • a circos plot, by clicking the 'Disco plot' button
  • a webpage containing information about the case, by clicking the case id
  • the gene summary page, by clicking on the gene name 'PDGFRA'

Clicking a gene label

Click on a gene row label to display the following options.

Clicking gene label

The user can change the variable name by deleting the current name and typing a new one in the box where 'PDGFRA' appears. The user may also launch the lollipop plot or the gene summary page, or remove the row entirely.

Hovering over/Clicking a cell

Hover over a cell of the heatmap to show information about the case. The information displayed includes the case id, the gene name (CCND1), and the z-score transformed value (4.04..).

Hovering over a case

Variables

Clicking a Variable

Click on a variable (for example 'Project id' here) row label to display the options as shown.

Clicking a variable

The user can change the variable name (input box), edit the variable to exclude categories ('Edit' button), replace the variable with another one ('Replace' button), or remove the row containing the variable entirely by clicking the 'Remove' button.

Renaming a variable

To rename a variable, edit the default name of the variable in the input box as shown.

Rename variable

After renaming the variable, click 'submit'. The row now shows the new variable name.

Editing a variable

To edit groups within the variable, click the 'Edit' button. The user can now drag categories from group 1 into group 2 to create two separate groups, and can also exclude a category. After making a choice, click 'Apply' to reload the chart.

Editing a variable

When editing the "Overall Survival" variable, users can choose between Time to Event or Exit Code. If Time to Event is selected, users have the option to convert the values to z-scores.

Gene Expression Clustering Tool Overall Survival Editing

Users can drag and drop a variable row that isn't used for clustering to reposition it.

Replacing a variable

To replace a variable, click on the row label for that variable and click Replace. This shows the GDC dictionary from which a user can select a variable of choice as shown.

Replace variable

Removing a variable

To remove a row containing a variable entirely, click on the row label for that variable and click 'Remove'. This removes the entire row from the heatmap.

Remove variable

Legend

Interacting with legend filters

Variables can be filtered via the legend. Click a legend item to display the following options. The user may choose to 'Hide', 'Show only', or 'Show all' categories of the selected variable, which filters the display down to the categories of choice.

Clicking legend icons
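The effect of the three legend options can be sketched as a simple filter over case records. The record layout, the action names, and the function `apply_legend_filter` are all hypothetical; the sketch only illustrates the filtering semantics described above and is not the tool's implementation.

```python
def apply_legend_filter(cases, variable, action, category=None):
    """Return the cases that remain visible after a legend action.

    `cases` is a list of dicts mapping variable names to category values.
    `action` is one of "hide", "show_only", or "show_all" (assumed names).
    """
    if action == "show_all":
        return list(cases)
    if action == "hide":
        return [case for case in cases if case.get(variable) != category]
    if action == "show_only":
        return [case for case in cases if case.get(variable) == category]
    raise ValueError(f"unknown action: {action}")

# Hypothetical case records with one variable
cases = [
    {"case": "case-1", "Gender": "female"},
    {"case": "case-2", "Gender": "male"},
]
print(apply_legend_filter(cases, "Gender", "show_only", "female"))
```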