ProteinPaint Tool

Introduction to ProteinPaint

ProteinPaint is a web-based, dynamic visualization tool that displays a lollipop chart based on the multidimensional skewer version 3 (mds3 track). This tool utilizes variant annotations from GDC datasets. Given a particular gene, it displays the variants associated with that gene as well as the occurrence, disease type, and demographic information of the associated cases.

Quick Reference Guide

At the Analysis Center, click on the 'ProteinPaint' card to launch the app.

Analysis Center

Users can view publicly available variants or log in with credentials to access controlled data.

When launched, ProteinPaint will display a search box where users can enter a gene symbol, alias, or GENCODE accession. Once a gene is entered, a lollipop frame is displayed with the name of the chart in the header.

Lollipop Frame

In addition to the search box, there are two other main panels in the ProteinPaint tool: the Lollipop chart panel and the Legend panel.

3 Main Panels

Lollipop Chart Panel

After entering a gene, the tool will display a Lollipop chart for the GDC variants as well as a Protein View for the default isoform.

In the Lollipop chart, the circular discs for each variant are color coded per GDC mutation classes and are proportional in size to the number of occurrences. Variants in the same position are arranged in descending order of occurrences.

Lollipop Chart

Exon variants report the amino acid change at the referenced codon. For example, G12D is a G > D substitution at the 12th codon of the protein.
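The label convention described above can be illustrated with a short sketch. It is not part of ProteinPaint; the function name `parse_substitution` and the regular expression are assumptions that cover only simple one-letter substitution labels such as G12D.

```python
import re
from typing import NamedTuple


class AminoAcidChange(NamedTuple):
    reference: str  # one-letter code of the reference amino acid
    codon: int      # 1-based codon position in the protein
    variant: str    # one-letter code of the substituted amino acid


# Hypothetical pattern: one letter, a codon number, one letter (e.g. "G12D").
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> AminoAcidChange:
    """Split a label like 'G12D' into reference, codon, and variant."""
    match = _SUBSTITUTION.match(label.strip())
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return AminoAcidChange(ref, int(pos), alt)


change = parse_substitution("G12D")
print(f"{change.reference} > {change.variant} at codon {change.codon}")
```

Labels for other variant types (for example frameshifts or splice variants) would need additional patterns and are rejected by this sketch.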

The default isoform will appear directly to the right of the gene name. Clicking on the isoform number will open a display to view/select other isoforms and switch the display track.

Display All Isoforms

Clicking on the number of variants link, to the left of the plot, opens a menu where users can view annotations and manipulate the Lollipop:

  • List: Displays all variants, each of which can be selected to launch the annotation table which displays consequence, mutation, sample submitter_id, and other data related to the sample
  • Collapse/Expand: Collapses or expands all skewers in the lollipop
  • Download: Downloads the mutations in a TXT file
  • As lollipops: Displays variants via circular discs proportional to the number of occurrences
  • Occurrence as Y axis: Sorts variants on the y-axis by number of occurrences

Clicking on the number of samples opens a window to view annotations grouped by GDC case properties such as disease type and primary site. Selecting a value adds a new Lollipop subtrack that displays only the samples with the given value. This side-by-side view allows for a comparison between the mutations in the main track versus the subtrack.

Adenomas and Adenocarcinomas Example: Side-by-Side View

Each subtrack offers advanced filtering for users to narrow down particular features. Clicking on the value to the right of the Lollipop launches a pop-up window where users can add subsequent filters using the +AND or +OR options.

Advanced Filtering

Detailed variant annotation is viewable by clicking on the disc next to the variant label. The sunburst chart is composed of a ring hierarchy, arranged by disease types then broken down by primary sites.

Variant Occurrence

Hovering over the inner and outer rings displays the disease type or primary site, number of samples, and cohort size.

Node

An aggregate table displaying all the samples associated with that variant is available by clicking the 'Info' button in the center of the sunburst.

Sample Annotation Table

The top of the table displays consequence, mutation, and occurrence count with a link to the GDC Mutation Summary Page.

The sample table contains a number of columns for various associated features per sample such as Disease type, Mutations, and Tumor DNA Mutant Allele Frequency. Users can create a new cohort by selecting the checkboxes in the first column then clicking 'Create Cohort' in the bottom right corner of the table. The table also includes options to launch the Disco plot and the GDC Case Summary Page for each sample.

Protein View

The Protein View, which displays the nucleotides, codons in the exon region, introns, and protein domains, is the primary area in which a user will visualize and interact with protein coding regions.

Protein View

To zoom into the Protein View, users can highlight a region or use the zoom buttons (In, Out x2, x10, and x50) in the toolbar. To view a nucleotide of interest, click and drag in the top Protein length scale. The region appears highlighted in red with the calculated protein length in the center.

Zoom Method

Zooming in to the protein track displays the codons and the nucleotides. Hovering over the nucleotide position displays a tooltip with the exon, amino acid position, RNA position, and protein domain.

Codons and Nucleotides

By clicking on the isoform number in the Lollipop chart, users can switch the display track between genomic, splicing RNA, exon only, protein, and aggregate of all isoforms.

Legend Panel

Protein Domains

The Protein View color codes regions by the protein domain present on the full-length protein region in the exon display.

Protein Domains

The legend offers simple filtering for the variants shown in the lollipop. To the right of PROTEIN, users can click on the color to hide that particular protein domain. Clicking on the color again shows the protein domain.

Custom protein domains are added by clicking on the + add protein domain button at the bottom of the list. An input box appears requiring the following information:

  1. Name, text with space, no semicolon: Name of the protein domain
  2. Range, two integers joined by space: Codon position - start and stop
  3. Color (e.g., red, #FF0000, rgb (255,0,0)): Color to assign to the protein domain
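The three required fields can be checked before they are typed into the input box. The sketch below is illustrative only: it assumes the fields are entered on one line separated by semicolons (suggested by the "no semicolon" rule for the name, but not confirmed here), and `parse_domain` is a hypothetical helper, not a ProteinPaint function.

```python
from typing import NamedTuple


class ProteinDomain(NamedTuple):
    name: str
    start: int   # first codon of the domain
    stop: int    # last codon of the domain
    color: str


def parse_domain(entry: str) -> ProteinDomain:
    """Validate an entry such as 'My domain; 10 50; #FF0000'.

    ASSUMPTION: the fields are separated by semicolons.
    """
    parts = [part.strip() for part in entry.split(";")]
    if len(parts) != 3:
        raise ValueError("expected three fields: name; range; color")
    name, range_text, color = parts
    if not name:
        raise ValueError("name must not be empty")
    bounds = range_text.split()
    if len(bounds) != 2:
        raise ValueError("range must be two integers joined by a space")
    start, stop = int(bounds[0]), int(bounds[1])  # int() rejects non-integers
    if start < 1 or stop < start:
        raise ValueError("range must satisfy 1 <= start <= stop")
    if not color:
        raise ValueError("color must not be empty")
    return ProteinDomain(name, start, stop, color)


print(parse_domain("My domain; 10 50; #FF0000"))
```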

The protein domains also include links to databases of protein families such as the Conserved Domains Database (CDD), Simple Modular Architecture Research Tool (SMART), and Pfam.

GDC Mutation Class

The GDC mutation class color coding for the lollipop discs appears below the legend for the protein domains.

Color Coding Legend

Clicking on a mutation class opens a pop-up menu with show/hide functionalities:

  • Hide: Remove all of the lollipop discs for the particular mutation class
  • Show only: Only show the lollipop discs for the particular mutation class
  • Show all: Display the lollipop discs for all mutation classes
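The three menu actions can be thought of as operations on the set of visible mutation classes. The following sketch models that idea under stated assumptions: the class names other than MISSENSE are placeholders, and the functions are hypothetical rather than taken from ProteinPaint.

```python
from typing import FrozenSet

# Illustrative subset of classes; only MISSENSE is named in this guide.
ALL_CLASSES: FrozenSet[str] = frozenset({"MISSENSE", "NONSENSE", "FRAMESHIFT"})


def hide(visible: FrozenSet[str], mutation_class: str) -> FrozenSet[str]:
    """Hide: remove the discs of one class, keep everything else as is."""
    return visible - {mutation_class}


def show_only(mutation_class: str) -> FrozenSet[str]:
    """Show only: keep the discs of one class and hide all others."""
    return frozenset({mutation_class})


def show_all() -> FrozenSet[str]:
    """Show all: restore every class, including previously hidden ones."""
    return ALL_CLASSES


visible = hide(ALL_CLASSES, "MISSENSE")
print(sorted(visible))
print(sorted(show_only("MISSENSE")))
print(sorted(show_all()))
```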

Mutation Pop-Up Menu

Additional Features

In the toolbar, the More button offers methods to download figures and data:

ProteinPaint More Options

  • Export SVG: Download the Lollipop and legend as an SVG file
  • Reference DNA Sequence: Display the DNA sequence as plain text for easy copying and pasting
  • Highlight: Highlight a region in the Lollipop by selecting it in the chart or entering it in a text box

ProteinPaint Features

When selected, ProteinPaint will display the search box as illustrated below. Once a user enters a gene symbol, alias, or GENCODE accession, a lollipop frame is displayed with the name of the chart in the header. The example below is of the gene AKT1. All gene symbols are based on the HGNC guidelines.

Lollipop Frame

There are 3 main panels as outlined in the figure below:

  1. Search box
  2. Lollipop chart panel
  3. Legend panel

3 Main Panels

Search Box

The example below uses the KRAS gene. The name of the gene (e.g., 'KRAS'), a GENCODE accession number (e.g., ENST00000311936, ENSP00000308495), or a RefSeq accession (e.g., NM_004985) can be used as the search term. If an invalid gene is entered, the search box will display an error. For gene searches only, typing a few letters reveals a menu of possible matches. Either choose a menu option or press Enter.
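To make the accepted search terms concrete, the sketch below classifies a term by its shape. The patterns are assumptions inferred from the examples in this section (ENST/ENSP followed by 11 digits, NM_ followed by digits); ProteinPaint's own matching rules may differ, and `classify_search_term` is a hypothetical name.

```python
import re


def classify_search_term(term: str) -> str:
    """Guess the kind of search term from its shape (illustrative only)."""
    term = term.strip()
    # Ensembl/GENCODE transcript (ENST) or protein (ENSP) identifiers.
    if re.fullmatch(r"ENS[TP]\d{11}(\.\d+)?", term):
        return "GENCODE accession"
    # RefSeq transcript identifiers such as NM_004985.
    if re.fullmatch(r"N[MR]_\d+(\.\d+)?", term):
        return "RefSeq accession"
    # Anything else that looks like a symbol is treated as a gene name.
    if re.fullmatch(r"[A-Za-z][A-Za-z0-9-]*", term):
        return "gene symbol or alias"
    return "unrecognized"


for example in ["KRAS", "ENST00000311936", "ENSP00000308495", "NM_004985"]:
    print(example, "->", classify_search_term(example))
```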

Search Box Example

Lollipop Chart Panel

Protein View

After searching for KRAS, the Protein View for the default isoform appears in a new frame. The Protein View displays the nucleotides, codons in the exon region, introns, and protein domains as shown below.

Protein View

The legend offers simple filtering for the variants shown in the lollipop. Clicking the color for a protein domain to the right of PROTEIN, for example, hides that protein domain. Clicking on the color again shows the protein domain. Similar show/hide functions are available by clicking on the legend labels.

The default isoform for KRAS on hg38 genome build is NM_004985. Hovering over the isoform label will highlight it as shown below.

Default Isoform

A user can select the isoform by clicking on the isoform number as shown in the figure above. Clicking this will open a display to view all the other isoforms as well as the option to switch the display track as shown below in the figure.

Display All Isoforms

From Switch Display, a user can update to one of the following:

  1. Genomic display
  2. Splicing RNA
  3. Exon only
  4. Protein track
  5. Aggregate of all isoforms

The Protein track is the primary area in which a user will visualize and interact with protein coding regions.

Protein Track

Under Switch Isoform, the available RefSeq and Ensembl isoform builds are listed. A condensed display and the protein length are shown for each isoform. The current selection appears in red text. The default KRAS isoform, for example, is NM_004985 with 189 amino acids. To change the isoform, click on the appropriate line highlighted in yellow.

Switch Isoforms

The pop-up window disappears and the lollipop track rerenders with the newly selected isoform.

Lollipop Charts

The lollipop chart for the GDC variants appears above the Protein View. The circular disc for each variant is proportional to the number of occurrences. Variants in the same position are arranged in descending order of occurrences. There are eight types of variants found in the lollipop chart (see legend).
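The stacking rule for variants that share a position can be sketched as a grouping followed by a sort. This is an illustration, not ProteinPaint code: `stack_by_position` is a hypothetical helper, and all occurrence counts except the 333 for G12D are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Variant = Tuple[str, int, int]  # (label, codon position, occurrences)


def stack_by_position(variants: List[Variant]) -> Dict[int, List[Variant]]:
    """Group variants by position, most frequent first within each group."""
    stacks: Dict[int, List[Variant]] = defaultdict(list)
    for variant in variants:
        stacks[variant[1]].append(variant)
    return {
        position: sorted(stack, key=lambda v: v[2], reverse=True)
        for position, stack in sorted(stacks.items())
    }


# Hypothetical counts, except G12D (333 occurrences in this guide's example).
variants = [("G12V", 12, 40), ("G12D", 12, 333), ("G13D", 13, 25), ("G12A", 12, 7)]
for position, stack in stack_by_position(variants).items():
    print(position, [label for label, _, _ in stack])
```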

Lollipop Chart

Exon variants report the amino acid change at the referenced codon. For example, G12D is a G > D substitution at the 12th codon of the protein.

Clickable links for the number of cases (e.g. 1315 samples) and number of variants (e.g. 99 out of 110 variants) appear to the left of the lollipop. Clicking on these links reveals detailed annotations about the samples and variants, described in Viewing Variants and Case Samples.

Sample and Variant Annotations

Viewing Variants and Case Samples

Variant Annotations and Chart Manipulation

Click on the number of variants link to the left of the lollipop to view annotations and manipulate the lollipop. For variant annotation, click on 'List'.

Variant Annotation

A pop-up window appears with the entire list of variants, as shown below.

List of Variants

Click on the variant of interest and a new annotation table appears. From the table, view various associated features per sample such as: Disease type, Primary site, Project id, Gender, Race, Ethnicity, and Tumor DNA Mutant Allele Frequency (MAF). In the figure below, 333 occurrences are shown for the G12D variant, which represents a missense mutation at chr12:25245350 C>T.

Annotation Table

The first sample that is highlighted in yellow is a male with ductal and lobular neoplasms with a tumor DNA MAF of 31/125. This indicates 31 mutant alleles were found out of 125 total alleles.
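The MAF value is displayed as a "mutant/total" pair, which can be converted to a fraction for comparison across samples. The helper below is a minimal sketch with a hypothetical name (`allele_frequency`); it assumes the value is always written as two non-negative integers separated by a slash.

```python
def allele_frequency(maf: str) -> float:
    """Convert a 'mutant/total' string such as '31/125' to a fraction."""
    mutant_text, total_text = maf.split("/")
    mutant, total = int(mutant_text), int(total_text)
    if total <= 0 or mutant < 0 or mutant > total:
        raise ValueError(f"implausible allele counts: {maf!r}")
    return mutant / total


# 31 mutant alleles out of 125 total alleles.
print(f"{allele_frequency('31/125'):.3f}")  # 0.248
```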

The GDC dataset includes an 'Access' column to indicate whether the data is controlled or open. Users must obtain permission from dbGaP to view controlled data. See Obtaining Access to Controlled Data. Click on the sample hyperlink and the GDC's case summary for the sample will appear in a new tab.

Click 'Back to list' and select another sample, as shown below.

Back to List

After clicking on the variant menu again, select the 'Collapse' option to collapse all skewers in the lollipop.

Collapse

To expand any previously collapsed skewers, open the variant menu, and click on 'Expand'.

Expand

The lollipop chart includes an option to arrange variants by the range of occurrences. Open the variant menu and click on 'Occurrence as Y axis'.

Occurrence as Y Axis

The lollipop re-renders with the variants sorted on the y-axis from lowest to highest occurrence. Hover over a variant to display the number of occurrences. In the example below, a user is hovering over G12D to display 333 occurrences of this variant.

Number of Occurrences

Clicking on the variant loads the sample table again as shown below.

Sample Table

Case Filtering

Clicking on the sample hyperlink on the left of the lollipop (e.g. 1315 samples) opens a menu to list all samples. Aggregate data for all samples by attribute appears in a series of tabs. The ability for advanced filtering and creating subtracks is available from this new display.

Menu to List All Samples

Click on '1315 samples' to view annotations grouped by attributes such as Disease type, Primary site, Project id, Gender, Race, and Ethnicity. For each attribute, the number of values is represented by 'n' to the right of the group label. In the figure below, 21 values for Disease type are reported.

Annotations Grouped by Attributes Example: Disease type

To start filtering, click on the value label or the value's sample fraction. Clicking on 'Adenomas and Adenocarcinomas' or '675/ 4866', for example, loads a new lollipop subtrack underneath the main GDC lollipop track.

Filtering Example: Adenomas and Adenocarcinomas

This new subtrack only shows the 675 Adenomas and Adenocarcinomas samples. This side-by-side view allows for a comparison between the mutations in the main track vs the subtrack.

Adenomas and Adenocarcinomas Example: Side-by-Side View

Each subtrack offers advanced filtering, shown below, for users to narrow down particular features.

Advanced Filtering

Clicking on 'Filter' displays a pop-up window with the feature the user selected previously from the sample annotation menu (e.g. Disease type: Adenomas and Adenocarcinomas). Clicking on either +AND or +OR displays a new pop-up with a search bar. Search for the desired term and click on the term's button. In the image below a user selected 'gender' by clicking the '+AND'.

Filter Pop-Up Window

By clicking on 'Gender', all available values appear with checkboxes (i.e. male and female) as shown below. In this example, male with 293 data points is selected.

Gender Filter Example 1

Click 'Apply' and the subtrack re-renders to reflect the updated filter. In the example below, the subtrack reduces from 675 samples to the 293 male samples with adenomas and adenocarcinomas. The figure shows the difference in mutations in the two tracks. Of the 333 samples with the G12D mutation in the main track, 72 are among the 293 male samples in the subtrack.
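The +AND and +OR options combine conditions in the usual boolean sense. The sketch below models that combination on a few made-up sample records; the attribute keys, the sample data, and the helper names (`equals`, `all_of`, `any_of`) are all hypothetical and do not reflect the GDC data model or the ProteinPaint implementation.

```python
from typing import Callable, Dict, List

Sample = Dict[str, str]
Predicate = Callable[[Sample], bool]


def equals(attribute: str, value: str) -> Predicate:
    """Condition that is true when a sample's attribute has the given value."""
    return lambda sample: sample.get(attribute) == value


def all_of(*predicates: Predicate) -> Predicate:
    """+AND: every condition must hold."""
    return lambda sample: all(p(sample) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """+OR: at least one condition must hold."""
    return lambda sample: any(p(sample) for p in predicates)


# Made-up records for illustration only.
samples: List[Sample] = [
    {"disease_type": "Adenomas and Adenocarcinomas", "gender": "male"},
    {"disease_type": "Adenomas and Adenocarcinomas", "gender": "female"},
    {"disease_type": "Plasma Cell Tumors", "gender": "male"},
]

subtrack_filter = all_of(
    equals("disease_type", "Adenomas and Adenocarcinomas"),
    equals("gender", "male"),
)
subtrack = [s for s in samples if subtrack_filter(s)]
print(len(subtrack), "of", len(samples), "samples pass the AND filter")
```

Swapping `all_of` for `any_of` in the same expression would instead keep every sample that matches either condition.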

Gender Filter Example 2

Click on the 'Close' option to remove the subtrack from the page.

Remove Subtrack

Viewing in the Lollipop Display

In the lollipop chart, users can drag the protein track down by clicking the name of the gene on the left of the protein track and pulling it below the lollipop chart.

Protein Track

Detailed variant annotation is viewable by clicking on the variant disc next to the label. For G12D highlighted in a red outline in the image above, click on the '333' disc. A sunburst chart will appear, shown below.

The center displays the occurrence of the variant (333) above the variant label. The ring hierarchy is arranged by disease types then broken down by primary sites. Hovering over the inner ring displays the disease type, number of samples, and cohort size. In this example, the inner green ring displays 'Plasma Cell Tumors' with 28 samples out of a total 949 samples.

Variant Occurrence

The outer ring represents the primary sites. Hovering over the primary site displays the number of samples relative to the disease type. In the figure below, for Ductal and lobular neoplasms, there are 105 samples with the primary site as pancreas out of 316 total samples.
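The two-level ring hierarchy amounts to counting samples by disease type (inner ring) and then by primary site within each disease type (outer ring). The sketch below shows that aggregation on made-up records; `build_rings` is a hypothetical helper and the sample list is not GDC data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_rings(
    samples: List[Tuple[str, str]],
) -> Tuple[Counter, Dict[str, Counter]]:
    """Count samples per disease type and per primary site within each type."""
    inner: Counter = Counter()                       # disease type -> samples
    outer: Dict[str, Counter] = defaultdict(Counter)  # disease type -> site -> samples
    for disease_type, primary_site in samples:
        inner[disease_type] += 1
        outer[disease_type][primary_site] += 1
    return inner, outer


# Made-up (disease type, primary site) pairs for illustration only.
records = [
    ("Ductal and lobular neoplasms", "Pancreas"),
    ("Ductal and lobular neoplasms", "Pancreas"),
    ("Ductal and lobular neoplasms", "Breast"),
    ("Plasma Cell Tumors", "Hematopoietic and reticuloendothelial systems"),
]
inner, outer = build_rings(records)
for disease_type, count in inner.items():
    print(disease_type, count, dict(outer[disease_type]))
```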

Node

Clicking on a node displays a sample table for the disease type or primary site. In the figure below, the user selected 'Plasma Cell Tumors'. The sample annotation table appears for all Plasma Cell Tumors.

Sample Annotation Table 1

An aggregate sample table is available by clicking the 'Info' button in the center of the sunburst. This displays all the samples associated with that variant. In the screen recording below, the aggregate sample table appears for KRAS - G12D.

Sample Annotation Table 2

Clicking on the sample name hyperlink opens a new tab to the sample's GDC Case Summary page.

Clicking on the variant label in the center removes the sunburst chart.

Variant Label

Working With the Protein Track

There are two zoom methods: highlighting a region and using the zoom buttons in the toolbar. To view a nucleotide of interest, click and drag the mouse along the Protein length scale on the top x-axis. The region appears highlighted in red with the calculated protein length in the center.

Zoom Method

Once the mouse is released, the lollipop re-renders as the selected region.

The zoom buttons in the toolbar are the second option; they zoom in and out around the center position of the lollipop. When zooming out, users can choose a factor of 2x, 10x, or 50x.
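Zooming about the center can be expressed as scaling the visible window while keeping its midpoint fixed. The function below is a sketch under assumptions: coordinates are 1-based codon positions, the window is clamped to the protein length, and `zoom` is a hypothetical name. The 189-residue length is the KRAS NM_004985 value given in this guide; the starting window is arbitrary.

```python
from typing import Tuple


def zoom(start: int, stop: int, factor: float, length: int) -> Tuple[int, int]:
    """Scale the visible window about its center.

    factor > 1 zooms out (e.g. 2, 10, 50); factor < 1 zooms in.
    The result is clamped to the range 1..length (an assumption).
    """
    center = (start + stop) / 2
    half_width = (stop - start) * factor / 2
    new_start = max(1, round(center - half_width))
    new_stop = min(length, round(center + half_width))
    return new_start, new_stop


KRAS_LENGTH = 189  # amino acids in the default KRAS isoform
print(zoom(80, 100, 2, KRAS_LENGTH))    # zoom out 2x
print(zoom(80, 100, 10, KRAS_LENGTH))   # zoom out 10x, clamped to the protein
print(zoom(80, 100, 0.5, KRAS_LENGTH))  # zoom in
```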

Zoom Options

Zooming in causes the protein track to display the codons and the nucleotides as shown below. Hovering over the nucleotide position displays a tooltip with the exon, amino acid position, RNA position, and protein domain. As shown in the image below, at codon 12, in the second exon of the transcript at RNA position 225 bp, the reference allele is a 'G'. In the KRAS gene for isoform NM_004985, 'G' is substituted with A, V, and D at this position, and the corresponding cases are shown below.

Codons and Nucleotides

Legend Panel

The protein track color codes regions by the protein domain present on the full-length protein region in the exon display. For KRAS, the protein domains are shown in the red box in the image below.

Protein Domains

Protein Domain Legend

Clicking on the colored box next to the protein domain label removes the color from the protein track, as depicted below.

Remove Color from Protein Track

Custom protein domains are added by clicking on the '+add protein domain' button at the bottom of the list. An input box appears requiring the following information:

  1. Name, text with space, no semicolon: This is the name of the protein domain
  2. Range, two integers joined by space: This is the codon position - start and stop
  3. Color (e.g., red, #FF0000, rgb (255,0,0)): This is the color to assign to the protein domain

GDC Mutations

The lollipop discs are color coded per GDC mutation classes. The legend for the mutations appears below the protein domains with more advanced show/hide functions.

Color Coding Legend

The classification for the type of variant is color coded as follows:

Color Coding Classification

Clicking on a mutation opens a pop-up menu with the description of the mutation. The 'hide' and 'show only' options are specific to the selected mutation. The 'show all' option restores all previously hidden mutations. Selecting 'MISSENSE', shown in the figure below by the yellow highlight, displays the initial menu with the 'hide' and 'show only' buttons.

Mutation Pop-Up Menu

Clicking 'Hide' removes all of the mutation discs from the lollipop. The mutation is moved to the end of the list, and its label is struck through and grayed out. The discs reappear when the mutation label is clicked again.

Hide Mutation Discs

More Options

ProteinPaint offers methods to download figures and data. Click the 'More' button in the toolbar to display various options as shown below.

ProteinPaint More Options

Exporting the Figure

Click "Export SVG" to download the lollipop and legend as an SVG file.

Export SVG

The exported figure will contain the following contents, reflecting a user's customization:

  • Displayed datasets, including custom data
  • Expand/fold states of all mutations
  • Sequences in the protein if at zoom-in level
  • Show/hide state of exon boundaries
  • Sunburst charts
  • Protein domains without the hidden ones
  • All mutations without the hidden classes or origins
  • Legend for protein domain, mutation class and origin

Copying the DNA Sequence

The 'More' button also includes a 'DNA sequence' button.

DNA Sequence

Clicking on 'DNA sequence' displays the DNA sequence as plain text for easy copying and pasting.

DNA Sequence Plain Text

The pop-up option under the More button opens another window with the same lollipop display selected by the user. Below is an example.

Popup

Popup Window