Data Submission Portal
This section will walk users through the submission process using the GDC Data Submission Portal to upload files to the GDC.
Accessing the GDC Data Submission Portal requires eRA Commons credentials with appropriate dbGaP authorization. To learn more about obtaining the required credentials and authorization, see Obtaining Access to Submit Data.
Authentication via eRA Commons
Users can log into the GDC Data Submission Portal with eRA Commons credentials by clicking the "Login" button. If authentication is successful, the user will be redirected to the GDC Data Submission Portal front page and the user's eRA Commons username will be displayed in the upper right corner of the screen.
GDC Authentication Tokens
The GDC Data Portal provides authentication tokens for use with the GDC Data Transfer Tool or the GDC API. To download a token:
- Log into the GDC using your eRA Commons credentials.
- Click the username in the top right corner of the screen.
- Select the "Download Token" option.
A new token is generated each time the Download Token button is clicked.
For more information about authentication tokens, see Data Security.
NOTE: The authentication token should be kept in a secure location, as it allows access to all data accessible by the associated user account.
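As a minimal sketch of how a downloaded token might be supplied to the GDC API, the example below reads the token text from a file and places it in the X-Auth-Token request header. The helper names are hypothetical, the token value is a placeholder, and no request is sent.

```python
from pathlib import Path


def read_token(token_path: str) -> str:
    """Read the token file and strip surrounding whitespace and newlines."""
    return Path(token_path).read_text(encoding="utf-8").strip()


def build_auth_headers(token: str) -> dict:
    """Return request headers that carry the token for a GDC API call."""
    if not token:
        raise ValueError("token is empty")
    return {"X-Auth-Token": token}


# Placeholder value for illustration only; a real token comes from "Download Token".
example_headers = build_auth_headers("example-token-text")
```

In practice the token would be loaded with something like read_token("gdc-user-token.txt"), where the file name is an assumption, and the file should be readable only by its owner.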
To log out of the GDC, click the username in the top right corner of the screen, and select the Logout option. Users will automatically be logged out after 15 minutes of inactivity.
After authentication, users are redirected to a homepage. The homepage acts as the entry point for GDC data submission and provides submitters with access to a list of authorized projects, reports, and transactions. Content on the homepage varies based on the user profile (e.g. submitter, program office).
Project summary reports can be downloaded from the Submission Portal homepage at three different levels: CASE OVERVIEW, ALIQUOT OVERVIEW, and DATA VALIDATION. Each report is generated in tab-delimited format in which each row represents an active project.
CASE OVERVIEW: This report describes the number of cases with associated biospecimen data, clinical data, or submittable data files (broken down by data type) for each project.
ALIQUOT OVERVIEW: This report describes the number of aliquots in a project with associated data files. Aliquot numbers are broken down by sample tissue type.
DATA VALIDATION: This report categorizes all submittable data files associated with a project by their file status.
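Because the reports are tab-delimited with one row per project, they can be read with a standard CSV parser. The sketch below assumes a header row, and the column names in the sample text are invented for illustration; the real report columns may differ.

```python
import csv
import io

# Hypothetical report content; real column names may differ.
SAMPLE_REPORT = (
    "project_id\tcases_with_clinical\tcases_with_biospecimen\n"
    "PROJ-A\t12\t10\n"
    "PROJ-B\t3\t3\n"
)


def read_report(text: str) -> list[dict]:
    """Parse a tab-delimited report into one dict per project row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_report(SAMPLE_REPORT)
for row in rows:
    print(row["project_id"], row["cases_with_clinical"])
```

Values are returned as strings, so numeric columns need an explicit int() conversion before any arithmetic.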
The projects section of the homepage lists the projects that the user has access to, along with basic information about each project. For users with access to a large number of projects, this table can be filtered using the 'FILTER PROJECTS' field. Selecting a project ID will direct the user to the project's Dashboard. The button used to release data for each project is also located on this screen; see Release for details.
The GDC Data Submission Portal dashboard provides details about a specific project.
The dashboard contains visual elements that guide the user through all stages of submission, from viewing the Data Dictionary and uploading data to submitting a project for harmonization.
To better understand the information displayed on the dashboard and the available actions, please refer to the Data Submission Walkthrough.
The Project Overview section of the dashboard displays the current project state (open / review / submitted / processing) and the GDC Release, which is the date on which the project was released to the GDC.
The search field at the top of the dashboard allows for submitted entities to be searched by partial or whole submitter_id. When a search term is entered into the field, a list of entities matching the term is updated in real time. Selecting one of these entities links to its details in the Browse Tab.
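The partial-match behavior described above can be approximated with a substring test. This is only an illustrative sketch: the function name is hypothetical, and case-insensitive matching is an assumption, not a documented property of the portal's search.

```python
def search_submitter_ids(submitter_ids, term):
    """Return the IDs that contain the search term (case-insensitive, assumed)."""
    needle = term.strip().lower()
    if not needle:
        return []
    return [sid for sid in submitter_ids if needle in sid.lower()]


ids = ["PROJ-A-case-001", "PROJ-A-case-002", "PROJ-A-sample-001"]
matches = search_submitter_ids(ids, "case")
```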
The remaining part of the top section of the dashboard is broken down into four status charts:
- Cases with Clinical: The number of cases for which Clinical data has been uploaded.
- Cases with Biospecimen: The number of cases for which Biospecimen data has been uploaded.
- Cases with Submittable Data Files: The number of cases for which experimental data has been uploaded.
- Submittable Data Files: The number of files uploaded through the GDC Data Transfer Tool. For more information on this status chart, please refer to File Lifecycle.
DOWNLOAD MANIFEST: This button below the status chart allows the user to download a manifest for registered files in this project that have not yet been uploaded.
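Conceptually, the manifest covers files that are registered but not yet uploaded. The sketch below shows that selection over a list of file records; the record fields and the state labels ("registered", "uploaded") are assumptions made for illustration and do not describe the manifest file format.

```python
# Hypothetical file records; field names and state labels are assumptions.
files = [
    {"file_name": "sample_a.bam", "state": "registered"},
    {"file_name": "sample_b.bam", "state": "uploaded"},
    {"file_name": "sample_c.fastq", "state": "registered"},
]


def files_pending_upload(records):
    """Return the names of files that are registered but not yet uploaded."""
    return [r["file_name"] for r in records if r["state"] == "registered"]


pending = files_pending_upload(files)
```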
There are two action panels available below the Project Overview.
- UPLOAD DATA TO YOUR WORKSPACE: Allows a submitter to upload project data to the GDC project workspace. The GDC will validate the uploaded data against the GDC Data Dictionary. This panel also contains a table that displays details about the five latest transactions. Clicking the IDs in the first column will bring up a window with details about the transaction, which are documented in the transactions page. This panel will also allow the user to commit file uploads to the project.
- REVIEW AND SUBMIT YOUR WORKSPACE DATA TO THE GDC: Allows a submitter to review project data. Starting a review locks the project so that additional data cannot be uploaded while it is in review. Once the review is complete, the data can be submitted to the GDC for processing through the GDC Harmonization Process.
These actions and associated features are further detailed in their respective sections of the documentation.
The transactions page lists all of the project's transactions. The transactions page can be accessed by choosing the Transactions tab at the top of the dashboard or by choosing "View All Data Upload Transactions" in the first panel of the dashboard.
The types of transactions are the following:
- Upload: The user uploads data to the project workspace. Note that submittable data files uploaded using the GDC Data Transfer Tool do not appear as transactions. Uploaded submittable data can be viewed in the Browse tab.
- Delete: The user deletes data from the project workspace.
- Review: The user reviews the project before submitting data to the GDC.
- Open: The user re-opens the project if it was under review. This allows the upload of new data to the project workspace.
- Submit: The user submits uploaded data to the GDC. This triggers the data harmonization process.
- Release: The user releases harmonized data to be available through the GDC Data Portal and other GDC data access tools.
Transactions List View
The transactions list view displays the following information:
|ID||Identifier of the transaction|
|Type||Type of the transaction (see the list of transaction types in the previous section)|
|Step||The step of the submission process that each file is currently in. This can be Validate or Commit. "Validate" represents files that have not yet been committed but have been uploaded using the submission portal or the API.|
|DateTime||Date and Time that the transaction was initiated|
|User||The username of the submitter that performed the transaction|
|State||Indicates the status of the transaction:
|Commit/Discard||Two buttons appear when data has been uploaded using the API or the submission portal. This allows for validated data to be incorporated into the project or discarded. This column will then display the transaction number for committed uploads and "Discarded" for the uploads that are discarded.|
Choosing from the drop-down menu at the top of the table allows the transactions to be filtered by those that are in progress, to be committed, succeeded, failed, or discarded. The drop-down menu also allows for the transactions to be filtered by type and step.
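The filtering described above amounts to keeping the transactions that match every selected criterion. The following sketch models it with a small record type; the class, the function, and the sample rows are hypothetical, and the state labels simply reuse the wording of the drop-down options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    id: int
    type: str   # e.g. "Upload", "Delete", "Review"
    step: str   # "Validate" or "Commit"
    state: str  # e.g. "succeeded", "failed", "to be committed"


def filter_transactions(transactions, state=None, type_=None, step=None):
    """Keep the transactions that match every filter that is not None."""
    result = []
    for tx in transactions:
        if state is not None and tx.state != state:
            continue
        if type_ is not None and tx.type != type_:
            continue
        if step is not None and tx.step != step:
            continue
        result.append(tx)
    return result


# Made-up sample rows for illustration.
transactions = [
    Transaction(1, "Upload", "Validate", "to be committed"),
    Transaction(2, "Upload", "Commit", "succeeded"),
    Transaction(3, "Delete", "Commit", "failed"),
]
```

For example, filter_transactions(transactions, type_="Upload", step="Validate") keeps only the upload that is still waiting to be committed.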
Clicking on a transaction will open the details panel. Data in this panel is organized into multiple sections including actions, details, types, and documents as described below.
Navigation between the sections can be performed by either scrolling down or by clicking on the section icon displayed on the left side of the details panel.
The Actions section allows a user to perform an action for transactions that provide actions. For example, if a user uploads read groups and file metadata, a corresponding manifest file will be available for download from the transaction. This manifest is used to upload the actual files through the GDC Data Transfer Tool.
The Details section provides details about the transaction itself, such as its project, type, and number of affected cases.
The Types section lists the type of files submitted and the number of affected cases and entities.
The Documents section lists the files submitted during the transaction. The user can download the original files from the transaction, a report detailing the transaction, or, for a failed transaction, the errors it produced.
The Browse menu provides access to all of a project's content. Most content is driven by the GDC Data Dictionary and the interface is dynamically generated to accommodate the content.
Please refer to the GDC Data Dictionary Viewer for specific details about dictionary-generated fields, columns, and filters.
Main Interface Elements
A wide set of filters is available for the user to select the type of entity to be displayed. These filters are dynamically created based on the GDC Data Dictionary.
Current filters are:
|Clinical||Display all Clinical data uploaded to the project workspace. This is divided into subgroups including
|Biospecimen||Display all Biospecimen data uploaded to the project workspace. This is divided into subgroups including
|Submittable Data Files||Displays all data files that have been registered with the project. This includes files that have been uploaded and those that have been registered but not uploaded yet. This category is divided into groups by file type.|
|Annotations||Lists all annotations associated with the project. An annotation provides an explanatory comment associated with data in the project.|
|Harmonized Data Files||Lists all data files that have been harmonized by the GDC. This category is divided into groups by generated data.|
The list view is a paginated list of all entities corresponding to the selected filter.
On the top-right section of the screen, the user can download data about all entities associated with the selected filter.
- For the case filter, it will download all Clinical data or all Metadata.
- For all other filters, it will download the corresponding metadata (e.g., for the demographic filter, it will download all demographic data).
Clicking on an entity will open the details panel. Data in this panel is broken down into multiple sections depending on the entity type. The main sections are:
- Actions: Actions that can be performed relating to the entity. This includes downloading the metadata (JSON or TSV) or submittable data file pertaining to the entity and deleting the entity. See the Deleting Entities guide for more information.
- Summary: A list of IDs and system properties associated with the entity.
- Details: Properties of the entity (not associated with cases).
- Hierarchy or Related Entities: A list of associated entities.
- Annotations: A list of annotations associated with the entity.
- Transactions: A list of previous transactions that affect the entity.
The sections listed above can be navigated either by scrolling down or by clicking on the section icon on the left side of the details panel.
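To illustrate the difference between the two metadata download formats mentioned under Actions, the sketch below serializes the same record as JSON and as TSV. The record and its property values are invented for illustration, and the exact layout of the portal's downloads may differ.

```python
import csv
import io
import json

# Hypothetical entity metadata; real records carry dictionary-defined properties.
entity = {
    "submitter_id": "demo-demographic-1",
    "type": "demographic",
    "project_id": "PROJ-A",
}


def to_json(record: dict) -> str:
    """Serialize one record as indented JSON."""
    return json.dumps(record, indent=2, sort_keys=True)


def to_tsv(records: list) -> str:
    """Serialize records as tab-separated text with a header row."""
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


tsv_text = to_tsv([entity])
json_text = to_json(entity)
```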
The Related Entities table lists all entities, grouped by type, related to the selected case. This section is only available at the case level.
This table contains the following columns:
- Category: category of the entity (Clinical, Biospecimen, submittable data file).
- Type: type of entity (based on Data Dictionary).
- Count: number of occurrences of an entity associated with the case. Clicking on the count will open a window listing those entities within the Browse page.
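The three columns above can be derived by counting a case's related entities per (category, type) pair. A minimal sketch, with a hypothetical function name and made-up sample entities:

```python
from collections import Counter

# Made-up (category, type) pairs for the entities related to one case.
related = [
    ("Clinical", "demographic"),
    ("Biospecimen", "sample"),
    ("Biospecimen", "sample"),
    ("Biospecimen", "aliquot"),
]


def related_entities_table(entities):
    """Return sorted (category, type, count) rows for the given entities."""
    counts = Counter(entities)
    return [(category, type_, n) for (category, type_), n in sorted(counts.items())]


table = related_entities_table(related)
for category, type_, count in table:
    print(category, type_, count)
```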
The hierarchy section is available for entities at any level (e.g., Clinical, Biospecimen, etc.), except for case. The user can use the hierarchy section to navigate through entities.
The hierarchy shows:
- The case associated with the entity.
- The direct parents of the entity.
- The direct children of the entity.
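The three parts of the hierarchy can be derived from a list of parent-to-child links. The sketch below is an assumption-laden illustration: the link list, the entity identifiers, and the function names are hypothetical, and the case is taken to be the top-level ancestor that has no parent of its own.

```python
# Made-up (parent, child) links between entities.
links = [
    ("case-1", "sample-1"),
    ("sample-1", "aliquot-1"),
    ("aliquot-1", "read_group-1"),
]


def direct_parents(entity_id, links):
    """Entities linked directly above the given entity."""
    return sorted(parent for parent, child in links if child == entity_id)


def direct_children(entity_id, links):
    """Entities linked directly below the given entity."""
    return sorted(child for parent, child in links if parent == entity_id)


def associated_cases(entity_id, links):
    """Walk upward through the links and return the top-level ancestors."""
    parents_of = {}
    for parent, child in links:
        parents_of.setdefault(child, set()).add(parent)
    roots, stack, seen = set(), [entity_id], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = parents_of.get(current, set())
        if not parents and current != entity_id:
            roots.add(current)
        stack.extend(parents)
    return roots
```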
Submit Your Workspace Data to the GDC
The GDC Data Submission process is detailed in the Data Submission Processes and Tools section of the GDC Website.
The user will be able to view the section below on the dashboard. The REVIEW button is available only if the project is in "OPEN" state.
Setting the project to the "REVIEW" state will lock the project and prevent users from uploading additional data. During this period, the submitter can browse the data in the Data Submission Portal or download it. Once the review is complete, the user can request to submit data to the GDC.
Once the user clicks on REVIEW, the project state will change to "REVIEW":
The Harmonization step is NOT an automatic process that occurs when data is uploaded to the GDC. The GDC performs batch processing of submitted data for Harmonization only after verifying that the submission is complete.
The following tests must pass before the data can be considered complete:
- All files that are registered have been uploaded and validated.
- There are no invalid characters in the submitter_id of any node. The acceptable characters are alphanumeric characters [a-z, A-Z, 0-9] and -. Any other characters will interfere with the Harmonization workflow.
- There are no data files with duplicate md5sums.
- Clinical data nodes, such as clinical_supplement, are linked to case.
- Each read_group node is linked to a valid node:
- The sample to aliquot relationships are valid. Common problems can sometimes be:
  - Aliquots associated with sample nodes of more than one type.
  - An aliquot attached to more than one sample node, potentially valid but unusual.
- Each aliquot node is only associated with one submitted_aligned_reads file of the same
- The information for the platform is in the read_group node. While the subsequent information about the platform is not required, it is beneficial to also have information on:
- The library_strategy should match the library_selection:
  - Targeted Sequencing must be with either PCR or Hybrid Selection.
  - WXS must be with Hybrid Selection.
  - WGS must be with Random.
- The target_capture_kit property is completed when the selected library_strategy is WXS. Errors will occur if
- Check the nodes that are related to FASTQ files. For the submitted_unaligned_reads node, determine that the size is correct, the files are not compressed (.tar.gz), and there is a link to read_group. For the read_group node, make sure that is_paired_end is set to true for paired end sequencing and false for single end sequencing.
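Several of the checks above are mechanical and can be run locally before requesting harmonization. The sketch below covers three of them: submitter_id characters, duplicate md5sums, and the library strategy pairings. The function names and sample values are hypothetical, and library_selection is assumed to be the read_group property that holds the selection method. Because the full set of permitted punctuation may be wider than shown here, the extra characters are a parameter rather than a fixed rule.

```python
import re
from collections import defaultdict


def invalid_submitter_ids(submitter_ids, extra_chars="-"):
    """Return IDs containing anything other than alphanumerics or extra_chars."""
    pattern = re.compile("[A-Za-z0-9" + re.escape(extra_chars) + "]+")
    return [sid for sid in submitter_ids if not pattern.fullmatch(sid)]


def duplicate_md5sums(files):
    """Map each md5sum shared by several files to the names of those files."""
    names_by_md5 = defaultdict(list)
    for file_name, md5 in files:
        names_by_md5[md5].append(file_name)
    return {md5: names for md5, names in names_by_md5.items() if len(names) > 1}


# Pairings stated in the checklist; strategies not listed are not checked.
ALLOWED_SELECTION = {
    "Targeted Sequencing": {"PCR", "Hybrid Selection"},
    "WXS": {"Hybrid Selection"},
    "WGS": {"Random"},
}


def strategy_matches_selection(library_strategy, library_selection):
    """Return True when the pairing is allowed or no rule is stated for it."""
    allowed = ALLOWED_SELECTION.get(library_strategy)
    return allowed is None or library_selection in allowed


bad_ids = invalid_submitter_ids(["case-001", "case 002", "case#003"])
dupes = duplicate_md5sums([("a.bam", "md5-1"), ("b.bam", "md5-2"), ("c.bam", "md5-1")])
```

These local checks do not replace the GDC's own verification; they only catch the most common problems early.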
Once complete, clicking the REQUEST HARMONIZATION button will indicate to the GDC Team and pipeline automation system that data processing can begin.
Submit to the GDC for Harmonization
When the project is ready for processing, the submitter will request to submit data to the GDC for Harmonization. If the project is not ready for processing, the project can be re-opened. Then the submitter will be able to upload more data to the project workspace.
The REQUEST HARMONIZATION button is available only if the project is in "REVIEW" state. At this point, the user can decide whether to re-open the project to upload more data or to request harmonization of the data to the GDC. When the project is in "REVIEW" the following panel appears on the dashboard:
Once the user submits data to the GDC, they cannot modify the submitted nodes and files while harmonization is underway. Additional project data can be added during this period and will be considered a separate batch. To process an additional batch the user must again review the data and select REQUEST HARMONIZATION.
When the user clicks on the action REQUEST HARMONIZATION on the dashboard, the following popup is displayed:
After the user clicks on SUBMIT VALIDATED DATA TO THE GDC, the project state becomes "Harmonization Requested":
The GDC requests that users submit their data to the GDC for harmonization within six months from the first upload of data to the project workspace.
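The project states described in this section behave like a small state machine: REVIEW is available from "OPEN", and both re-opening and requesting harmonization are available from "REVIEW". The sketch below models only those transitions. The action identifiers and function names are hypothetical, and states after "Harmonization Requested" are deliberately left out.

```python
# (current state, action) -> next state, as described in this section.
TRANSITIONS = {
    ("OPEN", "review"): "REVIEW",
    ("REVIEW", "reopen"): "OPEN",
    ("REVIEW", "request_harmonization"): "HARMONIZATION REQUESTED",
}


def is_allowed(state: str, action: str) -> bool:
    """Return True if the action is available in the given project state."""
    return (state, action) in TRANSITIONS


def next_state(state: str, action: str) -> str:
    """Return the project state after the action, or raise if not allowed."""
    if not is_allowed(state, action):
        raise ValueError(f"action {action!r} is not available in state {state!r}")
    return TRANSITIONS[(state, action)]


state = "OPEN"
for action in ("review", "reopen", "review", "request_harmonization"):
    state = next_state(state, action)
```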
Project release occurs after the data has been harmonized, and allows users to access this data with the GDC Data Portal and other GDC Data Access Tools. The GDC will release data according to GDC Data Sharing Policies. Data must be released within six months after GDC data processing has been completed, or the submitter may request earlier release using the "Request Release" function. A project can only be released once.
When the user clicks on the action REQUEST RELEASE, the following Release popup is displayed:
After the user clicks on RELEASE SUBMITTED AND PROCESSED DATA, the project release state becomes "Release Requested":
Note: Released cases and/or files can be redacted from the GDC. For more information, visit the GDC Policies page (under GDC Data Sharing Policies).