Data Transfer Tool Command Line Documentation

Downloads

Downloading Data Using a Manifest File

A convenient way to download multiple files from the GDC is to use a manifest file generated by the GDC Data Portal. After generating a manifest file (see Preparing for Data Download and Upload for instructions), initiate the download using the GDC Data Transfer Tool by supplying the -m or --manifest option, followed by the location and name of the manifest file. OS X users can drag and drop the manifest file into Terminal to provide its location.

The following is an example of a command for downloading files from GDC using a manifest file:

gdc-client download -m /Users/JohnDoe/Downloads/gdc_manifest_6746fe840d924cf623b4634b5ec6c630bd4c06b5.txt

Downloading Data Using GDC File UUIDs

The GDC Data Transfer Tool also supports downloading of one or more individual files using UUID(s) instead of a manifest file. To do this, enter the UUID(s) after the download command:

gdc-client download 22a29915-6712-4f7a-8dba-985ae9a1f005

Multiple UUIDs can be specified, separated by a space:

gdc-client download e5976406-473a-4fbd-8c97-e95187cdc1bd fb3e261b-92ac-4027-b4d9-eb971a92a4c3
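
When the UUIDs are kept in a text file, one per line, they can be handed to the download command with xargs. The following is a minimal sketch rather than a feature of the tool: the function name download_uuid_list, the file name uuids.txt, and the GDC_CLIENT variable (used only to substitute another executable) are illustrative assumptions.

```shell
# Sketch: pass UUIDs listed in a text file (one per line) to "gdc-client download".
# GDC_CLIENT is a hypothetical override; it defaults to gdc-client on the PATH.
download_uuid_list() {
    uuid_file=$1
    # Drop blank lines and lines starting with '#', then hand the rest to the client.
    grep -v -e '^[[:space:]]*$' -e '^#' "$uuid_file" |
        xargs "${GDC_CLIENT:-gdc-client}" download
}

# Usage: download_uuid_list uuids.txt
```

Setting GDC_CLIENT=echo before the call prints the arguments that would be passed instead of starting a download, which can serve as a dry run.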

Resuming a Failed Download

The GDC Data Transfer Tool supports resumption of interrupted downloads. To resume an incomplete download, repeat the download command for the manifest or UUID(s) in the same folder as the initial download. Failed downloads appear in the destination folder with a .partial extension, which makes it easy to identify where the download stopped. For large downloads, users can use this information to edit the manifest accordingly.

gdc-client download f80ec672-d00f-42d5-b5ae-c7e06bc39da1
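
Editing a large manifest by hand can be tedious, so the step can be scripted. The sketch below writes a reduced manifest that keeps only the entries without a finished file. It rests on assumptions that should be checked against the actual files: the manifest is tab-separated with a header row, the UUID is in column 1 and the file name in column 2, and a completed file is stored as <download dir>/<UUID>/<file name>. The function name remaining_manifest is illustrative.

```shell
# Sketch: print a reduced manifest containing only files that are not yet complete.
# Assumes a tab-separated manifest with a header row, the UUID in column 1 and
# the file name in column 2, and completed files stored as <dir>/<UUID>/<filename>.
remaining_manifest() {
    manifest=$1
    download_dir=${2:-.}
    header_done=0
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue          # skip empty lines
        if [ "$header_done" -eq 0 ]; then
            printf '%s\n' "$line"           # keep the header row
            header_done=1
            continue
        fi
        id=$(printf '%s\n' "$line" | cut -f1)
        name=$(printf '%s\n' "$line" | cut -f2)
        # Keep the row unless the finished file (no .partial suffix) is present.
        [ -f "$download_dir/$id/$name" ] || printf '%s\n' "$line"
    done < "$manifest"
}

# Usage: remaining_manifest gdc_manifest.txt . > remaining_manifest.txt
```

The reduced manifest can then be passed to the download command with the -m option in the same folder as the initial download.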

Download Latest Version of a File

The GDC Data Transfer Tool supports file versioning. Our backend data storage supports multiple file versions, so both older and current versions remain accessible to our users. For information about accessing file versioning information with our API and finding older UUIDs from current UUIDs, see the API User Guide section in our API documentation. When working with older manifests or older lists of UUIDs, the latest version of a file can always be downloaded with the --latest flag.

gdc-client download 426de656-7e34-4a49-b87e-6e2563fa3cdd --latest -t gdc-user-token.2018.txt
Downloading LATEST versions of files
Latest version for 426de656-7e34-4a49-b87e-6e2563fa3cdd ==> 6633bfbd-87f1-4d3a-a475-7ad1e8c2017a
100% [#############################################################################################################################] Time: 0:01:16  14.10 MB/s
Successfully downloaded: 1

Downloading Controlled-Access Data

A user authentication token is required for downloading Controlled-Access Data from GDC. Tokens can be obtained from the GDC Data Portal (see instructions in Obtaining an Authentication Token). Once downloaded, the token file can be passed to the GDC Data Transfer Tool using the -t or --token-file option:

gdc-client download -m gdc_manifest_e24fac38d3b19f67facb74d3efa746e08b0c82c2.txt -t gdc-user-token.2015-06-17T09-10-02-04-00.txt

Directory Structure of Downloaded Files

The directory in which the files are downloaded will include folders named by file UUID. Inside these folders, along with the data and zipped metadata or index files, there will be a logs folder. The logs folder contains state files that ensure that downloads are accurate and allow for resumption of failed or prematurely stopped downloads. While a download is in progress, the file will have a .partial extension, which remains if the download fails. Once a file has finished downloading, the extension is removed. If an identical manifest is retried, another attempt will be made to download files that have a .partial extension.

C501.TCGA-BI-A0VR-10A-01D-A10S-08.5_gdc_realn.bam.partial  logs
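
Because each file sits in a folder named by its UUID, the UUIDs of incomplete downloads can be recovered from the location of the .partial files. The following is a minimal sketch under that assumption; the function name partial_uuids is illustrative and the tool itself does not provide it.

```shell
# Sketch: print the UUID folder names that still contain a .partial file.
# Assumes each .partial file sits directly inside its UUID-named folder.
partial_uuids() {
    find "${1:-.}" -type f -name '*.partial' | while IFS= read -r path; do
        # The parent directory name is the file UUID.
        basename "$(dirname "$path")"
    done | sort -u
}

# Usage: partial_uuids /path/to/download/dir
```

The printed UUIDs can be supplied to the download command again to retry those files.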

Uploads

Uploading Data Using a Manifest File

The GDC Data Transfer Tool supports uploading molecular data to the Data Submission Portal using a manifest file. The manifest file for submittable data files can be retrieved from the GDC Data Submission Portal, or directly from the GDC Submission API given a submittable data file UUID. The user authentication token file must be specified using the -t or --token-file option.

First, generate an upload manifest, either using the GDC Data Submission Portal, or using a call to the GDC Submission API manifest endpoint (as in the following example):

export token=ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTO

curl --header "X-Auth-Token: $token" 'https://api.gdc.cancer.gov/submission/CGCI/BLGSP/manifest?ids=460ad2fe-5a7f-4797-9e18-336d33e21444' >manifest.yml
gdc-client upload --manifest manifest.yml --token-file token.txt
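
As an alternative to pasting the token string into the shell, the token variable used in the curl call can be filled from the same token file that is passed to --token-file. This is a sketch that assumes the file contains only the token string; the function name load_token is illustrative.

```shell
# Sketch: load the token from the token file into the "token" variable.
# Assumes the file holds only the token string.
load_token() {
    # tr removes newline and carriage return characters so the header stays valid.
    token=$(tr -d '\r\n' < "$1")
    export token
}

# Usage:
#   load_token token.txt
#   curl --header "X-Auth-Token: $token" 'https://api.gdc.cancer.gov/submission/CGCI/BLGSP/manifest?ids=460ad2fe-5a7f-4797-9e18-336d33e21444' >manifest.yml
```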

Uploading Data Using a GDC File UUID

The GDC Data Transfer Tool also supports uploading molecular data using a file UUID. The tool will first make a request to get the filename and project id from GDC API, and then upload the corresponding file from the current directory.

gdc-client upload cd939bdd-b607-4dd4-87a6-fad12893932d -t token.txt

Resuming a Failed Upload

By default, the GDC Data Transfer Tool uses multipart transfer to upload files. If an upload fails but some parts were transmitted successfully, a resume file will be saved with the filename resume_[manifest_filename]. Running the upload command again will resume the transfer of only those parts of the file that failed to upload in the previous attempt.

gdc-client upload -m manifest.yml -t token

Deleting Previously Uploaded Data

Previously uploaded data can be replaced with new data by deleting it first using the --delete switch:

gdc-client upload -m manifest.yml -t token --delete
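
The delete and the subsequent upload can be chained so that the new upload only starts if the delete succeeded. The following is a minimal sketch using only the options shown in this section; the function name replace_upload and the GDC_CLIENT variable (a hypothetical override for the executable) are illustrative assumptions.

```shell
# Sketch: delete previously uploaded files, then upload the replacements.
# GDC_CLIENT is a hypothetical override; it defaults to gdc-client on the PATH.
replace_upload() {
    manifest=$1
    token_file=$2
    client=${GDC_CLIENT:-gdc-client}
    # Only start the new upload if the delete step succeeded.
    "$client" upload -m "$manifest" -t "$token_file" --delete &&
        "$client" upload -m "$manifest" -t "$token_file"
}

# Usage: replace_upload manifest.yml token.txt
```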

Troubleshooting

Invalid Token

An error message about an 'invalid token' means that a new authentication token needs to be obtained from the GDC Data Portal or the GDC Data Submission Portal as described in Preparing for Data Download and Upload.

 403 Client Error: FORBIDDEN: {
      "message": "Your token is invalid or expired, please get a new token from GDC Data Portal"
      }

dbGaP Permissions Error

Users may see the following error message when attempting to download a file from GDC:

 403 Client Error: FORBIDDEN: {
      "message": "You don't have access to the data: Please specify a X-Auth-Token"
      }

This error message indicates that the user does not have dbGaP access to the project to which the file belongs. Access must be requested from dbGaP.

File Availability Error

Users may also see the following error message when attempting to download a file from GDC:

 403 Client Error: FORBIDDEN: {
    "message": "You don't have access to the data: Requested file abd28349-92cd-48a3-863a-007a218de80f does not allow read access"
    }

This error message means that the file is not available for download. This may be because the file has not been uploaded or released yet, or because it is not a file entity.

GDC Upload Privileges Error

Users may see the following error message when attempting to upload a file:

 Can't upload: {
     "message": "You don't have access to the data: You don't have create role to do 'upload'"
     }

This means that the user has dbGaP read access to the data, but does not have GDC upload privileges. Users can contact the database of Genotypes and Phenotypes (dbGaP) to request upload privileges.

File in Uploaded State Error

Re-uploading a file may return the following error:

 Can't upload: {
      "message": "File in uploaded state, upload not allowed"
      }

To resolve this issue, delete the file using the --delete switch before re-uploading.

Microsoft Windows Executable Error

Attempting to run gdc-client.exe by double-clicking it in the Windows Explorer will produce a window that blinks once and disappears.

This is normal; the executable must be run from the command prompt. Click 'Start', followed by 'Run', and type 'cmd' into the text bar. Then navigate to the path containing the executable using the 'cd' command.

Help Menus

The GDC Data Transfer Tool comes with built-in help menus. These menus are displayed when the GDC Data Transfer Tool is run with the -h or --help flag, either on its own or after any of the main commands of the tool. Running the GDC Data Transfer Tool without any argument or flag will present a list of available command options.

gdc-client --help
usage: gdc-client [-h] [--version] {download,upload,settings} ...

The Genomic Data Commons Command Line Client

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

commands:
  {download,upload,settings}
                        for more information, specify -h after a command
    download            download data from the GDC
    upload              upload data to the GDC
    settings            display default settings

The available menus are provided below.

Root menu

The GDC Data Transfer Tool displays the following output when executed without any arguments.

gdc-client
usage: gdc-client [-h] [--version] {download,upload,settings} ...
gdc-client: error: too few arguments

Download help menu

The GDC Data Transfer Tool displays the following help menu for its download functionality.

gdc-client download --help
usage: gdc-client download [-h] [--debug]
                           [--log-file LOG_FILE]
                           [--color_off] [-t TOKEN_FILE]
                           [-d DIR] [-s server]
                           [--no-segment-md5sums]
                           [--no-file-md5sum]
                           [-n N_PROCESSES]
                           [--http-chunk-size HTTP_CHUNK_SIZE]
                           [--save-interval SAVE_INTERVAL]
                           [--no-verify]
                           [--no-related-files]
                           [--no-annotations]
                           [--no-auto-retry]
                           [--retry-amount RETRY_AMOUNT]
                           [--wait-time WAIT_TIME]
                           [--latest] [--config FILE] [-u]
                           [-m MANIFEST]
                           [file_id [file_id ...]]

positional arguments:
file_id               The GDC UUID of the file(s) to download

optional arguments:
-h, --help            show this help message and exit
--debug               Enable debug logging. If a failure occurs, the program
                      will stop.
--log-file LOG_FILE   Save logs to file. Amount logged affected by --debug
--color_off           Disable colored output
-t TOKEN_FILE, --token-file TOKEN_FILE
                      GDC API auth token file
-d DIR, --dir DIR     Directory to download files to. Defaults to current
                      dir
-s server, --server server
                      The TCP server address server[:port]
--no-segment-md5sums  Do not calculate inbound segment md5sums and/or do not
                      verify md5sums on restart
--no-file-md5sum      Do not verify file md5sum after download
-n N_PROCESSES, --n-processes N_PROCESSES
                      Number of client connections.
--http-chunk-size HTTP_CHUNK_SIZE, -c HTTP_CHUNK_SIZE
                      Size in bytes of standard HTTP block size.
--save-interval SAVE_INTERVAL
                      The number of chunks after which to flush state file.
                      A lower save interval will result in more frequent
                      printout but lower performance.
--no-verify           Perform insecure SSL connection and transfer
--no-related-files    Do not download related files.
--no-annotations      Do not download annotations.
--no-auto-retry       Ask before retrying to download a file
--retry-amount RETRY_AMOUNT
                      Number of times to retry a download
--wait-time WAIT_TIME
                      Amount of seconds to wait before retrying
--latest              Download latest version of a file if it exists
--config FILE         Path to INI-type config file
-u, --udt             Use the UDT protocol.
-m MANIFEST, --manifest MANIFEST
                      GDC download manifest file

Upload help menu

The GDC Data Transfer Tool displays the following help menu for its upload functionality.

gdc-client upload --help
usage: gdc-client upload [-h] [--debug]
                         [--log-file LOG_FILE]
                         [--color_off] [-t TOKEN_FILE]
                         [--project-id PROJECT_ID]
                         [--path path]
                         [--upload-id UPLOAD_ID]
                         [--insecure] [--server SERVER]
                         [--part-size PART_SIZE]
                         [--upload-part-size UPLOAD_PART_SIZE]
                         [-n N_PROCESSES]
                         [--disable-multipart] [--abort]
                         [--resume] [--delete]
                         [--manifest MANIFEST]
                         [--config FILE]
                         [file_id [file_id ...]]

positional arguments:
 file_id               The GDC UUID of the file(s) to upload

optional arguments:
 -h, --help            show this help message and exit
 --debug               Enable debug logging. If a failure occurs, the program
                       will stop.
 --log-file LOG_FILE   Save logs to file. Amount logged affected by --debug
 --color_off           Disable colored output
 -t TOKEN_FILE, --token-file TOKEN_FILE
                       GDC API auth token file
 --project-id PROJECT_ID, -p PROJECT_ID
                       The project ID that owns the file
 --path path, -f path  directory path to find file
 --upload-id UPLOAD_ID, -u UPLOAD_ID
                       Multipart upload id
 --insecure, -k        Allow connections to server without certs
 --server SERVER, -s SERVER
                       GDC API server address
 --part-size PART_SIZE
                       DEPRECATED in favor of [--upload-part-size]
 --upload-part-size UPLOAD_PART_SIZE, -c UPLOAD_PART_SIZE
                       Part size for multipart upload
 -n N_PROCESSES, --n-processes N_PROCESSES
                       Number of client connections
 --disable-multipart   Disable multipart upload
 --abort               Abort previous multipart upload
 --resume, -r          Resume previous multipart upload
 --delete              Delete an uploaded file
 --manifest MANIFEST, -m MANIFEST
                       Manifest which describes files to be uploaded
 --config FILE         Path to INI-type config file

Data Transfer Tool Configuration File

The GDC Data Transfer Tool (DTT) can save and reuse configuration parameters stored in a flat text file, which is supplied with the --config command line argument. First, create a simple text file with an extension of either txt or dtt. The supported section headers are upload and download, which can be used independently of each other or together in the same configuration file. Each section header corresponds to one of the main functions of the application: downloading data from the GDC portals or uploading data to the submission system of the GDC. The configurable parameters are those listed in the download and upload help menus.

Example usage:

gdc-client download d45ec02b-13c3-4afa-822d-443ccd3795ca --config my-dtt-config.dtt

Example of configuration file:

[upload]
path = /some/upload/path
upload_part_size = 1073741824


[download]
dir = /some/download/path
http_chunk_size = 2048
retry_amount = 6
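
Before passing a configuration file to --config, it can be useful to check what a given section sets for one parameter. The sketch below reads a single value with awk. It assumes the simple "key = value" layout of the example above and does not handle comments or other INI features; the function name config_value is illustrative.

```shell
# Sketch: print the value of one key from one section of a configuration file.
# Arguments: <config file> <section name> <key>
config_value() {
    awk -v section="[$2]" -v key="$3" '
        # A line starting with "[" opens a section; remember whether it is ours.
        /^\[/ { in_section = ($0 == section); next }
        in_section {
            split($0, kv, "=")
            k = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim the key
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim the value
                print v
            }
        }
    ' "$1"
}

# Usage: config_value my-dtt-config.dtt download retry_amount
```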

Display Config Parameters

The settings command can be used with either download or upload to display the settings that are active within a custom Data Transfer Tool configuration file.

gdc-client settings download --config my-dtt-config.dtt
[download]
no_auto_retry = False
no_file_md5sum = False
save_interval = 1073741824
http_chunk_size = 2048
server = http://exmple-site.com
n_processes = 8
no_annotations = False
no_related_files = False
retry_amount = 6
no_segment_md5sum = False
manifest = []
wait_time = 5.0
no_verify = True
dir = /some/download/path