Data Transfer Tool Command Line Documentation
Downloads
Downloading Data Using a Manifest File
A convenient way to download multiple files from the GDC is to use a manifest file generated by the GDC Data Portal. After generating a manifest file (see Preparing for Data Download and Upload for instructions), initiate the download using the GDC Data Transfer Tool by supplying the -m or --manifest option, followed by the location and name of the manifest file. OS X users can drag and drop the manifest file into Terminal to provide its location.
The following is an example of a command for downloading files from GDC using a manifest file:
gdc-client download -m /Users/JohnDoe/Downloads/gdc_manifest_6746fe840d924cf623b4634b5ec6c630bd4c06b5.txt
Downloading Data Using GDC File UUIDs
The GDC Data Transfer Tool also supports downloading of one or more individual files using UUID(s) instead of a manifest file. To do this, enter the UUID(s) after the download command:
gdc-client download 22a29915-6712-4f7a-8dba-985ae9a1f005
Multiple UUIDs can be specified, separated by a space:
gdc-client download e5976406-473a-4fbd-8c97-e95187cdc1bd fb3e261b-92ac-4027-b4d9-eb971a92a4c3
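When the UUIDs are kept in a text file rather than typed on the command line, they can be expanded into the same space-separated form. The following is a minimal sketch and not part of the GDC Data Transfer Tool: the helper name download_uuid_list, the file name uuids.txt, and the GDC_CLIENT variable (an override for the executable name or path) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pass every UUID listed in a text file (one per line)
# to "gdc-client download" as positional arguments.
# GDC_CLIENT is an illustrative override for the executable name or path.
GDC_CLIENT="${GDC_CLIENT:-gdc-client}"

download_uuid_list() {
    uuid_file="$1"
    # Keep only lines that look like UUIDs (8-4-4-4-12 hexadecimal digits),
    # which skips blank lines and comments.
    uuids=$(grep -E '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$' "$uuid_file" || true)
    if [ -z "$uuids" ]; then
        echo "no UUIDs found in $uuid_file" >&2
        return 1
    fi
    # Word splitting is intentional here: each UUID becomes one argument.
    # shellcheck disable=SC2086
    "$GDC_CLIENT" download $uuids
}

# Usage:
#   download_uuid_list uuids.txt
```

Filtering the lines first means a stray comment or blank line in the list is never handed to the tool as if it were a UUID.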
Resuming a Failed Download
The GDC Data Transfer Tool supports resumption of interrupted downloads. To resume an incomplete download, repeat the download of the manifest or UUID(s) in the same folder as the initial download. Failed downloads appear in the destination folder with a .partial extension, which makes it easy to see where the download stopped. For large downloads, this lets the user identify the interrupted files and edit the manifest accordingly.
gdc-client download f80ec672-d00f-42d5-b5ae-c7e06bc39da1
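Repeating the original command is normally enough to resume. For a very large manifest, editing it down to the unfinished entries could look like the sketch below. It rests on several assumptions that should be checked against a real manifest: that the manifest is tab-delimited with a header line, that the first column holds the file UUID, and that each file is downloaded into a folder named after its UUID. The helper name remaining_manifest is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a reduced manifest (header line plus every
# entry whose download does not look finished).
# Assumptions: tab-delimited manifest, header on the first line, file UUID
# in the first column, one folder per UUID in the download directory.
remaining_manifest() {
    manifest="$1"
    download_dir="${2:-.}"
    tab=$(printf '\t')

    # Copy the header line through unchanged.
    head -n 1 "$manifest"

    # Walk the data lines and keep those without a finished download.
    tail -n +2 "$manifest" | while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        uuid=${line%%"$tab"*}
        folder="$download_dir/$uuid"
        # Unfinished: no UUID folder yet, or a .partial file inside it.
        if [ ! -d "$folder" ] ||
           [ -n "$(find "$folder" -type f -name '*.partial' | head -n 1)" ]; then
            printf '%s\n' "$line"
        fi
    done
}

# Usage:
#   remaining_manifest gdc_manifest.txt . > remaining_manifest.txt
#   gdc-client download -m remaining_manifest.txt
```

The check is only a heuristic: a UUID folder that exists and holds no .partial file is treated as finished, without verifying file contents.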
Download Latest Version of a File
The GDC Data Transfer Tool supports file versioning. The GDC backend data storage keeps multiple file versions, so both older and current versions remain accessible to users. For information about accessing file versioning information with the GDC API and finding older UUID information from current UUIDs, see the API User Guide section of the API documentation. When working with older manifests or older lists of UUIDs, the latest version of a file can always be downloaded with the --latest flag.
gdc-client download 426de656-7e34-4a49-b87e-6e2563fa3cdd --latest -t gdc-user-token.2018.txt
Downloading LATEST versions of files
Latest version for 426de656-7e34-4a49-b87e-6e2563fa3cdd ==> 6633bfbd-87f1-4d3a-a475-7ad1e8c2017a
100% [#############################################################################################################################] Time: 0:01:16 14.10 MB/s
Successfully downloaded: 1
Downloading Controlled-Access Data
A user authentication token is required for downloading Controlled-Access Data from GDC. Tokens can be obtained from the GDC Data Portal (see instructions in Obtaining an Authentication Token). Once downloaded, the token file can be passed to the GDC Data Transfer Tool using the -t or --token-file option:
gdc-client download -m gdc_manifest_e24fac38d3b19f67facb74d3efa746e08b0c82c2.txt -t gdc-user-token.2015-06-17T09-10-02-04-00.txt
Directory structure of downloaded files
The directory into which files are downloaded will contain folders named by file UUID. Inside each folder, along with the data and zipped metadata or index files, is a logs folder. The logs folder contains state files that ensure downloads are accurate and allow failed or prematurely stopped downloads to be resumed. While a download is in progress, the file has a .partial extension, which also remains if the download fails. Once the file has finished downloading, the extension is removed. If an identical manifest is retried, another attempt is made to download the files that have a .partial extension.
C501.TCGA-BI-A0VR-10A-01D-A10S-08.5_gdc_realn.bam.partial logs
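Because unfinished files keep the .partial extension, a search of the download directory shows which downloads still need to be retried. The following is a minimal sketch, assuming a POSIX shell with find available; the helper name list_partial_downloads is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list every file below the download directory that
# still carries the .partial extension (in progress or failed).
list_partial_downloads() {
    download_dir="${1:-.}"
    find "$download_dir" -type f -name '*.partial' | sort
}

# Usage:
#   list_partial_downloads /path/to/downloads
#   list_partial_downloads . | wc -l    # number of unfinished files
```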
Uploads
Uploading Data Using a Manifest File
The GDC Data Transfer Tool supports uploading molecular data to the GDC Data Submission Portal using a manifest file. The manifest file for submittable data files can be retrieved from the GDC Data Submission Portal, or directly from the GDC Submission API given a submittable data file UUID. The user authentication token file must be specified using the -t or --token-file option.
First, generate an upload manifest, either using the GDC Data Submission Portal or using a call to the GDC Submission API manifest endpoint (as in the following example):
export token=ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTOKEN-01234567890+AlPhAnUmErIcToKeN=0123456789-ALPHANUMERICTO
curl --header "X-Auth-Token: $token" 'https://api.gdc.cancer.gov/submission/CGCI/BLGSP/manifest?ids=460ad2fe-5a7f-4797-9e18-336d33e21444' >manifest.yml
gdc-client upload --manifest manifest.yml --token-file token.txt
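The manifest request above can be parameterized so that the program, project, and file UUID are not hard-coded. The following is a minimal sketch: the URL layout is copied from the example above, while the helper name submission_manifest_url and its argument names are hypothetical. Reading the token from token.txt instead of exporting it inline is likewise only a suggestion.

```shell
#!/bin/sh
# Hypothetical helper: build the Submission API manifest URL for one
# submittable data file UUID. The URL layout mirrors the curl example in
# this section; the argument names are illustrative.
submission_manifest_url() {
    program="$1"     # e.g. CGCI
    project="$2"     # e.g. BLGSP
    file_uuid="$3"
    printf 'https://api.gdc.cancer.gov/submission/%s/%s/manifest?ids=%s\n' \
        "$program" "$project" "$file_uuid"
}

# Usage, reading the token from the file that is later passed to gdc-client:
#   token=$(cat token.txt)
#   url=$(submission_manifest_url CGCI BLGSP 460ad2fe-5a7f-4797-9e18-336d33e21444)
#   curl --header "X-Auth-Token: $token" "$url" > manifest.yml
#   gdc-client upload --manifest manifest.yml --token-file token.txt
```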
Uploading Data Using a GDC File UUID
The GDC Data Transfer Tool also supports uploading molecular data using a file UUID. The tool first makes a request to the GDC API to get the filename and project ID, and then uploads the corresponding file from the current directory.
gdc-client upload cd939bdd-b607-4dd4-87a6-fad12893932d -t token.txt
Resuming a Failed Upload
By default, the GDC Data Transfer Tool uses multipart transfer to upload files. If an upload fails but some parts were transmitted successfully, a resume file is saved with the filename resume_[manifest_filename]. Running the upload command again resumes the transfer of only those parts of the file that failed to upload in the previous attempt.
gdc-client upload -m manifest.yml -t token
Deleting Previously Uploaded Data
Previously uploaded data can be replaced with new data by deleting it first using the --delete switch:
gdc-client upload -m manifest.yml -t token --delete
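Replacing data is therefore a two-step sequence: delete, then upload again. The following is a minimal sketch of that sequence using only the options shown in this section; the helper name replace_uploaded_data and the GDC_CLIENT variable (an override for the executable name or path) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: delete previously uploaded data, then upload the
# new data described by the same manifest.
# GDC_CLIENT is an illustrative override for the executable name or path.
GDC_CLIENT="${GDC_CLIENT:-gdc-client}"

replace_uploaded_data() {
    manifest="$1"
    token_file="$2"
    # Step 1: delete the previously uploaded data; stop if this fails.
    "$GDC_CLIENT" upload -m "$manifest" -t "$token_file" --delete || return 1
    # Step 2: upload the new data.
    "$GDC_CLIENT" upload -m "$manifest" -t "$token_file"
}

# Usage:
#   replace_uploaded_data manifest.yml token.txt
```

Stopping after a failed delete avoids attempting an upload that would likely be rejected because the file is still in the uploaded state.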
Troubleshooting
Invalid Token
An error message about an 'invalid token' means that a new authentication token needs to be obtained from the GDC Data Portal or the GDC Data Submission Portal as described in Preparing for Data Download and Upload.
403 Client Error: FORBIDDEN: {
"message": "Your token is invalid or expired, please get a new token from GDC Data Portal"
}
dbGaP Permissions Error
Users may see the following error message when attempting to download a file from GDC:
403 Client Error: FORBIDDEN: {
"message": "You don't have access to the data: Please specify a X-Auth-Token"
}
This error message indicates that the user does not have dbGaP access to the project to which the file belongs. Access to the project must be requested through dbGaP.
File Availability Error
Users may also see the following error message when attempting to download a file from GDC:
403 Client Error: FORBIDDEN: {
"message": "You don't have access to the data: Requested file abd28349-92cd-48a3-863a-007a218de80f does not allow read access"
}
This error message means that the file is not available for download. This may be because the file has not been uploaded or released yet, or because it is not a file entity.
GDC Upload Privileges Error
Users may see the following error message when attempting to upload a file:
Can't upload: {
"message": "You don't have access to the data: You don't have create role to do 'upload'"
}
This means that the user has dbGaP read access to the data but does not have GDC upload privileges. Users can contact the database of Genotypes and Phenotypes (dbGaP) to request upload privileges.
File in Uploaded State Error
Re-uploading a file may return the following error:
Can't upload: {
"message": "File in uploaded state, upload not allowed"
}
To resolve this issue, delete the file using the --delete switch before re-uploading.
Microsoft Windows Executable Error
Attempting to run gdc-client.exe by double-clicking it in Windows Explorer will produce a window that blinks once and disappears.
This is normal: the executable must be run from the command prompt. Click 'Start', then 'Run', and type 'cmd' into the text bar. Then navigate to the folder containing the executable using the 'cd' command.
Help Menus
The GDC Data Transfer Tool comes with built-in help menus. These menus are displayed when the GDC Data Transfer Tool is run with the -h or --help flag, either on its own or after any of the tool's main commands. Running the GDC Data Transfer Tool without any argument or flag prints a usage line listing the available commands.
gdc-client --help
usage: gdc-client [-h] [--version] {download,upload,settings} ...

The Genomic Data Commons Command Line Client

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

commands:
  {download,upload,settings}
                        for more information, specify -h after a command
    download            download data from the GDC
    upload              upload data to the GDC
    settings            display default settings
The available menus are provided below.
Root menu
The GDC Data Transfer Tool displays the following output when executed without any arguments.
gdc-client
usage: gdc-client [-h] [--version] {download,upload,settings} ...
gdc-client: error: too few arguments
Download help menu
The GDC Data Transfer Tool displays the following help menu for its download functionality.
gdc-client download --help
usage: gdc-client download [-h] [--debug]
                           [--log-file LOG_FILE]
                           [--color_off] [-t TOKEN_FILE]
                           [-d DIR] [-s server]
                           [--no-segment-md5sums]
                           [--no-file-md5sum]
                           [-n N_PROCESSES]
                           [--http-chunk-size HTTP_CHUNK_SIZE]
                           [--save-interval SAVE_INTERVAL]
                           [--no-verify]
                           [--no-related-files]
                           [--no-annotations]
                           [--no-auto-retry]
                           [--retry-amount RETRY_AMOUNT]
                           [--wait-time WAIT_TIME]
                           [--latest] [--config FILE] [-u]
                           [-m MANIFEST]
                           [file_id [file_id ...]]

positional arguments:
  file_id               The GDC UUID of the file(s) to download

optional arguments:
  -h, --help            show this help message and exit
  --debug               Enable debug logging. If a failure occurs, the program
                        will stop.
  --log-file LOG_FILE   Save logs to file. Amount logged affected by --debug
  --color_off           Disable colored output
  -t TOKEN_FILE, --token-file TOKEN_FILE
                        GDC API auth token file
  -d DIR, --dir DIR     Directory to download files to. Defaults to current
                        dir
  -s server, --server server
                        The TCP server address server[:port]
  --no-segment-md5sums  Do not calculate inbound segment md5sums and/or do not
                        verify md5sums on restart
  --no-file-md5sum      Do not verify file md5sum after download
  -n N_PROCESSES, --n-processes N_PROCESSES
                        Number of client connections.
  --http-chunk-size HTTP_CHUNK_SIZE, -c HTTP_CHUNK_SIZE
                        Size in bytes of standard HTTP block size.
  --save-interval SAVE_INTERVAL
                        The number of chunks after which to flush state file.
                        A lower save interval will result in more frequent
                        printout but lower performance.
  --no-verify           Perform insecure SSL connection and transfer
  --no-related-files    Do not download related files.
  --no-annotations      Do not download annotations.
  --no-auto-retry       Ask before retrying to download a file
  --retry-amount RETRY_AMOUNT
                        Number of times to retry a download
  --wait-time WAIT_TIME
                        Amount of seconds to wait before retrying
  --latest              Download latest version of a file if it exists
  --config FILE         Path to INI-type config file
  -u, --udt             Use the UDT protocol.
  -m MANIFEST, --manifest MANIFEST
                        GDC download manifest file
Upload help menu
The GDC Data Transfer Tool displays the following help menu for its upload functionality.
gdc-client upload --help
usage: gdc-client upload [-h] [--debug]
                         [--log-file LOG_FILE]
                         [--color_off] [-t TOKEN_FILE]
                         [--project-id PROJECT_ID]
                         [--path path]
                         [--upload-id UPLOAD_ID]
                         [--insecure] [--server SERVER]
                         [--part-size PART_SIZE]
                         [--upload-part-size UPLOAD_PART_SIZE]
                         [-n N_PROCESSES]
                         [--disable-multipart] [--abort]
                         [--resume] [--delete]
                         [--manifest MANIFEST]
                         [--config FILE]
                         [file_id [file_id ...]]

positional arguments:
  file_id               The GDC UUID of the file(s) to upload

optional arguments:
  -h, --help            show this help message and exit
  --debug               Enable debug logging. If a failure occurs, the program
                        will stop.
  --log-file LOG_FILE   Save logs to file. Amount logged affected by --debug
  --color_off           Disable colored output
  -t TOKEN_FILE, --token-file TOKEN_FILE
                        GDC API auth token file
  --project-id PROJECT_ID, -p PROJECT_ID
                        The project ID that owns the file
  --path path, -f path  directory path to find file
  --upload-id UPLOAD_ID, -u UPLOAD_ID
                        Multipart upload id
  --insecure, -k        Allow connections to server without certs
  --server SERVER, -s SERVER
                        GDC API server address
  --part-size PART_SIZE
                        DEPRECATED in favor of [--upload-part-size]
  --upload-part-size UPLOAD_PART_SIZE, -c UPLOAD_PART_SIZE
                        Part size for multipart upload
  -n N_PROCESSES, --n-processes N_PROCESSES
                        Number of client connections
  --disable-multipart   Disable multipart upload
  --abort               Abort previous multipart upload
  --resume, -r          Resume previous multipart upload
  --delete              Delete an uploaded file
  --manifest MANIFEST, -m MANIFEST
                        Manifest which describes files to be uploaded
  --config FILE         Path to INI-type config file
Data Transfer Tool Configuration File
The DTT can save and reuse configuration parameters stored in a flat text file, which is supplied through the --config command line argument. First create a simple text file with an extension of either txt or dtt. The supported section headers are upload and download, which can be used independently of each other or together in the same configuration file. Each section header corresponds to one of the main functions of the application: downloading data from the GDC portals or uploading data to the submission system of the GDC. The configurable parameters are those listed in the download and upload help menus. In the example configuration file below, parameter names are written with underscores instead of hyphens (for example, http_chunk_size for --http-chunk-size).
Example usage:
gdc-client download d45ec02b-13c3-4afa-822d-443ccd3795ca --config my-dtt-config.dtt
Example of configuration file:
[upload]
path = /some/upload/path
upload_part_size = 1073741824
[download]
dir = /some/download/path
http_chunk_size = 2048
retry_amount = 6
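A configuration file like the one above can also be generated from a script, for example when the download directory differs between machines. The following is a minimal sketch that writes only the [download] section with the values from the example; the helper name write_dtt_config is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a configuration file containing the [download]
# settings from the example above, with the download directory supplied
# as an argument.
write_dtt_config() {
    config_file="$1"
    download_dir="$2"
    cat > "$config_file" <<EOF
[download]
dir = $download_dir
http_chunk_size = 2048
retry_amount = 6
EOF
}

# Usage:
#   write_dtt_config my-dtt-config.dtt /some/download/path
#   gdc-client download d45ec02b-13c3-4afa-822d-443ccd3795ca --config my-dtt-config.dtt
```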
Display Config Parameters
The settings command can be used with either download or upload to display which settings are active when a custom Data Transfer Tool configuration file is supplied.
gdc-client settings download --config my-dtt-config.dtt
[download]
no_auto_retry = False
no_file_md5sum = False
save_interval = 1073741824
http_chunk_size = 2048
server = http://exmple-site.com
n_processes = 8
no_annotations = False
no_related_files = False
retry_amount = 6
no_segment_md5sum = False
manifest = []
wait_time = 5.0
no_verify = True
dir = /some/download/path