Downloading Files

The GDC API implements file download functionality using data and manifest endpoints. The data endpoint allows users to download files stored in the GDC by specifying file UUID(s). The manifest endpoint generates a download manifest file that can be used with the GDC Data Transfer Tool to transfer large volumes of data.

Note: Downloading controlled access data requires the use of an authentication token. See Getting Started: Authentication for details.

Data endpoint

To download a file, users can pass UUID(s) to the data endpoint. If a single UUID is provided, the API will return the associated file. If a comma-separated list of UUIDs is provided, the API will return an archive file containing the requested files.

The data endpoint supports GET and POST requests as demonstrated in the following examples.

Downloading a Single File using GET

This example demonstrates downloading a single file from the GDC. Here we pass the file's UUID to the data endpoint with a GET request.

curl --remote-name --remote-header-name 'https://api.gdc.cancer.gov/data/5b2974ad-f932-499b-90a3-93577a9f0573'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 6111k  100 6111k    0     0   414k      0  0:00:14  0:00:14 --:--:--  412k

If the related_files=true parameter is specified, the following related files, if available, will be included in the download package by the GDC API:

  • BAM index files (BAI files)
  • VCF index files (TBI files)

For example, this request will download a BAM file and its associated BAI file:

curl --remote-name --remote-header-name -H "x-auth-token: $token" "https://api.gdc.cancer.gov/data/f587ef82-acbe-44f9-ad5a-6207e148f61f?related_files=true"
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload  Upload   Total   Spent    Left  Speed
100 63.4M    0 63.4M    0     0  7541k      0 --:--:--  0:00:08 --:--:--  9.9M

Downloading Multiple Files using GET

This example demonstrates downloading multiple files from the GDC using a GET request. The GDC API returns a .tar.gz archive containing the downloaded files.

curl --remote-name --remote-header-name 'https://api.gdc.cancer.gov/data/e3228020-1c54-4521-9182-1ea14c5dc0f7,18e1e38e-0f0a-4a0e-918f-08e6201ea140'
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload  Upload   Total   Spent    Left  Speed
100  287k    0  287k    0     0  30131      0 --:--:--  0:00:09 --:--:-- 42759

Note: This method supports downloading a limited number of files at one time. To download a large number of files, please use POST.

Downloading an Uncompressed Group of Files

If the ?tarfile parameter is specified to a data endpoint download query all files requested in the download string will be bundled in a single tar file rather than a tar.gz file which is the default behavior.

curl --remote-name --remote-header-name 'https://api.gdc.cancer.gov/data/1da7105a-f0ff-479d-9f82-6c1d94456c91,77e73cc4-ff31-449e-8e3c-7ae5ce57838c?tarfile'

Downloading Multiple Files using POST

The following two examples demonstrate downloading multiple files from the GDC using a POST request that contains a payload in one of two formats: percent-encoded form data or JSON. The GDC API returns a .tar.gz archive containing the downloaded files.

POST request with form data payload

POST requests that carry a payload of percent-encoded form data must include the HTTP header Content-Type: application/x-www-form-urlencoded.

The payload is a string in the following format:

ids=UUID1&ids=UUID2&ids=UUID3...

where UUID# corresponds to the UUIDs of the files to be downloaded.

In this example we use curl to download a set of files from the GDC Data Portal. The payload is stored in a plain text file named Payload; curl includes the Content-Type: application/x-www-form-urlencoded header by default.

ids=59eb3fc5-9172-4828-8dec-0d9988073103&ids=869b7d7c-ff35-482a-aa8d-1a8675c161d3&ids=b8ffff40-aa0e-4534-b05f-9311f16c2f6b&ids=51e14969-30a7-42d9-8168-4a5ea422ca4a&ids=adcfc856-990b-40fc-8f1e-67dfc2343fb7&ids=7f1e9aee-eb4e-4c79-8626-b603c9be124d&ids=62a8feb5-c660-4261-bcd6-67fbb79bb422
curl --remote-name --remote-header-name --request POST 'https://api.gdc.cancer.gov/data' --data @Payload
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100 6804k    0 6804k  100   286   245k     10  0:00:28  0:00:27  0:00:01  357k

POST request with JSON payload

POST requests that carry a JSON payload must include the HTTP header Content-Type: application/json.

The payload is a string in the following format:

{
    "ids":[
        "UUID1",
        "UUID2",
                ...
        "UUID3"
    ]
}

where UUID# corresponds to the UUIDs of the files to be downloaded.

In this example we use curl to download a set of files from the GDC Portal; the payload is stored in a plain text file named Payload.

{
    "ids":[
        "0451fc55-33ef-4151-a68c-cac59be716dc",
        "0cc3d450-2c60-4cb0-a073-d92dc979fa5e",
        "0de9bc40-3ef8-4fe7-b7d6-80a9339b0bf8",
        "0f8d8202-a1ca-4ea1-98b2-c20a6b08479a"
    ]
}
curl --remote-name --remote-header-name --request POST --header 'Content-Type: application/json' --data @request.txt 'https://api.gdc.cancer.gov/data'
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100 5878k    0 5878k  100   205   290k     10  0:00:20  0:00:20 --:--:--  198k

Downloading Controlled-access Files

To download controlled-access files, a valid authentication token must be passed to the GDC API using the X-Auth-Token HTTP header:

token=$(<gdc-token-text-file.txt)

curl --remote-name --remote-header-name --header "X-Auth-Token: $token" 'https://api.gdc.cancer.gov/data/be6d269d-4305-4643-b98e-af703a067761'
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                             Dload  Upload   Total   Spent    Left  Speed
100 65.8M  100 65.8M    0     0   271k      0  0:04:08  0:04:08 --:--:--  288k

Manifest endpoint

The manifest endpoint generates a download manifest file that can be used with the GDC Data Transfer Tool. The Data Transfer Tool is recommended for transferring large volumes of data. The GDC API can also generate a download manifest from a list of results that match a Search and Retrieval query. To do this, append &return_type=manifest to the end of the query. Note that the "size" parameter does not work with the manifest endpoint and will return the entire set of files.

Using the manifest endpoint

The manifest endpoint allows users to create a download manifest, which can be used with the GDC Data Transfer Tool to download a large volume of data. The manifest endpoint generates a manifest file from a comma-separated list of UUIDs.

curl --remote-name --remote-header-name 'https://api.gdc.cancer.gov/v0/manifest/a751cc7e-d2ff-4e9a-8645-09bf12612f1a,9c97e3fe-1610-4a92-9a24-ab3b9e4000e2'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   274  100   274    0     0   1042      0 --:--:-- --:--:-- --:--:--  1041

The manifest endpoint also supports HTTP POST requests in the same format as the data endpoint; see above for details.

Using return_type=manifest

Alternatively, users can create a manifest by appending &return_type=manifest to a Search and Retrieval query. In this example, we generate a download manifest for RNA-seq data files from solid tissue normal samples, that are part of the TCGA-KIRC project:

curl --remote-name --remote-header-name 'https://api.gdc.cancer.gov/files?filters=%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%22op%22%3A%22%3D%22%2C%22content%22%3A%7B%22field%22%3A%22experimental_strategy%22%2C%22value%22%3A%5B%22RNA-Seq%22%5D%7D%7D%2C%7B%22op%22%3A%22%3D%22%2C%22content%22%3A%7B%22field%22%3A%22cases.project.project_id%22%2C%22value%22%3A%5B%22TCGA-KIRC%22%5D%7D%7D%2C%7B%22op%22%3A%22%3D%22%2C%22content%22%3A%7B%22field%22%3A%22cases.samples.sample_type%22%2C%22value%22%3A%5B%22Solid+Tissue+Normal%22%5D%7D%7D%5D%7D&return_type=manifest'
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100  100k    0  100k    0     0   277k      0 --:--:-- --:--:-- --:--:--  282k