miRNA Analysis Pipeline

Introduction

The GDC miRNA quantification analysis uses a modified version of the profiling pipeline developed by the British Columbia Genome Sciences Centre (BCGSC). The pipeline generates TCGA-formatted miRNAseq data. The first step is read alignment. The tool then compares the individual reads to sequence feature annotations in miRBase v21 and UCSC. Note, however, that the tool annotates only reads that exactly match known miRNAs in miRBase; it should therefore not be used for novel miRNA identification or for the analysis of mismatched alignments.

For more information, see BCGSC's GitHub or the original publication.

Data Processing Steps

Alignment Workflow

The miRNA pipeline begins with the Alignment Workflow, which for miRNA data uses BWA-aln. The workflow outputs one BAM file for each read group in the input.

I/O	Entity	Format
Input	Submitted Unaligned Reads or Submitted Aligned Reads	FASTQ or BAM
Output	Aligned Reads	BAM
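
As a rough illustration of the alignment step, the sketch below builds the command lines for a single-end BWA-aln run on one read group. It is only a sketch under stated assumptions: the function name, file names, and read group fields are hypothetical, default aligner settings are used, and the options the GDC workflow actually passes to BWA are not stated here. The commands are constructed but not executed.

```python
def bwa_aln_commands(reference, fastq, read_group_id, sample):
    """Build (but do not run) commands for one read group.

    Assumptions: single-end reads, default bwa aln settings,
    hypothetical file naming based on the read group ID.
    """
    sai = f"{read_group_id}.sai"
    sam = f"{read_group_id}.sam"
    bam = f"{read_group_id}.bam"
    # bwa expects a literal backslash-t between read group fields.
    rg_line = f"@RG\\tID:{read_group_id}\\tSM:{sample}"
    return [
        # Step 1: find suffix-array coordinates for each read.
        ["bwa", "aln", "-f", sai, reference, fastq],
        # Step 2: convert to SAM alignments, tagging the read group.
        ["bwa", "samse", "-f", sam, "-r", rg_line, reference, sai, fastq],
        # Step 3: write one BAM file for this read group.
        ["samtools", "view", "-b", "-o", bam, sam],
    ]


# Hypothetical inputs, one call per read group.
for command in bwa_aln_commands("GRCh38.fa", "rg1.fastq", "rg1", "sampleA"):
    print(" ".join(command))
```

Calling the function once per read group mirrors the one-BAM-per-read-group output described above; each command list could be passed to subprocess.run if the tools are installed.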

miRNA Expression Workflow

Following alignment, BAM files are processed through the miRNA Expression Workflow.

The outputs of the miRNA profiling pipeline report raw read counts and counts normalized to reads per million mapped reads (RPM) in two separate files: mirnas.quantification.txt and isoforms.quantification.txt. The first file contains summed expression for all reads aligned to known miRNAs in the miRBase reference. If a read has multiple alignments to different miRNAs or to different regions of the same miRNA, it is flagged as cross-mapped and every miRNA annotation is preserved. The second file contains the observed isoforms.

I/O	Entity	Format
Input	Aligned Reads	BAM
Output	miRNA Expression	TXT
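
The RPM normalization mentioned above can be sketched as follows. This is a minimal example, not the pipeline's own code: it assumes the denominator is the total number of reads mapped to miRNAs, as the "reads-per-million-miRNA-mapped" description of the quantification file suggests, and the function name and read counts are invented for illustration.

```python
def reads_per_million(read_counts):
    """Scale raw counts so that they sum to one million.

    Assumption: the denominator is the total of all
    miRNA-mapped reads in the sample.
    """
    total = sum(read_counts.values())
    if total == 0:
        # Avoid division by zero for an empty sample.
        return {mirna: 0.0 for mirna in read_counts}
    return {
        mirna: count * 1_000_000 / total
        for mirna, count in read_counts.items()
    }


# Invented counts for two miRNAs (total = 1000 reads).
rpm = reads_per_million({"hsa-let-7a-1": 750, "hsa-mir-21": 250})
print(rpm)  # {'hsa-let-7a-1': 750000.0, 'hsa-mir-21': 250000.0}
```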

File Access and Availability

Type	Description	Format
Aligned Reads	miRNA-Seq reads that have been aligned to the GRCh38 build. Reads that were not aligned are included to facilitate the availability of raw read sets.	BAM
miRNA Expression Quantification	A table that associates miRNA IDs with read count and a normalized count in reads-per-million-miRNA-mapped.	TXT
Isoform Expression Quantification	A table with the same information as the miRNA Expression Quantification files with the addition of isoform information such as the coordinates of the isoform and the type of region it constitutes within the full miRNA transcript.	TXT
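
To illustrate how the isoform table relates to per-miRNA totals, the sketch below reads a tab-separated isoform file and sums read counts by miRNA ID. The column names follow the TCGA-style header and are an assumption here, since this page does not list them; the rows, coordinates, and region labels are invented. Whether such sums reproduce the values in mirnas.quantification.txt exactly (for example, for cross-mapped reads) is not stated here.

```python
import csv
import io
from collections import defaultdict


def sum_isoforms_by_mirna(handle):
    """Sum isoform read counts per miRNA ID from a tab-separated file.

    Assumed columns: miRNA_ID and read_count.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        totals[row["miRNA_ID"]] += int(row["read_count"])
    return dict(totals)


# Invented example rows; coordinates and regions are placeholders.
example = io.StringIO(
    "miRNA_ID\tisoform_coords\tread_count\t"
    "reads_per_million_miRNA_mapped\tcross-mapped\tmiRNA_region\n"
    "hsa-let-7a-1\thg38:chr1:100-121:+\t500\t500000.0\tN\tmature\n"
    "hsa-let-7a-1\thg38:chr1:100-122:+\t250\t250000.0\tN\tmature\n"
    "hsa-mir-21\thg38:chr2:300-321:+\t250\t250000.0\tN\tmature\n"
)
print(sum_isoforms_by_mirna(example))
# {'hsa-let-7a-1': 750, 'hsa-mir-21': 250}
```

In practice the function would be given an open file handle, such as open("isoforms.quantification.txt", newline=""), instead of the in-memory example.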