FPKM

Description

Fragments Per Kilobase of transcript per Million mapped reads (FPKM) is a simple method for normalizing expression levels. FPKM normalizes read counts by gene length and by the total number of mapped reads.

Overview

At the GDC, FPKM values are calculated from gene-level read counts produced by STAR [1], using custom scripts [2]. The formula used to generate FPKM values is as follows:

FPKM = [RMg * 10^9] / [RMt * L]

  • RMg: The number of reads mapped to the gene
  • RMt: The total number of reads mapped to protein-coding sequences in the alignment
  • L: The length of the gene in base pairs

The scalar (10^9) is included to normalize the data to "kilobase" (10^3) and "million mapped reads" (10^6).
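The formula can be illustrated with a short Python sketch. This is not the GDC's own script; the function name, parameter names, and input numbers are hypothetical and chosen only to mirror RMg, RMt, and L as defined above.

```python
def fpkm(reads_mapped_to_gene: int,
         total_protein_coding_reads: int,
         gene_length_bp: int) -> float:
    """Compute FPKM = [RMg * 10^9] / [RMt * L].

    All names here are illustrative assumptions, not GDC identifiers.
    """
    if total_protein_coding_reads <= 0 or gene_length_bp <= 0:
        raise ValueError("RMt and L must be positive")
    # 10^9 combines the per-kilobase (10^3) and per-million (10^6) scaling.
    return reads_mapped_to_gene * 1e9 / (total_protein_coding_reads * gene_length_bp)


# Made-up inputs: 1,000 reads on a 2,000 bp gene,
# with 20,000,000 reads mapped to protein-coding sequences.
example_value = fpkm(1_000, 20_000_000, 2_000)
print(example_value)  # 25.0
```

With these made-up inputs, the gene receives 0.5 reads per base pair, or 500 reads per kilobase; dividing by 20 (million mapped reads) gives 25.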

FPKM files are available as tab-delimited files with the Ensembl gene IDs in the first column and the expression values in the second. See FPKM-UQ for an alternative method of gene expression level normalization.
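A minimal sketch of reading such a two-column file in Python follows. It assumes the file has no header row and that every value parses as a floating-point number; the function name and the sample values are hypothetical and are not taken from a real GDC file.

```python
import csv
import io


def read_fpkm(handle) -> dict[str, float]:
    """Map Ensembl gene ID (column 1) to expression value (column 2).

    Assumes a tab-delimited file with no header row (an assumption,
    not something the format description confirms).
    """
    values = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        values[row[0]] = float(row[1])
    return values


# Illustrative in-memory content standing in for a downloaded file.
sample = "ENSG00000000003.13\t12.5\nENSG00000000005.5\t0.0\n"
table = read_fpkm(io.StringIO(sample))
print(table["ENSG00000000003.13"])  # 12.5
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.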

References

  1. STAR-Fusion pipeline
  2. GDC mRNA-Seq Documentation
