Fragments Per Kilobase of transcript per Million mapped reads (FPKM) is a simple expression level normalization method. FPKM normalizes read counts by gene length and by the total number of mapped reads.
At the GDC, FPKM values are calculated from gene-level read counts produced by HTSeq [1], using custom scripts [2]. The formula used to generate FPKM values is as follows:
FPKM = [RMg * 10^9] / [RMt * L]
- RMg: The number of reads mapped to the gene
- RMt: The total number of reads mapped to protein-coding sequences in the alignment
- L: The length of the gene in base pairs
The scalar (10^9) is the product of 10^3 and 10^6; it rescales the result to "per kilobase" of gene length and "per million mapped reads."
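The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the GDC's custom script: the function name `fpkm`, its parameter names, and the input numbers are assumptions made up for the example.

```python
def fpkm(reads_mapped_to_gene, total_protein_coding_reads, gene_length_bp):
    """Return FPKM = (RMg * 10^9) / (RMt * L).

    Hypothetical helper for illustration only; the names are not taken
    from the GDC pipeline.
    """
    if total_protein_coding_reads <= 0 or gene_length_bp <= 0:
        raise ValueError("RMt and L must both be positive")
    return reads_mapped_to_gene * 1e9 / (total_protein_coding_reads * gene_length_bp)


# Made-up inputs: 1,000 reads on a 2,000 bp gene, with 20 million reads
# mapped to protein-coding sequences in the alignment.
example_value = fpkm(1000, 20_000_000, 2000)
print(example_value)  # 25.0
```

With these inputs the numerator is 10^12 and the denominator is 4 * 10^10, which gives an FPKM of 25.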
Like HTSeq-count files, FPKM files are available as tab-delimited files with the Ensembl gene IDs in the first column and the expression values in the second. See HTSeq-FPKM-UQ for an alternative method of gene expression level normalization.
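A two-column file of this shape can be read with Python's standard `csv` module. The sketch below assumes the file has no header row and that every value in the second column parses as a number. The helper name `read_fpkm_table` and the sample rows are hypothetical, and the expression values are made up.

```python
import csv
import io


def read_fpkm_table(handle):
    """Parse a two-column, tab-delimited table into {gene_id: fpkm_value}.

    Hypothetical helper; assumes no header row.
    """
    values = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        gene_id, value = row[0], row[1]
        values[gene_id] = float(value)
    return values


# In-memory stand-in for a real file; the expression values are made up.
sample = io.StringIO("ENSG00000000003\t25.0\nENSG00000000005\t0.0\n")
table = read_fpkm_table(sample)
print(table["ENSG00000000003"])  # 25.0
```

For a file on disk, the same function can be given an open file handle, for example one returned by `open(path, newline="")`.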
- [1] Anders, S., Pyl, P.T. and Huber, W., 2014. HTSeq - a Python framework to work with high-throughput sequencing data. Bioinformatics, p.btu638.
- [2] GDC mRNA-Seq Documentation