HTSeq-FPKM

Description

Fragments Per Kilobase of transcript per Million mapped reads (FPKM) is a simple expression level normalization method. FPKM normalizes each gene's read count by the gene length and by the total number of mapped reads.

Overview

At the GDC, FPKM values are generated by custom scripts [2] from the gene-level read counts produced by HTSeq [1]. The formula used to generate FPKM values is as follows:

FPKM = [RMg * 10^9] / [RMt * L]

  • RMg: The number of reads mapped to the gene
  • RMt: The total number of reads mapped to protein-coding sequences in the alignment
  • L: The length of the gene in base pairs

The scalar (10^9) normalizes the data to "kilobase" and "million mapped reads": it is the product of 10^3 (bases per kilobase) and 10^6 (reads per million).
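
The formula above can be sketched in a few lines of Python. This is a minimal illustration and not the GDC's own script: the function name fpkm, its parameter names, and the input numbers are all assumptions chosen for the example.

```python
def fpkm(gene_reads, total_protein_coding_reads, gene_length_bp):
    """Compute FPKM = (RMg * 10^9) / (RMt * L).

    gene_reads                  -- RMg, reads mapped to the gene
    total_protein_coding_reads  -- RMt, reads mapped to protein-coding sequences
    gene_length_bp              -- L, gene length in base pairs
    (Names are hypothetical; they only mirror the symbols in the formula.)
    """
    if total_protein_coding_reads <= 0 or gene_length_bp <= 0:
        raise ValueError("RMt and L must be positive")
    return (gene_reads * 1e9) / (total_protein_coding_reads * gene_length_bp)


# Made-up numbers for illustration only, not real GDC data.
example_value = fpkm(
    gene_reads=500,
    total_protein_coding_reads=20_000_000,
    gene_length_bp=2_000,
)
print(example_value)
```

With these illustrative inputs the numerator is 500 * 10^9 and the denominator is 20,000,000 * 2,000, so the result is 12.5.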

Like HTSeq-count files, FPKM files are available as tab-delimited files with the Ensembl gene IDs in the first column and the expression values in the second. See HTSeq-FPKM-UQ for an alternative method of gene expression level normalization.
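
A file with this two-column layout could be read roughly as follows. This is a sketch under stated assumptions: the helper name read_fpkm_table is hypothetical, the file is assumed to have no header row, and the sample rows are invented to show the layout.

```python
import csv
import io


def read_fpkm_table(handle):
    """Read a two-column, tab-delimited table into {gene_id: value}.

    Assumes no header row; rows with fewer than two columns are skipped.
    """
    values = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        gene_id, value = row[0], row[1]
        values[gene_id] = float(value)
    return values


# Invented sample content, used only to demonstrate the column layout.
sample = io.StringIO("ENSG00000000003.13\t12.5\nENSG00000000005.5\t0.0\n")
table = read_fpkm_table(sample)
print(table)
```

For a file on disk, the same function can be given an open file handle, for example one opened with open(path, newline="").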

Tools

  1. HTSeq Website

References

  1. Anders, S., Pyl, P.T. and Huber, W., 2014. HTSeq-a Python framework to work with high-throughput sequencing data. Bioinformatics, p.btu638.
  2. GDC mRNA-Seq Documentation
