HTSeq is a Python package that counts the number of reads mapped to each gene.


The first step in generating gene expression values from an RNA-Seq alignment at the GDC is counting the reads mapped to each gene [1]. These counts are generated with HTSeq [2] and are calculated at the gene level. HTSeq-Count files are tab-delimited, with one row per gene containing an Ensembl gene ID column and a mapped read count column. These files are then processed further with custom scripts to generate FPKM and FPKM-UQ values.
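
As an illustration of the file layout described above, the following is a minimal sketch of loading such a two-column count table in Python. The function name `read_htseq_counts` is hypothetical, and the gene IDs and counts in the sample are invented for the example. It also assumes the file may contain summary rows whose names begin with `__` (such as `__no_feature`), which htseq-count typically writes after the gene rows; those are skipped here.

```python
import csv
import io


def read_htseq_counts(handle):
    """Parse a two-column, tab-delimited count table into {gene_id: count}.

    Hypothetical helper for illustration; not part of HTSeq or the GDC tools.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        gene_id, value = row
        if gene_id.startswith("__"):
            continue  # assumed summary rows, e.g. __no_feature
        counts[gene_id] = int(value)
    return counts


# Invented sample data in the described layout (Ensembl gene ID, read count).
example = io.StringIO(
    "ENSG00000000003.13\t2500\n"
    "ENSG00000000005.5\t0\n"
    "__no_feature\t120000\n"
)
counts = read_htseq_counts(example)
print(counts)
```

For a real file, the same function could be called on an open file handle, for example `with open(path, newline="") as fh: counts = read_htseq_counts(fh)`, where `path` points to a downloaded count file.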


  1. HTSeq Website


  1. GDC mRNA-Seq Documentation
  2. Anders, S., Pyl, P.T. and Huber, W., 2014. HTSeq-a Python framework to work with high-throughput sequencing data. Bioinformatics, p.btu638.

Categories: Workflow Type