Rapid Development and Distribution of Statistical Tools for High-Throughput Sequencing Data

WP3 Differential Analysis of Count Data

Statistical methods for inference from count data are central to many uses of HTS. Mapped reads are often summarised by counting their overlaps with genomic features (e.g. genes, exons, binding regions) in samples from different experimental conditions. Similar statistical methods apply across a range of applications: in RNA-Seq, one may test or regress for differential expression as a function of the experimental covariates; in ChIP-Seq, the interest is in differential binding across conditions or tissue samples. Consequently, there is a pressing need to adapt statistical modelling methods (testing, regression, classification, clustering) to this data type. The WP contributors are among the leaders in this field; in particular, they have authored or co-authored the widely used software packages DESeq and edgeR, which employ the methodology of generalised linear models (GLMs) and provide specialised model fitting approaches. However, substantial challenges remain, which we will address in this WP. The methods developed in this WP will provide the data pre-processing steps required for the systems biology applications described in WP7. This WP will collaborate closely with WP10 to address the reciprocal dependence between modelling and parameter choice on the one hand, and method benchmarking on the other.


EMBL (Lead Partner)
University of Zurich
University of Cambridge
University of Sheffield