Introduction

This project holds links to the datasets, software and code that were used for our RNA-Seq alignment benchmarking project:

Benchmark Analysis of RNA-Seq Aligners

Abstract:

Alignment is the first step in most RNA-Seq analysis pipelines, and the accuracy of downstream analyses depends heavily on this step. Many methods are available for alignment, with conflicting claims of superiority. Unlike most steps in the pipeline, alignment is particularly amenable to benchmarking with simulated data. Here a comprehensive benchmark analysis of fourteen common methods has been performed, with base-, read- and junction-level accuracy metrics and a comparison of default versus optimized parameters. The benchmarking is performed on data of varying complexities, resulting in a large variation in performance and revealing relatively poor correlation between accuracy and popularity. Since every genome has complex regions, these results are broadly relevant. A literature survey reveals that a considerable portion of publications use TopHat with default settings, yet for most metrics TopHat underperforms, particularly with the default settings.



Authors: Giacomo Baruzzo (3), Katharina E Hayer (2), Eun Ji Kim (2), Barbara Di Camillo (3), Garret FitzGerald (2,4) and Gregory R Grant (1,2)

Institutions:

1. Department of Genetics, University of Pennsylvania, Philadelphia, PA
2. Institute for Translational Medicine and Therapeutics (ITMAT), University of Pennsylvania, Philadelphia, PA
3. Department of Information Engineering, University of Padova, Italy
4. Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA