RNA-Seq Analysis for the Bioinformatician

Seasoned bioinformaticians from the University of Pennsylvania are giving a one-day RNA-Seq analysis workshop.

RNA-Seq does not always live up to its full potential, because the methodology of the field is far from routine and cannot be performed effectively with push-button tools. In practice, the data must be processed and normalized with care to yield anything that microarrays could not already provide.

To make RNA-Seq worth the extra expense, the bioinformatician must develop high-level, specialized skills. One must delve deeply to discover some of the best-kept secrets of the field: for example, which aligners really work and which do not and, most importantly, which normalization issues arise and how to deal with them.

Indeed, the standard pipelines will show you how to normalize for read depth, feature length, and other factors; however, they ignore important factors that can introduce extreme and unwanted variance into the data. We will dig deeper, reveal some of the true nature of RNA-Seq data, and show how to deal with these factors properly.
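For readers who have not seen the read-depth and feature-length normalization mentioned above, here is a minimal sketch of two widely used quantities, counts per million (CPM) and transcripts per million (TPM). The function names and the toy counts are hypothetical and chosen only for illustration; this shows the standard adjustment, not the workshop's own methods, and it does nothing about the additional sources of variance the workshop addresses.

```python
def cpm(counts):
    """Counts per million: corrects for read depth only."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no reads")
    return [c / total * 1e6 for c in counts]


def tpm(counts, lengths_bp):
    """Transcripts per million: corrects for feature length, then read depth."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same size")
    # Reads per kilobase of feature (length correction).
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        raise ValueError("library has no reads")
    # Rescale so the values sum to one million (depth correction).
    return [r / total * 1e6 for r in rpk]


if __name__ == "__main__":
    # Toy example: three features whose raw counts differ only
    # because the features differ in length.
    counts = [10, 20, 30]
    lengths_bp = [1000, 2000, 3000]
    print(cpm(counts))
    print(tpm(counts, lengths_bp))
```

In this toy case CPM still ranks the longest feature highest, while TPM assigns all three features the same value, because their counts are proportional to their lengths.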

This workshop will focus on mRNA analysis, with particular attention to processing the data for differential expression and differential splicing analysis, which together cover the majority of use cases.

Participants must have a working knowledge of the UNIX environment and of basic statistical concepts in biology (such as p-values and multiple testing). This workshop is aimed at bioinformaticians, bioengineers, and statisticians who may be responsible for setting up or running the computational pipelines that take raw data all the way through to high-level analysis. We will cover basic to advanced analysis, using a UNIX command-line environment to run RNA-Seq software. All software used is freely available.

We will first discuss the nature of the data, then learn about alignment, normalization, and quantification, and touch on the large topic of statistical analysis. We will not take the standard push-button, out-of-the-box approach to RNA-Seq analysis, but will instead look deeper into the issues that make every study a special case, with the aim of getting more out of the data than the lowest-hanging fruit. In other words, we are not just going to show you how to run TopHat and Cufflinks. Instead, we will survey many of the available methods, evaluate them with unbiased benchmarking, and look closely at how their efficacy can impact the downstream analysis.

This is a practical workshop that will provide down-to-earth material and hands-on experience. You will learn how to perform the analyses on compute clusters (either local or cloud).