IBI/ITMAT RNA-Seq Workshop - Feb. 25th, 2014


The IBI and ITMAT are offering a free one day RNA-Sequencing analysis workshop for Penn and CHOP reserachers. Open to Graduate Students, Postdocs, Staff, and Faculty. It will run from 9:30am to 4:30pm on Tuesday February 25th. There will be lunch provided and coffee/tea.

This workshop is aimed at bioinformaticians and statisticians who may be responsible for setting up and/or running the computational pipelines that take raw data all the way through to high level analysis. We will focus on basic to advanced analysis using a UNIX command line environment to run open-source RNA-Seq software. Participants must have a basic knowledge of the UNIX environment and a basic understanding of statistical methods. The first hour will be spent reviewing some of the relevant UNIX concepts.

We will first dicsuss the nature of the data and then we will learn about alignment, normalization, quantification, statistical analysis and data management. We will not take the standard push-button out-of-the-box approach to RNA-Seq analysis, but will instead look deeper into the issues that make every study a special case, with the aim of getting more out of the data than the lowest hanging fruit. In other words, we are not just going to show you how to run Tophat and Cufflinks, instead we will evaluate many available methods and look closely at issues that have been largely ignored, but which have considerable impact on the power of the downstream analysis.

This is a practical workshop which will provide down-to-earth material and hands on experience. You will learn how to perform the analyses on the UPenn PMACS compute cluster. Participants will need to bring a laptop with an ssh client in order to connect remotely to the PMACS system.

PRE-REQUISITES:
  • A have a laptop that can connect to (AirPennNet here).
  • An ssh client (get one here).
  • Basic familiarity with Unix. If you don't know the commands cd, ls, cat, grep, jobs, more, mkdir, etc... then you will not be able to follow.
  • Familiarity with some command line plain text editor. E.g. emacs or vi.

CURRICULUM:
Note: This is a Unix based seminar on data analysis. We will not describe in detail the process of generating the data, except to the extent that it will help clarify any anaysis issues.

    Part One:

  • RNA-Seq generalities
  • Unix generalities
  • Using the PMACS compute cluster
  • A few words about cloud computing

    Part Two:

  • Study Design
  • File formats (fasta, fastq, SAM)
  • Alignment (What works best of the many choices)
  • Genome Browser (and Table Browser)
  • Gene Annotation

    Part Three:

  • Few words about quantification (normalization depends on what you want to quantify).
  • Normalization (we will go into depth on this delicate issue)

    Part Four:

  • Quality control
  • Statistical analysis (we can only touch on this big subject)

PEOPLE:
  • Dr. Gregory R. Grant, ITMAT Director of BioInformatics.
  • Dr. Elisabetta Manduchi, CBIL BioInformatics Staff.
  • Anand Srinivasan, PMACS Systems Developer
  • Katharina Hayer, ITMAT BioInformatics Staff
  • Eun Ji Kim, ITMAT BioInformatics Staff
  • Dr. Faith Coldren, ITMAT BioInformatics Staff
  • Dimitra Sarantopoulou, ITMAT BioInformatics Staff