# Visualization of RNA-Seq results with CummeRbund

### Overview

Questions
• How are RNA-Seq results stored?
• Why are visualization techniques needed?
• How do we select the subset of results we want for differential gene expression analysis?
Objectives
• Manage RNA-Seq results
• Extract the subset of interest for differential gene expression analysis
• Visualize information
Requirements

Time estimation: 1 hour

# Introduction

RNA-Seq analysis helps researchers annotate new genes and splice variants, and provides cell- and context-specific quantification of gene expression. RNA-Seq data, however, are complex and require both computer science and mathematical knowledge to be managed and interpreted.

Visualization techniques are key to overcoming the complexity of RNA-Seq data, and are valuable tools for gathering information and insights.

### Agenda

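In this tutorial, we will deal with:

1. Reasons for visualizing RNA-Seq results
2. Importing RNA-Seq result data
3. Filtering and sorting
4. CummeRbund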

# Reasons for visualizing RNA-Seq results

To make sense of the available RNA-Seq data and get an overview of the condition-specific gene expression levels of the provided samples, we need to visualize our results. Here we will use CummeRbund.

CummeRbund is an open-source tool that simplifies the analysis of CuffDiff’s RNA-Seq output. In particular, CummeRbund:

• manages, integrates, and visualizes the data produced by CuffDiff
• simplifies data exploration
• provides a bird’s-eye view of the expression analysis by describing relationships between genes, transcripts, transcription start sites, and protein-coding regions
• allows exploring subfeatures of individual genes or gene sets
• exploring subfeatures of individual genes or gene-sets

[Figure: A typical workflow for the visualization of RNA-Seq data involving CummeRbund]

CummeRbund reads your RNA-Seq results from a SQLite database. This database has to be created using CuffDiff’s SQLite output option.

### Tip: SQLite output with CuffDiff

Instruct CuffDiff to organize its output in a SQLite database, so that it can be read by CummeRbund.
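Galaxy reads the database for you, but if you want to peek at it outside Galaxy, a minimal Python sketch like the following lists the tables stored in a SQLite file. It uses only the standard-library `sqlite3` module and opens the file read-only. The function name is illustrative, and the table names you will see depend on CuffDiff’s schema, which this sketch does not assume.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite file, sorted alphabetically."""
    # mode=ro opens the file read-only and fails instead of silently creating
    # a new, empty database when the path does not exist.
    uri = f"file:{db_path}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_tables("cuffdiff_output.db")` (a hypothetical file name) returns the table names found in that database.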

# Importing RNA-Seq result data

1. Create a new history
2. Import the CuffDiff SQLite dataset
   • Open the Galaxy Upload Manager
   • Select “Paste/Fetch Data”
   • Paste the link into the text field
   • Press “Start”
3. Rename the dataset to “RNA-Seq SQLite result data”

   By default, when data is imported via its link, Galaxy names it with its URL.

CuffDiff’s output data is organized in a SQLite database, so we need to extract it before we can see what it looks like.

For this tutorial, we are interested in the transcripts that CuffDiff tested for differential expression.

### Hands-on: Extract CuffDiff results

1. Extract CuffDiff tool with the following parameters:
   • “Select tables to output” to Transcript differential expression testing
2. Inspect the table
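Conceptually, extracting a table amounts to reading it out of the database and writing it as tab-separated text, the kind of tabular data that Galaxy’s Filter and Sort tools work on. The sketch below assumes you already know the table’s name; the name `isoformExpDiffData` in the usage note is an assumption about CuffDiff’s schema, not something this tutorial confirms, and the real Extract CuffDiff tool may name or order columns differently.

```python
import csv
import sqlite3
from contextlib import closing


def export_table_tsv(db_path: str, table: str, out_path: str) -> int:
    """Write one table of a SQLite database to a tab-separated file.

    Returns the number of data rows written (the header line is not counted).
    """
    with closing(sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)) as conn:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

For example, `export_table_tsv("cuffdiff_output.db", "isoformExpDiffData", "transcript_de.tsv")` (all three names hypothetical) writes a header line followed by one line per record.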

### Tip: Inspecting the content of a file in Galaxy

• Click on the eye icon (“View data”) to the right of the file name in the history
• Inspect the content of the file in the central panel

Each entry reports a differential expression test, but not all of them are significant. We want to keep only the records reported as significantly differentially expressed.

### Questions

1. How can we retain only the significant differentially expressed genes?
2. Which column stores this information?

### Solution

1. We need to filter on the column storing each record’s significance
2. Column 14

# Filtering and sorting

We first want to highlight the most significant differentially expressed genes in our analysis, and then obtain informative visualizations.

### Hands-on: Extract CuffDiff’s most significant differentially expressed genes

1. Filter tool with the following parameters:
   • “Filter” to the extracted table from the previous step
   • “With following condition” to an appropriate filter over the target column (see the questions below when in doubt)

### Questions

1. Which column stores the significance information for each record?
2. Which conditional expression filters all records on the selected column?
3. What happened to the records of the original table?

### Solution

1. Column 14
2. `c14=='yes'`
3. All records whose “significant” field was set to “yes” have been retained, while the others have been filtered out
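Outside Galaxy, the same filter can be sketched in a few lines of Python. Galaxy numbers columns from 1, so `c14` corresponds to index 13 of a split line. The function name and the choice to silently drop lines with fewer than 14 fields are illustrative assumptions, not the Filter tool’s exact behavior.

```python
def filter_significant(lines, column=14, value="yes"):
    """Keep tab-separated records whose 1-based `column` equals `value`.

    Mirrors the Galaxy condition c14=='yes'. Lines with too few fields, or a
    header line whose cell is not exactly "yes", are dropped.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column and fields[column - 1] == value:
            kept.append(line)
    return kept
```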

Look at your data. The differential expression values are stored in column 10, so we will sort all records in descending order of their value in column 10.

1. Sort tool with the following parameters:
   • “Sort Dataset” to the filtered table
   • “on column”, “with flavor” and “everything in” to the appropriate values (see above)
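A numerical, descending sort on column 10 can be sketched the same way. This assumes every record has a parseable number in that column. Python’s `float` accepts strings such as `inf` and `-inf`, in case infinite fold changes appear; placing `nan` values last is an assumption of this sketch, not necessarily what Galaxy’s Sort tool does.

```python
import math


def sort_by_column_desc(lines, column=10):
    """Sort tab-separated records numerically on a 1-based column, largest first."""

    def key(line):
        value = float(line.rstrip("\n").split("\t")[column - 1])
        # NaN does not compare consistently, so treat it as the smallest value.
        return float("-inf") if math.isnan(value) else value

    return sorted(lines, key=key, reverse=True)
```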

### Questions

1. Since the start of our filtering process, how many records are left in the significant subset we will extract information from?
2. What does this reduction in the number of lines represent?

### Solution

1. Click on the datasets in your history: their preview shows the number of lines, which went from ~140,000 to 219
2. This is a necessary step to gain insight into the biological meaning of our samples: it puts the raw RNA-Seq result data into context, cutting down the less meaningful records to focus on what is needed to go from data to information

# CummeRbund

With CummeRbund we can visualize our RNA-Seq results of interest.

CummeRbund always generates two outputs:

• the plot
• the R script responsible for generating the plot

We are interested in visualizing the expression values of all transcripts of the most significant differentially expressed gene we found in the previous section.

### Hands-on: Visualization

1. CummeRbund tool with the following parameters:
   • Click on “Insert plot”
   • “Width” and “Height” to 800 and 600, respectively
   • “Plot type” to Expression Plot
   • “Expression levels to plot” to Isoforms
   • “Gene ID” to NDUFV1
   • Check that your input form matches these parameters, then click on “Execute”

Our first CummeRbund plot is the “Expression Plot”. It represents the expression of all isoforms of a single gene (NDUFV1), with replicate FPKMs exposed.

Our plot has a modest number of isoforms and is therefore already readable. With 5 or 6 isoforms, however, the plot can look very busy; in that case, we can switch to another plot type.
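CummeRbund draws the plot itself in R, but the kind of summary it displays can be sketched in Python: replicate FPKMs grouped by isoform and condition. The sketch assumes the replicate values are available as `(isoform_id, condition, fpkm)` tuples (a hypothetical input layout) and reports a plain mean and min/max range per group; it does not reproduce CummeRbund’s own error bars.

```python
from collections import defaultdict
from statistics import mean


def summarize_replicates(records):
    """Group replicate FPKMs by (isoform, condition) and summarize each group.

    records: iterable of (isoform_id, condition, fpkm) tuples, one per replicate.
    """
    groups = defaultdict(list)
    for isoform, condition, fpkm in records:
        groups[(isoform, condition)].append(float(fpkm))
    return {
        key: {"mean": mean(vals), "min": min(vals), "max": max(vals), "n": len(vals)}
        for key, vals in groups.items()
    }
```

Counting the distinct isoforms in the result is also a quick way to judge whether an expression plot will be crowded.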

### Hands-on: Visualization

1. CummeRbund tool with the following parameters:
   • Click on “Insert plot”
   • “Width” and “Height” to 800 and 600, respectively
   • “Plot type” to Expression Bar Plot
   • “Expression levels to plot” to Isoforms
   • “Gene ID” to NDUFV1

The result is an Expression Bar Plot of the same gene (NDUFV1), with replicate FPKMs exposed.

### Comment

These plots are also shown in this Galaxy video tutorial.

Would you like to obtain more sophisticated visualizations of your RNA-Seq analysis results? Try other CummeRbund plot types, and set their parameters according to the filtering and sorting operations we performed.

# Conclusion

Visualization tools help researchers make sense of data by providing a bird’s-eye view of the underlying analysis results. In this tutorial, we reviewed the advantages of visualizing RNA-Seq results with CummeRbund, and gained insight into CuffDiff’s large output by plotting information about the most significant differentially expressed genes in our RNA-Seq analysis.

### Key points

• Extract information from a CuffDiff SQLite database
• Filter and sort results to highlight differentially expressed genes of interest
• Generate publication-ready visualizations of RNA-Seq analysis results

# Useful literature

Further information, including links to documentation and original publications, regarding the tools, analysis techniques and the interpretation of results described in this tutorial can be found here.