Bioinformatics Projects: Using deconvolution to get new insights from old bulk RNA-seq data
PURL: https://gxy.io/GTN:P00026
Comment: What is a Learning Pathway?
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.
Are you an educator looking for project ideas that let students practice independent enquiry and research skills? Are you a student looking for a project idea? Look no further: here you will find a learning pathway of tutorials that guides you through the skills needed to find old data and transform it into new results!
To be clear, we only provide the methods. You will need to come up with your own research question by exploring the literature and available public datasets, apply these analyses, and interpret the results. Your research question will take the form of, “How does variable X impact the cell type proportions in tissue/sample/organism Y?”
Note: You will need to be familiar with the Galaxy interface and with single-cell RNA-seq analysis in general to follow this Learning Pathway. You can gain this familiarity by completing the Introduction to single-cell analysis learning pathway. Also completing the Beyond single cell learning pathway is a bonus that will reinforce that knowledge.
For support throughout these tutorials, join our Galaxy single cell chat group on Matrix to ask questions!
Need a short bioinformatics project idea? Follow this learning path to create new insights from old data!
Module 1: What's deconvolution?
First, you will learn about the concept of deconvolution. This will help you formulate your question and then identify suitable datasets.
Time estimation: 2 hours
Learning Objectives
- Construct bulk and scRNA-seq Expression Set objects
- Inspect these objects for various properties
- Measure the abundance of certain cell type cluster markers compared to others
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Bulk RNA Deconvolution with MuSiC |
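To make the concept concrete: deconvolution treats a bulk sample's expression profile as a mixture of cell type expression profiles, weighted by unknown cell type proportions (bulk ≈ signature × proportions). The sketch below is a minimal illustration assuming a precomputed genes × cell types signature matrix. It solves for the weights with plain non-negative least squares (NNLS) via projected gradient descent, then rescales them to sum to 1. MuSiC itself uses a weighted NNLS that accounts for cross-subject and cross-cell variability, so treat this as a conceptual sketch rather than MuSiC's algorithm. The function names and toy numbers are hypothetical.

```python
import numpy as np


def nnls_projected_gradient(A, b, n_iter=5000):
    """Approximately solve min ||A x - b||^2 subject to x >= 0.

    Uses projected gradient descent: take a gradient step on the
    least-squares objective, then clip negative entries to zero.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros(A.shape[1])
    # Step size 1/L, where L is the squared largest singular value of A
    # (the Lipschitz constant of the gradient A^T (A x - b)).
    step = 1.0 / np.linalg.norm(A, ord=2) ** 2
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = np.maximum(x - step * grad, 0.0)
    return x


def estimate_proportions(signature, bulk):
    """Estimate cell type proportions for one bulk sample (hypothetical helper).

    signature: genes x cell types matrix of mean expression per cell type
    bulk: expression of the same genes, in the same order, in one bulk sample
    """
    weights = nnls_projected_gradient(signature, bulk)
    # Rescale so the proportions sum to 1, removing overall scale differences
    return weights / weights.sum()


if __name__ == "__main__":
    # Toy signature: 4 genes x 3 cell types (made-up numbers for illustration)
    signature = np.array([
        [10.0, 0.0, 1.0],
        [0.0, 8.0, 1.0],
        [1.0, 1.0, 12.0],
        [5.0, 5.0, 0.0],
    ])
    true_p = np.array([0.5, 0.3, 0.2])
    bulk = 1000 * (signature @ true_p)  # simulated bulk sample at a larger scale
    print(estimate_proportions(signature, bulk))  # approximately [0.5 0.3 0.2]
```

The non-negativity constraint is the key design choice: an ordinary least-squares fit can return negative "proportions", which have no biological meaning.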
Module 2: Picking & importing a dataset
Next, you will need to pick a bulk RNA-seq dataset, along with a corresponding single-cell dataset to serve as the reference for deconvolution. You will then need to transform these datasets into ESet objects. We have set up these tutorials to work with datasets from EMBL's European Bioinformatics Institute (EMBL-EBI), because these are carefully curated and work with our workflows. You can try datasets from other sources, but you may run into difficulties.
Time estimation: 2 hours 15 minutes
Learning Objectives
- You will retrieve raw data from the EMBL-EBI Expression Atlas.
- You will manipulate the metadata and matrix files.
- You will combine the metadata and matrix files into an ESet object for MuSiC deconvolution.
- You will create multiple ESet objects - both combined and separated out by disease phenotype for your bulk dataset.
- You will retrieve raw data from the EBI Single Cell Expression Atlas and Human Cell Atlas.
- You will manipulate the metadata and matrix files.
- You will combine the metadata and matrix files into an AnnData or Seurat object for downstream analysis.
- You will retrieve raw data from the EMBL-EBI Single Cell Expression Atlas.
- You will manipulate the metadata and matrix files.
- You will combine the metadata and matrix files into an ESet object for MuSiC deconvolution.
- You will create multiple ESet objects - both combined and separated out by disease phenotype for your single cell reference.
Module 3: Does my reference work well?
Next, you will benchmark your reference dataset to see how accurate it is at inferring proportions. If it does not work well, you may need to pick a different dataset and try again!
Time estimation: 2 hours
Learning Objectives
- Generate pseudo-bulk data from single-cell RNA data
- Process the single-cell and pseudo-bulk data using various deconvolution tools
- Evaluate and visualise the results of the different deconvolution methods
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Evaluating Reference Data for Bulk RNA Deconvolution |
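The benchmarking idea behind this module can be illustrated in a few lines. You pool counts from a random set of single cells into one "pseudo-bulk" sample, so its true cell type proportions are known, and then score a deconvolution method by how far its estimates fall from that truth. This is a simplified sketch assuming a cells × genes count matrix with one cell type label per cell. The function names are hypothetical, and the Galaxy tools used in the tutorial may sample cells differently.

```python
import numpy as np


def make_pseudobulk(counts, cell_types, type_order, n_cells, rng):
    """Build one pseudo-bulk sample from single-cell counts (hypothetical helper).

    counts: cells x genes count matrix from the single-cell reference
    cell_types: one cell type label per cell (same order as the rows of counts)
    type_order: the cell types to report proportions for, in a fixed order
    Returns the summed gene counts and the true proportion of each cell type.
    """
    counts = np.asarray(counts)
    cell_types = np.asarray(cell_types)
    # Randomly sample cells (with replacement) to form the artificial mixture
    idx = rng.choice(counts.shape[0], size=n_cells, replace=True)
    pseudobulk = counts[idx].sum(axis=0)
    sampled = cell_types[idx]
    # Ground truth: the fraction of sampled cells belonging to each type
    truth = np.array([np.mean(sampled == t) for t in type_order])
    return pseudobulk, truth


def rmse(estimated, truth):
    """Root-mean-square error between estimated and true proportions."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated reference: 300 cells x 50 genes, 100 cells of each of three types
    counts = rng.poisson(2.0, size=(300, 50))
    cell_types = np.array(["A"] * 100 + ["B"] * 100 + ["C"] * 100)
    bulk, truth = make_pseudobulk(counts, cell_types, ["A", "B", "C"], 200, rng)
    print(bulk.shape, truth)
    # A method's estimate for `bulk` would then be scored with rmse(estimate, truth)
```

Repeating this over many pseudo-bulk samples gives a distribution of errors per method. This is what lets you judge whether a reference is good enough before applying it to real bulk data.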
Module 4: Analysing your data!
At long last, you’ve done all the hard work of learning about deconvolution, picking your datasets, reformatting them for use, and making sure your reference is of a high quality. You can now finally infer cell proportions from your bulk RNA-seq samples, and compare them across a variable of interest!
Time estimation: 1 hour
Learning Objectives
- Apply MuSiC deconvolution to samples and compare the cell type distributions
- Compare the results from analysing different types of input, for example, whether combining disease and healthy references or not yields better results
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Comparing inferred cell compositions using MuSiC deconvolution |
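After proportions have been inferred for every bulk sample, a first descriptive comparison across your variable of interest is the average proportion of each cell type within each group. The sketch below assumes a samples × cell types proportion matrix and a hypothetical list of group labels (for example, disease status). It is a starting point for plotting or formal statistical testing, not a substitute for either.

```python
import numpy as np


def mean_proportions_by_group(proportions, groups):
    """Average inferred cell type proportions within each group (hypothetical helper).

    proportions: samples x cell types matrix of inferred proportions
    groups: one label per sample for the variable of interest
    Returns a dict mapping each group label to its mean proportion per cell type.
    """
    proportions = np.asarray(proportions, dtype=float)
    groups = np.asarray(groups)
    return {str(g): proportions[groups == g].mean(axis=0) for g in np.unique(groups)}


if __name__ == "__main__":
    # Made-up proportions for 4 bulk samples and 2 cell types
    props = [[0.6, 0.4], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]
    status = ["healthy", "healthy", "disease", "disease"]
    for group, mean in mean_proportions_by_group(props, status).items():
        print(group, mean)
```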
The End!
And now you’re done! We hope you generated interesting results that you can write up into a fantastic project. If you did, we would love to hear from you! Contact us via our Galaxy single cell chat group on Matrix. Alternatively, if you prefer Slack, join the GTN’s Slack workspace and message our #single-cell-users channel.
You will find more features, tips and tricks in our general Galaxy Single-cell Training page.
Editorial Board
This material is reviewed by our Editorial Board:


Funding
These individuals or organisations provided funding support for the development of this resource