Applying single-cell RNA-seq analysis in Coding Environments

PURL: https://gxy.io/GTN:P00024
Comment: What is a Learning Pathway?
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.

Gone is the pre-annotated, high-quality tutorial data: now you have real, messy data to deal with. You have decisions to make and parameters to choose. This learning pathway challenges you to replicate a published analysis as if it were your own dataset. You will perform this analysis in coding environments hosted on Galaxy, instead of Galaxy’s button-based tool interface.

The data is messy. The decisions are tough. The interpretation is meaningful. Come here to advance your single cell skills! Note that you get two options: performing the analysis predominantly in R or in Python.

For support throughout these tutorials, join our Galaxy single cell chat group on Matrix to ask questions!

Want to try scRNA-seq analysis in a coding environment? Follow this learning path!

Module 1: Coding environments in Galaxy

Let’s start with the basics of running coding environments in Galaxy.

Time estimation: 4 hours 30 minutes

Learning Objectives
  • Launch JupyterLab in Galaxy
  • Start a notebook
  • Import libraries
  • Use get() to import datasets from your history to the notebook
  • Use put() to export datasets from the notebook to your history
  • Save your notebook into your history
  • Learn about the Jupyter Interactive Environment
  • Launch RStudio in Galaxy
Lessons
  • JupyterLab in Galaxy
  • Use Jupyter notebooks in Galaxy
  • RStudio in Galaxy

Module 2: Preparing the dataset

These tutorials take you from raw scRNA-seq reads to a matrix ready for downstream analysis. Galaxy coding environments have less computational power than the easy-to-use Galaxy tools, so in practice the dataset is usually prepared in the Galaxy user interface and reduced to something smaller, which can then be analysed in the coding environment. Nevertheless, the whole process can be performed in a coding environment.

Time estimation: 2 hours

Learning Objectives
  • Generate a cell-by-gene matrix for droplet-based single cell sequencing data
  • Interpret quality control (QC) plots to make informed decisions on cell thresholds
  • Find information in GTF files that is relevant to the particulars of your study, and include it in the data matrix metadata
Lessons
  • Generating a single cell matrix using Alevin and combining datasets (bash + R)
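
One of the objectives above is finding relevant information in a GTF file and adding it to the matrix metadata. As a rough sketch of what that involves, the Python snippet below parses the attribute column of two GTF-style lines and flags mitochondrial genes. The example lines (including the gene IDs and coordinates), the helper names, and the rule that mitochondrial genes sit on a chromosome named "MT" are all illustrative assumptions; the tutorial itself works in bash and R.

```python
import re

# Two made-up GTF lines in the standard nine-column, tab-separated layout.
# Real annotation files contain many more records and attributes.
EXAMPLE_GTF = (
    'MT\tinsdc\tgene\t3307\t4262\t.\t+\t.\tgene_id "GENE_A"; gene_name "mt-Nd1";\n'
    '1\thavana\tgene\t3073253\t3074322\t.\t+\t.\tgene_id "GENE_B"; gene_name "Xkr4";\n'
)

# Matches one 'key "value"' pair inside the ninth (attribute) column.
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_attributes(attribute_field):
    """Turn 'key "value"; key "value";' into a dict (hypothetical helper)."""
    return dict(ATTRIBUTE_PATTERN.findall(attribute_field))


def gene_metadata(gtf_text):
    """Collect gene ID, gene name and a mitochondrial flag for each gene record."""
    genes = []
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        columns = line.split("\t")
        if len(columns) != 9 or columns[2] != "gene":
            continue  # keep only well-formed gene-level records
        attributes = parse_gtf_attributes(columns[8])
        genes.append({
            "gene_id": attributes.get("gene_id"),
            "gene_name": attributes.get("gene_name"),
            # Assumption: mitochondrial genes sit on a chromosome named "MT".
            "mito": columns[0] == "MT",
        })
    return genes


metadata = gene_metadata(EXAMPLE_GTF)
for gene in metadata:
    print(gene)
```

A table like this can then be joined to the matrix by gene ID, so that later QC steps can compute the share of mitochondrial counts per cell.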

Module 3: Generating cluster plots

These tutorials take you from the pre-processed matrix to cluster plots and gene expression values. You can pick whether to follow the Python (Scanpy) or R (Seurat) tutorial.

Time estimation: 6 hours

Learning Objectives
  • Interpret quality control plots to direct parameter decisions
  • Repeat analysis from matrix to clustering to labelling clusters
  • Identify decision-making points
  • Appraise data outputs and decisions
  • Explain why single cell analysis is an iterative process (i.e. the first plots you generate are not final, but rather you go back and re-analyse your data repeatedly)
Lessons
  • Filter, plot and explore single-cell RNA-seq data with Scanpy (Python)
  • Filter, plot, and explore single cell RNA-seq data with Seurat (R)
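
The objectives above stress that parameter decisions are revisited as you look at the data. The toy Python sketch below shows why: the same cells are filtered under two candidate settings, and the set of retained cells changes. The cell names, the QC values, and both pairs of thresholds are invented for illustration; in the tutorials the equivalent steps are carried out with Scanpy or Seurat rather than hand-written code.

```python
# Invented per-cell QC metrics: number of detected genes and
# percentage of counts coming from mitochondrial genes.
cells = {
    "cell_1": {"n_genes": 1500, "pct_mito": 2.0},
    "cell_2": {"n_genes": 180, "pct_mito": 1.0},
    "cell_3": {"n_genes": 900, "pct_mito": 12.0},
    "cell_4": {"n_genes": 2400, "pct_mito": 4.5},
    "cell_5": {"n_genes": 450, "pct_mito": 6.0},
}


def filter_cells(qc, min_genes, max_pct_mito):
    """Return the names of cells passing both thresholds, in sorted order."""
    return sorted(
        name
        for name, metrics in qc.items()
        if metrics["n_genes"] >= min_genes and metrics["pct_mito"] <= max_pct_mito
    )


# Two hypothetical rounds of analysis with different threshold choices.
lenient = filter_cells(cells, min_genes=200, max_pct_mito=10.0)
strict = filter_cells(cells, min_genes=500, max_pct_mito=5.0)

print(f"lenient thresholds keep {len(lenient)} cells: {lenient}")
print(f"strict thresholds keep {len(strict)} cells: {strict}")
```

Neither setting is correct in itself. The point is that each choice changes what reaches clustering, which is why you return to the QC plots after seeing the downstream results.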

Module 4: Inferring trajectories

This module is optional, but if you want to infer trajectories (pseudotime relationships between cells), you can try out these tutorials with the same dataset. Again, you can choose whether to follow the Python (Scanpy) or R (Monocle) tutorial.

Time estimation: 5 hours

Learning Objectives
  • Execute multiple plotting methods designed to maintain lineage relationships between cells
  • Interpret these plots
  • Identify which operations are necessary to transform an AnnData object into the files needed for Monocle
  • Describe the Monocle3 functions in R
  • Recognise steps that can be performed in R, but not with current Galaxy tools
  • Repeat the Monocle3 workflow and choose appropriate parameter values
  • Compare the outputs from Scanpy, Monocle in Galaxy and Monocle in R
  • Describe differential expression analysis methods
Lessons
  • Inferring single cell trajectories with Scanpy (Python)
  • Inferring single cell trajectories with Monocle3 (R)
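
To give a feel for what a pseudotime ordering is, the Python sketch below assigns each cell a value equal to its shortest-path distance from a chosen root cell along a nearest-neighbour graph. This is a deliberately simplified stand-in, not the method used by Scanpy or Monocle3; the coordinates, the choice of k, the root cell, and the function names are all assumptions made for illustration.

```python
import heapq
import math

# Invented 2D coordinates for five cells lying along a line.
coords = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (2.0, 0.0),
    "d": (3.0, 0.0),
    "e": (4.0, 0.0),
}


def knn_graph(points, k):
    """Build an undirected graph linking each cell to its k nearest cells."""
    graph = {name: {} for name in points}
    for name, xy in points.items():
        others = sorted(
            (math.dist(xy, points[other]), other)
            for other in points
            if other != name
        )
        for distance, other in others[:k]:
            graph[name][other] = distance
            graph[other][name] = distance
    return graph


def toy_pseudotime(graph, root):
    """Shortest-path distance from the root cell to every reachable cell."""
    times = {root: 0.0}
    queue = [(0.0, root)]
    while queue:
        time, cell = heapq.heappop(queue)
        if time > times[cell]:
            continue  # stale queue entry; a shorter path was already found
        for neighbour, distance in graph[cell].items():
            candidate = time + distance
            if candidate < times.get(neighbour, math.inf):
                times[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return times


pseudotime = toy_pseudotime(knn_graph(coords, k=2), root="a")
ordering = sorted(pseudotime, key=pseudotime.get)
print(ordering)
```

The ordering depends on which cell is chosen as the root, so picking a biologically sensible starting population is one of the decisions you make in the trajectory tutorials.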

The End!

And now you’re done! You will find more features, tips and tricks in our general Galaxy Single-cell Training page.


Editorial Board

This material is reviewed by our Editorial Board:

Wendi Bacon, Pavankumar Videm