name: inverse layout: true class: center, middle, inverse
--- # Introduction to Transcriptomics --- ## Requirements Before diving into this slide deck, we recommend you to have a look at: - [Galaxy introduction](/training-material/topics//introduction/) - [Quality control](/training-material/topics//sequence-analysis/) --- # What is RNA sequencing? --- ### RNA ![The structure of a eukaryotic protein-coding gene](../images/Gene_structure_eukaryote_2_annotated.png) - Transcribed form of the DNA - Active state of the DNA .footnote[[Credit: Thomas Shafee](https://en.wikipedia.org/wiki/File:Gene_structure_eukaryote_2_annotated.svg)] --- ## RNA sequencing ![Summary of RNA-Seq principle](../images/Summary_of_RNA-Seq.png) - RNA quantification at single base resolution - Cost efficient analysis of the whole transcriptome in a high-throughput manner .footnote[[Credit: Thomas Shafee (adapted)](https://commons.wikimedia.org/wiki/File:Summary_of_RNA-Seq.svg)] --- ### Where does my data come from? !(../images/RNA_seq_zang2016.png) .footnote[[*Zang and Mortazavi, Nature, 2012*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4138050/)] --- ### Principle of RNA sequencing !(../images/korf_2013.jpg) .footnote[[*Korf, Nat Met, 2013*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4461013/)] --- ### Challenges of RNA sequencing - Different origin for the sample RNA and the reference genome - Presence of incompletely processed RNAs or transcriptional background noise - Sequencing biases (*e.g.* PCR library preparation) --- ### Benefits of RNA sequencing !(../images/wordcloud.png) --- ### 2 main research applications for RNA-Seq - Transcript discovery > *Which RNA molecules are in my sample?* > Novel isoforms and alternative splicing, Non-coding RNAs, Single nucleotide variations, Fusion genes - RNA quantification > *What is the concentration of RNAs?* > Absolute gene expression (within sample), Differential expression (between biological samples) --- ## How to analyze RNA seq data for RNA quantification? --- ### RNA quantification !(../images/pepke_2009.jpg) .footnote[[*Pepke et al, Nat Met, 2009*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4077321/)] --- ### Overview of the Data Processing !(../images/rna_seq_workflow.png) - No available standardized workflow - Multiple possible best practices for every dataset --- ### Data Pre-processing 1. Adapter clipping to trim the sequencing adapters 2. Quality trimming to remove wrongly called and low quality bases .footnote[See [NGS Quality control](../../NGS-QC/slides/index.html)] --- ### Annotation of RNA-Seq reads Simple mapping on a reference genome? More challenging !(../images/RNA-Seq-alignment.png) .footnote[[Credit: Rgocs](https://en.wikipedia.org/wiki/File:RNA-Seq-alignment.png)] --- ### Annotation of RNA-Seq reads 3 main strategies for annotations - Transcriptome mapping - Genome mapping - *De novo* transcriptome assembly and annotation --- ### Transcriptome mapping !(../images/transcriptome_alignment.png) *See [NGS Mapping](../../NGS-mapping/slides/index.html)* - Need reliable gene models - No detection of novel genes .footnote[Figures by Ernest Turro, EMBO Practical Course on Analysis of HTS Data, 2012] --- ### Genome mapping Splice-aware read alignment !(../images/genome_alignment.png) Detection of novel genes and isoforms .footnote[Figures by Ernest Turro, EMBO Practical Course on Analysis of HTS Data, 2012] --- ### Transcriptome and Genome mapping Needed - Reference genome/transcriptome in FASTA - Annotations of known genes, ... in GTF Where to find? - Joint projects to produce and maintain annotations on selected organisms: EMBL-EBI, UCSC, RefSeq, Ensembl, ... --- ### *De novo* transcriptome assembly No need for a reference genome ... 1. Assembly into transcripts 2. Map reads back --- ### Quantification *What is the expression level of the genomic features?* - Counting the number of reads per features: Easy!! - Challenges - How to handle multi-mapped reads (*i.e.* reads with multiple alignments)? - How to distinguish between different isoforms? - At gene level? - At transcript level? - At exon level? --- ### Differential Expression Analysis .image-75[!(../images/RNA_seq_DEscheme.png)] Account for variability of expression across biological replicates<br>with the help of counts --- ### Differential Expression Analysis: Normalization *Make the expression levels comparable across* - By Features: genes, isoforms - By Samples - Methods - [*FPKM/RPKM*](https://www.nature.com/nmeth/journal/v5/n7/abs/nmeth.1226.html) (Cufflinks/Cuffdiff) - [*TMM*](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25) (edgeR) - [*DESeq2*](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8) (DESeq2) Normalize counts *k<sub>ij</sub>* for gene *i* in library *j* by size factor *s<sub>j</sub>* .footnote[*"Only the DESeq and TMM normalization methods are robust to the presence of different library sizes and widely different library compositions..."* - Dillies et al., Brief Bioinf, 2013] --- ### Impact of sequencing depth and number of replicates .image-50[!(../images/RNA_seq_numreplicates.png)] .footnote[[*Conesa et al, Genome Biol, 2016*](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8)] **Recommendation: At least 3 biological replicates** ??? - Number of replicates has greater effect on DE detection accuracy than sequencing depth (more replicates = increased statistical power) - DE detection of lowly expressed genes is very sensitive to number of reads and replication - DE detection of highly expressed genes possible already at low sequencing depth --- ### Visualization - Integrative Genomics Viewer ([*IGV*](https://bib.oxfordjournals.org/content/14/2/178.full?keytype=ref&%2520ijkey=qTgjFwbRBAzRZWC)) or Trackster Visualization of the aligned BAM files - [*Sashimi plots*](https://bioinformatics.oxfordjournals.org/content/early/2015/01/21/bioinformatics.btv034) Quantitative visualization of read coverage along exons and splice junctions - [*CummeRbund*](http://compbio.mit.edu/cummeRbund/manual_2_0.html) Visualization package for Cufflinks high-throughput sequencing data --- ## Related tutorials --- ## Thank you! This material is the result of a collaborative work. Thanks the [Galaxy Training Network](https://wiki.galaxyproject.org/Teach/GTN) and all the contributors (Bérénice Batut, Anika Erxleben, Markus Wolfien) !
.footnote[Found a typo? Something is wrong in this tutorial?
Edit it on [GitHub](https://github.com/galaxyproject/training-material/tree/master/topics/transcriptomics/slides/introduction.html)]