Gallantries Grant - Intellectual Output 1 - Introduction to data analysis and -management, statistics, and coding

PURL: https://gxy.io/GTN:P00012
Comment: What is a Learning Pathway?
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.

This Learning Pathway collects the results of Intellectual Output 1 in the Gallantries Project.

In total, this module will form a course of around 10 days (± 2 days, depending on the exact analysis stories we identify). Some of these introductory submodules will build on existing training material available in the GTN or Carpentries (~15%).

Success Criteria:

Year 1: Coding in Python

Intro to Coding in Python. Covers variables, functions, and data structures [SC1.1,2]

Lesson Slides Hands-on Recordings
Introduction to Python
Advanced Python

Year 1: Coding in Python Modular (Avans)

Intro to Coding in Python. Covers variables, functions, and data structures [SC1.1,2]

In collaboration with Avans Hogeschool, an associated partner, we produced the following lessons:

Lesson Slides Hands-on Recordings
Python - Math
Python - Functions
Python - Basic Types & Type Conversion
Python - Lists & Strings & Dictionaries
Python - Flow Control
Python - Loops
Python - Try & Except
Python - Files & CSV
Python - Introductory Graduation
Python - Globbing
Python - Argparse
Python - Subprocess
Virtual Environments For Software Development
Conda Environments For Software Development

Year 1: Coding in R

Intro to Coding in R. Covers variables, functions, and data structures [SC1.1,2]

Lesson Slides Hands-on Recordings
R basics in Galaxy
Advanced R in Galaxy
R
dplyr & tidyverse for data processing

Year 1: Intro to Command Line

This submodule will cover the basics of the shell (variables, for loops) needed for data handling. [SC1.1,2,6]

Lesson Slides Hands-on Recordings
CLI basics
Advanced CLI in Galaxy
CLI Educational Game - Bashcrawl
Make & Snakemake

Year 1: Intro to Git and GitHub

This submodule will cover the basics of research software development and sharing (committing, branching, forking, GitHub, etc.). [SC1.1,2,6]

Lesson Slides Hands-on Recordings
Version Control with Git
Basics of using Git from the Command Line
Contributing with GitHub via command-line
Contributing with GitHub via its interface

Year 2: Introduction to Genomics

This submodule covers the biological background, as well as the technological concepts involved in genome sequencing, and their effects on downstream data analysis. [SC1.3,4,6]

Year 2: Quality Control

This submodule will cover the evaluation of the quality of datasets, and how to improve quality by a cyclic process of cleaning, trimming and filtering datasets and re-evaluating the quality. [SC1.3-5]

Lesson Slides Hands-on Recordings
Quality Control
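
The cyclic process described above (evaluate, clean, re-evaluate) can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the method used in the Quality Control lesson: the reads, the thresholds, and the helper names (mean_quality, trim_3prime, filter_short, quality_control) are all hypothetical, and real tools use more sophisticated metrics.

```python
from statistics import mean

# Hypothetical toy reads: (sequence, per-base Phred quality scores).
reads = [
    ("ACGTACGT", [38, 37, 36, 35, 30, 12, 8, 5]),
    ("GGCATTAC", [40, 39, 39, 38, 37, 36, 35, 34]),
    ("TTGA", [10, 9, 8, 7]),
]


def mean_quality(reads):
    """Evaluate: average Phred score over all bases of all reads."""
    scores = [q for _, quals in reads for q in quals]
    return mean(scores) if scores else 0.0


def trim_3prime(read, min_base_quality):
    """Trim: drop trailing bases whose quality is below the threshold."""
    seq, quals = read
    end = len(quals)
    while end > 0 and quals[end - 1] < min_base_quality:
        end -= 1
    return seq[:end], quals[:end]


def filter_short(reads, min_length):
    """Filter: discard reads that became too short after trimming."""
    return [r for r in reads if len(r[0]) >= min_length]


def quality_control(reads, target_mean=30.0, min_length=4, max_rounds=5):
    """Repeat trim -> filter -> re-evaluate until the target is met."""
    threshold = 10  # assumed starting threshold, tightened each round
    for _ in range(max_rounds):
        if mean_quality(reads) >= target_mean:
            break
        reads = [trim_3prime(r, threshold) for r in reads]
        reads = filter_short(reads, min_length)
        threshold += 5
    return reads


cleaned = quality_control(reads)
print(len(cleaned), round(mean_quality(cleaned), 2))
```

The loop stops as soon as the re-evaluated quality reaches the target, which mirrors the idea of checking quality again after every cleaning step rather than cleaning once and assuming the result is good.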

Year 2: Mapping

This submodule will cover the comparison of genome sequencing samples to a reference genome. The concept of reference data is relevant to many data analyses across the life sciences, which often involve connecting to online databases and incorporating their data into an analysis. [SC1.3,4]

Lesson Slides Hands-on Recordings
Mapping
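
As a rough illustration of comparing reads to a reference, the sketch below places each read by exact substring search. This is a deliberate simplification and an assumption of this example: real mappers tolerate mismatches and gaps and use indexed references. The sequences and the map_reads helper are hypothetical.

```python
def map_reads(reference, reads):
    """Return the 1-based position of each read in the reference, or None."""
    positions = {}
    for read in reads:
        index = reference.find(read)  # exact match only (simplification)
        positions[read] = index + 1 if index != -1 else None
    return positions


# Hypothetical toy data.
reference = "ACGTTAGCCGATAGGCTA"
reads = ["TAGCC", "GATAG", "TTTTT"]

mapped = map_reads(reference, reads)
print(mapped)
```

Reads that do not occur in the reference are reported as None instead of being silently dropped, so unmapped reads remain visible to later steps.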

Year 3: Variant Analysis

This submodule will cover variant calling: after mapping sequences to the reference genome, the regions that differ from the reference (variants) must be identified and evaluated for impact. As any two individuals will show many differences, distinguishing between healthy variation and potential disease-causing variants is one of the main challenges in variant calling. [SC1.3-5]

Lesson Slides Hands-on Recordings
Variant Calling Workflow
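
The core idea of finding positions that differ from the reference can be sketched in a few lines of Python. This toy assumes a single sample sequence that is already aligned to the reference, has the same length, and differs only by substitutions. Real variant callers work on many reads and weigh base qualities and statistical evidence. The sequences and the call_snvs helper are hypothetical.

```python
def call_snvs(reference, sample):
    """List (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes equal-length, pre-aligned sequences")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical toy data.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

variants = call_snvs(reference, sample)
print(variants)
```

The sketch only detects differences. Deciding whether a given difference is benign variation or potentially disease-causing requires annotation and evidence beyond the sequence comparison itself.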

Year 3: Transcriptomics

DNA only describes the potential of the genome; which genes are actually active within the cell and affecting the health and function of the organism is determined via transcriptomics (RNA sequencing). By integrating data from these two levels of analysis (DNA and RNA), a clearer picture of the state of the cell can be obtained. [SC1.3-5]

Lesson Slides Hands-on Recordings
RNA-seq Alignment with STAR

Editorial Board

This material is reviewed by our Editorial Board:

Fotis E. Psomopoulos, Saskia Hiltemann, Helena Rasche

Funders

This material was funded by:

Gallantries: Bridging Training Communities in Life Science, Environment and Health