Introduction to Data Driven Life Sciences

PURL: https://gxy.io/GTN:P00017
Comment: What is a Learning Pathway?
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.

This learning path starts with the history of biology and takes you on a journey through fundamental data analysis techniques and their applications.

Module 1: History

Knowing history is essential for understanding how we arrived at the current state of affairs in our field.

Time estimation: 1 hour

Learning Objectives
  • Have a basic understanding of the history of biology from Darwin to today.
Lesson Slides Hands-on Recordings

Module 2: Data Processing Tooling

Before jumping into biology, we need to review the basic data processing machinery.

Time estimation: 13 hours

Learning Objectives
  • Explain how the shell relates to the keyboard, the screen, the operating system, and users' programs.
  • Explain when and why command-line interfaces should be used instead of graphical interfaces.
  • Explain the similarities and differences between a file and a directory.
  • Translate an absolute path into a relative path and vice versa.
  • Construct absolute and relative paths that identify specific files and directories.
  • Use options and arguments to change the behaviour of a shell command.
  • Demonstrate the use of tab completion and explain its advantages.
  • Create a directory hierarchy that matches a given diagram.
  • Create files in that hierarchy using an editor or by copying and renaming existing files.
  • Delete, copy and move specified files and/or directories.
  • Redirect a command's output to a file.
  • Process a file instead of keyboard input using redirection.
  • Construct command pipelines with two or more stages.
  • Explain what usually happens if a program or pipeline isn't given any input to process.
  • Explain Unix's 'small pieces, loosely joined' philosophy.
  • Write a loop that applies one or more commands separately to each file in a set of files.
  • Trace the values taken on by a loop variable during execution of the loop.
  • Explain the difference between a variable's name and its value.
  • Explain why spaces and some punctuation characters shouldn't be used in file names.
  • Demonstrate how to see what commands have recently been executed.
  • Re-run recently executed commands without retyping them.
  • Use `grep` to select lines from text files that match simple patterns.
  • Use `find` to find files and directories whose names match simple patterns.
  • Use the output of one command as the command-line argument(s) to another command.
  • Explain what is meant by 'text' and 'binary' files, and why many common tools don't handle the latter well.
  • Learn the fundamentals of programming in Python.
  • Have a basic understanding of the history of sequencing.
  • Understand Python basics.
  • Understand manipulation of FASTQ data in Python.
  • Understand quality metrics.
  • Understand lists and dictionaries.
  • Learn about dynamic programming.
  • Learn how to translate DNA in Python.
  • Understand manipulation of files in Python.
  • Understand data manipulation in Pandas.
Lesson Slides Hands-on Recordings
CLI basics
Advanced CLI in Galaxy
Introduction to Python
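
As a taste of what the Python objectives in this module lead to, here is a minimal sketch of reading FASTQ records and decoding their quality strings. It is not taken from the tutorials themselves: the function names `read_fastq` and `phred_scores` and the example read are hypothetical, and the sketch assumes simple four-line records with Phred+33 quality encoding.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) for each four-line FASTQ record.

    Assumes every record spans exactly four lines (no wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality  # drop the leading '@'


def phred_scores(quality, offset=33):
    """Convert a quality string to integer scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


# A made-up single-read example held in memory instead of a file.
example = StringIO("@read1\nGATTACA\n+\nIIIIII!\n")

for name, sequence, quality in read_fastq(example):
    scores = phred_scores(quality)
    mean_quality = sum(scores) / len(scores)
    print(name, sequence, round(mean_quality, 2))
```

The same loop works on a real file by replacing the `StringIO` object with `open("reads.fastq")`, where the file name is again only a placeholder.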

Editorial Board

This material is reviewed by our Editorial Board:

Anton Nekrutenko